Options for JIT-compilation of CUDA PTX code.

#include <compilation_options.hpp>

Classes
struct	caching_mode_spec_t
	Options for fully specifying a caching mode.
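caching_mode_spec_t is the type of the default_ and forced members of caching_modes, listed under Public Attributes. Its fields are not documented on this page, so the sketch below is hypothetical throughout. It models only load instructions, uses the PTX ISA's load cache operators (.ca, .cg, .cs, .lu, .cv), and renders the two specifications as flags modeled on ptxas's --def-load-cache and --force-load-cache options. Check the flag names against the PTX compiler documentation for your CUDA version.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical enumeration of the PTX cache operators for load instructions.
enum class load_caching_mode { ca, cg, cs, lu, cv };

const char* ptx_name(load_caching_mode mode)
{
    switch (mode) {
    case load_caching_mode::ca: return "ca"; // cache at all levels
    case load_caching_mode::cg: return "cg"; // cache globally (L2 and below, not L1)
    case load_caching_mode::cs: return "cs"; // streaming: likely to be accessed once
    case load_caching_mode::lu: return "lu"; // last use
    case load_caching_mode::cv: return "cv"; // volatile: don't cache, fetch again
    }
    return "";
}

// Hypothetical rendering of the default and forced load caching modes as
// ptxas-style flags. An unset mode produces no flag, leaving the compiler's default.
std::vector<std::string> caching_mode_flags(std::optional<load_caching_mode> default_mode,
                                            std::optional<load_caching_mode> forced_mode)
{
    std::vector<std::string> flags;
    if (default_mode) flags.push_back(std::string("--def-load-cache=") + ptx_name(*default_mode));
    if (forced_mode)  flags.push_back(std::string("--force-load-cache=") + ptx_name(*forced_mode));
    return flags;
}
```

Keeping the two modes separate mirrors the distinction drawn by default_ (applies only where an instruction specifies nothing) and forced (overrides even explicit instruction-level choices).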

Public Member Functions
optional< caching_mode_t< memory_operation_t::load > > &	default_load_caching_mode () override
	Get a reference to the caching mode the compiler will be told to use as the default, for load instructions which don't explicitly specify a particular caching mode.

optional< caching_mode_t< memory_operation_t::load > >	default_load_caching_mode () const override
	Get the caching mode the compiler will be told to use as the default, for load instructions which don't explicitly specify a particular caching mode.

::std::vector<::std::string > &	entries ()

::std::vector<::std::string > &	kernels ()

::std::vector<::std::string > &	kernel_names ()

Public Member Functions inherited from cuda::rtc::compilation_options_base_t< ptx >
compilation_options_base_t &	add_target (device::compute_capability_t compute_capability)
	Have the compilation also target a specific compute capability.

compilation_options_base_t &	set_target (device::compute_capability_t compute_capability)
	Have the compilation target only one specific compute capability.

compilation_options_base_t &	set_target (device_t device)
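The difference between add_target() and set_target() can be sketched with a stand-in class. This is a minimal mock, not the library's implementation: target_set_sketch and the integer encoding of compute capabilities (e.g. 75 for 7.5) are assumptions for illustration. As with the signatures above, both methods return a reference to the object so that calls can be chained.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical stand-in for the target-related part of compilation_options_base_t.
// Compute capabilities are encoded as integers, e.g. 75 for 7.5 (an assumption).
class target_set_sketch {
public:
    // Like add_target(): keep the existing targets and also target this one.
    target_set_sketch& add_target(int compute_capability)
    {
        targets_.insert(compute_capability);
        return *this;
    }

    // Like set_target(): discard the existing targets and target only this one.
    target_set_sketch& set_target(int compute_capability)
    {
        targets_.clear();
        return add_target(compute_capability);
    }

    const std::unordered_set<int>& targets() const { return targets_; }

private:
    std::unordered_set<int> targets_; // a set, so adding the same target twice has no effect
};
```

For example, opts.add_target(70).add_target(80) leaves two targets, while a subsequent opts.set_target(86) leaves only 86.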

Public Attributes
bool	parse_without_code_generation { false }
	Makes the PTX compiler run without producing any CUBIN output (for PTX verification only).

bool	allow_expensive_optimizations_below_O2 { false }
	Allow the JIT compiler to perform expensive optimizations using maximum available resources (memory and compile-time).

bool	compile_as_tools_patch { false }
	Compile as patch code for CUDA tools.

bool	compile_extensible_whole_program { false }
	Generate extensible whole-program code: only whole programs are expected to be directly usable, but some calls may remain unresolved until device-side linking is performed (see link_t).

bool	use_fused_multiply_add { true }
	Enable the contraction of multiplications followed by additions (or subtractions) into single fused instructions (FMAD, FFMA, DFMA).

bool	verbose { false }
	Print code generation statistics along with the compilation log.

bool	dont_merge_basicblocks { false }
	Prevent the compiler from merging consecutive basic blocks (https://en.wikipedia.org/wiki/Basic_block) into a single block.

bool	disable_warnings { false }
	The equivalent of suppressing all findings which currently trigger a warning.

bool	disable_optimizer_constants { false }
	Disable use of the "optimizer constant bank" feature.

bool	return_at_end_of_kernel { false }
	Prevents the return instruction at the end of a program (presumably a kernel) from being optimized away, making it possible to set a breakpoint at exactly that point.

bool	preserve_variable_relocations { false }
	Generate relocatable references for variables and preserve relocations generated for them in the linked executable.

struct cuda::rtc::compilation_options_t< ptx >:: { ... }	situation_warnings
	Warnings about situations likely to result in poor performance or other problems.

struct cuda::rtc::compilation_options_t< ptx >:: { ... }	maximum_register_counts
	Limits on the number of registers which generated object code (of different kinds) is allowed to use.

struct cuda::rtc::compilation_options_t< ptx >:: { ... }	caching_modes

::std::vector<::std::string >	mangled_entry_function_names
	Specifies the GPU kernels, or `__global__` functions in CUDA-C++ terms, or `.entry` functions in PTX terms, for which code must be generated.

bool	double_precision_ops { false }

bool	local_memory_use { false }

bool	registers_spill_to_local_memory { false }

bool	indeterminable_stack_size { true }

bool	double_demotion { false }

optional< rtc::ptx_register_count_t >	kernel {}

optional< rtc::ptx_register_count_t >	device_function {}

caching_mode_spec_t	default_ {}
	The caching mode to be used for instructions which don't specify a caching mode.

caching_mode_spec_t	forced {}
	A potential forcing of the caching mode, overriding even what instructions themselves specify.
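The use_fused_multiply_add attribute listed above defaults to true. It matters because a fused instruction rounds once, while a separate multiplication and addition round twice, so results can differ in the last bits. The host-side C++ sketch below illustrates this with std::fma; the function names and operand values are chosen for illustration only. With a = 1 + 2^-27 and b = 1 - 2^-27, the exact product 1 - 2^-54 rounds to 1.0 in double precision, so a*b - 1 evaluates to 0 with two roundings but to -2^-54 with one.

```cpp
#include <cassert>
#include <cmath>

// Computes a*b + c with the product rounded to double before the addition,
// i.e. two roundings, as when contraction is disabled.
double separate_multiply_add(double a, double b, double c)
{
    volatile double product = a * b; // volatile keeps the compiler from fusing the two operations
    return product + c;
}

// Computes a*b + c with a single rounding, as a fused instruction (e.g. DFMA) would.
double fused_multiply_add(double a, double b, double c)
{
    return std::fma(a, b, c);
}
```

Neither result is wrong; the fused one is closer to the exact value, but code that depends on bit-identical results across builds may prefer to disable contraction.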

Public Attributes inherited from cuda::rtc::compilation_options_base_t< ptx >
::std::unordered_set< cuda::device::compute_capability_t >	targets_
	Target devices in terms of CUDA compute capability.

Public Attributes inherited from cuda::rtc::common_ptx_compilation_options_t
optional< ptx_register_count_t >	max_num_registers_per_thread {}
	Limit the number of registers which a kernel thread may use.

optional< grid::block_dimension_t >	min_num_threads_per_block {}
	The minimum number of threads per block which the compiler should target.

optional< optimization_level_t >	optimization_level {}
	Compilation optimization level (as in -O1, -O2, etc.).

optional< device::compute_capability_t >	specific_target
	Which NVIDIA physical architecture to generate SASS code for.

bool	generate_source_line_info {false}
	Generate indications of which PTX/SASS instructions correspond to which lines of the source code, within the compiled output.

bool	generate_debug_info {false}
	Generate debugging information associating SASS instructions to locations in the source, embedding it within the compilation output (-g).

optional< caching_mode_t< memory_operation_t::load > >	default_load_caching_mode_
	Which of the memory-load-instruction caching modes (see caching_mode_t) to use by default, when no caching mode is specified in a PTX instruction.

bool	generate_relocatable_device_code { false }
	Generate relocatable code that can be linked with other relocatable device code.
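To illustrate how several of the common options above might reach the compiler, here is a hypothetical rendering into command-line flags modeled on ptxas options (--maxrregcount, --opt-level, --gpu-name, --generate-line-info, --device-debug). The struct, the function and the integer encoding of the target are assumptions, not the library's code; min_num_threads_per_block, default_load_caching_mode_ and generate_relocatable_device_code are left out.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical subset of the common PTX compilation options listed above.
struct common_ptx_options_sketch {
    std::optional<int> max_num_registers_per_thread;
    std::optional<int> optimization_level;
    std::optional<int> specific_target;       // e.g. 80 for compute capability 8.0 (assumed encoding)
    bool generate_source_line_info = false;
    bool generate_debug_info = false;
};

// Renders the options as ptxas-style flags; unset options are omitted so that
// the compiler's own defaults apply.
std::vector<std::string> render_flags(const common_ptx_options_sketch& opts)
{
    std::vector<std::string> flags;
    if (opts.max_num_registers_per_thread)
        flags.push_back("--maxrregcount=" + std::to_string(*opts.max_num_registers_per_thread));
    if (opts.optimization_level)
        flags.push_back("--opt-level=" + std::to_string(*opts.optimization_level));
    if (opts.specific_target)
        flags.push_back("--gpu-name=sm_" + std::to_string(*opts.specific_target));
    if (opts.generate_source_line_info) flags.push_back("--generate-line-info");
    if (opts.generate_debug_info)       flags.push_back("--device-debug");
    return flags;
}
```

Using optional for the numeric options lets "not specified" be distinguished from any particular value, which is why no flag is emitted for them by default.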

Additional Inherited Members
Public Types inherited from cuda::rtc::compilation_options_base_t< ptx >
using	optional = cuda::optional< T >

Detailed Description

template<>
class cuda::rtc::compilation_options_t< ptx >

Options for JIT-compilation of CUDA PTX code.

Member Function Documentation

◆ default_load_caching_mode() [1/2]

optional<caching_mode_t<memory_operation_t::load> >& cuda::rtc::compilation_options_t< ptx >::default_load_caching_mode ( )

inline override virtual

Get a reference to the caching mode the compiler will be told to use as the default, for load instructions which don't explicitly specify a particular caching mode.

Reimplemented from cuda::rtc::common_ptx_compilation_options_t.

◆ default_load_caching_mode() [2/2]

optional<caching_mode_t<memory_operation_t::load> > cuda::rtc::compilation_options_t< ptx >::default_load_caching_mode ( ) const

inline override virtual

Get the caching mode the compiler will be told to use as the default, for load instructions which don't explicitly specify a particular caching mode.

Reimplemented from cuda::rtc::common_ptx_compilation_options_t.
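The two overloads follow a common C++ pattern: a non-const virtual accessor returning a reference, through which the value can be modified, and a const one returning a copy, with both overridden in the derived class. The sketch below shows the pattern with hypothetical class names and an int standing in for the caching mode; where the real derived class stores its value is not documented on this page.

```cpp
#include <cassert>
#include <optional>

// Hypothetical base class, standing in for common_ptx_compilation_options_t.
struct base_options_sketch {
    virtual ~base_options_sketch() = default;
    virtual std::optional<int>& default_load_caching_mode() { return default_load_caching_mode_; }
    virtual std::optional<int> default_load_caching_mode() const { return default_load_caching_mode_; }
protected:
    std::optional<int> default_load_caching_mode_;
};

// Hypothetical derived class that stores the value elsewhere and redirects both
// accessors to it; overriding both overloads avoids hiding either one.
struct derived_options_sketch : base_options_sketch {
    std::optional<int>& default_load_caching_mode() override { return default_load_mode; }
    std::optional<int> default_load_caching_mode() const override { return default_load_mode; }
    std::optional<int> default_load_mode;
};
```

Code holding only a base reference can then set the mode with base.default_load_caching_mode() = value, and the write lands wherever the derived class keeps it.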

Member Data Documentation

◆ allow_expensive_optimizations_below_O2

bool cuda::rtc::compilation_options_t< ptx >::allow_expensive_optimizations_below_O2 { false }

Allow the JIT compiler to perform expensive optimizations using maximum available resources (memory and compile-time).
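The ptxas documentation describes its --allow-expensive-optimizations option as enabled by default at optimization level 2 and above, which would explain this member's name: it should only make a difference below -O2. The hypothetical helper below decides whether such a flag needs to be passed at all. Treating an unset level as the ptxas default of 3 is an assumption, as is the helper itself.

```cpp
#include <cassert>
#include <optional>

// Assumption: ptxas uses -O3 when no optimization level is given.
constexpr int assumed_default_optimization_level = 3;

// Expensive optimizations are assumed to be on by default at -O2 and above,
// so explicitly allowing them only changes anything below that level.
bool need_expensive_optimizations_flag(bool allow_expensive_optimizations_below_O2,
                                       std::optional<int> optimization_level)
{
    const int level = optimization_level.value_or(assumed_default_optimization_level);
    return allow_expensive_optimizations_below_O2 && level < 2;
}
```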

◆ compile_as_tools_patch

bool cuda::rtc::compilation_options_t< ptx >::compile_as_tools_patch { false }

Compile as patch code for CUDA tools.

Note:

Must not be used together with parse_without_code_generation or compile_extensible_whole_program.
Some PTX ISA features may not be usable in this compilation mode.
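The restriction above can be checked before compilation is attempted. A minimal sketch, assuming a hypothetical struct holding just the three boolean options involved and a hypothetical check function that reports the conflict as a message; the library itself may enforce this differently or not at all.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of the boolean options involved in the restriction.
struct tools_patch_options_sketch {
    bool compile_as_tools_patch = false;
    bool parse_without_code_generation = false;
    bool compile_extensible_whole_program = false;
};

// Returns an empty string if the combination is allowed, otherwise a description of the conflict.
std::string check_tools_patch_conflicts(const tools_patch_options_sketch& opts)
{
    if (!opts.compile_as_tools_patch) return {};
    if (opts.parse_without_code_generation)
        return "compile_as_tools_patch cannot be combined with parse_without_code_generation";
    if (opts.compile_extensible_whole_program)
        return "compile_as_tools_patch cannot be combined with compile_extensible_whole_program";
    return {};
}
```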

◆ dont_merge_basicblocks

bool cuda::rtc::compilation_options_t< ptx >::dont_merge_basicblocks { false }

Prevent the compiler from merging consecutive basic blocks (https://en.wikipedia.org/wiki/Basic_block) into a single block.

Normally, the compiler attempts to merge consecutive "basic blocks" as part of its optimization process. However, for code meant to be debugged, this is very confusing. Setting this option prevents such merging, at a slight performance cost.
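To make concrete what this option disables, the sketch below applies a simplified, textbook form of the merge to a toy control-flow graph: a block is absorbed into its predecessor when it is that predecessor's only successor and has no other predecessor. This is not the PTX compiler's actual pass; the graph representation and count_blocks_after_merging are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// successors[i] lists the blocks that block i may jump or fall through to; block 0 is the entry.
// Returns how many blocks remain after merging straight-line chains of blocks.
std::size_t count_blocks_after_merging(std::vector<std::vector<int>> successors,
                                       bool dont_merge_basicblocks)
{
    if (dont_merge_basicblocks) return successors.size();

    const std::size_t n = successors.size();
    std::vector<int> predecessor_count(n, 0);
    for (const auto& targets : successors)
        for (int t : targets) ++predecessor_count[t];

    std::vector<bool> absorbed(n, false);
    for (std::size_t a = 0; a < n; ++a) {
        if (absorbed[a]) continue;
        // Keep absorbing a's unique successor b while b is reachable only from a.
        while (successors[a].size() == 1) {
            const int b = successors[a][0];
            if (b == static_cast<int>(a) || b == 0 || absorbed[b] || predecessor_count[b] != 1) break;
            successors[a] = successors[b]; // the merged block inherits b's outgoing edges
            absorbed[b] = true;
        }
    }
    return static_cast<std::size_t>(std::count(absorbed.begin(), absorbed.end(), false));
}
```

A straight-line chain 0 -> 1 -> 2 collapses into a single block, which is exactly the situation in which source-level stepping becomes hard to follow; a diamond, where the join block has two predecessors, is left alone.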

◆ mangled_entry_function_names

::std::vector<::std::string> cuda::rtc::compilation_options_t< ptx >::mangled_entry_function_names

Specifies the GPU kernels, or __global__ functions in CUDA-C++ terms, or .entry functions in PTX terms, for which code must be generated.

Note: The PTX source may contain code for additional .entry functions.
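Because the names must be mangled, a kernel declared in CUDA C++ as __global__ void scale(float*) would, under the Itanium C++ ABI used on Linux, appear in the PTX as _Z5scalePf. The sketch below turns the list into one flag per name, modeled on ptxas's --entry option; the function is hypothetical, and the reading of an empty list as "no restriction" is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical: one "--entry=<name>" flag per requested kernel. An empty list
// yields no flags, assumed to mean that no restriction is requested.
std::vector<std::string> entry_flags(const std::vector<std::string>& mangled_entry_function_names)
{
    std::vector<std::string> flags;
    flags.reserve(mangled_entry_function_names.size());
    for (const auto& name : mangled_entry_function_names)
        flags.push_back("--entry=" + name);
    return flags;
}
```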

◆ preserve_variable_relocations

bool cuda::rtc::compilation_options_t< ptx >::preserve_variable_relocations { false }

Generate relocatable references for variables and preserve relocations generated for them in the linked executable.

◆ situation_warnings

struct { ... } cuda::rtc::compilation_options_t< ptx >::situation_warnings

Warnings about situations likely to result in poor performance or other problems.
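The fields of situation_warnings are listed under Public Attributes together with their defaults. The hypothetical sketch below maps four of them to flags modeled on ptxas warning options. It assumes that the stack-size warning is issued by default, so a flag is needed only to suppress it, which would match indeterminable_stack_size defaulting to true. double_demotion is omitted, since its mapping is not stated here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of part of the situation_warnings struct, with the documented defaults.
struct situation_warnings_sketch {
    bool double_precision_ops = false;
    bool local_memory_use = false;
    bool registers_spill_to_local_memory = false;
    bool indeterminable_stack_size = true;
};

// Maps each field to a ptxas-style flag; with all defaults, no flags are produced.
std::vector<std::string> warning_flags(const situation_warnings_sketch& w)
{
    std::vector<std::string> flags;
    if (w.double_precision_ops)            flags.push_back("--warn-on-double-precision-use");
    if (w.local_memory_use)                flags.push_back("--warn-on-local-memory-usage");
    if (w.registers_spill_to_local_memory) flags.push_back("--warn-on-spills");
    if (!w.indeterminable_stack_size)      flags.push_back("--suppress-stack-size-warning");
    return flags;
}
```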

The documentation for this class was generated from the following file:

src/cuda/rtc/compilation_options.hpp
