cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
cuda::rtc::compilation_options_t< ptx > Class Template Reference final

Options for JIT-compilation of CUDA PTX code.

#include <compilation_options.hpp>


Classes

struct  caching_mode_spec_t
 Options for fully-specifying a caching mode.
 

Public Member Functions

optional< caching_mode_t< memory_operation_t::load > > & default_load_caching_mode () override
 Get a reference to the caching mode the compiler will be told to use as the default, for load instructions which don't explicitly specify a particular caching mode.
 
optional< caching_mode_t< memory_operation_t::load > > default_load_caching_mode () const override
 Get the caching mode the compiler will be told to use as the default, for load instructions which don't explicitly specify a particular caching mode.
 
::std::vector<::std::string > & entries ()
 
::std::vector<::std::string > & kernels ()
 
::std::vector<::std::string > & kernel_names ()
 
- Public Member Functions inherited from cuda::rtc::compilation_options_base_t< ptx >
compilation_options_base_t & add_target (device::compute_capability_t compute_capability)
 Have the compilation also target a specific compute capability.
 
compilation_options_base_t & set_target (device::compute_capability_t compute_capability)
 Have the compilation target one specific compute capability.
 
compilation_options_base_t & set_target (device_t device)
 

Public Attributes

bool parse_without_code_generation { false }
 Makes the PTX compiler run without producing any CUBIN output (for PTX verification only)
 
bool allow_expensive_optimizations_below_O2 { false }
 Allow the JIT compiler to perform expensive optimizations using maximum available resources (memory and compile-time).
 
bool compile_as_tools_patch { false }
 Compile as patch code for CUDA tools.
 
bool compile_extensible_whole_program { false }
 Expecting only whole-programs to be directly usable, allow some calls to not be resolved until device-side linking is performed (see link_t).
 
bool use_fused_multiply_add { true }
 Enable the contraction of multiplications-followed-by-additions (or subtractions) into single fused instructions (FMAD, FFMA, DFMA)
 
bool verbose { false }
 Print code generation statistics along with the compilation log.
 
bool dont_merge_basicblocks { false }
 Prevent the compiler from merging consecutive basic blocks (https://en.wikipedia.org/wiki/Basic_block) into a single block.
 
bool disable_warnings { false }
 The equivalent of suppressing all findings which currently trigger a warning.
 
bool disable_optimizer_constants { false }
 Disable use of the "optimizer constant bank" feature.
 
bool return_at_end_of_kernel { false }
 Prevents the optimizing away of the return instruction at the end of a program (a kernel?), making it possible to set a breakpoint just at that point.
 
bool preserve_variable_relocations { false }
 Generate relocatable references for variables and preserve relocations generated for them in the linked executable.
 
struct cuda::rtc::compilation_options_t< ptx >:: { ... }  situation_warnings
 Warnings about situations likely to result in poor performance or other problems.
 
struct cuda::rtc::compilation_options_t< ptx >:: { ... }  maximum_register_counts
 Limits on the number of registers which generated object code (of different kinds) is allowed to use.
 
struct cuda::rtc::compilation_options_t< ptx >:: { ... }  caching_modes
 
::std::vector<::std::string > mangled_entry_function_names
 Specifies the GPU kernels, or __global__ functions in CUDA-C++ terms, or .entry functions in PTX terms, for which code must be generated.
 
bool double_precision_ops { false }
 
bool local_memory_use { false }
 
bool registers_spill_to_local_memory { false }
 
bool indeterminable_stack_size { true }
 
bool double_demotion { false }
 
optional< rtc::ptx_register_count_t > kernel {}
 
optional< rtc::ptx_register_count_t > device_function {}
 
caching_mode_spec_t default_ {}
 The caching mode to be used for instructions which don't specify a caching mode.
 
caching_mode_spec_t forced {}
 A potential forcing of the caching mode, overriding even what instructions themselves specify.
 
- Public Attributes inherited from cuda::rtc::compilation_options_base_t< ptx >
::std::unordered_set< cuda::device::compute_capability_t > targets_
 Target devices in terms of CUDA compute capability.
 
- Public Attributes inherited from cuda::rtc::common_ptx_compilation_options_t
optional< ptx_register_count_t > max_num_registers_per_thread {}
 Limit the number of registers which a kernel thread may use.
 
optional< grid::block_dimension_t > min_num_threads_per_block {}
 The minimum number of threads per block which the compiler should target.
 
optional< optimization_level_t > optimization_level {}
 Compilation optimization level (as in -O1, -O2 etc.)
 
optional< device::compute_capability_t > specific_target
 Which NVIDIA physical architecture to generate SASS code for.
 
bool generate_source_line_info {false}
 Generate indications of which PTX/SASS instructions correspond to which lines of the source code, within the compiled output.
 
bool generate_debug_info {false}
 Generate debugging information associating SASS instructions to locations in the source, embedding it within the compilation output (-g)
 
optional< caching_mode_t< memory_operation_t::load > > default_load_caching_mode_
 Which of the memory-load-instruction caching modes (see caching_mode_t) to use by default, when no caching mode is specified in a PTX instruction.
 
bool generate_relocatable_device_code { false }
 Generate relocatable code that can be linked with other relocatable device code.
 

Additional Inherited Members

- Public Types inherited from cuda::rtc::compilation_options_base_t< ptx >
using optional = cuda::optional< T >
 

Detailed Description

template<>
class cuda::rtc::compilation_options_t< ptx >

Options for JIT-compilation of CUDA PTX code.

Member Function Documentation

◆ default_load_caching_mode() [1/2]

optional<caching_mode_t<memory_operation_t::load> >& cuda::rtc::compilation_options_t< ptx >::default_load_caching_mode ( )
inline override virtual

Get a reference to the caching mode the compiler will be told to use as the default, for load instructions which don't explicitly specify a particular caching mode.

Reimplemented from cuda::rtc::common_ptx_compilation_options_t.

◆ default_load_caching_mode() [2/2]

optional<caching_mode_t<memory_operation_t::load> > cuda::rtc::compilation_options_t< ptx >::default_load_caching_mode ( ) const
inline override virtual

Get the caching mode the compiler will be told to use as the default, for load instructions which don't explicitly specify a particular caching mode.

Reimplemented from cuda::rtc::common_ptx_compilation_options_t.
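
The difference between the two overloads can be illustrated with a minimal, self-contained sketch. It does not use the library itself: options_sketch and load_caching_mode are hypothetical stand-ins, std::optional stands in for cuda::optional, and the enumerator names are illustrative only. The point is only that the non-const overload hands out a reference that can be assigned through, while the const overload returns a copy.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for caching_mode_t<memory_operation_t::load>;
// the enumerator names are illustrative only.
enum class load_caching_mode { ca, cg, cs };

// Hypothetical stand-in for the options class, keeping only the accessor pair.
class options_sketch {
public:
    // Non-const overload: returns a reference, so callers can assign through it.
    std::optional<load_caching_mode>& default_load_caching_mode() { return mode_; }

    // Const overload: returns a copy, so read-only access cannot change the options.
    std::optional<load_caching_mode> default_load_caching_mode() const { return mode_; }

private:
    std::optional<load_caching_mode> mode_ {};
};

// Sets the default through the reference overload, then reads it back
// through the const overload.
inline bool set_and_read_back()
{
    options_sketch opts;
    opts.default_load_caching_mode() = load_caching_mode::cg;
    const options_sketch& read_only = opts;
    auto current = read_only.default_load_caching_mode();
    return current.has_value() && *current == load_caching_mode::cg;
}
```

An unset optional here corresponds to not telling the compiler about any default caching mode at all.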

Member Data Documentation

◆ allow_expensive_optimizations_below_O2

bool cuda::rtc::compilation_options_t< ptx >::allow_expensive_optimizations_below_O2 { false }

Allow the JIT compiler to perform expensive optimizations using maximum available resources (memory and compile-time).

◆ compile_as_tools_patch

bool cuda::rtc::compilation_options_t< ptx >::compile_as_tools_patch { false }

Compile as patch code for CUDA tools.

Note
  1. Must not be used in conjunction with parse_without_code_generation or compile_extensible_whole_program.
  2. Some PTX ISA features may not be usable in this compilation mode.
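
The restriction in note 1 could be checked before compilation with something like the following sketch. Both ptx_flags_sketch and is_valid_combination are hypothetical and not part of the library; the struct merely mirrors the three flags involved, with the defaults documented on this page.

```cpp
#include <cassert>

// Hypothetical stand-in holding only the three flags named in the note.
struct ptx_flags_sketch {
    bool parse_without_code_generation { false };
    bool compile_as_tools_patch { false };
    bool compile_extensible_whole_program { false };
};

// Hypothetical helper (not part of the library): returns true when the
// flag combination respects the restriction on compile_as_tools_patch.
inline bool is_valid_combination(const ptx_flags_sketch& f)
{
    return !f.compile_as_tools_patch
        || !(f.parse_without_code_generation || f.compile_extensible_whole_program);
}
```

Whether the library or the underlying compiler reports such a conflict on its own is not stated here, so a check of this kind is only a precaution.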

◆ dont_merge_basicblocks

bool cuda::rtc::compilation_options_t< ptx >::dont_merge_basicblocks { false }

Prevent the compiler from merging consecutive basic blocks (https://en.wikipedia.org/wiki/Basic_block) into a single block.

Normally, the compiler attempts to merge consecutive "basic blocks" as part of its optimization process. However, in code meant to be debugged, merged blocks make the control flow confusing to follow.

◆ mangled_entry_function_names

::std::vector<::std::string> cuda::rtc::compilation_options_t< ptx >::mangled_entry_function_names

Specifies the GPU kernels, or __global__ functions in CUDA-C++ terms, or .entry functions in PTX terms, for which code must be generated.

Note
The PTX source may contain code for additional .entry functions.
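
As a rough illustration of how such a list of names might be consumed, the sketch below joins mangled names into one comma-separated value, the form in which the ptxas --entry option takes its argument. The helper join_entry_names is hypothetical, and whether the wrappers render this option in exactly this way is an assumption. The two names used in the usage note are Itanium-ABI manglings of foo(float*) and bar(float*, int).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the library): joins mangled entry-function
// names into a single comma-separated string; an empty list yields "".
inline std::string join_entry_names(const std::vector<std::string>& names)
{
    std::string joined;
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (i > 0) { joined += ','; } // separator only between names
        joined += names[i];
    }
    return joined;
}
```

For example, a vector holding "_Z3fooPf" and "_Z3barPfi" is joined into "_Z3fooPf,_Z3barPfi".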

◆ preserve_variable_relocations

bool cuda::rtc::compilation_options_t< ptx >::preserve_variable_relocations { false }

Generate relocatable references for variables and preserve relocations generated for them in the linked executable.

◆ situation_warnings

struct { ... } cuda::rtc::compilation_options_t< ptx >::situation_warnings

Warnings about situations likely to result in poor performance or other problems.

