cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
Options for JIT-compilation of CUDA PTX code.
#include <compilation_options.hpp>
Classes
struct | caching_mode_spec_t
Options for fully specifying a caching mode.
Public Member Functions
optional< caching_mode_t< memory_operation_t::load > > & | default_load_caching_mode () override
Get a reference to the caching mode the compiler will be told to use as the default, for load instructions which don't explicitly specify a particular caching mode.
optional< caching_mode_t< memory_operation_t::load > > | default_load_caching_mode () const override
Get the caching mode the compiler will be told to use as the default, for load instructions which don't explicitly specify a particular caching mode.
::std::vector<::std::string > & | entries ()
::std::vector<::std::string > & | kernels ()
::std::vector<::std::string > & | kernel_names ()
compilation_options_base_t & | add_target (device::compute_capability_t compute_capability)
Have the compilation also target a specific compute capability.
compilation_options_base_t & | set_target (device::compute_capability_t compute_capability)
Have the compilation target only one specific compute capability.
compilation_options_base_t & | set_target (device_t device)
Public Attributes
bool | parse_without_code_generation { false }
Makes the PTX compiler run without producing any CUBIN output (for PTX verification only).
bool | allow_expensive_optimizations_below_O2 { false }
Allow the JIT compiler to perform expensive optimizations using maximum available resources (memory and compile-time).
bool | compile_as_tools_patch { false }
Compile as patch code for CUDA tools.
bool | compile_extensible_whole_program { false }
Compile as a whole program, but allow some calls to remain unresolved until device-side linking is performed (see link_t).
bool | use_fused_multiply_add { true }
Enable the contraction of multiplications followed by additions (or subtractions) into single fused instructions (FMAD, FFMA, DFMA).
bool | verbose { false }
Print code generation statistics along with the compilation log.
bool | dont_merge_basicblocks { false }
Prevent the compiler from merging consecutive basic blocks (https://en.wikipedia.org/wiki/Basic_block) into a single block.
bool | disable_warnings { false }
Suppress all findings which would otherwise trigger a warning.
bool | disable_optimizer_constants { false }
Disable use of the "optimizer constant bank" feature.
bool | return_at_end_of_kernel { false }
Prevent the return instruction at the end of a program (presumably a kernel) from being optimized away, making it possible to set a breakpoint at that point.
bool | preserve_variable_relocations { false }
Generate relocatable references for variables and preserve relocations generated for them in the linked executable.
struct { ... } | situation_warnings
Warnings about situations likely to result in poor performance or other problems.
struct { ... } | maximum_register_counts
Limits on the number of registers which generated object code (of different kinds) is allowed to use.
struct { ... } | caching_modes
::std::vector<::std::string > | mangled_entry_function_names
Specifies the GPU kernels, or __global__ functions in CUDA-C++ terms, or .entry functions in PTX terms, for which code must be generated.
bool | double_precision_ops { false }
bool | local_memory_use { false }
bool | registers_spill_to_local_memory { false }
bool | indeterminable_stack_size { true }
bool | double_demotion { false }
optional< rtc::ptx_register_count_t > | kernel {}
optional< rtc::ptx_register_count_t > | device_function {}
caching_mode_spec_t | default_ {}
The caching mode to be used for instructions which don't specify a caching mode.
caching_mode_spec_t | forced {}
An optional forced caching mode, overriding even the modes which instructions themselves specify.
::std::unordered_set< cuda::device::compute_capability_t > | targets_
Target devices in terms of CUDA compute capability.
optional< ptx_register_count_t > | max_num_registers_per_thread {}
Limit the number of registers which a kernel thread may use.
optional< grid::block_dimension_t > | min_num_threads_per_block {}
The minimum number of threads per block which the compiler should target.
optional< optimization_level_t > | optimization_level {}
Compilation optimization level (as in -O1, -O2, etc.).
optional< device::compute_capability_t > | specific_target
Which NVIDIA physical architecture to generate SASS code for.
bool | generate_source_line_info {false}
Generate indications of which PTX/SASS instructions correspond to which lines of the source code, within the compiled output.
bool | generate_debug_info {false}
Generate debugging information associating SASS instructions with locations in the source, embedding it within the compilation output (-g).
optional< caching_mode_t< memory_operation_t::load > > | default_load_caching_mode_
Which of the memory-load-instruction caching modes (see caching_mode_t) to use by default, when no caching mode is specified in a PTX instruction.
bool | generate_relocatable_device_code { false }
Generate relocatable code that can be linked with other relocatable device code.
Additional Inherited Members
using | optional = cuda::optional< T >
Options for JIT-compilation of CUDA PTX code.
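The summary lists two ways of choosing targets: add_target() keeps the existing targets and adds one, while set_target() makes the compilation target only one compute capability. The following is a minimal, hypothetical model of that bookkeeping, not the library's code. It assumes targets_ behaves like a set of compute capabilities and that set_target() discards earlier entries; compute_capability, compute_capability_hash, sm_name and target_set_model are illustrative names, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for device::compute_capability_t: a major.minor pair.
struct compute_capability {
    unsigned major_version;
    unsigned minor_version;
    bool operator==(const compute_capability& other) const {
        return major_version == other.major_version && minor_version == other.minor_version;
    }
};

struct compute_capability_hash {
    std::size_t operator()(const compute_capability& cc) const {
        return std::hash<unsigned>{}(cc.major_version * 10 + cc.minor_version);
    }
};

// Name of the physical architecture for a compute capability, e.g. 8.6 -> "sm_86".
inline std::string sm_name(compute_capability cc) {
    assert(cc.minor_version < 10);  // only single-digit minor versions are handled
    return "sm_" + std::to_string(cc.major_version * 10 + cc.minor_version);
}

// Hypothetical model of the targets_ member and the two setters (not the real class).
class target_set_model {
public:
    // Like add_target(): keep existing targets, add one more.
    target_set_model& add_target(compute_capability cc) {
        targets_.insert(cc);
        return *this;
    }
    // Like set_target(): drop previous targets, keep only this one.
    target_set_model& set_target(compute_capability cc) {
        targets_.clear();
        return add_target(cc);
    }
    const std::unordered_set<compute_capability, compute_capability_hash>& targets() const {
        return targets_;
    }
private:
    std::unordered_set<compute_capability, compute_capability_hash> targets_;
};
```

Calls can be chained, e.g. `m.add_target({7, 5}).add_target({8, 6})`, because each setter returns a reference, mirroring the `compilation_options_base_t &` return type in the summary.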
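optional< caching_mode_t< memory_operation_t::load > > & cuda::rtc::compilation_options_t< ptx >::default_load_caching_mode () [inline], [override], [virtual]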
Get a reference to the caching mode the compiler will be told to use as the default, for load instructions which don't explicitly specify a particular caching mode.
Reimplemented from cuda::rtc::common_ptx_compilation_options_t.
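optional< caching_mode_t< memory_operation_t::load > > cuda::rtc::compilation_options_t< ptx >::default_load_caching_mode () const [inline], [override], [virtual]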
Get the caching mode the compiler will be told to use as the default, for load instructions which don't explicitly specify a particular caching mode.
Reimplemented from cuda::rtc::common_ptx_compilation_options_t.
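A default load caching mode only applies to loads that specify none, and the caching_modes member also describes a forced mode that overrides even what instructions themselves specify. The sketch below spells out that precedence for a single load instruction under stated assumptions: forced beats the instruction's own mode, which beats the default, and an empty result leaves the compiler's own behavior in place. The enum load_caching_mode (named after the PTX ld cache operators .ca, .cg, .cs, .lu, .cv) and the function effective_load_caching_mode are hypothetical, not the library's caching_mode_t API.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for caching_mode_t<memory_operation_t::load>;
// enumerators follow the PTX ld cache operators.
enum class load_caching_mode { ca, cg, cs, lu, cv };

// Resolve the caching mode of one load instruction, assuming this precedence:
//   1. a forced mode overrides everything, including the instruction's own mode;
//   2. otherwise the mode written on the instruction is kept;
//   3. otherwise the default mode applies;
//   4. std::nullopt means the compiler's built-in behavior is left untouched.
inline std::optional<load_caching_mode> effective_load_caching_mode(
    std::optional<load_caching_mode> instruction_mode,
    std::optional<load_caching_mode> default_mode,
    std::optional<load_caching_mode> forced_mode)
{
    if (forced_mode) { return forced_mode; }
    if (instruction_mode) { return instruction_mode; }
    return default_mode;
}
```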
bool cuda::rtc::compilation_options_t< ptx >::allow_expensive_optimizations_below_O2 { false }
Allow the JIT compiler to perform expensive optimizations using maximum available resources (memory and compile-time).
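The member's name suggests that expensive optimizations are already on at -O2 and above, and that this flag extends them to lower levels. The sketch below encodes that reading; it is an inference from the name, not documented behavior. It additionally assumes that an unset optimization_level means -O3 (the ptxas default). The alias optimization_level, the constant assumed_default_level and the function expensive_optimizations_enabled are hypothetical.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for optimization_level_t: 0..3, as in -O0..-O3.
using optimization_level = unsigned;

// Assumption: an unspecified level behaves like -O3.
constexpr optimization_level assumed_default_level = 3;

// Assumption (inferred from the member's name only): expensive optimizations
// are enabled at -O2 and above; the flag enables them at lower levels too.
inline bool expensive_optimizations_enabled(
    std::optional<optimization_level> level,
    bool allow_expensive_optimizations_below_O2)
{
    const optimization_level effective = level.value_or(assumed_default_level);
    return effective >= 2 || allow_expensive_optimizations_below_O2;
}
```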
bool cuda::rtc::compilation_options_t< ptx >::compile_as_tools_patch { false }
Compile as patch code for CUDA tools.
bool cuda::rtc::compilation_options_t< ptx >::dont_merge_basicblocks { false }
Prevent the compiler from merging consecutive basic blocks (https://en.wikipedia.org/wiki/Basic_block) into a single block.
Normally, the compiler attempts to merge consecutive "basic blocks" as part of its optimization process. However, this makes debugging the resulting code very confusing.
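Since block merging mainly gets in the way of debugging, this option tends to be combined with the other debugging-related members listed in the summary: generate_debug_info (-g) and return_at_end_of_kernel (keeps a final return to break on). The sketch below mirrors only those members, with their documented names and defaults, in a hypothetical struct ptx_debug_options_subset, and builds a debugging-oriented configuration in a hypothetical helper make_debug_friendly_options. Lowering the optimization level to 0 is an extra assumption, not something this page prescribes.

```cpp
#include <cassert>
#include <optional>

// Hypothetical mirror of a few documented members (same names and defaults);
// the real options class has many more.
struct ptx_debug_options_subset {
    bool generate_debug_info { false };
    bool dont_merge_basicblocks { false };
    bool return_at_end_of_kernel { false };
    std::optional<unsigned> optimization_level {};
};

// Hypothetical helper producing a configuration geared toward debugging.
inline ptx_debug_options_subset make_debug_friendly_options() {
    ptx_debug_options_subset opts;
    opts.generate_debug_info = true;      // embed SASS-to-source mapping (-g)
    opts.dont_merge_basicblocks = true;   // keep basic blocks separate, easier to step through
    opts.return_at_end_of_kernel = true;  // keep the final return so a breakpoint can be set on it
    opts.optimization_level = 0u;         // assumption: minimal optimization further eases debugging
    return opts;
}
```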
::std::vector<::std::string> cuda::rtc::compilation_options_t< ptx >::mangled_entry_function_names
Specifies the GPU kernels, or __global__ functions in CUDA-C++ terms, or .entry functions in PTX terms, for which code must be generated.
… .entry functions.
bool cuda::rtc::compilation_options_t< ptx >::preserve_variable_relocations { false }
Generate relocatable references for variables and preserve relocations generated for them in the linked executable.
struct { ... } cuda::rtc::compilation_options_t< ptx >::situation_warnings
Warnings about situations likely to result in poor performance or other problems.
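The summary lists double_precision_ops, local_memory_use, registers_spill_to_local_memory, indeterminable_stack_size and double_demotion without saying which struct they belong to; by name and order they appear to be the fields of situation_warnings, accessed as e.g. `situation_warnings.registers_spill_to_local_memory`. That grouping is an assumption. The sketch mirrors it in a hypothetical options_warning_model (same member names and defaults) and shows a hypothetical make_performance_audit_options that turns on the warnings most tied to performance.

```cpp
#include <cassert>

// Hypothetical mirror of the situation_warnings group; names and defaults
// are copied from the summary, the grouping itself is an assumption.
struct options_warning_model {
    struct {
        bool double_precision_ops { false };
        bool local_memory_use { false };
        bool registers_spill_to_local_memory { false };
        bool indeterminable_stack_size { true };  // the only warning on by default
        bool double_demotion { false };
    } situation_warnings;
};

// Hypothetical preset: warn about the situations most likely to cost performance.
inline options_warning_model make_performance_audit_options() {
    options_warning_model opts;
    opts.situation_warnings.double_precision_ops = true;             // FP64 throughput is low on many GPUs
    opts.situation_warnings.local_memory_use = true;                 // local memory resides in device memory
    opts.situation_warnings.registers_spill_to_local_memory = true;  // spills add local-memory traffic
    return opts;
}
```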