cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
Options for JIT-compilation of CUDA C++ code. More...
#include <compilation_options.hpp>


Public Types | |
| using | parent = compilation_options_base_t< cuda_cpp > |
Public Types inherited from cuda::rtc::compilation_options_base_t< cuda_cpp > | |
| using | optional = cuda::optional< T > |
Public Member Functions | |
| compilation_options_t & | clear_language_dialect () |
| Let the compiler interpret the program source code using its default assumption for the C++ language dialect. | |
| compilation_options_t & | set_language_dialect (cpp_dialect_t dialect) |
| Set which dialect of the C++ language the compiler will try to interpret the program source code as. More... | |
| compilation_options_t & | set_language_dialect (const char *dialect_name) |
| Set which dialect of the C++ language the compiler will try to interpret the program source code as. More... | |
| compilation_options_t & | set_language_dialect (const ::std::string &dialect_name) |
| Set which dialect of the C++ language the compiler will try to interpret the program source code as. More... | |
| compilation_options_t & | suppress_error (error::number_t error_number) |
| Ignore compiler findings of the specified number (rather than warning about them or raising an error) | |
| compilation_options_t & | treat_as_error (error::number_t error_number) |
| Treat compiler findings of the specified number as an error (rather than suppressing them or just warning about them) | |
| compilation_options_t & | warn_about (error::number_t error_number) |
| Treat compiler findings of the specified number as warnings (rather than raising an error or ignoring them) | |
Public Member Functions inherited from cuda::rtc::compilation_options_base_t< cuda_cpp > | |
| compilation_options_base_t & | add_target (device::compute_capability_t compute_capability) |
| Have the compilation also target a specific compute capability. More... | |
| compilation_options_base_t & | set_target (device::compute_capability_t compute_capability) |
| Have the compilation target one specific compute capability. More... | |
| compilation_options_base_t & | set_target (device_t device) |
Public Member Functions inherited from cuda::rtc::common_ptx_compilation_options_t | |
| virtual optional< caching_mode_t< memory_operation_t::load > > & | default_load_caching_mode () |
| see default_load_caching_mode_ | |
| virtual optional< caching_mode_t< memory_operation_t::load > > | default_load_caching_mode () const |
Public Attributes | |
| bool | compile_extensible_whole_program { false } |
| Do extensible whole program compilation of device code. More... | |
| bool | optimize_device_code_in_debug_mode { false } |
| If debug mode is enabled, perform limited optimizations of device code rather than none at all. More... | |
| bool | support_128bit_integers { false } |
| Allow the use of the 128-bit __int128 type in the code. | |
| bool | indicate_function_inlining { false } |
| Emit a remark when a function is inlined. | |
| bool | syntax_check_only { false } |
| Stop compilation after the front-end has verified the program's syntax. More... | |
| bool | less_builtins { false } |
| Have the compiler not provide support for various builtins: More... | |
| optional< size_t > | maximum_register_count { } |
| Specify the maximum number of registers that GPU functions can use. More... | |
| bool | flush_denormal_floats_to_zero { false } |
| When performing single-precision floating-point operations, flush denormal values to zero. More... | |
| bool | use_precise_square_root { true } |
| For single-precision floating-point square root, use IEEE round-to-nearest mode or use a faster approximation. More... | |
| bool | use_precise_division { true } |
| For single-precision floating-point division and reciprocals, use IEEE round-to-nearest mode or use a faster approximation. More... | |
| bool | use_fused_multiply_add { true } |
| Enables (disables) the contraction of floating-point multiplies and adds/subtracts into floating-point multiply-add operations (FMAD, FFMA, or DFMA). More... | |
| bool | use_fast_math { false } |
| Make use of fast math operations. More... | |
| bool | link_time_optimization { false } |
| Do not compile fully into PTX/Cubin. More... | |
| bool | source_dirs_in_include_path { true } |
| Implicitly add the directories of source files (TODO: Which source files?) as include file search paths. More... | |
| bool | extra_device_vectorization { false } |
| Enables more aggressive device code vectorization in the LTO IR optimizer. | |
| optional< cpp_dialect_t > | language_dialect { } |
| The dialect of C++ which the compiler will be forced to interpret the program source code as. | |
| ::std::unordered_set<::std::string > | no_value_defines |
| Preprocessor macros to have the compiler define, without specifying a particular value. | |
| ::std::unordered_set<::std::string > | undefines |
| Preprocessor macros to tell the compiler to explicitly undefine. | |
| ::std::unordered_map<::std::string,::std::string > | valued_defines |
| Preprocessor macros to have the compiler define to specific values. | |
| bool | disable_warnings { false } |
| Have the compiler treat all warnings as though they were suppressed, and print nothing. | |
| bool | assume_restrict { false } |
| Treat all kernel pointer parameters as if they had the restrict (or __restrict) qualifier. | |
| bool | default_execution_space_is_device { false } |
| Assume functions without an explicit specification of their execution space are __device__ rather than __host__ functions. More... | |
| bool | display_error_numbers { true } |
| Display (error) numbers for warning (and error?) messages, in addition to the message itself. | |
| ::std::string | ptxas |
| Extra options for the PTX compiler (a.k.a. "PTX optimizing assembler"). | |
| ::std::vector<::std::string > | additional_include_paths |
| A sequence of directories to be searched for headers. More... | |
| ::std::vector<::std::string > | preinclude_files |
| Header files to preinclude during preprocessing of the source. More... | |
| bool | builtin_move_and_forward { true } |
| Provide builtin definitions of ::std::move and ::std::forward. More... | |
| bool | increase_stack_limit_to_max { true } |
| Use setrlimit() to increase the stack size to the maximum the OS allows. More... | |
| bool | builtin_initializer_list { true } |
| Provide builtin definitions of ::std::initializer_list class and member functions. More... | |
| ::std::vector<::std::string > | extra_options |
| Support for additional, arbitrary options which may not be covered by other fields in this class (e.g. due to newer CUDA versions providing them). More... | |
| ::std::unordered_map< error::number_t, error::handling_method_t > | error_handling_overrides |
Public Attributes inherited from cuda::rtc::compilation_options_base_t< cuda_cpp > | |
| ::std::unordered_set< cuda::device::compute_capability_t > | targets_ |
| Target devices in terms of CUDA compute capability. More... | |
Public Attributes inherited from cuda::rtc::common_ptx_compilation_options_t | |
| optional< ptx_register_count_t > | max_num_registers_per_thread {} |
| Limit the number of registers which a kernel thread may use. | |
| optional< grid::block_dimension_t > | min_num_threads_per_block {} |
| The minimum number of threads per block which the compiler should target. | |
| optional< optimization_level_t > | optimization_level {} |
| Compilation optimization level (as in -O1, -O2 etc.) | |
| optional< device::compute_capability_t > | specific_target |
| Which NVIDIA physical architecture to generate SASS code for. | |
| bool | generate_source_line_info {false} |
| Generate indications of which PTX/SASS instructions correspond to which lines of the source code, within the compiled output. | |
| bool | generate_debug_info {false} |
| Generate debugging information associating SASS instructions to locations in the source, embedding it within the compilation output (-g) | |
| optional< caching_mode_t< memory_operation_t::load > > | default_load_caching_mode_ |
| Which of the memory-load-instruction caching modes (see caching_mode_t) to use by default, when no caching mode is specified in a PTX instruction. More... | |
| bool | generate_relocatable_device_code { false } |
| Generate relocatable code that can be linked with other relocatable device code. More... | |
Options for JIT-compilation of CUDA C++ code.
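The three per-finding setters (suppress_error, treat_as_error, warn_about) together with the error_handling_overrides map suggest a scheme in which each call records one override and the last call for a given number wins. The following is a minimal, self-contained sketch of that idea only. The handling_method_t enumerators, the int-based number_t and the options_sketch struct are stand-ins invented for illustration; they are not the library's definitions, and the finding numbers used are arbitrary.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-ins for error::number_t and error::handling_method_t.
using number_t = int;
enum class handling_method_t { suppress, warn, error };

struct options_sketch {
    std::unordered_map<number_t, handling_method_t> error_handling_overrides;

    // Each setter records one override and returns *this, so calls can be chained.
    options_sketch& suppress_error(number_t n) { return set_override(n, handling_method_t::suppress); }
    options_sketch& treat_as_error(number_t n) { return set_override(n, handling_method_t::error); }
    options_sketch& warn_about(number_t n)     { return set_override(n, handling_method_t::warn); }

private:
    options_sketch& set_override(number_t n, handling_method_t method) {
        // A later call for the same number replaces the earlier choice.
        error_handling_overrides[n] = method;
        return *this;
    }
};

// Usage: two calls for finding 177, one for finding 550 (arbitrary numbers).
inline void example() {
    options_sketch opts;
    opts.suppress_error(177).warn_about(177).treat_as_error(550);
    assert(opts.error_handling_overrides.size() == 2);
    assert(opts.error_handling_overrides.at(177) == handling_method_t::warn);
}
```

Returning a reference from each setter mirrors the `compilation_options_t &` return type listed for these member functions, which is what makes the chained form possible.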
| set_language_dialect (cpp_dialect_t dialect) | inline |
Set which dialect of the C++ language the compiler will try to interpret the program source code as.
| set_language_dialect (const char *dialect_name) | inline |
Set which dialect of the C++ language the compiler will try to interpret the program source code as.
| set_language_dialect (const ::std::string &dialect_name) | inline |
Set which dialect of the C++ language the compiler will try to interpret the program source code as.
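The name-based overloads presumably translate a dialect name into a cpp_dialect_t value before storing it. A minimal sketch of such a lookup is given below. The enumerators (cpp03 through cpp17), the accepted spellings ("c++11" and so on) and the helper name dialect_from_name are all assumptions made for this example; the library's own enumerators and name table may differ.

```cpp
#include <cassert>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical enumerators; not taken from the library.
enum class cpp_dialect_t { cpp03, cpp11, cpp14, cpp17 };

// Map a spelling such as "c++14" to an enumerator; reject anything unknown.
inline cpp_dialect_t dialect_from_name(const char* name) {
    static const char* const names[] = { "c++03", "c++11", "c++14", "c++17" };
    for (int i = 0; i < 4; ++i) {
        if (std::strcmp(name, names[i]) == 0) {
            return static_cast<cpp_dialect_t>(i);  // table order matches enumerator order
        }
    }
    throw std::invalid_argument(std::string("No C++ dialect named ") + name);
}

// The std::string overload simply forwards to the C-string one.
inline cpp_dialect_t dialect_from_name(const std::string& name) {
    return dialect_from_name(name.c_str());
}

inline void example() {
    assert(dialect_from_name("c++14") == cpp_dialect_t::cpp14);
}
```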
| ::std::vector<::std::string> cuda::rtc::compilation_options_t< cuda_cpp >::additional_include_paths |
A sequence of directories to be searched for headers.
These paths are searched after the list of headers given to nvrtcCreateProgram.
The paths are stored as ::std::string's rather than const char* or ::std::string_view's, since this class is a value-type and cannot rely on someone else keeping these strings alive.
[...] ::std::filesystem::path's.
| bool cuda::rtc::compilation_options_t< cuda_cpp >::builtin_initializer_list { true } |
Provide builtin definitions of ::std::initializer_list class and member functions.
| bool cuda::rtc::compilation_options_t< cuda_cpp >::builtin_move_and_forward { true } |
Provide builtin definitions of ::std::move and ::std::forward.
| bool cuda::rtc::compilation_options_t< cuda_cpp >::compile_extensible_whole_program { false } |
Do extensible whole program compilation of device code.
| bool cuda::rtc::compilation_options_t< cuda_cpp >::default_execution_space_is_device { false } |
Assume functions without an explicit specification of their execution space are __device__ rather than __host__ functions.
| ::std::vector<::std::string> cuda::rtc::compilation_options_t< cuda_cpp >::extra_options |
Support for additional, arbitrary options which may not be covered by other fields in this class (e.g. due to newer CUDA versions providing them).
The elements are used as-is (no prefixing with - signs, no combining pairs of consecutive elements as opt=value etc.)
| bool cuda::rtc::compilation_options_t< cuda_cpp >::flush_denormal_floats_to_zero { false } |
When performing single-precision floating-point operations, flush denormal values to zero.
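To make "flush denormal values to zero" concrete, here is a host-side model of the operation on a single float. It only illustrates the arithmetic meaning of the term; it is not how the GPU or this library implements the option, and the function name is made up for the example.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Host-side model of flush-to-zero: a subnormal input becomes a zero of the
// same sign; every other value passes through unchanged.
inline float flush_denormal_to_zero(float x) {
    return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

inline void example() {
    const float tiny = std::numeric_limits<float>::denorm_min();  // smallest positive subnormal
    assert(flush_denormal_to_zero(tiny) == 0.0f);
    assert(flush_denormal_to_zero(1.0f) == 1.0f);
}
```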
| bool cuda::rtc::compilation_options_t< cuda_cpp >::increase_stack_limit_to_max { true } |
Use setrlimit() to increase the stack size to the maximum the OS allows.
The limit is reverted to its previous value after compilation.
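The raise-then-revert behavior described above can be pictured as a scoped guard around the compilation call. The sketch below uses the POSIX getrlimit()/setrlimit() calls with RLIMIT_STACK; the class name and its structure are assumptions for illustration and say nothing about how the compiler or the library actually performs the adjustment.

```cpp
#include <cassert>
#include <sys/resource.h>

// RAII guard (POSIX only): raises the soft stack limit to the hard limit on
// construction and restores the saved limit on destruction.
class scoped_max_stack_limit {
public:
    scoped_max_stack_limit() {
        if (getrlimit(RLIMIT_STACK, &saved_) != 0) { return; }
        rlimit raised = saved_;
        raised.rlim_cur = saved_.rlim_max;  // the soft limit may go as high as the hard limit
        raised_ = (setrlimit(RLIMIT_STACK, &raised) == 0);
    }
    ~scoped_max_stack_limit() {
        if (raised_) { setrlimit(RLIMIT_STACK, &saved_); }
    }
    scoped_max_stack_limit(const scoped_max_stack_limit&) = delete;
    scoped_max_stack_limit& operator=(const scoped_max_stack_limit&) = delete;

    bool raised() const { return raised_; }

private:
    rlimit saved_ {};
    bool raised_ { false };
};

// Helper for inspecting the current soft limit.
inline rlim_t current_soft_stack_limit() {
    rlimit r {};
    getrlimit(RLIMIT_STACK, &r);
    return r.rlim_cur;
}

inline void example() {
    const rlim_t before = current_soft_stack_limit();
    {
        scoped_max_stack_limit guard;  // a compilation would run inside this scope
    }
    assert(current_soft_stack_limit() == before);
}
```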
| bool cuda::rtc::compilation_options_t< cuda_cpp >::less_builtins { false } |
Have the compiler not provide support for various builtins, e.g. cudaMalloc and cudaError_t.
| bool cuda::rtc::compilation_options_t< cuda_cpp >::link_time_optimization { false } |
Do not compile fully into PTX/Cubin.
Instead, only generate NVIDIA's "LTO IR", which is combined with other LTO IR pieces from object files compiled with LTO support, at device link time.
| optional<size_t> cuda::rtc::compilation_options_t< cuda_cpp >::maximum_register_count { } |
Specify the maximum number of registers that GPU functions can use.
Up to a function-specific limit, a higher value will generally increase the performance of individual GPU threads that execute this function. However, because thread registers are allocated from a global register pool on each GPU, a higher value will also reduce the maximum thread block size, thereby reducing the amount of thread parallelism. Hence, a good maxrregcount value is the result of a trade-off. If this option is not specified, no maximum is assumed. A value less than the minimum number of registers required by the ABI will be bumped up by the compiler to the ABI minimum.
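The trade-off can be seen with a little arithmetic: if threads draw their registers from a fixed pool, the number of threads that fit is at most the pool size divided by the per-thread register count. The sketch below assumes an illustrative pool of 65,536 registers; that figure, and the function name, are assumptions for the example rather than properties of any particular GPU, and real hardware applies allocation granularity and other limits that are ignored here.

```cpp
#include <cassert>
#include <cstddef>

// Upper bound on the number of threads whose registers fit in the pool at once.
// Ignores allocation granularity and every limit other than register count.
constexpr std::size_t max_resident_threads(std::size_t registers_in_pool,
                                           std::size_t registers_per_thread) {
    return registers_in_pool / registers_per_thread;
}

inline void example() {
    const std::size_t pool = 65536;  // illustrative pool size (assumption)
    assert(max_resident_threads(pool, 32) == 2048);
    assert(max_resident_threads(pool, 64) == 1024);  // doubling the cap halves the bound
    assert(max_resident_threads(pool, 128) == 512);
}
```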
| bool cuda::rtc::compilation_options_t< cuda_cpp >::optimize_device_code_in_debug_mode { false } |
If debug mode is enabled, perform limited optimizations of device code rather than none at all.
| ::std::vector<::std::string> cuda::rtc::compilation_options_t< cuda_cpp >::preinclude_files |
Header files to preinclude during preprocessing of the source.
The file names are stored as ::std::string's rather than const char* or ::std::string_view's, since this class is a value-type and cannot rely on someone else keeping these strings alive.
[...] ::std::filesystem::path's.
| bool cuda::rtc::compilation_options_t< cuda_cpp >::source_dirs_in_include_path { true } |
Implicitly add the directories of source files (TODO: Which source files?) as include file search paths.
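As a rough picture of how the preprocessor-related members (the define and undefine sets, additional_include_paths, preinclude_files and source_dirs_in_include_path) might end up as compiler options, the sketch below renders them into option strings. The option spellings are the long forms that NVRTC documents, to the best of my knowledge; the struct, the render function and the output order are hypothetical, and the library's actual rendering may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in holding only the members discussed here.
struct preprocessing_options_sketch {
    std::unordered_set<std::string> no_value_defines;
    std::unordered_set<std::string> undefines;
    std::unordered_map<std::string, std::string> valued_defines;
    std::vector<std::string> additional_include_paths;
    std::vector<std::string> preinclude_files;
    bool source_dirs_in_include_path { true };
};

// Render the members as option strings, one option per element.
inline std::vector<std::string> render(const preprocessing_options_sketch& opts) {
    std::vector<std::string> out;
    // The flag is only needed to switch the default behavior off.
    if (!opts.source_dirs_in_include_path) { out.push_back("--no-source-include"); }
    for (const auto& name : opts.no_value_defines) { out.push_back("--define-macro=" + name); }
    for (const auto& name : opts.undefines) { out.push_back("--undefine-macro=" + name); }
    for (const auto& kv : opts.valued_defines) {
        out.push_back("--define-macro=" + kv.first + "=" + kv.second);
    }
    for (const auto& dir : opts.additional_include_paths) { out.push_back("--include-path=" + dir); }
    for (const auto& header : opts.preinclude_files) { out.push_back("--pre-include=" + header); }
    return out;
}

// Order-independent membership check; the unordered containers iterate in no fixed order.
inline bool contains(const std::vector<std::string>& options, const std::string& option) {
    return std::find(options.begin(), options.end(), option) != options.end();
}

inline void example() {
    preprocessing_options_sketch opts;
    opts.valued_defines["BLOCK_SIZE"] = "128";
    opts.additional_include_paths.push_back("/opt/project/include");
    const auto rendered = render(opts);
    assert(contains(rendered, "--define-macro=BLOCK_SIZE=128"));
    assert(contains(rendered, "--include-path=/opt/project/include"));
}
```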
| bool cuda::rtc::compilation_options_t< cuda_cpp >::syntax_check_only { false } |
Stop compilation after the front-end has verified the program's syntax.
When this is set to true, the compilation output must not be used.
| bool cuda::rtc::compilation_options_t< cuda_cpp >::use_fast_math { false } |
Make use of fast math operations.
Implies use_fused_multiply_add, not use_precise_division and not use_precise_square_root.
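The implication can be written out as a small resolution step over the floating-point flags. The sketch below is only a model of the rule stated here; the struct and the function effective are invented for the example, and whether the library resolves the flags itself or simply forwards the fast-math option to the compiler is not something this page states. NVRTC's own documentation of its fast-math option additionally lists flushing denormals to zero among the implications; that is left out because the text above does not mention it.

```cpp
#include <cassert>

// Hypothetical stand-in with the floating-point flags and their listed defaults.
struct fp_options_sketch {
    bool flush_denormal_floats_to_zero { false };
    bool use_precise_square_root { true };
    bool use_precise_division { true };
    bool use_fused_multiply_add { true };
    bool use_fast_math { false };
};

// Resolve the settings that apply once the fast-math implication is taken into account.
inline fp_options_sketch effective(fp_options_sketch opts) {
    if (opts.use_fast_math) {
        opts.use_fused_multiply_add = true;
        opts.use_precise_division = false;
        opts.use_precise_square_root = false;
    }
    return opts;
}

inline void example() {
    fp_options_sketch opts;
    opts.use_fused_multiply_add = false;  // explicitly switched off...
    opts.use_fast_math = true;            // ...but fast math implies it again
    const fp_options_sketch resolved = effective(opts);
    assert(resolved.use_fused_multiply_add);
    assert(!resolved.use_precise_division);
    assert(!resolved.use_precise_square_root);
}
```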
| bool cuda::rtc::compilation_options_t< cuda_cpp >::use_fused_multiply_add { true } |
Enables (disables) the contraction of floating-point multiplies and adds/subtracts into floating-point multiply-add operations (FMAD, FFMA, or DFMA).
Setting use_fast_math implies setting this to true.
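Contraction matters because a fused multiply-add rounds once, while a separate multiply and add round twice, so results can differ in the last bit. The host-side sketch below shows the effect with std::fma standing in for a fused device instruction. The function names are made up for the example, and it assumes float expressions are evaluated in plain IEEE single precision with round-to-nearest (no extended intermediate precision, no fast-math flags on the host compiler).

```cpp
#include <cassert>
#include <cmath>

// a*a - p with a single rounding at the end (fused multiply-add).
inline float residual_fused(float a, float p) {
    return std::fma(a, a, -p);
}

// a*a - p with the product rounded to float before the subtraction.
inline float residual_separate(float a, float p) {
    volatile float product = a * a;  // volatile keeps the compiler from contracting this into an FMA
    return product - p;
}

inline void example() {
    const float a = 1.0f + std::ldexp(1.0f, -12);  // 1 + 2^-12, exactly representable
    // The exact square is 1 + 2^-11 + 2^-24; as a float it rounds to 1 + 2^-11.
    volatile float p = a * a;
    assert(residual_separate(a, p) == 0.0f);                 // rounding error is invisible
    assert(residual_fused(a, p) == std::ldexp(1.0f, -24));   // fused form recovers the lost 2^-24
}
```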
| bool cuda::rtc::compilation_options_t< cuda_cpp >::use_precise_division { true } |
For single-precision floating-point division and reciprocals, use IEEE round-to-nearest mode or use a faster approximation.
Setting use_fast_math implies setting this to false.
| bool cuda::rtc::compilation_options_t< cuda_cpp >::use_precise_square_root { true } |
For single-precision floating-point square root, use IEEE round-to-nearest mode or use a faster approximation.
Setting use_fast_math implies setting this to false.