cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
cuda::rtc::compilation_options_t Struct Reference

Public Member Functions

compilation_options_t& add_target (device::compute_capability_t compute_capability)
 Have the compilation also target a specific compute capability.
 
compilation_options_t& set_target (device::compute_capability_t compute_capability)
 Have the compilation target only one specific compute capability.
 
compilation_options_t& set_target (device_t device)
 
compilation_options_t& set_language_dialect (cpp_dialect_t dialect)
 
compilation_options_t& clear_language_dialect ()
 
compilation_options_t& set_language_dialect (const char *dialect_name)
 
compilation_options_t& set_language_dialect (const ::std::string &dialect_name)
 
compilation_options_t& suppress_error (error::number_t error_number)
 
compilation_options_t& treat_as_error (error::number_t error_number)
 
compilation_options_t& warn_about (error::number_t error_number)
 

Public Attributes

::std::unordered_set< cuda::device::compute_capability_t > targets_
 Target devices in terms of CUDA compute capability.
 
bool generate_relocatable_code { false }
 Generate relocatable code that can be linked with other relocatable device code.
 
bool compile_extensible_whole_program { false }
 Do extensible whole program compilation of device code.
 
bool debug { false }
 Generate debugging information (and perhaps limit optimizations?); see also generate_line_info.
 
bool generate_line_info { false }
 Generate information for translating compiled code line numbers to source code line numbers.
 
bool support_128bit_integers { false }
 Allow the use of the 128-bit __int128 type in the code.
 
bool indicate_function_inlining { false }
 Emit a remark when a function is inlined.
 
bool compiler_self_identification { false }
 Include, in the compilation result, a self-identification string indicating which compiler produced the code.
 
size_t maximum_register_count { do_not_set_register_count }
 Specify the maximum number of registers that GPU functions can use.
 
bool flush_denormal_floats_to_zero { false }
 When performing single-precision floating-point operations, flush denormal values to zero.
 
bool use_precise_square_root { true }
 For single-precision floating-point square root, use IEEE round-to-nearest mode or use a faster approximation.
 
bool use_precise_division { true }
 For single-precision floating-point division and reciprocals, use IEEE round-to-nearest mode or use a faster approximation.
 
bool use_fused_multiply_add { true }
 Enables (disables) the contraction of floating-point multiplies and adds/subtracts into floating-point multiply-add operations (FMAD, FFMA, or DFMA).
 
bool use_fast_math { false }
 Make use of fast math operations.
 
bool link_time_optimization { false }
 Do not compile fully into PTX/Cubin.
 
bool extra_device_vectorization { false }
 Enables more aggressive device code vectorization in the NVVM optimizer.
 
bool specify_language_dialect { false }
 
cpp_dialect_t language_dialect { cpp_dialect_t::cpp03 }
 Set language dialect to C++03, C++11, C++14 or C++17.
 
::std::unordered_set<::std::string > no_value_defines
 
::std::unordered_map<::std::string,::std::string > valued_defines
 
bool disable_warnings { false }
 
bool assume_restrict { false }
 Treat all kernel pointer parameters as if they had the restrict (or __restrict) qualifier.
 
bool default_execution_space_is_device { false }
 Assume functions without an explicit specification of their execution space are __device__ rather than __host__ functions.
 
bool display_error_numbers { true }
 Display (error) numbers for warning (and error?) messages, in addition to the message itself.
 
::std::vector<::std::string > additional_include_paths
 A sequence of directories to be searched for headers.
 
::std::vector<::std::string > preinclude_files
 Header files to preinclude during preprocessing of the source.
 
bool builtin_move_and_forward { true }
 Provide builtin definitions of ::std::move and ::std::forward.
 
bool increase_stack_limit_to_max { true }
 Use setrlimit() to increase the stack size to the maximum the OS allows.
 
bool builtin_initializer_list { true }
 Provide builtin definitions of ::std::initializer_list class and member functions.
 
::std::vector<::std::string > extra_options
 Support for additional, arbitrary options which may not be covered by other fields in this class (e.g. due to newer CUDA versions providing them).
 
::std::unordered_map< error::number_t, error::handling_method_t > error_handling_overrides
 

Static Public Attributes

static constexpr const size_t do_not_set_register_count { 0 }
 

Member Function Documentation

◆ add_target()

compilation_options_t& cuda::rtc::compilation_options_t::add_target ( device::compute_capability_t  compute_capability)
inline

Have the compilation also target a specific compute capability.

Note
Previously specified compute capabilities will be targeted in addition to the one specified.

◆ set_target()

compilation_options_t& cuda::rtc::compilation_options_t::set_target ( device::compute_capability_t  compute_capability)
inline

Have the compilation target only one specific compute capability.

Note
Any previous target settings are dropped, i.e. no other compute capability will be targeted.
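The difference between add_target() and set_target() can be modeled in a few lines. The sketch below is a hypothetical stand-in, not the library's code: toy_options and toy_compute_capability are illustrative names, and it uses std::set for simple ordering where the library uses ::std::unordered_set. Both methods return a reference to the options object, which permits chaining as the library's signatures do.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for a compute capability major.minor, e.g. {7, 5}.
struct toy_compute_capability {
    unsigned major;
    unsigned minor;
    bool operator<(const toy_compute_capability& other) const {
        return major != other.major ? major < other.major : minor < other.minor;
    }
};

// Hypothetical model of the targeting semantics described above.
struct toy_options {
    std::set<toy_compute_capability> targets;

    // Like add_target(): keep previously specified targets, add one more.
    toy_options& add_target(toy_compute_capability cc) {
        targets.insert(cc);
        return *this;
    }

    // Like set_target(): drop all previous targets, keep only this one.
    toy_options& set_target(toy_compute_capability cc) {
        targets.clear();
        targets.insert(cc);
        return *this;
    }
};
```

For example, `opts.add_target({5, 2}).add_target({7, 5})` leaves two targets, while a subsequent `opts.set_target({8, 0})` leaves only 8.0.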

Member Data Documentation

◆ additional_include_paths

::std::vector<::std::string> cuda::rtc::compilation_options_t::additional_include_paths

A sequence of directories to be searched for headers.

These paths are searched after the list of headers given to nvrtcCreateProgram.

Note
The members here are std::string objects rather than const char* or std::string_view, since this class is a value type and cannot rely on someone else keeping these strings alive.
Todo:
In C++17, consider making the elements std::filesystem::path's.
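The ownership note above can be illustrated with a small sketch: because the vector holds std::string values, the options remain valid after the string they were built from is destroyed, and copies of the options are independent. The type toy_include_options and the function make_options() are hypothetical, used only for illustration; the path is arbitrary.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical options holder that owns its strings, mirroring the choice of
// std::string over const char* or std::string_view for include paths.
struct toy_include_options {
    std::vector<std::string> additional_include_paths;
};

// Builds the options from a local string. The local is destroyed on return,
// but the options hold their own copy, so nothing dangles.
toy_include_options make_options() {
    std::string local_dir = std::string("/opt/") + "my_headers";
    toy_include_options opts;
    opts.additional_include_paths.push_back(local_dir); // copies the string
    return opts;
}
```

Had the member been a std::vector<std::string_view>, the returned object would refer to the destroyed local_dir.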

◆ builtin_initializer_list

bool cuda::rtc::compilation_options_t::builtin_initializer_list { true }

Provide builtin definitions of ::std::initializer_list class and member functions.

Note
Only relevant when the dialect is C++11 or later.

◆ builtin_move_and_forward

bool cuda::rtc::compilation_options_t::builtin_move_and_forward { true }

Provide builtin definitions of ::std::move and ::std::forward.

Note
Only relevant when the dialect is C++11 or later.

◆ compile_extensible_whole_program

bool cuda::rtc::compilation_options_t::compile_extensible_whole_program { false }

Do extensible whole program compilation of device code.

Todo:
explain what that is.

◆ extra_options

::std::vector<::std::string> cuda::rtc::compilation_options_t::extra_options

Support for additional, arbitrary options which may not be covered by other fields in this class (e.g. due to newer CUDA versions providing them).

Note
These are appended to the command-line verbatim (so, no prefixing with - signs, no combining pairs of consecutive elements as opt=value etc.)
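A minimal sketch of what "appended verbatim" means, assuming (hypothetically) that the options derived from the typed fields have already been rendered into a vector of strings. The helper append_extra_options() is an illustrative name, not part of the library. Each element is passed through unchanged: an option that takes a value must therefore be given as a single element such as "--maxrregcount=32", since consecutive elements are not joined.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical rendering step: the already-rendered options come first, then
// each extra option is appended exactly as given. No "-" prefix is added and
// no pairs of elements are combined into "opt=value".
std::vector<std::string> append_extra_options(
    std::vector<std::string> rendered,
    const std::vector<std::string>& extra_options)
{
    rendered.insert(rendered.end(), extra_options.begin(), extra_options.end());
    return rendered;
}
```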

◆ flush_denormal_floats_to_zero

bool cuda::rtc::compilation_options_t::flush_denormal_floats_to_zero { false }

When performing single-precision floating-point operations, flush denormal values to zero.

use_fast_math implies setting this to true.

◆ generate_relocatable_code

bool cuda::rtc::compilation_options_t::generate_relocatable_code { false }

Generate relocatable code that can be linked with other relocatable device code.

Note
Equivalent to "--relocatable-device-code" or "-rdc" for NVCC.

◆ increase_stack_limit_to_max

bool cuda::rtc::compilation_options_t::increase_stack_limit_to_max { true }

Use setrlimit() to increase the stack size to the maximum the OS allows.

The limit is reverted to its previous value after compilation.

Note
  1. Only works on Linux.
  2. Affects the entire process, not just the thread invoking the compilation command.
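The raise-then-revert behavior can be sketched as an RAII guard over the POSIX getrlimit()/setrlimit() calls. This is an illustration of the technique under the assumption that "the maximum the OS allows" means the hard limit; stack_limit_raiser is a hypothetical name and this is not the library's implementation. As the note says, the change applies to the whole process.

```cpp
#include <cassert>
#include <sys/resource.h>

// Hypothetical RAII guard: raise the soft stack limit to the hard limit on
// construction, and restore the saved limit on destruction.
// POSIX only; affects the whole process, not just the calling thread.
class stack_limit_raiser {
public:
    stack_limit_raiser() {
        active_ = (getrlimit(RLIMIT_STACK, &saved_) == 0);
        if (active_) {
            rlimit raised = saved_;
            raised.rlim_cur = saved_.rlim_max; // soft limit up to the hard limit
            active_ = (setrlimit(RLIMIT_STACK, &raised) == 0);
        }
    }
    ~stack_limit_raiser() {
        if (active_) { setrlimit(RLIMIT_STACK, &saved_); } // revert
    }
    stack_limit_raiser(const stack_limit_raiser&) = delete;
    stack_limit_raiser& operator=(const stack_limit_raiser&) = delete;

private:
    rlimit saved_{};
    bool active_{false};
};
```

A compilation call would be placed inside the guard's scope, so the previous limit is restored even if the call throws.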

◆ link_time_optimization

bool cuda::rtc::compilation_options_t::link_time_optimization { false }

Do not compile fully into PTX/Cubin.

Instead, only generate NVVM (the LLVM IR variant), which is combined with other NVVM pieces from LTO-compiled "objects", at device link time.

◆ maximum_register_count

size_t cuda::rtc::compilation_options_t::maximum_register_count { do_not_set_register_count }

Specify the maximum number of registers that GPU functions can use.

Up to a function-specific limit, a higher value will generally increase the performance of individual GPU threads that execute this function. However, because thread registers are allocated from a global register pool on each GPU, a higher value of this option will also reduce the maximum thread block size, thereby reducing the amount of thread parallelism. Hence, a good maximum register count is the result of a trade-off. If this option is not specified, then no maximum is assumed. A value less than the minimum number of registers required by the ABI will be bumped up by the compiler to the ABI minimum.

Note
Set this to do_not_set_register_count to avoid passing this as a compilation option.
Todo:
Use ::std::optional
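The sentinel convention can be sketched as a rendering helper that emits an option only when a limit was actually set. The helper register_count_option() is hypothetical; it returns a std::optional (C++17), in the spirit of the Todo above, and uses the NVRTC option spelling "--maxrregcount=N". The constant mirrors the documented do_not_set_register_count value of 0.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Mirrors the documented sentinel: 0 means "do not pass the option".
constexpr std::size_t do_not_set_register_count = 0;

// Hypothetical helper: renders the option, or nothing when the sentinel is
// used (in which case no maximum is assumed).
std::optional<std::string> register_count_option(std::size_t maximum_register_count) {
    if (maximum_register_count == do_not_set_register_count) {
        return std::nullopt;
    }
    return "--maxrregcount=" + std::to_string(maximum_register_count);
}
```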

◆ preinclude_files

::std::vector<::std::string> cuda::rtc::compilation_options_t::preinclude_files

Header files to preinclude during preprocessing of the source.

Note
The members here are std::string objects rather than const char* or std::string_view, since this class is a value type and cannot rely on someone else keeping these strings alive.
Todo:
In C++17, consider making the elements std::filesystem::path's.
Todo:
Check how these strings are interpreted. Do they need quotation marks? brackets? full paths?

◆ targets_

::std::unordered_set<cuda::device::compute_capability_t> cuda::rtc::compilation_options_t::targets_

Target devices in terms of CUDA compute capability.

Note
Given a compute capability X.Y, the compilation API call will be passed "sm_XY", not "compute_XY". The distinction between the two is not currently supported.
Not all compute capabilities are supported! As of CUDA 11.0, the minimum supported compute capability is 3.5.
As of CUDA 11.0, the default is compute_52.
Todo:
Use something less fancy than ::std::unordered_set, e.g. a vector-backed ordered set or a dynamic bit-vector for membership.
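The "sm_XY" rule from the note above can be sketched as follows. The struct and function names are hypothetical; "--gpu-architecture=" is the NVRTC option name for the target architecture. The check against 3.5 reflects the CUDA 11.0 minimum stated above and is illustrative only.

```cpp
#include <cassert>
#include <string>

// Hypothetical compute capability X.Y.
struct toy_compute_capability { unsigned major; unsigned minor; };

// Renders X.Y as "sm_XY"; the "compute_XY" form is never produced.
std::string gpu_architecture_option(toy_compute_capability cc) {
    return "--gpu-architecture=sm_" + std::to_string(cc.major) + std::to_string(cc.minor);
}

// As of CUDA 11.0, the minimum supported compute capability is 3.5.
bool meets_cuda_11_minimum(toy_compute_capability cc) {
    return cc.major > 3 || (cc.major == 3 && cc.minor >= 5);
}
```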

◆ use_fast_math

bool cuda::rtc::compilation_options_t::use_fast_math { false }

Make use of fast math operations.

Implies setting flush_denormal_floats_to_zero and use_fused_multiply_add to true, and use_precise_division and use_precise_square_root to false.
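The implications of use_fast_math can be sketched as a function computing the settings that take effect. The mapping follows NVRTC's description of --use_fast_math (ftz=true, prec-div=false, prec-sqrt=false, fmad=true). The struct toy_fp_options and the function effective() are hypothetical; whether the library computes such a view is not stated here, and this sketch simply lets use_fast_math override the individual fields.

```cpp
#include <cassert>

// Hypothetical subset of the floating-point fields, with the documented defaults.
struct toy_fp_options {
    bool flush_denormal_floats_to_zero { false };
    bool use_precise_square_root { true };
    bool use_precise_division { true };
    bool use_fused_multiply_add { true };
    bool use_fast_math { false };
};

// Returns the settings in effect once use_fast_math's implications are applied.
toy_fp_options effective(toy_fp_options o) {
    if (o.use_fast_math) {
        o.flush_denormal_floats_to_zero = true;
        o.use_precise_division = false;
        o.use_precise_square_root = false;
        o.use_fused_multiply_add = true;
    }
    return o;
}
```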

◆ use_fused_multiply_add

bool cuda::rtc::compilation_options_t::use_fused_multiply_add { true }

Enables (disables) the contraction of floating-point multiplies and adds/subtracts into floating-point multiply-add operations (FMAD, FFMA, or DFMA).

use_fast_math implies setting this to true.

◆ use_precise_division

bool cuda::rtc::compilation_options_t::use_precise_division { true }

For single-precision floating-point division and reciprocals, use IEEE round-to-nearest mode or use a faster approximation.

use_fast_math implies setting this to false.

◆ use_precise_square_root

bool cuda::rtc::compilation_options_t::use_precise_square_root { true }

For single-precision floating-point square root, use IEEE round-to-nearest mode or use a faster approximation.

use_fast_math implies setting this to false.


The documentation for this struct was generated from the following file: