A non-owning wrapper for CUDA kernels, whether they are __global__ functions compiled a priori, the result of dynamic NVRTC compilation, or obtained in some other way in the future.
#include <kernel.hpp>
- Note
- The association of a kernel_t with an individual device or context is somewhat tenuous: the same function could be used with any other device. However, most of its features, attributes and settings are context-specific or device-specific.
- NVRTC-compiled kernels can only use this class, while a priori-compiled kernels can use their own subclass.
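To illustrate what "non-owning" means here, the following is a minimal sketch and not the library's implementation: a hypothetical nonowning_kernel class that records the device, context and raw handle it was obtained with, but releases nothing when destroyed. The raw handle types are assumed stand-ins (plain void pointers) for the driver's own types.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for raw driver handle types (the real library wraps
// driver types such as CUfunction and CUcontext).
using raw_kernel_handle_t = void*;
using raw_context_handle_t = void*;
using device_id_t = int;

// Illustrative non-owning kernel wrapper (not cuda::kernel_t itself). It
// remembers where the handle came from, but its implicit destructor frees
// nothing: whatever produced the kernel (e.g. a loaded module) remains
// responsible for the kernel's lifetime.
class nonowning_kernel {
public:
    nonowning_kernel(device_id_t device, raw_context_handle_t context,
                     raw_kernel_handle_t handle) noexcept
        : device_id_(device), context_(context), handle_(handle)
    {
        assert(handle != nullptr);  // wrapping a null handle is meaningless
    }

    device_id_t device_id() const noexcept { return device_id_; }
    raw_context_handle_t context_handle() const noexcept { return context_; }
    raw_kernel_handle_t handle() const noexcept { return handle_; }

    // Copies refer to the very same underlying kernel.
    friend bool operator==(const nonowning_kernel& lhs, const nonowning_kernel& rhs) noexcept
    {
        return lhs.device_id_ == rhs.device_id_ && lhs.context_ == rhs.context_ &&
               lhs.handle_ == rhs.handle_;
    }

private:
    device_id_t device_id_;
    raw_context_handle_t context_;
    raw_kernel_handle_t handle_;
};

// Since it owns nothing, the wrapper can be copied bitwise and passed by value.
static_assert(std::is_trivially_copyable<nonowning_kernel>::value,
              "a non-owning wrapper needs no custom copy or destructor");
```

Because copying such a wrapper never duplicates or frees the underlying kernel, it is the producer's job to keep the kernel alive for as long as any wrapper is in use.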
◆ binary_compilation_target_architecture()
- Returns
- the physical microarchitecture which this kernel was compiled to target
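The CUDA runtime's cudaFuncAttributes structure reports a kernel's binary architecture (binaryVersion) and its PTX version (ptxVersion) each as a single integer, major * 10 + minor, so that 13 stands for version 1.3. This page does not say whether this class exposes such raw integers or richer types; the sketch below relies only on the encoding, and its compute_capability struct and decode_version() helper are hypothetical.

```cpp
#include <cassert>

// Hypothetical value type; the library has its own version/architecture types.
// (Members avoid the names major/minor, which some C libraries define as macros.)
struct compute_capability {
    unsigned major_version;
    unsigned minor_version;
};

// Decodes a version encoded as major * 10 + minor, as in the binaryVersion and
// ptxVersion fields of the CUDA runtime's cudaFuncAttributes (13 means 1.3).
compute_capability decode_version(int encoded)
{
    assert(encoded >= 10);  // the earliest CUDA versions are 1.x
    return { static_cast<unsigned>(encoded / 10), static_cast<unsigned>(encoded % 10) };
}
```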
◆ handle()
kernel::handle_t cuda::kernel_t::handle() const [inline]
Get the raw (intra-context) CUDA handle for this kernel.
- Note
- For earlier versions of CUDA, and for a priori-compiled kernels, this handle may be unavailable.
◆ max_active_blocks_per_multiprocessor()
Calculates the number of grid blocks that may be "active" on a given GPU multiprocessor simultaneously (i.e. with warps from any of these blocks being schedulable concurrently).
- Parameters
-
block_size_in_threads | the number of threads per block assumed for the calculation |
dynamic_shared_memory_per_block | the amount of dynamic shared memory allocated to each block, in bytes |
disable_caching_override | On some GPUs, the choice of whether to cache global memory reads affects occupancy. But what if this caching results in 0 potential occupancy for a kernel? There are two options, controlled by this flag. When it is set to false, the calculator computes occupancy for such kernels as if caching were off; when set to true, it returns 0 for them. |
- See also
- The "Unified L1/Texture Cache" section of the Maxwell tuning guide regarding caching override.
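In CUDA itself, this figure comes from the occupancy API (for example, the driver's cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags, whose CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE flag corresponds to the parameter above). As a rough illustration of the reasoning only, the sketch below takes the tightest of the bounds imposed by thread count, register file, shared memory and the resident-block limit. The sm_limits and kernel_usage structs and their values are hypothetical; the model ignores allocation granularity and the caching override, so its results can differ from the real calculator's.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-SM resource limits; real values come from device properties.
struct sm_limits {
    unsigned max_threads;          // resident threads per SM
    unsigned max_blocks;           // resident blocks per SM
    unsigned registers;            // 32-bit registers per SM
    unsigned shared_memory_bytes;  // shared memory per SM available to blocks
};

// Hypothetical per-kernel resource usage.
struct kernel_usage {
    unsigned registers_per_thread;
    unsigned static_shared_memory_bytes;
};

// Simplified occupancy estimate: each resource bounds the number of resident
// blocks separately, and the tightest bound wins.
unsigned estimate_active_blocks_per_sm(const sm_limits& sm, const kernel_usage& k,
                                       unsigned block_size_in_threads,
                                       unsigned dynamic_shared_memory_per_block)
{
    assert(block_size_in_threads > 0);
    unsigned by_threads = sm.max_threads / block_size_in_threads;

    unsigned registers_per_block = k.registers_per_thread * block_size_in_threads;
    unsigned by_registers =
        registers_per_block == 0 ? sm.max_blocks : sm.registers / registers_per_block;

    unsigned shared_per_block = k.static_shared_memory_bytes + dynamic_shared_memory_per_block;
    unsigned by_shared =
        shared_per_block == 0 ? sm.max_blocks : sm.shared_memory_bytes / shared_per_block;

    return std::min({sm.max_blocks, by_threads, by_registers, by_shared});
}
```

For instance, with a hypothetical SM offering 2048 threads, 32 blocks, 65536 registers and 96 KiB of shared memory, a 256-thread block using 32 registers per thread and 16 KiB of dynamic shared memory is limited by shared memory: threads and registers each allow 8 blocks, but shared memory allows only 6.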
◆ maximum_threads_per_block()
VIRTUAL_UNLESS_CAN_GET_APRIORI_KERNEL_HANDLE grid::block_dimension_t cuda::kernel_t::maximum_threads_per_block() const [inline]
- Returns
- the maximum number of threads per block for which the GPU device can satisfy this kernel's hardware requirements (typically, the number of registers it uses).
- Note
- the kernel may have other constraints, requiring a different number of threads per block; these cannot be determined using this method.
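The register bound described above can be approximated with a simplified, hypothetical model: divide a per-block register budget by the kernel's registers per thread, round down to whole warps, and cap the result at the device's maximum block size. Real hardware also allocates registers in fixed-size chunks, which this sketch ignores; the function name and parameters are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Simplified model of a register-limited block size: a block's threads share
// the per-block register budget, and blocks consist of whole warps, so the
// count is rounded down to a multiple of the warp size.
unsigned register_limited_threads_per_block(unsigned registers_per_block,
                                            unsigned registers_per_thread,
                                            unsigned device_max_threads_per_block,
                                            unsigned warp_size = 32)
{
    assert(registers_per_thread > 0 && warp_size > 0);
    unsigned by_registers = registers_per_block / registers_per_thread;
    by_registers -= by_registers % warp_size;  // whole warps only
    return std::min(by_registers, device_max_threads_per_block);
}
```

For a 65536-register budget, a kernel using 255 registers per thread would be limited to 256 threads per block under this model, regardless of how the kernel's own logic constrains the block size.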
◆ ptx_version()
- Returns
- the PTX version used as the target for the compilation of this kernel
◆ set_cache_preference()
Indicate the desired carve-out between shared memory and L1 cache when launching this kernel, at coarse granularity.
On several NVIDIA GPU microarchitectures, the L1 cache and the shared memory in each streaming multiprocessor (SM, i.e. physical core) use the same hardware resources. The carve-out between the two uses has a device-wide value (which can be changed), but the driver can be told to use a different value for a specific function. This function does not make a demand of the CUDA runtime (as opt_in_to_extra_dynamic_memory does); rather, it indicates the split between L1 and shared memory it would like the kernel scheduler to carve out.
- Parameters
-
preference | one of: as much shared memory as possible, as much L1 as possible, or no preference (i.e. using the device default). |
- Note
- similar to set_preferred_shared_mem_fraction(), but with coarser granularity.
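CUDA's documentation for cudaFuncSetCacheConfig, the runtime call with this purpose, describes the setting as only a preference, which the runtime may override if the kernel needs a different configuration. The sketch below is purely illustrative: it uses a hypothetical cache_preference enum and assumes a Kepler-style 64 KiB on-chip array that a preference splits 48/16 or 16/48 between shared memory and L1, with "no preference" keeping the device default.

```cpp
#include <cassert>

// Hypothetical enum mirroring the three options described above.
enum class cache_preference { no_preference, prefer_shared_memory, prefer_l1 };

// A split of the on-chip array, in KiB.
struct carve_out {
    unsigned shared_kib;
    unsigned l1_kib;
};

// Illustrative model of a Kepler-style SM with 64 KiB of on-chip memory shared
// between L1 and shared memory. A preference picks the 48/16 or 16/48 split;
// "no preference" leaves the device-wide default in effect.
carve_out resolve_carve_out(cache_preference preference, carve_out device_default)
{
    assert(device_default.shared_kib + device_default.l1_kib == 64);  // model assumption
    switch (preference) {
        case cache_preference::prefer_shared_memory: return {48, 16};
        case cache_preference::prefer_l1:            return {16, 48};
        case cache_preference::no_preference:        break;
    }
    return device_default;
}
```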
◆ set_maximum_dynamic_shared_memory_per_block()
Change the hardware resource carve-out between L1 cache and shared memory for launches of the kernel to allow for at least the specified amount of shared memory.
On several NVIDIA GPU microarchitectures, the L1 cache and the shared memory in each streaming multiprocessor (SM, i.e. physical core) use the same hardware resources. The carve-out between the two uses has a device-wide value (which can be changed), but it can also be set at the level of an individual device function, by specifying the amount of shared memory the kernel may require.
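On Volta and later GPUs, a block may use up to 48 KiB of shared memory without any extra step; to go beyond that, the kernel's maximum dynamic shared memory must be raised (in the CUDA runtime, via cudaFuncSetAttribute with cudaFuncAttributeMaxDynamicSharedMemorySize), and static plus dynamic shared memory may not exceed the device's per-block opt-in limit. The hypothetical classify_request() below sketches that decision, taking both limits as parameters rather than querying a device.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classification of a per-block shared memory request.
enum class smem_request { fits_by_default, needs_opt_in, impossible };

// Static and dynamic shared memory together must fit the default per-block
// limit, or else the (larger) opt-in limit after raising the kernel's maximum.
smem_request classify_request(std::size_t static_bytes, std::size_t dynamic_bytes,
                              std::size_t default_limit, std::size_t opt_in_limit)
{
    assert(default_limit <= opt_in_limit);
    std::size_t total = static_bytes + dynamic_bytes;
    if (total <= default_limit) return smem_request::fits_by_default;
    if (total <= opt_in_limit)  return smem_request::needs_opt_in;
    return smem_request::impossible;
}
```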
◆ set_shared_memory_bank_size()
Sets a device function's preferred shared memory bank size.
- Parameters
-
config | the bank size setting to apply |
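Configurable bank width was a Kepler-era (compute capability 3.x) feature: shared memory has 32 banks, and successive 4-byte or 8-byte words (depending on the setting) map to successive banks. The sketch below, with hypothetical helper names, computes bank indices and a simplified conflict degree for a warp reading 4-byte elements at a fixed stride. Under this model, a stride of 2 causes a 2-way conflict with 4-byte banks but none with 8-byte banks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>

// Bank holding a given byte address: successive words of the configured bank
// width map to successive banks (32 banks, as on Kepler-class GPUs).
std::size_t bank_of(std::size_t byte_address, std::size_t bank_width_bytes,
                    std::size_t num_banks = 32)
{
    assert(bank_width_bytes == 4 || bank_width_bytes == 8);
    return (byte_address / bank_width_bytes) % num_banks;
}

// Simplified conflict model: the 32 threads of a warp each read the 4-byte
// element at index thread * stride. Reads of the same bank word are served
// together; distinct words in one bank are serialized. Returns the largest
// number of distinct words requested from any single bank.
std::size_t conflict_degree(std::size_t stride, std::size_t bank_width_bytes)
{
    std::map<std::size_t, std::set<std::size_t>> words_per_bank;
    for (std::size_t thread = 0; thread < 32; ++thread) {
        std::size_t address = thread * stride * 4;  // 4-byte elements
        words_per_bank[bank_of(address, bank_width_bytes)].insert(address / bank_width_bytes);
    }
    std::size_t worst = 0;
    for (const auto& bank : words_per_bank) {
        worst = std::max<std::size_t>(worst, bank.second.size());
    }
    return worst;
}
```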