cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
kernel.hpp File Reference

Contains a base wrapper class for CUDA kernels - both statically and dynamically compiled; and some related functionality.

#include "primary_context.hpp"
#include "current_context.hpp"
#include "device_properties.hpp"
#include "error.hpp"
#include "types.hpp"

Classes

class  cuda::kernel_t
 A non-owning wrapper for CUDA kernels - whether they be __global__ functions compiled a priori, the result of dynamic NVRTC compilation, or obtained in some other, future way.
 

Namespaces

 cuda
 Definitions and functionality wrapping CUDA APIs.
 

Typedefs

using cuda::kernel::shared_memory_size_determiner_t = size_t(CUDA_CB *)(int block_size)
 Signature of a function for determining the shared memory size a kernel will use, given the block size in threads.
 

Functions

kernel_t cuda::kernel::wrap (device::id_t device_id, context::handle_t context_handle, kernel::handle_t handle, bool hold_primary_context_refcount_unit=false)
 Obtain a proxy object for a CUDA kernel.
 
attribute_value_t cuda::kernel::get_attribute (const kernel_t &kernel, attribute_t attribute)
 
void cuda::kernel::set_attribute (const kernel_t &kernel, attribute_t attribute, attribute_value_t value)
 
grid::dimension_t cuda::kernel::occupancy::max_active_blocks_per_multiprocessor (const kernel_t &kernel, grid::block_dimension_t block_size_in_threads, memory::shared::size_t dynamic_shared_memory_per_block, bool disable_caching_override=false)
 See the Driver API documentation for cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.
 
bool cuda::operator== (const kernel_t &lhs, const kernel_t &rhs) noexcept
 
bool cuda::operator!= (const kernel_t &lhs, const kernel_t &rhs) noexcept
 

Detailed Description

Contains a base wrapper class for CUDA kernels - both statically and dynamically compiled; and some related functionality.

Note
This file does not define any kernels itself.

Typedef Documentation

◆ shared_memory_size_determiner_t

using cuda::kernel::shared_memory_size_determiner_t = size_t (CUDA_CB *)(int block_size)

Signature of a function for determining the shared memory size a kernel will use, given the block size in threads.

Such a function is necessary for allowing CUDA to determine an optimal block size for a kernel: CUDA cannot itself work out how much shared memory a given block size requires, and it needs that value for several different candidate block sizes.
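As an illustration of the signature above, here is a minimal sketch of a conforming function. The name `one_float_per_thread` and the one-float-per-thread policy are hypothetical and not part of the library. The fallback definition of `CUDA_CB` is an assumption made only so that the sketch compiles without the CUDA headers; with `cuda.h` included, the real macro is used.

```cpp
#include <cstddef>

// CUDA_CB normally comes from cuda.h; this fallback is an assumption that
// lets the sketch compile on its own, without the CUDA toolkit.
#ifndef CUDA_CB
#define CUDA_CB
#endif

// Local alias mirroring the documented typedef (redeclared here only for
// the purposes of the sketch).
using shared_memory_size_determiner_t = std::size_t (CUDA_CB *)(int block_size);

// Hypothetical determiner: the kernel is assumed to need one float of
// shared memory per thread in the block.
std::size_t CUDA_CB one_float_per_thread(int block_size)
{
    return static_cast<std::size_t>(block_size) * sizeof(float);
}

// The function converts to the determiner type, so it can be handed to
// anything expecting a shared_memory_size_determiner_t.
shared_memory_size_determiner_t determiner = one_float_per_thread;
```

A block-size search would invoke the determiner once per candidate block size, which is why it has to be a callable rather than a single fixed value.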

Function Documentation

◆ wrap()

kernel_t cuda::kernel::wrap ( device::id_t  device_id,
context::handle_t  context_handle,
kernel::handle_t  handle,
bool  hold_primary_context_refcount_unit = false 
)
inline

Obtain a proxy object for a CUDA kernel.

Note
This is a named constructor idiom, existing in place of direct access to the constructor of the same signature, to emphasize that no new kernel is created.
Parameters
device_id: Device of the context in which the kernel was created
context_handle: Handle of the context in which the kernel was created
handle: Raw CUDA driver handle for the kernel
hold_primary_context_refcount_unit: When the kernel's context is a device's primary context, this controls whether that context must be kept active while the kernel wrapper continues to exist.
Returns
a wrapper object associated with the specified kernel
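
The named constructor idiom mentioned in the note can be sketched with stand-in types, independent of CUDA. Everything in the `sketch` namespace below is hypothetical: `kernel_proxy`, the integer handle types, and the equality semantics are illustrative assumptions, not the library's actual definitions.

```cpp
#include <cstdint>

namespace sketch {

// Stand-ins for device::id_t, context::handle_t and kernel::handle_t
// (assumed types, chosen only so the sketch is self-contained).
using device_id_t      = int;
using context_handle_t = std::uintptr_t;
using kernel_handle_t  = std::uintptr_t;

// A non-owning proxy: it only records identifiers and never creates or
// destroys the underlying kernel.
class kernel_proxy {
public:
    device_id_t      device_id()      const noexcept { return device_id_; }
    context_handle_t context_handle() const noexcept { return context_handle_; }
    kernel_handle_t  handle()         const noexcept { return handle_; }

private:
    // The constructor is private; wrap() is the only way in.
    kernel_proxy(device_id_t d, context_handle_t c, kernel_handle_t h) noexcept
        : device_id_(d), context_handle_(c), handle_(h) {}

    friend kernel_proxy wrap(device_id_t, context_handle_t, kernel_handle_t) noexcept;

    device_id_t      device_id_;
    context_handle_t context_handle_;
    kernel_handle_t  handle_;
};

// The named constructor: the name says "wrap something that already exists".
kernel_proxy wrap(device_id_t d, context_handle_t c, kernel_handle_t h) noexcept
{
    return kernel_proxy(d, c, h);
}

// Assumed semantics for the sketch: two proxies are equal when they refer
// to the same kernel in the same context on the same device.
bool operator==(const kernel_proxy& lhs, const kernel_proxy& rhs) noexcept
{
    return lhs.device_id() == rhs.device_id()
        && lhs.context_handle() == rhs.context_handle()
        && lhs.handle() == rhs.handle();
}

bool operator!=(const kernel_proxy& lhs, const kernel_proxy& rhs) noexcept
{
    return !(lhs == rhs);
}

} // namespace sketch
```

Because the proxy owns nothing, wrapping the same identifiers twice yields two objects that compare equal and can be copied freely.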