cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
Contains a base wrapper class for CUDA kernels, both statically and dynamically compiled, and some related functionality.
#include "primary_context.hpp"
#include "current_context.hpp"
#include "device_properties.hpp"
#include "error.hpp"
#include "types.hpp"
Classes | |
class | cuda::kernel_t |
A non-owning wrapper for CUDA kernels, whether they are __global__ functions compiled a priori, the result of dynamic NVRTC compilation, or obtained in some other future way. | |
Namespaces | |
cuda | |
Definitions and functionality wrapping CUDA APIs. | |
Typedefs | |
using | cuda::kernel::shared_memory_size_determiner_t = size_t(CUDA_CB *)(int block_size) |
Signature of a function for determining the shared memory size a kernel will use, given the block size in threads. | |
Functions | |
kernel_t | cuda::kernel::wrap (device::id_t device_id, context::handle_t context_handle, kernel::handle_t handle, bool hold_primary_context_refcount_unit=false) |
Obtain a proxy object for a CUDA kernel. | |
attribute_value_t | cuda::kernel::get_attribute (const kernel_t &kernel, attribute_t attribute) |
void | cuda::kernel::set_attribute (const kernel_t &kernel, attribute_t attribute, attribute_value_t value) |
grid::dimension_t | cuda::kernel::occupancy::max_active_blocks_per_multiprocessor (const kernel_t &kernel, grid::block_dimension_t block_size_in_threads, memory::shared::size_t dynamic_shared_memory_per_block, bool disable_caching_override=false) |
See the Driver API documentation for cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags. | |
bool | cuda::operator== (const kernel_t &lhs, const kernel_t &rhs) noexcept |
bool | cuda::operator!= (const kernel_t &lhs, const kernel_t &rhs) noexcept |
using cuda::kernel::shared_memory_size_determiner_t = size_t (CUDA_CB *)(int block_size)
Signature of a function for determining the shared memory size a kernel will use, given the block size in threads.
This function is necessary for allowing CUDA to determine an optimal block size for a kernel, since CUDA cannot itself determine the shared memory size the kernel will use, and it needs that value for several different candidate block sizes.
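As a rough illustration of the idea, the following sketch defines a function matching this signature and evaluates it for several candidate block sizes. It is not the library's or CUDA's own selection logic: the local typedef is a stand-in that omits `CUDA_CB` so the code compiles without CUDA headers, and `one_float_per_thread` and `largest_block_size_within` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for cuda::kernel::shared_memory_size_determiner_t.
// Assumption: CUDA_CB (a calling-convention macro from the CUDA headers)
// is left out so that this sketch builds without CUDA installed.
using shared_memory_size_determiner_t = std::size_t (*)(int block_size);

// Hypothetical kernel requirement: one float of dynamic shared memory
// per thread in the block.
std::size_t one_float_per_thread(int block_size)
{
    assert(block_size >= 0);
    return static_cast<std::size_t>(block_size) * sizeof(float);
}

// Hypothetical helper: query the determiner for each candidate block size
// (multiples of `step` up to `max_block_size`) and return the largest one
// whose shared memory requirement fits within `budget_in_bytes`.
// Returns 0 if no candidate fits.
int largest_block_size_within(
    shared_memory_size_determiner_t determiner,
    std::size_t budget_in_bytes,
    int max_block_size,
    int step)
{
    int best = 0;
    for (int block_size = step; block_size <= max_block_size; block_size += step) {
        if (determiner(block_size) <= budget_in_bytes) {
            best = block_size;
        }
    }
    return best;
}
```

The point of passing a function rather than a single number is that the caller does not know in advance which block size will be chosen, so the requirement has to be computable for each candidate.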
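kernel_t cuda::kernel::wrap (device::id_t device_id, context::handle_t context_handle, kernel::handle_t handle, bool hold_primary_context_refcount_unit = false)
inline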
Obtain a proxy object for a CUDA kernel.
device_id | Device of the context in which the kernel was created |
context_handle | Handle of the context in which the kernel was created |
handle | Raw CUDA driver handle for the kernel |
hold_primary_context_refcount_unit | When the kernel's context is a device's primary context, this controls whether that context must be kept active while the kernel proxy continues to exist. |
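The effect of `hold_primary_context_refcount_unit` can be pictured with a small mock. This is a simplified sketch and not the library's implementation: `mock_primary_context`, `mock_kernel_proxy` and `context_active_while_proxy_exists` are hypothetical names, and the assumption that the proxy itself acquires the reference unit on construction (rather than taking over one acquired by the caller) is made only to keep the example short.

```cpp
#include <cassert>

// Hypothetical stand-in for a device's primary context, which stays
// active only while its reference count is positive.
struct mock_primary_context {
    int refcount = 0;
    bool is_active() const { return refcount > 0; }
};

// Hypothetical non-owning proxy: it never destroys the kernel or the
// context, but may hold one reference unit on the primary context.
class mock_kernel_proxy {
public:
    mock_kernel_proxy(mock_primary_context& context, bool hold_refcount_unit)
        : context_(context), holds_refcount_unit_(hold_refcount_unit)
    {
        // Assumption for this sketch: the unit is acquired here.
        if (holds_refcount_unit_) { ++context_.refcount; }
    }

    mock_kernel_proxy(const mock_kernel_proxy&) = delete;
    mock_kernel_proxy& operator=(const mock_kernel_proxy&) = delete;

    ~mock_kernel_proxy()
    {
        if (holds_refcount_unit_) {
            assert(context_.refcount > 0);
            --context_.refcount; // release the unit held by this proxy
        }
    }

private:
    mock_primary_context& context_;
    bool holds_refcount_unit_;
};

// Reports whether the mock context is active while a proxy exists,
// with no other holder of reference units.
bool context_active_while_proxy_exists(bool hold_refcount_unit)
{
    mock_primary_context context;
    mock_kernel_proxy proxy(context, hold_refcount_unit);
    return context.is_active();
}
```

With the flag left at its default of `false`, the proxy relies on some other part of the program to keep the primary context active for as long as the kernel is used.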