Contains a base wrapper class for CUDA kernels - both statically and dynamically compiled; and some related functionality.

#include "primary_context.hpp"
#include "current_context.hpp"
#include "device_properties.hpp"
#include "error.hpp"
#include "types.hpp"

[Figures: include dependency graph for kernel.hpp, and graph of the files that directly or indirectly include it]

Classes
class	cuda::kernel_t
	A non-owning wrapper for CUDA kernels - whether they be `__global__` functions compiled a priori, the result of dynamic NVRTC compilation, or obtained in some other, future way.

Namespaces
	cuda
	Definitions and functionality wrapping CUDA APIs.

Typedefs
using	cuda::kernel::shared_memory_size_determiner_t = size_t(CUDA_CB *)(int block_size)
	Signature of a function for determining the shared memory size a kernel will use, given the block size in threads.

Functions
kernel_t	cuda::kernel::wrap (device::id_t device_id, context::handle_t context_handle, kernel::handle_t handle, bool hold_primary_context_refcount_unit=false)
	Obtain a proxy object for a CUDA kernel.

attribute_value_t	cuda::kernel::get_attribute (const kernel_t &kernel, attribute_t attribute)

void	cuda::kernel::set_attribute (const kernel_t &kernel, attribute_t attribute, attribute_value_t value)

grid::dimension_t	cuda::kernel::occupancy::max_active_blocks_per_multiprocessor (const kernel_t &kernel, grid::block_dimension_t block_size_in_threads, memory::shared::size_t dynamic_shared_memory_per_block, bool disable_caching_override=false)
	See the Driver API documentation for cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags.

bool	cuda::operator== (const kernel_t &lhs, const kernel_t &rhs) noexcept

bool	cuda::operator!= (const kernel_t &lhs, const kernel_t &rhs) noexcept

Detailed Description

Contains a base wrapper class for CUDA kernels - both statically and dynamically compiled; and some related functionality.

Note: This file does not define any kernels itself.

Typedef Documentation

◆ shared_memory_size_determiner_t

using cuda::kernel::shared_memory_size_determiner_t = size_t (CUDA_CB *)(int block_size)

Signature of a function for determining the shared memory size a kernel will use, given the block size in threads.

This function is necessary for allowing CUDA to determine an optimal block size for a kernel, since CUDA cannot itself determine the shared memory size (and it will need that value for several different candidate block sizes).
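As a minimal sketch of what such a determiner might look like, the following standalone code defines a function with the same signature and queries it for several candidate block sizes. It does not use the CUDA toolkit: `CUDA_CB` is replaced by an empty stand-in macro, and `one_float_per_thread` and `largest_block_size_within` are hypothetical names introduced only for illustration. The actual block size search performed by CUDA also weighs occupancy, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: stand-in for the calling-convention macro from the CUDA driver
// header, so that this sketch compiles without the CUDA toolkit.
#ifndef CUDA_CB
#define CUDA_CB
#endif

// Same shape as cuda::kernel::shared_memory_size_determiner_t.
using shared_memory_size_determiner_t = std::size_t (CUDA_CB *)(int block_size);

// Hypothetical kernel requirement: one float of shared memory per thread.
std::size_t CUDA_CB one_float_per_thread(int block_size)
{
    return static_cast<std::size_t>(block_size) * sizeof(float);
}

// Hypothetical search: query the determiner for each candidate block size and
// return the largest one whose shared memory need fits within the budget,
// or 0 if none fits.
int largest_block_size_within(
    shared_memory_size_determiner_t determiner,
    const std::vector<int>& candidates,
    std::size_t shared_memory_budget)
{
    assert(determiner != nullptr);
    int best = 0;
    for (int block_size : candidates) {
        if (determiner(block_size) <= shared_memory_budget && block_size > best) {
            best = block_size;
        }
    }
    return best;
}
```

The determiner is called once per candidate block size, which is why it must be a function of the block size rather than a single fixed value.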

Function Documentation

◆ wrap()

kernel_t cuda::kernel::wrap	(	device::id_t	device_id,
		context::handle_t	context_handle,
		kernel::handle_t	handle,
		bool	hold_primary_context_refcount_unit = `false`
	)

inline

Obtain a proxy object for a CUDA kernel.

Note: This is a named constructor idiom, existing instead of direct access to the constructor of the same signature, to emphasize that no new kernel is created.

Parameters

device_id	Device of the context in which the kernel was created
context_handle	Handle of the context in which the kernel was created
handle	Raw CUDA driver handle for the kernel
hold_primary_context_refcount_unit	When the kernel's context is a device's primary context, this controls whether that context must be kept active while the kernel wrapper continues to exist.

Returns: a wrapper object associated with the specified kernel
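The named constructor idiom mentioned in the note can be illustrated with a standalone mock. This is a sketch under stated assumptions, not the library's implementation: the `sketch` namespace, the integer stand-ins for the handle types, the accessor names, and the field-wise equality comparison are all hypothetical. The point it shows is that the constructor is private and `wrap()` is the only way to obtain a wrapper, which merely records the identifiers it is given.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical stand-ins for device::id_t, context::handle_t and kernel::handle_t.
using device_id_t = int;
using context_handle_t = std::uintptr_t;
using kernel_handle_t = std::uintptr_t;

class kernel_t;

// The named constructor; the default argument mirrors the documented signature.
kernel_t wrap(device_id_t device_id, context_handle_t context_handle,
              kernel_handle_t handle, bool hold_primary_context_refcount_unit = false);

class kernel_t {
public:
    device_id_t device_id() const noexcept { return device_id_; }
    context_handle_t context_handle() const noexcept { return context_handle_; }
    kernel_handle_t handle() const noexcept { return handle_; }
    bool holds_primary_context_refcount_unit() const noexcept { return holds_refcount_unit_; }

private:
    // Private: user code cannot call this directly and must go through wrap().
    kernel_t(device_id_t device_id, context_handle_t context_handle,
             kernel_handle_t handle, bool hold_refcount_unit)
        : device_id_(device_id), context_handle_(context_handle),
          handle_(handle), holds_refcount_unit_(hold_refcount_unit) {}

    friend kernel_t wrap(device_id_t, context_handle_t, kernel_handle_t, bool);

    device_id_t device_id_;
    context_handle_t context_handle_;
    kernel_handle_t handle_;
    bool holds_refcount_unit_;
};

// Non-owning: only stores the identifiers; no kernel is created or compiled here.
inline kernel_t wrap(device_id_t device_id, context_handle_t context_handle,
                     kernel_handle_t handle, bool hold_primary_context_refcount_unit)
{
    assert(handle != 0);
    return kernel_t(device_id, context_handle, handle, hold_primary_context_refcount_unit);
}

// Assumption: two wrappers are equal when they refer to the same kernel,
// i.e. the same device, context and handle.
inline bool operator==(const kernel_t& lhs, const kernel_t& rhs) noexcept
{
    return lhs.device_id() == rhs.device_id()
        && lhs.context_handle() == rhs.context_handle()
        && lhs.handle() == rhs.handle();
}

inline bool operator!=(const kernel_t& lhs, const kernel_t& rhs) noexcept
{
    return !(lhs == rhs);
}

} // namespace sketch
```

In this mock, calling `sketch::wrap` twice with the same identifiers yields two wrappers that compare equal, which reflects the idea that the wrapper is a proxy for an existing kernel rather than a new kernel.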
