cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
CUDA-Device-global memory on a single device (not accessible from the host).
Typedefs
using | address_t = CUdeviceptr | The numeric type which can represent the range of memory addresses on a CUDA device.
using | unique_region = memory::unique_region< detail_::deleter > | A unique region of device-global memory.
Functions
void | free (void *ptr) | Free a region of device-side memory (regardless of how it was allocated).
void | free (region_t region) | Free a region of device-side memory (regardless of how it was allocated).
region_t | allocate (const context_t &context, size_t size_in_bytes) | Allocate device-side memory on a CUDA device context.
region_t | allocate (const device_t &device, size_t size_in_bytes) | Allocate device-side memory on a CUDA device.
template<typename T > void | typed_set (T *start, const T &value, size_t num_elements, optional_ref< const stream_t > stream={}) | Sets consecutive elements of a region of memory to a fixed value of some width.
void | set (void *start, int byte_value, size_t num_bytes, optional_ref< const stream_t > stream={}) | Sets all bytes in a region of memory to a fixed value.
void | set (region_t region, int byte_value, optional_ref< const stream_t > stream={}) | Sets all bytes in a region of memory to a fixed value.
void | zero (void *start, size_t num_bytes, optional_ref< const stream_t > stream={}) | Sets all bytes in a region of memory to 0 (zero).
void | zero (region_t region, optional_ref< const stream_t > stream={}) | Sets all bytes in a region of memory to 0 (zero).
template<typename T > void | zero (T *ptr, optional_ref< const stream_t > stream={}) | Sets all bytes of a single pointed-to value to 0.
template<typename T > unique_span< T > | make_unique_span (const context_t &context, size_t size) | Allocate memory for a consecutive sequence of typed elements in device-global memory.
template<typename T > unique_span< T > | make_unique_span (const device_t &device, size_t size) | Allocate memory for a consecutive sequence of typed elements in device-global memory.
template<typename T > unique_span< T > | make_unique_span (size_t size) | Allocate memory for a consecutive sequence of typed elements in device-global memory of the current device.
unique_region | make_unique_region (const context_t &context, size_t num_bytes) | Allocate a region in device-global memory.
unique_region | make_unique_region (const device_t &device, size_t num_elements) | Allocate a region in the device-global memory of the given device, owned by a unique_region (a ::std::unique_ptr-like handle).
unique_region | make_unique_region (size_t num_elements) | Allocate a region in device-global memory within the primary context of the current device, owned by a unique_region (a ::std::unique_ptr-like handle).
address_t | address (const void *device_ptr) noexcept | The numeric address of a device pointer.
address_t | address (memory::const_region_t region) noexcept | The numeric address of the start of a region.
using cuda::memory::device::address_t = typedef CUdeviceptr |
The numeric type which can represent the range of memory addresses on a CUDA device.
As these addresses typically lie within the single, unified, system-wide memory address space, this type should be the same as the numeric equivalent of a system memory address.
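address_t cuda::memory::device::address (const void *device_ptr) | inline noexcept |
address_t cuda::memory::device::address (memory::const_region_t region) | inline noexcept |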
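Judging by their signatures, the two address() overloads yield the numeric form of a device pointer, or of the start of a region, so address arithmetic mirrors byte-wise pointer arithmetic. The following is a host-only sketch of that relationship, not the library's implementation: std::uintptr_t stands in for CUdeviceptr, and model_region / model_address are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Host-only model: std::uintptr_t stands in for CUdeviceptr (assumption).
using model_address_t = std::uintptr_t;

// Hypothetical stand-in for memory::const_region_t: a (start, size) pair.
struct model_region {
    const void* start;
    std::size_t size;
};

// Analogue of address(const void*): the pointer's numeric value.
inline model_address_t model_address(const void* ptr) noexcept
{
    return reinterpret_cast<model_address_t>(ptr);
}

// Analogue of address(const_region_t): the numeric address of the region's start.
inline model_address_t model_address(model_region region) noexcept
{
    return model_address(region.start);
}
```

Differences between two such addresses are byte offsets, which is why address_t must be wide enough to span the whole device address range.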
region_t cuda::memory::device::allocate (const context_t &context, size_t size_in_bytes)
Allocate device-side memory on a CUDA device context.
cuda::runtime_error | if allocation fails for any reason |
context | the context in which to allocate memory |
size_in_bytes | the amount of global device memory to allocate |
region_t cuda::memory::device::allocate (const device_t &device, size_t size_in_bytes)
Allocate device-side memory on a CUDA device.
cuda::runtime_error | if allocation fails for any reason |
device | the device on which to allocate memory |
size_in_bytes | the amount of global device memory to allocate |
void cuda::memory::device::free (region_t region) | inline |
Free a region of device-side memory (regardless of how it was allocated).
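The region_t returned by allocate() is a plain (start, size) description that does not own the memory, so every allocate() call should eventually be paired with a free(). The host-only sketch below models that contract under stated assumptions: std::malloc and std::free stand in for the device allocator, std::bad_alloc stands in for cuda::runtime_error, and host_region, model_allocate and model_free are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for region_t: a non-owning (start, size) pair.
struct host_region {
    void* start;
    std::size_t size;
};

// Model of allocate(): throws if allocation fails (std::bad_alloc here,
// in place of cuda::runtime_error).
inline host_region model_allocate(std::size_t size_in_bytes)
{
    void* p = std::malloc(size_in_bytes == 0 ? 1 : size_in_bytes);
    if (p == nullptr) { throw std::bad_alloc{}; }
    return host_region{p, size_in_bytes};
}

// Model of free(region_t): the region owns nothing, so release is explicit.
inline void model_free(host_region region)
{
    std::free(region.start);
}
```

Forgetting the matching free leaks the allocation; the owning make_unique_region() and make_unique_span() functions exist to avoid exactly that.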
unique_region cuda::memory::device::make_unique_region (const context_t &context, size_t num_bytes) | inline |
Allocate a region in device-global memory.
context | The context within which (and in the device-global memory of which) to make the allocation |
num_bytes | Size of the region to be allocated, in bytes |
unique_region cuda::memory::device::make_unique_region (const device_t &device, size_t num_elements) | inline |
Allocate a region in the device-global memory of the given device, owned by a unique_region (a ::std::unique_ptr-like handle).
device | The device in the global memory of which to make the allocation |
num_elements | Size of the region to be allocated; as a unique_region is untyped, this is a byte count |
unique_region cuda::memory::device::make_unique_region (size_t num_elements) | inline |
Allocate a region in device-global memory within the primary context of the current CUDA device, owned by a unique_region (a ::std::unique_ptr-like handle).
num_elements | Size of the region to be allocated; as a unique_region is untyped, this is a byte count |
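Unlike a bare region_t, a unique_region owns its allocation and releases it when it goes out of scope, much as ::std::unique_ptr does with a custom deleter. The host-only sketch below models that ownership; it is not the library's implementation. std::malloc and std::free stand in for the device allocator, model_deleter stands in for detail_::deleter, and all model_* names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Counts releases so that the RAII behaviour can be observed.
int model_release_count = 0;

// Hypothetical deleter standing in for detail_::deleter.
struct model_deleter {
    void operator()(void* p) const noexcept
    {
        std::free(p);
        ++model_release_count;
    }
};

// Hypothetical owning region: releases its memory on destruction; move-only.
class model_unique_region {
public:
    explicit model_unique_region(std::size_t num_bytes)
        : ptr_(std::malloc(num_bytes == 0 ? 1 : num_bytes)), size_(num_bytes)
    {
        if (!ptr_) { throw std::bad_alloc{}; }
    }
    void* start() const noexcept { return ptr_.get(); }
    std::size_t size() const noexcept { return size_; }

private:
    std::unique_ptr<void, model_deleter> ptr_;
    std::size_t size_;
};
```

Because the handle is move-only, ownership can be transferred (e.g. returned from a function) but never duplicated, so the memory is released exactly once.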
template<typename T > unique_span< T > cuda::memory::device::make_unique_span (const context_t &context, size_t size)
Allocate memory for a consecutive sequence of typed elements in device-global memory.
T | type of the individual elements in the allocated sequence |
context | The CUDA device context in which to make the allocation. |
size | the number of elements to allocate |
template<typename T > unique_span< T > cuda::memory::device::make_unique_span (const device_t &device, size_t num_elements)
Allocate memory for a consecutive sequence of typed elements in device-global memory.
T | type of the individual elements in the allocated sequence |
device | The CUDA device in whose primary context to make the allocation. |
num_elements | the number of elements to allocate |
template<typename T > unique_span< T > cuda::memory::device::make_unique_span (size_t num_elements)
Allocate memory for a consecutive sequence of typed elements in the device-global memory of the current device.
T | type of the individual elements in the allocated sequence |
num_elements | the number of elements to allocate |
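The key difference from make_unique_region() is that make_unique_span<T>() is sized in elements of T rather than in bytes. A host-only sketch of such a typed, owning span follows; host memory stands in for device-global memory, and model_unique_span / model_make_unique_span are hypothetical names, not the library's types.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical host-side analogue of unique_span<T>: owns num_elements
// consecutive elements of type T and frees them on destruction.
template <typename T>
class model_unique_span {
public:
    explicit model_unique_span(std::size_t num_elements)
        : data_(new T[num_elements]), size_(num_elements) {}
    T* data() const noexcept { return data_.get(); }
    std::size_t size() const noexcept { return size_; }  // in elements
    std::size_t size_bytes() const noexcept { return size_ * sizeof(T); }

private:
    std::unique_ptr<T[]> data_;
    std::size_t size_;
};

// Analogue of make_unique_span<T>(num_elements): the count is in elements.
template <typename T>
model_unique_span<T> model_make_unique_span(std::size_t num_elements)
{
    return model_unique_span<T>(num_elements);
}
```

Requesting 10 doubles thus reserves 10 * sizeof(double) bytes; passing a byte count here by mistake would over-allocate by a factor of sizeof(T).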
void cuda::memory::device::set (void *start, int byte_value, size_t num_bytes, optional_ref< const stream_t > stream = {}) | inline |
Sets all bytes in a region of memory to a fixed value. If a stream is specified, the operation is scheduled asynchronously on that stream.
start | starting address of the memory region to set, in a CUDA device's global memory |
byte_value | value to set the memory region to |
num_bytes | size of the memory region in bytes |
stream | A stream on which to schedule this action; may be omitted. |
void cuda::memory::device::set (region_t region, int byte_value, optional_ref< const stream_t > stream = {}) | inline |
Sets all bytes in a region of memory to a fixed value.
byte_value | value to set the memory region to |
region | the memory region to set, in a CUDA device's global memory |
stream | A stream on which to schedule this action; may be omitted. |
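Because set() writes the same value into every byte, applying it to a buffer of 32-bit integers with byte_value 1 does not produce elements equal to 1. The host-only analogue below uses std::memset, which has the same byte-wise semantics, to show the effect; model_set is a hypothetical name and host memory stands in for device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Host-side analogue of set(start, byte_value, num_bytes): every *byte*
// in [start, start + num_bytes) receives the same value.
inline void model_set(void* start, int byte_value, std::size_t num_bytes)
{
    std::memset(start, byte_value, num_bytes);
}
```

Each 4-byte element ends up as 0x01010101, regardless of endianness, since all of its bytes are equal. Filling with a multi-byte value is what typed_set() is for.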
template<typename T > void cuda::memory::device::typed_set (T *start, const T &value, size_t num_elements, optional_ref< const stream_t > stream = {}) | inline |
Sets consecutive elements of a region of memory to a fixed value of some width.
Note: a generalization of set(), for different-size units.
T | An unsigned integer type of size 1, 2, 4 or 8 |
start | The first location to set to value; must be properly aligned. |
value | A (properly aligned) value to set T-elements to. |
num_elements | The number of type-T elements (i.e. not necessarily the number of bytes). |
stream | A stream on which to schedule this action; may be omitted. |
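In contrast to set(), typed_set() fills whole elements, so setting 32-bit elements to 1 yields elements equal to 1, and num_elements counts elements rather than bytes. The host-side sketch below models the documented contract on T with compile-time checks; model_typed_set is a hypothetical name, and a plain loop stands in for the device-side fill.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Host-side model of typed_set(): writes `value` into num_elements
// consecutive T-sized slots starting at `start`.
template <typename T>
void model_typed_set(T* start, const T& value, std::size_t num_elements)
{
    // Mirror the documented requirement on T.
    static_assert(std::is_integral<T>::value && std::is_unsigned<T>::value,
                  "T must be an unsigned integer type");
    static_assert(sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4 || sizeof(T) == 8,
                  "T must be 1, 2, 4 or 8 bytes wide");
    for (std::size_t i = 0; i < num_elements; ++i) {
        start[i] = value;
    }
}
```

Note that num_elements is an element count: filling two uint16_t elements touches 4 bytes, and later elements are left unchanged.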
void cuda::memory::device::zero (void *start, size_t num_bytes, optional_ref< const stream_t > stream = {}) | inline |
Sets all bytes in a region of memory to 0 (zero). If a stream is specified, the operation is scheduled asynchronously on that stream.
start | the beginning of a region of memory to zero-out, in a CUDA device's global memory |
num_bytes | the size in bytes of the region of memory to zero-out |
stream | A stream on which to schedule this action; may be omitted. |
void cuda::memory::device::zero (region_t region, optional_ref< const stream_t > stream = {}) | inline |
Sets all bytes in a region of memory to 0 (zero)
region | the memory region to zero-out, accessible as a part of a CUDA device's global memory |
stream | A stream on which to schedule this action; may be omitted. |
template<typename T > void cuda::memory::device::zero (T *ptr, optional_ref< const stream_t > stream = {}) | inline |
Sets all bytes of a single pointed-to value to 0.
ptr | pointer to a value of a certain type, accessible within a CUDA device's global memory |
stream | an existing stream on which to schedule this action; may be omitted |
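The three zero() overloads differ only in how the byte range is described: an explicit (start, num_bytes) pair, a region, or a single typed pointer whose range is sizeof(T) bytes. The host-only sketch below models that overload set with std::memset; model_region and model_zero are hypothetical names, and the stream parameter is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for region_t.
struct model_region {
    void* start;
    std::size_t size;
};

// Analogue of zero(void*, size_t): clears num_bytes bytes.
inline void model_zero(void* start, std::size_t num_bytes)
{
    std::memset(start, 0, num_bytes);
}

// Analogue of zero(region_t): clears the whole region.
inline void model_zero(model_region region)
{
    model_zero(region.start, region.size);
}

// Analogue of zero(T*): clears exactly sizeof(T) bytes at ptr.
template <typename T>
void model_zero(T* ptr)
{
    model_zero(static_cast<void*>(ptr), sizeof(T));
}
```

The typed overload is convenient for clearing a single device-side struct or counter without spelling out its size.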