cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
Device-global memory on a single CUDA device (not accessible from the host).
Typedefs
| using address_t = CUdeviceptr | The numeric type which can represent the range of memory addresses on a CUDA device. |
| using unique_region = memory::unique_region< detail_::deleter > | A unique region of device-global memory. |
Functions
| void free(void *ptr) | Free a region of device-side memory (regardless of how it was allocated). |
| void free(region_t region) | Free a region of device-side memory (regardless of how it was allocated). |
| region_t allocate(const context_t &context, size_t size_in_bytes) | Allocate device-side memory on a CUDA device context. |
| region_t allocate(const device_t &device, size_t size_in_bytes) | Allocate device-side memory on a CUDA device. |
| template<typename T> void typed_set(T *start, const T &value, size_t num_elements, optional_ref<const stream_t> stream = {}) | Sets consecutive elements of a region of memory to a fixed value of some width. |
| void set(void *start, int byte_value, size_t num_bytes, optional_ref<const stream_t> stream = {}) | Sets all bytes in a region of memory to a fixed value. |
| void set(region_t region, int byte_value, optional_ref<const stream_t> stream = {}) | Sets all bytes in a region of memory to a fixed value. |
| void zero(void *start, size_t num_bytes, optional_ref<const stream_t> stream = {}) | Sets all bytes in a region of memory to 0 (zero). |
| void zero(region_t region, optional_ref<const stream_t> stream = {}) | Sets all bytes in a region of memory to 0 (zero). |
| template<typename T> void zero(T *ptr, optional_ref<const stream_t> stream = {}) | Sets all bytes of a single pointed-to value to 0. |
| template<typename T> unique_span<T> make_unique_span(const context_t &context, size_t size) | Allocate memory for a consecutive sequence of typed elements in device-global memory. |
| template<typename T> unique_span<T> make_unique_span(const device_t &device, size_t size) | Allocate memory for a consecutive sequence of typed elements in device-global memory. |
| template<typename T> unique_span<T> make_unique_span(size_t size) | Allocate memory for a consecutive sequence of typed elements in device-global memory. |
| unique_region make_unique_region(const context_t &context, size_t num_bytes) | Allocate a region in device-global memory. |
| unique_region make_unique_region(const device_t &device, size_t num_elements) | Create a variant of ::std::unique_ptr for an array in device-global memory. |
| unique_region make_unique_region(size_t num_elements) | Create a variant of ::std::unique_ptr for an array in device-global memory on the current device. |
| address_t address(const void *device_ptr) noexcept | |
| address_t address(memory::const_region_t region) noexcept | |
using cuda::memory::device::address_t = CUdeviceptr
The numeric type which can represent the range of memory addresses on a CUDA device.
As these addresses are typically just part of the single, unified, system-wide memory space, this should be the same type as the numeric equivalent of a system memory address.
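address_t cuda::memory::device::address(const void *device_ptr) [inline, noexcept]

address_t cuda::memory::device::address(memory::const_region_t region) [inline, noexcept]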
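For illustration, the following host-only sketch shows what a numeric address amounts to. It assumes, in line with the address_t description, that an address is simply the integer value of a pointer; `std::uintptr_t` stands in for `CUdeviceptr`, and `host_model::const_region` is a hypothetical stand-in for `memory::const_region_t`. It is a model operating on host memory, not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace host_model {

// Stand-in for CUdeviceptr (assumption): an unsigned integer wide enough for any pointer.
using address_t = std::uintptr_t;

// Hypothetical stand-in for memory::const_region_t: a start pointer plus a size in bytes.
struct const_region {
    const void* start;
    std::size_t size;
};

// Numeric value of a pointer.
inline address_t address(const void* ptr) noexcept {
    return reinterpret_cast<address_t>(ptr);
}

// The address of a region is the address of its first byte.
inline address_t address(const_region region) noexcept {
    return address(region.start);
}

}  // namespace host_model
```

Because the result is a plain integer, the byte offset between two addresses can be computed by ordinary subtraction.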
region_t cuda::memory::device::allocate(const context_t &context, size_t size_in_bytes)
Allocate device-side memory on a CUDA device context.
Exceptions:
| cuda::runtime_error | if allocation fails for any reason |
Parameters:
| context | the context in which to allocate memory |
| size_in_bytes | the amount of global device memory to allocate |
region_t cuda::memory::device::allocate(const device_t &device, size_t size_in_bytes)
Allocate device-side memory on a CUDA device.
Exceptions:
| cuda::runtime_error | if allocation fails for any reason |
Parameters:
| device | the device on which to allocate memory |
| size_in_bytes | the amount of global device memory to allocate |
void cuda::memory::device::free(region_t region) [inline]
Free a region of device-side memory (regardless of how it was allocated).
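The raw allocate() overloads and free() form a manual pair: the returned region_t does not own its memory (ownership is what the separate unique_region type is for), so each allocation needs a matching free(). The sketch below models that contract on the host under stated assumptions: `host_model::region_t` is a hypothetical (start, size) pair, `std::malloc`/`std::free` replace device allocation, the context or device argument is dropped, and `std::bad_alloc` stands in for the `cuda::runtime_error` the real function throws.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

namespace host_model {

// Hypothetical stand-in for region_t: a non-owning (start, size) pair.
struct region_t {
    void* start;
    std::size_t size;
};

// Host model of allocate(): obtain size_in_bytes bytes, or throw on failure.
inline region_t allocate(std::size_t size_in_bytes) {
    void* p = std::malloc(size_in_bytes);
    if (p == nullptr && size_in_bytes != 0) { throw std::bad_alloc{}; }
    return region_t{p, size_in_bytes};
}

// Host model of free(region_t): release the memory the region refers to.
inline void free(region_t region) {
    std::free(region.start);
}

}  // namespace host_model
```

Since nothing in `region_t` tracks ownership, forgetting the final `free()` call leaks the allocation.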
unique_region cuda::memory::device::make_unique_region(const context_t &context, size_t num_bytes) [inline]
Allocate a region in device-global memory.
Parameters:
| context | The context within which (and in the device-global memory of which) to make the allocation |
| num_bytes | Size of the region to be allocated, in bytes |
unique_region cuda::memory::device::make_unique_region(const device_t &device, size_t num_elements) [inline]
Create a variant of ::std::unique_ptr for an array in device-global memory; that is, allocate a region in device-global memory whose lifetime is tied to the returned object.
Parameters:
| device | The device in whose global memory to make the allocation |
| num_elements | the number of elements to allocate |
unique_region cuda::memory::device::make_unique_region(size_t num_elements) [inline]
Create a variant of ::std::unique_ptr for an array in device-global memory on the current device; that is, allocate a region in device-global memory within the primary context of the current CUDA device.
Parameters:
| num_elements | the number of elements to allocate |
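The make_unique_region() overloads return an owning object instead, in the spirit of ::std::unique_ptr. The following is a minimal host-only sketch of that idea, assuming a `std::unique_ptr` with a custom deleter is a fair model of `unique_region`. All names are hypothetical and live in `host_model`, and `std::malloc`/`std::free` replace device allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <utility>

namespace host_model {

// Counts releases so that the ownership behaviour can be observed.
int frees_performed = 0;

// Hypothetical deleter, playing the role of detail_::deleter.
struct deleter {
    void operator()(void* p) const noexcept {
        std::free(p);
        ++frees_performed;
    }
};

// Owning region: a unique_ptr-managed start address plus a size in bytes.
// Move-only, because std::unique_ptr is move-only.
struct unique_region {
    std::unique_ptr<void, deleter> owner;
    std::size_t size = 0;
    void* start() const noexcept { return owner.get(); }
};

// Host model of make_unique_region(num_bytes): allocate, then hand ownership
// to the returned object, which frees the memory when it is destroyed.
inline unique_region make_unique_region(std::size_t num_bytes) {
    void* p = std::malloc(num_bytes);
    if (p == nullptr && num_bytes != 0) { throw std::bad_alloc{}; }
    return unique_region{std::unique_ptr<void, deleter>{p}, num_bytes};
}

}  // namespace host_model
```

Moving transfers ownership, and the memory is released exactly once, when the last owner goes out of scope.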
template<typename T> unique_span<T> cuda::memory::device::make_unique_span(const context_t &context, size_t size)
Allocate memory for a consecutive sequence of typed elements in device-global memory.
Template Parameters:
| T | type of the individual elements in the allocated sequence |
Parameters:
| context | The CUDA device context in which to make the allocation. |
| size | the number of elements to allocate |
template<typename T> unique_span<T> cuda::memory::device::make_unique_span(const device_t &device, size_t num_elements)
Allocate memory for a consecutive sequence of typed elements in device-global memory.
Template Parameters:
| T | type of the individual elements in the allocated sequence |
Parameters:
| device | The CUDA device in whose primary context to make the allocation. |
| num_elements | the number of elements to allocate |
template<typename T> unique_span<T> cuda::memory::device::make_unique_span(size_t num_elements)
Allocate memory for a consecutive sequence of typed elements in device-global memory on the current device.
Template Parameters:
| T | type of the individual elements in the allocated sequence |
Parameters:
| num_elements | the number of elements to allocate |
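Unlike make_unique_region(context, num_bytes), whose size is given in bytes, the make_unique_span() overloads take a number of typed elements. The host-only sketch below makes the element/byte distinction explicit. `host_model::unique_span` and its `size_bytes()` member are hypothetical, and heap memory stands in for device-global memory.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

namespace host_model {

// Hypothetical owning span: a typed, owning pointer plus an element count.
template <typename T>
class unique_span {
public:
    explicit unique_span(std::size_t size) : data_(new T[size]), size_(size) {}
    T* data() const noexcept { return data_.get(); }
    std::size_t size() const noexcept { return size_; }                     // in elements
    std::size_t size_bytes() const noexcept { return size_ * sizeof(T); }  // in bytes

private:
    std::unique_ptr<T[]> data_;  // released automatically on destruction
    std::size_t size_;
};

// Host model of make_unique_span<T>(size): size counts elements of type T.
template <typename T>
unique_span<T> make_unique_span(std::size_t size) {
    return unique_span<T>(size);
}

}  // namespace host_model
```

Asking for 10 `double`s therefore reserves `10 * sizeof(double)` bytes, not 10 bytes.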
void cuda::memory::device::set(void *start, int byte_value, size_t num_bytes, optional_ref<const stream_t> stream = {}) [inline]
Sets all bytes in a region of memory to a fixed value. If a stream is passed, the operation is scheduled on it asynchronously.
Parameters:
| start | starting address of the memory region to set, in a CUDA device's global memory |
| byte_value | value to set the memory region to |
| num_bytes | size of the memory region in bytes |
| stream | A stream on which to schedule this action; may be omitted. |
void cuda::memory::device::set(region_t region, int byte_value, optional_ref<const stream_t> stream = {}) [inline]
Sets all bytes in a region of memory to a fixed value.
Parameters:
| region | the region to set, in a CUDA device's global memory |
| byte_value | value to set the memory region to |
| stream | A stream on which to schedule this action; may be omitted. |
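The set() overloads fill memory byte by byte, so a multi-byte element ends up with the same byte repeated in every position. Below is a host-only model using `std::memset`, assuming the device-side behaviour matches it. `host_model::region_t` is a hypothetical stand-in for `region_t`, and the stream argument is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace host_model {

// Hypothetical stand-in for region_t: a (start, size-in-bytes) pair.
struct region_t {
    void* start;
    std::size_t size;
};

// Host model of set(start, byte_value, num_bytes): a byte-wise fill.
inline void set(void* start, int byte_value, std::size_t num_bytes) {
    std::memset(start, byte_value, num_bytes);
}

// Host model of set(region, byte_value): forwards the region's start and size.
inline void set(region_t region, int byte_value) {
    set(region.start, byte_value, region.size);
}

}  // namespace host_model
```

Setting every byte of a 32-bit word to 1 yields 0x01010101, not 1; typed_set() is the call intended for filling wider elements with a given value.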
template<typename T> void cuda::memory::device::typed_set(T *start, const T &value, size_t num_elements, optional_ref<const stream_t> stream = {}) [inline]
Sets consecutive elements of a region of memory to a fixed value of some width. This generalizes set() to units of different sizes.
Template Parameters:
| T | An unsigned integer type of size 1, 2, 4 or 8 bytes |
Parameters:
| start | The first location to set to value; must be properly aligned. |
| value | A (properly aligned) value to set T-elements to. |
| num_elements | The number of type-T elements (i.e. not necessarily the number of bytes). |
| stream | A stream on which to schedule this action; may be omitted. |
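A host-only model of typed_set()'s documented contract: T is an unsigned integer type of 1, 2, 4 or 8 bytes, start must be aligned for T, and num_elements counts elements rather than bytes. The `static_assert`s and the alignment `assert` are this sketch's way of expressing those requirements. The loop over host memory stands in for the device-side fill, and the stream argument is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

namespace host_model {

// Host model of typed_set(): writes num_elements copies of value,
// each sizeof(T) bytes wide, starting at start.
template <typename T>
void typed_set(T* start, const T& value, std::size_t num_elements) {
    static_assert(std::is_integral<T>::value && std::is_unsigned<T>::value,
                  "T must be an unsigned integer type");
    static_assert(sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4 || sizeof(T) == 8,
                  "T must be 1, 2, 4 or 8 bytes wide");
    // start must be suitably aligned for T.
    assert(reinterpret_cast<std::uintptr_t>(start) % alignof(T) == 0);
    for (std::size_t i = 0; i < num_elements; ++i) { start[i] = value; }
}

}  // namespace host_model
```

Unlike a byte-wise set(), this can write a pattern such as 0xDEADBEEF into every 32-bit element.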
void cuda::memory::device::zero(void *start, size_t num_bytes, optional_ref<const stream_t> stream = {}) [inline]
Sets all bytes in a region of memory to 0 (zero). If a stream is passed, the operation is scheduled on it asynchronously.
Parameters:
| start | the beginning of a region of memory to zero-out, in a CUDA device's global memory |
| num_bytes | the size in bytes of the region of memory to zero-out |
| stream | A stream on which to schedule this action; may be omitted. |
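When a stream is passed, set() and zero() are only scheduled. As with any CUDA stream, work enqueued on the same stream runs in enqueue order, and the host should not rely on the result until the stream has been synchronized. The sketch below models that with a hypothetical `host_model::stream_t` that queues host-side closures. It is not how CUDA streams are implemented, and it does not model the case where the stream is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

namespace host_model {

// Hypothetical stand-in for stream_t: pending work, kept in enqueue order.
struct stream_t {
    std::vector<std::function<void()>> pending;

    void enqueue(std::function<void()> work) { pending.push_back(std::move(work)); }

    // Runs all pending work, first in, first out, then empties the queue.
    void synchronize() {
        for (auto& work : pending) { work(); }
        pending.clear();
    }
};

// Host models of the stream-taking set() and zero(): they only enqueue work.
inline void set(void* start, int byte_value, std::size_t num_bytes, stream_t& stream) {
    stream.enqueue([=] { std::memset(start, byte_value, num_bytes); });
}

inline void zero(void* start, std::size_t num_bytes, stream_t& stream) {
    set(start, 0, num_bytes, stream);  // zeroing is setting every byte to 0
}

}  // namespace host_model
```

In this model, a zero() enqueued after a set() on the same stream always wins for the bytes they share.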
void cuda::memory::device::zero(region_t region, optional_ref<const stream_t> stream = {}) [inline]
Sets all bytes in a region of memory to 0 (zero).
Parameters:
| region | the memory region to zero-out, in a CUDA device's global memory |
| stream | A stream on which to schedule this action; may be omitted. |
template<typename T> void cuda::memory::device::zero(T *ptr, optional_ref<const stream_t> stream = {}) [inline]
Sets all bytes of a single pointed-to value to 0.
Parameters:
| ptr | pointer to a value of a certain type, in a CUDA device's global memory |
| stream | A stream on which to schedule this action; may be omitted. |
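zero(T* ptr) clears exactly sizeof(T) bytes starting at ptr, which makes it convenient for resetting a single counter or struct. Here is a host-only sketch of that behaviour. The `std::is_trivially_copyable` check is an assumption added here, since byte-wise zeroing only makes sense for such types, and the stream argument is omitted.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

namespace host_model {

// Host model of zero(T* ptr): clears exactly sizeof(T) bytes at ptr,
// leaving neighbouring memory untouched.
template <typename T>
void zero(T* ptr) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise zeroing is only meaningful for trivially copyable types");
    std::memset(ptr, 0, sizeof(T));
}

}  // namespace host_model
```

Passing a pointer into the middle of an array zeroes only that one element.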