cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
Representation, allocation and manipulation of CUDA-related memory, of different kinds.
Namespaces
device | CUDA-Device-global memory on a single device (not accessible from the host).
host | Host-side (= system) memory which is "pinned".
managed | Paged memory accessible in both device-side and host-side code by triggering transfers of pages between physical system memory and physical device memory.
mapped | Memory regions appearing in both the host-side and device-side address spaces, with the regions in both spaces mapped to each other.
shared | A memory space whose contents are shared by all threads in a CUDA kernel block, but specific to each kernel block separately.
Classes
struct | allocation_options
Options accepted by CUDA's allocator of memory with a host-side aspect (host-only or managed memory).
struct | copy_parameters_t
A builder-ish subclass template around the basic 2D or 3D copy parameters which CUDA's complex copying API actually takes.
class | pointer_t
A convenience wrapper around a raw pointer "known" to the CUDA runtime, giving access to the various kinds of information associated with that pointer.
class | unique_region
A class for holding a region_t of memory owned "uniquely" by its creator, similar to how ::std::unique_ptr holds a uniquely-owned pointer.
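The ownership model described for unique_region can be pictured with a host-only analog. The sketch below is not the library's implementation: raw_region and host_unique_region are hypothetical names introduced for illustration, and std::malloc / std::free stand in for whatever CUDA allocation the real class manages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <utility>

// Hypothetical stand-in for a region: a start address plus a size in bytes.
struct raw_region {
    void*       start;
    std::size_t size;
};

// Hypothetical host-only analog of a uniquely-owned region: movable, not
// copyable, and releasing its memory on destruction, much like std::unique_ptr.
class host_unique_region {
public:
    explicit host_unique_region(std::size_t num_bytes)
        : region_{std::malloc(num_bytes), num_bytes}
    {
        assert(num_bytes > 0);
        if (region_.start == nullptr) { throw std::bad_alloc(); }
    }

    // Copying is disabled: there is exactly one owner at any time.
    host_unique_region(const host_unique_region&) = delete;
    host_unique_region& operator=(const host_unique_region&) = delete;

    // Moving transfers ownership and leaves the source empty.
    host_unique_region(host_unique_region&& other) noexcept
        : region_(std::exchange(other.region_, raw_region{})) {}

    host_unique_region& operator=(host_unique_region&& other) noexcept
    {
        if (this != &other) {
            std::free(region_.start);
            region_ = std::exchange(other.region_, raw_region{});
        }
        return *this;
    }

    ~host_unique_region() { std::free(region_.start); }

    // Non-owning view of the held region.
    raw_region get() const noexcept { return region_; }

private:
    raw_region region_;
};
```

After a move, the moved-from object holds an empty region, so only one object ever frees the allocation.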
Enumerations
enum | endpoint_t { source, destination }
Type for choosing between endpoints of copy operations.
enum | portability_across_contexts : bool { isnt_portable = false, is_portable = true }
A memory allocation setting: Can the allocated memory be used in other CUDA driver contexts (in addition to the implicit default context we have with the Runtime API)?
enum | cpu_write_combining : bool { without_wc = false, with_wc = true }
A memory allocation setting: Should the allocated memory be configured as write-combined?
enum | type_t : ::std::underlying_type< CUmemorytype >::type { host_ = CU_MEMORYTYPE_HOST, device_ = CU_MEMORYTYPE_DEVICE, array = CU_MEMORYTYPE_ARRAY, unified_ = CU_MEMORYTYPE_UNIFIED, managed_ = CU_MEMORYTYPE_UNIFIED, non_cuda = ~(::std::underlying_type<CUmemorytype>::type{0}) }
The CUDA execution ecosystem involves different memory spaces, which differ in their relation to a GPU device or in their treatment by the CUDA driver; this type distinguishes among them.
Functions
copy_parameters_t< 3 >::intra_context_type | as_intra_context_parameters (const copy_parameters_t< 3 > &params)
void | set (void *ptr, int byte_value, size_t num_bytes, optional_ref< const stream_t > stream={})
Sets a number of bytes in memory to a fixed value.
void | set (region_t region, int byte_value, optional_ref< const stream_t > stream={})
Sets all bytes in a region of memory to a fixed value.
void | zero (region_t region, optional_ref< const stream_t > stream={})
Sets all bytes in a region of memory to 0 (zero).
void | zero (void *ptr, size_t num_bytes, optional_ref< const stream_t > stream={})
Zero-out a region of memory.
template<typename T >
void | zero (T *ptr)
Sets all bytes of a single pointed-to value to 0.
template<dimensionality_t NumDimensions>
void | copy (copy_parameters_t< NumDimensions > params, optional_ref< const stream_t > stream={})
An almost-generalized-case memory copy, taking a rather complex structure of copy parameters and wrapping the CUDA driver's own most-generalized-case copy.
template<typename T , dimensionality_t NumDimensions>
void | copy (const array_t< T, NumDimensions > &destination, const context_t &source_context, const T *source, optional_ref< const stream_t > stream={})
Synchronously copies data into a CUDA array from non-array memory.
template<typename T , dimensionality_t NumDimensions>
void | copy (array_t< T, NumDimensions > &destination, const T *source, optional_ref< const stream_t > stream={})
Synchronously copies data into a CUDA array from non-array memory.
template<typename T , dimensionality_t NumDimensions>
void | copy (const array_t< T, NumDimensions > &destination, span< T const > source, optional_ref< const stream_t > stream={})
Copies a contiguous sequence of elements in memory into a CUDA array.
template<typename T , dimensionality_t NumDimensions>
void | copy (const context_t &context, T *destination, const array_t< T, NumDimensions > &source, optional_ref< const stream_t > stream={})
Synchronously copies data from a CUDA array into non-array memory.
template<typename T , dimensionality_t NumDimensions>
void | copy (T *destination, const array_t< T, NumDimensions > &source, optional_ref< const stream_t > stream={})
Synchronously copies data from a CUDA array into non-array memory.
template<typename T , dimensionality_t NumDimensions>
void | copy (span< T > destination, const array_t< T, NumDimensions > &source, optional_ref< const stream_t > stream={})
Copies the contents of a CUDA array into a sequence of contiguous elements in memory.
template<typename T , dimensionality_t NumDimensions>
void | copy (const array_t< T, NumDimensions > &destination, const array_t< T, NumDimensions > &source, optional_ref< const stream_t > stream)
Copies the contents of one CUDA array to another.
template<typename T , dimensionality_t NumDimensions>
void | copy (region_t destination, const array_t< T, NumDimensions > &source, optional_ref< const stream_t > stream={})
Copies the contents of a CUDA array into a region of memory.
template<typename T , dimensionality_t NumDimensions>
void | copy (array_t< T, NumDimensions > &destination, const_region_t source, optional_ref< const stream_t > stream={})
Copies the contents of a region of memory into a CUDA array.
template<typename T >
void | copy_single (T *destination, const T *source, optional_ref< const stream_t > stream={})
Synchronously copies a single (typed) value between two memory locations.
void | copy (void *destination, void const *source, size_t num_bytes, optional_ref< const stream_t > stream={})
Asynchronously copies data between memory spaces or within a memory space.
template<typename T , size_t N>
void | copy (c_array< T, N > &destination, const_region_t source, optional_ref< const stream_t > stream={})
Copy the contents of a memory region into a C-style array, interpreting the memory as a sequence of elements of the array's element type.
template<typename T , size_t N>
void | copy (region_t destination, c_array< const T, N > const &source, optional_ref< const stream_t > stream={})
void | copy (region_t destination, const_region_t source, size_t num_bytes, optional_ref< const stream_t > stream={})
Asynchronously copies data between memory spaces or within a memory space.
void | copy (region_t destination, const_region_t source, optional_ref< const stream_t > stream={})
void | copy (region_t destination, void *source, optional_ref< const stream_t > stream={})
Copy memory between memory regions.
void | copy (region_t destination, void *source, size_t num_bytes, optional_ref< const stream_t > stream={})
Copy one region of memory into another.
void | copy (void *destination, const_region_t source, size_t num_bytes, optional_ref< const stream_t > stream={})
Copy one region of memory to another location.
void | copy (void *destination, const_region_t source, optional_ref< const stream_t > stream={})
template<typename T >
unique_span< T > | make_unique_span (const context_t &context, size_t size)
See device::make_unique_span(const context_t& context, size_t size)
template<typename T >
unique_span< T > | make_unique_span (const device_t &device, size_t size)
See device::make_unique_span(const device_t& device, size_t size)
template<typename T , dimensionality_t NumDimensions>
void | copy (array_t< T, NumDimensions > &destination, span< T const > source, optional_ref< const stream_t > stream)
context_t | context_of (void const *ptr)
Obtain (a non-owning wrapper for) the CUDA context with which a memory address is associated (e.g. being the result of an allocation or mapping in that context).
memory::type_t | type_of (const void *ptr)
Determine the type of memory at a given address vis-a-vis the CUDA ecosystem: Was it allocated by the CUDA driver? Does it reside solely on a GPU device? Solely on the host? Movable between locations? etc.
void * | as_pointer (device::address_t address) noexcept
device::unique_region | make_unique_region (const context_t &context, size_t num_elements)
See device::make_unique_region(const context_t& context, size_t num_elements)
device::unique_region | make_unique_region (const device_t &device, size_t num_elements)
See device::make_unique_region(const device_t& device, size_t num_elements)
template<typename T , size_t N>
void | copy (span< T > destination, c_array< const T, N > const &source, optional_ref< const stream_t > stream={})
Copy the contents of a C-style array into a span of same-type elements.
template<typename T , size_t N>
void | copy (c_array< T, N > &destination, span< T const > source, optional_ref< const stream_t > stream={})
Copy the contents of a span into a C-style array.
template<typename T , size_t N>
void | copy (void *destination, c_array< const T, N > const &source, optional_ref< const stream_t > stream={})
Copy the contents of a C-style array to another location in memory.
template<typename T , size_t N>
void | copy (c_array< T, N > &destination, T *source, optional_ref< const stream_t > stream={})
Copy memory into a C-style array.
enum cuda::memory::cpu_write_combining : bool
A memory allocation setting: Should the allocated memory be configured as write-combined, i.e. a write may not be immediately applied to the allocated region and propagated (e.g. to caches, over the PCIe bus)? Instead, writes will be applied as convenient, possibly in batch.
Write-combining memory frees up the host's L1 and L2 cache resources, making more cache available to the rest of the application. In addition, write-combining memory is not snooped during transfers across the PCI Express bus, which can improve transfer performance.
Reading from write-combining memory from the host is prohibitively slow, so write-combining memory should in general be used for memory that the host only writes to.
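inline void * cuda::memory::as_pointer (device::address_t address) noexcept
inline context_t cuda::memory::context_of (void const *ptr)
Obtain (a non-owning wrapper for) the CUDA context with which a memory address is associated (e.g. being the result of an allocation or mapping in that context).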
inline void cuda::memory::copy (span< T > destination, c_array< const T, N > const &source, optional_ref< const stream_t > stream={})
Copy the contents of a C-style array into a span of same-type elements.
destination | A span of elements to overwrite with the array contents.
source | A fixed-size C-style array from which to copy data into destination. As this is taken by reference rather than by address of the first element, there is no array-decay.
void cuda::memory::copy (c_array< T, N > &destination, span< T const > source, optional_ref< const stream_t > stream={})
Copy the contents of a span into a C-style array.
destination | A fixed-size C-style array to which to copy the data in source, of size at least that of source. As it is taken by reference rather than by address of the first element, there is no array-decay.
source | A span of the same element type as the destination array, containing the data to be copied.
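The "no array-decay" remark can be illustrated without CUDA. The following host-only sketch is an assumption-laden analog, not the library's code: simple_span and copy_into_array are hypothetical names, std::memcpy stands in for the actual copy, and throwing on an oversized source is an illustrative choice rather than documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>

// Hypothetical minimal span: a pointer plus an element count.
template <typename T>
struct simple_span {
    T*          data;
    std::size_t size;
};

// Host-only analog of copying a span into a fixed-size C-style array.
// Taking the array by reference keeps its extent N in the type (no
// array-to-pointer decay), so no separate length argument is needed.
template <typename T, std::size_t N>
void copy_into_array(T (&destination)[N], simple_span<const T> source)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "a bytewise copy requires a trivially copyable element type");
    if (source.size > N) {
        // Assumption for this sketch: report an oversized source by throwing.
        throw std::invalid_argument("source span is larger than the destination array");
    }
    std::memcpy(destination, source.data, source.size * sizeof(T));
}
```

A call such as copy_into_array(dst, simple_span<const int>{src, 3}) deduces N from the declared type of dst, which a plain pointer parameter could not do.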
inline void cuda::memory::copy (void *destination, c_array< const T, N > const &source, optional_ref< const stream_t > stream={})
Copy the contents of a C-style array to another location in memory.
destination | The starting address of a sequence of N values of type T to overwrite with the array contents.
source | A fixed-size C-style array from which to copy data into destination. As this is taken by reference rather than by address of the first element, there is no array-decay.
inline void cuda::memory::copy (c_array< T, N > &destination, T *source, optional_ref< const stream_t > stream={})
Copy memory into a C-style array. If a stream is given, the copy is scheduled on it asynchronously.
destination | A fixed-size C-style array to which to copy the data in source. As it is taken by reference rather than by address of the first element, there is no array-decay.
source | The starting address of a sequence of N elements to copy.
stream | Schedule the copy operation in this CUDA stream.
void cuda::memory::copy (copy_parameters_t< NumDimensions > params, optional_ref< const stream_t > stream={})
An almost-generalized-case memory copy, taking a rather complex structure of copy parameters and wrapping the CUDA driver's own most-generalized-case copy.
NumDimensions | The number of dimensions of the parameter structure.
params | A parameter structure with details regarding the copy source and destination, including CUDA context specifications, which must have been set in advance. This function will not verify its validity, but merely pass it on to the CUDA driver.
void cuda::memory::copy (const array_t< T, NumDimensions > &destination, const context_t &source_context, const T *source, optional_ref< const stream_t > stream={})
Synchronously copies data into a CUDA array from non-array memory.
NumDimensions | The number of array dimensions; only 2 and 3 are supported values.
T | Array element type.
destination | A NumDimensions-dimensional CUDA array, including a specification of the context in which the array is defined.
source | A pointer to a region of contiguous memory holding destination.size() values of type T. The memory may be located either on a CUDA device or in host memory.
source_context | The context in which the source memory was allocated, possibly different from the target array's context.
inline void cuda::memory::copy (array_t< T, NumDimensions > &destination, const T *source, optional_ref< const stream_t > stream={})
Copies data into a CUDA array from non-array memory: synchronously by default, or asynchronously if a stream is given.
NumDimensions | The number of array dimensions; only 2 and 3 are supported values.
T | Array element type.
destination | A NumDimensions-dimensional CUDA array to copy data into.
source | A pointer to a region of contiguous memory holding destination.size() values of type T, i.e. of size destination.size() * sizeof(T). The memory may be located either on a CUDA device or in host memory.
stream | Schedule the copy operation in this CUDA stream.
void cuda::memory::copy (const array_t< T, NumDimensions > &destination, span< T const > source, optional_ref< const stream_t > stream={})
Copies a contiguous sequence of elements in memory into a CUDA array.
T | A trivially-copy-constructible, trivially-destructible type of array elements.
void cuda::memory::copy (const context_t &context, T *destination, const array_t< T, NumDimensions > &source, optional_ref< const stream_t > stream={})
Synchronously copies data from a CUDA array into non-array memory.
NumDimensions | The number of array dimensions; only 2 and 3 are supported values.
T | Array element type.
destination | A pointer to a region of contiguous memory able to hold source.size() values of type T. The memory may be located either on a CUDA device or in host memory.
source | A NumDimensions-dimensional CUDA array.
inline void cuda::memory::copy (T *destination, const array_t< T, NumDimensions > &source, optional_ref< const stream_t > stream={})
Copies data from a CUDA array into non-array memory: synchronously by default, or asynchronously if a stream is given.
NumDimensions | The number of array dimensions; only 2 and 3 are supported values.
T | Array element type.
destination | A pointer to a region of contiguous memory able to hold source.size() values of type T, i.e. of size source.size() * sizeof(T). The memory may be located either on a CUDA device or in host memory.
source | A NumDimensions-dimensional CUDA array (cuda::array_t).
stream | Schedule the copy operation in this CUDA stream.
void cuda::memory::copy (span< T > destination, const array_t< T, NumDimensions > &source, optional_ref< const stream_t > stream={})
Copies the contents of a CUDA array into a sequence of contiguous elements in memory.
T | A trivially-copy-constructible, trivially-destructible type of array elements.
The destination span must be at least as large as the volume of the array.
void cuda::memory::copy (const array_t< T, NumDimensions > &destination, const array_t< T, NumDimensions > &source, optional_ref< const stream_t > stream)
Copies the contents of one CUDA array to another.
T | A trivially-copy-constructible type of array elements.
void cuda::memory::copy (region_t destination, const array_t< T, NumDimensions > &source, optional_ref< const stream_t > stream={})
Copies the contents of a CUDA array into a region of memory. If a stream is given, the copy is scheduled on it asynchronously.
T | A trivially-copy-constructible type of array elements.
destination | A memory region large enough to hold all elements of the array, i.e. of size at least source.size() * sizeof(T); it may also be larger.
source | A CUDA array (cuda::array_t).
stream | Schedule the copy operation in this CUDA stream.
void cuda::memory::copy (array_t< T, NumDimensions > &destination, const_region_t source, optional_ref< const stream_t > stream={})
Copies the contents of a region of memory into a CUDA array.
T | A trivially-copy-constructible type of array elements.
destination | A CUDA array to copy data into.
source | A memory region of size destination.size() * sizeof(T).
stream | Schedule the copy operation in this CUDA stream (or leave empty for a synchronous copy).
inline void cuda::memory::copy (void *destination, void const *source, size_t num_bytes, optional_ref< const stream_t > stream={})
Asynchronously copies data between memory spaces or within a memory space.
destination | A pointer to a memory region of size num_bytes, either in host memory or on any CUDA device's global memory. Must be defined in the same context as the stream.
source | A pointer to a memory region of size num_bytes, either in host memory or on any CUDA device's global memory. Must be defined in the same context as the stream.
num_bytes | The number of bytes to copy from source to destination.
stream | A stream on which to enqueue the copy operation.
inline void cuda::memory::copy (c_array< T, N > &destination, const_region_t source, optional_ref< const stream_t > stream={})
Copy the contents of a memory region into a C-style array, interpreting the memory as a sequence of elements of the array's element type. If a stream is given, the copy is scheduled on it asynchronously.
destination | A fixed-size C-style array to which to copy the data in source. As it is taken by reference rather than by address of the first element, there is no array-decay.
source | A region of at least sizeof(T)*N bytes, with whose data to fill the destination array.
stream | Schedule the copy operation in this CUDA stream.
inline void cuda::memory::copy (region_t destination, c_array< const T, N > const &source, optional_ref< const stream_t > stream={})
Copies data from a C-style array into a memory region. If a stream is given, the copy is scheduled on it asynchronously.
destination | A region of memory to which to copy the data in source, of size at least that of source, either in host memory or on any CUDA device's global memory. Must be defined in the same context as the stream.
source | A plain array whose contents are to be copied, either in host memory or on any CUDA device's global memory.
stream | A stream on which to enqueue the copy operation.
inline void cuda::memory::copy (region_t destination, const_region_t source, size_t num_bytes, optional_ref< const stream_t > stream={})
Asynchronously copies data between memory spaces or within a memory space.
destination | A memory region of size no less than num_bytes, either in host memory or on any CUDA device's global memory. Must be registered with, or visible in, the same context as stream.
source | A memory region of size no less than num_bytes, either in host memory or on any CUDA device's global memory. Must be defined in the same context as the stream.
num_bytes | The number of bytes to copy from source to destination.
stream | A stream on which to enqueue the copy operation.
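The size requirements on destination, source and num_bytes can be made concrete with a host-only sketch. Everything named here is hypothetical: byte_region, const_byte_region and copy_bytes are illustrative stand-ins for region_t, const_region_t and the copy overload, std::memcpy replaces the CUDA copy, and the explicit bounds checks are an assumption (the documentation above states the requirements but does not say how violations are handled).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical stand-ins for a writable and a read-only memory region.
struct byte_region {
    void*       start;
    std::size_t size;
};

struct const_byte_region {
    const void* start;
    std::size_t size;
};

// Host-only analog of a bounded region-to-region copy: copies exactly
// num_bytes, after checking that both regions are large enough.
inline void copy_bytes(byte_region destination, const_byte_region source,
                       std::size_t num_bytes)
{
    if (num_bytes > source.size) {
        throw std::invalid_argument("num_bytes exceeds the source region's size");
    }
    if (num_bytes > destination.size) {
        throw std::invalid_argument("num_bytes exceeds the destination region's size");
    }
    std::memcpy(destination.start, source.start, num_bytes);
}
```

Carrying the size alongside the address is what allows such checks at all; with two raw pointers, the caller alone is responsible for the bounds.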
inline void cuda::memory::copy (region_t destination, const_region_t source, optional_ref< const stream_t > stream={})
Copies data between memory regions. If a stream is given, the copy is scheduled on it asynchronously.
destination | A region of memory to which to copy the data in source, of size at least that of source, either in host memory or on any CUDA device's global memory. Must be defined in the same context as the stream.
source | A region whose contents are to be copied, either in host memory or on any CUDA device's global memory. Must be defined in the same context as the stream.
stream | A stream on which to enqueue the copy operation.
inline void cuda::memory::copy (region_t destination, void *source, optional_ref< const stream_t > stream={})
Copy memory between memory regions. If a stream is given, the copy is scheduled on it asynchronously.
destination | A target region of memory into which to copy, either in host memory or on any CUDA device's global memory; enough memory will be copied to fill this region. Must be defined in the same context as the stream.
source | A pointer to the beginning of a region of memory from which to copy, of the same size as destination, either in host memory or on any CUDA device's global memory. Must be defined in the same context as the stream.
stream | A stream on which to enqueue the copy operation.
inline void cuda::memory::copy (region_t destination, void *source, size_t num_bytes, optional_ref< const stream_t > stream={})
Copy one region of memory into another. If a stream is given, the copy is scheduled on it asynchronously.
destination | A region of memory to which to copy the data in source, of size at least num_bytes, either in host memory or on any CUDA device's global memory. Must be defined in the same context as the stream.
source | A pointer to the beginning of a memory region of size num_bytes.
num_bytes | The number of bytes to copy from source to destination.
stream | A stream on which to enqueue the copy operation.
inline void cuda::memory::copy (void *destination, const_region_t source, size_t num_bytes, optional_ref< const stream_t > stream={})
Copy one region of memory to another location. If a stream is given, the copy is scheduled on it asynchronously.
destination | The beginning of a target memory region of size at least num_bytes, either in host memory or on any CUDA device's global memory. Must be registered with, or visible in, the same context as stream.
source | A memory region from which to copy, of size at least num_bytes, either in host memory or on any CUDA device's global memory. Must be defined in the same context as the stream.
num_bytes | The number of bytes to copy from source to destination.
stream | A stream on which to enqueue the copy operation.
inline void cuda::memory::copy (void *destination, const_region_t source, optional_ref< const stream_t > stream={})
Copies data between memory regions. If a stream is given, the copy is scheduled on it asynchronously.
destination | The beginning of a memory region of the same size as source, into which to copy data, either in host memory or on any CUDA device's global memory. The memory must be registered in, or visible within, the same context as stream.
source | A memory region whose contents are to be copied, either in host memory or on any CUDA device's global memory. Must be defined in the same context as the stream.
stream | A stream on which to enqueue the copy operation.
void cuda::memory::copy_single (T *destination, const T *source, optional_ref< const stream_t > stream={})
Copies a single (typed) value between two memory locations: synchronously by default, or enqueued on a stream if one is given.
destination | A value residing either in host memory or on any CUDA device's global memory.
source | A value residing either in host memory or on any CUDA device's global memory.
stream | The CUDA command queue on which this copying will be enqueued.
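inline void cuda::memory::set (void *ptr, int byte_value, size_t num_bytes, optional_ref< const stream_t > stream={})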
Sets a number of bytes in memory to a fixed value.
ptr | Address of the first byte in memory to set. May be in host-side memory, global CUDA-device-side memory or CUDA-managed memory. |
byte_value | value to set the memory region to |
num_bytes | The amount of memory to set to byte_value |
stream | A stream on which to schedule this action; may be omitted. |
inline void cuda::memory::set (region_t region, int byte_value, optional_ref< const stream_t > stream={})
Sets all bytes in a region of memory to a fixed value.
region | the memory region to set; may be in host-side memory, global CUDA-device-side memory or CUDA-managed memory. |
byte_value | value to set the memory region to |
stream | A stream on which to schedule this action; may be omitted. |
inline void cuda::memory::zero (region_t region, optional_ref< const stream_t > stream={})
Sets all bytes in a region of memory to 0 (zero).
region | the memory region to zero-out; may be in host-side memory, global CUDA-device-side memory or CUDA-managed memory. |
stream | A stream on which to schedule this action; may be omitted. |
inline void cuda::memory::zero (void *ptr, size_t num_bytes, optional_ref< const stream_t > stream={})
Zero-out a region of memory.
ptr | the beginning of a region of memory to zero-out; may be in host-side memory, global CUDA-device-side memory or CUDA-managed memory. |
num_bytes | the size in bytes of the region of memory to zero-out |
stream | A stream on which to schedule this action; may be omitted. |
inline void cuda::memory::zero (T *ptr)
Sets all bytes of a single pointed-to value to 0.
ptr | pointer to a single element of a certain type, which may be in host-side memory, global CUDA-device-side memory or CUDA-managed memory. |
stream | A stream on which to schedule this action; may be omitted. |
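The set and zero families relate to each other in a simple way, which the host-only sketch below illustrates. It is an analogy under stated assumptions, not the library's implementation: set_bytes, zero_bytes and zero_value are hypothetical names, std::memset stands in for the CUDA memory-setting call, and layering zero on top of set is one natural arrangement rather than a documented fact.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Host-only analog of set(ptr, byte_value, num_bytes): fill bytes with one value.
// As with std::memset, only the low 8 bits of byte_value are used.
inline void set_bytes(void* ptr, int byte_value, std::size_t num_bytes)
{
    std::memset(ptr, byte_value, num_bytes);
}

// Host-only analog of zero(ptr, num_bytes): a set with byte value 0.
inline void zero_bytes(void* ptr, std::size_t num_bytes)
{
    set_bytes(ptr, 0, num_bytes);
}

// Host-only analog of zero(T* ptr): the byte count comes from the pointed-to type.
template <typename T>
void zero_value(T* ptr)
{
    zero_bytes(ptr, sizeof(T));
}
```

The typed overload spares the caller from spelling out sizeof(T), which removes one opportunity for a mismatched size.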