cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
cuda::memory::device Namespace Reference

CUDA-Device-global memory on a single device (not accessible from the host)

Typedefs

using address_t = CUdeviceptr
 The numeric type which can represent the range of memory addresses on a CUDA device.
 
using unique_region = memory::unique_region< detail_::deleter >
 A unique region of device-global memory.
 

Functions

void free (void *ptr)
 Free a region of device-side memory (regardless of how it was allocated)
 
void free (region_t region)
 Free a region of device-side memory (regardless of how it was allocated)
 
region_t allocate (const context_t &context, size_t size_in_bytes)
 Allocate device-side memory on a CUDA device context.
 
region_t allocate (const device_t &device, size_t size_in_bytes)
 Allocate device-side memory on a CUDA device.
 
template<typename T >
void typed_set (T *start, const T &value, size_t num_elements, optional_ref< const stream_t > stream={})
 Sets consecutive elements of a region of memory to a fixed value of some width.
 
void set (void *start, int byte_value, size_t num_bytes, optional_ref< const stream_t > stream={})
 Sets all bytes in a region of memory to a fixed value.
 
void set (region_t region, int byte_value, optional_ref< const stream_t > stream={})
 Sets all bytes in a region of memory to a fixed value.
 
void zero (void *start, size_t num_bytes, optional_ref< const stream_t > stream={})
 Sets all bytes in a region of memory to 0 (zero)
 
void zero (region_t region, optional_ref< const stream_t > stream={})
 Sets all bytes in a region of memory to 0 (zero)
 
template<typename T >
void zero (T *ptr, optional_ref< const stream_t > stream={})
 Sets all bytes of a single pointed-to value to 0.
 
template<typename T >
unique_span< T > make_unique_span (const context_t &context, size_t size)
 Allocate memory for a consecutive sequence of typed elements in device-global memory.
 
template<typename T >
unique_span< T > make_unique_span (const device_t &device, size_t size)
 Allocate memory for a consecutive sequence of typed elements in device-global memory.
 
template<typename T >
unique_span< T > make_unique_span (size_t size)
 Allocate memory for a consecutive sequence of typed elements in device-global memory.
 
unique_region make_unique_region (const context_t &context, size_t num_bytes)
 Allocate a region in device-global memory.
 
unique_region make_unique_region (const device_t &device, size_t num_elements)
 Allocate a region in device-global memory of the given device.
 
unique_region make_unique_region (size_t num_elements)
 Allocate a region in device-global memory, in the primary context of the current device.
 
address_t address (const void *device_ptr) noexcept
 
address_t address (memory::const_region_t region) noexcept
 

Detailed Description

CUDA-Device-global memory on a single device (not accessible from the host)

Typedef Documentation

◆ address_t

using cuda::memory::device::address_t = CUdeviceptr

The numeric type which can represent the range of memory addresses on a CUDA device.

As these addresses are typically part of the single, unified system-wide memory space, this should be the same type as the numeric equivalent of a system memory address.

Function Documentation

◆ address() [1/2]

address_t cuda::memory::device::address ( const void *  device_ptr)
inline noexcept
Returns
a cast of a proper pointer into a numeric address in device memory space (which is usually just a part of the unified all-system memory space)
Note
Typically, this is just a reinterpretation of the same value.
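
As a rough host-only illustration of the "reinterpretation" the note refers to, the sketch below converts a pointer into a numeric address and back. It assumes `std::uintptr_t` as a stand-in for `CUdeviceptr`, and the names `address_of` and `as_pointer` are hypothetical; nothing here calls the CUDA driver or this library.

```cpp
#include <cstdint>

// Hypothetical stand-in for cuda::memory::device::address_t (CUdeviceptr);
// an assumption made so that the sketch builds without the CUDA headers.
using address_t = std::uintptr_t;

// Reinterpret a pointer's value as a plain number, without touching the pointee.
inline address_t address_of(const void* ptr) noexcept
{
    return reinterpret_cast<address_t>(ptr);
}

// The reverse direction: recover a pointer from its numeric address.
inline void* as_pointer(address_t address) noexcept
{
    return reinterpret_cast<void*>(address);
}
```

Converting a pointer to its numeric address and back yields a pointer that compares equal to the original, which is the property the note relies on.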

◆ address() [2/2]

address_t cuda::memory::device::address ( memory::const_region_t  region)
inline noexcept
Returns
The numeric address of the beginning of a memory region
Note
Typically, this is just a reinterpretation of the same value.

◆ allocate() [1/2]

region_t cuda::memory::device::allocate ( const context_t &  context,
size_t  size_in_bytes 
)
inline

Allocate device-side memory on a CUDA device context.

Note
The CUDA memory allocator guarantees alignment "suitabl[e] for any kind of variable" (CUDA 9.0 Runtime API documentation), and the CUDA programming guide guarantees since at least version 5.0 that the minimum allocation is 256 bytes.
Exceptions
cuda::runtime_error: if allocation fails for any reason
Parameters
context: the context in which to allocate memory
size_in_bytes: the amount of global device memory to allocate
Returns
the allocated region of memory (only usable within context)

◆ allocate() [2/2]

region_t cuda::memory::device::allocate ( const device_t &  device,
size_t  size_in_bytes 
)
inline

Allocate device-side memory on a CUDA device.

Note
The CUDA memory allocator guarantees alignment "suitabl[e] for any kind of variable" (CUDA 9.0 Runtime API documentation), and the CUDA programming guide guarantees since at least version 5.0 that the minimum allocation is 256 bytes.
Exceptions
cuda::runtime_error: if allocation fails for any reason
Parameters
device: the device on which to allocate memory
size_in_bytes: the amount of global device memory to allocate
Returns
the allocated region of memory (only usable on device)

◆ free()

void cuda::memory::device::free ( region_t  region)
inline

Free a region of device-side memory (regardless of how it was allocated)

◆ make_unique_region() [1/3]

unique_region cuda::memory::device::make_unique_region ( const context_t &  context,
size_t  num_bytes 
)
inline

Allocate a region in device-global memory.

Parameters
context: The context within which (and in the device global memory of which) to make the allocation
num_bytes: Size of the region to be allocated, in bytes
Returns
An owning RAII/CADRe object for the allocated memory region
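
To make the "owning RAII/CADRe object" idea concrete, here is a minimal host-only sketch of an owning region built on `std::unique_ptr` with a custom deleter. It is not the library's implementation: `owning_region`, `host_deleter` and `make_owning_region` are hypothetical names, and `std::malloc`/`std::free` stand in for device allocation and release so that the sketch runs without a GPU.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <utility>

// Hypothetical deleter; std::free stands in for releasing device memory.
struct host_deleter {
    void operator()(void* ptr) const noexcept { std::free(ptr); }
};

// Hypothetical host-side analogue of an owning memory region:
// a start address plus a size, released automatically on destruction.
class owning_region {
public:
    owning_region(void* start, std::size_t size) noexcept
        : start_(start), size_(size) {}

    void*       data() const noexcept { return start_.get(); }
    std::size_t size() const noexcept { return size_; }

private:
    std::unique_ptr<void, host_deleter> start_; // frees the memory on destruction
    std::size_t size_;                          // region size, in bytes
};

// Allocate num_bytes and hand ownership to the returned object.
inline owning_region make_owning_region(std::size_t num_bytes)
{
    void* start = std::malloc(num_bytes);
    if (start == nullptr) { throw std::bad_alloc{}; }
    return owning_region{start, num_bytes};
}
```

Because ownership lives in a move-only member, the region can be moved but not copied, and the memory is released exactly once, when the last owner goes out of scope.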

◆ make_unique_region() [2/3]

unique_region cuda::memory::device::make_unique_region ( const device_t &  device,
size_t  num_elements 
)
inline

Allocate a region in device-global memory.

Parameters
device: The device in the global memory of which to make the allocation
num_elements: Size of the region to be allocated (presumably in bytes, as with num_bytes in overload [1/3], since the region is untyped)
Returns
An owning RAII/CADRe object for the allocated memory region

◆ make_unique_region() [3/3]

unique_region cuda::memory::device::make_unique_region ( size_t  num_elements)
inline

Allocate a region in device-global memory within the primary context of the current CUDA device.

Note
The allocation will be made in the device's primary context, which will be created if it does not yet exist.
Parameters
num_elements: Size of the region to be allocated (presumably in bytes, as with num_bytes in overload [1/3], since the region is untyped)
Returns
An owning RAII/CADRe object for the allocated memory region

◆ make_unique_span() [1/3]

template<typename T >
unique_span< T > cuda::memory::device::make_unique_span ( const context_t &  context,
size_t  size 
)

Allocate memory for a consecutive sequence of typed elements in device-global memory.

Template Parameters
T: type of the individual elements in the allocated sequence
Parameters
context: The CUDA device context in which to make the allocation.
size: the number of elements to allocate
Returns
A unique_span which owns the allocated memory (and will release said memory upon destruction)
Note
This function is somewhat similar to ::std::make_unique_for_overwrite(), except that the returned value is not "just" a unique pointer, but also has a size. It is also similar to cuda::memory::device::make_unique_region(), except that the allocation is conceived as typed elements.
Typically, this is used for trivially-constructible elements, for which the non-construction of individual elements should not pose a problem; but let the user beware.
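
The note above describes an object that behaves like a unique pointer but also knows its size, and whose elements are never constructed. A minimal host-only sketch of that idea follows. It is an analogue under stated assumptions, not the library's `unique_span`: the name `owning_span` is hypothetical, and `std::malloc`/`std::free` stand in for device allocation.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical host-side analogue of a unique_span<T>: it owns its storage
// like a unique pointer, and also remembers the number of elements.
template <typename T>
class owning_span {
    static_assert(std::is_trivially_default_constructible<T>::value,
                  "elements are never constructed, so T should be trivial");
public:
    // Allocates raw storage for num_elements values; no element is initialized.
    explicit owning_span(std::size_t num_elements)
        : data_(static_cast<T*>(std::malloc(num_elements * sizeof(T)))),
          size_(num_elements)
    {
        if (data_ == nullptr) { throw std::bad_alloc{}; }
    }

    T*          data()  const noexcept { return data_.get(); }
    std::size_t size()  const noexcept { return size_; }
    T*          begin() const noexcept { return data(); }
    T*          end()   const noexcept { return data() + size_; }

private:
    struct free_deleter {
        void operator()(T* ptr) const noexcept { std::free(ptr); }
    };
    std::unique_ptr<T, free_deleter> data_; // released when the span is destroyed
    std::size_t size_;                      // number of elements, not bytes
};
```

The `static_assert` encodes the "let the user beware" caveat as a compile-time check in this sketch; whether the library itself enforces anything similar is not stated in this page.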

◆ make_unique_span() [2/3]

template<typename T >
unique_span< T > cuda::memory::device::make_unique_span ( const device_t &  device,
size_t  num_elements 
)

Allocate memory for a consecutive sequence of typed elements in device-global memory.

The memory is allocated, but no elements are constructed in it.

Template Parameters
T: type of the individual elements in the allocated sequence
Parameters
device: The CUDA device in whose primary context to make the allocation.
num_elements: the number of elements to allocate
Returns
A unique_span which owns the allocated memory (and will release said memory upon destruction)
Note
This function is somewhat similar to ::std::make_unique_for_overwrite(), except that the returned value is not "just" a unique pointer, but also has a size. It is also similar to cuda::memory::device::make_unique_region(), except that the allocation is conceived as typed elements.
Typically, this is used for trivially-constructible elements, for which the non-construction of individual elements should not pose a problem; but let the user beware.

◆ make_unique_span() [3/3]

template<typename T >
unique_span< T > cuda::memory::device::make_unique_span ( size_t  num_elements)

Allocate memory for a consecutive sequence of typed elements in device-global memory, on the current device.

Template Parameters
T: type of the individual elements in the allocated sequence
Parameters
num_elements: the number of elements to allocate
Returns
A unique_span which owns the allocated memory (and will release said memory upon destruction)
Note
This function is somewhat similar to ::std::make_unique_for_overwrite(), except that the returned value is not "just" a unique pointer, but also has a size. It is also similar to cuda::memory::device::make_unique_region(), except that the allocation is conceived as typed elements.
Typically, this is used for trivially-constructible elements, for which the non-construction of individual elements should not pose a problem; but let the user beware.
The current device's primary context will be used (not the current context); it will be created if it does not yet exist.

◆ set() [1/2]

void cuda::memory::device::set ( void *  start,
int  byte_value,
size_t  num_bytes,
optional_ref< const stream_t >  stream = {} 
)
inline

Sets all bytes in a region of memory to a fixed value.

Note
The equivalent of ::std::memset for CUDA device-side memory. When a stream is passed, the operation is scheduled on that stream asynchronously.
Parameters
start: starting address of the memory region to set, in a CUDA device's global memory
byte_value: value to set the memory region to
num_bytes: size of the memory region in bytes
stream: a stream on which to schedule the operation; may be omitted

◆ set() [2/2]

void cuda::memory::device::set ( region_t  region,
int  byte_value,
optional_ref< const stream_t >  stream = {} 
)
inline

Sets all bytes in a region of memory to a fixed value.

Note
The equivalent of ::std::memset for CUDA device-side memory
Parameters
region: the region to set, in a CUDA device's global memory
byte_value: value to set the memory region to
stream: A stream on which to schedule this action; may be omitted.

◆ typed_set()

template<typename T >
void cuda::memory::device::typed_set ( T *  start,
const T &  value,
size_t  num_elements,
optional_ref< const stream_t >  stream = {} 
)
inline

Sets consecutive elements of a region of memory to a fixed value of some width.

Note
A generalization of set(), for different-size units.
Template Parameters
T: An unsigned integer type of size 1, 2, 4 or 8 bytes
Parameters
start: The first location to set to value; must be properly aligned.
value: A (properly aligned) value to set T-elements to.
num_elements: The number of type-T elements (i.e. not necessarily the number of bytes).
stream: A stream on which to schedule this action; may be omitted.
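
To illustrate how a width-aware fill differs from a byte-wise `set()`, the following host-only sketch models the documented constraints on `T` and the meaning of `num_elements`. The name `typed_fill` is hypothetical, it writes to host memory rather than device memory, and it takes no stream argument; it is a model of the semantics described above, not the library's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical host-side model of a width-aware fill. It mirrors the
// documented constraints on T, but operates on ordinary host memory.
template <typename T>
void typed_fill(T* start, const T& value, std::size_t num_elements)
{
    static_assert(std::is_unsigned<T>::value,
                  "T must be an unsigned integer type");
    static_assert(sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4 || sizeof(T) == 8,
                  "T must be 1, 2, 4 or 8 bytes wide");
    // num_elements counts T-sized elements, so num_elements * sizeof(T)
    // bytes are written in total.
    for (std::size_t i = 0; i < num_elements; ++i) {
        start[i] = value;
    }
}
```

A byte-wise fill can only repeat a single byte, so a pattern such as `0xDEADBEEF` in every 32-bit element needs the typed variant.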

◆ zero() [1/3]

void cuda::memory::device::zero ( void *  start,
size_t  num_bytes,
optional_ref< const stream_t >  stream = {} 
)
inline

Sets all bytes in a region of memory to 0 (zero)

When a stream is passed, the operation is scheduled on that stream asynchronously.

Parameters
start: the beginning of a region of memory to zero-out, accessible within a CUDA device's global memory
num_bytes: the size in bytes of the region of memory to zero-out
stream: A stream on which to schedule this action; may be omitted.

◆ zero() [2/3]

void cuda::memory::device::zero ( region_t  region,
optional_ref< const stream_t >  stream = {} 
)
inline

Sets all bytes in a region of memory to 0 (zero)

Parameters
region: the memory region to zero-out, accessible as a part of a CUDA device's global memory
stream: A stream on which to schedule this action; may be omitted.

◆ zero() [3/3]

template<typename T >
void cuda::memory::device::zero ( T *  ptr,
optional_ref< const stream_t >  stream = {} 
)
inline

Sets all bytes of a single pointed-to value to 0.

Parameters
ptr: pointer to a value of a certain type, accessible within a CUDA device's global memory
stream: an existing stream on which to schedule this action; may be omitted
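
The single-value overload can be read as "zero exactly `sizeof(T)` bytes at `ptr`". A minimal host-only sketch of that reading is given below. The name `zero_value` is hypothetical, and `std::memset` on host memory stands in for the device-side operation; the stream parameter is left out.

```cpp
#include <cstring>

// Hypothetical host-side analogue: clear exactly sizeof(T) bytes at ptr,
// leaving any neighbouring values untouched.
template <typename T>
void zero_value(T* ptr)
{
    std::memset(ptr, 0, sizeof(T));
}
```

Deducing the byte count from the pointee type avoids passing a size by hand, which is the convenience this overload offers over `zero(void*, size_t)`.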