cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
memory.hpp File Reference

Freestanding wrapper functions for working with CUDA's various kinds of memory spaces, arranged into a relevant namespace hierarchy. More...

#include <cuda/api/array.hpp>
#include <cuda/api/constants.hpp>
#include <cuda/api/current_device.hpp>
#include <cuda/api/error.hpp>
#include <cuda/api/pointer.hpp>
#include <cuda/api/current_context.hpp>
#include <cuda_runtime.h>
#include <cuda.h>
#include <memory>
#include <cstring>
#include <vector>

Classes

struct  cuda::memory::allocation_options
Options accepted by CUDA's allocator of memory with a host-side aspect (host-only or managed memory). More...
 
struct  cuda::memory::mapped::region_pair
 A pair of memory regions, one in system (=host) memory and one on a CUDA device's memory - mapped to each other. More...
 
struct  cuda::memory::managed::region_t
 
struct  cuda::memory::managed::const_region_t
 

Namespaces

 cuda
 All definitions and functionality wrapping the CUDA Runtime API.
 
 mapped
 Memory regions appearing in both the host-side and device-side address spaces, with the regions in both spaces mapped to each other.
 
 cuda::memory::device
 CUDA-Device-global memory on a single device (not accessible from the host)
 
 host
 Host-side (= system) memory which is "pinned", i.e. page-locked.
 
 cuda::memory::managed
 This type of memory, also known as unified memory, appears within a unified, all-system address space - and is used with the same address range on the host and on all relevant CUDA devices on a system.
 

Enumerations

enum  cuda::memory::portability_across_contexts : bool {
  cuda::memory::portability_across_contexts::is_portable = true,
  cuda::memory::portability_across_contexts::isnt_portable = false
}
 A memory allocation setting: Can the allocated memory be used in other CUDA driver contexts (in addition to the implicit default context we have with the Runtime API)?
 
enum  cuda::memory::cpu_write_combining : bool {
  with_wc = true,
  without_wc = false
}
 A memory allocation setting: Should the allocated memory be configured as write-combined? More...
 
enum  cuda::memory::host::mapped_io_space : bool {
  is_mapped_io_space = true,
  is_not_mapped_io_space = false
}
 Whether the host-side memory region being registered is memory-mapped I/O space (e.g. belonging to a third-party device). More...
 
enum  cuda::memory::host::map_into_device_memory : bool {
  map_into_device_memory = true,
  do_not_map_into_device_memory = false
}
 Whether or not the registration of the host-side pointer should map it into the CUDA address space for access on the device. More...
 
enum  cuda::memory::host::accessibility_on_all_devices : bool {
  cuda::memory::host::is_accessible_on_all_devices = true,
  cuda::memory::host::is_not_accessible_on_all_devices = false
}
 Whether the allocated host-side memory should be recognized as pinned memory by all CUDA contexts, not just the (implicit Runtime API) context that performed the allocation. More...
 
enum  attachment_t : unsigned {
  global = CU_MEM_ATTACH_GLOBAL,
  host = CU_MEM_ATTACH_HOST,
  single_stream = CU_MEM_ATTACH_SINGLE
}
 
enum  kind_t {
  read_mostly = CU_MEM_RANGE_ATTRIBUTE_READ_MOSTLY,
  preferred_location = CU_MEM_RANGE_ATTRIBUTE_PREFERRED_LOCATION,
  accessor = CU_MEM_RANGE_ATTRIBUTE_ACCESSED_BY
}
 

Functions

region_t cuda::memory::device::allocate (const context_t &context, size_t size_in_bytes)
 Allocate device-side memory on a CUDA device context. More...
 
region_t cuda::memory::device::allocate (const device_t &device, size_t size_in_bytes)
 Allocate device-side memory on a CUDA device. More...
 
template<typename T >
void cuda::memory::device::typed_set (T *start, const T &value, size_t num_elements)
 Sets consecutive elements of a region of memory to a fixed value of some width. More...
 
template<typename T >
void cuda::memory::device::zero (T *ptr)
 Sets all bytes of a single pointed-to value to 0. More...
 
void cuda::memory::set (void *ptr, int byte_value, size_t num_bytes)
 Sets a number of bytes in memory to a fixed value. More...
 
void cuda::memory::set (region_t region, int byte_value)
 Sets all bytes in a region of memory to a fixed value. More...
 
void cuda::memory::zero (region_t region)
 Sets all bytes in a region of memory to 0 (zero) More...
 
void cuda::memory::zero (void *ptr, size_t num_bytes)
 Sets a number of bytes starting at a given address of memory to 0 (zero) More...
 
template<typename T >
void cuda::memory::zero (T *ptr)
 Sets all bytes of a single pointed-to value to 0. More...
 
template<typename T , dimensionality_t NumDimensions>
void cuda::memory::copy (const array_t< T, NumDimensions > &destination, const T *source)
 Synchronously copies data into a CUDA array from non-array memory. More...
 
template<typename T , dimensionality_t NumDimensions>
void cuda::memory::copy (T *destination, const array_t< T, NumDimensions > &source)
 Synchronously copies data from a CUDA array into non-array memory. More...
 
template<typename T , dimensionality_t NumDimensions>
void cuda::memory::copy (array_t< T, NumDimensions > destination, array_t< T, NumDimensions > source)
 
template<typename T , dimensionality_t NumDimensions>
void cuda::memory::copy (region_t destination, const array_t< T, NumDimensions > &source)
 
template<typename T >
void cuda::memory::copy_single (T *destination, const T *source)
 Synchronously copies a single (typed) value between two memory locations. More...
 
template<typename T , dimensionality_t NumDimensions>
void cuda::memory::async::copy (array_t< T, NumDimensions > &destination, const T *source, const stream_t &stream)
 Asynchronously copies data from memory spaces into CUDA arrays. More...
 
template<typename T , dimensionality_t NumDimensions>
void cuda::memory::async::copy (array_t< T, NumDimensions > &destination, const_region_t source, const stream_t &stream)
 
template<typename T , dimensionality_t NumDimensions>
void cuda::memory::async::copy (T *destination, const array_t< T, NumDimensions > &source, const stream_t &stream)
 Asynchronously copies data from CUDA arrays into memory spaces. More...
 
template<typename T , dimensionality_t NumDimensions>
void cuda::memory::async::copy (region_t destination, const array_t< T, NumDimensions > &source, const stream_t &stream)
 
template<typename T , size_t N>
void cuda::memory::async::copy (T(&destination)[N], T *source, const stream_t &stream)
 
template<typename T , size_t N>
void cuda::memory::async::copy (T(&destination)[N], const_region_t source, const stream_t &stream)
 
template<typename T >
void cuda::memory::async::copy_single (T &destination, const T &source, const stream_t &stream)
 Asynchronously copies a single (typed) value between memory spaces or within a memory space. More...
 
template<typename T >
void cuda::memory::device::async::typed_set (T *start, const T &value, size_t num_elements, const stream_t &stream)
 Sets consecutive elements of a region of memory to a fixed value of some width. More...
 
void cuda::memory::device::async::set (void *start, int byte_value, size_t num_bytes, const stream_t &stream)
 Asynchronously sets all bytes in a stretch of memory to a single value. More...
 
void cuda::memory::device::async::zero (void *start, size_t num_bytes, const stream_t &stream)
 Similar to set(), but sets the memory to zero rather than an arbitrary value.
 
template<typename T >
void cuda::memory::device::async::zero (T *ptr, const stream_t &stream)
 Asynchronously sets all bytes of a single pointed-to value to 0 (zero). More...
 
void cuda::memory::inter_context::copy (void *destination, const context_t &destination_context, const void *source_address, const context_t &source_context, size_t num_bytes)
 
void cuda::memory::inter_context::copy (void *destination, const context_t &destination_context, const_region_t source, const context_t &source_context)
 
void cuda::memory::inter_context::copy (region_t destination, const context_t &destination_context, const_region_t source, const context_t &source_context)
 
template<typename T , dimensionality_t NumDimensions>
void cuda::memory::inter_context::copy (array_t< T, NumDimensions > destination, array_t< T, NumDimensions > source)
 
void cuda::memory::inter_context::async::copy (void *destination_address, context_t destination_context, const void *source_address, context_t source_context, size_t num_bytes, stream_t stream)
 
void cuda::memory::inter_context::async::copy (void *destination, context_t destination_context, const_region_t source, context_t source_context, stream_t stream)
 
void cuda::memory::inter_context::async::copy (region_t destination, context_t destination_context, const_region_t source, context_t source_context, stream_t stream)
 
template<typename T , dimensionality_t NumDimensions>
void cuda::memory::inter_context::async::copy (array_t< T, NumDimensions > destination, array_t< T, NumDimensions > source, const stream_t &stream)
 
void * cuda::memory::host::allocate (size_t size_in_bytes, allocation_options options)
 Allocate pinned host memory. More...
 
void * cuda::memory::host::allocate (size_t size_in_bytes, portability_across_contexts portability=portability_across_contexts(false), cpu_write_combining cpu_wc=cpu_write_combining(false))
 
void * cuda::memory::host::allocate (size_t size_in_bytes, cpu_write_combining cpu_wc)
 
void cuda::memory::host::free (void *host_ptr)
 Free a region of pinned host memory which was allocated with allocate. More...
 
void cuda::memory::host::free (region_t region)
 
void cuda::memory::host::register_ (const void *ptr, size_t size, bool register_mapped_io_space, bool map_into_device_space, bool make_device_side_accesible_to_all)
 
void cuda::memory::host::register_ (const_region_t region, bool register_mapped_io_space, bool map_into_device_space, bool make_device_side_accesible_to_all)
 
void cuda::memory::host::register_ (void const *ptr, size_t size)
 
void cuda::memory::host::register_ (const_region_t region)
 
void cuda::memory::host::deregister (const void *ptr)
 
void cuda::memory::host::deregister (const_region_t region)
 
void cuda::memory::host::set (void *start, int byte_value, size_t num_bytes)
 Sets all bytes in a stretch of host-side memory to a single value. More...
 
void cuda::memory::host::zero (void *start, size_t num_bytes)
 
template<typename T >
void cuda::memory::host::zero (T *ptr)
 
void cuda::memory::managed::advise_expected_access_by (managed::const_region_t region, device_t &device)
 
void cuda::memory::managed::advise_no_access_expected_by (managed::const_region_t region, device_t &device)
 
template<typename Allocator = ::std::allocator<cuda::device_t>>
typename ::std::vector< device_t, Allocator > cuda::memory::managed::accessors (managed::const_region_t region, const Allocator &allocator=Allocator())
 
region_t cuda::memory::managed::allocate (const context_t &context, size_t num_bytes, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices)
 Allocate a region of managed memory, accessible with the same address on the host and on CUDA devices. More...
 
region_t cuda::memory::managed::allocate (device_t device, size_t num_bytes, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices)
 Allocate a region of managed memory, accessible with the same address on the host and on CUDA devices. More...
 
region_t cuda::memory::managed::allocate (size_t num_bytes)
 Allocate a region of managed memory, accessible with the same address on the host and on all CUDA devices. More...
 
void cuda::memory::managed::free (void *managed_ptr)
 Free a managed memory region (host-side and device-side regions on all devices where it was allocated, all with the same address) which was allocated with allocate.
 
void cuda::memory::managed::free (region_t region)
 
void cuda::memory::managed::advice::set (const_region_t region, kind_t advice, const device_t &device)
 
void cuda::memory::managed::async::prefetch (const_region_t region, const cuda::device_t &destination, const stream_t &stream)
 Prefetches a region of managed memory to a specific device, so it can later be used there without waiting for I/O from the host or other devices.
 
void cuda::memory::managed::async::prefetch_to_host (const_region_t region, const stream_t &stream)
 Prefetches a region of managed memory into host memory. More...
 
template<typename T >
T * cuda::memory::mapped::device_side_pointer_for (T *host_memory_ptr)
 Obtain a pointer in the device-side memory space (= address range) for the device-side memory mapped to the host-side pointer host_memory_ptr.
 
region_pair cuda::memory::mapped::allocate (cuda::context_t &context, size_t size_in_bytes, allocation_options options)
 Allocate a memory region on the host, which is also mapped to a memory region in a context of some CUDA device - so that changes to one will be reflected in the other. More...
 
region_pair cuda::memory::mapped::allocate (cuda::device_t &device, size_t size_in_bytes, allocation_options options=allocation_options{})
 Allocate a memory region on the host, which is also mapped to a memory region in the global memory of a CUDA device - so that changes to one will be reflected in the other. More...
 
void cuda::memory::mapped::free (region_pair pair)
 Free a pair of mapped memory regions. More...
 
void cuda::memory::mapped::free_region_pair_of (void *ptr)
 Free a pair of mapped memory regions using just one of them. More...
 
bool cuda::memory::mapped::is_part_of_a_region_pair (const void *ptr)
 Determine whether a given stretch of memory was allocated as part of a mapped pair of host and device memory regions. More...
 
template<typename T >
memory::region_t cuda::symbol::locate (T &&symbol)
 Locates a CUDA symbol in global or constant device memory. More...
 
void cuda::memory::device::free (void *ptr)
 Free a region of device-side memory (regardless of how it was allocated)
 
void cuda::memory::device::free (region_t region)
 
void cuda::memory::device::set (void *start, int byte_value, size_t num_bytes)
 Sets all bytes in a region of memory to a fixed value. More...
 
void cuda::memory::device::set (region_t region, int byte_value)
 
void cuda::memory::device::zero (void *start, size_t num_bytes)
 Sets all bytes in a region of memory to 0 (zero) More...
 
void cuda::memory::device::zero (region_t region)
 
void cuda::memory::copy (void *destination, const void *source, size_t num_bytes)
 Synchronously copies data between memory spaces or within a memory space. More...
 
void cuda::memory::copy (void *destination, const_region_t source)
 
void cuda::memory::copy (region_t destination, const_region_t source)
 
template<typename T , size_t N>
void cuda::memory::copy (region_t destination, const T(&source)[N])
 
template<typename T , size_t N>
void cuda::memory::copy (T(&destination)[N], const_region_t source)
 
template<typename T , size_t N>
void cuda::memory::copy (void *destination, T(&source)[N])
 
template<typename T , size_t N>
void cuda::memory::copy (T(&destination)[N], T *source)
 
void cuda::memory::copy (region_t destination, void *source, size_t num_bytes)
 
void cuda::memory::copy (region_t destination, void *source)
 
void cuda::memory::async::copy (void *destination, void const *source, size_t num_bytes, const stream_t &stream)
 Asynchronously copies data between memory spaces or within a memory space. More...
 
void cuda::memory::async::copy (void *destination, const_region_t source, size_t num_bytes, const stream_t &stream)
 
void cuda::memory::async::copy (region_t destination, const_region_t source, size_t num_bytes, const stream_t &stream)
 
void cuda::memory::async::copy (void *destination, const_region_t source, const stream_t &stream)
 
void cuda::memory::async::copy (region_t destination, const_region_t source, const stream_t &stream)
 
void cuda::memory::async::copy (region_t destination, void *source, const stream_t &stream)
 
template<typename T , size_t N>
void cuda::memory::async::copy (region_t destination, const T(&source)[N], const stream_t &stream)
 
void cuda::memory::async::copy (region_t destination, void *source, size_t num_bytes, const stream_t &stream)
 

Detailed Description

Freestanding wrapper functions for working with CUDA's various kinds of memory spaces, arranged into a relevant namespace hierarchy.

Note
Some of the CUDA API for allocating and copying memory involves the concept of "pitch" and "pitched pointers". To better understand what that means, consider the following two-dimensional representation of an array (which is in fact embedded in linear memory):

X X X X * * *
X X X X * * *
X X X X * * *

The pitch in the example above is 7 * sizeof(T); the width is 4 * sizeof(T); the height is 3 rows.

See also https://stackoverflow.com/questions/16119943/how-and-when-should-i-use-pitched-pointer-with-the-cuda-api

Enumeration Type Documentation

◆ accessibility_on_all_devices

Whether the allocated host-side memory should be recognized as pinned memory by all CUDA contexts, not just the (implicit Runtime API) context that performed the allocation.

Enumerator
is_accessible_on_all_devices
is_not_accessible_on_all_devices

◆ cpu_write_combining

A memory allocation setting: Should the allocated memory be configured as write-combined, i.e.

a write may not be immediately applied to the allocated region, nor immediately propagated (e.g. to caches, or over the PCIe bus); instead, writes may be batched and applied when convenient.

Write-combining memory frees up the host's L1 and L2 cache resources, making more cache available to the rest of the application. In addition, write-combining memory is not snooped during transfers across the PCI Express bus, which can improve transfer performance.

Reading from write-combining memory from the host is prohibitively slow, so write-combining memory should in general be used for memory that the host only writes to.

◆ map_into_device_memory

Whether or not the registration of the host-side pointer should map it into the CUDA address space for access on the device.

When true, one can then obtain the device-space pointer using cudaHostGetDevicePointer().

◆ mapped_io_space

Whether the host-side memory region being registered is memory-mapped I/O space (e.g. belonging to a third-party device), rather than ordinary system RAM.

Function Documentation

◆ allocate() [1/3]

void * cuda::memory::host::allocate ( size_t  size_in_bytes,
allocation_options  options 
)
inline

Allocate pinned host memory.

Note
"Pinned" memory is allocated in contiguous physical RAM addresses, making it possible to copy to and from the GPU using DMA, without assistance from the CPU. This improves copying bandwidth significantly over naively-allocated host memory and reduces CPU overhead.
Exceptions
cuda::runtime_error: if allocation fails for any reason
Todo:
Consider a variant of this supporting the cudaHostAlloc flags
Parameters
size_in_bytes: the amount of memory to allocate, in bytes
options: options to pass to the CUDA host-side memory allocator; see memory::allocation_options.
Returns
a pointer to the allocated stretch of memory
Note
The allocation does not keep any device context alive/active; that is the caller's responsibility. However, if there is no current context, it will trigger the creation of a primary context on the default device, and "leak" a refcount unit for it.
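A minimal usage sketch (not from the library's own documentation, and requiring an actual CUDA device to run):

```cpp
#include <cuda/api.hpp>  // umbrella header of the cuda-api-wrappers library

// Sketch only: assumes the library's umbrella header and at least one CUDA device.
void pinned_allocation_example()
{
    constexpr size_t size_in_bytes = 1024;
    // Uses the overload with defaulted options: not portable across contexts,
    // no write-combining
    void* buffer = cuda::memory::host::allocate(size_in_bytes);
    // ... fill buffer, copy to a device, etc. ...
    cuda::memory::host::free(buffer); // not cuMemFreeHost(), per the note on free()
}
```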

◆ allocate() [2/3]

region_pair cuda::memory::mapped::allocate ( cuda::context_t context,
size_t  size_in_bytes,
allocation_options  options 
)
inline

Allocate a memory region on the host, which is also mapped to a memory region in a context of some CUDA device - so that changes to one will be reflected in the other.

Parameters
context: The device context in which the device-side region in the pair will be allocated.
size_in_bytes: amount of memory to allocate (in each of the regions)
options: see allocation_options

◆ allocate() [3/3]

region_pair cuda::memory::mapped::allocate ( cuda::device_t &  device,
size_t  size_in_bytes,
allocation_options  options = allocation_options{} 
)
inline

Allocate a memory region on the host, which is also mapped to a memory region in the global memory of a CUDA device - so that changes to one will be reflected in the other.

Parameters
device: The device on which the device-side region in the pair will be allocated
size_in_bytes: amount of memory to allocate (in each of the regions)
options: see allocation_options

◆ copy() [1/15]

void cuda::memory::copy ( void *  destination,
const void *  source,
size_t  num_bytes 
)
inline

Synchronously copies data between memory spaces or within a memory space.

Note
Since we assume Compute Capability >= 2.0, all devices support the Unified Virtual Address Space, so the CUDA driver can determine, for each pointer, where the data is located, and one does not have to specify this.
The sources and destinations may be in any memory space addressable in the unified virtual address space: host-side memory, device global memory, device constant memory, etc.
Parameters
destination: A pointer to a memory region of size num_bytes.
source: A pointer to a memory region of size num_bytes.
num_bytes: The number of bytes to copy from source to destination
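Because of the unified virtual address space, a single call covers host-to-device, device-to-host and device-to-device copies. A sketch under the assumption that device 0 exists and that `region_t` exposes a `start()` accessor:

```cpp
#include <cuda/api.hpp>
#include <vector>

// Sketch only: assumes a CUDA device with id 0 is present.
void copy_example()
{
    auto device = cuda::device::get(0);
    constexpr size_t n = 256;
    std::vector<float> host_data(n, 1.0f);
    auto device_region = cuda::memory::device::allocate(device, n * sizeof(float));
    // The driver infers the copy direction from the pointers themselves
    cuda::memory::copy(device_region.start(), host_data.data(), n * sizeof(float));
    cuda::memory::device::free(device_region);
}
```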

◆ copy() [2/15]

void cuda::memory::copy ( void *  destination,
const_region_t  source 
)
inline
Parameters
destination: A memory region of the same size as source.
source: A region whose contents are to be copied.

◆ copy() [3/15]

void cuda::memory::copy ( region_t  destination,
const_region_t  source 
)
inline
Parameters
destination: A region of memory into which to copy the data, of size at least that of source, either in host memory or on any CUDA device's global memory.
source: A region whose contents are to be copied, either in host memory or on any CUDA device's global memory

◆ copy() [4/15]

template<typename T , size_t N>
void cuda::memory::copy ( region_t  destination,
const T(&)  source[N] 
)
inline
Parameters
destination: A region of memory into which to copy the data in source, of size at least that of source.
source: A plain array whose contents are to be copied.

◆ copy() [5/15]

template<typename T , size_t N>
void cuda::memory::copy ( T(&)  destination[N],
const_region_t  source 
)
inline
Parameters
destination: A plain array into which to copy the data in source.
source: A region of at least sizeof(T)*N bytes whose contents are to be copied.

◆ copy() [6/15]

template<typename T , size_t N>
void cuda::memory::copy ( T(&)  destination[N],
T *  source 
)
inline
Parameters
destination: A plain array into which to copy the data in source.
source: The starting address of the elements to copy
Template Parameters
N: the number of elements to copy

◆ copy() [7/15]

void cuda::memory::copy ( region_t  destination,
void *  source,
size_t  num_bytes 
)
inline
Parameters
destination: A region of memory into which to copy the data in source, of size at least num_bytes.
source: A pointer to a memory region of size num_bytes.
num_bytes: The number of bytes to copy from source to destination

◆ copy() [8/15]

template<typename T , dimensionality_t NumDimensions>
void cuda::memory::copy ( const array_t< T, NumDimensions > &  destination,
const T *  source 
)

Synchronously copies data into a CUDA array from non-array memory.

Template Parameters
NumDimensions: the number of array dimensions; only 2 and 3 are supported values
T: array element type
Parameters
destination: A NumDimensions-dimensional CUDA array
source: A pointer to a region of contiguous memory holding destination.size() values of type T. The memory may be located either on a CUDA device or in host memory.

◆ copy() [9/15]

template<typename T , dimensionality_t NumDimensions>
void cuda::memory::copy ( T *  destination,
const array_t< T, NumDimensions > &  source 
)

Synchronously copies data from a CUDA array into non-array memory.

Template Parameters
NumDimensions: the number of array dimensions; only 2 and 3 are supported values
T: array element type
Parameters
destination: A pointer to a region of contiguous memory holding source.size() values of type T. The memory may be located either on a CUDA device or in host memory.
source: A NumDimensions-dimensional CUDA array

◆ copy() [10/15]

void cuda::memory::async::copy ( void *  destination,
void const *  source,
size_t  num_bytes,
const stream_t stream 
)
inline

Asynchronously copies data between memory spaces or within a memory space.

Note
Since we assume Compute Capability >= 2.0, all devices support the Unified Virtual Address Space, so the CUDA driver can determine, for each pointer, where the data is located, and one does not have to specify this.
asynchronous version of memory::copy
Parameters
destination: A (pointer to) a memory region of size num_bytes, either in host memory or on any CUDA device's global memory. Must be defined in the same context as the stream.
source: A (pointer to) a memory region of size num_bytes, either in host memory or on any CUDA device's global memory. Must be defined in the same context as the stream.
num_bytes: The number of bytes to copy from source to destination
stream: A stream on which to enqueue the copy operation

◆ copy() [11/15]

template<typename T , size_t N>
void cuda::memory::async::copy ( region_t  destination,
const T(&)  source[N],
const stream_t stream 
)
inline
Parameters
source: A plain array whose contents are to be copied.

◆ copy() [12/15]

template<typename T , dimensionality_t NumDimensions>
void cuda::memory::async::copy ( array_t< T, NumDimensions > &  destination,
const T *  source,
const stream_t stream 
)
inline

Asynchronously copies data from memory spaces into CUDA arrays.

Note
asynchronous version of memory::copy
Parameters
destination: A CUDA array (cuda::array_t)
source: A pointer to a memory region of size destination.size() * sizeof(T)
stream: schedule the copy operation on this CUDA stream

◆ copy() [13/15]

template<typename T , dimensionality_t NumDimensions>
void cuda::memory::async::copy ( T *  destination,
const array_t< T, NumDimensions > &  source,
const stream_t stream 
)
inline

Asynchronously copies data from CUDA arrays into memory spaces.

Note
asynchronous version of memory::copy
Parameters
destination: A pointer to a memory region of size source.size() * sizeof(T)
source: A CUDA array (cuda::array_t)
stream: schedule the copy operation on this CUDA stream

◆ copy() [14/15]

template<typename T , size_t N>
void cuda::memory::async::copy ( T(&)  destination[N],
T *  source,
const stream_t stream 
)
inline
Parameters
destination: A plain array into which to copy the data in source.
source: The starting address of the elements to copy
Template Parameters
N: the number of elements to copy

◆ copy() [15/15]

template<typename T , size_t N>
void cuda::memory::async::copy ( T(&)  destination[N],
const_region_t  source,
const stream_t stream 
)
inline
Parameters
destination: A plain array into which to copy the data in source.
source: A region of at least sizeof(T)*N bytes whose contents are to be copied.

◆ copy_single() [1/2]

template<typename T >
void cuda::memory::copy_single ( T *  destination,
const T *  source 
)

Synchronously copies a single (typed) value between two memory locations.

Parameters
destination: a value residing either in host memory or on any CUDA device's global memory
source: a value residing either in host memory or on any CUDA device's global memory

◆ copy_single() [2/2]

template<typename T >
void cuda::memory::async::copy_single ( T &  destination,
const T &  source,
const stream_t stream 
)
inline

Asynchronously copies a single (typed) value between memory spaces or within a memory space.

Note
asynchronous version of memory::copy_single
Parameters
destination: a value residing either in host memory or on any CUDA device's global memory
source: a value residing either in host memory or on any CUDA device's global memory
stream: The CUDA command queue on which this copying will be enqueued

◆ free() [1/2]

void cuda::memory::host::free ( void *  host_ptr)
inline

Free a region of pinned host memory which was allocated with allocate.

Note
You can't just use cuMemFreeHost - or you'll leak a primary context reference unit.

◆ free() [2/2]

void cuda::memory::mapped::free ( region_pair  pair)
inline

Free a pair of mapped memory regions.

Parameters
pair: a pair of regions allocated with allocate (or with the C-style CUDA runtime API directly)

◆ free_region_pair_of()

void cuda::memory::mapped::free_region_pair_of ( void *  ptr)
inline

Free a pair of mapped memory regions using just one of them.

Parameters
ptr: a pointer to one of the mapped regions (can be either the device-side or the host-side)

◆ is_part_of_a_region_pair()

bool cuda::memory::mapped::is_part_of_a_region_pair ( const void *  ptr)
inline

Determine whether a given stretch of memory was allocated as part of a mapped pair of host and device memory regions.

Todo:
What if it's a managed pointer?
Parameters
ptr: the beginning of a memory region, in either host or device memory, to check
Returns
true iff the region was allocated as one side of a mapped memory region pair
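A sketch of allocating, probing and freeing a mapped region pair; the member names `host_side` and `device_side` and the `start()` accessor are assumptions about `region_pair`, and a device supporting mapped (zero-copy) host memory is required:

```cpp
#include <cuda/api.hpp>
#include <cassert>

// Sketch only: assumes device 0 supports mapped host memory,
// and that region_pair exposes host_side / device_side regions.
void mapped_pair_example()
{
    auto device = cuda::device::get(0);
    auto pair = cuda::memory::mapped::allocate(device, 4096);
    // Writes through the host-side region are visible through the device-side one
    assert(cuda::memory::mapped::is_part_of_a_region_pair(pair.host_side.start()));
    cuda::memory::mapped::free(pair);
}
```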

◆ locate()

template<typename T >
memory::region_t cuda::symbol::locate ( T &&  symbol)
inline

Locates a CUDA symbol in global or constant device memory.

Note
symbol_t symbols are associated with the primary context
Returns
The region of memory CUDA associates with the symbol

◆ prefetch_to_host()

void cuda::memory::managed::async::prefetch_to_host ( const_region_t  region,
const stream_t stream 
)
inline

Prefetches a region of managed memory into host memory.

It can later be used there without waiting for I/O from any of the CUDA devices.
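A sketch combining managed allocation with both prefetch directions; `device.create_stream(cuda::stream::async)` follows the library's stream-creation convention but is an assumption here, and a device supporting managed memory is required:

```cpp
#include <cuda/api.hpp>

// Sketch only: assumes device 0 supports managed (unified) memory.
void managed_prefetch_example()
{
    auto device = cuda::device::get(0);
    auto stream = device.create_stream(cuda::stream::async);
    // The same address is valid on the host and on all devices
    auto region = cuda::memory::managed::allocate(4096);
    cuda::memory::managed::async::prefetch(region, device, stream);  // migrate pages to the device
    // ... launch kernels using the region on `stream` ...
    cuda::memory::managed::async::prefetch_to_host(region, stream);  // migrate them back
    stream.synchronize();
    cuda::memory::managed::free(region);
}
```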

◆ set() [1/4]

void cuda::memory::set ( void *  ptr,
int  byte_value,
size_t  num_bytes 
)
inline

Sets a number of bytes in memory to a fixed value.

Note
The equivalent of ::std::memset - for any and all CUDA-related memory spaces
Parameters
ptr: Address of the first byte in memory to set. May be in host-side memory, global CUDA-device-side memory or CUDA-managed memory.
byte_value: value to set the memory region to
num_bytes: The amount of memory to set to byte_value

◆ set() [2/4]

void cuda::memory::set ( region_t  region,
int  byte_value 
)
inline

Sets all bytes in a region of memory to a fixed value.

Note
The equivalent of ::std::memset - for any and all CUDA-related memory spaces
Parameters
region: the memory region to set; may be in host-side memory, global CUDA-device-side memory or CUDA-managed memory.
byte_value: value to set the memory region to

◆ set() [3/4]

void cuda::memory::device::async::set ( void *  start,
int  byte_value,
size_t  num_bytes,
const stream_t stream 
)
inline

Asynchronously sets all bytes in a stretch of memory to a single value.

Note
asynchronous version of memory::set
Parameters
start: starting address of the memory region to set, in a CUDA device's global memory
byte_value: value to set the memory region to
num_bytes: size of the memory region in bytes
stream: stream on which to schedule this action

◆ set() [4/4]

void cuda::memory::host::set ( void *  start,
int  byte_value,
size_t  num_bytes 
)
inline

Sets all bytes in a stretch of host-side memory to a single value.

Note
a wrapper for ::std::memset
Parameters
start: starting address of the memory region to set, in host memory; can be either CUDA-allocated or otherwise
byte_value: value to set the memory region to
num_bytes: size of the memory region in bytes

◆ typed_set()

template<typename T >
void cuda::memory::device::async::typed_set ( T *  start,
const T &  value,
size_t  num_elements,
const stream_t stream 
)
inline

Sets consecutive elements of a region of memory to a fixed value of some width.

Note
A generalization of async::set(), for different-size units.
Template Parameters
T: An unsigned integer type of size 1, 2, 4 or 8
Parameters
start: The first location to set to value; must be properly aligned.
value: A (properly aligned) value to set T-elements to.
num_elements: The number of type-T elements (i.e. not necessarily the number of bytes).
stream: The stream on which to enqueue the operation.
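A sketch of setting every 32-bit element of a device buffer to one value; the `region_t::start()` accessor and the stream-creation call are assumptions:

```cpp
#include <cuda/api.hpp>
#include <cstdint>

// Sketch only: assumes a CUDA device with id 0 is present.
void typed_set_example()
{
    auto device = cuda::device::get(0);
    auto stream = device.create_stream(cuda::stream::async);
    constexpr size_t n = 1000;
    auto region = cuda::memory::device::allocate(device, n * sizeof(uint32_t));
    auto* ptr = static_cast<uint32_t*>(region.start());
    // Set each 32-bit element (not each byte) to the same value, asynchronously
    cuda::memory::device::async::typed_set(ptr, uint32_t{0xDEADBEEF}, n, stream);
    stream.synchronize();
    cuda::memory::device::free(region);
}
```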

◆ zero() [1/4]

void cuda::memory::zero ( region_t  region)
inline

Sets all bytes in a region of memory to 0 (zero)

Parameters
region: the memory region to zero-out; may be in host-side memory, global CUDA-device-side memory or CUDA-managed memory.

◆ zero() [2/4]

void cuda::memory::zero ( void *  ptr,
size_t  num_bytes 
)
inline

Sets a number of bytes starting at a given address of memory to 0 (zero)

Parameters
ptr: the start of the memory region to zero-out; may be in host-side memory, global CUDA-device-side memory or CUDA-managed memory.
num_bytes: the number of bytes to set to zero

◆ zero() [3/4]

template<typename T >
void cuda::memory::zero ( T *  ptr)
inline

Sets all bytes of a single pointed-to value to 0.

Parameters
ptr: pointer to a single element of a certain type, which may be in host-side memory, global CUDA-device-side memory or CUDA-managed memory

◆ zero() [4/4]

template<typename T >
void cuda::memory::device::async::zero ( T *  ptr,
const stream_t stream 
)
inline

Asynchronously sets all bytes of a single pointed-to value to 0 (zero).

Note
asynchronous version of memory::zero
Parameters
ptr: a pointer to the value to be set to zero; must be valid in the CUDA context of stream
stream: stream on which to schedule this action