cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
cuda::memory::host Namespace Reference

Host-side (= system) memory which is "pinned", i.e. resides in a fixed physical location and is allocated via the CUDA driver. More...

Typedefs

using unique_region = memory::unique_region< detail_::deleter >
 A unique region of pinned host memory.
 

Enumerations

enum  mapped_io_space : bool {
  is_mapped_io_space = true,
  is_not_mapped_io_space = false
}
 Whether the host-side memory region being registered should be treated as memory-mapped I/O space (e.g. belonging to a third-party PCIe device). More...
 
enum  map_into_device_memory : bool {
  map_into_device_memory = true,
  do_not_map_into_device_memory = false
}
 Whether or not the registration of the host-side pointer should map it into the CUDA address space for access on the device. More...
 
enum  accessibility_on_all_devices : bool {
  is_accessible_on_all_devices = true,
  is_not_accessible_on_all_devices = false
}
 Whether the allocated host-side memory should be recognized as pinned memory by all CUDA contexts, not just the (implicit Runtime API) context that performed the allocation. More...
 

Functions

region_t allocate (size_t size_in_bytes, allocation_options options)
 Allocates pinned host memory. More...
 
region_t allocate (size_t size_in_bytes, portability_across_contexts portability=portability_across_contexts(false), cpu_write_combining cpu_wc=cpu_write_combining(false))
 Allocates pinned host memory. More...
 
region_t allocate (size_t size_in_bytes, cpu_write_combining cpu_wc)
 Allocates pinned host memory. More...
 
void free (void *host_ptr)
 Frees a region of pinned host memory which was allocated with one of the pinned host memory allocation functions. More...
 
void free (region_t region)
 Frees a region of pinned host memory which was allocated with one of the pinned host memory allocation functions. More...
 
void register_ (const void *ptr, size_t size, bool register_mapped_io_space, bool map_into_device_space, bool make_device_side_accessible_to_all)
 Register a memory region with the CUDA driver. More...
 
void register_ (const_region_t region, bool register_mapped_io_space, bool map_into_device_space, bool make_device_side_accessible_to_all)
 Register a memory region with the CUDA driver. More...
 
void register_ (void const *ptr, size_t size)
 Register a memory region with the CUDA driver. More...
 
void register_ (const_region_t region)
 Register a memory region with the CUDA driver. More...
 
void deregister (const void *ptr)
 Have the CUDA driver "forget" about a region of memory which was previously registered with it, and page-unlock it. More...
 
void deregister (const_region_t region)
 Have the CUDA driver "forget" about a region of memory which was previously registered with it, and page-unlock it. More...
 
template<typename T >
unique_span< T > make_unique_span (size_t size)
 Allocate memory for a consecutive sequence of typed elements in system (host-side) memory. More...
 
unique_region make_unique_region (size_t num_bytes)
 Allocate a physical-address-pinned region of system memory. More...
 
void set (void *start, int byte_value, size_t num_bytes)
 Sets all bytes in a stretch of host-side memory to a single value. More...
 
void set (region_t region, int byte_value)
 
void zero (void *start, size_t num_bytes)
 Zero-out a region of host memory. More...
 
void zero (region_t region)
 Zero-out a region of host memory. More...
 
template<typename T >
void zero (T *ptr)
 Sets all bytes of a single pointed-to value to 0 (zero). More...
 

Detailed Description

Host-side (= system) memory which is "pinned", i.e. resides in a fixed physical location, and is allocated via the CUDA driver.
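
Example: a minimal usage sketch (not taken from the library's own documentation), assuming the library's umbrella header is <cuda/api.hpp>; only functions declared in this namespace are used:

    #include <cuda/api.hpp>   // assumed umbrella header of cuda-api-wrappers

    int main()
    {
        // Option 1: have the CUDA driver allocate pinned host memory
        auto region = cuda::memory::host::allocate(1024 * 1024);
        cuda::memory::host::zero(region);
        // ... stage copies to/from the device through 'region' ...
        cuda::memory::host::free(region);

        // Option 2: page-lock ("register") memory allocated by other means
        static char buffer[4096];
        cuda::memory::host::register_(buffer, sizeof(buffer));
        // ... 'buffer' now behaves as pinned memory for copy purposes ...
        cuda::memory::host::deregister(buffer);
    }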

Enumeration Type Documentation

◆ accessibility_on_all_devices

Whether the allocated host-side memory should be recognized as pinned memory by all CUDA contexts, not just the (implicit Runtime API) context that performed the allocation.

Enumerator
is_accessible_on_all_devices
is_not_accessible_on_all_devices

◆ map_into_device_memory

Whether or not the registration of the host-side pointer should map it into the CUDA address space for access on the device.

When true, one can then obtain the device-space pointer using mapped::device_side_pointer_for()

◆ mapped_io_space

Whether the host-side memory region being registered should be treated as memory-mapped I/O space, e.g. belonging to a third-party PCIe device; see CU_MEMHOSTREGISTER_IOMEMORY for more details.

Function Documentation

◆ allocate() [1/3]

region_t cuda::memory::host::allocate (size_t size_in_bytes, allocation_options options)    [inline]

Allocates pinned host memory.

Note
"Pinned" memory is allocated in contiguous physical RAM addresses, making it possible to copy data between it and the GPU using DMA, without assistance from the CPU. This improves the copying bandwidth significantly over naively-allocated host memory, and reduces overhead for the CPU.
Exceptions
cuda::runtime_error - if allocation fails for any reason
Parameters
size_in_bytes - the amount of memory to allocate, in bytes
options - options to pass to the CUDA host-side memory allocator; see memory::allocation_options
Returns
a region_t designating the allocated stretch of memory
Note
The allocation does not keep any device context alive/active; that is the caller's responsibility. However, if there is no current context, it will trigger the creation of a primary context on the default device, and "leak" a refcount unit for it. For this (and other) reasons, one should avoid this overload, and prefer passing a context, or at least a device, to the allocation function.

◆ allocate() [2/3]

region_t cuda::memory::host::allocate (size_t size_in_bytes, portability_across_contexts portability = portability_across_contexts(false), cpu_write_combining cpu_wc = cpu_write_combining(false))    [inline]

Allocates pinned host memory.

Note
"Pinned" memory is allocated in contiguous physical RAM addresses, making it possible to copy data between it and the GPU using DMA, without assistance from the CPU. This improves the copying bandwidth significantly over naively-allocated host memory, and reduces overhead for the CPU.
Exceptions
cuda::runtime_error - if allocation fails for any reason
Parameters
size_in_bytes - the amount of memory to allocate, in bytes
portability - whether or not the allocated region can be used in different CUDA contexts
cpu_wc - whether or not the GPU can batch multiple writes to this area and propagate them at its convenience
Returns
a region_t designating the allocated stretch of memory
Note
The allocation does not keep any device context alive/active; that is the caller's responsibility. However, if there is no current context, it will trigger the creation of a primary context on the default device, and "leak" a refcount unit for it. For this (and other) reasons, one should avoid this overload, and prefer passing a context, or at least a device, to the allocation function.
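
Example: a hedged sketch of requesting a context-portable allocation, assuming portability_across_contexts is defined in the cuda::memory namespace (as the unqualified default argument above suggests):

    // Request 2 MiB of pinned host memory usable from any CUDA context
    auto region = cuda::memory::host::allocate(
        2 * 1024 * 1024,
        cuda::memory::portability_across_contexts(true));
    // ... use the region ...
    cuda::memory::host::free(region);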

◆ allocate() [3/3]

region_t cuda::memory::host::allocate (size_t size_in_bytes, cpu_write_combining cpu_wc)    [inline]

Allocates pinned host memory.

Note
"Pinned" memory is allocated in contiguous physical RAM addresses, making it possible to copy data between it and the GPU using DMA, without assistance from the CPU. This improves the copying bandwidth significantly over naively-allocated host memory, and reduces overhead for the CPU.
Exceptions
cuda::runtime_error - if allocation fails for any reason
Parameters
size_in_bytes - the amount of memory to allocate, in bytes
cpu_wc - whether or not the GPU can batch multiple writes to this area and propagate them at its convenience
Returns
a region_t designating the allocated stretch of memory
Note
The allocation does not keep any device context alive/active; that is the caller's responsibility. However, if there is no current context, it will trigger the creation of a primary context on the default device, and "leak" a refcount unit for it. For this (and other) reasons, one should avoid this overload, and prefer passing a context, or at least a device, to the allocation function.
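
Example: a hedged sketch of allocating a write-combined staging buffer, assuming cpu_write_combining is defined in the cuda::memory namespace (as the unqualified default argument of the previous overload suggests). Write-combined pinned memory is typically efficient for sequential CPU writes and host-to-device transfers, but slow for the CPU to read back:

    // Write-combined pinned staging buffer for host-to-device transfers
    auto staging = cuda::memory::host::allocate(
        1024 * 1024, cuda::memory::cpu_write_combining(true));
    // ... fill 'staging' sequentially, then copy it to the device ...
    cuda::memory::host::free(staging);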

◆ deregister() [1/2]

void cuda::memory::host::deregister (const void * ptr)    [inline]

Have the CUDA driver "forget" about a region of memory which was previously registered with it, and page-unlock it.

Note
the CUDA API calls this action "unregister", but that is semantically inaccurate: the registration is not undone or rolled back, it is simply ended.

◆ deregister() [2/2]

void cuda::memory::host::deregister (const_region_t region)    [inline]

Have the CUDA driver "forget" about a region of memory which was previously registered with it, and page-unlock it.

Note
the CUDA API calls this action "unregister", but that is semantically inaccurate: the registration is not undone or rolled back, it is simply ended.

◆ free() [1/2]

void cuda::memory::host::free (void * host_ptr)    [inline]

Frees a region of pinned host memory which was allocated with one of the pinned host memory allocation functions.

Note
The address provided must be the beginning of the region of allocated memory, and the entire region is freed (i.e. the region size is known to/determined by the driver).

◆ free() [2/2]

void cuda::memory::host::free (region_t region)    [inline]

Frees a region of pinned host memory which was allocated with one of the pinned host memory allocation functions.

Parameters
region - The region of memory to free

◆ make_unique_region()

unique_region cuda::memory::host::make_unique_region (size_t num_bytes)    [inline]

Allocate a physical-address-pinned region of system memory.

The returned object owns a region of pinned (page-locked) host memory, and frees it upon destruction.

Returns
An owning RAII/CADRe object for the allocated memory region
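
Example: a sketch using only what is documented on this page; the allocation is released automatically when the owning object goes out of scope:

    {
        // 64 KiB of pinned host memory, owned by 'region' (RAII/CADRe)
        auto region = cuda::memory::host::make_unique_region(64 * 1024);
        // ... use the region for staging host-device copies ...
    }   // leaving the scope releases the pinned memory via the deleter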

◆ make_unique_span()

template<typename T>
unique_span<T> cuda::memory::host::make_unique_span (size_t size)

Allocate memory for a consecutive sequence of typed elements in system (host-side) memory.

Template Parameters
T - type of the individual elements in the allocated sequence
Parameters
size - the number of elements to allocate
Returns
A unique_span which owns the allocated memory (and will release said memory upon destruction)
Note
This function is somewhat similar to ::std::make_unique_for_overwrite(), except that the returned value is not "just" a unique pointer, but also has a size. It is also similar to cuda::device::make_unique_region, except that the allocation is conceived as typed elements.
We assume this memory is used for copying to or from device-side memory; hence, we constrain the type to be trivially constructible, destructible and copyable.
Alignment considerations are ignored.
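
Example: a sketch, assuming unique_span<T> offers span-like size() and element access (which this page does not spell out):

    // A typed staging buffer of 1024 floats in pinned host memory
    auto staging = cuda::memory::host::make_unique_span<float>(1024);
    for (size_t i = 0; i < staging.size(); i++) {
        staging[i] = static_cast<float>(i);
    }
    // ... copy the elements to device memory ...
    // the memory is released when 'staging' is destroyed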

◆ register_() [1/4]

void cuda::memory::host::register_ (const void * ptr, size_t size, bool register_mapped_io_space, bool map_into_device_space, bool make_device_side_accessible_to_all)    [inline]

Register a memory region with the CUDA driver.

Page-locks the memory range specified by ptr and size and maps it for the device(s) as specified by flags. This memory range also is added to the same tracking mechanism as cuMemAllocHost() to automatically accelerate calls to functions such as cuMemcpy().

Currently works within the current context

Note
we can't use the name register, since that's a reserved word
Parameters
ptr - The beginning of a pre-allocated region of host memory
size - the size in bytes of the memory region to register
register_mapped_io_space - the region will be treated as being some memory-mapped I/O space, e.g. belonging to a third-party PCIe device; see CU_MEMHOSTREGISTER_IOMEMORY for more details
map_into_device_space - if true, map the region to a region of addresses accessible from the (current context's) device; in practice, and with modern GPUs, this means the region itself will be accessible from the device. See CU_MEMHOSTREGISTER_DEVICEMAP for more details
make_device_side_accessible_to_all - make the region accessible in all CUDA contexts
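
Example: a sketch pinning the storage of an ordinary ::std::vector, using only this overload and deregister():

    #include <vector>

    ::std::vector<float> samples(1u << 20);   // regular, pageable host memory

    // Page-lock the vector's storage and map it into device address space
    cuda::memory::host::register_(
        samples.data(), samples.size() * sizeof(float),
        false,   // register_mapped_io_space: not memory-mapped I/O
        true,    // map_into_device_space: obtain a device-accessible mapping
        true);   // make_device_side_accessible_to_all: usable in all contexts

    // ... launch work reading the now-pinned memory, perform fast copies ...

    cuda::memory::host::deregister(samples.data());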

◆ register_() [2/4]

void cuda::memory::host::register_ (const_region_t region, bool register_mapped_io_space, bool map_into_device_space, bool make_device_side_accessible_to_all)    [inline]

Register a memory region with the CUDA driver.

Page-locks the memory range specified by ptr and size and maps it for the device(s) as specified by flags. This memory range also is added to the same tracking mechanism as cuMemAllocHost() to automatically accelerate calls to functions such as cuMemcpy().

Currently works within the current context

Note
we can't use the name register, since that's a reserved word
Parameters
region - The region to register
register_mapped_io_space - the region will be treated as being some memory-mapped I/O space, e.g. belonging to a third-party PCIe device; see CU_MEMHOSTREGISTER_IOMEMORY for more details
map_into_device_space - if true, map the region to a region of addresses accessible from the (current context's) device; in practice, and with modern GPUs, this means the region itself will be accessible from the device. See CU_MEMHOSTREGISTER_DEVICEMAP for more details
make_device_side_accessible_to_all - make the region accessible in all CUDA contexts

◆ register_() [3/4]

void cuda::memory::host::register_ (void const * ptr, size_t size)    [inline]

Register a memory region with the CUDA driver.

Page-locks the memory range specified by ptr and size and maps it for the device(s) as specified by flags. This memory range also is added to the same tracking mechanism as cuMemAllocHost() to automatically accelerate calls to functions such as cuMemcpy().

Currently works within the current context

Note
we can't use the name register, since that's a reserved word
Parameters
ptr - The beginning of a pre-allocated region of host memory
size - the size in bytes of the memory region to register

◆ register_() [4/4]

void cuda::memory::host::register_ (const_region_t region)    [inline]

Register a memory region with the CUDA driver.

Page-locks the memory range specified by ptr and size and maps it for the device(s) as specified by flags. This memory range also is added to the same tracking mechanism as cuMemAllocHost() to automatically accelerate calls to functions such as cuMemcpy().

Currently works within the current context

Note
we can't use the name register, since that's a reserved word
Parameters
region - The region to register

◆ set() [1/2]

void cuda::memory::host::set (void * start, int byte_value, size_t num_bytes)    [inline]

Sets all bytes in a stretch of host-side memory to a single value.

Note
a wrapper for ::std::memset
Parameters
start - starting address of the memory region to set, in host memory; can be either CUDA-allocated or otherwise
byte_value - the value to set each byte in the memory region to
num_bytes - size of the memory region, in bytes

◆ set() [2/2]

void cuda::memory::host::set (region_t region, int byte_value)    [inline]

Sets all bytes in a region of host-side memory to a single value.

Parameters
region - The region of memory to set to the fixed value
byte_value - the value to set each byte in the region to
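
Example: a sketch combining the allocation, set and zero functions documented on this page:

    auto region = cuda::memory::host::allocate(256);
    cuda::memory::host::set(region, 0xFF);   // fill every byte with 0xFF
    cuda::memory::host::zero(region);        // ... then clear the region again
    cuda::memory::host::free(region);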

◆ zero() [1/3]

void cuda::memory::host::zero (void * start, size_t num_bytes)    [inline]

Zero-out a region of host memory.

Parameters
start - the beginning of the region of host memory to zero-out
num_bytes - the size in bytes of the region of memory to zero-out

◆ zero() [2/3]

void cuda::memory::host::zero (region_t region)    [inline]

Zero-out a region of host memory.

Parameters
region - the region of host-side memory to zero-out

◆ zero() [3/3]

template<typename T>
void cuda::memory::host::zero (T * ptr)    [inline]

Sets all bytes of a single pointed-to value to 0 (zero).

Parameters
ptr - a pointer to the value to be set to zero, in host memory
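
Example: a sketch for zeroing a single pinned-memory object, assuming unique_span<T> supports element access (which this page does not spell out):

    struct record { int id; double value; };   // trivially-copyable host-side type

    // One record in pinned host memory, obtained via make_unique_span()
    auto records = cuda::memory::host::make_unique_span<record>(1);
    cuda::memory::host::zero(&records[0]);   // zeroes sizeof(record) bytes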