cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
Host-side (= system) memory which is "pinned", i.e. resides in a fixed physical location, and is allocated by the CUDA driver.
Typedefs | |
using | unique_region = memory::unique_region< detail_::deleter > |
A unique region of pinned host memory. | |
Enumerations | |
enum | mapped_io_space : bool { is_mapped_io_space = true, is_not_mapped_io_space = false } |
Whether or not the host-side region being registered should be treated as memory-mapped I/O space, e.g. belonging to a third-party PCIe device. | |
enum | map_into_device_memory : bool { map_into_device_memory = true, do_not_map_into_device_memory = false } |
Whether or not the registration of the host-side pointer should map it into the CUDA address space for access on the device. | |
enum | accessibility_on_all_devices : bool { is_accessible_on_all_devices = true, is_not_accessible_on_all_devices = false } |
Whether the allocated host-side memory should be recognized as pinned memory by all CUDA contexts, not just the (implicit Runtime API) context that performed the allocation. | |
Functions | |
region_t | allocate (size_t size_in_bytes, allocation_options options) |
Allocates pinned host memory. | |
region_t | allocate (size_t size_in_bytes, portability_across_contexts portability=portability_across_contexts(false), cpu_write_combining cpu_wc=cpu_write_combining(false)) |
Allocates pinned host memory. | |
region_t | allocate (size_t size_in_bytes, cpu_write_combining cpu_wc) |
Allocates pinned host memory. | |
void | free (void *host_ptr) |
Frees a region of pinned host memory which was allocated with one of the pinned host memory allocation functions. | |
void | free (region_t region) |
Frees a region of pinned host memory which was allocated with one of the pinned host memory allocation functions. | |
void | register_ (const void *ptr, size_t size, bool register_mapped_io_space, bool map_into_device_space, bool make_device_side_accessible_to_all) |
Register a memory region with the CUDA driver. | |
void | register_ (const_region_t region, bool register_mapped_io_space, bool map_into_device_space, bool make_device_side_accessible_to_all) |
Register a memory region with the CUDA driver. | |
void | register_ (void const *ptr, size_t size) |
Register a memory region with the CUDA driver. | |
void | register_ (const_region_t region) |
Register a memory region with the CUDA driver. | |
void | deregister (const void *ptr) |
Have the CUDA driver "forget" about a region of memory which was previously registered with it, and page-unlock it. | |
void | deregister (const_region_t region) |
Have the CUDA driver "forget" about a region of memory which was previously registered with it, and page-unlock it. | |
template<typename T > | |
unique_span< T > | make_unique_span (size_t size) |
Allocate memory for a consecutive sequence of typed elements in system (host-side) memory. | |
unique_region | make_unique_region (size_t num_bytes) |
Allocate a physical-address-pinned region of system memory. | |
void | set (void *start, int byte_value, size_t num_bytes) |
Sets all bytes in a stretch of host-side memory to a single value. | |
void | set (region_t region, int byte_value) |
void | zero (void *start, size_t num_bytes) |
Zero-out a region of host memory. | |
void | zero (region_t region) |
Zero-out a region of host memory. | |
template<typename T > | |
void | zero (T *ptr) |
Sets all bytes of a single pointed-to value to 0 (zero). | |
Host-side (= system) memory which is "pinned", i.e. resides in a fixed physical location, and is allocated by the CUDA driver.
enum cuda::memory::host::accessibility_on_all_devices : bool |
Whether the allocated host-side memory should be recognized as pinned memory by all CUDA contexts, not just the (implicit Runtime API) context that performed the allocation.
Enumerator | |
---|---|
is_accessible_on_all_devices | is_accessible_on_all_devices |
is_not_accessible_on_all_devices | is_not_accessible_on_all_devices |
enum cuda::memory::host::map_into_device_memory : bool |
Whether or not the registration of the host-side pointer should map it into the CUDA address space for access on the device.
When true, one can then obtain the device-space pointer using mapped::device_side_pointer_for().
enum cuda::memory::host::mapped_io_space : bool |
Whether or not the host-side region being registered should be treated as memory-mapped I/O space, e.g. belonging to a third-party PCIe device; see CU_MEMHOSTREGISTER_IOMEMORY for more details.
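The bool-backed enumerations above let a call site name each option rather than pass a bare `true` or `false`. The following is a minimal sketch of that pattern, not the library's implementation: the two enumerations are copied from this page, while the `sketch` namespace, `make_flags` and the two flag constants are hypothetical names introduced only for illustration.

```cpp
#include <cassert>

namespace sketch {

// Copied from the enumeration list on this page.
enum mapped_io_space : bool {
    is_mapped_io_space = true,
    is_not_mapped_io_space = false
};
enum accessibility_on_all_devices : bool {
    is_accessible_on_all_devices = true,
    is_not_accessible_on_all_devices = false
};

// Hypothetical flag bits, local to this sketch (not taken from the CUDA headers).
constexpr unsigned flag_portable  = 0x1u;
constexpr unsigned flag_io_memory = 0x4u;

// Hypothetical helper: fold the named options into a single bit mask.
constexpr unsigned make_flags(mapped_io_space io, accessibility_on_all_devices all)
{
    return (io ? flag_io_memory : 0u) | (all ? flag_portable : 0u);
}

// Usage: both spellings are accepted, the named enumerator and an explicit
// conversion from bool (the form used for the default arguments of allocate()).
inline void flags_example()
{
    assert(make_flags(is_mapped_io_space, is_not_accessible_on_all_devices) == flag_io_memory);
    assert(make_flags(mapped_io_space(false), accessibility_on_all_devices(true)) == flag_portable);
}

} // namespace sketch
```

Because each option has its own type, swapping two arguments by mistake fails to compile instead of silently selecting the wrong flag.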
region_t cuda::memory::host::allocate (size_t size_in_bytes, allocation_options options) | inline |
Allocates pinned host memory.
cuda::runtime_error | if allocation fails for any reason |
size_in_bytes | the amount of memory to allocate, in bytes |
options | options to pass to the CUDA host-side memory allocator; see memory::allocation_options. |
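region_t cuda::memory::host::allocate (size_t size_in_bytes, portability_across_contexts portability=portability_across_contexts(false), cpu_write_combining cpu_wc=cpu_write_combining(false)) | inline |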
Allocates pinned host memory.
cuda::runtime_error | if allocation fails for any reason |
size_in_bytes | the amount of memory to allocate, in bytes |
portability | whether or not the allocated region can be used in different CUDA contexts. |
cpu_wc | whether or not the CPU can batch multiple writes to this area and propagate them at its convenience (write combining). |
region_t cuda::memory::host::allocate (size_t size_in_bytes, cpu_write_combining cpu_wc) | inline |
Allocates pinned host memory.
cuda::runtime_error | if allocation fails for any reason |
size_in_bytes | the amount of memory to allocate, in bytes |
cpu_wc | whether or not the CPU can batch multiple writes to this area and propagate them at its convenience (write combining). |
void cuda::memory::host::deregister (const void *ptr) | inline |
Have the CUDA driver "forget" about a region of memory which was previously registered with it, and page-unlock it.
void cuda::memory::host::deregister (const_region_t region) | inline |
Have the CUDA driver "forget" about a region of memory which was previously registered with it, and page-unlock it.
void cuda::memory::host::free (void *host_ptr) | inline |
Frees a region of pinned host memory which was allocated with one of the pinned host memory allocation functions.
void cuda::memory::host::free (region_t region) | inline |
Frees a region of pinned host memory which was allocated with one of the pinned host memory allocation functions.
region | The region of memory to free |
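The allocate and free functions above pair up: allocation reports failure by throwing, and a region can be released either through its start address or through the region itself. A minimal sketch of that calling convention follows. It is not the library's code: the `sketch` namespace and this `region_t` are stand-ins, `std::runtime_error` stands in for `cuda::runtime_error`, and `std::malloc` does not pin memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>

namespace sketch {

// Stand-in for the library's region type: a start address plus a size in bytes.
struct region_t {
    void*       start;
    std::size_t size;
};

// Stand-in allocator which throws on failure, like the allocate() overloads above.
// NOTE: std::malloc does NOT pin memory; it only mimics the calling convention.
inline region_t allocate(std::size_t size_in_bytes)
{
    assert(size_in_bytes > 0);
    void* p = std::malloc(size_in_bytes);
    if (p == nullptr) { throw std::runtime_error("host allocation failed"); }
    return region_t{p, size_in_bytes};
}

// Two ways to release, mirroring free(void *host_ptr) and free(region_t region).
inline void free(void* host_ptr)  { std::free(host_ptr); }
inline void free(region_t region) { free(region.start); }

} // namespace sketch
```

The region overload only forwards the start address, so either form releases the same allocation; each region must be freed exactly once.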
unique_region cuda::memory::host::make_unique_region (size_t num_bytes) | inline |
Allocate a physical-address-pinned region of system memory.
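Since `unique_region` is declared on this page as `memory::unique_region< detail_::deleter >`, the returned object presumably releases its memory when it goes out of scope. The sketch below illustrates that ownership idea with `std::unique_ptr` and a custom deleter. All names in the `sketch` namespace are hypothetical simplifications, the release counter exists only to make the behaviour observable, and `std::malloc` is a placeholder which does not pin memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

namespace sketch {

// Counts releases so that the ownership behaviour can be observed.
inline int& release_count() { static int n = 0; return n; }

// Stand-in deleter; a real one would hand the memory back to the CUDA driver.
struct deleter {
    void operator()(void* p) const { std::free(p); ++release_count(); }
};

// Hypothetical simplification of an owning region: owning pointer plus size in bytes.
struct unique_region {
    std::unique_ptr<void, deleter> start;
    std::size_t size;
};

inline unique_region make_unique_region(std::size_t num_bytes)
{
    assert(num_bytes > 0);
    void* p = std::malloc(num_bytes);   // NOT pinned; placeholder only
    if (p == nullptr) { throw std::bad_alloc(); }
    return unique_region{ std::unique_ptr<void, deleter>(p), num_bytes };
}

} // namespace sketch
```

With this shape, no explicit call to a free function is needed: leaving the scope of the `unique_region` runs the deleter exactly once.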
template<typename T > unique_span< T > cuda::memory::host::make_unique_span (size_t size) |
Allocate memory for a consecutive sequence of typed elements in system (host-side) memory.
T | type of the individual elements in the allocated sequence |
size | the number of elements to allocate |
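Note that `size` here counts elements, whereas the region-based functions on this page count bytes. The sketch below shows the conversion between the two under simplifying assumptions: the `sketch::unique_span` struct is a hypothetical stand-in for the library's type, only trivial element types are handled, and `std::malloc` is a placeholder which does not pin memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <type_traits>

namespace sketch {

struct free_deleter {
    void operator()(void* p) const { std::free(p); }
};

// Hypothetical, simplified owning span: typed pointer plus element count.
template <typename T>
struct unique_span {
    std::unique_ptr<T[], free_deleter> data;
    std::size_t size;   // number of elements, not bytes
};

template <typename T>
unique_span<T> make_unique_span(std::size_t size)
{
    static_assert(std::is_trivial<T>::value, "this sketch only handles trivial element types");
    assert(size > 0);
    void* p = std::malloc(size * sizeof(T));   // element count converted to bytes; NOT pinned
    if (p == nullptr) { throw std::bad_alloc(); }
    return unique_span<T>{ std::unique_ptr<T[], free_deleter>(static_cast<T*>(p)), size };
}

} // namespace sketch
```

A call such as `sketch::make_unique_span<int>(4)` therefore reserves `4 * sizeof(int)` bytes and reports a size of 4.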
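void cuda::memory::host::register_ (const void *ptr, size_t size, bool register_mapped_io_space, bool map_into_device_space, bool make_device_side_accessible_to_all) | inline |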
Register a memory region with the CUDA driver.
Page-locks the memory range specified by ptr and size, and maps it for the device(s) as specified by flags. This memory range is also added to the same tracking mechanism as cuMemAllocHost(), to automatically accelerate calls to functions such as cuMemcpy().
Note: currently works within the current context.
Note: the function is named register_ rather than register, since the latter is a reserved word.
ptr | The beginning of a pre-allocated region of host memory |
size | the size, in bytes, of the memory region to register |
register_mapped_io_space | region will be treated as being some memory-mapped I/O space, e.g. belonging to a third-party PCIe device. See CU_MEMHOSTREGISTER_IOMEMORY for more details. |
map_into_device_space | If true, map the region to a region of addresses accessible from the (current context's) device; in practice, and with modern GPUs, this means the region itself will be accessible from the device. See CU_MEMHOSTREGISTER_DEVICEMAP for more details. |
make_device_side_accessible_to_all | Make the region accessible in all CUDA contexts. |
considered_read_only_by_device | Device-side code will consider this region (or rather the region it is mapped to and accessible from the device) as read-only; see CU_MEMHOSTREGISTER_READ_ONLY for more details. |
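void cuda::memory::host::register_ (const_region_t region, bool register_mapped_io_space, bool map_into_device_space, bool make_device_side_accessible_to_all) | inline |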
Register a memory region with the CUDA driver.
Page-locks the memory range specified by region, and maps it for the device(s) as specified by flags. This memory range is also added to the same tracking mechanism as cuMemAllocHost(), to automatically accelerate calls to functions such as cuMemcpy().
Note: currently works within the current context.
Note: the function is named register_ rather than register, since the latter is a reserved word.
region | The region to register |
register_mapped_io_space | region will be treated as being some memory-mapped I/O space, e.g. belonging to a third-party PCIe device. See CU_MEMHOSTREGISTER_IOMEMORY for more details. |
map_into_device_space | If true, map the region to a region of addresses accessible from the (current context's) device; in practice, and with modern GPUs, this means the region itself will be accessible from the device. See CU_MEMHOSTREGISTER_DEVICEMAP for more details. |
make_device_side_accessible_to_all | Make the region accessible in all CUDA contexts. |
considered_read_only_by_device | Device-side code will consider this region (or rather the region it is mapped to and accessible from the device) as read-only; see CU_MEMHOSTREGISTER_READ_ONLY for more details. |
void cuda::memory::host::register_ (void const *ptr, size_t size) | inline |
Register a memory region with the CUDA driver.
Page-locks the memory range specified by ptr and size, and maps it for the device(s) as specified by flags. This memory range is also added to the same tracking mechanism as cuMemAllocHost(), to automatically accelerate calls to functions such as cuMemcpy().
Note: currently works within the current context.
Note: the function is named register_ rather than register, since the latter is a reserved word.
ptr | The beginning of a pre-allocated region of host memory |
size | the size, in bytes, of the memory region to register |
void cuda::memory::host::register_ (const_region_t region) | inline |
Register a memory region with the CUDA driver.
Page-locks the memory range specified by region, and maps it for the device(s) as specified by flags. This memory range is also added to the same tracking mechanism as cuMemAllocHost(), to automatically accelerate calls to functions such as cuMemcpy().
Note: currently works within the current context.
Note: the function is named register_ rather than register, since the latter is a reserved word.
region | The region to register |
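Every successful register_ call should eventually be matched by a deregister call on the same memory. One way to guarantee the pairing is a small scope guard, sketched below. This is an illustration only: `registration_guard` is a hypothetical class, not something this page documents, and the `register_` and `deregister` functions in the `sketch` namespace are stand-ins which merely record addresses instead of calling the CUDA driver.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

namespace sketch {

// Stand-in bookkeeping; the real functions page-lock memory via the CUDA driver.
inline std::set<const void*>& registry()
{
    static std::set<const void*> r;
    return r;
}
inline void register_(const void* ptr, std::size_t size)
{
    assert(ptr != nullptr && size > 0);
    registry().insert(ptr);
}
inline void deregister(const void* ptr) { registry().erase(ptr); }

// Hypothetical guard: registers on construction, deregisters on destruction.
class registration_guard {
public:
    registration_guard(const void* ptr, std::size_t size) : ptr_(ptr)
    {
        register_(ptr, size);
    }
    ~registration_guard() { deregister(ptr_); }

    // Non-copyable, so that a region is never deregistered twice.
    registration_guard(const registration_guard&) = delete;
    registration_guard& operator=(const registration_guard&) = delete;

private:
    const void* ptr_;
};

} // namespace sketch
```

The guard suits memory which was allocated elsewhere (e.g. a buffer owned by another library) and which should be page-locked only for the duration of a scope.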
void cuda::memory::host::set (void *start, int byte_value, size_t num_bytes) | inline |
Sets all bytes in a stretch of host-side memory to a single value.
byte_value | The value to set each byte in the memory region to. |
start | starting address of the memory region to set, in host memory; can be either CUDA-allocated or otherwise. |
num_bytes | size of the memory region in bytes |
void cuda::memory::host::set (region_t region, int byte_value) | inline |
region | The region of memory to set to the fixed value |
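The two set overloads take an `int` byte value, as `std::memset` does. The sketch below assumes memset-like semantics, which this page does not state explicitly; the `sketch` namespace and its `region_t` are stand-ins for the library's own definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace sketch {

// Stand-in for the library's region type.
struct region_t {
    void*       start;
    std::size_t size;
};

// byte_value is an int, as with std::memset, which converts it to unsigned char.
inline void set(void* start, int byte_value, std::size_t num_bytes)
{
    assert(start != nullptr || num_bytes == 0);
    std::memset(start, byte_value, num_bytes);
}

// The region overload forwards the region's start address and size.
inline void set(region_t region, int byte_value)
{
    set(region.start, byte_value, region.size);
}

} // namespace sketch
```

Under this assumption, a value outside the range of a byte is truncated: passing `0x101` writes `0x01` to each byte.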
void cuda::memory::host::zero (void *start, size_t num_bytes) | inline |
Zero-out a region of host memory.
start | the beginning of a region of host memory to zero-out |
num_bytes | the size in bytes of the region of memory to zero-out |
void cuda::memory::host::zero (region_t region) | inline |
Zero-out a region of host memory.
region | the region of host-side memory to zero-out |
template<typename T > void cuda::memory::host::zero (T *ptr) | inline |
Sets all bytes of a single pointed-to value to 0 (zero).
ptr | a pointer to the value to be zeroed, in host memory |
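The three zero overloads differ only in how the extent of the memory is given: an address with a byte count, a region, or a typed pointer whose pointee size is used. A minimal sketch of how such overloads can be layered follows, assuming each reduces to a byte-wise fill with 0. The `sketch` namespace and its `region_t` are stand-ins, and the restriction to trivially copyable types is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

namespace sketch {

// Stand-in for the library's region type.
struct region_t {
    void*       start;
    std::size_t size;
};

// Address plus byte count.
inline void zero(void* start, std::size_t num_bytes)
{
    assert(start != nullptr || num_bytes == 0);
    std::memset(start, 0, num_bytes);
}

// Region: forwards its start address and size.
inline void zero(region_t region) { zero(region.start, region.size); }

// Typed pointer: zeroes exactly sizeof(T) bytes, i.e. a single value.
template <typename T>
void zero(T* ptr)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise zeroing is only meaningful for trivially copyable types");
    zero(ptr, sizeof(T));
}

} // namespace sketch
```

The typed overload zeroes one value only; to clear an array, pass its total size in bytes to the two-argument overload or wrap it in a region.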