cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
Freestanding wrapper functions for working with CUDA's various kinds of memory spaces, arranged into a relevant namespace hierarchy. More...
#include "copy_parameters.hpp"
#include "array.hpp"
#include "constants.hpp"
#include "current_device.hpp"
#include "error.hpp"
#include "pointer.hpp"
#include "current_context.hpp"
#include "detail/unique_span.hpp"
#include <cuda_runtime.h>
#include <memory>
#include <cstring>
#include <vector>
#include <utility>
Classes | |
struct | cuda::memory::allocation_options |
options accepted by CUDA's allocator of memory with a host-side aspect (host-only or managed memory). More... | |
struct | cuda::memory::mapped::span_pair_t< T > |
A pair of memory spans, one in device-global memory and one in host/system memory, mapped to it. More... | |
struct | cuda::memory::mapped::region_pair_t |
A pair of memory regions, one in system (=host) memory and one on a CUDA device's memory - mapped to each other. More... | |
Namespaces | |
cuda | |
Definitions and functionality wrapping CUDA APIs. | |
cuda::memory | |
Representation, allocation and manipulation of CUDA-related memory, of different. | |
cuda::memory::mapped | |
Memory regions appearing in both the host-side and device-side address spaces, with the regions in both spaces mapped to each other (i.e. |
cuda::memory::device | |
CUDA-Device-global memory on a single device (not accessible from the host) | |
cuda::memory::host | |
Host-side (= system) memory which is "pinned", i.e. | |
cuda::memory::managed | |
Paged memory accessible in both device-side and host-side code by triggering transfers of pages between physical system memory and physical device memory. | |
Typedefs | |
using | cuda::memory::managed::region_t = detail_::region_helper< memory::region_t > |
A child class of the generic region_t with some managed-memory-specific functionality. | |
using | cuda::memory::managed::const_region_t = detail_::region_helper< memory::const_region_t > |
A child class of the generic const_region_t with some managed-memory-specific functionality. | |
Enumerations | |
enum | cuda::memory::portability_across_contexts : bool { isnt_portable = false, is_portable = true } |
A memory allocation setting: Can the allocated memory be used in other CUDA driver contexts (in addition to the implicit default context we have with the Runtime API). | |
enum | cuda::memory::cpu_write_combining : bool { without_wc = false, with_wc = true } |
A memory allocation setting: Should the allocated memory be configured as write-combined, i.e. More... | |
enum | cuda::memory::host::mapped_io_space : bool { is_mapped_io_space = true, is_not_mapped_io_space = false } |
Whether or not the registration of the host-side pointer should map it into the CUDA address space for access on the device. More... | |
enum | cuda::memory::host::map_into_device_memory : bool { map_into_device_memory = true, do_not_map_into_device_memory = false } |
Whether or not the registration of the host-side pointer should map it into the CUDA address space for access on the device. More... | |
enum | cuda::memory::host::accessibility_on_all_devices : bool { cuda::memory::host::is_accessible_on_all_devices = true, cuda::memory::host::is_not_accessible_on_all_devices = false } |
Whether the allocated host-side memory should be recognized as pinned memory by all CUDA contexts, not just the (implicit Runtime API) context that performed the allocation. More... | |
enum | cuda::memory::managed::attachment_t : unsigned { global = CU_MEM_ATTACH_GLOBAL, host = CU_MEM_ATTACH_HOST, single_stream = CU_MEM_ATTACH_SINGLE } |
Kinds of managed memory region attachments. | |
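The bool-backed enumerations listed above give each allocation setting its own type, so a call site cannot silently swap two bare `bool` arguments. The following is only a sketch of that idea: the `sketch` namespace, the `to_flags` helper and the flag values are assumptions made for illustration, not the library's definitions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins that mirror the shape of the enumerations listed
// above. They are NOT the library's own definitions.
namespace sketch {

enum portability_across_contexts : bool { isnt_portable = false, is_portable = true };
enum cpu_write_combining : bool { without_wc = false, with_wc = true };

// A small options aggregate; the member names are assumptions for this sketch.
struct allocation_options {
    portability_across_contexts portability;
    cpu_write_combining write_combining;
};

// Illustrative flag values, chosen for this sketch only.
constexpr unsigned flag_portable = 0x1u;
constexpr unsigned flag_write_combined = 0x4u;

// Translate the typed settings into a bit mask, the way an allocator
// wrapper might before calling a C-style API (hypothetical helper).
inline unsigned to_flags(allocation_options opts)
{
    return (opts.portability == is_portable ? flag_portable : 0u)
         | (opts.write_combining == with_wc ? flag_write_combined : 0u);
}

} // namespace sketch
```

Because a `bool` does not implicitly convert to either enumeration, writing `allocation_options{true, false}` fails to compile in this sketch; the caller has to spell out `is_portable` or `with_wc`, which keeps the intent visible at the call site.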
Functions | |
void | cuda::memory::device::free (void *ptr) |
Free a region of device-side memory (regardless of how it was allocated) | |
void | cuda::memory::device::free (region_t region) |
Free a region of device-side memory (regardless of how it was allocated) More... | |
region_t | cuda::memory::device::allocate (const context_t &context, size_t size_in_bytes) |
Allocate device-side memory on a CUDA device context. More... | |
region_t | cuda::memory::device::allocate (const device_t &device, size_t size_in_bytes) |
Allocate device-side memory on a CUDA device. More... | |
template<typename T > | |
void | cuda::memory::device::typed_set (T *start, const T &value, size_t num_elements, optional_ref< const stream_t > stream={}) |
Sets consecutive elements of a region of memory to a fixed value of some width. More... | |
void | cuda::memory::device::set (void *start, int byte_value, size_t num_bytes, optional_ref< const stream_t > stream={}) |
Sets all bytes in a region of memory to a fixed value. More... | |
void | cuda::memory::device::set (region_t region, int byte_value, optional_ref< const stream_t > stream={}) |
Sets all bytes in a region of memory to a fixed value. More... | |
void | cuda::memory::device::zero (void *start, size_t num_bytes, optional_ref< const stream_t > stream={}) |
Sets all bytes in a region of memory to 0 (zero) More... | |
void | cuda::memory::device::zero (region_t region, optional_ref< const stream_t > stream={}) |
Sets all bytes in a region of memory to 0 (zero) More... | |
template<typename T > | |
void | cuda::memory::device::zero (T *ptr, optional_ref< const stream_t > stream={}) |
Sets all bytes of a single pointed-to value to 0. More... | |
void | cuda::memory::set (void *ptr, int byte_value, size_t num_bytes, optional_ref< const stream_t > stream={}) |
Sets a number of bytes in memory to a fixed value. More... | |
void | cuda::memory::set (region_t region, int byte_value, optional_ref< const stream_t > stream={}) |
Sets all bytes in a region of memory to a fixed value. More... | |
void | cuda::memory::zero (region_t region, optional_ref< const stream_t > stream={}) |
Sets all bytes in a region of memory to 0 (zero) More... | |
void | cuda::memory::zero (void *ptr, size_t num_bytes, optional_ref< const stream_t > stream={}) |
Zero-out a region of memory. More... | |
template<typename T > | |
void | cuda::memory::zero (T *ptr) |
Sets all bytes of a single pointed-to value to 0. More... | |
template<dimensionality_t NumDimensions> | |
void | cuda::memory::copy (copy_parameters_t< NumDimensions > params, optional_ref< const stream_t > stream={}) |
An almost-generalized-case memory copy, taking a rather complex structure of copy parameters - wrapping the CUDA driver's own most-generalized-case copy. More... | |
template<typename T , dimensionality_t NumDimensions> | |
void | cuda::memory::copy (const array_t< T, NumDimensions > &destination, const context_t &source_context, const T *source, optional_ref< const stream_t > stream={}) |
Synchronously copies data into a CUDA array from non-array memory. More... |
template<typename T , dimensionality_t NumDimensions> | |
void | cuda::memory::copy (array_t< T, NumDimensions > &destination, const T *source, optional_ref< const stream_t > stream={}) |
Synchronously copies data into a CUDA array from non-array memory. More... |
template<typename T , dimensionality_t NumDimensions> | |
void | cuda::memory::copy (const array_t< T, NumDimensions > &destination, span< T const > source, optional_ref< const stream_t > stream={}) |
Copies a contiguous sequence of elements in memory into a CUDA array. More... | |
template<typename T , dimensionality_t NumDimensions> | |
void | cuda::memory::copy (const context_t &context, T *destination, const array_t< T, NumDimensions > &source, optional_ref< const stream_t > stream={}) |
Synchronously copies data from a CUDA array into non-array memory. More... |
template<typename T , dimensionality_t NumDimensions> | |
void | cuda::memory::copy (T *destination, const array_t< T, NumDimensions > &source, optional_ref< const stream_t > stream={}) |
Synchronously copies data from a CUDA array into non-array memory. More... |
template<typename T , dimensionality_t NumDimensions> | |
void | cuda::memory::copy (span< T > destination, const array_t< T, NumDimensions > &source, optional_ref< const stream_t > stream={}) |
Copies the contents of a CUDA array into a sequence of contiguous elements in memory. More... | |
template<typename T , dimensionality_t NumDimensions> | |
void | cuda::memory::copy (const array_t< T, NumDimensions > &destination, const array_t< T, NumDimensions > &source, optional_ref< const stream_t > stream) |
Copies the contents of one CUDA array to another. More... | |
template<typename T , dimensionality_t NumDimensions> | |
void | cuda::memory::copy (region_t destination, const array_t< T, NumDimensions > &source, optional_ref< const stream_t > stream={}) |
Copies the contents of a CUDA array into a region of memory. More... | |
template<typename T , dimensionality_t NumDimensions> | |
void | cuda::memory::copy (array_t< T, NumDimensions > &destination, const_region_t source, optional_ref< const stream_t > stream={}) |
Copies the contents of a region of memory into a CUDA array. More... | |
template<typename T > | |
void | cuda::memory::copy_single (T *destination, const T *source, optional_ref< const stream_t > stream={}) |
Synchronously copies a single (typed) value between two memory locations. More... | |
void | cuda::memory::copy (void *destination, void const *source, size_t num_bytes, optional_ref< const stream_t > stream={}) |
Asynchronously copies data between memory spaces or within a memory space. More... | |
template<typename T , size_t N> | |
void | cuda::memory::copy (c_array< T, N > &destination, const_region_t source, optional_ref< const stream_t > stream={}) |
Copy the contents of a memory region into a C-style array, interpreting the memory as a sequence of elements of the array's element type. More... |
template<typename T , size_t N> | |
void | cuda::memory::copy (region_t destination, c_array< const T, N > const &source, optional_ref< const stream_t > stream={}) |
void | cuda::memory::copy (region_t destination, const_region_t source, size_t num_bytes, optional_ref< const stream_t > stream={}) |
Asynchronously copies data between memory spaces or within a memory space. More... | |
void | cuda::memory::copy (region_t destination, const_region_t source, optional_ref< const stream_t > stream={}) |
void | cuda::memory::copy (region_t destination, void *source, optional_ref< const stream_t > stream={}) |
Copy memory between memory regions. More... | |
void | cuda::memory::copy (region_t destination, void *source, size_t num_bytes, optional_ref< const stream_t > stream={}) |
Copy one region of memory into another. More... | |
void | cuda::memory::copy (void *destination, const_region_t source, size_t num_bytes, optional_ref< const stream_t > stream={}) |
Copy one region of memory to another location. More... | |
void | cuda::memory::copy (void *destination, const_region_t source, optional_ref< const stream_t > stream={}) |
void | cuda::memory::inter_context::copy (void *destination_address, const context_t &destination_context, const void *source_address, const context_t &source_context, size_t num_bytes, optional_ref< const stream_t > stream) |
Asynchronously copy a region of memory defined in one context into a region defined in another. | |
void | cuda::memory::inter_context::copy (void *destination, const context_t &destination_context, const_region_t source, const context_t &source_context, optional_ref< const stream_t > stream) |
Asynchronously copy a region of memory defined in one context into a region defined in another. | |
void | cuda::memory::inter_context::copy (region_t destination, const context_t &destination_context, const void *source, const context_t &source_context, optional_ref< const stream_t > stream) |
Asynchronously copy a region of memory defined in one context into a region defined in another. | |
void | cuda::memory::inter_context::copy (region_t destination, const context_t &destination_context, const_region_t source, const context_t &source_context, optional_ref< const stream_t > stream) |
Asynchronously copy a region of memory defined in one context into a region defined in another. | |
template<typename T , dimensionality_t NumDimensions> | |
void | cuda::memory::inter_context::copy (array_t< T, NumDimensions > destination, array_t< T, NumDimensions > source, optional_ref< const stream_t > stream) |
Asynchronously copy a CUDA array defined in one context into a CUDA array defined in another. | |
region_t | cuda::memory::host::allocate (size_t size_in_bytes, allocation_options options) |
Allocates pinned host memory. More... | |
region_t | cuda::memory::host::allocate (size_t size_in_bytes, portability_across_contexts portability=portability_across_contexts(false), cpu_write_combining cpu_wc=cpu_write_combining(false)) |
Allocates pinned host memory. More... | |
region_t | cuda::memory::host::allocate (size_t size_in_bytes, cpu_write_combining cpu_wc) |
Allocates pinned host memory. More... | |
void | cuda::memory::host::free (void *host_ptr) |
Frees a region of pinned host memory which was allocated with one of the pinned host memory allocation functions. More... | |
void | cuda::memory::host::free (region_t region) |
Frees a region of pinned host memory which was allocated with one of the pinned host memory allocation functions. More... | |
void | cuda::memory::host::register_ (const void *ptr, size_t size, bool register_mapped_io_space, bool map_into_device_space, bool make_device_side_accessible_to_all) |
Register a memory region with the CUDA driver. More... | |
void | cuda::memory::host::register_ (const_region_t region, bool register_mapped_io_space, bool map_into_device_space, bool make_device_side_accessible_to_all) |
Register a memory region with the CUDA driver. More... | |
void | cuda::memory::host::register_ (void const *ptr, size_t size) |
Register a memory region with the CUDA driver. More... | |
void | cuda::memory::host::register_ (const_region_t region) |
Register a memory region with the CUDA driver. More... | |
void | cuda::memory::host::deregister (const void *ptr) |
Have the CUDA driver "forget" about a region of memory which was previously registered with it, and page-unlock it. More... | |
void | cuda::memory::host::deregister (const_region_t region) |
Have the CUDA driver "forget" about a region of memory which was previously registered with it, and page-unlock it. More... | |
void | cuda::memory::managed::advise_expected_access_by (const_region_t region, device_t &device) |
Advise the CUDA driver that device is expected to access region. |
void | cuda::memory::managed::advise_no_access_expected_by (const_region_t region, device_t &device) |
Advise the CUDA driver that device is not expected to access region. |
template<typename Allocator = ::std::allocator<cuda::device_t>> | |
typename ::std::vector< device_t, Allocator > | cuda::memory::managed::expected_accessors (const_region_t region, const Allocator &allocator=Allocator()) |
region_t | cuda::memory::managed::allocate (const context_t &context, size_t num_bytes, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices) |
Allocate a region of managed memory, accessible with the same address on the host and on CUDA devices. More... |
region_t | cuda::memory::managed::allocate (const device_t &device, size_t num_bytes, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices) |
Allocate a region of managed memory, accessible with the same address on the host and on CUDA devices. More... |
region_t | cuda::memory::managed::allocate (size_t num_bytes) |
Allocate a region of managed memory, accessible with the same address on the host and on all CUDA devices. More... |
void | cuda::memory::managed::free (void *managed_ptr) |
Free a managed memory region (host-side and device-side regions on all devices where it was allocated, all with the same address) which was allocated with allocate. | |
void | cuda::memory::managed::free (region_t region) |
Free a managed memory region (host-side and device-side regions on all devices where it was allocated, all with the same address) which was allocated with allocate. More... | |
void | cuda::memory::managed::prefetch (const_region_t region, const cuda::device_t &destination, const stream_t &stream) |
Prefetches a region of managed memory to a specific device, so it can later be used there without waiting for I/O from the host or other devices. | |
void | cuda::memory::managed::prefetch_to_host (const_region_t region, const stream_t &stream) |
Prefetches a region of managed memory into host memory. More... | |
template<typename T > | |
T * | cuda::memory::mapped::device_side_pointer_for (T *host_memory_ptr) |
Obtain a pointer in the device-side memory space (= address range) given a host-side pointer mapped to it. More... |
region_t | cuda::memory::mapped::device_side_region_for (region_t region) |
Get the memory region mapped to a given host-side region. More... | |
const_region_t | cuda::memory::mapped::device_side_region_for (const_region_t region) |
Get the memory region mapped to a given host-side region. More... | |
region_pair_t | cuda::memory::mapped::allocate (cuda::context_t &context, size_t size_in_bytes, allocation_options options) |
Allocate a memory region on the host, which is also mapped to a memory region in a context of some CUDA device - so that changes to one will be reflected in the other. More... | |
region_pair_t | cuda::memory::mapped::allocate (cuda::device_t &device, size_t size_in_bytes, allocation_options options=allocation_options{}) |
Allocate a memory region on the host, which is also mapped to a memory region in the global memory of a CUDA device - so that changes to one will be reflected in the other. More... | |
void | cuda::memory::mapped::free (region_pair_t pair) |
Free a pair of mapped memory regions. More... | |
void | cuda::memory::mapped::free_region_pair_of (void *ptr) |
Free a pair of mapped memory regions using just one of them. More... | |
bool | cuda::memory::mapped::is_part_of_a_region_pair (const void *ptr) |
Determine whether a given stretch of memory was allocated as part of a mapped pair of host and device memory regions. More... | |
template<typename T > | |
unique_span< T > | cuda::memory::device::make_unique_span (const context_t &context, size_t size) |
Allocate memory for a consecutive sequence of typed elements in device-global memory. More... | |
template<typename T > | |
unique_span< T > | cuda::memory::device::make_unique_span (const device_t &device, size_t size) |
Allocate memory for a consecutive sequence of typed elements in device-global memory. More... | |
template<typename T > | |
unique_span< T > | cuda::memory::device::make_unique_span (size_t size) |
Allocate memory for a consecutive sequence of typed elements in device-global memory. More... | |
template<typename T > | |
unique_span< T > | cuda::memory::make_unique_span (const context_t &context, size_t size) |
See device::make_unique_span(const context_t& context, size_t size) | |
template<typename T > | |
unique_span< T > | cuda::memory::make_unique_span (const device_t &device, size_t size) |
See device::make_unique_span(const device_t& device, size_t size) |
template<typename T > | |
unique_span< T > | cuda::memory::host::make_unique_span (size_t size) |
Allocate memory for a consecutive sequence of typed elements in system (host-side) memory. More... | |
template<typename T > | |
unique_span< T > | cuda::memory::managed::make_unique_span (const context_t &context, size_t size, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices) |
Allocate memory for a consecutive sequence of typed elements in managed memory. More... |
template<typename T > | |
unique_span< T > | cuda::memory::managed::make_unique_span (const device_t &device, size_t size, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices) |
See device::make_unique_span(const context_t& context, size_t size) More... | |
template<typename T > | |
unique_span< T > | cuda::memory::managed::make_unique_span (size_t size, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices) |
See device::make_unique_span(const context_t& context, size_t size) More... | |
template<typename T > | |
memory::region_t | cuda::symbol::locate (T &&symbol) |
Locates a CUDA symbol in global or constant device memory. More... | |
template<typename T , size_t N> | |
void | cuda::memory::copy (span< T > destination, c_array< const T, N > const &source, optional_ref< const stream_t > stream={}) |
Copy the contents of a C-style array into a span of same-type elements. More... | |
template<typename T , size_t N> | |
void | cuda::memory::copy (c_array< T, N > &destination, span< T const > source, optional_ref< const stream_t > stream={}) |
Copy the contents of a span into a C-style array. More... | |
template<typename T , size_t N> | |
void | cuda::memory::copy (void *destination, c_array< const T, N > const &source, optional_ref< const stream_t > stream={}) |
Copy the contents of a C-style array to another location in memory. More... | |
template<typename T , size_t N> | |
void | cuda::memory::copy (c_array< T, N > &destination, T *source, optional_ref< const stream_t > stream={}) |
Copy memory into a C-style array. More... | |
void | cuda::memory::host::set (void *start, int byte_value, size_t num_bytes) |
Sets all bytes in a stretch of host-side memory to a single value. More... | |
void | cuda::memory::host::set (region_t region, int byte_value) |
void | cuda::memory::host::zero (void *start, size_t num_bytes) |
Zero-out a region of host memory. More... | |
void | cuda::memory::host::zero (region_t region) |
Zero-out a region of host memory. More... | |
template<typename T > | |
void | cuda::memory::host::zero (T *ptr) |
Sets all bytes of a single pointed-to value to 0 (zero). More... |
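The host-side `set` and `zero` overloads listed above come in pointer-plus-size, region, and single-value forms. A minimal sketch of how such overloads can be layered on `std::memset` follows; the `sketch::region_t` type and its member names are assumptions made for illustration, and the library's actual implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace sketch {

// Hypothetical minimal region type: a start address plus a size in bytes.
struct region_t {
    void* start;
    std::size_t size;
};

// Set every byte in [start, start + num_bytes) to byte_value.
inline void set(void* start, int byte_value, std::size_t num_bytes)
{
    std::memset(start, byte_value, num_bytes);
}

// Region overload: forwards the region's address and size.
inline void set(region_t region, int byte_value)
{
    set(region.start, byte_value, region.size);
}

// Zeroing is setting with a byte value of 0.
inline void zero(void* start, std::size_t num_bytes) { set(start, 0, num_bytes); }
inline void zero(region_t region) { zero(region.start, region.size); }

// Single-value overload: zero exactly sizeof(T) bytes at ptr.
template <typename T>
inline void zero(T* ptr)
{
    zero(static_cast<void*>(ptr), sizeof(T));
}

} // namespace sketch
```

All of these run synchronously on the calling thread, which is consistent with the host-side signatures above taking no stream parameter.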
Freestanding wrapper functions for working with CUDA's various kinds of memory spaces, arranged into a relevant namespace hierarchy.
X X X X * * *
X X X X * * *
X X X X * * *
The pitch in the example above is 7 * sizeof(T). The width is 4 * sizeof(T). The height is 3.
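To make the pitch arithmetic concrete, here is a small sketch that computes byte offsets in a pitched layout, reading each `X` as an element and each `*` as padding. The `pitched_layout` type and its member names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

// Hypothetical description of a pitched 2D layout of elements of type T.
template <typename T>
struct pitched_layout {
    std::size_t width_in_elements;  // usable elements per row (4 above)
    std::size_t pitch_in_elements;  // elements per row including padding (7 above)
    std::size_t height;             // number of rows (3 above)

    std::size_t width_in_bytes() const { return width_in_elements * sizeof(T); }
    std::size_t pitch_in_bytes() const { return pitch_in_elements * sizeof(T); }

    // Total footprint of the layout, padding included.
    std::size_t total_bytes() const { return height * pitch_in_bytes(); }

    // Byte offset of element (row, col) from the start of the allocation:
    // rows are stepped by the pitch, not by the width.
    std::size_t byte_offset(std::size_t row, std::size_t col) const
    {
        assert(row < height && col < width_in_elements);
        return row * pitch_in_bytes() + col * sizeof(T);
    }
};

} // namespace sketch
```

For the layout above with `T = float`, constructing `sketch::pitched_layout<float>{4, 7, 3}` gives a pitch of `7 * sizeof(float)` bytes and places the first element of the second row at that same offset, since each row starts one full pitch after the previous one.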
memory::region_t cuda::symbol::locate (T && symbol)
Locates a CUDA symbol in global or constant device memory.
symbol_t
symbols are associated with the primary context