cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
Paged memory accessible in both device-side and host-side code by triggering transfers of pages between physical system memory and physical device memory. More...
Typedefs | |
| using | region_t = detail_::region_helper< memory::region_t > |
| A child class of the generic region_t with some managed-memory-specific functionality. | |
| using | const_region_t = detail_::region_helper< memory::const_region_t > |
| A child class of the generic const_region_t with some managed-memory-specific functionality. | |
| using | unique_region = memory::unique_region< detail_::deleter > |
| A unique region of managed memory, see cuda::memory::managed. | |
Enumerations | |
| enum | attachment_t : unsigned { global = CU_MEM_ATTACH_GLOBAL, host = CU_MEM_ATTACH_HOST, single_stream = CU_MEM_ATTACH_SINGLE } |
| Kinds of managed memory region attachments. | |
| enum | initial_visibility_t { to_all_devices, to_supporters_of_concurrent_managed_access } |
| The choices of which categories of CUDA devices a managed memory region must be visible to. | |
Functions | |
| void | advise_expected_access_by (const_region_t region, device_t &device) |
Advise the CUDA driver that device is expected to access region. | |
| void | advise_no_access_expected_by (const_region_t region, device_t &device) |
Advise the CUDA driver that device is not expected to access region. | |
| template<typename Allocator = ::std::allocator<cuda::device_t>> | |
| ::std::vector< device_t, Allocator > | expected_accessors (const_region_t region, const Allocator &allocator=Allocator()) |
| region_t | allocate (const context_t &context, size_t num_bytes, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices) |
| Allocate a region of managed memory, accessible with the same address on the host and on CUDA devices. More... | |
| region_t | allocate (const device_t &device, size_t num_bytes, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices) |
| Allocate a region of managed memory, accessible with the same address on the host and on CUDA devices. More... | |
| region_t | allocate (size_t num_bytes) |
| Allocate a region of managed memory, accessible with the same address on the host and on all CUDA devices. More... | |
| void | free (void *managed_ptr) |
| Free a managed memory region (host-side and device-side regions on all devices where it was allocated, all with the same address) which was allocated with allocate. | |
| void | free (region_t region) |
| Free a managed memory region (host-side and device-side regions on all devices where it was allocated, all with the same address) which was allocated with allocate. More... | |
| void | prefetch (const_region_t region, const cuda::device_t &destination, const stream_t &stream) |
| Prefetches a region of managed memory to a specific device, so it can later be used there without waiting for I/O from the host or other devices. | |
| void | prefetch_to_host (const_region_t region, const stream_t &stream) |
| Prefetches a region of managed memory into host memory. More... | |
| template<typename T > | |
| unique_span< T > | make_unique_span (const context_t &context, size_t size, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices) |
| Allocate memory for a consecutive sequence of typed elements in managed memory, accessible with the same address on the host and on CUDA devices. More... | |
| template<typename T > | |
| unique_span< T > | make_unique_span (const device_t &device, size_t size, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices) |
See device::make_unique_span(const context_t& context, size_t size) More... | |
| template<typename T > | |
| unique_span< T > | make_unique_span (size_t size, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices) |
See device::make_unique_span(const context_t& context, size_t size) More... | |
| unique_region | make_unique_region (const context_t &context, size_t num_bytes, initial_visibility_t initial_visibility) |
| Allocate a region of managed memory, accessible both from CUDA devices and from the CPU. More... | |
| unique_region | make_unique_region (const device_t &device, size_t num_bytes, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices) |
| unique_region | make_unique_region (size_t num_bytes, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices) |
| Allocate a region of managed memory, accessible both from CUDA devices and from the CPU. More... | |
Paged memory accessible in both device-side and host-side code by triggering transfers of pages between physical system memory and physical device memory.
This type of memory, also known as unified memory, appears within a unified, all-system address space - and is used with the same address range on the host and on all relevant CUDA devices on a system. It is paged, so that it may exceed the physical size of a CUDA device's global memory. The CUDA driver takes care of "swapping" pages "out" from a device to host memory or "swapping" them back "in", as well as of propagation of changes between devices and host-memory.
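The behavior described above can be sketched as follows. This is a minimal, hypothetical usage example assuming the cuda-api-wrappers headers (`<cuda/api.hpp>`) and a CUDA-capable device; helper names such as `launch_configuration_t` and `region_t::data()` are taken from the library's documented API but may vary between library versions.

```cuda
// Sketch: one managed allocation, touched by both host code and a kernel,
// through the same pointer. Assumes cuda-api-wrappers and a CUDA GPU.
#include <cuda/api.hpp>
#include <cstdio>

__global__ void increment(int* data, size_t n)
{
	size_t i = blockIdx.x * blockDim.x + threadIdx.x;
	if (i < n) { data[i] += 1; }
}

int main()
{
	namespace managed = cuda::memory::managed;
	constexpr size_t n = 1024;

	// One address range, valid both in host code and in device code
	auto region = managed::allocate(n * sizeof(int));
	auto data = static_cast<int*>(region.data());

	for (size_t i = 0; i < n; i++) { data[i] = 0; }  // plain host-side write

	auto device = cuda::device::current::get();
	device.launch(increment, cuda::launch_configuration_t{4, 256}, data, n);
	device.synchronize();

	std::printf("data[0] == %d\n", data[0]);  // plain host-side read
	managed::free(region);
}
```

Note that no explicit copies are issued: the driver migrates pages on demand as the host and device each touch the region.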
|
inline |
Allocate a region of managed memory, accessible with the same address on the host and on CUDA devices.
| context | the initial context which is likely to access the managed memory region (and which will certainly have the region actually allocated for it) |
| num_bytes | size in bytes of the region of memory to allocate |
| initial_visibility | will the allocated region be visible, using the common address, to all CUDA devices (= more overhead, more work for the CUDA runtime) or only to those devices with hardware features assisting in this task (= less overhead)? |
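As a sketch of how these parameters fit together (assuming the cuda-api-wrappers headers; `primary_context()` is the library's documented accessor, but treat the exact names as version-dependent):

```cpp
// Hypothetical usage: allocate 1 MiB of managed memory associated with a
// device's primary context, initially visible only to devices supporting
// concurrent managed access (the lower-overhead option).
namespace managed = cuda::memory::managed;

auto device  = cuda::device::get(0);  // first CUDA device in the system
auto context = device.primary_context();
auto region  = managed::allocate(
	context, 1u << 20,
	managed::initial_visibility_t::to_supporters_of_concurrent_managed_access);
// ... use region.data() on the host and in kernels, same address ...
managed::free(region);
```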
|
inline |
Allocate a region of managed memory, accessible with the same address on the host and on CUDA devices.
| device | the initial device which is likely to access the managed memory region (and which will certainly have the region actually allocated for it) |
| num_bytes | size in bytes of the region of memory to allocate |
| initial_visibility | will the allocated region be visible, using the common address, to all CUDA devices (= more overhead, more work for the CUDA runtime) or only to those devices with hardware features assisting in this task (= less overhead)? |
Allocate a region of managed memory, accessible with the same address on the host and on all CUDA devices.
::std::vector<device_t, Allocator> cuda::memory::managed::expected_accessors (const_region_t region, const Allocator& allocator = Allocator())
|
inline |
Free a managed memory region (host-side and device-side regions on all devices where it was allocated, all with the same address) which was allocated with allocate.
|
inline |
Allocate a region of managed memory, accessible both from CUDA devices and from the CPU.
| context | A context, possibly of single-device visibility, which will be current when allocating the memory region, for whatever association effect that may have. |
|
inline |
| [in] | device | The device in whose primary context's memory the unique region is to be allocated; that primary context will be current when allocating, for whatever association effect that may have. |
|
inline |
Allocate a region of managed memory, accessible both from CUDA devices and from the CPU.
unique_span<T> cuda::memory::managed::make_unique_span (const context_t& context, size_t size, initial_visibility_t initial_visibility = initial_visibility_t::to_all_devices)
Allocate memory for a consecutive sequence of typed elements in managed memory, accessible with the same address on the host and on CUDA devices.
| T | type of the individual elements in the allocated sequence |
| context | The CUDA device context in which to register the allocation |
| size | the number of elements to allocate |
| initial_visibility | Choices of which category of CUDA devices must the managed region be guaranteed to be visible to |
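A sketch of RAII-style typed allocation with `make_unique_span` (assuming the cuda-api-wrappers headers; the `device_t` overload documented below is used here for brevity):

```cpp
// Hypothetical usage: no explicit free() call is needed — the unique_span's
// deleter releases the managed memory when it goes out of scope.
{
	auto device = cuda::device::current::get();
	auto span = cuda::memory::managed::make_unique_span<float>(device, 4096);
	for (auto& x : span) { x = 1.0f; }  // host-side initialization
	// span.data() may be passed to kernels launched on this device
}  // managed memory freed here
```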
unique_span<T> cuda::memory::managed::make_unique_span (const device_t& device, size_t size, initial_visibility_t initial_visibility = initial_visibility_t::to_all_devices)
See device::make_unique_span(const context_t& context, size_t size)
| device | The CUDA device in whose primary context to make the allocation. |
unique_span<T> cuda::memory::managed::make_unique_span (size_t size, initial_visibility_t initial_visibility = initial_visibility_t::to_all_devices)
See device::make_unique_span(const context_t& context, size_t size)
|
inline |
Prefetches a region of managed memory into host memory.
It can later be used there without waiting for I/O from any of the CUDA devices.
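The two prefetch functions on this page can be combined as in the following sketch (assumes the cuda-api-wrappers headers and a CUDA device; `default_stream()` is the library's documented accessor, but treat exact names as version-dependent):

```cpp
// Hypothetical usage: prefetch a managed region to the device before
// enqueuing kernel work, then bring it back to the host before CPU-side
// processing, all on one stream.
namespace managed = cuda::memory::managed;

auto device = cuda::device::current::get();
auto stream = device.default_stream();
auto region = managed::allocate(device, 1u << 20);

managed::prefetch(region, device, stream);   // pages migrate to the device
// ... enqueue kernels using the region on `stream` ...
managed::prefetch_to_host(region, stream);   // pages migrate back to host
stream.synchronize();                        // region now host-resident
managed::free(region);
```

Prefetching is an optimization hint: omitting it only means pages migrate on first touch, at a per-page-fault cost, rather than in bulk ahead of time.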