cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
cuda::memory::managed Namespace Reference

Paged memory accessible in both device-side and host-side code by triggering transfers of pages between physical system memory and physical device memory. More...

Typedefs

using region_t = detail_::region_helper< memory::region_t >
 A child class of the generic region_t with some managed-memory-specific functionality.
 
using const_region_t = detail_::region_helper< memory::const_region_t >
 A child class of the generic const_region_t with some managed-memory-specific functionality.
 
using unique_region = memory::unique_region< detail_::deleter >
 A unique region of managed memory, see cuda::memory::managed.
 

Enumerations

enum  attachment_t : unsigned {
  global = CU_MEM_ATTACH_GLOBAL,
  host = CU_MEM_ATTACH_HOST,
  single_stream = CU_MEM_ATTACH_SINGLE
}
 Kinds of managed memory region attachments.
 
enum  initial_visibility_t {
  to_all_devices,
  to_supporters_of_concurrent_managed_access
}
 The choice of which categories of CUDA devices a managed memory region must be visible to.
 

Functions

void advise_expected_access_by (const_region_t region, device_t &device)
 Advise the CUDA driver that device is expected to access region.
 
void advise_no_access_expected_by (const_region_t region, device_t &device)
 Advise the CUDA driver that device is not expected to access region.
 
template<typename Allocator = ::std::allocator<cuda::device_t>>
::std::vector< device_t, Allocator > expected_accessors (const_region_t region, const Allocator &allocator=Allocator())
 
region_t allocate (const context_t &context, size_t num_bytes, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices)
 Allocate a region of managed memory, accessible with the same address on the host and on CUDA devices. More...
 
region_t allocate (const device_t &device, size_t num_bytes, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices)
 Allocate a region of managed memory, accessible with the same address on the host and on CUDA devices. More...
 
region_t allocate (size_t num_bytes)
 Allocate a region of managed memory, accessible with the same address on the host and on all CUDA devices. More...
 
void free (void *managed_ptr)
 Free a managed memory region (host-side and device-side regions on all devices where it was allocated, all with the same address) which was allocated with allocate.
 
void free (region_t region)
 Free a managed memory region (host-side and device-side regions on all devices where it was allocated, all with the same address) which was allocated with allocate. More...
 
void prefetch (const_region_t region, const cuda::device_t &destination, const stream_t &stream)
 Prefetches a region of managed memory to a specific device, so it can later be used there without waiting for I/O from the host or other devices.
 
void prefetch_to_host (const_region_t region, const stream_t &stream)
 Prefetches a region of managed memory into host memory. More...
 
template<typename T >
unique_span< T > make_unique_span (const context_t &context, size_t size, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices)
 Allocate memory for a consecutive sequence of typed elements in managed memory, accessible on both the host and CUDA devices. More...
 
template<typename T >
unique_span< T > make_unique_span (const device_t &device, size_t size, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices)
 See device::make_unique_span(const context_t& context, size_t size) More...
 
template<typename T >
unique_span< T > make_unique_span (size_t size, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices)
 See device::make_unique_span(const context_t& context, size_t size) More...
 
template<typename Allocator >
::std::vector< device_t, Allocator > expected_accessors (const_region_t region, const Allocator &allocator)
 
unique_region make_unique_region (const context_t &context, size_t num_bytes, initial_visibility_t initial_visibility)
 Allocate a region of managed memory, accessible both from CUDA devices and from the CPU. More...
 
unique_region make_unique_region (const device_t &device, size_t num_bytes, initial_visibility_t initial_visibility)
 Allocate a region of managed memory, accessible both from CUDA devices and from the CPU. More...
 
unique_region make_unique_region (size_t num_bytes, initial_visibility_t initial_visibility)
 
unique_region make_unique_region (size_t num_bytes)
 Allocate a region of managed memory, accessible both from CUDA devices and from the CPU. More...
 

Detailed Description

Paged memory accessible in both device-side and host-side code by triggering transfers of pages between physical system memory and physical device memory.

This type of memory, also known as unified memory, appears within a unified, all-system address space - and is used with the same address range on the host and on all relevant CUDA devices in a system. It is paged, so a managed region may exceed the physical size of a CUDA device's global memory. The CUDA driver takes care of "swapping" pages "out" from a device to host memory and "swapping" them back "in", as well as of propagating changes between devices and host memory.

Note
For more details, see Unified Memory for CUDA Beginners on the Parallel4All blog.
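As a sketch of the basic workflow (assuming the library's umbrella header cuda/api.hpp, a CUDA-capable system, and that region_t exposes its start address via start(), as elsewhere in this library):

```cpp
#include <cuda/api.hpp>

int main() {
    auto device = cuda::device::current::get();
    // One region, one address - usable in both host-side and device-side code
    auto region = cuda::memory::managed::allocate(device, 1 << 20);
    auto* data = static_cast<unsigned char*>(region.start());
    data[0] = 42;  // plain host-side write; the driver pages memory in as needed
    // ... launch kernels taking region.start() as an argument ...
    cuda::memory::managed::free(region);
}
```

Note that no explicit cudaMemcpy-style transfer appears anywhere: page migration is the driver's responsibility.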

Function Documentation

◆ allocate() [1/3]

region_t cuda::memory::managed::allocate ( const context_t &  context,
size_t  num_bytes,
initial_visibility_t  initial_visibility = initial_visibility_t::to_all_devices 
)
inline

Allocate a region of managed memory, accessible with the same address on the host and on CUDA devices.

Parameters
context  the initial context which is likely to access the managed memory region (and which will certainly have the region actually allocated for it)
num_bytes  size in bytes of the region of memory to allocate
initial_visibility  will the allocated region be visible, using the common address, to all CUDA devices (= more overhead, more work for the CUDA runtime) or only to those devices with certain hardware features which assist in this task (= less overhead)?

◆ allocate() [2/3]

region_t cuda::memory::managed::allocate ( const device_t &  device,
size_t  num_bytes,
initial_visibility_t  initial_visibility = initial_visibility_t::to_all_devices 
)
inline

Allocate a region of managed memory, accessible with the same address on the host and on CUDA devices.

Parameters
device  the initial device which is likely to access the managed memory region (and which will certainly have the region actually allocated for it)
num_bytes  size in bytes of the region of memory to allocate
initial_visibility  will the allocated region be visible, using the common address, to all CUDA devices (= more overhead, more work for the CUDA runtime) or only to those devices with certain hardware features which assist in this task (= less overhead)?

◆ allocate() [3/3]

region_t cuda::memory::managed::allocate ( size_t  num_bytes)
inline

Allocate a region of managed memory, accessible with the same address on the host and on all CUDA devices.

Note
While the allocated memory should be available universally, the allocation itself does require some GPU context. This will be the current context, if one exists, or the primary context on the runtime-defined current device.
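For example, the context-free overload might be used as follows (a sketch, relying on the runtime's notion of a current device):

```cpp
// Allocated in the current context if one exists, otherwise in the
// current device's primary context - but visible everywhere regardless
auto region = cuda::memory::managed::allocate(1024);
// region.start() is valid on the host and on all relevant CUDA devices
cuda::memory::managed::free(region);
```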

◆ expected_accessors() [1/2]

template<typename Allocator >
::std::vector<device_t, Allocator> cuda::memory::managed::expected_accessors ( const_region_t  region,
const Allocator &  allocator = Allocator() 
)
Returns
the devices which have been marked, via the relevant memory range attribute, as the expected accessors of the specified memory region

◆ expected_accessors() [2/2]

template<typename Allocator = ::std::allocator<cuda::device_t>>
::std::vector<device_t, Allocator> cuda::memory::managed::expected_accessors ( const_region_t  region,
const Allocator &  allocator = Allocator() 
)
Returns
the devices which have been marked, via the relevant memory range attribute, as the expected accessors of the specified memory region
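A possible usage sketch, combining this getter with the advice functions documented above (assuming a CUDA-capable system):

```cpp
auto device = cuda::device::current::get();
auto region = cuda::memory::managed::allocate(4096);
// Mark the current device as an expected accessor of the region
cuda::memory::managed::advise_expected_access_by(region, device);
// Query which devices are currently marked as expected accessors
auto accessors = cuda::memory::managed::expected_accessors(region);
for (const auto& accessor : accessors) {
    // ... e.g. prefetch the region to each expected accessor ...
}
cuda::memory::managed::free(region);
```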

◆ free()

void cuda::memory::managed::free ( region_t  region)
inline

Free a managed memory region (host-side and device-side regions on all devices where it was allocated, all with the same address) which was allocated with allocate.

◆ make_unique_region() [1/3]

unique_region cuda::memory::managed::make_unique_region ( const context_t &  context,
size_t  num_bytes,
initial_visibility_t  initial_visibility 
)
inline

Allocate a region of managed memory, accessible both from CUDA devices and from the CPU.

Parameters
context  A context, to set when allocating the memory region, for whatever association effect that may have (e.g. possible single-device visibility)
num_bytes  size in bytes of the region of memory to allocate
initial_visibility  which category of CUDA devices the allocated region must initially be visible to
Returns
An owning RAII/CADRe object for the allocated managed memory region

◆ make_unique_region() [2/3]

unique_region cuda::memory::managed::make_unique_region ( const device_t &  device,
size_t  num_bytes,
initial_visibility_t  initial_visibility 
)
inline

Allocate a region of managed memory, accessible both from CUDA devices and from the CPU.

Note
The allocation will be made in the device's primary context - which will be created if it has not yet been.
Parameters
device  The device whose primary context will be current when allocating the memory region, for whatever association effect that may have
num_bytes  size in bytes of the region of memory to allocate
initial_visibility  which category of CUDA devices the allocated region must initially be visible to
Returns
An owning RAII/CADRe object for the allocated managed memory region

◆ make_unique_region() [3/3]

unique_region cuda::memory::managed::make_unique_region ( size_t  num_bytes)
inline

Allocate a region of managed memory, accessible both from CUDA devices and from the CPU.

Note
The allocation will be made in the current device's primary context - which will be created if it has not yet been.
Parameters
num_bytes  size in bytes of the region of memory to allocate
Returns
An owning RAII/CADRe object for the allocated managed memory region
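The RAII/CADRe behavior can be sketched as follows (assuming, as for region_t, that the owning region exposes start()):

```cpp
{
    // Owning region of managed memory; no explicit free() call needed
    auto region = cuda::memory::managed::make_unique_region(1 << 16);
    auto* p = static_cast<float*>(region.start());
    p[0] = 1.0f;  // host-side access through the unified address
}   // region released here, on scope exit
```

This is the usual trade-off versus the raw allocate()/free() pair: scope-bound lifetime in exchange for giving up manual control over the release point.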

◆ make_unique_span() [1/3]

template<typename T >
unique_span< T > cuda::memory::managed::make_unique_span ( const context_t &  context,
size_t  size,
initial_visibility_t  initial_visibility = initial_visibility_t::to_all_devices 
)

Allocate memory for a consecutive sequence of typed elements in managed memory, accessible with the same address on the host and on CUDA devices.

Template Parameters
T  type of the individual elements in the allocated sequence
Parameters
context  The CUDA context in which to register the allocation
size  the number of elements to allocate
initial_visibility  which category of CUDA devices the managed region must be guaranteed to be visible to
Returns
A unique_span which owns the allocated memory (and will release said memory upon destruction)
Note
This function is somewhat similar to ::std::make_unique_for_overwrite(), except that the returned value is not "just" a unique pointer, but also has a size. It is also similar to cuda::device::make_unique_region, except that the allocation is conceived as typed elements.
Typically, this is used for trivially-constructible elements, so that the non-construction of individual elements should not pose a problem. But let the user beware, especially since this memory is accessible in host-side code.
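A brief usage sketch, using the device_t overload documented below (names as on this page; requires a CUDA-capable system):

```cpp
auto device = cuda::device::current::get();
// 1024 floats in managed memory, visible to all devices by default
auto span = cuda::memory::managed::make_unique_span<float>(device, 1024);
for (auto& x : span) { x = 0.f; }  // trivially-constructible elements, initialized host-side
// span.data() and span.size() can equally be passed to kernel launches
```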

◆ make_unique_span() [2/3]

template<typename T >
unique_span< T > cuda::memory::managed::make_unique_span ( const device_t &  device,
size_t  size,
initial_visibility_t  initial_visibility = initial_visibility_t::to_all_devices 
)

See device::make_unique_span(const context_t& context, size_t size)

Parameters
device  The CUDA device in whose primary context to make the allocation.

◆ make_unique_span() [3/3]

template<typename T >
unique_span< T > cuda::memory::managed::make_unique_span ( size_t  size,
initial_visibility_t  initial_visibility = initial_visibility_t::to_all_devices 
)

See device::make_unique_span(const context_t& context, size_t size)

Note
The current device's primary context will be used (not the current context).

◆ prefetch_to_host()

void cuda::memory::managed::prefetch_to_host ( const_region_t  region,
const stream_t &  stream 
)
inline

Prefetches a region of managed memory into host memory.

It can later be used there without waiting for I/O from any of the CUDA devices.
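A round-trip prefetch sketch (assuming device_t exposes its default stream as default_stream(); adjust to however a stream_t is obtained in your code):

```cpp
auto device = cuda::device::current::get();
auto stream = device.default_stream();  // assumption - any stream_t works here
auto region = cuda::memory::managed::allocate(device, 1 << 20);
cuda::memory::managed::prefetch(region, device, stream);   // pages migrate to the device
// ... enqueue kernels using the region on this stream ...
cuda::memory::managed::prefetch_to_host(region, stream);   // pages migrate back
stream.synchronize();  // ensure the prefetch completed before host-side access
cuda::memory::managed::free(region);
```

Prefetching is an optimization only: omitting it changes performance (page faults on first access), not correctness.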