cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
cuda::memory::managed Namespace Reference

Paged memory accessible in both device-side and host-side code by triggering transfers of pages between physical system memory and physical device memory. More...

Typedefs

using region_t = detail_::region_helper< memory::region_t >
 A child class of the generic region_t with some managed-memory-specific functionality.
 
using const_region_t = detail_::region_helper< memory::const_region_t >
 A child class of the generic const_region_t with some managed-memory-specific functionality.
 
using unique_region = memory::unique_region< detail_::deleter >
 A unique region of managed memory, see cuda::memory::managed.
 

Enumerations

enum  attachment_t : unsigned {
  global = CU_MEM_ATTACH_GLOBAL,
  host = CU_MEM_ATTACH_HOST,
  single_stream = CU_MEM_ATTACH_SINGLE
}
 Kinds of managed memory region attachments.
 
enum  initial_visibility_t {
  to_all_devices,
  to_supporters_of_concurrent_managed_access
}
 The choice of which categories of CUDA devices a managed memory region must be visible to.
 

Functions

void advise_expected_access_by (const_region_t region, device_t &device)
 Advise the CUDA driver that device is expected to access region.
 
void advise_no_access_expected_by (const_region_t region, device_t &device)
 Advise the CUDA driver that device is not expected to access region.
 
template<typename Allocator = ::std::allocator<cuda::device_t>>
::std::vector< device_t, Allocator > expected_accessors (const_region_t region, const Allocator &allocator=Allocator())
 
region_t allocate (const context_t &context, size_t num_bytes, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices)
 Allocate a region of managed memory, accessible with the same address on the host and on CUDA devices. More...
 
region_t allocate (const device_t &device, size_t num_bytes, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices)
 Allocate a region of managed memory, accessible with the same address on the host and on CUDA devices. More...
 
region_t allocate (size_t num_bytes)
 Allocate a region of managed memory, accessible with the same address on the host and on all CUDA devices. More...
 
void free (void *managed_ptr)
 Free a managed memory region (host-side and device-side regions on all devices where it was allocated, all with the same address) which was allocated with allocate.
 
void free (region_t region)
 Free a managed memory region (host-side and device-side regions on all devices where it was allocated, all with the same address) which was allocated with allocate. More...
 
void prefetch (const_region_t region, const cuda::device_t &destination, const stream_t &stream)
 Prefetches a region of managed memory to a specific device, so it can later be used there without waiting for I/O from the host or other devices.
 
void prefetch_to_host (const_region_t region, const stream_t &stream)
 Prefetches a region of managed memory into host memory. More...
 
template<typename T >
unique_span< T > make_unique_span (const context_t &context, size_t size, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices)
 Allocate memory for a consecutive sequence of typed elements in managed memory, accessible on both the host and CUDA devices. More...
 
template<typename T >
unique_span< T > make_unique_span (const device_t &device, size_t size, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices)
 See device::make_unique_span(const context_t& context, size_t size) More...
 
template<typename T >
unique_span< T > make_unique_span (size_t size, initial_visibility_t initial_visibility=initial_visibility_t::to_all_devices)
 See device::make_unique_span(const context_t& context, size_t size) More...
 
template<typename Allocator >
::std::vector< device_t, Allocator > expected_accessors (const_region_t region, const Allocator &allocator)
 
unique_region make_unique_region (const context_t &context, size_t num_bytes, initial_visibility_t initial_visibility)
 Allocate a region of managed memory, accessible both from CUDA devices and from the CPU. More...
 
unique_region make_unique_region (const device_t &device, size_t num_bytes, initial_visibility_t initial_visibility)
 Allocate a region of managed memory, accessible both from CUDA devices and from the CPU. More...
 
unique_region make_unique_region (size_t num_bytes, initial_visibility_t initial_visibility)
 
unique_region make_unique_region (size_t num_bytes)
 Allocate a region of managed memory, accessible both from CUDA devices and from the CPU. More...
 

Detailed Description

Paged memory accessible in both device-side and host-side code by triggering transfers of pages between physical system memory and physical device memory.

This type of memory, also known as unified memory, appears within a unified, all-system address space - and is used with the same address range on the host and on all relevant CUDA devices in a system. It is paged, so a managed region may exceed the physical size of a CUDA device's global memory. The CUDA driver takes care of "swapping" pages "out" from a device to host memory and "swapping" them back "in", as well as of propagating changes between devices and host memory.

Note
For more details, see Unified Memory for CUDA Beginners on the Parallel4All blog.
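As a sketch of the basic workflow (assuming the library's umbrella header cuda/api.hpp, a CUDA-capable system, and that region_t exposes its start address via start(), as elsewhere in this library):

```cpp
#include <cuda/api.hpp>

int main() {
    auto device = cuda::device::current::get();
    // One region, one address - usable in both host-side and device-side code
    auto region = cuda::memory::managed::allocate(device, 1 << 20);
    auto* data = static_cast<unsigned char*>(region.start());
    data[0] = 42;  // plain host-side write; the driver pages memory in as needed
    // ... launch kernels taking region.start() as an argument ...
    cuda::memory::managed::free(region);
}
```

Note that no explicit cudaMemcpy-style transfer appears anywhere: page migration is the driver's responsibility.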

Function Documentation

◆ allocate() [1/3]

region_t cuda::memory::managed::allocate ( const context_t &  context,
size_t  num_bytes,
initial_visibility_t  initial_visibility = initial_visibility_t::to_all_devices 
)
inline

Allocate a region of managed memory, accessible with the same address on the host and on CUDA devices.

Parameters
context  the initial context which is likely to access the managed memory region (and which will certainly have the region actually allocated for it)
num_bytes  size in bytes of the region of memory to allocate
initial_visibility  will the allocated region be visible, using the common address, to all CUDA devices (= more overhead, more work for the CUDA runtime) or only to those devices with certain hardware features which assist in this task (= less overhead)?

◆ allocate() [2/3]

region_t cuda::memory::managed::allocate ( const device_t &  device,
size_t  num_bytes,
initial_visibility_t  initial_visibility = initial_visibility_t::to_all_devices 
)
inline

Allocate a region of managed memory, accessible with the same address on the host and on CUDA devices.

Parameters
device  the initial device which is likely to access the managed memory region (and which will certainly have the region actually allocated for it)
num_bytes  size in bytes of the region of memory to allocate
initial_visibility  will the allocated region be visible, using the common address, to all CUDA devices (= more overhead, more work for the CUDA runtime) or only to those devices with certain hardware features which assist in this task (= less overhead)?

◆ allocate() [3/3]

region_t cuda::memory::managed::allocate ( size_t  num_bytes)
inline

Allocate a region of managed memory, accessible with the same address on the host and on all CUDA devices.

Note
While the allocated memory should be available universally, the allocation itself does require some GPU context. This will be the current context, if one exists, or the primary context on the runtime-defined current device.
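For example, the context-free overload might be used as follows (a sketch, relying on the runtime's notion of a current device):

```cpp
// Allocated in the current context if one exists, otherwise in the
// current device's primary context - but visible everywhere regardless
auto region = cuda::memory::managed::allocate(1024);
// region.start() is valid on the host and on all relevant CUDA devices
cuda::memory::managed::free(region);
```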

◆ expected_accessors() [1/2]

template<typename Allocator >
::std::vector<device_t, Allocator> cuda::memory::managed::expected_accessors ( const_region_t  region,
const Allocator &  allocator = Allocator() 
)
Returns
the devices which have been marked, via the relevant memory range attribute, as the expected accessors of the specified memory region

◆ expected_accessors() [2/2]

template<typename Allocator = ::std::allocator<cuda::device_t>>
::std::vector<device_t, Allocator> cuda::memory::managed::expected_accessors ( const_region_t  region,
const Allocator &  allocator = Allocator() 
)
Returns
the devices which have been marked, via the relevant memory range attribute, as the expected accessors of the specified memory region
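A possible usage sketch, combining this getter with the advice functions documented above (assuming a CUDA-capable system):

```cpp
auto device = cuda::device::current::get();
auto region = cuda::memory::managed::allocate(4096);
// Mark the current device as an expected accessor of the region
cuda::memory::managed::advise_expected_access_by(region, device);
// Query which devices are currently marked as expected accessors
auto accessors = cuda::memory::managed::expected_accessors(region);
for (const auto& accessor : accessors) {
    // ... e.g. prefetch the region to each expected accessor ...
}
cuda::memory::managed::free(region);
```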

◆ free()

void cuda::memory::managed::free ( region_t  region)
inline

Free a managed memory region (host-side and device-side regions on all devices where it was allocated, all with the same address) which was allocated with allocate.

◆ make_unique_region() [1/3]

unique_region cuda::memory::managed::make_unique_region ( const context_t &  context,
size_t  num_bytes,
initial_visibility_t  initial_visibility 
)
inline

Allocate a region of managed memory, accessible both from CUDA devices and from the CPU.

Parameters
context  A context, to set when allocating the memory region, for whatever association effect that may have (e.g. possible single-device visibility)
num_bytes  size in bytes of the region of memory to allocate
initial_visibility  which category of CUDA devices the allocated region must initially be visible to
Returns
An owning RAII/CADRe object for the allocated managed memory region

◆ make_unique_region() [2/3]

unique_region cuda::memory::managed::make_unique_region ( const device_t &  device,
size_t  num_bytes,
initial_visibility_t  initial_visibility 
)
inline

Allocate a region of managed memory, accessible both from CUDA devices and from the CPU.

Note
The allocation will be made in the device's primary context - which will be created if it has not yet been.
Parameters
device  The device whose primary context will be current when allocating the memory region, for whatever association effect that may have
num_bytes  size in bytes of the region of memory to allocate
initial_visibility  which category of CUDA devices the allocated region must initially be visible to
Returns
An owning RAII/CADRe object for the allocated managed memory region

◆ make_unique_region() [3/3]

unique_region cuda::memory::managed::make_unique_region ( size_t  num_bytes)
inline

Allocate a region of managed memory, accessible both from CUDA devices and from the CPU.

Note
The allocation will be made in the current device's primary context - which will be created if it has not yet been.
Parameters
num_bytes  size in bytes of the region of memory to allocate
Returns
An owning RAII/CADRe object for the allocated managed memory region
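The RAII/CADRe behavior can be sketched as follows (assuming, as for region_t, that the owning region exposes start()):

```cpp
{
    // Owning region of managed memory; no explicit free() call needed
    auto region = cuda::memory::managed::make_unique_region(1 << 16);
    auto* p = static_cast<float*>(region.start());
    p[0] = 1.0f;  // host-side access through the unified address
}   // region released here, on scope exit
```

This is the usual trade-off versus the raw allocate()/free() pair: scope-bound lifetime in exchange for giving up manual control over the release point.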

◆ make_unique_span() [1/3]

template<typename T >
unique_span< T > cuda::memory::managed::make_unique_span ( const context_t &  context,
size_t  size,
initial_visibility_t  initial_visibility = initial_visibility_t::to_all_devices 
)

Allocate memory for a consecutive sequence of typed elements in managed memory, accessible with the same address on the host and on CUDA devices.

Template Parameters
T  type of the individual elements in the allocated sequence
Parameters
context  The CUDA context in which to register the allocation
size  the number of elements to allocate
initial_visibility  which category of CUDA devices the managed region must be guaranteed to be visible to
Returns
A unique_span which owns the allocated memory (and will release said memory upon destruction)
Note
This function is somewhat similar to ::std::make_unique_for_overwrite(), except that the returned value is not "just" a unique pointer, but also has a size. It is also similar to cuda::device::make_unique_region, except that the allocation is conceived as typed elements.
Typically, this is used for trivially-constructible elements, so that the non-construction of individual elements should not pose a problem. But let the user beware, especially since this memory is accessible in host-side code.
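A brief usage sketch, using the device_t overload documented below (names as on this page; requires a CUDA-capable system):

```cpp
auto device = cuda::device::current::get();
// 1024 floats in managed memory, visible to all devices by default
auto span = cuda::memory::managed::make_unique_span<float>(device, 1024);
for (auto& x : span) { x = 0.f; }  // trivially-constructible elements, initialized host-side
// span.data() and span.size() can equally be passed to kernel launches
```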

◆ make_unique_span() [2/3]

template<typename T >
unique_span< T > cuda::memory::managed::make_unique_span ( const device_t &  device,
size_t  size,
initial_visibility_t  initial_visibility = initial_visibility_t::to_all_devices 
)

See device::make_unique_span(const context_t& context, size_t size)

Parameters
device  The CUDA device in whose primary context to make the allocation.

◆ make_unique_span() [3/3]

template<typename T >
unique_span< T > cuda::memory::managed::make_unique_span ( size_t  size,
initial_visibility_t  initial_visibility = initial_visibility_t::to_all_devices 
)

See device::make_unique_span(const context_t& context, size_t size)

Note
The current device's primary context will be used (not the current context).

◆ prefetch_to_host()

void cuda::memory::managed::prefetch_to_host ( const_region_t  region,
const stream_t &  stream 
)
inline

Prefetches a region of managed memory into host memory.

It can later be used there without waiting for I/O from any of the CUDA devices.
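A round-trip prefetch sketch (assuming device_t exposes its default stream as default_stream(); adjust to however a stream_t is obtained in your code):

```cpp
auto device = cuda::device::current::get();
auto stream = device.default_stream();  // assumption - any stream_t works here
auto region = cuda::memory::managed::allocate(device, 1 << 20);
cuda::memory::managed::prefetch(region, device, stream);   // pages migrate to the device
// ... enqueue kernels using the region on this stream ...
cuda::memory::managed::prefetch_to_host(region, stream);   // pages migrate back
stream.synchronize();  // ensure the prefetch completed before host-side access
cuda::memory::managed::free(region);
```

Prefetching is an optimization only: omitting it changes performance (page faults on first access), not correctness.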