cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
cuda::memory::host Namespace Reference

Host-side (= system) memory which is "pinned", i.e. resides in a fixed physical location and is allocated via the CUDA driver. More...

Typedefs

using unique_region = memory::unique_region< detail_::deleter >
 A unique region of pinned host memory.
 

Enumerations

enum  mapped_io_space : bool {
  is_mapped_io_space = true,
  is_not_mapped_io_space = false
}
 Whether the host-side memory region being registered should be treated as memory-mapped I/O space (e.g. belonging to a third-party PCIe device). More...
 
enum  map_into_device_memory : bool {
  map_into_device_memory = true,
  do_not_map_into_device_memory = false
}
 Whether or not the registration of the host-side pointer should map it into the CUDA address space for access on the device. More...
 
enum  accessibility_on_all_devices : bool {
  is_accessible_on_all_devices = true,
  is_not_accessible_on_all_devices = false
}
 Whether the allocated host-side memory should be recognized as pinned memory by all CUDA contexts, not just the (implicit Runtime API) context that performed the allocation. More...
 

Functions

region_t allocate (size_t size_in_bytes, allocation_options options)
 Allocates pinned host memory. More...
 
region_t allocate (size_t size_in_bytes, portability_across_contexts portability=portability_across_contexts(false), cpu_write_combining cpu_wc=cpu_write_combining(false))
 Allocates pinned host memory. More...
 
region_t allocate (size_t size_in_bytes, cpu_write_combining cpu_wc)
 Allocates pinned host memory. More...
 
void free (void *host_ptr)
 Frees a region of pinned host memory which was allocated with one of the pinned host memory allocation functions. More...
 
void free (region_t region)
 Frees a region of pinned host memory which was allocated with one of the pinned host memory allocation functions. More...
 
void register_ (const void *ptr, size_t size, bool register_mapped_io_space, bool map_into_device_space, bool make_device_side_accessible_to_all)
 Register a memory region with the CUDA driver. More...
 
void register_ (const_region_t region, bool register_mapped_io_space, bool map_into_device_space, bool make_device_side_accessible_to_all)
 Register a memory region with the CUDA driver. More...
 
void register_ (void const *ptr, size_t size)
 Register a memory region with the CUDA driver. More...
 
void register_ (const_region_t region)
 Register a memory region with the CUDA driver. More...
 
void deregister (const void *ptr)
 Have the CUDA driver "forget" about a region of memory which was previously registered with it, and page-unlock it. More...
 
void deregister (const_region_t region)
 Have the CUDA driver "forget" about a region of memory which was previously registered with it, and page-unlock it. More...
 
template<typename T >
unique_span< T > make_unique_span (size_t size)
 Allocate memory for a consecutive sequence of typed elements in system (host-side) memory. More...
 
unique_region make_unique_region (size_t num_bytes)
 Allocate a physical-address-pinned region of system memory. More...
 
void set (void *start, int byte_value, size_t num_bytes)
 Sets all bytes in a stretch of host-side memory to a single value. More...
 
void set (region_t region, int byte_value)
 
void zero (void *start, size_t num_bytes)
 Zero-out a region of host memory. More...
 
void zero (region_t region)
 Zero-out a region of host memory. More...
 
template<typename T >
void zero (T *ptr)
 Sets all bytes of a single pointed-to value to 0 (zero). More...
 

Detailed Description

Host-side (= system) memory which is "pinned", i.e. resides in a fixed physical location, and is allocated via the CUDA driver.
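
Example: a minimal usage sketch (not taken from the library's own documentation), assuming the library's umbrella header is <cuda/api.hpp>; only functions declared in this namespace are used:

    #include <cuda/api.hpp>   // assumed umbrella header of cuda-api-wrappers

    int main()
    {
        // Option 1: have the CUDA driver allocate pinned host memory
        auto region = cuda::memory::host::allocate(1024 * 1024);
        cuda::memory::host::zero(region);
        // ... stage copies to/from the device through 'region' ...
        cuda::memory::host::free(region);

        // Option 2: page-lock ("register") memory allocated by other means
        static char buffer[4096];
        cuda::memory::host::register_(buffer, sizeof(buffer));
        // ... 'buffer' now behaves as pinned memory for copy purposes ...
        cuda::memory::host::deregister(buffer);
    }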

Enumeration Type Documentation

◆ accessibility_on_all_devices

Whether the allocated host-side memory should be recognized as pinned memory by all CUDA contexts, not just the (implicit Runtime API) context that performed the allocation.

Enumerator
is_accessible_on_all_devices
is_not_accessible_on_all_devices

◆ map_into_device_memory

Whether or not the registration of the host-side pointer should map it into the CUDA address space for access on the device.

When true, one can then obtain the device-space pointer using mapped::device_side_pointer_for()

◆ mapped_io_space

Whether the host-side memory region being registered should be treated as memory-mapped I/O space, e.g. belonging to a third-party PCIe device; see CU_MEMHOSTREGISTER_IOMEMORY for more details.

Function Documentation

◆ allocate() [1/3]

region_t cuda::memory::host::allocate (size_t size_in_bytes, allocation_options options)    [inline]

Allocates pinned host memory.

Note
"Pinned" memory is allocated in contiguous physical RAM addresses, making it possible to copy data between it and the GPU using DMA, without assistance from the CPU. This improves the copying bandwidth significantly over naively-allocated host memory, and reduces overhead for the CPU.
Exceptions
cuda::runtime_error - if allocation fails for any reason
Parameters
size_in_bytes - the amount of memory to allocate, in bytes
options - options to pass to the CUDA host-side memory allocator; see memory::allocation_options
Returns
a region_t designating the allocated stretch of memory
Note
The allocation does not keep any device context alive/active; that is the caller's responsibility. However, if there is no current context, it will trigger the creation of a primary context on the default device, and "leak" a refcount unit for it. For this (and other) reasons, one should avoid this overload, and prefer passing a context, or at least a device, to the allocation function.

◆ allocate() [2/3]

region_t cuda::memory::host::allocate (size_t size_in_bytes, portability_across_contexts portability = portability_across_contexts(false), cpu_write_combining cpu_wc = cpu_write_combining(false))    [inline]

Allocates pinned host memory.

Note
"Pinned" memory is allocated in contiguous physical RAM addresses, making it possible to copy data between it and the GPU using DMA, without assistance from the CPU. This improves the copying bandwidth significantly over naively-allocated host memory, and reduces overhead for the CPU.
Exceptions
cuda::runtime_error - if allocation fails for any reason
Parameters
size_in_bytes - the amount of memory to allocate, in bytes
portability - whether or not the allocated region can be used in different CUDA contexts
cpu_wc - whether or not the GPU can batch multiple writes to this area and propagate them at its convenience
Returns
a region_t designating the allocated stretch of memory
Note
The allocation does not keep any device context alive/active; that is the caller's responsibility. However, if there is no current context, it will trigger the creation of a primary context on the default device, and "leak" a refcount unit for it. For this (and other) reasons, one should avoid this overload, and prefer passing a context, or at least a device, to the allocation function.
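
Example: a hedged sketch of requesting a context-portable allocation, assuming portability_across_contexts is defined in the cuda::memory namespace (as the unqualified default argument above suggests):

    // Request 2 MiB of pinned host memory usable from any CUDA context
    auto region = cuda::memory::host::allocate(
        2 * 1024 * 1024,
        cuda::memory::portability_across_contexts(true));
    // ... use the region ...
    cuda::memory::host::free(region);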

◆ allocate() [3/3]

region_t cuda::memory::host::allocate (size_t size_in_bytes, cpu_write_combining cpu_wc)    [inline]

Allocates pinned host memory.

Note
"Pinned" memory is allocated in contiguous physical RAM addresses, making it possible to copy data between it and the GPU using DMA, without assistance from the CPU. This improves the copying bandwidth significantly over naively-allocated host memory, and reduces overhead for the CPU.
Exceptions
cuda::runtime_error - if allocation fails for any reason
Parameters
size_in_bytes - the amount of memory to allocate, in bytes
cpu_wc - whether or not the GPU can batch multiple writes to this area and propagate them at its convenience
Returns
a region_t designating the allocated stretch of memory
Note
The allocation does not keep any device context alive/active; that is the caller's responsibility. However, if there is no current context, it will trigger the creation of a primary context on the default device, and "leak" a refcount unit for it. For this (and other) reasons, one should avoid this overload, and prefer passing a context, or at least a device, to the allocation function.
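
Example: a hedged sketch of allocating a write-combined staging buffer, assuming cpu_write_combining is defined in the cuda::memory namespace (as the unqualified default argument of the previous overload suggests). Write-combined pinned memory is typically efficient for sequential CPU writes and host-to-device transfers, but slow for the CPU to read back:

    // Write-combined pinned staging buffer for host-to-device transfers
    auto staging = cuda::memory::host::allocate(
        1024 * 1024, cuda::memory::cpu_write_combining(true));
    // ... fill 'staging' sequentially, then copy it to the device ...
    cuda::memory::host::free(staging);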

◆ deregister() [1/2]

void cuda::memory::host::deregister (const void * ptr)    [inline]

Have the CUDA driver "forget" about a region of memory which was previously registered with it, and page-unlock it.

Note
the CUDA API calls this action "unregister", but that is semantically inaccurate: the registration is not undone or rolled back, it is simply ended.

◆ deregister() [2/2]

void cuda::memory::host::deregister (const_region_t region)    [inline]

Have the CUDA driver "forget" about a region of memory which was previously registered with it, and page-unlock it.

Note
the CUDA API calls this action "unregister", but that is semantically inaccurate: the registration is not undone or rolled back, it is simply ended.

◆ free() [1/2]

void cuda::memory::host::free (void * host_ptr)    [inline]

Frees a region of pinned host memory which was allocated with one of the pinned host memory allocation functions.

Note
The address provided must be the beginning of the region of allocated memory, and the entire region is freed (i.e. the region size is known to/determined by the driver).

◆ free() [2/2]

void cuda::memory::host::free (region_t region)    [inline]

Frees a region of pinned host memory which was allocated with one of the pinned host memory allocation functions.

Parameters
region - The region of memory to free

◆ make_unique_region()

unique_region cuda::memory::host::make_unique_region (size_t num_bytes)    [inline]

Allocate a physical-address-pinned region of system memory.

The returned object owns a region of pinned (page-locked) host memory, and frees it upon destruction.

Returns
An owning RAII/CADRe object for the allocated memory region
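
Example: a sketch using only what is documented on this page; the allocation is released automatically when the owning object goes out of scope:

    {
        // 64 KiB of pinned host memory, owned by 'region' (RAII/CADRe)
        auto region = cuda::memory::host::make_unique_region(64 * 1024);
        // ... use the region for staging host-device copies ...
    }   // leaving the scope releases the pinned memory via the deleter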

◆ make_unique_span()

template<typename T>
unique_span<T> cuda::memory::host::make_unique_span (size_t size)

Allocate memory for a consecutive sequence of typed elements in system (host-side) memory.

Template Parameters
T - type of the individual elements in the allocated sequence
Parameters
size - the number of elements to allocate
Returns
A unique_span which owns the allocated memory (and will release said memory upon destruction)
Note
This function is somewhat similar to ::std::make_unique_for_overwrite(), except that the returned value is not "just" a unique pointer, but also has a size. It is also similar to cuda::device::make_unique_region, except that the allocation is conceived as typed elements.
We assume this memory is used for copying to or from device-side memory; hence, we constrain the type to be trivially constructible, destructible and copyable.
Alignment considerations are ignored.
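
Example: a sketch, assuming unique_span<T> offers span-like size() and element access (which this page does not spell out):

    // A typed staging buffer of 1024 floats in pinned host memory
    auto staging = cuda::memory::host::make_unique_span<float>(1024);
    for (size_t i = 0; i < staging.size(); i++) {
        staging[i] = static_cast<float>(i);
    }
    // ... copy the elements to device memory ...
    // the memory is released when 'staging' is destroyed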

◆ register_() [1/4]

void cuda::memory::host::register_ (const void * ptr, size_t size, bool register_mapped_io_space, bool map_into_device_space, bool make_device_side_accessible_to_all)    [inline]

Register a memory region with the CUDA driver.

Page-locks the memory range specified by ptr and size and maps it for the device(s) as specified by flags. This memory range also is added to the same tracking mechanism as cuMemAllocHost() to automatically accelerate calls to functions such as cuMemcpy().

Currently works within the current context

Note
we can't use the name register, since that's a reserved word
Parameters
ptr - The beginning of a pre-allocated region of host memory
size - the size in bytes of the memory region to register
register_mapped_io_space - the region will be treated as being some memory-mapped I/O space, e.g. belonging to a third-party PCIe device; see CU_MEMHOSTREGISTER_IOMEMORY for more details
map_into_device_space - if true, map the region to a region of addresses accessible from the (current context's) device; in practice, and with modern GPUs, this means the region itself will be accessible from the device. See CU_MEMHOSTREGISTER_DEVICEMAP for more details
make_device_side_accessible_to_all - make the region accessible in all CUDA contexts
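
Example: a sketch pinning the storage of an ordinary ::std::vector, using only this overload and deregister():

    #include <vector>

    ::std::vector<float> samples(1u << 20);   // regular, pageable host memory

    // Page-lock the vector's storage and map it into device address space
    cuda::memory::host::register_(
        samples.data(), samples.size() * sizeof(float),
        false,   // register_mapped_io_space: not memory-mapped I/O
        true,    // map_into_device_space: obtain a device-accessible mapping
        true);   // make_device_side_accessible_to_all: usable in all contexts

    // ... launch work reading the now-pinned memory, perform fast copies ...

    cuda::memory::host::deregister(samples.data());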

◆ register_() [2/4]

void cuda::memory::host::register_ (const_region_t region, bool register_mapped_io_space, bool map_into_device_space, bool make_device_side_accessible_to_all)    [inline]

Register a memory region with the CUDA driver.

Page-locks the memory range specified by ptr and size and maps it for the device(s) as specified by flags. This memory range also is added to the same tracking mechanism as cuMemAllocHost() to automatically accelerate calls to functions such as cuMemcpy().

Currently works within the current context

Note
we can't use the name register, since that's a reserved word
Parameters
region - The region to register
register_mapped_io_space - the region will be treated as being some memory-mapped I/O space, e.g. belonging to a third-party PCIe device; see CU_MEMHOSTREGISTER_IOMEMORY for more details
map_into_device_space - if true, map the region to a region of addresses accessible from the (current context's) device; in practice, and with modern GPUs, this means the region itself will be accessible from the device. See CU_MEMHOSTREGISTER_DEVICEMAP for more details
make_device_side_accessible_to_all - make the region accessible in all CUDA contexts

◆ register_() [3/4]

void cuda::memory::host::register_ (void const * ptr, size_t size)    [inline]

Register a memory region with the CUDA driver.

Page-locks the memory range specified by ptr and size and maps it for the device(s) as specified by flags. This memory range also is added to the same tracking mechanism as cuMemAllocHost() to automatically accelerate calls to functions such as cuMemcpy().

Currently works within the current context

Note
we can't use the name register, since that's a reserved word
Parameters
ptr - The beginning of a pre-allocated region of host memory
size - the size in bytes of the memory region to register

◆ register_() [4/4]

void cuda::memory::host::register_ (const_region_t region)    [inline]

Register a memory region with the CUDA driver.

Page-locks the memory range specified by ptr and size and maps it for the device(s) as specified by flags. This memory range also is added to the same tracking mechanism as cuMemAllocHost() to automatically accelerate calls to functions such as cuMemcpy().

Currently works within the current context

Note
we can't use the name register, since that's a reserved word
Parameters
region - The region to register

◆ set() [1/2]

void cuda::memory::host::set (void * start, int byte_value, size_t num_bytes)    [inline]

Sets all bytes in a stretch of host-side memory to a single value.

Note
a wrapper for ::std::memset
Parameters
start - starting address of the memory region to set, in host memory; can be either CUDA-allocated or otherwise
byte_value - the value to set each byte in the memory region to
num_bytes - size of the memory region, in bytes

◆ set() [2/2]

void cuda::memory::host::set (region_t region, int byte_value)    [inline]

Sets all bytes in a region of host-side memory to a single value.

Parameters
region - The region of memory to set to the fixed value
byte_value - the value to set each byte in the region to
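
Example: a sketch combining the allocation, set and zero functions documented on this page:

    auto region = cuda::memory::host::allocate(256);
    cuda::memory::host::set(region, 0xFF);   // fill every byte with 0xFF
    cuda::memory::host::zero(region);        // ... then clear the region again
    cuda::memory::host::free(region);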

◆ zero() [1/3]

void cuda::memory::host::zero (void * start, size_t num_bytes)    [inline]

Zero-out a region of host memory.

Parameters
start - the beginning of the region of host memory to zero-out
num_bytes - the size in bytes of the region of memory to zero-out

◆ zero() [2/3]

void cuda::memory::host::zero (region_t region)    [inline]

Zero-out a region of host memory.

Parameters
region - the region of host-side memory to zero-out

◆ zero() [3/3]

template<typename T>
void cuda::memory::host::zero (T * ptr)    [inline]

Sets all bytes of a single pointed-to value to 0 (zero).

Parameters
ptr - a pointer to the value to be set to zero, in host memory
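
Example: a sketch for zeroing a single pinned-memory object, assuming unique_span<T> supports element access (which this page does not spell out):

    struct record { int id; double value; };   // trivially-copyable host-side type

    // One record in pinned host memory, obtained via make_unique_span()
    auto records = cuda::memory::host::make_unique_span<record>(1);
    cuda::memory::host::zero(&records[0]);   // zeroes sizeof(record) bytes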