Fundamental CUDA-related type definitions.

#include "detail/optional.hpp"
#include "detail/optional_ref.hpp"
#include "detail/span.hpp"
#include "detail/region.hpp"
#include "detail/type_traits.hpp"
#include <builtin_types.h>
#include <cuda.h>
#include <type_traits>
#include <utility>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#include <stdexcept>

[Figures omitted: the include dependency graph for types.hpp, and the graph of files that directly or indirectly include it.]

Classes
struct	cuda::array::dimensions_t< NumDimensions >
	CUDA's array memory-objects are multi-dimensional, but their dimensions, or extents, are not the same as cuda::grid::dimensions_t; they may be much larger along each axis.

struct	cuda::array::dimensions_t< 3 >
	Dimensions for 3D CUDA arrays.

struct	cuda::array::dimensions_t< 2 >
	Dimensions for 2D CUDA arrays.

struct	cuda::grid::dimensions_t
	A richer (kind-of-a-)wrapper for CUDA's `dim3` class, used to specify dimensions for blocks (in terms of threads) and of grids (in terms of blocks, or overall).

struct	cuda::grid::overall_dimensions_t
	Dimensions of a grid in threads.

struct	cuda::grid::composite_dimensions_t
	Composite dimensions for a grid - in terms of blocks, then also down into the block dimensions completing the information to the thread level.

Namespaces
	cuda
	Definitions and functionality wrapping CUDA APIs.

	cuda::array
	CUDA facilities for interpolating access to multidimensional array objects, in particular via the array_t class.

	cuda::event
	CUDA timing functionality, via events and their related code (not including the event wrapper type event_t itself)

	cuda::event::ipc
	Definitions and functionality related to CUDA events (not including the event wrapper type event_t itself)

	cuda::stream
	Definitions and functionality related to CUDA streams (not including the stream wrapper type stream_t itself)

	cuda::memory
	Representation, allocation and manipulation of CUDA-related memory, of different kinds.

	cuda::memory::device
	CUDA-Device-global memory on a single device (not accessible from the host)

	cuda::memory::shared
	A memory space whose contents are shared by all threads in a CUDA kernel block, but specific to each kernel block separately.

	cuda::memory::managed
	Paged memory accessible in both device-side and host-side code by triggering transfers of pages between physical system memory and physical device memory.

	cuda::device
	Definitions and functionality related to CUDA devices (not including the device wrapper type cuda::device_t itself)

	cuda::device::peer_to_peer
	API functions and definitions relating to communications among peer CUDA GPU devices on the same system.

Typedefs
template<typename T , size_t N>
using	cuda::c_array = T[N]

using	cuda::status_t = CUresult
	Indicates either the result (success or error index) of a CUDA Runtime or Driver API call, or the overall status of the API (which is typically the last triggered error).

using	cuda::size_t = ::std::size_t
	A size type for use throughout the wrappers library (except when specific API functions limit the size further)

using	cuda::dimensionality_t = size_t
	The index or number of dimensions of an entity (as opposed to the extent in any dimension) - typically just 0, 1, 2 or 3.

using	cuda::array::dimension_t = size_t
	An individual dimension extent for an array.

using	cuda::event::handle_t = CUevent
	The CUDA driver's raw handle for events.

using	cuda::stream::handle_t = CUstream
	The CUDA driver's raw handle for streams.

using	cuda::stream::priority_t = int
	CUDA streams have a scheduling priority, with lower values meaning higher priority.

using	cuda::stream::callback_t = CUstreamCallback
	The CUDA driver's raw handle for a host-side callback function.

using	cuda::grid::dimension_t = decltype(dim3::x)
	CUDA kernels are launched in grids of blocks of threads, in 3 dimensions.

using	cuda::grid::block_dimension_t = dimension_t
	CUDA kernels are launched in grids of blocks of threads, in 3 dimensions.

using	cuda::grid::block_dimensions_t = dimensions_t
	CUDA kernels are launched in grids of blocks of threads.

using	cuda::grid::overall_dimension_t = size_t
	Dimension of a grid in threads along one axis, i.e. the product of the grid's block dimension and the grid's dimension in blocks on that axis.

using	cuda::memory::pointer::attribute_t = CUpointer_attribute
	Raw CUDA driver choice type for attributes of pointers.

using	cuda::memory::device::address_t = CUdeviceptr
	The numeric type which can represent the range of memory addresses on a CUDA device.

using	cuda::memory::shared::size_t = unsigned
	Each physical core ("Streaming Multiprocessor") on an NVIDIA GPU has a space of shared memory.

using	cuda::device::id_t = CUdevice
	Numeric ID of a CUDA device used by the CUDA Runtime API.

using	cuda::device::attribute_t = CUdevice_attribute
	CUDA devices have both "attributes" and "properties".

using	cuda::device::attribute_value_t = int
	All CUDA device attributes (cuda::device::attribute_t) have a value of this type.

using	cuda::device::peer_to_peer::attribute_t = CUdevice_P2PAttribute
	While individual CUDA devices have individual "attributes" (attribute_t), there are also attributes characterizing pairs of devices; this type is used for identifying/indexing them.

using	cuda::context::handle_t = CUcontext
	Raw CUDA driver handle for a context; see context_t.

using	cuda::context::flags_t = unsigned

using	cuda::device::flags_t = context::flags_t

using	cuda::device::primary_context::handle_t = cuda::context::handle_t
	Raw CUDA driver handle for a device's primary context.

using	cuda::device::host_thread_sync_scheduling_policy_t = context::host_thread_sync_scheduling_policy_t

using	cuda::uuid_t = CUuuid
	The CUDA-driver-specific representation of a UUID value; see also device_t::uuid().

using	cuda::kernel::attribute_t = CUfunction_attribute
	Raw CUDA driver selector of a kernel attribute.

using	cuda::kernel::attribute_value_t = int
	The uniform type the CUDA driver uses for all kernel attributes; it is typically more appropriate to use cuda::kernel_t methods, which also employ more specific, appropriate types.

using	cuda::kernel::handle_t = CUfunction

Enumerations
enum	: priority_t { cuda::stream::default_priority = 0 }

enum	cuda::memory::managed::initial_visibility_t { to_all_devices, to_supporters_of_concurrent_managed_access }
	The choices of which categories of CUDA devices a managed memory region must be visible to.

enum	cuda::multiprocessor_cache_preference_t : ::std::underlying_type< CUfunc_cache_enum >::type { cuda::multiprocessor_cache_preference_t::no_preference = CU_FUNC_CACHE_PREFER_NONE, cuda::multiprocessor_cache_preference_t::equal_l1_and_shared_memory = CU_FUNC_CACHE_PREFER_EQUAL, cuda::multiprocessor_cache_preference_t::prefer_shared_memory_over_l1 = CU_FUNC_CACHE_PREFER_SHARED, cuda::multiprocessor_cache_preference_t::prefer_l1_over_shared_memory = CU_FUNC_CACHE_PREFER_L1, none = no_preference, equal = equal_l1_and_shared_memory, prefer_shared = prefer_shared_memory_over_l1, prefer_l1 = prefer_l1_over_shared_memory }
	L1-vs-shared-memory balance option.

enum	cuda::multiprocessor_shared_memory_bank_size_option_t : ::std::underlying_type< CUsharedconfig >::type { device_default = CU_SHARED_MEM_CONFIG_DEFAULT_BANK_SIZE, four_bytes_per_bank = CU_SHARED_MEM_CONFIG_FOUR_BYTE_BANK_SIZE, eight_bytes_per_bank = CU_SHARED_MEM_CONFIG_EIGHT_BYTE_BANK_SIZE }
	A physical core (SM)'s shared memory has multiple "banks"; at most one datum per bank may be accessed simultaneously, while data in different banks can be accessed in parallel.

enum	cuda::context::host_thread_sync_scheduling_policy_t : unsigned int { cuda::context::heuristic = CU_CTX_SCHED_AUTO, cuda::context::default_ = heuristic, cuda::context::spin = CU_CTX_SCHED_SPIN, cuda::context::block = CU_CTX_SCHED_BLOCKING_SYNC, cuda::context::yield = CU_CTX_SCHED_YIELD, cuda::context::automatic = heuristic }
	Scheduling policies the CUDA driver may use when the host-side thread it is running in needs to wait for results from a certain device or context.

Functions
address_t	cuda::memory::device::address (const void *device_ptr) noexcept

address_t	cuda::memory::device::address (memory::const_region_t region) noexcept

void *	cuda::memory::as_pointer (device::address_t address) noexcept
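
The three functions above convert between raw pointers and numeric device addresses. A minimal sketch of what such conversions could look like follows; it is not taken from the library's source. To build without `<cuda.h>`, it assumes `std::uintptr_t` as a stand-in for `CUdeviceptr`, and it omits the overload taking a `memory::const_region_t`.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for cuda::memory::device::address_t (CUdeviceptr in the real header).
// Using std::uintptr_t here is an assumption made for illustration only.
using address_t = std::uintptr_t;

// Sketch of address(): reinterpret a pointer value as a numeric address.
inline address_t address(const void* device_ptr) noexcept
{
    return reinterpret_cast<address_t>(device_ptr);
}

// Sketch of as_pointer(): reinterpret a numeric address as an untyped pointer.
inline void* as_pointer(address_t addr) noexcept
{
    return reinterpret_cast<void*>(addr);
}
```

Under these assumptions the two conversions round-trip: `as_pointer(address(p))` compares equal to `p`.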

Detailed Description

Fundamental CUDA-related type definitions.

This is a common file for all definitions of fundamental CUDA-related types, some shared by different APIs.

Note: In this file you'll find several numeric or opaque handle types, e.g. for devices, streams and events. These are mostly to be ignored; they appear here to make interaction with the unwrapped API easier and to break dependencies in the code. Instead, this library offers wrapper classes for them, in separate header files. For example: stream.hpp contains a stream_t class with its unique stream handle. Those are the ones you will want to use - they are more convenient and safer.

Typedef Documentation

◆ attribute_value_t

using cuda::kernel::attribute_value_t = int

The uniform type the CUDA driver uses for all kernel attributes; it is typically more appropriate to use cuda::kernel_t methods, which also employ more specific, appropriate types.

◆ block_dimension_t

using cuda::grid::block_dimension_t = dimension_t

CUDA kernels are launched in grids of blocks of threads, in 3 dimensions.

In each of these, the number of threads per block is specified in this type.

Note: Theoretically, CUDA could split the type for blocks per grid and threads per block, but for now they're the same. At the time of writing, a grid dimension value cannot exceed 2^31 - 1 on any axis (and is lower still on the y and z axes), so values of this type may be safely narrowing-cast into signed 32-bit integers.
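
The narrowing conversion mentioned in the note can be sketched as a checked cast. This is an illustration only, not library code: `dimension_t` is declared locally as `unsigned` (the type of `dim3::x`), and `to_signed_dimension` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Stand-in for cuda::grid::dimension_t, i.e. decltype(dim3::x), which is unsigned.
using dimension_t = unsigned;

// Hypothetical helper: narrow a dimension to a signed 32-bit integer,
// throwing if the value does not fit rather than silently wrapping.
inline std::int32_t to_signed_dimension(dimension_t dim)
{
    constexpr auto max_value =
        static_cast<dimension_t>(std::numeric_limits<std::int32_t>::max());
    if (dim > max_value) {
        throw std::out_of_range("dimension does not fit in a signed 32-bit integer");
    }
    return static_cast<std::int32_t>(dim);
}
```

For values within the limits described in the note the check never fires; it only guards against inputs that were not valid dimensions to begin with.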

◆ block_dimensions_t

using cuda::grid::block_dimensions_t = dimensions_t

CUDA kernels are launched in grids of blocks of threads.

This expresses the dimensions of a block within such a grid, in terms of threads.

Todo: Consider having both grid and block dims inherit from the same dimensions_t structure, but be incompatible, to prevent mis-casting one as the other.

◆ dimension_t

using cuda::grid::dimension_t = decltype(dim3::x)

CUDA kernels are launched in grids of blocks of threads, in 3 dimensions.

In each of these, the number of blocks per grid is specified in this type.

Note: Theoretically, CUDA could split the type for blocks per grid and threads per block, but for now they're the same.

◆ overall_dimension_t

using cuda::grid::overall_dimension_t = size_t

Dimension of a grid in threads along one axis, i.e. the product of the grid's block dimension and the grid's dimension in blocks, on some axis.
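
As a rough sketch of why this type is wider than `dimension_t`: the per-axis product can exceed 32 bits, so the operands need widening before the multiplication. The type aliases below are local stand-ins for the library's typedefs, and `overall_dimension` is a hypothetical helper name, not part of the documented API.

```cpp
#include <cassert>
#include <cstddef>

using dimension_t         = unsigned;     // stand-in for cuda::grid::dimension_t
using block_dimension_t   = dimension_t;  // same type, as in the typedef list
using overall_dimension_t = std::size_t;  // wide enough for the product on 64-bit hosts

// Hypothetical helper: number of threads along one axis of a grid.
inline overall_dimension_t overall_dimension(
    dimension_t       grid_dimension_in_blocks,
    block_dimension_t block_dimension_in_threads)
{
    // Widen first, so the multiplication is carried out in overall_dimension_t
    // rather than wrapping around in 32-bit unsigned arithmetic.
    return static_cast<overall_dimension_t>(grid_dimension_in_blocks)
         * block_dimension_in_threads;
}
```

For instance, a grid of 4 blocks with 256 threads each along the x axis has an overall x dimension of 1024 threads.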

Enumeration Type Documentation

◆ host_thread_sync_scheduling_policy_t

enum cuda::context::host_thread_sync_scheduling_policy_t : unsigned int

Scheduling policies the CUDA driver may use when the host-side thread it is running in needs to wait for results from a certain device or context.

Enumerator
heuristic	Default behavior; yield or spin based on a heuristic. This is the default when the flags parameter is zero. The heuristic is based on the number of active CUDA contexts in the process (C) and the number of logical processors in the system (P): if C > P, CUDA yields to other OS threads when waiting for the device; otherwise CUDA does not yield and actively spins on the processor while waiting for results.
default_	Alias for the default behavior; see heuristic.
spin	Keep control and spin-check for result availability. Instruct CUDA to actively spin when waiting for results from the device. This can decrease latency when waiting for the device, but may lower the performance of CPU threads if they are performing work in parallel with the CUDA thread.
block	Block the thread until results are available. Instruct CUDA to block the CPU thread on a synchronization primitive when waiting for the device to finish work.
yield	Yield control while waiting for results. Instruct CUDA to yield its thread when waiting for results from the device. This can increase latency when waiting for the device, but can increase the performance of CPU threads performing work in parallel with the device.
automatic	Alias for heuristic.
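
The decision described for the heuristic enumerator can be sketched in a few lines. This only illustrates the documented rule (yield if C > P, otherwise spin); it is not how the CUDA driver is implemented. `wait_behavior` and `resolve_heuristic` are hypothetical names, and the enumerator values are not the `CU_CTX_SCHED_*` constants.

```cpp
#include <cassert>
#include <cstddef>

// Local illustration of the two behaviors the heuristic chooses between.
enum class wait_behavior { spin, yield };

// Hypothetical helper: apply the documented rule, where C is the number of
// active CUDA contexts in the process and P is the number of logical
// processors in the system.
inline wait_behavior resolve_heuristic(std::size_t active_contexts,
                                       std::size_t logical_processors)
{
    return (active_contexts > logical_processors) ? wait_behavior::yield
                                                  : wait_behavior::spin;
}
```

With 8 active contexts on a 4-processor machine, the rule selects yielding; with one context on 16 processors, it selects spinning.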
