cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
types.hpp File Reference

Fundamental CUDA-related type definitions.

#include <builtin_types.h>
#include <cuda.h>
#include <type_traits>
#include <utility>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

Classes

struct  cuda::span< T >
 A "poor man's" span class.
 
struct  cuda::array::dimensions_t< NumDimensions >
 CUDA's array memory-objects are multi-dimensional, but their dimensions, or extents, are not the same as cuda::grid::dimensions_t; they may be much larger in each axis.
 
struct  cuda::array::dimensions_t< 3 >
 Dimensions for 3D CUDA arrays.
 
struct  cuda::array::dimensions_t< 2 >
 Dimensions for 2D CUDA arrays.
 
struct  cuda::grid::dimensions_t
 A richer (kind-of-a-)wrapper for CUDA's dim3 class, used to specify dimensions for blocks (in terms of threads) and for grids (in terms of blocks, or overall).
 
struct  cuda::grid::composite_dimensions_t
 Composite dimensions for a grid: in terms of blocks, and also down into the block dimensions, completing the information to the thread level.
 
struct  cuda::grid::overall_dimensions_t
 Dimensions of a grid in threads, i.e. the product of the grid's block dimensions and its dimensions in blocks, on each axis.
 
struct  cuda::memory::region_t
 
struct  cuda::memory::const_region_t
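As a rough illustration of what a "poor man's" span amounts to, here is a minimal sketch of a non-owning (pointer, size) view. The name `poor_mans_span` and its members are hypothetical; this is not the definition of `cuda::span` in types.hpp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal span: a non-owning view over size_ contiguous elements
// starting at data_. Not cuda::span's actual definition.
template <typename T>
struct poor_mans_span {
    T*          data_;
    std::size_t size_;

    T*          begin() const noexcept { return data_; }
    T*          end()   const noexcept { return data_ + size_; }
    std::size_t size()  const noexcept { return size_; }

    // Bounds are only checked in debug builds (the assert compiles away with NDEBUG).
    T& operator[](std::size_t i) const
    {
        assert(i < size_);
        return data_[i];
    }
};
```

Being a plain aggregate, it can be built with brace initialization, e.g. `poor_mans_span<int> s{arr, 3};`, and iterated with a range-based `for` loop.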
 

Namespaces

 cuda
 All definitions and functionality wrapping the CUDA Runtime API.
 
 cuda::event
 Definitions and functionality related to CUDA events (not including the event wrapper type event_t itself).
 
 cuda::stream
 Definitions and functionality related to CUDA streams (not including the stream wrapper type stream_t itself).
 
 cuda::memory
 Representation, allocation and manipulation of CUDA-related memory, of different kinds.
 
 cuda::memory::device
 Global memory on a single CUDA device (not accessible from the host).
 
 cuda::memory::managed
 This type of memory, also known as unified memory, appears within a unified, system-wide address space, and is used with the same address range on the host and on all relevant CUDA devices in the system.
 
 cuda::device
 Definitions and functionality related to CUDA devices (not including the device wrapper type device_t itself).
 

Typedefs

using cuda::status_t = CUresult
 Indicates either the result (success or error index) of a CUDA Runtime or Driver API call, or the overall status of the API (which is typically the last triggered error).
 
using cuda::size_t = ::std::size_t
 
using cuda::dimensionality_t = unsigned
 The index or number of dimensions of an entity (as opposed to the extent in any dimension) - typically just 0, 1, 2 or 3.
 
using cuda::array::dimension_t = size_t
 
using cuda::event::handle_t = CUevent
 The CUDA API's handle for events.
 
using cuda::stream::handle_t = CUstream
 The CUDA API's handle for streams.
 
using cuda::stream::priority_t = int
 CUDA streams have a scheduling priority, with lower values meaning higher priority.
 
using cuda::grid::dimension_t = decltype(dim3::x)
 CUDA kernels are launched in grids of blocks of threads, in 3 dimensions; this type holds the number of blocks per grid along one dimension.
 
using cuda::grid::block_dimension_t = dimension_t
 CUDA kernels are launched in grids of blocks of threads, in 3 dimensions; this type holds the number of threads per block along one dimension.
 
using cuda::grid::block_dimensions_t = dimensions_t
 CUDA kernels are launched in grids of blocks of threads.
 
using cuda::grid::overall_dimension_t = size_t
 Dimension of a grid in threads along one axis, i.e. the product of the grid's block dimension and its dimension in blocks, on that axis.
 
using cuda::memory::pointer::attribute_t = CUpointer_attribute
 
using cuda::memory::device::address_t = CUdeviceptr
 The numeric type which can represent the range of memory addresses on a CUDA device.
 
using cuda::memory::shared::size_t = unsigned
 Each physical core ("Streaming Multiprocessor") on an NVIDIA GPU has a space of shared memory (see this blog entry).
 
using cuda::memory::managed::range_attribute_t = CUmem_range_attribute
 
using cuda::device::id_t = CUdevice
 Numeric ID of a CUDA device used by the CUDA Runtime API.
 
using cuda::device::attribute_t = CUdevice_attribute
 CUDA devices have both "attributes" and "properties".
 
using cuda::device::attribute_value_t = int
 All CUDA device attributes (cuda::device::attribute_t) have a value of this type.
 
using cuda::device::peer_to_peer::attribute_t = CUdevice_P2PAttribute
 While individual CUDA devices have individual "attributes" (attribute_t), there are also attributes characterizing pairs of devices; this type is used for identifying/indexing them.
 
using cuda::context::handle_t = CUcontext
 
using cuda::context::flags_t = unsigned
 
using cuda::device::flags_t = context::flags_t
 
using cuda::device::primary_context::handle_t = cuda::context::handle_t
 
using cuda::device::host_thread_synch_scheduling_policy_t = context::host_thread_synch_scheduling_policy_t
 
using cuda::native_word_t = unsigned
 
using cuda::uuid_t = CUuuid
 
using cuda::kernel::attribute_t = CUfunction_attribute
 
using cuda::kernel::attribute_value_t = int
 
using cuda::kernel::handle_t = CUfunction
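The `memory::device::address_t` typedef above is paired with `memory::device::address()` and `memory::as_pointer()` (listed under Functions) for converting between pointers and numeric device addresses. Below is a minimal sketch of such a round trip, assuming a 64-bit unsigned address type (as `CUdeviceptr` is on 64-bit platforms); `device_address_t`, `address_of` and `pointer_from` are hypothetical stand-ins, not the library's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a 64-bit unsigned numeric device address, standing in for CUdeviceptr.
using device_address_t = std::uint64_t;

// Pointer -> number, going through uintptr_t, the integer type intended for holding pointers.
inline device_address_t address_of(const void* device_ptr) noexcept
{
    return static_cast<device_address_t>(reinterpret_cast<std::uintptr_t>(device_ptr));
}

// Number -> pointer: the inverse conversion.
inline void* pointer_from(device_address_t address) noexcept
{
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(address));
}
```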
 

Enumerations

enum  : priority_t { cuda::stream::default_priority = 0 }
 
enum  initial_visibility_t {
  to_all_devices,
  to_supporters_of_concurrent_managed_access
}
 
enum  cuda::multiprocessor_cache_preference_t : ::std::underlying_type< CUfunc_cache_enum >::type {
  cuda::multiprocessor_cache_preference_t::no_preference = CU_FUNC_CACHE_PREFER_NONE,
  cuda::multiprocessor_cache_preference_t::equal_l1_and_shared_memory = CU_FUNC_CACHE_PREFER_EQUAL,
  cuda::multiprocessor_cache_preference_t::prefer_shared_memory_over_l1 = CU_FUNC_CACHE_PREFER_SHARED,
  cuda::multiprocessor_cache_preference_t::prefer_l1_over_shared_memory = CU_FUNC_CACHE_PREFER_L1,
  none = no_preference,
  equal = equal_l1_and_shared_memory,
  prefer_shared = prefer_shared_memory_over_l1,
  prefer_l1 = prefer_l1_over_shared_memory
}
 L1-vs-shared-memory balance option.
 
enum  cuda::multiprocessor_shared_memory_bank_size_option_t : ::std::underlying_type< CUsharedconfig >::type {
  device_default = CU_SHARED_MEM_CONFIG_DEFAULT_BANK_SIZE,
  four_bytes_per_bank = CU_SHARED_MEM_CONFIG_FOUR_BYTE_BANK_SIZE,
  eight_bytes_per_bank = CU_SHARED_MEM_CONFIG_EIGHT_BYTE_BANK_SIZE
}
 A physical core (SM)'s shared memory has multiple "banks"; at most one datum per bank may be accessed simultaneously, while data in different banks can be accessed in parallel.
 
enum  cuda::context::host_thread_synch_scheduling_policy_t : unsigned int {
  cuda::context::heuristic = CU_CTX_SCHED_AUTO,
  cuda::context::default_ = heuristic,
  cuda::context::spin = CU_CTX_SCHED_SPIN,
  cuda::context::block = CU_CTX_SCHED_BLOCKING_SYNC,
  cuda::context::yield = CU_CTX_SCHED_YIELD,
  cuda::context::automatic = heuristic
}
 Scheduling policies the Runtime API may use when the host-side thread it is running in needs to wait for results from a certain device.
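To make the shared-memory bank-size option above more concrete, here is a simplified model of which bank a shared-memory byte offset falls into. It assumes 32 banks and treats the 4- or 8-byte bank width as a parameter; `bank_of` is a hypothetical helper, not part of this library or of CUDA.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 32 shared-memory banks; bytes_per_bank is 4 or 8, mirroring the
// four_bytes_per_bank / eight_bytes_per_bank options.
constexpr std::size_t num_banks = 32;

// Successive bytes_per_bank-sized words map to successive banks, wrapping around.
constexpr std::size_t bank_of(std::size_t byte_offset, std::size_t bytes_per_bank)
{
    return (byte_offset / bytes_per_bank) % num_banks;
}

// With 4-byte banks, offsets 0 and 128 land in the same bank (bank 0), so two
// threads accessing them simultaneously cause a bank conflict.
static_assert(bank_of(0, 4) == bank_of(128, 4), "same bank");
static_assert(bank_of(4, 4) == 1, "next word, next bank");
```

Switching to 8-byte banks changes the mapping: offset 128 then falls in bank 16 rather than bank 0.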
 

Functions

constexpr bool cuda::grid::operator== (composite_dimensions_t lhs, composite_dimensions_t rhs) noexcept
 
constexpr bool cuda::grid::operator!= (composite_dimensions_t lhs, composite_dimensions_t rhs) noexcept
 
constexpr bool cuda::grid::operator== (overall_dimensions_t lhs, overall_dimensions_t rhs) noexcept
 
constexpr bool cuda::grid::operator!= (overall_dimensions_t lhs, overall_dimensions_t rhs) noexcept
 
constexpr overall_dimensions_t cuda::grid::operator* (dimensions_t grid_dims, block_dimensions_t block_dims) noexcept
 
address_t cuda::memory::device::address (const void *device_ptr) noexcept
 Returns a pointer's address as a numeric value of the type appropriate for device addresses.
 
void * cuda::memory::as_pointer (device::address_t address) noexcept
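The `operator*` overload above combines a grid's dimensions in blocks with the block dimensions in threads, axis by axis, into overall dimensions in threads. A minimal sketch of that idea follows, using hypothetical stand-in structs (`dims3`, `overall_dims3`) rather than the library's `dimensions_t` and `overall_dimensions_t`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins; the library's types are richer.
struct dims3 { unsigned x, y, z; };             // blocks per grid, or threads per block
struct overall_dims3 { std::size_t x, y, z; };  // threads per grid, per axis

constexpr bool operator==(overall_dims3 lhs, overall_dims3 rhs) noexcept
{
    return lhs.x == rhs.x && lhs.y == rhs.y && lhs.z == rhs.z;
}

constexpr bool operator!=(overall_dims3 lhs, overall_dims3 rhs) noexcept
{
    return !(lhs == rhs);
}

// Multiply axis by axis. Widening to std::size_t first means that, where size_t
// is 64 bits wide, the product cannot wrap around at 2^32.
constexpr overall_dims3 operator*(dims3 grid_dims, dims3 block_dims) noexcept
{
    return {
        static_cast<std::size_t>(grid_dims.x) * block_dims.x,
        static_cast<std::size_t>(grid_dims.y) * block_dims.y,
        static_cast<std::size_t>(grid_dims.z) * block_dims.z
    };
}

static_assert(dims3{4, 2, 1} * dims3{256, 1, 1} == overall_dims3{1024, 2, 1}, "per-axis product");
```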
 

Detailed Description

Fundamental CUDA-related type definitions.

This is a common file for all definitions of fundamental CUDA-related types, some shared by different APIs.

Note
In this file you'll find several numeric or opaque handle types, e.g. for devices, streams and events. These are mostly to be ignored; they appear here to make interaction with the unwrapped API easier and to break dependencies in the code. Instead, this library offers wrapper classes for them, in separate header files. For example, stream.hpp contains a stream_t class with its unique stream handle. Those are the ones you will want to use: they are more convenient and safer.
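One reason such wrapper classes are typically safer is ownership: a wrapper releases its handle exactly once and cannot be copied by accident. The sketch below shows that pattern with a made-up C-style API; `raw_create`, `raw_destroy`, `live_count` and `thing_t` are hypothetical, neither CUDA calls nor this library's classes, and the real `stream_t` and friends do considerably more.

```cpp
#include <cassert>
#include <utility>

// Hypothetical C-style API: a raw handle plus create/destroy functions.
struct raw_thing { int id; };
using raw_handle_t = raw_thing*;

inline int& live_count() { static int n = 0; return n; }  // handles currently alive
inline raw_handle_t raw_create(int id) { ++live_count(); return new raw_thing{id}; }
inline void raw_destroy(raw_handle_t h) { --live_count(); delete h; }

// Owning wrapper: movable but not copyable; the destructor releases the handle.
class thing_t {
public:
    explicit thing_t(int id) : handle_(raw_create(id)) {}
    thing_t(thing_t&& other) noexcept : handle_(std::exchange(other.handle_, nullptr)) {}
    thing_t& operator=(thing_t&& other) noexcept
    {
        if (this != &other) {
            reset();                                        // release what we held
            handle_ = std::exchange(other.handle_, nullptr);
        }
        return *this;
    }
    thing_t(const thing_t&) = delete;
    thing_t& operator=(const thing_t&) = delete;
    ~thing_t() { reset(); }

    raw_handle_t handle() const noexcept { return handle_; }

private:
    void reset() noexcept
    {
        if (handle_ != nullptr) { raw_destroy(handle_); handle_ = nullptr; }
    }
    raw_handle_t handle_;
};
```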

Typedef Documentation

block_dimension_t

using cuda::grid::block_dimension_t = dimension_t

CUDA kernels are launched in grids of blocks of threads, in 3 dimensions.

In each of these, the number of threads per block is specified in this type.

Note
Theoretically, CUDA could split the type for blocks per grid and threads per block, but for now they're the same.
At the time of writing, a grid dimension value cannot exceed 2^31 - 1 on any axis (and is even lower on the y and z axes), so values of this type may be safely narrowing-cast into signed 32-bit integers.
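If you would rather check than assume when narrowing such a value, a small helper along these lines would do. `to_int32_checked` is a hypothetical name, and the sketch assumes, as with `dim3`, that the dimension type is an `unsigned int` of at least 32 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: narrow an unsigned grid/block dimension to a signed 32-bit
// integer, throwing instead of silently producing a wrong value.
inline std::int32_t to_int32_checked(unsigned dimension)
{
    if (dimension > static_cast<unsigned>(std::numeric_limits<std::int32_t>::max())) {
        throw std::overflow_error("dimension does not fit in a signed 32-bit integer");
    }
    return static_cast<std::int32_t>(dimension);
}
```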

block_dimensions_t

using cuda::grid::block_dimensions_t = dimensions_t

CUDA kernels are launched in grids of blocks of threads.

This expresses the dimensions of a block within such a grid, in terms of threads.

Todo:
Consider having both grid and block dims inherit from the same dimensions_t structure, but be incompatible, to prevent mis-casting one as the other.
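One way the Todo above might be realized is sketched here with a tag-type template in place of inheritance: grid and block dimensions share a single definition but are distinct types, so passing one where the other is expected fails to compile. All names below are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical: one shared definition, distinguished by an empty tag type.
template <typename Tag>
struct tagged_dimensions_t {
    unsigned x, y, z;
};

struct grid_tag {};
struct block_tag {};

using grid_dims_t  = tagged_dimensions_t<grid_tag>;   // in blocks
using block_dims_t = tagged_dimensions_t<block_tag>;  // in threads

// Accepts only block dimensions; passing a grid_dims_t does not compile.
inline unsigned threads_per_block(block_dims_t b) { return b.x * b.y * b.z; }

static_assert(!std::is_convertible<grid_dims_t, block_dims_t>::value,
              "grid and block dimensions are distinct, non-convertible types");
```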

dimension_t

using cuda::grid::dimension_t = decltype(dim3::x)

CUDA kernels are launched in grids of blocks of threads, in 3 dimensions.

In each of these, the number of blocks per grid is specified in this type.

Note
Theoretically, CUDA could split the type for blocks per grid and threads per block, but for now they're the same.
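Defining `dimension_t` as `decltype(dim3::x)` makes the alias follow whatever type `dim3` uses for its components, rather than hard-coding it. A self-contained sketch, with `dim3_like` as a hypothetical stand-in for `dim3` (assumed, as in the CUDA headers, to have `unsigned int` members):

```cpp
#include <cassert>
#include <type_traits>

// Stand-in for CUDA's dim3; the real one also has constructors.
struct dim3_like { unsigned int x, y, z; };

// decltype applied to an unparenthesized member name yields the member's declared
// type, so this alias changes automatically if the member type ever does.
using dimension_t = decltype(dim3_like::x);

static_assert(std::is_same<dimension_t, unsigned int>::value,
              "dimension_t tracks the type of dim3_like::x");
```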

overall_dimension_t

using cuda::grid::overall_dimension_t = size_t

Dimension of a grid in threads along one axis, i.e. the product of a grid's block dimension and the grid's dimension in blocks, on that axis.
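A type wider than 32 bits matters here because the per-axis product can exceed 32 bits. As a worked example, with the maximal x-axis grid dimension of 2^31 - 1 blocks and 1024 threads per block, the product is 2,199,023,254,528, far beyond the range of a 32-bit unsigned integer. The sketch below (hypothetical helper names) contrasts widening before multiplying with plain 32-bit arithmetic, which wraps modulo 2^32:

```cpp
#include <cassert>
#include <cstdint>

// Overall threads along one axis = blocks along that axis * threads per block along it.
// Widening to 64 bits before multiplying keeps the exact product.
inline std::uint64_t overall_dimension(std::uint32_t grid_dim, std::uint32_t block_dim)
{
    return static_cast<std::uint64_t>(grid_dim) * block_dim;
}

// Plain 32-bit arithmetic: wraps around modulo 2^32 when the product is too large.
inline std::uint32_t overall_dimension_32bit(std::uint32_t grid_dim, std::uint32_t block_dim)
{
    return grid_dim * block_dim;
}
```

For those inputs, `overall_dimension(2147483647u, 1024u)` yields 2199023254528, while `overall_dimension_32bit` returns 4294966272.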

size_t

using cuda::memory::shared::size_t = unsigned

Each physical core ("Streaming Multiprocessor") on an NVIDIA GPU has a space of shared memory (see this blog entry).

This type is large enough to hold its size.

Note
Actually, uint16_t is usually large enough to hold the shared memory size (as of the Volta/Turing architectures), but there are exceptions to this rule, so we have to go with the next larger type.
Todo:
Consider using uint32_t.
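The trade-off in the note can be checked at compile time: a 16-bit type tops out at 65535 bytes, just short of 64 KiB. The sizes below are illustrative, and `fits_in` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Does a shared-memory size, in bytes, fit in the unsigned type Size?
template <typename Size>
constexpr bool fits_in(unsigned long long bytes)
{
    return bytes <= std::numeric_limits<Size>::max();
}

static_assert(fits_in<std::uint16_t>(48 * 1024), "48 KiB fits in 16 bits");
static_assert(!fits_in<std::uint16_t>(96 * 1024), "96 KiB does not: the maximum is 65535");
static_assert(fits_in<unsigned>(96 * 1024), "but it fits in unsigned");
```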

Enumeration Type Documentation

host_thread_synch_scheduling_policy_t

Scheduling policies the Runtime API may use when the host-side thread it is running in needs to wait for results from a certain device.

Enumerator
heuristic 

Default behavior; yield or spin based on a heuristic.

This is the default value if the flags parameter is zero. It uses a heuristic based on the number of active CUDA contexts in the process, C, and the number of logical processors in the system, P. If C > P, CUDA will yield to other OS threads when waiting for the device; otherwise, CUDA will not yield while waiting for results, and will actively spin on the processor.

default_ 

Alias for the default behavior; see heuristic.

spin 

Keep control and spin-check for result availability.

Instruct CUDA to actively spin when waiting for results from the device. This can decrease latency when waiting for the device, but may lower the performance of CPU threads if they are performing work in parallel with the CUDA thread.

block 

Block the thread until results are available.

Instruct CUDA to block the CPU thread on a synchronization primitive when waiting for the device to finish work.

yield 

Yield control while waiting for results.

Instruct CUDA to yield its thread when waiting for results from the device. This can increase latency when waiting for the device, but can increase the performance of CPU threads performing work in parallel with the device.

automatic 

Another alias for the default behavior; see heuristic.
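The rule described for `heuristic` can be modeled as a one-line decision. This only illustrates the documented behavior, with hypothetical names; the actual decision is made by CUDA itself, not by code like this.

```cpp
#include <cassert>

// Hypothetical model of the heuristic: C = active CUDA contexts in the process,
// P = logical processors in the system.
enum class wait_behavior { spin, yield };

constexpr wait_behavior heuristic_wait_behavior(unsigned active_contexts, unsigned logical_processors)
{
    // More contexts than processors: yield to other OS threads; otherwise spin.
    return active_contexts > logical_processors ? wait_behavior::yield : wait_behavior::spin;
}
```

Note that when C equals P the model spins, since the documented condition for yielding is strictly C > P.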