cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
types.hpp File Reference

Fundamental CUDA-related type definitions. More...

#include <builtin_types.h>
#include <type_traits>
#include <cassert>
#include <cstddef>
#include <cstdint>

Classes

struct  cuda::array::dimensions_t< NumDimensions >
 CUDA's array memory-objects are multi-dimensional, but their dimensions (extents) are not the same as cuda::grid::dimensions_t; they may be much larger in each axis. More...
 
struct  cuda::array::dimensions_t< 3 >
 Dimensions for 3D CUDA arrays. More...
 
struct  cuda::array::dimensions_t< 2 >
 Dimensions for 2D CUDA arrays. More...
 
struct  cuda::grid::dimensions_t
 A richer (kind-of-a-)wrapper for CUDA's dim3 class, used to specify dimensions of blocks (in terms of threads) and of grids (in terms of blocks, or overall). More...
 
struct  cuda::launch_configuration_t
 Holds the parameters necessary to "launch" a CUDA kernel (i.e. More...
 
struct  cuda::symbol_t
 Object-code symbols. More...
 

Namespaces

 cuda
 All definitions and functionality wrapping the CUDA Runtime API.
 
 cuda::event
 Definitions and functionality related to CUDA events (not including the event wrapper type event_t itself)
 
 cuda::stream
 Definitions and functionality related to CUDA streams (not including the stream wrapper type stream_t itself)
 
 cuda::memory
 Management and operations on memory in different CUDA-recognized spaces.
 
 cuda::device
 Definitions and functionality related to CUDA devices (not including the device wrapper type device_t itself)
 

Typedefs

using cuda::status_t = cudaError_t
 Indicates either the result (success or error index) of a CUDA Runtime API call, or the overall status of the Runtime API (which is typically the last triggered error).
 
using cuda::size_t = ::std::size_t
 
using cuda::dimensionality_t = unsigned
 The index or number of dimensions of an entity (as opposed to the extent in any dimension) - typically just 0, 1, 2 or 3.
 
using cuda::array::dimension_t = size_t
 
using cuda::event::id_t = cudaEvent_t
 The CUDA Runtime API's numeric handle for events.
 
using cuda::stream::id_t = cudaStream_t
 The CUDA Runtime API's numeric handle for streams.
 
using cuda::stream::priority_t = int
 CUDA streams have a scheduling priority, with lower values meaning higher priority. More...
 
using cuda::grid::dimension_t = decltype(dim3::x)
 CUDA kernels are launched in grids of blocks of threads, in 3 dimensions. More...
 
using cuda::grid::block_dimension_t = dimension_t
 CUDA kernels are launched in grids of blocks of threads, in 3 dimensions. More...
 
using cuda::grid::block_dimensions_t = dimensions_t
 CUDA kernels are launched in grids of blocks of threads. More...
 
using cuda::memory::shared::size_t = unsigned
 Each physical core ("Symmetric Multiprocessor") on an NVIDIA GPU has a space of shared memory. More...
 
using cuda::device::id_t = int
 Numeric ID of a CUDA device used by the CUDA Runtime API.
 
using cuda::device::attribute_t = cudaDeviceAttr
 CUDA devices have both "attributes" and "properties". More...
 
using cuda::device::attribute_value_t = int
 All CUDA device attributes (cuda::device::attribute_t) have a value of this type.
 
using cuda::device::pair_attribute_t = cudaDeviceP2PAttr
 While individual CUDA devices have individual "attributes" (attribute_t), there are also attributes characterizing pairs of devices; this type is used for identifying/indexing them, aliasing cudaDeviceP2PAttr.
 
using cuda::native_word_t = unsigned
 

Enumerations

enum  : priority_t { cuda::stream::default_priority = 0 }
 
enum  cuda::multiprocessor_cache_preference_t {
  cuda::multiprocessor_cache_preference_t::no_preference = cudaFuncCachePreferNone,
  cuda::multiprocessor_cache_preference_t::equal_l1_and_shared_memory = cudaFuncCachePreferEqual,
  cuda::multiprocessor_cache_preference_t::prefer_shared_memory_over_l1 = cudaFuncCachePreferShared,
  cuda::multiprocessor_cache_preference_t::prefer_l1_over_shared_memory = cudaFuncCachePreferL1,
  none = no_preference,
  equal = equal_l1_and_shared_memory,
  prefer_shared = prefer_shared_memory_over_l1,
  prefer_l1 = prefer_l1_over_shared_memory
}
 L1-vs-shared-memory balance option. More...
 
enum  cuda::multiprocessor_shared_memory_bank_size_option_t : ::std::underlying_type< cudaSharedMemConfig >::type {
  device_default = cudaSharedMemBankSizeDefault,
  four_bytes_per_bank = cudaSharedMemBankSizeFourByte,
  eight_bytes_per_bank = cudaSharedMemBankSizeEightByte
}
 A physical core (SM)'s shared memory has multiple "banks"; at most one datum per bank may be accessed simultaneously, while data in different banks can be accessed in parallel. More...
 
enum  cuda::host_thread_synch_scheduling_policy_t : unsigned int {
  cuda::heuristic = cudaDeviceScheduleAuto,
  cuda::spin = cudaDeviceScheduleSpin,
  cuda::block = cudaDeviceScheduleBlockingSync,
  cuda::yield = cudaDeviceScheduleYield,
  cuda::automatic = heuristic
}
 Scheduling policies the Runtime API may use when the host-side thread it is running in needs to wait for results from a certain device. More...
 

Functions

constexpr launch_configuration_t cuda::make_launch_config (grid::dimensions_t grid_dimensions, grid::block_dimensions_t block_dimensions, memory::shared::size_t dynamic_shared_memory_size=0u) noexcept
 A named-constructor idiom for launch_configuration_t.
 
constexpr bool cuda::operator== (const launch_configuration_t lhs, const launch_configuration_t &rhs) noexcept
 

Detailed Description

Fundamental CUDA-related type definitions.

This is a common file for all definitions of fundamental CUDA-related types, some shared by different APIs.

Note
Most types here are defined using "Runtime API terminology", but this is inconsequential, as the corresponding Driver API types are merely aliases of them. For example, in CUDA's own header files, we have:

typedef CUevent_st * CUevent;     // Driver API
typedef CUevent_st * cudaEvent_t; // Runtime API

Typedef Documentation

◆ block_dimension_t

using cuda::grid::block_dimension_t = dimension_t

CUDA kernels are launched in grids of blocks of threads, in 3 dimensions.

In each of these, the number of threads per block is specified in this type.

Note
Theoretically, CUDA could split the type for blocks per grid and threads per block, but for now they're the same.
At the time of writing, a grid dimension value cannot exceed 2^31 on any axis (and the limits on the y and z axes are much lower), so signed 32-bit integers are "usable" even though this type is unsigned.

◆ block_dimensions_t

using cuda::grid::block_dimensions_t = dimensions_t

CUDA kernels are launched in grids of blocks of threads.

This expresses the dimensions of a block within such a grid, in terms of threads.

◆ dimension_t

using cuda::grid::dimension_t = decltype(dim3::x)

CUDA kernels are launched in grids of blocks of threads, in 3 dimensions.

In each of these, the number of blocks per grid is specified in this type.

Note
Theoretically, CUDA could split the type for blocks per grid and threads per block, but for now they're the same.

◆ size_t

using cuda::memory::shared::size_t = unsigned

Each physical core ("Symmetric Multiprocessor") on an NVIDIA GPU has a space of shared memory.

This type is large enough to hold its size.

Note
Actually, uint16_t is usually large enough to hold the shared memory size (as of the Volta/Turing architectures), but there are exceptions to this rule, so we go with the next size up.
Todo:
consider using uint32_t.