cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
Fundamental CUDA-related type definitions. More...
#include "detail/optional.hpp"
#include <builtin_types.h>
#include <cuda.h>
#include <type_traits>
#include <utility>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#include <stdexcept>
Classes

struct cuda::span< T >
    A "poor man's" span class (see the conceptual sketch after this list). More...
struct cuda::array::dimensions_t< NumDimensions >
    CUDA's array memory-objects are multi-dimensional; but their dimensions, or extents, are not the same as cuda::grid::dimensions_t; they may be much larger in each axis. More...
struct cuda::array::dimensions_t< 3 >
    Dimensions for 3D CUDA arrays. More...
struct cuda::array::dimensions_t< 2 >
    Dimensions for 2D CUDA arrays. More...
struct cuda::grid::dimensions_t
    A richer (kind-of-a-)wrapper for CUDA's dim3 class, used to specify dimensions for blocks (in terms of threads) and of grids (in terms of blocks, or overall). More...
struct cuda::grid::composite_dimensions_t
    Composite dimensions for a grid - in terms of blocks, then also down into the block dimensions, completing the information to the thread level. More...
struct cuda::grid::overall_dimensions_t
    Dimensions of a grid in threads, i.e. the axis-wise products of the grid's dimensions in blocks and its blocks' dimensions in threads. More...
struct cuda::memory::region_t
struct cuda::memory::const_region_t
struct cuda::memory::external::subregion_spec_t
    Describes a subregion within the context of a larger (memory) region. More...
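The span class above is, per its description, a minimal stand-in for C++20's std::span: conceptually, just a typed pointer-and-size pair. A conceptual sketch of what such a class provides (illustrative only - see the library source for the actual definition):

    #include <cstddef>

    // Conceptual sketch only, not the library's actual definition.
    template <typename T>
    struct span {
        T*          data_;
        std::size_t size_;

        constexpr T*          data()  const noexcept { return data_; }
        constexpr std::size_t size()  const noexcept { return size_; }
        constexpr T*          begin() const noexcept { return data_; }
        constexpr T*          end()   const noexcept { return data_ + size_; }
    };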
Namespaces

cuda
    All definitions and functionality wrapping the CUDA Runtime API.
cuda::event
    Definitions and functionality related to CUDA events (not including the event wrapper type event_t itself)
cuda::stream
    Definitions and functionality related to CUDA streams (not including the stream wrapper type stream_t itself)
cuda::memory
    Representation, allocation and manipulation of CUDA-related memory, of different kinds.
cuda::memory::device
    CUDA-Device-global memory on a single device (not accessible from the host)
cuda::memory::managed
    This type of memory, also known as unified memory, appears within a unified, all-system address space - and is used with the same address range on the host and on all relevant CUDA devices on a system.
cuda::memory::external
    Representation of memory resources external to CUDA.
cuda::device
    Definitions and functionality related to CUDA devices (not including the device wrapper type device_t itself)
Macros

#define CUDA_API_WRAPPERS_COMMON_TYPES_HPP_
#define __device__
#define __host__
#define CPP14_CONSTEXPR
#define NOEXCEPT_IF_NDEBUG   noexcept(false)
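Judging by its name and by the debug-build expansion shown above, NOEXCEPT_IF_NDEBUG presumably switches on the standard NDEBUG macro; a minimal sketch of such a definition (the release-build branch is an assumption):

    #ifdef NDEBUG
    #define NOEXCEPT_IF_NDEBUG noexcept(true)   // release builds: promise not to throw
    #else
    #define NOEXCEPT_IF_NDEBUG noexcept(false)  // debug builds: assertions may throw
    #endif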
Typedefs

using cuda::status_t = CUresult
    Indicates either the result (success or error index) of a CUDA Runtime or Driver API call, or the overall status of the API (which is typically the last triggered error). More... (see also the example following this list)
using cuda::size_t = ::std::size_t
using cuda::dimensionality_t = unsigned
    The index or number of dimensions of an entity (as opposed to the extent in any dimension) - typically just 0, 1, 2 or 3.
using cuda::array::dimension_t = size_t
using cuda::event::handle_t = CUevent
    The CUDA Runtime API's numeric handle for events.
using cuda::stream::handle_t = CUstream
    The CUDA API's handle for streams.
using cuda::stream::priority_t = int
    CUDA streams have a scheduling priority, with lower values meaning higher priority. More...
using cuda::grid::dimension_t = decltype(dim3::x)
    CUDA kernels are launched in grids of blocks of threads, in 3 dimensions. More...
using cuda::grid::block_dimension_t = dimension_t
    CUDA kernels are launched in grids of blocks of threads, in 3 dimensions. More...
using cuda::grid::block_dimensions_t = dimensions_t
    CUDA kernels are launched in grids of blocks of threads. More...
using cuda::grid::overall_dimension_t = size_t
    Dimension of a grid in threads along one axis. More...
using cuda::memory::pointer::attribute_t = CUpointer_attribute
using cuda::memory::device::address_t = CUdeviceptr
    The numeric type which can represent the range of memory addresses on a CUDA device.
using cuda::memory::shared::size_t = unsigned
    Each physical core ("Symmetric Multiprocessor") on an nVIDIA GPU has a space of shared memory (see this blog entry). More...
using cuda::memory::managed::range_attribute_t = CUmem_range_attribute
using cuda::memory::external::handle_t = CUexternalMemory
using cuda::device::id_t = CUdevice
    Numeric ID of a CUDA device used by the CUDA Runtime API. More...
using cuda::device::attribute_t = CUdevice_attribute
    CUDA devices have both "attributes" and "properties". More...
using cuda::device::attribute_value_t = int
    All CUDA device attributes (cuda::device::attribute_t) have a value of this type.
using cuda::device::peer_to_peer::attribute_t = CUdevice_P2PAttribute
    While individual CUDA devices have individual "attributes" (attribute_t), there are also attributes characterizing pairs of devices; this type is used for identifying/indexing them.
using cuda::context::handle_t = CUcontext
using cuda::context::flags_t = unsigned
using cuda::device::flags_t = context::flags_t
using cuda::device::primary_context::handle_t = cuda::context::handle_t
using cuda::device::host_thread_sync_scheduling_policy_t = context::host_thread_sync_scheduling_policy_t
using cuda::native_word_t = unsigned
using cuda::uuid_t = CUuuid
using cuda::kernel::attribute_t = CUfunction_attribute
using cuda::kernel::attribute_value_t = int
using cuda::kernel::handle_t = CUfunction
template<typename T >
using cuda::dynarray = ::std::vector< T >
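As an illustration of how cuda::status_t (i.e. CUresult) surfaces in practice, here is a minimal status-checking sketch using the plain Driver API; cuInit, cuGetErrorName and CUDA_SUCCESS are standard, while the throw_on_error helper is purely illustrative and not part of this library:

    #include <cuda.h>
    #include <stdexcept>
    #include <string>

    // Illustrative helper (hypothetical): turn a failed status into an exception.
    void throw_on_error(CUresult status, const char* what)
    {
        if (status != CUDA_SUCCESS) {
            const char* name = nullptr;
            cuGetErrorName(status, &name);  // stringify the error code
            throw std::runtime_error(std::string(what) + ": "
                + (name ? name : "(unknown error)"));
        }
    }

    int main()
    {
        throw_on_error(cuInit(0), "initializing the CUDA driver");
    }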
Enumerations

enum : priority_t { cuda::stream::default_priority = 0 }

enum initial_visibility_t { to_all_devices, to_supporters_of_concurrent_managed_access }

enum cuda::multiprocessor_cache_preference_t : ::std::underlying_type< CUfunc_cache_enum >::type {
    cuda::multiprocessor_cache_preference_t::no_preference = CU_FUNC_CACHE_PREFER_NONE,
    cuda::multiprocessor_cache_preference_t::equal_l1_and_shared_memory = CU_FUNC_CACHE_PREFER_EQUAL,
    cuda::multiprocessor_cache_preference_t::prefer_shared_memory_over_l1 = CU_FUNC_CACHE_PREFER_SHARED,
    cuda::multiprocessor_cache_preference_t::prefer_l1_over_shared_memory = CU_FUNC_CACHE_PREFER_L1,
    none = no_preference,
    equal = equal_l1_and_shared_memory,
    prefer_shared = prefer_shared_memory_over_l1,
    prefer_l1 = prefer_l1_over_shared_memory
}
    L1-vs-shared-memory balance option. More... (see the sketch following this list)

enum cuda::multiprocessor_shared_memory_bank_size_option_t : ::std::underlying_type< CUsharedconfig >::type {
    device_default = CU_SHARED_MEM_CONFIG_DEFAULT_BANK_SIZE,
    four_bytes_per_bank = CU_SHARED_MEM_CONFIG_FOUR_BYTE_BANK_SIZE,
    eight_bytes_per_bank = CU_SHARED_MEM_CONFIG_EIGHT_BYTE_BANK_SIZE
}
    A physical core (SM)'s shared memory has multiple "banks"; at most one datum per bank may be accessed simultaneously, while data in different banks can be accessed in parallel. More...

enum cuda::context::host_thread_sync_scheduling_policy_t : unsigned int {
    cuda::context::heuristic = CU_CTX_SCHED_AUTO,
    cuda::context::default_ = heuristic,
    cuda::context::spin = CU_CTX_SCHED_SPIN,
    cuda::context::block = CU_CTX_SCHED_BLOCKING_SYNC,
    cuda::context::yield = CU_CTX_SCHED_YIELD,
    cuda::context::automatic = heuristic
}
    Scheduling policies the Runtime API may use when the host-side thread it is running in needs to wait for results from a certain device. More...
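Since multiprocessor_cache_preference_t's enumerators are defined directly in terms of the Driver API's CUfunc_cache values, a chosen preference can be applied to a kernel with the standard Driver API call cuFuncSetCacheConfig. A hedged sketch (the include path is assumed; the helper function is illustrative, not this library's interface):

    #include <cuda.h>
    #include <cuda/api/types.hpp>  // this header (include path assumed)

    // Apply an L1-vs-shared-memory preference to a kernel, given its
    // cuda::kernel::handle_t (i.e. CUfunction). Illustrative only.
    void set_cache_preference(CUfunction kernel_handle,
                              cuda::multiprocessor_cache_preference_t preference)
    {
        // The enum's underlying type matches CUfunc_cache_enum's, so the
        // conversion below is value-preserving.
        cuFuncSetCacheConfig(kernel_handle, static_cast<CUfunc_cache>(preference));
    }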
Functions

constexpr bool cuda::grid::operator== (composite_dimensions_t lhs, composite_dimensions_t rhs) noexcept
constexpr bool cuda::grid::operator!= (composite_dimensions_t lhs, composite_dimensions_t rhs) noexcept
constexpr bool cuda::grid::operator== (overall_dimensions_t lhs, overall_dimensions_t rhs) noexcept
constexpr bool cuda::grid::operator!= (overall_dimensions_t lhs, overall_dimensions_t rhs) noexcept
constexpr overall_dimensions_t cuda::grid::operator* (dimensions_t grid_dims, block_dimensions_t block_dims) noexcept
address_t cuda::memory::device::address (const void *device_ptr) noexcept
    Return a pointer's address as a numeric value of the type appropriate for the device. More...
void * cuda::memory::as_pointer (device::address_t address) noexcept
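The last two functions above are inverses, translating between the host-side void* view of device memory and the numeric address_t (i.e. CUdeviceptr) view which the Driver API uses. A round-trip sketch (cuMemAlloc and cuMemFree are standard Driver API calls; a current context is assumed, error handling is omitted, and the include path is assumed):

    #include <cuda.h>
    #include <cuda/api/types.hpp>  // this header (include path assumed)

    void address_round_trip()
    {
        CUdeviceptr raw_address;
        cuMemAlloc(&raw_address, 1024);  // allocate 1 KiB of device-global memory

        // Numeric address -> pointer, and back again:
        void* ptr = cuda::memory::as_pointer(raw_address);
        cuda::memory::device::address_t again = cuda::memory::device::address(ptr);
        // 'again' now holds the same address as 'raw_address'

        cuMemFree(raw_address);
    }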
Detailed Description

Fundamental CUDA-related type definitions.

This is a common file for all definitions of fundamental CUDA-related types, some shared by different APIs.

stream.hpp contains a stream_t class with its unique stream handle. Those are the ones you will want to use - they are more convenient and safer.

Typedef Documentation

using cuda::grid::block_dimension_t = typedef dimension_t
CUDA kernels are launched in grids of blocks of threads, in 3 dimensions.
In each of these dimensions, the number of threads per block is specified in this type.

using cuda::grid::block_dimensions_t = typedef dimensions_t

CUDA kernels are launched in grids of blocks of threads.
This expresses the dimensions of a block within such a grid, in terms of threads.

using cuda::grid::dimension_t = typedef decltype(dim3::x)

CUDA kernels are launched in grids of blocks of threads, in 3 dimensions.
In each of these dimensions, the number of blocks per grid is specified in this type.

using cuda::grid::overall_dimension_t = typedef size_t

Dimension of a grid in threads along one axis, i.e. the product of the grid's block dimension and the grid's dimension in blocks, on that axis.
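The relationship among these types is captured by the operator* listed above: multiplying a grid's dimensions in blocks by its blocks' dimensions in threads yields the grid's overall dimensions in threads. A brief sketch (the three-extent brace-initialization is an assumption, based on dimensions_t wrapping dim3):

    // 64 x 64 blocks, each of 16 x 16 threads (initialization form assumed):
    cuda::grid::dimensions_t       grid_dims  { 64, 64, 1 };
    cuda::grid::block_dimensions_t block_dims { 16, 16, 1 };

    // Overall dimensions in threads: 1024 x 1024 x 1
    cuda::grid::overall_dimensions_t overall = grid_dims * block_dims;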
using cuda::memory::shared::size_t = typedef unsigned

Each physical core ("Symmetric Multiprocessor") on an nVIDIA GPU has a space of shared memory (see this blog entry).
This type is large enough to hold its size.
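In plain CUDA, a value of this kind is what gets passed as the dynamic shared memory size when launching a kernel - the third launch-configuration parameter of the standard triple-chevron syntax. For example (the kernel itself is hypothetical):

    __global__ void my_kernel(float* data);  // hypothetical kernel

    void launch(float* device_data)
    {
        cuda::memory::shared::size_t dynamic_shared_mem = 4096;  // bytes per block
        my_kernel<<<dim3{ 64 }, dim3{ 256 }, dynamic_shared_mem>>>(device_data);
    }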
enum cuda::context::host_thread_sync_scheduling_policy_t : unsigned int
Scheduling policies the Runtime API may use when the host-side thread it is running in needs to wait for results from a certain device.
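These enumerators alias the Driver API's context-scheduling flags, so a policy value can be passed wherever those flags are expected - for instance, to configure a device's primary context with the standard Driver API call cuDevicePrimaryCtxSetFlags (an illustrative sketch, not this library's own interface; include path assumed):

    #include <cuda.h>
    #include <cuda/api/types.hpp>  // this header (include path assumed)

    void use_blocking_sync(CUdevice device_id)
    {
        // cuda::context::block aliases CU_CTX_SCHED_BLOCKING_SYNC; as an unscoped
        // enum it converts implicitly to the flags' unsigned type. Set this
        // before the primary context becomes active.
        cuDevicePrimaryCtxSetFlags(device_id, cuda::context::block);
    }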