cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
Fundamental CUDA-related type definitions.
#include "detail/optional.hpp"
#include "detail/optional_ref.hpp"
#include "detail/span.hpp"
#include "detail/region.hpp"
#include "detail/type_traits.hpp"
#include <builtin_types.h>
#include <cuda.h>
#include <type_traits>
#include <utility>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#include <stdexcept>
Classes | |
struct | cuda::array::dimensions_t< NumDimensions > |
CUDA's array memory-objects are multi-dimensional; but their dimensions, or extents, are not the same as cuda::grid::dimensions_t; they may be much larger in each axis. | |
struct | cuda::array::dimensions_t< 3 > |
Dimensions for 3D CUDA arrays. | |
struct | cuda::array::dimensions_t< 2 > |
Dimensions for 2D CUDA arrays. | |
struct | cuda::grid::dimensions_t |
A richer (kind-of-a-)wrapper for CUDA's dim3 class, used to specify dimensions for blocks (in terms of threads) and of grids (in terms of blocks, or overall). | |
struct | cuda::grid::overall_dimensions_t |
Dimensions of a grid in threads, i.e. the axis-by-axis product of the grid's dimensions in blocks and its blocks' dimensions in threads. | |
struct | cuda::grid::composite_dimensions_t |
Composite dimensions for a grid - in terms of blocks, and then down into block dimensions, completing the information to the thread level. | |
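As a quick illustration of how these dimension types relate, here is a minimal sketch; the umbrella header name and the brace-initialization forms are assumptions based on this listing, not verified signatures.

    #include <cuda/api.hpp>  // assumed umbrella header for the wrappers

    // The raw CUDA type: three unsigned extents, used for grids and blocks alike
    dim3 raw_block{16, 16, 1};

    // The wrappers' richer type for block dimensions, in threads (assumed to be
    // brace-constructible from three extents, mirroring dim3)
    cuda::grid::block_dimensions_t block{16, 16, 1};

    // Array dimensions are a separate concept: extents of an array memory object,
    // which may be far larger per axis than any grid dimension
    cuda::array::dimensions_t<2> array_extents{4096, 4096};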
Namespaces | |
cuda | |
Definitions and functionality wrapping CUDA APIs. | |
cuda::array | |
CUDA facilities for interpolating access to multidimensional array objects, in particular via the array_t class. | |
cuda::event | |
CUDA timing functionality, via events and their related code (not including the event wrapper type event_t itself) | |
cuda::event::ipc | |
Definitions and functionality for sharing CUDA events among host processes (inter-process communication) | |
cuda::stream | |
Definitions and functionality related to CUDA streams (not including the stream wrapper type stream_t itself) | |
cuda::memory | |
Representation, allocation and manipulation of CUDA-related memory, of different kinds. | |
cuda::memory::device | |
CUDA-Device-global memory on a single device (not accessible from the host) | |
cuda::memory::shared | |
A memory space whose contents are shared by all threads in a CUDA kernel block, but are specific to each block separately. | |
cuda::memory::managed | |
Paged memory accessible in both device-side and host-side code by triggering transfers of pages between physical system memory and physical device memory. | |
cuda::device | |
Definitions and functionality related to CUDA devices (not including the device wrapper type cuda::device_t itself) | |
cuda::device::peer_to_peer | |
API functions and definitions relating to communications among peer CUDA GPU devices on the same system. | |
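To make the managed-memory description above concrete, here is a minimal sketch using the raw CUDA Runtime call rather than the wrappers' own allocation interface:

    #include <cuda_runtime.h>

    int main() {
        // Allocate managed ("unified") memory: one pointer usable from both
        // host-side and device-side code, with pages migrating on demand
        float* data = nullptr;
        cudaMallocManaged(&data, 1024 * sizeof(float), cudaMemAttachGlobal);

        data[0] = 1.0f;  // touch from the host; a kernel could do the same

        cudaFree(data);
        return 0;
    }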
Typedefs | |
template<typename T , size_t N> | |
using | cuda::c_array = T[N] |
using | cuda::status_t = CUresult |
Indicates either the result (success or error index) of a CUDA Runtime or Driver API call, or the overall status of the API (which is typically the last triggered error). | |
using | cuda::size_t = ::std::size_t |
A size type for use throughout the wrappers library (except when specific API functions limit the size further) | |
using | cuda::dimensionality_t = size_t |
The index or number of dimensions of an entity (as opposed to the extent in any dimension) - typically just 0, 1, 2 or 3. | |
using | cuda::array::dimension_t = size_t |
An individual dimension extent for an array. | |
using | cuda::event::handle_t = CUevent |
The CUDA driver's raw handle for events. | |
using | cuda::stream::handle_t = CUstream |
The CUDA driver's raw handle for streams. | |
using | cuda::stream::priority_t = int |
CUDA streams have a scheduling priority, with lower values meaning higher priority. | |
using | cuda::stream::callback_t = CUstreamCallback |
The CUDA driver's raw handle for a host-side callback function. | |
using | cuda::grid::dimension_t = decltype(dim3::x) |
CUDA kernels are launched in grids of blocks of threads, in 3 dimensions. | |
using | cuda::grid::block_dimension_t = dimension_t |
CUDA kernels are launched in grids of blocks of threads, in 3 dimensions. | |
using | cuda::grid::block_dimensions_t = dimensions_t |
CUDA kernels are launched in grids of blocks of threads. | |
using | cuda::grid::overall_dimension_t = size_t |
Dimension of a grid in threads along one axis, i.e. the product of the grid's dimension in blocks and its block dimension in threads, on that axis. | |
using | cuda::memory::pointer::attribute_t = CUpointer_attribute |
Raw CUDA driver choice type for attributes of pointers. | |
using | cuda::memory::device::address_t = CUdeviceptr |
The numeric type which can represent the range of memory addresses on a CUDA device. | |
using | cuda::memory::shared::size_t = unsigned |
Each physical core ("Symmetric Multiprocessor") on an NVIDIA GPU has a space of shared memory. | |
using | cuda::device::id_t = CUdevice |
Numeric ID of a CUDA device used by the CUDA Runtime API. | |
using | cuda::device::attribute_t = CUdevice_attribute |
CUDA devices have both "attributes" and "properties". | |
using | cuda::device::attribute_value_t = int |
All CUDA device attributes (cuda::device::attribute_t) have a value of this type. | |
using | cuda::device::peer_to_peer::attribute_t = CUdevice_P2PAttribute |
While individual CUDA devices have individual "attributes" (attribute_t), there are also attributes characterizing pairs; this type is used for identifying/indexing them. | |
using | cuda::context::handle_t = CUcontext |
Raw CUDA driver handle for a context; see context_t. |
using | cuda::context::flags_t = unsigned |
using | cuda::device::flags_t = context::flags_t |
using | cuda::device::primary_context::handle_t = cuda::context::handle_t |
Raw CUDA driver handle for a device's primary context. | |
using | cuda::device::host_thread_sync_scheduling_policy_t = context::host_thread_sync_scheduling_policy_t |
using | cuda::uuid_t = CUuuid |
The CUDA-driver-specific representation of a UUID value; see also device_t::uuid(). |
using | cuda::kernel::attribute_t = CUfunction_attribute |
Raw CUDA driver selector of a kernel attribute. | |
using | cuda::kernel::attribute_value_t = int |
The uniform type the CUDA driver uses for all kernel attributes; it is typically more appropriate to use cuda::kernel_t methods, which also employ more specific, appropriate types. | |
using | cuda::kernel::handle_t = CUfunction |
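Since cuda::status_t is just an alias for the driver's CUresult, raw driver return values can be stored and checked through it directly; a minimal sketch, with the umbrella header name assumed:

    #include <cuda.h>
    #include <cuda/api.hpp>  // assumed umbrella header

    int main() {
        cuda::status_t status = cuInit(0);  // raw driver call returns CUresult
        if (status != CUDA_SUCCESS) {
            return 1;  // a real program would map this onto an exception
        }
        return 0;
    }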
Enumerations | |
enum | : priority_t { cuda::stream::default_priority = 0 } |
enum | cuda::memory::managed::initial_visibility_t { to_all_devices, to_supporters_of_concurrent_managed_access } |
The choices of which categories of CUDA devices a managed memory region must be visible to. | |
enum | cuda::multiprocessor_cache_preference_t : ::std::underlying_type< CUfunc_cache_enum >::type { cuda::multiprocessor_cache_preference_t::no_preference = CU_FUNC_CACHE_PREFER_NONE, cuda::multiprocessor_cache_preference_t::equal_l1_and_shared_memory = CU_FUNC_CACHE_PREFER_EQUAL, cuda::multiprocessor_cache_preference_t::prefer_shared_memory_over_l1 = CU_FUNC_CACHE_PREFER_SHARED, cuda::multiprocessor_cache_preference_t::prefer_l1_over_shared_memory = CU_FUNC_CACHE_PREFER_L1, none = no_preference, equal = equal_l1_and_shared_memory, prefer_shared = prefer_shared_memory_over_l1, prefer_l1 = prefer_l1_over_shared_memory } |
L1-vs-shared-memory balance option. | |
enum | cuda::multiprocessor_shared_memory_bank_size_option_t : ::std::underlying_type< CUsharedconfig >::type { device_default = CU_SHARED_MEM_CONFIG_DEFAULT_BANK_SIZE, four_bytes_per_bank = CU_SHARED_MEM_CONFIG_FOUR_BYTE_BANK_SIZE, eight_bytes_per_bank = CU_SHARED_MEM_CONFIG_EIGHT_BYTE_BANK_SIZE } |
A physical core (SM)'s shared memory has multiple "banks"; at most one datum per bank may be accessed simultaneously, while data in different banks can be accessed in parallel. | |
enum | cuda::context::host_thread_sync_scheduling_policy_t : unsigned int { cuda::context::heuristic = CU_CTX_SCHED_AUTO, cuda::context::default_ = heuristic, cuda::context::spin = CU_CTX_SCHED_SPIN, cuda::context::block = CU_CTX_SCHED_BLOCKING_SYNC, cuda::context::yield = CU_CTX_SCHED_YIELD, cuda::context::automatic = heuristic } |
Scheduling policies the CUDA driver may use when the host-side thread it is running in needs to wait for results from a certain device or context. | |
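These enumerations use the corresponding raw driver constants as their underlying values, so a cast (or, for the unscoped ones, an implicit conversion) recovers the value a raw driver call expects; a brief sketch, with the umbrella header name assumed:

    #include <cuda.h>
    #include <cuda/api.hpp>  // assumed umbrella header

    // The scoped enum's underlying values are the CU_FUNC_CACHE_* constants
    auto pref = cuda::multiprocessor_cache_preference_t::prefer_shared_memory_over_l1;
    auto raw  = static_cast<CUfunc_cache>(pref);  // == CU_FUNC_CACHE_PREFER_SHARED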
Functions | |
address_t | cuda::memory::device::address (const void *device_ptr) noexcept |
address_t | cuda::memory::device::address (memory::const_region_t region) noexcept |
void * | cuda::memory::as_pointer (device::address_t address) noexcept |
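These functions allow round-tripping between typed pointers and the driver's numeric address type; a minimal sketch, with the device allocation itself elided and the umbrella header name assumed:

    #include <cassert>
    #include <cuda/api.hpp>  // assumed umbrella header

    void example(void* device_ptr) {
        // Convert a typed pointer to the driver's numeric address type...
        cuda::memory::device::address_t addr = cuda::memory::device::address(device_ptr);
        // ...and back again; the round trip preserves the pointer value
        void* back = cuda::memory::as_pointer(addr);
        assert(back == device_ptr);
    }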
Fundamental CUDA-related type definitions.
This is a common file for all definitions of fundamental CUDA-related types, some shared by different APIs.
For example, stream.hpp contains a stream_t class with its unique stream handle; those wrapper classes are the ones you will want to use - they are more convenient and safer.
using cuda::kernel::attribute_value_t = typedef int |
The uniform type the CUDA driver uses for all kernel attributes; it is typically more appropriate to use cuda::kernel_t methods, which also employ more specific, appropriate types.
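As an illustration, this is the type that comes back from the raw driver call cuFuncGetAttribute; a sketch of querying one attribute directly (as noted, the cuda::kernel_t methods are usually the better route), with the umbrella header name assumed:

    #include <cuda.h>
    #include <cuda/api.hpp>  // assumed umbrella header

    // All kernel attributes come back through the same plain-int type
    cuda::kernel::attribute_value_t query_max_threads(cuda::kernel::handle_t fn) {
        cuda::kernel::attribute_value_t value;
        cuFuncGetAttribute(&value, CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK, fn);
        return value;
    }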
using cuda::grid::block_dimension_t = typedef dimension_t |
CUDA kernels are launched in grids of blocks of threads, in 3 dimensions.
In each of these, the number of threads per block is specified in this type.
using cuda::grid::block_dimensions_t = typedef dimensions_t |
CUDA kernels are launched in grids of blocks of threads.
This expresses the dimensions of a block within such a grid, in terms of threads.
using cuda::grid::dimension_t = typedef decltype(dim3::x) |
CUDA kernels are launched in grids of blocks of threads, in 3 dimensions.
In each of these, the numbers of blocks per grid is specified in this type.
using cuda::grid::overall_dimension_t = typedef size_t |
Dimension of a grid in threads along one axis, i.e.
a multiplication of a grid's block dimension and the grid's dimension in blocks, on some axis.
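A worked example of that multiplication, using the types from this listing (header name assumed):

    #include <cuda/api.hpp>  // assumed umbrella header

    cuda::grid::dimension_t       blocks_x  = 256;  // grid dimension, in blocks
    cuda::grid::block_dimension_t threads_x = 128;  // block dimension, in threads

    // 256 blocks * 128 threads = 32768 threads overall along the x axis; the
    // wider size_t-based type leaves room where the 32-bit types would overflow
    cuda::grid::overall_dimension_t overall_x =
        cuda::grid::overall_dimension_t{blocks_x} * threads_x;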
enum cuda::context::host_thread_sync_scheduling_policy_t : unsigned int |
Scheduling policies the CUDA driver may use when the host-side thread it is running in needs to wait for results from a certain device or context.
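Since the enumerators carry the raw CU_CTX_SCHED_* flag values, a policy can be passed straight into a raw context-creation call; a minimal sketch with error checking omitted and the umbrella header name assumed:

    #include <cuda.h>
    #include <cuda/api.hpp>  // assumed umbrella header

    int main() {
        cuInit(0);
        CUdevice dev;
        cuDeviceGet(&dev, 0);

        CUcontext ctx;
        // The unscoped enumerator converts to the raw flags value, here
        // CU_CTX_SCHED_BLOCKING_SYNC: block the host thread while waiting
        cuCtxCreate(&ctx, cuda::context::block, dev);

        cuCtxDestroy(ctx);
        return 0;
    }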