Type-generic wrappers for CUDA atomic operations.

#include <type_traits>
#include <climits>
#include <cuda_runtime_api.h>

Typedefs
using	kat::grid_dimension_t = decltype(dim3::x)
	CUDA kernels are launched in grids of blocks of threads, in 3 dimensions.

using	kat::grid_block_dimension_t = grid_dimension_t
	CUDA kernels are launched in grids of blocks of threads, in 3 dimensions.

using	kat::native_word_t = unsigned

template<typename Size >
using	kat::promoted_size_t = typename std::common_type< Size, native_word_t >::type
	A size type no smaller than a native word.

using	kat::lane_mask_t = unsigned
	A mask with one bit for each lane in a warp.

Enumerations
enum	: native_word_t { warp_size = 32 }

enum	: native_word_t { log_warp_size = 5 }

enum	: lane_mask_t { kat::full_warp_mask = 0xFFFFFFFF, kat::empty_warp_mask = 0x0 }

Functions
template<typename T >
constexpr std::size_t	kat::size_in_bits ()
	The number of bits in the representation of a value of type T.

template<typename T >
constexpr std::size_t	kat::size_in_bits (const T &)
	The number of bits in the representation of a value of type T.

template<typename Interpreted , typename Original >
KAT_FHD Interpreted	kat::reinterpret (Original &x)

Detailed Description

Type-generic wrappers for CUDA atomic operations.

Some basic type and constant definitions used by all device-side CUDA KAT code.

CUDA's "primitive" atomic functions are non-generic, C-style functions, defined only for certain specific types, and sometimes only for some of several same-size types whose semantics are identical. This file provides type-generic variants of these functions, extending their functionality as far as possible: either by recasting to a supported type, or by using the compare-and-swap (compare-and-exchange) primitive to implement the operation for types that are not directly supported.
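To illustrate why recasting works: atomicAdd has an unsigned long long int overload but no signed long long int one, yet for two's-complement integers both additions produce the same bit pattern. The sketch below is plain host C++ (no CUDA), and add_via_unsigned is a hypothetical name; it only demonstrates that forwarding a signed operation to an unsigned primitive is safe, not how the KAT wrappers are actually written.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical illustration of the recasting approach: perform a signed 64-bit
// addition on the unsigned representation, as a generic wrapper could do before
// forwarding to an unsigned-only primitive such as
// atomicAdd(unsigned long long int*, unsigned long long int).
// Two's-complement addition modulo 2^64 yields the same bit pattern whether the
// operands are viewed as signed or unsigned.
inline std::int64_t add_via_unsigned(std::int64_t a, std::int64_t b)
{
    std::uint64_t ua, ub;
    std::memcpy(&ua, &a, sizeof ua);   // copy the bits, no value conversion
    std::memcpy(&ub, &b, sizeof ub);
    std::uint64_t sum = ua + ub;       // unsigned wraparound is well-defined
    std::int64_t result;
    std::memcpy(&result, &sum, sizeof result);
    return result;
}
```

For example, add_via_unsigned(-5, 3) gives -2, exactly as a native signed addition would.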

Additionally, the wrapper used to emulate atomics on arbitrary types is exposed here, so that users can apply the same technique to arbitrary functions of their own.
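The compare-and-swap emulation is a retry loop: read the current value, compute the new one, and try to swap it in, retrying if another thread changed the value in the meantime. The following host-side sketch assumes std::atomic<T>::compare_exchange_weak as a stand-in for CUDA's atomicCAS; atomic_apply is a hypothetical name, and KAT's actual wrapper may differ in both name and signature.

```cpp
#include <atomic>
#include <cassert>

// Host-side analogue (std::atomic stands in for CUDA's atomicCAS) of applying an
// arbitrary binary operation atomically. atomic_apply is a hypothetical name.
// Returns the value held before the update, like CUDA's atomic functions.
template <typename T, typename BinaryOp>
T atomic_apply(std::atomic<T>& target, T val, BinaryOp op)
{
    T expected = target.load();
    // On failure, compare_exchange_weak stores the current value into 'expected',
    // so the next iteration recomputes op() against the latest value. It may also
    // fail spuriously, in which case the loop simply retries.
    while (!target.compare_exchange_weak(expected, op(expected, val))) {
    }
    return expected;
}
```

For instance, atomic_apply(x, 3.0f, multiply) performs an atomic multiplication on a float, an operation CUDA offers no primitive for. On the device, the same loop would typically compare bit patterns through a same-size integer type, since atomicCAS takes integer operands.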

Note: NVIDIA makes a rather unfortunate and non-intuitive choice of parameter names for its atomic functions, which, at least for now and for the sake of consistency, I adopt: a pointer is called an "address", and the new value is called "val" even when there is another value to consider (e.g. in atomicCAS). Also, what's with the shorthand? Did you run out of disk space? :-(

Typedef Documentation

§ grid_block_dimension_t

using kat::grid_block_dimension_t = grid_dimension_t

CUDA kernels are launched in grids of blocks of threads, in 3 dimensions.

In each of these dimensions, this type specifies the number of threads per block.

Note: Theoretically, CUDA could use separate types for blocks per grid and threads per block, but for now they are the same.

§ grid_dimension_t

using kat::grid_dimension_t = decltype(dim3::x)

CUDA kernels are launched in grids of blocks of threads, in 3 dimensions.

In each of these dimensions, this type specifies the number of blocks per grid.

Note: Theoretically, CUDA could use separate types for blocks per grid and threads per block, but for now they are the same.

Note: All three dimensions in dim3 are of the same type as dim3::x.
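As a sketch of how these types fit together, the snippet below mirrors the two aliases in plain C++, assuming (as in CUDA's definition of dim3) that dim3::x is an unsigned int, and computes a thread's global index in a one-dimensional launch. global_thread_index_1d is a hypothetical helper, not part of KAT.

```cpp
#include <cstdint>

// Assumption: mirrors kat's aliases without including CUDA headers;
// CUDA's dim3 declares x, y and z as unsigned int.
using grid_dimension_t       = unsigned int;
using grid_block_dimension_t = grid_dimension_t;

// Hypothetical helper: blockIdx.x * blockDim.x + threadIdx.x for a 1-D launch,
// widened to 64 bits before multiplying so the product cannot wrap around.
constexpr std::uint64_t global_thread_index_1d(
    grid_dimension_t       block_index,
    grid_block_dimension_t threads_per_block,
    grid_block_dimension_t thread_index)
{
    return static_cast<std::uint64_t>(block_index) * threads_per_block + thread_index;
}
```

On the device it would be called as global_thread_index_1d(blockIdx.x, blockDim.x, threadIdx.x). The widening matters because the product of a block index and a block size can exceed the range of a 32-bit grid_dimension_t.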

§ lane_mask_t

using kat::lane_mask_t = unsigned

A mask with one bit for each lane in a warp.

Used to indicate which threads meet a certain criterion or need to have some action applied to them.

Todo: Consider using a 32-bit bit field.
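A host-side simulation shows how a lane_mask_t is laid out: bit i corresponds to lane i. On the device, such a mask would usually come from a warp vote such as __ballot_sync(full_warp_mask, predicate) and be counted with __popc; here both are emulated in plain C++, and ballot_sim and lane_count are hypothetical names.

```cpp
#include <bitset>
#include <cassert>

using lane_mask_t = unsigned;                          // mirrors kat::lane_mask_t
enum : unsigned { warp_size = 32 };                    // mirrors kat's warp_size
enum : lane_mask_t { full_warp_mask = 0xFFFFFFFF, empty_warp_mask = 0x0 };

// Host-side simulation of a warp vote (hypothetical helper): bit i of the result
// is set iff pred(i) is true, as __ballot_sync(full_warp_mask, p) would report
// if lane i evaluated p.
template <typename Predicate>
lane_mask_t ballot_sim(Predicate pred)
{
    lane_mask_t mask = empty_warp_mask;
    for (unsigned lane = 0; lane < warp_size; ++lane) {
        if (pred(lane)) {
            mask |= (1u << lane);
        }
    }
    return mask;
}

// Number of lanes whose bit is set (what __popc(mask) computes on the device).
inline unsigned lane_count(lane_mask_t mask)
{
    return static_cast<unsigned>(std::bitset<32>(mask).count());
}
```

For example, a predicate true only on even lanes produces the mask 0x55555555, with 16 bits set.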

§ promoted_size_t

template<typename Size >

using kat::promoted_size_t = typename std::common_type<Size, native_word_t>::type

A size type no smaller than a native word.

Sometimes, in device code, a size type only needs to cover a small range of values. Even so, it is more efficient to use a full native word than to risk extra instructions that enforce the limits of sub-native-word values. This may not help much, and the compiler may optimize the difference away anyway, but it is the safer choice.
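The effect of promoted_size_t follows from the usual arithmetic conversions that std::common_type applies. The compile-time checks below restate the alias in plain C++, assuming a 32-bit unsigned native word; note that a signed int also maps to unsigned, which may be surprising.

```cpp
#include <type_traits>

using native_word_t = unsigned;  // mirrors kat::native_word_t (assumed 32-bit)

template <typename Size>
using promoted_size_t = typename std::common_type<Size, native_word_t>::type;

// Sub-word unsigned sizes are widened to a full native word...
static_assert(std::is_same<promoted_size_t<unsigned char>,  unsigned>::value, "");
static_assert(std::is_same<promoted_size_t<unsigned short>, unsigned>::value, "");
// ...wider types are left unchanged...
static_assert(std::is_same<promoted_size_t<unsigned long long>, unsigned long long>::value, "");
// ...and a signed int, having the same rank as unsigned, becomes unsigned.
static_assert(std::is_same<promoted_size_t<int>, unsigned>::value, "");
```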

Enumeration Type Documentation

§ anonymous enum

anonymous enum : lane_mask_t

Enumerator
full_warp_mask	Bits turned on for all lanes in the warp.
empty_warp_mask	Bits turned off for all lanes in the warp.
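The warp_size and log_warp_size constants are typically used together to split a thread's index into a warp index and a lane index with a shift and a mask, instead of a division and a modulo. The sketch below restates the two constants in plain C++; warp_index and lane_index are hypothetical helpers, and thread_index is assumed to be a linear (1-D) index within the block.

```cpp
using native_word_t = unsigned;                 // mirrors kat::native_word_t
enum : native_word_t { warp_size = 32 };        // mirrors kat's warp_size
enum : native_word_t { log_warp_size = 5 };     // mirrors kat's log_warp_size

// The two constants must agree: 2^log_warp_size == warp_size.
static_assert(warp_size == (1u << log_warp_size), "inconsistent warp constants");

// Hypothetical helpers (not part of KAT): split a linear thread index within a
// block into the index of its warp and its lane within that warp.
constexpr native_word_t warp_index(native_word_t thread_index)
{
    return thread_index >> log_warp_size;       // same as thread_index / warp_size
}

constexpr native_word_t lane_index(native_word_t thread_index)
{
    return thread_index & (warp_size - 1);      // same as thread_index % warp_size
}
```

The mask form is valid only because warp_size is a power of two.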

Function Documentation

§ reinterpret()

template<typename Interpreted , typename Original >

KAT_FHD Interpreted kat::reinterpret ( Original & x )

Note: Interpreted can be either a value or a reference type.

Todo: Would it be better to return a reference?
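A value-only sketch of the idea behind reinterpret: in standard C++, copying the object representation with std::memcpy is the well-defined way to view a value's bits as another type of the same size. reinterpret_copy is a hypothetical name; unlike kat::reinterpret, it cannot produce a reference type, and it takes its argument by const reference.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical value-only sketch (not kat::reinterpret itself): view the bits of x
// as an Interpreted value by copying its object representation.
template <typename Interpreted, typename Original>
Interpreted reinterpret_copy(const Original& x)
{
    static_assert(sizeof(Interpreted) == sizeof(Original),
                  "both types must have the same size");
    static_assert(std::is_trivially_copyable<Interpreted>::value &&
                  std::is_trivially_copyable<Original>::value,
                  "memcpy-based reinterpretation needs trivially copyable types");
    Interpreted result;
    std::memcpy(&result, &x, sizeof(result));
    return result;
}
```

For example, reinterpret_copy<std::uint32_t>(1.0f) yields 0x3F800000 on platforms where float is IEEE 754 binary32.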

§ size_in_bits() [1/2]

template<typename T >

constexpr std::size_t kat::size_in_bits ( )

The number of bits in the representation of a value of type T.

Note: with this variant, you'll need to manually specify the type.
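One plausible implementation of the two overloads, assuming the bit count is sizeof(T) * CHAR_BIT (the header does include <climits>); this is a sketch, not necessarily KAT's exact code. The second overload deduces T from its argument, so no explicit template argument is needed there.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Sketch: bits in the object representation of T; the type must be given explicitly.
template <typename T>
constexpr std::size_t size_in_bits() { return sizeof(T) * CHAR_BIT; }

// Sketch: same result, with T deduced from the (otherwise unused) argument.
template <typename T>
constexpr std::size_t size_in_bits(const T&) { return size_in_bits<T>(); }
```

For example, size_in_bits<std::uint32_t>() and size_in_bits(std::uint32_t{}) both evaluate to 32 at compile time.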

§ size_in_bits() [2/2]