cuda-kat
CUDA kernel author's tools
kat::shared_memory::dynamic::warp_specific Namespace Reference

Functions

template<typename T >
KAT_FD T * contiguous (unsigned num_elements_per_warp, offset_t base_offset=0)
 Accesses the calling thread's warp-specific dynamic shared memory, assuming the warps voluntarily divvy up the shared memory beyond some point amongst themselves into contiguous areas.
 
template<typename T >
KAT_FD T * strided (offset_t base_offset=0)
 Accesses the calling thread's warp-specific dynamic shared memory, assuming the warps voluntarily divvy up the shared memory beyond some point amongst themselves, using striding.
 

Detailed Description

Note
The contents of this namespace are only relevant for linear grids.

Function Documentation

§ contiguous()

template<typename T >
KAT_FD T* kat::shared_memory::dynamic::warp_specific::contiguous(unsigned num_elements_per_warp, offset_t base_offset = 0)

Accesses the calling thread's warp-specific dynamic shared memory, assuming the warps voluntarily divvy up the shared memory beyond some point amongst themselves into contiguous areas.

The partitioning pattern is for each warp to get a contiguous sequence of num_elements_per_warp elements in memory.

Template Parameters
T: the element type assumed for all shared memory (or at least for alignment and for the warp-specific shared memory)
Parameters
base_offset: how far into the block's overall shared memory to start partitioning the memory into warp-specific sequences
num_elements_per_warp: size, in elements, of the area agreed to be specific to each warp
Returns
Address of the first warp-specific element in shared memory
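
As a rough illustration of the contiguous layout that this function's name and its num_elements_per_warp parameter suggest, the following host-side C++ sketch models the index arithmetic. It is not cuda-kat code: the helper names are hypothetical, and it assumes that the area of warp w begins at element base_offset + w * num_elements_per_warp, counted from the start of the block's dynamic shared memory.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side model of a contiguous per-warp partitioning;
// these helpers are illustrative and are not part of cuda-kat.

// Element index at which the area of warp `warp_id` would begin
// (assumption: areas are laid out back to back, in warp-index order).
constexpr std::size_t contiguous_area_start(
    std::size_t warp_id,
    std::size_t num_elements_per_warp,
    std::size_t base_offset = 0)
{
    return base_offset + warp_id * num_elements_per_warp;
}

// Element index of the i-th element within the area of warp `warp_id`.
constexpr std::size_t contiguous_element_index(
    std::size_t warp_id,
    std::size_t i,
    std::size_t num_elements_per_warp,
    std::size_t base_offset = 0)
{
    return contiguous_area_start(warp_id, num_elements_per_warp, base_offset) + i;
}

// Sanity check: with 16 elements per warp and partitioning starting at
// element 8, consecutive areas are adjacent and do not overlap.
inline void check_contiguous_layout()
{
    assert(contiguous_area_start(0, 16, 8) == 8);
    assert(contiguous_area_start(1, 16, 8) == 24);
    assert(contiguous_element_index(0, 15, 16, 8) + 1
           == contiguous_area_start(1, 16, 8));
}
```

Under this model, consecutive elements of one warp's area are adjacent in memory, so lanes reading consecutive 4-byte elements would fall into distinct banks.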

§ strided()

template<typename T >
KAT_FD T* kat::shared_memory::dynamic::warp_specific::strided(offset_t base_offset = 0)

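Accesses the calling thread's warp-specific dynamic shared memory, assuming the warps voluntarily divvy up the shared memory beyond some point amongst themselves, using striding.

The partitioning pattern is for each warp to get elements at a fixed stride rather than a contiguous set of elements. This pattern ensures that different warps are never in a bank conflict when accessing their "private" shared memory, provided the number of warps divides 32 or is a multiple of 32. The downside is that different lanes accessing different elements in a warp's shared memory will likely be in bank conflict (and will certainly be in conflict if there are 32 warps).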

Template Parameters
T: the element type assumed for all shared memory (or at least for alignment and for the warp-specific shared memory)
Parameters
base_offset: how far into the block's overall shared memory to start partitioning the memory into warp-specific sequences
Returns
Address of the first warp-specific element in shared memory
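
To make the strided layout suggested by this function's name more concrete, here is a host-side C++ sketch of the index arithmetic and the shared memory banks it touches. It is a model under stated assumptions, not cuda-kat code: the helper names are hypothetical, the stride is assumed to equal the number of warps in the block, and shared memory is assumed to have 32 banks with consecutive 4-byte elements mapped to consecutive banks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side model of a strided per-warp partitioning;
// these helpers are illustrative and are not part of cuda-kat.

// Assumption: 32 banks, with consecutive 4-byte elements in consecutive banks.
constexpr std::size_t num_banks = 32;

// Index of the i-th element belonging to warp `warp_id`, assuming the
// stride between a warp's elements equals the number of warps in the block.
constexpr std::size_t strided_element_index(
    std::size_t warp_id,
    std::size_t i,
    std::size_t num_warps,
    std::size_t base_offset = 0)
{
    return base_offset + warp_id + i * num_warps;
}

// Bank holding the element at `index` (4-byte elements assumed).
constexpr std::size_t bank_of(std::size_t index)
{
    return index % num_banks;
}

// Number of distinct banks touched by the first `count` elements of the
// area belonging to warp `warp_id`.
inline std::size_t distinct_banks_used(
    std::size_t warp_id,
    std::size_t count,
    std::size_t num_warps,
    std::size_t base_offset = 0)
{
    bool used[num_banks] = {};
    std::size_t distinct = 0;
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t bank =
            bank_of(strided_element_index(warp_id, i, num_warps, base_offset));
        if (!used[bank]) {
            used[bank] = true;
            ++distinct;
        }
    }
    return distinct;
}

// Sanity check: with 32 warps, all elements of one warp share a bank;
// with 8 warps, one warp's elements are spread over 4 banks.
inline void check_strided_layout()
{
    assert(distinct_banks_used(0, 32, 32) == 1);
    assert(distinct_banks_used(0, 32, 8) == 4);
}
```

In this model, the first elements of warps 0 through 7 of an 8-warp block land in banks 0 through 7 when base_offset is 0, while the 32 lanes of a single warp, each reading a different element of that warp's area, would share only 4 banks. With 32 warps, every element of a given warp's area maps to the same bank.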