cuda-kat
CUDA kernel author's tools
basic.cuh File Reference

Simpler / more basic utility code for working with shared memory, not involving any actual computation. More...

#include <kat/on_device/grid_info.cuh>
#include <kat/on_device/ptx.cuh>

Namespaces

 kat::shared_memory::dynamic::warp_specific
 

Typedefs

using kat::shared_memory::offset_t = int
 
using kat::shared_memory::size_t = unsigned
 

Functions

KAT_FD size_t kat::shared_memory::size ()
 Obtain the total size, in bytes, of the (per-block) shared memory for the running kernel (static plus dynamic). More...
 
KAT_FD size_t kat::shared_memory::static_::size ()
 Obtain the size in bytes of the (per-block) static shared memory for the running kernel. More...
 
template<typename T = unsigned char>
KAT_FD size_t kat::shared_memory::dynamic::size ()
 Obtain the size of the (per-block) dynamic shared memory for the running kernel. More...
 
template<typename T >
KAT_DEV T * kat::shared_memory::dynamic::proxy ()
 This gadget is necessary for using dynamically-sized shared memory in templated kernels. More...
 
template<typename T >
KAT_FD T * kat::shared_memory::dynamic::warp_specific::contiguous (unsigned num_elements_per_warp, offset_t base_offset=0)
 Accesses the calling thread's warp-specific dynamic shared memory, assuming the warps voluntarily divvy up the shared memory beyond some point amongst themselves into contiguous per-warp areas (see the sketch below). More...
 
template<typename T >
KAT_FD T * kat::shared_memory::dynamic::warp_specific::strided (offset_t base_offset=0)
 Accesses the calling thread's warp-specific dynamic shared memory, assuming the warps voluntarily divvy up the shared memory beyond some point amongst themselves using striding. More...
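
For orientation, a rough sketch (not taken from the library's documentation) of how contiguous() might be used: each warp obtains its own slice of the dynamic shared memory and zero-initializes it. The element type float, the per-warp element count, and a launch configuration that provides enough dynamic shared memory are all assumptions of the sketch.

__global__ void per_warp_scratch(unsigned elements_per_warp)
{
    // Pointer to this warp's own contiguous slice of the dynamic shared memory;
    // the launch is assumed to pass at least num_warps * elements_per_warp floats.
    float* warp_area =
        kat::shared_memory::dynamic::warp_specific::contiguous<float>(elements_per_warp);
    unsigned lane = threadIdx.x % warpSize;
    for (unsigned i = lane; i < elements_per_warp; i += warpSize) {
        warp_area[i] = 0.0f; // e.g. zero the warp's scratch area
    }
    __syncwarp();
    // ... per-warp work using warp_area ...
}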
 

Detailed Description

Simpler / more basic utility code for working with shared memory, not involving any actual computation.

Function Documentation

§ proxy()

template<typename T >
KAT_DEV T* kat::shared_memory::dynamic::proxy ( )

This gadget is necessary for using dynamically-sized shared memory in templated kernels, i.e. shared memory whose size is set by the launch parameters rather than being fixed at compile time.

Use of such memory requires an extern __shared__ array variable of unspecified size; however, the way nvcc works, you cannot declare two such variables of different types in your program, even if they are in different scopes. That means we would either need a different variable name for each type (which would lead us into preprocessor-macro hell), or we can just always use the same type and reinterpret it as the type we need, which is what this gadget does.

Note
All threads get the same address when calling this function, so you need to add different offsets for different threads if you want a warp-specific or thread-specific pointer.
See also https://stackoverflow.com/questions/27570552/
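
A minimal sketch of the pattern, assuming this header is included and the launch passes n * sizeof(T) bytes of dynamic shared memory; the kernel and variable names are illustrative:

template <typename T>
__global__ void stage_through_shared(const T* in, T* out, unsigned n)
{
    // Every thread gets the same base address; per-thread offsets are up to us.
    T* shmem = kat::shared_memory::dynamic::proxy<T>();
    if (threadIdx.x < n) {
        shmem[threadIdx.x] = in[threadIdx.x];
    }
    __syncthreads();
    if (threadIdx.x < n) {
        out[threadIdx.x] = shmem[threadIdx.x];
    }
}

// Hypothetical launch; the third launch-configuration argument is the dynamic
// shared memory size in bytes:
//   stage_through_shared<float><<<1, 256, 256 * sizeof(float)>>>(d_in, d_out, 256);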

§ size() [1/3]

KAT_FD size_t kat::shared_memory::size ( )

Obtain the total size, in bytes, of the (per-block) shared memory for the running kernel (static plus dynamic).

Note
This requires special-register access, which is not particularly cheap.
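
A minimal sketch of a use, with thread 0 of each block recording the figure once; the output array and its sizing (one entry per block) are illustrative assumptions:

__global__ void record_shared_memory_totals(kat::shared_memory::size_t* totals)
{
    if (threadIdx.x == 0) {
        // One special-register read per block; avoid calling this in a hot loop.
        totals[blockIdx.x] = kat::shared_memory::size();
    }
}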

§ size() [2/3]

KAT_FD size_t kat::shared_memory::static_::size ( )

Obtain the size in bytes of the (per-block) static shared memory for the running kernel.

Note
This requires special-register access, which is not particularly cheap.
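
A sketch contrasting the static figure with the total from kat::shared_memory::size(); the __shared__ buffer and the use of device-side printf are illustrative assumptions:

__global__ void print_shared_memory_breakdown()
{
    __shared__ int scratch[128];                     // contributes to the static figure
    scratch[threadIdx.x % 128] = (int) threadIdx.x;  // keep the buffer from being elided
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        auto static_size = kat::shared_memory::static_::size();
        auto total_size  = kat::shared_memory::size();
        printf("static: %u bytes, dynamic: %u bytes\n",
            (unsigned) static_size, (unsigned) (total_size - static_size));
    }
}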

§ size() [3/3]

template<typename T = unsigned char>
KAT_FD size_t kat::shared_memory::dynamic::size ( )

Obtain the size of the (per-block) dynamic shared memory for the running kernel.

Note
Without a template argument (i.e. with the default unsigned char), the returned size is in bytes.
This requires special-register access, which is not particularly cheap.
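
A sketch combining this with proxy(), under the assumption that an explicit element type T yields the figure in units of sizeof(T) rather than in bytes:

__global__ void zero_dynamic_shared_memory()
{
    float* shmem = kat::shared_memory::dynamic::proxy<float>();
    // Assumed to be the number of float elements; size<>() with the default
    // template argument would give the figure in bytes instead.
    auto num_elements = kat::shared_memory::dynamic::size<float>();
    for (unsigned i = threadIdx.x; i < num_elements; i += blockDim.x) {
        shmem[i] = 0.0f;
    }
}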