cuda-kat
CUDA kernel author's tools
Simpler / more basic utility code for working with shared memory, not involving any actual computation.
Namespaces | |
kat::shared_memory::dynamic::warp_specific | |
Typedefs | |
using | kat::shared_memory::offset_t = int |
using | kat::shared_memory::size_t = unsigned |
Functions | |
KAT_FD size_t | kat::shared_memory::size () |
Obtain the total size in bytes of the (per-block) shared memory for the running kernel - static + dynamic.
KAT_FD size_t | kat::shared_memory::static_::size () |
Obtain the size in bytes of the (per-block) static shared memory for the running kernel.
template<typename T = unsigned char> | |
KAT_FD size_t | kat::shared_memory::dynamic::size () |
Obtain the size of the (per-block) dynamic shared memory for the running kernel.
template<typename T > | |
KAT_DEV T * | kat::shared_memory::dynamic::proxy () |
This gadget is necessary for using dynamically-sized shared memory in templated kernels (i.e. shared memory whose size is set by the launch parameters rather than being fixed at compile time).
template<typename T > | |
KAT_FD T * | kat::shared_memory::dynamic::warp_specific::contiguous (unsigned num_elements_per_warp, offset_t base_offset=0) |
Accesses the calling thread's warp-specific dynamic shared memory - assuming the warps voluntarily divvy up the shared memory beyond some point amongst themselves into contiguous areas.
template<typename T > | |
KAT_FD T * | kat::shared_memory::dynamic::warp_specific::strided (offset_t base_offset=0) |
Accesses the calling thread's warp-specific dynamic shared memory - assuming the warps voluntarily divvy up the shared memory beyond some point amongst themselves, using striding.
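The two warp-specific layouts can be illustrated with a small sketch. It is not the library's implementation: shared_base, warp_index_in_block, warp_contiguous, warp_strided, mark_owners and run_mark_owners are hypothetical names, the block is assumed to be one-dimensional, and the strided layout is assumed to mean that a warp steps through its area with a stride equal to the number of warps in the block.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical untyped view of the block's dynamic shared memory.
template <typename T>
__device__ T* shared_base()
{
    extern __shared__ __align__(16) unsigned char raw_dynamic_shared[];
    return reinterpret_cast<T*>(raw_dynamic_shared);
}

// Hypothetical helper: index of the calling thread's warp in a 1D block.
__device__ unsigned warp_index_in_block() { return threadIdx.x / warpSize; }

// Contiguous layout: warp w owns elements [w * n, (w + 1) * n) past base_offset.
template <typename T>
__device__ T* warp_contiguous(unsigned num_elements_per_warp, int base_offset = 0)
{
    return shared_base<T>() + base_offset
        + num_elements_per_warp * warp_index_in_block();
}

// Strided layout: warp w owns elements w, w + W, w + 2W, ... past base_offset,
// where W is the number of warps in the block. The caller applies the stride.
template <typename T>
__device__ T* warp_strided(int base_offset = 0)
{
    return shared_base<T>() + base_offset + warp_index_in_block();
}

// The first lane of each warp stamps its warp index on every element it owns,
// once in a contiguous region and once in a strided region placed after it.
__global__ void mark_owners(int* contiguous_owners, int* strided_owners,
                            unsigned elements_per_warp)
{
    unsigned num_warps = blockDim.x / warpSize;
    unsigned total = num_warps * elements_per_warp;
    int* contiguous_area = warp_contiguous<int>(elements_per_warp);
    int* strided_area = warp_strided<int>(total);
    if (threadIdx.x % warpSize == 0) {
        for (unsigned k = 0; k < elements_per_warp; ++k) {
            contiguous_area[k] = warp_index_in_block();
            strided_area[k * num_warps] = warp_index_in_block();
        }
    }
    __syncthreads();
    int* all = shared_base<int>();
    for (unsigned i = threadIdx.x; i < total; i += blockDim.x) {
        contiguous_owners[i] = all[i];
        strided_owners[i] = all[total + i];
    }
}

struct ownership_maps { std::vector<int> contiguous, strided; };

// Host helper: launches one block and copies both ownership maps back.
ownership_maps run_mark_owners(unsigned num_warps, unsigned elements_per_warp)
{
    unsigned total = num_warps * elements_per_warp;
    size_t num_bytes = total * sizeof(int);
    ownership_maps result { std::vector<int>(total, -1), std::vector<int>(total, -1) };
    int* device_contiguous = nullptr;
    int* device_strided = nullptr;
    cudaMalloc(&device_contiguous, num_bytes);
    cudaMalloc(&device_strided, num_bytes);
    // 32 threads per warp; dynamic shared memory holds both regions.
    mark_owners<<<1, num_warps * 32, 2 * num_bytes>>>(
        device_contiguous, device_strided, elements_per_warp);
    cudaMemcpy(result.contiguous.data(), device_contiguous, num_bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(result.strided.data(), device_strided, num_bytes, cudaMemcpyDeviceToHost);
    cudaFree(device_contiguous);
    cudaFree(device_strided);
    return result;
}
```

With two warps and three elements per warp, run_mark_owners(2, 3) yields the contiguous map 0 0 0 1 1 1 and the strided map 0 1 0 1 0 1. The contiguous variant needs to know the per-warp element count to find where a warp's chunk starts, which is consistent with only that function taking a num_elements_per_warp parameter.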
KAT_DEV T* kat::shared_memory::dynamic::proxy ()
This gadget is necessary for using dynamically-sized shared memory in templated kernels (i.e. shared memory whose size is set by the launch parameters rather than being fixed at compile time). Use of such memory requires an extern __shared__ array variable of unspecified size; however, because of the way nvcc works, you cannot declare two such variables of different types in your program, even if they are in different scopes. That means we either need a different variable name for each type (which would lead us into preprocessor macro hell), or we use a single type throughout and reinterpret the memory according to the type we want, which is what this gadget does.
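The reinterpretation trick described above can be sketched as follows. This is a minimal illustration rather than the library's source: dynamic_shared_proxy, raw_dynamic_shared, reverse_in_block and reverses_correctly are hypothetical names, and the fixed 16-byte alignment is an assumption chosen to suit common element types.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for a proxy<T>() gadget: one untyped extern __shared__
// array, declared identically in every instantiation, reinterpreted as T.
template <typename T>
__device__ T* dynamic_shared_proxy()
{
    extern __shared__ __align__(16) unsigned char raw_dynamic_shared[];
    return reinterpret_cast<T*>(raw_dynamic_shared);
}

// A templated kernel: each block reverses its slice of data through
// dynamic shared memory obtained via the proxy.
template <typename T>
__global__ void reverse_in_block(T* data)
{
    T* scratch = dynamic_shared_proxy<T>();
    unsigned base = blockIdx.x * blockDim.x;
    scratch[threadIdx.x] = data[base + threadIdx.x];
    __syncthreads();
    data[base + threadIdx.x] = scratch[blockDim.x - 1 - threadIdx.x];
}

// Host helper: fills 0..block_size-1, runs one block, checks the reversal.
template <typename T>
bool reverses_correctly(unsigned block_size)
{
    std::vector<T> host(block_size);
    for (unsigned i = 0; i < block_size; ++i) { host[i] = static_cast<T>(i); }

    T* device = nullptr;
    size_t num_bytes = block_size * sizeof(T);
    if (cudaMalloc(&device, num_bytes) != cudaSuccess) { return false; }
    cudaMemcpy(device, host.data(), num_bytes, cudaMemcpyHostToDevice);

    // The third launch parameter is the dynamic shared memory size in bytes.
    reverse_in_block<T><<<1, block_size, num_bytes>>>(device);

    cudaMemcpy(host.data(), device, num_bytes, cudaMemcpyDeviceToHost);
    cudaFree(device);

    for (unsigned i = 0; i < block_size; ++i) {
        if (host[i] != static_cast<T>(block_size - 1 - i)) { return false; }
    }
    return true;
}
```

Calling reverses_correctly<float>(64) and reverses_correctly<double>(64) in the same program instantiates the kernel for two element types. Both instantiations compile because they share the single unsigned char declaration instead of each declaring a differently-typed extern __shared__ array.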
KAT_FD size_t kat::shared_memory::size ()
Obtain the total size in bytes of the (per-block) shared memory for the running kernel - static + dynamic.
KAT_FD size_t kat::shared_memory::static_::size ()
Obtain the size in bytes of the (per-block) static shared memory for the running kernel.
KAT_FD size_t kat::shared_memory::dynamic::size ()
Obtain the size of the (per-block) dynamic shared memory for the running kernel.
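One way a kernel can obtain these sizes is by reading the PTX special registers %total_smem_size and %dynamic_smem_size. The sketch below assumes that approach; whether the library does the same is not stated here. All function names are hypothetical, and dividing by sizeof(T) is an assumption about what the template parameter of dynamic::size is for. Note that %total_smem_size may be rounded up to the hardware's allocation granularity, so the derived static figure is an upper bound rather than an exact count.

```cuda
#include <array>
#include <cuda_runtime.h>

// Bytes of dynamic shared memory requested at launch.
__device__ unsigned dynamic_shared_size_in_bytes()
{
    unsigned result;
    asm("mov.u32 %0, %%dynamic_smem_size;" : "=r"(result));
    return result;
}

// Total shared memory allocated for the block (static + dynamic),
// possibly rounded up to the allocation unit of the target architecture.
__device__ unsigned total_shared_size_in_bytes()
{
    unsigned result;
    asm("mov.u32 %0, %%total_smem_size;" : "=r"(result));
    return result;
}

// Assumed meaning of the template parameter: report the size in units of T.
template <typename T = unsigned char>
__device__ unsigned dynamic_shared_size()
{
    return dynamic_shared_size_in_bytes() / static_cast<unsigned>(sizeof(T));
}

__global__ void report_sizes(unsigned* sizes)
{
    __shared__ unsigned static_buffer[32];  // 128 bytes of static shared memory
    static_buffer[threadIdx.x] = threadIdx.x;
    __syncthreads();
    if (threadIdx.x == 0) {
        unsigned dynamic_bytes = dynamic_shared_size_in_bytes();
        unsigned total_bytes = total_shared_size_in_bytes();
        sizes[0] = total_bytes;                  // static + dynamic
        sizes[1] = total_bytes - dynamic_bytes;  // static (upper bound)
        sizes[2] = dynamic_bytes;                // dynamic, in bytes
        sizes[3] = dynamic_shared_size<float>(); // dynamic, in floats
        sizes[4] = static_buffer[31];            // keeps the static buffer in use
    }
}

// Host helper: launches one 32-thread block with the given dynamic size.
std::array<unsigned, 5> query_sizes(unsigned dynamic_bytes)
{
    std::array<unsigned, 5> host_sizes {};
    unsigned* device_sizes = nullptr;
    cudaMalloc(&device_sizes, sizeof(host_sizes));
    report_sizes<<<1, 32, dynamic_bytes>>>(device_sizes);
    cudaMemcpy(host_sizes.data(), device_sizes, sizeof(host_sizes), cudaMemcpyDeviceToHost);
    cudaFree(device_sizes);
    return host_sizes;
}
```

For a launch with 1024 bytes of dynamic shared memory, the dynamic entries report 1024 bytes and 256 floats; the total and static entries depend on the architecture's allocation granularity, so the sketch makes no claim about their exact values.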