cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
cuda::memory::shared Namespace Reference

A memory space whose contents are shared by all threads in a CUDA kernel block, but which is private to each block. More...

Typedefs

using size_t = unsigned
 Each physical core ("streaming multiprocessor") on an NVIDIA GPU has a space of shared memory (see this blog entry). More...
 

Detailed Description

A memory space whose contents are shared by all threads in a CUDA kernel block, but which is private to each block.

Shared memory resides in a dedicated area of each of a GPU's physical cores (streaming multiprocessors, or SMs, in NVIDIA parlance) - the same area used for the L1 cache. One may therefore think of it as L1-cache-like memory which holds arbitrary data rather than a cached copy of global memory locations. It is usable only in device-side (= kernel) code, but control and inspection of its size are part of the host-side CUDA API functionality.
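As a minimal sketch of the mechanism described above - in plain CUDA rather than this library's wrappers, with all names illustrative - a kernel can declare a dynamically-sized shared memory region with `extern __shared__`, and the host sets that region's size per launch via the third kernel-launch configuration parameter:

```cuda
// Hypothetical kernel staging data through dynamically-sized shared memory.
// The region's size is not fixed at compile time; it is supplied at launch.
__global__ void scale(float* data, float factor, unsigned n)
{
    extern __shared__ float tile[];   // size chosen by the host at launch time
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        tile[threadIdx.x] = data[i];  // stage the element in shared memory
        data[i] = tile[threadIdx.x] * factor;
    }
}

// Host-side launch: the third <<<...>>> argument is the dynamic shared
// memory size in bytes, one float per thread in this sketch:
//
//   scale<<<num_blocks, threads_per_block,
//           threads_per_block * sizeof(float)>>>(dev_data, 2.0f, n);
```

Each block gets its own instance of `tile`; no block can see another block's copy, which is exactly the per-block privacy described in this namespace's documentation.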

Typedef Documentation

◆ size_t

using cuda::memory::shared::size_t = typedef unsigned

Each physical core ("streaming multiprocessor") on an NVIDIA GPU has a space of shared memory (see this blog entry).

This type is large enough to hold the size of an SM's shared memory space.

Note
actually, uint16_t is usually large enough to hold the shared memory size (as of the Volta/Turing architectures), but there are exceptions to this rule, so we have to go with the next size up.
Todo:
consider using uint32_t.