Functions
template<typename T >
KAT_FD T *	contiguous (unsigned num_elements_per_warp, offset_t base_offset=0)
	Accesses the calling thread's warp-specific dynamic shared memory, assuming the warps voluntarily divide the shared memory beyond some point among themselves into contiguous areas.

template<typename T >
KAT_FD T *	strided (offset_t base_offset=0)
	Accesses the calling thread's warp-specific dynamic shared memory, assuming the warps voluntarily divide the shared memory beyond some point among themselves using striding.

Detailed Description

Note: This namespace's contents are only relevant for linear grids.

Function Documentation

§ contiguous()

template<typename T >

KAT_FD T* kat::shared_memory::dynamic::warp_specific::contiguous ( unsigned num_elements_per_warp, offset_t base_offset = 0 )

Accesses the calling thread's warp-specific dynamic shared memory, assuming the warps voluntarily divide the shared memory beyond some point among themselves into contiguous areas.

The partitioning pattern is for each warp to get a contiguous sequence of num_elements_per_warp elements in memory, starting base_offset elements into the block's shared memory. Compare strided(), in which each warp's elements lie at a fixed stride rather than next to each other.

Template Parameters

T	the element type assumed for all shared memory (or at least for alignment and for the warp-specific shared memory)

Parameters

num_elements_per_warp	Size in elements of the area agreed to be specific to each warp
base_offset	How far into the block's overall shared memory to start partitioning the memory into warp-specific sequences

Returns: Address of the first warp-specific element in shared memory
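
As an illustration of a contiguous partition, the offset arithmetic can be modelled on the host as below. This is a minimal sketch and not the library's implementation: the names `contiguous_warp_offset` and `contiguous_element_offset` are hypothetical, offsets are counted in elements of T, and `warp_index` stands for the calling warp's index within its block.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side model (not part of the library).
// Offset, in elements, of the first element owned by warp `warp_index`,
// assuming each warp owns one contiguous run of `num_elements_per_warp`
// elements and the runs start `base_offset` elements into shared memory.
constexpr std::size_t contiguous_warp_offset(
    std::size_t warp_index,
    std::size_t num_elements_per_warp,
    std::size_t base_offset = 0)
{
    return base_offset + warp_index * num_elements_per_warp;
}

// Offset, in elements, of the i-th element inside that warp's run.
// The caller is expected to keep i < num_elements_per_warp.
constexpr std::size_t contiguous_element_offset(
    std::size_t warp_index,
    std::size_t i,
    std::size_t num_elements_per_warp,
    std::size_t base_offset = 0)
{
    return contiguous_warp_offset(warp_index, num_elements_per_warp, base_offset) + i;
}
```

Under these assumptions, with 16 elements per warp and a base offset of 8, warp 0 starts at element 8 and warp 3 starts at element 8 + 3 * 16 = 56.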

§ strided()

template<typename T >

KAT_FD T* kat::shared_memory::dynamic::warp_specific::strided ( offset_t base_offset = 0 )

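Accesses the calling thread's warp-specific dynamic shared memory, assuming the warps voluntarily divide the shared memory beyond some point among themselves using striding.

The partitioning pattern is for each warp to get elements at a fixed stride rather than a contiguous set of elements. This pattern ensures that different warps are never in a bank conflict when accessing their "private" shared memory, provided the number of warps divides 32 or is a multiple of 32. The downside of this pattern is that different lanes accessing different elements in a warp's shared memory will likely be in bank conflict (and will certainly be in conflict if there are 32 warps).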
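
To make the strided layout and its bank behaviour concrete, here is a minimal host-side sketch. It is not the library's implementation and rests on several assumptions: the stride equals the number of warps in the block, shared memory has 32 banks, elements are 4 bytes wide, and the partitioned area starts on a bank-0 boundary. The names `strided_element_offset`, `bank_of` and `assumed_num_banks` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 32 shared-memory banks, each one 4-byte word wide.
constexpr std::size_t assumed_num_banks = 32;

// Hypothetical host-side model (not part of the library).
// Offset, in elements, of the i-th element owned by warp `warp_index`
// when consecutive elements of one warp lie `num_warps` elements apart.
constexpr std::size_t strided_element_offset(
    std::size_t warp_index,
    std::size_t i,
    std::size_t num_warps,
    std::size_t base_offset = 0)
{
    return base_offset + warp_index + i * num_warps;
}

// Bank holding the 4-byte element at `element_offset`, under the
// assumptions stated above.
constexpr std::size_t bank_of(std::size_t element_offset)
{
    return element_offset % assumed_num_banks;
}
```

With 32 warps, warp 2's elements sit at offsets 2, 34, 66 and so on, which all map to bank 2 in this model, so lanes of that warp touching different elements would conflict. With 4 warps, warp 1's first three elements sit at offsets 1, 5 and 9, which map to three different banks.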