rocPRIM
block_store< T, BlockSizeX, ItemsPerThread, Method, BlockSizeY, BlockSizeZ > Class Template Reference

The block_store class is a block-level parallel primitive that provides methods for storing an arrangement of items into a blocked/striped arrangement on continuous memory.

#include <block_store.hpp>

Public Types

using storage_type = typename ::rocprim::detail::empty_storage_type
 Struct used to allocate the temporary memory that is required for thread communication during operations provided by the related parallel primitive.
 

Public Member Functions

template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void store (OutputIterator block_output, T(&items)[ItemsPerThread])
 Stores an arrangement of items from across the thread block into an arrangement on continuous memory.
 
template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void store (OutputIterator block_output, T(&items)[ItemsPerThread], unsigned int valid)
 Stores an arrangement of items from across the thread block into an arrangement on continuous memory, which is guarded by range valid.
 
template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void store (OutputIterator block_output, T(&items)[ItemsPerThread], storage_type &storage)
 Stores an arrangement of items from across the thread block into an arrangement on continuous memory, using temporary storage.
 
template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void store (OutputIterator block_output, T(&items)[ItemsPerThread], unsigned int valid, storage_type &storage)
 Stores an arrangement of items from across the thread block into an arrangement on continuous memory, which is guarded by range valid, using temporary storage.
 

Detailed Description

template<class T, unsigned int BlockSizeX, unsigned int ItemsPerThread, block_store_method Method = block_store_method::block_store_direct, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
class block_store< T, BlockSizeX, ItemsPerThread, Method, BlockSizeY, BlockSizeZ >

The block_store class is a block-level parallel primitive that provides methods for storing an arrangement of items into a blocked/striped arrangement on continuous memory.
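The difference between a blocked and a striped arrangement can be illustrated with a host-side sketch. This is not rocPRIM code: each "thread" is emulated by a loop iteration, and the helper names store_blocked and store_striped are hypothetical. It assumes the usual definitions, where a blocked arrangement gives each thread a contiguous run of ItemsPerThread outputs and a striped arrangement interleaves the threads with a stride equal to the block size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation only (hypothetical helpers, not part of rocPRIM).
// items_per_thread[t][i] is item i held by emulated thread t.

// Blocked arrangement: thread t writes the contiguous range
// [t * ItemsPerThread, (t + 1) * ItemsPerThread).
std::vector<int> store_blocked(const std::vector<std::vector<int>>& items_per_thread)
{
    const std::size_t block_size = items_per_thread.size();
    assert(block_size > 0);
    const std::size_t ipt = items_per_thread[0].size();
    std::vector<int> out(block_size * ipt);
    for (std::size_t t = 0; t < block_size; ++t)
    {
        assert(items_per_thread[t].size() == ipt);
        for (std::size_t i = 0; i < ipt; ++i)
            out[t * ipt + i] = items_per_thread[t][i];
    }
    return out;
}

// Striped arrangement: item i of thread t lands at i * BlockSize + t,
// so neighbouring threads write neighbouring addresses.
std::vector<int> store_striped(const std::vector<std::vector<int>>& items_per_thread)
{
    const std::size_t block_size = items_per_thread.size();
    assert(block_size > 0);
    const std::size_t ipt = items_per_thread[0].size();
    std::vector<int> out(block_size * ipt);
    for (std::size_t t = 0; t < block_size; ++t)
    {
        assert(items_per_thread[t].size() == ipt);
        for (std::size_t i = 0; i < ipt; ++i)
            out[i * block_size + t] = items_per_thread[t][i];
    }
    return out;
}
```

With three emulated threads holding {0, 1}, {2, 3} and {4, 5}, the blocked mapping yields 0 1 2 3 4 5, while the striped mapping yields 0 2 4 1 3 5.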

Template Parameters
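T- the input/output type.
BlockSizeX- the number of threads in the block's x dimension.
ItemsPerThread- the number of items to be processed by each thread.
Method- the method used to store data (defaults to block_store_method::block_store_direct).
BlockSizeY- the number of threads in the block's y dimension (defaults to 1).
BlockSizeZ- the number of threads in the block's z dimension (defaults to 1).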
Overview
Example:

In the example, the store operation is performed on a block of 128 threads, using type int and 8 items per thread.

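__global__ void kernel(int * output)
{
    // Each block writes its own tile of 128 threads * 8 items.
    const int offset = blockIdx.x * 128 * 8;
    int items[8];
    // store_method stands for one of the block_store_method values.
    rocprim::block_store<int, 128, 8, store_method> blockstore;
    blockstore.store(output + offset, items);
    ...
}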

Member Typedef Documentation

◆ storage_type

template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, block_store_method Method = block_store_method::block_store_direct, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
using block_store< T, BlockSizeX, ItemsPerThread, Method, BlockSizeY, BlockSizeZ >::storage_type = typename ::rocprim::detail::empty_storage_type

Struct used to allocate the temporary memory that is required for thread communication during operations provided by the related parallel primitive.

Depending on the implementation, the operations exposed by the parallel primitive may require temporary storage for thread communication. The storage should be allocated using the __shared__ keyword. It can be aliased to externally allocated memory, or be part of a union with other storage types to increase shared memory reusability.
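The union technique mentioned above can be sketched in plain C++. The two structs below are hypothetical stand-ins for the storage_type of two different primitives (their names and sizes are assumptions made for illustration; for the default store method shown on this page, storage_type is an empty type). In device code the union object would be declared with __shared__.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the storage_type of two primitives.
struct scan_storage_sketch  { int   buffer[128]; };
struct store_storage_sketch { float buffer[64];  };

// Phases that never overlap in time can share the same bytes.
// In a kernel this would be: __shared__ shared_storage_sketch storage;
union shared_storage_sketch
{
    scan_storage_sketch  scan;
    store_storage_sketch store;
};

// Bytes saved compared with declaring both storages separately.
std::size_t bytes_saved_by_union()
{
    const std::size_t separate = sizeof(scan_storage_sketch) + sizeof(store_storage_sketch);
    assert(sizeof(shared_storage_sketch) <= separate);
    return separate - sizeof(shared_storage_sketch);
}
```

The union occupies only as much memory as its largest member, which is why a synchronization barrier is needed before one phase's storage is repurposed for the next.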

Member Function Documentation

◆ store() [1/4]

template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, block_store_method Method = block_store_method::block_store_direct, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_store< T, BlockSizeX, ItemsPerThread, Method, BlockSizeY, BlockSizeZ >::store ( OutputIterator  block_output,
T(&)  items[ItemsPerThread] 
)
inline

Stores an arrangement of items from across the thread block into an arrangement on continuous memory.

Template Parameters
OutputIterator- [inferred] an iterator type for output (can be a simple pointer).
Parameters
[out]block_output- the output iterator from the thread block to store to.
[in]items- array that data is read from.
Overview
  • The type T must be such that an object of type T can be assigned to a dereferenced OutputIterator, with an implicit conversion if needed.

◆ store() [2/4]

template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, block_store_method Method = block_store_method::block_store_direct, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_store< T, BlockSizeX, ItemsPerThread, Method, BlockSizeY, BlockSizeZ >::store ( OutputIterator  block_output,
T(&)  items[ItemsPerThread],
unsigned int  valid 
)
inline

Stores an arrangement of items from across the thread block into an arrangement on continuous memory, which is guarded by range valid.

Template Parameters
OutputIterator- [inferred] an iterator type for output (can be a simple pointer).
Parameters
[out]block_output- the output iterator from the thread block to store to.
[in]items- array that data is read from.
[in]valid- the number of valid items to store; output positions at or beyond valid are not written.
Overview
  • The type T must be such that an object of type T can be assigned to a dereferenced OutputIterator, with an implicit conversion if needed.
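The effect of the valid guard can be illustrated with a host-side sketch. This emulates a blocked arrangement with ordinary loops and is not rocPRIM's implementation; guarded_store_blocked is a hypothetical name. It assumes the guard is applied to the position within the block's output range.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation of a guarded blocked store (hypothetical helper).
// Only output positions with index < valid are written; the remaining
// elements of `out` keep their previous contents.
void guarded_store_blocked(std::vector<int>& out,
                           const std::vector<std::vector<int>>& items_per_thread,
                           unsigned int valid)
{
    const std::size_t block_size = items_per_thread.size();
    for (std::size_t t = 0; t < block_size; ++t)
    {
        const std::size_t ipt = items_per_thread[t].size();
        for (std::size_t i = 0; i < ipt; ++i)
        {
            const std::size_t index = t * ipt + i;
            if (index < valid)
            {
                assert(index < out.size());
                out[index] = items_per_thread[t][i];
            }
        }
    }
}
```

For three emulated threads holding {10, 11}, {12, 13} and {14, 15}, a call with valid = 4 writes the first four values and leaves the last two output slots untouched.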

◆ store() [3/4]

template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, block_store_method Method = block_store_method::block_store_direct, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_store< T, BlockSizeX, ItemsPerThread, Method, BlockSizeY, BlockSizeZ >::store ( OutputIterator  block_output,
T(&)  items[ItemsPerThread],
storage_type &  storage 
)
inline

Stores an arrangement of items from across the thread block into an arrangement on continuous memory, using temporary storage.

Template Parameters
OutputIterator- [inferred] an iterator type for output (can be a simple pointer).
Parameters
[out]block_output- the output iterator from the thread block to store to.
[in]items- array that data is read from.
[in]storage- temporary storage for outputs.
Overview
  • The type T must be such that an object of type T can be assigned to a dereferenced OutputIterator, with an implicit conversion if needed.
Storage reuse
A synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Example.
__global__ void kernel(...)
{
    int items[8];
    using block_store_int = rocprim::block_store<int, 128, 8>;
    block_store_int bstore;
    // Temporary storage must be allocated in shared memory.
    __shared__ typename block_store_int::storage_type storage;
    bstore.store(..., items, storage);
    ...
}

◆ store() [4/4]

template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, block_store_method Method = block_store_method::block_store_direct, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_store< T, BlockSizeX, ItemsPerThread, Method, BlockSizeY, BlockSizeZ >::store ( OutputIterator  block_output,
T(&)  items[ItemsPerThread],
unsigned int  valid,
storage_type &  storage 
)
inline

Stores an arrangement of items from across the thread block into an arrangement on continuous memory, which is guarded by range valid, using temporary storage.

Template Parameters
OutputIterator- [inferred] an iterator type for output (can be a simple pointer).
Parameters
[out]block_output- the output iterator from the thread block to store to.
[in]items- array that data is read from.
[in]valid- the number of valid items to store; output positions at or beyond valid are not written.
[in]storage- temporary storage for outputs.
Overview
  • The type T must be such that an object of type T can be assigned to a dereferenced OutputIterator, with an implicit conversion if needed.
Storage reuse
A synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Example.
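__global__ void kernel(...)
{
    int items[8];
    using block_store_int = rocprim::block_store<int, 128, 8>;
    block_store_int bstore;
    // Temporary storage must be allocated in shared memory.
    __shared__ typename block_store_int::storage_type storage;
    // valid is the number of items this block is allowed to write.
    bstore.store(..., items, valid, storage);
    ...
}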
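The example above leaves the value of valid unspecified. One common way to derive it, sketched here on the host under the assumption that each block owns a tile of BlockSize * ItemsPerThread consecutive elements, is to clamp the tile to the end of the array. The tile_range struct and tile_for_block helper are hypothetical and not part of rocPRIM.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper type: where a block's tile starts and how many of
// its items fall inside the array.
struct tile_range
{
    std::size_t  offset;
    unsigned int valid;
};

// Computes the tile of block `block_id` for an array of `size` elements.
// The last block may get a partial tile; blocks past the end get valid == 0.
tile_range tile_for_block(std::size_t  block_id,
                          unsigned int block_size,
                          unsigned int items_per_thread,
                          std::size_t  size)
{
    assert(block_size > 0 && items_per_thread > 0);
    const std::size_t items_per_block
        = static_cast<std::size_t>(block_size) * items_per_thread;
    const std::size_t offset    = block_id * items_per_block;
    const std::size_t remaining = offset < size ? size - offset : 0;
    return {offset, static_cast<unsigned int>(std::min(remaining, items_per_block))};
}
```

For 128 threads, 8 items per thread and an array of 1500 elements, block 0 gets a full tile of 1024 items and block 1 gets the remaining 476; in a kernel, the offset would be added to the output pointer and valid passed to the guarded store overload.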
