The block_store class is a block-level parallel primitive which provides methods for storing an arrangement of items into a blocked/striped arrangement on continuous memory.
#include <block_store.hpp>
| using | storage_type = typename ::rocprim::detail::empty_storage_type |
| | Struct used to allocate temporary memory that is required for thread communication during operations provided by the related parallel primitive. |
| template<class OutputIterator > |
| ROCPRIM_DEVICE ROCPRIM_INLINE void | store (OutputIterator block_output, T(&items)[ItemsPerThread]) |
| | Stores an arrangement of items from across the thread block into an arrangement on continuous memory. |
| template<class OutputIterator > |
| ROCPRIM_DEVICE ROCPRIM_INLINE void | store (OutputIterator block_output, T(&items)[ItemsPerThread], unsigned int valid) |
| | Stores an arrangement of items from across the thread block into an arrangement on continuous memory, which is guarded by range valid. |
| template<class OutputIterator > |
| ROCPRIM_DEVICE ROCPRIM_INLINE void | store (OutputIterator block_output, T(&items)[ItemsPerThread], storage_type &storage) |
| | Stores an arrangement of items from across the thread block into an arrangement on continuous memory, using temporary storage. |
| template<class OutputIterator > |
| ROCPRIM_DEVICE ROCPRIM_INLINE void | store (OutputIterator block_output, T(&items)[ItemsPerThread], unsigned int valid, storage_type &storage) |
| | Stores an arrangement of items from across the thread block into an arrangement on continuous memory, which is guarded by range valid, using temporary storage. |
template<class T, unsigned int BlockSizeX, unsigned int ItemsPerThread, block_store_method Method = block_store_method::block_store_direct, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
class block_store< T, BlockSizeX, ItemsPerThread, Method, BlockSizeY, BlockSizeZ >
The block_store class is a block-level parallel primitive which provides methods for storing an arrangement of items into a blocked/striped arrangement on continuous memory.
- Template Parameters
-
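| T | - the input/output type, that is, the type of the items being stored. |
| BlockSizeX | - the number of threads in a block in the X dimension (BlockSizeY and BlockSizeZ default to 1). |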
| ItemsPerThread | - the number of items to be processed by each thread. |
| Method | - the method to store data. |
- Overview
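- The block_store class has a number of different methods to store data, selected through the Method template parameter (block_store_method::block_store_direct by default).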
- Example:
In the example, the store operation is performed on a block of 128 threads, using type int and 8 items per thread.
__global__ void kernel(int * output)
{
    // Each block writes its own range of 128 * 8 consecutive items.
    const int offset = blockIdx.x * 128 * 8;
    int items[8];
    // store_method stands for one of the block_store_method values.
    rocprim::block_store<int, 128, 8, store_method> blockstore;
    blockstore.store(output + offset, items);
    ...
}
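To make the blocked and striped arrangements mentioned above concrete, the following host-side sketch models where each thread's items end up in the output. It is not rocPRIM code: `thread_items`, `store_blocked_model` and `store_striped_model` are hypothetical names, and the index formulas assume the usual definitions (blocked: each thread owns a consecutive range; striped: consecutive threads write consecutive positions).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only; none of these names are part of rocPRIM.
// items[t][i] is the i-th item held by simulated thread t.
using thread_items = std::vector<std::vector<int>>;

// Blocked arrangement (assumed definition): thread t writes its items to the
// consecutive range [t * items_per_thread, (t + 1) * items_per_thread).
std::vector<int> store_blocked_model(const thread_items& items)
{
    assert(!items.empty());
    const std::size_t block_size       = items.size();
    const std::size_t items_per_thread = items[0].size();
    std::vector<int> out(block_size * items_per_thread);
    for (std::size_t t = 0; t < block_size; ++t)
        for (std::size_t i = 0; i < items_per_thread; ++i)
            out[t * items_per_thread + i] = items[t][i];
    return out;
}

// Striped arrangement (assumed definition): item i of thread t is written to
// position i * block_size + t, so neighbouring threads touch neighbouring
// memory locations.
std::vector<int> store_striped_model(const thread_items& items)
{
    assert(!items.empty());
    const std::size_t block_size       = items.size();
    const std::size_t items_per_thread = items[0].size();
    std::vector<int> out(block_size * items_per_thread);
    for (std::size_t t = 0; t < block_size; ++t)
        for (std::size_t i = 0; i < items_per_thread; ++i)
            out[i * block_size + t] = items[t][i];
    return out;
}
```

With three simulated threads holding {0, 1}, {2, 3} and {4, 5}, the blocked model yields 0 1 2 3 4 5 and the striped model yields 0 2 4 1 3 5.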
◆ storage_type
template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, block_store_method Method = block_store_method::block_store_direct, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
| using block_store< T, BlockSizeX, ItemsPerThread, Method, BlockSizeY, BlockSizeZ >::storage_type = typename ::rocprim::detail::empty_storage_type |
Struct used to allocate temporary memory that is required for thread communication during operations provided by the related parallel primitive.
Depending on the implementation, the operations exposed by the parallel primitive may require temporary storage for thread communication. The storage should be allocated using the __shared__ keyword. It can be aliased to externally allocated memory, or be part of a union with other storage types to increase shared memory reusability.
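The union technique mentioned above can be sketched on the host as follows. The two structs are hypothetical stand-ins for the storage types of two block primitives, and their sizes are chosen purely for illustration (the alias documented here resolves to an empty type). In a kernel the union would be declared with `__shared__`, with a synchronization barrier between the use of one member and the next.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the storage_type of two block-level primitives.
struct load_storage_model  { int buffer[128]; };
struct store_storage_model { int buffer[256]; };

// A union occupies only as much memory as its largest member, so the two
// primitives share one allocation instead of each getting their own.
union shared_storage_model
{
    load_storage_model  load;
    store_storage_model store;
};

// The shared allocation is as large as the bigger member...
static_assert(sizeof(shared_storage_model) == sizeof(store_storage_model),
              "union is sized by its largest member");
// ...and smaller than two separate allocations would be.
static_assert(sizeof(shared_storage_model)
                  < sizeof(load_storage_model) + sizeof(store_storage_model),
              "union saves memory compared to separate storage");
```

The trade-off is that only one member may be in use at a time, which is why the members must not be accessed concurrently by different phases of the kernel.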
◆ store() [1/4]
template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, block_store_method Method = block_store_method::block_store_direct, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class OutputIterator >
| ROCPRIM_DEVICE ROCPRIM_INLINE void block_store< T, BlockSizeX, ItemsPerThread, Method, BlockSizeY, BlockSizeZ >::store (OutputIterator block_output, T(&items)[ItemsPerThread]) | inline |
Stores an arrangement of items from across the thread block into an arrangement on continuous memory.
- Template Parameters
-
| OutputIterator | - [inferred] an iterator type for output (can be a simple pointer). |
- Parameters
-
| [out] | block_output | - the output iterator from the thread block to store to. |
| [in] | items | - array that data is read from. |
- Overview
- The type T must be such that an object of type T can be implicitly converted to the type obtained by dereferencing an object of type OutputIterator.
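The type requirement can be illustrated with a single-thread host model of a store. This is only a sketch: `store_model` is a hypothetical function that mirrors the shape of the `store` signature (an output iterator plus a reference to an array of `ItemsPerThread` items), not the rocPRIM implementation. Each item is assigned through the output iterator, so `T` has to convert implicitly to the iterator's element type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical single-thread model: every element of items is assigned
// through block_output, which requires an implicit conversion from T.
template<class T, std::size_t ItemsPerThread, class OutputIterator>
void store_model(OutputIterator block_output, T (&items)[ItemsPerThread])
{
    for (std::size_t i = 0; i < ItemsPerThread; ++i)
        block_output[i] = items[i];
}

// Example: int items stored through a long long pointer. This compiles
// because int converts implicitly to long long.
inline void store_model_example()
{
    int       items[4] = {1, 2, 3, 4};
    long long out[4]   = {0, 0, 0, 0};
    store_model(out, items); // T = int, OutputIterator = long long*
    assert(out[0] == 1 && out[3] == 4);
}
```

As with the documented signature, `T` and `ItemsPerThread` are deduced from the array reference and `OutputIterator` from the first argument.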
◆ store() [2/4]
template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, block_store_method Method = block_store_method::block_store_direct, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class OutputIterator >
| ROCPRIM_DEVICE ROCPRIM_INLINE void block_store< T, BlockSizeX, ItemsPerThread, Method, BlockSizeY, BlockSizeZ >::store (OutputIterator block_output, T(&items)[ItemsPerThread], unsigned int valid) | inline |
Stores an arrangement of items from across the thread block into an arrangement on continuous memory, which is guarded by range valid.
- Template Parameters
-
| OutputIterator | - [inferred] an iterator type for output (can be a simple pointer). |
- Parameters
-
| [out] | block_output | - the output iterator from the thread block to store to. |
| [in] | items | - array that data is read from. |
| [in] | valid | - maximum range of valid items to store. |
- Overview
- The type T must be such that an object of type T can be implicitly converted to the type obtained by dereferencing an object of type OutputIterator.
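The effect of the `valid` guard can be modelled on the host as below. This is a sketch under assumptions, not the rocPRIM implementation: `store_guarded_model` is a hypothetical name, a blocked arrangement is assumed, and `valid` is interpreted as the number of leading output positions that may be written.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of one thread's guarded store in a blocked arrangement.
// Positions at or beyond valid are left untouched.
template<class T, std::size_t ItemsPerThread, class OutputIterator>
void store_guarded_model(std::size_t    thread_id,
                         OutputIterator block_output,
                         T (&items)[ItemsPerThread],
                         unsigned int   valid)
{
    const std::size_t base = thread_id * ItemsPerThread;
    for (std::size_t i = 0; i < ItemsPerThread; ++i)
    {
        if (base + i < valid)
            block_output[base + i] = items[i];
    }
}

// Example: two simulated threads with 4 items each, but only 6 valid slots.
inline void store_guarded_example()
{
    int out[8]     = {-1, -1, -1, -1, -1, -1, -1, -1};
    int items0[4]  = {0, 1, 2, 3};
    int items1[4]  = {4, 5, 6, 7};
    store_guarded_model(0, out, items0, 6u);
    store_guarded_model(1, out, items1, 6u);
    assert(out[5] == 5);  // last valid position was written
    assert(out[6] == -1); // positions past valid keep their old value
}
```

A guard of this kind is typically needed for the last block of a grid, when the total number of items is not a multiple of the number of items handled per block.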
◆ store() [3/4]
template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, block_store_method Method = block_store_method::block_store_direct, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class OutputIterator >
| ROCPRIM_DEVICE ROCPRIM_INLINE void block_store< T, BlockSizeX, ItemsPerThread, Method, BlockSizeY, BlockSizeZ >::store (OutputIterator block_output, T(&items)[ItemsPerThread], storage_type &storage) | inline |
Stores an arrangement of items from across the thread block into an arrangement on continuous memory, using temporary storage.
- Template Parameters
-
| OutputIterator | - [inferred] an iterator type for output (can be a simple pointer). |
- Parameters
-
| [out] | block_output | - the output iterator from the thread block to store to. |
| [in] | items | - array that data is read from. |
| [in] | storage | - temporary storage for outputs. |
- Overview
- The type T must be such that an object of type T can be implicitly converted to the type obtained by dereferencing an object of type OutputIterator.
- Storage reuse
- A synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
- Example.
__global__ void kernel(...)
{
    int items[8];
    using block_store_int = rocprim::block_store<int, 128, 8>;
    block_store_int bstore;
    __shared__ typename block_store_int::storage_type storage;
    bstore.store(..., items, storage);
    ...
}
◆ store() [4/4]
template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, block_store_method Method = block_store_method::block_store_direct, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class OutputIterator >
| ROCPRIM_DEVICE ROCPRIM_INLINE void block_store< T, BlockSizeX, ItemsPerThread, Method, BlockSizeY, BlockSizeZ >::store (OutputIterator block_output, T(&items)[ItemsPerThread], unsigned int valid, storage_type &storage) | inline |
Stores an arrangement of items from across the thread block into an arrangement on continuous memory, which is guarded by range valid, using temporary storage.
- Template Parameters
-
| OutputIterator | - [inferred] an iterator type for output (can be a simple pointer). |
- Parameters
-
| [out] | block_output | - the output iterator from the thread block to store to. |
| [in] | items | - array that data is read from. |
| [in] | valid | - maximum range of valid items to store. |
| [in] | storage | - temporary storage for outputs. |
- Overview
- The type T must be such that an object of type T can be implicitly converted to the type obtained by dereferencing an object of type OutputIterator.
- Storage reuse
- A synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
- Example.
__global__ void kernel(...)
{
    int items[8];
    using block_store_int = rocprim::block_store<int, 128, 8>;
    block_store_int bstore;
    __shared__ typename block_store_int::storage_type storage;
    bstore.store(..., items, valid, storage);
    ...
}