The block_store class is a block-level parallel primitive that provides methods for storing an arrangement of items into a blocked/striped arrangement on continuous memory.
#include <block_store.hpp>
using storage_type = typename ::rocprim::detail::empty_storage_type
Struct used to allocate temporary memory that is required for thread communication during operations provided by the related parallel primitive.

template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void store (OutputIterator block_output, T(&items)[ItemsPerThread])
Stores an arrangement of items from across the thread block into an arrangement on continuous memory.

template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void store (OutputIterator block_output, T(&items)[ItemsPerThread], unsigned int valid)
Stores an arrangement of items from across the thread block into an arrangement on continuous memory, which is guarded by range valid.

template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void store (OutputIterator block_output, T(&items)[ItemsPerThread], storage_type &storage)
Stores an arrangement of items from across the thread block into an arrangement on continuous memory, using temporary storage.

template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void store (OutputIterator block_output, T(&items)[ItemsPerThread], unsigned int valid, storage_type &storage)
Stores an arrangement of items from across the thread block into an arrangement on continuous memory, which is guarded by range valid, using temporary storage.

template<class T, unsigned int BlockSizeX, unsigned int ItemsPerThread, block_store_method Method = block_store_method::block_store_direct, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
class block_store< T, BlockSizeX, ItemsPerThread, Method, BlockSizeY, BlockSizeZ >
The block_store class is a block-level parallel primitive that provides methods for storing an arrangement of items into a blocked/striped arrangement on continuous memory.
- Template Parameters
-
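T | - the type of the items to store. |
BlockSizeX | - the number of threads in a block's x dimension. |
ItemsPerThread | - the number of items to be processed by each thread. |
Method | - the method to store data, defaults to block_store_method::block_store_direct. |
BlockSizeY | - the number of threads in a block's y dimension, defaults to 1. |
BlockSizeZ | - the number of threads in a block's z dimension, defaults to 1. |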
- Overview
- The block_store class offers several methods to store data, selected through the Method template parameter (a block_store_method value).
- Example:
In the example, a store operation is performed on a block of 128 threads, using type int and 8 items per thread.
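__global__ void kernel(int * output)
{
    // Each block writes 128 threads * 8 items; offset is the start of this block's range.
    const int offset = blockIdx.x * 128 * 8;
    int items[8];
    // store_method stands for any block_store_method value.
    rocprim::block_store<int, 128, 8, store_method> blockstore;
    blockstore.store(output + offset, items);
    ...
}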
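The description refers to blocked and striped arrangements without showing what they look like in memory. The following is a minimal host-side sketch, not rocPRIM code: it assumes the usual meaning of the terms (blocked: each thread owns a run of consecutive slots; striped: consecutive threads own consecutive slots), and the helper names `emulate_blocked_store` and `emulate_striped_store` are hypothetical.

```cpp
#include <array>
#include <vector>

// Host-side emulation only; these names are illustrative and not part of rocPRIM.
constexpr unsigned int block_size       = 4; // threads per block (kept small for readability)
constexpr unsigned int items_per_thread = 2;

using thread_items = std::array<int, items_per_thread>;
using block_items  = std::array<thread_items, block_size>;

// Blocked arrangement (assumed meaning): thread t writes its items to
// items_per_thread consecutive slots starting at t * items_per_thread.
inline std::vector<int> emulate_blocked_store(const block_items& items)
{
    std::vector<int> output(block_size * items_per_thread);
    for(unsigned int t = 0; t < block_size; ++t)
        for(unsigned int i = 0; i < items_per_thread; ++i)
            output[t * items_per_thread + i] = items[t][i];
    return output;
}

// Striped arrangement (assumed meaning): item i of thread t lands at
// i * block_size + t, so neighbouring threads write neighbouring slots.
inline std::vector<int> emulate_striped_store(const block_items& items)
{
    std::vector<int> output(block_size * items_per_thread);
    for(unsigned int t = 0; t < block_size; ++t)
        for(unsigned int i = 0; i < items_per_thread; ++i)
            output[i * block_size + t] = items[t][i];
    return output;
}
```

In this sketch the outer loop over `t` plays the role of the threads of one block; on the device each thread would only execute its own inner loop.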
◆ storage_type
template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, block_store_method Method = block_store_method::block_store_direct, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
using block_store< T, BlockSizeX, ItemsPerThread, Method, BlockSizeY, BlockSizeZ >::storage_type = typename ::rocprim::detail::empty_storage_type |
Struct used to allocate temporary memory that is required for thread communication during operations provided by the related parallel primitive.
Depending on the implementation, the operations exposed by a parallel primitive may require temporary storage for thread communication. The storage should be allocated using the __shared__ keyword. It can be aliased to externally allocated memory, or be part of a union with other storage types to increase shared memory reuse.
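The union technique mentioned above can be sketched on the host as follows. The two struct types are hypothetical stand-ins for the `storage_type` of two different primitives, chosen only to illustrate the size saving; in a kernel the union object would be the one declared as shared memory.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-ins for the storage_type of two block-level primitives.
struct store_storage
{
    int exchange[128];
};

struct scan_storage
{
    int partials[64];
    int carry;
};

// A union lets both primitives use the same bytes, as long as the two
// uses never overlap in time (a barrier is needed between them on the device).
union block_shared_storage
{
    store_storage store;
    scan_storage  scan;
};

// The union is only as large as its largest member (plus any padding),
// not as large as both members together.
static_assert(sizeof(block_shared_storage) >= std::max(sizeof(store_storage), sizeof(scan_storage)),
              "union must hold its largest member");
static_assert(sizeof(block_shared_storage) < sizeof(store_storage) + sizeof(scan_storage),
              "union must be smaller than separate allocations");
```

The saving comes at the cost of having to separate the two uses with a synchronization barrier, since both members alias the same memory.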
◆ store() [1/4]
template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, block_store_method Method = block_store_method::block_store_direct, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_store< T, BlockSizeX, ItemsPerThread, Method, BlockSizeY, BlockSizeZ >::store (OutputIterator block_output, T(&items)[ItemsPerThread])
inline
Stores an arrangement of items from across the thread block into an arrangement on continuous memory.
- Template Parameters
-
OutputIterator | - [inferred] an iterator type for output (can be a simple pointer). |
- Parameters
-
[out] | block_output | - the output iterator from the thread block to store to. |
[in] | items | - array that data is read from. |
- Overview
- The type T must be such that a value of type T can be assigned through a dereferenced object of type OutputIterator.
◆ store() [2/4]
template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, block_store_method Method = block_store_method::block_store_direct, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_store< T, BlockSizeX, ItemsPerThread, Method, BlockSizeY, BlockSizeZ >::store (OutputIterator block_output, T(&items)[ItemsPerThread], unsigned int valid)
inline
Stores an arrangement of items from across the thread block into an arrangement on continuous memory, which is guarded by range valid.
- Template Parameters
-
OutputIterator | - [inferred] an iterator type for output (can be a simple pointer). |
- Parameters
-
[out] | block_output | - the output iterator from the thread block to store to. |
[in] | items | - array that data is read from. |
[in] | valid | - the number of valid items in the block's output range; items beyond it are not stored. |
- Overview
- The type T must be such that a value of type T can be assigned through a dereferenced object of type OutputIterator.
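To make "guarded by range valid" concrete, here is a minimal host-side sketch, not rocPRIM code. It assumes a blocked arrangement and that `valid` counts items from the start of the block's output range; `emulate_guarded_store` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Host-side emulation only; not part of rocPRIM.
// items[t][i] is item i held by thread t of a single block.
template<unsigned int BlockSize, unsigned int ItemsPerThread>
void emulate_guarded_store(std::vector<int>& block_output,
                           const int (&items)[BlockSize][ItemsPerThread],
                           unsigned int valid)
{
    // The output only needs room for the items that are actually written.
    assert(block_output.size() >= std::min(valid, BlockSize * ItemsPerThread));

    for(unsigned int t = 0; t < BlockSize; ++t)
    {
        for(unsigned int i = 0; i < ItemsPerThread; ++i)
        {
            const unsigned int index = t * ItemsPerThread + i; // blocked layout (assumed)
            if(index < valid) // the guard: slots at or past valid are left untouched
            {
                block_output[index] = items[t][i];
            }
        }
    }
}
```

A guard of this kind is what allows the last block of a grid to handle an input whose size is not a multiple of BlockSize * ItemsPerThread without writing out of bounds.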
◆ store() [3/4]
template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, block_store_method Method = block_store_method::block_store_direct, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_store< T, BlockSizeX, ItemsPerThread, Method, BlockSizeY, BlockSizeZ >::store (OutputIterator block_output, T(&items)[ItemsPerThread], storage_type &storage)
inline
Stores an arrangement of items from across the thread block into an arrangement on continuous memory, using temporary storage.
- Template Parameters
-
OutputIterator | - [inferred] an iterator type for output (can be a simple pointer). |
- Parameters
-
[out] | block_output | - the output iterator from the thread block to store to. |
[in] | items | - array that data is read from. |
[in] | storage | - temporary storage for outputs. |
- Overview
- The type T must be such that a value of type T can be assigned through a dereferenced object of type OutputIterator.
- Storage reuse
- A synchronization barrier, __syncthreads() or rocprim::syncthreads(), should be placed before storage is reused or repurposed.
- Example.
__global__ void kernel(...)
{
    int items[8];
    using block_store_int = rocprim::block_store<int, 128, 8>;
    block_store_int bstore;
    __shared__ typename block_store_int::storage_type storage;
    bstore.store(..., items, storage);
    ...
}
◆ store() [4/4]
template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, block_store_method Method = block_store_method::block_store_direct, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_store< T, BlockSizeX, ItemsPerThread, Method, BlockSizeY, BlockSizeZ >::store (OutputIterator block_output, T(&items)[ItemsPerThread], unsigned int valid, storage_type &storage)
inline
Stores an arrangement of items from across the thread block into an arrangement on continuous memory, which is guarded by range valid, using temporary storage.
- Template Parameters
-
OutputIterator | - [inferred] an iterator type for output (can be a simple pointer). |
- Parameters
-
[out] | block_output | - the output iterator from the thread block to store to. |
[in] | items | - array that data is read from. |
[in] | valid | - the number of valid items in the block's output range; items beyond it are not stored. |
[in] | storage | - temporary storage for outputs. |
- Overview
- The type T must be such that a value of type T can be assigned through a dereferenced object of type OutputIterator.
- Storage reuse
- A synchronization barrier, __syncthreads() or rocprim::syncthreads(), should be placed before storage is reused or repurposed.
- Example.
__global__ void kernel(...)
{
    int items[8];
    using block_store_int = rocprim::block_store<int, 128, 8>;
    block_store_int bstore;
    __shared__ typename block_store_int::storage_type storage;
    bstore.store(..., items, valid, storage);
    ...
}