The warp_store class is a warp-level parallel primitive that provides methods for storing an arrangement of items into a blocked/striped arrangement on continuous memory.
#include <warp_store.hpp>
using storage_type = typename ::rocprim::detail::empty_storage_type
    Struct used to allocate temporary memory that is required for thread communication during operations provided by the related parallel primitive.

template<class OutputIterator>
ROCPRIM_DEVICE ROCPRIM_INLINE void store(OutputIterator output, T(&items)[ItemsPerThread], storage_type &)
    Stores an arrangement of items from across the warp into an arrangement on continuous memory.

template<class OutputIterator>
ROCPRIM_DEVICE ROCPRIM_INLINE void store(OutputIterator output, T(&items)[ItemsPerThread], unsigned int valid, storage_type &)
    Stores an arrangement of items from across the warp into an arrangement on continuous memory, which is guarded by range valid, using temporary storage.

template<class T, unsigned int ItemsPerThread, unsigned int WarpSize = ::rocprim::device_warp_size(), warp_store_method Method = warp_store_method::warp_store_direct>
class warp_store< T, ItemsPerThread, WarpSize, Method >
The warp_store class is a warp-level parallel primitive that provides methods for storing an arrangement of items into a blocked/striped arrangement on continuous memory.
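The terms "blocked" and "striped" are not defined on this page. The sketch below shows the index mapping these terms conventionally denote in warp- and block-level primitives. It is a host-side illustration under that assumption, not code from the library; the helper names `blocked_index` and `striped_index` and the constants are hypothetical.

```cpp
#include <cstddef>

// Illustrative sizes matching the example on this page (not library defaults).
constexpr std::size_t warp_size        = 8;
constexpr std::size_t items_per_thread = 4;

// Blocked arrangement: each thread (lane) owns a run of consecutive positions.
constexpr std::size_t blocked_index(std::size_t lane, std::size_t item)
{
    return lane * items_per_thread + item;
}

// Striped arrangement: consecutive positions belong to consecutive lanes,
// so one lane's items are warp_size positions apart.
constexpr std::size_t striped_index(std::size_t lane, std::size_t item)
{
    return item * warp_size + lane;
}

// Lane 2, item 1 lands at different positions in the two arrangements.
static_assert(blocked_index(2, 1) == 9, "2 * 4 + 1");
static_assert(striped_index(2, 1) == 10, "1 * 8 + 2");
```

With a striped arrangement, neighbouring lanes touch neighbouring addresses in each step, which is why the two layouts can differ in memory access pattern.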
- Template Parameters
-
T | - the type of the items to store. |
ItemsPerThread | - the number of items to be processed by each thread. |
WarpSize | - the number of threads in a warp. It must be a divisor of the kernel block size. |
Method | - the method to store data. |
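- Overview
- The warp_store class supports several methods for storing data. The method is selected through the Method template parameter, a warp_store_method value such as warp_store_method::warp_store_direct (the default).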
- Example:
In the example, a store operation is performed on a warp of 8 threads, using type int and 4 items per thread.
__global__ void example_kernel(int * output, ...)
{
    constexpr unsigned int threads_per_block = 128;
    constexpr unsigned int threads_per_warp  = 8;
    constexpr unsigned int items_per_thread  = 4;
    constexpr unsigned int warps_per_block   = threads_per_block / threads_per_warp;

    // Index of this thread's warp within the block.
    const unsigned int warp_id = threadIdx.x / threads_per_warp;
    // Position of the first item written by this warp.
    const int offset = blockIdx.x * threads_per_block * items_per_thread
        + warp_id * threads_per_warp * items_per_thread;

    int items[items_per_thread];
    ...
    using warp_store_type = rocprim::warp_store<int, items_per_thread, threads_per_warp,
        rocprim::warp_store_method::warp_store_direct>;
    // One storage object per warp, passed as the last argument of store().
    __shared__ typename warp_store_type::storage_type storage[warps_per_block];

    warp_store_type().store(output + offset, items, storage[warp_id]);
    ...
}
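The offset arithmetic in the example can be checked on the host. The following is a minimal sketch that mirrors the kernel's expression; `warp_offset` is a hypothetical helper introduced only for this illustration.

```cpp
// Same constants as in the example kernel.
constexpr unsigned int threads_per_block = 128;
constexpr unsigned int threads_per_warp  = 8;
constexpr unsigned int items_per_thread  = 4;

// Hypothetical helper: position of the first item written by the warp that
// contains thread `thread_id` of block `block_id`.
constexpr unsigned int warp_offset(unsigned int block_id, unsigned int thread_id)
{
    return block_id * threads_per_block * items_per_thread
         + (thread_id / threads_per_warp) * threads_per_warp * items_per_thread;
}

// Each warp covers 8 * 4 = 32 items, each block 128 * 4 = 512 items.
static_assert(warp_offset(0, 0) == 0, "first warp of the first block");
static_assert(warp_offset(0, 8) == 32, "second warp of the first block");
static_assert(warp_offset(1, 27) == 608, "block 1, warp 3: 512 + 3 * 32");
```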
◆ storage_type
template<class T, unsigned int ItemsPerThread, unsigned int WarpSize = ::rocprim::device_warp_size(), warp_store_method Method = warp_store_method::warp_store_direct>
using warp_store< T, ItemsPerThread, WarpSize, Method >::storage_type = typename ::rocprim::detail::empty_storage_type
Struct used to allocate temporary memory that is required for thread communication during operations provided by the related parallel primitive.
Depending on the implementation, the operations exposed by a parallel primitive may require temporary storage for thread communication. The storage should be allocated using the __shared__ keyword. It can be aliased to externally allocated memory, or be part of a union with other storage types to increase shared memory reuse.
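The union technique mentioned above can be sketched in plain C++. The two structs below are hypothetical stand-ins for the storage_type of two different primitives (for the specialization documented here, storage_type is an empty type); in a kernel the union object would be declared with __shared__, with a synchronization barrier between the two uses.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the storage types of two primitives.
struct load_storage
{
    int buffer[64];
};

struct store_storage
{
    float buffer[32];
};

// Only one member is in use at a time, so the union occupies as much memory
// as its largest member instead of the sum of both.
union shared_storage
{
    load_storage  load;
    store_storage store;
};

static_assert(sizeof(shared_storage) == sizeof(load_storage),
              "the union is as large as its largest member");
static_assert(sizeof(shared_storage) < sizeof(load_storage) + sizeof(store_storage),
              "the union is smaller than two separate allocations");
```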
◆ store() [1/2]
template<class T, unsigned int ItemsPerThread, unsigned int WarpSize = ::rocprim::device_warp_size(), warp_store_method Method = warp_store_method::warp_store_direct>
template<class OutputIterator>
ROCPRIM_DEVICE ROCPRIM_INLINE void warp_store< T, ItemsPerThread, WarpSize, Method >::store(OutputIterator output, T(&items)[ItemsPerThread], storage_type &)
inline
Stores an arrangement of items from across the warp into an arrangement on continuous memory.
- Template Parameters
-
OutputIterator | - [inferred] an iterator type for output (can be a simple pointer). |
- Parameters
-
[out] | output | - the output iterator to store to. |
[in] | items | - array that data is read from. |
- Overview
- The type T must be such that an object of type OutputIterator can be dereferenced and then implicitly assigned from T.
- Storage reuse
- A synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
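The requirement on T and OutputIterator stated in the overview can be approximated at compile time. The trait below is a hypothetical helper, not part of the library, and std::is_assignable is used as a close stand-in for "dereferenced and then implicitly assigned from T".

```cpp
#include <type_traits>
#include <utility>

// Hypothetical trait: true when `*it = value` compiles for an iterator of
// type OutputIterator and a value of type T.
template<class OutputIterator, class T>
constexpr bool can_store_v
    = std::is_assignable<decltype(*std::declval<OutputIterator&>()), T>::value;

// Hypothetical item type with no conversion to int.
struct opaque
{
};

static_assert(can_store_v<int*, int>, "a plain pointer to T works");
static_assert(can_store_v<double*, int>, "int converts implicitly to double");
static_assert(!can_store_v<const int*, int>, "cannot assign through a pointer to const");
static_assert(!can_store_v<int*, opaque>, "no conversion from opaque to int");
```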
◆ store() [2/2]
template<class T, unsigned int ItemsPerThread, unsigned int WarpSize = ::rocprim::device_warp_size(), warp_store_method Method = warp_store_method::warp_store_direct>
template<class OutputIterator>
ROCPRIM_DEVICE ROCPRIM_INLINE void warp_store< T, ItemsPerThread, WarpSize, Method >::store(OutputIterator output, T(&items)[ItemsPerThread], unsigned int valid, storage_type &)
inline
Stores an arrangement of items from across the warp into an arrangement on continuous memory, which is guarded by range valid, using temporary storage.
- Template Parameters
-
OutputIterator | - [inferred] an iterator type for output (can be a simple pointer). |
- Parameters
-
[out] | output | - the output iterator to store to. |
[in] | items | - array that data is read from. |
[in] | valid | - maximum number of valid items to store. |
- Overview
- The type T must be such that an object of type OutputIterator can be dereferenced and then implicitly assigned from T.
- Storage reuse
- A synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
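The effect of the valid guard can be modelled on the host. The sketch below assumes a blocked arrangement and assumes that positions at or beyond valid are left untouched; `guarded_store_model` is a hypothetical function that loops over the lanes sequentially and is not the library's implementation.

```cpp
#include <cstddef>
#include <vector>

// Host-side model of a guarded warp store (blocked arrangement assumed).
// `items` holds warp_size * ItemsPerThread values, grouped by lane.
template<class T, std::size_t ItemsPerThread>
void guarded_store_model(std::vector<T>&       output,
                         const std::vector<T>& items,
                         std::size_t           warp_size,
                         std::size_t           valid)
{
    for(std::size_t lane = 0; lane < warp_size; ++lane)
    {
        for(std::size_t i = 0; i < ItemsPerThread; ++i)
        {
            const std::size_t index = lane * ItemsPerThread + i;
            // Only positions inside the valid range are written.
            if(index < valid)
            {
                output[index] = items[index];
            }
        }
    }
}
```

For a warp of 8 lanes with 4 items each and valid set to 10, this model writes positions 0 to 9 and leaves positions 10 to 31 unchanged, which is how a guard of this kind protects the tail of a partially filled output range.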