rocPRIM
The block_shuffle class is a block-level parallel primitive that provides methods for shuffling data partitioned across a block.
#include <block_shuffle.hpp>
Public Types | |
using | storage_type = detail::raw_storage< storage_type_ > |
Struct used to allocate temporary memory that is required for thread communication during operations provided by the related parallel primitive. | |
Public Member Functions | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | offset (T input, T &output, int distance=1) |
Shuffles data across threads in a block, offset by the distance value. | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | offset (const size_t &flat_id, T input, T &output, int distance) |
Shuffles data across threads in a block, offset by the distance value. | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | offset (const size_t &flat_id, T input, T &output, int distance, storage_type &storage) |
Shuffles data across threads in a block, offset by the distance value, using temporary storage. | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | rotate (T input, T &output, unsigned int distance=1) |
Rotates data across threads in a block by the distance value, wrapping around at the block boundary. | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | rotate (const size_t &flat_id, T input, T &output, unsigned int distance) |
Rotates data across threads in a block by the distance value, wrapping around at the block boundary. | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | rotate (const size_t &flat_id, T input, T &output, unsigned int distance, storage_type &storage) |
Rotates data across threads in a block by the distance value, wrapping around at the block boundary, using temporary storage. | |
template<unsigned int ItemsPerThread> | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | up (T(&input)[ItemsPerThread], T(&prev)[ItemsPerThread]) |
The thread block rotates a blocked arrangement of input items, shifting it up by one item. | |
template<unsigned int ItemsPerThread> | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | up (const size_t &flat_id, T(&input)[ItemsPerThread], T(&prev)[ItemsPerThread]) |
The thread block rotates a blocked arrangement of input items, shifting it up by one item. | |
template<unsigned int ItemsPerThread> | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | up (const size_t &flat_id, T(&input)[ItemsPerThread], T(&prev)[ItemsPerThread], storage_type &storage) |
The thread block rotates a blocked arrangement of input items, shifting it up by one item, using temporary storage. | |
template<unsigned int ItemsPerThread> | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | up (T(&input)[ItemsPerThread], T(&prev)[ItemsPerThread], T &block_suffix) |
The thread block rotates a blocked arrangement of input items, shifting it up by one item. | |
template<unsigned int ItemsPerThread> | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | up (const size_t &flat_id, T(&input)[ItemsPerThread], T(&prev)[ItemsPerThread], T &block_suffix) |
The thread block rotates a blocked arrangement of input items, shifting it up by one item. | |
template<int ItemsPerThread> | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | up (const size_t &flat_id, T(&input)[ItemsPerThread], T(&prev)[ItemsPerThread], T &block_suffix, storage_type &storage) |
The thread block rotates a blocked arrangement of input items, shifting it up by one item, using temporary storage. | |
template<unsigned int ItemsPerThread> | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | down (T(&input)[ItemsPerThread], T(&next)[ItemsPerThread]) |
The thread block rotates a blocked arrangement of input items, shifting it down by one item. | |
template<unsigned int ItemsPerThread> | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | down (const size_t &flat_id, T(&input)[ItemsPerThread], T(&next)[ItemsPerThread]) |
The thread block rotates a blocked arrangement of input items, shifting it down by one item. | |
template<unsigned int ItemsPerThread> | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | down (const size_t &flat_id, T(&input)[ItemsPerThread], T(&next)[ItemsPerThread], storage_type &storage) |
The thread block rotates a blocked arrangement of input items, shifting it down by one item, using temporary storage. | |
template<unsigned int ItemsPerThread> | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | down (T(&input)[ItemsPerThread], T(&next)[ItemsPerThread], T &block_prefix) |
The thread block rotates a blocked arrangement of input items, shifting it down by one item. | |
template<unsigned int ItemsPerThread> | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | down (const size_t &flat_id, T(&input)[ItemsPerThread], T(&next)[ItemsPerThread], T &block_prefix) |
The thread block rotates a blocked arrangement of input items, shifting it down by one item. | |
template<unsigned int ItemsPerThread> | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | down (const size_t &flat_id, T(&input)[ItemsPerThread], T(&next)[ItemsPerThread], T &block_prefix, storage_type &storage) |
The thread block rotates a blocked arrangement of input items, shifting it down by one item, using temporary storage. | |
The block_shuffle class is a block-level parallel primitive that provides methods for shuffling data partitioned across a block.
T | - the input/output type. |
BlockSizeX | - the number of threads in a block's x dimension; it has no default value. |
BlockSizeY | - the number of threads in a block's y dimension, defaults to 1. |
BlockSizeZ | - the number of threads in a block's z dimension, defaults to 1. |
ItemsPerThread is greater than one,
T is an arithmetic type,
In the examples, the shuffle operation is performed on a block of 192 threads. Each thread provides one int value, and the result is returned in the same variable that was used for the input.
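The setup described above can be modeled on the host to see what each thread ends up holding. The sketch below is not rocPRIM code and does not run on a device: `model_offset_in_place` and `make_block_192` are hypothetical helpers that simulate 192 threads, each holding one int, calling offset with the default distance of 1 and writing the result back into the same variable. It assumes that thread i receives the value supplied by thread i + distance and that threads whose source falls outside the block keep their value; the header remains the authority on the actual behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model (not part of rocPRIM): values[i] stands for the
// single int held by thread i of the block.
// Assumption: thread i receives the value of thread i + distance, and threads
// whose source index falls outside the block keep their previous value.
inline void model_offset_in_place(std::vector<int>& values, int distance = 1)
{
    const int block_size = static_cast<int>(values.size());
    assert(block_size > 0); // a block has at least one thread

    // Snapshot of all inputs; this plays the role of the shared temporary storage.
    const std::vector<int> staged = values;
    for (int tid = 0; tid < block_size; ++tid)
    {
        const int src = tid + distance;
        if (src >= 0 && src < block_size)
        {
            values[static_cast<std::size_t>(tid)] = staged[static_cast<std::size_t>(src)];
        }
    }
}

// Builds the 192-value block from the example, with thread i holding the value i.
inline std::vector<int> make_block_192()
{
    std::vector<int> values(192);
    for (int i = 0; i < 192; ++i)
    {
        values[static_cast<std::size_t>(i)] = i;
    }
    return values;
}
```

Under these assumptions, thread 0 ends up with thread 1's value, while thread 191 has no source thread and keeps its own value.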
using block_shuffle< T, BlockSizeX, BlockSizeY, BlockSizeZ >::storage_type = detail::raw_storage<storage_type_> |
Struct used to allocate temporary memory that is required for thread communication during operations provided by the related parallel primitive.
Depending on the implementation, the operations exposed by the parallel primitive may require temporary storage for thread communication. The storage should be allocated using the __shared__ keyword. It can be aliased to externally allocated memory, or be part of a union type with other storage types to increase shared memory reusability.
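template<unsigned int ItemsPerThread>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_shuffle< T, BlockSizeX, BlockSizeY, BlockSizeZ >::down (T(&input)[ItemsPerThread], T(&next)[ItemsPerThread]) | inline |
The thread block rotates a blocked arrangement of input items, shifting it down by one item.
[in] | input | - The calling thread's input items |
[out] | next | - The corresponding successor items (may be aliased to input ). The item next [ItemsPerThread-1] is not updated for thread BlockSize - 1. |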
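template<unsigned int ItemsPerThread>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_shuffle< T, BlockSizeX, BlockSizeY, BlockSizeZ >::down (const size_t &flat_id, T(&input)[ItemsPerThread], T(&next)[ItemsPerThread]) | inline |
The thread block rotates a blocked arrangement of input items, shifting it down by one item.
[in] | flat_id | - flat thread ID obtained from rocprim::flat_block_thread_id |
[in] | input | - The calling thread's input items |
[out] | next | - The corresponding successor items (may be aliased to input ). The item next [ItemsPerThread-1] is not updated for thread BlockSize - 1. |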
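template<unsigned int ItemsPerThread>
ROCPRIM_DEVICE ROCPRIM_INLINE void block_shuffle< T, BlockSizeX, BlockSizeY, BlockSizeZ >::down (const size_t &flat_id, T(&input)[ItemsPerThread], T(&next)[ItemsPerThread], storage_type &storage) | inline |
The thread block rotates a blocked arrangement of input items, shifting it down by one item, using temporary storage.
[in] | flat_id | - flat thread ID obtained from rocprim::flat_block_thread_id |
[in] | input | - The calling thread's input items |
[out] | next | - The corresponding successor items (may be aliased to input ). The item next [ItemsPerThread-1] is not updated for thread BlockSize - 1. |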
[in] | storage | - reference to a temporary storage object of type storage_type. |
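template<unsigned int ItemsPerThread>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_shuffle< T, BlockSizeX, BlockSizeY, BlockSizeZ >::down (T(&input)[ItemsPerThread], T(&next)[ItemsPerThread], T &block_prefix) | inline |
The thread block rotates a blocked arrangement of input items, shifting it down by one item.
[in] | input | - The calling thread's input items |
[out] | next | - The corresponding successor items (may be aliased to input ). The item next [ItemsPerThread-1] is not updated for thread BlockSize - 1. |
[out] | block_prefix | - The item input [0] from thread 0, provided to all threads |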
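To make the blocked-arrangement shift concrete, here is a minimal host-side sketch of what down is described as doing. It is a model only, not the rocPRIM implementation: `model_down` is a hypothetical helper, the flat vector stands for the items of all threads laid out thread by thread, and the sketch assumes that the one slot left unwritten is the last item of the last thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model (not part of rocPRIM).
// Thread t owns input[t * items_per_thread .. t * items_per_thread + items_per_thread - 1].
// Returns the value that would be written to block_prefix (item 0 of thread 0).
// Assumption: the last item of the last thread has no successor and is left untouched.
inline int model_down(const std::vector<int>& input,
                      std::vector<int>&       next,
                      std::size_t             items_per_thread)
{
    assert(items_per_thread > 0);
    assert(!input.empty() && input.size() % items_per_thread == 0);
    assert(next.size() == input.size());

    // Copy first so that next may alias input, as the parameter description allows.
    const std::vector<int> staged = input;
    const std::size_t block_size = staged.size() / items_per_thread;

    for (std::size_t t = 0; t < block_size; ++t)
    {
        const std::size_t base = t * items_per_thread;
        // Within a thread, each slot takes the item that follows it.
        for (std::size_t i = 0; i + 1 < items_per_thread; ++i)
        {
            next[base + i] = staged[base + i + 1];
        }
        // The last slot takes the first item of the following thread, if there is one.
        if (t + 1 < block_size)
        {
            next[base + items_per_thread - 1] = staged[base + items_per_thread];
        }
    }
    return staged[0];
}
```

With three threads holding {1, 2}, {3, 4} and {5, 6}, the model yields {2, 3}, {4, 5} and {6, unchanged}, and returns 1 as the block prefix.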
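template<unsigned int ItemsPerThread>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_shuffle< T, BlockSizeX, BlockSizeY, BlockSizeZ >::down (const size_t &flat_id, T(&input)[ItemsPerThread], T(&next)[ItemsPerThread], T &block_prefix) | inline |
The thread block rotates a blocked arrangement of input items, shifting it down by one item.
[in] | flat_id | - flat thread ID obtained from rocprim::flat_block_thread_id |
[in] | input | - The calling thread's input items |
[out] | next | - The corresponding successor items (may be aliased to input ). The item next [ItemsPerThread-1] is not updated for thread BlockSize - 1. |
[out] | block_prefix | - The item input [0] from thread 0, provided to all threads |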
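template<unsigned int ItemsPerThread>
ROCPRIM_DEVICE ROCPRIM_INLINE void block_shuffle< T, BlockSizeX, BlockSizeY, BlockSizeZ >::down (const size_t &flat_id, T(&input)[ItemsPerThread], T(&next)[ItemsPerThread], T &block_prefix, storage_type &storage) | inline |
The thread block rotates a blocked arrangement of input items, shifting it down by one item, using temporary storage.
[in] | flat_id | - flat thread ID obtained from rocprim::flat_block_thread_id |
[in] | input | - The calling thread's input items |
[out] | next | - The corresponding successor items (may be aliased to input ). The item next [ItemsPerThread-1] is not updated for thread BlockSize - 1. |
[out] | block_prefix | - The item input [0] from thread 0, provided to all threads |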
[in] | storage | - reference to a temporary storage object of type storage_type. |
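ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_shuffle< T, BlockSizeX, BlockSizeY, BlockSizeZ >::offset (T input, T &output, int distance=1) | inline |
Shuffles data across threads in a block, offset by the distance value.
[in] | input | - input data to be shuffled to another thread. |
[out] | output | - reference to an output value that receives data from another thread |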
[in] | distance | - The input threadId + distance = output threadId. |
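ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_shuffle< T, BlockSizeX, BlockSizeY, BlockSizeZ >::offset (const size_t &flat_id, T input, T &output, int distance) | inline |
Shuffles data across threads in a block, offset by the distance value.
[in] | flat_id | - flat thread ID obtained from rocprim::flat_block_thread_id |
[in] | input | - input data to be shuffled to another thread. |
[out] | output | - reference to an output value that receives data from another thread |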
[in] | distance | - The input threadId + distance = output threadId. |
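ROCPRIM_DEVICE ROCPRIM_INLINE void block_shuffle< T, BlockSizeX, BlockSizeY, BlockSizeZ >::offset (const size_t &flat_id, T input, T &output, int distance, storage_type &storage) | inline |
Shuffles data across threads in a block, offset by the distance value, using temporary storage.
[in] | flat_id | - flat thread ID obtained from rocprim::flat_block_thread_id |
[in] | input | - input data to be shuffled to another thread. |
[out] | output | - reference to an output value that receives data from another thread |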
[in] | distance | - The input threadId + distance = output threadId. |
[in] | storage | - reference to a temporary storage object of type storage_type. |
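ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_shuffle< T, BlockSizeX, BlockSizeY, BlockSizeZ >::rotate (T input, T &output, unsigned int distance=1) | inline |
Rotates data across threads in a block by the distance value, wrapping around at the block boundary.
[in] | input | - input data to be shuffled to another thread. |
[out] | output | - reference to an output value that receives data from another thread |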
[in] | distance | - The input threadId + distance = output threadId. |
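The difference between rotate and offset is easiest to see on a small block. The following host-side sketch is a model, not rocPRIM code: `model_rotate` is a hypothetical helper, and it assumes that thread i receives the value supplied by thread (i + distance) modulo the block size, so that every thread gets a value. It only models distances smaller than the block size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model (not part of rocPRIM): inputs[i] is the value
// supplied by thread i, and the returned vector holds what each thread receives.
// Assumption: thread i reads from thread (i + distance) modulo the block size.
inline std::vector<int> model_rotate(const std::vector<int>& inputs, unsigned int distance)
{
    const std::size_t block_size = inputs.size();
    assert(block_size > 0);
    assert(distance < block_size); // this sketch only models distances below the block size

    std::vector<int> outputs(block_size);
    for (std::size_t tid = 0; tid < block_size; ++tid)
    {
        // Indices past the end of the block wrap around to the beginning.
        outputs[tid] = inputs[(tid + distance) % block_size];
    }
    return outputs;
}
```

For a block of four threads holding {10, 20, 30, 40} and a distance of 1, the model returns {20, 30, 40, 10}: the last thread wraps around and receives the first thread's value instead of being left unchanged.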
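ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_shuffle< T, BlockSizeX, BlockSizeY, BlockSizeZ >::rotate (const size_t &flat_id, T input, T &output, unsigned int distance) | inline |
Rotates data across threads in a block by the distance value, wrapping around at the block boundary.
[in] | flat_id | - flat thread ID obtained from rocprim::flat_block_thread_id |
[in] | input | - input data to be shuffled to another thread. |
[out] | output | - reference to an output value that receives data from another thread |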
[in] | distance | - The input threadId + distance = output threadId. |
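ROCPRIM_DEVICE ROCPRIM_INLINE void block_shuffle< T, BlockSizeX, BlockSizeY, BlockSizeZ >::rotate (const size_t &flat_id, T input, T &output, unsigned int distance, storage_type &storage) | inline |
Rotates data across threads in a block by the distance value, wrapping around at the block boundary, using temporary storage.
[in] | flat_id | - flat thread ID obtained from rocprim::flat_block_thread_id |
[in] | input | - input data to be shuffled to another thread. |
[out] | output | - reference to an output value that receives data from another thread |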
[in] | distance | - The input threadId + distance = output threadId. |
[in] | storage | - reference to a temporary storage object of type storage_type. |
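template<unsigned int ItemsPerThread>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_shuffle< T, BlockSizeX, BlockSizeY, BlockSizeZ >::up (T(&input)[ItemsPerThread], T(&prev)[ItemsPerThread]) | inline |
The thread block rotates a blocked arrangement of input items, shifting it up by one item.
[in] | input | - The calling thread's input items |
[out] | prev | - The corresponding predecessor items (may be aliased to input ). The item prev [0] is not updated for thread 0. |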
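template<unsigned int ItemsPerThread>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_shuffle< T, BlockSizeX, BlockSizeY, BlockSizeZ >::up (const size_t &flat_id, T(&input)[ItemsPerThread], T(&prev)[ItemsPerThread]) | inline |
The thread block rotates a blocked arrangement of input items, shifting it up by one item.
[in] | flat_id | - flat thread ID obtained from rocprim::flat_block_thread_id |
[in] | input | - The calling thread's input items |
[out] | prev | - The corresponding predecessor items (may be aliased to input ). The item prev [0] is not updated for thread 0. |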
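template<unsigned int ItemsPerThread>
ROCPRIM_DEVICE ROCPRIM_INLINE void block_shuffle< T, BlockSizeX, BlockSizeY, BlockSizeZ >::up (const size_t &flat_id, T(&input)[ItemsPerThread], T(&prev)[ItemsPerThread], storage_type &storage) | inline |
The thread block rotates a blocked arrangement of input items, shifting it up by one item, using temporary storage.
[in] | flat_id | - flat thread ID obtained from rocprim::flat_block_thread_id |
[in] | input | - The calling thread's input items |
[out] | prev | - The corresponding predecessor items (may be aliased to input ). The item prev [0] is not updated for thread 0. |
[in] | storage | - reference to a temporary storage object of type storage_type. |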
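template<unsigned int ItemsPerThread>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_shuffle< T, BlockSizeX, BlockSizeY, BlockSizeZ >::up (T(&input)[ItemsPerThread], T(&prev)[ItemsPerThread], T &block_suffix) | inline |
The thread block rotates a blocked arrangement of input items, shifting it up by one item.
[in] | input | - The calling thread's input items |
[out] | prev | - The corresponding predecessor items (may be aliased to input ). The item prev [0] is not updated for thread 0. |
[out] | block_suffix | - The item input [ItemsPerThread-1] from thread BlockSize - 1, provided to all threads |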
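The parameter descriptions for up can be read as the following host-side model. This is a sketch under stated assumptions rather than the rocPRIM implementation: `model_up` is a hypothetical helper, and the flat vector stands for the items of all threads in blocked order. Following the description of prev, item 0 of thread 0 is left untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model (not part of rocPRIM).
// Thread t owns input[t * items_per_thread .. t * items_per_thread + items_per_thread - 1].
// Returns the value that would be written to block_suffix
// (the last item of the last thread).
inline int model_up(const std::vector<int>& input,
                    std::vector<int>&       prev,
                    std::size_t             items_per_thread)
{
    assert(items_per_thread > 0);
    assert(!input.empty() && input.size() % items_per_thread == 0);
    assert(prev.size() == input.size());

    // Copy first so that prev may alias input, as the parameter description allows.
    const std::vector<int> staged = input;
    const std::size_t block_size = staged.size() / items_per_thread;

    for (std::size_t t = 0; t < block_size; ++t)
    {
        const std::size_t base = t * items_per_thread;
        // Within a thread, each slot takes the item that precedes it.
        for (std::size_t i = items_per_thread - 1; i > 0; --i)
        {
            prev[base + i] = staged[base + i - 1];
        }
        // The first slot takes the last item of the preceding thread;
        // thread 0 has no predecessor, so its prev[0] is not updated.
        if (t > 0)
        {
            prev[base] = staged[base - 1];
        }
    }
    return staged[staged.size() - 1];
}
```

With three threads holding {1, 2}, {3, 4} and {5, 6}, the model yields {unchanged, 1}, {2, 3} and {4, 5}, and returns 6 as the block suffix.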
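template<unsigned int ItemsPerThread>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_shuffle< T, BlockSizeX, BlockSizeY, BlockSizeZ >::up (const size_t &flat_id, T(&input)[ItemsPerThread], T(&prev)[ItemsPerThread], T &block_suffix) | inline |
The thread block rotates a blocked arrangement of input items, shifting it up by one item.
[in] | flat_id | - flat thread ID obtained from rocprim::flat_block_thread_id |
[in] | input | - The calling thread's input items |
[out] | prev | - The corresponding predecessor items (may be aliased to input ). The item prev [0] is not updated for thread 0. |
[out] | block_suffix | - The item input [ItemsPerThread-1] from thread BlockSize - 1, provided to all threads |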
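template<int ItemsPerThread>
ROCPRIM_DEVICE ROCPRIM_INLINE void block_shuffle< T, BlockSizeX, BlockSizeY, BlockSizeZ >::up (const size_t &flat_id, T(&input)[ItemsPerThread], T(&prev)[ItemsPerThread], T &block_suffix, storage_type &storage) | inline |
The thread block rotates a blocked arrangement of input items, shifting it up by one item, using temporary storage.
[in] | flat_id | - flat thread ID obtained from rocprim::flat_block_thread_id |
[in] | input | - The calling thread's input items |
[out] | prev | - The corresponding predecessor items (may be aliased to input ). The item prev [0] is not updated for thread 0. |
[out] | block_suffix | - The item input [ItemsPerThread-1] from thread BlockSize - 1, provided to all threads |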
[in] | storage | - reference to a temporary storage object of type storage_type. |