rocPRIM
The warp_exchange class is a warp-level parallel primitive that provides methods for rearranging items partitioned across threads in a warp.
#include <warp_exchange.hpp>
Public Types | |
using | storage_type = detail::raw_storage< storage_type_ > |
Struct used to allocate the temporary memory required for thread communication during operations provided by the related parallel primitive. | |
Public Member Functions | |
template<class U > | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | blocked_to_striped (const T(&input)[ItemsPerThread], U(&output)[ItemsPerThread], storage_type &storage) |
Transposes a blocked arrangement of items to a striped arrangement across the warp, using temporary storage. | |
template<class U > | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | blocked_to_striped_shuffle (const T(&input)[ItemsPerThread], U(&output)[ItemsPerThread]) |
Transposes a blocked arrangement of items to a striped arrangement across the warp, using warp shuffle operations. | |
template<class U > | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | striped_to_blocked (const T(&input)[ItemsPerThread], U(&output)[ItemsPerThread], storage_type &storage) |
Transposes a striped arrangement of items to a blocked arrangement across the warp, using temporary storage. | |
template<class U > | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | striped_to_blocked_shuffle (const T(&input)[ItemsPerThread], U(&output)[ItemsPerThread]) |
Transposes a striped arrangement of items to a blocked arrangement across the warp, using warp shuffle operations. | |
template<class U , class OffsetT > | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | scatter_to_striped (const T(&input)[ItemsPerThread], U(&output)[ItemsPerThread], const OffsetT(&ranks)[ItemsPerThread], storage_type &storage) |
Orders input values according to ranks using temporary storage, then writes the values to output in a striped manner. | |
The warp_exchange class is a warp-level parallel primitive that provides methods for rearranging items partitioned across threads in a warp.
T | - the input type. |
ItemsPerThread | - the number of items contributed by each thread. |
WarpSize | - the number of threads in a warp. |
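The warp_exchange class supports the following rearrangement methods: blocked to striped (blocked_to_striped()), striped to blocked (striped_to_blocked()), and scatter to striped (scatter_to_striped()). The two transposes also have experimental variants based on warp shuffle operations (blocked_to_striped_shuffle() and striped_to_blocked_shuffle()).
In the example, an exchange operation is performed on a warp of 8 threads, using type int with 4 items per thread.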
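To make the two arrangements concrete, the following is a minimal host-side sketch of what a blocked-to-striped exchange does for a warp of 8 threads with 4 int items per thread. It is plain C++ and not rocPRIM code: the names `WarpItems`, `kWarpSize`, `kItemsPerThread`, and `model_blocked_to_striped` are hypothetical, each row of the array stands in for one thread's items, and the `staging` array stands in for the temporary storage. It assumes the usual convention that a blocked arrangement gives each thread consecutive items, while a striped arrangement gives thread t every WarpSize-th item starting at t.

```cpp
#include <array>
#include <cstddef>

// Host-side model only: row t of WarpItems stands for the items held by thread t.
constexpr std::size_t kWarpSize = 8;        // threads in the modeled warp
constexpr std::size_t kItemsPerThread = 4;  // items held by each thread

using WarpItems = std::array<std::array<int, kItemsPerThread>, kWarpSize>;

// Models the blocked-to-striped transpose (hypothetical helper, not a rocPRIM API).
// Blocked: thread t holds warp-wide positions t * kItemsPerThread + i.
// Striped: thread t holds warp-wide positions i * kWarpSize + t.
WarpItems model_blocked_to_striped(const WarpItems& input)
{
    // Stand-in for the temporary storage: items laid out by warp-wide position.
    std::array<int, kWarpSize * kItemsPerThread> staging{};
    for (std::size_t t = 0; t < kWarpSize; ++t)
        for (std::size_t i = 0; i < kItemsPerThread; ++i)
            staging[t * kItemsPerThread + i] = input[t][i];

    // Each modeled thread reads back its items with a stride of kWarpSize.
    WarpItems output{};
    for (std::size_t t = 0; t < kWarpSize; ++t)
        for (std::size_t i = 0; i < kItemsPerThread; ++i)
            output[t][i] = staging[i * kWarpSize + t];
    return output;
}
```

With blocked input 0 to 31 (thread 0 holding 0, 1, 2, 3), thread 0 ends up holding 0, 8, 16, 24 and thread 1 holds 1, 9, 17, 25.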
using warp_exchange< T, ItemsPerThread, WarpSize >::storage_type = detail::raw_storage<storage_type_> |
Struct used to allocate the temporary memory required for thread communication during operations provided by the related parallel primitive.
Depending on the implementation, the operations exposed by the parallel primitive may require temporary storage for thread communication. The storage should be allocated using the __shared__ keyword. It can be aliased to externally allocated memory, or be part of a union type with other storage types to increase shared memory reusability.
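template<class U >
ROCPRIM_DEVICE ROCPRIM_INLINE void warp_exchange< T, ItemsPerThread, WarpSize >::blocked_to_striped (const T(&input)[ItemsPerThread], U(&output)[ItemsPerThread], storage_type &storage) | inline |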
Transposes a blocked arrangement of items to a striped arrangement across the warp, using temporary storage.
U | - [inferred] the output type. |
[in] | input | - array that data is loaded from. |
[out] | output | - array that data is written to. |
[in] | storage | - reference to a temporary storage object of type storage_type. |
A synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
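template<class U >
ROCPRIM_DEVICE ROCPRIM_INLINE void warp_exchange< T, ItemsPerThread, WarpSize >::blocked_to_striped_shuffle (const T(&input)[ItemsPerThread], U(&output)[ItemsPerThread]) | inline |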
Transposes a blocked arrangement of items to a striped arrangement across the warp, using warp shuffle operations.
Caution: this API is experimental. Performance might not be consistent. ItemsPerThread must be a divisor of WarpSize.
U | - [inferred] the output type. |
[in] | input | - array that data is loaded from. |
[out] | output | - array that data is written to. |
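The divisibility requirement on the shuffle variants can be checked at compile time. The following is a minimal sketch in plain C++. The helper `shuffle_variant_allowed` is hypothetical and not part of rocPRIM, and it only encodes the rule stated above, that ItemsPerThread must be a divisor of WarpSize.

```cpp
#include <cstddef>

// Hypothetical helper, not a rocPRIM API: returns true when items_per_thread
// divides warp_size, which is the stated precondition of the shuffle variants.
constexpr bool shuffle_variant_allowed(std::size_t items_per_thread, std::size_t warp_size)
{
    return items_per_thread != 0 && warp_size % items_per_thread == 0;
}

// 4 divides 8, so a warp of 8 threads with 4 items per thread qualifies.
static_assert(shuffle_variant_allowed(4, 8), "4 items per thread divides a warp of 8");
// 3 does not divide 8, so that configuration would need the storage-based methods.
static_assert(!shuffle_variant_allowed(3, 8), "3 items per thread does not divide a warp of 8");
```

A check of this kind could guard a compile-time choice between the shuffle-based and storage-based methods in calling code.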
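template<class U , class OffsetT >
ROCPRIM_DEVICE ROCPRIM_INLINE void warp_exchange< T, ItemsPerThread, WarpSize >::scatter_to_striped (const T(&input)[ItemsPerThread], U(&output)[ItemsPerThread], const OffsetT(&ranks)[ItemsPerThread], storage_type &storage) | inline |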
Orders input values according to ranks using temporary storage, then writes the values to output in a striped manner.
No value in ranks may exceed WarpSize*ItemsPerThread-1.
U | - [inferred] the output type. |
[in] | input | - array that data is loaded from. |
[out] | output | - array that data is written to. |
[in] | ranks | - array containing the positions. |
[in] | storage | - reference to a temporary storage object of type storage_type. |
A synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
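The scatter step can be illustrated with a host-side model in plain C++. This is a sketch, not rocPRIM code: `WarpItems`, `WarpRanks`, and `model_scatter_to_striped` are hypothetical names, each row stands in for one thread's items, and `staging` stands in for the temporary storage. It assumes that every rank is unique, so the ranks form a permutation of the warp-wide positions, and it enforces the bound on ranks with an assertion.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWarpSize = 8;        // threads in the modeled warp
constexpr std::size_t kItemsPerThread = 4;  // items held by each thread
constexpr std::size_t kTotal = kWarpSize * kItemsPerThread;

using WarpItems = std::array<std::array<int, kItemsPerThread>, kWarpSize>;
using WarpRanks = std::array<std::array<std::size_t, kItemsPerThread>, kWarpSize>;

// Models scatter-to-striped (hypothetical helper, not a rocPRIM API).
// Assumption: ranks are unique, so no two items target the same position.
WarpItems model_scatter_to_striped(const WarpItems& input, const WarpRanks& ranks)
{
    // Place every item at the warp-wide position given by its rank.
    std::array<int, kTotal> staging{};
    for (std::size_t t = 0; t < kWarpSize; ++t)
        for (std::size_t i = 0; i < kItemsPerThread; ++i)
        {
            assert(ranks[t][i] < kTotal); // ranks must not exceed kTotal - 1
            staging[ranks[t][i]] = input[t][i];
        }

    // Read the ordered items back in a striped manner.
    WarpItems output{};
    for (std::size_t t = 0; t < kWarpSize; ++t)
        for (std::size_t i = 0; i < kItemsPerThread; ++i)
            output[t][i] = staging[i * kWarpSize + t];
    return output;
}
```

For example, ranks that reverse the warp-wide order send the last item of the last thread to position 0, so it becomes the first item of thread 0 in the striped output.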
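template<class U >
ROCPRIM_DEVICE ROCPRIM_INLINE void warp_exchange< T, ItemsPerThread, WarpSize >::striped_to_blocked (const T(&input)[ItemsPerThread], U(&output)[ItemsPerThread], storage_type &storage) | inline |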
Transposes a striped arrangement of items to a blocked arrangement across the warp, using temporary storage.
U | - [inferred] the output type. |
[in] | input | - array that data is loaded from. |
[out] | output | - array that data is written to. |
[in] | storage | - reference to a temporary storage object of type storage_type. |
A synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
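The striped-to-blocked direction can be modeled on the host in the same spirit. The sketch below is plain C++ rather than rocPRIM code: `WarpItems` and `model_striped_to_blocked` are hypothetical names, each row stands in for one thread's items, and `staging` stands in for the temporary storage. It assumes that a striped arrangement gives thread t the warp-wide positions i * WarpSize + t and a blocked arrangement gives it positions t * ItemsPerThread + i.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kWarpSize = 8;        // threads in the modeled warp
constexpr std::size_t kItemsPerThread = 4;  // items held by each thread

using WarpItems = std::array<std::array<int, kItemsPerThread>, kWarpSize>;

// Models the striped-to-blocked transpose (hypothetical helper, not a rocPRIM API).
WarpItems model_striped_to_blocked(const WarpItems& input)
{
    // Write each item to its warp-wide position under the striped layout.
    std::array<int, kWarpSize * kItemsPerThread> staging{};
    for (std::size_t t = 0; t < kWarpSize; ++t)
        for (std::size_t i = 0; i < kItemsPerThread; ++i)
            staging[i * kWarpSize + t] = input[t][i];

    // Each modeled thread reads back a run of consecutive items.
    WarpItems output{};
    for (std::size_t t = 0; t < kWarpSize; ++t)
        for (std::size_t i = 0; i < kItemsPerThread; ++i)
            output[t][i] = staging[t * kItemsPerThread + i];
    return output;
}
```

If thread 0 starts with the striped items 0, 8, 16, 24, it ends up holding the blocked items 0, 1, 2, 3.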
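template<class U >
ROCPRIM_DEVICE ROCPRIM_INLINE void warp_exchange< T, ItemsPerThread, WarpSize >::striped_to_blocked_shuffle (const T(&input)[ItemsPerThread], U(&output)[ItemsPerThread]) | inline |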
Transposes a striped arrangement of items to a blocked arrangement across the warp, using warp shuffle operations.
Caution: this API is experimental. Performance might not be consistent. ItemsPerThread must be a divisor of WarpSize.
U | - [inferred] the output type. |
[in] | input | - array that data is loaded from. |
[out] | output | - array that data is written to. |