rocPRIM
|
Namespaces | |
detail | |
Deprecated: Configuration of device-level scan primitives. | |
Classes | |
class | warp_exchange< T, ItemsPerThread, WarpSize > |
The warp_exchange class is a warp level parallel primitive which provides methods for rearranging items partitioned across threads in a warp. More... | |
class | warp_load< T, ItemsPerThread, WarpSize, Method > |
The warp_load class is a warp level parallel primitive which provides methods for loading data from contiguous memory into a blocked arrangement of items across a warp. More... | |
class | warp_load< T, ItemsPerThread, WarpSize, warp_load_method::warp_load_striped > |
class | warp_load< T, ItemsPerThread, WarpSize, warp_load_method::warp_load_vectorize > |
class | warp_load< T, ItemsPerThread, WarpSize, warp_load_method::warp_load_transpose > |
class | warp_reduce< T, WarpSize, UseAllReduce > |
The warp_reduce class is a warp level parallel primitive which provides methods for performing reduction operations on items partitioned across threads in a hardware warp. More... | |
class | warp_scan< T, WarpSize > |
The warp_scan class is a warp level parallel primitive which provides methods for performing inclusive and exclusive scan operations of items partitioned across threads in a hardware warp. More... | |
class | warp_sort< Key, WarpSize, Value > |
The warp_sort class provides warp-wide methods for computing a parallel sort of items across thread warps. More... | |
class | warp_store< T, ItemsPerThread, WarpSize, Method > |
The warp_store class is a warp level parallel primitive which provides methods for storing an arrangement of items into a blocked/striped arrangement in contiguous memory. More... | |
class | warp_store< T, ItemsPerThread, WarpSize, warp_store_method::warp_store_striped > |
class | warp_store< T, ItemsPerThread, WarpSize, warp_store_method::warp_store_vectorize > |
class | warp_store< T, ItemsPerThread, WarpSize, warp_store_method::warp_store_transpose > |
Enumerations | |
enum | warp_load_method { warp_load_method::warp_load_direct, warp_load_method::warp_load_striped, warp_load_method::warp_load_vectorize, warp_load_method::warp_load_transpose, warp_load_method::default_method = warp_load_direct } |
warp_load_method enumerates the methods available to load data from contiguous memory into a blocked/striped arrangement of items across the warp. More... | |
enum | warp_store_method { warp_store_method::warp_store_direct, warp_store_method::warp_store_striped, warp_store_method::warp_store_vectorize, warp_store_method::warp_store_transpose, warp_store_method::default_method = warp_store_direct } |
warp_store_method enumerates the methods available to store a blocked/striped arrangement of items into a blocked/striped arrangement in contiguous memory. More... | |
Functions | |
template<class T > | |
ROCPRIM_DEVICE ROCPRIM_INLINE T | warp_shuffle (const T &input, const int src_lane, const int width=device_warp_size()) |
Shuffle for any data type. More... | |
template<class T > | |
ROCPRIM_DEVICE ROCPRIM_INLINE T | warp_shuffle_up (const T &input, const unsigned int delta, const int width=device_warp_size()) |
Shuffle up for any data type. More... | |
template<class T > | |
ROCPRIM_DEVICE ROCPRIM_INLINE T | warp_shuffle_down (const T &input, const unsigned int delta, const int width=device_warp_size()) |
Shuffle down for any data type. More... | |
template<class T > | |
ROCPRIM_DEVICE ROCPRIM_INLINE T | warp_shuffle_xor (const T &input, const int lane_mask, const int width=device_warp_size()) |
Shuffle XOR for any data type. More... | |
template<typename T > | |
ROCPRIM_DEVICE ROCPRIM_INLINE T | warp_permute (const T &input, const int dst_lane, const int width=device_warp_size()) |
Permute items across the threads in a warp. More... | |
enum warp_load_method [strong]
warp_load_method enumerates the methods available to load data from contiguous memory into a blocked/striped arrangement of items across the warp.
enum warp_store_method [strong]
warp_store_method enumerates the methods available to store a blocked/striped arrangement of items into a blocked/striped arrangement in contiguous memory.
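The blocked and striped arrangements named by both enumerations can be pictured as two index mappings. The host-side C++ sketch below is an illustration only and is not rocPRIM code: the helpers `blocked_index` and `striped_index` are hypothetical names, and the sketch assumes the usual convention that a blocked arrangement gives each lane a run of consecutive items, while a striped arrangement hands consecutive items to consecutive lanes.

```cpp
#include <cassert>

// Hypothetical helpers (not part of rocPRIM): linear memory index of the
// item_id-th item held by lane_id under each arrangement.

// Blocked: each lane owns items_per_thread consecutive elements.
inline int blocked_index(int lane_id, int item_id, int items_per_thread)
{
    assert(item_id >= 0 && item_id < items_per_thread);
    return lane_id * items_per_thread + item_id;
}

// Striped: consecutive elements belong to consecutive lanes, so the items of
// one lane are warp_size elements apart.
inline int striped_index(int lane_id, int item_id, int warp_size)
{
    assert(lane_id >= 0 && lane_id < warp_size);
    return item_id * warp_size + lane_id;
}
```

Under these assumptions, with 4 items per thread and a 64-lane warp, lane 1 holds elements 4 to 7 in the blocked arrangement and elements 1, 65, 129 and 193 in the striped one.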
ROCPRIM_DEVICE ROCPRIM_INLINE T warp_permute (const T &input, const int dst_lane, const int width=device_warp_size())
Permute items across the threads in a warp.
The value from this thread in the warp is permuted to the dst_lane-th thread in the warp. If multiple threads write to the same destination, the result is undefined, but it will be the value from one of those source threads. If no thread writes to a particular thread, the value for that thread will be 0. The destination index is taken modulo the logical warp size, so any value equal to or larger than the logical warp size wraps around.
Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2, or if it is greater than device_warp_size().
input | - input to pass to other threads |
dst_lane | - the destination lane to which the value from this thread is written. |
width | - logical warp width |
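The scatter semantics described for warp_permute can be modeled on the host. The sketch below is not the rocPRIM implementation: `model_warp_permute` is a hypothetical helper that treats a vector as the lanes of a warp, and it assumes that a width smaller than the warp splits the lanes into independent logical warps. On a collision the model keeps the last writer, which is only one of the outcomes the description allows.

```cpp
#include <cassert>
#include <vector>

// Host-side model only (hypothetical helper, not rocPRIM code).
// lanes[i] is the `input` of lane i and dst_lane[i] is its `dst_lane`.
template <class T>
std::vector<T> model_warp_permute(const std::vector<T>&   lanes,
                                  const std::vector<int>& dst_lane,
                                  int                     width)
{
    const int n = static_cast<int>(lanes.size());
    assert(width > 0 && (width & (width - 1)) == 0); // width must be a power of 2
    assert(n % width == 0 && dst_lane.size() == lanes.size());

    std::vector<T> out(lanes.size(), T{}); // lanes that nobody writes to read 0
    for (int i = 0; i < n; ++i) {
        const int base = i - i % width;                           // first lane of this logical warp
        const int dst  = ((dst_lane[i] % width) + width) % width; // wrap modulo width
        // Collision policy of this model: the last writer wins. The documented
        // guarantee is only that the result is one of the colliding inputs.
        out[base + dst] = lanes[i];
    }
    return out;
}
```

For example, with four lanes holding 10, 11, 12, 13 and destinations 1, 2, 3, 0, the model rotates the values by one lane; destinations 5, 6, 7, 4 give the same result because they wrap modulo the width.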
ROCPRIM_DEVICE ROCPRIM_INLINE T warp_shuffle (const T &input, const int src_lane, const int width=device_warp_size())
Shuffle for any data type.
Each thread in the warp obtains input from the src_lane-th thread in the warp. If width is less than device_warp_size(), each subsection of the warp behaves as a separate entity with a starting logical lane id of 0. If src_lane is not in the range [0; width), the returned value is the input passed by the thread whose lane id is src_lane modulo width.
Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2, or if it is greater than device_warp_size().
input | - input to pass to other threads |
src_lane | - lane id of the thread whose input should be returned |
width | - logical warp width |
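The lane selection described for warp_shuffle can be sketched as a host-side model. This is an illustration under stated assumptions, not rocPRIM code: `model_warp_shuffle` is a hypothetical helper, a vector stands in for the lanes of a warp, and every lane is assumed to pass the same src_lane (a broadcast), although in device code each thread may pass its own value.

```cpp
#include <cassert>
#include <vector>

// Host-side model only (hypothetical helper, not rocPRIM code).
// lanes[i] is the `input` held by lane i.
template <class T>
std::vector<T> model_warp_shuffle(const std::vector<T>& lanes, int src_lane, int width)
{
    const int n = static_cast<int>(lanes.size());
    assert(width > 0 && (width & (width - 1)) == 0); // width must be a power of 2
    assert(n % width == 0);

    // src_lane outside [0; width) is reduced modulo width.
    const int src = ((src_lane % width) + width) % width;

    std::vector<T> out(lanes.size());
    for (int i = 0; i < n; ++i) {
        const int base = i - i % width; // logical lane 0 of this lane's subsection
        out[i] = lanes[base + src];
    }
    return out;
}
```

With eight lanes holding 10 to 17 and a width of 4, a src_lane of 1 makes the first subsection read 11 and the second read 15; a src_lane of 5 gives the same result because 5 modulo 4 is 1.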
ROCPRIM_DEVICE ROCPRIM_INLINE T warp_shuffle_down (const T &input, const unsigned int delta, const int width=device_warp_size())
Shuffle down for any data type.
The i-th thread in the warp obtains input from the (i+delta)-th thread in the warp. If i+delta is not in the range [0; width), the thread's own input is returned.
Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2, or if it is greater than device_warp_size().
input | - input to pass to other threads |
delta | - offset for calculating source lane id |
width | - logical warp width |
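The boundary rule for warp_shuffle_down can be checked with a small host-side model. The sketch is not rocPRIM code: `model_warp_shuffle_down` is a hypothetical helper, and it assumes that i in the description is the logical lane id within a subsection of width lanes, as described for warp_shuffle.

```cpp
#include <cassert>
#include <vector>

// Host-side model only (hypothetical helper, not rocPRIM code).
// lanes[i] is the `input` held by lane i.
template <class T>
std::vector<T> model_warp_shuffle_down(const std::vector<T>& lanes,
                                       unsigned int          delta,
                                       int                   width)
{
    const int n = static_cast<int>(lanes.size());
    assert(width > 0 && (width & (width - 1)) == 0); // width must be a power of 2
    assert(n % width == 0);

    std::vector<T> out(lanes.size());
    for (int i = 0; i < n; ++i) {
        const unsigned int lane = static_cast<unsigned int>(i % width); // logical lane id
        // lane + delta stays inside [0; width) exactly when delta < width - lane.
        const bool in_range = delta < static_cast<unsigned int>(width) - lane;
        out[i] = in_range ? lanes[i + delta] : lanes[i]; // otherwise keep own input
    }
    return out;
}
```

With eight lanes holding 0 to 7, a delta of 1 and a width of 8 yield 1, 2, 3, 4, 5, 6, 7, 7: the last lane has no source and keeps its own value.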
ROCPRIM_DEVICE ROCPRIM_INLINE T warp_shuffle_up (const T &input, const unsigned int delta, const int width=device_warp_size())
Shuffle up for any data type.
The i-th thread in the warp obtains input from the (i-delta)-th thread in the warp. If i-delta is not in the range [0; width), the thread's own input is returned.
Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2, or if it is greater than device_warp_size().
input | - input to pass to other threads |
delta | - offset for calculating source lane id |
width | - logical warp width |
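A host-side model makes the warp_shuffle_up rule concrete and shows one common use, an inclusive prefix sum built from repeated shuffles. Both helpers below are hypothetical and are not rocPRIM code; the sketch assumes that i is the logical lane id within a subsection of width lanes, and the scan illustrates the general technique rather than how warp_scan is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only (hypothetical helper, not rocPRIM code).
// lanes[i] is the `input` held by lane i.
template <class T>
std::vector<T> model_warp_shuffle_up(const std::vector<T>& lanes,
                                     unsigned int          delta,
                                     int                   width)
{
    const int n = static_cast<int>(lanes.size());
    assert(width > 0 && (width & (width - 1)) == 0); // width must be a power of 2
    assert(n % width == 0);

    std::vector<T> out(lanes.size());
    for (int i = 0; i < n; ++i) {
        const unsigned int lane = static_cast<unsigned int>(i % width); // logical lane id
        // lane - delta stays inside [0; width) exactly when delta <= lane.
        out[i] = (delta <= lane) ? lanes[i - delta] : lanes[i];
    }
    return out;
}

// Inclusive prefix sum per logical warp: in each step a lane adds the value
// held delta lanes below it, with delta doubling every step.
inline std::vector<int> model_inclusive_sum(std::vector<int> lanes, int width)
{
    for (unsigned int d = 1; d < static_cast<unsigned int>(width); d *= 2) {
        const std::vector<int> up = model_warp_shuffle_up(lanes, d, width);
        for (std::size_t i = 0; i < lanes.size(); ++i) {
            if (i % width >= d) { // lanes without a source keep their value
                lanes[i] += up[i];
            }
        }
    }
    return lanes;
}
```

With four lanes holding 3, 1, 4, 1 and a width of 4, the scan model produces the running sums 3, 4, 8, 9 in two steps.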
ROCPRIM_DEVICE ROCPRIM_INLINE T warp_shuffle_xor (const T &input, const int lane_mask, const int width=device_warp_size())
Shuffle XOR for any data type.
The i-th thread in the warp obtains input from the (i^lane_mask)-th thread in the warp.
Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2, or if it is greater than device_warp_size().
input | - input to pass to other threads |
lane_mask | - mask used for calculating source lane id |
width | - logical warp width |
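The XOR pairing can be modeled on the host, together with the butterfly pattern it is often used for. Both helpers are hypothetical and are not rocPRIM code. The description above does not say what happens when i^lane_mask falls outside a logical warp, so the sketch restricts lane_mask to [0; width), which keeps every source inside the caller's own subsection.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only (hypothetical helper, not rocPRIM code).
// lanes[i] is the `input` held by lane i.
template <class T>
std::vector<T> model_warp_shuffle_xor(const std::vector<T>& lanes, int lane_mask, int width)
{
    const int n = static_cast<int>(lanes.size());
    assert(width > 0 && (width & (width - 1)) == 0); // width must be a power of 2
    assert(n % width == 0);
    // Assumption of this model: a mask below width only flips low bits of i,
    // so i ^ lane_mask never leaves the logical warp.
    assert(lane_mask >= 0 && lane_mask < width);

    std::vector<T> out(lanes.size());
    for (int i = 0; i < n; ++i) {
        out[i] = lanes[i ^ lane_mask];
    }
    return out;
}

// Butterfly all-reduce: after log2(width) exchange steps every lane of a
// logical warp holds the sum of that logical warp.
inline std::vector<int> model_butterfly_sum(std::vector<int> lanes, int width)
{
    for (int mask = width / 2; mask > 0; mask /= 2) {
        const std::vector<int> other = model_warp_shuffle_xor(lanes, mask, width);
        for (std::size_t i = 0; i < lanes.size(); ++i) {
            lanes[i] += other[i];
        }
    }
    return lanes;
}
```

With eight lanes holding 1 to 8, a lane_mask of 1 swaps neighbouring pairs, and the butterfly model leaves 36 in every lane for a width of 8, or 10 and 26 in the two halves for a width of 4.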