rocPRIM
Warp-wide

Classes

class  warp_reduce< T, WarpSize, UseAllReduce >
 The warp_reduce class is a warp-level parallel primitive that provides methods for performing reduction operations on items partitioned across threads in a hardware warp.
 
class  warp_scan< T, WarpSize >
 The warp_scan class is a warp-level parallel primitive that provides methods for performing inclusive and exclusive scan operations on items partitioned across threads in a hardware warp.
 
class  warp_sort< Key, WarpSize, Value >
 The warp_sort class provides warp-wide methods for computing a parallel sort of items across thread warps.
 

Functions

template<class T >
ROCPRIM_DEVICE ROCPRIM_INLINE T warp_shuffle (const T &input, const int src_lane, const int width=device_warp_size())
 Shuffle for any data type.
 
template<class T >
ROCPRIM_DEVICE ROCPRIM_INLINE T warp_shuffle_up (const T &input, const unsigned int delta, const int width=device_warp_size())
 Shuffle up for any data type.
 
template<class T >
ROCPRIM_DEVICE ROCPRIM_INLINE T warp_shuffle_down (const T &input, const unsigned int delta, const int width=device_warp_size())
 Shuffle down for any data type.
 
template<class T >
ROCPRIM_DEVICE ROCPRIM_INLINE T warp_shuffle_xor (const T &input, const int lane_mask, const int width=device_warp_size())
 Shuffle XOR for any data type.
 


Function Documentation

◆ warp_shuffle()

template<class T >
ROCPRIM_DEVICE ROCPRIM_INLINE T warp_shuffle (const T &input, const int src_lane, const int width=device_warp_size())

Shuffle for any data type.

Each thread in the warp obtains the input of the thread whose lane id is src_lane. If width is less than device_warp_size(), each width-sized subsection of the warp behaves as a separate entity with logical lane ids starting at 0. If src_lane is outside the range [0; width), the returned value is the input passed by the thread whose logical lane id is src_lane modulo width.

Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2, or it is greater than device_warp_size().

Parameters
input - input to pass to other threads
src_lane - lane id of the thread whose input should be returned
width - logical warp width
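
The subsection and modulo rules described above can be checked with a small host-side model. This is a sketch in plain C++, not rocPRIM code: `model_warp_shuffle` is a hypothetical helper, the size of the input vector stands in for device_warp_size(), and wrapping negative src_lane values is a choice made by the model rather than something the description states.

```cpp
#include <cassert>
#include <vector>

// Host-side model of the documented behaviour; this is NOT rocPRIM code.
// inputs[i] is the value lane i passes as `input`; the returned vector holds
// the value each lane would get back. inputs.size() stands in for the warp size.
template <class T>
std::vector<T> model_warp_shuffle(const std::vector<T>& inputs, int src_lane, int width)
{
    const int warp_size = static_cast<int>(inputs.size());
    // Documented precondition: width is a power of 2 no larger than the warp.
    assert(width > 0 && (width & (width - 1)) == 0 && width <= warp_size);
    // Model assumption: the warp splits evenly into width-sized subsections.
    assert(warp_size % width == 0);

    // src_lane wrapped into [0, width), i.e. "src_lane modulo width".
    // Negative values are wrapped too, which is a choice of this model.
    const int logical_src = ((src_lane % width) + width) % width;

    std::vector<T> out(inputs.size());
    for (int lane = 0; lane < warp_size; ++lane)
    {
        // First lane of the subsection this lane belongs to.
        const int subsection_start = lane - (lane % width);
        out[lane] = inputs[subsection_start + logical_src];
    }
    return out;
}
```

With eight lanes holding 10..17, `model_warp_shuffle(inputs, 1, 4)` gives 11 to lanes 0-3 and 15 to lanes 4-7, because each group of four lanes reads from its own logical lane 1.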

◆ warp_shuffle_down()

template<class T >
ROCPRIM_DEVICE ROCPRIM_INLINE T warp_shuffle_down (const T &input, const unsigned int delta, const int width=device_warp_size())

Shuffle down for any data type.

The i-th thread in the warp obtains the input of the (i+delta)-th thread. If i+delta is outside the range [0; width), the thread's own input is returned.

Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2, or it is greater than device_warp_size().

Parameters
input - input to pass to other threads
delta - offset for calculating source lane id
width - logical warp width
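
A host-side sketch of the shuffle-down rule, followed by the tree-style sum it is commonly used for. Both helpers (`model_warp_shuffle_down`, `model_sum_with_shuffle_down`) are hypothetical names, not rocPRIM functions. The model assumes that i is the logical lane id inside a width-sized subsection, which matches the warp_shuffle description but is not stated explicitly for this function, and it makes no claim about how warp_reduce is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model; NOT rocPRIM code. inputs[i] is lane i's `input`.
// Assumption: i in the description is the logical lane id within its
// width-sized subsection.
template <class T>
std::vector<T> model_warp_shuffle_down(const std::vector<T>& inputs,
                                       unsigned int delta, int width)
{
    const int warp_size = static_cast<int>(inputs.size());
    // Documented precondition: width is a power of 2 no larger than the warp.
    assert(width > 0 && (width & (width - 1)) == 0 && width <= warp_size);
    assert(warp_size % width == 0);

    std::vector<T> out(inputs.size());
    for (int lane = 0; lane < warp_size; ++lane)
    {
        const unsigned int logical = static_cast<unsigned int>(lane % width);
        // logical + delta < width, written so that it cannot overflow.
        const bool in_range = delta < static_cast<unsigned int>(width) - logical;
        out[lane] = in_range ? inputs[static_cast<std::size_t>(lane) + delta]
                             : inputs[lane]; // out of range: own input
    }
    return out;
}

// Tree-style sum built on the model above. After the loop, only logical
// lane 0 of each subsection is guaranteed to hold that subsection's sum.
template <class T>
std::vector<T> model_sum_with_shuffle_down(std::vector<T> values, int width)
{
    for (int offset = width / 2; offset > 0; offset /= 2)
    {
        const std::vector<T> shifted =
            model_warp_shuffle_down(values, static_cast<unsigned int>(offset), width);
        for (std::size_t lane = 0; lane < values.size(); ++lane)
            values[lane] += shifted[lane];
    }
    return values;
}
```

Lanes whose source falls outside the subsection add their own value, so their partial results are not meaningful; only the first lane of each subsection should be read after the loop.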

◆ warp_shuffle_up()

template<class T >
ROCPRIM_DEVICE ROCPRIM_INLINE T warp_shuffle_up (const T &input, const unsigned int delta, const int width=device_warp_size())

Shuffle up for any data type.

The i-th thread in the warp obtains the input of the (i-delta)-th thread. If i-delta is outside the range [0; width), the thread's own input is returned.

Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2, or it is greater than device_warp_size().

Parameters
input - input to pass to other threads
delta - offset for calculating source lane id
width - logical warp width
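
The shuffle-up rule can be modelled on the host in the same way, and it is the natural building block for an inclusive prefix sum. The sketch below uses hypothetical helpers (`model_warp_shuffle_up`, `model_inclusive_sum_with_shuffle_up`) and assumes that i is the logical lane id inside a width-sized subsection. It illustrates the pattern only and is not taken from the warp_scan implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model; NOT rocPRIM code. inputs[i] is lane i's `input`.
// Assumption: i in the description is the logical lane id within its
// width-sized subsection.
template <class T>
std::vector<T> model_warp_shuffle_up(const std::vector<T>& inputs,
                                     unsigned int delta, int width)
{
    const int warp_size = static_cast<int>(inputs.size());
    // Documented precondition: width is a power of 2 no larger than the warp.
    assert(width > 0 && (width & (width - 1)) == 0 && width <= warp_size);
    assert(warp_size % width == 0);

    std::vector<T> out(inputs.size());
    for (int lane = 0; lane < warp_size; ++lane)
    {
        const unsigned int logical = static_cast<unsigned int>(lane % width);
        // i - delta stays in [0, width) exactly when delta <= logical lane id.
        const bool in_range = delta <= logical;
        out[lane] = in_range ? inputs[static_cast<std::size_t>(lane) - delta]
                             : inputs[lane]; // out of range: own input
    }
    return out;
}

// Inclusive prefix sum per subsection, doubling the offset each step.
template <class T>
std::vector<T> model_inclusive_sum_with_shuffle_up(std::vector<T> values, int width)
{
    const int warp_size = static_cast<int>(values.size());
    for (int offset = 1; offset < width; offset *= 2)
    {
        const std::vector<T> shifted =
            model_warp_shuffle_up(values, static_cast<unsigned int>(offset), width);
        for (int lane = 0; lane < warp_size; ++lane)
        {
            // Lanes that received their own value back must not add it again.
            if (lane % width >= offset)
                values[lane] += shifted[lane];
        }
    }
    return values;
}
```

The guard `lane % width >= offset` is needed precisely because an out-of-range shuffle returns the thread's own input rather than a neutral value.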

◆ warp_shuffle_xor()

template<class T >
ROCPRIM_DEVICE ROCPRIM_INLINE T warp_shuffle_xor (const T &input, const int lane_mask, const int width=device_warp_size())

Shuffle XOR for any data type.

The i-th thread in the warp obtains the input of the (i^lane_mask)-th thread.

Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2, or it is greater than device_warp_size().

Parameters
input - input to pass to other threads
lane_mask - mask used for calculating source lane id
width - logical warp width
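
A host-side sketch of the XOR exchange and of the butterfly pattern it enables, in which every lane ends up holding the same total. The helpers `model_warp_shuffle_xor` and `model_butterfly_sum` are hypothetical, not rocPRIM functions. The model covers only the default case where width equals the warp size (taken here as the vector length) and requires lane_mask to lie in [0, warp size), so it says nothing about smaller widths or out-of-range sources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model; NOT rocPRIM code. Covers only width == warp size, with
// inputs.size() standing in for the warp size.
template <class T>
std::vector<T> model_warp_shuffle_xor(const std::vector<T>& inputs, int lane_mask)
{
    const int warp_size = static_cast<int>(inputs.size());
    // With a power-of-2 warp size and a mask below it, lane ^ lane_mask
    // always names a valid lane, so no out-of-range rule is needed.
    assert(warp_size > 0 && (warp_size & (warp_size - 1)) == 0);
    assert(lane_mask >= 0 && lane_mask < warp_size);

    std::vector<T> out(inputs.size());
    for (int lane = 0; lane < warp_size; ++lane)
        out[lane] = inputs[lane ^ lane_mask];
    return out;
}

// Butterfly sum: each step pairs lanes that differ in one bit of their id.
// After log2(warp size) steps every lane holds the sum of all inputs.
template <class T>
std::vector<T> model_butterfly_sum(std::vector<T> values)
{
    const int warp_size = static_cast<int>(values.size());
    for (int mask = warp_size / 2; mask > 0; mask /= 2)
    {
        const std::vector<T> partner = model_warp_shuffle_xor(values, mask);
        for (std::size_t lane = 0; lane < values.size(); ++lane)
            values[lane] += partner[lane];
    }
    return values;
}
```

Unlike a shuffle-down tree, where only the first lane holds the final result, the butterfly leaves the total in every lane, at the same number of shuffle steps.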