rocPRIM
Namespaces | |
detail | |
Deprecated: Configuration of device-level scan primitives. | |
Functions | |
ROCPRIM_DEVICE ROCPRIM_INLINE int | get_bit (int x, int i) |
Returns a single bit at 'i' from 'x'. | |
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int | bit_count (unsigned int x) |
Bit count. More... | |
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int | bit_count (unsigned long long x) |
Bit count. More... | |
ROCPRIM_HOST_DEVICE constexpr unsigned int | warp_size () |
[DEPRECATED] Returns the number of threads in a hardware warp. More... | |
ROCPRIM_HOST unsigned int | host_warp_size () |
Returns the number of threads in a hardware warp for the actual device. More... | |
ROCPRIM_DEVICE ROCPRIM_INLINE constexpr unsigned int | device_warp_size () |
Returns the number of threads in a hardware warp for the actual target. More... | |
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int | flat_block_size () |
Returns flat size of a multidimensional block (tile). | |
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int | flat_tile_size () |
Returns flat size of a multidimensional tile (block). | |
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int | lane_id () |
Returns thread identifier in a warp. | |
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int | flat_block_thread_id () |
Returns flat (linear, 1D) thread identifier in a multidimensional block (tile). More... | |
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int | flat_tile_thread_id () |
Returns flat (linear, 1D) thread identifier in a multidimensional tile (block). | |
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int | warp_id () |
Returns warp id in a block (tile). More... | |
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int | warp_id (unsigned int flat_id) |
Returns warp id in a block (tile), given the flat (linear, 1D) thread identifier in a multidimensional tile (block). More... | |
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int | flat_block_id () |
Returns flat (linear, 1D) block identifier in a multidimensional grid. | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | syncthreads () |
Synchronize all threads in a block (tile) | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | wave_barrier () |
Synchronize all threads in the wavefront. More... | |
ROCPRIM_DEVICE ROCPRIM_INLINE lane_mask_type | ballot (int predicate) |
Evaluate predicate for all active work-items in the warp and return an integer whose i-th bit is set if and only if predicate is true for the i-th thread of the warp and the i-th thread is active. More... | |
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int | masked_bit_count (lane_mask_type x, unsigned int add=0) |
Masked bit count. More... | |
Variables | |
const unsigned int | warp_id = hipThreadIdx_x / LogicalWarpSize |
Returns warp id in a block (tile). More... | |
ROCPRIM_DEVICE ROCPRIM_INLINE lane_mask_type ballot | ( | int | predicate | ) |
Evaluate predicate for all active work-items in the warp and return an integer whose i-th bit is set if and only if predicate is true for the i-th thread of the warp and the i-th thread is active.
predicate | - input to be evaluated for all active lanes |
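The bit layout described above can be illustrated with a small host-side model. This is only a sketch of the documented semantics, not the rocPRIM implementation: `lane_state` and `ballot_model` are hypothetical names, and a 64-bit mask is assumed as a stand-in for `lane_mask_type`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side stand-in for one lane of a warp.
struct lane_state
{
    bool active;    // whether the lane takes part in the call
    int  predicate; // value the lane would pass to ballot()
};

// Host-side model of the documented result: bit i is set if and only if
// lane i is active and its predicate is non-zero.
inline std::uint64_t ballot_model(const std::vector<lane_state>& lanes)
{
    assert(lanes.size() <= 64); // assumption: the mask holds at most 64 lanes
    std::uint64_t mask = 0;
    for(std::size_t i = 0; i < lanes.size(); ++i)
    {
        if(lanes[i].active && lanes[i].predicate != 0)
        {
            mask |= std::uint64_t{1} << i;
        }
    }
    return mask;
}
```

In this model an inactive lane never contributes a bit, even if its predicate would be true.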
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int bit_count | ( | unsigned int | x | ) |
Bit count.
Returns the number of bits set in x.
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int bit_count | ( | unsigned long long | x | ) |
Bit count.
Returns the number of bits set in x.
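As a rough illustration of what both bit_count overloads compute, the following host-side sketch counts set bits with a portable loop. The name `bit_count_model` is hypothetical, and the device functions may well use a hardware population-count instruction instead; only the result is meant to match.

```cpp
#include <cassert>
#include <climits>

// Host-side model of bit_count(unsigned long long): number of bits set in x.
inline unsigned int bit_count_model(unsigned long long x)
{
    unsigned int count = 0;
    while(x != 0)
    {
        x &= x - 1; // clear the lowest set bit
        ++count;
    }
    return count;
}

// Host-side model of bit_count(unsigned int), delegating to the wider overload.
inline unsigned int bit_count_model(unsigned int x)
{
    const unsigned int count = bit_count_model(static_cast<unsigned long long>(x));
    assert(count <= sizeof(unsigned int) * CHAR_BIT); // cannot exceed the type width
    return count;
}
```

Pass an argument with an explicit `u` or `ull` suffix so that overload resolution is unambiguous.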
ROCPRIM_DEVICE ROCPRIM_INLINE constexpr unsigned int device_warp_size | ( | ) |
Returns the number of threads in a hardware warp for the actual target.
On the device side, this constant is available at compile time.
It is constant for a device.
ROCPRIM_DEVICE ROCPRIM_INLINE auto flat_block_thread_id | ( | ) |
Returns flat (linear, 1D) thread identifier in a multidimensional block (tile).
Use template parameters to optimize 1D or 2D kernels.
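The flattening described above can be sketched on the host as follows. The sketch assumes the usual layout in which x varies fastest, then y, then z. `dim3_model`, `flat_block_size_model` and `flat_block_thread_id_model` are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for a 3D index or extent.
struct dim3_model
{
    unsigned int x, y, z;
};

// Flat size of a multidimensional block: product of its extents.
inline unsigned int flat_block_size_model(dim3_model block_dim)
{
    return block_dim.x * block_dim.y * block_dim.z;
}

// Flat (linear, 1D) thread id, assuming x is the fastest-varying dimension.
inline unsigned int flat_block_thread_id_model(dim3_model thread_idx, dim3_model block_dim)
{
    assert(thread_idx.x < block_dim.x);
    assert(thread_idx.y < block_dim.y);
    assert(thread_idx.z < block_dim.z);
    return (thread_idx.z * block_dim.y + thread_idx.y) * block_dim.x + thread_idx.x;
}
```

For a 1D block the expression reduces to `thread_idx.x`, which is presumably the kind of simplification the template parameters allow.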
ROCPRIM_HOST inline unsigned int host_warp_size | ( | ) |
Returns the number of threads in a hardware warp for the actual device.
On the host side, this value is available at runtime only.
It is constant for a device.
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int masked_bit_count | ( | lane_mask_type | x, unsigned int | add = 0 | ) |
Masked bit count.
For each thread, this function returns the number of active threads that come before the current thread and have their i-th bit of x set, where i is the index of that thread in the warp.
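A host-side sketch of this counting rule is shown below. It rests on three assumptions that the text above does not state: the mask is 64 bits wide, the current lane index is passed explicitly (on the device it is implicit), and `add` is an offset added to the count. `masked_bit_count_model` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model: counts the bits of x that belong to lanes below `lane`,
// then adds `add` (assumed here to be a plain offset).
inline unsigned int masked_bit_count_model(std::uint64_t x, unsigned int lane, unsigned int add = 0)
{
    assert(lane < 64); // assumption: 64-bit lane mask
    const std::uint64_t lower_lanes = (std::uint64_t{1} << lane) - 1; // bits 0..lane-1
    std::uint64_t bits  = x & lower_lanes;
    unsigned int  count = add;
    while(bits != 0)
    {
        bits &= bits - 1; // clear the lowest set bit
        ++count;
    }
    return count;
}
```

When x is a mask of lanes whose predicate is true, the result gives each such lane a distinct, densely packed slot index, which is a common building block for compaction.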
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int warp_id | ( | ) |
Returns warp id in a block (tile).
Use template parameters to optimize 1D or 2D kernels.
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int warp_id | ( | unsigned int | flat_id | ) |
Returns warp id in a block (tile), given the flat (linear, 1D) thread identifier in a multidimensional tile (block).
flat_id | - the flat id that should be used to compute the warp id. |
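As an illustration, the mapping from a flat thread id to a warp id can be modelled on the host as an integer division by the warp size. This is a sketch under that assumption, with the warp size passed in explicitly. `warp_id_model` is a hypothetical name.

```cpp
#include <cassert>

// Host-side model: warp id of the thread with the given flat id,
// assuming consecutive flat ids are grouped into warps of `warp_size` threads.
inline unsigned int warp_id_model(unsigned int flat_id, unsigned int warp_size)
{
    assert(warp_size != 0);
    return flat_id / warp_size;
}
```

With a warp size of 64, flat ids 0 to 63 map to warp 0 and flat id 64 starts warp 1. With a warp size of 32, flat id 100 falls in warp 3.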
ROCPRIM_HOST_DEVICE inline constexpr unsigned int warp_size | ( | ) |
[DEPRECATED] Returns the number of threads in a hardware warp.
It is constant for a device. This function is not supported for the gfx1030 architecture and will be removed in a future release. Use host_warp_size() and device_warp_size() instead.
ROCPRIM_DEVICE ROCPRIM_INLINE void wave_barrier | ( | ) |
Synchronize all threads in the wavefront.
Wait for all threads in the wavefront before continuing execution. This function guarantees memory ordering between threads in the same wavefront. Threads can communicate by storing to global or shared memory, executing wave_barrier(), and then reading values stored by other threads in the same wavefront.
wave_barrier() should be executed by all threads in the wavefront in convergence. This means that if the function is executed inside a conditional statement, all threads in the wavefront must enter that conditional statement.
const unsigned int warp_id = hipThreadIdx_x / LogicalWarpSize |
Returns warp id in a block (tile).
Use template parameters to optimize 1D or 2D kernels.