rocPRIM
Intrinsics module

Namespaces

 detail
 Deprecated: Configuration of device-level scan primitives.
 

Functions

ROCPRIM_DEVICE ROCPRIM_INLINE int get_bit (int x, int i)
 Returns a single bit at 'i' from 'x'.
 
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int bit_count (unsigned int x)
 Bit count.
 
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int bit_count (unsigned long long x)
 Bit count.
 
ROCPRIM_HOST_DEVICE constexpr unsigned int warp_size ()
 [DEPRECATED] Returns the number of threads in a hardware warp.
 
ROCPRIM_HOST unsigned int host_warp_size ()
 Returns the number of threads in a hardware warp for the actual device.
 
ROCPRIM_DEVICE ROCPRIM_INLINE constexpr unsigned int device_warp_size ()
 Returns the number of threads in a hardware warp for the actual target.
 
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int flat_block_size ()
 Returns flat size of a multidimensional block (tile).
 
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int flat_tile_size ()
 Returns flat size of a multidimensional tile (block).
 
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int lane_id ()
 Returns thread identifier in a warp.
 
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int flat_block_thread_id ()
 Returns flat (linear, 1D) thread identifier in a multidimensional block (tile).
 
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int flat_tile_thread_id ()
 Returns flat (linear, 1D) thread identifier in a multidimensional tile (block).
 
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int warp_id ()
 Returns warp id in a block (tile).
 
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int warp_id (unsigned int flat_id)
 Returns warp id in a block (tile), given the flat (linear, 1D) thread identifier in a multidimensional tile (block).
 
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int flat_block_id ()
 Returns flat (linear, 1D) block identifier in a multidimensional grid.
 
ROCPRIM_DEVICE ROCPRIM_INLINE void syncthreads ()
 Synchronize all threads in a block (tile).
 
ROCPRIM_DEVICE ROCPRIM_INLINE void wave_barrier ()
 Synchronize all threads in the wavefront.
 
ROCPRIM_DEVICE ROCPRIM_INLINE lane_mask_type ballot (int predicate)
 Evaluates the predicate for all active work-items in the warp and returns an integer whose i-th bit is set if and only if the predicate is true for the i-th thread of the warp and that thread is active.
 
ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int masked_bit_count (lane_mask_type x, unsigned int add=0)
 Masked bit count.
 

Variables

const unsigned int warp_id = hipThreadIdx_x / LogicalWarpSize
 Returns warp id in a block (tile).
 

Detailed Description

Function Documentation

◆ ballot()

ROCPRIM_DEVICE ROCPRIM_INLINE lane_mask_type ballot ( int  predicate)

Evaluates the predicate for all active work-items in the warp and returns an integer whose i-th bit is set if and only if the predicate is true for the i-th thread of the warp and that thread is active.

Parameters
predicate - input to be evaluated for all active lanes
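The bit layout described above can be illustrated with a small host-side model. This is only a sketch of the documented semantics, not the rocPRIM implementation: `ballot_model`, the `lane_mask` alias (a stand-in for `lane_mask_type`, assumed here to be 64 bits wide), and the explicit `active` mask are all hypothetical names introduced for illustration. On the device, the set of active lanes is implicit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side model only; this is not the rocPRIM implementation.
// lane_mask is a hypothetical stand-in for rocPRIM's lane_mask_type.
using lane_mask = std::uint64_t;

// predicates[i] is the value lane i would pass to ballot().
// Bit i of `active` is set if lane i is active when the call is made.
lane_mask ballot_model(const std::vector<int>& predicates, lane_mask active)
{
    assert(predicates.size() <= 64); // one result bit per lane
    lane_mask result = 0;
    for(std::size_t i = 0; i < predicates.size(); ++i)
    {
        const bool lane_active = ((active >> i) & 1u) != 0;
        // Bit i is set only if lane i is active AND its predicate is non-zero.
        if(lane_active && predicates[i] != 0)
        {
            result |= lane_mask{1} << i;
        }
    }
    return result;
}
```

With predicates `{1, 0, 1, 1}` and all four lanes active, the model sets bits 0, 2 and 3. If lane 3 is inactive, its bit stays clear even though its predicate is true.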

◆ bit_count() [1/2]

ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int bit_count ( unsigned int  x)

Bit count.

Returns the number of bits set in x.

◆ bit_count() [2/2]

ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int bit_count ( unsigned long long  x)

Bit count.

Returns the number of bits set in x.
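As a reference for what both overloads compute, here is a portable host-side population count. `bit_count_model` is a hypothetical name for this sketch. It is not how rocPRIM implements `bit_count()`, since a device implementation would normally map to a hardware instruction.

```cpp
// Host-side model only; this is not the rocPRIM implementation.
// Counts the bits set in x. An unsigned int argument converts
// implicitly, so one function models both documented overloads.
unsigned int bit_count_model(unsigned long long x)
{
    unsigned int count = 0;
    while(x != 0)
    {
        x &= x - 1; // clears the lowest set bit
        ++count;
    }
    return count;
}
```

The loop runs once per set bit, so an input of zero returns immediately with a count of 0.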

◆ device_warp_size()

ROCPRIM_DEVICE ROCPRIM_INLINE constexpr unsigned int device_warp_size ( )

Returns the number of threads in a hardware warp for the actual target.

On the device side, this constant is available at compile time.

It is constant for a device.

◆ flat_block_thread_id()

ROCPRIM_DEVICE ROCPRIM_INLINE auto flat_block_thread_id ( )

Returns flat (linear, 1D) thread identifier in a multidimensional block (tile).

Use template parameters to optimize 1D or 2D kernels.
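The following host-side sketch shows what "flat" means here. It assumes the conventional linearization in which x varies fastest, then y, then z; this page does not state the formula, so treat it as an assumption. `dim3_model`, `flat_thread_id_model` and `flat_block_size_model` are hypothetical names, not rocPRIM APIs.

```cpp
#include <cassert>

// Host-side model only; this is not the rocPRIM implementation.
// dim3_model is a hypothetical stand-in for a 3D index or extent.
struct dim3_model
{
    unsigned int x, y, z;
};

// Flat size of a block: the total number of threads it contains.
unsigned int flat_block_size_model(dim3_model block_dim)
{
    return block_dim.x * block_dim.y * block_dim.z;
}

// Flat thread id, assuming x varies fastest, then y, then z.
unsigned int flat_thread_id_model(dim3_model tid, dim3_model block_dim)
{
    assert(tid.x < block_dim.x && tid.y < block_dim.y && tid.z < block_dim.z);
    return tid.x + tid.y * block_dim.x + tid.z * block_dim.x * block_dim.y;
}
```

Under this assumption, thread (3, 2, 1) in an 8 x 4 x 2 block has flat id 3 + 2*8 + 1*8*4 = 51, and the last thread (7, 3, 1) has flat id 63, one less than the flat block size.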

◆ host_warp_size()

ROCPRIM_HOST unsigned int host_warp_size ( )
inline

Returns the number of threads in a hardware warp for the actual device.

On the host side, this value is available at runtime only.

It is constant for a device.

◆ masked_bit_count()

ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int masked_bit_count ( lane_mask_type  x,
unsigned int  add = 0 
)

Masked bit count.

For each thread, this function returns the number of active threads i that come before the current thread and have the i-th bit of x set.
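A host-side sketch of this per-lane computation follows. The names `masked_bit_count_model` and `lane_mask` and the explicit `lane` argument are hypothetical; on the device the lane index is implicit. The sketch also assumes that `add` is an offset added to the count, which this page does not state.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model only; this is not the rocPRIM implementation.
// lane_mask is a hypothetical stand-in for rocPRIM's lane_mask_type.
using lane_mask = std::uint64_t;

// Counts the bits of x at positions below `lane`, i.e. the lanes that
// come before the current one and have their bit set in x.
// Assumption: `add` is simply added to the resulting count.
unsigned int masked_bit_count_model(lane_mask x, unsigned int lane, unsigned int add = 0)
{
    assert(lane < 64);
    const lane_mask below = (lane_mask{1} << lane) - 1; // bits 0 .. lane-1
    lane_mask masked = x & below;
    unsigned int count = 0;
    while(masked != 0)
    {
        masked &= masked - 1; // clears the lowest set bit
        ++count;
    }
    return add + count;
}
```

When x is a ballot result, the returned value gives each lane its rank among the lanes whose predicate was true, which is the usual building block for warp-level stream compaction.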

◆ warp_id() [1/2]

ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int warp_id ( )

Returns warp id in a block (tile).

Use template parameters to optimize 1D or 2D kernels.

◆ warp_id() [2/2]

ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int warp_id ( unsigned int  flat_id)

Returns warp id in a block (tile), given the flat (linear, 1D) thread identifier in a multidimensional tile (block).

Parameters
flat_id - the flat id to use when computing the warp id.
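The mapping from a flat thread id to a warp id can be sketched on the host as an integer division by the warp size, in line with the `hipThreadIdx_x / LogicalWarpSize` expression shown for the `warp_id` variable on this page. `warp_id_model` and its explicit `warp_size` parameter are hypothetical; the real function takes only `flat_id`.

```cpp
#include <cassert>

// Host-side model only; this is not the rocPRIM implementation.
// Threads with consecutive flat ids are grouped into warps of
// `warp_size` threads, so the warp id is the integer quotient.
unsigned int warp_id_model(unsigned int flat_id, unsigned int warp_size)
{
    assert(warp_size != 0);
    return flat_id / warp_size;
}
```

For a warp size of 64, flat ids 0 through 63 map to warp 0 and flat id 64 starts warp 1. For a warp size of 32, flat id 70 falls in warp 2.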

◆ warp_size()

ROCPRIM_HOST_DEVICE constexpr unsigned int warp_size ( )
inline

[DEPRECATED] Returns the number of threads in a hardware warp.

It is constant for a device. This function is not supported for the gfx1030 architecture and will be removed in a future release. Please use the new host_warp_size() and device_warp_size() functions.

◆ wave_barrier()

ROCPRIM_DEVICE ROCPRIM_INLINE void wave_barrier ( )

Synchronize all threads in the wavefront.

Wait for all threads in the wavefront before continuing execution. Memory ordering is guaranteed by this function between threads in the same wavefront. Threads can communicate by storing to global / shared memory, executing wave_barrier() then reading values stored by other threads in the same wavefront.

wave_barrier() should be executed by all threads in the wavefront in convergence. This means that if the function is executed inside a conditional statement, all threads in the wave must enter that conditional statement.

Note
On SIMT architectures all lanes come to a convergence point simultaneously, thus no special instruction is needed in the ISA.

Variable Documentation

◆ warp_id

const unsigned int warp_id = hipThreadIdx_x / LogicalWarpSize

Returns warp id in a block (tile).

Use template parameters to optimize 1D or 2D kernels.