rocPRIM
warp_load< T, ItemsPerThread, WarpSize, Method > Class Template Reference

The warp_load class is a warp-level parallel primitive that provides methods for loading data from contiguous memory into a blocked arrangement of items across a warp.

#include <warp_load.hpp>

Public Types

using storage_type = typename ::rocprim::detail::empty_storage_type
 Struct used to allocate the temporary memory required for thread communication during operations provided by the related parallel primitive.
 

Public Member Functions

template<class InputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void load (InputIterator input, T(&items)[ItemsPerThread], storage_type &)
 Loads data from contiguous memory into an arrangement of items across the warp.
 
template<class InputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void load (InputIterator input, T(&items)[ItemsPerThread], unsigned int valid, storage_type &)
 Loads data from contiguous memory into an arrangement of items across the warp.
 
template<class InputIterator , class Default >
ROCPRIM_DEVICE ROCPRIM_INLINE void load (InputIterator input, T(&items)[ItemsPerThread], unsigned int valid, Default out_of_bounds, storage_type &)
 Loads data from contiguous memory into an arrangement of items across the warp.
 

Detailed Description

template<class T, unsigned int ItemsPerThread, unsigned int WarpSize = ::rocprim::device_warp_size(), warp_load_method Method = warp_load_method::warp_load_direct>
class warp_load< T, ItemsPerThread, WarpSize, Method >

The warp_load class is a warp-level parallel primitive that provides methods for loading data from contiguous memory into a blocked arrangement of items across a warp.

Template Parameters
T - the input/output type.
ItemsPerThread - the number of items to be processed by each thread.
WarpSize - the number of threads in the warp. It must be a divisor of the kernel block size.
Method - the method to load data.
Overview
Example:

In this example, a load operation is performed on a warp of 8 threads, using type int and 4 items per thread.

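__global__ void example_kernel(int * input, ...)
{
    constexpr unsigned int threads_per_block = 128;
    constexpr unsigned int threads_per_warp  = 8;
    constexpr unsigned int items_per_thread  = 4;
    constexpr unsigned int warps_per_block   = threads_per_block / threads_per_warp;
    // Load method; warp_load_direct is the default of the class template.
    constexpr auto load_method = rocprim::warp_load_method::warp_load_direct;

    // Index of this thread's warp within the block.
    const unsigned int warp_id = hipThreadIdx_x / threads_per_warp;
    // Offset of the first item read by this warp.
    const int offset = hipBlockIdx_x * threads_per_block * items_per_thread
                       + warp_id * threads_per_warp * items_per_thread;

    using warp_load_int = rocprim::warp_load<int, items_per_thread, threads_per_warp, load_method>;
    // One storage object per warp, passed as the trailing storage_type & argument of load().
    __shared__ typename warp_load_int::storage_type storage[warps_per_block];

    int items[items_per_thread];
    warp_load_int().load(input + offset, items, storage[warp_id]);
    ...
}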
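To make the index arithmetic of the example concrete, the following host-side sketch emulates a blocked arrangement on the CPU. It is an illustration only, not rocPRIM's implementation: `emulate_blocked_load` and `warp_offset` are hypothetical helper names, and the sketch assumes that in a blocked arrangement each lane holds `ItemsPerThread` consecutive elements.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation of a blocked arrangement (assumption: lane `lane_id`
// receives ItemsPerThread consecutive elements starting at
// input[lane_id * ItemsPerThread]). Hypothetical helper, not a rocPRIM API.
template<class T, unsigned int ItemsPerThread>
std::array<T, ItemsPerThread> emulate_blocked_load(const T* input, unsigned int lane_id)
{
    std::array<T, ItemsPerThread> items{};
    for(unsigned int i = 0; i < ItemsPerThread; ++i)
    {
        items[i] = input[lane_id * ItemsPerThread + i];
    }
    return items;
}

// Offset of the first element read by a given warp, following the same
// arithmetic as `offset` in the example kernel. Hypothetical helper.
constexpr unsigned int warp_offset(unsigned int block_id,
                                   unsigned int warp_id,
                                   unsigned int threads_per_block,
                                   unsigned int threads_per_warp,
                                   unsigned int items_per_thread)
{
    return block_id * threads_per_block * items_per_thread
           + warp_id * threads_per_warp * items_per_thread;
}
```

With 8 threads per warp and 4 items per thread, each warp covers 32 consecutive elements, and lane 2 of a warp holds the elements at positions 8 to 11 relative to that warp's offset.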

Member Typedef Documentation

◆ storage_type

template<class T , unsigned int ItemsPerThread, unsigned int WarpSize = ::rocprim::device_warp_size(), warp_load_method Method = warp_load_method::warp_load_direct>
using warp_load< T, ItemsPerThread, WarpSize, Method >::storage_type = typename ::rocprim::detail::empty_storage_type

Struct used to allocate the temporary memory required for thread communication during operations provided by the related parallel primitive.

Depending on the implementation, the operations exposed by a parallel primitive may require temporary storage for thread communication. The storage should be allocated using the keyword __shared__. It can be aliased to externally allocated memory, or be part of a union with other storage types to increase shared memory reusability.
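The union technique mentioned above can be sketched in plain C++. The two structs below are hypothetical stand-ins for the storage types of two different primitives (for the default method documented here, storage_type is an alias of an empty type, so the sizes are purely illustrative). The sketch only shows why a union reduces the footprint; in a kernel the union object would additionally be declared with `__shared__`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the storage_type of two different primitives.
struct load_storage_sketch
{
    int buffer[32];
};

struct scan_storage_sketch
{
    float buffer[64];
};

// Both members occupy the same bytes, so the two phases can reuse one
// allocation as long as they are never in use at the same time.
union shared_storage_sketch
{
    load_storage_sketch load;
    scan_storage_sketch scan;
};

constexpr std::size_t separate_bytes
    = sizeof(load_storage_sketch) + sizeof(scan_storage_sketch);
constexpr std::size_t unioned_bytes = sizeof(shared_storage_sketch);

// The union is as large as its largest member, not the sum of both.
static_assert(unioned_bytes
                  == std::max(sizeof(load_storage_sketch), sizeof(scan_storage_sketch)),
              "union size should match its largest member");
```

When storage is shared this way, a synchronization point between the two phases is generally needed so that one primitive does not overwrite data another is still reading.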

Member Function Documentation

◆ load() [1/3]

template<class T , unsigned int ItemsPerThread, unsigned int WarpSize = ::rocprim::device_warp_size(), warp_load_method Method = warp_load_method::warp_load_direct>
template<class InputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void warp_load< T, ItemsPerThread, WarpSize, Method >::load (InputIterator input, T(&items)[ItemsPerThread], storage_type &)

Loads data from contiguous memory into an arrangement of items across the warp.

Template Parameters
InputIterator - [inferred] an iterator type for input (can be a simple pointer).
Parameters
[in] input - the input iterator to load from.
[out] items - array that data is loaded to.
[in] (unnamed storage_type parameter) - temporary storage for inputs.
Overview
  • The type T must be such that an object of type InputIterator can be dereferenced and then implicitly converted to T.
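The type requirement above can be checked at compile time. The following is a minimal sketch using the standard type traits; `is_loadable_as_v` is a hypothetical name introduced here for illustration and is not part of rocPRIM.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// True when dereferencing an InputIterator yields something implicitly
// convertible to T. Hypothetical helper, not a rocPRIM API.
template<class T, class InputIterator>
constexpr bool is_loadable_as_v
    = std::is_convertible<decltype(*std::declval<InputIterator&>()), T>::value;

// A type with no conversion to int, used as a negative example.
struct opaque_item
{
};

static_assert(is_loadable_as_v<int, const int*>, "int can be loaded from const int*");
static_assert(is_loadable_as_v<double, const float*>, "float converts implicitly to double");
static_assert(!is_loadable_as_v<int, const opaque_item*>, "opaque_item does not convert to int");
```

A check like this could be placed next to a kernel to produce a readable error message when the iterator's value type and T do not match.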

◆ load() [2/3]

template<class T , unsigned int ItemsPerThread, unsigned int WarpSize = ::rocprim::device_warp_size(), warp_load_method Method = warp_load_method::warp_load_direct>
template<class InputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void warp_load< T, ItemsPerThread, WarpSize, Method >::load (InputIterator input, T(&items)[ItemsPerThread], unsigned int valid, storage_type &)

Loads data from contiguous memory into an arrangement of items across the warp.

Template Parameters
InputIterator - [inferred] an iterator type for input (can be a simple pointer).
Parameters
[in] input - the input iterator to load from.
[out] items - array that data is loaded to.
[in] valid - the number of valid items available from input; only items whose offset from input is below valid are loaded.
[in] (unnamed storage_type parameter) - temporary storage for inputs.
Overview
  • The type T must be such that an object of type InputIterator can be dereferenced and then implicitly converted to T.
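The effect of the valid parameter can be illustrated with a host-side sketch. It assumes a blocked arrangement and assumes that items at or beyond valid are simply skipped and keep their previous value. `emulate_guarded_load` is a hypothetical helper, not rocPRIM's implementation.

```cpp
#include <cassert>

// Host-side emulation of a guarded blocked load for one lane.
// Assumption: elements whose index is >= valid are not read, and the
// corresponding entries of `items` are left unchanged.
template<class T, unsigned int ItemsPerThread>
void emulate_guarded_load(const T* input,
                          T (&items)[ItemsPerThread],
                          unsigned int lane_id,
                          unsigned int valid)
{
    for(unsigned int i = 0; i < ItemsPerThread; ++i)
    {
        const unsigned int index = lane_id * ItemsPerThread + i;
        if(index < valid)
        {
            items[i] = input[index];
        }
    }
}
```

With 4 items per thread and valid equal to 10, lane 2 would read only the elements at indices 8 and 9, so callers that need defined values in the remaining slots should initialize items beforehand.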

◆ load() [3/3]

template<class T , unsigned int ItemsPerThread, unsigned int WarpSize = ::rocprim::device_warp_size(), warp_load_method Method = warp_load_method::warp_load_direct>
template<class InputIterator , class Default >
ROCPRIM_DEVICE ROCPRIM_INLINE void warp_load< T, ItemsPerThread, WarpSize, Method >::load (InputIterator input, T(&items)[ItemsPerThread], unsigned int valid, Default out_of_bounds, storage_type &)

Loads data from contiguous memory into an arrangement of items across the warp.

Template Parameters
InputIterator - [inferred] an iterator type for input (can be a simple pointer).
Parameters
[in] input - the input iterator to load from.
[out] items - array that data is loaded to.
[in] valid - the number of valid items available from input; only items whose offset from input is below valid are loaded.
[in] out_of_bounds - default value assigned to out-of-bound items.
[in] (unnamed storage_type parameter) - temporary storage for inputs.
Overview
  • The type T must be such that an object of type InputIterator can be dereferenced and then implicitly converted to T.
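The out_of_bounds variant can be sketched on the host in the same way. The sketch assumes a blocked arrangement, and that every item at or beyond valid receives out_of_bounds converted to T. `emulate_guarded_load_with_default` is a hypothetical helper, not rocPRIM's implementation.

```cpp
#include <cassert>

// Host-side emulation of a guarded blocked load with a default value.
// Assumption: entries whose index is >= valid are filled with
// `out_of_bounds` converted to T instead of being read from `input`.
template<class T, unsigned int ItemsPerThread, class Default>
void emulate_guarded_load_with_default(const T* input,
                                       T (&items)[ItemsPerThread],
                                       unsigned int lane_id,
                                       unsigned int valid,
                                       Default out_of_bounds)
{
    for(unsigned int i = 0; i < ItemsPerThread; ++i)
    {
        const unsigned int index = lane_id * ItemsPerThread + i;
        items[i] = index < valid ? input[index] : static_cast<T>(out_of_bounds);
    }
}
```

Choosing the identity of a later operation as the default (for example 0 before a sum) lets subsequent code treat every slot uniformly without re-checking valid.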
