rocPRIM
warp_store< T, ItemsPerThread, WarpSize, Method > Class Template Reference

The warp_store class is a warp-level parallel primitive that provides methods for storing an arrangement of items held across a warp into a blocked or striped arrangement in contiguous memory.

#include <warp_store.hpp>

Public Types

using storage_type = typename ::rocprim::detail::empty_storage_type
 Struct used to allocate the temporary memory required for thread communication during the operations provided by the related parallel primitive.
 

Public Member Functions

template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void store (OutputIterator output, T(&items)[ItemsPerThread], storage_type &)
 Stores an arrangement of items held across the warp into contiguous memory.
 
template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void store (OutputIterator output, T(&items)[ItemsPerThread], unsigned int valid, storage_type &)
 Stores an arrangement of items held across the warp into contiguous memory, using temporary storage and guarded by valid so that only the first valid output positions are written.
 

Detailed Description

template<class T, unsigned int ItemsPerThread, unsigned int WarpSize = ::rocprim::device_warp_size(), warp_store_method Method = warp_store_method::warp_store_direct>
class warp_store< T, ItemsPerThread, WarpSize, Method >

The warp_store class is a warp-level parallel primitive that provides methods for storing an arrangement of items held across a warp into a blocked or striped arrangement in contiguous memory.
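To make "blocked" and "striped" concrete, the following host-side C++ sketch computes the offset, relative to the warp's output pointer, at which each item lands. It assumes the usual rocPRIM/hipCUB meaning of the two terms; the function names are hypothetical and are not part of rocPRIM.

```cpp
// Host-side sketch, not rocPRIM code (assumed meaning of blocked/striped).

// Blocked arrangement: each lane owns items_per_thread consecutive elements.
constexpr unsigned int blocked_offset(unsigned int lane, unsigned int item,
                                      unsigned int items_per_thread)
{
    return lane * items_per_thread + item;
}

// Striped arrangement: item `item` of every lane forms one contiguous
// stripe of warp_size elements.
constexpr unsigned int striped_offset(unsigned int lane, unsigned int item,
                                      unsigned int warp_size)
{
    return item * warp_size + lane;
}

// Checks that a layout writes every position of the warp's tile
// (warp_size * items_per_thread elements) exactly once.
// Limited to tiles of at most 256 items to keep the sketch simple.
template <bool Striped>
constexpr bool covers_tile_once(unsigned int warp_size, unsigned int items_per_thread)
{
    bool seen[256] = {};
    const unsigned int tile = warp_size * items_per_thread;
    if (tile > 256)
        return false;
    for (unsigned int lane = 0; lane < warp_size; ++lane)
    {
        for (unsigned int item = 0; item < items_per_thread; ++item)
        {
            const unsigned int pos = Striped ? striped_offset(lane, item, warp_size)
                                             : blocked_offset(lane, item, items_per_thread);
            if (pos >= tile || seen[pos])
                return false;
            seen[pos] = true;
        }
    }
    return true;
}
```

With 8 lanes and 4 items per thread, item 2 of lane 1 lands at offset 6 in the blocked arrangement and at offset 17 in the striped one; both layouts are permutations of the same contiguous tile.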

Template Parameters
T - the type of the items to store.
ItemsPerThread - the number of items processed by each thread.
WarpSize - the number of threads in a warp. It must be a divisor of the kernel block size.
Method - the method used to store data.
Overview
Example:

In this example, a store operation is performed by a warp of 8 threads, using type int and 4 items per thread.

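#include <hip/hip_runtime.h>
#include <rocprim/rocprim.hpp>

__global__ void example_kernel(int* output /* , ... */)
{
    constexpr unsigned int threads_per_block = 128;
    constexpr unsigned int threads_per_warp = 8;
    constexpr unsigned int items_per_thread = 4;
    constexpr unsigned int warps_per_block = threads_per_block / threads_per_warp;
    // Index of this thread's logical warp within the block.
    const unsigned int warp_id = threadIdx.x / threads_per_warp;
    // Start of the tile this warp writes: skip earlier blocks, then earlier warps.
    const unsigned int offset = blockIdx.x * threads_per_block * items_per_thread
                              + warp_id * threads_per_warp * items_per_thread;

    using warp_store_int = rocprim::warp_store<int, items_per_thread, threads_per_warp,
                                               rocprim::warp_store_method::warp_store_direct>;
    // store() takes a storage_type reference: one instance per logical warp.
    __shared__ warp_store_int::storage_type storage[warps_per_block];

    int items[items_per_thread];
    // ... fill items ...
    warp_store_int w_store;
    w_store.store(output + offset, items, storage[warp_id]);
    // ...
}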
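The per-warp offset in the example kernel is plain integer arithmetic, so it can be checked on the host. The sketch below restates that formula as a constexpr function (the function names are hypothetical) and confirms that consecutive warps write adjacent, non-overlapping tiles.

```cpp
// Host-side sketch of the offset formula from the example kernel.
// Parameter names mirror the kernel's constants; the functions are hypothetical.
constexpr unsigned int warp_output_offset(unsigned int block_id, unsigned int warp_id,
                                          unsigned int threads_per_block,
                                          unsigned int threads_per_warp,
                                          unsigned int items_per_thread)
{
    // Skip the tiles of all earlier blocks, then of earlier warps in this block.
    return block_id * threads_per_block * items_per_thread
         + warp_id * threads_per_warp * items_per_thread;
}

// One past the last element written by the given warp.
constexpr unsigned int warp_output_end(unsigned int block_id, unsigned int warp_id,
                                       unsigned int threads_per_block,
                                       unsigned int threads_per_warp,
                                       unsigned int items_per_thread)
{
    return warp_output_offset(block_id, warp_id, threads_per_block,
                              threads_per_warp, items_per_thread)
         + threads_per_warp * items_per_thread;
}
```

With 128 threads per block, 8 threads per warp and 4 items per thread, warp 3 of block 1 starts at 1 * 512 + 3 * 32 = 608, and the last warp of block 0 (warp 15) ends exactly where block 1 begins, at 512.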

Member Typedef Documentation

◆ storage_type

template<class T , unsigned int ItemsPerThread, unsigned int WarpSize = ::rocprim::device_warp_size(), warp_store_method Method = warp_store_method::warp_store_direct>
using warp_store< T, ItemsPerThread, WarpSize, Method >::storage_type = typename ::rocprim::detail::empty_storage_type

Struct used to allocate the temporary memory required for thread communication during the operations provided by the related parallel primitive.

Depending on the implementation, the operations exposed by the parallel primitive may require temporary storage for thread communication. The storage should be allocated with the __shared__ keyword. It can be aliased to externally allocated memory, or be part of a union with other storage types to increase shared memory reuse.
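The union option can be illustrated with a host-side C++ analogy. In device code the union would be declared __shared__; here the two storage structs are hypothetical stand-ins, not rocPRIM types, and the empty one is only assumed to mirror an empty storage_type such as the one documented on this page.

```cpp
// Hypothetical storage types for two primitives used one after the other in
// a kernel; neither is a rocPRIM type.
struct load_storage_example
{
    int buffer[64]; // arbitrary scratch space for illustration
};

struct store_storage_example
{
    // Intentionally empty (assumed to mirror an empty storage_type).
};

// Union members overlap in memory, so the union needs only as many bytes as
// its largest member rather than the sum of both.
union shared_storage_example
{
    load_storage_example load;
    store_storage_example store;
};
```

Because the members share bytes, the first primitive must be finished with its storage (and the threads synchronized) before the second one starts using it.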

Member Function Documentation

◆ store() [1/2]

template<class T , unsigned int ItemsPerThread, unsigned int WarpSize = ::rocprim::device_warp_size(), warp_store_method Method = warp_store_method::warp_store_direct>
template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void warp_store< T, ItemsPerThread, WarpSize, Method >::store (OutputIterator output, T(&items)[ItemsPerThread], storage_type &)

Stores an arrangement of items held across the warp into contiguous memory.

Template Parameters
OutputIterator - [inferred] an iterator type for output (can be a simple pointer).
Parameters
[out] output - the output iterator to store to.
[in] items - the array the stored data is read from.
Overview
  • The type T must be such that an object of type OutputIterator can be dereferenced and then implicitly assigned from T.
Storage reuse
A synchronization barrier (__syncthreads() or rocprim::syncthreads()) should be placed before the storage is reused or repurposed.

◆ store() [2/2]

template<class T , unsigned int ItemsPerThread, unsigned int WarpSize = ::rocprim::device_warp_size(), warp_store_method Method = warp_store_method::warp_store_direct>
template<class OutputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void warp_store< T, ItemsPerThread, WarpSize, Method >::store (OutputIterator output, T(&items)[ItemsPerThread], unsigned int valid, storage_type &)

Stores an arrangement of items held across the warp into contiguous memory, using temporary storage and guarded by valid so that only the first valid output positions are written.

Template Parameters
OutputIterator - [inferred] an iterator type for output (can be a simple pointer).
Parameters
[out] output - the output iterator to store to.
[in] items - the array the stored data is read from.
[in] valid - the number of items to write, counted from output; items that would land at or beyond output + valid are not written.
Overview
  • The type T must be such that an object of type OutputIterator can be dereferenced and then implicitly assigned from T.
Storage reuse
A synchronization barrier (__syncthreads() or rocprim::syncthreads()) should be placed before the storage is reused or repurposed.
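The effect of the valid guard can be emulated on the host. This sketch assumes the blocked arrangement of the default warp_store_direct method, where item `item` of lane `lane` goes to offset `lane * items_per_thread + item` and is written only if that offset is below valid; a striped method would presumably apply the same bound to its own striped offset. The function names and the sample item values are hypothetical.

```cpp
// Host-side sketch of the guarded store, not rocPRIM code.
// Assumes a blocked arrangement: offset = lane * items_per_thread + item.
constexpr bool is_written(unsigned int lane, unsigned int item,
                          unsigned int items_per_thread, unsigned int valid)
{
    return lane * items_per_thread + item < valid;
}

// Emulates the guarded store into a tile pre-filled with `sentinel` and
// returns the value left at position `pos`. Lane `lane` holds the values
// lane * 100 + item, an arbitrary choice that keeps results recognisable.
// Limited to tiles of at most 256 items.
constexpr int value_after_guarded_store(unsigned int pos, unsigned int warp_size,
                                        unsigned int items_per_thread,
                                        unsigned int valid, int sentinel)
{
    int out[256] = {};
    for (unsigned int k = 0; k < 256; ++k)
        out[k] = sentinel; // memory the warp does not write keeps this value
    if (warp_size * items_per_thread > 256 || pos >= 256)
        return sentinel;
    for (unsigned int lane = 0; lane < warp_size; ++lane)
    {
        for (unsigned int item = 0; item < items_per_thread; ++item)
        {
            if (is_written(lane, item, items_per_thread, valid))
                out[lane * items_per_thread + item] = static_cast<int>(lane * 100 + item);
        }
    }
    return out[pos];
}
```

For a warp of 8 lanes with 4 items each and valid = 10, position 9 receives item 1 of lane 2 (value 201), while position 10 and everything after it keep their previous contents.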
