The warp_load class is a warp-level parallel primitive that provides methods for loading data from contiguous memory into a blocked arrangement of items across a warp.
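In a blocked arrangement, each lane holds a run of consecutive input items. The host-side C++ sketch below computes which input position ends up in each lane's items array, assuming the usual definition that lane l receives positions l * ItemsPerThread through l * ItemsPerThread + ItemsPerThread - 1. The helpers blocked_input_index and blocked_layout are hypothetical and not part of rocPRIM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: input position (relative to the warp's input iterator)
// that lands in items[item] of lane `lane` under a blocked arrangement.
std::size_t blocked_input_index(unsigned lane, unsigned item, unsigned items_per_thread)
{
    assert(item < items_per_thread);
    return static_cast<std::size_t>(lane) * items_per_thread + item;
}

// Hypothetical helper: layout[lane][item] holds the input position that
// lane `lane` receives in items[item].
std::vector<std::vector<std::size_t>> blocked_layout(unsigned warp_size,
                                                     unsigned items_per_thread)
{
    std::vector<std::vector<std::size_t>> layout(
        warp_size, std::vector<std::size_t>(items_per_thread));
    for (unsigned lane = 0; lane < warp_size; ++lane)
        for (unsigned item = 0; item < items_per_thread; ++item)
            layout[lane][item] = blocked_input_index(lane, item, items_per_thread);
    return layout;
}
```

For a warp of 4 lanes with 2 items per thread, lane 0 holds positions 0 and 1, lane 1 holds 2 and 3, and lane 3 holds 6 and 7.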
#include <warp_load.hpp>
using storage_type = typename ::rocprim::detail::empty_storage_type
    Struct used to allocate the temporary memory required for thread communication during operations provided by the related parallel primitive.
template<class InputIterator>
ROCPRIM_DEVICE ROCPRIM_INLINE void load(InputIterator input, T(&items)[ItemsPerThread], storage_type&)
    Loads data from contiguous memory into an arrangement of items across the warp.

template<class InputIterator>
ROCPRIM_DEVICE ROCPRIM_INLINE void load(InputIterator input, T(&items)[ItemsPerThread], unsigned int valid, storage_type&)
    Loads data from contiguous memory into an arrangement of items across the warp, reading at most valid items.

template<class InputIterator, class Default>
ROCPRIM_DEVICE ROCPRIM_INLINE void load(InputIterator input, T(&items)[ItemsPerThread], unsigned int valid, Default out_of_bounds, storage_type&)
    Loads data from contiguous memory into an arrangement of items across the warp, reading at most valid items and assigning out_of_bounds to the remaining items.
template<class T, unsigned int ItemsPerThread, unsigned int WarpSize = ::rocprim::device_warp_size(), warp_load_method Method = warp_load_method::warp_load_direct>
class warp_load< T, ItemsPerThread, WarpSize, Method >
- Template Parameters
  - T - the input/output type.
  - ItemsPerThread - the number of items to be processed by each thread.
  - WarpSize - the number of threads in the warp. It must be a divisor of the kernel block size.
  - Method - the method used to load data (a warp_load_method value).
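- Overview
  - The warp_load class supports several methods for loading data. The method is selected with the Method template parameter, whose default is warp_load_method::warp_load_direct.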
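- Example:
In the example, a load operation is performed by logical warps of 8 threads, using type int and 4 items per thread. Every load overload takes a storage_type argument, so each logical warp gets its own storage object in shared memory.
#include <hip/hip_runtime.h>
#include <rocprim/rocprim.hpp>

__global__ void example_kernel(int* input /*, ... */)
{
    constexpr unsigned int threads_per_block = 128;
    constexpr unsigned int threads_per_warp  = 8;
    constexpr unsigned int items_per_thread  = 4;
    constexpr unsigned int warps_per_block   = threads_per_block / threads_per_warp;
    // Any warp_load_method value can be selected here.
    constexpr rocprim::warp_load_method load_method
        = rocprim::warp_load_method::warp_load_direct;

    using warp_load_int
        = rocprim::warp_load<int, items_per_thread, threads_per_warp, load_method>;
    // One temporary storage object per logical warp in the block.
    __shared__ typename warp_load_int::storage_type storage[warps_per_block];

    const unsigned int warp_id = threadIdx.x / threads_per_warp;
    // Start of the tile of threads_per_warp * items_per_thread items owned by this warp.
    const unsigned int offset = blockIdx.x * threads_per_block * items_per_thread
                              + warp_id * threads_per_warp * items_per_thread;

    int items[items_per_thread];
    warp_load_int().load(input + offset, items, storage[warp_id]);
    // ...
}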
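The offset arithmetic in the example can be checked on the host. The plain C++ sketch below reuses the example's constants and assumes that a thread's lane within its logical warp is threadIdx.x % threads_per_warp and that each warp's tile is filled in a blocked arrangement. Under those assumptions it confirms that every input position in a block's tile of threads_per_block * items_per_thread items is read exactly once. The functions input_position and read_counts are hypothetical helpers, not rocPRIM functions.

```cpp
#include <cassert>
#include <vector>

// Constants copied from the example kernel.
constexpr unsigned threads_per_block = 128;
constexpr unsigned threads_per_warp  = 8;
constexpr unsigned items_per_thread  = 4;

// Hypothetical helper: input position read into items[item] by thread
// `thread` of block `block`, assuming lane = thread % threads_per_warp and a
// blocked arrangement inside each warp's tile.
unsigned input_position(unsigned block, unsigned thread, unsigned item)
{
    assert(thread < threads_per_block && item < items_per_thread);
    const unsigned warp_id = thread / threads_per_warp;
    const unsigned lane    = thread % threads_per_warp;
    // Same formula as `offset` in the kernel.
    const unsigned offset  = block * threads_per_block * items_per_thread
                           + warp_id * threads_per_warp * items_per_thread;
    return offset + lane * items_per_thread + item;
}

// Hypothetical helper: how many times each input position is read when
// `blocks` thread blocks run the example.
std::vector<unsigned> read_counts(unsigned blocks)
{
    std::vector<unsigned> counts(blocks * threads_per_block * items_per_thread, 0);
    for (unsigned b = 0; b < blocks; ++b)
        for (unsigned t = 0; t < threads_per_block; ++t)
            for (unsigned i = 0; i < items_per_thread; ++i)
                ++counts[input_position(b, t, i)];
    return counts;
}
```

Thread 9 of block 0, for instance, belongs to warp 1 as lane 1, so its items[2] comes from position 32 + 4 + 2 = 38, which equals 9 * 4 + 2.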
◆ storage_type
template<class T , unsigned int ItemsPerThread, unsigned int WarpSize = ::rocprim::device_warp_size(), warp_load_method Method = warp_load_method::warp_load_direct>
using warp_load< T, ItemsPerThread, WarpSize, Method >::storage_type = typename ::rocprim::detail::empty_storage_type
Struct used to allocate a temporary memory that is required for thread communication during operations provided by related parallel primitive.
Depending on the implementation, the operations exposed by the parallel primitive may require temporary storage for thread communication. The storage should be allocated using the __shared__ keyword. It can be aliased to externally allocated memory, or be part of a union with other storage types to increase shared memory reuse.
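The memory saving from placing storage in a union can be illustrated with plain C++. The two structs below are hypothetical stand-ins for the storage_type of two primitives used one after the other in the same kernel; in device code the union would be declared __shared__, as described above. Because union members overlap, the union occupies the size of its largest member rather than the sum of both.

```cpp
// Hypothetical stand-ins, not real rocPRIM storage types.
struct load_storage_stand_in   { int exchange[32]; };
struct reduce_storage_stand_in { int partials[8]; };

// Both members start at the same address, so only one of them is in use at
// a time; the union is as large as its largest member.
union shared_storage_stand_in
{
    load_storage_stand_in   load;
    reduce_storage_stand_in reduce;
};

static_assert(sizeof(shared_storage_stand_in) == sizeof(load_storage_stand_in),
              "the union is sized by its largest member");
```

Since the members alias the same memory, threads generally need to be synchronized between the operation that finishes with one member and the operation that starts using another.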
◆ load() [1/3]
template<class T , unsigned int ItemsPerThread, unsigned int WarpSize = ::rocprim::device_warp_size(), warp_load_method Method = warp_load_method::warp_load_direct>
template<class InputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void warp_load< T, ItemsPerThread, WarpSize, Method >::load(
    InputIterator input,
    T(&items)[ItemsPerThread],
    storage_type&)
inline
Loads data from continuous memory into an arrangement of items across the warp.
- Template Parameters
  - InputIterator - [inferred] an iterator type for input (can be a simple pointer).
- Parameters
  - [in] input - the input iterator to load from.
  - [out] items - array that data is loaded to.
  - [in] (unnamed storage_type&) - temporary storage used for thread communication.
- Overview
  - The type T must be such that an object of type InputIterator can be dereferenced and then implicitly converted to T.
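As a host-side illustration of this requirement, the sketch below models what one lane receives from a full load, assuming the blocked arrangement of warp_load_direct. model_load_full is a hypothetical helper, not rocPRIM's implementation, and because it indexes input with [] it assumes a random-access iterator such as a pointer. Loading short values into int items works because each dereferenced short is implicitly converted to int.

```cpp
#include <cassert>

// Hypothetical host-side model of one lane's view of a full warp load with a
// blocked arrangement (not rocPRIM code). Lane `lane` receives the
// ItemsPerThread consecutive elements starting at lane * ItemsPerThread.
template<unsigned WarpSize, class InputIterator, class T, unsigned ItemsPerThread>
void model_load_full(InputIterator input, unsigned lane, T (&items)[ItemsPerThread])
{
    assert(lane < WarpSize);
    for (unsigned i = 0; i < ItemsPerThread; ++i)
    {
        // The dereferenced element is implicitly converted to T on assignment.
        items[i] = input[lane * ItemsPerThread + i];
    }
}
```

With a warp of 4 lanes and 2 items per thread, lane 3 reads input positions 6 and 7.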
◆ load() [2/3]
template<class T , unsigned int ItemsPerThread, unsigned int WarpSize = ::rocprim::device_warp_size(), warp_load_method Method = warp_load_method::warp_load_direct>
template<class InputIterator >
ROCPRIM_DEVICE ROCPRIM_INLINE void warp_load< T, ItemsPerThread, WarpSize, Method >::load(
    InputIterator input,
    T(&items)[ItemsPerThread],
    unsigned int valid,
    storage_type&)
inline
Loads data from continuous memory into an arrangement of items across the warp.
- Template Parameters
  - InputIterator - [inferred] an iterator type for input (can be a simple pointer).
- Parameters
  - [in] input - the input iterator to load from.
  - [out] items - array that data is loaded to.
  - [in] valid - maximum number of valid items to load, counted from input; positions at or beyond valid are not read.
  - [in] (unnamed storage_type&) - temporary storage used for thread communication.
- Overview
  - The type T must be such that an object of type InputIterator can be dereferenced and then implicitly converted to T.
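The effect of valid can be modeled on the host for a single lane, again assuming the blocked arrangement of warp_load_direct. In the sketch below (model_load_valid is a hypothetical helper, not rocPRIM's implementation), a position is read only if it lies below valid; items for the remaining positions are simply not written.

```cpp
#include <cassert>

// Hypothetical host-side model of one lane's view of a partial load with a
// blocked arrangement (not rocPRIM code). Positions at or beyond `valid`
// are not read, and the corresponding items are left unchanged.
template<unsigned WarpSize, class InputIterator, class T, unsigned ItemsPerThread>
void model_load_valid(InputIterator input, unsigned lane,
                      T (&items)[ItemsPerThread], unsigned valid)
{
    assert(lane < WarpSize);
    for (unsigned i = 0; i < ItemsPerThread; ++i)
    {
        const unsigned pos = lane * ItemsPerThread + i;  // position relative to input
        if (pos < valid)
        {
            items[i] = input[pos];
        }
    }
}
```

Because the items beyond valid are not assigned, callers should not rely on their contents; the overload that takes out_of_bounds gives them a defined value instead.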
◆ load() [3/3]
template<class T , unsigned int ItemsPerThread, unsigned int WarpSize = ::rocprim::device_warp_size(), warp_load_method Method = warp_load_method::warp_load_direct>
template<class InputIterator , class Default >
ROCPRIM_DEVICE ROCPRIM_INLINE void warp_load< T, ItemsPerThread, WarpSize, Method >::load(
    InputIterator input,
    T(&items)[ItemsPerThread],
    unsigned int valid,
    Default out_of_bounds,
    storage_type&)
inline
Loads data from continuous memory into an arrangement of items across the warp.
- Template Parameters
  - InputIterator - [inferred] an iterator type for input (can be a simple pointer).
- Parameters
  - [in] input - the input iterator to load from.
  - [out] items - array that data is loaded to.
  - [in] valid - maximum number of valid items to load, counted from input; positions at or beyond valid are not read.
  - [in] out_of_bounds - default value assigned to out-of-bounds items.
  - [in] (unnamed storage_type&) - temporary storage used for thread communication.
- Overview
  - The type T must be such that an object of type InputIterator can be dereferenced and then implicitly converted to T.
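With out_of_bounds, every item receives a defined value. The host-side sketch below models one lane under the blocked arrangement of warp_load_direct: positions below valid are read from input, and every other item is set to out_of_bounds converted to T. model_load_default is a hypothetical helper, not rocPRIM's implementation.

```cpp
#include <cassert>

// Hypothetical host-side model of one lane's view of a partial load with a
// default value (not rocPRIM code). In-range positions are read from input;
// every other item receives out_of_bounds converted to T.
template<unsigned WarpSize, class InputIterator, class T, unsigned ItemsPerThread,
         class Default>
void model_load_default(InputIterator input, unsigned lane, T (&items)[ItemsPerThread],
                        unsigned valid, Default out_of_bounds)
{
    assert(lane < WarpSize);
    for (unsigned i = 0; i < ItemsPerThread; ++i)
    {
        const unsigned pos = lane * ItemsPerThread + i;  // position relative to input
        items[i] = pos < valid ? static_cast<T>(input[pos]) : static_cast<T>(out_of_bounds);
    }
}
```

With 3 valid items and 4 items per thread, lane 0 receives 3 input values followed by the default, and lane 1 receives only the default.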