rocPRIM
The block_scan class is a block level parallel primitive which provides methods for performing inclusive and exclusive scan operations of items partitioned across threads in a block.
#include <rocprim/block/block_scan.hpp>
Public Types
using storage_type = typename base_type::storage_type
Struct used to allocate the temporary memory required for thread communication during operations provided by the related parallel primitive.
Public Member Functions
template<class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void inclusive_scan(T input, T &output, storage_type &storage, BinaryFunction scan_op = BinaryFunction())
Performs inclusive scan across threads in a block.
template<class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void inclusive_scan(T input, T &output, BinaryFunction scan_op = BinaryFunction())
This is an overloaded member function, provided for convenience. It differs from the above function only in that it does not take a storage argument.
template<class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void inclusive_scan(T input, T &output, T &reduction, storage_type &storage, BinaryFunction scan_op = BinaryFunction())
Performs inclusive scan and reduction across threads in a block.
template<class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void inclusive_scan(T input, T &output, T &reduction, BinaryFunction scan_op = BinaryFunction())
This is an overloaded member function, provided for convenience. It differs from the above function only in that it does not take a storage argument.
template<class PrefixCallback, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void inclusive_scan(T input, T &output, storage_type &storage, PrefixCallback &prefix_callback_op, BinaryFunction scan_op)
Performs inclusive scan across threads in a block, using prefix_callback_op to generate a prefix value for the whole block.
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void inclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], storage_type &storage, BinaryFunction scan_op = BinaryFunction())
Performs inclusive scan across threads in a block.
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void inclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], BinaryFunction scan_op = BinaryFunction())
This is an overloaded member function, provided for convenience. It differs from the above function only in that it does not take a storage argument.
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void inclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], T &reduction, storage_type &storage, BinaryFunction scan_op = BinaryFunction())
Performs inclusive scan and reduction across threads in a block.
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void inclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], T &reduction, BinaryFunction scan_op = BinaryFunction())
This is an overloaded member function, provided for convenience. It differs from the above function only in that it does not take a storage argument.
template<unsigned int ItemsPerThread, class PrefixCallback, class BinaryFunction>
ROCPRIM_DEVICE ROCPRIM_INLINE void inclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], storage_type &storage, PrefixCallback &prefix_callback_op, BinaryFunction scan_op)
Performs inclusive scan across threads in a block, using prefix_callback_op to generate a prefix value for the whole block.
template<class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void exclusive_scan(T input, T &output, T init, storage_type &storage, BinaryFunction scan_op = BinaryFunction())
Performs exclusive scan across threads in a block.
template<class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void exclusive_scan(T input, T &output, T init, BinaryFunction scan_op = BinaryFunction())
This is an overloaded member function, provided for convenience. It differs from the above function only in that it does not take a storage argument.
template<class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void exclusive_scan(T input, T &output, T init, T &reduction, storage_type &storage, BinaryFunction scan_op = BinaryFunction())
Performs exclusive scan and reduction across threads in a block.
template<class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void exclusive_scan(T input, T &output, T init, T &reduction, BinaryFunction scan_op = BinaryFunction())
This is an overloaded member function, provided for convenience. It differs from the above function only in that it does not take a storage argument.
template<class PrefixCallback, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void exclusive_scan(T input, T &output, storage_type &storage, PrefixCallback &prefix_callback_op, BinaryFunction scan_op)
Performs exclusive scan across threads in a block, using prefix_callback_op to generate a prefix value for the whole block.
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void exclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], T init, storage_type &storage, BinaryFunction scan_op = BinaryFunction())
Performs exclusive scan across threads in a block.
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void exclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], T init, BinaryFunction scan_op = BinaryFunction())
This is an overloaded member function, provided for convenience. It differs from the above function only in that it does not take a storage argument.
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void exclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], T init, T &reduction, storage_type &storage, BinaryFunction scan_op = BinaryFunction())
Performs exclusive scan and reduction across threads in a block.
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void exclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], T init, T &reduction, BinaryFunction scan_op = BinaryFunction())
This is an overloaded member function, provided for convenience. It differs from the above function only in that it does not take a storage argument.
template<unsigned int ItemsPerThread, class PrefixCallback, class BinaryFunction>
ROCPRIM_DEVICE ROCPRIM_INLINE void exclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], storage_type &storage, PrefixCallback &prefix_callback_op, BinaryFunction scan_op)
Performs exclusive scan across threads in a block, using prefix_callback_op to generate a prefix value for the whole block.
The block_scan class is a block level parallel primitive which provides methods for performing inclusive and exclusive scan operations of items partitioned across threads in a block.
T | - the input/output type.
BlockSizeX | - the number of threads in a block's x dimension.
Algorithm | - selected scan algorithm, block_scan_algorithm::default_algorithm by default.
BlockSizeY | - the number of threads in a block's y dimension, defaults to 1.
BlockSizeZ | - the number of threads in a block's z dimension, defaults to 1.
Computation can be more efficient when ItemsPerThread is greater than one and T is an arithmetic type.
block_scan has two alternative implementations: block_scan_algorithm::using_warp_scan and block_scan_algorithm::reduce_then_scan.
In the examples, the scan operation is performed on a block of 192 threads, each of which provides one int value. The result is returned in the same variable that held the input.
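The host-side C++ sketch below illustrates the semantics only, not rocPRIM's parallel algorithm. It models what each of the 192 threads receives from the default inclusive_scan. It assumes every thread contributes 1 and uses std::plus<int> in place of the documented default rocprim::plus<int>. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Sequential reference model of a block-wide inclusive scan. Element i of
// `values` stands in for thread i's variable; results overwrite the inputs,
// mirroring the example's use of one variable for both input and output.
// This models the result only, not rocPRIM's parallel algorithm.
template <class T, class BinaryOp = std::plus<T>>
void reference_block_inclusive_scan(std::vector<T>& values, BinaryOp op = BinaryOp())
{
    for (std::size_t i = 1; i < values.size(); ++i)
    {
        values[i] = op(values[i - 1], values[i]);
    }
}

// Hypothetical helper: every thread of a `block_size`-thread block contributes 1.
std::vector<int> inclusive_sum_of_ones(std::size_t block_size)
{
    std::vector<int> values(block_size, 1);
    reference_block_inclusive_scan(values);  // default operator: std::plus<int>
    assert(values.empty() || values.back() == static_cast<int>(block_size));
    return values;
}
```

Under these assumptions, thread i ends up holding i + 1, so the last of the 192 threads holds 192.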
using block_scan<T, BlockSizeX, Algorithm, BlockSizeY, BlockSizeZ>::storage_type = typename base_type::storage_type
Struct used to allocate the temporary memory required for thread communication during operations provided by the related parallel primitive.
Depending on the implementation, the operations exposed by the parallel primitive may require temporary storage for thread communication. The storage should be allocated using the __shared__ keyword. It can be aliased to externally allocated memory, or be part of a union with other storage types to increase shared memory reuse.
template<class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void exclusive_scan(T input, T &output, T init, storage_type &storage, BinaryFunction scan_op = BinaryFunction()) [inline]
Performs exclusive scan across threads in a block.
BinaryFunction | - type of binary function used for the scan. Default type is rocprim::plus<T>.
[in] | input | - thread input value.
[out] | output | - reference to the thread output value. May be aliased with input.
[in] | storage | - reference to a temporary storage object of type storage_type.
[in] | init | - initial value used to start the exclusive scan. Should be the same for all threads in a block.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
Storage reuse: a synchronization barrier (__syncthreads() or rocprim::syncthreads()) should be placed before storage is reused or repurposed.
The examples perform an exclusive min scan on a block of 256 threads, each of which provides one float value. If the input values across threads in the block are {1, -2, 3, -4, ..., 255, -256} and init is 0, then the output values will be {0, 0, -2, -2, -4, ..., -254, -254}.
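The host-side sketch below reproduces the example values sequentially. It can serve as a reference when checking device results in a unit test. Index i stands for thread i. alternating_input and reference_exclusive_min_scan are hypothetical helpers, not rocPRIM API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Builds {1, -2, 3, -4, ...}: odd magnitudes positive, even magnitudes negative.
std::vector<float> alternating_input(int n)
{
    assert(n >= 0);
    std::vector<float> input(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i)
    {
        int magnitude = i + 1;
        input[static_cast<std::size_t>(i)] =
            static_cast<float>(magnitude % 2 == 1 ? magnitude : -magnitude);
    }
    return input;
}

// Reference model of the exclusive min scan: thread i receives
// min(init, input[0], ..., input[i - 1]), so thread 0 receives init itself.
std::vector<float> reference_exclusive_min_scan(const std::vector<float>& input, float init)
{
    std::vector<float> output(input.size());
    float running = init;
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        output[i] = running;                    // value before input[i] is folded in
        running = std::min(running, input[i]);
    }
    return output;
}
```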
template<class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void exclusive_scan(T input, T &output, T init, BinaryFunction scan_op = BinaryFunction()) [inline]
This is an overloaded member function, provided for convenience. It differs from the above function only in that it does not take a storage argument.
Performs exclusive scan across threads in a block.
BinaryFunction | - type of binary function used for the scan. Default type is rocprim::plus<T>.
[in] | input | - thread input value.
[out] | output | - reference to the thread output value. May be aliased with input.
[in] | init | - initial value used to start the exclusive scan. Should be the same for all threads in a block.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
template<class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void exclusive_scan(T input, T &output, T init, T &reduction, storage_type &storage, BinaryFunction scan_op = BinaryFunction()) [inline]
Performs exclusive scan and reduction across threads in a block.
BinaryFunction | - type of binary function used for the scan. Default type is rocprim::plus<T>.
[in] | input | - thread input value.
[out] | output | - reference to the thread output value. May be aliased with input.
[in] | init | - initial value used to start the exclusive scan. Should be the same for all threads in a block.
[out] | reduction | - result of reducing all input values in the block.
[in] | storage | - reference to a temporary storage object of type storage_type.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
Storage reuse: a synchronization barrier (__syncthreads() or rocprim::syncthreads()) should be placed before storage is reused or repurposed.
The examples perform an exclusive min scan on a block of 256 threads, each of which provides one float value. If the input values across threads in the block are {1, -2, 3, -4, ..., 255, -256} and init is 0, then the output values will be {0, 0, -2, -2, -4, ..., -254, -254} and the reduction will be -256.
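The sketch below is a sequential model of the reduction overload. It assumes, following the parameter description, that reduction folds only the input values. In this example init is 0, which is larger than the block minimum, so the documented reduction of -256 is the same whether or not init were included. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct exclusive_result
{
    std::vector<float> output;
    float reduction;
};

// Reference model of exclusive_scan(input, output, init, reduction, storage, scan_op)
// with a min operator. `reduction` is taken to be the reduction of the input values
// alone (init is not folded into it), per the parameter description.
exclusive_result reference_exclusive_min_scan_with_reduction(const std::vector<float>& input,
                                                             float init)
{
    assert(!input.empty());
    exclusive_result result{std::vector<float>(input.size()), input[0]};
    float running = init;
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        result.output[i] = running;                          // exclusive prefix
        running = std::min(running, input[i]);
        result.reduction = std::min(result.reduction, input[i]);
    }
    return result;
}
```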
template<class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void exclusive_scan(T input, T &output, T init, T &reduction, BinaryFunction scan_op = BinaryFunction()) [inline]
This is an overloaded member function, provided for convenience. It differs from the above function only in that it does not take a storage argument.
Performs exclusive scan and reduction across threads in a block.
BinaryFunction | - type of binary function used for the scan. Default type is rocprim::plus<T>.
[in] | input | - thread input value.
[out] | output | - reference to the thread output value. May be aliased with input.
[in] | init | - initial value used to start the exclusive scan. Should be the same for all threads in a block.
[out] | reduction | - result of reducing all input values in the block.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
template<class PrefixCallback, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void exclusive_scan(T input, T &output, storage_type &storage, PrefixCallback &prefix_callback_op, BinaryFunction scan_op) [inline]
Performs exclusive scan across threads in a block, using prefix_callback_op to generate a prefix value for the whole block.
PrefixCallback | - type of the unary function object used for generating the block-wide prefix value for the scan operation.
BinaryFunction | - type of binary function used for the scan. Default type is rocprim::plus<T>.
[in] | input | - thread input value.
[out] | output | - reference to the thread output value. May be aliased with input.
[in] | storage | - reference to a temporary storage object of type storage_type.
[in,out] | prefix_callback_op | - function object for generating the block prefix value. Its signature should be equivalent to T f(const T &block_reduction). The argument need not be taken by const &, but the function object must not modify the object passed to it. The object is called by the first warp of the block with the block-wide reduction of the input values as its argument, and the value returned in the first thread is used as the block-wide prefix.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
Storage reuse: a synchronization barrier (__syncthreads() or rocprim::syncthreads()) should be placed before storage is reused or repurposed.
The examples perform an exclusive prefix sum on a block of 256 threads, each of which provides one int value. If the input values across threads in the block are {1, 1, 1, ..., 1} and the callback returns a block prefix of 10, then the output values will be {10, 11, 12, 13, ..., 265}, and the callback's prefix will be 266.
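The callback semantics can be modelled on the host as sketched below. running_prefix_op is a hypothetical stateful functor, not part of rocPRIM. It returns its current total as the block prefix and then adds the block reduction to that total. The starting value of 10 is inferred from the first output in the example. On the device the callback is invoked by the first warp and only the first thread's result is used. This sequential sketch simply calls it once per block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stateful prefix functor (not part of rocPRIM). Called with the
// block-wide reduction, it returns the prefix for the current block and then
// advances its running total for the next block.
struct running_prefix_op
{
    int prefix;

    int operator()(int block_reduction)
    {
        int old_prefix = prefix;
        prefix += block_reduction;
        return old_prefix;
    }
};

// Sequential model of exclusive_scan(input, output, storage, prefix_callback_op, plus):
// thread i receives block_prefix + input[0] + ... + input[i - 1].
std::vector<int> reference_exclusive_sum_with_prefix(const std::vector<int>& input,
                                                     running_prefix_op& prefix_op)
{
    int block_reduction = 0;
    for (int v : input)
    {
        block_reduction += v;
    }
    const int block_prefix = prefix_op(block_reduction);  // invoked once per block here

    std::vector<int> output(input.size());
    int running = block_prefix;
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        output[i] = running;  // exclusive: store before adding input[i]
        running += input[i];
    }
    assert(running == prefix_op.prefix);  // prefix + block total == updated prefix
    return output;
}
```

Calling reference_exclusive_sum_with_prefix again with the same functor starts the next block at 266. This is how such a callback carries a running total across consecutive tiles.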
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void exclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], T init, storage_type &storage, BinaryFunction scan_op = BinaryFunction()) [inline]
Performs exclusive scan across threads in a block.
ItemsPerThread | - number of items in the input array.
BinaryFunction | - type of binary function used for the scan. Default type is rocprim::plus<T>.
[in] | input | - reference to an array containing the thread input values.
[out] | output | - reference to the thread output array. May be aliased with input.
[in] | init | - initial value used to start the exclusive scan. Should be the same for all threads in a block.
[in] | storage | - reference to a temporary storage object of type storage_type.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
Storage reuse: a synchronization barrier (__syncthreads() or rocprim::syncthreads()) should be placed before storage is reused or repurposed.
The examples perform an exclusive maximum scan on a block of 128 threads, each of which provides two long values. If the input values across threads in the block are {-1, 2, -3, 4, ..., -255, 256} and init is 0, then the output values will be {0, 0, 2, 2, 4, ..., 254, 254}.
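The sketch below models the ItemsPerThread overloads on the host. It assumes a blocked arrangement, where thread t holds consecutive items 2t and 2t + 1 of the sequence listed in the example. That arrangement is an assumption made for illustration. The alias thread_items and the function name are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t items_per_thread = 2;
using thread_items = std::array<long, items_per_thread>;  // one "thread's" items

// Reference model of the ItemsPerThread exclusive overload, assuming a blocked
// arrangement: thread t owns flat items t * items_per_thread and t * items_per_thread + 1.
// The exclusive max scan seeded with `init` runs over that flat order.
std::vector<thread_items> reference_exclusive_max_scan(const std::vector<thread_items>& input,
                                                       long init)
{
    assert(!input.empty());
    std::vector<thread_items> output(input.size());
    long running = init;
    for (std::size_t t = 0; t < input.size(); ++t)
    {
        for (std::size_t j = 0; j < items_per_thread; ++j)
        {
            output[t][j] = running;  // exclusive: store before folding in the item
            running = std::max(running, input[t][j]);
        }
    }
    return output;
}
```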
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void exclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], T init, BinaryFunction scan_op = BinaryFunction()) [inline]
This is an overloaded member function, provided for convenience. It differs from the above function only in that it does not take a storage argument.
Performs exclusive scan across threads in a block.
ItemsPerThread | - number of items in the input array.
BinaryFunction | - type of binary function used for the scan. Default type is rocprim::plus<T>.
[in] | input | - reference to an array containing the thread input values.
[out] | output | - reference to the thread output array. May be aliased with input.
[in] | init | - initial value used to start the exclusive scan. Should be the same for all threads in a block.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void exclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], T init, T &reduction, storage_type &storage, BinaryFunction scan_op = BinaryFunction()) [inline]
Performs exclusive scan and reduction across threads in a block.
ItemsPerThread | - number of items in the input array.
BinaryFunction | - type of binary function used for the scan. Default type is rocprim::plus<T>.
[in] | input | - reference to an array containing the thread input values.
[out] | output | - reference to the thread output array. May be aliased with input.
[in] | init | - initial value used to start the exclusive scan. Should be the same for all threads in a block.
[out] | reduction | - result of reducing all input values in the block.
[in] | storage | - reference to a temporary storage object of type storage_type.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
Storage reuse: a synchronization barrier (__syncthreads() or rocprim::syncthreads()) should be placed before storage is reused or repurposed.
The examples perform an exclusive maximum scan on a block of 128 threads, each of which provides two long values. If the input values across threads in the block are {-1, 2, -3, 4, ..., -255, 256} and init is 0, then the output values will be {0, 0, 2, 2, 4, ..., 254, 254} and the reduction will be 256.
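Below is a generic variant of the same host model. It is parameterized on the element type, ItemsPerThread, and the scan operator, and it also returns the reduction. It assumes a blocked arrangement and that reduction covers the input items only, per the parameter description. The function name is hypothetical, and the test uses a lambda computing the maximum as the scan operator.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Generic reference model of exclusive_scan(input, output, init, reduction, storage, scan_op)
// for the ItemsPerThread overload, assuming a blocked arrangement. The return value models
// `reduction`: the fold of the input items only.
template <class T, std::size_t ItemsPerThread, class BinaryOp>
T reference_exclusive_scan_with_reduction(
    const std::vector<std::array<T, ItemsPerThread>>& input,
    std::vector<std::array<T, ItemsPerThread>>& output,
    T init,
    BinaryOp scan_op)
{
    static_assert(ItemsPerThread > 0, "each thread must own at least one item");
    assert(!input.empty());
    output.resize(input.size());
    T running = init;
    T reduction = input[0][0];  // the first item seeds the reduction
    for (std::size_t t = 0; t < input.size(); ++t)
    {
        for (std::size_t j = 0; j < ItemsPerThread; ++j)
        {
            output[t][j] = running;  // exclusive prefix, starting from init
            running = scan_op(running, input[t][j]);
            if (t != 0 || j != 0)
            {
                reduction = scan_op(reduction, input[t][j]);
            }
        }
    }
    return reduction;
}
```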
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void exclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], T init, T &reduction, BinaryFunction scan_op = BinaryFunction()) [inline]
This is an overloaded member function, provided for convenience. It differs from the above function only in that it does not take a storage argument.
Performs exclusive scan and reduction across threads in a block.
ItemsPerThread | - number of items in the input array.
BinaryFunction | - type of binary function used for the scan. Default type is rocprim::plus<T>.
[in] | input | - reference to an array containing the thread input values.
[out] | output | - reference to the thread output array. May be aliased with input.
[in] | init | - initial value used to start the exclusive scan. Should be the same for all threads in a block.
[out] | reduction | - result of reducing all input values in the block.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
template<unsigned int ItemsPerThread, class PrefixCallback, class BinaryFunction>
ROCPRIM_DEVICE ROCPRIM_INLINE void exclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], storage_type &storage, PrefixCallback &prefix_callback_op, BinaryFunction scan_op) [inline]
Performs exclusive scan across threads in a block, using prefix_callback_op to generate a prefix value for the whole block.
ItemsPerThread | - number of items in the input array.
PrefixCallback | - type of the unary function object used for generating the block-wide prefix value for the scan operation.
BinaryFunction | - type of binary function used for the scan.
[in] | input | - reference to an array containing the thread input values.
[out] | output | - reference to the thread output array. May be aliased with input.
[in] | storage | - reference to a temporary storage object of type storage_type.
[in,out] | prefix_callback_op | - function object for generating the block prefix value. Its signature should be equivalent to T f(const T &block_reduction). The argument need not be taken by const &, but the function object must not modify the object passed to it. The object is called by the first warp of the block with the block-wide reduction of the input values as its argument, and the value returned in the first thread is used as the block-wide prefix.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
Storage reuse: a synchronization barrier (__syncthreads() or rocprim::syncthreads()) should be placed before storage is reused or repurposed.
The examples perform an exclusive prefix sum on a block of 128 threads, each of which provides two int values. If the input values across threads in the block are {1, 1, 1, ..., 1} and the callback returns a block prefix of 10, then the output values will be {10, 11, 12, 13, ..., 265}, and the callback's prefix will be 266.
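The derivation below shows where these numbers come from. It assumes a sum scan, a blocked arrangement (item j of thread t is at flat position tI + j), all-ones input, and a block prefix p = 10 implied by the first output. It also assumes the callback adds the block reduction to its running prefix, giving p'. Under those assumptions, each exclusive result is the prefix plus the number of ones before that item:

```latex
\begin{aligned}
\text{out}_{t,j} &= p + \sum_{k < tI + j} x_k = p + tI + j,
  \qquad I = \text{ItemsPerThread} = 2,\; x_k = 1,\\
\text{out}_{0,0} &= 10, \qquad
\text{out}_{127,1} = 10 + 127 \cdot 2 + 1 = 265,\\
p' &= p + \sum_{k=0}^{255} x_k = 10 + 256 = 266.
\end{aligned}
```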
template<class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void inclusive_scan(T input, T &output, storage_type &storage, BinaryFunction scan_op = BinaryFunction()) [inline]
Performs inclusive scan across threads in a block.
BinaryFunction | - type of binary function used for the scan. Default type is rocprim::plus<T>.
[in] | input | - thread input value.
[out] | output | - reference to the thread output value. May be aliased with input.
[in] | storage | - reference to a temporary storage object of type storage_type.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
Storage reuse: a synchronization barrier (__syncthreads() or rocprim::syncthreads()) should be placed before storage is reused or repurposed.
The examples perform an inclusive min scan on a block of 256 threads, each of which provides one float value. If the input values across threads in the block are {1, -2, 3, -4, ..., 255, -256}, then the output values will be {1, -2, -2, -4, ..., -254, -256}.
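An inclusive min scan has the same result as C++17 std::inclusive_scan with a min operator. A host reference for this example can therefore use the standard library, as sketched below. The function name is hypothetical. std::inclusive_scan expects an associative operator, which min is.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Host reference for the inclusive min scan example: output[i] = min(input[0..i]).
// Uses C++17 std::inclusive_scan with a min lambda as the binary operator.
std::vector<float> reference_inclusive_min_scan(const std::vector<float>& input)
{
    std::vector<float> output(input.size());
    std::inclusive_scan(input.begin(), input.end(), output.begin(),
                        [](float a, float b) { return std::min(a, b); });
    assert(output.size() == input.size());
    return output;
}
```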
template<class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void inclusive_scan(T input, T &output, BinaryFunction scan_op = BinaryFunction()) [inline]
This is an overloaded member function, provided for convenience. It differs from the above function only in that it does not take a storage argument.
Performs inclusive scan across threads in a block.
BinaryFunction | - type of binary function used for the scan. Default type is rocprim::plus<T>.
[in] | input | - thread input value.
[out] | output | - reference to the thread output value. May be aliased with input.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
template<class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void inclusive_scan(T input, T &output, T &reduction, storage_type &storage, BinaryFunction scan_op = BinaryFunction()) [inline]
Performs inclusive scan and reduction across threads in a block.
BinaryFunction | - type of binary function used for the scan. Default type is rocprim::plus<T>.
[in] | input | - thread input value.
[out] | output | - reference to the thread output value. May be aliased with input.
[out] | reduction | - result of reducing all input values in the block.
[in] | storage | - reference to a temporary storage object of type storage_type.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
Storage reuse: a synchronization barrier (__syncthreads() or rocprim::syncthreads()) should be placed before storage is reused or repurposed.
The examples perform an inclusive min scan on a block of 256 threads, each of which provides one float value. If the input values across threads in the block are {1, -2, 3, -4, ..., 255, -256}, then the output values will be {1, -2, -2, -4, ..., -254, -256}, and the reduction will be -256.
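For an inclusive scan, the block-wide reduction is exactly the value the last thread receives as its output. The host-side sketch below, with hypothetical names, makes that relationship explicit for the min example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct inclusive_result
{
    std::vector<float> output;
    float reduction;
};

// Reference model of inclusive_scan(input, output, reduction, storage, scan_op) with min.
// The reduction over all inputs equals the inclusive result of the last element,
// so the model reads it from there.
inclusive_result reference_inclusive_min_scan_with_reduction(const std::vector<float>& input)
{
    assert(!input.empty());
    inclusive_result result{std::vector<float>(input.size()), 0.0f};
    result.output[0] = input[0];
    for (std::size_t i = 1; i < input.size(); ++i)
    {
        result.output[i] = std::min(result.output[i - 1], input[i]);
    }
    result.reduction = result.output.back();
    return result;
}
```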
template<class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void inclusive_scan(T input, T &output, T &reduction, BinaryFunction scan_op = BinaryFunction()) [inline]
This is an overloaded member function, provided for convenience. It differs from the above function only in that it does not take a storage argument.
Performs inclusive scan and reduction across threads in a block.
BinaryFunction | - type of binary function used for the scan. Default type is rocprim::plus<T>.
[in] | input | - thread input value.
[out] | output | - reference to the thread output value. May be aliased with input.
[out] | reduction | - result of reducing all input values in the block.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
template<class PrefixCallback, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void inclusive_scan(T input, T &output, storage_type &storage, PrefixCallback &prefix_callback_op, BinaryFunction scan_op) [inline]
Performs inclusive scan across threads in a block, using prefix_callback_op to generate a prefix value for the whole block.
PrefixCallback | - type of the unary function object used for generating the block-wide prefix value for the scan operation.
BinaryFunction | - type of binary function used for the scan. Default type is rocprim::plus<T>.
[in] | input | - thread input value.
[out] | output | - reference to the thread output value. May be aliased with input.
[in] | storage | - reference to a temporary storage object of type storage_type.
[in,out] | prefix_callback_op | - function object for generating the block prefix value. Its signature should be equivalent to T f(const T &block_reduction). The argument need not be taken by const &, but the function object must not modify the object passed to it. The object is called by the first warp of the block with the block-wide reduction of the input values as its argument, and the value returned in the first thread is used as the block-wide prefix.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
Storage reuse: a synchronization barrier (__syncthreads() or rocprim::syncthreads()) should be placed before storage is reused or repurposed.
The examples perform an inclusive prefix sum on a block of 256 threads, each of which provides one int value. If the input values across threads in the block are {1, 1, 1, ..., 1} and the callback returns a block prefix of 10, then the output values will be {11, 12, 13, ..., 266}, and the callback's prefix will be 266.
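A common use of a prefix callback is scanning input longer than one block. The input is processed in consecutive tiles while the same callback object is reused, so each tile starts where the previous one ended. The sequential sketch below models that pattern for the inclusive sum example. running_total is a hypothetical functor starting at 10, the value implied by the first output of 11. In a real kernel, a barrier such as rocprim::syncthreads() is needed before storage is reused for the next tile.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stateful callback (not part of rocPRIM): returns the total
// accumulated so far and adds the current tile's reduction to it.
struct running_total
{
    int total;

    int operator()(int block_reduction)
    {
        int prefix = total;
        total += block_reduction;
        return prefix;
    }
};

// Models one block calling inclusive_scan(input, output, storage, callback, plus)
// over consecutive tiles of `tile_size` items. Reusing the same callback object
// carries the prefix from tile to tile, so the tiles form one long scan.
std::vector<int> reference_tiled_inclusive_sum(const std::vector<int>& data,
                                               std::size_t tile_size,
                                               running_total& callback)
{
    assert(tile_size > 0 && data.size() % tile_size == 0);
    std::vector<int> output(data.size());
    for (std::size_t base = 0; base < data.size(); base += tile_size)
    {
        int tile_reduction = 0;
        for (std::size_t i = 0; i < tile_size; ++i)
        {
            tile_reduction += data[base + i];
        }
        int running = callback(tile_reduction);  // block-wide prefix for this tile
        for (std::size_t i = 0; i < tile_size; ++i)
        {
            running += data[base + i];  // inclusive: add the item before storing
            output[base + i] = running;
        }
    }
    return output;
}
```

With two tiles of 256 ones, the first tile reproduces {11, ..., 266}, and the second continues from 267 up to 522.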
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void inclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], storage_type &storage, BinaryFunction scan_op = BinaryFunction()) [inline]
Performs inclusive scan across threads in a block.
ItemsPerThread | - number of items in the input array.
BinaryFunction | - type of binary function used for the scan. Default type is rocprim::plus<T>.
[in] | input | - reference to an array containing the thread input values.
[out] | output | - reference to the thread output array. May be aliased with input.
[in] | storage | - reference to a temporary storage object of type storage_type.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
Storage reuse: a synchronization barrier (__syncthreads() or rocprim::syncthreads()) should be placed before storage is reused or repurposed.
The examples perform an inclusive maximum scan on a block of 128 threads, each of which provides two long values. If the input values across threads in the block are {-1, 2, -3, 4, ..., -255, 256}, then the output values will be {-1, 2, 2, 4, ..., 254, 256}.
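The sketch below is a generic host-side model of the ItemsPerThread inclusive overloads, assuming a blocked arrangement. It flattens the per-thread arrays, scans them with C++17 std::inclusive_scan, and scatters the results back to their "threads". The function name is hypothetical, and the test uses a lambda as the maximum operator.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Reference model for the ItemsPerThread inclusive overloads, assuming a blocked
// arrangement: item j of thread t is at flat position t * ItemsPerThread + j.
template <class T, std::size_t ItemsPerThread, class BinaryOp>
std::vector<std::array<T, ItemsPerThread>>
reference_blocked_inclusive_scan(const std::vector<std::array<T, ItemsPerThread>>& input,
                                 BinaryOp scan_op)
{
    // Flatten the per-thread arrays in blocked order.
    std::vector<T> flat;
    flat.reserve(input.size() * ItemsPerThread);
    for (const auto& items : input)
    {
        flat.insert(flat.end(), items.begin(), items.end());
    }

    std::vector<T> scanned(flat.size());
    std::inclusive_scan(flat.begin(), flat.end(), scanned.begin(), scan_op);

    // Scatter the results back so each "thread" gets its own ItemsPerThread values.
    std::vector<std::array<T, ItemsPerThread>> output(input.size());
    for (std::size_t k = 0; k < scanned.size(); ++k)
    {
        output[k / ItemsPerThread][k % ItemsPerThread] = scanned[k];
    }
    assert(output.size() == input.size());
    return output;
}
```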
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void inclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], BinaryFunction scan_op = BinaryFunction()) [inline]
This is an overloaded member function, provided for convenience. It differs from the above function only in that it does not take a storage argument.
Performs inclusive scan across threads in a block.
ItemsPerThread | - number of items in the input array.
BinaryFunction | - type of binary function used for the scan. Default type is rocprim::plus<T>.
[in] | input | - reference to an array containing the thread input values.
[out] | output | - reference to the thread output array. May be aliased with input.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_INLINE void inclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], T &reduction, storage_type &storage, BinaryFunction scan_op = BinaryFunction()) [inline]
Performs inclusive scan and reduction across threads in a block.
ItemsPerThread | - number of items in the input array.
BinaryFunction | - type of binary function used for the scan. Default type is rocprim::plus<T>.
[in] | input | - reference to an array containing the thread input values.
[out] | output | - reference to the thread output array. May be aliased with input.
[out] | reduction | - result of reducing all input values in the block.
[in] | storage | - reference to a temporary storage object of type storage_type.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
Storage reuse: a synchronization barrier (__syncthreads() or rocprim::syncthreads()) should be placed before storage is reused or repurposed.
The examples perform an inclusive maximum scan on a block of 128 threads, each of which provides two long values. If the input values across threads in the block are {-1, 2, -3, 4, ..., -255, 256}, then the output values will be {-1, 2, 2, 4, ..., 254, 256} and the reduction will be 256.
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void inclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], T &reduction, BinaryFunction scan_op = BinaryFunction()) [inline]
This is an overloaded member function, provided for convenience. It differs from the above function only in that it does not take a storage argument.
Performs inclusive scan and reduction across threads in a block.
ItemsPerThread | - number of items in the input array.
BinaryFunction | - type of binary function used for the scan. Default type is rocprim::plus<T>.
[in] | input | - reference to an array containing the thread input values.
[out] | output | - reference to the thread output array. May be aliased with input.
[out] | reduction | - result of reducing all input values in the block.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
template<unsigned int ItemsPerThread, class PrefixCallback, class BinaryFunction>
ROCPRIM_DEVICE ROCPRIM_INLINE void inclusive_scan(T(&input)[ItemsPerThread], T(&output)[ItemsPerThread], storage_type &storage, PrefixCallback &prefix_callback_op, BinaryFunction scan_op) [inline]
Performs inclusive scan across threads in a block, using prefix_callback_op to generate a prefix value for the whole block.
ItemsPerThread | - number of items in the input array.
PrefixCallback | - type of the unary function object used for generating the block-wide prefix value for the scan operation.
BinaryFunction | - type of binary function used for the scan.
[in] | input | - reference to an array containing the thread input values.
[out] | output | - reference to the thread output array. May be aliased with input.
[in] | storage | - reference to a temporary storage object of type storage_type.
[in,out] | prefix_callback_op | - function object for generating the block prefix value. Its signature should be equivalent to T f(const T &block_reduction). The argument need not be taken by const &, but the function object must not modify the object passed to it. The object is called by the first warp of the block with the block-wide reduction of the input values as its argument, and the value returned in the first thread is used as the block-wide prefix.
[in] | scan_op | - binary operation function object used for the scan. Its signature should be equivalent to T f(const T &a, const T &b). The arguments need not be taken by const &, but the function object must not modify the objects passed to it.
Storage reuse: a synchronization barrier (__syncthreads() or rocprim::syncthreads()) should be placed before storage is reused or repurposed.
The examples perform an inclusive prefix sum on a block of 128 threads, each of which provides two int values. If the input values across threads in the block are {1, 1, 1, ..., 1} and the callback returns a block prefix of 10, then the output values will be {11, 12, 13, ..., 266}, and the callback's prefix will be 266.
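Make the same assumptions as above: a sum scan, a blocked arrangement with item j of thread t at position tI + j, all-ones input, and a block prefix p = 10 implied by the first output. The inclusive results then differ from the exclusive ones only by each item's own value. For a sum scan, the last inclusive output coincides with the updated prefix p':

```latex
\begin{aligned}
\text{out}_{t,j} &= p + \sum_{k \le tI + j} x_k = p + tI + j + 1,
  \qquad I = 2,\; x_k = 1,\\
\text{out}_{0,0} &= 10 + 1 = 11, \qquad
\text{out}_{127,1} = 10 + 127 \cdot 2 + 1 + 1 = 266 = p'.
\end{aligned}
```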