rocPRIM
Classes | Public Types | Public Member Functions | List of all members
block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ > Class Template Reference

The block_adjacent_difference class is a block level parallel primitive which provides methods for applying binary functions for pairs of consecutive items partition across a thread block. More...

#include <block_adjacent_difference.hpp>

Inheritance diagram for block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >:
Inheritance graph
[legend]
Collaboration diagram for block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >:
Collaboration graph
[legend]

Public Types

using storage_type = detail::raw_storage< storage_type_ >
 Struct used to allocate a temporary memory that is required for thread communication during operations provided by related parallel primitive. More...
 

Public Member Functions

template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void flag_heads (Flag(&head_flags)[ItemsPerThread], const T(&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)
 Tags head_flags that indicate discontinuities between items partitioned across the thread block, where the first item has no reference and is always flagged. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void flag_heads (Flag(&head_flags)[ItemsPerThread], const T(&input)[ItemsPerThread], FlagOp flag_op)
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void flag_heads (Flag(&head_flags)[ItemsPerThread], T tile_predecessor_item, const T(&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)
 Tags head_flags that indicate discontinuities between items partitioned across the thread block, where the first item of the first thread is compared against a tile_predecessor_item. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void flag_heads (Flag(&head_flags)[ItemsPerThread], T tile_predecessor_item, const T(&input)[ItemsPerThread], FlagOp flag_op)
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void flag_tails (Flag(&tail_flags)[ItemsPerThread], const T(&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)
 Tags tail_flags that indicate discontinuities between items partitioned across the thread block, where the last item has no reference and is always flagged. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void flag_tails (Flag(&tail_flags)[ItemsPerThread], const T(&input)[ItemsPerThread], FlagOp flag_op)
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void flag_tails (Flag(&tail_flags)[ItemsPerThread], T tile_successor_item, const T(&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)
 Tags tail_flags that indicate discontinuities between items partitioned across the thread block, where the last item of the last thread is compared against a tile_successor_item. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void flag_tails (Flag(&tail_flags)[ItemsPerThread], T tile_successor_item, const T(&input)[ItemsPerThread], FlagOp flag_op)
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void flag_heads_and_tails (Flag(&head_flags)[ItemsPerThread], Flag(&tail_flags)[ItemsPerThread], const T(&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)
 Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void flag_heads_and_tails (Flag(&head_flags)[ItemsPerThread], Flag(&tail_flags)[ItemsPerThread], const T(&input)[ItemsPerThread], FlagOp flag_op)
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void flag_heads_and_tails (Flag(&head_flags)[ItemsPerThread], Flag(&tail_flags)[ItemsPerThread], T tile_successor_item, const T(&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)
 Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block, where the last item of the last thread is compared against a tile_successor_item. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void flag_heads_and_tails (Flag(&head_flags)[ItemsPerThread], Flag(&tail_flags)[ItemsPerThread], T tile_successor_item, const T(&input)[ItemsPerThread], FlagOp flag_op)
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void flag_heads_and_tails (Flag(&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag(&tail_flags)[ItemsPerThread], const T(&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)
 Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block, where the first item of the first thread is compared against a tile_predecessor_item. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void flag_heads_and_tails (Flag(&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag(&tail_flags)[ItemsPerThread], const T(&input)[ItemsPerThread], FlagOp flag_op)
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void flag_heads_and_tails (Flag(&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag(&tail_flags)[ItemsPerThread], T tile_successor_item, const T(&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)
 Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block, where the first and last items of the first and last thread is compared against a tile_predecessor_item and a tile_successor_item. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void flag_heads_and_tails (Flag(&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag(&tail_flags)[ItemsPerThread], T tile_successor_item, const T(&input)[ItemsPerThread], FlagOp flag_op)
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More...
 
template<typename Output , unsigned int ItemsPerThread, typename BinaryFunction >
ROCPRIM_DEVICE ROCPRIM_INLINE void subtract_left (const T(&input)[ItemsPerThread], Output(&output)[ItemsPerThread], const BinaryFunction op, storage_type &storage)
 Apply a function to each consecutive pair of elements partitioned across threads in the block and write the output to the position of the left item. More...
 
template<typename Output , unsigned int ItemsPerThread, typename BinaryFunction >
ROCPRIM_DEVICE ROCPRIM_INLINE void subtract_left (const T(&input)[ItemsPerThread], Output(&output)[ItemsPerThread], const BinaryFunction op, const T tile_predecessor, storage_type &storage)
 Apply a function to each consecutive pair of elements partitioned across threads in the block and write the output to the position of the left item, with an explicit item before the tile. More...
 
template<typename Output , unsigned int ItemsPerThread, typename BinaryFunction >
ROCPRIM_DEVICE ROCPRIM_INLINE void subtract_left_partial (const T(&input)[ItemsPerThread], Output(&output)[ItemsPerThread], const BinaryFunction op, const unsigned int valid_items, storage_type &storage)
 Apply a function to each consecutive pair of elements partitioned across threads in the block and write the output to the position of the left item, in a partial tile. More...
 
template<typename Output , unsigned int ItemsPerThread, typename BinaryFunction >
ROCPRIM_DEVICE ROCPRIM_INLINE void subtract_left_partial (const T(&input)[ItemsPerThread], Output(&output)[ItemsPerThread], const BinaryFunction op, const T tile_predecessor, const unsigned int valid_items, storage_type &storage)
 Apply a function to each consecutive pair of elements partitioned across threads in the block and write the output to the position of the left item, in a partial tile with a predecessor. More...
 
template<typename Output , unsigned int ItemsPerThread, typename BinaryFunction >
ROCPRIM_DEVICE ROCPRIM_INLINE void subtract_right (const T(&input)[ItemsPerThread], Output(&output)[ItemsPerThread], const BinaryFunction op, storage_type &storage)
 Apply a function to each consecutive pair of elements partitioned across threads in the block and write the output to the position of the right item. More...
 
template<typename Output , unsigned int ItemsPerThread, typename BinaryFunction >
ROCPRIM_DEVICE ROCPRIM_INLINE void subtract_right (const T(&input)[ItemsPerThread], Output(&output)[ItemsPerThread], const BinaryFunction op, const T tile_successor, storage_type &storage)
 Apply a function to each consecutive pair of elements partitioned across threads in the block and write the output to the position of the right item, with an explicit item after the tile. More...
 
template<typename Output , unsigned int ItemsPerThread, typename BinaryFunction >
ROCPRIM_DEVICE ROCPRIM_INLINE void subtract_right_partial (const T(&input)[ItemsPerThread], Output(&output)[ItemsPerThread], const BinaryFunction op, const unsigned int valid_items, storage_type &storage)
 Apply a function to each consecutive pair of elements partitioned across threads in the block and write the output to the position of the right item, in a partial tile. More...
 

Detailed Description

template<class T, unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
class block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >

The block_adjacent_difference class is a block level parallel primitive which provides methods for applying binary functions for pairs of consecutive items partition across a thread block.

Template Parameters
T- the input type.
BlockSize- the number of threads in a block.
Overview
  • There are two types of flags:
    • Head flags.
    • Tail flags.
  • The above flags are used to differentiate items from their predecessors or successors.
  • E.g. Head flags are convenient for differentiating disjoint data segments as part of a segmented reduction/scan.
Examples

In the examples discontinuity operation is performed on block of 128 threads, using type int.

__global__ void example_kernel(...)
{
// specialize discontinuity for int and a block of 128 threads
using block_adjacent_difference_int = rocprim::block_adjacent_difference<int, 128>;
// allocate storage in shared memory
__shared__ block_adjacent_difference_int::storage_type storage;
// segment of consecutive items to be used
int input[8];
...
int head_flags[8];
block_adjacent_difference_int b_discontinuity;
using flag_op_type = typename rocprim::greater<int>;
b_discontinuity.flag_heads(head_flags, input, flag_op_type(), storage);
...
}

Member Typedef Documentation

◆ storage_type

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
using block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::storage_type = detail::raw_storage<storage_type_>

Struct used to allocate a temporary memory that is required for thread communication during operations provided by related parallel primitive.

Depending on the implemention the operations exposed by parallel primitive may require a temporary storage for thread communication. The storage should be allocated using keywords __shared__. It can be aliased to an externally allocated memory, or be a part of a union type with other storage types to increase shared memory reusability.

Member Function Documentation

◆ flag_heads() [1/4]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads ( Flag(&)  head_flags[ItemsPerThread],
const T(&)  input[ItemsPerThread],
FlagOp  flag_op,
storage_type storage 
)
inline

Tags head_flags that indicate discontinuities between items partitioned across the thread block, where the first item has no reference and is always flagged.

Deprecated:
The flags API of block_adjacent_difference is deprecated, use subtract_left() or block_discontinuity::flag_heads() instead.
Template Parameters
ItemsPerThread- [inferred] the number of items to be processed by each thread.
Flag- [inferred] the flag type.
FlagOp- [inferred] type of binary function used for flagging.
Parameters
[out]head_flags- array that contains the head flags.
[in]input- array that data is loaded from.
[in]flag_op- binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.
[in]storage- reference to a temporary storage object of type storage_type.
Storage reuse
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Example.
__global__ void example_kernel(...)
{
// specialize discontinuity for int and a block of 128 threads
using block_adjacent_difference_int = rocprim::block_adjacent_difference<int, 128>;
// allocate storage in shared memory
__shared__ block_adjacent_difference_int::storage_type storage;
// segment of consecutive items to be used
int input[8];
...
int head_flags[8];
block_adjacent_difference_int b_discontinuity;
using flag_op_type = typename rocprim::greater<int>;
b_discontinuity.flag_heads(head_flags, input, flag_op_type(), storage);
...
}

◆ flag_heads() [2/4]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads ( Flag(&)  head_flags[ItemsPerThread],
const T(&)  input[ItemsPerThread],
FlagOp  flag_op 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Deprecated:
The flags API of block_adjacent_difference is deprecated, use subtract_left() or block_discontinuity::flag_heads() instead.

This overload does not take a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.

◆ flag_heads() [3/4]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads ( Flag(&)  head_flags[ItemsPerThread],
tile_predecessor_item,
const T(&)  input[ItemsPerThread],
FlagOp  flag_op,
storage_type storage 
)
inline

Tags head_flags that indicate discontinuities between items partitioned across the thread block, where the first item of the first thread is compared against a tile_predecessor_item.

Deprecated:
The flags API of block_adjacent_difference is deprecated, use subtract_left() or block_discontinuity::flag_heads() instead.
Template Parameters
ItemsPerThread- [inferred] the number of items to be processed by each thread.
Flag- [inferred] the flag type.
FlagOp- [inferred] type of binary function used for flagging.
Parameters
[out]head_flags- array that contains the head flags.
[in]tile_predecessor_item- first tile item from thread to be compared against.
[in]input- array that data is loaded from.
[in]flag_op- binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.
[in]storage- reference to a temporary storage object of type storage_type.
Storage reuse
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Example.
__global__ void example_kernel(...)
{
// specialize discontinuity for int and a block of 128 threads
using block_adjacent_difference_int = rocprim::block_adjacent_difference<int, 128>;
// allocate storage in shared memory
__shared__ block_adjacent_difference_int::storage_type storage;
// segment of consecutive items to be used
int input[8];
int tile_item = 0;
if (threadIdx.x == 0)
{
tile_item = ...
}
...
int head_flags[8];
block_adjacent_difference_int b_discontinuity;
using flag_op_type = typename rocprim::greater<int>;
b_discontinuity.flag_heads(head_flags, tile_item, input, flag_op_type(),
storage);
...
}

◆ flag_heads() [4/4]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads ( Flag(&)  head_flags[ItemsPerThread],
tile_predecessor_item,
const T(&)  input[ItemsPerThread],
FlagOp  flag_op 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Deprecated:
The flags API of block_adjacent_difference is deprecated, use subtract_left() or block_discontinuity::flag_heads() instead.

This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.

◆ flag_heads_and_tails() [1/8]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads_and_tails ( Flag(&)  head_flags[ItemsPerThread],
Flag(&)  tail_flags[ItemsPerThread],
const T(&)  input[ItemsPerThread],
FlagOp  flag_op,
storage_type storage 
)
inline

Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block.

Template Parameters
ItemsPerThread- [inferred] the number of items to be processed by each thread.
Flag- [inferred] the flag type.
FlagOp- [inferred] type of binary function used for flagging.
Parameters
[out]head_flags- array that contains the head flags.
[out]tail_flags- array that contains the tail flags.
[in]input- array that data is loaded from.
[in]flag_op- binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.
[in]storage- reference to a temporary storage object of type storage_type.
Storage reuse
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Example.
__global__ void example_kernel(...)
{
// specialize discontinuity for int and a block of 128 threads
using block_adjacent_difference_int = rocprim::block_adjacent_difference<int, 128>;
// allocate storage in shared memory
__shared__ block_adjacent_difference_int::storage_type storage;
// segment of consecutive items to be used
int input[8];
...
int head_flags[8];
int tail_flags[8];
block_adjacent_difference_int b_discontinuity;
using flag_op_type = typename rocprim::greater<int>;
b_discontinuity.flag_heads_and_tails(head_flags, tail_flags, input,
flag_op_type(), storage);
...
}

◆ flag_heads_and_tails() [2/8]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads_and_tails ( Flag(&)  head_flags[ItemsPerThread],
Flag(&)  tail_flags[ItemsPerThread],
const T(&)  input[ItemsPerThread],
FlagOp  flag_op 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Deprecated:
The flags API of block_adjacent_difference is deprecated, use block_discontinuity::flag_heads_and_tails() instead.

This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.

◆ flag_heads_and_tails() [3/8]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads_and_tails ( Flag(&)  head_flags[ItemsPerThread],
Flag(&)  tail_flags[ItemsPerThread],
tile_successor_item,
const T(&)  input[ItemsPerThread],
FlagOp  flag_op,
storage_type storage 
)
inline

Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block, where the last item of the last thread is compared against a tile_successor_item.

Deprecated:
The flags API of block_adjacent_difference is deprecated, use block_discontinuity::flag_heads_and_tails() instead.
Template Parameters
ItemsPerThread- [inferred] the number of items to be processed by each thread.
Flag- [inferred] the flag type.
FlagOp- [inferred] type of binary function used for flagging.
Parameters
[out]head_flags- array that contains the head flags.
[out]tail_flags- array that contains the tail flags.
[in]tile_successor_item- last tile item from thread to be compared against.
[in]input- array that data is loaded from.
[in]flag_op- binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.
[in]storage- reference to a temporary storage object of type storage_type.
Storage reuse
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Example.
__global__ void example_kernel(...)
{
// specialize discontinuity for int and a block of 128 threads
using block_adjacent_difference_int = rocprim::block_adjacent_difference<int, 128>;
// allocate storage in shared memory
__shared__ block_adjacent_difference_int::storage_type storage;
// segment of consecutive items to be used
int input[8];
int tile_item = 0;
if (threadIdx.x == 0)
{
tile_item = ...
}
...
int head_flags[8];
int tail_flags[8];
block_adjacent_difference_int b_discontinuity;
using flag_op_type = typename rocprim::greater<int>;
b_discontinuity.flag_heads_and_tails(head_flags, tail_flags, tile_item,
input, flag_op_type(),
storage);
...
}

◆ flag_heads_and_tails() [4/8]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads_and_tails ( Flag(&)  head_flags[ItemsPerThread],
Flag(&)  tail_flags[ItemsPerThread],
tile_successor_item,
const T(&)  input[ItemsPerThread],
FlagOp  flag_op 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Deprecated:
The flags API of block_adjacent_difference is deprecated, use block_discontinuity::flag_heads_and_tails() instead.

This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.

◆ flag_heads_and_tails() [5/8]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads_and_tails ( Flag(&)  head_flags[ItemsPerThread],
tile_predecessor_item,
Flag(&)  tail_flags[ItemsPerThread],
const T(&)  input[ItemsPerThread],
FlagOp  flag_op,
storage_type storage 
)
inline

Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block, where the first item of the first thread is compared against a tile_predecessor_item.

Deprecated:
The flags API of block_adjacent_difference is deprecated, use block_discontinuity::flag_heads_and_tails() instead.
Template Parameters
ItemsPerThread- [inferred] the number of items to be processed by each thread.
Flag- [inferred] the flag type.
FlagOp- [inferred] type of binary function used for flagging.
Parameters
[out]head_flags- array that contains the head flags.
[in]tile_predecessor_item- first tile item from thread to be compared against.
[out]tail_flags- array that contains the tail flags.
[in]input- array that data is loaded from.
[in]flag_op- binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.
[in]storage- reference to a temporary storage object of type storage_type.
Storage reuse
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Example.
__global__ void example_kernel(...)
{
// specialize discontinuity for int and a block of 128 threads
using block_adjacent_difference_int = rocprim::block_adjacent_difference<int, 128>;
// allocate storage in shared memory
__shared__ block_adjacent_difference_int::storage_type storage;
// segment of consecutive items to be used
int input[8];
int tile_item = 0;
if (threadIdx.x == 0)
{
tile_item = ...
}
...
int head_flags[8];
int tail_flags[8];
block_adjacent_difference_int b_discontinuity;
using flag_op_type = typename rocprim::greater<int>;
b_discontinuity.flag_heads_and_tails(head_flags, tile_item, tail_flags,
input, flag_op_type(),
storage);
...
}

◆ flag_heads_and_tails() [6/8]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads_and_tails ( Flag(&)  head_flags[ItemsPerThread],
tile_predecessor_item,
Flag(&)  tail_flags[ItemsPerThread],
const T(&)  input[ItemsPerThread],
FlagOp  flag_op 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Deprecated:
The flags API of block_adjacent_difference is deprecated, use block_discontinuity::flag_heads_and_tails() instead.

This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.

◆ flag_heads_and_tails() [7/8]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads_and_tails ( Flag(&)  head_flags[ItemsPerThread],
tile_predecessor_item,
Flag(&)  tail_flags[ItemsPerThread],
tile_successor_item,
const T(&)  input[ItemsPerThread],
FlagOp  flag_op,
storage_type storage 
)
inline

Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block, where the first and last items of the first and last thread is compared against a tile_predecessor_item and a tile_successor_item.

Deprecated:
The flags API of block_adjacent_difference is deprecated, use block_discontinuity::flag_heads_and_tails() instead.
Template Parameters
ItemsPerThread- [inferred] the number of items to be processed by each thread.
Flag- [inferred] the flag type.
FlagOp- [inferred] type of binary function used for flagging.
Parameters
[out]head_flags- array that contains the head flags.
[in]tile_predecessor_item- first tile item from thread to be compared against.
[out]tail_flags- array that contains the tail flags.
[in]tile_successor_item- last tile item from thread to be compared against.
[in]input- array that data is loaded from.
[in]flag_op- binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.
[in]storage- reference to a temporary storage object of type storage_type.
Storage reuse
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Example.
__global__ void example_kernel(...)
{
// specialize discontinuity for int and a block of 128 threads
using block_adjacent_difference_int = rocprim::block_adjacent_difference<int, 128>;
// allocate storage in shared memory
__shared__ block_adjacent_difference_int::storage_type storage;
// segment of consecutive items to be used
int input[8];
int tile_predecessor_item = 0;
int tile_successor_item = 0;
if (threadIdx.x == 0)
{
tile_predecessor_item = ...
tile_successor_item = ...
}
...
int head_flags[8];
int tail_flags[8];
block_adjacent_difference_int b_discontinuity;
using flag_op_type = typename rocprim::greater<int>;
b_discontinuity.flag_heads_and_tails(head_flags, tile_predecessor_item,
tail_flags, tile_successor_item,
input, flag_op_type(),
storage);
...
}

◆ flag_heads_and_tails() [8/8]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads_and_tails ( Flag(&)  head_flags[ItemsPerThread],
tile_predecessor_item,
Flag(&)  tail_flags[ItemsPerThread],
tile_successor_item,
const T(&)  input[ItemsPerThread],
FlagOp  flag_op 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Deprecated:
The flags API of block_adjacent_difference is deprecated, use block_discontinuity::flag_heads_and_tails() instead.

This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.

◆ flag_tails() [1/4]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_tails ( Flag(&)  tail_flags[ItemsPerThread],
const T(&)  input[ItemsPerThread],
FlagOp  flag_op,
storage_type storage 
)
inline

Tags tail_flags that indicate discontinuities between items partitioned across the thread block, where the last item has no reference and is always flagged.

Deprecated:
The flags API of block_adjacent_difference is deprecated, use subtract_right() or block_discontinuity::flag_tails() instead.
Template Parameters
ItemsPerThread- [inferred] the number of items to be processed by each thread.
Flag- [inferred] the flag type.
FlagOp- [inferred] type of binary function used for flagging.
Parameters
[out]tail_flags- array that contains the tail flags.
[in]input- array that data is loaded from.
[in]flag_op- binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.
[in]storage- reference to a temporary storage object of type storage_type.
Storage reuse
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Example.
__global__ void example_kernel(...)
{
// specialize discontinuity for int and a block of 128 threads
using block_adjacent_difference_int = rocprim::block_adjacent_difference<int, 128>;
// allocate storage in shared memory
__shared__ block_adjacent_difference_int::storage_type storage;
// segment of consecutive items to be used
int input[8];
...
int tail_flags[8];
block_adjacent_difference_int b_discontinuity;
using flag_op_type = typename rocprim::greater<int>;
b_discontinuity.flag_tails(tail_flags, input, flag_op_type(), storage);
...
}

◆ flag_tails() [2/4]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_tails ( Flag(&)  tail_flags[ItemsPerThread],
const T(&)  input[ItemsPerThread],
FlagOp  flag_op 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Deprecated:
The flags API of block_adjacent_difference is deprecated, use subtract_right() or block_discontinuity::flag_tails() instead.

This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.

◆ flag_tails() [3/4]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_tails ( Flag(&)  tail_flags[ItemsPerThread],
tile_successor_item,
const T(&)  input[ItemsPerThread],
FlagOp  flag_op,
storage_type storage 
)
inline

Tags tail_flags that indicate discontinuities between items partitioned across the thread block, where the last item of the last thread is compared against a tile_successor_item.

Deprecated:
The flags API of block_adjacent_difference is deprecated, use subtract_right() or block_discontinuity::flag_tails() instead.
Template Parameters
ItemsPerThread- [inferred] the number of items to be processed by each thread.
Flag- [inferred] the flag type.
FlagOp- [inferred] type of binary function used for flagging.
Parameters
[out]tail_flags- array that contains the tail flags.
[in]tile_successor_item- last tile item from thread to be compared against.
[in]input- array that data is loaded from.
[in]flag_op- binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.
[in]storage- reference to a temporary storage object of type storage_type.
Storage reuse
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Example.
__global__ void example_kernel(...)
{
// specialize discontinuity for int and a block of 128 threads
using block_adjacent_difference_int = rocprim::block_adjacent_difference<int, 128>;
// allocate storage in shared memory
__shared__ block_adjacent_difference_int::storage_type storage;
// segment of consecutive items to be used
int input[8];
int tile_item = 0;
if (threadIdx.x == 0)
{
tile_item = ...
}
...
int tail_flags[8];
block_adjacent_difference_int b_discontinuity;
using flag_op_type = typename rocprim::greater<int>;
b_discontinuity.flag_tails(tail_flags, tile_item, input, flag_op_type(),
storage);
...
}

◆ flag_tails() [4/4]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_tails ( Flag(&)  tail_flags[ItemsPerThread],
tile_successor_item,
const T(&)  input[ItemsPerThread],
FlagOp  flag_op 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Deprecated:
The flags API of block_adjacent_difference is deprecated, use subtract_right() or block_discontinuity::flag_tails() instead.

This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.

◆ subtract_left() [1/2]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<typename Output , unsigned int ItemsPerThread, typename BinaryFunction >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::subtract_left ( const T(&)  input[ItemsPerThread],
Output(&)  output[ItemsPerThread],
const BinaryFunction  op,
storage_type storage 
)
inline

Apply a function to each consecutive pair of elements partitioned across threads in the block and write the output to the position of the left item.

The first item in the first thread is copied from the input then for the rest the following code applies.

// For each i in [1, block_size * ItemsPerThread) across threads in a block
output[i] = op(input[i], input[i-1]);
Template Parameters
Output- [inferred] the type of output, must be assignable from the result of op
ItemsPerThread- [inferred] the number of items processed by each thread
BinaryFunction- [inferred] the type of the function to apply
Parameters
[in]input- array that data is loaded from partitioned across the threads in the block
[out]output- array where the result of function application will be written to
[in]op- binary function applied to the items. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b) The signature does not need to have const & but the function object must not modify the objects passed to it.
storagereference to a temporary storage object of type storage_type
Storage reuse
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads() .

◆ subtract_left() [2/2]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<typename Output , unsigned int ItemsPerThread, typename BinaryFunction >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::subtract_left ( const T(&)  input[ItemsPerThread],
Output(&)  output[ItemsPerThread],
const BinaryFunction  op,
const T  tile_predecessor,
storage_type storage 
)
inline

Apply a function to each consecutive pair of elements partitioned across threads in the block and write the output to the position of the left item, with an explicit item before the tile.

// For the first item on the first thread use the tile predecessor
output[0] = op(input[0], tile_predecessor)
// For other items, i in [1, block_size * ItemsPerThread) across threads in a block
output[i] = op(input[i], input[i-1]);
Template Parameters
Output- [inferred] the type of output, must be assignable from the result of op
ItemsPerThread- [inferred] the number of items processed by each thread
BinaryFunction- [inferred] the type of the function to apply
Parameters
[in]input- array that data is loaded from partitioned across the threads in the block
[out]output- array where the result of function application will be written to
[in]op- binary function applied to the items. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b) The signature does not need to have const & but the function object must not modify the objects passed to it.
[in]tile_predecessor- the item before the tile, will be used as the input of the first application of op
storage- reference to a temporary storage object of type storage_type
Storage reuse
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads() .

◆ subtract_left_partial() [1/2]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<typename Output , unsigned int ItemsPerThread, typename BinaryFunction >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::subtract_left_partial ( const T(&)  input[ItemsPerThread],
Output(&)  output[ItemsPerThread],
const BinaryFunction  op,
const unsigned int  valid_items,
storage_type storage 
)
inline

Apply a function to each consecutive pair of elements partitioned across threads in the block and write the output to the position of the left item, in a partial tile.

output[0] = input[0]
// For each item i in [1, valid_items) across threads in a block
output[i] = op(input[i], input[i-1]);
// Just copy "invalid" items in [valid_items, block_size * ItemsPerThread)
output[i] = input[i]
Template Parameters
Output- [inferred] the type of output, must be assignable from the result of op
ItemsPerThread- [inferred] the number of items processed by each thread
BinaryFunction- [inferred] the type of the function to apply
Parameters
[in]input- array that data is loaded from partitioned across the threads in the block
[out]output- array where the result of function application will be written to
[in]op- binary function applied to the items. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b) The signature does not need to have const & but the function object must not modify the objects passed to it.
[in]valid_items- number of items in the block which are considered "valid" and will be used. Must be less or equal to BlockSize * ItemsPerThread
storage- reference to a temporary storage object of type storage_type
Storage reuse
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads() .

◆ subtract_left_partial() [2/2]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<typename Output , unsigned int ItemsPerThread, typename BinaryFunction >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::subtract_left_partial ( const T(&)  input[ItemsPerThread],
Output(&)  output[ItemsPerThread],
const BinaryFunction  op,
const T  tile_predecessor,
const unsigned int  valid_items,
storage_type storage 
)
inline

Apply a function to each consecutive pair of elements partitioned across threads in the block and write the output to the position of the left item, in a partial tile with a predecessor.

This combines subtract_left_partial() with a tile predecessor.

Template Parameters
Output- [inferred] the type of output, must be assignable from the result of op
ItemsPerThread- [inferred] the number of items processed by each thread
BinaryFunction- [inferred] the type of the function to apply
Parameters
[in]input- array that data is loaded from partitioned across the threads in the block
[out]output- array where the result of function application will be written to
[in]op- binary function applied to the items. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b) The signature does not need to have const & but the function object must not modify the objects passed to it.
[in]tile_predecessor- the item before the tile, will be used as the input of the first application of op
[in]valid_items- number of items in the block which are considered "valid" and will be used. Must be less or equal to BlockSize * ItemsPerThread
storage- reference to a temporary storage object of type storage_type
Storage reuse
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads() .

◆ subtract_right() [1/2]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<typename Output , unsigned int ItemsPerThread, typename BinaryFunction >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::subtract_right ( const T(&)  input[ItemsPerThread],
Output(&)  output[ItemsPerThread],
const BinaryFunction  op,
storage_type storage 
)
inline

Apply a function to each consecutive pair of elements partitioned across threads in the block and write the output to the position of the right item.

The last item in the last thread is copied from the input then for the rest the following code applies.

// For each i in [0, block_size * ItemsPerThread - 1) across threads in a block
output[i] = op(input[i], input[i+1]);
Template Parameters
Output- [inferred] the type of output, must be assignable from the result of op
ItemsPerThread- [inferred] the number of items processed by each thread
BinaryFunction- [inferred] the type of the function to apply
Parameters
[in]input- array that data is loaded from partitioned across the threads in the block
[out]output- array where the result of function application will be written to
[in]op- binary function applied to the items. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b) The signature does not need to have const & but the function object must not modify the objects passed to it.
storage- reference to a temporary storage object of type storage_type
Storage reuse
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads() .

◆ subtract_right() [2/2]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<typename Output , unsigned int ItemsPerThread, typename BinaryFunction >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::subtract_right ( const T(&)  input[ItemsPerThread],
Output(&)  output[ItemsPerThread],
const BinaryFunction  op,
const T  tile_successor,
storage_type storage 
)
inline

Apply a function to each consecutive pair of elements partitioned across threads in the block and write the output to the position of the right item, with an explicit item after the tile.

// For each items i in [0, block_size * ItemsPerThread - 1) across threads in a block
output[i] = op(input[i], input[i+1]);
// For the last item on the last thread use the tile successor
output[block_size * ItemsPerThread - 1] =
op(input[block_size * ItemsPerThread - 1], tile_successor)
Template Parameters
Output- [inferred] the type of output, must be assignable from the result of op
ItemsPerThread- [inferred] the number of items processed by each thread
BinaryFunction- [inferred] the type of the function to apply
Parameters
[in]input- array that data is loaded from partitioned across the threads in the block
[out]output- array where the result of function application will be written to
[in]op- binary function applied to the items. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b) The signature does not need to have const & but the function object must not modify the objects passed to it.
[in]tile_successor- the item after the tile, will be used as the input of the last application of op
storage- reference to a temporary storage object of type storage_type
Storage reuse
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads() .

◆ subtract_right_partial()

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<typename Output , unsigned int ItemsPerThread, typename BinaryFunction >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_adjacent_difference< T, BlockSizeX, BlockSizeY, BlockSizeZ >::subtract_right_partial ( const T(&)  input[ItemsPerThread],
Output(&)  output[ItemsPerThread],
const BinaryFunction  op,
const unsigned int  valid_items,
storage_type storage 
)
inline

Apply a function to each consecutive pair of elements partitioned across threads in the block and write the output to the position of the right item, in a partial tile.

// For each item i in [0, valid_items) across threads in a block
output[i] = op(input[i], input[i + 1]);
// Just copy "invalid" items in [valid_items, block_size * ItemsPerThread)
output[i] = input[i]
Template Parameters
Output- [inferred] the type of output, must be assignable from the result of op
ItemsPerThread- [inferred] the number of items processed by each thread
BinaryFunction- [inferred] the type of the function to apply
Parameters
[in]input- array that data is loaded from partitioned across the threads in the block
[out]output- array where the result of function application will be written to
[in]op- binary function applied to the items. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b) The signature does not need to have const & but the function object must not modify the objects passed to it.
[in]valid_items- number of items in the block which are considered "valid" and will be used. Must be less or equal to BlockSize * ItemsPerThread
storage- reference to a temporary storage object of type storage_type
Storage reuse
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads() .

The documentation for this class was generated from the following file: