rocPRIM
Classes | Public Types | Public Member Functions | List of all members
block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ > Class Template Reference

The block_discontinuity class is a block level parallel primitive which provides methods for flagging items that are discontinued within an ordered set of items across threads in a block. More...

#include <block_discontinuity.hpp>

Inheritance diagram for block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >:
Inheritance graph
[legend]
Collaboration diagram for block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >:
Collaboration graph
[legend]

Public Types

using storage_type = detail::raw_storage< storage_type_ >
 Struct used to allocate a temporary memory that is required for thread communication during operations provided by related parallel primitive. More...
 

Public Member Functions

template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void flag_heads (Flag(&head_flags)[ItemsPerThread], const T(&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)
 Tags head_flags that indicate discontinuities between items partitioned across the thread block, where the first item has no reference and is always flagged. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void flag_heads (Flag(&head_flags)[ItemsPerThread], const T(&input)[ItemsPerThread], FlagOp flag_op)
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not take a reference to temporary storage, instead it is declared as part of the function itself. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void flag_heads (Flag(&head_flags)[ItemsPerThread], T tile_predecessor_item, const T(&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)
 Tags head_flags that indicate discontinuities between items partitioned across the thread block, where the first item of the first thread is compared against a tile_predecessor_item. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void flag_heads (Flag(&head_flags)[ItemsPerThread], T tile_predecessor_item, const T(&input)[ItemsPerThread], FlagOp flag_op)
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void flag_tails (Flag(&tail_flags)[ItemsPerThread], const T(&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)
 Tags tail_flags that indicate discontinuities between items partitioned across the thread block, where the last item has no reference and is always flagged. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void flag_tails (Flag(&tail_flags)[ItemsPerThread], const T(&input)[ItemsPerThread], FlagOp flag_op)
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void flag_tails (Flag(&tail_flags)[ItemsPerThread], T tile_successor_item, const T(&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)
 Tags tail_flags that indicate discontinuities between items partitioned across the thread block, where the last item of the last thread is compared against a tile_successor_item. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void flag_tails (Flag(&tail_flags)[ItemsPerThread], T tile_successor_item, const T(&input)[ItemsPerThread], FlagOp flag_op)
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void flag_heads_and_tails (Flag(&head_flags)[ItemsPerThread], Flag(&tail_flags)[ItemsPerThread], const T(&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)
 Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void flag_heads_and_tails (Flag(&head_flags)[ItemsPerThread], Flag(&tail_flags)[ItemsPerThread], const T(&input)[ItemsPerThread], FlagOp flag_op)
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void flag_heads_and_tails (Flag(&head_flags)[ItemsPerThread], Flag(&tail_flags)[ItemsPerThread], T tile_successor_item, const T(&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)
 Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block, where the last item of the last thread is compared against a tile_successor_item. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void flag_heads_and_tails (Flag(&head_flags)[ItemsPerThread], Flag(&tail_flags)[ItemsPerThread], T tile_successor_item, const T(&input)[ItemsPerThread], FlagOp flag_op)
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void flag_heads_and_tails (Flag(&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag(&tail_flags)[ItemsPerThread], const T(&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)
 Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block, where the first item of the first thread is compared against a tile_predecessor_item. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void flag_heads_and_tails (Flag(&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag(&tail_flags)[ItemsPerThread], const T(&input)[ItemsPerThread], FlagOp flag_op)
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void flag_heads_and_tails (Flag(&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag(&tail_flags)[ItemsPerThread], T tile_successor_item, const T(&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)
 Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block, where the first and last items of the first and last thread is compared against a tile_predecessor_item and a tile_successor_item. More...
 
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void flag_heads_and_tails (Flag(&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag(&tail_flags)[ItemsPerThread], T tile_successor_item, const T(&input)[ItemsPerThread], FlagOp flag_op)
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. More...
 

Detailed Description

template<class T, unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
class block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >

The block_discontinuity class is a block level parallel primitive which provides methods for flagging items that are discontinued within an ordered set of items across threads in a block.

Template Parameters
T- the input type.
BlockSize- the number of threads in a block.
Overview
  • There are two types of flags:
    • Head flags.
    • Tail flags.
  • The above flags are used to differentiate items from their predecessors or successors.
  • E.g. Head flags are convenient for differentiating disjoint data segments as part of a segmented reduction/scan.
Examples

In the examples discontinuity operation is performed on block of 128 threads, using type int.

__global__ void example_kernel(...)
{
// specialize discontinuity for int and a block of 128 threads
using block_discontinuity_int = rocprim::block_discontinuity<int, 128>;
// allocate storage in shared memory
__shared__ block_discontinuity_int::storage_type storage;
// segment of consecutive items to be used
int input[8];
...
int head_flags[8];
block_discontinuity_int b_discontinuity;
using flag_op_type = typename rocprim::greater<int>;
b_discontinuity.flag_heads(head_flags, input, flag_op_type(), storage);
...
}

Member Typedef Documentation

◆ storage_type

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
using block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >::storage_type = detail::raw_storage<storage_type_>

Struct used to allocate a temporary memory that is required for thread communication during operations provided by related parallel primitive.

Depending on the implemention the operations exposed by parallel primitive may require a temporary storage for thread communication. The storage should be allocated using keywords __shared__. It can be aliased to an externally allocated memory, or be a part of a union type with other storage types to increase shared memory reusability.

Member Function Documentation

◆ flag_heads() [1/4]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads ( Flag(&)  head_flags[ItemsPerThread],
const T(&)  input[ItemsPerThread],
FlagOp  flag_op,
storage_type storage 
)
inline

Tags head_flags that indicate discontinuities between items partitioned across the thread block, where the first item has no reference and is always flagged.

Template Parameters
ItemsPerThread- [inferred] the number of items to be processed by each thread.
Flag- [inferred] the flag type.
FlagOp- [inferred] type of binary function used for flagging.
Parameters
[out]head_flags- array that contains the head flags.
[in]input- array that data is loaded from.
[in]flag_op- binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.
[in]storage- reference to a temporary storage object of type storage_type.
Storage reusage
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Example.
__global__ void example_kernel(...)
{
// specialize discontinuity for int and a block of 128 threads
using block_discontinuity_int = rocprim::block_discontinuity<int, 128>;
// allocate storage in shared memory
__shared__ block_discontinuity_int::storage_type storage;
// segment of consecutive items to be used
int input[8];
...
int head_flags[8];
block_discontinuity_int b_discontinuity;
using flag_op_type = typename rocprim::greater<int>;
b_discontinuity.flag_heads(head_flags, input, flag_op_type(), storage);
...
}

◆ flag_heads() [2/4]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads ( Flag(&)  head_flags[ItemsPerThread],
const T(&)  input[ItemsPerThread],
FlagOp  flag_op 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not take a reference to temporary storage, instead it is declared as part of the function itself.

Note that this does NOT decrease the shared memory requirements of a kernel using this function.

◆ flag_heads() [3/4]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads ( Flag(&)  head_flags[ItemsPerThread],
tile_predecessor_item,
const T(&)  input[ItemsPerThread],
FlagOp  flag_op,
storage_type storage 
)
inline

Tags head_flags that indicate discontinuities between items partitioned across the thread block, where the first item of the first thread is compared against a tile_predecessor_item.

Template Parameters
ItemsPerThread- [inferred] the number of items to be processed by each thread.
Flag- [inferred] the flag type.
FlagOp- [inferred] type of binary function used for flagging.
Parameters
[out]head_flags- array that contains the head flags.
[in]tile_predecessor_item- first tile item from thread to be compared against.
[in]input- array that data is loaded from.
[in]flag_op- binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.
[in]storage- reference to a temporary storage object of type storage_type.
Storage reusage
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Example.
__global__ void example_kernel(...)
{
// specialize discontinuity for int and a block of 128 threads
using block_discontinuity_int = rocprim::block_discontinuity<int, 128>;
// allocate storage in shared memory
__shared__ block_discontinuity_int::storage_type storage;
// segment of consecutive items to be used
int input[8];
int tile_item = 0;
if (threadIdx.x == 0)
{
tile_item = ...
}
...
int head_flags[8];
block_discontinuity_int b_discontinuity;
using flag_op_type = typename rocprim::greater<int>;
b_discontinuity.flag_heads(head_flags, tile_item, input, flag_op_type(),
storage);
...
}

◆ flag_heads() [4/4]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads ( Flag(&)  head_flags[ItemsPerThread],
tile_predecessor_item,
const T(&)  input[ItemsPerThread],
FlagOp  flag_op 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself.

Note that this does NOT decrease the shared memory requirements of a kernel using this function.

◆ flag_heads_and_tails() [1/8]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads_and_tails ( Flag(&)  head_flags[ItemsPerThread],
Flag(&)  tail_flags[ItemsPerThread],
const T(&)  input[ItemsPerThread],
FlagOp  flag_op,
storage_type storage 
)
inline

Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block.

Template Parameters
ItemsPerThread- [inferred] the number of items to be processed by each thread.
Flag- [inferred] the flag type.
FlagOp- [inferred] type of binary function used for flagging.
Parameters
[out]head_flags- array that contains the head flags.
[out]tail_flags- array that contains the tail flags.
[in]input- array that data is loaded from.
[in]flag_op- binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.
[in]storage- reference to a temporary storage object of type storage_type.
Storage reusage
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Example.
__global__ void example_kernel(...)
{
// specialize discontinuity for int and a block of 128 threads
using block_discontinuity_int = rocprim::block_discontinuity<int, 128>;
// allocate storage in shared memory
__shared__ block_discontinuity_int::storage_type storage;
// segment of consecutive items to be used
int input[8];
...
int head_flags[8];
int tail_flags[8];
block_discontinuity_int b_discontinuity;
using flag_op_type = typename rocprim::greater<int>;
b_discontinuity.flag_heads_and_tails(head_flags, tail_flags, input,
flag_op_type(), storage);
...
}

◆ flag_heads_and_tails() [2/8]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads_and_tails ( Flag(&)  head_flags[ItemsPerThread],
Flag(&)  tail_flags[ItemsPerThread],
const T(&)  input[ItemsPerThread],
FlagOp  flag_op 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself.

Note that this does NOT decrease the shared memory requirements of a kernel using this function.

◆ flag_heads_and_tails() [3/8]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads_and_tails ( Flag(&)  head_flags[ItemsPerThread],
Flag(&)  tail_flags[ItemsPerThread],
tile_successor_item,
const T(&)  input[ItemsPerThread],
FlagOp  flag_op,
storage_type storage 
)
inline

Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block, where the last item of the last thread is compared against a tile_successor_item.

Template Parameters
ItemsPerThread- [inferred] the number of items to be processed by each thread.
Flag- [inferred] the flag type.
FlagOp- [inferred] type of binary function used for flagging.
Parameters
[out]head_flags- array that contains the head flags.
[out]tail_flags- array that contains the tail flags.
[in]tile_successor_item- last tile item from thread to be compared against.
[in]input- array that data is loaded from.
[in]flag_op- binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.
[in]storage- reference to a temporary storage object of type storage_type.
Storage reusage
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Example.
__global__ void example_kernel(...)
{
// specialize discontinuity for int and a block of 128 threads
using block_discontinuity_int = rocprim::block_discontinuity<int, 128>;
// allocate storage in shared memory
__shared__ block_discontinuity_int::storage_type storage;
// segment of consecutive items to be used
int input[8];
int tile_item = 0;
if (threadIdx.x == 0)
{
tile_item = ...
}
...
int head_flags[8];
int tail_flags[8];
block_discontinuity_int b_discontinuity;
using flag_op_type = typename rocprim::greater<int>;
b_discontinuity.flag_heads_and_tails(head_flags, tail_flags, tile_item,
input, flag_op_type(),
storage);
...
}

◆ flag_heads_and_tails() [4/8]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads_and_tails ( Flag(&)  head_flags[ItemsPerThread],
Flag(&)  tail_flags[ItemsPerThread],
tile_successor_item,
const T(&)  input[ItemsPerThread],
FlagOp  flag_op 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself.

Note that this does NOT decrease the shared memory requirements of a kernel using this function.

◆ flag_heads_and_tails() [5/8]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads_and_tails ( Flag(&)  head_flags[ItemsPerThread],
tile_predecessor_item,
Flag(&)  tail_flags[ItemsPerThread],
const T(&)  input[ItemsPerThread],
FlagOp  flag_op,
storage_type storage 
)
inline

Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block, where the first item of the first thread is compared against a tile_predecessor_item.

Template Parameters
ItemsPerThread- [inferred] the number of items to be processed by each thread.
Flag- [inferred] the flag type.
FlagOp- [inferred] type of binary function used for flagging.
Parameters
[out]head_flags- array that contains the head flags.
[in]tile_predecessor_item- first tile item from thread to be compared against.
[out]tail_flags- array that contains the tail flags.
[in]input- array that data is loaded from.
[in]flag_op- binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.
[in]storage- reference to a temporary storage object of type storage_type.
Storage reusage
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Example.
__global__ void example_kernel(...)
{
// specialize discontinuity for int and a block of 128 threads
using block_discontinuity_int = rocprim::block_discontinuity<int, 128>;
// allocate storage in shared memory
__shared__ block_discontinuity_int::storage_type storage;
// segment of consecutive items to be used
int input[8];
int tile_item = 0;
if (threadIdx.x == 0)
{
tile_item = ...
}
...
int head_flags[8];
int tail_flags[8];
block_discontinuity_int b_discontinuity;
using flag_op_type = typename rocprim::greater<int>;
b_discontinuity.flag_heads_and_tails(head_flags, tile_item, tail_flags,
input, flag_op_type(),
storage);
...
}

◆ flag_heads_and_tails() [6/8]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads_and_tails ( Flag(&)  head_flags[ItemsPerThread],
tile_predecessor_item,
Flag(&)  tail_flags[ItemsPerThread],
const T(&)  input[ItemsPerThread],
FlagOp  flag_op 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself.

Note that this does NOT decrease the shared memory requirements of a kernel using this function.

◆ flag_heads_and_tails() [7/8]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads_and_tails ( Flag(&)  head_flags[ItemsPerThread],
tile_predecessor_item,
Flag(&)  tail_flags[ItemsPerThread],
tile_successor_item,
const T(&)  input[ItemsPerThread],
FlagOp  flag_op,
storage_type storage 
)
inline

Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block, where the first and last items of the first and last thread is compared against a tile_predecessor_item and a tile_successor_item.

Template Parameters
ItemsPerThread- [inferred] the number of items to be processed by each thread.
Flag- [inferred] the flag type.
FlagOp- [inferred] type of binary function used for flagging.
Parameters
[out]head_flags- array that contains the head flags.
[in]tile_predecessor_item- first tile item from thread to be compared against.
[out]tail_flags- array that contains the tail flags.
[in]tile_successor_item- last tile item from thread to be compared against.
[in]input- array that data is loaded from.
[in]flag_op- binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.
[in]storage- reference to a temporary storage object of type storage_type.
Storage reusage
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Example.
__global__ void example_kernel(...)
{
// specialize discontinuity for int and a block of 128 threads
using block_discontinuity_int = rocprim::block_discontinuity<int, 128>;
// allocate storage in shared memory
__shared__ block_discontinuity_int::storage_type storage;
// segment of consecutive items to be used
int input[8];
int tile_predecessor_item = 0;
int tile_successor_item = 0;
if (threadIdx.x == 0)
{
tile_predecessor_item = ...
tile_successor_item = ...
}
...
int head_flags[8];
int tail_flags[8];
block_discontinuity_int b_discontinuity;
using flag_op_type = typename rocprim::greater<int>;
b_discontinuity.flag_heads_and_tails(head_flags, tile_predecessor_item,
tail_flags, tile_successor_item,
input, flag_op_type(),
storage);
...
}

◆ flag_heads_and_tails() [8/8]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_heads_and_tails ( Flag(&)  head_flags[ItemsPerThread],
tile_predecessor_item,
Flag(&)  tail_flags[ItemsPerThread],
tile_successor_item,
const T(&)  input[ItemsPerThread],
FlagOp  flag_op 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself.

Note that this does NOT decrease the shared memory requirements of a kernel using this function.

◆ flag_tails() [1/4]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_tails ( Flag(&)  tail_flags[ItemsPerThread],
const T(&)  input[ItemsPerThread],
FlagOp  flag_op,
storage_type storage 
)
inline

Tags tail_flags that indicate discontinuities between items partitioned across the thread block, where the last item has no reference and is always flagged.

Template Parameters
ItemsPerThread- [inferred] the number of items to be processed by each thread.
Flag- [inferred] the flag type.
FlagOp- [inferred] type of binary function used for flagging.
Parameters
[out]tail_flags- array that contains the tail flags.
[in]input- array that data is loaded from.
[in]flag_op- binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.
[in]storage- reference to a temporary storage object of type storage_type.
Storage reusage
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Example.
__global__ void example_kernel(...)
{
// specialize discontinuity for int and a block of 128 threads
using block_discontinuity_int = rocprim::block_discontinuity<int, 128>;
// allocate storage in shared memory
__shared__ block_discontinuity_int::storage_type storage;
// segment of consecutive items to be used
int input[8];
...
int tail_flags[8];
block_discontinuity_int b_discontinuity;
using flag_op_type = typename rocprim::greater<int>;
b_discontinuity.flag_tails(tail_flags, input, flag_op_type(), storage);
...
}

◆ flag_tails() [2/4]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_tails ( Flag(&)  tail_flags[ItemsPerThread],
const T(&)  input[ItemsPerThread],
FlagOp  flag_op 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself.

Note that this does NOT decrease the shared memory requirements of a kernel using this function.

◆ flag_tails() [3/4]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_tails ( Flag(&)  tail_flags[ItemsPerThread],
tile_successor_item,
const T(&)  input[ItemsPerThread],
FlagOp  flag_op,
storage_type storage 
)
inline

Tags tail_flags that indicate discontinuities between items partitioned across the thread block, where the last item of the last thread is compared against a tile_successor_item.

Template Parameters
ItemsPerThread- [inferred] the number of items to be processed by each thread.
Flag- [inferred] the flag type.
FlagOp- [inferred] type of binary function used for flagging.
Parameters
[out]tail_flags- array that contains the tail flags.
[in]tile_successor_item- last tile item from thread to be compared against.
[in]input- array that data is loaded from.
[in]flag_op- binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.
[in]storage- reference to a temporary storage object of type storage_type.
Storage reusage
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Example.
__global__ void example_kernel(...)
{
// specialize discontinuity for int and a block of 128 threads
using block_discontinuity_int = rocprim::block_discontinuity<int, 128>;
// allocate storage in shared memory
__shared__ block_discontinuity_int::storage_type storage;
// segment of consecutive items to be used
int input[8];
int tile_item = 0;
if (threadIdx.x == 0)
{
tile_item = ...
}
...
int tail_flags[8];
block_discontinuity_int b_discontinuity;
using flag_op_type = typename rocprim::greater<int>;
b_discontinuity.flag_tails(tail_flags, tile_item, input, flag_op_type(),
storage);
...
}

◆ flag_tails() [4/4]

template<class T , unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<unsigned int ItemsPerThread, class Flag , class FlagOp >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_discontinuity< T, BlockSizeX, BlockSizeY, BlockSizeZ >::flag_tails ( Flag(&)  tail_flags[ItemsPerThread],
tile_successor_item,
const T(&)  input[ItemsPerThread],
FlagOp  flag_op 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself.

Note that this does NOT decrease the shared memory requirements of a kernel using this function.


The documentation for this class was generated from the following file: