The block_histogram class is a block level parallel primitive which provides methods for constructing block-wide histograms from items partitioned across threads in a block.
#include <block_histogram.hpp>
using storage_type = typename base_type::storage_type
    Struct used to allocate temporary memory that is required for thread communication during operations provided by the related parallel primitive.

template<class Counter>
ROCPRIM_DEVICE ROCPRIM_INLINE void init_histogram (Counter hist[Bins])
    Initialize histogram counters to zero.

template<class Counter>
ROCPRIM_DEVICE ROCPRIM_INLINE void composite (T(&input)[ItemsPerThread], Counter hist[Bins], storage_type &storage)
    Update an existing block-wide histogram.

template<class Counter>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void composite (T(&input)[ItemsPerThread], Counter hist[Bins])
    This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

template<class Counter>
ROCPRIM_DEVICE ROCPRIM_INLINE void histogram (T(&input)[ItemsPerThread], Counter hist[Bins], storage_type &storage)
    Construct a new block-wide histogram.

template<class Counter>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void histogram (T(&input)[ItemsPerThread], Counter hist[Bins])
    This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

template<class T, unsigned int BlockSizeX, unsigned int ItemsPerThread, unsigned int Bins, block_histogram_algorithm Algorithm = block_histogram_algorithm::default_algorithm, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
class block_histogram< T, BlockSizeX, ItemsPerThread, Bins, Algorithm, BlockSizeY, BlockSizeZ >
The block_histogram class is a block level parallel primitive which provides methods for constructing block-wide histograms from items partitioned across threads in a block.
- Template Parameters
-
T | - the input/output type. |
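BlockSizeX | - the number of threads in the block's x dimension. |
ItemsPerThread | - the number of items to be processed by each thread. |
Bins | - the number of bins within the histogram. |
Algorithm | - the selected histogram algorithm, block_histogram_algorithm::default_algorithm by default. |
BlockSizeY | - the number of threads in the block's y dimension, 1 by default. |
BlockSizeZ | - the number of threads in the block's z dimension, 1 by default. |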
- Overview
-
- Examples
In the example, a histogram is built by a block of 192 threads. Each thread provides two int
values, and the resulting counts are written to the 192-bin shared array hist.
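__global__ void example_kernel(...)
{
    // specialize block_histogram for int, a block of 192 threads,
    // 2 items per thread, and 192 bins
    using block_histogram_int = rocprim::block_histogram<int, 192, 2, 192>;
    // allocate temporary storage in shared memory
    __shared__ block_histogram_int::storage_type storage;
    // allocate the histogram in shared memory
    __shared__ int hist[192];

    int value[2];
    ...
    // construct the block-wide histogram
    block_histogram_int().histogram(
        value,
        hist,
        storage
    );
    ...
}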
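To make the expected result of the example concrete, the following host-only sketch models what a block-wide histogram computes. It is not rocPRIM code: `reference_block_histogram` and `make_example_input` are hypothetical names, the threads are simulated by a plain loop, and the sketch assumes that each input item is already a valid bin index in the range [0, Bins).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only reference model (not rocPRIM code): counts every item held by
// every simulated thread into a histogram with `Bins` counters.
template <std::size_t ItemsPerThread, std::size_t Bins>
std::array<int, Bins> reference_block_histogram(
    const std::vector<std::array<int, ItemsPerThread>>& per_thread_items)
{
    std::array<int, Bins> hist{}; // all counters start at zero
    for (const auto& items : per_thread_items) {
        for (int bin : items) {
            // Assumption of this sketch: an item is used directly as a bin index.
            assert(bin >= 0 && static_cast<std::size_t>(bin) < Bins);
            ++hist[static_cast<std::size_t>(bin)];
        }
    }
    return hist;
}

// Builds input with the same shape as the example above: 192 threads with
// 2 items each. The fill pattern is arbitrary: thread t holds {t, 0}.
inline std::vector<std::array<int, 2>> make_example_input()
{
    std::vector<std::array<int, 2>> input(192);
    for (std::size_t t = 0; t < input.size(); ++t) {
        input[t] = {static_cast<int>(t), 0};
    }
    return input;
}
```

With this input, the counters sum to 192 × 2 = 384, the total number of items in the block, regardless of how the items are distributed over the threads.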
◆ storage_type
template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, unsigned int Bins, block_histogram_algorithm Algorithm = block_histogram_algorithm::default_algorithm, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
using block_histogram< T, BlockSizeX, ItemsPerThread, Bins, Algorithm, BlockSizeY, BlockSizeZ >::storage_type = typename base_type::storage_type |
Struct used to allocate temporary memory that is required for thread communication during operations provided by the related parallel primitive.
Depending on the implementation, the operations exposed by a parallel primitive may require temporary storage for thread communication. The storage should be allocated using the __shared__ keyword. It can be aliased to externally allocated memory, or be part of a union type with other storage types to increase shared memory reusability.
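The union technique mentioned above can be illustrated with a small host-only sketch. The two `*_stub` structs are hypothetical stand-ins for the storage_type of two different primitives. Real storage types are opaque, and their sizes depend on the template arguments.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the storage_type of two primitives.
struct histogram_storage_stub { unsigned int counters[192]; };
struct scan_storage_stub      { int partials[64]; };

// Members of a union occupy the same bytes, so the footprint is that of the
// largest member rather than the sum of both members.
union shared_storage_stub
{
    histogram_storage_stub histogram;
    scan_storage_stub      scan;
};

static_assert(sizeof(shared_storage_stub) >= sizeof(histogram_storage_stub),
              "union must be able to hold the histogram storage");
static_assert(sizeof(shared_storage_stub) >= sizeof(scan_storage_stub),
              "union must be able to hold the scan storage");
static_assert(sizeof(shared_storage_stub)
                  < sizeof(histogram_storage_stub) + sizeof(scan_storage_stub),
              "union is smaller than two separate allocations");
```

In a kernel, such a union would be declared with __shared__ and one member passed to each primitive in turn, with a synchronization barrier between the two uses because both members alias the same memory.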
◆ composite() [1/2]
template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, unsigned int Bins, block_histogram_algorithm Algorithm = block_histogram_algorithm::default_algorithm, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class Counter >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_histogram< T, BlockSizeX, ItemsPerThread, Bins, Algorithm, BlockSizeY, BlockSizeZ >::composite (T(&input)[ItemsPerThread], Counter hist[Bins], storage_type &storage)
inline
Update an existing block-wide histogram.
Each thread composites an array of input elements.
- Template Parameters
-
Counter | - [inferred] counter type of histogram. |
- Parameters
-
[in] | input | - reference to an array containing thread input values. |
[in,out] | hist | - histogram bin counts; the existing counts are updated in place. |
[in] | storage | - reference to a temporary storage object of type storage_type. |
- Storage reuse
- A synchronization barrier (__syncthreads() or rocprim::syncthreads()) should be placed before storage is reused or repurposed.
- Examples
In the example, an existing histogram is updated by a block of 192 threads. Each thread provides two int
values; the counters in hist are first zeroed with init_histogram() and then updated by composite().
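__global__ void example_kernel(...)
{
    // specialize block_histogram for int, a block of 192 threads,
    // 2 items per thread, and 192 bins
    using block_histogram_int = rocprim::block_histogram<int, 192, 2, 192>;
    // allocate temporary storage in shared memory
    __shared__ block_histogram_int::storage_type storage;
    // allocate the histogram in shared memory
    __shared__ int hist[192];

    int value[2];
    ...
    // zero the counters
    block_histogram_int().init_histogram(
        hist
    );
    // make the zeroed counters visible to every thread before updating them
    rocprim::syncthreads();
    // add this block's items to the existing counts
    block_histogram_int().composite(
        value,
        hist,
        storage
    );
    ...
}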
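The difference between initializing and compositing is easiest to see when several batches of items are folded into one histogram. The sketch below is a host-only model, not rocPRIM code: `init_histogram_model`, `composite_model`, and `kBins` are hypothetical names, and items are again assumed to be valid bin indices.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kBins = 4; // hypothetical, deliberately small bin count

// Host-only model of init_histogram: set every counter to zero.
void init_histogram_model(std::array<int, kBins>& hist)
{
    hist.fill(0);
}

// Host-only model of composite: add the items to the counts already in hist.
void composite_model(const std::vector<int>& items, std::array<int, kBins>& hist)
{
    for (int bin : items) {
        // Assumption of this sketch: an item is used directly as a bin index.
        assert(bin >= 0 && static_cast<std::size_t>(bin) < kBins);
        ++hist[static_cast<std::size_t>(bin)];
    }
}

// Initializes once, then composites two batches into the same histogram.
std::array<int, kBins> accumulate_two_batches()
{
    std::array<int, kBins> hist;
    init_histogram_model(hist);
    composite_model({0, 1, 1, 3}, hist); // first batch
    composite_model({1, 3, 3, 3}, hist); // second batch adds to the first
    return hist;
}
```

Because the second call does not reset the counters, the result holds the combined counts of both batches. Constructing a new histogram for each batch would instead discard the earlier counts.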
◆ composite() [2/2]
template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, unsigned int Bins, block_histogram_algorithm Algorithm = block_histogram_algorithm::default_algorithm, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class Counter >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_histogram< T, BlockSizeX, ItemsPerThread, Bins, Algorithm, BlockSizeY, BlockSizeZ >::composite (T(&input)[ItemsPerThread], Counter hist[Bins])
inline
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Update an existing block-wide histogram. Each thread composites an array of input elements.
- This overload does not accept a storage argument. The required shared memory is allocated by the method itself.
- Template Parameters
-
Counter | - [inferred] counter type of histogram. |
- Parameters
-
[in] | input | - reference to an array containing thread input values. |
[in,out] | hist | - histogram bin counts; the existing counts are updated in place. |
◆ histogram() [1/2]
template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, unsigned int Bins, block_histogram_algorithm Algorithm = block_histogram_algorithm::default_algorithm, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class Counter >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_histogram< T, BlockSizeX, ItemsPerThread, Bins, Algorithm, BlockSizeY, BlockSizeZ >::histogram (T(&input)[ItemsPerThread], Counter hist[Bins], storage_type &storage)
inline
Construct a new block-wide histogram.
Each thread contributes an array of input elements.
- Template Parameters
-
Counter | - [inferred] counter type of histogram. |
- Parameters
-
[in] | input | - reference to an array containing thread input values. |
[out] | hist | - histogram bin count. |
[in] | storage | - reference to a temporary storage object of type storage_type. |
- Storage reuse
- A synchronization barrier (__syncthreads() or rocprim::syncthreads()) should be placed before storage is reused or repurposed.
- Examples
In the example, a histogram is built by a block of 192 threads. Each thread provides two int
values, and the resulting counts are written to the 192-bin shared array hist.
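__global__ void example_kernel(...)
{
    // specialize block_histogram for int, a block of 192 threads,
    // 2 items per thread, and 192 bins
    using block_histogram_int = rocprim::block_histogram<int, 192, 2, 192>;
    // allocate temporary storage in shared memory
    __shared__ block_histogram_int::storage_type storage;
    // allocate the histogram in shared memory
    __shared__ int hist[192];

    int value[2];
    ...
    // construct the block-wide histogram
    block_histogram_int().histogram(
        value,
        hist,
        storage
    );
    ...
}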
◆ histogram() [2/2]
template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, unsigned int Bins, block_histogram_algorithm Algorithm = block_histogram_algorithm::default_algorithm, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class Counter >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_histogram< T, BlockSizeX, ItemsPerThread, Bins, Algorithm, BlockSizeY, BlockSizeZ >::histogram (T(&input)[ItemsPerThread], Counter hist[Bins])
inline
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Construct a new block-wide histogram. Each thread contributes an array of input elements.
- This overload does not accept a storage argument. The required shared memory is allocated by the method itself.
- Template Parameters
-
Counter | - [inferred] counter type of histogram. |
- Parameters
-
[in] | input | - reference to an array containing thread input values. |
[out] | hist | - histogram bin count. |
◆ init_histogram()
template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, unsigned int Bins, block_histogram_algorithm Algorithm = block_histogram_algorithm::default_algorithm, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class Counter >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_histogram< T, BlockSizeX, ItemsPerThread, Bins, Algorithm, BlockSizeY, BlockSizeZ >::init_histogram (Counter hist[Bins])
inline
Initialize histogram counters to zero.
- Template Parameters
-
Counter | - [inferred] counter type of histogram. |
- Parameters
-
[out] | hist | - histogram bin count. |
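Since the number of bins can differ from the number of threads, zeroing the counters is a cooperative step. The host-only sketch below shows one common way to split that work, a block-strided loop. This pattern is an assumption made for illustration, not a statement about how rocPRIM implements init_histogram, and `zero_bins_for_thread` and `zero_bins_for_block` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Host-only model: simulated thread `tid` clears bins tid, tid + BlockSize,
// tid + 2 * BlockSize, and so on.
template <std::size_t BlockSize, std::size_t Bins>
void zero_bins_for_thread(std::size_t tid, std::array<int, Bins>& hist)
{
    assert(tid < BlockSize);
    for (std::size_t bin = tid; bin < Bins; bin += BlockSize) {
        hist[bin] = 0;
    }
}

// Simulates every thread of the block making the call, one after another.
template <std::size_t BlockSize, std::size_t Bins>
std::array<int, Bins> zero_bins_for_block()
{
    std::array<int, Bins> hist;
    hist.fill(-1); // stale, non-zero contents
    for (std::size_t tid = 0; tid < BlockSize; ++tid) {
        zero_bins_for_thread<BlockSize>(tid, hist);
    }
    return hist;
}
```

Under this scheme a single thread clears only its own subset of bins, so the histogram is fully zeroed only after all threads of the block have made the call and a barrier has been passed.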