rocPRIM
block_histogram< T, BlockSizeX, ItemsPerThread, Bins, Algorithm, BlockSizeY, BlockSizeZ > Class Template Reference

The block_histogram class is a block-level parallel primitive which provides methods for constructing block-wide histograms from items partitioned across threads in a block.

#include <block_histogram.hpp>


Public Types

using storage_type = typename base_type::storage_type
 Struct used to allocate temporary memory required for thread communication during operations provided by the related parallel primitive.
 

Public Member Functions

template<class Counter >
ROCPRIM_DEVICE ROCPRIM_INLINE void init_histogram (Counter hist[Bins])
 Initialize histogram counters to zero.
 
template<class Counter >
ROCPRIM_DEVICE ROCPRIM_INLINE void composite (T(&input)[ItemsPerThread], Counter hist[Bins], storage_type &storage)
 Update an existing block-wide histogram.
 
template<class Counter >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void composite (T(&input)[ItemsPerThread], Counter hist[Bins])
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
 
template<class Counter >
ROCPRIM_DEVICE ROCPRIM_INLINE void histogram (T(&input)[ItemsPerThread], Counter hist[Bins], storage_type &storage)
 Construct a new block-wide histogram.
 
template<class Counter >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void histogram (T(&input)[ItemsPerThread], Counter hist[Bins])
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
 

Detailed Description

template<class T, unsigned int BlockSizeX, unsigned int ItemsPerThread, unsigned int Bins, block_histogram_algorithm Algorithm = block_histogram_algorithm::default_algorithm, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
class block_histogram< T, BlockSizeX, ItemsPerThread, Bins, Algorithm, BlockSizeY, BlockSizeZ >

The block_histogram class is a block-level parallel primitive which provides methods for constructing block-wide histograms from items partitioned across threads in a block.

Template Parameters
T- the input/output type.
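BlockSizeX- the number of threads in the block's x dimension.
ItemsPerThread- the number of items to be processed by each thread.
Bins- the number of bins within the histogram.
Algorithm- selected histogram algorithm, block_histogram_algorithm::default_algorithm by default.
BlockSizeY- the number of threads in the block's y dimension, 1 by default.
BlockSizeZ- the number of threads in the block's z dimension, 1 by default.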
Overview
Examples

In this example, a histogram operation is performed on a block of 192 threads. Each thread provides two int values, and the result is written to the shared-memory array hist.

__global__ void example_kernel(...)
{
    // specialize block_histogram for int, logical block of 192 threads,
    // 2 items per thread and 192 bins.
    using block_histogram_int = rocprim::block_histogram<int, 192, 2, 192>;
    // allocate storage in shared memory
    __shared__ block_histogram_int::storage_type storage;
    __shared__ int hist[192];
    int value[2];
    ...
    // execute histogram
    block_histogram_int().histogram(
        value, // input
        hist, // output
        storage
    );
    ...
}
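
To make the expected result concrete, the following is a minimal host-side reference model of what a block-wide histogram computes. It is a sequential sketch, not the rocPRIM implementation: the function name reference_block_histogram is hypothetical, and it assumes that each input value is used directly as a bin index in the range [0, Bins).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Host-side reference model (hypothetical helper, not part of rocPRIM).
// items[t][i] is the i-th value held by thread t of the block.
// Assumption: every value is a valid bin index in [0, Bins).
template<std::size_t Bins, std::size_t BlockSize, std::size_t ItemsPerThread>
std::array<int, Bins> reference_block_histogram(const int (&items)[BlockSize][ItemsPerThread])
{
    std::array<int, Bins> hist{}; // all counters start at zero
    for(const auto& thread_items : items) // one row per thread
    {
        for(int v : thread_items) // ItemsPerThread values per thread
        {
            assert(v >= 0 && static_cast<std::size_t>(v) < Bins);
            ++hist[static_cast<std::size_t>(v)];
        }
    }
    return hist;
}
```

A model like this can serve as a check for kernel output: copy the per-block input and hist back to the host and compare the counters bin by bin.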

Member Typedef Documentation

◆ storage_type

template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, unsigned int Bins, block_histogram_algorithm Algorithm = block_histogram_algorithm::default_algorithm, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
using block_histogram< T, BlockSizeX, ItemsPerThread, Bins, Algorithm, BlockSizeY, BlockSizeZ >::storage_type = typename base_type::storage_type

Struct used to allocate temporary memory required for thread communication during operations provided by the related parallel primitive.

Depending on the implementation, the operations exposed by the parallel primitive may require temporary storage for thread communication. The storage should be allocated using the __shared__ keyword. It can be aliased to externally allocated memory, or be part of a union type with other storage types to increase shared memory reuse.
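
The union approach mentioned above can be sketched on the host with stand-in types. The two structs below are hypothetical placeholders for the storage_type of two different primitives, and their sizes are made up for illustration. The point is only that a union occupies the size of its largest member rather than the sum of all members.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for two primitives' storage_type structs.
// In device code these would be, for example,
// rocprim::block_histogram<...>::storage_type and another primitive's storage_type.
struct histogram_storage_stub
{
    unsigned int counters[192];
};

struct other_storage_stub
{
    int scratch[64];
};

// One allocation serves both primitives, provided they are never in use at the same time.
union shared_storage
{
    histogram_storage_stub histogram;
    other_storage_stub     other;
};

constexpr std::size_t separate_bytes = sizeof(histogram_storage_stub) + sizeof(other_storage_stub);
constexpr std::size_t unioned_bytes  = sizeof(shared_storage);
```

In a kernel, such a union would be declared with __shared__, and a synchronization barrier would be needed between the last use of one member and the first use of the other.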

Member Function Documentation

◆ composite() [1/2]

template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, unsigned int Bins, block_histogram_algorithm Algorithm = block_histogram_algorithm::default_algorithm, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class Counter >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_histogram< T, BlockSizeX, ItemsPerThread, Bins, Algorithm, BlockSizeY, BlockSizeZ >::composite ( T(&)  input[ItemsPerThread],
Counter  hist[Bins],
storage_type & storage 
)
inline

Update an existing block-wide histogram.

Each thread composites an array of input elements.

Template Parameters
Counter- [inferred] counter type of histogram.
Parameters
[in]input- reference to an array containing thread input values.
[in,out]hist- histogram bin counts, updated in place.
[in]storage- reference to a temporary storage object of type storage_type.
Storage reuse
A synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Examples

In this example, an existing histogram is updated by a block of 192 threads. Each thread provides two int values, and the counts are accumulated into the shared-memory array hist.

__global__ void example_kernel(...)
{
    // specialize block_histogram for int, logical block of 192 threads,
    // 2 items per thread and 192 bins.
    using block_histogram_int = rocprim::block_histogram<int, 192, 2, 192>;
    // allocate storage in shared memory
    __shared__ block_histogram_int::storage_type storage;
    __shared__ int hist[192];
    int value[2];
    ...
    // initialize histogram
    block_histogram_int().init_histogram(
        hist // output
    );
    // make sure all counters are zeroed before any thread updates them
    rocprim::syncthreads();
    // update histogram
    block_histogram_int().composite(
        value, // input
        hist, // output
        storage
    );
    ...
}
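
The difference from histogram() is that composite() adds to counters that already exist, so several tiles of input can be accumulated into one histogram. The sequential sketch below illustrates that behavior on the host. The names model_init_histogram and model_composite are hypothetical stand-ins, not rocPRIM functions, and the sketch assumes that each input value is used directly as a bin index.

```cpp
#include <cassert>
#include <cstddef>

// Sequential stand-in (hypothetical) for init_histogram(): zero every counter.
template<std::size_t Bins>
void model_init_histogram(int (&hist)[Bins])
{
    for(int& counter : hist)
    {
        counter = 0;
    }
}

// Sequential stand-in (hypothetical) for composite(): add the counts of one
// tile of input to the existing counters without resetting them.
// Assumption: every value is a valid bin index in [0, Bins).
template<std::size_t Bins, std::size_t BlockSize, std::size_t ItemsPerThread>
void model_composite(const int (&items)[BlockSize][ItemsPerThread], int (&hist)[Bins])
{
    for(const auto& thread_items : items) // one row per thread
    {
        for(int v : thread_items)
        {
            assert(v >= 0 && static_cast<std::size_t>(v) < Bins);
            ++hist[static_cast<std::size_t>(v)];
        }
    }
}
```

Calling model_init_histogram once and model_composite once per tile yields the histogram of all tiles combined, which mirrors the init_histogram() followed by composite() pattern in the kernel above.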

◆ composite() [2/2]

template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, unsigned int Bins, block_histogram_algorithm Algorithm = block_histogram_algorithm::default_algorithm, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class Counter >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_histogram< T, BlockSizeX, ItemsPerThread, Bins, Algorithm, BlockSizeY, BlockSizeZ >::composite ( T(&)  input[ItemsPerThread],
Counter  hist[Bins] 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Update an existing block-wide histogram. Each thread composites an array of input elements.

  • This overload does not accept a storage argument. The required shared memory is allocated by the method itself.
Template Parameters
Counter- [inferred] counter type of histogram.
Parameters
[in]input- reference to an array containing thread input values.
[in,out]hist- histogram bin counts, updated in place.

◆ histogram() [1/2]

template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, unsigned int Bins, block_histogram_algorithm Algorithm = block_histogram_algorithm::default_algorithm, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class Counter >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_histogram< T, BlockSizeX, ItemsPerThread, Bins, Algorithm, BlockSizeY, BlockSizeZ >::histogram ( T(&)  input[ItemsPerThread],
Counter  hist[Bins],
storage_type & storage 
)
inline

Construct a new block-wide histogram.

Each thread contributes an array of input elements.

Template Parameters
Counter- [inferred] counter type of histogram.
Parameters
[in]input- reference to an array containing thread input values.
[out]hist- histogram bin count.
[in]storage- reference to a temporary storage object of type storage_type.
Storage reuse
A synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Examples

In this example, a histogram operation is performed on a block of 192 threads. Each thread provides two int values, and the result is written to the shared-memory array hist.

__global__ void example_kernel(...)
{
    // specialize block_histogram for int, logical block of 192 threads,
    // 2 items per thread and 192 bins.
    using block_histogram_int = rocprim::block_histogram<int, 192, 2, 192>;
    // allocate storage in shared memory
    __shared__ block_histogram_int::storage_type storage;
    __shared__ int hist[192];
    int value[2];
    ...
    // execute histogram
    block_histogram_int().histogram(
        value, // input
        hist, // output
        storage
    );
    ...
}

◆ histogram() [2/2]

template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, unsigned int Bins, block_histogram_algorithm Algorithm = block_histogram_algorithm::default_algorithm, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class Counter >
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_histogram< T, BlockSizeX, ItemsPerThread, Bins, Algorithm, BlockSizeY, BlockSizeZ >::histogram ( T(&)  input[ItemsPerThread],
Counter  hist[Bins] 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Construct a new block-wide histogram. Each thread contributes an array of input elements.

  • This overload does not accept a storage argument. The required shared memory is allocated by the method itself.
Template Parameters
Counter- [inferred] counter type of histogram.
Parameters
[in]input- reference to an array containing thread input values.
[out]hist- histogram bin count.

◆ init_histogram()

template<class T , unsigned int BlockSizeX, unsigned int ItemsPerThread, unsigned int Bins, block_histogram_algorithm Algorithm = block_histogram_algorithm::default_algorithm, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<class Counter >
ROCPRIM_DEVICE ROCPRIM_INLINE void block_histogram< T, BlockSizeX, ItemsPerThread, Bins, Algorithm, BlockSizeY, BlockSizeZ >::init_histogram ( Counter  hist[Bins])
inline

Initialize histogram counters to zero.

Template Parameters
Counter- [inferred] counter type of histogram.
Parameters
[out]hist- histogram bin count.
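
When Bins is larger than the number of threads in the block, zeroing the counters has to be split across threads. One common way to do this is a block-strided loop, emulated sequentially on the host below. This is a sketch under that assumption, not a statement of how rocPRIM implements init_histogram(); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Emulates the work of a single thread (hypothetical helper): thread tid
// zeroes bins tid, tid + BlockSize, tid + 2 * BlockSize, ...
template<std::size_t BlockSize, std::size_t Bins>
void model_init_for_thread(std::size_t tid, int (&hist)[Bins])
{
    for(std::size_t bin = tid; bin < Bins; bin += BlockSize)
    {
        hist[bin] = 0;
    }
}

// Emulates the whole block by running every thread's share one after another.
template<std::size_t BlockSize, std::size_t Bins>
void model_init_histogram_strided(int (&hist)[Bins])
{
    for(std::size_t tid = 0; tid < BlockSize; ++tid)
    {
        model_init_for_thread<BlockSize>(tid, hist);
    }
}
```

Because different threads write different bins, a barrier is needed on the device before any thread reads or updates counters that another thread zeroed.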
