rocPRIM
|
The block_radix_sort class is a block level parallel primitive which provides methods for sorting of items (keys or key-value pairs) partitioned across threads in a block using radix sort algorithm. More...
#include <block_radix_sort.hpp>
Public Types | |
using | storage_type = detail::raw_storage< storage_type_ > |
Struct used to allocate a temporary memory that is required for thread communication during operations provided by related parallel primitive. More... | |
Public Member Functions | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | sort (Key(&keys)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
Performs ascending radix sort over keys partitioned across threads in a block. More... | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | sort (Key(&keys)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | sort_desc (Key(&keys)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
Performs descending radix sort over keys partitioned across threads in a block. More... | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | sort_desc (Key(&keys)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
template<bool WithValues = with_values> | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | sort (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
Performs ascending radix sort over key-value pairs partitioned across threads in a block. More... | |
template<bool WithValues = with_values> | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | sort (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
template<bool WithValues = with_values> | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | sort_desc (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
Performs descending radix sort over key-value pairs partitioned across threads in a block. More... | |
template<bool WithValues = with_values> | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | sort_desc (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | sort_to_striped (Key(&keys)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
Performs ascending radix sort over keys partitioned across threads in a block, results are saved in a striped arrangement. More... | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | sort_to_striped (Key(&keys)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | sort_desc_to_striped (Key(&keys)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
Performs descending radix sort over keys partitioned across threads in a block, results are saved in a striped arrangement. More... | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | sort_desc_to_striped (Key(&keys)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
template<bool WithValues = with_values> | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | sort_to_striped (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
Performs ascending radix sort over key-value pairs partitioned across threads in a block, results are saved in a striped arrangement. More... | |
template<bool WithValues = with_values> | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | sort_to_striped (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
template<bool WithValues = with_values> | |
ROCPRIM_DEVICE ROCPRIM_INLINE void | sort_desc_to_striped (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
Performs descending radix sort over key-value pairs partitioned across threads in a block, results are saved in a striped arrangement. More... | |
template<bool WithValues = with_values> | |
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | sort_desc_to_striped (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
The block_radix_sort class is a block level parallel primitive which provides methods for sorting of items (keys or key-value pairs) partitioned across threads in a block using radix sort algorithm.
Key | - the key type. |
BlockSize | - the number of threads in a block. |
ItemsPerThread | - the number of items contributed by each thread. |
Value | - the value type. Default type empty_type indicates a keys-only sort. |
Key
type must be an arithmetic type (that is, an integral type or a floating-point type).BlockSize
and ItemsPerThread
.BlockSize
to be a multiple of the size of the hardware warp.ItemsPerThread
is greater than one. However, when there are too many items per thread, each thread may need so much registers and/or shared memory that occupancy will fall too low, decreasing the performance.Key
is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit
and end_bit
, for example if all keys are in range [100, 10000], begin_bit = 0
and end_bit = 14
will cover the whole range.In the examples radix sort is performed on a block of 256 threads, each thread provides eight int
value, results are returned using the same array as for input.
using block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >::storage_type = detail::raw_storage<storage_type_> |
Struct used to allocate a temporary memory that is required for thread communication during operations provided by related parallel primitive.
Depending on the implemention the operations exposed by parallel primitive may require a temporary storage for thread communication. The storage should be allocated using keywords __shared__
. It can be aliased to an externally allocated memory, or be a part of a union type with other storage types to increase shared memory reusability.
|
inline |
Performs ascending radix sort over keys partitioned across threads in a block.
[in,out] | keys | - reference to an array of keys provided by a thread. |
[in] | storage | - reference to a temporary storage object of type storage_type. |
[in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)) . Default value: 0 . |
[in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)] . Default value: . |
storage
is reused or repurposed: __syncthreads()
or rocprim::syncthreads()
.In the examples radix sort is performed on a block of 128 threads, each thread provides two float
value, results are returned using the same array as for input.
If the input
values across threads in a block are {[256, 255], ..., [4, 3], [2, 1]}}
, then then after sort they will be equal {[1, 2], [3, 4] ..., [255, 256]}
.
|
inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Performs ascending radix sort over keys partitioned across threads in a block.
[in,out] | keys | - reference to an array of keys provided by a thread. |
[in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)) . Default value: 0 . |
[in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)] . Default value: . |
|
inline |
Performs ascending radix sort over key-value pairs partitioned across threads in a block.
Value
type is different than empty_type.[in,out] | keys | - reference to an array of keys provided by a thread. |
[in,out] | values | - reference to an array of values provided by a thread. |
[in] | storage | - reference to a temporary storage object of type storage_type. |
[in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)) . Default value: 0 . |
[in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)] . Default value: . |
storage
is reused or repurposed: __syncthreads()
or rocprim::syncthreads()
.In the examples radix sort is performed on a block of 128 threads, each thread provides two key-value int
-float
pairs, results are returned using the same arrays as for input.
If the keys
across threads in a block are {[256, 255], ..., [4, 3], [2, 1]}
and the values
are {[1, 1], [2, 2] ..., [128, 128]}
, then after sort the keys
will be equal {[1, 2], [3, 4] ..., [255, 256]}
and the values
will be equal {[128, 128], [127, 127] ..., [2, 2], [1, 1]}
.
|
inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Performs ascending radix sort over key-value pairs partitioned across threads in a block.
Value
type is different than empty_type.[in,out] | keys | - reference to an array of keys provided by a thread. |
[in,out] | values | - reference to an array of values provided by a thread. |
[in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)) . Default value: 0 . |
[in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)] . Default value: . |
|
inline |
Performs descending radix sort over keys partitioned across threads in a block.
[in,out] | keys | - reference to an array of keys provided by a thread. |
[in] | storage | - reference to a temporary storage object of type storage_type. |
[in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)) . Default value: 0 . |
[in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)] . Default value: . |
storage
is reused or repurposed: __syncthreads()
or rocprim::syncthreads()
.In the examples radix sort is performed on a block of 128 threads, each thread provides two float
value, results are returned using the same array as for input.
If the input
values across threads in a block are {[1, 2], [3, 4] ..., [255, 256]}
, then after sort they will be equal {[256, 255], ..., [4, 3], [2, 1]}
.
|
inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Performs descending radix sort over keys partitioned across threads in a block.
[in,out] | keys | - reference to an array of keys provided by a thread. |
[in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)) . Default value: 0 . |
[in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)] . Default value: . |
|
inline |
Performs descending radix sort over key-value pairs partitioned across threads in a block.
Value
type is different than empty_type.[in,out] | keys | - reference to an array of keys provided by a thread. |
[in,out] | values | - reference to an array of values provided by a thread. |
[in] | storage | - reference to a temporary storage object of type storage_type. |
[in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)) . Default value: 0 . |
[in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)] . Default value: . |
storage
is reused or repurposed: __syncthreads()
or rocprim::syncthreads()
.In the examples radix sort is performed on a block of 128 threads, each thread provides two key-value int
-float
pairs, results are returned using the same arrays as for input.
If the keys
across threads in a block are {[1, 2], [3, 4] ..., [255, 256]}
and the values
are {[128, 128], [127, 127] ..., [2, 2], [1, 1]}
, then after sort the keys
will be equal {[256, 255], ..., [4, 3], [2, 1]}
and the values
will be equal {[1, 1], [2, 2] ..., [128, 128]}
.
|
inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Performs descending radix sort over key-value pairs partitioned across threads in a block.
Value
type is different than empty_type.[in,out] | keys | - reference to an array of keys provided by a thread. |
[in,out] | values | - reference to an array of values provided by a thread. |
[in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)) . Default value: 0 . |
[in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)] . Default value: . |
|
inline |
Performs descending radix sort over keys partitioned across threads in a block, results are saved in a striped arrangement.
[in,out] | keys | - reference to an array of keys provided by a thread. |
[in] | storage | - reference to a temporary storage object of type storage_type. |
[in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)) . Default value: 0 . |
[in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)] . Default value: . |
storage
is reused or repurposed: __syncthreads()
or rocprim::syncthreads()
.In the examples radix sort is performed on a block of 128 threads, each thread provides two float
value, results are returned using the same array as for input.
If the input
values across threads in a block are {[1, 2], [3, 4] ..., [255, 256]}
, then after sort they will be equal {[256, 128], ..., [130, 2], [129, 1]}
.
|
inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Performs descending radix sort over keys partitioned across threads in a block, results are saved in a striped arrangement.
[in,out] | keys | - reference to an array of keys provided by a thread. |
[in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)) . Default value: 0 . |
[in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)] . Default value: . |
|
inline |
Performs descending radix sort over key-value pairs partitioned across threads in a block, results are saved in a striped arrangement.
Value
type is different than empty_type.[in,out] | keys | - reference to an array of keys provided by a thread. |
[in,out] | values | - reference to an array of values provided by a thread. |
[in] | storage | - reference to a temporary storage object of type storage_type. |
[in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)) . Default value: 0 . |
[in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)] . Default value: . |
storage
is reused or repurposed: __syncthreads()
or rocprim::syncthreads()
.In the examples radix sort is performed on a block of 4 threads, each thread provides two key-value int
-float
pairs, results are returned using the same arrays as for input.
If the keys
across threads in a block are {[1, 2], [3, 4], [5, 6], [7, 8]}
and the values
are {[80, 70], [60, 50], [40, 30], [20, 10]}
, then after sort the keys
will be equal {[8, 4], [7, 3], [6, 2], [5, 1]}
and the values
will be equal {[10, 50], [20, 60], [30, 70], [40, 80]}
.
|
inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Performs descending radix sort over key-value pairs partitioned across threads in a block, results are saved in a striped arrangement.
[in,out] | keys | - reference to an array of keys provided by a thread. |
[in,out] | values | - reference to an array of values provided by a thread. |
[in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)) . Default value: 0 . |
[in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)] . Default value: . |
|
inline |
Performs ascending radix sort over keys partitioned across threads in a block, results are saved in a striped arrangement.
[in,out] | keys | - reference to an array of keys provided by a thread. |
[in] | storage | - reference to a temporary storage object of type storage_type. |
[in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)) . Default value: 0 . |
[in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)] . Default value: . |
storage
is reused or repurposed: __syncthreads()
or rocprim::syncthreads()
.In the examples radix sort is performed on a block of 128 threads, each thread provides two float
value, results are returned using the same array as for input.
If the input
values across threads in a block are {[256, 255], ..., [4, 3], [2, 1]}}
, then then after sort they will be equal {[1, 129], [2, 130] ..., [128, 256]}
.
|
inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Performs ascending radix sort over keys partitioned across threads in a block, results are saved in a striped arrangement.
[in,out] | keys | - reference to an array of keys provided by a thread. |
[in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)) . Default value: 0 . |
[in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)] . Default value: . |
|
inline |
Performs ascending radix sort over key-value pairs partitioned across threads in a block, results are saved in a striped arrangement.
Value
type is different than empty_type.[in,out] | keys | - reference to an array of keys provided by a thread. |
[in,out] | values | - reference to an array of values provided by a thread. |
[in] | storage | - reference to a temporary storage object of type storage_type. |
[in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)) . Default value: 0 . |
[in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)] . Default value: . |
storage
is reused or repurposed: __syncthreads()
or rocprim::syncthreads()
.In the examples radix sort is performed on a block of 4 threads, each thread provides two key-value int
-float
pairs, results are returned using the same arrays as for input.
If the keys
across threads in a block are {[8, 7], [6, 5], [4, 3], [2, 1]}
and the values
are {[-1, -2], [-3, -4], [-5, -6], [-7, -8]}
, then after sort the keys
will be equal {[1, 5], [2, 6], [3, 7], [4, 8]}
and the values
will be equal {[-8, -4], [-7, -3], [-6, -2], [-5, -1]}
.
|
inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Performs ascending radix sort over key-value pairs partitioned across threads in a block, results are saved in a striped arrangement.
[in,out] | keys | - reference to an array of keys provided by a thread. |
[in,out] | values | - reference to an array of values provided by a thread. |
[in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)) . Default value: 0 . |
[in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)] . Default value: . |