|
rocPRIM
|
The block_radix_sort class is a block level parallel primitive which provides methods for sorting of items (keys or key-value pairs) partitioned across threads in a block using radix sort algorithm. More...
#include <block_radix_sort.hpp>
Public Types | |
| using | storage_type = detail::raw_storage< storage_type_ > |
| Struct used to allocate a temporary memory that is required for thread communication during operations provided by related parallel primitive. More... | |
Public Member Functions | |
| ROCPRIM_DEVICE ROCPRIM_INLINE void | sort (Key(&keys)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
| Performs ascending radix sort over keys partitioned across threads in a block. More... | |
| ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | sort (Key(&keys)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
| This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
| ROCPRIM_DEVICE ROCPRIM_INLINE void | sort_desc (Key(&keys)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
| Performs descending radix sort over keys partitioned across threads in a block. More... | |
| ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | sort_desc (Key(&keys)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
| This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
| template<bool WithValues = with_values> | |
| ROCPRIM_DEVICE ROCPRIM_INLINE void | sort (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
| Performs ascending radix sort over key-value pairs partitioned across threads in a block. More... | |
| template<bool WithValues = with_values> | |
| ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | sort (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
| This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
| template<bool WithValues = with_values> | |
| ROCPRIM_DEVICE ROCPRIM_INLINE void | sort_desc (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
| Performs descending radix sort over key-value pairs partitioned across threads in a block. More... | |
| template<bool WithValues = with_values> | |
| ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | sort_desc (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
| This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
| ROCPRIM_DEVICE ROCPRIM_INLINE void | sort_to_striped (Key(&keys)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
| Performs ascending radix sort over keys partitioned across threads in a block, results are saved in a striped arrangement. More... | |
| ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | sort_to_striped (Key(&keys)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
| This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
| ROCPRIM_DEVICE ROCPRIM_INLINE void | sort_desc_to_striped (Key(&keys)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
| Performs descending radix sort over keys partitioned across threads in a block, results are saved in a striped arrangement. More... | |
| ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | sort_desc_to_striped (Key(&keys)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
| This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
| template<bool WithValues = with_values> | |
| ROCPRIM_DEVICE ROCPRIM_INLINE void | sort_to_striped (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
| Performs ascending radix sort over key-value pairs partitioned across threads in a block, results are saved in a striped arrangement. More... | |
| template<bool WithValues = with_values> | |
| ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | sort_to_striped (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
| This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
| template<bool WithValues = with_values> | |
| ROCPRIM_DEVICE ROCPRIM_INLINE void | sort_desc_to_striped (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
| Performs descending radix sort over key-value pairs partitioned across threads in a block, results are saved in a striped arrangement. More... | |
| template<bool WithValues = with_values> | |
| ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void | sort_desc_to_striped (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key)) |
| This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
The block_radix_sort class is a block level parallel primitive which provides methods for sorting of items (keys or key-value pairs) partitioned across threads in a block using radix sort algorithm.
| Key | - the key type. |
| BlockSize | - the number of threads in a block. |
| ItemsPerThread | - the number of items contributed by each thread. |
| Value | - the value type. Default type empty_type indicates a keys-only sort. |
Key type must be an arithmetic type (that is, an integral type or a floating-point type).BlockSize and ItemsPerThread.BlockSize to be a multiple of the size of the hardware warp.ItemsPerThread is greater than one. However, when there are too many items per thread, each thread may need so much registers and/or shared memory that occupancy will fall too low, decreasing the performance.Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.In the examples radix sort is performed on a block of 256 threads, each thread provides eight int value, results are returned using the same array as for input.
| using block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >::storage_type = detail::raw_storage<storage_type_> |
Struct used to allocate a temporary memory that is required for thread communication during operations provided by related parallel primitive.
Depending on the implemention the operations exposed by parallel primitive may require a temporary storage for thread communication. The storage should be allocated using keywords __shared__. It can be aliased to an externally allocated memory, or be a part of a union type with other storage types to increase shared memory reusability.
|
inline |
Performs ascending radix sort over keys partitioned across threads in a block.
| [in,out] | keys | - reference to an array of keys provided by a thread. |
| [in] | storage | - reference to a temporary storage object of type storage_type. |
| [in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. |
| [in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: . |
storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().In the examples radix sort is performed on a block of 128 threads, each thread provides two float value, results are returned using the same array as for input.
If the input values across threads in a block are {[256, 255], ..., [4, 3], [2, 1]}}, then then after sort they will be equal {[1, 2], [3, 4] ..., [255, 256]}.
|
inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Performs ascending radix sort over keys partitioned across threads in a block.
| [in,out] | keys | - reference to an array of keys provided by a thread. |
| [in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. |
| [in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: . |
|
inline |
Performs ascending radix sort over key-value pairs partitioned across threads in a block.
Value type is different than empty_type.| [in,out] | keys | - reference to an array of keys provided by a thread. |
| [in,out] | values | - reference to an array of values provided by a thread. |
| [in] | storage | - reference to a temporary storage object of type storage_type. |
| [in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. |
| [in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: . |
storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().In the examples radix sort is performed on a block of 128 threads, each thread provides two key-value int-float pairs, results are returned using the same arrays as for input.
If the keys across threads in a block are {[256, 255], ..., [4, 3], [2, 1]} and the values are {[1, 1], [2, 2] ..., [128, 128]}, then after sort the keys will be equal {[1, 2], [3, 4] ..., [255, 256]} and the values will be equal {[128, 128], [127, 127] ..., [2, 2], [1, 1]}.
|
inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Performs ascending radix sort over key-value pairs partitioned across threads in a block.
Value type is different than empty_type.| [in,out] | keys | - reference to an array of keys provided by a thread. |
| [in,out] | values | - reference to an array of values provided by a thread. |
| [in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. |
| [in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: . |
|
inline |
Performs descending radix sort over keys partitioned across threads in a block.
| [in,out] | keys | - reference to an array of keys provided by a thread. |
| [in] | storage | - reference to a temporary storage object of type storage_type. |
| [in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. |
| [in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: . |
storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().In the examples radix sort is performed on a block of 128 threads, each thread provides two float value, results are returned using the same array as for input.
If the input values across threads in a block are {[1, 2], [3, 4] ..., [255, 256]}, then after sort they will be equal {[256, 255], ..., [4, 3], [2, 1]}.
|
inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Performs descending radix sort over keys partitioned across threads in a block.
| [in,out] | keys | - reference to an array of keys provided by a thread. |
| [in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. |
| [in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: . |
|
inline |
Performs descending radix sort over key-value pairs partitioned across threads in a block.
Value type is different than empty_type.| [in,out] | keys | - reference to an array of keys provided by a thread. |
| [in,out] | values | - reference to an array of values provided by a thread. |
| [in] | storage | - reference to a temporary storage object of type storage_type. |
| [in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. |
| [in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: . |
storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().In the examples radix sort is performed on a block of 128 threads, each thread provides two key-value int-float pairs, results are returned using the same arrays as for input.
If the keys across threads in a block are {[1, 2], [3, 4] ..., [255, 256]} and the values are {[128, 128], [127, 127] ..., [2, 2], [1, 1]}, then after sort the keys will be equal {[256, 255], ..., [4, 3], [2, 1]} and the values will be equal {[1, 1], [2, 2] ..., [128, 128]}.
|
inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Performs descending radix sort over key-value pairs partitioned across threads in a block.
Value type is different than empty_type.| [in,out] | keys | - reference to an array of keys provided by a thread. |
| [in,out] | values | - reference to an array of values provided by a thread. |
| [in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. |
| [in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: . |
|
inline |
Performs descending radix sort over keys partitioned across threads in a block, results are saved in a striped arrangement.
| [in,out] | keys | - reference to an array of keys provided by a thread. |
| [in] | storage | - reference to a temporary storage object of type storage_type. |
| [in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. |
| [in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: . |
storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().In the examples radix sort is performed on a block of 128 threads, each thread provides two float value, results are returned using the same array as for input.
If the input values across threads in a block are {[1, 2], [3, 4] ..., [255, 256]}, then after sort they will be equal {[256, 128], ..., [130, 2], [129, 1]}.
|
inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Performs descending radix sort over keys partitioned across threads in a block, results are saved in a striped arrangement.
| [in,out] | keys | - reference to an array of keys provided by a thread. |
| [in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. |
| [in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: . |
|
inline |
Performs descending radix sort over key-value pairs partitioned across threads in a block, results are saved in a striped arrangement.
Value type is different than empty_type.| [in,out] | keys | - reference to an array of keys provided by a thread. |
| [in,out] | values | - reference to an array of values provided by a thread. |
| [in] | storage | - reference to a temporary storage object of type storage_type. |
| [in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. |
| [in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: . |
storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().In the examples radix sort is performed on a block of 4 threads, each thread provides two key-value int-float pairs, results are returned using the same arrays as for input.
If the keys across threads in a block are {[1, 2], [3, 4], [5, 6], [7, 8]} and the values are {[80, 70], [60, 50], [40, 30], [20, 10]}, then after sort the keys will be equal {[8, 4], [7, 3], [6, 2], [5, 1]} and the values will be equal {[10, 50], [20, 60], [30, 70], [40, 80]}.
|
inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Performs descending radix sort over key-value pairs partitioned across threads in a block, results are saved in a striped arrangement.
| [in,out] | keys | - reference to an array of keys provided by a thread. |
| [in,out] | values | - reference to an array of values provided by a thread. |
| [in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. |
| [in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: . |
|
inline |
Performs ascending radix sort over keys partitioned across threads in a block, results are saved in a striped arrangement.
| [in,out] | keys | - reference to an array of keys provided by a thread. |
| [in] | storage | - reference to a temporary storage object of type storage_type. |
| [in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. |
| [in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: . |
storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().In the examples radix sort is performed on a block of 128 threads, each thread provides two float value, results are returned using the same array as for input.
If the input values across threads in a block are {[256, 255], ..., [4, 3], [2, 1]}}, then then after sort they will be equal {[1, 129], [2, 130] ..., [128, 256]}.
|
inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Performs ascending radix sort over keys partitioned across threads in a block, results are saved in a striped arrangement.
| [in,out] | keys | - reference to an array of keys provided by a thread. |
| [in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. |
| [in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: . |
|
inline |
Performs ascending radix sort over key-value pairs partitioned across threads in a block, results are saved in a striped arrangement.
Value type is different than empty_type.| [in,out] | keys | - reference to an array of keys provided by a thread. |
| [in,out] | values | - reference to an array of values provided by a thread. |
| [in] | storage | - reference to a temporary storage object of type storage_type. |
| [in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. |
| [in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: . |
storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().In the examples radix sort is performed on a block of 4 threads, each thread provides two key-value int-float pairs, results are returned using the same arrays as for input.
If the keys across threads in a block are {[8, 7], [6, 5], [4, 3], [2, 1]} and the values are {[-1, -2], [-3, -4], [-5, -6], [-7, -8]}, then after sort the keys will be equal {[1, 5], [2, 6], [3, 7], [4, 8]} and the values will be equal {[-8, -4], [-7, -3], [-6, -2], [-5, -1]}.
|
inline |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Performs ascending radix sort over key-value pairs partitioned across threads in a block, results are saved in a striped arrangement.
| [in,out] | keys | - reference to an array of keys provided by a thread. |
| [in,out] | values | - reference to an array of values provided by a thread. |
| [in] | begin_bit | - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. |
| [in] | end_bit | - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: . |
1.8.13