rocPRIM
Classes | Public Types | Public Member Functions | List of all members
block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ > Class Template Reference

The block_radix_sort class is a block level parallel primitive which provides methods for sorting of items (keys or key-value pairs) partitioned across threads in a block using radix sort algorithm. More...

#include <block_radix_sort.hpp>

Public Types

using storage_type = detail::raw_storage< storage_type_ >
 Struct used to allocate a temporary memory that is required for thread communication during operations provided by related parallel primitive. More...
 

Public Member Functions

ROCPRIM_DEVICE ROCPRIM_INLINE void sort (Key(&keys)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key))
 Performs ascending radix sort over keys partitioned across threads in a block. More...
 
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void sort (Key(&keys)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key))
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More...
 
ROCPRIM_DEVICE ROCPRIM_INLINE void sort_desc (Key(&keys)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key))
 Performs descending radix sort over keys partitioned across threads in a block. More...
 
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void sort_desc (Key(&keys)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key))
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More...
 
template<bool WithValues = with_values>
ROCPRIM_DEVICE ROCPRIM_INLINE void sort (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key))
 Performs ascending radix sort over key-value pairs partitioned across threads in a block. More...
 
template<bool WithValues = with_values>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void sort (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key))
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More...
 
template<bool WithValues = with_values>
ROCPRIM_DEVICE ROCPRIM_INLINE void sort_desc (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key))
 Performs descending radix sort over key-value pairs partitioned across threads in a block. More...
 
template<bool WithValues = with_values>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void sort_desc (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key))
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More...
 
ROCPRIM_DEVICE ROCPRIM_INLINE void sort_to_striped (Key(&keys)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key))
 Performs ascending radix sort over keys partitioned across threads in a block, results are saved in a striped arrangement. More...
 
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void sort_to_striped (Key(&keys)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key))
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More...
 
ROCPRIM_DEVICE ROCPRIM_INLINE void sort_desc_to_striped (Key(&keys)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key))
 Performs descending radix sort over keys partitioned across threads in a block, results are saved in a striped arrangement. More...
 
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void sort_desc_to_striped (Key(&keys)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key))
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More...
 
template<bool WithValues = with_values>
ROCPRIM_DEVICE ROCPRIM_INLINE void sort_to_striped (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key))
 Performs ascending radix sort over key-value pairs partitioned across threads in a block, results are saved in a striped arrangement. More...
 
template<bool WithValues = with_values>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void sort_to_striped (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key))
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More...
 
template<bool WithValues = with_values>
ROCPRIM_DEVICE ROCPRIM_INLINE void sort_desc_to_striped (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], storage_type &storage, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key))
 Performs descending radix sort over key-value pairs partitioned across threads in a block, results are saved in a striped arrangement. More...
 
template<bool WithValues = with_values>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void sort_desc_to_striped (Key(&keys)[ItemsPerThread], typename std::enable_if< WithValues, Value >::type(&values)[ItemsPerThread], unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key))
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More...
 

Detailed Description

template<class Key, unsigned int BlockSizeX, unsigned int ItemsPerThread, class Value = empty_type, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
class block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >

The block_radix_sort class is a block level parallel primitive which provides methods for sorting of items (keys or key-value pairs) partitioned across threads in a block using radix sort algorithm.

Template Parameters
Key- the key type.
BlockSize- the number of threads in a block.
ItemsPerThread- the number of items contributed by each thread.
Value- the value type. Default type empty_type indicates a keys-only sort.
Overview
  • Key type must be an arithmetic type (that is, an integral type or a floating-point type).
  • Performance depends on BlockSize and ItemsPerThread.
    • It is usually better for BlockSize to be a multiple of the size of the hardware warp.
    • It is usually increased when ItemsPerThread is greater than one. However, when there are too many items per thread, each thread may need so much registers and/or shared memory that occupancy will fall too low, decreasing the performance.
    • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.
Examples

In the examples radix sort is performed on a block of 256 threads, each thread provides eight int value, results are returned using the same array as for input.

__global__ void example_kernel(...)
{
// specialize block_radix_sort for int, block of 256 threads,
// and eight items per thread; key-only sort
using block_rsort_int = rocprim::block_radix_sort<int, 256, 8>;
// allocate storage in shared memory
__shared__ block_rsort_int::storage_type storage;
int input[8] = ...;
// execute block radix sort (ascending)
block_rsort_int().sort(
input,
storage
);
...
}

Member Typedef Documentation

◆ storage_type

template<class Key , unsigned int BlockSizeX, unsigned int ItemsPerThread, class Value = empty_type, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
using block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >::storage_type = detail::raw_storage<storage_type_>

Struct used to allocate a temporary memory that is required for thread communication during operations provided by related parallel primitive.

Depending on the implemention the operations exposed by parallel primitive may require a temporary storage for thread communication. The storage should be allocated using keywords __shared__. It can be aliased to an externally allocated memory, or be a part of a union type with other storage types to increase shared memory reusability.

Member Function Documentation

◆ sort() [1/4]

template<class Key , unsigned int BlockSizeX, unsigned int ItemsPerThread, class Value = empty_type, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
ROCPRIM_DEVICE ROCPRIM_INLINE void block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >::sort ( Key(&)  keys[ItemsPerThread],
storage_type storage,
unsigned int  begin_bit = 0,
unsigned int  end_bit = 8 * sizeof(Key) 
)
inline

Performs ascending radix sort over keys partitioned across threads in a block.

Parameters
[in,out]keys- reference to an array of keys provided by a thread.
[in]storage- reference to a temporary storage object of type storage_type.
[in]begin_bit- [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0.
[in]end_bit- [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: 8 * sizeof(Key).
Storage reusage
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Examples

In the examples radix sort is performed on a block of 128 threads, each thread provides two float value, results are returned using the same array as for input.

__global__ void example_kernel(...)
{
// specialize block_radix_sort for float, block of 128 threads,
// and two items per thread; key-only sort
using block_rsort_float = rocprim::block_radix_sort<float, 128, 2>;
// allocate storage in shared memory
__shared__ block_rsort_float::storage_type storage;
float input[2] = ...;
// execute block radix sort (ascending)
block_rsort_float().sort(
input,
storage
);
...
}

If the input values across threads in a block are {[256, 255], ..., [4, 3], [2, 1]}}, then then after sort they will be equal {[1, 2], [3, 4] ..., [255, 256]}.

◆ sort() [2/4]

template<class Key , unsigned int BlockSizeX, unsigned int ItemsPerThread, class Value = empty_type, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >::sort ( Key(&)  keys[ItemsPerThread],
unsigned int  begin_bit = 0,
unsigned int  end_bit = 8 * sizeof(Key) 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Performs ascending radix sort over keys partitioned across threads in a block.

  • This overload does not accept storage argument. Required shared memory is allocated by the method itself.
Parameters
[in,out]keys- reference to an array of keys provided by a thread.
[in]begin_bit- [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0.
[in]end_bit- [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: 8 * sizeof(Key).

◆ sort() [3/4]

template<class Key , unsigned int BlockSizeX, unsigned int ItemsPerThread, class Value = empty_type, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<bool WithValues = with_values>
ROCPRIM_DEVICE ROCPRIM_INLINE void block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >::sort ( Key(&)  keys[ItemsPerThread],
typename std::enable_if< WithValues, Value >::type(&)  values[ItemsPerThread],
storage_type storage,
unsigned int  begin_bit = 0,
unsigned int  end_bit = 8 * sizeof(Key) 
)
inline

Performs ascending radix sort over key-value pairs partitioned across threads in a block.

Precondition
Method is enabled only if Value type is different than empty_type.
Parameters
[in,out]keys- reference to an array of keys provided by a thread.
[in,out]values- reference to an array of values provided by a thread.
[in]storage- reference to a temporary storage object of type storage_type.
[in]begin_bit- [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0.
[in]end_bit- [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: 8 * sizeof(Key).
Storage reusage
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Examples

In the examples radix sort is performed on a block of 128 threads, each thread provides two key-value int-float pairs, results are returned using the same arrays as for input.

__global__ void example_kernel(...)
{
// specialize block_radix_sort for int-float pairs, block of 128
// threads, and two items per thread
using block_rsort_ii = rocprim::block_radix_sort<int, 128, 2, int>;
// allocate storage in shared memory
__shared__ block_rsort_ii::storage_type storage;
int keys[2] = ...;
float values[2] = ...;
// execute block radix sort-by-key (ascending)
block_rsort_ii().sort(
keys, values,
storage
);
...
}

If the keys across threads in a block are {[256, 255], ..., [4, 3], [2, 1]} and the values are {[1, 1], [2, 2] ..., [128, 128]}, then after sort the keys will be equal {[1, 2], [3, 4] ..., [255, 256]} and the values will be equal {[128, 128], [127, 127] ..., [2, 2], [1, 1]}.

◆ sort() [4/4]

template<class Key , unsigned int BlockSizeX, unsigned int ItemsPerThread, class Value = empty_type, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<bool WithValues = with_values>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >::sort ( Key(&)  keys[ItemsPerThread],
typename std::enable_if< WithValues, Value >::type(&)  values[ItemsPerThread],
unsigned int  begin_bit = 0,
unsigned int  end_bit = 8 * sizeof(Key) 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Performs ascending radix sort over key-value pairs partitioned across threads in a block.

  • This overload does not accept storage argument. Required shared memory is allocated by the method itself.
Precondition
Method is enabled only if Value type is different than empty_type.
Parameters
[in,out]keys- reference to an array of keys provided by a thread.
[in,out]values- reference to an array of values provided by a thread.
[in]begin_bit- [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0.
[in]end_bit- [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: 8 * sizeof(Key).

◆ sort_desc() [1/4]

template<class Key , unsigned int BlockSizeX, unsigned int ItemsPerThread, class Value = empty_type, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
ROCPRIM_DEVICE ROCPRIM_INLINE void block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >::sort_desc ( Key(&)  keys[ItemsPerThread],
storage_type storage,
unsigned int  begin_bit = 0,
unsigned int  end_bit = 8 * sizeof(Key) 
)
inline

Performs descending radix sort over keys partitioned across threads in a block.

Parameters
[in,out]keys- reference to an array of keys provided by a thread.
[in]storage- reference to a temporary storage object of type storage_type.
[in]begin_bit- [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0.
[in]end_bit- [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: 8 * sizeof(Key).
Storage reusage
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Examples

In the examples radix sort is performed on a block of 128 threads, each thread provides two float value, results are returned using the same array as for input.

__global__ void example_kernel(...)
{
// specialize block_radix_sort for float, block of 128 threads,
// and two items per thread; key-only sort
using block_rsort_float = rocprim::block_radix_sort<float, 128, 2>;
// allocate storage in shared memory
__shared__ block_rsort_float::storage_type storage;
float input[2] = ...;
// execute block radix sort (descending)
block_rsort_float().sort_desc(
input,
storage
);
...
}

If the input values across threads in a block are {[1, 2], [3, 4] ..., [255, 256]}, then after sort they will be equal {[256, 255], ..., [4, 3], [2, 1]}.

◆ sort_desc() [2/4]

template<class Key , unsigned int BlockSizeX, unsigned int ItemsPerThread, class Value = empty_type, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >::sort_desc ( Key(&)  keys[ItemsPerThread],
unsigned int  begin_bit = 0,
unsigned int  end_bit = 8 * sizeof(Key) 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Performs descending radix sort over keys partitioned across threads in a block.

  • This overload does not accept storage argument. Required shared memory is allocated by the method itself.
Parameters
[in,out]keys- reference to an array of keys provided by a thread.
[in]begin_bit- [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0.
[in]end_bit- [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: 8 * sizeof(Key).

◆ sort_desc() [3/4]

template<class Key , unsigned int BlockSizeX, unsigned int ItemsPerThread, class Value = empty_type, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<bool WithValues = with_values>
ROCPRIM_DEVICE ROCPRIM_INLINE void block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >::sort_desc ( Key(&)  keys[ItemsPerThread],
typename std::enable_if< WithValues, Value >::type(&)  values[ItemsPerThread],
storage_type storage,
unsigned int  begin_bit = 0,
unsigned int  end_bit = 8 * sizeof(Key) 
)
inline

Performs descending radix sort over key-value pairs partitioned across threads in a block.

Precondition
Method is enabled only if Value type is different than empty_type.
Parameters
[in,out]keys- reference to an array of keys provided by a thread.
[in,out]values- reference to an array of values provided by a thread.
[in]storage- reference to a temporary storage object of type storage_type.
[in]begin_bit- [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0.
[in]end_bit- [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: 8 * sizeof(Key).
Storage reusage
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Examples

In the examples radix sort is performed on a block of 128 threads, each thread provides two key-value int-float pairs, results are returned using the same arrays as for input.

__global__ void example_kernel(...)
{
// specialize block_radix_sort for int-float pairs, block of 128
// threads, and two items per thread
using block_rsort_ii = rocprim::block_radix_sort<int, 128, 2, int>;
// allocate storage in shared memory
__shared__ block_rsort_ii::storage_type storage;
int keys[2] = ...;
float values[2] = ...;
// execute block radix sort-by-key (descending)
block_rsort_ii().sort_desc(
keys, values,
storage
);
...
}

If the keys across threads in a block are {[1, 2], [3, 4] ..., [255, 256]} and the values are {[128, 128], [127, 127] ..., [2, 2], [1, 1]}, then after sort the keys will be equal {[256, 255], ..., [4, 3], [2, 1]} and the values will be equal {[1, 1], [2, 2] ..., [128, 128]}.

◆ sort_desc() [4/4]

template<class Key , unsigned int BlockSizeX, unsigned int ItemsPerThread, class Value = empty_type, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<bool WithValues = with_values>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >::sort_desc ( Key(&)  keys[ItemsPerThread],
typename std::enable_if< WithValues, Value >::type(&)  values[ItemsPerThread],
unsigned int  begin_bit = 0,
unsigned int  end_bit = 8 * sizeof(Key) 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Performs descending radix sort over key-value pairs partitioned across threads in a block.

  • This overload does not accept storage argument. Required shared memory is allocated by the method itself.
Precondition
Method is enabled only if Value type is different than empty_type.
Parameters
[in,out]keys- reference to an array of keys provided by a thread.
[in,out]values- reference to an array of values provided by a thread.
[in]begin_bit- [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0.
[in]end_bit- [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: 8 * sizeof(Key).

◆ sort_desc_to_striped() [1/4]

template<class Key , unsigned int BlockSizeX, unsigned int ItemsPerThread, class Value = empty_type, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
ROCPRIM_DEVICE ROCPRIM_INLINE void block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >::sort_desc_to_striped ( Key(&)  keys[ItemsPerThread],
storage_type storage,
unsigned int  begin_bit = 0,
unsigned int  end_bit = 8 * sizeof(Key) 
)
inline

Performs descending radix sort over keys partitioned across threads in a block, results are saved in a striped arrangement.

Parameters
[in,out]keys- reference to an array of keys provided by a thread.
[in]storage- reference to a temporary storage object of type storage_type.
[in]begin_bit- [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0.
[in]end_bit- [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: 8 * sizeof(Key).
Storage reusage
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Examples

In the examples radix sort is performed on a block of 128 threads, each thread provides two float value, results are returned using the same array as for input.

__global__ void example_kernel(...)
{
// specialize block_radix_sort for float, block of 128 threads,
// and two items per thread; key-only sort
using block_rsort_float = rocprim::block_radix_sort<float, 128, 2>;
// allocate storage in shared memory
__shared__ block_rsort_float::storage_type storage;
float input[2] = ...;
// execute block radix sort (descending)
block_rsort_float().sort_desc_to_striped(
input,
storage
);
...
}

If the input values across threads in a block are {[1, 2], [3, 4] ..., [255, 256]}, then after sort they will be equal {[256, 128], ..., [130, 2], [129, 1]}.

◆ sort_desc_to_striped() [2/4]

template<class Key , unsigned int BlockSizeX, unsigned int ItemsPerThread, class Value = empty_type, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >::sort_desc_to_striped ( Key(&)  keys[ItemsPerThread],
unsigned int  begin_bit = 0,
unsigned int  end_bit = 8 * sizeof(Key) 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Performs descending radix sort over keys partitioned across threads in a block, results are saved in a striped arrangement.

  • This overload does not accept storage argument. Required shared memory is allocated by the method itself.
Parameters
[in,out]keys- reference to an array of keys provided by a thread.
[in]begin_bit- [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0.
[in]end_bit- [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: 8 * sizeof(Key).

◆ sort_desc_to_striped() [3/4]

template<class Key , unsigned int BlockSizeX, unsigned int ItemsPerThread, class Value = empty_type, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<bool WithValues = with_values>
ROCPRIM_DEVICE ROCPRIM_INLINE void block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >::sort_desc_to_striped ( Key(&)  keys[ItemsPerThread],
typename std::enable_if< WithValues, Value >::type(&)  values[ItemsPerThread],
storage_type storage,
unsigned int  begin_bit = 0,
unsigned int  end_bit = 8 * sizeof(Key) 
)
inline

Performs descending radix sort over key-value pairs partitioned across threads in a block, results are saved in a striped arrangement.

Precondition
Method is enabled only if Value type is different than empty_type.
Parameters
[in,out]keys- reference to an array of keys provided by a thread.
[in,out]values- reference to an array of values provided by a thread.
[in]storage- reference to a temporary storage object of type storage_type.
[in]begin_bit- [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0.
[in]end_bit- [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: 8 * sizeof(Key).
Storage reusage
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Examples

In the examples radix sort is performed on a block of 4 threads, each thread provides two key-value int-float pairs, results are returned using the same arrays as for input.

__global__ void example_kernel(...)
{
// specialize block_radix_sort for int-float pairs, block of 4
// threads, and two items per thread
using block_rsort_ii = rocprim::block_radix_sort<int, 4, 2, int>;
// allocate storage in shared memory
__shared__ block_rsort_ii::storage_type storage;
int keys[2] = ...;
float values[2] = ...;
// execute block radix sort-by-key (descending)
block_rsort_ii().sort_desc_to_striped(
keys, values,
storage
);
...
}

If the keys across threads in a block are {[1, 2], [3, 4], [5, 6], [7, 8]} and the values are {[80, 70], [60, 50], [40, 30], [20, 10]}, then after sort the keys will be equal {[8, 4], [7, 3], [6, 2], [5, 1]} and the values will be equal {[10, 50], [20, 60], [30, 70], [40, 80]}.

◆ sort_desc_to_striped() [4/4]

template<class Key , unsigned int BlockSizeX, unsigned int ItemsPerThread, class Value = empty_type, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<bool WithValues = with_values>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >::sort_desc_to_striped ( Key(&)  keys[ItemsPerThread],
typename std::enable_if< WithValues, Value >::type(&)  values[ItemsPerThread],
unsigned int  begin_bit = 0,
unsigned int  end_bit = 8 * sizeof(Key) 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Performs descending radix sort over key-value pairs partitioned across threads in a block, results are saved in a striped arrangement.

  • This overload does not accept storage argument. Required shared memory is allocated by the method itself.
Parameters
[in,out]keys- reference to an array of keys provided by a thread.
[in,out]values- reference to an array of values provided by a thread.
[in]begin_bit- [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0.
[in]end_bit- [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: 8 * sizeof(Key).

◆ sort_to_striped() [1/4]

template<class Key , unsigned int BlockSizeX, unsigned int ItemsPerThread, class Value = empty_type, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
ROCPRIM_DEVICE ROCPRIM_INLINE void block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >::sort_to_striped ( Key(&)  keys[ItemsPerThread],
storage_type storage,
unsigned int  begin_bit = 0,
unsigned int  end_bit = 8 * sizeof(Key) 
)
inline

Performs ascending radix sort over keys partitioned across threads in a block, results are saved in a striped arrangement.

Parameters
[in,out]keys- reference to an array of keys provided by a thread.
[in]storage- reference to a temporary storage object of type storage_type.
[in]begin_bit- [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0.
[in]end_bit- [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: 8 * sizeof(Key).
Storage reusage
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Examples

In the examples radix sort is performed on a block of 128 threads, each thread provides two float value, results are returned using the same array as for input.

__global__ void example_kernel(...)
{
// specialize block_radix_sort for float, block of 128 threads,
// and two items per thread; key-only sort
using block_rsort_float = rocprim::block_radix_sort<float, 128, 2>;
// allocate storage in shared memory
__shared__ block_rsort_float::storage_type storage;
float keys[2] = ...;
// execute block radix sort (ascending)
block_rsort_float().sort_to_striped(
keys,
storage
);
...
}

If the input values across threads in a block are {[256, 255], ..., [4, 3], [2, 1]}}, then then after sort they will be equal {[1, 129], [2, 130] ..., [128, 256]}.

◆ sort_to_striped() [2/4]

template<class Key , unsigned int BlockSizeX, unsigned int ItemsPerThread, class Value = empty_type, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >::sort_to_striped ( Key(&)  keys[ItemsPerThread],
unsigned int  begin_bit = 0,
unsigned int  end_bit = 8 * sizeof(Key) 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Performs ascending radix sort over keys partitioned across threads in a block, results are saved in a striped arrangement.

  • This overload does not accept storage argument. Required shared memory is allocated by the method itself.
Parameters
[in,out]keys- reference to an array of keys provided by a thread.
[in]begin_bit- [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0.
[in]end_bit- [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: 8 * sizeof(Key).

◆ sort_to_striped() [3/4]

template<class Key , unsigned int BlockSizeX, unsigned int ItemsPerThread, class Value = empty_type, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<bool WithValues = with_values>
ROCPRIM_DEVICE ROCPRIM_INLINE void block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >::sort_to_striped ( Key(&)  keys[ItemsPerThread],
typename std::enable_if< WithValues, Value >::type(&)  values[ItemsPerThread],
storage_type storage,
unsigned int  begin_bit = 0,
unsigned int  end_bit = 8 * sizeof(Key) 
)
inline

Performs ascending radix sort over key-value pairs partitioned across threads in a block, results are saved in a striped arrangement.

Precondition
Method is enabled only if Value type is different than empty_type.
Parameters
[in,out]keys- reference to an array of keys provided by a thread.
[in,out]values- reference to an array of values provided by a thread.
[in]storage- reference to a temporary storage object of type storage_type.
[in]begin_bit- [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0.
[in]end_bit- [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: 8 * sizeof(Key).
Storage reusage
Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().
Examples

In the examples radix sort is performed on a block of 4 threads, each thread provides two key-value int-float pairs, results are returned using the same arrays as for input.

__global__ void example_kernel(...)
{
// specialize block_radix_sort for int-float pairs, block of 4
// threads, and two items per thread
using block_rsort_ii = rocprim::block_radix_sort<int, 4, 2, int>;
// allocate storage in shared memory
__shared__ block_rsort_ii::storage_type storage;
int keys[2] = ...;
float values[2] = ...;
// execute block radix sort-by-key (ascending)
block_rsort_ii().sort_to_striped(
keys, values,
storage
);
...
}

If the keys across threads in a block are {[8, 7], [6, 5], [4, 3], [2, 1]} and the values are {[-1, -2], [-3, -4], [-5, -6], [-7, -8]}, then after sort the keys will be equal {[1, 5], [2, 6], [3, 7], [4, 8]} and the values will be equal {[-8, -4], [-7, -3], [-6, -2], [-5, -1]}.

◆ sort_to_striped() [4/4]

template<class Key , unsigned int BlockSizeX, unsigned int ItemsPerThread, class Value = empty_type, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
template<bool WithValues = with_values>
ROCPRIM_DEVICE ROCPRIM_FORCE_INLINE void block_radix_sort< Key, BlockSizeX, ItemsPerThread, Value, BlockSizeY, BlockSizeZ >::sort_to_striped ( Key(&)  keys[ItemsPerThread],
typename std::enable_if< WithValues, Value >::type(&)  values[ItemsPerThread],
unsigned int  begin_bit = 0,
unsigned int  end_bit = 8 * sizeof(Key) 
)
inline

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Performs ascending radix sort over key-value pairs partitioned across threads in a block, results are saved in a striped arrangement.

  • This overload does not accept storage argument. Required shared memory is allocated by the method itself.
Parameters
[in,out]keys- reference to an array of keys provided by a thread.
[in,out]values- reference to an array of values provided by a thread.
[in]begin_bit- [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0.
[in]end_bit- [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: 8 * sizeof(Key).

The documentation for this class was generated from the following file: