rocPRIM
Primitivesmodule_deviceconfigs

Namespaces

 detail
 Deprecated: Configuration of device-level scan primitives.
 

Classes

struct  default_config
 Special type used to show that the given device-level operation will be executed with optimal configuration dependent on types of the function's parameters and the target device architecture specified by ROCPRIM_TARGET_ARCH. More...
 
struct  kernel_config< BlockSize, ItemsPerThread, SizeLimit >
 Configuration of particular kernels launched by device-level operation. More...
 
struct  radix_sort_onesweep_config< HistogramConfig, SortConfig, RadixBits, RadixRankAlgorithm >
 Configuration of subalgorithm Onesweep. More...
 
struct  reduce_config< BlockSize, ItemsPerThread, BlockReduceMethod, SizeLimit >
 Configuration of device-level reduce primitives. More...
 
struct  scan_config_v2< BlockSize, ItemsPerThread, BlockLoadMethod, BlockStoreMethod, BlockScanMethod, SizeLimit >
 Configuration of device-level scan primitives. More...
 
struct  scan_by_key_config_v2< BlockSize, ItemsPerThread, BlockLoadMethod, BlockStoreMethod, BlockScanMethod, SizeLimit >
 Configuration of device-level scan-by-key operation. More...
 
struct  transform_config< BlockSize, ItemsPerThread, SizeLimit >
 Configuration for the device-level transform operation. More...
 
struct  binary_search_config< BlockSize, ItemsPerThread, SizeLimit >
 Configuration for the device-level binary search operation. More...
 
struct  upper_bound_config< BlockSize, ItemsPerThread, SizeLimit >
 Configuration for the device-level upper bound operation. More...
 
struct  lower_bound_config< BlockSize, ItemsPerThread, SizeLimit >
 Configuration for the device-level lower bound operation. More...
 
struct  histogram_config< HistogramConfig, MaxGridSize, SharedImplMaxBins, SharedImplHistograms >
 Configuration of device-level histogram operation. More...
 
struct  adjacent_difference_config< BlockSize, ItemsPerThread, LoadMethod, StoreMethod, SizeLimit >
 Configuration of device-level adjacent_difference primitives. More...
 
struct  merge_sort_config< MergeOddevenBlockSize, SortBlockSize, SortItemsPerThread, MergeMergepathPartitionBlockSize, MergeMergepathBlockSize, MergeMergepathItemsPerThread, MinInputSizeMergepath >
 Configuration of device-level merge sort primitives. More...
 
struct  radix_sort_config_v2< SingleSortConfig, MergeSortConfig, OnesweepConfig, MergeSortLimit >
 Configuration of device-level radix sort operation. More...
 
struct  reduce_by_key_config_v2< BlockSize, ItemsPerThread, LoadKeysMethod, LoadValuesMethod, ScanAlgorithm, TilesPerBlock, SizeLimit >
 Configuration of device-level reduce-by-key operation. More...
 
struct  run_length_encode_config< ReduceByKeyConfig, SelectConfig >
 Configuration of device-level run-length encoding operation. More...
 
struct  WarpSortConfig< LogicalWarpSizeSmall, ItemsPerThreadSmall, BlockSizeSmall, PartitioningThreshold, EnableUnpartitionedWarpSort, LogicalWarpSizeMedium, ItemsPerThreadMedium, BlockSizeMedium >
 Configuration of the warp sort part of the device segmented radix sort operation. More...
 
struct  DisabledWarpSortConfig
 Indicates that warp-level sorting is disabled in the device segmented radix sort configuration. More...
 
struct  segmented_radix_sort_config< LongRadixBits, ShortRadixBits, SortConfig, WarpSortConfig >
 Configuration of device-level segmented radix sort operation. More...
 
struct  select_config< BlockSize, ItemsPerThread, KeyBlockLoadMethod, ValueBlockLoadMethod, FlagBlockLoadMethod, BlockScanMethod, SizeLimit >
 Configuration of device-level select operation. More...
 

Typedefs

template<unsigned int BlockSize, unsigned int ItemsPerThread>
using merge_config = kernel_config< BlockSize, ItemsPerThread >
 Configuration of device-level merge primitives.
 
template<class Key , unsigned int MediumWarpSize = ROCPRIM_WARP_SIZE_32>
using select_warp_sort_config_t = std::conditional_t<sizeof(Key) < 2, DisabledWarpSortConfig, WarpSortConfig<32, 4, 256, 3000, (sizeof(Key) > 2), MediumWarpSize, 4, 256>>
 Selects the appropriate WarpSortConfig based on the size of the key type. More...
 

Functions

template<unsigned int LongRadixBits, unsigned int ShortRadixBits, class ScanConfig , class SortConfig , class SortSingleConfig = kernel_config<256, 10>, class SortMergeConfig = kernel_config<1024, 1>, unsigned int MergeSizeLimitBlocks = 1024U, bool ForceSingleKernelConfig = false, class OnesweepHistogramConfig = kernel_config<256, 8>, class OnesweepSortConfig = kernel_config<256, 15>, unsigned int OnesweepRadixBits = 4>
struct [[deprecated("use radix_sort_config_v2")]] radix_sort_config
 Legacy configuration of device-level radix sort operation. More...
 
template<class ScanConfig , class ReduceConfig >
struct [[deprecated("use reduce_by_key_config_v2")]] reduce_by_key_config
 Legacy configuration of device-level reduce-by-key operation. More...
 

Detailed Description

Typedef Documentation

◆ select_warp_sort_config_t

template<class Key , unsigned int MediumWarpSize = ROCPRIM_WARP_SIZE_32>
using select_warp_sort_config_t = std::conditional_t<sizeof(Key) < 2, DisabledWarpSortConfig, WarpSortConfig<32, 4, 256, 3000, (sizeof(Key) > 2), MediumWarpSize, 4, 256>>

Selects the appropriate WarpSortConfig based on the size of the key type.

Template Parameters
Key - the type of the sorted keys.
MediumWarpSize - the logical warp size of the medium segment processing kernel.
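
The selection logic of the alias can be illustrated without rocPRIM itself. The following is a minimal sketch, not the library's code: `DisabledWarpSortConfig` and `WarpSortConfig` are hypothetical stand-ins that only mirror the documented template parameter lists, the member names `enable_unpartitioned_warp_sort` and `logical_warp_size_medium` are invented for inspection, and `ROCPRIM_WARP_SIZE_32` is assumed to be 32.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in: marker type meaning "no warp sort".
struct DisabledWarpSortConfig {};

// Hypothetical stand-in with the documented parameter order.
// The two members are illustrative names, not rocPRIM's.
template<unsigned int LogicalWarpSizeSmall,
         unsigned int ItemsPerThreadSmall,
         unsigned int BlockSizeSmall,
         unsigned int PartitioningThreshold,
         bool         EnableUnpartitionedWarpSort,
         unsigned int LogicalWarpSizeMedium,
         unsigned int ItemsPerThreadMedium,
         unsigned int BlockSizeMedium>
struct WarpSortConfig
{
    static constexpr bool enable_unpartitioned_warp_sort = EnableUnpartitionedWarpSort;
    static constexpr unsigned int logical_warp_size_medium = LogicalWarpSizeMedium;
};

// Same shape as the documented alias; the default of 32 is an assumption
// standing in for ROCPRIM_WARP_SIZE_32.
template<class Key, unsigned int MediumWarpSize = 32>
using select_warp_sort_config_t = std::conditional_t<
    sizeof(Key) < 2,
    DisabledWarpSortConfig,
    WarpSortConfig<32, 4, 256, 3000, (sizeof(Key) > 2), MediumWarpSize, 4, 256>>;

// 1-byte keys: warp sort is disabled.
static_assert(std::is_same<select_warp_sort_config_t<std::uint8_t>,
                           DisabledWarpSortConfig>::value,
              "1-byte keys select DisabledWarpSortConfig");
// 2-byte keys: warp sort is enabled, unpartitioned warp sort is not.
static_assert(!select_warp_sort_config_t<std::uint16_t>::enable_unpartitioned_warp_sort,
              "2-byte keys do not enable the unpartitioned warp sort");
// Keys wider than 2 bytes: unpartitioned warp sort is enabled as well.
static_assert(select_warp_sort_config_t<std::uint32_t>::enable_unpartitioned_warp_sort,
              "4-byte keys enable the unpartitioned warp sort");
```

In this sketch the key size alone decides between three outcomes: disabled (1 byte), partitioned only (2 bytes), and both partitioned and unpartitioned (more than 2 bytes). `MediumWarpSize` is forwarded unchanged to the sixth parameter.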

Function Documentation

◆ radix_sort_config

template<unsigned int LongRadixBits, unsigned int ShortRadixBits, class ScanConfig , class SortConfig , class SortSingleConfig = kernel_config<256, 10>, class SortMergeConfig = kernel_config<1024, 1>, unsigned int MergeSizeLimitBlocks = 1024U, bool ForceSingleKernelConfig = false, class OnesweepHistogramConfig = kernel_config<256, 8>, class OnesweepSortConfig = kernel_config<256, 15>, unsigned int OnesweepRadixBits = 4>
struct [[deprecated("use radix_sort_config_v2")]] radix_sort_config

Legacy configuration of device-level radix sort operation.

Deprecated:
Due to a new implementation, the configuration options no longer match the algorithm parameters. Use radix_sort_config_v2 for the new parameters of the algorithm. Only a best-effort mapping is provided for these options; parameters not applicable to the new algorithm are ignored.

Radix sort is executed either in a single tile (at size < BlocksPerItem) or in a few iterations (passes), depending on the total number of bits to be sorted (begin_bit and end_bit). Each iteration sorts either LongRadixBits or ShortRadixBits bits, chosen to cover the whole bit range in an optimal way.

For example, if LongRadixBits is 7, ShortRadixBits is 6, begin_bit is 0 and end_bit is 32, there will be 5 iterations: 7 + 7 + 6 + 6 + 6 = 32 bits.
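
The split into long and short iterations in the example above can be reproduced with a small calculation. This is a sketch of one way to derive such a split, not necessarily the computation rocPRIM performs; `pass_split`, `ceiling_div` and `split_passes` are hypothetical names introduced only for illustration.

```cpp
// Hypothetical result type: how many passes of each width are run.
struct pass_split
{
    unsigned int long_passes;
    unsigned int short_passes;
};

// Integer division rounded up.
constexpr unsigned int ceiling_div(unsigned int a, unsigned int b)
{
    return (a + b - 1) / b;
}

// Uses the fewest passes that cover [begin_bit, end_bit) with long passes,
// then narrows as many of them as possible to short passes without losing
// coverage. Assumes short_bits <= long_bits and long_bits > 0.
constexpr pass_split split_passes(unsigned int begin_bit,
                                  unsigned int end_bit,
                                  unsigned int long_bits,
                                  unsigned int short_bits)
{
    const unsigned int bits    = end_bit - begin_bit;
    const unsigned int passes  = ceiling_div(bits, long_bits);
    const unsigned int diff    = long_bits - short_bits;
    const unsigned int surplus = long_bits * passes - bits; // bits covered beyond need
    unsigned int short_passes  = 0;
    if(diff != 0)
    {
        short_passes = surplus / diff;
        if(short_passes > passes)
        {
            short_passes = passes;
        }
    }
    return pass_split{passes - short_passes, short_passes};
}

// The documented example: 7 + 7 + 6 + 6 + 6 = 32 bits.
static_assert(split_passes(0, 32, 7, 6).long_passes == 2, "two 7-bit passes");
static_assert(split_passes(0, 32, 7, 6).short_passes == 3, "three 6-bit passes");
```

With equal long and short widths the function falls back to long passes only, for example four 8-bit passes for a 32-bit range.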

Template Parameters
LongRadixBits - number of bits in long iterations.
ShortRadixBits - number of bits in short iterations, must be equal to or less than LongRadixBits.
ScanConfig - configuration of digits scan kernel. Must be kernel_config.
SortConfig - configuration of radix sort kernel. Must be kernel_config.

Configuration of radix sort single kernel.

Configuration of merge sort algorithm.

Configuration of radix sort onesweep.

Maximum number of items to use merge sort algorithm.

◆ reduce_by_key_config

template<class ScanConfig , class ReduceConfig >
struct [[deprecated("use reduce_by_key_config_v2")]] reduce_by_key_config

Legacy configuration of device-level reduce-by-key operation.

Deprecated:
Due to a new implementation, the configuration options no longer match the algorithm parameters. Use reduce_by_key_config_v2 for the new parameters of the algorithm. Only a best-effort mapping is provided for these options; parameters not applicable to the new algorithm are ignored.

Template Parameters
ScanConfig - configuration of carry-outs scan kernel. Must be kernel_config.
ReduceConfig - configuration of the main reduce-by-key kernel. Must be kernel_config.

Configuration of carry-outs scan kernel.

Configuration of the main reduce-by-key kernel.