cuda-kat
CUDA kernel author's tools
Uniform-naming scheme, templated-when-relevant wrappers for single PTX instructions.
Namespaces | |
| special_registers | |
| Special register getter wrappers. | |
Enumerations | |
| enum | funnel_shift_amount_resolution_mode_t { funnel_shift_amount_resolution_mode_t::take_lower_bits_of_amount, funnel_shift_amount_resolution_mode_t::cap_at_full_word_size } |
| Use this to select which variant of the funnel shift intrinsic to use. | |
Functions | |
| template<typename I > | |
| KAT_FD I | multiplication_high_bits (I x, I y) |
| When multiplying two n-bit numbers, the result may take up to 2n bits. | |
| template<typename F > | |
| KAT_FD F | divide (F dividend, F divisor) |
| Division which becomes faster and less precise than regular "/" when --use_fast_math is specified; otherwise it is the same as regular "/". | |
| template<typename F > | |
| KAT_FD F | clamp_to_unit_segment (F x) |
| Clamps the input value to the unit segment [0.0, +1.0]. | |
| template<typename T > | |
| KAT_FD T | absolute_value (T x) |
| template<typename T > | |
| KAT_FD T | minimum (T x, T y)=delete |
| template<typename T > | |
| KAT_FD T | maximum (T x, T y)=delete |
| template<typename I > | |
| KAT_FD std::make_unsigned< I >::type | sum_with_absolute_difference (I x, I y, typename std::make_unsigned< I >::type addend) |
| Computes addend + |x - y|. | |
| template<typename I > | |
| KAT_FD int | population_count (I x) |
| template<typename I > | |
| KAT_FD I | bit_reverse (I x)=delete |
| template<typename I > | |
| KAT_FD unsigned | find_leading_non_sign_bit (I x)=delete |
| Find the most-significant, i.e. leading, bit that differs from the input's sign bit. | |
| template<typename I > | |
| KAT_FD int | count_leading_zeros (I x)=delete |
| Return the number of consecutive bits, beginning from the most-significant, which are all 0 ("leading" zeros). | |
| KAT_FD unsigned | permute_bytes (unsigned first, unsigned second, unsigned byte_selectors) |
| See the relevant section of the CUDA PTX reference for an explanation of exactly what this does. | |
| template<funnel_shift_amount_resolution_mode_t AmountResolutionMode = funnel_shift_amount_resolution_mode_t::cap_at_full_word_size> | |
| KAT_FD uint32_t | funnel_shift_right (uint32_t low_word, uint32_t high_word, uint32_t shift_amount) |
| Performs a right-shift on the combination of the two arguments into a single, double-the-length, value. | |
| template<funnel_shift_amount_resolution_mode_t AmountResolutionMode = funnel_shift_amount_resolution_mode_t::cap_at_full_word_size> | |
| KAT_FD uint32_t | funnel_shift_left (uint32_t low_word, uint32_t high_word, uint32_t shift_amount) |
| Performs a left-shift on the combination of the two arguments into a single, double-the-length, value. | |
| template<typename I > | |
| I KAT_FD | average (I x, I y)=delete |
| Computes the average of two integer values, rounding down, without needing special accounting for overflow. | |
| template<typename I > | |
| I KAT_FD | average_rounded_up (I x, I y)=delete |
| Computes the average of two values, rounding up, without needing special accounting for overflow. | |
| I KAT_FD kat::builtins::average_rounded_up | ( | I | x, |
| I | y | ||
| ) | = delete |
Computes the average of two values, rounding up, without needing special accounting for overflow.
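One well-known way to average two unsigned integers without an intermediate overflow is to add the bits the operands share to half of the bits in which they differ. The following is a minimal host-side sketch in plain C++, not the library's implementation; the names `average_ref` and `average_rounded_up_ref` are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Hypothetical reference models (illustration only, not part of cuda-kat).

// Rounds down: bits set in both operands count in full; bits set in exactly
// one operand contribute half, with the fraction truncated.
constexpr std::uint32_t average_ref(std::uint32_t x, std::uint32_t y)
{
    return (x & y) + ((x ^ y) >> 1);
}

// Rounds up: start from every bit set in either operand, then subtract
// the truncated half of the differing bits.
constexpr std::uint32_t average_rounded_up_ref(std::uint32_t x, std::uint32_t y)
{
    return (x | y) - ((x ^ y) >> 1);
}

// The naive (x + y) / 2 would overflow for the second pair of checks.
static_assert(average_ref(3u, 4u) == 3u, "3.5 rounds down to 3");
static_assert(average_rounded_up_ref(3u, 4u) == 4u, "3.5 rounds up to 4");
static_assert(average_ref(0xFFFFFFFFu, 0xFFFFFFFEu) == 0xFFFFFFFEu, "no overflow");
static_assert(average_rounded_up_ref(0xFFFFFFFFu, 0xFFFFFFFEu) == 0xFFFFFFFFu, "no overflow");
```

Neither formula ever forms the full sum `x + y`, which is why no wider intermediate type is needed.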
| KAT_FD F kat::builtins::clamp_to_unit_segment | ( | F | x | ) |
Clamps the input value to the unit segment [0.0, +1.0].
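For ordinary finite inputs, clamping to the unit segment amounts to a maximum with 0 followed by a minimum with 1. The sketch below is a host-side reference in plain C++ under that assumption; `clamp_to_unit_segment_ref` is a hypothetical name, and the sketch makes no claim about how the builtin treats special values.

```cpp
#include <cmath>

// Hypothetical host-side reference model (illustration only).
inline float clamp_to_unit_segment_ref(float x)
{
    // Raise anything below 0 to 0, then lower anything above 1 to 1.
    return std::fmin(std::fmax(x, 0.0f), 1.0f);
}
```

Values already inside the segment, such as 0.25f, pass through unchanged.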
| KAT_FD int kat::builtins::count_leading_zeros | ( | I | x | ) | = delete |
Return the number of consecutive bits, beginning from the most-significant, which are all 0 ("leading" zeros).
| x | the number whose representation is to be counted |
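As an illustration of what counting leading zeros means for a 32-bit value, here is a minimal host-side sketch in plain C++. The name `count_leading_zeros_ref` is hypothetical, and the loop is a readable reference model rather than the single-instruction path the builtin wraps. It scans from the most-significant bit downward and assumes an all-zero input yields the full word width, 32.

```cpp
#include <cstdint>

// Hypothetical host-side reference model (illustration only).
inline int count_leading_zeros_ref(std::uint32_t x)
{
    int count = 0;
    // Walk a one-bit mask from bit 31 down to bit 0, stopping at the first 1 bit.
    for (std::uint32_t mask = 0x80000000u; mask != 0u && (x & mask) == 0u; mask >>= 1) {
        ++count;
    }
    return count; // 32 when x == 0, since the mask runs off the low end
}
```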
| KAT_FD unsigned kat::builtins::find_leading_non_sign_bit | ( | I | x | ) | = delete |
Find the most-significant, i.e. leading, bit that differs from the input's sign bit.
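The following host-side sketch in plain C++ models this search for a signed 32-bit input. It assumes the semantics of the PTX `bfind` instruction: bit positions are counted from the least-significant bit, and 0xFFFFFFFF is returned when no bit differs from the sign bit. That correspondence is an assumption rather than something this page states, and `find_leading_non_sign_bit_ref` is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical host-side reference model (illustration only),
// assuming PTX bfind-like semantics for a signed 32-bit input.
inline std::uint32_t find_leading_non_sign_bit_ref(std::int32_t x)
{
    std::uint32_t bits = static_cast<std::uint32_t>(x);
    // For negative inputs, flip all bits so the search is always for a 1 bit.
    if (x < 0) { bits = ~bits; }
    for (int pos = 31; pos >= 0; --pos) {
        if ((bits >> pos) & 1u) { return static_cast<std::uint32_t>(pos); }
    }
    return 0xFFFFFFFFu; // no non-sign bit: the input was 0 or -1
}
```

Under this model, 1 and -2 both yield position 0, while 0 and -1 yield the "not found" value.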
| KAT_FD uint32_t kat::builtins::funnel_shift_left | ( | uint32_t | low_word, |
| uint32_t | high_word, | ||
| uint32_t | shift_amount | ||
| ) |
Performs a left-shift on the combination of the two arguments into a single, double-the-length, value.
| low_word | |
| high_word | |
| shift_amount | The number of bits to left-shift |
| AmountResolutionMode | shift_amount can have values which are higher than the maximum possible number of bits to left-shift; this indicates how to interpret such values. |
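To make the two amount-resolution modes concrete, here is a minimal host-side sketch in plain C++. It assumes the semantics of the PTX `shf.l` instruction: the result is the upper 32 bits of the shifted 64-bit combination, `cap_at_full_word_size` caps the amount at 32, and `take_lower_bits_of_amount` keeps only its low 5 bits. The mapping of the enumerator names to these behaviours is inferred from the names; `amount_mode` and `funnel_shift_left_ref` are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for funnel_shift_amount_resolution_mode_t.
enum class amount_mode { take_lower_bits_of_amount, cap_at_full_word_size };

// Hypothetical host-side reference model (illustration only).
inline std::uint32_t funnel_shift_left_ref(
    std::uint32_t low_word, std::uint32_t high_word, std::uint32_t shift_amount,
    amount_mode mode = amount_mode::cap_at_full_word_size)
{
    // Resolve out-of-range amounts: either cap at 32 or keep the low 5 bits.
    const std::uint32_t n = (mode == amount_mode::cap_at_full_word_size)
        ? std::min<std::uint32_t>(shift_amount, 32u)
        : (shift_amount & 31u);
    // Glue the two words into one 64-bit value, shift, keep the upper half.
    const std::uint64_t combined =
        (static_cast<std::uint64_t>(high_word) << 32) | low_word;
    return static_cast<std::uint32_t>((combined << n) >> 32);
}
```

With `low_word = 0x89ABCDEF` and `high_word = 0x01234567`, a shift of 8 gives 0x23456789. A shift of 40 gives `low_word` itself when capped, and again 0x23456789 when only the lower bits of the amount are used.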
| KAT_FD uint32_t kat::builtins::funnel_shift_right | ( | uint32_t | low_word, |
| uint32_t | high_word, | ||
| uint32_t | shift_amount | ||
| ) |
Performs a right-shift on the combination of the two arguments into a single, double-the-length, value.
| low_word | |
| high_word | |
| shift_amount | The number of bits to right-shift |
| AmountResolutionMode | shift_amount can have values which are higher than the maximum possible number of bits to right-shift; this indicates how to interpret such values. |
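A common use of a right funnel shift is rotation: passing the same word as both halves makes the bits shifted out at the bottom reappear at the top. The sketch below is a host-side model in plain C++ which assumes the semantics of PTX `shf.r` in its capping mode, where the result is the lower 32 bits of the shifted 64-bit combination. `funnel_shift_right_ref` and `rotate_right_ref` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical host-side reference model (illustration only);
// models only the "cap at full word size" behaviour.
inline std::uint32_t funnel_shift_right_ref(
    std::uint32_t low_word, std::uint32_t high_word, std::uint32_t shift_amount)
{
    const std::uint32_t n = std::min<std::uint32_t>(shift_amount, 32u);
    const std::uint64_t combined =
        (static_cast<std::uint64_t>(high_word) << 32) | low_word;
    // Keep the lower half of the shifted 64-bit value.
    return static_cast<std::uint32_t>(combined >> n);
}

// Rotation expressed through the funnel shift: both halves are the same word.
inline std::uint32_t rotate_right_ref(std::uint32_t x, std::uint32_t amount)
{
    return funnel_shift_right_ref(x, x, amount & 31u);
}
```

For example, rotating 0x12345678 right by 8 bits moves the low byte 0x78 to the top, giving 0x78123456.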
| KAT_FD I kat::builtins::multiplication_high_bits | ( | I | x, |
| I | y | ||
| ) |
When multiplying two n-bit numbers, the result may take up to 2n bits.
Without upcasting, the value of x * y is the lower n bits of the result; this lets you get the upper n bits without performing a 2n-by-2n multiplication.
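For unsigned 32-bit operands, the quantity being returned can be checked on the host by doing exactly the widening that the builtin lets you avoid. This is a reference model for verification only; `multiplication_high_bits_ref` is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical host-side reference model (illustration only).
// It upcasts to 64 bits, which the device builtin is meant to make unnecessary.
constexpr std::uint32_t multiplication_high_bits_ref(std::uint32_t x, std::uint32_t y)
{
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(x) * y) >> 32);
}

// 0xFFFFFFFF * 0xFFFFFFFF = 0xFFFFFFFE00000001:
// plain 32-bit x * y keeps 0x00000001, the high bits are 0xFFFFFFFE.
static_assert(multiplication_high_bits_ref(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFEu, "");
static_assert(multiplication_high_bits_ref(0x10000u, 0x10000u) == 1u, "");
```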
| KAT_FD unsigned kat::builtins::permute_bytes | ( | unsigned | first, |
| unsigned | second, | ||
| unsigned | byte_selectors | ||
| ) |
See the relevant section of the CUDA PTX reference for an explanation of exactly what this does.
| first | a first value from which to potentially use bytes |
| second | a second value from which to potentially use bytes |
| byte_selectors | a packing of 4 selector structures; each selector structure has 3 bits specifying which of the input bytes to use (there are 8 bytes overall in first and second), and another bit specifying whether to copy the byte as-is or instead replicate the sign of the byte (interpreted as an int8_t) to fill the target byte. |
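The selector layout described above can be modelled on the host as follows. The sketch assumes the default mode of the PTX `prmt` instruction: selector i occupies bits 4i to 4i+3 of `byte_selectors`, byte indices 0 to 3 refer to `first`, and indices 4 to 7 refer to `second`. `permute_bytes_ref` is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical host-side reference model (illustration only),
// assuming the default mode of PTX prmt.
inline std::uint32_t permute_bytes_ref(
    std::uint32_t first, std::uint32_t second, std::uint32_t byte_selectors)
{
    // The 8 candidate bytes: indices 0..3 from first, 4..7 from second.
    const std::uint64_t pool = (static_cast<std::uint64_t>(second) << 32) | first;
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        const std::uint32_t selector = (byte_selectors >> (4 * i)) & 0xFu;
        std::uint32_t byte =
            static_cast<std::uint32_t>(pool >> (8 * (selector & 7u))) & 0xFFu;
        // The fourth selector bit requests sign replication instead of a copy.
        if (selector & 8u) { byte = (byte & 0x80u) ? 0xFFu : 0x00u; }
        result |= byte << (8 * i);
    }
    return result;
}
```

Under this model, selectors 0x3210 reproduce `first`, 0x7654 reproduce `second`, and 0x0123 reverse the byte order of `first`.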
| KAT_FD std::make_unsigned<I>::type kat::builtins::sum_with_absolute_difference | ( | I | x, |
| I | y, | ||
| typename std::make_unsigned< I >::type | addend | ||
| ) |
Computes addend + |x - y|.
See the relevant section of the PTX ISA reference.
The addend and the result have the unsigned type corresponding to the type of x and y.
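The signature's use of `std::make_unsigned` can be motivated with a host-side sketch in plain C++: the absolute difference of two signed 32-bit values may be as large as 2^32 - 1, which does not fit in the signed type. `sum_with_absolute_difference_ref` is a hypothetical name, and the wrap-around on overflow of the final sum is a property of this model, not a statement about the builtin.

```cpp
#include <cstdint>

// Hypothetical host-side reference model (illustration only).
inline std::uint32_t sum_with_absolute_difference_ref(
    std::int32_t x, std::int32_t y, std::uint32_t addend)
{
    // Widen first so the subtraction itself cannot overflow.
    const std::int64_t wx = x;
    const std::int64_t wy = y;
    const std::uint64_t difference =
        static_cast<std::uint64_t>(wx < wy ? wy - wx : wx - wy);
    // In this model the sum wraps modulo 2^32 on conversion.
    return static_cast<std::uint32_t>(addend + difference);
}
```

For example, x = -4, y = 6 and addend = 1 give 11, and the extreme pair INT32_MIN, INT32_MAX with addend 0 gives 0xFFFFFFFF.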