cuda-kat
CUDA kernel author's tools
Uniformly-named, templated-when-relevant wrappers of single PTX instructions. More...
Namespaces | |
special_registers | |
Special register getter wrappers. | |
Enumerations | |
enum | funnel_shift_amount_resolution_mode_t { funnel_shift_amount_resolution_mode_t::take_lower_bits_of_amount, funnel_shift_amount_resolution_mode_t::cap_at_full_word_size } |
Use this to select which variant of the funnel shift intrinsic to use. More... | |
Functions | |
template<typename I > | |
KAT_FD I | multiplication_high_bits (I x, I y) |
When multiplying two n-bit numbers, the result may take up to 2n bits. More... | |
template<typename F > | |
KAT_FD F | divide (F dividend, F divisor) |
Division which becomes faster and less precise than regular "/" when --use_fast_math is specified; otherwise it is the same as regular "/". | |
template<typename F > | |
KAT_FD F | clamp_to_unit_segment (F x) |
Clamps the input value to the unit segment [0.0, +1.0]. More... | |
template<typename T > | |
KAT_FD T | absolute_value (T x) |
template<typename T > | |
KAT_FD T | minimum (T x, T y)=delete |
template<typename T > | |
KAT_FD T | maximum (T x, T y)=delete |
template<typename I > | |
KAT_FD std::make_unsigned< I >::type | sum_with_absolute_difference (I x, I y, typename std::make_unsigned< I >::type addend) |
Computes addend + |x - y|. More... | |
template<typename I > | |
KAT_FD int | population_count (I x) |
template<typename I > | |
KAT_FD I | bit_reverse (I x)=delete |
template<typename I > | |
KAT_FD unsigned | find_leading_non_sign_bit (I x)=delete |
Find the most-significant, i.e. leading, bit that's different from the input's sign bit. More... | |
template<typename I > | |
KAT_FD int | count_leading_zeros (I x)=delete |
Return the number of consecutive bits, beginning from the most-significant, which are all 0 ("leading" zeros). More... | |
KAT_FD unsigned | permute_bytes (unsigned first, unsigned second, unsigned byte_selectors) |
See the relevant section of the CUDA PTX reference for an exact explanation of what this does. More... | |
template<funnel_shift_amount_resolution_mode_t AmountResolutionMode = funnel_shift_amount_resolution_mode_t::cap_at_full_word_size> | |
KAT_FD uint32_t | funnel_shift_right (uint32_t low_word, uint32_t high_word, uint32_t shift_amount) |
Performs a right-shift on the two arguments combined into a single, double-length value. More... | |
template<funnel_shift_amount_resolution_mode_t AmountResolutionMode = funnel_shift_amount_resolution_mode_t::cap_at_full_word_size> | |
KAT_FD uint32_t | funnel_shift_left (uint32_t low_word, uint32_t high_word, uint32_t shift_amount) |
Performs a left-shift on the two arguments combined into a single, double-length value. More... | |
template<typename I > | |
I KAT_FD | average (I x, I y)=delete |
Computes the average of two integer values, rounding down, without needing special accounting for overflow. | |
template<typename I > | |
I KAT_FD | average_rounded_up (I x, I y)=delete |
Computes the average of two values, rounding up, without needing special accounting for overflow. More... | |
Uniformly-named, templated-when-relevant wrappers of single PTX instructions.
I KAT_FD kat::builtins::average_rounded_up | ( | I | x, |
I | y | ||
) |
delete |
Computes the average of two values, rounding up, without needing special accounting for overflow.
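To illustrate what "without special accounting for overflow" means, here is a minimal host-side sketch for 32-bit unsigned values. It is not the library's implementation; the name `average_rounded_up_ref` is hypothetical, and the bit identity used is one common way to avoid forming the sum `x + y`, which could wrap around.

```cpp
#include <cstdint>

// Hypothetical host-side reference, not part of cuda-kat.
// Since x + y == 2 * (x | y) - (x ^ y), the rounded-up average is
// (x | y) - floor((x ^ y) / 2), and no intermediate value can overflow.
constexpr std::uint32_t average_rounded_up_ref(std::uint32_t x, std::uint32_t y)
{
    return (x | y) - ((x ^ y) >> 1);
}

static_assert(average_rounded_up_ref(3u, 4u) == 4u, "3.5 rounds up to 4");
static_assert(average_rounded_up_ref(0xFFFFFFFFu, 0xFFFFFFFDu) == 0xFFFFFFFEu,
              "no wrap-around near the top of the range");
```

A naive `(x + y + 1) / 2` would wrap around for the second pair of inputs.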
KAT_FD F kat::builtins::clamp_to_unit_segment | ( | F | x | ) |
Clamps the input value to the unit segment [0.0, +1.0].
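The intended result for ordinary inputs can be sketched on the host as follows. The name `clamp_to_unit_segment_ref` is hypothetical, and the sketch does not model how the device handles NaN inputs.

```cpp
// Hypothetical host-side reference, not part of cuda-kat.
// Values below 0 map to 0, values above 1 map to 1, others pass through.
constexpr float clamp_to_unit_segment_ref(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

static_assert(clamp_to_unit_segment_ref(-0.25f) == 0.0f, "clamped from below");
static_assert(clamp_to_unit_segment_ref(0.5f) == 0.5f, "already inside the segment");
static_assert(clamp_to_unit_segment_ref(7.0f) == 1.0f, "clamped from above");
```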
KAT_FD int kat::builtins::count_leading_zeros | ( | I | x | ) |
delete |
Return the number of consecutive bits, beginning from the most-significant, which are all 0 ("leading" zeros).
x | the number whose leading zero bits are to be counted |
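As a rough host-side illustration for 32-bit unsigned inputs: "leading" zeros are counted from the most-significant end, as CUDA's `__clz` intrinsic does. The name `count_leading_zeros_ref` is hypothetical, and the loop is only a readable reference, not how the device computes the result.

```cpp
#include <cstdint>

// Hypothetical host-side reference, not part of cuda-kat.
// Walks a one-bit mask down from bit 31 until it meets a set bit.
constexpr int count_leading_zeros_ref(std::uint32_t x)
{
    int count = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        ++count;
    }
    return count;
}

static_assert(count_leading_zeros_ref(1u) == 31, "only bit 0 is set");
static_assert(count_leading_zeros_ref(0x00F00000u) == 8, "highest set bit is bit 23");
static_assert(count_leading_zeros_ref(0u) == 32, "all bits are zero");
```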
KAT_FD unsigned kat::builtins::find_leading_non_sign_bit | ( | I | x | ) |
delete |
Find the most-significant, i.e. leading, bit that's different from the input's sign bit.
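The sketch below shows one reading of this description for 32-bit signed inputs. It assumes the wrapper follows the semantics of the PTX `bfind` instruction, including its `0xFFFFFFFF` result when no bit differs from the sign bit; that mapping is an assumption, not something this page states. The name `find_leading_non_sign_bit_ref` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical host-side reference, not part of cuda-kat.
// Assumes PTX bfind-like behaviour: returns the position of the highest bit
// that differs from the sign bit, or 0xFFFFFFFF if there is none.
constexpr std::uint32_t find_leading_non_sign_bit_ref(std::int32_t x)
{
    // For negative inputs the sign bit is 1, so look for the highest 0 bit
    // by flipping all bits first.
    std::uint32_t bits = x < 0 ? ~static_cast<std::uint32_t>(x)
                               : static_cast<std::uint32_t>(x);
    for (int pos = 31; pos >= 0; --pos) {
        if ((bits >> pos) & 1u) {
            return static_cast<std::uint32_t>(pos);
        }
    }
    return 0xFFFFFFFFu; // x is 0 or -1: every bit equals the sign bit
}

static_assert(find_leading_non_sign_bit_ref(5) == 2u, "5 is 0b101");
static_assert(find_leading_non_sign_bit_ref(-6) == 2u, "-6 is ...11010, highest 0 at bit 2");
static_assert(find_leading_non_sign_bit_ref(-1) == 0xFFFFFFFFu, "no non-sign bit");
```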
KAT_FD uint32_t kat::builtins::funnel_shift_left | ( | uint32_t | low_word, |
uint32_t | high_word, | ||
uint32_t | shift_amount | ||
) |
Performs a left-shift on the two arguments combined into a single, double-length value.
low_word | The less-significant half of the combined value |
high_word | The more-significant half of the combined value |
shift_amount | The number of bits to left-shift |
AmountResolutionMode | shift_amount can be larger than the maximum possible number of bits to left-shift; this indicates how to interpret such values. |
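The following host-side sketch shows how the two amount-resolution modes could differ. It assumes the wrapper behaves like CUDA's `__funnelshift_l` (shift by `shift_amount & 31`) and `__funnelshift_lc` (shift by `min(shift_amount, 32)`) intrinsics, both of which return the upper 32 bits of the shifted 64-bit value. The namespace, enum and function names here are hypothetical stand-ins.

```cpp
#include <cstdint>

namespace funnel_left_sketch { // hypothetical namespace, not part of cuda-kat

enum class amount_mode { take_lower_bits_of_amount, cap_at_full_word_size };

// Reference emulation: concatenate high_word:low_word, shift left, and keep
// the upper 32 bits of the 64-bit result.
template <amount_mode Mode = amount_mode::cap_at_full_word_size>
constexpr std::uint32_t funnel_shift_left_ref(
    std::uint32_t low_word, std::uint32_t high_word, std::uint32_t shift_amount)
{
    const std::uint32_t n = (Mode == amount_mode::take_lower_bits_of_amount)
        ? (shift_amount & 31u)                        // wrap: use the low 5 bits
        : (shift_amount < 32u ? shift_amount : 32u);  // clamp: cap at 32
    const std::uint64_t combined =
        (static_cast<std::uint64_t>(high_word) << 32) | low_word;
    return static_cast<std::uint32_t>((combined << n) >> 32);
}

} // namespace funnel_left_sketch

static_assert(funnel_left_sketch::funnel_shift_left_ref(0x89ABCDEFu, 0x01234567u, 8u)
              == 0x23456789u, "bits of the low word move into the result");
static_assert(funnel_left_sketch::funnel_shift_left_ref(0x89ABCDEFu, 0x01234567u, 40u)
              == 0x89ABCDEFu, "capped at 32: the result is the low word");
```

With the wrapping mode, a shift amount of 40 would instead act like a shift of 8.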
KAT_FD uint32_t kat::builtins::funnel_shift_right | ( | uint32_t | low_word, |
uint32_t | high_word, | ||
uint32_t | shift_amount | ||
) |
Performs a right-shift on the two arguments combined into a single, double-length value.
low_word | The less-significant half of the combined value |
high_word | The more-significant half of the combined value |
shift_amount | The number of bits to right-shift |
AmountResolutionMode | shift_amount can be larger than the maximum possible number of bits to right-shift; this indicates how to interpret such values. |
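A host-side sketch of the right-shift variant follows, under the assumption that it mirrors CUDA's `__funnelshift_r` and `__funnelshift_rc` intrinsics, which return the lower 32 bits of the shifted 64-bit value. All names in the block are hypothetical. Passing the same word as both halves turns the funnel shift into a rotation, which is a common use.

```cpp
#include <cstdint>

namespace funnel_right_sketch { // hypothetical namespace, not part of cuda-kat

enum class amount_mode { take_lower_bits_of_amount, cap_at_full_word_size };

// Reference emulation: concatenate high_word:low_word, shift right, and keep
// the lower 32 bits of the 64-bit result.
template <amount_mode Mode = amount_mode::cap_at_full_word_size>
constexpr std::uint32_t funnel_shift_right_ref(
    std::uint32_t low_word, std::uint32_t high_word, std::uint32_t shift_amount)
{
    const std::uint32_t n = (Mode == amount_mode::take_lower_bits_of_amount)
        ? (shift_amount & 31u)                        // wrap: use the low 5 bits
        : (shift_amount < 32u ? shift_amount : 32u);  // clamp: cap at 32
    const std::uint64_t combined =
        (static_cast<std::uint64_t>(high_word) << 32) | low_word;
    return static_cast<std::uint32_t>(combined >> n);
}

} // namespace funnel_right_sketch

static_assert(funnel_right_sketch::funnel_shift_right_ref(0x89ABCDEFu, 0x01234567u, 8u)
              == 0x6789ABCDu, "bits of the high word move into the result");
static_assert(funnel_right_sketch::funnel_shift_right_ref(0x12345678u, 0x12345678u, 8u)
              == 0x78123456u, "same word twice: rotate right by 8");
```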
KAT_FD I kat::builtins::multiplication_high_bits | ( | I | x, |
I | y | ||
) |
When multiplying two n-bit numbers, the result may take up to 2n bits.
Without upcasting, the value of x * y is only the lower n bits of the result; this function lets you get the upper n bits without performing a 2n-by-2n multiplication.
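For reference, the value this computes can be reproduced on the host by doing exactly the upcast that the function lets you avoid. This is a sketch for 32-bit unsigned operands only; `multiplication_high_bits_ref` is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical host-side reference, not part of cuda-kat.
// Widens to 64 bits, multiplies, and keeps the upper 32 bits of the product.
constexpr std::uint32_t multiplication_high_bits_ref(std::uint32_t x, std::uint32_t y)
{
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(x) * y) >> 32);
}

// 0x80000000 * 4 == 2^33: the low 32 bits are all zero, the high bits hold 2.
static_assert(static_cast<std::uint32_t>(0x80000000u * 4u) == 0u,
              "the plain product keeps only the low 32 bits");
static_assert(multiplication_high_bits_ref(0x80000000u, 4u) == 2u,
              "the upper 32 bits of the full product");
```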
KAT_FD unsigned kat::builtins::permute_bytes | ( | unsigned | first, |
unsigned | second, | ||
unsigned | byte_selectors | ||
) |
See the relevant section of the CUDA PTX reference for an exact explanation of what this does.
first | a first value from which to potentially use bytes |
second | a second value from which to potentially use bytes |
byte_selectors | a packing of 4 selector structures; each selector structure has 3 bits specifying which of the input bytes is to be used (as there are 8 bytes overall in first and second ), and another bit specifying whether the byte is copied as-is, or whether instead the sign bit of the byte (interpreted as an int8_t) is replicated to fill the target byte. |
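The selector encoding described above can be sketched on the host as follows. This assumes the layout of the default mode of the PTX `prmt` instruction: bytes 0 to 3 come from `first`, bytes 4 to 7 from `second`, and the selector for result byte i sits in bits 4i to 4i+3 of `byte_selectors`. That mapping is an assumption about the wrapper, and `permute_bytes_ref` is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical host-side reference, not part of cuda-kat.
constexpr std::uint32_t permute_bytes_ref(
    std::uint32_t first, std::uint32_t second, std::uint32_t byte_selectors)
{
    // The 8 candidate bytes: indices 0..3 from `first`, 4..7 from `second`.
    const std::uint64_t pool = (static_cast<std::uint64_t>(second) << 32) | first;
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        const std::uint32_t selector = (byte_selectors >> (4 * i)) & 0xFu;
        std::uint32_t byte =
            static_cast<std::uint32_t>(pool >> (8 * (selector & 0x7u))) & 0xFFu;
        if (selector & 0x8u) {
            // Fourth selector bit set: replicate the byte's sign bit.
            byte = (byte & 0x80u) ? 0xFFu : 0x00u;
        }
        result |= byte << (8 * i);
    }
    return result;
}

static_assert(permute_bytes_ref(0x33221100u, 0x77665544u, 0x0123u) == 0x00112233u,
              "reverse the bytes of the first word");
static_assert(permute_bytes_ref(0x33221100u, 0x77665544u, 0x7654u) == 0x77665544u,
              "select the second word unchanged");
static_assert(permute_bytes_ref(0x000000F0u, 0u, 0x3218u) == 0x000000FFu,
              "sign bit of byte 0 replicated into result byte 0");
```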
KAT_FD std::make_unsigned<I>::type kat::builtins::sum_with_absolute_difference | ( | I | x, |
I | y, | ||
typename std::make_unsigned< I >::type | addend | ||
) |
Computes addend + |x - y|.
See the relevant section of the PTX ISA reference.
The addend and the result are of the unsigned integer type corresponding to the type of x and y.
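A host-side sketch of the computation, using the same signature shape as above, might look like this. The name `sum_with_absolute_difference_ref` is hypothetical, and the subtraction is carried out in the unsigned type so that |x - y| cannot overflow even for signed inputs at opposite ends of their range.

```cpp
#include <type_traits>

// Hypothetical host-side reference, not part of cuda-kat.
template <typename I>
constexpr typename std::make_unsigned<I>::type
sum_with_absolute_difference_ref(I x, I y, typename std::make_unsigned<I>::type addend)
{
    using U = typename std::make_unsigned<I>::type;
    // Subtract the smaller value from the larger one in the unsigned domain,
    // where wrap-around arithmetic yields the correct magnitude.
    return static_cast<U>(addend + (x > y ? static_cast<U>(x) - static_cast<U>(y)
                                          : static_cast<U>(y) - static_cast<U>(x)));
}

static_assert(sum_with_absolute_difference_ref(3, -5, 10u) == 18u, "10 + |3 - (-5)|");
static_assert(sum_with_absolute_difference_ref(7u, 9u, 1u) == 3u, "1 + |7 - 9|");
```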