cuda-kat
CUDA kernel author's tools
kat::builtins Namespace Reference

Uniform-naming scheme, templated-when-relevant wrappers of single PTX instructions.

Namespaces

 special_registers
 Special register getter wrappers.
 

Enumerations

enum  funnel_shift_amount_resolution_mode_t {
  funnel_shift_amount_resolution_mode_t::take_lower_bits_of_amount,
  funnel_shift_amount_resolution_mode_t::cap_at_full_word_size
}
 Use this to select which variant of the funnel shift intrinsic to use.
 

Functions

template<typename I >
KAT_FD I multiplication_high_bits (I x, I y)
 When multiplying two n-bit numbers, the result may take up to 2n bits.
 
template<typename F >
KAT_FD F divide (F dividend, F divisor)
 Division which becomes faster and less precise than regular "/" when --use_fast_math is specified; otherwise it is the same as regular "/".
 
template<typename F >
KAT_FD F clamp_to_unit_segment (F x)
 Clamps the input value to the unit segment [0.0,+1.0].
 
template<typename T >
KAT_FD T absolute_value (T x)
 
template<typename T >
KAT_FD T minimum (T x, T y)=delete
 
template<typename T >
KAT_FD T maximum (T x, T y)=delete
 
template<typename I >
KAT_FD std::make_unsigned< I >::type sum_with_absolute_difference (I x, I y, typename std::make_unsigned< I >::type addend)
 Computes addend + |x - y|.
 
template<typename I >
KAT_FD int population_count (I x)
 
template<typename I >
KAT_FD I bit_reverse (I x)=delete
 
template<typename I >
KAT_FD unsigned find_leading_non_sign_bit (I x)=delete
 Find the most-significant, i.e. leading, bit that's different from the input's sign bit.
 
template<typename I >
KAT_FD int count_leading_zeros (I x)=delete
 Return the number of consecutive 0 bits, beginning from the most-significant bit ("leading" zeros).
 
KAT_FD unsigned permute_bytes (unsigned first, unsigned second, unsigned byte_selectors)
 See the relevant section of the CUDA PTX reference for an explanation of what this does exactly.
 
template<funnel_shift_amount_resolution_mode_t AmountResolutionMode = funnel_shift_amount_resolution_mode_t::cap_at_full_word_size>
KAT_FD uint32_t funnel_shift_right (uint32_t low_word, uint32_t high_word, uint32_t shift_amount)
 Performs a right-shift on the combination of the two arguments into a single, double-the-length, value.
 
template<funnel_shift_amount_resolution_mode_t AmountResolutionMode = funnel_shift_amount_resolution_mode_t::cap_at_full_word_size>
KAT_FD uint32_t funnel_shift_left (uint32_t low_word, uint32_t high_word, uint32_t shift_amount)
 Performs a left-shift on the combination of the two arguments into a single, double-the-length, value.
 
template<typename I >
I KAT_FD average (I x, I y)=delete
 compute the average of two integer values without needing special accounting for overflow - rounding down
 
template<typename I >
I KAT_FD average_rounded_up (I x, I y)=delete
 compute the average of two values without needing special accounting for overflow - rounding up
 

Detailed Description

Uniform-naming scheme, templated-when-relevant wrappers of single PTX instructions.

Note
This namespace should contain wrappers for all instructions which are not trivially producible with simple C++ code (e.g. no add or subtract).

Enumeration Type Documentation

§ funnel_shift_amount_resolution_mode_t

Use this to select which variant of the funnel shift intrinsic to use.

Enumerator
take_lower_bits_of_amount 

Shift by shift_amount & (size_in_bits<native_word_t> - 1)

cap_at_full_word_size 

Shift by min(shift_amount, size_in_bits<native_word_t>)
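
The two modes differ only in how an out-of-range shift_amount is resolved into the effective shift. A minimal host-side sketch of that resolution, assuming a 32-bit native word; the enum `amount_mode` and the function `resolve_shift_amount` are hypothetical stand-ins for illustration, not part of the library:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in mirroring the enumerator names documented above.
enum class amount_mode { take_lower_bits_of_amount, cap_at_full_word_size };

// Resolve a requested shift amount into the effective one (assumes 32-bit words).
inline std::uint32_t resolve_shift_amount(std::uint32_t shift_amount, amount_mode mode)
{
    const std::uint32_t word_bits = 32u;
    if (mode == amount_mode::take_lower_bits_of_amount) {
        return shift_amount & (word_bits - 1u);   // keep only the low 5 bits
    }
    return std::min(shift_amount, word_bits);     // never shift by more than 32
}
```

With a requested amount of 40, the first mode resolves to 8 and the second to 32; amounts below 32 are unchanged in both modes.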

Function Documentation

§ average_rounded_up()

template<typename I >
I KAT_FD kat::builtins::average_rounded_up ( I  x,
I  y 
)
delete

compute the average of two values without needing special accounting for overflow - rounding up

Note
Ignoring type limits, average_rounded_up(x, y) = floor((x + y + 1) / 2).
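
The naive expression (x + y + 1) / 2 can overflow in the intermediate sum. As an illustration of what "without special accounting for overflow" buys you, here is a host-side sketch for 32-bit unsigned values using well-known bit identities. The function names are hypothetical, and this is not claimed to be how the library's specializations are implemented:

```cpp
#include <cstdint>

// Average rounded down: the shared bits, plus half of the differing bits.
inline std::uint32_t average_model(std::uint32_t x, std::uint32_t y)
{
    return (x & y) + ((x ^ y) >> 1);
}

// Average rounded up: all set bits, minus half of the differing bits.
inline std::uint32_t average_rounded_up_model(std::uint32_t x, std::uint32_t y)
{
    return (x | y) - ((x ^ y) >> 1);
}
```

Neither expression ever forms x + y, so both stay correct even when x and y are both at the maximum value of the type.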

§ clamp_to_unit_segment()

template<typename F >
KAT_FD F kat::builtins::clamp_to_unit_segment ( F  x)

Clamps the input value to the unit segment [0.0,+1.0].

Note
Behavior is undefined for NaN, infinities and other special values.
Returns
max(0.0,min(1.0,x))
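
The documented return value can be written directly in portable C++. The following host-side sketch (with a hypothetical name) is only a reference for finite inputs; it makes no claim about which instruction the device wrapper uses:

```cpp
#include <algorithm>

// Reference model of the documented result max(0.0, min(1.0, x)), for finite x only.
template <typename F>
F clamp_to_unit_segment_model(F x)
{
    return std::max(F(0), std::min(F(1), x));
}
```

Values already inside [0, 1] pass through unchanged; anything below 0 becomes 0 and anything above 1 becomes 1.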

§ count_leading_zeros()

template<typename I >
KAT_FD int kat::builtins::count_leading_zeros ( I  x)
delete

Return the number of consecutive 0 bits, beginning from the most-significant bit ("leading" zeros).

Parameters
x: the number whose leading zero bits are to be counted
Returns
The number of leading zeros, between 0 and the size of I in bits; if x is 0, the full size in bits is returned (32 for a 32-bit I).
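
To make the counting direction concrete, here is a deliberately simple host-side reference model (hypothetical name, loop-based rather than a single instruction) that scans from the most-significant bit downwards:

```cpp
#include <limits>
#include <type_traits>

// Count consecutive 0 bits starting at the most-significant bit of x.
// Returns the full bit width of I when x is 0.
template <typename I>
int count_leading_zeros_model(I x)
{
    using U = typename std::make_unsigned<I>::type;
    const int width = std::numeric_limits<U>::digits;
    const U bits = static_cast<U>(x);
    int count = 0;
    for (int bit = width - 1; bit >= 0; --bit) {
        if ((bits >> bit) & U(1)) { break; }  // first 1 bit ends the run
        ++count;
    }
    return count;
}
```

For example, a 32-bit value of 1 has 31 leading zeros, and a value with its top bit set has none.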

§ find_leading_non_sign_bit()

template<typename I >
KAT_FD unsigned kat::builtins::find_leading_non_sign_bit ( I  x)
delete

Find the most-significant, i.e. leading, bit that's different from the input's sign bit.

Returns
for unsigned types, 0-based index of the last 1 bit, starting from the LSB towards the MSB; for signed integers it's the same if their sign bit (their MSB) is 0, and the index of the last 0 bit if the sign bit is 1.
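
The signed and unsigned cases can be unified by inverting negative inputs and then looking for the highest 1 bit. The sketch below is a host-side reference model with a hypothetical name. The description above does not say what is returned when no such bit exists (x is 0, or x is -1 for signed types); the PTX bfind instruction yields 0xFFFFFFFF in that case, and the sketch assumes the same:

```cpp
#include <limits>
#include <type_traits>

// 0-based index (from the LSB) of the most-significant bit that differs from the sign bit.
// Assumption: returns 0xFFFFFFFF when there is no such bit, following PTX bfind.
template <typename I>
unsigned find_leading_non_sign_bit_model(I x)
{
    using U = typename std::make_unsigned<I>::type;
    const int width = std::numeric_limits<U>::digits;
    U bits = static_cast<U>(x);
    if (std::is_signed<I>::value && x < I(0)) {
        bits = static_cast<U>(~bits);  // for negative inputs, look for the highest 0 bit
    }
    for (int bit = width - 1; bit >= 0; --bit) {
        if ((bits >> bit) & U(1)) { return static_cast<unsigned>(bit); }
    }
    return 0xFFFFFFFFu;
}
```

For instance, the signed value -2 (all ones except bit 0) gives index 0, while the signed value 5 gives index 2.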

§ funnel_shift_left()

template<funnel_shift_amount_resolution_mode_t AmountResolutionMode = funnel_shift_amount_resolution_mode_t::cap_at_full_word_size>
KAT_FD uint32_t kat::builtins::funnel_shift_left ( uint32_t  low_word,
uint32_t  high_word,
uint32_t  shift_amount 
)

Performs a left-shift on the combination of the two arguments into a single, double-the-length, value.

Parameters
low_word: the lower (less significant) half of the combined value
high_word: the upper (more significant) half of the combined value
shift_amount: the number of bits to left-shift
Template Parameters
AmountResolutionMode: shift_amount can have values which are higher than the maximum possible number of bits to left-shift; this indicates how to interpret such values.
Returns
the upper bits of the result
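
As a way to check expectations on the host, the left funnel shift can be modeled with 64-bit arithmetic. This sketch uses a hypothetical name and covers only the default mode, in which the amount is capped at 32; it is not the library's implementation:

```cpp
#include <algorithm>
#include <cstdint>

// Reference model: shift the 64-bit value high_word:low_word left, keep the upper 32 bits.
// Models only the default mode, where the amount is capped at 32.
inline std::uint32_t funnel_shift_left_model(
    std::uint32_t low_word, std::uint32_t high_word, std::uint32_t shift_amount)
{
    const std::uint32_t n = std::min<std::uint32_t>(shift_amount, 32u);
    const std::uint64_t combined = (static_cast<std::uint64_t>(high_word) << 32) | low_word;
    return static_cast<std::uint32_t>((combined << n) >> 32);
}
```

Shifting by 0 returns high_word unchanged, and shifting by 32 or more returns low_word.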

§ funnel_shift_right()

template<funnel_shift_amount_resolution_mode_t AmountResolutionMode = funnel_shift_amount_resolution_mode_t::cap_at_full_word_size>
KAT_FD uint32_t kat::builtins::funnel_shift_right ( uint32_t  low_word,
uint32_t  high_word,
uint32_t  shift_amount 
)

Performs a right-shift on the combination of the two arguments into a single, double-the-length, value.

Parameters
low_word: the lower (less significant) half of the combined value
high_word: the upper (more significant) half of the combined value
shift_amount: the number of bits to right-shift
Template Parameters
AmountResolutionMode: shift_amount can have values which are higher than the maximum possible number of bits to right-shift; this indicates how to interpret such values.
Returns
the lower bits of the result
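
The right funnel shift can be modeled on the host in the same spirit. Again, the name is hypothetical and only the default mode (amount capped at 32) is covered:

```cpp
#include <algorithm>
#include <cstdint>

// Reference model: shift the 64-bit value high_word:low_word right, keep the lower 32 bits.
// Models only the default mode, where the amount is capped at 32.
inline std::uint32_t funnel_shift_right_model(
    std::uint32_t low_word, std::uint32_t high_word, std::uint32_t shift_amount)
{
    const std::uint32_t n = std::min<std::uint32_t>(shift_amount, 32u);
    const std::uint64_t combined = (static_cast<std::uint64_t>(high_word) << 32) | low_word;
    return static_cast<std::uint32_t>(combined >> n);
}
```

Passing the same word as both low_word and high_word turns this into a 32-bit rotate right, which is a common use of funnel shifts.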

§ multiplication_high_bits()

template<typename I >
KAT_FD I kat::builtins::multiplication_high_bits ( I  x,
I  y 
)

When multiplying two n-bit numbers, the result may take up to 2n bits.

Without upcasting, the value of x * y is only the lower n bits of the full product; this function returns the upper n bits, without performing a 2n-by-2n multiplication.
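
For checking results on the host, the upper half can be obtained by doing exactly the widening multiplication that the device instruction avoids. A sketch for 32-bit unsigned operands, with a hypothetical name:

```cpp
#include <cstdint>

// Reference model: widen to 64 bits, multiply, and keep the upper 32 bits.
// The device wrapper exists precisely so that this widening is not needed.
inline std::uint32_t multiplication_high_bits_model(std::uint32_t x, std::uint32_t y)
{
    const std::uint64_t product = static_cast<std::uint64_t>(x) * y;
    return static_cast<std::uint32_t>(product >> 32);
}
```

Small operands whose product fits in 32 bits give 0, while 0x10000 * 0x10000 = 2^32 gives 1.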

§ permute_bytes()

KAT_FD unsigned kat::builtins::permute_bytes ( unsigned  first,
unsigned  second,
unsigned  byte_selectors 
)

See the relevant section of the CUDA PTX reference for an explanation of what this does exactly.

Parameters
first: a first value from which to potentially use bytes
second: a second value from which to potentially use bytes
byte_selectors: a packing of 4 selector structures; each selector structure is 3 bits specifying which of the input bytes is to be used (as there are 8 bytes overall in first and second), and another bit specifying whether it's an actual copy of a byte, or whether instead the sign of the byte (interpreted as an int8_t) should be replicated to fill the target byte.
Returns
the four bytes of first and/or second, or replicated signs thereof, indicated by the byte selectors
Note
If you don't use the sign-related bits, you could call this function "gather bytes" or "select bytes"
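
The selector encoding is easier to follow with a host-side model. The sketch below (hypothetical name) follows the default mode of the PTX prmt instruction and makes these assumptions beyond the text above: selector i occupies bits 4i to 4i+3 of byte_selectors and controls result byte i; input bytes 0 to 3 come from first and 4 to 7 from second, least-significant byte first; the top bit of each selector requests sign replication.

```cpp
#include <cstdint>

// Reference model of a byte permute, assuming PTX prmt default-mode selector layout.
inline std::uint32_t permute_bytes_model(
    std::uint32_t first, std::uint32_t second, std::uint32_t byte_selectors)
{
    // Pool of 8 input bytes: bytes 0..3 from first, bytes 4..7 from second.
    const std::uint64_t pool = (static_cast<std::uint64_t>(second) << 32) | first;
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        const std::uint32_t selector = (byte_selectors >> (4 * i)) & 0xFu;
        const std::uint32_t index = selector & 0x7u;           // which of the 8 bytes
        std::uint32_t byte = static_cast<std::uint32_t>((pool >> (8 * index)) & 0xFFu);
        if (selector & 0x8u) {
            byte = (byte & 0x80u) ? 0xFFu : 0x00u;             // replicate the byte's sign
        }
        result |= byte << (8 * i);
    }
    return result;
}
```

Under these assumptions, a selector value of 0x3210 reproduces first, 0x7654 reproduces second, and 0x0123 reverses the byte order of first.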

§ sum_with_absolute_difference()

template<typename I >
KAT_FD std::make_unsigned<I>::type kat::builtins::sum_with_absolute_difference ( I  x,
I  y,
typename std::make_unsigned< I >::type  addend 
)

Computes addend + |x - y|.

See the relevant section of the PTX ISA reference.

Note
The addend and the result are always unsigned, but of the same size as x and y.
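
The unsigned result type matters because |x - y| for signed inputs can exceed the signed maximum. A host-side sketch with a hypothetical name, which computes the difference in unsigned arithmetic to avoid signed overflow; it assumes that the final addition wraps around on overflow, which the text above does not state:

```cpp
#include <type_traits>

// Reference model of addend + |x - y|, computed entirely in the unsigned type.
// Assumption: the final addition wraps modulo 2^n.
template <typename I>
typename std::make_unsigned<I>::type sum_with_absolute_difference_model(
    I x, I y, typename std::make_unsigned<I>::type addend)
{
    using U = typename std::make_unsigned<I>::type;
    const U ux = static_cast<U>(x);
    const U uy = static_cast<U>(y);
    // Subtract the smaller from the larger; unsigned wraparound yields the true distance.
    const U difference = (x >= y) ? static_cast<U>(ux - uy) : static_cast<U>(uy - ux);
    return static_cast<U>(addend + difference);
}
```

For 32-bit signed inputs at the two extremes of the range, the distance is 0xFFFFFFFF, which only fits in the unsigned type.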