Uniformly named wrappers, templated where relevant, each around a single PTX instruction.

Namespaces
	special_registers
	Special register getter wrappers.

Enumerations
enum	funnel_shift_amount_resolution_mode_t { funnel_shift_amount_resolution_mode_t::take_lower_bits_of_amount, funnel_shift_amount_resolution_mode_t::cap_at_full_word_size }
	Use this to select which variant of the funnel shift intrinsic to use.

Functions
template<typename I >
KAT_FD I	multiplication_high_bits (I x, I y)
	When multiplying two n-bit numbers, the result may take up to 2n bits.

template<typename F >
KAT_FD F	divide (F dividend, F divisor)
	Division which is faster and less precise than regular "/" when --use_fast_math is specified; otherwise it is the same as regular "/".

template<typename F >
KAT_FD F	clamp_to_unit_segment (F x)
	Clamps the input value to the unit segment [0.0,+1.0].

template<typename T >
KAT_FD T	absolute_value (T x)

template<typename T >
KAT_FD T	minimum (T x, T y)=delete

template<typename T >
KAT_FD T	maximum (T x, T y)=delete

template<typename I >
KAT_FD std::make_unsigned< I >::type	sum_with_absolute_difference (I x, I y, typename std::make_unsigned< I >::type addend)
	Computes addend + |x - y|.

template<typename I >
KAT_FD int	population_count (I x)

template<typename I >
KAT_FD I	bit_reverse (I x)=delete

template<typename I >
KAT_FD unsigned	find_leading_non_sign_bit (I x)=delete
	Find the most-significant, i.e. leading, bit that's different from the input's sign bit.

template<typename I >
KAT_FD int	count_leading_zeros (I x)=delete
	Return the number of consecutive 0 bits, beginning from the most-significant bit ("leading" zeros).

KAT_FD unsigned	permute_bytes (unsigned first, unsigned second, unsigned byte_selectors)
	See: relevant section of the CUDA PTX reference for an explanation of what this does exactly.

template<funnel_shift_amount_resolution_mode_t AmountResolutionMode = funnel_shift_amount_resolution_mode_t::cap_at_full_word_size>
KAT_FD uint32_t	funnel_shift_right (uint32_t low_word, uint32_t high_word, uint32_t shift_amount)
	Performs a right-shift on the combination of the two arguments into a single, double-the-length, value.

template<funnel_shift_amount_resolution_mode_t AmountResolutionMode = funnel_shift_amount_resolution_mode_t::cap_at_full_word_size>
KAT_FD uint32_t	funnel_shift_left (uint32_t low_word, uint32_t high_word, uint32_t shift_amount)
	Performs a left-shift on the combination of the two arguments into a single, double-the-length, value.

template<typename I >
I KAT_FD	average (I x, I y)=delete
	Computes the average of two integer values, rounding down, without needing special accounting for overflow.

template<typename I >
I KAT_FD	average_rounded_up (I x, I y)=delete
	Computes the average of two values, rounding up, without needing special accounting for overflow.

Detailed Description

Uniformly named wrappers, templated where relevant, each around a single PTX instruction.

Note: this should contain wrappers for all instructions which cannot be trivially produced with simple C++ code (so, for example, there are no wrappers for add or subtract).

Enumeration Type Documentation

§ funnel_shift_amount_resolution_mode_t

enum kat::builtins::funnel_shift_amount_resolution_mode_t

strong

Use this to select which variant of the funnel shift intrinsic to use.

Enumerator
take_lower_bits_of_amount	Shift by shift_amount & (size_in_bits<native_word_t> - 1)
cap_at_full_word_size	Shift by min(shift_amount, size_in_bits<native_word_t>)
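
The difference between the two modes only shows up once shift_amount reaches the word size. The following host-side sketch models how the effective shift could be resolved, assuming the two enumerators mirror the .wrap and .clamp modes of the PTX shf instruction on 32-bit words. The names amount_mode, word_bits and effective_shift are hypothetical and not part of the library.

```cpp
#include <cstdint>

// Hypothetical host-side model; not the library's implementation.
enum class amount_mode { take_lower_bits_of_amount, cap_at_full_word_size };

constexpr std::uint32_t word_bits = 32;  // assumed native word size

// Resolve a raw shift amount into the number of bits actually shifted.
constexpr std::uint32_t effective_shift(amount_mode mode, std::uint32_t shift_amount)
{
    return (mode == amount_mode::take_lower_bits_of_amount)
        ? (shift_amount & (word_bits - 1))                        // keep only the low 5 bits
        : (shift_amount < word_bits ? shift_amount : word_bits);  // never shift by more than 32
}
```

Under this model a shift_amount of 40 resolves to 8 when the lower bits are taken, and to 32 when the amount is capped.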

Function Documentation

§ average_rounded_up()

template<typename I >

I KAT_FD kat::builtins::average_rounded_up	(	I	x,
		I	y
	)

delete

Computes the average of two values, rounding up, without needing special accounting for overflow.

Note: ignoring type limits, average_rounded_up(x,y) = floor ((x + y + 1 ) / 2)
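
To illustrate what "without needing special accounting for overflow" means, here is a host-side sketch for 32-bit unsigned values using well-known bit identities. It is not the library's implementation (which, per this page, wraps a single instruction), and the _ref function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical reference helpers; neither intermediate result can overflow.

// floor((x + y) / 2): shared bits, plus half of the differing bits.
constexpr std::uint32_t average_rounded_down_ref(std::uint32_t x, std::uint32_t y)
{
    return (x & y) + ((x ^ y) >> 1);
}

// floor((x + y + 1) / 2): all set bits, minus half of the differing bits.
constexpr std::uint32_t average_rounded_up_ref(std::uint32_t x, std::uint32_t y)
{
    return (x | y) - ((x ^ y) >> 1);
}
```

For x = 3 and y = 4 the rounded-down average is 3 and the rounded-up average is 4; for two values near the top of the range, the naive (x + y + 1) / 2 would wrap around while these forms do not.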

§ clamp_to_unit_segment()

template<typename F >

KAT_FD F kat::builtins::clamp_to_unit_segment ( F x )

Clamps the input value to the unit segment [0.0,+1.0].

Note: behavior is undefined for NaN, infinite and other special floating-point values.

Returns: max(0.0,min(1.0,x))
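
The documented return value can be modelled on the host as follows. This is only a sketch for float and finite inputs; the name clamp_to_unit_segment_ref is hypothetical.

```cpp
// Hypothetical host-side reference for finite inputs:
// equivalent to max(0.0, min(1.0, x)).
constexpr float clamp_to_unit_segment_ref(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}
```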

§ count_leading_zeros()

template<typename I >

KAT_FD int kat::builtins::count_leading_zeros ( I x )

delete

Return the number of consecutive 0 bits, beginning from the most-significant bit ("leading" zeros).

Parameters

x	the number whose representation is to be counted

Returns: the number of leading zeros, between 0 and the size of I in bits; if x is 0, all bits are counted (so 32 is returned for a 32-bit I)
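
As a reference for the expected values, the count can be modelled on the host by scanning from the most-significant bit downward. This sketch covers 32-bit unsigned inputs only and needs C++14 or later for the constexpr loop; count_leading_zeros_ref is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical host-side reference: count 0 bits from the MSB down
// until the first 1 bit is met (or the word is exhausted).
constexpr int count_leading_zeros_ref(std::uint32_t x)
{
    int count = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        ++count;
    }
    return count;
}
```

With this model an input of 1 gives 31, an input with only the top bit set gives 0, and an input of 0 gives 32.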

§ find_leading_non_sign_bit()

template<typename I >

KAT_FD unsigned kat::builtins::find_leading_non_sign_bit ( I x )

delete

Find the most-significant, i.e. leading, bit that's different from the input's sign bit.

Returns: for unsigned types, 0-based index of the last 1 bit, starting from the LSB towards the MSB; for signed integers it's the same if their sign bit (their MSB) is 0, and the index of the last 0 bit if the sign bit is 1.
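
The signed and unsigned cases can be modelled on the host as below (C++14 or later). The sketch assumes the wrapper follows the PTX bfind instruction, which yields 0xFFFFFFFF when no qualifying bit exists; whether the wrapper preserves that value is an assumption here, and all names in the block are hypothetical.

```cpp
#include <cstdint>

// Assumed "not found" value, as produced by PTX bfind.
constexpr std::uint32_t no_such_bit = 0xFFFFFFFFu;

// 0-based index (from the LSB) of the most-significant 1 bit.
constexpr std::uint32_t find_leading_one_ref(std::uint32_t x)
{
    std::uint32_t index = no_such_bit;
    for (std::uint32_t i = 0; i < 32; ++i) {
        if ((x >> i) & 1u) { index = i; }
    }
    return index;
}

// For negative inputs the sign bit is 1, so look for the leading 0 bit
// by flipping all bits first.
constexpr std::uint32_t find_leading_non_sign_bit_ref(std::int32_t x)
{
    return find_leading_one_ref(
        x < 0 ? ~static_cast<std::uint32_t>(x) : static_cast<std::uint32_t>(x));
}
```

For example, 5 (binary 101) gives index 2, and so does -8 (binary ...11111000), whose most-significant 0 bit is bit 2.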

§ funnel_shift_left()

template<funnel_shift_amount_resolution_mode_t AmountResolutionMode = funnel_shift_amount_resolution_mode_t::cap_at_full_word_size>

KAT_FD uint32_t kat::builtins::funnel_shift_left	(	uint32_t	low_word,
		uint32_t	high_word,
		uint32_t	shift_amount
	)

Performs a left-shift on the combination of the two arguments into a single, double-the-length, value.

Parameters

low_word
high_word
shift_amount	The number of bits to left-shift

Template Parameters

AmountResolutionMode	shift_amount can have values which are higher than the maximum possible number of bits to left-shift; this indicates how to interpret such values.

Returns: the upper bits of the result
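
A host-side model of the left funnel shift, using a 64-bit intermediate, may help make "the upper bits of the result" concrete. It assumes the default mode, with the shift amount capped at 32, and PTX shf.l semantics; funnel_shift_left_ref is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical host-side reference: concatenate high_word:low_word,
// shift left, and keep the upper 32 bits of the 64-bit result.
constexpr std::uint32_t funnel_shift_left_ref(
    std::uint32_t low_word, std::uint32_t high_word, std::uint32_t shift_amount)
{
    return static_cast<std::uint32_t>(
        (((static_cast<std::uint64_t>(high_word) << 32) | low_word)
            << (shift_amount < 32 ? shift_amount : 32))  // cap the amount at the word size
        >> 32);
}
```

A shift of 0 returns high_word unchanged, while a shift of 32 or more returns low_word, since all of its bits have moved into the upper half.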

§ funnel_shift_right()

template<funnel_shift_amount_resolution_mode_t AmountResolutionMode = funnel_shift_amount_resolution_mode_t::cap_at_full_word_size>

KAT_FD uint32_t kat::builtins::funnel_shift_right	(	uint32_t	low_word,
		uint32_t	high_word,
		uint32_t	shift_amount
	)

Performs a right-shift on the combination of the two arguments into a single, double-the-length, value.

Parameters

low_word
high_word
shift_amount	The number of bits to right-shift

Template Parameters

AmountResolutionMode shift_amount can have values which are higher than the maximum possible number of bits to right-shift; this indicates how to interpret such values.

Returns: the lower bits of the result
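
A host-side model of the right funnel shift, using a 64-bit intermediate, is sketched below. It assumes the default mode, with the shift amount capped at 32, and PTX shf.r semantics; funnel_shift_right_ref is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical host-side reference: concatenate high_word:low_word,
// shift right, and keep the lower 32 bits of the 64-bit result.
constexpr std::uint32_t funnel_shift_right_ref(
    std::uint32_t low_word, std::uint32_t high_word, std::uint32_t shift_amount)
{
    return static_cast<std::uint32_t>(
        ((static_cast<std::uint64_t>(high_word) << 32) | low_word)
            >> (shift_amount < 32 ? shift_amount : 32));  // cap the amount at the word size
}
```

Passing the same value as both words turns the funnel shift into a rotation: under this model, rotating 0x00000001 right by 1 yields 0x80000000.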

§ multiplication_high_bits()

template<typename I >

KAT_FD I kat::builtins::multiplication_high_bits	(	I	x,
		I	y
	)

When multiplying two n-bit numbers, the result may take up to 2n bits.

Without upcasting, the value of x * y is only the lower n bits of the result; this function lets you get the upper n bits, without performing a 2n-by-2n multiplication.
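
For comparison, this is the upcasting approach the function lets you avoid, written as a host-side reference for 32-bit unsigned operands. The name multiplication_high_bits_ref is hypothetical.

```cpp
#include <cstdint>

// Hypothetical host-side reference: widen to 64 bits, multiply,
// then keep the upper 32 bits of the product.
constexpr std::uint32_t multiplication_high_bits_ref(std::uint32_t x, std::uint32_t y)
{
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(x) * y) >> 32);
}
```

For example, 0x10000 * 0x10000 is 2^32, whose lower 32 bits are all zero and whose upper 32 bits are 1.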

§ permute_bytes()

KAT_FD unsigned kat::builtins::permute_bytes	(	unsigned	first,
		unsigned	second,
		unsigned	byte_selectors
	)

See: relevant section of the CUDA PTX reference for an explanation of what this does exactly.

Parameters

first	a first value from which to potentially use bytes
second	a second value from which to potentially use bytes
byte_selectors	a packing of 4 selector structures; each selector structure is 3 bits specifying which of the input bytes is to be used (as there are 8 bytes overall in `first` and `second`), and another bit specifying whether the byte is copied as-is, or whether instead the sign bit of the byte (interpreted as an int8_t) is replicated to fill the target byte.

Returns: the four bytes of first and/or second, or replicated signs thereof, indicated by the byte selectors

Note: If you don't use the sign-related bits, you could call this function "gather bytes" or "select bytes"
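
The selector layout described above can be modelled on the host as follows (C++14 or later). The sketch assumes the default mode of the PTX prmt instruction: four 4-bit selectors in the low 16 bits of byte_selectors, with bytes 0 to 3 taken from first and bytes 4 to 7 from second. The name permute_bytes_ref is hypothetical.

```cpp
#include <cstdint>

// Hypothetical host-side reference for the default prmt mode.
constexpr std::uint32_t permute_bytes_ref(
    std::uint32_t first, std::uint32_t second, std::uint32_t byte_selectors)
{
    // The 8 candidate bytes: indices 0..3 from `first`, 4..7 from `second`.
    const std::uint64_t pool = (static_cast<std::uint64_t>(second) << 32) | first;
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        const std::uint32_t selector = (byte_selectors >> (4 * i)) & 0xFu;
        const std::uint32_t byte =
            static_cast<std::uint32_t>(pool >> (8 * (selector & 0x7u))) & 0xFFu;
        // Top selector bit set: replicate the byte's sign bit instead of copying it.
        const std::uint32_t out =
            (selector & 0x8u) ? ((byte & 0x80u) ? 0xFFu : 0x00u) : byte;
        result |= out << (8 * i);
    }
    return result;
}
```

Under this model the selector value 0x3210 returns first unchanged, 0x7654 returns second, and 0x0123 reverses the byte order of first.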

§ sum_with_absolute_difference()

template<typename I >

KAT_FD std::make_unsigned<I>::type kat::builtins::sum_with_absolute_difference	(	I	x,
		I	y,
		typename std::make_unsigned< I >::type	addend
	)

Computes addend + |x - y|.

See the relevant section of the PTX ISA reference.

Note: The addend and the result are always unsigned, but of the same size as x and y .
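
A host-side reference for 32-bit signed inputs shows why the result is unsigned: the absolute difference of two signed values may not fit in the signed type. This is a sketch only, and sum_with_absolute_difference_ref is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical host-side reference. The subtraction is done on unsigned
// values, so it is well-defined even for INT32_MIN and INT32_MAX.
constexpr std::uint32_t sum_with_absolute_difference_ref(
    std::int32_t x, std::int32_t y, std::uint32_t addend)
{
    return addend + (x < y
        ? static_cast<std::uint32_t>(y) - static_cast<std::uint32_t>(x)
        : static_cast<std::uint32_t>(x) - static_cast<std::uint32_t>(y));
}
```

With x = -4, y = 6 and addend = 1, the absolute difference is 10 and the result is 11.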
