cuda-kat
CUDA kernel author's tools
non-builtins.cuh File Reference

Templated, uniformly-named C++ functions wrapping what should have been single PTX instructions, but aren't (in a dedicated non_builtins namespace).

#include <kat/on_device/builtins.cuh>

Functions

template<typename I >
KAT_FD int kat::non_builtins::find_first_set (I x)
 Determine the 1-based index of the first non-zero bit in the argument.
 
template<>
KAT_FD int kat::non_builtins::find_first_set< int > (int x)
 
template<>
KAT_FD int kat::non_builtins::find_first_set< long long > (long long x)
 
template<typename I , bool FixSemanticsForZero = true>
KAT_FD int kat::non_builtins::count_trailing_zeros (I x)
 counts the number of initial zeros when considering the binary representation of a number from least to most significant digit
 
template<typename I >
KAT_FD int kat::non_builtins::count_leading_zeros (I x)
 counts the number of initial zeros when considering the binary representation of a number from most to least significant digit
 

Detailed Description

Templated, uniformly-named C++ functions wrapping what should have been single PTX instructions, but aren't (in a dedicated non_builtins namespace).

There are several functions one would expect to compile to a single PTX instruction: similar functions do, and on the CPU these very functions often translate to a single machine instruction. Strangely, they do not. Implementations of such functions are found in this file rather than in on_device/builtins.cuh, and they are placed in a different namespace to avoid accidental confusion.

Function Documentation

§ count_leading_zeros()

template<typename I >
KAT_FD int kat::non_builtins::count_leading_zeros (I x)
delete

counts the number of initial zeros when considering the binary representation of a number from most to least significant digit

Return the number of consecutive bits, beginning from the most-significant, which are all 0 ("leading" zeros)

Parameters
x: the number whose binary representation is to be counted
Returns
the counted number of 0 bits; if x is 0, 32 is returned
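
To make the counting direction concrete, here is a minimal host-side sketch of the semantics described above. It is not the library's implementation: the name `clz_sketch` is hypothetical, and returning the full bit width of `I` for a zero argument is an assumption that generalizes the documented value of 32 beyond 32-bit types. It needs C++14 or later.

```cpp
#include <climits>
#include <type_traits>

// Hypothetical reference for illustration only; not part of cuda-kat.
// Counts consecutive 0 bits starting at the most significant bit of x.
template <typename I>
constexpr int clz_sketch(I x)
{
    using U = typename std::make_unsigned<I>::type;
    constexpr int width = static_cast<int>(sizeof(I) * CHAR_BIT);
    U bits = static_cast<U>(x);
    int count = 0;
    // Walk from the top bit downwards and stop at the first 1.
    for (int i = width - 1; i >= 0 && ((bits >> i) & U{1}) == 0; --i) {
        ++count;
    }
    return count; // equals width when x == 0 (assumed generalization of "32")
}

constexpr int int_bits = static_cast<int>(sizeof(int) * CHAR_BIT);

static_assert(clz_sketch(0) == int_bits, "no 1 bit: every bit is a leading zero");
static_assert(clz_sketch(1) == int_bits - 1, "only the lowest bit is set");
static_assert(clz_sketch(-1) == 0, "the top bit is set");
```

On the device one would normally reach for an intrinsic such as `__clz()` rather than a loop; the loop here only spells out what is being counted.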

§ count_trailing_zeros()

template<typename I , bool FixSemanticsForZero = true>
KAT_FD int kat::non_builtins::count_trailing_zeros (I x)

counts the number of initial zeros when considering the binary representation of a number from least to most significant digit

Template Parameters
FixSemanticsForZero: the simpler implementation of this function uses the find_first_set() builtin. Unfortunately, since find_first_set() returns 0 when no bits are set, that implementation yields -1 for a zero argument rather than the number of bits in I. Fixing this requires a couple of extra instructions. They are used by default, but one may prefer to skip them and accept -1 instead of 32 (for a 32-bit I) in the case where no bits are set.
Parameters
x: the number whose binary representation is to be counted
Returns
the number of initial zero bits before the first 1; if x is 0, the full number of bits is returned (or -1, depending on the template parameter FixSemanticsForZero).
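
The effect of `FixSemanticsForZero` can be sketched on the host as follows. This is an illustration under stated assumptions, not the library's code: `ctz_ffs_helper` is a hypothetical stand-in for find_first_set(), `ctz_sketch` is a hypothetical name, and deriving the result as "find first set, minus one" is inferred from the description above. It needs C++14 or later.

```cpp
#include <climits>
#include <type_traits>

// Hypothetical stand-in for find_first_set(): 1-based index of the lowest
// set bit, or 0 when no bit is set. Not part of cuda-kat.
template <typename I>
constexpr int ctz_ffs_helper(I x)
{
    using U = typename std::make_unsigned<I>::type;
    U bits = static_cast<U>(x);
    if (bits == 0) { return 0; }
    int position = 1;
    while ((bits & U{1}) == 0) {
        bits >>= 1;
        ++position;
    }
    return position;
}

// Hypothetical sketch of the documented behaviour (assumed implementation).
template <typename I, bool FixSemanticsForZero = true>
constexpr int ctz_sketch(I x)
{
    // The extra check that FixSemanticsForZero pays for.
    if (FixSemanticsForZero && x == 0) {
        return static_cast<int>(sizeof(I) * CHAR_BIT);
    }
    // Without the check, a zero argument gives 0 - 1 == -1.
    return ctz_ffs_helper(x) - 1;
}

static_assert(ctz_sketch(8) == 3, "binary 1000 has three trailing zeros");
static_assert(ctz_sketch(0) == static_cast<int>(sizeof(int) * CHAR_BIT),
              "fixed semantics: full bit width for zero");
static_assert(ctz_sketch<int, false>(0) == -1, "unfixed semantics: -1 for zero");
```

Passing `false` only makes sense when the caller can guarantee a non-zero argument or is prepared to handle -1.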

§ find_first_set()

template<typename I >
KAT_FD int kat::non_builtins::find_first_set (I x)

Determine the 1-based index of the first non-zero bit in the argument.

Parameters
x: the value to be considered as a container of bits
Returns
If x is 0, returns 0; otherwise, returns the 1-based index of the first non-zero bit in x
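
The 1-based convention is easy to get wrong, so a small host-side sketch of the documented return values may help. It assumes "first" means least significant, as with CUDA's `__ffs()` intrinsic; the name `ffs_sketch` is hypothetical and the code is not the library's implementation. It needs C++14 or later.

```cpp
#include <type_traits>

// Hypothetical reference for illustration only; not part of cuda-kat.
// Returns 0 for x == 0, otherwise the 1-based position of the lowest set bit.
template <typename I>
constexpr int ffs_sketch(I x)
{
    using U = typename std::make_unsigned<I>::type;
    U bits = static_cast<U>(x);
    if (bits == 0) { return 0; }
    int position = 1; // positions are 1-based, so bit 0 is reported as 1
    while ((bits & U{1}) == 0) {
        bits >>= 1;
        ++position;
    }
    return position;
}

static_assert(ffs_sketch(0) == 0, "no bit set");
static_assert(ffs_sketch(1) == 1, "bit 0 is reported as position 1");
static_assert(ffs_sketch(8) == 4, "bit 3 is reported as position 4");
static_assert(ffs_sketch(-1LL) == 1, "all bits set: the lowest one wins");
```

Because the result is 1-based, a 0-based bit index is obtained by subtracting 1 from any non-zero result.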