cuda-kat
CUDA kernel author's tools
kat::ptx Namespace Reference

Code exposing CUDA's PTX intermediate representation instructions to C++ code.

Namespaces

 special_registers
 Wrappers for instructions obtaining the value of one of the special hardware registers on NVIDIA GPUs.
 

Functions

KAT_FD void trap ()
 Aborts execution (of the entire kernel grid) and generates an interrupt to the host CPU.
 
KAT_FD void exit ()
 Ends execution of the current thread of this kernel/grid.
 
 DEFINE_IS_IN_MEMORY_SPACE (const)
 
 DEFINE_IS_IN_MEMORY_SPACE (global)
 
 DEFINE_IS_IN_MEMORY_SPACE (local)
 
 DEFINE_IS_IN_MEMORY_SPACE (shared)
 
 DEFINE_BFIND (s32)
 
 DEFINE_BFIND (s64)
 
 DEFINE_BFIND (u32)
 
 DEFINE_BFIND (u64)
 
 DEFINE_PRMT_WITH_MODE (forward_4_extract, f4e)
 
 DEFINE_PRMT_WITH_MODE (backward_4_extract, b4e)
 
 DEFINE_PRMT_WITH_MODE (replicate_8, rc8)
 
 DEFINE_PRMT_WITH_MODE (replicate_16, rc16)
 
 DEFINE_PRMT_WITH_MODE (edge_clam_left, ecl)
 
 DEFINE_PRMT_WITH_MODE (edge_clam_right, ecl)
 
KAT_FD uint32_t prmt (uint32_t first, uint32_t second, uint32_t byte_selectors)
 See the relevant section of the CUDA PTX reference for an explanation of exactly what this does.
 
 asm ("prmt.b32 %0, %1, %2, %3;" :"=r"(result) :"r"(first), "r"(second), "r"(byte_selectors))
 
 DEFINE_BFE (s32)
 
 DEFINE_BFE (s64)
 
 DEFINE_BFE (u32)
 
 DEFINE_BFE (u64)
 
KAT_FD uint32_t bfi (uint32_t bits_to_insert, uint32_t existing_bit_field, uint32_t start_position, uint32_t num_bits)
 
 asm ("bfi.b32 %0, %1, %2, %3, %4;" :"=r"(ret) :"r"(bits_to_insert), "r"(existing_bit_field), "r"(start_position), "r"(num_bits))
 
KAT_FD uint64_t bfi (uint64_t bits_to_insert, uint64_t existing_bit_field, uint32_t start_position, uint32_t num_bits)
 
 DEFINE_SAD_ (u16)
 
 DEFINE_SAD_ (u32)
 
 DEFINE_SAD_ (u64)
 
 DEFINE_SAD_ (s16)
 
 DEFINE_SAD_ (s32)
 
 DEFINE_SAD_ (s64)
 
 DEFINE_SHIFT_AND_OP (l, add)
 
 DEFINE_SHIFT_AND_OP (l, min)
 
 DEFINE_SHIFT_AND_OP (l, max)
 
 DEFINE_SHIFT_AND_OP (r, add)
 
 DEFINE_SHIFT_AND_OP (r, min)
 
 DEFINE_SHIFT_AND_OP (r
 

Variables

uint32_t second
 
uint32_t byte_selectors
 
return result
 
uint32_t existing_bit_field
 
uint32_t start_position
 
uint32_t num_bits
 
return ret
 

Detailed Description

Code exposing CUDA's PTX intermediate representation instructions to C++ code.

With CUDA, device-side code is compiled from a C++-like language to an intermediate representation (IR) named PTX. No GPU executes PTX directly, but it is easy to compile PTX further into the machine code of a specific GPU.

Occasionally, a developer wants to use a specific PTX instruction, e.g. to optimize some code. CUDA's headers expose some of these instructions, but not all of them. Also, the exposed functions are not templated on their argument types, whereas PTX instructions are parameterized by type (e.g. bfind comes in s32, s64, u32 and u64 variants). This library fills both gaps.

Function Documentation

§ prmt()

KAT_FD uint32_t kat::ptx::prmt ( uint32_t first,
uint32_t second,
uint32_t byte_selectors
)

See the relevant section of the CUDA PTX reference for an explanation of exactly what this does.

Parameters
first    a first value from which to potentially use bytes
second    a second value from which to potentially use bytes
byte_selectors    a packing of 4 selectors of 4 bits each. In each selector, 3 bits specify which of the input bytes is to be used (there are 8 bytes overall in first and second), and the remaining bit specifies whether that byte is copied as-is, or whether the sign of the byte (interpreted as an int8_t) is replicated to fill the target byte instead.
Returns
the four bytes of first and/or second, or replicated signs thereof, indicated by the byte selectors
Note
Only the lower 16 bits of byte_selectors are used.
"prmt" stands for "permute"
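
The selector encoding described above can be easier to follow as plain host-side code. The following is only a sketch of the default (mode-less) behaviour, not the library's implementation: `prmt_reference` is a hypothetical name that does not exist in cuda-kat, and the sketch assumes the layout given in the PTX reference, namely that bytes 0..3 are taken from first and bytes 4..7 from second, that selector i occupies bits 4i..4i+3 of byte_selectors, and that the top bit of each selector is the sign-replication flag.

```cpp
#include <cstdint>

// Hypothetical host-side model of `prmt.b32` in its default mode.
// Not part of cuda-kat; intended only to illustrate the selector encoding.
inline std::uint32_t prmt_reference(
    std::uint32_t first, std::uint32_t second, std::uint32_t byte_selectors)
{
    // Pool of 8 candidate bytes: 0..3 from `first`, 4..7 from `second`
    // (assumed layout, following the PTX reference).
    const std::uint64_t pool =
        (static_cast<std::uint64_t>(second) << 32) | first;

    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        // Only the lower 16 bits of byte_selectors are consulted.
        const std::uint32_t selector = (byte_selectors >> (4 * i)) & 0xFu;
        const std::uint32_t source_index = selector & 0x7u;   // which of the 8 bytes
        const bool replicate_sign = (selector & 0x8u) != 0;   // copy vs. sign fill

        std::uint32_t byte =
            static_cast<std::uint32_t>((pool >> (8 * source_index)) & 0xFFu);
        if (replicate_sign) {
            // Fill the target byte with copies of the source byte's sign bit.
            byte = (byte & 0x80u) ? 0xFFu : 0x00u;
        }
        result |= byte << (8 * i);
    }
    return result;
}
```

Under these assumptions, a selector value of 0x3210 reproduces first unchanged, 0x7654 reproduces second, and 0x0123 reverses the byte order of first.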

Variable Documentation

§ byte_selectors

uint32_t kat::ptx::byte_selectors
Initial value:
{
uint32_t result

§ num_bits

uint32_t kat::ptx::num_bits
Initial value:
{
uint32_t ret