cuda-kat
CUDA kernel author's tools
video_instructions.cuh File Reference

Non-templated wrappers for PTX "video" instructions, for which NVIDIA provides no wrappers in the CUDA <device_functions.h> header.

#include "detail/define_macros.cuh"
#include <kat/on_device/common.cuh>
#include <type_traits>
#include "detail/undefine_macros.cuh"

Namespaces

 kat::ptx
 Code exposing CUDA's PTX intermediate representation instructions to C++ code.
 

Macros

#define CUDA_KAT_PTX_VIDEO_INSTRUCTIONS_CUH_
 
#define DEFINE_SHIFT_AND_OP(direction, second_op)
 Defines a function that bit-shifts its first operand, then applies a binary operator to the result and a third operand.
 

Functions

 kat::ptx::DEFINE_SHIFT_AND_OP (l, add)
 
 kat::ptx::DEFINE_SHIFT_AND_OP (l, min)
 
 kat::ptx::DEFINE_SHIFT_AND_OP (l, max)
 
 kat::ptx::DEFINE_SHIFT_AND_OP (r, add)
 
 kat::ptx::DEFINE_SHIFT_AND_OP (r, min)
 
 kat::ptx::DEFINE_SHIFT_AND_OP (r, …)
 

Detailed Description

Non-templated wrappers for PTX "video" instructions, for which NVIDIA provides no wrappers in the CUDA <device_functions.h> header.

"Video" instructions are not really about video (although they're probably used for video processing somehow). Essentially, each one performs a main operation on two operands and can then combine the result with a third operand through a secondary operation; additionally, they offer variants with saturation, wraparound, sign-extension and similar bells and whistles.

These instructions (at least, the "scalar" ones) are:

vadd - addition
vsub - subtraction
vabsdiff - absolute difference
vmin - minimum
vmax - maximum
vshl - shift left
vshr - shift right
vmad - multiply-and-add
vset - comparison

For now, we don't wrap most of these instructions, and for the ones we do wrap, we only provide some of the variants.

Macro Definition Documentation

DEFINE_SHIFT_AND_OP

#define DEFINE_SHIFT_AND_OP(direction, second_op)
Value:
KAT_FD uint32_t \
vsh##direction##_##second_op ( \
    uint32_t x, \
    uint32_t shift_amount, \
    uint32_t extra_operand) \
{ \
    uint32_t ret; \
    asm ("vsh" PTX_STRINGIFY(direction) ".u32.u32.u32.clamp." PTX_STRINGIFY(second_op) " %0, %1, %2, %3;" \
        : "=r"(ret) \
        : "r"(x) \
        , "r"(shift_amount) \
        , "r"(extra_operand) \
    ); \
    return ret; \
}

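Defines a function named vsh<direction>_<second_op> which shifts x by shift_amount bits in the given direction (l or r), then applies the binary operator second_op to the shifted value and extra_operand.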
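
To make the "shift, then apply" behaviour concrete, here is a minimal host-side sketch of what the generated functions are expected to compute. It is not part of cuda-kat: every `ref_` name and the `REF_DEFINE_SHIFT_AND_OP` macro are hypothetical and exist only for illustration. The sketch assumes that the PTX `.clamp` shift mode limits the shift amount to 32 (so a shift by 32 or more yields 0), that `add` wraps modulo 2^32, and that `min` and `max` compare their operands as unsigned 32-bit values. It instantiates every shift/operator combination for illustration, regardless of which ones the header actually defines.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical host-side model; none of these names exist in cuda-kat.
// Assumption: ".clamp" limits the shift amount to 32, so shifting by 32 or
// more produces 0. A C++ shift by >= 32 bits is undefined, hence the check.
inline std::uint32_t ref_shift_l(std::uint32_t x, std::uint32_t amount)
{
    return amount >= 32 ? 0u : x << amount;
}

inline std::uint32_t ref_shift_r(std::uint32_t x, std::uint32_t amount)
{
    return amount >= 32 ? 0u : x >> amount;
}

// Secondary operations on unsigned 32-bit values; addition wraps modulo 2^32.
inline std::uint32_t ref_op_add(std::uint32_t a, std::uint32_t b) { return a + b; }
inline std::uint32_t ref_op_min(std::uint32_t a, std::uint32_t b) { return std::min(a, b); }
inline std::uint32_t ref_op_max(std::uint32_t a, std::uint32_t b) { return std::max(a, b); }

// Uses the same token-pasting scheme as DEFINE_SHIFT_AND_OP, with a "ref_"
// prefix: (l, add) produces ref_vshl_add, (r, min) produces ref_vshr_min, etc.
#define REF_DEFINE_SHIFT_AND_OP(direction, second_op) \
inline std::uint32_t ref_vsh##direction##_##second_op( \
    std::uint32_t x, std::uint32_t shift_amount, std::uint32_t extra_operand) \
{ \
    return ref_op_##second_op(ref_shift_##direction(x, shift_amount), extra_operand); \
}

REF_DEFINE_SHIFT_AND_OP(l, add)
REF_DEFINE_SHIFT_AND_OP(l, min)
REF_DEFINE_SHIFT_AND_OP(l, max)
REF_DEFINE_SHIFT_AND_OP(r, add)
REF_DEFINE_SHIFT_AND_OP(r, min)
REF_DEFINE_SHIFT_AND_OP(r, max)
```

Under these assumptions, `ref_vshl_add(1, 4, 3)` computes `(1 << 4) + 3`, and `ref_vshr_max(256, 4, 20)` computes `max(256 >> 4, 20)`. A model like this could serve as a CPU reference when testing the device wrappers, provided the assumptions above are first checked against the PTX ISA documentation.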