cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
cuda::caching< memory_operation_t::load > Struct Template Reference

Load operation caching settings.

#include <common_ptx_compilation_options.hpp>

Public Types

enum  mode {
  ca = 0,
  all = ca,
  cache_all = ca,
  cache_at_all_levels = ca,
  cash_in_l1_and_l2 = ca,
  cg = 1,
  global = cg,
  cache_global = cg,
  cache_at_global_level = cg,
  cache_in_l2_only = cache_at_global_level,
  cs = 2,
  evict_first = cs,
  cache_as_evict_first = cs,
  cache_streaming = cs,
  lu = 3,
  last_use = lu,
  cv = 4,
  dont_cache = cv,
  fetch_again_and_dont_cache = cv
}
 The combination of effects the execution of an instruction will have on the GPU caching mechanisms.
 

Static Public Attributes

static constexpr const char * mode_names [] = { "ca", "cg", "cs", "lu", "cv" }
 

Detailed Description

template<>
struct cuda::caching< memory_operation_t::load >

Load operation caching settings.

Member Enumeration Documentation

mode

enum cuda::caching< memory_operation_t::load >::mode

The combination of effects the execution of an instruction will have on the GPU caching mechanisms.

Enumerator
ca 

Cache at all levels, likely to be accessed again.

The default load instruction cache operation is ld.ca, which allocates cache lines in all levels (L1 and L2) with normal eviction policy. Global data is coherent at the L2 level, but multiple L1 caches are not coherent for global data.

cg 

Cache at global level (cache in L2 and below, not L1).

Use ld.cg to cache loads only at the global level: the load bypasses the L1 cache and is cached only in L2.

cs 

Cache streaming, likely to be accessed once.

The ld.cs load cached streaming operation allocates global lines with evict-first policy in L1 and L2 to limit cache pollution by temporary streaming data that may be accessed once or twice. When ld.cs is applied to a Local window address, it performs the ld.lu operation.

lu 

Last use.

The compiler/programmer may use ld.lu when restoring spilled registers and popping function stack frames to avoid needless write-backs of lines that will not be used again. The ld.lu instruction performs a load cached streaming operation (ld.cs) on global addresses.

cv 

Don't cache; consider cached system memory lines stale and fetch again on every load.

The ld.cv load operation applied to a global System Memory address invalidates (discards) a matching L2 line and re-fetches the line on each new load.


The documentation for this struct was generated from the following file: