cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
cuda::event Namespace Reference

Definitions and functionality related to CUDA events (not including the event wrapper type event_t itself)

Typedefs

using duration_t = ::std::chrono::duration< float, ::std::milli >
 The type used by the CUDA Runtime API to represent the time difference between pairs of events.
 
using id_t = cudaEvent_t
 The CUDA Runtime API's opaque handle for events.
 

Enumerations

enum  : bool {
  sync_by_busy_waiting = false,
  sync_by_blocking = true
}
 Synchronization option for cuda::event_t's.
 
enum  : bool {
  dont_record_timings = false,
  do_record_timings = true
}
 Should the CUDA Runtime API record timing information for events as it schedules them?
 
enum  : bool {
  not_interprocess = false,
  interprocess = true,
  single_process = not_interprocess
}
 IPC usability option for cuda::event_t's.
 

Functions

duration_t time_elapsed_between (const event_t &start, const event_t &end)
 Determine (inaccurately) the elapsed time between two events.
 
event_t create (device_t &device, bool uses_blocking_sync=sync_by_busy_waiting, bool records_timing=do_record_timings, bool interprocess=not_interprocess)
 Creates a new event on a device.
 

Detailed Description

Definitions and functionality related to CUDA events (not including the event wrapper type event_t itself)

Enumeration Type Documentation

◆ anonymous enum

anonymous enum : bool

Synchronization option for cuda::event_t's.

Enumerator
sync_by_busy_waiting 

The thread calling event_t::synchronize() will enter a busy-wait loop; this may minimize the delay between the conclusion of the awaited work (e.g. a kernel's execution) and control returning to the thread, but it is very wasteful of CPU time.

sync_by_blocking 

The thread calling event_t::synchronize() will block: it yields control of the CPU and only becomes ready for execution after the awaited work (e.g. a kernel's execution) has completed, at which point it has to wait its turn among other threads.

This does not waste CPU computing time, but results in a longer delay.

◆ anonymous enum

anonymous enum : bool

IPC usability option for cuda::event_t's.

Enumerator
not_interprocess 

Can only be used by the process which created it.

interprocess 

Can be shared between processes. Such an event must not record timings, i.e. it must be created with dont_record_timings.

Function Documentation

◆ create()

event_t cuda::event::create ( device_t & device,
bool  uses_blocking_sync = sync_by_busy_waiting,
bool  records_timing = do_record_timings,
bool  interprocess = not_interprocess 
)
inline

Creates a new event on a device.

Parameters
device              The device on which to create the new event
uses_blocking_sync  When synchronizing on this new event, shall a thread busy-wait for it, or block?
records_timing      Can this event be used to record time values (e.g. the duration between events)?
interprocess        Can multiple processes work with the constructed event?
Returns
The constructed event proxy
Note
Creating an event
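
To illustrate how the three boolean options of create() relate to one another, here is a minimal sketch of how they could be folded into a single event-creation flags value. It is not the library's implementation: make_event_flags is a hypothetical helper, and the three constants are local copies of the CUDA Runtime API's cudaEventBlockingSync, cudaEventDisableTiming and cudaEventInterprocess values, redefined so the sketch compiles without the CUDA toolkit. Rejecting the interprocess-with-timing combination by throwing is likewise an assumption made for illustration.

```cpp
#include <stdexcept>

// Local mirrors of the CUDA Runtime API's event-creation flag values
// (cudaEventBlockingSync, cudaEventDisableTiming, cudaEventInterprocess).
enum : unsigned {
    flag_blocking_sync  = 0x01,
    flag_disable_timing = 0x02,
    flag_interprocess   = 0x04
};

// Hypothetical helper, not part of the documented API: combines the three
// boolean options accepted by cuda::event::create() into one flags value.
inline unsigned make_event_flags(
    bool uses_blocking_sync,
    bool records_timing,
    bool interprocess)
{
    // An event shared between processes must not record timings.
    if (interprocess && records_timing) {
        throw std::invalid_argument(
            "an interprocess event cannot record timings");
    }
    // Note the inversion: the runtime's flag *disables* timing,
    // while the option documented here *enables* it.
    return (uses_blocking_sync ? flag_blocking_sync  : 0u)
         | (records_timing     ? 0u : flag_disable_timing)
         | (interprocess       ? flag_interprocess   : 0u);
}
```

With the documented defaults (sync_by_busy_waiting, do_record_timings, not_interprocess) the sketch yields a flags value of 0. Because records_timing defaults to do_record_timings, a caller asking for an interprocess event would also have to pass dont_record_timings explicitly.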

◆ time_elapsed_between()

duration_t cuda::event::time_elapsed_between ( const event_t & start,
const event_t & end 
)
inline

Determine (inaccurately) the elapsed time between two events.

Note
The output type is a floating-point number of milliseconds because that is what the CUDA Runtime API itself returns.
Parameters
start  first timepoint event
end    second, later, timepoint event
Returns
The difference between the (inaccurately) measured times of the two events, in milliseconds
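
Since duration_t is an ordinary ::std::chrono::duration with a float representation and a millisecond period, a value returned by time_elapsed_between() can be converted with the standard chrono facilities. The sketch below redefines duration_t locally so that it compiles without the library; to_whole_microseconds and to_seconds are hypothetical helper names introduced only for this illustration.

```cpp
#include <chrono>

// Same definition as cuda::event::duration_t, repeated here so the
// sketch is self-contained.
using duration_t = std::chrono::duration<float, std::milli>;

// Hypothetical helper: converts to a whole number of microseconds.
// Going from a floating-point to an integral representation loses
// precision, so an explicit duration_cast is required.
inline long long to_whole_microseconds(duration_t d)
{
    return std::chrono::duration_cast<std::chrono::microseconds>(d).count();
}

// Hypothetical helper: converts to fractional seconds. Conversions
// between floating-point durations are implicit.
inline double to_seconds(duration_t d)
{
    return std::chrono::duration<double>(d).count();
}
```

For example, a measured duration_t holding 2.5 would convert to 2500 whole microseconds, and one holding 1500 would convert to 1.5 seconds.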