Thin C++-flavored wrappers for the CUDA Runtime API
cuda::event Namespace Reference

Definitions and functionality related to CUDA events (not including the event wrapper type event_t itself).


Typedefs

using duration_t = ::std::chrono::duration< float, ::std::milli >
 The type used by the CUDA Runtime API to represent the time difference between pairs of events.
using handle_t = CUevent
 The CUDA Runtime API's numeric handle for events.


Enumerations

enum : bool {
  sync_by_busy_waiting = false,
  sync_by_blocking = true
}
 Synchronization option for cuda::event_t's.
enum : bool {
  dont_record_timings = false,
  do_record_timings = true
}
 Should the CUDA Runtime API record timing information for events as it schedules them?
enum : bool {
  not_interprocess = false,
  interprocess = true,
  single_process = not_interprocess
}
 IPC usability option for cuda::event_t's.


Functions

event_t wrap (device::id_t device_id, context::handle_t context_handle, handle_t event_handle, bool take_ownership=false) noexcept
 Wrap an existing CUDA event in an event_t instance.
::std::string identify (const event_t &event)
duration_t time_elapsed_between (const event_t &start, const event_t &end)
 Determine (inaccurately) the elapsed time between two events.
event_t create (device_t &device, bool uses_blocking_sync=sync_by_busy_waiting, bool records_timing=do_record_timings, bool interprocess=not_interprocess)
 Creates a new event on a device.
event_t create (const context_t &context, bool uses_blocking_sync, bool records_timing, bool interprocess)

Detailed Description

Definitions and functionality related to CUDA events (not including the event wrapper type event_t itself)

Enumeration Type Documentation

◆ anonymous enum

anonymous enum : bool

Synchronization option for cuda::event_t's.

Enumerator
    sync_by_busy_waiting    The thread calling event_.synchronize() will enter a busy-wait loop; this may minimize the delay between the kernel finishing its execution and control returning to the thread, but it is very wasteful of CPU time.
    sync_by_blocking        The thread calling event_.synchronize() will block: it yields control of the CPU and becomes ready for execution only after the kernel has completed, at which point it has to wait its turn among other threads. This does not waste CPU time, but results in a longer delay.

◆ anonymous enum

anonymous enum : bool

IPC usability option for cuda::event_t's.

Enumerator
    not_interprocess    The event can only be used by the process which created it.
    interprocess        The event can be shared between processes. Such an event must not be able to record timings.
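
The option enums above are unscoped enums with a bool underlying type, so their enumerators convert implicitly to the bool parameters of functions such as create(). The following is a minimal, CUDA-free sketch of that mechanism. The enumerator names are copied from this page; is_valid_combination() and check_options() are hypothetical helpers introduced only for illustration and are not part of the wrappers.

```cpp
#include <cassert>

// Local copies of the three anonymous enums documented on this page,
// reproduced so that the sketch compiles without CUDA or the wrappers.
enum : bool { sync_by_busy_waiting = false, sync_by_blocking = true };
enum : bool { dont_record_timings = false, do_record_timings = true };
enum : bool {
    not_interprocess = false,
    interprocess     = true,
    single_process   = not_interprocess
};

// Hypothetical helper: encodes the constraint stated above, that an
// interprocess event must not be able to record timings.
constexpr bool is_valid_combination(bool records_timing, bool is_interprocess)
{
    return !(records_timing && is_interprocess);
}

// Hypothetical stand-in with the same defaults as create(); it only
// checks the option combination and creates nothing.
inline void check_options(
    bool uses_blocking_sync = sync_by_busy_waiting,
    bool records_timing     = do_record_timings,
    bool is_interprocess    = not_interprocess)
{
    (void) uses_blocking_sync; // the synchronization mode has no constraint here
    assert(is_valid_combination(records_timing, is_interprocess));
}
```

With these definitions, check_options(sync_by_blocking, dont_record_timings, interprocess) passes, while combining do_record_timings with interprocess would trip the assertion. Named enumerators at the call site read more clearly than bare true/false values.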

Function Documentation

◆ create()

event_t cuda::event::create ( device_t & device,
    bool uses_blocking_sync = sync_by_busy_waiting,
    bool records_timing = do_record_timings,
    bool interprocess = not_interprocess
)

Creates a new event on a device.

Parameters
    device                The device on which to create the new event
    uses_blocking_sync    When synchronizing on this new event, shall a thread busy-wait for it, or block?
    records_timing        Can this event be used to record time values (e.g. duration between events)?
    interprocess          Can multiple processes work with the constructed event?

Returns
    The constructed event proxy
Creating an event

◆ time_elapsed_between()

duration_t cuda::event::time_elapsed_between ( const event_t & start,
    const event_t & end
)

Determine (inaccurately) the elapsed time between two events.

Note
    Q: Why the weird output type? A: This is what the CUDA Runtime API itself returns.

Parameters
    start    The first timepoint event
    end      The second, later, timepoint event

Returns
    The difference in the (inaccurately) measured time, in milliseconds
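
Because duration_t is an ordinary ::std::chrono::duration with a float representation and a millisecond period, the result can be converted with the standard chrono facilities. The sketch below assumes only the alias shown on this page and does not call the CUDA API; to_seconds() and to_whole_microseconds() are hypothetical helper names, not functions of the wrappers.

```cpp
#include <cassert>
#include <chrono>

// Same definition as cuda::event::duration_t, repeated here so that the
// sketch compiles without the wrappers.
using duration_t = std::chrono::duration<float, std::milli>;

// Hypothetical helper: re-express a float-millisecond duration in seconds.
// Conversions to a floating-point duration are implicit and lossless in unit.
inline float to_seconds(duration_t d)
{
    return std::chrono::duration<float>(d).count();
}

// Hypothetical helper: truncate to whole microseconds, e.g. for integer logs.
// Going to an integer representation requires an explicit duration_cast.
inline long long to_whole_microseconds(duration_t d)
{
    return std::chrono::duration_cast<std::chrono::microseconds>(d).count();
}
```

For example, a duration_t of 1500 milliseconds yields 1.5 from to_seconds(), and a duration_t of 2.5 milliseconds yields 2500 from to_whole_microseconds().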

◆ wrap()

event_t cuda::event::wrap ( device::id_t device_id,
    context::handle_t context_handle,
    handle_t event_handle,
    bool take_ownership = false
)

Wrap an existing CUDA event in an event_t instance.

This is a named constructor idiom, provided instead of direct access to the constructor with the same signature, to emphasize that a new event is not created.

Parameters
    context_handle    Handle of the context in which this event was created
    event_handle      Handle of the pre-existing event
    take_ownership    When set to false, the CUDA event will not be destroyed along with the proxy; use this setting when temporarily working with an event that exists irrespective of the current context and outlasts it. When set to true, the proxy will act as it usually does, destroying the event when it is itself destructed.

Returns
    An event wrapper associated with the specified event
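
The ownership semantics of take_ownership can be illustrated with a small mock that needs no CUDA. This is only a sketch of the named-constructor idea described above, not the wrappers' implementation: fake_handle_t, fake_destroy(), destroy_count and event_proxy are all hypothetical stand-ins, and the wrap() shown here takes fewer parameters than the real one.

```cpp
#include <cassert>

// Hypothetical stand-ins: a fake handle type and a fake "destroy" call that
// only counts how often it is invoked. Neither belongs to CUDA or the wrappers.
using fake_handle_t = int;
int destroy_count = 0;
void fake_destroy(fake_handle_t) { ++destroy_count; }

class event_proxy;
event_proxy wrap(fake_handle_t handle, bool take_ownership = false) noexcept;

class event_proxy {
public:
    // Moving transfers ownership, so the resource is released at most once.
    event_proxy(event_proxy&& other) noexcept
        : handle_(other.handle_), owning_(other.owning_)
    {
        other.owning_ = false;
    }
    event_proxy(const event_proxy&) = delete;
    event_proxy& operator=(const event_proxy&) = delete;

    // Only an owning proxy releases the underlying resource.
    ~event_proxy() { if (owning_) { fake_destroy(handle_); } }

    fake_handle_t handle() const noexcept { return handle_; }
    bool is_owning() const noexcept { return owning_; }

private:
    // The constructor is private; wrap() is the only way in.
    event_proxy(fake_handle_t handle, bool take_ownership) noexcept
        : handle_(handle), owning_(take_ownership) {}
    friend event_proxy wrap(fake_handle_t, bool) noexcept;

    fake_handle_t handle_;
    bool owning_;
};

// Named constructor: the name signals that no new resource is created.
event_proxy wrap(fake_handle_t handle, bool take_ownership) noexcept
{
    return event_proxy(handle, take_ownership);
}
```

In this mock, a proxy obtained from wrap(7) leaves destroy_count unchanged when it goes out of scope, whereas one obtained from wrap(7, true) increments it exactly once.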