cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
cuda::event Namespace Reference

CUDA timing functionality, via events and their related code (not including the event wrapper type event_t itself)

Namespaces

 ipc
 Definitions and functionality related to CUDA events (not including the event wrapper type event_t itself)
 

Typedefs

using duration_t = ::std::chrono::duration< float, ::std::milli >
 The type used by the CUDA Runtime API to represent the time difference between pairs of events.
 
using handle_t = CUevent
 The CUDA driver's raw handle for events.
 

Enumerations

enum  : bool {
  sync_by_busy_waiting = false,
  sync_by_blocking = true
}
 Synchronization option for cuda::event_t's.
 
enum  : bool {
  dont_record_timings = false,
  do_record_timings = true
}
 Should the CUDA Runtime API record timing information for events as it schedules them?
 
enum  : bool {
  not_interprocess = false,
  interprocess = true,
  single_process = not_interprocess
}
 IPC usability option for cuda::event_t's.
 

Functions

event_t wrap (device::id_t device_id, context::handle_t context_handle, handle_t event_handle, bool take_ownership=false, bool hold_pc_refcount_unit=false) noexcept
 Wrap an existing CUDA event in an event_t instance.
 
::std::string identify (const event_t &event)
 
duration_t time_elapsed_between (const event_t &start, const event_t &end)
 Determine (inaccurately) the elapsed time between two events.
 
duration_t time_elapsed_between (const ::std::pair< const event_t &, const event_t &> &event_pair)
 
event_t create (const device_t &device, bool uses_blocking_sync=sync_by_busy_waiting, bool records_timing=do_record_timings, bool interprocess=not_interprocess)
 Creates a new event on (the primary execution context of) a device.
 
event_t create (const context_t &context, bool uses_blocking_sync=sync_by_busy_waiting, bool records_timing=do_record_timings, bool interprocess=not_interprocess)
 Creates a new event.
 

Detailed Description

CUDA timing functionality, via events and their related code (not including the event wrapper type event_t itself)

Enumeration Type Documentation

◆ anonymous enum

anonymous enum : bool

Synchronization option for cuda::event_t's.

Enumerator
sync_by_busy_waiting 

The thread calling event_.synchronize() will enter a busy-wait loop. This may minimize the delay between the conclusion of kernel execution and control returning to the thread, but it is very wasteful of CPU time.

sync_by_blocking 

The thread calling event_.synchronize() will block: it yields control of the CPU and becomes ready for execution only after the kernel has completed its execution, at which point it has to wait its turn among other threads.

This does not waste CPU computing time, but results in a longer delay.

◆ anonymous enum

anonymous enum : bool

IPC usability option for cuda::event_t's.

Enumerator
not_interprocess 

Can only be used by the process which created it.

interprocess 

Can be shared between processes. An interprocess event must not be able to record timings.
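
The constraint between the last two options can be illustrated with a small, self-contained sketch. The enumerators below are local stand-ins that mirror the ones listed on this page, and is_valid_combination() is a hypothetical helper introduced only for illustration; it is not part of the library.

```cpp
#include <cassert>

namespace sketch {

// Local stand-ins mirroring the enumerators documented above; in real code
// these names live in the cuda::event namespace.
enum : bool { dont_record_timings = false, do_record_timings = true };
enum : bool {
    not_interprocess = false,
    interprocess     = true,
    single_process   = not_interprocess // a second name for the same value
};

// Hypothetical helper (not a library function): an event which is shared
// between processes must not be able to record timings.
constexpr bool is_valid_combination(bool records_timing, bool is_interprocess)
{
    return !(is_interprocess && records_timing);
}

// The enumerators convert implicitly to bool, so they can be passed
// directly wherever a bool option is expected.
inline void example()
{
    assert(is_valid_combination(do_record_timings, single_process));
    assert(is_valid_combination(dont_record_timings, interprocess));
    assert(!is_valid_combination(do_record_timings, interprocess));
}

} // namespace sketch
```

Because the enumerations have bool as their underlying type, the named values read better at call sites than bare true or false while remaining interchangeable with them.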

Function Documentation

◆ create() [1/2]

event_t cuda::event::create ( const device_t & device,
bool  uses_blocking_sync = sync_by_busy_waiting,
bool  records_timing = do_record_timings,
bool  interprocess = not_interprocess 
)
inline

Creates a new event on (the primary execution context of) a device.

Parameters
device              The device on which to create the new event
uses_blocking_sync  When synchronizing on this new event, shall a thread busy-wait for it, or block?
records_timing      Can this event be used to record time values (e.g. duration between events)?
interprocess        Can multiple processes work with the constructed event?
Returns
The constructed event proxy
Note
The created event will keep the device's primary context active while it exists.

◆ create() [2/2]

event_t cuda::event::create ( const context_t & context,
bool  uses_blocking_sync = sync_by_busy_waiting,
bool  records_timing = do_record_timings,
bool  interprocess = not_interprocess 
)
inline

Creates a new event.

Parameters
context             The CUDA execution context in which to create the event
uses_blocking_sync  When synchronizing on this new event, shall a thread busy-wait for it, or block?
records_timing      Can this event be used to record time values (e.g. duration between events)?
interprocess        Can multiple processes work with the constructed event?
Returns
The constructed event proxy
Note
Even if the context happens to be primary, the created event will not keep this context alive.
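
As a rough illustration of what the three boolean options of create() amount to, the sketch below folds them into a single flags word. This is an assumption about how such options could be combined, not the library's actual code: make_flags() is a hypothetical helper, and the bit constants are local copies whose values mirror CU_EVENT_BLOCKING_SYNC, CU_EVENT_DISABLE_TIMING and CU_EVENT_INTERPROCESS in the CUDA headers.

```cpp
#include <cassert>

namespace sketch {

// Local copies of the event-creation flag bits, so that the example
// compiles without the CUDA headers.
constexpr unsigned blocking_sync_bit  = 0x1; // mirrors CU_EVENT_BLOCKING_SYNC
constexpr unsigned disable_timing_bit = 0x2; // mirrors CU_EVENT_DISABLE_TIMING
constexpr unsigned interprocess_bit   = 0x4; // mirrors CU_EVENT_INTERPROCESS

// Hypothetical helper: combine the create() options into one flags word.
// Note the inversion for timing: the CUDA flag *disables* timing, while
// the option on this page *enables* it.
constexpr unsigned make_flags(bool uses_blocking_sync, bool records_timing, bool interprocess)
{
    return (uses_blocking_sync ? blocking_sync_bit : 0u)
         | (records_timing ? 0u : disable_timing_bit)
         | (interprocess ? interprocess_bit : 0u);
}

inline void example()
{
    // The defaults shown above (busy-wait, record timings, single process)
    // correspond to no flag bits being set.
    assert(make_flags(false, true, false) == 0u);
    // A blocking, interprocess event must also have timing disabled.
    assert(make_flags(true, false, true) == 0x7u);
}

} // namespace sketch
```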

◆ time_elapsed_between()

duration_t cuda::event::time_elapsed_between ( const event_t & start,
const event_t & end 
)
inline

Determine (inaccurately) the elapsed time between two events.

Note
The unusual return type (a single-precision count of milliseconds) is what the CUDA Runtime API itself returns.
Parameters
start  first timepoint event
end    second, later, timepoint event
Returns
the difference in the (inaccurately) measured time, in milliseconds
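
Since duration_t is an ordinary ::std::chrono::duration, the returned value can be converted with the standard facilities. The following sketch uses only the standard library; the alias repeats the definition given in the Typedefs section, and the helper names (to_microseconds, to_seconds, throughput) are hypothetical and introduced for illustration.

```cpp
#include <cassert>
#include <chrono>

namespace sketch {

// Same definition as cuda::event::duration_t: float-valued milliseconds.
using duration_t = std::chrono::duration<float, std::milli>;

// Whole microseconds (truncating toward zero), e.g. for logging.
inline std::chrono::microseconds to_microseconds(duration_t d)
{
    return std::chrono::duration_cast<std::chrono::microseconds>(d);
}

// Floating-point seconds; conversions between floating-point durations
// are implicit, so no duration_cast is needed here.
inline double to_seconds(duration_t d)
{
    return std::chrono::duration<double>(d).count();
}

// Bytes per second for an operation which took `elapsed`.
inline double throughput(double bytes, duration_t elapsed)
{
    assert(elapsed.count() > 0.0f);
    return bytes / to_seconds(elapsed);
}

} // namespace sketch
```

With the real library, the argument would typically be the result of time_elapsed_between(start, end) rather than a hand-constructed duration.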

◆ wrap()

event_t cuda::event::wrap ( device::id_t  device_id,
context::handle_t  context_handle,
handle_t  event_handle,
bool  take_ownership = false,
bool  hold_pc_refcount_unit = false 
)
inline noexcept

Wrap an existing CUDA event in an event_t instance.

Note
This is a named constructor idiom, existing instead of direct access to the constructor of the same signature, to emphasize that a new event is not created.
Parameters
device_id              Index of the device to which the event relates
context_handle         Handle of the context in which this event was created
event_handle           Handle of the pre-existing event
take_ownership         When set to false, the CUDA event will not be destroyed along with the proxy; use this setting when temporarily working with an event that exists irrespective of the current context and outlasts it. When set to true, the proxy class will act as it usually does, destroying the event when it is itself destructed.
hold_pc_refcount_unit  When the event's context is a device's primary context, this controls whether that context must be kept active while the event continues to exist.
Returns
an event wrapper associated with the specified event
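
The effect of take_ownership can be sketched with a minimal stand-in for the proxy class. Everything below is hypothetical and only models the ownership rule described above: event_proxy is not the library's event_t, handle_t is a plain int rather than a CUevent, and destroy() merely counts calls instead of invoking the driver.

```cpp
#include <cassert>

namespace sketch {

using handle_t = int; // stand-in for the driver's raw event handle

// Counts how many times the stand-in "destroy" operation has run.
inline int& destroyed_count() { static int count = 0; return count; }
inline void destroy(handle_t) { ++destroyed_count(); }

class event_proxy {
public:
    // Named constructor: adopts a pre-existing handle, creating nothing new.
    static event_proxy wrap(handle_t handle, bool take_ownership = false) noexcept
    {
        return event_proxy(handle, take_ownership);
    }

    event_proxy(event_proxy&& other) noexcept
        : handle_(other.handle_), owning_(other.owning_)
    {
        other.owning_ = false; // the moved-from proxy must not destroy the event
    }
    event_proxy(const event_proxy&) = delete;
    event_proxy& operator=(const event_proxy&) = delete;

    // Only an owning proxy destroys the underlying event.
    ~event_proxy() { if (owning_) { destroy(handle_); } }

    handle_t handle() const noexcept { return handle_; }

private:
    event_proxy(handle_t handle, bool owning) noexcept : handle_(handle), owning_(owning) {}
    handle_t handle_;
    bool owning_;
};

// Returns the number of destroy calls caused by one proxy's lifetime.
inline int destroys_caused_by(bool take_ownership)
{
    const int before = destroyed_count();
    {
        event_proxy proxy = event_proxy::wrap(42, take_ownership);
        assert(proxy.handle() == 42);
    }
    return destroyed_count() - before;
}

} // namespace sketch
```

In this model, a non-owning proxy leaves the event intact when it goes out of scope, which matches the use case of temporarily working with an event managed elsewhere.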