cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
cuda::launch_config_builder_t Class Reference

A convenience class for gradually constructing a launch_configuration_t instance, as per the "builder pattern".

#include <launch_config_builder.hpp>

Public Member Functions

launch_configuration_t build () const
 Use the information specified to the builder (and defaults for the unspecified information) to finalize the construction of a kernel launch configuration, which can then be passed along with the kernel to a kernel-launching function, e.g. the standalone kernel::launch or the stream command stream_t::enqueue_t::kernel_launch.
 
launch_config_builder_t & dimensions (grid::composite_dimensions_t composite_dims)
 
launch_config_builder_t & block_dimensions (grid::block_dimensions_t dims)
 
launch_config_builder_t & block_dimensions (grid::block_dimension_t x, grid::block_dimension_t y=1, grid::block_dimension_t z=1)
 Set the dimensions for each block in the intended kernel launch grid.
 
launch_config_builder_t & block_size (size_t size)
 Set the block in the intended kernel launch grid to be uni-dimensional with a specified size.
 
launch_config_builder_t & use_maximum_linear_block ()
 Set the intended kernel launch grid to have 1D blocks, of the maximum length possible given the information specified to the builder.
 
launch_config_builder_t & grid_dimensions (grid::dimension_t x, grid::dimension_t y=1, grid::dimension_t z=1)
 
launch_config_builder_t & overall_size (size_t size)
 Set the intended launch grid to be linear, with a specified overall number of threads over all (1D) blocks in the grid.
 
launch_config_builder_t & block_cooperation (bool cooperation)
 Set whether or not blocks may synchronize with each other.
 
launch_config_builder_t & blocks_may_cooperate ()
 Allow kernel thread blocks to synchronize with each other, rather than guaranteeing that they act independently (atomic global memory operations notwithstanding).
 
launch_config_builder_t & blocks_dont_cooperate ()
 Prevent kernel thread blocks from synchronizing with each other, guaranteeing that each block will work entirely independently (atomic global memory operations notwithstanding).
 
launch_config_builder_t & dynamic_shared_memory_size (kernel::shared_memory_size_determiner_t shared_mem_size_determiner)
 
launch_config_builder_t & no_dynamic_shared_memory ()
 Indicate that the intended launch should not allocate any shared memory for the kernel to use beyond the static amount necessitated by its (compiled) code.
 
launch_config_builder_t & dynamic_shared_memory (kernel::shared_memory_size_determiner_t shared_mem_size_determiner)
 Indicate that the intended launch should allocate additional shared memory for the kernel to use beyond the static amount necessitated by its (compiled) code - with the amount to be determined based on the block size.
 
launch_config_builder_t & kernel (const kernel_t *wrapped_kernel_ptr)
 Indicate that the specified wrapped kernel will be the one launched with the configuration to be produced by this object.
 
launch_config_builder_t & saturate_with_active_blocks ()
 This will use information about the kernel, the already-set block size, and the device to create a unidimensional grid of blocks to exactly saturate the CUDA device's capacity for simultaneous active blocks.
 
launch_config_builder_t & min_params_for_max_occupancy ()
 This will use information about the kernel and the device to define a minimum launch grid which should guarantee maximum occupancy of the GPU's multiprocessors.
 
launch_config_builder_t & grid_dimensions (grid::dimensions_t dims)
 Set the dimensions of the grid for the intended kernel launch, in terms of blocks.
 
launch_config_builder_t & grid_size (size_t size)
 Set the grid for the intended launch to be one-dimensional, with a specified number of blocks.
 
launch_config_builder_t & num_blocks (size_t size)
 
launch_config_builder_t & overall_dimensions (grid::overall_dimensions_t dims)
 Set the overall number of threads, in each dimension, of all blocks in the grid of the intended kernel launch.
 
launch_config_builder_t & overall_dimensions (grid::overall_dimension_t x, grid::overall_dimension_t y=1, grid::overall_dimension_t z=1)
 
launch_config_builder_t & dynamic_shared_memory_size (memory::shared::size_t size)
 Indicate that the intended launch should allocate a certain amount of shared memory for the kernel to use beyond the static amount necessitated by its (compiled) code.
 
launch_config_builder_t & dynamic_shared_memory (memory::shared::size_t size)
 
launch_config_builder_t & device (const device::id_t device_id)
 Indicate that the intended kernel launch would occur on (some stream in some context on) the specified device.
 
launch_config_builder_t & device (const device_t &device)
 
launch_config_builder_t & kernel_independent ()
 Clear the association with a specific kernel (which may have been set using the kernel method).
 
launch_config_builder_t & no_kernel ()
 

Detailed Description

A convenience class for gradually constructing a launch_configuration_t instance, as per the "builder pattern".

Note
With a constructed builder, repeatedly invoke member functions to add settings. Each call returns the same builder, so you can combine all settings into a single expression, then finally invoke launch_config_builder_t::build to finalize the build and obtain the launch_configuration_t object.
This class performs some validation of the settings you make, but do not assume it guarantees validity. The validations may be either eager (when making a setting) or lazy (when finally building the launch configuration).
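
The chaining and the eager/lazy validation described in the note can be illustrated with a minimal, self-contained sketch. The types `toy_launch_config` and `toy_launch_config_builder` below are hypothetical stand-ins written for this illustration; they are not the library's launch_configuration_t or launch_config_builder_t, and the rounding-up of the block count is an assumption of the sketch rather than documented library behavior.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for a launch configuration (not the library's type).
struct toy_launch_config {
    std::size_t grid_size;                    // number of blocks in a 1D grid
    std::size_t block_size;                   // threads per (1D) block
    std::size_t dynamic_shared_memory_bytes;  // extra shared memory per block
};

// Hypothetical stand-in for a builder (not the library's class).
class toy_launch_config_builder {
public:
    // Every setter returns *this, which is what makes chaining possible.
    toy_launch_config_builder& block_size(std::size_t size) {
        // Eager validation: reject an obviously invalid setting right away.
        if (size == 0) { throw std::invalid_argument("block size must be positive"); }
        block_size_ = size;
        return *this;
    }

    toy_launch_config_builder& overall_size(std::size_t size) {
        overall_size_ = size;
        return *this;
    }

    toy_launch_config_builder& dynamic_shared_memory_size(std::size_t bytes) {
        shared_mem_bytes_ = bytes;
        return *this;
    }

    // Lazy validation: completeness is only checked when building.
    toy_launch_config build() const {
        if (block_size_ == 0 || overall_size_ == 0) {
            throw std::logic_error("block size and overall size must both be set");
        }
        // Assumption of this sketch: round up so all threads are covered.
        std::size_t num_blocks = (overall_size_ + block_size_ - 1) / block_size_;
        return { num_blocks, block_size_, shared_mem_bytes_ };
    }

private:
    std::size_t block_size_ = 0;       // 0 means "not set yet"
    std::size_t overall_size_ = 0;     // 0 means "not set yet"
    std::size_t shared_mem_bytes_ = 0; // defaults to no dynamic shared memory
};
```

With this toy class, an expression such as `toy_launch_config_builder().overall_size(1000).block_size(256).build()` yields a configuration with 4 blocks of 256 threads each and no dynamic shared memory.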

Member Function Documentation

◆ block_cooperation()

launch_config_builder_t& cuda::launch_config_builder_t::block_cooperation ( bool  cooperation)
inline

Set whether or not blocks may synchronize with each other.

Note
recall that even "non-cooperative" blocks can still access the same global memory locations, and can use atomic operations on such locations for (slow) synchronization.

◆ build()

launch_configuration_t cuda::launch_config_builder_t::build ( ) const
inline

Use the information specified to the builder (and defaults for the unspecified information) to finalize the construction of a kernel launch configuration, which can then be passed along with the kernel to a kernel-launching function, e.g. the standalone kernel::launch or the stream command stream_t::enqueue_t::kernel_launch.

◆ device()

launch_config_builder_t& cuda::launch_config_builder_t::device ( const device::id_t  device_id)
inline

Indicate that the intended kernel launch would occur on (some stream in some context on) the specified device.

Such an indication provides this object with some information regarding ranges of possible values for certain parameters (e.g. shared memory size, dimensions).

Note
Do not call both this method and kernel(); prefer calling only kernel().

◆ dynamic_shared_memory()

launch_config_builder_t& cuda::launch_config_builder_t::dynamic_shared_memory ( kernel::shared_memory_size_determiner_t  shared_mem_size_determiner)
inline

Indicate that the intended launch should allocate additional shared memory for the kernel to use beyond the static amount necessitated by its (compiled) code - with the amount to be determined based on the block size.

Parameters
shared_mem_size_determiner    a function determining the dynamic shared memory size given the kernel launch block size
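
As an illustration of what such a determiner might look like, the sketch below defines a function mapping a block size to a byte count. The alias `toy_size_determiner_t` and its exact signature (an `int` block size in, a `std::size_t` out) are assumptions made for this example; the actual definition of kernel::shared_memory_size_determiner_t should be checked in the library headers.

```cpp
#include <cstddef>

// Assumed shape of a size determiner: block size in, bytes out.
// This alias is hypothetical and only mirrors the idea described above.
using toy_size_determiner_t = std::size_t (*)(int block_size);

// Example determiner: one float of scratch space per thread in the block.
std::size_t one_float_per_thread(int block_size) {
    return static_cast<std::size_t>(block_size) * sizeof(float);
}

// How a builder might resolve the amount once the block size is known.
// A null determiner is treated here as "no dynamic shared memory".
std::size_t resolve_dynamic_shared_memory(toy_size_determiner_t determiner, int block_size) {
    return determiner == nullptr ? 0 : determiner(block_size);
}
```

Deferring the computation this way lets the shared memory amount follow the block size even if the block size is chosen later, for example by an occupancy-based setting.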

◆ kernel()

launch_config_builder_t& cuda::launch_config_builder_t::kernel ( const kernel_t *  wrapped_kernel_ptr)
inline

Indicate that the specified wrapped kernel will be the one launched with the configuration to be produced by this object.

Such an indication provides this object with information about the device and context in which the kernel is to be launched, and ranges of possible values for certain parameters (e.g. shared memory size, dimensions).

Note
Calling this method obviates a call to the device() method.

◆ min_params_for_max_occupancy()

launch_config_builder_t& cuda::launch_config_builder_t::min_params_for_max_occupancy ( )
inline

This will use information about the kernel and the device to define a minimum launch grid which should guarantee maximum occupancy of the GPU's multiprocessors.

Note
After this call, the builder will also have the block dimensions set, unlike with saturate_with_active_blocks().

◆ no_dynamic_shared_memory()

launch_config_builder_t& cuda::launch_config_builder_t::no_dynamic_shared_memory ( )
inline

Indicate that the intended launch should not allocate any shared memory for the kernel to use beyond the static amount necessitated by its (compiled) code.

◆ saturate_with_active_blocks()

launch_config_builder_t& cuda::launch_config_builder_t::saturate_with_active_blocks ( )
inline

This will use information about the kernel, the already-set block size, and the device to create a unidimensional grid of blocks to exactly saturate the CUDA device's capacity for simultaneous active blocks.

Note
This will not set the block size, unlike min_params_for_max_occupancy().
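
The arithmetic behind a "saturating" grid can be sketched as follows. This is a plain C++ illustration under stated assumptions: `toy_device_info` and `saturating_grid_size` are hypothetical names, and whether the library computes the grid in exactly this way is not confirmed by this page. In CUDA, the per-multiprocessor figure would typically come from an occupancy query such as cudaOccupancyMaxActiveBlocksPerMultiprocessor, which depends on the kernel, the block size, and the dynamic shared memory amount.

```cpp
#include <cstddef>

// Hypothetical summary of the device property relevant to this sketch.
struct toy_device_info {
    std::size_t num_multiprocessors;
};

// Number of blocks that would keep every multiprocessor filled with as many
// simultaneously active blocks as it can hold, for an already-chosen block size.
std::size_t saturating_grid_size(
    toy_device_info device,
    std::size_t max_active_blocks_per_multiprocessor)
{
    return device.num_multiprocessors * max_active_blocks_per_multiprocessor;
}
```

For instance, a device with 80 multiprocessors, each able to host 4 active blocks of the chosen size, would give a grid of 320 blocks in this sketch.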

◆ use_maximum_linear_block()

launch_config_builder_t& cuda::launch_config_builder_t::use_maximum_linear_block ( )
inline

Set the intended kernel launch grid to have 1D blocks, of the maximum length possible given the information specified to the builder.

Note
This will fail if neither a kernel nor a device has been chosen for the launch.
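
The reason for this failure mode can be shown with a small sketch: without a kernel or a device, there is no known upper limit from which to derive a maximum block length. The class `toy_block_chooser` and its member names are hypothetical, and both the use of an exception and the choice of the smaller of the two limits are assumptions of this example, not documented library behavior.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>

// Hypothetical builder fragment; 0 is used to mean "limit not known".
class toy_block_chooser {
public:
    toy_block_chooser& device_limit(std::size_t max_threads_per_block) {
        device_limit_ = max_threads_per_block;
        return *this;
    }

    toy_block_chooser& kernel_limit(std::size_t max_threads_per_block) {
        kernel_limit_ = max_threads_per_block;
        return *this;
    }

    // Pick the longest 1D block permitted by whatever limits are known.
    toy_block_chooser& use_maximum_linear_block() {
        if (device_limit_ == 0 && kernel_limit_ == 0) {
            throw std::logic_error("neither a kernel nor a device has been specified");
        }
        if (device_limit_ == 0) {
            block_size_ = kernel_limit_;
        } else if (kernel_limit_ == 0) {
            block_size_ = device_limit_;
        } else {
            // Both limits known: the stricter one wins.
            block_size_ = std::min(device_limit_, kernel_limit_);
        }
        return *this;
    }

    std::size_t block_size() const { return block_size_; }

private:
    std::size_t device_limit_ = 0;
    std::size_t kernel_limit_ = 0;
    std::size_t block_size_ = 0;
};
```

In this sketch, setting only a device limit of 1024 gives a block of 1024 threads, while adding a kernel limit of 512 reduces it to 512.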

The documentation for this class was generated from the following file: