- File block.cuh
- Some of these assume linear grids, others do not - sort them out
- Member kat::builtins::bit_field::extract_bits (I bit_field, unsigned int start_pos, unsigned int num_bits)=delete
- CUB 1.5.2's BFE wrapper seems kind of fishy. Why does Duane Merrill not use PTX for extraction from 64-bit fields? For now, only adopting his implementation for the 32-bit case.
- Class kat::collaborative::detail::elements_per_lane_in_full_warp_write< T >
- Can't we assume that T is a POD type, and just have lanes not write complete T's?
- Member kat::collaborative::warp::active_lanes_atomically_increment (T *counter)
- extend this to other atomic operations
- Member kat::collaborative::warp::elementwise_accumulate_n (AccumulatingOperation op, D *__restrict__ destination, RandomAccessIterator __restrict__ source, Size length)
consider taking a GSL-span-like parameter instead of a ptr+length
Some inclusions in the block-primitives might only be relevant to the functions here; double-check.
consider using elementwise_apply for this.
- Member kat::collaborative::warp::reduce (T value, AccumulationOp op)
- offer both an inclusive and an exclusive version
- Class kat::dimensions_t
- consider templating this on the number of dimensions.
- Member kat::lane_mask_t
- Consider using a 32-bit bit field
- Member kat::linear_grid::collaborative::block::elementwise_accumulate_n (AccumulatingOperation op, D *__restrict__ destination, RandomAccessIterator __restrict__ source, Size length)
consider taking a GSL-span-like parameter instead of a ptr+length
Some inclusions in the block-primitives might only be relevant to the functions here; double-check.
consider using elementwise_apply for this.
- Member kat::linear_grid::collaborative::block::scan_and_reduce (T *__restrict__ scratch, T value, AccumulationOp op, T &scan_result, T &reduction_result)
consider returning a pair rather than using non-const references
lots of code duplication with just-scan
add a bool template param allowing the code to assume the block is full (this saves a few ops)
- Member kat::linear_grid::collaborative::warp::multisearch (const T &lane_needle, const T &lane_hay_straw)
Does it matter if the needles, as opposed to the hay straws, are sorted? I wonder.
consider specializing for non-full warps
Specialize for smaller and larger data types: For larger ones, compare 4-byte parts of the datum separately (assuming
- Member kat::reinterpret (Original &x)
- Would it be better to return a reference?
- Member kat::swap (T &a, T &b) noexcept(std::is_nothrow_move_constructible< T >::value &&std::is_nothrow_move_assignable< T >::value)
- How does EASTL swap work? Should I incorporate its specializations?
- File warp.cuh
- Some inclusions in the warp-primitives might only be relevant to the functions here; double-check.
- File warp.cuh
- Some of these assume linear grids, others do not - sort them out.
- Use a lane index type
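
For the extract_bits item above (32-bit versus 64-bit extraction), a plain shift-and-mask reference can be useful for checking whatever PTX-based or CUB-derived implementation is eventually adopted. The following is only a host-side sketch under assumptions: the name extract_bits_reference is hypothetical and not part of kat, it covers unsigned types only, and treating an out-of-range start_pos as yielding 0 is a choice made here, not something the todo entry specifies.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical reference helper (not the kat API): returns num_bits bits of
// bit_field, starting at bit start_pos, where bit 0 is the least significant.
// Assumption: requests reaching past the top bit are truncated, and a
// start_pos beyond the type's width yields 0.
template <typename I>
I extract_bits_reference(I bit_field, unsigned start_pos, unsigned num_bits)
{
    static_assert(std::is_unsigned<I>::value, "sketch covers unsigned types only");
    constexpr unsigned width = sizeof(I) * 8;

    if (num_bits == 0 || start_pos >= width) { return 0; }

    // Drop the bits below the field.
    I shifted = bit_field >> start_pos;

    // If the field runs to (or past) the top bit, nothing is left to mask off;
    // returning here also avoids shifting by the full width of the type below.
    if (num_bits >= width - start_pos) { return shifted; }

    // Keep only the low num_bits bits.
    I mask = static_cast<I>((I{1} << num_bits) - 1);
    return shifted & mask;
}
```

The same template serves both the 32-bit and the 64-bit case, so it can act as a baseline when comparing against a device-side implementation for either width.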