DASH  0.3.0
dash::BytesPerCycleMeasure Class Reference

Static Public Member Functions

static std::vector< double > unit_weights (const TeamLocality_t &tloc)
 Shared memory bandwidth capacities of every unit factored by the mean memory bandwidth capacity of all units in the team.
 

Detailed Description

Definition at line 63 of file LoadBalancePattern.h.

Member Function Documentation

◆ unit_weights()

static std::vector<double> dash::BytesPerCycleMeasure::unit_weights ( const TeamLocality_t & tloc )
inline static

Shared memory bandwidth capacities of every unit factored by the mean memory bandwidth capacity of all units in the team.

Consequently, a vector of 1's is returned if all units have identical memory bandwidth.
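The normalization described above can be illustrated with a minimal sketch. The helper `normalize_by_mean` is a hypothetical name introduced here for illustration and is not part of DASH; it only shows why identical inputs yield a vector of 1's.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of DASH): divides every value by the mean
// of all values, so a value equal to the mean maps to 1.0.
std::vector<double> normalize_by_mean(const std::vector<double> & values)
{
  assert(!values.empty());
  const double sum  = std::accumulate(values.begin(), values.end(), 0.0);
  const double mean = sum / static_cast<double>(values.size());
  assert(mean > 0.0);

  std::vector<double> weights;
  weights.reserve(values.size());
  for (double v : values) {
    weights.push_back(v / mean);
  }
  return weights;
}
```

With the input {1.0, 3.0} the mean is 2.0, so the resulting weights are 0.5 and 1.5. Weights computed this way always sum to the number of units.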

The memory bandwidth balancing weight of a unit is relative to the bytes/cycle measure of its affine core. It considers the lower bound ("maximum of minimal") of the throughput between the unit and any other unit in the host system's shared memory domain.

This is mostly relevant for accelerators that have no direct access to the host system's shared memory. For example, Intel MIC accelerators are connected to the host with a 6.2 GB/s PCIe bus, and a single MIC core operates at 1.1 GHz with 4 hardware threads. The resulting measure (bytes/cycle) is calculated as:

  Mpk = 6.2 GB/s
  Cpk = 1.1 GHz * 4 = 4.4 G cycles/s
  BpC = Mpk / Cpk = 1.41 bytes/cycle

The principal idea is that any data used in operations on the MIC target must be moved over the slow PCIe interconnect first. The offload overhead therefore reduces the amount of data assigned to a MIC accelerator, despite its superior ops/s performance.
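The bytes/cycle calculation can be sketched as a small function. The name `bytes_per_cycle` and its parameters are assumptions made for this illustration and are not part of DASH; the parameters mirror the values that the implementation reads from `max_shmem_mbps()`, `num_threads()` and `cpu_mhz()`.

```cpp
#include <cassert>

// Hypothetical helper (not part of DASH).
// Units: (MB/s) / (threads * MHz) = bytes per cycle, because the factor
// 10^6 appears in both numerator and denominator and cancels.
double bytes_per_cycle(double mem_bw_mbps, int num_threads, double cpu_mhz)
{
  assert(num_threads > 0 && cpu_mhz > 0.0);
  const double mcycles_per_sec = num_threads * cpu_mhz;
  return mem_bw_mbps / mcycles_per_sec;
}
```

The MIC example corresponds to the call `bytes_per_cycle(6200.0, 4, 1100.0)`, with 6.2 GB/s expressed as 6200 MB/s and 1.1 GHz as 1100 MHz.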

Definition at line 98 of file LoadBalancePattern.h.

{
  std::vector<double> unit_mem_perc;

#if 0
  // TODO: Calculate and assign neutral weights for units located at
  //       cores with unknown memory bandwidth.

  std::vector<size_t> unit_mem_capacities;
  size_t total_mem_capacity = 0;

  // Calculate average memory bandwidth first:
  for (auto u : tloc.units()) {
    auto & unit_loc     = tloc.unit_locality(u);
    size_t unit_mem_cap = std::max<int>(0, unit_loc.max_shmem_mbps());
    if (unit_mem_cap > 0) {
      total_mem_capacity += unit_mem_cap;
    }
    unit_mem_capacities.push_back(unit_mem_cap);
  }
  if (total_mem_capacity == 0) {
    total_mem_capacity = tloc.units().size();
  }
  DASH_LOG_TRACE_VAR("LoadBalancePattern.init_mem_bandwidth_weights",
                     total_mem_capacity);
  DASH_LOG_TRACE_VAR("LoadBalancePattern.init_mem_bandwidth_weights",
                     unit_mem_capacities);

  double avg_mem_capacity = static_cast<double>(total_mem_capacity) /
                            tloc.units().size();

  // Use average value for units with unknown memory bandwidth:
  for (auto membw = unit_mem_capacities.begin();
       membw != unit_mem_capacities.end(); ++membw) {
    if (*membw <= 0) {
      *membw = avg_mem_capacity;
    }
  }
#endif

  std::vector<double> unit_bytes_per_cycle;
  double total_bytes_per_cycle = 0;

  // Calculating bytes/cycle per core for every unit:
  for (auto u : tloc.global_units()) {
    auto   unit_loc     = tloc.unit_locality(u);
    double unit_mem_bw  = std::max<int>(0, unit_loc.max_shmem_mbps());
    double unit_core_fq = unit_loc.num_threads() *
                          unit_loc.cpu_mhz();
    double unit_bps     = unit_mem_bw / unit_core_fq;
    unit_bytes_per_cycle.push_back(unit_bps);
    total_bytes_per_cycle += unit_bps;
  }

  double avg_bytes_per_cycle =
    static_cast<double>(total_bytes_per_cycle) / tloc.global_units().size();

  unit_mem_perc.reserve(unit_bytes_per_cycle.size());
  for (auto unit_bps : unit_bytes_per_cycle) {
    unit_mem_perc.push_back(unit_bps / avg_bytes_per_cycle);
  }
  return unit_mem_perc;
}
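The TODO in the disabled block of the listing mentions neutral weights for units with unknown memory bandwidth. The following standalone sketch shows one possible reading of that idea and is not the DASH implementation. The struct `UnitHw` and the function `unit_weights_guarded` are hypothetical names, and treating values of zero or less as "unknown" is an assumption made for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-unit hardware values queried in the
// listing; not a DASH type. Values <= 0 are assumed to mean "unknown".
struct UnitHw {
  int max_shmem_mbps;
  int num_threads;
  int cpu_mhz;
};

// Returns bytes/cycle weights relative to the mean over all known units.
// Units with any unknown hardware value receive the neutral weight 1.0.
std::vector<double> unit_weights_guarded(const std::vector<UnitHw> & units)
{
  std::vector<double> bpc(units.size(), -1.0);  // -1.0 marks "unknown"
  double      total   = 0.0;
  std::size_t n_known = 0;

  for (std::size_t i = 0; i < units.size(); ++i) {
    const UnitHw & u = units[i];
    if (u.max_shmem_mbps > 0 && u.num_threads > 0 && u.cpu_mhz > 0) {
      bpc[i] = static_cast<double>(u.max_shmem_mbps) /
               (static_cast<double>(u.num_threads) * u.cpu_mhz);
      total += bpc[i];
      ++n_known;
    }
  }

  std::vector<double> weights(units.size(), 1.0);
  if (n_known == 0) {
    return weights;  // nothing known: all units are weighted equally
  }
  const double mean = total / static_cast<double>(n_known);
  for (std::size_t i = 0; i < units.size(); ++i) {
    if (bpc[i] > 0.0) {
      weights[i] = bpc[i] / mean;
    }
  }
  return weights;
}
```

The guard avoids a division by zero if a thread count or clock frequency were reported as zero, and it keeps unknown units from lowering the mean that known units are compared against. Whether the neutral weight should be 1.0 or the mean of the known units is a design choice that the source leaves open.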

The documentation for this class was generated from the following file: