Accurate Power and Energy Measurement on Kepler-based Tesla GPUs
![Page 1: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/1.jpg)
Accurate Power and Energy Measurement on Kepler-based Tesla GPUs
Martin Burtscher
Department of Computer Science
![Page 2: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/2.jpg)
Introduction
- GPU-based accelerators
  - Quickly spreading in PCs and even handheld devices
  - Widely used in high-performance computing
- Power and energy efficiency
  - Heat dissipation is a problem
  - Electric bill and battery life are of growing concern
  - Exascale requires 50x boost in performance per watt
- Important research area
  - Need to develop techniques to reduce power and energy
  - Have to be able to measure power/energy of programs
![Page 3: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/3.jpg)
GPU Power Sensors
- Hardware
  - High-end compute GPUs include power sensors
  - For example, K20/K40 Tesla cards have a built-in sensor
  - These cards are the target of this talk
- Software
  - Can query the sensor with the NVIDIA Management Library (NVML)
  - http://developer.nvidia.com/nvidia-management-library-nvml
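A minimal sketch of the software side, assuming the `pynvml` Python bindings for NVML are installed (the talk itself does not say which language binding was used, and device index 0 is an arbitrary choice here). NVML reports board power in milliwatts, so the helper converts to watts.

```python
def read_power_watts(nvml, handle):
    """Return the power reported by NVML in watts (NVML reports milliwatts)."""
    return nvml.nvmlDeviceGetPowerUsage(handle) / 1000.0


def query_gpu_power(device_index=0):
    """Query the power sensor of one GPU once.

    Requires pynvml and an NVIDIA driver; device_index=0 is an assumption.
    """
    import pynvml  # imported lazily so read_power_watts works without a GPU

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        return read_power_watts(pynvml, handle)
    finally:
        pynvml.nvmlShutdown()
```

Passing the NVML module into `read_power_watts` keeps the unit conversion testable on a machine without a GPU.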
![Page 4: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/4.jpg)
Problems
- Power sensor data behaves strangely
  - Running the same kernel twice yields different energy
    - First launch: 114 J, second launch: 147 J (29% more energy)
  - Running a kernel 2x as long more than doubles energy
    - 1x input: 732 J, 2x input: 1579 J (8% above doubling)
- Power sensor sampling rate varies greatly
  - Sampling interval ranges from 0.266 ms to 130 ms (3760 Hz down to 7.7 Hz)
![Page 5: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/5.jpg)
Methodology
- Hardware
  - Two K20c, two K20m, two K20X, and two K40m GPUs
- Measurement
  - Query power and time in a loop on an “idle” CPU core
- Test code
  - Compute-intensive regular n-body kernel
  - Constant computation rate of over 2 TFlops on a K20c
  - No data dependences; vary n to adjust kernel runtime
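The measurement step ("query power and time in a loop") might look roughly like the sketch below. The `read_power` callable is a hypothetical stand-in for whatever sensor query is used, and `time.perf_counter` is an assumed choice of high-resolution clock; neither name comes from the talk.

```python
import time


def sample_power(read_power, duration_s, clock=time.perf_counter):
    """Poll read_power() as fast as possible for duration_s seconds.

    Returns a list of (timestamp_s, power_w) pairs, with time stamps
    relative to the start of the loop.
    """
    samples = []
    start = clock()
    while True:
        now = clock()
        if now - start >= duration_s:
            break
        samples.append((now - start, read_power()))
    return samples
```

Recording a time stamp with every reading matters because the interval between sensor updates is not constant, so equal spacing cannot be assumed later.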
![Page 6: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/6.jpg)
Expected Power Profile
[Figure: expected power profile over the measurement loop runtime. Annotations: kernel starts executing; kernel stops executing; GPU idle power.]
![Page 7: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/7.jpg)
Measured Power Profile
[Figure: measured power profile. Annotations: power ramps up slowly; power ramps down slowly; switch to step shape; idle power reached; macroscopic phenomena; marked durations of 5 s, 3 s, and 4 s.]
![Page 8: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/8.jpg)
Energy = Area Under Power Curve
[Figure: area under the measured power curve. Annotations: Integrate to where? Unclear how big energy is. Missing energy? Delayed energy?]
![Page 9: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/9.jpg)
Ramp-up Behavior of 2 Short Runs
[Figure: measured power ramp-up of two short runs. Annotations: short run same as longer run; 2nd run starts higher but also follows curve; ramp down doesn’t follow.]
![Page 10: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/10.jpg)
Ramp-down Behavior of Several Runs
[Figure: measured power [W] (0 to 160) versus shifted runtime [s] (16.2 to 23.2), with t2, t3, and t4 marked. Annotations: shape depends on power at t2; power increases after kernel done; shape always the same; steps down every second; driver lowers power level.]
![Page 11: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/11.jpg)
Sampling Interval Lengths
[Figure: sampling interval [ms] (0 to 80) and measured power [W] (0 to 160) versus runtime [s] (10.7 to 23.7), with t1, t2, t3, and t4 marked. Annotations: short intervals; wide range of intervals; very long interval; driver activity can prevent sampling.]
![Page 12: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/12.jpg)
Sampling Interval Lengths (zoomed-in)
[Figure: sampling interval [ms] (0 to 12) and measured power [W] (0 to 120) versus runtime [s] (12.030 to 12.060). Annotations: identical values; many short intervals; very long interval; sampled power only ever changes after long interval.]
![Page 13: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/13.jpg)
Correcting the Measurements
![Page 14: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/14.jpg)
Sampling Frequency
- Eliminate redundant samples
  - Only sample once every 15 ms (66.7 Hz)
  - Cannot accurately measure kernels under ~150 ms
- Account for the variation in interval length
  - Use high-resolution time stamps
- Example: energy from t1 to t4
  - Dotted (fixed intervals): 1205 J
  - Solid (variable intervals): 1066 J
  - 13% discrepancy
[Figure: measured power [W] (0 to 160) versus runtime [s] (10.7 to 23.7), with t1 and t4 marked; the dotted curve assumes fixed intervals, the solid curve uses the variable intervals.]
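The two ideas on this slide can be sketched as follows: drop redundant samples so that at most one is kept per 15 ms, and integrate using the recorded time stamps instead of assuming equal spacing. The trapezoidal rule and all function names are assumptions made for illustration; the talk does not state which integration rule was used.

```python
def thin_samples(samples, min_interval_s=0.015):
    """Keep a sample only if min_interval_s has passed since the last kept one."""
    kept = []
    for t, p in samples:
        if not kept or t - kept[-1][0] >= min_interval_s:
            kept.append((t, p))
    return kept


def energy_variable_intervals(samples):
    """Integrate power over time using the actual time stamps (trapezoidal rule)."""
    return sum((t1 - t0) * (p0 + p1) / 2.0
               for (t0, p0), (t1, p1) in zip(samples, samples[1:]))


def energy_fixed_intervals(samples, interval_s):
    """Naive estimate that assumes every sample covers the same interval."""
    return sum(p for _, p in samples) * interval_s
```

When long gaps fall on low-power stretches and short gaps on high-power ones, the fixed-interval estimate overstates energy, which is the kind of discrepancy the dotted and solid curves illustrate.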
![Page 15: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/15.jpg)
True Power
- Sensor hardware
  - Seems to asymptotically approach true power
  - Reminiscent of capacitor charging
- True instant power
  - $P_{\mathrm{true}}$ is a function of the slope of the power profile, $dP_{\mathrm{sensor}}/dt$, and the power measured by the sensor, $P_{\mathrm{sensor}}$
  - $P_{\mathrm{true}} = P_{\mathrm{sensor}} + C \times \frac{dP_{\mathrm{sensor}}}{dt}$
- “Capacitance” of sensor
  - C ≈ 0.84 s on all tested K20 GPUs
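A minimal sketch of applying the capacitor formula to time-stamped samples. The slope is approximated from neighboring samples; the specific choice of a central difference (one-sided at the two ends) and the function name are assumptions, and time stamps are assumed to be strictly increasing.

```python
def correct_power(samples, capacitance_s=0.84):
    """Apply P_true = P_sensor + C * dP_sensor/dt to (time_s, power_w) samples.

    The slope at each sample is estimated from its neighbors: a central
    difference in the interior and a one-sided difference at the ends.
    """
    n = len(samples)
    if n < 2:
        return list(samples)
    corrected = []
    for i, (t, p) in enumerate(samples):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        slope = (samples[hi][1] - samples[lo][1]) / (samples[hi][0] - samples[lo][0])
        corrected.append((t, p + capacitance_s * slope))
    return corrected
```

For a sensor reading that approaches a constant level exponentially with time constant C, the slope term cancels the lag, so the corrected interior values sit close to that constant level.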
![Page 16: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/16.jpg)
Back-calculated from Expected Profile
[Figure: sensor readings back-calculated from the expected profile, overlaid on the measured values. Annotations: ‘capacitor’ function matches measured values perfectly; minimized absolute errors to determine C.]
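One way to picture how C could be determined: if the true power steps from one constant level to another, the capacitor formula implies that the sensor reading approaches the new level exponentially with time constant C. The sketch below fits C by trying candidate values and keeping the one with the smallest sum of absolute errors. The grid search and all names are assumptions; the talk only states that absolute errors were minimized.

```python
import math


def sensor_response(t, p_start, p_true, capacitance_s):
    """Sensor reading t seconds after true power steps from p_start to p_true."""
    return p_true + (p_start - p_true) * math.exp(-t / capacitance_s)


def fit_capacitance(samples, p_start, p_true, candidates):
    """Return the candidate C with the smallest sum of absolute errors."""
    def total_abs_error(c):
        return sum(abs(p - sensor_response(t, p_start, p_true, c))
                   for t, p in samples)
    return min(candidates, key=total_abs_error)
```

Here `samples` holds (seconds since the step, measured watts) pairs, and `p_start` and `p_true` would be read off the flat parts of the profile before and long after the step.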
![Page 17: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/17.jpg)
Corrected Power Profile
[Figure: power [W] (0 to 160) versus time [s] (13 to 21), with t1, t2, and t3 marked. Annotations: wobbles due to sampling errors; corrected profile matches expected rectangular profile; ‘active idle’ power level.]
![Page 18: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/18.jpg)
Correction of 2 Short Runs
[Figure: power [W] (0 to 160) versus time [s] (111 to 119), with t1a, t2a, t1b, t2b, and t3b marked. Annotation: corrected power profile matches expected profile.]
![Page 19: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/19.jpg)
Second K20c GPU
[Figure: power [W] (0 to 160) versus time [s] (16.5 to 23.5), with t1, t2, and t3 marked. Annotation: identical to original K20c.]
![Page 20: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/20.jpg)
K20m GPU
[Figure: power [W] (0 to 180) versus time [s] (62.7 to 69.7), with t1, t2, and t3 marked. Annotation: similar profile but higher power level.]
![Page 21: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/21.jpg)
K20X GPU
[Figure: power [W] (0 to 200) versus time [s] (128 to 137), with t1, t2, and t4 marked. Annotations: profile is good, no correction needed; huge 600 ms gap.]
![Page 22: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/22.jpg)
K40m GPU
[Figure: K40m power profile. Annotation: K40m again requires correction.]
![Page 23: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/23.jpg)
Application to Full CUDA Program
- Implementation of Barnes Hut n-body algorithm
  - Taken from LonestarGPU benchmark suite
  - Contains multiple regular and irregular kernels
  - Highly optimized, but still suffers from load imbalance, divergence, and uncoalesced accesses
  - Main kernel is ‘regularized’ (warp-based)
[Image credit: NASA/JPL-Caltech/SSC]
![Page 24: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/24.jpg)
Barnes Hut Power Profile (1 Step)
[Figure: measured power profile of one Barnes Hut time step. Annotations: slow then fast drop-off; “wave” in profile; original profile is hard to interpret.]
![Page 25: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/25.jpg)
Barnes Hut Power Profile (Kernels)
[Figure: measured power profile with the individual kernels indicated. Annotations: slow then fast drop-off; “wave” in profile; original profile is hard to interpret.]
![Page 26: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/26.jpg)
Corrected Barnes Hut Power Profile
[Figure: power [W] (0 to 160) versus time [s] (61.7 to 68.7), with points a, b, c, d, e, and f marked. Annotations: decrease due to load imbalance; two similar irregular kernels; one more irregular kernel; very short regular kernel; regularized main kernel; corrected profile reveals important info.]
![Page 27: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/27.jpg)
K20Power Tool
- Output
  - Corrected profile and corresponding ‘active’ energy
- Features
  - Computes instant power using ‘capacitor’ formula
  - Employs high-resolution time steps
  - Samples at true frequency of 66.7 Hz
- Dissemination
  - Open source, research license
  - http://cs.txstate.edu/~burtscher/research/K20power/
![Page 28: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/28.jpg)
Marcher System
- Tool will be part of Marcher system at Texas State
  - NSF-funded green computing infrastructure
- Marcher is a power-measurable cluster system
  - 832 general-purpose cores
  - 12,000 GPU and MIC cores
  - 1.2 TB of DDR3 with power throttling and scaling
  - 50 TB of hybrid storage with hard drives and SSDs
  - Component-level power measurement tools (e.g., CPU, DRAM, Disk, GPU, Xeon Phi)
![Page 29: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/29.jpg)
Summary
- Correctly measuring K20/K40 power and energy
  - Sample at 66.7 Hz and include time stamps
  - Compute true power with presented formula
    - Use neighboring power samples to approximate slope
  - Compute true energy by integrating true power
    - Over intervals where power is above ‘active idle’
- K20Power tool
  - Software tool that implements this methodology
- Paper at http://cs.txstate.edu/~burtscher/papers/gpgpu14.pdf
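The last step of the summary, integrating true power only where it exceeds the ‘active idle’ level, could be sketched as below. It expects samples that have already been corrected with the capacitor formula. The function name, the trapezoidal rule, and the rule that an interval counts only when both of its end points are above the threshold are simplifying assumptions, not details taken from the K20Power tool.

```python
def active_energy(samples, active_idle_w):
    """Integrate power over intervals where it exceeds the active-idle level.

    samples is a list of (time_s, power_w) pairs in time order. An interval
    between two consecutive samples is counted only if both end points are
    above active_idle_w (a simplification).
    """
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if p0 > active_idle_w and p1 > active_idle_w:
            energy += (t1 - t0) * (p0 + p1) / 2.0
    return energy
```

The threshold itself would be chosen from the flat level the corrected profile settles at between kernels, which differs between GPU models.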
![Page 30: Accurate Power and Energy Measurement on Kepler -based Tesla GPUs](https://reader035.fdocuments.in/reader035/viewer/2022081513/5681691c550346895de03f8e/html5/thumbnails/30.jpg)
Acknowledgments
- Collaborators
  - Ivan Zecena and Ziliang Zong
- U.S. National Science Foundation
  - DUE-1141022, CNS-1217231, and CNS-1305359
- NVIDIA Corporation
  - Grants and equipment donations
- Texas State University
  - Research Enhancement Program