MeterPU: A Generic Measurement Abstraction API
– Enabling Energy-tuned Skeleton Backend Selection
To appear in Proc. REPARA’15 workshop at ISPA’15, Helsinki, Finland. IEEE.
Lu Li, Christoph Kessler
Motivation
Energy Measurement
  Necessary for energy modeling and optimization.
  Difficult, especially for GPUs.
[Figure: Power visualization (Li et al., 2015 [4]) for a CUDA program, illustrating the "capacitor effect". Power (mW), temperature, and fan speed plotted over time (s). Data obtained by Nvidia NVML.]
Requires numerical postprocessing (Burtscher et al., 2014 [1]) to calculate the real energy cost.
Goal: as simple as measuring time in the good old days.
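To make the postprocessing step concrete, here is a minimal sketch of the simplest part of it: turning sampled power readings into energy by trapezoidal integration. The names PowerSample and integrate_energy_mj are hypothetical and chosen for illustration only. The sketch deliberately does not correct the sensor lag behind the "capacitor effect"; that correction is what the cited postprocessing addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One power reading: timestamp in seconds, power in milliwatts
// (the unit used on the power axis of the plot above).
// Hypothetical type, for illustration only.
struct PowerSample {
    double time_s;
    double power_mw;
};

// Integrate power over time with the trapezoidal rule.
// Returns energy in millijoules (mW * s = mJ).
// Plain integration only: no correction for sensor lag is applied.
double integrate_energy_mj(const std::vector<PowerSample>& samples) {
    double energy_mj = 0.0;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        const double dt = samples[i].time_s - samples[i - 1].time_s;
        assert(dt >= 0.0 && "samples must be sorted by time");
        energy_mj += 0.5 * (samples[i].power_mw + samples[i - 1].power_mw) * dt;
    }
    return energy_mj;
}
```

Even this simple version needs a sampling loop, timestamps, and unit handling, which is the kind of detail a measurement abstraction is meant to hide.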
MeterPU
A software multimeter
  A generic measurement abstraction, hiding platform-specific measurement details.
  Four simple functions to unify the measurement interface for various metrics on different hardware components.
    Time, Energy on CPU, GPU
    Easy to extend: FLOPS, cache misses etc.
  On top of native measurement libraries (plug-ins).
    CPU time: clock_gettime()
    GPU time: cudaEventRecord()
    CPU and DRAM energy: Intel PCM library.
    GPU energy: Nvidia NVML library.
    ...
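As an illustration of what a native plug-in has to wrap, the following sketch times a code region directly with POSIX clock_gettime(). The helper names elapsed_seconds and time_region_seconds are hypothetical, and the choice of CLOCK_MONOTONIC is an assumption; the slides do not say which clock MeterPU uses.

```cpp
#include <cassert>
#include <time.h>  // POSIX clock_gettime, CLOCK_MONOTONIC, timespec

// Elapsed seconds between two timespec readings.
double elapsed_seconds(const timespec& begin, const timespec& end) {
    return static_cast<double>(end.tv_sec - begin.tv_sec)
         + static_cast<double>(end.tv_nsec - begin.tv_nsec) * 1e-9;
}

// Time one call of `work` (hypothetical helper, not part of MeterPU).
// CLOCK_MONOTONIC is an assumption made for this sketch.
template <typename Work>
double time_region_seconds(Work work) {
    timespec begin{};
    timespec end{};
    int rc = clock_gettime(CLOCK_MONOTONIC, &begin);
    assert(rc == 0);
    work();
    rc = clock_gettime(CLOCK_MONOTONIC, &end);
    assert(rc == 0);
    (void)rc;  // silence unused-variable warnings when NDEBUG is set
    return elapsed_seconds(begin, end);
}
```

Each native library has its own handles, units, and error conventions of this kind, which is why wrapping them behind one interface pays off.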
MeterPU API
template<class Type> // analogous to the switch
                     // on a real multimeter
class Meter
{
public:
    void start(); // start a measurement
    void stop();  // stop a measurement
    void calc();  // calculate a metric value
                  // of a code region
    typename Meter_Traits<Type>::ResultType const &
    get_value() const; // get the calculated metric value
private:
    // Platform-specific measurement details hidden ...
};
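One way such an interface could be realized is with a traits class that maps each meter type to its result type and its native read-out. The sketch below is an assumption about the design, not the MeterPU sources: the namespace sketch, the tag Wall_Time, and the traits members Reading, read() and diff() are all hypothetical, and std::chrono stands in for a native measurement library.

```cpp
#include <cassert>
#include <chrono>

namespace sketch {  // hypothetical names; not the MeterPU sources

struct Wall_Time {};  // meter type tag (assumed for illustration)

// Maps a meter type tag to its result type and native read-out.
template <class Type> struct Meter_Traits;

template <> struct Meter_Traits<Wall_Time> {
    using ResultType = double;  // seconds
    using Clock = std::chrono::steady_clock;
    using Reading = Clock::time_point;
    static Reading read() { return Clock::now(); }
    static ResultType diff(Reading begin, Reading end) {
        return std::chrono::duration<double>(end - begin).count();
    }
};

template <class Type>
class Meter {
    using Traits = Meter_Traits<Type>;
public:
    void start() { begin_ = Traits::read(); }             // start a measurement
    void stop()  { end_ = Traits::read(); }               // stop a measurement
    void calc()  { value_ = Traits::diff(begin_, end_); } // compute the metric
    typename Traits::ResultType const& get_value() const { return value_; }
private:
    typename Traits::Reading begin_{};
    typename Traits::Reading end_{};
    typename Traits::ResultType value_{};
};

}  // namespace sketch
```

Under this assumed design, supporting a new metric means adding one tag and one traits specialization, while code written against Meter<Type> stays unchanged.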
A MeterPU Application: Measure CPU Time
Original program:

#include <MeterPU.h>
int main()
{
    cpu_func(); // Do something here
}

Instrumented step by step: declare a meter, wrap the code region in start() and stop(), then call calc() and read the result with get_value():

#include <MeterPU.h>
int main()
{
    using namespace MeterPU;
    Meter<CPU_Time> meter;
    meter.start(); // Measurement Start!
    cpu_func(); // Do something here
    meter.stop(); // Measurement Stop!
    meter.calc();
    BUILD_CPU_TIME_MODEL(meter.get_value());
}
A MeterPU Application: Measure GPU Energy
#include <MeterPU.h>
int main()
{
    using namespace MeterPU;
    // Only one line differs!!!!
    Meter<NVML_Energy<> > meter;
    meter.start(); // Measurement Start!
    cuda_func<<<..., ...>>>(...);
    cudaDeviceSynchronize();
    meter.stop(); // Measurement Stop!
    meter.calc();
    BUILD_GPU_ENERGY_MODEL(meter.get_value());
}
Measure Combinations of CPUs and GPUs
#include <MeterPU.h>
int main()
{
    using namespace MeterPU;
    // Only one line differs!!!!
    Meter<System_Energy<0> > meter;
    meter.start(); // Measurement Start!
    async_cpu_func();
    cuda_func<<<..., ...>>>(...);
    wait_for_cpu_func_to_finish();
    cudaDeviceSynchronize();
    meter.stop(); // Measurement Stop!
    meter.calc();
    BUILD_SYSTEM_ENERGY_MODEL(meter.get_value());
}
Unification → Reuse Legacy Autotuning Framework
Figure 1: Unification allows an empirical autotuning framework to switch to multiple meter types on different hardware components.
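The reuse argument can be sketched as a backend selector that is templated on the meter type, so the same tuning loop works for any metric. This is a hypothetical illustration, not the SkePU autotuner: select_backend, Counting_Meter and g_work_units are invented names, and the stand-in meter counts abstract work units so that the example runs deterministically without measurement hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Stand-in meter for illustration: "measures" abstract work units
// accumulated in a global counter instead of time or energy.
double g_work_units = 0.0;

struct Counting_Meter {
    void start() { begin_ = g_work_units; }
    void stop()  { end_ = g_work_units; }
    void calc()  { value_ = end_ - begin_; }
    double const& get_value() const { return value_; }
private:
    double begin_ = 0.0, end_ = 0.0, value_ = 0.0;
};

// Returns the index of the variant with the lowest mean cost over `runs`
// executions. MeterT only needs start(), stop(), calc() and get_value(),
// so switching the optimization goal means switching the template argument.
template <class MeterT>
std::size_t select_backend(const std::vector<std::function<void()>>& variants,
                           int runs) {
    assert(!variants.empty() && runs > 0);
    std::size_t best = 0;
    double best_cost = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < variants.size(); ++i) {
        double total = 0.0;
        for (int r = 0; r < runs; ++r) {
            MeterT meter;
            meter.start();
            variants[i]();
            meter.stop();
            meter.calc();
            total += static_cast<double>(meter.get_value());
        }
        const double mean = total / runs;
        if (mean < best_cost) {
            best_cost = mean;
            best = i;
        }
    }
    return best;
}
```

Calling select_backend<Counting_Meter>(variants, 3) picks the variant that adds the fewest work units; with a real time or energy meter substituted as the template argument, the loop itself would not change.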
SkePU Reduce Skeleton (A Single Skeleton Call)
[Figure: (a) Time tuning for Reduce: time (us), average of 100 runs, versus problem size for reduce. (b) Energy tuning for Reduce: energy (milliJ), average of 1000 runs, versus problem size for reduce. Series in both panels: CPU, OMP, CUDA, Selection.]
For further results, see the paper.
Empirical autotuning framework for time optimization reused for energy optimization.
LU decomposition (Multiple Skeleton Calls)
[Figure: (a) Time tuning for LU decomposition: time (us), average of 3000 runs, versus problem size for LU decomposition. (b) Energy tuning for LU decomposition: energy (milliJ), average of 1000 runs, versus problem size for LU decomposition. Series in both panels: CPU, OMP, CUDA, Selection.]
Easy switching for optimization goals.
Related Work
Other measurement abstraction software:
  EML (Cabrera et al., 2015 [2])
  REPARA (Akos et al., 2015 [3])
MeterPU:
  Requires far fewer lines of code (LOC) to use.
  Negligible overhead.
    Only one function call.
    Can even be inlined for some meter types.
Pluggable to Other Work
Some MeterPU Use Cases
  Energy-tuned SkePU
  Energy comparison between SkePU’s Myriad Backend and Nvidia GPU
  Global Composition Framework
  TunePU
  ...
Summary of Contributions
MeterPU hides the complexity of platform-specific energy measurement, especially for GPUs.
MeterPU enables reuse of legacy empirical autotuning frameworks, such as the one in SkePU.
With MeterPU, SkePU offers the first energy-tuned skeletons, as far as we know.
Switching the optimization goal becomes easy, which facilitates building time and energy models for multi-objective optimization.
Open source, download at: http://www.ida.liu.se/labs/pelab/meterpu
Bibliography
[1] Martin Burtscher, Ivan Zecena, and Ziliang Zong. Measuring GPU power with the K20 built-in sensor. In Proc. Workshop on General Purpose Processing Using GPUs (GPGPU-7). ACM, March 2014.
[2] Alberto Cabrera, Francisco Almeida, Javier Arteaga, and Vicente Blanco. Energy measurement library (EML) usage and overhead analysis. In 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pages 554–558, March 2015.
[3] Akos Kiss, Marco Danelutto, Zoltan Herczeg, Akos Kiss, Peter Molnar, Robert Sipka, Massimo Torquati, and Laszlo Vidacs. D6.4: REPARA performance and energy monitoring library. Technical report, © The REPARA Consortium, March 2015.
[4] Lu Li and Christoph Kessler. Validating Energy Compositionality of GPU Computations. In Proc. HIPEAC Workshop on Energy Efficiency with Heterogeneous Computing (EEHCO-2015), January 2015.