© 2010 Renesas Electronics Corporation. All rights reserved.
Renesas Electronics Corporation
DVFS and power-off controls on a multicore operating system
Yasuhiro Tawara, Renesas Electronics Corporation
Akio Idehara, Mitsubishi Electric Corporation
Hitoshi Yamamoto, Mitsubishi Electric Corporation
MPSoC’10
Contents
• Background
• Target chip
• Target system
• Linux® design and implementation
• Linux evaluation
• Battery life
• Temperature
• Demo
• Summary
• References
Linux is a registered trademark of Linus Torvalds.
Background: CMOS power consumption (primary term)
[Figure: CMOS inverter between V and GND with P-ch and N-ch transistors; switching OFF->ON / ON->OFF charges and discharges the load capacitance C.]
CMOS logic power consumption mainly comes from the charging and discharging of MOS channels.
Capacitance: $C$; switching ratio: $\alpha$; frequency: $f$; voltage: $V$
Electric charge: $Q = \alpha C V$
Current: $I = dQ/dt = \alpha C V f$
Power: $P = V \times I = \alpha C V^2 f$
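A small numerical illustration of the primary term: the script below evaluates $P = \alpha C V^2 f$ at two FV points used later on the target system (600MHz at 1.4V and 300MHz at 1.2V). The values of α and C are hypothetical placeholders; they cancel in the ratios, so only V and f matter.

```python
def dynamic_power(alpha: float, c_farads: float, v_volts: float, f_hz: float) -> float:
    """Primary CMOS term: P = alpha * C * V^2 * f, in watts."""
    return alpha * c_farads * v_volts ** 2 * f_hz


# Hypothetical switching ratio and capacitance; they cancel in the ratios below.
ALPHA, C = 0.2, 1e-9

p_fast = dynamic_power(ALPHA, C, 1.4, 600e6)
p_slow_f = dynamic_power(ALPHA, C, 1.4, 300e6)   # halve f, keep V
p_slow_fv = dynamic_power(ALPHA, C, 1.2, 300e6)  # halve f and lower V

print(f"f only:  {p_slow_f / p_fast:.3f}")   # 0.500
print(f"f and V: {p_slow_fv / p_fast:.3f}")  # 0.367, i.e. (1.2/1.4)^2 / 2
```

Lowering the frequency alone scales power linearly; lowering the voltage with it adds the quadratic $V^2$ factor, which is why DVFS pairs the two.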
Background: CMOS power consumption
$P_{total} = P_{dynamic} + P_{leak} \approx \alpha C V^2 f + V \times I_{leak}$
Dynamic Voltage and Frequency Scaling (DVFS) control targets the dynamic term; power-off control targets the leakage term.
Background: FV property
[Figure: Fmax-Vdd shmoo plot; voltage Vdd [V] vs. frequency [Hz], with Pass and Fail regions separated by the Fmax boundary.]
An Fmax-Vdd shmoo plot shows the correlation between operating frequency and supply voltage. Choose appropriate frequency-voltage (FV) points from the pass region for FV control: since $P = \alpha C V^2 f$, running each frequency at the lowest passing voltage minimizes dynamic power.
Target chip: SH-4A 8-core prototype chip
[Figure: chip micrograph with labeled blocks Core#0 to Core#7 (Core#0 detail: I$, D$, ILRAM, DLRAM, URAM), SNC0, SNC1, DBSC, DDRPAD, GCPG, CSM, LBSC, SHWY, and VSWC.]
Process technology: 90nm, 8-layer, triple-Vth, CMOS
Chip size: 104.8mm² (10.61mm x 9.88mm)
CPU core size: 6.6mm² (3.36mm x 1.96mm)
Supply voltage: 1.0V–1.4V (internal), 1.8/3.3V (I/O)
Power domains: 17 (8 CPUs, 8 URAMs, common)
Ref. M. Ito, ISSCC2008 [3]
Target chip: Five power modes
Two additional power modes for leakage power saving:
• Resume power-off: URAM is kept powered for fast restart.
• Full power-off: complete leakage power saving.
The 8 CPUs independently select an appropriate power mode.
[Table: power modes Normal, Light Sleep, Sleep, Resume Power-off, and Full Power-off versus the state of the CPU, cache, and URAM; cell states are "active", "clock off, power on", and "clock off, power off".]
Target chip: 16+1 power domains
[Figure: block diagram. Cluster #0 (Core #0 to #3) and Cluster #1 (Core #4 to #7); each core has a CPU, FPU, I$ 16K, D$ 16K, local memory (I: 8K, D: 32K), and URAM 64K. Each cluster has a snoop controller (0 and 1), CCN/BAR, and a local clock pulse generator (LCPG0, LCPG1) holding the per-core power control registers (PCR0 to PCR3 and PCR4 to PCR7). The clusters connect through the on-chip system bus (SuperHyway) to the DDR2 control, SRAM control, and DMA control.]
LCPG: Local clock pulse generator; PCR: Power Control Register; CCN/BAR: Cache controller/Barrier Register; URAM: User RAM
Target chip: Power consumption
Power consumption of the 8 CPU cores. All data were measured on silicon at room temperature and 1.1V; dynamic power for “Normal” was measured with an idle loop.
• Even when all CPUs are in “Sleep”, 304mW is still consumed, and leakage power accounts for about 70% of it.
• “Resume power-off” consumes only 35mW (URAM leakage), an 88% reduction compared with “Sleep”.
Power consumption (mW) by power mode, dynamic + leakage = total:
Normal: 1214 + 216 = 1430
Light sleep: 239 + 216 = 455
Sleep: 88 + 216 = 304
Resume power-off: 35 (URAM leakage)
Full power-off: 0
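The two bullet points can be checked from the chart values alone. This sketch simply recomputes the leakage share in Sleep and the saving of Resume power-off relative to Sleep; the constant names are ours.

```python
SLEEP_DYNAMIC_MW = 88   # dynamic power of the 8 cores in Sleep
LEAKAGE_MW = 216        # leakage of the 8 cores while powered
RESUME_POFF_MW = 35     # Resume power-off: URAM leakage only

sleep_total = SLEEP_DYNAMIC_MW + LEAKAGE_MW       # 304 mW
leak_share = LEAKAGE_MW / sleep_total             # share of leakage in Sleep
saving = 1 - RESUME_POFF_MW / sleep_total         # Resume power-off vs. Sleep

print(sleep_total, f"{leak_share:.0%}", f"{saving:.0%}")  # 304 71% 88%
```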
Target chip: Power control
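Each CPU core has a Power Control Register (PCR) in its LCPG, with control bits including RESET, RESUME, POFF, LSLEEP, …
Mode transitions:
• Normal -> Sleep: sleep instruction with LSLEEP=0
• Normal -> Light Sleep: sleep instruction with LSLEEP=1
• Sleep / Light Sleep -> Normal: interrupt
• Setting RESUME=1 enters Resume Power-off; setting POFF=1 enters Full Power-off
• Resume Power-off / Full Power-off -> Normal: RESET=1
Transition time between power modes: 5us to power off and 30us to recover; Sleep and Light Sleep transitions are immediate.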
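To make the mode diagram concrete, here is a hypothetical Python model of one core's power-mode transitions with the transition times quoted above. The event strings are illustrative labels, not register programming; and because the slide does not show from which mode RESUME=1 and POFF=1 are applied, the sketch assumes they are applied to a core in Sleep.

```python
# Hypothetical per-core power-mode model. ASSUMPTION: RESUME=1 / POFF=1 are
# applied to a core in Sleep; the slide does not show the source state.
TRANSITIONS = {
    ("Normal", "sleep LSLEEP=0"): "Sleep",
    ("Normal", "sleep LSLEEP=1"): "Light Sleep",
    ("Sleep", "interrupt"): "Normal",
    ("Light Sleep", "interrupt"): "Normal",
    ("Sleep", "RESUME=1"): "Resume Power-off",
    ("Sleep", "POFF=1"): "Full Power-off",
    ("Resume Power-off", "RESET=1"): "Normal",
    ("Full Power-off", "RESET=1"): "Normal",
}

POWER_OFF = {"Resume Power-off", "Full Power-off"}


def latency_us(src: str, dst: str) -> int:
    """Slide values: 5us into power-off, 30us to recover, otherwise immediate."""
    if dst in POWER_OFF:
        return 5
    if src in POWER_OFF:
        return 30
    return 0


def run(events, state="Normal"):
    """Apply events in order; return the final mode and total transition time."""
    total = 0
    for ev in events:
        nxt = TRANSITIONS.get((state, ev))
        if nxt is None:
            raise ValueError(f"{ev!r} is not allowed in {state}")
        total += latency_us(state, nxt)
        state = nxt
    return state, total


print(run(["sleep LSLEEP=0", "RESUME=1", "RESET=1"]))  # ('Normal', 35)
```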
Target system: Voltage Control
FV points (power supply voltage vs. highest frequency of the 4 cores):
600MHz: 1.4V
300MHz: 1.2V
150MHz: 1.0V
75MHz: 1.0V
The supply voltage is shared by the 4 cores, so the highest frequency among them determines the voltage:
$$V = V_{f_{max}},\qquad f_{max} = \max_{i=0,\dots,3} f_i$$
$$P \approx \sum_{i=0}^{3} \alpha C\, V_{f_{max}}^2 f_i + \sum_{i=0}^{3} V_{f_{max}}\, I_{leak,i}$$
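The voltage rule and the power formula can be combined into a small model. This is a sketch under stated assumptions: the FV table comes from the slide, while the product αC and the leakage current are hypothetical, and powered-off cores are simply omitted from the sums. With the same total frequency of 600MHz, one core at 600MHz forces 1.4V, whereas four cores at 150MHz can all run at 1.0V.

```python
FV_TABLE = {600: 1.4, 300: 1.2, 150: 1.0, 75: 1.0}  # MHz -> V, from the slide

ALPHA_C = 1.0  # hypothetical alpha*C, arbitrary units
I_LEAK = 0.0   # hypothetical per-core leakage current; set > 0 to include it


def cluster_power(freqs_mhz):
    """P ~ sum_i alpha*C*V(fmax)^2*f_i + sum_i V(fmax)*I_leak.

    freqs_mhz has one entry per powered-on core; off cores are omitted.
    """
    if not freqs_mhz:
        return 0.0
    v = FV_TABLE[max(freqs_mhz)]  # shared supply set by the fastest core
    dynamic = sum(ALPHA_C * v ** 2 * f for f in freqs_mhz)
    leak = sum(v * I_LEAK for _ in freqs_mhz)
    return dynamic + leak


# Same total frequency (600 MHz), distributed differently:
one_core = cluster_power([600])        # 1.4^2 * 600
four_cores = cluster_power([150] * 4)  # 1.0^2 * 600
print(round(one_core), round(four_cores))  # 1176 600
```

This is the same effect behind the Dhrystone measurements, where 4-CPU operation is more power-efficient than 1-CPU operation for a given Σfreq.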
Target system: Power evaluation
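Total frequency and power consumption running Dhrystone 2.1
[Figure: power (mW, 0–5000) vs. Σfreq (MHz, 0–2500); series for 1, 2, 3, and 4 CPUs, with the 1.0V, 1.2V, and 1.4V operating points marked; an annotation highlights the power-efficient points for the same total frequency.]
Each CPU is either powered off or runs at one of four frequencies (75, 150, 300, or 600MHz), i.e., five states. The number of combinations with repetition of these states over 4 CPUs is 5H4 = 70; excluding the all-off case leaves 69. Power consumption was measured for all 69 patterns.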
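The count of 69 patterns can be reproduced by enumeration. This sketch treats the 4 cores as interchangeable, so each pattern is a multiset of per-core states (off, 75, 150, 300, or 600MHz).

```python
from itertools import combinations_with_replacement
from math import comb

STATES = ["off", 75, 150, 300, 600]  # per-CPU state: powered off or a frequency (MHz)

patterns = [
    p for p in combinations_with_replacement(STATES, 4)  # unordered, repetition allowed
    if any(s != "off" for s in p)                        # drop the all-off pattern
]

print(comb(5 + 4 - 1, 4), len(patterns))  # 70 69
```

Treating the cores as interchangeable is what makes this a combination with repetition (5H4) rather than 5^4 = 625 ordered assignments; it matches the power model, in which only the multiset of frequencies matters.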
Target system: Power efficiency
4-CPU operation is power-efficient; 1-CPU operation is power-inefficient. Because the supply voltage is set by the fastest core, spreading the same total frequency over more cores lets each core run slower at a lower voltage.
[Figure: power consumption per MHz (mW/MHz, 0–9) vs. Σfreq (MHz, 0–2500) for Dhrystone 2.1; series for 1, 2, 3, and 4 CPUs, with the 1.0V, 1.2V, and 1.4V operating points marked.]
Linux design and implementation: CPUFreq
[Figure: CPUFreq structure. Architecture-independent: the cpufreq module with its governors (performance, powersave, ondemand, conservative, userspace). Architecture-dependent: core frequency setting.]
Linux design and implementation: governors of CPUFreq
[Figure: total frequency vs. time for each governor under the same workload.]
• POWERSAVE: always the minimum frequency (75MHz × 4 cores).
• PERFORMANCE: always the maximum frequency (600MHz × 4 cores).
• ONDEMAND: jumps to the maximum frequency immediately under high load to minimize elapsed time; minimum frequency (75MHz × 4 cores) when there is no load.
• CONSERVATIVE: changes the frequency gradually to save power; minimum frequency (75MHz × 4 cores) when there is no load.
• USERSPACE: manual control by the user or an application.
In the figure, elapsed times are annotated T0, T0, T0+a, and T0*b, and sampling overhead time is marked on two of the traces.
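The contrast between ONDEMAND and CONSERVATIVE can be sketched as a per-sample frequency update over the target system's frequency table. This is a simplified model in the spirit of those governors, not the Linux kernel's code; the 80% and 20% thresholds are hypothetical illustration values.

```python
FREQS = [75, 150, 300, 600]  # MHz, available per-core frequencies

# Hypothetical thresholds for illustration only; not taken from the kernel.
UP_THRESHOLD = 80    # % load that triggers a frequency increase
DOWN_THRESHOLD = 20  # % load that triggers a frequency decrease


def ondemand_step(cur: int, load: int) -> int:
    """Jump straight to max on high load; otherwise pick the lowest sufficient freq."""
    if load > UP_THRESHOLD:
        return FREQS[-1]
    needed = cur * load / UP_THRESHOLD
    return next(f for f in FREQS if f >= needed)


def conservative_step(cur: int, load: int) -> int:
    """Move one table entry at a time, so the frequency changes gradually."""
    i = FREQS.index(cur)
    if load > UP_THRESHOLD and i < len(FREQS) - 1:
        return FREQS[i + 1]
    if load < DOWN_THRESHOLD and i > 0:
        return FREQS[i - 1]
    return cur


def trace(step, loads, start=75):
    """Apply one governor step per sampling period and record the frequency."""
    f, out = start, []
    for load in loads:
        f = step(f, load)
        out.append(f)
    return out


loads = [100, 100, 100, 0, 0, 0]
print(trace(ondemand_step, loads))      # [600, 600, 600, 75, 75, 75]
print(trace(conservative_step, loads))  # [150, 300, 600, 300, 150, 75]
```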
Linux design and implementation: CPU Hotplug/Unplug
[Figure: CPU Hotplug/Unplug structure. Architecture-independent: the CPU Hotplug/Unplug framework. Architecture-dependent: core power on and core power off.]
Linux design and implementation: sysfs (CPU Freq)
CPU Freq
To check the frequency of CPU#0:
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
600000
This means that the frequency of CPU#0 is 600MHz (600000kHz).
To set the frequency of CPU#0 to 75MHz (scaling_setspeed takes effect only while the userspace governor is selected):
# echo 75000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
To set the governor to “ondemand” (N=0,1,2,3):
# echo ondemand > /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor
To set the sampling rate of “ondemand” to 2 seconds, given in microseconds (N=0,1,2,3):
# echo 2000000 > /sys/devices/system/cpu/cpuN/cpufreq/ondemand/sampling_rate
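The same CPUFreq operations can be scripted from a program by reading and writing the sysfs files shown above. A minimal Python sketch, assuming root privileges and a kernel that exposes these attributes; the `root` parameter is a hypothetical addition so the helpers can be tried against a scratch directory.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def cpufreq_path(cpu: int, attr: str, root: Path = CPU_ROOT) -> Path:
    """Path of a per-CPU cpufreq attribute, e.g. cpu0/cpufreq/scaling_cur_freq."""
    return root / f"cpu{cpu}" / "cpufreq" / attr


def get_cur_freq_khz(cpu: int, root: Path = CPU_ROOT) -> int:
    # scaling_cur_freq reports kHz, e.g. 600000 for 600MHz.
    return int(cpufreq_path(cpu, "scaling_cur_freq", root).read_text().strip())


def set_governor(cpu: int, name: str, root: Path = CPU_ROOT) -> None:
    cpufreq_path(cpu, "scaling_governor", root).write_text(name)


def set_speed_khz(cpu: int, khz: int, root: Path = CPU_ROOT) -> None:
    # Only honored while the "userspace" governor is selected.
    cpufreq_path(cpu, "scaling_setspeed", root).write_text(str(khz))
```

For example, `set_governor(0, "userspace")` followed by `set_speed_khz(0, 75000)` mirrors the corresponding echo commands above.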
Linux design and implementation: sysfs (CPU Hotplug/Unplug)
CPU Hotplug/Unplug
To check that CPU#1 is powered on:
# cat /sys/devices/system/cpu/cpu1/online
1
This means that CPU#1 is powered on.
To power off CPU#1:
# echo 0 > /sys/devices/system/cpu/cpu1/online
To check that CPU#1 is powered off:
# cat /sys/devices/system/cpu/cpu1/online
0
This means that CPU#1 is powered off.
To power on CPU#1:
# echo 1 > /sys/devices/system/cpu/cpu1/online
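Likewise, CPU Hotplug/Unplug reduces to reading and writing a core's `online` file. A minimal sketch with a hypothetical `root` parameter for testing against a scratch directory; on this chip, writing 0 powers the core off as described above.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def is_online(cpu: int, root: Path = CPU_ROOT) -> bool:
    """True if the core's online file reads 1 (powered on)."""
    return (root / f"cpu{cpu}" / "online").read_text().strip() == "1"


def set_online(cpu: int, on: bool, root: Path = CPU_ROOT) -> None:
    # Writing 0 unplugs the core (power off here); writing 1 plugs it back in.
    (root / f"cpu{cpu}" / "online").write_text("1" if on else "0")
```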
Linux design and implementation: Idle Reduction daemon
[Figure: the Idle Reduction daemon runs in user space and uses sysfs and procfs to drive CPU HotPlug/Unplug and CPUFreq (userspace governor) in kernel space.]
The Idle Reduction daemon controls both CPUFreq and CPU HotPlug/Unplug:
• With low workload, Idle Reduction powers off a core if it has had no workload for 2 sampling periods.
• With medium workload, Idle Reduction increases the frequency of the lowest-frequency core, decreases the frequency of the highest-frequency core, or powers off the minimum-frequency core, depending on the workload.
• With high workload, Idle Reduction first powers on cores up to the number of threads, and then increases the frequency of the lowest-frequency core.
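The three rules can be sketched as one decision step per sampling period. Everything numeric here is an assumption: the 20% and 80% average-load boundaries between low, medium, and high workload are hypothetical, the one-table-step frequency changes are our choice, and of the three medium-load actions only "decrease the highest-frequency core" is modeled. It illustrates the shape of the policy, not the daemon's actual algorithm.

```python
FREQS = [75, 150, 300, 600]  # available per-core frequencies (MHz)
LOW, HIGH = 20, 80           # hypothetical average-load thresholds (%)
IDLE_PERIODS = 2             # slide: power off after 2 idle sampling periods


def raise_step(f):
    return FREQS[min(FREQS.index(f) + 1, len(FREQS) - 1)]


def lower_step(f):
    return FREQS[max(FREQS.index(f) - 1, 0)]


def idle_reduction_step(freqs, loads, idle_counts, n_threads):
    """One sampling period of a simplified, hypothetical Idle Reduction policy.

    freqs: per-core MHz, or None if powered off (at least one core online).
    loads: per-core load in % (ignored for offline cores).
    idle_counts: consecutive idle periods per core; updated in place.
    """
    freqs = list(freqs)
    online = [i for i, f in enumerate(freqs) if f is not None]
    for i in online:
        idle_counts[i] = idle_counts[i] + 1 if loads[i] == 0 else 0
    avg = sum(loads[i] for i in online) / len(online)

    if avg < LOW:
        # Low load: power off cores idle for IDLE_PERIODS, keeping one online.
        for i in online:
            n_online = sum(f is not None for f in freqs)
            if idle_counts[i] >= IDLE_PERIODS and n_online > 1:
                freqs[i] = None
    elif avg > HIGH:
        if len(online) < n_threads and None in freqs:
            # High load, first: power on cores up to the number of threads.
            j = freqs.index(None)
            freqs[j], idle_counts[j] = FREQS[0], 0
        else:
            # High load, second: raise the lowest-frequency core one step.
            lo = min(online, key=lambda i: freqs[i])
            freqs[lo] = raise_step(freqs[lo])
    else:
        # Medium load: only "lower the highest-frequency core" is modeled.
        hi = max(online, key=lambda i: freqs[i])
        freqs[hi] = lower_step(freqs[hi])
    return freqs
```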
Linux evaluation: Summary
Idle Reduction is effective (1) with no workload and (2) with sparse workloads, i.e., when cores with workloads and cores without workloads are mixed.
Idle Reduction was not always effective (3) with full workloads, i.e., when all the cores are fully loaded. There is room for improvement in Idle Reduction.
[Figure: two power-vs-time plots; the same area under the curve means the same energy consumption.]
The evaluated energy consumption and elapsed time are shown hereafter.
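Since energy is the area under the power-time curve, it can be computed from sampled power by numerical integration. A minimal sketch using the trapezoidal rule and two hypothetical traces of equal area.

```python
def energy_ws(samples_w, dt_s):
    """Energy (W*s) under a power trace sampled every dt_s seconds (trapezoidal rule)."""
    return sum((a + b) / 2 * dt_s for a, b in zip(samples_w, samples_w[1:]))


# Two hypothetical traces with the same area: high power for a short time
# versus half the power for twice as long.
fast = [2.0, 2.0, 2.0]            # 2 W for 2 s
slow = [1.0, 1.0, 1.0, 1.0, 1.0]  # 1 W for 4 s
print(energy_ws(fast, 1.0), energy_ws(slow, 1.0))  # 4.0 4.0
```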
Linux evaluation: (1) no work loads
No workload for 10 seconds on SMP Linux with the Idle Reduction daemon.
Energy consumption: 16% reduction compared to POWERSAVE.
Governor | Energy consumption [Ws] | Ratio to PERFORMANCE [%]
PERFORMANCE | 20.0 | 100.0
POWERSAVE | 6.2 | 31.0
CONSERVATIVE | 6.3 | 31.5
ONDEMAND | 6.3 | 31.5
Idle Reduction | 5.2 | 26.0
Linux evaluation: (2) sparse work loads
Two threads of RAYTRACE from SPLASH-2 [4] ran on 4 cores.
Time: 18% (8 seconds) increase compared to PERFORMANCE.
Energy consumption: 7% reduction compared to CONSERVATIVE.
Governor | Time [s] | Time ratio to PERFORMANCE [%] | Energy consumption [Ws] | Energy ratio to PERFORMANCE [%]
PERFORMANCE | 44 | 100.0 | 116.1 | 100.0
POWERSAVE | 187 | 425.0 | 126.2 | 108.7
CONSERVATIVE | 73 | 165.9 | 111.1 | 95.7
ONDEMAND | 52 | 118.2 | 121.6 | 104.7
Idle Reduction | 52 | 118.2 | 103.3 | 89.0
Linux evaluation: (3) full work loads
Four threads of RAYTRACE from SPLASH-2 [4] ran on 4 cores.
Time: 72% (18 seconds) increase compared to PERFORMANCE.
Energy consumption: 20% increase compared to CONSERVATIVE.
Governor | Time [s] | Time ratio to PERFORMANCE [%] | Energy consumption [Ws] | Energy ratio to PERFORMANCE [%]
PERFORMANCE | 25 | 100.0 | 82.4 | 100.0
POWERSAVE | 99 | 396.0 | 73.1 | 88.7
CONSERVATIVE | 46 | 184.0 | 61.4 | 74.5
ONDEMAND | 32 | 128.0 | 91.9 | 111.5
Idle Reduction | 43 | 172.0 | 73.4 | 89.1
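The ratio columns and the quoted percentages can be recomputed from the raw time and energy values. This sketch uses the full-work-load table; each ratio is relative to PERFORMANCE.

```python
# (time [s], energy [Ws]) per governor, copied from the full-work-load table.
RESULTS = {
    "PERFORMANCE": (25, 82.4),
    "POWERSAVE": (99, 73.1),
    "CONSERVATIVE": (46, 61.4),
    "ONDEMAND": (32, 91.9),
    "Idle Reduction": (43, 73.4),
}


def ratios(name, base="PERFORMANCE"):
    """Time and energy of `name` relative to `base`, in percent (one decimal)."""
    t, e = RESULTS[name]
    bt, be = RESULTS[base]
    return round(100 * t / bt, 1), round(100 * e / be, 1)


for name in RESULTS:
    t_pct, e_pct = ratios(name)
    print(f"{name:15s} time {t_pct:6.1f}%  energy {e_pct:6.1f}%")

t_ir, e_ir = RESULTS["Idle Reduction"]
t_perf = RESULTS["PERFORMANCE"][0]
e_cons = RESULTS["CONSERVATIVE"][1]
print(f"time vs PERFORMANCE: +{t_ir - t_perf} s, +{100 * (t_ir / t_perf - 1):.0f}%")
print(f"energy vs CONSERVATIVE: {100 * (e_ir / e_cons - 1):+.0f}%")
```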
Linux evaluation: DVFS and power control overhead time
CPUFreq (DVFS) transition time:
Kernel | Voltage control | Frequency control [MHz] | Time [us]
SMP | without voltage change | 600 -> 300 | 84
SMP | without voltage change | 300 -> 600 | 84
SMP | with voltage change | 600 -> 300 | 92
SMP | with voltage change | 300 -> 600 | 94
UP | with voltage change | 600 -> 300 | 34
UP | with voltage change | 300 -> 600 | 38
CPU Hotplug/Unplug (power on/off) transition time, SMP kernel:
Transition | Time [us]
CPU Hot Add | 57241
CPU Hot Remove | 40366
Battery life: Battery characteristic example
Characteristic of SEALED LEAD ACID BATTERY WP22-12 by Kung Long Batteries Industrial Co.
Battery life: Battery life model example (@1.1V)
[Figure: battery life left (%, 0–100) vs. voltage of battery (V, 10.5–12.9).]
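The battery model amounts to a lookup from terminal voltage to remaining life. A minimal piecewise-linear sketch; the curve points are hypothetical placeholders spanning the plotted 10.5–12.9V range and should be replaced with values read from the measured characteristic.

```python
from bisect import bisect_right

# Hypothetical (voltage [V], life left [%]) points; voltages must be increasing.
CURVE = [(10.5, 0.0), (11.5, 20.0), (12.1, 60.0), (12.9, 100.0)]


def battery_left(v: float) -> float:
    """Piecewise-linear lookup of battery life left (%) from battery voltage."""
    vs = [p[0] for p in CURVE]
    if v <= vs[0]:
        return CURVE[0][1]
    if v >= vs[-1]:
        return CURVE[-1][1]
    i = bisect_right(vs, v)  # CURVE[i-1] voltage <= v < CURVE[i] voltage
    (v0, p0), (v1, p1) = CURVE[i - 1], CURVE[i]
    return p0 + (p1 - p0) * (v - v0) / (v1 - v0)
```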
Battery life: State transition by battery life left
The governor changes according to the battery life left. To avoid chattering, the downward and upward thresholds are separate.
[Figure: battery life left vs. time. Starting from the initial governor, falling below the downward threshold selects the below-downward-threshold governor (e.g. Idle-Reduced-POWERSAVE: 75MHz × 1 core); rising beyond the upward threshold selects the beyond-upward-threshold governor (e.g. Idle Reduction).]
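The separate thresholds implement hysteresis. A minimal sketch with hypothetical thresholds of 30% and 50% and the example governor names from the figure: between the thresholds the current governor is kept, so small fluctuations around one threshold do not make the governor flip back and forth.

```python
class HysteresisSwitch:
    """Select a governor from a monitored level using separate thresholds."""

    def __init__(self, down: float, up: float, below_gov: str, above_gov: str, initial: str):
        assert down < up, "the downward threshold must be below the upward one"
        self.down, self.up = down, up
        self.below_gov, self.above_gov = below_gov, above_gov
        self.governor = initial

    def update(self, level: float) -> str:
        if level < self.down:
            self.governor = self.below_gov
        elif level > self.up:
            self.governor = self.above_gov
        # Between the thresholds: keep the current governor.
        return self.governor


sw = HysteresisSwitch(30, 50, "Idle-Reduced-POWERSAVE", "Idle Reduction", "Idle Reduction")
print([sw.update(x) for x in [60, 40, 29, 35, 45, 51, 48]])
```

For a temperature-based policy, the same class applies with the roles swapped: Idle Reduction as `below_gov` and the power-saving governor as `above_gov`.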
Temperature: State transition by temperature
The governor changes according to the temperature. To avoid chattering, the downward and upward thresholds are separate.
[Figure: temperature vs. time. Starting from the initial governor, rising beyond the upward threshold selects the beyond-upward-threshold governor (e.g. Idle-Reduced-POWERSAVE: 75MHz × 1 core); falling below the downward threshold selects the below-downward-threshold governor (e.g. Idle Reduction).]
Temperature: System level power consumption
MPSoC = 74 °C, DRAM1 = 46 °C, DRAM2 = 46 °C
Package temperature = 57 °C, board temperature = 52 °C
[Figure: thermography of the board (scale 30–70 °C; readings from 27.5 to 73.8 °C) showing the 8-core MPSoC, DRAM1, and DRAM2.]
7 cores are […]
Temperature: Power consumption of multicore SoC
[Figure: power consumption (W, 0–12) vs. number of CPUs (0–8) for 1.4V without heat sink or fan, 1.4V with heat sink and fan, 1.2V without heat sink or fan, and 1.0V without heat sink or fan; air temperature Ta = 26 °C; case temperatures (Tc) marked along the curves: 55, 57, 59, 69, 75, 87, and 113 °C.]
Above 70 °C, power consumption increased because of leakage current (leakage factor).
Tc: case temperature of the 8-core MPSoC, measured with a thermocouple; Ta: air temperature.
Temperature: Power consumption of multicore SoC
[Figure: MPSoC case temperature Tc (°C, 0–120) vs. power consumption (W, 0–14) for 1.0V, 1.2V, and 1.4V without heat sink or fan, and 1.4V and 1.6V with heat sink and fan; the points fall into a with-heat-sink-and-fan region and a without-heat-sink-or-fan region.]
Cooling the MPSoC with a heat sink and fan reduced its power consumption, since a cooler chip has less leakage current.
Temperature: System level thermal simulation
[Figure: thermal simulation of the board at room temperature 25 °C (scale 34.6–83.0 °C): LED 83 °C, LDO 77 °C, LDO 68 °C, MPSoC (with heat sink & fan) 64 °C, DRAM 53 °C, DRAM 48 °C.]
Temperature: System level power consumption
System-level power breakdown: MPSoC 39%, DC-DC converter 29%, DRAM 20%, external I/O, etc. 12%.
Temperature: System box air flow & thermal simulation
[Figure: air flow and thermal simulation of the system box (scale 25.0–83.4 °C), showing the box and board with HDD, CD, ATX power unit, and fan; MPSoC 35 °C.]
Temperature: Box design
The required box volume increases with the system's power consumption; e.g., a 10W system requires a 1-liter box.
[Figure: natural air cooling limitation graph (by Naoki Kunimine, Thermal Design Lab.): power consumption (W, 1–1000) vs. volume of box (liter, 0.1–100), divided into a natural-convection air cooling domain and a forced-convection air cooling domain.]
Demo: DVFS and power-off controls on SMP Linux
The demo screen shows: CPU#0 to CPU#3 load (%), battery life left (%), temperature (°C), power control governor, sum of frequencies of the 4 CPUs, power consumption (W), and power consumption averaged per second (W).
Summary:
Linux CPUFreq and CPU Hotplug/Unplug are integrated into “Idle Reduction” to control the DVFS and power-off capabilities of a multicore SoC.
* 16% energy reduction compared to the CPUFreq powersave governor when all the cores have no load
* 7% energy reduction compared to the CPUFreq conservative governor when half of the cores have no load
* 20% energy increase compared to the CPUFreq conservative governor when all the cores have load; the parameters of “Idle Reduction” still need optimization.
The MPSoC is, and will remain, the largest power consumer in the system as the number of cores on a chip increases. Integrating DVFS control and power-off control is one solution to increasing power consumption.
References:
[1] Akio Idehara, Yasuhiro Tawara, Hitoshi Yamamoto, Haruyuki Ohtani, and Shinichi Ochiai: “Idle Reduction: Dynamic Power Manager for Embedded Multicore Processor”, ESS2009, Information Processing Society of Japan, pp. 5-12 (2009).
[2] Akio Idehara, Yasuhiro Tawara, Hitoshi Yamamoto, Naoto Sugai, and Tsuyoshi Iizuka: “An Evaluation of Dynamic Power Management support of SMP Linux for embedded multicore processor”, ESS2008, Information Processing Society of Japan, pp. 115-123 (2008).
[3] M. Ito, T. Hattori, Y. Yoshida, et al.: “An 8640 MIPS SoC with Independent Power-off Control of 8 CPUs and 8 RAMs by an Automatic Parallelizing Compiler”, ISSCC Dig. Tech. Papers, pp. 90-91 (2008).
[4] Steven Cameron Woo, Moriyoshi Ohara, Evan Torrie, Jaswinder Pal Singh, and Anoop Gupta: “The SPLASH-2 Programs: Characterization and Methodological Considerations”, Proceedings of the 22nd International Symposium on Computer Architecture, pp. 24-36 (1995).
[5] L. Yan, J. Luo, and N. K. Jha: “Joint Dynamic Voltage Scaling and Adaptive Body Biasing for Heterogeneous Distributed Real-Time Embedded Systems”, IEEE Trans. on CAD, vol. 24, no. 7, pp. 1003-1041 (2005).
[6] Tohru Ishihara and Hiroyuki Tomiyama: “Software Techniques for Low Power Embedded Systems”, Proc. of Embedded Software Symposium, vol. 2005, no. 12, pp. 188-190 (2005); Principles of CMOS VLSI Design, Addison-Wesley (1993).
[7] IBM and MontaVista: “Dynamic Power Management for Embedded Systems”, <http://www.research.ibm.com/arl/publications/papers/DPM_V1.1.pdf> (accessed 2009-06-08).
[8] B. Brock and K. Rajamani: “Dynamic Power Management for Embedded Systems”, Proceedings of the IEEE International SOC Conference, pp. 416-419 (2003).