TKT-2431 SoC Design
Lec 11: Energy consumption
Erno Salminen, Tero Arpinen
Department of Computer Systems, Tampere University of Technology
Fall 2010
Department of Computer Systems, Erno Salminen - Nov. 2010
Remember the Guest lecture+Conclusions Wed 1.12.2010 at 10:15
Contents
- Power consumption breakdown
- Low-power design at system level
- Dynamic power management
  - Clock gating, power supply shutdown
  - Dynamic voltage/frequency scaling
  - Low-power operating modes
- Prediction methods
- ACPI
Copyright notice
- Part of the slides: S. Dey, VLSI Advanced Topics, course material, UCSD, http://ece-classweb.ucsd.edu/ecewebs/year2003-2004/Spring04/ece260c/
- Part of the figures: L. Benini, A. Bogliolo, G. De Micheli, A Survey of Design Techniques for System-Level Dynamic Power Management, TVLSI, Vol. 8, No. 3, June 2000, pp. 299-316
At first
- Make sure that simple things work before even trying more complex ones
- You should believe this by now...
Citation
“The power problem is the No. 1 issue in the long-term for computing. It's time for us to stop making 6-mile-per-gallon gas guzzlers. […] Now you're going to see the great un-marketing of megahertz because it doesn't matter anymore.”
Greg Papadopoulos, Chief Technology Officer for Sun Microsystems, “Chip Makers Feel Heat to Solve Power Problem”, San Jose Mercury News, July 2nd 2004
Motivation (1)
S. Dey, Design of Low-Power, Battery-Efficient Systems, ECE206C course material, UCSD, 2004.
Motivation (2)
- A large and increasing number of devices are battery driven
- Desktop PCs (100-300 W)
S. Dey, Design of Low-Power, Battery-Efficient Systems, ECE206C course material, UCSD, 2004.
Motivation (3)
- Batteries evolve at a lower rate than other parts (figure labels: HD, CPU, battery)
Fig: [John Hockenberry, Building a better battery, Wired, iss. 14.11, Nov 2006]
Motivation (4): Cooling
[G. Lawton, Powering Down the Computing Infrastructure, Computer, Vol. 40, Iss. 2, Feb. 2007, pp. 16-19.]
Power consumption breakdown
Power breakdown: laptop
S. Dey, Design of Low-Power, Battery-Efficient Systems, ECE206C course material, UCSD, 2004.
Power breakdown: Imagine stream processing chip
- Smart memory hierarchy: memories ~21%
Mattan Erez, Stream Architectures - Programmability and Efficiency, Tampere SoC, Nov. 17 2004
Power consumption in CMOS (1)
- Two measures
  - Peak power consumption
  - Average power consumption
    - Usually more interesting than peak power
    - However, large peaks degrade battery lifetime and cause electromigration
- P_avg = P_dynamic + P_short + P_leakage
Fig: inverter with input Vin, output Vout and load C_out; a) Dynamic (i_dyn), b) Short circuit (i_short), c) Leakage
W. Burleson, ECE 679V, course material, 2002, http://vsp2.ecs.umass.edu/~amaheshw/697v/slides/lecture3.ppt
Power consumption in CMOS (2)
- P_dynamic has been dominant in CMOS (~50-90%)
- Leakage power has likely increased with smaller geometries!
  - E.g. P_short + P_leakage = ~10% @ 130 nm but 40% @ 65 nm
- P_dynamic = K * C_out * Vdd^2 * f   (NOTE! Very important!)
  - K = avg transitions on node per clock cycle
  - C_out = driven output capacitance of node
  - Vdd = supply voltage
  - f = operating frequency
- Faulty circuits may also have P_static
  - E.g. there is DC from Vdd to GND, if gate of PMOS is stuck-at-zero
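As a rough numerical illustration of the dynamic power formula above, the following Python sketch evaluates K * C_out * Vdd^2 * f for a single node. The function name and all component values (activity, load capacitance, voltage, frequency) are hypothetical and chosen only to show the linear dependence on f and the quadratic dependence on Vdd.

```python
def dynamic_power(k, c_out, vdd, f):
    """Dynamic power K * C_out * Vdd^2 * f (watts when SI units are used)."""
    return k * c_out * vdd ** 2 * f


# Hypothetical node: 10 % activity, 50 fF load, 1.2 V supply, 200 MHz clock.
p_nominal = dynamic_power(k=0.1, c_out=50e-15, vdd=1.2, f=200e6)

# Halving only the supply voltage: quadratic effect.
p_half_vdd = dynamic_power(k=0.1, c_out=50e-15, vdd=0.6, f=200e6)

# Halving only the clock frequency: linear effect.
p_half_f = dynamic_power(k=0.1, c_out=50e-15, vdd=1.2, f=100e6)

print(f"nominal:  {p_nominal * 1e6:.3f} uW")
print(f"Vdd / 2:  {p_half_vdd / p_nominal:.2f} x nominal")
print(f"f / 2:    {p_half_f / p_nominal:.2f} x nominal")
```

In practice Vdd cannot be halved without also lowering f, since the parameters are coupled; the sketch varies them independently only to separate the two effects.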
Sources of power consumption
- Dynamic power dominates in logic
- Leakage power dominates in memory
  - Small activity
  - Dynamic: access one 32b word at a time in 32MB memory
  - Leakage: in all other 32MB-4B mem cells
  - Also in devices that are mostly in stand-by, e.g. cell phones
- Different methods must be applied
Claude Schmitt, Panel discussion “It's About Power - Performance and area alone don't quite cut it anymore!”, DATE 2/14/2005.
Reducing dynamic power
- Minimize P_dynamic = K * C_out * Vdd^2 * f
- Hence, minimize
  1. activity K
  2. the amount of logic (capacitance) C_out
  3. supply voltage Vdd: quadratic impact!
  4. frequency f: aim for "just fast enough"
  5. combination of the above
- Parameters are coupled
  - E.g. high f requires large Vdd
  - Parallel processing may increase C but lowers f and Vdd
Capacitance and switching minimization
- Minimize K, i.e. useless switching
  - K depends on input sequence
  - Disable new values from entering the logic when results are not needed
- C_out = C_fo + C_w + C_p
  - C_fo = input capacitances of fan-out gates (~50%)
  - C_w = wiring capacitance (~40%, increases with new technologies), hard to estimate before placement
  - C_p = parasitic capacitance of driving gate itself (~10%)
- No need to minimize C if it is rarely switched
  - C_eff = K * C_out, effective capacitance
  - C_out might increase when C_eff is minimized; still beneficial
Power supply minimization
- Supply voltage has big effect
- Designer can rarely change the voltage freely
- Decreasing f and Vdd together saves energy
- Decreasing Vdd increases delay of transistors (= t_p)
Fig: Inverter delay t_p (normalized) as a function of VDD (V)
J.M. Rabaey, A. Chandrakasan, B. Nikolic, slide set for book “Digital Integrated Circuits: A Design Perspective”, 2002, http://bwrc.eecs.berkeley.edu/IcBook/Slides/chapter5.ppt
Voltage vs. Frequency vs. Power
J. Wei, C. Rowen, Implementing low-power configurable processors - practical options and tradeoffs, Design Automation Conference, 2005. Proceedings. 42nd, 13-17 June 2005, pp. 706-711
Power vs. energy (NOTE! Very important!)
- Batteries store energy, not power
- Power measures rate of energy consumption
- Energy (E = P * t) saving is often the real goal!
- Decreasing f increases t
  - Frequency scaling alone does not decrease energy
- Execution time t is usually constrained
S. Dey, Design of Low-Power, Battery-Efficient Systems, ECE206C course material, UCSD, 2004.
PDP and EDP
- Power-Delay Product (PDP) = P * t
  - avg. energy consumed per switching event
  - Watt * sec = Joule
- Energy-Delay Product (EDP) = PDP * t
  - avg. energy multiplied by execution time
  - Takes into account the trade-off between increased delay and lower energy/operation
Fig: normalized power, delay, power*delay and energy*delay as a function of supply voltage [V]; min PDP @ 1.3 V, min EDP @ 1.7 V
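The two metrics can rank the same design points differently. The sketch below computes PDP and EDP for two hypothetical operating points (the power and delay values are invented for illustration and are not read from the figure): the slower, lower-power point wins on PDP, while the faster point wins on EDP because delay is counted twice.

```python
def pdp(power_w, delay_s):
    """Power-delay product: energy per operation, in joules."""
    return power_w * delay_s


def edp(power_w, delay_s):
    """Energy-delay product: PDP multiplied by the delay once more."""
    return pdp(power_w, delay_s) * delay_s


# Hypothetical operating points: (power in W, delay in s).
fast = (2.0e-3, 1.0e-9)   # higher supply voltage: more power, less delay
slow = (0.5e-3, 3.0e-9)   # lower supply voltage: less power, more delay

for name, (p, t) in (("fast", fast), ("slow", slow)):
    print(f"{name}: PDP = {pdp(p, t):.2e} J, EDP = {edp(p, t):.2e} Js")
```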
Low-power design at system level
Importance of design level
- Applies to all design decisions, not just power
Fig: abstraction level vs. effect
Alexander Worm, Algorithm Manipulation for Low-Power Communication Circuit Implementation, Tampere SoC, Nov. 20 2001. Who copied the figure from:
Power reduction methods
Fig: power reduction methods rated by importance in future (ratings from -- to +)
Barry Pangrle, Panel discussion “It's About Power - Performance and area alone don't quite cut it anymore!”, DATE 2/14/2005.
Methods
Implementing low-power configurable processors - practical options and tradeoffs, Wei, J.; Rowen, C.; Design Automation Conference, 2005. Proceedings. 42nd,13-17 June 2005 Page(s):706 - 711
Choose the right implementation
- Point solutions are of course most efficient w.r.t. power
  - Reduced flexibility
- Large differences: 9x-1075x!
- Pay attention to ratio active_P/idle_P
  - Poor ratio (small) in general-purpose devices
[Mayo, Ranganathan, Energy consumption in mobile devices..., HPL-2003-167, 2003]
Power consumption in various applications
Device              Unit  email         MP3                Web     notes         messaging     idle    Ratios
                          rcv    reply  speaker/headphone  browse  text   audio  text   audio          max/min  min/idle  max/idle
laptop              [W]   15.16  16.25  18.02 / 15.99      16.55   14.2   14.65  14.4   15.5   13.975  1.27     1.02      1.29
handheld            [W]   1.386  1.439  2.091 / 1.7        1.742   1.276  1.557  1.319  -      1.2584  1.64     1.01      1.66
cellphone           [mW]  539    472                               78            392    1147   26      14.71    3.00      44.12
email pager         [mW]  92     72                                                            13      1.28     5.54      7.08
high-end MP3        [mW]                2977                                                   1884    1.00     1.58      1.58
low-end MP3         [mW]                327                                                    143     1.00     2.29      2.29
voice recorder      [mW]                                                  166                  17      -        9.76      9.76
ratio = laptop/min  -     164.8  225.7  8.6 / 48.9         9.5     182.1  88.3   36.7   13.5   1075.0  -        -         -
System level: Choose the right digital architecture (1)
Fig: flexibility vs. power dissipation for different architectures: µP with program memory, embedded processor (IpArm), DSP (TI C6xxx) with MAC unit and address generator, reconfigurable processors (Maia), embedded FPGA, direct mapped hardware. Efficiency labels in the figure: 0.5-5 MIPS/mW, 10-100 MOPS/mW, 100-1000 MOPS/mW. Factor of 100-1000.
Gary Kelson, BWRC Overview, June 2002.
System level: Choose the right digital architecture (2)
[Jan Rabaey, System-on-a-chip: A case for heterogeneous architecture, Tampere Soc, 1999].
Reminder: opt for locality
compare
Mattan Erez, Stream Architectures - Programmability and Efficiency, Tampere SoC, Nov. 17 2004
Will new technologies minimize heat?
- What next?
[H. Harrer, G.A. Katopis, W. Becker, From chips to systems via packaging - A comparison of IBM's mainframe servers, IEEE Circuits and Systems, Vol. 6, Iss. 4, 2006, pp. 32-41.]
Methods for dynamic power management
Full speed is not required all the time
- Running at full speed wastes energy
- The workload is NOT constant
  - But hard to forecast at design-time
- Adapt dynamically
L.A. Barroso, U. Holzle, The Case for Energy-Proportional Computing, Computer, Vol. 40, Iss. 12, 2007, pp. 33-37
Full speed is not required all the time (2)
- Servers do use dynamic management
  - Far from ideal
- Energy seems to be way too cheap
Fig labels: Ideal power, Wasted power
Dynamic power management (DPM)
- Configures electronic systems at run-time to provide required performance with minimal activity
- Applicable if components experience non-uniform workload
  - Predictable periods of activity and idleness
  - e.g. simple timeout policy in laptops shuts down components if they have been idle for a certain period
- Power manageable component (PMC) has two or more modes of operation
  a) High performance and power consumption
  b) Low performance and power
  - Usually the number of modes is very limited
Power control unit
- Determines when FU is shut down
- Rule of thumb: pow_ctrl area max 1/10 FU area
- Power consumption of power control must be smaller than resulting savings
- Either shut down the Clk and/or Vdd
  - State is lost if Vdd is shut down
  - Internal idleness requires keeping state
- Shutdown and recovery have non-negligible delays
  - Shutting down is not beneficial if sleep state is short
  - Hard to determine the optimal timing
  - Performance loss due to recovery time
Fig: a) Power supply shut-down (power ctrl), b) Clock gating principle (power ctrl, latch, clk_disable, Clk)
Isolating the shutdown units
- Isolate sleeping unit from its neighbors
- Drive control and status signals into inactive state
  - E.g. sleeping FIFO appears full and others won't try to write data to it
  - At the same time, sleeping FIFO appears empty and others cannot read from it
- Sometimes all output signals must be frozen during sleep (even if power is cut)
Fig: active PRODUCER, sleeping FIFO, active CONSUMER; signals data, we, full = '1', valid = '0', re, clk; power ctrl drives enable = '0'
Power supply shutdown
- Cut off power supply with switch
  - Removes also P_leakage
  - Switch has non-ideal delay T and resistance R
- Physical placement and layout important due to transient noise
- Volatile memories (RAM, flip-flops) lose their state
  - Must be saved elsewhere and restored
  - Longer shutdown/wakeup delays
- Utilized at coarse grain, not with small units inside the chip
Power supply shutdown example
http://failblog.files.wordpress.com/2009/06/fail-owned-conservation-win.jpg
Clock gating
- Clock of FU is shut down
  - Needs latch and AND gate to avoid glitches
  - Adds clock skew
- Input values do not propagate through input registers
  - No switching inside FU
- Relatively simple and low-overhead control
  - Automatic clock gating supported in CAD tools
  - Fine-grained control, even at the level of a couple of DFFs
Clock gating (2)
- Well suited for self-managed components
- Clock distribution network itself consumes energy
  - Highly active (K=1)
  - Large net = large capacitance
  - Stop master clock PLL or oscillator
  - However, most energy consumed by local clocks
  - GALS approach may help since a large global clock network may be split into several small clock networks
Clock power in Pentium
- 30% of the total power is attributed to clock
- Most of the clock power is used in the final clock buffers and flip-flops
S. Rusu, Tampere Soc 2004.
Reducing clock network power
S. Rusu, Tampere Soc 2004.
- In GALS, all local clocks can be optimized separately
Dynamic Frequency/Voltage Scaling (DVS / DFS)
- DFS: frequency is changed at runtime
- DVS: both frequency and voltage are changed at runtime
Fig: power as a function of time for three cases
- Orig: E(orig) = t * P
- DFS (f/2): E(DFS) = 2t * P/2 = E(orig)
- DVS (f/2, 0.71*Vdd): E(DVS) = 2t * P/4 = E(orig)/2
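The three cases above follow directly from P being proportional to Vdd^2 * f and t being proportional to 1/f. A minimal sketch, assuming idle power is zero and using a hypothetical helper name `task_energy`:

```python
def task_energy(p_orig, t_orig, f_scale, v_scale):
    """Energy of one task after scaling frequency and supply voltage.

    Assumes P ~ Vdd^2 * f and execution time ~ 1 / f (no idle power).
    """
    power = p_orig * v_scale ** 2 * f_scale
    time = t_orig / f_scale
    return power * time


e_orig = task_energy(1.0, 1.0, f_scale=1.0, v_scale=1.0)
e_dfs = task_energy(1.0, 1.0, f_scale=0.5, v_scale=1.0)    # frequency only
e_dvs = task_energy(1.0, 1.0, f_scale=0.5, v_scale=0.71)   # frequency and voltage

print(f"E(DFS) / E(orig) = {e_dfs / e_orig:.2f}")
print(f"E(DVS) / E(orig) = {e_dvs / e_orig:.2f}")
```

The frequency factor cancels out of the energy, which is why DFS alone leaves the energy per task unchanged and only the voltage term (0.71^2, about 0.5) gives a saving.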
DVS/DFS: Idle power is not zero!
- More realistic example: Tasks are initiated with 16 cycle interval
  - measured as original cycles
- Total energy = active + idle energy
  - P(idle) > 0 W

      P      P       Cycle  # Cycles  # Cycles  Energy  Energy  Energy  Saving
      (act)  (idle)  time   (act)     (idle)    (act)   (idle)  (tot)   %
Orig  1.0    0.2     1.0    8.0       8.0       8.0     1.6     9.6     -
DFS   0.5    0.2     2.0    8.0       0.0       8.0     0.0     8.0     16.7
DVS   0.3    0.2     2.0    8.0       0.0       4.8     0.0     4.8     50.0
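The table rows can be recomputed from the per-cycle powers and the 16-cycle task period. The sketch below is one way to do that; the function name `period_energy` is an assumption, while the numbers are the ones in the table.

```python
def period_energy(p_act, p_idle, cycle_time, active_cycles, period=16.0):
    """Energy over one task period, with time measured in original cycles."""
    t_active = active_cycles * cycle_time
    t_idle = period - t_active
    return p_act * t_active + p_idle * t_idle


orig = period_energy(p_act=1.0, p_idle=0.2, cycle_time=1.0, active_cycles=8)
dfs = period_energy(p_act=0.5, p_idle=0.2, cycle_time=2.0, active_cycles=8)
dvs = period_energy(p_act=0.3, p_idle=0.2, cycle_time=2.0, active_cycles=8)

saving_dfs = 100.0 * (1.0 - dfs / orig)
saving_dvs = 100.0 * (1.0 - dvs / orig)

print(f"Orig {orig:.1f}, DFS {dfs:.1f} ({saving_dfs:.1f} %), "
      f"DVS {dvs:.1f} ({saving_dvs:.1f} %)")
```

With non-zero idle power DFS does save some energy here, but only because it removes the idle time; the active energy is still 8.0 in both the Orig and DFS rows.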
Controlling DPM
Power states
- Low power states have
  - lower performance
  - longer transition latency
Power-managed systems
- Observer collects workload information
- Controller forces transitions between power states
- In large networked systems, observations and controls cannot be centralized
  - Problematic
Power state machine
- Trivial greedy policy can be used if transitions are instantaneous and consume no power
  - Not a realistic assumption
- Returning from power-down mode requires
  1. turning on and stabilizing power supply
  2. reinitializing system
  3. restoring context
  → non-negligible delay and energy
- Tolerated performance degradation must be explicitly stated
  - Max power saving when device is not designed at all. However, performance loss is 100%...
Power state machine (2)
- StrongARM SA-1100 (cf. fig 1) can be modeled with FSM below
- Transition between RUN and IDLE is so fast that
  - Greedy policy applicable
  - they can be combined into single state ON (self-managed with greedy policy)
  - P_ON is weighted sum of P_RUN and P_IDLE
- OFF corresponds to state Sleep
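One plausible reading of "weighted sum" is a time-weighted average over the combined ON state. The sketch below makes that assumption explicit; the function name, the power values, and the run fraction are illustrative only and are not taken from the slides.

```python
def combined_on_power(p_run, p_idle, run_fraction):
    """Average power of the merged ON state.

    Assumes the weights are the fractions of time spent in RUN and IDLE.
    """
    return run_fraction * p_run + (1.0 - run_fraction) * p_idle


# Illustrative values: 400 mW in RUN, 50 mW in IDLE, running 30 % of the time.
p_on = combined_on_power(p_run=400.0, p_idle=50.0, run_fraction=0.3)
print(f"P_ON = {p_on:.1f} mW")
```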
Break-even time (1)
- No computation possible during state transition → performance loss
- Break-even time T_BE for inactive state is the minimum inactivity time required to compensate the cost of state transition(s)
  - Cost depends on transition times and power consumption(s)
- If inactive time T_n < T_BE, it is not beneficial to enter inactive state because cost is not compensated
Break-even time (2)
- In the simple case, T_BE is the sum of the times for entering and exiting the state
  - Assuming that state transition does not increase power consumption (like it does with hard drives)
- Multiple power states result in multiple break-even time values
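When the transition does draw more power than the ON state, the break-even time can be obtained from an energy balance: staying on for T_BE must cost as much as transitioning (T_TR at P_TR) and then resting in the low-power state for the remainder. The sketch below implements that balance; the function names and the example numbers are hypothetical.

```python
def break_even_time(t_enter, t_exit, p_transition, p_on, p_off):
    """Minimum idle time for which entering the low-power state pays off.

    Energy balance: p_on * T_BE = p_transition * T_TR + p_off * (T_BE - T_TR),
    never shorter than the transition time T_TR itself.
    """
    t_tr = t_enter + t_exit
    if p_transition <= p_on:
        # Simple case: transitions do not increase power consumption.
        return t_tr
    return t_tr + t_tr * (p_transition - p_on) / (p_on - p_off)


def worth_sleeping(idle_time, t_be):
    """True if the idle period is long enough to compensate the transition."""
    return idle_time >= t_be


# Hypothetical device: 1 s to enter, 1 s to exit, 2 W on, 0.5 W off.
t_be_cheap = break_even_time(1.0, 1.0, p_transition=1.5, p_on=2.0, p_off=0.5)
t_be_costly = break_even_time(1.0, 1.0, p_transition=3.0, p_on=2.0, p_off=0.5)
print(t_be_cheap, round(t_be_costly, 2), worth_sleeping(3.0, t_be_costly))
```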
Applicability of DPM
- One can calculate the max power saving
  - P_saved,max = P_on - P_ideal
  - where P_ideal refers to P with ideal DPM
- States with small T_BE are more likely applicable, e.g. StrongARM
  - T_BE,idle = 0.02 ms
  - T_BE,sleep = 169.09 ms
  - Idle state can be entered much more often
Fig. Example of ideal DPM policy (workload known a priori, and hence wakeups always on time and no performance loss). Traces over time: Workload, Power with Idle, Power with Sleep; marker T_BE,idle / 2
Applicability of DPM (2)
- In most cases, workload is not known a priori and DPM reacts to the changes
  - Sometimes workload is known, e.g. sampling sensor values once per second and processing them
Fig. Difference between ideal and realistic DPM policies. Traces: Workload; Power with Idle, ideal DPM; Power with Idle, real DPM
- Real DPM: start waking up when computation is needed. Processing gets delayed
- Real DPM: entering idle state gets delayed until all computation has been done
Prediction
- In the real world, little or no information is available about future inputs
  - Must predict
- Overprediction/Underprediction
  - Predicted idle period longer/shorter than actual
- Overprediction causes performance loss
  - Not enough time for wakeup
- Underprediction consumes unnecessary power
  - Low power mode not always entered
Prediction methods
a) Fixed timeout
  - When elapsed idle time is longer than threshold, enter low-power mode
  - Big threshold increases performance and power
  - Wastes power when waiting for timeout
  - Performance loss upon wakeup
b) Predictive shutdown
  - Predict idle time from duration of past idle and active periods
    - No automatic way to decide regression equation
    - Offline data collection required
  - Predict idle time from last active period
    - Short active periods are usually followed by long idle periods
    - Offline data required
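The fixed timeout policy in a) can be sketched as a small energy model over a trace of idle periods. Everything below is a simplified assumption for illustration: the function name, the lumped wakeup energy `e_wakeup`, and the trace values are hypothetical, and the wakeup delay (the performance loss) is not modeled.

```python
def fixed_timeout_energy(idle_periods, timeout, p_on, p_off, e_wakeup):
    """Energy spent during idle periods under a fixed timeout policy.

    The device waits `timeout` at p_on; if the idle period lasts longer,
    it spends the rest at p_off and pays a lumped wakeup energy.
    Returns (total energy, number of shutdowns).
    """
    energy = 0.0
    shutdowns = 0
    for idle in idle_periods:
        if idle > timeout:
            energy += p_on * timeout + p_off * (idle - timeout) + e_wakeup
            shutdowns += 1
        else:
            energy += p_on * idle
    return energy, shutdowns


trace = [1.0, 5.0, 20.0]  # hypothetical idle period lengths in seconds
energy, shutdowns = fixed_timeout_energy(trace, timeout=4.0,
                                         p_on=1.0, p_off=0.1, e_wakeup=2.0)
always_on = sum(trace) * 1.0
print(f"timeout policy: {energy:.1f} J with {shutdowns} shutdowns, "
      f"always on: {always_on:.1f} J")
```

The 4 s spent waiting for the timeout in each long idle period is the wasted power mentioned above; a smaller threshold reduces it but triggers more shutdowns and wakeups.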
Prediction methods (2)
c) Predictive wakeup
  - To reduce performance loss
  - When elapsed time in low-power mode is longer than threshold, start wakeup procedure
  - Increases power if idle period is longer than predicted
d) Adaptive methods change threshold at runtime
  - E.g. use several timeout values and measure how well they perform
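One simple way to realize the adaptive idea in d) is to score a handful of candidate timeouts against the most recently observed idle periods and use the cheapest one next. The sketch below is only an illustration under assumed names and parameter values; it scores energy alone and ignores the performance loss caused by wakeups.

```python
def idle_energy(idle, timeout, p_on, p_off, e_wakeup):
    """Energy of one idle period under a given timeout (lumped wakeup cost)."""
    if idle > timeout:
        return p_on * timeout + p_off * (idle - timeout) + e_wakeup
    return p_on * idle


def pick_timeout(recent_idle_periods, candidates,
                 p_on=1.0, p_off=0.1, e_wakeup=2.0):
    """Return the candidate timeout with the lowest energy on recent history."""
    def cost(timeout):
        return sum(idle_energy(idle, timeout, p_on, p_off, e_wakeup)
                   for idle in recent_idle_periods)
    return min(candidates, key=cost)


candidates = [0.5, 2.0, 8.0]
# Interactive phase: short idle periods favor a longer timeout.
short_choice = pick_timeout([1.0, 1.0, 2.0, 1.0], candidates)
# Quiet phase: long idle periods favor shutting down quickly.
long_choice = pick_timeout([30.0, 40.0, 25.0], candidates)
print(short_choice, long_choice)
```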
Selecting T_BE in fixed timeout
- Interactive programs (e.g. games) have shorter idle periods
- With high break-even time, low-power mode is seldom used
Fig: Plot of P_saved (mW) as a function of T_BE for the Sleep state of the StrongARM SA-1100 processor. The three curves refer to three different workload statistics, computed from real-world CPU traces provided by the IPM monitoring package [5].
Safety vs efficiency
- Safety means probability of avoiding performance loss → there's always some loss
- Efficiency means proportion of achieved power saving from ideal saving
Fig: Quality of a timeout-based predictor evaluated as a function of timer duration. Safety and efficiency of the timeout used to predict idle periods longer than T=160 ms.
Predictive shutdown
- Example threshold (i.e. active periods shorter than this are likely followed by long idle time)
- L-shape is necessary condition for prediction
Fig. 7. (a) Scatter plot of T_idle versus T_active for the workload of the CPU of a personal computer running Linux.
ACPI
- Advanced Configuration and Power Interface
  - by Intel, Microsoft and Toshiba
- Defines interfaces between OS and HW
- Targets personal computers (PCs)
ACPI (2)
- System has 4 global power states
  - G0 = ON, G3 = OFF
  - Additional state legacy if devices don't support ACPI
- State G1 (sleeping) divided into 4 sub-states
- State G0 (ON) divided into 4 device states and 4 processor states
Fig labels: "OFF" = max wakeup time, "ON" = min wakeup time
Case: ACPI with hard disk
- Power management SW takes <1% of time
- Wakeup power (52.5 J / 7 s) is larger than active power
  - Due to inertia when disks start rotating
- Break-even time 17.6 sec
- Power reduction 23-55%
Fig: states "active", "idle, disks rotating", "OFF"
Case: Results
Fig: results table; T_break-even [sec]: 17.6, 5.4
Conclusion
- Power saving does not necessarily save energy
- Basic methods
  - Low-power technology, signal reordering
    - Minor effect, not interesting
  - Voltage scaling, power shutdown
    - Difficult, voltage levels CANNOT be freely chosen
  - Frequency scaling
    - Must not sacrifice performance too much
    - Does not affect energy/task if used alone
  - Clock gating, enabled flip-flops
    - Reasonable way for energy saving
    - Supported by CAD tools
Conclusion (2)
- Several power/performance modes needed
  - Modes have different break-even times
- Policy defines the current operating mode
  - Static
  - Adaptive
- Policy decisions based on
  - off-line data
  - monitoring
Extra
Glitch Minimization
- Low-level technique
- Glitches may add 20% to power [Raghunathan, DAC96]
- Raghunathan et al. suggest RTL modifications to decrease glitches
  - Stop glitch propagation (e.g. with registers)
  - Glitch generation (due to uneven gate delays) not considered
- Important to avoid glitches in control signals
Shutting down units
- Unit is idle when its output values are not needed
  - Hence, it can be shut down
a) External idleness: changes in the unit's outputs are not visible in system outputs
  - Output of ADD is don't care
b) Internal idleness: the unit's outputs do not change even if the unit's inputs change
  - State-holding required
- Not practical to detect all idle conditions
  - Too large overhead
  - Detect most common
Fig: Functional unit (FU) with ADD and SUB feeding a multiplexer (inputs 0, 1, 2). a) ADD is externally idle (select = '1' or '2'), b) FU is internally idle (select = '2')
Stochastic methods
- Take into account uncertainty in workload, power consumption, and response times
  - many power states, buffers, queues etc.
- Offer controlled trade-off between performance and power
- Controlled Markov chains
  - Service requester (SR) models workload
  - Service provider (SP) models power modes
  - Power manager implements commands for SP
  - Cost metric combines power and performance
Stochastic methods (2)
- State transitions have probabilities
- Bursty workload in Fig 9a)
  - High probability (0.85) for several requests in a row
  - Average request stream 1/(1-0.85) = 6.7 requests
  - 0 = no request/workload
  - 1 = request issued
Fig: SR = workload, SP = power modes
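The average burst length of 1/(1-0.85) is the mean of a geometric distribution: once a request is issued, another follows with probability 0.85. The sketch below compares the closed form with a small seeded simulation of such a two-state service requester; the function names and the simulation setup are assumptions made for illustration.

```python
import random


def mean_burst_length(p_stay):
    """Expected number of consecutive requests when each is followed by
    another with probability p_stay (mean of a geometric distribution)."""
    return 1.0 / (1.0 - p_stay)


def simulate_mean_burst(p_stay, n_bursts, seed=0):
    """Estimate the mean burst length by simulating n_bursts bursts."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_bursts):
        length = 1                      # a burst starts with one request
        while rng.random() < p_stay:    # stay in the "request" state
            length += 1
        total += length
    return total / n_bursts


analytic = mean_burst_length(0.85)
estimated = simulate_mean_burst(0.85, n_bursts=20000)
print(f"analytic {analytic:.2f}, simulated {estimated:.2f}")
```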
Stochastic methods (3)
- Power mode is changed with commands switch_ON and switch_OFF
- Transition probabilities model the transition delay
  - Even if switch_OFF is issued, transition does not occur immediately
- Advantages
  - Possible to search global optimum
  - Exact solution in polynomial time
  - Strength and optimality of randomized policies
- Note
  - Performance and power are expected values, no guarantees given
  - Hard to obtain accurate Markov models