ELEC 5270/6270 Spring 2011, Lecture 14: Low-Power Design of Electronic Circuits, Power Aware Microprocessors. Copyright Agrawal, 2007.


ELEC 5270/6270 Spring 2011
Low-Power Design of Electronic Circuits

Power Aware Microprocessors

Vishwani D. Agrawal
James J. Danaher Professor

Dept. of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849

http://www.eng.auburn.edu/~vagrawal/COURSE/E6270_Spr11/course.html


SIA Roadmap for Processors (1999)

Year                         1999   2002   2005   2008   2011   2014
Feature size (nm)             180    130    100     70     50     35
Logic transistors/cm²        6.2M    18M    39M    84M   180M   390M
Clock (GHz)                  1.25    2.1    3.5    6.0   10.0   16.9
Chip size (mm²)               340    430    520    620    750    900
Power supply (V)              1.8    1.5    1.2    0.9    0.6    0.5
High-perf. power (W)           90    130    160    170    175    183

Source: http://www.semichips.org

Untrue predictions.


Power Reduction in Processors

Hardware methods:
- Voltage reduction for dynamic power
- Dual-threshold devices for leakage reduction
- Clock gating, frequency reduction
- Sleep mode

Architecture:
- Instruction set
- Hardware organization

Software methods


Performance Criteria

- Throughput: computations per unit time.
- Performance is the inverse of time: an increase in CPU time indicates lower performance.
- Power: computations per watt.
- Energy efficiency: performance per joule.


SPEC CPU2006 Benchmarks

- Standard Performance Evaluation Corporation (SPEC), http://www.spec.org
- Twelve integer and 17 floating-point programs, CINT2006 and CFP2006.
- Each program's run time is normalized to obtain a SPEC ratio with respect to the run time of a Sun Ultra Enterprise 2 system with a 296 MHz UltraSPARC II processor.
- It takes about 12 days to run all benchmarks on the reference system.
- CINT2006 and CFP2006 metrics are the geometric means of SPEC ratios:
  - Peak metric: each program is individually optimized (aggressive compilation).
  - Base metric: common optimization for all programs.
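A minimal Python sketch of how the summary metric combines per-program results, assuming the SPEC ratio is the reference run time divided by the measured run time. The run times and both sets of reference times below are hypothetical, not SPEC data; the sketch shows that the ratio between two machines' geometric means does not depend on which reference times are used.

```python
import math

def spec_ratio(ref_time_s: float, run_time_s: float) -> float:
    """SPEC ratio: reference-system run time divided by measured run time."""
    return ref_time_s / run_time_s

def geometric_mean(values) -> float:
    """Geometric mean, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def spec_metric(ref_times, run_times) -> float:
    """Summary metric: geometric mean of the per-program SPEC ratios."""
    return geometric_mean([spec_ratio(r, t) for r, t in zip(ref_times, run_times)])

# Hypothetical run times (seconds) of three programs on two machines.
machine_x = [100.0, 400.0, 200.0]
machine_y = [200.0, 200.0, 400.0]

# Two different (hypothetical) sets of reference times.
for ref in ([1000.0, 2000.0, 4000.0], [10.0, 10.0, 10.0]):
    x = spec_metric(ref, machine_x)
    y = spec_metric(ref, machine_y)
    print(f"X = {x:.4g}  Y = {y:.4g}  X/Y = {x / y:.4f}")
```

With either reference, X/Y comes out as 2^(1/3), about 1.26, because the reference times cancel in the ratio of geometric means. An arithmetic mean of ratios would not have this property.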


SPEC CINT2006 Results
http://www.spec.org/cpu2006/results/cint2006.html

Dell Inc., PowerEdge R610
CPU: Intel Xeon X5670, 2.93 GHz
Number of chips 2, cores 12, threads/core 2
Performance metric 36.6 base, 39.4 peak

Dell Inc., PowerEdge M905
CPU: AMD Opteron 8381 HE, 2.50 GHz
Number of chips 4, cores 16, threads/core 1
Performance metric 15.8 base, 19.1 peak


SPEC CFP2006 Results
http://www.spec.org/cpu2006/results/cfp2006.html

Dell Inc., PowerEdge R610
CPU: Intel Xeon X5670, 2.93 GHz
Number of chips 2, cores 12, threads/core 2
Performance metric 42.5 base, 45.8 peak

Dell Inc., PowerEdge M905
CPU: AMD Opteron 8381 HE, 2.50 GHz
Number of chips 4, cores 16, threads/core 1
Performance metric 17.4 base, 21.5 peak


Other Benchmarks

- LINPACK is a numerically intensive floating-point linear system (Ax = b) program used for benchmarking supercomputers.
- SPECpower_ssj2008 measures the power and performance of a computer system.
  - The initial benchmark addresses the performance of server-side Java; additional workloads are planned.
  - http://www.spec.org/benchmarks.html#power


Second Quarter 2010 SPECpower_ssj2008 Results
http://www.spec.org/power_ssj2008/results/res2010q2/

Apr 7, 2010: Hewlett-Packard ProLiant DL385 G7
CPU: AMD Opteron 6174, 2.2 GHz
Number of chips 2, cores 12, threads/core 2
Total memory 16 GB
ssj operations @ 100%: 888,819
Average power @ 100%: 271 W
Average power @ active idle: 101 W
Overall ssj operations per watt: 2,355


Second Quarter 2010 SPECpower_ssj2008 Results
http://www.spec.org/power_ssj2008/results/res2010q2/

May 19, 2010: Dell Inc., PowerEdge R610
CPU: Intel Xeon X5670, 2.93 GHz
Number of chips 2, cores 12, threads/core 2
Total memory 12 GB
ssj operations @ 100%: 914,076
Average power @ 100%: 244 W
Average power @ active idle: 62.3 W
Overall ssj operations per watt: 2,938
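The overall figure is not simply operations per watt at full load. A rough Python sketch, assuming (as we understand the SPECpower_ssj2008 rules) that the overall metric is the sum of ssj_ops over all measured load levels divided by the sum of average power over those levels plus active idle. Only the 100% points come from the two results above; the three-level run is hypothetical.

```python
def ops_per_watt(ssj_ops: float, avg_power_w: float) -> float:
    """Throughput per watt at a single load level."""
    return ssj_ops / avg_power_w

def overall_ops_per_watt(levels) -> float:
    """Aggregate metric: total ssj_ops divided by total average power.

    `levels` is a list of (ssj_ops, avg_power_w) pairs, one per measured
    load level, with active idle included as (0, idle_power_w).
    """
    total_ops = sum(ops for ops, _ in levels)
    total_power = sum(power for _, power in levels)
    return total_ops / total_power

# 100%-load figures from the two results above.
hp_dl385 = ops_per_watt(888_819, 271)
dell_r610 = ops_per_watt(914_076, 244)
print(round(hp_dl385), round(dell_r610))

# Hypothetical run: full load, half load, active idle.
print(overall_ops_per_watt([(1000, 200), (500, 150), (0, 100)]))
```

At 100% load the two servers reach roughly 3,280 and 3,746 ssj_ops/W, above their overall scores of 2,355 and 2,938, which is consistent with lower load levels and active idle delivering fewer operations per watt.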


Energy SPEC Benchmarks

Energy efficiency mode: besides the execution time, the energy efficiency of SPEC benchmark programs is also measured. The energy efficiency of a benchmark program is given by:

$$\text{Energy efficiency} = \frac{1/(\text{Execution time})}{\text{Joules consumed}}$$

D. A. Patterson and J. L. Hennessy, Computer Organization & Design: The Hardware/Software Interface, 4th Edition, Morgan Kaufmann Publishers (Elsevier), 2009.
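Because the energy consumed equals average power times execution time, this metric works out to 1/(P × T²), so it rewards speed strongly. A minimal Python sketch with hypothetical power and time figures for two machines running the same program:

```python
def energy_efficiency(exec_time_s: float, energy_j: float) -> float:
    """Energy efficiency = (1 / execution time) / joules consumed."""
    return (1.0 / exec_time_s) / energy_j

# Hypothetical machines running the same program (energy = power x time).
fast = energy_efficiency(exec_time_s=10.0, energy_j=100.0 * 10.0)  # 100 W for 10 s
slow = energy_efficiency(exec_time_s=15.0, energy_j=60.0 * 15.0)   # 60 W for 15 s
print(f"fast: {fast:.3e}  slow: {slow:.3e}")
```

The faster machine scores higher (1.0e-4 versus about 7.4e-5) even though it consumes 1,000 J against 900 J.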


Energy Efficiency

Efficiency averaged over n benchmark programs:

$$\text{Efficiency} = \left( \prod_{i=1}^{n} \text{Efficiency}_i \right)^{1/n}$$

where $\text{Efficiency}_i$ is the efficiency for program $i$.

Relative efficiency:

$$\text{Relative efficiency} = \frac{\text{Efficiency of a computer}}{\text{Efficiency of reference computer}}$$
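A short Python sketch of the averaging and normalization above; the per-program efficiencies are hypothetical numbers in arbitrary units. It also checks that dividing the two geometric means gives the same result as taking the geometric mean of the per-program ratios.

```python
import math

def mean_efficiency(efficiencies) -> float:
    """Geometric mean over n programs: (product of Efficiency_i) ** (1/n)."""
    n = len(efficiencies)
    return math.exp(sum(math.log(e) for e in efficiencies) / n)

def relative_efficiency(computer, reference) -> float:
    """Mean efficiency of a computer divided by that of the reference computer."""
    return mean_efficiency(computer) / mean_efficiency(reference)

# Hypothetical per-program efficiencies (arbitrary units) for two programs.
computer = [2.0, 8.0]
reference = [1.0, 2.0]
print(relative_efficiency(computer, reference))
print(mean_efficiency([c / r for c, r in zip(computer, reference)]))
```

Both lines print about 2.83 (the square root of 8).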


SPEC2000 Relative Energy Efficiency

[Figure: bar chart of relative energy efficiency (scale 0 to 6) on SPECINT2000 and SPECFP2000 for three Pentium processors (an energy-efficient processor, the reference processor, and a third model) under three operating modes: always max. clock, laptop adaptive clock, and min. power min. clock.]


Voltage Scaling

- Dynamic: reduce voltage and frequency during idle or low-activity periods.
- Static: clustered voltage scaling
  - Logic on non-critical paths is given a lower voltage.
  - A 47% power reduction with a 10% area increase has been reported.
  - M. Igarashi et al., “Clustered Voltage Scaling Techniques for Low-Power Design,” Proc. IEEE Symp. Low Power Design, 1997.
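A first-order Python estimate of why moving non-critical logic to a lower supply saves power, assuming dynamic power scales with V² and ignoring level converters, leakage, and the area cost. The fraction of switched capacitance moved (0.6) and the two supply voltages (1.5 V and 1.0 V) are hypothetical, and the result is not meant to reproduce the 47% reported by Igarashi et al.

```python
def cvs_dynamic_power_ratio(frac_low: float, v_high: float, v_low: float) -> float:
    """Dynamic power relative to running all logic at v_high.

    frac_low: fraction of switched capacitance moved to the low supply
    (hypothetical input). Assumes dynamic power proportional to C * V^2.
    """
    return (1.0 - frac_low) + frac_low * (v_low / v_high) ** 2

ratio = cvs_dynamic_power_ratio(frac_low=0.6, v_high=1.5, v_low=1.0)
print(f"dynamic power reduction: {100 * (1 - ratio):.0f}%")
```

Under these assumptions the estimate is a 33% reduction; the achievable fraction depends on how much logic has enough timing slack to tolerate the slower low-voltage gates.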


Processor Utilization

Throughput = Operations / second

[Figure: throughput versus time, showing bursts of compute-intensive processes at maximum throughput, low-throughput (background) processes, and periods when the system is idle.]


Examples of Processes

- Compute-intensive: spreadsheet, spelling check, video decoding, scientific computing.
- Low throughput: data entry, screen updates, low-bandwidth I/O data transfer.
- Idle: no computation, no expected output.


Effects of Voltage Reduction

- Voltage reduction increases delay and decreases throughput:
  - Slow reduction in throughput at first
  - Rapid reduction in throughput for $V_{DD} \le V_{th}$
  - Time per operation (TPO) increases
- Voltage reduction continues to reduce power consumption:
  - Energy per operation (EPO) = Power × TPO


Energy per Operation (EPO)

[Figure: normalized power, TPO, and EPO (vertical scale 0.0 to 1.0) plotted against $V_{DD}/V_{th}$ from 1 to 5.]


Dynamic Voltage and Clock

                         Time spent in:
Throughput               Fast mode   Slow mode   Idle mode   Battery life
Always full speed           10%          0%         90%        1 hr
Sometimes full speed         1%         90%          9%        5.3 hrs
Rarely full speed            0.1%       99%          0.9%      9.2 hrs

T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Springer, 2002, pp. 35-36.


Example: Find Minimum Energy Mode

Processor data (rated operation):
- 2 GHz clock
- 1.5 volt supply voltage
- 0.5 volt threshold voltage
- Power consumption:
  - 50 watts dynamic power
  - 50 watts static power

Maximum clock frequency for a V volt supply:

$$f \propto \frac{V - V_{TH}}{V}$$


Example Cont.

Dynamic power:

$$P_d = C V^2 f = C (1.5)^2 \times 2 \times 10^9 = 50\ \text{W}$$

C = 11.11 nF, capacitance switching per cycle

$$P_d = 11.11\, V^2 f$$

Dynamic energy per cycle (in nJ, with C in nF):

$$E_d = P_d / f = 11.11\, V^2$$
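The fitting step can be checked numerically. A minimal Python sketch using only the rated data above (1.5 V, 2 GHz, 50 W dynamic power) and the slide's model P_d = CV²f:

```python
V_RATED = 1.5        # volts
F_RATED = 2.0e9      # hertz
P_DYN_RATED = 50.0   # watts

# P_d = C V^2 f  =>  C = P_d / (V^2 f)
C_SWITCHED = P_DYN_RATED / (V_RATED ** 2 * F_RATED)

def dynamic_energy_per_cycle(v: float) -> float:
    """E_d = P_d / f = C V^2, in joules per cycle."""
    return C_SWITCHED * v ** 2

print(f"C = {C_SWITCHED * 1e9:.2f} nF")
print(f"E_d(1.5 V) = {dynamic_energy_per_cycle(1.5) * 1e9:.2f} nJ")
```

This gives C = 11.11 nF and 25.00 nJ per cycle at the rated voltage.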


Example Cont.

Clock frequency:

$$f = k \frac{V - V_{TH}}{V} = k \frac{1.5 - 0.5}{1.5} = 2\ \text{GHz}$$

k = 3 GHz, a proportionality constant

$$f = \frac{3 (V - 0.5)}{V}\ \text{GHz}$$


Example Cont.

Static power:

$$P_s = k' V^2 = k' (1.5)^2 = 50\ \text{W}$$

k' = 22.22 mho, total leakage conductance

$$P_s = 22.22\, V^2$$

Static energy per cycle:

$$E_s = \frac{P_s}{f} = \frac{22.22\, V^3}{3 (V - 0.5)} = \frac{7.41\, V^3}{V - 0.5}$$
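A Python sketch of the frequency and leakage fits, using the rated data (1.5 V supply, 0.5 V threshold, 2 GHz, 50 W static) and the slide's models f = k(V − V_TH)/V and P_s = k'V². At 1.5 V it gives 25 nJ per cycle, and at other voltages it matches 7.41 V³/(V − 0.5) nJ.

```python
V_RATED, V_TH = 1.5, 0.5      # volts
F_RATED = 2.0e9               # hertz
P_STATIC_RATED = 50.0         # watts

# f = k (V - V_th) / V, fitted at the rated point -> k = 3 GHz
K_FREQ = F_RATED * V_RATED / (V_RATED - V_TH)
# P_s = k' V^2, fitted at the rated point -> k' = 22.22 S (mho)
K_LEAK = P_STATIC_RATED / V_RATED ** 2

def max_frequency(v: float) -> float:
    """Maximum clock frequency (Hz) at supply voltage v."""
    return K_FREQ * (v - V_TH) / v

def static_energy_per_cycle(v: float) -> float:
    """E_s = P_s / f = k' V^3 / [k (V - V_th)], in joules per cycle."""
    return K_LEAK * v ** 2 / max_frequency(v)

for v in (1.5, 1.0, 0.7):
    print(f"V = {v:.2f} V  f = {max_frequency(v) / 1e9:.3f} GHz  "
          f"E_s = {static_energy_per_cycle(v) * 1e9:.2f} nJ")
```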


Example Cont.

Total energy per cycle:

$$E = E_d + E_s = 11.11\, V^2 + \frac{7.41\, V^3}{V - 0.5}$$

To minimize E, set $\partial E / \partial V = 0$:

$$\frac{\partial E}{\partial V} = 22.22\, V + \frac{7.41\, V^2 (2V - 1.5)}{(V - 0.5)^2} = 0$$

Multiplying by $(V - 0.5)^2 / (7.41\, V)$ and using $22.22 = 3 \times 7.41$:

$$3 (V - 0.5)^2 + V (2V - 1.5) = 5V^2 - 4.5V + 0.75 = 0$$

Solutions of the quadratic equation: V = 0.679 volt and V = 0.221 volt. Discard the second solution, which is lower than the threshold voltage of 0.5 volt.
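A Python check of the minimization, using the exact coefficients 100/9 ≈ 11.11 and 200/27 ≈ 7.41 behind the rounded values above. It solves the quadratic and, independently, minimizes E(V) with a ternary search (a generic numerical technique, not part of the original example) on the range above threshold, where E(V) has a single minimum.

```python
import math

A_DYN = 100.0 / 9.0    # 11.11: C in nF, so A_DYN * V^2 is E_d in nJ
B_STAT = 200.0 / 27.0  # 7.41: k'/k with k in GHz, so E_s comes out in nJ
V_TH = 0.5             # threshold voltage, V

def energy_nj(v: float) -> float:
    """Total energy per cycle E = 11.11 V^2 + 7.41 V^3 / (V - 0.5), in nJ."""
    return A_DYN * v ** 2 + B_STAT * v ** 3 / (v - V_TH)

# Roots of 5V^2 - 4.5V + 0.75 = 0 by the quadratic formula.
a, b, c = 5.0, -4.5, 0.75
disc = math.sqrt(b * b - 4 * a * c)
roots = sorted([(-b - disc) / (2 * a), (-b + disc) / (2 * a)])
v_star = max(r for r in roots if r > V_TH)   # discard the sub-threshold root

# Independent check: ternary search for the minimum of E(V) on (V_TH, 1.5].
lo, hi = V_TH + 1e-6, 1.5
for _ in range(200):
    m1 = lo + (hi - lo) / 3
    m2 = hi - (hi - lo) / 3
    if energy_nj(m1) < energy_nj(m2):
        hi = m2          # minimum lies to the left of m2
    else:
        lo = m1          # minimum lies to the right of m1
v_search = (lo + hi) / 2

print(f"roots: {roots[0]:.3f} V, {roots[1]:.3f} V")
print(f"quadratic: {v_star:.4f} V, search: {v_search:.4f} V, "
      f"E = {energy_nj(v_star):.2f} nJ")
```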


Example: Result

                         Rated mode   Low energy mode   Reduction (%)
Voltage                  1.5 V        0.679 V           54.7
Clock frequency          2 GHz        791 MHz           60.4
Dynamic energy/cycle     25.00 nJ     5.12 nJ           79.52
Static energy/cycle      25.00 nJ     12.96 nJ          48.16
Total energy/cycle       50.0 nJ      18.08 nJ          63.84
Dynamic power            50.0 W       4.05 W            91.90
Static power             50.0 W       10.25 W           79.50
Total power              100.0 W      14.30 W           85.70
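The table can be regenerated from the fitted model in a few lines of Python. The constants are the fits from the earlier steps of this example (C = 11.11 nF, k = 3 GHz, k' = 22.22 mho); the low-energy dynamic and static power rows sum to about 14.30 W.

```python
import math

# Model constants fitted to the rated point (1.5 V, 2 GHz, 50 W + 50 W).
V_TH = 0.5                    # threshold voltage, V
K_FREQ = 3.0e9                # f = K_FREQ * (V - V_TH) / V, Hz
C_SW = (100.0 / 9.0) * 1e-9   # switched capacitance per cycle, F (11.11 nF)
K_LEAK = 200.0 / 9.0          # leakage conductance, S (22.22 mho)

def mode(v: float, f: float) -> dict:
    """Energy per cycle (nJ) and power (W) at supply v and clock f."""
    e_dyn = C_SW * v ** 2           # dynamic energy per cycle, J
    p_stat = K_LEAK * v ** 2        # static power, W
    e_stat = p_stat / f             # static energy per cycle, J
    return {
        "Voltage (V)": v,
        "Clock frequency (MHz)": f / 1e6,
        "Dynamic energy/cycle (nJ)": e_dyn * 1e9,
        "Static energy/cycle (nJ)": e_stat * 1e9,
        "Total energy/cycle (nJ)": (e_dyn + e_stat) * 1e9,
        "Dynamic power (W)": e_dyn * f,
        "Static power (W)": p_stat,
        "Total power (W)": e_dyn * f + p_stat,
    }

v_low = (4.5 + math.sqrt(4.5 ** 2 - 4 * 5 * 0.75)) / (2 * 5)   # root above V_TH
rated = mode(1.5, 2.0e9)
low = mode(v_low, K_FREQ * (v_low - V_TH) / v_low)

for name in rated:
    reduction = 100.0 * (1.0 - low[name] / rated[name])
    print(f"{name:28s} {rated[name]:9.2f} {low[name]:9.2f} {reduction:7.2f}%")
```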


Problem of Process Variation in Nanometer Technologies

[Figure: number of chips versus V_th (lower V_th to higher V_th), with a power specification and a clock specification marking yield loss due to high leakage and yield loss due to slow speed; higher-voltage operation, lower-voltage operation, and nominal voltage are indicated.]

From a presentation: Power Reduction using LongRun2 in Transmeta's Efficeon Processor, by D. Ditzel, May 17, 2006.


Pipeline Gating

- A pipelined processor uses speculative execution.
  - Incorrect branch prediction results in pipeline stalls and wasted energy.
- Idea: stop fetching instructions if a branch hazard is expected:
  - If the count (M) of incorrect predictions exceeds a pre-specified number (N), then suspend instruction fetching for some k cycles.

Ref.: S. Manne, A. Klauser and D. Grunwald, “Pipeline Gating: Speculation Control for Energy Reduction,” Proc. 25th Annual International Symp. Computer Architecture, June 1998.
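A toy Python model of the gating rule as stated on this slide: count incorrect predictions M, and once M exceeds N, suspend fetch for k cycles. The cycle-level trace format, the helper name, and resetting M after each gating episode are assumptions of this sketch; the hardware described by Manne et al. may differ in its details.

```python
def gated_fetch_cycles(mispredicted, n_threshold, k_cycles):
    """Return the cycles in which instruction fetch is suspended.

    mispredicted[t] is True if an incorrect branch prediction is detected
    in cycle t. The counter M resets after each gating episode (assumption).
    """
    gated = []
    m_count = 0
    resume_at = 0                      # first cycle in which fetch may resume
    for cycle, miss in enumerate(mispredicted):
        if cycle < resume_at:
            gated.append(cycle)
        if miss:
            m_count += 1
        if m_count > n_threshold:      # M exceeds N: suspend fetch for k cycles
            resume_at = cycle + 1 + k_cycles
            m_count = 0
    return gated

trace = [False, True, False, True, True, False, False, False, False, False]
print(gated_fetch_cycles(trace, n_threshold=2, k_cycles=3))   # [5, 6, 7]
```

The third misprediction (cycle 4) pushes M past N = 2, so fetch is suspended for the next three cycles, saving the energy that wrong-path instructions would otherwise consume.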


Slack Scheduling

- Application: superscalar, out-of-order execution:
  - An instruction is executed as soon as the required data and resources become available.
  - A commit unit reorders the results.
- Delay the completion of instructions whose result is not immediately needed.
- Example of RISC instructions:
    add r0, r1, r2;     (A)
    sub r3, r4, r5;     (B)
    and r9, r1, r9;     (C)
    or  r5, r9, r10;    (D)
    xor r2, r10, r11;   (E)

J. Casmira and D. Grunwald, “Dynamic Instruction Scheduling Slack,” Proc. ACM Kool Chips Workshop, Dec. 2000.
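A simplified, static Python sketch of the idea: within this five-instruction window, an instruction has slack if no later instruction reads its destination register before it is overwritten. It assumes register renaming removes the WAR and WAW conflicts (for example, E writing r2 after A reads it); the helper names are hypothetical, and the scheme in the cited work detects slack dynamically in hardware.

```python
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    label: str
    op: str
    dest: str
    srcs: Tuple[str, ...]

def parse(line: str) -> Instr:
    """Parse 'add r0, r1, r2; (A)' into its label, opcode, dest and sources."""
    body, label = line.split(";")
    op, operands = body.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return Instr(label.strip(" ()"), op, regs[0], tuple(regs[1:]))

def slack_instructions(program):
    """Labels of instructions whose result no later instruction in the window reads."""
    result = []
    for i, ins in enumerate(program):
        needed = False
        for later in program[i + 1:]:
            if ins.dest in later.srcs:      # a later instruction consumes the result
                needed = True
                break
            if later.dest == ins.dest:      # overwritten before any read
                break
        if not needed:
            result.append(ins.label)
    return result

PROGRAM = """add r0, r1, r2; (A)
sub r3, r4, r5; (B)
and r9, r1, r9; (C)
or r5, r9, r10; (D)
xor r2, r10, r11; (E)"""

program = [parse(line) for line in PROGRAM.splitlines()]
print(slack_instructions(program))   # ['A', 'B', 'D', 'E']
```

Only C's result (r9) is consumed right away, by D, so C stays on the fast path while the others are candidates for delayed completion on low-power units.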


Slack Scheduling Example

[Figure: timing of instructions A to E under standard scheduling and under slack scheduling.]


Slack Scheduling

[Figure: block diagram showing the re-order buffer, a slack bit, scheduling logic, and low-power execution units.]


Clock Distribution H-Tree

[Figure: H-tree distributing the clock to N flip-flops.]
- Fanout: λ = 4
- Tree depth: $s = \log_{\lambda} N$
- Number of flip-flops: N


Clock Power

$$P_{clk} = C_L V_{DD}^2 f + \frac{C_L V_{DD}^2 f}{\lambda} + \frac{C_L V_{DD}^2 f}{\lambda^2} + \cdots = C_L V_{DD}^2 f \sum_{n=0}^{\text{stages}-1} \frac{1}{\lambda^n}$$

where $C_L$ = total load capacitance of N flip-flops and λ = constant fanout at each stage in the distribution network.

The clock consumes about 40% of total processor power, because
(1) the clock is always active, and
(2) it makes two transitions per cycle (α = 2).
(3) Clock gating is therefore useful: inhibit the clock to unused blocks.
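A Python sketch of this sum, with the number of stages computed as the smallest s for which λ^s ≥ N (this equals log_λ N when N is a power of λ, and is taken as at least one stage). The load capacitance, supply, frequency, and flip-flop count in the example are hypothetical.

```python
def tree_depth(n_flip_flops: int, fanout: int) -> int:
    """Smallest s with fanout**s >= N (log_fanout N for exact powers), at least 1."""
    depth, reach = 0, 1
    while reach < n_flip_flops:
        reach *= fanout
        depth += 1
    return max(depth, 1)

def clock_power(c_load_f: float, vdd: float, freq_hz: float,
                n_flip_flops: int, fanout: int = 4) -> float:
    """P_clk = C_L V^2 f * sum_{n=0}^{stages-1} 1/fanout**n, in watts."""
    stages = tree_depth(n_flip_flops, fanout)
    return c_load_f * vdd ** 2 * freq_hz * sum(fanout ** -n for n in range(stages))

# Hypothetical: 1 nF total flip-flop clock load, 1.5 V, 1 GHz, 64 flip-flops.
print(clock_power(1e-9, 1.5, 1e9, 64))
```

With 64 flip-flops (three stages) the network draws 1.3125 times the leaf power, about 2.95 W versus 2.25 W for the flip-flops alone; as the tree deepens, the factor approaches λ/(λ − 1) = 4/3.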


Properties of H-Tree

- Balanced clock skew.
- Small delay and power consumption.
- Requires fine-tuning for complex layout.


Clock Power and Delay

- Unit-size buffer or inverter delay = d
- Total dynamic power supplied to N flip-flops: $P = C_L V_{DD}^2 f$
- Total power consumption of the clock network:

Flip-flops, N    Clock network power    Clock delay
1                P                      d
4                P                      4d
16               1.25P                  8d
64               1.3125P                12d
256              1.328125P              16d
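The rows of this table follow from the geometric series for clock power. A Python sketch for λ = 4, assuming each buffer stage that drives λ loads contributes a delay of λd; the single flip-flop row (P, d) is the degenerate case of one unit buffer and is not generated here.

```python
def h_tree_row(levels: int, fanout: int = 4):
    """(N, clock network power in units of P, clock delay in units of d)."""
    n_ff = fanout ** levels
    power = sum(1 / fanout ** n for n in range(levels))   # 1 + 1/4 + 1/16 + ...
    delay = levels * fanout        # assumed: each stage drives `fanout` loads, delay fanout*d
    return n_ff, power, delay

for levels in range(1, 5):
    n, p, d = h_tree_row(levels)
    print(f"N = {n:4d}  power = {p:.6f} P  delay = {d} d")

print("limit as N grows:", 1 / (1 - 1 / 4))   # geometric-series limit, 4/3 P
```

Power per flip-flop saturates near 4/3 P while delay keeps growing linearly with the number of stages, which is why deep trees are acceptable for power but costly for latency.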


Clock Network Examples

                     Alpha 21064      Alpha 21164      Alpha 21264
Technology           0.75 μm CMOS     0.5 μm CMOS      0.35 μm CMOS
Frequency (MHz)      200              300              600
Total capacitance    12.5 nF
Clock load           3.25 nF          3.75 nF
Clock power          40%              40% (20 W)
Max. clock skew      200 ps (<10%)    90 ps

Alpha 21264: clock gating used; total power 80-110 W.

D. W. Bailey and B. J. Benschneider, “Clocking Design and Analysis for a 600-MHz Alpha Microprocessor,” IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1627-1633, Nov. 1998.


Power Reduction Example

- Alpha 21064: 200 MHz @ 3.45 V, power dissipation = 26 W
- Reduce voltage to 1.5 V, power (5.3x) = 4.9 W
- Eliminate FP, power (3x) = 1.6 W
- Scale 0.75 μm → 0.35 μm, power (2x) = 0.8 W
- Reduce clock load, power (1.3x) = 0.6 W
- Reduce frequency 200 → 160 MHz, power (1.25x) = 0.5 W

J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.
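The chain of reductions can be replayed in Python. The voltage factor is (3.45/1.5)² ≈ 5.3 from P ∝ V², and the frequency factor is 200/160 = 1.25 from P ∝ f; the other three factors are taken from the slide as given.

```python
steps = [
    ("Reduce voltage 3.45 V -> 1.5 V", (3.45 / 1.5) ** 2),   # P proportional to V^2
    ("Eliminate FP", 3.0),
    ("Scale 0.75 um -> 0.35 um", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 -> 160 MHz", 200 / 160),          # P proportional to f
]

power_w = 26.0   # Alpha 21064 at 200 MHz, 3.45 V
for label, factor in steps:
    power_w /= factor
    print(f"{label:34s} /{factor:4.2f} -> {power_w:.2f} W")
```

Without intermediate rounding the chain ends at about 0.50 W, matching the 0.5 W quoted for the 160-MHz design.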


For More on Microprocessors

- T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Springer, 2002.
- R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers, 2002.