Copyright Agrawal, 2007. ELEC5270/6270 Spring 11, Lecture 14
ELEC 5270/6270 Spring 2011
Low-Power Design of Electronic Circuits
Power Aware Microprocessors
Vishwani D. Agrawal
James J. Danaher Professor
Dept. of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849
[email protected]
http://www.eng.auburn.edu/~vagrawal/COURSE/E6270_Spr11/course.html
SIA Roadmap for Processors (1999)

Year                      1999    2002    2005    2008    2011    2014
Feature size (nm)         180     130     100     70      50      35
Logic transistors/cm^2    6.2M    18M     39M     84M     180M    390M
Clock (GHz)               1.25    2.1     3.5     6.0     10.0    16.9
Chip size (mm^2)          340     430     520     620     750     900
Power supply (V)          1.8     1.5     1.2     0.9     0.6     0.5
High-perf. power (W)      90      130     160     170     175     183

Source: http://www.semichips.org
Annotation on slide: Untrue predictions.
Power Reduction in Processors
Hardware methods:
  Voltage reduction for dynamic power
  Dual-threshold devices for leakage reduction
  Clock gating, frequency reduction
  Sleep mode
Architecture:
  Instruction set
  Hardware organization
Software methods
Performance Criteria
Throughput: computations per unit time.
  Performance is inverse of time: increasing CPU time indicates lower performance.
Power: computations per watt.
Energy efficiency: performance/joule.
SPEC CPU2006 Benchmarks
Standard Performance Evaluation Corporation (SPEC), http://www.spec.org
Twelve integer and 17 floating point programs, CINT2006 and CFP2006.
Each program's run time is normalized to obtain a SPEC ratio with respect to the run time on a Sun Ultra Enterprise 2 system with a 296 MHz UltraSPARC II processor.
It takes about 12 days to run all benchmarks on the reference system.
CINT2006 and CFP2006 metrics are the geometric means of SPEC ratios:
  Peak metric: each program is individually optimized (aggressive compilation).
  Base metric: common optimization for all programs.
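As a rough illustration of how such a metric is formed, the sketch below computes SPEC ratios and their geometric mean. The run times are hypothetical numbers chosen for the example, not published results, and the function names are illustrative only.

```python
import math

def spec_ratio(reference_seconds, measured_seconds):
    """SPEC ratio: reference machine run time divided by measured run time."""
    return reference_seconds / measured_seconds

def geometric_mean(values):
    """Geometric mean computed in the log domain to avoid overflow."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical (reference, measured) run times in seconds for three programs.
run_times = [(9000.0, 450.0), (8000.0, 250.0), (12000.0, 800.0)]

ratios = [spec_ratio(ref, meas) for ref, meas in run_times]
metric = geometric_mean(ratios)

print("SPEC ratios:", ratios)
print("Geometric-mean metric: %.2f" % metric)
```

The geometric mean is used so that the ranking of two machines does not depend on which machine is chosen as the reference.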
SPEC CINT2006 Results
http://www.spec.org/cpu2006/results/cint2006.html
Dell Inc., PowerEdge R610
  CPU: Intel Xeon X5670, 2.93 GHz
  Number of chips 2, cores 12, threads/core 2
  Performance metric 36.6 base, 39.4 peak
Dell Inc., PowerEdge M905
  CPU: AMD Opteron 8381 HE, 2.50 GHz
  Number of chips 4, cores 16, threads/core 1
  Performance metric 15.8 base, 19.1 peak
SPEC CFP2006 Results
http://www.spec.org/cpu2006/results/cfp2006.html
Dell Inc., PowerEdge R610
  CPU: Intel Xeon X5670, 2.93 GHz
  Number of chips 2, cores 12, threads/core 2
  Performance metric 42.5 base, 45.8 peak
Dell Inc., PowerEdge M905
  CPU: AMD Opteron 8381 HE, 2.50 GHz
  Number of chips 4, cores 16, threads/core 1
  Performance metric 17.4 base, 21.5 peak
Other Benchmarks
LINPACK is a numerically intensive floating point linear system (Ax = b) program used for benchmarking supercomputers.
SPECPOWER_ssj2008 measures power and performance of a computer system.
  The initial benchmark addresses the performance of server-side Java; additional workloads are planned.
  http://www.spec.org/benchmarks.html#power
Second Quarter 2010 SPECpower_ssj2008 Results
http://www.spec.org/power_ssj2008/results/res2010q2/
Apr 7, 2010: Hewlett-Packard ProLiant DL385 G7
  CPU: AMD Opteron 6174, 2.2GHz
  Number of chips 2, cores 12, threads/core 2
  Total memory 16GB
  ssj operations @ 100%: 888,819
  Average power @ 100%: 271 W
  Average power @ active idle: 101 W
  Overall ssj operations per watt: 2,355
Second Quarter 2010 SPECpower_ssj2008 Results
http://www.spec.org/power_ssj2008/results/res2010q2/
May 19, 2010: Dell Inc., PowerEdge R610
  CPU: Intel Xeon X5670, 2.93 GHz
  Number of chips 2, cores 12, threads/core 2
  Total memory 12GB
  ssj operations @ 100%: 914,076
  Average power @ 100%: 244 W
  Average power @ active idle: 62.3 W
  Overall ssj operations per watt: 2,938
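To put the two published results side by side, the sketch below derives two simple figures from the numbers quoted on these slides: operations per watt at 100% load, and idle power as a fraction of full-load power. The overall ssj operations per watt is lower than the 100% figure because the overall metric also accounts for partial-load and active-idle measurements. The dictionary keys and function names are illustrative choices, not part of the benchmark.

```python
# Values copied from the two Second Quarter 2010 results quoted on the slides.
results = {
    "HP ProLiant DL385 G7": {
        "ops_at_100": 888_819, "watts_at_100": 271.0,
        "watts_idle": 101.0, "overall_ops_per_watt": 2355,
    },
    "Dell PowerEdge R610": {
        "ops_at_100": 914_076, "watts_at_100": 244.0,
        "watts_idle": 62.3, "overall_ops_per_watt": 2938,
    },
}

def peak_ops_per_watt(r):
    """ssj operations per watt measured at 100% load only."""
    return r["ops_at_100"] / r["watts_at_100"]

def idle_fraction(r):
    """Active-idle power as a fraction of full-load power."""
    return r["watts_idle"] / r["watts_at_100"]

for name, r in results.items():
    print(f"{name}: {peak_ops_per_watt(r):.0f} ops/W at 100% load, "
          f"idle draws {idle_fraction(r):.0%} of full-load power, "
          f"overall {r['overall_ops_per_watt']} ops/W")
```

A lower idle fraction means the system's power tracks its load more closely, which raises the overall figure.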
Energy SPEC Benchmarks
Energy efficiency mode: Besides the execution time, energy efficiency of SPEC benchmark programs is also measured. Energy efficiency of a benchmark program is given by:

$$\text{Energy efficiency} = \frac{1/(\text{Execution time})}{\text{joules consumed}}$$

D. A. Patterson and J. L. Hennessy, Computer Organization & Design: The Hardware/Software Interface, 4th Edition, Morgan Kaufmann Publishers (Elsevier), 2009.
Energy Efficiency
Efficiency averaged on n benchmark programs:

$$\text{Efficiency} = \left( \prod_{i=1}^{n} \text{Efficiency}_i \right)^{1/n}$$

where $\text{Efficiency}_i$ is the efficiency for program i.
Relative efficiency:

$$\text{Relative efficiency} = \frac{\text{Efficiency of a computer}}{\text{Efficiency of reference computer}}$$
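The definitions above can be sketched in a few lines. The measurements below are hypothetical (execution time in seconds, energy in joules) and the function names are illustrative only; the example just shows how the per-program efficiency, the geometric mean, and the relative efficiency fit together.

```python
import math

def energy_efficiency(exec_time_s, energy_j):
    """Efficiency of one program: (1 / execution time) / joules consumed."""
    return (1.0 / exec_time_s) / energy_j

def mean_efficiency(measurements):
    """Geometric mean of per-program efficiencies."""
    effs = [energy_efficiency(t, e) for t, e in measurements]
    return math.prod(effs) ** (1.0 / len(effs))

def relative_efficiency(machine, reference):
    """Efficiency of a computer divided by that of the reference computer."""
    return mean_efficiency(machine) / mean_efficiency(reference)

# Hypothetical (execution time [s], energy [J]) pairs for two programs.
machine = [(10.0, 200.0), (20.0, 500.0)]
reference = [(20.0, 400.0), (40.0, 1000.0)]

print("Relative efficiency: %.2f" % relative_efficiency(machine, reference))
```

In this made-up case the machine takes half the time and half the energy of the reference on each program, so its relative efficiency is 4.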
SPEC2000 Relative Energy Efficiency
[Figure: bar chart of relative energy efficiency (vertical scale 0 to 6) for SPECINT2000 and SPECFP2000. Processors: Pentium [email protected]/0.6GHz (energy-efficient processor), Pentium [email protected] (reference), Pentium [email protected]. Clock settings: always max. clock; laptop adaptive clock; min. power, min. clock.]
Voltage Scaling
Dynamic: Reduce voltage and frequency during idle or low activity periods.
Static: Clustered voltage scaling
  Logic on non-critical paths given lower voltage.
  47% power reduction with 10% area increase reported.
  M. Igarashi et al., "Clustered Voltage Scaling Techniques for Low-Power Design," Proc. IEEE Symp. Low Power Design, 1997.
Processor Utilization
Throughput = Operations / second
[Figure: throughput versus time, showing compute-intensive processes at maximum throughput, low throughput (background) processes, and system idle periods.]
Examples of Processes
Compute-intensive: spreadsheet, spelling check, video decoding, scientific computing.
Low throughput: data entry, screen updates, low bandwidth I/O data transfer.
Idle: no computation, no expected output.
Effects of Voltage Reduction
Voltage reduction increases delay, decreases throughput:
  Slow reduction in throughput at first
  Rapid reduction in throughput for $V_{DD} \le V_{th}$
  Time per operation (TPO) increases
Voltage reduction continues to reduce power consumption:
  Energy per operation (EPO) = Power × TPO
Energy per Operation (EPO)
[Figure: power, TPO, and EPO (vertical scale 0.0 to 1.0) plotted against $V_{DD} / V_{th}$ from 1 to 5.]
Dynamic Voltage and Clock

Throughput              Time in fast mode   Time in slow mode   Time in idle mode   Battery life
Always full speed       10%                 0%                  90%                 1 hr
Sometimes full speed    1%                  90%                 9%                  5.3 hrs
Rarely full speed       0.1%                99%                 0.9%                9.2 hrs

T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Springer, 2002, pp. 35-36.
Example: Find Minimum Energy Mode
Processor data (rated operation):
  2 GHz clock
  1.5 volt supply voltage
  0.5 volt threshold voltage
  Power consumption
    50 watts dynamic power
    50 watts static power
Maximum clock frequency for V volt supply:

$$f \propto (V - V_{TH})/V$$
Example Cont.
Dynamic power:

$$P_d = C V^2 f = C (1.5)^2 \times 2 \times 10^9 = 50\ \text{W}$$

C = 11.11 nF, capacitance switching/cycle

$$P_d = 11.11\, V^2 f \quad \text{(W, with } f \text{ in GHz)}$$

Dynamic energy per cycle:

$$E_d = P_d / f = 11.11\, V^2 \quad \text{(nJ)}$$
Example Cont.
Clock frequency:

$$f = k (V - V_{TH})/V = k (1.5 - 0.5)/1.5 = 2\ \text{GHz}$$

k = 3 GHz, a proportionality constant

$$f = 3 (V - 0.5)/V\ \text{GHz}$$
Example Cont.
Static power:

$$P_s = k' V^2 = k' (1.5)^2 = 50\ \text{W}$$

k' = 22.22 mho, total leakage conductance

$$P_s = 22.22\, V^2 \quad \text{(W)}$$

Static energy per cycle:

$$E_s = P_s / f = \frac{22.22\, V^3}{3 (V - 0.5)} = \frac{7.41\, V^3}{V - 0.5} \quad \text{(nJ)}$$
Example Cont.
Total energy per cycle:

$$E = E_d + E_s = 11.11\, V^2 + \frac{7.41\, V^3}{V - 0.5}$$

To minimize E, set $\partial E / \partial V = 0$. Dividing the derivative by $7.41\, V$ and multiplying by $(V - 0.5)^2$ gives $3 (V - 0.5)^2 + V (2V - 1.5) = 0$, or

$$5 V^2 - 4.5 V + 0.75 = 0$$

Solutions of quadratic equation:
V = 0.679 volt, 0.221 volt
Discard second solution, which is lower than the threshold voltage of 0.5 volt.
Example: Result

                        Rated mode   Low energy mode   Reduction
Voltage                 1.5 V        0.679 V           54.7%
Clock frequency         2 GHz        791 MHz           60%
Dynamic energy/cycle    25.00 nJ     5.12 nJ           79.52%
Static energy/cycle     25.00 nJ     12.96 nJ          48.16%
Total energy/cycle      50.0 nJ      18.08 nJ          63.84%
Dynamic power           50.0 W       4.05 W            91.90%
Static power            50.0 W       10.25 W           79.50%
Total power             100.0 W      14.30 W           85.70%
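The numbers in this example can be checked with a short script. The sketch below assumes only the models stated in the example (dynamic power $C V^2 f$, static power $k' V^2$, frequency $k (V - V_{TH})/V$) and the rated operating point; the variable and function names are illustrative. Frequencies are in GHz and capacitance in nF, so energies come out in nJ and powers in W.

```python
import math

# Rated operating point from the example.
V_RATED, F_RATED_GHZ, V_TH = 1.5, 2.0, 0.5
P_DYN_RATED_W, P_STAT_RATED_W = 50.0, 50.0

# Model constants derived from the rated point.
C_NF = P_DYN_RATED_W / (V_RATED ** 2 * F_RATED_GHZ)   # switched capacitance, ~11.11 nF
K_GHZ = F_RATED_GHZ * V_RATED / (V_RATED - V_TH)      # frequency constant, 3 GHz
G_MHO = P_STAT_RATED_W / V_RATED ** 2                 # leakage conductance, ~22.22 mho

def freq_ghz(v):
    """Maximum clock frequency at supply voltage v."""
    return K_GHZ * (v - V_TH) / v

def energies_nj(v):
    """(dynamic, static) energy per cycle in nJ at supply voltage v."""
    return C_NF * v ** 2, G_MHO * v ** 2 / freq_ghz(v)

def min_energy_voltage():
    """Solve dE/dV = 0, which reduces to a quadratic a*V^2 + b*V + c = 0."""
    r = G_MHO / K_GHZ                       # ~7.41, coefficient of V^3/(V - V_TH)
    a = 2 * C_NF + 2 * r
    b = -(4 * C_NF + 3 * r) * V_TH
    c = 2 * C_NF * V_TH ** 2
    root = math.sqrt(b * b - 4 * a * c)
    candidates = [(-b + root) / (2 * a), (-b - root) / (2 * a)]
    # Only a supply voltage above the threshold is physically meaningful here.
    return max(v for v in candidates if v > V_TH)

v_opt = min_energy_voltage()
e_dyn, e_stat = energies_nj(v_opt)
f_opt = freq_ghz(v_opt)
total_power_w = (e_dyn + e_stat) * f_opt

print(f"V = {v_opt:.3f} V, f = {f_opt * 1000:.0f} MHz")
print(f"E_dyn = {e_dyn:.2f} nJ, E_stat = {e_stat:.2f} nJ, total = {e_dyn + e_stat:.2f} nJ")
print(f"Total power = {total_power_w:.2f} W")
```

Small differences in the last digit relative to a hand calculation come from rounding the intermediate constants.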
Problem of Process Variation in Nanometer Technologies
[Figure: distribution of number of chips versus threshold voltage (lower Vth, nominal Vth, higher Vth), bounded by a power specification and a clock specification. Chips outside the power specification are a yield loss due to high leakage; chips outside the clock specification are a yield loss due to slow speed. Operating regions are labeled higher voltage operation, nominal voltage, and lower voltage operation.]
From a presentation: Power Reduction using LongRun2 in Transmeta's Efficon Processor, by D. Ditzel, May 17, 2006.
Pipeline Gating
A pipeline processor uses speculative execution.
  Incorrect branch prediction results in pipeline stalls and wasted energy.
Idea: Stop fetching instructions if a branch hazard is expected:
  If the count (M) of incorrect predictions exceeds a pre-specified number (N), then suspend fetching instructions for some k cycles.
Ref.: S. Manne, A. Klauser and D. Grunwald, "Pipeline Gating: Speculation Control for Energy Reduction," Proc. 25th Annual International Symp. Computer Architecture, June 1998.
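The gating rule stated on this slide can be sketched as a small cycle-by-cycle model. This is only an illustration of the rule as worded here, not the mechanism of the cited paper: the class and method names are hypothetical, and two details are assumptions made for the example (the count M restarts from zero after each gating event, and no predictions are counted while fetch is suspended).

```python
from dataclasses import dataclass

@dataclass
class FetchGate:
    n_threshold: int          # N: allowed count of incorrect predictions
    k_stall: int              # k: cycles to suspend fetching once gated
    mispredictions: int = 0   # M: running count of incorrect predictions
    stall_left: int = 0       # remaining suspended cycles

    def tick(self, mispredicted: bool) -> bool:
        """Advance one cycle; return True if an instruction is fetched."""
        if self.stall_left > 0:
            # Fetch is gated: nothing enters the pipeline this cycle.
            self.stall_left -= 1
            return False
        if mispredicted:
            self.mispredictions += 1
        if self.mispredictions > self.n_threshold:
            # Assumption: restart the count and gate the next k cycles.
            self.mispredictions = 0
            self.stall_left = self.k_stall
        return True

def simulate(outcomes, n_threshold, k_stall):
    """Return the per-cycle fetch decisions for a stream of prediction outcomes."""
    gate = FetchGate(n_threshold, k_stall)
    return [gate.tick(m) for m in outcomes]

# Worst case: every cycle reports an incorrect prediction.
fetched = simulate([True] * 8, n_threshold=1, k_stall=2)
print(fetched)
```

With N = 1 and k = 2, fetch is allowed for two cycles, suspended for two, and the pattern repeats.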
Slack Scheduling
Application: Superscalar, out-of-order execution:
  An instruction is executed as soon as the required data and resources become available.
  A commit unit reorders the results.
Delay the completion of instructions whose result is not immediately needed.
Example of RISC instructions:
  add r0, r1, r2;    (A)
  sub r3, r4, r5;    (B)
  and r9, r1, r9;    (C)
  or  r5, r9, r10;   (D)
  xor r2, r10, r11;  (E)
J. Casmira and D. Grunwald, "Dynamic Instruction Scheduling Slack," Proc. ACM Kool Chips Workshop, Dec. 2000.
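One way to see which instruction has slack is to compare the earliest and latest cycle in which each instruction can execute. The sketch below does this for instructions A to E under several assumptions that are not stated on the slide: the last operand of each instruction is its destination, every instruction takes one cycle, issue width is unlimited, and only true (read-after-write) dependences matter. The function names are illustrative.

```python
# (label, source 1, source 2, destination), assuming the destination is last.
program = [
    ("A", "r0", "r1", "r2"),
    ("B", "r3", "r4", "r5"),
    ("C", "r9", "r1", "r9"),
    ("D", "r5", "r9", "r10"),
    ("E", "r2", "r10", "r11"),
]

def raw_dependences(prog):
    """Map each instruction to the earlier instructions whose results it reads."""
    last_writer, deps = {}, {}
    for name, src1, src2, dest in prog:
        deps[name] = {last_writer[s] for s in (src1, src2) if s in last_writer}
        last_writer[dest] = name
    return deps

def slack_per_instruction(prog):
    """Slack = latest possible cycle minus earliest possible cycle."""
    deps = raw_dependences(prog)
    order = [name for name, *_ in prog]
    asap = {}
    for name in order:  # producers always precede consumers in program order
        asap[name] = max((asap[p] + 1 for p in deps[name]), default=0)
    last_cycle = max(asap.values())
    alap = {}
    for name in reversed(order):
        consumers = [c for c in order if name in deps[c]]
        alap[name] = min((alap[c] - 1 for c in consumers), default=last_cycle)
    return {name: alap[name] - asap[name] for name in order}

slack = slack_per_instruction(program)
print(slack)
```

Under these assumptions only A has slack: its result is first read by E in the third cycle, so A can be delayed by one cycle or run on a slower, lower-power unit without lengthening the schedule.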
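Slack Scheduling Example
[Figure: cycle-by-cycle placement of instructions A to E under two policies. Slack scheduling: rows "A B C", "D", "E". Standard scheduling: rows "A B C", "D", "E". The horizontal placement of instructions within each row is not recoverable.]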
Slack Scheduling
[Figure: block diagram with scheduling logic, re-order buffer, slack bit, and low-power execution units.]
Clock Distribution H-Tree
[Figure: H-tree distributing the clock to the flip-flops.]
Fanout, $\lambda = 4$
Tree depth, $s = \log_\lambda N$
No. of flip-flops = N
Clock Power

$$P_{clk} = C_L V_{DD}^2 f + \frac{C_L V_{DD}^2 f}{\lambda} + \frac{C_L V_{DD}^2 f}{\lambda^2} + \cdots = C_L V_{DD}^2 f \sum_{n=0}^{\text{stages}-1} \frac{1}{\lambda^n}$$

where $C_L$ = total load capacitance of N flip-flops
$\lambda$ = constant fanout at each stage in distribution network

Clock consumes about 40% of total processor power, because
  (1) Clock is always active
  (2) Clock makes two transitions per cycle ($\alpha = 2$)
Clock gating is useful; inhibit clock to unused blocks.
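The sum in the clock power expression is a finite geometric series, so it has a closed form and an upper bound. The step below is a sketch that uses only the symbols already defined on this slide, writing s for the number of stages; the value quoted for $\lambda = 4$ is an illustration.

```latex
\[
P_{clk} = C_L V_{DD}^2 f \sum_{n=0}^{s-1} \frac{1}{\lambda^n}
        = C_L V_{DD}^2 f \, \frac{1 - \lambda^{-s}}{1 - \lambda^{-1}}
        < C_L V_{DD}^2 f \, \frac{\lambda}{\lambda - 1}
\]
% For \lambda = 4 the bound is (4/3) C_L V_{DD}^2 f:
% the distribution buffers add less than one third to the power
% delivered to the flip-flop load, however deep the tree is.
```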
Properties of H-Tree
Balanced clock skew.
Small delay and power consumption.
Requires fine-tuning for complex layout.
Clock Power and Delay
Unit size buffer or inverter delay = d
Total dynamic power supplied to N flip-flops, $P = C_L V_{DD}^2 f$
Total power consumption of clock network:

Flip-flops, N   Clock power per flip-flop   Clock delay
1               P                           d
4               P                           4d
16              1.25P                       8d
64              1.3125P                     12d
256             1.328125P                   16d
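The power and delay figures for a fanout-of-4 tree follow from the geometric sum in the clock power formula. The sketch below evaluates it for trees of one to four stages. It assumes a delay of $\lambda d$ per stage (a unit buffer driving $\lambda$ loads), which is an assumption made for the example, and it does not model the single flip-flop case; the function names are illustrative.

```python
def tree_depth(n_flops, fanout=4):
    """Number of buffer stages needed to reach n_flops with the given fanout."""
    depth, reach = 0, 1
    while reach < n_flops:
        reach *= fanout
        depth += 1
    return depth

def clock_power_in_units_of_p(stages, fanout=4):
    """Sum of 1/fanout^n for n = 0 .. stages-1, i.e. clock power divided by P."""
    return sum(fanout ** -n for n in range(stages))

def clock_delay_in_units_of_d(stages, fanout=4):
    """Assumed delay model: each stage contributes fanout * d."""
    return stages * fanout

for n_flops in (4, 16, 64, 256):
    s = tree_depth(n_flops)
    print(f"N = {n_flops:3d}: stages = {s}, "
          f"power = {clock_power_in_units_of_p(s)} P, "
          f"delay = {clock_delay_in_units_of_d(s)} d")
```

The power approaches 4/3 of P as the tree grows, while the delay keeps increasing linearly with the number of stages.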
Clock Network Examples

                     Alpha 21064     Alpha 21164     Alpha 21264
Technology           0.75μ CMOS      0.5μ CMOS       0.35μ CMOS
Frequency (MHz)      200             300             600
Total capacitance    12.5nF                          Clock gating used.
Clock load           3.25nF          3.75nF          Total power 80-110W
Clock power          40%             40% (20W)
Max. clock skew      200ps (<10%)    90ps

D. W. Bailey and B. J. Benschneider, "Clocking Design and Analysis for a 600-MHz Alpha Microprocessor," IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1627-1633, Nov. 1998.
Power Reduction Example
Alpha 21064: 200MHz @ 3.45V, power dissipation = 26W
Reduce voltage to 1.5V, power (5.3x) = 4.9W
Eliminate FP, power (3x) = 1.6W
Scale 0.75μ → 0.35μ, power (2x) = 0.8W
Reduce clock load, power (1.3x) = 0.6W
Reduce frequency 200 → 160MHz, power (1.25x) = 0.5W
J. Montanaro et al., "A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor," IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.
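The chain of reductions above is a running division, which the sketch below reproduces. The step labels and factors are taken from the slide; the check that the 5.3x voltage factor matches the square of the voltage ratio assumes dynamic power scales as $V^2$ at fixed frequency. Names are illustrative.

```python
# (step, reduction factor) as listed on the slide.
steps = [
    ("Reduce voltage 3.45 V -> 1.5 V", 5.3),
    ("Eliminate FP", 3.0),
    ("Scale 0.75u -> 0.35u", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 -> 160 MHz", 1.25),
]

def cascade(start_watts, reductions):
    """Apply each reduction factor in turn and record the running power."""
    power, trace = start_watts, []
    for label, factor in reductions:
        power /= factor
        trace.append((label, power))
    return trace

trace = cascade(26.0, steps)
for label, watts in trace:
    print(f"{label:35s} {watts:5.2f} W")

# Sanity checks on two of the factors.
voltage_factor = (3.45 / 1.5) ** 2     # quadratic dependence on supply voltage
frequency_factor = 200 / 160           # linear dependence on clock frequency
print(f"(3.45/1.5)^2 = {voltage_factor:.2f}, 200/160 = {frequency_factor:.2f}")
```

The overall reduction is the product of the factors, about 52x, which takes 26 W down to roughly 0.5 W.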
For More on Microprocessors
T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Springer, 2002.
R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers, 2002.