Methodology for EnergyMethodology for Energy-Efficient...

85
Methodology for Energy-Efficient Design of Digital Circuits Methodology for Energy Methodology for Energy - - Efficient Efficient Design of Digital Circuits Design of Digital Circuits Vojin G. Oklobdzija, Advanced Computer Systems Engineering Laboratory University of California / University of Texas TxACE: Center of Excellence for Analog Circuits http://www. acsel-lab.com IEEE SSCS, Distinguished Lecture Santa Clara Valley Chapter April 15, 2010 *with acknolwedgment to contributions of my former students: Dr. Hoan Dao, AMD, Austin, TX and Dr. Bart Zeydel, Plato Networks

Transcript of Methodology for EnergyMethodology for Energy-Efficient...

Page 1: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

Methodology for Energy-Efficient Design of Digital Circuits

Methodology for EnergyMethodology for Energy--Efficient Efficient Design of Digital CircuitsDesign of Digital Circuits

Vojin G. Oklobdzija,

Advanced Computer Systems Engineering LaboratoryUniversity of California / University of Texas

TxACE: Center of Excellence for Analog Circuitshttp://www. acsel-lab.com

IEEE SSCS, Distinguished Lecture Santa Clara Valley Chapter

April 15, 2010

*with acknolwedgment to contributions of my former students: Dr. Hoan Dao, AMD, Austin, TX and

Dr. Bart Zeydel, Plato Networks

Page 2: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design2

Summary of the PresentationSummary of the Presentation

Energy Efficiency of Digital CMOS Circuits

o Problems

o Energy-Delay Relationship

o Minimizing Energy for a given delay

o Methodology

o Determining the best structures for high-performance system

o Implications on the architecture

Page 3: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design3

Challenges in HighChallenges in High--Performance DesignPerformance DesignOptimizing for power – not speed ! (or maximizing speed under the power budget)

Logical Effort (LE) optimizes for speed, regardless of power i.e. brings us in the worse energy spot.

Our method optimizes for:powerpower @ given @ given speed speed or speedspeed @ given @ given power power We are developing new approaches for power efficiency (overlooked by delay optimization) applicable to:- Circuit structures- Design techniques- Energy-Delay Space- Creation of optimal Standard cell (ASIC) libraries

Page 4: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Motivation for Energy Efficient DesignMotivation for Energy Efficient Design

4 Energy-Efficient CMOS Circuit Design

Power density passed the level found in the Nuclear Reactor ! Power density degrades the reliability and speed.

Shekhar Borkar

Page 5: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010October 9, 2009 Energy-Efficient CMOS Circuit Design5

Page 6: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design6

EnergyEnergy--Delay RelationshipDelay Relationship

Page 7: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design7

EnergyEnergy--Delay Space ViewDelay Space View

Delay

Energy

A

B

Adder A

Adder B

Region 1 Region 2

A’ B’

A”

B”

Speed of A Speed of B

A isfaster

Lesspower

Point where B becomesbetter than A

With better E-Dtradeoff B canachieve more

speed with lesspower than A

• Must look at Energy-Delay Space of designs

Page 8: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design8

0

1E-10

2E-10

3E-10

4E-10

5E-10

7 8 9 10 11 12 13Delay (FO4)

Est

imat

ed E

nerg

y (J

)cdKScdHCcdQT7cdIBMcdLingcdKS4-2cdQT9

EnergyEnergy--Delay Space ViewDelay Space View

High-PerformanceTarget

Low-PowerTarget

Which DesignShould be used?

Page 9: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design9

• Begin to see characteristics of designs

0

1E-10

2E-10

3E-10

4E-10

5E-10

7 8 9 10 11 12 13Delay (FO4)

Est

imat

ed E

nerg

y (J

)cdKScdHCcdQT7cdIBMcdLingcdKS4-2cdQT9

Low-PowerTarget

High-PerformanceTarget

H is changing, w/ Cout=constant

EnergyEnergy--Delay Space ViewDelay Space View

Page 10: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design10

0

1E-10

2E-10

3E-10

4E-10

5E-10

7 8 9 10 11 12 13Delay (FO4)

Est

imat

ed E

nerg

y (J

)cdKScdHCcdQT7cdIBMcdLingcdKS4-2cdQT9

Low-PowerTarget

High-PerformanceTarget

• Begin to see characteristics of designs

EnergyEnergy--Delay Space ViewDelay Space ViewH is changing, w/ Cout=constant

Page 11: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design11

• Best High-Performance designs are clearly seen• Different than what would be chosen from single point

0

1E-10

2E-10

3E-10

4E-10

5E-10

7 8 9 10 11 12 13Delay (FO4)

Est

imat

ed E

nerg

y (J

)cdKScdHCcdQT7cdIBMcdLingcdKS4-2cdQT9

Low-PowerTarget

High-PerformanceTarget

EnergyEnergy--Delay Space ViewDelay Space ViewH is changing, w/ Cout=constant

Page 12: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design12

0

1E-10

2E-10

3E-10

4E-10

5E-10

7 8 9 10 11 12 13Delay (FO4)

Est

imat

ed E

nerg

y (J

)cdKScdHCcdQT7cdIBMcdLingcdKS4-2cdQT9

Low-PowerTarget

High-PerformanceTarget

• Also determines best design for Low-Power Target

EnergyEnergy--Delay Space ViewDelay Space ViewH is changing, w/ Cout=constant

Page 13: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design13

0

1E-10

2E-10

3E-10

4E-10

5E-10

7 8 9 10 11 12 13Delay (FO4)

Est

imat

ed E

nerg

y (J

)cdKS_LEcdHC_LEcdQT9_LEcdQT7_LEcdKS4-2_LEcdLing_LEcdIBM_LE

No Wire

• Without wire, differences appear large

Contribution of Wire to Delay and Energy Contribution of Wire to Delay and Energy should be examined tooshould be examined too

Page 14: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design14

0

1E-10

2E-10

3E-10

4E-10

5E-10

7 8 9 10 11 12 13Delay (FO4)

Est

imat

ed E

nerg

y (J

)cdKScdHCcdQT7cdIBMcdLingcdKS4-2cdQT9

With Wire

• Wire strongly impacts selection of “best adders”

Contribution of Wire to Delay and Energy Contribution of Wire to Delay and Energy should be examined tooshould be examined too

Page 15: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design15

EnergyEnergy

Page 16: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design16

Where does Logical Effort lead us?Where does Logical Effort lead us?

Total Delay

Ener

gy

LE Point

lower stage-effort

higher stage-effort

• It is possible to lower energy by trading delay? or …

Most design approaches focus here

Page 17: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design17

Design in EnergyDesign in Energy--Delay SpaceDelay Space

Delay

Ener

gy

Lowest possible energy

E0

Lowest possible dealy D0

(E-E0)(D-D0)=0.2xE0D0

All Possible Designs

Design ADesign B

*P. Penzes, Caltech, PhD 2002, V. Zyuban, ISLPED 2002

Page 18: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design18

EnergyEnergy--Delay Space of a CircuitDelay Space of a Circuit(Fixed Input Size and Output Load)(Fixed Input Size and Output Load)

En

ergy

[pJ]

Delay [ps]

LE

Minimal Energy-Delay Curve

Cout

Cin

Fixed

Fixed

Variable130nm 1.2V CMOS

Page 19: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design19

Exhaustive searchExhaustive search

──────────────────────────C. Giacomotto, N. Nedovic, and V. G. Oklobdzija, “The Effect of the System Specification on the Optimal Selection of Clocked Storage Elements”, IEEE JoSSC, vol. 42, no. 6, June 2007.

To obtain the minimalE-D curve do I have to check all possible solutions?

A circuit with 10 transistors and 10 possible size for each transistor requires to check 1010 possible solutions!

Page 20: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design20

100

150

200

250

300

350

400

450

6 7 8 9 10 11Delay [FO4]

Ener

gy [f

J]

LE (Dmin)

EDPED2

ED3

ED0.5

Cout = 100Cin

Cin = Minimum size invED3

ED2

EDP

ED0.5

constant metric curve

H = Cout / Cin = 100

Points obtainedthrough gate sizing

Hardware Intensity, Victor Zyuban, IBM T.J.Watson

Hardware Intensity: V. Hardware Intensity: V. ZyubanZyuban, IBM, IBM

Page 21: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design21

• Design choice depends on (E,D) requirements

DesignDesign ObjectiveObjective at at NomialNomial VVDDDD

0

10

20

30

40

50

14 16 18 20 22 24 26 28

Delay (FO4)

Ene

rgy

(pJ)

η=2.1

8-bit ACSFixed Input, Fixed Load(0.13μm, 1.2V CMOS)

Circuit Resizingat Fixed Supply

η=∞(Delay Opt.)

η=0.6

η=6.0

upplysfixedDE

ED

∂∂

−=ηhardware intensity:

Hardware Intensity, Victor Zyuban, IBM T.J.Watson

Page 22: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Prior Work on Prior Work on DesignDesign OptimizationOptimization

• Transistor-based [TILOS]– Sizing individual transistors– Growing complexity– Applicable to small blocks only

• Block-based [Zuyban & Strenski]– Blocks: latch & logic– Trading {energy,delay} of blocks– CAD tools– Fixed interface

Page 23: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

TransistorTransistor--BasedBased ApproachApproach

• Optimization problem: (i=1..M)– Minimize: Area(W1,…,WM) ≈ ΣWi– Constraint: Dworst(W1,…,WM) = T

• Delay modeling– Linear (RC-like): TILOS– Look-up table: AMPS (Synopsys)

• A convex problem ⇒ minimal solution exists– Different polynomial algorithms developed– May have issues with convergence– Long run time with increasing design complexity

Page 24: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

BlockBlock--BasedBased: : ZyubanZyuban (IBM)(IBM)

• Optimization problem for a pipelined stage:– Minimize energy: E(B1,…,BM) = ΣEBi

– Delay constraint: D(B1,…,BM) = ΣDBi = T

• Optimal criteria:– (wi / ui) · ηi = (wk / uk) · ηk for any (i,k)

with wx = EBx/E, ux = DBx/T, ηx = (∂EBx/EBx)/(∂DBx/DBx)

• Similar criteria for a simple pipelined system

[Zyuban, IBM T.J. Watson Research]block B1 block B2

block BM

Page 25: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Application: Application: SolutionSolution VerificationVerification

• Verify optimality of solution:– Block 1: (w1/u1)·η1 = 2.0– Block 2: (w2/u2)·η2 = 2.0

[Zyuban, IBM T.J. Watson Research]

w1=0.5u1=0.2

w2=0.5u2=0.8

Block 1Block 2

LoadInput

wi = Ei / E; E = ΣEiui = Di / D; D = ΣDi

η1=0.8 η2=3.2

Equal ⇒ optimal!

Page 26: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

ApplicationApplication: : SolutionSolution VerificationVerification

• If η1 = 3.2 and η2 = 0.8– Block 1: (w1/u1)·η1 = 8.0– Block 2: (w2/u2)·η2 = 0.5– Better solution? Relax η1 and increase η2

w1=0.5u1=0.2

w2=0.5u2=0.8

Block 1Block 2

LoadInput

wi = Ei / E; E = ΣEiui = Di / D; D = ΣDi

Unequal ⇒ not optimal

Page 27: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

MajorMajor LimitationLimitation

• Zyuban’s assumption:– Delay & energy independence of each block Bi

• Single path: block ≡ gate

• Similar dependency between blocks and pipelines• No analytical solution if accounting dependency

⎪⎪⎩

⎪⎪⎨

⎧∝∝

capgatenext:Ccapgatecurrent:C

}C,C{EEnergy}C,C{TDelay

out

in

inout

inoutd

: energy, delay dependencyof adjacent gates

B1 B2 B3 B4

Input Output

CLoad

Page 28: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Proposed StageProposed Stage--Based ApproachBased Approach

• Stage ≈ logic depth

• Gates → stage– Based on maximal distances to input and output– Stage delay: dstage = max{dgate}– Stage energy: Estage = ΣEgate– Estimated from gate energy & delay models

Page 29: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

• Logical Effort delay model:

Delay and Energy Modeling of GatesDelay and Energy Modeling of Gates

( ) ( )τκκκ

κκ

pghCRCRCR

CC

CRCR

T invinvinvinv

pargate

gate

out

invinv

gategategated +=

⎥⎥⎦

⎢⎢⎣

⎡+= o,

logical effort g(gate dependent)

electrical effort h(load dependent)

parasitic p(gate dependent)

CoutCin =Cgate

Cpar

Rup=Rgate

Rdown=Rgate

stage effort f = g•h

reference delay τ(technology dependent)

inCKlinpoutggate WTPWEWEWE ++=)(

Leakage EnergyActive Energy

Energy model:

Page 30: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

Pipelined Stage OptimizationPipelined Stage Optimization

Page 31: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

StageStage--Based OptimizationBased Optimization

• Optimization functions– Delay: D = ΣDStage(i)– Energy: E = ΣEStage(i)

• Possible design constraints– Delay target, D– Input size, Winput– Output load, Cload

• Posynomial Problems – Solvable with polynomial algorithms

Page 32: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Problem A: Delay OptimizationProblem A: Delay Optimization

• Optimization problem– Minimize: D = ΣDSi– Constraint: {Input, Load} = const.

• Objectives– Obtain minimally achievable delay, Dmin– Wanted in performance-critical designs– Disregard energy consumption (actually, ∂Ei /∂Di = ∞)

Page 33: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Single Path: Logical EffortSingle Path: Logical Effort

• Solution = equal stage effort f (i.e. fan-out)

N/

in

LoadN

iiopti C

Cgff1

1⎥⎦

⎤⎢⎣

⎡⎟⎟⎠

⎞⎜⎜⎝

⎛⎟⎟⎠

⎞⎜⎜⎝

⎛== ∏

=

∑=

+⎟⎟⎠

⎞⎜⎜⎝

⎛+=

N

ii

i

ii p

CCgD

1

1Ci = input cap of gate iCin = C1 : fixedCN+1 = CLoad : fixed

∑=

+=N

iioptmin pfND

1

dD/dCi = 0 (∀i = 1..N):

Path Gain, H

Page 34: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Energy Cost vs. Total DelayEnergy Cost vs. Total Delay

S1 S2 S3 S4

InputOutput

CLoad= 92CCin

Cin=C1.71C2.44C3.16C

4.16C5.6C

7.7C10.9C

16C

0

200

400

600

800

1000

1200

1400

14 16 18 20 22 24Delay (tau)

Tota

l Ene

rgy

(fJ)

Fixed Load = 92C

better performance ⇔larger input, more energy

Page 35: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

MultiMulti--Path CircuitsPath Circuits

• Optimal delay depends on off-path load

=

+

+

++

=

++

⎟⎟⎠

⎞⎜⎜⎝

⎛+

+=

⎟⎟⎠

⎞⎜⎜⎝

⎛+

+=

N

ii

i

i

i

i,offii

N

ii

i

i,offii

pC

CC

CCg

pCCC

gD

1

1

1

11

1

11

Ci = input cap of gate iCoff,i = input cap of i th off-path gate

C1 C2 C3 C4

Input Output

CLoadCin

Coff,2 Coff,3 Coff,4

bi+1 = branching factor at (i+1)th-gate input

on-path

off-paths

Page 36: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Linear BranchingLinear Branching

• Linear branching: Coff,i / Ci = const.

• Similar analytical form for solution

N

in

LoadN

ii

N

iiopti C

Cbgff/1

11⎥⎦

⎤⎢⎣

⎡⎟⎟⎠

⎞⎜⎜⎝

⎛⎟⎟⎠

⎞⎜⎜⎝

⎛⎟⎟⎠

⎞⎜⎜⎝

⎛== ∏∏

==

dD/dCi = 0 (∀i = 1..N):

∑=

+=N

iioptmin pfND

1

C1 C2 C3 C4

Input Output

CLoadCin

Coff,2 Coff,3 Coff,4

Branching Factor, B

Page 37: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

NonNon--Linear BranchingLinear Branching

• Nonlinearity due to:– Constant off-path load (wire cap, min-size gates)– Unequal path lengths– Parasitic delay difference of gates

• No analytical form– Recursive solving– Solution: unequal stage efforts

C1 C2 C3 C4

Input Output

CLoadCin

Coff,2 Coff,3 Coff,4

Page 38: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

6464--bit Static bit Static KoggeKogge--Stone AdderStone Adder

long wiringwiring

intermediate wiringintermediate wiring

intermediate wiringintermediate wiring

short wiringshort wiring

short wiringshort wiring

short wiringshort wiring

8 S

tage

s

g = aibi ; p = ai+bi

G = gi+pigi-1 ; P = pipi-1

Sum = ai ⊕ bi ⊕ ci

Page 39: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

0

1

2

3

4

5

6St

age

Effo

rt, f

Stage1 Stage2 Stage3 Stage4 Stage5 Stage6 Stage7 Stage8

Wire Ignored

Wire Included

6464--b KS: Stage Effort Distributionb KS: Stage Effort Distribution

• Nonlinearity causes unequal stage efforts– Nonlinear factors: wire load, parasitic delay diff.

Input = 3μmLoad = 60μm

wireeffect

parasiticdifference

Page 40: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

6464--b KS: Energy versus Delayb KS: Energy versus Delay

0

50

100

150

200

250

300

11 12 13 14 15 16Delay (FO4)

Ene

rgy

(pJ)

Wire Ignored

Wire Included

wireeffect

Input = {30, 20, 15, 10, 6, 3}μm

exponentialbehavior

• Significant wire effect on delay & energy

Page 41: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Problem B: EnergyProblem B: Energy--Delay TradeoffDelay Tradeoff

• Optimization Problem– Minimize: E = ΣEStage(i)– Constraint: {Input, Load} = const.

ΣDStage(i) = Dmin + ΔD• Objectives

– Avoid infinite energy sensitivity at Dmin– Equalize energy-delay sens.: (∂E/∂D)Si = (∂E/∂D)Sk– Trade delay for energy (traditional approach)

Page 42: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

EE--D TradeD Trade--off: Single Pathoff: Single Path

S1 S2 S3 S4

InputOutput

92CinCin=1

0

100

200

300

400

500

600

20 25 30 35Delay (tau)

Ener

gy (f

J)

D_min

η = 1.0 @D = 1.18Dmin

gooddelay trading

for energy less energy savingfor extra traded delay

(infinite energy sensitivity)

Page 43: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

0

2

4

6

8

10

12

14

16

18

Stag

e Ef

fort

, f

f_Stage1 f_Stage2 f_Stage3 f_Stage4

EE--D TradeD Trade--off: Stage Effort off: Stage Effort DistribDistrib..

logical effort

S1 S2 S3 S4

InputOutput

92CinCin=1

unequal distribution of ΔD to stages

ΔD: small→large

Page 44: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

0

50

100

150

200

250

300

350

400

Ener

gy (f

J)

E_Stage1 E_Stage2 E_Stage3 E_Stage4

EE--D TradeD Trade--off: Energy off: Energy DistribDistrib..

S1 S2 S3 S4

InputOutput

92CinCin=1

logical effort

minimal overallenergy reduction

(*) Including constant output load

ΔD: small→large Constant Constant LoadLoad

EnergyEnergy

Page 45: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

6464--b KS: Energy Delay Tradeoffb KS: Energy Delay Tradeoff

• Dmin solution is very inefficient in energy• 55% energy saving with 10% delay traded• Solution = equal stage energy-delay sensitivity

0

100

200

300

400

11 12 13 14 15 16Delay (FO4)

Ene

rgy

(pJ) Fixed Input & LoadPossible Design

Dmin

EfficientE-D Curve

55%

Ener

gy

10%delay

Page 46: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design46

Design in EnergyDesign in Energy--Delay SpaceDelay Space

LE

Delay

Ener

gy

Decrease Stage Effort

Increase Stage Effort

If this is your design point – the drop is steep !But you should not be

designing there!!

For a small sacrifice in delaythe energy savings are big !

From:R.W. Brodersen, M.A. Horowitz, D. Markovic, B. Nikolic, V. Stojanovic, "Methods for True Power Minimization," International Conference on Computer-Aided Design, ICCAD-2002, Digest of Technical Papers, San Jose, CA, November 10-14, 2002, pp. 35-42.

Page 47: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Problem C: Energy MinimizationProblem C: Energy Minimization

• Problem:– Minimize: E = ΣESi

– Constraint: D = ΣDSi = const.Load = const.

• Objective:– Obtain absolute minimal energy @ given delay– Equalize energy-delay sens.: (∂E/∂D)Si = (∂E/∂D)Sk

– Trade input size such that (∂E /∂Input)D = 0

Page 48: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Single Path: Stage Effort Redistrib.

S1 S2 S3 S4

InputOutput

92CCin

0123456

789

Stag

e Ef

fort

, f

f_stage1 f_stage2 f_stage3 f_stage4

Logical Effort

Min-Energy

Σf = constredistribution ofstage efforts tofit stage energy

Page 49: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Single Path: Energy DistributionSingle Path: Energy Distribution

0.75E* @4.9x Input

(*) Including energy consumed on output load

0

50

100

150

200

250

300

350

400

Ener

gy (f

J)

E_stage1 E_stage2 E_stage3 E_stage4

Logical Effort

Min-Energy

• 25% less energy• Energy reduction in heavily-loaded stages

S1 S2 S3 S4

InputOutput

92CCin

Constant Constant LoadLoad

EnergyEnergy

Page 50: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

6464--b KS: Minimal Energy vs. Delayb KS: Minimal Energy vs. Delay

• 30 - 50% energy saving @ same performance• 1.6 - 3.6X input size

0

50

100

150

200

250

300

11 12 13 14 15 16 17Delay (FO4)

Ene

rgy

(pJ)

Delay Optimized

Energy Minimized

W LOAD = 60 μ m

38%Energy

Page 51: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

Pipelined System OptimizationPipelined System Optimization

Page 52: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Pipelined System OptimizationPipelined System Optimization

• Design constraints– Delay target– External I/O constraint

• How to obtain minimal-energy solution?– Pipelined stages

• Minimized for energy• Sensitive to input and load variations

– System level• Balancing energy sensitivities at pipelined boundaries

– Recursive process

Page 53: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Pipelined Stage: Efficient EPipelined Stage: Efficient E--D AreaD Area

0

50

100

150

200

250

300

11 12 13 14 15 16 17Delay (FO4)

Ener

gy (p

J)

Energy Rangefor Designs

Achieving the Same Delay

Energy-Minimized Design Points(Lowest Energy Achievable)

30μminput

Smaller H = Cout/Cin

Load = 60 μ m gate width

(64-bit static Kogge-Stone adder in 0.13μm CMOS at 1.2V)

20μminput

10μminput

6μminput 4μm

input 3μminput

Delay-OptimizedDesign Points(Logical Effort)

• Efficient E-D area is upper- and lower-bounded

Page 54: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

0

10

20

30

40

50

60

11 12 13 14 15 16 17

Delay (FO4)

Inpu

t Siz

e ( μ

m)

Delay-OptimizedDesign Points(Logical Effort)

Energy-Minimized Design Points

Range of Input Sizefor Achieving the

Same Delay

(64-bit static Kogge-Stone adder in 0.13μm CMOS at 1.2V)Load = 60 μ m gate width

Efficient InputEfficient Input--Delay AreaDelay Area

• Larger/smaller input less/more energy

Page 55: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Efficient EnergyEfficient Energy--Input @ D = const.Input @ D = const.

• Energy vs. input size: one-to-one relation– Energy sensitivity to input, σE,Input

• Flat energy in upper half of input range

Input Size

Delay-Opt

Ene

rgy

Energy-Min.

Tradable InputTr

adab

le E

nerg

y

Delay = constLoad = const

fixedLoadInputE Input

E

=∂∂

−=,σ

Delay

Energy

Delay

Input

Page 56: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

0

50

100

150

200

250

300

10 15 20 2530 35

40

60

80

100Ener

gy (p

J)

Input (μm)

Load

(μm

)

• Energy sens. to output load, σE,Load– More @ upper bound (delay-opt.)– Less @ lower bound (energy-min.)

• Energy components = energy + its sensitivities

Sensitivity to Load @ D = const.

Load

Ene

rgy

(Fixed delay, fixed input)

Almost Linear

Exponential

Lower Boundary

Uppe

r Bou

ndar

y

LoadE ,σInputE ,σfixedInput

Load,E LoadE

=∂∂

Delay = const

Page 57: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

System Energy Optimization

• Energy minimization– Pipeline: minimize energy @ given input & load– System: balance energy sensitivities @ boundaries

• Trading elements: input size, output load

• Optimal criteria:⎪⎩

⎪⎨⎧

==

−1ii

i

ALoadEAInputE

AStage nimalmiE

,, σσ

Pipeline 1E1, Do

LoadInput

Boundary B2 Boundary BN

Logic

Pipeline 2E2, Do

Logic

Pipeline NEN, Do

Logic

Page 58: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

How to Achieve Less Total EnergyHow to Achieve Less Total Energy

Case A: σE,Input (i) > σE,Load (i-1)

• Increase i th input⇒ less ETotal

)i(Load,E)i(Input,Eii nmiEE 11 −− =⇔=+ σσ

(i -1)th Pipeline:{E, D, σE,Input, σE,Load}

Boundary Bi

i th Pipeline:{E, D, σE,Input, σE,Load}

Logic Logic

Case B:σE,Input (i) < σE,Load (i-1)Reduce i th input⇒ less ETotal

Page 59: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Case Study: Media Case Study: Media DatapathDatapath

• What is the minimal energy solution?

Maximal Input = 30μm

Target Delay = 17FO4

Output Load = 60μm gate width

Page 60: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Case Study: Optimal CriteriaCase Study: Optimal Criteria

Simplified Datapath

MAX{X,Y,U,V} = 30μm

Delay = 17FO4

{Z} = 60μm

σE,Load P1 + σE,Load P2= σE,Input P3

E{P1} = min.

E{P2} = min.

E{P3} = min.

Boun

dary

Page 61: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Case Study: Optimal AlgorithmCase Study: Optimal Algorithm

Simplified Datapath

Page 62: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Case Study: Media Datapath Solution

0

200

400

600

800

1000

1200

4 6 8 10 12Multiplier Load = Adder Input (um)

Ene

rgy

(pJ)

0

50

100

150

200

250

300

Ene

rgy

Sen

sitiv

ity (p

J/m

)

Optimal

37%26%System Energy

EnergySensitivity

M ultipliers

M ediaAdder

BalancedSensitivity

0

50

100

150

200

250

300

3 5 7 9 11 13Multiplier Output Load = Adder Input Size (μm)

Ener

gy S

ensi

tivity

(pJ/

μm)

0

200

400

600

800

1000

1200

Syst

em E

nerg

y (p

J)

Optimal

26%

ene

rgy

savi

ng fr

omde

lay-

optim

ized

add

er

EnergySensitivity

BalancedSensitivities

MediaAdder Multipliers

Region of Design Points

Minimal Energyat Fixed Boundaries

37%

ene

rgy

savi

ng fr

omde

lay-

optim

ized

mul

tiple

rs

SystemEnergy

Media Datapath0.13μm CMOS at 1.2V

Delay = 17FO4

Page 63: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

System EnergySystem Energy--Delay @ VDelay @ VDDDD=const.=const.

• Similar E-D characteristics as single stages• Possibly less input size @ lower delay

VDD=constant0

100

200

300

400

500

600

700

14 16 18 20 22 24 26

Delay (FO4)

Tota

l Sys

tem

Ene

rgy

(pJ)

η = θ =2.1

MinimalAchievable

E-D forCircuit Sizing

{10,5} μ m

{5,1} μ m

input{Mult,Add} = {3.8,0.8} μ m

{13.7,1.4} μ m

{20,10} μ m{25.0,5.0} μ m

{22.1,3.0} μ m

{15,15} μ m

Zyuban & Strenski'sFixed Pipeline Input

OptimizationRegion of

Design Points

Media Datapath0.13μm CMOS at 1.2V

Page 64: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Effect of Supply ScalingEffect of Supply Scaling

• Efficient {E, D, VDD} are dependent• Optimal criterion: θη =

∂∂

=∂∂

systemplyupssizing

orDE

ED

DE

ED

0

50

100

150

200

250

300

350

400

12 17 22 27 32

Delay (FO4)

Tota

l Ene

rgy

(pJ)

0.8V

Optimal (E,D)at 1.2V

Optimal (E,D)at 1.0V

Minimal AchievableE-D for Circuit Sizingand Supply Variation

0.9V1.0V

1.1V

1.2V

1.3V1.4V

1.5V Minimal E-Dfor circuit sizing

at 1.0V supply voltage

Region of Design Points

Media Datapath(0.13μm, 1.2V CMOS)

Page 65: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

0

100

200

300

400

500

1.0 1.1 1.2 1.3 1.4 1.5 1.6Supply Voltage (V)

Ener

gy (p

J)Media Datapath

900psDelay

NominalSupply

794psDelay2.8X

1.4X

Potential Saving of System Energy Potential Saving of System Energy

• Significant energy saving with correct supply or delay selection

Energy @nominal VDD

Energy @optimal supply

(1.2V, 0.13μm CMOS)

Page 66: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

EnergyEnergy--Delay ImprovementDelay Improvementof Pipelined Stagesof Pipelined Stages

Page 67: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Architectural AdvantagesArchitectural Advantages

• Feedback bus is typical

• Length = 128μm

≅ N • area≅ N • bus length1/N clock rate

≅ area≅ bus lengthSame clock rate

Page 68: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

EnergyEnergy--Throughput ComparisonThroughput Comparison

• Pipelining is mostly more efficient in E-D domain!

2x3x

4x5x

Ref

2x 3x 4x 5x

0

2

4

6

8

10

12

14

16

18

20

0 1 2 3 4 5 6 7Throughput (norm)

Ener

gy (p

J/op

)Parallelism

Pipelining

(0.13 μ m, 1.2V CMOS)

More EnergyEfficient

BetterThroughput

ACS Circuit

Page 69: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

ACS Unit ImplementationNot pipelined8 ACS circuitsLargest areaLeast PM entries per ACS

Pipelined-42 ACS circuits

Small area

Most PM entries / ACS

Pipelined-24 ACS circuitsMidway areaMore PM entries / ACS

Application:64-state rate-½Viterbi decoder

Page 70: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Energy-Throughput Comparison

• Less energy for deeper pipeline at given throughput

Pipelining = 1Parallelism = 8

Pipelining = 2Parallelism = 4

Pipelining = 4Parallelism = 2

0

50

100

150

200

15 20 25 30 35 40Clock Cycle (FO4)

Tota

l AC

S En

ergy

(pJ)

Supply ScalingCircuit Sizing

63%

37%Lower Supply Voltage

(0.04V Step)

ACS Unit(0.13μm, 1.2V CMOS)

Page 71: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design71

Designing a System Designing a System for a Fixed Performance for a Fixed Performance

in the Energyin the Energy--Delay SpaceDelay Space

Page 72: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design72

Pipelining and Parallelism for an ACS CircuitPipelining and Parallelism for an ACS Circuit

Ideal Degree-N Parallelism•N-time throughput•(Area, routing) overhead

Ideal Depth-N Pipelining•N-time throughput•CSE overhead

Page 73: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design73

EnergyEnergy--Delay Results for ParallelismDelay Results for Parallelism

1X2X

3X4X

5X6X

7X8X

9X10X

0

2

4

6

8

10

12

14

16

18

20

0 5 10 15 20 25Delay [FO4]

Ene

rgy

[pJ/

op]

H. Q. Dao, B. R. Zeydel, V. G. Oklobdzija, "Energy-Efficient Optimization of the Viterbi ACS Unit Architecture", Proceedings of the Asian Solid-State Circuit Conference, A-SSCC 2005, Hsinchu, Taiwan, November 1-3, 2005.

Nominal Supply (1.2V)Varying Supply

DecreasingSupply Voltage

52%EnergySavings

HigherParallelism Degree

1.5V

1.3V

1.2V

1.4V

1.14V

1.08V1.02V

0.96V0.90V

0.84V0.78V

0.72V

Gate Sizes are optimized for Nominal Supply (1.2V)

Page 74: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design74

5X4X 3X 2X 1X

0

2

4

6

8

10

12

14

0 5 10 15 20 25

Delay [FO4]

Ener

gy [p

J/op

]

H. Q. Dao, B. R. Zeydel, V. G. Oklobdzija, "Energy-Efficient Optimization of the Viterbi ACS Unit Architecture", Proceedings of the Asian Solid-State Circuit Conference, A-SSCC 2005, Hsinchu, Taiwan, November 1-3, 2005.

61%EnergySavings

HigherPipeline Depth

DecreasingSupply Voltage

Nominal Supply (1.2V)Varying Supply

1.5V

1.3V

1.2V

1.4V

1.14V1.08V

1.02V

0.96V0.90V

0.84V

0.78V 0.72V

Gate Sizes are optimized for Nominal Supply (1.2V)

EnergyEnergy--Delay Results for PipeliningDelay Results for Pipelining

Page 75: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design75

EnergyEnergy--Delay Estimation Matches Complex Delay Estimation Matches Complex Circuit SimulationCircuit Simulation

0.0001

0.001

0.01

0.1

1

10

0.5 1.0 1.5 2.0 2.5 3.0Delay (normalized to 1.2V data)

Ener

gy (n

orm

aliz

ed to

1.2

V da

ta)

Gates

Adder 1

Adder 2

Multiplier 1

Multiplier 2

Lower Supply(0.1V step)

Act

ive

Leak

age

1.5V1.2V

0.7V

Page 76: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design76

Energy Efficiency of Architectural ChoicesEnergy Efficiency of Architectural Choices(including supply scaling and circuit sizing)(including supply scaling and circuit sizing)

2x3x

4x5x

1X2x3x

4x5x

0

2

4

6

8

10

12

14

16

18

20

0 5 10 15 20 25

Delay [FO4]

Ene

rgy

[pJ/

op]

Parallelism

Pipelining

> 52%

Supply ScalingCircuit Sizing

MoreEnergyEfficient

SlightlyBetterDelay

ACS Circuit

H. Q. Dao, B. R. Zeydel, V. G. Oklobdzija, "Energy-Efficient Optimization of the Viterbi ACS Unit Architecture", Proceedings of the Asian Solid-State Circuit Conference, A-SSCC 2005, Hsinchu, Taiwan, November 1-3, 2005.

Page 77: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design77

EergyEergy--Delay Results for Parallelism and PipeliningDelay Results for Parallelism and Pipelining

Reference

Parallel-8

Parallel-4

Parallel-2

Pipeline-2Pipeline-4

Parallel-2Pipeline-2

Parallel-2Pipeline-4

Parallel-4Pipeline-2

0

2

4

6

8

10

12

14

16

18

0 5 10 15 20 25

Delay [FO4]

AC

S E

nerg

y [p

J/op

]depth•degree = 8

depth•degree = 4

Pipelining has the Best EnergyPipelining has the Best Energy--Delay TradeoffDelay Tradeoff

Supply = 1.2V

H. Q. Dao, B. R. Zeydel, V. G. Oklobdzija, "Energy-Efficient Optimization of the Viterbi ACS Unit Architecture", Proceedings of the Asian Solid-State Circuit Conference, A-SSCC 2005, Hsinchu, Taiwan, November 1-3, 2005.

Page 78: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design78

Delay

Ener

gy

DelayOptim ized

Energy Optim ized

Is it possible to lower the Energy ?Is it possible to lower the Energy ?

• Reduce Energy for same Delay! • Improve Delay for same Energy!

EnergySavings

Page 79: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design79

0

50

100

150

200

250

300

350

10 11 12 13 14Delay (FO4)

Ener

gy (p

J)

KS_LE HC_LE

KS_Opt HC_Opt

Achieved EnergyAchieved Energy Savings in KS and HC Savings in KS and HC AddersAdders

• Simulation of 64-bit static adders confirms saving!

50% 45%

Page 80: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design80

0E+0

01E

-10

2E-1

03E

-10

4E-1

05E

-10

7 8 9 10 11 12 13Delay (FO4)

Estim

ated

Ene

rgy

(J)

cdKS cdKS_LEcdHC cdHC_LEcdQT7 cdQT7_LEcdIBM cdIBM_LEcdLing cdLing_LEcdKS4-2 cdKS4-2_LE

27%

18%52%

21%

26%24%

Achieved EnergyAchieved Energy Savings in Savings in Representative AddersRepresentative Adders

• Comparison of high-performance 64-bit adders- Delay and Energy Optimized

Page 81: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design81

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0S1S2S3S4S5S6S7S8S9S10S11

0

5

10

15

20

25

30

35

40

45

50

55

60

Gat

e Si

ze (u

m)

Bit

Stag

e15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

S1S2S3S4S5S6S7S8S9S10S11

0

5

10

15

20

25

30

35

40

45

50

55

60

Gat

e Si

ze (u

m)

Bit

Stag

e

• Energy minimization improves hotspots!

Reduction of HotReduction of Hot--SpotsSpots

Reduces Hotspots

Delay Optimized Energy Optimized

Page 82: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design82

Delay [ns]0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0

Ene

rgy

[pJ]

0

10

20

30

40

50

60

70

80

Accomplishments Accomplishments (June 30, 2003 to June 30, 2004)(June 30, 2003 to June 30, 2004)

Initial Design

Optimized Design Worst Case Energy VectorWith 100% Input Activity

EnergySavingDelay

Saving

90nm technology

Collaboration with

Intel AMR

215μm

Input Flip FlopsInput Flip FlopsInput Flip FlopsBooth EncodingBooth EncodingBooth Encoding

Booth SelectBooth SelectBooth Select

Partial Product TreePartial Product TreePartial Product Tree

Final AdderFinal AdderFinal AdderOutput Flip FlopsOutput Flip FlopsOutput Flip Flops

130μm

Page 83: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010 Energy-Efficient CMOS Circuit Design83

SummarySummary• Energy-Efficient Design requires:

– Early structure comparison in energy-delay space– Early Layout/Floorplanning– Optimization using energy minimization objective

function

• LE does not guarantee a good design

• Our method of energy-minimization focuses on reducing power

• The same principles hold for other logic functions

Page 84: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Future Work

• Methodology improvement– General design rules– Algorithms with fast convergence– Guidelines for close-to-optimal solutions

• CAD tool– Custom vs. standard cell library

• Improve gate modeling & characterization– Worst-case vs. single-switching,… or between?– Process variation

Page 85: Methodology for EnergyMethodology for Energy-Efficient ...sites.ieee.org/scv-sscs/files/2010/04/Energy-Delay-Optimization... · Methodology for EnergyMethodology for Energy-Efficient

April 19, 2010

Publications on Energy-Delay:• Vojin G. Oklobdzija, Bart R. Zeydel, Hoang Dao, Sanu Mathew, Ram Krishnamurthy,

“Energy-Delay Estimation Technique for High-Performance Microprocessor VLSI Adders”, Proceedings of the International Symposium on Computer Arithmetic, ARITH-16, Santiago de Compostela, SPAIN, June 15-18, 2003.

• Hoang Q. Dao, Bart R. Zeydel, Vojin G. Oklobdzija, “Energy Minimization Method for Optimal Energy-Delay Extraction”, Proceedings of the European Solid-State Circuits Conference, ESSCIRC 2003, Estoril, PORTUGAL, September 16-18, 2003.

• V. G. Oklobdzija, B. R. Zeydel, H. Q. Dao, S. Mathew, R. Krishnamurthy, "Comparison of High-Performance VLSI Adders in Energy-Delay Space", IEEE Transaction on VLSI Systems, Volume 13, Issue 6, pp. 754-758, June 2005.

• Hoang Q. Dao, Bart R. Zeydel, Victor Zyuban, and Vojin G. Oklobdzija, “On Energy Optimization of Digital Systems”, The fourth annual IBM Austin Conference on Energy-Efficient Design, ACEED 2005, Austin, Texas, March 1-3, 2005.

• H. Q. Dao, B. R. Zeydel, V. G. Oklobdzija, "Energy-Efficient Optimization of the ViterbiACS Unit Architecture", Proceedings of the Asian Solid-State Circuit Conference, A-SSCC 2005, Hsinchu, Taiwan, November 1-3, 2005.

• H. Q. Dao, B. R. Zeydel, V. G. Oklobdzija, "Energy Optimization of Pipelined Digital Systems Using Circuit Sizing and Supply Scaling", IEEE Transaction on VLSI Systems,Vol. 14, Issue 2, Feb. 2006 pp. 122-134.

• S. K. Hsu, S. K. Mathew, M. A. Anders, B. R. Zeydel, V. G. Oklobdzija, R. K. Krishnamurthy, S. Y. Borkar, “A 110 GOPS/W 16-bit Multiplier and Reconfigurable PLA Loop in 90-nm CMOS”, IEEE Journal of Solid-State Circuits, Vol.41, No.1, January 2006.

• B. R. Zeydel, D. Baran, V. G. Oklobdzija, “Energy Efficient Design of High-Performance VLSI Adders”, IEEE Journal of Solid-State Circuits, June 2010.