EE241 - Spring 2005
bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s05/...

1

EE241 - Spring 2005
Advanced Digital Integrated Circuits

Lecture 6: Optimization for Performance

2

Admin

Project proposals due by Fri 5pm (by e-mail to Huifang and myself):
- Title
- Short abstract of 10-15 lines describing the problem you are trying to address

Special office hours today right after class (3:30-4:30pm)

Some feedback on ISSCC? What caught your eye?

3

Today’s lecture

Using the models we have created so far to create an environment for optimization

Reading:
- ICCAD paper by Stojanovic et al.
- Chapters 2 and 3 in the text by K. Bernstein (High Speed CMOS Design Styles)
- Background material from Rabaey, 2nd ed., Chapters 5, 6.

4

Static Timing Analysis

Computing critical (longest) path delay
- Longest path algorithm on DAG [Kirkpatrick, IBM Jo. R&D, 1966]
- Used in most ASIC designs today

Limitations
- False paths
- Simultaneous arrival times
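The longest-path computation on a DAG can be sketched as follows. This is a minimal illustration, not the algorithm of the cited paper: the function name `arrival_times`, the edge-list format, and the example netlist are all assumptions made for the sketch. It propagates the latest arrival time through the graph in topological order and, as noted under Limitations, it knows nothing about false paths or simultaneous arrivals.

```python
from collections import defaultdict, deque


def arrival_times(edges):
    """Latest arrival time at every node of a timing DAG.

    edges: iterable of (src, dst, delay) tuples. Nodes without incoming
    edges are treated as primary inputs arriving at t = 0.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, d in edges:
        succ[u].append((v, d))
        indeg[v] += 1
        nodes.update((u, v))

    arrival = {n: 0.0 for n in nodes}
    ready = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while ready:  # Kahn-style topological traversal
        u = ready.popleft()
        visited += 1
        for v, d in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + d)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if visited != len(nodes):
        raise ValueError("timing graph has a cycle")
    return arrival


# Hypothetical netlist; delays are in arbitrary units.
edges = [("a", "n1", 2.0), ("b", "n1", 2.0), ("n1", "out", 3.0), ("c", "out", 1.0)]
times = arrival_times(edges)
critical_delay = max(times.values())
```

Each edge is visited once, so the run time is linear in the size of the graph, which is one reason this style of analysis scales to full ASIC designs.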

5

Signal Arrival Times

[Figure: NAND gate]


7

Simultaneous Arrival Times

NAND gate:

8

Impact of Arrival Times

[Figure: delay versus tA - tB for inputs A and B, centered at 0; left region "A arrives early", right region "B arrives early"; annotation "Up to 25%"]

9

Optimization for Performance

Performance critical blocks

Start with a synthesized design
- Easier to explore architectures
- Easy to verify
- Provides some level of performance optimization

Understand the limits of synthesized designs

10

Performance Optimization

[Figure: power versus delay]

Increasing the performance increases power!

11

Performance Optimization

[Figure: power versus delay for Microarchitecture A and Microarchitecture B]

12

Performance Optimization

[Figure: power versus delay for Synthesized Microarchitecture A, Microarchitecture B, and Custom Microarchitecture A]

13

How to Increase Performance?

Scale technology

Circuit level:
- Transistor sizing, buffering
- Wire optimization, repeaters
- Supply and threshold voltage
- Logic styles
- Timing, latches

Microarchitecture:
- Block topologies (adders, multipliers)
- Pipelining
- Parallelism

14

Sizing Logic Paths for Speed

Frequently, input capacitance of a logic path is constrained

Logic has to drive some capacitance

Example: ALU load in an Intel microprocessor is > 0.5pF

How do we size the ALU datapath to achieve maximum speed?

Review the method of logical effort

15

Inverter Chain

[Figure: inverter chain from In to Out driving load CL]

If CL and CIn are given:
- How many stages are needed to minimize the delay?
- How to size the inverters?

May need some additional constraints.

16

Delay Formula

$$\text{Delay} \sim R_W \left(C_{int} + C_L\right)$$

$$t_p = k R_W C_{int}\left(1 + \frac{C_L}{C_{int}}\right) = t_{p0}\left(1 + \frac{f}{\gamma}\right)$$

Cint = γCgin with γ ≈ 1
f = CL/Cgin - effective fanout
R = Runit/W; Cint = W Cunit
tp0 = 0.7 Runit Cunit

17

Apply to Inverter Chain

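[Figure: chain of N inverters (1, 2, ..., N) from In to Out driving CL]

tp = tp1 + tp2 + … + tpN

$$t_{pj} \sim R_{unit} C_{unit}\left(1 + \frac{C_{gin,j+1}}{\gamma\, C_{gin,j}}\right)$$

$$t_p = \sum_{j=1}^{N} t_{p,j} = t_{p0} \sum_{j=1}^{N}\left(1 + \frac{C_{gin,j+1}}{\gamma\, C_{gin,j}}\right), \quad C_{gin,N+1} = C_L$$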

18

Apply to Inverter Chain

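[Figure: chain of N inverters (1, 2, ..., N) from In to Out driving CL]

tp = tp1 + tp2 + … + tpN

With γ = 1:

$$t_{pj} \sim R_{unit} C_{unit}\left(1 + \frac{C_{gin,j+1}}{C_{gin,j}}\right)$$

$$t_p = \sum_{j=1}^{N} t_{p,j} = t_{p0} \sum_{j=1}^{N}\left(1 + \frac{C_{gin,j+1}}{C_{gin,j}}\right), \quad C_{gin,N+1} = C_L$$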

19

Optimal Tapering for Given N

Delay equation has N - 1 unknowns, Cgin,2 – Cgin,N

Minimize the delay, find N - 1 partial derivatives

Result: Cgin,j+1/Cgin,j = Cgin,j/Cgin,j-1

Size of each stage is the geometric mean of two neighbors:

$$C_{gin,j} = \sqrt{C_{gin,j-1}\, C_{gin,j+1}}$$

- each stage has the same effective fanout (Cout/Cin)
- each stage has the same delay
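The geometric-mean result can be checked numerically. The sketch below is an illustration under stated assumptions: the function names, the coordinate-descent loop, and the example values (Cin = 1, CL = 64, N = 3) are not from the lecture. Each interior stage is repeatedly set to the geometric mean of its neighbors, which is the minimizer of the delay sum with respect to that one stage, until the sizes settle.

```python
import math


def chain_delay(sizes, c_load, gamma=1.0):
    """Chain delay in units of tp0: sum over stages of 1 + C[j+1] / (gamma * C[j])."""
    caps = list(sizes) + [c_load]
    return sum(1.0 + caps[j + 1] / (gamma * caps[j]) for j in range(len(sizes)))


def optimal_sizes(c_in, c_load, n, sweeps=500):
    """Size stages 2..N by repeated geometric-mean updates (stage 1 is fixed)."""
    sizes = [c_in] * n
    for _ in range(sweeps):
        for j in range(1, n):
            nxt = sizes[j + 1] if j + 1 < n else c_load
            sizes[j] = math.sqrt(sizes[j - 1] * nxt)
    return sizes


# Hypothetical example: Cin = 1, CL = 64, N = 3 stages.
sizes = optimal_sizes(1.0, 64.0, 3)
ratios = [b / a for a, b in zip(sizes, sizes[1:] + [64.0])]
total = chain_delay(sizes, 64.0)
```

For this example the ratios all converge to about 4, so every stage sees the same effective fanout and contributes the same delay.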

20

Optimum Delay and Number of Stages

When each stage is sized by f and has same effective fanout f:

$$f^N = F = C_L / C_{gin,1}$$

Effective fanout of each stage:

$$f = \sqrt[N]{F}$$

Minimum path delay:

$$t_p = N t_{p0}\left(1 + \sqrt[N]{F}/\gamma\right)$$

21

Example

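[Figure: three-inverter chain from In to Out with sizes 1, f, f^2, input capacitance C1, and load CL = 8 C1]

CL/C1 has to be evenly distributed across N = 3 stages:

$$f = \sqrt[3]{8} = 2$$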
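The same arithmetic can be written as a small helper. This is a sketch only: the function name `taper` and its return format are assumptions, and the delay uses the expression tp = N tp0 (1 + f/γ) with γ = 1.

```python
def taper(c_in, c_load, n, gamma=1.0):
    """Equal-fanout sizing of an n-stage inverter chain.

    Returns the per-stage fanout f, the input capacitance of each stage,
    and the path delay in units of tp0.
    """
    f = (c_load / c_in) ** (1.0 / n)
    sizes = [c_in * f ** k for k in range(n)]
    delay = n * (1.0 + f / gamma)
    return f, sizes, delay


# The example from the slide: CL = 8 C1, N = 3.
f, sizes, delay = taper(1.0, 8.0, 3)
```

With these inputs f comes out at about 2, the stage sizes at about 1, 2, and 4 times C1, and the delay at about 9 tp0.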

22

Optimum Number of Stages

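For a given load, CL, and given input capacitance, Cin, find optimal number of stages, N, and optimal sizing, f

$$C_L = F \cdot C_{in} = f^N C_{in} \quad \text{with} \quad N = \frac{\ln F}{\ln f}$$

$$t_p = N t_{p0}\left(F^{1/N}/\gamma + 1\right) = \frac{t_{p0} \ln F}{\gamma}\left(\frac{f}{\ln f} + \frac{\gamma}{\ln f}\right)$$

$$\frac{\partial t_p}{\partial f} = \frac{t_{p0} \ln F}{\gamma} \cdot \frac{\ln f - 1 - \gamma/f}{\ln^2 f} = 0$$

$$f = e^{\left(1 + \gamma/f\right)}$$

For γ = 0, f = e, N = lnF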

23

Optimum Effective Fanout f

Optimum f for given process defined by γ:

$$f = e^{\left(1 + \gamma/f\right)}$$

fopt = 3.6 for γ = 1

[Figure: fopt (2.5 to 5) versus γ (0 to 3)]
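The implicit equation f = e^(1 + γ/f) has no closed-form solution, but it can be solved numerically. The sketch below uses plain fixed-point iteration; the function name, tolerance, and the sampled γ values are assumptions for illustration, and the iteration is only expected to converge for the moderate γ range covered by the plot.

```python
import math


def optimum_fanout(gamma, tol=1e-12, max_iter=200):
    """Solve f = exp(1 + gamma / f) by fixed-point iteration."""
    f = math.e  # exact solution when gamma == 0
    for _ in range(max_iter):
        f_next = math.exp(1.0 + gamma / f)
        if abs(f_next - f) < tol:
            return f_next
        f = f_next
    raise RuntimeError("fixed-point iteration did not converge")


for gamma in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(f"gamma = {gamma:.1f}  f_opt = {optimum_fanout(gamma):.2f}")
```

For γ = 1 the iteration settles near 3.6, in line with the value quoted on the slide.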

24

Impact of Loading on tp

With self-loading γ=1

[Figure: normalized delay (0 to 7) versus f (1 to 5)]
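The shape of this curve can be reproduced from the delay expression. Since tp is proportional to (f + γ)/ln f for a fixed overall fanout F, the sketch below evaluates that factor on a coarse grid and normalizes it to the best grid point. The function name, the grid, and the normalization to the grid minimum are assumptions; the plot on the slide may use a different normalization.

```python
import math


def relative_delay(f, gamma=1.0):
    """f-dependent factor of the chain delay: (f + gamma) / ln f."""
    if f <= 1.0:
        raise ValueError("effective fanout must exceed 1")
    return (f + gamma) / math.log(f)


fanouts = [1.5 + 0.5 * k for k in range(8)]  # 1.5, 2.0, ..., 5.0
best = min(relative_delay(f) for f in fanouts)
normalized = {f: relative_delay(f) / best for f in fanouts}

for f, d in normalized.items():
    print(f"f = {f:.1f}  normalized delay = {d:.3f}")
```

On this grid the minimum is shallow on the high side and steep on the low side: choosing a fanout somewhat larger than optimal costs little delay, while a fanout close to 1 costs a lot.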

25

Extending the Model

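For given N: Ci+1/Ci = Ci/Ci-1
To find N: Ci+1/Ci ~ 4

Method of logical effort generalizes this to any logic path

[Figure: chain of N stages (1, 2, ..., N) from In to Out driving CL]

$$\text{Delay} = \sum_{i=1}^{N}\left(p_i + g_i \cdot f_i\right) \quad \text{(in units of } \tau_{inv}\text{)}$$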

26

Logical Effort

$$\text{Delay} = k \cdot R_{unit} C_{unit}\left(1 + \frac{C_L}{\gamma\, C_{in}}\right) = \tau \left(p + g \cdot f\right)$$

p – intrinsic delay – gate parameter ≠ f(W)
g – logical effort – gate parameter ≠ f(W)
f – electrical effort (fanout)

Normalize everything to an inverter: ginv = 1, pinv = 1

Divide everything by τinv (everything is measured in unit delays τinv)

Assume γ = 1.

27

Delay in a Logic Gate

Gate delay:

d = h + p (h: effort delay, p: intrinsic delay)

Effort delay:

h = g f (g: logical effort, f: effective fanout = Cout/Cin)

Logical effort is a function of topology, independent of sizing

Effective fanout (electrical effort) is a function of load/gate size

28

Logical Effort

Inverter has the smallest logical effort and intrinsic delay of all static CMOS gates

Logical effort of a gate represents the ratio of its input capacitance to the inverter capacitance when sized to deliver the same current

Logical effort increases with the gate complexity

29

Logical Effort

Logical effort is the ratio of input capacitance of a gate to the input capacitance of an inverter with the same output current

[Figure: three gates labeled g = 1, g = , g = (the last two values are left blank); size factors 1.8 and 1.5]

30

Logical Effort of Gates

[Figure: normalized delay (d) versus fan-out (f), 1 to 7, for tpINV and tpNAND; label F(Fan-in); g, p, and d are left blank for each curve]

31

Logical Effort of Gates

[Figure: normalized delay (d) versus fan-out (f), 1 to 7, for tpINV and tpNAND; label F(Fan-in)]

tpINV: g = 1, p = 1, d = f + 1

tpNAND: g = 3.5/3, p = 5.5/3, d = (3.5/3)f + 1.8
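The two lines in the plot follow directly from d = g f + p. The sketch below tabulates both using the g and p values given on the slide; the function name and the dictionary layout are assumptions made for illustration.

```python
def gate_delay(g, f, p):
    """Normalized gate delay d = g * f + p, in units of the inverter delay."""
    return g * f + p


# g and p as given on the slide for the inverter and the NAND gate.
gates = {
    "INV": {"g": 1.0, "p": 1.0},
    "NAND": {"g": 3.5 / 3, "p": 5.5 / 3},
}

for fanout in range(1, 8):
    row = {name: gate_delay(v["g"], fanout, v["p"]) for name, v in gates.items()}
    print(fanout, {name: round(d, 2) for name, d in row.items()})
```

The logical effort g sets the slope of each line and the intrinsic delay p sets its intercept, so the NAND line is both steeper and higher than the inverter line.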

32

Add Branching Effort

Branching effort:

$$b = \frac{C_{\text{on-path}} + C_{\text{off-path}}}{C_{\text{on-path}}}$$

33

Multistage Networks

Stage effort: hi = gifi

Path electrical effort: F = Cout/Cin

Path logical effort: G = g1g2…gN

Branching effort: B = b1b2…bN

Path effort: H = GFB

Path delay D = Σdi = Σpi + Σhi

$$\text{Delay} = \sum_{i=1}^{N}\left(p_i + g_i \cdot f_i\right)$$

34

Optimum Effort per Stage

When each stage bears the same effort:

$$h^N = H$$

$$h = \sqrt[N]{H}$$

Stage efforts: g1f1 = g2f2 = … = gNfN

Effective fanout of each stage:

$$f_i = h / g_i$$

Minimum path delay:

$$\hat{D} = \sum_{i}\left(g_i f_i + p_i\right) = N H^{1/N} + P$$
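Equal stage effort translates into a sizing procedure: compute the path effort H = GFB, take h as its N-th root, then work backward from the load so that each stage satisfies g_i f_i = h. The sketch below illustrates this; the function name and the example path are assumptions. In particular, the NAND2 values g = 4/3 and p = 2 are the commonly quoted textbook numbers, not values taken from this lecture.

```python
import math


def size_path(g, b, p, c_in, c_load):
    """Size an N-stage path for minimum delay by equalizing stage effort.

    g, b, p: per-stage logical effort, branching effort, intrinsic delay.
    Returns (stage effort h, input capacitance of each stage, path delay).
    """
    n = len(g)
    path_effort = math.prod(g) * math.prod(b) * (c_load / c_in)  # H = G * B * F
    h = path_effort ** (1.0 / n)

    caps = [0.0] * n
    c_next = c_load
    for i in reversed(range(n)):
        # h = g_i * f_i with f_i = b_i * C_next / C_in,i
        caps[i] = g[i] * b[i] * c_next / h
        c_next = caps[i]

    delay = n * h + sum(p)  # D = N * H^(1/N) + P
    return h, caps, delay


# Hypothetical path: inverter, NAND2, inverter, no branching, CL = 48 Cin.
h, caps, delay = size_path(
    g=[1.0, 4.0 / 3.0, 1.0],
    b=[1.0, 1.0, 1.0],
    p=[1.0, 2.0, 1.0],
    c_in=1.0,
    c_load=48.0,
)
```

For this path H is about 64, so h is about 4, the stage input capacitances come out near 1, 4, and 12, and the delay is about 16 unit delays. The first entry of `caps` reproducing the given input capacitance is a useful sanity check on the backward pass.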

35

Optimal Number of Stages

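For a given load, and given input capacitance of the first gate, find optimal number of stages and optimal sizing

$$D = N H^{1/N} + P$$

$$\frac{\partial D}{\partial N} = -H^{1/N} \ln\left(H^{1/N}\right) + H^{1/N} + \frac{\partial P}{\partial N} = 0$$

Here the last term is the parasitic delay contributed by each added stage (pinv when the added stages are inverters).

Substitute ‘best stage effort’:

$$h = H^{1/\hat{N}}$$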
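For integer stage counts the optimum can simply be found by enumeration. The sketch below does that under a simplifying assumption that is not stated on the slide: every stage contributes the same parasitic delay p_inv, so that P = N p_inv. The function name and the example path effort are also assumptions.

```python
def best_stage_count(path_effort, p_inv=1.0, max_stages=20):
    """Integer N minimizing D(N) = N * H^(1/N) + N * p_inv."""

    def delay(n):
        return n * path_effort ** (1.0 / n) + n * p_inv

    n_best = min(range(1, max_stages + 1), key=delay)
    return n_best, delay(n_best)


# Hypothetical path effort H = 64.
n_best, d_best = best_stage_count(64.0)
stage_effort = 64.0 ** (1.0 / n_best)
```

For H = 64 this picks N = 3 with a stage effort of about 4, which is close to the fanout-of-4 rule of thumb; two stages or four stages are both slightly slower.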

36

Logical Effort Optimization Methodology

For smaller problems, easy to translate into set of analytical expressions

Feed them into Matlab optimizer

With some careful manipulations, can be turned into a convex optimization problem (Stojanovic)

Easily extended to add power/energy

37

Optimization for Performance

Options
• Technology choice
  CMOS, bipolar, BiCMOS, GaAs, Superconducting
• Logic level optimizations
  logic depth, network topology, fan-out, gate complexity
• Circuit optimizations
  logic style, transistor sizing
• Physical optimization
  implementation choice, layout strategy

• Wires are the key

38

Logic Level Optimizations

Logic Depth

[Figure: logic between two blocks labeled R, with two alternatives separated by "or"]

Techniques: Restructuring, pipelining, retiming, technology mapping

Well covered by today’s logic and sequential synthesis

39

Logic Optimizations (2)

Fanout

Technique: Removal of common sub-expression

Start from tree structure/output

Tp = O(FO); also affects wiring capacitance

[Figure label: "Late arriving"]

40

Logic Optimizations (3)

Fanin

[Figure: tp (nsec), 0.0 to 4.0, versus fan-in, 1 to 9; curves tpHL, tp, and tpLH, with annotations "linear" and "quadratic"]

AVOID LARGE FAN-IN GATES! (Typically not more than FI < 4)

Tp = O(FI^2)!

Observation: only true if FI translates into series devices (e.g. NAND pull-down, NOR pull-up); otherwise linear
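The quadratic dependence on fan-in for series devices can be illustrated with a first-order Elmore model. This is a sketch under simplifying assumptions that do not come from the lecture: every series device has the same resistance r, every node of the stack (including the output) has the same capacitance c, and the load is ignored.

```python
def stack_elmore_delay(n_series, r=1.0, c=1.0):
    """Elmore delay of n_series identical devices in series.

    The node k devices away from the supply rail discharges through
    k resistances, so the delay is r * c * (1 + 2 + ... + n_series).
    """
    return sum(k * r * c for k in range(1, n_series + 1))


for fan_in in (1, 2, 4, 8):
    print(fan_in, stack_elmore_delay(fan_in))
```

Doubling the stack height roughly quadruples the delay in this model, whereas devices added in parallel only add capacitance at the output, which gives the linear case.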

41

Logic Optimizations (4)

[Figure: tp (psec) versus fan-out, 1 to 7, for tpINV, tpNAND, and tpNOR; label F(Fan-in)]

Slope is a function of “driving strength”

All the gates have the same drive current

42

Technology Mapping for Performance

Alternative coverings

Use low FI modules on critical path(s)

Library composition?

43

CMOS Logic Styles

CMOS tradeoffs:
- Speed
- Power (energy)
- Area

Design tradeoffs:
- Robustness, scalability
- Design time

Many styles: don’t try to remember the names – remember the principles

Changing the logic style – can it be done without breaking the synthesis flow?

44

CMOS Logic Styles

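Complementary

[Figure: complementary gate; labels VDD, PUN, OUT, PDN, GND, inputs A, B, C]

- robust
- scales
- large and slow

Pass Transistor Logic

[Figure: logic network with inputs A, B, C and output OUT]

- simple and fast
- not always very efficient
- versatile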

45

CMOS Logic Styles

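Ratioed Logic

[Figure: ratioed gate; labels VDD, LOAD, OUT, PDN, GND, inputs A, B, C]

RPDN << RLOAD

- small & fast
- static power

Dynamic Logic

[Figure: dynamic gate; labels VDD, PDN, clock φ, inputs In1, In2, In3, output Out, load CL]

- Small & fastest!
- Noise issues
- Scales?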

46

Pulsed Static CMOS

RH – Reset high
RL – Reset low

[Figure: gates annotated "Fast pull-up" and "Fast pull-down"]

Chen, Ditlow, US Pat. 5,495,188 Feb. 1996.

47

PS-CMOS

Evaluation and reset waves: reset is 1.5x slower

48

PS-CMOS

Advantages:

No dynamic nodes – good noise immunity

Reset delay slower than evaluation

No data dependent delay (worst case gets better)

No false transitions

Disadvantages

Width of reset wave limits logic depth

Margin in design

49

Skewing Gates

Different rising and falling delays

[Figure: gate with device widths W and W]

LE =

50

Skewing Gates

[Figure: gate with device widths 4W and W]

LE =

51

Skewing Gates