Buffer Insertion

Interconnect Optimizations

Description: basics of buffer insertion

Transcript of Buffer Insertion

Page 1: Buffer Insertion

Interconnect Optimizations

Page 2: Buffer Insertion

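A scaling primer

• Ideal process scaling:
  – Device geometries shrink by $s$ ($= 0.7\times$)
    • Device delay shrinks by $s$
  – Wire geometries shrink by $s$
    • $R$ per unit length: $\rho/(ws \cdot hs) = r/s^2$
    • $C_c$ per unit length: $\epsilon (hs)/(Ss) = C_c$
    • $C$ per unit length: similar
    • $R$ per unit length doubles; $C$ and $C_c$ per unit length are unchanged

[Figure: transistor (S, G, D) and wire dimensions: width w, height h, length l, spacing S]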

Page 3: Buffer Insertion

Interconnect role

• Short (local) interconnect
  – Used to connect nearby cells
  – Minimize wire C, i.e., use short min-width wires
• Medium to long-distance (global) interconnect
  – Size wires to trade off area vs. delay
  – Increasing width ⇒ capacitance increases, resistance decreases ⇒ need to find an acceptable tradeoff (the wire sizing problem)
• “Fat” wires
  – Thicker cross-sections in higher metal layers
  – Useful for reducing delays for global wires
  – Inductance issues, sharing of limited resource

Page 4: Buffer Insertion

Cross-Section of A Chip

Page 5: Buffer Insertion

Block scaling

• Block area often stays the same
  – # cells, # nets double
  – Wiring histogram shape invariant
• Global interconnect lengths don’t shrink
• Local interconnect lengths shrink by s

Page 6: Buffer Insertion

Interconnect delay scaling

• Delay of a wire of length $l$: $\tau_{int} = (rl)(cl) = rcl^2$ (first order)
• Local interconnects: $\tau_{int} = (r/s^2)(c)(ls)^2 = rcl^2$
  – Local interconnect delay unchanged (compare to faster devices)
• Global interconnects: $\tau_{int} = (r/s^2)(c)(l)^2 = rcl^2/s^2$
  – Global interconnect delay doubles: unsustainable!
• Interconnect delay increasingly more dominant
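The scaling argument above can be checked numerically. The following is a minimal sketch, assuming normalised values r = c = l = 1 (hypothetical, chosen only for illustration) and the ideal scaling factor s = 0.7 from the primer; the function name `wire_delay` is not from the slides.

```python
def wire_delay(r, c, length):
    """First-order wire delay r*c*l^2, as on the slide."""
    return r * c * length ** 2

s = 0.7                      # ideal scaling factor per generation
r, c, l = 1.0, 1.0, 1.0      # normalised, hypothetical values

base = wire_delay(r, c, l)
local_scaled = wire_delay(r / s**2, c, l * s)   # local wire: length shrinks by s
global_scaled = wire_delay(r / s**2, c, l)      # global wire: length unchanged

local_ratio = local_scaled / base
global_ratio = global_scaled / base
print(f"local: x{local_ratio:.2f}, global: x{global_ratio:.2f}")
```

Under these assumptions the local ratio stays at about 1 while the global ratio is 1/s², roughly 2, which is the "doubles" on the slide.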

Page 7: Buffer Insertion

Buffer Insertion For Delay Reduction

Page 8: Buffer Insertion

Analysis of Simple RC Circuit

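[Figure: voltage source $v_T(t)$ (input waveform) driving resistor R in series with capacitor C; $i(t)$ is the current and the capacitor voltage $v(t)$ is the state variable]

$$R\,i(t) + v(t) = v_T(t)$$

$$i(t) = \frac{d\,(Cv(t))}{dt} = C\,\frac{dv(t)}{dt}$$

$$RC\,\frac{dv(t)}{dt} + v(t) = v_T(t)$$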

Page 9: Buffer Insertion

Analysis of Simple RC Circuit

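Step-input response:

$$RC\,\frac{dv(t)}{dt} + v(t) = v_0 u(t)$$

$$v(t) = K e^{-t/RC} + v_0 u(t)$$

match initial state:

$$v(0) = 0 \;\Rightarrow\; K + v_0 u(t) = 0 \;\Rightarrow\; K = -v_0$$

output response for step-input:

$$v(t) = v_0 \left(1 - e^{-t/RC}\right) u(t)$$

[Figure: input step $v_0 u(t)$ and output waveform $v_0(1 - e^{-t/RC})u(t)$ rising toward $v_0$]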

Page 10: Buffer Insertion

Delays of Simple RC Circuit

• $v(t) = v_0(1 - e^{-t/RC})$: waveform under step input $v_0 u(t)$
• $v(t) = 0.5 v_0 \Rightarrow t = 0.69RC$
  – i.e., delay = 0.69RC (50% delay)
• $v(t) = 0.1 v_0 \Rightarrow t = 0.1RC$; $v(t) = 0.9 v_0 \Rightarrow t = 2.3RC$
  – i.e., rise time = 2.2RC (if defined as time from 10% to 90% of Vdd)
• Commonly used metric: $T_D = RC$ (= Elmore delay)
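The threshold-crossing times quoted above follow from inverting the step response. A small sketch, with times expressed in units of RC; the helper name `time_to_reach` is an assumption for illustration:

```python
import math

def time_to_reach(fraction, rc=1.0):
    """Time at which v(t) = fraction * v0, for v(t) = v0 * (1 - exp(-t/RC))."""
    return -rc * math.log(1.0 - fraction)

t50 = time_to_reach(0.5)    # 50% delay
t10 = time_to_reach(0.1)
t90 = time_to_reach(0.9)
rise = t90 - t10            # 10%-to-90% rise time
print(f"t50={t50:.2f}RC t10={t10:.2f}RC t90={t90:.2f}RC rise={rise:.2f}RC")
```

The values agree with the slide to within rounding: about 0.69RC for the 50% delay and about 2.2RC for the rise time.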

Page 11: Buffer Insertion

Elmore Delay

Delay

Page 12: Buffer Insertion

Elmore Delay

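• Driver is modeled as R
• Driver intrinsic gate delay t(B)
• Delay = $\sum_{\text{all } R_i} \; \sum_{\text{all } C_j \text{ downstream from } R_i} R_i C_j$
• Elmore delay at n2: R(B)·(C1 + C2) + R(w)·C2
• Elmore delay at n1: R(B)·(C1 + C2)

[Figure: buffer B with output resistance R(B) drives node n1 (capacitance C1); wire resistance R(w) connects n1 to node n2 (capacitance C2)]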
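The double sum for the Elmore delay can be sketched for an RC chain as follows. The function name and the numeric values (R(B) = 2, R(w) = 1, C1 = 3, C2 = 4) are hypothetical and only illustrate the two-node circuit above; the intrinsic delay t(B) is left out.

```python
def elmore_delays(resistances, capacitances):
    """Elmore delay at every node of an RC chain.

    resistances[i] connects node i-1 (or the source) to node i;
    capacitances[i] is the grounded capacitance at node i.
    """
    delays = []
    delay = 0.0
    for i, r in enumerate(resistances):
        downstream = sum(capacitances[i:])   # all C_j downstream from R_i
        delay += r * downstream
        delays.append(delay)
    return delays

# Hypothetical values: R(B)=2, R(w)=1, C1=3, C2=4
d_n1, d_n2 = elmore_delays([2.0, 1.0], [3.0, 4.0])
print(d_n1, d_n2)
```

With these numbers the delay at n1 is R(B)·(C1 + C2) = 14 and the delay at n2 adds R(w)·C2 for a total of 18.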

Page 13: Buffer Insertion

Elmore Delay

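• For a uniform wire of length x with unit wire capacitance c and unit wire resistance r, driving a load C, the Elmore delay is $rx(cx/2 + C)$
• No matter how finely the wire is lumped (into π-segments), the Elmore delay is the same

[Figure: uniform wire of length x driving load C]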
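The lumping claim can be checked numerically. This sketch assumes each piece of wire is modelled as a π-segment (half of its capacitance at each end) and compares against the closed form rx(cx/2 + C), which matches the wire term used in the unbuffered-delay expression later in the slides. All numeric values are hypothetical.

```python
import math

def lumped_wire_delay(r, c, x, load, segments):
    """Elmore delay of a wire of length x split into equal pi-segments."""
    seg_r = r * x / segments
    seg_c = c * x / segments
    delay = 0.0
    for i in range(segments):
        # Downstream of segment i: its far-end half capacitance,
        # all later segments in full, and the load.
        downstream = seg_c / 2 + (segments - 1 - i) * seg_c + load
        delay += seg_r * downstream
    return delay

r, c, x, load = 0.5, 0.2, 10.0, 3.0      # hypothetical values
closed_form = r * x * (c * x / 2 + load)
results = [lumped_wire_delay(r, c, x, load, n) for n in (1, 2, 5, 100)]
print(closed_form, results)
```

Every segment count gives the same delay up to floating-point rounding, so a single π-segment is enough for Elmore-delay purposes.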

Page 14: Buffer Insertion

Delay for Buffer

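[Figure: buffer between nodes u and v driving load C, and its equivalent model with input capacitance C(b) at u]

A buffer is characterized by:
• Intrinsic buffer delay
• Driver resistance
• Input capacitance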

Page 15: Buffer Insertion

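Buffers Reduce Wire Delay

[Figure: a driver with resistance R drives a wire of length x into load C; in the buffered case the wire is split into two halves of length x/2 (each with resistance rx/2 and capacitances cx/4, cx/4) with a buffer of resistance R and input capacitance C in the middle; a plot shows ∆t versus x]

$$t_{unbuf} = R(cx + C) + rx(cx/2 + C)$$

$$t_{buf} = 2R(cx/2 + C) + rx(cx/4 + C) + t_b$$

$$t_{buf} - t_{unbuf} = RC + t_b - rcx^2/4$$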
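Setting the difference RC + tb − rcx²/4 to zero gives the wire length beyond which a mid-wire buffer pays off. The sketch below evaluates both delay expressions and that break-even length; the function names and the normalised parameter values are assumptions for illustration.

```python
import math

def t_unbuf(R, C, r, c, x):
    return R * (c * x + C) + r * x * (c * x / 2 + C)

def t_buf(R, C, r, c, x, tb):
    return 2 * R * (c * x / 2 + C) + r * x * (c * x / 4 + C) + tb

def break_even_length(R, C, r, c, tb):
    """Length where RC + tb = r*c*x^2/4; longer wires benefit from the buffer."""
    return 2 * math.sqrt((R * C + tb) / (r * c))

R, C, r, c, tb = 1.0, 1.0, 1.0, 1.0, 1.0   # hypothetical normalised values
x_star = break_even_length(R, C, r, c, tb)
short_gain = t_buf(R, C, r, c, 2.0, tb) - t_unbuf(R, C, r, c, 2.0)
long_gain = t_buf(R, C, r, c, 4.0, tb) - t_unbuf(R, C, r, c, 4.0)
print(x_star, short_gain, long_gain)
```

With these values the break-even length is about 2.83: at x = 2 the buffer makes the path slower by 1, and at x = 4 it makes it faster by 2.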

Page 16: Buffer Insertion

Combinational Logic Delay

Combinational logic delay <= clock period

[Figure: primary input → register → combinational logic → register → primary output, with the registers driven by the clock]

Page 17: Buffer Insertion

Buffered global interconnects: Intuition

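[Figure: a wire of total length l divided by buffers into segments l1, l2, l3, ..., ln]

Unbuffered, interconnect delay = $rcl^2$

With buffers, interconnect delay = $\sum_i rc\,l_i^2 < rcl^2$ (where $l = \sum_j l_j$), since $\sum_j l_j^2 < \left(\sum_j l_j\right)^2$

(Of course, account for buffer delay also)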

Page 18: Buffer Insertion

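Optimal inter-buffer length

• First order (lumped parasitic, Elmore delay) analysis
• Assume N identical buffers with equal inter-buffer length l (L = Nl)

$$T = N\left[R_d(C_g + cl) + rl\left(C_g + \frac{cl}{2}\right)\right] = L\left[\frac{rcl}{2} + (rC_g + R_d c) + \frac{1}{l} R_d C_g\right]$$

• For minimum delay, $\frac{dT}{dl} = 0$:

$$L\left[\frac{rc}{2} - \frac{R_d C_g}{l_{opt}^2}\right] = 0 \quad\Rightarrow\quad l_{opt} = \sqrt{\frac{2 R_d C_g}{rc}}$$

where
• Rd: on-resistance of inverter
• Cg: gate input capacitance
• r, c: resistance, capacitance per micron

[Figure: a line of total length L with identical buffers spaced l apart]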

Page 19: Buffer Insertion

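Optimal interconnect delay

• Substituting $l_{opt}$ back into the interconnect delay expression:

$$T_{opt} = L\left[\frac{rc\,l_{opt}}{2} + (rC_g + R_d c) + \frac{1}{l_{opt}} R_d C_g\right] = L\left[\frac{rc}{2}\sqrt{\frac{2R_dC_g}{rc}} + (rC_g + R_d c) + R_d C_g \sqrt{\frac{rc}{2 R_d C_g}}\right]$$

$$T_{opt} = L\left[\sqrt{2 R_d C_g\, rc} + (rC_g + R_d c)\right]$$

Delay grows linearly with L (instead of quadratically)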
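The closed forms for the optimal inter-buffer length and the resulting delay can be sanity-checked numerically. This is a sketch under the first-order model T = L[rcl/2 + (rCg + Rd·c) + Rd·Cg/l]; the function names and the normalised parameter values are hypothetical.

```python
import math

def total_delay(L, l, Rd, Cg, r, c):
    """T = L [ rcl/2 + (r Cg + Rd c) + Rd Cg / l ] for inter-buffer length l."""
    return L * (r * c * l / 2 + (r * Cg + Rd * c) + Rd * Cg / l)

def optimal_length(Rd, Cg, r, c):
    return math.sqrt(2 * Rd * Cg / (r * c))

def optimal_delay(L, Rd, Cg, r, c):
    return L * (math.sqrt(2 * Rd * Cg * r * c) + r * Cg + Rd * c)

Rd, Cg, r, c = 2.0, 1.0, 1.0, 1.0     # hypothetical normalised values
l_opt = optimal_length(Rd, Cg, r, c)
t_at_opt = total_delay(10.0, l_opt, Rd, Cg, r, c)
t_shorter = total_delay(10.0, l_opt / 2, Rd, Cg, r, c)
t_longer = total_delay(10.0, l_opt * 2, Rd, Cg, r, c)
print(l_opt, t_at_opt, t_shorter, t_longer)
```

Here the optimal spacing is 2, the delay at that spacing equals the closed-form optimum, spacings half or twice as long are slower, and doubling L doubles the optimal delay.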

Page 20: Buffer Insertion

Total buffer count

• Ever-increasing fractions of total cell count will be buffers
  – 70% in 32nm

[Figure: bar chart of % cells used to buffer nets (axis 0 to 80) at the 90nm, 65nm, 45nm, and 32nm nodes; series: clk-buf, buf, tot-buf]

Page 21: Buffer Insertion

ITRS projections

[Figure: relative delay (log scale, 0.1 to 100) versus feature size (250, 180, 130, 90, 65, 45, 32 nm) for gate delay (fanout 4), local interconnect (M1,2), global interconnect with repeaters, and global interconnect without repeaters]

Source: ITRS, 2003

Page 22: Buffer Insertion

Buffers Improve Slack

RAT = Required Arrival Time
Slack = RAT - Delay

Without buffer:
• Sink 1: RAT = 300, Delay = 350, Slack = -50
• Sink 2: RAT = 700, Delay = 600, Slack = 100
• slackmin = -50

With buffer (decouple capacitive load from critical path):
• Sink 1: RAT = 300, Delay = 250, Slack = 50
• Sink 2: RAT = 700, Delay = 400, Slack = 300
• slackmin = 50
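The slack bookkeeping on this slide is just RAT minus delay per sink, followed by a minimum over sinks. A minimal sketch using the slide's numbers; the function name `min_slack` is an assumption:

```python
def min_slack(sinks):
    """sinks: list of (rat, delay) pairs; slack = RAT - delay."""
    return min(rat - delay for rat, delay in sinks)

unbuffered = [(300, 350), (700, 600)]
buffered = [(300, 250), (700, 400)]

print(min_slack(unbuffered), min_slack(buffered))
```

The unbuffered net has a worst slack of -50 and the buffered net a worst slack of 50, as stated on the slide.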

Page 23: Buffer Insertion

Timing Driven Buffering Problem Formulation

• Given
  – A Steiner tree
  – RAT at each sink
  – A buffer type
  – RC parameters
  – Candidate buffer locations
• Find buffer insertion solution such that the slack at the driver is maximized

Page 24: Buffer Insertion

Candidate Buffering Solutions

Page 25: Buffer Insertion

Candidate Solution Characteristics

• Each candidate solution is associated with
  – vi: a node
  – ci: downstream capacitance
  – qi: RAT
• If vi is a sink, ci is the sink capacitance
• Otherwise v is an internal node

Page 26: Buffer Insertion

Van Ginneken’s Algorithm

Candidate solutions are propagated toward the source

Dynamic Programming

Page 27: Buffer Insertion

Solution Propagation: Add Wire

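[Figure: candidate (v1, c1, q1) at the downstream end of a wire of length x becomes (v2, c2, q2) at the upstream end]

• $c_2 = c_1 + cx$
• $q_2 = q_1 - rcx^2/2 - rxc_1$
• r: wire resistance per unit length
• c: wire capacitance per unit length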

Page 28: Buffer Insertion


Solution Propagation: Insert Buffer

[Figure: candidate (v1, c1, q1) becomes (v1, c1b, q1b) after a buffer is inserted at v1]

• $c_{1b} = C_b$
• $q_{1b} = q_1 - R_b c_1 - t_b$
• Cb: buffer input capacitance
• Rb: buffer output resistance
• tb: buffer intrinsic delay

Page 29: Buffer Insertion

Solution Propagation: Merge

[Figure: left-branch candidate (v, cl, ql) and right-branch candidate (v, cr, qr) meet at branch node v]

• $c_{merge} = c_l + c_r$
• $q_{merge} = \min(q_l, q_r)$

Page 30: Buffer Insertion

Solution Propagation: Add Driver

[Figure: candidate (v0, c0, q0) at the root becomes (v0, c0d, q0d) after the driver is added]

• $q_{0d} = q_0 - R_d c_0 = slack_{min}$
• Rd: driver resistance
• Pick solution with max slackmin

Page 31: Buffer Insertion

Example of Solution Propagation

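• r = 1, c = 1
• Rb = 1, Cb = 1, tb = 1
• Rd = 1

The sink candidate is (v1, 1, 20); the wires from v1 to v2 and from v2 to v3 each have length 2.

• Add wire (length 2): (v1, 1, 20) → (v2, 3, 16)
• Without a buffer at v2:
  – Add wire (length 2): (v2, 3, 16) → (v3, 5, 8)
  – Add driver: slack = 3
• With a buffer at v2:
  – Insert buffer: (v2, 3, 16) → (v2, 1, 12)
  – Add wire (length 2): (v2, 1, 12) → (v3, 3, 8)
  – Add driver: slack = 5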
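The propagation steps in this example (add wire, insert buffer, add driver) can be reproduced with a few lines of code. This is a sketch, not the original implementation: the `Candidate` type and function names are assumptions, the parameters default to the example's values (all equal to 1), and each wire has length 2.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    c: float   # downstream capacitance
    q: float   # required arrival time

def add_wire(cand, x, r=1.0, c=1.0):
    # c2 = c1 + c*x ; q2 = q1 - r*c*x^2/2 - r*x*c1
    return Candidate(cand.c + c * x, cand.q - r * c * x**2 / 2 - r * x * cand.c)

def insert_buffer(cand, Rb=1.0, Cb=1.0, tb=1.0):
    # c1b = Cb ; q1b = q1 - Rb*c1 - tb
    return Candidate(Cb, cand.q - Rb * cand.c - tb)

def add_driver(cand, Rd=1.0):
    # slack at the driver: q0 - Rd*c0
    return cand.q - Rd * cand.c

at_v2 = add_wire(Candidate(1, 20), 2)
unbuffered_slack = add_driver(add_wire(at_v2, 2))
buffered_slack = add_driver(add_wire(insert_buffer(at_v2), 2))
print(at_v2, unbuffered_slack, buffered_slack)
```

The candidate at v2 is (3, 16), the unbuffered slack is 3, and the buffered slack is 5, matching the slide.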

Page 32: Buffer Insertion


Example of Merging

Left candidates

Right candidates

Merged candidates

Page 33: Buffer Insertion

Solution Pruning

• Two candidate solutions
  – (v, c1, q1)
  – (v, c2, q2)
• Solution 1 is inferior if
  – c1 ≥ c2: larger (or equal) load
  – and q1 ≤ q2: tighter (or equal) timing
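The inferiority test can be turned into a pruning routine by sorting candidates by load and keeping only those that improve q. This is a sketch under the assumption that ties are also pruned (a candidate with equal load and no better timing is dropped); the function name `prune` is hypothetical, and the sample candidates are the (C, q) pairs from the Van Ginneken example later in the slides.

```python
def prune(candidates):
    """Drop inferior (c, q) candidates.

    A candidate survives only if no other candidate has a load that is
    at most as large together with a q that is at least as large.
    """
    kept = []
    # Sort by increasing c; for equal c put the larger q first.
    for c, q in sorted(candidates, key=lambda cq: (cq[0], -cq[1])):
        if not kept or q > kept[-1][1]:
            kept.append((c, q))
    return kept

survivors = prune([(45, 50), (5, 0), (20, 100), (5, 70)])
print(survivors)
```

After the sort, a single linear scan is enough because every survivor must have a strictly larger q than the previous one. For the sample list, (5, 0) and (45, 50) are pruned, leaving (5, 70) and (20, 100).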

Page 34: Buffer Insertion

Pruning When Insert Buffer

All candidates created by inserting a buffer at the same node have the same load cap Cb, so only the one with max q is kept

Page 35: Buffer Insertion

Generating Candidates

[Figure: candidate generation, steps (1) to (3)]

From Dr. Charles Alpert

Page 36: Buffer Insertion

Pruning Candidates

[Figure: step (3) with candidates (a) and (b), followed by step (4)]

Both (a) and (b) “look” the same to the source. Throw out the one with the worst slack.

Page 37: Buffer Insertion

Candidate Example Continued

[Figure: steps (4) and (5)]

Page 38: Buffer Insertion

Candidate Example Continued

[Figure: step (5) after pruning]

At driver, compute which candidate maximizes slack. Result is optimal.

Page 39: Buffer Insertion


Merging Branches

Right Candidates

Left Candidates

Page 40: Buffer Insertion


Pruning Merged Branches

Critical

With pruning

Page 41: Buffer Insertion

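Van Ginneken Example

Candidates are written as (C, q): downstream capacitance and required arrival time.

• Sink candidate: (20,400)
• Wire (C=10, d=150): (20,400) → (30,250)
• Buffer (C=5, d=30): (30,250) → (5,220)
• Candidates so far: (20,400) (30,250) (5,220)
• Wire (C=15): (30,250) → (45,50) with d=200; (5,220) → (20,100) with d=120
• Buffer (C=5): (45,50) → (5,0) with d=50; (20,100) → (5,70) with d=30
• New candidates: (45,50) (5,0) (20,100) (5,70)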

Page 42: Buffer Insertion

Van Ginneken Example Cont’d

• Candidates: (20,400) (30,250) (5,220) → (45,50) (5,0) (20,100) (5,70)
• (5,0) is inferior to (5,70). (45,50) is inferior to (20,100)
• After pruning: (20,100) (5,70)
• Wire (C=10): (20,100) → (30,10); (5,70) → (15,-10)
• Pick solution with largest slack, follow arrows to get solution

Page 43: Buffer Insertion

Basic Data Structure

(c1, q1) (c2, q2) (c3, q3)

Sorted list such that
• c1 < c2 < c3 (worse load cap toward the right)
• If there are no inferior candidates, q1 < q2 < q3 (better timing toward the right)

Page 44: Buffer Insertion

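Prune Solution List

(c1, q1) (c2, q2) (c3, q3) (c4, q4), in order of increasing c

[Figure: flowchart that scans the sorted list comparing q values (q1 < q2 ?, q2 < q3 ?, q3 < q4 ?, ...). On Y the scan advances to the next pair; on N the later candidate, which has a larger c but no better q, is pruned and the comparison continues with the following one (e.g. q1 < q2 ? N → prune 2, then q1 < q3 ?)]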

Page 45: Buffer Insertion

Pruning In Merging

Left candidates: (cl1, ql1), (cl2, ql2), (cl3, ql3)
Right candidates: (cr1, qr1), (cr2, qr2)

With ql1 < ql2 < qr1 < ql3 < qr2, the merged candidates are:
• (cl1+cr1, ql1)
• (cl2+cr1, ql2)
• (cl3+cr1, qr1)
• (cl3+cr2, ql3)
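The merged list above can be produced with a two-pointer walk over the two sorted candidate lists, which is what keeps merging additive rather than multiplicative. The sketch below assumes both lists are already pruned and sorted by increasing c (hence increasing q); the function name and numeric values are hypothetical, chosen so that ql1 < ql2 < qr1 < ql3 < qr2.

```python
def merge_branches(left, right):
    """Merge two pruned candidate lists sorted by increasing c (and q)."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        (cl, ql), (cr, qr) = left[i], right[j]
        merged.append((cl + cr, min(ql, qr)))
        # Advance the branch that limits q; only that can raise min(ql, qr).
        if ql < qr:
            i += 1
        elif qr < ql:
            j += 1
        else:
            i += 1
            j += 1
    return merged

left = [(1, 10), (2, 20), (3, 40)]    # (cl1, ql1), (cl2, ql2), (cl3, ql3)
right = [(1, 30), (2, 50)]            # (cr1, qr1), (cr2, qr2)
merged = merge_branches(left, right)
print(merged)
```

The result has four candidates, in the same pattern as the slide: (cl1+cr1, ql1), (cl2+cr1, ql2), (cl3+cr1, qr1), (cl3+cr2, ql3). In general the walk produces at most len(left) + len(right) − 1 candidates.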

Page 46: Buffer Insertion

Van Ginneken Complexity

• Generate candidates from sinks to source
• Quadratic runtime
  – Adding a wire does not change #candidates
  – Adding a buffer adds only one new candidate
  – Merging branches additive, not multiplicative
  – Linear time solution list pruning
• Optimal for Elmore delay model
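For a net with a single sink (no branch merging), the whole procedure reduces to a short loop: add a wire, optionally insert a buffer, prune, and finally add the driver. The following is a simplified sketch under that two-pin assumption, with one buffer type and hypothetical function names; it returns only the best slack and does not record buffer positions.

```python
def prune(cands):
    """Keep only non-inferior (c, q) candidates, sorted by increasing c."""
    kept = []
    for c, q in sorted(cands, key=lambda cq: (cq[0], -cq[1])):
        if not kept or q > kept[-1][1]:
            kept.append((c, q))
    return kept

def van_ginneken_line(segments, sink, r, c, Rb, Cb, tb, Rd):
    """Best driver slack for a two-pin net.

    segments: wire lengths listed from sink to driver; a buffer may be
    placed at every internal joint between two segments.
    sink: (capacitance, RAT) of the sink.
    """
    cands = [sink]
    for k, x in enumerate(segments):
        # Add wire: every candidate moves upstream by length x.
        cands = [(cc + c * x, q - r * c * x * x / 2 - r * x * cc)
                 for cc, q in cands]
        if k < len(segments) - 1:
            # Insert buffer: all buffered candidates share load Cb,
            # so only the one with max q is added.
            cands.append((Cb, max(q - Rb * cc for cc, q in cands) - tb))
        cands = prune(cands)
    # Add driver and pick the candidate with maximum slack.
    return max(q - Rd * cc for cc, q in cands)

with_location = van_ginneken_line([2, 2], (1, 20), 1, 1, 1, 1, 1, 1)
no_location = van_ginneken_line([4], (1, 20), 1, 1, 1, 1, 1, 1)
print(with_location, no_location)
```

With a candidate buffer location halfway along a wire of length 4 and all parameters equal to 1, the best slack is 5; with no candidate location it is 3.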

Page 47: Buffer Insertion

Multiple Buffer Types

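• r = 1, c = 1
• Rb1 = 1, Cb1 = 1, tb1 = 1
• Rb2 = 0.5, Cb2 = 2, tb2 = 0.5
• Rd = 1

The sink candidate is (v1, 1, 20); the wire from v1 to v2 has length 2.

• Add wire (length 2): (v1, 1, 20) → (v2, 3, 16)
• Insert buffer type 1 at v2: (v2, 3, 16) → (v2, 1, 12)
• Insert buffer type 2 at v2: (v2, 3, 16) → (v2, 2, 14)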
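With several buffer types, the insert-buffer step produces one new candidate per type. A minimal sketch using the two buffer types from this slide; the function name and the (Rb, Cb, tb) tuple layout are assumptions for illustration.

```python
def insert_buffers(cand, buffer_types):
    """Return the unbuffered candidate plus one candidate per buffer type.

    cand: (c, q); buffer_types: list of (Rb, Cb, tb) tuples.
    """
    c, q = cand
    return [cand] + [(Cb, q - Rb * c - tb) for Rb, Cb, tb in buffer_types]

library = [(1.0, 1.0, 1.0), (0.5, 2.0, 0.5)]   # (Rb, Cb, tb) from the slide
at_v2 = insert_buffers((3.0, 16.0), library)
print(at_v2)
```

This yields the three candidates at v2 shown above: (3, 16), (1, 12), and (2, 14). None is inferior to another, since a larger load always comes with a larger q, so all three are propagated upstream.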