Delay Calculation
Kenneth Yun, UC San Diego
Adapted from EE271 notes, Stanford University
Overview
- Review (RC model)
- Elmore delay
- Transmission gates
- Shared contacts
- Transistor sizing
- Gate sizing
- Reading
  - W&E 4.1-4.3.6, 4.5.4
Review of RC Model
- RC model
  - Model transistor as a linear resistor
    - R = R□ × L/W, where R□ is the resistance per square
  - Model load as capacitor
    - Delay = RC
    - Recall that R is adjusted so that RC gives the correct delay
- Source of load capacitance
  - Gate cap of driven transistors
  - Diffusion cap of source/drain region of driving transistor
  - Wire cap
Rule of Thumb Cap Table
Transistor cap
  Gate (poly over diff)      2.0 fF/μm
  ndiff (5 or 6 λ wide)      2.0 fF/μm
  pdiff (5 or 6 λ wide)      2.0 fF/μm
Wire cap
  Poly wire (2 λ wide)       0.2 fF/μm
  Metal 1 (3 or 4 λ wide)    0.3 fF/μm
  Metal 2 (3 or 4 λ wide)    0.2 fF/μm
Simple Diffusion Cap Model
- Diffusion cap = W × (μm/λ) × 2 fF/μm
  - assuming that the width of the diffusion contact is 5λ (6λ)
  - W in λ
  - μm/λ = 0.2 in 0.35 μm process
- For example, for W = 16λ in 0.35 μm process
  - Diffusion cap = 16 × 0.2 × 2 fF = 6.4 fF
[Figure: diffusion contact region with dimensions W and 5λ]
Folding Transistors
- To reduce diffusion cap
  - Cap of region x for 16λ transistor = 6.4 fF
  - Cap of region x for 16λ folded transistor = 3.2 fF
[Figure: layouts of an unfolded 16λ transistor and its folded version, with node x and Gnd diffusion contacts]
RC Delay
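- Rpu = 30K/12 = 2.5K; Rpd = 15K/8 = 1.875K
- Cload = 100 × 0.3fF + (16+8) × 0.2 × 2fF + (24+16) × 0.2 × 2fF = 30 + 9.6 + 16 = 55.6fF
- Delay in↑ → x↓ = Rpd × Cload = 1.875K × 55.6fF = 104.25ps
- Delay in↓ → x↑ = Rpu × Cload = 2.5K × 55.6fF = 139ps
[Figure: inverter (pMOS 24:2, nMOS 16:2) driving node x through 100 μm of M1 into an inverter (pMOS 16:2, nMOS 8:2)]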
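The arithmetic on this slide can be reproduced with a short Python sketch. It assumes the rule-of-thumb values used in these notes (30K and 15K per square for pMOS and nMOS, 2 fF/μm of transistor width, 0.3 fF/μm of M1, 0.2 μm per λ) and reads the "100 M1" label as 100 μm of metal 1. The function and constant names are illustrative only.

```python
LAMBDA_UM = 0.2           # um per lambda in the 0.35 um process
CAP_PER_UM_FF = 2.0       # fF per um of gate or diffusion width
M1_CAP_PER_UM_FF = 0.3    # fF per um of metal-1 wire
R_SQ_KOHM = {"p": 30.0, "n": 15.0}   # resistance per square, in kOhm


def resistance_kohm(kind, width, length=2):
    """Linear-resistor model: R = R_sq * L / W, with W and L in lambda."""
    return R_SQ_KOHM[kind] * length / width


def width_cap_ff(total_width_lambda):
    """Gate or diffusion cap for a total transistor width given in lambda."""
    return total_width_lambda * LAMBDA_UM * CAP_PER_UM_FF


# Wire + gate cap of the driven 16:2 / 8:2 inverter + diffusion cap of the
# driving 24:2 / 16:2 inverter.
c_load = 100 * M1_CAP_PER_UM_FF + width_cap_ff(16 + 8) + width_cap_ff(24 + 16)

# kOhm * fF = ps, so no unit conversion is needed.
t_fall = resistance_kohm("n", 16) * c_load   # x falling, through the nMOS
t_rise = resistance_kohm("p", 24) * c_load   # x rising, through the pMOS
print(f"Cload = {c_load:.1f} fF, fall = {t_fall:.2f} ps, rise = {t_rise:.2f} ps")
```

Working in kΩ and fF keeps the products directly in picoseconds, which matches the units on the slide.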
RC Delay (Continued)
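- RP1 = 30K/4 = 7.5K; RN1 = 15K/2 = 7.5K
- Cx = 100 × 0.3fF + (16+8) × 0.2 × 2fF + (8+4) × 0.2 × 2fF = 30 + 9.6 + 4.8 = 44.4fF
- Delay in → x = 7.5K × 44.4fF = 333ps
[Figure: three-inverter chain. P1/N1 (8:2, 4:2) drives node x through 100 μm of M1; P2/N2 (16:2, 8:2) drives node y through 200 μm of M1; P3/N3 (32:2, 16:2) is the final load]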
RC Delay (Continued)
- RP2 = 30K/8 = 3.75K = RN2
- Cy = 200 × 0.3fF + (32+16) × 0.2 × 2fF + (16+8) × 0.2 × 2fF = 60 + 19.2 + 9.6 = 88.8fF
- Delay x → y = 3.75K × 88.8fF = 333ps
- Delay in → y = Delay in → x + Delay x → y = 666ps
[Figure: the same three-inverter chain, in → x → y]
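The two-stage result can be checked by summing per-stage RC products. This is a minimal sketch under the same rule-of-thumb assumptions (0.3 fF/μm of M1, 0.2 μm per λ, 2 fF/μm of width); `stage_delay_ps` is a hypothetical helper, not something defined in the notes.

```python
def stage_delay_ps(r_kohm, wire_um, gate_width_lambda, diff_width_lambda,
                   wire_ff_per_um=0.3, lambda_um=0.2, ff_per_um=2.0):
    """One stage: driver resistance times (wire + gate + diffusion) cap."""
    cap_ff = (wire_um * wire_ff_per_um
              + gate_width_lambda * lambda_um * ff_per_um
              + diff_width_lambda * lambda_um * ff_per_um)
    return r_kohm * cap_ff   # kOhm * fF = ps


# Stage 1: P1/N1 (8:2, 4:2) drives 100 um of M1 and the gates of P2/N2.
t_in_x = stage_delay_ps(7.5, 100, gate_width_lambda=16 + 8, diff_width_lambda=8 + 4)
# Stage 2: P2/N2 (16:2, 8:2) drives 200 um of M1 and the gates of P3/N3.
t_x_y = stage_delay_ps(3.75, 200, gate_width_lambda=32 + 16, diff_width_lambda=16 + 8)
t_in_y = t_in_x + t_x_y
print(f"in->x {t_in_x:.0f} ps, x->y {t_x_y:.0f} ps, in->y {t_in_y:.0f} ps")
```

Both stages come out at the same delay because the second stage doubles every width and the wire length, which halves R and doubles C.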
Series Stacks
- If C3 >> C1 + C2, then the delay is approximately (R1+R2+R3)C3
- Otherwise, ???
  - Distributed RC
[Figure: three series resistors R1, R2, R3 with capacitors C1, C2, C3 to ground at successive nodes]
Elmore Delay
- For distributed RC network
  - Delay = R1C1 + (R1+R2)C2 + (R1+R2+R3)C3 + (R1+R2+R3+R4)C4
  - Sum of the delays to charge (discharge) individual capacitors
$$\text{Delay} = \sum_{k=1}^{n} C_k \sum_{i=1}^{k} R_i$$
[Figure: four-stage RC ladder, series resistors R1 to R4 with capacitors C1 to C4 to ground]
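The Elmore sum for a simple RC ladder is easy to express in code. The following Python sketch assumes the ladder is given as two equal-length lists, with `resistances[k]` in series just before the node carrying `capacitances[k]`; the function name is illustrative.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder: sum over k of (R1 + ... + Rk) * Ck."""
    if len(resistances) != len(capacitances):
        raise ValueError("need exactly one capacitor per resistor")
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r            # total resistance between source and this node
        delay += upstream_r * c    # time constant contributed by this capacitor
    return delay


# Four equal sections: RC + 2RC + 3RC + 4RC = 10RC.
print(elmore_delay([1, 1, 1, 1], [1, 1, 1, 1]))
```

When the last capacitor dominates, the result approaches (R1+R2+R3)C3, which is the lumped approximation given for series stacks.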
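Distributed RC Model for Wire
- R and C are not really lumped
- Use π model
- Break wire into many lumped elements
- Delay independent of number of segments
[Figure: driver Rtrans with three wire models: lumped C; one π section (C/2, R, C/2); two π sections (C/4, R/2, C/2, R/2, C/4)]
$$\text{Delay} = R_{trans} C + \frac{RC}{2}$$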
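The claim that the delay does not depend on the number of segments can be checked numerically. This Python sketch assumes each segment is a π section (half of its capacitance at each end) and uses the Elmore sum; `wire_delay` and its arguments are illustrative names.

```python
def wire_delay(r_trans, r_wire, c_wire, segments):
    """Elmore delay of a driver plus a wire split into `segments` pi sections."""
    r_seg = r_wire / segments
    c_seg = c_wire / segments
    # Each pi section puts c_seg / 2 at both ends, so interior nodes,
    # shared by two sections, carry a full c_seg.
    node_caps = [c_seg / 2] + [c_seg] * (segments - 1) + [c_seg / 2]
    delay = 0.0
    upstream_r = r_trans           # the first node sits right after the driver
    for cap in node_caps:
        delay += upstream_r * cap
        upstream_r += r_seg
    return delay


for n in (1, 2, 10, 100):
    print(n, wire_delay(r_trans=3.75, r_wire=2.0, c_wire=100.0, segments=n))
```

Every segment count gives Rtrans × C + RC/2 up to rounding. A plain lumped model, with all of C placed after all of R, would instead give Rtrans × C + RC.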
Series Stack
- Gate cap
  - Cap between gate and channel
  - Case I: gate cap only seen from gate side if source is grounded
  - Case II: gate cap also seen on the other side if source voltage is in transition (and gate voltage fixed)
[Figure: Case I, gate cap Cg seen only at the gate, with node cap Cx; Case II, Cg also seen at the source node alongside Cx]
Series Stack Delay
- R = 15K/4 = 3.75K
- C = Cdiff + Cgate = 2 × 8 × 0.2 × 2fF = 6.4fF
- Cout = Cdiff + Cload = 3.2fF + Cload
- Delay = RC + 2RC + 3RC + 4RCout = 6RC + 4RCout = 144ps + 4RCout
- For n stages, delay = n(n−1)RC/2 + nRCout
  - O(n²): quadratic in n
[Figure: stack of four series nMOS transistors between Gnd and out, modeled as four resistors R with internal node caps C and output cap Cout]
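The closed form for an n-high stack can be compared against the direct Elmore sum. The sketch below assumes n identical transistors of resistance R, n−1 internal nodes of capacitance C, and an output node of capacitance Cout, as in the slide; the function names are illustrative.

```python
def stack_delay_sum(n, r, c, c_out):
    """Direct Elmore sum: internal node k sees k*R, the output node sees n*R."""
    internal = sum(k * r * c for k in range(1, n))
    return internal + n * r * c_out


def stack_delay_closed_form(n, r, c, c_out):
    """n(n-1)RC/2 + nRCout, the quadratic-in-n expression."""
    return n * (n - 1) * r * c / 2 + n * r * c_out


# Four-high stack of 8:2 nMOS devices: R = 3.75 kOhm, C = 6.4 fF.
print(stack_delay_sum(4, 3.75, 6.4, 0.0))   # internal-node part only, in ps
for n in range(1, 9):
    print(n, stack_delay_closed_form(n, 3.75, 6.4, 3.2))
```

The printed table makes the quadratic growth visible, which is the reason the rules of thumb later recommend limiting fanin.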
Resistance of Transmission Gate
- Two transistors in parallel
  - When passing 0, resistance of pMOS doubles (roughly) because its |Vgs| falls from Vdd toward |Vth|
    - Rp = 2 × 30K/□ = 60K/□
    - Rn = 15K/□
    - For 1:1 p to n ratio, R = Rp || Rn = 12K/□
  - When passing 1, resistance of nMOS doubles (roughly) because its Vgs falls from Vdd toward Vth
    - Rn = 2 × 15K/□ = 30K/□
    - Rp = 30K/□
    - For 1:1 p to n ratio, R = Rp || Rn = 15K/□
- So R ≈ 15K/□
  - For 8:2 T-gate, R ≈ 3.75K
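The parallel-resistance arithmetic can be sketched in Python as follows. It assumes the rough "weak device doubles" rule stated above and a 1:1 p-to-n width ratio; `tgate_r_per_square` is a hypothetical helper name.

```python
def parallel(a, b):
    """Resistance of two resistors in parallel."""
    return a * b / (a + b)


def tgate_r_per_square(passing, rp_sq=30.0, rn_sq=15.0):
    """Transmission-gate resistance per square (kOhm), 1:1 p-to-n ratio."""
    if passing == 0:
        rp_sq *= 2      # pMOS is the weak device when passing 0
    elif passing == 1:
        rn_sq *= 2      # nMOS is the weak device when passing 1
    else:
        raise ValueError("passing must be 0 or 1")
    return parallel(rp_sq, rn_sq)


worst_case = max(tgate_r_per_square(0), tgate_r_per_square(1))
# An 8:2 gate has L/W = 2/8 squares.
print(tgate_r_per_square(0), tgate_r_per_square(1), worst_case * 2 / 8)
```

Using the worse of the two cases gives a single conservative resistance for the gate, which is the 15K per square figure used in the rest of the notes.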
Capacitance of Transmission Gate
- When off, cap on node A (or B) entirely from two diffusion contacts
  - 2 × 8 × 0.2 × 2fF = 6.4fF
- When on, an additional gate cap is also seen because a source voltage is in transition
  - But only one matters because Vgs = 0 for pMOS when passing 0, and Vgs = 0 for nMOS when passing 1
  - Total cap = 3 × 3.2fF = 9.6fF
[Figure: 8:2 / 8:2 transmission gate between nodes A and B, modeled as 3.75K with 9.6fF on each side]
Transmission Gate Example
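- Ca = 2 diff contacts (INV1) + TG cap (on) = 2 × 3.2fF + 9.6fF = 16fF
- Cb = TG cap (on) + TG cap (off) + gate cap (INV2) = 9.6fF + 6.4fF + 6.4fF = 22.4fF
  - Why is the second TG cap smaller?
- All transistors are 8:2
[Figure: INV1 (input in) drives node a; an on transmission gate (3.75K, 9.6fF on each side) connects a to b; node b also connects to an off transmission gate and to the input of INV2]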
RC Model of Example
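- Pull-up through the 8:2 pMOS (7.5K): t in → b = 7.5K × 16fF + (7.5K + 3.75K) × 22.4fF = 372ps
[Figure: RC model, 7.5K to node a (16fF), then 3.75K to node b (22.4fF)]
- Pull-down through the 8:2 nMOS (3.75K): t in → b = 3.75K × 16fF + (3.75K + 3.75K) × 22.4fF = 228ps
[Figure: RC model, 3.75K to node a (16fF), then 3.75K to node b (22.4fF)]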
Shared Contacts
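- Cap on node a reduced by 6.4fF
  - Inverter and TG share diffusion contacts
- Cap on node b reduced by 6.4fF
  - Two TGs share diffusion contacts
[Figure: layout with INV, TG, TG sharing diffusion contacts at nodes a and b; RC model of the on TG: 3.75K between a (9.6fF) and b (16fF)]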
RC Delay with Shared Contacts
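- Pull-up: t in → b = 7.5K × 9.6fF + (7.5K + 3.75K) × 16fF = 252ps
[Figure: RC model, 7.5K to node a (9.6fF), then 3.75K to node b (16fF)]
- Pull-down: t in → b = 3.75K × 9.6fF + (3.75K + 3.75K) × 16fF = 156ps
[Figure: RC model, 3.75K to node a (9.6fF), then 3.75K to node b (16fF)]
- > 30% reduction in delay!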
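The effect of sharing contacts can be tabulated with a two-node Elmore calculation. This sketch takes the node capacitances from the slides (16fF and 22.4fF with separate contacts, 9.6fF and 16fF with shared contacts) and treats 7.5K and 3.75K as the pull-up and pull-down driver resistances of an 8:2 inverter; the names are illustrative.

```python
def elmore_two_node(r_driver, r_tgate, c_a, c_b):
    """Elmore delay to node b: r_driver feeds node a, r_tgate links a to b."""
    return r_driver * c_a + (r_driver + r_tgate) * c_b


R_TGATE = 3.75   # kOhm, 8:2 transmission gate
results = {}
for edge, r_driver in (("pull-up", 7.5), ("pull-down", 3.75)):
    separate = elmore_two_node(r_driver, R_TGATE, 16.0, 22.4)   # ps
    shared = elmore_two_node(r_driver, R_TGATE, 9.6, 16.0)      # ps
    results[edge] = (separate, shared, 1 - shared / separate)
    print(f"{edge}: {separate:.0f} ps -> {shared:.0f} ps "
          f"({100 * results[edge][2]:.1f}% less)")
```

Both transitions improve by a little over 30%, consistent with the figure quoted on the slide.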
Transistor Sizing
- Need delay estimates to size transistors
- Need to know
  - Load the transistor drives
  - Load the transistor presents to its predecessor
- For example, to drive a large cap (2pF shown below)
  - need a large driver (400:2 and 200:2 for p and n)
- But large driver also slows down the predecessor stage
[Figure: three cases]
  8:2 / 4:2 inverter driving 2pF (10mm of M2): 15ns delay
  400:2 / 200:2 inverter driving 2pF: 300ps delay
  8:2 / 4:2 inverter driving the 400:2 / 200:2 inverter: 7.5K × 600 × 0.2 × 2fF = 1.8ns delay
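The three delays in this example follow from the same RC model. A minimal Python sketch, assuming pull-up resistance of 30K per square, 2 fF/μm of gate width, 0.2 μm per λ, and ignoring diffusion cap as the slide does:

```python
def pullup_r_kohm(width_lambda, length_lambda=2, r_sq_kohm=30.0):
    """pMOS pull-up resistance, R = R_sq * L / W."""
    return r_sq_kohm * length_lambda / width_lambda


C_WIRE_FF = 2000.0                           # 2 pF of M2 wire
c_big_gate_ff = (400 + 200) * 0.2 * 2.0      # input cap of the 400:2 / 200:2 driver

t_small_direct = pullup_r_kohm(8) * C_WIRE_FF       # small inverter drives the wire
t_big_driver = pullup_r_kohm(400) * C_WIRE_FF       # large inverter drives the wire
t_predecessor = pullup_r_kohm(8) * c_big_gate_ff    # small inverter drives the large one

print(t_small_direct / 1000, "ns direct")
print((t_predecessor + t_big_driver) / 1000, "ns through the large driver")
```

Adding the large driver cuts the total from about 15 ns to about 2.1 ns, but most of what remains is now spent in the small predecessor stage, which motivates the tapered chain discussed next.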
Optimum Transistor Sizing
- Minimize delay of chain
  - Equalize the delay of every gate
  - How?
- Each gate drives f times larger gate
[Figure: inverter chain with relative sizes 1, f, f², f³]
Optimum Transistor Sizing Justification
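- Introduce irregularity in the chain by making the second inverter g times as large as the first (instead of f times)
- Assume Wp = 2Wn so that rise time = fall time
[Figure: chain with sizes 1, g, f², f³; driver resistances R, R/g, R/f², R/f³; node caps gC, f²C, f³C, Cload]
- Then the delay becomes
$$\text{Delay} = R\,gC + \frac{R}{g} f^2 C + \frac{R}{f^2} f^3 C + \frac{R}{f^3} C_{load} = gRC + \frac{f^2}{g} RC + fRC + \frac{1}{f^3} R C_{load}$$
- Differentiate it with respect to g
$$\frac{\partial\,\text{Delay}}{\partial g} = RC - \frac{f^2}{g^2} RC = RC\left(1 - \frac{f^2}{g^2}\right) = 0$$
- Optimum value of g is then equal to f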
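The result g = f can also be confirmed numerically by scanning g around f. This sketch expresses the delay in units of RC and treats the last term, which does not depend on g, as an optional constant; `chain_delay` is an illustrative name.

```python
def chain_delay(g, f, r_cload_over_rc=0.0):
    """Delay of the 1, g, f^2, f^3 chain in units of RC."""
    return g + f**2 / g + f + r_cload_over_rc / f**3


f = 3.0
# Candidate sizes from 0.5f to 1.5f in steps of 0.01f.
candidates = [f * (50 + i) / 100 for i in range(101)]
best_g = min(candidates, key=lambda g: chain_delay(g, f))
print(best_g, chain_delay(best_g, f))
```

Only the first two terms depend on g, and their sum g + f²/g is smallest when both terms are equal, that is, when the first two stages have equal delay.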
Choosing f
- Each inverter drives an inverter f times its size
- N inverters in the chain
- R = resistance of a driving transistor in the first inverter (assume that Wp = 2Wn)
- C = input cap of the first inverter
$$f^N C = C_{load}$$
$$N = \frac{\ln(C_{load}/C)}{\ln f}$$
Total delay:
$$N f RC = \frac{\ln(C_{load}/C)}{\ln f}\, f RC$$
$$\frac{d}{df}\left[\frac{\ln(C_{load}/C)}{\ln f}\, f RC\right] = \ln(C_{load}/C)\, RC\, \frac{\ln f - 1}{(\ln f)^2} = 0$$
$$f = e$$
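A quick numerical check of the optimum is sketched below. It treats the number of stages N as a continuous quantity, which is an idealization since a real chain needs an integer N, and it uses a load ratio Cload/C of 1000 purely as an example value.

```python
import math


def total_delay(f, load_ratio, rc=1.0):
    """N * f * RC with N = ln(Cload/C) / ln(f), N treated as continuous."""
    return math.log(load_ratio) / math.log(f) * f * rc


LOAD_RATIO = 1000.0   # example value of Cload / C
for f in (2.0, math.e, 3.0, 4.0, 5.0):
    stages = math.log(LOAD_RATIO) / math.log(f)
    print(f"f = {f:.3f}: N = {stages:.2f}, delay = {total_delay(f, LOAD_RATIO):.2f} RC")
```

The minimum is shallow: stage ratios between roughly 2 and 4 give delays within about 6% of the optimum at f = e.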
Choosing f (continued)
- So far, we ignored diffusion and wire cap
- Assuming that Cdiff is the diffusion cap of the first inverter
  - Inverter delay = fRC + RCdiff = (f+γ)RC
    - where γ = Cdiff / C
    - Why is the second term (RCdiff) independent of f?
  - Total delay = N(f+γ)RC
$$N (f+\gamma) RC = \frac{\ln(C_{load}/C)}{\ln f}\,(f+\gamma)\, RC$$
$$\frac{d}{df}\left[\frac{\ln(C_{load}/C)}{\ln f}\,(f+\gamma)\, RC\right] = \ln(C_{load}/C)\, RC\, \frac{\ln f - \frac{f+\gamma}{f}}{(\ln f)^2} = 0$$
$$f = e^{\,1 + \gamma/f}$$
- f ≈ 4 for small γ
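With self-loading included, the optimum stage ratio has no closed form, but it can be found numerically. Writing γ for the ratio Cdiff / C, setting the derivative of the total delay to zero gives ln f = 1 + γ/f, or equivalently f (ln f − 1) = γ. The Python sketch below solves that equation by bisection; the function name and the search bracket are choices made here, not part of the notes.

```python
import math


def optimum_fanout(gamma, tol=1e-12):
    """Solve f * (ln f - 1) = gamma for f >= e by bisection."""
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    lo, hi = math.e, 100.0   # f*(ln f - 1) rises from 0 at f = e to about 360 at f = 100
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid * (math.log(mid) - 1) < gamma:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for gamma in (0.0, 0.5, 1.0, 2.0):
    print(f"gamma = {gamma}: f = {optimum_fanout(gamma):.3f}")
```

With γ = 0 the solution reduces to f = e, and the optimum grows as the self-loading ratio increases.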
Gate Sizing
- Equalize loaded delay in every stage
- If delays not equal,
  - Make the gate with the longest delay larger,
  - which decreases its delay but increases its predecessor's
  - But the overall delay decreases as long as the delay reduction is greater than the increase in the predecessor's delay
  - Repeat until all delays are equal
Gate Sizing Justification
- Suppose N2 has the longest delay
- Make N2 x times larger
[Equation: effect on total delay of making N2 x times larger; not recoverable from the extracted text]
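The argument can be illustrated with a simplified two-stage model. The sketch below is an assumption-laden stand-in, not the slide's own derivation: it assumes the predecessor's delay t1 is dominated by N2's input cap, so it scales by x, and that N2 drives a fixed load with negligible self-loading, so its delay t2 scales by 1/x. The numbers are arbitrary examples.

```python
import math


def two_stage_delay(x, t1, t2):
    """Total delay after making N2 x times larger (simplified model)."""
    return x * t1 + t2 / x


t1, t2 = 100.0, 400.0            # example delays in ps; N2 is the slow stage
x_best = math.sqrt(t2 / t1)      # zero of the derivative t1 - t2 / x**2

print("before:", two_stage_delay(1.0, t1, t2))
print("after: ", two_stage_delay(x_best, t1, t2))
print("stage delays at optimum:", x_best * t1, t2 / x_best)
```

In this model the total delay stops improving exactly when the two stage delays are equal, which is consistent with the rule of repeating until all delays match.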
Standard Cell Transistor Size
- Delay should not be too sensitive to placement
  - which implies that wire cap should be small compared to total
  - Long wires are in mms (0.2pF/mm)
  - Thus transistors should be large
Wire Delay
- For short wires, Rwire
Long Wires
- Break them into three regions
[Figure: optimal repeater distance and optimal repeater size]
- For middle region, optimal spacing is determined when
  - Added buffer delay matches the reduction in wire delay
Rules of Thumb for Fast Designs
- Keep fanouts of all gates less than 5
- Keep delays of gates in critical path roughly the same
- Large fanin gates should have fewer fanouts
- Limit fanin
- Use short buffer chains (sometimes one inverter) when necessary
- Use bubble shuffling to reduce logic