1
Interconnect Centric VLSI Design Automation
Chung-Kuan ChengCSE DepartmentUC San DiegoLa Jolla, CA [email protected]
2
Interconnect Dominance
Goals: Speed, Power, Cost
Constraints: Area, Current, Skew
Challenges: PVT Variations, Signal Integrity
0
40
80
120
160
200
180 150 130 100 90 80 70 65 57 50Process Technology Node (nm)
Dela
y (p
s)
1mm Global I nterconnect with Scattering(source: I TRS Roadmap 2004)
FO4 I nverter Delay (Estimated by0.36*Ldraw)
1mm distortionless Transmission Line (Speedof Light)
3
Outlines• Interconnect Technologies
• Buffers, Pitches, Circuit Styles• Geometric Planning
• Wire Orientations, Chip Shapes• Interconnect Networks
• Topologies, Wire Styles• Power and Ground Distributions• Clock Networks• Functional Modules
• Adders, Shifters • Packaging• Conclusion
4
Interconnect Technologies
• RC Wires• Wire Pitch, Width, Separation• Buffer Size, Buffer Interval
• Transmission Line• RLCG
5
Interconnect Technologies
Year (On-Chip) 2005 2010 2015
rncn (ps)T40 0.870 0.400 0.180
rwcw (ps/mm)T80
metal 1/global
440
110
1792
523
5951
1601
interval (mm) 0.136
0.272
0.0457
0.0846
0.0168
0.0324
delay (ps/mm) 120
60
164
89
200
104
,)1(2
ww
gn
cr
crfInt
SRC Roadmap 2005
wgwntrtr ccrrfllDelay ))1(22(/)(
6
Interconnect Tech.: Transmission Line• Speed-of-the-light on-chip communication
• < 1/5 Delay of Traditional Wires
• Low Power Consumption• < 1/5 Power Consumption
• Robust against process variations• Short Latency• Insensitive to Feature Size
7
Interconnect Tech.: Transmission Line
RΔl LΔlGΔl CΔl
RΔl LΔl
GΔl CΔl …
i(z,t)
RΔl LΔl RΔl LΔl
Differential Transmission Line Surfliner
RΔl LΔlCΔl
RΔl LΔl
CΔl …
i(z,t)
RΔl LΔl RΔl LΔl
Serial resistance causes voltage loss.
Speed and attenuation are frequency dependent.
Shunt conductance compensates voltage loss: R/G = L/C.
Flat from DC Mode to Giga Hz
Telegraph Cable: O. Heaviside in 1887.
8
Theory (Telegrapher’s Equation)• Telegrapher’s equation:
),(),(),(
),(),(
),(
tzGVdt
tzdVC
dz
tzdIdt
tzdILtzRI
dz
tzdV
• Propagation Constant:
jCjGLjR ))((
• Wave Propagation: zjzeVzV 0)(
• Alpha and Beta corresponds to speed and phase velocity. Both are frequency dependant
9
Theory (Distortionless Line)
• Set G=RC/L• Frequency Independent speed and attenuation:
LCCLR ,//
• Characteristic impedance: (pure resistive)
CLZ /0 • Phase Velocity (Speed of light in the media)
cLCv /1• Attenuation:
zZ
R
ezA 0)(
10
Digital Signal Response
11
Interconnect Tech.: Transmission Line• Add shunt conductance between differential
wires
• Resistors realized by serpentine unsilicided poly, diffusion resistors, or high resistive metal
12
Geometrical Planning
• Wire Orientations• Manhattan, Hexagonal, Octagonal, Euclidean
• Die Shapes• Rectangle, Diamond, Hexagon, Octagon, Circle
13
Average Radius of Unit-Circle Area
lambda geo.
Shape
Man. Y-Arch X-Arch Euclid.
Square 1.329 1.122 1.070 1.017
Diamond 1.253 1.121 1.070 1.017
Hexagon 1.276 1.100 1.058 1.003
Octagon 1.272 1.104 1.054 1.001
Circle 1.273 1.103 1.055 1.000
14
Throughput : concurrent flow demand
lambda geo.
Shape
Man. Y-Arch X-Arch*
M: Square 1.000 1.225 1.346
M: Diamond 1.195
Y: Hexagon 1.315
X: Octagon* 1.420
*ratio of 0-90 planes and 45-135 planes is not fixed
15
Flow congestion map for uniform 90 Degree meshes
16
12 by 12 13 by 13
Congestion map of square chip using X-architecture
17
12 by 12 13 by 13
Y-architecture + Square Chip
18
Y Architecture + Hexagonal Chip
19
X-Architecture + Octagonal Chip
20
Manhattan Architecture + Diamond Chip
21
Routing Grids
(http://www.xinitiative.org/img/062102forum.pdf)
X-Architecture Y-Architecture
22
Interconnect Networks
• Optimized Interconnect Architecture• Data Bus, Control Signals
• Shared Interconnect• Packet Switching• Circuit Switching• RTL Level Partition
23
Interconnect Networks
Physical Implementation
On-chip transmission line for long distance communication
Well spaced RC wire with buffer insertion for local connections
24
Interconnect Networks
• Obj: Power, Latency
• Constraints:• Routing Area, Bandwidth
• Design Space:• Topology• Wire Styles, Switches
• Model:• Traffic Demand• Data Bus, Control Signals
25
Interconnect Networks: Design Flow
Power & Delay Lib
Topology Lib
Multi-commodity network flow (MCF)
formulation
Power Evaluation(MCF solver)
Latency Aware low power NoC topology
with wire style optimization
26
Power and Latency Tradeoffs for Optimal 8x8 Topologies
48
53
58
63
68
73
2. 3 2. 5 2. 7 2. 9 3. 1 3. 3Average Latency (ns)
Powe
r Co
nsum
ptio
n (W
)
area=3000um area=4000um area=5000umarea=6000um area=7000um area=8000umarea=9000um area=10000um area=11000um
27
Topology Selection (Latency, Power, BW)
48
58
68
78
88
98
2. 3 2. 8 3. 3 3. 8 4. 3
Average Latency (ns)
Powe
r Co
nsum
ptio
n (W
)
topo=opt i mal topo=mesh topo=torus topo=hypercube
28
(a) Optimal topology when area = 3000um
(b) Optimal topology when area = 7000um
(c) Optimal topology when area = 11000um
Topology Selection (Latency, Power, BW)
Optimal 8-node topologies vs area resources
29
Power Ground Networks• Power network on chip and package• Decoupling capacitor • Conducting transistors connected to the P/G
grids
30
Power Ground Analysis• Constraints: Voltage Drops and Current Densit
y• IR Drop: Static Analysis
• Resistive Networks• Static Current Sources
• dI/dt: Dynamic Analysis• RLC Networks• Power on and off• Sleep mode on and off• Gated clocks• Various operation modes
31
Power/Ground Networks:Natural Frequency
• RLC Network Characteristics: Natural frequencies (Quality Factor)
• Operation Modes: Excite the resonance
• Decoupling Capacitance: Shift the natural frequencies.
32H. Chen, IBM
33
Clock Distributions
34
Clock: Linear Variations Model
• Process variation model• Transistor length• Wire width• Linear variation model
• Power variation model• Supply voltage varies randomly (10%)
ykxkdd yx 0
35
Clock: RC Model
Input: an n level meshes and h-treesConstraint: routing area, parameter variationsObjective: skew
36
Simplified Circuit Model
Vs1Rs
C
Vs2Rs
C
R
1
2
u(t)
u(t-T)
37
Skew Expression
110 2ln CRt
)(2
1.
2
21.
1
21
V
VV
V
VVT
Assumptions:
1. T<<RsC
2. Rs /R <<RsC/T
Using first order
Taylor expansion ex=1+x,
)2ln2exp(
:function Skew
R
RTT s
38
Optimal Routing Resources Allocation
level-1
level-2
level-3
level-4
total area
39
Inductance Diminishes Shunt Effects
Vs1 Rs
C
Vs2 Rs
C
R
1
2
u(t)
u(t-T)
L
f(GHz) 0.5 1 1.5 2 3 3.5 4 5
skew(ps) 3.9 4.2 5.8 7.5 9.9 13 17 26
• 0.5um wide 1.2 cm long copper wire
• Input skew 20ps
40
Clock Distributions
Surfliner
41
Functional Modules (Data Path)
• Arithmetic
• Algorithms
• Logic
• Logic Styles
• Placement
42
Cyclic Shifter
S2
S1
S0
D7 D6 D5 D4 D3 D2 D1 D0
Z7 Z6 Z5 Z4 Z3 Z2 Z1 Z0
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
(7,1) (6,1) (5,1) (4,1) (3,1) (2,1) (1,1) (0,1)
(7,2) (6,2) (5,2) (4,2) (3,2) (2,2) (1,2) (0,2)
(7,3) (6,3) (5,3) (4,3) (3,3) (2,3) (1,3) (0,3)
(7,0) (6,0) (5,0) (4,0) (3,0) (2,0) (1,0) (0,0)
Delay: nlogn, Power: ¼ n2
43
Fan-out Splitting
Z7 Z6 Z5 Z4 Z3 Z2 Z1 Z0
D7 D6 D5 D4 D3 D2 D1 D0
S2
S1
S0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
(7,1) (6,1) (5,1) (4,1) (3,1) (2,1) (1,1) (0,1)
(7,2) (6,2) (5,2) (4,2) (3,2) (2,2) (1,2) (0,2)
(7,3) (6,3) (5,3) (4,3) (3,3) (2,3) (1,3) (0,3)
(7,0) (6,0) (5,0) (4,0) (3,0) (2,0) (1,0) (0,0)
Delay Comparison
50.7
102.7
218.7
484
45.376
128
224
0
100
200
300
400
500
600
8-bit 16-bit 32-bit 64-bit
Del
ay
MuxShifterDemuxShifter
τ
Impr. Ratio8-bit 10.7%16-bit 26.0%32-bit 41.5%64-bit 53.7%
Power Comparison
150.7484
1561.3
5220
146.9453.7
1397.6
4458.9
0
1000
2000
3000
4000
5000
6000
8-bit 16-bit 32-bit 64-bit
Tot
al S
witc
hed
Cap
acita
nce
MuxShifterDemuxShifter
Cg
Impr. Ratio8-bit 2.5%16-bit 6.3%32-bit 10.5%64-bit 14.6%
Delay: nPower: 3/16 n2
44
Cell Permutation• Fix the input/output stage, permute cells of intermediate
stages to further improve delay.• Formulate as an ILP problem and solve by CPLEX.
• Optimal solution in terms of delay• Delay/Power tradeoff
Additional Delay Reductionby Cell Permutation
13
29
61
125
818
38
78
0
20
40
60
80
100
120
140
8-bit 16-bit 32-bit 64-bit
Del
ay (
in u
nit
of c
olum
ns s
pann
ed)
w/o permuatation
w/ permutation
7 6 5 4 3 2 1 0
6 5 4 3 7 2 1 0
3 4 2 6 7 5 1 0
7 6 5 4 3 2 1 0
>> 1-bit
>> 2-bit
>> 4-bit
0 0 00 0 0 0 0
1 1 01 1 4 0 0 11130000
11141111
4 2 2 2 4 1 4 5 4 3 5 5 1 4 1 3
4 2 4 5 4 5 4 3
8 4 7 5 8 8 4 7 8 4 7 7 4 6 3 8
8 7 8 7 8 7 6 8
Optimal Timing solution for 8-bit
45
Packaging: Pin Breakaway
• Row by Row Escape: Escape interconnect row by row from outside toward
inside.
46
Packaging: Escape Sequence Strategies
• Parallel triangular sequence: This method divides the objects into groups
and escape each group with a triangular outline.
47
Packaging: Escape Sequence Strategies• Central triangular sequence: Escape objects from the center of the outside row and
expand the indent with a single triangular outline. In this method, the outline capacity is increased continuously layer by layer while the first several layers is small.
48
Packaging: Escape Sequence Strategies• Two-sided sequence: Escape objects from the inside as well as from the outside.
The outline shrinks slowly and also follows zigzag shape.
49
Packaing: Escape Sequence Strategies
50
Packaging: Experimental Results
Layer Row by rowParallel
triangularCentral
triangularTwo sided
1 304 276 100 312
2 272 340 156 328
3 240 292 228 308
4 208 240 300 324
5 176 164 372 328
6 144 124 444
7 112 164
8 80
9 48
10 16
40 x 40
51
V.3. Experimental Results
Layer Row by rowParallel
triangularCentral
triangularTwo sided
1 144 132 92 140
2 112 144 116 160
3 80 96 140 100
4 48 28 52
5 16
20 x 20
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
Conclusion
• Interconnect Technologies
• Geometric Planning
• Interconnect Planning
• Power and Ground Distribution
• Clock Networks
• Data Path
• Packaging
69
Interconnect Technologies
2.5 0.31 0.08 1.31 1.5 1.5 0.13t h h ta
c s s s sC t
e e e es
0.2
0.35 0.651.53 0.98 0.01w sfr
c h hC h
e es
asC w
h
0.05 0.25
1.21.05 0.63 0.0632
t sfrs s h
C s te e
s h h
Coupling capacitance:
Self capacitance:
a fr a frc c s s
w
C C C CC
Total wire capacitance:
t
h
h
w
s
d
p/g plane
wire
w: wire widtht: wire thickness
s: wire spacingh: distance to p/g plane
d: wire pitch
Cc
Cs
Cc: coupling capacitance Cs: self capacitance
Cs
70
Interconnect Technologies
ndelay bandwidth /bandwidth powerDesign metrics:
ndelay n ndelay power 2n ndelay powerObjective
functions:
For each pitch, For each objective function
Find wire width w, buffer size sinv and buffer interval linv
Calculate the metrics
71
Experimental Results – normalized delay
50100
150200
0
0.5
1
1.5x 10
-6
0
0.5
1
1.5
2x 10
-7
de
lay n(s
/m)
min-d
min-ddp
min-dp
1 2 3 4 5 6 7 8 9 10
x 10-7
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8x 10
-7
pitch(m)
dela
y n(s/m
)
min-dmin-ddpmin-dp180nm130nm100nm70nm
0 0.2 0.4 0.6 0.8 1 1.2
x 10-6
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2x 10
-7
pitch(m)
dela
y n(s/m
)
min-dmin-ddpmin-dp
Top left: OverviewTop right: at different pitches (solid lines: min-pitch; dash lines: saturating pitch)
Bottom right: at 70nm technology
72
Interconnect Tech. -- bandwidth
50100
150200
0
0.51
1.5x 10-6
1
2
3
4
5x 10
13
band
wid
th(b
its/s
)
min-d
min-ddp
min-dp
1.5 2 2.5 3 3.5 4 4.5
x 10-7
2.5
3
3.5
4
4.5
5x 10
13
pitch(m)
band
wid
th(b
its/s
)
min-dmin-ddpmin-dp180nm130nm100nm70nm
0 0.2 0.4 0.6 0.8 1 1.2
x 10-6
1
1.5
2
2.5
3
3.5
4
4.5
5x 10
13
pitch(m)
band
wid
th(b
its/s
)min-dmin-ddpmin-dp
Top left: OverviewTop right: at min-pitchesBottom right: at 70nm technology
73
Interconnect Tech. – bandwidth/power
50
100
150
200
00.5
11.5x 10
-6
0
1
2
3
4
5x 10
23
band
wid
th/p
ower
(m/J
s)
min-d
min-ddp
min-dp
1 2 3 4 5 6 7 8 9
x 10-7
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5x 10
23
pitch(m)
ba
nd
wid
th/p
ow
er(
bits
*m/J
s)
min-dmin-ddpmin-dp180nm130nm100nm70nm
0 0.2 0.4 0.6 0.8 1 1.2
x 10-6
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5x 10
23
pitch(m)
band
wid
th/p
ower
(bits
*m/J
s)min-dmin-ddpmin-dp
Top left: OverviewTop right: at different pitches (solid lines: min-pitch; dash lines: optimal pitch)
Bottom right: at 70nm technology
74
Interconnect Technologies
mcr
crfl
ww
gn 242)1(2
41gw
wn
cr
crs
Example: w= 85nm, t= 145nm
Optimal interval
mmpsmfsccrrfllDelay wgwntrtr /194/194))1(22(/)(
Optimal buffer size
Optimal delay
rn= 10Kohm,cn=0.25fF,cg=2.34xcn=0.585fFrw=2ohm/um, cw=0.2fF/um
75
Experiments—Optimized Skew
s-mesh(s) m-mesh(s) ratio0.00 2.92E-11 2.92E-11 100.0%0.25 2.79E-11 2.60E-11 93.2%0.40 2.71E-11 2.45E-11 90.4%1.00 2.42E-11 1.98E-11 81.8%3.00 1.70E-11 1.24E-11 73.2%5.00 1.24E-11 8.72E-12 70.5%
skewtotal area
76
Robustness Against Supply Voltage Variations
ave worst ave worst0.00 2.10E-11 2.91E-11 2.10E-11 2.91E-111.00 8.38E-12 1.14E-11 8.26E-12 1.43E-112.00 2.71E-12 4.42E-12 6.18E-12 1.11E-113.00 1.89E-12 3.33E-12 4.83E-12 8.73E-124.00 1.45E-12 2.48E-12 3.88E-12 6.96E-125.00 1.16E-12 2.02E-12 3.18E-12 5.64E-12
total areamutli-level mesh single-level mesh
Top Related