An Efficient Technology Mapping Algorithm Targeting Routing
Congestion Under Delay Constraints
Rupesh S. Shelar
Intel Corporation
Hillsboro, OR 97124
Prashant Saxena
Synopsys Inc
Hillsboro, OR 97124
Xinning Wang
Intel Corporation
Hillsboro, OR 97124
Sachin S. Sapatnekar
University of Minnesota
Minneapolis, MN 55455
International Symposium on Physical Design, San Francisco
April 5, 2005
2
Outline
• Introduction
• Algorithm Overview
• Congestion Map Generation
• Slack-constrained Covering
• Results & Conclusion
3
Motivation
• Technology scaling
  – Routing resources growing at the same rate?
• Upper metal layers for global signals
  – Resistive (i.e., wide) wires
• Result: routing congestion
4
Targeting Routing Congestion
• Routing congestion can be alleviated during routing, placement, technology mapping, and logic synthesis
• Limited flexibility during placement and routing (P&R) points to technology mapping
  – Mapping decides the wires
[Figure: design freedom across the flow stages RTL, technology mapping, placement, and routing]
5
Previous Work
• Structural logic synthesis
  – Adhesion metric, Kudva et al., TCAD’03
  – Computationally expensive
• Congestion-aware technology mapping using wirelength, Stok et al., ICCAD’01; Pandini et al., TCAD’03
  – “… a purely top-down single-pass congestion-aware technology mapping is merely wishful thinking.”
• Mutual contraction (MC), Liu et al., ISPD’05
• Predictive probabilistic congestion, Shelar et al., TCAD’05
  – Congestion map based on subject graph
7
Problem Definition
• Minimize routing congestion under delay constraints during technology mapping
  – Dynamic programming for delay constraints
  – Routing congestion: captured by track overflow and maximum congestion
• Minimize total track overflow under delay constraints
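The two congestion measures named on this slide can be made concrete with a small sketch. The definitions below are assumptions, since the slides do not spell them out: total track overflow is taken as the sum over bins of the demand that exceeds capacity, and maximum congestion as the largest demand-to-capacity ratio. The function names and the sample grid are hypothetical.

```python
def total_track_overflow(demand, capacity):
    """Sum over all bins of the track demand exceeding capacity (assumed definition)."""
    return sum(max(0.0, d - capacity) for row in demand for d in row)


def max_congestion(demand, capacity):
    """Largest demand-to-capacity ratio over all bins (assumed definition)."""
    return max(d / capacity for row in demand for d in row)


# Hypothetical 2x3 congestion map with a uniform capacity of 1.0 track per bin.
demand = [[0.5, 1.25, 2.0],
          [1.0, 0.75, 1.5]]
overflow = total_track_overflow(demand, 1.0)
worst = max_congestion(demand, 1.0)
```

Under these definitions, only bins whose demand exceeds capacity contribute to the overflow, which is why the objective targets hot spots rather than total wirelength.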
8
Employing Placement-level Metric
• Wirelength and mutual contraction cannot capture track overflow
• Predictive probabilistic congestion map can
  – Same congestion map for different mapping choices
• Can we instead employ a placement-/routing-level metric?
[Figure: estimation error of congestion metrics by flow stage. RTL and technology mapping: predictive PCM, mutual contraction, wirelength. Placement: probabilistic CM (PCM). Routing: congestion map (CM).]
9
Probabilistic Congestion Map
• Probabilistic congestion map: a post-placement metric
  – Lou et al., TCAD’02; Westra et al., ISPD’04
[Figure: a two-pin net (Pin 1, Pin 2) with six candidate routes, numbered 1 to 6]
• All routes equally possible: probability of any route = 1/6
Per-bin values shown in the figure:
1/12 3/12 5/12 3/12
1/12 2/12 2/12 1/12
1/12 2/12 2/12 1/12
3/12 5/12 3/12 1/12
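The "all routes equally possible" assumption can be illustrated with a minimal sketch. It is a simplified variant, not the model of Lou et al. or Westra et al.: it computes only the probability that a shortest (monotone) route between two pins in opposite corners of a bounding box passes through each bin, and it does not attempt to reproduce the per-bin values of the figure. The function names and the 3×3 bounding box are hypothetical.

```python
from fractions import Fraction
from math import comb


def num_routes(cols, rows):
    """Number of shortest routes between opposite corners of a cols x rows bin grid."""
    return comb(cols + rows - 2, cols - 1)


def route_probability_map(cols, rows):
    """Probability that a uniformly chosen shortest route visits each bin."""
    total = num_routes(cols, rows)
    grid = []
    for y in range(rows):
        row = []
        for x in range(cols):
            # Routes from the source corner (0, 0) to this bin ...
            to_bin = comb(x + y, x)
            # ... times routes from this bin to the sink corner.
            from_bin = comb((cols - 1 - x) + (rows - 1 - y), cols - 1 - x)
            row.append(Fraction(to_bin * from_bin, total))
        grid.append(row)
    return grid


grid = route_probability_map(3, 3)  # a 3x3 box has 6 shortest routes
```

For the 3×3 box each of the six routes has probability 1/6, the two corner bins holding the pins are visited with probability 1, and the centre bin is shared by four of the six routes.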
10
“Chicken-and-Egg” Problem
• Overflow computation requires a congestion map
  – Available only after mapping
• Track overflow of a wire also depends on other cones
  – Overflow due to Wire1 depends on Wire2, and vice versa
  – Area or delay at Wire1 does not depend on Wire2
[Figure: two flows, one marked with a "?". Flow stages shown: RTL, technology mapping, placement, probabilistic CM; and RTL, technology mapping + placement, probabilistic CM. Inset: two wires, Wire1 and Wire2.]
Solution Overview
• Track overflow cannot be computed incrementally, but congestion maps can
  – Construct congestion maps using algebraic operations
• Defer track overflow computation to covering
  – Requires congestion maps capturing all wires in mapping solutions
• Overcome the “chicken-and-egg” problem:
  – Construct congestion maps bottom-up during matching
  – Compute track overflow during covering
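Why overflow resists incremental computation while congestion maps do not can be seen in a single bin. The following worked example assumes that overflow is the demand above capacity, clipped at zero; the symbols c, d_1, and d_2 are illustrative and do not appear in the slides.

```latex
\[
\mathrm{ovf}(d) = \max(0,\; d - c), \qquad c = 1, \quad d_1 = d_2 = 0.75
\]
\[
\mathrm{ovf}(d_1) + \mathrm{ovf}(d_2) = 0 + 0 = 0
\quad \neq \quad
\mathrm{ovf}(d_1 + d_2) = \max(0,\; 1.5 - 1) = 0.5
\]
```

Demands add linearly, so the map for d_1 + d_2 is exact at every step, whereas the overflow can only be evaluated once all wires sharing the bin are known.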
13
The Matching Phase
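• Store the load-delay curve containing non-inferior delay matches
  – Performed for all nodes in topological order
• Compute congestion map for each non-inferior match
[Figure: load-delay curve (axes: load, delay) with matches M1, M2, M3, load breakpoints L1 and L2, and delays D1 and D2; a node with candidate matches M1, M2, M3 and an unknown load, Cunknown]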
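The idea of keeping only non-inferior matches on a load-delay curve can be sketched as follows. This is a minimal illustration under stated assumptions: each match has a linear delay model (intrinsic delay plus slope times load), loads are sampled at a few points, and a match is kept if it is delay-optimal for at least one sampled load. The `Match` class, its fields, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    name: str
    intrinsic: float  # delay at zero load (hypothetical units)
    slope: float      # delay increase per unit of load

    def delay(self, load):
        return self.intrinsic + self.slope * load


def non_inferior(matches, loads):
    """Names of matches that are delay-optimal for at least one sampled load."""
    keep = []
    for load in loads:
        best = min(matches, key=lambda m: m.delay(load))
        if best.name not in keep:
            keep.append(best.name)
    return keep


candidates = [
    Match("M1", 10.0, 4.0),   # fastest at light loads
    Match("M2", 30.0, 2.0),   # fastest at medium loads
    Match("M3", 60.0, 0.5),   # fastest at heavy loads
    Match("M4", 70.0, 4.0),   # never the fastest, hence inferior
]
kept = non_inferior(candidates, range(0, 41, 5))
```

Because the load at a node is unknown during matching, the whole curve is stored rather than a single best match; here M4 is dropped since M1 is faster at every load.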
14
Algebraic Addition for Congestion Maps
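[Figure: nodes N1, N2, N3 and match M1; three 3×3 congestion maps are added bin by bin to give a fourth. Bin values are listed in the order they appear in the figure.]
Map 1: 1.25 0.75 0.00 0.00 0.00 0.00 0.00 0.75 0.25
Map 2: 0.00 0.00 0.00 0.00 0.00 0.50 1.00 0.00 0.00
Map 3: 0.00 0.25 0.25 0.50 0.25 0.25 0.00 0.00 0.50
Sum:   1.25 1.00 0.25 0.50 0.25 0.75 1.00 0.75 0.75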
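The bin-wise addition on this slide can be checked with a short sketch. The nine values of each map are taken from the slide in the order they appear; which map belongs to which node, and the position of each value within the 3×3 grid, is not recoverable here, so the maps are kept as flat lists. The names `add_maps`, `map_a`, `map_b`, and `map_c` are hypothetical.

```python
def add_maps(*maps):
    """Bin-wise sum of congestion maps defined over the same grid."""
    return [sum(bins) for bins in zip(*maps)]


# Bin values of the three maps on the slide, flattened in order of appearance.
map_a = [1.25, 0.75, 0.00, 0.00, 0.00, 0.00, 0.00, 0.75, 0.25]
map_b = [0.00, 0.00, 0.00, 0.00, 0.00, 0.50, 1.00, 0.00, 0.00]
map_c = [0.00, 0.25, 0.25, 0.50, 0.25, 0.25, 0.00, 0.00, 0.50]

total = add_maps(map_a, map_b, map_c)
```

The result matches the fourth map on the slide, which is what allows a match's congestion map to be built from the maps of its fan-ins plus the wires the match itself introduces.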
15
Handling Multiple Fanouts
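• For forward propagation, divide congestion maps by the number of fanouts
  – Allows correct computation of maps for solutions at PO’s
[Figure: nodes N1, N2, N3; a 3×3 congestion map is divided by 2 at a two-fanout point. Bin values are listed in the order they appear in the figure.]
Before: 0.2 0.8 0.4 0.4 0.2 0.2 0.4 0.6 0.6
After:  0.1 0.4 0.2 0.2 0.1 0.1 0.2 0.3 0.3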
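The division at a multiple-fanout point can be sketched with the values from the slide. The reading assumed here is that each fanout branch carries an equal share of the stem's map, so that adding the branches back together downstream counts the shared fan-in cone exactly once. The function and variable names are hypothetical.

```python
def split_for_fanouts(cmap, num_fanouts):
    """Share of a congestion map propagated along each fanout branch."""
    return [value / num_fanouts for value in cmap]


# Map before the fanout point, flattened in order of appearance on the slide.
stem_map = [0.2, 0.8, 0.4, 0.4, 0.2, 0.2, 0.4, 0.6, 0.6]
branch_map = split_for_fanouts(stem_map, 2)

# Adding both branches back recovers the original map.
recombined = [2 * value for value in branch_map]
```

Without the division, a cone feeding two primary outputs would contribute its wires twice when the maps at the PO’s are summed.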
16
Congestion Map Generation
Congestion map for a match at a node represents wires from the fan-in cone only
Add congestion maps for matches at PO’s to get congestion map for an entire solution
Extensible to congestion based on fast global routing
Applicable to generation and propagation of any 2-D maps, e.g., power-density map
18
Exploiting Slacks
• Classical covering: choose an optimum delay match
  – For Cload = 15, M2 is optimal with delay = 50
• Assume a slack of 10
  – M1 and M3 also satisfy the delay constraints
• Allow non-delay-optimal matches on non-critical paths
  – M1 or M3 preferred if the corresponding overflows are smaller
[Figure: load-delay curve for matches M1, M2, M3 (load axis ticks 10 and 20; delay axis ticks 40 and 60), with per-match overflow annotations]
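The selection rule on this slide can be sketched for a single node. Only the delay of M2 (50) and the slack (10) come from the slide; the delays of M1 and M3 and all overflow values are hypothetical, as is the function name `pick_match`.

```python
def pick_match(options, slack):
    """Lowest-overflow match whose delay is within `slack` of the best delay.

    `options` maps a match name to a (delay, overflow) pair.
    """
    best_delay = min(delay for delay, _ in options.values())
    budget = best_delay + slack
    feasible = [(overflow, delay, name)
                for name, (delay, overflow) in options.items()
                if delay <= budget]
    # Ties on overflow are broken in favour of the smaller delay.
    return min(feasible)[2]


# M2 is delay-optimal (50); M1 and M3 are slower but cause less overflow.
options = {"M1": (58, 3), "M2": (50, 10), "M3": (60, 5)}
with_slack = pick_match(options, 10)
no_slack = pick_match(options, 0)
```

With a slack of 10 all three matches meet the budget of 60 and the lowest-overflow one wins; with no slack the choice falls back to the delay-optimal M2, as on a critical path.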
19
Slack-constrained Covering
• Compute delays and slacks at the primary outputs (PO’s) due to the delay-optimal solution
• Compute the corresponding congestion map
• For all nodes in reverse topological order,
  – Compute delay and track overflow due to delay-optimal and congestion-optimal matches
  – If a congestion-optimal match exists, store it
  – Else, store the delay-optimal match
  – Propagate updated slacks to the inputs of the match
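The reverse-topological pass above can be sketched on a toy network. This is a heavily simplified illustration and not the authors' implementation: every node has a fixed fan-in set regardless of the match chosen, each match carries a load-independent delay and a precomputed scalar overflow (the slides derive overflow from congestion maps), and the delay target is the worst delay-optimal arrival at the outputs. All names and numbers are hypothetical.

```python
def slack_constrained_cover(order, fanins, matches):
    """Choose, per node, the lowest-overflow match that still meets timing.

    order   : node names in topological order
    fanins  : node -> list of fan-in nodes
    matches : node -> {match name: (delay, overflow)}
    """
    # Forward pass: arrival times under the delay-optimal match at every node.
    arrival = {}
    for node in order:
        t_in = max((arrival[f] for f in fanins[node]), default=0)
        arrival[node] = t_in + min(d for d, _ in matches[node].values())

    # Primary outputs are the nodes that feed no other node.
    internal = {f for node in order for f in fanins[node]}
    target = max(arrival[n] for n in order if n not in internal)

    # Backward pass: spend slack on overflow, then pass the rest to the inputs.
    required = {node: target for node in order}
    chosen = {}
    for node in reversed(order):
        t_in = max((arrival[f] for f in fanins[node]), default=0)
        feasible = [(ovf, d, name)
                    for name, (d, ovf) in matches[node].items()
                    if t_in + d <= required[node]]
        _, delay, name = min(feasible)
        chosen[node] = name
        for f in fanins[node]:
            required[f] = min(required[f], required[node] - delay)
    return chosen, target


order = ["a", "c", "b", "out"]
fanins = {"a": [], "c": [], "b": ["a"], "out": ["b", "c"]}
matches = {
    "a": {"A_fast": (10, 4), "A_small": (14, 1)},
    "b": {"B_fast": (20, 5), "B_alt": (22, 2)},
    "c": {"C_fast": (5, 6), "C_alt": (25, 0)},
    "out": {"O_fast": (10, 3), "O_alt": (15, 0)},
}
chosen, target = slack_constrained_cover(order, fanins, matches)
```

In this toy case the critical path a, b, out keeps its delay-optimal matches, while the off-critical node c has enough slack to take the zero-overflow match without changing the output delay.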
20
Extensions and Complexity
• Slack-constrained covering applicable for
  – Different cost functions, e.g., maximum congestion
  – Traditional objectives, e.g., area, power
• Time complexity
  – Linear in number of nodes (for a fixed library and layout area)
  – Run-times practical
• Memory complexity
  – High memory requirement due to congestion map storage for all matches
  – Asymptotically same as conventional
• Memory-efficient variants possible
  – Current implementation applicable up to ~5,000 cells
  – Ideal for ECO mode hot-spot (re-)synthesis
22
Experimental Setup
• Mapping algorithm incorporated in SIS
• Capo for placement
• Timing-driven routing
• ISCAS’85 benchmarks
• 100 nm process parameters from Predictive Technology Model
• Library: enhanced lib2.genlib with up to 4 strengths for each gate
• Experiments on 400 MHz Sun Ultra Sparc 60
• Comparison with conventional mapping in SIS
[Figure: experimental flow with stages subject graph, placement, technology mapping, placement, routing, and timing analysis]
23
Track Overflow Comparison
[Chart: track overflow per circuit, Conventional vs. Ours; y-axis from 0 to 1600]
24
Maximum Congestion Comparison
[Chart: maximum congestion per circuit, Conventional vs. Ours; y-axis from 0 to 2.5]
25
Delay Comparison
[Chart: delay (ps) per circuit, Conventional vs. Ours; y-axis from 0 to 6000]
26
Row-utilization Comparison
[Chart: row-utilization (%) per circuit, Conventional vs. Ours; y-axis from 68 to 84]
27
Run-time Comparison
[Chart: run-time (s) per circuit, Conventional vs. Ours; y-axis from 0 to 300]
28
Summary of Experimental Results
• Track overflows: 44% better
• Delays: no adverse impact
• Maximum congestion: 25% better
• Row-utilization: no significant correlation
• Run-times: 2x worse, but still practical
29
Conclusion
• Presented a delay-optimal mapping algorithm to minimize routing congestion
• Validated effectiveness on benchmark circuits
• Algorithmic framework applicable for optimization of other cost functions and properties
• Future directions
  – Implementation of memory-efficient version
  – Placement-legalization based flow
  – Application to ECO-mode logic (re-)synthesis
30
Backup
31
Analogy with Classical Matching
• Mapping for area optimization under delay constraint, Chaudhary et al., TCAD’95
• Similarities
  – The gate-area for a match at a given node represents gates only due to the nodes in the fan-in cone
  – Similarly, the congestion map for a match at a given node represents wires due to the nodes in the fan-in cone
  – Gate-area divided at multiple fanout points
  – Congestion maps divided at multiple fanout points
• Differences
  – Ensures delay optimality
  – Wire-delays accounted for in the delay computation
  – Routing congestion more complex than gate-area
32
Experimental Results
Ckt.    Area (μm²)   RU (%)         Overflow                  Delay (ps)
                     Conv.   Ours   Conv.   Ours   Gain (%)   Conv.   Ours
C1355   3439         80      81     227     134    40         789     786
C1908   3616         80      80     323     225    30         1059    1042
C2670   11707        75      77     417     167    59         1258    1240
C3540   25994        75      80     1078    294    72         1655    1632
C432    1962         80      82     66      49     25         854     842
C499    3550         80      79     262     135    48         823     821
C5315   17265        75      77     1100    289    73         1120    1114
C6288   21379        80      80     515     452    12         4771    4731
C7552   28223        75      73     1343    547    59         1341    1309
C880    3944         80      76     378     260    31         890     884
Avg.    -            78      78     554     255    44         1455    1439
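The Gain column can be recomputed from the two overflow columns. The sketch below assumes that the gain is the relative reduction in overflow in percent, rounded down to an integer, and that the average is taken over the per-circuit gains; both are inferences from the numbers, not statements from the slides. The names are hypothetical.

```python
# (conventional, ours) track overflow per circuit, copied from the table.
overflow = {
    "C1355": (227, 134), "C1908": (323, 225), "C2670": (417, 167),
    "C3540": (1078, 294), "C432": (66, 49), "C499": (262, 135),
    "C5315": (1100, 289), "C6288": (515, 452), "C7552": (1343, 547),
    "C880": (378, 260),
}


def gain_percent(conventional, ours):
    """Relative overflow reduction in percent, rounded down (assumed rounding)."""
    return (conventional - ours) * 100 // conventional


gains = {ckt: gain_percent(conv, ours) for ckt, (conv, ours) in overflow.items()}
average_gain = sum(gains.values()) // len(gains)
```

Under these assumptions the recomputed values agree with the Gain column for every circuit and with the 44% average quoted in the summary.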
33
Experimental Results (Continued)
Ckt.    Max. congestion (MC)   # of Cells       Run-time (s)
        Conv.   Ours           Conv.   Ours     Conv.   Ours
C1355   1.70    1.30           621     592      11      12
C1908   1.70    1.40           578     571      12      13
C2670   1.65    1.20           1482    1426     24      51
C3540   2.25    1.40           3254    3105     90      279
C432    1.40    1.20           264     311      7       9
C499    1.60    1.20           595     563      11      13
C5315   2.20    1.40           2122    2131     38      121
C6288   1.70    1.40           3737    3596     88      135
C7552   1.60    1.30           3198    3080     132     213
C880    1.70    1.20           584     575      12      13
Avg.    1.74    1.29           1640    1595     42      85