A Framework for Layout-Level Logic Restructuring
Hosung Leo Kim, John Lillis
Motivation: Logical-to-Physical Disconnect
• Performance is determined largely by physical-level interconnect delay.
• Problem: timing optimization at the logic level ≠ actual performance.
[Figure: logic-level optimization hands a fixed netlist to physical-level optimization. The disconnect: physical-level optimization is limited by the structure obtained from the logic level.]
Past Layout-Driven Restructuring Work: Replication Based
• Basic Operations:
– Gate Splitting
– Fanout Partitioning
• Enables “Path Straightening”
• References:
– [Schabas, Brown. ISFPGA03]
– [Beraudo, Lillis. DAC03]
– [Hrkic, Lillis, Beraudo. TCAD06]
– [Chen, Cong. ISFPGA05]
Limitation of Logic Replication
• While interconnect delay can be significantly reduced, the LUT-depth of a path remains unchanged.
• The LUT-depth is typically determined by a technology mapper which does not have an accurate view of critical paths.
Candidate: Remapping
Other Work
• Redundant Wires (e.g., [Chang, Cheng, Suaris, Marek-Sadowska. DAC00])
– Rewire connections while keeping logical equivalence.
– Predictable, but optimization scope limited.
• [Lin, Jagannathan, and Cong. ISFPGA03]
– Remap based on placement-level timing analysis.
– Significant restructuring, but placement of remapped cells determined by initial placement (not simultaneous).
• [Singh and Brown. Integration07]
– Shannon’s expansion / precomputation.
– Allows late signals to skip logic levels, but relatively local in nature.
Objectives
• Overcome limitations of basic replication (e.g., fixed LUT-depth)
• Large and flexible remapping space
• Explicitly account for placement freedom of remapped LUTs
• Tight coupling with placement
Components of Approach (FPGA Domain)
• Placement-Level Static Timing Analysis
• Timing-Critical Fan-in Cone Extraction
• Induce Replication Tree [Hrkic,TCAD06]
Components of Approach (cont’d)
• Replication tree
• Subject Graph (Choice Tree)
• Recursive, Exhaustive Ashenhurst LUT Decomposition
• Legalizer
• Mapper and Embedder (Dynamic Programming)
[Figure: (a) Given LUT-tree with LUTs A, B, C; (b) Choice tree with choice nodes over the alternative decompositions (mini-LUTs g1 to g8).]
Remapping Example
[Figure: LUTs A, B, C, D, E over inputs a to m. (a) Given LUT-tree; (b) “Mini-LUT” tree after LUT-decompositions; (c) Alternative mapping, with A′ and B′; (d) Corresponding LUT-tree.]
Functional Decomposition
• Test for decomposability
– Ashenhurst’s theorem
• Recursively decompose
• Simple Disjoint Functional Decomposition
[Figure: decomposition chart of f over w, x, y, z, and the resulting decomposition into g1 and g2, connected by 1 bit (simple) with disjoint input sets.]
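The decomposability test can be illustrated with a small sketch. It assumes the function is given as a Python callable over a tuple of bits, which is a choice made here for illustration and not the representation used in this work. The code builds the decomposition chart for a chosen bound set and counts its distinct columns. By Ashenhurst's theorem, a simple disjoint decomposition f = g2(g1(bound set), free set) exists exactly when that count is at most 2. The names `column_multiplicity`, `has_simple_disjoint_decomposition`, and `example_f` are hypothetical.

```python
from itertools import product


def column_multiplicity(f, n_vars, bound):
    """Count distinct columns of the decomposition chart of f.

    f      : callable taking a tuple of n_vars bits and returning 0 or 1
    bound  : indices of the bound-set variables (the inputs of g1)
    The remaining variables form the free set.
    """
    free = [v for v in range(n_vars) if v not in bound]
    columns = set()
    for bound_bits in product((0, 1), repeat=len(bound)):
        column = []
        for free_bits in product((0, 1), repeat=len(free)):
            x = [0] * n_vars
            for v, bit in zip(bound, bound_bits):
                x[v] = bit
            for v, bit in zip(free, free_bits):
                x[v] = bit
            column.append(f(tuple(x)))
        columns.add(tuple(column))
    return len(columns)


def has_simple_disjoint_decomposition(f, n_vars, bound):
    # Ashenhurst: decomposable with this bound set iff multiplicity <= 2.
    return column_multiplicity(f, n_vars, bound) <= 2


# Hypothetical example function: f(a, b, c, d) = (a AND b) XOR (c OR d)
def example_f(x):
    a, b, c, d = x
    return (a & b) ^ (c | d)


print(has_simple_disjoint_decomposition(example_f, 4, [0, 1]))  # bound set {a, b}
print(has_simple_disjoint_decomposition(example_f, 4, [0, 2]))  # bound set {a, c}
```

With bound set {a, b} the chart has two distinct columns, so g1 = a AND b can be extracted. With bound set {a, c} it has four, so no simple disjoint decomposition exists for that partition. A recursive decomposer would repeat the test on g1 and g2 for every candidate bound set.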
All Recursive Decompositions
[Figure: all recursive decompositions of f into mini-LUTs g1 to g8.]
Node: inputs
• f: a, b, c′, d
• g1: g2, d
• g2: g3, c′
• g3: a, b
• g4: g5, c′
• g5: g3, d
• g6: g3, c′, d
• g7: a, b, g8
• g8: c′, d
Choice Tree [Lehman,TCAD97]
[Figure: (a) Given LUT-tree with LUTs A (inputs i, j, k, l), B (inputs a, b, c, d), and C (inputs e, f, g, h); (b) Choice tree, in which each LUT is replaced by a choice node over its alternative decompositions (for B, the mini-LUTs g1 to g8).]
Algorithm
• Mini-LUT Tree Mapping
• Fan-in Tree Embedding [Hrkic,TCAD06]
• Simultaneous Remapping and Embedding
Logic Remapping Formulation
• Formulation
– Given a “mini-LUT” tree and arrival times at the leaves,
– map the tree to K-input LUTs, minimizing cost subject to an arrival time constraint at the root.
[Figure: example mini-LUT tree with nodes a to m, rooted at a.]
Solution Signature
• (c,a)
– For a sub-tree rooted at u, a solution is characterized by two parameters:
  • cost of the embedding (and remapping) of the sub-tree.
  • arrival time at u.
• Dominance Relation
– (c,a) is not dominated by (c’,a’) when c is better than c’ or a is better than a’.
[Figure: solutions plotted as cost versus arrival time.]
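The dominance relation can be sketched as a pruning routine. This is a minimal illustration, assuming that "better" means smaller for both cost and arrival time; the function names `dominates` and `non_dominated` are hypothetical, and the input list is made-up sample data.

```python
def dominates(p, q):
    """True if p = (c, a) is no worse than q in cost and arrival, and p != q."""
    return p[0] <= q[0] and p[1] <= q[1] and p != q


def non_dominated(solutions):
    """Keep only the (cost, arrival) pairs that no other pair dominates."""
    kept = []
    best_arrival = float("inf")
    # Sort by ascending cost (ties broken by arrival); a pair survives only
    # if it arrives strictly earlier than every cheaper pair seen so far.
    for c, a in sorted(set(solutions)):
        if a < best_arrival:
            kept.append((c, a))
            best_arrival = a
    return kept


# Hypothetical candidate set.
candidates = [(22, 10), (21, 12), (19, 13), (23, 10), (20, 11)]
print(non_dominated(candidates))  # [(19, 13), (20, 11), (22, 10)]
```

Sorting first keeps the pruning at O(n log n) instead of comparing every pair, which matters because solution sets are pruned at every node of the tree.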
Solution Sets
• Si [u] = {(c,a)} (unfinalized)
– u: signal produced by root LUT
– i: # inputs of root LUT
– c: # LUTs in subtrees
– a: the latest arrival time among the fan-ins.
– Example: with fan-ins i(0,2) and h(0,6) at u, the unfinalized set is S2 [u] = {(0,6)}.
• Si[u] (finalized)
– “finalized” solution from Si [u].
– c: # LUTs in subtrees + 1
– a: arrival time with the root LUT included.
– Example: with fan-ins i(0,2) and h(0,6) at u, the finalized set is S2[u] = {(1,7)}.
• S[u]
– non-dominated_sol(S2[u], … , SK[u])
• For simplicity in the examples:
– one LUT = one unit cost
– one LUT = one unit delay
Si[u] and S[u] Example
• S[b] = non-dominated_sol(S2[b], … , SK[b]) = {(1,7)}
• Si[b]
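The step from an unfinalized set to a finalized set, and then to S[u], can be sketched as follows. This is an illustration under the slide's simplification (one LUT = one unit cost and one unit delay); the helper names `prune`, `unfinalized_from_leaves`, `finalize`, and `best_over_input_counts` are hypothetical.

```python
LUT_COST = 1   # "one LUT = one unit cost"
LUT_DELAY = 1  # "one LUT = one unit delay"


def prune(solutions):
    """Drop dominated (cost, arrival) pairs; smaller is better for both."""
    kept, best_arrival = [], float("inf")
    for c, a in sorted(set(solutions)):
        if a < best_arrival:
            kept.append((c, a))
            best_arrival = a
    return kept


def unfinalized_from_leaves(fanins):
    """Root LUT fed directly by leaves: sum of fan-in costs, latest arrival."""
    return [(sum(c for c, _ in fanins), max(a for _, a in fanins))]


def finalize(unfinalized):
    """Account for the root LUT itself: one more LUT, one more delay unit."""
    return prune((c + LUT_COST, a + LUT_DELAY) for c, a in unfinalized)


def best_over_input_counts(finalized_sets):
    """S[u]: non-dominated solutions over all root input counts."""
    return prune(s for sols in finalized_sets for s in sols)


# Fan-ins i(0,2) and h(0,6) from the example.
s2_unfinalized = unfinalized_from_leaves([(0, 2), (0, 6)])
s2_finalized = finalize(s2_unfinalized)
s_u = best_over_input_counts([s2_finalized])
print(s2_unfinalized, s2_finalized, s_u)  # [(0, 6)] [(1, 7)] [(1, 7)]
```

The printed values reproduce the numbers in the example: {(0,6)} before the root LUT is counted and {(1,7)} after.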
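Computation of Si [u]
• u has children L and R; L contributes i inputs to the root LUT and R contributes K - i.
• i = 1: no collapsing of u and L
• i = K-1: no collapsing of u and R
• Otherwise: collapsing of u, L, and R.
• Example with K = 4, children a and b, for i = 1, i = 2, i = 3 (= K–1):
  S4[u] = join(S[a],S3[b]) ∪ join(S2[a], S2[b]) ∪ join(S3[a],S[b])
[Figure: (a) node u with children a and b and grandchildren c and d; (b), (c), (d) the three collapsing cases i = 1, i = 2, i = 3.]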
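The recurrence can be sketched as a bottom-up dynamic program on a binary mini-LUT tree. Several details are assumptions made for illustration: `join` adds costs and takes the later arrival time, one LUT costs one unit and adds one unit of delay, a leaf counts as a single input with solution (0, arrival), and the tuple encoding of the tree is hypothetical.

```python
from itertools import product

K = 4  # LUT input count used in the example


def prune(solutions):
    """Drop dominated (cost, arrival) pairs; smaller is better for both."""
    kept, best_arrival = [], float("inf")
    for c, a in sorted(set(solutions)):
        if a < best_arrival:
            kept.append((c, a))
            best_arrival = a
    return kept


def join(left, right):
    """Combine child solutions: costs add, the later arrival wins (assumed)."""
    return prune((cl + cr, max(al, ar))
                 for (cl, al), (cr, ar) in product(left, right))


def finalize(solutions):
    """Count the root LUT: one more LUT, one more unit of delay."""
    return prune((c + 1, a + 1) for c, a in solutions)


def solve(node, k=K):
    """Return (unfinalized, final) for a node.

    node is ("leaf", arrival) or ("lut", left, right).
    unfinalized maps an input count i (2..k) to the unfinalized set;
    final is S[node], the non-dominated finalized solutions.
    """
    if node[0] == "leaf":
        return {}, [(0, node[1])]
    _, left, right = node
    unf_l, fin_l = solve(left, k)
    unf_r, fin_r = solve(right, k)
    # A child contributes 1 input if it is not collapsed (finalized set),
    # or i >= 2 inputs if it is collapsed into the root (unfinalized set).
    opts_l = {1: fin_l, **unf_l}
    opts_r = {1: fin_r, **unf_r}
    unfinalized = {}
    for total in range(2, k + 1):
        acc = []
        for i in range(1, total):
            j = total - i
            if i in opts_l and j in opts_r:
                acc += join(opts_l[i], opts_r[j])
        if acc:
            unfinalized[total] = prune(acc)
    final = prune(s for sols in unfinalized.values() for s in finalize(sols))
    return unfinalized, final


# Single mini-LUT with leaf arrival times 2 and 6.
print(solve(("lut", ("leaf", 2), ("leaf", 6))))  # ({2: [(0, 6)]}, [(1, 7)])
```

For `total = 4` the inner loop enumerates exactly the three terms of the S4[u] formula: (i, j) = (1, 3), (2, 2), and (3, 1).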
Remapping Algorithm Example
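[Figure: (a) Subject Tree: mini-LUT tree with root a and internal nodes b, c, d, e, f. Leaves are labeled (cost, arrival time): g(0,4), h(0,6), i(0,2), j(0,3), k(0,2), l(0,1), m(0,4). The plot axis is arrival time.]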
Algorithms
• Mini-LUT Tree Mapping
• Fan-in Tree Embedding [Hrkic,TCAD06]
• Simultaneous Remapping and Embedding
Tree Embedding [Hrkic,TCAD06]
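[Figure: the embedding algorithm takes a fan-in tree (topology, arrival times, pin locations), a target layout graph, and cost metrics, and produces embedded trees characterized by cost versus arrival time. The example tree has nodes a, d, e, f and replicated cells bR and cR, with labels (0,2), (0,3), and (0,4).]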
Algorithms
• Mini-LUT Tree Mapping
• Fan-in Tree Embedding [Hrkic,TCAD06]
• Simultaneous Remapping and Embedding
Simultaneous Remapping and Embedding
• Formulation
– Given a “mini-LUT” tree with fixed leaves and root, arrival times at the leaves, and a target layout graph,
– simultaneously map the tree to K-input LUTs and embed it.
Solution Set Si [u][v]
• The remapped root produces signal u and is placed at v in the target layout graph.
Solution Set Si[u][v]
• Solutions Si [u][w] are finalized and drive vertex v in the target layout graph.
• Computed by a shortest weight-constrained path algorithm.
[Figure: root LUT with fan-ins h, i, j, k producing u at vertex w, routed to vertex v to give Si[u][v].]
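One way to picture the propagation from w to v is a label-setting search that carries (cost, arrival) pairs through the target layout graph and keeps only non-dominated pairs at each vertex. This is a generic bi-criteria shortest-path sketch, not the specific algorithm used in this work; the graph, the per-edge wire cost and wire delay, and the function name `propagate` are all hypothetical, and edge weights are assumed non-negative.

```python
import heapq


def propagate(graph, sources):
    """Spread (cost, arrival) labels through a layout graph.

    graph   : dict vertex -> list of (neighbor, wire_cost, wire_delay)
    sources : dict vertex -> list of finalized (cost, arrival) solutions
    Returns dict vertex -> list of non-dominated (cost, arrival) labels.
    """
    labels = {v: [] for v in graph}
    heap = [(c, a, w) for w, sols in sources.items() for c, a in sols]
    heapq.heapify(heap)
    while heap:
        c, a, v = heapq.heappop(heap)
        # Labels pop in ascending (cost, arrival) order, so a new label is
        # useful only if no label already stored at v is at least as good.
        if any(c2 <= c and a2 <= a for c2, a2 in labels[v]):
            continue
        labels[v].append((c, a))
        for nxt, wire_cost, wire_delay in graph[v]:
            heapq.heappush(heap, (c + wire_cost, a + wire_delay, nxt))
    return labels


# Hypothetical layout graph: a direct but costly wire from w to v,
# and a cheaper, slower detour through x.
layout = {
    "w": [("v", 3, 1), ("x", 1, 2)],
    "x": [("v", 1, 2)],
    "v": [],
}
result = propagate(layout, {"w": [(1, 7)]})
print(result["v"])  # [(3, 11), (4, 8)]
```

Both routes survive at v because neither dominates the other: the detour is cheaper, the direct wire is faster.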
Solution Set S[u][v]
• S[u][v] ← non-dominated-sol(S2[u][v],…,SK[u][v])
• The best remapping regardless of the number of inputs at v in the target layout graph.
Simultaneous Remapping and Embedding Example
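[Figure: cost versus arrival time plot of the solutions for the example mini-LUT tree (root a, leaves g to m) with the root at vertex v23 of the target layout graph; plotted points (19,13), (20,11), (22,10).]
• S4[a][v23] = {(22,10)}
• S[a][v23] = {(19,13),(20,11),(22,10)}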
Experiment
• Benchmarks
– 20 MCNC benchmark circuits
– At least 20% white space
• Criteria of Interest
– LUT depth
– Clock period of circuits
• Comparisons
– Timing-driven VPR placer
– Replication Tree embedder
– Arbor embedder [Kim,GLSVLSI06]
– Remapping embedder
• Different logic-level mappers and stability effect of new algorithm
Optimization Flow
[Figure: flow chart with the boxes Initial Netlist & Placement; Static Timing Analysis & Replication Tree Construction; Tree Embedding; Modified Netlist; Post-Processing & Legalization; Modified Netlist & Placement.]
• Tree Embedding options:
– Repl Tree embedder
– Remapping embedder
LUT Depth Changes

            Init. ckt           New ckt
ckt         ckt   Crit. Path    ckt   Crit. Path
ex5p          9       8           9       7
tseng        12      12          12      11
apex4         8       8           9       8
misex3        9       8           9       8
alu4          9       9           9       8
diffeq       14      14          13      10
dsip          8       5           8       4
seq           8       8           9       7
apex2        10      10          10       9
s298         16      16          14      13
des           8       5           8       4
bigkey        5       4           5       4
frisc        23      20          22      22
spla         10      10          10       9
elliptic     17      15          18      15
ex1010       10      10          10       9
pdc          11      10          11       8
s38417       10      10          10       8
s38584.1     10       5          10       5
clma         15      14          14      13
Routed Clock Period

             T-VPR   Repl    Arbor   Remap
Avg Delay    1       0.886   0.848   0.826

[Figure: bar chart of normalized routed clock period (y-axis 0 to 1.2) for VPR, Repl, Arbor, and Remap on each benchmark: ex5p, tseng, apex4, misex3, alu4, diffeq, dsip, seq, apex2, s298, des, bigkey, frisc, spla, elliptic, ex1010, pdc, s38417, s38584.1, clma.]
• Average Normalized Clock Period
• Max reduction of REMAP vs Arbor: 11.7%
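The average normalized clock periods translate into percentage reductions with simple arithmetic. The sketch below only restates the four averages reported above; the dictionary and the helper name `reduction_percent` are hypothetical.

```python
# Average normalized routed clock period per flow, as reported above.
avg_period = {"T-VPR": 1.0, "Repl": 0.886, "Arbor": 0.848, "Remap": 0.826}


def reduction_percent(value, baseline):
    """Relative reduction of value against baseline, in percent."""
    return round(100 * (baseline - value) / baseline, 1)


for flow in ("Repl", "Arbor", "Remap"):
    print(flow, reduction_percent(avg_period[flow], avg_period["T-VPR"]))
# Repl 11.4
# Arbor 15.2
# Remap 17.4

print(reduction_percent(avg_period["Remap"], avg_period["Arbor"]))  # 2.6
```

The 17.4% figure for Remap against T-VPR is the average reduction. The 11.7% quoted for REMAP vs Arbor is a per-circuit maximum, so it is not expected to match the 2.6% difference between the two averages.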
Different Logic-level Mappers and Stability Effect of Remap
[Figure: bar chart (y-axis 0 to 90) comparing VPR, Repl, and Remap for netlists produced by FlowMap, FlowMap-r, ZMap, Praetor, and DAOmap. Chart labels: “seq”, “Span: 12%”, “Span: 4%”.]
• FlowMap: optimal depth.
• FlowMap-r: relaxed depth.
• ZMap: optimal depth with simultaneous area minimization.
• Praetor: minimized area.
• DAOmap
Summary
• Study of layout-level restructuring for interconnect optimization.
– Functional Decomposition
– Choice Tree
– Remapping Algorithm
– Simultaneous remapping and embedding
• Experimental Result
– Average 17% reduction in clock period compared with T-VPR.
Thank You!