Post on 02-Jan-2016
Algorithms for Simultaneous
Consideration of Multiple Physical
Synthesis Transforms for Timing Closure
Huan Ren and Shantanu Dutt Dept. of Electrical and Computer Engineering
University of Illinois Chicago
Outline
Problem formulation & prior work
Network flow model
Methodology Flow
Discretization Requirements
Structures for Accurate Objective Function Cost
Simultaneous Detailed Placement: A Holistic Approach
Experimental Results
Conclusions
Problem Statement
Simultaneously apply a given set T of synthesis and replacement transforms to cells and nets on critical paths of an initially placed circuit to improve circuit delay near-optimally while satisfying area constraints.
For the current experiments, T = {cell resizing, replication, replacement, type-1 & type-2 buffer insertions}
Critical paths (CP) = paths with delay > (1-α) fraction of circuit delay. We choose α=0.1.
Timing objective function [Dutt et al., ICCAD’06]
\[ F_t = \sum_{n_i \in CP} \; \sum_{u_j \in CS(n_i)} \frac{D(u_j, n_i)}{S_a(n_i)} \]
CS(n_i): critical sinks of n_i in CP
D(u_j, n_i): delay of n_i at sink u_j
S_a(n_i): allocated slack of n_i, which is the path slack of the most critical path through the net divided by the number of nets in the path
The weighting allows exponential magnification of the timing function for critical nets, in order to approximate minimizing the maximum net timing function, i.e., roughly minimizing the delays in CP
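As a rough illustration of how this objective is evaluated, the sketch below sums D(u_j, n_i) / S_a(n_i) over the critical sinks of each net. The data layout (`sink_delays`, `path_slack`, `nets_in_path`) and the numbers are hypothetical, and positive slack values are assumed so that the division is well defined.

```python
from typing import Dict, List


def allocated_slack(path_slack: float, nets_in_path: int) -> float:
    """S_a(n_i): slack of the most critical path through the net,
    divided by the number of nets on that path."""
    return path_slack / nets_in_path


def timing_objective(nets: List[Dict]) -> float:
    """F_t = sum over nets in CP, sum over critical sinks, of D / S_a."""
    total = 0.0
    for net in nets:
        s_a = allocated_slack(net["path_slack"], net["nets_in_path"])
        for delay in net["sink_delays"]:  # one entry per critical sink
            total += delay / s_a
    return total


# Hypothetical nets on critical paths (values are illustrative only).
example_nets = [
    {"sink_delays": [2.0, 4.0], "path_slack": 2.0, "nets_in_path": 4},  # S_a = 0.5
    {"sink_delays": [3.0], "path_slack": 3.0, "nets_in_path": 2},       # S_a = 1.5
]
f_t = timing_objective(example_nets)
```

A net with small allocated slack contributes more per unit of delay, which is how the more critical nets dominate the sum.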
Post-placement Incremental Physical Synthesis
Why necessary?
  Wire load estimation is very inaccurate prior to placement
  Leaves large room for improvements
Various transforms
  Cell sizing: effective for improving timing
    Continuous sizing [Fishburn et al., ICCAD’85] and discrete sizing [Hu et al., DAC’07], [Ren et al., IWLS’08]
    Options: different cell sizes available in the library (s options for s sizes)
  Incremental global placement
    Re-place a subset of cells targeting the metric of interest for design closure [Dutt et al., ICCAD’06], [Wonjoon et al., ICCAD’03]
    Transform options:
      Remain in the position in the initial placement
      Move to the new position determined in an incremental global placement process
Various Transforms (continued)
Buffer insertion
  Usually associated with routing tree generation
  Can be estimated after placement using two different types of buffers [Jiang et al., TVLSI’98]
  Transform options for each buffer type:
    Do not insert any buffer
    Insert a buffer of one of the sizes available in the library (s options for s sizes)
[Figure: two buffer types on a net with driver D and sinks S: driving buffer (type 1) and isolating buffer (type 2); sinks are marked critical or non-critical]
Various Transforms (continued)
Cell replication
  Can both improve drive capability and isolate sinks
  Need to partition sinks between the two drivers [Srivastava et al., TVLSI’04] [Lillis et al., ISCAS’96]
  Transform options:
    Do not replicate a cell
    Replicate a driver cell with several possible partitions of the sink cells among the two replicas (k options for k partitions)
[Figure: driver D with its sinks S before replication, and drivers D and D’ sharing the sinks after replication]
Combining Multiple Synthesis Transforms: Past Work
Usually timing-driven
Most methods simply apply them sequentially
Transforms are not unified
[Donath et al., DATE’00]
  Incorporates different synthesis transforms at different partition levels in a partition-based placement
[Jiang et al., TVLSI’98]
  Considers both cell resizing and buffer insertion
  Dynamic sequencing but greedy: each time, chooses the transform with the largest ratio of delay improvement to area increase for a net/cell
  Can be trapped in local optima
  Hard to handle other transforms (e.g., incremental placement, which causes no area increase)
[Figure: transforms applied across partition levels, from coarse to detailed: TD placement adjustment, cell resizing, replication and buffering]
Our method: simultaneous, with unified transforms
Network Flow Model
An example: a simple transform selection graph (TSG) for one net
Nodes: transform options for each net (and its cells)
Arcs: those in complete bipartite graphs between the transform option sets of a net, so all option combinations are available as flow paths
Flow has a binary meaning: flow through a node means the option for that node is selected
Flow also has a quantitative meaning: in constraint satisfaction problems, flow amount = constraint metric value = (in our case) sizes of the selected options
Flow cost equals the timing objective function value with the selected options
Timing-optimal transform options = the min-cost flow
[Figure: TSG for net ni with driver u and sinks v, w: node D, option sets Ores(u) and Ob1 (options 1 and 2 each) joined by a complete bipartite graph, and node G. The arc between option 1 of Ores(u) and option 2 of Ob1 carries the TD function value for that choice of options.]
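To make the "flow path = option combination" reading concrete, here is a small sketch that enumerates every path through a two-set TSG and keeps the cheapest one. It is a brute-force stand-in for the min-cost flow, not the solver used in the talk; the cost table and the helper name `min_cost_combination` are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def min_cost_combination(
    option_sets: Dict[str, List[int]],
    path_cost: Callable[[Dict[str, int]], float],
) -> Tuple[Dict[str, int], float]:
    """Enumerate one option per transform (one flow path per combination)
    and return the combination with the smallest timing cost."""
    names = list(option_sets)
    best_combo, best_cost = {}, float("inf")
    for choice in product(*(option_sets[n] for n in names)):
        combo = dict(zip(names, choice))
        cost = path_cost(combo)
        if cost < best_cost:
            best_combo, best_cost = combo, cost
    return best_combo, best_cost


# Hypothetical TD function values for (Ores(u) option, Ob1 option).
HYPOTHETICAL_COST = {(1, 1): 9.0, (1, 2): 7.5, (2, 1): 8.0, (2, 2): 6.0}

option_sets = {"Ores(u)": [1, 2], "Ob1": [1, 2]}
best, cost = min_cost_combination(
    option_sets, lambda c: HYPOTHETICAL_COST[(c["Ores(u)"], c["Ob1"])]
)
```

Enumeration is exponential in the number of transforms, which is one reason a flow formulation is attractive for whole circuits.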
Overall Model
Mini-TSG is constructed for each net in CP (net structures)
If two nets have common cells, their net structures are connected by a spanning structure.
[Figure: source S and sink T; net structures N1 to N4 for nets n1 to n4, connected by spanning structures and feeding the DPG]
Flows indicating selected cell sizes and positions are sent to the DPG to perform detailed placement
Detailed placement “cost” is also considered when selecting options to reach an overall near-optimal solution
Methodology Flow
1) Determine the set CP of near-critical paths = {paths with delay >= (1-α) × critical path delay}
2) Determine transform options from the transform set T for every net in CP (from the library or using known algorithms, e.g., for replication)
3) Construct the transform selection graph (TSG) and couple it with the detailed placement graph (DPG) [Dutt et al. ICCAD’06]
4) Determine F- (objective) and C- (discretization) costs for arcs in the TSG
5) Determine the min-cost flow through TSG + DPG using the “concave-cost” min-cost method of [Kim & Pardalos, OR Letters, ’99]
6) Determine transforms across all cells & nets in CP and their legalized detailed placement from the above flow
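The near-critical path selection in this flow can be sketched as a simple threshold filter. The path names and delays below are made up, and the `>=` comparison follows the definition on this slide.

```python
from typing import Dict, Set


def near_critical_paths(path_delays: Dict[str, float], alpha: float = 0.1) -> Set[str]:
    """Return the paths whose delay is at least (1 - alpha) times the
    critical (largest) path delay."""
    critical_delay = max(path_delays.values())
    threshold = (1 - alpha) * critical_delay
    return {name for name, delay in path_delays.items() if delay >= threshold}


# Hypothetical path delays; with alpha = 0.1 the threshold is 9.0.
delays = {"p1": 10.0, "p2": 9.2, "p3": 8.9, "p4": 5.0}
cp = near_critical_paths(delays)
```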
Discretization Requirements in the Network Flow Model
Mutually exclusive arcs (MEAs) for the output arc and/or input arc sets of some nodes: at most one arc in an MEA set can have flow through it
Hyper-arc flow: hyper-arcs may be needed in some problems to model k-way dependencies (k > 2). For example, they are needed in our physical synthesis problems to accurately reflect the objective metric value change caused by flow through the nodes in a hyper-arc.
[Figure: TSG between S and T with option sets Ores(u) and Ob1 (options 1 and 2 each) and their MEA sets marked; a 4-ary hyperarc modeled as a star graph with only 2 states, a no-flow state and an all-flow state; a valid and an invalid flow are shown]
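The MEA requirement is easy to state as a check on a candidate flow: in each MEA set, at most one arc may carry flow. The sketch below is only such a checker (it does not enforce the constraint inside a solver); arc names and flow values are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set


def violated_mea_sets(
    flow: Dict[Hashable, float],
    mea_sets: Iterable[Set[Hashable]],
    eps: float = 1e-9,
) -> List[Set[Hashable]]:
    """Return, for each violated MEA set, the arcs that carry flow.
    A set is violated when more than one of its arcs has positive flow."""
    violated = []
    for arcs in mea_sets:
        used = {a for a in arcs if flow.get(a, 0.0) > eps}
        if len(used) > 1:
            violated.append(used)
    return violated


# Hypothetical output arcs of S towards the two options of Ores(u).
mea_sets = [{("S", "res1"), ("S", "res2")}]
valid_flow = {("S", "res1"): 3.0}
split_flow = {("S", "res1"): 1.5, ("S", "res2"): 1.5}  # selects two options at once
```

Here `valid_flow` passes, while `split_flow` is the kind of fractional solution a plain min-cost flow may return without discretization.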
Net Structure and F-cost
First attempt: a linear structure
Product-term-based arc cost
Order of a product term in the timing objective function: the number of transforms the term is a function of
E.g., objective function (linear delay model): d(u,v) + d(u,w) = 2cRdL(ni) + 2RdCv + 2RdCw
Rd(Ores(u), Ob1) · Cv(Ores(v), Orep(u), Orep(v)) has order 5
Each flow path is a transform combination; Set {paths} = Set {transform combos}
[Figure: linear structure for net ni (driver u, sinks v and w, replicas u’ and v’): a distribution node, option sets Ores(u), Ob1, Ores(v), Ores(w) (options 1 and 2 each) chained by bipartite graphs, and a gathering node; option nodes Orep(u) and Orep(v) are also shown, and the connections u to v and u to w give d(u, v) and d(u, w)]
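The expanded objective above factors as 2·Rd·(c·L + Cv + Cw). One reading, assumed here and not stated on the slide, is a lumped model in which each sink sees the driver resistance times the total load on the net. The sketch checks that this assumed per-sink delay reproduces the slide's sum; all numeric values are hypothetical.

```python
import math


def sink_delay(rd: float, c: float, length: float, cv: float, cw: float) -> float:
    """Assumed lumped linear model: driver resistance times total load
    (wire capacitance c * L plus both sink capacitances)."""
    return rd * (c * length + cv + cw)


def objective_expanded(rd: float, c: float, length: float, cv: float, cw: float) -> float:
    """The expanded form given on the slide: 2cRdL + 2RdCv + 2RdCw."""
    return 2 * c * rd * length + 2 * rd * cv + 2 * rd * cw


# Hypothetical values: Rd = 2, c = 0.5 per unit length, L = 4, Cv = 1, Cw = 3.
args = (2.0, 0.5, 4.0, 1.0, 3.0)
two_sink_sum = 2 * sink_delay(*args)  # d(u, v) + d(u, w) under the assumption
expanded = objective_expanded(*args)
```

Each product such as Rd·Cv is a separate cost term, and its order is the number of distinct transforms that Rd and Cv depend on.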
Linear Structure—Issues in Objective Function Cost
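Drawbacks of the linear structure
  Cannot handle terms with order > 2
  Cannot handle terms that depend on two “non-adjacent” transforms
[Figure: linear structure from a supply node through option sets Ox, Oy, Oz (options 1 and 2 each) to a gathering node. A term T(Ox, Oy) fits on an arc, e.g. T(Ox1, Oy2). A term T(Ox, Oy, Oz) does not: the arc for T(Ox?, Oy1, Oz2) cannot tell which option of Ox was selected. A term T(Ox, Oz) has no bipartite graph between its option sets.]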
Hyperarcs: Accurate Objective Function Cost
Product-term-based arc cost
Order of a product term in the timing objective function: the number of transforms the term is a function of
Ex: simple linear delay model: d(u,v) + d(u,w) = 2cRdL(ni) + 2RdCw + 2RdCv
Rd(Ores(u), Ob1) · Cv(Ores(v), Orep(v), Orep(u)) has order 5
2^n hyperarcs for a term of order n, assuming 2 options per transform; m^n hyperarcs if there are m options per transform
Flow needs to select exactly 1 combination hyperarc
[Figure: meta-hyperarc H for the above order-5 term, spanning option sets Ob1, Ores(u), Ores(v), Orep(v), Orep(u) on the connection giving d(u, v); it consists of “combination” hyperarcs such as (i1, j1, k1, l1, m1), (i1, j1, k1, l1, m2), ..., (i2, j2, k2, l2, m2)]
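The hyperarc count is just the number of option combinations of the transforms a term depends on. A minimal sketch, using the five transforms of the order-5 term above with two hypothetical options each:

```python
from itertools import product
from typing import Dict, List, Tuple


def combination_hyperarcs(options: Dict[str, List[int]]) -> List[Tuple[int, ...]]:
    """One combination hyperarc per way of picking one option from each
    transform the cost term depends on."""
    return list(product(*options.values()))


# Order-5 term with 2 options per transform gives 2**5 hyperarcs.
order5 = {name: [1, 2] for name in ["Ob1", "Ores(u)", "Ores(v)", "Orep(v)", "Orep(u)"]}
hyperarcs = combination_hyperarcs(order5)

# With m = 3 options on each of n = 2 transforms: 3**2 hyperarcs.
small = combination_hyperarcs({"Ox": [1, 2, 3], "Oy": [1, 2, 3]})
```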
Hyperarcs: Star Graph Structure
Arcs in a network flow graph can only connect two nodes
Parallel arcs run between the central transform and the parallel transform
Each parallel arc, together with the arcs to the regular transform option nodes it represents, corresponds to one hyperarc
[Figure: hyperarc representing an order-3 cost term value T(Ox, Oy, Oz), modeled as a star graph: a central transform, regular transforms Ox (option i) and Oy (option j), and a parallel transform Oz with m options. The m parallel arcs for the pair (i, j) carry costs T(Oxi, Oyj, Oz1), ..., T(Oxi, Oyj, Ozm).]
[Figure: meta star graph over Ox, Oy, Oz with multiple option nodes, a meta arc made of parallel arc sets (multiple arcs), a valid flow f and an invalid flow f’]
MEA Satisfaction via Arc C-costs
Besides the objective-function-based cost (F-cost), an objective-function-independent C-cost is added
Total arc cost = F-cost + C-cost (cost is a step function, incurred once for any flow amount)
Theorem: a min-cost flow with C-costs on MEA arcs ensures MEA satisfaction
Choosing the C-cost:
1) Heuristically or randomly select a valid flow and determine its cost C1
2) Obtain a standard min-cost flow of cost C2 without discretization constraints
3) Let CΔ = C1 - C2
4) Set MEA arc cost = CΔ + 1
Invalid flow vs. valid flow: F-cost diff >= -CΔ, C-cost diff >= CΔ + 1, so total diff >= 1
[Figure: MEA sets whose arcs each carry C-cost CΔ + 1; cost bars compare valid flow F-cost, min-cost invalid flow F-cost, valid flow F+C-cost and invalid flow F+C-cost]
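The C-cost recipe can be checked on a toy case with a single MEA set, where a valid flow uses one MEA arc and an invalid flow uses two. The F-costs below are hypothetical; the point is only that once each MEA arc costs CΔ + 1, no invalid flow can be cheaper than the chosen valid one.

```python
def mea_arc_c_cost(valid_flow_cost: float, unconstrained_min_cost: float) -> float:
    """C-cost per MEA arc: C_delta + 1, with C_delta = C1 - C2."""
    c_delta = valid_flow_cost - unconstrained_min_cost
    return c_delta + 1


def total_cost(f_cost: float, mea_arcs_used: int, c_cost: float) -> float:
    """F-cost plus one step-function C-cost per MEA arc carrying flow."""
    return f_cost + mea_arcs_used * c_cost


C1 = 20.0  # F-cost of some valid flow (hypothetical)
C2 = 12.0  # F-cost of the unconstrained min-cost flow (hypothetical)
c_cost = mea_arc_c_cost(C1, C2)  # C_delta = 8, so each MEA arc costs 9

valid_total = total_cost(C1, mea_arcs_used=1, c_cost=c_cost)
# Cheapest conceivable invalid flow: best F-cost, but two arcs in the MEA set.
invalid_total = total_cost(C2, mea_arcs_used=2, c_cost=c_cost)
```

Even in this worst case the invalid flow is more expensive by exactly 1, matching the "total diff >= 1" bound.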
Hyperarc-Consistent Flows via Arc C-costs
Consistent hyperarc flow
  Idea: only the total capacity of a parallel arc and the arcs to its consistent regular option nodes can equal the incoming flow amount f
  How: use prime numbers
For k total regular option nodes (across all regular transforms), select k prime numbers p1 < p2 < ... < pk such that 1/p1 + ... + 1/pk > (pk - 1)/pk
Capacity of non-parallel arcs: f(1/pj)
Capacity of parallel arcs: f - (capacity of its consistent non-parallel arcs)
C-cost is proportional to arc capacity: Cunit * cap(e), where Cunit = (CΔ + 1)/Δcapmin and Δcapmin is the min{cap of invalid arc sets - f}
Theorem: a min-cost flow with C-costs on star graph arcs ensures hyperarc-consistent flows in star graphs
[Figure: star graph over Ox (option i), Oy (option j) and Oz (options 1 and 2) with incoming flow f; parallel arc capacities f(1-1/3) and f(1-1/5), regular option arc capacities f(1/3) and f(1/5). A consistent pairing has total capacity = f; inconsistent pairings have total capacity < f or > f.]
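The capacity rule can be tried on the figure's numbers (primes 3 and 5). The sketch below only illustrates the capacity arithmetic with exact fractions; it does not check the prime-selection condition, and the flow amount f = 15 is a hypothetical choice that keeps the values integral.

```python
from fractions import Fraction
from typing import Iterable


def regular_arc_cap(f: int, prime: int) -> Fraction:
    """Capacity of the (non-parallel) arc to a regular option node: f / p_j."""
    return Fraction(f, prime)


def parallel_arc_cap(f: int, consistent_primes: Iterable[int]) -> Fraction:
    """Capacity of a parallel arc: f minus the capacities of the arcs to its
    consistent regular option nodes."""
    return Fraction(f) - sum(Fraction(f, p) for p in consistent_primes)


f = 15
# Option 1 uses prime 3, option 2 uses prime 5 (as in the figure).
par1, reg1 = parallel_arc_cap(f, [3]), regular_arc_cap(f, 3)  # f(1-1/3), f/3
par2, reg2 = parallel_arc_cap(f, [5]), regular_arc_cap(f, 5)  # f(1-1/5), f/5

consistent_total = par1 + reg1  # matches f
too_small = par1 + reg2         # inconsistent pairing, total below f
too_large = par2 + reg1         # inconsistent pairing, total above f
```

Only the consistent pairing can carry exactly f, which is what lets capacity-proportional C-costs penalize the other pairings.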
Discrete Arc Cost
[Figure: standard linear flow cost: cost c vs. flow f up to Cap(e), with slope = cost(e)/cap(e)]
[Figure: step-function cost (concave): cost c jumps to Cost(e) for any positive flow f up to Cap(e)]
Well-studied NP-hard problem [Kim et al., ORL’99]; we use their min-cost algorithm
Total arc cost = F-cost + C-cost (incurred once for any amount of flow), so the arc cost is discrete
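The difference between the two cost models is small enough to write down directly. This is a sketch of the two per-arc cost functions only, with hypothetical numbers; it is not the min-cost algorithm cited above.

```python
def linear_arc_cost(flow: float, cap: float, cost: float) -> float:
    """Standard linear cost: proportional to flow, slope = cost / cap."""
    if not 0 <= flow <= cap:
        raise ValueError("flow must lie in [0, cap]")
    return cost * flow / cap


def step_arc_cost(flow: float, cap: float, cost: float) -> float:
    """Step-function (fixed-charge) cost: the full arc cost is incurred once
    for any positive amount of flow, and nothing for zero flow."""
    if not 0 <= flow <= cap:
        raise ValueError("flow must lie in [0, cap]")
    return cost if flow > 0 else 0.0


# Hypothetical arc with capacity 10 and cost 8.
half_linear = linear_arc_cost(5, 10, 8)
half_step = step_arc_cost(5, 10, 8)
```

Under the step cost, sending half the capacity costs as much as sending all of it, so splitting flow across several option arcs is never rewarded.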
Multiple Cost Terms: Intersecting Hyperarcs & Overlapping Star Graphs
There is one star graph structure for each term in the objective function
Option nodes for transforms common to different terms are combined
Example: consider three transforms: gate sizing (res), replication (rep) and isolating buffer (b2)
Affected parameters for ni:
  Driver R: Rd(Ores(u))
  WL: Li(Orep(u), Ob2)
  Sink C: Cv(Ores(v), Orep(v), Orep(u)), Cw(Ores(w), Orep(w), Orep(u))
Order > 2 terms: 2RdCv (order 4), 2c · RdLi (order 3), 2RdCw (order 4)
MEA constraint ensures consistent option selection for common transforms in different star graphs
[Figure: sub-TSG for net ni (driver u, sinks v and w) between a distribution node and a gathering node, with option nodes Ores(u), Orep(u), Ob2, Ores(v), Orep(v), Ores(w), Orep(w); the term 2c · RdLi is shown as a meta arc over Ob2, Ores(u) and Orep(u)]
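The term orders and the overlap between star graphs follow mechanically from the parameter dependencies listed for this example. The sketch below encodes those dependencies as sets; the dictionary names are hypothetical.

```python
from typing import Dict, List, Set

# Transforms each parameter depends on, as listed for net ni.
PARAM_DEPS: Dict[str, Set[str]] = {
    "Rd": {"res(u)"},
    "Li": {"rep(u)", "b2"},
    "Cv": {"res(v)", "rep(v)", "rep(u)"},
    "Cw": {"res(w)", "rep(w)", "rep(u)"},
}

# Parameters multiplied together in each cost term.
TERMS: Dict[str, List[str]] = {
    "2RdCv": ["Rd", "Cv"],
    "2cRdLi": ["Rd", "Li"],
    "2RdCw": ["Rd", "Cw"],
}


def term_transforms(term: str) -> Set[str]:
    """All transforms a cost term is a function of."""
    return set().union(*(PARAM_DEPS[p] for p in TERMS[term]))


def term_order(term: str) -> int:
    """Order of a term = number of distinct transforms it depends on."""
    return len(term_transforms(term))


def shared_transforms() -> Set[str]:
    """Transforms that appear in more than one term; their option nodes are
    shared between the corresponding star graphs."""
    seen, shared = set(), set()
    for term in TERMS:
        deps = term_transforms(term)
        shared |= seen & deps
        seen |= deps
    return shared


orders = {t: term_order(t) for t in TERMS}
```

In this example the star graphs overlap on the option nodes of res(u) and rep(u), which is where the MEA constraint has to keep the selections consistent.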
Background: Incremental Detailed Placement [Dutt et al., ICCAD’06]
Cells to be legalized are connected to the source
White spaces are connected to the sink
Flows from the source to the sink perform cell legalization via white spaces
Flow amount: cell movement
Arcs: possible movement directions
Arc cost: deterioration of the objective metric caused by the corresponding movement
[Figure: three placement rows (Row1 to Row3) with cells C11 to C14, C21, C22, C24, C31 to C33, white spaces W1, W2, W21, W3, and cell A1 among the cells to be legalized; the corresponding flow graph runs from the source through A1 and the row cells to the white spaces and the sink]
Simultaneous Detailed Placement & Area Constraint Satisfaction
Directly send branch flows to the detailed placement network flow graph (DPG) to perform simultaneous detailed placement
Flow is sent from the replacement option node of a cell to the corresponding position in the DPG
Flow amount means the selected size of the cell
Coupling between the flow and the size option nodes is needed: shunting structure
[Figure: replacement option nodes Opl(u) for positions i and j of u feeding the DPG; shunting structure around size option j of Ores(u), with flow amounts Amax, Aj(u) and Amax - Aj(u), a shunting arc to the sink, an arc to the DPG via Opl(u), and arc labels (Amax, 0), (Amax, 0), (Aj(u), 0)]
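One plausible reading of the shunting structure, assumed here from the figure labels rather than stated in the text, is that a flow of Amax reaches the selected size option, Aj(u) of it continues to the DPG as the cell's size, and the remaining Amax - Aj(u) is shunted to the sink. Under that assumption the split is:

```python
from typing import Dict


def shunt_split(a_max: float, a_selected: float) -> Dict[str, float]:
    """Assumed shunting behaviour: of the a_max units entering a size option
    node, a_selected units go on to the DPG and the rest go to the sink."""
    if not 0 <= a_selected <= a_max:
        raise ValueError("selected size must lie in [0, a_max]")
    return {"to_dpg": a_selected, "shunted_to_sink": a_max - a_selected}


# Hypothetical sizes: largest option has area 8, selected option has area 5.
split = shunt_split(a_max=8.0, a_selected=5.0)
```

The two parts always add up to Amax, so flow is conserved at the option node while the DPG only sees the selected cell size.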
Experimental Results—Benchmarks
Three benchmark sets: TD-Dragon [Yang et al., ICCAD’02], ISCAS’85, TD-IBM
Available options
  For cell sizing & type-1, type-2 buffers: 4 options for TD-Dragon and ISCAS’85, and 5 options for TD-IBM
  For replication: 4 options: 3 replication options with different partitions of sink cells and a no-replication option
  For replacement: 2 options: a timing-driven position of each cell is calculated using the method in [Dutt et al., ICCAD’06]. A cell can either stay at its original position or be moved to its timing-driven position.
3% extra white space is added to initial circuits in TD-IBM, and 10% extra white space is added to circuits in ISCAS’85 and TD-Dragon
Sequential Application of Transforms
We compare our results to the sequential application of transforms
Order of transform application matters in sequential application. We tested three different orders (TD-IBM benchmarks):
1) Decreasing order of ΔT/ΔA (ΔT = 25.92%): replacement -> isolating buffer -> cell resizing -> drive buffer -> replication
2) Decreasing order of ΔT (ΔT = 18.11%): replacement -> cell resizing -> isolating buffer -> replication -> drive buffer
3) Increasing order of ΔA (ΔT = 22.64%): replacement -> isolating buffer -> drive buffer -> cell resizing -> replication
Experimental Results
[Figure: % timing improvement, Ours vs. Seq, on TD-IBM (td-ibm01, td-ibm02, td-ibm06, td-ibm9, td-ibm14, td-ibm17, td-ibm18, Avg.); average 34.8 (Ours) vs. 25.9 (Seq), a difference of 8.9, i.e. 34.4% relatively better]
[Figure: % timing improvement, Ours vs. Seq, on ISCAS’85 (C499, C880, C3540, C5315, C7552, Avg.); average 20.4 (Ours) vs. 12.5 (Seq), a difference of 7.9, i.e. 63.2% relatively better]
Experimental Results
[Figure: % timing improvement, Ours vs. Seq, on TD-Dragon (Matrix, VP2, MAC32, MAC64, Avg.); average 15.1 (Ours) vs. 8.8 (Seq), a difference of 6.3, i.e. 71.6% relatively better]
[Figure: runtime (secs) vs. # of cells in CP, with linear fit y = 4x - 924]
[Figure: run time (k secs) vs. # of cells (k), with linear fit y = 0.026x + 710]
Our run time is about 1.5 times that of the sequential approach
Linear increase w.r.t. the number of cells on CP
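The "relatively better" percentages quoted with the charts follow from the average improvements as (Ours - Seq) / Seq. A short check using only the averages reported above:

```python
def relative_gain_percent(ours: float, seq: float) -> float:
    """Relative improvement of our average over the sequential average."""
    return (ours - seq) / seq * 100


# Average % timing improvement (Ours, Seq) as reported for each benchmark set.
averages = {
    "TD-IBM": (34.8, 25.9),
    "ISCAS'85": (20.4, 12.5),
    "TD-Dragon": (15.1, 8.8),
}
relative = {name: round(relative_gain_percent(o, s), 1) for name, (o, s) in averages.items()}
```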
Conclusions
A general discretized network-flow-based approach to TD post-placement multiple physical synthesis; can handle most transforms in a unified manner
Considers transform applications simultaneously
Obtained high-quality solutions; is not trapped in local optima
Performs simultaneous detailed placement (DP) so that DP cost is considered when selecting transform options
Reasonable run time, good scalability & high-quality solutions
Demonstrates the power of using continuous optimization with well-structured discretizations
Applicable to other constrained optimization problems (e.g., power optimization with area and timing constraints)
Future Work: (a) Application to mixed-cell designs; (b) Consider global re-routing as a transform for signal integrity