Post on 14-Dec-2015
Courtesy RK Brayton (UCB) and A Kuehlmann (Cadence)
Logic Synthesis
Sequential Synthesis
Introduction
• Design optimization from system level to layout
  – far too complex to approach in one big step
  – divide-and-conquer approach with a fine-tuned balance between
    • capability to apply clean mathematical modeling and abstraction
    • algorithmic complexity to compute solutions
    • loss of optimality based on hard partitioning
    • design and verification methodology that requires user guidance
  – sweet spots change over time due to:
    • semiconductor technology improvements
    • changes of design architectures/requirements
    • new algorithmic solutions, etc.
Introduction
• Example: traditional ASIC methodology:
  – RTL verification based on simulation
  – logic synthesis from RTL to gate level using combinational paradigm
  – static timing analysis
  – formal equivalence checking based on combinational paradigm
  – ATPG and scan-based testing based on combinational paradigm
  – standard cell place & route methodology with zero clock-skew distribution
Introduction
• However:
  – clean boundaries between modeling levels get blurred
    • larger chips and shrinking device sizes require more detailed modeling
    • aggressive performance and power requirements
    • new modeling and algorithmic approaches
  – Example:
    • RTL sign-off methodology
    • combined approach to logic synthesis and physical design
Overview of Circuit Optimizations
• Combinational Optimization
• Clock Skew Scheduling
• Retiming
• Architectural Restructuring
• System-Level Optimization
[Figure: the optimizations above plotted against four axes: Optimization Space, Distance from Physical Implementation, Verification Challenge, Necessity of Integrated Solution]
Sequential Optimization Techniques
• State assignment
  – Lots of theory; practical only for small FSMs, and even then only when targeting 2-level control logic
• Sequential don’t cares
  – Compute unreachable states, use them as external don’t cares for the next-state logic
• State minimization
  – Easy for completely specified FSMs (n · log n algorithm)
  – Incompletely specified FSMs
• Retiming
  – balancing of path delays by moving registers within circuit topology
  – interleaving with combinational optimization techniques
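The sequential don't-care step can be illustrated with a reachability computation. The following is a minimal sketch on a hypothetical FSM (the state encoding, the input set and `next_state` are invented for illustration and do not come from the slides): every state that is never reached from reset can serve as an external don't care for the next-state logic.

```python
from collections import deque

def reachable_states(reset, inputs, next_state):
    """Breadth-first search over the state transition graph, starting at reset."""
    seen = {reset}
    queue = deque([reset])
    while queue:
        s = queue.popleft()
        for x in inputs:
            t = next_state(s, x)
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

# Hypothetical FSM: a modulo-3 counter stored in 2 bits (codes 0..3).
# Input 1 advances the counter, input 0 holds the current value.
def next_state(s, x):
    return (s + 1) % 3 if x else s

all_states = {0, 1, 2, 3}
reached = reachable_states(0, [0, 1], next_state)
dont_cares = all_states - reached   # unreachable codes
print(sorted(reached), sorted(dont_cares))
```

Explicit enumeration like this only scales to small state spaces; it is meant to show what "compute unreachable states" means, not how a production tool does it.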
Integration in Design Flow
• Optimization Space
  – significantly more optimization freedom for improving performance, power, and area
• Distance from Physical Implementation
  – difficult to accurately model impact on final implementation
  – difficult to mathematically characterize optimization space
• Verification Challenge
  – departure from combinational comparison model would break formal equivalence checking
  – different simulation behavior causes acceptance problems
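Retiming
[Figure: retiming example with registers r1 to r4 and logic delays 4, 2, 5, 3; path annotations Dmax=6/Dmin=3, Dmax=8/Dmin=2, Dmax=0/Dmin=0. Original circuit: Skew = 0, Tcycle = 8. Retimed circuit (registers r’1, r4): (Skew = -1), Tcycle = 7.]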
Retiming
Advantages (+):
• Only setup time constraint (0 clock skew)
• Simple integration with other logical (e.g. combinational) or physical optimizations
• Easy combination with clock skew scheduling to obtain global optimum
Disadvantages (-):
• Changes combinational model of design
  – severe impact on verification methodology
• Inaccurate delay model if applied globally
• Computation of equivalent reset state required
Retiming - Architectural Restructuring
[Figure: architectural restructuring example. Original circuit: registers r1 to r4, logic delays of 2, a path section labeled 20. Restructured circuit: registers r’1, r2, r3, r’4, two path sections labeled 10 each.]
Retiming - Architectural Restructuring
Advantages (+):
• Smooth extension of regular retiming
• Potential to alleviate global performance bottlenecks by adding sequential redundancy and pipelining
Disadvantages (-):
• Significant change of design structure
  – substantial impact on verification methodology
• Flexible architectural restructuring changes I/O behavior
  – existing RTL specification methods not always applicable
Example
Design example:
  – 360 I/O
  – 2240 flip-flops
  – 41665 timing edges
Target cycle time (norm): 1.5
Worst slack: -0.079 (5%)
Distribution:
  20%    30 edges
  40%    63 edges
  60%   130 edges
  80%   249 edges
  100%  425 edges
Verification
• Timing verification unchanged
• Sequential optimizations change the next-state and output functions
  – traditional combinational equivalence checking not applicable
  – simulation runs not recognizable by designer: acceptance problems
• Generic solution:
  – preserve retime function (mapping function) from synthesis for:
    • reducing sequential EC problem back to combinational case
      – no false positives possible!
    • modifying simulation model to reproduce original simulation output
Optimizing Circuits by Retiming
Netlist of gates and registers:
[Figure: netlist of gates and registers between Inputs and Outputs]
Various goals:
  – Reduce clock cycle time
  – Reduce area
    • Reduce number of latches
Retiming
Problem
  – Pure combinational optimization can be suboptimal since relations across register boundaries are disregarded
Solutions
  – Retiming: Move register(s) so that
    • clock cycle decreases, or number of registers decreases, and
    • input-output behavior is preserved
  – RnR: Combine retiming with combinational optimization techniques
    • Move latches out of the way temporarily
    • Optimize larger blocks of combinational logic
Circuit Representation
[Leiserson, Rose and Saxe (1983)]
Circuit represented as retiming graph $G(V, E, d, w)$
  – $V$: set of gates
  – $E$: set of connections
  – $d(v)$ = delay of gate/vertex $v$, ($d(v) \ge 0$)
  – $w(e)$ = number of registers on edge $e$, ($w(e) \ge 0$)
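Circuit Representation
Example: Correlator (from Leiserson and Saxe) (simplified)
Circuit:
[Figure: correlator circuit with Host, comparators $\delta$ on inputs a and b, and adder +]
$\delta(x, y) = 1$ if $x = y$, $0$ otherwise

Operation   Delay
$\delta$    3
+           7

Every cycle in the graph has at least one register, i.e. there are no combinational loops.
Retiming Graph (Directed):
[Figure: retiming graph of the correlator; vertex delays 0, 3, 3, 7; edge weights 2, 0, 0, 0, 0]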
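A retiming graph is small enough to write down directly. The sketch below encodes the simplified correlator as two dictionaries and checks the "no combinational loops" property. The edge list is an assumption: the figure did not survive, so the connectivity is read off the legality constraints and the W matrix given later in the deck (v0 is the host, v1 and v2 the comparators, v3 the adder).

```python
from graphlib import TopologicalSorter, CycleError

# Vertex delays d(v).
d = {"v0": 0, "v1": 3, "v2": 3, "v3": 7}
# Register counts w(e) per edge (u, v); assumed connectivity, see the note above.
w = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
     ("v2", "v3"): 0, ("v3", "v0"): 0}

def has_combinational_loop(d, w):
    """True if some cycle carries no register (zero-weight subgraph is cyclic)."""
    sorter = TopologicalSorter({v: set() for v in d})
    for (u, v), regs in w.items():
        if regs == 0:
            sorter.add(v, u)      # v depends combinationally on u
    try:
        sorter.prepare()          # raises CycleError if a cycle exists
    except CycleError:
        return True
    return False

assert all(delay >= 0 for delay in d.values())   # d(v) >= 0
assert all(regs >= 0 for regs in w.values())     # w(e) >= 0
print(has_combinational_loop(d, w))
```

Removing the two registers on the edge v0 → v1 would close a register-free cycle v0 → v1 → v3 → v0, which the check reports as a combinational loop.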
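Preliminaries
For a path $p: v_0 \xrightarrow{e_0} v_1 \xrightarrow{e_1} \cdots \xrightarrow{e_{k-1}} v_k$:
\[ d(p) = \sum_{i=0}^{k} d(v_i) \quad \text{(includes endpoints)} \]
\[ w(p) = \sum_{i=0}^{k-1} w(e_i) \]
Clock cycle:
\[ c = \max_{p:\, w(p) = 0} \{ d(p) \} \]
For correlator c = 13
[Figure: correlator retiming graph with a path with w(p) = 0 highlighted]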
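The clock cycle definition can be evaluated as a longest-path computation over the register-free edges. This is a minimal sketch; the correlator edge list is an assumption reconstructed from the matrices later in the deck, and the recursion relies on the graph having no combinational loops.

```python
from functools import lru_cache

d = {"v0": 0, "v1": 3, "v2": 3, "v3": 7}
w = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
     ("v2", "v3"): 0, ("v3", "v0"): 0}

def clock_cycle(d, w):
    """c = max d(p) over all paths p with w(p) = 0 (single vertices included)."""
    succ = {v: [] for v in d}
    for (u, v), regs in w.items():
        if regs == 0:
            succ[u].append(v)     # only register-free edges extend a path

    @lru_cache(maxsize=None)
    def longest_from(v):
        # Delay of v plus the longest register-free continuation, if any.
        return d[v] + max((longest_from(x) for x in succ[v]), default=0)

    return max(longest_from(v) for v in d)

c = clock_cycle(d, w)
print(c)
```

For the correlator the critical path is v1 → v2 → v3 → v0 with delay 3 + 3 + 7 + 0.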
Basic Operation
• Movement of registers from input to output of a gate or vice versa
• Does not affect gate functionalities
• Mathematical formulation:
  – $r: V \to \mathbb{Z}$, an integer vertex labeling
  – $w_r(e) = w(e) + r(v) - r(u)$ for edge $e = (u, v)$
[Figure: a gate with registers moved across it; the two directions are labeled "Retime by 1" and "Retime by -1"]
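Basic Operation
• Retiming:
  – $r: V \to \mathbb{Z}$, an integer vertex labeling
  – $w_r(e) = w(e) + r(v) - r(u)$ for edge $e = (u, v)$
  – A retiming $r$ is legal if $w_r(e) \ge 0, \ \forall e \in E$
• For a path $p: s \leadsto t$, $w_r(p) = w(p) + r(t) - r(s)$
Thus in the example, r(u) = -1, r(v) = -1 results in:
[Figure: correlator retiming graph with vertices u and v marked, before retiming (edge weights 2, 0, 0, 0, 0) and after retiming (edge weights 1, 0, 1, 1, 0); vertex delays 0, 3, 3, 7 are unchanged]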
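Applying a retiming is a one-line transformation of the edge weights. The sketch below assumes that u and v in the example are the two delay-3 vertices (called v1 and v2 here) and that the correlator has the edge list shown; both are reconstructions, since the figure itself is lost.

```python
w = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
     ("v2", "v3"): 0, ("v3", "v0"): 0}

def retime(w, r):
    """w_r(e) = w(e) + r(v) - r(u) for every edge e = (u, v)."""
    return {(u, v): regs + r[v] - r[u] for (u, v), regs in w.items()}

def is_legal(w_r):
    """A retiming is legal if no edge ends up with a negative register count."""
    return all(regs >= 0 for regs in w_r.values())

r = {"v0": 0, "v1": -1, "v2": -1, "v3": 0}
w_r = retime(w, r)
print(w_r, is_legal(w_r))

# Path property: for p = v0 -> v1 -> v2 -> v3, w_r(p) = w(p) + r(v3) - r(v0).
path = [("v0", "v1"), ("v1", "v2"), ("v2", "v3")]
assert sum(w_r[e] for e in path) == sum(w[e] for e in path) + r["v3"] - r["v0"]

# An illegal labeling: it would need a negative register count on v3 -> v0.
r_bad = {"v0": 0, "v1": 0, "v2": 0, "v3": 1}
print(is_legal(retime(w, r_bad)))
```

The retimed weights 1, 0, 1, 1, 0 agree with the edge labels that survive from the "after" figure.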
Retiming for Minimum Clock Cycle
Problem Statement: (minimum cycle time)
Given $G(V, E, d, w)$, find a legal retiming $r$ so that
\[ c = \max_{p:\, w_r(p) = 0} \{ d(p) \} \]
is minimized
Retiming: 2 important matrices
• Register weight matrix
\[ W(u, v) = \min_{p} \{ w(p) : u \overset{p}{\leadsto} v \} \]
• Delay matrix
\[ D(u, v) = \max_{p} \{ d(p) : u \overset{p}{\leadsto} v,\ w(p) = W(u, v) \} \]
Retiming for Minimum Clock Cycle
[Figure: correlator retiming graph with vertices v0, v1, v2 (v3 unlabeled); vertex delays 0, 3, 3, 7; edge weights 2, 0, 0, 0, 0]

W     v0   v1   v2   v3
v0     0    2    2    2
v1     0    0    0    0
v2     0    2    0    0
v3     0    2    2    0

D     v0   v1   v2   v3
v0     0    3    6   13
v1    13    3    6   13
v2    10   13    3   10
v3     7   10   13    7

W = register path weight matrix (minimum # latches over all paths between u and v)
D = path delay matrix (maximum delay over all paths between u and v that have W(u,v) latches)

$c \le \phi \iff \forall p$: if $d(p) > \phi$ then $w(p) \ge 1$
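Both matrices can be obtained from a single all-pairs shortest-path run in which each edge (u, v) carries the pair (w(e), -d(u)) and pairs are compared lexicographically; this is the approach of the Leiserson and Saxe paper. The sketch below uses Floyd-Warshall for brevity and assumes the correlator edge list reconstructed from the constraints in this deck. `wd_matrices` is an illustrative name.

```python
import itertools

d = {"v0": 0, "v1": 3, "v2": 3, "v3": 7}
w = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
     ("v2", "v3"): 0, ("v3", "v0"): 0}

def wd_matrices(d, w):
    V = list(d)
    inf = float("inf")
    # dist[u, v] = (registers, -delay of all path vertices except v)
    dist = {(u, v): (inf, 0) for u in V for v in V}
    for u in V:
        dist[u, u] = (0, 0)                      # empty path
    for (u, v), regs in w.items():
        dist[u, v] = min(dist[u, v], (regs, -d[u]))
    # Floyd-Warshall; product() varies the first index slowest, so k is outermost.
    for k, i, j in itertools.product(V, V, V):
        a, b = dist[i, k], dist[k, j]
        if a[0] == inf or b[0] == inf:
            continue
        cand = (a[0] + b[0], a[1] + b[1])
        if cand < dist[i, j]:                    # lexicographic comparison
            dist[i, j] = cand
    W = {uv: x for uv, (x, y) in dist.items()}
    D = {(u, v): d[v] - y for (u, v), (x, y) in dist.items()}
    return W, D

W, D = wd_matrices(d, w)
V = list(d)
W_rows = [[W[u, v] for v in V] for u in V]
D_rows = [[D[u, v] for v in V] for u in V]
print(W_rows)
print(D_rows)
```

Minimizing the pair first on registers and then on negated delay picks, among the minimum-register paths, the one with the largest delay, which is exactly the definition of D(u, v).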
Conditions for Retiming
Assume that we are asked to check if a retiming exists for a clock cycle $\phi$.
Legal retiming: $w_r(e) \ge 0$ for all $e$. Hence
\[ w_r(e) = w(e) + r(v) - r(u) \ge 0 \quad \text{or} \quad r(u) - r(v) \le w(e) \]
For all paths $p: u \leadsto v$ such that $d(p) > \phi$, we require $w_r(p) \ge 1$
  – Thus, writing $p$ as $u = v_0 \to v_1 \to \cdots \to v_k = v$:
\[ 1 \le w_r(p) = \sum_{i=0}^{k-1} w_r(e_i) = \sum_{i=0}^{k-1} \left[ w(e_i) + r(v_{i+1}) - r(v_i) \right] = w(p) + r(v_k) - r(v_0) = w(p) + r(v) - r(u) \]
Take the least $w(p)$ (tightest constraint): $r(u) - r(v) \le W(u, v) - 1$
Note: this is independent of the path from u to v, so we just need to apply it to u, v such that $D(u, v) > \phi$
Solving the constraints
• All constraints in difference-of-2-variable form
• Related to shortest path problem
Correlator: $\phi = 7$
Legal: $r(u) - r(v) \le w(e)$
  r(v0) - r(v1) ≤ 2
  r(v1) - r(v2) ≤ 0
  r(v1) - r(v3) ≤ 0
  r(v2) - r(v3) ≤ 0
  r(v3) - r(v0) ≤ 0
$D > 7$: $r(u) - r(v) \le W(u, v) - 1$
  r(v0) - r(v3) ≤ 1
  r(v1) - r(v0) ≤ -1
  r(v1) - r(v3) ≤ -1
  r(v2) - r(v0) ≤ -1
  r(v2) - r(v1) ≤ 1
  r(v2) - r(v3) ≤ -1
  r(v3) - r(v1) ≤ 1
  r(v3) - r(v2) ≤ 1
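The constraint set for a given target period can be generated mechanically from the edge weights and the W and D matrices. A minimal sketch follows, with the matrices copied from the slides and an edge list that is assumed from the legality constraints; `retiming_constraints` is an illustrative name.

```python
V = ["v0", "v1", "v2", "v3"]
w = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
     ("v2", "v3"): 0, ("v3", "v0"): 0}
W_rows = [[0, 2, 2, 2], [0, 0, 0, 0], [0, 2, 0, 0], [0, 2, 2, 0]]
D_rows = [[0, 3, 6, 13], [13, 3, 6, 13], [10, 13, 3, 10], [7, 10, 13, 7]]

def retiming_constraints(phi):
    """Return two lists of triples (u, v, bound), each meaning r(u) - r(v) <= bound."""
    # Legality: one constraint per edge.
    legal = [(u, v, regs) for (u, v), regs in w.items()]
    # Timing: one constraint per pair whose worst minimum-register path exceeds phi.
    timing = [(V[i], V[j], W_rows[i][j] - 1)
              for i in range(len(V)) for j in range(len(V))
              if D_rows[i][j] > phi]
    return legal, timing

legal, timing = retiming_constraints(7)
for u, v, bound in legal + timing:
    print(f"r({u}) - r({v}) <= {bound}")
```

For a period of 7 this yields the 5 legality constraints and 8 timing constraints of the correlator example.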
Solving the constraints
• Do shortest path on constraint graph: $O(|V||E|)$ (Bellman-Ford algorithm)
• A solution exists if and only if the constraint graph has no negative-weight cycle.
Constraints for the correlator with $\phi = 7$:
  – Legal: $r(u) - r(v) \le w(e)$
  – $D > 7$: $r(u) - r(v) \le W(u, v) - 1$
A solution is r(v0) = r(v3) = 0, r(v1) = r(v2) = -1
[Figure: constraint graph on nodes r(0), r(1), r(2), r(3); each constraint contributes one edge weighted by its right-hand side]
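A system of difference constraints can be solved with Bellman-Ford relaxation from a virtual source connected to every variable with weight 0. The sketch below hard-codes the 13 correlator constraints for a period of 7 as derived from the W and D matrices; `solve_difference_constraints` is an illustrative name, not an existing library function.

```python
def solve_difference_constraints(variables, constraints):
    """constraints: triples (u, v, c) meaning x[u] - x[v] <= c.
    Returns a satisfying assignment, or None if a negative cycle exists."""
    x = {v: 0 for v in variables}        # distance 0 from the virtual source
    for _ in range(len(variables)):
        changed = False
        for u, v, c in constraints:
            if x[v] + c < x[u]:          # relax the edge v -> u of weight c
                x[u] = x[v] + c
                changed = True
        if not changed:
            return x
    return None                          # still relaxing: negative-weight cycle

V = ["v0", "v1", "v2", "v3"]
constraints = [
    # legality: r(u) - r(v) <= w(e)
    ("v0", "v1", 2), ("v1", "v2", 0), ("v1", "v3", 0),
    ("v2", "v3", 0), ("v3", "v0", 0),
    # timing, D(u, v) > 7: r(u) - r(v) <= W(u, v) - 1
    ("v0", "v3", 1), ("v1", "v0", -1), ("v1", "v3", -1), ("v2", "v0", -1),
    ("v2", "v1", 1), ("v2", "v3", -1), ("v3", "v1", 1), ("v3", "v2", 1),
]
r = solve_difference_constraints(V, constraints)
print(r)

# A contradictory pair (a - b <= -1 and b - a <= 0) forms a negative cycle.
print(solve_difference_constraints(["a", "b"], [("a", "b", -1), ("b", "a", 0)]))
```

On the correlator the relaxation settles after one pass and reproduces the solution stated on the slide.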
Retiming
To find the minimum cycle time, do a binary search among the entries of the D matrix ($O(|V||E| \log |V|)$)
Retimed correlator:
[Figure: correlator circuit (Host, a, b, +) before retiming, clock cycle = 3 + 3 + 7 = 13, and after retiming, clock cycle = 7]
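The binary search can be sketched by combining constraint generation with a Bellman-Ford feasibility test. The matrices are copied from the slides; the edge list is assumed from the legality constraints, and the helper names `feasible` and `min_cycle_time` are illustrative.

```python
V = ["v0", "v1", "v2", "v3"]
w = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
     ("v2", "v3"): 0, ("v3", "v0"): 0}
W_rows = [[0, 2, 2, 2], [0, 0, 0, 0], [0, 2, 0, 0], [0, 2, 2, 0]]
D_rows = [[0, 3, 6, 13], [13, 3, 6, 13], [10, 13, 3, 10], [7, 10, 13, 7]]
W = {(V[i], V[j]): W_rows[i][j] for i in range(4) for j in range(4)}
D = {(V[i], V[j]): D_rows[i][j] for i in range(4) for j in range(4)}

def feasible(phi):
    """Return a legal retiming achieving clock cycle <= phi, or None."""
    cons = [(u, v, regs) for (u, v), regs in w.items()]
    cons += [(u, v, W[u, v] - 1) for u in V for v in V if D[u, v] > phi]
    r = {v: 0 for v in V}
    for _ in range(len(V)):              # Bellman-Ford relaxation passes
        changed = False
        for u, v, bound in cons:
            if r[v] + bound < r[u]:
                r[u] = r[v] + bound
                changed = True
        if not changed:
            return r
    return None                          # negative cycle: phi is not achievable

def min_cycle_time():
    candidates = sorted({D[u, v] for u in V for v in V})
    lo, hi, best = 0, len(candidates) - 1, None
    while lo <= hi:                      # feasibility is monotone in phi
        mid = (lo + hi) // 2
        r = feasible(candidates[mid])
        if r is not None:
            best, hi = (candidates[mid], r), mid - 1
        else:
            lo = mid + 1
    return best

phi, r = min_cycle_time()
print(phi, r)
```

A period of 6 is rejected because D(v3, v3) = 7 produces the unsatisfiable constraint r(v3) - r(v3) ≤ -1: no retiming can make the clock faster than the slowest single gate.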
Retiming: 2 more algorithms
1. Relaxation based:
  – Repeatedly find critical path;
  – retime vertex at end of path by +1 ($O(|V||E| \log |V|)$)
2. Also, Mixed Integer Linear Program formulation
[Figure: critical path between vertices u and v, with a +1 label marking the retimed vertex]
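Retiming for Minimum Area
Goal: minimize number of registers used
\begin{align*}
\min N_r &= \sum_{e \in E} w_r(e) \\
&= \sum_{e:\, u \to v} \left( w(e) + r(v) - r(u) \right) \\
&= \sum_{e \in E} w(e) + \sum_{e:\, u \to v} \left( r(v) - r(u) \right) \\
&= N + \sum_{e:\, u \to v} \left( r(v) - r(u) \right) \\
&= N + \sum_{v \in V} r(v) \left( \#\mathrm{fanin}(v) - \#\mathrm{fanout}(v) \right) \\
&= N + \sum_{v \in V} a_v \, r(v)
\end{align*}
where $a_v$ is a constant.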
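The identity can be checked numerically on the correlator. The sketch below assumes the reconstructed edge list and the retiming from the minimum-cycle example; it counts registers per edge, as the formula does, so registers shared across fanout branches are not modeled.

```python
V = ["v0", "v1", "v2", "v3"]
w = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
     ("v2", "v3"): 0, ("v3", "v0"): 0}
r = {"v0": 0, "v1": -1, "v2": -1, "v3": 0}

N = sum(w.values())                      # registers before retiming

# a_v = #fanin(v) - #fanout(v), a constant of the graph structure.
a = {v: 0 for v in V}
for (u, v) in w:
    a[v] += 1                            # the edge enters v
    a[u] -= 1                            # the edge leaves u

# Register count after retiming, computed edge by edge ...
N_r_direct = sum(regs + r[v] - r[u] for (u, v), regs in w.items())
# ... and through the closed form N + sum_v a_v * r(v).
N_r_formula = N + sum(a[v] * r[v] for v in V)

print(a, N, N_r_direct, N_r_formula)
```

The minimum-cycle retiming raises the register count from 2 to 3, which is why area and cycle time are usually optimized together.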
Minimum Registers - Formulation
Minimize:
\[ \sum_{v \in V} a_v \, r(v) \]
Subject to: $w_r(e) = w(e) + r(v) - r(u) \ge 0$
• Reducible to a flow problem
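For a graph as small as the correlator, the formulation can be explored by exhaustive search instead of a flow algorithm. The sketch below is a brute-force stand-in, not the flow-based method the slide refers to. It starts from the retimed correlator (3 registers, edge weights assumed from the earlier example), pins r(v0) = 0, and searches labels in a small window chosen for illustration.

```python
from itertools import product

V = ["v0", "v1", "v2", "v3"]
# Retimed correlator from the minimum-cycle example: 3 registers in total.
w = {("v0", "v1"): 1, ("v1", "v2"): 0, ("v1", "v3"): 1,
     ("v2", "v3"): 1, ("v3", "v0"): 0}

def retimed(r):
    return {(u, v): regs + r[v] - r[u] for (u, v), regs in w.items()}

best = None
for labels in product(range(-3, 4), repeat=len(V) - 1):
    r = dict(zip(V[1:], labels))
    r["v0"] = 0                                   # pin the host vertex
    w_r = retimed(r)
    if all(regs >= 0 for regs in w_r.values()):   # legality constraint
        count = sum(w_r.values())                 # objective: total registers
        if best is None or count < best[0]:
            best = (count, r, w_r)

print(best[0], best[1])
```

With no cycle-time constraint the search gets back down to 2 registers, undoing the extra register that the minimum-cycle retiming introduced.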
Problems with Retiming
• Computation of equivalent initial states
  – do not necessarily exist
  – General solution requires replication of logic for initialization
• Timing models
  – too far away from actual implementation
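[Figure: initial-state example; original register values 1 and 0, register values after retiming marked "?"]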