Courtesy RK Brayton (UCB) and A Kuehlmann (Cadence) 1 Logic Synthesis Sequential Synthesis.


Page 1:

Logic Synthesis

Sequential Synthesis

Page 2:

Introduction

• Design optimization from system level to layout
  – far too complex to approach in one big step
  – divide-and-conquer approach with a fine-tuned balance between
    • capability to apply clean mathematical modeling and abstraction
    • algorithmic complexity to compute solutions
    • loss of optimality based on hard partitioning
    • design and verification methodology that requires user guidance
  – sweet spots change over time due to:
    • semiconductor technology improvements
    • changes of design architectures/requirements
    • new algorithmic solutions, etc.

Page 3:

Introduction

• Example: traditional ASIC methodology:
  – RTL verification based on simulation
  – logic synthesis from RTL to gate level using combinational paradigm
  – static timing analysis
  – formal equivalence checking based on combinational paradigm
  – ATPG and scan-based testing based on combinational paradigm
  – standard cell place & route methodology with zero clock-skew distribution

Page 4:

Introduction

• However:
  – clean boundaries between modeling levels get blurred
    • larger chips and shrinking device sizes require more detailed modeling
    • aggressive performance and power requirements
    • new modeling and algorithmic approaches
  – Example:
    • RTL sign-off methodology
    • combined approach to logic synthesis and physical design

Page 5:

Overview of Circuit Optimizations

Combinational Optimization

Clock Skew Scheduling

Retiming

Architectural Restructuring

System-Level Optimization

[Figure: the five techniques above arranged along four axes labeled "Optimization Space", "Distance from Physical Implementation", "Verification Challenge", and "Necessity of Integrated Solution"]

Page 6:

Sequential Optimization Techniques

• State assignment
  – Lots of theory; practical only for small FSMs, and even then only when targeting 2-level control logic
• Sequential don't cares
  – Compute unreachable states, use them as external don't cares for the next-state logic
• State minimization
  – Easy for completely specified FSMs (n · log n algorithm)
  – Incompletely specified FSMs
• Retiming
  – balancing of path delays by moving registers within circuit topology
  – interleaving with combinational optimization techniques
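The "sequential don't cares" bullet can be illustrated with a small sketch. It enumerates the reachable states of an explicit FSM by breadth-first search and reports the remaining state codes as don't cares. The FSM (a modulo-3 counter with an enable input) and all names are hypothetical, and explicit enumeration like this is only workable for small state spaces.

```python
from collections import deque

def reachable_states(initial, inputs, next_state):
    """Breadth-first search over the state transition graph."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        for x in inputs:
            t = next_state(s, x)
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

# Hypothetical FSM: modulo-3 counter with an enable input, stored in 2 bits.
def mod3_counter(state, enable):
    return (state + 1) % 3 if enable else state

all_states = set(range(4))                 # 2 state bits encode 4 codes
reached = reachable_states(0, [0, 1], mod3_counter)
dont_cares = all_states - reached          # codes never entered from reset
print(sorted(reached), sorted(dont_cares))
```

State code 3 is never reached from the reset state 0, so the next-state logic may produce any value for it.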

Page 7:

Integration in Design Flow

• Optimization Space
  – significantly more optimization freedom for improving performance, power, and area
• Distance from Physical Implementation
  – difficult to accurately model impact on final implementation
  – difficult to mathematically characterize optimization space
• Verification Challenge
  – departure from combinational comparison model would break formal equivalence checking
  – different simulation behavior causes acceptance problems

Page 8:

Retiming

[Figure: circuit with gate delays 4, 2, 5, 3 and registers r1, r2, r3, r4. Path annotations: Dmax=6, Dmin=3; Dmax=8, Dmin=2; Dmax=0, Dmin=0. With Skew = 0, Tcycle = 8. Second version with registers r'1 and r4: Skew = -1, Tcycle = 7.]

Page 9:

Retiming

++
• Only setup time constraint (0 clock skew)
• Simple integration with other logical (e.g. combinational) or physical optimizations
• Easy combination with clock skew scheduling to obtain global optimum

--
• Changes combinational model of design
  – severe impact on verification methodology
• Inaccurate delay model if applied globally
• Computation of equivalent reset state required

Page 10:

Retiming - Architectural Restructuring

[Figure: before restructuring, registers r1, r2, r3, r4 and gates of delay 2, with a brace labeled 20; after restructuring, registers r'1, r2, r3, r4, r'4 and two braces labeled 10.]

Page 11:

Retiming - Architectural Restructuring

++
• Smooth extension of regular retiming
• Potential to alleviate global performance bottlenecks by adding sequential redundancy and pipelining

--
• Significant change of design structure
  – substantial impact on verification methodology
• Flexible architectural restructuring changes I/O behavior
  – existing RTL specification methods not always applicable

Page 12:

Example

Design example:
- 360 I/O
- 2240 flip-flops
- 41665 timing edges

Target cycle time (norm): 1.5

Worst slack: -0.079 (5%)

Distribution:

20%    30 edges
40%    63 edges
60%   130 edges
80%   249 edges
100%  425 edges

Page 13:

Verification

• Timing verification unchanged
• Sequential optimizations change the next-state and output functions
  – traditional combinational equivalence checking not applicable
  – simulation runs not recognizable by designer: acceptance problems
• Generic solution:
  – preserve retime function (mapping function) from synthesis for:
    • reducing sequential EC problem back to combinational case
      – no false positives possible!
    • modifying simulation model to reproduce original simulation output

Page 14:

Optimizing Circuits by Retiming

Netlist of gates and registers:

[Figure: netlist of gates and registers between "Inputs" and "Outputs"]

Various Goals:
  – Reduce clock cycle time
  – Reduce area
    • Reduce number of latches

Page 15:

Retiming

Problem
  – Pure combinational optimization can be suboptimal since relations across register boundaries are disregarded

Solutions
  – Retiming: Move register(s) so that
    • clock cycle decreases, or number of registers decreases, and
    • input-output behavior is preserved
  – RnR: Combine retiming with combinational optimization techniques
    • Move latches out of the way temporarily
    • optimize larger blocks of combinational logic

Page 16:

Circuit Representation

[Leiserson, Rose and Saxe (1983)]

Circuit represented as retiming graph $G(V, E, d, w)$
  – $V$: set of gates
  – $E$: set of connections
  – $d(v)$ = delay of gate/vertex $v$, ($d(v) \ge 0$)
  – $w(e)$ = number of registers on edge $e$, ($w(e) \ge 0$)

Page 17:

Circuit Representation

Example: Correlator (from Leiserson and Saxe) (simplified)

$\delta(x, y) = 1$ if $x = y$, $0$ otherwise

Operation   delay
δ           3
+           7

Every cycle in the graph has at least one register, i.e. no combinational loops.

[Figure: correlator circuit (Host, two δ comparators, adder +, signals a and b) and its directed retiming graph with vertex delays 0, 3, 3, 7 and edge weights 0, 0, 0, 0, 2]
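A minimal sketch of the retiming graph as a data structure, using the simplified correlator. The vertex names v0..v3 (v0 is the host) and the edge list are my reading of the W matrix and legality constraints that appear later in the slides, so treat them as assumptions. The helper checks the "no combinational loops" condition by looking for a directed cycle made only of zero-register edges.

```python
# Simplified correlator as a retiming graph G(V, E, d, w).
delay = {"v0": 0, "v1": 3, "v2": 3, "v3": 7}     # d(v)
edges = {                                        # w(e) for e = (u, v)
    ("v0", "v1"): 2,
    ("v1", "v2"): 0,
    ("v1", "v3"): 0,
    ("v2", "v3"): 0,
    ("v3", "v0"): 0,
}

def has_combinational_loop(delay, edges):
    """True if the zero-register edges contain a directed cycle."""
    succ = {v: [] for v in delay}
    for (u, v), w in edges.items():
        if w == 0:
            succ[u].append(v)
    state = {v: 0 for v in delay}    # 0 = unvisited, 1 = on stack, 2 = done

    def visit(u):
        state[u] = 1
        for v in succ[u]:
            if state[v] == 1 or (state[v] == 0 and visit(v)):
                return True
        state[u] = 2
        return False

    return any(state[v] == 0 and visit(v) for v in delay)

print(has_combinational_loop(delay, edges))   # False
```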

Page 18:

Preliminaries

For a path $p: v_0 \xrightarrow{e_0} v_1 \xrightarrow{e_1} \cdots \xrightarrow{e_{k-1}} v_k$:

$$d(p) = \sum_{i=0}^{k} d(v_i) \quad \text{(includes endpoints)}$$

$$w(p) = \sum_{i=0}^{k-1} w(e_i)$$

Clock cycle

$$c = \max_{p:\ w(p) = 0} \{ d(p) \}$$

For correlator c = 13

[Figure: correlator retiming graph (vertex delays 0, 3, 3, 7; edge weights 0, 0, 0, 0, 2) with the path with w(p) = 0 highlighted]
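The clock cycle definition above can be checked on the correlator with a short sketch. It computes the longest delay over paths that use only zero-register edges, endpoints included. The graph encoding (names v0..v3, edge list) is an assumption reconstructed from the matrices later in the slides.

```python
from functools import lru_cache

delay = {"v0": 0, "v1": 3, "v2": 3, "v3": 7}
edges = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
         ("v2", "v3"): 0, ("v3", "v0"): 0}

def clock_cycle(delay, edges):
    """Largest d(p) over all paths p with w(p) = 0 (endpoints included)."""
    succ = {v: [] for v in delay}
    for (u, v), w in edges.items():
        if w == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(u):
        # The zero-register subgraph is acyclic (no combinational loops),
        # so this recursion terminates.
        return delay[u] + max((longest_from(v) for v in succ[u]), default=0)

    return max(longest_from(v) for v in delay)

print(clock_cycle(delay, edges))   # 13
```

The maximum is reached on the path v1, v2, v3, v0 with delay 3 + 3 + 7 + 0 = 13.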

Page 19:

Basic Operation

• Movement of registers from input to output of a gate or vice versa
• Does not affect gate functionalities
• Mathematical formulation:
  – $r: V \to \mathbb{Z}$, an integer vertex labeling
  – $w_r(e) = w(e) + r(v) - r(u)$ for edge $e = (u, v)$

[Figure: a gate with its registers moved across it, labeled "Retime by 1" and "Retime by -1"]

Page 20:

Basic Operation

• Retiming:
  – $r: V \to \mathbb{Z}$, an integer vertex labeling
  – $w_r(e) = w(e) + r(v) - r(u)$ for edge $e = (u, v)$
  – A retiming $r$ is legal if $w_r(e) \ge 0,\ \forall e \in E$
• For a path $p: s \to t$, $w_r(p) = w(p) + r(t) - r(s)$

Thus in the example, r(u) = -1, r(v) = -1 results in

[Figure: correlator retiming graph with u and v marking the two delay-3 vertices. Before: vertex delays 0, 3, 3, 7 and edge weights 0, 0, 0, 0, 2. After: same delays, edge weights 0, 1, 1, 0, 1.]
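A small sketch of the basic operation, applied to the correlator. It assumes the two retimed vertices (u and v in the figure) are the delay-3 vertices, named v1 and v2 here; these names and the edge list are assumptions reconstructed from the later matrices.

```python
edges = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
         ("v2", "v3"): 0, ("v3", "v0"): 0}

def retime(edges, r):
    """Return w_r(e) = w(e) + r(v) - r(u) for every edge e = (u, v)."""
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}

def is_legal(retimed_edges):
    """A retiming is legal if no edge ends up with a negative register count."""
    return all(w >= 0 for w in retimed_edges.values())

r = {"v0": 0, "v1": -1, "v2": -1, "v3": 0}
retimed = retime(edges, r)
print(retimed)
print(is_legal(retimed))   # True
```

The r terms cancel around any cycle, so the register count on each cycle is unchanged: the cycle v0, v1, v3, v0 carries 2 registers before (2 + 0 + 0) and after (1 + 1 + 0).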

Page 21:

Retiming for Minimum Clock Cycle

Problem Statement: (minimum cycle time)

Given $G(V, E, d, w)$, find a legal retiming $r$ so that

$$c = \max_{p:\ w_r(p) = 0} \{ d(p) \}$$

is minimized

Retiming: 2 important matrices

• Register weight matrix

$$W(u, v) = \min_{p} \{ w(p) : u \xrightarrow{p} v \}$$

• Delay matrix

$$D(u, v) = \max_{p} \{ d(p) : u \xrightarrow{p} v,\ w(p) = W(u, v) \}$$

Page 22:

Retiming for minimum clock cycle

For a target clock cycle $\phi$: $c \le \phi \iff \forall p$, if $d(p) > \phi$ then $w(p) \ge 1$

W    v0  v1  v2  v3
v0    0   2   2   2
v1    0   0   0   0
v2    0   2   0   0
v3    0   2   2   0

D    v0  v1  v2  v3
v0    0   3   6  13
v1   13   3   6  13
v2   10  13   3  10
v3    7  10  13   7

[Figure: correlator retiming graph with vertices v0, v1, v2 (v3 unlabeled), vertex delays 0, 3, 3, 7 and edge weights 0, 0, 0, 0, 2]

W = register path weight matrix (minimum # latches on all paths between u and v)
D = path delay matrix (maximum delay on all paths between u and v)
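One way to compute W and D, sketched under assumptions: an all-pairs shortest-path pass (Floyd-Warshall) over lexicographically ordered pairs (registers, negated delay), which is the approach usually attributed to Leiserson and Saxe. Each edge (u, v) gets the pair (w(e), -d(u)); the delay of the final vertex is added back at the end. The graph encoding is reconstructed from the slide matrices.

```python
import itertools

delay = {"v0": 0, "v1": 3, "v2": 3, "v3": 7}
edges = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
         ("v2", "v3"): 0, ("v3", "v0"): 0}

def wd_matrices(delay, edges):
    V = list(delay)
    INF = (float("inf"), 0)
    dist = {(u, v): INF for u in V for v in V}
    for v in V:
        dist[v, v] = (0, 0)                      # empty path from v to v
    for (u, v), w in edges.items():
        dist[u, v] = min(dist[u, v], (w, -delay[u]))
    # Floyd-Warshall; itertools.product varies the first index (k) slowest.
    for k, i, j in itertools.product(V, V, V):
        a, b = dist[i, k], dist[k, j]
        cand = (a[0] + b[0], a[1] + b[1])
        if cand < dist[i, j]:                    # tuple comparison is lexicographic
            dist[i, j] = cand
    W = {(u, v): x for (u, v), (x, y) in dist.items()}
    D = {(u, v): delay[v] - y for (u, v), (x, y) in dist.items()}
    return V, W, D

V, W, D = wd_matrices(delay, edges)
for u in V:
    print(u, [W[u, v] for v in V], [D[u, v] for v in V])
```

Minimizing the pair first minimizes the register count and, among minimum-register paths, maximizes the accumulated delay, which is exactly the definition of D.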

Page 23:

Conditions for Retiming

Assume that we are asked to check if a retiming exists for a clock cycle $\phi$.

Legal retiming: $w_r(e) \ge 0$ for all $e$. Hence

$w_r(e) = w(e) + r(v) - r(u) \ge 0$, or $r(u) - r(v) \le w(e)$

For all paths $p: u \to v$ such that $d(p) > \phi$, we require $w_r(p) \ge 1$

– Thus, writing $p: v_0 \to v_1 \to \cdots \to v_k$ with $v_0 = u$ and $v_k = v$:

$$
\begin{aligned}
1 \le w_r(p) &= \sum_{i=0}^{k-1} w_r(e_i) \\
&= \sum_{i=0}^{k-1} \left[ w(e_i) + r(v_{i+1}) - r(v_i) \right] \\
&= w(p) + r(v_k) - r(v_0) \\
&= w(p) + r(v) - r(u)
\end{aligned}
$$

Take the least $w(p)$ (tightest constraint): $r(u) - r(v) \le W(u, v) - 1$

Note: this is independent of the path from u to v, so we just need to apply it to u, v such that $D(u, v) > \phi$

Page 24:

Solving the constraints

• All constraints in difference-of-2-variable form
• Related to shortest path problem

Correlator: $\phi = 7$

Legal: $r(u) - r(v) \le w(e)$

$$
\begin{aligned}
r(v_0) - r(v_1) &\le 2 \\
r(v_1) - r(v_2) &\le 0 \\
r(v_1) - r(v_3) &\le 0 \\
r(v_2) - r(v_3) &\le 0 \\
r(v_3) - r(v_0) &\le 0
\end{aligned}
$$

$D > 7$: $r(u) - r(v) \le W(u, v) - 1$

$$
\begin{aligned}
r(v_0) - r(v_3) &\le 1 \\
r(v_1) - r(v_0) &\le -1 \\
r(v_1) - r(v_3) &\le -1 \\
r(v_2) - r(v_0) &\le -1 \\
r(v_2) - r(v_1) &\le 1 \\
r(v_2) - r(v_3) &\le -1 \\
r(v_3) - r(v_1) &\le 1 \\
r(v_3) - r(v_2) &\le 1
\end{aligned}
$$

[Figure: correlator retiming graph with vertex delays 0, 3, 3, 7 and edge weights 0, 0, 0, 0, 2]

W    v0  v1  v2  v3
v0    0   2   2   2
v1    0   0   0   0
v2    0   2   0   0
v3    0   2   2   0

D    v0  v1  v2  v3
v0    0   3   6  13
v1   13   3   6  13
v2   10  13   3  10
v3    7  10  13   7
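The two constraint families can be generated mechanically from the edge list and the W and D matrices. The sketch below does this for the correlator with a target cycle of 7; the index-based encoding (0..3 for v0..v3) and the function name are illustrative assumptions.

```python
V = ["v0", "v1", "v2", "v3"]
W = [[0, 2, 2, 2], [0, 0, 0, 0], [0, 2, 0, 0], [0, 2, 2, 0]]
D = [[0, 3, 6, 13], [13, 3, 6, 13], [10, 13, 3, 10], [7, 10, 13, 7]]
edges = {(0, 1): 2, (1, 2): 0, (1, 3): 0, (2, 3): 0, (3, 0): 0}

def difference_constraints(phi):
    """Each triple (u, v, b) stands for r(u) - r(v) <= b."""
    legal = [(u, v, w) for (u, v), w in edges.items()]
    # u == v is kept on purpose: a single vertex slower than phi yields
    # the unsatisfiable constraint 0 <= -1.
    timing = [(u, v, W[u][v] - 1)
              for u in range(len(V)) for v in range(len(V))
              if D[u][v] > phi]
    return legal, timing

legal, timing = difference_constraints(7)
for u, v, b in legal + timing:
    print(f"r({V[u]}) - r({V[v]}) <= {b}")
```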

Page 25:

Solving the constraints

• Do shortest path on constraint graph: (O(|V||E|)) (Bellman-Ford algorithm)
• A solution exists if and only if there exists no negative weighted cycle.

Legal: $r(u) - r(v) \le w(e)$

$$
\begin{aligned}
r(v_0) - r(v_1) &\le 2 \\
r(v_1) - r(v_2) &\le 0 \\
r(v_1) - r(v_3) &\le 0 \\
r(v_2) - r(v_3) &\le 0 \\
r(v_3) - r(v_0) &\le 0
\end{aligned}
$$

$D > 7$: $r(u) - r(v) \le W(u, v) - 1$

$$
\begin{aligned}
r(v_0) - r(v_3) &\le 1 \\
r(v_1) - r(v_0) &\le -1 \\
r(v_1) - r(v_3) &\le -1 \\
r(v_2) - r(v_0) &\le -1 \\
r(v_2) - r(v_1) &\le 1 \\
r(v_2) - r(v_3) &\le -1 \\
r(v_3) - r(v_1) &\le 1 \\
r(v_3) - r(v_2) &\le 1
\end{aligned}
$$

A solution is r(v0) = r(v3) = 0, r(v1) = r(v2) = -1

[Figure: constraint graph on nodes r(0), r(1), r(2), r(3), with one edge per constraint weighted by its right-hand side (2, 1, 0 and -1)]
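A minimal Bellman-Ford sketch for the difference constraints above. Each constraint r(u) - r(v) <= b becomes an edge from v to u with weight b; starting every label at 0 plays the role of a virtual source connected to all nodes with weight 0. The function name and the index encoding (0..3 for v0..v3) are assumptions for illustration.

```python
def solve_difference_constraints(n, cons):
    """cons holds triples (u, v, b) meaning r[u] - r[v] <= b.
    Returns a satisfying list r, or None if a negative cycle exists."""
    r = [0] * n                       # virtual source at distance 0 to all
    for _ in range(n):
        changed = False
        for u, v, b in cons:
            if r[v] + b < r[u]:       # relax the edge v -> u
                r[u] = r[v] + b
                changed = True
        if not changed:
            return r
    return None                       # still relaxing after n passes

# Correlator with target clock cycle 7.
legal = [(0, 1, 2), (1, 2, 0), (1, 3, 0), (2, 3, 0), (3, 0, 0)]
timing = [(0, 3, 1), (1, 0, -1), (1, 3, -1), (2, 0, -1),
          (2, 1, 1), (2, 3, -1), (3, 1, 1), (3, 2, 1)]
r = solve_difference_constraints(4, legal + timing)
print(r)   # [0, -1, -1, 0]
```

The result matches the solution on the slide: r(v0) = r(v3) = 0 and r(v1) = r(v2) = -1.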

Page 26:

Retiming

To find the minimum cycle time, do a binary search among the entries of the D matrix (O(|V||E| log |V|))

[Figure: correlator circuit (Host, a, b, +) and its retiming graph before and after the "Retime" step. Before: clock cycle = 3 + 3 + 7 = 13. Retimed correlator: clock cycle = 7.]

W    v0  v1  v2  v3
v0    0   2   2   2
v1    0   0   0   0
v2    0   2   0   0
v3    0   2   2   0

D    v0  v1  v2  v3
v0    0   3   6  13
v1   13   3   6  13
v2   10  13   3  10
v3    7  10  13   7
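The binary search can be sketched as follows, assuming the W and D matrices are already available and using a plain Bellman-Ford feasibility check per candidate. Feasibility is monotone in the target cycle (a larger target only removes timing constraints), which is what makes the search valid. Function names and the index encoding are illustrative.

```python
W = [[0, 2, 2, 2], [0, 0, 0, 0], [0, 2, 0, 0], [0, 2, 2, 0]]
D = [[0, 3, 6, 13], [13, 3, 6, 13], [10, 13, 3, 10], [7, 10, 13, 7]]
EDGES = {(0, 1): 2, (1, 2): 0, (1, 3): 0, (2, 3): 0, (3, 0): 0}

def feasible(phi):
    """Return a legal retiming achieving clock cycle <= phi, or None."""
    n = len(W)
    cons = [(u, v, w) for (u, v), w in EDGES.items()]
    cons += [(u, v, W[u][v] - 1)
             for u in range(n) for v in range(n) if D[u][v] > phi]
    r = [0] * n
    for _ in range(n):                 # Bellman-Ford passes
        changed = False
        for u, v, b in cons:
            if r[v] + b < r[u]:
                r[u] = r[v] + b
                changed = True
        if not changed:
            return r
    return None

def min_cycle_time():
    candidates = sorted({x for row in D for x in row})
    lo, hi, best = 0, len(candidates) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        r = feasible(candidates[mid])
        if r is not None:
            best, hi = (candidates[mid], r), mid - 1
        else:
            lo = mid + 1
    return best

print(min_cycle_time())   # (7, [0, -1, -1, 0])
```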

Page 27:

Retiming: 2 more algorithms

1. Relaxation based:
   – Repeatedly find critical path;
   – retime vertex at end of path by +1 (O(|V||E| log |V|))
2. Also, Mixed Integer Linear Program formulation

[Figure: critical path from u to v; the vertex v at the end of the path is retimed by +1]
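One common form of this relaxation is the feasibility routine usually attributed to Leiserson and Saxe; whether the slide refers to exactly this variant is an assumption. For a target cycle, it starts from r = 0 and, for |V| - 1 rounds, increments r(v) for every vertex whose arrival time exceeds the target. The sketch below runs it on the correlator; names and the graph encoding are illustrative.

```python
delay = {"v0": 0, "v1": 3, "v2": 3, "v3": 7}
edges = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
         ("v2", "v3"): 0, ("v3", "v0"): 0}

def arrival_times(delay, w):
    """Longest zero-register path delay ending at each vertex."""
    pred = {v: [] for v in delay}
    for (u, v), k in w.items():
        if k == 0:
            pred[v].append(u)
    memo = {}
    def delta(v):
        if v not in memo:
            memo[v] = delay[v] + max((delta(u) for u in pred[v]), default=0)
        return memo[v]
    return {v: delta(v) for v in delay}

def apply(edges, r):
    return {(u, v): k + r[v] - r[u] for (u, v), k in edges.items()}

def feas(delay, edges, phi):
    """Return a retiming with clock cycle <= phi, or None if none is found."""
    r = {v: 0 for v in delay}
    for _ in range(len(delay) - 1):
        for v, t in arrival_times(delay, apply(edges, r)).items():
            if t > phi:
                r[v] += 1              # pull a register onto the inputs of v
    ok = max(arrival_times(delay, apply(edges, r)).values()) <= phi
    return r if ok else None

print(feas(delay, edges, 7))   # {'v0': 1, 'v1': 0, 'v2': 0, 'v3': 1}
print(feas(delay, edges, 6))   # None
```

The retiming found for a target of 7 differs from r(v0) = r(v3) = 0, r(v1) = r(v2) = -1 only by a constant offset of 1, so both give the same retimed edge weights.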

Page 28:

Retiming for Minimum Area

Goal: minimize number of registers used

$$
\begin{aligned}
\min N_r &= \sum_{e \in E} w_r(e) \\
&= \sum_{e:\ (u \to v)} \left( w(e) + r(v) - r(u) \right) \\
&= \sum_{e \in E} w(e) + \sum_{e:\ (u \to v)} \left( r(v) - r(u) \right) \\
&= N + \sum_{e:\ (u \to v)} \left( r(v) - r(u) \right) \\
&= N + \sum_{v \in V} r(v) \left( \#\mathrm{fanin}(v) - \#\mathrm{fanout}(v) \right) \\
&= N + \sum_{v \in V} a_v\, r(v)
\end{aligned}
$$

where $a_v$ is a constant.
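The identity above can be checked numerically on the correlator. The sketch computes the coefficients a_v as fan-in minus fan-out and compares the closed form with a direct count of retimed registers. The graph encoding and the chosen retiming are assumptions taken from the minimum-cycle example.

```python
delay = {"v0": 0, "v1": 3, "v2": 3, "v3": 7}
edges = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
         ("v2", "v3"): 0, ("v3", "v0"): 0}

def coefficients(vertices, edges):
    """a_v = #fanin(v) - #fanout(v)."""
    a = {v: 0 for v in vertices}
    for (u, v) in edges:
        a[v] += 1      # edge enters v
        a[u] -= 1      # edge leaves u
    return a

def register_count(edges, r):
    """Direct count: sum of w_r(e) over all edges."""
    return sum(w + r[v] - r[u] for (u, v), w in edges.items())

a = coefficients(delay, edges)
r = {"v0": 0, "v1": -1, "v2": -1, "v3": 0}
N = sum(edges.values())
closed_form = N + sum(a[v] * r[v] for v in delay)
print(a, N, closed_form, register_count(edges, r))
```

For this retiming both counts give 3 registers, up from N = 2, which shows that the minimum-cycle retiming of the correlator costs one extra register under this per-edge count.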

Page 29:

Minimum Registers - Formulation

Minimize:

$$\sum_{v \in V} a_v\, r(v)$$

Subject to: $w_r(e) = w(e) + r(v) - r(u) \ge 0$

• Reducible to a flow problem

Page 30:

Problems with Retiming

• Computation of equivalent initial states
  – do not necessarily exist
  – General solution requires replication of logic for initialization
• Timing models
  – too far away from actual implementation

[Figure: register initial values 1 and 0 before retiming; the corresponding initial values after retiming are marked "?"]