Page 1: Application of Specific Delay Window Routing for Timing ...

Application of Specific Delay Window Routing for Timing Optimization in FPGA Designs

Evan Wegley, Qinhai Zhang

Lattice Semiconductor Corporation

23rd ACM / SIGDA International Symposium on FPGAs

Page 2: Application of Specific Delay Window Routing for Timing ...

Evan Wegley, Qinhai Zhang Page 2 FPGA 2015

FPGA Routing

• FPGA routers must optimize for competing goals

• Routability

• Timing performance

• Timing constraints come in many forms

• Setup timing

• Hold timing

• Other timing constraints

• Maximum skew, for example

Page 3: Application of Specific Delay Window Routing for Timing ...

Setup Timing

• Constrains data paths between registers to arrive before the next clock cycle starts

• Produces an upper bound on delay of the data path

0.1 ns ≤ Delay ≤ 3.0 ns

Page 4: Application of Specific Delay Window Routing for Timing ...

Hold Timing

• Constrains data paths between registers to arrive after the previous cycle has been captured

• Produces a lower bound on delay of the data path

• At routing phase, we can correct hold timing violations by rerouting the data path to have additional delay

0.1 ns ≤ Delay ≤ 3.0 ns

Page 7: Application of Specific Delay Window Routing for Timing ...

More Complex Hold Timing

• Example: constrain the data paths to 2.0 ns for maximum setup time and 0.4 ns for minimum hold time with respect to clock

$\text{Slack}_{A,\,\text{Hold}} = (1.0 + 0.3 - 1.2) - 0.4 = -0.3$ ns

Hold violation!

$\text{Slack}_{B,\,\text{Setup}} = 2.0 - (1.0 + 2.1 - 1.2) = 0.1$ ns

Close to setup threshold

Note: ignoring LUT logic delays
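
The slack arithmetic in this example can be reproduced with a short Python sketch. The slide does not label the three delay terms, so reading them as launch clock delay, data path delay, and capture clock delay is an assumption, and the function names are hypothetical. LUT logic delays are ignored, as on the slide.

```python
def arrival_offset(launch_clk, data, capture_clk):
    """Data arrival relative to the capturing clock edge (all values in ns)."""
    return launch_clk + data - capture_clk


def hold_slack(launch_clk, data, capture_clk, hold_req):
    # Negative when data arrives earlier than the minimum hold requirement.
    return arrival_offset(launch_clk, data, capture_clk) - hold_req


def setup_slack(launch_clk, data, capture_clk, setup_req):
    # Negative when data arrives later than the maximum setup requirement.
    return setup_req - arrival_offset(launch_clk, data, capture_clk)


# Numbers taken from the slide; the meaning assigned to each term is assumed.
slack_a_hold = hold_slack(1.0, 0.3, 1.2, hold_req=0.4)
slack_b_setup = setup_slack(1.0, 2.1, 1.2, setup_req=2.0)
print(f"Slack A (hold):  {slack_a_hold:.1f} ns")
print(f"Slack B (setup): {slack_b_setup:.1f} ns")
```

Path A needs more delay to fix its hold violation, while path B has almost no setup margin left, which is why both bounds have to be tracked per connection.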

Page 8: Application of Specific Delay Window Routing for Timing ...

More Complex Hold Timing

• Need awareness of both setup and hold requirements for each connection

• We can do so by setting lower and upper bounds on delay for each connection: slack allocation

Page 10: Application of Specific Delay Window Routing for Timing ...

Slack Allocation

• Process of distributing slack to all connections of a circuit (Youssef 1990, Frankle 1992, Fung 2008)

• Allocate slack on setup timing to get upper bound

[1.00, 1.03] [0.56, 2.17]

[0.57, 2.17]

Note: ignoring LUT logic delays

Page 12: Application of Specific Delay Window Routing for Timing ...

Slack Allocation

• Process of distributing slack to all connections of a circuit (Youssef 1990, Frankle 1992, Fung 2008)

• Allocate slack on setup timing to get upper bound

• Allocate slack on hold timing to get lower bound

[1.00, 1.03] [0.60, 2.17]

[0.60, 2.17]

Reroute

Note: ignoring LUT logic delays

Page 14: Application of Specific Delay Window Routing for Timing ...

Maximum Skew

• Constrains the range of delays on the loads of some net(s) or buses

• Example: constrain the net to have a maximum skew of 0.5 ns

• Initial skew is 1.0 – 0.2 = 0.8 ns

• Violates constraint of 0.5 ns maximum skew by 0.3 ns

Page 17: Application of Specific Delay Window Routing for Timing ...

Maximum Skew

• Setting delay bounds on each connection

• Step 1: Use the largest delay value as the upper bound

• Step 2: Find the lower bound by subtracting the constraint value

• Step 3: Reroute any connections falling outside the bounds

[0.50, 1.00]
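
The three steps above can be sketched in Python as follows. The function names and the load names are hypothetical. The 0.2 ns and 1.0 ns delays and the 0.5 ns constraint come from the skew example, while the 0.7 ns load is an invented middle value added for illustration.

```python
def skew_window(load_delays, max_skew):
    """Return the (lower, upper) delay bounds shared by every load of the net."""
    upper = max(load_delays.values())  # Step 1: largest delay is the upper bound
    lower = upper - max_skew           # Step 2: subtract the constraint value
    return lower, upper


def loads_to_reroute(load_delays, max_skew):
    """Step 3: names of the loads whose delay falls outside the window."""
    lower, upper = skew_window(load_delays, max_skew)
    return sorted(name for name, d in load_delays.items()
                  if not lower <= d <= upper)


# Delays in ns; "load1" is a made-up value, the other two are from the slide.
delays = {"load0": 0.2, "load1": 0.7, "load2": 1.0}
window = skew_window(delays, max_skew=0.5)
violators = loads_to_reroute(delays, max_skew=0.5)
print(window, violators)
```

Only the fast connection falls below the window, so it is the one that would be rerouted with additional delay.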

Page 18: Application of Specific Delay Window Routing for Timing ...

Specific Delay Window Routing

• Definition: routing a connection with a target delay between some lower and upper bounds

• Lower and upper delay bounds form a “window”

• Allows us to optimize for various timing constraints constituting both lower and upper bounds on delay

Page 19: Application of Specific Delay Window Routing for Timing ...

Current FPGA Routing Technology

• PathFinder based (McMurchie 1995)

• Basis of VPR router (Betz 1997) and many other academic and commercial FPGA routers

• Effective at balancing routability and timing performance

• Delay cost is the total delay of the connection

• Traditional single-wave search: total delay contains estimation

$\mathrm{Cost}_n = A_{ij}\, d_n + (1 - A_{ij})\, c_n$

where $A_{ij}$ is the criticality factor, $d_n$ is the delay cost, and $c_n$ is the congestion cost
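
As a minimal sketch, the cost function can be written as a weighted blend in Python. The function and parameter names are hypothetical, and the sketch assumes the criticality factor lies in the range 0 to 1.

```python
def node_cost(criticality, delay_cost, congestion_cost):
    """Blend delay and congestion cost for one routing node.

    criticality is assumed to lie in [0, 1]: 1 means the connection is
    fully timing-critical, 0 means only congestion matters.
    """
    if not 0.0 <= criticality <= 1.0:
        raise ValueError("criticality must be between 0 and 1")
    return criticality * delay_cost + (1.0 - criticality) * congestion_cost


# Illustrative values only: a node with delay cost 2.0 and congestion cost 4.0.
for crit in (0.0, 0.5, 1.0):
    print(crit, node_cost(crit, delay_cost=2.0, congestion_cost=4.0))
```

A highly critical connection is steered almost entirely by delay, while a non-critical one gives way to congestion avoidance.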

Page 20: Application of Specific Delay Window Routing for Timing ...

Single-Wave Expansion

• One approach to performing Specific Delay Window Routing

• Single-wave expansion using delay estimation to direct search towards the target delay window

Total Delay = Known Delay + Estimated Delay

Page 23: Application of Specific Delay Window Routing for Timing ...

Accuracy Issues

• Estimation can be inaccurate

• Sparse crossbar

• Manhattan distance: 4

• Estimate: 2 “X2” wires

• Switch between “X2” wires is missing!

Page 24: Application of Specific Delay Window Routing for Timing ...

Accuracy Issues

• Estimation can be inaccurate

• Sparse crossbar

• Swappable pins

• LUT input pins have different delays

• Pins are logically equivalent

• Actual target pin not known during estimation

Page 26: Application of Specific Delay Window Routing for Timing ...

Accuracy Issues

• Estimation can be inaccurate

• Sparse crossbar

• Swappable pins

• Congestion adds variability

[Figure label: Congested Area]

Page 28: Application of Specific Delay Window Routing for Timing ...

Dual-Wave Expansion

• To address these accuracy issues, we propose to use dual-wave search

• Instead of directing one search towards the target, we can expand from both the source and target

• Each time the waves intersect, we check if the resulting path meets the target delay window

• This eliminates estimation from the selection process

Page 29: Application of Specific Delay Window Routing for Timing ...

Dual-Wave Expansion

Wave from both source and target

Page 30: Application of Specific Delay Window Routing for Timing ...

Dual-Wave Expansion

Intersection: Total delay does not meet target window

Total Delay = Known Delay + Known Delay

Page 31: Application of Specific Delay Window Routing for Timing ...

Dual-Wave Expansion

Intersection: Total delay does meet target window

Total Delay = Known Delay + Known Delay
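
The idea can be illustrated with a simplified Python sketch. It is not the authors' implementation: it assumes a small routing graph given as directed edges with known delays, expands each wave completely with Dijkstra's algorithm instead of interleaving the two waves, and keeps only the fastest route to each node. All names and delay values are hypothetical.

```python
import heapq


def expand_wave(adj, start):
    """Dijkstra wave: exact delay from start to each reachable node, plus parent links."""
    delay, parent = {start: 0.0}, {start: None}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > delay[node]:
            continue  # stale heap entry
        for nxt, w in adj.get(node, []):
            nd = d + w
            if nd < delay.get(nxt, float("inf")):
                delay[nxt], parent[nxt] = nd, node
                heapq.heappush(heap, (nd, nxt))
    return delay, parent


def trace(parent, node):
    """Follow parent links from node back to the wave's starting point."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def dual_wave_route(edges, source, target, lower, upper):
    """Return (path, delay) for a route whose delay lies in [lower, upper], or None."""
    fwd_adj, bwd_adj = {}, {}
    for u, v, w in edges:
        fwd_adj.setdefault(u, []).append((v, w))
        bwd_adj.setdefault(v, []).append((u, w))  # reversed edge for the target wave
    fwd_delay, fwd_parent = expand_wave(fwd_adj, source)
    bwd_delay, bwd_parent = expand_wave(bwd_adj, target)
    # Every node reached by both waves is an intersection (visited in name order).
    for node in sorted(set(fwd_delay) & set(bwd_delay)):
        total = fwd_delay[node] + bwd_delay[node]  # known delay + known delay
        if lower <= total <= upper:
            path = trace(fwd_parent, node)[::-1] + trace(bwd_parent, node)[1:]
            if len(set(path)) == len(path):  # reject routes that reuse a node
                return path, total
    return None


# Hypothetical graph: a fast route S-A-T (0.3 ns) and a slower detour S-B-C-T (0.7 ns).
edges = [("S", "A", 0.2), ("A", "T", 0.1),
         ("S", "B", 0.3), ("B", "C", 0.2), ("C", "T", 0.2)]
result = dual_wave_route(edges, "S", "T", lower=0.6, upper=1.0)
print(result)
```

In this example the intersection at node A gives a total delay below the window and is rejected, while the intersection at node B lands inside it. Because both halves of the path carry known delays, no estimate enters the decision.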

Page 32: Application of Specific Delay Window Routing for Timing ...

Routing Flow

Normal Routing

• Global Routing

• Clock routing, other architecture-specific routing

• Detail Routing

• PathFinder-based

Specific Delay Window Routing

• Skew Optimization

• Calculate delay windows for constrained connections

• Perform specific delay window routing on connections in violation of skew constraint

• Hold Timing Optimization

• Calculate delay windows using slack allocation

• Perform specific delay window routing on connections in violation of hold timing

Page 33: Application of Specific Delay Window Routing for Timing ...

Experimental Methodology

• Compared two generations of commercial tools

• Older one uses single-wave expansion with delay estimation

• Newer one uses dual-wave expansion without delay estimation

• Ten designs each tested for hold timing and skew correction

• Customer designs with known violations

• Designs with strict skew constraints

Design  Device Family  LUT4s  Utilization    Design  Device Family  LUT4s  Utilization
N0      MachXO2        4K     70%            H0      LatticeEC      33K    9%
N1      LatticeECP3    150K   11%            H1      LatticeEC      33K    9%
N2      LatticeECP2    20K    45%            H2      MachXO         2K     85%
N3      LatticeECP3    150K   66%            H3      MachXO         2K     85%
N4      ECP5           85K    67%            H4      LatticeECP3    150K   83%
B0      MachXO2        2K     67%            H5      LatticeECP3    150K   14%
B1      LatticeECP2    20K    74%            H6      LatticeECP3    150K   82%
B2      LatticeECP3    70K    41%            H7      LatticeECP3    150K   87%
B3      LatticeECP3    150K   28%            H8      LatticeECP3    150K   76%
B4      ECP5           45K    23%            H9      LatticeECP3    150K   84%

Page 34: Application of Specific Delay Window Routing for Timing ...

Experimental Results: Skew Correction

• Varied the skew constraint values from 4.0 ns (loose) to 0.0625 ns (very strict) and compared ability to find a solution

Number of designs passed (out of 10), by constraint value:

Constraint Value (ns)  4.0000  2.0000  1.0000  0.5000  0.2500  0.1250  0.0625
Single                 10      10      10      7       3       1       0
Dual                   10      10      10      10      10      8       4

Page 35: Application of Specific Delay Window Routing for Timing ...

Experimental Results: Skew Correction

• Runtime comparison between dual-wave and single-wave approaches

[Figure: runtime ratio (dual-wave runtime / single-wave runtime, logarithmic axis from 0.1 to 100) versus constraint value (ns, from 4.0000 down to 0.0625), plotted for individual designs and the mean]

Page 36: Application of Specific Delay Window Routing for Timing ...

Experimental Results: Hold Correction

• Compared ability to solve hold timing violations in 10 designs

• Single-wave approach corrected only 5 of 10

• Dual-wave approach corrected all 10 of 10

[Figure: worst negative slack (ns, axis from -3 to 0) for designs H0 to H9, with series No Correction, Single, and Dual]

Page 37: Application of Specific Delay Window Routing for Timing ...

Experimental Results: Hold Correction

• Runtime not significantly increased using dual-wave approach

• Older tool generation used heuristic method for setup-awareness

• Newer tool generation uses slack allocation, which uses more time

[Figure: runtime ratio (dual-wave runtime / single-wave runtime, axis from 0.0 to 2.0) for designs H0 to H9, plotted for individual designs and the mean]

Page 38: Application of Specific Delay Window Routing for Timing ...

Conclusion

• Presented specific delay window routing as a generic framework to optimize for constraints constituting lower and upper bounds on delay

• Proposed dual-wave expansion to address accuracy issues with single-wave expansion when targeting a specific delay window

• Future work

• Applying specific delay window routing to more constraints

• Clock-to-output

• Cycle stealing

Page 40: Application of Specific Delay Window Routing for Timing ...

Multiple Speed Grades

• We consider multiple speed grades for slack allocation

• For setup timing, we use the design speed grade

• For hold timing, we use the fastest speed grade

• Inverted delay windows

• If the lower bound exceeds the upper bound, then we cannot satisfy the setup and hold requirements for the connection
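
A delay window and the inverted-window check can be represented with a small Python data class. The class and method names are hypothetical. The first window reuses the [0.60, 2.17] bounds from the slack allocation example, and the second window is an invented case where the hold-derived lower bound exceeds the setup-derived upper bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayWindow:
    lower: float  # ns, from hold slack allocation (fastest speed grade)
    upper: float  # ns, from setup slack allocation (design speed grade)

    def is_inverted(self):
        """True when no delay can satisfy both the setup and hold requirements."""
        return self.lower > self.upper

    def contains(self, delay):
        """True when a routed delay falls inside the window."""
        return self.lower <= delay <= self.upper


ok = DelayWindow(lower=0.60, upper=2.17)
bad = DelayWindow(lower=1.20, upper=1.03)  # hypothetical inverted window
print(ok.is_inverted(), ok.contains(1.0))
print(bad.is_inverted(), bad.contains(1.1))
```

An inverted window is worth detecting before routing, since no route can land inside it.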

Page 41: Application of Specific Delay Window Routing for Timing ...

Slack Allocation

LOOP {
    1. Update timing
    2. Compute slack on each connection
           slack(c) = min(slack(p))
           where p is any path containing c
       Stop if the cumulative slack is near zero
    3. Compute the weight for each connection
           weight(c) = delay(c) / max(delay(p))
           where p is any path containing c
    4. Allocate slack for each connection
           delay(c) = delay(c) + weight(c) * slack(c)
}
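
The loop above can be sketched in Python for the setup (upper bound) case. This is a simplified illustration, not the tool's implementation: it assumes the circuit is given as an explicit list of paths, that one delay constraint applies to every path, that all connection delays are positive, and that the starting slack is non-negative. The function name, the tolerance, and the example circuit are hypothetical.

```python
def allocate_slack(paths, delay, constraint, eps=1e-6, max_iters=100):
    """Grow each connection delay until the paths have (almost) no slack left.

    paths: list of paths, each a list of connection names
    delay: dict mapping connection name to its current delay in ns
    Returns a new dict of allocated delays, usable as upper bounds.
    """
    delay = dict(delay)  # work on a copy
    for _ in range(max_iters):
        # 1. Update timing
        path_delay = [sum(delay[c] for c in p) for p in paths]
        # 2. slack(c) = minimum slack over the paths containing c;
        #    also record the longest path delay through c for step 3
        slack, longest = {}, {}
        for p, d in zip(paths, path_delay):
            s = constraint - d
            for c in p:
                slack[c] = min(slack.get(c, s), s)
                longest[c] = max(longest.get(c, d), d)
        if sum(slack.values()) < eps:  # cumulative slack is near zero
            break
        for c in slack:
            weight = delay[c] / longest[c]   # 3. weight for each connection
            delay[c] += weight * slack[c]    # 4. allocate slack
    return delay


# Hypothetical circuit: connection "a" is shared by two paths, constraint 3.0 ns.
paths = [["a", "b"], ["a", "c"]]
bounds = allocate_slack(paths, {"a": 1.0, "b": 1.0, "c": 0.5}, constraint=3.0)
print({name: round(value, 3) for name, value in bounds.items()})
```

Because each connection takes the minimum slack and the maximum path delay over its paths, the increments along any one path never add up to more than that path's slack, so no path is pushed past the constraint.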

Page 42: Application of Specific Delay Window Routing for Timing ...

Slack Allocation

[Figure: slack allocation on setup timing for the circled connection over iterations 1 to 11; one plot shows setup slack (ns) and connection weight per iteration, the other shows connection delay (ns) per iteration]

Page 43: Application of Specific Delay Window Routing for Timing ...

Results Analysis

• Why does design “B0” take so much longer with dual-wave expansion?

• Both single-wave and dual-wave approaches fail for this design with a 0.125 ns constraint

• Architectural corner case involving three connections

• Stuck in a scenario where 2 have equal skew, but another is faster

Page 47: Application of Specific Delay Window Routing for Timing ...

Future Work

• Clock-to-output constraint

• Relative timing constraint between paths

• Example: 0.5 ns < clock-to-out path – clock-out path < 4 ns

Page 48: Application of Specific Delay Window Routing for Timing ...

Future Work

• Cycle stealing

• Add extra delay to clock path to alleviate setup violations
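
A small Python sketch shows the effect on slack, using the same slack arithmetic as the hold timing example. It assumes the extra delay is added to the capturing register's clock path. All numbers and names are hypothetical.

```python
def setup_slack(launch_clk, data, capture_clk, setup_req):
    """Setup slack in ns: requirement minus arrival relative to the capture clock."""
    return setup_req - (launch_clk + data - capture_clk)


def hold_slack(launch_clk, data, capture_clk, hold_req):
    """Hold slack in ns: arrival relative to the capture clock minus requirement."""
    return (launch_clk + data - capture_clk) - hold_req


# Hypothetical path: 2.4 ns of data delay against a 2.0 ns setup requirement.
launch, data, capture = 1.0, 2.4, 1.2
extra = 0.3  # additional delay routed into the capture clock path

setup_before = setup_slack(launch, data, capture, setup_req=2.0)
setup_after = setup_slack(launch, data, capture + extra, setup_req=2.0)
hold_before = hold_slack(launch, data, capture, hold_req=0.4)
hold_after = hold_slack(launch, data, capture + extra, hold_req=0.4)

print(f"setup slack: {setup_before:.1f} -> {setup_after:.1f} ns")
print(f"hold slack:  {hold_before:.1f} -> {hold_after:.1f} ns")
```

The setup slack gains exactly the added clock delay and the hold slack on the same path loses the same amount, so the extra delay itself has to be chosen within a window.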
