High-Performance Gate Selection with a Signoff Timer
Andrew B. Kahng*, Seokhyeong Kang*, Hyein Lee*, Igor L. Markov+ and Pankit Thapar+
UC San Diego* University of Michigan+
2
Outline
• Gate Selection in VLSI Design
• Previous Work
• Challenges in Gate Selection
• High-Performance Gate Selection with a Signoff Timer
• Overall Flow
• Experimental Results
• Conclusions and Future Work
3
Gate Selection in VLSI Design
• Effective approach to power and delay optimization
• Objective: select a library cell for each gate
  • Tunable cell parameters: gate length, gate width, Vth
  • Minimize power
  • Satisfy constraints: slack, slew, max load capacitance, …
[Figure: cell options along three axes: gate width / drive strength (INVX2, INVX4, INVX8, INVX16), multi-Vth (HVT, NVT, LVT), and Lgate bias (L=60nm, L=65nm, L=55nm). One end of each axis gives lower (leakage) power and lower speed; the other gives higher (leakage) power and higher speed.]
4
Previous Techniques
• Common heuristics/algorithms
• Limitations
  • Do not account for realistic delay models and constraints (capacitance, slew)
  • Continuous methods: industrial cell libraries offer discrete gate sizes, and rounding solutions is not easy
  • Discrete methods: scalability to large circuits is an issue
[Figure: continuous methods (linear programming, convex optimization), Lagrangian relaxation, and discrete methods (dynamic programming, sensitivity-based selection), arranged between optimality and scalability.]
5
Previous Work
• Our work extends Trident 1.0 [Hu et al. Proc. ICCAD 2012]
  • Produced strongest results on ISPD 2012 benchmarks as of ICCAD 2012
  • Metaheuristic optimization with importance sampling and sensitivity-guided search
  • Limitation: no interconnect delay calculation (an unrealistic assumption)
6
Outline
• Gate Selection in VLSI Design
• Previous Work
• Challenges in Gate Selection
  • Interconnect delay
  • Incorrect internal timer
  • Critical paths
• High-Performance Gate Selection with a Signoff Timer
• Overall Flow
• Experimental Results
• Conclusions and Future Work
7
Challenges in Gate Selection
• Selection problem seen at all phases of the RTL-to-GDS flow
• Becomes more challenging at later design stages
  • Timing constraints are strict
  • Gate and interconnect delay
  • Slew, max capacitance
• Gate selection can result in a large change in interconnect delay
[Figure: design flow RTL → Gate Level Netlist → Placed Netlist → Routed Netlist → GDS via Logic Synthesis, Placement, and Route, with gate selection at each stage; the routed netlist with interconnects is marked "Challenging" and "Our Problem".]
New challenges in the ISPD 2013 Gate Selection Contest
• Routed netlists including interconnect
• Realistic timing constraints including slew and capacitance
• Relying on an industry signoff timer
8
Issue 1: Interconnect Delay/Slew
• Delay and slew calculations for gates and wires
  • Delay: 50% of input transition to 50% of output transition
  • Slew: 25% to 75% of transition
• Gate delay and slew are estimated with lookup table-based nonlinear delay models (NLDMs)
• Interconnect delay and slew are estimated with analytical models for RC trees
[Figure: a net driven from source S to sinks A, B, C, annotated with wire delay and wire slew (25% to 75%); the equivalent RC tree has nodes v0–v5, capacitances C0–C5, and resistances R0-1, R1-2, R2-3, R2-4, R4-5, with R3∙4 = R0-1 + R1-2.]
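As an illustration of an analytical RC-tree model, the following is a minimal sketch of the Elmore delay, one of the delay models considered later in the deck. The function name, the dictionary-based tree encoding, and the unit component values are assumptions made for this example; the topology (v0-v1-v2, with branches v2-v3 and v2-v4-v5) follows the resistance labels in the figure.

```python
from collections import defaultdict

def elmore_delays(parent, res, cap, root=0):
    """Elmore delay from the root to every node of an RC tree.

    parent[v] -> parent node of v (the root has no entry)
    res[v]    -> resistance of the edge parent[v] -> v
    cap[v]    -> capacitance lumped at node v (must include the root)
    """
    children = defaultdict(list)
    for v, p in parent.items():
        children[p].append(v)

    # Node list with every parent placed before its children.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])

    # Downstream capacitance, accumulated from the leaves toward the root.
    down = dict(cap)
    for v in reversed(order):
        if v != root:
            down[parent[v]] += down[v]

    # Each edge adds R(edge) * C(everything downstream of that edge).
    delay = {root: 0.0}
    for v in order:
        if v != root:
            delay[v] = delay[parent[v]] + res[v] * down[v]
    return delay

# Hypothetical unit-valued example on the six-node tree.
parent = {1: 0, 2: 1, 3: 2, 4: 2, 5: 4}
res = {v: 1.0 for v in parent}
cap = {v: 1.0 for v in range(6)}
print(elmore_delays(parent, res, cap))
```

Because the delay at a sink depends on all downstream capacitance, changing the size of one sink cell changes the delay seen at every other sink of the same net.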
9
Issue 1: Interconnect Delay/Slew
• The impact of interconnects on slew values propagates both upstream and downstream and changes delays
[Figure: cell T with neighboring cells S1, S2, FI1, FI2, FO1, FO2, annotated as follows.]
• Output pin capacitance change + slew change by interconnect
• Slew propagation + slew degradation by interconnect
• Large delay changes in upstream and downstream gates and nets
10
Issue 2: Incorrect Internal Timer
• A timer is essential to estimate interconnect delay and slew, which are affected by gate selection/Vth swapping
• Two options: signoff timer and internal timer
[Figure: post-layout optimizer (gate selection/Vt-swapping) between post-layout and signoff. Signoff timer: iterative invocation, runtime increase. Internal timer: timing discrepancy.]
An accurate internal timer is needed
11
Issue 2: Inaccurate Internal Timer
• Challenges in matching the signoff timer
  • Error propagation along paths
  • Error accumulation with netlist changes
[Figure: error (internal – signoff) vs. logic depth along a path (error propagation on paths), and error vs. number of cell changes (error accumulation with netlist change).]
Timing calibration to a signoff timer is needed to avoid divergence
12
Issue 3: Critical Paths
• Many near-critical paths in the given benchmarks
• Challenging to obtain a timing-feasible solution
* From ISPD 2013 Discrete Gate Selection Contest Presentation
Dedicated critical path optimization is needed
13
Outline
• Gate Selection in VLSI Design
• Previous Work
• Challenges in Gate Selection
• High-Performance Gate Selection with a Signoff Timer
  • Internal Timer with Interconnect Delay Modeling
  • Calibration to a Signoff Timer
  • Dedicated Critical Path Optimization
  • Sensitivity Functions
• Overall Flow
• Experimental Results
• Conclusions and Future Work
14
Our Sizer
• High-Performance Gate Selection with a Signoff Timer
  1. Interconnect delay/slew models for an internal timer
  2. Efficient calibration to a signoff timer
  3. Critical path optimization for timing-feasible solutions
  4. Sensitivity-guided cell selection
15
1. Interconnect Delay/Slew for Internal Timer
• Essential to estimate interconnect delay and slew affected by gate selection/Vth swapping
• Requirements for an internal timer
  • Fast enough for move-based optimization
  • Accurate enough to track the signoff timer
• Our approach: use the best-performing models for interconnect delay/slew from previous work
16
Interconnect Delay/Slew: Previously Known Models
• Early optimization does not require accuracy → fast interconnect models
• We use pre-existing models
• Model selection criterion: endpoint slack error between signoff timer* and our estimation

Delay models: Elmore delay, D2M, DM1, DM2
Slew models: PERI, S2M
Effective cap. models: McCormick, Total Cap.

D2M: Alpert et al. ISPD 2000
DM1, DM2: Kahng et al. TCAD 1997
PERI: Kashyap et al. TAU 2002
S2M: Agarwal et al. TCAD 2004
McCormick: McCormick Thesis 1989
* Synopsys PrimeTime
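To make two of the listed models concrete, here is a hedged sketch of the D2M delay metric and the PERI slew extension. It assumes the commonly cited forms D2M = ln 2 · m1² / √m2 (m1, m2 being the first two moments of the node's impulse response) and PERI ramp slew = √(step slew² + input slew²); the function names and the tree encoding are invented for this example and are not taken from the slides.

```python
import math

def path_weighted_sum(parent, res, weight, order):
    """For every node i, return the sum over k of R_ki * weight[k], where R_ki
    is the resistance shared by the root->k and root->i paths.
    `order` lists nodes with parents before children; order[0] is the root."""
    down = dict(weight)
    for v in reversed(order[1:]):
        down[parent[v]] += down[v]
    total = {order[0]: 0.0}
    for v in order[1:]:
        total[v] = total[parent[v]] + res[v] * down[v]
    return total

def d2m_delays(parent, res, cap, order):
    """D2M delay at every non-root node (assumes positive R and C values)."""
    m1 = path_weighted_sum(parent, res, cap, order)   # magnitude of m1 (Elmore)
    w2 = {v: cap[v] * m1[v] for v in order}
    m2 = path_weighted_sum(parent, res, w2, order)    # second moment
    return {v: math.log(2) * m1[v] ** 2 / math.sqrt(m2[v]) for v in order[1:]}

def peri_slew(step_slew, input_slew):
    """Ramp-input slew from a step-response slew and the input slew."""
    return math.hypot(step_slew, input_slew)

# Single R-C stage: D2M reduces to the exact 50% delay, ln(2) * R * C.
print(d2m_delays({1: 0}, {1: 2.0}, {0: 0.0, 1: 3.0}, [0, 1]))
```

The step-response slew passed to `peri_slew` would come from a slew metric such as S2M, which is not reproduced here.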
17
Interconnect Delay/Slew: Model Selection
• The (D2M, PERI) model combination has the smallest mean and standard deviation
[Figure: endpoint slack error distributions for (EM, PERI), (D2M, PERI), (DM1, PERI), (DM2, PERI); x-axis: slack error (ps), y-axis: % of #paths.]
[Figure: normalized mean and standard deviation of endpoint slack error.]
18
2. Calibration to a Signoff Timer
• Challenges in matching the results of a signoff timer
  • Timing divergence from error propagation along timing paths and error accumulation with netlist changes
• The divergence can be compensated with an offset
  • Offset-based slack calibration [Moon et al. Patent 7,823,098]
  • Improve the accuracy of a given STA engine by periodically invoking a signoff timer and storing slack differences (offsets)
[Figure: the internal timer requests timing information from the signoff timer; offset = signoff timer – internal timer.]
19
Calibration Frequency vs. Error
• Impact of calibration frequency on average slack error during sizing: a 5% threshold shows <10ps slack errors
[Figure: x-axis: % of cell changes during leakage optimization; y-axis: average slack error relative to the signoff timer.]
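The offset-based calibration and the change-count trigger can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, the callable-based timer interface, and the per-endpoint offset table are assumptions, and the two timers are stand-ins supplied by the caller.

```python
class CalibratedTimer:
    """Internal timer whose slacks are corrected by stored signoff offsets."""

    def __init__(self, internal_slack, signoff_slack, num_cells, threshold_pct=5):
        self.internal_slack = internal_slack  # callable: endpoint -> slack (fast)
        self.signoff_slack = signoff_slack    # callable: endpoint -> slack (slow)
        self.num_cells = num_cells
        self.threshold_pct = threshold_pct    # recalibrate after this % of cells change
        self.offset = {}
        self.changes = 0

    def calibrate(self, endpoints):
        """offset = signoff timer - internal timer, stored per endpoint."""
        for e in endpoints:
            self.offset[e] = self.signoff_slack(e) - self.internal_slack(e)
        self.changes = 0

    def slack(self, endpoint):
        """Internal slack plus the last stored offset (0 if never calibrated)."""
        return self.internal_slack(endpoint) + self.offset.get(endpoint, 0.0)

    def record_cell_change(self):
        """Count one cell change; return True when a recalibration is due."""
        self.changes += 1
        return self.changes * 100 >= self.threshold_pct * self.num_cells
```

Between calibrations only the fast internal timer is queried, so the cost of the signoff timer is paid once per batch of cell changes rather than once per move.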
20
Efficient Signoff-Timer Interface
• A Tcl socket interface sends commands to, and receives results from, the signoff timer
• Basis of the winning ISPD 2013 gate selection contest entry
[Figure (b), timing correlation with the socket interface: the Sizer loads the design, launches the signoff timer, and opens a socket; during cell sizing it sends the cell swap list, the signoff timer updates cell sizes and runs incremental STA, and the returned timing results are used for timing calibration.]
(a) Tcl socket code

Tcl server (signoff timer):
    # Define the connection handler before entering the event loop.
    proc accept {sock addr port} {
        fileevent $sock readable [list svcHandler $sock]
        fconfigure ...
    }
    socket -server accept $port
    vwait events

Tcl client (Sizer):
    set server xx.xx.xx
    set chan [socket $server $port]
    proc GetData {} {
        global chan
        set data [gets $chan]
        return $data
    }
    proc SendData {data} {
        global chan
        puts $chan $data
        # Push the buffered command out to the server.
        flush $chan
    }
21
3. Critical Path Optimization
• ISPD 2013 contest: many near-critical paths in benchmarks
  • Challenging to obtain a timing-feasible solution
• Dedicated critical path optimization: optimize cells on the most critical path to reduce WNS*
  • Downsizing fanouts
  • Peephole optimization
* WNS: Worst Negative Slack
22
Critical Path Optimization: Downsizing Fanouts
• Downsizing fanouts of critical cells
  • Improve delay of the target cell by reducing output load
  • Downsize fanout cells with sensitivity score
𝑺𝑭 𝒅𝒐𝒘𝒏=𝑪𝒐𝒖𝒕 (𝒄) 𝒔𝒊𝒛𝒆(𝒄)
[Figure: critical cells driving fanout cells. Downsizing the fanout cells reduces their input capacitance, and the critical cell's delay decreases with the reduced output load.]
23
Critical Path Optimization: Peephole Optimization
• Pick k cells on a critical path and exhaustively search for the best combination of sizes for those k cells
• All possible combinations are enumerated in Gray-code order to minimize the overhead of incremental STA*
[Figure: a window of k cells moves along the critical path (current window, next window). For each window, all N trials (trial 1 … trial N) are enumerated in Gray-code order and evaluated with incremental STA (iSTA), and the best move is picked. N (# trials) = (# size options)^k.]
* STA: Static Timing Analysis
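The Gray-code enumeration can be sketched as below. Consecutive trials differ in the size of exactly one cell, so each trial needs only one incremental timing update. The generator is a standard reflected n-ary Gray code; `peephole` and its `evaluate` callback (a stand-in for incremental STA that returns a score to maximize, such as worst slack) are hypothetical names for this example.

```python
def gray_combinations(num_options, k):
    """Yield all num_options**k index tuples so that consecutive tuples
    differ in exactly one position (reflected n-ary Gray code)."""
    if k == 0:
        yield ()
        return
    forward = True
    for prefix in gray_combinations(num_options, k - 1):
        if forward:
            digits = range(num_options)
        else:
            digits = range(num_options - 1, -1, -1)
        for d in digits:
            yield prefix + (d,)
        forward = not forward  # reverse direction so the last digit carries over

def peephole(window, options, evaluate):
    """Try every size combination for the cells in `window`; keep the best."""
    best_score, best_assignment = None, None
    for combo in gray_combinations(len(options), len(window)):
        assignment = {cell: options[i] for cell, i in zip(window, combo)}
        score = evaluate(assignment)
        if best_score is None or score > best_score:
            best_score, best_assignment = score, assignment
    return best_score, best_assignment

# Number of trials for 3 size options and k = 3 cells.
print(len(list(gray_combinations(3, 3))))  # 27
```

The trial count grows as (# size options)^k, which is why the window size k has to stay small.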
24
4. Sensitivity Function: Timing Recovery
• Our sensitivity function takes into account
  • The direct impact of sizing a given cell on its slack
  • The required increase in leakage power
  • The number of critical paths whose slack is improved
• Sensitivity function for timing recovery

  $SF_{GTR} = \dfrac{\Delta\text{slack} \cdot \#\text{paths}}{(\Delta\text{leakage power})^{\alpha}}$

  • $\Delta\text{slack}$, $\Delta\text{leakage power}$: slack and leakage power change from the cell change
  • $\#\text{paths}$: the number of paths passing through the cell
  • $\alpha$: leakage exponent
• Same as [7], but interconnect delay/slew is considered
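The timing-recovery sensitivity can be written as a small function, together with the ranking step that GTR uses when it upsizes a fraction γ of cells in descending order of sensitivity. This is an illustrative sketch: the tuple layout of a candidate move and the function names are assumptions, and the leakage change is assumed to be positive.

```python
def sf_gtr(delta_slack, num_paths, delta_leakage, alpha):
    """SF_GTR = (delta slack * #paths) / (delta leakage power) ** alpha."""
    return delta_slack * num_paths / (delta_leakage ** alpha)

def pick_upsizes(candidates, alpha, gamma_pct):
    """Return the top gamma_pct percent of candidate moves by SF_GTR.

    candidates: list of (cell, delta_slack, num_paths, delta_leakage)
    """
    ranked = sorted(
        candidates,
        key=lambda c: sf_gtr(c[1], c[2], c[3], alpha),
        reverse=True,
    )
    count = max(1, len(ranked) * gamma_pct // 100)
    return [c[0] for c in ranked[:count]]

# Hypothetical candidate upsizing moves.
moves = [("a", 10.0, 1, 2.0), ("b", 4.0, 3, 1.0), ("c", 1.0, 1, 1.0), ("d", 6.0, 2, 3.0)]
print(pick_upsizes(moves, alpha=1.0, gamma_pct=50))  # ['b', 'a']
```

A larger α penalizes leakage-hungry moves more strongly, trading slower timing recovery for a lower-power starting point.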
25
Sensitivity Function: Leakage Reduction
• Multiple sensitivity functions from [7] are used
• Among the five SFs, the best SF is selected and used for the next optimization stage

SF1 = ∆leakage / ∆delay
SF2 = ∆leakage * slack
SF3 = ∆leakage / (∆delay * #paths)
SF4 = ∆leakage * slack / #paths
SF5 = ∆leakage * slack / (∆delay * #paths)
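The five leakage-reduction sensitivity functions translate directly into code. The sketch below is illustrative only: the `Move` record and the `select_best_sf` helper are hypothetical, and "best" is assumed here to mean the function whose optimization run ends with the lowest leakage.

```python
from dataclasses import dataclass

@dataclass
class Move:
    d_leakage: float  # leakage saved by the downsizing move
    d_delay: float    # delay added by the move (assumed nonzero)
    slack: float      # current slack of the cell
    paths: int        # number of paths through the cell (assumed nonzero)

SENSITIVITY_FUNCTIONS = {
    "SF1": lambda m: m.d_leakage / m.d_delay,
    "SF2": lambda m: m.d_leakage * m.slack,
    "SF3": lambda m: m.d_leakage / (m.d_delay * m.paths),
    "SF4": lambda m: m.d_leakage * m.slack / m.paths,
    "SF5": lambda m: m.d_leakage * m.slack / (m.d_delay * m.paths),
}

def select_best_sf(run_stage):
    """Run one optimization stage per SF; return the name giving least leakage.

    run_stage: callable taking a sensitivity function and returning the
    final leakage of the resulting solution (supplied by the caller).
    """
    results = {name: run_stage(sf) for name, sf in SENSITIVITY_FUNCTIONS.items()}
    return min(results, key=results.get)

m = Move(d_leakage=6.0, d_delay=2.0, slack=5.0, paths=3)
print({name: sf(m) for name, sf in SENSITIVITY_FUNCTIONS.items()})
```

Each function weighs the same move differently, which is why trying all five and keeping the best result can beat committing to a single one.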
26
Outline
• Gate Selection in VLSI Design
• Previous Work
• Challenges in Gate Selection
• High-Performance Gate Selection with a Signoff Timer
• Overall Flow
  • Global Timing Recovery
  • Power Reduction with Feasible Timing
• Experimental Results
• Conclusions and Future Work
27
Overall Optimization Flow
• Overall flow: Global Timing Recovery (GTR) + Power Reduction with Feasible Timing (PRFT)
[Figure: flow from the routed netlist and SPEF to the Selection Solution.
  Set to minimum size
  Global Timing Recovery
    GTRwoST: find a timing-feasible solution with an internal timer
    GTRwST: find a timing-feasible solution with a signoff timer
  Power Reduction w/ Feasible Timing
    PRFT phase 1: leakage reduction with different sensitivity functions
    PRFT phase 2: leakage reduction with kick-move]
28
GTR without Signoff Timer
• Objective: find a timing-feasible solution with the internal timer (accurate timing information is not needed)
• Use a guardband for fast solution search
• GTR procedure: STA → calculate sensitivity (α) → upsize γ% of cells in descending order of sensitivity → incremental STA → timing met? (if no, repeat)
[Figure: multi-threaded GTR(GB) runs with parameters (α, γ), each followed by a feasibility check that decides between increasing the guardband (GB) and emitting the timing-feasible solution (non-feasible with the signoff timer).]
29
GTR with Signoff Timer
• Objective: find a timing-feasible solution with the signoff timer
• Timing recovery is added to the GTR flow
• Timing recovery procedure
[Figure: loop over signoff timer → update slack offset → internal slack calibration → feasible? If yes, the result is the timing-feasible solution; if no, cell upsizing with peephole and critical path optimization, then repeat.]
30
PRFT with Sensitivity Functions
• Objective: find the best leakage solution
• Various sensitivity functions are tried sequentially
• SGGS procedure
  1. Run static timing analysis
  2. Calculate sensitivity for all cells
  3. Downsize cell C with maximum sensitivity
  4. Run incremental STA
  5. If slack(C) < 0, revert the sizing
[Figure: outer loop: SGGS(SFi) → feasible?, with timing recovery when the result is not feasible → next sensitivity function (SFi). The output is the best solution and its sensitivity function (SF).]
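The SGGS steps above can be sketched as a greedy loop. This is a simplified illustration under stated assumptions: `slack_of` and `sensitivity` are caller-supplied stand-ins for incremental STA and the chosen sensitivity function, sizes are integer indices with 0 as the smallest, and locking a cell after a reverted move is an assumption added here to guarantee termination.

```python
def sggs(size, slack_of, sensitivity):
    """Greedy downsizing: repeatedly downsize the most sensitive cell and
    revert the move if it makes that cell's slack negative.

    size:        dict cell -> size index (0 = smallest / lowest leakage)
    slack_of:    callable (size, cell) -> slack after the trial move
    sensitivity: callable (size, cell) -> score of downsizing that cell
    """
    size = dict(size)
    locked = set()  # cells whose last downsizing attempt was reverted
    while True:
        candidates = [c for c in size if size[c] > 0 and c not in locked]
        if not candidates:
            return size
        cell = max(candidates, key=lambda c: sensitivity(size, c))
        size[cell] -= 1                  # downsize by one step
        if slack_of(size, cell) < 0:     # timing violated
            size[cell] += 1              # revert the sizing
            locked.add(cell)

# Toy model: two cells on one path; options are (delay, leakage) per size index.
OPTIONS = [(4.0, 1.0), (3.0, 2.0), (2.0, 4.0)]
CLOCK = 6.0

def toy_slack(size, cell):
    return CLOCK - sum(OPTIONS[i][0] for i in size.values())

def toy_sf1(size, cell):
    d_now, l_now = OPTIONS[size[cell]]
    d_down, l_down = OPTIONS[size[cell] - 1]
    return (l_now - l_down) / (d_down - d_now)  # SF1: delta leakage / delta delay

print(sggs({"a": 2, "b": 2}, toy_slack, toy_sf1))  # {'a': 1, 'b': 1}
```

In the toy run both cells drop one size while slack stays non-negative, and the second downsizing step for each is reverted because it would violate timing.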
31
PRFT: Speeding up Bottleneck Cells
• Speed up bottleneck cells: recover timing slack with minimum power impact
• To escape from a local optimum, γ% of bottleneck cells are upsized
[Figure: starting from the best solution and best sensitivity function (SF) from PRFT phase 1, loop: kick move (LSMC) with γ% ratio → SGGS(SF) → feasible?, with timing recovery when the result is not feasible → next kick move.]
32
Outline
• Gate Selection in VLSI Design
• Previous Work
• Challenges in Gate Selection
• High-Performance Gate Selection with a Signoff Timer
• Overall Flow
• Experimental Results
• Conclusions and Future Work
33
ISPD 2013 Gate Selection Contest
• Realistic benchmarks and constraints
  • Netlist (Verilog), parasitics (SPEF), timing constraint (SDC)
  • Max slew/load constraint
  • Library: 11 logic functions, 30 cell types (three Vth options and ten different sizes) → 330 cells
• Leakage power of violation-free solutions is compared
• Final timing evaluation with a commercial signoff tool
34
Experimental Results: Power and Runtime Result
• Power and runtime comparison with the contest best result
  http://www.ispd.cc/contests/13/ISPD_2013_Contest_Final.pdf
[Figure: normalized leakage power, Contest Best vs. Trident2.0, per benchmark (axis labels: usb_phy_fast, usb_phy_slow, pci_b32_fast, pci_b32_slow, fft_fast, fft_slow, cordic_slow, des_perf_slow, edit_dist_fast, edit_dist_slow, matrix_m_slow, netcard_fast, netcard_slow).]
[Figure: normalized runtime, Contest Best vs. Trident2.0.]
• No team found a feasible solution for netcard_fast
35
Experimental Results: Runtime Breakdown
• Runtime breakdown
36
Experimental Results: Optimization Trajectories
• Normalized TNS* and leakage power over GTR iterations
• After timing correlation, TNS increases due to the discrepancy between the internal timer and the signoff timer
[Figure: TNS and leakage vs. GTR iteration (0 to 30), covering GTR without signoff timer followed by GTR with signoff timer; the point "After timing correlation" is marked.]
* TNS: Total Negative Slack
37
Experimental Results: Impact of Timing Calibration
• The minimum leakage without timing violation is achieved with calibration after every 5% cell change
  • No calibration → timing violation cannot be fixed
  • One calibration → leakage increase after timing recovery
  • Applying guardband (GB) → leakage overhead
[Figure: normalized leakage (%) of PRFT after timing recovery for calibration (5%), init calibration, no calibration, GB=5ps, GB=10ps.]
[Figure: TNS (ps) of PRFT after timing recovery for the same five settings.]
Result of pci_b32_fast
38
Conclusions and Future Work
• Trident2.0: high-performance gate selection
  • Fast interconnect models with reasonable accuracy for an efficient internal timer
  • Calibration to a signoff timer with an interface to improve timing accuracy
  • Dedicated critical path optimization with heuristics
• ISPD 2013 gate selection contest
  • Trident 2.0 took 2nd and 1st places in two contest categories, respectively
• Future work
  • See if Lagrangian relaxation helps
  • Additional industry benchmarks
Thank you!