In Search of Lost Time - TAU Workshop · 2016-03-11 · In Search of Lost Time Andrew B. Kahng UCSD...
1A. B. Kahng, TAU 2016
In Search of Lost Time
Andrew B. Kahng
UCSD CSE and ECE Departments
[email protected]
http://vlsicad.ucsd.edu
TAU-2016 Friday keynote, Santa Rosa
2A. B. Kahng, TAU 2016
In Search of Lost Time
3A. B. Kahng, TAU 2016
What is Time?
How do we lose Time?
How do we regain Time?
4A. B. Kahng, TAU 2016
What is Time?
5A. B. Kahng, TAU 2016
What is Time?
• Time = Schedule
  • Moore’s Law: 1% = 1 week
• Time = Things convertible to time
  • mV, σ, uW, nm, $, μm²
[Diagram labels: Margin; Time; Product Quality; Model and Analysis Accuracy; nm, mV, {skew, jitter, OCV, …}; power, area, fmax, Iddq, …; rms, %, σ]
6A. B. Kahng, TAU 2016
What is Time?
• Time = Schedule
  • Moore’s Law: 1% = 1 week
• Time = Things convertible to time
  • mV, σ, uW, nm, $, μm²
• Time = time itself
  • Flavors: slack, trans, xd, d-trans, …
7A. B. Kahng, TAU 2016
What is Time?
• Time = Schedule
  • Moore’s Law: 1% = 1 week
• Time = Things convertible to time
  • mV, σ, uW, nm, $, μm²
• Time = time itself
  • Flavors: slack, trans, xd, d-trans, …
Time = Money
8A. B. Kahng, TAU 2016
What is Time?
How do we lose Time?
9A. B. Kahng, TAU 2016
How Do We Lose Time?
• It’s tough not to …
10A. B. Kahng, TAU 2016
Context I: Race to End of Roadmap
• Paper model to v1.0 SPICE model: ~12 months @N10
• Many near-term “red bricks”: ArF, Cu, low-k, …
• Foundry-fabless dynamics: who gives up margin?
• Time constants limit design-manufacturing co-evolution
  • (Years) Tech development, app market definition, architecture/front-end design
  • (Months) RTL-to-GDS implementation, reliability qualification
  • (Weeks) Fab latency, cycles of yield learning, design re-spins, mask flows
  • (Days) Process tweaks, design ECOs
Mismatches among these time constants:
• Model-hardware miscorrelation
• Model guardbanding
• Faster node enablement is challenging !!
11A. B. Kahng, TAU 2016
Context II: Low-Power Grand Challenge
Low power = High complexity
multiple supply voltages, power and clock gating, DVFS, MTCMOS, multi-Lgate, …
Increased timing closure burden
[Diagram labels: Mobility; Big data; Green datacenters; Cloud; Internet of Things]
12A. B. Kahng, TAU 2016
How Do We Lose Time?
• It’s tough not to …
• The margining imperative …
13A. B. Kahng, TAU 2016
Nobody Wants to Own the Scrap
• Timing model not 100% accurate
• Add margin to cover unknowns
14A. B. Kahng, TAU 2016
Stacks of Margins
Design margin = stack of layers of conservatism
[Figure (source: Wu 08): signoff margins stacked against performance. Labels: process; temperature; voltage (nominal Vdd, static IR drop, power grid IR gradient, dynamic IR); reliability (HCI/NBTI); signoff]
15A. B. Kahng, TAU 2016
Consequences
• Diminishing ROI from next node
• Typical: Moore’s Law-like scaling
• Worst-case: scales, but worse ROI
• Signoff with excessive margin: potential gain wiped out
16A. B. Kahng, TAU 2016
Time: Lose Some, Win Some
[Figure: timeline across technology nodes 90nm, 65nm, 45/40nm, 28nm, 20nm, 16/14nm, 10nm, ≤7nm. Labels: BTI; Temp inversion; Noise; MCMM; Maxtrans; EM; AOCV / POCV; PBA; Fixed-margin spec; Multi-patterning; Cell-POCV; MOL, BEOL R ↑; Dynamic IR; Fill effects; Layout rules; BEOL, MOL variations; Signoff criteria with AVS; SOC complexity; LVF; MIS; Phys-aware timing ECO; Min implant]
17A. B. Kahng, TAU 2016
How Do We Lose Time?
• It’s tough not to …
• The margining imperative …
• We give it away
  • Intentionally
[Figure: c2q-setup-hold surface]
18A. B. Kahng, TAU 2016
How Do We Lose Time?
• It’s tough not to …
• The margining imperative …
• We give it away
  • Intentionally
[Figure: interconnect stack with M1 and M2; capacitance distributions for Layer M1 and Layer M2 with −3σ and 3σ points marked; homogeneous BEOL corners (e.g., Cworst); homogeneous Cw corner versus M1 C and M2 C at 3σ; pessimism]
19A. B. Kahng, TAU 2016
How Do We Lose Time?
• It’s tough not to …
• The margining imperative …
• We give it away
  • Intentionally
  • By miscorrelating
[Figure: scatter plot of T2 path slack (ns) versus T1 path slack (ns), both axes from −0.6 to 0.1; 123 ps divergence]
20A. B. Kahng, TAU 2016
How Do We Lose Time?
• It’s tough not to …
• The margining imperative …
• We give it away
  • Intentionally
  • By miscorrelating
  • By wasting it
2013 Contest NDA: Without [EDA vendor’s] prior approval, I shall not write or publish any article or presentation that references [EDA vendor’s tool name].
21A. B. Kahng, TAU 2016
How Do We Lose Time?
• It’s tough not to …
• The margining imperative …
• We give it away
  • Intentionally
  • By miscorrelating
  • By wasting it
“We don’t have enough time to do it right, but we have enough time to do it wrong”
22A. B. Kahng, TAU 2016
Not Enough Time To Do It Right…
Option #1: go with latest available technology = 0.01 AU/year speed
Option #2: spend the next ten years to come up with a spaceship = 0.1 AU/year speed
• Earth to Mars
  • Option #1 = 0.5 / 0.01 = 50 years
  • Option #2 = 0.5 / 0.1 + 10 years = 15 years (B << A)
• Issue: investment for the long haul
  Option #1               Option #2
  Corner-based STA        Statistical STA
  Planar                  3D
  Homogeneous CMOS        Heterogeneous CMOS
Need a faster ship
[Timeline: Year: 2016, 2026, 2027, 2031]
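The Earth-to-Mars comparison is plain arithmetic, and a few lines make the trade-off explicit. This is a minimal sketch using only the numbers on the slide (0.5 AU distance, 0.01 and 0.1 AU/year speeds, 10 years of development). The function names and the break-even calculation are illustrative additions, not part of the talk.

```python
def arrival_years(distance_au, speed_au_per_year, development_years=0.0):
    """Total elapsed time: up-front development plus travel."""
    return development_years + distance_au / speed_au_per_year


def break_even_distance(slow_speed, fast_speed, development_years):
    """Distance beyond which waiting for the faster ship arrives sooner."""
    return development_years / (1.0 / slow_speed - 1.0 / fast_speed)


option1 = arrival_years(0.5, 0.01)                        # launch today
option2 = arrival_years(0.5, 0.1, development_years=10)   # build a faster ship first
print(f"Option #1: {option1:.0f} years, Option #2: {option2:.0f} years")
print(f"Break-even distance: {break_even_distance(0.01, 0.1, 10):.3f} AU")
```

Under these numbers the ten-year investment pays off for any trip longer than about a ninth of an AU, which is the slide's point about investing for the long haul.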
23A. B. Kahng, TAU 2016
What is Time?
How do we lose Time?
How do we regain Time?
24A. B. Kahng, TAU 2016
How Do We Regain Time?
• Learn !!! (machine learning, Big Data mindset)
25A. B. Kahng, TAU 2016
Timer Miscorrelation
• T1 and T2: commercial signoff STA tools with same inputs (.v, .spef, .lib)
• 123ps slack divergence ⇒ 20% performance difference ⇒ one node of Moore’s Law scaling
[Figure: scatter plot of T2 path slack (ns) versus T1 path slack (ns), both axes from −0.6 to 0.1; 123 ps divergence]
[DATE14]
26A. B. Kahng, TAU 2016
Erase Miscorrelation with Machine Learning!
Can also erase P&R vs. signoff STA miscorrelation
[Flow diagram labels: Artificial circuits; Real designs; New designs; Train, Validate, Test; MODELS (path slack, setup time, stage, cell, wire delays); if error > threshold: outliers (data points); ONE-TIME; INCREMENTAL]
[Figure: T2 path slack (ns) versus T1 path slack (ns). BEFORE: 123 ps divergence. AFTER ML modeling: 31 ps (~4× reduction)]
[DATE14]
27A. B. Kahng, TAU 2016
Harder: Non-SI to SI Calibration
[Flow diagram: inputs (.v, .db, .lib, .spef, .sdc) and post-P&R database; non-SI timing reports; calibration = recipe to convert non-SI timing report to SI timing report; SI timing reports]
• Complex interplay of electrical, logic structure, and layout parameters
• Black-box code in STA tools
• Slack diverges by 81ps (clock period = 1.0ns)
  • ~4 stages of logic at 28nm FDSOI
[Figure: non-SI path slack (ns) ($) versus SI path slack (ns) ($$$); 81ps divergence]
[SLIP15]
28A. B. Kahng, TAU 2016
“SI for Free” with Machine Learning
• Machine learning of incremental transition time, delay due to SI
• Accurate SI-aware path delays, slacks
[Flow: timing reports in SI mode + timing reports in non-SI mode → create training, validation and testing sets → ANN (2 hidden layers, 5-fold cross-validation) → SVM (RBF kernel, 5-fold cross-validation) → HSM (weighted predictions from ANN and SVM) → save model and exit]
[Figure, BEFORE: non-SI path slack (ns) ($) versus SI path slack (ns) ($$$); 81ps divergence]
[Figure, AFTER ML modeling: predicted path delay (ps) versus actual path delay (ps). Worst absolute error = 8.2ps; average absolute error = 1.7ps]
[SLIP15]
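The flow names HSM as a weighted combination of the ANN and SVM predictions but does not say how the weights are chosen. The sketch below assumes one simple possibility: a single least-squares weight fitted on a validation set. The function names and the toy delays are hypothetical and are not data from [SLIP15].

```python
def fit_blend_weight(pred_a, pred_b, actual):
    """Weight w in [0, 1] minimizing sum((w*a + (1-w)*b - y)^2) over a validation set.

    Closed form: w = sum((a-b)*(y-b)) / sum((a-b)^2), clipped to [0, 1].
    """
    num = sum((a - b) * (y - b) for a, b, y in zip(pred_a, pred_b, actual))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    if den == 0.0:          # both models agree everywhere; any weight works
        return 0.5
    return min(1.0, max(0.0, num / den))


def blend(pred_a, pred_b, w):
    """Weighted prediction from two base models."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


# Toy validation data (ps), invented for illustration only.
actual_delay = [100.0, 200.0, 300.0]
ann_pred = [110.0, 210.0, 310.0]    # hypothetical model that overestimates by 10 ps
svm_pred = [90.0, 190.0, 290.0]     # hypothetical model that underestimates by 10 ps

w = fit_blend_weight(ann_pred, svm_pred, actual_delay)
hybrid = blend(ann_pred, svm_pred, w)
worst_abs_error = max(abs(h - y) for h, y in zip(hybrid, actual_delay))
print(w, hybrid, worst_abs_error)
```

In this contrived case the two models err in opposite directions, so an equal weighting cancels the error. Real base models would be trained regressors, and the weight would be fitted on held-out paths.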
29A. B. Kahng, TAU 2016
Similar: Closing Multiphysics Analysis Loops
[Flow diagram. Blocks: Benchmark RTL; sim vectors; functional sim; P&R + optimization; timing/noise; power analysis; thermal analysis; MTTF & aging; task mapping / migration / (DVFS); AVS. Data passed between blocks: sim results; (dyn.) activity factor, (static); tech files, signoff criteria, corners; slack; IR drop map; timing / glitches; power trace; temp map; reliability report]
[ASPDAC16]
30A. B. Kahng, TAU 2016
Similar: Closing Multiphysics Analysis Loops
[Same flow diagram as the previous slide, with four loops highlighted: STA-IR loop; STA-Thermal loop; Workload-Thermal loop; STA-Reliability loop]
[ASPDAC16]
31A. B. Kahng, TAU 2016
Multiphysics Analysis is Difficult to Predict
• IR drop, thermal, reliability, crosstalk, etc.
• Example: Can we predict “risk map” for embedded memories at floorplan stage?
[Figure: SRAM slack (ps) for SRAM #1 and SRAM #5; annotations: 25ps, 29ps]
[ASPDAC16]
32A. B. Kahng, TAU 2016
Multiphysics Analysis is Difficult to Predict
• IR drop, thermal, reliability, crosstalk, etc.
• Example: Can we predict “risk map” for embedded memories at floorplan stage?
[Figure: SRAM slack (ps) versus implementation index]
[ASPDAC16]
33A. B. Kahng, TAU 2016
Floorplan Pathfinding
• Filter bad floorplans (e.g., embedded memory placements, power plans) comprehending downstream PD flow
• Model f estimates combined effects of netlist, constraints, placement, CTS, routing, optimization, STA
  y = Slack (w/, w/o IR)
  x = netlist, constraints, floorplan parameters
  y = f(x)
  f = ???
[Flow diagram: gate netlist and constraints → floorplan, powerplan → placement → clock network synthesis → routing → extraction, timing → signoff (extraction, timing, verification) → slack (w/, w/o IR). The modeling scope spans this flow; the costly iteration loops back to the floorplan]
[ASPDAC16]
34A. B. Kahng, TAU 2016
Floorplan Pathfinding Model
• False negatives = 3%
  • Pessimistic predictions ⇒ floorplan change that is actually not required
• False positives = 4%
  • Model incorrectly deems a floorplan to be good
Confusion matrix (rows = predicted, columns = actual):
                   Actual Pass    Actual Fail
  Predicted Pass       584             42   (false positives)
  Predicted Fail        31            384
                 (false negatives)
Positive slack data points: Precision: tp/(tp + fp) = 93.3%; Recall: tp/(tp + fn) = 95.0%
Negative slack data points: Precision: tn/(tn + fn) = 92.5%; Recall: tn/(tn + fp) = 90.1%
[ASPDAC16]
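The reported percentages can be checked directly from the four counts in the confusion matrix. The sketch below reads the matrix as tp = 584, fp = 42, fn = 31, tn = 384 (an inferred reading of the extracted figure) and uses the standard definitions, with negative-class precision tn/(tn + fn) and negative-class recall tn/(tn + fp); these reproduce the 92.5% and 90.1% figures. The function name is illustrative.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Precision/recall for both classes plus overall error rates."""
    total = tp + fp + fn + tn
    return {
        "pass_precision": tp / (tp + fp),
        "pass_recall": tp / (tp + fn),
        "fail_precision": tn / (tn + fn),
        "fail_recall": tn / (tn + fp),
        "false_negative_rate": fn / total,   # share of all data points
        "false_positive_rate": fp / total,   # share of all data points
    }


metrics = confusion_metrics(tp=584, fp=42, fn=31, tn=384)
for name, value in metrics.items():
    print(f"{name}: {100.0 * value:.1f}%")
```

The false negative and false positive rates here are fractions of all 1041 data points, which is the normalization that matches the 3% and 4% bullets.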
35A. B. Kahng, TAU 2016
Related: Library Groups New k-Factors?
• Library interpolation with each “physics” modeled as equivalent voltage delta (for example)
  • Voltage
  • Process variation
  • Temperature
  • Aging / reliability
• Per-instance timing derating for signoff
  • In spirit of old “k-factors”, perhaps
[Figure: per-instance Derating(V1, P1, T1, A1), Derating(V2, P2, T2, A2), Derating(V3, P3, T3, A3), with inputs voltage, process variation, temperature, aging / reliability]
36A. B. Kahng, TAU 2016
How Do We Regain Time?
• Learn !!! (machine learning, Big Data mindset)
• Embrace the “era of optimization”: 1% = 1 week
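The "1% = 1 week" rule of thumb can be sanity-checked by converting a Moore's-Law doubling period into a compound weekly improvement rate. The doubling periods below (78 and 104 weeks, roughly 18 and 24 months) are assumptions chosen for illustration; the talk does not state which period it has in mind.

```python
def weekly_gain_percent(doubling_weeks):
    """Compound per-week improvement (%) that doubles the metric in `doubling_weeks`."""
    return (2.0 ** (1.0 / doubling_weeks) - 1.0) * 100.0


for weeks in (78, 104):   # assumed doubling periods: ~18 months and ~24 months
    print(f"doubling every {weeks} weeks -> {weekly_gain_percent(weeks):.2f}% per week")
```

Both assumed periods land a little under 1% per week, so a 1% gain in a design metric is worth on the order of one week of schedule.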
37A. B. Kahng, TAU 2016
METRICS (1999): Measure to Improve [ISQED01]
• Goal #1: Predict outcome
• Goal #2: Find sweet spot (field of use) of tool, flow
• Goal #3: Dial in design-specific tool, flow knobs
38A. B. Kahng, TAU 2016
Pure Optimization is a Big Lever
• Project planning and management
  • Unforeseen events (late RTL bugs, timing ECO)
  • Resource co-constraints (e.g., 2 cores per EDA license, 3 concurrent tapeouts)
[Figure: schedule chart of activity blocks A1 to A5, each tagged with a project index (1) to (3). X-axis: work weeks, 20 to 42. Y-axis: usage (across three projects). Annotations: current servers; datacenter capacity]
• “How to pack 14 tapeouts into my design center during 2H15?”
• Schedule cost minimization (SCM)
  • Minimize overall project makespan subject to delay penalties, resource bounds, resource co-constraints, etc.
• Resource cost minimization (RCM)
  • Minimize number of resources required across all projects
[DAC15 WIP]
39A. B. Kahng, TAU 2016
Example Solver Use Cases (from a design center of a world top-5 semi)
• Schedule modification after late-breaking bug
  • Three projects, 11 activities/project (e.g., placement, routing, RCX, STA, etc.)
  • Five resource types (#cores, #memory, licenses for P&R, RCX, STA, tools)
  • Industry solution: Makespan of 41 days across all projects
  • SCM solution: Makespan of 34 days across all projects (1.4 weeks saved)
• Datacenter resource allocation
  • 24 projects, five activities (synthesis, P&R, RCX, STA, PV) per project
  • Forecast-based allocation for #servers in datacenter
  • Industry solution: Purchase 600 additional servers
  • SCM solution: Zero additional servers
• Human resource allocation
  • Four large projects
  • Four types of human resources (synthesis, P&R, verification, STA)
  • RCM solution: ~$5.2M headcount cost savings for company
• MILP solver at http://vlsicad.ucsd.edu/MILP/
40A. B. Kahng, TAU 2016
DARPA’s CRAFT Program (2016-)
• “Circuit Realization At Faster Timescales” (UCSD leads a team)
• Goal: reduce SOC design time from 130 weeks to 30 weeks
• “Iso-PAP” (Performance At Power) at 14/16nm and below
41A. B. Kahng, TAU 2016
How Do We Regain Time?
• Learn !!! (machine learning, Big Data mindset)
• Embrace the “era of optimization”: 1% = 1 week
• Take back what we’ve given away
42A. B. Kahng, TAU 2016
Flexible FF Timing Margin Recovery
• Setup time, hold time and clock-to-q (c2q) delay of FF ⇒ NOT fixed values
• Flexible FF timing model considering operating (function/test) modes, path partitioning ⇒ Reduce pessimism in timing analysis
• Sequential LP
  • setup-c2q optimization + hold-c2q optimization
• Objective: Find the best setup/hold time/c2q for each FF
[Figure: c2q-setup-hold surface; setup-hold-c2q fixed model (one setup, hold, c2q value) versus setup-hold-c2q flexible model (setup, hold, c2q1 ... c2qn)]
[ISQED14]
43A. B. Kahng, TAU 2016
“Free” Improvement of Timing
[Flow: netlist (and SPEF, if routed) → extract path timing information → LP formulation with flexible flip-flop timing model → solve sequential LP (STA_FTmax, STA_FTmin) → solution → annotate new timing model for each flip-flop → timing signoff with annotated timing]
• Fix timing violations “for free”
• 48ps average WNS improvement over 5 designs in foundry 65nm technology
• Other opportunities (jitter, ERCs, non-full rail swing, …)
44A. B. Kahng, TAU 2016
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example of worst-case capacitance corner. Capacitance distributions for Layer M1 and Layer M2 with −3σ and 3σ points marked; interconnect stack with M1 and M2; homogeneous Cw corner takes M1 C and M2 C both at 3σ; pessimism]
[ICCD14]
45A. B. Kahng, TAU 2016
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example of worst-case capacitance corner. Capacitance distributions for Layer M1 and Layer M2 with −3σ and 3σ points marked; interconnect stack with M1 and M2; homogeneous Cw corner takes M1 C and M2 C both at 3σ; pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with #layers
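The growth of pessimism with layer count can be illustrated with a simple statistical model. The sketch below assumes that the stack capacitance is a sum of per-layer Gaussian contributions with a common pairwise correlation `rho`; this model, the function names, and the unit sigmas are illustrative assumptions, not the formulation of [ICCD14]. A homogeneous corner adds every layer's 3σ, while the statistical 3σ of the sum grows more slowly unless the layers are fully correlated.

```python
import math


def homogeneous_3sigma(sigmas):
    """Corner built by taking every layer at its own 3-sigma simultaneously."""
    return sum(3.0 * s for s in sigmas)


def statistical_3sigma(sigmas, rho=0.0):
    """3-sigma of the summed variation with pairwise correlation rho."""
    variance = sum(s * s for s in sigmas)
    for i in range(len(sigmas)):
        for j in range(i + 1, len(sigmas)):
            variance += 2.0 * rho * sigmas[i] * sigmas[j]
    return 3.0 * math.sqrt(variance)


def pessimism_ratio(n_layers, sigma=1.0, rho=0.0):
    """Homogeneous corner divided by statistical 3-sigma; 1.0 means no pessimism."""
    sigmas = [sigma] * n_layers
    return homogeneous_3sigma(sigmas) / statistical_3sigma(sigmas, rho)


for n in (1, 2, 4, 9):
    print(n, round(pessimism_ratio(n), 3), round(pessimism_ratio(n, rho=1.0), 3))
```

With uncorrelated, equal-sigma layers the ratio is the square root of the layer count, and with full correlation it stays at 1, which matches the qualitative statement on the slide.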
46A. B. Kahng, TAU 2016
Tightened BEOL Corners (“TBC”)
[Flow, conventional signoff: routed design → timing analysis using conventional BEOL corners (CBC) → violation = 0? If no: ECO using CBC and repeat. If yes: done]
[Flow, UCSD 2014: routed design → classify timing critical paths into G_TBC and G_CBC → timing analysis using TBC for G_TBC and timing analysis using CBC for G_CBC → violation = 0? If no: ECO using TBC or ECO using CBC, respectively, and repeat. If yes: done]
[ICCD14]
47A. B. Kahng, TAU 2016
Pessimism in Conventional BEOL Corners (CBC)
• Assumption: a max (setup) path $p_j$ is “safe” when the delay evaluated at a given CBC is larger than nominal delay + $3\sigma_j$:
  $d_j(Y_{CBC}) \geq 3\sigma_j + d_j(Y_{typ})$
• For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC:
  $\alpha_j = 3\sigma_j / \Delta d_j(Y_{CBC})$, where $\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ})$
  $Y_{CBC} \in \{Y_{cw}, Y_{cb}, Y_{rcw}, Y_{rcb}\}$
• A small $\alpha_j$ implies there is a large pessimism
[Figure: path delay distribution comparing $3\sigma_j$ with $d_j(Y_{CBC}) - d_j(Y_{typ})$; the gap between them is the large pessimism]
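The scaling factor is easy to compute once a path's nominal delay, corner delay, and delay sigma are known. The sketch below follows the definition on the slide, alpha = 3σ / (d(Y_CBC) − d(Y_typ)); the function names and the example delays are hypothetical.

```python
def scaling_factor(sigma, d_cbc, d_typ):
    """alpha_j = 3*sigma_j / (d_j(Y_CBC) - d_j(Y_typ)); small alpha means large pessimism."""
    delta = d_cbc - d_typ
    if delta <= 0.0:
        raise ValueError("corner delay must exceed nominal delay for a max (setup) path")
    return 3.0 * sigma / delta


def is_safe(sigma, d_cbc, d_typ):
    """Slide's safety assumption: d(Y_CBC) >= d(Y_typ) + 3*sigma, i.e. alpha <= 1."""
    return d_cbc >= d_typ + 3.0 * sigma


# Hypothetical path: 1000 ps nominal, 1100 ps at the CBC, sigma of 10 ps.
alpha = scaling_factor(sigma=10.0, d_cbc=1100.0, d_typ=1000.0)
print(alpha, is_safe(10.0, 1100.0, 1000.0))
```

In this made-up case the corner covers the statistical 3σ more than three times over, which is the kind of path the tightened corners target.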
48A. B. Kahng, TAU 2016
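Scaling Factor α and Delay Variation
• Paths with small $\Delta d_{rcw}$ and $\Delta d_{cw}$ have large α
• E.g., there are $\alpha_j > 0.6$ when (($\Delta d_{rcw} < 3\%$) AND ($\Delta d_{cw} < 3\%$))
• Identify paths for tightened BEOL corners based on $\Delta d_{rcw}$ and $\Delta d_{cw}$
[Figure: α plotted against $\Delta d(Y_{cw})/d(Y_{typ})$ and $\Delta d(Y_{rcw})/d(Y_{typ})$]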
49A. B. Kahng, TAU 2016
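Find Paths for Which TBCs Can Be Used
Gtbc = paths which can be safely signed off using tightened corners: paths with (($\Delta d_{cw}$ larger than $A_{cw}$) OR ($\Delta d_{rcw}$ larger than $A_{rcw}$))
[Figure: paths plotted by $\Delta d(Y_{cw})/d(Y_{typ})$ and $\Delta d(Y_{rcw})/d(Y_{typ})$, with thresholds $A_{cw}$ and $A_{rcw}$ marked]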
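The selection rule for Gtbc is a simple threshold test on the relative delay increase at the Cw and RCw corners. The sketch below assumes that both thresholds are expressed as fractions of nominal delay. The 3% values are placeholders borrowed from the example on the previous slide, and the path data is invented for illustration.

```python
def in_gtbc(d_typ, d_cw, d_rcw, a_cw=0.03, a_rcw=0.03):
    """True if the path may be signed off with tightened BEOL corners.

    Rule from the slide: relative delta at Cw larger than A_cw
    OR relative delta at RCw larger than A_rcw.
    """
    rel_cw = (d_cw - d_typ) / d_typ
    rel_rcw = (d_rcw - d_typ) / d_typ
    return rel_cw > a_cw or rel_rcw > a_rcw


# Hypothetical paths: (name, nominal delay, delay at Cw, delay at RCw), all in ps.
paths = [
    ("p1", 1000.0, 1050.0, 1010.0),   # 5% at Cw: wire-dominated, large corner delta
    ("p2", 1000.0, 1010.0, 1020.0),   # 1% and 2%: small deltas, stays on CBC
]
g_tbc = [name for name, d_typ, d_cw, d_rcw in paths if in_gtbc(d_typ, d_cw, d_rcw)]
g_cbc = [name for name, d_typ, d_cw, d_rcw in paths if not in_gtbc(d_typ, d_cw, d_rcw)]
print(g_tbc, g_cbc)
```

Paths with small corner deltas have large α and little pessimism to recover, so they remain in the conventional-corner group.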
50A. B. Kahng, TAU 2016
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• #paths with timing violations is reduced by 24% to 100%
• TBC-0.5 configuration has little benefit because there are not many paths in Gtbc
Correlation factor, γ = 0.5
[Figures: WNS (ns), TNS (ns), and #timing violations for LEON, SUPERBLUE12, and NETCARD under CBC, TBC-0.5, TBC-0.6, and TBC-0.7]
51A. B. Kahng, TAU 2016
How Do We Regain Time?
• Learn correlations!
• Enter the “era of optimization”: 1% = 1 week
• Take back what we’ve given away
• Stop wasting time
52A. B. Kahng, TAU 2016
Poor Enablement, Poor Results
• Academic libraries
  • OpenPDK
  • SAED 32/28
  • 15nm FreePDK
  • ISPD Sizing Contest Library
• Academic enablements (PDKs, libraries, etc.) quite weak
53A. B. Kahng, TAU 2016
Example: 15nm OpenPDK
• Cell delays not realistic
• RC information missing
  • Cannot extract wire capacitances with commercial RCX tools
• Complex LEF rules missing

Des/Clust/Port     Wire Load Model       Library
------------------------------------------------
wb_dma_top         wl_zero               NanGate_15nm_OCL

Point                                      Fanout      Cap    Trans     Incr     Path
-------------------------------------------------------------------------------------
clock clk_i (rise edge)                                                0.000    0.000
clock network delay (ideal)                                            0.000    0.000
u3_u1_slv_adr_reg_9_/CLK (DFFRNQ_X1)                          0.000    0.000    0.000 r
u3_u1_slv_adr_reg_9_/Q (DFFRNQ_X1)                            2.494   10.094   10.094 f
slv0_adr[9] (net)                               1    0.807             0.000   10.094 f
U3390/ZN (NOR2_X1)                                            5.483    3.642   13.735 r
n2388 (net)                                     1    1.616             0.000   13.735 r
U2231/ZN (NAND2_X2)                                           4.255    3.216   16.952 f
n2593 (net)                                     3    2.164             0.000   16.952 f
U3389/ZN (INV_X1)                                             3.705    2.917   19.868 r
n3228 (net)                                     3    1.990             0.000   19.868 r
U3387/ZN (NAND2_X1)                                           6.314    4.207   24.075 f
n3230 (net)                                     3    2.198             0.000   24.075 f
U4136/Z (OR2_X1)                                              3.102    5.762   29.837 f
n2318 (net)                                     2    1.509             0.000   29.837 f
U3373/ZN (INV_X1)                                             2.093    1.799   31.636 r
n3435 (net)                                     1    0.840             0.000   31.636 r
U3372/Z (BUF_X2)                                             15.410   10.845   42.481 r
n2367 (net)                                    31   21.353             0.000   42.481 r
U3992/ZN (AOI22_X1)                                           9.081    5.892   48.373 f
n3388 (net)                                     1    0.631             0.000   48.373 f
U3185/ZN (NAND4_X1)                                           7.109    2.862   51.235 r
u0_N3065 (net)                                  1    0.485             0.000   51.235 r
u0_wb_rf_dout_reg_22_/D (DFFRNQ_X1)                           7.109    0.000   51.235 r
data arrival time                                                               51.235

clock clk_i (rise edge)                                               60.000   60.000
clock network delay (ideal)                                            0.000   60.000
u0_wb_rf_dout_reg_22_/CLK (DFFRNQ_X1)                                  0.000   60.000 r
library setup time                                                    -8.764   51.236
data required time                                                              51.236
-------------------------------------------------------------------------------------
data required time                                                              51.236
data arrival time                                                              -51.235
-------------------------------------------------------------------------------------
slack (MET)                                                                      0.002
Clock Period = 60ps? (1.5ns with 28nm foundry)
Stage delay: 2ps~30ps
STA report from [EDA tool]
54A. B. Kahng, TAU 2016
Example: ISPD13 Gate Sizing Contest Library
• “Gap” between academic benchmarks and industry designs
  • Unrealistic timing library
  • Missing MCMM
  • Missing multiple power domains
  • Missing multiple clock domains
  • Missing memories / macro cells
  • No support for standard formats (.spef, .v, .sdc, .lib)
  • …
See “A2A” from UCSD: “horizontal benchmark extension”
http://vlsicad.ucsd.edu/Publications/Conferences/313/c313.pdf
55A. B. Kahng, TAU 2016
Poor Research Enablement Has Costs
• “Good” academic sizers cannot be used for industry designs
• No MCMM vs. MCMM ⇒ Resource (memory/runtime) problem
• Simple vs. complicated timing models ⇒ Timing accuracy problem
• Few benchmarks vs. industry designs ⇒ Academic sizers don’t port well
  • Overtrained on a particular suite of “benchmarks”
  • Timing/power characteristics, intuition mismatched to reality, actual outcomes
[Figure: benchmark netcard; results for aSizer1 (academic sizer) compared with cSizer1 (commercial sizer)]
[GLSVLSI14]
Commercial sizer wins with foundry technologies (similar leakage, better timing slack, better runtime)
56A. B. Kahng, TAU 2016
How Do We Regain Time?
• Learn correlations!
• Enter the “era of optimization”: 1% = 1 week
• Take back what we’ve given away
• Stop wasting time
• Harvest low-hanging fruits
57A. B. Kahng, TAU 2016
FEOL: Layout-Dependent Effects
• Layout dependent variations
  • Variation in poly pitch
  • Well-proximity effects: closer to well edge ⇒ more Vth shift
  • Intentional and unintentional stress: LOD, STI, DSL and SiGe
  • Pattern dependent dishing and oxide erosion
[Mark Zwolinski, ISPD2013]
58A. B. Kahng, TAU 2016
BEOL: Statistical RC Extraction
[source: R. Jiang, Synopsys, 2005]
• Statistical RC extraction flow comprehends spatial correlation of interconnect variations
• Proposed by industry, then dropped …. (???)
59A. B. Kahng, TAU 2016
Multi-Die: Signoff Corners
• Example: inter-die process variation limits performance improvement of 3DICs
• What if SS Tier 0 and SS Tier 1 will never be stacked together?
[Diagram: 3D integration of an SS Tier 1 wafer/die with an FF Tier 0 wafer]
Wafer-to-wafer (die-to-wafer) bonding: integrate SS wafer/die with FF wafer/die (SS Tier 0 wafer/die + FF Tier 1 wafer or FF Tier 0 wafer/die + SS Tier 1 wafer)
[Figure: WNS (ps), axis from −180 to −20, for SS-SS, SS-FF, and FF-SS; mix-and-match annotated with 75ps. XX-YY = XX Tier 0 + YY Tier 1. Technology: 28FDSOI. 3D netlist is bipartitioned with min-cut]
[DATE16]
60A. B. Kahng, TAU 2016
Multi-Die Design for “Mix-and-Match”
• Partition netlist such that paths have balanced delay across two tiers ⇒ Maximizes timing benefit from mix-and-match
• 16% performance increase at signoff compared to existing flows
  Design    Clk period
  M0        1.2ns
  AES       1.1ns
  VGA       1.0ns
[Figure: WNS (ps) for ARM M0, AES, and VGA. Series: Brute-force (orig), Brute-force (opt), Shrunk2D (orig), Shrunk2D (opt), GT2012 (orig), GT2012 (opt)]
Technology: 28FDSOI
61A. B. Kahng, TAU 2016
SAMP + Cutmask: Dummy and EOL ΔTiming
• Self-aligned multiple patterning (SAMP) + Cutmask
• Cut shapes and locations determine dummy wires, end-of-line (EOL) extensions of wire segments ⇒ affect performance
• BACUS15: Co-optimization of
  • Cut mask minimum spacing rules
  • EOL extension with usage of multiple cut masks
  • Metal density constraints (dummy fills)
• Insight into achievable tradeoff of performance and cost
[Figure: original layout (1D wires, cut masks) and final layout, with cut, extension, and dummy fill labeled]
[BACUS15]
62A. B. Kahng, TAU 2016
Timing Impacts
• Best vs. Worst EOL extension
  • BEST ILP solution: little impact of EOL extensions on timing
  • WORST ILP solution in N5 degrades up to 196ps compared to N7
• Post-ILP optimization is beneficial to timing
• Different metal density with up to 14ps difference
[Figure: Changes in WNS (ns), BEST versus WORST, for ARM Cortex M0, AES, and JPEG at N7 and N5]
[Figure: Change in WNS (ns) for different target metal density (40%, 42.5%, 45%) for ARM Cortex M0, AES, and JPEG]
63A. B. Kahng, TAU 2016
How Do We Regain Time?
• Learn correlations, predictions: kill loops, risk
• Enter the “era of optimization”: 1% = 1 week
• Take back what we’ve given away
• Stop wasting time
• Harvest low-hanging fruits
• + resilience, adaptivity (signoff at TT), reliability pessimism
  • Min cost of resilience (“MinRazor”; OD, AVS-BTI signoff)
  • PVS ROs
• + new paradigms (stochastic, approximate)
• + pure optimization QOR (P&R&Opt for N10, N7)
• ? faster RTL design/optimization; DFT …
64A. B. Kahng, TAU 2016
Costs of Reliability
[Diagram: dependency graph among AF (α), Jrms, temperature, wire width, wire spacing, driver size, supply voltage, frequency, slew rate, load/fanout, gate length, junction resistance, |ΔVthp|, |ΔVthn|, timing slack, and MTTF. Each edge is marked as a direct relation (if A increases then B increases) or an inverse relation (if A increases then B decreases) and is labeled by mechanism: EM, TDDB, NBTI, HCI, or general. Parameters are marked as tunable at design or runtime, or tunable at design]
65A. B. Kahng, TAU 2016
N10, N7 P&R: “Opt” Methods
[Figure: layout examples showing a DB violation, MinIW violations, and a MinOW violation; cells are moved or flipped to fix them]
66A. B. Kahng, TAU 2016
In Search of Lost Time
67A. B. Kahng, TAU 2016
Summary
• Time = schedule, universal currency, $$$
• We lose time in many ways
  • Some unavoidable
• Learning, big data, optimization
• Take back what we’ve given away
• Stop pointless waste
• Many low-hanging fruits
• Recovering lost Time = “equivalent scaling”
• EDA continues the Moore’s-Law value trajectory
68A. B. Kahng, TAU 2016
THANK YOU !