Post on 01-Jan-2016
-1-
UC San Diego / VLSI CAD Laboratory
Construction of Realistic Gate Sizing Benchmarks With Known Optimal Solutions
Andrew B. Kahng, Seokhyeong Kang
VLSI CAD Laboratory, UC San Diego
International Symposium on Physical Design, March 27th, 2012
-2-
Outline
Background and Motivation
Benchmark Generation
Experimental Framework and Results
Conclusions and Ongoing Work
-3-
Gate Sizing in VLSI Design
Gate sizing
– Essential for power, delay and area optimization
– Tunable parameters: gate width, gate length and threshold voltage
– Sizing problems arise in all phases of the RTL-to-GDS flow
Common heuristics/algorithms
– LP, Lagrangian relaxation, convex optimization, DP, sensitivity-based gradient descent, ...
1. Which heuristic is better?
2. How suboptimal is a given sizing solution?
A systematic and quantitative comparison is required
-4-
Suboptimality of Sizing Heuristics
Eyecharts*
– Built from three basic topologies and optimally sized with DP, which allows suboptimalities to be evaluated
– Not realistic: Eyechart circuits differ in topology from real designs, with large depth (650 stages) and a small Rent parameter (0.17)
More realistic benchmarks are required, along with an automated generation flow
*Gupta et al., “Eyecharts: Constructive Benchmarking of Gate Sizing Heuristics”, DAC 2010.
[Figure: the three basic Eyechart topologies: Chain, Mesh, Star]
-5-
Our Work: Realistic Benchmark Generation w/ Known Optimal Solution
1. Propose benchmark circuits with known optimal solutions
2. The benchmarks resemble real designs
– Gate count, path depth, Rent parameter and net degree
3. Assess the suboptimality of standard gate sizing approaches
Automated benchmark generation flow:
Real design → Extract parameters → Characteristic parameters → Netlist generator (Construct chains → Find optimal solution → Connect chains keeping the optimal solution) → Benchmark circuit w/ known optimal solution
-6-
Outline
Background and Motivation
Benchmark Considerations and Generation
Experimental Framework and Results
Conclusions and Ongoing Work
-7-
Benchmark Considerations
Realism vs. tractability to analysis: opposing goals
To construct realistic benchmarks: use design characteristic parameters
– # primary ports, path depth, fanin/fanout distribution
To enable known optimal solutions
– Library simplification as in Gupta et al. 2010: slew-independent library
[Figure: fanin and fanout distributions (1 to 6) for design JPEG Encoder]
Fanin distribution: 25% 1-input, 60% 2-input, 15% ≥3-input
Path depth: 72; avg. net degree: 1.84; Rent parameter: 0.72
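The characteristic parameters above can be extracted from a gate-level netlist. Below is a minimal Python sketch, not the extraction tool used in this work, assuming a hypothetical dict-based netlist (`gate_inputs`, `net_driver`), that net degree counts sink pins per net (an assumption suggested by averages below 2, since a net with its driver always has at least 2 pins), and that path depth is the number of gates on the longest path.

```python
from collections import Counter

def characteristic_params(gate_inputs, net_driver):
    """Return (fanin distribution, average net degree, path depth).

    gate_inputs: gate name -> list of nets feeding its input pins
    net_driver:  net name -> driving gate, or a primary-input name
    Assumptions: net degree = sink pins per net (nets with no sinks are
    skipped), path depth = gates on the longest path; netlist is acyclic.
    """
    n_gates = len(gate_inputs)
    # Fanin distribution, with 3 or more inputs lumped into one bucket.
    fanin = Counter(min(len(ins), 3) for ins in gate_inputs.values())
    fid = {k: v / n_gates for k, v in sorted(fanin.items())}

    # Sink pins per net, averaged over nets that have at least one sink.
    sinks = Counter(net for ins in gate_inputs.values() for net in ins)
    avg_degree = sum(sinks.values()) / len(sinks)

    # Longest path in gates, memoized; primary inputs contribute depth 0.
    depth = {}
    def gate_depth(g):
        if g not in depth:
            preds = [net_driver[n] for n in gate_inputs[g]
                     if net_driver[n] in gate_inputs]
            depth[g] = 1 + max((gate_depth(p) for p in preds), default=0)
        return depth[g]
    path_depth = max(gate_depth(g) for g in gate_inputs)
    return fid, avg_degree, path_depth

# Hypothetical three-gate netlist.
gate_inputs = {"g1": ["a", "b"], "g2": ["n1"], "g3": ["n1", "n2", "c"]}
net_driver = {"a": "PI_a", "b": "PI_b", "c": "PI_c", "n1": "g1", "n2": "g2"}
fid, avg_degree, path_depth = characteristic_params(gate_inputs, net_driver)
print(fid, avg_degree, path_depth)
```

The recursion is fine for a toy example; a netlist the size of JPEG Encoder would call for an iterative topological traversal instead.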
-8-
Benchmark Generation
Input parameters
1. Timing budget T
2. Depth of data path K
3. Number of primary ports N
4. Fanin and fanout distributions fid(i), fod(j)
Constraints
– T should be larger than the minimum delay of a K-stage chain
Generation flow
1. Construct N chains with depth K
2. Attach connection cells (C cells)
3. Connect chains → netlist with N*K + C cells
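The constraint on T can be checked with a small dynamic program over gate sizes, because a larger gate is faster but loads its predecessor more. This sketch assumes a slew-independent library in which delay depends only on the driven load; the two-size `LIB` reuses the toy inverter library from the chain example in this deck, and `min_chain_delay` and `budget_is_feasible` are hypothetical helper names.

```python
# Toy slew-independent library: size -> (input cap, leakage, {load: delay}).
# Values follow the inverter example in the "Solving a Chain Optimally" slide.
LIB = {
    1: (3, 5, {3: 3, 6: 4}),
    2: (6, 10, {3: 1, 6: 2}),
}

def min_chain_delay(K, out_load, lib=LIB):
    """Minimum total delay of a K-stage chain driving out_load.

    Works from the output back toward the input: best[cap] is the minimum
    delay of the stages processed so far when the first of them is a gate
    with input capacitance cap (i.e., the load seen by the stage before it).
    """
    best = {}
    for cap, _leak, delay in lib.values():
        best[cap] = min(best.get(cap, float("inf")), delay[out_load])
    for _ in range(K - 1):
        new = {}
        for cap, _leak, delay in lib.values():
            # This stage drives whichever next-stage input cap is best.
            d = min(delay[load] + rest for load, rest in best.items())
            new[cap] = min(new.get(cap, float("inf")), d)
        best = new
    return min(best.values())

def budget_is_feasible(T, K, out_load, lib=LIB):
    # Slide constraint: T should be larger than the min delay of a K-stage chain.
    return T > min_chain_delay(K, out_load, lib)

print(min_chain_delay(3, 6), budget_is_feasible(8, 3, 6))  # prints: 6 True
```

For the three-inverter example, the fastest chain (all size 2) has delay 6, so the budget Dmax = 8 satisfies the constraint.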
-9-
Benchmark Generation: Construct Chains
1. Construct N chains, each with depth K (N*K cells)
2. Assign gate instances according to fid(i)
3. Assign # fanouts to output ports according to fod(j)
Assignment strategy: arranged and random
[Figure: chains chain1 ... chainN with stages stage1 ... stageK; gates gate(1,1) ... gate(N,K)]
-10-
Benchmark Generation: Construct Chains
[Figure: fanin and fanout assignment, arranged assignment vs. random assignment]
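One way to realize the two assignment strategies, sketched under assumptions: `fid` and `fod` are given as probability tables, gates are sampled independently, and "arranged" means sorting sampled fanins ascending and fanouts descending along the stages (the rule stated later in the deck: larger fanin at later stages, larger fanout at earlier stages). The `arranged_ratio` parameter is a hypothetical knob for mixing the two strategies, in the spirit of the connectivity study; `build_chains` is not the authors' generator.

```python
import random

def build_chains(N, K, fid, fod, arranged_ratio=1.0, seed=0):
    """Assign (fanin, fanout) counts to gate(n, k) of N chains with depth K.

    fid/fod: count -> probability (hypothetical input format).
    A fraction `arranged_ratio` of the slots is filled 'arranged': sampled
    fanins ascending and fanouts descending in stage order, so larger fanin
    lands in later stages and larger fanout in earlier ones. The remaining
    slots keep the independently sampled (random) values.
    """
    rng = random.Random(seed)
    cells = N * K
    fanins = rng.choices(list(fid), weights=list(fid.values()), k=cells)
    fanouts = rng.choices(list(fod), weights=list(fod.values()), k=cells)

    # Slot s is stage s // N, chain s % N (stage-major order).
    n_arr = round(arranged_ratio * cells)
    arr_slots = sorted(rng.sample(range(cells), n_arr))
    arr_set = set(arr_slots)
    rand_slots = [s for s in range(cells) if s not in arr_set]

    fin, fout = [0] * cells, [0] * cells
    for s, fi, fo in zip(arr_slots, sorted(fanins[:n_arr]),
                         sorted(fanouts[:n_arr], reverse=True)):
        fin[s], fout[s] = fi, fo
    for s, fi, fo in zip(rand_slots, fanins[n_arr:], fanouts[n_arr:]):
        fin[s], fout[s] = fi, fo
    # gate(n, k) with 1-based chain n and stage k, as on the slide
    return {(s % N + 1, s // N + 1): (fin[s], fout[s]) for s in range(cells)}

chains = build_chains(N=4, K=5, fid={1: 0.25, 2: 0.60, 3: 0.15},
                      fod={1: 0.5, 2: 0.3, 3: 0.2}, arranged_ratio=0.5)
```

The `fid` values reuse the JPEG Encoder fanin split; the `fod` values are made up for illustration.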
-11-
Benchmark Generation: Find Optimal Solution with DP
1. Attach connection cells to all open fanouts
– to connect chains while keeping the optimal solution
2. Perform dynamic programming with timing budget T
– the optimal solution is achievable with a slew-independent library
[Figure: chains chain1 ... chainN, before and after attaching connection cells]
-12-
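Benchmark Generation: Solving a Chain Optimally (Example)
Chain: INV1 → INV2 → INV3, output load = 6, Dmax = 8
Library (slew-independent):
Size | Input cap | Leakage power | Delay @ load 3 | Delay @ load 6
1 | 3 | 5 | 3 | 4
2 | 6 | 10 | 1 | 2
Stage 1 (INV1), Load = 3
Budget: 1 2 3 4 5 6 7 8
Power: 10 10 5 5 5 5 5 5
Size: 2 2 1 1 1 1 1 1
Stage 1 (INV1), Load = 6
Budget: 2 3 4 5 6 7 8
Power: 10 10 5 5 5 5 5
Size: 2 2 1 1 1 1 1
Stage 2 (INV1 + INV2; Size = INV2), Load = 3
Budget: 3 4 5 6 7 8
Power: 20 15 15 10 10 10
Size: 2 1 2 1 1 1
Stage 2 (INV1 + INV2; Size = INV2), Load = 6
Budget: 4 5 6 7 8
Power: 20 15 15 10 10
Size: 2 1 2 1 1
Stage 3 (whole chain; Size = INV3), Load = 6, Budget = 8
INV3 size 1: Power 20 (chosen)
INV3 size 2: Power 25
OPTIMIZED CHAIN: size 2, size 1, size 1 (power 10 + 5 + 5 = 20; delay 1 + 3 + 4 = 8)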
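The example above is a budgeted dynamic program: for each stage, driven load and delay budget, keep the minimum-leakage choice. Here is a minimal Python sketch, assuming integer delays and the two-size inverter library of the example; `size_chain` is a hypothetical name, and memoizing `best(k, b, load)` plays the role of the per-stage Budget/Power/Size tables.

```python
from functools import lru_cache

# Library from the example: size -> (input cap, leakage power, {load: delay}).
LIB = {1: (3, 5, {3: 3, 6: 4}), 2: (6, 10, {3: 1, 6: 2})}
INF = float("inf")

def size_chain(K, out_load, budget, lib=LIB):
    """Min-leakage sizing of a K-stage chain with total delay <= budget.

    best(k, b, load): min power of stages 1..k when stage k drives `load`
    and stages 1..k may use delay budget b (integer delays assumed).
    Returns (power, [size of stage 1, ..., size of stage K]).
    """
    @lru_cache(maxsize=None)
    def best(k, b, load):
        result = (INF, ())
        for size, (cap, leak, delay) in lib.items():
            rest = b - delay[load]
            if rest < 0:
                continue  # this size alone already exceeds the budget
            if k == 1:
                cand = (leak, (size,))
            else:
                # stage k-1 drives stage k's input capacitance
                p, sizes = best(k - 1, rest, cap)
                cand = (p + leak, sizes + (size,))
            if cand[0] < result[0]:
                result = cand
        return result

    power, sizes = best(K, budget, out_load)
    return power, list(sizes)

print(size_chain(3, 6, 8))  # prints: (20, [2, 1, 1])
```

With INV3 at size 2 the best achievable power is 25, which is the rejected entry in the stage 3 table. An infeasible budget (below the minimum chain delay of 6) returns an infinite power and an empty sizing.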
-13-
Benchmark Generation: Connect Chains
1. Run STA and find the arrival time of each gate
2. Connect each connection cell to an open fanin port
– connect only if timing constraints are satisfied
– connection cells do not change the optimal chain solution
3. Tie unconnected ports to logic high or low
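[Figure: connection cell c driving an open fanin port of gate g (labels: a_c, a_g, d_gc, w_c,g); chains chain1 ... chainN before and after connection, with unconnected ports tied to VDD]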
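The timing check for connecting chains can be sketched as a greedy matching. The criterion here is a hypothetical one, not necessarily the exact rule used in this work: a connection cell c may drive an open fanin pin of gate g only if its output arrival time a_c (assumed to already include the connection cell's own delay) is no later than the latest arrival already present at g's inputs, a_g. Then g's arrival time, and the downstream timing of the DP solution, stay unchanged. Because gate delays are positive, the same check also rules out combinational loops: if a path led from g to c, a_c would exceed a_g.

```python
import bisect

def connect_chains(conn_cells, open_pins):
    """Greedy sketch of the 'connect chains' step (hypothetical criterion).

    conn_cells: connection cell -> a_c, arrival time at its output
    open_pins:  (gate, pin) -> a_g, latest arrival already at that gate's inputs
    A cell may drive a pin only if a_c <= a_g, so the gate's arrival time,
    and hence the timing of the DP-optimal chains, is unchanged.
    Returns (edges, pins_to_tie): chosen (cell, pin) pairs and open pins left
    over, which would then be tied to logic high or low.
    """
    pins = sorted(open_pins.items(), key=lambda kv: kv[1])
    arrivals = [a for _, a in pins]
    taken = [False] * len(pins)
    edges = []
    # Latest-arriving cells are the most constrained, so place them first,
    # each on the earliest-arriving free pin that still satisfies a_c <= a_g.
    for cell, a_c in sorted(conn_cells.items(), key=lambda kv: -kv[1]):
        i = bisect.bisect_left(arrivals, a_c)
        while i < len(pins) and taken[i]:
            i += 1
        if i < len(pins):
            taken[i] = True
            edges.append((cell, pins[i][0]))
    pins_to_tie = [pin for (pin, _), t in zip(pins, taken) if not t]
    return edges, pins_to_tie

# Hypothetical arrival times (ns).
edges, pins_to_tie = connect_chains(
    {"c1": 0.5, "c2": 0.9, "c3": 1.4},
    {("g7", "A"): 0.6, ("g9", "B"): 1.0, ("g4", "A"): 0.2})
```

In this toy case c3 arrives too late for any pin, and pin A of g4 is left to be tied off.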
-14-
Benchmark Generation: Generated Netlist
Generated output:
– benchmark circuit of N*K + C cells with a known optimal solution
[Figure: schematic of generated netlist (N = 10, K = 20)]
Chains are connected to each other in various topologies
-15-
Outline
Background and Motivation
Benchmark Generation
Experimental Framework and Results
Conclusions and Ongoing Work
-16-
Experimental Setup
Delay and power model (library)
– LP: linear increase in power (gate sizing context)
– EP: exponential increase in power (Vt or gate-length context)
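To make the LP/EP distinction concrete, here is a toy generator of leakage values across one cell's options. The base power `p0`, the `growth` factor and the option count are illustrative assumptions, not the libraries used in the experiments; `leakage_options` is a hypothetical name.

```python
def leakage_options(n_options, model, p0=1.0, growth=2.0):
    """Toy leakage values for one cell's options, lowest leakage first.

    LP: leakage rises linearly with the option index (gate sizing context).
    EP: leakage rises exponentially (Vt or gate-length context).
    p0 and growth are illustrative assumptions, not the study's libraries.
    """
    if model == "LP":
        return [p0 * (1 + growth * k) for k in range(n_options)]
    if model == "EP":
        return [p0 * growth ** k for k in range(n_options)]
    raise ValueError(f"unknown model: {model!r}")

print(leakage_options(4, "LP"), leakage_options(4, "EP"))
```

Under EP, each faster option costs disproportionately more leakage, which is what separates a Vt-swap setting from a width-sizing setting in the results.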
Heuristics compared
– Two commercial tools (BlazeMO, Cadence Encounter)
– UCLA sizing tool
– UCSD sensitivity-based leakage optimizer
Realistic benchmarks: six open-source designs
Suboptimality calculation
$\text{Suboptimality} = \frac{\text{power}_{\text{heuristic}} - \text{power}_{\text{opt}}}{\text{power}_{\text{opt}}}$
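The metric is straightforward to compute. The sketch below also aggregates it as the later results are summarized (maximum and average over designs). The benchmark names and power values are hypothetical.

```python
def suboptimality(power_heuristic, power_opt):
    """(power_heuristic - power_opt) / power_opt, as a fraction."""
    return (power_heuristic - power_opt) / power_opt

# Hypothetical (heuristic, optimal) leakage pairs for two benchmarks.
results = {"bench_a": (130.0, 100.0), "bench_b": (110.0, 100.0)}
subs = {name: suboptimality(h, o) for name, (h, o) in results.items()}
worst = max(subs.values())
average = sum(subs.values()) / len(subs)
print(f"up to {worst:.1%}, avg. {average:.1%}")  # prints: up to 30.0%, avg. 20.0%
```

A negative value would mean the heuristic used less power than the reference solution.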
-17-
Generated Benchmark - Complexity
Complexity (suboptimality) of generated benchmarks: chain-only vs. connected-chain topologies
[Figure: suboptimality of chain-only and connected benchmarks in two panels (Commercial tool, Greedy); x-axis: [library]-[N]-[k] configurations EP-40-20, EP-40-40, EP-80-20, EP-80-40, LP-40-20, LP-40-40, LP-80-20, LP-80-40; y-axis: suboptimality]
Chain-only: avg. 2.1%; connected-chain: avg. 12.8%
-18-
Generated Benchmark - Connectivity
Problem complexity and circuit connectivity
1. Arranged assignment: improves connectivity (larger fanin at later stages, larger fanout at earlier stages)
2. Random assignment: improves diversity of topology
Arranged | Random | Unconnected | Subopt.
100% | 0% | 0.00% | 2.60%
75% | 25% | 0.00% | 6.80%
50% | 50% | 0.25% | 10.30%
25% | 75% | 0.75% | 11.20%
0% | 100% | 17.00% | 7.70%
-19-
Suboptimality w.r.t. Parameters
[Figure: suboptimality (8% to 14%) and runtime (min, log scale) of Comm, Greedy and SensOpt versus number of chains (40, 80, 160, 320, 640)]
[Figure: suboptimality (8% to 14%) and runtime (min, log scale) of Comm, Greedy and SensOpt versus number of stages (20, 40, 60, 80, 100)]
The total number of paths increases significantly with N and K
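The growth in path count can be illustrated by counting input-to-output paths in a DAG with a dynamic program in topological order. The `ladder` netlist is a hypothetical toy (each gate also drives the next stage of a neighboring chain), not a generated benchmark. In it, every gate at stage k (counted from 1) is reached by 2^(k-1) paths, so the total is N·2^(K-1): linear in N but exponential in K.

```python
from collections import defaultdict, deque

def count_paths(edges, nodes):
    """Number of source-to-sink paths in a DAG, via DP in topological order."""
    succ = defaultdict(list)
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    # paths[n]: number of paths that start at some source and end at n
    paths = {n: 1 if indeg[n] == 0 else 0 for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)
    while queue:  # Kahn's algorithm
        u = queue.popleft()
        for v in succ[u]:
            paths[v] += paths[u]
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return sum(paths[n] for n in nodes if not succ[n])

def ladder(N, K):
    """Hypothetical toy: N chains of K gates; gate (n, k) also drives stage
    k+1 of chain (n+1) mod N (one cross-chain connection per gate)."""
    nodes = [(n, k) for n in range(N) for k in range(K)]
    edges = []
    for n in range(N):
        for k in range(K - 1):
            edges.append(((n, k), (n, k + 1)))
            if N > 1:
                edges.append(((n, k), ((n + 1) % N, k + 1)))
    return edges, nodes

for N, K in [(10, 10), (10, 20), (20, 20)]:
    print(N, K, count_paths(*ladder(N, K)))
```

Counting paths this way stays linear in the number of edges, even though explicit path enumeration would be exponential.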
-20-
Suboptimality w.r.t. Parameters (2)
[Figure: suboptimality (0% to 120%) and runtime (min, log scale) of Comm, Greedy and SensOpt versus average net degree (1.2 to 2.4)]
[Figure: suboptimality (0% to 25%) and runtime (min, log scale) of Comm, Greedy and SensOpt versus timing constraint (0.4 to 1.1 ns)]
-21-
Generated Realistic Benchmarks
Target benchmarks
– SASC, SPI, AES, JPEG, MPEG (from OpenCores)
– EXU (from OpenSPARC T1)
Characteristic parameters of real and generated benchmarks
Design | Data depth | #Instances | Rent param. (real) | Net degree (real) | Rent param. (generated) | Net degree (generated)
SASC | 20 | 624 | 0.858 | 2.06 | 0.865 | 2.06
SPI | 33 | 1092 | 0.880 | 1.81 | 0.877 | 1.80
EXU | 31 | 25560 | 0.858 | 1.91 | 0.814 | 1.90
AES | 23 | 23622 | 0.810 | 1.89 | 0.820 | 1.88
JPEG | 72 | 141165 | 0.721 | 1.84 | 0.831 | 1.84
MPEG | 33 | 578034 | 0.848 | 1.59 | 0.848 | 1.60
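The Rent parameter in the table is the exponent p of Rent's rule, T = t·G^p, which relates the number of external terminals T of a partition to its gate count G. Below is a minimal sketch of estimating p with a least-squares fit in log-log space. Producing the (G, T) samples, e.g., by recursive partitioning of the netlist, is outside the sketch, and the synthetic samples are hypothetical.

```python
import math

def rent_exponent(samples):
    """Least-squares fit of log T = log t + p log G over (G, T) samples.

    G = gates in a partition, T = terminals (nets crossing its boundary).
    Returns (t, p).
    """
    xs = [math.log(g) for g, _ in samples]
    ys = [math.log(t) for _, t in samples]
    n = len(samples)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    p = sxy / sxx                  # slope in log-log space = Rent exponent
    t = math.exp(my - p * mx)      # intercept = terminals per gate
    return t, p

# Synthetic samples that follow T = 3 * G**0.72 exactly.
samples = [(g, 3 * g ** 0.72) for g in (16, 64, 256, 1024, 4096)]
t, p = rent_exponent(samples)
print(f"t = {t:.2f}, p = {p:.2f}")
```

On real partition data the points scatter, and the fit is usually restricted to the region before the rule breaks down for very large partitions.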
-22-
Suboptimality of Heuristics
Suboptimality w.r.t. known optimal solutions for generated realistic benchmarks
[Figure: suboptimality of Comm1, Comm2, Greedy and SensOpt on eye-chart, SASC, SPI, AES, EXU, JPEG and MPEG; left panel: with EP library, right panel: with LP library]
– Vt swap context (EP library): up to 52.2%, avg. 16.3%
– Gate sizing context (LP library): up to 43.7%, avg. 25.5%
* Greedy results for MPEG are missing
-23-
Comparison w/ Real Designs
Suboptimality versus one specific heuristic (SensOpt)
[Figure: suboptimality of Comm1, Comm2 and Greedy relative to SensOpt on SASC, SPI, AES, EXU, JPEG and MPEG; left panel: real designs with a real delay/leakage library (TSMC 65nm); right panel: suboptimality from our benchmarks]
– For real designs, the actual suboptimality will be greater, since it is measured against SensOpt rather than the true optimum
Discrepancy: simplified delay model, reduced library set, ...
-24-
Conclusions
A new benchmark generation technique for gate sizing that constructs realistic circuits with known optimal solutions
Our benchmarks enable a systematic and quantitative study of common sizing heuristics
Common sizing methods are suboptimal on realistic benchmarks by up to 52.2% (Vt assignment) and 43.7% (gate sizing)
http://vlsicad.ucsd.edu/SIZING/
-25-
Ongoing Work
Analyze discrepancies between real and artificial benchmarks
Handle more realistic delay models
– Use a realistic delay library in the context of realistic benchmarks with tight upper bounds
Alternate approach for netlist generation
– (1) Cut nets in a real design and find the optimal solution; (2) reconnect the nets while keeping the optimal solution
-26-
Thank you