UC San Diego / VLSI CAD Laboratory

Construction of Realistic Gate Sizing Benchmarks With Known Optimal Solutions

Andrew B. Kahng, Seokhyeong Kang
VLSI CAD Laboratory, UC San Diego

International Symposium on Physical Design, March 27th, 2012

Outline

– Background and Motivation
– Benchmark Generation
– Experimental Framework and Results
– Conclusions and Ongoing Work

Gate Sizing in VLSI Design

Gate sizing
– Essential for power, delay and area optimization
– Tunable parameters: gate width, gate length and threshold voltage
– The sizing problem is seen in all phases of the RTL-to-GDS flow

Common heuristics/algorithms
– LP, Lagrangian relaxation, convex optimization, DP, sensitivity-based gradient descent, ...

1. Which heuristic is better?
2. How suboptimal is a given sizing solution?

A systematic and quantitative comparison is required.

Suboptimality of Sizing Heuristics

Eyechart*
– Built from three basic topologies and optimally sized with DP, which allows suboptimalities to be evaluated
– Not realistic: Eyechart circuits have a different topology from real designs, with large depth (650 stages) and a small Rent parameter (0.17)

More realistic benchmarks are required, along with an automated generation flow.

*Gupta et al., “Eyecharts: Constructive Benchmarking of Gate Sizing Heuristics”, DAC 2010.

[Figure: Eyechart topologies, labeled MESH and STAR]

Our Work: Realistic Benchmark Generation w/ Known Optimal Solution

1. Propose benchmark circuits with known optimal solutions
2. The benchmarks resemble real designs
– Gate count, path depth, Rent parameter and net degree
3. Assess suboptimality of standard gate sizing approaches

[Figure: automated benchmark generation flow. Parameters are extracted from a real design; the characteristic parameters feed the netlist generator (construct chains, find optimal solution, connect chains keeping the optimal solution), which outputs a benchmark circuit w/ known optimal solution.]

Outline

– Background and Motivation
– Benchmark Considerations and Generation
– Experimental Framework and Results
– Conclusions and Ongoing Work

Benchmark Considerations

Realism vs. tractability to analysis: opposing goals

To construct realistic benchmarks: use design characteristic parameters
– # primary ports, path depth, fanin/fanout distribution

To enable known optimal solutions
– Library simplification as in Gupta et al. 2010: slew-independent library

[Figure: fanin and fanout distributions for design: JPEG Encoder]

Fanin distribution: 25% 1-input, 60% 2-input, 15% >3-input

Path depth: 72
Avg. net degree: 1.84
Rent parameter: 0.72

Benchmark Generation

Input parameters
1. timing budget T
2. depth of data path K
3. number of primary ports N
4. fanin, fanout distribution fid(i), fod(j)

Constraints
– T should be larger than the min. delay of a K-stage chain

Generation flow
1. construct N chains with depth K
2. attach connection cells (C)
3. connect chains

Result: netlist with N*K + C cells
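The input parameters and the timing-budget constraint can be captured in a small container. This is only a sketch: the class name, field names and the `min_stage_delay` argument are hypothetical, and K times the fastest single-stage delay is used as an optimistic lower bound on the K-stage delay, so the check is necessary but not sufficient.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkParams:
    """Hypothetical container for the generator inputs T, K, N, fid, fod."""
    timing_budget: float  # T
    depth: int            # K, stages per chain
    num_ports: int        # N, one chain per primary port
    fanin_dist: dict      # fid(i): fanin count -> fraction of gates
    fanout_dist: dict     # fod(j): fanout count -> fraction of gates

    def check(self, min_stage_delay: float) -> None:
        """Raise ValueError if the parameters are obviously inconsistent."""
        for name, dist in (("fid", self.fanin_dist), ("fod", self.fanout_dist)):
            if abs(sum(dist.values()) - 1.0) > 1e-9:
                raise ValueError(f"{name} fractions must sum to 1")
        # Assumption: K * (fastest stage delay) bounds the chain delay from below.
        lower_bound = self.depth * min_stage_delay
        if self.timing_budget <= lower_bound:
            raise ValueError("T must exceed the min. delay of a K-stage chain")

    def cell_count(self, num_connection_cells: int) -> int:
        """Netlist size N*K + C."""
        return self.num_ports * self.depth + num_connection_cells


params = BenchmarkParams(8.0, 3, 10, {1: 0.25, 2: 0.60, 3: 0.15}, {1: 0.5, 2: 0.5})
params.check(min_stage_delay=1.0)
print(params.cell_count(num_connection_cells=5))
```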

Benchmark Generation: Construct Chains

1. Construct N chains each with depth K (N*K cells)
2. Assign gate instances according to fid(i)
3. Assign # fanouts to output ports according to fod(j)

Assignment strategy: arranged and random

[Figure: chain1 to chainN drawn as rows of stages stage1 to stageK, from gate(1,1) to gate(N,K)]

[Figure: fanout and fanin assignment across the chains, arranged assignment vs. random assignment]
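A minimal sketch of the two fanin assignment strategies. The function names, the rounding of the distribution into a pool of fanin values, and the stage-by-stage fill order are assumptions for illustration; the slides only state that arranged assignment puts larger fanin at later stages and that random assignment diversifies the topology.

```python
import random


def fanin_pool(total, fanin_dist):
    """Build a sorted list of `total` fanin values that follows fanin_dist."""
    pool = []
    for fanin, frac in sorted(fanin_dist.items()):
        pool += [fanin] * round(frac * total)
    # Rounding can leave the pool short or long: pad with the most likely
    # fanin (a negative pad count adds nothing), then trim to size.
    most_likely = max(fanin_dist, key=fanin_dist.get)
    pool += [most_likely] * (total - len(pool))
    return sorted(pool[:total])


def assign_fanins(n_chains, depth, fanin_dist, strategy="arranged", seed=0):
    """Return grid[chain][stage] holding the fanin of each gate."""
    pool = fanin_pool(n_chains * depth, fanin_dist)
    if strategy == "random":
        random.Random(seed).shuffle(pool)
    elif strategy != "arranged":
        raise ValueError("strategy must be 'arranged' or 'random'")
    # Fill stage by stage: with the sorted pool, stage 1 receives the smallest
    # fanins and stage K the largest (the arranged strategy).
    return [[pool[stage * n_chains + chain] for stage in range(depth)]
            for chain in range(n_chains)]


fid = {1: 0.25, 2: 0.60, 3: 0.15}
for row in assign_fanins(4, 5, fid, "arranged"):
    print(row)
```

Fanout assignment could follow the same pattern with the pool sorted in descending order, so that larger fanouts land in earlier stages.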

Benchmark Generation: Find Optimal Solution with DP

1. Attach connection cells to all open fanouts
- to connect chains while keeping the optimal solution

2. Perform dynamic programming with timing budget T
- the optimal solution is achievable w/ a slew-independent library

[Figure: chain1 to chainN before and after attaching connection cells]

Benchmark Generation: Solving a Chain Optimally (Example)

Chain: INV1, INV2, INV3 (Stage 1, Stage 2, Stage 3), Dmax = 8

Library (the load columns give the delay at that load)

size    input cap  leakage  load 3  load 6
Size 1  3          5        3       4
Size 2  6          10       1       2

DP tables (columns: Budget, Power, Size)

Stage 1, Load = 3
1  10  2
2  10  2
3  5   1
4  5   1
5  5   1
6  5   1
7  5   1
8  5   1

Stage 1, Load = 6
2  10  2
3  10  2
4  5   1
5  5   1
6  5   1
7  5   1
8  5   1

Stage 2, Load = 3
3  20  2
4  15  1
5  15  1
6  10  1
7  10  1
8  10  1

Stage 2, Load = 6
4  20  2
5  15  1
6  15  1
7  10  1
8  10  1

Stage 3
8  20  1
8  25  2

OPTIMIZED CHAIN: size 2, size 1, size 1
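The chain example can be reproduced with a short dynamic program over (stage, load, remaining budget). This is a sketch, not the authors' implementation: the function name is hypothetical, the library values are taken from the example, and the load on the last inverter is assumed to be 6, which is the value consistent with the Stage 3 entries (power 20 for size 1, power 25 for size 2).

```python
from functools import lru_cache

# size -> (input cap, leakage, {load: delay}), values from the example library
LIB = {
    1: (3, 5, {3: 3, 6: 4}),
    2: (6, 10, {3: 1, 6: 2}),
}


def size_chain(n_stages, final_load, d_max, lib=LIB):
    """Return (min total leakage, sizes from stage 1 to stage n), or None
    if no sizing meets the delay budget d_max."""

    @lru_cache(maxsize=None)
    def best(stage, load, budget):
        # Cheapest sizing of stages 1..stage when gate `stage` drives `load`
        # and the delay accumulated so far must not exceed `budget`.
        if stage == 0:
            return (0, ())
        result = None
        for size, (input_cap, leakage, delay) in lib.items():
            remaining = budget - delay[load]
            if remaining < 0:
                continue
            # The previous gate sees this gate's input cap as its load.
            sub = best(stage - 1, input_cap, remaining)
            if sub is None:
                continue
            cand = (sub[0] + leakage, sub[1] + (size,))
            if result is None or cand[0] < result[0]:
                result = cand
        return result

    return best(n_stages, final_load, d_max)


print(size_chain(3, final_load=6, d_max=8))  # (20, (2, 1, 1))
```

The result matches the optimized chain in the example: size 2, size 1, size 1 with total power 20. Because the library is slew-independent, the load and the remaining budget are the only state that must be carried between stages.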

Benchmark Generation: Connect Chains

1. Run STA and find the arrival time for each gate
2. Connect each connection cell to an open fanin port
- connect only if timing constraints are satisfied
- connection cells do not change the optimal chain solution
3. Tie unconnected ports to logic high or low

[Figure: chain1 to chainN before and after the chains are connected]
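One way to picture the connection step is a greedy matching between connection-cell outputs and open fanin ports. Everything here is an assumption for illustration: the function name, the one-to-one matching, and the reading of "timing constraints are satisfied" as "the connection cell's arrival time does not exceed the latest arrival time the sink port can tolerate". The arrival times would come from the STA run in step 1.

```python
def connect_chains(sources, sinks):
    """Greedy sketch of the connection step.

    sources: list of (name, arrival_time) for connection-cell outputs
    sinks:   list of (name, latest_allowed_arrival) for open fanin ports
    Returns (connections, tied_off): a sink -> source mapping, plus the sinks
    left unconnected, which would be tied to logic high or low.
    """
    free = sorted(sources, key=lambda s: s[1])
    connections, tied_off = {}, []
    for sink, allowed in sorted(sinks, key=lambda s: s[1]):
        # Pick the latest-arriving free source that still meets the sink's
        # timing, leaving earlier-arriving sources for other sinks.
        pick = None
        for src in free:
            if src[1] <= allowed:
                pick = src
        if pick is None:
            tied_off.append(sink)
        else:
            connections[sink] = pick[0]
            free.remove(pick)
    return connections, tied_off


conns, tied = connect_chains(
    sources=[("c1", 2.0), ("c2", 5.0)],
    sinks=[("g_a", 3.0), ("g_b", 1.0), ("g_c", 6.0)],
)
print(conns, tied)
```

A real generator would also need to avoid combinational loops, for example by only connecting from an earlier stage to a later one; that check is omitted here.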

Benchmark Generation: Generated Netlist

Generated output:
– benchmark circuit of N*K + C cells w/ optimal solution

[Figure: schematic of generated netlist (N = 10, K = 20)]

Chains are connected to each other in various topologies.

Outline

– Background and Motivation
– Benchmark Generation
– Experimental Framework and Results
– Conclusions and Ongoing Work

Experimental Setup

Delay and power model (library)
– LP: linear increase in power (gate sizing context)
– EP: exponential increase in power (Vt or gate-length context)

Heuristics compared
– Two commercial tools (BlazeMO, Cadence Encounter)
– UCLA sizing tool
– UCSD sensitivity-based leakage optimizer

Realistic benchmarks: six open-source designs

Suboptimality calculation

$$\text{Suboptimality} = \frac{\text{power}_{\text{heuristic}} - \text{power}_{\text{opt}}}{\text{power}_{\text{opt}}}$$
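As a quick numeric illustration of the metric, using the two Stage 3 power values from the inverter-chain example (25 for the non-optimal choice, 20 for the optimum). The function name is hypothetical.

```python
def suboptimality(power_heuristic, power_opt):
    """Relative excess power of a heuristic solution over the known optimum."""
    if power_opt <= 0:
        raise ValueError("power_opt must be positive")
    return (power_heuristic - power_opt) / power_opt


print(f"{suboptimality(25, 20):.1%}")  # 25.0%
```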

Generated Benchmark - Complexity

Complexity (suboptimality) of generated benchmarks: chain-only vs. connected-chain topologies

[Figure: two bar charts (Commercial tool, Greedy) of suboptimality for chain-only and connected benchmarks EP-40-20, EP-40-40, EP-80-20, EP-80-40, LP-40-20, LP-40-40, LP-80-20, LP-80-40. Benchmark names follow [library]-[N]-[K].]

Chain-only: avg. 2.1%
Connected-chain: avg. 12.8%

Generated Benchmark - Connectivity

Problem complexity and circuit connectivity
1. Arranged assignment: improves connectivity (larger fanin at later stages, larger fanout at earlier stages)
2. Random assignment: improves diversity of topology

arranged  random  unconnected  Subopt.
100%      0%      0.00%        2.60%
75%       25%     0.00%        6.80%
50%       50%     0.25%        10.30%
25%       75%     0.75%        11.20%
0%        100%    17.00%       7.70%

Suboptimality w.r.t. Parameters

For different numbers of chains

[Figure: subopt.(Comm), subopt.(Greedy), subopt.(SensOpt), runtime(Comm) and runtime(Greedy) vs. number of chains (40, 80, 160, 320, 640)]

For different numbers of stages

[Figure: subopt.(Comm), subopt.(Greedy), subopt.(SensOpt), runtime(Comm), runtime(Greedy) and runtime(SensOpt) vs. number of stages (20, 40, 60, 80, 100)]

The total # of paths increases significantly w.r.t. N and K.

Suboptimality w.r.t. Parameters (2)

For different average net degrees

[Figure: plot vs. average net degree (1.2 to 2.4)]

For different delay constraints

[Figure: plot vs. timing constraint (ns) (0.4 to 1.1)]

Generated Realistic Benchmarks

Target benchmarks
– SASC, SPI, AES, JPEG, MPEG (from OpenCores)
– EXU (from OpenSPARC T1)

Characteristic parameters of real and generated benchmarks

                             real designs             generated
design  data depth  #instance  Rent param.  net degree  Rent param.  net degree
SASC    20          624        0.858        2.06        0.865        2.06
SPI     33          1092       0.880        1.81        0.877        1.80
EXU     31          25560      0.858        1.91        0.814        1.90
AES     23          23622      0.810        1.89        0.820        1.88
JPEG    72          141165     0.721        1.84        0.831        1.84
MPEG    33          578034     0.848        1.59        0.848        1.60

Suboptimality of Heuristics

Suboptimality w.r.t. known optimal solutions for generated realistic benchmarks

With EP library (Vt swap context): up to 52.2%, avg. 16.3%

[Figure: suboptimality of Comm1, Comm2, Greedy and SensOpt on eye-chart, SASC, SPI, AES, EXU, JPEG, MPEG]

With LP library (gate sizing context): up to 43.7%, avg. 25.5%

[Figure: suboptimality of Comm1, Comm2, Greedy and SensOpt on eye-chart, SASC, SPI, AES, EXU, JPEG, MPEG]

* Greedy results for MPEG are missing

Comparison w/ Real Designs

Suboptimality versus one specific heuristic (SensOpt)

Real designs and real delay/leakage library (TSMC 65nm) case
– Actual suboptimality will be greater!

Suboptimality from our benchmarks

[Figures: two bar charts of suboptimality for Comm1, Comm2 and Greedy on SASC, SPI, AES, EXU, JPEG, MPEG, one for the real designs and one for our benchmarks]

Discrepancy: simplified delay model, reduced library set, ...

Conclusions

A new benchmark generation technique for gate sizing constructs realistic circuits with known optimal solutions

Our benchmarks enable systematic and quantitative study of common sizing heuristics

Common sizing methods are suboptimal for realistic benchmarks by up to 52.2% (Vt assignment) and 43.7% (sizing)

http://vlsicad.ucsd.edu/SIZING/

Ongoing Work

Analyze discrepancies between real and artificial benchmarks

Handle more realistic delay models
– Use a realistic delay library in the context of realistic benchmarks with tight upper bounds

Alternate approach for netlist generation
– (1) cut nets in a real design and find the optimal solution
– (2) reconnect the nets while keeping the optimal solution

Thank you
