Post on 20-Dec-2015

An Analytic Placer for Mixed-Size Placement and Timing-Driven Placement

Andrew B. Kahng and Qinke Wang
UCSD CSE Department

{abk, qiwang}@cs.ucsd.edu

Work partially supported by the MARCO Gigascale Systems Research Center, NSF MIP-9987678 and the Semiconductor Research Corporation.

Motivation

• Mixed-size placement
  – design productivity increasingly requires IP reuse
    • processing / interface cores, embedded memories, etc.
  – “boulders and dust” challenge: sizes of placeable objects can vary by factors of 10,000 or more
  – placement is particularly complex in fixed-die context
• Timing-driven placement
  – more critical with device and interconnect scaling

Our Work

• APlace [Kahng/Wang ISPD04]: an analytic placer for wirelength-driven standard-cell placement
  – [Naylor et al., US Patent 6301693, 2001]
  – superior wirelength quality compared to Cadence QPlace, Dragon and Capo
  – strong extensibility: congestion-directed placement, I/O-core co-placement, constraint handling for mixed-signal, etc.
  – poor scalability: on average 13.2× slower than Capo
• This work: extend APlace to address mixed-size placement and timing-driven placement

Outline

• APlace Background

• Extension to Mixed-Size Placement

• Extension to Timing-Driven Placement

• Conclusions and Ongoing Work

Outline

• APlace Background
  – Formulations
    • wirelength minimization
    • cell spreading = density control
  – Implementation

• Extension to Mixed-Size Placement

• Extension to Timing-Driven Placement

• Conclusion and Ongoing Work

Wirelength Formulation

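• Placement objective: HPWL
• Smooth approximation [Naylor et al., US Patent 6301693, 2001]
  – log-sum-exp formula: picks out the most dominant terms among pin coordinates; for a net t with pins at (x_i, y_i):
    $$WL(t) = \alpha \left( \ln \sum_{v_i \in t} e^{x_i/\alpha} + \ln \sum_{v_i \in t} e^{-x_i/\alpha} + \ln \sum_{v_i \in t} e^{y_i/\alpha} + \ln \sum_{v_i \in t} e^{-y_i/\alpha} \right)$$
  – α: smoothing parameter
  – closer to HPWL when α → 0
  – precise
  – strictly convex
  – continuously differentiable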
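As a rough illustration of the log-sum-exp idea, the sketch below compares exact HPWL with a smooth approximation for a single net. The function names `hpwl` and `smooth_wl` are hypothetical, and the max-shift inside `_lse` is a standard numerical-stability trick added here, not something stated on the slide.

```python
import math

def hpwl(xs, ys):
    """Exact half-perimeter wirelength of one net."""
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def _lse(values, alpha):
    """alpha * ln(sum(exp(v / alpha))), shifted by max(v) to avoid overflow."""
    m = max(values)
    return m + alpha * math.log(sum(math.exp((v - m) / alpha) for v in values))

def smooth_wl(xs, ys, alpha):
    """Log-sum-exp approximation of HPWL; approaches HPWL as alpha -> 0."""
    neg_xs = [-v for v in xs]
    neg_ys = [-v for v in ys]
    # _lse(xs) approximates max(xs); _lse(-xs) approximates -min(xs).
    return (_lse(xs, alpha) + _lse(neg_xs, alpha)
            + _lse(ys, alpha) + _lse(neg_ys, alpha))

pins_x = [0.0, 3.0, 10.0]
pins_y = [1.0, 5.0, 2.0]
exact = hpwl(pins_x, pins_y)
tight = smooth_wl(pins_x, pins_y, alpha=0.01)
loose = smooth_wl(pins_x, pins_y, alpha=5.0)
```

With a small α the smooth value is nearly identical to HPWL; a larger α gives a smoother but looser overestimate.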

Density Control

• Common strategy
  – divide the placement area into grids
  – equalize the total cell area in each grid
• Penalty of an uneven cell distribution
  – not smooth or differentiable
  – difficult to optimize

Cell Potential Function

• Bell-shaped cell potential function [Naylor et al., US Patent 6301693, 2001]

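• Cell c has potential(c, g) with respect to grid g:
  $$\mathrm{potential}(c, g) = C \cdot p(|x - x'|) \cdot p(|y - y'|)$$
  – cell c at (x, y) has area A
  – grid point g = (x', y')
  – p(d): bell-shaped function
  – r: the radius of cells' potential
  – C: a proportionality factor, s.t. $\sum_g \mathrm{potential}(c, g) = A$
• Bell-shaped function:
  $$p(d) = \begin{cases} 1 - 2d^2/r^2, & 0 \le d \le r/2 \\ 2(r - d)^2/r^2, & r/2 \le d \le r \end{cases}$$

[Figure: p(d) plotted against d, with breakpoints at d = r/2 and d = r]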
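The bell-shaped function can be sketched in a few lines. The two pieces below are taken from the figure labels (1 − 2d²/r² and 2(r − d)²/r²). The product form over x and y and the normalization that makes a cell's potentials sum to its area are assumptions of this sketch, and `bell` and `cell_potentials` are hypothetical names.

```python
def bell(d, r):
    """Bell-shaped p(d): 1 at d = 0, 0.5 at d = r/2, 0 for d >= r."""
    d = abs(d)
    if d <= r / 2:
        return 1.0 - 2.0 * d * d / (r * r)
    if d <= r:
        return 2.0 * (r - d) ** 2 / (r * r)
    return 0.0

def cell_potentials(x, y, area, grid_points, r):
    """Potential of one cell at (x, y) with respect to each grid point.

    Assumption: potential is C * p(|x - x'|) * p(|y - y'|), with C chosen so
    that the potentials over all grid points sum to the cell area.
    """
    raw = [bell(x - gx, r) * bell(y - gy, r) for gx, gy in grid_points]
    total = sum(raw)
    if total == 0.0:
        return raw  # cell is out of reach of every grid point
    c = area / total  # proportionality factor C
    return [c * v for v in raw]

grid = [(float(i), float(j)) for i in range(5) for j in range(5)]
pots = cell_potentials(2.3, 1.7, area=4.0, grid_points=grid, r=2.0)
```

Both pieces meet at d = r/2 with the same value and slope, which is what keeps the resulting density penalty differentiable.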

Implementation

• Cells are spread by minimizing the smooth density penalty function
• APlace combines the above two objectives into a weighted sum and optimizes it using a Conjugate Gradient optimizer:
  – Density term drives cell spreading
  – Wirelength term draws connected components back toward each other

Wirelength vs. Density Objectives

• Density weight: fixed
  – a larger weight spreads cells out hastily, without good wirelength
• Wirelength weight: variable
  – a larger weight contracts cells together and prevents them from spreading out
  – initially set to be large
  – repeat until all cells are spread out evenly:
    • execute conjugate-gradient solver until convergence
    • reduce the weight by half

Objective: wirelength weight × wirelength term + density weight × density term
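The outer loop described above can be sketched as follows. This is only a skeleton: `solve` stands in for the conjugate-gradient run and `is_spread` for the evenness check, both supplied by the caller, and the toy objective used to exercise the loop is invented for illustration.

```python
def anneal_wl_weight(solve, is_spread, placement, wl_weight, max_rounds=60):
    """Repeat: solve to convergence, then halve the wirelength weight.

    solve(placement, wl_weight) -> new placement (stand-in for the CG solver)
    is_spread(placement) -> True once cells are spread out evenly
    """
    for _ in range(max_rounds):
        placement = solve(placement, wl_weight)
        if is_spread(placement):
            break
        wl_weight *= 0.5
    return placement, wl_weight

# Toy stand-in (assumption): one number s, the separation of two cells,
# minimizing wl_weight * s**2 + (1 - s)**2, whose minimum is 1 / (1 + wl_weight).
def toy_solve(_s, wl_weight):
    return 1.0 / (1.0 + wl_weight)

def toy_is_spread(s):
    return s >= 0.9

final_s, final_weight = anneal_wl_weight(toy_solve, toy_is_spread, 0.0, 8.0)
```

In the toy run the weight is halved from 8 until the separation passes the threshold, mirroring how a large initial wirelength weight is relaxed step by step.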

Outline

• APlace Background

• Extension to Mixed-Size Placement
  – Density control for macros
  – Legalization
  – Experimental results

• Extension to Timing-Driven Placement

• Conclusion and Ongoing Work

Previous Works

• Capo flow: a three stage placement-floorplanning-placement flow that uses Capo [Adya et al., ISPD02, ICCAD03]

• mPG-MS: a simulated annealing based multi-level placer [Chang et al., ASPDAC03]

• Feng Shui: a recursive bisection based placement tool using fractional cuts [Khatkhate et al., ISPD04]

Potential Function for Macros (I)

• Each module has a potential or influence with respect to nearby grids

• APlace seeks to equalize the total module potential at each grid

• rm is the radius of a module's potential
• Standard-cell placement: rm is a constant r
• Mixed-size placement: rm changes according to the module's dimension
• A larger block will have potential with respect to more nearby grids

Potential Function for Macros (II)

• p(d): potential function; d: distance from module to grid
• Radius rm = w/2 + r for a block with width w
• Convex curve: d < w/2 + r/2
• Concave curve: w/2 + r/2 < d < w/2 + r
• p(d) is smooth at d = w/2 + r/2

[Figure: p(d) plotted against d, with segments labeled 1 − a·d² and b·(r − d)² and breakpoints at d = w/2 + r/2 and d = w/2 + r]
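The slide does not give values for a and b. One way to obtain them, shown here as an assumption rather than the authors' stated choice, is to require that the two segments agree in value and slope at d = w/2 + r/2 and that the outer segment reaches zero at the radius w/2 + r. Solving those conditions gives a = 4 / ((w + r)(w + 2r)) and b = 4 / (r(w + 2r)); the name `macro_bell` is hypothetical.

```python
def macro_bell(d, w, r):
    """Potential p(d) for a block of width w, smooth at d = w/2 + r/2.

    Inner segment: 1 - a*d^2 for d <= w/2 + r/2.
    Outer segment: b*(rm - d)^2 for d <= rm, where rm = w/2 + r.
    a and b are derived from value and slope continuity (an assumption here).
    """
    d = abs(d)
    d1 = w / 2.0 + r / 2.0   # breakpoint between the two segments
    rm = w / 2.0 + r         # radius of the module's potential
    a = 4.0 / ((w + r) * (w + 2.0 * r))
    b = 4.0 / (r * (w + 2.0 * r))
    if d <= d1:
        return 1.0 - a * d * d
    if d <= rm:
        return b * (rm - d) ** 2
    return 0.0

# For w = 0 this reduces to the standard-cell bell: a = b = 2 / r^2.
standard_cell_value = macro_bell(0.3, w=0.0, r=2.0)
macro_value_at_break = macro_bell(4.0, w=6.0, r=2.0)
```

A wider block therefore reaches more grid points, since its potential only drops to zero at w/2 + r.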

Legalization

• Simplified Tetris algorithm [Hill, US Patent 6370673, 2002]
  – sort modules based on a linear combination of vertical coordinate and width
  – search the current nearest available position for each module
• Pros and cons
  – fast
  – larger blocks are fixed at a position ahead of nearby small cells
  – best applied when modules are distributed evenly
  – may fail if the global placement has many overlaps among macros
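A much-reduced sketch of a Tetris-style greedy pass is given below to make the two steps concrete. It makes several simplifying assumptions that are not from the slide: all modules are one row high, the sweep runs along x (the slide sorts on the vertical coordinate), the die's right boundary is ignored, and `width_weight` is an invented tuning knob.

```python
def tetris_legalize(cells, row_ys, width_weight=0.5):
    """Greedy Tetris-style legalization for single-row-height cells.

    cells: list of (name, x, y, width) from global placement.
    row_ys: y coordinate of each row.
    Returns {name: (x, y)} with no overlaps inside any row.
    """
    frontier = {ry: 0.0 for ry in row_ys}  # leftmost free x in each row
    # Sort on a linear combination of coordinate and width, so a wide module
    # is fixed ahead of nearby narrow ones.
    order = sorted(cells, key=lambda c: c[1] - width_weight * c[3])
    placed = {}
    for name, x, y, width in order:
        best = None
        for ry in row_ys:
            px = max(frontier[ry], x)          # never overlap earlier cells
            cost = abs(px - x) + abs(ry - y)   # displacement from global spot
            if best is None or cost < best[0]:
                best = (cost, px, ry)
        _, px, ry = best
        placed[name] = (px, ry)
        frontier[ry] = px + width
    return placed

legal = tetris_legalize(
    [("a", 0.0, 0.0, 2.0), ("b", 1.0, 0.0, 2.0), ("c", 1.0, 0.2, 1.0)],
    row_ys=[0.0, 1.0],
)
```

Because each module takes the nearest slot still free when its turn comes, heavy overlap among large blocks in the global placement can push later modules far away, which matches the failure mode listed above.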

APlace-MS Results

• Ten ISPD02 Mixed-Size Benchmarks (10K-70K cells)
• Average wirelength increase after legalization: 6.5%
• Detailed placement by Feng Shui: 3.5% avg. WL improvement

Columns WL through the first CPU: APlace-MS. Columns WL_dp through the second CPU: detailed placement.

circuit   WL     WL_l   inc. (%)   CPU   WL_dp   impr. (%)   CPU
ibm01     0.20   0.24   18.5       15    0.23    5.7         1
ibm02     0.51   0.52   0.7        45    0.50    2.5         3
ibm03     0.70   0.74   6.2        56    0.72    3.5         3
ibm04     0.81   0.85   4.8        48    0.83    2.8         4
ibm05     1.01   1.00   -0.5       15    0.98    2.0         5
ibm06     0.65   0.71   9.6        76    0.68    4.4         5
ibm07     1.03   1.09   5.8        98    1.05    3.7         8
ibm08     1.49   1.50   0.6        128   1.46    2.7         8
ibm09     1.25   1.45   15.7       113   1.38    5.2         9
ibm10     2.97   3.07   3.3        206   3.00    2.2         11

HPWL Comparison

• HPWL reduction of our placer relative to each tool, average (range):
  – Capo flow [ICCAD03]: 26.0% (11.5% ~ 34.0%)
  – mPG-MS [ASPDAC03]: 24.7% (9.9% ~ 40.1%)
  – Feng Shui [ISPD04]: 4.0% (-7.3% ~ 20.0%)
• Runtime
  – Xeon server (2.4GHz CPU, double-threaded)
  – much slower than Feng Shui

          Capo          mPG-MS        Feng Shui     our placer
circuit   HPWL   CPU    HPWL   CPU    HPWL   CPU    HPWL   CPU
ibm01     0.31   20     0.30   18     0.24   3      0.23   16
ibm02     0.68   11     0.74   32     0.53   5      0.50   48
ibm03     1.04   59     1.20   32     0.75   6      0.72   59
ibm04     1.01   15     1.05   42     0.80   7      0.83   52
ibm05     1.11   5      1.09   36     1.01   8      0.98   20
ibm06     0.99   18     0.92   45     0.68   10     0.68   81
ibm07     1.53   25     1.37   68     1.17   13     1.05   106
ibm08     1.79   29     1.64   82     1.36   16     1.46   136
ibm09     1.99   29     1.86   84     1.38   15     1.38   122
ibm10     4.55   116    4.36   172    3.75   22     3.00   217

Placements Before and After Legalization

Outline

• APlace Background

• Extension to Mixed-Size Placement

• Extension to Timing-Driven Placement
  – Slack-derived edge weights
  – Timing-driven placement flow
  – Experimental results

• Conclusion and Ongoing Work

Timing-Driven Approaches

• Path based methods
  – consider all or a subset of paths directly
  – maintain an accurate timing view during optimization
  – complexity is relatively high
• Net based methods
  – transform timing constraints or requirements into either net weight or net length (or delay) constraints

Net Based Methods

• Delay budgeting
  – distribute slacks from the end-points to constituent nets along the path
  – may severely over-constrain the problem without consideration of physical feasibility
• Net weighting
  – assign weights to nets based on timing criticality
  – low complexity, strong flexibility and easy implementation
  – more attractive as circuit sizes increase and timing constraints become more complex

Slack-Derived Edge Weights

• Net weighting in TD-APlace
  – β: timing criticality exponent
  – slack(π): the slack of path π
  – T: longest path delay
• Heavy net weights are assigned to:
  – timing critical nets: exponential function [Marquardt et al. 2000]
  – nets included in many critical paths [Kong ICCAD02]
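The exact weighting formula is not preserved in this transcript. The sketch below shows one plausible form built only from the listed symbols, and it is an assumption, not the TD-APlace formula: each path π contributes (1 − slack(π)/T)^β to every net on it, so a net gains weight both from being on critical paths and from being on many of them. The name `net_weights` and the input layout are hypothetical.

```python
def net_weights(paths, longest_delay, beta):
    """Assumed slack-derived weights: sum over paths of (1 - slack/T)**beta.

    paths: list of (slack, nets_on_path); longest_delay: T; beta: exponent.
    Returns {net: weight}.
    """
    weights = {}
    for slack, nets in paths:
        # Criticality is 1 for zero slack and falls toward 0 as slack nears T.
        criticality = max(0.0, 1.0 - slack / longest_delay)
        contribution = criticality ** beta
        for net in nets:
            weights[net] = weights.get(net, 0.0) + contribution
    return weights

example = net_weights(
    [(0.0, ["n1", "n2"]), (5.0, ["n2", "n3"])],
    longest_delay=10.0,
    beta=2,
)
```

In this example `n2` sits on both paths and ends up heaviest, while `n3`, which only appears on the path with slack, gets the smallest weight. Raising β widens the gap between critical and non-critical paths.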

Timing-Driven Placement Flow

• Final placement stage
• TrialRoute (SoC Encounter v3.2): fast global and detailed routing
• Extract RC
• Pearl (SE v5.4): static timing analysis (STA)
• Import critical path delays to decide net weights
• Minimize weighted WL objective

[Flow diagram boxes: APlace-TD, LEF/DEF/GCF/SDC, TrialRoute, Extract RC, Pearl, Critical Paths, Min Cycle]

Timing Results: Indust1 Testcase

• Indust1: ~7k cells
• Xeon 2.4GHz CPU, double-threaded
• Minimum cycle time
  – measures quality of TD placements
  – initially decreases with criticality exponent
  – gradually deteriorates as criticality exponent continues to increase

Results with varying criticality exponents (β)

        Placement       TrialRoute   STA
beta    WL       CPU    WL           min cycle   impr. (%)
0       0.4468   11     0.5853       14.30       0.00
3       0.4468   12     0.5845       13.86       3.08
5       0.4469   12     0.5857       13.76       3.78
7       0.4470   12     0.5860       13.86       3.08
9       0.4469   12     0.5873       13.62       4.76
11      0.4473   12     0.5873       13.66       4.48
13      0.4477   12     0.5869       13.57       5.10
15      0.4480   12     0.5852       13.84       3.22
17      0.4480   12     0.5875       13.57       5.10
19      0.4485   11     0.5881       13.58       5.03
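The impr. (%) column appears to be the relative reduction in minimum cycle time against the β = 0 row (14.30). That reading is an inference from the numbers rather than something the slide states, and `improvement_pct` is a hypothetical helper used to check it.

```python
def improvement_pct(baseline, value):
    """Relative reduction of value against baseline, in percent."""
    return 100.0 * (baseline - value) / baseline

BASELINE_MIN_CYCLE = 14.30  # beta = 0 row

# (beta, min cycle, reported impr. %) for a few rows of the table
rows = [(3, 13.86, 3.08), (9, 13.62, 4.76), (13, 13.57, 5.10), (19, 13.58, 5.03)]
recomputed = [round(improvement_pct(BASELINE_MIN_CYCLE, mc), 2) for _, mc, _ in rows]
```

The recomputed values agree with the reported column to two decimal places for these rows.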

Comparison vs. Industry Placers (I)

• Two industry placers
  – QPlace (SE v5.4)
  – amoebaPlace (SoC Encounter v3.2)
• Six industry circuits
  – 7k ~ 40k cells
  – two from the ISPD 2001 Circuit Benchmarks
• Experimental flow
  – TD or non-TD placements
  – WarpRoute (SoC Encounter v3.2): timing-driven routing
  – Extract RC
  – Pearl (SE v5.4): static timing analysis (STA)

[Flow diagram boxes: TD-Place, nonTD-Place, LEF/DEF/GCF/SDC, TD-WarpRoute, Extract RC, Pearl, Min Cycle]

Comparison vs. Industry Placers (II)

• Comparison to TD-QPlace and TD-amoebaPlace
• Final HPWL
  – TD-QPlace: 7.2% (-1.2% ~ 7.1%)
  – TD-amoebaPlace: 6.5% (-11.1% ~ 23.2%)
• Min Cycle
  – TD-QPlace: 9.6% (-1.2% ~ 14.8%)
  – TD-amoebaPlace: 8.5% (-0.8% ~ 28.5%)
  – APlace: 2% (0.1% ~ 3.8%)

                              Place         Route   STA
ckts      cells   placer      HPWL    CPU   WL      min cycle
indust1   7077    TD-QPlace   0.58    21    0.73    15.04
                  TD-Amoeba   0.61    1     0.88    14.91
                  APlace      0.51    14    0.67    14.28
                  TD-APlace   0.51    11    0.68    13.83
indust2   20094   TD-QPlace   1.29    80    2.31    38.87
                  TD-Amoeba   1.40    5     2.11    46.98
                  APlace      1.31    58    2.32    34.92
                  TD-APlace   1.31    55    2.34    33.60
indust3   40447   TD-QPlace   0.34    37    0.41    27.20
                  TD-Amoeba   0.36    5     0.42    27.31
                  APlace      0.35    119   0.43    27.65
                  TD-APlace   0.34    112   0.41    27.53
indust4   35272   TD-QPlace   (fail in TD-QPlace)
                  TD-Amoeba   15.08   6     16.74   402.09
                  APlace      12.84   65    15.33   401.49
                  TD-APlace   12.80   80    15.30   401.28
mac1      5937    TD-QPlace   0.33    6     0.52    4.66
                  TD-Amoeba   0.36    1     0.47    4.46
                  APlace      0.28    6     0.40    4.13
                  TD-APlace   0.28    9     0.40    4.06
mac2      21491   TD-QPlace   1.29    22    2.27    7.25
                  TD-Amoeba   1.48    3     2.18    6.64
                  APlace      1.15    37    2.13    6.26
                  TD-APlace   1.14    38    2.11    6.18

Conclusions

• APlace analytic placement framework extended to address mixed-size and timing-driven placement

• Mixed-size placement
  – HPWL outperforms mPG-MS, Feng Shui and the Capo flow respectively by 24.7%, 4.0% and 26.0% on average
• Timing-driven placement
  – Minimum cycle time outperforms that of TD-QPlace and TD-amoebaPlace respectively by 9.6% and 8.5%
  – Routed WL outperforms that of TD-QPlace and TD-amoebaPlace respectively by 7.2% and 6.5%

Ongoing Work

• Scalability issue
  – APlace currently does not scale to large instances
  – control scheme for larger circuits
  – Augmented Lagrangian method for constrained nonlinear optimization
  – multigrid algorithm
• Extension to low power or IR drop directed placement
• Extension to 3D or thermal-aware placement

Acknowledgments

• We thank Brent Gregory, Will Naylor and Synopsys, Inc. for a research and educational license pertaining to U.S. Patents 6282693, 6662348, 6301693, 6671859 and 6665851.

Thank You !
