Large-Scale Circuit Placementcadlab.cs.ucla.edu/~cong/slides/git_april03.pdfPeko01 12506 13865 113...

61
Large-Scale Circuit Placement: The Gap and Promise Contributors: Chin-Chih Chang, Kenton Sze, Tim Kong, Michail Romesis, Joe Shinnerl, Min Xie, Xin Yuan Jason Cong Computer Science Department University of California, Los Angeles [email protected]

Transcript of Large-Scale Circuit Placementcadlab.cs.ucla.edu/~cong/slides/git_april03.pdfPeko01 12506 13865 113...

  • Large-Scale Circuit Placement:The Gap and Promise

    Contributors: Chin-Chih Chang, Kenton Sze, Tim Kong, MichailRomesis, Joe Shinnerl, Min Xie, Xin Yuan

    Jason CongComputer Science Department

    University of California, Los Angeles

    [email protected]

  • 2

    Outline

    Optimality and scalability study of placement problem -- gap analysisResearch on multilevel large-scale placement problemOur research plan

  • 3

    Optimality and Scalability Study---Motivation

    Lack of significant progress in wirelength reduction

    Rate of reduction is about 5-10% every 2-3 yearsLatest developments in placement differ mainly in runtime

    Where do we standHow much room for further improvement?Will existing placement engines scale well to 10+M gate designs?

    Need to quantify the optimality and scalability of state-of-the-art placement engines

  • 4

    Optimality and Scalability Study---Motivation (II)

    Most work compare only with existing heuristicsUse real design based benchmarks

    ISPD98 [C. Alpert 1998]

    Use synthetic benchmarkscirc and gen [M. D. Hutton et al, 1998]gnl [D. Stroobandt et al, 2000]

    Little understanding about the gap from the optimal

  • 5

    Optimality and Scalability Study---Related Work

    Quantified Suboptimality of VLSI Layout Heuristics [L. Hagen et al, 1995]

    Construct scaled instance with known upperboundx

    x x x

    x x x

    x x x

    ? Over 10% area suboptimality in TimberWolf Notable wirelength suboptimality in GORDIAN-L But test cases are small, the largest netlist is less than 40K

  • 6

    Our Contribution: Placement Example Construction with Known Optimal Wirelength

    Construct instances with known optimal using the characteristic of the original problem

    ?

    Optimality and Scalability Study of Existing Placement Algorithms [C. Chang et al, 2003]

    Studied the optimality and scalability of existing algorithms on constructed instances

  • 7

    Construction of Placement Examples with Known Optimal Wirelength (PEKO Examples)

    InputDesired number of placeable modules tNet Distribution Vector (NDV) D = ( d2, d3, … dp ), dkis the # of k-pin nets in the circuit

    t and D are extracted from a real circuitOutput

    Cell library LNetlist N with known optimal wirelength

    ConstraintN has D as its NDV

  • 8

    Our Algorithm for Constructing PEKO Examples

    All the modules are of equal size, and there is no space between rows and adjacent modulesFor 2-pin nets , connect any two adjacent modulesFor each n-pin net , connect the n modules in a rectangular region close to a square, i.e., the length of each side is close to sqrt(n)The wirelength is of each n-pin net is given by

    / 2n n n + −

  • 9

    Illustration: PEKO Example Construction

    Input : t = 64, D = {d2=34,d3=20,d4=7,d5=4,d6=2, d7=1}

    Total WL = 110

    #2-pin nets = 34, WL = 34

    #7-pin nets = 1, WL = 4

    #3-pin nets = 20, WL = 40

    #5-pin nets = 4, WL = 12

    #6-pin nets = 2, WL = 6

    #4-pin nets = 7, WL= 14

    • Method first conceived by K. Boese (1995), but not implemented

  • 10

    White Space Insertion

    Option 1: expanding one dimension of the chip

    Option 2: removing some of the nets

    Need for white spacemimic real designsEase for legalization

  • 11

    Four New Suites of Placement Examples with Known Optimal Wirelength

    Module number t and NDV extracted from ISPD98 [C. Alpert, 1998]Two suites without pads (suite1 and suite2)

    suite2 is derived by scaling t and NDV by a factor of 10

    Two suites with pads (suite3 and suite4)suite4 is derived by scaling t and NDV by a factor of 10

    15% white space by expanding on dimension of the chip

    URL: http://ballade.cs.ucla.edu/~pubbench/peko.htm

  • 12

    PEKO Characteristics

    ckt #cell #net #row Optimal WLPeko01 12506 13865 113 8.14E+05Peko02 19342 19325 140 1.26E+06Peko03 22853 27118 152 1.50E+06Peko04 27220 31683 166 1.75E+06Peko05 28146 27777 169 1.91E+06Peko06 32332 34660 181 2.06E+06Peko07 45639 47830 215 2.88E+06Peko08 51023 50227 227 3.14E+06Peko09 53110 60617 231 3.64E+06Peko10 68685 74452 263 4.73E+06Peko11 70152 81048 266 4.71E+06Peko12 70439 76603 266 5.00E+06Peko13 83709 99176 290 5.87E+06Peko14 147088 152255 385 9.01E+06Peko15 161187 186225 402 1.15E+07Peko16 182980 189544 429 1.25E+07Peko17 184752 188838 431 1.34E+07Peko18 210341 201648 460 1.32E+07

    ckt #cell #net #row Optimal WLPeko01x10 125060 138650 335 8.14E+06Peko02x10 193420 193250 441 1.26E+07Peko03x10 228530 271180 479 1.50E+07Peko04x10 272200 316830 523 1.75E+07Peko05x10 281460 277770 532 1.91E+07Peko06x10 323320 346600 570 2.06E+07Peko07x10 456390 478300 677 2.88E+07Peko08x10 510230 502270 715 3.14E+07Peko09x10 531100 606170 730 3.64E+07Peko10x10 686850 744520 830 4.73E+07Peko11x10 701520 810480 839 4.71E+07Peko12x10 704390 766030 840 5.00E+07Peko13x10 837090 991760 916 5.87E+07Peko14x10 1470880 1522550 1214 9.01E+07Peko15x10 1611870 1862250 1271 1.15E+08Peko16x10 1829800 1895440 1354 1.25E+08Peko17x10 1847520 1888380 1360 1.34E+08Peko18x10 2103410 2016480 1451 1.32E+08

    PEKO Suite1 ( 12.5k – 210k ) PEKO Suite2 ( 125k – 2.1M )

  • 13

    Tested four State-of-the-Art PlacersCapo [A. E. Caldwell et al, 2000]

    based on multilevel partitioneraims to enhance the routability

    Dragon [M. Wang et al, 2000]uses hMetis for initial partitionSA with bin-based swapping

    mPL [T. Chan et al, 2000]nonlinear programming on the coarsest levelGoto based relaxation

    QPlace [Cadence Inc.]quadratic programmingcomponent of Silicon Ensemble

  • 14

    Experiment with State-of-the-Art Placers Using PEKO Suite1

    1.001.201.401.601.802.002.202.402.602.80

    0 50000 100000 150000 200000 250000

    #cells

    Mul

    tiple

    of O

    ptim

    al

    Dragon v.2.20 capo v.8.0 mPL v.1.2 qplace v.5.1.55

    05000

    1000015000200002500030000350004000045000

    0 50000 100000 150000 200000 250000

    #cells

    runt

    ime(

    s)

    Dragon v.2.20 capo v.8.0 mPL v.1.2 qplace v.5.1.55

    Existing algorithms are 66-153% away from the optimal on PEKO On examples with pads

    mPL and QPlace show improvement of 12% and 10% respectivelyDragon and Capo do not benefit much from the additional information

    There is significant room for improvement in placement algorithms!

  • 15

    Experiment with State-of-the-Art Placers Using PEKO Suite1 & Suite2

    0

    10000

    20000

    30000

    40000

    50000

    60000

    10000 100000 1000000 10000000#cells

    runt

    ime(

    s)

    Dragon v.2.20 capo v.8.0 mPL v.1.2 qplace v.5.11.55

    1.00

    1.20

    1.40

    1.60

    1.80

    2.00

    2.20

    2.40

    2.60

    2.80

    10000 100000 1000000 10000000

    #cells

    Mul

    tiple

    of O

    ptim

    al

    Dragon v.2.0 capo v.8.0 mPL v.1.2 qplace v.5.1.55

    Capo, QPlace and mPL scales well in runtimeAverage solution quality of each tool shows deterioration by an additional 4% to 25% when the problem size increases by a factor of 10QoR of the existing placement algorithms can be 80% - 180% away from the optimal for large designs

  • 16

  • 17

    Limitation of PEKO Examples

    Optimal solution includes local nets onlyUnlikely for real designs

    Measure wirelength only Timing and routability are important objectives for placement algorithms as well

  • 18

    Impact of Global Connections in Real Examples

    circuit height width WL of longest net

    WL contribution of longest 10%

    ibm01 8158 4530 7148 51%ibm02 8158 6430 14224 46%ibm03 8158 6740 10624 58%ibm04 8158 9140 15171 53%ibm05 8158 11055 19064 47%ibm06 8158 8715 13966 61%ibm07 8158 14605 14051 51%ibm08 8158 15895 16142 60%ibm09 8158 16395 13780 55%ibm10 8158 27890 30755 53%ibm11 16350 10925 19234 59%ibm12 16350 15545 26748 52%ibm13 16350 12230 19539 59%ibm14 16350 25475 26370 61%ibm15 16350 23785 27284 63%ibm16 16350 34015 42860 59%ibm17 16283 38895 45686 56%ibm18 16350 37065 52846 64%

    Produced by Dragon on ISPD98The wirelengthcontribution from global connections can be significant!Need to consider the impact of global connections

  • 19

    Placement Examples with Known Upperbounds (PEKU)

    Extend PEKO by introducing non-local nets to mimic global connections

    All the modules are of equal size, and there is no space between rows and adjacent modules

    For nets of degree ii, a subset of them are generated by randomly connecting ii modules, the rest are generated optimally as in PEKO

  • 20

    Placement Examples with Known Upperbounds (PEKU)

    Input : t = 64, D = {d2=34,d3=20,d4=7,d5=4,d6=2, d7=1} α=0.2

    Total WL = 160

    Generate 28 2-pin optimally

    Generate 6 2-pin randomly

    Generate 16 3-pin optimally

    Generate 4 3-pin randomly

    Generate 6 4-pin optimallyGenerate 1 4-pin randomly

    Generate 4 5-pin optimally

    Generate 2 6-pin optimallyGenerate 1 7-pin optimally

  • 21

    Placement Examples with Global Connections only (G-PEKU)

    Each net connects either a row or column

    Obvious upper boundSum the length of each row and column

    Similar to datapath examples

    Input : t = 64

  • 22

    PEKU Suite

    Module numbers and NDVs extracted from ISPD98 Remove connections with pads Vary α from 0 to 10% 15% white space by expanding one dimension of the chip

  • 23

    PEKU Suite% non-

    local nets

    circuit #cell #net #rowRow

    utilization

    LB UB

    Peku01 12506 14111 113 85% 8.14E+05 8.14E+05Peku05 28146 28446 169 85% 1.91E+06 1.91E+06Peku10 68685 75196 263 85% 4.73E+06 4.73E+06Peku15 161187 186608 402 85% 1.15E+07 1.15E+07Peku18 210341 201920 460 85% 1.32E+07 1.32E+07Peku01 12506 14111 113 85% 8.14E+05 9.23E+05Peku05 28146 28446 169 85% 1.91E+06 2.24E+06Peku10 68685 75196 263 85% 4.73E+06 6.17E+06Peku15 161187 186608 402 85% 1.15E+07 1.71E+07Peku18 210341 201920 460 85% 1.32E+07 2.01E+07Peku01 12506 14111 113 85% 8.14E+05 1.02E+06Peku05 28146 28446 169 85% 1.91E+06 2.63E+06Peku10 68685 75196 263 85% 4.73E+06 7.52E+06Peku15 161187 186608 402 85% 1.15E+07 2.30E+07Peku18 210341 201920 460 85% 1.32E+07 2.75E+07

    Up to 10%

    0

    0.25%

    0.50%

    URL: http://cadlab.cs.ucla.edu/~pubbench/peku.htm

  • 24

    G-PEKU Suite

    circuit #cell #net #row UBGPeku01 12506 224 113 7.93E+05GPeku05 28146 336 169 1.79E+06GPeku10 68685 525 263 4.38E+06GPeku15 161187 803 402 1.03E+07GPeku18 210341 918 460 1.34E+07

    Module numbers extracted from ISPD98

    URL: http://cadlab.cs.ucla.edu/~pubbench/peku.htm

  • 25

    Studied Four State-of-the-Art PlacersCapo [A. Caldwell et al, 2000]

    Based on multilevel partitionerAims to enhance the routability

    Dragon [M. Wang et al, 2000]Uses hMetis for initial partitionSA with bin-based swapping

    mPL [T. Chan et al, 2000]Nonlinear programming on the coarsest levelGoto based relaxation

    mPG [C. Chang et al, 2002]Uses FC clustering and hierarchical density control Incremental A-tree for routability

  • 26

    Wirelength Comparison between State-of-the-Art Placers Using PEKU

    1

    1.2

    1.4

    1.6

    1.8

    2

    2.2

    0.00% 0.25% 0.50% 0.75% 1.00% 2.00% 5.00% 10.00%% of non-local nets

    Qua

    lity

    Rat

    io

    Capo v.8.5 Dragon v.2.20 mPG v.1.0 mPL v.2.0

    mPL’s QR increases when α is increased from 0 to 0.75%, while for the other three placers, QRs are steadily decreasingAbsolute value of the QRs may not be meaningful, but it helps to identify the technique that works best under each scenario

  • 27

    Wirelength Comparison between State-of-the-Art Placers Using G-PEKU

    The gap between their solutions and the upper bound varies between 79% and 102% in the worst caseAnother validation that there is significant room for improvement for the placement problem

    circuitDragon v.2.20

    QRCapo v.8.5

    QRmPG v.1.0

    QRmPL v.2.0

    QRGPeku01 1.98 1.56 1.91 1.69GPeku05 2.01 1.69 1.97 1.83GPeku10 2.02 1.72 1.98 1.94GPeku15 1.99 1.79 1.97 1.97GPeku18 2.02 1.78 1.98 1.98

  • 28

    Significant Opportunity for Better Circuit Placement Algorithms

    Best available placement algorithms can be 80% - 180% away from optimalA significant research and business opportunity

    Use of copper interconnect is equivalent to 30% wirelength reductionOne process generation (e.g. from 0.13um to 0.10um) is equivalent to 30% wirelength reductionBoth requires multi-billion dollar investment!

    Better placement may extend/accelerate Moore’s law by 2-3 generations !

  • 29

    Outline

    Optimality and scalability study of placement problemOur research on large-scale placement problemOur research plan

  • 30

    Overview of UCLA research on Large-Scale Placement Problem

    Multilevel Optimization --- a highly scalable framework suitable for complex constraints

    Hierarchy construction by recursive aggregationIntralevel optimization by various techniquesInterpolation to transfer from level to level

    Two Parallel EffortsmPL: multilevel multiheuristic/hybrid optimizationmPG: multilevel simulated annealing with congestion control and mixed block support

  • 31

    Initial Fine-Grain Problem

    Multilevel Optimization Framework

    Intermediate Level

    Intermediate Level

    Coarse-Grain Problem Direct, nonhierarchical solution

    Aggregate

    Aggregate

    Intermediate LevelRelaxation (Refinement)

    Intermediate LevelRelaxation (Refinement)

    Final Fine-Grain Problem.Thorough Relaxation and

    Detailed Solution

    Aggregate Interpolate

    InterpolateAggregateetc. etc. Interpolate

    Interpolate

  • 32

    Multilevel Methods in Scientific Computation

    Originally developed to solve boundary-value PDE problems on spatial domains

    Discretized elliptic PDE is a structured, positive-definite system of linear equations

    Rapidly extended to problems without physical grids during the last 30 years

    Algebraic Multigrid (AMG)

    Most research has been for continuous models, but recent efforts have targeted large-scale discrete optimization

    Graph/Hypergraph Partitioning, traveling salesman, VLSI Physical Design

  • 33

    Overview of mPL

    Recursive ClusteringVersion 1.0 Edge-Separability (CapForest)Versions 1.1 – 2.0: FirstChoice

    Nonlinear Programming at coarsest level(s)Slot assignment and discrete refinement at all levelsRecent Enhancements to Version 2.0

    AMG-based weighted disaggregationQuadratic relaxation on subsets (QRS) at all levelsDistance-based reaggregation for iterated multilevel flow

  • 34

    Multilevel Placement Study

    CoarseningESC, AMG-based, First Choice Clustering (FC)

    Relaxation (Intralevel Optimization)Interior-point nonlinear programmingk-cycle Goto-style discrete exchangeQuadratic relaxation on subsets (QRS)

    InterpolationDeclustering + partitioning + slot assignmentAMG-style weighted averaging

    Iterated Multilevel FlowRepeated/Recursive V-cycles with distance-based reaggregation

  • 35

    Coarsening by Recursive Aggregation

    Edge-Separability Clustering (ESC) [Cong and Lim, 2000]Use CAPFOREST [Ibaraki and Nagamochi, 1992] to

    estimate all-pairs min-cut q(x,y) in N log N time. Rank pairs by connectivity and area.

    First-Choice Clustering (FC) [Karypis, 1999]Match each vertex with a neighboring vertex with which it

    shares the most total hyperedge weight, subject to area-balance constraints. Clusters are connected components.

    Weighted Aggregation (AMG)Split the nodes into C-points and F-points. Associate each F-

    point with several C-points by weighted average.

  • 36

    Multilevel Placement StudyCoarsening

    ESC, AMG, FC, FC+opt, PD-FCRelaxation (Intralevel Optimization)

    Interior-point nonlinear programmingk-cycle Goto-style discrete exchangeQuadratic relaxation on subsets (QRS)

    InterpolationDeclustering + partitioning + slot assignmentAMG-style weighted averaging

    Iterated Multilevel FlowRepeated/Recursive V-cycles with reaggregation

  • 37

    mPL Coarse-Level Formulation

    Nonlinear-Programming FormulationDirect formulation for the coarse placement problemCells are modeled as circular disks for smoothnessQuadratic wirelength objective on a clique-modelPairwise nonoverlap constraints

    Can accelerate evaluation with adaptation of Fast MultipoleMethod

    Nonuniform sizes are OK, but reshaping is difficult to incorporate efficiently

    Reasonable performance for coarse-level sizes N

  • 38

    Interior-Point-Based Nonlinear Programming

    Analogous to force-directed methods, but with direct handling of nonconvex objectives and constraintsEffective but relatively expensive; affordable only at the coarsest level(s) of mPL 1.0—2.0.Net impact: 15% wirelength improvement overall

    Global cell movement is not scalable to finer levelsPlan: Restrict to cell subsets to produce improved, scalable local relaxations at ALL levelsTo date, we have implemented Linear-Programming-based and

    Quadratic-Programming-based subset relaxations

  • 39

    Goto-based Discrete Relaxation

    Each cell’s optimal location is readily calculated when all other cells are held fixed.Compute a chain A, B, C, D, E, whereB is a randomly selected neighbor of A’s optimal location, etc.Examine all permutations of the chain and take the best one.Problem: the chain is not closed (A is not necessarily near any other cell’s optimal location).

  • 40

    Quadratic Relaxation on Noncontiguous Subsets (QRS)

    Select a subset M of cells to move.M is obtained as segments of length 3 along a DFS vertex traversal of the netlist, where starting the DFS at a vertex connected with largest wirelength.Identify other cells and pads, F, connected to M by nets in

    Decouple the horizontal and vertical problems.

    }.|{ φ≠∩∈= MeEeEM

  • 41

    Solving the subproblem

    Problem formulation (horizontal case):

    Iterative solve the weighted quadratic minimization problem, using the current solution to determine the weight (Gordian-L).

    number. small is , )(||

    1 where

    |)(|))((min )()(

    2

    ε

    ε

    ∑ ∑

    ∈ ∈

    =

    +−−

    eve

    Ee evke

    ke

    vxe

    x

    xvxxvx

    M

  • 42

    Ripple-move legalization [Hur and Lillis, 2000]Calculate a max-gain monotone path on the bin grid

    Because QRS ignores overlap constraints, post-QRS cell swaps are used to remove the overlap.

  • 43

    Impact of QRS RelaxationmPL 1.2 mPL 2.0 (with QRS)2 V-cycles; AMG; No QRS 2 V-cycles; AMG; QRS

    Circuit WL Time WL %improTime Rel. TimeBin sizeibm04 7.20E+06 506 6.83E+06 5.14% 866 1.71 2x2ibm07 1.11E+07 749 1.02E+07 8.11% 1302 1.74 2x2ibm09 1.22E+07 860 1.13E+07 7.38% 1569 1.82 3x3ibm10 1.97E+07 1285 1.91E+07 3.05% 2419 1.88 3x3ibm14 4.22E+07 2524 4.03E+07 4.50% 5846 2.32 3x3ibm16 5.72E+07 4018 5.25E+07 8.22% 16760 4.17 4x4ibm17 7.19E+07 5051 6.78E+07 5.70% 12240 2.42 4x4ibm18 5.63E+07 4743 5.45E+07 3.20% 13507 2.85 4x4

    avg. 5.66% 2.36

  • 44

    Multilevel Placement StudyCoarsening

    ESC, AMG, FC, FC+opt, PD-FCRelaxation (Intralevel Optimization)

    Interior-point nonlinear programmingk-cycle Goto-style discrete exchangeQuadratic relaxation on subsets (QRS)

    InterpolationDeclustering + partitioning + slot assignmentAMG-style weighted averaging

    Iterated Multilevel FlowRepeated/Recursive V-cycles with reaggregation

  • 45

    AMG-style Linear Interpolation

    The inherited position of a cluster component ( ) is determined by several cluster positions, not just its own.

  • 46

    AMG-based Interpolation

    Use the clique-model (graph) to define connectivity weightsFor each FC-cluster, select one node of maximal connectivity as a C-pointEach C-point is placed at its cluster’s position.Each F-point is placed at the weighted average of the C-points and F-points to which it is connectedThe F-points’ positions can be iteratively improved.

  • 47

    Impact of AMG Interpolation1 vcycle 1 vcycle wirelength runtimemPL1.1 mPL1.1 AMG %improved %incr

    Circuit dom. WL Time dom. WL Time by AMGibm04 7.71E+06 261 7.33E+06 427 4.93% 63.60%ibm07 1.18E+07 396 1.12E+07 437 5.08% 10.35%ibm09 1.29E+07 455 1.28E+07 563 0.78% 23.74%ibm10 2.11E+07 661 2.08E+07 744 1.42% 12.56%ibm14 4.54E+07 1982 4.28E+07 2226 5.73% 12.31%ibm16 5.88E+07 3187 5.73E+07 3615 2.55% 13.43%ibm17 8.17E+07 4173 8.08E+07 4416 1.10% 5.82%ibm18 5.81E+07 4051 5.78E+07 4496 0.52% 10.98%

    avg. 2.76% 19.10%

  • 48

    Multilevel Placement StudyCoarsening

    ESC, AMG, FC, FC+opt, PD-FCRelaxation (Intralevel Optimization)

    Interior-point nonlinear programmingk-cycle Goto-style discrete exchangeQuadratic relaxation on subsets (QRS)

    InterpolationDeclustering + partitioning + slot assignmentAMG-style weighted averaging

    Iterated Multilevel FlowRepeated/Recursive V-cycles with distance-based reaggregation

  • 49

    Iterated Multilevel Flow

    Iterated V-Cycles O(logN)) F-Cycle (O(logNlogN))

    Backtracking V-Cycle O(logN)

  • 50

    Adjustable Vertex Affinity for Reggregation

    Initially, affinity is connectivity + area balancing

    Subsequently, distance is incorporated to retain the information from preceding cycles

  • 51

    Impact of 2nd V-cycleusing proximity and connectivity in the 2nd+ coarsening pass(es)

    (NP+Goto-exchange only; AMG; FC Coarsening)

    Circuit Rel. Time %improvedibm04 1.10 2.32%ibm07 1.40 3.57%ibm09 1.46 4.69%ibm10 1.49 1.44%ibm14 1.28 1.64%ibm16 1.31 2.27%ibm17 1.30 7.43%ibm18 1.26 5.36%

    Avgs. 1.33 3.59%

  • 52

    Summary of Improvements: 12% wirelength reduction

    Achieved using the following techniques (mPL2.0)Coarsening by area-balanced first-choice clusteringRelaxation (Intralevel Optimization)

    Interior-point nonlinear programming at coarsest levelk-cycle Goto-style discrete exchange at every levelQuadratic relaxation on subsets (QRS) at every level

    Interpolation by AMG-style weighted averagingIterated V-cycles with distance-based reaggregation

  • 53

    mPL2.0 vs. mPL1.1, Capo8.5, Dragon and Gordian-L

    Circuit mPL1.0 Capo8.5 Dragon Gor-L mPL1.0 Capo8.5 Dragon Gor-Libm04 1.12 1.07 0.93 1.00 0.30 0.51 2.91 1.82ibm07 1.16 1.14 0.97 1.07 0.30 0.54 2.98 3.37ibm09 1.14 1.12 1.01 1.04 0.29 0.59 4.75 4.31ibm10 1.11 1.09 0.98 0.98 0.27 0.49 4.20 5.84ibm14 1.13 1.05 0.92 1.01 0.34 0.47 2.48 6.78ibm16 1.12 1.12 0.95 1.05 0.19 0.22 2.84 4.88ibm17 1.21 1.13 1.00 1.00 0.34 0.33 5.27 8.04ibm18 1.07 1.06 0.92 0.99 0.30 0.31 4.34 9.56

    Averages 1.13 1.10 0.96 1.02 0.29 0.43 3.72 5.58

    Wirelength/mPL2 CPU time/mPL2Uniform-Cell-Size IBM/ISPD 98 Circuits

  • 54

    mPL2.0 vs. mPL1.1, Capo8.5, Dragon and Gordian-L

    0.00

    1.00

    2.00

    3.00

    4.00

    5.00

    6.00

    0.95 1.00 1.05 1.10 1.15

    scaled wirelength

    scal

    ed ru

    ntim

    e mPL2.0mPL1.1Capo8.5DragonGordian-L

  • 55

    Net Impact of ImprovementsFC, QRS relax, AMG, 2 V-cycles

    mPL2.0-Dom. mPL2.0-Dom.vs. mPL 1.1-Dom.

    Circuit WL CPU WL CPUibm04 6.83E+06 866 0.89 3.32ibm07 1.02E+07 1302 0.86 3.29ibm09 1.13E+07 1569 0.88 3.45ibm10 1.91E+07 2419 0.91 3.66ibm14 4.03E+07 5846 0.89 2.95ibm16 5.25E+07 16760 0.89 5.26ibm17 6.78E+07 12240 0.83 2.93ibm18 5.45E+07 13507 0.94 3.33

    Average 0.88 3.52

  • 56

    Comparison of mPL on PEKO using PEKO Suite-I and Suite-II

    Comparison with the optimal

    0.00

    0.50

    1.00

    1.50

    2.00

    0 100000 200000 300000 400000 500000 600000

    #cells

    Mul

    tiple

    of o

    ptim

    al

    m PL v.1.2 m PL v.2.0

    Total runtime

    0

    5000

    10000

    15000

    20000

    0 100000 200000 300000 400000 500000 600000

    #cells

    runt

    ime(

    s)

    mPL v.1.2 mPL v.2.0

    mPL v.2.0 improves the Quality Ratio by 8% on the averagemPL v.2.0 increases the runtime by 131% on the average

  • 57

    Multilevel SA-based Coarse Placement: mPG

    Multi-level coarse placement for physical hierarchy generation

    A multi-level SA-based frameworkSupport mixed-size large scale global placementSupport routing congestion controlIntegrate retiming with placement

  • 58

    Summary

    There is significant opportunity for circuit placement improvement

    Possibly equal to several process generation advancement

    Multilevel method is a promising approach to large-scale circuit placement

  • 59

    Our Research Objectives

    Fast, scalable, superior-quality placements by multilevel optimization

    Aim at 20-30% improvement (= 1 technology generation!) in 3 yearsSupport

    large-scale mixed-size placement problemConstraint-driven placement for delay, routing etc.Incremental placement to support dynamic netlistadjustment.

  • 60

    Related PublicationsT. Chan, J. Cong, T. Kong and J. Shinnerl, “Multilevel Optimization for Large-scale Circuit Placement,” ICCAD 2000. Tim Kong. Novel Techniques for Large-Scale Circuit Placement. Ph.D. Thesis, CS Dept., UCLA 2002.Chin-Chih Chang, Jason Cong, David Pan, Xin Yuan. “Physical Hierarchy Generation with Routing Congestion Control”, ISPD 2002.Chin-Chih Chang, Jason Cong, Min Xie. “Optimality and Scalability Study of Existing Placement Algorithms.” ASP-DAC, 2003.Chin-Chih Chang, Jason Cong, Xin Yuan. “Multi-level Placement for Large-Scale Mixed-Size IC Designs”, ASP-DAC 2003. Cong and Shinnerl (editors). Multilevel Optimization and VLSICAD. Kluwer Academic Publishers, 2002.

  • 61http://www.wkap.nl/prod/b/1-4020-1081-8