MAPLE: Multilevel Adaptive PLacEment for Mixed-Size Designs


Myung-Chul Kim, Natarajan Viswanathan, Charles J. Alpert, Igor L. Markov, Shyam Ramji
Dept. of EECS, University of Michigan; IBM Corporation
ISPD 2012, Myung-Chul Kim, University of Michigan

Motivation: Interconnect-driven Placement
Interconnect lagging in performance while transistors continue scaling
Circuit delay, power dissipation and area dominated by interconnect
Routing quality highly controlled by placement

Interconnect-driven placement remains one of the most influential optimizations in physical design
The choice of the wirelength-driven placement engine is paramount even in multi-objective placement


[Figure: Unloaded, Coupling, IR drop, RC delay]

--Global placement is one of the first steps in the VLSI design flow and determines the physical layout of a chip. Circuit placement has become critical for today's high-performance VLSI design.
--The first reason is that interconnect is lagging in performance while transistors continue scaling. Circuit delay, power dissipation, and area are increasingly dominated by interconnect, and since the quality of the attainable routing is largely determined by placement, global placement has a significant impact on the final performance of a design.


Large-scale placement remains one of the most influential optimizations in interconnect-driven physical design and physical synthesis [3]. Despite the long history of research, three ISPD contests on placement have shown that recent algorithms achieve sizable gains over the prior state of the art [22]. The ISPD 2011 routability-driven placement contest [30] has demonstrated that the choice of the wirelength-driven global placement engine is paramount even in multi-objective placement: two of the top three teams relied on the high-quality SimPL framework [18], including the contest winners, who reimplemented SimPL without access to the original source code [12]. Yet no placer dominated across the entire benchmark set, indicating room for improvement. Such improvements are described in this paper, although our work is orthogonal to and compatible with the innovations developed for the ISPD 2011 contest [12, 13, 17].

Placement Formulation
Objective: Minimize estimated wirelength (Half-Perimeter WireLength, HPWL)

Subject to constraints:
Legality: Row-based placement with no overlaps

Routability: Limiting local interconnect congestion for successful routing

Timing: Meeting the performance target of a design
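As a concrete reference for the HPWL objective above, here is a minimal Python sketch: the HPWL of a net is the half-perimeter of the bounding box of its pins, and the total objective sums it over all nets, optionally with per-net weights. The `Pin` type and the function names are illustrative assumptions, not MAPLE's data model.

```python
from dataclasses import dataclass


@dataclass
class Pin:
    x: float
    y: float


def net_hpwl(pins):
    """Half-perimeter of the bounding box enclosing one net's pins."""
    xs = [p.x for p in pins]
    ys = [p.y for p in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets, weights=None):
    """Sum of per-net HPWL, optionally scaled by per-net weights."""
    if weights is None:
        weights = [1.0] * len(nets)
    if len(weights) != len(nets):
        raise ValueError("need exactly one weight per net")
    return sum(w * net_hpwl(net) for w, net in zip(weights, nets))


nets = [
    [Pin(0, 0), Pin(2, 1), Pin(1, 3)],  # bounding box 2 x 3 -> HPWL 5
    [Pin(4, 4), Pin(5, 4)],             # bounding box 1 x 0 -> HPWL 1
]
print(total_hpwl(nets))              # 6.0
print(total_hpwl(nets, [1.0, 2.0]))  # 7.0
```

The constraints listed above (legality, routability, timing) are not modeled here; this only evaluates the wirelength estimate that the placer minimizes.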


--The typical objective of placement is to minimize the total wirelength of a design. This is because wirelength can be modeled easily and serves as a good first-order approximation of the real objectives, such as timing, power, and area. However, in pursuing this goal, the optimization is subject to several constraints, such as legality, routability, and timing.

--First, legality. Quadratic or analytical placers can relax the non-overlap constraints. The resulting illegal placements often have cells placed between rows or on top of other cells, requiring a legalizer that assigns cells to rows.
--Second, routability constraints. Reducing excessive congestion in local regions, so that the router can complete routing successfully, is becoming another important problem in physical synthesis. There is no point in producing an unroutable placement. The ISPD 2011 routability-driven placement contest [30] has demonstrated that the choice of the wirelength-driven global placement engine is paramount even in multi-objective placement.
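To make the legality step concrete, the following is a deliberately simplified greedy legalizer, not the legalizer used in MAPLE or SimPL. It snaps each cell to its nearest row, then removes overlaps within each row by shifting cells right in x order. It assumes uniform row height, rows starting at y = 0, no fixed obstacles, and unlimited row width; `Cell` and `legalize_rows` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    x: float      # lower-left x after global placement (may overlap others)
    y: float      # lower-left y after global placement (may lie between rows)
    width: float


def legalize_rows(cells, row_height, num_rows):
    """Greedy row legalization sketch.

    1. Snap each cell to the nearest row, clamped to the placement region
       (Python's round() sends exact .5 ties to the even row).
    2. Within each row, visit cells in x order and push each one right
       until it no longer overlaps the previously placed cell.
    Row capacity, fixed obstacles, and site alignment are ignored.
    Returns {name: (x, y)} where y is the bottom of the assigned row.
    """
    rows = {r: [] for r in range(num_rows)}
    for c in cells:
        r = min(max(round(c.y / row_height), 0), num_rows - 1)
        rows[r].append(c)

    placed = {}
    for r, row_cells in rows.items():
        frontier = float("-inf")  # right edge of the last cell placed in this row
        for c in sorted(row_cells, key=lambda cell: cell.x):
            x = max(c.x, frontier)
            placed[c.name] = (x, r * row_height)
            frontier = x + c.width
    return placed


cells = [Cell("a", 0.0, 0.4, 2.0), Cell("b", 1.0, 0.2, 2.0), Cell("c", 0.5, 1.6, 1.0)]
print(legalize_rows(cells, row_height=1.0, num_rows=2))
# {'a': (0.0, 0.0), 'b': (2.0, 0.0), 'c': (0.5, 1.0)}
```

Production legalizers also balance row capacity and minimize total displacement; this sketch only shows why an overlapping, off-row global placement needs a separate step to become row-legal.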

--Third, timing constraints. Even though producing a placement with good wirelength is critical for timing closure of modern designs, wirelength does not directly target performance. It is often necessary to reduce interconnect on critical paths in order to meet the performance target of a design.

Perspectives
Comparisons and trade-offs between linear and quadratic wirelength functions
Is there a tangible gap between the B2B net model and the HPWL objective in practice?
Can quadratic optimization with a linear net model be effectively improved on multi-million-gate netlists?
Is multilevel placement optimization compatible with the B2B net model and competitive in performance?
Methodology for module spreading and handling of whitespace
The composition of multiple optimizations into a high-precision, reliable multi-objective optimization process

In this work, we try to provide answers to three recurring themes in physical design and physical synthesis. The first theme is comparisons. It has been known since the 1960s that quadratic optimization is computationally efficient but does not track routing demand as closely as the HPWL objective and its weighted variants. In the early 1990s, a linearization technique was proposed, but the modeling of multi-pin nets remained inaccurate, and the research community largely replaced quadratic optimization with non-convex optimization techniques. In the mid-2000s, Spindler proposed the Bound2Bound (B2B) net model in Kraftwerk2, which considerably improves the modeling accuracy of multi-pin nets. With additional improvements to flat quadratic placement, this technique has recently outperformed prior art in both runtime and quality, in terms of both HPWL and routability-driven placement. This development raised several key research questions, and our work answers them in the affirmative, as will be shown.
The second theme addressed in our work is relatively new. Until the late 1990s, whitespace was rare in IC layouts, but it can now reach over 60% by area [22]. We develop efficient techniques for spreading modules during placement while optimizing HPWL beyond the accuracy of the B2B model. This consideration is essential not only to global placement, but also to buffer insertion, gate sizing, and congestion-driven placement.
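The B2B net model mentioned above can be illustrated along one axis. In the formulation commonly attributed to Spindler et al. (Kraftwerk2), the two boundary pins of a p-pin net are connected to each other, every inner pin is connected to both boundary pins, and each connection gets weight 2 / ((p - 1) * |x_i - x_j|) computed from the current positions. Each inner pin contributes |x_i - x_lo| + |x_hi - x_i| = span, and the boundary pair contributes one more span, so the p - 1 "spans" times 2/(p - 1), halved, give exactly the net's extent. In other words, with the usual 1/2 in front of the quadratic form, the cost at those positions equals the net's HPWL contribution along that axis. This is an illustrative sketch; the epsilon guard and the function names are assumptions.

```python
def b2b_connections(xs):
    """Bound2Bound (B2B) connections for one net along a single axis.

    xs: current pin coordinates of a p-pin net (p >= 2).
    Returns (i, j, weight) tuples: the two boundary pins are connected to
    each other, and every inner pin is connected to both boundary pins.
    Each weight is 2 / ((p - 1) * |x_i - x_j|).
    """
    p = len(xs)
    if p < 2:
        raise ValueError("a net needs at least two pins")
    lo = min(range(p), key=lambda i: xs[i])
    hi = max(range(p), key=lambda i: xs[i])
    if lo == hi:  # all pins coincide: pick any other pin as the upper bound
        hi = (lo + 1) % p
    pairs = [(lo, hi)]
    pairs += [(i, b) for i in range(p) if i not in (lo, hi) for b in (lo, hi)]
    eps = 1e-9  # assumed guard against division by zero for coincident pins
    return [(i, j, 2.0 / ((p - 1) * max(abs(xs[i] - xs[j]), eps)))
            for i, j in pairs]


def quadratic_cost(xs, connections):
    """1/2 * sum of w_ij * (x_i - x_j)^2 over the given connections."""
    return 0.5 * sum(w * (xs[i] - xs[j]) ** 2 for i, j, w in connections)


xs = [3.0, 0.0, 1.0, 7.0]           # x-span (HPWL contribution) is 7
conns = b2b_connections(xs)
print(len(conns))                   # 5 connections: 1 + 2 * (4 - 2)
print(quadratic_cost(xs, conns))    # approximately 7.0
```

The match with HPWL holds only at the positions used to compute the weights; once cells move, the weights must be recomputed, which is why B2B-based placers rebuild the net model between successive linear solves.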

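The third fundamental theme explored in our work is the composition of multiple optimizations into a high-precision, reliable multi-objective optimization process. Our key discovery is that transitions between multiple objective functions and optimization techniques in placement often lead to major disruptions.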

Ultimately, by providing our answers to these questions, we try to push the performance envelope of state-of-the-art placement algorithms and achieve significant improvements in both wirelength and quality of spreading.

Key features of MAPLE
A multilevel force-directed placement algorithm
Coarsest-level placement: a variant of SimPL
Multilevel extensions reinforced by Progressive Local Refinement (ProLR)
Techniques to avoid or suppress disruptions inherent in analytic placement algorithms
Adaptive to current placements, relying on a new placement density metric, ABU
Handling of movable macros

MAPLE produces strong results both in wirelength and the quality of spreading on standard benchmarks

MAPLE is a multilevel force-directed placement algorithm that pioneers key algorithmic components and an effective way of combining them. MAPLE generates the coarsest-level placement with a variant of the SimPL algorithm and employs multilevel extensions reinforced by our new Progressive Local Refinement (ProLR). We study obstacles to extending analytic placement with multilevel techniques and observe that straightforward extensions cause disruptions between successive optimizations during global placement. So a key insight of our work is to ensure graceful transitions between optimizations at different cluster levels. Optimization adapts to the current wirelength/density trade-off, which we track with a newly developed metric called ABU. MAPLE also handles movable macros. Compared to recent literature, our implementation produces superior solution quality with reasonable runtimes.

Before I describe the algorithm in detail, I'll first explain more about ABU.

A Placement Density Metric: ABU (1)
Density metrics during global placement
Provide insights into the quality of module spreading in intermediate placements
Estimate the wirelength impact of legality enforcement
Allow the global placer to adaptively adjust its parameters

ABU: Average Bin Utilization of the top x% densest bins
Reflects the nonuniformity of module distribution
More intuitive than overflow-based metrics
Enables comparisons of different parameter settings, and even of different analytical placers, across iterations

We explore density metrics during global placement, which provide insights into the quality of module spreading in intermediate placements and estimate the wirelength impact of legality enforcement. Based on such a metric, the global placer can adaptively adjust its parameters depending on how concentrated the placement is. To this end, we propose a new density metric, ABU: the average bin utilization of the top x% densest bins. Because only the top x% densest bins are averaged, this metric reflects the nonuniformity of the module distribution. Compared to overflow-based metrics, ABU provides a more intuitive, cross-design perspective on the quality of module spreading, as it is directly comparable to the target utilization. Monitoring density along with wirelength during placement enables comparisons of different parameter settings and even of different analytical placers, as shown in the next slide.

A Placement Density Metric: ABU (2)
Comparisons with different placers speed up new algorithm development

For instance, this plot shows the progression of the density metric ABU10 versus wirelength, comparing SimPL lower bounds (with FastPlace-DP) and FastPlace3 on ADAPTEC1. Typical analytical placers start from a highly wirelength-optimized solution and decrease module density over global placement iterations. Therefore, a steeper slope and data points closer to the origin indicate better trade-offs. Each square box indicates the beginning of detailed placement.
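The ABU metric described above can be sketched in a few lines: sort the per-bin utilizations, keep the top x% densest bins, and average them. We assume bin utilization has already been computed (for example, movable cell area divided by the bin's available area, so 1.0 means exactly full), and we round the bin count up so at least one bin is always included; both choices are assumptions, not details from the slides.

```python
import math


def abu(bin_utils, top_percent):
    """Average Bin Utilization of the top `top_percent`% densest bins.

    bin_utils: per-bin utilization values (assumed: movable cell area
    divided by the bin's available area, so 1.0 means exactly full).
    """
    if not bin_utils:
        raise ValueError("no bins")
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    # Number of bins to average; rounding up (at least one bin) is an assumption.
    k = max(1, math.ceil(len(bin_utils) * top_percent / 100))
    densest = sorted(bin_utils, reverse=True)[:k]
    return sum(densest) / k


utils = [0.2, 0.9, 1.4, 0.5, 0.7, 1.1, 0.3, 0.6, 0.8, 0.4]
print(abu(utils, 10))   # ABU10 over 10 bins averages the single densest bin: 1.4
print(abu(utils, 20))   # averages 1.4 and 1.1
```

Because the result is itself a utilization, it can be read directly against the target utilization, which is the cross-design intuition the slides attribute to ABU.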

In the next couple of slides, I'll explain the sources of disruption in analytical placement algorithms.

Analysis of Noise during Analytical Opt. (1)
Unclustering
Often includes changes to the optimization objectives as well as to the netlist
When the wirelength weight is decreased, wirelength and module density change sharply and are then refined

Figures are from A. B. Kahng, Q. Wang, Implementation and Extensibility of an Analytic Placer, IEEE TCAD 24(5), 2005

[Figure: Discrepancy vs. iterations; HPWL vs. iterations]
These figures are quoted from the APlace paper. Discrepancy is a kind of overflow metric. The figures clearly show that when the wirelength weight is decreased, discrepancy drops sharply, while wirelength often increases at first and is then refined during the optimization.

Analysis of Noise during Analytical Opt. (2)
Transition to the HPWL objective
Quadratic optimization-based placers often use techniques to recover HPWL
ILR [FastPlace, DPlace2, RQL] increasingly penalizes dense bins and allows abrupt moves to decrease local density

FastPlace [28] and RQL [29] use ILR iterations to recover HPWL after quadratic optimization and before detailed placement. ILR iterations include bin resizing over wide ranges to allow large moves across the placement region. In ILR, each bin maintains a bin-specific utilization weight 0

2% on average
1.13x, 2.28x faster than mPL6, APlace2, and
2.32x, 6.25x, 7.14x slower than NTUPlace3, FastPlace3, SimPL

MAPLE is compared to other state-of-the-art standalone placement algorithms on the ISPD 2005 benchmarks. MAPLE found placements with the lowest HPWL for seven out of eight circuits in the ISPD 2005 benchmarks (no parameter tuning to specific benchmarks was employed). The gray shaded entries indicate previous best-published numbers.

Empirical Validation: ISPD 2006
MAPLE improves scaled HPWL by more than 3%
Compared to RQL and NTUPlace3, MAPLE achieves a lower overflow penalty on average

We compared MAPLE to other state-of-the-art academic and industry placers on the ISPD 2006 benchmark suite. The table reports scaled HPWL and overflow penalty for several placers.

MAPLE obtains the best scaled HPWL results on seven out of eight circuits. Furthermore, compared to the other two best-performing placers on these benchmarks, RQL and NTUPlace3, MAPLE achieves a lower overflow penalty on average. Thus, MAPLE not only reduces wirelength but also avoids highly concentrated placements.

Summary
New wirelength-driven global placement algorithm: MAPLE
Employs a strong force-directed placer for the coarsest level
Multilevel extensions reinforced by two-tier Progressive Local Refinement (ProLR)
Techniques to facilitate graceful transitions between multiple optimizations during global placement

MAPLE is implemented and evaluated within an industry framework
Empirical evaluation shows strong results on standard benchmarks
Many more applications exist in physical synthesis

The significance of large-scale placement in IC physical design is well documented in recent literature [3] and continues to grow with the amount of on-chip random logic and current trends in interconnect scaling.

Our key discovery is that transitions between multiple objective functions and optimization techniques in placement often lead to major disruptions. To address this, we developed new techniques, such as two-tier Progressive Local Refinement (ProLR), to facilitate graceful transitions between multiple optimizations. In placement, these techniques are applied before and after unclustering, during the transition from a quadratic objective to HPWL, and before detailed placement. Many more applications exist in physical synthesis.

Thank you!


As MAPLE is currently slower than some of its competitors, we note that industry implementations like ours tend to be handicapped (versus standalone academic implementations) by the use of a multipurpose design database. Because such a database stores information unnecessary for placement, the resulting loss of cache locality increases runtime. Other relevant legacy infrastructure in our database includes netlist-query support for accurate timing analysis and physical synthesis. In contrast to academic placers, our industry-strength implementation can work with a netlist that changes dynamically during physical synthesis. Unlike the original SimPL, our implementation does not use SSE instructions and is almost twice as slow (so far, we have focused on solution quality, not runtime).

Computation of the Initial Step
MAPLE uses a step function that distinguishes different cases:
(1) emphasis on wirelength optimization
(2) no bias
(3) emphasis on spreading
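The three cases above can be sketched as a piecewise function of the gap between the target utilization and the design utilization. Following the speaker notes, a small gap maps to a large initial step (emphasis on spreading) and a wide gap maps to a small one (emphasis on wirelength). The thresholds and step values are hypothetical caller-supplied parameters; the slides do not give MAPLE's actual function or its constants.

```python
def initial_step(design_util, target_util, small_gap, large_gap,
                 spread_step, neutral_step, wirelength_step):
    """Three-case initial step selection (illustrative sketch).

    All thresholds and step values are hypothetical, caller-supplied
    parameters; a larger step means more emphasis on spreading.
    """
    if not small_gap < large_gap:
        raise ValueError("small_gap must be below large_gap")
    gap = target_util - design_util  # whitespace headroom left to the placer
    if gap <= small_gap:
        return spread_step       # case (3): emphasis on spreading
    if gap >= large_gap:
        return wirelength_step   # case (1): emphasis on wirelength
    return neutral_step          # case (2): no bias


# Hypothetical values for illustration only.
params = dict(small_gap=0.1, large_gap=0.3,
              spread_step=3.0, neutral_step=2.0, wirelength_step=1.0)
print(initial_step(0.50, 0.55, **params))  # 3.0 (small gap: spread)
print(initial_step(0.50, 0.70, **params))  # 2.0 (no bias)
print(initial_step(0.40, 0.90, **params))  # 1.0 (wide gap: wirelength)
```

Since the design utilization is fixed for a given netlist, varying only `target_util` in this sketch moves the result between the three cases, matching the observation that the step depends only on the designer-chosen target.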

When the difference between the design utilization and the target utilization is small, placement iterations should aggressively reduce density, which is achieved by using a large initial step (greater emphasis on spreading in LR). On the other hand, a wider gap between the two justifies a greater weight for wirelength, and the best wirelength is often achieved by using a small initial step (greater emphasis on wirelength in LR). Given that the design utilization is fixed, the step function depends only on the target utilization, which is typically chosen by the designer.

Prior Work
Ideal placer
Fast runtime without sacrificing solution quality
Reasonable runtime with superior solution quality
[Figure: speed vs. solution quality of quadratic and force-directed placers (mFAR, Kraftwerk2, FastPlace3), non-convex optimization placers (mPL6, APlace2, NTUPlace3), and the ideal placer]

--As prior work, state-of-the-art algorithms for global placement can be categorized into two families: (1) force-directed quadratic placers, such as mFAR, Kraftwerk2, and FastPlace3; and (2) placers based on non-convex optimization, such as mPL6, APlace2, and NTUPlace3.
--Quadratic placers formulate netlength as a quadratic cost function, which can be minimized efficiently by solving systems of linear equations. However, minimizing netlength alone may result in considerable overlap. Therefore, force-directed placers add spreading forces that pull cells away from high-density regions and discourage cell overlap.
--On the other hand, non-convex optimization models net length and cell density with functional terms simultaneously and then minimizes the total cost using numerical analysis methods, which can be more accurate than forces. However, placers based on this optimization often need high CPU times, even with attempts to reduce complexity through a multilevel approach.
So the ideal placer should have fast runtime without sacrificing solution quality; we also focus on simplicity and easy integration with other optimizations. To this end, we present MAPLE.
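The quadratic-placement principle described in the prior-work notes can be sketched in one dimension. Setting the gradient of the sum of w_ij * (x_i - x_j)^2 to zero yields a linear system whose matrix is a weighted graph Laplacian, with fixed pins (for example, I/O pads) adding to the diagonal and to the right-hand side. This is a toy dense solver with hypothetical function names, not the sparse iterative solvers (such as conjugate gradient) that real placers typically use, and it omits the spreading forces that force-directed placers add.

```python
def quadratic_place_1d(num_movable, edges, anchors):
    """Minimize sum w * (x_i - x_j)^2 along one axis.

    edges:   (i, j, w) connections between movable cells i and j
    anchors: (i, pos, w) connections from movable cell i to a fixed pin at pos
    Zeroing the gradient gives A x = b, where A is the weighted graph
    Laplacian of the movable cells plus anchor weights on its diagonal.
    Each connected group of cells needs at least one anchor, or A is singular.
    """
    n = num_movable
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i, j, w in edges:
        A[i][i] += w
        A[j][j] += w
        A[i][j] -= w
        A[j][i] -= w
    for i, pos, w in anchors:
        A[i][i] += w
        b[i] += w * pos
    return solve_dense(A, b)


def solve_dense(A, b):
    """Gaussian elimination with partial pivoting (dense; toy sizes only)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


# Two movable cells in a chain between fixed pins at 0 and 3.
print(quadratic_place_1d(2, edges=[(0, 1, 1.0)],
                         anchors=[(0, 0.0, 1.0), (1, 3.0, 1.0)]))
# [1.0, 2.0]
```

With unit weights the cells settle evenly between the fixed pins. On real netlists the same minimization pulls many cells toward the same region, which is the overlap that spreading forces (or, in SimPL-style placers, additional anchors) are meant to counteract.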

Runtimes (%)
Top-level placement iterations            26.3
BestChoice clustering and unclustering     0.2
ProLR-w                                   32.0
ProLR-d                                   33.4
Post global placement                      5.5
IO                                         2.6
