BonnCell: Automatic Cell Layout in the 7nm Erahougardy/paper/7nmBonnCell_OR_Report.pdf ·...

14
1 BonnCell: Automatic Cell Layout in the 7nm Era Pascal Cremer, Stefan Hougardy, Jannik Silvanus, and Tobias Werner Abstract—Multi patterning technology used in 7nm technology and beyond imposes more and more complex design rules on the layout of cells. The often non local nature of these new design rules is a great challenge not only for human designers but also for existing algorithms. We present a new flow for automatic cell layout generation that is able to deal with these challenges by globally optimizing several design objectives simultaneously. Our transistor placement algorithm not only minimizes the total cell area but at the same time guarantees the routability of the cell and finds a best arrangement and folding of the transistors. Our routing engine computes a detailed routing of all nets simultaneously. It computes a netlength optimal routing using a mixed integer programming formulation. Additional DFM constraints are added to this model to improve yield and reduce chip manufacturing costs. We present experimental results on current 7nm designs. Our approach allows to compute optimized layouts within a few minutes, even for large complex cells. The algorithms are used for the design of logic cells compatible with a published 7nm technology from a leading chip manufacturer where they meet manufacturability requirements and significantly reduced design turn around times. Index Terms—automated cell generation, cell design, multiple patterning lithography, design for manufacturability I. I NTRODUCTION In a hierarchical design of a complex chip the cells are the functional units at the lowest level of the hierarchy. A cell realizes simple logical functionality as for example an AND-function, a buffer or a latch. These cells are used many times on a chip and therefore much effort is spent to find area optimized transistor level layouts. This reduces the overall chip area and thereby improves chip costs. A standard cell library is a collection of these cells that contains many implementations of the same logic function, differing in the number of inputs, power level, and timing behavior. The total number of cells contained in a standard cell library is in the range between 100 and 2,000 cells. Some applications require custom cells, i.e. cells not contained in the standard cell library. An example is the design of high-speed SRAM, where dynamic logic is used which cannot be mapped to standard logic gates. Lacking high-quality automatic cell layout generators, these custom cells have so far been built manually by experienced designers. The design rules and DFM (design for manufac- turability) requirements become increasingly complex with each new technology and the number of different cells used in modern designs is growing steadily. In addition, time-to- market pressure is increasing. The manual layout of a complex cell can take several days making this process a severe P. Cremer, S. Hougardy, and J. Silvanus are with the Research In- stitute for Discrete Mathematics, University of Bonn, Germany (e-mail: [email protected]). T. Werner is with IBM Systems, B¨ oblingen 71032, Germany (e-mail: [email protected]). Fig. 1. Schematic view of a 7nm layout showing a single finFET and some wiring. Fins (yellow), M0, and M2 are horizontal while PC, TS, and M1 are vertical layers. The diffusion area is denoted by RX. The via layers are CA, V0, and V1. bottleneck in turnaround time, particularly due to increased automation in other design steps. This drives the need for high- quality automatic cell layout generators. In this paper, we present a new flow for the automatic generation of cell layouts, both for placement and routing. Our approach provides solutions that are provably optimal in terms of area consumption and guarantees routability. All cells produced by our algorithm are guaranteed to be LVS (Layout Vs Schematic) and DRC (Design Rule Check) clean. We have implemented cell level design rules consistent with [14] which describes Globalfoundries’ proposed 7nm technology and consistent with the SADP trim shape process as described in [2], [12]. We consider all these design rules already during the placement and the routing phase. This is a crucial requirement for the current 7nm technology and beyond as design rule cleanness can no longer be achieved by local post processing operations alone. We also consider many DFM rules to optimize the yield. In Globalfoundries’ 7nm node all cell internal layers are uni-directional (see Fig. 1). The layers PC and TS are used to contact gates and source/drain contacts of the finFETs. A finFET can have more than one gate. We refer to the number of gates as the number of fingers of the transistor. A transistor is called folded if it has more than one finger. Three additional wiring layers M0, M1, and M2 are used that alternately have horizontal and vertical orientation. As M2 is mainly used for inter cell connections, it should be used for internal cell wiring only if necessary. The wiring layers are connected by via layers CA, V0, and V1, where CA connects both PC and TS with M0, V0 connects M0 with M1, and V1 connects M1 with M2. Different multiple patterning techniques are applied for these layers. SAQP (self aligned quadruple patterning) is used for the fins, SADP (self aligned double patterning) for the metal layers, and up to four times litho-edge for the via layers [26]. The cell layout problem can be described as follows. As This paper appeared in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 39 (2020), 2872-2885

Transcript of BonnCell: Automatic Cell Layout in the 7nm Erahougardy/paper/7nmBonnCell_OR_Report.pdf ·...

  • 1

    BonnCell: Automatic Cell Layout in the 7nm EraPascal Cremer, Stefan Hougardy, Jannik Silvanus, and Tobias Werner

    Abstract—Multi patterning technology used in 7nm technologyand beyond imposes more and more complex design rules on thelayout of cells. The often non local nature of these new designrules is a great challenge not only for human designers but alsofor existing algorithms. We present a new flow for automaticcell layout generation that is able to deal with these challengesby globally optimizing several design objectives simultaneously.Our transistor placement algorithm not only minimizes the totalcell area but at the same time guarantees the routability of thecell and finds a best arrangement and folding of the transistors.Our routing engine computes a detailed routing of all netssimultaneously. It computes a netlength optimal routing usinga mixed integer programming formulation. Additional DFMconstraints are added to this model to improve yield and reducechip manufacturing costs.

    We present experimental results on current 7nm designs. Ourapproach allows to compute optimized layouts within a fewminutes, even for large complex cells. The algorithms are usedfor the design of logic cells compatible with a published 7nmtechnology from a leading chip manufacturer where they meetmanufacturability requirements and significantly reduced designturn around times.

    Index Terms—automated cell generation, cell design, multiplepatterning lithography, design for manufacturability

    I. INTRODUCTION

    In a hierarchical design of a complex chip the cells arethe functional units at the lowest level of the hierarchy. Acell realizes simple logical functionality as for example anAND-function, a buffer or a latch. These cells are used manytimes on a chip and therefore much effort is spent to find areaoptimized transistor level layouts. This reduces the overall chiparea and thereby improves chip costs. A standard cell library isa collection of these cells that contains many implementationsof the same logic function, differing in the number of inputs,power level, and timing behavior. The total number of cellscontained in a standard cell library is in the range between 100and 2,000 cells. Some applications require custom cells, i.e.cells not contained in the standard cell library. An example isthe design of high-speed SRAM, where dynamic logic is usedwhich cannot be mapped to standard logic gates.

    Lacking high-quality automatic cell layout generators, thesecustom cells have so far been built manually by experienceddesigners. The design rules and DFM (design for manufac-turability) requirements become increasingly complex witheach new technology and the number of different cells usedin modern designs is growing steadily. In addition, time-to-market pressure is increasing. The manual layout of a complexcell can take several days making this process a severe

    P. Cremer, S. Hougardy, and J. Silvanus are with the Research In-stitute for Discrete Mathematics, University of Bonn, Germany (e-mail:[email protected]).

    T. Werner is with IBM Systems, Böblingen 71032, Germany (e-mail:[email protected]).

    Fig. 1. Schematic view of a 7nm layout showing a single finFET and somewiring. Fins (yellow), M0, and M2 are horizontal while PC, TS, and M1 arevertical layers. The diffusion area is denoted by RX. The via layers are CA,V0, and V1.

    bottleneck in turnaround time, particularly due to increasedautomation in other design steps. This drives the need for high-quality automatic cell layout generators.

    In this paper, we present a new flow for the automaticgeneration of cell layouts, both for placement and routing.Our approach provides solutions that are provably optimalin terms of area consumption and guarantees routability.All cells produced by our algorithm are guaranteed to beLVS (Layout Vs Schematic) and DRC (Design Rule Check)clean. We have implemented cell level design rules consistentwith [14] which describes Globalfoundries’ proposed 7nmtechnology and consistent with the SADP trim shape processas described in [2], [12]. We consider all these design rulesalready during the placement and the routing phase. This isa crucial requirement for the current 7nm technology andbeyond as design rule cleanness can no longer be achievedby local post processing operations alone. We also considermany DFM rules to optimize the yield.

    In Globalfoundries’ 7nm node all cell internal layers areuni-directional (see Fig. 1). The layers PC and TS are usedto contact gates and source/drain contacts of the finFETs. AfinFET can have more than one gate. We refer to the numberof gates as the number of fingers of the transistor. A transistoris called folded if it has more than one finger. Three additionalwiring layers M0, M1, and M2 are used that alternately havehorizontal and vertical orientation. As M2 is mainly used forinter cell connections, it should be used for internal cell wiringonly if necessary. The wiring layers are connected by via layersCA, V0, and V1, where CA connects both PC and TS withM0, V0 connects M0 with M1, and V1 connects M1 with M2.Different multiple patterning techniques are applied for theselayers. SAQP (self aligned quadruple patterning) is used forthe fins, SADP (self aligned double patterning) for the metallayers, and up to four times litho-edge for the via layers [26].

    The cell layout problem can be described as follows. As

    This paper appeared in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 39 (2020), 2872-2885

  • 2

    0 1 2 3 4 5 6 7 8

    (a) Placement containing power bus (green), PC (blue), TS (brown),and FET boundaries (black). FETs are arranged in two horizontal stackscontaining five FETs each.

    0 1 2 3 4 5 6 7 8

    (b) DRC clean routing with optimal manufacturability and wire length.

    Fig. 2. Example results after (a) placement, (b) routing and DFM post processing.

    input an image of the cell is given, i.e. an area with predefinedhorizontal power tracks at the top and bottom of the cell,equidistant vertical tracks for PC, TS, and M1, fin positions,and (not necessarily equidistant) horizontal tracks for M0 andM2. The finFETs, partitioned into p-FETs and n-FETs, haveto be placed in two horizontal stacks in between the powertracks (see Fig. 2a). The electrical connectivity of the FETsand their sizes is described in a netlist. The task is to decidehow many fingers a FET should use and to assign a locationto each FET. Both choices are subject to the design rulesand DFM constraints. Here, the width of the cell is the mostimportant optimization criterion as this determines the area ofthe cell on the chip. Given a placement of FETs, the goalin routing is to find an embedding of rectilinear Steiner treeswhich realizes the given netlist. This has to be done meetingthe design rules and DFM constraints, as well. As the overallgoal in routing, we minimize weighted net length, with thetopmost available layer M2 being more expensive than otherlayers. Other objectives (e.g. pin accessibility [25], numberof vias, or electromigration reliability [11], [17], [26]) can beincluded as well.

    A crucial point for the placement algorithm is to guaran-tee routability without making pessimistic assumptions. Weachieve this by fully integrating the routing algorithm into ourplacement algorithm. This way we are able to find area mini-mum placements which are guaranteed to be routable. Amongall such placements, our algorithm finds one that minimizesthe estimated weighted netlength, or if more running time isallowed, it even guarantees to find an area optimum solutionthat minimizes the weighted netlength.

    Simple sequential rip-up and reroute approaches turned outto fail for most of our placements. Moreover, these approachesdo not allow to guarantee the routability of a given placement.Instead, we use an approach that allows us to route all netssimultaneously and consider all design rules already whilebuilding up the nets. The latter is required because only fewdesign rule violations can be fixed after routing due to thelimited space of our compact placements. Our router respectsDFM constraints in the post processing phase. We also havesuccessfully extended our approach to multi-row cells (seeSection II-K).

    A. Related Work

    Most previous work on cell layout only focuses on subprob-lems or restricted versions of the general cell layout problemand is not directly applicable to 7nm layouts. Moreover, designrules and DFM requirements are more restrictive in 7nm thanin previous 14nm/15nm [13] and 10nm finFET technologynodes.

    Many different approaches have been suggested for thetransistor placement problem. In [1] and [10] combinatorialalgorithms are presented that optimize the cell area but assumea given transistor folding. The authors of [6] use an integerprogramming approach that optimizes cell area and includestransistor folding but they assume the possibility to pair n-FETs and p-FETs. In [19] a branch and bound approach isused for transistor placement that allows to optimize additionalobjectives, but transistor folding is not considered.

    The placement algorithm has to consider the complicateddependencies between positions of n-FETs and p-FETs thatarise due to the design rules of the SADP trim mask forthe FEOL (front end of line) layers. Several approaches tohandle SADP in automated design have been suggested, e.g.[24], [26]. Our approach differs in that we allow variablegate widths during trim mask generation and guarantee to findvalid solutions if existent. Moreover, our placement algorithmguarantees routability by applying the routing algorithm whileconstructing a placement. To the best of our knowledge,our placement algorithm is the first to guarantee a routableplacement without wasting cell area.

    For intra cell routing many different approaches are known.Traditional channel routing [22] and simple rip-up and reroutestrategies [15] fail in recent technologies. More successful areSAT-based [18] and the closely related integer-programmingbased [25] approaches. In these approaches, a set of candidatesolutions is generated for each net. A SAT-formulation or aninteger linear program is used to select one realization for eachnet so that all design rules are met.

    Our routing approach is also based on an integer program-ming formulation. However, we do not need to pre-computecandidate solutions for each net but instead implicitly generateall possible routings for all nets simultaneously while packing

    This paper appeared in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 39 (2020), 2872-2885

  • 3

    postoptimizerouting

    routePoptimizingnetlength

    Yes

    No

    increasecellwidth

    storePifitimprovesthebestknownsolution

    FoundroutableplacementP?

    GenerateallextensionsofPbyasingleFET

    andaddtoS

    SelectapartialplacementPfromS

    IsProutable?

    IsPfullyplaced?

    IsSempty?

    No

    Yes

    No Yes

    YesNo

    endoutputsolution

    Placement Routing

    discardP

    startcellwidth=1

    InitializethesetSofpartialplacementsbyinsertingtheemptyplacementintoS

    Fig. 3. Chart of our proposed new flow. Routing steps are in blue. Details of the branch and bound placement algorithm are given in Sections II-D to II-J.Note that the placement algorithm involves queries to a routability oracle which is a key feature of our new flow.

    them. Using this approach we do not have to restrict thecandidate solutions in advance (e.g. by restricting them to liewithin some bounding box around the terminals) but are ableto consider all possible routings. While this makes the searchspace much larger, our well chosen integer programmingformulation turns out to be quickly solvable by standard MIPsolvers. In [9] a conversion from an integer programmingformulation to a SAT-formulation is used to speed up solutiontimes.

    Only few approaches exist so far that simultaneously solvethe placement and routing problem for a cell. One suchexample is [8]. However, all these approaches have to simplifythe placement and/or routing problem to reduce the searchspace complexity. Our approach does not require any sim-plifications and therefore is able to guarantee optimality ofsolutions without requiring too much running time. Typically,our approach is able to place and route cells with up to 15FETs within a few minutes.

    Previous work on BonnCell appeared in [3]. Compared tothis earlier version, many improvements have been made. Mostimportantly, our placement algorithm now, for the first time,allows computing routable layouts minimizing total cell area.

    Our new flow is depicted in Fig. 3, the details will bedescribed in the following sections: In Section II, we discussthe placement algorithm, followed by the routing algorithm inSection III. Section IV reports the results of our implementa-tion on cells at the 7nm technology node.

    II. PLACEMENTA. Problem Definition

    The input of the cell placement problem is a set F ofFETs, a set N of nets and a large number of technology-

    specific constraints. A FET is characterized by a tuple(Smin, Smax, Ng, Ns, Nd, t, v), where• [Smin, Smax] ⊆ N2 is the legal size interval of the FET

    measured in the number of fins intersected by the gates,• Ng, Ns, and Nd are the nets connected to gate, source,

    and drain, respectively,• t ∈ {n-FET, p-FET} is the FET’s type, and• v ∈ N is its VT level.A legal realization of a FET must intersect S ∈ [Smin, Smax]

    fins. By using an interval for possible FET sizes, we allowoptimizations over the size of the FET, if the design allowsthe flexibility. Otherwise, an interval containing a single FETsize can be specified. We allow each FET to be folded, i.e.allow realizations with different numbers of fingers. Therefore,solving the placement problem does not only include theassignment of locations to each transistor but also decidinghow large the FET should be exactly and how many fingersshould be used. The total size S of a FET can be distributedto several fingers. Using only one finger, the FET is realizedwith one gate, intersecting S fins. Using a larger number offingers, the FET is realized with several gates, located next toeach other, which in total intersect S fins. If, for example, thesize of a FET is 6 (measured in fins) it can be realized with 1,2, 3, and 6 fingers, each covering 6, 3, 2, and 1 fin respectively.The number of fins intersected by a single finger is called theheight of a FET. Depending on the used cell image, some FETheights can be forbidden, e.g. most images do not allow FETheights of 1 and there is also an upper bound on the allowedheight. A FET realized with f fingers has f gates and f + 1source and drain contacts. A FET with several fingers connectssource and drain nets alternately. The placement algorithm isalso allowed to swap FETs. In this case, the source and drain

    This paper appeared in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 39 (2020), 2872-2885

  • 4

    Ns Nd

    Ng

    Ns Nd

    Ng

    Ns

    Ng

    Nd Ns

    Ng

    Nd

    Ng

    Fig. 4. A FET of size 4 realized with 1 finger, 2 fingers, and 2 fingersswapped. The heights are 4, 2, and 2 fins respectively. Gates are shown inblue, source and drain contacts in gray.

    contacts of the FET exchange their places. Fig. 4 shows thesame FET realized in three different ways. The transistors arearranged in two horizontal stacks, one next to each power rail.One stack consists of the cell’s n-FETs and is placed directlynext to the lower power rail, whereas the other stack containsthe p-FETs and is placed directly next to the upper power rail.

    Definition 1. The configuration c of a FET is defined as thetuple (x, f, h, s) with• location x ∈ N (measured in PC tracks),• finger number f ∈ N>0,• height h ∈ N>0 (measured in fin intersections per finger),

    and• swap status s ∈ {true, false}.

    For s = false the leftmost contact belongs to the source net,for s = true it belongs to the drain net. Given a configurationc the terms x(c), f(c), h(c), and s(c) give the location, fingernumber, height, and swap status of c respectively.

    Definition 2. Given n FETs F1, . . . , Fn, a placement C isdefined as a tuple (c1, . . . , cn), where ci is a configuration forFET Fi.

    The output of the placement algorithm is a placement C andthe routability guarantee (see Section II-F) of the placement.This information is then passed to the routing algorithm (seeSection III).

    B. Design Rules

    There are many design rules which have to be obeyed. Wesplit them into two sets, legality and routability.

    Legality. The cell image guarantees that FETs do not overlapvertically. In horizontal direction FETs need to stay within thecell image which is given by the cell width Wcell. They alsoneed to obey some minimum horizontal distance to each otherdepending on their configurations. This rule only applies toneighboring FETs. Two FETs of a placement are neighbors ifthere is no other FET in between.

    Two neighboring FETs are allowed to share contacts if theyhave the same VT level, same height, and the overlappingsource and drain contacts belong to the same net. Moreformally, this is captured by the following definition.

    Definition 3. Two neighboring FETs F1, F2 with configura-tions c1 = (x1, f1, h1, s1), c2 = (x2, f2, h2, s2) and x2 > x1are allowed to share if• h1 = h2,• NR(F1, c1) = NL(F2, c2), and

    F1 F2 F3 F4

    Fig. 5. Illustration of placement legality rule. FETs F1 and F2 have the sameVT level, same height and same contact net facing each other. Thus they areallowed to share their diffusion regions. FETs F3 and F4 have different heightand must therefore be separated by two empty PC tracks.

    • V T (F1) = V T (F2),

    where NL(F, c) denotes the leftmost net of FET F in configu-ration c. Similarly NR(F, c) denotes the rightmost net of FETF in configuration c. V T (F ) denotes the VT level of F . Inthis case the configuration is legal if x2 ≥ x1 + f1. Otherwiseit is legal if x2 ≥ x1 + f1 + 2.

    For sharing FETs the diffusion regions overlap and thecontact is used simultaneously by both FETs. If sharing is notallowed, the FETs must be separated by at least two emptyPC tracks. Fig. 5 gives an illustration of this rule. Placementsthat obey this rule are called legal.

    Routability. A legal placement might still not be manufac-turable. There are many more complicated rules. For example,all gates are manufactured with self aligned double patterning(SADP). In the first step, a regular pattern of unidirectionalpoly shapes is generated. In the second step, these shapes arecut off by a trim mask, leaving the desired gates. Not all legalplacements admit a legal layer decomposition. Furthermore,the placement is useless if it is not routable on the metal layers.Therefore a full routability check needs to decide whether aplacement is usable or not.

    The placement legality rules will be obeyed by our place-ment algorithm by construction. Routability rules are checkedby using the routing algorithm as described in Section III asa black box oracle. Only routable placements are returned byour algorithm.

    C. Objective Function

    We only consider placements which are routable underconsideration of all LVS and DRC rules. We simply call theseplacements routable. There may be many routable placementsfor a given instance, some of which are more preferable thanothers. For a routable placement, we define its objective valueas the tuple consisting of

    1) cell width Wcell,2) weighted netlength estimation,

    which is minimized lexicographically. More precisely, thealgorithm returns a placement which respects all design rulesand additionally globally minimizes the cell width Wcell. Ifthere are several such placements the algorithm chooses aplacement among them with the minimum weighted netlengthestimate.

    This paper appeared in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 39 (2020), 2872-2885

  • 5

    D. Placement Algorithm

    Our algorithm is capable to realize multi-row cells. Thesecells occupy multiple circuit rows and have several pairs ofstacks placed upon each other. For the moment, we focuson single-row instances with two stacks, more details on ourmulti-row implementation will be given in Section II-K.

    The placement algorithm is based on a branch and boundapproach. It consists of two parts, the first (Algorithm PLACE-CELL) iterates over the cell width in increasing order and callsAlgorithm PLACEFIXEDCELLWIDTH for each fixed width.Algorithm PLACEFIXEDCELLWIDTH solves the placementproblem and returns an optimum placement or that no routableplacement exists with the given cell width. It proceeds byplacing the FETs iteratively from left to right for both stackssimultaneously. After some FETs have already been placed,the next FET is chosen from the remaining FETs and placed tothe right of the already placed FETs on this stack. For this FETall possible configurations (position, number of fingers, height,swap status) are tried. The resulting search tree is illustratedin Fig. 6. For some FET configurations, the resulting partialplacement is illegal and discarded directly. The remainingplacements are legal but might not be routable. Therefore,routability is checked for each of these placements by a call ofthe routing algorithm (Section III). Since we iterate over thecell width in increasing order, we know that the first foundroutable placement has minimum cell width. Netlength, thesecondary objective, is optimized by comparing all routableplacements with minimum width.

    The algorithm presented above is very simple and canquickly be implemented. However, the resulting search treebecomes infeasibly large, even for small cells. In order toachieve practical running time, we develop several speed uptechniques, presented in Sections II-E to II-J.

    Algorithm 1 PLACECELLInput:F . FETs to be placed

    Output:Routable placement with minimum width

    1: for Wcell := 1, 2, . . . do2: SETCELLWIDTH(Wcell)3: P := PLACEFIXEDCELLWIDTH(F , ∅)4: if P 6= null then5: return P6: end if7: end for

    E. Cell Width Pruning

    The idea of pruning in any branch and bound algorithm isto remove infeasible or non-optimal nodes of the search treewithout visiting them. For a given node, we want to detectsituations in which all of its ancestors are either not routable ornot optimal. In cell width pruning this means that given somepartial placement with configurations for FETs F1, . . . , Fk, weprove that for any configuration of the remaining FETs theresulting placement is illegal. Note that at this point we are

    empty placement

    FET 1 FET 2 FET 3

    1 finger 2 fingers

    height 3 height 4

    unswapped swapped

    x = 0 x = 1 x = 2

    FET 2 FET 3

    ...

    full placement

    Fig. 6. Placement search tree. FETs are placed from left to right in all possiblepermutations and all configurations.

    Algorithm 2 PLACEFIXEDCELLWIDTHInput:F . all FETsFp . placed FETs with applied configuration

    Output:Routable placement with given widthor null if no such placement exists.

    1: if Fp = F then . all FETs placed2: P := CURRENTPLACEMENT(Fp)3: if ISROUTABLE(P ) then4: return P5: else6: return null7: end if8: end if9: Pbest := null . null indicates no placement

    10: for F ∈ F \ Fp do11: for c ∈ LegalConfigurations(F ) do12: ApplyConfiguration(F, c)13: P := PLACEFIXEDCELLWIDTH(F ,Fp ∪ {F})14: Pbest ← BEST(Pbest, P )

    . minimum w.r.t. objective value (see Section II-C)15: end for16: end for17: return Pbest . might be null

    This paper appeared in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 39 (2020), 2872-2885

  • 6

    Fig. 7. Partial placement with three placed FETs. The remaining FETs canonly be placed in the green area to the right of the placed FETs.

    inside the cell width loop. Therefore, the cell width is alreadyfixed, forcing all FETs to some limited area. Furthermore,since FETs are placed from left to right the only availablespace is to the right of all already placed FETs, see Fig. 7.We want to decide whether the remaining space suffices toplace all remaining FETs. We run this step independentlyon both stacks since the FET sharing rules do not imposeany constraints for FETs from different stacks. Since the FETsharing rules only apply to FETs which are direct neighbors,only the rightmost placed FET and the width of the remainingarea are relevant. Note that our goal here is merely to prunesubtrees which would violate the FET sharing rules. This byitself does not guarantee that the remaining placements willbe routable which will be checked at a different point of thealgorithm.

    Our cell width pruning is based on Euler chains whichhave already been used in previous work on automated celldesign [20]. As we will see, the Euler chains method requiresrestrictions on the search space of all possible configurationsof yet to place FETs. We iterate over all possible restrictionsand apply the method for each one of them. If there is no legalplacement for any possible restriction, we know that there isno legal placement at all. If, on the other hand, we find a legalplacement obeying some restrictions we keep the node in oursearch tree. These restrictions are defined as follows.

    Definition 4. A restriction r for a FET F is defined as a tuple(f, h, s), where• f denotes the number of fingers,• h the height, and• s the swap status.

    A FET configuration c = (x, f, h, s) obeys the restrictionr = (f ′, h′, s′), if• f = f ′,• h = h′, and• f is even ⇒ s = s′.

    Definition 5. A placement C = (c1, . . . , cn) obeys therestriction R = (r1, . . . , rn) if ci obeys ri for all i.The set of all legal placements obeying R is defined asC(R) := {C : C legal, obeying R}.

    Note that for an odd number of fingers, the swap status isnot controlled by a restriction.

    Algorithm CELLWIDTHPRUNING is presented below. Thefunction RESTRICTIONS(F) returns the set of all possible re-strictions R, s.t. a placement C obeying R exists. The functionMINWIDTH(F , CL, R) determines the minimum width of a

    placement of F obeying the restriction R with leftmost placedFET CL. MINWIDTH uses a graph model and Euler walks.We will show how to implement MINWIDTH(F , CL, R) forCL = null. CL = null means that no FET has been placed onthis stack yet. This algorithm can then easily be extended forCL 6= null.

    Algorithm 3 CELLWIDTHPRUNINGInput:F . FETs to be placedCL . null or fixed leftmost FETWmax . Maximum allowed placement width

    Output:Does a placement with width at most Wmax exist?

    1: for R ∈ RESTRICTIONS(F) do2: if MINWIDTH(F , CL, R) ≤Wmax then3: return true4: end if5: end for6: return false

    In the following, we will develop the MINWIDTH algorithmand prove its correctness. We start with a formal definition ofthe width of a placement.

    Definition 6. The width of a placement C = (c1, . . . , cn) isdefined as

    W (C) := maxi

    (x(ci) + f(ci))−minix(ci). (1)

    This allows us to formulate our central lemma.

    Lemma 1. Let R be a placement restriction. Then,

    minC∈C(R)

    W (C) = minC∈C(R)

    (n∑i=1

    f(ci) + 2Nno-share(C)

    ), (2)

    where Nno-share(C) is the number of neighboring FETs in Cwhich are not allowed to share.

    Proof. “≥”: For a given legal configuration C, assume w.l.o.g.that the indices are sorted s.t. x(ci) < x(cj) for i < j. Let

    G(ci, ci+1) := x(ci+1)− x(ci)− f(ci) (3)

    be the gap between configurations ci and ci+1. Then, we have

    W (C) = x(cn) + f(cn)− x(c1) (4)

    =n−1∑i=1

    (x(ci+1)− x(ci)) + f(cn) (5)

    =n−1∑i=1

    (G(ci, ci+1) + f(ci)) + f(cn) (6)

    =n∑i=1

    f(ci) +n−1∑i=1

    G(ci, ci+1). (7)

    If Fi and Fi+1 are allowed to share we have G(ci, ci+1) ≥ 0,otherwise G(ci, ci+1) ≥ 2. This implies

    W (C) ≥n∑i=1

    f(ci) + 2Nno-share(C). (8)

    This paper appeared in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 39 (2020), 2872-2885

  • 7

    From Eq. (8) we immediately get

    minC∈C(R)

    W (C) ≥ minC∈C(R)

    (n∑i=1

    f(ci) + 2Nno-share(C)

    ). (9)

    “≤”: Equality in Eq. (8) is obtained if all neighboringFETs Fi, Fi+1 have G(ci, ci+1) = 0 if they can shareand G(ci, ci+1) = 2 otherwise. Such a placement canalways be obtained from an existing placement C withW (C) >

    ∑ni=1 f(ci) + 2Nno-share(C) by moving FETs closer

    to each other.Let C be a placement which minimizes

    n∑i=1

    f(ci) + 2Nno-share(C). (10)

    Then we can construct a placement C ′ from C for whichW (C ′) =

    ∑ni=1 f(ci) + 2Nno-share(C). This yields

    minC∈C(R)

    (n∑i=1

    f(ci) + 2Nno-share(C)

    )=W (C ′) (11)

    ≥ minC∈C(R)

    W (C), (12)

    which concludes the proof.

    The left hand side of Eq. (2) is MINWIDTH(F , CL, R)for CL = null. Moreover, in Eq. (2), the sum

    ∑ni=1 f(ci)

    does not depend on C but is equal for all C obeying R.Therefore, in order to determine minC∈C(R)W (C) we onlyneed to determine minC∈C(R) 2Nno-share(C). This is where weuse Euler chains.

    We sort the FETs which are to be placed into differentgroups. FETs within the same group can potentially share ifplaced next to each other. FETs from different groups are neverallowed to share. This reduces the problem to independentgroups.

    FETs are only allowed to share if they have the same VTlevel and height. Therefore, we sort the FETs into groupswith equal VT level and height. There needs to be a gapbetween two FETs of different groups. The number of gapsis minimized if all FETs within one group are placed directlynext to each other. This leaves Ngroup − 1 gaps between thegroups, where Ngroup is the number of groups.

    Within each group, two FETs can share if the overlappingregion belongs to the same net, see Fig. 5. For FETs withan even number of fingers the outward facing nets are equalon both sides. If the FET is unswapped it is the source neton both sides, and the drain net if it is swapped. For fixedswap status for all FETs with an even number of fingers, it isknown for all FETs which nets belong to the outward contacts.Since we only minimize over placements obeying restrictionR, we know which pairs of FETs are allowed to overlap. Theremaining part of the placement configuration space which isnot restricted is the order of FETs and the swap status of FETswith an odd number of fingers. In the following, we presenta linear time algorithm based on Eulerian walk partitioningto find optimum FET orderings and swap states. A similarapproach has already been used in [1] with additional analog-specific performance constraints. They assume that all devices

    A B C

    D

    E

    (a) FET graph with 5 nodes corresponding to nets (A, B, C, D,E) and 5 edges, corresponding to FETs with an odd numberof fingers and 1 loop corresponding to a FET with an evennumber of fingers.

    A B C D E C B X B

    (b) Placement with minimum number of gaps. This correspondsto the walks A, B, C, D, and E, C, B, B. Note that the net Xdoes not appear as a node in the graph since it is the inner netof a FET with an even number of fingers.

    Fig. 8. FET graph and placement corresponding to a minimum partition intowalks.

    have exactly one finger but the same algorithm can be usedwhen the FETs obey restriction R.

    We use a graph model to determine the minimum numberof gaps that need to be left within one group. For each group,we construct a graph G = (V,E), where V is the set of nets.For each FET with an odd number of fingers, we add an edgee = {v, w} connecting the nodes corresponding to the sourceand drain net of the FET. For each FET with an even numberof fingers, we add a loop e = {v, v}, where v is the net whichbelongs to the leftmost and rightmost contact of the FET. Seethe illustration in Fig. 8. The idea of this graph is that twoFETs can share if their corresponding edges in the graph havea common vertex. Furthermore, a set of FETs can be placednext to each other without any gap if and only if there exists awalk in G consisting of the edges corresponding to the FETs.We call such a placement without gaps a chain.

    Theorem 1. Let F be a set of FETs with equal VT level,equal height, a fixed number of fingers, and fixed swap statusfor FETs with an even number of fingers. Let k be the numberof walks of a minimum partition of the edges of G(F ) intowalks. For any placement C, let C|F be the sub placement ofC consisting of the FETs F . Then

    minC∈C(R)

    Nno-share(C|F ) = k − 1. (13)

    Proof. “≤”: Let P1, . . . , Pk be a partition of the edges of Ginto walks. Then for each walk, we can place the FETs nextto each other in a chain. Since there are k walks, this gives kchains with k − 1 gaps in between.

    “≥”: Let C be a placement minimizing Nno-share(C|F ). Byconstruction of the graph G, any chain corresponds to a walkin G. Therefore, this placements yields a partition of G intoNno-share(C|F ) + 1 walks.

    This paper appeared in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 39 (2020), 2872-2885

  • 8

    The size k of a minimum partition of a connected graph Ginto walks can be determined by the degrees of the vertices.Euler’s well-known result states that for a connected grapha single walk suffices if no more than 2 vertices have odddegree. The following extension is also well-known:

    Theorem 2. Let G be a connected graph, Nodd the numberof vertices with odd degree, and

    k := max

    (Nodd2

    , 1

    ). (14)

    Then k is the size of a minimum partition of the edges of Ginto walks.

    To obtain the size of a minimum partition into walks for apotentially not connected graph, one has to sum the numberof walks for each connected component.

    Using Theorem 1 and Theorem 2 we are able to computethe minimum number of gaps in a placement restricted by R inlinear time. Since we want to know if the placement width isbelow some given threshold Wmax, we only have to deal withplacements C for which

    ∑i f(ci) ≤ Wmax. In practice, this

    means that we start with the minimum number of fingers ofeach FET and incrementally distribute additional fingers to theFETs. Instead of iterating over all swap states of the FETs withan even number of fingers, one can solve a certain VERTEXCOVER problem, where the set of chosen vertices correspondsto the set of swapped FETs. While this does not improve theworst case running time, simple heuristics for VERTEX COVERreduce the running time in practice. Both techniques speed upcalculating the pruning step significantly and lead to very fastrunning times in practice which are negligible to other partsof the algorithm, e.g. routability checks.

    F. Routability Check

    Legal placements are useless if they are not manufacturableor routable. Therefore, we directly check routability alreadyduring placement. This approach guarantees that the placementalgorithm will return a routable solution. This is differentfrom common approaches that only approximately modelroutability, e.g. by using a netlength-based objective function.Furthermore, our approach allows the user to define customconstraints, like blocking certain tracks or fixing the position ofrouting pins. These constraints ensure a seamless embeddingof the cell into the hierarchical context. We let our routingengine decide about the routability of a placement. Since therouting engine is able to deal with custom constraints, theplacement will automatically adapt to them as well.

    In its simplest version, the routability check is run at theleaves of the search tree and discards all nodes which fail.Again, the simple approach is too slow in practice and severalspeedup techniques are used.

    G. Partial Placement Routability Check

    Most legal placements are not manufacturable or notroutable. It is therefore essential to detect partial placementswhich cannot be completed to a routable placement as earlyas possible. The key difficulty here is the non-monotonicity

    N1

    N2

    N3

    N3

    N4

    (a) Illegal placement with 5FETs.

    N1

    N2

    N3

    N3

    N4

    N4

    (b) Legal placement with 6FETs, which is an extension ofthe illegal placement on the left.

    Fig. 9. Adding a FET to an illegal partial placement can make it legal. Gatenets N1 and N2 are different and need to be separated by a trim shape (red).(a) Gate net N4 has no FET placed opposite and therefore also needs to beseparated from the other stack by a trim shape. These trim shapes are tooclose to each other and make the placement illegal. (b) Adding a FET withgate net N4 removes the need for the second trim shape and legalizes thesituation.

    of partial placements w.r.t. to routability. Counter-intuitively,a partial placement can be unroutable but the same partialplacement with an additional placed FET is routable. The sameis true for the FET distance rules. Fig. 9 shows an examplewhere the addition of a FET to an illegal partial placementyields a legal placement.

    Therefore it is not possible to use a partial placement asinput to our routing engine as if it was a full placement todetermine routability of its ancestors. The problem is solvedby assigning one of three states to every position in the partialplacement. The states are• FET placed with configuration c• empty (no FET will be placed here in the future)• unknown (potentially empty, but some FET might be

    placed here in the future)The routing formulation only adds constraints for cases

    where the presence or absence of a FET is known. If thepresence of a FET is unknown, no constraints are added. Forthe example in Fig. 9-(a), no constraints are added whichwould enforce the existence of the right trim shape.

    H. FEOL, FET Access, and Full Routing

    Due to the large complexity of the routing problem, routingoracle calls are very expensive. In many cases (partial) place-ments are illegal even when only a restricted set of rules isconsidered. We distinguish between three sets of rules, calledphases:

    1) FEOL (front end of line). Contains only rules belowM0. This includes floating gates, RX coloring, fin trimshapes, and PC trim shapes among others.

    2) FET-access. Contains rules up to M0. Some connectionsof FETs can be routed below M0 mainly by sharing withother FETs. For all other connections, we know that alltheir terminals must be connected to M0. In this phase,we enforce that all terminals of these nets are connectedto M0. Placements which are illegal w.r.t. FET-accessconstraints usually have a highly congested region with

    This paper appeared in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 39 (2020), 2872-2885

  • 9

    many terminals of different nets. All M0 trim shape rulesand full via coloring are contained in this group. Alsocontains all FEOL rules.

    3) Full. The entire routing. Also honoring net connec-tions to user-specified pin tracks or respecting forbiddentracks. Full routability can only be checked for fullplacements, i.e. leaves of the search tree.

    If a placement is illegal w.r.t. FEOL rules, we know thatit is also illegal w.r.t. to FET-access or full, since the FEOLrules are a subset of the rules of the other phases. Similarlya placement illegal w.r.t. FET-access is also illegal w.r.t. fullrouting. The phases are run one after the other. We stop assoon as one phase proves that the placement is illegal or allphases are legal. Due to the large difference in complexity ofthe phases, their average running time also differs by ordersof magnitude. An FEOL oracle call lasts about 0.1 seconds,FET-access running time is in the order of seconds and fullrouting can take minutes to hours on large instances. Sincemany (partial) placements are already illegal w.r.t. FEOL rules,the phase-based approach is much faster compared to directlychecking full routability.

    The second major advantage is that FEOL and FET-accessroutability can also be checked for partial placements whichis not possible for full routability. This is essential to prunepartial placement nodes of the search tree which violate someDRC constraint without extending them to all possible fullplacements.

    I. FEOL Routing Oracle CacheIn contrast to the previously presented techniques, routing

    oracle caching does not affect the search tree. The running timeof the routing oracle dominates the total placement runningtime. Many partial placements look similar from a routabilityperspective and calling the oracle twice for similar instancescan be avoided. This is especially the case for FEOL routingof partial placements. In FEOL routing only the lower layersare considered which means that some details of the partialplacement do not matter for FEOL routability. Two instancesare equivalent routing instances for example if they can betransformed into each other by a permutation of nets. They stilloriginate from different placements, but their routing problemsare either both routable or both unroutable. In these cases, asingle routing oracle call suffices. To detect these situations,we transform partial placements into a data structure calledroutability data which contains all information needed by theFEOL routing oracle but hides other information containedin the placement, for example net names. If two partialplacements are transformed into the same routability data weknow, by construction, that the FEOL routability oracle willgive the same answer for both. This is exploited by storing allqueries to the routing oracle in a cache.

    In practice, we observe that on average 9 out of 10 queries tothe FEOL routing oracle, can be answered by a cache lookup.This results in a factor 10 speedup.

    J. Netlength PruningOnce a routable placement has been found, the algorithm

    continues to search for placements with better objective value.

    Since the cell width is fixed at this point, the determiningcriterion is netlength. We use the bounding box netlengthmodel which can also be applied to partial placements. Addi-tionally, we calculate lower bounds for the netlength of partialplacements, also considering the FETs which still have to beplaced. In this way, we are able to prune parts of the searchtree which cannot contain routable solutions which are betterthan the one we have already found.

    K. Multi-Row Cells

    Very large cells are typically not implemented on one butseveral neighboring circuit rows. Such multi-row cells can alsobe placed with our algorithm. To place a cell on several circuitrows, we first compute an assignment of FETs to the rowsusing a mixed integer programming approach. Assignmentsare evaluated by their number of row-crossing connections andan estimation for the total cell width. Using Algorithm PLACE-CELL (Section II-D), the rows are placed one after the otherwith each new placement respecting constraints enforced byalready placed rows.

    As the assignment problem of FETs to rows does not modelthe resulting total cell width exactly, we cannot guaranteeglobal optimality for the placement of multi-row cells. How-ever, each individual row has minimum possible width, subjectto the routing constraints implied by the previously placedrows.

    III. ROUTING

    As described in Section II, the routing engine is not onlyused to compute a routing after the cell has been placed,but instead also called many times during the placementalgorithm to ensure that the computed placement is routable.In the following, we explain the details of the full router thatruns after the placement algorithm. Required modifications forusage as routing oracle during the placement phase are givenin Section III-G.

    In order to solve the routing problem on a placed cell, weuse a mixed integer programming (MIP) approach. Next to theinput already given for the placement problem (Section II), thecell routing problem expects the location of each FET fromthe placement.

    Our goal is to find a valid routing that minimizes the wirelength and number of vias in order to optimize the power,timing, and yield properties of the cell. Note that this doesnot impose an algorithmic limitation, as our routing engineallows us to optimize arbitrary linear objective functions.

    In the remainder of this section, we first describe the gridgraph in which we search a disjoint minimum-cost Steiner treepacking. Then, we give the description of our core MIP modelfor the Steiner tree packing problem, and explain how designrules, in particular trim shape rules and via coloring rules,are incorporated into that model. Finally, we discuss requiredmodifications and a post processing routine that optimizes therouting with respect to DFM (design for manufacturability).

    This paper appeared in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 39 (2020), 2872-2885

  • 10

    A. Grid Graph Construction

    Since each wiring layer only allows either vertical orhorizontal wires, we represent the cell routing space by a three-dimensional grid graph G = (V,E) with edge costs. For eachlayer, we are given a set of routing tracks specifying feasiblepositions for wires which are not necessarily equidistant.

    By intersecting routing tracks on adjacent layers, we obtainthe vertex set V . The edge set E consists both of line segmentsconnecting adjacent intersections on the same layer as well asvias between stacked vertices on adjacent layers. Each edgee ∈ E represents the center line of a possible wire shape. Trimshapes allow using edges partially, cf. Section III-E.

    Edge costs are obtained by multiplying their geometriclength by a layer-, track-, and net-dependent value. This allowsto trade off wire length against the number of vias and to leavemore space for inter-cell routing by increasing certain edgecosts. For example, on M1, only every second track is usablefor inter-cell routing due to the power via pattern, and M2 iswidely used for inter-cell routing.

    B. Mixed Integer Programming Formulation

    First, we describe the core MIP we use to model theSteiner tree packing problem in graphs. Then, in Sections III-Dto III-F, we explain how design rules are incorporated into themodel.

    For each net k ∈ N and each edge e ∈ E, we add abinary variable xke specifying whether edge e is used by netk. Furthermore, for each edge e ∈ E, we introduce a binaryvariable xe that determines whether edge e is used by somenet, and add the constraint xe =

    ∑k∈N x

    ke . Since xe is

    upper bounded by 1, this constraint already guarantees edgedisjointness of integral solutions.

    In the following, for some vertex set X ⊂ V , we refer byδ(X) to the set of edges between X and V \X , and, in thedirected case, by δ+(X) to the set of edges leaving X and byδ−(X) to the set of edges entering X .

    We ensure connectivity by adding for each net k ∈ Na formulation of the Steiner tree problem in graphs to themodel. Note that using a formulation with a strong relaxationis essential for small running times. In [5], the undirected cutrelaxation is used for that purpose: For each net k ∈ N , wedenote by Tk ⊆ V the set of its terminals. We say that a cutδ(X) separates Tk if both Tk ∩X and Tk \X are nonempty.Then, for each cut separating the terminal set, the undirectedcut relaxation requires at least one edge of the cut to becontained in the Steiner tree, i.e.

    ∑e∈δ(X) x

    ke ≥ 1. However,

    the undirected cut relaxation has an integrality gap of 2 [4],which is already asymptotically attained in the special casethat G is a circuit, even if all vertices are terminals, as thefractional solution xk ≡ 12 demonstrates.

    One can strengthen this relaxation by using a bidirectedauxiliary graph (V,A) with A = {(i, j) : {i, j} ∈ E} whichcontains two opposing edges (i, j) and (j, i) for each originaledge {i, j} ∈ E. Choose an arbitrary root terminal rk ∈ Tkand add usage variables ~xkij for all directed edges (i, j) ∈ A.Then, for each cut δ+(X) ⊂ A with rk ∈ X and Tk \ Xnonempty, require that at least one edge leaving X is used,

    i.e.∑

    (i,j)∈δ+(X) ~xkij ≥ 1. Finally, lower bound the usage of

    each original edge {i, j} ∈ E by the sum of the usages ofboth directed edges (i, j) and (j, i). This relaxation is calledbidirected cut relaxation. The integrality gap of the bidirectedcut relaxation is unknown, the largest known lower boundis 6/5 [21], and no upper bound stronger than 2, which isimplied by the integrality gap of the undirected cut relaxation,is known.

    By introducing additional flow variables, one can eliminatethe exponential number of cut constraints, resulting in themulticommodity flow relaxation, first introduced in [23]. Thisrelaxation is equivalent to the bidirected cut relaxation [16] andwas already used in [7] to solve Steiner tree packing problems.We will also use the multicommodity flow relaxation:

    For each net k, we denote the set of sink terminals Tk\{rk}by Sk. Then, the multicommodity flow relaxation introducesa commodity for each sink s ∈ Sk and requires a flow of oneunit of the commodity from rk to s to be supported by ~xk.More precisely, for each net k ∈ N , sink s ∈ Sk and directededge (i, j) ∈ A, a flow variable fksij that is upper boundedby ~xkij is introduced, representing the flow of the commoditys of net k along the directed edge (i, j). Then, we add flowconservation constraints at vertices in V \ {s, rk} and enforcethat rk sends one unit of flow and that s receives one unit offlow of commodity s.

    Finally, to ensure vertex disjointness, for each net k ∈ Nand vertex v ∈ V , we add a binary vertex usage variable xkv ,which upper bounds usage variables of incident edges, andadd the constraint that each vertex may be used by at mostone net.

    The complete model is given in Fig. 10, where we denoteby bks(v) :=

    ∑(i,j)∈δ+(v) f

    ksij −

    ∑(i,j)∈δ−(v) f

    ksij the flow

    balance of commodity s of net k at a vertex v ∈ V .

    min∑e∈E

    cexe

    s.t. xe =∑k∈N

    xke ∀ e ∈ E

    xe ∈ {0, 1} ∀ e ∈ Exke ∈ {0, 1} ∀ e ∈ E, k ∈ N

    bks(v) =

    1 if v = rk−1 if v = s0 otherwise

    ∀ v ∈ V, k ∈ N , s ∈ Sk0 ≤ fksij ≤ ~xkij ∀ (i, j) ∈ A, k ∈ N , s ∈ Sk~xkij + ~x

    kji ≤ xk{i,j} ∀ {i, j} ∈ E, k ∈ Nxkv ∈ {0, 1} ∀ v ∈ V, k ∈ Nxke ≤ xkv ∀ v ∈ e ∈ E, k ∈ N∑

    k∈Nxkv ≤ 1 ∀ v ∈ V

    Fig. 10. Core mixed integer program modeling a Steiner tree packing problem.

    In this basic formulation, no additional constraints, espe-cially with respect to distances between shapes, are taken intoconsideration.

    This paper appeared in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 39 (2020), 2872-2885

  • 11

    C. Conditional Constraints

    In order to implement complex design rules, we need tomodel logical implications to conditionally enable constraints.More specifically, consider a linear inequality of the form∑i∈I aixi ≤ b, and let xcond be a binary variable. We want to

    model

    (xcond = 0) =⇒

    (∑i∈I

    aixi ≤ b

    ). (15)

    To this end, we use the following standard approach: Let U bea constant that is an upper bound on

    ∑i∈I aixi−b, which can

    be derived from the variable bounds. For this, we require thatall involved variables xi have finite variable bounds, which isthe case in our application. Then, the constraint∑

    i∈Iaixi − Uxcond ≤ b (16)

    satisfies our needs: If xcond = 0, then the constraint equals theoriginal constraint, and if xcond = 1, then

    ∑i∈I aixi −U ≤ b

    is always satisfied by the choice of U .Of course, a similar approach works in order to condition

    on xcond = 1. By replacing constraints with equality bytwo inequalities, the same approach can be applied to theseconstraints as well. Finally, in case we need to condition onmultiple such binary conditions, we can recursively apply theprocedure above.

    D. Mapping DRC Constraints

    For the wiring within a cell, design rules used to fallinto the two basic categories of same-net rules and diff-netrules. Same-net rules are in place to avoid specific geometricconfigurations of wiring shapes of a single net, while diff-net rules require a certain minimum distance between wiresthat belong to different nets. However, in 7nm technology, allwires on layers used for cell-internal routing are generated asthe complement of trim shapes, which are not associated withany particular net, and constraints on wire shapes are entirelyexpressed in terms of constraints on trim shapes. Hence, therouting model contains additional variables and constraints thatmodel the trim shape configuration, and the only additionalconstraints on non-via edge usage variables are consistencyconstraints with the trim shape model. A detailed descriptionof the trim mask and its manufacturing process is given in [2]and [12].

    All features are manufactured using multiple masks in orderto increase packing density: Shapes on different masks areallowed to have a smaller distance than shapes on the samemask. Hence, a valid routing does not only consist of a disjointSteiner tree packing, but also requires features on such layersto be assigned to masks such that certain design rules are met.We call this assignment coloring. In the 7nm node, this onlyaffects vias, since wire and trim shapes use a fixed predefinedcoloring scheme.

    E. Trim Shape Model

    Recall that on each routing layer, the routing grid graphconsists of parallel routing tracks. Except on the layer TS,

    ≥ d0

    ≥ d1

    A1

    A2 B

    C

    Fig. 11. A trim shape configuration with relevant trim spacing distances,resulting wires and the assignment of trim shapes to grid graph edges. Eachtrim shape is assigned to an edge containing its center.

    each routing track is associated with a fixed color, representingthe mask that is used to manufacture trim shapes on this track.Since trim shapes on tracks of different color are independent,we do not consider colors in the remainder of this section. Thefull model is then obtained by applying the following, for eachlayer and color, to all tracks of that color.

    Now, fix a layer and without loss of generality assume thatrouting tracks on that layer are horizontal. Figure 11 shows aconfiguration with four trim shapes A1, A2, B and C. Notethat two trim shapes A1 and A2 will result in a single trimshape A during manufacturing. Trim shapes on the same trackmust satisfy a certain minimum horizontal distance, indicatedby d0, while trim shapes on neighboring tracks must eitheralign to the same coordinate (as in case of A) or again satisfya minimum horizontal distance, indicated by d1. Note thatthe minimum same track trim shape spacing rule via d0 alsoencodes a minimum area constraint on the wire in between.Since trim shapes have a fixed width Wtrim, any trim shapeconfiguration is uniquely represented by the set of their centercoordinates, indicated by crosses.

    Then, we can assign each trim shape to an edge thatcontains the trim shape’s center. This assignment is illustratedin Figure 11, where edges that have a trim shape assigned tothem are highlighted. Since d0 is sufficiently large, at mostone trim shape can be assigned to any edge.

    Hence, we can model a trim shape configuration as follows:For each edge e, we add a binary variable tactivee whichspecifies whether there is a trim shape with its center on e, andan integral variable tpose that specifies the exact x-coordinateof the trim shape’s center in case there is one, where

    xmin(e) ≤ tpose ≤ xmax(e). (17)

    The distance constraints on trim shapes are then modeledas follows. Let e and f be two edges on the same trackwith xmax(e) ≤ xmin(f). There are three possible cases:If xmax(f)−xmin(e)−Wtrim < d0, i.e., there is no feasibletrim shape configuration with both a trim shape on e and on

    This paper appeared in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 39 (2020), 2872-2885

  • 12

    f , then we add the constraint

    tactivee + tactivef ≤ 1, (18)

    modeling that at most one of the two trim shapes may beactive. Otherwise, if xmin(f) − xmax(e) −Wtrim < d0, i.e.there are both feasible and infeasible trim shape configurationswith trim shapes on e and f , we add the constraint(

    tactivee = 1 ∧ tactivef = 1)

    (19)

    =⇒(tposf − t

    pose −Wtrim ≥ d0

    ), (20)

    which guarantees that trim shapes on e and f are sufficientlyfar away from each other if present.

    Distance constraints on pairs of edges e, f on neighboringtracks are modeled similarly if the x-intervals e and f aredisjoint. Otherwise, the only valid configuration with a trimshape on both e and f requires these to be aligned, which wemodel as(

    tactivee = 1 ∧ tactivef = 1)

    =⇒(tposf = t

    pose

    ). (21)

    Finally, we need to ensure consistency of the edge usagemodel and the trim shape model. Used edges may not containa trim shape, so for each edge e, we add the constraint

    tactivee + xe ≤ 1. (22)

    Furthermore, let e be an edge and k ∈ N be a net. If e isused by net k, then adjacent edges must also be used by net kunless cut off by a trim shape. Hence, if f is an edge adjacentto e, add the constraint(

    xke = 1)

    =⇒(tactivef + x

    kf ≥ 1

    ). (23)

    F. Vias

    On via layers, we need to assign colors to used edges. Foreach via e ∈ E, let Me be the set of possible colors for e.Then, for each via edge e ∈ E, net k ∈ N and color m ∈Me,we add a binary variable mxke and enforce

    xke =∑m∈Me

    mxke . (24)

    Moreover, for each such edge e and color m, we add a binaryvariable mxe =

    ∑k∈N

    mxke , representing whether edge e isused with color m by any net.

    Then, if two close via edges e, f are not allowed to be usedby the same color m, add the constraint

    mxe +mxf ≤ 1. (25)

    Similarly, if two via edges e, f are even too close to be usedby different colors, require

    xe + xf ≤ 1. (26)

    Minimum required spacings between vias and trim shapesare implemented analogously to trim-trim-spacings.

    G. Routing Oracle During PlacementAs discussed in Sections II-G and II-H, the routing engine

    is queried during the placement algorithm to prune partialplacements that cannot be completed to routable placements.These queries are not performed using the full routing model,but instead either use the FEOL (front end of line) or FET-access phase. These phases only respect design rules belowM0 or up to M0, respectively.

    Instead of forcing net connectivity using flow variables,for each FET contact, it is determined whether a connectionto M0 is required, and in that case, constraints enforcingsuch a connection are added. This results in a much simplermodel which we can afford to solve many times duringplacement, and, albeit its limited set of constraints, detectsmany unroutable placements.

    When routing partial placements, we determine the un-placed area where future FETs might be placed. Then, weonly add constraints for placed FETs, and skip all constraintsthat depend on the presence of a FET in the unplaced area.

    H. Post ProcessingDesign for manufacturability (DFM) rules are soft con-

    straints that are not strictly required and aim at increasingyield by avoiding configurations with a higher failure risk.DFM rules include increased trim-via and trim-trim spacings,preferred coordinates for trim shape locations and preferredvia colors at specific locations.

    After computing a full routing, we perform a post process-ing which aims at satisfying as many DFM rules as possiblewhile not increasing wire length and via count significantly.

    For each DFM rule and each location, we add a binaryvariable that determines whether the rule is satisfied at thatlocation. Then, we add a constraint modeling the DFM rule,conditioned on that variable, and add the binary variable to theobjective function, rewarding all satisfied DFM rules. Finally,we restrict all flow variables (cf. Section III-B) in the modelto their current value, which fixes the structure of the routingsolution, and reoptimize.

    IV. EXPERIMENTAL RESULTSWe have implemented all algorithms described in this paper

    and tested them on real-world 7nm instances. We did allexperiments single-threaded on a 2.20GHz Intel Xeon E5-2699 v4 machine using CPLEX 12.7.1 as MIP-solver.

    A. Results for a Standard Cell LibraryAs a first test set, we chose a standard cell library containing

    120 cells. See Table I for details on this library. The numberof FETs for these cells is between 2 (for the inverters) and 8.

    The cells in this standard library have been designed man-ually and much time has been spent to get a best possiblelayout. All layouts found by BonnCell were at least as goodas the known layouts. For 10 of the 120 cells, BonnCell wasable to find better (with respect to area) layouts. In some cases,the BonnCell result improved the library results by more than22%. Figure 12 shows an example of such a cell. The averagerunning time needed per cell was less than 2 minutes. Table Ishows the detailed results of BonnCell on these cells.

    This paper appeared in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 39 (2020), 2872-2885

  • 13

    TABLE IBONNCELL RESULTS ON CELLS FROM A STANDARD LIBRARY. WITHIN

    EACH ROW OF THE TABLE FOR EACH CELL TYPE (COLUMN 2) WE SHOWTHE NUMBER OF VARIANTS OF THIS CELL IN THE LIBRARY (COLUMN 1),THE NUMBER OF FETS AND NETS IN THE CELL (COLUMNS 3 AND 4) ANDTHE AVERAGE RUNNING TIME IN SECONDS NEEDED BY BONNCELL FORALL CELLS WITHIN A ROW FOR THE PLACEMENT STEP (COLUMN 5) AND

    THE ROUTING STEP (COLUMN 6).

    cells cell type FETs nets avg. time avg. timeplacement [s] routing [s]

    11 AOI21 6 9 48 5111 AOI22 8 11 61 6217 INV 2 5 8 916 NAND2 4 7 46 5311 NAND3 6 9 19 24

    5 NAND4 8 11 37 5916 NOR2 4 7 40 50

    8 NOR3 6 9 37 385 NOR4 8 11 33 50

    10 OAI21 6 9 30 3210 OAI22 8 11 143 40

    0 1 2 3 4 5 6 7 8 9 10

    11

    12

    13

    14

    15

    16

    (a) Layout from the standard library.

    0 1 2 3 4 5 6 7 8 9 10

    11

    12

    (b) BonnCell’s layout.

    Fig. 12. Layouts of an OAI22 standard library cell. BonnCell’s layout reducesthe cell’s area by more than 22% and total net length on metal layers by morethan 33%.

    B. Results for Latches

    Latches are much larger and more complex cells. Man-ually finding good layouts for these cells is even for veryexperienced designers a challenging task. Moreover, it maytake a designer several days to find a good layout. We useda testbed of 22 latches ranging in size between 10 and 44FETs. For 7 of these 22 latches, BonnCell found a provablyarea optimal solution. For the other latches, BonnCell ran intoa timeout and the solution found is therefore not guaranteedto be area optimal. Nevertheless, BonnCell found in all butone case a solution that was either better in area comparedto the designer’s solution or needed less M2 tracks. The onecase in which BonnCell was only able to find an equally goodsolution is a latch where the designer’s solution needs no M2and is area optimal and therefore cannot be improved. Table II

    shows the details of BonnCell’s results on this latch testbed.An example layout for a latch that BonnCell built using 5%less area than the designer’s solution is shown in Fig. 13.Using the approach described in Section II-K BonnCell canalso build multi-row layouts of cells. As an example we showin Fig. 14 a 2-row layout for the same latch.

    TABLE IIBONNCELL RESULTS ON LATCHES. WITHIN EACH ROW OF THE TABLE WESHOW THE LATCH NAME (COLUMN 1), THE NUMBER OF FETS AND NETS

    IN THE LATCH (COLUMNS 2 AND 3), THE WIDTH OF THE SOLUTION FOUNDBY BONNCELL IN TRACKS (COLUMN 4) AND BONNCELL’S TOTAL

    RUNNING TIME FOR PLACEMENT AND ROUTING IN SECONDS (COLUMNS 5AND 6) AND IN COLUMN 7 WHETHER IT IMPROVED THE BEST DESIGNER’S

    LAYOUT (YES) OR WAS EQUALLY GOOD (EQUAL).

    latch FETs nets tracks running time [s] improvedplace route area or M2

    DFFFQ X1 38 27 32 7227 2730 yesDFFQDICE X1 43 29 38 3243 2756 yes

    DFFQ X1 28 21 20 3018 1333 yesELATN X1 12 11 12 1218 54 yesELATS X1 10 11 10 544 504 yes

    ELAT X1 12 11 12 1171 49 yesELAT X3 12 11 16 2706 34 yesELAT X8 12 11 30 3236 102 yes

    ESLATN X1 32 25 36 3237 3492 yesESLATS X1 26 25 32 3625 592 yes

    ESLAT X1 32 25 36 3623 2140 yesESLAT X3 32 25 36 3656 594 yes

    L1LATF X1 26 21 22 3966 189 equalN1LAT X1 38 27 38 3238 4725 yesN1LAT X3 38 27 44 3239 1247 yes

    SDFFQN X1 36 28 28 3636 5444 yesSDFFQS X1 32 27 24 28837 2793 yes

    SDFFQ X1 36 28 28 3644 4016 yesSDFFQ X3 36 28 42 3650 701 yes

    SDFFSRPQ X1 44 34 36 3239 1695 yesINV ELAT X1 14 12 16 3606 51 yesINV ELAT X3 14 12 20 603 51 yes

    V. CONCLUSION

    We have presented BonnCell, a new flow for the automaticcell layout that is able to deal with the challenges arising in7nm technology. The main features are the global optimizationof several design objectives and the full integration of therouting algorithm into the placement algorithm. This allowsguaranteeing DRC clean layouts that globally optimize the cellarea with given secondary design objectives, e.g. wire length,M2-limitations, pin access and so on.

    All solutions found by BonnCell were at least as goodas the best designer’s solution. Moreover, BonnCell findsthese solutions much faster than a designer. On a multi-coremachine, the whole library reported in Table I which contains120 different cells can be processed within five minutes.As BonnCell’s layouts are DRC clean by construction noadditional post processing is necessary. This allows generatingseveral different layouts for all cells of a library to see theoverall impact that these different layouts have on later stagesin the layout of the whole chip, e.g. routability, timing, area,power, pin access and so on. It is also possible to addmore complex cells to the library, e.g. gates with invertedinputs/outputs or wider AOIs and OAIs which may help toimprove the overall timing or power consumption of the chip.

    This paper appeared in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 39 (2020), 2872-2885

  • 14

    0 1 2 3 4 5 6 7 8 9 10

    11

    12

    13

    14

    15

    16

    17

    18

    19

    20

    21

    22

    23

    24

    25

    26

    27

    28

    29

    30

    31

    32

    33

    34

    35

    36

    37

    38

    39

    40

    41

    42

    (a) Manual layout by an experienced designer.

    0 1 2 3 4 5 6 7 8 9 10

    11

    12

    13

    14

    15

    16

    17

    18

    19

    20

    21

    22

    23

    24

    25

    26

    27

    28

    29

    30

    31

    32

    33

    34

    35

    36

    37

    38

    39

    40

    (b) BonnCell’s layout.

    Fig. 13. Layouts of the latch SDFFQ X3. BonnCell’s layout needs 5% less area than the best designer’s solution while using the same number of M2 tracks.

    0 1 2 3 4 5 6 7 8 9 10

    11

    12

    13

    14

    15

    16

    17

    18

    19

    20

    Fig. 14. BonnCell’s 2-row layout of the latch SDFFQ X3.

    REFERENCES

    [1] B. Basaran and R. A. Rutenbar. An O(n) algorithm for transistorstacking with performance constraints. In Proc. DAC’96, pages 221–226, 1996.

    [2] J. H. Chen, T. A. Spooner, J. E. Stephens, M. O’Toole, N. LiCausi,B. Kim, S. Narasimha, and C. Child. Segment removal strategy in SAQPfor advanced BEOL application. In Interconnect Technology Conference(IITC), pages 1–3. IEEE, 2017.

    [3] P. Cremer, S. Hougardy, J. Schneider, and J. Silvanus. Automatic celllayout in the 7nm era. In ISPD ’17, pages 99–106, 2017.

    [4] M. X. Goemans and D. P. Williamson. A general approximationtechnique for constrained forest problems. SIAM Journal on Computing,24(2):296–317, 1995.

    [5] M. Grötschel, A. Martin, and R. Weismantel. The Steiner tree packingproblem in VLSI design. Math. Program., 78:265–281, 1997.

    [6] A. Gupta and J. P. Hayes. Optimal 2-D cell layout with integratedtransistor folding. In ICCAD’98, pages 128–135. IEEE, 1998.

    [7] N.-D. Hoàng and T. Koch. Steiner tree packing revisited. Math MethOper Res, 76(1):95–123, 2012.

    [8] K. Jo, S. Ahn, T. Kim, and K. Choi. Cohesive techniques for cell layoutoptimization supporting 2D metal-1 routing completion. In 23rd Asiaand South Pacific Design Automation Conference (ASP-DAC), pages500–506, 2018.

    [9] I. Kang, D. Park, C. Han, and C.-K. Cheng. Fast and precise routabilityanalysis with conditional design rules. In Proceedings of the 20th SystemLevel Interconnect Prediction Workshop, Article no. 4, 2018.

    [10] C. Lazzari, C. Santos, and R. Reis. A new transistor-level layoutgeneration strategy for static CMOS circuits. In ICECS’06, pages 660–663, 2006.

    [11] J. Lienig and M. Thiele. The pressing need for electromigration-awarephysical design. In ISPD ’18, pages 144–151, 2018.

    [12] H. Liu, T. Han, J. Zhou, and Y. Chen. Layout decomposition andsynthesis for a modular technology to solve the edge-placement chal-lenges by combining selective etching, direct stitching, and alternating-material self-aligned multiple patterning processes. In Design-Process-Technology Co-optimization for Manufacturability X, volume 9781.International Society for Optics and Photonics, Article no. 25, 2016.

    [13] M. Martins, J. M. Matos, R. P. Ribas, A. Reis, G. Schlinker, L. Rech,and J. Michelsen. Open cell library in 15nm freePDK technology. InISPD’15, pages 171–178. ACM, 2015.

    [14] S. Narasimha, B. Jagannathan, A. Ogino, D. Jaeger, B. Greene,C. Sheraw, K. Zhao, B. Haran, U. Kwon, A. Mahalingam, et al. A 7nmCMOS technology platform for mobile and high performance computeapplication. In IEDM’17, pages 29.5.1–29.5.4, 2017.

    [15] C. J. Poirier. Excellerator: Custom CMOS leaf cell layout generator.IEEE Trans. CAD, 8(7):744–755, 1989.

    [16] T. Polzin. Algorithms for the Steiner problem in networks. PhD thesis,MPII Saarbrücken, 2003.

    [17] G. Posser, V. Mishra, P. Jain, R. Reis, and S. S. Sapatnekar. Cell-internalelectromigration: Analysis and pin placement based optimization. IEEETrans. CAD, 35(2):220–231, 2016.

    [18] N. Ryzhenko and S. Burns. Standard cell routing via boolean satisfia-bility. In DAC’12, pages 603–612, 2012.

    [19] B. Taylor and L. Pileggi. Exact combinatorial optimization methodsfor physical design of regular logic bricks. In DAC’07, pages 344–349.ACM, 2007.

    [20] T. Uehara and W. M. vanCleemput. Optimal layout of CMOS functionalarrays. IEEE Trans. on Computers, 30(5):305–312, 1981.

    [21] R. Vicari. Simplex based graphs yield large integrality gaps for thebidirected cut relaxation. Master’s thesis, Research Institute for DiscreteMathematics, University of Bonn, 2018.

    [22] S. Wimer, R. Y. Pinter, and J. A. Feldman. Optimal chaining of CMOStransistors in a functional cell. IEEE Trans. CAD, 6(5):795–801, 1987.

    [23] R. T. Wong. A dual ascent approach for Steiner tree problems on adirected graph. Math. Program., 28(3):271–287, 1984.

    [24] P.-H. Wu, M. P.-H. Lin, T.-C. Chen, T.-Y. Ho, Y.-C. Chen, S.-R. Siao,and S.-H. Lin. 1-D cell generation with printability enhancement. IEEETrans. CAD, 32(3):419–432, 2013.

    [25] W. Ye, B. Yu, Y.-C. Ban, L. Liebmann, and D. Z. Pan. Standard celllayout regularity and pin access optimization considering middle-of-line.In GLSVLSI’15, pages 289–294. ACM, 2015.

    [26] B. Yu, X. Xu, S. Roy, Y. Lin, J. Ou, and D. Z. Pan. Design formanufacturability and reliability in extreme-scaling VLSI. Science ChinaInformation Sciences, 59(6):1–23, 2016.

    This paper appeared in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 39 (2020), 2872-2885