
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 27, NO. 4, APRIL 2019

System-Level Power Delivery Network Analysis and Optimization for Monolithic 3-D ICs

Kyungwook Chang, Shidhartha Das, Saurabh Sinha, Brian Cline, Greg Yeric, and Sung Kyu Lim, Senior Member, IEEE

Abstract— As 2-D scaling reaches its limit, monolithic 3-D (M3D) IC is a leading contender for continuing equivalent scaling. Although M3D shows power and performance benefits over 2-D designs, designing a power delivery network (PDN) for M3D is challenging. In this paper, for the first time, we present a system-level PDN model of M3D designs focusing on both resistive (IR) and inductive (Ldi/dt) components of power supply integrity. In addition, we present frequency- and time-domain analyses of M3D PDNs. We show that the additional resistance in M3D PDNs, while being worse for resistive drops, improves resiliency against ac current noise, showing 35.9% peak impedance reduction compared to 2-D PDNs during worst case resonant oscillations. Then, we present methodologies to improve power supply integrity of M3D designs based on the observations. Our optimization methodologies offer up to 32.6% and 17.0% static and dynamic voltage drop reduction compared to the baseline M3D designs, respectively, showing 9.0% lower dynamic voltage drop compared to the 2-D counterparts.

Index Terms— Frequency- and time-domain analysis, monolithic 3-D (M3D) IC, power delivery network (PDN), power delivery network optimization, static and dynamic rail analysis.

I. INTRODUCTION

COMPARED with through-silicon-via (TSV)-based 3-D ICs, monolithic 3-D (M3D) ICs offer manyfold benefits in vertical dimension for system integration. The key enabler is monolithic intertier vias (MIVs) that are 100× smaller than TSVs. This dramatic dimensional reduction opens up numerous opportunities in ultrafine-grained design optimizations across multiple tiers without significant overhead.

Challenges in designing a reliable power delivery network (PDN) increase mainly due to lower supply voltage, faster operating clock frequency, and higher power density. Along with restricted budget of resources and cost, these challenges may cause functional failures and performance degradation due to parasitics-induced voltage drop in a nonideal PDN. The total voltage drop is decomposed into a resistive (IR)-drop component and an inductive (Ldi/dt)-drop component. Increasing the metallization in a PDN can mitigate the resistive component of the voltage drop, but it must be weighed against routing resources and cost budget.

Manuscript received July 24, 2018; revised November 28, 2018; accepted January 12, 2019. Date of publication March 1, 2019; date of current version March 20, 2019. (Corresponding author: Kyungwook Chang.)

K. Chang and S. K. Lim are with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA (e-mail: [email protected]; [email protected]).

S. Das is with ARM Ltd., Cambridge CB1 9NJ, U.K. (e-mail: [email protected]).

S. Sinha, B. Cline, and G. Yeric are with ARM Inc., Austin, TX 78735 USA (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVLSI.2019.2897589

Meanwhile, the inductance of the package, including controlled collapse chip connection (C4) bumps, leads to significant Ldi/dt-drop due to the time-varying current drawn by cells in a die. To mitigate this drop, decoupling capacitors (decaps) are utilized for local charge storage. Decaps can be placed on a die with decoupling cells (decap cells) or explicitly added in the package. However, these decaps, along with the resistance and inductance of a PDN, form an RLC circuit resulting in resonance frequencies [1]. If a resonance frequency lies within the system's operating frequency range, significant Ldi/dt-drop is induced, so it is crucial to have low input impedance across a wide range of frequencies.

While PDNs of 2-D designs have been explored actively [2], [3], M3D PDNs have not been studied widely. A study of a system-level PDN for TSV-based 3-D ICs is presented in [4], but PDNs in M3D and in TSV-based 3-D ICs show quite different characteristics due to their tier-connection method and achievable vertical integration density. Samal et al. [5] have investigated the impact of PDNs on power and performance of M3D by proposing PDN designs, but the authors did not perform voltage drop analysis on them.

In this paper, we investigate the benefits and challenges of PDNs in M3D designs and present optimization methodologies. Using three benchmarks, we model a system-level PDN circuit of M3D designs as well as 2-D designs and perform an in-depth analysis on both static and dynamic behavior of the PDN. We also present resonance frequency analysis and an in-rush current study for M3D designs, which show the frequency- and time-domain response of the PDN. This, to the best of our knowledge, is the first work studying the dynamic voltage drop as well as the frequency- and time-domain analysis in M3D. Then, we present two M3D PDN optimization methodologies, namely top-tier cell repositioning (RP) and an asymmetric top- and bottom-tier M3D PDN.

II. MOTIVATIONAL EXAMPLE

We compare PDNs of 2-D and M3D designs taking into account two analysis modes. The static mode is a vectorless analysis mode wherein the switching activity of cells is averaged into a single instance. In the dynamic mode, we perform a real workload-based (vector-based) power analysis for a given period of time. The dynamic mode, thus, incorporates the impact of inductive (Ldi/dt) transients by taking into account workload-dependent time-varying current flow.

1063-8210 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

TABLE I

STATIC AND DYNAMIC VOLTAGE DROP IN DCT 2-D VERSUS M3D DESIGNS

Fig. 1. Our M3D design flow based on [6], extended to insert a PDN.

Although both M3D and TSV-based 3-D ICs utilize vertical integration to connect PDNs in tiers, there are major differences that impact their static and dynamic behaviors.

In TSV-based 3-D ICs, power is delivered directly to power pads of each tier through dedicated power TSVs, forming a parallel resistive path between multiple tiers. However, in M3D, instead of having external power pads on the bottom tier, power MIVs are utilized to connect the bottommost metal layer of the top-tier PDN and the topmost metal layer of the bottom-tier PDN, forming a series resistive path across multiple tiers, which makes bottom-tier cells experience a much longer resistive path compared to TSV-based 3-D ICs. Furthermore, irregular power MIV placement due to cell blocking on the top tier makes the power delivery issue more complicated in M3D. For these reasons, M3D suffers much higher voltage drop in the static mode, especially on the bottom-tier cells, as shown in Table I, compared to TSV-based 3-D ICs [4].

Although the series resistive path of an M3D PDN worsens the voltage drop in the static mode, it benefits the voltage drop in the dynamic mode by improving resiliency against ac current noise, which will be discussed later. Thus, the difference in the voltage drop in the dynamic mode is 7.3%, which is similar to TSV-based 3-D ICs [4].

III. DESIGN FLOW

Fig. 1 shows the M3D design flow including a PDN used in this paper. Panth et al. [6] presented an M3D design flow, but it does not include a PDN design step. In this paper, we have extended the flow to incorporate an M3D PDN.

Fig. 2. Our simplified model of a system-level M3D PDN structure.

The flow starts with scaling the xy-dimensions of all existing 2-D standard cells and metal wires by 1/√2, which halves their area, since cells in an M3D design are placed on half the footprint of the corresponding 2-D design but onto two tiers. Then, a shrunk 2-D design is implemented with the shrunk cells and metal wires, performing all steps required in 2-D implementations. From the shrunk 2-D design, only cell placement is retained, and all other information is discarded. Afterward, all cells are scaled back up to their original size, which results in overlaps among the cells. In order to remove these overlaps and place the cells onto two tiers, we perform area-balanced min-cut partitioning so that half of the cells are placed on the top tier and the other half on the bottom tier.
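The shrink step can be checked with a few lines of arithmetic. The sketch below assumes a linear scale factor of 1/√2, which is what halving the footprint area requires; the cell dimensions and the helper name `shrink_cell` are hypothetical and serve only as an illustration.

```python
import math

# Assumption: a linear shrink factor of 1/sqrt(2), so that area scales by 1/2.
SHRINK = 1.0 / math.sqrt(2.0)


def shrink_cell(width_um, height_um, scale=SHRINK):
    """Return (width, height) of a cell after scaling both xy-dimensions."""
    return width_um * scale, height_um * scale


# Hypothetical cell dimensions in micrometers, for illustration only.
w, h = 0.76, 1.4
sw, sh = shrink_cell(w, h)
area_ratio = (sw * sh) / (w * h)
print(f"shrunk cell: {sw:.3f} um x {sh:.3f} um, area ratio = {area_ratio:.3f}")
```

Because every cell and wire shrinks by the same factor, the placement obtained on the half-size footprint keeps its relative positions when the cells are restored to full size, at the cost of the overlaps that the tier partitioning then resolves.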

To build a full M3D PDN and determine the location of signal MIVs, we duplicate all metal layers, so that the original metal layers account for those in the bottom tier and the duplicated ones for those in the top tier. In addition, the metal layer of pins in every cell is annotated with its respective tier, so that pins of bottom-tier cells utilize M1 of the bottom tier, and pins of top-tier cells utilize M1 of the top tier.

Before determining the location of signal MIVs, it is crucial to build an M3D PDN with the full metal stack (the original and the duplicated metal layers) because the signal MIVs should not be placed at locations where power MIVs are to be placed; this prevents the signal MIVs from deteriorating the quality of the M3D PDN. After obtaining the PDN structure of the M3D design including the power MIVs, the signal MIV locations are determined by routing the design and using the location of vias connecting the topmost metal layer of the bottom tier and the bottommost metal layer of the top tier.

Then, with the netlist of each tier and the location of MIVs along with the PDN design, initial routing is performed, and timing constraints for each tier are derived. The timing constraints are used to perform timing-driven routing for each tier, resulting in fully placed and routed designs for each tier. The designs are then merged into a single M3D design, and timing/power analysis as well as PDN analysis is performed.

IV. SYSTEM-LEVEL PDN MODELING

In order to perform an in-depth PDN analysis, it is crucial to build a system-level PDN model. Fig. 2 shows a simplified representation of an M3D PDN. It consists of a system model and a die model, and the system model is categorized into C4 bump, package, and printed circuit board (PCB) models [7].

In the die model, the parasitics of the PDN of a design are extracted using Cadence Voltus. In Fig. 2, the resistance of the metal wire RPDN,eq represents the equivalent resistive parasitics of the metal wires constituting the PDN, and the implicit decap of the die CPDN,eq consists of the equivalent capacitance of the PDN metal wires, nonswitching device capacitance, and coupling capacitance between n-well and substrate. The current drawn by switched cells is lumped and modeled as an ac load current source ILOAD,eq.

TABLE II

WIDTH, PITCH, AND UTILIZATION OF PDN. WE USE THE SAME SPECS FOR BOTH 2-D AND M3D (BOTH TOP AND BOTTOM TIER) DESIGNS

A lumped system model is used based on the parameters obtained from [2] and [7]. The C4 bumps and power-line traces in the package and PCB are modeled as a series connection of a resistor and an inductor (C4 bumps: RC4 = 1 mΩ and LC4 = 10 pH; package: RPKG = 10 mΩ and LPKG = 100 pH; PCB: RPCB = 5 mΩ and LPCB = 1 μH), and a dc voltage source supplies power on the PCB. The parasitics of the voltage regulator module LC-tank filter are incorporated within the PCB parasitics.

Since the implicit decaps alone are not sufficient to keep the design in a safe voltage drop region against Ldi/dt-drop, explicit decaps are deployed both on the die using decap cells CDIE_dc,eq and on the package and PCB using discrete decaps CPKG_dc (= 400 nF) and CBULK_dc (= 400 μF), respectively. The discrete decaps are modeled by a capacitor connected to an effective resistor and inductor in series (RPKG_dc = 20 mΩ and LPKG_dc = 200 pH; RBULK_dc = 10 mΩ and LBULK_dc = 2 nH). These explicit decaps and the implicit decap of the die act as charge storage elements and prevent system failure or performance degradation due to severe Ldi/dt-drop.

V. EXPERIMENTAL SETUP

Three benchmarks from OpenCores, discrete cosine transform (DCT), AES-128, and JPEG, are used in this paper. The Nangate 45-nm Open Process Design Kit (PDK) is used to synthesize, place, and route the designs. Two-dimensional designs and each of the top and bottom tiers of M3D designs are implemented using seven metal layers. The footprint of the 2-D designs is determined such that the cell utilization is 60%, and the M3D designs have half the footprint of the 2-D counterparts. The target frequency of each benchmark is fixed to its maximum operating frequency in the technology node.

Table II summarizes the resources used on each metal layer to build the 2-D and M3D PDN designs. The dimensions of power rails are determined so that the maximum instance IR-drop is 5% of the nominal voltage (1.1 V) for the 2-D designs of all benchmarks in static rail analysis. The same specifications are used for both the top and bottom tier of the M3D designs. Power rails on M1 and M2 layers are tightly coupled and run in parallel in the horizontal direction, and M2 and M5 power rails are connected with only via arrays, which cross M3 and M4. M5–M7 power rails form a mesh structure to distribute power across the chip.

Since the Nangate 45-nm Open PDK does not provide decap cells, we made our own decap cells with various sizes for our experiment. Table III shows the size and decoupling capacitance of the decap cells. The decoupling capacitance of each cell is derived using the method presented in [8]. With the fully placed and routed 2-D and M3D designs, decap cells are first placed next to clock buffers that drive the clock pins of flip-flops, which usually suffer from high Ldi/dt-drop. Then, the rest of the decap cells are placed in empty areas of the design to meet a target decoupling capacitance of the chip.

The power and ground pads of the designs are located on the top metal layer of the designs (M7 for the 2-D designs and M7 of the top tier for the M3D designs) with 120-μm spacing, modeling the C4 bumps of the designs.

TABLE III

DESIGN METRICS AND DECOUPLING CAPACITANCE OF OUR DECAP CELLS. CELL WIDTH AND HEIGHT ARE IN MICROMETERS, AND CAPACITANCE IN FEMTOFARADS

Fig. 3. Number of switched cells in the DCT design during a workload-based simulation. Only the time period that shows the highest switching activity (blue bar) is used for our analysis.

VI. RESULTS AND ANALYSIS

Fig. 3 shows the number of switched cells in the DCT design during a workload-based simulation. The vector-based power consumption in Table IV is measured during the time step that shows the highest switching activity throughout the simulation (blue bar in Fig. 3), while the statistical power consumption of the designs is calculated assuming the switching ratio of the primary input and sequential logic as 20% and 10%, respectively. Therefore, we observe that the dynamic power (internal + switching power) shows a significant difference between the two analysis methods, whereas the static power (leakage power) remains similar.

The M3D designs offer power benefit over their 2-D counterparts. Since M3D designs utilize short vertical integration with MIVs instead of using long metal wires on the xy plane, the wire length of the designs is reduced, as shown in Table V, offering switching power saving. In addition, since the cells drive the reduced wire load, the number of buffers as well as the drive strength of the cells decrease, which, in turn, reduces the standard cell area, hence showing benefits on the internal and leakage power consumption.

TABLE IV

POWER COMPARISON OF 2-D AND M3D. WE CONDUCT BOTH STATISTICAL AND VECTOR-BASED POWER SIMULATIONS. Δ% FOR M3D DESIGNS IS CALCULATED WITH RESPECT TO THE 2-D COUNTERPARTS

TABLE V

DESIGN METRIC COMPARISON OF 2-D AND M3D. Δ% FOR M3D DESIGNS IS CALCULATED WITH RESPECT TO THE 2-D COUNTERPARTS

We use instance voltage drop, which is the voltage drop a cell experiences, as described in the following equation:

$$\Delta V_{\mathrm{inst}} = (V_{\mathrm{DD,nom}} - V_{\mathrm{DD,act}}) + (V_{\mathrm{SS,act}} - 0) \tag{1}$$

where $\Delta V_{\mathrm{inst}}$ is the instance voltage drop, and $V_{\mathrm{DD,nom}}$ is the nominal voltage. $V_{\mathrm{DD,act}}$ and $V_{\mathrm{SS,act}}$ are the actual voltage levels on the power and ground pins of the cell. Instance voltage drop can be further decomposed into the voltage drops on each metal layer.
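As a small illustration of (1), the sketch below evaluates the instance voltage drop for one cell. The pin voltages are hypothetical values chosen for the example, not measurements from the designs, and the function name is an assumption.

```python
def instance_voltage_drop(vdd_nom, vdd_act, vss_act):
    """Instance voltage drop per (1): supply-side droop plus ground bounce."""
    return (vdd_nom - vdd_act) + (vss_act - 0.0)


# Hypothetical pin voltages in volts, for illustration only (1.1 V nominal supply).
VDD_NOM = 1.1
dv = instance_voltage_drop(vdd_nom=VDD_NOM, vdd_act=1.06, vss_act=0.02)
print(f"instance voltage drop = {dv * 1e3:.1f} mV ({dv / VDD_NOM:.1%} of nominal)")
```

Both the droop on the power pin and the rise on the ground pin reduce the voltage the cell actually sees, which is why the two terms add.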

A. Static Rail Analysis

Since the static rail analysis is based on statistical power consumption, which summarizes the behavior of designs, only IR-drop can be analyzed. Fig. 4 shows the breakdown of the worst instance IR-drop into each metal layer constituting the PDN of the 2-D and M3D designs. The M3D designs show approximately 2× higher IR-drop on average compared to the 2-D counterparts. Since M3D PDNs utilize more metal layers to deliver power to the bottom-tier cells, those cells experience worse IR-drop compared to the top-tier cells (the dashed box from M1B to M7B in Fig. 4).

Another reason for the higher IR-drop in M3D designs is irregular placement of power MIVs, which connect PDNs of two tiers. Fig. 5 illustrates the impact of irregular power MIV placement. In M3D PDNs, the current flowing in metal layers of the top tier is greater than that of the bottom tier (e.g., IM7T > IM7B in Fig. 5) since top-tier metal layers deliver current to both top- and bottom-tier cells, whereas only current drawn by bottom-tier cells flows on bottom-tier metal layers. Therefore, the minimum IR-drop path shown in Fig. 5 to deliver power to a bottom-tier cell utilizes the minimum length of top-tier power rails. However, the path can be blocked by a missing power MIV. The absence of power MIVs stems from top-tier cells since MIVs cannot penetrate those cells in order to preserve their active areas. In this case, the current needs to flow through an alternative path shown as the actual path in Fig. 5, which utilizes longer top-tier metal wires and exhibits worse IR-drop due to higher current in those wires. It is important to note that top-tier metal layers are more susceptible to electromigration because of higher currents, but the discussion is out of the scope of this paper.

Fig. 4. Breakdown of the worst instance IR-drop across metal layers. M7B denotes M7 of the bottom tier in the M3D designs. SYS represents the system model including C4 bump, package, and PCB model.



Fig. 5. Current path to deliver power to a target cell. A top-tier cell is blocking a power MIV along the minimum IR-drop path, so the current is delivered through an alternative path.

TABLE VI

AVERAGE AMOUNT OF CURRENT FLOWING THROUGH C4 BUMPS IN 2-D AND M3D DESIGNS. THE UNIT OF CURRENT IS mA

As the footprint of an M3D design is half of its 2-D counterpart, the number of C4 bumps that can be placed on the M3D design is approximately half of that in the 2-D design, as shown in Table V. This affects the amount of current flowing through each C4 bump. Table VI gives the comparison of the current flowing through C4 bumps in the 2-D and M3D designs. Up to 155.7% higher current flows through the C4 bumps in the M3D designs, incurring a significant difference in IR-drop on the top metal layers (i.e., M7 and M6 in Fig. 4).

B. Dynamic Rail Analysis

Unlike the static rail analysis, the dynamic voltage drop consists of two components, resistive (IR)-drop and inductive (Ldi/dt)-drop. Ldi/dt-drop has a significantly higher impact on the voltage drop since we perform dynamic rail analysis for two clock cycles with the maximum switching activity in a real workload. The voltage drop of the M3D designs is 11.3% higher on average than that of the 2-D designs as shown in Fig. 6, which is a much smaller gap than that in the static rail analysis.

First, in the dynamic rail analysis, the difference between the voltage drop of the metal layers in the 2-D designs and the metal layers of the top-tier PDN is significantly less than that in the static rail analysis. The reduced voltage drop on those metal layers results from 3-D placement of decaps in M3D designs. As discussed in Section IV, decaps from nonswitching devices (implicit) and decap cells (explicit) in a design act as charge reservoirs, preventing nearby cells from experiencing sudden high Ldi/dt-drop. In M3D designs, Ldi/dt-drop is reduced because of decap cells both in the xy plane, as in the case of 2-D designs, and in the adjacent tier (z-dimension), as in TSV-based 3-D ICs [4].

Fig. 6. Breakdown of the worst instance dynamic voltage drop (= IR-drop + Ldi/dt-drop) across metal layers.

Fig. 7. Comparison of the worst voltage drop experienced at C4 bumps. The decoupling capacitance is set to 30% of the total capacitance of each design.

Fig. 7 shows the maximum voltage drop experienced at the C4 bumps comparing the DCT 2-D and M3D designs with and without decap cells. The decoupling capacitance of the 2-D and M3D designs with decap cells is targeted to 30% of their total capacitance. Even though the decoupling capacitance added to the M3D design (72.6 pF) is smaller than that of the 2-D design (77.9 pF) due to the lower total capacitance of the DCT M3D design, the M3D design benefits more from the added decap cells than the 2-D design as it utilizes decaps in the z-dimension as well. Another reason for the smaller gap between the 2-D and M3D voltage drop in the dynamic rail analysis is the reduced voltage drop on the system model as shown in Fig. 6 due to the varying impedance seen from the die depending on operating frequency, which will be discussed in Section VI-C.

C. Frequency- and Time-Domain Analysis

As shown in Fig. 2, implicit and explicit decaps on a die model are coupled with inductors in a system model, forming an RLC circuit. The RLC circuit has its own resonance frequency, causing significant voltage drop on the PDN even with small changes in load current. Explicit decaps on the package and PCB also form RLC circuits with the corresponding inductors, showing their unique resonance frequencies.

TABLE VII

EFFECTIVE RESISTANCE AND CAPACITANCE OF 2-D AND M3D PDN. RESISTANCE IS IN Ω, AND CAPACITANCE IN NANOFARADS

To perform an in-depth frequency- and time-domain analysis on a PDN, a reasonable die model that represents 2-D and M3D full-chip system-on-chip (SoC) designs is needed. Since the benchmarks used in this paper are small compared to full-chip SoC designs, we use their parameters to create a full-chip die model. Table VII shows the effective resistance and capacitance of the PDN of each benchmark (RPDN,eq and CPDN,eq + CDIE_dc,eq in Fig. 2, respectively). We observe that as a design becomes larger, the capacitance of its PDN increases due to the increased ground and coupling capacitance of the PDN, while the resistance becomes smaller because more parallel resistive paths to the cells are available. For ease of modeling, we take the average of the RC product from the three benchmarks and model a full-chip die by assuming CPDN,eq + CDIE_dc,eq = 10 nF, resulting in associated resistances of 7.87 and 18.9 mΩ for the 2-D and M3D designs, respectively.
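The RC-product scaling described above can be sketched as follows. The helper names are assumptions made for this illustration. Since the per-benchmark values of Table VII are not reproduced here, the usage only backs out the mean RC products implied by the stated full-chip numbers (10 nF with 7.87 and 18.9 mΩ) and confirms the round trip.

```python
def mean_rc_product(benchmarks):
    """Average R*C over (R_ohm, C_farad) pairs, one pair per benchmark."""
    return sum(r * c for r, c in benchmarks) / len(benchmarks)


def full_chip_resistance(mean_rc, c_full_chip):
    """Resistance that keeps the same RC product at the full-chip capacitance."""
    return mean_rc / c_full_chip


C_FULL = 10e-9  # assumed full-chip capacitance from the text, in farads

# RC products implied by the resistances stated in the text (not Table VII data).
implied_rc = {"2-D": 7.87e-3 * C_FULL, "M3D": 18.9e-3 * C_FULL}
for name, rc in implied_rc.items():
    r = full_chip_resistance(rc, C_FULL)
    print(f"{name}: mean RC = {rc * 1e12:.1f} ps -> R = {r * 1e3:.2f} mOhm")
```

Holding the RC product fixed is a modeling convenience: it captures the observed trend that larger designs have more capacitance and less resistance without requiring a full-chip layout.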

Fig. 8 shows the frequency response of the 2-D and M3D full-chip SoC obtained by sweeping the frequency of the ac load current source, ILOAD,eq. We observe three resonance frequency points: first-order resonance caused by CPDN,eq + CDIE_dc,eq coupled with LC4, second-order resonance by CPKG_dc with LPKG, and third-order resonance by CBULK_dc with LPCB. While the third-order and second-order resonances occur in the range of a few kilohertz and in the megahertz range, respectively, the largest resonance, the first-order resonance, is in the range between 50 and 200 MHz. Although the M3D design shows a 16.7% impedance increase at the second-order resonance frequency, the operating frequencies of full-chip designs at advanced technology nodes are in the range of first-order resonance frequencies, so it is crucial to minimize the first-order resonance impact for a robust PDN.
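A rough check of the second- and third-order resonance ranges follows from treating each decap and its paired inductor as an isolated LC pair, f = 1/(2π√(LC)). This is a simplifying assumption that ignores resistance and the coupling between stages; the component values are those of the lumped system model in Section IV. The first-order point is left out here because the inductance the die capacitance sees depends on how the C4, package, and decap branches combine, which is an assumption about the model topology.

```python
import math


def lc_resonance_hz(inductance_h, capacitance_f):
    """Resonance frequency of an ideal, isolated LC pair."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


# Values from the lumped system model in Section IV.
f_second = lc_resonance_hz(100e-12, 400e-9)  # L_PKG with C_PKG_dc
f_third = lc_resonance_hz(1e-6, 400e-6)      # L_PCB with C_BULK_dc

print(f"second-order estimate: {f_second / 1e6:.1f} MHz")
print(f"third-order estimate:  {f_third / 1e3:.2f} kHz")
```

Both estimates fall in the megahertz and kilohertz ranges quoted in the text, which suggests the isolated-pair view is adequate for locating these two points even though it cannot predict their peak impedance.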

As shown in Fig. 8, the M3D design exhibits 35.9% lower peak impedance at the first-order resonance frequency because of the high effective resistance of the M3D PDN due to the series resistive path across tiers. An interesting point is that the high resistance of M3D PDNs, which worsens IR-drop, in fact improves the resiliency against noise by damping noise at worst case resonance oscillation. This paper is the first study to demonstrate this effect of M3D designs.
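The damping effect can be explored with a small ac sweep of a lumped ladder. The sketch below is an illustration under stated assumptions, not the authors' simulation setup: it uses a single-rail ladder (die capacitance in parallel with the on-die resistance and C4 branch, followed by the package and PCB stages with their decaps), treats the dc source as an ac short, and takes component values from Section IV and the full-chip die model. The function names are hypothetical, and the sweep is not expected to reproduce Fig. 8 exactly.

```python
import math


def parallel(z1, z2):
    """Impedance of two branches in parallel."""
    return z1 * z2 / (z1 + z2)


def series_rl(r, l, w):
    """Series resistor-inductor branch at angular frequency w."""
    return complex(r, w * l)


def series_rlc(r, l, c, w):
    """Discrete decap modeled as a capacitor with series resistance and inductance."""
    return complex(r, w * l - 1.0 / (w * c))


def die_impedance(freq_hz, r_pdn, c_die=10e-9):
    """Impedance seen by the load current source at the die (assumed topology)."""
    w = 2.0 * math.pi * freq_hz
    # PCB trace to the dc source (ac short) in parallel with the bulk decap.
    z_pcb = parallel(series_rl(5e-3, 1e-6, w), series_rlc(10e-3, 2e-9, 400e-6, w))
    # Package trace feeding the PCB node, in parallel with the package decap.
    z_pkg = parallel(series_rl(10e-3, 100e-12, w) + z_pcb,
                     series_rlc(20e-3, 200e-12, 400e-9, w))
    # C4 bump plus on-die PDN resistance in series with the package node.
    z_up = series_rl(1e-3 + r_pdn, 10e-12, w) + z_pkg
    # Die capacitance shunts the load.
    return parallel(z_up, 1.0 / (1j * w * c_die))


def peak_in_band(r_pdn, f_lo=50e6, f_hi=200e6, points=2000):
    """Return (frequency, |Z|) of the largest impedance on a log-spaced grid."""
    best_f, best_z = f_lo, 0.0
    for i in range(points):
        f = f_lo * (f_hi / f_lo) ** (i / (points - 1))
        z = abs(die_impedance(f, r_pdn))
        if z > best_z:
            best_f, best_z = f, z
    return best_f, best_z


f_2d, z_2d = peak_in_band(7.87e-3)   # 2-D full-chip die resistance
f_m3d, z_m3d = peak_in_band(18.9e-3)  # M3D full-chip die resistance
print(f"2-D peak: {z_2d * 1e3:.0f} mOhm at {f_2d / 1e6:.0f} MHz")
print(f"M3D peak: {z_m3d * 1e3:.0f} mOhm at {f_m3d / 1e6:.0f} MHz")
print(f"peak reduction: {(1.0 - z_m3d / z_2d):.1%}")
```

In this topology the only difference between the two cases is the on-die resistance, so any reduction in the peak comes purely from the extra series damping, which is the mechanism the paragraph above describes.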

Fig. 8. Impedance seen from ΔVDIE in Fig. 2 by sweeping the frequency of the ac load current source, ILOAD,eq.

Fig. 9. Transient voltage response for (a) unit step and (b) unit 112 MHz (first-order resonance frequency) sine-wave load current source ILOAD,eq. Third-order resonance frequency is not shown in (a) for brevity.

Fig. 9(a) and (b) shows the improved resiliency of the M3D PDN through the time-domain response for a unit step, which models an in-rush current, and for a 112-MHz (first-order resonance frequency) unit sine-wave load current source. The following equation explains the die voltage response affected by first-order resonance for a unit step load current source [7]:

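$$\Delta V_{\mathrm{DIE}} \cong 2R + \sqrt{\frac{2L_{\mathrm{C4}}}{C_{\mathrm{DIE,eq}}}} \cdot e^{-\frac{R}{2L_{\mathrm{C4}}}t} \sin(\omega_r t - \theta) \tag{2}$$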

where R = RPCB + RPKG + RC4 + RPDN,eq, and ωr and θ are the first-order resonance frequency and phase, respectively. While the increased R in an M3D design worsens the IR-drop at the cell [the first term in (2)], it helps to reduce the second term, Ldi/dt-drop. The improved resiliency for the first-order resonance helps to neutralize the voltage drop gap induced by the second-order resonance in the worst voltage drop as shown in Fig. 9(a) and shows 12.4% less voltage drop with a current source oscillating at the first-order resonance frequency as shown in Fig. 9(b).

Fig. 10. Cell placement of top-tier cells (a) in the baseline M3D (M3D-base) and (b) in M3D with top-tier cell RP technique (M3D w/ RP), which repositions obstructing top-tier cells to the nearest available space.
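The tradeoff between the two terms of (2) can be made concrete by evaluating the static term and the decay envelope of the oscillatory term separately. The sketch below assumes a unit step load, uses the lumped values from Section IV together with the full-chip die model, and leaves out the sine factor because ωr and θ are not given numerically; the function name is hypothetical.

```python
import math


def step_response_terms(t, r_total, l_c4=10e-12, c_die=10e-9):
    """Return (static term, decay envelope) of (2) for a unit step at time t."""
    static = 2.0 * r_total
    envelope = math.sqrt(2.0 * l_c4 / c_die) * math.exp(-r_total / (2.0 * l_c4) * t)
    return static, envelope


R_SYSTEM = 5e-3 + 10e-3 + 1e-3  # R_PCB + R_PKG + R_C4, in ohms
r_2d = R_SYSTEM + 7.87e-3       # plus R_PDN,eq of the 2-D full-chip model
r_m3d = R_SYSTEM + 18.9e-3      # plus R_PDN,eq of the M3D full-chip model

t = 1e-9  # evaluation time in seconds, chosen arbitrarily for illustration
for name, r in (("2-D", r_2d), ("M3D", r_m3d)):
    static, env = step_response_terms(t, r)
    print(f"{name}: static = {static * 1e3:.1f} mV, envelope = {env * 1e3:.2f} mV")
```

The larger R raises the static term but shrinks the envelope at any t > 0, which is the behavior the text attributes to the series resistive path of the M3D PDN.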

VII. M3D PDN OPTIMIZATION

In this section, we present methodologies to improve static as well as dynamic voltage drop of M3D designs. Although 3-D placement of decap cells and better resiliency against ac current noise help reduce Ldi/dt-drop, M3D designs still suffer from higher IR-drop due to additional metal layers, irregular placement of power MIVs, and fewer C4 bumps. In addition, as technology advances, IR-drop of M3D designs worsens because of the increased PDN resistance due to restricted routing resources, whereas the increased resistance offers better resiliency against ac current noise, especially in M3D designs, mitigating Ldi/dt-drop. Therefore, our methodologies mainly focus on reducing the IR-drop of M3D designs by repositioning top-tier cells and utilizing asymmetric top- and bottom-tier M3D PDNs.

A. Top-Tier Cell Repositioning

Among the causes of worse IR-drop in M3D designs, additional metal layers and fewer C4 bumps are inevitable due to the nature of M3D. However, irregular power MIV placement is avoidable by preventing top-tier cells from being placed at locations where power MIVs are supposed to lie.

To achieve this, after performing area-balanced min-cut tier partitioning in the baseline M3D design flow presented in Section III, we reposition the top-tier cells located above the topmost power rails of the bottom tier to the nearest available space. This produces empty spaces for power MIVs, letting them tightly couple top- and bottom-tier PDNs without any obstruction due to the top-tier cells. Fig. 10(b) shows the repositioned top-tier cells, which reserve room for power MIVs to connect the bottommost power rails of the top tier to the topmost power rails of the bottom tier without any obstruction, as opposed to Fig. 10(a). As discussed in Section VI-A, to minimize IR-drop of M3D designs, the IR-drop path from current sources (i.e., C4 bumps) to a bottom-tier cell should utilize the minimum length of top-tier power rails because of the large current flowing through the top-tier power rails. The top-tier cell RP technique ensures that power MIVs are placed at every intersection between the topmost power rails of the bottom tier and the bottommost power rails of the top tier, which helps current flow through the optimal IR-drop path. Fig. 11 shows the impact of top-tier cell RP by comparing the same IR-drop path [i.e., the IR-drop path to the worst instance IR-drop cell in the baseline M3D design (M3D-base)]. While M3D-base shown in Fig. 11(a) utilizes long top-tier power rails (red/blue dotted lines), the M3D design with the top-tier cell RP technique (M3D w/ RP) shown in Fig. 11(b) utilizes shorter top-tier power rails, mainly using bottom-tier power rails (red/blue solid lines) and, hence, mitigating the IR-drop.

Fig. 11. Comparison of the IR-drop path of (a) M3D-base and (b) M3D w/ RP. Red/blue lines: IR-drop paths of power/ground. Dotted/solid lines: top- and bottom-tier power rails, respectively.

Table VIII shows the benefit of the top-tier cell RP technique on the static voltage drop of the DCT, AES-128, and JPEG M3D designs. With top-tier cell RP, static voltage drop is improved by up to 14.8%. To analyze this trend in more detail, the breakdown of the static voltage drop across metal layers is presented in Fig. 12. Although the top-tier cell RP technique tends to increase the IR-drop of the bottom-tier power rails (the dashed box from M1B to M7B; at most 14.8% in the AES-128 design) as it makes the IR-drop path utilize more bottom-tier power rails, it efficiently reduces the IR-drop on the top-tier power rails (up to 21.6% in the AES-128 design). Considering that the IR-drop of the top-tier power rails dominates the total IR-drop, the technique significantly reduces the total static voltage drop of M3D designs.

However, the top-tier cell RP technique might degrade the timing integrity and dynamic voltage drop, which result from the increased parasitics and power consumption as shown in Table VIII. These increases are mainly attributed to the wire length increase, which results from the top-tier cell movement. In the M3D design flow used in this paper, cell placement is determined and optimized while implementing a shrunk 2-D design, so that the wire length among cells is minimized. However, when top-tier cells are moved to reserve room for power MIVs, the wire length increases. As the increase in wire length is rather minor (up to 1.3% in the DCT design), the technique degrades the slack of the critical timing path (worst negative slack), but the degradation is not significant compared with the clock period. However, it significantly increases the peak dynamic power consumption (up to 5.5% in the DCT design), especially when the lengthened wires are heavily used in the vector-based power simulation.

TABLE VIII

KEY METRIC COMPARISON OF THE BASELINE M3D (M3D-BASE) AND M3D DESIGNS WITH TOP-TIER CELL REPOSITIONING TECHNIQUE (M3D W/ RP). Δ% FOR M3D W/ RP DESIGNS IS CALCULATED WITH RESPECT TO THE RESPECTIVE M3D-BASE DESIGNS

Fig. 12. Comparison of the breakdown of the worst instance IR-drop across metal layers between M3D-base (base) and M3D w/ RP (w/ RP).

This degradation in dynamic voltage drop can be resolved by using an asymmetric top- and bottom-tier M3D PDN, which will be discussed in Section VII-B.

B. Asymmetric Top- and Bottom-Tier PDN

Although the top-tier cell RP technique helps to reduce the IR-drop on top-tier power rails, the top-tier IR-drop still takes a large portion of the total IR-drop of M3D designs as shown in Fig. 12. As the large current flowing through top-tier power rails is unavoidable because they deliver power to both top- and bottom-tier cells, we should reduce the IR-drop by decreasing the resistance of top-tier power rails with an asymmetric top- and bottom-tier PDN.

Fig. 13. Cross-sectional view of an asymmetric top- and bottom-tier M3D PDN. The width of top-tier power rails is scaled up by α, and that of bottom-tier power rails is scaled down by α. Note that the width of M1 is not scaled as it is determined by the power-pin width of standard cells, and M3 and M4 are not used for top- and bottom-tier power rails in our designs.

Fig. 13 shows the cross-sectional view of an asymmetric top- and bottom-tier M3D PDN. The width of top-tier power rails is increased by an unbalancing factor, α, to decrease the resistance and accommodate the large current that delivers power to both top- and bottom-tier cells. On the other hand, the width of bottom-tier power rails is decreased by the same factor, α, since a rather small current, which is drawn only by bottom-tier cells, flows on the bottom-tier power rails. This helps to recover the signal routing resources taken by the wider top-tier power rails because narrower power rails occupy a smaller number of signal routing tracks. The width of M1 of both top- and bottom-tier power rails is not scaled since it is determined by the power/ground pins of standard cells.
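As a rough illustration of this scaling, the sketch below computes how the resistance of a power rail changes with the unbalancing factor. It assumes that rail resistance is inversely proportional to rail width (R = ρ · l/(t · w), with resistivity, length, and thickness unchanged) and ignores the unscaled M1 layer. The function name is hypothetical and is not part of the design flow used in this paper.

```python
from __future__ import annotations


def relative_rail_resistance(alpha: float) -> tuple[float, float]:
    """Return (top, bottom) rail resistance relative to the symmetric PDN.

    Assumption: R = rho * l / (t * w), so scaling the width by (1 + alpha)
    or (1 - alpha) scales the resistance by the reciprocal. M1, which is
    not scaled, is ignored in this sketch.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    top = 1.0 / (1.0 + alpha)     # wider top-tier rails: lower resistance
    bottom = 1.0 / (1.0 - alpha)  # narrower bottom-tier rails: higher resistance
    return top, bottom


for alpha in (0.0, 0.1, 0.3, 0.5):
    top, bottom = relative_rail_resistance(alpha)
    print(f"alpha={alpha:.1f}: top x{top:.3f}, bottom x{bottom:.3f}")
```

Under this assumption, an unbalancing factor of 50% lowers the top-tier rail resistance to about two-thirds of its original value while doubling the bottom-tier rail resistance, so the benefit depends on how much more current the top tier carries.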

The worst instance IR-drop with different unbalancing factors from 10% to 50% for M3D-base and M3D w/ RP is presented in Fig. 14. As the unbalancing factor increases, static voltage drop tends to decrease, showing only 25.7% higher static voltage drop in the AES-128 design compared to its 2-D counterpart. To analyze the trend, we employ the following equation, which determines the optimal unbalancing factor assuming the current flowing through the top-tier power rails is exactly twice that through the bottom-tier power rails along the worst instance IR-drop path:

$$
\begin{aligned}
\min(\Delta V_{\mathrm{DIE}}) &= \min\left(I_{\mathrm{top}} \cdot R_{\mathrm{top}} + I_{\mathrm{bot}} \cdot R_{\mathrm{bot}}\right) \\
&= \min\left(2 \cdot I_{\mathrm{bot}} \cdot \frac{\rho \cdot l}{t \cdot w(1 + \alpha)} + I_{\mathrm{bot}} \cdot \frac{\rho \cdot l}{t \cdot w(1 - \alpha)}\right) \\
&= 2.91 \cdot I_{\mathrm{bot}} \cdot \frac{\rho \cdot l}{t \cdot w} \quad \text{at } \alpha = 0.17
\end{aligned}
\tag{3}
$$

where $I_{\mathrm{top}}$ and $I_{\mathrm{bot}}$ are the currents flowing through the top- and bottom-tier power rails, and $\rho$ is the resistivity of the power rails. $l$, $t$, and $w$ are the length, thickness, and width of the power rails, respectively, and $\alpha$ is the unbalancing factor.

Fig. 14. Normalized static voltage drop with different unbalancing factors for asymmetric top- and bottom-tier PDN in M3D-base and M3D w/ RP. Static voltage drop values are normalized to their 2-D counterparts.

The equation indicates that the static voltage drop is minimized when the width of top-tier power rails is scaled up by 17% and that of bottom-tier power rails is scaled down by 17%, which differs from our experimental result. This difference first comes from the fact that the current flowing through top-tier power rails is not exactly twice that of bottom-tier power rails.
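The optimum in (3) can be reproduced with a short calculation. The sketch below generalizes the assumption to an arbitrary current ratio k = Itop/Ibot, where k is an illustrative parameter introduced here and is not taken from the paper. Setting the derivative of k/(1 + α) + 1/(1 − α) to zero gives α = (√k − 1)/(√k + 1). The function names are hypothetical.

```python
import math


def normalized_drop(alpha: float, k: float = 2.0) -> float:
    """Voltage drop in units of I_bot * rho * l / (t * w).

    k is the assumed ratio I_top / I_bot (k = 2 corresponds to (3)).
    """
    return k / (1.0 + alpha) + 1.0 / (1.0 - alpha)


def optimal_alpha(k: float = 2.0) -> float:
    """Closed-form minimizer of normalized_drop for 0 <= alpha < 1, k >= 1."""
    root_k = math.sqrt(k)
    return (root_k - 1.0) / (root_k + 1.0)


for k in (1.5, 2.0, 3.0):
    a = optimal_alpha(k)
    print(f"k={k}: alpha*={a:.3f}, min drop={normalized_drop(a, k):.3f}")
```

For k = 2 this yields α ≈ 0.17 and a normalized drop of about 2.91, in agreement with (3). In this simplified model, a larger current ratio moves the optimum toward a larger unbalancing factor.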

In addition and most importantly, the current sources of the top and bottom tier affect the difference. Current is delivered to bottom-tier power rails through power MIVs whose pitch is in the nanometer scale, which establishes a large number of current sources for the bottom tier. This significantly reduces the current flowing through the power rails near power MIVs, hence reducing the IR-drop of the top metal layers of the bottom tier. On the other hand, the current sources of the top tier are C4 bumps whose pitches are in the micrometer scale. Therefore, there are only a few C4 bumps in a design, as shown in Table V. For that reason, a significantly large current flows near C4 bumps, which worsens the IR-drop of the design, as shown by the large IR-drop on the top metal layers of the top tier (i.e., M7T and M6T in Fig. 4). To compensate for the large current near C4 bumps on the top tier, top-tier power rails need to be wider than the ratio derived from (3) suggests.

Another interesting point in Fig. 14 is that in M3D-base, the decrease in static voltage drop with increasing unbalancing factor is rather smaller and not monotonic. Fig. 15 shows the impact of irregular power MIV placement on the asymmetric top- and bottom-tier M3D PDN in AES-128 designs. While the IR-drop of the top-tier PDN decreases monotonically in both M3D-base and M3D w/ RP designs as the unbalancing factor increases, the IR-drop of the bottom-tier PDN tends to increase since the resistance of its power rails increases as their width is scaled down. However, in M3D-base, the irregularly placed power MIVs increase the current flowing through power MIVs, as the number of power MIVs is smaller than in M3D w/ RP, significantly increasing the IR-drop of the top metal layers of bottom-tier power rails. Furthermore, as current flows through a longer alternative path that has higher resistance rather than the shorter optimal path, the bottom-tier IR-drop increases further.

Fig. 15. Breakdown of the worst instance IR-drop across metal layers with 0%, 30%, and 50% unbalancing factors in AES-128 M3D-base and M3D w/ RP.

These trends also tend to apply to dynamic voltage drop. Fig. 16 presents the normalized dynamic voltage drop of M3D designs with different unbalancing factors, showing even lower dynamic voltage drop in AES-128 M3D designs than in their 2-D counterparts. The static voltage drop reduction due to the asymmetric top- and bottom-tier M3D PDN, in conjunction with the higher resiliency against ac current noise due to the higher M3D PDN resistance, helps to reduce the dynamic voltage drop of M3D designs.

The widened top-tier power rails negatively affect the timing of M3D designs, but the impact can be minimized by utilizing more bottom-tier routing resources for signal routing. For example, consider that two top-tier cells need to be connected, and the top-tier routing resource is restricted due to the increased width of top-tier power rails. Instead of routing them using only top-tier metal layers, the design can utilize signal MIVs to connect the two top-tier cells through bottom-tier metal layers.

Table IX gives the comparison of the design and timing metrics of the AES-128 M3D designs with unbalancing factors of 0% and 50%. As shown in the table, the M3D design with an unbalancing factor of 50% utilizes less wire on the top tier because of the reduced available resources caused by the widened top-tier power rails. Instead, it uses more routing resources of the bottom tier and more signal MIVs because of the increased available routing resources on the bottom tier.

Fig. 16. Dynamic voltage drop normalized to their 2-D counterparts with different unbalancing factors in M3D-base and M3D w/ RP.

TABLE IX

KEY METRIC COMPARISON OF THE M3D DESIGN WITH UNBALANCING FACTOR (α) 0% AND 50%. Δ% FOR α = 50% DESIGN IS CALCULATED WITH RESPECT TO α = 0% DESIGN

As the M3D design with an unbalancing factor of 50% utilizes more bottom-tier wires, the wire ground and coupling capacitance of the bottom tier are increased, while those of the top tier are decreased, keeping the total wire capacitance almost the same (only a 0.4-pF increase), and hence, the impact on the critical path is quite low, considering the clock period of the design.

VIII. OBSERVATIONS

IR-drop in M3D is approximately 2× higher than in 2-D designs in our static rail analysis based on the statistical power simulation. This is attributed to: 1) an increased resistive path due to the additional metal layers that current passes through to supply power to bottom-tier cells; 2) irregular placement of power MIVs due to top-tier cells, which prevents current from flowing along the optimal resistive path; and 3) high current flowing through C4 bumps due to the reduced number of C4 bumps placed in an M3D design.

Ldi/dt-drop in M3D is less than in 2-D designs based on our dynamic rail analysis with frequency- and time-domain studies, showing only 11.3% higher dynamic voltage drop, which includes both IR-drop and Ldi/dt-drop, than the 2-D counterparts. This is mainly due to: 1) 3-D placement of decap cells, in which cells in an M3D design utilize decaps placed on the same tier (xy plane) as well as on the other tier (z-direction); and 2) improved resiliency against ac current noise, which results from the higher resistance of the M3D PDN due to a series connection across tiers, with the M3D PDN showing 35.9% peak impedance reduction at the worst case resonant oscillations.
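The damping effect of series resistance can be illustrated with a first-order lumped model. The sketch below is not the system-level PDN model of this paper: it assumes a single series resistance and inductance in parallel with an ideal on-die decoupling capacitance, and the element values are hypothetical, chosen only to place the resonance near 100 MHz. The function names are likewise illustrative.

```python
import math


def pdn_impedance(freq_hz, r_ohm, l_henry, c_farad):
    """|Z| seen by the die for (R + jwL) in parallel with 1/(jwC)."""
    w = 2.0 * math.pi * freq_hz
    z_supply = complex(r_ohm, w * l_henry)     # series resistance + inductance
    z_decap = 1.0 / complex(0.0, w * c_farad)  # ideal on-die decap
    return abs(z_supply * z_decap / (z_supply + z_decap))


def peak_impedance(r_ohm, l_henry=50e-12, c_farad=50e-9, points=2000):
    """Peak |Z| over a log-spaced sweep from 1 MHz to 10 GHz.

    The default L and C are hypothetical values, not taken from the paper.
    """
    freqs = [10 ** (6 + 4 * i / (points - 1)) for i in range(points)]
    return max(pdn_impedance(f, r_ohm, l_henry, c_farad) for f in freqs)


for r in (5e-3, 10e-3, 20e-3):  # hypothetical series resistances in ohms
    print(f"R={r * 1e3:.0f} mOhm: peak |Z| = {peak_impedance(r) * 1e3:.1f} mOhm")
```

In this simplified model, the resonant peak falls as the series resistance grows, while the low-frequency impedance, and therefore the static IR-drop, rises with it. This is the same trade-off observed between the IR-drop and the Ldi/dt-drop of M3D PDNs.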

M3D PDN optimization should focus on reducing the IR-drop of M3D designs since the M3D PDN shows its strength in Ldi/dt-drop. We propose two methodologies: 1) top-tier cell RP, which helps current flow along the minimum resistive path without being blocked by top-tier cells by avoiding top-tier cell placement at the locations where power MIVs are to be placed, showing up to 14.8% static voltage drop reduction; and 2) asymmetric top- and bottom-tier M3D PDNs, which in conjunction with the top-tier cell RP technique reduce static voltage drop further, showing 25.7% higher static voltage drop and 9.0% less dynamic voltage drop compared to the 2-D counterpart. The latter technique uses wider top-tier power rails in the M3D PDN, since higher current flows in them than in those of the bottom tier, and narrower bottom-tier power rails to recover the signal routing resources taken by the wider top-tier power rails.

IX. CONCLUSION

In this paper, we, for the first time, presented an in-depth study and optimization methodologies for PDNs in M3D designs. We built a system-level PDN of M3D designs and performed comprehensive studies including static and dynamic rail analyses as well as frequency- and time-domain analyses. Although M3D PDNs suffer from high IR-drop due to additional metal layers, irregular placement of power MIVs, and fewer C4 bumps, they reduce Ldi/dt-drop through 3-D placement of decap cells. In addition, the higher resistance of the M3D PDN due to its series resistive path across tiers improves the resiliency against ac noise, showing up to 35.9% peak impedance reduction at the first-order resonance frequency. From the observations, we present two optimization methodologies, top-tier cell RP and asymmetric top- and bottom-tier PDN, which efficiently improve the integrity of M3D PDNs. The proposed methodologies reduce static and dynamic voltage drop by up to 32.6% and 17.0% compared to the baseline M3D design, respectively, showing even less dynamic voltage drop (up to 9.0%) than the 2-D counterpart.

REFERENCES

[1] P. Larsson, “Resonance and damping in CMOS circuits with on-chipdecoupling capacitance,” IEEE Trans. Circuits Syst., vol. 45, no. 8,pp. 849–858, Aug. 1998.

[2] S. Das, P. Whatmough, and D. Bull, “Modeling and characterizationof the system-level power delivery network for a dual-core arm cortex-a57 cluster in 28 nm CMOS,” in Proc. Int. Symp. Low Power Electron.Design, Jul. 2015, pp. 146–151.

[3] S. Pant and E. Chiprout, “Power Grid Physics and Implications forCAD,” in Proc. ACM Design Autom. Conf., 2006, pp. 199–204.

[4] N. H. Khan, S. M. Alam, and S. Hassoun, “Power delivery design for3-D ICs using different through-silicon via (TSV) technologies,” IEEETrans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 4, pp. 647–658,Apr. 2011.



[5] S. K. Samal, K. Samadi, P. Kamal, Y. Du, and S. K. Lim, “Fullchip impact study of power delivery network designs in monolithic3D ICs,” in Proc. IEEE Int. Conf. Comput.-Aided Design, Nov. 2014,pp. 565–572.

[6] S. A. Panth, K. Samadi, Y. Du, and S. K. Lim, “Design and CADmethodologies for low power gate-level monolithic 3D ICs,” in Proc.Int. Symp. Low Power Electron. Design, 2014, pp. 171–176.

[7] S. Pant, “Design and analysis of power distribution networks in VLSIcircuits,” Ph.D. dissertation, Elect. Eng., Univ. Michigan, Ann Arbor,MI, USA, 2007.

[8] B. Bozorgzadeh and A. Afzali-Kusha, “Decoupling capacitor optimiza-tion for nanotechnology designs,” in Proc. Int. Conf. Microelectron.,Dec. 2008, pp. 19–22.

Kyungwook Chang received the B.S. degree in electrical and computer engineering and the M.S. degree in electrical engineering and computer science from Seoul National University, Seoul, South Korea, in 2007 and 2010, respectively. He is currently working toward the Ph.D. degree at the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA.

His current research interests include monolithic 3-D IC technology, advanced technology nodes, low-power designs, and 3-D IC design methodology.

Shidhartha Das received the M.Sc. and Ph.D. degrees from the University of Michigan, Ann Arbor, MI, USA, in 2003 and 2009, respectively.

He is currently a Senior Principal Research Engineer at Arm Research, Cambridge, U.K. His current research interests include emerging nonvolatile memory technologies, microarchitectural and circuit design for variation measurement and mitigation, on-chip power delivery, and VLSI architectures for digital signal processing accelerators.

Dr. Das was a recipient of the Arm Patent Cube in 2017 and the Arm Inventor of the Year Award in 2016 for his contributions to emerging nonvolatile memory technologies, multiple best paper awards (CAL 2017, ISLPED 2015, SAME 2010, and MICRO 2003), and the Microprocessor Review Analysts Choice Award in Innovation in 2007. He served as a Guest Editor for the Journal of Solid-State Circuits and an Associate Editor for IEEE SOLID-STATE CIRCUITS LETTERS. He serves on the Technical Program Committee of ESSCIRC and the ISSCC Students Research Preview Committee.

Saurabh Sinha received the B.Tech. degree in electronics and instrumentation engineering from the National Institute of Technology Rourkela, Rourkela, India, in 2006, and the M.S. and Ph.D. degrees in electrical engineering from Arizona State University, Tempe, AZ, USA, in 2008 and 2011, respectively.

He is currently a Staff Research Engineer at ARM Research, Austin, TX, USA. His current research interests include predictive technology, design–technology cooptimization at advanced technology nodes, and disruptive technologies such as 3-D ICs and their impact on design.

Brian Cline received the B.S. degree in electrical engineering from The University of Texas at Austin, Austin, TX, USA, in 2004, and the M.S. and Ph.D. degrees in electrical engineering from the University of Michigan, Ann Arbor, MI, USA, in 2006 and 2010, respectively.

From 2006 to 2010, he was a Graduate Fellow at the Semiconductor Research Corporation, Durham, NC, USA. He is currently a Principal Research Engineer in the ARM Research Group, Austin, TX, USA. His current research interests include design technology cooptimization, low-power circuit design, variation-aware computer-aided design tool development, and very large scale integration design optimization for high-performance and low-power designs.

Greg Yeric received the Ph.D. degree in microelectronics from The University of Texas at Austin, Austin, TX, USA, in 1993.

He joined the Motorola’s Advanced Products Research and Development Laboratories, Austin, TX, USA, in embedded nonvolatile memory process integration. He was involved in the research of technology development roles at TestChip Technologies, Plano, TX, USA; HPL Technologies, San Jose, CA, USA; and Synopsys, Mountain View, CA, USA. He is currently an ARM Fellow and the Director of Future Silicon Research at ARM Research, Austin, TX, USA, where he is involved in design–technology cooptimization and predictive technology.

Sung Kyu Lim (S’94–M’00–SM’05) received the B.S., M.S., and Ph.D. degrees from the University of California at Los Angeles, Los Angeles, CA, USA, in 1994, 1997, and 2000, respectively.

In 2001, he joined the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA, where he is currently the Dan Fielder Endowed Chair Professor. He has authored Practical Problems in VLSI Physical Design Automation (Springer, 2008). His current research interests include modeling, architecture, and electronic design automation (EDA) for 3-D ICs. His research on 3-D IC reliability is featured as Research Highlight in the Communication of the ACM in 2014. His 3-D IC test chip published in the IEEE International Solid-State Circuits Conference, in 2012, is generally considered the first multicore 3-D processor ever developed in academia.

Dr. Lim was a recipient of the National Science Foundation Faculty Early Career Development (CAREER) Award in 2006, the Distinguished Service Award in 2008, the Best Paper Awards from the IEEE Asian Test Symposium in 2012, the IEEE International Interconnect Technology Conference in 2014, and the Class of 1940 Course Survey Teaching Effectiveness Award from the Georgia Institute of Technology in 2016. He was on the Advisory Board of the ACM Special Interest Group on Design Automation from 2003 to 2008. He was an Associate Editor of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS from 2007 to 2009. He has been an Associate Editor of the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS since 2013. He has served on the Technical Program Committee of several premier conferences in EDA.

