
  • IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 24, NO. 12, DECEMBER 2005 1859

    Modeling of Failure Probability and Statistical Design of SRAM Array for Yield Enhancement in Nanoscaled CMOS

    Saibal Mukhopadhyay, Student Member, IEEE, Hamid Mahmoodi, Student Member, IEEE, and Kaushik Roy, Fellow, IEEE

    Abstract: In this paper, we have analyzed and modeled the failure probabilities (access-time failure, read/write failure, and hold failure) of static random-access memory (SRAM) cells due to process-parameter variations. A method to predict the yield of a memory chip based on the cell-failure probability is proposed. Using the failure-probability and yield-prediction models, we propose a methodology to statistically design the SRAM cell and the memory organization. The developed design strategy statistically sizes the different transistors of the SRAM cell and optimizes the number of redundant columns in the SRAM array to minimize the failure probability of a memory chip under area and leakage constraints. The method can be used early in the design cycle to enhance memory yield in the nanometer regime.

    Index Terms: Leakage, performance, random dopant fluctuation (RDF), robustness, static random-access memory (SRAM), yield.

    I. INTRODUCTION

    THE random variations in process parameters have emerged as a major challenge in circuit design in the nanometer regime [1]–[3]. The sources of inter-die and intra-die variations in process parameters include variations in channel length, channel width, oxide thickness, threshold voltage, line-edge roughness, and random dopant fluctuations (RDF), i.e., random variations in the number and location of dopant atoms in the channel region of the device, which cause random variations in the transistor threshold voltage [1]–[5]. These different sources of variation result in significant variations in the delay and the leakage of digital circuits [1]–[5]. An inter-die variation in a parameter [say, the threshold voltage (Vt)] shifts the value of that parameter for all transistors in a die in the same direction (i.e., the threshold voltages of all the transistors either increase or decrease). This principally results in a spread in the delay and the leakage, but does not cause a mismatch between different transistors in a die. On the other hand, intra-die variations shift the process parameters of different

    Manuscript received September 14, 2003; revised December 2, 2004. This work was supported in part by the Semiconductor Research Corporation, the Defense Advanced Research Projects Agency Power Aware Computing and Communication (DARPA PACC) Program, Intel, and IBM Corporation. This paper was recommended by Associate Editor S. Sapatnekar.

    The authors are with the Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907 USA.

    Digital Object Identifier 10.1109/TCAD.2005.852295

    transistors in a die in different directions (e.g., the Vt of some transistors increases whereas that of others decreases). Intra-die (or on-die) variations can be systematic (i.e., the shift in a parameter of one transistor depends on the shift of that parameter in a neighboring transistor) or random (i.e., the shifts in a parameter of two neighboring transistors are completely independent). An example of systematic intra-die variation is a spatially correlated change in the channel length of different transistors in a die. RDF-induced Vt variation is a classic example of random intra-die variation. Systematic variation does not result in large differences between two transistors in close spatial proximity, whereas the random component of intra-die variation can cause a significant mismatch between neighboring transistors in a die [1]–[5].

    In a static random-access memory (SRAM) cell, a mismatch in strength between neighboring transistors, caused by intra-die variations, can result in the failure of the cell [7]–[9]. For example, a cell failure can occur due to: 1) an increase in the cell access time (access-time failure); 2) unstable read (flipping of the cell data while reading) and/or write (inability to successfully write to a cell) operations (read/write failure); or 3) failure of the data-holding capability of the cell (flipping of the cell data when a supply voltage lower than the nominal one is applied) in the standby mode (hold failure). Since these failures are caused by variations in the device parameters, they are known as parametric failures [8], [9]. There can also be hard failures (caused by opens or shorts) or soft failures (caused by soft errors). In this paper, we concentrate only on parametric failures; hereafter, the word failure refers to a parametric failure. A failure in any of the cells in a column of the memory makes that column faulty. Redundant columns are used to improve the fault tolerance of a memory: when a column is detected as faulty, it is replaced by an available redundant column. Thus, if the number of faulty columns in a memory chip is larger than the number of available redundant columns, the chip is considered faulty (a similar argument holds for memories designed with row redundancy). Hence, the failure probability of a cell is directly related to the yield of a memory chip, and intra-die-variation-induced device mismatch can significantly reduce the yield of a memory. As the effect of intra-die variations increases with technology

    0278-0070/$20.00 © 2005 IEEE


    Fig. 1. SRAM cell.

    scaling [1]–[5], analyzing and reducing mismatch-induced parametric failures in SRAM cells is essential to enhance the yield of memories designed in nanoscaled complementary metal-oxide-semiconductor (CMOS) technology [8], [9]. In this paper, we have analyzed and analytically modeled the different types of parametric failures (mentioned above) that can occur in an SRAM cell.
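    The column-redundancy argument in the introduction (a column fails if any of its cells fails, and the chip fails if faulty columns outnumber the redundant ones) can be turned into a yield estimate. The Python sketch below is illustrative only: it assumes that cell failures are independent, that every column (including the NRC spares) holds NROW cells, and that a chip is repairable as long as at most NRC of its NCOL + NRC columns are faulty. The function names and example numbers are hypothetical.

```python
from math import comb


def column_failure_prob(p_cell: float, n_row: int) -> float:
    """P[column faulty]: at least one of its n_row (independent) cells fails."""
    return 1.0 - (1.0 - p_cell) ** n_row


def memory_failure_prob(p_cell: float, n_row: int, n_col: int, n_rc: int) -> float:
    """P[chip faulty]: more than n_rc of the n_col + n_rc columns are faulty.

    Assumes independent cell failures and that spare columns can fail too.
    """
    p_col = column_failure_prob(p_cell, n_row)
    n_total = n_col + n_rc
    # Binomial probability that at most n_rc columns are faulty (repairable chip).
    p_repairable = sum(
        comb(n_total, k) * p_col**k * (1.0 - p_col) ** (n_total - k)
        for k in range(n_rc + 1)
    )
    return 1.0 - p_repairable


if __name__ == "__main__":
    # Hypothetical array: 256 rows x 256 columns, cell failure probability 1e-5.
    for n_rc in (0, 4, 8):
        p_mem = memory_failure_prob(1e-5, n_row=256, n_col=256, n_rc=n_rc)
        print(f"NRC={n_rc}: P_MEM={p_mem:.3e}, yield={1.0 - p_mem:.4f}")
```

    With NRC = 0 the expression collapses to 1 − (1 − Pcell)^(NROW·NCOL), so the redundant columns are what decouple chip yield from the sheer number of cells.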

    Among the different sources of random intra-die variation, the most significant one is the threshold-voltage (Vt) variation due to RDF. The impact of random dopants is most pronounced in minimum-geometry transistors, which are commonly used in area-constrained circuits such as SRAM cells [7]. Hence, in this work we have principally concentrated on the on-die random variation in the threshold voltages of the different transistors in a cell due to RDF. However, the analysis is equally applicable to other sources of intra-die variation such as channel length, channel width, etc. Our analysis is primarily focused on the random component of intra-die variation; however, we have also considered the effect of correlation among the threshold voltages of the different transistors in a cell to understand the impact of systematic variations.

    Parametric variations, and in particular the Vt fluctuation due to RDF, are a strong function of the sizes of the different transistors in the cell [channel length (L) and width (W)]. Hence, the failure probability of a memory can be reduced by optimally sizing the different transistors. However, any such optimization has to consider its impact on the overall area and leakage of the SRAM array. Moreover, the memory organization [i.e., the number of rows (NROW), the number of columns (NCOL), and the number of redundant columns (NRC)] also has a strong impact on the memory-failure probability. Hence, a statistical design of the SRAM cell and the memory organization is very important to reduce the memory-failure probability and to improve the yield of nanoscaled SRAM.

    In this paper, we have developed a statistical methodology to size the different transistors of an SRAM cell and to design the memory organization so as to reduce the memory-failure probability under on-die Vt variations, subject to constraints on the overall memory area and leakage power. In particular, we have:

    1) presented semianalytical models for the access-time, read, write, and hold failures of a cell due to the on-die random variation of transistor threshold voltages;

    2) analyzed the effect of correlation among the threshold voltages of different transistors on the failure probabilities;

    3) developed a method to estimate the failure probability of a memory (considering the memory architecture and the number of redundant columns) and to predict the yield of a memory chip;

    4) presented a statistical analysis of the impact of circuit (transistor sizing) and architecture (NROW, NCOL, and NRC) parameters on the cell- and memory-failure probabilities;

    5) proposed a statistical-design strategy to reduce the memory-failure probability and improve the yield of nanoscaled SRAM.

    The statistical-design strategy is developed considering the on-die threshold voltage (Vt) variation, but can be extended to consider on-die L and W variations.

    The remainder of this paper is organized as follows. Section II briefly describes the different failure mechanisms in an SRAM cell. The modeling of the different failure probabilities in an SRAM cell is explained in Section III. The models developed in Section III are based on complex short-channel transistor models that require numerical solutions to estimate the probability values. However, such a numerical solution, although accurate, is computationally very expensive. Hence, in Section IV, we present an analytical estimation of the failure probabilities using simple long-channel transistor models. The models based on long-channel current equations predict the probabilities quickly, but they are less accurate than the complete numerical solutions. In Section V, we analyze the sensitivity of the failure probabilities to environmental conditions and transistor sizes. Section VI illustrates the statistical-design approach for SRAM. Finally, Section VII concludes the paper.

    II. MECHANISMS OF FAILURE IN AN SRAM CELL

    On-die variations in process parameters (e.g., the threshold voltage, channel length, channel width, etc., of transistors) result in a mismatch in strength among the different transistors in an SRAM cell. This device mismatch can result in the failure of the SRAM cell. The parametric failures in an SRAM cell (Fig. 1) are principally due to:

    1) destructive read (i.e., flipping of the stored data in a cell while reading), known as read failure;

    2) unsuccessful write (inability to write to a cell), defined as write failure;

    3) an increase in the access time of the cell resulting in a violation of the delay requirement, defined as access-time failure;

    4) the destruction of the cell content in standby mode when a lower supply voltage is applied (primarily to reduce leakage in standby mode), known as hold failure.


    Fig. 2. Unstable read, write, and hold operations: (a) read, (b) write, and (c) hold failure.

    In this section, we briefly discuss the mechanism of each of these failures caused by the mismatch in threshold voltage (due to random intra-die variations) among the different transistors in an SRAM cell.

    A. Read Failure

    While reading the cell shown in Fig. 1 (VL = 1 and VR = 0), the voltage divider action between AXR and NR raises the voltage at node R (VR) to a positive value VREAD. If VREAD is higher than the trip point of the inverter PL-NL (VTRIPRD), the cell flips while being read [Fig. 2(a)] [11]. This represents a read-failure event. The stronger the access transistor (AXR) is relative to the pull-down N-type metal-oxide-semiconductor (NMOS) transistor (NR), the higher the voltage VREAD produced by the voltage division between the two transistors. A measure of the relative strength of AXR and NR is the ratio of their current-drive factors β = μeff·Cox·W/L, known as the beta ratio (BRnpd/nax), given by

$$BR_{npd/nax} = \frac{\beta_{npd}}{\beta_{nax}} = \frac{\mu_{eff}\, C_{ox}\, W_{npd} / L_{npd}}{\mu_{eff}\, C_{ox}\, W_{nax} / L_{nax}} \qquad (1)$$

    where μeff is the effective mobility, Cox is the oxide capacitance per unit area (assumed to be the same for both transistors, since their oxide thicknesses are the same), Wnax and Wnpd are the widths of the access transistor and the pull-down NMOS, respectively, and Lnax and Lnpd are the corresponding channel lengths. A decrease in BRnpd/nax increases VREAD, thereby facilitating read failure. Hence, when designing an SRAM cell, the access transistor is usually made smaller than the pull-down NMOS to increase BRnpd/nax. However, such a design strategy does not consider the effect of parameter variations, which randomly vary the strengths of the different transistors. For example, due to random variations in threshold voltage, a reduction in the Vt of the access transistor (increase in strength) combined with an increase in the Vt (reduction in strength) of the pull-down NMOS raises VREAD above its nominal value (i.e., the value designed by optimizing the beta ratio), thereby resulting in a read failure. Similarly, the trip point of the inverter PL-NL depends on the strengths of the pull-up P-type metal-oxide-semiconductor (PMOS) and the pull-down NMOS. Under nominal conditions, the cell is designed with a weaker PMOS (to facilitate writing, as explained in the next section), which results in a lower VTRIP. Although the nominal VTRIP is higher than the nominal VREAD, parameter variations can increase the Vt of PL and/or reduce the Vt of NL. This can lower VTRIP below VREAD, thereby resulting in read failure. Note that read failure is caused by a mismatch in the strengths of the different transistors (e.g., the strength of AXR increases while that of NR decreases). Such a mismatch can only be caused by random intra-die variation and not by inter-die variation, which shifts the threshold voltages of all the transistors in the same direction. Hence, an increase in random intra-die variation can significantly increase read failures.
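    The beta ratio in (1) can be evaluated directly from device dimensions. The Python sketch below is illustrative: the helper names and example widths and lengths are hypothetical, and it relies on the observation in the text that μeff and Cox are common to the two NMOS devices and therefore cancel.

```python
def drive_factor(width: float, length: float, mu_eff: float = 1.0, c_ox: float = 1.0) -> float:
    """Current-drive factor beta = mu_eff * Cox * W / L."""
    return mu_eff * c_ox * width / length


def beta_ratio_npd_nax(w_npd: float, l_npd: float, w_nax: float, l_nax: float) -> float:
    """Eq. (1): BR_npd/nax = beta_npd / beta_nax (mu_eff and Cox cancel for two NMOS)."""
    return drive_factor(w_npd, l_npd) / drive_factor(w_nax, l_nax)


# Hypothetical sizing: pull-down twice as wide as the access transistor, equal L.
br = beta_ratio_npd_nax(w_npd=200e-9, l_npd=50e-9, w_nax=100e-9, l_nax=50e-9)
print(f"BR_npd/nax = {br:.2f}")  # a larger ratio lowers V_READ under nominal conditions
```

    Because μeff and Cox cancel, only the W/L ratios matter here; this no longer holds for the NMOS/PMOS ratio that governs writability, where electron and hole mobilities differ.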

    B. Write Failure

    While writing a 0 to a cell storing 1, node L is discharged through BL to a low value (VWR) determined by the voltage division between the PMOS PL and the access transistor AXL [11]. If VL cannot be reduced below the trip point of the inverter PR-NR (VTRIPWR) within the time for which the word-line is high (TWL), a write failure occurs [Fig. 2(b)]. The discharging current (IL) at node L is the difference between the ON currents of the access transistor AXL (IAXL) and the PMOS PL (IPL), i.e., IL = IAXL − IPL. Hence, a stronger PMOS and a weaker access transistor can significantly slow down the discharge, thereby causing a write failure. Thus, when designing the cell, the beta ratio between the access transistor and the PMOS (BRnax/pup = βnax/βpup) must be made greater than 1 (by upsizing the access transistor and downsizing the pull-up transistor) so that, under nominal conditions, the write time is less than the word-line turn-on time. However, variations in device strength due to random variations in process parameters can increase the write time. For example, a reduction in the Vt of PL together with an increase in the Vt of AXL can increase the write time, thereby causing a write failure. Hence, a proper static


    beta ratio is not sufficient to prevent write failure. Moreover, upsizing the access transistor and/or downsizing the PMOS transistors increases read failures. Thus, the sizes of the different transistors must be optimized with parameter variations taken into account to reduce both read and write failures. Note that write failure is also primarily caused by the mismatch in strength among the transistors in a cell.
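    The read/write sizing tension described above shows up when both beta ratios are evaluated while sweeping the access-transistor width. The sketch below assumes equal channel lengths for all devices and a hypothetical electron-to-hole mobility ratio of 2 (needed because BRnax/pup compares an NMOS with a PMOS); the widths are also hypothetical, and variation is ignored entirely.

```python
MU_N_OVER_MU_P = 2.0  # hypothetical electron/hole mobility ratio


def cell_beta_ratios(w_npd: float, w_nax: float, w_pup: float) -> tuple:
    """Return (BR_npd/nax, BR_nax/pup) for devices with equal channel lengths."""
    br_read = w_npd / w_nax  # NMOS vs NMOS: mobility and Cox cancel
    br_write = MU_N_OVER_MU_P * w_nax / w_pup  # NMOS vs PMOS: mobility ratio remains
    return br_read, br_write


# Upsizing the access transistor helps writability but hurts read stability.
for w_nax in (80e-9, 100e-9, 120e-9):
    br_read, br_write = cell_beta_ratios(w_npd=200e-9, w_nax=w_nax, w_pup=80e-9)
    print(f"W_nax={w_nax * 1e9:.0f} nm: BR_npd/nax={br_read:.2f}, BR_nax/pup={br_write:.2f}")
```

    The nominal ratios only set the starting point; as the text notes, it is the random Vt shifts around them that decide whether a given cell fails.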

    C. Access Time Failure

    The cell access time (TACCESS) is defined as the time required to produce a prespecified voltage difference (ΔMIN ≈ 0.1VDD) between the two bit-lines (the bit-differential). If, due to Vt variation, the access time of the cell is longer than the maximum tolerable limit (TMAX), an access-time failure is said to have occurred. Access failure is caused by a reduction in the strength of the access and pull-down transistors. Thus, access failure limits how far the access transistor can be downsized (which is done to increase BRnpd/nax and reduce VREAD). An increase in the Vt of the access transistor and/or the pull-down NMOS (caused by process variation) can significantly increase the access time from its nominal value, resulting in an access failure. Because access failure is caused by an increase in the Vt of AXR and/or NR, a shift of all threshold voltages in the same direction is also harmful; thus, both intra-die and inter-die variations in process parameters increase access failures.

    D. Hold Failure

    In the standby mode, the VDD of the cell is reduced to lower the leakage power consumption. However, if lowering VDD destroys the data stored in the cell, the cell is said to have failed in the hold mode [15] [Fig. 2(c)]. As the supply voltage of the cell is lowered, the voltage at the node storing 1 (node L in Fig. 1) is also reduced. Moreover, at a low supply voltage (when PL is not strongly ON), the leakage of the pull-down NMOS NL pulls the voltage at node L even below the supply voltage applied to the cell. If the voltage at node L drops below the trip point of the inverter PR-NR, the cell flips and the data are lost in the hold mode. The supply voltage applied in the hold mode is chosen to ensure that the data are held under nominal conditions. However, variations in process parameters can cause device mismatch that leads to hold failures. For example, if the Vt of NL decreases while that of PL increases (which pulls the voltage at node L further below the applied supply voltage) and/or the Vt of NR increases while that of PR decreases (which raises the trip point of PR-NR), the likelihood of data flipping in the hold mode increases. Consequently, an increase in random intra-die variation can significantly increase hold failures.

    III. MODELING OF FAILURE PROBABILITIES

    In an SRAM cell, mismatches in the device parameters (L, W, Vt) of different transistors (caused by intra-die variations) result in the different types of failures explained in Section II. Because of the small geometry of the SRAM cell, the principal

    Fig. 3. Device characteristics. Discrete points represent the results obtained in MEDICI simulation, and the lines represent the results from the models.

    source of device mismatch is the intrinsic fluctuation of the Vt of the different transistors due to RDF [4]. As the transistors in a cell are in very close spatial proximity, the effect of mismatch in channel length or width is small. Hence, in this work, we consider Vt variation due to RDF as the major source of intra-die variation when estimating the probabilities of the different failure events. However, the proposed method can be extended to include other sources of variation (such as L and W variations); alternatively, the impact of L and W variations can be represented as an additional contribution to the Vt variation. In this section, we explain the basic methodology used to model the different failure probabilities.

    The failure probabilities are estimated using an SRAM cell designed with bulk CMOS transistors of 50-nm gate length (Leff = 25 nm). The transistors are designed using two-dimensional Gaussian doping profiles [11] and simulated using the device simulator MEDICI [12]. In our analysis, we have used short-channel metal-oxide-semiconductor field-effect transistor (MOSFET) theory to model the current and the threshold voltage considering the device geometry and doping profile [3], [5]. Fig. 3 shows the Id-Vg characteristics of the designed transistors. The leakage current models presented in [10] are used to represent the different leakage components.

    When modeling the failure probabilities considering threshold-voltage variations due to RDF, the Vt fluctuations (ΔVt) of the six transistors in an SRAM cell are treated as six independent zero-mean Gaussian random variables [4]. The assumption of independence is justified because we have primarily considered the effect of RDF. The placement and the number of dopants in the channel of one transistor depend only on the geometry of that transistor and are independent of the placement and the number of dopants in the channel of a neighboring transistor [4]. Thus, the RDF-induced Vt fluctuation of one transistor does not depend on the Vt fluctuation of any neighboring transistor, and the Vts of the cell transistors can be treated as independent random variables [4]. However, we have also investigated the effect of correlation among the Vts of the cell transistors on the failure probabilities. The standard deviation of the RDF-induced Vt fluctuation (σVt) depends on the manufacturing process, the doping profile, and the transistor sizing [6]. In the proposed method, σVt for a


    minimum-sized transistor (σVt0) is an input parameter, and the dependence of σVt on the transistor size is given by [6]

$$\sigma_{V_t} = \sigma_{V_{t0}} \sqrt{\left(\frac{L_{min}}{L}\right)\left(\frac{W_{min}}{W}\right)} \qquad (2)$$
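    A small helper for (2) is convenient when exploring transistor sizing. In this Python sketch, σVt0 and the minimum dimensions are inputs, as in the text; the 30-mV value and the device dimensions in the usage lines are hypothetical.

```python
import math


def sigma_vt(sigma_vt0: float, w: float, l: float, w_min: float, l_min: float) -> float:
    """Eq. (2): RDF-induced sigma_Vt scales as 1/sqrt(W*L) relative to a minimum device."""
    if w <= 0 or l <= 0:
        raise ValueError("W and L must be positive")
    return sigma_vt0 * math.sqrt((l_min / l) * (w_min / w))


# Hypothetical: sigma_Vt0 = 30 mV for a minimum device (W_min = 100 nm, L_min = 50 nm).
for w in (100e-9, 200e-9, 400e-9):
    s = sigma_vt(0.030, w, 50e-9, 100e-9, 50e-9)
    print(f"W={w * 1e9:.0f} nm: sigma_Vt = {1e3 * s:.1f} mV")
```

    Quadrupling the gate area halves σVt, which is the basic area-versus-robustness trade-off that the statistical sizing exploits.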

    A. Modeling Methodology

    In this section, we summarize the key mathematical results used to estimate the failure probabilities. Consider a function y = f(x1, ..., xn), where x1, ..., xn are independent Gaussian random variables with means μ1, ..., μn and standard deviations (STDs) σ1, ..., σn. The mean (μy) and the STD (σy) of the random variable y can be estimated using a multivariable Taylor-series expansion as [13]

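$$\mu_y = f(\mu_1, \ldots, \mu_n) + \frac{1}{2} \sum_{i=1}^{n} \left. \frac{\partial^2 f(x_1, \ldots, x_n)}{\partial x_i^2} \right|_{\mu_i} \sigma_i^2$$

$$\sigma_y^2 = \sum_{i=1}^{n} \left( \left. \frac{\partial f(x_1, \ldots, x_n)}{\partial x_i} \right|_{\mu_i} \right)^2 \sigma_i^2 \qquad (3)$$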
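    The estimates in (3) need only the first and second partial derivatives of f at the mean point. When f is available only numerically (for example, a circuit solver returning VREAD for given ΔVts), those derivatives can be approximated by central finite differences, as in the Python sketch below. The finite-difference step h and the function name are assumptions of this sketch, not part of the original method.

```python
import math
from typing import Callable, Sequence, Tuple


def taylor_mean_std(f: Callable[[Sequence[float]], float],
                    mu: Sequence[float],
                    sigma: Sequence[float],
                    h: float = 1e-4) -> Tuple[float, float]:
    """Eq. (3) for independent Gaussian inputs.

    Mean: second-order Taylor term; STD: first-order (linear) propagation.
    Partial derivatives use central finite differences with step h.
    """
    x0 = list(mu)
    f0 = f(x0)
    mean, var = f0, 0.0
    for i, s in enumerate(sigma):
        xp, xm = x0.copy(), x0.copy()
        xp[i] += h
        xm[i] -= h
        fp, fm = f(xp), f(xm)
        d1 = (fp - fm) / (2.0 * h)           # df/dx_i at the mean
        d2 = (fp - 2.0 * f0 + fm) / (h * h)  # d2f/dx_i^2 at the mean
        mean += 0.5 * d2 * s * s
        var += (d1 * s) ** 2
    return mean, math.sqrt(var)


# Example: y = x0**2 with x0 ~ N(1, 0.1); the exact mean is 1.01.
m, s = taylor_mean_std(lambda x: x[0] ** 2, mu=[1.0], sigma=[0.1])
print(f"mu_y ~ {m:.4f}, sigma_y ~ {s:.4f}")
```

    For a linear f, the second-derivative term vanishes and (3) gives the exact mean and STD; nonlinearity shows up first as a shift in the mean.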

    Assuming the probability density function (PDF) of y to be Gaussian as well [Ny(y : μy, σy)], the probability of (y > Y0) is given by

$$P[y > Y_0] = \int_{y=Y_0}^{\infty} N_y(y : \mu_y, \sigma_y)\,dy = 1 - \int_{y=-\infty}^{Y_0} N_y(y)\,dy = 1 - \Phi_y(Y_0) \qquad (4)$$

    where Φy is the cumulative distribution function (CDF) of y.

    Let us assume that y = f(x1, ..., xn) and z = g(x1, ..., xn) are two Gaussian random variables Ny(y : μy, σy) and Nz(z : μz, σz), respectively. The probability of (y > Y0 and z > Z0) is given by

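$$\begin{aligned}
P[(y > Y_0) \cap (z > Z_0)] &= 1 - P[(y \le Y_0) \cup (z \le Z_0)] \\
&= 1 - \left\{ P[y \le Y_0] + P[z \le Z_0] - P[(y \le Y_0) \cap (z \le Z_0)] \right\} \\
&= \left\{ P[y > Y_0] + P[z > Z_0] - 1 \right\} + \Phi_{y,z}(Y_0, Z_0)
\end{aligned} \qquad (5)$$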

    where Φy,z(y, z) is the joint CDF of y and z, and Φy,z(Y0, Z0) is given by

$$\Phi_{y,z}(Y_0, Z_0) = \int_{y=-\infty}^{Y_0} \int_{z=-\infty}^{Z_0} N_{y,z}(y : \mu_y, \sigma_y;\, z : \mu_z, \sigma_z)\,dy\,dz \qquad (6)$$

    The joint PDF Ny,z(y : μy, σy; z : μz, σz) is given by

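$$N_{y,z}(y, z) = \frac{1}{2\pi \sigma_y \sigma_z \sqrt{1 - \rho^2}} \exp\left\{ -\frac{\left(\frac{y - \mu_y}{\sigma_y}\right)^2 - 2\rho \left(\frac{y - \mu_y}{\sigma_y}\right)\left(\frac{z - \mu_z}{\sigma_z}\right) + \left(\frac{z - \mu_z}{\sigma_z}\right)^2}{2(1 - \rho^2)} \right\} \qquad (7)$$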

    The correlation coefficient ρ can be computed as follows:

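$$\rho = \frac{E(yz) - E(y)E(z)}{\sigma_y \sigma_z} = \frac{E(yz) - \mu_y \mu_z}{\sigma_y \sigma_z}$$

$$\begin{aligned}
E(yz) &= f(\mu_1, \ldots, \mu_n)\, g(\mu_1, \ldots, \mu_n) + \frac{1}{2} \sum_{i=1}^{n} \frac{\partial^2 (fg)}{\partial x_i^2} \sigma_i^2 \\
&= f_0 g_0 + \frac{1}{2} \sum_{i=1}^{n} \left( g_0 \frac{\partial^2 f}{\partial x_i^2} + 2 \frac{\partial f}{\partial x_i} \frac{\partial g}{\partial x_i} + f_0 \frac{\partial^2 g}{\partial x_i^2} \right) \sigma_i^2
\end{aligned} \qquad (8)$$

    where f0 = f(μ1, ..., μn) and g0 = g(μ1, ..., μn), and all derivatives are evaluated at the means. The above results will be used in this paper to estimate the failure probabilities of different events.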
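    Equations (5)-(7) reduce a joint exceedance probability to a bivariate Gaussian CDF. The Python sketch below evaluates that CDF on standardized variables by Simpson integration of the conditional form Φ2(a, b; ρ) = ∫ from −∞ to a of φ(u)·Φ((b − ρu)/√(1 − ρ²)) du, using only the standard library. The integration scheme, the truncation at −10σ, and the grid size are choices of this sketch; ρ is supplied by the caller (e.g., from (8)). Because (5) subtracts nearly equal quantities, it loses relative accuracy for very small joint probabilities; integrating the upper tail directly would be more robust in that regime.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal N(0, 1)


def bivariate_std_cdf(a: float, b: float, rho: float, n: int = 2000) -> float:
    """Phi_2(a, b; rho) = P[U <= a, V <= b] for standard normals with correlation rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    lo = -10.0  # mass of N(0, 1) below -10 is negligible
    if a <= lo:
        return 0.0
    n += n % 2  # Simpson's rule needs an even number of intervals
    s = math.sqrt(1.0 - rho * rho)
    step = (a - lo) / n
    total = 0.0
    for k in range(n + 1):
        u = lo + k * step
        weight = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)
        total += weight * _STD.pdf(u) * _STD.cdf((b - rho * u) / s)
    return total * step / 3.0


def joint_exceedance(y0, mu_y, sigma_y, z0, mu_z, sigma_z, rho):
    """Eq. (5): P[y > Y0 and z > Z0] = P[y > Y0] + P[z > Z0] - 1 + Phi_yz(Y0, Z0)."""
    a = (y0 - mu_y) / sigma_y
    b = (z0 - mu_z) / sigma_z
    p_y = 1.0 - _STD.cdf(a)
    p_z = 1.0 - _STD.cdf(b)
    return p_y + p_z - 1.0 + bivariate_std_cdf(a, b, rho)


# The independent case reduces to a product of tails (0.5 * 0.5 here).
print(joint_exceedance(0.0, 0.0, 1.0, 0.0, 0.0, 1.0, rho=0.0))
```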

    B. Read Failure (RF)

    As explained in Section II-A, a read failure occurs when, during reading, the voltage at node R (VREAD) rises above the trip point of the inverter PL-NL (VTRIPRD) [Fig. 3(a)] [14]. Hence, the read-failure probability (PRF) is given by

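$$P_{RF} = P\left[V_{READ} > V_{TRIPRD}\right] \qquad (9)$$

    VREAD can be obtained by simultaneously solving Kirchhoff's Current Law (KCL) at nodes R and L, as given by

$$\begin{aligned}
\text{At } R:\;& I_{dsat,AXR} + I_{gs,AXR} + I_{sub,PR} + I_{gd,PR} + I_{jn,PR} + I_{gd,NR} + I_{gd,NL} + I_{gd,PL} + I_{gs,PL} \\
&\quad = I_{dlin,NR} + I_{jn,NR} + I_{jn,AXR} \\
\text{At } L:\;& I_{ds,NL} + I_{jn,NL} + I_{gd,NL} + I_{gd,PL} = I_{ds,PL} + I_{jn,PL} + I_{dlin,AXL}
\end{aligned} \qquad (10)$$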

    where IdsatXX is the saturation current (from drain to source), IdlinXX is the drain current in the linear region of operation, IgsXX and IgdXX represent the gate-to-source and gate-to-drain components of the gate leakage, IsubXX represents the subthreshold leakage, and IjnXX represents the junction leakage of transistor XX (where XX denotes the cell transistors used in (10)). Similarly, VTRIPRD can be obtained by solving [14]

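$$\begin{aligned}
I_{dsat,NL}(V_{gate}=V_{TRIPRD},\, V_{drain}=V_{TRIPRD},\, V_{source}=\mathrm{gnd}) ={}& I_{dsat,PL}(V_{gate}=V_{TRIPRD},\, V_{drain}=V_{TRIPRD},\, V_{source}=V_{DD}) \\
&+ I_{dsat,AXL}(V_{gate}=V_{DD},\, V_{drain}=V_{DD},\, V_{source}=V_{TRIPRD})
\end{aligned} \qquad (11)$$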


    Fig. 4. Variation of (a) VTRIP of PL-NL and (b) VREAD, with ΔVt applied to different transistors.

    Fig. 5. Distributions of (a) VTRIP of PL-NL and (b) VREAD. The curves entitled "Gaussian Model (Analytical)" represent solutions explained in Section IV.

    TABLE I: FAILURE-PROBABILITY ESTIMATIONS FOR DIFFERENT CELLS (MONTE CARLO ESTIMATION)

    Fig. 4 shows that VTRIPRD [obtained using (11) and MEDICI simulation] is a linear function of the independent random variables ΔVtNL and ΔVtPL. Similarly, VREAD [obtained using (10) and MEDICI simulation] is a linear function of the independent random variables ΔVtAXR and ΔVtNR. As explained in Section II-A, VREAD increases if the strength of AXR increases (VtAXR↓ ⇒ IdsatAXR↑) and/or that of NR decreases (VtNR↑ ⇒ IdlinNR↓). On the other hand, VTRIPRD decreases when the strength of PL decreases (VtPL↑) and/or that of NL increases (VtNL↓). The PDFs of VREAD [= NRD(vREAD)] and VTRIP [= NTRIP(vTRIP)] can be approximated as Gaussian distributions with the means and variances obtained using (3) [Fig. 5(a) and (b)]. PRF is then given by

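$$P_{RF} = P\left[Z_R \equiv (V_{READ} - V_{TRIPRD}) > 0\right] = 1 - \Phi_{Z_R}(0) \qquad (12)$$

    where μZR = μVREAD − μVTRIPRD and σ²ZR = σ²VREAD + σ²VTRIPRD (VREAD and VTRIPRD depend on disjoint sets of independent ΔVts).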

    The estimated value of PRF closely follows the values obtained from Monte Carlo simulations (Table I).
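    Because VREAD depends only on ΔVtAXR and ΔVtNR while VTRIPRD depends only on ΔVtPL and ΔVtNL, the two are independent, so (12) needs just their means and STDs (e.g., from (3)). The Python sketch below evaluates (12) with the complementary error function, which keeps precision for the very small probabilities typical of SRAM cells; the input numbers in the usage line are hypothetical.

```python
import math


def read_failure_prob(mu_read: float, sigma_read: float,
                      mu_trip: float, sigma_trip: float) -> float:
    """Eq. (12): P_RF = P[Z_R > 0], Z_R = V_READ - V_TRIPRD (independent Gaussians)."""
    mu_z = mu_read - mu_trip
    sigma_z = math.hypot(sigma_read, sigma_trip)  # sqrt(sigma_read^2 + sigma_trip^2)
    # 1 - Phi(-mu_z / sigma_z), written via erfc to avoid cancellation in the tail
    return 0.5 * math.erfc(-mu_z / (sigma_z * math.sqrt(2.0)))


# Hypothetical values (volts): V_READ ~ N(0.10, 0.03), V_TRIPRD ~ N(0.40, 0.04).
print(f"P_RF = {read_failure_prob(0.10, 0.03, 0.40, 0.04):.3e}")
```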

    C. Write Failure (WF)

    Following the discussion in Section II-B, a write failure occurs when, while writing a 0 to the node storing 1 (node L in Fig. 1), the voltage at node L (VL) is not reduced below the trip point of the inverter PR-NR (VTRIPWR) within the time for which the word-line is high (TWL). The write-failure probability (PWF) is given by

$$P_{WF} = P\left[T_{WRITE} > T_{WL}\right] \qquad (13)$$


    Fig. 6. Variation and distribution of TWRITE with variation in Vt: (a) TWRITE variation with Vt, and (b) distribution of TWRITE. The curves entitled "Gaussian Model (Analytical)" represent solutions explained in Section IV.

    where TWRITE is the time required to pull VL down from VDD to VTRIPWR. TWRITE is obtained by solving

$$T_{WRITE} = \int_{V_{DD}}^{V_{TRIPWR}} \frac{C_L(V_L)\,dV_L}{I_{in(L)}(V_L) - I_{out(L)}(V_L)}, \quad \text{if } (V_{WR} \ldots$$

    … IdsAXL↓) and/or that of PL increases (VtPL↓ ⇒ IdsPL↑). Moreover, VTRIP (PR, NR) decreases (thereby increasing TWRITE) when the strength of PR decreases (VtPR↑) and/or that of NR increases (VtNR↓). Hence, TWRITE is a strong function of the random variables ΔVtPL, ΔVtNR, ΔVtPR, and ΔVtAXL. Using (3), we can estimate its mean (μTWR) and standard deviation (σTWR) and approximate its PDF as a Gaussian (fWR(tWR)) [Fig. 6(b)]. However, most write failures originate from the tail of the distribution. Hence, to improve the accuracy of the model in the tail region, a noncentral F distribution can be used [13]. Using the PDF (Gaussian or noncentral F) of TWRITE [NWR(tWR)], PWF is given by

$$P_{WF} = \int_{T_{WL}}^{\infty} N_{WR}(t_{WR})\,dt_{WR} = 1 - \Phi_{WR}(T_{WL}) \qquad (15)$$

    where ΦWR(tWR) represents the CDF of the probability distribution (Gaussian or noncentral F) [13]. PWF obtained using (15) closely matches the result of Monte Carlo simulations (Table I).
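    The TWRITE integral can be evaluated numerically once CL(VL) and the node currents are known as functions of VL. The Python sketch below uses a midpoint rule and returns infinity when the net current stops discharging node L before VTRIPWR. That branch mirrors the Section II-B argument that the write cannot complete if VL settles at a VWR above the trip point; the exact condition in the source equation is truncated, so the branch is an assumption. PWF is then computed with the Gaussian form of (15); the noncentral-F refinement is not implemented. The constant capacitance and currents in the usage line are hypothetical.

```python
import math
from typing import Callable


def write_time(c_l: Callable[[float], float],
               i_in: Callable[[float], float],
               i_out: Callable[[float], float],
               v_dd: float, v_trip: float, n: int = 1000) -> float:
    """T_WRITE = integral from V_DD down to V_TRIPWR of C_L dV_L / (I_in - I_out)."""
    dv = (v_trip - v_dd) / n  # negative: V_L falls from V_DD toward V_TRIPWR
    t = 0.0
    for k in range(n):
        v = v_dd + (k + 0.5) * dv  # midpoint of the k-th voltage step
        net = i_in(v) - i_out(v)   # must be negative for node L to keep discharging
        if net >= 0.0:
            return math.inf        # V_L stalls above V_TRIPWR: write cannot complete
        t += c_l(v) * dv / net
    return t


def write_failure_prob(mu_twr: float, sigma_twr: float, t_wl: float) -> float:
    """Eq. (15), Gaussian case: P_WF = 1 - Phi((T_WL - mu) / sigma)."""
    return 0.5 * math.erfc((t_wl - mu_twr) / (sigma_twr * math.sqrt(2.0)))


# Hypothetical: 1 fF node, 50 uA net discharge, V_DD = 1.0 V, V_TRIPWR = 0.4 V.
t = write_time(lambda v: 1e-15, lambda v: 0.0, lambda v: 50e-6, 1.0, 0.4)
print(f"T_WRITE = {t * 1e12:.1f} ps")
```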

    D. Access-Time Failure (AF)

    Access-time failure occurs if the access time of the cell (TACCESS) is longer than the maximum tolerable limit (TMAX) (Section II-C). The probability of access-time failure (PAF) of a cell is given by

$$P_{AF} = P\left[T_{ACCESS} > T_{MAX}\right] \qquad (16)$$

    While reading the cell storing VL = 1 and VR = 0 (Figs. 1 and 3), bit-line BR discharges through AXR and NR (by the current IBR). Simultaneously, BL discharges through the gate, subthreshold, and junction leakage of the AXL transistors of all the cells connected to BL (IBL). The discharging currents IBR and IBL are given by

$$I_{BR} = I_{dsat,AXR} + \sum_{i=1}^{N} \left[ I_{gd,AXR}(i) + I_{jn,AXR}(i) \right] \qquad (17a)$$

$$I_{BL} = \sum_{i=1}^{N} \left[ I_{gd,AXL}(i) + I_{jn,AXL}(i) + I_{sub,AXL}(i) \right] \qquad (17b)$$

    where N is the number of cells attached to a bit-line (or column). Hence, TACCESS can be obtained by solving

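$$T_{ACCESS} = \int_{V_{DD} - \Delta V_{BL} - \Delta_{MIN}}^{V_{DD}} \frac{C_{BR}\,dV_{BR}}{I_{BR}} = \int_{V_{DD} - \Delta V_{BL}}^{V_{DD}} \frac{C_{BL}\,dV_{BL}}{I_{BL}} \qquad (18)$$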

    where CBR/BL is the bit-line capacitance, which includes the diffusion capacitance (Cjn) of the access transistors and the interconnect capacitance (CIC). To simplify the above


    Fig. 7. Variation and distribution of TACCESS with variation in Vt: (a) TACCESS variation with Vt, and (b) distribution of TACCESS. The curves entitled "Gaussian Model (Analytical)" represent solutions explained in Section IV.

    calculation, we assume that IdsatAXR is constant (valid for small ΔVBR, since AXR is in saturation) and that Igd, Ijn, and Isub are constant at their values for VBL = VDD (valid for small ΔVBR and ΔVBL). Hence, VBR and VBL are linear functions of time [Fig. 3(a)]. We further assume that CBL = CBR = CB, IgdAXR(i) = IgdAXL(i), and IjnAXR(i) = IjnAXL(i) (since these are not strong functions of Vt). Under these assumptions, TACCESS is given by

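$$T_{ACCESS} = \frac{C_{BR}\, C_{BL}\, \Delta_{MIN}}{C_{BL}\, I_{BR} - C_{BR}\, I_{BL}} = \frac{C_B\, \Delta_{MIN}}{I_{dsat,AXR} - \sum_{i=1}^{N} I_{sub,AXL}(i)} \qquad (19)$$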

    Fig. 3(a) shows that during a nondestructive read operation, the voltage at node R quickly rises to VREAD and stays stable at that value. Hence, to simplify (19), we first solve (10) for VREAD and use that VREAD to evaluate TACCESS. The access time given by (19) closely follows the MEDICI simulation result [Fig. 7(a)]. From Fig. 7, it can be observed that TACCESS increases with an increase in VtAXR and/or VtNR. Hence, TACCESS depends principally on the Vts of AXR and NR, which together determine IdsatAXR. The total subthreshold leakage of the cells associated with BL (i.e., Σi IsubAXL(i)) is approximated as N·E[IsubAXL], where E[IsubAXL] is the expected value of IsubAXL considering the random variation in VtAXL. The PDF of TACCESS can be approximated as a Gaussian with the mean (μTAC) and standard deviation (σTAC) obtained from (3). Using the derived PDF [NTACCESS(tACCESS)], PAF can be estimated as

$$P_{AF} = \int_{T_{MAX}}^{\infty} N_{T_{ACCESS}}(t_{ACCESS})\,dt_{ACCESS} = 1 - \Phi_{T_{ACCESS}}(T_{MAX}) \qquad (20)$$

    where ΦTACCESS(tACCESS) is the CDF of TACCESS. The PAF of a cell obtained using the model closely matches the one obtained using Monte Carlo simulation (Table I).
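    Equation (19), together with the N·E[IsubAXL] approximation described above, gives a closed-form access time, and (20) turns its distribution into PAF. Purely for illustration, the Python sketch below assumes an exponential subthreshold model Isub = Isub0·exp(−ΔVt/(n·vT)) with ΔVt ~ N(0, σVt), for which E[Isub] is the lognormal mean Isub0·exp(σVt²/(2(n·vT)²)). The slope factor n, the thermal voltage, and all example values are hypothetical and are not taken from the paper's device models.

```python
import math


def mean_subthreshold_leakage(i_sub0: float, sigma_vt: float,
                              n_slope: float = 1.5, v_thermal: float = 0.0259) -> float:
    """E[I_sub] for the assumed model I_sub = I_sub0 * exp(-dVt / (n*vT)), dVt ~ N(0, sigma_vt)."""
    k = sigma_vt / (n_slope * v_thermal)
    return i_sub0 * math.exp(0.5 * k * k)  # mean of a lognormal variable


def access_time(c_b: float, delta_min: float, i_dsat_axr: float,
                n_cells: int, e_isub_axl: float) -> float:
    """Eq. (19) with sum_i I_sub,AXL(i) replaced by N * E[I_sub,AXL]."""
    net = i_dsat_axr - n_cells * e_isub_axl
    if net <= 0.0:
        return math.inf  # bit-line leakage cancels the read current: no differential develops
    return c_b * delta_min / net


def access_failure_prob(mu_tac: float, sigma_tac: float, t_max: float) -> float:
    """Eq. (20), Gaussian case: P_AF = 1 - Phi((T_MAX - mu) / sigma)."""
    return 0.5 * math.erfc((t_max - mu_tac) / (sigma_tac * math.sqrt(2.0)))


# Hypothetical: 100 fF bit-line, 0.1 V differential, 50 uA read current, 256 cells.
e_isub = mean_subthreshold_leakage(i_sub0=10e-9, sigma_vt=0.03)
print(f"T_ACCESS = {access_time(100e-15, 0.1, 50e-6, 256, e_isub) * 1e12:.1f} ps")
```

    Because E[Isub] grows exponentially with σVt², the leakage term in the denominator becomes important for long bit-lines (large N) and high variability, which is why access time depends on the column height as well as on AXR and NR.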

    E. Hold Failure (HF)

    A hold failure occurs if the minimum supply voltage that can be applied to the cell in the hold mode without destroying the data (VDDHmin) is higher than the designed standby-mode supply voltage (VHOLD) (Section II-D). Thus, the probability of hold failure (PHF) is given by

$$P_{HF} = P\left[V_{DDHmin} > V_{HOLD}\right] \qquad (21)$$

    Lowering the VDD of the cell (let VDDH denote the cell VDD in the hold mode) reduces the voltage at the node storing 1 (VL in Fig. 1). Due to the leakage of NL, VL will be less than VDDH for low VDDH. A hold failure occurs if VL < VTRIP of PR-NR. Hence, VDDHmin can be obtained by numerically solving

    $$V_L(V_{DDHmin}, V_{tPL}, V_{tNL}) = V_{TRIP}(V_{DDHmin}, V_{tPR}, V_{tNR}) \qquad (22)$$

    The value of VDDHmin estimated using (22) closely follows the values obtained from MEDICI simulation [Fig. 8(a)]. From Fig. 8(a), it can be observed that VL decreases as the strength of NL increases (VtNL decreases) or as the strength of PL decreases (VtPL increases). On the other hand, VTRIP of PR–NR increases as VtNR increases or VtPR decreases. From (22), it is evident that VDDHmin is a function of the random variables VtPL, VtNL, VtPR, and VtNR. The distribution of VDDHmin [NVDDHmin(vDDHmin)] can be approximated as a Gaussian with the mean and variance obtained using (7) [Fig. 8(b)]; a noncentral χ² distribution improves the accuracy for VDDHmin values close to 0. Hence, we can estimate PHF as

    $$P_{HF} = \int_{V_{HOLD}}^{\infty} f_{V_{DDHmin}}(v_{DDHmin})\, d(v_{DDHmin}) = 1 - F_{V_{DDHmin}}(V_{HOLD}) \qquad (23)$$

    The PHF obtained using (23) closely matches the result of Monte Carlo simulations (Table I).


    Fig. 8. Variation and distribution of VHOLD. (a) VDDHmin variation with ΔVt and (b) distribution of VDDHmin. In (a), ΔVt is applied in the directions ΔVtNR > 0, ΔVtPR < 0, ΔVtNL < 0, and ΔVtPL > 0. The curves entitled "Gaussian Model (Analytical)" represent the solutions explained in Section IV.

    TABLE II. ESTIMATION OF PROBABILITIES OF JOINT EVENTS

    F. Estimation of Overall Cell Failure Probability (PF)

    The overall failure probability is given by

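    $$\begin{aligned} P_F = P[\mathrm{Fail}] &= P[AF \cup RF \cup WF \cup HF] \\ &= P_{AF} + P_{RF} + P_{WF} + P_{HF} \\ &\quad - P[AF \cap RF] - P[AF \cap WF] - P[AF \cap HF] - P[RF \cap WF] - P[RF \cap HF] - P[WF \cap HF] \\ &\quad + P[AF \cap RF \cap WF] + P[AF \cap RF \cap HF] + P[RF \cap WF \cap HF] + P[WF \cap HF \cap AF] \\ &\quad - P[\mathrm{All}] \end{aligned} \qquad (24)$$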

    Table II shows the direction in which the Vt of each transistor has to move (ΔVt > 0 or ΔVt < 0) to cause each type of failure. For example, consider the joint event (AF ∩ RF). Among the four different ways of causing a read failure, only ΔVtNR > 0 also causes an access-time failure. The probability of a joint event can be estimated accurately by constructing the joint PDF that represents the two events, using the procedure given in (5). We have also assumed that the probability of more than two events occurring simultaneously is negligible (≈ 0), so the third- and fourth-order terms in (24) are dropped. The estimated probabilities match the Monte Carlo results very closely (Table I). All of the failure probabilities increase significantly with an increase in the sigma of the Vt variation, as shown in Fig. 9 (for cell C1 in Table I).
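    Once terms involving three or more events are neglected, (24) truncates after the pairwise intersections. The sketch below shows that truncation. It assumes the single-event and pairwise probabilities have already been computed, for example via the procedure in (5). The function name, the dictionary layout, and the numbers are hypothetical.

```python
from itertools import combinations

def cell_failure_probability(p_single, p_joint):
    """Eq. (24) truncated after the pairwise terms (events of order >= 3 neglected).

    p_single: {"AF": P_AF, "RF": P_RF, "WF": P_WF, "HF": P_HF}
    p_joint:  {frozenset({"AF", "RF"}): P[AF and RF], ...}; missing pairs count as 0
    """
    total = sum(p_single.values())
    for a, b in combinations(sorted(p_single), 2):
        total -= p_joint.get(frozenset((a, b)), 0.0)
    return total

# Hypothetical inputs, for illustration only.
pf = cell_failure_probability(
    {"AF": 1e-4, "RF": 2e-4, "WF": 5e-5, "HF": 1e-5},
    {frozenset({"AF", "RF"}): 2e-5},
)
```

    Dropping the pairwise subtraction altogether leaves the plain sum of the single-event probabilities. That sum is the upper bound PFMOD used later in (41).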

    G. Estimation of Column and Memory-Failure Probability (PCOL and PMEM)

    The failure probability of a column (PCOL) or row (PROW) is defined as the probability that any of the cells in that column (out of NROW cells) or row (out of NCOL cells) fails. Assuming column redundancy, the probability of failure of a memory chip (PMEM) designed with NCOL columns and NRC redundant columns is defined as the probability that more than NRC (i.e., at least NRC + 1) columns fail. A similar definition applies to row redundancy. Hence, PCOL and PMEM can be given by

    $$P_{COL} = 1 - (1 - P_F)^{N_{ROW}}$$
    $$P_{MEM} = \sum_{i=N_{RC}+1}^{N_{COL}+N_{RC}} \binom{N_{COL}+N_{RC}}{i} P_{COL}^{\,i}\,(1 - P_{COL})^{N_{COL}+N_{RC}-i} \qquad (25)$$

    Fig. 9. Variation of failure probability with σVT0.
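    The two lines of (25) can be sketched in Python as follows. Evaluating the binomial terms in log space (via `math.lgamma`) is our own choice, not the paper's. It avoids overflowing floating point when NCOL + NRC is large. The function names and the PF value are hypothetical, and the array dimensions (32 cells per column, NCOL = 512, NRC = 24) are the ones quoted in Section VI-A.

```python
import math

def column_failure_probability(p_f: float, n_row: int) -> float:
    """First line of (25): a column fails if any of its n_row cells fails."""
    if p_f >= 1.0:
        return 1.0
    return -math.expm1(n_row * math.log1p(-p_f))  # 1 - (1 - p_f)^n_row, accurate for tiny p_f

def memory_failure_probability(p_col: float, n_col: int, n_rc: int) -> float:
    """Second line of (25): more than n_rc of the n_col + n_rc columns fail."""
    if p_col <= 0.0:
        return 0.0
    if p_col >= 1.0:
        return 1.0
    n = n_col + n_rc
    total = 0.0
    for i in range(n_rc + 1, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i); lgamma avoids huge integers
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p_col) + (n - i) * math.log1p(-p_col))
        total += math.exp(log_term)
    return min(total, 1.0)

p_col = column_failure_probability(1e-4, 32)          # hypothetical P_F = 1e-4
p_mem = memory_failure_probability(p_col, 512, 24)
```

    With NRC = 0, the sum collapses to 1 − (1 − PCOL)^NCOL, which gives a convenient sanity check.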

    H. Effect of Correlation of Threshold Voltage of Different Transistors

    Fig. 10. Effect of correlation of Vt on the distribution of (a) VTRIPRD and (b) VREAD.

    In the previous discussions, we have assumed that the Vt values of different transistors in an SRAM cell are independent random variables. This assumption is valid for Vt variation due to RDF [1]–[3]. In general, however, systematic intra-die variations (e.g., Vt variation due to channel-length variation) can make the Vt values of different transistors correlated. In this section, we investigate the impact of such correlation on the failure probabilities. The models proposed in Section III-A can be easily extended to account for the effect of correlation, as shown below [13]

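    $$\mu_y = f(\mu_1,\ldots,\mu_n) + \frac{1}{2}\sum_{i=1}^{n} \left.\frac{\partial^2 f}{\partial x_i^2}\right|_{\mu} \sigma_i^2 + \sum_{k=1}^{n} \sum_{i=1}^{k-1} \left.\frac{\partial^2 f}{\partial x_i\,\partial x_k}\right|_{\mu} r(i,k)\,\sigma_i\,\sigma_k$$
    $$\sigma_y^2 = \sum_{i=1}^{n} \left(\left.\frac{\partial f}{\partial x_i}\right|_{\mu}\right)^2 \sigma_i^2 + 2\sum_{k=1}^{n} \sum_{i=1}^{k-1} \left(\left.\frac{\partial f}{\partial x_i}\right|_{\mu}\right)\left(\left.\frac{\partial f}{\partial x_k}\right|_{\mu}\right) r(i,k)\,\sigma_i\,\sigma_k \qquad (26)$$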

    where r(i,k) is the correlation coefficient between xi and xk. Since all the transistors in a cell are in very close spatial proximity, we can assume that the correlation coefficient is the same for every pair of transistors, i.e., r(i,k) = r for all i and k. The proposed model can handle different correlation coefficients for different transistor pairs, but we have used this assumption to simplify the calculation. Fig. 10 shows that the extended analytical model in (26) closely follows the Monte Carlo distributions of VREAD and VTRIPRD in the presence of correlation. The Gaussian models for TACCESS, TWRITE, and VHOLD also closely follow the corresponding Monte Carlo distributions. We have also observed that correlation has little effect on the mean values of VREAD, VTRIPRD, TACCESS, TWRITE, and VHOLD, whereas it has a stronger impact on their standard deviations.
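    Here is a minimal Python sketch of (26) for a common correlation coefficient r. The derivatives are approximated by central finite differences, which is a numerical choice made here rather than one stated in the paper. The toy function `z` in the usage lines has hypothetical sensitivities. It mimics a quantity that rises with one Vt shift and falls with another, as VREAD does with ΔVtNR and ΔVtAXR.

```python
import math

def propagate_moments(f, mu, sigma, r, h=1e-4):
    """Second-order mean/variance of y = f(x) per (26), assuming every pair of
    inputs has the same correlation coefficient r. Derivatives are evaluated at
    x = mu by central finite differences with step h."""
    n = len(mu)
    f0 = f(list(mu))

    def at(shifts):
        # Evaluate f with x[idx] shifted by sign * h for each (idx, sign).
        x = list(mu)
        for idx, sign in shifts:
            x[idx] += sign * h
        return f(x)

    grad = [(at([(i, 1)]) - at([(i, -1)])) / (2 * h) for i in range(n)]

    def hess(i, k):
        if i == k:
            return (at([(i, 1)]) - 2 * f0 + at([(i, -1)])) / h**2
        return (at([(i, 1), (k, 1)]) - at([(i, 1), (k, -1)])
                - at([(i, -1), (k, 1)]) + at([(i, -1), (k, -1)])) / (4 * h**2)

    mean = f0 + 0.5 * sum(hess(i, i) * sigma[i] ** 2 for i in range(n))
    var = sum((grad[i] * sigma[i]) ** 2 for i in range(n))
    for k in range(n):
        for i in range(k):  # each unordered pair (i < k) is counted once
            cov = r * sigma[i] * sigma[k]
            mean += hess(i, k) * cov
            var += 2 * grad[i] * grad[k] * cov
    return mean, var

# Hypothetical sensitivities: z rises with x[0] (like dVt_NR) and falls with x[1] (like dVt_AXR).
z = lambda x: 0.8 * x[0] - 0.6 * x[1]
for r in (0.0, 0.5):
    _, v = propagate_moments(z, [0.0, 0.0], [0.03, 0.03], r)
    print(f"r = {r}: sigma_z = {math.sqrt(v):.4f}")  # sigma_z drops as r grows
```

    For a difference of oppositely sensitive shifts, the cross term is negative, so the spread shrinks as r grows. For a sum of same-sign sensitivities, the spread grows instead. This mirrors the opposite trends of the read and access-time failures discussed next.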

    It can be observed from the discussions in the previous sections and from Table II that read, write, and hold failures occur when the threshold voltages of different transistors shift in opposite directions. In other words, these three failures become more likely as the mismatch between the Vt values of the transistors increases. For example, the read-failure probability increases if the ΔVt of AXR becomes negative and the ΔVt of NR becomes positive, because this combination increases VREAD. If ΔVtAXR and ΔVtNR are completely uncorrelated (independent random variables with mean = 0) and ΔVtAXR < 0, then ΔVtNR is equally likely to be positive or negative. On the other hand, if they are positively correlated and ΔVtAXR < 0, then ΔVtNR is less likely to be positive. This reduces the probability of a high VREAD. Similarly, the probability of the event (ΔVtAXR > 0 and ΔVtNR < 0), i.e., a low VREAD, also decreases. Hence, the spread of the distribution of VREAD decreases as the correlation increases [Fig. 10(a)]. A similar reduction in the spread of VTRIPRD is observed as the correlation between VtNL and VtPL increases [Fig. 10(b)]. Consequently, the standard deviation of the variable Z = (VREAD − VTRIPRD) decreases as the correlation coefficient increases [Fig. 11(a)], which reduces the read-failure probability [Fig. 11(b)].

    Write and hold failures are also driven by mismatch between the transistor threshold voltages. An increase in the correlation therefore reduces both the write- and hold-failure probabilities [Fig. 11(b)], by decreasing the standard deviation of TWRITE and VHOLD, respectively [Fig. 11(a)]. However, an increase in the correlation increases the standard deviation of TACCESS [Fig. 11(a)], and hence the access-time failure probability [Fig. 11(b)]. This is because an access-time failure is caused by ΔVtAXR > 0 together with ΔVtNR > 0. Correlation between VtAXR and VtNR makes this event more likely: with a positive correlation, ΔVtAXR > 0 makes ΔVtNR > 0 more probable, and vice versa. Hence, an increase in the correlation coefficient increases the access-time failure probability.

    In this section, we have analyzed the impact of threshold-voltage correlation on the failure probabilities. The read, write, and hold failures decrease as the correlation increases, whereas the access-time failure probability increases. However, because of the extremely small geometry of the SRAM cell, the effect of RDF on the threshold voltage of the cell transistors is very high [4]. Hence, the correlation between Vt values is expected to be small, particularly for SRAMs designed with nanoscaled CMOS devices. In the rest of the paper, we neglect the effect of correlation when estimating the failure probabilities and when proposing a statistical approach to designing a robust memory. Note that neglecting the correlation overestimates the read, write, and hold failures (a pessimistic estimate) and underestimates the access-time failure (an optimistic estimate).


    Fig. 11. Effect of correlation of Vt on (a) the standard deviation of TACCESS, TWRITE, VREAD − VTRIPRD, and VHOLD; and (b) the failure probabilities. All values are normalized to their corresponding values for the completely uncorrelated case (i.e., correlation coefficient = 0). (a) Standard deviation versus correlation. (b) Failure probability versus correlation.

    IV. FAILURE-PROBABILITY ESTIMATION USING SIMPLE LONG-CHANNEL TRANSISTOR MODEL

    In the estimation method described in the previous sections, the use of sophisticated short-channel transistor models increases the accuracy. However, because of their complexity, these models require numerical solutions of (10), (11), (14), (19), and (22) to obtain the failure probabilities (i.e., the KCL at the intermediate nodes must be solved numerically). Numerical solution increases the estimation time. Hence, to reduce the computation cost, we have derived a set of analytical models that estimate the failure probabilities using long-channel transistor equations (with a short-channel threshold-voltage model). These analytical models are less accurate, but they allow a fast estimation of the failure probabilities. They are particularly useful for generating a good initial guess for the statistical-optimization problem discussed in Section VI. The numerical models are used in the final optimization stage to ensure accuracy.

    A. Long-Channel Transistor Equations

    The long-channel transistor characteristics used in deriving the analytical models are summarized below:

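    $$I_{dsub} = \beta (m-1)\left(\frac{kT}{q}\right)^2 \exp\!\left(\frac{q(V_{gs}-V_{th})}{mkT}\right)\left(1 - \exp\!\left(\frac{-qV_{ds}}{kT}\right)\right)$$
    $$I_{dlin} = \frac{\beta\left[(V_{gs}-V_{th})V_{ds} - \frac{m}{2}V_{ds}^2\right]}{1 + \dfrac{\mu_{eff}V_{ds}}{v_{sat}L}} \quad \text{and} \quad I_{dsat} = \frac{\beta(V_{gs}-V_{th})^2}{2m}$$
    $$\beta = \mu_e C_{ox}\frac{W}{L}, \qquad I_{sub0} = \beta(m-1)\left(\frac{kT}{q}\right)^2 \qquad (27)$$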

    where μe is the effective mobility and m is the body-effect coefficient (m = 1 + 3Tox/Wdep, where Wdep is the width of the depletion region). In this work, we have chosen m and μe to match the MEDICI simulation results as closely as possible.
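    The equations in (27) translate directly into Python. In this sketch, all quantities are in SI units, the temperature defaults to 300 K (an assumption), and the function names are hypothetical. The threshold voltage is passed in as an argument, so a short-channel Vth model can be plugged in as the text describes.

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant / elementary charge [V/K]

def beta(mu_e, c_ox, w, l):
    """beta = mu_e * Cox * W / L."""
    return mu_e * c_ox * w / l

def i_dsub(b, m, vgs, vth, vds, temp=300.0):
    """Subthreshold current: Isub0 * exp((Vgs - Vth)/(m vT)) * (1 - exp(-Vds/vT))."""
    vt = K_OVER_Q * temp                     # thermal voltage kT/q
    i_sub0 = b * (m - 1.0) * vt ** 2
    return i_sub0 * math.exp((vgs - vth) / (m * vt)) * (1.0 - math.exp(-vds / vt))

def i_dlin(b, m, vgs, vth, vds, mu_eff, v_sat, l):
    """Linear-region current with the velocity-saturation denominator of (27)."""
    return b * ((vgs - vth) * vds - 0.5 * m * vds ** 2) / (1.0 + mu_eff * vds / (v_sat * l))

def i_dsat(b, m, vgs, vth):
    """Saturation current beta * (Vgs - Vth)^2 / (2m)."""
    return b * (vgs - vth) ** 2 / (2.0 * m)
```

    As a consistency check, with velocity saturation effectively disabled (a very large `v_sat`), `i_dlin` at Vds = (Vgs − Vth)/m equals `i_dsat`.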

    B. Estimation of Failure Probabilities

    Using a simple square-law model to estimate the ON current through a transistor and neglecting the contribution of the leakage currents, VREAD and VTRIPRD are obtained from (assuming a short-channel Vth model)

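    $$V_{TRIPRD} = \frac{V_{DD} - V_{tPL} + V_{tNL}\sqrt{\beta_{NL}/\beta_{PL}}}{1 + \sqrt{\beta_{NL}/\beta_{PL}}} \qquad (28a)$$
    $$I_{dsatAXR} \approx 0.5\,\beta_{AXR}(V_{WL} - V_{READ} - V_{tAXR})^2 = \beta_{NR}(V_{DD} - V_{tNR} - 0.5V_{READ})V_{READ} \approx I_{dlinNR} \qquad (28b)$$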
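    Under the same square-law assumptions, (28b) is a quadratic in VREAD and can be solved in closed form. The smaller root is the physical one, because AXR must remain in saturation. The sketch below assumes VWL = VDD during a read and uses hypothetical, normalized β values. Comparing VREAD with VTRIPRD then gives the read-failure check: the cell fails a read when VREAD exceeds VTRIPRD.

```python
import math

def v_trip_rd(vdd, vt_pl, vt_nl, beta_nl, beta_pl):
    """Eq. (28a): square-law trip point of the PL-NL inverter."""
    k = math.sqrt(beta_nl / beta_pl)
    return (vdd - vt_pl + vt_nl * k) / (1.0 + k)

def v_read(vdd, vwl, vt_axr, vt_nr, beta_axr, beta_nr):
    """Eq. (28b): solve 0.5*b_a*(A - V)^2 = b_n*(B - 0.5*V)*V for V, where
    A = VWL - VtAXR and B = VDD - VtNR. Rearranged:
    0.5*(b_a + b_n)*V^2 - (b_a*A + b_n*B)*V + 0.5*b_a*A^2 = 0.
    The smaller root keeps AXR in saturation (V < A)."""
    a_ov = vwl - vt_axr
    b_ov = vdd - vt_nr
    qa = 0.5 * (beta_axr + beta_nr)
    qb = -(beta_axr * a_ov + beta_nr * b_ov)
    qc = 0.5 * beta_axr * a_ov ** 2
    disc = qb * qb - 4.0 * qa * qc
    return (-qb - math.sqrt(disc)) / (2.0 * qa)

# Hypothetical, normalized values for illustration only.
v_rd = v_read(vdd=1.0, vwl=1.0, vt_axr=0.3, vt_nr=0.3, beta_axr=1.0, beta_nr=2.0)
v_tr = v_trip_rd(vdd=1.0, vt_pl=0.3, vt_nl=0.3, beta_nl=2.0, beta_pl=1.0)
read_fails = v_rd > v_tr   # False here: roughly 0.128 V versus 0.466 V
```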

    Assuming simplified square-law transistor models, the integration in (14) can be performed analytically, as shown below:

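    $$T_{WRITE} = C_L\int_{V_{WL}-V_{tAXL}}^{V_{DD}} \frac{dV_L}{I_{dsatAXL} - I_{dlinPL}} + C_L\int_{V_{tNR}+V_{tPL}}^{V_{WL}-V_{tAXL}} \frac{dV_L}{I_{dlinAXL} - I_{dlinPL}} + C_L\int_{V_{TRIP}}^{V_{tNR}+V_{tPL}} \frac{dV_L}{I_{dlinAXL} - I_{dsatPL}} \qquad (29)$$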

    The analytical model for access-time failure can be obtained from the solution for VREAD in (28b). The estimated VREAD is used to determine the saturation current of AXR at VGS = VDD − VREAD, VDS = VDD − VREAD, and VBS = −VREAD. In this step, the short-channel saturation-current model can also be used, because the current is needed at only one bias point. This allows TACCESS in (19) to be estimated analytically.
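    The sketch below shows the analytical access-time estimate. IdsatAXR is evaluated at the single bias point using (27), and the bit-line leakage term of (19) is replaced by N·E[IsubAXL]. E[IsubAXL] is taken as the mean of a lognormal, since Isub ∝ exp(−ΔVt/(m vT)) with Gaussian ΔVt. The body effect on VtAXR (VBS = −VREAD) is ignored here for brevity, and all numbers and names are hypothetical.

```python
import math

def expected_subthreshold_leakage(i_nominal, m, v_therm, sigma_vt):
    """E[Isub] for Isub = i_nominal * exp(-dVt/(m vT)) with dVt ~ N(0, sigma_vt^2);
    the lognormal mean is i_nominal * exp(0.5 * (sigma_vt / (m vT))^2)."""
    s = sigma_vt / (m * v_therm)
    return i_nominal * math.exp(0.5 * s * s)

def t_access(c_b, delta_min, i_dsat_axr, n_cells, e_isub):
    """Eq. (19) with the summed subthreshold leakage replaced by N * E[IsubAXL]."""
    i_net = i_dsat_axr - n_cells * e_isub
    if i_net <= 0.0:
        raise ValueError("bit-line leakage exceeds the read current")
    return c_b * delta_min / i_net

# Hypothetical values: beta = 2e-4 A/V^2, m = 1.2, VDD = 1 V, VREAD = 0.13 V, VtAXR = 0.3 V.
i_axr = 2e-4 * (1.0 - 0.13 - 0.3) ** 2 / (2 * 1.2)   # Idsat from (27) at Vgs = VDD - VREAD
e_leak = expected_subthreshold_leakage(1e-9, 1.2, 0.0259, 0.03)
t_ac = t_access(c_b=100e-15, delta_min=0.1, i_dsat_axr=i_axr, n_cells=256, e_isub=e_leak)
```

    Because the expected leakage grows with σVt, longer columns (larger N) slow the read both through CB and through this leakage term.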

    Fig. 12. Impact of (a) supply-voltage drop and (b) temperature increase on the failure probability.

    To estimate the hold voltage (VDDHmin), assume that at VDD = VDDH all transistors are in the subthreshold region. The current of AXL can be neglected because of its small Vds drop. Using these assumptions, and neglecting the gate and junction leakage currents (as VDD is low), the KCL at node L becomes

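    $$I_{dNL} = I_{dPL}$$
    $$\beta_{NL}(m_n-1)\exp\!\left(\frac{V_R - V_{tNL}}{m_n v_T}\right)\left(1 - \exp\!\left(\frac{-V_L}{v_T}\right)\right) = \beta_{PL}(m_p-1)\exp\!\left(\frac{V_{DDH} - V_R - V_{tPL}}{m_p v_T}\right)\left(1 - \exp\!\left(\frac{V_L - V_{DDH}}{v_T}\right)\right)$$
    $$\Rightarrow \text{analytical solution for } V_L. \qquad (30)$$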

    Since node R is storing '0', the Vds of PR and AXR is large, so the exp(−Vds/vT) terms in their current equations can be neglected. The KCL at node R can then be approximated as

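    $$I_{dNR} = I_{dPR} + I_{dAXR}$$
    $$\beta_{NR}(m_n-1)\exp\!\left(\frac{V_L - V_{tNR}}{m_n v_T}\right)\left(1 - \exp\!\left(\frac{-V_R}{v_T}\right)\right) = \beta_{PR}(m_p-1)\exp\!\left(\frac{V_{DDH} - V_L - V_{tPR}}{m_p v_T}\right) + \beta_{AXR}(m_n-1)\exp\!\left(\frac{-V_R - V_{tAXR}}{v_T}\right)$$
    $$\Rightarrow \text{analytical solution for } V_R. \qquad (31)$$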

    In the above equation, the mn of AXR is approximated as 1 so that an analytical solution for VR can be derived. Starting from VL = VDDH and VR = 0, the two equations can be solved iteratively for VL and VR, which determines whether the cell fails (flips) at the assigned supply voltage. The minimum hold voltage (VDDHmin) can then be found by a binary search.
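    The iteration and binary search described above can be sketched as follows. The closed forms come from rearranging (30) as a quadratic in x = exp(−VL/vT) and (31) as a linear equation in y = exp(−VR/vT). Several choices here are our own and are not taken from the paper:
    - mn of AXR is set to 1 only in its exponent;
    - the cell is counted as holding if the converged VL stays above VR;
    - the parameter set `P` (normalized β, Vt values skewed in the failure directions of the Fig. 8 caption) is hypothetical.

```python
import math

V_T = 0.0259  # thermal voltage near 300 K [V] (assumed)

def solve_vl(vr, vddh, p):
    """Eq. (30) as a(1 - x) = b(1 - c/x) with x = exp(-VL/vT), c = exp(-VDDH/vT)."""
    a = p["b_nl"] * (p["mn"] - 1) * math.exp((vr - p["vt_nl"]) / (p["mn"] * V_T))
    b = p["b_pl"] * (p["mp"] - 1) * math.exp((vddh - vr - p["vt_pl"]) / (p["mp"] * V_T))
    c = math.exp(-vddh / V_T)
    root = math.sqrt((a - b) ** 2 + 4 * a * b * c)
    # Positive root of a x^2 + (b - a) x - b c = 0, in a cancellation-free form.
    x = ((a - b) + root) / (2 * a) if a >= b else 2 * b * c / ((b - a) + root)
    return -V_T * math.log(x)

def solve_vr(vl, vddh, p):
    """Eq. (31) with m_n of AXR set to 1 in its exponent: e(1 - y) = f + d*y,
    y = exp(-VR/vT). Returns None if (31) has no solution (NR cannot hold R low)."""
    e = p["b_nr"] * (p["mn"] - 1) * math.exp((vl - p["vt_nr"]) / (p["mn"] * V_T))
    f = p["b_pr"] * (p["mp"] - 1) * math.exp((vddh - vl - p["vt_pr"]) / (p["mp"] * V_T))
    d = p["b_axr"] * (p["mn"] - 1) * math.exp(-p["vt_axr"] / V_T)
    y = (e - f) / (e + d)
    return None if y <= 0.0 else -V_T * math.log(y)

def cell_holds(vddh, p, iters=200, tol=1e-9):
    """Alternate (31) and (30) from VL = VDDH, VR = 0 until the node voltages settle."""
    vl, vr = vddh, 0.0
    for _ in range(iters):
        vr_new = solve_vr(vl, vddh, p)
        if vr_new is None:
            return False
        vl_new = solve_vl(vr_new, vddh, p)
        done = abs(vl_new - vl) < tol and abs(vr_new - vr) < tol
        vl, vr = vl_new, vr_new
        if done:
            break
    return vl > vr  # data retained if node L is still above node R

def vddh_min(p, lo=0.0, hi=0.5, steps=40):
    """Binary search for the lowest VDDH that still holds; assumes cell_holds is
    monotonic in VDDH and true at hi."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if cell_holds(mid, p):
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical parameters: normalized betas, mismatch in the hold-failure directions.
P = {"b_nl": 1.0, "b_pl": 1.0, "b_nr": 1.0, "b_pr": 1.0, "b_axr": 1.0,
     "mn": 1.3, "mp": 1.3,
     "vt_nl": 0.27, "vt_pl": 0.33, "vt_nr": 0.33, "vt_pr": 0.27, "vt_axr": 0.30}
v_min = vddh_min(P)
```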

    The simplified analytical models of VREAD, VTRIPRD, TWRITE, TACCESS, and VHOLD match the Monte Carlo simulation results and the numerical models reasonably closely (Figs. 5–8). Although the currents are estimated using the long-channel equations, the threshold voltage is still evaluated using the short-channel model, which increases the accuracy. Moreover, VREAD, VTRIPRD, TWRITE, and VHOLD depend on the relative magnitudes of the different current components, not on their absolute values. For example, VREAD depends on the relative strength of IdsatAXR and IdlinNR. Thus, the error that the long-channel model introduces in VREAD is smaller than the error it introduces in the current magnitudes themselves. For these reasons, the simplified long-channel current models do not introduce large errors in the estimated failure probabilities, and they can be used to generate preliminary estimates of them. However, the analytical models developed here neglect the leakage currents when determining the node voltages [e.g., see (10), (11), and (14)]. As the magnitude of the leakage currents increases, the error of these simplified models also increases.

    V. SENSITIVITY ANALYSIS OF FAILURE PROBABILITY

    A. Impact of Supply Voltage (VDD) and Temperature

    A drop in the supply voltage increases the cell failure probabilities [Fig. 12(a)]. The impact of a supply-voltage drop is most significant for the access-time failure. The derived model can also be extended to include the VDD of a cell as an independent Gaussian random variable. Fig. 12(b) shows that the failure probabilities increase with temperature. A temperature increase affects the access-time failure and the write failure most severely, because of: 1) the reduction in the ON current of the access transistors; and 2) the increase in the junction capacitances.

    B. Transistor Size and Cell-Failure Probability

    Fig. 13. Impact of transistor size on the distribution of TACCESS. (a) Mean of TACCESS. (b) Standard deviation of TACCESS.

    Fig. 14. Impact of (a) width (Wnax) and (b) length (Lnax) of the access NMOS transistor on the failure probabilities.

    Fig. 15. Impact of (a) width (Wpup) and (b) length (Lpup) of the pull-up PMOS transistor on the failure probabilities.

    Fig. 16. Impact of the strength of the pull-down NMOS transistor on the failure probabilities: (a) width (Wnpd); (b) length (Lnpd).

    Fig. 17. Variation of SNM and failure probability with (a) width of the access transistors; and (b) normalized cell area.

    The lengths and widths of the cell transistors (i.e., Lnax, Wnax, Lnpd, Wnpd, Lpup, and Wpup) impact the cell-failure probability principally by modifying three things. The first is the nominal values of TACCESS, VTRIP, VREAD, TWRITE, and VDDHmin. The second is the sensitivity of these parameters to Vt variation, which changes their mean and variance. The third is the standard deviation of the Vt variation itself [see (2)]. For example, Fig. 13 shows that, along with the nominal value, the mean and the standard deviation of TACCESS vary significantly with Wnpd and Wnax. In this section, we study how the strength of each transistor affects the cell-failure probability.

    Fig. 14 shows that a weak access transistor (small Wnax and/or large Lnax) reduces PRF, because VREAD decreases. However, it increases PAF and PWF, and it has very little impact on PHF. Weakening the PMOS pull-up transistors (by decreasing Wpup or increasing Lpup) reduces PWF by reducing IdsPL, but it increases PRF by lowering VTRIPRD. PAF does not depend strongly on the PMOS strength (Fig. 15). PHF improves with an increase in Wpup or a reduction in Lpup, because node L is then more strongly coupled to the supply voltage (VL ≈ VDDH) (Fig. 15).

    Increasing Wnpd (and/or reducing Lnpd) strengthens the pull-down NMOS transistors (NL and NR). A stronger NR lowers VREAD, which reduces PRF, and it also reduces PAF (Fig. 16). An increase in the width of NR has little impact on PWF. It slightly increases the nominal value of TWRITE, but the resulting reduction in the σVt of NR [see (2)] reduces the spread of TWRITE, so PWF remains almost constant (Fig. 16). In contrast, increasing Lnpd reduces both TWRITE (by raising the trip point of PR–NR) and the σVt of NR [see (2)], which significantly reduces PWF (Fig. 16). As the strength of NR increases (i.e., higher Wnpd or lower Lnpd), the VTRIP of PR–NR decreases, which initially reduces PHF. However, a larger NL width (or a lower Lnpd) also increases the leakage of NL, which pulls VL further below the applied VDDH. Consequently, a very high Wnpd (or a very low Lnpd) increases PHF (Fig. 16).

    Because the failure probabilities vary with transistor sizing in these competing ways, the choice of transistor sizes has a strong impact on the yield. Hence, a statistical approach to transistor sizing is necessary to maximize the yield, and the derived failure-probability models can be used effectively for such statistical optimizations.

    C. Static-Noise Margin and Cell-Failure Probability

    The static-noise margin (SNM) of a cell is often used as a measure of the robustness of an SRAM cell against flipping [4]. However, an increase in SNM also improves the cell's ability to hold its data, which makes the cell harder to write and increases write failures. For example, reducing the size of the access transistor improves the SNM [4]. Correspondingly, reducing Wnax decreases the read-failure probability but increases the write-failure probability [Fig. 17(a)]. Hence, the value of Wnax that maximizes SNM does not minimize the failure probability [Fig. 17(a)]. Moreover, scaling all the transistors in a cell by the same factor does not change the SNM. Such scaling does, however, considerably reduce the failure probability, because larger transistors have a smaller standard deviation of Vt variation [Fig. 17(b)]. Using the proposed models, we observe that SNM is not strongly related to the parametric failure of the memory. Consequently, a higher SNM does not necessarily reduce the overall failure probability, and an SNM-based analysis of the cell does not directly predict the memory-failure probability or the yield. Hence, statistical analysis and design of both the cells and the memory structure are necessary to ensure acceptable yield in the nanometer regime.

    VI. STATISTICAL DESIGN OF THE SRAM ARRAY

    A. SRAM Yield-Estimation Model

    Fig. 18. Memory hierarchy and failure probability.

    The hierarchy of the failure probabilities in an SRAM array is shown in Fig. 18. We first estimate the failure probability of a cell (PF) [see (24)]. PF is then used, together with the number of cells in a column (i.e., the column length), to determine the failure probability of a column (PCOL) [see (25)]. PCOL in turn is used to calculate the failure probability of the memory array (PMEM), which depends on the number of actual columns (NCOL) and of redundant columns (NRC) [see (25)]. The memory-failure probability is directly related to the yield of the memory chip. To estimate the yield, we have used Monte Carlo simulations over the inter-die distributions of L, W, and Vt, which are assumed to be Gaussian. For each inter-die value of the parameters (say LINTER, WINTER, and VtINTER), we estimate PF, PCOL, and PMEM considering the intra-die distribution of Vt. Finally, the yield is defined as

    $$\text{Yield} = 1 - \frac{\sum_{\text{INTER}} P_{MEM}(L_{INTER}, W_{INTER}, V_{tINTER})}{N_{INTER}} \qquad (32)$$

    where NINTER is the total number of inter-die Monte Carlo simulations (i.e., the total number of chips). An increase in the intra-die variation (i.e., σVT0) increases the memory-failure probability and thereby reduces the yield (Fig. 19). In this estimation, we have assumed a standard deviation of 7% for the inter-die distributions of L, W, and Vt, N = 32 cells per column, NCOL = 512, and NRC = 24.
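    Here is a minimal Monte Carlo sketch of (32). Each inter-die sample represents one chip, PMEM follows (25), and the yield is one minus the average PMEM. The mapping from the inter-die Vt shift to PF (`toy_cell_failure`) is a hypothetical placeholder, not the paper's model. Only a single inter-die parameter is sampled, for brevity. The array dimensions follow the values quoted above.

```python
import math
import random

def p_mem(p_f, n_row, n_col, n_rc):
    """Eq. (25): P_COL from P_F, then P[more than n_rc of n_col + n_rc columns fail]."""
    if p_f >= 1.0:
        return 1.0
    p_col = -math.expm1(n_row * math.log1p(-p_f))      # 1 - (1 - p_f)^n_row
    if p_col <= 0.0:
        return 0.0
    if p_col >= 1.0:
        return 1.0
    n = n_col + n_rc
    total = 0.0
    for i in range(n_rc + 1, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p_col) + (n - i) * math.log1p(-p_col))
        total += math.exp(log_term)
    return min(total, 1.0)

def toy_cell_failure(dvt_inter):
    """HYPOTHETICAL stand-in for P_F at a given inter-die Vt shift: a Gaussian tail
    whose margin shrinks linearly with the shift. Not the model of this paper."""
    margin = 0.25 - 1.0 * dvt_inter    # [V], illustrative
    sigma_intra = 0.05                 # [V], illustrative
    return 0.5 * math.erfc(margin / (sigma_intra * math.sqrt(2.0)))

def estimate_yield(n_inter=500, sigma_inter=0.03, n_row=32, n_col=512, n_rc=24, seed=1):
    rng = random.Random(seed)
    failed = 0.0
    for _ in range(n_inter):
        dvt = rng.gauss(0.0, sigma_inter)    # one inter-die sample = one chip
        failed += p_mem(toy_cell_failure(dvt), n_row, n_col, n_rc)
    return 1.0 - failed / n_inter            # eq. (32)
```

    Re-running `estimate_yield` with the same seed but a different `n_rc` compares redundancy levels on identical chip samples.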

    As observed in Section V, the cell-failure probabilities depend on the memory-cell configuration, including transistor sizing. Hence, a proper choice of cell transistor sizes has a strong impact on the memory yield [Fig. 20(a)]. Yield also depends strongly on the memory architecture. Increasing the number of cells in a column (the column length, or number of rows) increases the cell-failure probability. This is particularly true of PAF: CBL and IBL in (19) increase, which raises TACCESS. Moreover, PCOL, and hence PMEM, increases significantly with the column length [see (25)]. However, for a constant memory size, a longer column means fewer columns, which tends to reduce PMEM (assuming constant redundancy). Fig. 20(b) shows how the column-failure probability, the memory-failure probability, and the yield of a 2-kB cache vary with the column length, with the number of redundant columns kept constant. The yield depends strongly on the column length, i.e., on the memory architecture. Hence, both the cell configuration and the memory architecture can be optimized to maximize the memory yield.

    Fig. 19. Variation of yield with σVT0. The results are obtained using cells C1 and C2 shown in Table I.

    To improve the memory yield, the memory-failure probability (PMEM) needs to be minimized. From (25), PMEM depends on three groups of quantities:
    1) the lengths (Lnax, Lnpd, Lpup) and widths (Wnax, Wnpd, Wpup) of the cell transistors, which determine PF;
    2) the number of cells in a column (i.e., the number of rows, NROW), which determines PCOL;
    3) the number of actual columns (NCOL) and the number of redundant columns (NRC).

    Note that NCOL and NROW are not independent, since NCOL × NROW = size of the memory. However, NROW and NCOL are principally determined by the memory architecture (e.g., memory pitch and the complexity of the column and row decoders). Any change to NCOL and NROW must therefore account for its impact on these architectural-level parameters. To simplify the present design problem, we have not treated NROW and NCOL as design parameters. Minimizing PMEM must also account for the impact on the total leakage and the total area (AMEM). The following sections present the models we have used to estimate the leakage and the area of an SRAM array.

    B. Statistical Estimation of Leakage in SRAM

    Cell leakage is the major contributor to the power of the SRAM cell array. Hence, any optimization of the cell structure has to consider its impact on cell leakage. The total leakage of a cell principally consists of the subthreshold leakage, the gate leakage, and the junction band-to-band tunneling leakage through the different transistors in the cell (Fig. 1) [10]. Considering all of these components, the total leakage of the cell can be computed as

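    $$\begin{aligned} I_{sub} &= I_{subAXR} + I_{subNL} + I_{subPR} \\ I_{jn} &= 2I_{jnAXL} + I_{jnAXR} + I_{jnNL} + I_{jnPR} \\ I_{gate} &= I_{gdAXL} + I_{gsAXL} + I_{gdAXR} + I_{gdPR} + I_{gdNR} + I_{gsNR} + I_{gdPL} + I_{gsPL} + I_{gdNL} \\ I_{leak} &= I_{sub} + I_{jn} + I_{gate} \end{aligned} \qquad (33)$$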

    Fig. 20. Impact of (a) circuit (transistor size) and (b) architecture [number of rows (column length) and number of columns] on yield. In (b), the transistor sizes were chosen to maximize yield following (a) (σVT0 = 20 mV).

    Fig. 21. Distribution of the leakage of an SRAM cell (Ileak).

    We have used the leakage current expressions presented in [10] to evaluate the individual leakage components and the total cell leakage. However, Vt variation in the cell transistors causes significant variation in the cell leakage, particularly in the subthreshold component. The mean (μLCELL) and the standard deviation (σLCELL) of the cell leakage under RDF-induced Vt fluctuation can be obtained using the process described in (3). Since the subthreshold leakage is an exponential function of the threshold voltage, we have assumed a lognormal PDF for the cell leakage [13]. Fig. 21 shows that the lognormal model, with the mean and the standard deviation estimated using (3), closely follows the Monte Carlo simulation results. If the cells are treated as independent, identically distributed random variables (i.e., all cells have the same leakage mean and standard deviation), the total SRAM array leakage is given by

    $$I_{LeakMem} = \sum_{i=1}^{N_{CELLS}} I_{leak} = \sum_{i=1}^{N_{ROW}(N_{COL}+N_{RC})} I_{leak} \qquad (34)$$

    where Ileak is the random variable representing the leakage of a cell. By the Central Limit Theorem [13], the distribution of the total leakage can be approximated as a Gaussian with the mean (μL) and the standard deviation (σL) given by

    $$\mu_L = N_{CELLS}\,\mu_{LCELL} \quad \text{and} \quad \sigma_L^2 = N_{CELLS}\,\sigma_{LCELL}^2 \qquad (35)$$

    To account for the leakage distribution in the statistical design of the SRAM array, we define the probability (PLeakMem) that the total memory leakage meets a given leakage bound as

    $$P_{LeakMem} = P(I_{LeakMem} \le I_{LMAX}) = \Phi\!\left(\frac{I_{LMAX} - \mu_L}{\sigma_L}\right) \qquad (36)$$

    Fig. 22. SRAM cell layout.

    C. Area Estimation of SRAM

    Using the layout shown in Fig. 22 [16], the total cell area can be computed as

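    $$\begin{aligned} X_{cell} &= 5\lambda + 2\max(3\lambda, W_{nax}) + 2\max(L_{npd}, L_{pup}) \\ Y_{cell} &= 9\lambda + \max(3\lambda, W_{pup}) + \max(3\lambda, W_{npd}) + L_{nax} \\ A_{cell} &= X_{cell}\,Y_{cell} \end{aligned} \qquad (37)$$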

    where λ is the minimum feature size of the technology. In our model, Lmin = λ = 50 nm. Several SRAM cell layouts are possible, but for simplicity we have considered only the one shown in Fig. 22. The total memory area, including the area of the redundant columns, is given by

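    $$\begin{aligned} A_{actual} &= N_{ROW}\,N_{COL}\,A_{cell} \\ A_{redundant} &= N_{ROW}\,N_{RC}\,A_{cell} \\ A_{MEM} &= A_{actual} + A_{redundant} = N_{ROW}(N_{COL} + N_{RC})\,A_{cell} \end{aligned} \qquad (38)$$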

    where Aactual is the required memory area (given by the memory size) and Aredundant is the area overhead of the redundant columns.

    D. Statistical-Design Procedure

    To improve the yield of an SRAM array under parameter variation, we have developed a statistical-design method that reduces the memory-failure probability under area, performance, and leakage constraints. In the proposed method, the sizes of the cell transistors and the number of redundant columns in the array are chosen to minimize the memory-failure probability. The method can be formulated as the following minimization problem:

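    $$\begin{aligned} &\text{Minimize } P_{MEM} = f(X), \quad \text{where } X = [L_{nax}\; W_{nax}\; L_{npd}\; W_{npd}\; L_{pup}\; W_{pup}\; N_{RC}] \\ &\text{Subject to: } P_{LeakMem} \ge P_{LeakMin}, \quad A_{MEM} \le \text{Maximum Area } (A_{MAX}) \\ &\qquad\qquad\quad E[T_{AC}] = \mu_{TACCESS} \le \text{Maximum access time mean } (T_{ACMAX}) \\ &\qquad\qquad\quad \text{For all the parameters: } \{X_{MIN}\} \le \{X\} \le \{X_{MAX}\} \end{aligned} \qquad (39)$$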

    This is essentially a nonlinear optimization problem with nonlinear constraints. The upper bound on the mean access time ensures that robustness is not achieved by significantly sacrificing performance. Note that the total memory area (i.e., actual plus redundant cell area) is used as the constraint instead of the cell area alone. This allows a tradeoff between the area of the individual cells and the amount of memory redundancy. Also, to account for the leakage distribution, the constraint is a lower bound on the probability that the leakage stays below the maximum allowable limit, instead of a deterministic leakage bound.

    Fig. 23. Statistical-design procedure of SRAM.

    Fig. 23 shows the basic steps of the proposed design process. Note that NRC (the number of redundant columns) can take only discrete integer values. Because we consider redundancy only for correcting parametric failures, we allow the minimum value of NRC to be zero. Other kinds of failures (e.g., a hard short or open, or a soft error) may require NRCmin > 0. The upper bound of NRC is determined using (37) and (38), as shown below:

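    $$\begin{aligned} X_{cellmin} &= 5\lambda + 6\lambda + 2\lambda = 13\lambda \\ Y_{cellmin} &= 9\lambda + 3\lambda + 3\lambda + \lambda = 16\lambda \\ A_{cellmin} &= (X_{cellmin})(Y_{cellmin}) \\ N_{RCmax} &= \frac{A_{MEM} - N_{ROW}N_{COL}A_{cellmin}}{A_{cellmin}N_{ROW}} = \frac{A_{MEM}}{A_{cellmin}N_{ROW}} - N_{COL} \end{aligned} \qquad (40)$$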
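    The area bookkeeping of (37), (38), and (40) is straightforward to code. In this sketch, all dimensions share the unit of λ, and NRCmax is floored to an integer because NRC is discrete. The small tolerance in the floor, which guards against floating-point rounding, is our addition, and the function names are hypothetical.

```python
import math

def cell_area(lam, w_nax, w_npd, w_pup, l_nax, l_npd, l_pup):
    """Eq. (37) for the layout of Fig. 22; all dimensions in the same unit as lam."""
    x_cell = 5 * lam + 2 * max(3 * lam, w_nax) + 2 * max(l_npd, l_pup)
    y_cell = 9 * lam + max(3 * lam, w_pup) + max(3 * lam, w_npd) + l_nax
    return x_cell * y_cell

def memory_area(a_cell, n_row, n_col, n_rc):
    """Eq. (38): actual plus redundant columns."""
    return n_row * (n_col + n_rc) * a_cell

def max_redundant_columns(a_mem, lam, n_row, n_col):
    """Eq. (40), using the minimum cell (13 lam x 16 lam) and floored to an integer."""
    a_cell_min = (13 * lam) * (16 * lam)
    return math.floor(a_mem / (a_cell_min * n_row) - n_col + 1e-9)

# Minimum-size cell at lam = 50 (nm): W = 3*lam, L = lam for every transistor.
a_min = cell_area(50, 150, 150, 150, 50, 50, 50)    # 650 nm x 800 nm
```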


    Fig. 24. Optimization procedure using DLM.

    Minimizing PF requires estimating the joint probabilities given in (5), which is computationally expensive. However, note that

    $$P_F = P[AF \cup RF \cup WF \cup HF] \le P_{AF} + P_{RF} + P_{WF} + P_{HF} = P_{FMOD} \qquad (41)$$

    This allows us to minimize PFMOD instead of PF. Thus, the minimization problem in step 4 of Fig. 23 can be formulated as

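    $$\begin{aligned} &\text{Minimize } f(X) = P_{FMOD}, \quad \text{where } X = [L_{nax}\; W_{nax}\; L_{npd}\; W_{npd}\; L_{pup}\; W_{pup}] \\ &\text{Subject to: } h_1(X) = \frac{A_{cell}}{A_{cellmax}(i)} - 1 \le 0 \\ &\qquad\qquad\quad h_2(X) = \frac{P_{LeakMin}}{P_{LeakPower}} - 1 \le 0 \\ &\qquad\qquad\quad h_3(X) = \frac{\mu_{TAC}}{T_{ACMAX}} - 1 \le 0 \end{aligned} \qquad (42)$$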

    The above problem can be solved using a Lagrange-multiplier-based algorithm [17], [18]. The parameters Lnax, Wnax, Lnpd, Wnpd, Lpup, and Wpup can be treated either as continuous (any width and length above the minimum dimension is allowed) or as discrete (only a finite set of L and W values is allowed). In this work, we use the discrete-variable space, to account for the limited lithographic controllability of L and W. To solve the discrete-space Lagrangian problem, we have used the discrete Lagrangian method (DLM) described in [17]. The basic steps of this procedure are summarized in Fig. 24. In the DLM process, inequality constraints are converted into equality constraints using the function hi(X) = max(hi(X), 0). Ld(x, λ) is the generalized augmented Lagrangian function [17], defined as

    $$L_d(x, \lambda) = f(x) + \lambda^{T} H(h(x)) + \frac{1}{2}\,\lVert h(x) \rVert^{2} \qquad (43)$$

    where H is a continuous transformation function satisfying H(y) = 0 if and only if y = 0 [realized as H(y) = y²], and λ = {λ1, λ2, λ3} are the Lagrange multipliers.
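    The sketch below makes the DLM steps concrete on a toy two-variable integer problem rather than on the SRAM sizing problem. On each iteration:
    - x moves to the best unit-step neighbor of Ld in (43);
    - each multiplier grows by c·H(hi) while its constraint is violated, with H(y) = y².

    Stopping when no neighbor improves Ld and all constraints hold is a simplification of the procedure in [17]. The objective, constraint, step size c, and names are hypothetical.

```python
def dlm_minimize(f, constraints, x0, lower, upper, c=1.0, max_iter=1000):
    """Toy discrete Lagrangian method on an integer grid.

    constraints: functions g_i with the requirement g_i(x) <= 0, turned into
    h_i(x) = max(g_i(x), 0). Ld = f + sum(lam_i * H(h_i)) + 0.5 * sum(h_i^2), H(y) = y^2.
    """
    x = tuple(x0)
    lam = [0.0] * len(constraints)

    def h(xx):
        return [max(g(xx), 0.0) for g in constraints]

    def ld(xx):
        hv = h(xx)
        return f(xx) + sum(l * v * v for l, v in zip(lam, hv)) + 0.5 * sum(v * v for v in hv)

    for _ in range(max_iter):
        # Descent in x: pick the best unit-step neighbor (or stay put).
        best, best_val = x, ld(x)
        for d in range(len(x)):
            for step in (-1, 1):
                y = list(x)
                y[d] += step
                if lower[d] <= y[d] <= upper[d]:
                    val = ld(tuple(y))
                    if val < best_val:
                        best, best_val = tuple(y), val
        hv = h(x)
        if best == x and all(v == 0.0 for v in hv):
            return x  # no better neighbor and all constraints satisfied
        # Ascent in lambda, driven by the violation at the current point.
        lam = [l + c * v * v for l, v in zip(lam, hv)]
        x = best
    return x

# Toy problem: pull (x0, x1) toward (5, 5) subject to x0 + x1 <= 6 on a 0..10 grid.
best = dlm_minimize(lambda x: (x[0] - 5) ** 2 + (x[1] - 5) ** 2,
                    [lambda x: x[0] + x[1] - 6], (0, 0), (0, 0), (10, 10))
```

    In the sizing problem, the grid coordinates would index the allowed discrete L and W values, and f would be PFMOD with the constraints h1 to h3 of (42).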

    It can be observed from Fig. 23 that the complexity of the proposed design flow has a polynomial dependence on the memory size. To see this, let us analyze how the complexity of each step of the proposed flow depends on the memory size (say, MSize = NROW × NCOL). First, in the proposed design flow, the number of loop iterations required to find the optimum number of redundant columns (NRC_opt) increases linearly with the number of memory columns. Next, consider the complexity of the steps inside the main loop. The complexity of estimating PMEM depends linearly on NCOL [see (25)]. Moreover, Acellmax decreases in successive iterations of the loop (due to the increase in NRC). A reduction in Acellmax shrinks the feasible solution space for {Lnax, Wnax, Lnpd, Wnpd, Lpup, Wpup} (i.e., all possible lengths and widths of the transistors that satisfy the area constraint). This suggests that the complexity of the PFMOD minimization decreases in successive iterations. To simplify the analysis, we assume that the complexity of the PFMOD minimization depends linearly on Acellmax. Hence, the complexity of the design flow (CDesign) with respect to the memory size is given by

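    $$\begin{aligned}
    C_{\mathrm{Design}} &= \sum_{N_{RC}(i)=1}^{N_{RC\_min}} \Big[\underbrace{O(N_{COL})}_{P_{MEM}\ \text{computation}} + \underbrace{O(A_{\mathrm{cellmax}})}_{P_{F,\mathrm{MOD}}\ \text{minimization}}\Big] \\
    &= \sum_{N_{RC}(i)=1}^{N_{RC\_min}} \left[O(N_{COL}) + O\!\left(\frac{k M_{\mathrm{Size}}}{\sqrt{M_{\mathrm{Size}}}\,N_{RC}(i) + M_{\mathrm{Size}}}\right)\right]
    \end{aligned}$$

    where

    $$A_{\mathrm{cellmax}} = \frac{A_{\mathrm{MAX}}}{N_{ROW}\,\big(N_{RC}(i) + N_{COL}\big)} \qquad (44)$$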

    where, for simplicity, we assume AMAX = k·MSize and NROW = NCOL = √MSize. Simplifying the above equation, we get [18]

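    $$\begin{aligned}
    C_{\mathrm{Design}} &\approx O(N_{COL} N_{RC\_min}) + O\!\left(N_{RC\_min}\, k\sqrt{M_{\mathrm{Size}}}\,\ln\!\left(\frac{N_{RC\_min} + \sqrt{M_{\mathrm{Size}}}}{\sqrt{M_{\mathrm{Size}}}}\right)\right) \\
    \Rightarrow C_{\mathrm{Design}} &\approx O(N_{COL} N_{COL}) \approx O(M_{\mathrm{Size}}) \qquad \left[\because N_{RC\_min} = O(N_{COL}) = O\big(\sqrt{M_{\mathrm{Size}}}\big)\right]. \qquad (45)
    \end{aligned}$$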

    The above analysis shows that the complexity of the proposed design flow depends linearly on the memory size.

    E. Results and Discussions

    The statistical-design methodology described in the previous section is used to optimize the cell structure and the use of redundancy to minimize the memory-failure probability. We have applied the developed design methodology to an SRAM cell given in [19]. The cell was originally designed in a 250-nm technology and was scaled down to the 50-nm node. Applying the cell optimization successfully reduces the failure probability and improves yield (Table III).

    To understand how the optimization reduces the cell-failure probability, consider Fig. 25, which shows the read, write, access, and hold failure probabilities of the initial cell (i.e., the scaled-down version of the cell from [19]) and of the statistically designed cell. The initial cell had large read and access failure probabilities, whereas its write failure probability was low. This is because the initial cell was designed with a longer PMOS to improve the beta ratio between the pull-up PMOS and the access transistor (to facilitate writing). However, as explained in Section V, a weaker PMOS tends to increase the read and hold failures (Fig. 15). Hence, the optimization reduces the length of the PMOS, resulting in lower read and hold failure probabilities.

    The area saved by shortening the PMOS is used to widen the pull-down NMOS. This simultaneously reduces the access failure probability (and the access time) and the read failure probability. However, the width of the NMOS cannot be increased arbitrarily, as that would increase the hold failure by increasing the leakage through NL (as explained in Section V). Note also that reducing the channel length of the PMOS transistors increases the mean value of leakage. Overall, these results show that a statistical design of the SRAM cell can significantly improve the design yield.

    TABLE III. RESULTS OF STATISTICAL DESIGN

    The proposed design strategy allows a tradeoff between the redundancy area and the active cell area. Reducing the number of redundant columns leaves more area for each of the actual cells. This reduces the failure probability of the cells, thereby reducing PMEM. On the other hand, (25) shows that reducing NRC tends to increase PMEM.

    Fig. 26 shows how PMEM varies with NRC for a constant AMEM. Increasing the redundancy beyond a certain point increases the memory-failure probability. Thus, increasing the redundancy at the cost of silicon area does not necessarily reduce the memory-failure probability. Furthermore, with a higher value of σVt0, the optimum redundancy (the value that minimizes the failure probability) decreases. This indicates that with larger variations, designing a robust cell (with larger area) reduces the failure probability (improves yield) more effectively than increasing the number of redundant columns at the cost of cell area.

    Fig. 25. Modification of different failure probabilities by the statistical-design strategy.

    Fig. 26. Impact of the number of redundant columns (NRC) on memory yield under an area constraint.

    In this analysis, we have neglected the area required to implement the repair circuit that replaces the faulty columns with the redundant ones. The area required for the repair circuit also increases with the number of redundant columns. The effect of the repair-circuit area can be included in the design flow of Fig. 23 by modifying the estimate of the maximum area available for the cells in each iteration [i.e., Acellmax(i)] as

    $$A_{\mathrm{cellmax}}(i) = \frac{A_{\mathrm{MAX}} - A_{\mathrm{repair}}}{N_{ROW}\,(N_{COL} + N_{RC})} \qquad (46)$$

    where Arepair increases with NRC.

    Accounting for the repair-circuit area further reduces the optimum number of redundant columns. This is because the repair-circuit area grows with the number of redundant columns, which reduces the area available for the cells and thus increases the cell-failure probability. In this work, we have considered only column (or row) redundancy. However, combined row and column redundancy schemes are also used for yield improvement [20]–[25]. The tradeoff between the actual and the redundant area remains valid for the combined scheme, but its exact analysis is more complex [20]–[25]. In the Appendix, we present a simplified method to incorporate the combined-redundancy scheme in the proposed design flow. The principal modification is in how the memory-failure probability (PMEM) is estimated from the column (PCOL) and row (PROW) failure probabilities [i.e., (25) in the case of either row or column redundancy].


    VII. CONCLUSION

    In this work, we have analyzed the different failure mechanisms in an SRAM cell (read, write, access, and hold failures) caused by intra-die variation in the transistor threshold voltage. We have developed semianalytical models to estimate the probabilities of the different failure events. The derived models can include the correlations between the threshold voltages of the transistors in the SRAM cell. The cell-failure probability is estimated from the failure probabilities of the individual events. Using the cell-failure probability, we have developed a set of models to estimate the failure probability of an SRAM array. The derived memory-failure probability model considers the architecture of the array and the use of redundancy. The cell and memory-failure probability models have been used to predict the yield of a memory at an early stage of the design.

    Using the proposed models, we have shown that a failure-probability-based analysis is necessary to predict the performance of an SRAM cell under parametric variations. The proposed models are used for the statistical design and optimization of memory, which is necessary for maximizing yield in the nanometer regime. The developed design approach simultaneously optimizes the transistor sizes and the use of redundancy to enhance the memory yield. We have observed that, under large parametric variation, increasing the number of redundant elements (at the cost of cell area) may not improve the design yield. The proposed statistical-modeling and design approach provides an integrated circuit- and architecture-level design strategy for yield enhancement in nanoscale SRAMs.

    APPENDIX
    ESTIMATION OF MEMORY-FAILURE PROBABILITY CONSIDERING COMBINED ROW AND COLUMN REDUNDANCY

    Combined row and column redundancy has been proposed to improve the yield of memory arrays [20]–[25]. Fig. 27 shows the schematic of the memory array with the combined-redundancy scheme. Exactly estimating the memory-failure probability with combined redundancy is complex and requires knowledge of the fault locations [20]–[25]. This is because whether a memory chip with a certain number of faults can be repaired using redundancy depends on the locations of the faulty cells [20]–[25].

    In [21]–[24], the authors describe different methods for repairing memory arrays with combined redundancy based on fault-location information. However, fault-location information is not available at the design phase and hence cannot be used to estimate the memory-failure probability. In [25], the authors proposed a method for evaluating memory failure by enumerating the different fixable failure events (i.e., fault maps that can be fixed using combined redundancy). This method can be integrated into the proposed estimation models and design strategy, but it would increase the complexity of estimating PMEM. Hence, we propose to use simple lower and upper bounds on PMEM (assuming certain fault locations) for the initial estimation of PMEM in the design step presented in Fig. 23.

    Fig. 27. Combined row and column redundancy and memory-failure probability.

    The worst-case fault map for the combined-redundancy scheme occurs when no faulty cell shares a row or column with any other faulty cell (i.e., all the faulty cells are orthogonal [24]). Under this condition, each faulty cell requires either one redundant column or one redundant row. Hence, the upper bound on the memory-failure probability [PMEM(upper)] is given by

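    $$P_{\mathrm{MEM(upper)}} = \sum_{i=N_{RR}+N_{RC}+1}^{N_{ORTH}} \binom{N_{ORTH}}{i} P_F^{\,i}\,(1 - P_F)^{N_{ORTH}-i} \qquad (A.1)$$

    where NORTH (= min{NCOL + NRC, NROW + NRR}) is the total number of orthogonal cells in the array (i.e., the largest set of cells in which no two cells share a row or a column).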

    On the other hand, the best-case fault map occurs when the faulty cells are concentrated in a column or in a row. Then fewer redundant columns (or rows) are needed to correct a large number of faulty cells. A column (or row) must be replaced if the number of faulty cells in that column (or row) exceeds the total number of redundant rows (or columns). We define such a column (row) as a must-replace column (row) [23], [24]. The probability that a column (PCOLMR) or row (PROWMR) is a must-replace one is given by

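    $$\begin{aligned}
    P_{\mathrm{COLMR}} &= \sum_{i=N_{RR}+1}^{N_{ROW}+N_{RR}} \binom{N_{ROW}+N_{RR}}{i} P_F^{\,i}\,(1-P_F)^{N_{ROW}+N_{RR}-i} \\
    P_{\mathrm{ROWMR}} &= \sum_{i=N_{RC}+1}^{N_{COL}+N_{RC}} \binom{N_{COL}+N_{RC}}{i} P_F^{\,i}\,(1-P_F)^{N_{COL}+N_{RC}-i}. \qquad (A.2)
    \end{aligned}$$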

    A memory with only must-replace columns or rows is faulty if the number of must-replace columns (NMRCOL) is greater than the number of redundant columns, or the number of must-replace rows (NMRROW) is greater than the number of redundant rows. This fault map yields the lower bound on PMEM, which is given by

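    $$\begin{aligned}
    P_{\mathrm{MEM(lower)}} &= P\big[(N_{MRCOL} > N_{RC}) \cup (N_{MRROW} > N_{RR})\big] \\
    &\ge \max\big\{P(N_{MRCOL} > N_{RC}),\; P(N_{MRROW} > N_{RR})\big\}. \qquad (A.3)
    \end{aligned}$$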

    The individual failure probabilities in the above equation are given by

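    $$\begin{aligned}
    P(N_{MRCOL} > N_{RC}) &= \sum_{i=N_{RC}+1}^{N_{COL}+N_{RC}} \binom{N_{COL}+N_{RC}}{i} P_{\mathrm{COLMR}}^{\,i}\,(1-P_{\mathrm{COLMR}})^{N_{COL}+N_{RC}-i} \\
    P(N_{MRROW} > N_{RR}) &= \sum_{i=N_{RR}+1}^{N_{ROW}+N_{RR}} \binom{N_{ROW}+N_{RR}}{i} P_{\mathrm{ROWMR}}^{\,i}\,(1-P_{\mathrm{ROWMR}})^{N_{ROW}+N_{RR}-i}. \qquad (A.4)
    \end{aligned}$$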

    Hence, the overall memory failure probability is bounded as

    $$P_{\mathrm{MEM(lower)}} < P_{\mathrm{MEM}} < P_{\mathrm{MEM(upper)}}. \qquad (A.5)$$

    The design flow presented in Fig. 23 can be modified to include the combined-redundancy scheme by minimizing PMEM(upper) and PMEM(lower) through a proper choice of NRR and NRC. A simple design flow can assume NRR = NRC.

    REFERENCES

    [1] S. R. Nassif, "Modeling and analysis of manufacturing variations," in Proc. Custom Integrated Circuit Conf., San Diego, CA, 2001, pp. 223–228.

    [2] C. Visweswariah, "Death, taxes and failing chips," in Proc. Design Automation Conf., Anaheim, CA, 2003, pp. 343–347.

    [3] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De, "Parameter variation and impact on circuits and microarchitecture," in Proc. Design Automation Conf., Anaheim, CA, 2003, pp. 338–342.

    [4] A. Bhavnagarwala, X. Tang, and J. D. Meindl, "The impact of intrinsic device fluctuations on CMOS SRAM cell stability," IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 658–665, Apr. 2001.

    [5] X. Tang, V. De, and J. D. Meindl, "Intrinsic MOSFET parameter fluctuations due to random dopant placement," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 5, no. 4, pp. 369–376, Dec. 1997.

    [6] Y. Taur and T. H. Ning, Fundamentals of Modern VLSI Devices. New York: Cambridge Univ. Press, 1998.

    [7] D. Burnett, K. Erington, C. Subramanian, and K. Baker, "Implications of fundamental threshold voltage variations for high-density SRAM and logic circuits," in Symp. VLSI Technology, Honolulu, HI, Jun. 1994, pp. 15–16.

    [8] S. Mukhopadhyay, H. Mahmoodi, and K. Roy, "Modeling and estimation of failure probability due to parameter variation in nano-scale SRAMs for yield enhancement," in Dig. Tech. Papers VLSI Circuit Symp., Honolulu, HI, Jun. 2004, pp. 64–67.

    [9] S. Mukhopadhyay, H. Mahmoodi, and K. Roy, "Statistical design and optimization of SRAM cell for yield enhancement," in Proc. Int. Conf. Computer Aided Design, San Jose, CA, Nov. 2004, pp. 10–13.

    [10] S. Mukhopadhyay, A. Raychowdhury, and K. Roy, "Accurate estimation of total leakage current in scaled CMOS logic circuits based on compact current modeling," in Design Automation Conf., Anaheim, CA, Jun. 2003, pp. 169–174.

    [11] D. A. Antoniadis, I. J. Djomehri, K. M. Jackson, and S. Miller, Well-Tempered, Bulk-Si NMOSFET Device Home Page. Cambridge, MA: Microsystems Technologies Laboratories, Massachusetts Institute of Technology [Online]. Available: http://www-mtl.mit.edu/Well/

    [12] MEDICI: 2-D Device Simulation Program. Mountain View, CA: Synopsys Inc.

    [13] A. Papoulis, Probability, Random Variables and Stochastic Process. New York: McGraw-Hill, 2002.

    [14] A. Chandrakasan, W. J. Bowhill, and F. Fox, Design of High-Performance Microprocessor Circuits. Piscataway, NJ: IEEE Press, 2001.

    [15] H. Qin, Y. Cao, D. Markovic, A. Vladimirescu, and J. Rabaey, "SRAM leakage suppression by minimizing standby supply voltage," in Int. Symp. Quality Electronic Design, San Jose, CA, Mar. 2004, pp. 55–60.

    [16] R. W. Mann et al., "Ultralow-power SRAM technology," IBM J. Res. Develop., vol. 47, no. 5/6, pp. 553–566, Sep. 2003.

    [17] B. W. Wah and Y.-X. Chen, "Constrained genetic algorithms and their applications in nonlinear constrained optimization," in IEEE Int. Conf. Tools Artificial Intelligence, Vancouver, BC, Canada, Nov. 2000, pp. 286–293.

    [18] E. K. P. Chong and S. H. Zak, An Introduction to Optimization. New York: Wiley, 2001.

    [19] A. Agarwal, H. Li, and K. Roy, "A single-Vt low-leakage gated-ground cache for deep submicron," IEEE J. Solid-State Circuits, vol. 38, no. 2, pp. 319–328, Feb. 2003.

    [20] A. Chen, "Redundancy in LSI memory array," IEEE J. Solid-State Circuits, vol. 4, no. 5, pp. 291–293, Oct. 1969.

    [21] S. E. Schuster, "Multiple word/bit line redundancy for semiconductor memories," IEEE J. Solid-State Circuits, vol. 13, no. 5, pp. 698–703, Oct. 1978.

    [22] J. R. Day, "A fault-driven, comprehensive redundancy algorithm," IEEE Des. Test Comput., vol. 2, no. 2, pp. 35–44, Jun. 1985.

    [23] W.-K. Huang, Y.-N. Shen, and F. Lombardi, "New approaches for the repair of memories with redundancy by row/column deletion for yield enhancement," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 9, no. 3, pp. 323–328, Mar. 1990.

    [24] C.-T. Huang, C.-F. Wu, J.-F. Li, and C.-W. Wu, "Built-in redundancy analysis for memory yield improvement," IEEE Trans. Reliab., vol. 52, no. 4, pp. 386–399, Dec. 2003.

    [25] C. H. Stapper, A. N. McLaren, and M. Dreckmann, "Yield model for productivity optimization of VLSI memory chips with redundancy and partially good product," IBM J. Res. Develop., vol. 24, no. 3, pp. 398–409, 1980.

    Saibal Mukhopadhyay (S'99) was born in Calcutta, India. He received the B.E. degree in electronics and telecommunication electrical engineering from Jadavpur University, Calcutta, India, in 2000. He is working toward the Ph.D. degree in electrical and computer engineering at Purdue University, West Lafayette, IN.

    He was an Intern in the High-Performance Circuit-Design Department, IBM T. J. Watson Research Laboratories, Yorktown Heights, NY, during the summers of 2003 and 2004. His research interests include analysis and design of low-power and robust circuits using nanoscaled CMOS and circuit design using double-gate transistors.

    Mr. Mukhopadhyay received the IBM Ph.D. Fellowship award for 2004–2005. He received the Best Paper Award at the 2004 International Conference on Computer Design.

    Hamid Mahmoodi (S'00) received the B.S. (Hons.) degree in electrical engineering from Iran University of Science and Technology, Tehran, Iran, in 1998, and the M.S. degree in electrical and computer engineering from the University of Tehran, Iran, in 2000. His M.S. research was on low-power design of digital systems based on adiabatic switching principles. He is working toward the Ph.D. degree in electrical and computer engineering at Purdue University, West Lafayette, IN.

    His major research experiences and interests include low-power, robust, and high-performance design in nanoscale bulk CMOS and SOI technologies, nanoelectronic devices and architectures, design for yield enhancement, and VLSI testing. He has more than 30 publications in journals and conferences.

    Mr. Mahmoodi was a recipient of the 2004 ICCD Best Paper Award.


    Kaushik Roy (S'83–M'83–SM'95–F'02) received the B.Tech. degree in electronics and electrical communications engineering from the Indian Institute of Technology, Kharagpur, India, and the Ph.D. degree from the Electrical and Computer Engineering Department of the University of Illinois, Urbana-Champaign, in 1990.

    He was with the Semiconductor Process and Design Center of Texas Instruments, Dallas, TX, where he worked on FPGA architecture development and low-power circuit design. He joined the electrical and computer engineering faculty at Purdue University, West Lafayette, IN, in 1993, where he is currently a Professor and University Faculty Scholar. His research interests include VLSI design/CAD for nanoscale silicon and nonsilicon technologies, low-power electronics for portable computing and wireless communications, VLSI testing and verification, and reconfigurable computing. He has published more than 300 papers in refereed journals and conferences, holds eight patents, and is a coauthor of the books Low Power CMOS VLSI Circuit Design (New York: Wiley, 2000) and Low Voltage, Low Power VLSI Subsystems (New York: McGraw-Hill, 2005).

    Dr. Roy received the National Science Foundation Career Development Award in 1995, the IBM Faculty Partnership Award, the AT&T/Lucent Foundation Award, and the Best Paper Awards at the 1997 International Test Conference, the IEEE 2000 International Symposium on Quality of IC Design, the 2003 IEEE Latin American Test Workshop, and the 2003 IEEE Nano. He is the Chief Technical Advisor of Zenasis, Inc., Campbell, CA, and a Research Visionary Board Member of Motorola Laboratories, Schaumburg, IL (2002). He has been a Member of the Editorial Board of IEEE Design and Test, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, and IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS. He was Guest Editor for the Special Issue on Low-Power Very Large Scale Integration in IEEE Design and Test (1994), IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS (June 2000), and IEEE PROCEEDINGS: COMPUTERS AND DIGITAL TECHNIQUES (July 2002).