  • Semiconductor Quality and Reliability Handbook

    Chapter 2 Semiconductor Device Reliability Verification

    2.1 Fundamental Knowledge on Semiconductor Reliability
        2.1.1 Measures for Representing Reliability
        2.1.2 Distributions Used in Reliability Analysis
        2.1.3 Semiconductor Device Failure Pattern
            Semiconductor Device Failure Regions
            Early Failures
            Random Failures
            Wear-out Failures
    2.2 Semiconductor Reliability Verification
        2.2.1 Basic Approach to Reliability Verification
            Reliability Verification in the Development Stage
            Reliability Verification in the Prototype Stage
            Reliability Verification in the Mass Production Stage
        2.2.2 Reliability in the Development and Design Stages
            Time-Dependent Dielectric Breakdown (TDDB)
            Hot carrier (HCI)
            Negative Bias Temperature Instability (NBTI)
            Soft Error
            Electromigration
            Stress Migration
    2.3 Acceleration Model
        2.3.1 Acceleration Models for Environmental Stress
        2.3.2 Acceleration Models for Operating Stress

    2.1 Fundamental Knowledge on Semiconductor Reliability

    As equipment becomes more systematized and gains in function and performance, the social impact of and damage caused by its failures keep increasing, and equipment is now required to be highly reliable. This in turn demands even higher reliability from the individual components that make up the equipment.

    A single piece of equipment uses large numbers of semiconductor devices, and these devices often handle its main functions, so their reliability is extremely important. Semiconductor devices themselves are also becoming more miniaturized and highly integrated, with larger-scale circuits, and as their functions and performance advance and evolve into system LSIs, ensuring their reliability has become vital.

    The reliability measures, distribution functions, time-dependent failure rate trends, and failure regions needed to discuss semiconductor reliability are described below.

    2.1.1 Measures for Representing Reliability

    JIS Z 8115 (Reliability Terminology) defines reliability as "the property of an item which enables it to fulfill its required functions for the prescribed period under the given conditions." Reliability therefore includes the concept of time, and reliability measures are functions of time.

    (1) Reliability Function (Reliability): R(t)

    Reliability is the probability that an item functions correctly, without failure, up to time t.

    When n samples are operated under the same conditions and r(t) is the number of failures that have occurred by time t, the reliability R(t) is expressed by the following equation.

    $$R(t) = \frac{n - r(t)}{n}$$        Eq. 2.1.1

    (2) Failure Distribution Function (Unreliability): F(t)

    This is the probability that a failure occurs by time t, and is expressed by the following equation.

    $$F(t) = \frac{r(t)}{n}$$        Eq. 2.1.2

    Unreliability F(t) and reliability R(t) are related as follows.

    $$R(t) + F(t) = 1$$        Eq. 2.1.3

    As shown in Fig. 2-1, R(t) decreases from 1 toward 0 over time, while F(t) conversely increases from 0 toward 1. The distribution functions described below are used as the failure distribution functions F(t) of semiconductor devices.
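    As an illustration of Eq. 2.1.1 to Eq. 2.1.3, the following minimal Python sketch counts failures from test data. The function name `empirical_reliability` and the failure times are hypothetical, and it assumes every sample was on test for the whole period.

```python
from bisect import bisect_right

def empirical_reliability(failure_times, n, t):
    """Return (R(t), F(t)) from test data, per Eq. 2.1.1 and Eq. 2.1.2.

    failure_times: times at which the failed samples failed
    n: total number of samples on test (including those that never failed)
    t: time at which R and F are evaluated
    """
    r_t = bisect_right(sorted(failure_times), t)  # r(t): failures at or before t
    reliability = (n - r_t) / n                    # Eq. 2.1.1
    unreliability = r_t / n                        # Eq. 2.1.2
    return reliability, unreliability

# Hypothetical test: 10 samples, 4 of which failed at the times below (hours).
times = [120.0, 340.0, 560.0, 900.0]
for t in (100, 500, 1000):
    R, F = empirical_reliability(times, n=10, t=t)
    print(f"t = {t:>4} h: R = {R:.1f}, F = {F:.1f}, R + F = {R + F:.1f}")  # Eq. 2.1.3
```

    At every t the two measures sum to 1, as Eq. 2.1.3 requires.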

    Fig. 2-1 Relationship between F(t) and R(t)

    (3) Failure Density Function: f(t)

    This is the probability of failure per unit time at time t; that is, f(t)dt is the probability that an item fails between t and t + dt.

    $$f(t) = \frac{dF(t)}{dt}$$        Eq. 2.1.4

    (4) Failure Rate Function: λ(t)

    This is the probability that an item which has not yet failed at time t fails in the next unit time.

    $$\lambda(t) = \frac{f(t)}{R(t)} = \frac{f(t)}{1 - F(t)}$$        Eq. 2.1.5

    The failure rate function is also called the instantaneous failure rate, and is calculated from the failure distribution function F(t) using Equations 2.1.4 and 2.1.5. For semiconductor devices, the unit generally used is the FIT (Failures In Time), the number of failures per 10^9 device-hours of total operation (1 FIT = 10^-9 per hour).
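    The following Python sketch, which assumes F(t) is available as a function, estimates the instantaneous failure rate numerically from Eq. 2.1.4 and Eq. 2.1.5 and converts it to FIT. The helper names and the 50 FIT exponential F(t) are hypothetical.

```python
import math

def hazard_from_cdf(F, t, dt=1.0):
    """Approximate lambda(t) = f(t) / (1 - F(t)), Eq. 2.1.4 and Eq. 2.1.5.

    f(t) = dF/dt is estimated with a central finite difference of step dt.
    """
    f_t = (F(t + dt) - F(t - dt)) / (2.0 * dt)  # Eq. 2.1.4
    return f_t / (1.0 - F(t))                   # Eq. 2.1.5

def per_hour_to_fit(rate_per_hour):
    """Convert a failure rate in 1/h to FIT (failures per 1e9 device-hours)."""
    return rate_per_hour * 1e9

# Hypothetical F(t): exponential with lambda = 50 FIT (5e-8 per hour).
lam = 50e-9

def F_exp(t):
    return -math.expm1(-lam * t)  # 1 - exp(-lambda t), computed accurately

for t in (1e3, 1e4, 1e5):
    rate = per_hour_to_fit(hazard_from_cdf(F_exp, t))
    print(f"t = {t:>8.0f} h: lambda(t) = {rate:.2f} FIT")
```

    Because this F(t) is exponential, the printed failure rate stays at about 50 FIT for every t; substituting a Weibull F(t) with a shape parameter other than 1 would make it vary with time.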

    When the F(t) of the product in question is not known, the average failure rate obtained from the following equation is used.

    $$\bar{\lambda} = \frac{\text{total number of failures during the period}}{\text{total operating time during the period}}$$        Eq. 2.1.6


    In addition to the failure rate defined above, for the early failure region described later the cumulative failure rate after a set equipped with the semiconductor device has operated in the market for a specified time is sometimes used. Unless otherwise requested by the customer, the Sony Semiconductor Business Unit also uses the cumulative failure rate after one year as the early failure rate.

    After the early failure region, most semiconductor devices do not reach wear-out failure (genuine failure) in the actual operating environment, and the failure rate settles at the constant value of the random failure region. This value is the same as that given by Equation 2.1.6, so the average failure rate can be regarded essentially as the failure rate after the early failure region.
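    A minimal sketch of Eq. 2.1.6 in FIT units, assuming the total operating time is the sum of the operating hours of all devices (device-hours). The function name and the field data are hypothetical.

```python
def average_failure_rate_fit(failures, device_hours):
    """Eq. 2.1.6 expressed in FIT: failures per 1e9 device-hours."""
    return failures / device_hours * 1e9

# Hypothetical field data: 2 failures among 50,000 units, each run 8,760 h (one year).
device_hours = 50_000 * 8_760
print(f"average failure rate = {average_failure_rate_fit(2, device_hours):.2f} FIT")
```

    This is a point estimate. Failure rates are often quoted at an upper confidence limit (for example, one derived from the chi-square distribution), which this sketch does not compute.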

    (5) Mean Time To Failure: MTTF

    The mean time to failure (MTTF) of an item that is not subject to repair or maintenance, such as a semiconductor device, is expressed by the following equation.

    $$\mathrm{MTTF} = \int_0^\infty t\, f(t)\, dt$$        Eq. 2.1.7
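    Eq. 2.1.7 can be checked numerically. The sketch below is a hypothetical illustration, not a prescribed procedure: it integrates t f(t) with the midpoint rule for a Weibull distribution with location parameter γ = 0 and compares the result with the known closed-form Weibull mean ηΓ(1 + 1/m). The parameter values are hypothetical.

```python
import math

def mttf_numeric(pdf, t_max, steps=200_000):
    """Approximate Eq. 2.1.7, MTTF = integral of t*f(t) dt, with the midpoint rule.

    t_max must be large enough that f(t) is negligible beyond it.
    """
    h = t_max / steps
    return sum((i + 0.5) * h * pdf((i + 0.5) * h) for i in range(steps)) * h

# Hypothetical Weibull life: shape m = 2, characteristic life eta = 1000 h.
m, eta = 2.0, 1000.0

def weibull_pdf(t):
    return (m / eta) * (t / eta) ** (m - 1) * math.exp(-((t / eta) ** m))

numeric = mttf_numeric(weibull_pdf, t_max=10 * eta)
closed_form = eta * math.gamma(1 + 1 / m)  # known Weibull mean when gamma = 0
print(f"numeric {numeric:.2f} h, closed form {closed_form:.2f} h")
```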

    2.1.2 Distributions Used in Reliability Analysis

    Typical distribution functions used to analyze reliability data of semiconductor devices are described below.

    (1) Normal distribution

    The normal distribution is a typical continuous distribution used in quality control. In reliability analysis it is often applied to wear-out life, where failures concentrate around a certain time.

    The probability density function f(t) and distribution function F(t) are expressed by the following equations.

    $$f(t) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(t-\mu)^2}{2\sigma^2}\right]$$        Eq. 2.1.8

    $$F(t) = \int_{-\infty}^{t} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] dx$$        Eq. 2.1.9

    This distribution is determined by the mean μ and the standard deviation σ (σ² is the variance).

    As shown in Fig. 2-2 below, the normal distribution has a symmetrical bell shape centered on μ, and the probability that t lies within ±σ, ±2σ and ±3σ of μ is approximately 68.3%, 95.4% and 99.7%, respectively.
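    The coverage probabilities quoted above can be reproduced from Eq. 2.1.9 using the error function, since F(t) = 0.5 [1 + erf((t - μ) / (σ√2))]. A minimal Python sketch; the function names are hypothetical.

```python
import math

def normal_cdf(t, mu, sigma):
    """F(t) of Eq. 2.1.9, written with the error function."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))

def prob_within(k, mu=0.0, sigma=1.0):
    """Probability that t lies within mu +/- k*sigma."""
    return normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)

for k in (1, 2, 3):
    print(f"+/-{k} sigma: {100 * prob_within(k):.2f} %")
```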

    Fig. 2-2 Normal Distribution

    (2) Exponential distribution

    The exponential distribution represents the life distribution (failure distribution function) in the random failure region, where the failure rate is constant over time. Its probability density function f(t) and distribution function F(t) are expressed by the following equations. This distribution corresponds to the Weibull distribution, described later, with shape parameter m = 1.

    $$f(t) = \lambda e^{-\lambda t}$$        Eq. 2.1.10

    $$F(t) = 1 - e^{-\lambda t}$$        Eq. 2.1.11

    Fig. 2-3 Exponential Distribution

    As the following equation shows, the MTTF, written t0, is the inverse of the failure rate λ.

    $$\mathrm{MTTF} = t_0 = \frac{1}{\lambda}$$        Eq. 2.1.12
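    With the failure rate given in FIT, Eq. 2.1.11 and Eq. 2.1.12 reduce to simple arithmetic. The sketch below assumes a hypothetical constant failure rate of 100 FIT; the function names are illustrative.

```python
import math

def mttf_hours_from_fit(fit):
    """Eq. 2.1.12: MTTF = 1 / lambda, with lambda given in FIT (1e-9 per hour)."""
    return 1.0 / (fit * 1e-9)

def exp_cdf(t, fit):
    """Eq. 2.1.11: F(t) = 1 - exp(-lambda t), with lambda given in FIT."""
    return -math.expm1(-fit * 1e-9 * t)

fit = 100.0  # hypothetical constant failure rate
print(f"MTTF = {mttf_hours_from_fit(fit):.3g} h")
print(f"F(10 years) = {100 * exp_cdf(10 * 8760, fit):.3f} %")
```

    At 100 FIT the MTTF is 10^7 h, while the cumulative failure probability over ten years of continuous operation is still below 1%.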

    (3) Logarithmic normal distribution

    The logarithmic normal (lognormal) distribution is a distribution in which ln t, the logarithm of the life t, follows the normal distribution described above.

    The probability density function f(t) and distribution function F(t) are expressed by the following equations, where μ and σ are the mean and standard deviation of ln t.

    $$f(t) = \frac{1}{\sigma t\sqrt{2\pi}} \exp\left[-\frac{(\ln t - \mu)^2}{2\sigma^2}\right]$$        Eq. 2.1.13

    $$F(t) = \int_{0}^{t} \frac{1}{\sigma x\sqrt{2\pi}} \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right] dx$$        Eq. 2.1.14

    Fig. 2-4 Logarithmic Normal Distribution

    In semiconductor device reliability, electromigration life is generally known to follow a lognormal distribution.
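    Because ln t is normally distributed, quantiles of the lognormal distribution follow from the standard normal distribution. The sketch below parameterizes Eq. 2.1.14 by the median life t50 = exp(μ) and uses Python's `statistics.NormalDist`; the electromigration-like numbers are hypothetical.

```python
import math
from statistics import NormalDist

def lognormal_cdf(t, t50, sigma):
    """Eq. 2.1.14 with mu = ln(t50): F(t) = Phi((ln t - ln t50) / sigma)."""
    return NormalDist().cdf((math.log(t) - math.log(t50)) / sigma)

def lognormal_time_to_fraction(p, t50, sigma):
    """Time at which the cumulative failure fraction reaches p."""
    return t50 * math.exp(sigma * NormalDist().inv_cdf(p))

# Hypothetical electromigration test: median life 1000 h, sigma = 0.5.
t01 = lognormal_time_to_fraction(0.001, t50=1000.0, sigma=0.5)
print(f"0.1 % of lines fail by about {t01:.0f} h")
print(f"check: F({t01:.0f} h) = {lognormal_cdf(t01, 1000.0, 0.5):.4f}")
```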

    (4) Weibull distribution

    The Weibull distribution is a weakest-link model proposed by W. Weibull (Sweden) in 1939 as a distribution of mechanical breakdown strength. J. H. K. Kao applied this model in 1955 to analyze the life of vacuum tubes, and it has since been widely used to model life distributions in semiconductor device reliability analysis.

    The probability density function f(t) and distribution function F(t) are expressed by the following equations (t ≥ γ).

    $$f(t) = \frac{m}{\eta}\left(\frac{t-\gamma}{\eta}\right)^{m-1} \exp\left[-\left(\frac{t-\gamma}{\eta}\right)^{m}\right]$$        Eq. 2.1.15

    $$F(t) = 1 - \exp\left[-\left(\frac{t-\gamma}{\eta}\right)^{m}\right]$$        Eq. 2.1.16

    Fig. 2-5 Weibull Distribution

    Here, m is called the shape parameter, η the scale parameter (characteristic life), and γ the location parameter.

    Setting t0 = η^m, the failure rate λ(t) is expressed by the following equation.

    $$\lambda(t) = \frac{m\,(t-\gamma)^{m-1}}{t_0}$$        Eq. 2.1.17

    The value of the shape parameter m indicates the failure pattern as follows.

    0 < m < 1: Early failure (DFR) pattern, in which the failure rate decreases over time

    m = 1: Random failure (CFR) pattern, in which the failure rate is constant (identical to the exponential distribution)

    m > 1: Wear-out failure (IFR) pattern, in which the failure rate increases over time
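    The three patterns can be seen by evaluating Eq. 2.1.17 at a few times. This is a minimal sketch with γ = 0 and a hypothetical characteristic life of 1000 h.

```python
def weibull_hazard(t, m, eta, gamma=0.0):
    """Eq. 2.1.17: lambda(t) = m (t - gamma)^(m-1) / t0, with t0 = eta^m (t > gamma)."""
    t0 = eta ** m
    return m * (t - gamma) ** (m - 1) / t0

eta = 1000.0  # hypothetical characteristic life in hours
for m in (0.5, 1.0, 3.0):
    rates = [weibull_hazard(t, m, eta) for t in (10.0, 100.0, 1000.0)]
    if rates[0] > rates[-1]:
        trend = "decreasing (DFR)"
    elif rates[0] < rates[-1]:
        trend = "increasing (IFR)"
    else:
        trend = "constant (CFR)"
    print(f"m = {m}: {trend}")
```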

    2.1.3 Semiconductor Device Failure Pattern

    Semiconductor Device Failure Regions

    As with general electronic equipment, semiconductor device failures fall into three regions: the early, random and wear-out failure regions. The time-dependent trend in the failure rate forms a curve called the bathtub curve, shown in Fig. 2-6.

    This curve is the sum of the early failure rate, which decreases steadily over time, the random failure rate, which is constant, and the wear-out failure rate, which increases steadily over time. For semiconductor devices, however, the random failure rate is thought to consist only of small contributions such as soft errors, described later. The failure rate in the random failure region (the height of the bottom of the bathtub) can therefore be said to be dominated by the sum of the early failure rate as it converges toward a constant value and the wear-out failure rate as it begins to rise.

    Fig. 2-6 Time-Dependent Change in Semiconductor Device Failure Rate (failure rate vs. operating time from product shipment: early, random and wear-out failure regions; early, random and wear-out failure rate components; useful life)

    Early Failures

    The failure rate in the early failure period is called the early failure rate (EFR), and it decreases monotonically over time. The vast majority of semiconductor device early failures are caused by defects built into devices, mainly in the wafer process. The most common causes of these defects are dust adhering to wafers during the wafer process and crystal defects in the gate oxide film or the silicon substrate. Most devices containing defects rooted in the manufacturing process fail within the manufacturing process and are rejected as defective in final sorting. However, a certain percentage of devices with relatively minor defects may not yet have failed at final test and may be shipped as good products. Such inherently defective devices often fail when stress (voltage, temperature, etc.) is applied for a relatively short period, and show a high failure rate during the customer's mounting process or soon after shipment. Because these defective devices fail and drop out of the population over time, the rate at which early failures occur decreases.

    This property, a failure rate that decreases over time, can be exploited by a screening method known as burn-in, in which stress is applied for a short time before shipment to eliminate devices containing initial defects. Product groups from which devices with inherent initial defects have been removed to a certain degree by burn-in not only show an improved early failure rate in the market but can also maintain high quality over a long period, as long as they do not enter the wear-out failure region.

    An overview of burn-in is described below.

    (1) Derivation of the failure distribution function of the early failure period

    To determine burn-in conditions that reliably remove devices with inherent early-failure defects, the failure distribution function of the early failure period must be obtained.

    To obtain this function, highly accelerated life tests are run for a short time on a sample large enough to be certain to contain devices with inherent initial defects (normally several thousand to ten thousand pieces). The resulting failure time data is plotted on Weibull probability paper, and the failure distribution function is estimated from the regression line.

    Fig. 2-7 shows an example of this process. The shape parameter m and the characteristic life η that determine the Weibull distribution in the following equation are obtained from the linear regression.

    $$F(t) = 1 - \exp\left[-\left(\frac{t}{\eta}\right)^{m}\right]$$        Eq. 2.1.18

    This method of obtaining the failure distribution function is called a burn-in study.

    Fig. 2-7 Weibull Plot of the Burn-in Study

    Note) Weibull probability paper is scaled so that failure times following a Weibull distribution plot as a straight line.
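    The regression on Weibull paper can be sketched numerically. The sketch below assumes Bernard's median-rank approximation as the plotting position and least-squares regression of ln(-ln(1 - F)) on ln t, which is one common choice; the text does not state which plotting position is used. It also assumes γ = 0, as in Eq. 2.1.18. The function name and test values are hypothetical.

```python
import math

def weibull_fit(failure_times, n):
    """Estimate the shape m and characteristic life eta of Eq. 2.1.18.

    Each failure is plotted as on Weibull probability paper:
        x = ln t,   y = ln(-ln(1 - F))
    with F for the i-th earliest failure given by Bernard's median-rank
    approximation (i - 0.3) / (n + 0.4). n is the total sample size, so
    units that survived the test still count toward the ranks.
    Since y = m ln t - m ln eta, the fitted slope is m.
    """
    xs, ys = [], []
    for i, t in enumerate(sorted(failure_times), start=1):
        F = (i - 0.3) / (n + 0.4)
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - F)))
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx                  # slope = shape parameter
    intercept = y_bar - m * x_bar  # intercept = -m ln eta
    eta = math.exp(-intercept / m)
    return m, eta

# Synthetic check with hypothetical values: 10 failures out of n = 5000,
# placed exactly on a Weibull line with m = 0.5 and eta = 1e6 h.
n, m_true, eta_true = 5000, 0.5, 1e6
times = [eta_true * (-math.log(1.0 - (i - 0.3) / (n + 0.4))) ** (1.0 / m_true)
         for i in range(1, 11)]
m_hat, eta_hat = weibull_fit(times, n)
print(f"m = {m_hat:.3f}, eta = {eta_hat:.3g} h")
```

    Because the synthetic points lie exactly on a Weibull line, the fit recovers the assumed parameters; real burn-in study data scatter around the line.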


    (2) Determining the burn-in conditions

    The screening (burn-in) conditions required to reduce the early failure rate after shipment (Note 1) to the target value can be determined from the failure distribution function F(t) obtained in the burn-in study.

    Let t0 be the burn-in time and K the acceleration factor of the burn-in conditions relative to the market environment. The cumulative early failure rate eliminated by burn-in is then F(Kt0), and the new cumulative early failure rate F*(t) up to time t after burn-in is obtained from the following formula.

    $$F^{*}(t) = F(Kt_0 + t) - F(Kt_0)$$        Eq. 2.1.19

    This relationship is shown graphically in Fig. 2-8.

    The burn-in conditions are chosen as a combination of acceleration conditions and time that reduces this value to the target early failure rate or lower. Normally, the initial defects that cause early failures occur at the highest rate in the initial stages of process development and then decrease as the process is improved and matured. Because the early failure rate decreases in proportion to these initial defects, the burn-in time is reviewed as appropriate in line with process improvements.
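    The selection of burn-in conditions can be sketched as a search over candidate burn-in times using Eq. 2.1.18 and Eq. 2.1.19. Everything below is hypothetical: the Weibull parameters, the acceleration factor K = 200, the target of 100 ppm in the first year, and the candidate times. The sketch applies Eq. 2.1.19 as written, that is, as a fraction of the original population.

```python
import math

def weibull_cdf(t, m, eta):
    """Eq. 2.1.18: F(t) = 1 - exp[-(t/eta)^m]."""
    return -math.expm1(-((t / eta) ** m))

def post_burn_in_cdf(t, t0, K, m, eta):
    """Eq. 2.1.19: cumulative early failure rate up to market time t after burn-in.

    t0 is the burn-in time and K the acceleration factor of the burn-in
    conditions relative to the market, so burn-in consumes K*t0 of market life.
    """
    return weibull_cdf(K * t0 + t, m, eta) - weibull_cdf(K * t0, m, eta)

def shortest_burn_in(target, t_market, K, m, eta, candidates):
    """Shortest candidate burn-in time that meets the target, else None."""
    for t0 in sorted(candidates):
        if post_burn_in_cdf(t_market, t0, K, m, eta) <= target:
            return t0
    return None

# Hypothetical burn-in study result: m = 0.3, with eta chosen so that
# 500 ppm of devices would fail in the first year (8760 h) without burn-in.
m = 0.3
eta = 8760.0 / (-math.log1p(-500e-6)) ** (1 / m)
K = 200.0                              # hypothetical acceleration factor
candidates = [0, 12, 24, 48, 96, 168]  # burn-in times to try, in hours

t0 = shortest_burn_in(100e-6, 8760.0, K, m, eta, candidates)
print(f"shortest burn-in meeting 100 ppm in the first year: {t0} h")
```

    With m < 1, F(t) rises steeply at first and then flattens, which is why consuming part of the early life in burn-in lowers the failure rate seen in the market.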

    Fig. 2-8 Early Failure Screening by Burn-in (cumulative early failure rate F(t) vs. time: the early failures eliminated by burn-in screening before shipment)

    Note 1) The early failure rate described in this section is not the instantaneous failure rate but the cumulative failure rate over the specified period. See the [Supplement] under 2.1.1 Measures for Representing Reliability.

    Random Failures

    When devices containing initial defects have been eliminated to a certain degree, the early failure rate becomes extremely small and the failure rate follows a gradually declining curve over time. In this state the failure distribution is close to an exponential distribution, and this is called the random failure period. The semiconductor device failure rate during this period is extremely small compared with the early failure rate immediately after shipment, and can normally be ignored for most purposes. In terms of failure mechanisms, very few semiconductor device failures can be clearly defined as random failures. However, memory soft errors and other phenomena caused by α rays and other high-energy particles are sometimes classified as randomly occurring failure mechanisms.

    When predicting semiconductor device failure rates, failures that occur sporadically long after the start of operation, and failures whose cause could not be determined, are in some cases treated as random failures. However, most of these are thought to be devices containing relatively minor initial defects (dust or crystal defects) that fail after a long time, and they should properly be placed on the decaying early failure rate curve. This type of failure rate cannot be estimated from tests that use few samples, such as reliability tests. There are also phenomena such as ESD breakdown, overvoltage (surge) breakdown (EOS: electrical overstress) and latch-up that occur randomly depending on the conditions of use. However, these phenomena are all caused by stress exceeding the device absolute maximum ratings, so they are classified as breakdowns rather than failures and are not included in the random failure rate.

    Wear-out Failures

    Wear-out failures are failures rooted in the durability of the materials that make up semiconductor devices and of elements such as transistors, metal lines and oxide films, and they are the index that determines the device life (useful years). In the wear-out failure region, the failure rate increases with time until ultimately all devices fail or develop characteristic defects.

    The main wear-out failure mechanisms for semiconductor devices are as follows.


    Hot carrier-induced characteristics fluctuation

    Time-dependent dielectric breakdown (TDDB)

    Laser diode luminance degradation

    Semiconductor device life is defined as the time (or stress) at which the cumulative failure rate for the wear-out failure mode reaches a prescribed value, and it can be estimated from the results of reliability tests and test element group (TEG) evaluation.

    Semiconductor device life is often determined by the reliability of each element (metal lines, oxide film, interlayer film, transistors, etc.) making up the device, and these reliabilities are evaluated using a TEG for each element in the process development stage. The TEG evaluation results are incorporated into the design rules as allowable stress limits (electric field strength, current density, etc.) that suppress wear-out failures in the product stage and ensure long-term reliability. As a result, semiconductor devices show almost no wear-out failures within the time (stress) range of product-stage reliability tests.

    (1) Life estimation method

    Semiconductor device life can be obtained as follows from the wear-out failure data produced by TEG evaluation and reliability tests. First, the time-dependent cumulative failure rate is fitted by linear regression on Weibull or lognormal probability paper. The life is then obtained from the time (or stress) at which the reference cumulative failure rate is reached, combined with the acceleration factor of the accelerated test conditions (Fig. 2-9).
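    The life estimation steps can be sketched as follows, assuming the test data have already been fitted to a Weibull distribution (γ = 0) and that a single acceleration factor links the test and market conditions. The function names, the parameter values and the 0.1% reference cumulative failure rate are hypothetical.

```python
import math

def weibull_time_to_fraction(p, m, eta):
    """Invert Eq. 2.1.18: time at which the cumulative failure fraction reaches p."""
    return eta * (-math.log1p(-p)) ** (1.0 / m)

def estimated_life(p_ref, m, eta_test, acceleration_factor):
    """Life in the use environment = AF x (time to p_ref under test stress)."""
    return acceleration_factor * weibull_time_to_fraction(p_ref, m, eta_test)

# Hypothetical wear-out fit under test stress: m = 2.5, eta = 3000 h,
# reference cumulative failure rate 0.1 %, acceleration factor 1000.
life_h = estimated_life(0.001, m=2.5, eta_test=3000.0, acceleration_factor=1000.0)
print(f"estimated life: {life_h:.3g} h ({life_h / 8760:.1f} years)")
```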

    Fig. 2-9 Failure Rate Prediction Method Using Weibull Probability Plotting Paper (time axis 0.1 h to 100,000 h; the accelerated test failure distribution is shifted by the acceleration factor to give the predicted market-environment failure distribution)

    2.2 Semiconductor Reliability Verification

    2.2.1 Basic Approach to Reliability Verification

    The Sony Semiconductor Business Unit performs reliability verification that takes semiconductor device failure modes (see Fig. 2-10) into account at every stage from process development through mass production.

    Fig. 2-10 Semiconductor Device Failure Rate Curve (failure rate vs. operating time: early failure mode (extrinsic failures) for a new process and after burn-in; wear-out failure mode (intrinsic failures))

    Reliability Verification in the Development Stage

    The time to wear-out failure (intrinsic failure) of a semiconductor device, that is, its life, is determined by the failure mechanisms of the process elements described in 2.2.2.

    In the process development stage, reliability is evaluated using test element groups (TEGs) suited to verifying these failure mechanisms, to confirm that the prescribed reliability is satisfied.

    Reliability Verification in the Prototype Stage

    (1) Reliability verification for wear-out failures (intrinsic failures)

    Small quantities of prototypes are evaluated over long periods to verify that wear-out failures do not occur in the assumed operating environments and operating periods (see Table 2-1).

    (2) Reliability verification for early failures (extrinsic failures)

    Semiconductor devices tend to have a high failure rate at the start of operation, which then decreases steadily over time. This is because a certain percentage of devices have inherent manufacturing defects, such as dust, that cause them to fail. This tendency is more pronounced for new processes, so burn-in studies are performed when production is introduced to verify the early failure rate. When the prescribed failure rate is not satisfied, burn-in and other screening methods are used to remove semiconductor devices with inherent manufacturing defects.

    The Sony Semiconductor Business Unit continuously works to stabilize and improve its processes, and strives to reduce the number of semiconductor devices with inherent manufacturing defects so that the prescribed early failure rates can be satisfied without burn-in.

    Reliability Verification in the Mass Production Stage

    Mass production items are sampled* and their reliability is periodically evaluated at the product level, corresponding to (1) above, to confirm that the wear-out failure reliability level built in at the development stage is maintained throughout mass production.

    * Samples are taken from each product family, taking into account combinations of wafer process, assembly process, factory, and other factors.

    Table 2-1 shows typical LSI product reliability test items used by the Sony Semiconductor Business Unit.

    Table 2-1 Typical Sony LSI Product Reliability Test Items

    Name of test                                      Code        Test conditions
    High Temperature Operating Life                   HTOL        Tj ≥ 125°C, Vop_max, 1000 h
    Low Temperature Operating Life                    LTOL        Ta = -55°C, Vop_max, 1000 h
    Temperature Humidity Bias                         THB         Ta = 85°C, 85%RH, Vop_max, on/off, 1000 h
    High Temperature Storage                          HTS         Ta = 150°C, 1000 h
    Temperature Cycling                               TC          Ts = -65 to 125°C, 700 cycles
                                                                  Ts = -40 to 125°C, 850 cycles
                                                                  Ts = -65 to 150°C, 500 cycles
    Moisture Sensitivity Level                        MSL         Level 3 (standard rank) (J-STD-020)
    Electrostatic Discharge, Human Body Model         HBM         C = 100 pF, R = 1500 Ω (JS-001-2014)
    Electrostatic Discharge, Charged Device Model     CDM         Charged device model (JESD22-C101)
    Latch-Up, Trigger Pulse Current Injection Method  LU I-Test   Trigger pulse current injection method (JESD78)
    Latch-Up, Supply Overvoltage Method               LU V-Test   Power supply overvoltage method; Ta = 25°C, 125°C (JESD78)
    Burn-In Study (Early Life Failure Rate)           BIS (ELFR)  Tj ≥ 125°C, Vop_max

    2.2.2 Reliability in the Development and Design Stages

    Semiconductor devices have failure mechanisms unique to semiconductors, and resolving them in the process development stage is an important part of securing reliability. Stable product reliability can be secured by verifying the required reliability while developing each process element and reflecting the results in the design rules.

    Table 2-2 shows typical failure mechanisms that can pose problems in the process development stage. As processes are miniaturized, higher internal electric fields, higher current densities, greater metal line stress and other factors increase the stress applied to transistors and metal lines. At the same time, faster circuit speeds and increased parasitic impedance (metal line resistance, parasitic capacitance) reduce operating margins, which makes transistor characteristics fluctuation a major issue in securing reliability.

    Typical semiconductor device failure mechanisms that can pose problems in the process development and design stages are described below.

    Table 2-2 Typical Failure Mechanisms in the Process Development Stage

    Process element / Failure mechanism: Failure mode and cause

    Gate dielectric film
        Time-dependent dielectric breakdown (TDDB): Dielectric breakdown of the gate dielectric film. Bias applied to the gate electrode for a long time produces defects in the gate dielectric film, increasing the micro leak current and leading to dielectric breakdown.

    Transistor
        Hot carrier (HCI): Transistor characteristics fluctuation due to hot carriers trapped in the gate dielectric film. Electrons accelerated by high electric fields generate high-energy electrons and holes by impact ionization; these are trapped in the oxide film, causing the transistor characteristics to fluctuate.
        NBTI (slow trap): PMOS transistor characteristics fluctuation due to a negative gate bias (NBT). Also called the slow trap phenomenon: applying the bias at high temperature increases the interface states and positive fixed charge, causing the transistor characteristics to fluctuate.

    Memory device
        Soft error: Memory data rewrite error caused by high-energy cosmic ray particles (neutrons, protons, etc.), α rays, etc. This is a temporary data error that occurs mainly in DRAM and SRAM.
        Retention/disturb: Loss of non-volatile memory data. Long-term storage or operating environment stress (read/write electric field, temperature, stress) causes the charge trapped in a Flash memory to be lost, inverting the data.

    Metal lines
        Electromigration: Increased metal line resistance and disconnection due to voids forming in metal lines. Collisions between electrons and metal atoms move the metal atoms, creating voids.
        Stress migration: Metal creep caused by metal line stress makes voids form and grow in metal lines and connection (via hole) portions, resulting in open defects. In copper lines, vacancies (atom holes) driven by metal line stress induce the creep phenomenon, causing voids to form and grow.

    Low-k interlayer films
        TDDB between metal lines: Short circuit due to dielectric breakdown between copper lines. This mainly consists of dielectric breakdown along the CMP interface of an interlayer dielectric film made of low-k material, resulting in a short circuit between metal lines.

    Time-Dependent Dielectric Breakdown (TDDB)

    The gate dielectric film of a MOSFET has a failure mechanism whereby applying an electric field, even one at or below the dielectric withstand voltage, for a long time causes the film to deteriorate and eventually break down. This breakdown of the dielectric film over time is called time-dependent dielectric breakdown (TDDB). The TDDB life of the gate dielectric film is one of the most important factors determining the long-term reliability of MOS-type semiconductor devices. TDDB life is said to set the limit on how thin the gate dielectric film can be made, and in system LSIs the gate dielectric film thickness is also sometimes determined by the TDDB life in accordance with the logic circuit supply voltage.

    (1) Gate dielectric film life distribution

    Time-dependent dielectric breakdown can generally be divided into an initial breakdown area rooted in defects and a genuine life area. Fig. 2-11 shows TDDB measurement data for a gate oxide film (SiO2) plotted on a Weibull distribution. The initial breakdown area and the genuine life area can be distinguished by their different Weibull shape parameters (slopes of the plot).

    Dielectric film that falls in the initial breakdown area, with a short TDDB life, is oxide film containing defects that may fail within a short time in the market, so it is important to suppress the defect occurrence rate to lower the early failure rate.

    In contrast, the genuine life area represents the natural life of gate dielectric film without major defects, and is a necessary index for assuring long-term reliability. The genuine life at the actual operating voltage can be predicted from TDDB results obtained under accelerated high-electric-field stress conditions, using an electric field acceleration model. Depending on the film thickness and film type, the E model ($\mathrm{TTF} \propto \exp(-\gamma E)$), the power-law model ($\mathrm{TTF} \propto E^{-n}$) and other models are used (see Fig. 2-12).
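    To show how the choice of acceleration model affects the extrapolation, the sketch below scales a time to failure measured at a test field to a lower use field with the E model (TTF proportional to exp(-γE)) and with the power-law model (TTF proportional to E^-n). The field values, γ and n are hypothetical and are not taken from Fig. 2-12.

```python
import math

def life_e_model(ttf_test, e_test, e_use, gamma):
    """E model, TTF proportional to exp(-gamma * E): scale from test to use field."""
    return ttf_test * math.exp(gamma * (e_test - e_use))

def life_power_law(ttf_test, e_test, e_use, n):
    """Power-law model, TTF proportional to E^-n: scale from test to use field."""
    return ttf_test * (e_test / e_use) ** n

# Hypothetical values: 100 s TTF at 10 MV/cm, use field 5 MV/cm,
# gamma = 4 cm/MV for the E model, n = 40 for the power-law model.
ttf_test, e_test, e_use = 100.0, 10.0, 5.0
print(f"E model:   {life_e_model(ttf_test, e_test, e_use, gamma=4.0):.3g} s")
print(f"power law: {life_power_law(ttf_test, e_test, e_use, n=40):.3g} s")
```

    Far from the test field the two models can differ by orders of magnitude, which is why the model has to be matched to the film thickness and film type.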

    (2) Gate dielectric film breakdown mechanism

    Gate dielectric film contains a large number of micro defects and impurities introduced in the wafer process, and micro leak currents flow through these defects even when the applied electric field (supply voltage) is below the genuine withstand voltage. Over time, these leak currents generate new defects in the dielectric film, and the accumulation of these defects leads to dielectric film breakdown.

    The percolation model is a typical model of the TDDB breakdown mechanism in thin gate dielectric film. In this model, dielectric breakdown occurs when the defects initially present in the gate dielectric film, together with new defects generated by the tunnel current that flows under the applied electric field, form a continuous path through the film thickness. (See Fig. 2-13.)

    As the gate dielectric film becomes thinner, fewer defects are needed to form the continuous defect path that causes dielectric breakdown, so the variance of the TDDB life increases. In addition, data written in Flash memories can be lost (a retention failure) due to micro leak currents flowing before breakdown.
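    The link between thinner films and larger TDDB variance can be illustrated with a simplified cell-based version of the percolation model. The sketch assumes, for illustration only, that the film is divided into N independent columns of n cells each (n grows with film thickness), that each cell becomes defective independently with probability p, and that breakdown occurs once any column is completely defective, so F = 1 - (1 - p^n)^N. On a Weibull plot against ln p the slope is then close to n; if p grows in proportion to stress time, a thinner film (smaller n) gives a shallower Weibull slope, that is, a wider spread of breakdown times. All names and parameter values are hypothetical.

```python
import math


def weibull_variable(p, n_cells, n_columns):
    """ln(-ln(1 - F)) for F = 1 - (1 - p**n_cells)**n_columns (computed with log1p for stability)."""
    return math.log(n_columns) + math.log(-math.log1p(-(p ** n_cells)))


def weibull_slope(n_cells, n_columns=10**6, p_lo=1e-3, p_hi=1e-2):
    """Slope of the Weibull plot with respect to ln(p) between two defect probabilities."""
    w_lo = weibull_variable(p_lo, n_cells, n_columns)
    w_hi = weibull_variable(p_hi, n_cells, n_columns)
    return (w_hi - w_lo) / (math.log(p_hi) - math.log(p_lo))


for n_cells in (2, 4, 8):  # fewer cells per column stands in for a thinner film
    print(n_cells, round(weibull_slope(n_cells), 3))
```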

    Fig. 2-11 TDDB Data Distribution (Weibull) [figure: early failure area (oxide film that includes defects) and genuine life distribution]

    Fig. 2-12 Electric Field Acceleration Model and Life Prediction [figure: EFIELD = actual electric field, ETEST = test electric field]

    Fig. 2-13 Gate Dielectric Film Breakdown Model (Percolation Model)

    Hot carrier (HCI)

    Hot carrier degradation is a failure mechanism in which a charge (carrier) that has gained high energy, mainly through acceleration by the electric field inside the MOS FET, is trapped in the gate dielectric film, causing the transistor characteristics to shift and resulting in circuit malfunction. In a typical operating environment, the greatest transistor deterioration is caused by Drain Avalanche Hot Carrier (DAHC) injection, which occurs when electrons flowing along an NMOS FET channel are accelerated by the high electric field near the drain. On the other hand, the same hot carrier mechanism of injecting charge into the dielectric film is also used to write and erase data in non-volatile memory.

    (1) Drain Avalanche Hot Carrier (DAHC) injection

    Electrons flowing in an NMOS FET channel are accelerated by the high electric field near the drain and undergo impact ionization, generating electron-hole pairs. Of the electrons and holes, the carriers with the higher energy (hot carriers) are injected into and trapped by the gate dielectric film, causing the transistor characteristics to shift (threshold voltage shift, drop in drain current, etc.). This is called Drain Avalanche Hot Carrier (DAHC) injection. (See Fig. 2-14.)

    The dominant DAHC injection mode in an NMOS FET is electron injection, and the maximum deterioration occurs when the gate voltage is approximately 1/2 VDS. This means that in a CMOS circuit, hot electron injection occurs each time the signal switches (H→L or L→H), so deterioration progresses as the circuit operates.

    This problem can be avoided by selecting, at the circuit design stage, operating conditions (voltage, duty) under which hot carriers are not easily generated, and reliability can also be increased by giving circuits the required operating margin. Device countermeasures are also taken, such as adopting a lightly doped drain (LDD) structure that suppresses hot carrier generation by reducing the electric field near the drain.

    [Fig. 2-13 panels: (a) Initial stage, (b) Defect generated by micro leak current, (c) Breakdown occurs]


    Fig. 2-14 DAHC Mechanism

    Negative Bias Temperature Instability (NBTI)

    Negative bias temperature instability (NBTI) is the phenomenon in which transistor characteristics shift when a negative gate bias is applied to a PMOS FET. It is one of the transistor deterioration mechanisms known as slow traps. In the latest MOS processes, PMOS FETs use surface-channel-type transistors, which increases this deterioration and makes NBTI a transistor reliability problem on a par with hot carriers.

    (1) NBTI deterioration mechanisms

    When a negative bias is applied to a PMOS FET, holes at the Si surface are captured by Si-H bonds at the Si-SiO2 interface, and hydrogen (H) dissociates from the Si-H bond, generating an interface state. The dissociated hydrogen diffuses into and is trapped within the gate dielectric film, generating a positive fixed charge that promotes deterioration of the transistor characteristics.

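    $$\mathrm{Si_3{\equiv}Si{-}H} + \text{hole} \rightarrow \mathrm{Si_3{\equiv}Si^{+}} + \mathrm{H}$$

    $$\mathrm{H} + \mathrm{H} \rightarrow \mathrm{H_2}$$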

    The interface state generated at the interface between the Si and the gate dielectric film captures positive charge (holes) when the PMOS FET operates and becomes positively charged. Together with the positive fixed charge generated in the dielectric film, this causes the transistor threshold voltage (Vth) to shift and the drain current to drop.

    One characteristic of NBTI is that deterioration occurs whenever negative bias is applied to the gate, regardless of whether the transistor is switching, so deterioration proceeds even in circuits that are not operating. On the other hand, the shifted characteristics recover rapidly once the negative bias stress is removed, and the amount of shift under operation is known to be largely independent of the operating frequency. In terms of process conditions, the amount of NBTI deterioration is closely related to the concentrations and profiles of impurities (N, H, B, etc.) in the gate dielectric film, and it increases in particular for gate dielectric films with a high nitrogen (N) content (SiON, SiN).



    This problem can be avoided by design countermeasures such as providing sufficient circuit operating margin to allow for transistor deterioration and reducing the electric field applied to the gate dielectric film. Device countermeasures are also taken, such as forming the gate dielectric film so that interface states and fixed charges are not easily generated.

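    Fig. 2-15 NBTI Failure Mechanisms [figure: Si-SiO2 interface terminated by hydrogen (H) (negative bias applied); hole trapping by the tunnel phenomenon; dissociation of hydrogen (H) and generation of an interface state; diffusion into the oxide film and generation of a positive fixed charge]

    Soft Error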

    When α rays, or high-energy neutrons generated by cosmic rays, penetrate memory elements and other semiconductor devices, large quantities of electron-hole pairs are generated within the silicon crystal. This charge can invert memory nodes, resulting in memory data errors known as soft errors. A soft error temporarily inverts memory or logic circuit data, and the error can be recovered by rewriting the data. This phenomenon was originally a problem for DRAM, but it is now also considered a problem for SRAM reliability.

    (1) Principle of soft error generation by α rays

    The quartz materials used in the sealing resin of semiconductor packages contain trace amounts of radioactive elements (uranium: ²³⁸U; thorium: ²³²Th). In addition, the lead bumps used in flip chips sometimes contain polonium (²¹⁰Po). When the high-energy α rays emitted by these radioactive elements penetrate the silicon substrate, electron (e-) and hole (h+) pairs are generated along the α-ray path inside the silicon. The electric field causes the electrons generated inside the depletion layer to drift and collect in the n diffusion area, which lowers the potential of the memory node capacitance. (See Fig. 2-16.)

    Fig. 2-17 shows the soft error mechanism in the SRAM memory cell. When the potential of the High-side memory node falls below the threshold voltage of the driver transistor, the two inverters forming the flip-flop both turn off at the same time, making the flip-flop unstable and causing misoperation. In general, when the word line is selected, the High-side memory node potential (Vh) drops to Vcc - Vth (Vth: word transistor threshold voltage). When the word line is deselected, the High-side memory node is charged through the memory cell load and its potential returns to Vcc. The shorter this recovery time from Vcc - Vth to Vcc, that is, the greater the current supply capacity of the memory cell load, the more resistant the SRAM is to soft errors.
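    The dependence on the load current can be seen with a first-order estimate. Assuming, purely for illustration, that the load charges the node capacitance C with a constant current I, the time to restore the node from Vcc - Vth to Vcc is t = C·Vth / I. The function name and the capacitance, threshold, and current values below are hypothetical.

```python
def recovery_time_s(c_node_f, vth_v, i_load_a):
    """Constant-current estimate of the time to recharge a node by vth_v volts: t = C * dV / I."""
    return c_node_f * vth_v / i_load_a


C_NODE = 1e-15  # 1 fF memory-node capacitance (hypothetical)
VTH = 0.4       # word transistor threshold voltage in V (hypothetical)
for i_load in (1e-6, 1e-5):  # weaker vs. stronger load current in A (hypothetical)
    print(f"I_load = {i_load:.0e} A -> t_rec = {recovery_time_s(C_NODE, VTH, i_load):.1e} s")
```

    In this approximation a tenfold larger load current shortens the recovery window tenfold; a real cell recharges nonlinearly, so the sketch indicates only the trend.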

    Countermeasures for soft errors caused by α rays include forming a protective film on the chip surface to absorb α rays. Countermeasures are also taken to reduce α-ray emission, such as using high-purity package materials with reduced radioactive element content.

    Fig. 2-16 Generation of Electron and Hole Pairs by α Rays

    Fig. 2-17 Soft Error in the SRAM Cell

    (2) Soft errors due to cosmic rays

    High-energy cosmic rays collide with atoms in the atmosphere, generating high-energy protons and neutrons. When these high-energy neutrons pass through silicon, they collide with silicon atoms and generate secondary ions through spallation reactions, and the electron-hole pairs generated along the ranges of these particles can cause soft errors. The flux of cosmic-ray neutrons reaching the ground is known to be higher at high elevations, owing to geographical conditions and weaker atmospheric shielding, and this increases the soft error rate. This can pose serious reliability problems in applications such as aircraft and satellites.

    It is difficult to eliminate the causes of soft errors due to cosmic rays, so this is regarded as a failure mode that occurs with a certain probability. One countermeasure for SRAM is to implement error correcting code (ECC) so that data affected by soft errors is corrected. Device structures that are resistant to soft errors, such as SOI structures, are also sometimes used.

    Electromigration

    Electromigration is a failure mechanism where electrons flowing through metal (Al, Cu) lines collide physically

    with the metal atoms, causing the metal atoms to migrate and form voids in the metal lines which lead to increased

    metal line resistance and disconnection. Electromigration is a key failure mechanism that determines the long-term

    reliability of metal lines.

    (1) Aluminum electromigration


    The thin films used in aluminum (Al) lines are formed by sputtering, and the aluminum atoms are deposited in a polycrystalline (grain) structure. (See Fig. 2-18.) When current at or above a certain density flows through these metal lines, electromigration occurs: the metal atoms are physically moved by the momentum transferred in collisions with the electrons. Metal atoms around grain boundaries have weak bonding energy and move easily, so in metal lines with uneven grain sizes, electromigration along the grain boundaries causes voids to form and grow there, leading to disconnection. (See Figs. 2-19 and 2-20.)

    Process countermeasures include adding trace amounts of copper to the aluminum to slow the migration of aluminum atoms, and covering the top and bottom of metal lines with Ti, W, or other metal alloys (cap layers) to suppress aluminum atom movement. Circuit design countermeasures are also taken, such as keeping the current density in metal lines at or below a certain value.
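    A current density rule of this kind reduces to arithmetic on the line cross-section, j = I / (w·t). The sketch below checks a line against a limit and computes the minimum width for a given current. The function names, the 1 × 10⁶ A/cm² limit, and the line dimensions are hypothetical placeholders; the actual allowed value depends on the process and the target life.

```python
UM2_TO_CM2 = 1e-8  # 1 um^2 = 1e-8 cm^2


def current_density_a_per_cm2(current_a, width_um, thickness_um):
    """j = I / (w * t), with the cross-section converted from um^2 to cm^2."""
    return current_a / (width_um * thickness_um * UM2_TO_CM2)


def min_width_um(current_a, thickness_um, j_max_a_per_cm2):
    """Smallest line width that keeps j at or below j_max for the given current."""
    return current_a / (thickness_um * UM2_TO_CM2 * j_max_a_per_cm2)


J_MAX = 1e6  # hypothetical design limit in A/cm^2
j = current_density_a_per_cm2(current_a=1e-3, width_um=0.5, thickness_um=0.4)
print(f"j = {j:.2e} A/cm^2, within limit: {j <= J_MAX}")
print(f"minimum width: {min_width_um(1e-3, 0.4, J_MAX):.3f} um")
```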

    Fig. 2-18 Aluminum Grain Structure

    Fig. 2-19 Electromigration Mechanism [figure labels: Al grain boundary; grain boundary diffusion; Al accumulation; Al shortage (void)]

    Fig. 2-20 Photo of Electromigration

    (2) Copper electromigration

    Copper lines are formed by an embedded metal line (damascene) process that uses electroplating. Copper has a higher melting point and a higher activation energy than aluminum, and its resistance to electromigration is tens to hundreds of times higher than that of aluminum. However, the miniaturization of metal lines in the latest processes is increasing current densities, so electromigration resistance is becoming an important reliability issue.

    The electromigration resistance of copper is known to be strongly affected by the crystal grain size and orientation, and by the adhesion at the interface between the copper and the barrier metal. In particular, copper lines are surrounded by barrier metal, and when the adhesion between the copper and the cap layer on the planarized top surface is weak, copper at that interface moves easily, resulting in migration. It is therefore important that the process incorporate countermeasures to increase the adhesion at the interface between the copper and the cap layer. Circuit design countermeasures are also taken, such as keeping the current density in metal lines at or below a certain value.

    Stress Migration

    Stress migration is a failure mechanism in which stress applied to metal lines causes the metal atoms to creep, forming voids that increase metal line resistance and cause disconnection. Stress arises in the metal lines (Al, Cu) used in LSI because of the difference between the heat treatment temperatures in the manufacturing process and the operating environment temperature. Driven by this stress, vacancies in the metal lines can migrate and gather at a single location, forming a void.

    Stress migration results from the interaction between the stress in the metal line and the creep of metal atoms. The creep rate of metal atoms increases at high temperatures, whereas the stress acting on the metal lines decreases at high temperatures, so stress migration is known to peak at a particular temperature.
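    The existence of a peak can be reproduced with a simple rate expression. The sketch assumes, as a hypothetical model form not given in this handbook, that the driving stress falls linearly to zero at a stress-free temperature T0, so the rate is proportional to (T0 - T)^N · exp(-Ea/kT). Setting the derivative of its logarithm to zero gives N·k·T² + Ea·T - Ea·T0 = 0, whose positive root is the peak temperature; a brute-force grid search is included as a cross-check. T0, N, and Ea below are illustrative values, not fitted to any process.

```python
import math

K_B = 8.617e-5  # Boltzmann constant [eV/K]


def log_rate(t_k, t0_k, n_exp, ea_ev):
    """ln of the relative rate (T0 - T)**N * exp(-Ea / kT); valid for T < T0."""
    return n_exp * math.log(t0_k - t_k) - ea_ev / (K_B * t_k)


def peak_temperature_analytic(t0_k, n_exp, ea_ev):
    """Positive root of N*k*T^2 + Ea*T - Ea*T0 = 0."""
    a, b, c = n_exp * K_B, ea_ev, -ea_ev * t0_k
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


def peak_temperature_grid(t0_k, n_exp, ea_ev, t_min_k=300.0, step_k=0.01):
    """Brute-force search for the temperature that maximizes the rate."""
    steps = int((t0_k - t_min_k) / step_k)
    temps = (t_min_k + i * step_k for i in range(steps))
    return max(temps, key=lambda t: log_rate(t, t0_k, n_exp, ea_ev))


T0, N_EXP, EA = 573.0, 4, 0.6  # illustrative values only
t_peak = peak_temperature_analytic(T0, N_EXP, EA)
t_grid = peak_temperature_grid(T0, N_EXP, EA)
print(f"analytic peak: {t_peak - 273.15:.1f} C, grid peak: {t_grid - 273.15:.1f} C")
```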



    (1) Aluminum stress migration

    Aluminum lines have many vacancies and aluminum atoms with weak bonding force at the grain boundaries

    of the polycrystalline structure, so when tensile stress is applied to metal lines, these aluminum atoms and

    vacancies at the grain boundaries creep and form voids. Aluminum voids produced by tensile stress mainly

    form and grow along the crystal grain boundaries, and can lead to increased metal line resistance and

    disconnection defects. (See Fig. 2-21.)

    Aluminum stress migration is generally said to have an occurrence peak around 150 to 200°C, and it can become a long-term reliability problem in devices used for long periods in high-temperature environments.


    As a design countermeasure, patterns are designed to avoid applying excessive stress to metal lines. Process countermeasures include a metal line structure that sandwiches the aluminum between upper and lower cap layers (Ti, W, etc.) to prevent stress migration. Countermeasures such as an interlayer film structure that reduces stress and an optimized heat treatment process are also used to reduce the residual stress in metal lines.

    Fig. 2-21 Disconnection Defect due to Aluminum Stress Migration

    (2) Copper stress migration

    For copper, the stress induced voiding (SIV) mode, which produces voids in the via holes that connect upper and lower lines, is a reliability problem. When a wide line and a narrow line are connected by a single via hole, the tensile stress on the wide-line side concentrates at the via hole, causing vacancies in the copper to migrate to the via hole and form a void. (See Fig. 2-22.) Stress migration at copper via holes is known to have an occurrence temperature peak around 200°C. However, because this failure depends largely on the stress generated in the high-temperature annealing process after copper line formation, it occurs within a short time and is an early failure factor.

    A countermeasure at the design stage is to use multiple via holes where wide and narrow lines are connected. When metal lines are connected by multiple via holes, even if stress concentrates on one via hole and creates a void, the stress applied to the other via holes is reduced, so voids are less likely to form there, which prevents open defects between the metal lines. Process countermeasures are also taken, such as reducing the stress in the copper and selecting process conditions that reduce the vacancies in the copper.

    Fig. 2-22 Void Caused by Stress Migration in a Copper Wiring Via Hole 1)

    1) R. Kanamura et al.: Symp. on VLSI Tech., p. 107, 2003


    2.3 Acceleration Model

    In general, failure of components, including semiconductor devices, results from some reaction at the atomic or molecular level, and can be described by Eyring's absolute reaction rate theory (hereafter, the Eyring model).

    Over the absolute temperature T range of interest for reliability, the Eyring model expresses the lifetime L by the following separation-of-variables equation, using the activation energy Ea shown in Fig. 2-23, a non-temperature stress S that induces failure, and the Boltzmann constant k (8.617 × 10⁻⁵ eV/K).

    $$L = A\,S^{-n}\exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.1)}$$

    A and n in the above equation are constants.
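    Because A cancels when the same failure mechanism is compared under two conditions, Eq. 2.3.1 is usually applied as a ratio, the acceleration factor AF = L_use / L_test = (S_test / S_use)^n · exp[(Ea/k)(1/T_use - 1/T_test)]. The sketch below evaluates that ratio; the function name and the stress values, n, and Ea in the example are hypothetical.

```python
import math

K_B = 8.617e-5  # Boltzmann constant [eV/K]


def eyring_af(s_use, s_test, t_use_c, t_test_c, n, ea_ev):
    """Acceleration factor L_use / L_test for L = A * S**(-n) * exp(Ea / kT)."""
    t_use = t_use_c + 273.15
    t_test = t_test_c + 273.15
    stress_part = (s_test / s_use) ** n
    thermal_part = math.exp((ea_ev / K_B) * (1.0 / t_use - 1.0 / t_test))
    return stress_part * thermal_part


# Hypothetical example: stress doubled and temperature raised from 55 C to 125 C.
af = eyring_af(s_use=1.0, s_test=2.0, t_use_c=55.0, t_test_c=125.0, n=3.0, ea_ev=0.7)
print(f"AF = {af:.0f}")  # 1000 h of test then corresponds to roughly AF * 1000 h in use
```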

    Outlines of the environmental and operating stress acceleration models used for semiconductor devices are

    described below.

    2.3.1 Acceleration Models for Environmental Stress

    (1) Temperature acceleration model

    The factor exp(Ea/kT) on the right side of Eq. 2.3.1 is also called the Arrhenius model, since it is the same as the equation derived empirically by Arrhenius in the 19th century.

    Ea is the activation energy, expressed in eV. The activation energy is the energy that must be supplied for a chemical or physical reaction to proceed. If the chemical and physical reactions that make up a failure mechanism are the same, their activation energies are necessarily equal.

    $$L = A\exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.2)}$$
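    Since the lifetime ratio at two temperatures depends only on Ea, Eq. 2.3.2 can be inverted to estimate the activation energy from two tests of the same failure mechanism: Ea = k·ln(L1/L2) / (1/T1 - 1/T2). The sketch below does this; the function names, test temperatures, and lifetimes are hypothetical.

```python
import math

K_B = 8.617e-5  # Boltzmann constant [eV/K]


def activation_energy_ev(life1, temp1_c, life2, temp2_c):
    """Ea from two lifetimes via ln(L1 / L2) = (Ea / k) * (1/T1 - 1/T2)."""
    inv_t1 = 1.0 / (temp1_c + 273.15)
    inv_t2 = 1.0 / (temp2_c + 273.15)
    return K_B * math.log(life1 / life2) / (inv_t1 - inv_t2)


def arrhenius_life(a, ea_ev, temp_c):
    """L = A * exp(Ea / kT), Eq. 2.3.2."""
    return a * math.exp(ea_ev / (K_B * (temp_c + 273.15)))


# Hypothetical median lives of the same mechanism at two oven temperatures (hours).
print(f"Ea = {activation_energy_ev(1000.0, 125.0, 300.0, 150.0):.2f} eV")
```

    The estimate is only meaningful if both tests fail by the same mechanism, which is the condition stated above for equal activation energies.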

    (2) Humidity acceleration model

    Humidity-induced acceleration models express the absolute vapor pressure Vp or the relative humidity RH as

    humidity stress.

    Typical models are described below.

    Absolute vapor pressure model

    This model expresses temperature stress and humidity stress together through the absolute vapor pressure VP, and is empirically known to hold. Because VP itself depends on temperature, the separation-of-variables form of the Eyring model cannot be used.

    $$L = V_P^{-n} \qquad \text{(Eq. 2.3.3)}$$

    Relative humidity model

    Because VP depends on temperature, this model instead follows the Eyring model, using a separation-of-variables equation in the absolute temperature T and the relative humidity RH; it corresponds to the case S = RH in Eq. 2.3.1.

    $$L = A\,(RH)^{-n}\exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.4)}$$
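    The two humidity models translate the same test condition into field life differently, and Eq. 2.3.3 additionally needs the saturation vapor pressure to convert temperature and RH into VP. The sketch below uses the Tetens approximation for the saturation vapor pressure of water (an assumption; other approximations exist) and hypothetical exponents and activation energy to compare the acceleration factor of an 85°C/85%RH test over an assumed 30°C/60%RH field environment under each model. All function names are illustrative.

```python
import math

K_B = 8.617e-5  # Boltzmann constant [eV/K]


def saturation_vapor_pressure_kpa(temp_c):
    """Tetens approximation for the saturation vapor pressure of water over liquid (kPa)."""
    return 0.61078 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vapor_pressure_kpa(temp_c, rh_percent):
    """Absolute vapor pressure VP = RH * Psat(T)."""
    return rh_percent / 100.0 * saturation_vapor_pressure_kpa(temp_c)


def af_vapor_pressure(test, use, n):
    """AF = L_use / L_test for L = VP**(-n) (Eq. 2.3.3); test/use are (temp_c, rh_percent)."""
    return (vapor_pressure_kpa(*test) / vapor_pressure_kpa(*use)) ** n


def af_relative_humidity(test, use, n, ea_ev):
    """AF = L_use / L_test for L = A * RH**(-n) * exp(Ea / kT) (Eq. 2.3.4)."""
    (t_test, rh_test), (t_use, rh_use) = test, use
    thermal = math.exp((ea_ev / K_B) * (1.0 / (t_use + 273.15) - 1.0 / (t_test + 273.15)))
    return (rh_test / rh_use) ** n * thermal


TEST, USE = (85.0, 85.0), (30.0, 60.0)  # (deg C, %RH); the field condition is hypothetical
print(f"vapor pressure model:    AF = {af_vapor_pressure(TEST, USE, n=2.0):.0f}")
print(f"relative humidity model: AF = {af_relative_humidity(TEST, USE, n=3.0, ea_ev=0.8):.0f}")
```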


    Lycoudes model

    Models that multiply temperature, relative humidity, and voltage functions are also available. A typical example is the Lycoudes model reported by N. Lycoudes, shown below.

    $$\mathrm{MTTF} = A\exp\left(\frac{E_a}{kT}\right)\exp\left(\frac{B}{RH}\right)V^{-1} \qquad \text{(Eq. 2.3.5)}$$

    V and B in the above equation are the voltage and a constant, respectively.

    (3) Temperature difference acceleration model

    This model is applied to failures caused by the repeated application of stress (thermal stress) produced by temperature differences. Denoting the temperature difference by ΔT, the number of cycles to failure N is expressed by the following equation, obtained by substituting S = ΔT into Eq. 2.3.1.

    $$N = A\,(\Delta T)^{-n} \qquad \text{(Eq. 2.3.6)}$$
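    With A canceling, Eq. 2.3.6 gives the cycling acceleration factor as a ratio of temperature swings, AF = N_use / N_test = (ΔT_test / ΔT_use)^n. The sketch below evaluates it; the function name, the temperature ranges, and the exponent are hypothetical.

```python
def cycling_af(dt_test, dt_use, n):
    """AF = N_use / N_test for N = A * dT**(-n) (Eq. 2.3.6)."""
    return (dt_test / dt_use) ** n


# Hypothetical: -55 C to 125 C test cycles versus 0 C to 60 C field cycles, n assumed to be 2.
af = cycling_af(dt_test=125.0 - (-55.0), dt_use=60.0 - 0.0, n=2.0)
test_cycles = 1000
print(f"AF = {af:.1f}; {test_cycles} test cycles ~ {af * test_cycles:.0f} field cycles")
# AF = 9.0; 1000 test cycles ~ 9000 field cycles
```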


    For low-cycle fatigue, the thermal fatigue life (cycle life) Nf of a material follows the Coffin-Manson model given by the following equation, where ΔεP is the plastic strain amplitude.

    $$\Delta\varepsilon_P\,N_f^{\alpha} = C \qquad \text{(Eq. 2.3.7)}$$

    α and C in the above equation are material constants.

    For low-cycle fatigue, failure due to repeated thermal stress follows the Coffin-Manson model, and the temperature difference acceleration model can be regarded as a form of it. Semiconductor chip failures can broadly be described by the temperature difference acceleration model, but the Coffin-Manson model must be taken into account for mounting failures that involve package factors, such as the thermal fatigue life of solder joints. The following variation of the Coffin-Manson model, proposed by Norris et al., incorporates the effects of the temperature cycling frequency and the maximum temperature.

    $$N_f = C\,f^{m}\,\Delta\varepsilon_P^{-n}\exp\left(\frac{Q}{kT_{MAX}}\right) \qquad \text{(Eq. 2.3.8)}$$

    In the above equation, Nf is the fatigue life, C is a material constant, m and n are exponents, f is the cycling frequency, ΔεP is the plastic strain amplitude, Q is the activation energy, k is the Boltzmann constant, and TMAX is the maximum temperature.
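    Used as a ratio between field and test conditions, Eq. 2.3.8 separates into frequency, strain, and maximum-temperature terms: N_use / N_test = (f_use / f_test)^m · (Δε_test / Δε_use)^n · exp[(Q/k)(1/T_MAX,use - 1/T_MAX,test)]. The sketch below evaluates this ratio. The function name and every parameter value in the example are hypothetical and would have to be determined for the actual solder joint.

```python
import math

K_B = 8.617e-5  # Boltzmann constant [eV/K]


def norris_ratio(f_use, f_test, strain_use, strain_test, tmax_use_c, tmax_test_c, m, n, q_ev):
    """N_use / N_test for N_f = C * f**m * strain**(-n) * exp(Q / (k * T_MAX)) (Eq. 2.3.8)."""
    freq_part = (f_use / f_test) ** m
    strain_part = (strain_test / strain_use) ** n
    inv_t_diff = 1.0 / (tmax_use_c + 273.15) - 1.0 / (tmax_test_c + 273.15)
    temp_part = math.exp((q_ev / K_B) * inv_t_diff)
    return freq_part * strain_part * temp_part


# Hypothetical inputs: 1 cycle/day in the field versus 24 cycles/day in test.
ratio = norris_ratio(f_use=1.0, f_test=24.0, strain_use=0.002, strain_test=0.006,
                     tmax_use_c=60.0, tmax_test_c=125.0, m=1 / 3, n=2.0, q_ev=0.12)
print(f"field life / test life (in cycles) = {ratio:.1f}")
```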

    2.3.2 Acceleration Models for Operating Stress

    Operating stresses that determine semiconductor device life include voltage, current, electric field strength, and current density, and they differ according to the failure mechanism, as described in section 2.2.2. The main failure mechanism acceleration models are described below.

    Note that life in these models also depends on temperature, so it is expressed by an Eyring model combining the operating stress and the temperature stress.

    (1) Time-dependent dielectric breakdown (TDDB) acceleration models

    The time to failure (TTF) due to TDDB depends on the gate oxide film thickness. The Eox model is said to be appropriate for gate oxide films 5 nm thick or more, the Vg model for films thicker than 2 nm and thinner than 5 nm, and the power-law model for films 2 nm thick or less.

    Eox model

    $$\mathrm{TTF} = A\exp(-\gamma_{EOX}E_{ox})\exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.9)}$$

    Vg model

    $$\mathrm{TTF} = A\exp(-\gamma_{Vg}V_g)\exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.10)}$$

    Power-law model

    $$\mathrm{TTF} = A\,V_g^{-n}\exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.11)}$$

    In the above equations, γEOX is the field intensity acceleration factor, γVg and n are voltage acceleration factors, Eox is the stress electric field applied to the gate, and Vg is the stress voltage applied to the gate.
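    The thickness rule and the three equations can be combined into a small helper. The sketch below selects the model using the thickness ranges stated in this section and returns the voltage acceleration factor TTF_use / TTF_test at a common temperature. The function names, the acceleration-factor values passed in (γEOX, γVg, or n), and the thickness and voltages are hypothetical, and the field is approximated as Eox = Vg / tox.

```python
import math


def tddb_model(tox_nm):
    """Model choice by gate oxide thickness, following the ranges given in the text."""
    if tox_nm >= 5.0:
        return "Eox"
    if tox_nm > 2.0:
        return "Vg"
    return "power-law"


def tddb_voltage_af(tox_nm, vg_use, vg_test, accel):
    """TTF_use / TTF_test at equal temperature (Eqs. 2.3.9 to 2.3.11).

    accel is gamma_EOX [cm/MV] for the Eox model, gamma_Vg [1/V] for the Vg model,
    or the exponent n for the power-law model.
    """
    model = tddb_model(tox_nm)
    if model == "Eox":
        # Field approximated as Vg / tox; 1 V/nm = 10 MV/cm.
        e_use, e_test = 10.0 * vg_use / tox_nm, 10.0 * vg_test / tox_nm
        return model, math.exp(accel * (e_test - e_use))
    if model == "Vg":
        return model, math.exp(accel * (vg_test - vg_use))
    return model, (vg_test / vg_use) ** accel


print(tddb_voltage_af(tox_nm=1.8, vg_use=1.0, vg_test=2.0, accel=40.0))
```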

    Fig. 2-23 Activation Energy [figure: normal state, activated state, and degraded state]

    (2) Hot carrier (HCI) acceleration models

    The device life due to hot carriers is expressed either by the substrate current model, based on the substrate current, or by the 1/Vds model, based on the drain voltage. For devices of the 0.25 µm and 0.15 µm process generations and later, factors other than the substrate current have a greater impact, so the 1/Vds model is becoming the main model.

    Substrate current model

    $$\mathrm{TTF} = A\,I_{sub}^{-m}\exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.12)}$$

    1/Vds model

    $$\mathrm{TTF} = A\exp\left(\frac{B}{V_{ds}}\right)\exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.13)}$$

    In the above equations, m is a factor depending on the substrate current, B is a factor depending on the voltage, Isub is the maximum substrate current during stress, and Vds is the drain voltage during stress.
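    As with the other models, A cancels in the ratio between use and stress conditions, so at equal temperature Eq. 2.3.13 gives AF = exp[B(1/Vds,use - 1/Vds,test)] and Eq. 2.3.12 gives AF = (Isub,test / Isub,use)^m. The sketch evaluates both; the function names, B, m, and the bias and current values are hypothetical.

```python
import math


def af_inverse_vds(vds_use, vds_test, b):
    """TTF_use / TTF_test for TTF = A * exp(B / Vds) at equal temperature (Eq. 2.3.13)."""
    return math.exp(b * (1.0 / vds_use - 1.0 / vds_test))


def af_substrate_current(isub_use, isub_test, m):
    """TTF_use / TTF_test for TTF = A * Isub**(-m) at equal temperature (Eq. 2.3.12)."""
    return (isub_test / isub_use) ** m


# Hypothetical stress at 2.0 V versus operation at 1.2 V, with B assumed to be 20 V.
print(f"1/Vds model: AF = {af_inverse_vds(1.2, 2.0, 20.0):.3g}")
# Hypothetical substrate currents (A) and exponent m.
print(f"Isub model:  AF = {af_substrate_current(1e-7, 1e-5, 3.0):.3g}")
```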

    (3) Negative Bias Temperature Instability (NBTI) acceleration models

    The device life due to NBTI is often expressed by the following equations:

    $$\mathrm{TTF} = A\exp(-\gamma E_{ox})\exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.14)}$$

    $$\mathrm{TTF} = A\,E_{ox}^{-\gamma}\exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.15)}$$

    $$\mathrm{TTF} = A\,V_g^{-n}\exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.16)}$$

    In the above equations, γ is the field intensity acceleration factor, n is the voltage acceleration factor, Eox is the stress electric field applied to the gate oxide film, and Vg is the stress voltage applied to the gate oxide film.

    (4) Electromigration (EM) acceleration model

    In general, the device life due to EM is explained theoretically by Huntington's equation.

    $$\frac{\partial C}{\partial t} = D\,\nabla\cdot\left[\nabla C - \frac{eZ^{*}}{kT}\,E\,C\right] \qquad \text{(Eq. 2.3.17)}$$

    In the above equation, C is the atomic concentration, D is the diffusion coefficient, Z* is the effective valence, E is the electric field, e is the elementary charge, k is the Boltzmann constant, and T is the absolute temperature.

    To calculate the actual device life due to EM (TTF), the empirically derived Black's equation is widely used.

    In the following equation, T is the absolute temperature, j is the current density, Ea is the activation energy, A is a proportionality constant, n is the current density exponent, and k is the Boltzmann constant.

    $$\mathrm{TTF} = A\,j^{-n}\exp\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.18)}$$
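    Black's equation is often applied in two steps: estimate the current density exponent n from tests at two current densities and the same temperature, n = ln(TTF1/TTF2) / ln(j2/j1), then scale the test life to use conditions with AF = (j_test / j_use)^n · exp[(Ea/k)(1/T_use - 1/T_test)]. The sketch below shows both steps; the function names, test results, and parameter values are hypothetical.

```python
import math

K_B = 8.617e-5  # Boltzmann constant [eV/K]


def current_exponent(j1, ttf1, j2, ttf2):
    """n from two tests at the same temperature: TTF1 / TTF2 = (j2 / j1)**n."""
    return math.log(ttf1 / ttf2) / math.log(j2 / j1)


def black_af(j_use, j_test, t_use_c, t_test_c, n, ea_ev):
    """TTF_use / TTF_test for TTF = A * j**(-n) * exp(Ea / kT) (Eq. 2.3.18)."""
    inv_t_diff = 1.0 / (t_use_c + 273.15) - 1.0 / (t_test_c + 273.15)
    return (j_test / j_use) ** n * math.exp((ea_ev / K_B) * inv_t_diff)


# Hypothetical EM tests at 250 C: j in MA/cm^2, TTF in hours.
n = current_exponent(j1=1.0, ttf1=400.0, j2=2.0, ttf2=100.0)
af = black_af(j_use=0.1, j_test=2.0, t_use_c=105.0, t_test_c=250.0, n=n, ea_ev=0.9)
print(f"n = {n:.2f}, use-condition TTF ~ {100.0 * af:.2e} h")
```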


    1) JEITA EDR-4704A: Application guide of the accelerated life test for semiconductor devices

    2) JEITA EDR-4707: Report on Failure Mechanism of LSI and reliability test method

    3) JEITA ETR-7024: Research Report on Effect of Voids on Reliability of Lead-Free Solder Joints and Standard of Evaluation Criteria

    4) N. J. Flood: Reliability aspects of plastic encapsulated integrated circuit, IRPS (1972)

    5) D. S. Peck: Temperature-humidity acceleration of metal-electronics failure in semiconductor devices

    6) N. Lycoudes: The reliability of plastic microcircuit in moist environments, Solid State Technology (1978)

    7) T. Grasser: Hot Carrier Degradation in Semiconductor Devices

    8) Comparison of NMOS and PMOS hot carrier effects, IEEE Transactions on Electron Devices (1997)

    9) H. B. Huntington: Diffusion in Solids, Academic Press (1975)