
Semiconductor Quality and Reliability Handbook

Chapter 2 Semiconductor Device Reliability Verification


2.1 Fundamental Knowledge on Semiconductor Reliability
2.1.1 Measures for Representing Reliability
2.1.2 Distributions Used in Reliability Analysis
2.1.3 Semiconductor Device Failure Pattern
2.1.3.1 Semiconductor Device Failure Regions
2.1.3.2 Early Failures
2.1.3.3 Random Failures
2.1.3.4 Wear-out Failures
2.2 Semiconductor Reliability Verification
2.2.1 Basic Approach Toward Reliability Verification
2.2.1.1 Reliability Verification in the Development Stage
2.2.1.2 Reliability Verification in the Prototype Stage
2.2.1.3 Reliability Verification in the Mass Production Stage
2.2.2 Reliability in the Development and Design Stages
2.2.2.1 Time-Dependent Dielectric Breakdown (TDDB)
2.2.2.2 Hot carrier (HCI)
2.2.2.3 Negative Bias Temperature Instability (NBTI)
2.2.2.4 Soft Error
2.2.2.5 Electromigration
2.2.2.6 Stress Migration
2.3 Acceleration Model
2.3.1 Acceleration Models for Environmental Stress
2.3.2 Acceleration Models for Operating Stress


2.1 Fundamental Knowledge on Semiconductor Reliability

With recent advances in the systematization, functions and performance of equipment, the social impact and damage caused by failures are increasing, and equipment is now required to be highly reliable. This in turn demands even higher reliability from the individual components that make up the equipment.

Large numbers of semiconductor devices are used in a single piece of equipment, and these devices often handle its main functions, so their reliability is extremely important. Semiconductor devices themselves are also becoming more miniaturized and highly integrated, with larger-scale circuit configurations. In addition, as their functions and performance advance and evolve into system LSIs, ensuring semiconductor reliability has become vital.

The reliability measures, distribution functions, trends in failure rates over time, and failure regions needed to

discuss semiconductor reliability are described below.

2.1.1 Measures for Representing Reliability

JIS Z 8115 (Reliability Terminology) defines reliability as “The property of an item which enables it to fulfill its required functions for the prescribed period under the given conditions.” Reliability therefore includes the concept of time, and reliability measures are functions of time.

(1) Reliability Function (Reliability): R(t)

Reliability is the probability that an item functions correctly without failure until time t.

When n samples are used under the same conditions and the number of failures occurring by time t is r(t), the reliability R(t) is expressed by the following equation.

$$R(t) = \frac{n - r(t)}{n} \qquad \text{(Eq. 2.1.1)}$$

(2) Failure Distribution Function (Unreliability): F(t)

This is the probability that a failure occurs by time t, expressed by the following equation.

$$F(t) = \frac{r(t)}{n} \qquad \text{(Eq. 2.1.2)}$$

Unreliability F(t) and reliability R(t) are related as follows.

$$R(t) + F(t) = 1 \qquad \text{(Eq. 2.1.3)}$$

As shown in Fig. 2-1, R(t) decreases from 1 toward 0 over time, while F(t) increases from 0 toward 1. The distribution functions described hereafter are used as the failure distribution functions F(t) of semiconductor devices.
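As a minimal sketch of Eq. 2.1.1 to Eq. 2.1.3, the Python function below computes R(t) and F(t) from the failure times observed on n samples. The function name `empirical_reliability` and the failure data are hypothetical, chosen only for illustration.

```python
def empirical_reliability(failure_times, n, t):
    """Return (R(t), F(t)) from a life test on n samples (Eq. 2.1.1, 2.1.2).

    r(t) is the number of observed failures with failure time <= t.
    """
    r_t = sum(1 for ft in failure_times if ft <= t)
    reliability = (n - r_t) / n      # R(t), Eq. 2.1.1
    unreliability = r_t / n          # F(t), Eq. 2.1.2
    return reliability, unreliability


# Hypothetical data: 1000 samples on test, four failures observed (hours).
failures = [12.0, 85.0, 430.0, 910.0]
R, F = empirical_reliability(failures, n=1000, t=500.0)
print(R, F)  # three failures by 500 h: R(t) = 0.997, F(t) = 0.003
```

Because both quantities are derived from the same count r(t), R(t) + F(t) = 1 holds at every t, as Eq. 2.1.3 states.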


Fig. 2-1 Relationship between F(t) and R(t)

(3) Failure Density Function: f(t)

This is the probability per unit time that a failure occurs at time t, that is, the time derivative of F(t).

$$f(t) = \frac{dF(t)}{dt} = -\frac{dR(t)}{dt} \qquad \text{(Eq. 2.1.4)}$$

(4) Failure Rate Function: λ(t)

This represents the probability of failure occurring in the next unit time for samples that have not yet failed

when time t has elapsed.

$$\lambda(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{R(t)} \qquad \text{(Eq. 2.1.5)}$$

The failure rate function is also called the instantaneous failure rate, and is calculated from the failure distribution function F(t) using Equations 2.1.4 and 2.1.5. For semiconductor devices, the unit generally used is the FIT (Failure In Time), the number of failures per billion (10^9) total device operating hours.

Note that when the F(t) of the subject product is not known, the average failure rate obtained by the following

equation is used.

$$\text{Average failure rate} \equiv \frac{\text{Total number of failures during the period}}{\text{Total operating time during the period}} \qquad \text{(Eq. 2.1.6)}$$
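The average failure rate of Eq. 2.1.6 is usually quoted in FIT. The sketch below shows the arithmetic: failures divided by total device-hours, scaled by 10^9. The helper `average_failure_rate_fit` and the field numbers are hypothetical.

```python
FIT_PER_FAILURE_PER_HOUR = 1e9  # 1 FIT = 1 failure per 10**9 device-hours


def average_failure_rate_fit(failures, operating_hours):
    """Average failure rate of Eq. 2.1.6, converted to FIT.

    operating_hours holds the accumulated operating time of each device
    during the period; their sum is the total operating time.
    """
    total_device_hours = sum(operating_hours)
    return failures / total_device_hours * FIT_PER_FAILURE_PER_HOUR


# Hypothetical field data: 2 failures among 1000 devices, each operated 1000 h.
rate = average_failure_rate_fit(2, [1000.0] * 1000)
print(f"{rate:.0f} FIT")  # 2 / 1e6 device-hours = 2e-6 per hour, i.e. 2000 FIT
```

Passing per-device hours rather than a single product keeps the sketch valid when devices enter or leave service at different times.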

[Supplement]

In addition to the failure rate defined above, the cumulative failure rate after a set equipped with the

semiconductor device has operated for the specified time in the market is sometimes used in the early failure

region described hereafter. Unless otherwise requested by the customer, the Sony Semiconductor Business

Unit also uses the cumulative failure rate after one year as the early failure rate.

In addition, after the early failure region, most semiconductor devices do not reach wear-out failure (genuine failure) in the actual operating environment, and the failure rate settles at the constant value of the random failure region. This value equals the value obtained by Equation 2.1.6, so the average failure rate can essentially be regarded as the failure rate after the early failure region.

(5) Mean Time To Failure: MTTF

The Mean Time To Failure (MTTF) of an item such as a semiconductor device that is not subject to repair or

maintenance is expressed by the following equation.

$$\mathrm{MTTF} = \int_{0}^{\infty} t\, f(t)\, dt \qquad \text{(Eq. 2.1.7)}$$
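Eq. 2.1.7 can be checked numerically. The sketch below integrates t·f(t) for an exponential density, for which the exact MTTF is known to be 1/λ. It relies on a hypothetical helper `mttf_numeric`, applies the trapezoidal rule and approximates infinity by a finite upper limit.

```python
import math


def mttf_numeric(pdf, t_max, steps=100_000):
    """Approximate MTTF = integral of t * f(t) from 0 to infinity (Eq. 2.1.7).

    Uses the trapezoidal rule on [0, t_max]; t_max must be large enough
    that the tail of t * f(t) beyond it is negligible.
    """
    h = t_max / steps
    total = 0.5 * t_max * pdf(t_max)  # end points; t * f(t) is 0 at t = 0
    for i in range(1, steps):
        t = i * h
        total += t * pdf(t)
    return total * h


# Check against a case with a known answer: exponential density with
# lambda = 1e-3 per hour, whose MTTF is 1 / lambda = 1000 h.
lam = 1e-3
print(mttf_numeric(lambda t: lam * math.exp(-lam * t), t_max=50_000.0))
```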

2.1.2 Distributions Used in Reliability Analysis

Typical distribution functions used to analyze reliability data of semiconductor devices are described below.

(1) Normal distribution

The normal distribution is a typical continuous distribution used in quality control. In reliability analysis, it is often applied to wear-out life, where failures concentrate around a certain time.

The probability density function f(t) and distribution function F(t) are expressed by the following equations.

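$$f(t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(t-\mu)^2}{2\sigma^2}\right] \qquad \text{(Eq. 2.1.8)}$$

$$F(t) = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] dx \qquad \text{(Eq. 2.1.9)}$$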

This distribution is defined by the mean μ and the standard deviation σ (variance σ²).

As shown in Fig. 2-2 below, the normal distribution has a symmetrical bell shape centered on μ. The probability of t falling within ±σ, ±2σ and ±3σ of μ is 68.27%, 95.45% and 99.73%, respectively.
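The ±σ, ±2σ and ±3σ probabilities quoted above follow from Eq. 2.1.9. For any μ and σ they reduce to erf(k/√2). A short check using the standard `math.erf` (the helper name is hypothetical):

```python
import math


def prob_within_k_sigma(k):
    """P(mu - k*sigma <= t <= mu + k*sigma) for a normal distribution.

    F(mu + k*sigma) - F(mu - k*sigma) from Eq. 2.1.9 equals erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"+/-{k} sigma: {100 * prob_within_k_sigma(k):.2f} %")
# prints 68.27 %, 95.45 % and 99.73 %
```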

Fig. 2-2 Normal Distribution


(2) Exponential distribution

The exponential distribution represents the life distribution (failure distribution function) in the random failure region, where the failure rate λ is constant over time. Its probability density function f(t) and distribution function F(t) are expressed by the following equations. This distribution is the special case of the Weibull distribution, described hereafter, with shape parameter m = 1.

$$f(t) = \lambda e^{-\lambda t} \qquad \text{(Eq. 2.1.10)}$$

$$F(t) = 1 - e^{-\lambda t} \qquad \text{(Eq. 2.1.11)}$$

Fig. 2-3 Exponential Distribution

As the following equation shows, the MTTF equals t0, the reciprocal of the failure rate λ.

$$\mathrm{MTTF} = t_0 = \frac{1}{\lambda} \qquad \text{(Eq. 2.1.12)}$$
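Because λ is constant in the random failure region, converting between FIT, MTTF (Eq. 2.1.12) and reliability R(t) = 1 − F(t) is simple arithmetic. The sketch below assumes a hypothetical rate of 100 FIT and continuous operation at 8760 h per year. The helper names are illustrative.

```python
import math


def fit_to_lambda(fit):
    """Convert a failure rate in FIT (failures per 1e9 h) to failures per hour."""
    return fit / 1e9


def exponential_mttf(lam):
    """MTTF = t0 = 1 / lambda for the exponential distribution (Eq. 2.1.12)."""
    return 1.0 / lam


def exponential_reliability(lam, t):
    """R(t) = 1 - F(t) = exp(-lambda * t), from Eq. 2.1.11 and Eq. 2.1.3."""
    return math.exp(-lam * t)


# Hypothetical constant failure rate of 100 FIT.
lam = fit_to_lambda(100)
print(exponential_mttf(lam))                    # about 1e7 h
print(exponential_reliability(lam, 10 * 8760))  # R after 10 years of continuous operation
```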

(3) Logarithmic normal distribution

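The logarithmic normal (lognormal) distribution is the distribution of a lifetime t whose logarithm ln t follows the normal distribution described above, with mean μ and standard deviation σ.

$$f(t) = \frac{1}{\sqrt{2\pi}\,\sigma t} \exp\left[-\frac{(\ln t - \mu)^2}{2\sigma^2}\right] \qquad \text{(Eq. 2.1.13)}$$

$$F(t) = \int_{0}^{t} \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right] dx \qquad \text{(Eq. 2.1.14)}$$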


Fig. 2-4 Logarithmic Normal Distribution

In semiconductor device reliability, the electromigration life is generally known to follow a logarithmic

normal distribution.
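The following minimal sketch of Eq. 2.1.13 and Eq. 2.1.14 evaluates the lognormal CDF through the standard normal CDF using `math.erf`. The median life of 500 h and σ = 0.5 are hypothetical values, not electromigration data from this handbook.

```python
import math


def lognormal_cdf(t, mu, sigma):
    """F(t) of Eq. 2.1.14: ln t is normal with mean mu and standard deviation sigma."""
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_pdf(t, mu, sigma):
    """f(t) of Eq. 2.1.13."""
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * t)


# Hypothetical electromigration test: median life 500 h (mu = ln 500), sigma = 0.5.
mu, sigma = math.log(500.0), 0.5
print(lognormal_cdf(500.0, mu, sigma))  # 0.5: half of the samples fail by the median life
print(lognormal_cdf(100.0, mu, sigma))  # fraction failed by 100 h
```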

(4) Weibull distribution

The Weibull distribution is a weakest link model proposed by W. Weibull (Sweden) in 1939 as a mechanical

breakdown strength distribution. This model was applied by J. H. K. Kao in 1955 to analyze the life of vacuum

tubes, and has often been used since then to model life distributions in analysis of semiconductor device

reliability.


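$$f(t) = \frac{m}{\eta}\left(\frac{t-\gamma}{\eta}\right)^{m-1} \exp\left[-\left(\frac{t-\gamma}{\eta}\right)^{m}\right] \qquad \text{(Eq. 2.1.15)}$$

$$F(t) = 1 - \exp\left[-\left(\frac{t-\gamma}{\eta}\right)^{m}\right] \qquad \text{(Eq. 2.1.16)}$$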


Fig. 2-5 Weibull Distribution

Here, m is called the shape parameter, η the scale parameter (characteristic life), and γ the location parameter.

Defining t0 = η^m, the failure rate λ(t) is expressed by the following equation.

$$\lambda(t) = \frac{m}{t_0}(t-\gamma)^{m-1} = \frac{m}{\eta}\left(\frac{t-\gamma}{\eta}\right)^{m-1} \qquad \text{(Eq. 2.1.17)}$$

The value of the shape parameter m indicates the failure pattern as follows.

0 < m < 1: Early failure (DFR) pattern where the failure rate decreases over time

m = 1: Random failure (CFR) pattern where the failure rate is constant (matches with the exponential

distribution)

m > 1: Wear-out failure (IFR) pattern where the failure rate increases over time
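The sketch below evaluates Eq. 2.1.17 (with γ = 0 by default) for three hypothetical shape parameters and shows how m alone determines whether the failure rate falls, stays constant or rises. Function names and parameter values are illustrative.

```python
def weibull_hazard(t, m, eta, gamma=0.0):
    """Failure rate of Eq. 2.1.17: (m / eta) * ((t - gamma) / eta) ** (m - 1), for t > gamma."""
    return (m / eta) * ((t - gamma) / eta) ** (m - 1)


def failure_pattern(m):
    """Map the shape parameter m to the failure pattern listed above."""
    if m < 1:
        return "DFR (early failure)"
    if m == 1:
        return "CFR (random failure)"
    return "IFR (wear-out failure)"


# Hypothetical characteristic life eta = 1000 h; compare the rate at 10 h and 100 h.
for m in (0.5, 1.0, 3.0):
    early, later = weibull_hazard(10.0, m, 1000.0), weibull_hazard(100.0, m, 1000.0)
    if later < early:
        trend = "decreasing"
    elif later == early:
        trend = "constant"
    else:
        trend = "increasing"
    print(f"m = {m}: {failure_pattern(m)}, failure rate {trend}")
```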

2.1.3 Semiconductor Device Failure Pattern

2.1.3.1 Semiconductor Device Failure Regions

As with general electronic equipment, semiconductor device failures are classified into three regions: the early, random and wear-out failure regions. The failure rate plotted against time forms a curve called a bathtub curve, as shown in Fig. 2-6.

This curve is the sum of the early failure rate, which decreases steadily over time, the random failure rate, which is constant, and the wear-out failure rate, which increases steadily over time. In semiconductor devices, however, the random failure rate is thought to consist only of a small contribution from soft errors, described hereafter. The failure rate in the random failure region (the height of the bottom of the bathtub) can therefore be said to be dominated by the sum of the early failure rate, as it converges toward a constant value, and the wear-out failure rate, as it begins to rise.

[Figure: failure rate vs. operating time from product shipment through the useful life (years), divided into the early, random and wear-out failure regions; the early, random and wear-out failure rate curves add up to the bathtub curve.]

Fig. 2-6 Time-Dependent Change in Semiconductor Device Failure Rate
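To illustrate how the bathtub curve arises as a sum, the sketch below adds a decreasing Weibull failure rate (m < 1), a constant random failure rate and an increasing Weibull failure rate (m > 1). All parameters are hypothetical and chosen only to make the shape visible. They are not measured device data.

```python
def weibull_hazard(t, m, eta):
    """Weibull failure rate with location parameter gamma = 0 (Eq. 2.1.17)."""
    return (m / eta) * (t / eta) ** (m - 1)


def bathtub_rate(t):
    """Hypothetical bathtub curve (failures per hour) built from three components."""
    early = weibull_hazard(t, m=0.5, eta=1e12)     # decreasing early failure rate
    random_rate = 1e-9                             # constant random failure rate (1 FIT)
    wear_out = weibull_hazard(t, m=4.0, eta=1e6)   # increasing wear-out failure rate
    return early + random_rate + wear_out


times = [10.0, 1e3, 1e4, 5e4, 1e5, 2e5, 5e5]
for t in times:
    print(f"{t:>8.0f} h: {bathtub_rate(t) * 1e9:8.2f} FIT")
```

With these numbers the total falls from about 160 FIT at 10 h to a minimum of a few FIT near 5 × 10^4 h and then rises as the wear-out term takes over.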

2.1.3.2 Early Failures

The failure rate in the early failure period is called the early failure rate (EFR), and it decreases monotonically over time. The vast majority of semiconductor device early failures are caused by defects built into devices, mainly in the wafer process. The most common causes of these defects are dust adhering to wafers during the wafer process and crystal defects in the gate oxide film or the silicon substrate. Most devices containing defects rooted in the manufacturing process fail within the manufacturing process and are rejected as defective in the final sorting process. However, a certain percentage of devices with relatively minor defects may not yet have failed at final measurement and may be shipped as passing products. Such inherently defective devices often fail when stress (voltage, temperature, etc.) is applied for a relatively short period, producing a high failure rate within a short time, either in the customer's mounting process or in the initial stages after shipment. Because these defective devices fail and are eliminated over time, the rate at which early failures occur decreases.

This decreasing failure rate can be exploited by a screening method known as “burn-in,” in which stress is applied for a short time before shipment to eliminate devices containing initial defects. Removing devices with inherent initial defects by burn-in not only improves the early failure rate of the product group in the market, but also allows high quality to be maintained over a long period, as long as the products do not enter the wear-out failure region.

An overview of burn-in is described below.

(1) Derivation of failure distribution function of early failure period

In order to determine the burn-in conditions for reliably removing devices with inherent early failures, it is

necessary to obtain the failure distribution function of the early failure period.


To obtain this function, highly accelerated life tests are performed over a short time on a sample large enough to be certain to contain devices with inherent initial defects (normally several thousand to ten thousand pieces). The resulting failure time data is plotted on Weibull probability paper, and the failure distribution function is estimated from the regression line.

Fig. 2-7 shows an example of this process. The shape parameter m and the characteristic life η that determine

the Weibull distribution in the following equation can be obtained from the linear regression.

$$F(t) = 1 - \exp\left[-\left(\frac{t}{\eta}\right)^{m}\right] \qquad \text{(Eq. 2.1.18)}$$

This method of obtaining the failure distribution function is called a burn-in study.

Fig. 2-7 Weibull Plot of the Burn-in Study

Note) Weibull probability paper is scaled so that failure times following a Weibull distribution plot as a straight line.
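Plotting on Weibull paper and reading the regression line amounts to a least-squares fit of ln(−ln(1 − F)) against ln t. The sketch below makes two assumptions that the handbook does not specify: it uses Bernard's median-rank approximation for the plotting positions, and it treats the devices that did not fail as survivors past the end of the test. The function name is hypothetical.

```python
import math


def fit_weibull_median_rank(failure_times, n_samples):
    """Estimate Weibull shape m and characteristic life eta (gamma = 0, Eq. 2.1.18).

    failure_times: observed failure times from the burn-in study (at least two).
    n_samples: total devices on test; devices that did not fail are treated as
    surviving past the last failure (right-censored).
    Plotting positions use Bernard's approximation F_i = (i - 0.3) / (n + 0.4),
    an assumption of this sketch.
    """
    xs, ys = [], []
    for i, t in enumerate(sorted(failure_times), start=1):
        f_i = (i - 0.3) / (n_samples + 0.4)
        xs.append(math.log(t))                     # Weibull paper x-axis: ln t
        ys.append(math.log(-math.log(1.0 - f_i)))  # y-axis: ln(-ln(1 - F))
    # Least-squares line y = m * x + b, where b = -m * ln(eta).
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, math.exp(-b / m)


# Self-check with hypothetical, noise-free data: 12 failures among 5000 devices
# placed exactly on a Weibull line with m = 0.4 and eta = 1e7 h.
n = 5000
times = [1e7 * (-math.log(1 - (i - 0.3) / (n + 0.4))) ** (1 / 0.4) for i in range(1, 13)]
m_hat, eta_hat = fit_weibull_median_rank(times, n)
print(f"m = {m_hat:.3f}, eta = {eta_hat:.3g} h")  # recovers m = 0.400, eta = 1e+07 h
```

Real burn-in data scatter around the line, so the fitted m and η are estimates. An m well below 1 confirms an early failure (DFR) population.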

(2) Determining the burn-in conditions

The screening (burn-in) conditions required to reduce the early failure rate after shipment (Note 1) to the target value can be determined using the failure distribution function F(t) obtained from the burn-in study.

Let t0 be the burn-in time and K the acceleration factor of the burn-in conditions relative to the market environment. The cumulative early failure rate eliminated by burn-in is then F(K·t0). The new cumulative early failure rate up to time t after burn-in, written F'(t) to distinguish it from the original distribution F, is obtained by the following formula.

$$F'(t) = F(K t_0 + t) - F(K t_0) \qquad \text{(Eq. 2.1.19)}$$

This relationship can be expressed in graph form as shown in Fig. 2-8.

The burn-in conditions are selected as the combination of acceleration conditions and time that reduces this value to the target early failure rate or lower. The initial defects that cause early failures normally occur at the highest rate in the initial stages of process development and decrease thereafter as the process is improved and mastered. Because the early failure rate falls as these initial defects decrease, the burn-in time is reviewed as appropriate in step with process improvements.
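Eq. 2.1.19 and the search for burn-in conditions can be sketched as follows. The sketch assumes that F(t) is the Weibull distribution of Eq. 2.1.18, expressed in market-environment hours, and that the acceleration factor K is known. The parameter values, the 500 ppm one-year target and the helper names are hypothetical.

```python
import math


def weibull_cdf(t, m, eta):
    """Early-failure distribution F(t) = 1 - exp(-(t / eta) ** m) (Eq. 2.1.18), t in market hours."""
    return 1.0 - math.exp(-((t / eta) ** m))


def residual_early_failure(t, burn_in_hours, k, m, eta):
    """Cumulative early failure rate up to market time t after burn-in (Eq. 2.1.19)."""
    kt0 = k * burn_in_hours  # burn-in time converted to equivalent market time
    return weibull_cdf(kt0 + t, m, eta) - weibull_cdf(kt0, m, eta)


def shortest_burn_in(target, t_market, k, m, eta, candidate_hours):
    """Return the shortest candidate burn-in time meeting the target, or None."""
    for t0 in sorted(candidate_hours):
        if residual_early_failure(t_market, t0, k, m, eta) <= target:
            return t0
    return None


# Hypothetical burn-in study result m = 0.3, eta = 1e14 h; acceleration K = 100;
# target: at most 500 ppm cumulative failures in the first year (8760 h).
m, eta, k = 0.3, 1e14, 100.0
print(residual_early_failure(8760.0, 0.0, k, m, eta))  # no burn-in: roughly 960 ppm
print(shortest_burn_in(500e-6, 8760.0, k, m, eta, [4, 8, 12, 24, 48]))  # -> 12
```

Because the Weibull density decreases for m < 1, the residual failure rate falls monotonically as burn-in lengthens. The first candidate that meets the target is therefore the shortest sufficient burn-in time.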

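[Figure: failure probability density function f(t) vs. time, with the burn-in and shipment points marked. The early failures up to K·t0 are eliminated by screening, and the cumulative early failure rate F(t) is shown over the interval t after shipment.]

Fig. 2-8 Early Failure Screening by Burn-in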

Note 1) The early failure rate described in this section is not the instantaneous failure rate but the

cumulative failure rate over the specified period. See the [Supplement] under “2.1.1 Measures for

Representing Reliability.”

2.1.3.3 Random Failures

When devices containing initial defects have been eliminated to a certain degree, the early failure rate becomes

extremely small, and the failure rate exhibits a gradually declining curve over time. In this state, the failure

distribution is close to an exponential distribution, and this is called the random failure period. The semiconductor

device failure rate during this period is an extremely small value compared to the early failure rate immediately

after shipment, and is normally a level that can be ignored for the most part. Viewed in terms of failure

mechanisms, there are extremely few semiconductor device failures that can be clearly defined as random failures.

However, memory soft errors and other phenomena caused by α rays and other high-energy particles are sometimes classified as randomly occurring failure mechanisms.

When predicting semiconductor device failure rates, failures occurring sporadically after a certain long time has

passed since the start of operation and failures for which the failure cause could not be determined are treated as

random failures in some cases. However, most of these failures are thought to come from devices containing relatively minor initial defects (dust or crystal defects) that fail after a long time, and they should essentially be placed on the declining early failure rate curve. This type of failure rate cannot be estimated from tests performed with small sample sizes, such as reliability tests. Phenomena such as ESD breakdown, electrical overstress (EOS) breakdown due to overvoltage surges, and latch-up also occur at random, depending on the conditions of use. However, these phenomena are all caused by applying excessive stress beyond the device's absolute maximum ratings. They are therefore classified as breakdowns rather than failures and are not included in the random failure rate.

2.1.3.4 Wear-out Failures

Wear-out failures are failures rooted in the durability of the materials comprising semiconductor devices and the

transistors, metal lines, oxide films and other elements, and are an index for determining the device life (useful

years). In the wear-out failure region, the failure rate increases with time until ultimately all devices either fail or exhibit characteristic (parametric) defects.

The main wear-out failure mechanisms for semiconductor devices are as follows.

• Electromigration

• Hot carrier-induced characteristics fluctuation

• Time-dependent dielectric breakdown (TDDB)

• Laser diode luminance degradation

Semiconductor device life is defined as the time (or stress) at which the cumulative failure rate for the wear-out

failure mode reaches the prescribed value, and can be estimated using the results of reliability tests and test element

group (TEG) evaluation.

Semiconductor device life is often determined by the reliability of each element (metal lines, oxide film,

interlayer film, transistor, etc.) comprising the device, and these reliabilities are evaluated using TEG for each

element in the process development stage. These TEG evaluation results are incorporated into design rules in the

form of allowable stress limits (electric field strength, current density, etc.) to suppress wear-out failures in the

product stage and ensure long-term reliability. As a result, semiconductor devices experience almost no wear-out

failures within the reliability test time (stress) range in the product stage.

(1) Life estimation method

Semiconductor device life can be obtained as follows based on the wear-out failure data generated by TEG

evaluation and reliability tests. First, the time-dependent cumulative failure rate is fitted by linear regression using a Weibull or lognormal probability distribution. The life is then obtained by multiplying the time (or stress) at which the reference cumulative failure rate is reached by the acceleration factor of the accelerated test conditions (Fig. 2-9).


[Figure: Weibull probability plot of F(t) (%) from 0.1 to 99.9 against time (h) from 10 to 100000. The acceleration test failure rate line, multiplied by the acceleration factor, gives the predicted market-environment failure rate.]

Fig. 2-9 Failure Rate Prediction Method Using Weibull Probability Plotting Paper
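The life estimation of Fig. 2-9 can be sketched in two steps. First, find the time at which the fitted test distribution reaches the reference cumulative failure rate. Then multiply that time by the acceleration factor. The sketch assumes a Weibull fit with γ = 0 (a lognormal fit would use its own inverse CDF instead) and uses hypothetical values for m, η, the acceleration factor and the 0.1 % reference. It does not derive the acceleration factor itself.

```python
import math


def weibull_time_to_fraction(p, m, eta):
    """Time at which F(t) = 1 - exp(-(t / eta) ** m) reaches the fraction p (gamma = 0)."""
    return eta * (-math.log(1.0 - p)) ** (1.0 / m)


def predicted_life(p, m, eta_test, acceleration_factor):
    """Market life = test time to reach p, multiplied by the acceleration factor."""
    return weibull_time_to_fraction(p, m, eta_test) * acceleration_factor


# Hypothetical accelerated-test fit: m = 2.0, eta = 2000 h, acceleration factor 3000;
# reference cumulative failure rate 0.1 %.
life_h = predicted_life(0.001, 2.0, 2000.0, 3000.0)
print(f"life = {life_h:.3g} h (about {life_h / 8760:.0f} years)")
```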


2.2 Semiconductor Reliability Verification

2.2.1 Basic Approach Toward Reliability Verification

The Sony Semiconductor Business Unit performs reliability verification that takes into account semiconductor

device failure modes (see Fig. 2-10) in each stage from process development through mass production.

Fig. 2-10 Semiconductor Device Failure Rate Curve

2.2.1.1 Reliability Verification in the Development Stage

The failure time due to wear-out failure (intrinsic failure) of semiconductor devices, that is to say the life, is

determined by the failure mechanisms of the process elements described in 2.2.2.

Reliability is evaluated in the process development stage using test element groups (TEG) suitable for verifying

these failure mechanisms to confirm that the prescribed reliability is satisfied.

2.2.1.2 Reliability Verification in the Prototype Stage

(1) Reliability verification for wear-out failures (Intrinsic failures)

Reliability is evaluated over long times using small quantities of prototypes to verify that wear-out failures

do not occur in the assumed operating environments and operating periods. (See Table 2-1.)

(2) Reliability verification for early failures (Extrinsic failures)

Semiconductor devices tend to have a high failure rate at the start of operation, and this failure rate tends to

decrease steadily over time. This is because a certain percentage of semiconductor devices have inherent

manufacturing defects such as dust, causing these devices to fail. This tendency is more noticeable for new

processes, so burn-in studies are performed when introducing production to verify the early failure rate.

When the prescribed failure rate is not satisfied, burn-in and other screening methods are used to remove semiconductor devices with inherent manufacturing defects.

[Fig. 2-10 labels: failure rate vs. operating time; early failure mode (extrinsic failures) for a new process and after burn-in; wear-out failure mode (intrinsic failures).]

The Sony Semiconductor Business Unit continuously executes activities to stabilize and improve processes,

and strives to reduce the number of semiconductor devices with inherent manufacturing defects so that

prescribed early failure rates can be satisfied without the need to perform burn-in.

2.2.1.3 Reliability Verification in the Mass Production Stage

Mass production items are sampled* and reliability is periodically evaluated at the product level corresponding

to (1) above to confirm that the wear-out failure reliability level built in at the development stage is continuously

maintained from mass production onward.

* Samples are taken from each product family in consideration of combinations of wafer process, assembly

process, factory, and other factors.


Table 2-1 shows typical LSI product reliability test items used by the Sony Semiconductor Business Unit.

Table 2-1 Typical Sony LSI Product Reliability Test Items

Name of test | Code | Test conditions
High Temperature Operating Life | HTOL | Tj ≥ 125°C, Vop_max, 1000 h
Low Temperature Operating Life | LTOL | Ta = −55°C, Vop_max, 1000 h
Temperature Humidity Bias | THB | Ta = 85°C, 85% RH, Vop_max on/off, 1000 h
High Temperature Storage | HTS | Ta = 150°C, 1000 h
Temperature Cycling | TC | Ts = −65 to 125°C, 700 cycles; Ts = −40 to 125°C, 850 cycles; Ts = −65 to 150°C, 500 cycles
Moisture Sensitivity Level | MSL | Level 3 (standard rank) (J-STD-020)
Electrostatic Discharge, Human Body Model (HBM) | ESD HBM | C = 100 pF, R = 1500 Ω (JS-001-2014)
Electrostatic Discharge, Charged Device Model (CDM) | ESD CDM | Charged device model (JESD22-C101)
Latch-Up, Trigger Pulse Current Injection Method | LU I-Test | Trigger pulse current injection method (JESD78)
Latch-Up, Supply Overvoltage Method | LU V-Test | Power supply overvoltage method; Ta = 25°C and 125°C (JESD78)
Burn-In Study (Early Life Failure Rate) | BIS (ELFR) | Tj ≥ 125°C, Vop_max

2.2.2 Reliability in the Development and Design Stages

Semiconductor devices have failure mechanisms unique to semiconductors, and resolving these problems in the

process development stage is an important element for securing reliability. Stable product reliability can be

secured by verifying the required reliability when developing each process element and reflecting the results in the design rules.

Table 2-2 shows typical failure mechanisms that can pose problems in the process development stage. As

processes become more miniaturized, higher internal electric fields, current densities, metal line stress and other

factors increase the stress applied to transistors and metal lines. On the other hand, faster circuit speeds and

increased parasitic impedance (metal line resistance, parasitic capacitance) reduce operating margins, which is a

major issue in securing reliability with respect to transistor characteristics fluctuation.

Typical semiconductor device failure mechanisms that can pose problems in the process development and

design stages are described below.


Table 2-2 Typical Failure Mechanisms in the Process Development Stage

Process element | Failure mechanism | Failure mode and cause
Gate dielectric film | Time-dependent dielectric breakdown (TDDB) | Dielectric breakdown of the gate dielectric film. Bias applied to the gate electrode for a long time produces defects in the gate dielectric film, increasing the micro leak current and eventually causing dielectric breakdown.
Transistor | Hot carrier (HCI) | Transistor characteristics fluctuation due to trapping of hot carriers in the gate dielectric film. Electrons accelerated by high electric fields cause impact ionization, and the resulting high-energy electrons and holes are trapped in the oxide film, causing the transistor characteristics to fluctuate.
Transistor | NBTI (slow trap) | PMOS transistor characteristics fluctuation due to negative gate bias at high temperature (NBT). Also called the slow trap phenomenon: applying this bias at high temperature increases the interface state density and positive fixed charge, causing the transistor characteristics to fluctuate.
Memory device | Soft error | Memory data upset caused by high-energy cosmic-ray particles (neutrons, protons, etc.), α rays, etc. This temporary data error occurs mainly in DRAM and SRAM.
Memory device | Retention/disturb | Non-volatile memory data loss. Long-term storage or operating environment stress (read/write electric field, temperature, stress) causes the charge trapped in a Flash memory to be lost, inverting the data.
Metal lines | Electromigration | Increased metal line resistance and disconnection due to voids forming in metal lines. Collisions between electrons and metal atoms cause the metal atoms to move, creating voids.
Metal lines | Stress migration | Open defects caused by voids that form and grow in metal lines and connections (via holes) through metal creep driven by metal line stress. In copper lines, vacancies induced by metal line stress drive the creep phenomenon, causing voids to form and grow.
Low-k interlayer films | TDDB between metal lines | Short circuit due to dielectric breakdown between copper lines. Breakdown occurs mainly along the CMP interface of an interlayer dielectric film made of low-k material, short-circuiting adjacent metal lines.


2.2.2.1 Time-dependent Dielectric Breakdown (TDDB)

The gate dielectric film of a MOS FET has a failure mechanism whereby even an electric field at or below the dielectric withstand voltage, applied for a long time, causes the film to deteriorate and eventually break down. This breakdown of the dielectric film over time is called time-dependent dielectric breakdown (TDDB). The TDDB life of the gate dielectric film is one of the most important factors determining the long-term reliability of a MOS-type semiconductor device. The TDDB life is said to set the limit for reducing the gate dielectric film thickness, and in system LSIs the gate dielectric film thickness is sometimes determined by the TDDB life at the logic circuit supply voltage.

(1) Gate dielectric film life distribution

Time-dependent dielectric breakdown phenomena can generally be divided into an initial breakdown area rooted in defects and a genuine life area. Fig. 2-11 shows TDDB measurement data for a gate oxide film (SiO2) plotted using a Weibull distribution function. The initial breakdown and genuine life areas can be separated by the difference in the shape parameter (graph slope) of the Weibull distribution function.

Dielectric film in the initial breakdown area, with its short TDDB life, is oxide film that contains defects and may fail in a short time in the market. It is therefore important to suppress the defect occurrence rate in order to lower the early failure rate.

In contrast, the genuine life area indicates the intrinsic life of gate dielectric film that contains no major defects, and is a necessary index for assuring long-term reliability. The genuine life at the actual operating voltage can be predicted with an electric field acceleration model from the results of TDDB tests accelerated under high electric field stress. Depending on the film thickness and film type, the E-model (τ ∝ exp(−γE), where γ is the field acceleration parameter), the power-law model (τ ∝ E^−n) and other models are used. (See Fig. 2-12.)
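The two field acceleration models named above can give very different extrapolations from the same test result. The sketch below applies each to a hypothetical genuine life of 10 h at 12 MV/cm and a use field of 5 MV/cm. The values of γ and n are also hypothetical, and which model applies depends on the film thickness and type.

```python
import math


def e_model_life(tau_test, e_test, e_use, gamma):
    """E-model, tau ∝ exp(-gamma * E): tau_use = tau_test * exp(gamma * (E_test - E_use))."""
    return tau_test * math.exp(gamma * (e_test - e_use))


def power_law_life(tau_test, e_test, e_use, n):
    """Power-law model, tau ∝ E**(-n): tau_use = tau_test * (E_test / E_use) ** n."""
    return tau_test * (e_test / e_use) ** n


# Hypothetical numbers: genuine life of 10 h measured at 12 MV/cm, use field 5 MV/cm,
# gamma = 4.0 cm/MV for the E-model and n = 40 for the power-law model.
tau_e = e_model_life(10.0, 12.0, 5.0, gamma=4.0)
tau_p = power_law_life(10.0, 12.0, 5.0, n=40)
print(f"E-model: {tau_e:.2e} h, power law: {tau_p:.2e} h")
```

With these particular numbers, the E-model predicts the shorter, more conservative life by about three orders of magnitude. This is why the choice of model matters when extrapolating across many orders of magnitude in time.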

(2) Gate dielectric film breakdown mechanism

Gate dielectric film contains a large number of micro defects and impurities introduced in the wafer process. Micro leak currents flow through these defects even when the applied electric field (supply voltage) is below the genuine withstand voltage. These leak currents generate new defects in the dielectric film over time, and the accumulation of these defects leads to dielectric breakdown.

The percolation model is a typical failure model for TDDB of thin gate dielectric film. In this model, defects initially present in the gate dielectric film combine with new defects generated by the tunnel current that flows when an electric field is applied. Dielectric breakdown occurs when these defects form a continuous path through the film thickness. (See Fig. 2-13.)

As the gate dielectric film becomes thinner, fewer defects are needed to form the continuous path that causes dielectric breakdown, so the variance of the TDDB life increases. In addition, data written in Flash memories can be lost (retention failure) through micro leak currents even before breakdown.

Fig. 2-11 TDDB Data Distribution (Weibull) (labels: genuine life distribution; oxide film that includes defects (early failure area))

Fig. 2-12 Electric Field Acceleration Model and Life Prediction (EFIELD: actual electric field; ETEST: test electric field)

Fig. 2-13 Gate Dielectric Film Breakdown Model (Percolation Model): (a) initial stage, (b) defects generated by micro leakage current, (c) breakdown occurs

2.2.2.2 Hot Carrier Injection (HCI)

Hot carrier degradation is a failure mechanism in which carriers (charge) that have gained high energy, mainly by acceleration in the electric field inside a MOS FET, become trapped in the gate dielectric film, shifting the transistor characteristics and causing circuit malfunction. In a typical operating environment, the greatest transistor degradation is caused by drain avalanche hot carrier (DAHC) injection, which occurs when electrons flowing through an NMOS FET channel are accelerated by the high electric field near the drain. The same hot carrier mechanism of injecting charge into the dielectric film is also used intentionally to write and erase data in non-volatile memories.

(1) Drain Avalanche Hot Carrier (DAHC) injection

Electrons flowing in an NMOS FET channel are accelerated by the high electric field near the drain and cause impact ionization, generating electron-hole pairs. Whichever of these electrons or holes have acquired high energy (hot carriers) are injected into and trapped by the gate dielectric film, shifting the transistor characteristics (threshold voltage shift, drop in drain current, etc.). This is called drain avalanche hot carrier (DAHC) injection. (See Fig. 2-14.)

In an NMOS FET, DAHC injection is dominated by electron injection, and degradation is greatest when the gate voltage is approximately 1/2 VDS. In a CMOS circuit, the transistor passes through this bias condition each time the signal switches (H→L or L→H), so hot electron injection occurs at every transition and degradation progresses as the circuit operates.

This problem can be avoided at the circuit design stage by selecting operating conditions (voltage, duty) under which hot carriers are not easily generated, and reliability can be further increased by giving circuits the required operating margin. Device countermeasures are also taken, such as adopting a lightly doped drain (LDD) structure that suppresses hot carrier generation by reducing the electric field near the drain.


Fig. 2-14 DAHC Mechanism

2.2.2.3 Negative Bias Temperature Instability (NBTI)

PMOS FET negative bias temperature instability (NBTI) is the phenomenon in which transistor characteristics shift when a negative gate bias is applied to a PMOS FET. It is one of the transistor degradation mechanisms known as slow traps. In the latest MOS processes, the use of surface-channel PMOS FETs increases this degradation, making NBTI a transistor reliability problem on a par with hot carriers.

(1) NBTI deterioration mechanisms

When a negative bias is applied to a PMOS FET, holes at the Si surface are captured at Si-H bonds at the Si-SiO2 interface, and hydrogen (H) dissociates from the Si-H bond, generating an interface state. The dissociated hydrogen diffuses into the gate dielectric film and is trapped there, generating a positive fixed charge that promotes degradation of the transistor characteristics.

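$$\mathrm{Si}{\equiv}\mathrm{Si{-}H} + h^{+} \;\rightarrow\; \mathrm{Si}{\equiv}\mathrm{Si}^{\bullet +} + \mathrm{H}$$

$$\mathrm{H} + \mathrm{H} \;\rightarrow\; \mathrm{H_2}$$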

The interface states generated at the interface between the Si and the gate dielectric film capture positive charge while the PMOS FET operates and become positively charged. Together with the positive fixed charge in the dielectric film, this shifts the transistor threshold voltage (Vth) and reduces the drain current.

A characteristic of NBTI is that degradation occurs whenever negative bias is applied to the gate, regardless of transistor switching, so degradation proceeds even in circuits that are not operating. On the other hand, the shifted characteristics recover rapidly when the negative bias stress is removed, and the amount of shift in the operating state is known to be largely independent of the operating frequency. On the process side, the amount of NBTI degradation is closely related to the concentration and profile of impurities (N, H, B, etc.) in the gate dielectric film, and degradation increases in particular for gate dielectric films with a high nitrogen (N) content (SiON, SiN).


This problem can be addressed by design countermeasures, such as providing sufficient circuit operating margin to tolerate transistor degradation and reducing the electric field applied to the gate dielectric film. Device countermeasures are also taken, such as forming the gate dielectric film so that interface states and fixed charges are not easily generated.

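Fig. 2-15 NBTI Failure Mechanisms (Si-SiO2 interface terminated by hydrogen (H) under negative bias → hole trapping by the tunnel phenomenon → dissociation of hydrogen (H) and generation of an interface state → diffusion into the oxide film and generation of a positive fixed charge)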

2.2.2.4 Soft Error

When α rays, or high-energy neutrons originating from cosmic rays, penetrate memory elements and other semiconductor devices, large numbers of electron-hole pairs are generated in the silicon crystal. This charge can invert the data held at memory nodes, producing memory data errors known as soft errors. A soft error inverts memory or logic circuit data only temporarily, and the error can be recovered by rewriting the data. Soft errors were previously a problem mainly for DRAM, but they are now also a concern for SRAM reliability.

(1) Principle of soft error generation by α rays

The quartz (silica) materials used in the resin that seals semiconductor packages contain trace amounts of radioactive elements (uranium: 238U; thorium: 232Th). In addition, the lead bumps used in flip chips sometimes contain polonium (210Po). When high-energy α rays emitted by these radioactive elements penetrate the silicon substrate, electron (e−) and hole (h+) pairs are generated along the α-ray path in the silicon. Electrons generated inside depletion layers are swept by the electric field and collected in the n diffusion area, which lowers the potential stored on the memory node capacitance. (See Fig. 2-16.)
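To give a sense of scale, the sketch below estimates the charge an α particle generates in silicon, assuming about 3.6 eV is consumed per electron-hole pair (the commonly quoted value for Si) and that the particle deposits all of its energy in silicon. The 5 MeV particle energy, the 5% collected fraction, and the 10 fF node capacitance are hypothetical inputs chosen only for illustration, and the potential drop uses the simple estimate ΔV = Q/C.

```python
ELEMENTARY_CHARGE = 1.602176634e-19   # C
PAIR_ENERGY_SI_EV = 3.6               # eV per electron-hole pair in Si (approximate)


def generated_charge_fc(alpha_energy_mev):
    """Charge (fC) of the electrons generated when the alpha stops in Si."""
    pairs = alpha_energy_mev * 1e6 / PAIR_ENERGY_SI_EV
    return pairs * ELEMENTARY_CHARGE * 1e15


def node_voltage_drop(collected_fc, node_capacitance_ff):
    """Potential drop (V) of a storage node: delta V = Q / C (fC / fF = V)."""
    return collected_fc / node_capacitance_ff


if __name__ == "__main__":
    q = generated_charge_fc(5.0)                 # hypothetical 5 MeV alpha particle
    collected = 0.05 * q                         # hypothetical: 5% collected by the node
    drop = node_voltage_drop(collected, 10.0)    # hypothetical 10 fF node
    print(f"generated {q:.0f} fC, collected {collected:.1f} fC, node drop {drop:.2f} V")
```

Comparing the drop with the stored High-level voltage shows whether a single particle can pull the node low enough to disturb the cell.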

Fig. 2-17 shows the soft error mechanism in an SRAM memory cell. When the potential of the High-side memory node falls below the threshold voltage of the driver transistor, both inverters forming the flip-flop turn off at the same time, making the flip-flop unstable and causing misoperation. Generally, when the word line is selected, the High-side memory node potential (Vh) drops to Vcc - Vth (Vth: threshold voltage of the word transistor). When the word line is not selected, the High-side memory node is charged through the memory cell load and its potential returns to Vcc. The shorter the recovery time from Vcc - Vth to Vcc, that is, the greater the current supply capability of the memory cell load, the more resistant the SRAM is to soft errors.

Countermeasures against soft errors caused by α rays include forming a protective film on the chip surface to absorb α rays. Measures are also taken to reduce α-ray emission levels, such as using high-purity package materials with a reduced content of radioactive elements.

Fig. 2-16 Generation of Electron and Hole Pairs by α Rays


Fig. 2-17 Soft Error in the SRAM Cell

(2) Soft errors due to cosmic rays

High-energy cosmic rays collide with the atoms that make up the atmosphere, generating high-energy protons and neutrons. When these high-energy neutrons pass through silicon, they collide with silicon atoms and produce secondary ions through spallation reactions; the electron-hole pairs generated along the paths of these particles can cause soft errors. The flux of cosmic-ray neutrons reaching the ground is known to be higher at high elevations because of geographical conditions and weaker atmospheric shielding, which increases the soft error rate. This can pose serious reliability problems in applications such as aircraft and satellites.

Because the causes of cosmic-ray soft errors are difficult to suppress, this is known as a failure mode that occurs with a certain probability. One countermeasure for SRAM is to implement error correcting code (ECC) so that data affected by soft errors is corrected. Device structures that are resistant to soft errors, such as SOI structures, are also sometimes used.
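As a minimal illustration of how ECC corrects a single flipped bit, the sketch below implements a Hamming(7,4) code, which protects 4 data bits with 3 parity bits. The code choice is an assumption for illustration only: the handbook does not specify a code, and practical memory ECC usually works on wider words (for example with single-error-correcting, double-error-detecting codes).

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into 7 bits at positions 1..7."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code):
    """Correct at most one flipped bit; return (data bits, error position or 0)."""
    bits = list(code)
    syndrome = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            syndrome ^= position     # XOR of the positions that hold a 1
    if syndrome:
        bits[syndrome - 1] ^= 1      # flip the erroneous bit back
    return [bits[2], bits[4], bits[5], bits[6]], syndrome


if __name__ == "__main__":
    stored = hamming74_encode([1, 0, 1, 1])
    upset = stored.copy()
    upset[4] ^= 1                    # a soft error flips the bit at position 5
    data, pos = hamming74_decode(upset)
    print(data, pos)                 # -> [1, 0, 1, 1] 5
```

A non-zero syndrome directly gives the position of the flipped bit, so the stored word is repaired on read as long as only one bit in the word was upset.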

2.2.2.5 Electromigration

Electromigration is a failure mechanism in which electrons flowing through metal (Al, Cu) lines collide with the metal atoms, causing the atoms to migrate and form voids in the lines; this leads to increased line resistance and disconnection. Electromigration is a key failure mechanism that determines the long-term reliability of metal lines.

(1) Aluminum electromigration


The thin films used for aluminum (Al) lines are formed by sputtering, and the aluminum atoms accumulate in a polycrystalline (grain) structure. (See Fig. 2-18.) When current at or above a certain density flows through these lines, electromigration occurs: the metal atoms are physically moved by the force of collisions between electrons and metal atoms. Because metal atoms at grain boundaries have weak bonding energy and move easily, electromigration along the grain boundaries of lines with uneven grain sizes causes voids to form and grow along those boundaries, leading to disconnection. (See Figs. 2-19 and 2-20.)

Process countermeasures include adding trace amounts of copper to the aluminum to suppress aluminum atom migration by slowing atomic movement, and covering the top and bottom of the lines with Ti, W, or other metal alloys (cap layers) to suppress aluminum atom movement. Circuit design countermeasures are also taken, such as keeping the current density in metal lines at or below a specified value.

Fig. 2-18 Aluminum Grain Structure

Fig. 2-19 Electromigration Mechanism (labels: electron flow; grain boundary diffusion along Al grain boundaries; Al accumulation and Al shortage (void))

Fig. 2-20 Photo of Electromigration

(2) Copper electromigration

Copper lines are formed by an embedded wiring (damascene) process that uses electroplating. Copper has a higher melting point and a higher activation energy than aluminum, and its electromigration resistance is several tens to several hundreds of times higher than that of aluminum. However, the miniaturization of metal lines in the latest processes is increasing current density, so electromigration resistance is becoming an important reliability issue.

The electromigration resistance of copper is known to be strongly affected by grain size and crystal orientation and by the adhesion at the interface between the copper and the barrier metal. In particular, in copper lines surrounded by barrier metal, if the adhesion is poor between the copper and the cap layer on the planarized top surface, copper atoms at that interface move easily and migrate. It is therefore important that the process include measures to increase the adhesion at the interface between the copper and the cap layer. Circuit design countermeasures are also taken, such as keeping the current density in metal lines at or below a specified value.

2.2.2.6 Stress Migration

Stress migration is a failure mechanism in which stress applied to metal lines causes the metal atoms to creep, forming voids that increase line resistance and lead to disconnection. Stress arises in the metal lines (Al, Cu) used in LSIs because of the temperature difference between the heat treatment in the manufacturing process and the operating environment. Driven by this stress, vacancies in the lines can creep and converge at a single location, forming a void.

Stress migration results from the interaction between line stress and the creep of metal atoms. At higher temperatures, metal atoms creep faster, but the stress acting on the lines decreases, so the rate of stress migration is known to peak at a certain temperature.
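The temperature peak can be illustrated with a widely used empirical form that is not given in this handbook: a relative rate R(T) ∝ (T0 − T)^n · exp(−Ea/kT), where T0 is the stress-free temperature (roughly the temperature at which the film was processed). The first factor falls as T approaches T0 (less stress), while the Arrhenius factor rises (faster creep). The parameter values below (T0 = 500 K, n = 2, Ea = 0.6 eV) are illustrative assumptions, not data from this handbook.

```python
import math

BOLTZMANN_EV = 8.617e-5  # eV/K, same value as used in Section 2.3


def relative_rate(temp_k, t0_k=500.0, n=2.0, ea_ev=0.6):
    """Relative stress-migration rate (T0 - T)**n * exp(-Ea / kT), for T < T0."""
    if temp_k >= t0_k:
        return 0.0                         # no thermal stress at or above T0
    return (t0_k - temp_k) ** n * math.exp(-ea_ev / (BOLTZMANN_EV * temp_k))


def peak_temperature(t_min=300.0, t_max=499.0, step=0.5, **params):
    """Grid-search the temperature (K) at which the relative rate is largest."""
    temps = [t_min + i * step for i in range(int((t_max - t_min) / step) + 1)]
    return max(temps, key=lambda t: relative_rate(t, **params))


if __name__ == "__main__":
    t_peak = peak_temperature()
    print(f"peak at about {t_peak:.1f} K ({t_peak - 273.15:.1f} degC)")
```

With these assumed values the peak falls near 170°C; different T0, n, or Ea move it.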


(1) Aluminum stress migration

Aluminum lines have many vacancies and weakly bonded aluminum atoms at the grain boundaries of the polycrystalline structure, so when tensile stress is applied to the lines, these atoms and vacancies creep and form voids. Aluminum voids produced by tensile stress mainly form and grow along grain boundaries and can lead to increased line resistance and disconnection defects. (See Fig. 2-21.)

Aluminum stress migration is generally said to peak in occurrence at around 150 to 200°C, and it can become a long-term reliability problem for devices used for long periods in high-temperature environments.

As a design countermeasure, patterns are designed to avoid applying excessive stress to metal lines. Process countermeasures include a line structure that sandwiches the aluminum between upper and lower cap layers (Ti, W, etc.) to prevent stress migration. In addition, the residual stress in metal lines is reduced by using an interlayer film structure that lowers stress and by optimizing the heat treatment process.

Fig. 2-21 Disconnection Defect due to Aluminum Stress Migration

(2) Copper stress migration

For copper, the stress induced voiding (SIV) mode, which produces voids in the via holes connecting upper and lower lines, is a reliability problem. When a wide line and a narrow line are connected by a single via hole, the tensile stress on the wide-line side concentrates at the via hole, and vacancies in the copper creep and migrate to the via hole, forming a void. (See Fig. 2-22.) Stress migration at copper via holes is known to have an occurrence peak at around 200°C. However, because this failure depends largely on the stress generated in the high-temperature annealing process after copper line formation, it appears within a short time and is an early failure factor.

A countermeasure at the design stage is to use multiple via holes where wide and narrow lines are connected. When lines are connected by multiple via holes, even if stress concentrates on one via hole and creates a void there, the stress on the other via holes is reduced, so voids are unlikely to form in them and open defects between lines are prevented. Process countermeasures are also taken, such as reducing the stress in the copper and selecting process conditions that reduce the vacancies in the copper.

Fig. 2-22 Void Caused by Stress Migration in a Copper Wiring Via Hole 1)

<References>

1) R. Kanamura et al.: Symp. on VLSI Tech., p. 107, 2003


2.3 Acceleration Model

In general, failures of components, including semiconductor devices, arise from reactions at the atomic or molecular level and can be described by Eyring's absolute reaction rate theory (hereafter, the “Eyring model”). The Eyring model expresses the lifetime L, over the range of absolute temperature T that is of interest for reliability, by the following separable (product-form) equation, using the activation energy Ea shown in Fig. 2-23, the non-temperature stress S that induces failure, and the Boltzmann constant k (8.617×10^-5 eV/K).

$$L = A\,S^{-n}\exp\!\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.1)}$$

“A” and “n” in the above equation are constants.
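Because the constant A cancels when the same failure mechanism is compared under two conditions, Eq. 2.3.1 is usually applied as an acceleration factor, AF = L_use/L_test = (S_test/S_use)^n · exp[(Ea/k)(1/T_use − 1/T_test)]. The sketch below evaluates this ratio; the function name and the example conditions (Ea = 0.7 eV, 55°C use, 125°C test) are hypothetical.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K, the value quoted for Eq. 2.3.1


def eyring_acceleration_factor(ea_ev, t_use_c, t_test_c, s_use=1.0, s_test=1.0, n=0.0):
    """Return L_use / L_test for L = A * S**(-n) * exp(Ea / kT) (Eq. 2.3.1).

    Temperatures are in degrees Celsius. With n = 0 (or equal stresses) this
    reduces to the Arrhenius factor of Eq. 2.3.2.
    """
    t_use = t_use_c + 273.15
    t_test = t_test_c + 273.15
    stress_term = (s_test / s_use) ** n
    thermal_term = math.exp(ea_ev / BOLTZMANN_EV * (1.0 / t_use - 1.0 / t_test))
    return stress_term * thermal_term


if __name__ == "__main__":
    # Hypothetical example: Ea = 0.7 eV, operation at 55 degC, test at 125 degC.
    af = eyring_acceleration_factor(0.7, 55.0, 125.0)
    print(f"AF = {af:.1f}: 1000 h at 125 degC corresponds to about {1000 * af:.0f} h at 55 degC")
```

The same ratio structure (stress term times Arrhenius term) applies to each of the Eyring-type models that follow.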

The environmental and operating stress acceleration models used for semiconductor devices are outlined below.

2.3.1 Acceleration Models for Environmental Stress

(1) Temperature acceleration model

The factor exp(Ea/kT) on the right side of Eq. 2.3.1 is also called the Arrhenius model, because it has the same form as the equation derived empirically by Arrhenius in the 19th century.

Ea is the activation energy, expressed in eV. It is the energy barrier that must be overcome for a chemical or physical reaction to proceed. If the chemical and physical reactions that make up two failure mechanisms are the same, their activation energies are necessarily equal.

$$L = A\exp\!\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.2)}$$

(2) Humidity acceleration model

Humidity acceleration models express humidity stress in terms of either the absolute vapor pressure Vp or the relative humidity RH.

Typical models are described below.

① Absolute vapor pressure model

This model expresses temperature stress and humidity stress together through the absolute vapor pressure Vp, and it is known empirically to fit well. Because Vp itself depends on temperature, the separable form of the Eyring model cannot be used.

$$L = V_p^{-n} \qquad \text{(Eq. 2.3.3)}$$

② Relative humidity model

Because Vp depends on temperature, this model instead follows the Eyring model, using the absolute temperature T and the relative humidity RH as separate variables in a separable equation. It corresponds to setting S = RH in Eq. 2.3.1.

$$L = A\,(RH)^{-n}\exp\!\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.4)}$$
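Applied to a humidity test and a use environment, Eq. 2.3.4 gives the acceleration factor AF = L_use/L_test = (RH_test/RH_use)^n · exp[(Ea/k)(1/T_use − 1/T_test)]. The sketch below compares an 85°C/85%RH test with a hypothetical 30°C/60%RH use environment; n = 3 and Ea = 0.9 eV are placeholder values, since the appropriate constants depend on the failure mechanism and should be taken from data or the references.

```python
import math

BOLTZMANN_EV = 8.617e-5  # eV/K


def rh_acceleration_factor(rh_use, t_use_c, rh_test, t_test_c, n, ea_ev):
    """L_use / L_test for L = A * RH**(-n) * exp(Ea / kT) (Eq. 2.3.4)."""
    t_use = t_use_c + 273.15
    t_test = t_test_c + 273.15
    humidity_term = (rh_test / rh_use) ** n
    thermal_term = math.exp(ea_ev / BOLTZMANN_EV * (1.0 / t_use - 1.0 / t_test))
    return humidity_term * thermal_term


if __name__ == "__main__":
    af = rh_acceleration_factor(rh_use=60.0, t_use_c=30.0,
                                rh_test=85.0, t_test_c=85.0, n=3.0, ea_ev=0.9)
    print(f"AF = {af:.0f}: 1000 h at 85 degC/85%RH ~ {1000 * af / 8760:.0f} years in use")
```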


③ Lycoudes model

Models that multiply functions of temperature, relative humidity, and voltage are also used. A typical example is the Lycoudes model reported by N. Lycoudes, shown below.

$$\mathrm{MTTF} = A\exp\!\left(\frac{E_a}{kT}\right)\exp\!\left(\frac{B}{RH}\right)V^{-1} \qquad \text{(Eq. 2.3.5)}$$

“V” and “B” in the above equation are voltage and a constant respectively.

(3) Temperature difference acceleration model

This model applies to failures caused by the repeated application of stress (thermal stress) produced by temperature differences. Denoting the temperature difference as ΔT, the number of cycles N is expressed by the following equation, obtained by substituting S = ΔT into Eq. 2.3.1.

$$N = A\,\Delta T^{-\alpha} \qquad \text{(Eq. 2.3.6)}$$
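Because Eq. 2.3.6 is a power law in ΔT, the ratio of cycles to failure between use and test is N_use/N_test = (ΔT_test/ΔT_use)^α, which lets a number of test cycles be converted into equivalent field cycles. The temperature swings and α = 2 used below are hypothetical.

```python
def cycle_acceleration_factor(delta_t_use, delta_t_test, alpha):
    """N_use / N_test for N = A * dT**(-alpha) (Eq. 2.3.6)."""
    return (delta_t_test / delta_t_use) ** alpha


def equivalent_field_cycles(test_cycles, delta_t_use, delta_t_test, alpha):
    """Field cycles represented by test_cycles passed without failure."""
    return test_cycles * cycle_acceleration_factor(delta_t_use, delta_t_test, alpha)


if __name__ == "__main__":
    # Hypothetical: -65/150 degC test (dT = 215 K) vs. a 40 K daily swing in use.
    cycles = equivalent_field_cycles(1000, delta_t_use=40.0, delta_t_test=215.0, alpha=2.0)
    print(f"1000 test cycles ~ {cycles:.0f} field cycles (~{cycles / 365:.0f} years of daily cycling)")
```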

[Supplement]

In low-cycle fatigue, the cycle life Nf for failure due to thermal fatigue of materials follows the Coffin-Manson model, described by the following equation, where Δεp is the plastic strain amplitude.

$$\Delta\varepsilon_p \cdot N_f^{\alpha} = C \qquad \text{(Eq. 2.3.7)}$$

“α” and “C” in the above equation are material constants.

In low-cycle fatigue, failure due to repeated thermal stress follows the Coffin-Manson model, and the temperature difference acceleration model is considered a form of that model. Semiconductor chip failures can broadly be described with the temperature difference acceleration model, but the Coffin-Manson model must be taken into account for mounting failures that involve package factors, such as the thermal fatigue life of solder joints. The following variation of the Coffin-Manson model, proposed by Norris et al., incorporates the effects of the temperature cycling frequency and the maximum temperature.

$$N_f = C\,f^{m}\,\Delta\varepsilon_p^{-n}\exp\!\left(\frac{Q}{kT_{\max}}\right) \qquad \text{(Eq. 2.3.8)}$$

In the above equation, “Nf” is the fatigue life, “C” is a material constant, “m” and “n” are exponents, “f” is the cycling frequency, “Δεp” is the plastic strain amplitude, “Q” is the activation energy, “k” is the Boltzmann constant, and “TMAX” is the maximum temperature.
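Taking the ratio of Eq. 2.3.8 between a use condition and a test condition cancels C, giving AF = N_f,use/N_f,test = (f_use/f_test)^m · (Δε_p,test/Δε_p,use)^n · exp[(Q/k)(1/T_max,use − 1/T_max,test)]. The sketch below evaluates this ratio; the cycling frequencies, strain amplitudes, exponents, and Q are hypothetical inputs, not recommended values.

```python
import math

BOLTZMANN_EV = 8.617e-5  # eV/K


def fatigue_acceleration_factor(f_use, f_test, strain_use, strain_test,
                                tmax_use_c, tmax_test_c, m, n, q_ev):
    """N_f,use / N_f,test for N_f = C * f**m * strain**(-n) * exp(Q / k Tmax) (Eq. 2.3.8)."""
    tmax_use = tmax_use_c + 273.15
    tmax_test = tmax_test_c + 273.15
    frequency_term = (f_use / f_test) ** m
    strain_term = (strain_test / strain_use) ** n
    thermal_term = math.exp(q_ev / BOLTZMANN_EV * (1.0 / tmax_use - 1.0 / tmax_test))
    return frequency_term * strain_term * thermal_term


if __name__ == "__main__":
    af = fatigue_acceleration_factor(
        f_use=1.0, f_test=24.0,                 # cycles per day (hypothetical)
        strain_use=0.002, strain_test=0.008,    # plastic strain amplitudes (hypothetical)
        tmax_use_c=60.0, tmax_test_c=125.0,
        m=1.0 / 3.0, n=2.0, q_ev=0.12)          # hypothetical exponents and Q
    print(f"AF = {af:.1f}")
```

With m > 0, the slower cycling in the field lowers AF, while the smaller field strain and lower maximum temperature raise it.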


2.3.2 Acceleration Models for Operating Stress

The operating stresses that determine semiconductor device life include voltage, current, electric field strength, and current density, and they differ by failure mechanism, as described in section 2.2.2. The acceleration models for the main failure mechanisms are described below.

Because life in these models also depends on temperature, each is expressed as an Eyring model of the operating stress and the temperature stress.

(1) Time-dependent dielectric breakdown (TDDB) acceleration models

The time to failure (TTF) due to TDDB depends on the gate oxide thickness. The Eox model is said to be appropriate for gate oxides 5 nm or thicker, the Vg model for thicknesses greater than 2 nm and less than 5 nm, and the power-law model for 2 nm or less.

① Eox model

$$\mathrm{TTF} = A\exp(-\gamma_{E_{ox}} E_{ox})\exp\!\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.9)}$$

② Vg model

$$\mathrm{TTF} = A\exp(-\gamma_{V_g} V_g)\exp\!\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.10)}$$

③ Power-law model

$$\mathrm{TTF} = A\,V_g^{-n}\exp\!\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.11)}$$

In the above equations, “γEOX” is the field intensity acceleration factor, “γVg” and “n” are voltage acceleration factors, “Eox” is the stress electric field applied to the gate, and “Vg” is the stress voltage applied to the gate.
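The thickness guideline above can be combined with Eqs. 2.3.9 to 2.3.11 into a small helper that selects the model and returns the acceleration factor TTF_use/TTF_test, including the Arrhenius term. The power law is taken here in the decreasing form TTF ∝ Vg^−n with n > 0, and the handling of exactly 5 nm and 2 nm follows the wording above. The function names and any values passed in are hypothetical.

```python
import math

BOLTZMANN_EV = 8.617e-5  # eV/K


def select_tddb_model(tox_nm):
    """Choose the TDDB model from the gate oxide thickness guideline."""
    if tox_nm >= 5.0:
        return "Eox"
    if tox_nm > 2.0:
        return "Vg"
    return "power-law"


def tddb_acceleration_factor(tox_nm, stress_use, stress_test, factor,
                             ea_ev, t_use_c, t_test_c):
    """TTF_use / TTF_test from Eqs. 2.3.9-2.3.11 plus the Arrhenius term.

    stress_* is Eox for the Eox model or Vg for the Vg and power-law models;
    factor is gamma_Eox, gamma_Vg, or n, respectively.
    """
    model = select_tddb_model(tox_nm)
    if model == "power-law":
        stress_term = (stress_test / stress_use) ** factor
    else:
        stress_term = math.exp(factor * (stress_test - stress_use))
    t_use = t_use_c + 273.15
    t_test = t_test_c + 273.15
    thermal_term = math.exp(ea_ev / BOLTZMANN_EV * (1.0 / t_use - 1.0 / t_test))
    return stress_term * thermal_term


if __name__ == "__main__":
    # Hypothetical 1.8 nm oxide: stressed at 2.2 V and 125 degC, used at 1.0 V and 85 degC.
    print(select_tddb_model(1.8), tddb_acceleration_factor(1.8, 1.0, 2.2, 40.0, 0.5, 85.0, 125.0))
```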

Fig. 2-23 Activation Energy (energy diagram labels: normal state, activated state, degraded state; activation energy)

(2) Hot carrier (HCI) acceleration models

The life of devices due to hot carriers is described by the substrate current model, expressed in terms of the substrate current, and by the 1/Vds model, expressed in terms of the drain voltage. For the 0.25 µm and 0.15 µm process generations and newer, effects other than the substrate current have a greater impact, so the 1/Vds model is becoming the main model.

① Substrate current model

$$\mathrm{TTF} = A\,I_{sub}^{-m}\exp\!\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.12)}$$

② 1/Vds model

$$\mathrm{TTF} = A\exp\!\left(\frac{B}{V_{ds}}\right)\exp\!\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.13)}$$

In the above equations, “m” is a factor depending on the substrate current, “B” is a factor depending on the voltage, “Isub” is the maximum substrate current during stress, and “Vds” is the drain voltage during stress.
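At a fixed temperature, Eq. 2.3.13 means that ln(TTF) is a straight line in 1/Vds with slope B, so lifetimes measured at several elevated drain voltages can be fitted and extrapolated to the operating voltage. The sketch below does this with an ordinary least-squares fit; the stress data are synthetic, generated from an assumed B = 40 V and prefactor A′ = 10^-3 s (A′ absorbs A·exp(Ea/kT)) purely to show that the fit recovers them.

```python
import math


def fit_inverse_vds_model(vds_values, ttf_values):
    """Least-squares fit of ln(TTF) = ln(A') + B / Vds (Eq. 2.3.13 at fixed T).

    Returns (ln_a, b), where A' absorbs A * exp(Ea / kT).
    """
    xs = [1.0 / v for v in vds_values]
    ys = [math.log(t) for t in ttf_values]
    count = len(xs)
    x_mean = sum(xs) / count
    y_mean = sum(ys) / count
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_mean - b * x_mean, b


def predict_ttf(ln_a, b, vds):
    """Lifetime at drain voltage vds from the fitted 1/Vds model."""
    return math.exp(ln_a + b / vds)


if __name__ == "__main__":
    # Synthetic stress data generated from an assumed model (B = 40 V, A' = 1e-3 s).
    stress_vds = [2.4, 2.6, 2.8, 3.0]
    ttf = [1e-3 * math.exp(40.0 / v) for v in stress_vds]
    ln_a, b = fit_inverse_vds_model(stress_vds, ttf)
    print(f"B = {b:.1f} V; predicted TTF at 1.8 V = {predict_ttf(ln_a, b, 1.8):.2e} s")
```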

(3) Negative Bias Temperature Instability (NBTI) acceleration models

The life of devices due to NBTI is often indicated by the following equations:

$$\mathrm{TTF} = A\exp(-\gamma E_{ox})\exp\!\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.14)}$$

$$\mathrm{TTF} = A\,E_{ox}^{-\gamma}\exp\!\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.15)}$$

$$\mathrm{TTF} = A\,V_g^{-n}\exp\!\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.16)}$$

In the above equations, “γ” is the field intensity acceleration factor, “n” is the voltage acceleration factor, “Eox” is the stress electric field applied to the gate oxide film, and “Vg” is the stress voltage applied to the gate oxide film.

(4) Electromigration (EM) acceleration model

The EM life of devices is generally explained theoretically by Huntington's equation.

$$\frac{\partial C}{\partial t} = D\,\nabla\cdot\left\{\nabla C - \frac{eZ^{*}}{kT}\,\mathbf{E}\,C\right\} \qquad \text{(Eq. 2.3.17)}$$

In the above equation, “C” is the atomic concentration, “D” is the diffusion coefficient, “Z*” is the effective valence, “E” is the electric field, “e” is the elementary charge, “k” is the Boltzmann constant, and “T” is the absolute temperature.
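To see how Eq. 2.3.17 produces voids, the sketch below solves its one-dimensional form, ∂C/∂t = ∂/∂x(D ∂C/∂x − vC) with drift velocity v = D·eZ*E/(kT), on a line whose two ends block atomic flux (as barrier metal would). The uniform field, constant D, arbitrary units, and the explicit upwind finite-volume scheme are simplifying assumptions. Atoms drain from the upstream end, where voids would nucleate, and pile up at the downstream end, while the total number of atoms is conserved.

```python
def simulate_em_1d(n_cells=50, length=1.0, diffusivity=1.0,
                   drift_velocity=0.5, t_end=0.5, c0=1.0):
    """Explicit finite-volume solution of dC/dt = d/dx(D dC/dx - v C).

    drift_velocity v corresponds to D * e * Z* * E / (k T) in Eq. 2.3.17;
    v > 0 moves atoms toward +x. Both ends are blocking (zero atomic flux).
    Units are arbitrary.
    """
    dx = length / n_cells
    # Time step kept below the explicit stability limit for diffusion + upwind drift.
    dt = 0.5 / (2.0 * diffusivity / dx ** 2 + abs(drift_velocity) / dx)
    c = [c0] * n_cells
    t = 0.0
    while t < t_end:
        flux = [0.0] * (n_cells + 1)          # flux[i]: between cells i-1 and i
        for i in range(1, n_cells):
            diffusion = -diffusivity * (c[i] - c[i - 1]) / dx
            upwind = c[i - 1] if drift_velocity > 0 else c[i]
            flux[i] = diffusion + drift_velocity * upwind
        for i in range(n_cells):
            c[i] -= dt / dx * (flux[i + 1] - flux[i])
        t += dt
    return c


if __name__ == "__main__":
    profile = simulate_em_1d()
    print(f"depleted end (x = 0): {profile[0]:.3f}, accumulating end (x = L): {profile[-1]:.3f}")
```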

To estimate the actual EM life (TTF), the empirically derived Black's equation is widely used.

In the following equation, “T” is the absolute temperature, “j” is the current density, “Ea” is the activation energy, “A” is a constant of proportionality, “n” is the current density exponent, and “k” is the Boltzmann constant.

$$\mathrm{TTF} = A\,j^{-n}\exp\!\left(\frac{E_a}{kT}\right) \qquad \text{(Eq. 2.3.18)}$$
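In practice, Eq. 2.3.18 is applied as a ratio between the EM test condition and the use condition: TTF_use/TTF_test = (j_test/j_use)^n · exp[(Ea/k)(1/T_use − 1/T_test)]. The sketch below also inverts this ratio to find the largest use current density that still meets a lifetime target, one way of arriving at a current-density limit such as the design countermeasure mentioned for electromigration. It works with median life only; a practical rule would also account for the failure distribution and margins. All numeric inputs (n = 2, Ea = 0.9 eV, 100 h at 2 MA/cm² and 250°C, a 10-year target at 105°C) are hypothetical.

```python
import math

BOLTZMANN_EV = 8.617e-5  # eV/K


def black_acceleration_factor(j_use, j_test, t_use_c, t_test_c, n, ea_ev):
    """TTF_use / TTF_test for Black's equation TTF = A * j**(-n) * exp(Ea / kT)."""
    t_use = t_use_c + 273.15
    t_test = t_test_c + 273.15
    thermal_term = math.exp(ea_ev / BOLTZMANN_EV * (1.0 / t_use - 1.0 / t_test))
    return (j_test / j_use) ** n * thermal_term


def max_use_current_density(ttf_test_h, j_test, t_use_c, t_test_c, n, ea_ev, target_h):
    """Largest j_use whose predicted median TTF still reaches target_h."""
    # TTF_use = ttf_test_h * (j_test / j_use)**n * thermal  ->  solve for j_use.
    thermal = black_acceleration_factor(j_test, j_test, t_use_c, t_test_c, n, ea_ev)
    return j_test * (ttf_test_h * thermal / target_h) ** (1.0 / n)


if __name__ == "__main__":
    # Hypothetical test: median TTF 100 h at 2 MA/cm^2 and 250 degC; use at 105 degC.
    j_max = max_use_current_density(100.0, 2.0, 105.0, 250.0, n=2.0, ea_ev=0.9,
                                    target_h=10 * 8760)
    print(f"j_use <= {j_max:.2f} MA/cm^2 for a 10-year median life")
```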


<References>

1) JEITA EDR-4704A: Application guide of the accelerated life test for semiconductor devices

2) JEITA EDR-4707: Report on Failure Mechanism of LSI and reliability test method

3) JEITA ETR-7024: Research Report on Effect of Voids on Reliability of Lead-Free Solder Joints and Standard of Evaluation Criteria

4) N. J. Flood: Reliability aspects of plastic encapsulated integrated circuit, IRPS (1972)

5) D. S. Peck: Temperature-humidity acceleration of metal-electronics failure in semiconductor devices, IRPS (1973)

6) N. Lycoudes: The reliability of plastic microcircuit in moist environments, Solid State Technology (1978)

7) T. Gasser: Hot Carrier Degradation in Semiconductor Device

8) Comparison of NMOS and PMOS hot carrier effects, IEEE Transactions on Electron Devices (1997)

9) H. B. Huntington: Diffusion in Solids, Academic Press (1975)
