RELIABILITY & FAILURE ANALYSIS

  • 7/26/2019 RELIABILITY & FAILURE ANALYSIS

    EMT 480/3: RELIABILITY &

    FAILURE ANALYSIS

    original version !

    oraini "#$%an

    Ei#e ! 'asni(a$ Aris

    Lecture contents

    1. Reliability Engineering
    2. Design for Reliability (DFR)
    3. Reliability Prediction Techniques
    4. FMEA
    5. FTA
    6. Reliability Life Testing

    Terms & Definitions

    Reliability
    The ability of a product to conform to its electrical and visual/mechanical specifications over a specified period of time, under specified conditions, at a specified confidence level.

    Reliability Engineering
    Refers to the development of technology, processes and standards to ensure the reliability of semiconductors during application. It focuses on eliminating maintenance requirements.

    Reliability Monitoring
    Consists of getting finished product samples from the line and subjecting these to reliability testing. Valid reliability failures should undergo root cause analysis for reliability improvement.

    Wafer-level Reliability Testing
    Once an integrated circuit has been designed and the first silicon comes out, reliability tests at wafer level are done to assess the reliability of the die.

    Package-level Reliability Testing
    Refers to the assessment of the overall reliability of the device in packaged form.

    New Product Qualification
    Operationally the same as package-level reliability testing, except that it is systemized with the objective of generating official reliability data that would justify the mass production of a new product.

    Designing for Reliability (DFR)

    The concept is to exert as much effort as possible to design a product to be inherently reliable.

    This consists of following all known design rules for making a product reliable, not only electrically but also visually and mechanically, as far as possible.

    Building reliability into a product as early as the design phase is a 'must'.

    Reliability design begins with the specification of reliability goals consistent with cost and performance objectives.

    These goals must be translated into individual component, subcomponent and part specifications.

    Various design methods are then applied in order to meet the goals (such as stress-strength analysis, simplification, etc.).

    A failure analysis is then performed to determine whether specifications are being met and to provide a systematic approach for identifying, ranking and eliminating failure modes.

    If either the reliability or the safety goals are not met, the design process must continue.

    Often, it may require a complete redesign.

    [Figure: Reliability Design]

    In summary, an excellent reliability engineering system would have all of the following components:
    (a) design for reliability
    (b) wafer-level reliability testing
    (c) package-level reliability testing
    (d) new product/process qualification
    (e) reliability monitoring

    What are Reliability Prediction Techniques?

    Reliability prediction is a design-assist process by which the reliability characteristics of a system are obtained, by calculating the anticipated system RAMS (Reliability, Availability, Maintainability and Safety-Integrity) from assumed component failure rates.

    The Importance of Reliability Prediction
    (a) provides an early indication of a system's potential to meet the design reliability requirements
    (b) enables assessment of life-cycle costs to be carried out
    (c) enables one to establish which components, or areas in a design, contribute to the major portion of unreliability
    (d) enables trade-offs to be made, e.g. between reliability and maintainability in achieving a given availability
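    As a rough illustration of predicting system figures from assumed component failure rates, the sketch below sums failure rates for a series system and derives MTBF, reliability over a mission time, and steady-state availability. The slides do not specify a model: the constant-failure-rate (exponential) assumption, the series arrangement, the component names and all numeric values are hypothetical.

```python
import math

def series_failure_rate(failure_rates):
    """Total failure rate of a series system (assumption: independent
    components, each with a constant failure rate in failures per hour)."""
    return sum(failure_rates)

def reliability(failure_rate, hours):
    """R(t) = exp(-lambda * t), valid only under a constant failure rate."""
    return math.exp(-failure_rate * hours)

def availability(mtbf_hours, mttr_hours):
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical component failure rates (failures per hour).
rates = {"regulator": 2e-6, "microcontroller": 5e-6, "connector": 1e-6}

lam = series_failure_rate(rates.values())   # system failure rate
mtbf = 1.0 / lam                            # mean time between failures
largest = max(rates, key=rates.get)         # main contributor to unreliability

print(f"system failure rate: {lam:.2e} per hour")
print(f"MTBF: {mtbf:.0f} hours")
print(f"R(1000 h): {reliability(lam, 1000):.4f}")
print(f"largest contributor: {largest}")
print(f"availability with 10 h MTTR: {availability(mtbf, 10):.6f}")
```

    The `availability` function shows the trade-off named in item (d): the same availability can be reached by raising MTBF (reliability) or by lowering MTTR (maintainability).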

    Why are Reliability Prediction Techniques Needed?

    Traditionally, reliability has been achieved through extensive testing and the use of techniques such as probabilistic reliability modeling. (These techniques are applied in the late stages of development.)

    The challenge is to design in quality and reliability early in the development cycle.

    The reliability of a device could then be known up front, during the design phase and before the device is manufactured.

    This could avoid costly redesign cycles.

    What Are The Factors That Affect The Reliability Performance of Electronic Components?

    Material quality
    Operating temperature
    Vibration and miscellaneous mechanical factors
    Electrical stress levels

    Introduction to FMEA

    (a) Introduction

    FMEA stands for Failure Modes and Effects Analysis.

    It is a methodology designed
    (i) to identify potential failure modes for a product or process;
    (ii) to assess the risk associated with those failure modes;
    (iii) to rank the issues in terms of importance; and
    (iv) to identify and carry out corrective actions to address the most serious concerns.

    For easy understanding, just remember that FMEA is intended to document
    (i) a Failure
    (ii) its Mode
    (iii) its Effects
    (iv) by Analysis

    (b) Types of FMEA

    There are two standard categories of FMEA:

    Design FMEA (DFMEA)
    addresses potential failure modes arising during the design of components and subsystems

    Process FMEA (PFMEA)
    addresses potential failure modes arising during manufacturing and assembly processes

    (c) Process for Conducting FMEA

    The process for conducting FMEA is summarized as follows:
    (a) Describe the product or process
    (b) Define functions
    (c) Identify potential failure modes
    (d) Describe effects of failures
    (e) Determine causes
    (f) Identify detection methods or current controls
    (g) Calculate risks: use the Risk Priority Number (RPN)
    (h) Take action
    (i) Assess results

    A typical FMEA incorporates some method to evaluate the risk associated with the potential problems identified through the analysis. One such method is the use of Risk Priority Numbers (RPN).

    To use the RPN method to assess risk, the analysis team must:
    (a) Rate the severity of each effect of failure
    (b) Rate the likelihood of occurrence for each cause of failure
    (c) Rate the likelihood of prior detection for each cause of failure
    (d) Calculate the RPN by obtaining the product of the three ratings

    RPN = Severity × Occurrence × Detection
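    The RPN steps above can be sketched in a few lines of Python. This is only an illustration: the failure modes and ratings are hypothetical, and the 1 to 10 scale (with a higher detection rating meaning the problem is less likely to be caught beforehand) is a common convention assumed here, not something the slides specify.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    severity: int     # rating of the severity of the effect (assumed 1-10)
    occurrence: int   # rating of the likelihood of occurrence (assumed 1-10)
    detection: int    # rating of the likelihood of prior detection (assumed 1-10)

    def rpn(self) -> int:
        """Risk Priority Number: product of the three ratings."""
        return self.severity * self.occurrence * self.detection

def rank_by_rpn(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn(), reverse=True)

# Hypothetical failure modes and ratings, for illustration only.
modes = [
    FailureMode("wire bond lift", severity=8, occurrence=2, detection=6),
    FailureMode("solder joint crack", severity=7, occurrence=4, detection=5),
    FailureMode("package delamination", severity=6, occurrence=5, detection=3),
]

ranked = rank_by_rpn(modes)
for mode in ranked:
    print(f"{mode.name}: RPN = {mode.rpn()}")
```

    Ranking by RPN gives the team an ordered list for step (h), taking action on the most serious concerns first.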

    [Figure: An Example of FMEA Hazard Assessment]

    (d) Benefits of FMEA

    Improve product/process reliability and quality
    Increase customer satisfaction
    Early identification and elimination of potential product/process failure modes
    Prioritize product/process deficiencies
    Capture engineering/organization knowledge

    Introduction to Fault Tree Analysis (FTA)

    (a) What is an FTA?

    FTA stands for Fault Tree Analysis.

    It is a graphical representation of the major faults or critical failures associated with a product, the causes for the faults, and potential countermeasures.

    The tool helps identify areas of concern for new product design or for improvement of existing products. It also helps identify corrective actions to correct or mitigate problems.

    In a Fault Tree, one works in a "failure space" and looks at system failure combinations.

    (b) When to use it?

    FTA is useful both in designing new products/services and in dealing with identified problems in existing products/services.

    In the quality planning process, the analysis can be used to optimize process features and goals and to design for critical factors and human error. As part of process improvement, it can be used to help identify root causes of trouble and to design remedies and

    (c) Basic Constructs of FTA

    The basic constructs in a Fault Tree Diagram are
    (a) gates (represent conditions)
    (b) events (represent the system failure mode)

    The two most commonly used gates are
    (a) AND gates
    (b) OR gates

    If the occurrence of either event causes the top event to occur, then these events (blocks) are connected using an OR gate.

    Alternatively, if both events need to occur to cause the top event to occur, they are connected by an AND gate.
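    To make the two gates concrete, the sketch below computes the probability of a gate's output event from the probabilities of its input events. The slides describe the gates only logically; the independence of the input events and the numeric values are assumptions made for this example.

```python
def or_gate(probabilities):
    """Probability that at least one input event occurs
    (assumption: input events are independent)."""
    p_none = 1.0
    for p in probabilities:
        p_none *= (1.0 - p)
    return 1.0 - p_none

def and_gate(probabilities):
    """Probability that all input events occur
    (assumption: input events are independent)."""
    p_all = 1.0
    for p in probabilities:
        p_all *= p
    return p_all

# Hypothetical probabilities of two basic events A and B.
p_a, p_b = 0.1, 0.2

p_top_or = or_gate([p_a, p_b])    # either event causes the top event
p_top_and = and_gate([p_a, p_b])  # both events are needed

print(f"OR gate:  {p_top_or:.2f}")
print(f"AND gate: {p_top_and:.2f}")
```

    With the same inputs, the OR gate always gives a top event probability at least as high as the AND gate, which is why redundancy (an AND of failures) improves reliability.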

    Example: For the "Top Event" to occur, either A or B must happen. In other words, failure of A or B causes the system to fail.

    [Figure: equivalent Reliability Block Diagram]
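    When failure of either A or B fails the system, the equivalent Reliability Block Diagram is a series arrangement of A and B. The sketch below checks numerically that the two views agree: the OR-gate top event probability (failure space) equals one minus the series reliability (success space). Independence of A and B and the reliability values are assumptions for illustration.

```python
import math

def series_reliability(reliabilities):
    """Reliability of blocks in series: every block must survive
    (assumption: blocks fail independently)."""
    return math.prod(reliabilities)

def or_gate_failure(failure_probabilities):
    """Top event probability for an OR gate: at least one input fails
    (assumption: inputs fail independently)."""
    return 1.0 - math.prod(1.0 - q for q in failure_probabilities)

# Hypothetical reliabilities of blocks A and B.
r_a, r_b = 0.95, 0.90

system_reliability = series_reliability([r_a, r_b])
p_top = or_gate_failure([1.0 - r_a, 1.0 - r_b])

print(f"series RBD reliability: {system_reliability:.3f}")
print(f"OR-gate top event probability: {p_top:.3f}")
print(f"views agree: {math.isclose(p_top, 1.0 - system_reliability)}")
```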

    [Figure: Symbols used in FTA]

    (d) How to Perform FTA in 6 steps

    1. Select a top level event for analysis. Try to be specific, for example, "Email server down for more than 4 hours." Sources of top level events include Problem/Known Error Records; potential failures from brainstorming; etc.

    2. Identify faults that could lead to the top level event. Continuing the above example, one possible fault leading to an outage lasting more than four hours might be "loss of power"; another might be "hardware failure." List all the faults under the top level event in boxes and connect the fault boxes to the top level event box by drawing lines.

    3. For each fault, list as many causes as possible in boxes below the related fault. Continuing the example above, in the case of "loss of power," some causes might be "electrical outage," "power supply failure," and so on. Connect the boxes to the appropriate fault box.

    4. Draw a diagram of the "fault tree." Two logic operators, AND and OR, also known as logic gates, are used to represent the sequencing of faults and causes. For example, "Email server down for more than 4 hours" could be caused by "loss of power" or "hardware fault." Another might be "loss of building power" and "battery backup exhausted."

    Update faults and causes by grouping logically related items using AND or OR between faults and events, and between faults and causes. Re-draw the lines from top level event to logic gates to faults to logic gates to causes.

    5. Continue identifying causes for each fault until you reach a root cause (reactive FTA), or one that you can do something about (proactive FTA). For example, the root cause of "power supply failure" might be "filter clogged"; the root cause of "battery backup exhausted" might be "battery backup too small."

    6. Consider countermeasures. A root cause is one you can do something about; so now you need to think of the countermeasures you might apply to each root cause. List countermeasures for each root cause in a box under the root cause. For example, for "filter clogged" a countermeasure might be "clean filter monthly." Link the countermeasure to the root cause by drawing a line.
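    The email server example from the steps above can be written down as a small data structure and evaluated. This is a minimal sketch, not a standard FTA tool: the tuple-based tree encoding and the `evaluate` helper are hypothetical, and the tree keeps only the gates named in the example (loss of power modeled as "loss of building power" AND "battery backup exhausted", combined with "hardware failure" through an OR gate).

```python
def evaluate(node, basic_events):
    """Return True if the event described by `node` occurs.

    A node is either the name of a basic event (looked up in
    `basic_events`) or a tuple (gate, children) with gate "AND" or "OR".
    """
    if isinstance(node, str):
        return basic_events[node]
    gate, children = node
    results = [evaluate(child, basic_events) for child in children]
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {gate}")

# Top level event: "Email server down for more than 4 hours".
email_server_down = ("OR", [
    ("AND", ["loss of building power", "battery backup exhausted"]),
    "hardware failure",
])

# Building power is lost, but the battery backup still holds.
state = {
    "loss of building power": True,
    "battery backup exhausted": False,
    "hardware failure": False,
}
print(evaluate(email_server_down, state))  # top event does not occur

# The battery backup is now exhausted as well.
state["battery backup exhausted"] = True
print(evaluate(email_server_down, state))  # top event occurs
```

    Walking the tree this way mirrors reading the drawn diagram from the causes up through the logic gates to the top level event.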

    Example:

    Solution:

    Reliability Life Testing

    (b) Burn-In and Screening

    Burn-in:
    A process of operating items at elevated stress levels (particularly temperature, humidity and voltage) in order to accelerate the processes leading to failure. The population of defective items is thus reduced.

    Screening:
    An enhancement to Quality Control whereby additional detailed visual and electrical/mechanical tests seek to reveal defective features which would otherwise increase the population of 'weak' items.

    Several types of ESS testing available are listed as follows:
    (i) Temperature Cycling
    (ii) Thermal Shock
    (iii) Humidity Testing
    (iv) Temperature, Humidity, Bias (THB) Testing

    (i) Temperature Cycling

    Refers to the process in which a product is subjected to multiple cycles of changing temperatures between pre-determined extremes at relatively high rates of change, fatiguing inferior product and causing it to fail.

    Cycling will show at what temperatures, both high and low, a product will cease to function properly.

    (ii) Thermal Shock

    Refers to rapid temperature changes from an extremely cold to a hot environment, which thermally shock and stress a product.

    This causes permanent changes in electrical performance and can cause sudden overloading of materials.

    Thermal shock failures are due to thermal mismatches or materials with differences in rates of thermal expansion and contraction.

    (iii) Humidity Testing

    Humidity testing normally involves high heat to aid in forcing water vapor through weakly sealed components.

    Many electronic devices are susceptible to the damaging effects of moisture, both by direct condensation and by indirect effects.

    Direct condensation is where water comes out of the air and forms droplets on a device.

    These droplets may find their way into the device and attack sensitive components.

    Common effects include shorting of electrical components and initiation of corrosive effects.

    Indirect effects are numerous.

    An example is moisture breaching sealed devices, which results in failures over time.

    Many ALTs of semiconductors involve temperature, as semiconductor properties usually have a strong temperature dependency.

    The most common accelerated test conditions are as follows:
    (a) Mechanical Shock
    (b) Drop Shock (Test)
    (c) Voltage Extremes
    (d) High Humidity
    (e) Random Vibration Test; etc.
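    The slides do not name a model for the temperature dependency mentioned above. One widely used assumption for temperature-accelerated tests is the Arrhenius relation, sketched below as an acceleration factor between a use temperature and a stress temperature. The activation energy and both temperatures are hypothetical values chosen for illustration.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_acceleration_factor(ea_ev, t_use_c, t_stress_c):
    """Acceleration factor AF = exp((Ea / k) * (1/T_use - 1/T_stress)).

    Temperatures are given in degrees Celsius and converted to kelvin.
    Assumes a single thermally activated failure mechanism with
    activation energy `ea_ev` (in eV).
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k)
    return math.exp(exponent)

# Hypothetical values: 0.7 eV activation energy, 55 C use, 125 C stress.
af = arrhenius_acceleration_factor(0.7, 55.0, 125.0)
print(f"acceleration factor: {af:.1f}")
```

    Under this assumption, one hour at the stress temperature stands in for roughly `af` hours at the use temperature, which is what makes a short elevated-temperature test informative about long-term behavior.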