Reliability of Simple Systems - polimi.it...2021/02/04  · IEC 61709: Electronic components...


Reliability of Simple Systems

Prof. Enrico Zio

Politecnico di Milano, Dipartimento di Energia


Background

• Definition under IEC 50 (191):
  - Summarising expression to describe availability and its influencing factors, reliability and maintainability.
  - Note: Dependability is only used for general descriptions of non-quantitative character.

• Broad definition:
  - Dependability is the methodical approach of estimating, analysing and avoiding failures in the future.

RAM / Dependability:
  - Reliability: MTBF, R(t)
  - Availability: $A = \frac{MTBF}{MTBF + MDT}$
  - Maintainability: MDT, MTTR


Index

1. Probability theory: basic definitions

2. Reliability analysis: theory and examples

1. Probability theory: basic definitions

Introduction: reliability and availability

• Reliability and availability: important performance parameters of a system, with respect to its ability to fulfill the required mission in a given period of time

• Two different system types:
  - Systems which must satisfy a specified mission within an assigned period of time: reliability quantifies the ability to achieve the desired objective without failures
  - Maintained systems: availability quantifies the ability to fulfill the assigned mission at any specific moment of the lifetime

Maintainability: ability of a unit, under given circumstances, to maintain or restore its actual state so that the desired requirements are met, provided that maintenance is carried out using the specified resources and stated procedures.


Basic definitions (1)

Reliability is the ability of an item to perform a required function under stated conditions for a stated period of time.

Therefore, a failure is an event whereby a unit or component under consideration is no longer capable of fulfilling a required function under stated conditions for the stated period of time.


Basic definitions (2)

The required function includes the specification of satisfactory operation as well as unsatisfactory operation. For a complex system, unsatisfactory operation may not be the same as failure.

The stated conditions are the total physical environment, including mechanical, thermal, and electrical conditions.

The stated period of time is the time during which satisfactory operation is desired, commonly referred to as service life.


Basic definitions (3)

• T = time to failure of a component (random variable)
  - cdf = $F_T(t)$ = probability of failure before time t: $P(T < t)$
  - pdf = $f_T(t)$ = probability density function at time t: $f_T(t)\,dt = P(t < T < t + dt)$
  - ccdf = $R(t) = 1 - F_T(t)$ = reliability at time t: $P(T > t)$
  - $h_T(t)$ = hazard function or failure rate at time t:

$$h_T(t)\,dt = P(t < T \le t + dt \mid T > t) = \frac{P(t < T \le t + dt)}{P(T > t)} = \frac{f_T(t)\,dt}{R(t)}$$
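The definition of the hazard function can be checked numerically. The following is a minimal sketch, not part of the original slides: the function name `hazard_rate` and the example cdf (a hypothetical component with $F_T(t) = 1 - e^{-0.5t}$) are illustrative assumptions. It approximates $h_T(t)$ as $[F_T(t+dt) - F_T(t)] / (dt \cdot R(t))$.

```python
import math
from typing import Callable


def hazard_rate(cdf: Callable[[float], float], t: float, dt: float = 1e-6) -> float:
    """Approximate h_T(t) = P(t < T <= t + dt | T > t) / dt from a cdf."""
    reliability = 1.0 - cdf(t)                # R(t) = P(T > t)
    prob_fail_in_dt = cdf(t + dt) - cdf(t)    # P(t < T <= t + dt)
    return prob_fail_in_dt / (dt * reliability)


def example_cdf(t: float) -> float:
    """Hypothetical cdf used only for illustration: F_T(t) = 1 - exp(-0.5 t)."""
    return 1.0 - math.exp(-0.5 * t)


if __name__ == "__main__":
    for t in (0.0, 1.0, 5.0):
        print(f"t = {t}: h(t) ~ {hazard_rate(example_cdf, t):.4f}")
```

For this assumed cdf the estimate should stay close to 0.5 at every t, since the conditional failure probability per unit time does not depend on age.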


Hazard function: the bath-tub curve

[Figure: bathtub curve model of failure rate (example). Horizontal axis: operating time t in years (0 to 13); vertical axis: failure rate λ. Curves: early failures, random failures, wear-out failures and their sum λ total.]

Three types of failures:

- Early failures (infant mortality), caused by errors in design, defects in manufacturing, etc. Characteristic: the failure rate is initially high, but rapidly decreases.

- Wear-out failures, caused by ageing. Characteristic: the failure rate increases monotonically.

(Both types are systematic failures and could be prevented by improvement in design, manufacturing, maintenance.)

- Random failures, which appear spontaneously and purely by chance. Characteristic: constant failure rate during the whole lifetime of the units.

These types of failure rates result in the traditional bathtub curve.


Hazard function: the bath-tub curve

• The hazard function shows three distinct phases:
  i. Decreasing - infant mortality or burn-in period
  ii. Constant - useful life
  iii. Increasing - ageing

[Figure: bathtub curve of the failure rate λ with the phases (i), (ii), (iii) marked along the time axis]

The unit of the failure rate λ is failures/time, often indicated as FIT (Failures In Time), e.g. 1 FIT = 1 failure per $10^9$ h in an FRU (Field Replaceable Unit) employed in the railway industry.


Field-Replaceable Unit (FRU)

In electronic hardware, particularly computer systems, a field-replaceable unit (FRU) is a circuit board or part that can be quickly and easily removed and replaced by the user or by a technician without having to send the entire product or system to a repair facility. The defective unit is found by standard troubleshooting procedures, removed, and either discarded or shipped back to the factory for repair. The new unit is installed directly in place of the defective one.

The FRU scheme is often the most cost-effective way to maintain complex systems, and is a major motivating factor behind the evolution of modular construction. When backed up by good parts availability, knowledgeable technical support, and reader-friendly documentation, this approach can minimize system downtime and optimize reliability.


The exponential distribution (1)

• $h_T(t) = \lambda$, $t \ge 0$

• Only distribution characterized by a constant hazard rate

• Widely used in reliability practice to describe the constant part of the bath-tub curve

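$$F_T(t) = P(T \le t) = 1 - e^{-\lambda t}$$

$$f_T(t) = \begin{cases} \lambda e^{-\lambda t} & t \ge 0 \\ 0 & t < 0 \end{cases}$$

[Figure: plots of $F_T(t)$, rising from 0 towards 1, and of $f_T(t) = \lambda e^{-\lambda t}$, decreasing from λ, versus t]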


The exponential distribution (2)

• The expected value and variance of the distribution are:

$$E[T] = \frac{1}{\lambda} = MTTF; \qquad Var[T] = \frac{1}{\lambda^2}$$

• Failure process is memoryless:

$$P(t_1 < T < t_2 \mid T > t_1) = \frac{P(t_1 < T < t_2)}{P(T > t_1)} = \frac{F_T(t_2) - F_T(t_1)}{1 - F_T(t_1)} = \frac{e^{-\lambda t_1} - e^{-\lambda t_2}}{e^{-\lambda t_1}} = 1 - e^{-\lambda (t_2 - t_1)}$$
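The memoryless property can be illustrated with a few lines of code. This is a minimal sketch under assumed values (the rate `lam = 0.1` per year and the function names are hypothetical): the probability of failing within a window of fixed length, given survival up to the start of the window, should not depend on the age $t_1$.

```python
import math


def exp_cdf(t: float, lam: float) -> float:
    """F_T(t) = 1 - exp(-lam * t) for t >= 0, and 0 for t < 0."""
    return 1.0 - math.exp(-lam * t) if t >= 0 else 0.0


def conditional_failure_prob(t1: float, t2: float, lam: float) -> float:
    """P(t1 < T < t2 | T > t1), computed directly from the cdf."""
    return (exp_cdf(t2, lam) - exp_cdf(t1, lam)) / (1.0 - exp_cdf(t1, lam))


if __name__ == "__main__":
    lam = 0.1  # assumed failure rate, per year
    # Same 2-year window evaluated at three different ages
    for t1 in (0.0, 5.0, 20.0):
        print(t1, conditional_failure_prob(t1, t1 + 2.0, lam))
```

Each line printed should show the same value, $1 - e^{-\lambda (t_2 - t_1)}$, up to floating-point rounding.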


The exponential distribution (3)

IEC 61709: Electronic components – Reliability – Reference conditions for failure rates and stress models for conversion

The failure rate under given operating conditions is calculated as follows:

$$\lambda = \lambda_{ref} \times \pi_U \times \pi_I \times \pi_T$$

where
  - $\lambda_{ref}$ is the failure rate under reference conditions;
  - $\pi_U$ is the voltage dependence factor;
  - $\pi_I$ is the current dependence factor;
  - $\pi_T$ is the temperature dependence factor.

These parameters are listed, e.g., in the SN 29500 library.

Reference conditions for climatic and mechanical stresses

Type of stress            Reference condition 1)
Ambient temperature 2)    θ_amb,ref = 40 °C
Climatic conditions       Class 3K3 as per IEC 721-3-3
Mechanical stress         Class 3M3 as per IEC 721-3-3
Special stresses 3)       None

For details of notes 1), 2), 3) please refer to IEC 61709.

The definitions, reference conditions and conversion models used in IEC 61709 fully correspond with the already existing SIEMENS standard SN 29500 method.
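As a minimal sketch of the conversion from reference to operating conditions, the failure rate is simply the reference failure rate multiplied by the three stress factors. The function name and all numerical values below are hypothetical and are not taken from IEC 61709 or SN 29500; real factors must be looked up in those documents.

```python
def failure_rate_at_operating_conditions(
    lambda_ref: float,
    pi_u: float = 1.0,
    pi_i: float = 1.0,
    pi_t: float = 1.0,
) -> float:
    """Return lambda = lambda_ref * pi_U * pi_I * pi_T.

    lambda_ref: failure rate under reference conditions (e.g. in FIT)
    pi_u, pi_i, pi_t: voltage, current and temperature dependence factors
    """
    return lambda_ref * pi_u * pi_i * pi_t


if __name__ == "__main__":
    # Hypothetical component: 10 FIT at reference conditions,
    # with made-up stress factors for illustration only.
    lam = failure_rate_at_operating_conditions(10.0, pi_u=1.2, pi_i=1.0, pi_t=2.5)
    print(f"failure rate at operating conditions: {lam:.1f} FIT")
```

A factor of 1 leaves the reference failure rate unchanged, so a component operated exactly at reference conditions keeps $\lambda = \lambda_{ref}$.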


The Weibull distribution

• In practice, the age of a component influences its failure process so that the hazard rate does not remain constant throughout the lifetime

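$$F_T(t) = P(T \le t) = 1 - e^{-\lambda t^{\alpha}}$$

$$f_T(t) = \begin{cases} \lambda \alpha\, t^{\alpha - 1} e^{-\lambda t^{\alpha}} & t \ge 0 \\ 0 & t < 0 \end{cases}$$

$$E[T] = \lambda^{-1/\alpha}\, \Gamma\!\left(\frac{1}{\alpha} + 1\right); \qquad Var[T] = \lambda^{-2/\alpha} \left[ \Gamma\!\left(\frac{2}{\alpha} + 1\right) - \Gamma\!\left(\frac{1}{\alpha} + 1\right)^2 \right]$$

$$\Gamma(k) = \int_0^{\infty} x^{k-1} e^{-x}\, dx, \qquad k > 0$$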
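For a Weibull cdf of the form $F_T(t) = 1 - e^{-\lambda t^{\alpha}}$, dividing the pdf by $R(t) = e^{-\lambda t^{\alpha}}$ gives the hazard rate $h_T(t) = \lambda \alpha t^{\alpha - 1}$, which decreases for α < 1, is constant for α = 1 and increases for α > 1. The sketch below evaluates it for hypothetical parameter values; the function names are illustrative assumptions.

```python
import math


def weibull_reliability(t: float, lam: float, alpha: float) -> float:
    """R(t) = exp(-lam * t**alpha), for t > 0."""
    return math.exp(-lam * t ** alpha)


def weibull_pdf(t: float, lam: float, alpha: float) -> float:
    """f_T(t) = lam * alpha * t**(alpha - 1) * exp(-lam * t**alpha), for t > 0."""
    return lam * alpha * t ** (alpha - 1) * math.exp(-lam * t ** alpha)


def weibull_hazard(t: float, lam: float, alpha: float) -> float:
    """h_T(t) = f_T(t) / R(t), which simplifies to lam * alpha * t**(alpha - 1)."""
    return weibull_pdf(t, lam, alpha) / weibull_reliability(t, lam, alpha)


if __name__ == "__main__":
    lam = 0.5  # assumed scale parameter
    for alpha in (0.5, 1.0, 2.0):  # decreasing, constant, increasing hazard
        values = [weibull_hazard(t, lam, alpha) for t in (1.0, 2.0, 4.0)]
        print(f"alpha = {alpha}: h(1), h(2), h(4) = {values}")
```

The three shapes correspond to the three phases of the bathtub curve, which is why a single Weibull family can describe infant mortality, useful life or ageing depending on α.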

2. Reliability analysis: theory and examples

Definition of the problem

• Objective:
  - Computation of the system reliability R(t)

• Hypotheses:
  - N = number of system components
  - The components' reliabilities $R_i(t)$, i = 1, 2, …, N are known
  - The system configuration is known


Series system

[Figure: field-replaceable HW-unit (example), drawn as a series block diagram of its components: EPROM, optical module, PINs, LED, ICs, transistors, diodes, DC/DC converter, fan, capacitors, resistors, ASIC and solder joints]

Example: total failure rate of the HW-unit

Name of component   Failure rate of component   No. of comp.   No. of pins   Sum of failure rates
Resistors           1 FIT                       8              16            8 FIT
Capacitors          2 FIT                       6              12            12 FIT
Diodes              8 FIT                       3              6             24 FIT
Transistors         15 FIT                      4              12            60 FIT
ICs                 25 FIT                      4              64            100 FIT
EPROM               100 FIT                     2              64            200 FIT
DC/DC               40 FIT                      2              28            80 FIT
ASIC                250 FIT                     2              1016          500 FIT
Fan                 150 FIT                     2              10            300 FIT
Optical module      800 FIT                     2              32            1600 FIT
Solder joints       0.1 FIT                     1260                         126 FIT

Total failure rate of the HW-unit: λ_unit = 3010 FIT
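The total in the example can be reproduced by summing, over all component types, the failure rate per component times the number of components. This is a minimal sketch: the data are the values as read from the slide table, while the function names and the conversion to an MTTF in hours (which assumes constant failure rates and the definition 1 FIT = 1 failure per $10^9$ h) are illustrative additions.

```python
# (component name, failure rate per component in FIT, number of components)
COMPONENTS = [
    ("Resistors", 1.0, 8),
    ("Capacitors", 2.0, 6),
    ("Diodes", 8.0, 3),
    ("Transistors", 15.0, 4),
    ("ICs", 25.0, 4),
    ("EPROM", 100.0, 2),
    ("DC/DC", 40.0, 2),
    ("ASIC", 250.0, 2),
    ("Fan", 150.0, 2),
    ("Optical module", 800.0, 2),
    ("Solder joints", 0.1, 1260),
]


def total_failure_rate_fit(components) -> float:
    """Series system: the unit failure rate is the sum of all component rates."""
    return sum(rate * count for _name, rate, count in components)


def mttf_hours(rate_fit: float) -> float:
    """MTTF in hours for a constant failure rate given in FIT (1 FIT = 1e-9 / h)."""
    return 1e9 / rate_fit


if __name__ == "__main__":
    lam_unit = total_failure_rate_fit(COMPONENTS)
    print(f"unit failure rate: {lam_unit:.0f} FIT")
    print(f"unit MTTF: {mttf_hours(lam_unit):.0f} h")
```

The sum should come out at about 3010 FIT, dominated by the optical modules and the ASICs.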


Series system

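• All components must function for the system to function

$$R(t) = \prod_{i=1}^{N} R_i(t)$$

• For N exponential components:

$$R(t) = e^{-\lambda t}, \qquad \begin{cases} \lambda = \sum_{i=1}^{N} \lambda_i & \text{(system failure rate)} \\ E[T] = \dfrac{1}{\lambda} & \text{(MTTF)} \end{cases}$$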
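A minimal sketch of the series formulas, with hypothetical failure rates: the system reliability is the product of the component reliabilities, and for exponential components this collapses to a single exponential whose rate is the sum of the component rates. Function names are illustrative assumptions.

```python
import math
from typing import Sequence


def series_reliability_from_components(reliabilities: Sequence[float]) -> float:
    """R = product of R_i: all components must function."""
    return math.prod(reliabilities)


def series_reliability(t: float, rates: Sequence[float]) -> float:
    """Exponential components: R(t) = exp(-sum(lambda_i) * t)."""
    return math.exp(-sum(rates) * t)


def series_mttf(rates: Sequence[float]) -> float:
    """Exponential components: MTTF = 1 / sum(lambda_i)."""
    return 1.0 / sum(rates)


if __name__ == "__main__":
    rates = [0.1, 0.2, 0.05]  # assumed failure rates, per year
    t = 2.0                   # mission time in years
    component_r = [math.exp(-lam * t) for lam in rates]
    print(series_reliability_from_components(component_r))
    print(series_reliability(t, rates))  # same value via the summed rate
    print(series_mttf(rates))
```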


Parallel system

• All components must fail for the system to fail

$$R(t) = 1 - \prod_{i=1}^{N} \left[ 1 - R_i(t) \right]$$

• For N exponential components:

$$R(t) = 1 - \prod_{i=1}^{N} \left[ 1 - e^{-\lambda_i t} \right]$$

$$MTTF = \sum_{i=1}^{N} \frac{1}{\lambda_i} - \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \frac{1}{\lambda_i + \lambda_j} + \sum_{i=1}^{N-2} \sum_{j=i+1}^{N-1} \sum_{k=j+1}^{N} \frac{1}{\lambda_i + \lambda_j + \lambda_k} - \dots + (-1)^{N+1} \frac{1}{\sum_{i=1}^{N} \lambda_i}$$
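The alternating sum for the parallel MTTF is an inclusion-exclusion over all non-empty subsets of components: each subset contributes the reciprocal of the sum of its failure rates, with a plus sign for odd-sized subsets and a minus sign for even-sized ones. A minimal sketch, with an illustrative function name and assumed rates:

```python
from itertools import combinations
from typing import Sequence


def parallel_mttf(rates: Sequence[float]) -> float:
    """MTTF of a parallel system of independent exponential components."""
    total = 0.0
    for size in range(1, len(rates) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0   # +, -, +, ... by subset size
        for subset in combinations(rates, size):
            total += sign / sum(subset)
    return total


if __name__ == "__main__":
    print(parallel_mttf([0.5, 1.0 / 3.0]))   # two different units
    print(parallel_mttf([1.0, 1.0, 1.0]))    # three identical units
```

The number of subsets grows as $2^N$, so this direct enumeration is only practical for small N.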


Parallel system: an example

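• Two exponential units with failure rates $\lambda_1$ and $\lambda_2$

$$R(t) = 1 - (1 - e^{-\lambda_1 t})(1 - e^{-\lambda_2 t}) = e^{-\lambda_1 t} + e^{-\lambda_2 t} - e^{-(\lambda_1 + \lambda_2) t} > R_1(t),\, R_2(t) > e^{-(\lambda_1 + \lambda_2) t} = R_{series}(t)$$

$$MTTF = \frac{1}{\lambda_1} + \frac{1}{\lambda_2} - \frac{1}{\lambda_1 + \lambda_2}$$

• For N identical elements, compare series and parallel

$$\text{parallel: } MTTF = \frac{1}{\lambda} \sum_{n=1}^{N} \frac{1}{n}; \qquad \text{series: } MTTF = \frac{1}{N \lambda}$$

$$MTTF_{series} = \frac{1}{N} \cdot \frac{1}{\lambda} < \frac{1}{\lambda} \cdot \sum_{n=1}^{N} \frac{1}{n} = MTTF_{parallel}$$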


r-out-of-N system

• N identical components function in parallel but only r are needed (parallel system: r = 1)

• For N identical exponential components:

$$R(t) = \sum_{k=r}^{N} \binom{N}{k} e^{-k \lambda t} \left( 1 - e^{-\lambda t} \right)^{N-k}$$

$$MTTF = \sum_{k=r}^{N} \frac{1}{k \lambda}$$
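A minimal sketch of the r-out-of-N formulas for identical exponential components, with hypothetical parameter values and illustrative function names. The two limiting cases are a useful sanity check: r = 1 reproduces the parallel system and r = N the series system.

```python
import math


def r_out_of_n_reliability(t: float, lam: float, r: int, n: int) -> float:
    """Probability that at least r of n identical components survive to time t."""
    p = math.exp(-lam * t)  # reliability of one component at time t
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(r, n + 1))


def r_out_of_n_mttf(lam: float, r: int, n: int) -> float:
    """MTTF = sum over k = r..n of 1 / (k * lam)."""
    return sum(1.0 / (k * lam) for k in range(r, n + 1))


if __name__ == "__main__":
    lam, n, t = 0.2, 3, 1.0  # assumed rate (per year), number of units, mission time
    for r in (1, 2, 3):
        print(r, r_out_of_n_reliability(t, lam, r, n), r_out_of_n_mttf(lam, r, n))
```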


Standby system

• One component is functioning and when it fails it is replaced immediately by another component (sequential operation of one component at a time)

• The system configuration is time-dependent ⇒ the history of the system from t = 0 must be considered

• Two types of standby:
  - Cold: the standby unit cannot fail until it is switched on
  - Hot: the standby unit can fail also while in standby


Cold standby (1)

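• Since the components are operated sequentially, the system fails at time $T = \sum_{i=1}^{N} T_i$, which is a random variable, sum of N independent random variables

• Convolution theorem

Example: 2 components

$$T_1 \sim f_{T_1}(t), \quad T_2 \sim f_{T_2}(t), \quad T = T_1 + T_2 \;\Rightarrow\; f_T(t) = f_{T_1}(t) * f_{T_2}(t) = \int_{-\infty}^{\infty} f_{T_1}(x)\, f_{T_2}(t - x)\, dx$$

$$L[f(x)] = \tilde{f}(s) = \int_0^{\infty} e^{-s x} f(x)\, dx$$

$$\tilde{f}_T(s) = L\left[ f_{T_1}(t) * f_{T_2}(t) \right] = \tilde{f}_{T_1}(s)\, \tilde{f}_{T_2}(s)$$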


Cold standby (2)

Example: N components

$$\tilde{f}_T(s) = \prod_{i=1}^{N} \tilde{f}_{T_i}(s)$$

$$R(t) = 1 - \int_0^t f_T(x)\, dx$$


Cold standby: an example

• Consider a “cold” standby system of two units

• The on-line unit has an MTTF of 2 years

• When it fails, the standby unit comes on line and its MTTF is 3 years

• Assume that each component has an exponentially distributed failure time

(1) What is the probability density function of the system failure time? What is the MTTF of the system?

(2) Repeat assuming that the two components are in parallel in a one-out-of-two configuration


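Cold standby: an example – solution (1)

• T1 and T2 are independent random variables denoting the times when the on-line and standby units are operating, respectively

• The system failure time is also a random variable, T = T1 + T2

$$MTTF_i = \int_0^{\infty} t\, f_{T_i}(t)\, dt = \int_0^{\infty} t\, \lambda_i e^{-\lambda_i t}\, dt = \frac{1}{\lambda_i} \quad\Rightarrow\quad \frac{1}{\lambda_1} = 2\ \text{yrs}, \qquad \frac{1}{\lambda_2} = 3\ \text{yrs}$$

[Figure: timeline from 0 to t; the on-line unit operates for T1 until time τ, then the standby unit operates for T2 from τ to t]

$$f_T(t) = \int_0^t \lambda_1 e^{-\lambda_1 \tau}\, \lambda_2 e^{-\lambda_2 (t - \tau)}\, d\tau = \lambda_1 \lambda_2 \int_0^t e^{-\lambda_2 t}\, e^{-(\lambda_1 - \lambda_2) \tau}\, d\tau = \lambda_1 \lambda_2\, e^{-\lambda_2 t} \int_0^t e^{-(\lambda_1 - \lambda_2) \tau}\, d\tau$$

$$= \frac{\lambda_1 \lambda_2}{\lambda_2 - \lambda_1}\, e^{-\lambda_2 t} \left[ e^{-(\lambda_1 - \lambda_2) \tau} \right]_0^t = \frac{\lambda_1 \lambda_2}{\lambda_2 - \lambda_1} \left( e^{-\lambda_1 t} - e^{-\lambda_2 t} \right) = \frac{1}{\text{yr}} \left( e^{-t / 3\,\text{yrs}} - e^{-t / 2\,\text{yrs}} \right)$$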


Cold standby: an example – solution (2)

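$$MTTF = \int_0^{\infty} t\, f_T(t)\, dt = \frac{1}{\text{yr}} \int_0^{\infty} t \left( e^{-t / 3\,\text{yrs}} - e^{-t / 2\,\text{yrs}} \right) dt$$

With the substitutions $u = \frac{t}{3\,\text{yrs}}$ and $\xi = \frac{t}{2\,\text{yrs}}$:

$$MTTF = \frac{(3\,\text{yrs})^2}{\text{yr}} \left[ -u e^{-u} - e^{-u} \right]_0^{\infty} - \frac{(2\,\text{yrs})^2}{\text{yr}} \left[ -\xi e^{-\xi} - e^{-\xi} \right]_0^{\infty} = (3\,\text{yrs})^2 \left( \frac{1}{\text{yr}} \right) - (2\,\text{yrs})^2 \left( \frac{1}{\text{yr}} \right) = 5\ \text{yrs}$$

(3.8 yrs for parallel!)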
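The two results of the example (5 yrs for cold standby, 3.8 yrs for the one-out-of-two parallel configuration) can be cross-checked numerically. The sketch below uses the closed-form pdf of $T = T_1 + T_2$ for two different exponential rates and a simple trapezoidal rule; the function names, the truncation of the integral at 200 years and the number of steps are assumptions made for illustration.

```python
import math
from typing import Callable

LAMBDA_1 = 1.0 / 2.0  # on-line unit, per year (MTTF = 2 yrs)
LAMBDA_2 = 1.0 / 3.0  # standby unit, per year (MTTF = 3 yrs)


def cold_standby_pdf(t: float, lam1: float, lam2: float) -> float:
    """pdf of T = T1 + T2 for independent exponentials with lam1 != lam2."""
    return lam1 * lam2 / (lam2 - lam1) * (math.exp(-lam1 * t) - math.exp(-lam2 * t))


def parallel_mttf_two(lam1: float, lam2: float) -> float:
    """MTTF of two exponential units in a one-out-of-two parallel configuration."""
    return 1.0 / lam1 + 1.0 / lam2 - 1.0 / (lam1 + lam2)


def numerical_mean(pdf: Callable[[float], float], upper: float = 200.0, steps: int = 200_000) -> float:
    """Trapezoidal estimate of the integral of t * pdf(t) over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * t * pdf(t)
    return total * h


if __name__ == "__main__":
    mttf_cold = numerical_mean(lambda t: cold_standby_pdf(t, LAMBDA_1, LAMBDA_2))
    print(f"cold standby MTTF ~ {mttf_cold:.3f} yrs")
    print(f"parallel MTTF = {parallel_mttf_two(LAMBDA_1, LAMBDA_2):.3f} yrs")
```

The cold standby value equals the sum of the two component MTTFs, as expected from the linearity of the expectation of $T_1 + T_2$, while the parallel system loses the term $1/(\lambda_1 + \lambda_2)$ because both units age from t = 0.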


Hot standby (1)

• The convolution theorem can no longer be used to calculate the reliability of the system, because there is no independence of failures any more

• Simple case of two components: the system will perform its task in the interval (0, t) in either of the two mutually exclusive ways:

  - the online component 1 does not fail in (0, t) [probability = $R_1(t)$]

  - the online component fails at some time τ in (0, t) [probability = $f_{T_1}(\tau)\,d\tau$]; the standby component 2 does not fail in (0, τ) [probability $R_s(\tau)$] and it operates successfully from τ to t [probability $R_2(t - \tau)$]


Hot standby (2)

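• The system reliability is given by the sum of the probabilities of the two mutually exclusive events:

$$R(t) = R_1(t) + \int_0^t f_{T_1}(\tau)\, R_s(\tau)\, R_2(t - \tau)\, d\tau$$

• For 2 exponential components:

$$R(t) = e^{-\lambda_1 t} + \int_0^t \lambda_1 e^{-\lambda_1 \tau}\, e^{-\lambda_s \tau}\, e^{-\lambda_2 (t - \tau)}\, d\tau = e^{-\lambda_1 t} + \frac{\lambda_1}{\lambda_1 + \lambda_s - \lambda_2} \left[ e^{-\lambda_2 t} - e^{-(\lambda_1 + \lambda_s) t} \right]$$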
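A minimal sketch of the closed-form hot standby reliability for two exponential components, where `lam_s` denotes the failure rate of unit 2 while it is in standby. The function name and the numerical values are hypothetical. Two limiting cases give a quick consistency check: with `lam_s = lam2` the standby unit ages as if it were on line, so the result should coincide with an ordinary two-unit parallel system, and with `lam_s = 0` the standby unit cannot fail before switching, which is the cold standby case.

```python
import math


def hot_standby_reliability(t: float, lam1: float, lam2: float, lam_s: float) -> float:
    """R(t) for a two-unit hot standby system (requires lam1 + lam_s != lam2)."""
    a = lam1 + lam_s - lam2
    return math.exp(-lam1 * t) + lam1 / a * (
        math.exp(-lam2 * t) - math.exp(-(lam1 + lam_s) * t)
    )


def parallel_reliability_two(t: float, lam1: float, lam2: float) -> float:
    """Two-unit active parallel system, used here only as a reference."""
    return 1.0 - (1.0 - math.exp(-lam1 * t)) * (1.0 - math.exp(-lam2 * t))


if __name__ == "__main__":
    lam1, lam2, t = 0.5, 0.3, 2.0  # assumed rates (per year) and mission time
    for lam_s in (0.0, 0.1, lam2):
        print(lam_s, hot_standby_reliability(t, lam1, lam2, lam_s))
    print("parallel:", parallel_reliability_two(t, lam1, lam2))
```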

t

Page 31: Reliability of Simple Systems - polimi.it...2021/02/04  · IEC 61709: Electronic components –Reliability – Reference conditions for failure rates and stress models for conversion

Prof. Enrico Zio

Time-dependent systems: an example

• When both A and B are fully energized they share the total load and the failure densities are fA(t) and fB(t)

• If either one fails, the survivor must carry the full load and its failure density becomes gA(t) or gB(t)

[Figure: components A and B connected in parallel]

Find the reliability R(t) of the system if

$$f_A(t) = f_B(t) = \lambda e^{-\lambda t}, \qquad g_A(t) = g_B(t) = k \lambda e^{-k \lambda t}, \quad k > 1$$


Time-dependent systems: an example - solution

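• R(t) = P{system survives up to t} = P{neither component fails before t} + P{one fails at some time τ < t, the other one survives up to τ with f(t), and from τ to t with g(t)} =

$$= \left( e^{-\lambda t} \right) \left( e^{-\lambda t} \right) + 2 \int_0^t \left( \lambda e^{-\lambda \tau} \right) \left( e^{-\lambda \tau} \right) e^{-k \lambda (t - \tau)}\, d\tau = e^{-2 \lambda t} + 2 \lambda\, e^{-k \lambda t} \int_0^t e^{-(2 - k) \lambda \tau}\, d\tau = \frac{2 e^{-k \lambda t} - k e^{-2 \lambda t}}{2 - k}$$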
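The closed-form result can be cross-checked against a direct numerical evaluation of the integral. This is a minimal sketch with hypothetical values of λ and k; the function names and the midpoint quadrature are illustrative assumptions. The closed form divides by 2 − k, so it cannot be evaluated at k = 2.

```python
import math


def load_sharing_reliability(t: float, lam: float, k: float) -> float:
    """Closed form R(t) = (2 e^{-k lam t} - k e^{-2 lam t}) / (2 - k), for k != 2."""
    return (2.0 * math.exp(-k * lam * t) - k * math.exp(-2.0 * lam * t)) / (2.0 - k)


def load_sharing_reliability_numeric(t: float, lam: float, k: float, steps: int = 20_000) -> float:
    """e^{-2 lam t} + 2 * integral over (0, t) of lam e^{-2 lam tau} e^{-k lam (t - tau)} d tau."""
    h = t / steps
    integral = 0.0
    for i in range(steps):
        tau = (i + 0.5) * h  # midpoint rule
        integral += lam * math.exp(-2.0 * lam * tau) * math.exp(-k * lam * (t - tau))
    return math.exp(-2.0 * lam * t) + 2.0 * integral * h


if __name__ == "__main__":
    lam, k, t = 0.2, 1.5, 3.0  # assumed rate (per year), load factor, mission time
    print(load_sharing_reliability(t, lam, k))
    print(load_sharing_reliability_numeric(t, lam, k))
```

Setting k = 1 (no increase in failure rate after the first failure) reduces the closed form to $2e^{-\lambda t} - e^{-2\lambda t}$, the reliability of two identical units in ordinary parallel, which is a quick way to validate the expression.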