Fault-Tolerant Computing Systems #4: Reliability and Availability

Pattara Leelaprute
Computer Engineering Department, Kasetsart University
pattara.l@ku.ac.th


Reliability and Availability

Reliability: the probability that a system survives until time t (it has not failed by time t).

Availability: the probability that a system works properly at time t.


Preliminaries of Probability

Discrete sample space: tossing a coin. The sample space is {head, tail}.

Continuous sample space: how long the PC stays up after a reboot. The sample space is {t | t > 0}.

Random variable: a function mapping each element of the sample space to a real number. Ex. heads = 1, tails = 0.


Preliminaries

Random variable: a function mapping each element of the sample space to a real number.

CDF (cumulative distribution function): $F_X(t) = \Pr[X \le t]$. When X is the time to failure, this is the probability that the system has gone down by time t.

pdf (probability density function): $f(t) = dF(t)/dt$

Expected value (mean): $E[X] = \int_0^\infty t\, f(t)\, dt$ (for $X \ge 0$). This is the average outcome of the random experiment.


Exponential Distribution

The most commonly used distribution function in reliability modeling.

CDF: $F(t) = 1 - e^{-\lambda t}$

pdf: $f(t) = \lambda e^{-\lambda t}$

Mean: $E[X] = 1/\lambda$

Memoryless property: let $Y = X - t$ be the remaining life at time t. Then $G_t(y) = \Pr[Y \le y \mid X > t] = 1 - e^{-\lambda y}$.

The distribution of the remaining life of a component does not depend on how long it has been working. The component does not AGE! (The remaining life of X does not depend on the time that has passed.)

Example with $\lambda = 2$: $F(t) = 1 - e^{-2t}$, $f(t) = 2e^{-2t}$
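The memoryless property can be checked numerically from the CDF alone. The following is a minimal sketch, not part of the original slides: the function names `exp_cdf` and `remaining_life_cdf` and the chosen ages are illustrative assumptions, and the rate 2 is taken from the slide's example.

```python
import math


def exp_cdf(t: float, lam: float) -> float:
    """CDF of the exponential distribution: F(t) = 1 - exp(-lam * t)."""
    return 1.0 - math.exp(-lam * t)


def remaining_life_cdf(y: float, t: float, lam: float) -> float:
    """Pr[X - t <= y | X > t], computed only from the CDF F."""
    survived = 1.0 - exp_cdf(t, lam)                          # Pr[X > t]
    fails_in_window = exp_cdf(t + y, lam) - exp_cdf(t, lam)   # Pr[t < X <= t + y]
    return fails_in_window / survived


lam = 2.0  # rate from the slide's example F(t) = 1 - e^{-2t}
y = 0.25   # hypothetical look-ahead window

# For every age t, the conditional CDF should match the unconditional F(y).
for age in (0.0, 0.5, 3.0):
    print(age, remaining_life_cdf(y, age, lam), exp_cdf(y, lam))
```

Each printed row should show the same two probabilities regardless of the age, which is the "does not age" statement in numerical form.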


Reliability

The probability that a system survives until time t:

$R(t) = \Pr[X > t] = 1 - F(t)$

X: the random variable that represents the time to failure of the system (the life of the system).

R(t): the probability that the system survives until time t.

Figure: timeline marking time 0, the lifetime X, and time t. Example with F(t) exponentially distributed: $F(t) = 1 - e^{-2t}$, $R(t) = e^{-2t}$


Reliability

$R(t) = \Pr[X > t] = 1 - F(t)$

$R(0) = 1$: the system is initially working.

$R(\infty) = 0$: no system has an infinite lifetime.

Figure: timeline marking time 0, the lifetime X, and time t, where F(t) is the exponential distribution and R(t) is the reliability: $F(t) = 1 - e^{-2t}$, $R(t) = e^{-2t}$


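Failure Rate

$f(t)\,\Delta t$: the probability that a fault will occur in the interval $[t, t+\Delta t]$.

$f(t)\,\Delta t / R(t)$: the probability that a fault occurs in $[t, t+\Delta t]$, given that the system is working properly at time t.

Failure rate $= f(t) / R(t)$: the conditional probability of a fault in $[t, t+\Delta t]$ divided by the interval length $\Delta t$.

Example, where f(t) is the probability density of a fault, F(t) is the exponential distribution, and R(t) is the reliability: $F(t) = 1 - e^{-2t}$, $R(t) = e^{-2t}$, $f(t) = 2e^{-2t}$, so the failure rate is $f(t)/R(t) = 2$.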
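The failure-rate definitions can be tried out on the exponential example. This is an illustrative sketch, not the lecturer's code: the function names and the interval length `dt` are assumptions. It compares the ratio f(t)/R(t) with the conditional failure probability over a short interval, divided by the interval length.

```python
import math


def pdf(t: float, lam: float) -> float:
    """Density of the exponential distribution: f(t) = lam * exp(-lam * t)."""
    return lam * math.exp(-lam * t)


def reliability(t: float, lam: float) -> float:
    """R(t) = Pr[X > t] = exp(-lam * t) for the exponential distribution."""
    return math.exp(-lam * t)


def failure_rate(t: float, lam: float) -> float:
    """Failure rate f(t) / R(t)."""
    return pdf(t, lam) / reliability(t, lam)


def conditional_failure_prob(t: float, dt: float, lam: float) -> float:
    """Pr[fault in [t, t + dt] | system working at t] = (R(t) - R(t + dt)) / R(t)."""
    return (reliability(t, lam) - reliability(t + dt, lam)) / reliability(t, lam)


lam = 2.0   # rate from the slide's example
dt = 1e-3   # hypothetical short interval

for t in (0.0, 1.0, 5.0):
    # Both columns should stay close to lam at every t.
    print(t, failure_rate(t, lam), conditional_failure_prob(t, dt, lam) / dt)
```

The second column only approximates the failure rate, because `dt` is small but not zero.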


Bathtub Curve

Failure rate: $f(t) / R(t)$

The bathtub curve is the general shape of the failure rate observed in empirical data collected from mechanical and electronic components.

When the lifetime distribution F(t) of a system is exponential, the system has a constant failure rate (see the previous slide).

Figure: failure rate versus time, in three stages:
1. Initial stage: inherent defects, faulty design
2. Constant failure rate
3. Last stage: faults caused by age


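MTTF (Mean Time To Failure)

$\mathrm{MTTF} = E[X] = \int_0^\infty t\, f(t)\, dt = \int_0^\infty R(t)\, dt$

X: the random variable that represents the time until a fault occurs in the system. MTTF is its expected value.

When $R(t) = e^{-\lambda t}$ (X is exponentially distributed), the failure rate is $\lambda$ and $\mathrm{MTTF} = 1/\lambda$.

Figure: timeline starting at time 0 with the expected value marked.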
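The relation between MTTF and the integral of R(t) can be checked numerically for the exponential case. The sketch below is only an illustration: `mttf_numeric`, the truncation point `upper`, and the step count are assumptions, and the infinite integral is approximated by a trapezoidal sum over a finite range, which is adequate only when R(t) is negligible beyond `upper`.

```python
import math


def reliability(t: float, lam: float) -> float:
    """R(t) = exp(-lam * t) for an exponentially distributed lifetime."""
    return math.exp(-lam * t)


def mttf_numeric(lam: float, upper: float = 50.0, steps: int = 200_000) -> float:
    """Trapezoidal estimate of the integral of R(t) over [0, upper].

    Assumption: R(upper) is close to zero, so truncating the integral
    at `upper` loses almost nothing.
    """
    h = upper / steps
    total = 0.5 * (reliability(0.0, lam) + reliability(upper, lam))
    for k in range(1, steps):
        total += reliability(k * h, lam)
    return total * h


lam = 2.0  # rate from the slide's example
print(mttf_numeric(lam), 1.0 / lam)  # numerical estimate next to 1 / lam
```

For a much smaller rate, `upper` would have to be increased so that the tail of R(t) is still covered.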


Availability

The probability that a system works properly at time t. Availability is a measure that is frequently used for describing the behavior of the system.

*If the system has no repair or replacement, availability is equal to the reliability R(t).

R(t): the probability that no failures have occurred during the whole period (0, t).

Figure: timeline alternating between "operational" and "under repair" (fails, repairs, fails, repairs), with operational periods $X_i$, $X_{i+1}$, $X_{i+2}$ and repair periods $U_i$, $U_{i+1}$.


Availability

Instantaneous availability: $A(t) = \Pr[\text{the component is functioning correctly at time } t]$

Steady-state availability (the general meaning of availability): $A = \lim_{t \to \infty} A(t)$

Figure: timeline of alternating failures and repairs, with operational periods $X_i$, $X_{i+1}$, $X_{i+2}$ and repair periods $U_i$, $U_{i+1}$.


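Availability

When $X_i$ and $U_i$ are exponentially distributed:

$F_{X_i}(t) = 1 - e^{-\lambda t}$, $F_{U_i}(t) = 1 - e^{-\mu t}$

Instantaneous availability: $A(t) = \frac{\mu + \lambda e^{-(\lambda + \mu)t}}{\lambda + \mu}$

Steady-state availability: $A = \lim_{t \to \infty} A(t) = \frac{\mu}{\lambda + \mu}$

Figure: timeline with operational periods $X_i$, $X_{i+1}$, $X_{i+2}$ and repair periods $U_i$, $U_{i+1}$.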
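To see how the instantaneous availability approaches its steady-state value, the two formulas can be evaluated side by side. This is a hedged sketch: the function names and the rates (one failure per 100 hours, one repair per 2 hours) are hypothetical values chosen for illustration, and the closed form assumes the system is operational at time 0 with exponentially distributed failure and repair times.

```python
import math


def instantaneous_availability(t: float, lam: float, mu: float) -> float:
    """A(t) = (mu + lam * exp(-(lam + mu) * t)) / (lam + mu).

    lam is the failure rate and mu is the repair rate.
    Assumes the system is working at t = 0.
    """
    return (mu + lam * math.exp(-(lam + mu) * t)) / (lam + mu)


def steady_state_availability(lam: float, mu: float) -> float:
    """A = lim A(t) as t grows = mu / (lam + mu)."""
    return mu / (lam + mu)


lam = 0.01  # hypothetical failure rate per hour
mu = 0.5    # hypothetical repair rate per hour

for t in (0.0, 1.0, 5.0, 20.0, 1000.0):
    print(t, instantaneous_availability(t, lam, mu))

print("steady state:", steady_state_availability(lam, mu))
```

With these rates the transient term decays at rate lam + mu, so A(t) settles within a few multiples of 1 / (lam + mu) hours.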


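MTTR (Mean Time To Repair)

MTTR (mean time to repair): $\mathrm{MTTR} = E[U_i]$
$U_i$: the random variable that represents the downtime for the i-th repair or replacement.
$E[U_i]$: the expected value of $U_i$.

MTTF (mean time to failure): $\mathrm{MTTF} = E[X_i]$
$X_i$: the random variable that represents the duration of the i-th functioning period.
$E[X_i]$: the expected value of $X_i$.

Steady-state availability: $A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}} = \frac{\mu}{\lambda + \mu}$ (the second form holds when $X_i$ and $U_i$ are exponentially distributed with parameters $\lambda$ and $\mu$, so that $\mathrm{MTTF} = 1/\lambda$ and $\mathrm{MTTR} = 1/\mu$).

Figure: timeline with operational periods $X_i$, $X_{i+1}$, $X_{i+2}$ and repair periods $U_i$, $U_{i+1}$.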
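As a worked example of the steady-state formula, the sketch below computes availability from an MTTF and an MTTR and converts it into expected downtime per year. The function names and the numbers (MTTF of 1000 hours, MTTR of 2 hours) are hypothetical and not from the slides. The last lines compare the result with the rate form, assuming exponential failure and repair times.

```python
HOURS_PER_YEAR = 24 * 365


def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """A = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def expected_downtime_per_year(availability: float) -> float:
    """Long-run hours per year during which the system is not available."""
    return (1.0 - availability) * HOURS_PER_YEAR


mttf = 1000.0  # hypothetical mean time to failure, in hours
mttr = 2.0     # hypothetical mean time to repair, in hours

a = steady_state_availability(mttf, mttr)
print("availability:", a)
print("downtime (hours/year):", expected_downtime_per_year(a))

# Rate form for the exponential case: lam = 1 / MTTF, mu = 1 / MTTR.
lam, mu = 1.0 / mttf, 1.0 / mttr
print("rate form:", mu / (lam + mu))
```

The two availability values should agree, since dividing MTTF / (MTTF + MTTR) through by MTTF times MTTR gives the rate form.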