Availability - courses.ece.ubc.ca · Availability and Reliability Availability: Probability that a...

Copyright © 2004-2005 Konstantin Beznosov

T H E U N I V E R S I T Y O F B R I T I S H C O L U M B I A

Availability

EECE 412

2

Where We Are

ProtectionAuthorization Accountability Availability

Acc

ess

Con

trol

Dat

a Pr

otec

tion

Audit

Non-Repudiation

Serv

ice

Con

tinui

ty

Dis

aste

r R

ecov

ery

Assurance

Req

uire

men

ts A

ssur

ance

Dev

elop

men

t A

ssur

ance

Ope

ratio

nal A

ssur

ance

Des

ign

Ass

uran

ce

AuthenticationCryptography

3

What do you already know?

How are error, fault, and failure different? What’s the difference between fail-stop

and Byzantine failures? How many nodes do you need to have

3-fault tolerance for Byzantine failures? What measures to deal with failures do you

know? What are the ways of achieving service

continuity in the presence of attacks?

4

Outline

Availability in the presence of failures

• FT terminology

• k fault tolerance

• two army problem

• Byzantine Generals problem

• Services continuity and disaster recovery

Availability in the presence of attacks

• Failures vs. attacks

• Random vs. scale-free networks

• Internet tolerance to attacks and failures




Availability in the Presence ofFailures

Failures, Errors, and Faults

A system is said to fail when it cannotmeet its promisesError may lead to a faultFault -- a cause of an error

errors failuresfaults

Fault Types

Transient: occur once and then disappear

Intermittent: occurs, then vanishes, thenreappears

Permanent: continues to exist

Availability and Reliability

Availability: Probability that a system operatescorrectly at any given moment and is available toperform its functions

Reliability: time period during which a systemcontinues to be available to perform its functions

Problem: calculate system availability andreliability if it’s unavailable for 1 second everyhour.

Fault Tolerance

A fault tolerant system can provide itsservices even in the presence of faults

Classification of Failure Modes

Type of failure Description

Crash failure A server halts, but is working correctly until it halts

Omission failure Receive omission Send omission

A server fails to respond to incoming requestsA server fails to receive incoming messagesA server fails to send messages

Timing failure A server's response lies outside the specified time interval

Response failure Value failure State transition failure

The server's response is incorrectThe value of the response is wrongThe server deviates from the correct flow of control

Arbitrary (a.k.a. Byzantine) failure A server may produce arbitrary responses at arbitrary times

Achieving k fault tolerance

A system is k fault tolerant if it can survivefaults in k components silent failure vs. Byzantine failure k+1 2k+1

Agreement among honest playerswith unreliable communications:

Two-army Problem

Even with nonfaulty processes, agreementeven between two processes is not possiblein the face of unreliable communications

Agreement among dishonest playerswith perfect communications:Byzantine Generals Problem

Results:1. In a system with m faulty processes,agreement can be achieved only if 2m+1correctly functioning processes are present(total 3m+1). (Lamport et al., 1982)

2. If messages cannot be guaranteed to bedelivered within a known, finite time, noagreement is possible even with one faultyprocess. (Fischer et al., 1985)

14

Ways to Deal with Failures

Service continuity• Masking failures via

• Redundancy of– information– time– physical

Disaster recovery• Backward recovery

• check pointing

• Forward recovery• bringing system into a correct new state

• Don’t underestimate backups!



Availability in the Presence ofAttacks

16

Failures vs. Attacks

Failure• Random (unintentional) unavailability of

participants and/or infrastructure elements

Attack• Systematic (intentional) unavailability of

participants and/or infrastructure elements

17

Random vs. Scale–free Networks

19

Internet Tolerance toAttacks and Failures

Source: R. Albert, H. Jeong, and A.-L. Barabasi, "Error and attack tolerance of complex networks,"Nature, vol. 406, no. 6794, 2000, pp. 378-82.

fraction of nodes destroyed

netw

ork

diam

eter

Scale-free networks are failure-tolerant Random networks are attack-tolerant

20

Ways to Deal with Attacks

Service continuity• Same as for FT, plus• Heterogeneity

• Diversification– Avoid monocultures

• Randomization– Avoid “hubs”

Disaster recovery• Same as for FT

21

Summary

Availability in the presence of failures

• FT terminology

• k fault tolerance

• two army problem

• Byzantine Generals problem


Availability in the presence of attacks

• Failures vs. attacks

• Random vs. scale-free networks

• Internet tolerance to attacks and failures


22

What did you learn?

How are error, fault, and failure different? What’s the difference between fail-stop

and Byzantine failures? How many nodes do you need to have

3-fault tolerance for Byzantine failures? What measures to deal with failures do you

know? What are the ways of achieving service

continuity in the presence of attacks?

Availability - courses.ece.ubc.ca · Availability and Reliability Availability: Probability that a...

Documents

Transcript of Availability - courses.ece.ubc.ca · Availability and Reliability Availability: Probability that a...