
Tolerating Faults in Distributed Systems

Vijay K. Garg
Electrical and Computer Engineering
The University of Texas at Austin
Email: garg@ece.utexas.edu

(joint work with Bharath Balasubramanian and John Bridgman)

Fault Tolerance: Replication

[Figure: Servers 1-3 under replication, with one backup copy of each server for 1-fault tolerance and two backup copies of each for 2-fault tolerance.]

Fault Tolerance: Fusion

[Figure: Servers 1-3 with a single fused backup providing 1-fault tolerance.]

Fault Tolerance: Fusion

[Figure: Servers 1-3 with two fused backups providing 2-fault tolerance.]

'Fused' servers: fewer backups than replication.

Motivation

              Coding      Replication   Fusion
  Space       Efficient   Wasteful      Efficient
  Recovery    Expensive   Efficient     Expensive
  Updates     Expensive   Efficient     Efficient

The probability of failure is low, so expensive recovery is acceptable.

Outline

- Crash Faults
  - Space savings
  - Message savings
  - Complex data structures
- Byzantine Faults
  - Single fault (f = 1), O(1) data
  - Single fault, O(m) data
  - Multiple faults (f > 1), O(m) data
- Conclusions & Future Work

Example 1: Event Counter

n different counters counting n different items: count_i = entry(i) - exit(i).

What if one of the processes may crash?

Event Counter: Single Fault

fCount1 keeps the sum of all counts. Any crashed count can be recovered from fCount1 and the remaining counts, as the sketch below shows.
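A minimal runnable Python sketch of this idea (the class and method names are mine, not from the talk); the primaries and the fused backup live in one process here purely to show the arithmetic:

    class FusedEventCounters:
        """n primary counters plus one fused backup, fCount1 = sum of all counts."""

        def __init__(self, n):
            self.counts = [0] * n   # count_i = entry(i) - exit(i), one per process
            self.fcount1 = 0        # fused backup: the sum of all counts

        def entry(self, i):
            self.counts[i] += 1
            self.fcount1 += 1

        def exit(self, i):
            self.counts[i] -= 1
            self.fcount1 -= 1

        def recover(self, crashed):
            """Recover a crashed count from fCount1 and the surviving counts."""
            survivors = sum(c for j, c in enumerate(self.counts) if j != crashed)
            return self.fcount1 - survivors

    c = FusedEventCounters(3)
    c.entry(0); c.entry(0); c.entry(1); c.exit(0)
    assert c.recover(0) == 1    # 2 entries - 1 exit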

Event Counter: Multiple Faults
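A hedged sketch of a standard construction consistent with the weighted sums shown on the later fusion slides (F(1) = 1*x1 + 1*x2 + 1*x3, F(2) = 1*x1 + 2*x2 + 3*x3, F(3) = 1*x1 + 4*x2 + 9*x3): the j-th fused counter stores a Vandermonde-weighted sum of all counts, and any f lost counts are recovered by solving a small linear system. Function names are mine:

    from fractions import Fraction

    def fused_counters(counts, f):
        # j-th fused counter: sum_i (i+1)**j * counts[i], for j = 0..f-1.
        return [sum((i + 1) ** j * c for i, c in enumerate(counts))
                for j in range(f)]

    def recover(fused, surviving, crashed):
        """Recover the counts of the crashed processes (at most f of them).

        surviving: dict mapping process index -> its count.
        crashed:   list of indices whose counts were lost.
        """
        k = len(crashed)
        # Subtract the survivors' contributions; k fused equations suffice.
        b = [Fraction(fused[j] - sum((i + 1) ** j * c
                                     for i, c in surviving.items()))
             for j in range(k)]
        A = [[Fraction((i + 1) ** j) for i in crashed] for j in range(k)]
        # Gauss-Jordan elimination; A is Vandermonde, hence nonsingular.
        for col in range(k):
            piv = next(r for r in range(col, k) if A[r][col] != 0)
            A[col], A[piv] = A[piv], A[col]
            b[col], b[piv] = b[piv], b[col]
            inv = A[col][col]
            A[col] = [x / inv for x in A[col]]
            b[col] = b[col] / inv
            for r in range(k):
                if r != col and A[r][col] != 0:
                    m = A[r][col]
                    A[r] = [x - m * y for x, y in zip(A[r], A[col])]
                    b[r] = b[r] - m * b[col]
        return {i: int(b[row]) for row, i in enumerate(crashed)}

    counts = [3, 1, 4, 1, 5]
    F = fused_counters(counts, 2)                        # tolerate up to 2 crashes
    assert recover(F, {0: 3, 2: 4, 4: 5}, [1, 3]) == {1: 1, 3: 1}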

Event Counter: Theorem

f fused counters tolerate f crash faults among the n primaries: n + f servers in total, versus n + nf under replication (cf. the Conclusions table).

Shared Events: Aggregation

Suppose all processes act on entry(0) and exit(0)
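Aggregation can be read as batching: if all n processes act on the same shared event, the fused backup can apply one aggregated update instead of n separate ones (a message saving). A tiny sketch of the arithmetic, assuming the fused counter is the plain sum:

    def apply_shared_entry(fcount1, n):
        # entry(0) at all n processes raises every count_i by 1,
        # so the fused sum rises by exactly n in a single update.
        return fcount1 + n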

Aggregation of Events

Some Applications of Fusion

Causal ordering of messages for n processes:
- O(n^2) matrix at each process
- Replication to tolerate one fault: O(n^3) storage
- Fusion to tolerate one fault: O(n^2) storage

Ricart and Agrawala's algorithm:
- O(n) storage per process, 2(n-1) messages per mutual exclusion
- Replication: n backup processes, each with O(n) storage, and 2(n-1) additional messages
- Fusion: 1 fused process with O(n) storage, and only n additional messages

A sketch of the causal-ordering case follows.
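The storage claim has a simple arithmetic core; an illustrative Python sketch (function names are mine, not from the talk): the fused backup keeps the entry-wise sum of the n processes' O(n^2) matrices, so the backup costs O(n^2) rather than the O(n^3) of n replicated matrices, and one crashed matrix is recovered by subtraction.

    def fuse(matrices):
        # Entry-wise sum of the n processes' n-by-n matrices: O(n^2) storage.
        n = len(matrices[0])
        return [[sum(m[r][c] for m in matrices) for c in range(n)]
                for r in range(n)]

    def recover(fused, surviving):
        # Recover the single crashed matrix by subtracting the survivors.
        n = len(fused)
        return [[fused[r][c] - sum(m[r][c] for m in surviving)
                 for c in range(n)] for r in range(n)]

Recovery touches every entry, which is exactly the "expensive recovery" tradeoff from the Motivation table.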

Outline

- Crash Faults
  - Space savings
  - Message savings
  - Complex data structures
- Byzantine Faults
  - Single fault (f = 1), O(1) data
  - Single fault, O(m) data
  - Multiple faults (f > 1), O(m) data
- Conclusions & Future Work

Example: Resource Allocation, P(i)

    user: int initially 0;    // resource idle
    waiting: queue of int initially null;

    On receiving acquire from client pid:
        if (user == 0) {
            send(OK) to client pid;
            user = pid;
        } else {
            waiting.append(pid);
        }

    On receiving release:
        if (waiting.isEmpty()) {
            user = 0;
        } else {
            user = waiting.head();
            send(OK) to user;
            waiting.removeHead();
        }
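The same server as a runnable Python sketch; the send callback stands in for the messaging layer, and the names are illustrative:

    from collections import deque

    class LockServer:
        def __init__(self, send):
            self.user = 0            # 0: resource idle; otherwise pid of holder
            self.waiting = deque()   # pids waiting for the resource
            self.send = send         # send(message, pid): transport abstraction

        def on_acquire(self, pid):
            if self.user == 0:
                self.send("OK", pid)
                self.user = pid
            else:
                self.waiting.append(pid)

        def on_release(self):
            if not self.waiting:
                self.user = 0
            else:
                self.user = self.waiting.popleft()
                self.send("OK", self.user)

    # Example wiring with a trivial transport:
    msgs = []
    server = LockServer(lambda msg, pid: msgs.append((msg, pid)))
    server.on_acquire(7); server.on_acquire(9); server.on_release()
    assert msgs == [("OK", 7), ("OK", 9)]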

Complex Data Structures: Fused Queue

[Figure: (i) primary queue A holding a1..a8; (ii) primary queue B holding b1..b5; (iii) fused queue F holding a1, a2, a3 + b1, a4 + b2, ..., a8 + b6, with per-primary pointers headA, tailA, headB, tailB.]

Fused queue that can tolerate one crash fault, as sketched below.
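A minimal runnable sketch of this structure for two integer primaries (the class and its methods are illustrative; the talk's structure also handles circular arrays): each fused cell holds the sum of the two primaries' elements at matching offsets, and each primary keeps its own head and tail index into the shared cells.

    class FusedQueue:
        """Fused backup for two integer queues; tolerates one crash fault."""

        def __init__(self):
            self.cells = []        # cell k: sum of each primary's element at offset k
            self.head = [0, 0]     # per-primary index of the oldest element
            self.tail = [0, 0]     # per-primary index of the next free slot

        def enqueue(self, q, x):
            if self.tail[q] == len(self.cells):
                self.cells.append(0)
            self.cells[self.tail[q]] += x
            self.tail[q] += 1

        def dequeue(self, q, x):
            # The primary reports the dequeued value x, so no decoding is needed.
            self.cells[self.head[q]] -= x
            self.head[q] += 1
            while self.cells and min(self.head) > 0:   # drop unreferenced cells
                self.cells.pop(0)
                self.head = [h - 1 for h in self.head]
                self.tail = [t - 1 for t in self.tail]

        def recover(self, crashed, surviving_elements):
            # Subtract the surviving queue's elements; what remains is the lost queue.
            alive = 1 - crashed
            cells = list(self.cells)
            for k, x in enumerate(surviving_elements):
                cells[self.head[alive] + k] -= x
            return cells[self.head[crashed]:self.tail[crashed]]

During normal operation the backup only adds and subtracts reported values; decoding happens only in recover, where the surviving queue's elements are subtracted out.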

Fused Queues: Circular Arrays

Resource Allocation: Fused Processes

Outline

- Crash Faults
  - Space savings
  - Message savings
  - Complex data structures
- Byzantine Faults
  - Single fault (f = 1), O(1) data
  - Single fault, O(m) data
  - Multiple faults (f > 1), O(m) data
- Conclusions & Future Work

Byzantine Fault Tolerance: Replication

[Figure: servers with states 13, 8, 45, each replicated 2f + 1 times; (2f+1)*n processes in total.]

Goals for Byzantine Fault Tolerance

- Efficient during error-free operation
- Efficient detection of faults (no need to decode for fault detection)
- Efficient in space requirements

Byzantine Fault Tolerance: Fusion

[Figure: primaries P(i) with states 13, 8, 45; one copy Q(i) of each; a single fused backup F(1) holding the sum 66.]

Byzantine Faults (f = 1)

Assume n primary state machines P(1), ..., P(n), each with an O(1) data structure.

Theorem 2: There is an algorithm with n + 1 additional backup machines that has the same overhead as replication during normal operation and an additional O(n) overhead during recovery.
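A hedged sketch of the structure behind Theorem 2 (names are illustrative, and F(1) is taken to be an additive sum, as in the counter example). Each primary P(i) gets one copy Q(i); with a single fault, a mismatch between P(i) and Q(i) pins the liar to that pair, so all other values plus F(1) can be trusted:

    def detect_and_correct(p, q, fused_sum):
        """p[i], q[i]: values reported by P(i) and its copy Q(i); fused_sum:
        the value held by F(1), assumed here to be the plain sum of the
        primaries. At most one machine is Byzantine (f = 1)."""
        for i in range(len(p)):
            if p[i] != q[i]:                    # detection: a pair disagrees
                # The single liar is P(i) or Q(i), so the other primaries and
                # F(1) are all correct; recover primary i's true value.
                others = sum(p[j] for j in range(len(p)) if j != i)
                return i, fused_sum - others
        return None                              # no mismatch observed

If F(1) itself is the faulty machine, no pair disagrees and the primaries' outputs already stand; recovery is a single O(n) subtraction, matching the theorem's O(n) overhead.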

Byzantine FT: O(m) data

[Figure: primary queues a1..a8 and b1..b5, each with an identical copy, and the fused queue F(1) holding a1, a2, a3 + b1, ..., a8 + b6. When a primary P(i) and its copy Q(i) differ, the first location where their states disagree is the crucial location.]

Byzantine Faults (f = 1), O(m)

Theorem 3: There is an algorithm with n + 1 additional backup machines such that normal operation costs the same as replication, with an additional O(m + n) overhead during recovery.

No need to decode F(1).

Byzantine Fault Tolerance: Fusion

[Figure: single mismatched primary. The four unfused copies of the three primaries report (3, 1, 4), (3, 8, 4), (3, 1, 4), (3, 1, 4). The fused backups hold F(1) = 1*3 + 1*1 + 1*4 = 8, F(2) = 1*3 + 2*1 + 3*4 = 17, and F(3) = 1*3 + 4*1 + 9*4 = 43, all consistent with the value 1, so the copy reporting 8 is the liar.]

Byzantine Fault Tolerance: Fusion

[Figure: multiple mismatched primaries. The four unfused copies report (3, 7, 4), (3, 8, 4), (3, 1, 4), (3, 1, 4); the fused backups F(1) = 8, F(2) = 17, F(3) = 43 are consistent with (3, 1, 4), so the copies reporting 7 and 8 are the liars.]

Byzantine Faults (f > 1), O(1) data

Theorem 4: There is an algorithm with fn + f additional state machines that tolerates f Byzantine faults with the same overhead as replication during normal operation.

Liar Detection (f > 1), O(m) data

    Z := set of all f + 1 unfused copies;
    while (not all copies in Z identical) do
        w := first location where the copies differ;
        use the fused copies to find v, the correct value of state[w];
        delete from Z the unfused copies with state[w] != v;

Invariant: Z contains a correct machine.

No need to decode the entire fused state machine!
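A runnable sketch of this loop (names are illustrative; correct_value_at stands in for the step that uses the fused copies to decode the single location w):

    def liar_detection(copies, correct_value_at):
        """Narrow f+1 unfused copies of a state machine down to a correct one.

        copies: list of f+1 state vectors, at most f of them faulty.
        correct_value_at(w): correct value of state[w], obtained from the
        fused copies, so only one location is decoded per round.
        """
        Z = list(copies)
        while any(c != Z[0] for c in Z):
            # w: first location where the remaining copies differ.
            w = next(k for k in range(len(Z[0]))
                     if any(c[k] != Z[0][k] for c in Z))
            v = correct_value_at(w)
            Z = [c for c in Z if c[w] == v]   # copies caught lying at w are deleted
        return Z[0]

Each round deletes at least one copy and never a correct one (correct copies agree with v at w), so the invariant holds and the loop terminates with only correct copies in Z.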

Fusible Structures

Fusible data structures [Garg and Ogale, ICDCS 2007; Balasubramanian and Garg, ICDCS 2011]:
- Linked lists, stacks, queues, hash tables
- Data-structure-specific algorithms
- Partial replication for efficient updates
- Multiple faults tolerated using Reed-Solomon coding

Fusible finite state machines [Ogale, Balasubramanian, and Garg, IPDPS 2009]:
- Automatic generation of minimal fused state machines

Conclusions

                      Replication   Fusion
    Crash Faults      n + nf        n + f
    Byzantine Faults  n + 2nf       n + nf + f

(n: the number of different servers)

- Replication: recovery and updates are simple; tolerates f faults for each primary
- Fusion: space efficient
- Replication and fusion can be combined for tradeoffs

Future Work

- Optimal algorithms for complex data structures
- Different fusion operators
- Concurrent updates on backup structures

Thank You!

Event Counter: Proof Sketch

Model

- The servers (primaries and backups) execute independently (in parallel); primaries and backups do not operate in lock-step
- Events/updates are applied to all the servers
- All backups act on the same sequence of events

Model (contd.)

Faults:
- Fail-stop (crash): loss of current state
- Byzantine: servers can 'lie' about their current state

For crash faults, we assume the presence of a failure detector. For Byzantine faults, we provide detection algorithms. Faults are assumed to be infrequent.

Byzantine Faults (f = 1), O(m)

Theorem 3: There is an algorithm with n + 1 additional backup machines such that normal operation costs the same as replication, with an additional O(m + n) overhead during recovery.

Proof sketch:
- Normal operation: the responses by P(i) and Q(i) are identical
- Detection: P(i) and Q(i) differ on some response
- Correction: use liar detection
  - O(m) time to determine the crucial location
  - Use F(1) to determine which copy is correct
  - No need to decode F(1)
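For f = 1 the liar-detection loop collapses to a single round; a hedged sketch (illustrative names, with the fused state again treated as an element-wise sum over aligned states):

    def resolve_f1(p_state, q_state, fused, other_states):
        """p_state, q_state: the O(m) states reported by P(i) and Q(i).
        fused[k]: sum over all primaries at location k; other_states: states
        of the remaining primaries (trustworthy, since the single liar must
        be P(i) or Q(i) once they disagree)."""
        # O(m) scan for the crucial location: first index where copies differ.
        w = next(k for k in range(len(p_state)) if p_state[k] != q_state[k])
        # Decode location w only; never the whole fused state.
        true_w = fused[w] - sum(s[w] for s in other_states)
        return p_state if p_state[w] == true_w else q_state

The scan for the crucial location is the O(m) term and the subtraction at that one location is the O(n) term, giving O(m + n) recovery without fully decoding F(1).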

Byzantine Faults (f > 1)

Proof sketch:
- f copies of each primary state machine and f overall fused machines
- Normal operation: all f + 1 unfused copies produce the same output
- Case 1 (single mismatched primary state machine): use liar detection
- Case 2 (multiple mismatched primary state machines): the unfused copy with the largest tally is correct

Resource Allocation Machine

[Figure: three lock servers, each with its own request queue. Request queue 1 holds R1, R2, R3; request queue 2 holds R1, R2; request queue 3 holds R1, R2, R4, R3.]

Byzantine Fault Tolerance: Fusion

[Figure: primaries P(i) with states 13, 8, 45; one copy Q(i) of each; a fused backup F(1) holding the sum 66. (f+1)*n + f processes in total.]