
Efficient Algorithms to Implement Failure Detectors and Solve Consensus in Distributed Systems

Mikel Larrea

Departamento de Arquitectura y Tecnología de Computadores

UPV / EHU

Contents

• Introduction and system model
• Implementation of failure detectors
– Ring based algorithms
– Heartbeat based optimal ◊S
• Impossibility result
• Eventually consistent failure detectors (◊C)
• Solving Consensus using ◊C

Introduction and system model

• A distributed system is synchronous if:
– there is a known upper bound on the transmission delay of messages
– there is a known upper bound on the processing time of a piece of code
• A distributed system is asynchronous if:
– there is no bound on the transmission delay of messages
– there is no bound on the processing time of a piece of code

• A distributed system is partially synchronous if:
– there is an unknown upper bound on the transmission delay of messages
– there is an unknown upper bound on the processing time of a piece of code
• Real distributed systems (e.g., the Internet):
– synchronous? asynchronous? partially synchronous?
• The Consensus problem:
– a set of processes must reach a common decision, which must be one of the proposed values, despite failures

• FLP impossibility result (Fischer, Lynch, and Paterson): Consensus cannot be solved deterministically in an asynchronous system subject to even a single process crash
• Possibility result (Chandra & Toueg): Consensus can be solved in an asynchronous system subject to failures with an unreliable failure detector
– obviously, such a failure detector cannot be implemented in an asynchronous system!
– but it can be implemented in a partially synchronous system

Motivation

[Figure: two processes, each with a Consensus module and an Unreliable Failure Detector module; labels: asynchronous network, partially synchronous network]



• The implementation of an unreliable failure detector proposed by Chandra and Toueg has a quadratic complexity in the number of messages

• We have proposed several implementations with a linear complexity

• We have shown the impossibility of implementing several classes of unreliable failure detectors in partially synchronous systems

• We have proposed a new class of unreliable failure detectors which allows Consensus to be solved more efficiently

• Unreliable Failure Detector: distributed oracle that provides (possibly incorrect) hints about the operational status of other processes
• Abstractly characterized in terms of two properties: completeness and accuracy
– Completeness characterizes the degree to which failed processes are suspected by correct processes
– Accuracy characterizes the degree to which correct processes are not suspected, i.e., restricts the false suspicions that a failure detector can make

• System model:
– partially synchronous distributed system
– finite set of processes: {p1, p2, ..., pn}
– crash failure model (no recovery). A process is correct if it never crashes
– communication only by message-passing (no shared memory)
– reliable channel connecting every pair of processes (fully connected system)

• Chandra-Toueg's implementation of ◊P:
– each process periodically sends an I-AM-ALIVE message to all the processes
– upon timeout, suspect. If, later on, a message from a suspected process is received, then stop suspecting it and increase its timeout period
• Performance analysis (n processes, C correct):
– Number of messages sent in a period: n² (eventually nC)
– Size of messages: Θ(log n) bits
– Amount of information exchanged in a period: Θ(n² log n) bits
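The all-to-all heartbeat scheme described above can be sketched as a local detector module driven by explicit clock ticks. This is only an illustrative sketch, not the authors' code: the class name `HeartbeatDetector`, the integer clock, the initial timeout of 2 ticks and the increment of 1 are all assumptions made for the example.

```python
class HeartbeatDetector:
    """Local failure detector module of one process (all-to-all heartbeats)."""

    def __init__(self, me, processes, initial_timeout=2):
        self.me = me
        # One timeout period per monitored process (hypothetical initial value).
        self.timeout = {q: initial_timeout for q in processes if q != me}
        self.last_heard = {q: 0 for q in self.timeout}
        self.suspected = set()

    def on_heartbeat(self, sender, now):
        """Handle an I-AM-ALIVE message from `sender` received at time `now`."""
        self.last_heard[sender] = now
        if sender in self.suspected:
            # The suspicion was a mistake: stop suspecting the sender and
            # increase its timeout period so the mistake is less likely to recur.
            self.suspected.discard(sender)
            self.timeout[sender] += 1

    def on_tick(self, now):
        """Suspect every process whose heartbeat is overdue at time `now`."""
        for q, last in self.last_heard.items():
            if now - last > self.timeout[q]:
                self.suspected.add(q)


fd = HeartbeatDetector("p1", ["p1", "p2", "p3"])
fd.on_heartbeat("p2", now=1)
fd.on_tick(now=4)             # both p2 and p3 are overdue and become suspected
fd.on_heartbeat("p2", now=5)  # p2 is trusted again, with a longer timeout
```

Because every process runs such a module and sends to every other process, the number of messages per period grows quadratically with n, which is the cost the following slides try to reduce.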

• Solving Consensus using an unreliable failure detector:
– algorithms based on the rotating coordinator paradigm
– current coordinator decides if “things go well”
– the rest of the processes (participants) communicate with the coordinator. If a participant suspects that the coordinator has crashed, it advances to the next round
– eventually, nobody suspects some coordinator, which takes a decision

Implementation of failure detectors

• We propose more efficient implementations of ◊W, ◊Q, ◊S, and ◊P:
– processes arranged into a logical ring
– polling (i.e., interrogation) strategy: ARE-YOU-ALIVE? + I-AM-ALIVE!
– communication pattern: one-to-one
• Modular approach:
– basic algorithm providing only weak completeness
– extensions providing accuracy and strong completeness

[Figure: Weak Completeness]


– Weak completeness: each process starts monitoring its successor in the ring. Upon timeout, it suspects that process and starts monitoring the next one. If, later on, a message from a suspected process is received, it stops suspecting that process and takes it as successor again
– ◊W: take a first common candidate, and increase timeouts only with respect to this candidate and its successors
– ◊Q: increase timeouts with respect to all processes
– ◊S, ◊P: propagate the information about suspicions
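The basic ring step (weak completeness only) can be sketched as follows. This is a minimal sketch under stated assumptions: the class `RingMonitor`, the 0-based process indices and the event methods are hypothetical, n is assumed to be at least 2, and the sketch implements only what the slide states. In particular, what happens to suspicions on processes located after a recovered successor is not described on the slide, so the sketch leaves them untouched. Timeout adaptation and suspicion propagation (the extensions) are omitted.

```python
class RingMonitor:
    """Basic ring monitoring step for one process; indices are 0..n-1."""

    def __init__(self, me, n):
        self.me = me
        self.n = n
        self.suspected = set()
        self.target = self._after(me)  # start by monitoring the ring successor

    def _after(self, q):
        """Next process after q in the ring, skipping this process itself."""
        nxt = (q + 1) % self.n
        return (nxt + 1) % self.n if nxt == self.me else nxt

    def on_timeout(self):
        """The monitored process did not answer ARE-YOU-ALIVE? in time."""
        self.suspected.add(self.target)
        self.target = self._after(self.target)

    def on_reply(self, sender):
        """An I-AM-ALIVE! message arrived from `sender`."""
        if sender in self.suspected:
            # Stop suspecting the sender and take it as successor again.
            self.suspected.discard(sender)
            self.target = sender


m = RingMonitor(me=0, n=4)
m.on_timeout()   # process 1 is suspected, process 2 is monitored next
m.on_reply(1)    # a late reply from process 1 restores it as successor
```

Each process polls a single target at a time, which is what gives the one-to-one communication pattern and the linear number of messages per period.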

• Performance analysis (n processes, C correct):
– Number of messages sent in a period: 2n (eventually 2C)
– Size of messages: Θ(log n) bits for ◊W and ◊Q, Θ(n) bits for ◊S and ◊P (messages carry a list of suspected processes)
– Amount of information exchanged in a period: Θ(n log n) bits for ◊W and ◊Q, Θ(n²) bits for ◊S and ◊P
• Better performance than Chandra-Toueg's algorithm
• Drawback: latency of failure information propagation in the case of ◊S and ◊P

• We also propose an optimal implementation of ◊S, the weakest failure detector for solving Consensus:
– processes ordered: p1, ..., pn
– heartbeat strategy
– communication pattern: one-to-successors
– based on a trusted process (instead of a list of suspected processes)

i) Initially, p1 starts sending messages periodically to the rest of the processes, and all processes trust p1

[Figure: processes p1 to p5, with trusted1 = trusted2 = trusted3 = trusted4 = trusted5 = p1]

ii) If a process does not receive a message within some timeout period from its trusted process pi, then it suspects pi and takes the next process pi+1 as its new trusted process

[Figure: processes p1 to p5; p4 times out on p1 and sets trusted4 = p2, while trusted1 = trusted2 = trusted3 = trusted5 = p1]

iii) If a process trusts itself, then it starts sending messages periodically to its successors

[Figure: processes p1 to p5; p2 times out on p1 and sets trusted2 = p2, while trusted1 = p1, trusted3 = p1, trusted4 = p2, trusted5 = p1]

iv) If a process receives a message from a process pi preceding its trusted process, then it will trust pi again, increasing its timeout period with respect to pi

[Figure: processes p1 to p5; on a message from p1, p2 sets trusted2 = p1 (timeout_period21++) and p4 sets trusted4 = p1 (timeout_period41++), while trusted1 = p1, trusted3 = p2, trusted5 = p1]
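Steps i) to iv) can be sketched as a per-process state machine. This is a hedged sketch rather than the authors' implementation: the class `OrderedHeartbeat`, the event-method names, the 1-based indices and the initial timeout of 2 are assumptions made for illustration, and message transport and real timers are left out.

```python
class OrderedHeartbeat:
    """State of one process p_me in the ordered-heartbeat scheme (1-based)."""

    def __init__(self, me, n, initial_timeout=2):
        self.me = me
        self.n = n
        self.trusted = 1  # step i: every process initially trusts p1
        self.timeout = {q: initial_timeout for q in range(1, n + 1)}

    def is_sender(self):
        """Step iii: a process that trusts itself sends heartbeats."""
        return self.trusted == self.me

    def successors(self):
        """Destinations of the heartbeats: p_(me+1), ..., p_n."""
        return list(range(self.me + 1, self.n + 1))

    def on_timeout(self):
        """Step ii: no message from the trusted process within its timeout."""
        if self.trusted < self.me:  # a process never moves past itself
            self.trusted += 1

    def on_heartbeat(self, sender):
        """Step iv: a message from a process preceding the trusted one."""
        if sender < self.trusted:
            self.trusted = sender
            self.timeout[sender] += 1


procs = {i: OrderedHeartbeat(i, 5) for i in range(1, 6)}
procs[4].on_timeout()       # p4 now trusts p2
procs[2].on_timeout()       # p2 trusts itself and starts sending to p3..p5
procs[2].on_heartbeat(1)    # a late message from p1: p2 trusts p1 again
procs[4].on_heartbeat(1)    # likewise for p4
```

The scenario at the end replays the situation shown in the figures: two premature timeouts on p1 are corrected once a message from p1 arrives, and the corresponding timeout periods grow.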

• Lemma. With the previous algorithm, eventually all the correct processes will permanently trust the first correct process in p1, ..., pn
• This property trivially allows us to provide the properties of ◊S:
– Eventual weak accuracy: by not suspecting the trusted process
– Strong completeness: by suspecting all the processes except the trusted process
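The way the two properties follow from the trusted process can be written as a one-line derivation of the suspected set. The function name `suspected_set` and the 1-based numbering are assumptions made for this sketch.

```python
def suspected_set(trusted, n):
    """Suspect every process p1..pn except the currently trusted one."""
    return set(range(1, n + 1)) - {trusted}


# If p1 has crashed and all correct processes have converged on p2:
print(suspected_set(trusted=2, n=5))
```

Once every correct process permanently trusts the same correct process, that process is never suspected again (eventual weak accuracy), and every crashed process is necessarily in the suspected set of every correct process (strong completeness).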

• Performance analysis (n processes, C correct):
– Number of messages sent in a period: n-1
– Size of messages: Θ(log n) bits
– Amount of information exchanged in a period: Θ(n log n) bits
• Better performance than previous algorithms
• Apparent drawback: big loss of accuracy, since all processes except one are systematically suspected. As will be shown, this can be successfully exploited

• Eventual monitoring degree: number of pairs of correct processes that will infinitely often communicate
– Chandra-Toueg's algorithm: C²
– ring algorithms: 2C
– ordered-heartbeat algorithm: C-1
• Lemma. Any algorithm implementing ◊W requires an eventual monitoring degree of at least C-1. Hence, the ordered-heartbeat algorithm is optimal
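To make the comparison concrete, the figures quoted on the slides can be tabulated for given n and C. The function names and dictionary keys below are hypothetical; the formulas are the ones stated in the performance-analysis slides and in the list above.

```python
def messages_per_period(n):
    """Messages sent in one period with n processes (figures from the slides)."""
    return {"chandra_toueg": n * n, "ring": 2 * n, "ordered_heartbeat": n - 1}


def eventual_monitoring_degree(c):
    """Eventual monitoring degree with c correct processes."""
    return {"chandra_toueg": c * c, "ring": 2 * c, "ordered_heartbeat": c - 1}


for name, count in messages_per_period(10).items():
    print(name, count, eventual_monitoring_degree(8)[name])
```

With n = 10 the three schemes send 100, 20 and 9 messages per period respectively, and the ordered-heartbeat scheme meets the C-1 lower bound of the lemma exactly.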

Impossibility result

• Failure detectors with perpetual accuracy, i.e., P, Q, S, and W, cannot be implemented in a partially synchronous distributed system
• It would be sufficient to show the impossibility for class S, because
– classes W and S are equivalent (Chandra and Toueg)
– Q and P are strictly stronger than W and S, respectively (Q and P are subclasses of W and S, respectively)

Impossibility result

• Idea of the proof: impossibility to satisfy both the completeness and the accuracy properties
– in order to satisfy strong completeness, it is impossible to avoid the incorrect suspicion of correct processes, violating weak accuracy
– we consider several runs of the system, with and without failures, such that they look identical to some correct processes up to a certain time t. Being indistinguishable, the processes take the same actions in all runs up to time t, in particular regarding the suspicion of other processes
– we show a scenario in which every correct process is incorrectly suspected at least once, violating weak accuracy

Eventually consistent failure detectors

• The Eventually Consistent failure detector class (◊C) satisfies strong completeness and eventual consistent accuracy, defined as follows:
– there is a correct process p that is eventually and permanently not suspected by any correct process, and there is a function that each correct process can apply to the set of processes not suspected by its local failure detector module that eventually and permanently returns p
• ◊C enhances classical failure detectors with an eventual leader election mechanism
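One possible shape for such a function is sketched below: it returns the first non-suspected process in a fixed order. This particular choice, the name `leader` and the list-based ordering are assumptions made for the example; the definition only requires that some function with the stated convergence property exists.

```python
def leader(processes, suspected):
    """Return the first process, in the given order, that is not suspected."""
    for p in processes:
        if p not in suspected:
            return p
    return None  # every process is suspected: no leader for now


order = [1, 2, 3, 4, 5]
print(leader(order, suspected={1, 3}))  # p2 is the first non-suspected process
```

With a choice like this, the function converges on a single correct process as soon as the correct processes agree on which processes precede it and are suspected.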

• ◊P is a subclass of ◊C
• ◊C is a subclass of ◊S
• Theorem. ◊C and ◊S are equivalent classes

• Implementations of ◊C:
– Any implementation of ◊P also implements ◊C
– Any implementation of ◊S can be transformed into ◊C
– The ring algorithm implementing ◊S also implements ◊C: take as leader the first non-suspected process starting from the initial candidate
– The ordered-heartbeat algorithm implementing ◊S also implements ◊C: take as leader the trusted process
• Thus, ◊C can be implemented as efficiently as ◊S

• Any Consensus algorithm based on a failure detector of class ◊S is also correct with a failure detector of class ◊C
• We propose a Consensus algorithm based on ◊C:
– it does not rely on the rotating coordinator paradigm, but on the eventual leader election mechanism of ◊C
– it is more efficient than existing ◊S-Consensus algorithms in the number of rounds needed to solve Consensus

• Solving Consensus using ◊C:
– The algorithm executes in asynchronous rounds
– The algorithm goes through three asynchronous epochs, each of which may span several rounds. In the first epoch, several decision values are possible. In the second epoch, a value gets locked: no other decision value is possible. In the third epoch, processes decide the locked value
– Each round is divided into five asynchronous phases
– If the failure detector is stable, i.e., the leader function converges, Consensus is reached in one round

• Phases of a round of ◊C-Consensus:
– Phase 0: every process determines its coordinator for the round
– Phase 1: every process sends its estimate to its coordinator
– Phase 2: each coordinator tries to gather a majority of estimates. If it succeeds, then it sends a proposition
– Phase 3: every process waits for the proposition of a coordinator. If a proposition is received, then it adopts it and replies with an ack; otherwise, it sends a nack
– Phase 4: the coordinator that sent a proposition in Phase 2 (if any) tries to gather a majority of acks. If it succeeds, then it decides and broadcasts the decision

◊S-Consensus vs. ◊C-Consensus:
– All the ◊S-Consensus algorithms we are aware of rely on the rotating coordinator paradigm. Hence, once the failure detector is stable, the algorithm may require O(n) rounds to solve Consensus (until the correct process not suspected by any correct process becomes coordinator)
– In our ◊C-Consensus algorithm, once the failure detector is stable, i.e., the leader function converges, Consensus is solved in only one round (by means of the leader election mechanism, all correct processes select the same correct process as their coordinator for that round)
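The O(n) figure can be illustrated by counting how many rounds a rotating coordinator needs before the unsuspected process takes its turn. The sketch assumes the rotation rule of Chandra-Toueg's algorithm, where the coordinator of round r is process (r mod n) + 1, and it counts the worst case in which no earlier round decides. Both function names are hypothetical.

```python
def rotating_coordinator(r, n):
    """Coordinator of round r among p1..pn (assumed rule: (r mod n) + 1)."""
    return (r % n) + 1


def rounds_until_coordinator(current_round, n, unsuspected):
    """Rounds executed, starting at current_round, until `unsuspected` leads."""
    rounds = 1
    r = current_round
    while rotating_coordinator(r, n) != unsuspected:
        r += 1
        rounds += 1
    return rounds


# Round 1 is coordinated by p2; if the only never-suspected process is p1,
# the rotation has to go all the way around.
print(rounds_until_coordinator(current_round=1, n=5, unsuspected=1))
```

In this example the rotating scheme needs 5 rounds, that is, n rounds, whereas with the leader election mechanism every correct process selects the unsuspected process directly and a single round is enough.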


Conclusions

• Future directions and open questions:

– Consider the recovery of processes

– Consider a dynamic set of processes

– Other applications of ◊C

– What is the minimal synchronism needed to implement perpetual failure detectors?
