Sergio Rajsbaum 2006 Lecture 4 Introduction to Principles of Distributed Computing Sergio Rajsbaum Math Institute UNAM, Mexico
  • date post

    21-Dec-2015

Lecture 4
Introduction to Principles of Distributed Computing

Sergio Rajsbaum
Math Institute, UNAM, Mexico


Lecture 4
Consensus in partially synchronous systems, and failure detectors

• Part I: Realistic timing model and metric

• Part II: Failure detectors, algorithms

• Part III: this is the best possible

• Part IV: New directions and extensions


Consensus: A Fundamental Abstraction

Each process has an input and should decide an output such that:

Agreement: correct processes’ decisions are the same

Validity: decision is input of one process

Termination: eventually all correct processes decide

There are at least two possible input values, 0 and 1. The possible inputs are all vectors over the set V of input values.
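The three properties can be turned into a small checker for a single finished run. This is only a sketch: the function name `check_consensus` and the dictionary-based representation of a run are assumptions made for illustration, and "eventually" can only be approximated by inspecting a run that has already ended.

```python
from typing import Dict, Hashable, Optional, Set


def check_consensus(inputs: Dict[int, Hashable],
                    decisions: Dict[int, Optional[Hashable]],
                    correct: Set[int]) -> Dict[str, bool]:
    """Check one finished run against agreement, validity and termination.

    inputs    -- input value of every process (values are assumed not None)
    decisions -- decided value of every process (None = never decided)
    correct   -- ids of the processes that did not crash
    """
    decided_by_correct = {decisions.get(p) for p in correct}
    # Termination: every correct process has decided some value.
    termination = None not in decided_by_correct
    # Agreement: correct processes decided at most one distinct value.
    agreement = len(decided_by_correct - {None}) <= 1
    # Validity: every decision is the input of some process.
    validity = all(v in inputs.values()
                   for v in decisions.values() if v is not None)
    return {"agreement": agreement,
            "validity": validity,
            "termination": termination}
```

For example, a run in which process 3 crashes before deciding still satisfies all three properties as long as processes 1 and 2 decide the same input value.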


The lecture in a nutshell

• Consensus solvability depends on how long connectivity is preserved in a particular model

• In the synchronous model it is solvable; in the asynchronous model it is not. What about intermediate, more realistic models?

[Figure: the initial states X0, the states L(X0) after one round, and the states L^2(X0) after two rounds, with the labels “connectivity preserved” and “connectivity destroyed”.]


Basic Model

• Message passing (essentially equivalent to read/write shared memory model)

• Channels between every pair of processes

• Crash failures: t < n potential failures out of n > 1 processes

• No message loss among correct processes


Is consensus solvable? If so, how long does it take to solve it?

• It depends on what exactly the model is
• But what is a realistic model?
• And what are the common scenarios within the model? The nature of a distributed system is to include complex combinations of failures and delays


How Fast Can We Solve Consensus?

Depends on the timing model:
• Message delays
• Processing times
• Clocks

And on the metric used:
• Worst case
• Average
• etc.


The Rest of This Lecture

• Part I: Realistic timing model and metric

• Part II: Upper bounds

• Part III: this is the best possible

• Part IV: New directions and extensions


Part I: Realistic Timing Model


First two simple models


Asynchronous Model

• Unbounded message delay, processor speed

Consensus impossible even for t=1 [FLP85]


Synchronous Model

• Algorithm runs in synchronous rounds:
  – send messages to any set of processes,
  – receive messages from previous round,
  – do local processing (possibly decide, halt)

• If process i crashes in a round, then any subset of the messages i sends in this round can be lost


Synchronous Consensus

• In a run with f failures (f<t)
  – Processes can decide in f+1 rounds [Lamport, Fischer 82; Dolev, Reischuk, Strong 90] (early-deciding)

• 1 round with no failures

• In this talk: deciding
  – halting takes min(f+2, t+1) rounds [Dolev, Reischuk, Strong 90]
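To make the synchronous round structure and the crash semantics concrete, here is a minimal simulation of the classic flooding algorithm, which always runs t+1 rounds. It is not the early-deciding algorithm cited on the slide; it is swapped in because it is the simplest correct synchronous algorithm. The function name `flood_consensus` and the `crashes` encoding are assumptions made for this sketch.

```python
from typing import Dict, Set, Tuple


def flood_consensus(inputs: Dict[int, int], t: int,
                    crashes: Dict[int, Tuple[int, Set[int]]]) -> Dict[int, int]:
    """Simulate t+1 synchronous rounds of flooding; return the decisions.

    crashes[p] = (r, delivered): p crashes in round r, and only the
    processes in `delivered` still receive p's round-r message.
    """
    assert len(crashes) <= t, "at most t processes may crash"
    known = {p: {v} for p, v in inputs.items()}   # values each process has seen
    alive = set(inputs)
    for rnd in range(1, t + 2):                   # rounds 1 .. t+1
        inbox: Dict[int, Set[int]] = {p: set() for p in inputs}
        for sender in alive:
            crash = crashes.get(sender)
            crashing = crash is not None and crash[0] == rnd
            for dest in inputs:
                if crashing and dest not in crash[1]:
                    continue                      # lost because the sender crashed
                inbox[dest] |= known[sender]
        # Processes that crash in this round take no further steps.
        alive -= {p for p, (r, _) in crashes.items() if r == rnd}
        for p in alive:
            known[p] |= inbox[p]
    # After t+1 rounds all surviving processes know the same set of values.
    return {p: min(known[p]) for p in alive}
```

With inputs `{1: 0, 2: 1, 3: 1}`, `t = 1`, and process 1 crashing in round 1 after reaching only process 2, the second round lets process 2 relay the value 0 to process 3, so both survivors decide 0.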


The Middle Ground

Many real networks are neither synchronous nor asynchronous

• During long stable periods, delays and processing times are bounded
  – Like synchronous model

• Some unstable periods
  – Like asynchronous model


Partial Synchrony Model [Dwork, Lynch, Stockmeyer 88]

• Processes have clocks (with bounded drift)

• Δ, upper bound on message delay

• φ, upper bound on processing time

• GST, global stabilization time
  – Until GST, unstable: bounds do not hold
  – After GST, stable: bounds hold
  – GST unknown


Partial Synchrony in Practice

• For Δ, φ, choose bounds that hold with high probability

• Stability forever?
  – We assume that once stable, the system remains stable
  – In practice, stability has to last “long enough” for the given algorithm to terminate
  – A commonly used model that alternates between stable and unstable times: Timed Asynchronous Model [Cristian, Fetzer 98]


Consensus with Partial Synchrony

• Solvable

• Requires t < n/2 [DLS88]

• Unbounded running time, by [FLP85], because the model can be asynchronous for unbounded time
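A common intuition for the t < n/2 requirement is quorum intersection: algorithms in this model wait for n-t processes, and two such groups are guaranteed to overlap exactly when t < n/2. The brute-force check below illustrates that arithmetic only; it is not the proof from [DLS88], and the function name is an assumption for this sketch.

```python
from itertools import combinations


def quorums_always_intersect(n: int, t: int) -> bool:
    """True if every two sets of n-t processes share at least one process."""
    quorums = [set(q) for q in combinations(range(n), n - t)]
    return all(a & b for a in quorums for b in quorums)


# Exhaustively compare against the closed-form condition t < n/2.
for n in range(2, 8):
    for t in range(0, n):
        assert quorums_always_intersect(n, t) == (2 * t < n)
```

When t ≥ n/2, two disjoint groups of n-t processes exist, and during an unstable period each group could proceed without hearing from the other.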


Exercise

• Prove that consensus is not solvable in the partially synchronous model, if t ≥ n/2

• Prove that if t < n/2, solving consensus can take unbounded running time


In a Practical System

Can we say more than: consensus will be solved eventually?


Performance Metric

Number of rounds in well-behaved runs

• Well-behaved:
  – No failures
  – Stable from the beginning

• Motivation: common case


The Rest of This Lecture

• Part II: best known algorithms decide in 2 rounds in well-behaved runs
  – 2Δ time (with delay bound Δ, 0 processing time)

• Part III: this is the best possible

• Part IV: new directions and extensions


Part II: Algorithms, and the Failure Detector Abstraction

II.a Failure Detectors and Partial Synchrony

II.b Algorithms


Time-Free Algorithms

• Goal: abstract away time, get simpler algorithms

• We describe the algorithms using failure detector abstraction [Chandra, Toueg 96]


Unreliable Failure Detectors [Chandra, Toueg 96]

• Each process has a local failure detector oracle
  – Typically outputs list of processes suspected to have crashed at any given time

• Unreliable: failure detector output can be arbitrary for unbounded (finite) prefix of run


Performance of Failure Detector Based Consensus Algorithms

• Implement a failure detector in the partial synchrony model

• Design an algorithm for the failure detector

• Analyze the performance in well-behaved runs of the combined algorithm


A Natural Failure Detector Implementation in Partial Synchrony Model

• Implement failure detector using timeouts:
  – When expecting a message from a process i, wait Δ + φ + clock skew before suspecting i

• In well-behaved runs, Δ and φ always hold, hence no false suspicions
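The timeout rule can be sketched as a small class. This is a minimal illustration, assuming the timeout is the sum of the message delay bound, the processing time bound, and a clock skew allowance; the class name, method names, and parameter names are hypothetical and do not come from the lecture.

```python
from typing import Dict, Set


class TimeoutFailureDetector:
    """Suspect a peer once an expected message is overdue."""

    def __init__(self, delta: float, phi: float, skew: float) -> None:
        # Assumed timeout: delay bound + processing bound + clock skew.
        self.timeout = delta + phi + skew
        self.waiting_since: Dict[int, float] = {}  # peer -> local start time

    def expect(self, peer: int, now: float) -> None:
        """Start waiting for a message from `peer` (no-op if already waiting)."""
        self.waiting_since.setdefault(peer, now)

    def received(self, peer: int) -> None:
        """The expected message arrived; any suspicion of `peer` is revoked."""
        self.waiting_since.pop(peer, None)

    def suspects(self, now: float) -> Set[int]:
        """Peers whose expected message is later than the timeout allows."""
        return {p for p, since in self.waiting_since.items()
                if now - since > self.timeout}
```

Because a late message removes the sender from the suspect set, false suspicions made during an unstable period are corrected once messages arrive again.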


The resulting failure detector is <>P - Eventually Perfect

• Strong Completeness: From some point on, every faulty process is suspected by every correct process

• Eventual Strong Accuracy: From some point on, every correct process is not suspected


Weakest Failure Detectors for Consensus

• <>S - Eventually Strong
  – Strong Completeness
  – Eventual Weak Accuracy: From some point on, some correct process is not suspected

• Ω - Leader
  – Outputs one trusted process
  – From some point, all correct processes trust the same correct process


A Simple Ω Implementation

• Use <>P implementation

• Output lowest id non-suspected process

In well-behaved runs: process 1 always trusted
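The rule "output the lowest-id non-suspected process" fits in a few lines. This sketch assumes the set of suspects comes from some <>P implementation; the function name `leader` is hypothetical.

```python
from typing import Iterable, Optional


def leader(processes: Iterable[int], suspected: Iterable[int]) -> Optional[int]:
    """Return the lowest-id process that is not currently suspected.

    Returns None only if every process is suspected, which a process
    should never observe if it does not suspect itself.
    """
    trusted = set(processes) - set(suspected)
    return min(trusted, default=None)
```

Once the underlying detector stops making mistakes, every correct process computes the same set of trusted processes and therefore the same leader.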


Exercise

• Write the algorithm code for this failure detector and prove it is correct


Relationships among Failure Detector Classes

• The properties of <>S are a subset of those of <>P
• <>S is strictly weaker than <>P
• <>S ~ Ω [Chandra, Hadzilacos, Toueg 96]

Food for thought: What is the weakest timing model where <>S and/or Ω are implementable but <>P is not?


Relationships among Failure Detector Classes- Recent Results

Partial answer: at PODC’03, Aguilera et al. present a system S with synchronous processes in which:

– any number of them may crash, and
– only the output links of an unknown correct process are eventually timely (all other links can be asynchronous and/or lossy)

<>P is not implementable in S, but Ω is. This gives a new proof that <>S is strictly weaker than <>P


Note on the Power of Consensus

• Consensus cannot implement <>P, interactive consistency, atomic commit, …

• So its “universality”, in the sense of
  – wait-free objects in shared memory [Herlihy 93]
  – state machine replication [Lamport 78; Schneider 90]
  does not cover sensitivity to failures, timing, etc.


Other Failure Detector Implementations

Food for thought: When is building <>P more costly than <>S or Ω?

Partial answer: Aguilera et al. (PODC’03) observe

– any implementation of <>P (even in a perfectly synchronous system) requires all alive processes to send messages forever, while Ω can be implemented such that eventually only the leader sends messages


Other Failure Detector Implementations

• Message efficient <>S implementation [Larrea, Fernández, Arévalo 00]

• QoS tradeoffs between accuracy and completeness [Chen, Toueg, Aguilera 00]

• Leader Election [Aguilera, Delporte, Fauconnier, Toueg 01]

• Adaptive <>P [Fetzer, Raynal, Tronel 01]

Note (abstract of [Aguilera, Delporte, Fauconnier, Toueg 01]):
We introduce the notion of stable leader election and derive several algorithms for this problem. Roughly speaking, a leader election algorithm is stable if it ensures that once a leader is elected, it remains the leader for as long as it does not crash and its links have been behaving well, irrespective of the behavior of other processes and links. In addition to being stable, our leader election algorithms have several desirable properties. In particular, they are all communication-efficient, i.e., they eventually use only n links to carry messages, and they are robust, i.e., they work in systems where only the links to/from some correct process are required to be eventually timely. Moreover, our best leader election algorithm tolerates message losses, and it ensures that a leader is elected in constant time when the system is stable. We conclude the paper by applying the above ideas to derive a robust and efficient algorithm for the eventually perfect failure detector <>P.
Note on [Chen, Toueg, Aguilera 00]:
QoS: how fast the failure detector detects actual failures and how well it avoids false detections. The authors give metrics and analyze systems with probabilistic behaviors, present an optimal algorithm w.r.t. some of these metrics, and give simulations.
Note (abstract of [Fetzer, Raynal, Tronel 01]):
The detection of process failures is a crucial problem system designers have to cope with in order to build fault-tolerant distributed platforms. Unfortunately, it is impossible to distinguish with certainty a crashed process from a very slow process in a purely asynchronous distributed system. This prevents some problems to be solved in such systems. That is why failure detector oracles have been introduced to circumvent these impossibility results. This paper presents a relatively simple protocol that allows a process to ``monitor'' another process, and consequently to detect its crash. This protocol enjoys the nice property to rely as much as possible on application messages to do this monitoring. Differently from previous process crash detection protocols, it uses control messages only when no application messages is sent by the monitoring process to the observed process. This protocol has noteworthy features. When the underlying system satisfies the partial synchrony assumption, it actually implements an eventually perfect failure detector (i.e., a failure detector of the class usually denoted $\Diamond {\cal P}$). Moreover, if the upper layer application terminates correctly when the failure detector it uses belongs to $\Diamond {\cal P}$, then, when run with the proposed protocol, it also terminates correctly. These properties make the protocol attractive: it is inexpensive, implementable, and powerful. The paper also describes performance measurements of an implementation of the protocol.
Note (partial abstract of [Larrea, Fernández, Arévalo 00]):
[...] by Chandra and Toueg [2] as a mechanism that provides information about process failures. Depending on the properties the failure detectors guarantee, they proposed a taxonomy of failure detectors. It has been shown that one of the classes of this taxonomy, namely Eventually Strong (<>S), is the weakest class allowing to solve the Consensus problem. [...] <>S. Our algorithm guarantees that eventually all the correct processes agree on a common correct process. This property trivially allows us to provide the accuracy and completeness properties required by <>S. We show, then, that our algorithm is better than any other proposed implementation of <>S in terms of the number of messages and the total amount of information periodically sent. In particular, previous algorithms require to periodically exchange at least a quadratic amount of information, while ours only requires O(n log n) (where n is the number of processes).

Part II: Algorithms, and the Failure Detector Abstraction

II.a Failure Detectors and Partial Synchrony

II.b Algorithms


Algorithms that Take 2 Rounds in Well-Behaved Runs

• <>S-based [Schiper 97; Hurfin, Raynal 99; Mostefaoui, Raynal 99]

• Ω-based for t < n/3 [Mostefaoui, Raynal 00]

• Ω-based for t < n/2 [Dutta, Guerraoui 01]

• Paxos (optimized version) [Lamport 89; 96]
  – Leader-based (Ω)
  – Also tolerates omissions, crash recoveries

• COReL - Atomic Broadcast [Keidar, Dolev 96]
  – Group membership based (<>P)


Of This Laundry List, We Present Two Algorithms

1. <>S-based [MR99]

2. Paxos


<>S-based Consensus [MR99]

• val ← input v; est ← null
  for r = 1, 2, … do
      coord ← (r mod n) + 1
      if I am coord, then send (r, val) to all
      wait for ((r, val) from coord OR suspect coord (by <>S))
      if receive val from coord then est ← val else est ← null
      send (r, est) to all
      wait for (r, est) from n-t processes
      if any non-null est received then val ← est
      if all ests have same v then send (“decide”, v) to all; return(v)
  od

• Upon receive (“decide”, v), forward to all, return(v)


In Well-Behaved Runs

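[Figure: message diagram of a well-behaved run with processes 1, 2, …, n. In step 1, (1, v1) is sent to all processes; in step 2, each process sets est = v1 and sends (1, v1) to all; every process then decides v1.]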
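One round of the <>S-based algorithm can be simulated to see why a well-behaved run decides after two communication steps. This is a sketch under stated assumptions: process ids are 1..n, the coordinator rule is taken as (r mod n) + 1, the n-t messages that arrive first are fixed arbitrarily to those from the lowest ids, and the function name `run_round` and the `suspecting` parameter are hypothetical.

```python
from typing import Any, Dict, FrozenSet, Tuple


def run_round(r: int, vals: Dict[int, Any], t: int,
              suspecting: FrozenSet[int] = frozenset()
              ) -> Tuple[Dict[int, Any], Dict[int, Any]]:
    """Simulate round r with no crashes; return (new vals, decisions).

    vals       -- current val of each process, keyed by ids 1..n
    suspecting -- processes that (falsely) suspect the coordinator
    """
    n = len(vals)
    coord = (r % n) + 1
    # Step 1: coord sends (r, val); a suspecting process gives up waiting.
    est = {p: (None if p in suspecting else vals[coord]) for p in vals}
    # Step 2: everyone sends (r, est) and waits for n-t of these messages.
    first_senders = sorted(vals)[: n - t]      # assumed arrival order
    new_vals, decisions = dict(vals), {}
    for p in vals:
        received = [est[q] for q in first_senders]
        non_null = [e for e in received if e is not None]
        if non_null:
            new_vals[p] = non_null[0]          # adopt the coordinator's value
        if non_null and len(non_null) == len(received):
            decisions[p] = non_null[0]         # all ests carry the same value
    return new_vals, decisions
```

With no suspicions, every process decides the coordinator's value within the round. If some process suspects the coordinator, processes that see a null est adopt the value but do not decide, and the algorithm moves on to the next round.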


In Case of Omissions

The algorithm can block in case of transient message omissions, waiting for a specific round message that will not arrive


Paxos [Lamport 88; 96; 01]

• Uses failure detector Ω

• Phase 1: prepare
  – A process that trusts itself tries to become leader
  – Chooses largest unique (using ids) ballot number
  – Learns outcome of all smaller ballots

• Phase 2: accept
  – Leader proposes a value with its ballot number
  – Leader gets majority to accept its proposal
  – A value accepted by a majority can be decided


Paxos - Variables

• Type Rank
  – totally ordered set with minimum element r0

• Variables:
  Rank BallotNum, initially r0
  Rank AcceptNum, initially r0
  Value ∪ {⊥} AcceptVal, initially ⊥


Paxos Phase I: Prepare

• Periodically, until decision is reached do:
      if leader (by Ω) then
          BallotNum ← (unique rank > BallotNum)
          send (“prepare”, rank) to all

• Upon receive (“prepare”, rank) from i
      if rank > BallotNum then
          BallotNum ← rank
          send (“ack”, rank, AcceptNum, AcceptVal) to i


Paxos Phase II: Accept

Upon receive (“ack”, BallotNum, b, val) from n-t
      if all vals = ⊥ then myVal = initial value
      else myVal = received val with highest b
      send (“accept”, BallotNum, myVal) to all /* proposal */

Upon receive (“accept”, b, v) with b ≥ BallotNum
      AcceptNum ← b; AcceptVal ← v /* accept proposal */
      send (“accept”, b, v) to all (first time only)


Paxos – Deciding

Upon receive (“accept”, b, v) from n-t

decide v

periodically send (“decide”, v) to all

Upon receive (“decide”, v)

decide v


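In Well-Behaved Runs

[Figure: message diagram with processes 1, 2, …, n. Process 1 sends (“prepare”, 1) to all and receives (“ack”, 1, r0, ⊥) replies; it then sends (“accept”, 1, v1) to all; every process forwards (“accept”, 1, v1) to all and decides v1.]

Our Ω implementation always trusts process 1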


Optimization

• Allow process 1 (only!) to skip Phase 1
  – use rank r0
  – propose its own initial value

• Takes 2 rounds in well-behaved runs

• Takes 2 rounds for repeated invocations with the same leader


What About Message Loss?

• Does not block in case of a lost message
  – Phase I can start with new rank even if previous attempts never ended

• But constant omissions can violate liveness

• Specify conditional liveness: if n-t correct processes, including the leader, can communicate with each other, then they eventually decide


Synchronous Consensus

• In a run with f failures (f<t)
  – Processes can decide in f+1 rounds
  – And no less! [Lamport, Fischer 82; Dolev, Reischuk, Strong 90] (early-deciding)

• 1 round with no failures

• In this talk: deciding
  – halting takes min(f+2, t+1) rounds [Dolev, Reischuk, Strong 90]


Uniform Consensus

• Uniform agreement: decision of every two processes is the same

Recall: with consensus, only correct processes have to agree (disagreement with the dead is OK)

This version of consensus will be useful to extend the lower bound argument to asynchronous models


Synchronous Uniform Consensus

Every algorithm has a run with f failures (f<t-1) that takes at least f+2 rounds to decide

• [Charron-Bost, Schiper 00; KR 01]
  – as opposed to f+1 for consensus


A Simple Proof of the Uniform Consensus Synchronous Lower Bound

[Keidar, Rajsbaum IPL 02]


Theorem: f+2 Lower Bound

• Assume n > t, and f < t-1

• L^f(X0) - final states of runs with f failures
  – connected
  – in any state in L^f(X0) there exist at least 3 non-failed processes, and 2 can fail

• Take z, z’ ∈ X0 s.t. val(z) ≠ val(z’)
  – let x, x’ be failure-free extensions of z, z’: x = z.(i,[0])^f ∈ L^f(X0)


Exercise

1. Modify the theorem and the proof of this talk for the consensus problem (instead of the uniform consensus problem)


Upper Bounds From Part II

We saw that there are algorithms that take 2 rounds to decide in well-behaved runs

• <>S-based, Ω-based, Paxos, COReL
• We presented two of them.


Why are there no 1-Round Algorithms?

There is a lower bound of 2 rounds in well-behaved executions

– Similar bounds shown in [Dwork, Skeen 83; Lamport 00]

• We will show that the bound follows from a similar bound on Uniform Consensus in the synchronous model


Uniform Consensus

• Uniform agreement: decision of every two processes is the same

Recall: with consensus, only correct processes have to agree


From Consensus to Uniform Consensus

In partial synchrony model, any algorithm A for consensus solves uniform consensus [Guerraoui 95]

Proof: Assume by contradiction that A does not solve uniform consensus
– in some run, p, q decide differently, p fails
– p may be non-faulty, and may wake up after q decides


Synchronous Uniform Consensus

Every algorithm has a well-behaved run that takes 2 rounds to decide

• More generally, it has a run with f failures (f<t-1), that takes at least f+2 rounds to decide [Charron-Bost, Schiper 00; KR 01]

– as opposed to f+1 for consensus


Bibliography

• Keidar and Rajsbaum, “A Simple Proof of the Uniform Consensus Synchronous Lower Bound,” in IPL, Vol. 85, pp. 47-52, 2003.

• Keidar and Rajsbaum, “On the Cost of Fault-Tolerant Consensus When There Are No Faults” in Keidar’s page, including slides and papers.

• Moses, Rajsbaum, “A Layered Analysis of Consensus,” SIAM J. Comput. 31(4): 989-1021, 2002.

• Mostéfaoui, Rajsbaum, Raynal: Conditions on input vectors for consensus solvability in asynchronous distributed systems. J. ACM, 2003
