© Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are...

69
dit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault- Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002 Tutorial
  • date post

    19-Dec-2015
  • Category

    Documents

  • view

    215
  • download

    0

Transcript of © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are...

Page 1: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

On the Cost of Fault-Tolerant Consensus

When There are no Faults

Idit Keidar and Sergio RajsbaumPODC 2002 Tutorial

Page 2: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

About This Tutorial

• Preliminary version in SIGACT News and MIT Tech Report, June 2001

• More polished lower bound proof to appear in IPL• New version of the tutorial in preparation• The talk includes only a subset of references, sorry

• We include some food for thought

Any suggestions are welcome!

Page 3: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Consensus

Each process has an input, should decide an output s.t.

Agreement: correct processes’ decisions are the same

Validity: decision is input of one process

Termination: eventually all correct processes decide

There are at least two possible input values 0 and 1

Page 4: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Basic Model

• Message passing

• Channels between every pair of processes

• Crash failures– t<n potential failures out of n>1 processes

• No message loss among correct processes

Page 5: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

How Long Does It Take to Solve Consensus?

Depends on the timing model:• Message delays• Processing times• Clocks

• And on the metric used:• Worst case• Average• etc

Page 6: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

The Rest of This Tutorial

• Part I: Realistic timing model and metric

• Part II: Upper bounds

• Part III: Lower bounds

• Part IV: New directions and extensions

Page 7: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Part I: Realistic Timing Model

Page 8: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Asynchronous Model

• Unbounded message delay, processor speed

Consensus impossible even for t=1 [FLP85]

Page 9: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Round

Synchronous Model

• Algorithm runs in synchronous rounds:

– send messages to any set of processes, – receive messages from previous round, – do local processing (possibly decide, halt)

• If process i crashes in a round, then any subset of the messages i sends in this round can be lost

Page 10: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Synchronous Consensus

1 round with no failures

• Consider a run with f failures (f<t)– Processes can decide in f+1 rounds [Lamport Fischer 82; Dolev, Reischuk, Strong 90] (early-deciding)

• In this talk deciding– halting takes min(f+2,t+1) [Dolev, Reischuk, Strong 90]

Page 11: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

The Middle Ground

Many real networks are neither synchronous nor asynchronous

• During long stable periods, delays and processing times are bounded– Like synchronous model

• Some unstable periods – Like asynchronous model

Page 12: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Partial Synchrony Model [Dwork, Lynch, Stockmeyer 88]

• Processes have clocks with bounded drift

• upper bound on message delay

• , upper bound on processing time

• GST, global stabilization time– Until GST, unstable: bounds do not hold– After GST, stable: bounds hold– GST unknown

Page 13: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Partial Synchrony in Practice

• For , , choose bounds that hold with high probability

• Stability forever?– We assume that once stable remains stable

– In practice, has to last “long enough” for given algorithm to terminate

– A commonly used model that alternates between stable and unstable times:

Timed Asynchronous Model [Cristian, Fetzer 98]

Page 14: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Consensus with Partial Synchrony

Unbounded running time

by [FLP85], because model can be asynchronous for unbounded time

• Solvable iff t < n/2 [DLS88]

Page 15: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

In a Practical System

Can we say more than:

consensus will be solved eventually ?

Page 16: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Performance Metric

Number of rounds in well-behaved runs

• Well-behaved: – No failures– Stable from the beginning

• Motivation: common case

Page 17: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

The Rest of This Tutorial

• Part II: best known algorithms decide in 2 rounds in well-behaved runs– 2 time (with delay bound , 0 processing time)

• Part III: this is the best possible

• Part IV: new directions and extensions

Page 18: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Part II: Algorithms, and the Failure Detector Abstraction

II.a Failure Detectors and Partial Synchrony

II.b Algorithms

Page 19: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Time-Free Algorithms

• We describe the algorithms using failure detector abstraction [Chandra, Toueg 96]

• Goal: abstract away time, get simpler algorithms

Page 20: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Unreliable Failure Detectors [Chandra, Toueg 96]

• Each process has local failure detector oracle– Typically outputs list of processes suspected to

have crashed at any given time

• Unreliable: failure detector output can be arbitrary for unbounded (finite) prefix of run

Page 21: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Performance of Failure Detector Based Consensus Algorithms

• Implement a failure detector in the partial synchrony model

• Design an algorithm for the failure detector

• Analyze the performance in well-behaved runs of the combined algorithm

Page 22: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

A Natural Failure Detector Implementation

in Partial Synchrony Model

• Implement failure detector using timeouts:– When expecting a message from a process i,

wait clock skew before suspecting i

• In well-behaved runs, always hold, hence no false suspicions

Page 23: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

The resulting failure detector is <>P - Eventually Perfect

• Strong Completeness: From some point on, every faulty process is suspected by every correct process

• Eventual Strong Accuracy: From some point on, every correct process is not suspected*

*holds in all runs

Page 24: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Weakest Failure Detectors for Consensus

• <>S - Eventually Strong– Strong Completeness– Eventual Weak Accuracy: From some point on,

some correct process is not suspected

• <>- Leader – Outputs one trusted process– From some point, all correct processes trust the

same correct process

Page 25: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Relationships among Failure Detector Classes

• <>S is a subset of <>P• <>S is strictly weaker than <>P• <>S ~ <> [Chandra, Hadzilacos, Toueg 96]

Food for thought: What is the weakest timing model where <>S

and/or <> are implementable but <>P is not?

Page 26: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Note on the Power of Consensus

• Consensus cannot implement <>P, interactive consistency, atomic commit, …

• So its “universality”, in the sense of – wait-free objects in shared memory [Herlihy 93]

– state machine replication [Lamport 78; Schneider 90]

does not cover sensitivity to failures, timing, etc.

Page 27: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

A Natural <> Implementation

• Use <>P implementation

• Output lowest id non-suspected process

In well-behaved runs: process 1 always trusted

Page 28: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Other Failure Detector Implementations

• Message efficient <>S implementation [Larrea, Fernández, Arévalo 00]

• QoS tradeoffs between accuracy and completeness [Chen, Toueg, Aguilera 00]

• Leader Election [Aguilera, Delporte, Fauconnier, Toueg 01]

• Adaptive <>P [Fetzer, Raynal, Tronel 01]

Food for thought:When is building <>P more costly than <>S or <> ?

Page 29: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Part II: Algorithms, and the Failure Detector Abstraction

II.a Failure Detectors and Partial Synchrony

II.b Algorithms

Page 30: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Algorithms that Take 2 Rounds in Well-Behaved Runs

• <>S-based [Schiper 97; Hurfin, Raynal 99; Mostefaoui,

Raynal 99]• <>-based for t < n/3 [Mostefaoui, Raynal 00]

• <>-based for t < n/2 [Dutta, Guerraoui 01]

• Paxos (optimized version) [Lamport 89; 96]

– Leader-based (<>)

– Also tolerates omissions, crash recoveries

• COReL - Atomic Broadcast [Keidar, Dolev 96] – Group membership based (<>P)

Page 31: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Of This Laundry List, We Present Two Algorithms

1 <>S-based [MR99]

2 Paxos

Page 32: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

<>S-based Consensus [MR99]• val input v; est null for r =1, 2, … do

coord (r mod n)+1 if I am coord, then send (r,val) to all wait for ( (r, val) from coord OR suspect coord )

if receive val from coord then est val

send (r, est) to all wait for (r,est) from n-t

if any non-null est received then val estif all ests have same v then send (“decide”, v) to all; return(v)

od• Upon receive (“decide”, v), forward to all, return(v)

1

2

Page 33: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

In Well-Behaved Runs

1 1

2

n

.

.

.

(1, v1)

1

2

n

.

.

.

(1, v1)est = v1

decide v1

Page 34: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

In Case of Omissions

The algorithm can block in case of transient message omissions, waiting for a specific round message that will not arrive

Page 35: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Paxos [Lamport 88; 96; 01]

• Uses <> failure detector

• Phase 1: prepare– A process who trusts itself tries to become leader– Chooses largest unique (using ids) ballot number– Learns outcome of all smaller ballots

• Phase 2: accept– Leader gets majority to accept a value associated

with his ballot number– A value accepted by a majority can be decided

Page 36: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Paxos - Variables

• Type Rank– totally ordered set with minimum element r0

• Variables:Rank BallotNum, initially r0

Rank AcceptNum, initially r0

Value {} AcceptVal, initially

Page 37: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Paxos Phase I: Prepare

• Periodically, until decision is reached do:

if leader (by <>) thenBallotNum (unique rank > BallotNum)

send (“prepare”, rank) to all

• Upon receive (“prepare”, rank) from iif rank > BallotNum then

BallotNum rank

send (“ack”, rank, AcceptNum, AcceptVal) to i

Page 38: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Paxos Phase II: Accept

Upon receive (“ack”, BallotNum, b, val) from n-tif all vals = then myVal = initial valueelse myVal = received val with highest b send (“accept”, BallotNum, myVal) to all

Upon receive (“accept”, b, v) with b BallotNum

AcceptNum b; AcceptVal vsend (“accept”, b, v) to all (first time only)

Page 39: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Paxos – Deciding

Upon receive (“accept”, b, v) from n-t

decide v

periodically send (“decide”, v) to all

Upon receive (“decide”, v)

decide v

Page 40: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

In Well-Behaved Runs1 1

2

n

.

.

.

(“accept”,1 ,v1)

1

2

n

.

.

.

1 1

2

n

.

.

.

(“prepare”,1)

(“ack”,1,r0,)

decide v1

(“accept”,1 ,v1)

Our <> implementation always trusts process 1

Page 41: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Optimization

• Allow process 1 (only!) to skip Phase 1– use rank r0

– propose its own initial value

• Takes 2 rounds in well-behaved runs

• Takes 2 rounds for repeated invocations with the same leader

Page 42: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

What About Omissions?

• Does not block in case of a lost message– Phase I can start with new rank even if previous

attempts never ended

• But constant omissions can violate liveness• Specify conditional liveness:

If n-t correct processes including the leader can communicate with each other

then they eventually decide

Page 43: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Part III: Lower Bounds in Partial Synchrony Model

Page 44: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Upper Bounds From Part II

We saw that there are algorithms that take 2 rounds to decide in well-behaved runs

• <>S-based, <>-based, Paxos, COReL• Presented two of them.

Page 45: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Why are there no 1-Round Algorithms?

There is a lower bound of 2 rounds in well-behaved executions

– Similar bounds shown in [Dwork, Skeen 83; Lamport 00]

• We will show that the bound follows from a similar bound on Uniform Consensus in the synchronous model

Page 46: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Uniform Consensus

• Uniform agreement: decision of every two processes is the same

Recall: with consensus, only correct processes have to agree

Page 47: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

From Consensus to Uniform Consensus

In partial synchrony model, any algorithm A for consensus solves uniform consensus [Guerraoui 95]

Proof: Assume by contradiction that A does not solve uniform consensus– in some run, p,q decide differently, p fails– p may be non-faulty, and may wake up after q

decides

Page 48: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Synchronous Uniform Consensus

Every algorithm has a well-behaved run that takes 2 rounds to decide

• More generally, it has a run with f failures (f<t-1), that takes at least f+2 rounds to decide [Charron-Bost, Schiper 00; KR 01]

– as opposed to f+1 for consensus

Page 49: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

A Simple Proof of the Uniform Consensus Synchronous Lower Bound

[Keidar, Rajsbaum 01]To Appear in IPL

Page 50: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

States

• State = list of processes’ local states

• Given a fixed deterministic algorithm, state at the end of run determined by initial values and environment actions– failures, message loss– can be denoted as:

x . E1. E2. E3

x state, Ei environment actions

Page 51: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Connectivity

States x, x’ are similar, x~x’, if they look the same to all but at most one process

• Set of initial states of consensus is connected

• Intuition: in connected states there cannot be different decisions

000 001 111011~ ~ ~

Page 52: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Coloring

• Impossibility proofs color non-decided states

• Classical coloring: valency, potential decisions state can lead to e.g. [FLP85]

• Our coloring:

val(x) = decision of correct processes in failure-free extension of x (0 or 1)

Page 53: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

To Prove Lower Bounds

• Sufficient to look at subset of runs, called a system

• Simplifies proof

• A set of environment actions defines a system

Page 54: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Considered Environment Actions

• (i, [k]) - i fails, – messages to processes {1,…,k} lost (if sent)– [0] empty set - no loss– applicable if i non-failed and < t failures

• (0, [0]) - no failures – always applicable

Notice: at most one process fails in one round– its messages lost by prefix of processes

Page 55: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Layering

• Layering L = set of environment actions– L(X) = {x.E | x X, E L applicable to x}

– L0(X) = X

– Lk(X) = L(Lk-1(X))

• Define system using layers – X0 set of initial states

– System: all runs obtained from L( . )

[Moses, Rajsbaum 98; Gafni 98; Herlihy, Rajsbaum,Tuttle 98]

X0

L(X0)

L2(X0)

Page 56: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Proof Strategy

• Uniform Lemma: from connected set, under some conditions, 2 more rounds needed for uniform consensus (recall: 1 for consensus)

• The initial states are connected.

In general: for f<t+1, Lf(X0) connected (will not be shown)

– feature of model, not of the problem– also implies consensus f+1 lower bound– can be proven for all Li(X0) in other models, e.g.,

mobile failure model [MosesR98], [Santoro,Widemayer89], and asynchronous model

Page 57: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Uniform Lemma

• If– X connected x,x’X, s.t. val(x)= 0, val(x’)=1– In all states in X exist at least 3 non-failed

processes and 2 can fail

• Then yX s.t. in y.(0,[0]) not all decide

1-round failure-free extension of y

Page 58: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Uniform Lemma: Proof

• Assume, by contradiction, in failure-free extensions of y, y’, all decide after 1 round

• 2 cases: j either failed or non-failed

y’yx x’......

• X connected, val(x)= 0, val(x’)=1

differ only in state of some j

Page 59: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Illustrating the Contradiction Case 1: j is correct

y y’

y.(0,[0]) y’.(0,[0])

X

y y’

Xy.(1,[2]) y’.(1,[2])

X X X X

y.(1,[2]).(3,[3]) y.(1,[2]).(3,[3])

A contradiction to uniform agreement!

val(y)=0, so y leads to decision 0

in one failure-free round

look the same to process 2

look the same to process 3

Page 60: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

The uniform consensus synchronous lower bound

• n >2, t >1, f =0

• X0 = {initial failure-free states} connected

x’,xX0 s.t. val(x)=0, val(x’)=1 (validity)

• By Uniform Lemma, from some initial state need 2 rounds to decide

Page 61: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Part IV: Extensions and New Directions

Page 62: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Less Than Well-Behaved Runs

We now discuss stable runs with failures– only initial failures, and after that run is “well-

behaved”– the run is synchronous but there are f failures

Food for thought:

What if the run is unstable and then becomes stable?

Page 63: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Stable Runs with Initial Failures Only

• Algorithms we showed here:– [MR99] can take t+2 rounds– Paxos takes 4 rounds after leader is known

• Other results: – If n > 3t, exists <>-based algorithm that

terminates in 2 rounds [Mostefaoui, Raynal 00]

– If n < 3t+1, tight lower bound of 3 rounds [Dutta, Guerraoui, Keidar WIP; Lamport WIP]

Page 64: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Stable Runs with f Failures

• When f<t-1, we showed lower bound of f+2

• Algorithms we showed here can take 2f+2 in runs with f failures

• [Dutta, Guerraoui ]– extend lower bound to cover also f=t-1 and f=t– show algorithm that takes t+2

Food for thought:

Algorithm that takes f+2 rounds for every f <=t ?

Page 65: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Byzantine Failures

• Byzantine variants of Paxos [Castro Liskov 99; Chockler, Malkhi, Reiter 01]

• Need n>3t, even with authentication [DLS88]

– as opposed to synchronous case

• New result: same upper and lower bounds as in crash failure model (but with more processes) [Lamport WIP]

Page 66: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

Weakness of the “Rounds”Performance Metric

• Over the Internet, time to complete a round can depend heavily on number of messages sent and other factors [Bakr, Keidar ]

– Reason: depends on delay distribution

Food for thought:

Better performance metrics?

Page 67: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

More Food for Thought

• Shared memory models– Good definitions of “well-behaved executions”?

Including factors such as contention E.g., [Alur, Attiya, Taubenfeld 97]

– Generic reductions from message passing bounds to bounds in different shared memory models

• Infinitely many processes – Paxos in shared memory with infinitely many

processes [Chockler, Malkhi ]

Page 68: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

More Food for Thought Yet

• Other problems– Atomic Commit – 3 rounds? – mutual exclusion, etc.

• Proof techniques – Layered analysis for partial synchrony models– Improve timing bounds for such models

• E.g., consensus [Attiya, Dwork, Lynch, Stockmeyer 94]

• Wait-free k-set agreement [Herlihy, Rajsbaum, Tuttle 98]

Page 69: © Idit Keidar and Sergio Rajsbaum; PODC 2002 On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar and Sergio Rajsbaum PODC 2002.

© Idit Keidar and Sergio Rajsbaum; PODC 2002

On an Optimistic Note

• Consensus requires 2 rounds in partial synchrony model because of false suspicions

• Optimistic approach: use 1-round algorithm– correct while there are no false suspicions– upon false suspicions, reconcile or rollback – E.g., Group Communication Horus, Amoeba, ...

Atomic Commit [Jimenez-Peris, Patino-Martinez, Alonso, Arevalo 01]