Post on 30-Sep-2015
Lecture 12 Page 1 CS 188, Winter 2015
Agreement in Distributed Systems
CS 188 Distributed Systems
February 19, 2015
Introduction
We frequently want to get a set of nodes in a distributed system to agree
Commitment protocols and mutual exclusion are particular cases
The approaches we discussed for those work in limited situations
In general, when can we reach agreement in a distributed system?
Basics of Agreement Protocols
What is agreement?
What are the necessary conditions for agreement?
What Do We Mean By Agreement?
In the simplest case, can n processors agree that a variable takes on value 0 or 1?
Only non-faulty processors need agree
More complex agreements can be built from this simple agreement
Conditions for Agreement Protocols
Consistency: All participants agree on the same value, and decisions are final
Validity: Participants agree on a value at least one of them wanted
Termination/Progress: All participants choose a value in a finite number of steps
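The three conditions can be checked against the outcome of a single finished run. The sketch below is only an illustration: the function name and the dictionary-based inputs are assumptions, and termination is approximated as "every non-faulty node has a decision by the time we look".

```python
def check_agreement(proposals, decisions, nonfaulty):
    """Check one finished run against the three conditions.

    proposals: dict mapping node -> value that node wanted
    decisions: dict mapping node -> value that node decided
    nonfaulty: set of nodes that did not fail (only these must agree)
    """
    decided = {decisions[n] for n in nonfaulty if n in decisions}
    return {
        # Consistency: non-faulty nodes never decide different values.
        "consistency": len(decided) <= 1,
        # Validity: every decided value was wanted by at least one node.
        "validity": decided <= set(proposals.values()),
        # Termination (for this run): every non-faulty node has decided.
        "termination": all(n in decisions for n in nonfaulty),
    }


good = check_agreement({1: 0, 2: 1, 3: 1}, {1: 1, 2: 1, 3: 1}, {1, 2, 3})
split = check_agreement({1: 0, 2: 1, 3: 1}, {1: 0, 2: 1, 3: 1}, {1, 2, 3})
```

Here `good` satisfies all three conditions, while `split` violates consistency because node 1 decided differently from nodes 2 and 3.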
Challenges to Agreement
Delays: In message delivery, and in nodes responding to messages
Failures: And recovery from failures
Lies by participants: Or innocent errors that have similar effects
Failures and Agreement
Failures make agreement difficult
Failed nodes don't participate
Failed nodes sometimes recover at inconvenient times
At worst, failed nodes participate in harmful ways
Real failures are worse than fail-stop
Types of Failures
Fail-stop: A nice, clean failure; the processor stops executing anything
Realistic failures: Partitionings, arbitrary delays
Adversarial failures: Arbitrary bad things happen
Election Algorithms
If you get everyone to agree that a particular node is in charge, future consensus is easy, since he makes the decisions
How do you determine who's in charge?
Statically
Dynamically
Static Leader Selection Methods
Predefine one process/node as the leader
Simple
Everyone always knows who's the leader
Not very resilient
If the leader fails, then what?
Dynamic Leader Selection Methods
Choose a new leader dynamically whenever necessary
More complicated
But failure of a leader is easy to handle
Just elect a new one
Election doesn't imply voting
Not necessarily majority-based
Election Algorithms vs. Mutual Exclusion Algorithms
Most mutual exclusion algorithms don't care much about failures
Election algorithms are designed to handle failures
Also, mutual exclusion algorithms only need a winner
Election algorithms need everyone to know who won
A Typical Use of Election Algorithms
A group of processes wants to periodically take a distributed snapshot
They don't want multiple simultaneous snapshots
So they want one leader to order them to take the snapshot
Problems in Election Algorithms
Some of the nodes may have failed before the algorithm starts
Some of the nodes may fail during the algorithm
Some nodes may recover from failure, possibly at inconvenient times
What about partitions?
Election Algorithms and the Real Work
The election algorithm is usually overhead
There's a real computation you want to perform
The election algorithm chooses someone to lead it
Having two leaders while real computation is going on is bad
The Bully Algorithm
The biggest kid on the block gets to be the leader
But what if the biggest kid on the block is taking his piano lesson?
The next biggest kid gets to be leader
Until the piano lesson is over . . .
Electing a Bully
The kids come out to play
Hey, Spike!
Spike's Mom hasn't let him out yet
Hey, Butch!
I'm here, who else is? Peewee! Cuthbert!
I'm the leader, let's play tag!
The piano lesson ends
Cuthbert! Peewee! Butch! I'm the leader, and we're playing baseball!
Hey, Spike! Hey, Spike!
I'm here, where are you sissies?
Assumptions of the Bully Algorithm
A static set of possible participants, with an agreed-upon order
All messages are delivered within Tm seconds
All responses are sent within Tp seconds of delivery
These last two imply synchronous behavior
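Taken together, the two timing bounds give a hard deadline for any reply: one delivery for the request, the processing time, and one delivery for the answer. A minimal sketch of that arithmetic, with hypothetical function names:

```python
def reply_deadline(tm, tp):
    """Longest time a live node can take to answer a request.

    tm: upper bound on message delivery time (Tm)
    tp: upper bound on the time to send a response after delivery (Tp)
    """
    # Request delivery (Tm) + response preparation (Tp) + reply delivery (Tm).
    return 2 * tm + tp


def presumed_down(elapsed, tm, tp):
    """Under the synchronous assumptions, silence past the deadline means failure."""
    return elapsed > reply_deadline(tm, tp)
```

With Tm = 2 and Tp = 1, a node that has not answered after 5 seconds can be treated as down, which is exactly the conclusion an asynchronous system cannot draw.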
The Basic Idea Behind the Bully Algorithm
Possible leaders try to take over
If they detect a better leader, they agree to its leadership
Keep track of state information about whether you are electing a leader
Only do real work when you agree on a leader
The Bully Algorithm and Timeouts
Call out the biggest kid's name
If he doesn't answer soon enough, call out the next biggest kid's name
Until you hear an answer
Or the caller is the biggest kid
Then take over, by telling everyone else you're the leader
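The calling-out procedure can be sketched as a loop over the agreed-upon ordering. This is a simplified illustration, not a full bully implementation: `answers_in_time` is a hypothetical stand-in for "send a message and wait for a reply until the timeout", and the announcement to the other nodes is left out.

```python
def bully_election(caller, ranked_nodes, answers_in_time):
    """Return the node that ends up as leader from the caller's point of view.

    caller: the node running the election
    ranked_nodes: all possible participants, biggest first (agreed-upon order)
    answers_in_time: function(node) -> bool, True if the node replies
                     before the timeout (assumed to be reliable here)
    """
    for node in ranked_nodes:
        if node == caller:
            # Nobody bigger answered, so the caller takes over.
            return caller
        if answers_in_time(node):
            # A bigger node is up; agree to its leadership.
            return node
    raise ValueError("caller is not in the list of participants")


kids = ["Spike", "Butch", "Peewee", "Cuthbert"]
outside = {"Butch", "Peewee", "Cuthbert"}  # Spike is still at his piano lesson
leader = bully_election("Cuthbert", kids, lambda kid: kid in outside)
```

In this run `leader` is "Butch". Once Spike comes back out, the same call returns "Spike".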
The Bully Algorithm At Work
One node is currently the coordinator
It expects a certain set of nodes to be up and participating
The coordinator asks all other nodes
If an expected node doesn't answer, start an election
Also if it answers in the negative
If an unexpected node answers, start an election
The Practicality of the Bully Algorithm
The bully algorithm works reasonably well if the timeouts are effective
A timeout occurring really means the site in question is down
And there are no partitions at all
If there are, what happens?
The Invitation Algorithm
More practical than the bully algorithm
Doesn't depend on timeouts
But its results are not as definitive
An asynchronous algorithm
The Basic Idea Behind the Invitation Algorithm
A current coordinator tries to get all other nodes to agree to his leadership
If more than one coordinator is around, get together and merge groups
Use timeouts only to allow progress, not to make definitive decisions
No set priorities for who will be coordinator
The Invitation Algorithm and Group Numbers
The invitation algorithm recruits a group of nodes to work together
More than one group can exist simultaneously
Group numbers identify the group
Why not identify with coordinator ID?
Because one node can serially coordinate many groups
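One way to keep group numbers distinct even when the same node coordinates several groups in a row is to pair the coordinator's ID with a counter kept by that coordinator. The slides do not say how group numbers are constructed, so the scheme and the names `GroupNumber` and `Coordinator` below are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupNumber:
    coordinator: int  # ID of the node coordinating this group
    sequence: int     # how many groups that node had formed up to this one


class Coordinator:
    def __init__(self, node_id):
        self.node_id = node_id
        self._sequence = 0

    def new_group(self):
        """Form a new group; the number differs from all earlier ones."""
        self._sequence += 1
        return GroupNumber(self.node_id, self._sequence)


node1 = Coordinator(1)
first = node1.new_group()
second = node1.new_group()  # same coordinator, different group
```

Both groups have coordinator 1, yet `first` and `second` compare unequal, so a message tagged with the old group can be told apart from one tagged with the new group.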
The Basic Operation of the Invitation Algorithm
Coordinators in a normal state periodically check all other nodes
If any other node is a coordinator, try to merge the groups
If timeouts occur, don't worry about it
Also don't worry if a response to a check comes from this or an earlier request
Merging in the Invitation Algorithm
Merging always requires forming a new group
May have the same coordinator, but a different group number
The coordinator who initiates the merge asks all other known coordinators to merge
They ask their group members
Original group members are also asked
A Simplified Example
[Figure: four nodes, 1 through 4, each labeled with its group; after the merge all four are labeled with node 1's group]
Node 1 checks for other coordinators
Messages: AreYouCoordinator? (sent twice), answered Yes and No
So node 1 finds another coordinator
Node 1 asks the other coordinator and his old node to join his group
Messages: Invite, Invite, Invite on behalf of node 1; answered Accept, Accept
UP = {1,2,3,4}
Messages: Ready, Ready
If all members of UP{} respond, we're fine
Node 1 forms a new group
The Reorganization State
Nodes enter the reorganization state after getting their answer
What's the point of this state?
Why not just start up the group?
After all, we all know who's going to be a member
Or do we?
Why We Need Another Round of Messages
[Figure: nodes 1, 2, 3, and 4, with Invitation messages passing between them]
Who does 1 think will join the group, at this point? 2 and 3
Assuming no timeouts, 4 will also join
And 2 needs to know that
And what if someone crashes?
Presumably not accepting the invitation?
Timeouts in the Merge
Don't worry too much about them
Some nodes respond before the timeout
Some don't
If you don't catch them this time, you might the next
Straggler Messages
This algorithm is asynchronous
So messages may come in late
What do we do when messages arrive late?
Mostly, reject them
How do we tell?
Messages contain group number
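A minimal sketch of the straggler check follows. The `Message` type and its fields are hypothetical, and the sketch covers only the common case of rejecting anything tagged with a group other than the node's current one; the slides say "mostly", so a real implementation presumably has exceptions that are not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    group: int  # group number the sender belonged to when it sent this
    kind: str   # e.g. "Ready" or "Accept"


def is_straggler(current_group, message):
    """A message tagged with a different group is treated as late and rejected."""
    return message.group != current_group


current_group = 7
fresh = Message(group=7, kind="Ready")
late = Message(group=6, kind="Accept")  # sent before the last merge
```

Here `is_straggler(current_group, late)` is true and the message is dropped, while `fresh` is processed normally.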
Multiple Simultaneous Groups
The invitation algorithm allows multiple simultaneous groups to exist
Each with a proper coordinator
Is this a good thing?
No, but what are the alternatives?
No node ever belongs to more than one group, at least
Paxos
A family of algorithms that allow a distributed system to reach agreement
In the face of delays and failures
Can't perfectly guarantee progress
But makes progress in realistic conditions
Does guarantee consistency
Usually defined to reach consensus on some value v
Paxos Assumptions
Processors are of variable speed and may fail
Might recover after failure
But they don't lie
Any processor can send a message to any other processor
Messages can be lost, arbitrarily delayed, reordered, or duplicated
But never corrupted
Paxos Processor Roles
Client: Issues a request, waits for a response
Acceptor/voter: Remembers things for the protocol
Proposer (simpler if there's only one): Assists client in getting a response
Learner: Actually executes a request
Leader: One of the proposers that leads the process
One processor can play several roles
Usually, all processes are acceptors, proposers, and learners
Paxos Quorums
Collections of acceptors that make decisions
Several different quorums in system
Messages are sent to quorums, not single acceptors
Messages are only effective if all quorum members receive them
Similarly, all acceptors in a quorum must send a message for it to be effective
If any member of the quorum survives, its decisions survive
Quorum Membership
All quorums must contain a majority of all acceptors in the system
Any two quorums must share at least one acceptor
E.g., if there are four acceptors {1,2,3,4}, quorums might be: {1,2,3}, {1,2,4}, {2,3,4}, {1,3,4}
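The two membership rules are linked: any two majorities of the same set must overlap. The sketch below enumerates the smallest majority quorums for a set of acceptors and checks the overlap property directly. The function names are made up for this example.

```python
from itertools import combinations


def majority_quorums(acceptors):
    """All smallest-possible quorums: every subset that is a bare majority."""
    size = len(acceptors) // 2 + 1
    return [set(q) for q in combinations(sorted(acceptors), size)]


def all_pairs_intersect(quorums):
    """True if every two quorums share at least one acceptor."""
    return all(a & b for a, b in combinations(quorums, 2))


quorums = majority_quorums({1, 2, 3, 4})
overlap_ok = all_pairs_intersect(quorums)
```

For the four acceptors in the example this yields the same four three-member quorums listed above, and `overlap_ok` is true. Two half-sized sets such as {1,2} and {3,4} would fail the check, which is why a bare half is not enough.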
Paxos Rounds
Paxos proceeds in rounds
In response to a client request
If the round reaches agreement, the client gets a response
If not, you start another round
Continue till a round reaches agreement
A Simple Paxos Round
[Figure: message flow among client C, proposer P, acceptors A1, A2, A3, and learners L1, L2]
1. request
2. prepare(N)
3. promise(N,null), one from each of the three acceptors
4. accept(N,Vres)
5. accepted(N,Vmax)
6. response
N is a bigger number than P has ever used or seen before
Vres is a result chosen by P, if no promise had a value
If an acceptor ever promised on this item before, it returns the generation and value from that run of Paxos, not null
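The prepare/promise and accept/accepted steps of one round can be sketched in a few lines. This is a simplified, in-memory illustration under strong assumptions: a single proposer, direct method calls instead of messages, no losses or crashes, and no client or learner roles. The class and function names are hypothetical.

```python
class Acceptor:
    def __init__(self):
        self.promised_n = -1   # highest N this acceptor has promised
        self.accepted = None   # (N, value) last accepted, or None

    def prepare(self, n):
        """Step 2 -> 3: promise if N is new, reporting any earlier value."""
        if n > self.promised_n:
            self.promised_n = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        """Step 4 -> 5: accept unless a higher N has been promised since."""
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted = (n, value)
            return True
        return False


def paxos_round(acceptors, n, v_res):
    """Run one round; return the agreed value, or None if the round fails."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    promises = [earlier for ok, earlier in replies if ok]
    if len(promises) < majority:
        return None  # not enough promises; the caller starts another round
    earlier_values = [p for p in promises if p is not None]
    if earlier_values:
        # Some promise carried a value: reuse the one with the highest N.
        value = max(earlier_values, key=lambda p: p[0])[1]
    else:
        value = v_res  # no promise had a value, so P's own result is used
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = paxos_round(acceptors, 1, "x")
second = paxos_round(acceptors, 2, "y")
stale = paxos_round(acceptors, 1, "z")
```

The first round settles on "x". The second round uses a larger N but still returns "x", because the promises report the earlier value. The third reuses an old N and fails, so the proposer would have to start another round with a bigger number.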
The Point of Different Paxos Roles
[Figure: client C, proposer P, acceptors A1, A2, A3, and learners L1, L2]
The client wants to get something done
The proposer coordinates protocol activities
The acceptors ensure proper concurrent behavior and handle proposer failures
The learners ensure redundant memory of the result of a decision
Remember! One machine can play multiple roles
Paxos Error Handling
Some cases simple, some complex
A simple case: One of the acceptors fails
If there's still a quorum, no problem
Go ahead without him
Another simple case: One of the learners fails
If any learners are left, they'll provide the right response to the client
More Complex Error Cases
Things like failure of proposer in middle of a round
Paxos chooses a new leader and uses him from this point
What if the old leader comes back?
Even more complex, but it works out
Paxos and Overheads
Generally quite expensive
In messages and thus delays
Many optimizations possible
Some don't alter the protocol characteristics
Some trade off handling some error conditions for better performance
Byzantine Agreement
Life can be a lot worse than merely being unable to rely on timeouts
What if one of the nodes we're working with is lying?
How can we reach agreement if we can't trust all the participants?
The Purpose of Byzantine Agreement
Well, why would one of our distributed system components lie?
It probably wouldn't
But it might contain a bug
If it contains the worst possible bug, what can it do?
Essentially, inadvertently lie
The Realism of Byzantine Agreement
It isn't realistic
It doesn't really happen
No one really uses it
But it demonstrates a limit on how badly things can go while still allowing agreement
Why Is It Called Byzantine?
After the fall of Rome itself, the empire lived on in the east
Called Byzantium
Byzantium survived for around 1000 years
The Byzantines were famous for their treachery and double-dealing
The Byzantine General Problem
Several Byzantine generals each command their own army
They are far apart and communicate with messengers
The emperor wants to attack the Turks
If all generals attack, they'll win
Even if a majority attack, they'll win
Retreating is OK, if everyone does it
But the Turks may have bribed some generals
The Complete Problem Statement
Messages are point-to-point
Messages are reliably delivered, with a predictable timeout
Failure to receive a message in time means the sender is a traitor
Traitors can send any messages they please
But cannot forge their identities
How Many Traitors Is Too Many?
Can all the loyal generals reach agreement on whether to attack or retreat?
Or can the traitors prevent them from reaching any agreement?
How many generals must the Turks bribe before no agreement is possible?
The Answer
If the Turks bribe 1/3 of the generals, the remaining 2/3 cannot reach agreement
How can that be?
Why not just a majority?
Easiest to consider in the case of a commander
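Stated as arithmetic, the classic result for unsigned messages is that agreement needs strictly more than three generals for every traitor. A small sketch of that bound, with a hypothetical function name:

```python
def agreement_possible(generals, traitors):
    """Unsigned-message bound: need more than 3 generals per traitor."""
    return generals > 3 * traitors


def max_tolerable_traitors(generals):
    """Largest number of traitors that still leaves agreement possible."""
    return (generals - 1) // 3
```

Three generals with one traitor fails the test, four generals with one traitor passes, and seven generals are needed before two traitors can be tolerated.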
The 3-General Byzantine Problem
[Figure: a commander sends orders to two other generals]
What if they're all loyal?
The commander sends Attack to both
Everyone attacks and the Turk is vanquished
But what if the commander is a traitor?
The commander sends Attack to one and Retreat to the other
One general attacks, one retreats, the traitor pockets the bribe, and the Turks win
Can't the Loyal Generals Check Their Orders?
[Figure: commander 1 sends Attack to one of generals 2 and 3, and Retreat to the other]
Generals 2 and 3 check their orders
One reports Retreat, the other reports Attack
They figure out 1 is a traitor and come to their own agreement
But What if the Commander Wasn't the Traitor?
[Figure: commander 1 sends Attack to both generals 2 and 3]
3 is the traitor, this time
Generals 2 and 3 check their orders
One reports Attack, the other reports Retreat
They figure out 1 is a traitor and come to their own agreement
But 1 isn't the traitor, 3 is the traitor
He convinces 2 to retreat, 1 is slaughtered attacking, and 3 pockets the bribe
Can General 2 Tell Which Scenario Is Occurring?
When 1 was the traitor, 2 saw: Attack from 1, Retreat from 3
When 3 was the traitor, 2 saw: Attack from 1, Retreat from 3
2 can't tell the difference, so he can't decide whether to attack or retreat
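The indistinguishability can be made concrete by computing what general 2 observes in each scenario. The sketch assumes that general 2 receives Attack from the commander in both cases and that a lying general 3 simply reports the opposite of his real order. The function name and the tuple encoding are illustrative only.

```python
FLIP = {"Attack": "Retreat", "Retreat": "Attack"}


def view_of_general_2(order_to_2, order_to_3, general_3_lies):
    """Everything general 2 can observe: his own order and 3's report."""
    reported_by_3 = FLIP[order_to_3] if general_3_lies else order_to_3
    return (order_to_2, reported_by_3)


# Scenario A: commander 1 is the traitor and sends conflicting orders;
# general 3 honestly reports the Retreat he was given.
commander_traitor = view_of_general_2("Attack", "Retreat", general_3_lies=False)

# Scenario B: commander 1 is loyal and sends Attack to both;
# general 3 is the traitor and lies about his order.
general_3_traitor = view_of_general_2("Attack", "Attack", general_3_lies=True)
```

Both views come out as ("Attack", "Retreat"), so no rule that general 2 applies to what he sees can behave differently in the two scenarios.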
What If There Were 4 Generals?
[Figure: commander 1 sends Attack, Attack, and Retreat to generals 2, 3, and 4]
What if the commander (1) is the traitor?
If he doesn't send some messages, he'll be seen as the traitor
But what can he send?
Can the Three Loyal Generals Reach Agreement?
[Figure: commander 1 sends Attack, Attack, and Retreat to generals 2, 3, and 4]
They can exchange all the messages and let the majority rule
Since there are only two possible messages, the commander must have sent the same message to two nodes
If the commander is loyal and someone else is lying, the majority represents the loyal commander's will
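The exchange-and-majority step for four generals can be simulated directly. In this sketch each general relays the order he received to the others, and a traitorous general is modeled in the simplest way, by relaying the opposite of his order to everyone. That model of lying and the names used are assumptions made for illustration.

```python
from collections import Counter

FLIP = {"Attack": "Retreat", "Retreat": "Attack"}


def loyal_decisions(orders, lying_generals=()):
    """Decide by majority over the order received plus the relayed orders.

    orders: dict general -> order received from the commander
    lying_generals: generals who relay the opposite of what they received
    Returns the decision of each loyal general.
    """
    decisions = {}
    for me, my_order in orders.items():
        if me in lying_generals:
            continue  # only loyal generals need to agree
        seen = [my_order]
        for other, order in orders.items():
            if other != me:
                seen.append(FLIP[order] if other in lying_generals else order)
        # Three reports over two possible orders, so there is never a tie.
        decisions[me] = Counter(seen).most_common(1)[0][0]
    return decisions


# Traitorous commander: conflicting orders, all three generals loyal.
traitor_commander = loyal_decisions({2: "Attack", 3: "Attack", 4: "Retreat"})

# Loyal commander orders Attack; general 4 lies when relaying.
traitor_general = loyal_decisions(
    {2: "Attack", 3: "Attack", 4: "Attack"}, lying_generals={4}
)
```

In the first case all three loyal generals settle on Attack. In the second, generals 2 and 3 both settle on Attack, which is what the loyal commander ordered.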
But What if There Were Five Generals?
[Figure: commander 1 sends Attack, Attack, Retreat, and Retreat to generals 2, 3, 4, and 5]
Pre-arrange a tie-breaker
E.g., always retreat on ties
All the loyal generals then retreat
And the traitor must explain his failure to the Turks
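A tie-breaking majority rule is a small change to ordinary majority voting. The sketch below uses the slide's example default of retreating on ties. The function name is hypothetical.

```python
from collections import Counter


def decide_with_tiebreak(orders, on_tie="Retreat"):
    """Majority of the exchanged orders, with a pre-arranged tie-breaker."""
    counts = Counter(orders)
    attack, retreat = counts["Attack"], counts["Retreat"]
    if attack == retreat:
        return on_tie  # every loyal general applies the same default
    return "Attack" if attack > retreat else "Retreat"


# The traitorous commander splits his four generals evenly.
decision = decide_with_tiebreak(["Attack", "Attack", "Retreat", "Retreat"])
```

Because every loyal general sees the same two-against-two split and applies the same default, `decision` is "Retreat" for all of them.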
What If You Don't Want a Commander?
What if you want everyone to vote? And accept the majority?
With the guarantee that all loyal nodes abide by the majority?
Serially treat each node as the commander
Reach agreement on his vote
Then move on to the next node
The Trick Behind Byzantine Agreement
Everyone must know what everyone else thinks about everything else
Not just what I think the commander said, but what everyone else claims the commander said
Resulting algorithms are tricky and expensive
But it could be (and will be) worse
Authenticated Byzantine Agreement
What if the messages are signed in an unforgeable way?
Then dishonest generals can't lie about what an honest general told them
In this case, honest generals reach agreement regardless of how many are dishonest