
LEADER ELECTION

CS 271 1

Election Algorithms

• Many distributed algorithms need one process to act as coordinator
  – Doesn't matter which process does the job; just need to pick one
• Election algorithms: techniques to pick a unique coordinator (aka leader election)
• Types of election algorithms: Bully and Ring algorithms

Bully Algorithm

• Each process has a unique numerical ID
• Processes know the IDs and addresses of all other processes
• Communication is assumed to be reliable
• Key idea: select the process with the highest ID
• A process initiates an election if it has just recovered from failure or if the coordinator has failed
• 3 message types: Election, OK, I won
• Processes can initiate elections simultaneously
  – Need a consistent result

Bully Algorithm Details

• Any process P can initiate an election
• P sends Election messages to all processes with higher IDs and awaits OK messages
• If no OK messages arrive, P becomes coordinator and sends I won to all processes with lower IDs
• If P receives an OK, it drops out and waits for I won
• If a process receives an Election message, it returns OK and starts its own election
• If a process receives I won, the sender is the coordinator
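The rules above can be traced with a small single-process simulation. This is a minimal sketch under assumptions not stated on the slide: `bully_election` is a hypothetical helper, delivery is synchronous, a crashed process simply never replies (standing in for the OK timeout), and concurrent elections are serialized through a queue, with each process holding at most one election.

```python
from collections import deque


def bully_election(ids, alive, initiator):
    """Simulate one Bully election (hypothetical helper, not from the slides).

    ids: every process ID; alive: set of IDs that are up;
    initiator: a live process that noticed the coordinator failure.
    Returns (coordinator, number of messages sent).
    A crashed process never answers, which stands in for an OK timeout.
    """
    messages = 0
    started = set()               # processes that already held an election
    pending = deque([initiator])  # processes about to start an election
    winner = None
    while pending:
        p = pending.popleft()
        if p in started:
            continue
        started.add(p)
        higher = [q for q in ids if q > p]
        messages += len(higher)                    # Election to every higher ID
        responders = [q for q in higher if q in alive]
        messages += len(responders)                # each live higher process returns OK
        if responders:
            pending.extend(responders)             # p drops out; responders start elections
        else:
            winner = p                             # no OK received: p becomes coordinator
            messages += len([q for q in ids if q < p])  # I won to every lower ID
    return winner, messages
```

For example, with processes 1 to 5, coordinator 5 crashed and process 1 initiating, `bully_election([1, 2, 3, 4, 5], {1, 2, 3, 4}, 1)` returns `(4, 19)`: 10 Election, 6 OK and 3 I won messages.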

Bully Algorithm Example

a) Process 4 holds an election
b) Processes 5 and 6 respond, telling 4 to stop
c) Now 5 and 6 each hold an election
d) Process 6 tells 5 to stop
e) Process 6 wins and tells everyone

Simple Ring-based Election

• Processes have unique IDs and are arranged in a logical ring
• Each process knows its neighbors
• Select the process with the highest ID as leader
• Begin an election if just recovered or if the coordinator has failed
• Send Election to the closest downstream node that is alive
  – Sequentially poll each successor until a live node is found
• Each process tags its ID on the message
• Initiator picks the node with the highest ID and sends a Coordinator message
• Multiple elections can be in progress: no harm.
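A similar sketch for the ring algorithm, assuming `ring` lists IDs in logical ring order, a dead successor is skipped by polling the next one (failed polls are not counted as messages), and the Coordinator message travels once around the live ring back to the initiator. `ring_election` is a hypothetical helper name.

```python
def ring_election(ring, alive, initiator):
    """Simulate the simple ring-based election (hypothetical helper).

    ring: process IDs in logical ring order; alive: set of live IDs;
    initiator: live process that starts the election.
    Returns (leader, messages delivered between live processes).
    """
    n = len(ring)

    def next_alive(pos):
        # Poll successors in ring order until a live one is found
        for step in range(1, n + 1):
            cand = (pos + step) % n
            if ring[cand] in alive:
                return cand
        raise RuntimeError("no live process")

    start = ring.index(initiator)
    tags = [initiator]          # each process appends its ID to the Election message
    messages = 0
    pos = next_alive(start)
    messages += 1
    while pos != start:         # forward Election until it returns to the initiator
        tags.append(ring[pos])
        pos = next_alive(pos)
        messages += 1
    leader = max(tags)          # initiator picks the highest tagged ID
    # Coordinator message circulates once around the live ring
    pos = next_alive(start)
    messages += 1
    while pos != start:
        pos = next_alive(pos)
        messages += 1
    return leader, messages
```

With six processes and process 6 (the old coordinator) down, `ring_election([1, 2, 3, 4, 5, 6], {1, 2, 3, 4, 5}, 2)` returns `(5, 10)`, i.e. 2(n-1) messages for n = 6.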

Ring Algorithm Example

[Figure: ring election example (two slides)]

Comparison

• Assume n processes and one election in progress
• Bully algorithm
  – Worst case: initiator is the node with the lowest ID
    • Triggers n-2 elections at higher-ranked nodes: O(n²) messages
  – Best case: immediate election: n-2 messages (the process with the second-highest ID detects the failure and sends I won to the n-2 lower processes)
• Ring
  – 2(n-1) messages always
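One way to arrive at the bully worst case, assuming processes 1..n, a crashed coordinator n, initiator 1, and counting Election messages sent to the crashed process as well: process i sends n-i Election messages and receives n-1-i OKs, and process n-1 finally sends n-2 I won messages.

```latex
\begin{aligned}
\text{Election} &= \sum_{i=1}^{n-1} (n-i) = \frac{n(n-1)}{2} \\
\text{OK} &= \sum_{i=1}^{n-1} (n-1-i) = \frac{(n-1)(n-2)}{2} \\
\text{I won} &= n-2 \\
\text{Total} &= (n-1)^2 + (n-2) = n^2 - n - 1 = O(n^2)
\end{aligned}
```

For n = 5 this is 10 + 6 + 3 = 19 messages, against 2(n-1) = 8 for the ring.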

Highlights of Leader Election

• Basic idea: each process has a unique process-id.

• Once the leader is discovered to have died, elect the process with the highest (or lowest) process-id.

BROADCAST PROTOCOLS


Broadcast Protocols

• Why Broadcast protocols?– Data replication– Highly available servers– Cluster management– Distributed logging– ……

• Sometimes, message is received, but delivered later to satisfy some order requirements.

CS 271 13

Ordering properties: FIFO (Cornell)

• FIFO or sender-ordered multicast: fbcast
  Messages are delivered in the order they were sent (by any single sender)

[Figure: timelines for processes p, q, r, s; messages a and e]

Ordering properties: FIFO

[Figure: timelines for processes p, q, r, s; messages a, b, c, d, e. Delivery of c to p is delayed until after b is delivered.]
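FIFO delivery can be enforced with per-sender sequence numbers and a holdback queue, which is the received-but-delivered-later behavior described for broadcast protocols. A minimal sketch, assuming each sender numbers its broadcasts 0, 1, 2, …; `FifoReceiver` is a hypothetical name.

```python
from collections import defaultdict


class FifoReceiver:
    """FIFO (sender-ordered) delivery sketch.

    Assumes each sender stamps its broadcasts with consecutive sequence
    numbers 0, 1, 2, ...; messages may be received out of order.
    """

    def __init__(self):
        self.next_seq = defaultdict(int)    # next expected seq number per sender
        self.holdback = defaultdict(dict)   # sender -> {seq: payload} received early
        self.delivered = []                 # delivery order at this process

    def receive(self, sender, seq, payload):
        self.holdback[sender][seq] = payload
        # Deliver as many consecutive messages from this sender as possible
        queue = self.holdback[sender]
        while self.next_seq[sender] in queue:
            msg = queue.pop(self.next_seq[sender])
            self.delivered.append((sender, msg))
            self.next_seq[sender] += 1
```

If q broadcasts b (seq 0) then c (seq 1) and p receives c first, `receive("q", 1, "c")` holds c back, and `receive("q", 0, "b")` then delivers b followed by c.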

Limitations of FIFO Broadcast

Scenario:
• User A broadcasts a message to a mailing list
• B delivers that message
• B broadcasts a reply
• C delivers B's response without A's original message and misinterprets it

Ordering properties: Causal

• Causal or happens-before ordering: cbcast
  If send(a) → send(b), then deliver(a) occurs before deliver(b) at common destinations

[Figure: timelines for processes p, q, r, s; messages a and b]

Ordering properties: Causal

[Figure: timelines for processes p, q, r, s; messages a, b, c. Delivery of c to p is delayed until after b is delivered.]

Ordering properties: Causal

[Figure: timelines for processes p, q, r, s; messages a, b, c, e. Delivery of c to p is delayed until after b is delivered; e is sent (causally) after b.]

Ordering properties: Causal

[Figure: timelines for processes p, q, r, s; messages a, b, c, d, e. Delivery of c to p is delayed until after b is delivered; delivery of e to r is delayed until after b and c are delivered.]

Limitation of Causal Broadcast

Causal broadcast does not impose any order on unrelated (concurrent) messages.

Two replicas can therefore deliver operations/requests in different orders.

Ordering properties: Total

• Total or locally total multicast: atomic bcast
  Messages are delivered in the same order to all recipients (including the sender)

[Figure: timelines for processes p, q, r, s; messages a, b, c, d, e; all deliver a, b, c, d, then e]

Simple Causal broadcast protocol

• Each broadcast message carries all causally preceding messages

• Before delivery, ensure causality by delivering any missed causally preceding messages.

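The naive protocol above can be sketched as follows, assuming globally unique message IDs of the form (sender, counter) and a sender that delivers its own message immediately; `CausalProcess` is a hypothetical name. Because each process delivers in causal order, its delivery history is itself a valid causal order to piggyback. In this naive form the piggybacked history grows without bound.

```python
class CausalProcess:
    """Naive causal broadcast: every message carries its whole causal past.

    Assumption: message IDs are globally unique, here (sender name, counter).
    """

    def __init__(self, name):
        self.name = name
        self.counter = 0
        self.history = []      # delivered (msg_id, payload) pairs, in causal order
        self.seen = set()      # IDs of delivered messages

    def broadcast(self, payload):
        self.counter += 1
        msg_id = (self.name, self.counter)
        # Piggyback every message that causally precedes this one
        packet = {"id": msg_id, "payload": payload, "past": list(self.history)}
        self._deliver(msg_id, payload)     # sender delivers its own message
        return packet

    def receive(self, packet):
        # First deliver any causally preceding messages we missed, in order
        for msg_id, payload in packet["past"]:
            if msg_id not in self.seen:
                self._deliver(msg_id, payload)
        if packet["id"] not in self.seen:
            self._deliver(packet["id"], packet["payload"])

    def _deliver(self, msg_id, payload):
        self.seen.add(msg_id)
        self.history.append((msg_id, payload))
```

Replaying the mailing-list scenario: if C receives B's reply before A's original message, C delivers A's message first (it arrives inside the reply) and then the reply; the late copy of A's message is ignored.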

Isis Causal Broadcast

• Each process maintains a vector timestamp VT of size n.
• Initially VT[i] = 0 for all i.
• When p sends a new message m: VT[p]++
• Each message m is piggybacked with VTm, the sender's current VT.
• When p delivers a message m, p updates its vector: for k in 1..n:
  – VTp[k] = max{ VTp[k], VTm[k] }
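The two timestamp rules translate directly into code. This sketch indexes processes 0..n-1 instead of 1..n; `stamp_send` and `merge_on_deliver` are hypothetical helper names.

```python
def stamp_send(vt, p):
    """Process p sends a new message m: VT[p]++, then piggyback a copy as VTm."""
    vt[p] += 1
    return list(vt)               # copy, so later local updates don't change VTm


def merge_on_deliver(vt, vtm):
    """A process delivers m: VTp[k] = max(VTp[k], VTm[k]) for every k."""
    for k in range(len(vt)):
        vt[k] = max(vt[k], vtm[k])


# Example with three processes (indices 0..2 instead of the slide's 1..n)
vt_p, vt_q = [0, 0, 0], [0, 0, 0]
m = stamp_send(vt_p, 0)           # m carries [1, 0, 0]
merge_on_deliver(vt_q, m)         # vt_q becomes [1, 0, 0]
reply = stamp_send(vt_q, 1)       # reply carries [1, 1, 0]
```

The reply's timestamp records that it follows m, which is what the delivery rule checks at other receivers.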

Isis Causal Order

• Requirement for delivery at node j (VTsender is the vector VTm piggybacked on the message; VTreceiver is node j's VT):
  – VTsender[sender] = VTreceiver[sender] + 1
    • This is the next message from the sender
  – VTsender[k] ≤ VTreceiver[k] for all k ≠ sender
    • The receiver has already delivered all causally preceding messages

[Figure: a sender and a receiver, comparing VTsender with VTreceiver]
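Putting the delivery test together with a holdback queue gives a receiver sketch, assuming reliable channels so that every held-back message eventually becomes deliverable; `IsisCausalReceiver` is a hypothetical name and processes are indexed 0..n-1.

```python
class IsisCausalReceiver:
    """Receiver side of Isis-style causal delivery (illustrative sketch).

    Processes are indexed 0..n-1. Each message arrives as (sender, vtm, payload),
    where vtm is the sender's vector timestamp (VTsender on the slide).
    """

    def __init__(self, n):
        self.vt = [0] * n        # VTreceiver
        self.holdback = []       # received but not yet deliverable
        self.delivered = []

    def can_deliver(self, sender, vtm):
        # Next message from sender: VTsender[sender] = VTreceiver[sender] + 1
        if vtm[sender] != self.vt[sender] + 1:
            return False
        # All causally preceding messages delivered: VTsender[k] <= VTreceiver[k], k != sender
        return all(vtm[k] <= self.vt[k] for k in range(len(self.vt)) if k != sender)

    def receive(self, sender, vtm, payload):
        self.holdback.append((sender, vtm, payload))
        progress = True
        while progress:          # one delivery can unblock held-back messages
            progress = False
            for msg in list(self.holdback):
                s, v, p = msg
                if self.can_deliver(s, v):
                    self.holdback.remove(msg)
                    self.delivered.append(p)
                    self.vt = [max(a, b) for a, b in zip(self.vt, v)]  # merge on delivery
                    progress = True
```

For example, with three processes, if receiver 2 gets B's message stamped [1, 1, 0] before A's message stamped [1, 0, 0], the first fails the k ≠ sender test (1 > 0 at index 0) and is held back until A's message has been delivered.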

Total order

• Different classes of total order broadcast:
  – Fixed sequencer
  – Moving sequencer using a token
  – Distributed agreement using timestamps

Using Sequencer (Amoeba)

• Delivery algorithm similar to FIFO, except that a special "sequencer" is used to order messages

• Sender attaches a unique id i to each message m and sends <m, i> to the sequencer as well as to all destinations

• Sequencer maintains a sequence number S (consecutive and increasing) and broadcasts <i, S> to all destinations

• Message k (the message assigned S = k) is delivered only once all messages j (0 ≤ j < k) have been received and delivered
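A sketch of the fixed-sequencer scheme, assuming message ids are unique strings and sequence numbers start at 0; `Sequencer` and `SequencedReceiver` are hypothetical names. A receiver needs both the message body <m, i> and the ordering <i, S> before it can deliver.

```python
import itertools


class Sequencer:
    """Fixed sequencer: assigns consecutive sequence numbers S to message ids."""

    def __init__(self):
        self._next = itertools.count(0)

    def order(self, msg_id):
        return (msg_id, next(self._next))   # <i, S>, broadcast to all destinations


class SequencedReceiver:
    """Delivers message k only after messages 0..k-1 have been delivered."""

    def __init__(self):
        self.messages = {}     # msg_id -> payload, from the senders
        self.order = {}        # S -> msg_id, from the sequencer
        self.next_s = 0
        self.delivered = []

    def on_message(self, msg_id, payload):
        self.messages[msg_id] = payload
        self._try_deliver()

    def on_order(self, msg_id, s):
        self.order[s] = msg_id
        self._try_deliver()

    def _try_deliver(self):
        # Deliver while both the ordering and the body of the next S are present
        while self.next_s in self.order and self.order[self.next_s] in self.messages:
            msg_id = self.order.pop(self.next_s)
            self.delivered.append(self.messages.pop(msg_id))
            self.next_s += 1
```

Two receivers that see bodies and orderings in different interleavings still deliver the same sequence, because both follow S.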

Distributed Total Order Protocol (ISIS)

• Processes collectively agree on sequence numbers (priorities) in three rounds

• Sender sends message <m, id> to all receivers
• Receivers suggest a priority (sequence number) and reply to the sender with the proposed priority
• Sender collects all proposed priorities, decides on the final priority (breaking ties with process ids), and resends the agreed final priority for message m

• Receivers deliver message m according to the decided final priority
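A sketch of the three rounds, assuming (as is usual for ISIS) that the sender picks the maximum proposed priority as the final one, and that priorities are (sequence number, process id) pairs so ties break by process id. `IsisMember` and `isis_multicast` are hypothetical names; `isis_multicast` runs the rounds synchronously, so interleaved multicasts have to be driven through `propose` and `agree` directly.

```python
class IsisMember:
    """One group member in the three-round ISIS total-order protocol (sketch)."""

    def __init__(self, pid):
        self.pid = pid
        self.largest = 0      # largest sequence number proposed or agreed so far
        self.queue = {}       # msg_id -> [priority, deliverable?, payload]
        self.delivered = []

    def propose(self, msg_id, payload):
        # Round 2: propose a priority above anything seen; (seq, pid) breaks ties
        self.largest += 1
        priority = (self.largest, self.pid)
        self.queue[msg_id] = [priority, False, payload]
        return priority

    def agree(self, msg_id, final):
        # Round 3: adopt the agreed priority and mark the message deliverable
        self.largest = max(self.largest, final[0])
        self.queue[msg_id][0] = final
        self.queue[msg_id][1] = True
        # Deliver from the head of the queue while the head is deliverable
        while self.queue:
            head = min(self.queue, key=lambda m: self.queue[m][0])
            if not self.queue[head][1]:
                break
            self.delivered.append(self.queue.pop(head)[2])


def isis_multicast(group, msg_id, payload):
    """Sender side: round 1 sends m, collects proposals, picks the max as final."""
    proposals = [member.propose(msg_id, payload) for member in group]
    final = max(proposals)    # tuples compare by seq first, then by pid
    for member in group:
        member.agree(msg_id, final)
    return final
```

A receiver delivers a message only when it is at the head of its queue and its priority is final; a lower, still-tentative priority at the head blocks later messages until it is settled.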

ISIS algorithm for total ordering

[Figure: group g = {P1, P2, P3, P4}; (1) the sender multicasts the message, (2) each member replies with a proposed sequence number, (3) the sender multicasts the agreed sequence number]