EECS 591 DISTRIBUTED SYSTEMS
![Page 1: EECS 591 DISTRIBUTED SYSTEMS](https://reader033.fdocuments.in/reader033/viewer/2022050407/627068866331aa648048660c/html5/thumbnails/1.jpg)
EECS 591 DISTRIBUTED SYSTEMS
Manos Kapritsos
Fall 2021
Slides by: Lorenzo Alvisi
![Page 2: EECS 591 DISTRIBUTED SYSTEMS](https://reader033.fdocuments.in/reader033/viewer/2022050407/627068866331aa648048660c/html5/thumbnails/2.jpg)
3-PHASE COMMIT
Roles: Coordinator, Participant
1. Coordinator sends VOTE-REQ to all participants
2. Participant sends its vote to the Coordinator
   if vote = No then decision := Abort
   halt
3. Coordinator: if (all votes are Yes) then
   send Precommit to all
   else decision := Abort
   send Abort to all who voted Yes
   halt
4. Participant: if received Precommit then send Ack
5. Coordinator collects Ack from all participants
   When all Ack’s have been received: decision := Commit
   send Commit to all
6. When Participant receives Commit, sets decision := Commit and halts
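The six steps can be traced with a small failure-free simulation. This is only a sketch under assumptions: the function name `run_3pc`, the message tuples, and the rule that a Yes voter decides Abort on receiving Abort are illustrative choices, not something the slides spell out.

```python
from enum import Enum


class Decision(Enum):
    COMMIT = "Commit"
    ABORT = "Abort"


def run_3pc(votes):
    """Failure-free 3PC run (hypothetical helper).

    votes maps participant name -> True (Yes) or False (No).
    Returns (coordinator_decision, participant_decisions, message_log).
    """
    log = []        # (sender, receiver, message) tuples, in send order
    decisions = {}  # participant name -> Decision

    # Step 1: coordinator sends VOTE-REQ to all participants.
    for p in votes:
        log.append(("coordinator", p, "VOTE-REQ"))

    # Step 2: each participant sends its vote; a No voter aborts and halts.
    for p, yes in votes.items():
        log.append((p, "coordinator", "Yes" if yes else "No"))
        if not yes:
            decisions[p] = Decision.ABORT

    # Step 3: any No vote makes the coordinator abort and tell the Yes voters.
    if not all(votes.values()):
        for p, yes in votes.items():
            if yes:
                log.append(("coordinator", p, "Abort"))
                decisions[p] = Decision.ABORT  # assumed: Abort is final
        return Decision.ABORT, decisions, log

    # Step 3 (all Yes): coordinator sends Precommit to all.
    for p in votes:
        log.append(("coordinator", p, "Precommit"))

    # Step 4: each participant acknowledges the Precommit.
    for p in votes:
        log.append((p, "coordinator", "Ack"))

    # Step 5: all Acks collected, coordinator decides Commit and sends it.
    for p in votes:
        log.append(("coordinator", p, "Commit"))

    # Step 6: each participant decides Commit on receiving Commit.
    for p in votes:
        decisions[p] = Decision.COMMIT
    return Decision.COMMIT, decisions, log
```

With n participants that all vote Yes, the log holds 5n messages (VOTE-REQ, vote, Precommit, Ack, Commit), one extra round trip compared with 2PC.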
![Page 3: EECS 591 DISTRIBUTED SYSTEMS](https://reader033.fdocuments.in/reader033/viewer/2022050407/627068866331aa648048660c/html5/thumbnails/3.jpg)
TIMEOUT ACTIONS
Step 2: Participant is waiting for VOTE-REQ from the coordinator
  Same as in 2PC
Step 3: Coordinator is waiting for vote from participants
  Same as in 2PC
Step 4: Participant is waiting for Precommit
  Run termination protocol
Step 5: Coordinator is waiting for Ack’s
  Coordinator sends Commit
Step 6: Participant is waiting for Commit
  Run termination protocol
  Participant knows what they will receive… but the NB property can be violated!
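The timeout actions amount to a lookup from protocol step to (waiting role, action). The table below is a minimal sketch; `TIMEOUT_ACTIONS` and `on_timeout` are hypothetical names, and the action strings simply restate the slide.

```python
# Step number -> (role that is waiting, action taken on timeout).
TIMEOUT_ACTIONS = {
    2: ("participant", "same as in 2PC"),            # waiting for VOTE-REQ
    3: ("coordinator", "same as in 2PC"),            # waiting for votes
    4: ("participant", "run termination protocol"),  # waiting for Precommit
    5: ("coordinator", "send Commit"),               # waiting for Acks
    6: ("participant", "run termination protocol"),  # waiting for Commit
}


def on_timeout(step):
    """Return (role, action) for a timeout at the given protocol step."""
    if step not in TIMEOUT_ACTIONS:
        raise ValueError(f"no process waits at step {step}")
    return TIMEOUT_ACTIONS[step]
```

Note that the step 6 entry is not "decide Commit": even though the participant knows a Commit is coming, it still runs the termination protocol.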
![Page 10: EECS 591 DISTRIBUTED SYSTEMS](https://reader033.fdocuments.in/reader033/viewer/2022050407/627068866331aa648048660c/html5/thumbnails/10.jpg)
TERMINATION PROTOCOL: PROCESS STATES
At any time while running 3PC, each participant can be in exactly one of these four states:
Aborted: Not voted, voted No, received Abort
Uncertain: Voted Yes but not received Precommit
Pre-committed: Received Precommit, not Commit
Committed: Received Commit
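The four states can be derived from what a participant has voted and received so far. A minimal sketch, assuming the vote is recorded as `None`, `"Yes"` or `"No"` and received messages as a set of strings (both representations are illustrative):

```python
def state_of(vote, received=frozenset()):
    """Classify a participant into one of the four 3PC states.

    vote: None (not voted yet), "Yes" or "No".
    received: set of coordinator messages seen so far, drawn from
              {"Precommit", "Commit", "Abort"}.
    """
    if "Commit" in received:
        return "Committed"
    # Not voted, voted No, or received Abort all count as Aborted.
    if vote != "Yes" or "Abort" in received:
        return "Aborted"
    if "Precommit" in received:
        return "Pre-committed"
    # Voted Yes but has not received Precommit.
    return "Uncertain"
```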
![Page 11: EECS 591 DISTRIBUTED SYSTEMS](https://reader033.fdocuments.in/reader033/viewer/2022050407/627068866331aa648048660c/html5/thumbnails/11.jpg)
NOT ALL STATES ARE COMPATIBLE
Compatibility matrix over the four states (rows and columns): Aborted, Uncertain, Pre-committed, Committed
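One way to make the compatibility relation concrete is as a lookup table. The entries below are an assumption: they follow the usual presentation of 3PC, in which two states are compatible only if they are the same or adjacent in the order Aborted, Uncertain, Pre-committed, Committed. The names `COMPATIBLE` and `all_compatible` are hypothetical.

```python
from itertools import combinations

# Assumed compatibility relation: each state may coexist only with
# itself and with its neighbours in the protocol's progression.
COMPATIBLE = {
    "Aborted": {"Aborted", "Uncertain"},
    "Uncertain": {"Aborted", "Uncertain", "Pre-committed"},
    "Pre-committed": {"Uncertain", "Pre-committed", "Committed"},
    "Committed": {"Pre-committed", "Committed"},
}


def compatible(a, b):
    """True if two processes may be in states a and b at the same time."""
    return b in COMPATIBLE[a]


def all_compatible(states):
    """True if every pair of states in the collection is compatible."""
    return all(compatible(a, b) for a, b in combinations(states, 2))
```

Under this table, no process can be Committed while another is still Uncertain.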
![Page 12: EECS 591 DISTRIBUTED SYSTEMS](https://reader033.fdocuments.in/reader033/viewer/2022050407/627068866331aa648048660c/html5/thumbnails/12.jpg)
TERMINATION PROTOCOL
When a participant times out, it starts an election protocol to elect a new coordinator
The new coordinator sends STATE-REQ to all processes that participated in the election
The new coordinator collects the states and follows a set of termination rules
![Page 13: EECS 591 DISTRIBUTED SYSTEMS](https://reader033.fdocuments.in/reader033/viewer/2022050407/627068866331aa648048660c/html5/thumbnails/13.jpg)
TR1: if some process decided Abort, then
  decide Abort
  send Abort to all
  halt
TR2: if some process decided Commit, then
  decide Commit
  send Commit to all
  halt
TR3: if all processes that reported state are uncertain, then
  decide Abort
  send Abort to all
  halt
TR4: if some process is pre-committed, but none committed, then
  send Precommit to uncertain processes
  wait for Ack’s
  send Commit to all
  halt
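The rules TR1 to TR4 can be sketched as a single function the new coordinator runs over the states it collected. The function name `terminate`, the state strings, and the returned message plan are illustrative assumptions; waiting for Acks in TR4 is represented only as a comment.

```python
def terminate(states):
    """Apply TR1-TR4 to the reported states (hypothetical helper).

    states: dict mapping process name -> one of
            "Aborted", "Uncertain", "Pre-committed", "Committed".
    Returns (decision, plan), where plan is an ordered list of
    (message, recipients) pairs the new coordinator would send.
    """
    if not states:
        raise ValueError("at least the new coordinator must report a state")
    everyone = sorted(states)
    reported = set(states.values())

    if "Aborted" in reported:        # TR1
        return "Abort", [("Abort", everyone)]
    if "Committed" in reported:      # TR2
        return "Commit", [("Commit", everyone)]
    if reported == {"Uncertain"}:    # TR3
        return "Abort", [("Abort", everyone)]

    # TR4: some process is pre-committed, none committed.
    uncertain = sorted(p for p, s in states.items() if s == "Uncertain")
    # The coordinator waits for Acks between the two sends.
    return "Commit", [("Precommit", uncertain), ("Commit", everyone)]
```

For example, `terminate({"p1": "Uncertain", "p2": "Pre-committed"})` first moves `p1` to pre-committed and only then sends Commit, so no process commits while another operational process is still uncertain.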
![Page 14: EECS 591 DISTRIBUTED SYSTEMS](https://reader033.fdocuments.in/reader033/viewer/2022050407/627068866331aa648048660c/html5/thumbnails/14.jpg)
TERMINATION PROTOCOL AND FAILURES
Processes can fail while executing the termination protocol
if the coordinator times out on a participant, it can just ignore that participant
if the coordinator fails, a new coordinator is elected and the protocol is restarted (election protocol to follow)
total failures will need special care
![Page 15: EECS 591 DISTRIBUTED SYSTEMS](https://reader033.fdocuments.in/reader033/viewer/2022050407/627068866331aa648048660c/html5/thumbnails/15.jpg)
RECOVERING
If a process p fails before sending Yes, decide Abort
If p fails after having decided, follow decision
If p fails after voting Yes, but before receiving decision value
  p asks other processes for help
  3PC is non-blocking: p will receive a response with the decision
If p has received Precommit
  p still needs to ask other processes (cannot just Commit)
No need to log Precommit! (or is there?)
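The recovery cases can be summarized as a function of what the recovering process finds in its log. A minimal sketch, assuming the log is a collection of strings such as `"Yes"`, `"Precommit"`, `"Commit"` and `"Abort"` (this log format is hypothetical):

```python
def recovery_action(logged):
    """Decide what a recovering process does, given its log entries."""
    if "Commit" in logged:
        return "Commit"   # already decided: follow the decision
    if "Abort" in logged:
        return "Abort"    # already decided: follow the decision
    if "Yes" not in logged:
        return "Abort"    # failed before sending Yes
    # Voted Yes with no decision logged. A logged Precommit does not
    # change the outcome here: the process still has to ask.
    return "ask other processes"
```

In this sketch a logged Precommit leads to the same action as a bare Yes, which is one reading of why logging Precommit looks unnecessary; the slide's closing question is left open.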
![Page 16: EECS 591 DISTRIBUTED SYSTEMS](https://reader033.fdocuments.in/reader033/viewer/2022050407/627068866331aa648048660c/html5/thumbnails/16.jpg)
THE ELECTION PROTOCOL
Processes agree on linear ordering (e.g. by pid)
Each process p maintains a set UP_p of all processes that it believes to be operational
When p detects failure of the coordinator c, it removes c from UP_p and chooses the smallest q in UP_p to be the new coordinator
If q = p, then p is the new coordinator
Otherwise, p sends UR-ELECTED to q
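A single election step at one process can be sketched as follows. The function name, the use of integer pids for the linear ordering, and the return shape are assumptions made for illustration.

```python
def on_coordinator_failure(me, up, failed):
    """One election step at process `me` (hypothetical helper).

    me: pid of the process that detected the failure.
    up: pids that `me` believes are operational (includes `me`).
    failed: pid of the coordinator whose failure was detected.
    Returns (updated_up, new_coordinator, message), where message is
    None if `me` becomes coordinator, else ("UR-ELECTED", recipient).
    """
    up = set(up) - {failed}   # remove the failed coordinator
    new_coordinator = min(up)  # smallest pid in the agreed ordering
    if new_coordinator == me:
        return up, me, None
    return up, new_coordinator, ("UR-ELECTED", new_coordinator)
```

For example, with `up = {1, 2, 3}` and coordinator 1 failing, process 3 sends UR-ELECTED to process 2, while process 2 simply takes over.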
![Page 17: EECS 591 DISTRIBUTED SYSTEMS](https://reader033.fdocuments.in/reader033/viewer/2022050407/627068866331aa648048660c/html5/thumbnails/17.jpg)
TOTAL FAILURE
Suppose that p is the first process to recover and that p is uncertain. Can p decide Abort?
Some process could have decided Commit after p crashed!
p is blocked until some process q recovers such that either
  q can recover independently
  q is the last process to fail: then q can simply invoke the termination protocol
![Page 18: EECS 591 DISTRIBUTED SYSTEMS](https://reader033.fdocuments.in/reader033/viewer/2022050407/627068866331aa648048660c/html5/thumbnails/18.jpg)
DETERMINING THE LAST PROCESS TO FAIL
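Suppose a set R of processes has recovered
Does R contain the last process to fail?
the last process to fail is in the UP set of every process (UP_p: the processes that p believes to be operational)
so the last process to fail must be in the intersection of UP_p over all p in R
R contains the last process to fail if: the intersection of UP_p over all p in R is a subset of R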
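The check can be sketched with set operations, assuming the condition is that the intersection of the recovered processes' operational sets is contained in the recovered set. The function name and the dictionary of saved sets are hypothetical.

```python
def contains_last_to_fail(recovered, up_sets):
    """Check whether the recovered set must include the last process to fail.

    recovered: iterable of recovered process names.
    up_sets: dict mapping each recovered process to the set of processes
             it believed operational when it failed.
    """
    recovered = set(recovered)
    if not recovered:
        return False
    # The last process to fail is in every process's operational set,
    # so it lies in the intersection of those sets.
    common = set.intersection(*(set(up_sets[p]) for p in recovered))
    # If the whole intersection has recovered, so has the last to fail.
    return common <= recovered
```

As a usage example, suppose `a` fails first believing `{a, b, c}` are up, then `b` believing `{b, c}`, then `c` believing `{c}`. If only `a` and `b` have recovered, the intersection `{b, c}` is not contained in `{a, b}`, so they must keep waiting for `c`.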
![Page 19: EECS 591 DISTRIBUTED SYSTEMS](https://reader033.fdocuments.in/reader033/viewer/2022050407/627068866331aa648048660c/html5/thumbnails/19.jpg)
ADMINISTRIVIA
I will email you homework #1 later today
Due next Monday 9/27 before class by email to Tony and me
Research project
Declare your team by Oct 1st (by email to me)
Declare your topic by Oct 8th (by email to me)
Not sure what to do? Come talk to me.