Distributed Data Management - KIT · 2008. 11. 24. · Distributed Data Management Chapter 4:...

IPD, Forschungsbereich Systeme der Informationsverwaltung

Lecture

Distributed Data Management

Chapter 4: Distributed Transactions (Second Part)

Erik Buchmann, buchmann@ipd.uka.de


2PC Variants


Linear Two-Phase Commit (1)

● Commit processing runs sequentially via the TMs of the n nodes participating in the global TA.

● Phase 1: communication 'in the forward direction' from coordinator (TM1) to last agent (TMn); Phase 2: the other direction.

[Figure: chain TM1, TM2, TM3, ..., TMn. READY / FAILED is passed forward from node to node, COMMIT / ABORT is passed back, and a final ACK is sent.]


Linear Two-Phase Commit (2)

● Coordinator enters PREPARED state and passes its local commit decision (READY) to TM2.

● Agent enters PREPARED state; after having received READY, it sends READY to the next agent.


Linear Two-Phase Commit (3)

● Successful termination of transaction

– is decided once last agent TMn has received READY and has written the commit log entry (of the entire transaction).

– commit decision goes to agents in reverse order; logging and release of locks.

– TM1 then sends ACK to TMn; TMn writes end log entry.


Linear Two-Phase Commit (4)

● Abort of transaction

– if one of the nodes decides abort; FAILED message is passed on.

– Last agent (TMn) becomes coordinator: it logs the global commit result and passes it on.


Linear Two-Phase Commit (5)

● Advantages

– reduced communication overhead: PREPARE and READY messages are combined, and there is only one ACK

● Disadvantages

– serial processing, very slow if many nodes are involved
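
As an illustration of the linear scheme, the following Python sketch walks the chain once in each direction and records the messages. The function name `linear_2pc`, the boolean vote list, and the message triples are assumptions made for this example; failures, logging, and timeouts are left out, and the final ACK is sent in the abort case too for simplicity.

```python
def linear_2pc(votes):
    """Simulate one failure-free run of linear 2PC over TM1..TMn (n >= 2).

    votes[i] is True if TM(i+1) is locally willing to commit.
    Returns (outcome, messages); each message is (sender, receiver, text).
    """
    n = len(votes)
    if n < 2:
        raise ValueError("linear 2PC needs at least two nodes")
    messages = []
    ready = votes[0]
    # Phase 1: READY / FAILED travels forward from TM1 to TMn.
    for sender in range(1, n):
        messages.append((sender, sender + 1, "READY" if ready else "FAILED"))
        ready = ready and votes[sender]  # receiver TM(sender+1) adds its own vote
    # TMn acts as coordinator and fixes the global result.
    outcome = "COMMIT" if ready else "ABORT"
    # Phase 2: the result travels back from TMn to TM1.
    for sender in range(n, 1, -1):
        messages.append((sender, sender - 1, outcome))
    # TM1 confirms to TMn.
    messages.append((1, n, "ACK"))
    return outcome, messages
```

In this sketch a run over n nodes uses 2·(n-1)+1 messages, but every message has to wait for the previous one, which is the serial-processing drawback named above.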


Hierarchical Two-Phase Commit (1)

● Generalization of basic scheme for hierarchical invocation structures (transaction tree).

● Each agent communicates only with direct ancestor and direct successors.

[Figure: communication structures of the standard, linear, and hierarchical schemes.]


Hierarchical Two-Phase Commit (2)

[Figure: hierarchical 2PC message flow between TM1 (coordinator), TM2 (agent), and TM3 (agent). Messages: PREPARE, READY / FAILED, COMMIT / ABORT, ACK. Annotations: logging, release of locks, end.]


Hierarchical Two-Phase Commit (3)

● No changes for root and leaf nodes.

● Intermediate nodes: coordinator for successors in tree, agent from its ancestor's perspective.

● PREPARE messages go to all successors; the node waits for their commit votes. Then commit decision for the entire subtree, logging it, and sending it to the ancestor.
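
The first phase at an intermediate node can be sketched as a recursion over the transaction tree. The names `collect_votes` and `global_result`, the dictionary encoding of the tree, and the list used as a log are assumptions for this example; it covers only the failure-free vote collection, not Phase 2.

```python
def collect_votes(node, children, local_vote, log):
    """Phase 1 for the subtree rooted at node; returns True for READY."""
    # PREPARE goes to all direct successors; wait for each subtree's vote.
    votes = [collect_votes(child, children, local_vote, log)
             for child in children.get(node, [])]
    # Commit decision for the entire subtree, logged before it is passed upward.
    ready = local_vote[node] and all(votes)
    log.append((node, "READY" if ready else "FAILED"))
    return ready


def global_result(root, children, local_vote):
    """Run Phase 1 from the root and derive the global commit result."""
    log = []
    ready = collect_votes(root, children, local_vote, log)
    return ("COMMIT" if ready else "ABORT"), log
```

Each node talks only to its direct ancestor and direct successors, so a single FAILED vote deep in the tree reaches the root as the FAILED vote of the subtree that contains it.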


Hierarchical Two-Phase Commit (4)

● Abort – immediately inform all successors having voted commit.

● Phase 2: receive commit result from ancestor, log it, pass it on to successors, and confirm it immediately.

● After having received all ACKs from successors write end log entry.


Hierarchical Two-Phase Commit (5)

● High generality and flexibility

– fewer messages, compared to basic scheme

– might speed up processing if groups of nodes are located in different subnets with slow interconnections

● In general, reduced performance

– less parallelism and longer duration (proportional to height of tree)

– one additional (asynchronous) log write in intermediate node (end log).


2PC Optimizations


Basic Protocol

● Flow of messages: messages shown occur for each agent.

● Recap: What happens when agent recovers during uncertainty period?

[Figure: basic 2PC between TM1 (coordinator) and TM2 (agent). Phase 1: PREPARE; the agent determines its local commit result, logs it, and replies with it (READY / FAILED). Phase 2: the coordinator determines the global commit result, logs it, and sends it (COMMIT / ABORT); the agent logs the global commit result, releases locks, and confirms (ACK); end at the coordinator. Annotation: "Recap: Why necessary?"]


Presumed Abort (1)

● Objective: fewer messages and fewer log entries.

● If, after a failure, the coordinator's log file does not contain a commit log entry: decide abort.

● Presumed abort is incorporated in several products and in standards (ISO/OSI TP, X/Open DTP).


Presumed Abort (2)

● Advantages:

– Coordinator does not need to write abort log entry synchronously.

– ACK messages for failed transactions are superfluous,

– same with end log entries at coordinator and intermediate nodes.

● However, no savings with successful global transactions.
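
The presumed-abort rule can be sketched as a lookup in the coordinator log after a failure. The function name `resolve_after_failure` and the representation of the log as a collection of (transaction id, entry) pairs are assumptions for this example.

```python
def resolve_after_failure(coordinator_log, transaction_id):
    """Presumed abort: without a commit log entry, the outcome is abort."""
    if (transaction_id, "commit") in coordinator_log:
        return "COMMIT"
    # No commit entry found (nothing logged, or entry lost): presume abort.
    return "ABORT"
```

Because a missing entry already means abort, the coordinator does not have to force an abort entry to disk before answering.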


Read-Only Subtransactions (1)

● Objective: save messages and/or (synchronous) log entries.

● Example – transaction with three subtransactions:

1. STA1: read balance of bank account x.

2. STA2: if x > 1000, withdraw 500 from account y.

3. STA3: otherwise withdraw 100 from account z.

● What happens after STA1 has voted commit

– if transaction is successful,

– if transaction fails?

Note that STA1 is read-only.


Read-Only Subtransactions (2)

● Neither recovery nor logging, only release of locks.

● May already happen in Phase 1 of 2PC protocol, irrespective of success of global transaction; this saves the entire second commit phase.

● If m subtransactions are read-only (of n-1), number of messages reduces by 2m to 4·(n-1)-2m, number of log writes reduces to 2n-m.

● If global transaction is read-only (m = n), only 2·(n-1) messages and no log writes.
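
The message count can be checked with a small helper. The function name `messages_2pc` is an assumption for this example; here m counts only the n-1 subtransactions, so the fully read-only case corresponds to m = n-1 in this sketch.

```python
def messages_2pc(n, m=0):
    """Messages of basic 2PC with n nodes when m of the n-1
    subtransactions are read-only and skip the second phase."""
    if not 0 <= m <= n - 1:
        raise ValueError("m must be between 0 and n-1")
    # 4 messages per agent in the basic scheme, 2 saved per read-only agent.
    return 4 * (n - 1) - 2 * m
```

For n = 5 this gives 16 messages without read-only subtransactions, 12 with m = 2, and 8 = 2·(n-1) when all four subtransactions are read-only.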


One-Phase Commit (1)

● So far: work in subtransactions is separated from the commit protocol.

[Figure: message flow between primary transaction and subtransaction: WORK / DONE (once for each operation), then PREPARE / READY, then COMMIT / ACK.]

● Short distributed transactions with only one external database operation (e.g., money transfer to a bank): commit processing is more expensive than the transaction itself.


One-Phase Commit (2)

● Thus: combine PREPARE message with WORK message.

● Subtransaction enters PREPARED state immediately after having executed the operation and before replying to the primary transaction.

[Figure: message flow between primary transaction and subtransaction: WORK & PREPARE, DONE & READY, COMMIT, ACK.]


One-Phase Commit (3)

● We can save the first commit phase; commit processing consists of only one phase to communicate the global result (hence the name).

● Two fewer messages per agent.

● Why does it only work with short transactions?
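
The saving per agent can be made explicit by listing the messages of both schemes for a subtransaction with a single operation. The list names and the helper `saved_messages` are assumptions for this example.

```python
# Messages exchanged with one agent that executes a single operation.
TWO_PHASE = ["WORK", "DONE", "PREPARE", "READY", "COMMIT", "ACK"]
ONE_PHASE = ["WORK & PREPARE", "DONE & READY", "COMMIT", "ACK"]


def saved_messages(before, after):
    """Number of messages saved by switching from one scheme to the other."""
    return len(before) - len(after)
```

Piggybacking PREPARE on WORK and READY on DONE removes one message in each direction.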


3PC


3PC – Introduction

● Weakness of all 2PC protocols

– dependency on coordinator

– failure while agents are in READY state may result in long blockings.

● Thus, some 'solutions' in practice refrain from transactional guarantees.

● Alternative: "non-blocking" commit protocols alleviate the situation but require more effort, e.g., three-phase commit (3PC).

● For practical purposes, 2PC generally is sufficient.


3PC – Variants

● Two variants of 3PC protocol:

– Variant 1: tolerates site failures. Non-blocking, except for total failures. (Total failures – all nodes are down.) Communication failures may result in inconsistencies.

– Variant 2: tolerates both communication and site failures, but blocking.

● We deal with Variant 1 first, then with Variant 2.


Non-Blocking Characteristic (1)

● (Recap:) A process is blocked if fixing an error/failure is necessary so that it can proceed.

● Non-blocking characteristic: if an operational process is uncertain, no other process (operational or failed) has decided commit.

[Figure: two message rounds. READY messages: non-blocking characteristic is not violated. COMMIT messages: non-blocking characteristic is typically violated.]

Non-Blocking Characteristic (3)

● 2PC does not have the non-blocking characteristic, because COMMIT messages do not arrive at the same time.

● Note:

– Non-blocking characteristic leaves aside processes that have just recovered and are finding out the state of the transaction.

– Non-blocking characteristic is not violated as long as all nodes are uncertain.


Non-Blocking Characteristic (4)

● Desired characteristic:

– uncertain processes may abort

– processes that have failed have not decided commit either.

→ No blocking.
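
The non-blocking characteristic can be written as a predicate over the states of all processes. The function name `non_blocking_holds`, the state names, and the (state, operational) pairs are assumptions for this example.

```python
def non_blocking_holds(processes):
    """processes: list of (state, operational) pairs, failed ones included.

    The characteristic is violated if an operational process is uncertain
    while some process, operational or failed, has already decided commit.
    """
    operational_uncertain = any(
        operational and state == "UNCERTAIN" for state, operational in processes
    )
    someone_committed = any(state == "COMMITTED" for state, _ in processes)
    return not (operational_uncertain and someone_committed)
```

If the predicate holds and the coordinator fails, an uncertain process can safely abort, because no other process can have committed.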


3PC – Overview

● PRE-COMMIT messages – end uncertainty, but not yet commit.

● PRE-COMMIT tells node that it will eventually receive COMMIT message if coordinator does not fail.

● Coordinator fails → nodes decide without coordinator and are not blocked.


3PC (1)

● First phase and abort case (not depicted here) as with 2PC.

● Additional phase only if all agents vote READY.

[Figure: 3PC message flow between TM1 (coordinator) and TM2 (agent), commit case. PREPARE; agent logs 'prepared' and sends READY; coordinator logs 'precommit' and sends PRECOMMIT; agent logs 'precommit' and sends PC-ACK; coordinator logs 'commit' and sends COMMIT; agent logs 'commit', releases locks, and sends ACK; end at the coordinator. 'Precommit' = 'Committable'.]

Precommit: agent has the intention to commit; different from commit!

3PC (2)

● Coordinator enters the intermediate state PRECOMMIT and writes the respective log entry.

● It informs all agents; they write a log entry as well and confirm.

● If k of n-1 PC-ACK messages have arrived, coordinator decides commit and writes respective log entry.

● Last phase – same as with 2PC.
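
The coordinator side of these steps can be sketched as follows. The function name `run_3pc_coordinator`, its parameters, and the returned lists are assumptions for this example; timeouts and the final ACK handling are left out.

```python
def run_3pc_coordinator(votes, pc_acks_received, k):
    """Return (log entries, messages sent) of a 3PC coordinator.

    votes: list of "READY" / "FAILED" votes from the n-1 agents.
    pc_acks_received: number of PC-ACK messages that have arrived.
    k: number of PC-ACKs required before the commit decision.
    """
    log, sent = [], ["PREPARE"]
    if not all(vote == "READY" for vote in votes):
        # First phase and abort case as with 2PC.
        log.append("abort")
        sent.append("ABORT")
        return log, sent
    # Additional phase: only if all agents voted READY.
    log.append("precommit")
    sent.append("PRECOMMIT")
    if pc_acks_received >= k:
        log.append("commit")
        sent.append("COMMIT")
    return log, sent
```

With fewer than k PC-ACKs the sketch stops in the PRECOMMIT state, which corresponds to the coordinator still waiting.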


3PC (3)

● Precommit of coordinator:

– assertion that it will not abort the transaction in the future

– TA may still abort if this coordinator fails.

● Failure of coordinator node still possible:

– timeout to recognize this,

– election of new coordinator.


Why COMMIT and ACK Still Needed?

● Node knows that it will eventually receive COMMIT message if coordinator does not fail.

● COMMIT also guarantees that transaction was completed successfully

● ACK proves that agent is in a consistent state


Timeouts in 3PC

● First phase and abort case (not depicted here) as with 2PC.

● Additional phase only if all agents vote READY.

[Figure: 3PC message flow as in "3PC (1)", with the five waiting points numbered 1 to 5.]

Dealing with Timeouts

● When do we wait?

1. Participants wait for PREPARE.

2. Coordinator waits for the votes.

3. Participants wait for PRE-COMMIT/ABORT.

4. Coordinator waits for PC-ACKs.

5. Participants wait for COMMIT.

● 1., 2. – unproblematic, abort is always possible.


Coordinator Failures (1)

● Case 4 ("Coordinator waits for PC-ACKs").

● Ignore missing PC-ACK messages; go on after timeout period.

● Agent must find out state of protocol after recovery. (Will be dealt with right away.)

● Non-blocking characteristic is not violated. (Recap non-blocking characteristic: if operational process is uncertain, no (other) process has decided commit.)


Coordinator Failures (2)

[Figure: 3PC message flow as in "3PC (1)", highlighting waiting points 3 and 5. Annotation: uncertainty ends.]

Coordinator Failures (3)

● Cases 3., 5. ("Participants wait ...")

● Available nodes elect new coordinator → election protocol

● New coordinator requests states of all available nodes (message STATE-REQ). → termination rules specify how protocol continues

● Why can a participant not simply commit in Case 5 ("participants wait for COMMIT")? The non-blocking property would typically be violated.


Coordinator Failures (4)

● Second node cannot commit – non-blocking property would be violated.

● Coordinator has failed – not guaranteed that protocol will terminate successfully.

[Figure: message sequence with READY messages, PRECOMMIT messages, and PC-ACK.]

Election Protocol (1)

● Select new coordinator if the old one failed.

● Prerequisite: linear ordering of processes ('<').

● UP_p – set of processes that p believes to be operational.

● New coordinator – first node according to that ordering.

● Message UR-ELECTED.

● Then messages STATE-REQ.
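
One step of this election, as seen by a single process p, can be sketched as follows. The function name `election_step`, the use of integers as process identifiers, and the returned triple are assumptions for this example.

```python
def election_step(p, up_p):
    """Decide what process p does after it suspects the coordinator.

    up_p: set of processes that p believes to be operational (includes p).
    Returns (action, message, receivers).
    """
    candidate = min(up_p)  # first node according to the linear ordering
    if candidate == p:
        # p itself is the new coordinator and asks the others for their states.
        return ("BROADCAST", "STATE-REQ", sorted(up_p - {p}))
    # Otherwise p tells the candidate that it has been elected.
    return ("SEND", "UR-ELECTED", [candidate])
```

Since every process evaluates its own UP set, two processes with different beliefs can name different candidates.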


Election Protocol (2)

● Illustration:

[Figure: five processes numbered 1 to 5 exchanging UR-ELECTED and STATE-REQ messages.]

Termination Rules (1)

● After new coordinator has collected states of available nodes:

– TR1: a process has aborted → coordinator decides abort and sends out ABORT messages.

– TR2: a process has committed → coordinator decides commit and sends out COMMIT messages.

– TR3: all processes that have reported their state are uncertain (PREPARED state) → coordinator decides abort and sends out ABORT messages.


Termination Rules (2)

– TR4: a process is committable, none is committed → PRE-COMMIT messages to processes in uncertain state, then wait for acks. Then commit decision, and COMMIT messages are sent out.
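
Rules TR1 to TR4 can be sketched as one function over the collected states. The function name `apply_termination_rules` and the four state names are assumptions for this example, and the reported states are assumed to contain only these four values.

```python
def apply_termination_rules(states):
    """states: states reported to the new coordinator (its own included).

    Returns (rule, messages the coordinator sends, in order).
    """
    if "ABORTED" in states:
        return "TR1", ["ABORT"]
    if "COMMITTED" in states:
        return "TR2", ["COMMIT"]
    if all(state == "UNCERTAIN" for state in states):
        return "TR3", ["ABORT"]
    # Remaining case: some process is committable, none is committed.
    # PRE-COMMIT goes to the uncertain processes; COMMIT follows the acks.
    return "TR4", ["PRE-COMMIT", "COMMIT"]
```

TR3 is safe only because a committable or committed process would have shown up among the answers if the non-blocking characteristic holds and there are no communication failures.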


Failures during Termination

● Point that is still open: how to deal with failures during termination.

– Ignore failures of agents.

– Failures of coordinator – algorithm is repeated, but with fewer nodes → termination. I.e., nodes do not need to remain available.


Summary of 3PC Features

● Conclusion:

– no blockings any more (site failures only).

● Why?

● Situation that would lead to blocking with 2PC, but not with 3PC?


2PC vs. 3PC

● 3PC does not have to wait, but elects new coordinator instead.

● Intermediate state with 3PC → pleasant situation that we know votes of all other nodes without decision having been taken.

[Figure: 2PC: READY messages followed by the COMMIT decision; 3PC: READY messages followed by PRE-COMMIT.]

Communication Failures

● What happens in case of communication failures?

● Partitions, with different outcomes of protocol!

[Figure: message sequence with READY messages and PRE-COMMIT.]

Election and Communication Failures (2)

● Illustration:

[Figure: five processes numbered 1 to 5 exchanging UR-ELECTED and STATE-REQ messages.]

● What may happen if there are communication failures (disconnected partitions)?

Election and Communication Failures (2)

● Effects that may occur, e.g., because of message delays.

– New coordinator q; p' does not yet know that c is not available any more. p' receives message STATE-REQ. p' concludes that c is not available any more.

– p' receives message STATE-REQ from q, then one from q', with q' > q. What does it mean?


Recovery after Total Failures

● Protocol typically blocks in case of total failures (but this is a pathological case).

● Process p that has just recovered: in general, autonomous decision is not feasible.

● Namely, decision for commit or abort could have been taken after failure of p.

● Only process that has failed last can do this.

● The only possible approach: wait for recovery of this process.


3PC + Communication Failures

● In the following: second variant of 3PC protocol.

● Variant 1 assumes that there are no communication failures.

● Illustration: Components A and B, separated from each other. All processes in A: uncertain, all processes in B: committable. Under the termination rules of Variant 1, A would decide abort (TR3) while B would decide commit (TR4).

● Characteristics of Variant 2: tolerates communication and site failures, but blocking.


3PC – Variant 2

● Variant 2: Tolerates communication and site failures, but blocking.

● Coordinator that decides must be able to communicate with majority of processes.

– Idea: rule out wrong decisions of secondary coordinators that were elected by mistake due to network failures.


Differences to Variant 1

● First phase and abort case (not depicted here) as with 2PC.

● Additional phase only if all agents vote READY.

[Figure: 3PC message flow for the commit case, as in "3PC (1)".]

Abort in 3PC

● Second round of messages in ABORT case as well.

[Figure: 3PC message flow between TM1 (coordinator) and TM2 (agent), abort case. PREPARE; agent logs 'failed' and sends FAILED; coordinator logs 'preabort' and sends PREABORT; agent logs 'preabort' and sends PREABORT-ACK; coordinator logs 'abort' and sends ABORT; agent logs 'abort', releases locks, and sends ACK; end at the coordinator. 'Preabort' = 'Abortable'. Annotation: Intention.]

Intention in 3PC – Variant 2

● Intention
– Coordinator forms its intention as soon as it knows the states of a majority of processes.
– Intention = decision that the coordinator will take as soon as a majority of processes has been informed (unless some process is already in the aborted or committed state).
– Intention Commit: at least one process is committable. Intention Abort: all processes are uncertain.


Intention (cont.)

● Intention must have been communicated to majority of processes before coordinator takes decision.

● Intention is communicated to processes (PRE-COMMIT/PRE-ABORT).

● Coordinator waits for PRE-COMMIT-ACK or PRE-ABORT-ACK of absolute majority.

● Coordinator sends COMMIT or ABORT to component.


Abortable and Committable (1)

[Figure: coordinator with intention Commit; two nodes in state Committable, three nodes in state Prepared.]


Abortable and Committable (2)

● After timeout, green component elects new coordinator.

● New coordinator collects states from nodes in component.

● New coordinator comes up with intention ‘Abort’.

[Figure: two nodes remain in state Committable; the new coordinator C sends PRE-ABORT messages to the nodes in its component.]


Abortable and Committable (3)

● What happens when communication failure goes away?

● Participants in Abortable state will not react to PRE-COMMIT messages, so the old coordinator will not be able to form a majority.

[Figure: two nodes in state Committable, three nodes in state Abortable.]


What If There Were No PRE-ABORT Messages?

[Figure: two nodes in state Committable, the other nodes in state PREPARED; new coordinator C with intention Abort; PRECOMMIT messages.]


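● If the new coordinator fails after deciding abort, and if communication is restored afterwards, the old coordinator would enforce commit: without PRE-ABORT messages the other nodes are still in state PREPARED and would react to its PRECOMMIT messages.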


Majority Termination Rules (1)

● Majority Termination Rules specify how the coordinator decides, depending on the states received.
– Coordinator receives a committed state → commit decision, COMMIT messages.
– Coordinator receives an aborted state → abort decision, ABORT messages.

In what follows, we assume that no committed or aborted states have been received.


Majority Termination Rules (2)

1. Sequence:
a) Coordinator: at least one committable state and a majority of non-abortable states (prepared or committable) ⇒ PRE-COMMIT messages to all sites that have not replied committable.
b) Site: PRE-COMMIT → new state committable → PRE-COMMIT-ACK
c) Coordinator: if sites in state committable form a majority: commit; otherwise blocking.


Majority Termination Rules (3)

2. Sequence:
a) Coordinator: majority of non-committable states (prepared or abortable) ⇒ PRE-ABORT messages to all sites that have not replied abortable.
b) Site: PRE-ABORT → new state abortable → PRE-ABORT-ACK
c) Coordinator: if sites in state abortable form a majority: abort; otherwise blocking.

Blocking: e.g., insufficient number of responses from other nodes.
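The two sequences of the majority termination rules can be sketched in Python. This is a minimal illustration, not the lecture's implementation. The names `State`, `termination_step` and `final_step` are hypothetical. The sketch assumes that "majority" means an absolute majority of all participating sites, that the coordinator's own state is included in the collected states, and that the commit sequence is checked before the abort sequence.

```python
from collections import Counter
from enum import Enum


class State(Enum):
    PREPARED = "prepared"
    COMMITTABLE = "committable"
    ABORTABLE = "abortable"
    COMMITTED = "committed"
    ABORTED = "aborted"


def termination_step(states, n_total):
    """Step a): choose the coordinator's next action from the collected states.

    states  -- list of State values reported by the reachable sites
               (assumption: includes the coordinator's own state)
    n_total -- total number of sites participating in the transaction
    """
    counts = Counter(states)
    majority = n_total // 2 + 1  # assumption: absolute majority of all sites

    # A committed or aborted state settles the outcome immediately.
    if counts[State.COMMITTED] > 0:
        return "COMMIT"
    if counts[State.ABORTED] > 0:
        return "ABORT"

    non_abortable = counts[State.PREPARED] + counts[State.COMMITTABLE]
    non_committable = counts[State.PREPARED] + counts[State.ABORTABLE]

    # Commit sequence: one committable state and a majority of non-abortable states.
    if counts[State.COMMITTABLE] >= 1 and non_abortable >= majority:
        return "PRE-COMMIT"
    # Abort sequence: a majority of non-committable states.
    if non_committable >= majority:
        return "PRE-ABORT"
    # Too few responses for either sequence.
    return "BLOCK"


def final_step(states, n_total):
    """Step c): decide once the acknowledgements have updated the states."""
    counts = Counter(states)
    majority = n_total // 2 + 1
    if counts[State.COMMITTABLE] >= majority:
        return "COMMIT"
    if counts[State.ABORTABLE] >= majority:
        return "ABORT"
    return "BLOCK"
```

With five sites in total, a component of three prepared sites yields `PRE-ABORT`, while a component of two committable sites alone yields `BLOCK`, because it cannot reach a majority.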


Election Protocol

● Problem: due to communication failures, multiple coordinators could be elected.

● Each process p maintains a set UP_p of processes that it believes it can currently communicate with.

● p sends the message UR-ELECTED to the process in UP_p with the smallest ID.

● A process q ignores UR-ELECTED if it can communicate with a process with a smaller ID.

● A process ignores messages STATE-REQ, PRE-COMMIT etc. from any other coordinator.
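The election rule can be illustrated with a small Python sketch. The function name `elect` and the dictionary layout are hypothetical. The sketch assumes that every set UP_p contains p itself and that all processes run the election at the same time.

```python
def elect(up_sets):
    """Return the set of process IDs that accept the coordinator role.

    up_sets -- maps each process ID p to UP_p, the set of IDs that p
               believes it can currently communicate with
               (assumption: UP_p always contains p itself)
    """
    coordinators = set()
    for p, up in up_sets.items():
        # p sends UR-ELECTED to the process with the smallest ID in UP_p.
        q = min(up)
        # q ignores the message if it can reach a process with a smaller ID.
        if min(up_sets[q]) == q:
            coordinators.add(q)
    return coordinators


# Network split into the components {1, 2} and {3, 4, 5}.
partitioned = {
    1: {1, 2}, 2: {1, 2},
    3: {3, 4, 5}, 4: {3, 4, 5}, 5: {3, 4, 5},
}
print(elect(partitioned))
```

In the partitioned example each component elects its own coordinator (processes 1 and 3). This is the situation the majority requirement guards against.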


Comparison

● Number of messages (including optimized treatment of read-only subtransactions)

● Transaction has executed at n nodes (n > 1), m nodes with read-only subtransactions (m < n)


| Protocol | General | Example 1 (n=2, m=0) | Example 2 (n=10, m=5) |
|---|---|---|---|
| 1PC | 2(n-1) | 2 | 18 |
| Linear 2PC | 2n-1 | 3 | 19 |
| Centralized/hierarchical 2PC | 4(n-1)-2m | 4 | 26 |
| 3PC | 6(n-1)-4m | 6 | 34 |
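The message counts can be checked with a short Python sketch that evaluates the four general formulas. The function name `message_counts` is hypothetical and only serves this illustration.

```python
def message_counts(n, m):
    """Number of messages per commit protocol.

    n -- number of nodes at which the transaction has executed (n > 1)
    m -- number of nodes with read-only subtransactions (0 <= m < n)
    """
    if not (n > 1 and 0 <= m < n):
        raise ValueError("requires n > 1 and 0 <= m < n")
    return {
        "1PC": 2 * (n - 1),
        "Linear 2PC": 2 * n - 1,
        "Centralized/hierarchical 2PC": 4 * (n - 1) - 2 * m,
        "3PC": 6 * (n - 1) - 4 * m,
    }


for n, m in [(2, 0), (10, 5)]:
    print(n, m, message_counts(n, m))
```

For (n=2, m=0) this gives 2, 3, 4 and 6 messages. For (n=10, m=5) it gives 18, 19, 26 and 34, matching both example columns.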


Comparison (2)

● For 1PC and linear 2PC there are no savings for read-only subtransactions: with long read locks, two messages per agent are needed for the lock release in these protocols as well.

● 3PC: significantly more messages (6·(n – 1)) and log writes (3n).
