CS 347: Distributed Databases and Transaction...

80
CS 347 Notes07 1 CS 347: Distributed Databases and Transaction Processing Notes07: Reliable Distributed Database Management Hector Garcia-Molina

Transcript of CS 347: Distributed Databases and Transaction...

Page 1: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 1

CS 347: Distributed Databases and

Transaction ProcessingNotes07: Reliable

DistributedDatabase Management

Hector Garcia-Molina

Page 2: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 2

Reliable distributed database management

• Reliability• Failure models• Scenarios

Page 3: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 3

Reliability

• Correctness– Serializability– Atomicity– Persistence

• Availability

Page 4: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 4

Types of failures

• Processor failures– Halt, delay, restart, bezerk, ...

• Storage failures– Volatile, non-volatile, atomic write,

transient errors, spontaneous failures

• Network failures– Lost message, out-of-order messages,

partitions, bounded delay

Page 5: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 5

• Malevolent failures• Multiple failures• Detectable failures

More Types of failures

Page 6: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 6

Failure models

• Cannot protect against everything• Unlikely failures (e.g., flooding in the Sahara)

• Expensive to protect failures (e.g., earthquake)

• Failures we know how to protect against (e.g., message sequence numbers; stable storage)

Page 7: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 7

Failure model:

DesiredEvents

ExpectedUndesired

Unexpected

Page 8: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 8

Node models

(1) Fail-stop nodestime

perfect halted recoveryperfect

Volatilememory lost

Stablestorage ok

Page 9: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 9

(2) Byzantine nodesA Perfect

Perfect Arbitrary failure RecoveryB

C

At any given time, at most some fraction f of nodesfailed (typically f < 1/2 or f < 1/3)

Node models

Page 10: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 10

Network models

(1) Reliable network- in order messages- no spontaneous messages- timeout TD

I.e., no lost messages, except for node failures

If no ack in TD sec.Destination down

(not paused)

Page 11: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 11

Variation of reliable net:

• Persistent messages– If destination down, net will

eventually deliver message– Simplifies node recovery, but leads to

inefficiencies (hides too much)– Not considered here

Page 12: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 12

Network models

(2) Partitionable network- In order messages- No spontaneous messages

- no timeout; nodes can have different view of failures

Page 13: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 13

Scenarios

• Reliable network– Fail-stop nodes

– No data replication (1)– Data replication (2)

• Partitionable network– Fail-stop nodes (3)

Page 14: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 14

No Data Replication

• Reliable network, fail-stop nodes

• Basic idea: node Pα controls X

Pα Item Xnet

Page 15: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 15

No Data Replication

• Reliable network, fail-stop nodes

• Basic idea: node Pα controls X

Pα Item Xnet

- Single control point simplifies concurrency control, recovery

- Not an availability hit: if Pα down, X unavailable too!

Page 16: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 16

“Pα controls X” means- Pα does concurrency control for X- Pα does recovery for X

Page 17: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 17

Say transaction T wants to access X:

PT is process that represents T at this node

PT

req

Local DMBS

Lockmgr

LOG X

Page 18: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 18

Process models

(A) Cohorts Spawn process

CommunicationData Access

USER

T1

LocalDMBS

T2

LocalDMBS

T3

LocalDMBS

Page 19: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 19

Process models

(B) Transaction servers (manager)USER

LocalDMBS

TransMGR

LocalDMBS

TransMGR

LocalDMBS

TransMGR

Page 20: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 20

• Cohorts: application code responsible for remote access

• Transaction manager: “system” handles distribution, remote access

Page 21: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 21

.

Distributed commit problem

Action:a1,a2

Action:a3

Action:a4,a5

Transaction T

Page 22: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 22

Centralized two-phase commit

Coordinator Participant

I

W

C

A

I

W

C

A

go exec*

nok abort*

ok* commit *

commit -

exec ok

exec nok

abort -

Page 23: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 23

• Notation: Incoming message Outgoing message( * = everyone)

• When participant enters “W” state:– it must have acquired all resources

– it can only abort or commit if so instructedby a coordinator

• Coordinator only enters “C” state if all participants are in “W”, i.e., it is certain that all will eventually commit

Page 24: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 24

Handling node failures

• Coordinator and participant logs are used to reconstruct state before failure

Page 25: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 25

-> Example: after participant fails: Log:

T1X

undo/redoinfo

T1Y

info... ...

T1“W”state

Page 26: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 26

At recovery:

• T1 is in “W” state• Obtain X,Y write locks (no read locks!)

• Wait for message from coordinator(or ask coordinator for outcome)

Page 27: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 27

Other examples:

• No “W” record on log ⇒ abort T1

• See “C” record on log ⇒ finish T1

Page 28: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 28

• Add timeouts to cope with messages lost during crashes

• Add finish (“F”) state for coordinator – all done, can forget outcome

Next

Page 29: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 29

Coordinator

I

W

C

A

F

_go_exec*

c-ok*-

nok*-

nokabort*

ok*commit*

t=timeout

cping=coord. ping

Page 30: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 30

Coordinator

I

W

C

A

F

_go_exec*

c-ok*-

nok*-

ping- _t_

cping

pingabort

pingcommit

_t_cping

_t_abort*

nokabort*

ok*commit*

t=timeout

cping=coord. ping

Page 31: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 31

Participant

I

W

C

A

execok

execnok

commitc-ok

abortnok

Page 32: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 32

Participant

I

W

C

A

execok

execnok

commitc-ok

abortnok

cping , _t - ping

cpingdone

cpingdone

“done” message counts as eitherc-ok or n-ok for coordinator

Page 33: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 33

Participant

I

W

C

A

execok

equivalent to finish state

execnok

commitc-ok

abortnok

cping , _t - ping

cpingdone

cpingdone

“done” message counts as eitherc-ok or n-ok for coordinator

Page 34: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 34

Presumed abort protocol

• “F” and “A” states combined in coordinator

• Saves persistent space (forget quicker)

• Presumed commit is analogous

Page 35: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 35

Presumed abort-coordinator (participant unchanged)

I

W

C

A/F

_go_exec*

ping-

c-ok*-

pingabort

pingcommit

_t_cping

nok, tabort*

ok*commit*

Page 36: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 36

Remember: all state transitions must be loggedExample: tracking who has sent “OK” msgsLog at coord:

• After failure, we know still waiting for OK from node b

• Alternative: do not log receipts of “OK”s abort T1

T1start

part={a,b}

T1OK

from a RCV

... ......

Page 37: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 37

Example: logging receipt of C-OK messages

• If logged, can recover state• If not logged:

– resend commit *– participants reply “done” if duplicate

Page 38: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 38

2PC is blocking

Sample scenario:Coord P2

W P1 P3

WP4

W

Page 39: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 39

Case I:P1 → “W”; coordinator sent commits P1 → “C”

Case II:P1 → NOK; P1 → A

⇒ P2, P3, P4 (surviving participants)

cannot safely abort or commit transaction

coord

P1

P2

P3

P4w

w

w

Page 40: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 40

Variants of 2PC

• Linear Coord

• Hierarchical

ok ok ok

commit commit commit

Page 41: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 41

• Distributed

– Nodes broadcast all messages– Every node knows when to commit

Variants of 2PC

Page 42: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 42

3PC = non-blocking commit

• Assume: failed node is down forever

• Key idea: before committing, coordinator tells participants everyone is ok

Page 43: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 43

Coordinator Participant

I

W

P

A

_go_exec*

ack*commit*

nokabort*

ok*pre *

3PC

C

I

W

P

A

_exec_ok

commit-

execnok

preack

C

abort-

Page 44: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 44

3PC recovery rules: termination protocol

• Survivors try to complete transaction, based on their current states

• Goal:– If dead nodes committed or aborted, then

survivors should not contradict!

– Else, survivors can do as they please...

surv

ivors

Page 45: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 45

• Let {S1,S2,…Sn} be survivor sites

• If one or more Si = COMMIT ⇒ COMMIT T

• If one or more Si = ABORT ⇒ ABORT T

• If one or more Si = PREPARE ⇒T could not have aborted ⇒ COMMIT T

• If no Si = PREPARE (or COMMIT) ⇒T could not have committed ⇒ ABORT T

surv

ivors

Page 46: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 46

Example:

P

W

W

?

?

Page 47: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 47

Example:

I

W

W

?

?

Page 48: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 48

Example:

P

P

C

?

?

Page 49: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 49

Example:

P

W

A

?

?

Page 50: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 50

➳ Once survivors make decision, they must select new coordinator to continue 3PC

P P C C

W P C C W P P C

Decide tocommit

Time1

Time2

Time3

Time4

Page 51: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 51

Note: when survivors continue 3PC, failed nodes do not count

E.g., “OK*” ≡ when OK’s received from non-failed nodesW

P

ok*pre*

Page 52: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 52

Note: 3PC unsafe with partitions!

W

W

W

P

P

abort commit

Page 53: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 53

Node recovery:

• After node N recovers from failure:– do not participate in termination protocol

(why?)

W

W

W

?

P

→ A

Page 54: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 54

Node recovery:

• After node N recovers from failure:– do not participate in termination protocol

(why?)

W

W

W

?

P

→ A

later on...

Page 55: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 55

Node recovery:

• After node N recovers from failure:– do not participate in termination protocol

(why?)– wait until it hears commit or abort decision

from operational node

?ping

ping

ping“C”(or “A”)

Page 56: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 56

• Waiting for commit/abort decision from other node is ok, unless all fail:

? ? ? ? ?

Page 57: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 57

Two options for all-failed problem:(A) Wait for all to recover(B) Majority commit

Page 58: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 58

Option A

• Recovering node waits for either:(1) commit/abort outcome for T from other node(2) all nodes that participated in T are up and recovering:

⇒ then 3PC can continue(no danger that a failed node could haveaborted or committed)

Page 59: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 59

Option B

• Nodes are assigned votes, total is VMajority is V+1 e.g., V=5 2 Maj=3 V=6 Maj=4

• To make state transitions, coordinator requires messages from nodes with a majority of votes

Page 60: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 60

Example(1): Coord P2 → W P1 P3 → Wp4 → W

• Nodes P2, P3, P4 enter “W” state and fail• When they recover, coord. and P1 are down• Each node has 1 vote, V=5, Maj=3

?

?

?

Page 61: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 61

Example(1): Coord P2 → W P1 P3 → Wp4 → W

• Nodes P2, P3, P4 enter “W” state and fail• When they recover, coord. and P1 are down• Each node has 1 vote, V=5, Maj=3

?

?

?

• Since P2, P3, P4 have majority, they know coord. could not have gone to “P” without at least one of their votes

• Therefore, T can be aborted!

Page 62: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 62

Example(2): Coord P3 → ”P” P1 P4 → ”W” P2

• Each node has 1 vote; V=5, Maj=3• Nodes fail after entering states shown;

P3, P4 recover

?

?

Page 63: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 63

Example(2): Coord P3 → ”P” P1 P4 → ”W” P2

• Each node has 1 vote; V=5, Maj=3• Nodes fail after entering states shown;

P3, P4 recover

?

?

• Termination rule says we can try to commit, but P3, P4 do not have enough votes, so they do nothing!

• P3, P4 doing nothing is good because later on, coord. P1, P2 could abort T

Page 64: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 64

1

Summary: Majority rule ensures that any

decision (e.g., Preparing, committing) will be

knownto any future group making a decision

1 1

22

decision # 2

decision # 1

Page 65: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 65

Important Detail for Majority 3PC

• Example:

W

W

W

?

P

➜ A

Page 66: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 66

Important Detail for Majority 3PC

• Example:

W

W

W

?

P

➜ A

➜ P ➜ C

Page 67: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 67

Need “Prepare To Abort” State

I

W

PC PA

_go_exec*

ackC*commit*

nokpreA*ok*

preC *

C

I

W

PC

A

_exec_ok

commit-

execnok

preCackC

C

preAackA

coordinator participant

A

ackA*abort*

PA

abort-

Page 68: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 68

Example Revisited

W

W

W

?

PC

➜ PA

Page 69: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 69

Example Revisited

W

W

W

?

PC

➜ PA

➜ PC ➜ C

OK to commit sincetransaction could not have aborted

Page 70: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 70

Example Revisited -II

W

W

W

?

PC

➜ PA

➜ PA

Page 71: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 71

Example Revisited -II

W

W

W

?

PC

➜ PA

➜ PA

No decision:Transaction could have aborted orcould have committed... Block!

➜ PA

Page 72: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 72

3PC with Majority Voting

• If survivors have majority andall states W ⇒ try to abort

• If survivors have majority andstates in {W, PC, C} ⇒ try to commit

• If survivors have majority andstates in {W, PA, A} ⇒ try to abort

• Otherwise block

Page 73: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 73

Comparison

Option A: only nodes that have not failed participate in 3PC

• Any size group can terminate(even one node)

• If all nodes fail, must wait for all to recover

Page 74: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 74

Option B: Majority voting• A group of failed+recovering nodes

can terminate transaction (with majority of votes)

• Need majority for every commit➽ blocking protocol!

Comparison

Page 75: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 75

Reminder

• When node recovers, it uses its log in a normal fashion to determine status of transactions:– if commit found in log ⇒ redo if

necessary– if abort found (or no “W” record) ⇒

rollback if necessary

Page 76: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 76

– if in “W” state (or “P” state): • reclaim locks held by T before crash• try to terminate T (with other nodes)

– after locks claimed for “in doubt” transactions, start normal processing

Reminder - Continued

Page 77: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 77

Final note

• If nodes use 2P locking, global deadlocks possible

Local WFG: Local WFG: no cycles no cycles!

T1

T2

T1

T2

Page 78: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 78

• Need to “combine” WFGs to discover global deadlock

T1

T2

T1

T2

T1

T2e.g., central

detection node

Page 79: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 79

Problem: False deadlocks

T1

T2

T1

T2

T1

T2

T1

T2

Time1

Time2

Time3

infosent

infosent

T1

T2

at centralsite:

Page 80: CS 347: Distributed Databases and Transaction Processinginfolab.stanford.edu/~venetis/cs347/notes/347Notes07.pdf · Transaction Processing Notes07: Reliable Distributed Database Management

CS 347 Notes07 80

• Many deadlock solutions– Distributed vs. centralized– Detection vs. prevention

• timeouts• wait-die• wound-wait

• Covered in CS245