CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control
description
Transcript of CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control
![Page 1: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/1.jpg)
CS 347 Notes05 1
CS 347: Parallel and Distributed
Data ManagementNotes05: Concurrency
Control
Hector Garcia-Molina
![Page 2: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/2.jpg)
CS 347 Notes05 2
Overview
• Concurrency Control– Schedules and Serializability– Locking– Timestamp Control
• Failure Recovery– next set of notes...
![Page 3: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/3.jpg)
CS 347 Notes05 3
Schedule
• Just like in a centralized system, a schedule represents how a set of transactions were executed
• Schedules may be “good” or “bad” (preserve constraints)
![Page 4: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/4.jpg)
CS 347 Notes05 4
T1 T2
1 (T1) a X 5 (T2) c X
2 (T1) X a+100 6 (T2) X 2c
3 (T1) b Y 7 (T2) d Y
4 (T1) Y b+100 8 (T2) Y 2d
ExampleX Y
Node 1 Node 2
constraint: X=Y
Precedence relation
![Page 5: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/5.jpg)
CS 347 Notes05 5
(node X) (node Y)1 (T1) a X
2 (T1) X a+100
5 (T2) c X 3 (T1) b Y
6 (T2) X 2c 4 (T1) Y b+100
7 (T2) d Y
8 (T2) Y 2d
If X=Y=0 initially, X=Y=200 at end (always good?)
Schedule S1 Precedence: intra-transactioninter-transaction
![Page 6: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/6.jpg)
CS 347 Notes05 6
Let T= {T1,T2,Tn} be a set of transactions.
A schedule S over T is a partial order withordering relation <S where:
(1) S = Ti
(2) <S <i
(3) for any two conflicting operations p,q S,either p <S q or q <S p
Definition of Schedule
i=1
N
Ni=1
![Page 7: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/7.jpg)
CS 347 Notes05 7
Example
(T1) r1[X] W1[X]
(T2) r2[X] W2[Y] W2[X]
(T3) r3[X] W3[X] W3[Y] W3[Z]
r2[X] W2[Y] W2[X]
S1: r3[Y] W3[X] W3[Y] W3[Z]
r1[X] W1[X]
![Page 8: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/8.jpg)
CS 347 Notes05 8
Definition of P(S)
• The precedence graph for schedule S, P(S), is a directed graph where– nodes: the transactions in S
– edges: Ti Tj is an edge IFF p Ti, q Tj such thatp, q conflict and p <S q
![Page 9: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/9.jpg)
CS 347 Notes05 9
Example
S1: r1[X] w1[X] w1[Y]
r3[X] w3[X]
r2[X] w2[Y]
P(S1): T2 T1 T3
![Page 10: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/10.jpg)
CS 347 Notes05 10
Serializability Theorem
Theorem: A schedule S is serializable IFF P(S) is acyclic.
![Page 11: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/11.jpg)
CS 347 Notes05 11
Enforcing Serializability
• Locking• Timestamps
![Page 12: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/12.jpg)
CS 347 Notes05 12
Locking
• Just like in a centralized system…• But with multiple lock managers
D1
locksforD1
scheduler 1
node 1
D2
locksforD2
scheduler 2
node 2
...
![Page 13: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/13.jpg)
CS 347 Notes05 13
Locking
• Just like in a centralized system…• But with multiple lock managers
D1
locksforD1
scheduler 1
node 1
D2
locksforD2
scheduler 2
node 2
...
Taccess &lock data
access &lock data
(release all locks at end)
![Page 14: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/14.jpg)
CS 347 Notes05 14
Locking Rules
• Well-formed transactions• Legal schedulers• Two-phase transactions
• These rules guarantee serializable schedules
![Page 15: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/15.jpg)
CS 347 Notes05 15
What about
• Locking in a shared-memory architecture?
• Locking in a shared-disks architecture?
![Page 16: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/16.jpg)
Locking with Shared Memory
• Where does lock table live?• How do we avoid race conditions?
CS 347 Notes05 16
P P P...
M
![Page 17: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/17.jpg)
Locking with Shared Disks
CS 347 Notes05 17
...
P
M
P P
M M
• Where does lock table live?• How do we avoid race conditions?
![Page 18: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/18.jpg)
Shared Disk Locking, Cases to Discuss• Control of Data Partitioned;
fixed partition function known by everyone
• Dynamic partition of control– For each DB object i we need
• LT(i): lock table entry for i (transaction that has lock, waiting transactions, lock mode, etc...)
• W(i): at what processor is LT(i) currently?
– Replicate W(i) at all processors (why?)– Need replicated data management scheme!
CS 347 Notes05 18
![Page 19: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/19.jpg)
CS 347 Notes05 19
Overview
• Concurrency Control– Schedules and Serializability– Locking– Timestamp Control
• Failure Recovery– next set of notes...
next
![Page 20: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/20.jpg)
CS 347 Notes05 20
Timestamp Ordering Schedulers
• Basic idea:
- assign timestamp as transaction begins
- if ts(T1) < ts(T2) … < ts(Tn), then scheduler produces history equivalent to T1,T2,T3,T4, ... Tn
![Page 21: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/21.jpg)
CS 347 Notes05 21
TO Rule
If pi[x] and qj[x] are conflicting operations, then pi[x] is executed before qj[x]
(pi[x] <S qj[x])
IFF ts(Ti) < ts(Tj)
![Page 22: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/22.jpg)
CS 347 Notes05 22
Example: a non-serializable schedule S2
(Node X) (Node Y)
(T1) a X (T2) d Y
(T1) X a+100 (T2) Y 2d
(T2) c X (T1) b Y
(T2) X 2c (T1) Y b+100
![Page 23: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/23.jpg)
CS 347 Notes05 23
Example: a non-serializable schedule S2
(Node X) (Node Y)
(T1) a X (T2) d Y
(T1) X a+100 (T2) Y 2d
(T2) c X (T1) b Y
(T2) X 2c (T1) Y b+100
ts(T1) < ts(T2)
reject!
abort T1
![Page 24: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/24.jpg)
CS 347 Notes05 24
Example: a non-serializable schedule S2
(Node X) (Node Y)
(T1) a X (T2) d Y
(T1) X a+100 (T2) Y 2d
(T2) c X (T1) b Y
(T2) X 2c (T1) Y b+100
ts(T1) < ts(T2)
reject!
abort T1
abort T2
abort T1
abort T2
![Page 25: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/25.jpg)
CS 347 Notes05 25
Strict T.O.
• Lock written items until it is certain that writing transaction has been successful (avoid cascading rollbacks)
![Page 26: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/26.jpg)
CS 347 Notes05 26
Example Revisited
(Node X) (Node Y)
(T1) a X (T2) d Y
(T1) X a+100 (T2) Y 2d
(T2) c X (T1) b Y
ts(T1) < ts(T2)
reject!abort T1
delay
![Page 27: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/27.jpg)
CS 347 Notes05 27
Example Revisited
(Node X) (Node Y)
(T1) a X (T2) d Y
(T1) X a+100 (T2) Y 2d
(T2) c X (T1) b Y
ts(T1) < ts(T2)
reject!abort T1
delay
abort T1
(T2) c X
(T2) X 2c
![Page 28: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/28.jpg)
CS 347 Notes05 28
Enforcing T.O.
• For each data item X:MAX_R[X]: maximum timestamp of a transaction that read
XMAX_W[X]: maximum timestamp of a transaction that
wrote X
rL[X]: # of transactions currently reading X (0,1,2,…)wL[X]: # of transactions currently writing X (0 or 1)
![Page 29: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/29.jpg)
CS 347 Notes05 29
T.O. Scheduler - Part 1
ri[X] arrivesIF ts(Ti) < MAX_W[X] THEN ABORT Ti
ELSE [ IF ts(Ti) > MAX_R[X] THEN MAX_R[X] ts(Ti);
IF queue is empty AND wL[X] = 0 THEN
[rL[X] rL[X]+1; START READ OF X]
ELSE add (r, Ti) to queue]
![Page 30: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/30.jpg)
CS 347 Notes05 30
Wi[X] arrivesIF ts(Ti) < MAX_W[X] OR ts(Ti) < MAX_R[X] THEN ABORT Ti
ELSE [ MAX_W[X] ts(Ti);
IF queue is empty AND wL[X]=0 AND rL[X]=0
THEN [wL[X] 1; WRITE X; WAIT FOR Ti TO FINISH ]
ELSE add (w, Ti) to queue ]
T.O. Scheduler - Part 2
![Page 31: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/31.jpg)
CS 347 Notes05 31
When o finishes (o is r or w) on X
oL[X] oL[X] - 1; NDONE TRUE
WHILE NDONE DO[ let head of queue be (q, Tj); (smallest timestamp)
IF q=w AND rL[X]=0 AND wL[X]=0 THEN
[remove (q,Tj); wL[X] 1;
WRITE X AND WAIT FOR Tj TO FINISH ] ELSE IF q=r AND wL[X]=0 THEN
[remove (q,Tj); rL[X] rL[X] +1; START READ OF X]
ELSE NDONE FALSE ]
T.O. Scheduler - Part 3
![Page 32: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/32.jpg)
CS 347 Notes05 32
Note about the code
for reads: [rL[X] rL[X]+1; START READ OF X]
for writes: [wL[X] 1; WRITE X; WAIT FOR Ti TO FINISH ]
Meaning: In Part 3, the end of a write isonly processed when all writes for itstransaction have completed.
![Page 33: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/33.jpg)
CS 347 Notes05 33
If a transaction is aborted, it must be retired with a new, larger timestamp
MAX_R[X]=10 T ts(T)=8
MAX_W[X]=9
Xread X
..
..
![Page 34: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/34.jpg)
CS 347 Notes05 34
If a transaction is aborted, it must be retired with a new, larger timestamp
MAX_R[X]=10 T ts(T)=8
MAX_W[X]=9
Xread X
..
..
ts(T)=11
Starvation possible
![Page 35: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/35.jpg)
CS 347 Notes05 35
Thomas Write Rule
MAX_R[X] MAX_W[X]
ts(Ti)
Ti wants to write X
![Page 36: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/36.jpg)
CS 347 Notes05 36
Change in T.O. Scheduler:
When Wi[X] arrivesIF ts(Ti)<MAX_R[X] THEN ABORT Ti
ELSE IF ts(Ti)<MAX_W[X] THEN[IGNORE* THIS WRITE (tell Ti it was OK)]
ELSE [process write as before…
MAX_W[X] ts(Ti);
IF queue is empty AND wL[X]=0 AND rL[X] =0 THEN
[wL[X] 1; WRITE X and WAIT FOR Ti TO FINISH]
ELSE add (W, Ti) to queue]
MAX_R[X] MAX_W[X]
ts(Ti)
Ti wants to write X
* Ignore if transaction that wrote MAX_X[X] did not abort
![Page 37: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/37.jpg)
CS 347 Notes05 37
Question
ts(Ti)Ti wants to write
MAX_W[X]
MAX_R[X]
• Why can’t we let Ti go ahead?
![Page 38: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/38.jpg)
CS 347 Notes05 38
Question
ts(Ti)Ti wants to write
MAX_W[X]
MAX_R[X]
• Why can’t we let Ti go ahead?
Tj read
MAX_R[X] is only the latest read; there could be a Tj read as shown...
![Page 39: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/39.jpg)
CS 347 Notes05 39
Optimizations
• Update MAX_R, MAX_W when action executed, not when action put on queue
Example: MAX_W[X]=9 or 7?
W, ts=9
W, ts=8
W, ts=7
active write
X:
![Page 40: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/40.jpg)
CS 347 Notes05 40
• Use multiple versions of data
Value written with ts=9
..
Optimizations
ri[x] ts(Ti)=8Value written with ts=7
X:
![Page 41: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/41.jpg)
CS 347 Notes05 41
2PL TO
T1: w1[Y]T2: r2[X] r2[Y] w2[Z]
ts(T1)<ts(T2)<ts(T3)T3: w3[X]
S: r2[X] w3[X] w1 [Y] r2[Y] w2[Z] S could be produced with T.O. but not
with 2PL
![Page 42: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/42.jpg)
CS 347 Notes05 42
Are all 2PL schedules T.O.?
T.O. schedules
2PL schedules
previous example
anyexamples
here??
![Page 43: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/43.jpg)
CS 347 Notes05 43
Theorem
If S is a schedule representing an execution by a T.O. scheduler, then S is serializable
Proof:1) say Ti Tj in P(S)
conflicting pi[x], qj[x] in S,
such that pi[x] <S qj[x]
Then by T.O. rule, ts(Ti)< ts(Tj)
![Page 44: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/44.jpg)
CS 347 Notes05 44
2) Say there is a cycle T1 T2 ...Tn T1 in P(S) then: ts(T1)<ts(T2) <ts(T3)…<ts(Tn)<ts(T1) A contradiction!
3) So P(S) is acyclic S is serializable
Proof - Continued
![Page 45: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/45.jpg)
CS 347 Notes05 45
Timestamp management
Item data MAX_R MAX_WX1X2 - too much
space!. - more IO..Xn
.. .. ..
![Page 46: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/46.jpg)
CS 347 Notes05 46
Timestamp cache
Item MAX_R MAX_WXY
Z
..
tsMIN
• If transaction reads or writes X, make entry in cache for X (add row if not in)
• Periodically purge all items X withMAX_R[X] < tsMIN, MAX_W[X] < tsMINand remember tsMIN (choose tsMIN current time - d)
![Page 47: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/47.jpg)
CS 347 Notes05 47
• To enforce T.O. rule for pi[X]– if X entry in cache,
use MAX_R, MAX_W values in cache– else assume
MAX_R[X]=tsMIN, MAX_W[X]=tsMIN
• Use hashing (on items) for cache(same as lock table)
Timestamp cache
![Page 48: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/48.jpg)
CS 347 Notes05 48
Distributed T.O. Scheduler
D1
D1
tscache
scheduler 1
node 1
D2
D2
tscache
scheduler 2
node 2
...
• Each scheduler is “independent”• At end of transaction, signal all schedulers involved to release all wL[X] locks
![Page 49: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/49.jpg)
CS 347 Notes05 49
Distributed T.O. Scheduler
D1
D1
tscache
scheduler 1
node 1
D2
D2
tscache
scheduler 2
node 2
...
Taccess data
access data
• Each scheduler is “independent”• At end of transaction, signal all schedulers involved to release all wL[X] locks
![Page 50: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/50.jpg)
CS 347 Notes05 50
Summary
• 2PL - the most popular - useful in a distributed
system - deadlocks possible- several variations
• T.O. - good for multiple versions- aborts more likely
- no deadlocks- useful in a
distributed sytem
![Page 51: CS 347: Parallel and Distributed Data Management Notes05: Concurrency Control](https://reader036.fdocuments.in/reader036/viewer/2022070405/56813e5c550346895da85b9c/html5/thumbnails/51.jpg)
CS 347 Notes05 51
• Others concurrency control schemese.g., Certifiers, serialization graph testing
hard to not very practicalimplement in need global dataa distributed structuresystem