Concurrency Control - Praveen Kumar
praveencs.weebly.com/uploads/1/0/4/4/10440152/unit-v-dbms.pdf
Concurrency Control
One of the important functions of DBMS is Concurrency Control, which is performed by
one of its major components called Concurrency Control Manager (CCM). Concurrency
Control implies controlling the execution of concurrent transactions in a schedule in such
a way that the resulting schedules are Serializable and Cascade-less.
Concurrency Control Protocols
The Concurrency Control Protocols can be divided into the following sub-categories:-
(a) Lock based Protocols
(b) Time-Stamp based Protocols
(c) Hybrid Time-Stamp and Lock based Protocols
(d) Multi-Version based Protocols
(e) Validation based Protocols
Lock Based Protocols
In the Lock-Based Protocols, when a Concurrent transaction Ti needs to access a shared
data item Q, it will first lock the data item Q in an appropriate Mode and then access it.
The locking and unlocking of data items is centrally controlled by Concurrency Control
Manager. For accessing data item Q, Transaction Ti will proceed as follows:-
(i) Ti requests Concurrency Control Manager to grant it lock on data item Q in
Mode M. Locks on all data items are controlled by Concurrency Control
Manager.
(ii) The Concurrency Control Manager examines whether the lock on Q can be
granted to Ti immediately in the requested mode or not.
(iii) If YES then lock on data item Q in mode M is granted to Ti and Ti proceeds
with its execution Else Ti is made to wait till the requested lock on Q is
available. So, Ti will execute its next instruction only after the requested lock
is granted.
(iv) After Ti finishes with accessing of Q, it releases the lock on Q, which can now
be assigned to another waiting transaction.
(v) If Ti fails during its execution before releasing a lock, the lock is
automatically released when Ti is rolled-back during recovery.
Different Locking Modes
Locks will be granted in two different modes:-
(a) Shared-Mode A Shared Mode lock on a data item Q permits the owner
transaction to perform only Read(Q) Operation; not Write(Q) operation. Since, the owner
transaction of a Shared-Mode Lock on a data item Q is not permitted to modify the data
value, a shared-mode lock can be granted on Q to more than one transaction concurrently.
(b) Exclusive-Mode If a transaction Ti needs to perform Read/Write or Write
operations on a data item Q, then it would request Exclusive Lock (X-Lock) on the data
item Q. Once Exclusive Lock is granted to Ti on Q, it provides exclusive access rights on
Q. No other transaction can have Shared or Exclusive lock on Q till Ti releases the X-
Lock. So, Exclusive Lock on a data item Q can be granted to a waiting transaction, only
when no other transaction is holding any lock (Shared or Exclusive) on Q.
Compatibility of Locks
                                  Existing Lock on Data Item Q
Requested Lock on Data Item Q       Shared        Exclusive
Shared                              Yes           No
Exclusive                           No            No
It implies that if a data item Q is currently held by a Transaction Ti in Shared Mode then
other transactions can also be granted Shared Lock (but NOT Exclusive Lock) on Q. But
if a data item Q is currently held by a Transaction Ti in Exclusive Mode then no other
transaction can be granted Shared or Exclusive Lock on Q.
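The compatibility rules above can be sketched as a tiny Shared/Exclusive lock table. This is an illustrative sketch, not the API of any real DBMS; the names (LockTable, request, release) are assumptions, and a real Concurrency Control Manager would queue a denied request rather than simply return False.

```python
COMPATIBLE = {          # (existing mode, requested mode) -> grantable?
    ("S", "S"): True, ("S", "X"): False,
    ("X", "S"): False, ("X", "X"): False,
}

class LockTable:
    def __init__(self):
        self.locks = {}                 # data item -> list of (txn, mode)

    def request(self, txn, item, mode):
        """Grant the lock if compatible with every current holder,
        else deny (the caller would then wait)."""
        holders = self.locks.setdefault(item, [])
        if all(t == txn or COMPATIBLE[(m, mode)] for t, m in holders):
            holders.append((txn, mode))
            return True                 # granted
        return False                    # incompatible: must wait

    def release(self, txn, item):
        self.locks[item] = [(t, m) for t, m in self.locks.get(item, [])
                            if t != txn]

lt = LockTable()
print(lt.request("T1", "Q", "S"))   # True:  Q is not locked
print(lt.request("T2", "Q", "S"))   # True:  S is compatible with S
print(lt.request("T3", "Q", "X"))   # False: X conflicts with the held S locks
```

Note how the matrix entry (S, S) = Yes is the only combination that lets two transactions hold Q at the same time.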
Use of Locks to achieve serializability.
Schedule L1
T1                      T2
Read (A);
                        Read (A);
Write (A);
                        Write (A);
The above Schedule L1 is not Conflict-Serializable; but it can be serialized by using locks
as below in Schedule L2 :-
Schedule L2
T1 T2 Concurrency Control Manager
Lock-X (A); Grant-X (A, T1)
/*Assuming A is currently not locked;
the requested X-Lock will be granted to T1 */
Read (A); /* T1 Reads A */
Lock-X (A); /* Since A is currently locked in X-Mode
by Transaction T1, so T2 is made to wait
till A is unlocked by T1 */
Write (A);
Unlock (A); De-assign (A, T1)
/* The X-Lock on A granted to T1 is de-assigned */
Grant-X (A, T2)
/* X-Lock on A is assigned to waiting transaction T2 */
Read (A); /* Now T2 proceeds to execute Read (A) */
Write (A);
Unlock (A);
So, the non-serial Schedule L1 is now converted to a Conflict-Serializable Schedule L2. It
is in fact a serial schedule. So, in the bargain, Concurrency was sacrificed.
The use of locks may not always result in a serializable schedule, as indicated below for
schedule L3
Schedule L3
T1 T2 Concurrency Control Manager
Lock-X (A); Grant-X (A, T1)
Read (A);
Lock-X(A); /* Since ‘A’ is currently locked in X-Mode
by Transaction T1, so T2 is made to wait
till ‘A’ is unlocked by T1 */
Write (A);
Unlock (A); /* At this point, T1 unlocks ‘A’ */
Grant-X (A, T2)
Read (A);
Write (A);
Unlock (A);
Lock-X (B); Grant-X (B, T2)
Read (B);
Write (B);
Unlock (B);
Lock-X (B); Grant-X (B, T1)
Read (B);
Write (B);
Unlock (B);
The Precedence Graph of Schedule L3 has the edges T1 → T2 (conflicts on ‘A’) and T2 → T1 (conflicts on ‘B’), i.e. it contains a cycle.
As shown above, the Schedule L3, despite use of Locks, is NOT a Conflict-Serializable
Schedule. This has happened, because T1 unlocked Resource ‘A’ prematurely. The
locking needs some additional protocols to ensure Serializability, as discussed below.
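The serializability test used above can be made concrete: build the precedence graph from the conflicting operations of a schedule and check it for a cycle. This is a minimal sketch with assumed names (precedence_graph, has_cycle) and an assumed encoding of a schedule as (transaction, operation, item) triples.

```python
def precedence_graph(schedule):
    """schedule: list of (txn, op, item) in execution order; op is 'R' or 'W'.
    An edge Ti -> Tj exists when an operation of Ti conflicts with a
    later operation of Tj (same item, at least one of them a Write)."""
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    def reach(start, node, seen):
        return any(n == start or (n not in seen and reach(start, n, seen | {n}))
                   for n in graph.get(node, ()))
    return any(reach(t, t, set()) for t in graph)

# Schedule L3: both transactions update 'A', then both update 'B',
# but in the opposite order.
L3 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
      ("T2", "R", "B"), ("T2", "W", "B"), ("T1", "R", "B"), ("T1", "W", "B")]
print(has_cycle(precedence_graph(L3)))   # True -> NOT conflict-serializable
```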
Two Phase Locking Protocol
As per this protocol, each transaction, during its execution, will obtain and release locks
in two distinct phases:-
Phase I This phase begins as soon as a transaction becomes active. During this
phase, a transaction will only obtain locks and will not release any locks. This phase ends
as soon as a lock, held by the transaction, is released.
Phase II This phase begins when first lock is released by a transaction. During this
phase, no new locks will be obtained by the transaction; only the locks held by it will be
released.
Let us modify the locking of Schedule L3 to Two-Phase-Locking, as shown below in
the Schedule L4.
Schedule L4 (using Two-Phase Locking Protocol)
T1 T2 Concurrency Control Manager
Lock-X (A); Grant-X (A, T1)
Read (A);
Lock-X(A); /* Since ‘A’ is currently locked in X-Mode
by Transaction T1, so T2 is made to wait
till ‘A’ is unlocked by T1 */
Write (A);
Lock-X (B); Grant-X (B, T1)
Unlock (A); /* At this point, T1 unlocks ‘A’ */
Grant-X (A, T2)
Read (A);
Write (A);
Lock-X (B); /*Since X-Lock on ‘B’ is currently held by T1
T2 has to wait */
Read (B);
Write (B);
Unlock (B); Grant-X (B, T2)
Unlock (A);
Read (B);
Write (B);
Unlock (B);
The Precedence Graph of Schedule L4 has only the single edge T1 → T2; it is acyclic.
As indicated above, the Schedule L4, which is using Two-Phase Locking Protocol, is a
Conflict-Serializable Schedule.
So, Two-Phase-Locking Protocol ensures Conflict-Serializability.
The Two-Phase Locking results in Conflict-Serializable Schedules. The point, where
last lock is obtained by a transaction, is called Lock-Point. The transactions, participating
in a schedule, will be serialized in the same order as the order of their lock-points.
The Two-Phase Locking ensures a Conflict-Serializable Schedule but not a Cascade-
less Schedule, as shown below in Schedule L5
Schedule L5
Suppose Transaction T1 needs to access both data items ‘A’ and ‘B’ but Transaction T2
needs to access only one data item ‘A’.
T1 T2 Concurrency Control Manager
Lock-X (A); Grant-X (A, T1)
Read (A);
Lock-X (A); Since ‘A’ is currently locked by T1, T2 has to wait.
Write (A);
Lock-X (B); Grant-X (B, T1)
Unlock (A); Since T1 has unlocked ‘A’, T2 can now be granted
lock on ‘A’
Grant-X (A, T2)
Read (A);
Write (A);
Unlock (A);
Read (B);
Write (B);
Unlock (B);
Commit;
Commit;
The Schedule L5 is following Two-Phase-Locking Protocol. It is Conflict-Serializable but
not Cascade-less, since T2 is able to read the value of data item ‘A’ after ‘A’ has been
modified by T1 but before T1 Commits. This can be obviated by using Strict-Two-Phase-
Locking Protocol as explained below:-
Strict Two Phase Locking Protocol This protocol is a variant of two-phase locking
protocol. In addition to the two-phase rules, it ensures that all exclusive locks held by a
transaction are released only after the transaction has committed successfully. This
ensures that the data items updated by a transaction cannot be read by other
transactions until it commits. Thus, this protocol will ensure Cascade-Less Schedules. Let
us modify the Schedule L5 (Shown above) to follow Strict-Two-Phase-Locking, as
indicated below:-
Schedule L6
T1 T2 Concurrency Control Manager
Lock-X (A); Grant-X (A, T1)
Read (A);
Lock-X (A); since ‘A’ is currently locked by T1, T2 has to wait.
Write (A);
Lock-X (B); Grant-X (B, T1)
Read (B);
Write (B);
Unlock (A);
Unlock (B);
Commit; Since T1 has unlocked ‘A’, T2 can now be granted
lock on ‘A’
Grant-X (A, T2)
Read (A);
Write (A);
Unlock (A);
Commit;
The Schedule L6 , as indicated above, is Conflict-Serializable as well as Cascade-less. T2
is permitted to read the value of ‘A’ (modified by T1) only after T1 Commits.
Rigorous Two Phase Locking Protocol This protocol is also a variant of two-phase
locking protocol. It follows two-phase-locking protocol and also ensures that all locks
(Exclusive as well as Shared Locks) held by a transaction will be released only when a
transaction Commits successfully. This ensures that transactions will be serialized strictly
in the same order as the order of their Commit.
Limitation of Lock-based Algorithms
The Lock-based algorithms suffer from the problem of Deadlocks. For example, suppose
Transactions T1 and T2 both need to access data items ‘A’ and ‘B’; T1 needs to access
‘A’ first and then ‘B’, whereas T2 needs to access ‘B’ first and then ‘A’. If the two
transactions follow the Strict or Rigorous Two-Phase Locking Protocol, they can enter
into a Deadlock, as indicated below:-
Schedule L7
T1 T2 Concurrency Control Manager
Lock-X (A); Grant-X (A, T1)
Read (A);
Lock-X(B); Grant-X (B, T2)
Read (B);
Write (A);
Lock-X (B); /* Since X-Lock on ‘B’ is currently held by T2
so T1 has to wait */
Write (B);
Lock-X (A); /* Since X-Lock on ‘A’ is currently held by T1
so T2 has to wait */
The above Schedule L7 follows Strict-Two-Phase Protocol. It has entered a condition,
wherein T1 and T2 will wait forever for the Resources locked by each other; thus causing
a Deadlock Condition.
The Solution for Deadlocks can be found in Time-Stamp based algorithms, as explained
subsequently.
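The deadlock in Schedule L7 can be detected mechanically: the Concurrency Control Manager records a wait-for graph, where an edge Ti → Tj means Ti is waiting for a lock held by Tj, and a cycle in that graph means deadlock. The sketch below is illustrative; the structure of wait_for and the function name are assumptions.

```python
def deadlocked(wait_for):
    """wait_for: dict txn -> set of txns it is waiting on.
    Returns the transactions that lie on a wait-for cycle."""
    def reaches(start, node, seen):
        for n in wait_for.get(node, ()):
            if n == start or (n not in seen and reaches(start, n, seen | {n})):
                return True
        return False
    return [t for t in wait_for if reaches(t, t, set())]

# Schedule L7: T1 waits for 'B' (held by T2), T2 waits for 'A' (held by T1).
print(deadlocked({"T1": {"T2"}, "T2": {"T1"}}))   # ['T1', 'T2']
```

On detecting such a cycle, a real system would pick one victim transaction and roll it back to break the cycle.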
Multiple Granularity Locking
In the concurrency control schemes discussed above, data items are locked individually.
However, the locking overheads can be much reduced, if data-items are grouped into
synchronization units. A hierarchical relationship can be created between the data items
in the form of a tree as indicated below:-

                           DB
          __________________|__________________
         |                  |                  |
       File A             File B             File C
       /    \             /    \
     Ra1 ... Ran        Rb1 ... Rbm
The tree indicated above has three levels of nodes- the root represents the entire database
(DB), the middle level indicates Files (representing Tables or Relations) and the lowest
level indicates Records (Tuples of the Relations). It uses two-phase locking and makes
use of the following types of locks:-
(a) Exclusive (X) Lock When a Transaction is granted an X lock on a node “K”, it
is implicitly granted X lock on all the descendants of node “K”.
(b) Shared (S) Lock When a Transaction is granted an S lock on a node “K”, it
is implicitly granted S lock on all the descendants of node “K”.
(c) Intentional X (IX) Lock When a Transaction needs X lock on a node “K”,
the transaction would need to apply IX lock on all the ancestor nodes of “K” starting
from the root node. So, when a node is found locked in IX mode, it indicates that
some of its descendant nodes must be locked in X mode.
(d) Intentional S (IS) Lock When a Transaction needs S lock on a node “K”,
the transaction would need to apply IS lock on all the ancestor nodes of “K” starting
from the root node. So, when a node is found locked in IS mode, it indicates that
some of its descendant nodes must be locked in S mode.
(e) SIX Lock When a node is locked in SIX mode, it indicates that the node is
explicitly locked in S Mode and IX Mode together. So, the entire sub-tree rooted at that
node is locked in S mode and some nodes in it may additionally be locked in X mode.
This mode is compatible only with the IS mode.
The Compatibility Matrix is as indicated below:-
IS IX SIX S X
IS True True True True False
IX True True False False False
SIX True False False False False
S True False False True False
X False False False False False
The Scheme operates as follows:-
(a) A Transaction must first lock the Root Node and it can be locked in any mode.
(b) Locks are granted as per the Compatibility Matrix indicated above.
(c) A Transaction can lock a node in S or IS mode if it has already locked all the
predecessor nodes in IS or IX mode.
(d) A Transaction can lock a node in X or IX or SIX mode if it has already locked
all the predecessor nodes in SIX or IX mode.
(e) A transaction must follow two-phase locking. It can lock a node only if it has
not previously unlocked any node.
(f) Before it unlocks a node, a Transaction has to first unlock all the children
nodes of that node.
So, locking proceeds top-down and unlocking proceeds bottom-up.
Take the following example: a granularity tree with root DB; Files Fa, Fb and Fc; and
Records Ra1, Ra2, Ra3 under Fa, Rb1, Rb2 under Fb and Rc1, Rc2, Rc3 under Fc.
Suppose a transaction needs to lock Records Ra2 and Rc1 in X Mode; it will lock as
follows:-
DB, Fa, Fc ----- in IX mode.
Ra2, Rc1------ in X mode.
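The top-down rule in the example can be sketched as a small helper that, given a leaf and a requested mode, lists the intention locks needed on the ancestors first. The parent map encodes the example tree above; the function name locks_needed is an illustrative assumption.

```python
# Parent pointers for the example tree (only the nodes used here).
PARENT = {"Fa": "DB", "Fb": "DB", "Fc": "DB",
          "Ra2": "Fa", "Rc1": "Fc"}

def locks_needed(node, mode):
    """Return (node, mode) pairs to acquire, root first:
    intention locks on every ancestor, then the real lock on the node."""
    intention = {"X": "IX", "S": "IS"}[mode]
    path = []
    while node in PARENT:           # climb to the root
        path.append(node)
        node = PARENT[node]
    path.append(node)               # the root (DB)
    ancestors = list(reversed(path))[:-1]     # DB ... parent of the leaf
    return [(a, intention) for a in ancestors] + [(path[0], mode)]

print(locks_needed("Ra2", "X"))  # [('DB', 'IX'), ('Fa', 'IX'), ('Ra2', 'X')]
print(locks_needed("Rc1", "X"))  # [('DB', 'IX'), ('Fc', 'IX'), ('Rc1', 'X')]
```

Unlocking would proceed in the reverse (bottom-up) order, as rule (f) requires.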
Time-Stamp Based Algorithms
The main features of Time-Stamp based algorithm are as follows:-
- When a Transaction Ti is initiated, it is assigned a Time Stamp TS (Ti).
- The Time Stamp will be a number; it could be real time (indicated by the
system clock) at the time of initiation of Ti or a counter value, updated by
the system in an ascending order; initial value could be zero and it could
be incremented by one whenever Time Stamp is assigned to a new
Transaction. So, no two transactions will have same value of Time Stamp.
If a Transaction Tj is initiated later than Ti then TS (Tj) > TS (Ti).
- Each Data Item Q, being accessed by the Transactions will have two
Time Stamps- Read Time Stamp R-TS(Q) and Write Time Stamp W-
TS(Q). The Initial Value of these Time Stamps will be zero but later the
values are updated as explained below:-
The value of R-TS(Q) equals the largest Time-Stamp amongst all the
transactions that have successfully read the data item Q. So, whenever a
Transaction Ti reads data item Q successfully, R-TS(Q) is updated as
follows:-
R-TS (Q) = Max ( R-TS(Q), TS(Ti) )
The value of W-TS(Q) equals the largest Time-Stamp amongst all the
transactions that have successfully updated (written) the data item Q. So,
whenever a Transaction Ti updates data item Q successfully, W-TS(Q) is
updated as follows:-
W-TS (Q) = Max ( W-TS(Q), TS(Ti) )
Whenever a Transaction Ti needs to access a data item ‘Q’, it proceeds as follows:-
Read (Q)
- The Transaction Ti requests Read (Q)
- The System will process the request as follows:-
If ( W-TS(Q) > TS (Ti) )
it indicates that a transaction that was initiated later than Ti has
already updated Q. So, the value of Q, which Ti intends to Read,
has already been overwritten.
In this case, the Read(Q) is rejected and Transaction Ti is rolled
back. Then, Ti will be re-initiated with a new Time-Stamp
assigned to it.
Else Read(Q) is executed and R-TS(Q) is updated as follows:-
R-TS(Q) = Max ( R-TS(Q), TS(Ti) )
Write (Q)
- The Transaction Ti requests Write (Q)
- The System will process the request as follows:-
If ( R-TS(Q) > TS (Ti) )
it indicates that a transaction (Tj) that was initiated later than Ti
has already Read the old value Q. So, Ti should have performed
this Write operation earlier than Tj’s Read operation, so that Tj
could have Read the updated value of Q.
In this case, the Write (Q) is rejected and Transaction Ti is rolled
back. Then, Ti will be re-initiated with a new Time-Stamp
assigned to it.
Else If ( W-TS (Q) > TS (Ti ) )
It indicates that a Transaction (say Tj) initiated later than Ti has
already updated Q. It means the value of Q which Ti now intends
to write is already obsolete.
In this case also, the Write (Q) is rejected and Transaction Ti is
rolled back (though rolling-back the transaction in this case is
not really necessary, which will be discussed later). Then, Ti will
be re-initiated with a new Time-Stamp assigned to it.
Else Write (Q) is executed and W-TS(Q) is updated as follows:-
W-TS(Q) = TS(Ti)
So, in the Time-Stamp based algorithm, whenever a Transaction requests Read/Write of a
data item Q, either the Read/ Write is successfully performed or the requesting
Transaction is Rolled-back. If Rolled-back, the transaction will be re-initiated with a new
Time-Stamp assigned to it.
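The Read and Write rules above fit in a few lines of code. This is a compact sketch under assumed names (Item, Rollback, read, write): each data item carries its two time stamps, and a rejected operation raises an exception to signal that Ti must be rolled back and restarted with a new time stamp.

```python
class Rollback(Exception):
    """Raised when the requesting transaction must be rolled back."""

class Item:
    def __init__(self):
        self.value, self.r_ts, self.w_ts = None, 0, 0

def read(q, ts):
    if q.w_ts > ts:                   # a younger txn already overwrote Q
        raise Rollback
    q.r_ts = max(q.r_ts, ts)          # R-TS(Q) = Max(R-TS(Q), TS(Ti))
    return q.value

def write(q, ts, value):
    if q.r_ts > ts or q.w_ts > ts:    # Q was read or written by a younger txn
        raise Rollback
    q.value, q.w_ts = value, ts       # W-TS(Q) = TS(Ti)

q = Item()
write(q, ts=5, value=10)              # T5 writes Q
read(q, ts=7)                         # T7 reads Q: allowed, R-TS(Q) becomes 7
try:
    write(q, ts=6, value=20)          # T6 writes after T7's read: rejected
except Rollback:
    print("T6 rolled back")
```

Note that no operation ever waits: it either succeeds at once or rolls the transaction back, which is exactly why deadlock cannot occur.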
Advantages of Time-Stamp based Algorithm vis-à-vis Lock-Based Algorithm
Since a Transaction never waits for accessing of a data item, there is no possibility of
Deadlock.
Disadvantages of Time-Stamp based Algorithm vis-à-vis Lock-Based Algorithm
1. There is a distinct possibility that a Transaction may be rolled back again and
again and may face starvation.
2. Whenever a transaction is rolled back, the work already performed by that
transaction is undone, and the undoing itself requires additional work. This
affects the system throughput adversely.
Thomas Write Rule
Whenever a Transaction Ti requests Write (Q)
If ( W-TS (Q) > TS (Ti ) ) It indicates that a Transaction (say Tj) initiated later than Ti
has already updated Q. It means the value of Q which Ti now intends to Write is already
obsolete. Under this condition, the basic Time-Stamp based algorithm will reject the
Write (Q) and also roll back the Transaction Ti. However, rolling back the transaction is
not really necessary in this case, as Thomas observed:-
As per Thomas’ Write Rule, under the above condition, the Write (Q) operation is simply
ignored (since the value which Ti intends to Write is obsolete) and Ti is not Rolled-back.
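Thomas' Write Rule changes only one branch of the write check: the obsolete write is silently skipped instead of rolling Ti back. A self-contained sketch, with the item represented as a plain dict (an illustrative choice, not from the source):

```python
class Rollback(Exception):
    pass

def thomas_write(q, ts, value):
    """q: dict with 'value', 'r_ts', 'w_ts' (illustrative layout)."""
    if q["r_ts"] > ts:        # a younger txn read the old value: roll back
        raise Rollback
    if q["w_ts"] > ts:        # a younger txn already wrote: IGNORE, no rollback
        return
    q["value"], q["w_ts"] = value, ts

q = {"value": 10, "r_ts": 0, "w_ts": 8}
thomas_write(q, ts=6, value=99)   # obsolete write by T6: silently ignored
print(q["value"], q["w_ts"])      # 10 8
```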
Some Algorithms, which avoid Starvation
The following algorithms, which make use of both Locks and Time Stamps, obviate the
possibility of starvation:-
1. Wait-Die Algorithm
2. Wound-Wait Algorithm
1. Wait-Die Algorithm
It operates as follows:-
- When a Transaction is initiated, it is assigned a Time Stamp just like in
the case of Time-Stamp based algorithm.
- When a Transaction Ti requests accessing of a data item Q, which is
currently locked by another Transaction Tj, the request is processed as
follows:-
If ( TS(Ti ) < TS (Tj) )
Then Ti waits for Tj to finish and release lock on Q
Else Ti Rolls-back and restarts with old Time-Stamp retained by it.
It is possible that a transaction may be rolled back several times while attempting to
access a data item Q. But since a rolled-back transaction is re-initiated with its original
time-stamp retained, it will eventually become the oldest transaction and be eligible to
access the data item Q. So, there is no possibility of starvation.
2. Wound-Wait Algorithm
It operates as follows:-
- When a Transaction is initiated, it is assigned a Time Stamp just like in
the case of Time-Stamp based algorithm.
- When a Transaction Ti requests accessing of a data item Q, which is
currently locked by another Transaction Tj, the request is processed as
follows:-
If ( TS(Ti ) < TS (Tj) )
Then Ti forces Tj to Roll-back. When Tj Rolls-back, it will release
lock on Q, which will then be available for access to Ti
Else Ti waits for Tj to finish and release lock on Q, which will then
become available to Ti
In this case also, there is no possibility of starvation.
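The two decisions can be put side by side. In both sketches a lower time stamp means an older transaction, and Ti is requesting an item currently locked by Tj; the function names are illustrative.

```python
def wait_die(ts_i, ts_j):
    # Older requester waits; younger requester dies
    # (it restarts with its original time stamp).
    return "wait" if ts_i < ts_j else "die"

def wound_wait(ts_i, ts_j):
    # Older requester wounds (rolls back) the holder;
    # younger requester waits.
    return "wound Tj" if ts_i < ts_j else "wait"

print(wait_die(1, 2), wait_die(2, 1))      # wait die
print(wound_wait(1, 2), wound_wait(2, 1))  # wound Tj wait
```

In both schemes the older transaction always makes progress, which is what rules out starvation.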
Wound-Wait Vs Wait-Die Algorithm
The Wound-Wait Algorithm is better than the Wait-Die Algorithm, as the average number
of roll-backs will be lower. If a Transaction Tj has been forced to roll back by another
Transaction Ti, then Tj will be re-initiated with its original Time-Stamp retained. Now,
when it again requests access to Q while Ti is still holding a lock on Q, Tj (being the
younger transaction) will simply wait for Ti to finish and then access Q. So, there would
be only one roll-back, not the repeated roll-backs which are possible in the Wait-Die
Algorithm. Since the average number of roll-backs is lower, the Wound-Wait Algorithm
will produce higher throughput.
Validation Based Protocols
The protocol operates as follows:-
A transaction is executed in the following three phases:-
1. Read Phase During this phase, a Transaction Ti reads-in all the
data-items needed by the transaction into its local variables. It performs its
processing and stores the results into its local variables, without affecting the
database.
2. Validation Phase During this phase, it determines whether it is valid
to copy the updated local variables into the Database, without causing a
serializability conflict. The Validation Test is subsequently explained.
3. Write Phase If Transaction Ti passes the Validation Test, then the
system transfers its updates from the local variables to the Database; else Ti is
rolled back to its initial state and re-started. As obvious, there won’t be any roll-
back of ‘Read Only’ Transactions and such transactions will always go-through
without any waiting.
Time Stamps associated with a Transaction
To enable performing of the Validation Test, the system associates the following
Time-Stamps with each Transaction Ti :-
1. Start (Ti) The time when Ti started its execution i.e. the start of its Read
Phase.
2. Validation (Ti) The time when Ti finished its Read Phase and started its
Validation Phase.
3. Finish (Ti) The time when Ti finished its Write Phase.
The Time Stamp associated with Ti for serialization TS(Ti) = Validation(Ti).
The Serializability requirement of a Transaction Pair (Ti , Tj) is that if TS(Ti) <
TS(Tj) then in the valid schedule, Ti must appear before Tj.
Validation Test
A Transaction Ti is said to pass the Validation Test if one of the following two
conditions holds against each concurrent transaction Tj that satisfies TS(Tj) < TS
(Ti) i.e. Tj commenced Validation Test earlier than Ti :-
(a) Finish (Tj) < Start (Ti), since Tj finished before Ti commenced; so Tj does
not conflict with Ti.
(b) Start (Ti) < Finish (Tj) < Validation (Ti) AND the set of data items written
by Tj does not overlap with the set of data items read by Ti. This implies that
Tj finished after Ti started but before Ti started its Validation Phase; since
Tj has already finished, it cannot modify any more data items, and none of
the items it wrote were read by Ti. So, Tj does not conflict with Ti.
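Conditions (a) and (b) translate directly into a validation check. A sketch under assumed names: each transaction is a dict recording its start/validation/finish times and its read and write sets, and `others` holds the concurrent transactions with smaller TS.

```python
def validates(ti, others):
    """ti, others: dicts with 'start', 'validation', 'finish',
    'read_set', 'write_set'. others = concurrent txns Tj with
    TS(Tj) < TS(ti); returns True if ti passes the Validation Test."""
    for tj in others:
        if tj["finish"] is not None and tj["finish"] < ti["start"]:
            continue                         # condition (a): Tj finished first
        if (tj["finish"] is not None and tj["finish"] < ti["validation"]
                and not (tj["write_set"] & ti["read_set"])):
            continue                         # condition (b): no read/write overlap
        return False                         # conflict: ti must be rolled back
    return True

t1 = {"start": 0, "validation": 5, "finish": 7,
      "read_set": {"A"}, "write_set": {"A"}}
t2 = {"start": 3, "validation": 8, "finish": None,
      "read_set": {"B"}, "write_set": {"B"}}
print(validates(t2, [t1]))   # True: T1's writes {'A'} miss T2's reads {'B'}
```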
In this protocol, the concurrency checks are not made for individual Read/Write
operations of a transaction; rather it makes a single check after all Read/Write
operations have been performed in the local variables; and if the validation check
goes through successfully, all updates of the transaction are applied to the
database. So, concurrency control overheads will be very low. This protocol is
ideal for an environment wherein the majority of the transactions are ‘Read-Only’
and contention amongst ‘Write’ transactions is extremely rare. In case of such
contention, a transaction would need to be rolled back.
Multi-Version based Scheme
The Multi-Version Scheme operates as follows:-
- When a Transaction Ti requests a Write (Q) operation, the system creates
a new version of data item Q.
- When a Transaction Ti requests a Read (Q) operation, the system
determines which version of Q should be read and returns the appropriate
value of Q to Ti.
- The Versions of data-item Q, which are no more required, are deleted by
the system.
Each version Qk of data-item Q will have three fields:-
Content (Qk) the value of data-item Q associated with version Qk. It is equal to
the value of Q written by the transaction that created the version.
W-TS(Qk) The time Stamp of the Transaction that created Version Qk by a
Write(Q) operation.
R-TS(Qk) The largest time Stamp of any Transaction that has successfully read
the Version Qk by a Read (Q) operation.
When a Transaction Ti requests a Read (Q) or Write (Q) operation then the system will
determine the version Qk of Q, whose W-TS(Qk) is largest amongst all the versions of Q
and less than or equal to TS (Ti), then the system proceeds as follows:-
Read (Q) The value of Content (Qk) is returned to Ti and R-TS(Qk) is updated to
Max ( R-TS(Qk), TS(Ti) ).
Write (Q)
If TS(Ti) < R-TS(Qk) (i.e. a transaction which was initiated later than Ti has already
read the most current version of Q) the Transaction Ti is rolled-back and restarted
with a new time-stamp;
Else If TS(Ti) = W-TS(Qk) it implies that version Qk has been created by Ti itself;
in this case Ti overwrites the contents of Qk;
Else Ti creates a new version Qi of Q. The Content (Qi) will be equal to the value of
Q written by Ti. R-TS(Qi) and W-TS(Qi) will be set equal to TS (Ti).
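The version-selection and update rules above can be sketched as follows. This is illustrative: each version is a dict with the three fields named in the text, and the function names (pick, mv_read, mv_write) are assumptions.

```python
class Rollback(Exception):
    pass

def pick(versions, ts):
    """The version Qk whose W-TS is largest amongst those <= TS(Ti)."""
    return max((v for v in versions if v["w_ts"] <= ts),
               key=lambda v: v["w_ts"])

def mv_read(versions, ts):
    qk = pick(versions, ts)
    qk["r_ts"] = max(qk["r_ts"], ts)     # reads always succeed
    return qk["value"]

def mv_write(versions, ts, value):
    qk = pick(versions, ts)
    if qk["r_ts"] > ts:                  # a younger txn read this version
        raise Rollback
    if qk["w_ts"] == ts:                 # Ti overwrites its own version
        qk["value"] = value
    else:                                # otherwise create a new version
        versions.append({"value": value, "w_ts": ts, "r_ts": ts})

versions = [{"value": 0, "w_ts": 0, "r_ts": 0}]
mv_write(versions, ts=5, value=10)       # T5 creates a new version
print(mv_read(versions, ts=3))           # 0  (T3 still sees the old version)
print(mv_read(versions, ts=9))           # 10 (T9 sees T5's version)
```

The two reads illustrate the key property of the scheme: an old reader and a new reader each get the version consistent with their own time stamp, so reads never wait and never fail.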
Deletion of Defunct Versions
A crucial issue in this scheme is to determine the versions of Q that are no longer
required and therefore can be deleted. Suppose there are two versions Qi and Qj of a data
item Q such that both have Write Time-Stamps less than the Time-Stamp of the oldest
transaction in the system at that moment; then the older of the two versions is no longer
needed and can be deleted.
In Multi-Version based protocol, all Read operations will always go through
successfully, but a Read operation will have higher overheads as compared to other
protocols since the system has to determine the version of Q to be read. As regards Write
operation, it may go through or it may have to be rolled back. In case, a transaction is
rolled back, it is re-started with a new Time-Stamp. Next time, hopefully it will go
through.
CHAPTER 12
Recovery
A failure would leave the Database System in a Suspect State. Recovery implies restoring
the Database (after a failure) to a state that is assumed to be Consistent. The Recovery
is made possible by the log of updates maintained in a system Log-File on a stable (non-
volatile) storage.
Let us consider a Transaction to transfer an amount of Rs. 1000/= from
ACCOUNT# 100 to ACCOUNT# 200.
Let the relation Account be on Schema ACCOUNT (Account-No, Branch-Name, Balance).
BEGIN TRANSACTION;
UPDATE Account
SET Balance = Balance - 1000
WHERE Account-No = ‘100’;
IF any error THEN GO TO UNDO;
UPDATE Account
SET Balance = Balance + 1000
WHERE Account-No = ‘200’;
IF any error THEN GO TO UNDO;
COMMIT TRANSACTION;
GO TO FINISH;
UNDO: ROLLBACK TRANSACTION;
FINISH: RETURN;
The above transaction involves two updates to the relation Account. Temporarily,
the database would be in an inconsistent state, when one update has been performed and
the other one is still to be performed. During this period, the total Balance at the
BRANCH would show a deficit of 1000, but at the end of the Transaction, the total would
tally. So, we can state that a Transaction transforms the database from one consistent
state to other consistent state, without necessarily preserving consistency at all
intermediate steps.
Suppose the system fails between the two updates i.e. the first update is executed,
but not the second. Then, the database would be left in an inconsistent state. To obviate
this situation, the Transaction should be either executed in its entirety or not at all.
So, if a transaction executes some of its updates and then a failure occurs before the
transaction reaches its planned termination, then the completed updates must be undone
(called ROLLBACK). The system component that provides this atomicity to the
transaction processing is called Transaction Manager.
COMMIT TRANSACTION This operation signals a successful end of the
Transaction. It confirms to the Transaction Manager that a transaction has been
successfully completed and the database is in a consistent state. All the updates made by
the transaction have been made safe in a non-volatile Log File.
ROLLBACK TRANSACTION This operation signals an unsuccessful end of the
transaction. It indicates to the transaction manager that a transaction has failed to
complete due to some failure and the database may be in an inconsistent state, till all the
updates made by the transaction are “rolled back” i.e. undone.
The log is maintained in two portions:-
An active or online log, which is maintained on the online disk. This log is used
for minor recoveries, during normal operations.
An archive or offline log, which is maintained on a tape. The offline log
maintains the record of updates since the last backup, which are used to restore
the system, in case of major failures. The database is first installed from the last
backup and then updates since the last backup are applied, to bring the database as
close as possible to the state that existed at the time of failure. Under such
situations, 100% recovery is not practically feasible.
Transaction Recovery A transaction begins with the successful execution of a
BEGIN TRANSACTION statement and ends with the successful execution of a
COMMIT or ROLLBACK statement. Thus, a COMMIT establishes a COMMIT
POINT (also called synch-point) at which database is in a state of consistency. A
ROLLBACK rolls back the database to the previous COMMIT POINT, at which again
the database was in a state of consistency.
When a COMMIT POINT is established:-
1. All the updates, made by the program since the previous COMMIT POINT, are
committed i.e. are made PERMANENT.
2. All database pointers (i.e. addressability to certain tuples) and all tuple-locks will
be released. Some systems provide an option to retain addressability to tuples (and also
retain tuple-locks) from one commit point to the next.
A single program execution may comprise a sequence of transactions. A
COMMIT or ROLLBACK will terminate a transaction, but will not terminate the entire
program.
The System does not assume that an application program can include explicit
checks for all possible error conditions. The System will issue implicit ROLLBACK for
any program that fails due to any reason, even if it has not been specified explicitly in the
program. So, a transaction is also used as a unit of recovery.
If a transaction SUCCESSFULLY COMMITS, the system will guarantee that all
its updates are permanently reflected in the database, even if the system crashes before
the updates are physically written into the database. With the help of log, the system
would write such updates physically into the database during RESTART after the failure.
The RESTART procedure will recover any transactions that completed successfully, but
did not manage to get their updates physically written into the database (on stable
storage) prior to the crash.
System Recovery
There are two types of System Failures:-
(a) Local Failure
(b) Global Failure
A local failure affects only the transaction, during the execution of which the failure has
occurred. Recovery from such failures has been covered above.
A global failure affects all the transactions that may be in progress at the time of failure.
Such failures fall into two broad categories:-
1. System Failure (e.g. Power Failure) This affects all transactions currently
in progress but does not physically damage the database. This failure is called soft crash.
2. Media Failure (e.g. Disk Head Crash) It causes damage to the
database, or to some portion of it, and affects those transactions currently using that
portion of the database. A media failure is called hard crash.
Modes of Database Updates
1. Immediate Update
When an active Transaction Ti updates a data item Q, the update is immediately reflected
in the database (on stable storage), even before Ti Commits. In this case, the updates of
those transactions that fail before their Commit Point is reached will have to be UNDONE.
Another limitation of this mode is that every update requires a Disk-Write, often to
sectors that are widely separated, which involves significant overheads of Disk Seek Time.
How to UNDO the updates in case of ROLLBACK?
The System maintains a log (or journal) of all operations on the disk, which
contains details of all updates. The pre-update and post-update values of the
updated objects are recorded in the log, as follows:-
<Ti , Q, Old-Value, New-Value>
This implies that Transaction Ti has updated data item Q from Old-Value to New-
Value.
In case of a rollback, the system uses the log to restore the values to their pre-
update state, i.e. on roll-back of Transaction Ti, the value of data item Q
will be reverted from New-Value back to Old-Value.
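The undo step can be sketched in Python. This is a minimal illustration with an in-memory stand-in for the database and log; all names (`log`, `database`, `undo`) are illustrative, not a real DBMS API:

```python
# Minimal sketch of UNDO from a log (illustrative, not a real DBMS).
# Each log record is <Ti, Q, Old-Value, New-Value>, here a 4-tuple.
log = [
    ("T1", "Q", 100, 150),   # T1 updated Q from 100 to 150
    ("T1", "R", 20, 25),     # T1 updated R from 20 to 25
]
database = {"Q": 150, "R": 25}   # updates already reflected on stable storage

def undo(txn_id, log, database):
    """Roll back txn_id by scanning the log backward and restoring Old-Value."""
    for t, item, old_value, _new_value in reversed(log):
        if t == txn_id:
            database[item] = old_value

undo("T1", log, database)
print(database)   # {'Q': 100, 'R': 20}
```

The backward scan matters: if a transaction updated the same item twice, the earliest Old-Value must be the one that survives.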
2. Deferred Update: The updates performed by an active Transaction Ti are not
immediately reflected on stable storage; they are kept in RAM in the DBMS
Buffers. The updates of Transaction Ti are transferred from the DBMS Buffers to Stable
Storage only after Ti Commits. The updates are also recorded in the Log-File on stable
storage, which enables recovery in case of failures. Since log-file writes go to nearby
sectors, each update does not incur the large seek time of a scattered disk write, so this
mode produces higher throughput than the Immediate Update mode.
But one limitation of this mode is the following: suppose a system failure occurs after a
Transaction Ti has committed, but before its updates have been transferred from the
DBMS Buffers to stable storage. In this case, the updates of Ti have to be Redone
during the Recovery Procedure. If <Ti, Q, Old-Value, New-Value> is an entry in the
log-file pertaining to a Transaction Ti that is to be Redone, then during Recovery, New-
Value will be rewritten into data item Q.
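The redo step mirrors the undo step, but scans forward and writes New-Value. Again a minimal sketch with illustrative names:

```python
# Minimal sketch of REDO for a committed transaction under Deferred Update
# (illustrative, not a real DBMS). Records are <Ti, Q, Old-Value, New-Value>.
log = [
    ("T2", "Q", 100, 180),
    ("T2", "S", 5, 7),
]
database = {"Q": 100, "S": 5}   # buffers were lost; stable storage is stale

def redo(txn_id, log, database):
    """Reapply txn_id's updates by scanning the log forward, writing New-Value."""
    for t, item, _old_value, new_value in log:
        if t == txn_id:
            database[item] = new_value

redo("T2", log, database)
print(database)   # {'Q': 180, 'S': 7}
```

The forward scan is the mirror image of undo's backward scan: for repeated updates to one item, the latest New-Value must win.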
Recovery from System Failures
During a system failure, the contents of main memory, i.e. the database buffers, are lost. The
precise state of the transactions that were in progress at the time of failure will no
longer be known. Such transactions would need to be rolled back when the system is
restarted after the failure.
There may be some transactions, which had already committed prior to the system
failure, but did not manage to get their updates transferred from the database buffers to the
physical database. Such transactions will need to be redone.
Which failed transactions to be UNDONE and which transactions to be REDONE?
During normal operations, the system keeps TAKING CHECKPOINTS at pre-
specified regular intervals, or when a prescribed number of entries have been made in the
log. Taking a CHECKPOINT means:-
1. Physically writing (force-writing) the contents of the database buffers into the
physical database.
2. Physically writing a special CHECKPOINT RECORD into the physical non-
volatile log. This CHECKPOINT RECORD gives a list of the transactions that
were in progress at the time of taking the CHECKPOINT.
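The two checkpoint steps can be sketched as follows, with in-memory stand-ins for the DBMS buffers, the stable database and the log; all names are illustrative:

```python
# Sketch of taking a CHECKPOINT (illustrative stand-ins, not a real DBMS).
def take_checkpoint(buffers, stable_db, log, active_txns):
    # Step 1: force-write the buffered updates into the physical database.
    stable_db.update(buffers)
    buffers.clear()
    # Step 2: append a CHECKPOINT RECORD listing the in-progress transactions.
    log.append(("CHECKPOINT", sorted(active_txns)))

log = []
buffers = {"Q": 42}          # a dirty, buffered update
stable_db = {"Q": 0}
take_checkpoint(buffers, stable_db, log, active_txns={"T3", "T4"})
print(stable_db, log)   # {'Q': 42} [('CHECKPOINT', ['T3', 'T4'])]
```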
Criteria for UNDO and REDO
Recovery Process is initiated when the system is restarted after a failure. From the view-point
of recovery, there will be three types of transactions:-
1. Transactions, which began and were committed before the last CHECKPOINT.
These transactions need no action during Recovery.
2. Transactions, which began either before or after the last CHECKPOINT, but were
COMMITTED after the checkpoint, prior to failure. These transactions need a
REDO operation during Recovery.
3. Transactions, which began before or after the last CHECKPOINT, but were still
NOT COMMITTED at the time of failure. These need an UNDO operation during
Recovery.
Recovery Procedure
At restart time, the system goes through the following procedure:-
1. It will make use of two lists: the UNDO list and the REDO list. Initialize the UNDO
list to the list of transactions recorded in the most recent CHECKPOINT
RECORD, and initialize the REDO list to empty.
2. Starting from the most recent CHECKPOINT RECORD, search the log file in the
forward direction.
3. If a “BEGIN TRANSACTION” log entry is found for transaction Ti, then add Ti
to the UNDO list.
4. If a “COMMIT” log entry is found for transaction Ti, move Ti from UNDO list to
the REDO list.
5. When the end of log file is reached, the UNDO and REDO lists are final.
6. The system now works backward through the log file, undoing the transactions in
the UNDO list. This is called BACKWARD RECOVERY.
7. Then, the system works forward redoing the transactions in the REDO list. This is
called FORWARD RECOVERY.
Log Based Recovery Algorithm
Undo-List := ∅;
Redo-List := ∅;
Pass I
/* This Pass is made scanning the Log-File, starting from the Failure Point, traversing it
in the Backward Direction, till the Last Check-Point Record is encountered.
Then, update the Undo-List as follows:- */
For each Transaction Entry Ti ∈ Log-File Do
Undo-List := Undo-List ∪ {Ti};
Pass II
/* This Pass is made scanning the Log-File, starting from the Last Check-Point Record,
traversing it in the Forward Direction, till the Failure Point Record is encountered. */
While Traversing the Log-File Do
Begin
For Each Record <Begin-Transaction Ti> ∈ Log-File Do
Undo-List := Undo-List ∪ {Ti};
For Each Record <Commit Ti> ∈ Log-File Do
Begin
Undo-List := Undo-List - {Ti}; /* Transfer Ti from Undo-List to Redo-List */
Redo-List := Redo-List ∪ {Ti};
End;
End;
/*At the End of this Pass, the Undo-List and Redo-List will be ready */
Pass III
/* This Pass is made scanning the Log-File, starting from the Failure Point, traversing the
Log-File in the Backward Direction, till the Undo-List gets empty. */
While Traversing the Log-File Do
Begin
For Each Data-Update Record <Ti, Q, Old-Val, New-Val> ∈ Log-File Do
If Ti ∈ Undo-List
Then Q := Old-Val; /* Undo the Update */
For Each Record <Begin-Transaction Ti> ∈ Log-File Do
Undo-List := Undo-List - {Ti}; /* Remove Ti from the Undo-List */
End;
/*At the End of this Pass, the Undo of all Transactions in the Undo-List would be
complete */
Pass IV
/* This Pass is made scanning the Log-File, starting from the Last Check-Point,
traversing the Log-File in the Forward Direction, till the Redo-List gets empty. */
While Traversing the Log-File Do
Begin
For Each Data-Update Record <Ti, Q, Old-Val, New-Val> ∈ Log-File Do
If Ti ∈ Redo-List
Then Q := New-Val; /* Redo the Update */
For Each Record <Commit Ti> ∈ Log-File Do
Redo-List := Redo-List - {Ti}; /* Remove Ti from the Redo-List */
End;
/* At the End of this Pass, the Redo of all Transactions in the Redo-List would be complete */
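For concreteness, the four passes above can be condensed into a runnable Python sketch. The log-record formats and all names are illustrative (a real DBMS log is more elaborate), and Passes III and IV are simplified to scan the relevant portion of the log in full rather than stopping as soon as the lists become empty:

```python
# Records: ("BEGIN", Ti), ("UPDATE", Ti, Q, old, new), ("COMMIT", Ti),
#          ("CHECKPOINT", [transactions active at the checkpoint]).
def recover(log, database):
    # Locate the last checkpoint; records after it form the recovery window.
    cp = max(i for i, r in enumerate(log) if r[0] == "CHECKPOINT")

    # Pass I: backward from the failure point to the checkpoint, collecting
    # every transaction with a log entry in the window (plus the checkpoint list).
    undo_list = set()
    for rec in reversed(log[cp:]):
        if rec[0] == "CHECKPOINT":
            undo_list.update(rec[1])
        else:
            undo_list.add(rec[1])

    # Pass II: forward from the checkpoint; a COMMIT moves Ti to the redo list.
    redo_list = set()
    for rec in log[cp:]:
        if rec[0] == "BEGIN":
            undo_list.add(rec[1])
        elif rec[0] == "COMMIT":
            undo_list.discard(rec[1])
            redo_list.add(rec[1])

    # Pass III: backward through the log, undoing updates of transactions
    # still on the undo list (including their pre-checkpoint updates).
    for rec in reversed(log):
        if rec[0] == "UPDATE" and rec[1] in undo_list:
            _, _ti, item, old, _new = rec
            database[item] = old

    # Pass IV: forward from the checkpoint, redoing committed transactions.
    for rec in log[cp:]:
        if rec[0] == "UPDATE" and rec[1] in redo_list:
            _, _ti, item, _old, new = rec
            database[item] = new
    return undo_list, redo_list

log = [
    ("BEGIN", "T1"),
    ("UPDATE", "T1", "A", 0, 1),
    ("CHECKPOINT", ["T1"]),
    ("BEGIN", "T2"),
    ("UPDATE", "T2", "B", 10, 20),
    ("COMMIT", "T2"),
    ("UPDATE", "T1", "A", 1, 2),
    # failure here: T1 never committed
]
db = {"A": 2, "B": 10}   # A was flushed; B's update was lost with the buffers
undo, redo = recover(log, db)
print(undo, redo, db)    # {'T1'} {'T2'} {'A': 0, 'B': 20}
```

Note how T1's pre-checkpoint update of A must also be undone, since the checkpoint force-wrote it to stable storage; this is why Pass III continues backward past the checkpoint.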
Media Recovery
Media failures imply failures like disk-head crash or disk-controller failure, in which case
some portion of the database is physically destroyed. Recovery from such a failure
involves a reloading of the database from a backup copy (dump) and then using the log
files (both active log file and archived log files) to REDO all transactions that completed
since the backup copy was taken. There is no need to UNDO those transactions, which
were in progress at the time of failure, since those would have been lost from the
database buffers anyway.
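The media-recovery procedure can be sketched in the same style. The record format and names are illustrative, and the set of committed transactions is assumed to be derivable from the log's COMMIT records:

```python
# Sketch of media recovery: reload the backup copy (dump), then REDO every
# transaction that committed since the dump was taken (illustrative names).
def media_recover(dump, log, committed):
    database = dict(dump)          # reload the database from the backup copy
    for rec in log:                # forward through archived + active log files
        if rec[0] == "UPDATE" and rec[1] in committed:
            _, _ti, item, _old, new = rec
            database[item] = new   # redo the committed update
    return database

dump = {"A": 1, "B": 2}
log = [("UPDATE", "T5", "A", 1, 9), ("UPDATE", "T6", "B", 2, 3)]
db = media_recover(dump, log, committed={"T5"})   # T6 was still in progress
print(db)   # {'A': 9, 'B': 2}
```

As the text notes, no UNDO pass is needed: T6's in-progress update never reaches the reloaded database, because only committed transactions are replayed.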
Exercise
Q.1. (a) Discuss the salient features of Immediate Database Modification and Deferred Database Modification. Explain the role of the log-file.
(b) Explain how the Log-Based Recovery Algorithm can achieve Database Recovery. This algorithm should cater both for Immediate Update and Deferred Update.
Exercises
Ex.10.1 (a) Explain the Two-Phase Locking Protocol for Concurrency Control in Transaction Processing. Show how this protocol ensures conflict-serializable schedules.
(b) What are the additional stipulations of the Strict Two-Phase Locking Protocol? Explain how it ensures conflict-serializable and cascade-less schedules.
Ex.10.2 Draw a Precedence Graph to determine whether the following schedule is conflict-serializable. If not, show how:-
(i) Two-Phase Locking can be used to achieve conflict-serializability.
(ii) Time-Stamp Ordering can be used to achieve conflict-serializability.
T1              T2
Read (A)
                Write (A);
Read (A)
                Write (A);
Ex.10.3 Explain the Time-Stamp Ordering technique for concurrency control. What are its strengths and limitations as compared to lock-based protocols?
Ex.10.4 Consider the following transactions:-
T1 : Read (A); Read (B); If A = 0 then B := B + 1; Write (B);
T2 : Read (B); Read (A); If B = 0 then A := A + 1; Write (A);
Add Lock and Unlock instructions to Transactions T1 and T2, so that they observe the Two-Phase Locking Protocol.