Principles of Transaction Management

Outline

• Transaction concepts & protocols

• Performance impact of concurrency control

• Performance tuning

[Figure: layers of a database system and the people who tune them. Layers: Application; Query Processor; Indexes; Storage Subsystem; Concurrency Control; Recovery; Operating System; Hardware [Processor(s), Disk(s), Memory]. Roles: Application Programmer (e.g., business analyst, data architect); Sophisticated Application Programmer (e.g., SAP admin); DBA, Tuner.]

Transaction Concepts & Protocols

• Transaction
– A logical unit of database processing
– A sequence of begin, reads/writes, end
– Unit of recovery, consistency, concurrency

• Transaction Processing Systems
– Large databases with multiple users executing database transactions
– Examples: banking systems, airline reservations, supermarket checkouts, ...

Transaction States

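[Figure: transaction state diagram. States: Active, Partially Committed, Committed, Failed, Terminated. Transitions: begin-transaction enters Active; read-item and write-item stay in Active; end-transaction leads from Active to Partially Committed; commit leads from Partially Committed to Committed; abort leads from Active or from Partially Committed to Failed; Committed and Failed both lead to Terminated.]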
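The state diagram can be read as a small transition table. The sketch below is only an illustration under that reading: the dictionary `TRANSITIONS` and the function `run` are hypothetical names, and the unlabeled step into Terminated is left out.

```python
# Allowed transitions, keyed by (current state, event).
# State and event names follow the slide; the table itself is an assumption.
TRANSITIONS = {
    ("Active", "read-item"): "Active",
    ("Active", "write-item"): "Active",
    ("Active", "end-transaction"): "Partially Committed",
    ("Active", "abort"): "Failed",
    ("Partially Committed", "commit"): "Committed",
    ("Partially Committed", "abort"): "Failed",
}


def run(events):
    """Return the state reached after applying events to a new transaction."""
    state = "Active"  # begin-transaction puts the transaction in Active
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not allowed in state {state!r}")
        state = TRANSITIONS[key]
    return state
```

For example, `run(["read-item", "write-item", "end-transaction", "commit"])` ends in Committed, while a commit issued directly from Active is rejected.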

Interleaved Transactions

• A and B are concurrent transactions

[Figure: timeline (t1 to t5) on which A and B take turns executing: A, B, A, B.]

Transaction “Correctness”

• ACID properties
– Atomicity
– Consistency
– Isolation
– Durability

• Enforced by concurrency control and recovery methods of the DBMS

Serial Schedule

• Schedule
– A sequence of read & write operations from various transactions
– Example: R1[X] W3[Y] R2[X] W2[Y] W1[X] W2[X]

• Serial schedule
– No interleaved operations from the participating transactions
– Example: W3[Z] R3[Y] R1[X] W1[Y] R2[Y] W2[Z] W2[X]
– Always correct, but … so slow!

• A schedule that is equivalent to some serial schedule is correct too
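To make the schedule notation concrete, here is a minimal sketch that parses strings such as R1[X] and tests whether a schedule is serial. The helper names `parse` and `is_serial` are hypothetical, and the parser assumes the exact notation used on this slide.

```python
import re


def parse(schedule):
    """Turn 'R1[X] W3[Y]' into [('R', 1, 'X'), ('W', 3, 'Y')]."""
    return [(op, int(txn), item)
            for op, txn, item in re.findall(r"([RW])(\d+)\[(\w+)\]", schedule)]


def is_serial(schedule):
    """True if each transaction's operations form one contiguous run."""
    finished = set()   # transactions whose run has already ended
    current = None     # transaction whose run is in progress
    for _, txn, _ in parse(schedule):
        if txn != current:
            if txn in finished:
                return False  # txn resumes after another one ran: interleaved
            if current is not None:
                finished.add(current)
            current = txn
    return True
```

Applied to the two example schedules, `is_serial` reports the first as interleaved and the second as serial (T3, then T1, then T2).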

Serializable Schedule

T1          T2
R(A)
W(A)
            R(A)
            W(A)
R(B)
W(B)
            R(B)
            W(B)
Commit      Commit

Equivalent Schedules

• 2 schedules are equivalent if the transactions
– Read the same values
– Produce the same output
– Have the same effect on the database

• Examples
1. R1[X] W2[X] R3[Y] W1[Y] R2[Y] W3[Z] W2[Z]
2. W3[Z] R3[Y] R1[X] W1[Y] R2[Y] W2[Z] W2[X]
3. W2[X] R1[X] W1[Y] R2[Y] W3[Z] W2[Z] R3[Y]

• 1 and 2 are equivalent; not 3

Serializable Schedule

Theorem

• A schedule is serializable if there is a serial schedule such that for every conflicting pair of operations, the two operations appear in the same order in both schedules.

• 2 operations conflict if they belong to different transactions, are on the same object, and at least one of them is a write

• Example 1 is serializable: its conflicting pairs are ordered as in the serial schedule T3, T1, T2 (example 2)
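The theorem suggests a mechanical test: draw an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj, and check that the resulting graph has no cycle. The sketch below is one way to do this; `parse`, `precedence_edges`, and `is_conflict_serializable` are hypothetical names, and the test covers only the conflict-based condition stated in the theorem.

```python
import re
from itertools import combinations


def parse(schedule):
    """Turn 'R1[X] W3[Y]' into [('R', 1, 'X'), ('W', 3, 'Y')]."""
    return [(op, int(txn), item)
            for op, txn, item in re.findall(r"([RW])(\d+)\[(\w+)\]", schedule)]


def precedence_edges(schedule):
    """Edges (Ti, Tj): an operation of Ti conflicts with a later one of Tj."""
    edges = set()
    # combinations() keeps schedule order, so the first op of each pair is earlier.
    for (op1, t1, x1), (op2, t2, x2) in combinations(parse(schedule), 2):
        if t1 != t2 and x1 == x2 and "W" in (op1, op2):
            edges.add((t1, t2))
    return edges


def is_conflict_serializable(schedule):
    """True if the precedence graph has no cycle."""
    edges = precedence_edges(schedule)
    nodes = {t for edge in edges for t in edge}
    while nodes:
        # Peel off transactions that no remaining transaction must precede.
        sources = {n for n in nodes if not any(v == n for _, v in edges)}
        if not sources:
            return False  # every remaining node has a predecessor: a cycle
        nodes -= sources
        edges = {(u, v) for u, v in edges if u not in sources}
    return True
```

On the three example schedules, 1 and 2 yield the same edges (T1 → T2, T3 → T1, T3 → T2), while 3 contains both T2 → T1 and T1 → T2 and is therefore rejected.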

WR Conflicts

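T1              T2
R(A) ($200)
W(A) ($100)
                R(A) ($100)
                W(A) ($106)
                R(B) ($200)
                W(B) ($212)
R(B) ($212)
W(B) ($312)
Commit          Commit

T1 transfers $100 from A to B, and T2 increments both A and B by 6%

(A and B have $200 initially)

Dirty read: T2's R(A) returns the $100 that T1 has written but not yet committed.

Unrepeatable Read (UR): a further R(A) by T1 after T2's W(A) would return $106, not the value T1 last saw.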
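The effect of this interleaving can be checked by replaying the steps. The sketch below is an illustration only: the step functions and `run` are hypothetical names, and amounts are whole dollars so that the 6% increment can use integer arithmetic.

```python
def t1_debit_a(db):
    db["A"] -= 100                   # T1: first half of the transfer


def t1_credit_b(db):
    db["B"] += 100                   # T1: second half of the transfer


def t2_interest_a(db):
    db["A"] = db["A"] * 106 // 100   # T2: add 6% to A


def t2_interest_b(db):
    db["B"] = db["B"] * 106 // 100   # T2: add 6% to B


def run(steps):
    """Apply the steps in order to A = B = $200; return the final (A, B)."""
    db = {"A": 200, "B": 200}
    for step in steps:
        step(db)
    return db["A"], db["B"]


interleaved = run([t1_debit_a, t2_interest_a, t2_interest_b, t1_credit_b])
t1_then_t2 = run([t1_debit_a, t1_credit_b, t2_interest_a, t2_interest_b])
t2_then_t1 = run([t2_interest_a, t2_interest_b, t1_debit_a, t1_credit_b])
```

Both serial orders leave a total of $424 (A = $106, B = $318, or A = $112, B = $312). The interleaved run leaves A = $106, B = $312, a total of $418: the $100 in transit earned no interest in either account.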

WW Conflicts

T1              T2
R(A)
W(A) ($1000)
                R(A)
                W(A) ($2000)
                R(B)
                W(B) ($2000)
R(B)
W(B) ($1000)
Commit          Commit

T1 sets both A and B to $1000; T2 sets both A and B to $2000

Lost Update! The result is A = $2000 and B = $1000, which neither serial order produces.

Concurrency Control Enforces Serializability

• Most commercial DBMSs use protocols (sets of rules) that, when enforced by the DBMS, ensure the serializability of all schedules in which transactions participate.
– Testing serializability after execution is meaningless: how would a violation be rectified?
– This is done by concurrency control

Concurrency Control Protocols

• Commercially accepted mechanisms
– Locking
– Timestamps

• Other mechanisms
– Multi-version and optimistic protocols

• Granularity issues

Locking

• Locking is used to synchronize accesses by concurrent transactions on data items
– A concept also found in operating systems and concurrent programming

• A lock is a variable for a data item that describes the status of the item with respect to allowable operations

Types of Locks

• Binary locks
– Locked, or Unlocked
– Check before enter; wait when locked; lock after enter; unlock after use (and wake up one waiting transaction)
– Simple but too restrictive

• Read/Write locks in commercial DBMS
– read-locked
– write-locked
– Unlocked

Lock compatibility (Y = both locks can be held on the same item by different transactions):

          R-lock  W-lock
R-lock    Y       N
W-lock    N       N
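The compatibility matrix translates directly into a lookup table. This is a minimal sketch, not a lock manager: `COMPATIBLE` and `can_grant` are hypothetical names, and queuing of waiting transactions is left out.

```python
# (lock already held, lock requested) -> may both be held at once?
COMPATIBLE = {
    ("R", "R"): True,
    ("R", "W"): False,
    ("W", "R"): False,
    ("W", "W"): False,
}


def can_grant(requested, held_by_others):
    """True if `requested` ('R' or 'W') is compatible with every lock
    that other transactions currently hold on the same item."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)
```

For instance, `can_grant("R", ["R", "R"])` is true, while `can_grant("W", ["R"])` is false, so the writer has to wait.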

Read/Write Locking Scheme

• A transaction T must issue read-lock (X) or write-lock (X) before any read-item (X)

• T must issue write-lock (X) before any write-item (X)

• T must issue unlock-item (X) after completing all read-item (X) and write-item (X)

• T will not issue a read-lock (X) if T already holds a read/write lock on X

• T will not issue write-lock (X) if T already holds a write lock on X

Does Locking Ensure Serializability?

T1:
    read-lock (Y);
    read-item (Y);
    unlock (Y);          (Y unlocked too early)
    write-lock (X);
    read-item (X);
    X:=X+Y;
    write-item (X);
    unlock (X);

T2:
    read-lock (X);
    read-item (X);
    unlock (X);          (X unlocked too early)
    write-lock (Y);
    read-item (Y);
    Y:=X+Y;
    write-item (Y);
    unlock (Y);

If both transactions read and unlock before either one writes, the result is X == Y (original X + original Y). No serial order gives this result, so T1 and T2 cannot be serialized.

For the serial order T1 then T2:

X == original X + original Y

Y == original X + 2 × original Y
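The three outcomes can be compared by replaying the assignments. The sketch below is an illustration under assumed starting values (X = 20, Y = 30, which are not from the slide); the function names are hypothetical.

```python
def t1_then_t2(x, y):
    x = x + y      # T1: X := X + Y
    y = x + y      # T2: Y := X + Y, using T1's new X
    return x, y


def t2_then_t1(x, y):
    y = x + y      # T2: Y := X + Y
    x = x + y      # T1: X := X + Y, using T2's new Y
    return x, y


def interleaved(x, y):
    """Both transactions read and unlock before either one writes."""
    t1_y = y       # T1: read-item(Y); unlock(Y)
    t2_x = x       # T2: read-item(X); unlock(X)
    y = t2_x + y   # T2: Y := X + Y with the old X; write-item(Y)
    x = x + t1_y   # T1: X := X + Y with the old Y; write-item(X)
    return x, y
```

With X = 20 and Y = 30, the serial orders give (50, 80) and (70, 50), while the interleaving gives (50, 50), which matches neither.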

Need for Locking Protocol

• Locking alone does not ensure serializability!

• We need a locking protocol

• A set of rules that dictate the positioning of locking and unlocking operations, thus guaranteeing serializability

Two-Phase Locking (2PL)

• A transaction follows the two-phase protocol if all locking operations precede the first unlocking operation

Phase 1: Growing (locks are acquired, none released), e.g. read-lock (X), read-lock (Y), write-lock (X), write-lock (Y)

Phase 2: Shrinking (locks are released, none acquired), e.g. unlock (X), unlock (Y)
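The two-phase rule is easy to check on a single transaction's operation list. A minimal sketch, with the hypothetical name `is_two_phase` and operations written as (operation, item) pairs:

```python
def is_two_phase(operations):
    """True if no lock operation follows the first unlock."""
    unlocked = False
    for op, _item in operations:
        if op == "unlock":
            unlocked = True           # the shrinking phase has started
        elif op in ("read-lock", "write-lock") and unlocked:
            return False              # a lock acquired while shrinking
    return True


# T1 as written on the "Does Locking Ensure Serializability?" slide.
t1_early_unlock = [
    ("read-lock", "Y"), ("read-item", "Y"), ("unlock", "Y"),
    ("write-lock", "X"), ("read-item", "X"), ("write-item", "X"),
    ("unlock", "X"),
]

# The same work with unlock(Y) moved after write-lock(X).
t1_two_phase = [
    ("read-lock", "Y"), ("read-item", "Y"), ("write-lock", "X"),
    ("unlock", "Y"), ("read-item", "X"), ("write-item", "X"),
    ("unlock", "X"),
]
```

`is_two_phase(t1_early_unlock)` is false because write-lock (X) comes after unlock (Y); the reordered version passes.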

2PL Variants

• Basic 2PL

• Conservative 2PL
– Locking operations precede transaction execution
– Makes sure the transaction can acquire all the locks it needs

• Strict 2PL
– Unlocking of write-locks after commit (or abort)
– Avoids cascading aborts

• Rigorous 2PL
– Unlocking of all locks after commit (or abort)

Limitations of 2PL

• Some serializable schedules may not be permitted
– Performance not optimal

• 2PL (and locking in general) may cause deadlocks and starvation
– Deadlock: none of the transactions involved can proceed
– Starvation: some transaction waits forever
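One common way to detect a deadlock, not described on the slide itself, is to look for a cycle in a wait-for graph, where an edge Ti → Tj means Ti is waiting for a lock that Tj holds. The sketch below assumes the graph is given as a dictionary; `find_deadlock` is a hypothetical name.

```python
def find_deadlock(waits_for):
    """Return a list of transactions that form a cycle, or None."""
    visiting = []   # transactions on the current search path, in order
    done = set()    # transactions already shown to be cycle-free

    def visit(txn):
        if txn in done:
            return None
        if txn in visiting:
            return visiting[visiting.index(txn):]  # the path loops back
        visiting.append(txn)
        for other in waits_for.get(txn, ()):
            cycle = visit(other)
            if cycle:
                return cycle
        visiting.pop()
        done.add(txn)
        return None

    for txn in waits_for:
        cycle = visit(txn)
        if cycle:
            return cycle
    return None
```

With `{"T1": ["T2"], "T2": ["T1"]}` the function returns `["T1", "T2"]`; a DBMS would then abort one of the transactions on the cycle.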

Lock Granularity

• Larger size - lower concurrency

• Smaller size - higher overhead

• What is the best item size?
– Depends on the type of transactions

• Processing a mix of transactions?
– Multiple granularity locking scheme, changing the size of the data item dynamically

Performance of Locking

[Figure: throughput versus number of active transactions, with the thrashing region marked.]

• Overhead: blocking

• Increasing the throughput:
– Locking smaller size objects
– Reducing locking time
– Reducing hot spots

Other CC Protocols

• Timestamp based

• Multi-version based

• Optimistic concurrency control
– No checking is done before or during transaction execution
– The transaction is validated at the end of execution, by checking if serializability has been violated
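The validation step can be sketched as a set comparison. This is a simplified illustration of one possible check, assuming the system records the items each transaction read and the write sets of transactions that committed while it was running; `validate` is a hypothetical name and real protocols track more than this.

```python
def validate(read_set, committed_write_sets):
    """True if no transaction that committed while this one ran
    wrote an item that this transaction read."""
    return all(read_set.isdisjoint(ws) for ws in committed_write_sets)
```

For example, `validate({"X", "Y"}, [{"Z"}])` passes, while `validate({"X"}, [{"X", "Z"}])` fails and the transaction would be restarted.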

Summary of Transaction Concepts

[Figure: Transaction correctness (ACID). Baseline: serial schedule. Ideal: serializable schedule. Protocols: 2PL; Strict 2PL; other CC protocols (timestamp, multi-version, optimistic).]

Summary

• To improve performance: interleave transactions

• Correctness: ACID
– Serial schedule is correct
– Serializable schedule is equivalent to some serial schedule

• Concurrency control enforces serializability
– 2PL: deadlock, starvation, granularity
– Optimistic, timestamping, multi-version

Performance Impact of Concurrency Control

• Lock contention

• Deadlock

• LONG transactions are penalized