Recovery System
Database Management Systems II, Huiping Cao
Source: hcao/teaching/cs582/note/DB2_t23_trans3_recovery.pdf

Review: The ACID properties
• Atomicity: All actions in the transaction happen, or none happen.
• Consistency: If each transaction is consistent, and the DB starts consistent, it ends up consistent.
• Isolation: Execution of one transaction is isolated from that of other transactions.
• Durability: If a transaction commits, its effects persist.
• The Recovery Manager guarantees Atomicity and Durability.

Motivation
• Atomicity:
  - Transactions may abort ("Rollback").
• Durability:
  - What if the DBMS stops running?
  - Causes: system crash, media failures.
• Desired behavior after the system restarts:
  - T1, T2 and T3 should be durable.
  - T4 and T5 should be aborted (effects not seen).
[Figure: timeline of transactions T1 to T5 relative to the crash point.]

ARIES recovery algorithm
• Analysis: identifies dirty pages in the buffer pool and active transactions at the time of the crash.
• Redo: repeats all actions, starting from an appropriate point in the log, and restores the database state to what it was at the time of the crash.
• Undo: undoes the actions of transactions that did not commit, so that the database reflects only the actions of committed transactions.

Three main principles
• Write-Ahead Logging
• Repeating History During Redo
• Logging Changes During Undo

Basic Idea: Logging
• Record REDO and UNDO information, for every update, in a log.
  - Sequential writes to the log (put it on a separate disk).
  - Minimal info (diff) written to the log, so multiple updates fit in a single log page.
• Log: an ordered list of REDO/UNDO actions.
  - A log record contains: <XID, pageID, offset, length, old data, new data>
  - and additional control info (which we will see soon).
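
The log record layout above can be illustrated with a minimal Python sketch. It assumes a page is just a `bytearray` and that `UpdateRecord`, `redo` and `undo` are hypothetical names invented for this illustration; control fields such as LSNs are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateRecord:
    """<XID, pageID, offset, length, old data, new data> (hypothetical layout)."""
    xid: str
    page_id: str
    offset: int
    length: int
    old_data: bytes   # before-image, used for UNDO
    new_data: bytes   # after-image, used for REDO


def redo(page: bytearray, rec: UpdateRecord) -> None:
    # Reapply the change: overwrite the logged byte range with the after-image.
    page[rec.offset:rec.offset + rec.length] = rec.new_data


def undo(page: bytearray, rec: UpdateRecord) -> None:
    # Roll the change back: overwrite the same range with the before-image.
    page[rec.offset:rec.offset + rec.length] = rec.old_data


page = bytearray(b"....ABC...")
rec = UpdateRecord("T1", "P1", offset=4, length=3, old_data=b"ABC", new_data=b"DEF")
redo(page, rec)
after_redo = bytes(page)
undo(page, rec)
after_undo = bytes(page)
```

Because only the changed byte range (the diff) is stored, the record stays small, which is what lets many updates share one log page.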

Write-Ahead Logging (WAL)
• The Write-Ahead Logging protocol:
  - Must force the log record for an update before the corresponding data page gets to disk.
  - Must write all log records for a transaction before commit.

WAL & the Log
• Each log record has a unique Log Sequence Number (LSN).
  - LSNs always increase.
• Each data page contains a pageLSN.
  - The LSN of the most recent log record for an update to that page.
• The system keeps track of flushedLSN.
  - The max LSN flushed so far.
• WAL: before a page is written,
  - pageLSN ≤ flushedLSN
[Figure: the log as a sequence of LSNs, divided into log records flushed to disk and the "log tail" in RAM. Data pages in the DB each carry a pageLSN; flushedLSN is kept in RAM.]
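
The rule pageLSN ≤ flushedLSN can be sketched as a check made just before a data page is written out. This is a toy model under stated assumptions: `LogManager` and `write_page` are hypothetical names, LSNs are simply 1, 2, 3, ... in append order, and no real disk I/O is performed.

```python
class LogManager:
    """Toy log: records live in a list, and position + 1 serves as the LSN."""

    def __init__(self):
        self.records = []
        self.flushed_lsn = 0   # max LSN assumed to be on stable storage

    def append(self, payload) -> int:
        self.records.append(payload)
        return len(self.records)

    def flush(self, up_to_lsn: int) -> None:
        # A real system would write and sync the log tail here.
        self.flushed_lsn = max(self.flushed_lsn, min(up_to_lsn, len(self.records)))


def write_page(page_lsn: int, log: LogManager) -> None:
    """Enforce WAL before a data page goes to disk."""
    if page_lsn > log.flushed_lsn:
        log.flush(page_lsn)            # force the log first
    assert page_lsn <= log.flushed_lsn
    # ... the page itself would be written to disk here ...


log = LogManager()
lsn1 = log.append("T1 update P1")
lsn2 = log.append("T1 update P2")
write_page(page_lsn=lsn2, log=log)
```

After `write_page` returns, every log record describing the page's latest change is already considered flushed, so the change can always be undone after a crash.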

Log Records
• Fields common to all log records: prevLSN, transID, type
• Additional fields for update log records: pageID, length, offset, before-image, after-image

Log Record Types
• Update
• Commit
  - The commit log record is appended to the log.
  - The log tail is written to stable storage (up to and including the commit record).
  - Other steps, such as removing the transaction's entry from the transaction table, follow the writing of the commit log record.
• Abort
  - Initiates Undo.
• End (signifies the end of a commit or abort)
  - After commit/abort, additional actions must be taken.
  - After all these additional steps are completed, write "End".
• Compensation Log Records (CLRs)
  - Written when a transaction is aborted, or during recovery from a crash (UNDO actions).
  - Carry an undoNextLSN field.

Transaction Commit
• Write the commit record to the log.
• All log records up to the transaction's lastLSN are flushed.
  - Guarantees that flushedLSN ≥ lastLSN.
  - Note that log flushes are sequential.
  - Many log records per log page.
• Commit() returns.
• Write the end record to the log.
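
The commit steps above can be sketched as follows. This is only an illustration: the class `Wal` and its methods are hypothetical, LSNs are 1, 2, 3, ... in append order, and `flush` is a stand-in for a real sequential write of the log tail.

```python
class Wal:
    """Toy write-ahead log with a transaction table."""

    def __init__(self):
        self.log = []          # list of (lsn, xid, record_type)
        self.flushed_lsn = 0   # highest LSN assumed to be on stable storage
        self.txn_table = {}    # xid -> {"status": ..., "lastLSN": ...}

    def append(self, xid, record_type):
        lsn = len(self.log) + 1
        self.log.append((lsn, xid, record_type))
        if record_type != "end":
            entry = self.txn_table.setdefault(xid, {"status": "running"})
            entry["lastLSN"] = lsn
        return lsn

    def flush(self, up_to_lsn):
        # Stand-in for a sequential write + sync of the log tail.
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)

    def commit(self, xid):
        commit_lsn = self.append(xid, "commit")
        self.flush(commit_lsn)                   # now flushedLSN >= lastLSN
        self.txn_table[xid]["status"] = "committed"
        # Commit() would return to the caller at this point.
        del self.txn_table[xid]
        self.append(xid, "end")                  # the end record is not forced


wal = Wal()
wal.append("T1", "update")
wal.append("T1", "update")
wal.commit("T1")
```

Only the commit record forces a flush; the end record can reach disk later with the next flush.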

Transaction Abort
• For now, consider an explicit abort of a transaction.
  - No crash involved.
• We want to "play back" the log in reverse order, undoing updates.
  - Get the lastLSN of the transaction from the transaction table.
  - Can follow the chain of log records backward via the prevLSN field.
  - Before starting UNDO, write an Abort log record.
    - For recovering from a crash during UNDO!
  - To perform UNDO, must have a lock on the data!

Abort, cont.
• Before restoring the old value of a page, write a CLR:
  - You continue logging while you UNDO!
  - A CLR has one extra field: undoNextLSN.
    - Points to the next LSN to undo (i.e. the prevLSN of the record we are currently undoing).
    - Example:
        10  T1 update P2
        20  T1 update P1
        40  CLR: Undo T1 LSN 20    // undoNextLSN = prevLSN(LSN 20) = 10
• CLRs are never undone (but they might be redone when repeating history: guarantees Atomicity!).
• At the end of UNDO, write an "end" log record.
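
A minimal sketch of an explicit abort, assuming no crash and a transaction whose chain contains only update records. `Rec`, `abort` and the `undoes` field are hypothetical names for this illustration, and LSNs are assumed to advance in steps of 10; the actual restoring of before-images is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Rec:
    lsn: int
    xid: str
    type: str                            # "update", "abort", "CLR" or "end"
    prev_lsn: Optional[int] = None
    undo_next_lsn: Optional[int] = None  # only meaningful for CLRs
    undoes: Optional[int] = None         # LSN of the update a CLR compensates


def abort(log: Dict[int, Rec], xid: str, last_lsn: int, next_lsn: int) -> None:
    def append(rec_type, prev, **extra):
        nonlocal next_lsn
        rec = Rec(next_lsn, xid, rec_type, prev, **extra)
        log[rec.lsn] = rec
        next_lsn += 10
        return rec.lsn

    prev = append("abort", last_lsn)     # written before any UNDO work
    cur = last_lsn
    while cur is not None:               # walk the prevLSN chain backward
        rec = log[cur]
        if rec.type == "update":
            # Write the CLR first; the before-image would be restored next.
            prev = append("CLR", prev, undo_next_lsn=rec.prev_lsn, undoes=rec.lsn)
        cur = rec.prev_lsn
    append("end", prev)


log = {
    10: Rec(10, "T1", "update", prev_lsn=None),
    20: Rec(20, "T1", "update", prev_lsn=10),
}
abort(log, "T1", last_lsn=20, next_lsn=30)
```

With these assumed LSNs the abort record lands at 30 and the first CLR at 40 with undoNextLSN = 10, which matches the three-line example above.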

CLR
• Compensation Log Records (CLRs)
  - Written when a transaction is aborted, or during recovery from a crash (UNDO actions).
  - Carry an undoNextLSN field.
LSN  prevLSN  transID  type           pageID  len  offset  before-image  after-image  undoNextLSN
10   null     T1000    update         P500    3    21      ABC           DEF
20   null     T2000    update         P600    3    41      HIJ           KLM
30   20       T2000    update         P500    3    20      GDE           QRS
40   10       T1000    update         P505    3    21      TUV           WXY
45   40       T1000    abort
50   45       T1000    CLR (UNDO 40)  P505    3    21      WXY           TUV          10

Other Log-Related State
• Transaction Table:
  - One entry per active transaction.
  - Contains XID: the transaction id.
  - Status (running/committed/aborted).
  - lastLSN: the LSN of the most recent log record for this transaction.
• Dirty Page Table:
  - One entry per dirty page in the buffer pool.
  - Contains recLSN: the LSN of the log record which first caused the page to be dirty.

Example state
Dirty Page Table (pageID; recLSN values left blank on the slide): P500, P600, P505
Transaction Table (transID; lastLSN values left blank on the slide): T1000, T2000
Log (prevLSN column left blank on the slide):
transID  type    pageID  len  offset  before-image  after-image
T1000    update  P500    3    21      ABC           DEF
T2000    update  P600    3    41      HIJ           KLM
T2000    update  P500    3    20      GDE           QRS
T1000    update  P505    3    21      TUV           WXY

Normal Execution of a transaction
• Series of reads and writes, followed by commit or abort.
  - We will assume that a write is atomic on disk.
    - In practice, additional details are needed to deal with non-atomic writes.
• Strict 2PL

Checkpointing
• Periodically, the DBMS creates a checkpoint, in order to minimize the time taken to recover in the event of a system crash. Write to the log:
  - Step 1: begin_checkpoint record: indicates when the checkpoint began.
  - Step 2: end_checkpoint record: contains the current transaction table and dirty page table. This is a "fuzzy checkpoint":
    - Other transactions continue to run, so these tables are accurate only as of the time of the begin_checkpoint record.
    - No attempt to force dirty pages to disk; the effectiveness of the checkpoint is limited by the oldest unwritten change to a dirty page. (So it is a good idea to periodically flush dirty pages to disk!)
  - Step 3: store the LSN of the checkpoint record in a safe place (master record).
    - This is done after the end_checkpoint record is written to stable storage.
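
The three checkpoint steps can be sketched as below. The function `take_checkpoint` and the dictionary-based tables are hypothetical, list positions stand in for LSNs, and storing the LSN of the begin_checkpoint record in the master record is an assumption of this sketch (the slide only says "checkpoint record").

```python
import copy


def take_checkpoint(log, txn_table, dirty_page_table, master):
    # Step 1: begin_checkpoint marks when the checkpoint began.
    begin_lsn = len(log)
    log.append({"type": "begin_checkpoint"})

    # Step 2: end_checkpoint carries snapshots of both tables.
    # No dirty pages are forced to disk (fuzzy checkpoint).
    log.append({
        "type": "end_checkpoint",
        "txn_table": copy.deepcopy(txn_table),
        "dirty_page_table": dict(dirty_page_table),
    })

    # Step 3: only after end_checkpoint is on stable storage,
    # remember where the checkpoint starts.
    master["checkpoint_lsn"] = begin_lsn
    return begin_lsn


log = [{"type": "update"}]
master = {}
txn_table = {"T1": {"status": "running", "lastLSN": 0}}
dirty_page_table = {"P1": 0}
ckpt_lsn = take_checkpoint(log, txn_table, dirty_page_table, master)

# Transactions keep running after the checkpoint; the snapshot must not change.
txn_table["T1"]["lastLSN"] = 5
```

Copying the tables is what keeps the checkpoint cheap: normal processing never has to stop while pages are written.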

The Big Picture: What is Stored Where
• LOG: log records (prevLSN, XID, type, length, pageID, offset, before-image, after-image)
• DB: data pages, each with a pageLSN
• RAM: Transaction Table (lastLSN, status), Dirty Page Table (recLSN), flushedLSN
• Master record

Crash Recovery: Big Picture
• Start from a checkpoint (found via the master record).
• Three phases. Need to:
  - Analysis: figure out which transactions committed since the checkpoint, and which failed.
  - REDO all actions (repeat history).
  - UNDO the effects of failed transactions.
[Figure: log timeline ending at CRASH, with three marked points: the oldest log record of a transaction active at the crash, the smallest recLSN in the dirty page table after Analysis, and the last checkpoint. Bars labeled A, R and U show the extent of the Analysis, Redo and Undo passes.]

Recovery: The Analysis Phase
• Finish three tasks:
  - Determine the REDO point.
  - Determine dirty pages.
  - Determine active transactions.
• Reconstruct the state at the checkpoint.
  - Start from the most recent begin_checkpoint,
  - via the end_checkpoint record.
• Scan the log forward from the checkpoint.
  - End record: remove the transaction from the transaction table.
  - Other records: add the transaction to the transaction table, set lastLSN = LSN, change the transaction status on commit.
  - Update record: if P is not in the dirty page table, add P to the dirty page table and set its recLSN = LSN.
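
The forward scan described above can be sketched in a few lines. This is an illustration rather than a full ARIES implementation: `analysis` is a hypothetical function, log records are simplified to `(lsn, xid, type, page)` tuples, the log shown is a small made-up example, and treating CLRs like updates for the dirty page table is an assumption of the sketch.

```python
def analysis(log, ckpt_txn_table, ckpt_dirty_pages):
    """Rebuild the transaction table and dirty page table after a crash."""
    txn = {xid: dict(entry) for xid, entry in ckpt_txn_table.items()}
    dpt = dict(ckpt_dirty_pages)
    for lsn, xid, rtype, page in log:
        if rtype == "end":
            txn.pop(xid, None)                      # transaction is finished
            continue
        entry = txn.setdefault(xid, {"status": "U", "lastLSN": lsn})
        entry["lastLSN"] = lsn
        if rtype == "commit":
            entry["status"] = "C"
        if rtype in ("update", "CLR") and page not in dpt:
            dpt[page] = lsn                         # first record to dirty the page
    redo_start = min(dpt.values(), default=None)    # the REDO point
    return txn, dpt, redo_start


# Hypothetical log written after a checkpoint with empty tables.
log = [
    (10, "T1", "update", "P5"),
    (20, "T2", "update", "P3"),
    (30, "T1", "abort", None),
    (40, "T1", "CLR", "P5"),
    (45, "T1", "end", None),
    (50, "T3", "update", "P1"),
    (60, "T2", "update", "P5"),
]
txn, dpt, redo_start = analysis(log, {}, {})
```

Here T1 drops out of the transaction table when its end record is seen, leaving T2 and T3 as the transactions still active at the crash.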

Example: Analysis phase
State before the crash:
Dirty Page Table (pageID): P500, P600, P505
Transaction Table (transID): T1000, T2000
Log (record numbers added for reference; the prevLSN column is blank on the slides):
#  transID  type    pageID  len  offset  before-image  after-image
1  T1000    update  P500    3    21      ABC           DEF
2  T2000    update  P600    3    41      HIJ           KLM
3  T2000    update  P500    3    20      GDE           QRS
4  T1000    update  P505    3    21      TUV           WXY
5  T2000    commit
6  T2000    end
7  T1000    update  P700    …    …       …             …
• Checkpoint: taken at the beginning of the execution, with an empty dirty page table and transaction table.
• The last log record (record 7) was not written to disk.
Analysis after restart:
• Restore the dirty page table and transaction table using the checkpoint (both empty), then read the log.
• Scan the first log record: Dirty Page Table = P500; Transaction Table = T1000.
• Scan the second log record: Dirty Page Table = P500, P600; Transaction Table = T1000, T2000.
• Scan the third log record: no change.
• Scan the fourth log record: Dirty Page Table = P500, P600, P505; Transaction Table = T1000, T2000.
• Scan the fifth log record: set the status of transaction T2000 from "U" to "C".
• Scan the sixth log record: remove T2000 from the transaction table. Transaction Table = T1000.
What if P600 was already written to the disk?

Recovery: The REDO Phase
• We repeat history to reconstruct the state at the crash:
  - Reapply all updates (even of aborted transactions!), redo CLRs.
• Scan forward from the log record containing the smallest recLSN in the Dirty Page Table.
• For each redoable log record (CLR or update), REDO the action unless:
  - the affected page is not in the Dirty Page Table, or
  - the affected page is in the Dirty Page Table, but the recLSN for the entry > LSN of the log record being checked, or
  - the pageLSN of this page (as stored on disk) >= LSN of the log record being checked.
• To REDO an action:
  - Reapply the logged action.
  - Set pageLSN to LSN.
  - No additional logging!
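
The three skip conditions can be sketched as follows. The function `redo_phase` is hypothetical, log records are simplified to `(lsn, type, page_id, after_image)` tuples, each page is modeled as a dictionary holding a whole-page value, and the LSN values are made up for the illustration.

```python
def redo_phase(log, dirty_page_table, pages):
    """Repeat history; returns the LSNs that were actually reapplied."""
    if not dirty_page_table:
        return []
    start = min(dirty_page_table.values())       # smallest recLSN
    applied = []
    for lsn, rtype, page_id, after_image in log:
        if lsn < start or rtype not in ("update", "CLR"):
            continue                              # not redoable, or before the REDO point
        if page_id not in dirty_page_table:
            continue                              # condition 1: page is not dirty
        if dirty_page_table[page_id] > lsn:
            continue                              # condition 2: recLSN > LSN
        page = pages[page_id]                     # stands in for fetching the page
        if page["pageLSN"] >= lsn:
            continue                              # condition 3: change already on disk
        page["data"] = after_image                # reapply the logged action
        page["pageLSN"] = lsn                     # no new log record is written
        applied.append(lsn)
    return applied


log = [
    (10, "update", "P500", "DEF"),
    (20, "update", "P600", "KLM"),
    (30, "update", "P500", "QRS"),
    (40, "update", "P505", "WXY"),
    (50, "commit", None, None),
]
# Assumed crash state: P600 was flushed and is no longer in the dirty page table.
dirty_page_table = {"P500": 10, "P505": 40}
pages = {
    "P500": {"pageLSN": 10, "data": "DEF"},
    "P600": {"pageLSN": 20, "data": "KLM"},
    "P505": {"pageLSN": 0, "data": "TUV"},
}
applied = redo_phase(log, dirty_page_table, pages)
```

The first two conditions are checked without touching the page at all; only the third requires fetching it, which is why they are tested in this order.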

Example: REDO phase
State after Analysis:
Dirty Page Table (pageID): P500, P600, P505
Transaction Table (transID): T1000
Log (record numbers added for reference):
#  transID  type    pageID  len  offset  before-image  after-image
1  T1000    update  P500    3    21      ABC           DEF
2  T2000    update  P600    3    41      HIJ           KLM
3  T2000    update  P500    3    20      GDE           QRS
4  T1000    update  P505    3    21      TUV           WXY
5  T2000    commit
6  T2000    end
Smallest recLSN = LSN of the 1st log record.
REDO:
• Fetch P500, compare its pageLSN with this LSN, apply the update, and update the pageLSN.
• Fetch P600: assume P600 was written to disk and that its entry was not in the DPT, so NO UPDATE.

Recovery: The UNDO Phase
ToUndo = {x | x is the lastLSN of a "loser" transaction}
Repeat:
• Choose the largest LSN among ToUndo (and remove it from ToUndo).
• If this LSN is a CLR (compensation log record) and undoNextLSN == NULL:
  - Write an End record for this transaction.
• If this LSN is a CLR and undoNextLSN != NULL:
  - Add undoNextLSN to ToUndo.
• Else this LSN is an update:
  - Undo the update,
  - Write a CLR,
  - Add prevLSN to ToUndo.
Until ToUndo is empty.
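
The loop above can be sketched as follows. `undo_phase` is a hypothetical function, log records are plain dictionaries keyed by LSN, new LSNs are assumed to advance in steps of 10, and restoring before-images is omitted. One assumption goes beyond the pseudocode: when an undone update has no prevLSN, the sketch writes the End record immediately.

```python
def undo_phase(log, loser_last_lsns, next_lsn):
    """Undo all loser transactions; returns the records appended to the log."""
    to_undo = set(loser_last_lsns)
    appended = []

    def write(record):
        nonlocal next_lsn
        record["lsn"] = next_lsn
        log[next_lsn] = record
        appended.append(record)
        next_lsn += 10

    while to_undo:
        lsn = max(to_undo)                 # always undo the latest change first
        to_undo.remove(lsn)
        rec = log[lsn]
        if rec["type"] == "CLR":
            follow = rec["undoNextLSN"]    # CLRs are never undone, only followed
        elif rec["type"] == "update":
            # The before-image would be restored on the page here.
            write({"type": "CLR", "xid": rec["xid"], "undoes": lsn,
                   "undoNextLSN": rec["prevLSN"]})
            follow = rec["prevLSN"]
        else:
            follow = rec["prevLSN"]        # e.g. an abort record
        if follow is None:
            write({"type": "end", "xid": rec["xid"]})
        else:
            to_undo.add(follow)
    return appended


# Hypothetical log: T2 and T3 are losers, T1 is not.
log = {
    10: {"type": "update", "xid": "T1", "prevLSN": None},
    20: {"type": "update", "xid": "T2", "prevLSN": None},
    50: {"type": "update", "xid": "T3", "prevLSN": None},
    60: {"type": "update", "xid": "T2", "prevLSN": 20},
}
appended = undo_phase(log, [60, 50], next_lsn=70)
summary = [(r["type"], r["xid"], r.get("undoNextLSN")) for r in appended]
```

Taking the largest LSN each time interleaves the rollback of all losers into a single backward pass over the log.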

Example: UNDO phase
Dirty Page Table (pageID): P500, P600, P505
Transaction Table (transID): T1000
Log, including the records appended during UNDO (record numbers added for reference):
#  transID  type    pageID  len  offset  before-image  after-image
1  T1000    update  P500    3    21      ABC           DEF
2  T2000    update  P600    3    41      HIJ           KLM
3  T2000    update  P500    3    20      GDE           QRS
4  T1000    update  P505    3    21      TUV           WXY
5  T2000    commit
6  T2000    end
7  T1000    CLR-4
8  T1000    CLR-1
9  T1000    End
ToUndo = {LSN of the 4th log record}
Active transactions: T1000
Most recent LSN: the LSN of the 4th log record
• Undo the update.
• Write CLR-4.
• undoNextLSN = prevLSN of the 4th log record = LSN of the 1st log record.
Then undo the action in the 1st log record, write CLR-1 with undoNextLSN = null, and write T1000 End.

Example of Recovery
LSN  LOG
00   begin_checkpoint
05   end_checkpoint
10   update: T1 writes P5
20   update: T2 writes P3
30   T1 abort
40   CLR: Undo T1 LSN 10
45   T1 End
50   update: T3 writes P1
60   update: T2 writes P5
     CRASH, RESTART
[Figure: prevLSN arrows link each transaction's log records. RAM holds the Xact Table (lastLSN, status), the Dirty Page Table (recLSN), flushedLSN and ToUndo.]
After Analysis:
Dirty Page Table
pageID  recLSN
P1      50
P3      20
P5      10
Transaction Table
transID  lastLSN
T2       60
T3       50
Redo: 10-60
ToUndo = {60, 50}

Example: Crash During Restart!
LSN  LOG
00   begin_checkpoint
05   end_checkpoint
10   update: T1 writes P5
20   update: T2 writes P3
30   T1 abort
40   CLR: Undo T1 LSN 10
45   T1 End
50   update: T3 writes P1
60   update: T2 writes P5
     CRASH, RESTART
70   CLR: Undo T2 LSN 60
80   CLR: Undo T3 LSN 50
85   T3 end
     CRASH, RESTART
[Figure: undoNextLSN arrows lead from each CLR to the next record to undo.]
First restart:
Redo: 10-60
ToUndo = {60, 50}
• Process 60: write a CLR, undoNextLSN = 20.
• Process 50: write a CLR, undoNextLSN = null, T3 end.
• Log records 70-85 are written to disk.
Dirty Page Table
pageID  recLSN
P1      50
P3      20
P5      10
Transaction Table
transID  lastLSN
T2       60
T3       50

Example: Crash During Restart! (second restart)
LSN  LOG
00   begin_checkpoint
05   end_checkpoint
10   update: T1 writes P5
20   update: T2 writes P3
30   T1 abort
40   CLR: Undo T1 LSN 10
45   T1 End
50   update: T3 writes P1
60   update: T2 writes P5
     CRASH, RESTART
70   CLR: Undo T2 LSN 60
80   CLR: Undo T3 LSN 50
85   T3 end
     CRASH, RESTART
90   CLR: Undo T2 LSN 20, T2 end
[Figure: undoNextLSN arrows lead from each CLR to the next record to undo.]
Second restart:
Redo: 10-85
ToUndo = {70}
• Process 70: it is a CLR, so no new CLR is written; undoNextLSN = 20.
• Process 20: write a CLR (90), undoNextLSN = null, T2 end.
Dirty Page Table
pageID  recLSN
P1      50
P3      20
P5      10
Transaction Table
transID  lastLSN
T2       70

Additional Crash Issues
• What happens if the system crashes during Analysis? During REDO?
• How do you limit the amount of work in REDO?
  - Flush asynchronously in the background.
  - Watch "hot spots"!
• How do you limit the amount of work in UNDO?
  - Avoid long-running transactions.

Summary of Logging/Recovery
• The Recovery Manager guarantees Atomicity and Durability.
• LSNs identify log records; they are linked into backward chains per transaction (via prevLSN).
• pageLSN allows comparison of data page and log records.

Summary, Cont.
• Checkpointing: a quick way to limit the amount of log to scan on recovery.
• Recovery works in 3 phases:
  - Analysis: forward from the checkpoint.
  - Redo: forward from the oldest recLSN.
  - Undo: backward from the end to the first LSN of the oldest Xact alive at the crash.
• Upon Undo, write CLRs.
• Redo "repeats history": simplifies the logic!