Files and Buffer Manager Chapter 15. Jim Gray, Andreas Reuter Transaction Processing - Concepts and...


Files and Buffer Manager, Chapter 15. Jim Gray, Andreas Reuter: Transaction Processing - Concepts and Techniques. WICS, August 2 - 6, 1999.

         Aug. 2                Aug. 3               Aug. 4                      Aug. 5                Aug. 6
 9:00    Intro & terminology   TP mons & ORBs       Logging & res. mgr.         Files & buffer mgr.   Structured files
11:00    Reliability           Locking theory       Res. mgr. & trans. mgr.     COM+                  Access paths
13:30    Fault tolerance       Locking techniques   CICS & TP & Internet        CORBA/EJB + TP        Groupware
15:30    Transaction models    Queueing             Advanced trans. mgr.        Replication           Performance & TPC
18:00    Reception             Workflow             Cyberbricks                 Party                 FREE

Files and Buffer Manager

Chapter 15


Abstractions Provided by the File Manager

Device independence: The file manager turns the large variety of external storage devices, such as disks (with their different numbers of cylinders, tracks, arms, and read/write heads), ram-disks, tapes, and so on, into simple abstract data types.

Allocation independence: The file manager does its own space management for storing the data objects presented by the client. It may store the same objects in more than one place (replication).


Abstractions Provided by the File Manager

Address independence: Whereas objects in main memory are always accessed through their addresses, the file manager provides mechanisms for associative access. Thus, for example, the client can request access to all records with a specified value in some field of the record. Support for associative access comes in many flavors, from simple mechanisms yielding fast retrieval via the primary key up to the expressive power of the SQL select statement.


External Storage vs. Main Memory

Capacity: Main memory is usually limited to a size that is some orders of magnitude smaller than what large databases need.

Economics: External storage holds large volumes of data at reasonable cost.

Durability: Main memory is volatile. External storage devices such as magnetic or optical disks are inherently durable and therefore are appropriate for storing persistent objects. After a crash, recovery starts with what is found in durable storage.


External Storage vs. Main Memory

Speed: External storage devices are some orders of magnitude slower than main memory. As a result, it is more costly, both in terms of latency and in terms of pathlength, to get data from external storage to the CPU than to load data from main memory.

Functionality: Data cannot be processed directly on external storage: they can neither be compared nor modified “out there.”


The Storage Pyramid

[Figure: the storage pyramid, with side labels for typical capacity and for current versus stale data.]

main memory: electronic RAM and bulk storage

online external storage: magnetic / optical disks

nearline (archive) storage: automated archives (e.g., optical disk jukeboxes, tape robots)


Interfacing to External Memory: Read-Write Mapping

[Figure: external storage holds Files A to D; individual objects move to and from main memory through explicit operations: read object w, read object y, read object x / write object x.]


Interfacing to External Memory: File Mapping

[Figure: external storage holds Files A to D; "map File C" makes File C appear in main memory, and "unmap File C" removes it again.]


Interfacing to External Memory: Single-Level Storage

[Figure: external storage holds Files A to D; Files A to D also appear in virtual memory, next to main memory. Further labels: virtual memory, explicit mapping.]


Locality and Caching

The movement of data through the pyramid is guided by the principle of locality:

Locality of active data: Data that have recently been referenced will very likely be referenced again.

Locality of passive data: Data that have not been referenced recently will most likely not be referenced in the future.


Levels of Abstraction in a File and Database Manager

[Figure: layered architecture of a file and database manager.]

Components and their functions, top to bottom:

  Application: transaction programs
  Database access modules: tuple management, associative access
  Database buffer manager (with logging and recovery): buffer management; manages main memory
  Media and file manager: file management; manages online external memory
  Archive manager: archive management; manages nearline external memory

Interface labels in the figure: set-oriented access, tuple-oriented access, block-oriented access, device-oriented access.

Other labels in the figure: DBMS; sort, join, ...; read, write.


Operations of the Basic File System

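STATUS create(filename, allocparmp);                  /* create a new file */
STATUS delete(filename);                              /* delete a file */
STATUS open(filename, ACCESSMODE, FILEID);            /* open a file */
STATUS close(FILEID);                                 /* close a file */
STATUS extend(FILEID, allocparmp);                    /* extend an existing file */
STATUS read(FILEID, BLOCKID, BLOCKP);                 /* read one block */
STATUS readc(FILEID, BLOCKID, blockcount, BLOCKP);    /* read blockcount blocks starting at BLOCKID */
STATUS write(FILEID, BLOCKID, BLOCKP);                /* write one block */
STATUS writec(FILEID, BLOCKID, blockcount, BLOCKP);   /* write blockcount blocks starting at BLOCKID */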


Mapping Files To Disk

[Figure: Files A to E are mapped onto Disk 1. The connecting symbols in the figure denote the address mapping between disk and files.]


Issues in Managing Disk Space

Initial allocation: When a file is created, how many contiguous slots should be allocated to it?

Incremental expansion: If an existing file grows beyond the number of slots currently allocated, how many additional contiguous blocks should be assigned to that file?

Reorganization: When and how should the free space on the disk be reorganized?


Extent-Based Allocation

[Figure: a file directory with entries A and B refers to an extent directory. The extent directory shown holds four extents:]

                primary   1. secondary   2. secondary   3. secondary
disk-id         Disk-A    Disk-B         Disk-A         Disk-A
extent index    14        187            3              214
accum-length    100       350            600            850
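
An extent directory like the one in the figure can be searched by accumulated length to find which extent holds a given block of the file. The following is a minimal sketch, not the book's code. It assumes that accum-length counts the blocks in an extent and all earlier extents, and that extent index locates the extent on its disk. The names `ExtentEntry`, `sample_extents`, and `find_extent` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the extent directory, following the figure's table. */
typedef struct {
    const char *disk_id;      /* disk holding the extent */
    int         extent_index; /* assumed: locates the extent on that disk */
    int         accum_length; /* assumed: blocks in this and all earlier extents */
} ExtentEntry;

/* Values from the figure: the primary extent and three secondary extents. */
static const ExtentEntry sample_extents[] = {
    { "Disk-A",  14, 100 },
    { "Disk-B", 187, 350 },
    { "Disk-A",   3, 600 },
    { "Disk-A", 214, 850 },
};

/* Map a file-relative block number to the extent that holds it.
 * Returns the extent's position in the directory and stores the block's
 * offset inside that extent in *offset. Returns -1 if the block lies
 * outside the allocated space. */
int find_extent(const ExtentEntry *dir, size_t n, int block, int *offset)
{
    int start = 0;   /* first file-relative block number of extent i */

    assert(dir != NULL && offset != NULL);
    if (block < 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (block < dir[i].accum_length) {
            *offset = block - start;
            return (int)i;
        }
        start = dir[i].accum_length;
    }
    return -1;
}
```

Under these assumptions, block 120 of the file falls into the first secondary extent on Disk-B at offset 20, because the primary extent covers blocks 0 to 99.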


Buddy Systems

[Figure: 16 slots with the 4-bit addresses 0000 to 1111 (shaded extents are free), grouped into extents of buddy types 0 to 3, next to a table of free extents per type. Free-extent entries as shown: 0110; 00101100; 1011.]
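
The address arithmetic behind a buddy system can be sketched in a few lines. This sketch assumes the common binary buddy layout, in which an extent of type k spans 2^k slots and starts at a multiple of 2^k; the scheme in the figure may differ in detail. The function names are hypothetical.

```c
#include <assert.h>

/* Address of the buddy of the extent that starts at slot `addr` and has
 * buddy type `type` (size 2^type slots). Assumes binary buddies: an extent
 * of type k starts at a multiple of 2^k. */
unsigned buddy_of(unsigned addr, unsigned type)
{
    unsigned size = 1u << type;

    assert(addr % size == 0);   /* extent must be aligned to its size */
    return addr ^ size;         /* flip the bit that tells the two buddies apart */
}

/* Start address of the extent of type + 1 that results from merging an
 * extent with its free buddy. */
unsigned merged_start(unsigned addr, unsigned type)
{
    return addr & ~(1u << type);
}
```

With 4-bit slot addresses, the type-0 buddy of slot 0110 is 0111, and the type-1 extent starting at 0110 has the buddy 0100; merging those two gives a type-2 extent starting at 0100.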


Simple Mapping of Relations to Disks

[Figure: a mapping chain with the labels real disks, file system, segments, relations, and extents, divided into a "file system & operating system" side and a "database system" side.]


A Usual Way of Mapping Relations to Disks

[Figure: a mapping chain with the labels real disks, logical disks, OS files, tablespaces, segments, relations, and extents, divided into an "operating system" side and a "database system" side.]


Principles of the Database Buffer

[Figure: the process of an access module calls the buffer manager interface. The steps are numbered 1 to 5 in the figure.]

  bufferfix(P, ...): "Give me page P"
  find frame in buffer (directory)
  determine FILEID and block number
  read
  return frame address F

The buffer storage area is accessible from the caller's process (shared memory).


Design Options for the Buffer Manager

Buffer per file: Each file has its own private buffer pool.

Buffer per page size: In systems with different page (and block) sizes, there is usually at least one buffer for each page size.

Buffer per file type: There are files like indices, which are accessed in a significantly different way from other files. Therefore, some systems dedicate buffers to files depending on the access pattern and try to manage each of them in a way that is optimal for the respective file organization.


Logic of the Buffer Manager

Search in buffer: Check if the requested page is in the buffer. If found, return the address F of this frame to the caller.

Find free frame: If the page is not in the buffer, find a frame that holds no valid page.

Determine replacement victim: If no such frame exists, determine a page that can be removed from the buffer (in order to reuse its frame).


Logic of the Buffer Manager

Write modified page: If the page chosen for replacement has been changed, write it.

Establish frame address: Denote the start address of the frame as F.

Determine block address: Translate the requested PAGEID P into a FILEID and a block number. Read the block into the frame selected.

Return: Return the frame address F to the caller.
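
The steps listed on these two slides can be traced in a toy buffer pool. This is a minimal sketch under several assumptions: an in-memory array stands in for the file system, the PAGEID is used directly as the block number, replacement is plain round-robin, and fixing and semaphores are left out. All names (`bufferfix_sketch`, `mark_dirty_sketch`, `pool`, `disk`) are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define NFRAMES 2    /* tiny pool so that replacement is easy to trigger */
#define NPAGES  8
#define PAGESZ  16

typedef struct {
    int  pageid;
    int  valid;      /* frame holds a valid page */
    int  dirty;      /* page was modified since it was read */
    char data[PAGESZ];
} Frame;

static char  disk[NPAGES][PAGESZ];   /* stand-in for the file system */
static Frame pool[NFRAMES];
static int   next_victim;            /* assumed policy: round-robin */

/* Return the frame address F of page p, loading the page if necessary. */
char *bufferfix_sketch(int p)
{
    int f;

    assert(p >= 0 && p < NPAGES);
    for (f = 0; f < NFRAMES; f++)            /* search in buffer */
        if (pool[f].valid && pool[f].pageid == p)
            return pool[f].data;
    for (f = 0; f < NFRAMES; f++)            /* find free frame */
        if (!pool[f].valid)
            break;
    if (f == NFRAMES) {                      /* determine replacement victim */
        f = next_victim;
        next_victim = (next_victim + 1) % NFRAMES;
        if (pool[f].dirty)                   /* write modified page */
            memcpy(disk[pool[f].pageid], pool[f].data, PAGESZ);
    }
    /* establish frame address, determine block address, read the block */
    memcpy(pool[f].data, disk[p], PAGESZ);
    pool[f].pageid = p;
    pool[f].valid  = 1;
    pool[f].dirty  = 0;
    return pool[f].data;                     /* return F */
}

/* The caller reports that it has updated page p. */
void mark_dirty_sketch(int p)
{
    for (int f = 0; f < NFRAMES; f++)
        if (pool[f].valid && pool[f].pageid == p)
            pool[f].dirty = 1;
}
```

A second request for a page that is still in the pool returns the same frame address. A modified page reaches the disk array only when its frame is chosen as the replacement victim.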


Synchronization in the Buffer

[Figure: processes A and B each call bufferfix(P, ...) and are each given a copy of page P. Process A changes P to P', process B changes P to P". Both rewrite the page; the result on disk is marked "?".]

a) Access module in process A requests access to page P; gets private copy.

b) Access module in process B requests access to page P; gets private copy.

c) Both processes try to rewrite an updated version of the page, but these versions are different. Only the version written last will be on disk; this is the "lost update" anomaly.
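
Steps a) to c) can be replayed deterministically in a few lines. The sketch below is an illustration only: a single array stands in for page P on disk, and the two "processes" are just two private copies in one function. All names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define PAGESZ 8

static char disk_page[PAGESZ];   /* page P on disk */

/* Hand out a private copy of the page. */
void read_copy(char *priv)
{
    assert(priv != NULL);
    memcpy(priv, disk_page, PAGESZ);
}

/* Rewrite the whole page from a private copy. */
void rewrite_page(const char *priv)
{
    assert(priv != NULL);
    memcpy(disk_page, priv, PAGESZ);
}

/* Replay of steps a) to c) with private copies. */
void lost_update_demo(void)
{
    char a[PAGESZ], b[PAGESZ];

    memset(disk_page, 0, PAGESZ);
    read_copy(a);        /* a) process A gets a private copy of P */
    read_copy(b);        /* b) process B gets a private copy of P */
    a[0] = 1;            /* A changes P to P'  */
    b[1] = 2;            /* B changes P to P'' */
    rewrite_page(a);     /* c) both rewrite the page ...            */
    rewrite_page(b);     /*    ... and only the last version survives */
}
```

After `lost_update_demo()` the page contains B's change but not A's, even though the two updates touched different bytes.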


What the Buffer Manager Does for Synchronization

Sharing: Pages are made addressable to all processes that run the database code.

Semaphore protection: Each requestor gets the address of a semaphore protecting the page.

Durable storage: The access modules inform the buffer manager if their page access has resulted in an update of the page; the actual write operation, however, is issued by the buffer manager, probably at a time when the update transaction is long gone.


The Interface to the Buffer Manager

/* control block for buffer access */
typedef struct
{
    PAGEID      pageid;     /* id of page in file */
    PAGEPTR     pageaddr;   /* base address of page in buffer */
    int         index;      /* record within page */
    semaphore  *pagesem;    /* pointer to the semaphore protecting the page */
    Boolean     modified;   /* caller modified the page */
    Boolean     invalid;    /* destroyed page */
} BUFFER_ACC_CB, *BUFFER_ACC_CBP;


The Need for Fix and Unfix

[Figure: two timelines.]

Transaction X: bufferfix(P, ...); page P not in buffer; read block containing page P; return base address in buffer.

Transaction Y: bufferfix(Q, ...); page Q not in buffer, replace P; read block containing page Q; return base address in buffer.

a) Transaction X requests access to page P; gets base address in buffer.

b) Transaction Y requests access to page Q; the buffer manager decides to replace page P.

c) Transaction Y gets the base address of Q in the buffer, which is the same as P's.


The Fix-Use-Unfix Protocol I

FIX: The client requests access to a page using the bufferfix interface.

USE: The client uses the page; the pointer to the frame containing the page remains valid during this time.

UNFIX: The client explicitly waives further usage of the frame pointer; that is, it tells the buffer manager that it no longer wants to use that page.
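
One common way to enforce this protocol is a fix counter per frame: a frame may be chosen for replacement only while no client holds a fix on it. The sketch below assumes such a counter; the type and function names are hypothetical and are not taken from the slides.

```c
#include <assert.h>

typedef struct {
    int pageid;      /* page currently held by the frame */
    int fix_count;   /* assumed: number of outstanding fixes on the frame */
} FrameCtl;

/* FIX: the client announces that it will use the frame pointer. */
void fix_sketch(FrameCtl *f)
{
    f->fix_count++;
}

/* UNFIX: the client waives further use of the frame pointer. */
void unfix_sketch(FrameCtl *f)
{
    assert(f->fix_count > 0);   /* unfix without a matching fix is a bug */
    f->fix_count--;
}

/* The buffer manager may replace the page only if nobody has it fixed. */
int may_replace_sketch(const FrameCtl *f)
{
    return f->fix_count == 0;
}
```

With two clients fixing the same page, the frame becomes replaceable only after both have called unfix.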


The Fix-Use-Unfix Protocol II

[Figure: timelines for three pages.]

page P: fix page P, use, use, unfix page P

page R: fix page R, use, use, use, unfix page R

page Q: fix page Q, use, use, use, unfix page Q


Structure of the Buffer Manager

[Figure: a hash table leads to chains of buffer control blocks; each buffer control block refers to a frame in the buffer pool, and the frames hold pages. A buffer access control block is passed to and from the client. Fields shown: first_bcb, frame_index, next_in_hclass, mru_page, lru_page.]

frame_index: index of frame holding the page (address pointer in case of buffer access control block)

next_in_hclass: chain of buffer control blocks in same hash class

mru_page, lru_page: LRU chain
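
The search path through this structure, from hash table to buffer control block to frame index, can be sketched as follows. The field names `frame_index`, `first_bcb`, and `next_in_hclass` come from the figure; treating `first_bcb` as the hash table's array of chain heads, the hash function, the table size, and the function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define HASH_CLASSES 4   /* illustrative size only */

typedef struct bcb {
    int         pageid;
    int         frame_index;      /* index of frame holding the page */
    struct bcb *next_in_hclass;   /* next control block in same hash class */
} BCB;

static BCB *first_bcb[HASH_CLASSES];   /* assumed: one chain head per hash class */

static unsigned hash_class(int pageid)
{
    return (unsigned)pageid % HASH_CLASSES;   /* assumed hash function */
}

/* Link a buffer control block at the head of its hash class. */
void bcb_insert(BCB *b)
{
    unsigned h;

    assert(b != NULL);
    h = hash_class(b->pageid);
    b->next_in_hclass = first_bcb[h];
    first_bcb[h] = b;
}

/* Return the index of the frame holding pageid, or -1 if not buffered. */
int lookup_frame(int pageid)
{
    BCB *b;

    for (b = first_bcb[hash_class(pageid)]; b != NULL; b = b->next_in_hclass)
        if (b->pageid == pageid)
            return b->frame_index;
    return -1;
}
```

Pages 1 and 5 land in the same hash class here, so a lookup for either walks the same chain; a lookup for page 9 walks that chain too and reports a miss.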


Logging and Recovery from the Buffer Manager's Perspective I

Transaction   Buffer   Database   Remark
running       A B      A B        OK; old state in DB
running       BA       A B        OK; old state in DB
running       BA       A B        database corrupted
running       BA       BA         conflicting view on TA
committed     A B      A B        OK; read-only TA
committed     BA       A B        DB not in new state
committed     BA       A B        database corrupted
committed     BA       BA         OK; new state in DB


Logging and Recovery from the Buffer Manager's Perspective II

State of transaction TA   State of page A in database   Result of recovery using operation log
aborted                   old                           wrong tuple might be deleted
aborted                   new                           inverse operation succeeds
committed                 old                           operation succeeds
committed                 new                           duplicate of tuple is inserted


The Log and Page LSNs

[Figure: a timeline with two lanes. The log manager writes log records for page A at LSN1, LSN2, LSN3, and LSN4. The buffer manager writes page A to disk twice, first carrying page LSN1 and later carrying page LSN3.]
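
A page LSN lets recovery decide whether a logged update is already reflected in the page on disk. The sketch below shows that comparison under simple assumptions: the page carries the LSN of the last update applied to it, and a log record is a toy update that adds a delta to a single integer. The types and the function `redo_if_needed` are hypothetical.

```c
#include <assert.h>

typedef struct {
    long page_lsn;   /* LSN of the last update reflected in the page */
    int  value;      /* toy page content */
} Page;

typedef struct {
    long lsn;        /* LSN of this log record */
    int  delta;      /* toy update: value += delta */
} LogRec;

/* Apply the logged update only if the page does not contain it yet.
 * Returns 1 if the update was redone, 0 if it was skipped. */
int redo_if_needed(Page *p, const LogRec *r)
{
    assert(p != 0 && r != 0);
    if (p->page_lsn >= r->lsn)
        return 0;              /* page already reflects this update */
    p->value += r->delta;
    p->page_lsn = r->lsn;      /* page now carries the record's LSN */
    return 1;
}
```

For a page that reached disk with page LSN1, replaying the records LSN1 to LSN4 skips the first and redoes the other three; replaying them a second time changes nothing.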


Different Buffer Management Policies

Steal policy: When the buffer manager needs space, it can decide to replace dirty pages.

No-Steal policy: Pages can be replaced only if they are clean.

Force policy: At end of transaction, all modified pages are forced to disk in a series of synchronous write operations.

No-Force policy: No modified page is forced during commit. REDO log records are written to the log.
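
The two policy choices determine which kind of log information crash recovery depends on. The sketch below encodes the usual reasoning: with steal, changes of uncommitted transactions can reach disk and may have to be undone; with no-force, changes of committed transactions may be missing from disk and have to be redone. The names are hypothetical, and media recovery is not considered.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool steal;   /* dirty pages may be replaced */
    bool force;   /* modified pages are forced to disk at end of transaction */
} BufferPolicy;

/* UNDO information is needed if uncommitted updates can reach disk. */
bool needs_undo_sketch(BufferPolicy p)
{
    return p.steal;
}

/* REDO information is needed if committed updates may be missing on disk. */
bool needs_redo_sketch(BufferPolicy p)
{
    return !p.force;
}
```

A steal / no-force buffer manager therefore needs both UNDO and REDO log records for crash recovery, while no-steal / force needs neither in this simplified view.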


The Problem of Hotspot Pages

[Figure: page A in the buffer pool receives update operations from transactions TA1 to TA8; durable storage and the log are shown below the buffer pool, with "force" marking the forced writes.]

The dotted arrows indicate an update of the page by the respective transaction. The arrows at 45 degrees indicate the forced writing of the page during commit processing. The downward arrows indicate the writing of log records for the respective transaction.


The Basic Checkpoint Algorithm

Quiesce: Delay all incoming update DML calls until all fixes with exclusive semaphores have been released.

Flush the buffer: Write all modified pages.

Log the checkpoint: Write a record to the log, saying that a checkpoint has been generated.

Resume normal operation: The bufferfix requests for updates that have been delayed in order to take the checkpoint can now be processed again.


The Case for Indirect Checkpointing

[Figure: buffer frames with frame numbers 1 to 10 plotted against the log, with the checkpoints c1, c2, and c3 marked.]

When taking a checkpoint, the PAGEIDs of the pages currently in buffer are written to the log.


The Indirect Checkpointing Algorithm

Record TOC: Log the list of PAGEIDs.

Compare with prev. ckpt: See if any modified pages have not been replaced since last ckpt.

Force lazy pages: Schedule the writing of those pages during the next checkpoint interval.

Low-water mark: Find the LSN of the oldest still-volatile update; write it to the log.

Write “Checkpoint done” record.

Resume normal operation.
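
The low-water mark step amounts to taking a minimum over the modified pages in the buffer. The sketch below assumes that each frame records the LSN of the oldest update on its page that has not yet been written to disk; that per-frame field, the fallback to the end-of-log LSN when nothing is modified, and the names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int  modified;           /* page holds updates not yet written to disk */
    long first_update_lsn;   /* assumed: LSN of the oldest such update */
} FrameState;

/* LSN of the oldest still-volatile update in the buffer. If no page is
 * modified, nothing before the current end of the log is volatile. */
long low_water_mark(const FrameState *frames, size_t n, long end_of_log_lsn)
{
    long mark = end_of_log_lsn;

    assert(frames != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (frames[i].modified && frames[i].first_update_lsn < mark)
            mark = frames[i].first_update_lsn;
    return mark;
}
```

Restart then only has to consider log records from the low-water mark onward when redoing updates, which is why forcing "lazy" pages keeps the mark from falling far behind.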


Further Possibilities for Optimization

Pre-flushing can be performed by an asynchronous process that scans the buffer for "old" modified pages. Writing is done under semaphore protection.

Pre-fetching can, among other things, be used to make restart more efficient. If page reads are logged, one can use the most recent checkpoint plus the log to prime the buffer pool, so that it looks almost exactly as it did at the moment of the crash.


Further Possibilities for Optimization

Transaction scheduling and buffer management can take hints from the query optimizer:

This relation will be scanned sequentially.

This is a sequential scan of the leaves of a B-tree.

This is the traversal of a B-tree, starting at the root.

This is a nested-loop join, where the inner relation is scanned in physically sequential order.