Why Concerning Storage and Indexing?

53
Part III: Storage and Indexing (Chapters 8-11) Part V: Transctions, Concurrency control, Scheduling, and Recovery (Chapters 16-18)

description

Part III: Storage and Indexing (Chapters 8-11) Part V: Transctions, Concurrency control, Scheduling, and Recovery (Chapters 16-18). Why Concerning Storage and Indexing?. Performance: another major factor in user satisfaction Depends on Efficient data structures for data representation - PowerPoint PPT Presentation

Transcript of Why Concerning Storage and Indexing?

Page 1: Why Concerning Storage and Indexing?

Part III: Storage and Indexing(Chapters 8-11)

Part V: Transctions, Concurrency control, Scheduling, and Recovery

(Chapters 16-18)

Page 2: Why Concerning Storage and Indexing?

Why Concerning Storage and Indexing?

Performance: another major factor in user satisfaction Depends on

• Efficient data structures for data representation• Efficiency of system operation on those

structures Disks contains data files and system files including

dictionary and index files Disk access: one of the most critical factor in

performance.

Page 3: Why Concerning Storage and Indexing?

Why Not Store Everything in Main Memory?

Cost and size Main memory is volatile: What’s the problem? Typical storage hierarchy:

Factors: access speed, cost per unit, reliability Cache and main memory (RAM) for currently

used data: fast but costly Flash memory: limited number of writes (and

slow), non-volatile, disk-substitute in embedded systems

Disk for the main database (secondary storage). Tapes for archiving older versions of the data

(tertiary storage).

Page 4: Why Concerning Storage and Indexing?

Buffer Management in a DBMS

CC & Recovery may require additional I/O when a frame is chosen for replacement. Why?

DB

MAIN MEMORY

DISK

disk page

free frame

Page Requests from Higher Levels

BUFFER POOL

choice of frame dictatedby replacement policy

Page 5: Why Concerning Storage and Indexing?

Indexes An index on a file speeds up selections on the

search key fields for the index. Any subset of the fields of a relation can be the

search key for an index on the relation. Search key is not the same as key (minimal set

of fields that uniquely identify a record in a relation).

An index contains a collection of data entries, and supports efficient retrieval of all data entries k* with a given key value k. Given data entry k*, we can find record with key

k quickly. Classes: dense/sparse index, primary/secondary,

clustered/un-clustered

Page 6: Why Concerning Storage and Indexing?

Dense vs Sparse Index Dense index: one

index entry per search key value.

Sparse index: index records for only some of the records Every sparse index is

clustered! Sparse indexes are

smaller Which one is faster? Which one has less

overhead?

Ashby, 25, 3000

Smith, 44, 3000

Ashby

Cass

Smith

22

25

30

40

44

44

50

Sparse Indexon

Name Data File

Dense Indexon

Age

33

Bristow, 30, 2007

Basu, 33, 4003

Cass, 50, 5004

Tracy, 44, 5004

Daniels, 22, 6003

Jones, 40, 6003

Page 7: Why Concerning Storage and Indexing?

Clustered vs. Unclustered Index

Suppose that Alternative (2) is used for data entries, and that the data records are stored in a Heap file. To build clustered index, first sort the Heap file (with

some free space on each page for future inserts). Overflow pages may be needed for inserts. (Thus,

order of data recs is `close to’, but not identical to, the sort order.) Index entries

Data entries

direct search for

(Index File)

(Data file)

Data Records

data entries

Data entries

Data Records

CLUSTERED UNCLUSTERED

Page 8: Why Concerning Storage and Indexing?

Index Trees As for any index, 3 alternatives for data

entries k*: Data record with key value k <k, rid of data record with search key value k> <k, list of rids of data records with search key

k> Choice is orthogonal to the indexing

technique used to locate data entries k*. Tree-structured indexing techniques support

both range searches and equality searches. ISAM (indexed sequential access method):

static structure (data entries reside in leaf pages and overflow pages)

B+ tree: dynamic, adjusts gracefully under inserts and deletes.

Page 9: Why Concerning Storage and Indexing?

ISAM

Index file may still be quite large. But we can apply the idea repeatedly – making a tree

Leaf pages contain data entries.

P0

K1 P

1K 2 P

2K m

P m

index entry

Non-leaf

Pages

Pages

Overflow page

Primary pages

Leaf

Page 10: Why Concerning Storage and Indexing?

Example ISAM Tree Each node can hold 2 entries; no need for `next-

leaf-page’ pointers. Why? Sequential allocation of leaf pages. Insert 23. Insert 48, 41, 42.

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*

20 33 51 63

40

Root

Page 11: Why Concerning Storage and Indexing?

After Inserting 23*, 48*, 41*, 42* ...

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*

20 33 51 63

40

Root

23* 48* 41*

42*

Overflow

Pages

Leaf

Index

Pages

Pages

Primary

Page 12: Why Concerning Storage and Indexing?

B+ Tree: Most Widely Used Index

Insert/delete at log F N cost; keep tree height-balanced. F = fanout , N = # leaf pages;

F is typically >> 2. Why? Minimum 50% occupancy (except for root). Each

node contains d <= m <= 2d entries. The parameter d is called the order of the tree.

Supports equality and range-searches efficiently.

Index Entries

Data Entries("Sequence set")

(Direct search)

Page 13: Why Concerning Storage and Indexing?

Extendible Hashing Situation: Bucket (primary page) becomes

full. Why not re-organize file by doubling # of buckets? Reading and writing all pages for reorganizing a

data file is expensive Idea: Use directory of pointers to buckets, double

# of buckets by doubling the directory, splitting just the bucket that overflowed!

Directory much smaller than file, so doubling it is much cheaper. Only one page of data entries is split. No overflow page!

Trick lies in how hash function is adjusted!

Page 14: Why Concerning Storage and Indexing?

Example

Directory is array of size 4. To find bucket for r, take

last `global depth’ # bits of h(r); we denote r by h(r). If h(r) = 5 = binary 101,

it is in bucket pointed to by 01.

Insert: If bucket is full, split it (allocate new page, re-distribute).

If necessary, double the directory. (When is splitting a bucket does not require doubling? We can tell by comparing global depth with local depth for the split bucket.)

13*00

01

10

11

2

2

2

2

2

LOCAL DEPTH

GLOBAL DEPTH

DIRECTORY

Bucket A

Bucket B

Bucket C

Bucket D

DATA PAGES

10*

1* 21*

4* 12* 32* 16*

15* 7* 19*

5*

Page 15: Why Concerning Storage and Indexing?

Insert h(r)=20 (Causes Doubling)--- How to determine keys in A and A2?

20*

00

01

10

11

2 2

2

2

LOCAL DEPTH 2

2

DIRECTORY

GLOBAL DEPTHBucket A

Bucket B

Bucket C

Bucket D

Bucket A2(`split image'of Bucket A)

1* 5* 21*13*

32*16*

10*

15* 7* 19*

4* 12*

19*

2

2

2

000

001

010

011

100

101

110

111

3

3

3DIRECTORY

Bucket A

Bucket B

Bucket C

Bucket D

Bucket A2(`split image'of Bucket A)

32*

1* 5* 21*13*

16*

10*

15* 7*

4* 20*12*

LOCAL DEPTH

GLOBAL DEPTH

Page 16: Why Concerning Storage and Indexing?

Linear Hashing

This is another dynamic hashing scheme, an alternative to Extendible Hashing.

LH handles the problem of long overflow chains without using a directory, and handles duplicates.

Idea: Use a family of hash functions h0, h1, h2, ... hi(key) = h(key) mod(2iN); N = initial # buckets h is some hash function (range is not 0 to N-1) If N = 2d0, for some d0, hi consists of applying h and

looking at the last di bits, where di = d0 + i. hi+1 doubles the range of hi (similar to directory

doubling)

Page 17: Why Concerning Storage and Indexing?

Overview of LH File

In the middle of a round.

Levelh

Buckets that existed at thebeginning of this round:

this is the range of

NextBucket to be split

of other buckets) in this round

Levelh search key value )(

search key value )(

Buckets split in this round:If is in this range, must useh Level+1

`split image' bucket.to decide if entry is in

created (through splitting`split image' buckets:

Page 18: Why Concerning Storage and Indexing?
Page 19: Why Concerning Storage and Indexing?

LawyersRecently reported in the Massachusetts Bar Association

Lawyers Journal, the following are questions actually asked of witnesses by attorneys during trials:

"Now doctor, isn't it true that when a person dies in his sleep,he doesn't know about it until the next morning?“

"Were you alone or by yourself?“

"Was it you or your younger brother who was killed in the war?“

"How far apart were the vehicles at the time of the collision?“

Q: "How was your first marriage terminated?"A: "By death."Q: "And by whose death was it terminated?"

Page 20: Why Concerning Storage and Indexing?

Part V:Transactions, Concurrency control, Scheduling, and

Recovery

Ch. 16 - 18

Page 21: Why Concerning Storage and Indexing?

Overview

Transactions and ACID properties Serial execution and serializable execution Serializability (dependency) graph Serializability theorem Conflict equivalence and view equivalence Properties of schedules: SR, RC, ACA, ST Schedulers

3 options to handle the request from TM optimistic (aggressive) vs pessimistic

(conservative) 2PL and Strict 2PL Timestamp ordering and Strict TO

Page 22: Why Concerning Storage and Indexing?

Atomicity of Transactions

A transaction might commit after completing all its actions, or it could abort (or be aborted by the DBMS) after executing some actions.

A very important property guaranteed by the DBMS for all transactions is that they are atomic.

Atomicity: a transaction is assumed to be executing all its actions in one step, or not executing any actions at all. Not easy to achieve. Why?

Page 23: Why Concerning Storage and Indexing?

Consistency, Isolation, Durability

A transaction executed in isolation must preserve DB consistency.

Even if multiple transactions executed concurrently, each should be unaware of other transactions are being executed concurrently.

When a transaction complete successfully, the changes it made must persist, even with failures afterwards.

Page 24: Why Concerning Storage and Indexing?

Conflicts and Equivalence

When do two operations conflict? They are issued by different transactions They operate on the same data object At least one of them is a write operation

Conflict equivalent Two executions are conflict equivalent if in both

executions, all conflicting operations have the same order.

Page 25: Why Concerning Storage and Indexing?

Serializability Correctness criterion

Serializability is the correctness definition of DB All serializable schedules are equally correct Scheduling algorithms enforce certain ordering In distributed DBMS, variable delays may disturb

any particular ordering which is supposed to occur

Serialization graph (dependency graph) shows dependency relationship among transactions

Serialization Theorem For a schedule H, if SG(H) is acyclic, then H is

serializable.

Page 26: Why Concerning Storage and Indexing?

Properties of Schedules Recoverability

To ensure that aborting a transaction does not change the semantics of committed transactionsw1(x)r2(x)w2(y)C2

Is it recoverable? What if T1 aborts? Recoverable execution depends on commit order A transaction cannot commit until all values it

read are guaranteed not to be aborted. How to do it?

Delaying commit: T2 cannot commit until T1 commits

Cascaded abort is sometimes necessary. Why?w1(x)r2(x)w2(x)A1

Page 27: Why Concerning Storage and Indexing?

Properties of Schedules

Recoverability Cascaded abort is sometimes necessary

w1(x)r2(x)w2(x)A1 Avoiding cascaded aborts

Achieved if every transaction reads only the values written by committed transactions

Must delay each r(x) until all transactions that issued w(x) is either committed or abortedw1(x) ….. C1 r2(x) w2(y) …

Page 28: Why Concerning Storage and Indexing?

Properties of Schedules Restoring before images

Implementing transaction abort by simply restoring before images of all writes is very convenientw0(x)w1(x)w2(x) A1 A2

Value of x must be restored to the initial value, not the value written by T1

Solution: delay w(x) until all transactions that have written x are either committed or aborted

Strictness Executions that satisfy both requirements Delay both r(x) and w(x) until all transactions that

have written w(x) are either committed or abortedr1(x)w1(x) w1(y) w2(z) w2(x) C1 --- is it strict?

Page 29: Why Concerning Storage and Indexing?

Relationships among Properties

Recoverability (RC) RC if Ti reads from Tj and Ci is in H, then Ci

follows Cj Avoiding cascaded aborts (ACA)

ACA if Ti reads x from Tj then ri(x) follows Cj Strictness (ST)

ST if whenever Oi(x) follows wj(x), then Oi(x) follows either Aj or Cj

What is the relationship among ST, ACA, and RC?ST < ACA < RC

What about with SR and Serial execution?

Page 30: Why Concerning Storage and Indexing?

Two-Phase Locking (2PL)

Two-Phase Locking Protocol Each Xact must obtain a S (shared) lock on

object before reading, and an X (exclusive) lock on object before writing.

A transaction can not request additional locks once it releases any locks.

If an Xact holds an X lock on an object, no other Xact can get a lock (S or X) on that object.

Page 31: Why Concerning Storage and Indexing?

Multiple-Granularity Locks Why consider it? Database consists of tables, pages, tuples

(records) Hard to decide what granularity to lock

(tuples vs. pages vs. tables). Shouldn’t have to decide. How? Data “containers” are nested:

Tuples

Tables

Pages

Database

contains

Page 32: Why Concerning Storage and Indexing?

The Phantom Problem

T1 implicitly assumes that it has locked the set of all sailor records with rating = 1. Assumption only holds if no sailor records are

added while T1 is executing! Why did this problem happen? Example shows that conflict serializability

guarantees serializability only if the set of objects is fixed!

Need some mechanism to enforce this assumption -- index locking and predicate locking

Page 33: Why Concerning Storage and Indexing?

Index Locking

If there is a dense index on the rating field using Alternative (2), T1 should lock the index page containing the data entries with rating = 1. If there are no records with rating = 1, T1

must lock the index page where such a data entry would be, if it existed!

What if there is no suitable index? T1 must lock all pages, and lock the

file/table to prevent new pages from being added, to ensure that no new records with rating = 1 are added.

r=1Data

Index

Page 34: Why Concerning Storage and Indexing?

Predicate Locking Grant lock on all records that satisfy some logical

predicate, e.g. age > 2*salary. Index locking is a special case of predicate

locking for which an index supports efficient implementation of the predicate lock. What is the predicate in the sailor example? rating=1

Why not using predicate locks in commercial DBMS?

In general, predicate locking has a significant locking overhead.

Page 35: Why Concerning Storage and Indexing?

Locking in B+ Trees

How can we efficiently lock a particular leaf node? Don’t confuse this with multiple granularity

locking -- How are they different? One solution: Ignore the tree structure, just

lock pages while traversing the tree, following 2PL -- What’s wrong?

This has terrible performance! Root node (and many higher level nodes)

becomes a bottleneck. Why? Because every tree access begins at the root.

Page 36: Why Concerning Storage and Indexing?

B+ Tree Locking

Higher levels of the tree only direct searches for leaf pages.

For inserts, a node on a path from root to modified leaf must be locked (in X mode, of course), only if a split can propagate up to it from the modified leaf. (Similar point holds w.r.t. deletes.)

We can exploit these observations to design efficient locking protocols that guarantee serializability even though they violate 2PL.

Page 37: Why Concerning Storage and Indexing?

A Simple Tree Locking Algorithm Search: Start at root and go down;

repeatedly, S lock child then unlock parent.

Insert/Delete: Start at root and go down, obtaining X locks as needed. Once child is locked, check if it is safe: If child is safe, release all locks on ancestors.

Safe node: Node such that changes will not propagate up beyond this node. When is a node safe for inserts?

• Node is not full. When is a node safe for deletes?

• Node is not half-empty.

Page 38: Why Concerning Storage and Indexing?

Timestamp Ordering Idea: Any conflicting operations are

executed in their timestamp order Simple and aggressive

Schedule immediately and reject requests that arrive too late

How do you know a request has arrived too late?

Give each object a read-timestamp (RTS) and a write-timestamp (WTS), give each transaction a timestamp (TS) when it begins

Page 39: Why Concerning Storage and Indexing?

Timestamp Ordering Timestamp ordering rule:

If Oi(x) and Oj(x) are conflicting operation, Oi(x) is processed before Oj(x), if and only if TS(Ti) < TS(Tj).

Request arriving too late: Oi(x) arrives after the scheduler has sent

conflicting operation Oj(x) with TS(Tj) > TS(Ti)

Page 40: Why Concerning Storage and Indexing?

Basic Timestamp Ordering Ri(x): if TS(Ti) < WTS(x), reject it;

otherwise (TS(Ti) >=WTS(x)), then schedule it and set RTS(x) to max (RTS(x), TS(Ti))

Wi(x): if TS(Ti) < RTS(x) or TS(Ti) <WTS(x), reject it; otherwise (TS(Ti) >=WTS(x)), then schedule it and set WTS(x) to max (WTS(x), TS(Ti))

When restarted, Ti is assigned a new timestamp Thomas Write Rule:

For wi(x), if TS(Ti) < WTS(x) and TS(Ti) >= RTS(x), then wi(x) can be ignored, rather than being rejected.

Why is it correct? Ignoring obsolete write

Page 41: Why Concerning Storage and Indexing?

Exercise: Non-equivalence of 2PL and TO

H1=r2(x) w3(x) C3 w1(y)C1 r2(y) w2(y) C21. Is this schedule possible to timestamp

ordering?2. Is it possible with 2PL?

H1 is legal with strict timestamp ordering. What is the equivalent serial schedule?T1 T2 T3It is not possible with 2PLT2 must release lock on x for T3, but then gets lock on y – violation of two-phaseness

Page 42: Why Concerning Storage and Indexing?

Relationship between 2PL and TO Schedules generated by 2PL and TO

They are all correct (serializable) They are not the same set: H1 shows that Is the relationship inclusive?

S {schedules by 2PL} subset of S {schedules by TO}?S {schedules by TO} subset of S {schedules by 2PL}?

Consider w3(x) C3 w2(x) C2 r1(x)Is it legal with TO?

Is it legal with 2PL? Two sets of schedules are intersecting, but subset

2PL TOSR

Page 43: Why Concerning Storage and Indexing?

Failure and Recovery Failure and consistency

Transaction failures System failures Media failures

Principle of recovery Redundancy Database can be protected by ensuring that its

correct state can be reconstructed from information stored redundantly in the system

Recovering database – restart operation Bringing the stable DB to a consistent state by

removing effects of uncommitted transactions and applying missing effects of committed transactions.

Page 44: Why Concerning Storage and Indexing?

Recovery and Restart

Types of storage media Volatile storage: fast , but not surviving system

failures Non-volatile storage Stable storage: information never lost (practically)

Recovery Ideally, stable DB should contain, for each data item,

the last value written by committed transaction Practically, stable DB may contain values written by

uncommitted transactions, or may not contain the last committed values.

Why?1) Updating of uncommitted T 2) Buffering of committed values in the cache

Page 45: Why Concerning Storage and Indexing?

Function of Recovery Manager

Atomicity: Transactions may abort (“Rollback”).

Durability: What if DBMS stops running? (Causes?)

crash! Desired Behavior after

system restarts:– T1, T2 & T3 should be

durable.– T4 & T5 should be

aborted (effects not seen).

T1T2T3T4T5

Page 46: Why Concerning Storage and Indexing?

Recovery Management

Design rules for recovery manager Undo rule: committed values must be saved

before overwritten by uncommitted values in the stable DB

Redo rule: before commit, new values it wrote must be in the stable storage (DB or log)

Restart activity Preparation: during normal operation Actual recovery: after failure

Preparation Logging Checkpointing

Page 47: Why Concerning Storage and Indexing?

Cache Manager Two operations: fetch and flush

Use dirty bit for deciding flushing operation Flush: if the slot in cache is not dirty, do nothing;

otherwise, copy the value into stable storage Fetch: select a slot, using replacement algorithm

if full (and flush if necessary), copy the value into slot, reset dirty bit, update cache directory

When to flush? Depends on recovery strategy of the system Different recovery algorithms use different

strategies Idempotence of restart

Any sequence of incomplete execution, followed by a complete execution of restart has the same effect of just one complete execution

Page 48: Why Concerning Storage and Indexing?

Handling the Memory Pool

Write to disk: force/no-force

Cache page: steal/no-steal Force every write to disk?

Poor response time. But provides durability.

Steal buffer-pool frames from uncommited Xacts? If not, poor throughput. If so, how can we ensure

atomicity?

Force

No Force

No Steal Steal

Trivial

Desired

Page 49: Why Concerning Storage and Indexing?

More on Steal and Force STEAL (why enforcing Atomicity is hard)

To steal frame F: Current page in F (say P) is written to disk; some Xact holds lock on P.• What if the Xact with the lock on P aborts?• Must remember the old value of P at steal time

(to support UNDOing the write to page P). NO FORCE (why enforcing Durability is hard)

What if system crashes before a modified page is written to disk?

Write as little as possible, in a convenient place, at commit time, to support REDOing modifications.

Page 50: Why Concerning Storage and Indexing?

Recovery Algorithms

Undo/redo algorithm Most complicated of the four recovery algorithms Flexible in deciding when to flush (no-force) Maximize efficiency during normal operation at the

expense of less efficient recovery Comparison with other recovery algorithms

Issues: disk I/O, log space, recovery time No-redo requires more frequent flush (force) Uncommitted transaction is allowed to replace dirty

slot for in-place update – undo might be necessary Restart procedure

Process log forward and backward for redo and undo

Page 51: Why Concerning Storage and Indexing?

Undo/Redo Recovery

A transaction T writes vale V to data object X. What will happen?

System fetches X if it is not already in cache Record V in the log and in X’s slot C No need for the cache manager to flush C

If cache manager replaces C (steal), and either T aborts or system fails before T commits, undo is required

If T commits and system fails before C is flushed (no force), redo is required

Page 52: Why Concerning Storage and Indexing?

Restart Procedure for Undo/Redo Recovery

1. Discard all cache slots2. Scan the log to analyze which transactions

committed, aborted, or active, to determine data for redo/undo

3. Redo all actions that were committed but not recorded in the stable DB

4. Undo all actions of transactions that were aborted or active at the time of failure

Page 53: Why Concerning Storage and Indexing?