
Index tuning: B+tree overview

© Dennis Shasha, Philippe Bonnet 2001

B+-Tree Locking

• Tree Traversal
– Update, Read
– Insert, Delete

• Phantom problem: need for range locking

• ARIES KVL (implemented in DB2)
– Tree Traversal (next page)
– Lock on tuples
– Lock on key values
– Range locking: next key lock
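The next-key idea behind range locking can be illustrated with a small sketch. This is a simplified model for illustration only, not the ARIES KVL protocol itself: it ignores lock modes and lock durations, and the helper names (`next_key`, `scan_locks`, `insert_blocked`) are hypothetical. A range scan locks every key it reads plus the first key beyond the range; an insert must obtain the lock on the key that follows the new value, so it blocks if a scan already holds that lock.

```python
import bisect

END = float("inf")  # assumed sentinel standing for "end of index"

def next_key(keys, k):
    """Smallest existing key strictly greater than k, or END if none."""
    i = bisect.bisect_right(keys, k)
    return keys[i] if i < len(keys) else END

def scan_locks(keys, lo, hi):
    """Keys locked by a range scan over [lo, hi]: the matches plus the next key."""
    i = bisect.bisect_left(keys, lo)
    j = bisect.bisect_right(keys, hi)
    return set(keys[i:j]) | {next_key(keys, hi)}

def insert_blocked(keys, held, k):
    """An insert of k needs the lock on its next key; it waits if a scan holds it."""
    return next_key(keys, k) in held

keys = [10, 20, 30, 40]            # sorted key values present in the index
held = scan_locks(keys, 15, 25)    # T1 scans the range [15, 25]
print(sorted(held))
print(insert_blocked(keys, held, 22))  # would create a phantom inside the range
print(insert_blocked(keys, held, 35))  # outside the locked key ranges
```

In this model the scan also blocks some inserts just outside [15, 25] (for example 12 or 27), because locks are taken on existing key values rather than on the exact range boundaries.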


[Figure: B+-Tree Locking. A tree with nodes A, B, C, D, E, F; transaction T1 holds locks on three of the nodes.]

Bulk Loading of a B+ Tree

• If we have a large collection of records, and we want to create a B+ tree on some field, doing so by repeatedly inserting records is very slow.

• Bulk Loading can be done much more efficiently.

• Initialization: Sort all data entries, insert pointer to first (leaf) page in a new (root) page.

Bulk Loading (Contd.)

• Add <low key value on page, pointer to page> to the root page


• Split the root and create a new root page.


• Index entries for leaf pages always entered into rightmost index page just above leaf level. When this fills up, it splits. (Split may go up right-most path to the root.)

• Much faster than repeated inserts, especially when one considers locking!
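A minimal sketch of bottom-up construction, under simplifying assumptions: it builds the index one level at a time from the sorted leaf pages instead of inserting into the rightmost index page and splitting, and it packs every page full. The function name `bulk_load`, the `fanout` parameter, and the page representation (lists of `(low_key, child_page_number)` pairs) are hypothetical choices made for illustration.

```python
def bulk_load(entries, fanout=4):
    """Build B+ tree levels bottom-up from data entries.

    Returns a list of levels: levels[0] is the leaf level and levels[-1]
    contains the single root page. Each index page is a list of
    (low_key, child_page_number) pairs.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    entries = sorted(entries)  # step 1: sort all data entries
    # Pack the sorted entries into leaf pages of `fanout` entries each.
    leaves = [entries[i:i + fanout] for i in range(0, len(entries), fanout)] or [[]]
    levels = [leaves]
    low_keys = [page[0] if page else None for page in leaves]
    # Add index levels until a single root page remains.
    while len(levels[-1]) > 1:
        # One <low key value on page, pointer to page> pair per child page.
        pairs = list(zip(low_keys, range(len(low_keys))))
        pages = [pairs[i:i + fanout] for i in range(0, len(pairs), fanout)]
        levels.append(pages)
        low_keys = [page[0][0] for page in pages]
    return levels

levels = bulk_load(range(1, 21), fanout=4)
for depth, level in enumerate(reversed(levels)):
    print(depth, level)
```

Each entry is touched once after the sort, and no page is ever revisited for a split, which is where the saving over repeated inserts comes from.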

Comparison: B-trees vs. static indexed sequential file

Ref #1: Held & Stonebraker, “B-Trees Re-examined”, CACM, Feb. 1978

Ref #1 claims:

- Concurrency control harder in B-Trees

- B-tree consumes more space

For their comparison:

block = 512 bytes
key = pointer = 4 bytes
4 data records per block

Example: 1 block static index

127 keys

(127+1) × 4 = 512 bytes

-> pointers in index implicit! Up to 127 blocks

[Figure: static index block with keys k1, k2, k3, ...; one data block holding records k1, k2, k3]

Example: 1 block B-tree

63 keys

63 × (4+4) + 8 = 512 bytes

-> pointers needed in B-tree because index blocks are not contiguous; up to 63 blocks

[Figure: B-tree index block with keys k1, k2, ..., k63; one data block holding records k1, k2, k3 and a next pointer]

Size comparison Ref. #1

Static index                        B-tree
# data blocks           height      # data blocks             height
2 -> 127                2           2 -> 63                   2
128 -> 16,129           3           64 -> 3,968               3
16,130 -> 2,048,383     4           3,969 -> 250,047          4
                                    250,048 -> 15,752,961     5
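The arithmetic behind the two examples and the size comparison can be checked with a short script. It assumes what the slides state (512-byte blocks, 4-byte keys and pointers) and treats the extra 4 bytes in the static index block and the extra 8 bytes in the B-tree block as unspecified per-block overhead; the names below are illustrative. The upper bound at each height is taken to be the fanout raised to the number of index levels, which matches the table's upper bounds to within one block (63² = 3,969).

```python
BLOCK = 512          # bytes per block
KEY = POINTER = 4    # bytes per key and per pointer

# Static index: keys only, pointers implicit: (127 + 1) * 4 = 512
static_keys = BLOCK // KEY - 1
# B-tree: explicit (key, pointer) pairs plus 8 bytes: 63 * (4 + 4) + 8 = 512
btree_keys = (BLOCK - 8) // (KEY + POINTER)

def max_data_blocks(fanout, height):
    """Most data blocks reachable with `height - 1` index levels."""
    return fanout ** (height - 1)

print("keys per index block:", static_keys, btree_keys)
for height in range(2, 6):
    print(height,
          max_data_blocks(static_keys, height),
          max_data_blocks(btree_keys, height))
```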

Ref. #1 analysis claims

• For an 8,000-block file, after 32,000 inserts and after 16,000 lookups, the static index saves enough accesses to allow for reorganization

Ref. #1 conclusion: Static index better!!

Ref #2: M. Stonebraker, “Retrospective on a database system,” TODS, June 1980

Ref. #2 conclusion: B-trees better!!

• DBA does not know when to reorganize

• DBA does not know how full to load pages of new index

• Buffering
– B-tree: has fixed buffer requirements
– Static index: must read several overflow blocks to be efficient (large & variable size buffers needed for this)

• Speaking of buffering… Is LRU a good policy for B+tree buffers? Of course not! Should try to keep root in memory at all times (and perhaps some nodes from second level)
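One way to picture "keep the root in memory" is an LRU buffer that refuses to evict pinned pages. The sketch below is a hypothetical illustration, not a description of any particular system's buffer manager; the class name `PinnedLRU`, its parameters, and the `load` callback are assumptions.

```python
from collections import OrderedDict

class PinnedLRU:
    """LRU page buffer that never evicts pages listed as pinned."""

    def __init__(self, capacity, pinned=()):
        self.capacity = capacity
        self.pinned = set(pinned)
        self.pages = OrderedDict()  # page_id -> page, least recently used first

    def get(self, page_id, load):
        """Return the page, calling load(page_id) on a buffer miss."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id]
        page = load(page_id)
        self.pages[page_id] = page
        self._evict()
        return page

    def _evict(self):
        # Drop least recently used unpinned pages until within capacity.
        while len(self.pages) > self.capacity:
            victim = next((p for p in self.pages if p not in self.pinned), None)
            if victim is None:
                break  # everything left is pinned
            del self.pages[victim]

cache = PinnedLRU(capacity=2, pinned={"root"})
for page_id in ["root", "A", "B"]:
    cache.get(page_id, load=lambda p: f"contents of {p}")
print(list(cache.pages))
```

Plain LRU would have evicted the root here, since it was the least recently touched page when B arrived; with pinning, leaf page A is evicted instead.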

Interesting problem:

For B+tree, how large should n be?

n is the number of keys per node

Sample assumptions:

(1) Time to read node from disk is (S + Tn) msec.

(2) Once block in memory, use binary search to locate key: (a + b log_2 n) msec, for some constants a, b. Assume a << S.

(3) Assume B+tree is full, i.e., # nodes to examine is log_n N, where N = # records.

Can get: f(n) = time to find a record

[Figure: plot of f(n) against n, with its minimum at n_opt]

Find n_opt by setting f'(n) = 0

Answer is n_opt = "few hundred"

(see homework for details)
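The three assumptions combine into f(n) = log_n N × ((S + Tn) + (a + b log_2 n)). As a rough numerical sketch, the minimum can be located by brute force instead of solving f'(n) = 0. The constants below (S = 14 msec, T = 0.01 msec per key, a = 0.01, b = 0.002, N = 10 million records) are illustrative assumptions, not values from the slides, and the function names are hypothetical.

```python
import math

def lookup_time(n, N, S, T, a, b):
    """f(n): expected msec to find one record with n keys per node."""
    levels = math.log(N) / math.log(n)                # log_n N nodes examined
    per_node = (S + T * n) + (a + b * math.log2(n))   # disk read + binary search
    return levels * per_node

def best_n(N, S, T, a, b, n_max=5000):
    """Integer n in [2, n_max] that minimizes f(n), by exhaustive search."""
    return min(range(2, n_max + 1),
               key=lambda n: lookup_time(n, N, S, T, a, b))

n_opt = best_n(N=10**7, S=14.0, T=0.01, a=0.01, b=0.002)
print(n_opt, lookup_time(n_opt, 10**7, 14.0, 0.01, 0.01, 0.002))
```

With these assumed constants the minimum falls in the "few hundred" range. Note that in this model the binary-search term contributes log_n N × b log_2 n = b log_2 N in total, whatever n is; rerunning the search with different S, T, and b is one way to explore the questions that follow.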

What happens to n_opt as

• Disk gets faster?

• CPU gets faster?
