Index tuning: B+tree overview
© Dennis Shasha, Philippe Bonnet 2001
B+-Tree Locking
• Tree traversal
  – Update, Read
  – Insert, Delete
• Phantom problem: need for range locking
• ARIES KVL (implemented in DB2)
• Tree traversal (next page)
• Lock on tuples
• Lock on key values
• Range locking:
  – Next key lock
[Figure: B+-Tree Locking. A tree with nodes A, B, C, D, E, F; three nodes are marked "T1 lock".]
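The next key lock idea from the bullets above can be sketched in a few lines. This is a simplified illustration, not the ARIES KVL protocol itself: it assumes a single lock mode, an in-memory sorted key list in place of leaf pages, and hypothetical names (`NextKeyLockTable`, `scan`, `can_insert`). A range scan locks every key it reads plus the first key past the range, and an insert must check the lock on the key that would follow the new one.

```python
import bisect

END = float("inf")  # sentinel standing for "past the last key"

class NextKeyLockTable:
    """Toy next-key locking over a sorted list of keys (hypothetical sketch)."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.locks = {}  # key -> set of transaction ids holding a lock

    def _next_key(self, key):
        # Smallest existing key strictly greater than `key`, or END.
        i = bisect.bisect_right(self.keys, key)
        return self.keys[i] if i < len(self.keys) else END

    def scan(self, txn, lo, hi):
        # Lock every key in [lo, hi] plus the next key after hi,
        # so that no other transaction can insert into the scanned range.
        start = bisect.bisect_left(self.keys, lo)
        stop = bisect.bisect_right(self.keys, hi)
        found = self.keys[start:stop]
        for k in found + [self._next_key(hi)]:
            self.locks.setdefault(k, set()).add(txn)
        return found

    def can_insert(self, txn, key):
        # An insert is allowed only if no other transaction
        # holds a lock on the key that would follow the new key.
        holders = self.locks.get(self._next_key(key), set())
        return holders <= {txn}

table = NextKeyLockTable([10, 20, 30, 40])
table.scan("T1", 15, 25)              # T1 reads key 20, locks 20 and 30
print(table.can_insert("T2", 22))     # a phantom inside T1's range
print(table.can_insert("T2", 35))     # outside the protected gaps
```

Note that the scheme is conservative: the lock on key 20 also blocks inserts between 10 and 15, which lie outside the scanned range.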
Bulk Loading of a B+ Tree
• If we have a large collection of records, and we want to create a B+ tree on some field, doing so by repeatedly inserting records is very slow.
• Bulk Loading can be done much more efficiently.
• Initialization: Sort all data entries, insert pointer to first (leaf) page in a new (root) page.
Bulk Loading (Contd.)
• Add <low key value on page, pointer to page> to the root page
Bulk Loading (Contd.)
• Split the root and create a new root page.
Bulk Loading (Contd.)
• Index entries for leaf pages always entered into rightmost index page just above leaf level. When this fills up, it splits. (Split may go up right-most path to the root.)
• Much faster than repeated inserts, especially when one considers locking!
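A minimal sketch of bulk loading, under stated assumptions: nodes are plain dictionaries, and the index is built level by level from the sorted leaves instead of by inserting into the rightmost index page as the slides describe (the resulting shape is similar, but this is a swapped-in simplification). The names `bulk_load`, `lookup`, `leaf_capacity` and `fanout` are hypothetical.

```python
import bisect

def low_key(node):
    # Lowest key stored anywhere under `node`.
    while not node["leaf"]:
        node = node["children"][0]
    return node["keys"][0]

def bulk_load(entries, leaf_capacity=4, fanout=4):
    """Build a B+ tree bottom-up from the entries; returns (root, height)."""
    entries = sorted(entries)  # step 1: sort all data entries
    if not entries:
        return {"leaf": True, "keys": []}, 1
    # Pack the sorted entries into full leaf pages, left to right.
    level = [{"leaf": True, "keys": entries[i:i + leaf_capacity]}
             for i in range(0, len(entries), leaf_capacity)]
    height = 1
    # Group pages under parents until a single root remains.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            children = level[i:i + fanout]
            # One <low key, pointer> separator per child after the first.
            keys = [low_key(c) for c in children[1:]]
            parents.append({"leaf": False, "keys": keys, "children": children})
        level = parents
        height += 1
    return level[0], height

def lookup(root, key):
    node = root
    while not node["leaf"]:
        # Keys equal to a separator live in the child to its right.
        node = node["children"][bisect.bisect_right(node["keys"], key)]
    return key in node["keys"]

root, height = bulk_load(range(1, 21))
print(height, lookup(root, 13), lookup(root, 21))
```

Unlike a production bulk loader, this sketch fills every page completely and may leave the last node on each level underfull.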
Comparison: B-trees vs. static indexed sequential file
Ref #1: Held & Stonebraker, “B-Trees Re-examined”, CACM, Feb. 1978
Ref. #1 claims:
- Concurrency control harder in B-Trees
- B-tree consumes more space
For their comparison:
block = 512 bytes
key = pointer = 4 bytes
4 data records per block
Example: 1 block static index
• 127 keys: (127+1) × 4 = 512 bytes
• Pointers in index are implicit!
• Up to 127 blocks
[Figure: one index block holding keys k1, k2, k3, ... above one data block holding keys k1, k2, k3.]
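Example: 1 block B-tree
• 63 keys: 63 × (4+4) + 8 = 512 bytes
• Pointers are needed in B-tree blocks because the index is not contiguous
• Up to 63 blocks
[Figure: one B-tree block holding keys k1, k2, ..., k63 and the label "next", above one data block holding keys k1, k2, k3.]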
Size comparison Ref. #1

Static Index                        B-tree
# data blocks            height     # data blocks            height
2 -> 127                 2          2 -> 63                  2
128 -> 16,129            3          64 -> 3968               3
16,130 -> 2,048,383      4          3969 -> 250,047          4
                                    250,048 -> 15,752,961    5
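The per-block arithmetic and the heights in the comparison can be reproduced with a short calculation. This is a sketch under the slides' stated sizes (512-byte blocks, 4-byte keys and pointers); the function name `height` and the constant names are hypothetical, and the extra 4-byte slot and 8-byte overhead are taken from the slides' formulas without further interpretation.

```python
BLOCK = 512    # bytes per block
KEY = 4        # bytes per key
POINTER = 4    # bytes per pointer

# Static index: keys only, pointers implicit: (127 + 1) * 4 = 512.
STATIC_KEYS = BLOCK // KEY - 1
# B-tree: key/pointer pairs plus 8 extra bytes: 63 * (4 + 4) + 8 = 512.
BTREE_KEYS = (BLOCK - 8) // (KEY + POINTER)

def height(data_blocks, keys_per_index_block):
    """Levels needed: the data level plus enough index levels to reach it."""
    levels = 1   # the data blocks themselves
    reach = 1    # data blocks addressable from the current top level
    while reach < data_blocks:
        reach *= keys_per_index_block
        levels += 1
    return levels

print(STATIC_KEYS, BTREE_KEYS)
print(height(8000, STATIC_KEYS), height(8000, BTREE_KEYS))
```

For a file of 8,000 data blocks the static index needs one level fewer than the B-tree, which is the kind of difference the size comparison is pointing at.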
Ref. #1 analysis claims
• For an 8,000 block file,
  after 32,000 inserts
  after 16,000 lookups
  Static index saves enough accesses to allow for reorganization
Ref. #1 conclusion: Static index better!!
Ref #2: M. Stonebraker, “Retrospective on a database system,” TODS, June 1980
Ref. #2 conclusion: B-trees better!!
• DBA does not know when to reorganize
• DBA does not know how full to load pages of new index
• Buffering
  – B-tree: has fixed buffer requirements
  – Static index: must read several overflow blocks to be efficient (large & variable size buffers needed for this)
• Speaking of buffering… Is LRU a good policy for B+tree buffers?
  Of course not!
  Should try to keep root in memory at all times (and perhaps some nodes from second level)
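The point about keeping the root in memory can be illustrated with a tiny simulation. This is a hedged sketch, not a real buffer manager: it assumes a buffer of only two pages, lookups that each touch a root, an inner node and a leaf, and a hypothetical function `count_misses` that treats pages in `pinned` as never evictable.

```python
from collections import OrderedDict

def count_misses(accesses, capacity, pinned=()):
    """Simulate an LRU buffer; pages listed in `pinned` are never evicted.

    Assumes capacity is larger than the number of pinned pages.
    """
    cache = OrderedDict()  # page id -> None, ordered oldest to newest use
    misses = 0
    for page in accesses:
        if page in cache:
            cache.move_to_end(page)  # mark as most recently used
            continue
        misses += 1
        if len(cache) >= capacity:
            # Evict the least recently used page that is not pinned.
            victim = next(p for p in cache if p not in pinned)
            del cache[victim]
        cache[page] = None
    return misses

# Four lookups, each walking root -> inner node -> leaf.
accesses = []
for i in range(4):
    accesses += ["root", f"inner{i % 2}", f"leaf{i}"]

plain = count_misses(accesses, capacity=2)
with_pinned_root = count_misses(accesses, capacity=2, pinned={"root"})
print(plain, with_pinned_root)
```

With so small a buffer, plain LRU evicts the root while the leaf is being read, so the next lookup has to fetch it again; pinning the root turns every repeat visit to it into a hit.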
Interesting problem:
For B+tree, how large should n be?
n is number of keys / node
Sample assumptions:
(1) Time to read node from disk is (S + Tn) msec.
(2) Once block in memory, use binary search to locate key: (a + b log_2 n) msec, for some constants a, b; assume a << S.
(3) Assume B+tree is full, i.e., # nodes to examine is log_n N, where N = # records.
Can get: f(n) = time to find a record
f(n) = log_n N × (S + Tn + a + b log_2 n)
[Figure: plot of f(n) against n, with the minimum at n_opt.]
Find n_opt by f'(n) = 0
Answer is n_opt = “few hundred”
(see homework for details)
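As a numeric check on the model, f(n) can be evaluated directly and minimized by brute force. This is only a sketch: the constants below (S, T, a, b, N) are made-up values chosen for illustration, not figures from the slides or the homework, and the names `lookup_time` and `best_n` are hypothetical.

```python
import math

def lookup_time(n, N, S, T, a, b):
    """f(n): nodes examined (log_n N) times the cost per node, in msec."""
    nodes_examined = math.log(N) / math.log(n)
    per_node = (S + T * n) + (a + b * math.log2(n))
    return nodes_examined * per_node

def best_n(N, S, T, a, b, n_max=5000):
    # Brute-force search over node sizes instead of solving f'(n) = 0.
    return min(range(2, n_max + 1),
               key=lambda n: lookup_time(n, N, S, T, a, b))

# Assumed constants: 10 msec fixed cost per read, 0.01 msec per key
# transferred, and a much cheaper in-memory binary search.
n_opt = best_n(N=10**7, S=10.0, T=0.01, a=0.001, b=0.001)
print(n_opt)
```

With these assumed constants the minimum lands in the low hundreds, in line with the slides' "few hundred"; changing S or T and rerunning shows how the optimum moves.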
What happens to n_opt as
• Disk gets faster?
• CPU gets faster?