1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February...
-
date post
22-Dec-2015 -
Category
Documents
-
view
216 -
download
2
Transcript of 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February...
![Page 1: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/1.jpg)
1
Database TuningRasmus Pagh and S. Srinivasa Rao
IT University of CopenhagenSpring 2007
February 8, 2007
Tree IndexesLecture based on [RG, Chapter 10] and [Pagh03, 2.3.0-
2.3.2]
Slides based onNotes 04: Indexing
for Stanford CS 245, fall 2002by Hector Garcia-Molina
![Page 2: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/2.jpg)
2
Today
IndexesPrimary, secondary, dense, sparse
B-trees Analysis of B-trees B-tree variants and extensions
![Page 3: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/3.jpg)
3
Why indexing?
Support more efficiently queries like:SELECT * FROM R WHERE a=11SELECT * FROM R WHERE 8<= b and b<42
Indexing an attribute (or set of attributes) can be used to speed up finding tuples with specific values.
Goal of an index: Look at as few blocks as possible to find the matching record(s)
![Page 4: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/4.jpg)
4
Sequential files
Store relation in sorted order according to search key.
Search: binary search in logarithmic time (I/Os) in the number of blocks used by the relation.
Drawback: expensive to maintain.
![Page 5: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/5.jpg)
5
Primary and secondary indexes
In a primary index, records are stored in an order determined by the search key
e.g. sequentially A relation can have at most one primary
index. (Often on primary key.) A secondary index can not take
advantage of any specific order, hence it has to be dense.
Secondary index can have a second, sparse level.
![Page 6: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/6.jpg)
6
Dense index For each record store the key and a pointer to
the record in the sequential file.
Why? Uses less space, hence less time to search. Time (I/Os) logarithmic in number of blocks used
by the index.
Need not access the data for some kinds of queries.
Can also be used as secondary index, i.e. with another order of records.
![Page 7: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/7.jpg)
7
Sparse index
Store first value in each block in the sequential file and a pointer to the block.
Uses even less space than dense index, but the block has to be searched, even for unsuccessful searches.
Time (I/Os) logarithmic in the number of blocks used by the index.
![Page 8: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/8.jpg)
8
Multiple levels of indexes
If an index is small enough it can be stored in internal memory. Only one I/O is used.
If the index is too large, an index of the index can be used.
Generalize, and you have a B-tree. The top level index has size equal to one block.
![Page 9: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/9.jpg)
9
B-trees
Can be seen as a general form of multi-level indexes.
Generalize usual (binary) search trees.
Allow efficient insertions and deletions at the expense of using slightly more space (than sequential files).
Popular variant: B+-tree
![Page 10: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/10.jpg)
10
Root
B+-tree Example
10
0
12
01
50
18
0
30
3 5 11
30
35
100
101
110
120
130
150
156
179
180
200
Each node stored in one disk block
![Page 11: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/11.jpg)
11
Sample internal node
to keys to keys to keys to keys
< 57 57 k<81 81k<95 95
57
81
95
![Page 12: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/12.jpg)
12
Sample leaf node:
From internal node
to next leafin
sequence
57
81
95
To r
eco
rd
wit
h k
ey 5
7
To r
eco
rd
wit
h k
ey 8
1
To r
eco
rd
wit
h k
ey 8
5Alternative: Records in leaves
![Page 13: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/13.jpg)
13
Searching a B+-tree
100
120
150
180
30
3 5 11
30
35
100
101
110
120
130
150
156
179
180
200
Question: How does one search for a range of keys?
Above: Search path for tuple with key 101.
![Page 14: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/14.jpg)
14
B+-tree invariants on nodes
Suppose a node (stored in a block) has space for n keys and n+1 pointers.
Don't want block to be too empty: Should have at least (n+1)/2 non-null pointers.
(Different from the text book (RG) notation!)
Exception: The root, which may have only 2 non-null pointers (only 1 key).
![Page 15: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/15.jpg)
15
Other B+-tree invariants
(1) All leaves at same lowest level(perfectly balanced tree)
(2) Pointers in leaves point to records except for sequence pointer
![Page 16: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/16.jpg)
16
Problem session:Analysis of B+-trees
What is the height of a B+-tree with N leaves and room for n pointers in a node?
![Page 17: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/17.jpg)
17
Insertion into B+-tree
(a) simple case- space available in leaf
(b) leaf overflow(c) non-leaf overflow(d) new root
![Page 18: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/18.jpg)
18
(a) Insert key = 32 n=33 5 11
30
31
30
100
32
![Page 19: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/19.jpg)
19
(b) Insert key = 7 n=3
3 5 11
30
31
30
100
3 5
7
7
![Page 20: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/20.jpg)
20
(c) Insert key = 160 n=3
10
0
120
150
180
150
156
179
180
200
160
18
0
160
179
![Page 21: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/21.jpg)
21
(d) New root, insert 45 n=3
10
20
30
1 2 3 10
12
20
25
30
32
40
40
45
40
30new root
![Page 22: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/22.jpg)
22
(a) Simple case - no example
(b) Coalesce with neighbour (sibling)
(c) Re-distribute keys(d) Cases (b) or (c) at non-leaf
Deletion from B+-tree
![Page 23: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/23.jpg)
23
(b) Coalesce with sibling- Delete 50
n=4
10
40
100
10
20
30
40
50
40
![Page 24: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/24.jpg)
24
(c) Redistribute keys- Delete 50
n=4
10
40
100
10
20
30
35
40
5035
35
![Page 25: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/25.jpg)
25
(d) Non-leaf coalesce- Delete 37
n=5
40
45
30
37
25
26
20
22
10
141 3
10
20
30
4040
30
25
25
new root
![Page 26: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/26.jpg)
26
Alternative B+-tree deletion
In practice, coalescing is often not implemented (hard, and often not worth it)
An alternative is to mark deleted nodes.
Periodic global rebuilding may be used to remove marked nodes when they start taking too much space.
![Page 27: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/27.jpg)
27
Problem session:Analysis of B+-trees
What is the worst case I/O cost of Searching? Inserting and deleting?
![Page 28: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/28.jpg)
28
B+-tree summary
Height 1+logn/2 N, typically 3 or 4.
Best search time we could hope for! By keeping top node(s) in memory,
the number of I/Os can be reduced. Updates: Same cost as search,
except for rebalancing.
![Page 29: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/29.jpg)
29
Problem session
Prove that B-trees are optimal in terms of search time among pointer-based indexes, i.e.,Suppose we want to search among N keys, that internal memory can hold M keys/pointers, and that a disk block can hold n keys/pointers. Further, suppose that the only way of accessing disk blocks is by following pointers. Show that a search takes at least logn (N/M) I/Os in the worst case.
Hint: Consider the sizes of the set of blocks that can be accessed in at most t I/Os.
![Page 30: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/30.jpg)
30
Sorting using B-trees
In internal memory, sorting can be done in O(N log N) time by inserting the N keys into a balanced search tree.
The number of I/Os for sorting by inserting into a B-tree is O(N logBN).
This is more than a factor B slower than multiway mergesort (Feb 22 lecture).
![Page 31: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/31.jpg)
31
Next: Buffering in B-trees
Based on slides by Gerth Brodal, covering a paper published in 2003 at the SODA conference.
Using buffering techniques could be the next big thing in DB indexing.
A nice thesis subject!
![Page 32: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/32.jpg)
32
More on rebalancing
"It will be a rare event that calls for splitting or merging of blocks" – GUW, page 645.
This is true (in particular at the top levels), but a little hard to see.
Easier seen for weight-balanced B-trees.
![Page 33: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/33.jpg)
33
Weight-balanced B-trees(based on [Pagh03], where n corresponds to
B/2)
Remove the B+-tree invariant:There must be (n+1)/2 non-null pointers in a node.
Add new weight invariant:A node at height i must have weight (number of leaves in the subtree below) that is between (n/4)i and 4(n/4)i.(Again, the root is an exception.)
![Page 34: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/34.jpg)
34
Consequences of the weight invariant: Tree height is 1+logn/4 N (almost
same) A node at height i with weight, e.g.,
2(n/4)i will not need rebalancing until there have been at least (n/4)i updates in its subtree. (Why?)
Consequences of the weight invariant: Tree height is 1+logn/4 N (almost
same) A node at height i with weight, e.g.,
2(n/4)i will not need rebalancing until there have been at least (n/4)i updates in its subtree. (Why?)
Weight-balanced B-trees
![Page 35: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/35.jpg)
35
Rebalancing weight
A B Y Z
New insertion in subtree
More than 4(n/4) i
leaves in subtree weight balanceinvariant violated
![Page 36: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/36.jpg)
36
Rebalancing weight
A B Y Z
Node is split into two nodes of weightaround 2(n/4) i, i.e.,far from violating the invariant(details in [Pagh03])
![Page 37: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/37.jpg)
37
Summary of properties Deletions similar to insertions (or:
use marking and global rebuilding). Search in time O(logn N).
A node at height i is rebalanced (costing O(1) I/Os) once for every ((n/4)i) updates in its subtree.
Summary of properties Deletions similar to insertions (or:
use marking and global rebuilding). Search in time O(logn N).
A node at height i is rebalanced (costing O(1) I/Os) once for every ((n/4)i) updates in its subtree.
Weight-balanced B-trees
![Page 38: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/38.jpg)
38
Other kinds of B-trees
String B-trees: Fast searches even if keys span many blocks. (April 19 lecture.)
Persistent B-trees: Make searches in any previous version of the tree, e.g. ”find x at time t”. The time for a search is O(logBN), where N is the total number of keys inserted in the tree. (April 12 lecture.)
![Page 39: 1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.](https://reader035.fdocuments.in/reader035/viewer/2022062516/56649d775503460f94a58e96/html5/thumbnails/39.jpg)
39
Summary Indexing is a "key" database technology. Conventional indexes (when few updates). B-trees (and variants) are more flexible
The choice of most DBMSs• Range queries.• Deterministic/reliable.
Theoretically “optimal”: O(logB N) I/Os per operation.
Buffering can be used to achieve fast updates, at the cost of increasing the height of the tree.