Cs 157 Al 22 Indexing
8/11/2019 Cs 157 Al 22 Indexing
The B-Tree Index
Prof. Sin-Min Lee
Department of Computer Science
San Jose State University
Outline
Introduction
The B-tree shape
Dynamic changes in the B-tree
Properties of the B-tree
Create index statement syntax
More about the B-tree
Summary
Introduction
A B-tree is a keyed index structure, comparable to a number of memory-resident keyed lookup structures such as the balanced binary tree, the AVL tree, and the 2-3 tree.
The difference is that a B-tree is meant to reside on disk, being made partially memory-resident only when entries in the structure are accessed.
The B-tree structure is the most commonly used index type in databases today.
It is provided by ORACLE, DB2, and INGRES.
The B-Tree Shape
A B-tree is built upside down, with the root at the top and the leaves at the bottom.
All nodes above the leaf level, including the root, are called directory nodes or index nodes.
Directory nodes below the root are called internal nodes.
The root node is known as level 1 of the B-tree, and successively lower levels are given successively larger level numbers, with the leaf nodes at the lowest level.
The total number of levels is called the depth of the B-tree.
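As a small illustration of the level numbering, the following Python sketch assumes a hypothetical `Node` class whose `children` list is empty for a leaf. It groups nodes by level, starting with the root at level 1, and takes the depth to be the number of levels.

```python
class Node:
    """Hypothetical B-tree node: only the child pointers matter here."""
    def __init__(self, children=None):
        self.children = children or []   # empty list for a leaf node

def nodes_by_level(root):
    """Return {level: [nodes]} with the root at level 1."""
    levels, frontier, level = {}, [root], 1
    while frontier:
        levels[level] = frontier
        frontier = [child for node in frontier for child in node.children]
        level += 1
    return levels

def depth(root):
    """The depth of the tree is its total number of levels."""
    return len(nodes_by_level(root))

# A three-level tree: root, two directory nodes, four leaves.
root = Node([Node([Node(), Node()]), Node([Node(), Node()])])
print(depth(root))                   # 3
print(len(nodes_by_level(root)[3]))  # 4 leaves on the lowest level
```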
Balanced and Unbalanced Trees
Trees can be balanced or unbalanced.
In a balanced tree, every path from the root to a leaf node is the same length.
A balanced tree with n keys has $O(\log n)$ levels. This is desirable for an index.
The Problem of Unbalanced Trees
[Figure: (a) A troublesome search tree; (b) A more troublesome search tree.]
Disadvantage of unbalanced tree
Searching an unbalanced tree may
require traversing an arbitrary and
unpredictable number of nodes and
pointers.
Unbalanced Tree (cont.)
Problems:
The levels of the tree are only sparsely filled, resulting in long, deep paths that defeat the purpose of binary trees in the first place.
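The problem is easy to reproduce. This Python sketch (class and function names are illustrative) builds an ordinary binary search tree with no rebalancing. Keys that arrive in sorted order produce one long chain, while the same keys in a different order give a much shorter tree.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(root, key):
    """Ordinary binary-search-tree insert with no rebalancing."""
    if root is None:
        return BSTNode(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = BSTNode(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = BSTNode(key)
                return root
            node = node.right

def height(node):
    """Number of nodes on the longest root-to-leaf path."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def build(keys):
    root = None
    for key in keys:
        root = bst_insert(root, key)
    return root

print(height(build([1, 2, 3, 4, 5])))  # 5: sorted input gives a chain
print(height(build([3, 1, 4, 2, 5])))  # 3: same keys, better order
```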
B-trees are a kind of multiway tree, and the best-known variants have their own names: 2-3 trees, B* trees, and B+ trees. Here we will look only at 2-3 trees. They are the simplest, which makes them good for learning the underlying principles; multiway trees can get complicated quickly. Keep in mind that 2-3 trees as we study them are rarely built in practice. Their larger cousins, B* and B+ trees, are often used for medium and large database applications. The general idea of a multiway tree of order n is that each node can hold up to n - 1 key values and can have up to n children.
So a 2-3 tree is a multiway tree of order 3: each node holds 1 or 2 values. A leaf has no children, a node with one value has two children, and a node with two values has three children. Thus the name 2-3 tree.
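The shape rule can be written as a check on a single node. In this sketch (the function name is hypothetical) the keys are passed as a list and the children only as a count; with `order=3` it accepts exactly the one-key and two-key node shapes of a 2-3 tree.

```python
def valid_multiway_node(keys, num_children, order):
    """Shape rule for one node of a multiway search tree of the given order:
    1 to order-1 keys in sorted order, and either no children (a leaf)
    or exactly one more child than keys."""
    if not 1 <= len(keys) <= order - 1:
        return False
    if keys != sorted(keys):
        return False
    return num_children in (0, len(keys) + 1)

# A 2-3 tree is a multiway tree of order 3.
print(valid_multiway_node([5], 2, 3))        # True: one key, two children
print(valid_multiway_node([5, 9], 3, 3))     # True: two keys, three children
print(valid_multiway_node([5, 9], 2, 3))     # False: wrong child count
print(valid_multiway_node([1, 2, 3], 4, 3))  # False: too many keys
```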
The B-Tree Shape (c.)
[Figure: level 1: root node; level 2: directory nodes; level 3: leaf nodes.]
B-Trees
For a binary search tree, the search time is O(h), where h is
the height of the tree.
The height cannot be less than roughly $\log_2 n$ for a tree with n nodes.
If the tree is too large for internal memory, each access of a node is an I/O.
For a tree with 10,000 nodes:
$\log_2 10{,}000 \approx 13.3$, so about 14 disk accesses.
We need a structure that would require only 3 or 4 disk
accesses.
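The arithmetic can be checked with a short Python sketch. The helper name and the fanout of 100 are illustrative assumptions. A completely full tree whose nodes have `fanout` children holds `fanout**levels - 1` keys, so the loop looks for the smallest number of levels that reaches 10,000 keys.

```python
def min_levels(n_keys, fanout):
    """Fewest levels (disk accesses) that can hold n_keys when every node is
    full, i.e. holds fanout - 1 keys and has fanout children."""
    levels = 1
    while fanout ** levels - 1 < n_keys:   # a full tree holds fanout**levels - 1 keys
        levels += 1
    return levels

print(min_levels(10_000, 2))    # 14: binary search tree, one key per node
print(min_levels(10_000, 100))  # 3: 100-way tree, 99 keys per node
```

Widening each node from 2 to 100 children cuts the path from 14 node accesses to 3, which is the kind of structure the slide asks for.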
B-Trees - Definition
A B-tree of order M is an M-ary tree
with the following properties:
(1) The data items are stored at leaves.
(2) The nonleaf nodes store up to M - 1
keys to guide the searching; key i represents
the smallest key in subtree i + 1.
(3) The root is either a leaf or has between 2 and M children.
(4) All nonleaf nodes (except the root) have between ceiling(M/2) and M children.
(5) All leaves are at the same depth and have between ceiling(L/2) and L data items, for some L.
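Properties (2) to (5) can be turned into a checker. This is a minimal sketch assuming a hypothetical `Node` class in which nonleaf nodes carry `keys` and `children` and leaves carry `items`. With 0-based lists, the rule that key i is the smallest key in subtree i + 1 becomes `keys[j] == smallest(children[j + 1])`.

```python
import math

class Node:
    """Hypothetical node: nonleaf nodes use keys/children, leaves use items."""
    def __init__(self, keys=None, children=None, items=None):
        self.keys = keys or []
        self.children = children or []
        self.items = items or []

def smallest(node):
    """Smallest data item stored in the subtree rooted at node."""
    while node.children:
        node = node.children[0]
    return node.items[0]

def check_btree(root, M, L):
    """Check properties (2)-(5) for a B-tree of order M with leaf size L."""
    leaf_depths = set()

    def visit(node, depth, is_root):
        if not node.children:                                   # a leaf
            leaf_depths.add(depth)
            low = 0 if is_root else math.ceil(L / 2)            # (3): root may be a lone leaf
            return low <= len(node.items) <= L                  # (5)
        k = len(node.children)
        low = 2 if is_root else math.ceil(M / 2)                # (3) and (4)
        if not low <= k <= M or len(node.keys) != k - 1:        # (2): up to M - 1 keys
            return False
        if any(node.keys[j] != smallest(node.children[j + 1]) for j in range(k - 1)):
            return False                                        # (2): key guides the search
        return all(visit(child, depth + 1, False) for child in node.children)

    return visit(root, 0, True) and len(leaf_depths) == 1       # (5): one leaf depth

good = Node(keys=[3, 5], children=[Node(items=[1, 2]), Node(items=[3, 4]), Node(items=[5, 6])])
print(check_btree(good, M=3, L=2))   # True
```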
B-tree of order n
Every B-tree is of some "order n", meaning nodes contain from n to 2n keys (so nodes are always at least half full of keys) and n+1 to 2n+1 pointers, where n can be any positive integer.
Keys are kept in sorted order within each node. A corresponding list of pointers is effectively interspersed between the keys to indicate where to search for a key if it isn't in the current node.
A B-tree of order n is a multi-way search
tree with two properties:
1. All leaves are at the same level.
2. The number of keys in any node lies between n and 2n, with the possible exception of the root, which may have fewer keys.
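The order-n rules can be checked mechanically. In this sketch the `Node` class and function name are hypothetical, the root is exempt only from the lower bound of n keys, and an internal node with k keys must have k + 1 children (one pointer on each side of every key). Note that order n in this sense corresponds to order 2n + 1 in the "at most m children" convention, so the order-5 tree from the letter-insertion example is a B-tree of order n = 2 here.

```python
class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)          # sorted keys in this node
        self.children = list(children)  # empty for a leaf

def is_btree_of_order(root, n):
    """Order-n rules: every node holds at most 2n sorted keys, every node
    except the root holds at least n keys, an internal node with k keys has
    k + 1 children, and all leaves are on the same level."""
    leaf_levels = set()

    def check(node, level):
        if node.keys != sorted(node.keys) or len(node.keys) > 2 * n:
            return False
        if node is not root and len(node.keys) < n:
            return False
        if not node.children:
            leaf_levels.add(level)
            return True
        if len(node.children) != len(node.keys) + 1:
            return False
        return all(check(child, level + 1) for child in node.children)

    return check(root, 1) and len(leaf_levels) == 1

# Final tree of the letter-insertion example: 2 to 4 keys per non-root node.
tree = Node("j", [Node("cf", [Node("ab"), Node("de"), Node("ghi")]),
                  Node("mr", [Node("kl"), Node("np"), Node("stux")])])
print(is_btree_of_order(tree, 2))   # True
```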
Other definition
A B-tree of order m is an m-way tree that satisfies the following conditions.
Every node has at most m children.
Every internal node (except the root) has at least ceiling(m/2) children.
An internal node with k children contains (k-1) ordered keys. The leftmost child contains keys less than or equal to the first key in the node. The second child contains keys greater than the first key but less than or equal to the second key, and so on.
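The routing rule in the last condition is exactly what Python's `bisect.bisect_left` computes: the first position whose key is greater than or equal to x. The function name below is hypothetical.

```python
from bisect import bisect_left

def child_index(keys, x):
    """Which child of a node with sorted keys may contain x.
    Child 0 holds values <= keys[0]; child i holds keys[i-1] < x <= keys[i];
    the last child holds values greater than every key."""
    return bisect_left(keys, x)   # first position whose key is >= x

keys = [10, 20]                   # a node with 2 keys has 3 children
print([child_index(keys, x) for x in (5, 10, 15, 20, 25)])  # [0, 0, 1, 1, 2]
```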
A B-tree of order 2
Dynamic changes in the B-Tree
A B-tree is an efficient self-modifying structure: it adjusts itself as new entries, pointing to new rows inserted in the indexed table, are added.
The nodes at every level are generally assumed not to be full.
Space is left so that an insert to a node at any level is often possible without new disk space being required.
An insert of a new entry always occurs at the leaf level, but occasionally the leaf node is too full to simply accept the new entry. In this case, the leaf node is split into two leaf pages to provide additional space.
Properties of the B-Tree
Assumptions: Entry key values can have variable length because of variable-length column values appearing in the index key. When a node split occurs, equal lengths of entry information are placed in the left and right split nodes.
Rebalancing actions in the B-tree occur when entries are
deleted.
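The equal-length split can be sketched as follows. Everything here is an assumption for illustration: entries are `(keyval, rowid)` pairs, a key's length is its UTF-8 byte count, and every rowid takes a fixed `ROWID_BYTES = 6` bytes. The split point is chosen so that the two halves carry about the same number of bytes rather than the same number of entries.

```python
ROWID_BYTES = 6   # hypothetical fixed rowid size

def entry_bytes(entry):
    keyval, _rowid = entry
    return len(keyval.encode("utf-8")) + ROWID_BYTES

def split_by_length(entries):
    """Split an overfull node's sorted (keyval, rowid) entries into left and
    right halves whose total byte lengths are as close to equal as possible."""
    total = sum(entry_bytes(e) for e in entries)
    running, best_i, best_gap = 0, 1, None
    for i in range(1, len(entries)):            # both halves keep >= 1 entry
        running += entry_bytes(entries[i - 1])  # bytes in entries[:i]
        gap = abs(total - 2 * running)          # |left bytes - right bytes|
        if best_gap is None or gap < best_gap:
            best_i, best_gap = i, gap
    return entries[:best_i], entries[best_i:]

entries = [("a" * 30, 1), ("b", 2), ("c", 3), ("d", 4)]
left, right = split_by_length(entries)
print(len(left), len(right))   # 1 3: the long key fills the left node by itself
```

With equal-sized entries this reduces to the usual half-and-half split.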
Properties of the B-Tree (c.)
Properties: Every node is disk-page sized and resides in a well-defined location.
Nodes above the leaf level contain directory entries, with n-1 separator keys and n disk pointers to lower-level B-tree nodes.
Nodes at the leaf level contain entries with (keyval, rowid) pairs pointing to individual rows indexed.
All nodes below the root are at least half full with entry information.
The root node contains at least two entries.
Insertion in B-Tree
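1. Insert a, g, f, b: one leaf [a b f g].
2. Insert k: the leaf overflows and splits around its median f, which moves up into a new root. Root [f]; leaves [a b] [g k].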
Insertion (cont.)
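3. Insert d, h, m: root [f]; leaves [a b d] [g h k m].
4. Insert j: leaf [g h j k m] splits and j moves up. Root [f j]; leaves [a b d] [g h] [k m].
5. Insert e, s, i, r: root [f j]; leaves [a b d e] [g h i] [k m r s].
6. Insert x: leaf [k m r s x] splits and r moves up. Root [f j r]; leaves [a b d e] [g h i] [k m] [s x].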
Insertion (cont.)
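7. Insert c, l, n, t, u: inserting c splits [a b c d e] and c moves up. Root [c f j r]; leaves [a b] [d e] [g h i] [k l m n] [s t u x].
8. Insert p: leaf [k l m n p] splits and m moves up; the root [c f j m r] then overflows and splits around j. Root [j]; internal nodes [c f] [m r]; leaves [a b] [d e] [g h i] [k l] [n p] [s t u x].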
Inserting into a B-Tree
To insert key value x into a B-Tree:
Use the B-tree search to determine the leaf node on which to make the insertion.
Insert x into the appropriate position on that leaf node.
If the resulting number of keys on that node is at most L, then simply write that node to disk and return.
Otherwise, split the node.
Inserting into a B-Tree: Splitting a Node
Allocate a new leaf node. Put about half (i.e., about L/2) of the keys on the new node and leave about half of the keys on the existing node.
Make appropriate changes to keys and pointers in the parent node.
If the parent node was already full, then split the parent node.
The splitting of parents may continue all the way back up to the root node.
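A minimal Python sketch of the leaf-level step, treating a leaf as a sorted Python list and assuming a hypothetical capacity `L = 4`. Following property (2) of the definition, the separator handed to the parent is the smallest key of the new leaf; updating the parent's keys and pointers, and splitting the parent if it overflows in turn, is left out.

```python
from bisect import insort

L = 4   # hypothetical leaf capacity (maximum data items per leaf)

def insert_into_leaf(leaf, x):
    """Insert x into a sorted leaf (a Python list). Return None if the leaf
    still fits; otherwise split it and return (separator, new_leaf) so the
    caller can add the separator and a pointer to new_leaf in the parent."""
    insort(leaf, x)                  # keep the leaf in sorted order
    if len(leaf) <= L:
        return None                  # fits: write the leaf back and stop
    half = len(leaf) // 2            # about L/2 items stay on the old leaf
    new_leaf = leaf[half:]           # the rest move to a newly allocated leaf
    del leaf[half:]
    return new_leaf[0], new_leaf     # smallest key of the new leaf

leaf = [10, 20, 30, 40]
print(insert_into_leaf(leaf, 25))    # (25, [25, 30, 40])
print(leaf)                          # [10, 20]
```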
Insert 19, 12, 22, 15.
Insertion
Insert the keys in the following order into a B-tree of order 5.
A, G, F, B, K, D, H, M, J, E, S, I, R, X, C, L, N, T, U, P.
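The exercise can be checked with a compact Python sketch of classic B-tree insertion: keys live in every node, and an overfull node splits around its median key, which moves up to the parent. The class and method names are our own, and order 5 is taken to mean at most 4 keys per node. Run on the sequence above, it ends in the same tree as the lowercase worked example: root J, internal nodes C F and M R, and leaves A B, D E, G H I, K L, N P, S T U X.

```python
from bisect import bisect_left, insort

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []  # empty = leaf

class BTree:
    def __init__(self, order):
        self.max_keys = order - 1        # order 5: at most 4 keys per node
        self.root = Node()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:            # the root itself split: grow a new root
            median, right = split
            self.root = Node([median], [self.root, right])

    def _insert(self, node, key):
        """Insert below node; return (median, new_right_node) if node split."""
        if not node.children:            # leaf: insert in sorted position
            insort(node.keys, key)
        else:                            # descend, then absorb a child's split
            i = bisect_left(node.keys, key)
            split = self._insert(node.children[i], key)
            if split is not None:
                median, right = split
                node.keys.insert(i, median)
                node.children.insert(i + 1, right)
        if len(node.keys) > self.max_keys:
            return self._split(node)
        return None

    def _split(self, node):
        """Keep the lower half in node, move the upper half to a new node,
        and hand the median key up to the parent."""
        mid = len(node.keys) // 2
        median = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return median, right

tree = BTree(5)
for key in "AGFBKDHMJESIRXCLNTUP":
    tree.insert(key)
print(tree.root.keys)                           # ['J']
print([c.keys for c in tree.root.children])     # [['C', 'F'], ['M', 'R']]
```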
Searching
Searching for an Item in a B-Tree:
1. Make a local variable, i, equal to the first index such that data[i] >= target. If there is no such index, then set i equal to data_count, indicating that none of the entries is greater than or equal to the target.
2. if (i < data_count and data[i] == target)
       return true;
   else if (the root has no children)
       return false;
   else
       return subset[i]->contains(target);
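The two steps translate almost line for line into Python. This sketch keeps the field names `data` and `subset` from the steps above (with `data_count` taken as `len(data)`), uses `bisect_left` for step 1, and checks `i < len(data)` before looking at `data[i]`, since i may equal data_count. The tree is one valid arrangement of the example's keys, chosen for illustration.

```python
from bisect import bisect_left

class BTreeNode:
    """Hypothetical node with the field names used in the steps above."""
    def __init__(self, data, subset=None):
        self.data = data                 # sorted keys; data_count == len(data)
        self.subset = subset or []       # child pointers; empty for a leaf

def contains(node, target):
    # Step 1: i = first index with data[i] >= target, or data_count if none.
    i = bisect_left(node.data, target)
    # Step 2: found here; otherwise fail at a leaf or search subset[i].
    if i < len(node.data) and node.data[i] == target:
        return True
    if not node.subset:
        return False
    return contains(node.subset[i], target)

root = BTreeNode([6, 17], [
    BTreeNode([4], [BTreeNode([2, 3]), BTreeNode([5])]),
    BTreeNode([12], [BTreeNode([10]), BTreeNode([16])]),
    BTreeNode([19, 22], [BTreeNode([18]), BTreeNode([20]), BTreeNode([25])]),
])
print(contains(root, 10))  # True: 6 < 10 <= 17 -> subset[1], then 10 <= 12 -> subset[0]
print(contains(root, 11))  # False
```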
Searching (cont.)
Example: target = 10
[Figure: a B-tree holding the keys 2, 3, 4, 5, 6, 10, 12, 16, 17, 18, 19, 20, 22, 25, searched for the target 10.]
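Deletion from a B-Tree
1. Delete h, r: h is simply removed from leaf [g h i]. Because r sits in an internal node, promote its successor s to replace it and delete s from leaf [s t u x].
Result: root [j]; internal nodes [c f] [m s]; leaves [a b] [d e] [g i] [k l] [n p] [t u x].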
Deletion (cont.)
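2. Delete p: leaf [n p] falls below the minimum of two keys, and its left neighbor [k l] has none to spare, so pull s down from the parent and pull t up from [t u x].
Result: internal node [m t]; leaves [k l] [n s] [u x].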
Deletion (cont.)
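3. Delete d: leaf [d e] falls below the minimum, and both neighbors [a b] and [g i] are at the minimum, so combine [a b], the parent key c, and [e] into one leaf [a b c e].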
Deletion (cont.)
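Combine: the parent [c f] is left as [f], below the minimum, and its neighbor [m t] is also at the minimum, so combine [f], the root key j, and [m t] into [f j m t]. The empty old root is removed.
Result: root [f j m t]; leaves [a b c e] [g i] [k l] [n s] [u x].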
Deleting from a B-Tree
To delete a key value x from a B-tree, first search to determine the leaf node that contains x.
If removing x leaves that leaf node with fewer than the minimum number of keys, try to adopt a key from a neighboring node. If that's possible, then you're finished.
Deleting from a B-Tree (continued)
If the neighboring node is already at its minimum, combine the leaf node with its neighboring node, resulting in one full leaf node. This will require restructuring the parent node, since it has lost a child.
If the parent now has fewer than the minimum number of keys, adopt a key from one of its neighbors. If that's not possible, combine the parent with its neighbor.
Deleting from a B-Tree (continued)
This process may percolate all the way to the root.
If the root is left with only one child, then remove the root node and make its child the new root.
Both insertion and deletion are O(h), where h is the height of the tree.
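The deletion procedure can be sketched in Python for a classic B-tree of order 5 (at most 4 keys per node, at least 2 in every non-root node). The names are hypothetical, and a few policy choices are assumptions of this sketch: an internal key is replaced by its in-order successor (as in the "promote s" step), a short node adopts from its left neighbor before its right one, and it combines with its left neighbor when one exists. Replaying the deletions h, r, p, d on the final tree of the letter-insertion example reproduces the deletion example.

```python
from bisect import bisect_left

ORDER = 5                          # at most ORDER - 1 = 4 keys per node
MIN_KEYS = (ORDER + 1) // 2 - 1    # ceiling(ORDER/2) - 1 = 2 keys for order 5

class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)   # empty for a leaf

def delete(root, key):
    """Delete key and return the root, which changes if the old root empties."""
    _delete(root, key)
    if not root.keys and root.children:  # root left with one child: drop it
        return root.children[0]
    return root

def _delete(node, key):
    i = bisect_left(node.keys, key)
    found = i < len(node.keys) and node.keys[i] == key
    if not node.children:                # leaf: remove the key if present
        if found:
            node.keys.pop(i)
        return
    if found:                            # internal key: replace it by its successor
        succ = node.children[i + 1]
        while succ.children:
            succ = succ.children[0]
        node.keys[i] = succ.keys[0]
        key, i = succ.keys[0], i + 1     # then delete the successor from subtree i+1
    _delete(node.children[i], key)
    if len(node.children[i].keys) < MIN_KEYS:
        _fix_underflow(node, i)

def _fix_underflow(parent, i):
    child = parent.children[i]
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None
    if left is not None and len(left.keys) > MIN_KEYS:      # adopt from left
        child.keys.insert(0, parent.keys[i - 1])
        parent.keys[i - 1] = left.keys.pop()
        if left.children:
            child.children.insert(0, left.children.pop())
    elif right is not None and len(right.keys) > MIN_KEYS:  # adopt from right
        child.keys.append(parent.keys[i])
        parent.keys[i] = right.keys.pop(0)
        if right.children:
            child.children.append(right.children.pop(0))
    elif left is not None:                                  # combine with left
        left.keys += [parent.keys.pop(i - 1)] + child.keys
        left.children += child.children
        parent.children.pop(i)
    else:                                                   # combine with right
        child.keys += [parent.keys.pop(i)] + right.keys
        child.children += right.children
        parent.children.pop(i + 1)

def example_tree():
    """Final tree of the letter-insertion example (hypothetical builder)."""
    return Node("j", [Node("cf", [Node("ab"), Node("de"), Node("ghi")]),
                      Node("mr", [Node("kl"), Node("np"), Node("stux")])])

root = example_tree()
for key in "hrpd":
    root = delete(root, key)
print(root.keys)                               # ['f', 'j', 'm', 't']
print([leaf.keys for leaf in root.children])
```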
Delete 18
Delete 5
Delete 19
Delete 12
Properties of B-Trees
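(1) In a B-tree of order m with n keys, the number of nodes, p, satisfies
$\frac{n}{m-1} \le p \le 1 + \frac{n-1}{\lceil m/2 \rceil - 1}$,
and on average $p \approx 1.44\,n/m$.
(2) Averaged over a sequence of insertions into an initially empty tree, fewer than $\frac{1}{\lceil m/2 \rceil - 1}$ nodes are split per insertion.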
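As a sketch of where bounds on the node count come from (the function names are hypothetical): every node holds at most m - 1 keys, which gives p >= ceiling(n/(m-1)), and every node except the root holds at least ceiling(m/2) - 1 keys while the root holds at least 1, which gives p <= 1 + floor((n-1)/(ceiling(m/2)-1)). The code assumes m >= 3. For instance, the 20-key order-5 tree from the insertion example has 9 nodes, which falls inside the computed range.

```python
def node_count_bounds(n, m):
    """Min and max number of nodes p in a B-tree of order m (m >= 3) holding n >= 1 keys."""
    min_keys = (m + 1) // 2 - 1            # ceiling(m/2) - 1 keys in every non-root node
    lowest = -(-n // (m - 1))              # ceiling(n / (m - 1)): every node full
    highest = 1 + (n - 1) // min_keys      # root has 1 key, all others at the minimum
    return lowest, highest

def average_nodes(n, m):
    """The slide's average-case estimate p ~ 1.44 n / m."""
    return 1.44 * n / m

print(node_count_bounds(20, 5))   # (5, 10)
print(average_nodes(20, 5))       # about 5.76
```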
Advantages of B-tree
Searching a balanced tree means that all leaves are at the same depth, so there is no runaway pointer overhead. Indeed, even very large B-trees can guarantee that only a small number of nodes must be retrieved to find a given key. For example, a B-tree of 10,000,000 keys with up to 50 keys per node (and at least 25 in every node below the root) never needs to retrieve more than 5 nodes to find any key.
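The worst case can be checked with a short Python sketch. It assumes that 50 keys is the node capacity, so a non-root node has at least 26 children (25 keys), and uses the standard counting argument: a tree with h + 1 levels must hold at least 2t^h - 1 keys, where t is the minimum number of children of a non-root node. Under these assumptions it returns 5 levels for 10,000,000 keys. Even with every node full, 4 levels would hold only 51^4 - 1 = 6,765,200 keys, so 5 node retrievals are needed in any case.

```python
def max_levels(n_keys, max_keys):
    """Most levels a B-tree can need for n_keys (>= 1) when nodes hold at most
    max_keys keys and every non-root node is only minimally filled."""
    t = (max_keys + 2) // 2   # minimum children of a non-root node: ceiling((max_keys + 1) / 2)
    levels = 1
    # A tree with levels + 1 levels holds at least 2 * t**levels - 1 keys:
    # a root with one key, then non-root nodes with t - 1 keys and t children each.
    while 2 * t ** levels - 1 <= n_keys:
        levels += 1
    return levels

print(max_levels(10_000_000, 50))   # 5
```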
ORACLE Create Index Statement
Syntax: create [unique] index indexname on tablename
(columnname [asc | desc] {, columnname [asc | desc]})
[tablespace tblspacename]
[storage ([initial n] [next n] [minextents n]
[maxextents n] [pctincrease n])]
[pctfree n]
[other disk storage and transaction clauses not covered or deferred]
[nosort];
ORACLE Create Index Statement (c.)
Explanation:
The value of n in pctfree can range from 0 to 99, and this number determines the percentage of each B-tree node page that is left free when the index is created.
The list of column names in parentheses on the second line specifies a concatenation of column values that make up an index key on the table specified.
The nosort option indicates that the rows already lie on disk in sorted order by the key values for this index.
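To see what pctfree means for one node, here is a deliberately simplified Python model. The page size, header size, and entry size are hypothetical, and real Oracle block accounting has more overhead than this; the point is only that pctfree percent of the usable space is left empty when the index is built, to absorb later inserts without splits.

```python
def entries_at_creation(page_bytes, header_bytes, entry_bytes, pctfree):
    """Simplified model: entries placed in one B-tree node page at index
    creation when pctfree percent of its usable space is left empty."""
    if not 0 <= pctfree <= 99:
        raise ValueError("pctfree must be between 0 and 99")
    usable = page_bytes - header_bytes
    filled = usable * (100 - pctfree) // 100
    return filled // entry_bytes

print(entries_at_creation(4096, 96, 20, 0))    # 200 entries: page packed full
print(entries_at_creation(4096, 96, 20, 10))   # 180 entries: 10% kept free
```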
DB2 Create Index Statement
Syntax: create [unique] index indexname on tablename
(columnname [asc | desc] {, columnname [asc | desc]})
[using . . .] [freepage n]
[pctfree n]
[additional clauses not covered or deferred];
DB2 Create Index Statement (c.)
Explanation: The using clause specifies how the index is to be constructed from disk files.
The integer n of the freepage clause specifies how frequently an empty free page should be left in the sequence of pages assigned to the index when it is loaded with entries by a DB2 utility.
One free page is left for every n index pages, where n varies from 0 to 255. The default value for n is 0, meaning that no free pages are left. A value of n = 1 means that alternate disk pages are left empty.
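The freepage rule can be sketched as a page layout string, with `I` for a page loaded with index entries and `F` for a page left empty. The function name is hypothetical, and whether a free page also follows the very last index page is an assumption of this sketch.

```python
def page_layout(num_index_pages, freepage):
    """'I' = page loaded with index entries, 'F' = page left empty.
    One free page follows every `freepage` index pages; 0 means none."""
    layout = []
    for p in range(1, num_index_pages + 1):
        layout.append("I")
        if freepage and p % freepage == 0:
            layout.append("F")
    return "".join(layout)

print(page_layout(3, 0))   # III: the default, no free pages
print(page_layout(3, 1))   # IFIFIF: alternate pages left empty
print(page_layout(5, 2))   # IIFIIFI
```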
INGRES Create Index Statement
Syntax: create [unique] index indexname on table
(columnname {, columnname})
[with /* commas separate the clauses that follow */ [location = . . .]
[structure = btree | isam | hash | . . .]
[key = (columnname {, columnname})]
[fillfactor = n]
[nonleaffill = n]
[additional clauses not covered or deferred] ];
INGRES Create Index Statement (c.)
Explanation:
The with keyword must be present if any of the later comma-separated clauses appear.
The location clause specifies how the index is to be constructed from disk files.
The structure clause is unique to INGRES and names the access structure that the index will be assigned when it is created.
The key clause indicates that the index key value will be constructed from the column names listed.
The fillfactor and nonleaffill clauses give the percentage of node space that should be filled.
Index Node Layout and Free Space
Below is the schematic layout of a normal leaf-level index node with unique key values.
[Figure: a node page with header info, a sequence of (keyval, rid) entries, and free space.]
More about the B-Tree
The purpose of the B-tree index is to minimize the number of disk I/Os needed to locate a row with a given index key value.
The depth of the B-tree bears a close relationship to the number of disk I/Os used to reach the leaf-level entry where the rowid is kept.
The nodes of the B-tree are loaded in a left-to-right fashion, so that successive inserts normally occur to the same leaf node, which is held consistently in a memory buffer.
When the leaf node splits, the successive leaf node is allocated from the next disk page of the allocated extent.
More about the B-Tree (c.)
Node splits at every level occur in a controlled way and allow us to leave just the right amount of free space on each page.
It is common to estimate the fanout at each level to have a value of n, where n is the expected number of entries that appear in each node. Assuming that there are n directory entries at the root node and in every node below that, the number of entries at the second level is n^2, at the third level n^3, and so on. For a tree of depth K, the number of leaf-level entries is n^K just before a root split occurs in the tree to make it a tree of depth K+1.
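The fanout estimate gives a quick way to predict depth. In this sketch the fanout of 100 is a hypothetical value; `leaf_entries` applies the n^K formula and `depth_needed` finds the smallest K whose capacity covers a given number of entries.

```python
def leaf_entries(fanout, depth):
    """Leaf-level entries in a tree of the given depth whose nodes all hold
    `fanout` entries, just before the root must split."""
    return fanout ** depth

def depth_needed(entries, fanout):
    """Smallest depth K with fanout**K >= entries."""
    k = 1
    while fanout ** k < entries:
        k += 1
    return k

print(leaf_entries(100, 3))           # 1000000
print(depth_needed(1_000_000, 100))   # 3
print(depth_needed(1_000_001, 100))   # 4: one more entry forces a root split
```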
Summary
The B-tree is a tree-like structure that helps us to
organize data in an efficient way.
The B-tree index is a technique used to minimize the disk I/Os needed for the purpose of locating a row with a given index key value.
Because of its advantages, the B-tree and the B-tree index structure are widely used in databases nowadays.