B+-Trees (PART 1)

What is a B+ tree?
Why B+ trees?
Searching a B+ tree
Insertion in a B+ tree
NOTE: Some examples in this lecture are adapted from Internet sources.

*
*
What is a B+ tree?
A B+-tree of order M ≥ 3 is an M-ary tree with the following properties:
Leaves contain data items or references to data items:
  all leaves are at the same depth
  each leaf has ⌈L/2⌉ to L data items or data references (L may be equal to, less than, or greater than M; but usually L << M)
Internal nodes contain search keys:
  the keys in each node are sorted in increasing order
  each node has at least ⌈M/2⌉ and at most M subtrees
  the number of search keys in each node is one less than the number of subtrees
  key i in an internal node is the smallest key in subtree i+1
Root:
  can be a single leaf, or has 2 to M children
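The occupancy rules above can be checked with a little arithmetic. The following is a minimal sketch, assuming that the lower bounds M/2 and L/2 are rounded up; the function names are made up for this illustration.

```python
def leaf_bounds(L):
    """(min, max) number of data items in a non-root leaf."""
    return (L + 1) // 2, L          # (L + 1) // 2 == ceil(L / 2)

def subtree_bounds(M):
    """(min, max) number of subtrees of a non-root internal node."""
    return (M + 1) // 2, M          # (M + 1) // 2 == ceil(M / 2)

def key_bounds(M):
    """(min, max) number of keys: always one less than the subtree count."""
    lo, hi = subtree_bounds(M)
    return lo - 1, hi - 1

print(leaf_bounds(4))      # (2, 4)
print(subtree_bounds(5))   # (3, 5)
print(key_bounds(5))       # (2, 4)
```

So for M = 5 and L = 4, a non-root internal node holds 2 to 4 keys and a non-root leaf holds 2 to 4 items.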
*
*
The internal node structure of a B+ tree
Each leaf node stores key-data pairs or key-dataReference pairs. Data or data references are in leaves only.
Leaves form a doubly-linked list that is sorted in increasing order of keys.
Each internal node has the following structure:
j | a_1 | k_1 | a_2 | k_2 | a_3 | … | k_j | a_(j+1)
j == number of keys in the node.
a_i is a reference to a subtree.
k_i == smallest key in subtree a_(i+1), and k_i > largest key in subtree a_i.
k_1 < k_2 < k_3 < . . . < k_j
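Because each key is the smallest key of the subtree to its right, choosing the subtree for a search key amounts to counting the node keys that are <= the search key. A small sketch using Python's standard bisect module; child_index is a name chosen for this illustration, and subtrees are numbered from 0 here rather than from 1.

```python
from bisect import bisect_right

def child_index(keys, key):
    """0-based position of the subtree to follow for `key`.

    bisect_right counts the node keys that are <= key, which is exactly
    the number of subtrees lying to the left of the one we want.
    """
    return bisect_right(keys, key)

keys = [13, 17, 24, 30]        # an internal node with j = 4 keys, 5 subtrees
print(child_index(keys, 8))    # 0  (8 < 13: leftmost subtree)
print(child_index(keys, 13))   # 1  (13 is the smallest key of the second subtree)
print(child_index(keys, 20))   # 2  (17 <= 20 < 24)
print(child_index(keys, 39))   # 4  (39 >= 30: rightmost subtree)
```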
Example: A B+ tree of order M = 5, L = 5
Records or references to records are stored at the leaves, but we only show
the keys here
At the internal nodes, only keys (and references to subtrees) are stored
*
*
Example: A B+ tree of order M = 4, L = 4
*
*
Why B+ trees?
Like a B-tree, each internal node and leaf node is designed to fit into one I/O block of data. An I/O block can usually hold quite a lot of data, so an internal node can keep a lot of keys, i.e., M is large. This implies that the tree has only a few levels, and only a few disk accesses are needed to accomplish a search, insertion, or deletion.
The B+-tree is a popular structure used in commercial databases. To further speed up searches, insertions, and deletions, the first one or two levels of the B+-tree are usually kept in main memory.
B+ trees are used in databases because, unlike B-trees, they support both equality searches and range searches efficiently:
Example of equality search: Find a student record with key 950000
Example of range search: Find all student records with Exam grade greater than 70 and less than 90
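A range search works well because the leaves are linked in key order: once the starting leaf is found, the scan simply walks to the right. The following is a rough sketch of that scan only; Leaf, next, and range_scan are hypothetical names, the grades are invented, and for brevity the scan starts at the first leaf instead of descending through the index.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys     # sorted keys stored in this leaf
        self.next = None     # right neighbour in the leaf list

def range_scan(start_leaf, low, high):
    """Yield every key k with low < k < high, walking the leaf list rightwards."""
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k >= high:        # keys are sorted, so nothing further can match
                return
            if k > low:
                yield k
        leaf = leaf.next

# Three linked leaves holding exam grades (invented data).
a, b, c = Leaf([55, 62]), Leaf([70, 74, 81]), Leaf([88, 90, 95])
a.next, b.next = b, c

print(list(range_scan(a, 70, 90)))   # [74, 81, 88]
```

In a real tree the scan would begin at the leaf returned by an equality search for the lower bound, so only the leaves overlapping the range are read.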
*
*
A B+ tree supports equality and range-searches efficiently
[Figure: index entries (direct search) sit above the data entries (the "sequence set")]
B+ Trees in Practice
For a B+ tree of order M and L = M, with h levels of index, where h ≥ 1:
The maximum number of records stored is n = (M - 1)^h
The space required to store the tree is O(n)
Inserting a record requires O(log_M n) operations in the worst case
Finding a record requires O(log_M n) operations in the worst case
Removing a (previously located) record requires O(log_M n) operations in the worst case
*
*
Start at the root.
If an internal node is reached:
  Search KEY among the keys in that node (linear search or binary search)
  If KEY < smallest key, follow the leftmost child reference down
  If KEY >= largest key, follow the rightmost child reference down
  If Ki <= KEY < Kj for adjacent keys Ki and Kj, follow the child reference between Ki and Kj
If a leaf is reached:
  Search KEY among the keys stored in that leaf (linear search or binary search)
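The search steps can be sketched as follows, using binary search in every node. This is an illustrative sketch rather than a reference implementation: the Node class is a simplification (keys only, no data references, no leaf links) and the small tree is built by hand.

```python
from bisect import bisect_left, bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted keys
        self.children = children    # list of subtrees, or None for a leaf

def search(node, key):
    """Return True if `key` is stored in a leaf of the tree rooted at `node`."""
    while node.children is not None:
        # Number of keys <= key tells us which child reference to follow.
        node = node.children[bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)          # binary search inside the leaf
    return i < len(node.keys) and node.keys[i] == key

root = Node([17], [
    Node([5, 13], [Node([2, 3]), Node([5, 7, 8]), Node([14, 15])]),
    Node([24, 30], [Node([19, 20, 22]), Node([24, 27, 29]), Node([33, 34, 38, 39])]),
])

print(search(root, 8))    # True
print(search(root, 16))   # False
print(search(root, 24))   # True
```

Note that the search always descends to a leaf, even when the key also appears in an internal node (24 here), because data lives in the leaves only.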
*
*
Searching a B+ Tree
In processing a query, a path is traversed in the tree from the root to some leaf node.
If there are K search-key values in the file, the path is no longer than ⌈log_⌈m/2⌉(K)⌉.
With 1 million search-key values and m = 100, at most ⌈log_50(1,000,000)⌉ = 4 nodes are accessed in a lookup.
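The bound is easy to check numerically. The sketch below computes the rounded-up logarithm with integer arithmetic only, which avoids floating-point rounding; max_path_length is a name chosen for this illustration.

```python
def max_path_length(num_keys, m):
    """Smallest h with ceil(m/2)**h >= num_keys, i.e. ceil(log base ceil(m/2) of K)."""
    fanout = (m + 1) // 2        # minimum number of children, ceil(m / 2)
    height, reach = 0, 1
    while reach < num_keys:
        reach *= fanout          # one more level multiplies the reach by the fanout
        height += 1
    return height

print(max_path_length(1_000_000, 100))   # 4, since 50**3 = 125,000 < 1,000,000 <= 50**4
```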
*
*
A B+ tree has two OVERFLOW CONDITIONS:
A leaf node overflows if, after an insertion, it contains L + 1 keys
A root node or an internal node of a B+ tree of order M overflows if, after a key insertion, it contains M keys
Insertion algorithm:
Search for the appropriate leaf node x in which to insert the key. Note: insertion of a key always starts at a leaf node.
If the key already exists in the leaf node x, report an error; otherwise insert the key in its proper sorted position in x.
If x does not overflow (it contains fewer than L + 1 keys after the insertion), the insertion is done.
If x overflows, split it into two leaves and COPY the smallest key y of the right leaf into the parent of x. Records with keys < y go to the left leaf; records with keys >= y go to the right leaf.
If the parent overflows, split the parent into two nodes: keys < the middle key go to the left node, keys > the middle key go to the right node, and the middle key PROPAGATES to the parent of the split node.
The process propagates upward until a parent that does not overflow is reached or the root node is reached. If the root node is reached and it overflows, split it and create a new root node.
*
*
Insert KEY:
Search for KEY using the search operation
If KEY is found in a leaf node, report an error
Otherwise, insert KEY into that leaf in sorted order
If the leaf does not overflow (contains <= L keys), the insertion is done
If the leaf overflows (contains L+1 keys), splitting is necessary
An example of inserting O into a B+ tree of order M = 4, L = 3.
Search for O; this leaf has 2 keys.
Insert O and maintain the order.
*
*
Insertion in B+ Trees: Splitting a Leaf Node
If the leaf overflows (contains L+1 keys after insertion), splitting is necessary
Splitting leaf:
LeftLeaf has the ⌊(L+1)/2⌋ smallest keys
RightLeaf has the remaining ⌈(L+1)/2⌉ keys
Make a copy of the smallest key in RightLeaf, say MinKeyRight, to be the parent of LeftLeaf and RightLeaf [COPY UP]
Insert MinKeyRight, together with LeftLeaf and RightLeaf, into the original parent node
An example of inserting T into a B+ tree of order M = 4 and L = 3
Search for T; this leaf has 3 keys.
Insert T: the leaf now has L + 1 = 4 keys. Overflow.
Split the leaf into xL and xR (xL gets ⌊(L+1)/2⌋ = 2 keys, xR gets ⌈(L+1)/2⌉ = 2 keys), and take the minimum key in xR, here S, as the key that separates xL and xR.
Insert S into the parent. Maintain the order of keys and child references (DONE).
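A leaf split can be sketched on plain sorted lists. This sketch assumes the smaller half stays in the left leaf when L + 1 is odd; split_leaf is a hypothetical helper, and the numeric keys are chosen only for illustration (L = 4).

```python
from bisect import insort

def split_leaf(keys):
    """Split an overflowing leaf holding L + 1 sorted keys.

    Returns (left_keys, right_keys, copied_up_key). The copied-up key is the
    smallest key of the right leaf and stays in that leaf as well.
    """
    mid = len(keys) // 2                    # floor((L + 1) / 2) keys go left
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]

L = 4
leaf = [2, 3, 5, 7]          # a full leaf
insort(leaf, 8)              # insert in sorted order -> L + 1 keys: overflow
assert len(leaf) == L + 1

print(split_leaf(leaf))      # ([2, 3], [5, 7, 8], 5)
```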
*
*
Insertion in B+ Trees: Splitting Internal Node
An insertion into a full parent node causes the parent to overflow; in that case this internal node must be split.
Splitting internal node:
Split it into 2 new internal nodes, LeftNode and RightNode
LeftNode has the smallest ⌈M/2⌉ - 1 keys
RightNode has the largest ⌊M/2⌋ keys
NumberOfKeys in LeftNode <= NumberOfKeys in RightNode
Note that the ⌈M/2⌉th key is not in either node.
Make the ⌈M/2⌉th key, say “MiddleKey”, the parent of LeftNode and RightNode [PROPAGATE UP]
Insert “MiddleKey”, together with LeftNode and RightNode, into the original parent node
Splitting root:
Follow exactly the same procedure as splitting an internal node
“MiddleKey”, the parent of LeftNode and RightNode, is now set to be the root of the tree
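An internal split differs from a leaf split in that the middle key moves up and is kept in neither half. A minimal sketch on plain lists, assuming the middle key is the ⌈M/2⌉th of the M keys in the overflowing node; split_internal and the child labels c0 to c5 are made up for this illustration.

```python
def split_internal(keys, children):
    """Split an overflowing internal node with M keys and M + 1 subtrees.

    Returns ((left_keys, left_children), middle_key, (right_keys, right_children)).
    The middle key is pushed up and appears in neither new node.
    """
    mid = (len(keys) + 1) // 2 - 1           # 0-based index of the ceil(M/2)-th key
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, keys[mid], right

# Order M = 5: a node that has just received its 5th key.
keys = [5, 13, 17, 24, 30]
children = ["c0", "c1", "c2", "c3", "c4", "c5"]
left, middle, right = split_internal(keys, children)

print(left)     # ([5, 13], ['c0', 'c1', 'c2'])
print(middle)   # 17
print(right)    # ([24, 30], ['c3', 'c4', 'c5'])
```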
*
*
Insertion in B+ Trees
An example of inserting M into a B+ tree of order M= 4 and L = 3
Search for M; this leaf has 3 keys.
Insert M; the leaf now holds L + 1 = 4 keys, so the B+ tree condition is violated.
*
*
Split the leaf and distribute the keys.
Make L the parent of the two new leaves.
*
*
Since the parent is not full, we can just insert the subtree rooted at J into the parent. Done.
*
*
Insertion in B+ Trees
Insert 16 then 8 in the following B+ tree of order M = 5, L = 4:
Note: A * in a leaf node key indicates a key-dataReference pair
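Inserting 8 overflows its leaf, which splits into two leaves: 2* 3* and 5* 7* 8*.
One new child (leaf node) is generated; we must add one more reference to its parent, and thus one more key value as well.
Entry to be inserted in the parent node: 5. (Note that 5 is copied up and continues to appear in the leaf.)
The parent is the root, with keys 13 17 24 30. It is already full, so inserting 5 makes it overflow: 5 13 17 24 30.
The root splits into two index nodes, 5 13 and 24 30. Entry to be inserted in the parent node: 17. (Note that 17 is pushed up and appears only once in the index. Contrast this with a leaf split.)
Notice that the root was split, leading to an increase in height.
Resulting tree, as shown in the final figure:
Root: 17
Index nodes: 5 13 | 24 30
Leaves: 2* 3* | 5* 7* 8* | 14* 15* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*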
Find the correct leaf X.
If X has enough space, insert the key there; done!
Else, must split X (into X and a new node X2)
Redistribute entries evenly, putting the middle key in X2
Copy up the middle key.
Insert a reference (index entry) referring to X2 into the parent of X.
This can happen recursively
To split an index node, redistribute entries evenly, but push (propagate) up the middle key. (Contrast with leaf splits.)
Splits “grow” the tree; a root split increases its height.
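Putting the pieces together, the whole insertion procedure fits in a short recursive routine. This is one possible sketch under simplifying assumptions, not a definitive implementation: Node and BPlusTree are hypothetical classes, leaves hold bare keys, the leaf linked list is omitted, and the split points follow the floor/ceiling convention assumed for the splitting rules. The usage part rebuilds a tree of order M = 5, L = 4 by hand and inserts 16 and then 8.

```python
from bisect import bisect_right, insort

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted keys
        self.children = children    # list of subtrees, or None for a leaf

class BPlusTree:
    def __init__(self, M, L, root=None):
        self.M, self.L = M, L
        self.root = root if root is not None else Node([])

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:               # the root itself split: height grows by one
            separator, right = split
            self.root = Node([separator], [self.root, right])

    def _insert(self, node, key):
        """Insert below `node`; return (separator, new_right_node) if `node` split."""
        if node.children is None:                       # leaf
            if key in node.keys:
                raise KeyError(f"duplicate key {key!r}")
            insort(node.keys, key)
            if len(node.keys) <= self.L:
                return None
            mid = len(node.keys) // 2
            right = Node(node.keys[mid:])
            node.keys = node.keys[:mid]
            return right.keys[0], right                 # COPY up: key stays in the leaf
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        separator, new_child = split
        node.keys.insert(i, separator)
        node.children.insert(i + 1, new_child)
        if len(node.keys) < self.M:
            return None
        mid = (len(node.keys) + 1) // 2 - 1             # the ceil(M/2)-th key
        middle = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return middle, right                            # PUSH up: key leaves this level

leaves = [[2, 3, 5, 7], [14, 15], [19, 20, 22], [24, 27, 29], [33, 34, 38, 39]]
tree = BPlusTree(M=5, L=4, root=Node([13, 17, 24, 30], [Node(k) for k in leaves]))
tree.insert(16)     # fits into the leaf 14 15: no split
tree.insert(8)      # leaf split (5 copied up), then root split (17 pushed up)

print(tree.root.keys)                                   # [17]
print([c.keys for c in tree.root.children])             # [[5, 13], [24, 30]]
print([c.keys for c in tree.root.children[0].children]) # [[2, 3], [5, 7, 8], [14, 15, 16]]
```

The recursion returns a (separator, new node) pair only when a node splits, so a split propagates upward exactly as far as the overflow does.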
*