B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching,...


B+-Trees


Motivation

An AVL tree with N nodes is an excellent data structure for searching, indexing, etc.

The Big-Oh analysis shows that most operations finish within O(log N) time.

The theoretical conclusion holds as long as the entire structure fits in main memory.

When the tree is too large to fit in main memory and has to reside on disk, the performance of an AVL tree may deteriorate rapidly.


From Binary to M-ary

Idea: allow a node in a tree to have many children.

Less disk access = smaller tree height = more branching.

As branching increases, the depth decreases.

An M-ary tree allows M-way branching: each internal node has at most M children.

A complete M-ary tree has height that is roughly $\log_M N$ instead of $\log_2 N$.

If M = 20, then $\log_{20} 2^{20} < 5$.

Thus, we can speed up the search significantly.

We want all leaves to be at the same level. We can do that by varying the branching factor.
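As a quick check of the arithmetic on this slide, the sketch below compares the approximate height of a complete binary tree with that of a complete 20-ary tree holding N = 2^20 items. The helper name `approx_height` is introduced here for illustration only.

```python
import math

def approx_height(n_items: int, branching: int) -> float:
    """Approximate height of a complete tree: log base `branching` of n_items."""
    return math.log(n_items) / math.log(branching)

n = 2 ** 20
binary_height = approx_height(n, 2)   # about 20 levels
m_ary_height = approx_height(n, 20)   # between 4 and 5 levels
print(f"binary: {binary_height:.2f}, 20-ary: {m_ary_height:.2f}")
```

With these numbers, a search touches roughly a quarter as many nodes in the 20-ary tree, which matters when each node visit is a disk access.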


M-ary Search Tree

A binary search tree has one key to decide which of the two branches to take.

An M-ary search tree needs M-1 keys to decide which branch to take: “One more kid than key”.

An M-ary search tree should be balanced in some way too.

We don’t want an M-ary search tree to degenerate to a linked list, or even a binary search tree.

Thus, we require that each node is at least ½ full!


B+ Tree

A B+-tree of order M (M>3) is an M-ary tree with the following properties:

1. The data items are stored in leaves.
2. The root is either a leaf or has between two and M children.
3. The non-leaf nodes store up to M-1 keys to guide the searching; key i represents the smallest key in subtree i+1.
4. All non-leaf nodes (except the root) have between $\lceil M/2 \rceil$ and M children.
5. All leaves are at the same depth and have between $\lceil L/2 \rceil$ and L data items, for some L (usually L << M, but we will assume M=L in most examples).
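The occupancy limits in properties 2 to 5 can be tabulated with a small helper. This is a sketch that assumes M/2 and L/2 are rounded up, which is the reading that matches the order-5 example later in these slides (between 3 and 5 children or items). The function name `occupancy_bounds` is hypothetical.

```python
import math

def occupancy_bounds(m: int, l: int) -> dict:
    """Return (min, max) occupancy per node kind for order m and leaf capacity l."""
    half_m = math.ceil(m / 2)   # assumption: "M/2" means M/2 rounded up
    half_l = math.ceil(l / 2)   # assumption: "L/2" means L/2 rounded up
    return {
        "root_children": (2, m),               # applies when the root is not a leaf
        "internal_children": (half_m, m),
        "internal_keys": (half_m - 1, m - 1),  # always one fewer key than children
        "leaf_items": (half_l, l),
    }

bounds = occupancy_bounds(5, 5)
print(bounds)
```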


Keys in Internal Nodes

Which keys are stored at the internal nodes?

There are several ways to do it. Different books adopt different conventions.

We will adopt the following convention: key i in an internal node is the smallest key in subtree i+1 (i.e., the right subtree of key i).

I would even be less strict. Since internal nodes are “roadsigns”, I would just not bother to update the internal values.

Even following this convention, there is no unique B+-tree for the same set of records.
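Under the convention that key i is the smallest key in subtree i+1, choosing a branch reduces to counting how many keys are less than or equal to the target. A minimal sketch using Python's standard `bisect` module follows; the function name `child_index` and the sample keys are made up for illustration, and child positions are 0-based in the code.

```python
from bisect import bisect_right

def child_index(keys: list, target) -> int:
    """0-based index of the child subtree to follow for `target`.

    A target equal to key i belongs to subtree i+1 (key i is that subtree's
    smallest key), so ties must go to the right: hence bisect_right.
    """
    return bisect_right(keys, target)

keys = ["J", "N", "T"]          # hypothetical internal node: 3 keys, 4 children
print(child_index(keys, "A"))   # 0: smaller than every key
print(child_index(keys, "J"))   # 1: equal to the first key, so go right of it
print(child_index(keys, "P"))   # 2
print(child_index(keys, "Z"))   # 3
```

Because the keys only act as signposts, the same lookup still routes correctly if an internal key is stale after a deletion, as long as it still separates the two subtrees.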


B+ Tree Example 1 (Order 5, M=L=5)

Records are stored at the leaves (we only show the keys here).

Since L=5, each leaf has between 3 and 5 data items (the root can be an exception).

Since M=5, each nonleaf node has between 3 and 5 children (the root can be an exception).

Requiring nodes to be half full guarantees that the B+ tree does not degenerate into a simple binary tree.


B+ Tree Example 2 (Order M=L=4)

We can still talk about left and right child pointers.

E.g., the left child pointer of N is the same as the right child pointer of J.

We can also talk about the left subtree and right subtree of a key in internal nodes.


B+ Tree in Practical Usage

Each internal node/leaf is designed to fit into one I/O block of data. An I/O block usually can hold quite a lot of data. This implies that the tree has only a few levels, so only a few disk accesses are needed to accomplish a search, insertion, or deletion.

The B+-tree is a popular structure used in commercial databases. To further speed up the search, the first one or two levels of the B+-tree are usually kept in main memory.

Wasted space: the disadvantage of the B+-tree is that most nodes will hold fewer than M-1 keys most of the time.

The textbook calls the tree a B-tree instead of a B+-tree. In some other textbooks, B-tree refers to the variant where the actual records are kept at internal nodes as well as the leaves. Such a scheme is not practical: keeping actual records at the internal nodes limits the number of keys stored there, and thus increases the number of tree levels.
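To make the "only a few levels" claim concrete, here is a rough back-of-the-envelope sketch. Every size below (block, key, pointer, and record sizes, and the record count) is an assumption chosen for illustration and does not come from these slides.

```python
import math

BLOCK_BYTES = 8192       # assumed disk block size
KEY_BYTES = 32           # assumed key size
POINTER_BYTES = 4        # assumed child-pointer size
RECORD_BYTES = 256       # assumed record size
N_RECORDS = 10_000_000   # assumed number of records

# An internal node packs M-1 keys and M pointers into one block.
M = (BLOCK_BYTES + KEY_BYTES) // (KEY_BYTES + POINTER_BYTES)
# A leaf packs L records into one block.
L = BLOCK_BYTES // RECORD_BYTES
# If internal nodes stored whole records instead of keys, fan-out would shrink.
M_with_records = (BLOCK_BYTES + RECORD_BYTES) // (RECORD_BYTES + POINTER_BYTES)

# Rough upper estimate of the level count: assume every node is only half full.
nodes = math.ceil(N_RECORDS / math.ceil(L / 2))   # most leaves we could need
levels = 1                                        # the leaf level
while nodes > 1:
    nodes = math.ceil(nodes / math.ceil(M / 2))
    levels += 1

print(M, L, M_with_records, levels)
```

Under these assumed sizes the fan-out is in the hundreds and ten million records fit in about four levels, while storing records in internal nodes would cut the fan-out to a few dozen.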


Searching Example

Suppose that we want to search for the key K.

The path traversed is shown in bold.
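The search itself can be sketched as a loop that follows one child pointer per level until it reaches a leaf. The `Leaf` and `Internal` classes and the sample tree below are hypothetical (the tree is not the one drawn on the slide), and routing assumes the convention that key i is the smallest key in subtree i+1.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class Leaf:
    keys: list       # sorted keys of the records stored in this leaf

@dataclass
class Internal:
    keys: list       # routing keys, at most M-1 of them
    children: list   # one more child than keys

def search(node, target):
    """Return (found, path); path lists the key lists of the nodes visited."""
    path = []
    while isinstance(node, Internal):
        path.append(node.keys)
        node = node.children[bisect_right(node.keys, target)]
    path.append(node.keys)
    return target in node.keys, path

# Hypothetical two-level tree.
root = Internal(
    keys=["G", "M"],
    children=[Leaf(["A", "C", "E"]), Leaf(["G", "I", "K"]), Leaf(["M", "P", "T"])],
)
found, path = search(root, "K")
print(found, path)
```

The returned path plays the role of the bold edges in the figure: one node per level, so the number of node reads equals the height of the tree.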


Insertion

Find the leaf location loc, then insert K into node loc.

Splitting of nodes (instead of the rotations used in AVL trees) is used to maintain the properties of B+-trees.

If leaf loc contains fewer than L keys, then insert K into loc (at the correct position).

If loc is already full (i.e., it contains L keys), split loc:

1. Cut loc off from its parent.
2. Split loc into two pieces, xL and xR, and insert K into the correct piece.
3. Identify the key that is to be the parent of xL and xR, and insert a copy of it, together with its child pointers, into the old parent of loc.
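A leaf-level insertion with a possible split might look like the sketch below. It makes two assumptions that the slide does not spell out: the key is inserted before the cut rather than after it (the resulting pieces are the same), and the key copied up to the parent is the smallest key of the right piece, following the "smallest key in subtree i+1" convention. The function name and sample keys are hypothetical.

```python
from bisect import insort

def insert_into_leaf(leaf: list, key, capacity: int):
    """Insert key into a sorted leaf that holds at most `capacity` keys.

    Returns (left, right, separator). Without a split, right and separator
    are None. After a split, separator is a copy of the smallest key in the
    right piece, to be inserted into the old parent.
    """
    if len(leaf) < capacity:
        insort(leaf, key)                # room available: insert in place
        return leaf, None, None
    # Full leaf: insert as if there were space, then cut the result in two.
    overfull = sorted(leaf + [key])
    mid = (len(overfull) + 1) // 2       # the left piece takes any extra key
    left, right = overfull[:mid], overfull[mid:]
    return left, right, right[0]

print(insert_into_leaf(["A", "C"], "B", 3))        # no split needed
print(insert_into_leaf(["P", "Q", "S"], "T", 3))   # full leaf: split
```

Both pieces end up at least half full, so the split restores the leaf property locally; the only work left is inserting the separator into the parent, which may itself be full.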


Inserting into a Non-full Leaf (L=3)


Splitting a Leaf: Inserting T (L=3)


Splitting Example 2 (L=3, M=4)


Splitting an Internal Node

To insert a key K into a full internal node x:

1. Cut x off from its parent.
2. Insert K and its left and right child pointers into x, pretending there is space. Now x has M keys.
3. Split x into 2 new internal nodes xL and xR, with xL containing the ($\lceil M/2 \rceil$ - 1) smallest keys and xR containing the $\lfloor M/2 \rfloor$ largest keys. Note that the $\lceil M/2 \rceil$-th key J is not placed in xL or xR.
4. Make J the parent of xL and xR, and insert J together with its child pointers into the old parent of x.
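The split of an overfull internal node can be sketched as plain list slicing. The sketch assumes the key counts are M/2 rounded up minus one for xL and M/2 rounded down for xR, so that together with J they account for all M keys. The function name and the sample node are hypothetical.

```python
import math

def split_internal(keys: list, children: list):
    """Split an overfull internal node holding M keys and M+1 children.

    Returns (left_keys, left_children, j, right_keys, right_children),
    where j is the key that moves up into the old parent.
    """
    m = len(keys)                      # the node temporarily holds M keys
    assert len(children) == m + 1
    n_left = math.ceil(m / 2) - 1      # number of keys that go to xL
    j = keys[n_left]                   # this key moves up; it stays in neither half
    left_keys, right_keys = keys[:n_left], keys[n_left + 1:]
    left_children = children[:n_left + 1]
    right_children = children[n_left + 1:]
    return left_keys, left_children, j, right_keys, right_children

# Hypothetical node of order M=4 after the extra key was squeezed in.
result = split_internal([10, 20, 30, 40], ["c0", "c1", "c2", "c3", "c4"])
print(result)
```

Unlike a leaf split, the middle key is moved rather than copied, because internal keys are only signposts and need not appear twice.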


Notice the multiple splits


Termination

Splitting will continue as long as we encounter full internal nodes.

If the split internal node x does not have a parent (i.e., x is the root), then create a new root containing the key J and its two children.


Deletion

Find the key and delete it in its leaf.

The leaf may then have too few keys. Do the reverse of add (pull down and slap together).

BUT, it could be that when you combine neighbor nodes you get a node that is too large. Then, you would have to split it apart.

Better to shift some of the records from a neighbor into the leaf that is too small.
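Shifting a record from a neighbor can be sketched as follows for the case of a right neighbor. The sketch assumes the minimum leaf occupancy is L/2 rounded up and that the separator in the parent should become the new smallest key of the right neighbor. The function name and sample leaves are hypothetical.

```python
import math

def borrow_from_right(leaf: list, right: list, capacity: int):
    """Shift the smallest key of `right` into an underfull `leaf`, in place.

    Returns the new separator (the smallest key now in `right`), or None when
    the leaf is not underfull or the neighbor has no key to spare.
    """
    minimum = math.ceil(capacity / 2)   # assumption: half full means rounded up
    if len(leaf) >= minimum or len(right) <= minimum:
        return None
    leaf.append(right.pop(0))           # the neighbor's smallest key moves over
    return right[0]

leaf, right = [9], [12, 13, 15]         # hypothetical leaves, L=3
new_separator = borrow_from_right(leaf, right, 3)
print(leaf, right, new_separator)
```

When the function returns None because the neighbor is at its minimum, the two leaves together hold few enough keys that they can be merged without overflowing.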


Removal of a Key

Let x be the leaf that holds target. Then target can appear in at most one ancestor y of x as a key (why?).

Node y is seen when we search down the tree.

After deleting from node x, we can access y directly and replace target by the new smallest key in x.


Deletion Example: deletion causes no issues

Want to delete 15


Want to delete 9

Again, no problems


Want to delete 10, situation 1


Deletion of 10: node too small

Note: if we had merged with the left (7, 8) node, no problems would have occurred.


Share with right neighbor


Example

Want to delete 12


too few keys! …


Deleting a Key in an Internal Node

Suppose we remove a key from an internal node u, and u has fewer than $\lceil M/2 \rceil$ - 1 keys after that.

Case 1: u is the root. If u is empty, then remove u and make its child the new root.


Deleting a Key in an Internal Node

Case 2: the right sibling v of u has $\lceil M/2 \rceil$ keys or more.

1. Move the separating key between u and v in the parent of u and v down to u.
2. Make the leftmost child of v the rightmost child of u.
3. Move the leftmost key in v up to become the separating key between u and v in the parent of u and v.

Case 2 (mirror image): the left sibling v of u has $\lceil M/2 \rceil$ keys or more.

1. Move the separating key between u and v in the parent of u and v down to u.
2. Make the rightmost child of v the leftmost child of u.
3. Move the rightmost key in v up to become the separating key between u and v in the parent of u and v.
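The three moves of case 2 with a right sibling can be written as three list operations. In this sketch nodes are represented by bare key and child lists, and the function name, parameter names, and sample values are all hypothetical.

```python
def rotate_from_right(parent_keys, sep, u_keys, u_children, v_keys, v_children):
    """Case 2 with v as the right sibling of u; all lists are modified in place.

    `sep` is the index in parent_keys of the key separating u and v.
    """
    u_keys.append(parent_keys[sep])        # separating key moves down to u
    u_children.append(v_children.pop(0))   # leftmost child of v becomes rightmost child of u
    parent_keys[sep] = v_keys.pop(0)       # leftmost key of v moves up as the new separator

parent_keys = [30]
u_keys, u_children = [10], ["a", "b"]
v_keys, v_children = [40, 50, 60], ["c", "d", "e", "f"]
rotate_from_right(parent_keys, 0, u_keys, u_children, v_keys, v_children)
print(parent_keys, u_keys, u_children, v_keys, v_children)
```

The left-sibling variant is the mirror image: insert at the front of u's lists and pop from the end of v's lists.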


…Continue From Previous Example (case 2 applies to nodes u and v)


Deleting a Key in an Internal Node

Case 3: every sibling v of u contains exactly $\lceil M/2 \rceil$ - 1 keys.

1. Move the separating key between u and v in the parent of u and v down to u.
2. Move the keys and child pointers in u to v.
3. Remove the pointer to u at the parent.
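Case 3 can be sketched in the same list-based style, here with v taken to be the right sibling of u. The function name, the parameter layout, and the sample values are hypothetical.

```python
def merge_into_right(parent_keys, parent_children, sep,
                     u_keys, u_children, v_keys, v_children):
    """Case 3 with v as the right sibling of u; all lists are modified in place.

    `sep` indexes the separating key in the parent, so u is
    parent_children[sep] and v is parent_children[sep + 1].
    """
    u_keys.append(parent_keys.pop(sep))   # separating key comes down to u
    v_keys[:0] = u_keys                   # u's keys go in front of v's keys
    v_children[:0] = u_children           # likewise for u's child pointers
    del parent_children[sep]              # remove the parent's pointer to u

parent_keys, parent_children = [7, 20], ["u", "v", "w"]   # labels stand in for nodes
u_keys, u_children = [5], ["a", "b"]
v_keys, v_children = [9, 12], ["c", "d", "e"]
merge_into_right(parent_keys, parent_children, 0,
                 u_keys, u_children, v_keys, v_children)
print(parent_keys, parent_children, v_keys, v_children)
```

The parent loses one key in the process, so it may itself drop below the minimum, in which case the same case analysis is applied one level up.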


Example

Want to delete 5


Cont’d: pull 7 down and slap together (case 3 applied to nodes u and v)


Another example: http://www.ceng.metu.edu.tr/~karagoz/ceng302/302-B+tree-ind-hash.pdf