Multilevel Indexing and B+ Trees - Welcome to CENG
Transcript of Multilevel Indexing and B+ Trees - Welcome to CENG
![Page 1: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/1.jpg)
CENG 351 File Structures 1
Multilevel Indexing and
B+ Trees
![Page 2: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/2.jpg)
CENG 351 File Structures 2
Problems with simple indexes
If index does not fit in memory:
1. Seeking the index is slow (binary search):
– We don’t want more than 3 or 4 seeks for a search.
2. Insertions and deletions take O(N) disk accesses.
N Log(N+1)
15 keys 4
1000 ~10
100,000 ~17
1,000,000 ~20
![Page 3: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/3.jpg)
CENG 351 File Structures 3
Indexed Sequential Files
• Provide a choice between two alternative
views of a file:
1. Indexed: the file can be seen as a set of
records that is indexed by key; or
2. Sequential: the file can be accessed
sequentially (physically contiguous
records), returning records in order by
key.
![Page 4: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/4.jpg)
CENG 351 File Structures 4
Example of applications
• Student record system in a university:
– Indexed view: access to individual records
– Sequential view: batch processing when posting
grades
• Credit card system:
– Indexed view: interactive check of accounts
– Sequential view: batch processing of payments
![Page 5: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/5.jpg)
CENG 351 File Structures 5
The initial idea
• Maintain a sequence set:
– Group the records into blocks in a sorted way.
– Maintain the order in the blocks as records are
added or deleted through splitting,
concatenation, and redistribution.
• Construct a simple, single level index for
these blocks.
– Choose to build an index that contain the key
for the last record in each block.
![Page 6: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/6.jpg)
CENG 351 File Structures 6
Maintaining a Sequence Set
• Sorting and re-organizing after insertions and deletions is out of question. We organize the sequence set in the following way:
– Records are grouped in blocks.
– Blocks should be at least half full.
– Link fields are used to point to the preceding block and the following block (similar to doubly linked lists)
– Changes (insertion/deletion) are localized into blocks by performing:
• Block splitting when insertion causes overflow
• Block merging or redistribution when deletion causes underflow.
![Page 7: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/7.jpg)
CENG 351 File Structures 7
Example: insertion• Block size = 4
• Key : Last name
ADAMS … BIXBY … CARSON … COLE …Block 1
• Insert “BAIRD …”:
ADAMS … BAIRD … BIXBY …
CARSON .. COLE …
Block 1
Block 2
![Page 8: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/8.jpg)
CENG 351 File Structures 8
Example: deletion
ADAMS … BAIRD … BIXBY … BOONE …Block 1
BYNUM… CARSON .. CARTER ..
DENVER… ELLIS …
Block 2
Block 3
• Delete “DAVIS”, “BYNUM”, “CARTER”,
COLE… DAVISBlock 4
![Page 9: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/9.jpg)
CENG 351 File Structures 9
Add an Index set
Key Block
BERNE 1CAGE 2DUTTON 3EVANS 4FOLK 5GADDIS 6
![Page 10: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/10.jpg)
CENG 351 File Structures 10
Tree indexes
• This simple scheme is nice if the index fits in memory.
• If index doesn’t fit in memory:
– Divide the index structure into blocks,
– Organize these blocks similarly building a tree structure.
• Tree indexes:
– B Trees
– B+ Trees
– Simple prefix B+ Trees
– …
![Page 11: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/11.jpg)
CENG 351 File Structures 11
Separators
Block Range of Keys Separator
1 ADAMS-BERNE
BOLEN
2 BOLEN-CAGECAMP
3 CAMP-DUTTONEMBRY
4 EMBRY-EVANSFABER
5 FABER-FOLKFOLKS
6 FOLKS-GADDIS
![Page 12: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/12.jpg)
CENG 351 File Structures 12
ADAMS-BERNE FOLKS-GADDIS
FABER-FOLKBOLEN-CAGE
CAMP-DUTTON EMBRY-EVANS
BOLEN CAMP
EMBRY
FABER FOLKS
1
2
3
5
4 6
Index set
root
![Page 13: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/13.jpg)
CENG 351 File Structures 13
B Trees
• B-tree is one of the most important data structures
in computer science.
• What does B stand for? (Not binary!)
• B-tree is a multiway search tree.
• Several versions of B-trees have been proposed,
but only B+ Trees has been used with large files.
• A B+tree is a B-tree in which data records are in
leaf nodes, and faster sequential access is possible.
![Page 14: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/14.jpg)
CENG 351 File Structures 14
Formal definition of B+ Tree Properties
• Properties of a B+ Tree of order v :
– All internal nodes (except root) has at least v keys and at most 2v keys .
– The root has at least 2 children unless it’s a leaf..
– All leaves are on the same level.
– An internal node with k keys has k+1 children
![Page 15: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/15.jpg)
CENG 351 File Structures 15
B+ tree: Internal/root node
structure
P0 K1 P1 K2 ……………… Pn-1 Kn Pn
Requirements:
K1 < K2 < … < Kn
For any search key value K in the subtree pointed by Pi,
If Pi = P0, we require K < K1
If Pi = Pn, Kn K
If Pi = P1, …, Pn-1, Ki K < Ki+1
Each Pi is a pointer to a child node; each Ki is a search key value
# of search key values = n, # of pointers = n+1
![Page 16: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/16.jpg)
CENG 351 File Structures 16
Pointer L points to the left neighbor; R points to
the right neighbor
K1 < K2 < … < Kn
v n 2v (v is the order of this B+ tree)
We will use Ki* for the pair <Ki, ri> and omit L and
R for simplicity
B+ tree: leaf node structure
L K1 r1 K2 ……………… Kn rn
R
![Page 17: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/17.jpg)
CENG 351 File Structures 17
Example: B+ tree with order of 1
• Each node must hold at least 1 entry, and at most
2 entries
10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
20 33 51 63
40
Root
![Page 18: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/18.jpg)
CENG 351 File Structures 18
Example: Search in a B+ tree order 2• Search: how to find the records with a given search key value?
– Begin at root, and use key comparisons to go to leaf
• Examples: search for 5*, 16*, all data entries >= 24* ...
– The last one is a range search, we need to do the sequential scan, starting from
the first leaf containing a value >= 24.
Root
17 24 30
2* 3* 5* 7* 14* 15* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
13
![Page 19: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/19.jpg)
CENG 351 File Structures 19
Cost for searching a value in B+ tree• Typically, a node is a page (block or cluster)
• Let H be the height of the B+ tree: we need to read H+1 pages to reach a leaf node
• Let F be the (average) number of pointers in a node (for internal node, called fanout )
– Level 1 = 1 page = F0 page
– Level 2 = F pages = F1 pages
– Level 3 = F * F pages = F2 pages
– Level H+1 = …….. = FH pages (i.e., leaf nodes)
– Suppose there are D data entries. So there are D/(F-1) leaf nodes
– D/(F-1) = FH. That is, H = logF( )
1
2
levelHeight
0
1
1F
D
![Page 20: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/20.jpg)
CENG 351 File Structures 20
B+ Trees in Practice
• Typical order: 100. Typical fill-factor: 67%.
– average fanout = 133 (i.e, # of pointers in internal node)
• Can often hold top levels in buffer pool:
– Level 1 = 1 page = 8 Kbytes
– Level 2 = 133 pages = 1 Mbyte
– Level 3 = 17,689 pages = 133 MBytes
• Suppose there are 1,000,000,000 data entries.
– H = log133(1000000000/132) < 4
– The cost is 5 pages read
![Page 21: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/21.jpg)
CENG 351 File Structures 21
How to Insert a Data Entry into a
B+ Tree?
• Let’s look at several examples first.
![Page 22: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/22.jpg)
CENG 351 File Structures 22
Inserting 16*, 8* into Example B+ tree
Root17 24 3013
2* 3* 5* 7* 8*
2* 5* 7*3*
17 24 3013
8*
You overflow
One new child (leaf node)
generated; must add one more
pointer to its parent, thus one more
key value as well.
14* 15* 16*
![Page 23: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/23.jpg)
CENG 351 File Structures 23
Inserting 8* (cont.)
• Copy up the
middle value
(leaf split)
2* 3* 5* 7* 8*
5
Entry to be inserted in parent node.
(Note that 5 iscontinues to appear in the leaf.)
s copied up and
13 17 24 30
You overflow!5 13 17 24 30
![Page 24: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/24.jpg)
CENG 351 File Structures 24
(Note that 17 is pushed up and onlyappears once in the index. Contrast
Entry to be inserted in parent node.
this with a leaf split.)
5 24 30
17
13
Insertion into B+ tree (cont.)
5 13 17 24 30
• Understand
difference
between copy-up
and push-up
• Observe how
minimum
occupancy is
guaranteed in
both leaf and
index pg splits.
We split this node, redistribute entries evenly,
and push up middle key.
![Page 25: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/25.jpg)
CENG 351 File Structures 25
Example B+ Tree After Inserting 8*
Notice that root was split, leading to increase in height.
2* 3*
Root
17
24 30
14* 15* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
135
7*5* 8*
![Page 26: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/26.jpg)
CENG 351 File Structures 26
Inserting a Data Entry into a B+ Tree:
Summary• Find correct leaf L.
• Put data entry onto L.
– If L has enough space, done!
– Else, must split L (into L and a new node L2)
• Redistribute entries evenly, put middle key in L2
• copy up middle key.
• Insert index entry pointing to L2 into parent of L.
• This can happen recursively
– To split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.)
• Splits “grow” tree; root split increases height.
– Tree growth: gets wider or one level taller at top.
![Page 27: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/27.jpg)
CENG 351 File Structures 27
Deleting a Data Entry from a B+
Tree
• Examine examples first …
![Page 28: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/28.jpg)
CENG 351 File Structures 28
Delete 19* and 20*
2* 3*
Root
17
24 30
14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
135
7*5* 8*
22*
27* 29*22* 24*
Have we still forgot something?
![Page 29: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/29.jpg)
CENG 351 File Structures 29
Deleting 19* and 20* (cont.)
• Notice how 27 is copied up.
• But can we move it up?
• Now we want to delete 24
• Underflow again! But can we redistribute this time?
2* 3*
Root
17
30
14* 16* 33* 34* 38* 39*
135
7*5* 8* 22* 24*
27
27* 29*
![Page 30: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/30.jpg)
CENG 351 File Structures 30
Deleting 24*• Observe the two leaf
nodes are merged, and 27 is discarded from their parent, but …
• Observe `pull down’ of index entry (below).
30
22* 27* 29* 33* 34* 38* 39*
2* 3* 7* 14* 16* 22* 27* 29* 33* 34* 38* 39*5* 8*
30135 17New root
![Page 31: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/31.jpg)
CENG 351 File Structures 31
Deleting a Data Entry from a B+ Tree:
Summary
• Start at root, find leaf L where entry belongs.
• Remove the entry.
– If L is at least half-full, done!
– If L has only d-1 entries,
• Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).
• If re-distribution fails, merge L and sibling.
• If merge occurred, must delete entry (pointing to L or sibling) from parent of L.
• Merge could propagate to root, decreasing height.
![Page 32: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/32.jpg)
CENG 351 File Structures 32
Example of Non-leaf Re-distribution
• Tree is shown below during deletion of 24*. (What
could be a possible initial tree?)
• In contrast to previous example, can re-distribute entry
from left child of root to right child.
Root
135 17 20
22
30
14* 16* 17* 18* 20* 33* 34* 38* 39*22* 27* 29*21*7*5* 8*3*2*
![Page 33: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/33.jpg)
CENG 351 File Structures 33
After Re-distribution
• Intuitively, entries are re-distributed by `pushing
through’ the splitting entry in the parent node.
• It suffices to re-distribute index entry with key 20;
we’ve re-distributed 17 as well for illustration.
14* 16* 33* 34* 38* 39*22* 27* 29*17* 18* 20* 21*7*5* 8*2* 3*
Root
135
17
3020 22
![Page 34: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/34.jpg)
CENG 351 File Structures 34
Primary vs Secondary Index
• Note: We were assuming the data items
were in sorted order
– This is called primary index
• Secondary index:
– Built on an attribute that the file is not sorted
on.
• Can have many different indexes on the
same file.
![Page 35: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/35.jpg)
A Secondary B+-Tree index
14 16 22 27 2917 18 20 2175 82 3 33 34 38 39
Root
135
17
3020 22
2* 16* 5* 39* 20* 7* 38* 29*22* 3* 14* 8*Actual data
buckets
![Page 36: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/36.jpg)
CENG 351 File Structures 36
More…
• Hash-based Indexes
– Static Hashing
– Extendible Hashing
– Linear Hashing
• Grid-files
• R-Trees
• etc…
![Page 37: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/37.jpg)
CENG 351 File Structures 37
Terminology
• Bucket Factor: the number of records which can fit in a leaf node.
• Fan-out : the average number of children of an internal node.
• A B+tree index can be used either as a primary index or a secondary index.
– Primary index: determines the way the records are actually stored (also called a sparse index)
– Secondary index: the records in the file are not grouped in buckets according to keys of secondary indexes (also called a dense index)
![Page 38: Multilevel Indexing and B+ Trees - Welcome to CENG](https://reader034.fdocuments.in/reader034/viewer/2022051315/627a2b8a3ae94e10e70b98a4/html5/thumbnails/38.jpg)
CENG 351 File Structures 38
Summary
• Tree-structured indexes are ideal for range-searches, also good for equality searches.
• B+ tree is a dynamic structure.– Inserts/deletes leave tree height-balanced; High fanout (F)
means depth rarely more than 3 or 4.
– Almost always better than maintaining a sorted file.
– Typically, 67% occupancy on average.
– If data entries are data records, splits can change rids!
• Most widely used index in database management systems because of its versatility. One of the most optimized components of a DBMS.