Post on 09-Apr-2018
8/8/2019 btrees (1)
Multilevel Indexing and B+ Trees
Indexed Sequential Files
Provide a choice between two alternative views of a file:
1. Indexed: the file can be seen as a set of records that is indexed by key; or
2. Sequential: the file can be accessed sequentially (physically contiguous records), returning records in order by key.
Example of applications
Student record system in a university:
  Indexed view: access to individual records
  Sequential view: batch processing when posting grades
Credit card system:
  Indexed view: interactive check of accounts
  Sequential view: batch processing of payments
The initial idea
Maintain a sequence set:
  Group the records into blocks in a sorted way.
  Maintain the order in the blocks as records are added or deleted through splitting, concatenation, and redistribution.
Construct a simple, single-level index for these blocks.
  Choose to build an index that contains the key for the last record in each block.
Maintaining a Sequence Set
Sorting and re-organizing after insertions and deletions is out of the question. We organize the sequence set in the following way:
  Records are grouped in blocks.
  Blocks should be at least half full.
  Link fields are used to point to the preceding block and the following block (similar to doubly linked lists).
  Changes (insertion/deletion) are localized into blocks by performing:
    Block splitting when insertion causes overflow
    Block merging or redistribution when deletion causes underflow.
Example: insertion
Block size = 4
Key: last name

Block 1: ADAMS BIXBY CARSON COLE

Insert BAIRD:

Block 1: ADAMS BAIRD BIXBY
Block 2: CARSON COLE
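The block split in this example can be sketched in Python. This is a minimal sketch under stated assumptions: a Python list of sorted lists stands in for the linked blocks, the overflowing block keeps the first half rounded up, and the names `insert_record` and `BLOCK_SIZE` are hypothetical.

```python
from bisect import insort

BLOCK_SIZE = 4  # capacity used in the example


def insert_record(blocks, key):
    """Insert key into a sequence set held as a list of sorted blocks."""
    if not blocks:
        blocks.append([])
    # Pick the first block whose last key is >= key; otherwise use the last block.
    target = len(blocks) - 1
    for i, block in enumerate(blocks):
        if block and key <= block[-1]:
            target = i
            break
    block = blocks[target]
    insort(block, key)
    if len(block) > BLOCK_SIZE:
        # Overflow: keep the first half (rounded up), move the rest to a new block.
        mid = (len(block) + 1) // 2
        blocks[target:target + 1] = [block[:mid], block[mid:]]
    return blocks


blocks = [["ADAMS", "BIXBY", "CARSON", "COLE"]]
insert_record(blocks, "BAIRD")
print(blocks)  # [['ADAMS', 'BAIRD', 'BIXBY'], ['CARSON', 'COLE']]
```

Only the overflowing block is touched; every other block stays where it is, which is the point of localizing changes.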
Example: deletion

Block 1: ADAMS BAIRD BIXBY BOONE
Block 2: BYNUM CARSON CARTER
Block 3: DENVER ELLIS
Block 4: COLE DAVIS

Delete DAVIS, BYNUM, CARTER
Add an Index set
Key      Block
BERNE    1
CAGE     2
DUTTON   3
EVANS    4
FOLK     5
GADDIS   6
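A lookup through this single-level index can be sketched as a binary search over the last key of each block. The sketch assumes the index fits in memory as a Python list; `find_block` is a hypothetical helper name.

```python
from bisect import bisect_left

# (last key in block, block number) pairs taken from the index set
INDEX = [("BERNE", 1), ("CAGE", 2), ("DUTTON", 3),
         ("EVANS", 4), ("FOLK", 5), ("GADDIS", 6)]
LAST_KEYS = [key for key, _ in INDEX]


def find_block(key):
    """Return the block that could hold key, or None if key is past the last block."""
    pos = bisect_left(LAST_KEYS, key)  # first block whose last key is >= key
    if pos == len(INDEX):
        return None
    return INDEX[pos][1]


print(find_block("BIXBY"))  # 2
print(find_block("EMBRY"))  # 4
```

The index only narrows the search to one block; the record itself still has to be looked up inside that block.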
Tree indexes
This simple scheme is nice if the index fits in memory.
If the index doesn't fit in memory:
  Divide the index structure into blocks,
  Organize these blocks similarly, building a tree structure.
Tree indexes:
  B Trees
  B+ Trees
  Simple prefix B+ Trees
Separators
Block   Range of Keys    Separator (between this block and the next)
1       ADAMS-BERNE      BOLEN
2       BOLEN-CAGE       CAMP
3       CAMP-DUTTON      EMBRY
4       EMBRY-EVANS      FABER
5       FABER-FOLK       FOLKS
6       FOLKS-GADDIS
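[Figure: index set built over the sequence set]
Index set:
  Root: EMBRY
  Index nodes: [BOLEN CAMP] [FABER FOLKS]
Sequence set:
  Block 1: ADAMS-BERNE
  Block 2: BOLEN-CAGE
  Block 3: CAMP-DUTTON
  Block 4: EMBRY-EVANS
  Block 5: FABER-FOLK
  Block 6: FOLKS-GADDIS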
B Trees
The B-tree is one of the most important data structures in computer science.
What does B stand for? (Not binary!)
A B-tree is a multiway search tree.
Several versions of B-trees have been proposed, but only B+ trees have been used with large files.
A B+ tree is a B-tree in which data records are in leaf nodes, and faster sequential access is possible.
Formal definition of B+ Tree Properties
Properties of a B+ Tree of order v:
  All internal nodes (except root) have at least v keys and at most 2v keys.
  The root has at least 2 children unless it's a leaf.
  All leaves are on the same level.
  An internal node with k keys has k+1 children.
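These occupancy rules can be checked per node with a few comparisons. The sketch below is illustrative only: `satisfies_order` is a hypothetical helper, and capping the root at 2v keys is an assumption (the slide only states the root's minimum number of children).

```python
def satisfies_order(keys, num_children, v, is_root=False):
    """Check one internal node against the order-v rules."""
    if num_children != len(keys) + 1:      # k keys must come with k+1 children
        return False
    if is_root:
        # Assumption: the root is also limited to 2v keys.
        return num_children >= 2 and len(keys) <= 2 * v
    return v <= len(keys) <= 2 * v


print(satisfies_order([13, 17, 24, 30], 5, v=2))      # True
print(satisfies_order([30], 2, v=2))                  # False: fewer than v keys
print(satisfies_order([17], 2, v=2, is_root=True))    # True
```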
B+ tree: Internal/root node structure

Node layout: $P_0 \; K_1 \; P_1 \; K_2 \; \dots \; P_{n-1} \; K_n \; P_n$

Each $P_i$ is a pointer to a child node; each $K_i$ is a search key value.
Number of search key values = n, number of pointers = n+1.

Requirements:
  $K_1 < K_2 < \dots < K_n$
  For any search key value $K$ in the subtree pointed to by $P_i$:
    If $P_i = P_0$, we require $K < K_1$
    If $P_i = P_n$, we require $K_n \le K$
    If $P_i = P_1, \dots, P_{n-1}$, we require $K_i \le K < K_{i+1}$
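Choosing the pointer to follow amounts to counting how many keys are less than or equal to the search key. The sketch assumes that a search key equal to $K_i$ belongs to the subtree to the right of $K_i$ (as the rule for $P_n$ suggests); `child_index` is a hypothetical name.

```python
from bisect import bisect_right


def child_index(keys, k):
    """Return i such that pointer P_i leads to the subtree that may contain k."""
    return bisect_right(keys, k)  # number of keys <= k


keys = [13, 17, 24, 30]
for k in (5, 13, 16, 24, 40):
    print(k, "-> P%d" % child_index(keys, k))
# 5 -> P0, 13 -> P1, 16 -> P1, 24 -> P3, 40 -> P4
```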
B+ tree: leaf node structure

Node layout: $L \; K_1 \; r_1 \; K_2 \; r_2 \; \dots \; K_n \; r_n \; R$

Pointer L points to the left neighbor; R points to the right neighbor.
$K_1 < K_2 < \dots < K_n$
$v \le n \le 2v$ (v is the order of this B+ tree)
We will use $K_i$* for the pair ($K_i$, $r_i$) and omit L and R for simplicity.
Example: B+ tree with order of 1
Each node must hold at least 1 entry, and at most 2 entries.

Root: 40
Index nodes: [20 33] [51 63]
Leaves: [10* 15*] [20* 27*] [33* 37*] [40* 46*] [51* 55*] [63* 97*]
Example: Search in a B+ tree of order 2
Search: how to find the records with a given search key value?
  Begin at root, and use key comparisons to go to leaf.
  Examples: search for 5*, 16*, all data entries >= 24* ...
  The last one is a range search; we need to do a sequential scan, starting from the first leaf containing a value >= 24.

Root: 13 17 24 30
Leaves: [2* 3* 5* 7*] [14* 15*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
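The three searches from this example can be sketched as follows. This is a minimal sketch, not a full implementation: the classes `Leaf` and `Internal` and the functions `find_leaf`, `search`, and `range_from` are hypothetical names, leaves hold bare keys instead of (key, record) pairs, and only the right-neighbor link is modeled.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None  # link to the right neighbor


class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children


def find_leaf(node, k):
    """Walk from the root to the leaf that may contain k."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, k)]
    return node


def search(root, k):
    return k in find_leaf(root, k).keys


def range_from(root, low):
    """Return all keys >= low by scanning leaves left to right."""
    leaf = find_leaf(root, low)
    start = bisect_left(leaf.keys, low)
    out = []
    while leaf is not None:
        out.extend(leaf.keys[start:])
        leaf, start = leaf.next, 0
    return out


leaves = [Leaf(ks) for ks in ([2, 3, 5, 7], [14, 15], [19, 20, 22],
                              [24, 27, 29], [33, 34, 38, 39])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([13, 17, 24, 30], leaves)

print(search(root, 5))       # True
print(search(root, 16))      # False
print(range_from(root, 24))  # [24, 27, 29, 33, 34, 38, 39]
```

The range search descends the tree once and then follows leaf links, which is why the sequence set keeps neighbor pointers.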
B+ Trees in Practice
Typical order: 100. Typical fill-factor: 67%.
  Average fanout = 133 (i.e., number of pointers in an internal node)
Can often hold top levels in buffer pool:
  Level 1 = 1 page = 8 Kbytes
  Level 2 = 133 pages = 1 Mbyte
  Level 3 = 17,689 pages = 133 MBytes
Suppose there are 1,000,000,000 data entries.
  $H = \log_{133}(1000000000/132) < 4$
  The cost is 5 pages read
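The figures on this slide can be reproduced with a few lines of arithmetic. The constant names below are hypothetical; the values (8 KB pages, fanout 133, 132 entries per leaf) are the ones the slide uses.

```python
import math

PAGE_KB = 8                 # page size from the slide
FANOUT = 133                # average fanout from the slide
ENTRIES = 1_000_000_000     # number of data entries
LEAF_ENTRIES = 132          # entries per leaf page used in the slide's formula

for level in (1, 2, 3):
    pages = FANOUT ** (level - 1)
    print(f"Level {level}: {pages} pages = {pages * PAGE_KB} KB")

height = math.log(ENTRIES / LEAF_ENTRIES, FANOUT)
print(f"H = {height:.2f}")  # below 4
```

Level 3 comes out at 17,689 pages; the slide's megabyte figures treat 133 pages as roughly 1 MB, so they are rounded.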
How to Insert a Data Entry into a B+ Tree?
Let's look at several examples first.
Inserting 16*, 8* into Example B+ tree

Root: 13 17 24 30
Inserting 16*: the leaf [14* 15*] has room and becomes [14* 15* 16*].
Inserting 8*: the leaf [2* 3* 5* 7*] would have to hold 2* 3* 5* 7* 8*, so it overflows.
One new child (leaf node) is generated; we must add one more pointer to its parent, and thus one more key value as well.
Inserting 8* (cont.)
Copy up the middle value (leaf split).
  Leaves after the split: [2* 3*] [5* 7* 8*]
  5: entry to be inserted in parent node. (Note that 5 is copied up and continues to appear in the leaf.)
  Parent before: 13 17 24 30
  Parent after: 5 13 17 24 30 (overflow!)
Insertion into B+ tree (cont.)
Index node to split: 5 13 17 24 30
We split this node, redistribute entries evenly, and push up the middle key.
  After the split: [5 13] [24 30]
  17: entry to be inserted in parent node. (Note that 17 is pushed up and only appears once in the index. Contrast this with a leaf split.)
Understand the difference between copy-up and push-up.
Observe how minimum occupancy is guaranteed in both leaf and index page splits.
Example B+ Tree After Inserting 8*
Notice that root was split, leading to increase in height.

Root: 17
Index nodes: [5 13] [24 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 15*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
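The insertion of 8*, including the leaf split (copy up) and the index split (push up), can be sketched as a short recursive routine. This is a sketch under stated assumptions rather than a complete B+ tree: `Node`, `_insert`, and `insert` are hypothetical names, leaves hold bare keys, neighbor links are omitted, and an overflowing node of 2·ORDER+1 keys is split around its middle key.

```python
from bisect import bisect_right, insort

ORDER = 2  # nodes hold at most 2 * ORDER keys, as in the example


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf


def _insert(node, key):
    """Insert key below node; return (separator, new_right_node) if node split."""
    if node.children is None:
        insort(node.keys, key)
        if len(node.keys) <= 2 * ORDER:
            return None
        mid = len(node.keys) // 2
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        return right.keys[0], right      # copy up: key also stays in the leaf
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    sep, new_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, new_child)
    if len(node.keys) <= 2 * ORDER:
        return None
    mid = len(node.keys) // 2
    up = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, right                     # push up: key leaves this node


def insert(root, key):
    split = _insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])    # root split: height grows by one


leaves = [Node(ks) for ks in ([2, 3, 5, 7], [14, 15], [19, 20, 22],
                              [24, 27, 29], [33, 34, 38, 39])]
root = insert(Node([13, 17, 24, 30], leaves), 8)
print(root.keys)                          # [17]
print([c.keys for c in root.children])    # [[5, 13], [24, 30]]
```

Note that 5 appears both in the index and in its leaf, while 17 appears only in the new root.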
Deleting a Data Entry from a B+ Tree
Examine examples first
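Delete 19* and 20*

Root: 17
Index nodes: [5 13] [24 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

After deleting 19* and 20*, the leaf holds only [22*]. Redistributing with its right sibling gives [22* 24*] [27* 29*].
Have we still forgotten something?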
Deleting 19* and 20* (cont.)
Notice how 27 is copied up. But can we move it up?
Now we want to delete 24.
Underflow again! But can we redistribute this time?

Root: 17
Index nodes: [5 13] [27 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]
Deleting 24*
Observe the two leaf nodes are merged, and 27 is discarded from their parent.
  Index node after the merge: [30]
  Its leaves: [22* 27* 29*] [33* 34* 38* 39*]
Observe pull down of index entry (below).
  New root: 5 13 17 30
  Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 27* 29*] [33* 34* 38* 39*]
Deleting a Data Entry from a B+ Tree: Summary
Start at root, find leaf L where entry belongs.
Remove the entry.
  If L is at least half-full, done!
  If L has only d-1 entries (d is the order, written v earlier):
    Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).
    If re-distribution fails, merge L and sibling.
If merge occurred, must delete entry (pointing to L or sibling) from parent of L.
Merge could propagate to root, decreasing height.
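The leaf-level part of this summary can be sketched for a single parent and its leaves. Assumptions, all labeled here because they are not in the slides: plain Python lists stand in for the index node and its leaves, the parent has at least two leaves, the right sibling is tried before the left one, and propagation above the parent is left out (the function only reports whether the parent now underflows). `delete_from_leaf` is a hypothetical name.

```python
from bisect import bisect_right

ORDER = 2  # minimum number of entries per node (d in the summary)


def delete_from_leaf(parent_keys, leaves, key):
    """Delete key under one parent; return True if the parent now underflows."""
    i = bisect_right(parent_keys, key)
    leaf = leaves[i]
    leaf.remove(key)                        # raises ValueError if key is absent
    if len(leaf) >= ORDER:
        return False
    # Re-distribute: borrow from the right sibling, else from the left one.
    if i + 1 < len(leaves) and len(leaves[i + 1]) > ORDER:
        leaf.append(leaves[i + 1].pop(0))
        parent_keys[i] = leaves[i + 1][0]   # new separator is copied up
        return False
    if i > 0 and len(leaves[i - 1]) > ORDER:
        leaf.insert(0, leaves[i - 1].pop())
        parent_keys[i - 1] = leaf[0]
        return False
    # Re-distribution failed: merge with a sibling and drop their separator.
    j = i if i + 1 < len(leaves) else i - 1
    leaves[j].extend(leaves.pop(j + 1))
    parent_keys.pop(j)
    return len(parent_keys) < ORDER


parent_keys = [24, 30]
leaves = [[19, 20, 22], [24, 27, 29], [33, 34, 38, 39]]
delete_from_leaf(parent_keys, leaves, 19)
delete_from_leaf(parent_keys, leaves, 20)
print(parent_keys, leaves)   # [27, 30] [[22, 24], [27, 29], [33, 34, 38, 39]]
underflow = delete_from_leaf(parent_keys, leaves, 24)
print(parent_keys, leaves, underflow)   # [30] [[22, 27, 29], [33, 34, 38, 39]] True
```

The final `True` corresponds to the case where the merge has to propagate upward, which is what eventually shrinks the tree's height.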
Example of Non-leaf Re-distribution
Tree is shown below during deletion of 24*. (What could be a possible initial tree?)
In contrast to previous example, can re-distribute entry from left child of root to right child.

Root: 22
Index nodes: [5 13 17 20] [30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [17* 18*] [20* 21*] [22* 27* 29*] [33* 34* 38* 39*]
After Re-distribution
Intuitively, entries are re-distributed by "pushing through" the splitting entry in the parent node.
It suffices to re-distribute index entry with key 20; we've re-distributed 17 as well for illustration.

Root: 17
Index nodes: [5 13] [20 22 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [17* 18*] [20* 21*] [22* 27* 29*] [33* 34* 38* 39*]
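"Pushing through" the parent entry can be sketched as a rotation: the separator moves down into the right node, and the left node's largest key moves up to replace it. The function name `redistribute_right` and the use of plain lists (with strings standing in for leaf pointers) are assumptions made for illustration.

```python
def redistribute_right(left_keys, left_children, sep, right_keys, right_children, count=1):
    """Move `count` entries from the left index node to the right one via the parent.

    Returns the new separator key for the parent.
    """
    for _ in range(count):
        right_keys.insert(0, sep)                        # old separator is pulled down
        right_children.insert(0, left_children.pop())    # rightmost child moves across
        sep = left_keys.pop()                            # left's largest key goes up
    return sep


left_keys, left_children = [5, 13, 17, 20], ["a", "b", "c", "d", "e"]
right_keys, right_children = [30], ["f", "g"]
root_key = redistribute_right(left_keys, left_children, 22,
                              right_keys, right_children, count=2)
print(root_key, left_keys, right_keys)   # 17 [5, 13] [20, 22, 30]
```

With `count=1` only key 20 would move, which is the minimum needed; `count=2` reproduces the tree shown here.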
Terminology
Bucket Factor: the number of records which can fit in a leaf node.
Fan-out: the average number of children of an internal node.
A B+ tree index can be used either as a primary index or a secondary index.
  Primary index: determines the way the records are actually stored (such an index can be sparse).
  Secondary index: the records in the file are not grouped in buckets according to keys of secondary indexes (such an index must be dense).
Summary
Tree-structured indexes are ideal for range searches, also good for equality searches.
B+ tree is a dynamic structure.
  Inserts/deletes leave tree height-balanced; high fanout (F) means depth rarely more than 3 or 4.
  Almost always better than maintaining a sorted file.
  Typically, 67% occupancy on average.
  If data entries are data records, splits can change rids!
Most widely used index in database management systems because of its versatility. One of the most optimized components of a DBMS.