Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method...

17
Index tuning-- B+tree

Transcript of Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method...

Page 1: Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.

Index tuning--

B+tree

Page 2: Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.

overview

• Overview of tree-structured index

• Indexed sequential access method (ISAM)

• B+tree

Page 3: Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.

Data Structures

• Most index data structures can be viewed as trees.

• In general, the root of this tree will always be in main memory, while the leaves will be located on disk.– The performance of a data structure depends on the

number of nodes in the average path from the root to the leaf.

– Data structure with high fan-out (maximum number of children of an internal node) are thus preferred.

Page 4: Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.

Tree-Structured Indexing• Tree-structured indexing techniques support both

range searches and equality searches– index file may still be quite large. But we can

apply the idea repeatedly! – ISAM: static structure; B+ tree: dynamic, adjusts

gracefully under inserts and deletes.

Data pages

Page 5: Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.

Indexed sequential access method (ISAM)

• Data entries vs index entries– Both belong to index file– Data entries: <key, records or pointer to records/record list>– Index entries:<key, pointer to index entries or data entries>

• Both in ISAM and B-tree, leaf pages are data entries

Page 6: Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.

Example ISAM Tree

• Each node can hold 2 entries; no need for `next-leaf-page’ pointers. (Why?)

Page 7: Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.

Comments on ISAM• File creation: Leaf (data) pages allocated

sequentially, sorted by search key; then index pages allocated, then space for overflow pages.

• Index entries: <search key value, page id>; they `direct’ search for data entries, which are in leaf pages.

• Search: Start at root; use key comparisons to go to leaf.

• Cost=logFN ; F = # entries/index pg, N = # leaf pgs

• Insert: Find leaf data entry belongs to, and put it there.

• Delete: Find and remove from leaf; if empty overflow page, de-allocate.

• * Static tree structure: inserts/deletes affect only primary leaf pages.

Page 8: Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.

After Inserting 23*, 48*, 41*, 42*

Page 9: Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.

... Then Deleting 42*, 51*, 97** Note

If primary leaf page is empty, just leave it empty so that the number of allocation primary pages does not change!

Page 10: Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.

Features of ISAM

• The primary leaf pages are assumed to be allocated sequentially– Because the number of such pages is known

when the tree is created and does not change subsequently under inserts and deletes

– So next-page-pointers are not needed

Page 11: Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.

Prons and cons of ISAM• Cons:losing sequentiality and balance

– Due to long overflow chains– Leading to poor retrieving performance– One solution

• Keep 20% of each page free when tree was created

• Prons– No need to lock non-leaf index pages since we never

modify them, which is one of important advantages of ISAM over B+tree

– Another is: scans over a large range is more efficient thant B+tree

Page 12: Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.

B+-Tree

• A B+-Tree is a balanced tree whose leaves contain a sequence of key-pointer pairs.

• Dynamic structure that adjust gracefully for delete and insert

Page 13: Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.

B+ Tree: The Most Widely Used Index

– Operation (insertion, deletion) keep it Height-balanced.

– Searching for a record requires a traversal from root to the leaf

• Equality query, Insert/delete at log F N cost (F = fanout, N = # leaf pages);

– Grow and shrink dynamically.• Need `next-leaf-pointer’ to chain up the leaf nodes

– To handle cases such as leaf node merging

– Minimum 50% occupancy (except for root). • Each node contains d <= m <= 2d entries. The

parameter d is called the order of the tree.– Data entries at leaf are sorted.

Page 14: Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.

Example B+ Tree

• Each node can hold 4 entries (order = 2)

2 3

Root

17

24 30

14 16 19 20 22 24 27 29 33 34 38 39

135

75 8

Page 15: Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.

Node structure

P0

K1 P

1K 2 P

2K m

P m

index entry

• Non-leaf nodes

Leaf nodes

P0

K1 P

1K 2 P

2K m

P m Next leaf node

Page 16: Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.

Searching in B+ Tree

• Search begins at root, and key comparisons direct it to a leaf (as in ISAM).

• Search for 5, 15, all data entries >= 24 ...

Based on the search for 15*, we know it is not in the tree!

Root

17 24 30

2 3 5 14 16 19 20 22 24 27 29 33 34 38 39

13

Page 17: Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.

summarize