CS 255: Database System Principles slides: B-trees By:- Arunesh Joshi Id:-006538558.

Post on 22-Dec-2015


CS 255: Database System Principles

slides: B-trees

By:- Arunesh Joshi Id:-006538558

Agenda

• The features and functionalities of the B-Tree as an index structure

• The Structure of B-Trees

• Applications of B-Trees

• Lookup in B-Trees

• Range Queries

• Insertion into B-Trees

• Deletion from a B-Tree

• Efficiency of B-Trees

B-Trees

A B-tree organizes its blocks into a tree. The tree is balanced, meaning that all paths from the root to a leaf have the same length. Typically, there are three layers in a B-tree: the root, an intermediate layer, and leaves, but any number of layers is possible.

Functionalities of B-Trees

• B-Trees automatically maintain as many levels of index as is appropriate for the size of the file being indexed.

• B-Trees manage the space on the blocks they use so that every block is between half used and completely full. No overflow blocks are needed.

Structure of B-Trees

• A B-Tree typically has three layers: the root, an intermediate layer, and the leaves

• In a B-Tree, each block has space for n search-key values and n+1 pointers

[next slide explains the structure of a B-Tree]

B-Tree Example (n=3)

Root: 100

Second level: 30 | 120 150 180

Leaves: 3 5 11 | 30 35 | 100 101 110 | 120 130 | 150 156 179 | 180 200

Sample non-leaf

Keys: 57 81 95

Pointers, from left to right:

• to keys k < 57

• to keys 57 ≤ k < 81

• to keys 81 ≤ k < 95

• to keys k ≥ 95

Sample leaf node:

Keys: 57 81 95 (the node is reached by a pointer from a non-leaf node)

Pointers, from left to right:

• to the record with key 57

• to the record with key 81

• to the record with key 95

• to the next leaf in sequence

In the textbook's notation (n=3)

Leaf: 30 35

Non-leaf: 30

Size of nodes: n+1 pointers, n keys (fixed)

Don't want nodes to be too empty

• Use at least:

Non-leaf: ⌈(n+1)/2⌉ pointers

Leaf: ⌊(n+1)/2⌋ pointers to data

Full node vs. minimum node (n=3)

            Full node      Min. node
Non-leaf    120 150 180    30
Leaf        3 5 11         30 35

A pointer counts even if it is null.

B-tree rules (tree of order n)

(1) All leaves are at the same lowest level (balanced tree)

(2) Pointers in leaves point to records, except for the "sequence pointer"

Number of pointers/keys for B+tree

                      Max ptrs   Max keys   Min ptrs to data   Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉          ⌈(n+1)/2⌉ - 1
Leaf (non-root)       n+1        n          ⌊(n+1)/2⌋          ⌊(n+1)/2⌋
Root                  n+1        n          1                  1
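The occupancy rules above can be checked with a few lines of Python. This is only a sketch: the function name `occupancy_limits` and the tuple layout are illustrative assumptions, not part of the slides, and the root minimums follow the slide's table.

```python
def occupancy_limits(n: int) -> dict:
    """Return (max_ptrs, max_keys, min_ptrs, min_keys) for each node type.

    Hypothetical helper: n is the maximum number of keys per node.
    """
    ceil_half = (n + 2) // 2   # integer form of ceil((n+1)/2)
    floor_half = (n + 1) // 2  # integer form of floor((n+1)/2)
    return {
        "non-leaf": (n + 1, n, ceil_half, ceil_half - 1),
        "leaf": (n + 1, n, floor_half, floor_half),
        "root": (n + 1, n, 1, 1),  # minimums as given in the slide's table
    }


if __name__ == "__main__":
    for node_type, limits in occupancy_limits(3).items():
        print(node_type, limits)
```

For n=3 this gives a minimum of 2 pointers and 1 key for a non-leaf, and 2 pointers to data and 2 keys for a leaf, which matches the "min. node" examples (30 and 30 35).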

Applications of B-trees

1. The search key of the B-tree is the primary key for the data file, and the index is dense. That is, there is one key-pointer pair in a leaf for every record of the data file. The data file may or may not be sorted by primary key.

2. The data file is sorted by its primary key, and the B-tree is a sparse index with one key-pointer pair at a leaf for each block of the data file.

3. The data file is sorted by an attribute that is not a key, and this attribute is the search key for the B-tree. For each key value K that appears in the data file there is one key-pointer pair at a leaf. That pointer goes to the first of the records that have K as their sort-key value.

Lookup in B-Trees

• Suppose we want to find a record with search key 40.

• We start at the root. The root key is 13, and since 40 ≥ 13 we follow the pointer to the right subtree.

• We repeat the same comparison at each lower level until we reach a leaf, then look for 40 among that leaf's keys.

Looking for key "40" (not present)

Root: 13

Second level: 7 | 23 31 43

Leaves: 2 3 5 | 7 11 | 13 17 19 | 23 29 | 31 37 41 | 43 47
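The lookup steps can be sketched in Python as follows. The `Node` class and the `lookup` function are illustrative names, not from the slides, and the tree is a reconstruction of the slide's example (root 13), so treat the exact key layout as an assumption.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list means leaf


def lookup(node: Node, key: int) -> bool:
    """Descend from the root to a leaf, then test membership in the leaf."""
    while node.children:
        # Keys >= keys[i] live to the right of separator i, so count how
        # many separators are <= key to pick the child pointer.
        node = node.children[bisect_right(node.keys, key)]
    return key in node.keys


# Reconstructed example tree with n=3 (an assumption based on the figure).
leaves = [Node([2, 3, 5]), Node([7, 11]), Node([13, 17, 19]),
          Node([23, 29]), Node([31, 37, 41]), Node([43, 47])]
root = Node([13], [Node([7], leaves[:2]), Node([23, 31, 43], leaves[2:])])

if __name__ == "__main__":
    print(lookup(root, 40))  # 40 is not stored in the tree
    print(lookup(root, 41))
```

For key 40 the search goes right at the root (40 ≥ 13), takes the third pointer of the node 23 31 43 (31 ≤ 40 < 43), and ends in the leaf 31 37 41, where 40 is not found.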

Range Queries

• B-trees also support queries that ask for a range of search-key values, for example:

SELECT * FROM R WHERE R.k >= 10 AND R.k <= 25;
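A range query such as the one above is usually answered by locating the leaf where the lower bound would be and then following the leaves' sequence pointers until a key exceeds the upper bound. The sketch below assumes the starting leaf has already been found by an ordinary lookup; `Leaf`, `link_leaves`, and `range_scan` are hypothetical names, and the leaf contents are sample data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    keys: list
    next: Optional["Leaf"] = None  # sequence pointer to the next leaf


def link_leaves(key_lists):
    """Build leaves and chain them left to right via sequence pointers."""
    leaves = [Leaf(list(keys)) for keys in key_lists]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def range_scan(start: Leaf, lo: int, hi: int) -> list:
    """Collect keys k with lo <= k <= hi, walking the leaf chain."""
    result = []
    leaf = start
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return result  # keys are sorted, so we can stop here
            if k >= lo:
                result.append(k)
        leaf = leaf.next
    return result


leaves = link_leaves([[2, 3, 5], [7, 11], [13, 17, 19],
                      [23, 29], [31, 37, 41], [43, 47]])

if __name__ == "__main__":
    # The leaf 7 11 is where a lookup for key 10 would end.
    print(range_scan(leaves[1], 10, 25))
```

The scan touches only the leaves that overlap the range, which is why the sequence pointer in each leaf matters.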

Insert into B-tree

(a) simple case: space available in leaf

(b) leaf overflow

(c) non-leaf overflow

(d) new root

(a) Insert key = 32, n=3

Before: upper node 100; non-leaf 30; leaves 3 5 11 | 30 31

After: the leaf 30 31 has room, so it becomes 30 31 32

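(b) Insert key = 7, n=3

Before: upper node 100; non-leaf 30; leaves 3 5 11 | 30 31

After: the leaf 3 5 11 overflows and splits into 3 5 | 7 11; the key 7 is added to the parent, which becomes 7 30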
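Cases (a) and (b) can be sketched on plain sorted lists. This is a minimal illustration, not a full B-tree: `insert_into_leaf` is a hypothetical helper that returns the leaf (or the two halves after a split) plus the key that would be copied into the parent.

```python
from bisect import insort


def insert_into_leaf(keys, key, n):
    """Insert key into a leaf holding at most n keys.

    Returns (left, right, separator). When no split is needed,
    right and separator are None.
    """
    keys = list(keys)
    insort(keys, key)              # keep the leaf sorted
    if len(keys) <= n:
        return keys, None, None    # case (a): space available
    mid = (len(keys) + 1) // 2     # ceil((n+1)/2) keys stay in the left leaf
    left, right = keys[:mid], keys[mid:]
    # Case (b): the smallest key of the new right leaf is copied to the parent.
    return left, right, right[0]


if __name__ == "__main__":
    print(insert_into_leaf([30, 31], 32, n=3))
    print(insert_into_leaf([3, 5, 11], 7, n=3))
```

With the slide's data, inserting 32 into 30 31 needs no split, while inserting 7 into 3 5 11 yields the leaves 3 5 and 7 11 with 7 as the new key for the parent.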

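(c) Insert key = 160, n=3

Before: root 100; non-leaf 120 150 180; leaves (shown) 150 156 179 | 180 200

After: the leaf 150 156 179 splits into 150 156 | 160 179; the non-leaf 120 150 180 cannot hold the new key 160, so it splits into 120 150 | 180 and 160 moves up into the root, which becomes 100 160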
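A non-leaf split differs from a leaf split in that the middle key moves up to the parent instead of being copied. The sketch below illustrates this with the keys from case (c); `split_non_leaf` is a hypothetical helper, and the child lists stand in for pointers to leaves (the first child's contents are an assumption taken from the earlier example tree).

```python
def split_non_leaf(keys, children):
    """Split an overflowing non-leaf node (n+1 keys, n+2 child pointers).

    Returns (left_keys, left_children, up_key, right_keys, right_children).
    """
    mid = len(keys) // 2
    up_key = keys[mid]                       # moves up, stays in neither half
    left_keys, right_keys = keys[:mid], keys[mid + 1:]
    left_children = children[:mid + 1]
    right_children = children[mid + 1:]
    return left_keys, left_children, up_key, right_keys, right_children


if __name__ == "__main__":
    # Non-leaf 120 150 180 after receiving key 160 from the leaf split.
    keys = [120, 150, 160, 180]
    children = [[100, 101, 110], [120, 130], [150, 156], [160, 179], [180, 200]]
    print(split_non_leaf(keys, children))
```

Here the left node keeps 120 150 with three children, the right node keeps 180 with two children, and 160 is handed to the parent.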

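(d) New root, insert 45, n=3

Before: root 10 20 30; leaves 1 2 3 | 10 12 | 20 25 | 30 32 40

After: the leaf 30 32 40 splits into 30 32 | 40 45; the old root cannot hold the new key 40, so it splits into 10 20 | 40 and the key 30 moves up into a new root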


Deletion from B-tree

(a) Simple case - no example

(b) Coalesce with neighbor (sibling)

(c) Re-distribute keys

(d) Cases (b) or (c) at non-leaf

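(b) Coalesce with sibling: Delete 50, n=4

Before: parent 10 40 100; leaves (shown) 10 20 30 | 40 50

After: the leaf holding 40 is under-full, so 40 moves into the sibling, which becomes 10 20 30 40; the key 40 is removed from the parent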

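(c) Redistribute keys: Delete 50, n=4

Before: parent 10 40 100; leaves (shown) 10 20 30 35 | 40 50

After: the sibling is full, so the leaves cannot be coalesced; 35 moves over from the sibling, giving 10 20 30 | 35 40, and the parent key 40 is replaced by 35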
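The choice between cases (a), (b), and (c) can be sketched for a leaf and its left sibling. The policy below (coalesce when the combined keys fit in one node, otherwise borrow one key) is an assumption inferred from the two slide examples, and `delete_from_leaf` is a hypothetical name.

```python
def delete_from_leaf(left_sibling, leaf, key, n):
    """Delete key from leaf; returns (action, new_left_sibling, new_leaf)."""
    min_keys = (n + 1) // 2                    # floor((n+1)/2) keys per leaf
    leaf = [k for k in leaf if k != key]
    left_sibling = list(left_sibling)
    if len(leaf) >= min_keys:
        return "simple", left_sibling, leaf    # case (a)
    if len(left_sibling) + len(leaf) <= n:
        # Case (b): everything fits in one node, so merge into the sibling.
        return "coalesce", left_sibling + leaf, []
    # Case (c): borrow the sibling's largest key; the parent's separator
    # must then be updated to the first key of the leaf.
    leaf = [left_sibling.pop()] + leaf
    return "redistribute", left_sibling, leaf


if __name__ == "__main__":
    print(delete_from_leaf([10, 20, 30], [40, 50], 50, n=4))
    print(delete_from_leaf([10, 20, 30, 35], [40, 50], 50, n=4))
```

With n=4 the first call merges the leaves into 10 20 30 40, and the second moves 35 across so that the leaves become 10 20 30 and 35 40, matching the two slides.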

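(d) Non-leaf coalesce: Delete 37, n=4

Before: root 25; non-leaves 10 20 | 30 40; leaves 1 3 | 10 14 | 20 22 | 25 26 | 30 37 | 40 45

After: the leaf left with only 30 is coalesced with a sibling leaf; its parent then has too few pointers and is coalesced with the non-leaf 10 20, pulling the key 25 down from the old root; the merged node becomes the new root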

B-tree deletions in practice

• Often, coalescing is not implemented

• Too hard and not worth it!

Why do we take 3 as the typical number of levels of a B-tree?

Suppose our blocks are 4096 bytes. Also let keys be integers of 4 bytes and let pointers be 8 bytes. If there is no header information kept on the blocks, then we want to find the largest integer value of n such that

4n + 8(n + 1) ≤ 4096.

That value is n = 340, so 340 key-pointer pairs fit in one block for our example data. Suppose that the average block has an occupancy midway between the minimum and maximum, i.e., a typical block has 255 pointers. With a root, 255 children, and 255 × 255 = 65,025 leaves, we have among those leaves 255³, or about 16.6 million, pointers to records. That is, files with up to 16.6 million records can be accommodated by a 3-level B-tree.
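The arithmetic above can be reproduced directly. The function names are illustrative, and the typical fan-out of 255 pointers is taken from the slide as a given rather than derived.

```python
def max_keys_per_block(block_bytes=4096, key_bytes=4, ptr_bytes=8):
    """Largest n with key_bytes*n + ptr_bytes*(n + 1) <= block_bytes."""
    return (block_bytes - ptr_bytes) // (key_bytes + ptr_bytes)


def three_level_capacity(fanout):
    """Return (leaves, record pointers) for a 3-level tree with this fan-out."""
    leaves = fanout ** 2
    record_pointers = fanout ** 3
    return leaves, record_pointers


if __name__ == "__main__":
    n = max_keys_per_block()
    leaves, records = three_level_capacity(255)  # 255 is the slide's figure
    print(n, leaves, records)
```

Solving 12n + 8 ≤ 4096 gives n = 340, and a fan-out of 255 gives 65,025 leaves and 16,581,375 record pointers.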

Thank you for bearing with me.