Index management in shallow depth

30
Index management in shallow depth Andrea Giuliano @bit_shark

description

These slides afford in shallow depth the index management question. There are some example on how your choice can change your relation in terms of I/O accesses

Transcript of Index management in shallow depth

Page 1: Index management in shallow depth

Index managementin shallow depth

Andrea Giuliano@bit_shark

Page 2: Index management in shallow depth

Architecture of a DBMS

Andrea Giuliano @bit_shark

Page 3: Index management in shallow depth

Disk managerThe disk manager provides the following commands for a page

• allocate a page • deallocate a page • read a page • write a page

The size of a page is chosen to be the size of a disk block and pages are stored as disk blocks so that reading or writing a page can be done in one disk I/O

Andrea Giuliano @bit_shark

Page 4: Index management in shallow depth

Buffer manageris the software layer responsible for bringing pages from disk to main memory as needed and manages the available main memory by partitioning it into a collection of pages

the buffer pool

Andrea Giuliano @bit_shark

Page 5: Index management in shallow depth

Mysql indexes are stored!in B-trees

Page 6: Index management in shallow depth

B-tree structure

data entry: record stored in an index file data record: record stored in a database file

Andrea Giuliano @bit_shark

Page 7: Index management in shallow depth

Clustered indexdata entries and data records have the same (or close) order

Andrea Giuliano @bit_shark

Page 8: Index management in shallow depth

Unclustered indexdata entries and data records have the different order criteria

Andrea Giuliano @bit_shark

Page 9: Index management in shallow depth

Dense indexAn index is dense if every value of the search key that appears in the data file appears also in at least one data entry of the index

Andrea Giuliano @bit_shark

Page 10: Index management in shallow depth

An index is sparse if every value of the search key that appears in the data entry points to a page of the data record

Sparse index

Andrea Giuliano @bit_shark

Page 11: Index management in shallow depth

Search in a b-tree

Cost: logF n

fan-out

n leaves

Andrea Giuliano @bit_shark

Page 12: Index management in shallow depth

Insert and delete

..too deep

Insert and delete operations must keep the tree balanced towards split, redistribution and coalesce techniques.

Andrea Giuliano @bit_shark

Page 13: Index management in shallow depth

How can I compute I/O accesses?

Andrea Giuliano @bit_shark

Page 14: Index management in shallow depth

ask for code, author, publisher af all books with a given cost

how many page accesses do we need to answer to the query?

Book case

Query:

• 2.000.000 records (tuples) • 200.000 pages • 10 data record in a page • 200 records with the same value of the attribute cost (on

average) • dense non-clustering B+-tree index with search key cost

BOOK

codeauthorcostpublisher

Andrea Giuliano @bit_shark

Page 15: Index management in shallow depth

Let’s build the index structure !• we know that each tuple has 4 field so in each page there are 40 fields • we can infer that 20 data entries fit in one leaf page of the index • so we have a fan-out of 20

Book case• 2.000.000 records (tuples) • 200.000 pages • 10 data record in a page • 200 records with the same value of the attribute cost (on

average) • dense non-clustering B+-tree index with search key cost

BOOK

codeauthorcostpublisher

Andrea Giuliano @bit_shark

Page 16: Index management in shallow depth

We know there is an occupancy factor of 67% we have 13 data entries in the leaves (each of which can contain 20 data entries)

How many leaves are there in the tree? 2.000.000/13 = 153.846 leaves

Book case

fan-out: 20

Andrea Giuliano @bit_shark

Page 17: Index management in shallow depth

There are 153.846 leavesIn order to go to the leaves we need

Book case

log20 (153.846) = 4 I/O page accesses

fan-out: 20

Andrea Giuliano @bit_shark

Page 18: Index management in shallow depth

Book case

Remember, we have on average 200 records with the same value of the attribute cost therefore 200/13 = 15 pages (on average) We need to visit these leaves because the index is dense and for each tuple we have to access the 200 data record in order to obtain the other attributes

fan-out: 20

Andrea Giuliano @bit_shark

Page 19: Index management in shallow depth

The total cost is: 4 + 15 + 200 = 219 I/O accesses

Book case

~ 3 sec

fan-out: 20

Andrea Giuliano @bit_shark

Page 20: Index management in shallow depth

• 2.000.000 records (tuples) • 200.000 pages • 10 data record in a page • 200 records with the same value of the attribute cost (on

average) • sparse clustering B+-tree index with search key cost

ask for code, author, publisher af all books with a given cost

how many page accesses do we need to answer to the query?

BOOK

codeauthorcostpublisher

Book case

Query:

Andrea Giuliano @bit_shark

Page 21: Index management in shallow depth

Let’s build the index structure !• we know that each tuple has 4 field so in each page there are 40 fields • we can infer that 20 data entries fit in one leaf page of the index • so we have a fan-out of 20

Book case• 2.000.000 records (tuples) • 200.000 pages • 10 data record in a page • 200 records with the same value of the attribute cost (on

average) • sparse clustering B+-tree index with search key cost

BOOK

codeauthorcostpublisher

Andrea Giuliano @bit_shark

Page 22: Index management in shallow depth

We know there is an occupancy factor of 67% we have 13 data entries in the leaves (each of which can contain 20 data entries)BUT each data entry points to a data record page (and not to a tuple)

How many data record pages do we have? 2.000.000/10 = 200.000 data record pages

Book case

fan-out: 20

Andrea Giuliano @bit_shark

Page 23: Index management in shallow depth

We know there is an occupancy factor of 67% we have 13 data entries in the leaves (each of which can contain 20 data entries) BUT each data entry points to a data record (and not to a tuple)

How many leaves there are in the tree? 200.000/13 = 15.384 leaves

Book case

fan-out: 20

Andrea Giuliano @bit_shark

Page 24: Index management in shallow depth

There are 15.384 leavesIn order to go to the leaves we need

Book case

log20 (15.384) = 3 I/O pages accessesRemember, we have on average 200 data records with the same value of the attribute cost therefore 200/10 = 20 data record pages to visit

fan-out: 20

Andrea Giuliano @bit_shark

Page 25: Index management in shallow depth

The total cost is: 3 + 20 = 23 I/O accesses

Book case

~ 0.3 sec

fan-out: 20

Andrea Giuliano @bit_shark

Page 26: Index management in shallow depth

And what if the attributes we wantwere part of the search key?

Book case

Andrea Giuliano @bit_shark

Page 27: Index management in shallow depth

In the worst case we have to visit all the 2.000.000 tuples

Book casewithout index

~ 50 min

Andrea Giuliano @bit_shark

Page 28: Index management in shallow depth

Ο λογος δηλοι οτι

Think before doing

Page 29: Index management in shallow depth

?Thanks!

Andrea Giuliano @bit_shark

Page 30: Index management in shallow depth

References:

https://www.flickr.com/photos/james_wheeler/9340597900/sizes/o/in/photolist-fep1ko-bQByHk-duQ4Qr-82aKA9-82aL6y-8Tn6uc-iPzADZ-99etoQ-cZy6e9-jyqdnW-bxHjLf-8gP59X-cZDq3h-cZDq9d-cZDq8N-cZDq7d-cZDqwy-cZDqym-cZDqCf-cZDqAo-cZDqsS-cZDqnQ-cZDqey-cZDqkA-cZDqkJ-

cZDqLh-7Dg8pp-a7f1QC-a7c8rK-7Dg7n6-gCbBVr-9FZ4J1-e6XCpX-aZnsGv-ecTv5D-atFACM-gjXozL-9LBjtC-knoEf8-8LGGqw-a8Hw3M-gvL3bp-a7gmG6-aju6p2-

brQ76S-7Ckbm1-85XaXe-8JBcwN-9oYU3p-a3VsvR-atFAup/ http://www.woking.gov.uk/images/instances/00004A290FD4.C0A801BA.000079A7.0015.jpg

http://assets.20bits.com/20080513/b-tree.png, http://dblab.cs.toronto.edu/courses/443/2014/basic-index/dense-index.png http://dblab.cs.toronto.edu/courses/443/2014/basic-index/sparse-index.png

http://www.geeky-gadgets.com/wp-content/uploads/2008/10/insert-delete_cufflinks.jpg, http://www.geeky-gadgets.com/wp-content/uploads/2008/10/insert-delete_cufflinks.jpg http://webhostinggeeks.com/blog/wp-content/uploads/2012/07/611157_small.jpg

https://farm7.staticflickr.com/6237/6230474283_50d1f0f4ac_b.jpg, https://www.flickr.com/photos/javiercosio/6230474283/sizes/l/in/photolist-

auyNWp-9GRjnM-9GRjmZ-9GRjFz-9GRjvV-9GUcPq-9GRjz8-9GRjm4-9GUcTb-9GRjCt-9GUcQC-9GUcYm-9GRjZD-9GRk1Z-9GUcMU-9GUcGh-9GRjsg-9GRjYZ-9GRjTF-9GRjGe-9GRjNe-9GRjBz-9GUcNs-9GU

d25-9GUcKS-9GRjPn-9GRjRg-9GUcBh-9GUcVf-9GUcxj-9GUcuu-brLe7G-e8s4Cw-fyi4Rj-83LyYW-83HuFg-83LyLm-83LzUY-83Htrv-83Hv2H-83LBBb-83LAg9-83LBhQ-83Hw8t-83HtKD-83H

sYk-afT6uk-cwVhL1-ceVgGC-8tFezr-8SeW9d/