Indexing. Physical Disk Structure Disk Example Four platters providing eight surfaces 2 13 = 8192...

Indexing

Physical Disk Structure

Disk Example

• Four platters providing eight surfaces
• 2^13 = 8192 tracks per surface
• 2^8 = 256 sectors per track
• 2^9 = 512 bytes per sector

• Sector is physical unit while block is logical unit dependent on the DBMS

• Typically a block is at least as large as a sector and consists of one or more whole sectors
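The total capacity of the example disk follows directly from the four figures above. A minimal sketch of the arithmetic (the variable names are illustrative, not from the slides):

```python
# Disk geometry taken from the example above.
surfaces = 8
tracks_per_surface = 2 ** 13
sectors_per_track = 2 ** 8
bytes_per_sector = 2 ** 9

# Capacity is the product of the four quantities: 2^3 * 2^13 * 2^8 * 2^9 = 2^33 bytes.
capacity_bytes = surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector
capacity_gib = capacity_bytes / 2 ** 30

print(capacity_bytes, capacity_gib)
```

Under these figures the disk holds 2^33 bytes, i.e. 8 GiB.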

Disk Access Characteristics

• To read a block
– The head has to move to the track containing the block (seek time)
– The block has to rotate under the head (rotational latency)
– Transfer time (negligible)

I/O model of computation

• If a block needs to be moved between disk and main memory, the time taken to perform the read or write is much larger than the time likely to be used manipulating the data in main memory. Thus the number of blocks accessed is a good approximation of the time needed by the algorithm and should be minimized.

• Minimize seek time + rotational latency

Sorting data in Secondary Storage

• Suppose a relation consists of 10,000,000 records

• Want to sort this relation on a “key”
• 100 bytes per record, approx. 1 gigabyte in total
• Assume 50 MB of main memory available
• Disk block is 4096 bytes
• Relation takes 250,000 blocks
• 12,800 blocks can fit in memory
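The block counts above can be reproduced with a few lines of arithmetic. This sketch assumes that records do not span block boundaries (so 40 whole records fit in a block) and that 50 MB means 50 × 2^20 bytes; both are assumptions, not statements from the slides.

```python
record_count = 10_000_000
record_bytes = 100
block_bytes = 4096
memory_bytes = 50 * 2 ** 20  # assumption: MB taken as 2^20 bytes

# Assumption: a record never straddles two blocks, so only whole records count.
records_per_block = block_bytes // record_bytes

# Ceiling division: blocks needed to hold every record.
relation_blocks = -(-record_count // records_per_block)

# Number of blocks that fit in main memory at once.
memory_blocks = memory_bytes // block_bytes

print(records_per_block, relation_blocks, memory_blocks)
```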

Sorting…cont

If data fits in main memory, the fastest algorithms for sorting are variants of Quicksort.

Preferred method is to minimize the number of times a block is brought into main memory

Merge-Sort example

Step    List 1     List 2     Output
Start   1,3,4,9    2,5,7,8    none
1)      3,4,9      2,5,7,8    1
2)      3,4,9      5,7,8      1,2
3)      4,9        5,7,8      1,2,3
4)      9          5,7,8      1,2,3,4
5)      9          7,8        1,2,3,4,5
6)      9          8          1,2,3,4,5,7
7)      9          none       1,2,3,4,5,7,8
8)      none       none       1,2,3,4,5,7,8,9
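The merge traced above can be sketched in a few lines. The function name `merge_trace` and the tuple-based trace are illustrative choices; the sketch simply moves the smaller head element to the output at each step and records the state after every move.

```python
def merge_trace(list1, list2):
    """Merge two sorted lists, recording (list1, list2, output) after each step."""
    a, b, out = list(list1), list(list2), []
    steps = [(tuple(a), tuple(b), tuple(out))]  # the "Start" row
    while a or b:
        # Take from list 1 if list 2 is empty or list 1 has the smaller head.
        if not b or (a and a[0] <= b[0]):
            out.append(a.pop(0))
        else:
            out.append(b.pop(0))
        steps.append((tuple(a), tuple(b), tuple(out)))
    return out, steps


merged, steps = merge_trace([1, 3, 4, 9], [2, 5, 7, 8])
for number, (a, b, out) in enumerate(steps):
    print(number, a, b, out)
```

Each printed row corresponds to one row of the example, with step 0 being the start state.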

Two-Phase Multiway Merge Sort

• Phase 1: Sort main-memory-sized pieces of the data and store them as sorted lists.
– Fill all memory with blocks from the original list
– Time: each of the 250,000 blocks is read once and written once (500,000 I/Os) at 15 milliseconds per I/O: 7500 seconds, or 125 min

• Phase 2: Merge all the sorted lists into a single sorted list.
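The Phase 1 figures can be checked with a short calculation, together with a toy version of the run-creation step. This is a sketch under the numbers given earlier (250,000 blocks in the relation, 12,800 blocks of memory); `make_sorted_runs` is a hypothetical helper that sorts in-memory lists rather than real disk blocks.

```python
relation_blocks = 250_000
memory_blocks = 12_800
ms_per_block_io = 15

# Number of sorted lists produced by Phase 1 (ceiling division).
sorted_lists = -(-relation_blocks // memory_blocks)

# Every block is read once and written once.
phase1_ios = 2 * relation_blocks
phase1_seconds = phase1_ios * ms_per_block_io / 1000
phase1_minutes = phase1_seconds / 60


def make_sorted_runs(records, memory_capacity):
    """Toy Phase 1: sort memory-sized pieces and return them as sorted lists."""
    return [
        sorted(records[i:i + memory_capacity])
        for i in range(0, len(records), memory_capacity)
    ]


print(sorted_lists, phase1_seconds, phase1_minutes)
print(make_sorted_runs([5, 1, 4, 2, 3], 2))
```

With these numbers Phase 1 leaves 20 sorted lists on disk for Phase 2 to merge.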

Phase 2

• If we use the main-memory (two-way) merge then we would have to read or write the data 2·log2(n) times, where n is the number of sorted lists: one read and one write in each of log2(n) merge passes.

• The common strategy is
– Bring the first block of each sorted list into memory and have one output buffer
– Find the smallest key among the remaining keys in all lists
– Move the smallest element to the first available position in the output buffer
– If the output block is full, write the buffer to disk and reinitialize it to empty
– If the block from which the smallest element was taken is exhausted, bring in the next block from the same sorted list into its buffer

• Cost of phase 2 is again 125 min; total cost is 250 min
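The Phase 2 strategy can be sketched with in-memory lists standing in for disk blocks. In this illustration each sorted list is a list of non-empty blocks, a block is a Python list of keys, and `multiway_merge` is a hypothetical name; real systems would read and write actual disk blocks instead.

```python
def multiway_merge(sorted_lists, block_size):
    """Merge sorted lists block by block, returning the output as a list of blocks."""
    # One input buffer per sorted list, initially holding its first block.
    buffers = [list(run[0]) if run else [] for run in sorted_lists]
    next_block = [1] * len(sorted_lists)  # index of the next block to fetch per list
    output_buffer, disk_output = [], []

    while True:
        # Refill any exhausted input buffer from the same sorted list.
        for i, run in enumerate(sorted_lists):
            while not buffers[i] and next_block[i] < len(run):
                buffers[i] = list(run[next_block[i]])
                next_block[i] += 1

        candidates = [i for i, buf in enumerate(buffers) if buf]
        if not candidates:
            break  # every list is fully consumed

        # Find the smallest remaining key and move it to the output buffer.
        smallest = min(candidates, key=lambda j: buffers[j][0])
        output_buffer.append(buffers[smallest].pop(0))

        # A full output block is "written to disk" and the buffer is reset.
        if len(output_buffer) == block_size:
            disk_output.append(output_buffer)
            output_buffer = []

    if output_buffer:
        disk_output.append(output_buffer)  # flush the final partial block
    return disk_output


runs = [[[1, 3], [4, 9]], [[2, 5], [7, 8]], [[0, 6]]]
print(multiway_merge(runs, block_size=2))
```

Each input block is read once and each output block written once, which is why Phase 2 costs the same 125 minutes as Phase 1 under the slides' assumptions.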

Motivation

• SQL is declarative

• How should the following queries be processed:

SELECT * FROM R

SELECT * FROM R WHERE R.A = '10'

Index Function

[Figure: a search value is given to the index, which locates the blocks holding records and returns the matching records.]

Book Analogy

• Just remember the book index

• Index is a set of pages (a separate file) with pointers (page numbers) to the data page which contains the value

• Also note difference between “search key” and “primary key”

Types of Indexes

• Simple indexes on sorted files

• Secondary indexes on unsorted files

• B-trees on any type of file

Sequential File

• A file sorted on the attribute(s) of the index.
• Very useful when the search attribute is the primary key.
• Build a dense index on the file.
• Called dense because every key from the data file is represented in the index.
• Note the index only contains the key and pointer of the data file and thus is usually much smaller than the data file.

Example

• Suppose a relation has 1,000,000 records
• A block is of size 4096 bytes
• 10 records fit in one block
• Thus size of data is > 400 MB
• An index will have a 12 byte representation for each record
• Assume 100 index entries fit in a block (with 12-byte entries, 4096/12 = 341 is the upper bound)
• Index will fit in 10,000 blocks (40 MB)
• log2(10,000) ≈ 13.3 (thus about 14 I/Os for a binary-search lookup)
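The dense-index sizes above follow from simple division. The sketch below takes the slides' figure of 100 index entries per block as given and assumes the lookup is a binary search over the index blocks; the variable names are illustrative.

```python
import math

records = 1_000_000
block_bytes = 4096
records_per_block = 10
index_entries_per_block = 100  # figure used in the example above

data_blocks = records // records_per_block
data_mb = data_blocks * block_bytes / 10 ** 6

# A dense index has one entry per record.
index_blocks = records // index_entries_per_block
index_mb = index_blocks * block_bytes / 10 ** 6

# Binary search over the index blocks: round log2 up to whole I/Os.
lookup_ios = math.ceil(math.log2(index_blocks))

print(data_blocks, data_mb, index_blocks, index_mb, lookup_ios)
```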

Sparse Index

• Instead of one index record per data record, use one index record per block of data records.
• This is called a sparse index.
• Suppose the query is “Is there a record with key value K?”
– With a dense index, just check the index
– With a sparse index, a data block has to be retrieved
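The difference between the two lookups can be illustrated with a toy sorted file. In this sketch the data blocks, the key values, and the function names are all made up for illustration; the sparse index stores only the first key of each block, and each function reports how many data blocks it had to read.

```python
from bisect import bisect_right

# A sorted data file: three blocks of three keys each (toy data).
data_blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

# Dense index: every key appears. Sparse index: first key of each block only.
dense_index = {key for block in data_blocks for key in block}
sparse_index = [block[0] for block in data_blocks]


def dense_lookup(key):
    """Existence check answered from the index alone: no data block is read."""
    return key in dense_index, 0


def sparse_lookup(key):
    """Find the only block that could hold the key, then read that block."""
    position = bisect_right(sparse_index, key) - 1
    if position < 0:
        return False, 0  # key is smaller than every key in the file
    block = data_blocks[position]  # one data-block read
    return key in block, 1


print(dense_lookup(50), sparse_lookup(50))
print(dense_lookup(55), sparse_lookup(55))
```

The sparse index is far smaller (one entry per block instead of one per record), at the price of the extra data-block read for existence queries.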

Secondary Indexes

• When you go to a library, books are sorted by Call Number.

• The call-number ranges on the shelves are like a sparse index.

• Now what if you want to search by “last name” and not call number?

• You build a secondary structure on the books.
• A secondary structure does not determine or influence the placement of data records.
• You can have several secondary structures on one file.

Example

• SELECT books.title
FROM books
WHERE books.lastname = 'Codd'

Create the secondary index with the SQL statement

CREATE INDEX LNIndex ON Books (lastname)

Question?

• Can a secondary index be sparse?

B-Trees

• B-Trees are the most commonly used indexing structure in commercial systems.

• Several variants are available, but the most popular is called the B+ tree.

• Roughly speaking
– Ordered array (search: O(log n), update: O(n))
– Linked list (search: O(n), update: O(1))
– Tree (search: O(log n), update: O(log n))

Structure of B-Tree

• Organizes its blocks into a tree
• Tree is balanced: all paths from leaf to root have the same length
• There are three types of nodes
– Root
– Interior
– Leaf
• Associated with each tree is a layout parameter n (n search keys, n+1 pointers)

Example

• Suppose block size is 4096 bytes. Each key is 4 bytes and pointer is 8 bytes.

• Want 4*n + 8*(n+1) <= 4096; n = 340

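[Figure: a leaf node and an interior node]

Leaf node with keys 57, 81, 95
– Pointer to record with key 57
– Pointer to record with key 81
– Pointer to record with key 95
– Pointer to next leaf in sequence

Interior node with keys 20, 31, 52
– Pointer to keys K < 20
– Pointer to keys 20 <= K < 31
– Pointer to keys 31 <= K < 52
– Pointer to keys K >= 52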
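The value n = 340 comes from solving 4n + 8(n+1) <= 4096 for the largest integer n. A minimal sketch of that calculation, assuming (as the inequality does) that the block holds nothing but keys and pointers:

```python
block_bytes = 4096
key_bytes = 4
pointer_bytes = 8

# key_bytes*n + pointer_bytes*(n+1) <= block_bytes
# => n <= (block_bytes - pointer_bytes) / (key_bytes + pointer_bytes)
n = (block_bytes - pointer_bytes) // (key_bytes + pointer_bytes)

used_bytes = key_bytes * n + pointer_bytes * (n + 1)
print(n, used_bytes)
```

With n = 340 the node uses 4088 of the 4096 bytes; one more key-pointer pair would need 4100 bytes and no longer fit.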

Example

Root: 13
Interior nodes: 7 | 23 31 43
Leaves: 2 3 5 | 7 11 | 13 17 19 | 23 29 | 31 37 41 | 43 47
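A lookup in the example tree walks from the root to a leaf, choosing at each interior node the child whose key range contains the search key. The sketch below encodes the tree by hand; the grouping of leaf keys into nodes is inferred from the interior keys and is an assumption, as are the tuple representation and the name `search`.

```python
from bisect import bisect_right

# Interior node: (keys, children). Leaf node: plain list of keys.
# Leaf grouping is inferred from the separator keys (an assumption).
tree = ([13], [
    ([7], [[2, 3, 5], [7, 11]]),
    ([23, 31, 43], [[13, 17, 19], [23, 29], [31, 37, 41], [43, 47]]),
])


def search(node, key):
    """Return (found, nodes_visited) for a root-to-leaf lookup."""
    nodes_visited = 1
    while isinstance(node, tuple):
        keys, children = node
        # Child i covers keys[i-1] <= K < keys[i], so bisect_right picks it.
        node = children[bisect_right(keys, key)]
        nodes_visited += 1
    return key in node, nodes_visited


print(search(tree, 11))
print(search(tree, 40))
```

Because the tree is balanced, every lookup visits the same number of nodes (three here), whether or not the key is present.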

Root: 17
Interior nodes: 7 | 37 43 (a "-" in the figure marks an empty key position)
Leaf keys, left to right: 2 3 5 7 13 13 17 23 23 23 23 37 41 43 47

Efficiency of B-Trees

• Earlier we saw up to 340 key-pointer pairs.

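• Assume the average block has an occupancy midway between half full (about 170) and full (340), i.e., about 255 pointers

• 1 root block, 255 child nodes and 255 × 255 = 65,025 leaf nodes.

• In the leaves we will have 255^3 ≈ 16.6 million pointers to records

• Thus up to 4 I/Os to access any record (three index levels plus the data block)!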
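The capacity estimate can be verified with a short calculation. This sketch assumes a three-level tree in which every node has about 255 pointers, as in the bullets above; the variable names are illustrative.

```python
pointers_per_block = 255  # assumed average occupancy
index_levels = 3          # root, interior nodes, leaves

interior_nodes = pointers_per_block
leaf_nodes = pointers_per_block ** 2
record_pointers = pointers_per_block ** 3

# One I/O per index level, plus one to fetch the data block itself.
ios_per_lookup = index_levels + 1

print(interior_nodes, leaf_nodes, record_pointers, ios_per_lookup)
```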

B-Tree Animation

• Go to http://slady.cz/java/bt/ to see B-Tree animation.