BTrees & Bitmap Indexes 14.2, 14.7. DATABASE SYSTEMS – The Complete Book. Presented By: Deepti Kundu, Maciej Kicinski. Under the supervision of: Dr. T.Y. Lin
Date posted: 19-Dec-2015

Page 1:

BTrees & Bitmap Indexes 14.2, 14.7

DATABASE SYSTEMS – The Complete Book

Presented By: Deepti Kundu, Maciej Kicinski

Under the supervision of: Dr. T.Y. Lin

Page 2:

Structure

• A balanced tree, meaning that all paths from the root to a leaf have the same length.

• There is a parameter n associated with each Btree block. Each block has space for n search keys and n+1 pointers.

• The root may hold as little as 1 search key, but all other blocks must be at least half full.
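The block layout above can be sketched as a small data structure. This is only an illustration: the class name `Node`, its fields, and the helper `min_pointers` are hypothetical, and the occupancy bounds assume the usual textbook reading of "at least half full" (interior nodes use at least ceil((n+1)/2) pointers, leaves at least floor((n+1)/2)).

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Node:
    """One Btree block with room for n keys and n + 1 pointers (hypothetical layout)."""
    n: int                      # block parameter from the slide
    is_leaf: bool
    keys: List[Any] = field(default_factory=list)      # at most n search keys
    pointers: List[Any] = field(default_factory=list)  # at most n + 1 pointers

    def is_full(self) -> bool:
        """True when no further search key fits in this block."""
        return len(self.keys) >= self.n


def min_pointers(n: int, is_leaf: bool) -> int:
    """Smallest number of used pointers in a non-root block (assumed textbook rule)."""
    if is_leaf:
        return (n + 1) // 2      # floor((n + 1) / 2) pointers to records
    return (n + 2) // 2          # ceil((n + 1) / 2) pointers to children
```

For n = 4, for example, an interior block would need at least 3 used pointers and a leaf at least 2.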

Page 3:

Structure

● A typical node has n search keys and n+1 pointers.

● A typical interior node has pointers that point to nodes on the next level down (eventually leaves), not to values.

● A typical leaf has pointers that point to records.

Page 4:

Application

• The search key of the Btree is the primary key for the data file.

• The data file is sorted by its primary key.

• The data file is sorted by an attribute that is not a key, and this attribute is the search key for the Btree.

Page 5:

Lookup

If at an interior node, choose the correct pointer to follow. This is done by comparing the node's keys with the search value.

Page 6:

Lookup

If at a leaf node, find the key that matches the search value and follow its pointer, which leads to the data.
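The two lookup steps can be sketched in a few lines. This is a minimal illustration, not the book's code: nodes are plain dictionaries standing in for disk blocks, and the tree contents, the field names, and the convention that keys[i] is the smallest key reachable through children[i + 1] are all assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical in-memory stand-in for disk blocks: each node is a dict.
# Interior node: keys[i] is the smallest key reachable through children[i + 1].
# Leaf node: keys[i] is paired with records[i].
leaf1 = {"leaf": True, "keys": [2, 3, 5], "records": ["r2", "r3", "r5"]}
leaf2 = {"leaf": True, "keys": [7, 11], "records": ["r7", "r11"]}
leaf3 = {"leaf": True, "keys": [13, 17, 19], "records": ["r13", "r17", "r19"]}
root = {"leaf": False, "keys": [7, 13], "children": [leaf1, leaf2, leaf3]}


def lookup(node, search_value):
    """Return the record for search_value, or None if the key is absent."""
    while not node["leaf"]:
        # Interior node: compare the search value with the keys to pick a child.
        i = bisect_right(node["keys"], search_value)
        node = node["children"][i]
    # Leaf node: find the matching key and follow its pointer.
    for key, record in zip(node["keys"], node["records"]):
        if key == search_value:
            return record
    return None


print(lookup(root, 11))  # r11
print(lookup(root, 6))   # None
```

Each pass through the loop corresponds to reading one block, so the number of reads equals the height of the tree.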

Page 7:

Insertion

• When inserting, find the correct leaf node in which to put the key and the pointer to the data.

• If the node is full, create a new node and split the keys between the two.

• Recursively move up: if the parent has no room for the pointer to the new node because it is full, split the parent as well.

• This ends with creating a new root node if the current root was full.
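The leaf-level split can be sketched as follows. This is a simplified illustration that tracks keys only (record pointers and the recursive step into the parent are left out), and the function name `insert_into_leaf` and the choice to leave the larger half in the left node are assumptions.

```python
from bisect import insort


def insert_into_leaf(keys, new_key, n):
    """Insert new_key into a sorted leaf key list that holds at most n keys.

    Returns (left_keys, right_keys, separator). right_keys and separator are
    None when no split was needed. The separator is the smallest key of the
    new right node, which the parent would store next to its new pointer.
    """
    keys = list(keys)
    insort(keys, new_key)            # keep the leaf sorted
    if len(keys) <= n:
        return keys, None, None      # the key fits, no split
    mid = (len(keys) + 1) // 2       # left node keeps the larger half
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


print(insert_into_leaf([2, 3], 5, n=3))     # ([2, 3, 5], None, None)
print(insert_into_leaf([2, 3, 5], 4, n=3))  # ([2, 3], [4, 5], 4)
```

When a split happens, the separator and a pointer to the new node must be inserted into the parent, which may in turn overflow and split in the same way.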

Page 8:

Deletion

Perform a lookup to find the key and pointer to delete, and remove them from the leaf.

If the node is no longer at least half full, either join it with an adjacent node and recursively delete the corresponding entry one level up, or, if the adjacent node has more than the minimum number of keys, move a key over from it and recursively adjust the keys one level up.

Page 9:

Efficiency

Btrees allow lookup, insertion, and deletion of records using very few disk I/Os.

Each level of a Btree requires one block read. The pointer found in that block leads to the next read, and finally to the leaf.

Page 10:

Efficiency

Three levels are sufficient for most Btrees. With 255 pointers per block, three levels can reach 255^3, or about 16.6 million, records.

You can reduce disk I/Os even further by keeping a level of the Btree in main memory. Keeping the root block with its 255 pointers in memory reduces the reads to 2, and it is even possible to keep the 255 second-level blocks in memory as well, which reduces the reads to 1.
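The arithmetic behind these figures is easy to check. In this sketch the helper `index_reads` is a hypothetical name, and the model simply assumes one block read per level that is not held in memory.

```python
pointers_per_block = 255   # figure used on the slide
levels = 3

reachable = pointers_per_block ** levels
print(reachable)           # 16581375, i.e. about 16.6 million


def index_reads(levels, cached_levels):
    """Block reads needed for one lookup when the top levels stay in memory."""
    return max(levels - cached_levels, 0)


print(index_reads(3, 0))   # 3: nothing cached
print(index_reads(3, 1))   # 2: root kept in memory
print(index_reads(3, 2))   # 1: root and second level kept in memory
```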

Page 11:

Bitmap Indexes Definition

A bitmap index for a field F is a collection of bit-vectors of length n, one for each possible value that may appear in that field F, where n is the number of records.[1]

Page 12:

What does that mean?

• Assume relation R with

– 2 attributes, A and B.

– Attribute A is of type Integer and B is of type String.

– 6 records, numbered 1 through 6 as shown.

Record  A   B
1       30  foo
2       30  bar
3       40  baz
4       50  foo
5       40  bar
6       30  baz

Page 13:

Example Continued…

• A bitmap for attribute B is:

Value  Vector
foo    100100
bar    010010
baz    001001
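Building these bit-vectors from a column can be sketched as below. The function name `build_bitmap_index` and the use of strings of "0" and "1" characters for bit-vectors are choices made for this illustration only.

```python
def build_bitmap_index(values):
    """Map each distinct value to a bit-vector string, one bit per record."""
    index = {}
    n = len(values)
    for position, value in enumerate(values):
        # First time we see a value, start it with an all-zero vector.
        bits = index.setdefault(value, ["0"] * n)
        bits[position] = "1"
    return {value: "".join(bits) for value, bits in index.items()}


# Column B of the example relation, records 1 through 6.
column_b = ["foo", "bar", "baz", "foo", "bar", "baz"]
index_b = build_bitmap_index(column_b)
print(index_b)
# {'foo': '100100', 'bar': '010010', 'baz': '001001'}
```

Running the same function on column A (30, 30, 40, 50, 40, 30) would give 110001 for 30, 001010 for 40, and 000100 for 50.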

Page 14:

Where do we reach?

• A bitmap index is a special kind of database index that uses bitmaps.[2]

• Bitmap indexes have traditionally been considered to work well for data such as gender, which has a small number of distinct values, e.g., male and female, but many occurrences of those values.[2]

Page 15:

A little more…

• A bitmap index for attribute A of relation R is:

– A collection of bit-vectors.

– The number of bit-vectors = the number of distinct values of A in R.

– The length of each bit-vector = the cardinality of R.

– The bit-vector for value v has 1 in position i if the ith record has v in attribute A, and it has 0 there if not.[3]

• Records are allocated permanent numbers.[3]

• There is a mapping between record numbers and record addresses.[3]

Page 16:

Motivation for Bitmap Indexes

• Very efficient when used for partial match queries.[3]

• They offer the advantage of buckets [2]

– We can find tuples with several specified attributes without first retrieving all the records that match in each of the attributes.

• They can also help answer range queries [3]

Page 17:

Another Example

Multidimensional array of multiple types: {(5,d),(79,t),(4,d),(79,d),(5,t),(6,a)}

5 = 100010
79 = 010100
4 = 001000
6 = 000001
d = 101100
t = 010010
a = 000001

Page 18:

Example Continued…

{(5,d),(79,t),(4,d),(79,d),(5,t),(6,a)}

Searching for items is easy: just AND the bit-vectors together.

To search for (5,d)

5 = 100010

d = 101100

100010 AND 101100 = 100000

The location of the record has been traced!
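The AND step can be sketched with Python integers. The helper names `to_int` and `matching_records` are hypothetical, and record numbers are taken to be 1-based, counted from the left of the bit-vector as in the example.

```python
def to_int(bits):
    """Parse a bit-vector string such as '100010' into an integer."""
    return int(bits, 2)


def matching_records(bits_a, bits_b):
    """AND two equal-length bit-vectors; return the 1-based record numbers set in both."""
    n = len(bits_a)
    combined = to_int(bits_a) & to_int(bits_b)
    result = format(combined, "0{}b".format(n))   # pad back to n bits
    return [i + 1 for i, bit in enumerate(result) if bit == "1"]


# Search for (5, d): AND the vector for 5 with the vector for d.
print(matching_records("100010", "101100"))  # [1]
```

The same call with the vectors for 79 and d (010100 and 101100) returns [4], the position of (79,d).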

Page 19:

Compressed Bitmaps

• Assume:

– The number of records in R is n.

– Attribute A has m distinct values in R.

• The size of a bitmap index on attribute A is m*n bits.

• If m is large, then the fraction of 1’s in the index will be around 1/m.

– Opportunity to encode.

• A common encoding approach is called run-length encoding.[1]
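A quick calculation shows why compression pays off. The record and value counts below are made-up figures for illustration, the helper name `bitmap_index_stats` is hypothetical, and the sketch assumes every record has exactly one value for A.

```python
def bitmap_index_stats(n, m):
    """Return (total_bits, fraction_of_ones) for an uncompressed bitmap index."""
    total_bits = m * n     # m bit-vectors of length n
    ones = n               # each record sets exactly one bit, in one vector
    return total_bits, ones / total_bits


total_bits, density = bitmap_index_stats(n=1_000_000, m=1_000)
print(total_bits)   # 1000000000
print(density)      # 0.001, i.e. 1/m
```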

Page 20:

Run-length encoding

• Represents runs

– A run is a sequence of i 0’s followed by a 1; it is represented by some suitable binary encoding of the integer i.

• A run of i 0’s followed by a 1 is encoded by:

– First computing how many bits are needed to represent i in binary; call this k.

– Then representing the run by k-1 1’s and a single 0, followed by the k bits that represent i in binary.

– The encoding for i = 1 is 01 (k = 1).

– The encoding for i = 0 is 00 (k = 1).

• We concatenate the codes for each run together, and the resulting sequence of bits is the encoding of the entire bit-vector.

Page 21:

Understanding with an Example

• Let us decode the sequence 11101101001011.

• Starting at the beginning (leftmost bit):

– First run: the first 0 is at position 4, so k = 4. The next 4 bits are 1101, so the first integer is i = 13.

– Second run: the remaining bits are 001011. The first 0 is at position 1, so k = 1. The next bit is 0, so i = 0.

– Last run: the remaining bits are 1011. The first 0 is at position 2, so k = 2. The next 2 bits are 11, so i = 3.

• The run lengths are thus 13, 0, 3, hence our bit-vector is: 0000000000000110001
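The decoding walk can be sketched as a loop. The function names `decode_runs` and `runs_to_bits` are hypothetical, and the code assumes a well-formed input (it does not guard against a truncated code).

```python
def decode_runs(code):
    """Decode a run-length-encoded string into the list of run lengths."""
    runs = []
    pos = 0
    while pos < len(code):
        # Count the leading 1's up to the first 0: there are k - 1 of them.
        k = 1
        while code[pos] == "1":
            k += 1
            pos += 1
        pos += 1                                  # skip the terminating 0
        runs.append(int(code[pos:pos + k], 2))    # next k bits are i in binary
        pos += k
    return runs


def runs_to_bits(runs):
    """Expand run lengths back into a bit-vector string."""
    return "".join("0" * i + "1" for i in runs)


runs = decode_runs("11101101001011")
print(runs)                 # [13, 0, 3]
print(runs_to_bits(runs))   # 0000000000000110001
```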

Page 22:

Managing Bitmap Indexes

1) How do you find the bit-vector for a specific value efficiently?

2) After selecting results that match, how do you retrieve the results efficiently?

3) When data is changed, how do you alter the bitmap index?

Page 23:

1) Finding bit vectors

– Think of each bit-vector as a record whose key is the attribute value it corresponds to.[1]

– Any secondary index technique will be efficient in retrieving a bit-vector given its value.[1]

– Create a secondary index with the attribute value as the search key [3]

• Btree

• Hash

Page 24:

2) Finding Records

• Create a secondary index with the record number as the search key [3]

• Or in other words,

– Once you learn that you need record k, you can find it through a secondary index on the data file that uses the record number (position k) as the search key.[1]

Page 25:

3) Handling Modifications

Two things to remember:

Record numbers must remain fixed once assigned

Changes to data file require changes to bitmap index

Page 26:

Deletion

– Tombstone replaces deleted record

– Corresponding bit is set to 0

Page 27:

Insertion

– Record is assigned the next record number.

– A bit of value 0 or 1 is appended to each bit-vector.

– If the new record contains a new value of the attribute, add one bit-vector.

Page 28:

Modification

– Change the bit corresponding to the old value of the modified record to 0.

– Change the bit corresponding to the new value of the modified record to 1.

– If the new value is a new value of A, then insert a new bit-vector.
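The three maintenance operations can be sketched together in one toy class. The class name `BitmapIndex`, its methods, and the list-of-ints representation are hypothetical. The data file and its tombstones are not modeled, and modifying an already deleted record is not guarded against.

```python
class BitmapIndex:
    """Toy bitmap index on one attribute; bit-vectors are lists of 0/1 ints."""

    def __init__(self):
        self.vectors = {}     # attribute value -> bit-vector
        self.num_records = 0  # record numbers are 1-based and never reused

    def insert(self, value):
        """Append a record; return its permanent record number."""
        if value not in self.vectors:
            # New attribute value: add one bit-vector of all 0's.
            self.vectors[value] = [0] * self.num_records
        for v, bits in self.vectors.items():
            bits.append(1 if v == value else 0)
        self.num_records += 1
        return self.num_records

    def delete(self, record_number):
        """Clear the record's bit; its number stays taken (tombstone)."""
        for bits in self.vectors.values():
            bits[record_number - 1] = 0

    def modify(self, record_number, new_value):
        """Move the record's 1 bit from its old value's vector to the new one."""
        for bits in self.vectors.values():
            bits[record_number - 1] = 0
        if new_value not in self.vectors:
            self.vectors[new_value] = [0] * self.num_records
        self.vectors[new_value][record_number - 1] = 1


index = BitmapIndex()
for value in ["foo", "bar", "baz", "foo", "bar", "baz"]:
    index.insert(value)
index.delete(2)            # record 2 (bar) becomes a tombstone
index.modify(3, "qux")     # record 3 changes from baz to the new value qux
print(index.vectors["bar"])  # [0, 0, 0, 0, 1, 0]
print(index.vectors["qux"])  # [0, 0, 1, 0, 0, 0]
```

Because record numbers never change, a deletion leaves every vector the same length, which keeps position i meaning "record i" across all bit-vectors.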