CS4432: Database Systems II


Page 1: CS4432: Database Systems II

CS 4432 lecture #9 1

CS4432: Database Systems II, Lecture #8

(Basic indexing)

Professor Elke A. Rundensteiner

Page 2: CS4432: Database Systems II

Indexing (Chapter 14)

Indexing: helps to retrieve data more quickly for certain queries

SELECT * FROM Emp WHERE salary = 1000000;

[Figure: an index maps the search value 1,000,000 to the matching record.]

Page 3: CS4432: Database Systems II

Topics

• Sequential Index Files
• Secondary Indexes

Page 4: CS4432: Database Systems II

Sequential File

[Figure: data file sorted on the key, two records per block: (10, 20), (30, 40), (50, 60), (70, 80), (90, 100).]

Page 5: CS4432: Database Systems II

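Dense Index

Every record is in the index.

[Figure: the sequential file with blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100), next to a dense index whose blocks hold the keys (10, 20, 30, 40), (50, 60, 70, 80), (90, 100, 110, 120).]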
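
A dense-index lookup over the file drawn on this slide can be sketched as follows. This is only an illustration: the names `data_blocks`, `dense_index`, and `dense_lookup` are made up here, and a record pointer is modeled as a `(block_no, slot_no)` pair, which is an assumption rather than anything the slides specify.

```python
from bisect import bisect_left

# Data file from the slide: five blocks of two records each (keys only).
data_blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: one (key, record pointer) entry per record, sorted by key.
# A record pointer is modeled as (block_no, slot_no); this is an assumption.
dense_index = [
    (key, (b, s))
    for b, block in enumerate(data_blocks)
    for s, key in enumerate(block)
]
index_keys = [key for key, _ in dense_index]

def dense_lookup(key):
    """Return the record pointer for key, or None if the key is absent."""
    i = bisect_left(index_keys, key)          # binary search in the index
    if i < len(index_keys) and index_keys[i] == key:
        return dense_index[i][1]
    return None                               # decided without reading the data file
```

Because every key appears in the index, a miss such as `dense_lookup(45)` is answered from the index alone.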

Page 6: CS4432: Database Systems II

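Sparse Index

Only the first record per block is in the index.

[Figure: the sequential file with blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100), next to a sparse index whose blocks hold the keys (10, 30, 50, 70), (90, 110, 130, 150), (170, 190, 210, 230).]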
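
For comparison, a sparse-index lookup might look like the sketch below. It assumes that index entry i refers to data block i and that reading a block is just a list access; `sparse_keys` and `sparse_lookup` are hypothetical names introduced for illustration.

```python
from bisect import bisect_right

# Data file from the slide: five blocks of two records each (keys only).
data_blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: only the first key of each block; entry i refers to block i.
sparse_keys = [block[0] for block in data_blocks]

def sparse_lookup(key):
    """Return (block_no, slot_no) for key, or None if the key is absent."""
    b = bisect_right(sparse_keys, key) - 1    # last block whose first key <= key
    if b < 0:
        return None                           # key is smaller than every indexed key
    block = data_blocks[b]                    # models the one data-block read
    if key in block:
        return (b, block.index(key))
    return None
```

Unlike the dense case, the data block has to be read before a miss can be reported.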

Page 7: CS4432: Database Systems II

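Sparse 2nd level

[Figure: the sequential file with blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100); a first-level sparse index with blocks (10, 30, 50, 70), (90, 110, 130, 150), (170, 190, 210, 230); and a second-level sparse index with blocks (10, 90, 170, 250), (330, 410, 490, 570) that holds the first key of each first-level block.]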
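
A two-level search can be sketched as below, using only the three first-level blocks visible on the slide. The function name `find_data_block` and the convention that index entry n points at data block n are assumptions made for this example.

```python
from bisect import bisect_right

# First-level sparse index blocks from the slide (four keys per index block).
level1 = [[10, 30, 50, 70], [90, 110, 130, 150], [170, 190, 210, 230]]

# Second-level sparse index: the first key of each first-level block.
level2 = [blk[0] for blk in level1]           # [10, 90, 170]

def find_data_block(key):
    """Return the number of the data block that may hold key, or None."""
    i = bisect_right(level2, key) - 1         # pick a first-level index block
    if i < 0:
        return None
    j = bisect_right(level1[i], key) - 1      # pick an entry inside that block
    # Entry j of first-level block i is overall entry i * 4 + j; entry n is
    # assumed to point at data block n.
    return i * len(level1[0]) + j
```

Each level costs one block read, so the search touches one second-level block, one first-level block, and one data block.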

Page 8: CS4432: Database Systems II

Note: both the DATA FILE and the INDEX can be “ordered files”.

Question: How would we lay them out on disk?

- contiguous layout on disk?
- block-chained layout on disk?

Page 9: CS4432: Database Systems II

Questions:

• Do we want to build a dense 2nd-level index for a dense index?

• Can we even do this?

[Figure: the sequential file (10, 20), (30, 40), (50, 60), (70, 80), (90, 100) with two index levels, one holding 10, 30, 50, 70, 90, 110, ..., 230 and one holding 10, 90, 170, 250, ..., 570; the levels are labeled "1st level?" and "2nd level?".]

Page 10: CS4432: Database Systems II

Notes on pointers:

(1) A block pointer (BP, used in a sparse index) can be smaller than a record pointer (RP, used in a dense index)

Page 11: CS4432: Database Systems II

Note: If the file is contiguous, then we can omit pointers

[Figure: index keys K1, K2, K3, K4 stored in order, corresponding to blocks R1, R2, R3, R4.]

Say: 1024 B per block

• If we want the K3 block:
• get it at offset (3-1)*1024 = 2048 bytes
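
The offset arithmetic from this slide, written out as a small helper. The function name `block_offset` is made up for illustration; the 1024-byte block size is the figure used on the slide.

```python
BLOCK_SIZE = 1024  # bytes per block, as assumed on the slide

def block_offset(n, block_size=BLOCK_SIZE):
    """Byte offset of the n-th block (1-based) in a contiguous file."""
    if n < 1:
        raise ValueError("block numbers start at 1")
    return (n - 1) * block_size
```

With this, the block for K3 starts at `block_offset(3)`, which is the (3-1)*1024 = 2048 bytes computed above, so no stored pointer is needed.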

Page 12: CS4432: Database Systems II

Sparse vs. Dense Tradeoff

• Sparse: Less index space per record, so we can keep more of the index in memory (Later: sparse is better for insertions)

• Dense: Can tell whether a record exists without accessing the file (Later: dense is needed for secondary indexes)
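
The space side of this tradeoff can be made concrete with a little arithmetic. The sketch below uses the layout drawn on the earlier slides (10 records, 2 records per data block, 4 keys per index block); the function names are hypothetical.

```python
import math

def index_entries(num_records, records_per_block, dense):
    """Number of index entries needed for a sequential file."""
    if dense:
        return num_records                               # one entry per record
    return math.ceil(num_records / records_per_block)    # one entry per data block

def index_blocks(entries, keys_per_index_block):
    """Number of index blocks needed to store the given entries."""
    return math.ceil(entries / keys_per_index_block)

# Layout from the earlier slides: 10 records, 2 per data block.
dense_entries = index_entries(10, 2, dense=True)
sparse_entries = index_entries(10, 2, dense=False)
```

The gap between the two widens as more records fit in a data block, which is why a sparse index is more likely to fit in memory.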

Page 13: CS4432: Database Systems II

Terms

• Index sequential file
• Search key (≠ primary key)
• Primary index (on sequencing field)
• Secondary index
• Dense index (contains all search key values)
• Sparse index
• Multi-level index

Page 14: CS4432: Database Systems II


Next:

• Duplicate keys

• Deletion/Insertion

• Secondary indexes

Page 15: CS4432: Database Systems II

Duplicate keys

[Figure: data file with duplicate keys, two records per block: (10, 10), (10, 20), (20, 30), (30, 30), (40, 45).]

Page 16: CS4432: Database Systems II

Duplicate keys

Dense index! Point to each value!

[Figure: the data file (10, 10), (10, 20), (20, 30), (30, 30), (40, 45) with a dense index holding one entry per record, duplicates included: (10, 10, 10, 20), (20, 30, 30, 30), ...]

Page 17: CS4432: Database Systems II

Duplicate keys

Dense index. Point to each distinct value!

[Figure: the data file (10, 10), (10, 20), (20, 30), (30, 30), (40, 45) with an index holding one entry per distinct key: (10, 20, 30, 40).]

Page 18: CS4432: Database Systems II

Duplicate keys

Sparse index: point to start of block!

Careful if looking for 20 or 30!

[Figure: the data file (10, 10), (10, 20), (20, 30), (30, 30), (40, 45) with a sparse index holding the first key of each block: (10, 10, 20, 30), ...]
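
Why care is needed for 20 or 30: the first record with that key can sit in the block before the first index entry that equals it. One way to handle this, sketched here under the assumption that index entry i refers to block i (the name `find_all` is invented for the example), is to start the scan one block early.

```python
from bisect import bisect_left

# Data file with duplicate keys, as drawn on the slide.
data_blocks = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 45]]

# Sparse index: the first key of each block, duplicates and all.
sparse_keys = [block[0] for block in data_blocks]   # [10, 10, 20, 30, 40]

def find_all(key):
    """Return every (block_no, slot_no) that holds key."""
    # The first match may be in the block just before the first index entry
    # that is >= key, so begin the scan one block earlier.
    b = max(bisect_left(sparse_keys, key) - 1, 0)
    hits = []
    while b < len(data_blocks) and data_blocks[b][0] <= key:
        hits += [(b, s) for s, k in enumerate(data_blocks[b]) if k == key]
        b += 1
    return hits
```

A naive lookup that jumps straight to the index entry for 20 would land on block (20, 30) and miss the 20 stored at the end of block (10, 20).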

Page 19: CS4432: Database Systems II

Duplicate keys

Sparse index, another way?

– place first new key from block

[Figure: the data file (10, 10), (10, 20), (20, 30), (30, 30), (40, 45) with an index holding (10, 20, 30, 30); an annotation on the index asks: should this be 40?]

Page 20: CS4432: Database Systems II

Summary

Duplicate values, primary index

• Index may point to first instance of each value only

[Figure: columns labeled File and Index; the values a, a, a, b are shown.]

Page 21: CS4432: Database Systems II


Next:

• Duplicate keys

• Deletion/Insertion

• Secondary indexes

Page 22: CS4432: Database Systems II

Deletion from sparse index

[Figure: data blocks (10, 20), (30, 40), (50, 60), (70, 80) with a sparse index (10, 30, 50, 70), (90, 110, 130, 150).]

Page 23: CS4432: Database Systems II

Deletion from sparse index

– delete record 40

[Figure: data blocks (10, 20), (30, 40), (50, 60), (70, 80) with a sparse index (10, 30, 50, 70), (90, 110, 130, 150); record 40 is removed from its block.]

Page 24: CS4432: Database Systems II

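Deletion from sparse index

– delete record 30

[Figure: data blocks (10, 20), (30, 40), (50, 60), (70, 80) with a sparse index (10, 30, 50, 70), (90, 110, 130, 150); after the deletion, 40 becomes the first record of its block and the index entry 30 is replaced by 40.]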

Page 25: CS4432: Database Systems II

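Deletion from sparse index

– delete records 30 & 40

[Figure: data blocks (10, 20), (30, 40), (50, 60), (70, 80) with a sparse index (10, 30, 50, 70), (90, 110, 130, 150); the block (30, 40) becomes empty, its index entry is dropped, and the entries 50 and 70 move up.]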
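
The three deletion cases on the preceding slides can be sketched in one function. This is an illustration only: `delete_sparse` is a hypothetical name, blocks are plain Python lists, and index entry i is assumed to describe block i.

```python
def delete_sparse(blocks, index, key):
    """Delete key from the data file and keep the sparse index consistent.

    blocks: list of sorted key lists, one per data block
    index:  list of first keys, where index[i] describes blocks[i]
    Returns True if a record was deleted.
    """
    for b, block in enumerate(blocks):
        if key in block:
            block.remove(key)
            if not block:
                # Block is now empty: drop it together with its index entry.
                del blocks[b]
                del index[b]
            else:
                # The first key may have changed (e.g. 30 deleted, 40 remains).
                index[b] = block[0]
            return True
    return False
```

Deleting 40 leaves the index untouched, deleting 30 rewrites one entry to 40, and deleting both removes the entry so that 50 and 70 move up.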

Page 26: CS4432: Database Systems II

Deletion from dense index

[Figure: data blocks (10, 20), (30, 40), (50, 60), (70, 80) with a dense index (10, 20, 30, 40), (50, 60, 70, 80).]

Page 27: CS4432: Database Systems II

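Deletion from dense index

– delete record 30

[Figure: data blocks (10, 20), (30, 40), (50, 60), (70, 80) with a dense index (10, 20, 30, 40), (50, 60, 70, 80); record 30 is removed, 40 moves up in its data block, and the index entry 40 moves up to take the place of 30.]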
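
With a dense index, every deletion also removes an index entry. A minimal sketch, assuming the index is just a sorted list of keys and that the data record is found by scanning the blocks (a real index would follow the stored record pointer instead); `delete_dense` is a made-up name.

```python
from bisect import bisect_left

def delete_dense(blocks, index_keys, key):
    """Delete key from the dense index and from the data file.

    blocks:     list of sorted key lists, one per data block
    index_keys: sorted list with one entry per record
    Returns True if a record was deleted.
    """
    i = bisect_left(index_keys, key)
    if i == len(index_keys) or index_keys[i] != key:
        return False                  # the index alone shows the key is absent
    del index_keys[i]                 # later entries shift up, as on the slide
    for block in blocks:              # simplification: scan instead of a pointer
        if key in block:
            block.remove(key)
            break
    return True
```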

Page 28: CS4432: Database Systems II

Insertion, sparse index case

[Figure: data blocks (10, 20), (30, free), (40, 50), (60, free) with a sparse index (10, 30, 40, 60).]

Page 29: CS4432: Database Systems II

Insertion, sparse index case

– insert record 34

• Our lucky day! We have free space where we need it!

[Figure: data blocks (10, 20), (30, free), (40, 50), (60, free) with a sparse index (10, 30, 40, 60); 34 goes into the free slot after 30.]

Page 30: CS4432: Database Systems II

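Insertion, sparse index case

– insert record 15

[Figure: data blocks (10, 20), (30, free), (40, 50), (60, free) with a sparse index (10, 30, 40, 60); 15 is placed after 10, 20 moves into the next block to give (20, 30), and the index entry 30 becomes 20.]

• Immediate reorganization
• Other variations?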
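
Immediate reorganization can be sketched as "insert, then push the largest key of an overfull block into the next block". The function name `insert_reorganize` and the constant `CAPACITY` are assumptions for illustration; blocks hold two records, as drawn on the slides.

```python
from bisect import bisect_right

CAPACITY = 2  # records per data block, as drawn on the slides

def insert_reorganize(blocks, index, key):
    """Insert key, shifting records into later blocks when a block is full.

    blocks: list of sorted key lists; index[i] is the first key of blocks[i].
    """
    b = max(bisect_right(index, key) - 1, 0)   # block the key belongs in
    carry = key
    while carry is not None:
        if b == len(blocks):                   # ran off the end: add a new block
            blocks.append([])
            index.append(carry)
        blocks[b].append(carry)
        blocks[b].sort()
        # An overfull block hands its largest key to the next block.
        carry = blocks[b].pop() if len(blocks[b]) > CAPACITY else None
        index[b] = blocks[b][0]                # first key may have changed
        b += 1
```

Inserting 15 into the file on the slide moves 20 into the second block and changes that block's index entry from 30 to 20.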

Page 31: CS4432: Database Systems II

• Just illustrated:
– Immediate reorganization

• Now a variation:
– insert new block (chained file)
– otherwise leave data file
– update index only

Page 32: CS4432: Database Systems II

Insertion, sparse index case

– insert record 25

[Figure: data blocks (10, 20), (30, free), (40, 50), (60, free) with a sparse index (10, 30, 40, 60); 25 is placed in an overflow block.]

Overflow blocks (reorganize later...)
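
The overflow-block alternative leaves the main file and the index alone. In this sketch the overflow chain of block b is modeled as a list stored in a dictionary under b; that representation, the name `insert_with_overflow`, and the restriction to keys at or above the smallest indexed key are all assumptions made to keep the example short.

```python
from bisect import bisect_right

CAPACITY = 2  # records per data block, as drawn on the slides

def insert_with_overflow(blocks, overflow, index, key):
    """Insert key; a full target block spills into its overflow chain.

    blocks:   list of sorted key lists; index[i] is the first key of blocks[i]
    overflow: dict mapping block number -> list of keys in its overflow chain
    """
    b = bisect_right(index, key) - 1
    if b < 0:
        raise ValueError("sketch assumes key >= smallest indexed key")
    if len(blocks[b]) < CAPACITY:
        blocks[b].append(key)          # free slot available in the block
        blocks[b].sort()
    else:
        # Block is full: park the record in an overflow block, reorganize later.
        overflow.setdefault(b, []).append(key)
```

Lookups then have to check the overflow chain of the chosen block as well, which is the price paid for postponing reorganization.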

Page 33: CS4432: Database Systems II


Insertion, dense index case

• Similar

• Often more expensive . . .

Page 34: CS4432: Database Systems II


Next:

• Duplicate keys

• Deletion/Insertion

• Secondary indexes

Page 35: CS4432: Database Systems II

Secondary indexes

Can I make a secondary index sparse?

[Figure: data file ordered on its sequence field, not on the indexed field; indexed values per block: (30, 50), (20, 70), (80, 40), (100, 10), (90, 60).]

Page 36: CS4432: Database Systems II

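Secondary indexes

• Sparse index

[Figure: data file ordered on its sequence field, with indexed values per block (30, 50), (20, 70), (80, 40), (100, 10), (90, 60), and a sparse index built from the first value of each block: 30, 20, 80, 100, 90, ...]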

Page 37: CS4432: Database Systems II

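Secondary indexes

• Sparse index?

[Figure: data file ordered on its sequence field, with indexed values per block (30, 50), (20, 70), (80, 40), (100, 10), (90, 60), and a sparse index 30, 20, 80, 100, 90, ..., marked with a question mark.]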

Page 38: CS4432: Database Systems II

Secondary indexes

• Sparse index: does not make sense!

[Figure: data file ordered on its sequence field, with indexed values per block (30, 50), (20, 70), (80, 40), (100, 10), (90, 60), and a sparse index 30, 20, 80, 100, 90, ...]

Page 39: CS4432: Database Systems II

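Secondary indexes

• Must be dense index!

[Figure: data file ordered on its sequence field, with indexed values per block (30, 50), (20, 70), (80, 40), (100, 10), (90, 60); a dense index (10, 20, 30, 40), (50, 60, 70, ...); and a sparse high level (10, 50, 90, ...) on top of the dense index.]

Sparse high level allowed?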

Page 40: CS4432: Database Systems II

Reminder: With secondary indexes:

• Lowest level is dense
• Other levels are sparse

Also: Pointers are record pointers (not block pointers, nor offsets)
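
Building the two levels for the unordered file from the preceding slides might look like this. Record pointers are modeled as `(block_no, slot_no)` pairs and an index block is assumed to hold four keys, matching the drawing; both choices, and the variable names, are illustrative assumptions.

```python
# Indexed-field values as stored in the file (the file is NOT sorted on them).
data_blocks = [[30, 50], [20, 70], [80, 40], [100, 10], [90, 60]]

# Lowest level: dense, one (key, record pointer) entry per record, sorted by key.
# Record pointers are needed because neighboring keys live in unrelated blocks.
dense = sorted(
    (key, (b, s))
    for b, block in enumerate(data_blocks)
    for s, key in enumerate(block)
)

KEYS_PER_INDEX_BLOCK = 4  # as drawn on the slide

# Higher level: sparse, the first key of each dense-index block. This works
# because the dense index itself is a sorted file.
sparse = [dense[i][0] for i in range(0, len(dense), KEYS_PER_INDEX_BLOCK)]
```

The resulting `sparse` list is 10, 50, 90, the same keys shown for the high level on the slide.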

Page 41: CS4432: Database Systems II

Duplicate values & secondary indexes

[Figure: data file with duplicate values in the indexed field, per block: (20, 10), (20, 40), (10, 40), (10, 40), (30, 40).]

Page 42: CS4432: Database Systems II

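Duplicate values & secondary indexes

One option...

[Figure: data file (20, 10), (20, 40), (10, 40), (10, 40), (30, 40) with a dense index that repeats each key once per record: (10, 10, 10, 20), (20, 30, 40, 40), (40, 40, ...).]

Problem: excess overhead!

• disk space
• search time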

Page 43: CS4432: Database Systems II

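Duplicate values & secondary indexes

Another option...

[Figure: data file (20, 10), (20, 40), (10, 40), (10, 40), (30, 40) with an index holding one entry per distinct key (10, 20, 30, 40), each entry carrying a pointer to every record with that key.]

Problem: variable size records in index!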

Page 44: CS4432: Database Systems II

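Duplicate values & secondary indexes

Another idea: Chain records with same key!

[Figure: data file (20, 10), (20, 40), (10, 40), (10, 40), (30, 40) with an index (10, 20, 30, 40), (50, 60, ...); each index entry points to the first record with that key, and records with the same key are linked in a chain.]

Problems:

• Need to add fields to data records for each index
• Need to follow chain to know records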
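
The chaining idea can be sketched as follows. The extra "next" field of each data record is modeled here as a separate dictionary `next_ptr`, and record pointers are `(block_no, slot_no)` pairs; these names and representations are assumptions for illustration.

```python
# Indexed-field values as stored in the file, duplicates included.
data_blocks = [[20, 10], [20, 40], [10, 40], [10, 40], [30, 40]]

first = {}      # index: key -> pointer to the first record with that key
next_ptr = {}   # per-record extra field: pointer to the next record, or None
last = {}       # helper used only while building the chains

for b, block in enumerate(data_blocks):
    for s, key in enumerate(block):
        ptr = (b, s)
        next_ptr[ptr] = None
        if key in last:
            next_ptr[last[key]] = ptr   # extend the existing chain
        else:
            first[key] = ptr            # first record seen with this key
        last[key] = ptr

def records_with(key):
    """Follow the chain and return every record pointer for key."""
    out = []
    ptr = first.get(key)
    while ptr is not None:
        out.append(ptr)
        ptr = next_ptr[ptr]
    return out
```

The index stays fixed-size with one entry per distinct key, but finding all records for 40 means visiting four records one after another, which illustrates the second problem listed above.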

Page 45: CS4432: Database Systems II

Summary: Indexing Basics

– Basic Ideas: sparse, dense, multi-level…
– Duplicate Keys
– Deletion/Insertion
– Secondary Indexes