RDBMS and SQL Physical View and Indexing

41
Venkatesh Vinayakarao (Vv) RDBMS and SQL Physical View and Indexing Venkatesh Vinayakarao [email protected] http://vvtesh.co.in Chennai Mathematical Institute Slide contents are borrowed from the course text. For the authors’ original version of slides, visit: https://www.db-book.com/db6/slide-dir/index.html.

Transcript of RDBMS and SQL Physical View and Indexing

Page 1: RDBMS and SQL Physical View and Indexing

Venkatesh Vinayakarao (Vv)

RDBMS and SQL

Physical View and Indexing

Venkatesh [email protected]

http://vvtesh.co.in

Chennai Mathematical Institute

Slide contents are borrowed from the course text. For the authors’ original version of slides, visit: https://www.db-book.com/db6/slide-dir/index.html.

Page 2: RDBMS and SQL Physical View and Indexing

Story So Far…1 2

3

Relational AlgebraSQL

You are here!

Page 3: RDBMS and SQL Physical View and Indexing

File Organization

DB

file 1 file 2 file n…

Data stored as files.Files are managed by the

underlying OS.

Page 4: RDBMS and SQL Physical View and Indexing

Files

• A file is a sequence of blocks.

• Blocks are fixed-length units of both storage allocation and data transfer.

182

file i

Block 1

Block 2

Page 5: RDBMS and SQL Physical View and Indexing

Records

• A block may contain several records.

• Each record is entirely contained in a single block.

183

Block i

Record 1

Record 2

Record n

Page 6: RDBMS and SQL Physical View and Indexing

File Organization

184

DB

DB is stored asa set of files.

no record is larger than a block

Page 7: RDBMS and SQL Physical View and Indexing

Approch1: Fixed-Length Records

185

Page 8: RDBMS and SQL Physical View and Indexing

Quiz

• Assume each char takes 1 byte and numeric(8,2) type take 8 bytes of physical storage. Say, block size in our file system is 1 KB. If there are 20 records in our relation, how many block accesses will we need to retrieve all of them?

186

Page 9: RDBMS and SQL Physical View and Indexing

Quiz

• Assume each char takes 1 byte and numeric(8,2) type take 8 bytes of physical storage. Say, block size in our file system is 1 KB. If there are 20 records in our relation, how many block accesses will we need to retrieve all of them?• Record length = 53 bytes

• Total no. of records = 20

• Space required = 53 * 20 = 1060 bytes

• Block size = 1024 bytes.

• We need two block accesses to retrieve all records.

187

Page 10: RDBMS and SQL Physical View and Indexing

Issues

• Deletion• Causes gaps inside blocks.

• Space optimization• block size may not be a multiple of record length

• space wasted in blocks.

188

Page 11: RDBMS and SQL Physical View and Indexing

Space Usage

189

Record

Ptr to 2nd

deleted record

Record

Block

File

Record

Record

Record

Block

Record

Record

Record

Block

File HeaderPointer to first deleted record

Deleted records form a linked list called the “free list”.

Page 12: RDBMS and SQL Physical View and Indexing

Free List

190

Free list1 → 4 → 6

Page 13: RDBMS and SQL Physical View and Indexing

Variable Length Record

191

Metadata about the variable length data is stored (in fixed length part)

Read 10 bytes from 36th byte for this field

Page 14: RDBMS and SQL Physical View and Indexing

Storage Organization of Records

• Heap file organization• Place any record anywhere in the file.

• Single file for each relation.

• Sequential file organization• Records are stored in sequential order (of key).

• Hashing file organization• Hash (some attribute of) records to blocks.

192

Page 15: RDBMS and SQL Physical View and Indexing

Indexing

193

Page 16: RDBMS and SQL Physical View and Indexing

Motivation

• We usually access only a small part of the DB.

DBFind the instructors in the physics department

Need additional structures to access data efficiently

Page 17: RDBMS and SQL Physical View and Indexing

Basic Concepts

• Indexing mechanisms used to speed up access to desired data.• E.g., author catalog in library

• Search Key - Set of attributes used to look up records in a file.

• An index file consists of records (called index entries) of the form

• Index files are typically much smaller than the original file

• Two basic kinds of indices:• Ordered indices: search keys are stored in sorted order• Hash indices: search keys are distributed uniformly across

“buckets” using a “hash function”.

search-key pointer

Page 18: RDBMS and SQL Physical View and Indexing

Ordered Indices

• In an ordered index, index entries are stored sorted on the search key value. E.g., author catalog in library.

• Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file.• The search key of a primary index is usually but not

necessarily the primary key.

• Secondary index: an index whose search key specifies an order different from the sequential order of the file.

• Index-sequential file: ordered sequential file with a primary index.

Page 19: RDBMS and SQL Physical View and Indexing

Dense Index Files

• Dense index — Index record appears for every search-key value in the file.

• E.g. index on ID attribute of instructor relation

Page 20: RDBMS and SQL Physical View and Indexing

Dense Index Files (Cont.)

• Dense index on dept_name, with instructor file sorted on dept_name

Page 21: RDBMS and SQL Physical View and Indexing

Sparse Index Files

• Sparse Index: contains index records for only some search-key values.• Applicable when records are sequentially ordered on search-key

• To locate a record with search-key value K we:• Find index record with largest search-key value < K

• Search file sequentially starting at the record to which the index record points

Page 22: RDBMS and SQL Physical View and Indexing

Secondary Indices Example

• Index record points to a bucket that contains pointers to all the actual records with that particular search-key value.

• Secondary indices have to be dense

Secondary index on salary field of instructor

Page 23: RDBMS and SQL Physical View and Indexing

Multilevel Index

• If primary index does not fit in memory, access becomes expensive.

Page 24: RDBMS and SQL Physical View and Indexing

B+-Tree Index Files

• B+-tree indices are an alternative to indexed-sequential files.

• Advantage of B+-tree index files:

• automatically reorganizes itself with small, local, changes, in the face of insertions and deletions.

• Reorganization of entire file is not required to maintain performance.

• (Minor) disadvantage of B+-trees:

• extra insertion and deletion overhead, space overhead.

Page 25: RDBMS and SQL Physical View and Indexing

Example of B+-Tree

fanout, n=4 (#pointers in each node)

Page 26: RDBMS and SQL Physical View and Indexing

n=6

204

Page 27: RDBMS and SQL Physical View and Indexing

B+-Tree Node Structure

• Typical node

• Ki are the search-key values

• Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes).

• The search-keys in a node are ordered

K1 < K2 < K3 < . . . < Kn–1

Page 28: RDBMS and SQL Physical View and Indexing

Leaf Nodes in B+-Trees• For i = 1, 2, . . ., n–1, pointer Pi points to a file record with search-key

value Ki,

• If Li, Lj are leaf nodes and i < j, Li’s search-key values are less than or equal to Lj’s search-key values

• Pn points to next leaf node in search-key order

Page 29: RDBMS and SQL Physical View and Indexing

Rules

• Root node• can hold fewer than n/2 pointers.• must hold at least two pointers, unless the tree consists

of only one node.

• Internal nodes• all pointers are pointers to tree nodes.• and must hold at least n/2 pointers and up to n

pointers.

• Leaf nodes• Can contain from as few as (n − 1)/2 values, up to n-1

values

207

Page 30: RDBMS and SQL Physical View and Indexing

B+ Tree Construction

See https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html

Insert B+ Tree

sri

wu

moz

ein

els

Page 31: RDBMS and SQL Physical View and Indexing

B+ Tree Construction

209

Page 32: RDBMS and SQL Physical View and Indexing

B+ Tree Insertion

210

Page 33: RDBMS and SQL Physical View and Indexing

B+ Tree Insertion

211

Page 34: RDBMS and SQL Physical View and Indexing

Deletion of a Key in a B+ Tree

212

Delete katz

Page 35: RDBMS and SQL Physical View and Indexing

Deletion of a Key in a B+ Tree

213

Delete gold

Page 36: RDBMS and SQL Physical View and Indexing

Deletion of a Key in a B+ Tree

214

Delete kim

Page 37: RDBMS and SQL Physical View and Indexing

Deletion of a Key in a B+ Tree

215

Delete els

Page 38: RDBMS and SQL Physical View and Indexing

Deletion of a Key in a B+ Tree

216

Delete moz

Page 39: RDBMS and SQL Physical View and Indexing

Deletion of a Key in a B+ Tree

217

Delete sing

Page 40: RDBMS and SQL Physical View and Indexing

Readings

• Insert and Delete algorithms over B+Trees

218

Page 41: RDBMS and SQL Physical View and Indexing

Thank YouB+ Tree simulation available at https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html

219