Elementary Indexes for a Consumer Price Index - Ottawa Group
An Introductory Course on Database · PDF file• Indexes and index files a) Simple...
Transcript of An Introductory Course on Database · PDF file• Indexes and index files a) Simple...
2012-02-14 1 Silvia Stefanova, UDBL - IT - UU
DATABASE DESIGN I - 1DL300
Spring 2012
An Introductory Course on Database Systems
http://www.it.uu.se/edu/course/homepage/dbastekn/vt12/
Erik Zeitler Uppsala Database Laboratory
Department of Information Technology, Uppsala University, Uppsala, Sweden
2012-02-14 2 Silvia Stefanova, UDBL - IT - UU
Introduction to Physical Database Design
Elmasri/Navathe ch 16 & 17 Padron-McCarthy/Risch ch 21 & 22
Silvia Stefanova
Department of Information Technology Uppsala University, Uppsala, Sweden
2012-02-14 3 Silvia Stefanova, UDBL - IT - UU
Outline • Index in the DBMS • Physical Database Design: main concepts • Data structures for physical storage of the database
Unsorted files Sorted files Hashing methods Files sorted according to an index
• Indexes and index files a) Simple (one-level) index
Primary index Secondary index Clustering index
b) Search trees - multi-level index (Covered in DB II) c) Hash indexes (Covered in DB II)
2012-02-14 4 Silvia Stefanova, UDBL - IT - UU
• The physical database - a collection of stored records organized in files on the hard disk. A record consists of a number of data fields. Each field has an elementary data type
(integer, real, string, pointer etc.)
• Records are used for physical storage of: Tuples, where each attribute in the tuple is stored as a field. Objects in object-oriented databases.
Physical Database Design
2012-02-14 5 Silvia Stefanova, UDBL - IT - UU
• A database file: sequence of units, called blocks. Blocks are units of both storage allocation and data transfer.
• Database system seeks to minimize the number of block
transfers between the disk and memory. Reduce the number of disk accesses by keeping as many
blocks as possible in main memory.
• Buffer – portion of main memory available to store copies of disk blocks.
• Buffer manager – subsystem responsible for allocating buffer space in main memory.
Physical Database Design
2012-02-14 7 Silvia Stefanova, UDBL - IT - UU
• A disk block contains information that is needed for record access: Block addresses, record format etc. To find records, one or several blocks are transferred
to (one or several buffers in) primary memory. These blocks can then be searched to find the records that were sought. If the address to the block containing the record is
unknown one has to search through all block in the file (linear search).
Physical Database Design
2012-02-14 8 Silvia Stefanova, UDBL - IT - UU
• Heap A record can be placed anywhere in the file where a space is
available • Sequential
Records are in sequential order The order is based on the value of the search key of each
record • Hashing
A hash function is computed on some attribute of each record; the result specifies in which block of the file the record should be placed
• Ordered according to an index
Organization of DB records in files
2012-02-14 9 Silvia Stefanova, UDBL - IT - UU
• New records are added to the end of the file. • Suitable when we don’t know how data shall be used.
Insert - very efficient Search - expensive (linear to the size). Delete - expensive (search - read into - delete - write back). Update of a record of variable length can be hard Retrieval according to a certain order requires that the file
must be sorted which is expensive.
• Instead of physically removing a record one can mark the record as deleted. Both methods require a periodically reorganization of the file.
Heap files (Unordered files)
2012-02-14 10 Silvia Stefanova, UDBL - IT - UU
• The records are ordered according to the value of a certain field (ordering field) Ordered retrieval - very fast (no sorting needed). Next record in the order is found on the same block (except
for the last record in the block) Search - fast (binary search - log2b; b – number of the
sorted blocks) Insert and delete - expensive since the file must be kept
sorted. • Suitable for applications that require sequential
processing of the entire file • Need to reorganize the file from time to time to restore
sequential order
Sequential files (Sorted files)
2012-02-14 11 Manivasakan Sabesan- UDBL - IT - UU
Hashing technique
• Goal Retrieve records faster.
• How? Find a hash function, h, so that for a record p, h(f(p)) provides
the address to the block where p shall be stored. Example: h(f(p)) = f(p) mod M,
f(p) : hash field for p M : hash table size.
• What : Most of the records will be found in only one block access.
• Observe !!! Collision if: h(f(p)) = h(f(p’))
2012-02-14 13 Silvia Stefanova, UDBL - IT - UU
• The records in the file are organized as a hash table (external hashing) Records are hashed according to the bucket method. A bucket = one block. The bucket address is used as the address for a record. Info. regarding the block address for each bucket is stored in
the beginning of the file.
• Since several records can be stored in the same block, the number of collisions decreases.
• When a bucket is filled one can use a chaining method where a number of buckets are used as overflow buckets.
External Hashing
2012-02-14 14 Silvia Stefanova, UDBL - IT - UU
• Choose a larger array of size M+O and use the extra space as overflow places.
Chaining Method in External Hashing
2012-02-14 15 Silvia Stefanova, UDBL - IT - UU
Outline • Index in the DBMS • Physical Database Design: main concepts • Data structures for physical storage of the database
Unsorted files Sorted files Hashing methods Files ordered according to an index
• Indexes and index files a) Simple (one-level) index
Primary index Secondary index Clustering index
b) Search trees - multi-level index (Covered in DB II) c) Hash indexes (Covered in DB II)
2012-02-14 16 Silvia Stefanova, UDBL - IT - UU
Indexes • An index (or index file) is an extra file structure used to
make the retrieval of records faster. • Search key (indexing field)– attribute or set of attributes
(data fields) used to look up records in a file. • An index file
Has records (index entries) of the form: <search key, pointer>
The entries determine the physical address for records Much smaller than the data file
• Two main types of indexes:
Ordered indexes: search keys are stored Hash indexes: search keys are distributed uniformly across
“buckets” using a “hash function”.
2012-02-14 17 Silvia Stefanova, UDBL - IT - UU
Primary Index
• Records in the data file are kept ordered on the primary key
• Primary index - a file of records with two fields. 1st field : of the same type as the primary key field (index
field) for the data file 2nd field : a pointer to a block (block pointer).
• A primary index has one index record for each block in the data file, and therefore is called a sparse index (or “non-dense index”)
• The first record in each block is called the anchor record of the block
2012-02-14 19 Manivasakan Sabesan- UDBL - IT - UU
Clustering index
• Clustering index is defined for data files that are ordered according to a non-key field (the clustering field), i.e. several records in the data file can have the same value for the clustering field.
• A clustering index - a file of records with two fields: 1st field : the same type as the clustering field for the data file 2nd field : a pointer to a block of records (block pointer) in the
data file.
• Insert and dele are problematic. However, if each block only can contain records with the same clustering value the insertion problem is solved. (see Elmasri/Navathe Fig 17.3)
2012-02-14 21 Manivasakan Sabesan- UDBL - IT - UU
Secondary index
• The data file is not sorted according to the indexing field. • A secondary index is an ordered file that consists of
records with two fields. The 1st field is of the same type as the indexing field (any field in
the data file) The 2nd field is a block pointer.
• Two options: Option 1 (key) : the index field has unique values for each
record; dense index Option 2 (non-key) : several records in the data file can have
the same value for the index field;