Chapter 7: Indexing. File Structures by Folk, Zoellick, and Riccardi.

Chapter Objectives
* Index files
* Operations required to maintain an index file
* Primary keys
* Secondary keys

7.1 What Is an Index?
An index is a tool for finding records in a file. It consists of:
* a key field: the field on which the index is searched
* a reference field (an address or RRN, the relative record number): tells where to find the data file record associated with a particular key

Examples of an Index
* Book index
  * usually at the end of the book
  * arranged alphabetically by topic
* The index in a library (an online catalog)
  * allows you to locate items by author, by title, or by call number
* Photo thumbnails
  * each thumbnail usually represents a link to the actual photo
  * a thumbnail is a much smaller file and can be loaded quickly; the actual photo takes much longer to load
  * if the index held the actual photos, it would take a long time to load

[Figure: Book Index]

Example: Index in Databases
A university uses an index file to keep track of its courses.
Each record in the data file consists of the following fields:
* Department
* Title
* Professor
* Student List
* Room & Time

7.2 A Simple Index
Example: Primary key
Candidate fields:
* Department: not specific enough
* Course Number: not unique
* Professor: not unique
* Room & Time: possible, but classes aren't identified this way
* Department + Course Number: the obvious choice?

Index file
* It is used to provide rapid access to individual records in the data file via the keys.
* Example: the index file consists of the following fields:
  * key (e.g. CIS402)
  * reference (address): the address of the corresponding record in the data file
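
The key/reference pairing above can be sketched as a small sorted table that is searched with binary search. This is a minimal illustration, not the textbook's code: the names `IndexEntry`, `keyLess`, and `searchIndex` are made up here, and the record addresses used with it are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One index entry: a key plus the address of its record in the data file.
// (Hypothetical layout for illustration.)
struct IndexEntry {
    std::string key;   // e.g. "CIS402"
    long recAddr;      // address of the corresponding data record
};

// Comparison used by the binary search: is this entry's key before `key`?
inline bool keyLess(const IndexEntry& entry, const std::string& key) {
    return entry.key < key;
}

// Returns the record address stored for `key`, or -1 if the key is not indexed.
// Assumes `index` is sorted by key.
long searchIndex(const std::vector<IndexEntry>& index, const std::string& key) {
    std::vector<IndexEntry>::const_iterator it =
        std::lower_bound(index.begin(), index.end(), key, keyLess);
    if (it != index.end() && it->key == key) {
        return it->recAddr;
    }
    return -1;
}
```

A caller would look up "CIS402", receive an address, and then seek to that address in the data file to read the full record.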

[Figure: Primary Index. The index file lists the keys k1, k2, k4, k5, k7, k9, each with a reference into the data file; the data file records carry the same keys together with the data values AAA, ZZZ, CCC, XXX, EEE, FFF.]

7.2 A Simple Index for Entry-Sequenced Files
Index Class Interface

    #include <iostream>
    using std::ostream;

    class TextIndex {
    public:
        TextIndex(int maxKeys = 100, int unique = 1);
        int Insert(const char* key, int recAddr);  // add to index
        int Remove(const char* key);               // remove key from index
        int Search(const char* key) const;         // search for key, return recAddr
        void Print(ostream&) const;
    protected:
        int MaxKeys;     // maximum number of entries
        int NumKeys;     // actual number of entries
        char** Keys;     // array of key values
        int* RecAddrs;   // array of record references
        int Find(const char* key) const;
        int Init(int maxKeys, int unique);
        int Unique;      // if true, each key must be unique
    };

Operations on an Indexed File
* Create the index file (when the data file is created)
* Load the index file into memory (the whole file, if possible and prudent)
* Write the updated index file to permanent storage
* Add record(s) to the data file
* Delete record(s) from the data file
* Update record(s) in the data file
* Search

Creating Files of Data
* Create files:
  * index file
  * data record file

Load Index
* Load the index via:
  * buffer I/O
  * an array

Writing Back Index File
* Can be part of the close operation for the index file
* The close function in the index object can write the buffer/array to the disk before closing the file

Record Addition
* Adding a new data record to the data file requires adding a new record to the index file.
* If the index file is sorted:
  * adding a new record may require rearranging the records in the index file
  * whether it does depends on how the index file is represented in memory
  * if a sort is necessary, it is easily done when the index is in main memory
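
As a rough sketch of record addition with the index held in memory, the function below appends the record to a stand-in for the data file and inserts the matching entry at its sorted position. The names `IndexEntry`, `entryBefore`, and `addRecord` are assumptions for this example, and a `std::string` plays the role of the data file.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long recAddr;   // byte offset of the record in the data file
};

// Orders index entries by key.
static bool entryBefore(const IndexEntry& a, const IndexEntry& b) {
    return a.key < b.key;
}

// Appends `record` to the in-memory stand-in for the data file and inserts
// a matching entry into the sorted index. Returns the record's address.
long addRecord(std::string& dataFile, std::vector<IndexEntry>& index,
               const std::string& key, const std::string& record) {
    long addr = static_cast<long>(dataFile.size());  // new records go at the end
    dataFile += record;
    IndexEntry entry = {key, addr};
    // upper_bound finds the first entry with a larger key; inserting there
    // shifts the later entries up by one and keeps the index sorted.
    index.insert(std::upper_bound(index.begin(), index.end(), entry, entryBefore),
                 entry);
    return addr;
}
```

Only the small index entries are shifted; the data records themselves never move, which is why the rearrangement is cheap when the index fits in main memory.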

Record Deletion
* Deleting a data record requires deleting the corresponding index record.
* Can space in the data file be reclaimed?
  * It is difficult: with an index file organization, all data records are pinned.
  * A pinned data record is one whose address is referenced in an index file.
* Other consequences:
  * re-sorting the data file is difficult
  * solution: sort the file via the indices
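
One reading of "sort the file via the indices" is that the data file is never physically sorted; instead, records are visited in key order by walking the sorted index. The sketch below illustrates that reading together with deletion from the index. The names `IndexEntry`, `removeKey`, and `recordsInKeyOrder` are hypothetical, and a vector of strings stands in for the data file.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    std::size_t rrn;   // relative record number in the data file
};

// Removes the index entry for `key`. The data record itself stays where it is,
// so every other (pinned) record keeps its address.
// Returns true if the key was found.
bool removeKey(std::vector<IndexEntry>& index, const std::string& key) {
    for (std::vector<IndexEntry>::iterator it = index.begin(); it != index.end(); ++it) {
        if (it->key == key) {
            index.erase(it);
            return true;
        }
    }
    return false;
}

// Returns the records in key order by following the sorted index,
// without moving any record in the data file.
std::vector<std::string> recordsInKeyOrder(const std::vector<std::string>& dataFile,
                                           const std::vector<IndexEntry>& index) {
    std::vector<std::string> ordered;
    for (std::size_t i = 0; i < index.size(); ++i) {
        ordered.push_back(dataFile[index[i].rrn]);
    }
    return ordered;
}
```

After a deletion the record's space is still occupied in the data file; reclaiming it would need a separate mechanism such as an avail list, which this sketch leaves out.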

Record Updating
Two categories of updates:
* Modification of the key value
  * re-ordering of the index file might be required
  * two possible situations:
    1. modifying the key reorders the file
    2. see below
* Modification of a non-key value
  * might still require reordering of records in the data file. (Why?)
  * the size of the data record might increase, requiring it to be moved to a space that can hold it
  * the index entry for that record must then be reset
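
The non-key case can be sketched as follows: if the updated record no longer fits where it is, it moves and its index entry is reset. This is an illustration under simplifying assumptions; `Slot`, `IndexEntry`, and `updateRecord` are invented names, and the data file is modeled as a vector of fixed-capacity slots.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a region of the data file.
struct Slot {
    std::string data;
    std::size_t capacity;   // bytes available in this slot
};

struct IndexEntry {
    std::string key;
    std::size_t slot;       // which slot currently holds the record
};

// Updates the non-key data of the record referenced by `entry`.
// If the new data no longer fits, the record moves to a new slot at the end
// of the file and the index entry is reset to point there.
// Returns true if the record had to move.
bool updateRecord(std::vector<Slot>& dataFile, IndexEntry& entry,
                  const std::string& newData) {
    if (newData.size() <= dataFile[entry.slot].capacity) {
        dataFile[entry.slot].data = newData;   // fits in place: index untouched
        return false;
    }
    dataFile[entry.slot].data.clear();         // old space becomes free
    Slot moved = {newData, newData.size()};
    dataFile.push_back(moved);
    entry.slot = dataFile.size() - 1;          // reset the index for that record
    return true;
}
```

The key is unchanged in both branches, so the index keeps its order; only the reference field is rewritten when the record moves.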

7.5 Indexes That Are Too Large to Hold in Memory
Indexes that are too large for memory are kept on secondary storage.
Disadvantages:
* time consumption: searching the index file requires disk accesses instead of just memory accesses
* rearranging indexes requires disk accesses

Solutions to Index Files in Secondary Storage
If the index file is too large to be kept in main memory, then the following alternative organizations should be considered:
* a hashed organization (if access speed is very important)
* a tree-structured organization, or a multilevel index such as a B-tree

Pros of a Simple Index File
* allows for use of binary search
* sorting and maintaining an index is much easier than for a data file
  * true if index entries are much smaller than data records
* if data records are pinned, can rearrange keys without moving data records
* the same ideas apply to multiple simple indexes...

7.6 Indexing to Provide Access by Multiple Keys
Indexing with Multiple Key Access
* The unique primary key is often used as a search keyword.
  * Example primary key: CS215
* What if you'd like to include the prof in the search?
  * Two keys: Course & Prof
  * Could also be: Course & Time, or Location & Time (?)

Secondary Key
* A secondary key is a key for which multiple records may exist in the data file.
* Examples:
  * sorting an Excel sheet using two fields (e.g. name & section)
  * a professor teaches more than one class

Secondary Index File
* Create an index file for each of the possible secondary indexes.
* Secondary keys can be shared; primary keys were unique.
* Example:
  * Secondary key: Professor El-Ramly
  * Primary keys containing this prof: CS352, CS215
* Those courses can be accessed via the secondary key.
* What if a course has multiple sections?
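
A lookup through a secondary index is a two-step process: the secondary key yields primary keys, and each primary key is then resolved through the primary index. The sketch below shows this with made-up names (`SecondaryEntry`, `primaryKeysFor`, `recordAddressesFor`) and hypothetical record addresses; a `std::map` stands in for the primary index.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Secondary index entry: secondary key (professor) -> primary key (course).
struct SecondaryEntry {
    std::string secondaryKey;
    std::string primaryKey;
};

// Step 1: collect the primary keys of all courses taught by `professor`.
std::vector<std::string> primaryKeysFor(const std::vector<SecondaryEntry>& secondary,
                                        const std::string& professor) {
    std::vector<std::string> keys;
    for (std::size_t i = 0; i < secondary.size(); ++i) {
        if (secondary[i].secondaryKey == professor) {
            keys.push_back(secondary[i].primaryKey);
        }
    }
    return keys;
}

// Step 2: resolve each primary key through the primary index
// (primary key -> record address) to find the data records.
std::vector<long> recordAddressesFor(const std::vector<SecondaryEntry>& secondary,
                                     const std::map<std::string, long>& primaryIndex,
                                     const std::string& professor) {
    std::vector<long> addrs;
    std::vector<std::string> keys = primaryKeysFor(secondary, professor);
    for (std::size_t i = 0; i < keys.size(); ++i) {
        std::map<std::string, long>::const_iterator hit = primaryIndex.find(keys[i]);
        if (hit != primaryIndex.end()) {
            addrs.push_back(hit->second);
        }
    }
    return addrs;
}
```

Because the secondary index stores primary keys rather than record addresses, moving a data record only requires updating the primary index.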

Record Addition
* Adding a record to the data file likely requires adding a record to the secondary index file.
* Costs are similar to the cost of adding a record to the primary index file:
  * records might have to be shifted
  * indexes may have to be rearranged

Record Deletion
* All references to that record in the file system must be removed:
  * search for the primary key in the primary index file and remove its index entry
  * search the secondary index file for the primary key of the record to be deleted and remove its entry from the secondary index file
* What if the secondary index entries are left in place?
  * a secondary key refers to a primary key
  * the primary key will have been deleted, and will not exist in the primary index
  * if searches account for this possibility, the secondary index entry does not have to be deleted
  * pitfalls?
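
The "leave the secondary entries in place" option can be sketched as below: deletion touches only the primary index, and lookups skip references whose primary key has vanished. The type aliases and function names are assumptions for this example. The `countStale` helper illustrates one pitfall, namely that dead entries accumulate in the secondary index until it is cleaned up.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// primary key -> record address
typedef std::map<std::string, long> PrimaryIndex;
// list of (secondary key, primary key) pairs
typedef std::vector<std::pair<std::string, std::string> > SecondaryIndex;

// Deletes a record by removing only its primary index entry.
// Secondary index entries that mention the key are deliberately left behind.
void deleteRecord(PrimaryIndex& primary, const std::string& primaryKey) {
    primary.erase(primaryKey);
}

// Looks up a secondary key, skipping references whose primary key
// no longer exists in the primary index.
std::vector<long> lookup(const SecondaryIndex& secondary, const PrimaryIndex& primary,
                         const std::string& secondaryKey) {
    std::vector<long> addrs;
    for (std::size_t i = 0; i < secondary.size(); ++i) {
        if (secondary[i].first != secondaryKey) continue;
        PrimaryIndex::const_iterator hit = primary.find(secondary[i].second);
        if (hit != primary.end()) addrs.push_back(hit->second);
    }
    return addrs;
}

// Counts secondary entries that point at primary keys that are gone.
std::size_t countStale(const SecondaryIndex& secondary, const PrimaryIndex& primary) {
    std::size_t stale = 0;
    for (std::size_t i = 0; i < secondary.size(); ++i) {
        if (primary.find(secondary[i].second) == primary.end()) ++stale;
    }
    return stale;
}
```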

Record Updating
There are three possible situations:
* The secondary key is altered
  * the secondary key index may have to be rearranged so it stays in sorted order
* The primary key is altered
  * big impact on the primary key index
  * in the secondary key index, only the affected primary key field needs to be updated
* The update is confined to non-key fields
  * updates that do not affect either the primary or secondary key fields do not affect the secondary key index, even if the update is substantial
  * recall that such an update can affect the primary index, since that refers to the record's location in the data file

Retrieving Data with Multiple Secondary Keys
* Example: all courses taught by Spiegel or Gordon
  * requires two searches
  * the searches produce a list of courses by providing primary keys:
    * Spiegel: CIS136, CIS235, CIS402
    * Gordon: CIS425, CIS520, CIS243

Boolean AND in Searches
* Example: search for courses
  * taught by Spiegel, and
  * located in Lytle Hall
* The courses found are in the intersection of:
  * courses taught by Spiegel
  * courses offered in Lytle Hall

Boolean OR in Searches
* Example: search for courses
  * taught by Spiegel, or
  * located in Lytle Hall
* The courses found are in the union of:
  * courses taught by Spiegel
  * courses offered in Lytle Hall
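
Both Boolean combinations reduce to set operations on the primary key lists that the secondary index searches return. A minimal sketch, assuming each list is already sorted in ascending order: `KeyList`, `matchAll`, and `matchAny` are names chosen for this example, while `std::set_intersection` and `std::set_union` are standard library algorithms.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// A list of primary keys, sorted in ascending order.
typedef std::vector<std::string> KeyList;

// Boolean AND: keys that appear in both lists.
KeyList matchAll(const KeyList& a, const KeyList& b) {
    KeyList out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// Boolean OR: keys that appear in either list, without duplicates.
KeyList matchAny(const KeyList& a, const KeyList& b) {
    KeyList out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}
```

For example, with Spiegel's courses CIS136, CIS235, CIS402 and a hypothetical Lytle Hall list of CIS235 and CIS425, `matchAll` yields only CIS235, while `matchAny` yields all four course keys. Only the resulting primary keys then need to be looked up in the primary index.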

7.8 Improving the Secondary Index Structure
Cons of the Current Secondary Index Structure
* The index file has to be rearranged every time a new record is added to the file.
* For duplicate secondary keys, the secondary key field is repeated for each entry.

| Secondary Key | Primary Key |
| --- | --- |
| El-Ramly | CS215 |
| El-Ramly | CS352 |
| Khattab | CS214 |
| Khattab | CS316 |

Solutions to the Cons of the Current Secondary Index Structure
* Solution A: an array of references
* Solution B: linking the list of references

Improvements to the Secondary Index Key Structure
Solution A
* Allow multiple primary keys to be associated with a single secondary key by allocating a primary key list (STL vector is best; why?) for each secondary key entry.
* Solves the problem of sorting each time a new entry is added.
* According to the text: suffers from internal fragmentation due to the fixed size of the list, and the number of allocated entries in the array may prove too small.
* An STL (or Java) vector fixes this: how?

A. Array of References
Revised composer index

| Secondary key | Set of primary key references |
| --- | --- |
| BEETHOVEN | ANG3795, DG139201, DG18807, RCA2626 |
| COREA | WAR23699 |
| DVORAK | COL31809 |
| PROKOFIEV | LON2312 |
| RIMSKY-KORSAKOV | MER75016 |
| SPRINGSTEEN | COL38358 |
| SWEET HONEY IN THE R | FF245 |

* no need to rearrange
* limited reference array
* internal fragmentation
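
One possible answer to the vector question, sketched for an in-memory index: a `std::vector` per secondary key grows on demand, so there is no fixed number of reference slots to run out of and no unused slots reserved in advance. The alias `SecondaryIndex` and the function `addReference` are assumptions for this example. Storing such variable-length lists in a file is a separate problem that the sketch does not address.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// secondary key -> growable list of primary key references
typedef std::map<std::string, std::vector<std::string> > SecondaryIndex;

// Adds a primary key reference under `secondaryKey`.
// Each list is kept sorted, and a reference that is already present is ignored.
void addReference(SecondaryIndex& index, const std::string& secondaryKey,
                  const std::string& primaryKey) {
    std::vector<std::string>& refs = index[secondaryKey];  // creates the list if new
    std::vector<std::string>::iterator pos =
        std::lower_bound(refs.begin(), refs.end(), primaryKey);
    if (pos == refs.end() || *pos != primaryKey) {
        refs.insert(pos, primaryKey);
    }
}
```

Adding another recording under an existing composer touches only that composer's list; the set of secondary keys is rearranged only when a new composer appears.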

B. Inverted List
Solution B
* Create an inverted list of indexes: have each secondary key point to a list of the primary key references associated with it.
* This method eliminates most of the problems associated with maintaining a secondary index file.
* Which solution is better?

Inverted Lists
Guidelines for a better solution:
* no reorganization when adding
* no limit on duplicate keys
* no internal fragmentation

Solution B: linking the list of references
* Each secondary key leads to a list of primary key references, for example PROKOFIEV -> ANG36193 -> LON2312.
* Each secondary index entry holds the secondary key field and the relative record number of the first corresponding primary key reference.

Linking List of References (1)
Improved revision of the composer index

Secondary Index file

| RRN | Secondary key | RRN of first reference |
| --- | --- | --- |
| 0 | BEETHOVEN | 3 |
| 1 | COREA | 2 |
| 2 | DVORAK | 7 |
| 3 | PROKOFIEV | 10 |
| 4 | RIMSKY-KORSAKOV | 6 |
| 5 | SPRINGSTEEN | 4 |
| 6 | SWEET HONEY IN THE R | 9 |

Label ID List file

| RRN | Label ID (primary key) | RRN of next reference (-1 ends the list) |
| --- | --- | --- |
| 0 | LON2312 | -1 |
| 1 | RCA2626 | -1 |
| 2 | WAR23699 | -1 |
| 3 | ANG3795 | 8 |
| 4 | COL38358 | -1 |
| 5 | DG18807 | 1 |
| 6 | MER75016 | -1 |
| 7 | COL31809 | -1 |
| 8 | DG139201 | 5 |
| 9 | FF245 | -1 |
| 10 | ANG36193 | 0 |
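
The linked organization in this figure can be sketched in memory as follows. Each secondary key stores the RRN of its first reference, and each list entry stores the RRN of the next one, with -1 ending the chain. The names `ListNode`, `SecondaryIndex`, `references`, and `addReference` are invented for this example. As a simplification, new references are linked at the head of the list, so the chain is not kept in Label ID order.

```cpp
#include <map>
#include <string>
#include <vector>

// One record of the Label ID List file.
struct ListNode {
    std::string labelId;  // primary key
    int next;             // RRN of the next reference, -1 ends the list
};

// secondary key -> RRN of the first reference in the list file
typedef std::map<std::string, int> SecondaryIndex;

// Follows the chain for `key` and returns its primary key references.
std::vector<std::string> references(const SecondaryIndex& heads,
                                    const std::vector<ListNode>& listFile,
                                    const std::string& key) {
    std::vector<std::string> out;
    SecondaryIndex::const_iterator it = heads.find(key);
    int rrn = (it == heads.end()) ? -1 : it->second;
    while (rrn != -1) {
        out.push_back(listFile[rrn].labelId);
        rrn = listFile[rrn].next;
    }
    return out;
}

// Appends a new reference at the end of the list file (entry-sequenced)
// and links it in as the new head of the list for `key`.
void addReference(SecondaryIndex& heads, std::vector<ListNode>& listFile,
                  const std::string& key, const std::string& labelId) {
    SecondaryIndex::iterator it = heads.find(key);
    int oldHead = (it == heads.end()) ? -1 : it->second;
    ListNode node = {labelId, oldHead};
    listFile.push_back(node);
    heads[key] = static_cast<int>(listFile.size()) - 1;
}
```

Starting from BEETHOVEN's head RRN 3, the traversal visits RRNs 3, 8, 5, and 1, which yields ANG3795, DG139201, DG18807, and RCA2626. Adding a reference never moves an existing list record.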

Linking List of References (2)
* The primary key references are kept in a separate, entry-sequenced file.
* Advantages:
  * the secondary index file is rearranged only when a secondary key changes
  * rearrangement is quick
  * less penalty is associated with keeping the secondary index file on secondary storage (less need for sorting)
  * the Label ID List file does not need to be sorted
  * reusing the space of deleted records is easy

Linking List of References (3)
* Disadvantage:
  * references for the same secondary key may not be physically grouped
  * lack of locality: following a list could involve a large amount of seeking
  * solution: keep the Label ID List file in memory
    * the same Label ID List file can hold the lists of a number of secondary index files
    * if it is too large for memory, only a part of it can be loaded

7.9 Selective Indexes
* A selective index is an index on a subset of records.
* It contains only some part of the entire index.
* It provides a selective view.
* It is useful when the contents of a file fall into several categories, e.g.:
  * 20 < Age < 30 and $1000 < Salary
  * courses offered after 12 noon
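
A selective index can be sketched as an ordinary index that is built only from the records passing a selection rule. The example below uses the "courses offered after 12 noon" category. The `Course` layout, the `startMinute` field, and the function names are assumptions made for this illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Course {
    std::string id;    // primary key
    int startMinute;   // start time in minutes since midnight (hypothetical field)
};

// Builds an index (primary key -> RRN) over only those records
// for which `selected` returns true.
template <typename Predicate>
std::map<std::string, std::size_t> buildSelectiveIndex(
        const std::vector<Course>& dataFile, Predicate selected) {
    std::map<std::string, std::size_t> index;
    for (std::size_t rrn = 0; rrn < dataFile.size(); ++rrn) {
        if (selected(dataFile[rrn])) {
            index[dataFile[rrn].id] = rrn;
        }
    }
    return index;
}

// Selection rule for "courses offered after 12 noon".
bool afterNoon(const Course& course) {
    return course.startMinute > 12 * 60;
}
```

Calling `buildSelectiveIndex(dataFile, afterNoon)` gives an index that is smaller than the full one and answers queries within that category without touching the other records.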