CS: Introduction to Record Manipulation & Indexing
Records and Indexing, 14-Sep-03, © Katrin Becker, All Rights Reserved
Record Manipulation & Indexing
• records/fields
• index placement; index management
• manipulating fixed-length record files
• re-using space in fixed-length files
• varying length records [VLR]: adds; dels; mods
• free lists for VLR: placement strategies (first, best, worst)
• varying length record maintenance
Records in General
A record is:
• An identifiable, describable data set
• Often contains a sub-structure
• Typically part of a larger structure
This definition also works for: files; fields; …
Records and Fields
FILE containing records
RECORD containing fields
FIELD containing elements
FILE SYSTEM containing files
Record Manipulation
• Operations on Records:
– Searches
– Additions
– Deletions
– Modifications
Record Manipulation - Search
Sequential Search
• While NOT done:
– Position file pointer
– Read record
– Examine record to see if it’s the one
• Yes: DONE
• No: CONTINUE
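The loop above can be sketched in Python. This is a minimal illustration, assuming fixed-length records of 16 bytes whose first 8 bytes hold the key; the layout and the names `RECORD_SIZE`, `KEY_SIZE`, `make_record`, and `sequential_search` are assumptions made for the example and do not come from the slides.

```python
import io

RECORD_SIZE = 16   # assumed fixed record length in bytes
KEY_SIZE = 8       # assumed: the key occupies the first 8 bytes


def make_record(key: str, data: str) -> bytes:
    """Pack a key and data into one fixed-length, space-padded record."""
    return key.encode().ljust(KEY_SIZE) + data.encode().ljust(RECORD_SIZE - KEY_SIZE)


def sequential_search(f, wanted: str):
    """Return the relative record number (RRN) of the match, or None."""
    rrn = 0
    while True:
        f.seek(rrn * RECORD_SIZE)          # position file pointer
        record = f.read(RECORD_SIZE)       # read record
        if len(record) < RECORD_SIZE:      # ran off the end: not found
            return None
        if record[:KEY_SIZE].rstrip().decode() == wanted:
            return rrn                     # yes: done
        rrn += 1                           # no: continue


# An in-memory stand-in for a record file (hypothetical sample data).
records = io.BytesIO(b"".join(
    make_record(k, d)
    for k, d in [("smith", "cs101"), ("adams", "cs203"), ("jones", "cs331")]
))
```

With this sample file, `sequential_search(records, "adams")` examines two records before it finds the key; a key that is absent costs a read of every record.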
Other Searches
• What changes?
– Binary search:
• We position the file pointer in a different fashion (the rest is the same)
– Search with an index
• We apply the search to the index and retrieve the record only when located in the index
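To make the "only the positioning changes" point concrete, here is a hedged sketch of binary search over a file. It assumes fixed-length 16-byte records that are already sorted by key and that the caller knows the record count; `read_key` and `binary_search` are illustrative names, not part of the slides.

```python
import io

RECORD_SIZE = 16   # assumed fixed record length in bytes
KEY_SIZE = 8       # assumed: the key occupies the first 8 bytes


def read_key(f, rrn: int) -> str:
    """Seek to record `rrn` and return its key field."""
    f.seek(rrn * RECORD_SIZE)
    return f.read(KEY_SIZE).rstrip().decode()


def binary_search(f, wanted: str, record_count: int):
    """Return the RRN of `wanted` in a key-sorted file, or None."""
    lo, hi = 0, record_count - 1
    while lo <= hi:
        mid = (lo + hi) // 2           # the file pointer jumps to the middle
        key = read_key(f, mid)         # reading and examining are unchanged
        if key == wanted:
            return mid
        if key < wanted:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


keys = ["adams", "baker", "jones", "smith", "young"]   # hypothetical, sorted
sorted_file = io.BytesIO(b"".join(k.encode().ljust(RECORD_SIZE) for k in keys))
```

The computation `rrn * RECORD_SIZE` is what makes this possible on the record file itself, so the approach depends on every record having the same length.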
Record Manipulation – Addition
New record gets added to the end.
• Insertion into middle of file is impractical.
• If there is an index, then we also perform an addition to the index (addition to the end of this list is infeasible – WHY?).
Addition with an Index
[Diagram: RECORDS, INDEX]
1. New record gets added to the end.
2. Locate place where index entry needs to go.
3. Insert new index entry (it’s a record too).
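The three steps can be sketched as follows, assuming the index is an in-memory list of `(key, rrn)` pairs kept sorted by key and that records are 16 bytes long. The record layout and the function name `add_record` are assumptions for illustration only.

```python
import bisect
import io

RECORD_SIZE = 16   # assumed fixed record length in bytes


def add_record(f, index: list, key: str, data: str) -> int:
    """Append the record, then insert (key, rrn) into the sorted index."""
    f.seek(0, io.SEEK_END)
    rrn = f.tell() // RECORD_SIZE                 # step 1: record goes at the end
    f.write(f"{key}|{data}".encode().ljust(RECORD_SIZE))
    bisect.insort(index, (key, rrn))              # steps 2-3: locate place, insert entry
    return rrn


data_file = io.BytesIO()
index = []                                        # kept sorted by key
for key, data in [("smith", "cs101"), ("adams", "cs203"), ("jones", "cs331")]:
    add_record(data_file, index, key, data)
```

After the loop the records sit in arrival order in the file, while the index lists them in key order. Only the small index entries move when an entry is inserted; the records themselves stay where they were written.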
Records vs. Index: Assertions & Questions
• Moving file records is more expensive than moving index records.
• Should index be IN record file or its own file? (How do we maintain it?)
• If IN file: should it be at the beginning, end, middle, distributed?
• What if we are able to hold the index in memory?
• What if we can’t?
Record Manipulation - Deletion
• Locate record (Search)
• Mark space as deleted
• Remove index entry? (why or why not)
Deletion with an index
[Diagram: RECORDS, INDEX]
1. Locate index entry
2. Locate record
3. Delete (mark) record
4. Delete (mark?) index entry
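A hedged sketch of the four steps, assuming a sorted in-memory index of `(key, rrn)` pairs and a one-byte `*` tombstone written at the start of a deleted slot. The tombstone convention and the name `delete_record` are assumptions; the sketch removes the index entry outright, which is only one of the two options the slide raises.

```python
import bisect
import io

RECORD_SIZE = 16      # assumed fixed record length in bytes
DELETED = b"*"        # assumed tombstone marker in the first byte of a slot


def delete_record(f, index: list, key: str) -> bool:
    """`index` is a sorted list of (key, rrn) pairs. Returns True if deleted."""
    pos = bisect.bisect_left(index, (key,))       # 1. locate index entry
    if pos == len(index) or index[pos][0] != key:
        return False
    rrn = index[pos][1]
    f.seek(rrn * RECORD_SIZE)                     # 2. locate record
    f.write(DELETED)                              # 3. delete (mark) record
    del index[pos]                                # 4. delete index entry
    return True


keys = ["smith", "adams", "jones"]                # hypothetical sample data
data_file = io.BytesIO(b"".join(k.encode().ljust(RECORD_SIZE) for k in keys))
index = sorted((k, rrn) for rrn, k in enumerate(keys))
delete_record(data_file, index, "adams")
```

Note that the file is the same length after the deletion: the slot is marked, not reclaimed.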
Record Manipulation - Modification
• Locate record
• Read record
• Modify record
• Re-write record (assuming fixed-size records – what if the record is now a different size? [see later])
File Behaviour
1. start: record count = 9
2. add record: record count = 10
3. add record: record count = 11
4. delete: record count = 10
5. delete: record count = 9
6. add: record count = 10
7. add: record count = 11
8. add: record count = 12
9. delete: record count = 11
10. delete: record count = 10
And so on…
What’s happening to the file?
• File grows – does not shrink (we get fragmentation)
• We end up covering more ground to do the same job
• Q: If we are doing random access, why does it matter?
• The file system has less space to use (the fragmentation is internal from the perspective of the file system).
• Worst case = EVERY record access ends up costing us a seek.
Re-Using Space in the File [FLR]
• When there is a deletion, locate the last record in the file, and move it to the free slot
– Costs:
• Additional file access to locate (where will we remember where the last record is?) and retrieve last record.
• Records will lose locality faster than if we simply mark the slot. (Why do we care?)
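The move-the-last-record approach might look like the sketch below. It assumes 16-byte fixed-length records and that the caller keeps the record count (one possible answer to where the position of the last record is remembered); `delete_by_moving_last` is a hypothetical name.

```python
import io

RECORD_SIZE = 16   # assumed fixed record length in bytes


def delete_by_moving_last(f, rrn: int, record_count: int) -> int:
    """Overwrite slot `rrn` with the last record, shrink the file, return the new count."""
    last = record_count - 1
    if rrn != last:
        f.seek(last * RECORD_SIZE)        # extra access: locate and read last record
        moved = f.read(RECORD_SIZE)
        f.seek(rrn * RECORD_SIZE)
        f.write(moved)                    # fill the freed slot
    f.truncate(last * RECORD_SIZE)        # the file shrinks by one record
    return last


keys = ["smith", "adams", "jones", "young"]       # hypothetical sample data
f = io.BytesIO(b"".join(k.encode().ljust(RECORD_SIZE) for k in keys))
count = delete_by_moving_last(f, 1, len(keys))
```

Here `young` ends up in slot 1, so any index entry that pointed at its old slot would also need updating, and records that were written together are no longer adjacent.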
Re-Using Space – Way 2
• Make a list of places where records have been deleted.
• When doing addition, check for empty ‘slot’ before placing new record at end.
Q: What about the index?
• When doing deletion, add location of deleted record to ‘free-list’
What does the Free-List look like?
[Diagram: RECORDS, INDEX]
All we need is the location. Order is unimportant.
How to decide which ‘slot’ to re-use?
• In FLR every slot will fit a new record.
• We can just take the first one – Free-List can then be maintained as a stack (which is easy).
• Do we keep Free-List information in the file?
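A minimal sketch of a stack-based Free-List for fixed-length records follows. It keeps the list in memory as a Python list of RRNs, which sidesteps the slide's question about storing it in the file; the class name `FixedFile`, the 16-byte record size, and the `*` tombstone are all assumptions for illustration.

```python
import io

RECORD_SIZE = 16   # assumed fixed record length in bytes


class FixedFile:
    """Fixed-length record file with an in-memory free list used as a stack."""

    def __init__(self):
        self.f = io.BytesIO()
        self.free = []             # stack of free RRNs; order is unimportant
        self.count = 0             # number of slots ever allocated

    def delete(self, rrn: int) -> None:
        self.f.seek(rrn * RECORD_SIZE)
        self.f.write(b"*")         # mark the slot as deleted
        self.free.append(rrn)      # push its location onto the free list

    def add(self, payload: bytes) -> int:
        if self.free:
            rrn = self.free.pop()  # re-use the most recently freed slot
        else:
            rrn = self.count       # no free slot: place at end of file
            self.count += 1
        self.f.seek(rrn * RECORD_SIZE)
        self.f.write(payload.ljust(RECORD_SIZE))
        return rrn


ff = FixedFile()
slots = [ff.add(k) for k in (b"smith", b"adams", b"jones")]
ff.delete(1)
reused = ff.add(b"young")      # lands in the freed slot
appended = ff.add(b"baker")    # free list is empty again, so this goes at the end
```

Because every slot fits every record, push and pop are the only operations the list needs.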
Indexing – What is it?
• Table-of-contents for a file (directory)
• Uses keys
• Byte Offset (BO) vs Relative Record Number (RRN)
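For fixed-length records the two addressing schemes are interchangeable by simple arithmetic, as this small sketch shows. The constants `HEADER_SIZE` and `RECORD_SIZE` are assumed values chosen for the example.

```python
HEADER_SIZE = 0     # assumed: no file header before the first record
RECORD_SIZE = 16    # assumed fixed record length in bytes


def rrn_to_byte_offset(rrn: int) -> int:
    """Byte offset of the record with relative record number `rrn`."""
    return HEADER_SIZE + rrn * RECORD_SIZE


def byte_offset_to_rrn(offset: int) -> int:
    """Relative record number of the record starting at `offset`."""
    return (offset - HEADER_SIZE) // RECORD_SIZE
```

With varying length records there is no such formula, so an index over them has to store byte offsets.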
Primary Key Properties:
• Unique
• Canonical
• Data-less
• Unchanging
Indexing – How does it Look?
• Must have:
– Key
– Way to locate record
• It is itself a structure containing ‘records’ (each index entry is a record)
• It may be separate from the main data or in the same file.
• It may be copied into memory for manipulation and only updated infrequently; or the file copy may be maintained as well.
[Diagram: INDEX]
Indexing – File Ops?
• Tied to records:
– If records added – new/update index entry
– If record deleted – ‘delete’ index entry
– If record modified – maybe no change to index; maybe update BO [byte offset]
Fixed-length vs Varying Length
• VLR provides greater flexibility.
• VLR increases maintenance overhead.
• VLR decreases wasted space. *
• VLR makes index virtually essential.
• VLR complicates Free-List maintenance.
* may simply waste space in a different place or a different way.
VLR Index
[Diagram: RECORDS, INDEX]
• Requires:
– Key
– Byte offset
– Record size? [optional]
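One way such an index could be built is sketched below. It assumes each record is stored with a 2-byte little-endian length prefix, which is one reason the size in the index entry can be treated as optional; the on-disk layout and the names `append_vlr` and `fetch` are assumptions, not something the slides specify.

```python
import io
import struct


def append_vlr(f, key: str, data: str) -> tuple:
    """Append a length-prefixed record; return its index entry (key, offset, size)."""
    body = f"{key}|{data}".encode()
    f.seek(0, io.SEEK_END)
    offset = f.tell()
    f.write(struct.pack("<H", len(body)) + body)   # 2-byte length, then the record
    return (key, offset, 2 + len(body))


def fetch(f, entry: tuple) -> str:
    """Read the record an index entry points at and strip the length prefix."""
    _, offset, size = entry
    f.seek(offset)
    return f.read(size)[2:].decode()


vlr_file = io.BytesIO()
index = sorted(
    append_vlr(vlr_file, k, d)
    for k, d in [("smith", "cs101"), ("adams", "databases"), ("jones", "os")]
)
```

The resulting entries are `("adams", 13, 17)`, `("jones", 30, 10)`, and `("smith", 0, 13)`: sorted by key, with offsets that follow no regular spacing.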
VLR Search Operation
[Diagram: RECORDS, INDEX]
• Same as for FLR:
1. Locate key in index
2. Locate record in file
• Binary search still possible on index, but NOT on records alone.
VLR Deletion Operation
[Diagram: RECORDS, INDEX, Free-List]
1. Locate key
2. Locate record
3. Delete record
4. Remember location of ‘slot’ and size of slot (Free-List)
5. Mark index entry
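The deletion steps can be sketched as below, assuming the index is a dictionary mapping each key to an `(offset, size)` pair and the Free-List is a list of `(offset, size)` holes. Overwriting the record with `*` bytes and marking the index entry with `None` are conventions invented for this example.

```python
import io


def delete_vlr(f, index: dict, free_list: list, key: str) -> bool:
    """Delete a varying length record and remember the hole it leaves."""
    entry = index.get(key)                 # 1. locate key
    if entry is None:                      # unknown key, or already marked deleted
        return False
    offset, size = entry
    f.seek(offset)                         # 2. locate record
    f.write(b"*" * size)                   # 3. delete record (overwrite as a marker)
    free_list.append((offset, size))       # 4. remember location and size of slot
    index[key] = None                      # 5. mark index entry
    return True


f = io.BytesIO(b"smith|cs101" b"adams|databases" b"jones|os")
index = {"smith": (0, 11), "adams": (11, 15), "jones": (26, 8)}
free_list = []
delete_vlr(f, index, free_list, "adams")
```

Unlike the fixed-length case, the free list entry needs the size as well as the location, since a later record may or may not fit in the hole.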
VLR Addition Operation – 1
[Diagram: RECORDS, INDEX, Free-List, New Record]
1a. Search Free-List
1b. Too big for first place
1c. Too big for second place
1d. Too big for third place
1e. Place at end of file
VLR Addition Operation – 2
[Diagram: RECORDS, INDEX, Free-List, New Record]
2a. Search Free-List
2b. Fits in first place… BUT…
2c. We will end up with left-over unused (and probably unusable) space. We call this “First-Fit” (because we are using the first slot that we find that fits).
2d. If instead we keep looking… we find the second entry is a better fit…
2e. The third slot does not fit, so…
2f. We decide to use the second slot. It is the Best-Fit.
2g. Then:
1. Insert record.
2. Delete Free-List entry.
3. Update Index.
Notice the index entry is sorted differently.
What’s the advantage to leaving ‘spaces’ in the index?
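Both addition walkthroughs can be put together in one hedged sketch: look for the best-fitting hole, fall back to the end of the file when nothing fits, then update the Free-List and the index. The data structures (a dictionary index and a list of `(offset, size)` holes) and the name `add_vlr` are assumptions, and any space left over inside a re-used hole is simply abandoned here.

```python
import io


def add_vlr(f, index: dict, free_list: list, key: str, body: bytes) -> int:
    """Place `body` in the best-fitting hole, or at end of file if none fits."""
    fits = [hole for hole in free_list if hole[1] >= len(body)]
    if fits:
        hole = min(fits, key=lambda h: h[1])     # best fit: smallest hole that fits
        offset = hole[0]
        free_list.remove(hole)                   # delete Free-List entry
    else:
        f.seek(0, io.SEEK_END)
        offset = f.tell()                        # nothing fits: place at end of file
    f.seek(offset)
    f.write(body)                                # insert record
    index[key] = (offset, len(body))             # update index
    return offset


f = io.BytesIO(b"." * 60)                        # hypothetical 60-byte file
free_list = [(0, 20), (25, 12), (40, 8)]         # three holes, as in the walkthrough
index = {}
first = add_vlr(f, index, free_list, "young", b"young|math")        # 10 bytes
second = add_vlr(f, index, free_list, "baker", b"baker|" + b"x" * 24)  # 30 bytes
```

The 10-byte record fits the first hole but goes into the 12-byte second hole, leaving 2 bytes unused; the 30-byte record fits nowhere and is appended.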
VLR Modification Operation
• 2 kinds:
– 1. Mod results in record remaining same size: same as for FLR.
– 2. Mod results in record growing or shrinking: treat Mod as a deletion followed by an addition.
Free-Lists
• May want to keep Free-List sorted.
• If the List is short it may not matter.
• Placement Strategies:
– First Fit
– Best Fit
– Worst Fit
• It could be its own list or we could make the regular index serve double-duty.
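The three placement strategies differ only in which fitting hole they pick, which the following sketch tries to make visible. It assumes the Free-List is a list of `(offset, size)` pairs; the function name `choose_slot` and the strategy labels are hypothetical.

```python
def choose_slot(free_list, size, strategy):
    """Return the (offset, size) hole chosen by `strategy`, or None if nothing fits."""
    fits = [hole for hole in free_list if hole[1] >= size]
    if not fits:
        return None
    if strategy == "first":
        return fits[0]                              # first hole in list order that fits
    if strategy == "best":
        return min(fits, key=lambda h: h[1])        # tightest fit, least left over
    if strategy == "worst":
        return max(fits, key=lambda h: h[1])        # largest hole, biggest remainder
    raise ValueError(f"unknown strategy: {strategy}")


free_list = [(0, 20), (25, 12), (40, 8), (50, 64)]  # hypothetical holes
```

For a 10-byte record this free list yields `(0, 20)` under first fit, `(25, 12)` under best fit, and `(50, 64)` under worst fit. If the list were kept sorted by ascending size, first fit and best fit would pick the same hole, which is one reason sorting may be worth its cost.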
Summary
• Managing space inside the file is our business.
• We must choose:
– FLR / VLR?
– Index? (what kind?)
– Secondary indices?
– Re-claim free space? How?