CS: Introduction to Record Manipulation & Indexing

Records and Indexing, 14-Sep-03, © Katrin Becker, All Rights Reserved

An introduction to data record manipulation and indexing. Originally created in 2003 by Katrin Becker. All rights reserved.

Transcript of CS: Introduction to Record Manipulation & Indexing

Page 1: CS: Introduction to Record Manipulation & Indexing

Record Manipulation & Indexing

• records/fields
• index placement; index management
• manipulating fixed-length record files
• re-using space in fixed-length files
• varying length records [VLR]: adds; dels; mods
• free lists for VLR - placement strategies (first, best, worst)
• varying length record maintenance

Page 2: CS: Introduction to Record Manipulation & Indexing

Records in General

A record is:
• An identifiable, describable data set
• Often contains a sub-structure
• Typically part of a larger structure

This definition also works for: files; fields; …

Page 3: CS: Introduction to Record Manipulation & Indexing

Records and Fields

FILE containing records

RECORD containing fields

FIELD containing elements

FILE SYSTEM containing files
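As a rough illustration of the record/field relationship above, the sketch below packs three fields into one fixed-length record and splits it back apart. The field names (`student_id`, `name`, `gpa`) and the byte layout are assumptions invented for this example; they do not come from the slides.

```python
import struct

# Hypothetical layout: 4-byte unsigned int, 20-byte name, 8-byte float.
# "<" selects little-endian with no padding, so every record is 32 bytes.
RECORD = struct.Struct("<I20sd")

def pack_record(student_id, name, gpa):
    """Turn three fields into one fixed-length record."""
    return RECORD.pack(student_id, name.encode("ascii"), gpa)

def unpack_record(raw):
    """Split one fixed-length record back into its fields."""
    student_id, name, gpa = RECORD.unpack(raw)
    return student_id, name.rstrip(b"\x00").decode("ascii"), gpa

raw = pack_record(42, "Ada", 3.9)
```

A file in this scheme would simply be a run of such 32-byte records stored back to back.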

Page 4: CS: Introduction to Record Manipulation & Indexing

Record Manipulation

• Operations on Records:
– Searches
– Additions
– Deletions
– Modifications

Page 5: CS: Introduction to Record Manipulation & Indexing

Record Manipulation - Search

Sequential Search

• While NOT done:
– Position file pointer
– Read record
– Examine record to see if it’s the one
  • Yes: DONE
  • No: CONTINUE
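The loop above might look roughly like this for a file of fixed-length records. This is a minimal sketch: the record layout (a 4-byte key plus a 12-byte payload) and the in-memory demo file are assumptions made for the example.

```python
import io
import struct

RECORD = struct.Struct("<I12s")  # hypothetical layout: key + 12-byte payload

def sequential_search(f, key):
    """Return (rrn, payload) for the first record with this key, else None."""
    rrn = 0
    while True:
        f.seek(rrn * RECORD.size)        # position file pointer
        raw = f.read(RECORD.size)        # read record
        if len(raw) < RECORD.size:       # ran off the end: not found
            return None
        rec_key, payload = RECORD.unpack(raw)
        if rec_key == key:               # is it the one? Yes: DONE
            return rrn, payload.rstrip(b"\x00")
        rrn += 1                         # No: CONTINUE

# Demo file held in memory; keys are deliberately unsorted.
demo = io.BytesIO(b"".join(RECORD.pack(k, b"rec%d" % k) for k in (30, 10, 20)))
```

Calling `sequential_search(demo, 20)` examines all three records before it finds the match, which is the cost the later slides try to avoid.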

Page 6: CS: Introduction to Record Manipulation & Indexing

Other Searches

• What changes?
– Binary search:
  • We position the file pointer in a different fashion (the rest is the same)
– Search with an index
  • We apply the search to the index and retrieve the record only when located in the index
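To make the "different positioning" concrete, here is a sketch of a binary search over a file of fixed-length records. It assumes the records are already sorted by key and that the caller knows the record count; the layout is again a made-up example.

```python
import io
import struct

RECORD = struct.Struct("<I12s")  # hypothetical layout: key + 12-byte payload

def binary_search(f, key, record_count):
    """Binary search a file of fixed-length records sorted by key.

    Returns the relative record number (RRN) of the match, or None.
    """
    low, high = 0, record_count - 1
    while low <= high:
        mid = (low + high) // 2
        f.seek(mid * RECORD.size)              # only the positioning changes
        rec_key, _ = RECORD.unpack(f.read(RECORD.size))
        if rec_key == key:
            return mid
        if rec_key < key:
            low = mid + 1
        else:
            high = mid - 1
    return None

keys = (10, 20, 30, 40, 50)
demo = io.BytesIO(b"".join(RECORD.pack(k, b"") for k in keys))
```

The read-and-compare steps are the same as in a sequential search; only the choice of which record to visit next differs.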

Page 7: CS: Introduction to Record Manipulation & Indexing

Record Manipulation – Addition

New record gets added to the end.

• Insertion into middle of file is impractical.
• If there is an index, then we also perform an addition to the index (addition to the end of this list is infeasible – WHY?).

Page 8: CS: Introduction to Record Manipulation & Indexing

Addition with an Index - 1

1. New record gets added to the end.

[Diagram: RECORDS and INDEX]

Page 9: CS: Introduction to Record Manipulation & Indexing

Addition with an Index - 2

2. Locate place where index entry needs to go

[Diagram: RECORDS and INDEX]

Page 10: CS: Introduction to Record Manipulation & Indexing

Addition with an Index - 3

3. Insert New Index entry (it’s a record too)

[Diagram: RECORDS and INDEX]
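The three steps can be sketched as follows, assuming the index is small enough to hold in memory as a sorted list of `(key, rrn)` pairs. That representation and the record layout are assumptions for illustration only.

```python
import bisect
import io
import struct

RECORD = struct.Struct("<I12s")  # hypothetical layout: key + 12-byte payload

def add_record(f, index, key, payload):
    """Append a record, then insert its index entry in key order."""
    f.seek(0, io.SEEK_END)
    rrn = f.tell() // RECORD.size          # 1. new record goes at the end
    f.write(RECORD.pack(key, payload))
    # 2. + 3. locate the right place in the sorted index and insert there
    bisect.insort(index, (key, rrn))
    return rrn

data = io.BytesIO()
index = []                                 # sorted list of (key, rrn) pairs
for k in (30, 10, 20):
    add_record(data, index, k, b"x")
```

The records stay in arrival order (30, 10, 20) while the index ends up in key order, which is why the index entry cannot simply be appended.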

Page 11: CS: Introduction to Record Manipulation & Indexing

Records vs. Index: Assertions & Questions

• Moving file records is more expensive than moving index records.

• Should index be IN record file or its own file? (How do we maintain it?)

• If IN file: should it be at the beginning, end, middle, distributed?

• What if we are able to hold the index in memory?

• What if we can’t?

Page 12: CS: Introduction to Record Manipulation & Indexing

Record Manipulation - Deletion

• Locate record (Search)
• Mark space as deleted
• Remove index entry? (why or why not)

Page 13: CS: Introduction to Record Manipulation & Indexing

Deletion with an index - 1

1. Locate index entry

[Diagram: RECORDS and INDEX]

Page 14: CS: Introduction to Record Manipulation & Indexing

Deletion with an index - 2

1. Locate index entry

2. Locate record

[Diagram: RECORDS and INDEX]

Page 15: CS: Introduction to Record Manipulation & Indexing

Deletion with an index - 3

3. Delete (mark) record

[Diagram: RECORDS and INDEX]

Page 16: CS: Introduction to Record Manipulation & Indexing

Deletion with an index - 4

4. Delete (mark?) index entry

[Diagram: RECORDS and INDEX]
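One possible rendering of the four deletion steps is sketched below. It assumes each record starts with a one-byte "deleted" flag and that the index is an in-memory sorted list of `(key, rrn)` pairs; both are choices made for this example. It also removes the index entry outright, which is only one of the two options the slide leaves open.

```python
import bisect
import io
import struct

RECORD = struct.Struct("<?I12s")  # hypothetical: deleted flag, key, payload

def delete_record(f, index, key):
    """Mark the record as deleted and drop its index entry.

    `index` is a sorted list of (key, rrn) pairs. Returns the freed RRN,
    or None if the key is not in the index.
    """
    pos = bisect.bisect_left(index, (key, 0))      # 1. locate index entry
    if pos == len(index) or index[pos][0] != key:
        return None
    rrn = index[pos][1]
    f.seek(rrn * RECORD.size)                      # 2. locate record
    f.write(struct.pack("<?", True))               # 3. mark record deleted
    del index[pos]                                 # 4. remove index entry
    return rrn

data = io.BytesIO(b"".join(RECORD.pack(False, k, b"x") for k in (30, 10, 20)))
index = [(10, 1), (20, 2), (30, 0)]
freed = delete_record(data, index, 20)
```

The record bytes stay in the file; only the flag changes, so the file does not shrink.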

Page 17: CS: Introduction to Record Manipulation & Indexing

Record Manipulation - Modification

• Locate record
• Read record
• Modify record
• Re-write record (assuming fixed-size records – what if the record is now a different size? [see later])
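For fixed-size records the re-write can happen in place, as in this small sketch. The layout and the helper name `modify_record` are hypothetical.

```python
import io
import struct

RECORD = struct.Struct("<I12s")  # hypothetical layout: key + 12-byte payload

def modify_record(f, rrn, new_payload):
    """Rewrite the payload of the record at `rrn` without moving it."""
    offset = rrn * RECORD.size
    f.seek(offset)                                   # locate record
    key, _ = RECORD.unpack(f.read(RECORD.size))      # read record
    f.seek(offset)                                   # go back to its start
    f.write(RECORD.pack(key, new_payload))           # re-write in place

data = io.BytesIO(b"".join(RECORD.pack(k, b"old") for k in (1, 2, 3)))
modify_record(data, 1, b"new")
```

Because the record length never changes, neighbouring records and any index entries are untouched. Note that `struct` silently truncates a payload longer than 12 bytes, so a real implementation would want to check the length first.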

Page 18: CS: Introduction to Record Manipulation & Indexing

File Behaviour – 1 start

Record count = 9

Page 19: CS: Introduction to Record Manipulation & Indexing

File Behaviour – 2 add record

Record count = 10

Page 20: CS: Introduction to Record Manipulation & Indexing

File Behaviour – 3 add record

Record count = 11

Page 21: CS: Introduction to Record Manipulation & Indexing

File Behaviour – 4 delete

Record count = 10

Page 22: CS: Introduction to Record Manipulation & Indexing

File Behaviour – 5 delete

Record count = 9

Page 23: CS: Introduction to Record Manipulation & Indexing

File Behaviour – 6 add

Record count = 10

Page 24: CS: Introduction to Record Manipulation & Indexing

File Behaviour – 7 add

Record count = 11

Page 25: CS: Introduction to Record Manipulation & Indexing

File Behaviour – 8 add

Record count = 12

Page 26: CS: Introduction to Record Manipulation & Indexing

File Behaviour – 9 delete

Record count = 11

Page 27: CS: Introduction to Record Manipulation & Indexing

File Behaviour – 10 delete

Record count = 10

And so on…

Page 28: CS: Introduction to Record Manipulation & Indexing

What’s happening to the file?

• File grows – does not shrink (we get fragmentation)
• We end up covering more ground to do the same job
  • Q: If we are doing random access, why does it matter?
• The file system has less space to use (the fragmentation is internal from the perspective of the file system).
• Worst case = EVERY record access ends up costing us a seek.

Page 29: CS: Introduction to Record Manipulation & Indexing

Re-Using Space in the File [FLR]

• When there is a deletion, locate the last record in the file and move it to the free slot
– Costs:
  • Additional file access to locate (where will we remember where the last record is?) and retrieve last record.
  • Records will lose locality faster than if we simply mark the slot. (Why do we care?)
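A sketch of this "move the last record into the hole" approach, under the assumption that the file holds nothing but fixed-length records, so the last record can be found from the file size. The function name and layout are hypothetical.

```python
import io
import struct

RECORD = struct.Struct("<I12s")  # hypothetical layout: key + 12-byte payload

def delete_by_moving_last(f, rrn):
    """Fill the slot at `rrn` with the last record, then shrink the file."""
    f.seek(0, io.SEEK_END)
    last_rrn = f.tell() // RECORD.size - 1
    if last_rrn < 0 or rrn > last_rrn:
        raise IndexError("no such record")
    if rrn != last_rrn:
        f.seek(last_rrn * RECORD.size)     # extra access: fetch last record
        last = f.read(RECORD.size)
        f.seek(rrn * RECORD.size)
        f.write(last)                      # overwrite the deleted slot
    f.truncate(last_rrn * RECORD.size)     # file shrinks by one record
    return last_rrn                        # RRN the last record used to occupy

data = io.BytesIO(b"".join(RECORD.pack(k, b"") for k in (10, 20, 30, 40)))
moved_from = delete_by_moving_last(data, 1)
```

The file now holds keys 10, 40, 30: it has shrunk, but the record with key 40 has moved, so any index entry pointing at its old location has to be updated as well.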

Page 30: CS: Introduction to Record Manipulation & Indexing

Re-Using Space – Way 2

• Make a list of places where records have been deleted.

• When doing addition, check for empty ‘slot’ before placing new record at end.

Q: What about the index?

• When doing deletion, add location of deleted record to ‘free-list’

Page 31: CS: Introduction to Record Manipulation & Indexing

What does the Free-List look like?

All we need is the location. Order is unimportant.

[Diagram: RECORDS and INDEX]

Page 32: CS: Introduction to Record Manipulation & Indexing

How to decide which ‘slot’ to re-use?

• In FLR every slot will fit a new record.
• We can just take the first one – Free-List can then be maintained as a stack (which is easy).

• Do we keep Free-List information in the file?
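The stack idea might be sketched like this. The class `FixedFile`, its in-memory Free-List, and the record layout are all assumptions for illustration; the question of whether to persist the Free-List in the file is left open here, as it is on the slide.

```python
import io
import struct

RECORD = struct.Struct("<I12s")  # hypothetical layout: key + 12-byte payload

class FixedFile:
    """Fixed-length record file that recycles deleted slots."""

    def __init__(self):
        self.f = io.BytesIO()
        self.free = []                 # stack of free RRNs; order is irrelevant

    def add(self, key, payload):
        if self.free:
            rrn = self.free.pop()      # any slot fits, so take the top one
        else:
            self.f.seek(0, io.SEEK_END)
            rrn = self.f.tell() // RECORD.size
        self.f.seek(rrn * RECORD.size)
        self.f.write(RECORD.pack(key, payload))
        return rrn

    def delete(self, rrn):
        self.free.append(rrn)          # remember the slot for re-use

store = FixedFile()
slots = [store.add(k, b"") for k in (1, 2, 3)]
store.delete(1)
reused = store.add(4, b"")
appended = store.add(5, b"")
```

The fourth record lands in the recycled slot 1, and only once the stack is empty does the file grow again.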

Page 33: CS: Introduction to Record Manipulation & Indexing

Indexing – What is it?

• Table-of-contents for a file (directory)
• Uses keys
• Byte Offset (BO) vs Relative Record Number (RRN)
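For fixed-length records the two addressing schemes are interchangeable, as this small sketch suggests. The record size and the presence of a file header are assumed values chosen for the example.

```python
RECORD_SIZE = 64   # hypothetical fixed record length in bytes
HEADER_SIZE = 16   # hypothetical file header that precedes the first record

def rrn_to_byte_offset(rrn):
    """Byte offset of the record with this relative record number."""
    return HEADER_SIZE + rrn * RECORD_SIZE

def byte_offset_to_rrn(offset):
    """Inverse mapping; only valid for offsets on a record boundary."""
    rel = offset - HEADER_SIZE
    if rel < 0 or rel % RECORD_SIZE:
        raise ValueError("offset is not on a record boundary")
    return rel // RECORD_SIZE
```

With varying length records no such formula exists, so an index over them has to store byte offsets directly.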

Page 34: CS: Introduction to Record Manipulation & Indexing

Primary Key Properties:

• Unique
• Canonical
• Data-less
• Unchanging

Page 35: CS: Introduction to Record Manipulation & Indexing

Indexing – How does it Look?

• Must have:
– Key
– Way to locate record

• It is itself a structure containing ‘records’ (each index entry is a record)

• It may be separate from the main data or in the same file.

• It may be copied into memory for manipulation and only updated infrequently; or the file copy may be maintained as well.

[Diagram: INDEX]

Page 36: CS: Introduction to Record Manipulation & Indexing

Indexing – File Ops?

• Tied to records:
– If records added – new/update index entry
– If record deleted – ‘delete’ index entry
– If record modified – maybe no change to index; maybe update BO [byte offset]

Page 37: CS: Introduction to Record Manipulation & Indexing

Fixed-length vs Varying Length

• VLR provides greater flexibility.
• VLR increases maintenance overhead.
• VLR decreases wasted space. *
• VLR makes index virtually essential.
• VLR complicates Free-List maintenance.

* May simply waste space in a different place or a different way.

Page 38: CS: Introduction to Record Manipulation & Indexing

VLR Index

• Requires:
– Key
– Byte offset
– Record size? [optional]

[Diagram: RECORDS and INDEX]

Page 39: CS: Introduction to Record Manipulation & Indexing

VLR Search Operation

• Same as for FLR:
1. Locate key in index
2. Locate record in file

• Binary search still possible on index, but NOT on records alone.

[Diagram: RECORDS and INDEX]
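A sketch of the two-step search for varying length records follows. It assumes each record is stored with a 2-byte length prefix and that the index is an in-memory sorted list of `(key, byte_offset)` pairs; the slides do not prescribe either choice.

```python
import bisect
import io
import struct

LENGTH = struct.Struct("<H")   # hypothetical 2-byte length prefix per record

def append_vlr(f, index, key, payload):
    """Append a length-prefixed record and index it by byte offset."""
    f.seek(0, io.SEEK_END)
    offset = f.tell()
    f.write(LENGTH.pack(len(payload)) + payload)
    bisect.insort(index, (key, offset))        # index stays sorted by key

def find_vlr(f, index, key):
    """Binary search the index, then read the record at its byte offset."""
    pos = bisect.bisect_left(index, (key, 0))  # 1. locate key in index
    if pos == len(index) or index[pos][0] != key:
        return None
    f.seek(index[pos][1])                      # 2. locate record in file
    (size,) = LENGTH.unpack(f.read(LENGTH.size))
    return f.read(size)

data, index = io.BytesIO(), []
for key, payload in ((30, b"thirty"), (10, b"ten"), (20, b"twenty-ish")):
    append_vlr(data, index, key, payload)
```

The binary search works on the index because its entries are uniform and sorted; the record file itself has no predictable boundaries to jump to.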

Page 40: CS: Introduction to Record Manipulation & Indexing

VLR Deletion Operation - 1

Locate key

[Diagram: RECORDS and INDEX]

Page 41: CS: Introduction to Record Manipulation & Indexing

VLR Deletion Operation - 2

Locate record

[Diagram: RECORDS and INDEX]

Page 42: CS: Introduction to Record Manipulation & Indexing

VLR Deletion Operation - 3

Delete record

[Diagram: RECORDS and INDEX]

Page 43: CS: Introduction to Record Manipulation & Indexing

VLR Deletion Operation - 4

• Remember location of ‘slot’
• Remember size of slot.

[Diagram: RECORDS, INDEX, and Free-List]

Page 44: CS: Introduction to Record Manipulation & Indexing

VLR Deletion Operation - 5

5. Mark index entry

[Diagram: RECORDS, INDEX, and Free-List]

Page 45: CS: Introduction to Record Manipulation & Indexing

VLR Addition Operation – 1a

1. Search Free-List

[Diagram: RECORDS, INDEX, Free-List, and the New Record]

Page 46: CS: Introduction to Record Manipulation & Indexing

VLR Addition Operation – 1b

Too Big for first place

[Diagram: RECORDS, INDEX, Free-List, and the New Record]

Page 47: CS: Introduction to Record Manipulation & Indexing

VLR Addition Operation – 1c

Too Big for second place

[Diagram: RECORDS, INDEX, Free-List, and the New Record]

Page 48: CS: Introduction to Record Manipulation & Indexing

VLR Addition Operation – 1d

Too Big for third place

[Diagram: RECORDS, INDEX, Free-List, and the New Record]

Page 49: CS: Introduction to Record Manipulation & Indexing

VLR Addition Operation – 1e

Place at end of file

[Diagram: RECORDS, INDEX, Free-List, and the New Record]

Page 50: CS: Introduction to Record Manipulation & Indexing

VLR Addition Operation – 2a

Search Free-List

[Diagram: RECORDS, INDEX, Free-List, and the New Record]

Page 51: CS: Introduction to Record Manipulation & Indexing

VLR Addition Operation – 2b

Fits in first place… BUT…

[Diagram: RECORDS, INDEX, Free-List, and the New Record]

Page 52: CS: Introduction to Record Manipulation & Indexing

VLR Addition Operation – 2c

We will end up with left-over unused (and probably unusable) space.

We call this “First-Fit” (because we are using the first slot that we find that fits).

[Diagram: RECORDS, INDEX, Free-List, and the New Record]

Page 53: CS: Introduction to Record Manipulation & Indexing

VLR Addition Operation – 2d

If instead we keep looking…

We find the second entry is a better fit…

[Diagram: RECORDS, INDEX, Free-List, and the New Record]

Page 54: CS: Introduction to Record Manipulation & Indexing

VLR Addition Operation – 2e

The third slot does not fit, so…

[Diagram: RECORDS, INDEX, Free-List, and the New Record]

Page 55: CS: Introduction to Record Manipulation & Indexing

VLR Addition Operation – 2f

We decide to use the second slot.

It is the Best-Fit

[Diagram: RECORDS, INDEX, Free-List, and the New Record]

Page 56: CS: Introduction to Record Manipulation & Indexing

VLR Addition Operation – 2g

1. Insert record.
2. Delete Free-List entry.
3. Update Index

Notice the index entry is sorted differently.

What’s the advantage to leaving ‘spaces’ in the index?

[Diagram: RECORDS, INDEX, Free-List, and the New Record]

Page 57: CS: Introduction to Record Manipulation & Indexing

VLR Modification Operation - 1

• 2 kinds:
– 1. Mod results in record remaining same size
– 2. Mod results in record growing or shrinking.

Page 58: CS: Introduction to Record Manipulation & Indexing

VLR Modification Operation - 2

• Mod results in record remaining same size
– Same as for FLR

Page 59: CS: Introduction to Record Manipulation & Indexing

VLR Modification Operation - 3

• Mod results in record growing or shrinking.
– Treat Mod as a deletion followed by an addition.

Page 60: CS: Introduction to Record Manipulation & Indexing

Free-Lists

• May want to keep Free-List sorted.

• If the List is short it may not matter.

• Placement Strategies:
– First Fit
– Best Fit
– Worst Fit

• It could be its own list or we could make the regular index serve double-duty.
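The three placement strategies can be compared with a small sketch. It assumes the Free-List is an unsorted list of `(byte_offset, slot_size)` pairs; the sample offsets and sizes are made up for the example.

```python
def first_fit(free_list, size):
    """Index of the first slot large enough, or None."""
    for i, (_, slot_size) in enumerate(free_list):
        if slot_size >= size:
            return i
    return None

def best_fit(free_list, size):
    """Index of the smallest slot that is still large enough, or None."""
    fits = [(slot_size, i) for i, (_, slot_size) in enumerate(free_list)
            if slot_size >= size]
    return min(fits)[1] if fits else None

def worst_fit(free_list, size):
    """Index of the largest slot, provided it is large enough, or None."""
    fits = [(slot_size, i) for i, (_, slot_size) in enumerate(free_list)
            if slot_size >= size]
    return max(fits)[1] if fits else None

# Hypothetical free list of (byte_offset, slot_size) pairs.
free_list = [(100, 40), (300, 25), (520, 10), (700, 60)]
```

For a 20-byte record, first fit picks the 40-byte slot, best fit the 25-byte slot, and worst fit the 60-byte slot. First fit stops at the first match, while the other two scan the whole list unless it is kept sorted by size, which is one reason to consider a sorted Free-List.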

Page 61: CS: Introduction to Record Manipulation & Indexing

Summary

• Managing space inside the file is our business.
• We must choose:
– FLR / VLR?
– Index? (what kind?)
– Secondary indices?
– Re-claim free space? How?