Appendix C File Organization & Storage Structure.

Page 1: Appendix C File Organization & Storage Structure.

Appendix C

File Organization & Storage Structure

Page 2: Appendix C File Organization & Storage Structure.

Agenda

• Definition

• Types of File Organization

Page 3: Appendix C File Organization & Storage Structure.

Definition

• Logical record & physical record

• File organization

• Access method

Page 4: Appendix C File Organization & Storage Structure.

Types of File Organization

• Heap (unordered)

• Sequential (ordered or sorted)

• Hash (direct or random)

• Index

Page 5: Appendix C File Organization & Storage Structure.

Heap

• Unordered structure

• Pros

– Simple

– No overhead

• Cons

– Slow retrieval (linear search)

– Wastes space (deleted records leave unused slots)

• Suitable for

– Bulk-loaded data

– Short files

– Retrieving 80% of the file
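A minimal sketch of the trade-off listed above, assuming an in-memory list stands in for the file's record slots. `HeapFile` and its method names are hypothetical, chosen only for illustration.

```python
class HeapFile:
    """Toy heap (unordered) file: records are kept in arrival order."""

    def __init__(self):
        # One entry per record slot; None marks a deleted record.
        self.slots = []

    def insert(self, record):
        # New records are appended; no ordering is maintained (simple, no overhead).
        self.slots.append(record)

    def find(self, field, value):
        # Linear search: in the worst case every slot is examined.
        for record in self.slots:
            if record is not None and record[field] == value:
                return record
        return None

    def delete(self, field, value):
        # Deletion only marks the slot, so the space stays wasted
        # until the file is reorganized.
        for i, record in enumerate(self.slots):
            if record is not None and record[field] == value:
                self.slots[i] = None
                return True
        return False


heap = HeapFile()
for staff_no in ("SG37", "SA9", "SL21"):
    heap.insert({"staff_no": staff_no})
heap.delete("staff_no", "SA9")
```

After the deletion the file still occupies three slots, which is the wasted space the slide refers to.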

Page 6: Appendix C File Organization & Storage Structure.

Ordered

• Sorted according to a field value or primary key field

• Pros

– Binary search

– Sequential processing

• Con

– Slow retrieval of information needed by management
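A short sketch of binary search on an ordered file, assuming the sorted key values are available as a Python list with the records in a parallel list. `find_record`, `keys`, and `records` are hypothetical names.

```python
import bisect


def find_record(sorted_keys, records, key):
    """Binary search on the ordering field; returns the record or None."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return records[i]
    return None


keys = [5, 12, 19, 27, 33]            # file sorted on the key field
records = [{"id": k} for k in keys]   # records stored in the same order
```

Each lookup halves the remaining search range, so it needs on the order of log2(N) probes instead of the N a heap file may need.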

Page 7: Appendix C File Organization & Storage Structure.

Hash

• Terminology

– Hash field, hash key

– Collision, synonyms

– Bucket, slots

• Types

– Folding

– Division-remainder

• Collision handling

– Open addressing or unchained overflow

– Chained overflow

– Multiple hashing
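A minimal sketch of the two hash function types and of chained overflow, assuming non-negative integer keys. All names are hypothetical, the folding variant shown (summing two-digit groups) is only one of several possible, and each bucket's Python list is a simplification standing in for a bucket plus its overflow chain.

```python
def division_remainder(key, n_buckets):
    # Bucket address = remainder of the key divided by the number of buckets.
    return key % n_buckets


def folding(key, n_buckets, width=2):
    # Split the key's digits into groups of `width`, add the groups,
    # then reduce the sum to a valid bucket address.
    digits = str(key)
    parts = [int(digits[i:i + width]) for i in range(0, len(digits), width)]
    return sum(parts) % n_buckets


class ChainedHashFile:
    """Toy hash file: synonyms are kept on a chain attached to their bucket."""

    def __init__(self, n_buckets):
        self.n_buckets = n_buckets
        self.buckets = [[] for _ in range(n_buckets)]

    def insert(self, key, record):
        address = division_remainder(key, self.n_buckets)
        self.buckets[address].append((key, record))

    def find(self, key):
        address = division_remainder(key, self.n_buckets)
        for stored_key, record in self.buckets[address]:
            if stored_key == key:
                return record
        return None


hash_file = ChainedHashFile(7)
hash_file.insert(15, "record A")
hash_file.insert(22, "record B")   # 15 and 22 both hash to bucket 1: synonyms
```

Keys 15 and 22 collide under division-remainder with 7 buckets, so both end up on the chain for bucket 1 and a lookup walks that chain.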

Page 8: Appendix C File Organization & Storage Structure.

Direct (Random or Hash)

• Pro

– Fast random processing

• Cons

– Inefficient sequential processing

– Updating (requires reorganization)

Page 9: Appendix C File Organization & Storage Structure.

Indexes

• Terminology

– Primary index (one for each file)

– Secondary index for a unique or non-unique field (several for each file)

– Clustering index for a clustering attribute (non-key or non-unique field)

– Sparse index: entries for only some of the search key values

– Dense index: an entry for every search key value

• Types

– Linked list

– Inverted file

– Indexed sequential

– B+-tree
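A small sketch contrasting sparse and dense indexes, assuming a data file of three blocks sorted on the search key. The data, `sparse_lookup`, and `dense_lookup` are hypothetical and exist only to illustrate the two terms.

```python
import bisect

# Data file: blocks of (key, value) records, sorted on the search key.
blocks = [
    [(3, "a"), (7, "b"), (9, "c")],
    [(12, "d"), (15, "e"), (18, "f")],
    [(21, "g"), (25, "h"), (30, "i")],
]

# Sparse index: one entry per block (the first key in that block).
sparse_index = [block[0][0] for block in blocks]

# Dense index: one entry per search key value, pointing at (block, slot).
dense_index = {
    key: (b, s)
    for b, block in enumerate(blocks)
    for s, (key, _) in enumerate(block)
}


def sparse_lookup(key):
    # Find the last block whose first key is <= key, then scan that block.
    b = bisect.bisect_right(sparse_index, key) - 1
    if b < 0:
        return None
    for stored_key, value in blocks[b]:
        if stored_key == key:
            return value
    return None


def dense_lookup(key):
    # The index itself tells us exactly where the record is.
    position = dense_index.get(key)
    if position is None:
        return None
    b, s = position
    return blocks[b][s][1]
```

The sparse index here has 3 entries against 9 for the dense index, at the cost of scanning one block per lookup.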

Page 10: Appendix C File Organization & Storage Structure.

Indexed Sequential

• Structure

– Prime area

– Index area: track number, highest key on the track, highest key in the overflow, address of the first overflow record

– Overflow area: address, record, pointer

• Types

– Indexed Sequential Access Method (ISAM)

– Virtual Storage Access Method (VSAM)

• Pro

– Sequential & random processing

• Cons

– Wastes space (deletion)

– Inefficient due to overflow

Page 11: Appendix C File Organization & Storage Structure.

B+-Tree

• Terminology

– Node

– Root

– Parent

– Child

– Leaf

– Depth: the maximum number of levels

– Balanced tree

– Degree or order (n): the maximum number of children

• Rules

– The root (if it is not a leaf) has at least two children

– Each node other than the root and the leaves has between n/2 and n pointers (children); n/2 is rounded up

– A leaf holds between (n-1)/2 and (n-1) key values; (n-1)/2 is rounded up

– The number of key values in a non-leaf node is 1 less than the number of pointers

– The tree is always balanced

– Key values in the leaves are kept in order
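A small worked example of the occupancy rules for a tree of order n, assuming the common textbook convention that n/2 and (n-1)/2 are rounded up when they are not integers. `bplus_bounds` is a hypothetical helper, not part of any library.

```python
import math


def bplus_bounds(n):
    """Occupancy limits for a B+-tree of order n (assumes rounding up)."""
    return {
        # Pointers (children) in a node other than the root or a leaf.
        "internal_pointers": (math.ceil(n / 2), n),
        # Key values in a leaf node.
        "leaf_keys": (math.ceil((n - 1) / 2), n - 1),
        # A non-leaf node holds one key fewer than it has pointers.
        "max_internal_keys": n - 1,
    }


order_4 = bplus_bounds(4)
order_5 = bplus_bounds(5)
```

For order 4 this gives 2 to 4 pointers per internal node and 2 to 3 key values per leaf; for order 5 it gives 3 to 5 pointers and 2 to 4 key values.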

Page 12: Appendix C File Organization & Storage Structure.

Points to Remember

• Definition

• Types of File Organization

Page 13: Appendix C File Organization & Storage Structure.

Assignment

• Review chapter 1 & appendix C

• Read chapter 2

• Group list due date: 9/18/07

• Homework due date: