CS102-01 Intro to File Org

Transcript of CS102-01 Intro to File Org

Page 1: CS102-01 Intro to File Org

CS 102 File Structures & File Organizations

Chapter 01

Introduction to File Organizations

Page 2: CS102-01 Intro to File Org

CJD

Files Terminology

File – a collection of related data

examples : student records, payroll file

Entity – a data item of interest

example : student or employee

Fields or Columns – data related to an entity or object

examples : last name, first name, gender, birth date, (degree program or department), (year level or job title), etc.

Page 3: CS102-01 Intro to File Org

Files Terminology

Record or Row – a grouped collection of fields about an entity or object

examples: fields about a student or employee

Key – one or more fields chosen to identify a record uniquely. Files are usually ordered according to key values.

examples: student or employee number, or the combination of last name, first name and birth date.
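The terms above can be sketched in code. This is only an illustration: the `Student` class, its field names, and the sample values are made up for the example, and the student number is assumed to be the key.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Student:
    """One record (row): the fields describing a single student entity."""
    student_number: str  # assumed key field: identifies the record uniquely
    last_name: str
    first_name: str
    birth_date: date
    degree_program: str
    year_level: int

    def key(self):
        return self.student_number


# A "file" here is just a collection of related records (made-up sample data).
students = [
    Student("2021-0002", "Santos", "Maria", date(2003, 5, 1), "BS CS", 2),
    Student("2021-0001", "Cruz", "Juan", date(2002, 1, 15), "BS CE", 2),
]

# Files are usually ordered according to key values.
students_by_key = sorted(students, key=Student.key)
```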

Page 4: CS102-01 Intro to File Org

Why External Storage?

collection of data is usually too large to fit in main memory

only the small portion of the file being accessed at a given time needs to be in internal memory

file needs to be stored permanently in non-volatile storage for access by several programs

Preference : Contiguous storage of data

provides quicker access to information.

Page 5: CS102-01 Intro to File Org

Our Objective

Objective :

Investigate structures used to organize large collections of data into files stored on secondary storage devices.

File Design goals :

Efficiency of storage and access.

Many factors affect file design.

Page 6: CS102-01 Intro to File Org

Factors of File Design

File Organization – how records are stored

File Type – role of the file in an information system

File Characteristics – activity and volatility of the file

File Manipulations – operations to access the file and keep it current

Page 7: CS102-01 Intro to File Org

File Organizations

the way in which records are stored in an external file.

the data structures used for organizing the data

Common File Organizations

1. Sequential

2. Random

3. Indexed Sequential

4. Multikey

Page 8: CS102-01 Intro to File Org

Sequential File Organization

Records are stored and accessed consecutively in sequence from beginning to end

Records are usually in ascending or descending order by the key field

On average, half the records must be accessed to locate a record of interest

The entire file must usually be copied in order to update it

Records may be fixed length or of varying length
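The "half the records on average" claim can be checked with a small sketch. It assumes an in-memory list of records sorted in ascending key order standing in for the file; `sequential_search` is a hypothetical helper, not part of any library.

```python
def sequential_search(records, target_key):
    """Scan records from the beginning; return (record or None, records accessed)."""
    accesses = 0
    for record in records:
        accesses += 1
        if record["key"] == target_key:
            return record, accesses
        if record["key"] > target_key:
            # File is in ascending key order, so the target cannot appear later.
            break
    return None, accesses


# A stand-in "file" of 100 records with keys 1..100.
records = [{"key": k} for k in range(1, 101)]

# Search once for every key that exists and average the number of accesses.
total = sum(sequential_search(records, k)[1] for k in range(1, 101))
average_accesses = total / len(records)  # (n + 1) / 2 for successful searches
```

With n records and every key equally likely to be requested, a successful search reads (n + 1) / 2 records on average, which is roughly half the file.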

Page 9: CS102-01 Intro to File Org

Random File Organization

Allows access of a record without sequential search through the file

There is a relationship between each record’s key and its location in an external file

Directly computes the location of the record using key value

There may be no relationship between the logical ordering and the physical ordering of the file.

Page 10: CS102-01 Intro to File Org

Relative Files

an implementation of random file organization with fixed record lengths.

Each record in secondary storage is assigned a record number which designates its relative position with respect to the beginning of the file.

Page 11: CS102-01 Intro to File Org

Relative Files

The first record may be record number 0 or 1 depending on implementation.

record address = address of beginning of file + (relative record number x fixed record length)

(this form assumes the first record is number 0; if numbering starts at 1, use relative record number - 1)

can be updated in place

can be accessed sequentially in the physical order of the records in storage, but that order is usually not meaningful.
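The address calculation and update in place can be sketched as follows. The record layout (a 4-byte integer plus a 20-byte name), the helper names, and the sample data are assumptions made for the example; records are numbered from 0 and an in-memory `io.BytesIO` buffer stands in for the external file.

```python
import io
import struct

# Hypothetical fixed-length layout: 4-byte little-endian int id + 20-byte name.
RECORD = struct.Struct("<i20s")  # RECORD.size is 24 bytes


def record_offset(rrn, record_length=RECORD.size, base=0):
    """Offset of a record: beginning of file + relative record number * length."""
    return base + rrn * record_length


def write_record(f, rrn, rec_id, name):
    f.seek(record_offset(rrn))
    f.write(RECORD.pack(rec_id, name.encode("ascii")))  # name is padded with NULs


def read_record(f, rrn):
    f.seek(record_offset(rrn))
    rec_id, raw_name = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw_name.rstrip(b"\x00").decode("ascii")


f = io.BytesIO()  # stands in for a file opened in binary read/write mode
for rrn, (rec_id, name) in enumerate([(101, "Cruz"), (102, "Santos"), (103, "Reyes")]):
    write_record(f, rrn, rec_id, name)

# Update in place: only record 1 is rewritten, the rest of the file is untouched.
write_record(f, 1, 102, "Santos-Lim")
second = read_record(f, 1)
```

Because every record has the same length, any record can be reached with a single seek, without reading the records before it.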

Page 12: CS102-01 Intro to File Org

Indexed Sequential File Organization

a hybrid of sequential and random file organizations

Records may be variable length.

Records are grouped into blocks.

Blocks are fixed length.

Records in a block are not necessarily ordered but usually are.

Each block is stored in a contiguous location.

The blocks are organized into a relative file ordered by the key of a representative record from each block.

Page 13: CS102-01 Intro to File Org

Indexes

there is a hierarchical structure of record keys and relative block numbers called an index

To retrieve a record,

the index is used to find the relative block number of the block containing the record,

then the relative file of blocks is accessed randomly

then the block is searched sequentially for the record
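The three retrieval steps can be sketched as below. This is a simplification under stated assumptions: the index has a single level rather than a hierarchy, the highest key in each block serves as its representative key, and the blocks are held in a Python list instead of a relative file. The data and the `lookup` function are made up for the example.

```python
from bisect import bisect_left

# Each block holds a few (key, data) records; blocks are in ascending key order.
blocks = [
    [(3, "C"), (7, "G"), (10, "J")],
    [(14, "N"), (18, "R"), (21, "U")],
    [(25, "Y"), (30, "AD")],
]

# Index: representative key per block (assumed here to be the highest key),
# where the position in the list is the relative block number.
index_keys = [block[-1][0] for block in blocks]  # [10, 21, 30]


def lookup(key):
    # Step 1: search the index for the first block whose highest key >= key.
    block_number = bisect_left(index_keys, key)
    if block_number == len(blocks):
        return None  # key is larger than every key in the file
    # Step 2: access that block directly by its relative block number.
    block = blocks[block_number]
    # Step 3: search the block sequentially for the record.
    for record_key, data in block:
        if record_key == key:
            return data
    return None
```

For example, `lookup(18)` consults the index, goes straight to block 1, and scans only that block's three records.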

Page 14: CS102-01 Intro to File Org

Indexes

the index can specify the order in which the file should be accessed sequentially by record keys.

access is nearly as fast as with random file organization, but somewhat slower because of the index search

Page 15: CS102-01 Intro to File Org

Multikey File Organization

allows multiple ways to access the file by several different key fields

uses indexing structure with indexes for each key field
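A minimal sketch of the idea, assuming the records sit in a list and each index is a dictionary from a field value to the positions of the matching records. The field names, sample records, and the `build_index` and `find` helpers are hypothetical.

```python
from collections import defaultdict

records = [
    {"student_number": "S001", "last_name": "Cruz", "program": "Engineering"},
    {"student_number": "S002", "last_name": "Santos", "program": "Science"},
    {"student_number": "S003", "last_name": "Cruz", "program": "Science"},
]


def build_index(records, field):
    """Map each value of `field` to the positions of the records that hold it."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[field]].append(position)
    return index


# One index per key field, all pointing into the same set of records.
indexes = {
    field: build_index(records, field)
    for field in ("student_number", "last_name", "program")
}


def find(field, value):
    """Retrieve records through the index for `field` instead of scanning the file."""
    return [records[position] for position in indexes[field].get(value, [])]
```

The same records can now be reached by student number, by last name, or by program, at the cost of maintaining one index per key field.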

Page 16: CS102-01 Intro to File Org

File Types

Six File Types according to functionality :

1. Master File – records of relatively permanent data that are updated occasionally

major collection of data pertaining to a specific application

usually stored on disks, and nowadays rarely on tape

Example : Bank accounts file, accounts balances file

Page 17: CS102-01 Intro to File Org

File Types

2. Transaction File – records of operations applied to master file

Example : new account opening file, Deposits/Withdrawals file

3. Table File – records used for lookup

Example : Interest rates file, minimum balance requirement file, account type description file

Page 18: CS102-01 Intro to File Org

File Types

4. Report File – information prepared for the user. May be printed or displayed on-screen.

Example : Summary of accounts, Error and Audit listings of a maintenance run

5. Control File – summary of maintenance run

Example contents : run date, maintenance statistics

6. History File – backup of master, transaction and control files from past runs.

Page 19: CS102-01 Intro to File Org

File Characteristics

Usage characteristics of a file :

file size – consider initial and future file sizes to determine storage that can accommodate file.

activity – percentage of master records updated during a maintenance run

a high-activity file is handled more efficiently with sequential file organization

a low-activity file is handled more efficiently with an organization that offers random access and allows update in place

Page 20: CS102-01 Intro to File Org

File Characteristics

volatility – number of records added and deleted compared to original number of records.

high volatility – best updated using merge procedures. Non-sequential files have high overhead to reorganize as a result of these updates.

frequency of use – updating a sequential file on an hourly basis takes more total time than updating it daily or monthly, so sequential files are poorly suited to frequent use

required response time – real-time access requires random access
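Activity and volatility can be turned into numbers for a given maintenance run. The sketch below follows the definitions above; expressing volatility as a percentage, the function names, and the figures used are assumptions made for the example.

```python
def activity(records_updated, master_records):
    """Percentage of master records updated during a maintenance run."""
    return 100.0 * records_updated / master_records


def volatility(records_added, records_deleted, original_records):
    """Records added and deleted, as a percentage of the original record count."""
    return 100.0 * (records_added + records_deleted) / original_records


# Hypothetical run against a 1000-record master file.
run_activity = activity(records_updated=150, master_records=1000)
run_volatility = volatility(records_added=30, records_deleted=20, original_records=1000)
```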

Page 21: CS102-01 Intro to File Org

File Manipulations : Queries

1. Queries

searching for records whose values meet given criteria

the types of queries to be performed can affect file design

Examples :

List the record of student Juan De La Cruz

List all students enrolled in more than 20 units

List all second-year level Engineering students

Count all Engineering students per major with an average of 1.5 or better
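The first three example queries can be sketched as filters over a list of records. The field names, the sample records other than the name taken from the example, and the `query` helper are assumptions made for illustration; a full scan is used here, whereas a real file design would pick an organization that suits the expected queries.

```python
students = [
    {"name": "Juan De La Cruz", "college": "Engineering", "year_level": 2, "units": 21},
    {"name": "Maria Santos", "college": "Engineering", "year_level": 3, "units": 18},
    {"name": "Jose Reyes", "college": "Science", "year_level": 2, "units": 24},
]


def query(records, predicate):
    """Return the records whose values meet the criteria tested by `predicate`."""
    return [record for record in records if predicate(record)]


# List the record of student Juan De La Cruz.
by_name = query(students, lambda s: s["name"] == "Juan De La Cruz")

# List all students enrolled in more than 20 units.
heavy_load = query(students, lambda s: s["units"] > 20)

# List all second-year level Engineering students.
second_year_eng = query(
    students, lambda s: s["college"] == "Engineering" and s["year_level"] == 2
)
```

The first query selects on a single identifying value, while the others select on non-key fields, which is why the mix of queries influences the choice of file organization and indexes.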

Page 22: CS102-01 Intro to File Org

File Manipulations : Merging

2. Merging

combining data extracted from two or more files

Examples :

File of Engineering students is usually separate from file of grades

Counting all Engineering students per major with an average of 1.5 or better requires merging these files.
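A sketch of that merge, assuming both files are already sorted by student number and fit in lists. The sample data and the `count_by_major` function are made up, and the comparison assumes a grading scale where a lower number is a better grade, so "1.5 or better" is read as an average of 1.5 or lower.

```python
from collections import Counter

# Student file: (student number, major), sorted by student number.
students = [("S001", "Civil"), ("S002", "Electrical"), ("S003", "Civil")]

# Grades file: (student number, grade), sorted by student number.
grades = [
    ("S001", 1.25), ("S001", 1.5),
    ("S002", 2.0), ("S002", 1.75),
    ("S003", 1.0), ("S003", 1.5),
]


def count_by_major(students, grades, cutoff=1.5):
    """Merge the two sorted files on student number and count qualifying students."""
    counts = Counter()
    g = 0  # current position in the grades file
    for number, major in students:
        # Skip grade records that belong to no student in the student file.
        while g < len(grades) and grades[g][0] < number:
            g += 1
        total, n = 0.0, 0
        # Consume all grade records for the current student.
        while g < len(grades) and grades[g][0] == number:
            total += grades[g][1]
            n += 1
            g += 1
        # Assumption: lower grade is better, so "or better" means <= cutoff.
        if n and total / n <= cutoff:
            counts[major] += 1
    return dict(counts)


result = count_by_major(students, grades)
```

Each file is read once from beginning to end, which is what makes merging a natural fit for sequential files ordered on the same key.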

Page 23: CS102-01 Intro to File Org

File Manipulations : Maintenance

3. Maintenance

updating master file with transaction file to keep the master file up-to-date

possible transaction or update codes could be

A = addition of a new master file record

C = changing values of fields of master file records

D = deletion of an existing record from master file

Page 24: CS102-01 Intro to File Org

File Manipulations : Maintenance

file maintenance program merges transaction and master files

if a transaction refers to a non-existent master file record, it must be an addition transaction

if a transaction refers to an existing master file record, it must be a change or deletion

if no transactions refer to a master file record, no update on that record is made

For sequential files, transactions must be ordered in the same key sequence as the master file so they may be merged. A new master file is created by the maintenance program.
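The rules above can be sketched as a single merge pass that writes a new master file. Assumptions made for the example: both inputs are lists sorted by key, there is at most one transaction per key, and transactions that break the rules are collected in an error list (the kind of material an error listing would report). The `update_master` function and its data shapes are hypothetical.

```python
def update_master(master, transactions):
    """Merge sorted (key, data) master records with sorted (key, code, data)
    transactions; return (new_master, errors). Codes: A=add, C=change, D=delete."""
    new_master, errors = [], []
    m = t = 0
    while m < len(master) or t < len(transactions):
        if t == len(transactions) or (
            m < len(master) and master[m][0] < transactions[t][0]
        ):
            # No transaction refers to this master record: copy it unchanged.
            new_master.append(master[m])
            m += 1
        elif m == len(master) or transactions[t][0] < master[m][0]:
            # Transaction refers to a non-existent record: only an addition is valid.
            key, code, data = transactions[t]
            if code == "A":
                new_master.append((key, data))
            else:
                errors.append(transactions[t])
            t += 1
        else:
            # Keys match: only a change or a deletion is valid.
            key, code, data = transactions[t]
            if code == "C":
                new_master.append((key, data))
            elif code == "D":
                pass  # deleted: the record is not copied to the new master
            else:
                errors.append(transactions[t])
                new_master.append(master[m])
            m += 1
            t += 1
    return new_master, errors


master = [(1, "a"), (3, "c"), (5, "e")]
transactions = [(2, "A", "b"), (3, "C", "c2"), (5, "D", None), (7, "D", None)]
new_master, errors = update_master(master, transactions)
```

The old master file is left untouched and a complete new one is produced, which matches the point that a sequential file is usually copied in full in order to update it.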

Page 25: CS102-01 Intro to File Org

File Manipulations : Multifile

4. Multifile Information Systems

uses many master files

designed to reduce duplication of information

Examples :

The last name and first name need not be stored with each grade of a student.

Only the key field “student number” is in both student file and grades file to link corresponding records.

Page 26: CS102-01 Intro to File Org

File Manipulations : Multifile

4. Multifile Information Systems : DBMS

A database management system (DBMS) is a multifile information system

DBMS is outside the scope of this course and is the subject of a different course.

Page 27: CS102-01 Intro to File Org

End