CPS510 Database System Design Primitive SYSTEM STRUCTUREmcarberr/cps510/06CPS510Notes... ·...


Page 1: CPS510 Database System Design Primitive SYSTEM STRUCTUREmcarberr/cps510/06CPS510Notes... · 2006-09-21 · CPS510 Database System Design Primitive SYSTEM STRUCTURE Application Programs

CPS510 Database System Design

Primitive SYSTEM STRUCTURE

[Figure: system structure diagram. Users: Naïve Users (Application Interfaces), Application Programmers (Application Programs), Sophisticated Users (Query), Database Administrator, DBA (Data Base Schema). Database Management System: Data Manipulation Language (DML) Pre-compiler, Application Programs Object Code, Query Processor, Data Definition Language (DDL) Compiler, Data Base Manager, Other Services. Operating System: File Manager, Memory Manager, Other OS Services. Disk Storage - Physical: Data Files, Data Dictionary, Other Files.]


Introduction

A database can be defined as a set of master files organized and administered in a flexible way, so that the files in the database can be easily adapted to new, unforeseen tasks.

• Relational Database Management Systems

• Hierarchical Database Management Systems

• Network Database Management Systems

• Inverted Files DBMS

There are three forms of database description, following the ANSI/SPARC (1975) architecture:

1. Conceptual Schema (Conceptual View)

Machine-and-software independent description of the total database. It is also referred to as a logical database.

2. Internal Schema (Internal View)

Description of the physical database. It is close to the machine level and describes such things as file organization and access paths.

3. External Schema (User View)

User-oriented description of part of the database. It corresponds to the way in which a program needs to "view" the database. Since there are many purposes to which the data in a database may be put, there will be many different external schemas (or schemata), corresponding to different programs interpreting the database in particular ways.

A DBMS is software used to read, write, and maintain the data and to provide, among other things, security and integrity for the data.


Facilities

• Standard methods used by DBMSs to implement relationships
• Data dictionary, docs, etc.
• Data independence (i.e. the possibility of changing the physical database without having to make alterations to old programs that operate on the database). Example: consider the file:

CODE   NAME   ADDRESS   TELEPHONE   SALARY

• Database languages (QUEL, SQL, etc.)
• Report generators/screen generators
• Recovery facilities
• Concurrency facilities
• File protection
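Data independence can be illustrated with a small sketch. It uses Python's built-in sqlite3 module rather than any system named in these notes, and the table name EMPLOYEE, the added DEPARTMENT field, the index name, and the sample row are all assumptions made up for the illustration. The "old program" asks for fields by name, so it should keep working after the stored layout changes.

```python
import sqlite3

def old_program(conn):
    """An 'old' program: it asks for fields by name, not by storage position."""
    return conn.execute("SELECT CODE, NAME, SALARY FROM EMPLOYEE").fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (CODE TEXT, NAME TEXT, ADDRESS TEXT, "
    "TELEPHONE TEXT, SALARY INTEGER)"
)
# Hypothetical sample row, invented for this sketch.
conn.execute(
    "INSERT INTO EMPLOYEE VALUES ('E1', 'Smith', '1 Main St', '555-0100', 40000)"
)
before = old_program(conn)

# Change the stored layout: add a field and an index (a new access path).
conn.execute("ALTER TABLE EMPLOYEE ADD COLUMN DEPARTMENT TEXT")
conn.execute("CREATE INDEX EMP_NAME ON EMPLOYEE (NAME)")
after = old_program(conn)

print(before == after)
```

The old program is not edited between the two calls; only the file's description changes.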

Elements of the Conceptual Schema

Definition: A conceptual schema may be thought of as a model of the enterprise using it.

The Description of the Conceptual Schema may include a description of:

1. The kinds of logical files and record types comprising the db.

2. The fields included in the record type

3. The relationships between the different record types of the db.

4. Any limitations on the values that can be taken by individual fields, as well as constraints upon the relationships between records.


Entities/Attributes

• An entity is an item or a concept in the real world about which we want to report data.
• The data associated with an entity are called attributes (field names in a record).
• An entity type is a classification of all the entities describing the same type of item or concept from the real world.
• A description of an entity type in the conceptual database includes:
    • A unique name for each entity type.
    • A field containing a unique identifier for each individual entity, called a key.
    • A description of all attributes or fields of the entity type.
    • An indication of the number of occurrences of an entity type the database will be required to hold, known as the cardinal number of the entity type.

Attributes of an entity may be:

• Mandatory or optional, depending on whether fields should or should not always have a value.
• Single-valued or multi-valued. Multi-valued attributes are stored in repeating fields.
• Aggregate or simple, corresponding to whether an attribute is formed from a combination of other attributes or NOT.

Relationships

• Relationship: used to describe a connection between entities.
• Relation: used to designate a logical table describing a set of similar entities.

(File = Relation = Table)


Data Models

A data model is a method used to describe the entity types and relationships of the conceptual database.

• E-R: Entity/Relationship (or E-R-A) diagrams (Frank (1971), Chen (1976)). The E-R model is used for top-down analysis of new systems.
• Bachman Diagrams (1969), or Data Structure Diagrams.
• Relational Model (see the paradigm). The Relational Model is used for bottom-up analysis of existing relations.
• User View Diagrams, also used for bottom-up analysis.

Examples

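[Figure: example diagrams. E-R: an entity E with a relationship r; entities E1 and E2 connected by a relationship r with cardinalities n:m, 1:m, and 1:1; a relationship r of order 3 connecting E1, E2, and E3. Bachman: a tree (1:m) and a graph/network (n:m).]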


The Design of the Conceptual Schema

[Figure: stages in the design of the conceptual schema.
Real World, User Requirements.
Stage 1, Choice of a Model: E-R design of the DB (conceptual schema independent of a concrete DBMS).
Stage 2, Normalization: adjusted diagrams and normalized Relational model.
Stage 3, Optimization: logical optimization and adjustments to a concrete DBMS.
Result: Data Dictionary, Database Description.]


Normalization

Normal forms

• Formulate constraints on the structures of tables in the database, to obtain a logical database that is like the "External world".
• Applying different sets of constraints results in differently structured tables and normal forms.
• Terminology:
    • Relation = table = file
    • Tuple = record
    • Attribute = a field
    • Domain = field's value area

Example: A sales person file (before normalization)

SNR  City      Code  PNR  Qty   PNR  Qty   PNR  Qty   PNR  Qty   PNR  Qty
S1   Athens    10    ---  ---
S2   Toronto   30    P1   200   P2   100
S4   Kingston  20    P5   200   P8   100
S5   Toronto   30    P1   50    P3   500   P4   800   P5   500   P8   1000

1NF (First Normal Form - FNF)

• Each tuple must contain a unique identifier.
• Tuples have only atomic values in their attributes (i.e. repeating groups are excluded).


The file after normalization (1NF):

SNR  City      Code  PNR  Qty
S1   Athens    10    ---  ---
S2   Toronto   30    P1   200
S2   Toronto   30    P2   100
S4   Kingston  20    P5   200
S4   Kingston  20    P8   100
S5   Toronto   30    P1   50
S5   Toronto   30    P3   500
S5   Toronto   30    P4   800
S5   Toronto   30    P5   500
S5   Toronto   30    P8   1000
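The step from the file with repeating groups to the 1NF table above can be sketched in a few lines of Python. The nested-list representation and the function name to_1nf are assumptions chosen for this illustration; the values are those of the 1NF table, with the "---" entries for S1 represented as None.

```python
# Unnormalized sales person records: each holds a repeating group of (PNR, Qty).
unnormalized = [
    ("S1", "Athens", 10, []),
    ("S2", "Toronto", 30, [("P1", 200), ("P2", 100)]),
    ("S4", "Kingston", 20, [("P5", 200), ("P8", 100)]),
    ("S5", "Toronto", 30,
     [("P1", 50), ("P3", 500), ("P4", 800), ("P5", 500), ("P8", 1000)]),
]

def to_1nf(records):
    """Emit one row per (sales person, part) so that every field is atomic."""
    rows = []
    for snr, city, code, parts in records:
        if not parts:
            # Keep a sales person with no parts, as the S1 row does.
            rows.append((snr, city, code, None, None))
        for pnr, qty in parts:
            rows.append((snr, city, code, pnr, qty))
    return rows

rows_1nf = to_1nf(unnormalized)
for row in rows_1nf:
    print(row)
```

Note that City and Code are now repeated on every row of a sales person; the later normal forms deal with that redundancy.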

Functional Dependency

• Assume that X and Y are fields in the same record (e.g. SC and City).
• Field Y is said to be functionally dependent on field X if and only if, for all pairs of records in the file, records that have the same value in field X also have the same value in field Y.
• Field Y is said to be fully functionally dependent on X if Y is functionally dependent on X and NOT functionally dependent on any proper subset of X's subfields.
• A field on which another field is fully functionally dependent is called the determinant for that field.
• Field Y is said to be transitively functionally dependent on X if there is some other field Z such that X determines the value of Z and Z determines the value of Y.
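The definition of functional dependency translates almost directly into a check over the records of a file. The following is a minimal sketch, assuming the records are held as Python dictionaries; the function name is made up for the illustration and the rows are taken from the 1NF sales person table. A sample of records can only refute a dependency: whether it holds in general is a statement about the enterprise, not about one file.

```python
def functionally_dependent(rows, x, y):
    """Return True if field y is functionally dependent on field x in rows.

    Follows the definition directly: any two records that agree on x
    must also agree on y.
    """
    seen = {}
    for row in rows:
        if seen.setdefault(row[x], row[y]) != row[y]:
            return False
    return True

fields = ("SNR", "City", "Code", "PNR", "Qty")
data = [
    ("S2", "Toronto", 30, "P1", 200),
    ("S2", "Toronto", 30, "P2", 100),
    ("S4", "Kingston", 20, "P5", 200),
    ("S4", "Kingston", 20, "P8", 100),
    ("S5", "Toronto", 30, "P1", 50),
    ("S5", "Toronto", 30, "P3", 500),
]
rows = [dict(zip(fields, r)) for r in data]

snr_city = functionally_dependent(rows, "SNR", "City")
snr_pnr = functionally_dependent(rows, "SNR", "PNR")
print(snr_city, snr_pnr)
```

Here City depends on SNR in the sample, while PNR does not, because S2 appears with both P1 and P2.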


2NF (Second Normal Form - SNF)

• A relation is in 2NF if it is in 1NF and if all non-identifier fields are fully functionally dependent on the record's identifier.
• Example given with SC and City, or with SNR and Name:

SNR  Name
S1   CB
S2   BB
S4   RR
S5   PO

3NF (Third Normal Form)

• The relation is in 3NF if it is in 2NF and the record's non-identifier fields are NOT transitively dependent on the record's id field.
• The non-key fields must contain data that is attached to the identifier field, to the entire identifier, and to nothing but the identifier field (this last condition is omitted from the 2NF).

Example: The sales person table is represented as two tables satisfying the 3NF, provided that SC (Sales Code) and City are independent.

a)

SNR  PNR  Qty
S2   P1   200
S2   P2   100
S4   P5   200
S4   P8   100
S5   P1   50
S5   P3   500
S5   P4   800
S5   P5   500
S5   P8   1000

b)

SNR  SC  City
S1   10  Athens
S2   30  Toronto
S4   20  Kingston
S5   30  Toronto
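The split into tables a) and b) amounts to two projections of the 1NF table, and joining them on SNR should give back the original rows. The sketch below illustrates this under some assumptions: the helper project is a made-up name, and S1, which has no parts and appears only in table b), is left out of the input.

```python
fields = ("SNR", "City", "SC", "PNR", "Qty")
rows_1nf = [
    ("S2", "Toronto", 30, "P1", 200),
    ("S2", "Toronto", 30, "P2", 100),
    ("S4", "Kingston", 20, "P5", 200),
    ("S4", "Kingston", 20, "P8", 100),
    ("S5", "Toronto", 30, "P1", 50),
    ("S5", "Toronto", 30, "P3", 500),
    ("S5", "Toronto", 30, "P4", 800),
    ("S5", "Toronto", 30, "P5", 500),
    ("S5", "Toronto", 30, "P8", 1000),
]

def project(rows, names, wanted):
    """Relational projection: keep the wanted columns and drop duplicate rows."""
    idx = [names.index(w) for w in wanted]
    return sorted({tuple(r[i] for i in idx) for r in rows})

table_a = project(rows_1nf, fields, ("SNR", "PNR", "Qty"))
table_b = project(rows_1nf, fields, ("SNR", "SC", "City"))

# Natural join on SNR: look up each sales person's SC and City in table b).
b_by_snr = {snr: (sc, city) for snr, sc, city in table_b}
rejoined = sorted(
    (snr, b_by_snr[snr][1], b_by_snr[snr][0], pnr, qty)
    for snr, pnr, qty in table_a
)
print(rejoined == sorted(rows_1nf))
```

After the split, City and SC are stored once per sales person instead of once per part sold.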


3.5NF (Boyce/Codd Normal Form, B/C NF)

• This NF makes use of the determinant. A field is the determinant for other fields in the tuple if all the fields make up a description of a type of object or concept from the external world in such a way that the determinant can be used as the id (key) field for the new type of object or concept.
• Now we can define a relation in B/C NF as one in which any determinant attribute can be used as the tuples' id field.
• Note: The 3NF contains a constraint which the B/C NF does not: a relation in B/C normal form may well contain several different unique id fields, which the mathematical derivation of the 3NF does not allow.
• The 3.5NF is more practical than the 3NF.

4NF (Fourth Normal Form)

• Multi-valued dependency (MD) between attributes: it holds between two attributes in a table if the second attribute can assume different values for a given value in the first attribute.
• If a relation contains two multi-valued dependencies, these may depend on or be independent of each other.
• A relationship of order n (please see order 3 in the E-R diagram example) will usually include many multi-valued dependencies (MDs). If these are independent of each other, the relationship of order n can be reduced to 1:n relationships and n:m relationships.
• A table satisfies the 4NF if it satisfies the 3NF and if the MDs the table contains are dependent on each other.
• If a table does not satisfy the 4NF because it contains two independent MDs, then the table can be normalized by splitting it up into two different tables, each of which contains one of the MDs.
• Note: if the MDs are dependent on each other, the table normally cannot be split.
• Problems with maintenance and updating.

5NF (Fifth Normal Form)

• Theoretical value.
• A relation R is in 5NF (projection-join NF) if and only if every join dependency in R is implied by the candidate keys of R.
• Possible to reduce a relation R containing two MDs which are dependent on each other.


Example: Relationship of order 3

• Assume that a database contains a relation for each one of: vehicle dealers, vehicle manufacturers, vehicle types.

Note: There is an N:M relationship between dealers and manufacturers, and an N:M relationship between dealers and types.

a) Show that C.B. and M.M. each sell Ford and GM products.

Dealer  Manufacturer
C.B.    Ford
C.B.    GM
M.M.    Ford
M.M.    GM

b) Show that C.B. sells cars and buses and M.M. sells cars and trucks.

Dealer  Type
C.B.    Car
C.B.    Bus
M.M.    Car
M.M.    Truck

a) MD between dealer and manufacturer.
b) MD between dealer and type.

The 2 tables can be implemented as a single table with three columns. The MDs are then in the same table.

Whether the 2 MDs are independent of each other depends on what the "contents" of the first row (C.B., Ford, Car) mean:

• If the MDs are independent, it is possible that C.B. doesn't sell Ford cars, only Ford buses.
• If the MDs are dependent, C.B. sells Ford cars.

4NF

Dealer  Manufacturer  Type
C.B.    Ford          Car
C.B.    Ford          Bus
C.B.    GM            Car
C.B.    GM            Bus
M.M.    Ford          Car
M.M.    Ford          Truck
M.M.    GM            Car
M.M.    GM            Truck

5NF

Dealer  Manufacturer
C.B.    Ford
C.B.    GM
M.M.    Ford

Dealer  Type
C.B.    Car
C.B.    Bus
M.M.    Truck

Manufacturer  Type
Ford          Car
Ford          Bus
GM            Car
GM            Truck
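Whether a three-column dealer table can be split into a dealer/manufacturer table and a dealer/type table without losing information can be tested mechanically: the split is safe exactly when joining the two projections on Dealer reproduces the table. The sketch below is an illustration under that assumption; the function names are invented, and the data are tables a) and b) of the order 3 example.

```python
def join_on_dealer(dealer_manufacturer, dealer_type):
    """Natural join of the two two-column tables on the Dealer column."""
    return {
        (d1, m, t)
        for d1, m in dealer_manufacturer
        for d2, t in dealer_type
        if d1 == d2
    }

def splits_losslessly(three_column):
    """True if the table equals the join of its two projections on Dealer."""
    dm = {(d, m) for d, m, _ in three_column}
    dt = {(d, t) for d, _, t in three_column}
    return join_on_dealer(dm, dt) == set(three_column)

table_a = {("C.B.", "Ford"), ("C.B.", "GM"), ("M.M.", "Ford"), ("M.M.", "GM")}
table_b = {("C.B.", "Car"), ("C.B.", "Bus"), ("M.M.", "Car"), ("M.M.", "Truck")}

# Independent MDs: every manufacturer is paired with every type per dealer.
combined = join_on_dealer(table_a, table_b)
print(len(combined), splits_losslessly(combined))

# Suppose C.B. sells Ford buses but not Ford cars: the MDs now depend on
# each other, and the two-table split would reintroduce the missing row.
dependent = combined - {("C.B.", "Ford", "Car")}
print(splits_losslessly(dependent))
```

In the dependent case the two projections are unchanged, so they cannot record which manufacturer/type combinations a dealer actually sells.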


Data Definition in DB2

The principal data definition statements are:

    CREATE TABLE, ALTER TABLE, DROP TABLE
    CREATE VIEW, DROP VIEW
    CREATE INDEX, DROP INDEX

CREATE TABLE

There are two formats for the CREATE TABLE statement:

1. CREATE TABLE table-name
       (column-definition [, column-definition]...
       [, primary-key-definition]
       [, alternate-key-definition [, alternate-key-definition]...]
       [, foreign-key-definition [, foreign-key-definition]...])
       [other parameters];

where a column-definition is:

    column data-type [NOT NULL [WITH DEFAULT | UNIQUE]]

e.g.

CREATE TABLE S
    (S#      CHAR(5)   NOT NULL,
     SNAME   CHAR(20)  NOT NULL WITH DEFAULT,
     STATUS  SMALLINT  NOT NULL WITH DEFAULT,
     CITY    CHAR(15)  NOT NULL WITH DEFAULT,
     PRIMARY KEY (S#));

2. CREATE TABLE table-name LIKE table [other parameters];

This format allows the user to create a table with the same "shape" as another. The new table inherits only the column definitions from the old one.

e.g.

CREATE TABLE SCOPY LIKE S;

This would generate a table identical to a table generated with the following CREATE TABLE statement:

CREATE TABLE SCOPY
    (S#      CHAR(5)   NOT NULL,
     SNAME   CHAR(20)  NOT NULL WITH DEFAULT,
     STATUS  SMALLINT  NOT NULL WITH DEFAULT,
     CITY    CHAR(15)  NOT NULL WITH DEFAULT);

Note that the new table does not inherit any primary, alternate, or foreign key definitions, nor would it inherit any UNIQUE specifications. DB2 does not allow any such specifications to be stated explicitly either.

ALTER TABLE

A new column can be added to a table at any time using the ALTER TABLE command:

ALTER TABLE table-name ADD column data-type [NOT NULL WITH DEFAULT];

e.g.

ALTER TABLE S ADD DISCOUNT SMALLINT;

DROP TABLE

An existing table can be destroyed at any time by means of the DROP TABLE statement:

DROP TABLE table-name;

Foreign Key and Referential Integrity in DB2

The referential integrity rule states that the database must not contain any unmatched foreign key values. That is, non-null foreign key values for which there does not exist a matching value of the corresponding primary key are not allowed.

The syntax of a foreign key definition is as follows:

FOREIGN KEY [foreign-key] (column [, column]...) REFERENCES table [ON DELETE effect]

where effect is RESTRICT, CASCADE, or SET NULL.


e.g.

CREATE TABLE SP
    (S#   CHAR(5)  NOT NULL,
     P#   CHAR(6)  NOT NULL,
     QTY  INTEGER,
     PRIMARY KEY (S#, P#),
     FOREIGN KEY SFK (S#) REFERENCES S
         ON DELETE CASCADE,
     FOREIGN KEY PFK (P#) REFERENCES P
         ON DELETE RESTRICT);

The ON DELETE clause defines the delete rule for the target table with respect to this foreign key; that is, it defines what happens if an attempt is made to delete a row from the target table. In the rules below, T2 is the table that contains the foreign key.

RESTRICT: the delete is restricted to the case where there are no matching rows in table T2 (it is rejected if any such rows exist).

CASCADE: the delete cascades to delete all matching rows in table T2 also. Note: if the rows in T2 are in turn referenced by a foreign key in yet another table T3, the delete rule for that key is applied as well. That is, a single delete statement can cascade through a large number of tables if you are not careful.

SET NULL: the foreign key must have “NULLs allowed”. The target row is deleted and the foreign key is set to NULL in all matching rows in table T2.
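The delete rules can be tried out with a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in, not DB2: the syntax differs slightly (constraint names such as SFK are dropped, and SQLite enforces foreign keys only after a PRAGMA), and the column names S_NO and P_NO are assumptions standing in for S# and P# to keep the identifiers plain. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE S (S_NO TEXT PRIMARY KEY);
    CREATE TABLE P (P_NO TEXT PRIMARY KEY);
    CREATE TABLE SP (
        S_NO TEXT NOT NULL REFERENCES S ON DELETE CASCADE,
        P_NO TEXT NOT NULL REFERENCES P ON DELETE RESTRICT,
        QTY  INTEGER,
        PRIMARY KEY (S_NO, P_NO)
    );
    INSERT INTO S VALUES ('S1'), ('S2');
    INSERT INTO P VALUES ('P1');
    INSERT INTO SP VALUES ('S1', 'P1', 200), ('S2', 'P1', 100);
""")

# CASCADE: deleting supplier S1 also deletes its matching rows in SP.
conn.execute("DELETE FROM S WHERE S_NO = 'S1'")
remaining = conn.execute("SELECT S_NO, P_NO FROM SP").fetchall()

# RESTRICT: part P1 is still referenced from SP, so the delete is rejected.
try:
    conn.execute("DELETE FROM P WHERE P_NO = 'P1'")
    restricted = False
except sqlite3.IntegrityError:
    restricted = True

# Referential integrity: an unmatched foreign key value cannot be inserted.
try:
    conn.execute("INSERT INTO SP VALUES ('S9', 'P1', 5)")
    unmatched_rejected = False
except sqlite3.IntegrityError:
    unmatched_rejected = True

print(remaining, restricted, unmatched_rejected)
```

SET NULL is not shown because both foreign keys here are declared NOT NULL, which that rule does not permit.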

INDEXES

The CREATE INDEX statement takes the general form:

CREATE [UNIQUE] INDEX index ON table-name
    (column [order] [, column [order]]...)
    [other parameters];

e.g. CREATE INDEX X ON T (P, Q DESC, R);

This creates an index called X on table T in which entries are ordered by ascending R-value within descending Q-value within ascending P-value. The columns P, Q and R need not be contiguous, nor need they all be the same data type, nor need they all be fixed length or all varying length.

The UNIQUE option specifies that no two rows in the indexed table will be allowed to take on the same value for the indexed column or column combination at the same time.

Indexes can be dropped by issuing a DROP INDEX statement.

e.g. DROP INDEX X;
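The ordering and the UNIQUE option can be illustrated with SQLite through Python's sqlite3 module, used here as a stand-in for DB2. The column types and the sample rows are assumptions made up for the sketch, and UNIQUE is added to the example index so that both features appear together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (P INTEGER, Q INTEGER, R INTEGER)")
conn.execute("CREATE UNIQUE INDEX X ON T (P, Q DESC, R)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, 5, 2), (1, 9, 1), (2, 3, 7), (1, 5, 1)],
)

# The order of the index entries: ascending P, descending Q, ascending R.
ordered = conn.execute(
    "SELECT P, Q, R FROM T ORDER BY P, Q DESC, R"
).fetchall()

# UNIQUE: a second row with the same (P, Q, R) combination is rejected.
try:
    conn.execute("INSERT INTO T VALUES (1, 5, 2)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.execute("DROP INDEX X")
print(ordered, duplicate_rejected)
```

Rows that agree on P and Q but differ in R, such as (1, 5, 1) and (1, 5, 2), are accepted, since uniqueness applies to the column combination as a whole.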


Physical Database Structures

File Organizations

• Sequential Organization: Records are stored according to a fixed sequence.

• Random Organization: Records are retrieved by transforming the id field (key) to a block address.

• Index Organization: Records can be searched through an index that contains references to the records in a file.

• List Organization: There are various forms. Usually, records are chained together by pointer fields.

Note: There are primary and secondary file organizations. A primary organization is based on the physical storage of the individual records; a secondary organization is not.
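Two of the organizations above can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular DBMS lays out its files: the number of blocks, the use of CRC-32 as the key transformation, and the sample records are all choices made for the illustration.

```python
import zlib

N_BLOCKS = 8  # assumed number of blocks in the file

def block_address(key, n_blocks=N_BLOCKS):
    """Random organization: transform the key into a block address."""
    return zlib.crc32(key.encode("utf-8")) % n_blocks

# The "file": a list of blocks, each holding the records hashed to it.
blocks = [[] for _ in range(N_BLOCKS)]
records = [("S1", "Athens"), ("S2", "Toronto"),
           ("S4", "Kingston"), ("S5", "Toronto")]
for rec in records:
    blocks[block_address(rec[0])].append(rec)

def fetch(key):
    """Read only the block the key maps to, then scan it for the record."""
    for rec in blocks[block_address(key)]:
        if rec[0] == key:
            return rec
    return None

# Index organization: a separate structure that holds references
# (here, block addresses) to the records, keyed by another field.
city_index = {}
for rec in records:
    city_index.setdefault(rec[1], []).append(block_address(rec[0]))

print(fetch("S4"), sorted(city_index))
```

The hashed placement is a primary organization, because it decides where each record is stored; the city index is secondary, because it only refers to records stored elsewhere.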

Keys

• Super Key: An attribute (or combination of attributes) that uniquely identifies each entity.
• Candidate Key: A minimal super key, i.e. one that does not contain a proper subset of attributes that is itself a super key.
• Primary Key: A candidate key selected (by the DB designer) to uniquely identify all other attribute values in any given row. It cannot contain null values.
• Secondary Key: An attribute (or combination of attributes) used strictly for data retrieval purposes.
• Foreign Key: An attribute (or combination of attributes) in a table whose value must either match the primary key in another table or be null.
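The difference between a super key and a candidate key can be made concrete by searching a table for its minimal super keys. The sketch below is an illustration only: the function names are invented, the data are the sales person table b) from the 3NF example, and a key found this way holds for the sample rows, whereas a real key is a design decision about all possible rows.

```python
from itertools import combinations

def is_super_key(rows, fields, attrs):
    """True if no two rows share the same values for the attributes in attrs."""
    idx = [fields.index(a) for a in attrs]
    projected = [tuple(r[i] for i in idx) for r in rows]
    return len(set(projected)) == len(projected)

def candidate_keys(rows, fields):
    """Super keys that contain no smaller super key (minimal super keys)."""
    keys = []
    # Try smaller combinations first, so any combination that contains
    # an already found key can be skipped as non-minimal.
    for size in range(1, len(fields) + 1):
        for attrs in combinations(fields, size):
            contains_key = any(set(k) <= set(attrs) for k in keys)
            if not contains_key and is_super_key(rows, fields, attrs):
                keys.append(attrs)
    return keys

fields = ("SNR", "SC", "City")
rows = [("S1", 10, "Athens"), ("S2", 30, "Toronto"),
        ("S4", 20, "Kingston"), ("S5", 30, "Toronto")]

print(is_super_key(rows, fields, ("SNR", "City")))
print(candidate_keys(rows, fields))
```

The pair (SNR, City) is a super key but not a candidate key, because SNR alone already identifies each row.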

Entity Sets

• Weak: An entity set which does not have sufficient attributes to form a primary key.
• Strong: An entity set which has a primary key.

Integrity Rules

• Entity integrity: No null values are allowed in a primary key, which guarantees that each entity will have a unique identity.

• Referential integrity: A foreign key value must match a primary key value in the related table or be null. This makes it possible for an attribute NOT to have a corresponding value, but it is still impossible to have an invalid entry.