ISOM MIS710 Module 1b Relational Model and Normalization Arijit Sengupta


ISOM

MIS710 Module 1b: Relational Model and Normalization

Arijit Sengupta


Structure of this semester

Database Fundamentals

Relational Model

Normalization

Conceptual Modeling

Query Languages

Advanced SQL

Transaction Management

Java DB Applications (JDBC)

Data Mining

0. Intro   1. Design   2. Querying   3. Applications   4. Advanced Topics

Newbie Users, Professionals, Designers, Developers


Today’s Buzzwords

• Relational Model
• Superkey, Candidate Key, Primary Key and Foreign Key
• Entity Integrity Rule
• Referential Integrity Rule
• Normalization
• First, Second, Third, and Boyce-Codd Normal Forms
• Unnormalization


Objectives of this lecture

• Understand the Relational Model and its properties
• Understand the notion of keys
• Understand the use and importance of referential integrity
• Provide an alternative way to design relations using semantics rather than concepts
• Take an existing “flat file” design and create a relational design from it through the process of Normalization
• Identify sources of problems (or anomalies) within a given relational design
• Argue about improvements to designs created by others


Relational Data Model

• Originally proposed by Codd in 1970
• Based on mathematical set theory

Relation (each row is a tuple and each column an attribute; the header row holds the attribute names, the cells hold the attribute values):

ID   Name    Age   Address       GPA
S1   Jose    21    Stoned Hill   3.1
S2   Alice   18    BigHead       3.2
S3   Lin     32    Done-Audy     2.9
S4   Joyce   20    Atlanta       3.7
S5   Sunil   27    Mare-iota     3.2


Relation: Properties

• A relation is a set of tuples
• A tuple is a set of attribute-value pairs
  - Ordering of attributes is immaterial
  - Ordering of tuples is immaterial
• Tuples are distinct from one another
• Attributes contain atomic values only

Counter-example (non-atomic values):

Emp#   Name                    Address
E1     'Jose', 'M.', 'Smith'   '3413 Main Street', 'Atlanta', 'GA'
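These properties can be mimicked in a minimal Python sketch, assuming a tuple is modeled as a frozenset of (attribute, value) pairs and a relation as a frozenset of such tuples; the helpers `tup` and `is_atomic` are hypothetical. Because sets ignore order and duplicates, the ordering and distinctness properties come for free, while atomicity has to be checked separately.

```python
# Hypothetical helpers: a tuple is a frozenset of (attribute, value) pairs,
# and a relation is a frozenset of such tuples.
def tup(**attrs):
    return frozenset(attrs.items())

def is_atomic(value) -> bool:
    # Assumption: collections count as composite values, scalars as atomic.
    return not isinstance(value, (tuple, list, set, frozenset, dict))

r1 = frozenset({
    tup(ID="S1", Name="Jose", Age=21),
    tup(ID="S2", Name="Alice", Age=18),
})
# Same facts with tuples and attributes in another order, plus a duplicate tuple.
r2 = frozenset([
    tup(Age=18, Name="Alice", ID="S2"),
    tup(Name="Jose", ID="S1", Age=21),
    tup(ID="S1", Name="Jose", Age=21),
])

# The Emp# example: Name and Address are composite values.
emp = tup(EmpNo="E1", Name=("Jose", "M.", "Smith"),
          Address=("3413 Main Street", "Atlanta", "GA"))

print(r1 == r2)                            # True: ordering is immaterial
print(len(r2))                             # 2: duplicate tuples collapse
print(all(is_atomic(v) for _, v in emp))   # False: not atomic
```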


Attributes

• Attribute name: attribute names are unique within a relation

• Attribute domain: the set of all possible values an attribute may take
  Domain(GPA) = ?   Domain(Name) = ?   Domain(DateOfBirth) = ?   Domain(Year) = ?

• Number of attributes: degree of the relation


Tuples

• Aggregation of attribute values, e.g.
  S1 = (s1, ‘Jose’, 21, ‘Stoned Hill’, 3.1)
  S2 = (s2, ‘Alice’, 18, ‘BigHead’, 3.2)

• Cardinality: Number of tuples in a relation

• What is the difference between the cardinality and the degree?

ID   Name    Age   Address       GPA
S1   Jose    21    Stoned Hill   3.1
S2   Alice   18    BigHead       3.2
S3   Lin     32    Done-Audy     2.9
S4   Joyce   20    Atlanta       3.7
S5   Sunil   27    Mare-iota     3.2
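To make the distinction concrete, here is a small Python sketch, assuming the STUDENT relation is stored as a header tuple (the schema) plus a set of rows (the instance). The extra student S6 is hypothetical and is added only to show which of the two measures changes.

```python
# STUDENT relation as a header (schema) plus a set of rows (instance).
header = ("ID", "Name", "Age", "Address", "GPA")
rows = {
    ("S1", "Jose", 21, "Stoned Hill", 3.1),
    ("S2", "Alice", 18, "BigHead", 3.2),
    ("S3", "Lin", 32, "Done-Audy", 2.9),
    ("S4", "Joyce", 20, "Atlanta", 3.7),
    ("S5", "Sunil", 27, "Mare-iota", 3.2),
}

degree = len(header)      # number of attributes: fixed by the schema
cardinality = len(rows)   # number of tuples: varies with the data
print(degree, cardinality)  # 5 5

# Inserting a (hypothetical) tuple changes the cardinality, never the degree.
rows.add(("S6", "Maya", 23, "Atlanta", 3.5))
print(len(header), len(rows))  # 5 6
```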


Primary Keys

• Superkey: SK, a subset of attributes of R, satisfying Uniqueness, that is, no two tuples have the same combination of values for these attributes

• Candidate Key: K, a superkey SK, satisfying minimality, that is, no component of K can be eliminated without destroying the uniqueness property.

• Primary Key: PK, the selected Candidate key, K.

• Can a primary key be composed of multiple attributes?
• Can a relation have multiple primary keys?
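Superkeys and candidate keys can be tested mechanically against a relation instance. The Python sketch below (function names hypothetical) checks uniqueness for attribute subsets from smallest to largest and keeps only the minimal ones. It also exposes a caveat: a sample instance can only rule a key out, never prove one, because keys come from the meaning of the data.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """Uniqueness: no two rows share the same combination of values for attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, attributes):
    """Minimal superkeys, found by trying attribute subsets from smallest up."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # Minimality: skip any combination containing an already-found key.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys

attributes = ["ID", "Name", "Age", "Address", "GPA"]
students = [dict(zip(attributes, t)) for t in [
    ("S1", "Jose", 21, "Stoned Hill", 3.1),
    ("S2", "Alice", 18, "BigHead", 3.2),
    ("S3", "Lin", 32, "Done-Audy", 2.9),
    ("S4", "Joyce", 20, "Atlanta", 3.7),
    ("S5", "Sunil", 27, "Mare-iota", 3.2),
]]

print(is_superkey(students, ("GPA",)))      # False: 3.2 appears twice
print(candidate_keys(students, attributes))
# [('ID',), ('Name',), ('Age',), ('Address',)]
```

On these five rows Name, Age, and Address look unique purely by accident; semantically only ID is a candidate key, which is why keys are declared from the semantics rather than inferred from data.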


Keys - example

• Superkeys?

• Candidate keys?

• Primary key?

Disk: (ISBN#, Artist_name, Album_name, Year, Producer, Genre, time, price)


Entity Integrity Rule

• The primary key of a base relation cannot contain a NULL value.

• Enforcement of the rule: an update that results in a NULL value in the primary key must be rejected.

• Are the following ok? (Primary Key: Course, Section)

Course   Section   Meets   Enrolled
201      1         MW      20
201      NULL      TTh     25
NULL     NULL      MWF     18
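A minimal check of the entity integrity rule in Python, assuming NULL is represented by `None` and that (Course, Section) is the primary key of the table above; the function name is hypothetical. A DBMS would reject the offending update up front rather than report it afterwards.

```python
def entity_integrity_violations(rows, pk):
    """Return the rows whose primary key has a NULL (None) component."""
    return [row for row in rows if any(row[a] is None for a in pk)]

sections = [
    {"Course": 201, "Section": 1, "Meets": "MW", "Enrolled": 20},
    {"Course": 201, "Section": None, "Meets": "TTh", "Enrolled": 25},
    {"Course": None, "Section": None, "Meets": "MWF", "Enrolled": 18},
]

bad = entity_integrity_violations(sections, ("Course", "Section"))
print([row["Meets"] for row in bad])  # ['TTh', 'MWF']
```

Both the partially NULL key (201, NULL) and the fully NULL key are rejected: a NULL in any component of a composite primary key is enough to violate the rule.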


Foreign Key

Physician (ID, Name, …) Patient (ID, Name, PhysID*, …)

Club (ID, Name, …) Player (ID, Name, ?*, …)

Order (OrdID, Date, …, ?*) Customer (ID, Name, …, ?*)

Dept (DeptID, Name, …, ?*) Employee (EID, Name, …, ?*)

• Attribute(s) of one relation that reference(s) the PK of another relation

• FK may or may not be (a part of) the PK of this relation

Course (CourseID, Name, …, ?*)   Class (ClassID, Meets, …, ?*)
Student (SID, Name, …, ?*)   Registration (?)

• Can an FK refer to a part of the PK of another relation?
• Can an FK refer to a PK of the same relation?


Foreign Key (continued)

• FK and referenced PK may have different names

• The values of an FK must be drawn from the value set of the PK

• How do we define the Domain of an FK?
• Can an FK have a NULL value?
• What can we enforce with PKs and FKs?

[Figure: Primary Key and Foreign Key value sets within a shared Domain]


Referential Integrity Rule

• If FK is the foreign key of a relation R2 that matches the primary key PK of a relation R1, then:
  - the FK value must match the PK value in some tuple of R1, or
  - the FK value may be NULL, but only if the FK is not (a part of) the PK of R2.

• Enforcement of the rule: an update on either a referenced PK or an FK must satisfy the rule; otherwise, the operation is rejected.

• Which operation on the primary key may violate this rule?
• Which operation on the foreign key may violate this rule?
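The rule can be sketched as a Python check, assuming single-attribute keys and `None` for NULL. The Physician/Patient rows are hypothetical (names and IDs invented for illustration) and follow the shape of the Physician (ID, Name, …) / Patient (ID, Name, PhysID*, …) example.

```python
def referential_violations(child, fk, child_pk, parent, pk):
    """Rows of child whose FK value matches no PK value in parent.

    A NULL (None) FK is accepted only when fk is not part of the child's PK.
    """
    parent_values = {row[pk] for row in parent}
    bad = []
    for row in child:
        value = row[fk]
        if value is None and fk not in child_pk:
            continue  # NULL FK allowed, e.g. a patient not yet assigned a physician
        if value not in parent_values:
            bad.append(row)
    return bad

# Hypothetical data for Physician (ID, Name) and Patient (ID, Name, PhysID*)
physicians = [{"ID": "P1", "Name": "Kim"}, {"ID": "P2", "Name": "Ruiz"}]
patients = [
    {"ID": 1, "Name": "Ann", "PhysID": "P1"},   # valid reference
    {"ID": 2, "Name": "Bo", "PhysID": None},    # NULL, allowed here
    {"ID": 3, "Name": "Cy", "PhysID": "P9"},    # dangling reference
]

bad = referential_violations(patients, "PhysID", ("ID",), physicians, "ID")
print([row["ID"] for row in bad])  # [3]
```

If PhysID were part of Patient's primary key, the NULL in patient 2 would be flagged as well.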


Referential Integrity Enforcement

• If an operation violates referential integrity:
  - Restrict: reject the operation
  - Cascade: try to propagate the operation to all dependent FK values; if that is not possible, reject the operation
  - Nullify (or Default): set all dependent FK values to NULL (or a default value); if that is not possible, reject the operation

• Cases for each of the above situations?
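The three policies can be sketched for a delete on the referenced (parent) relation. This Python version is an illustration, not a DBMS implementation: `delete_parent` is a hypothetical name, cascading goes only one level deep, and the Dept/Employee rows are invented.

```python
def delete_parent(parent, child, pk, fk, key, policy, fk_in_child_pk=False):
    """Delete the parent row whose pk equals key and enforce RI on the child.

    Returns (new_parent, new_child); raises ValueError if the delete is rejected.
    """
    dependents = [r for r in child if r[fk] == key]
    new_parent = [r for r in parent if r[pk] != key]
    if not dependents:
        return new_parent, list(child)
    if policy == "restrict":
        raise ValueError(f"{len(dependents)} row(s) still reference {key!r}")
    if policy == "cascade":
        # Propagate the delete to every dependent row.
        return new_parent, [r for r in child if r[fk] != key]
    if policy == "nullify":
        # Not possible when the FK is part of the child's PK (entity integrity).
        if fk_in_child_pk:
            raise ValueError("cannot set a primary-key FK to NULL")
        return new_parent, [{**r, fk: None} if r[fk] == key else r for r in child]
    raise ValueError(f"unknown policy {policy!r}")

depts = [{"DeptID": "D1"}, {"DeptID": "D2"}]
emps = [{"EID": 1, "DeptID": "D1"}, {"EID": 2, "DeptID": "D1"}, {"EID": 3, "DeptID": "D2"}]

_, cascaded = delete_parent(depts, emps, "DeptID", "DeptID", "D1", "cascade")
_, nullified = delete_parent(depts, emps, "DeptID", "DeptID", "D1", "nullify")
print([e["EID"] for e in cascaded])      # [3]
print([e["DeptID"] for e in nullified])  # [None, None, 'D2']
```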


Creating Relations

create table STUDENT (
    ID char(11) not null primary key,
    Name char(30) not null,
    age int,
    GPA number(2,1));

create table COURSE (
    courseno char(6) not null primary key,
    coursename char(30) not null,
    credithours number(2,1));

create table REGISTRATION (
    ID char(11) not null references STUDENT (ID) on delete cascade,
    CourseNum char(6) not null references COURSE (courseno),
    primary key (ID, CourseNum));
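One way to try these definitions is Python's built-in `sqlite3` module. Two SQLite specifics are worth flagging. First, foreign-key enforcement stays off until `PRAGMA foreign_keys = ON` is issued. Second, SQLite guarantees NOT NULL on a non-integer primary key only when it is declared explicitly, which the `not null` clauses here already do. The sample student and course rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
create table STUDENT (ID char(11) not null primary key, Name char(30) not null,
                      age int, GPA number(2,1));
create table COURSE (courseno char(6) not null primary key,
                     coursename char(30) not null, credithours number(2,1));
create table REGISTRATION (
    ID char(11) not null references STUDENT (ID) on delete cascade,
    CourseNum char(6) not null references COURSE (courseno),
    primary key (ID, CourseNum));
""")

# Hypothetical sample rows
conn.execute("insert into STUDENT values ('224', 'Waters', 20, 3.1)")
conn.execute("insert into COURSE values ('CIS20', 'Intro CBIS', 5)")
conn.execute("insert into REGISTRATION values ('224', 'CIS20')")

rejected = []
for label, sql in [
    ("unknown course", "insert into REGISTRATION values ('224', 'CIS99')"),
    ("referenced course", "delete from COURSE where courseno = 'CIS20'"),
]:
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        rejected.append(label)  # violates referential integrity, so it is rejected

# on delete cascade: removing the student also removes the registration
conn.execute("delete from STUDENT where ID = '224'")
remaining = conn.execute("select count(*) from REGISTRATION").fetchone()[0]
print(rejected, remaining)  # ['unknown course', 'referenced course'] 0
```

The CourseNum reference has no `on delete` clause, so deleting a referenced course behaves like Restrict, while the student reference cascades.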


Normalization - Motivating Example

• Is there any redundant data?

• Can we insert a new course# with a new textbook?

• What should be done if ‘CIS’ is changed to ‘MIS’?

• What would happen if we remove all CIS 800 students?

SID   Name     Grade   Course#   Text   Major   Dept
s1    Joseph   A       CIS800    b1     CIS     CIS
s1    Joseph   B       CIS820    b2     CIS     CIS
s1    Joseph   A       CIS872    b5     CIS     CIS
s2    Alice    A       CIS800    b1     CS      MCS
s2    Alice    A       CIS872    b5     CS      MCS
s3    Tom      B       CIS800    b1     Acct    Acct
s3    Tom      B       CIS872    b5     Acct    Acct
s3    Tom      A       CIS860    b1     Acct    Acct
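Two of the questions above can be answered by counting. The Python sketch below loads the table and measures an update anomaly, assuming the 'CIS' to 'MIS' rename applies to the Major column. It also measures a deletion anomaly, assuming "remove all CIS800 students" means deleting the CIS800 registration tuples.

```python
from collections import namedtuple

Row = namedtuple("Row", "sid name grade course text major dept")
rows = [Row(*t) for t in [
    ("s1", "Joseph", "A", "CIS800", "b1", "CIS", "CIS"),
    ("s1", "Joseph", "B", "CIS820", "b2", "CIS", "CIS"),
    ("s1", "Joseph", "A", "CIS872", "b5", "CIS", "CIS"),
    ("s2", "Alice", "A", "CIS800", "b1", "CS", "MCS"),
    ("s2", "Alice", "A", "CIS872", "b5", "CS", "MCS"),
    ("s3", "Tom", "B", "CIS800", "b1", "Acct", "Acct"),
    ("s3", "Tom", "B", "CIS872", "b5", "Acct", "Acct"),
    ("s3", "Tom", "A", "CIS860", "b1", "Acct", "Acct"),
]]

# Update anomaly: one fact ("major CIS is now MIS") is stored in several tuples.
updated = [r._replace(major="MIS") if r.major == "CIS" else r for r in rows]
touched = sum(1 for old, new in zip(rows, updated) if old != new)

# Deletion anomaly: deleting the CIS800 registrations also deletes the
# only record that CIS800 uses textbook b1.
remaining = [r for r in rows if r.course != "CIS800"]
known_texts = {r.course: r.text for r in remaining}

print(touched)                  # 3
print("CIS800" in known_texts)  # False
```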


Why Normalization?

• Poor relation design causes anomalies:
  - Insertion anomalies: a piece of information cannot be inserted unless other, irrelevant information is added with it.
  - Update anomalies: updating a single piece of information requires updates to multiple tuples.
  - Deletion anomalies: deleting a piece of information also removes other unrelated but necessary information.

• Normalization improves the design to remove these anomalies


Why Normalization?

• Benefits
  - contain a minimum amount of redundancy
  - allow users to insert, delete and modify tuples in the relation without errors or inconsistencies
  - improve the quality of information in the database
  - decrease storage space for the database

• Costs
  - may contribute to performance problems
  - may require more storage in some cases


Unnormalized Relation

• Create a ‘Definition’ for this relation.
• Do you see any problems in the definition?
• Do you see any anomalies in the data?

STUDENT ID   STUDENT NAME   COURSE ID   COURSE NAME    INSTR NAME   ROOM   CREDITS   GRADE
224          Waters         CIS20       Intro CBIS     Greene       205G   5         A
                            CIS40       Database Mgt   Hong         311S   5         B
                            CIS50       Sys.Analysis   Purao        139S   5         B
351          Byron          CIS30       COBOL          Brown        629G   3         B
                            CIS50       Sys.Analysis   Purao        139S   5         C
421          Smith          CIS20       Intro CBIS     Greene       205G   5         B
                            CIS30       COBOL          Brown        629G   3         B
                            CIS50       Sys.Analysis   Purao        139S   5         B


Normal Forms

Unnormalized Relation (NF2)
  ↓ only atomic attributes
First Normal Form (1NF)
  ↓ remove partial (nonkey) dependencies
Second Normal Form (2NF)
  ↓ remove transitive dependencies
Third Normal Form (3NF)
  ↓ higher order forms
  - BCNF: every determinant is a candidate key
  - 4NF: remove multi-valued dependencies
  - 5NF: remove join dependencies


The Basis of Normalization

• Functional Dependency (FD): consider two attributes, X and Y, and two arbitrary tuples r1 and r2 of a relation R.

• Y is functionally dependent on X iff:
  value of X in r1 = value of X in r2 implies value of Y in r1 = value of Y in r2

• Also stated as: R.X → R.Y or X → Y
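The definition translates directly into a check over a relation instance. Below is a minimal Python sketch (the name `fd_holds` is hypothetical), run on the motivating Course table. As with keys, an instance can refute an FD but cannot prove it, since FDs come from the semantics.

```python
def fd_holds(rows, X, Y):
    """X -> Y holds on rows iff equal X-values always come with equal Y-values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in X)
        y = tuple(row[a] for a in Y)
        if seen.setdefault(x, y) != y:
            return False  # two tuples agree on X but differ on Y
    return True

cols = ("SID", "Name", "Grade", "Course#", "Text", "Major", "Dept")
rows = [dict(zip(cols, t)) for t in [
    ("s1", "Joseph", "A", "CIS800", "b1", "CIS", "CIS"),
    ("s1", "Joseph", "B", "CIS820", "b2", "CIS", "CIS"),
    ("s1", "Joseph", "A", "CIS872", "b5", "CIS", "CIS"),
    ("s2", "Alice", "A", "CIS800", "b1", "CS", "MCS"),
    ("s2", "Alice", "A", "CIS872", "b5", "CS", "MCS"),
    ("s3", "Tom", "B", "CIS800", "b1", "Acct", "Acct"),
    ("s3", "Tom", "B", "CIS872", "b5", "Acct", "Acct"),
    ("s3", "Tom", "A", "CIS860", "b1", "Acct", "Acct"),
]]

print(fd_holds(rows, ["Course#"], ["Text"]))  # True
print(fd_holds(rows, ["Text"], ["Course#"]))  # False: b1 is used by CIS800 and CIS860
print(fd_holds(rows, ["SID"], ["Grade"]))     # False: s1 has grades A and B
```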


Properties of FDs

• If R.X → R.Y (or X → Y):
  - X is called the determinant of Y.
  - X may or may not be the key attribute of R.
  - Whether an FD holds depends on the semantic meaning of the data, e.g., does Name → Address hold?
  - X and Y may be composite.
  - X and Y may be mutually dependent on each other, e.g., Husband → Wife and Wife → Husband.
  - The same Y value may occur in multiple tuples, e.g., Course# → Text.


Fully Functional Dependencies

• When is X → Y an FFD? When Y is not functionally dependent on any proper subset of X.

• Is X → Y a fully functional dependency (FFD) in each case?
  - (SID, Course#) → Name?
  - (SID, Course#) → Grade?
  - (SID, Name) → Major?
  - (SID, Name) → SID?

• By default, the term FD refers to FFD
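Full dependency adds one more test on top of the FD check: no proper subset of X may determine Y. The Python sketch below (names hypothetical) answers the four questions with an instance-level FD check on the same eight-row Course table, so its answers are only as trustworthy as the sample data.

```python
from itertools import combinations

def fd_holds(rows, X, Y):
    """X -> Y holds on rows iff equal X-values always come with equal Y-values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in X)
        y = tuple(row[a] for a in Y)
        if seen.setdefault(x, y) != y:
            return False
    return True

def is_full(rows, X, Y):
    """X -> Y is fully functional if it holds and no proper subset of X determines Y."""
    if not fd_holds(rows, X, Y):
        return False
    proper_subsets = (sub for size in range(len(X)) for sub in combinations(X, size))
    return not any(fd_holds(rows, sub, Y) for sub in proper_subsets)

cols = ("SID", "Name", "Grade", "Course#", "Text", "Major", "Dept")
ROWS = [dict(zip(cols, t)) for t in [
    ("s1", "Joseph", "A", "CIS800", "b1", "CIS", "CIS"),
    ("s1", "Joseph", "B", "CIS820", "b2", "CIS", "CIS"),
    ("s1", "Joseph", "A", "CIS872", "b5", "CIS", "CIS"),
    ("s2", "Alice", "A", "CIS800", "b1", "CS", "MCS"),
    ("s2", "Alice", "A", "CIS872", "b5", "CS", "MCS"),
    ("s3", "Tom", "B", "CIS800", "b1", "Acct", "Acct"),
    ("s3", "Tom", "B", "CIS872", "b5", "Acct", "Acct"),
    ("s3", "Tom", "A", "CIS860", "b1", "Acct", "Acct"),
]]

print(is_full(ROWS, ("SID", "Course#"), ("Name",)))   # False: SID alone determines Name
print(is_full(ROWS, ("SID", "Course#"), ("Grade",)))  # True
print(is_full(ROWS, ("SID", "Name"), ("Major",)))     # False: SID -> Major
print(is_full(ROWS, ("SID", "Name"), ("SID",)))       # False: trivially SID -> SID
```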


Transitive Dependencies

• Given attributes X, Y, and Z of a relation R, Z is transitively dependent on X (X → Z) iff X → Y and Y → Z

• For example: SID → Dept, SID → Major, Dept → School, Major → Dept

• Do you see any transitive functional dependencies?


Some Inference Rules for FDs

• An FD is redundant if it can be derived from other FDs using a set of inference rules. Some of these rules are:

• Reflexive rule: If Y ⊆ X, then X → Y. X always determines a subset of itself.

• Augmentation rule: If X → Y, then XZ → YZ. Adding the same attribute(s) to both sides preserves the FD.

• Transitive rule: If X → Y and Y → Z, then X → Z. Functional dependencies can be ‘chained’.

• Decomposition rule: If X → YZ, then X → Y and X → Z.

• Given {SID → Name, SID → Major, Major → Dept}, which of the following is/are redundant?
  SID → School, SID → Dept, Dept → School
  SID → (Name, Major), (SID, Name) → (Major, Name)
  SID → SID, SID → (Name, SID)
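A standard way to decide whether an FD can be derived is the attribute-closure algorithm. X → Y follows from a set of FDs exactly when Y is contained in X+, the set of attributes reachable from X by repeatedly applying the given FDs. A short Python sketch (helper names hypothetical), applied to the exercise above:

```python
def fd(lhs: str, rhs: str):
    """Build an FD from comma-separated attribute names, e.g. fd("SID,Name", "Major")."""
    return frozenset(lhs.split(",")), frozenset(rhs.split(","))

def closure(attrs, fds):
    """X+: keep adding right-hand sides whose left-hand side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implied(candidate, fds):
    """X -> Y is derivable from fds iff Y is a subset of X+."""
    lhs, rhs = candidate
    return rhs <= closure(lhs, fds)

given = [fd("SID", "Name"), fd("SID", "Major"), fd("Major", "Dept")]
candidates = {
    "SID -> School": fd("SID", "School"),
    "SID -> Dept": fd("SID", "Dept"),
    "Dept -> School": fd("Dept", "School"),
    "SID -> (Name, Major)": fd("SID", "Name,Major"),
    "(SID, Name) -> (Major, Name)": fd("SID,Name", "Major,Name"),
    "SID -> SID": fd("SID", "SID"),
    "SID -> (Name, SID)": fd("SID", "Name,SID"),
}
for label, candidate in candidates.items():
    print(f"{label}: {'redundant' if implied(candidate, given) else 'not derivable'}")
```

Only SID → School and Dept → School come out as not derivable, because none of the given FDs mentions School; the others follow from the given set.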


First Normal Form

• DEFINITION: A relation R is in first normal form (1NF) if and only if all underlying domains contain atomic values only.

• Translation: To be in first normal form, the table must not contain any repeating attributes.

• Implication: Are all ‘relations’ in First Normal Form (1NF)?


Example - 1NF

The ‘unnormalized’ relation has been decomposed in two.

• What are the PKs?

Relation: Student-Course

StudentID   Course#   Course Title   InstrName   Room   Credits   Grade
224         CIS20     Intro CBIS     Greene      205G   5         A
224         CIS40     Database Mgt   Hong        311S   5         B
224         CIS50     Sys.Analysis   Purao       139S   5         B
351         CIS30     COBOL          Brown       629G   3         B
351         CIS50     Sys.Analysis   Purao       139S   5         C
421         CIS20     Intro CBIS     Greene      205G   5         B
421         CIS30     COBOL          Brown       629G   3         B
421         CIS50     Sys.Analysis   Purao       139S   5         B

Relation: Student

StudentID   StudentName
224         Waters
351         Byron
421         Smith
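The decomposition into 1NF amounts to flattening the repeating group. Here is a Python sketch, assuming the unnormalized relation is held as one record per student with a nested list of course entries (a layout chosen for illustration):

```python
# One record per student; the nested course list is the repeating group.
unnormalized = [
    (224, "Waters", [("CIS20", "Intro CBIS", "Greene", "205G", 5, "A"),
                     ("CIS40", "Database Mgt", "Hong", "311S", 5, "B"),
                     ("CIS50", "Sys.Analysis", "Purao", "139S", 5, "B")]),
    (351, "Byron", [("CIS30", "COBOL", "Brown", "629G", 3, "B"),
                    ("CIS50", "Sys.Analysis", "Purao", "139S", 5, "C")]),
    (421, "Smith", [("CIS20", "Intro CBIS", "Greene", "205G", 5, "B"),
                    ("CIS30", "COBOL", "Brown", "629G", 3, "B"),
                    ("CIS50", "Sys.Analysis", "Purao", "139S", 5, "B")]),
]

# Relation Student: the non-repeating part, one tuple per student.
student = {(sid, name) for sid, name, _ in unnormalized}

# Relation Student-Course: one tuple per (student, course) entry, all values atomic.
student_course = [(sid, *course) for sid, _, courses in unnormalized for course in courses]

print(len(student), len(student_course))  # 3 8
```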


Anomalies (with only 1NF)

• Insertion anomaly: A new course cannot be inserted in the database (relation Student-Course) until a student registers for that course.

• Update anomaly: If the instructor of a course is changed, this fact would have to be noted at many places in the database (many tuples of the relation Student-Course).

• Deletion anomaly: Withdrawal of all students from an existing course (that is, deletion of the related tuples from the relation Student-Course) will result in unwarranted removal of that course from the database.


Anomalies in 1NF

Course (SID, Name, Grade, Course#, Text, Major, Dept)

• 1NF relations have anomalies:
  - Redundant information?
  - Update anomalies?
  - Insertion anomalies?
  - Deletion anomalies?

[Figure: FD bubble chart over SID, Course#, Name, Grade, Text, Major, Dept]


Second Normal Form

• DEFINITION: A relation R is in second normal form (2NF) if and only if it is in 1NF and every nonkey attribute is dependent on the full primary key.

• Translation: A table is in second normal form if there are no partial dependencies.

• Implication: What kinds of primary keys may lead to a violation of the Second Normal Form (2NF)?


Bubble Chart

• Reconsider the example:

[Bubble chart: composite key StudentId+CourseId with StudentName, CourseTitle, Credits, Instructor, Classroom, Grade]


Dealing with Compound Keys

• Revised bubble chart

[Bubble chart: StudentId, CourseId, StudentName, CourseTitle, Credits, Instructor, Classroom, Grade]


Example - 2NF

STUDENT ID   STUDENT NAME
224          Waters
351          Byron
421          Smith

STUDENT ID   COURSE ID   GRADE
224          CIS20       A
224          CIS40       B
224          CIS50       B
351          CIS30       B
351          CIS50       C
421          CIS20       B
421          CIS30       B
421          CIS50       B

COURSE ID   COURSE TITLE       CREDITS
CIS20       Intro to CIS       5
CIS30       Java               3
CIS40       DBMS               5
CIS50       Systems Analysis   5
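Decomposing to 2NF comes down to projecting the 1NF Student-Course relation onto the key that each group of attributes depends on. The Python sketch below assumes the course attributes depend on Course# alone and Grade on the full (StudentID, Course#) key; `project` is a hypothetical helper.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}

cols = ("StudentID", "Course#", "CourseTitle", "InstrName", "Room", "Credits", "Grade")
student_course = [dict(zip(cols, t)) for t in [
    (224, "CIS20", "Intro CBIS", "Greene", "205G", 5, "A"),
    (224, "CIS40", "Database Mgt", "Hong", "311S", 5, "B"),
    (224, "CIS50", "Sys.Analysis", "Purao", "139S", 5, "B"),
    (351, "CIS30", "COBOL", "Brown", "629G", 3, "B"),
    (351, "CIS50", "Sys.Analysis", "Purao", "139S", 5, "C"),
    (421, "CIS20", "Intro CBIS", "Greene", "205G", 5, "B"),
    (421, "CIS30", "COBOL", "Brown", "629G", 3, "B"),
    (421, "CIS50", "Sys.Analysis", "Purao", "139S", 5, "B"),
]]

# Partial dependency Course# -> (CourseTitle, InstrName, Room, Credits): its own relation.
course = project(student_course, ("Course#", "CourseTitle", "InstrName", "Room", "Credits"))
# Grade depends on the full key, so it stays with (StudentID, Course#).
registration = project(student_course, ("StudentID", "Course#", "Grade"))

print(len(course), len(registration))  # 4 8
```

After the split each instructor is stored once per course rather than once per registration, so changing the instructor of CIS50 touches one tuple instead of three.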


Anomalies with (only) 2NF

• Insertion anomaly: Information about a faculty member (potential advisor) cannot be added to the database unless a student is assigned to him/her.

• Update anomaly: If the advisor’s office location or phone were changed, many tuples would need to be changed.

• Deletion anomaly: If all students assigned to an advisor graduate, information about the advisor will disappear from the database.

STUDENT ID   STUDENT NAME   STATUS   ADVISOR   ADVISOR OFFICE   ADVISOR PHONE   TOTAL CREDITS
224          Waters         Junior   Young     CBA221           726104          105
351          Byron          Soph     Greene    CBA215           718434          77
421          Smith          Junior   Young     CBA221           726104          97


Third Normal Form

• DEFINITION: A relation R is in third normal form (3NF) if and only if it is in 2NF and every nonkey attribute is non-transitively dependent on the primary key.

• Translation: A table is in Third Normal Form if every non-key attribute is determined by the key, and nothing else.

• Implication: How many total attributes must the relation have for a possible violation of the Third Normal Form (3NF)?


3NF Example

• Chalk out the relations.

How do you maintain the student-advisor relationship?

[Bubble chart: StudentId, StudentName, Status, TotalCredits, Advisor, AdvisorOffice, AdvisorPhone]


Boyce-Codd Normal Form (BCNF)

• Update anomalies can occur in a 3NF relation R if:
  - R has multiple candidate keys,
  - those candidate keys are composite, and
  - the candidate keys overlap.

Computer-Lab (SID, Account, Class, Hours)

• A relation R is in BCNF iff every determinant is a candidate key.
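The BCNF test can be run with attribute closure: a determinant passes only if its closure covers every attribute of the relation, which is the usual superkey form of "every determinant is a candidate key". The FDs for Computer-Lab are not given above, so this sketch assumes a plausible set: each student has exactly one lab account (SID → Account, Account → SID) and hours are recorded per student per class. Under that assumption the candidate keys (SID, Class) and (Account, Class) are composite and overlap on Class, matching the conditions listed.

```python
def fd(lhs: str, rhs: str):
    """FD from comma-separated attribute names (hypothetical helper)."""
    return frozenset(lhs.split(",")), frozenset(rhs.split(","))

def closure(attrs, fds):
    """X+: all attributes reachable from attrs by applying fds repeatedly."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Nontrivial FDs whose determinant is not a superkey of the relation."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(attributes)
    ]

computer_lab = {"SID", "Account", "Class", "Hours"}
assumed_fds = [  # assumption: these FDs are not stated in the slide
    fd("SID", "Account"),
    fd("Account", "SID"),
    fd("SID,Class", "Hours"),
    fd("Account,Class", "Hours"),
]

for lhs, rhs in bcnf_violations(computer_lab, assumed_fds):
    print(",".join(sorted(lhs)), "->", ",".join(sorted(rhs)))
# SID -> Account
# Account -> SID
```

Under these assumed FDs, one BCNF decomposition is (SID, Account) and (SID, Class, Hours).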


The Normalization Process

1. Flatten the Table Completely (no composite columns)

2. Find the Key and “all” FDs (or as many as you can possibly detect)

3. Find Partial Dependencies and decompose the relation using them (2NF)

4. Find Transitive Dependencies and decompose using them (3NF)

5. Remember: this is not a deterministic method. The result depends on the order in which FDs are chosen, so the same relation with the same set of FDs can lead to different decompositions!


Lossless Decomposition

• A bad decomposition loses information

• In a good decomposition:
  - the join of the decomposed relations restores the original relation
  - the decomposed relations can be maintained independently

• Rissanen’s rule for non-loss decomposition: two projections R1 and R2 of a relation R are independent iff:
  - every FD in R can be logically deduced from those in R1 and R2, and
  - the common attributes of R1 and R2 form a candidate key for at least one of the pair.
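Whether a particular decomposition is lossless on a given instance can be checked by brute force: project, natural-join the projections back, and compare with the original. The Python sketch below (helper names hypothetical) does this for part of the Student-Course data. It is an instance-level test only; a schema-level rule such as Rissanen's, which works from the FDs, is what guarantees the result for every instance.

```python
def project(rows, attrs):
    """Projection with duplicates removed; rows are dicts."""
    unique = {tuple((a, row[a]) for a in attrs) for row in rows}
    return [dict(t) for t in unique]

def natural_join(r, s):
    """Nested-loop natural join on the attributes the two relations share."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s if all(a[c] == b[c] for c in common)]

def as_set(rows):
    return {frozenset(row.items()) for row in rows}

def is_lossless(rows, attrs1, attrs2):
    joined = natural_join(project(rows, attrs1), project(rows, attrs2))
    return as_set(joined) == as_set(rows)

cols = ("StudentID", "Course#", "CourseTitle", "Grade")
rows = [dict(zip(cols, t)) for t in [
    (224, "CIS20", "Intro CBIS", "A"), (224, "CIS40", "Database Mgt", "B"),
    (224, "CIS50", "Sys.Analysis", "B"), (351, "CIS30", "COBOL", "B"),
    (351, "CIS50", "Sys.Analysis", "C"), (421, "CIS20", "Intro CBIS", "B"),
    (421, "CIS30", "COBOL", "B"), (421, "CIS50", "Sys.Analysis", "B"),
]]

# Common attribute Course# is a key of (Course#, CourseTitle): lossless.
print(is_lossless(rows, ("StudentID", "Course#", "Grade"), ("Course#", "CourseTitle")))  # True
# Common attribute Grade is a key of neither projection: the join adds spurious tuples.
print(is_lossless(rows, ("StudentID", "Grade"), ("Course#", "CourseTitle", "Grade")))    # False
```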


Higher Normal Forms

• Fourth Normal Form: Multivalued Dependencies (Fagin 1977)

• Fifth Normal Form: Join Dependencies (Fagin 1979)

• Other Dependencies:
  - Inclusion Dependencies (Casanova 1981)
  - Template Dependencies (Sadri 1982)
  - Domain-Key Normal Form (Fagin 1981)


In-class Exercise – Normalize this: