IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

33
IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu www.cs.usna.edu/~adina

Transcript of IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Page 1: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

IT420: Database Management and Organization

Normalization31 January 2006

Adina Crăiniceanuwww.cs.usna.edu/~adina

Page 2: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Previously on IT420

Database design Entity-Relationship (ER) Model Relational Model Transform ER to relational model

Database Implementation SQL

Page 3: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Goal

Refresh ER to Relational Normalization

Page 4: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

ER to Relational

ER: Entities

identifiers

Relationships cardinality

Relational model Tables Constraints

Page 5: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

ER to Relational

Transform entities in tables Transform relationships using foreign keys Specify logic for enforcing minimum

cardinalities

Page 6: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Class Exercise: Transform ER model into Relational Model

Min Cardinality:1

Min Cardinality:1

AIRCRAFT

FAA number

ModelNumberColor

CLUB

FAALicense

AddressPhone

FLIGHT

FlightID

RentalDateReturnDateTimeFlown

MEMBER

NamePhone

RatingTotalHours

OWNER

NamePhone

Address

Page 7: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Outline

ER to Relational Normalization

Page 8: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Premise

We have received one or more tables with data

The data is to be stored in a new database QUESTION: Should the data be stored as

received, or should it be transformed for storage?

Page 9: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Data Redundancy

Number LastName FirstName Email Rating Wage

190 Smith John [email protected] 4 25

673 Doe Jane [email protected] 7 35

312 Doe Bob [email protected] 8 40

152 Johnson Matt [email protected] 7 35

Application constraint: All employees with same rating have the same wage (Rating Wage)

Problems due to data redundancy?

Page 10: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Modification Anomalies

Deletion Anomaly: What if we delete all employees with rating 8?

Insertion Anomaly: What if we have a new employee with new rating 12?

Update Anomaly: What if we change the wage for rating 7 to be 27?

Page 11: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Update Anomalies

The EMPLOYEE table before and after an incorrect update operation on Wage for Rating = 7

Number LastName FirstName Email Rating Wage

190 Smith John [email protected] 4 25

673 Doe Jane [email protected] 7 35

312 Doe Bob [email protected] 8 40

152 Johnson Matt [email protected] 7 35

Number LastName FirstName Email Rating Wage

190 Smith John [email protected] 4 25

673 Doe Jane [email protected] 7 37

312 Doe Bob [email protected] 8 40

152 Johnson Matt [email protected] 7 35

Page 12: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Table decompositionNumber LastName FirstName Email Rating Wage

190 Smith John [email protected] 4 25

673 Doe Jane [email protected] 7 35

312 Doe Bob [email protected] 8 40

152 Johnson Matt [email protected] 7 35

Number LastName FirstName Email Rating

190 Smith John [email protected] 4

673 Doe Jane [email protected] 7

312 Doe Bob [email protected] 8

152 Johnson Matt [email protected] 7

Rating Wage

4 25

7 35

8 40

Problem?

Page 13: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Decisions

Do we have to decompose / merge?

How do we identify problems caused by redundancy? Functional dependencies

Page 14: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Functional Dependency (FD)

A functional dependency: the value of one (a set of) attribute(s) determines the value of a second (set of) attribute(s):

Alpha MIDNName

Alpha (DormName, DormRoom)

The attribute on the left side of the functional dependency is called the determinant

Functional dependencies may be based on equations:Charge = NbHours X HourlyPrice

(NbHours, HourlyPrice)Charge

Function dependencies are not equations!

Page 15: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Functional Dependencies Are Not Equations

ObjectColor Weight ObjectColor Shape ObjectColor (Weight, Shape)

Page 16: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Composite Determinants

Composite determinant: A determinant of a functional dependency that consists of more than one attribute (StudentName, ClassName) Grade (NbHours, HourlyPrice) Charge

Page 17: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Functional Dependency (FD) Rules

If A (B, C), then A B and A C If (A,B) C, then neither A nor B

determines C by itself

Page 18: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

FD Facts

A functional dependency is a statement about all allowable instances of a table

You cannot find the functional dependencies simply by looking at some data: Data set limitations Must be logically a determinant

Given some data in a table R, we can check if it violates some FD, but we cannot tell if the FD holds over R!

Page 19: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Functional Dependencies in the SKU_DATA Table

Assuming data is representative, determine the FD

Page 20: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Functional Dependencies in the SKU_DATA TableSKU (SKU_Description, Department, Buyer)

SKU_Description (SKU, Department, Buyer)

Buyer Department

Page 21: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Class ExerciseAssuming all FD that are verified by the data are indeed FD, determine the FD in the ORDER_ITEM Table

Page 22: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

What Makes Determinant Values Unique?

A determinant is unique in a relation if, and only if, it determines every other column in the relation

Unique determinants = superkey

Page 23: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Key

A set of columns is a key for a relation if :

1. No two distinct rows can have same values in all key columns, and

2. This is not true for any subset of the key Part 2 false? A superkey Primary key Alternate key

Page 24: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Normal Forms

Relations are categorized as a normal form based on which modification anomalies or other problems that they are subject to:

Page 25: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Normal Forms

Boyce-Codd Normal Form (BCNF) – A relation is in BCNF if every determinant is a candidate key

“I swear to construct my tables so that all nonkey columns are dependent on the key, the whole key and nothing but the key, so help me Codd.”

Page 26: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Eliminating Modification Anomalies from Functional Dependencies in Relations

Put all relations into Boyce-Codd Normal Form (BCNF):

Page 27: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Putting a Relation into BCNF:EQUIPMENT_REPAIR

Page 28: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Putting a Relation into BCNF:EQUIPMENT_REPAIREQUIPMENT_REPAIR (ItemNumber, Type, AcquisitionCost,

RepairNumber, RepairDate, RepairAmount)

ItemNumber (Type, AcquisitionCost)

RepairNumber (ItemNumber, Type, AcquisitionCost, RepairDate, RepairAmount)

ITEM (ItemNumber, Type, AcquisitionCost)

REPAIR (ItemNumber, RepairNumber, RepairDate, RepairAmount)

Where REPAIR.ItemNumber must exist in ITEM.ItemNumber

Page 29: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Putting a Relation into BCNF:New Relations

Page 30: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Class Exercise

Put the following relation into BCNF

Page 31: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Multivalued Dependencies

A multivalued dependency occurs when a determinant determines a particular set of values: Employee Degree

Employee Sibling

PartKit Part

The determinant of a multivalued dependency can never be a primary key

Page 32: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Multivalued Dependencies

Page 33: IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu adina.

Eliminating Anomolies from Multivaled Dependencies Multivalued dependencies are not a

problem if they are in a separate relation, so: Always put multivalued dependencies into

their own relation This is known as Fourth Normal Form (4NF)