Normalization ISYS 464. Database Design Based on ERD Strong entity: Create a table that includes all...

28
Normalization ISYS 464

Transcript of Normalization ISYS 464. Database Design Based on ERD Strong entity: Create a table that includes all...

Normalization

ISYS 464

Database Design Based on ERD • Strong entity: Create a table that includes all simple attributes

– Composite• Weak entity: add owner primary key• Multi-valued attribute: Create a table for each multi-valued

attribute– Key + attribute

• Relationship: – 1:1, 1:M

• Relationship table: for partial participation to avoid null• Foreign key

– M:M: relationship table– N-ary relationship: relationship table– Recursive relationship

• Attribute of relationship• Superclass and subclass• Note: The database designed according to these

rules will meet the 3NF requirements.

Database design objectives

• Eliminate data duplication.

• Link related records in related tables.

ExampleEmployee/Dependent report:

EmpID: E101 Ename: Peter

Address: 123 XYZ St

DependentName Relationship DOB

Nancy Daughter 1/1/95

Alan Son 12/25/03

EmpDependent Table:

EmpID EmpName Address DepName Relation DepDOBE101 Peter 123 XYZ St Nancy D 1/1/95

E101 Peter 123 XYZ St Alan S 12/25/03

Note: This database is able to produce the report, but has duplicated data.

Update Anomalies Due To Duplication

• Modification anomaly:– Inconsistent data

• Insertion Anomalies:– Enter an employee with no dependent– Null

• Deletion Anomaly:– If Nancy and Alan become independent.

If we mix multivalue attribute with regular attributes in one table

• Employee Table:– SSN, Ename, Sex, DOB, Phone– Employee may have more than 1 phone.

• Key: SSN or SSN + Phone

• Duplication ?

Example 2

• EmpDependent table:– EmpID, Ename, Address, Depname, Relation,

DepDOB

• Key: EmpID + Depname

If we mix two entities with 1:M relationship in one table

• FacultyStudent table:– Faculty Advise Student: 1:M relationship– FID, Fname, SID, Sname, SAddress

• Key: SID

• Duplication?

If we mix two entities with M:M relationship in one table

• StudentCourse table:– SID, Sname, GPA, CID, Cname, Units

• Key: SID + CID

• Duplication?

Normalization• Decompose unsatisfactory relation into smaller

relations with desirable properties.– No duplication

• The original relation can be recovered by applying natural join to the smaller relations.– So that no information is lost in the process.

• Keys and function dependency:– Which field is the key field of the EMpDependent

Table?• EmpID + DepName

Function Dependency

• Relationship between attributes

• X -> Y– The value of X uniquely determines the value

of Y.– Y is functionally dependent on X.– A value of X is associated with only one value

of Y.

Example• Employee table:

– SSN Ename Sex DOB– S1 Peter M 1/1/75– S2 Paul M 12/25/80– S3 Mary F 7/4/72

• Function Dependencies:– SSN -> Ename, SSN ->Sex, SSN -> DOB– SSN -> Ename, Sex, DOB

• Any other FD:– Ename -> SSN ?– Ename -> Sex ?– DOB -> SSN ?

• What is the key of Employee table:– SSN

• Observations:– All non-key fields are functionally dependent on SSN.

– There is no other FD.

– The only FD is the key dependency.

– There is no data duplication in the Employee table.

Normalization Process

• Inputs: – A “universal relation”

– Function dependencies

• Output: Normalized tables• Process:

– Decompose the unnormalized relation into smaller relations such that in each relation the non key fields are functionally dependent on the key, the whole key, and nothing but the key. So help me Codd!

First Normal Form

• The fields of a relation are all simple attribute.– All relational database tables meet this

requirement.

• EmpDependent table:– EmpID, Ename, Address, Depname, Relation, DepDOB

– First normal form? Yes

– Second normal form?

Second Normal Form

• The non-key fields are functionally dependent on the key, and the whole key.– FD:

• EmpID ->Ename, Address

– Key: EmpID + Depname – Ename and Address depend on part of the key.

• Every non-key field is fully functionally dependent on the key.

• Decompose the EMpDependent table into two tables:– EmpID, Ename, Address– EmpID, Depname, Relation, DepDOB

• Employee Table:– SSN, Ename, Sex, DOB, Phone– Employee may have more than 1 phone.

• FD:– SSN -> Ename, Sex, DOB,– SSN -> Phone ?

• Key: SSN + Phone• 2NF? No• Decompose into two tables:

– SSN, Ename, Sex, DOB– SSN, Phone

• FacultyStudent table:– Faculty Advise Student: 1:M relationship– FID, Fname, Office, SID, Sname, SAddress

• FD:– FID -> Fname, Office– SID -> Sname, SAddress, FID, Fname, Office

• Key: SID• 2NF ? Yes• Duplication? Yes• Why?

– All non-key fields depend on the whole key, but not Nothing But the Key!

• SID -> FID, Fname, Office• FID -> Fname, Office

Transitive Dependency

• If X -> Y, and Y->Z then X -> Z.

• Z if transitively dependent on the key.

• SID -> FID, FID -> Fname, Office– SID -> Fname, Office– Fname and Office are transitively dependent on

SID.

Third Normal Form

• Every non-key field is:– Fully functionally dependent on the key, and– Non-transitively dependent on the key.

• Decompose:– FID, Fname, Office– SID, FID, Sname, SAddress

ExampleCustomer/Orders report:

CID: C101 Cname: Peter

Address: 123 XYZ St

OID Odate SalesPerson Amount

O25 1/1/04 John 125

O30 2/25/04 Alan 500

CustomerOrders Table:CID CName Address OID Odate SalesPerson Amount

C101 Peter 123 XYZ St O25 1/1/04 John 125

C101 Peter 123 XYZ St O30 2/25/04 Alan 500

Example

• Key: OID• FD:

– OID -> CID, Cname, Address, Odate, SalesPerson, Amount– CID -> Cname, Address

• 2NF? Yes• 3 NF? No• Decompose:

– CID, Cname, Address– OID, CID, Odate, SalesPerson, Amount

Example with 1:M Relationship

• FacultyStudent table:– Faculty Advise Student: 1:M relationship– FID, Fname, SID, Sname, SAddress

• FD: – FID -> Fname– SID -> Sname, Saddress

• Key: SID• 2NF? Yes• 3NF? No, because SID ->FID, FID -> Fname• Decompose:

– Table 1: FID, Fname– Tablw 2: SID, FID, Sname, SAddress

Example with M:M Relationship

• StudentCourse table:– SID, Sname, GPA, CID, Cname, Units

• Key: SID + CID• Function Dependencies:

– SID -> Sname, GPA– CID -> Cname, Units

• 2NF? No– Decompose:

• Table 1: SID -> Sname, GPA• Table 2: CID -> Cname, Units • Table 3: SID, CID

• 3NF? Yes

Online Shopping Cart

Customer ShoppingCart

Product

Has

Has

1 M

M

M

CID CnameAddr CartID Date

Qty

PIDPname

Price

Normalized Database

• Universal Relation:– CID, Cname, Addr, CartID, Date, PID, Pname, Price, Qty

• Key: CartID + PID• FDs:

– CartID -> Date, CID, Cname, Addr– CID -> Cname, Addr– PID -> Pname, Price

• Normalized database:– CID, Cname, Addr– CartID, Date, CID– PID, Pname, Price– CartID, PID, Qty

Database Design Based on ERD • Strong entity: Create a table that includes all simple attributes

– Composite• Weak entity: add owner primary key• Multi-valued attribute: Create a table for each multi-valued

attribute– Key + attribute

• Relationship: – 1:1, 1:M

• Relationship table: for partial participation to avoid null• Foreign key

– M:M: relationship table– N-ary relationship: relationship table– Recursive relationship

• Attribute of relationship• Superclass and subclass• Note: The database designed according to these

rules will meet the 3NF requirements.

Denormalization

• The refinement to the relational schema such that the degree of normalization for a modified relation is less than the degree of at least one of the original relations.

• Objective:– Speed up processing