Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational...

113
Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies 1

Transcript of Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational...

Page 1: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Chapter 15Functional Dependencies and

Normalization for Relational Databases

Chapter 16Relational Database Design Algorithms

and Further Dependencies

1

Page 2: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

2

Functional Dependency Armstrong’s Axioms Closure of FDs Closure of attribute(X) – Find Keys Minimal set of FDs Normalization Lossless Join Decomposition

Page 3: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Outline

3

1 Informal Design Guidelines for Relational Databases 1.1Semantics of the Relation Attributes 1.2 Redundant Information in Tuples and Update

Anomalies 1.3 Null Values in Tuples 1.4 Spurious Tuples

2 Functional Dependencies (FDs) 2.1 Definition of FD 2.2 Inference Rules for FDs 2.3 Equivalence of Sets of FDs 2.4 Minimal Sets of FDs

Page 4: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Outline

4

3 Normal Forms Based on Primary Keys3.1 Normalization of Relations

3.2 Practical Use of Normal Forms

3.3 Definitions of Keys and Attributes Participating in Keys

3.4 First Normal Form

3.5 Second Normal Form

3.6 Third Normal Form

4 General Normal Form Definitions (For Multiple Keys)

5 BCNF (Boyce-Codd Normal Form)

Page 5: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

5

What is relational database design? The grouping of attributes to form "good"

relation schemas  Two levels of relation schemas

The logical "user view" level The storage "base relation" level

 Design is concerned mainly with base relations

 What are the criteria for "good" base relations? 

Page 6: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

6

A simplified COMPANY relational database schema

Page 7: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

1.2 Redundant Information in Tuples and Update Anomalies

7

Information is stored redundantly Wastes storage Causes problems with update anomalies Insertion anomalies Deletion anomalies Modification anomalies

Page 8: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

8

Page 9: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

9

Page 10: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

10

Page 11: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Functional Dependency:

11

  Attribute B is functionally dependent on

attribute A if at each point in time, each value of A has only one value of B associated with it.

 

Page 12: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Examples:

12

a. SSN ENAME b. PNUMBER {PNAME, PLOCATION} c. {SSN, PNUMBER} HOURS

Page 13: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

functional dependency

13

A functional dependency (X Y)between two sets of attributes X and Y that aresubsets of R specifies a constraint on thepossible tuples that can form a relation state rof R. The constraint is that, for any two tuples

t1 and t2 in r that have t1[X] = t2[X], we must

also have t1[Y] = t2[Y].

Page 14: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

14

--X is a candidate key of R—this implies that X  Y for any subset of attributes Y of R

--If X Y in R, this does not say whether or not Y X in R.

Page 15: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Inference Rules:

15

F = {SSN {ENAME, BDATE, ADDRESS, DNUMBER},

       DNUMBER {DNAME,DMGRSSN}}

Can infer additional FDs such as:

SSN {DNAME, DMGRSSN},

SSN SSN,

DNUMBER DNAME

Page 16: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Inference rules (Armstrong’s Axioms)

16

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold

Armstrong's inference rules: IR1. (Reflexive) If Y subset-of X, then X -> Y IR2. (Augmentation) If X -> Y, then XZ -> YZ

(Notation: XZ stands for X U Z) IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z

IR1, IR2, IR3 form a sound and complete set of inference rules These are rules hold and all other rules that hold can

be deduced from these

Page 17: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Inference rules (Armstrong’s Axioms) -continued

17

IR4 (decomposition, or projective rule):

{X YZ} |= X Y.

IR5 (union, or additive rule):

{X Y, X Z} |= X YZ.

IR6 (pseudotransitive rule)

{X Y, WY Z } |= WX Z.

Page 18: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Armstrong’s Axioms

18

IR1 – Reflexive ruleX Y, then X Y

ssn, name ssn

IR2 – Augmentation ruleX Y |= XZ YZssn name |= ssn, dob

name,dob

Page 19: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

19

IR3 – Transitive rule X Y, Y Z |= X Z

Ssn phone#Phone# zipSsn zip

Page 20: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

PROOF OF IR4 (USING IR1 THROUGH IR3)

20

IR4 (decomposition, or projective, rule):

{X YZ} |= X Y.

1. X YZ (given). 2. YZ Y (using IR1 and knowing that YZ Y). 3. X Y (using IR3 on 1 and 2).

Page 21: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

PROOF OF IR5 (USING IR1 THROUGH IR3)

21

IR5 (union, or additive, rule)

{X Y, X Z} |= X YZ.

X Y (given). X Z (given). X XY (using IR2 on 1 by augmenting with X; notice that XX =

X).

XY YZ (using IR2 on 2 by augmenting with Y). X YZ (using IR3 on 3 and 4).

Page 22: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

PROOF OF IR6 (USING IR1 THROUGH IR3)

22

IR6 (pseudotransitive rule)

{X Y, WY Z } |= WX Z.

X Y (given). WY Z (given). WX WY (using IR2 on 1 by augmenting

with W). WX Z (using IR3 on 3 and 2).

Page 23: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Exercise

23

a) {w y, x z} |= {wx y}{w y, x z} |= {wx y}

w y Given

wx xy Aug(X)

xy y Reflexive

wx y Transitive

STUID DOB

C# C_TITLE

STUID, C# DOB

Page 24: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Closure of FDs (F+)

24

Set of FDs that could be derived from FDs using Armstrong’s Axioms

ssn name ssn dob

=> Ssn name, dob

Page 25: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

closure sets with respect to F (Example)

25

F = { SSN ENAME, NUMBER

{NAME,PLOCATION}, {SSN, PNUMBER}

HOURS}

{ SSN }+ = { SSN, ENAME }

{ PNUMBER }+ = { PNUMBER, PNAME,

PLOCATION }

{ SSN, PNUMBER }+ = { SSN, PNUMBER,

ENAME, PNAME, PLOCATION, HOURS }

Page 26: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Inference Rules for FDs (3)

26

Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F

Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X

X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F

Page 27: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Closure of attribute X (X+)

27

Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X

Algorithm 16.1 Determining X+, the closure of X

under FX+ := XRepeat old X+ := X+; for each functional dependency Y Z in F do if X+ Y then X+ := X+ U Z;Until ( X+ = old X+ );

p 310

Page 28: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

ExampleR(A,B,C,D,M,F,G)

28

FDs: A B B CD C M D FG

A+ = A = AB (A B) = ABCD (B CD) = ABCDM (C M) = ABCDFGM (DFG)A IS A KEY OF THIS RELATION

Page 29: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

R(STUID,F,M,L,STREET,CITY,STATE,ZIP,C#,CNAME,GRADE)

29

STUID F,M,L,STREET, CITY,STATE,ZIP

C# CNAME

ZIP CITY, STATE

STUID,C# GRADE

(STUID)+ = STUID

= STUID, F,M,L,STREET, CITY,STATE,ZIP

(C#)+ = C#

= C#, CNAME

(STUID,C#)+ = STUID,C# =STUID,C#, F,M,L,STREET, CITY,STATE,ZIP

= STUID,C#, F,M,L,STREET, CITY,STATE,ZIP ,CNAME

= STUID,C#, F,M,L,STREET, CITY,STATE,ZIP ,CNAME,GRADE

Page 30: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Algorithm 16.2 (a): Finding a Key K for R Given a set F of Functional Dependencies

30

Input: A universal relation R and a set of functional dependencies F on the attributes of R.

1. Set K := R;2. For each attribute A in K {

Compute (K - A)+ with respect to F;If (K - A)+ contains all the attributes in R,

then set K := K - {A}; }

Page 31: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Find key of R(ABCDFGMN)

31

FDs:AC B B CD C MN D FG

Is (AC) a candidate key?

(AC)+ = ABCDFGMN

A+ = AC+ = CMN

(AC)+ = AC = ABC (AC B) = ABCD (B CD)

= ABCDMN (C MN) = ABCDMNFG (D FG)

Page 32: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Find key of R(ABCDFGMN)

32

FDs:AC B B CD C MN D FG

A C

Is (AC) a candidate key?

(AC)+ = ABCDFGMN

A+ = AC = ABC (AC B) = ABCD (B CD)

= ABCDMN (C MN) = ABCDMNFG (D FG)

Page 33: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Find key of R(ABCDFGMN)

33

FDs:A BB CDC MND FG

ABCDFGMNACDFGMN A BAFGMN A B, B CDAFG A B, B C, C

MNA A B, B D, D

FG

Page 34: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Find key of R(ABCDFGMN)

34

FDs:AC B B CD C MN D FG

A C

Is (AC) a candidate key?

(AC)+ = ABCDFGMN

A+ = C+ =

Page 35: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Example:

35

Patient(Patient#, Pat_Name, Pat_Adr, Med_Exam_Description, Doctor_Number,

Med_Exam#, Date_Test, Test_Result, Doctor_Name)

Key:

Page 36: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Example:

36

Hotel_Bill(Guest_Id, Room#, Room_Rate, Date_Checkin, Total_Amount, Guest_Fname,

Guest_Lname, InvoiceNumber, Guest_Address)

Key:

Page 37: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

16.1.3 Minimal Sets of Functional Dependencies

37

F is minimal if:

1. Every dependency in F has a single attribute for its right-hand side.

2. We cannot replace any dependency X A in F with a dependency Y A, where Y is a proper subset of X, and still have a set of dependencies that is equivalent to F.

3. We cannot remove any dependency from F and still have a set of dependencies that is equivalent to F.

Page 38: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Algorithm 16.2  Finding a minimal cover F for a set of functional dependencies E (p550)

38

Input: a set of functional dependencies E.

1. Set F := E.2. Replace each functional dependency X {A1, A2,…, An}

in F by the n functional dependencies X A1, X A2, . . ., X An .

3. For each functional dependency X A in F

for each attribute B that is an element of X   if ((F - {X A}) D {(X - {B}) A}) is equivalent to F,    then replace X A with (X - {B}) A in F.

4. For each remaining functional dependency X A in F  if (F - {X A}) is equivalent to F,   then remove X A from F.

Page 39: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Example

39

1) The set of functional dependencies F of relation R(ABCDEFGHIJKLM) is

  A A,B,C,D,E

B C,D,E,F

C E,H

D,E F

D F,G

K L

L K

a) Find a minimal cover of the set F.

b) Find a candidate key for the relation R.

Page 40: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

40

Normalization: The process of decomposing complex data structures into simple relations according to a set of dependency rules. – reduce anomalies (update, insert, delete)

first proposed by Codd (1972)

Assumption: all non-key fields will be updated frequently

- tend to penalize retrieval

Page 41: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Problems with update anomalies

41

Insertion anomaliesDeletion anomaliesModification anomalies

Page 42: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

EXAMPLE OF AN UPDATE ANOMALY

42

Consider the relation: EMP_PROJ(Emp#, Proj#, Ename, Pname,

No_hours)

Update Anomaly: Changing the name of project number P1 from

“Billing” to “Customer-Accounting” may cause this update to be made for all 100 employees working on project P1.

Page 43: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

EXAMPLE OF AN INSERT ANOMALY

43

Consider the relation: EMP_PROJ(Emp#, Proj#, Ename, Pname,

No_hours) Insert Anomaly:

Cannot insert a project unless an employee is assigned to it.

Conversely Cannot insert an employee unless a he/she is

assigned to a project.

Page 44: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

EXAMPLE OF AN DELETE ANOMALY

44

Consider the relation: EMP_PROJ(Emp#, Proj#, Ename, Pname,

No_hours) Delete Anomaly:

When a project is deleted, it will result in deleting all the employees who work on that project.

Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.

Page 45: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

15.3 Normalization of Relations (1)

45

Normalization: The process of decomposing

unsatisfactory "bad" relations by breaking up their attributes into smaller relations

Normal form: Condition using keys and FDs of a

relation to certify whether a relation schema is in a particular normal form

Page 46: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Normalization of Relations (2)

46

2NF, 3NF, BCNF based on keys and FDs of a relation schema

4NF based on keys, multi-valued dependencies : MVDs;

5NF based on keys, join dependencies : JDs (Chapter 11)

Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation; Chapter 11)

Page 47: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Definitions of Keys and Attributes Participating in Keys (2)

47

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily

designated to be the primary key, and the others are called secondary keys.

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attribute—that is, it is not a member of any candidate key.

Page 48: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Problems:

48

Page 49: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

49

Page 50: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Decomposition by Analysis and Synthesis techniques

50

Decomposition by analysis Decomposition by synthesis

Page 51: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

51

Page 52: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Normalization into 1NF

52

Page 53: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Normalization of nested relations into 1NF

53

Page 54: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

54

Unnormalized & 1NF relation

Page 55: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

55

Page 56: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

First Normal Form(1NF):

56

  A relation is in first normal form if and only if every attribute is single-valued for each tuple.

-- 1NF disallows having a set of values, a tuple of values, or a combination of both as an attribute value for a single tuple. In other words, 1NF disallows "relations within relations" or "relations as attributes of tuples." The only attribute values permitted by 1NF are single atomic (or indivisible) values.

Page 57: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

First Normal Form

57

Disallows composite attributes multivalued attributes nested relations; attributes whose values for an

individual tuple are non-atomic

Page 58: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

58

Page 59: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Second Normal Form (1)

59

Uses the concepts of FDs, primary key Definitions

Prime attribute: An attribute that is member of the primary key K

Full functional dependency: a FD Y -> Z where removal of any attribute from Y means the FD does not hold any more

Examples: {SSN, PNUMBER} -> HOURS is a full FD since neither SSN

-> HOURS nor PNUMBER -> HOURS hold {SSN, PNUMBER} -> ENAME is not a full FD (it is called a

partial dependency ) since SSN -> ENAME also holds

Page 60: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Second Normal Form (2)

60

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Page 61: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Second Normal Form(2NF):

61

  A relation is in second normal form if and only if it is in first normal form and all the nonkey attributes are fully functionally dependent on the key.

Grade_report(SID, C#, stu_name, course_title)

not in 2NF (there are partial dependencies)

Page 62: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Full functional dependency

62

A functional dependency X Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more; that is, for any attribute A X, (X - {A}) does not functionally determine Y.

Page 63: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

(definition from the book)

63

Second normal form (2NF) is based on the concept of full functional dependency. A functional dependency X Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more; that is, for any attribute A X, (X - {A}) does not functionally determine Y. A functional dependency X Y is a partial dependency if some attribute A X can be removed from X and the dependency still holds; that is, for some A X, (X - {A}) Y.

Page 64: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Second normal form is violated when a non-key field is a fact

about a subset of a key. It is only relevant when the key is

composite, i.e., consists of several fields

64

Function Dependencies (F):Warehouse Warehouse_addressPart, Warehouse quantity

William Kent (CACM1982)

Page 65: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

65

Normalizing into 2NF and 3NF

Page 66: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

3.4 Third Normal Form (1)

66

Definition: Transitive functional dependency: a FD X ->

Z that can be derived from two FDs X -> Y and Y -> Z

Examples: SSN -> DMGRSSN is a transitive FD

Since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold

SSN -> ENAME is non-transitive Since there is no set of attributes X where SSN -> X and

X -> ENAME

Page 67: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Third Normal Form (2)

67

A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE: In X -> Y and Y -> Z, with X as the primary key, we

consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the

transitive dependency . E.g., Consider EMP (SSN, Emp#, Salary ).

Here, SSN -> Emp# -> Salary and Emp# is a candidate key.

Page 68: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

68

Successive Normalization of LOTS into 2NF and 3NF

Page 69: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Normal Forms Defined Informally

69

1st normal form All attributes depend on the key

2nd normal form All attributes depend on the whole key

3rd normal form All attributes depend on nothing but the key

Page 70: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

70

SUMMARY OF NORMAL FORMS based on Primary Keys

Page 71: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

General Normal Form Definitions (For Multiple Keys) (1)

71

The above definitions consider the primary key only

The following more general definitions take into account relations with multiple candidate keys

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on every key of R

Page 72: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

General Normal Form Definitions (2)

72

Definition: Superkey of relation schema R - a set of

attributes S of R that contains a key of R A relation schema R is in third normal form

(3NF) if whenever a FD X -> A holds in R, then either: (a) X is a superkey of R, or (b) A is a prime attribute of R

NOTE: Boyce-Codd normal form disallows condition (b) above

Page 73: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

73

Third Normal Form(3NF):  A relation is in third normal form if it is in

second normal form and no nonkey attribute is transitively dependent on the key.

Course_info(C#, course_title, Instructure_name, Inst_phone)

Not in 3NF – there is a transitive dependency

C# course_title, Instructure_nameInstructure_name Inst_phone

Page 74: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

From the book

74

Third normal form (3NF) is based on the concept of transitive dependency. A functional dependency X Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X Z and Z Y hold.

Codd’s original definition, a relation

schema R is in 3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key

Page 75: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Third normal form is violated when a non-key field is a fact about another non-key field

75

Function Dependencies (F):Employee DepartmentDepartment Location

William Kent ( CACM 1982)

Page 76: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Summary of Normal Forms Based on Primary Keys and corresponding Normalization.

76

First (1NF) Relation should have no nonatomic attributes or nested relations

Form new relations for each nonatomic attribute or nested relation

Second (2NF) For relations where primary key contains multiple attributes, no nonkey attribute should be functionally dependent on a part of the primary key

Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it.

Third (3NF) Relation should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes.) That is, there should be no transitive dependency of a nonkey attribute on the primary key.

Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s).

Page 77: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

77

Unnormalized & 1NF relation

Page 78: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

78

University Database (Third Normal Form relations)

Page 79: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

79

Philip Bernstein, “Synthesizing Third Normal Form Relations from Functional Dependencies,” ACM Transactions on Database Systems, 1976, pp 277-298.

(University of Toronto)

To find 3NF relations - Find Minimal Cover of FDs

- Each FD— convert to 3NF relation - If key of the Universal DB is not in any

relations then create a relation from the key

Page 80: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

80

A) INVOICE(INVOICE#, CUST_ID, CUST_NAME, BILL_DATE, TOT_AMOUNT, SALES_REP#, SALES_REP_NAME)

Key: Normal Form:

3NF relations:

Page 81: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

81

B) PRJ_EMP(PROJECT#, MANAGER_ID, PROJECT_TITLE, EMP_ID, EMP_LNAME, EMP_FNAME, MANAGER_NAME, EXPECT_END_PROJECT_DATE, EMP_HOURLY_RATE,HOURS_WORK_PER_PRJ)

Key: Normal Form:

3NF relations:

Page 82: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

82

Page 83: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Boyce-Codd Normal Form(BCNF):

83

A relation is in Boyce-Codd normal form if and only if every determinant is a candidate key (1974).

(every relation in BCNF is also in 3NF)

Determinant– attribute(s) on the LHS of FDA BC (A is determinant)

Page 84: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

BCNF (Boyce-Codd Normal Form)

84

A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever an FD X -> A holds in R, then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Page 85: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

85

Page 86: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

86

Page 87: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Boyce-Codd Normal Form

87

Page 88: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

A relation TEACH that is in 3NF but not in BCNF

88

Page 89: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Achieving the BCNF by Decomposition (1)

89

Two FDs exist in the relation TEACH: fd1: { student, course} -> instructor fd2: instructor -> course

{student, course} is a candidate key for this relation and that the dependencies shown follow the pattern in Figure 10.12 (b). So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet this property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations. (See Algorithm 11.3)

Page 90: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Multivalued Dependency

90

  Let R be a relation having attributes or sets of attributes A,B,C. There is a multivalued dependence of attribute B on attribute A if and only if the set of B values associated with a given A value is independent of the C values. 

A > B,CB -/-> C or C -/-> B

Page 91: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Fourth Normal Form(4NF):

91

A relation is in fourth normal form if and only if it is in Boyce-Codd normal form and there are no nontrivial multivalued dependencies.

 

Page 92: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

92

Page 93: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

93

Employee > Skill, Language

Language --/ skill

Employee Skill Language

Employee Skill Employee Language

William Kent (1982)

Page 94: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Fifth Normal Form(5NF):

94

A relation is in fifth normal form if no remaining nonloss projections are possible, except the trivial one in which the key appears in each projection.

Page 95: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

95

Fifth normal form when its information content cannot be reconstructed from several smaller record types

William Kent (CACM 1982)

Page 96: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Domain-Key Normal Form(DKNF):

96

A relation is in domain-key normal form if every constraint is a logical consequence of domain constraints or key constraints.

Page 97: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

97

Page 98: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

98

Page 99: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

99

Page 100: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

100

Page 101: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Exercise!!

101

Classify each of the following relations as 1NF, 2NF, 3NF, BCNF, 4NF, or 5NF. (State any assumptions you make)

a) COURSE(COURSE#,C_TITLE,INS_NAME,TEXT_BOOK,PUBLISHER)

b) ENROLLMENT(STUID,COURSE#,STUNAME,COURSE_TITLE,

GRADE)

c) PART(PART#,PART_NAME,DEPARTMENT,PRICE)

Page 102: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

102

d) EMPLOYEE(EMPID,EMPNAME,DATE_HIRED,JOB_TITLE,

JOB_LEVEL,PROJ#,PROJ_TITLE,DUEDATE)

 

e) BOOK(ISBN,TITLE,AUTHOR,AUTHOR_INSTITUTE,PRICE,

PUBLISHER_NAME,PUBLISHER_ADR)

  

f) CUSTOMER(CUSTID,CUSTNAME, PHONE#,

STREET,CITY,STATE,ZIP)

Page 103: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Lossless Join Decomposition

103

(R1, R2) is a decomposition of R F is a set of functional dependencies

(R1, R2) is lossless join with to F if and only if

R1 R2 R1 – R2 or R1 R2 R2 – R1

And R1 U R2 = R

Page 104: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

104

Page 105: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Dependency preservation

105

Dependency preservation: Let Fi be the set of

dependencies F+ that include only attributes in Ri. dependency preserving

(F1 F2 … Fn)+ = F+

Otherwise, checking updates for violation of functional dependencies may require computing joins, which is expensive.

Page 106: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

106

Example -- dependency preservation: R = (A, B, C)

F = {A B, B C)Two ways to decompose this relation:

1) R1 = (A, B), R2 = (B, C) Lossless-join decomposition:

R1 R2 = {B} and R2 - R1 = C Dependency preserving

2) R1 = (A, B), R2 = (A, C) Lossless-join decomposition:

R1 R2 = {A} and R1 - R2 = B

Not dependency preserving (cannot check B C without computing R1 x R2)

3NF – always dependency preservationBCNF – not always

Page 107: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

107

Page 108: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

108

Page 109: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

109

Page 110: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Exercise

110

1) The set of functional dependencies F of relation R(ABCDEFGHIJKLM) is

 A A,B,C,D,E

B C,D,E,F

C E,H

D,E F

D F,G

K L

L K

 

a) Find a minimal cover of the set F.

b) Find a candidate key for the relation R.

c) Find a lossless join decomposition of R into 3NF (also give a proof of lossless join).

 

Page 111: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

111

2) The set of functional dependencies F of relation R(ABCDEFGHIJKL) is

  A ---> B,C,D,E,KBC ---> D,E,F

C ---> K,H

D,B ---> G

D ---> F,G

K ---> L

 

a)      Find a minimal cover of the set F.

b)      Find a candidate key for the relation R.

c)      Find a lossless join decomposition of R into 3NF (also give a proof of lossless join).

 

Page 112: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

112

REFERENCES

1. E.F. Codd, "A Relational Model of Data for Large Shared Data Banks", Comm. ACM 13 (6), June 1970, pp. 377-387. The original paper introducing the relational data model.

2. E.F. Codd, "Normalized Data Base Structure: A Brief Tutorial", ACM SIGFIDET Workshop on Data Description, Access, and Control, Nov. 11-12, 1971, San Diego, California, E.F. Codd and A.L. Dean (eds.). An early tutorial on the relational model and normalization.

3. E.F. Codd, "Further Normalization of the Data Base Relational Model", R. Rustin (ed.), Data Base Systems (Courant Computer Science Symposia 6), Prentice-Hall, 1972. Also IBM Research Report RJ909. The first formal treatment of second and third normal forms.

4. C.J. Date, An Introduction to Database Systems (third edition), Addison-Wesley, 1981. An excellent introduction to database systems, with emphasis on the relational.

5. R. Fagin, "Multivalued Dependencies and a New Normal Form for Relational Databases", ACM Transactions on Database Systems 2 (3), Sept. 1977. Also IBM Research Report RJ1812. The introduction of fourth normal form.

6. R. Fagin, "Normal Forms and Relational Database Operators", ACM SIGMOD International Conference on Management of Data, May 31-June 1, 1979, Boston, Mass. Also IBM Research Report RJ2471, Feb. 1979. The introduction of fifth normal form.

7. W. Kent, "A Primer of Normal Forms", IBM Technical Report TR02.600, Dec. 1973. An early, formal tutorial on first, second, and third normal forms.

8. T.-W. Ling, F.W. Tompa, and T. Kameda, "An Improved Third Normal Form for Relational Databases", ACM Transactions on Database Systems, 6(2), June 1981, 329-346. One of the first treatments of inter-relational dependencies.

Page 113: Chapter 15 Functional Dependencies and Normalization for Relational Databases Chapter 16 Relational Database Design Algorithms and Further Dependencies.

Exercise 14.18 (p498)

113

a) {w y, x z} |= {wx y}{w y, x z} |= {wx y}

w y Given

wx xy Aug(X)

xy y Reflexive

wx y Transitive

STUID DOB

C# C_TITLE

STUID, C# DOB