14-1 Chapter 14 Functional Dependencies and Normalization for Relational Database.


12-1 14-2

1 Informal Design Guidelines for Relational Databases
1.1 Semantics of the Relation Attributes
1.2 Redundant Information in Tuples and Update Anomalies
1.3 Null Values in Tuples
1.4 Generation of Spurious Tuples

2 Functional Dependencies (FDs)
2.1 Definition of FD
2.2 Inference Rules for FDs
2.3 Equivalence of Sets of FDs
2.4 Minimal Sets of FDs

3 Normal Forms Based on Primary Keys
3.1 Introduction to Normalization
3.2 First Normal Form
3.3 Second Normal Form
3.4 Third Normal Form

4 General Normal Form Definitions (For Multiple Keys)

5 BCNF (Boyce-Codd Normal Form)

12-2 14-3

1. Informal Design Guidelines for Relational Databases

• What is relational database design? The grouping of attributes to form “good” relation schemas

• Two levels of relation schemas:

– The logical “user view” level

– The storage “base relation” level

• Design is concerned mainly with base relations

• What are the criteria for “good” base relations?

12-2 14-4

1 Informal Design Guidelines for Relational Databases (Cont.)

• We first discuss informal guidelines for good relational design

• Then we discuss formal concepts of functional dependencies and normal forms

– 1 NF (First Normal Form)

– 2 NF (Second Normal Form)

– 3 NF (Third Normal Form)

– BCNF (Boyce-Codd Normal Form)

• Additional types of dependencies, further normal forms, relational design algorithms are discussed in Chapter 15

12-3 14-5

1.1 Semantics of the Relation Attributes

• Informally, each tuple should represent one entity or relationship instance

• Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation

• Only foreign keys should be used to refer to other entities (see Figure 14.1 on 14-7)

Informal measures:

• semantics of attributes
• reducing redundant values in tuples
• reducing null values in tuples
• disallowing spurious tuples

12-3 14-6

1.2 Redundant Information in Tuples and Update Anomalies

• Mixing attributes of multiple entities may cause problems

• Information is stored redundantly (i.e., wasting storage (see 14-11))

• Problems with update anomalies:
– Insertion anomalies
– Deletion anomalies
– Modification anomalies

12-4 14-7

Figure 14.1 Simplified version of the COMPANY relational database schema

12-4a 14-8

Figure 14.2 Example relations for the schema of Figure 14.1

12-4a 14-9

Figure 14.2 Example relations for the schema of Figure 14.1 (Cont.)

12-5 14-10

GUIDELINE 1.

• Design a relation schema so that it is easy to explain its meaning.

• Do not combine attributes from multiple entity types and relationship types into a single relation

EMPLOYEE ＊ DEPARTMENT

attributes from department

attributes from project

12-6 14-11

12-6 14-12

12-7 14-13

Insertion Anomalies

• To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for the department that the employee works for, or nulls (if the employee does not work for a department).

12-7 14-14

Insertion Anomalies (Cont.)

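• It is difficult to insert a new department that has no employees as yet in the EMP_DEPT relation
– Place null values in the employee attributes? Not possible, because SSN is the primary key
– Even if such a placeholder tuple were stored, it would no longer be needed once the first employee is assigned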

12-8 14-15

Deletion Anomalies

• If we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for a particular department, the information concerning that department is lost.

12-8 14-16

Modification Anomalies

• In EMP_DEPT, if we change the value of one of the attributes of a particular department, we must update the tuples of all employees who work in that department; otherwise the database becomes inconsistent.

12-8 14-17

GUIDELINE 2.

Design the base relation schemas so that no insertion, deletion, or modification anomalies are present.

Cost: a join is needed to reassemble the information (this can be hidden behind a view definition)
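The anomalies above can be illustrated with a small sketch. The tuples and names below are hypothetical values made up for this illustration (they are not the contents of Figure 14.2), and the relations are modelled simply as Python lists of dictionaries.

```python
# Hypothetical EMP_DEPT tuples: department data is repeated for every employee.
emp_dept = [
    {"SSN": "111", "ENAME": "Ann", "DNUMBER": 5, "DNAME": "Research"},
    {"SSN": "222", "ENAME": "Bob", "DNUMBER": 5, "DNAME": "Research"},
    {"SSN": "333", "ENAME": "Cy", "DNUMBER": 4, "DNAME": "Admin"},
]

def dept_names(rows, dnumber):
    """Return the set of names stored for one department number."""
    return {r["DNAME"] for r in rows if r["DNUMBER"] == dnumber}

# Modification anomaly: renaming department 5 in only one tuple
# leaves two different names for the same department.
emp_dept[0]["DNAME"] = "R&D"
inconsistent = dept_names(emp_dept, 5)

# Deletion anomaly: deleting the last employee of department 4
# also deletes everything known about department 4.
remaining = [r for r in emp_dept if r["SSN"] != "333"]
lost = dept_names(remaining, 4)

# Decomposed design in the spirit of Guideline 2: each department fact is
# stored once, so a rename is a single update (at the cost of a join).
employee = [{k: r[k] for k in ("SSN", "ENAME", "DNUMBER")} for r in emp_dept]
department = {5: "Research", 4: "Admin"}
department[5] = "R&D"
```

In the decomposed version, an employee's department name is obtained by looking up `DNUMBER` in `department`, which corresponds to the join mentioned as the cost of Guideline 2.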

12-9 14-18

1.3 Null Values in Tuples

• Relations should be designed such that their tuples will have few NULL values if possible.

• Attributes that are NULL frequently could be placed in separate relations (with the primary key)

• A NULL can mean: not applicable, unknown, or known but absent

• Problems with NULLs: wasted space, and unclear results for joins and for aggregates (COUNT, SUM, AVG)

• Example: office numbers (~ 10% of employees) can be stored in EMP_OFFICES (ESSN, OFFICE_NUMBER)

12-9 14-19

1.4 Spurious Tuples

• Bad designs for a relational database may result in erroneous results for certain JOIN operations

• The “lossless join” property is used to guarantee meaningful results for join operations

• The relations should be designed to satisfy the lossless join condition

• Discussed in Chapter 15

12-10 14-20

12-10 14-21

EMP_LOC = Π ENAME, PLOCATION (EMP_PROJ) (see 14-22)

EMP_PROJ1 = Π SSN, PNUMBER, HOURS, PNAME, PLOCATION (EMP_PROJ)

12-11 14-22

12-12 14-23

EMP_PROJ1 ＊ EMP_LOC

12-12 14-24

GUIDELINE 4.

• Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.
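A minimal sketch of how spurious tuples arise: EMP_PROJ is projected onto the two schemas used in this section and the projections are joined back on PLOCATION, which is neither a primary key nor a foreign key. The sample tuples and the helper names `project` and `natural_join` are assumptions made for this illustration.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    distinct = dict.fromkeys(tuple(r[a] for a in attrs) for r in rows)
    return [dict(zip(attrs, t)) for t in distinct]

def natural_join(r, s):
    """Natural join on every attribute name the two relations share."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in common)]

# Hypothetical EMP_PROJ instance: two employees on different projects
# that happen to be at the same location.
emp_proj = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 10, "ENAME": "Ann",
     "PNAME": "ProductX", "PLOCATION": "Houston"},
    {"SSN": "222", "PNUMBER": 2, "HOURS": 20, "ENAME": "Bob",
     "PNAME": "ProductY", "PLOCATION": "Houston"},
]

emp_loc = project(emp_proj, ["ENAME", "PLOCATION"])
emp_proj1 = project(emp_proj, ["SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"])

joined = natural_join(emp_proj1, emp_loc)
spurious = [t for t in joined if t not in emp_proj]
```

With this data the join returns four tuples although EMP_PROJ has only two; the two extra tuples pair each employee's SSN with the other employee's name.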

12-13 14-25

2 Functional Dependencies

• Functional dependencies (FDs) are used to specify formal measures of the ‘goodness’ of relational designs

• FDs and keys are used to define normal forms for relations

• FDs are constraints that are derived from the meaning and interrelationships of the data attributes

12-13 14-26

2.1 Definition of FD

• A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y

• Written as X→ Y; can be displayed graphically on a relation schema as in Figure 14.3 (see 14-11)

• Specifies a constraint on all relation instances r(R)

12-13/14 14-27

2.1 Definition of FD (Cont.)

• For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y]

• X is a candidate key of R ⇒ X → Y for any subset Y of R

• X → Y holds if whenever two tuples have the same value for X, they must have the same value for Y

• FDs are derived from the real-world constraints on the attributes

12-14 14-28

Examples of FD constraints:

• Social security number determines employee name SSN → ENAME

• Project number determines project name and location PNUMBER →{PNAME, PLOCATION}

12-14 14-29

Examples of FD constraints: (Cont.)

• Employee SSN and project number determine the hours per week that the employee works on the project: {SSN, PNUMBER} → HOURS

• An FD is a property of the attributes in the schema R

• The constraint must hold on every relation instance r(R)

• If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K])

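TEACH

TEACHER   COURSE      TEXT
Smith     D.S.        Bartram
Smith     D.M.        Al-Nour
Hall      Compilers   Hoffman
Brown     D.S.        Augenthaler

TEACHER → COURSE does not hold (Smith teaches both D.S. and D.M.)
COURSE → TEXT does not hold (D.S. uses both Bartram and Augenthaler)
TEXT → COURSE (P): possibly holds; no pair of tuples in this instance contradicts it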
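The definition of an FD can be checked mechanically against a single relation instance. The sketch below does this for the TEACH instance, assuming the rows pair up in the order the three columns are listed; the function name `fd_holds` is hypothetical. Note that an instance can only refute an FD: passing the check does not prove that the FD holds on the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two tuples in rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, else returns the old y
        if seen.setdefault(x, y) != y:
            return False
    return True

# TEACH instance (row pairing assumed from the column order on the slide).
teach = [
    {"TEACHER": "Smith", "COURSE": "D.S.", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "D.M.", "TEXT": "Al-Nour"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffman"},
    {"TEACHER": "Brown", "COURSE": "D.S.", "TEXT": "Augenthaler"},
]

teacher_course = fd_holds(teach, ["TEACHER"], ["COURSE"])  # refuted by Smith
course_text = fd_holds(teach, ["COURSE"], ["TEXT"])        # refuted by D.S.
text_course = fd_holds(teach, ["TEXT"], ["COURSE"])        # not refuted
```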

12-14a 14-30

Inference Rules for Functional Dependencies

• The designer specifies the functional dependencies that are semantically obvious.

• Closure of F: F+ = { X → Y | F ⊨ X → Y }

• F ⊨ X → Y: X → Y is inferred from F, i.e., whenever r (an extension of R) satisfies all the dependencies in F, X → Y also holds in r.

• Example: F = { SSN → {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER → {DNAME, DMGRSSN} }

F ⊨ SSN → {DNAME, DMGRSSN}
F ⊨ SSN → SSN
F ⊨ DNUMBER → DNAME

12-15 14-31

2.2 Inference Rules for FDs

• Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold

12-15 14-32

Armstrong’s inference rules:

• Notation: {X,Y} → Z ≡ XY → Z, and {X,Y,Z} → {U,V} ≡ XYZ → UV

A1. (Reflexive) If Y ⊆ X, then X → Y (trivial dependency)

A2. (Augmentation) If X → Y, then XZ → YZ (Notation: XZ stands for X ∪ Z)

A3. (Transitive) If X → Y and Y → Z, then X → Z

• A1, A2, A3 form a sound and complete set of inference rules

12-15 14-33

Some additional inference rules that are useful:

(Decomposition) If X →YZ, then X →Y and X→Z

(Union) If X →Y and X →Z, then X →YZ

(Pseudotransitivity) If X → Y and WY → Z, then WX → Z

• The last three inference rules, as well as any other inference rule, can be deduced from A1, A2, and A3 (completeness property)

12-16 14-34

A1. (Reflexive) If Y ⊆ X, then X → Y

Proof.

Assume t1, t2 ∈ r of R and t1[X] = t2[X]

∵ Y ⊆ X and t1[X] = t2[X] ∴ t1[Y] = t2[Y], so X → Y holds

12-16 14-35

A2. (Augmentation) If X → Y, then XZ → YZ

Proof (by contradiction).

Assume X → Y holds in an instance r of R and XZ → YZ does not hold. Then there exist t1, t2 ∈ r such that:

1) t1[X] = t2[X] (X → Y)
2) t1[Y] = t2[Y] (X → Y)
3) t1[XZ] = t2[XZ] (XZ → YZ does not hold)
4) t1[YZ] ≠ t2[YZ] (XZ → YZ does not hold)
5) t1[Z] = t2[Z] (from 1) and 3))
6) t1[YZ] = t2[YZ] (from 2) and 5)), which contradicts 4)
7) Hence XZ → YZ

12-17 14-36

A3. (Transitive) If X → Y and Y → Z, then X → Z

Proof.

Assume t1, t2 ∈ r of R and t1[X] = t2[X]

1) X → Y (given)
2) Y → Z (given)
3) t1[Y] = t2[Y] (t1[X] = t2[X] & (1))
4) t1[Z] = t2[Z] ((3) & (2))
5) X → Z (t1[X] = t2[X] & (4))

12-17 14-37

Decomposition Rule: {X → YZ} ⊨ X → Y

1) X →YZ (given)

2) YZ →Y (Reflexive rule)

3) X →Y (Transitive rule)

12-17 14-38

Union Rule: {X → Y, X → Z} ⊨ X → YZ

1) X → Y (given)

2) X → Z (given)

3) X → XY (augmenting (1) with X)

4) XY → YZ (augmenting (2) with Y)

5) X → YZ (transitive rule on (3) & (4))

12-18 14-39

Pseudotransitive Rule: {X → Y, WY → Z} ⊨ WX → Z

1) X → Y (given)

2) WY → Z (given)

3) WX → WY (augmenting (1) with W)

4) WX → Z (transitive rule on (3) & (2))

12-19 14-40

2.2 Inference Rules for FDs (Cont.)

• Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F: F+ = { X → Y | F ⊨ X → Y }

• Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X: X+ = { Y | F ⊨ X → Y }

• X+ can be calculated by repeatedly applying A1, A2, A3 using the FDs in F

12-20 14-41

Algorithm 12.1 Determining X+

X+ := X;
repeat
    oldX+ := X+;
    for each functional dependency Y → Z in F do
        if Y ⊆ X+ then X+ := X+ ∪ Z;
until (oldX+ = X+);

12-20 14-42

Example

F = { SSN → ENAME, PNUMBER → {PNAME, PLOCATION},

{SSN, PNUMBER} → HOURS}

{SSN}+ = {SSN,ENAME}

{PNUMBER}+ = {PNUMBER, PNAME, PLOCATION}

{SSN, PNUMBER}+ = {SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS}

{SSN, PNUMBER} is a key
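The closure algorithm and the example above can be reproduced with a short sketch. Representing each FD as a pair of Python sets `(lhs, rhs)` and the function name `closure` are choices made for this illustration, not part of the original algorithm statement.

```python
def closure(attrs, fds):
    """Compute X+ for the attribute set attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:                      # repeat ... until X+ stops growing
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs           # X+ := X+ ∪ Z
                changed = True
    return result

F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

ssn_plus = closure({"SSN"}, F)
pnumber_plus = closure({"PNUMBER"}, F)
both_plus = closure({"SSN", "PNUMBER"}, F)
```

Since `both_plus` contains every attribute that appears in F while neither `ssn_plus` nor `pnumber_plus` does, the sketch agrees with the statement that {SSN, PNUMBER} is a key.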

12-19 14-43

2.3 Equivalence of Sets of FDs

• Two sets of FDs F and G are equivalent if:

– Every FD in F can be inferred from G, and

– Every FD in G can be inferred from F

• Hence, F and G are equivalent if F+ = G+

• Definition: F covers G if every FD in G can be inferred from F (i.e., if G+ ⊆ F+)

12-19 14-44

2.3 Equivalence of Sets of FDs (Cont.)

• F and G are equivalent if F covers G and G covers F

• There is an algorithm for checking equivalence of sets of FDs

F covers E: ∀ X → Y ∈ E, compute X+ w.r.t. F and check that Y ⊆ X+

E covers F: ∀ X → Y ∈ F, compute X+ w.r.t. E and check that Y ⊆ X+
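The equivalence check described above reduces to attribute closures. A minimal sketch, assuming FDs are stored as `(lhs, rhs)` pairs of sets; the two FD sets `F` and `G` over attributes A, C, D, E, H are illustrative inputs chosen for this example, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def covers(f, g):
    """f covers g if every FD X -> Y in g satisfies Y ⊆ X+ computed w.r.t. f."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)

def equivalent(f, g):
    """Two FD sets are equivalent if each covers the other."""
    return covers(f, g) and covers(g, f)

# Illustrative FD sets (assumed for this example).
F = [({"A"}, {"C"}), ({"A", "C"}, {"D"}), ({"E"}, {"A", "D"}), ({"E"}, {"H"})]
G = [({"A"}, {"C", "D"}), ({"E"}, {"A", "H"})]
H = [({"A"}, {"C"})]

f_equiv_g = equivalent(F, G)
f_covers_h = covers(F, H)
h_covers_f = covers(H, F)
```

Here F and G are equivalent, whereas H is covered by F but does not cover it, so F and H are not equivalent.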

12-21 14-45

2.4 Minimal Sets of FDs

• A set of FDs is minimal if it satisfies the following conditions:

1) Every dependency in F has a single attribute for its RHS.

2) We cannot remove any dependency from F and have a set of dependencies that is equivalent to F

3) We cannot replace any dependency X → A in F with a dependency Y → A, where Y ⊂ X, and still have a set of dependencies that is equivalent to F.

12-21 14-46

2.4 Minimal Sets of FDs (Cont.)

• Every set of FDs has an equivalent minimal set

• There can be several equivalent minimal sets

• Having a minimal set is important for some relational design algorithms (see Chapter 15)

12-21a 14-47

Algorithm 14.2 Finding a minimal cover G for F

1. Set G := F.

2. Replace each functional dependency X → {A1, A2, …, An} in G by the n functional dependencies X → A1, X → A2, …, X → An.

3. For each functional dependency X → A in G
       for each attribute B that is an element of X
           if ((G − {X → A}) ∪ {(X − {B}) → A}) is equivalent to G,
           then replace X → A with (X − {B}) → A in G.

4. For each remaining functional dependency X → A in G
       if (G − {X → A}) is equivalent to G,
       then remove X → A from G.
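The four steps can be sketched in Python using attribute closure for the equivalence tests: replacing X → A by (X − {B}) → A preserves equivalence exactly when A ∈ (X − {B})+ under G, and X → A is redundant exactly when A ∈ X+ under G − {X → A}. The representation of FDs as `(lhs, rhs)` pairs, the function names, and the input set `E` are assumptions made for this illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    # Steps 1-2: copy F and split every right-hand side into single attributes.
    g = [(frozenset(lhs), frozenset([a])) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 3: drop left-hand-side attributes that are not needed.
    for i in range(len(g)):
        lhs, rhs = g[i]
        for b in sorted(lhs):
            reduced = lhs - {b}
            if rhs <= closure(reduced, g):
                lhs = reduced
                g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))          # drop exact duplicates
    # Step 4: drop dependencies that follow from the remaining ones.
    i = 0
    while i < len(g):
        rest = g[:i] + g[i + 1:]
        lhs, rhs = g[i]
        if rhs <= closure(lhs, rest):
            g = rest
        else:
            i += 1
    return g

# Illustrative input (assumed): E = {B -> A, D -> A, AB -> D}
E = [({"B"}, {"A"}), ({"D"}, {"A"}), ({"A", "B"}, {"D"})]
cover = minimal_cover(E)
```

For this input the sketch returns the two dependencies D → A and B → D. Because several minimal covers can exist, a different processing order may produce a different but equivalent result for other inputs.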

12-22 14-48

3 Normal Forms Based on Primary Keys

3.1 Introduction to Normalization

• Normalization: Process of decomposing unsatisfactory “bad” relations by breaking up their attributes into smaller relations

• Normal form: Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form

12-22 14-49

3.1 Introduction to Normalization (Cont.)

• 2NF, 3NF, BCNF are based on the keys and FDs of a relation schema
– Prime attribute: a member of any key
– Nonprime attribute: an attribute that is not a member of any key

• 4NF based on keys, MVDs; 5NF based on keys, JDs (Chapter 15)

• Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation; Chapter 15)

12-22 14-50

3.2 First Normal Form

• Disallows composite attributes, multivalued attributes, and nested relations: attributes whose values for an individual tuple are non-atomic

• Considered to be part of the definition of relation

12-23 14-51

Figure 14.8

(a) A relation schema that is not in 1NF

(b) Example relation instance

12-23 14-52

Figure 14.8 (Cont.)

(c) 1NF relation with redundancy

alternative 1

SSN → PLOCATION

KEY: {DNUMBER, DLOCATION}

alternative 2

(better)

SSN → DLOCATION

12-24 14-53

Figure 14.9 (a)

A nested relation PROJS within EMP_PROJ

Primary key Partial key

EMP_PROJ (SSN, ENAME, {PROJS (PNUMBER, HOURS)})

12-24 14-54

Figure 14.9 (b) Example extension of the EMP_PROJ relation showing nested relations within each tuple.

12-24 14-55

Figure 14.9 (c)

Decomposing EMP_PROJ into 1NF relations by migrating the primary key

12-25 14-56

3.3 Second Normal Form

• Uses the concepts of FDs, primary key

Definitions:

• Prime attribute – an attribute that is a member of the primary key K (candidate key??)

• Full functional dependency – an FD Y → Z where removal of any attribute from Y means the FD does not hold any more: ∀ A ∈ Y, (Y − {A}) → Z does not hold

12-25 14-57

3.3 Second Normal Form (Cont.)

Example:

• {SSN, PNUMBER} → HOURS is a full FD since neither SSN → HOURS nor PNUMBER → HOURS holds

• {SSN, PNUMBER} → ENAME is not a full FD (it is called a partial dependency) since SSN → ENAME also holds

Partial dependency: ∃ A ∈ Y, (Y − {A}) → Z (here A = PNUMBER)
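A full functional dependency can be tested with attribute closures: Y → Z is full when it holds and no (Y − {A}) still determines Z. The sketch below uses the three FDs from the earlier examples in this chapter; the function names and the set-based representation are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full_fd(lhs, rhs, fds):
    """True if lhs -> rhs follows from fds and fails after removing any one attribute."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return False                     # the FD does not hold at all
    # Removing one attribute at a time is enough: any smaller determining
    # subset is contained in some lhs - {a}.
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)

F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

hours_full = is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, F)
ename_full = is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, F)
```

`hours_full` is true and `ename_full` is false, matching the full and partial dependencies in the example.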

12-25 14-58

3.3 Second Normal Form (Cont.)

• A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key (for a prime attribute A, K → A is a trivial dependency)

• R can be decomposed into 2NF relations via the process of 2NF normalization

12-26 14-59

Figure 12.10

• fd2 and fd3 violate 2NF, i.e., ENAME, PNAME, and PLOCATION are partially dependent on {SSN, PNUMBER}

• {SSN → DNUMBER, DNUMBER → DMGRSSN} ⊨ SSN → DMGRSSN

• 2NF (O), 3NF (X): the relation satisfies 2NF but violates 3NF

• It is not a primary key

12-26a 14-60

Y → X (non-trivial dependency) ≡ ∀ t1, t2 ∈ r, if t1[Y] = t2[Y] then t1[X] = t2[X]

It is still possible that t1[Y] ≠ t2[Y] but t1[X] = t2[X]

X → Z (non-trivial dependency): whenever this possibility occurs, the data is stored redundantly

12-26a 14-61

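SSN (or PNUMBER) is only part of the key, not a key itself, so more than one tuple may have the same value for it. Combined with SSN → ENAME and PNUMBER → {PNAME, PLOCATION}, the dependent attributes are then repeated as well.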

12-27 14-62

3.4 Third Normal Form

Definition:

• Transitive functional dependency – an FD Y → Z that can be derived from two FDs Y → X and X → Z, where both are nontrivial dependencies and X is not a subset of any key

12-27 14-63

3.4 Third Normal Form (Cont.)

Examples:

• SSN → DMGRSSN is a transitive FD since SSN → DNUMBER and DNUMBER → DMGRSSN hold

• SSN → ENAME is non-transitive since there is no set of attributes X where SSN → X and X → ENAME

12-27 14-64

3.4 Third Normal Form (Cont.)

• A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key (see 14-59/60/61, Figure 12.10)

• R can be decomposed into 3NF relations via the process of 3NF normalization

12-28 14-65

4. General Normal Form Definitions (For Multiple Keys)

• The above definitions consider the primary key only

• The following more general definitions take into account relations with multiple candidate keys

• A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on every key of R (see Figure 14.11)

12-29 14-66

Figure 14.11(a)

Parcels of lands for sale in various counties of a state

Candidate keys: PROPERTY_ID# and {COUNTY_NAME, LOT#}

Partial dependency

12-29 14-67

Figure 14.11 (b)

transitive dependency

12-28 14-68

Definition:

• Superkey of relation schema R – a set of attributes S of R that contains a key of R

• A relation schema R is in third normal form (3NF) if whenever a FD X → A holds in R, then either:
(a) X is a superkey of R, or
(b) A is a prime attribute of R
(see 14-67/68/69, Figure 14.11)

• Boyce-Codd normal form disallows condition (b) above

• The two ways X → A can violate 3NF, where Y is a key (so Y → X, X → A, and hence Y → A):
– A is nonprime and X is not a subset of any key: transitive dependency
– A is nonprime and X is a proper subset of a key: partial dependency

12-29 14-69

Figure 14.11 (c) (d)

fd5: lot areas are 0.5, 0.6, 0.7, 0.8, 0.9, 1.0 in Marion County and 1.1, 1.2, …, 1.9, 2.0 in Liberty County

12-30 14-70

5 BCNF (Boyce-Codd Normal Form)

• A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a FD X → A holds in R, then X is a superkey of R (14-71a) Figure 14.12

• Each normal form is strictly stronger than the previous one:
– Every 2NF relation is in 1NF
– Every 3NF relation is in 2NF
– Every BCNF relation is in 3NF

• There exist relations that are in 3NF but not in BCNF (14-71b) Figure 14.12
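The general 3NF and BCNF definitions can be checked by brute force on a small schema. The sketch below uses a relation over STUDENT, COURSE and INSTRUCTOR with the dependencies {STUDENT, COURSE} → INSTRUCTOR and INSTRUCTOR → COURSE; these two FDs are an assumption here (the figure itself is not reproduced in this transcript), and all function names are hypothetical. It only inspects the FDs given in `fds` and is meant for tiny schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    attrs, keys = set(attrs), []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if closure(s, fds) == attrs and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def is_bcnf(attrs, fds):
    """Every nontrivial FD in fds must have a superkey as its left-hand side."""
    attrs = set(attrs)
    return all(rhs <= lhs or closure(lhs, fds) == attrs for lhs, rhs in fds)

def is_3nf(attrs, fds):
    """Each FD in fds needs a superkey on the left or only prime attributes on the right."""
    attrs = set(attrs)
    prime = set().union(*candidate_keys(attrs, fds))
    return all(closure(lhs, fds) == attrs or (rhs - lhs) <= prime
               for lhs, rhs in fds)

R = {"STUDENT", "COURSE", "INSTRUCTOR"}
FDS = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]

keys = candidate_keys(R, FDS)
in_3nf = is_3nf(R, FDS)
in_bcnf = is_bcnf(R, FDS)
```

Under these assumed FDs the relation is in 3NF, because COURSE is part of a candidate key, but not in BCNF, because INSTRUCTOR is not a superkey.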

12-31 14-71

Figure 14.12 (a) BCNF normalization with the dependency of FD2 being ‘lost’ in the decomposition

(b) A relation R in 3NF but not in BCNF

Non-prime: C

Prime: A, B

14-32 14-72

Three possible decompositions:

1. {STUDENT, INSTRUCTOR} and {STUDENT, COURSE}: generates spurious tuples

2. {COURSE, INSTRUCTOR} and {COURSE, STUDENT}: generates spurious tuples

3. {INSTRUCTOR, COURSE} and {INSTRUCTOR, STUDENT}: lossless join

“lost” FD1

FD1, FD2

3NF, but not BCNF
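For a decomposition into two relations, the lossless join property can be tested with a standard closure check: the join is lossless if the common attributes functionally determine all of R1 − R2 or all of R2 − R1. The sketch below applies it to the three decompositions, assuming FD1 is {STUDENT, COURSE} → INSTRUCTOR and FD2 is INSTRUCTOR → COURSE (the figure defining them is not reproduced in this transcript); the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def lossless_binary(r1, r2, fds):
    """Binary test: R1 ∩ R2 must determine R1 − R2 or R2 − R1."""
    r1, r2 = set(r1), set(r2)
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus

# Assumed dependencies FD1 and FD2.
FDS = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]

d1 = lossless_binary({"STUDENT", "INSTRUCTOR"}, {"STUDENT", "COURSE"}, FDS)
d2 = lossless_binary({"COURSE", "INSTRUCTOR"}, {"COURSE", "STUDENT"}, FDS)
d3 = lossless_binary({"INSTRUCTOR", "COURSE"}, {"INSTRUCTOR", "STUDENT"}, FDS)
```

Only the third decomposition passes, because INSTRUCTOR → COURSE makes the shared attribute determine the rest of the first relation; this agrees with the annotations on the three decompositions.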

14-33 14-73

(STUDENT, INSTRUCTOR) and (COURSE, STUDENT)

14-34 14-74

(INSTRUCTOR, COURSE) and (STUDENT, INSTRUCTOR)

12-30 14-75

5 BCNF (Boyce-Codd Normal Form) (Cont.)

• The goal is to have each relation in BCNF (or 3NF)

• Additional criteria may be needed to ensure the set of relations in a relational database are satisfactory (see Chapter 15)

– Lossless join property

– Dependency preservation property

• Additional normal forms are discussed in Ch. 15

– 4NF (based on multi-valued dependencies)

– 5NF (based on join dependencies)
