14-1 Chapter 14 Functional Dependencies and Normalization for Relational Database.
allison-joseph -
14-1
Chapter 14
Functional Dependencies and Normalization for Relational Database
12-1 14-2
1 Informal Design Guidelines for Relational Databases
1.1 Semantics of the Relation Attributes
1.2 Redundant Information in Tuples and Update Anomalies
1.3 Null Values in Tuples
1.4 Generation of Spurious Tuples
2 Functional Dependencies (FDs)
2.1 Definition of FD
2.2 Inference Rules for FDs
2.3 Equivalence of Sets of FDs
2.4 Minimal Sets of FDs
3 Normal Forms Based on Primary Keys
3.1 Introduction to Normalization
3.2 First Normal Form
3.3 Second Normal Form
3.4 Third Normal Form
4 General Normal Form Definitions (For Multiple Keys)
5 BCNF (Boyce-Codd Normal Form)
12-2 14-3
1 Informal Design Guidelines for Relational Databases
• What is relational database design?
– The grouping of attributes to form “good” relation schemas
• Two levels of relation schemas:
– The logical “user view” level
– The storage “base relation” level
• Design is concerned mainly with base relations
• What are the criteria for “good” base relations?
12-2 14-4
1 Informal Design Guidelines for Relational Databases (Cont.)
• We first discuss informal guidelines for good relational design
• Then we discuss formal concepts of functional dependencies and normal forms
– 1 NF (First Normal Form)
– 2 NF (Second Normal Form)
– 3 NF (Third Normal Form)
– BCNF (Boyce-Codd Normal Form)
• Additional types of dependencies, further normal forms, relational design algorithms are discussed in Chapter 15
12-3 14-5
1.1 Semantics of the Relation Attributes
• Informally, each tuple should represent one entity or relationship instance
• Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation
• Only foreign keys should be used to refer to other entities (see 14-7) Figure 14.1
Informal measures:
• Semantics of attributes
• Reducing redundant values in tuples
• Reducing null values in tuples
• Disallowing spurious tuples
12-3 14-6
1.2 Redundant Information in Tuples and Update Anomalies
• Mixing attributes of multiple entities may cause problems
• Information is stored redundantly (i.e., wasting storage (see 14-11))
• Problems with update anomalies:
– Insertion anomalies
– Deletion anomalies
– Modification anomalies
12-4 14-7
Figure 14.1 Simplified version of the COMPANY relational database schema
12-4a 14-8
Figure 14.2 Example relations for the schema of Figure 14.1
12-4a 14-9
Figure 14.2 Example relations for the schema of Figure 14.1 (Cont.)
12-5 14-10
GUIDELINE 1.
• Design a relation schema so that it is easy to explain its meaning.
• Do not combine attributes from multiple entity types and relationship types into a single relation
EMPLOYEE * DEPARTMENT
attributes from department
attributes from project
12-6 14-11
12-6 14-12
12-7 14-13
Insertion Anomalies
• To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for the department that the employee works for, or nulls (if the employee does not work for a department)
12-7 14-14
Insertion Anomalies (Cont.)
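• It is difficult to insert a new department that has no employees as yet in the EMP_DEPT relation
– Place null values in the employee attributes? SSN is the primary key, so it cannot be null
– Once the first employee is assigned, the tuple with null values is no longer needed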
12-8 14-15
Deletion Anomalies
• If we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for a particular department, the information concerning that department is lost.
12-8 14-16
Modification Anomalies
• In EMP_DEPT, if we change the value of one of the attributes of a particular department, we must update the tuples of all employees who work in that department; otherwise the database becomes inconsistent.
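The deletion and modification anomalies above can be reproduced on a toy instance. The sketch below is only illustrative: the attribute names follow EMP_DEPT, but the rows and the helper `departments` are made-up assumptions, not data from the figures.

```python
# Hypothetical EMP_DEPT instance: employee and department attributes in one relation.
emp_dept = [
    {"SSN": "111", "ENAME": "Smith", "DNUMBER": 5, "DNAME": "Research", "DMGRSSN": "333"},
    {"SSN": "222", "ENAME": "Wong", "DNUMBER": 5, "DNAME": "Research", "DMGRSSN": "333"},
    {"SSN": "444", "ENAME": "Borg", "DNUMBER": 1, "DNAME": "Headquarters", "DMGRSSN": "444"},
]


def departments(rows):
    """Project the department attributes and drop duplicates."""
    return {(r["DNUMBER"], r["DNAME"], r["DMGRSSN"]) for r in rows}


# Deletion anomaly: deleting the last employee of department 1 also
# deletes everything we know about department 1.
after_delete = [r for r in emp_dept if r["SSN"] != "444"]
lost = departments(emp_dept) - departments(after_delete)

# Modification anomaly: changing the manager in only one tuple leaves
# department 5 with two different managers.
partial_update = [dict(r) for r in emp_dept]
partial_update[0]["DMGRSSN"] = "222"
managers_of_5 = {r["DMGRSSN"] for r in partial_update if r["DNUMBER"] == 5}

print(lost)
print(sorted(managers_of_5))
```

Storing the department attributes once, in a relation of their own, removes both problems at the cost of a join.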
12-8 14-17
GUIDELINE 2.
Design the base relation schemas so that no insertion, deletion, or modification anomalies are present.
Cost: a join is needed (view definition)
12-9 14-18
1.3 Null Values in Tuples
• Relations should be designed so that their tuples have as few NULL values as possible
– Meanings of NULL: not applicable, unknown, known but absent
– Problems: wasted space, joins, aggregates (COUNT, SUM, AVG)
• Attributes that are frequently NULL could be placed in separate relations (with the primary key)
– Example: office numbers (only about 10% of employees have one): EMP_OFFICES (ESSN, OFFICE_NUMBER)
12-9 14-19
1.4 Spurious Tuples
• Bad designs for a relational database may result in erroneous results for certain JOIN operations
• The “lossless join” property is used to guarantee meaningful results for join operations
• The relations should be designed to satisfy the lossless join condition
• Discussed in Chapter 15
12-10 14-20
12-10 14-21
EMP_LOC = Π ENAME, PLOCATION (EMP_PROJ) (see 14-22)
EMP_PROJ1 = Π SSN, PNUMBER, HOURS, PNAME, PLOCATION (EMP_PROJ)
12-11 14-22
12-12 14-23
EMP_PROJ1 * EMP_LOC
12-12 14-24
GUIDELINE 4.
• Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.
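A small experiment shows how spurious tuples arise when EMP_PROJ is split into EMP_LOC and EMP_PROJ1 and joined back on PLOCATION, which is neither a primary key nor a foreign key. This is a sketch under assumptions: the rows are invented, and `project` and `natural_join` are hypothetical helpers written for this illustration.

```python
# Hypothetical EMP_PROJ instance (attribute names follow the slides; values are made up).
ATTRS = ("SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION")
emp_proj = {
    ("111", 1, 32.5, "Smith", "ProductX", "Bellaire"),
    ("111", 2, 7.5, "Smith", "ProductY", "Sugarland"),
    ("222", 2, 20.0, "English", "ProductY", "Sugarland"),
}


def project(rows, attrs, keep):
    """Relational projection: keep the listed attributes, drop duplicates."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(r1, attrs1, r2, attrs2):
    """Natural join on all attributes the two relations have in common."""
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    out = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[attrs1.index(a)] == t2[attrs2.index(a)] for a in common):
                out.add(t1 + tuple(t2[attrs2.index(a)] for a in extra))
    return out, tuple(attrs1) + tuple(extra)


loc_attrs = ("ENAME", "PLOCATION")
proj1_attrs = ("SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION")
emp_loc = project(emp_proj, ATTRS, loc_attrs)
emp_proj1 = project(emp_proj, ATTRS, proj1_attrs)

joined, joined_attrs = natural_join(emp_proj1, proj1_attrs, emp_loc, loc_attrs)
rejoined = project(joined, joined_attrs, ATTRS)  # same column order as EMP_PROJ
spurious = rejoined - emp_proj

print(len(emp_proj), len(rejoined), len(spurious))
```

Every original tuple is still in the join result, but the extra tuples pair employees with projects they never worked on, so the original information cannot be recovered.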
12-13 14-25
2 Functional Dependencies
• Functional dependencies (FDs) are used to specify formal measures of the ‘goodness’ of relational designs
• FDs and keys are used to define normal forms for relations
• FDs are constraints that are derived from the meaning and interrelationships of the data attributes
12-13 14-26
2.1 Definition of FD
• A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y
• Written as X→ Y; can be displayed graphically on a relation schema as in Figure 14.3 (see 14-11)
• Specifies a constraint on all relation instances r(R)
12-13/14 14-27
2.1 Definition of FD (Cont.)
• For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y]
• X is a candidate key of R ⇒ X → Y for any subset Y of R
• X → Y holds if whenever two tuples have the same value for X, they must have the same value for Y
• FDs are derived from the real-world constraints on the attributes
12-14 14-28
Examples of FD constraints:
• Social security number determines employee name SSN → ENAME
• Project number determines project name and location PNUMBER →{PNAME, PLOCATION}
12-14 14-29
Examples of FD constraints: (Cont.)
• Employee SSN and project number determine the hours per week that the employee works on the project: {SSN, PNUMBER} → HOURS
• An FD is a property of the attributes in the schema R
• The constraint must hold on every relation instance r(R)
• If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K])
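TEACH
TEACHER   COURSE      TEXT
Smith     D.S.        Bartram
Smith     D.M.        Al-Nour
Hall      Compilers   Hoffman
Brown     D.S.        Augenthaler
• TEACHER → COURSE: does not hold in this instance (Smith teaches two courses)
• COURSE → TEXT: does not hold in this instance (D.S. uses two texts)
• TEXT → COURSE: possible (P); no tuple in this instance contradicts it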
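The tuple-based definition of an FD translates directly into a check on one relation instance. The sketch below applies it to the TEACH rows; `fd_holds` is a hypothetical helper name. Note the asymmetry: an instance can refute an FD, but it can never prove one, because an FD is a property of the schema and must hold on every legal instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return False if two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y value seen for each X value; any later
        # mismatch is a counterexample to X -> Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


teach = [
    {"TEACHER": "Smith", "COURSE": "D.S.", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "D.M.", "TEXT": "Al-Nour"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffman"},
    {"TEACHER": "Brown", "COURSE": "D.S.", "TEXT": "Augenthaler"},
]

print(fd_holds(teach, ["TEACHER"], ["COURSE"]))  # refuted by the two Smith rows
print(fd_holds(teach, ["COURSE"], ["TEXT"]))     # refuted by the two D.S. rows
print(fd_holds(teach, ["TEXT"], ["COURSE"]))     # not refuted: only "possible"
```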
12-14a 14-30
Inference Rules for Functional Dependencies
• The designer specifies the functional dependencies that are semantically obvious.
• Closure of F: F+ = { X → Y | F ⊨ X → Y }
• F ⊨ X → Y: X → Y is inferred from F, i.e., whenever r (an extension of R) satisfies all the dependencies in F, X → Y also holds in r.
• Example: F = { SSN → {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER → {DNAME, DMGRSSN} }
– F ⊨ SSN → {DNAME, DMGRSSN}
– F ⊨ SSN → SSN
– F ⊨ DNUMBER → DNAME
12-15 14-31
2.2 Inference Rules for FDs
• Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold
12-15 14-32
Armstrong’s inference rules:
• Notation: {X,Y} → Z ≡ XY → Z; {X,Y,Z} → {U,V} ≡ XYZ → UV
A1. (Reflexive) If Y ⊆ X, then X → Y (trivial dependency)
A2. (Augmentation) If X → Y, then XZ → YZ (Notation: XZ stands for X ∪ Z)
A3. (Transitive) If X → Y and Y → Z, then X → Z
• A1, A2, A3 form a sound and complete set of inference rules
12-15 14-33
Some additional inference rules that are useful:
(Decomposition) If X →YZ, then X →Y and X→Z
(Union) If X →Y and X →Z, then X →YZ
(Pseudotransitivity) If X → Y and WY → Z, then WX → Z
• The last three inference rules, as well as any other inference rule, can be deduced from A1, A2, and A3 (completeness property)
12-16 14-34
A1. (Reflexive) If Y ⊆ X, then X → Y
Proof.
Assume t1, t2 ∈ r of R and t1[X] = t2[X]
∵ Y ⊆ X and t1[X] = t2[X] ∴ t1[Y] = t2[Y]
12-16 14-35
A2. (Augmentation) If X → Y, then XZ → YZ
Proof.
Assume X → Y holds in an instance r of R and XZ → YZ does not hold. Then there exist t1, t2 ∈ r such that
1) t1[X] = t2[X]      (X → Y holds)
2) t1[Y] = t2[Y]      (X → Y holds)
3) t1[XZ] = t2[XZ]    (XZ → YZ does not hold)
4) t1[YZ] ≠ t2[YZ]    (XZ → YZ does not hold)
5) t1[Z] = t2[Z]      (from 1 and 3)
6) t1[YZ] = t2[YZ]    (from 2 and 5)
7) XZ → YZ            (4 and 6 contradict, so the assumption fails)
12-17 14-36
A3. (Transitive) If X → Y and Y → Z, then X → Z
Proof.
Assume t1, t2 ∈ r of R and t1[X] = t2[X]
1) X → Y (given)
2) Y →Z (given)
3) t1[Y] = t2[Y] t1[X] = t2[X] & (1)
4) t1[Z] = t2[Z] (3) & (2)
5) X →Z t1[X] = t2[X] & (4)
12-17 14-37
Decomposition Rule: {X → YZ} ⊨ X → Y
1) X →YZ (given)
2) YZ →Y (Reflexive rule)
3) X →Y (Transitive rule)
12-17 14-38
Union Rule: {X → Y, X → Z} ⊨ X → YZ
1) X → Y (given)
2) X → Z (given)
3) X → XY (augmenting 1 with X)
4) XY → YZ (augmenting 2 with Y)
5) X → YZ (transitive rule on (3) & (4))
12-18 14-39
Pseudotransitive Rule: {X → Y, WY → Z} ⊨ WX → Z
1) X → Y (given)
2) WY → Z (given)
3) WX → WY (augmenting 1 with W)
4) WX → Z (transitive rule on (3) & (2))
12-19 14-40
2.2 Inference Rules for FDs (Cont.)
• Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F: F+ = { X → Y | F ⊨ X → Y }
• Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X: X+ = { Y | F ⊨ X → Y }
• X+ can be calculated by repeatedly applying A1, A2, A3 using the FDs in F
12-20 14-41
Algorithm 12.1 Determining X+
X+ := X;
repeat
    oldX+ := X+;
    for each functional dependency Y → Z in F do
        if Y ⊆ X+ then X+ := X+ ∪ Z;
until (oldX+ = X+);
12-20 14-42
Example
F = { SSN → ENAME, PNUMBER → {PNAME, PLOCATION},
{SSN, PNUMBER} → HOURS}
{SSN}+ = {SSN,ENAME}
{PNUMBER}+ = {PNUMBER, PNAME, PLOCATION}
{SSN, PNUMBER}+ = {SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS}
{SSN, PNUMBER} is a key
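The closure algorithm and the example above can be sketched in a few lines. Representing each FD as a `(lhs, rhs)` pair of Python sets is an assumption of this sketch, as is the function name `closure`.

```python
def closure(attrs, fds):
    """Compute X+ : all attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:                      # repeat ... until X+ stops growing
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)      # Y is inside X+, so add Z
                changed = True
    return result


F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
ALL = {"SSN", "PNUMBER", "ENAME", "PNAME", "PLOCATION", "HOURS"}

print(sorted(closure({"SSN"}, F)))
print(sorted(closure({"PNUMBER"}, F)))
print(closure({"SSN", "PNUMBER"}, F) == ALL)   # the closure covers R, so it is a superkey
```

Since neither {SSN}+ nor {PNUMBER}+ covers all attributes, no proper subset of {SSN, PNUMBER} is a superkey, which is why it is a key.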
12-19 14-43
2.3 Equivalence of Sets of FDs
• Two sets of FDs F and G are equivalent if:
– Every FD in F can be inferred from G, and
– Every FD in G can be inferred from F
• Hence, F and G are equivalent if F+ = G+
• Definition: F covers G if every FD in G can be inferred from F (i.e., if G+ ⊆ F+)
12-19 14-44
2.3 Equivalence of Sets of FDs (Cont.)
• F and G are equivalent if F covers G and G covers F
• There is an algorithm for checking equivalence of sets of FDs
– F covers E: ∀ X → Y ∈ E, compute X+ w.r.t. F and check Y ⊆ X+
– E covers F: ∀ X → Y ∈ F, compute X+ w.r.t. E and check Y ⊆ X+
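The equivalence test above needs only attribute closures, never the full closures F+ and G+. A minimal sketch follows, assuming FDs are written as `(lhs, rhs)` pairs of strings whose characters are single-letter attribute names; the sets `F`, `G`, and `H` are illustrative and not taken from the slides.

```python
def closure(attrs, fds):
    """Attribute closure X+ of `attrs` with respect to `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """F covers G if every X -> Y in G satisfies Y ⊆ X+ computed w.r.t. F."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    return covers(f, g) and covers(g, f)


# Illustrative FD sets over attributes A, C, D, E, H.
F = [("A", "C"), ("AC", "D"), ("E", "AD"), ("E", "H")]
G = [("A", "CD"), ("E", "AH")]
H = [("A", "C")]

print(equivalent(F, G))
print(covers(F, H), covers(H, F))
```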
12-21 14-45
2.4 Minimal Sets of FDs
• A set of FDs is minimal if it satisfies the following conditions:
1) Every dependency in F has a single attribute for its RHS.
2) We cannot remove any dependency from F and have a set of dependencies that is equivalent to F
3) We cannot replace any dependency X → A in F with a dependency Y → A, where Y ⊂ X, and still have a set of dependencies that is equivalent to F.
12-21 14-46
2.4 Minimal Sets of FDs (Cont.)
• Every set of FDs has an equivalent minimal set
• There can be several equivalent minimal sets
• Having a minimal set is important for some relational design algorithms (see Chapter 15)
12-21a 14-47
Algorithm 14.2 Finding a minimal cover G for F
1. Set G := F.
2. Replace each functional dependency X → {A1, A2, …, An} in G by the n functional dependencies X → A1, X → A2, …, X → An.
3. For each functional dependency X → A in G
       for each attribute B that is an element of X
           if ((G - {X → A}) ∪ {(X - {B}) → A}) is equivalent to G,
               then replace X → A with (X - {B}) → A in G.
4. For each remaining functional dependency X → A in G
       if (G - {X → A}) is equivalent to G,
           then remove X → A from G.
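The four steps above can be sketched as follows. Two shortcuts are assumptions of this sketch rather than part of the slide: the equivalence test in step 3 is replaced by the check A ∈ (X - {B})+ w.r.t. G, and the test in step 4 by A ∈ X+ w.r.t. G - {X → A}; both are equivalent to the stated conditions. FDs are `(lhs, rhs)` pairs of strings of single-letter attribute names, and the input set is illustrative.

```python
def closure(attrs, fds):
    """Attribute closure X+ of `attrs` with respect to `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Steps 1-2: copy F and split every right-hand side into single attributes.
    g = [(frozenset(lhs), frozenset(a)) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 3: remove extraneous attributes from left-hand sides.
    for i in range(len(g)):
        lhs, rhs = g[i]
        for b in sorted(lhs):
            reduced = lhs - {b}
            if rhs <= closure(reduced, g):   # G already implies (X - {B}) -> A
                lhs = reduced
                g[i] = (lhs, rhs)
    # Step 4: remove FDs that the remaining ones already imply.
    result = list(dict.fromkeys(g))          # drop exact duplicates, keep order
    for fd in list(result):
        rest = [f for f in result if f != fd]
        if fd[1] <= closure(fd[0], rest):
            result = rest
    return result


F = [("B", "A"), ("D", "A"), ("AB", "D")]
for lhs, rhs in minimal_cover(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

A different processing order can produce a different, equally valid minimal cover, which matches the remark that several equivalent minimal sets may exist.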
12-22 14-48
3 Normal Forms Based on Primary Keys
3.1 Introduction to Normalization
• Normalization: Process of decomposing unsatisfactory “bad” relations by breaking up their attributes into smaller relations
• Normal form: Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form
12-22 14-49
3.1 Introduction to Normalization (Cont.)
• 2NF, 3NF, BCNF based on keys and FDs of a relation schema
– Prime attribute: member of any key
– Nonprime attribute: not a member of any key
• 4NF based on keys, MVDs; 5NF based on keys, JDs (Chapter 15)
• Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation; Chapter 15)
12-22 14-50
3.2 First Normal Form
• Disallows composite attributes, multivalued attributes, and nested relations: attributes whose values for an individual tuple are non-atomic
• Considered to be part of the definition of relation
12-23 14-51
Figure 14.8
(a) A relation schema that is not in 1NF
(b) Example relation instance
12-23 14-52
Figure 14.8 (Cont.)
(c) 1NF relation with redundancy
alternative 1
SSN → PLOCATION
KEY: {DNUMBER, DLOCATION}
alternative 2
(better)
SSN → DLOCATION
12-24 14-53
Figure 14.9 (a)
A nested relation PROJS within EMP_PROJ
Primary key Partial key
EMP_PROJ (SSN, ENAME, {PROJS (PNUMBER, HOURS)})
12-24 14-54
Figure 14.9 (b) Example extension of the EMP_PROJ relation showing nested relations within each tuple.
12-24 14-55
Figure 14.9 (c)
Decomposing EMP_PROJ into 1NF relations by migrating the primary key
12-25 14-56
3.3 Second Normal Form
• Uses the concepts of FDs, primary key
Definitions:
• Prime attribute – an attribute that is a member of the primary key K (candidate key??)
• Full functional dependency – an FD Y → Z where removal of any attribute from Y means the FD does not hold any more: ∀ A ∈ Y, (Y - {A}) → Z does not hold
12-25 14-57
3.3 Second Normal Form (Cont.)
Example:
• {SSN, PNUMBER} → HOURS is a full FD since neither SSN → HOURS nor PNUMBER → HOURS holds
• {SSN, PNUMBER} → ENAME is not a full FD (it is called a partial dependency) since SSN → ENAME also holds: ∃ A ∈ Y, (Y - {A}) → Z (i.e., A = PNUMBER)
12-25 14-58
3.3 Second Normal Form (Cont.)
• A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
– For a prime attribute A, K → A is a trivial dependency
• R can be decomposed into 2NF relations via the process of 2NF normalization
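The 2NF test based on the primary key can be sketched by computing the closure of every proper subset of the key and collecting the non-prime attributes found there. The FDs below are the three EMP_PROJ dependencies used in the earlier examples; the function names are hypothetical, and "prime" here means "member of the primary key", as in this section.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ of `attrs` with respect to `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partially_dependent(attributes, key, fds):
    """Non-prime attributes determined by some proper subset of `key`."""
    nonprime = set(attributes) - set(key)
    found = set()
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            found |= closure(subset, fds) & nonprime
    return found


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
KEY = {"SSN", "PNUMBER"}
FDS = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

violating = partially_dependent(EMP_PROJ, KEY, FDS)
print(sorted(violating))          # an empty result would mean the relation is in 2NF
```

HOURS does not appear in the result because it depends on the whole key, so only the other three non-key attributes need to move to separate relations.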
12-26 14-59
Figure 12.10
• fd2 and fd3 violate 2NF, i.e., ENAME, PNAME, and PLOCATION are partially dependent on {SSN, PNUMBER}
• {SSN → DNUMBER, DNUMBER → DMGRSSN} ⊨ SSN → DMGRSSN
• 2NF (O), 3NF (X)
It is not a primary key
12-26a 14-60
• Y → X (non-trivial dependency) ≡ ∀ t1, t2 ∈ r, if t1[Y] = t2[Y] then t1[X] = t2[X]
• It is possible that t1[Y] ≠ t2[Y] but t1[X] = t2[X]
• X → Z (non-trivial dependency): whenever the situation above occurs, the data is stored redundantly
12-26a 14-61
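• SSN (or PNUMBER) is only part of the key, not a key itself, so more than one tuple may have the same value for it; given SSN → ENAME and PNUMBER → {PNAME, PLOCATION}, the dependent attributes are repeated as well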
12-27 14-62
3.4 Third Normal Form
Definition:
• Transitive functional dependency – an FD Y → Z that can be derived from two FDs Y → X and X → Z
– Y → X and X → Z are nontrivial dependencies
– X is not a subset of any key
12-27 14-63
3.4 Third Normal Form (Cont.)
Examples:
• SSN → DMGRSSN is a transitive FD since SSN → DNUMBER and DNUMBER → DMGRSSN hold
• SSN → ENAME is non-transitive since there is no set of attributes X where SSN → X and X → ENAME
12-27 14-64
3.4 Third Normal Form (Cont.)
• A relation schema R is in third normal form
(3NF) if it is in 2NF and no non-prime
attribute A in R is transitively dependent on
the primary key (see 14-59/60/61 ) Figure 12.10
• R can be decomposed into 3NF relations via
the process of 3NF normalization
12-28 14-65
4. General Normal Form Definitions(For Multiple Keys)
• The above definitions consider the primary key only
• The following more general definitions take into account relations with multiple candidate keys
• A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on every key of R (see Figure 14.11)
12-29 14-66
Figure 14.11(a)
Parcels of lands for sale in various counties of a state
Candidate keys: PROPERTY_ID#, {COUNTY_NAME, LOT#}
Partial dependency
12-29 14-67
Figure 14.11 (b)
transitive dependency
12-28 14-68
Definition:
• Superkey of relation schema R – a set of attributes S of R that contains a key of R
• A relation schema R is in third normal form (3NF) if whenever a nontrivial FD X → A holds in R, then either:
(a) X is a superkey of R, or
(b) A is a prime attribute of R
(see 14-67/68/69) Figure 14.11
• Boyce-Codd normal form disallows condition (b) above
• Violations, for a key Y with Y → X, Y → A, X → A:
– A is nonprime and X is not a superkey: transitive dependency
– X is a proper subset of a key: partial dependency
12-29 14-69
Figure 14.11 (c) (d)
fd5: lot areas 0.5, 0.6, …, 1.0 occur in Marion County; lot areas 1.1, 1.2, …, 2.0 occur in Liberty County
12-30 14-70
5 BCNF (Boyce-Codd Normal Form)
• A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X → A holds in R, then X is a superkey of R (14-71a) Figure 14.12
• Each normal form is strictly stronger than the previous one:
– Every 2NF relation is in 1NF
– Every 3NF relation is in 2NF
– Every BCNF relation is in 3NF
• There exist relations that are in 3NF but not in BCNF (14-71b) Figure 14.12
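The difference between the general 3NF definition and BCNF is a single condition, which a short sketch makes concrete. Assumptions are stated loudly here: the relation is TEACH(STUDENT, COURSE, INSTRUCTOR) with FD1: {STUDENT, COURSE} → INSTRUCTOR and FD2: INSTRUCTOR → COURSE, as in the standard textbook example (these FDs are not spelled out on the slides), and the sketch tests only the FDs listed in F, after splitting right-hand sides.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ of `attrs` with respect to `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Enumerate minimal superkeys by increasing size (fine for small schemas)."""
    attrs = sorted(attributes)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if closure(s, fds) >= set(attrs) and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def violations(attributes, fds, allow_prime_rhs):
    """FDs X -> A breaking 3NF (allow_prime_rhs=True) or BCNF (False)."""
    prime = set().union(*candidate_keys(attributes, fds))
    bad = []
    for lhs, rhs in fds:
        for a in sorted(set(rhs) - set(lhs)):        # ignore trivial parts
            if closure(lhs, fds) >= set(attributes):
                continue                             # (a) X is a superkey
            if allow_prime_rhs and a in prime:
                continue                             # (b) A is prime: 3NF only
            bad.append((set(lhs), a))
    return bad


TEACH = {"STUDENT", "COURSE", "INSTRUCTOR"}
FDS = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]

print(candidate_keys(TEACH, FDS))
print(violations(TEACH, FDS, allow_prime_rhs=True))    # 3NF check
print(violations(TEACH, FDS, allow_prime_rhs=False))   # BCNF check
```

Under these assumptions every attribute is prime, so FD2 passes condition (b) and the relation is in 3NF, but FD2 still violates BCNF because INSTRUCTOR is not a superkey.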
12-31 14-71
Figure 14.12 (a) BCNF normalization with the dependency of FD2 being ‘lost’ in the decomposition
(b) A relation R in 3NF but not in BCNF
Non-prime: C; prime: A, B
14-32 14-72
Three possible decompositions:
1. {STUDENT, INSTRUCTOR} and {STUDENT, COURSE}: generates spurious tuples
2. {COURSE, INSTRUCTOR} and {COURSE, STUDENT}: generates spurious tuples
3. {INSTRUCTOR, COURSE} and {INSTRUCTOR, STUDENT}: lossless join, but FD1 is “lost”
FD1, FD2
3NF, but not BCNF
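The three decompositions above can be checked mechanically with the standard test for binary decompositions: {R1, R2} has the lossless join property if and only if (R1 ∩ R2) → (R1 - R2) or (R1 ∩ R2) → (R2 - R1) follows from F. This test belongs to the Chapter 15 material and is brought in here as an assumption, as are the FDs FD1: {STUDENT, COURSE} → INSTRUCTOR and FD2: INSTRUCTOR → COURSE, which the slides reference only by name.

```python
def closure(attrs, fds):
    """Attribute closure X+ of `attrs` with respect to `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Binary lossless-join test: the common attributes determine R1-R2 or R2-R1."""
    r1, r2 = set(r1), set(r2)
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined


FDS = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]
decompositions = [
    ({"STUDENT", "INSTRUCTOR"}, {"STUDENT", "COURSE"}),
    ({"COURSE", "INSTRUCTOR"}, {"COURSE", "STUDENT"}),
    ({"INSTRUCTOR", "COURSE"}, {"INSTRUCTOR", "STUDENT"}),
]
results = [lossless_binary(r1, r2, FDS) for r1, r2 in decompositions]
print(results)
```

Only the third decomposition passes, because INSTRUCTOR is the common attribute and INSTRUCTOR → COURSE holds; it still cannot preserve FD1, since no single relation contains STUDENT, COURSE, and INSTRUCTOR together.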
14-33 14-73
Relations with columns (STUDENT, INSTRUCTOR) and (COURSE, STUDENT)
14-34 14-74
Relations with columns (INSTRUCTOR, COURSE) and (STUDENT, INSTRUCTOR)
12-30 14-75
5 BCNF (Boyce-Codd Normal Form) (Cont.)
• The goal is to have each relation in BCNF (or 3NF)
• Additional criteria may be needed to ensure the set of relations in a relational database are satisfactory (see Chapter 15)
– Lossless join property
– Dependency preservation property
• Additional normal forms are discussed in Ch. 15
– 4NF (based on multi-valued dependencies)
– 5NF (based on join dependencies)