Functional Dependencies and Normalization for Relational Databases

Click here to load reader

download Functional Dependencies and Normalization  for  Relational Databases

of 62

description

CHAPTER 10. Functional Dependencies and Normalization for Relational Databases. Informal Design Guidelines for Relational Databases. Semantics of the Relation Attributes Reducing the redundant values in tuples Reducing the null values in tuples - PowerPoint PPT Presentation

Transcript of Functional Dependencies and Normalization for Relational Databases

Introduction and Conceptual Modeling

Functional Dependencies andNormalization for Relational DatabasesCHAPTER 10Informal Design Guidelines for Relational DatabasesSemantics of the Relation AttributesReducing the redundant values in tuplesReducing the null values in tuplesDisallowing the possibility of generating spurious tuplesSemantics of the Relation AttributesGUIDELINE 1: Each tuple in a relation should represent one entity or relationship instance.Attributes of different entities should not be mixed in the same relation.Only foreign keys should be used to refer to other entitiesEntity and relationship attributes should be kept apart as much as possible.

Semantics of the Relation AttributesExample:

Redundant Information in Tuples andUpdate AnomaliesMixing attributes of multiple entities may cause problemsInformation is stored redundantly wasting storageProblems with update anomaliesInsertion anomaliesDeletion anomaliesModification anomalies

Thng tin d tha trong cc b v s d thng cp nht

5Redundant Information in Tuples andUpdate Anomalies

Redundant Information in Tuples andUpdate AnomaliesConsider the relation:EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours)Update Anomaly: Changing the name of project number P1 from Billing to Customer-Accounting may cause this update to be made for all 100 employees working on project P1. Insert Anomaly: Cannot insert a project unless an employee is assigned to .Inversely - Cannot insert an employee unless an he/she is assigned to a project7Redundant Information in Tuples andUpdate AnomaliesDelete Anomaly: When a project is deleted, it will result in deleting all the employees who work on that project. Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project8Redundant Information in Tuples andUpdate AnomaliesGUIDELINE 2: Guideline to Redundant Information in Tuples and Update AnomaliesDesign the base relation schemas so that no insertion, deletion, or modification anomalies are present in the relations. If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly.Null Values in TuplesGUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible Attributes that are NULL frequently could be placed in separate relationsReasons for nulls:Attribute not applicable or invalidAttribute value unknown (may exist)Value known to exist, but unavailable

Cc gi tr khng xc nh trong cc b : Quan h phi c thit k sao cho b d liu c t gi tr NULL cng tt10Spurious TuplesBad designs for a relational database may result in erroneous results for certain JOIN operationsThe "lossless join" property is used to guarantee meaningful results for join operations GUIDELINE 4: The relations should be designed to satisfy the lossless join condition. No spurious tuples should be generated by doing a natural-join of any relations.

Nhiu khi chng ta a vo c s d liu nhng quan h khng ng, vic p dng cc php ton (nht l cc php ni) s sinh ra cc b gi tr khng ng, gi l cc b gi. Sinh ra cc b githit k khng tt i vi mt c s d liu quan h c th dn n kt qu sai lch cho cc hot ng nht nh JOINCc thuc tnh "lossless join" c s dng m bo kt qu c ngha cho cc hot ng ca JOIN11Functional dependenciesConsider the following relation schema

{Plane} StartTime{Pilot, StartDate, StartTime} Plane{Plane, StartDate} Pilot

Functional dependenciesA functional dependency is a constraint between two sets of attributes from the database. Suppose that relational database schema has n attributes R{A1, A2, An}, X and Y are subset of R.Definition. A functional dependency between two sets of attributes X and Y: XY, specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is that, for any two tuples t1 and t2 in r that have t1[X]=t2[X] t1[Y] = t2[Y] .Functional dependenciesExample:

Functional dependenciesWe say that there is a functional dependency from X to Y, or that Y is functionally dependent on X.The abbreviation for functional dependency is FD or f.d. The set of attributes X is called the left-hand side of the FD, and Y is called the right-hand side.X Y (t1.X = t2.X t1.Y = t2.Y)Functional dependenciesExample: Social security number determines employee name: SSN -> ENAMEProject number determines project name and location PNUMBER -> {PNAME, PLOCATION}Employee SSN and project number determines the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

Functional dependencies If a constraint on R states that there cannot be more than one tuple with a given X value in any relation instance r(R), that is, X is a candidate key of R, this implies that XY for any subset of attributes Y of R If XY in R, this does not say whether or not Y X in RIf K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K]=t2[K])

Nu c mt rng buc trn cc trng thi ca R l ch c mt b gi tr duy nht ca X trong mi th hin quan h r(R) th iu ko theo X Y vi mi tp con cc thuc tnh Y ca R. Mt ph thuc hm l mt tnh cht ca lc quan h R ch khng phi l tnh cht ca mt trng thi hp php r ca R. V vy, mt ph thuc hm khng 17Functional dependencies(FDs)A functional dependency is a property of the semantics or meaning of the attributes.Relation extensions r(R) that satisfy the functional dependency constraints are called legal relation states of R. A functional dependency is a property of the relation schema R, not of a particular legal relation state r of R. Hence, an FD cannot be inferred automatically from a given relation extension r but must be defined explicitly by someone who knows the semantics of the attributes of R

Inference Rules for FDsGiven a set of FDs F on relation schema R. Any other functional dependencies hold in all legal relation instances that satisfy the dependencies in F can be said to be inferred or deduced from the FDs in F.The following six inference rules for functional dependencies (Armstrong's inference rules)IR1 (reflexive : phn x): If X Y, then XY.IR2. (Augmentation) If X Y, then XZ YZ.IR3. (Transitive) If X Y and Y Z, then X Z

Cc quy tc suy din i vi cc ph thuc hm 19Inference Rules for FDsIR4(Decomposition):If X YZ, then X Y and X ZIR5(Union):If X Y and X Z, then X YZIR6(pseudo-transitive) If X Y and WY Z, then WX ZThe last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)

Inference Rules for FDsClosure : the set of all dependencies that can be inferred from F is called the closure of F; it is denoted by F+.Closure of a set of attributes X with respect to F is the set X + of all attributes that are functionally determined by X.X + can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F

Inference Rules for FDsAlgorithm: Determining X+, the Closure of : the set of attribute X under FX+= X;repeatoldX+= X+;for each functional dependency Y Z in F doIf X+ Y then X += X+ Z;until (X+ = oldX+);Inference Rules for FDsExample 1: Let Q be a relation with attributes (A,B,C,D,E,G,H) and let the following functional dependencies holdF={f1: BA; f2: DACE; f3: DH; f4: GH C; f5: ACD}Find the closure X+ of X = {AC}.Answer: X+=ACDEHInference Rules for FDsExample 2: Let Q be a relation with attributes (A,B,C,D,E,G) and let the following functional dependencies holdF = { f1: A C; f2: A EG; f3: B D; f4: G E} Find the closure X+ and Y+ of X = {A,B}; Y = {C,G,D} Answer: X+ = {ABCDEG} , Y+= {CGDE}Inference Rules for FDsExample 3:F = {SSN ENAME,PNUMBER {PNAME, PLOCATION},{SSN, PNUMBER} HOURS}Calculate the following closure sets with respect to F;{SSN }+ = {SSN, ENAME}{PNUMBER }+ = {PNUMBER, PNAME, PLOCATION}{SSN, PNUMBER}+ = {SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS}Inference Rules for FDsAlgorithm: Determining F+Find all subsets of relation Q+Find all possible functional dependencies of QFind all subset closure of QBased on the closure of all subsets sought to determine which the functional dependencies belong F+Inference Rules for FDsExample: Q(A,B,C) F = {AB C,C B} F+ ?All subset of attributes

Closure of all subsetsA+ = AB+ = BC+ = BCAC+ = ABCAB+ = ABCBC+ = BC

ABC{A}{B}{C}{A,B}{A,C}{B,C}{A,B,C}Inference Rules for FDsAll functional dependencies:ABABCBCABCFCACBCF+ACBCF+BCACAABAABCBACABACF+CBFCABCACABCF+BCABCACBABBCABBCF+CABACBF+BCAAACBABBABCABABCF+CACACABF+BCABInference Rules for FDsResult:

F+ = {ABC, ABAC,ABBC, ABABC,CB,CBC,ACB, ACAB,ACBC, ACABC} Bc 1: Tm tt c tp con ca Q Bc 2: Tm bao ng ca tt c tp con ca Q+Bc 4: Da vo bao ng ca cc tp con tm suy ra cc ph thuc hm thuc F+. V d bao ng A+ = A ch gm cc ph thuc hm hin nhin bao ng {AB}+ = ABC cho cc ph thuc hm khng hin nhin sau ABC,ABAC,ABBC,ABABC (Tm tt c cc tp con ca {ABC} ri b cc tp con ca {AB}) Cc tp con ca {ABC} l: , {A},{B},{AB},{C},{AC},{BC},{ABC} B cc tp con ca {AB} l: , {A},{B},{AB},{C},{AC},{BC},{ABC} Cc tp cn li chnh l v phi ca ph thuc hm c v tri l AB 29Equivalence of Sets of FDsTwo sets of FDs F and G are equivalent if:Every FD in F can be inferred from G, andEvery FD in G can be inferred from FHence, F and G are equivalent if F + =G +Definition: F covers G if every FD in G can be inferred from F (i.e., if G + subset-of F +)F and G are equivalent if F covers G and G covers FThere is an algorithm for checking equivalence of sets of FDs

30Minimal Sets of FDsA set of FDs is minimal if it satisfies the following conditions:Every dependency in F has a single attribute for its right-hand side. Example:F = {A B, A C , B C, AB D}Cannot replace any dependency X A in F with a dependency YA, where Y is a subset of X, and still have a set of dependencies that is equivalent to F. Example: F= {ABC; BC} F l tp ph thuc hm c v tri khng d tha F l tp ph thuc hm c v phi mt thuc tnh. F l tp ph thuc hm khng d tha

31Minimal Sets of FDsCannot remove any dependency from F and still have a set of dependencies that is equivalent to F.Example:F = {ABC, BD, ABD} redundant because:F F= {ABC, BD}

Minimal Sets of FDsAlgorithm 10.2: Finding a Minimal Cover (ph ti thiu)for a set of Functional Dependencies FRemove all FDs have left hand side redundant Converse all FDs with the right hand side of more than one attribute to FDs having the right hand side of one attribute Remove all redundant FDsBc 1: loi khi F cc ph thuc hm c v tri d tha. Bc 2: Tch cc ph thuc hm c v phi trn mt thuc tnh thnh cc ph thuc hm c v phi mt thuc tnh. Bc 3: loi khi F cc ph thuc hm d tha. 33Minimal Sets of FDsExample: Let Q be a relation with attributes Q(A,B,C,D) and FD F={AB CD, B C, C D} Finding a Minimal Cover of FABCD is FD with the left hand side redundant A because B+=BCD B CD F+=>F{B CD;B C;C D} F{B D; B C;C D}=F1mc , In F1tt , B D is redundant.F{B C;C D}=FmcMinimal Sets of FDsExample: Q(MSCD,MSSV,CD,HG) F = {MSCD CD; CD MSCD; CD,MSSV HG; MSCD,HG MSSV; CD,HG MSSV; MSCD,MSSV HG} Find minimal cover for FD F Result: Fmc = { MSCD CD; CD MSCD; CD,HG MSSV; MSCD,MSSV HG}Normalization of RelationsNormalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relationsNormal form: Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form

Normalization of Relations2NF, 3NF, BCNF based on keys and FDs of a relation schema4NF based on keys, multi-valued dependencies : MVDs; 5NF based on keys, join dependencies : JDs (Chapter 11)Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation; Chapter 11)

Practical Use of Normal FormsNormalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detectThe database designers need not normalize to the highest possible normal form. (usually up to 3NF, BCNF or 4NF)Denormalization: the process of storing the join of higher normal form relations as a base relationwhich is in a lower normal form Definitions of Keys and Attributes Participating in KeysA super key of a relation schema R = {A1, A2, ...., An} is a set of attributes S subset-of R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S] A key K is a super key with the additional property that removal of any attribute from K will cause K not to be a super key any more.

Definitions of Keys and Attributes Participating in KeysAlgorithm 10.3: Finding all key for a relation Q and set of Functional Dependencies FDetermine all subset of Q+: X1, X2, , XnFind closure of the Xi.Super key is Xi that they are closure =Q+, suppose that set supper key are S= {S1, S2, Sn}Create the set contain all key of Q from S by consider subset Si, Sj of SS (i j), if Si Sj then remove Sj (i,j=1..n).

Definitions of Keys and Attributes Participating in KeysExample:

Definitions of Keys and Attributes Participating in KeysExample:

Definitions of Keys and Attributes Participating in KeysAlgorithm 10.4: Finding one key K for a relation Q and set of Functional Dependencies F:Step 1: Assign K = Q+Step 2: A is an attribute of K, K=K-A. If K+=Q+ then K=K.

Definitions of Keys and Attributes Participating in KeysExample:U={A,B,C,D,E} , F={AB->C, AC->B, BC->DE} Find KStep1: K=U > K=ABCDEStep2:(K\A)+=>(BCDE)+=BCDE U+ K=ABCDEStep3:(K\B)+=>(ACDE)+= ABCDE = U+ K=ACDEStep4: (K\C)+ =>(ADE)+ = ADE U+ K=ACDEStep5: (K\D)+ => (ACE)+ = ACEBD=U+K=ACEStep6: (K\E)+ =>(AC)+ = ACBDE =U+ K=ACDefinitions of Keys and Attributes Participating in KeysIf a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.A Prime attribute (thuc tnh kha) must be a member of some candidate keyA Nonprime attribute is not a prime attributethat is, it is not a member of any candidate key.

First Normal FormFirst normal form (INF):Disallow multivalued attributes, composite attributes, and their combinations. It states that the domain of an attribute must include only atomic values and that the value of any attribute in a tuple must be a single value from the domain of that attribute. Mt lc quan h Q dng chun 1 nu ton b cc thuc tnh ca mi b u mang gi tr n. 46First Normal Form

Second Normal FormSecond normal form (2NF) is based on the concept of full functional dependency. A functional dependency XY is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.Examples:{SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS hold {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency ) since SSN -> ENAME also holds Second Normal FormDefinition. A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R.A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary keyR can be decomposed into 2NF relations via the process of 2NF normalization Second Normal FormAlgorithm 10.4 :The test for 2NFGive a relational Q, a set FD F Determine Q belong 2FN ?Step1: Find all key of Q Step2: With each key K, Find closure of all the subsets S of KStep3: If there are S+ contain Nonprime attribute then Q is not 2NF else Q is 2NFSecond Normal FormExample: Q(A,B,C,D) F={ABC; BD; BCA}. Is Q 2NF?

D is nonprime attribute Q is not 2NF

Third Normal FormThird normal form (3NF) is based on the concept of transitive dependency.Transitive functional dependency A functional dependency XZ that can be derived from two FDs X Y and Y Z Examples:SSN DMGRSSN is a transitive FD sinceSSNDNUMBER and DNUMBER DMGRSSN hold

Third Normal Form Definition: A relation schema R is in 3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key.Example:

Third Normal FormNOTE:In X Y and Y Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency .EMP (SSN, Emp#, Salary ). SSNEmp#Salary and Emp# is a candidate key.

General Normal Form Definitions (For Multiple Keys)The above definitions consider the primary key onlyThe following more general definitions take into account relations with multiple candidate keysPartial and full functional dependencies and transitive dependencies will now be considered with respect to all candidate keys of a relation.General Definition of 2NFDefinition: A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is not partially dependent on any key of R.Example:

General Definition of 2NF In LOT relation (above example), Suppose that there are two candidate keys: PROPERTY_ID# and {COUNTY_NAME, LOT#}; that is, lot numbers are unique only within each county, but PROPERTY_ID numbers are unique across counties for the entire state. Based on the two candidate keys PROPERTY_ID# and {COUNTY_NAME, LOT#}, we know that functional dependencies FD1 and FD2 hold. We choose PROPERTY_ID# as the primary key,

General Definition of 3NFDefinition: A relation schema R is in third normal form (3NF) if, whenever a nontrivial functional dependency XA holds in R, either X is a super key of R, or A is a prime attribute of R.

BCNF (Boyce-Codd Normal Form)Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF. That is, every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF. Definition. A relation schema R is in BCNF if whenever a nontrivial functional dependency XA holds in R, then X is a super key of RBCNF (Boyce-Codd Normal Form)Each normal form is strictly stronger than the previous oneEvery 2NF relation is in 1NFEvery 3NF relation is in 2NFEvery BCNF relation is in 3NFThere exist relations that are in 3NF but not in BCNF The goal is to have each relation in BCNF (or 3NF)

BCNF (Boyce-Codd Normal Form) Example: Q(A,B,C,D,E,I) F={ACDEBI;CEAD}. Q is a BCNF?

F Ftt={ACDE,ACDB,ACDI,CEA,CED}The feft hand side of Ftt are super key Q is BCNF

BCNF (Boyce-Codd Normal Form)Example: