Database Systems. DataBase System Haichang Gao, Software School, Xidian University 2 Major Content &...


  • Slide 1
  • Database Systems
  • Slide 2
  • DataBase System, Haichang Gao, Software School, Xidian University. Major Content & Grade: Introduction*; The Relational Model***; SQL****; Transaction Management***; Database Design (E-R)***; Database Design (Normalization)***
  • Slide 3
  • Part 2: Normalization. Introduction; Functional Dependencies; Normal Forms; Lossless Decompositions; Additional Design Considerations
  • Slide 4
  • Introduction. Normalization is another approach to logical design of a relational database. The E-R approach and the normalization approach reinforce each other. Normalization starts with a real-world situation to be modeled and lists the data items that are candidates to become column names in relational tables, together with a list of rules about the relatedness of these data items. The aim is to represent all these data items as attributes of tables that obey restrictive conditions associated with what we call normal forms: 1NF --> 2NF --> 3NF --> BCNF --> 4NF --> 5NF
  • Slide 5
  • Design of the Bank Database
    branch = (branch_name, branch_city, assets)
    customer = (customer_id, customer_name, customer_street, customer_city)
    loan = (loan_number, amount)
    account = (account_number, balance)
    employee = (employee_id, employee_name, telephone_number, start_date)
    dependent_name = (employee_id, dname)
    account_branch = (account_number, branch_name)
    loan_branch = (loan_number, branch_name)
    borrower = (customer_id, loan_number)
    depositor = (customer_id, account_number)
    cust_banker = (customer_id, employee_id, type)
    works_for = (worker_employee_id, manager_employee_id)
    payment = (loan_number, payment_number, payment_date, payment_amount)
    savings_account = (account_number, interest_rate)
    checking_account = (account_number, overdraft_amount)
  • Slide 6
  • Design of the Bank Database. Suppose we combine borrower and loan to get the schema bor_loan = (customer_id, loan_number, amount). [Instance table omitted.] The result is possible repetition of information, because borrower is an M:N relationship.
  • Slide 7
  • Design of the Bank Database. Consider combining loan_branch and loan into the schema loan_amt_br = (loan_number, amount, branch_name). [Instance table omitted.] There is no repetition, because loan_branch is a 1:N relationship.
  • Slide 8
  • Design of the Bank Database. Example (decomposition): we cannot reconstruct the original employee relation.
  • Slide 9
  • Design of the Bank Database. Combining loan_branch and loan into loan_amt_br = (loan_number, amount, branch_name) gives a good relation schema. Combining borrower and loan into bor_loan = (customer_id, loan_number, amount) does NOT give a good relation schema. How do we decide whether a particular relation R is in good form or not? Suppose we had started with bor_loan: how would we know to split it up (decompose it) into borrower and loan? Normalization theory provides the tools to answer these questions.
  • Slide 10
  • Employee Information: A Running Example. Each employee has from one up to a large number of skills useful to the company.
  • Slide 11
  • Employee Information: A Running Example
  • Slide 12
  • Anomalies of a Bad Database Design: Update Anomaly. A table T is subject to an update anomaly when changing a single attribute value for an entity instance or relationship instance represented in the table may require that several rows of T be updated.
  • Slide 13
  • Anomalies of a Bad Database Design: Delete Anomaly. A table T is subject to a delete anomaly when deleting some row of the table to reflect the disappearance of some instance of an entity or relationship can cause us to lose information about some instance of a different entity or relationship that we do not wish to forget.
  • Slide 14
  • Anomalies of a Bad Database Design: Insert Anomaly. We cannot represent information about some entity or instance without including information about some other instance of an entity or relationship that does not exist.
  • Slide 15
  • Anomalies of a Bad Database Design: Redundant Data. An entity instance or relationship instance represented in a table T may account for several rows of T.
  • Slide 16
  • Anomalies of a Bad Database Design. Remedy: normalize the relation by decomposing it.
  • Slide 17
  • Functional Dependencies. The functional dependency α → β holds on R if and only if, for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is, t1[α] = t2[α] ⇒ t1[β] = t2[β].
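The definition can be checked mechanically on a single instance. The following is a minimal sketch, assuming rows are Python dicts; the helper name `fd_holds` and the sample rows are hypothetical and not from the slides. Note that an instance can only refute a dependency: an FD is a constraint on all legal instances, so a passing check does not prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs while disagreeing on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key is seen, else returns the stored value
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical sample rows (illustration only)
emp_info = [
    {"emp_id": 1, "emp_name": "Ann", "skill_id": 10},
    {"emp_id": 1, "emp_name": "Ann", "skill_id": 20},
    {"emp_id": 2, "emp_name": "Bob", "skill_id": 10},
]
print(fd_holds(emp_info, ["emp_id"], ["emp_name"]))  # True
print(fd_holds(emp_info, ["emp_id"], ["skill_id"]))  # False
```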
  • Slide 18
  • Functional Dependencies. In the emp_info table we get emp_id → emp_name. Do emp_id → emp_phone and emp_id → dept_name hold? And what about emp_phone → emp_id?
  • Slide 19
  • Functional Dependencies. Analyze the following tables T1, T2 and T3 (suppose they are valid): for each table, does A → B hold? Does B → A hold? [Tables omitted.]
  • Slide 20
  • Logical implications among functional dependencies. Inclusion Rule: given a table T with a specified heading Head(T), if X and Y are sets of attributes contained in Head(T) and Y ⊆ X, then X → Y. Proof: by definition, we need only demonstrate that if two rows u and v agree on X they must agree on Y. But Y is a subset of X, so this is immediate. Trivial Dependency: a trivial dependency is an FD of the form X → Y, in a table T where X ∪ Y ⊆ Head(T), that will hold for any possible content of the table T. Given a trivial dependency X → Y in T, it must be the case that Y ⊆ X. E.g. A → A, AB → A.
  • Slide 21
  • Armstrong's Axioms (1974). A1, Inclusion rule: if Y ⊆ X, then X → Y. Proof: for any relation r of R and any tuples t, s in r with t[X] = s[X], since Y ⊆ X we have t[Y] = s[Y], so X → Y holds. Example: (customer_name, loan_number) → customer_name; customer_name → customer_name.
  • Slide 22
  • Armstrong's Axioms (1974). A2, Transitivity rule: if X → Y and Y → Z, then X → Z. Proof: for any relation r of R and any tuples t, s in r with t[X] = s[X], X → Y gives t[Y] = s[Y], and Y → Z then gives t[Z] = s[Z], so X → Z holds. Example: for relation S(sno, sname, sdept, dept_manager), sno → sdept and sdept → dept_manager, THEN sno → dept_manager.
  • Slide 23
  • Armstrong's Axioms (1974). A3, Augmentation rule: if X → Y, then XZ → YZ. Proof: for any relation r of R and any tuples t, s in r with t[XZ] = s[XZ], we have t[X] = s[X] and t[Z] = s[Z]; X → Y gives t[Y] = s[Y], hence t[YZ] = s[YZ], so XZ → YZ holds. Example: for relation S(sno, sname, sdept, dept_manager), sno → sdept, THEN (sno, sname) → (sdept, sname).
  • Slide 24
  • Some implications of Armstrong's Axioms. [1] Union rule: if X → Y and X → Z, then X → YZ. Proof: (1) X → Y (given); (2) X → XY (augmentation of (1) with X); (3) X → Z (given); (4) XY → YZ (augmentation of (3) with Y); (5) X → YZ (transitivity on (2) and (4)). So {X → Y, X → Z} implies X → YZ. Example: S(sno, sname, sdept, dept_manager), sno → sname and sno → sdept, THEN sno → (sname, sdept).
  • Slide 25
  • Some implications of Armstrong's Axioms. [2] Decomposition rule: if X → YZ, then X → Y and X → Z. Example: S(sno, sname, sdept, dept_manager), sno → (sname, sdept), THEN sno → sname and sno → sdept. [3] Pseudotransitivity rule: if X → Y and WY → Z, then XW → Z. [4] Set accumulation rule: if X → YZ and Z → W, then X → YZW.
  • Slide 26
  • Closure. The set of all functional dependencies logically implied by F is the closure of F, denoted by F+. We can find all of F+ by applying Armstrong's Axioms: if β ⊆ α, then α → β (reflexivity); if α → β, then γα → γβ (augmentation); if α → β and β → γ, then α → γ (transitivity). Armstrong's Axioms are often referred to as being sound (valid) and complete.
  • Slide 27
  • Closure. Given R with U = {A, B, C} and F = {A → B, B → C}, the closure of F is F+ = { ∅ → ∅, A → ∅, A → A, …, AB → A (reflexivity); A → B, A → AB, AB → B, …, AB → BC (augmentation); B → C, AB → AC (augmentation); A → C (transitivity) }. Note that there are 43 non-duplicate FDs. The closure of a set of functional dependencies includes all dependencies among the attributes of a relation. Drawback: it is too large to be managed.
  • Slide 28
  • Closure. Algorithm to compute the closure of a set of functional dependencies F:
    begin
      F+ := F
      repeat
        for each functional dependency f in F+
          apply inclusion and augmentation rules on f
          add the resulting functional dependencies to F+
        for each pair of functional dependencies f1 and f2 in F+
          if f1 and f2 can be combined using transitivity
            then add the resulting functional dependency to F+
      until F+ does not change any further
    end
  • Slide 29
  • Closure: closure of attributes. Given a set of attributes α, define the closure of α under F (denoted by α+) as the set of attributes that are functionally determined by α under F. Algorithm to compute α+, the closure of α under F:
    result := α;
    while (changes to result) do
      for each β → γ in F do
        begin
          if β ⊆ result then result := result ∪ γ
        end
  • Slide 30
  • Closure: closure of attributes. Example 1: given R = (A, B, C, G, H, I) and F = {A → B, A → C, CG → H, CG → I, B → H}, (AG)+ = ? 1) result = AG; 2) result = AGBC (A → B and A → C); 3) result = AGBCH (CG → H and CG ⊆ AGBC); 4) result = AGBCHI (CG → I and CG ⊆ AGBCH). Example 2: given R = (A, B, C, D, E) and F = {B → CD, AD → E, B → A}, (BC)+ = ?
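The attribute-closure algorithm translates almost line for line into code. The following is a minimal sketch, assuming single-letter attribute names so that a string such as "CG" stands for the set {C, G}, and assuming the garbled FDs on the slide read A → B, A → C, CG → H, CG → I, B → H (Example 1) and B → CD, AD → E, B → A (Example 2). The function name `closure` is an illustrative choice.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:                      # "while (changes to result)"
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)      # result := result ∪ γ
                changed = True
    return result

F1 = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print("".join(sorted(closure("AG", F1))))  # ABCGHI

F2 = [("B", "CD"), ("AD", "E"), ("B", "A")]
print("".join(sorted(closure("BC", F2))))  # ABCDE
```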
  • Slide 31
  • Closure: closure of attributes. There are several uses of the attribute closure algorithm. 1) Testing for superkey: to test if α is a superkey, we compute α+ and check if α+ contains all attributes of R. Example: for relation R with U = {A, B, C, D, E} and F = {AB → C, B → D, C → E, EC → B, AC → B}, is AB a superkey or not? (AB)+ = {A, B, C, D, E} = U, so AB is a superkey.
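A short sketch of the superkey test, assuming the slide's F reads {AB → C, B → D, C → E, EC → B, AC → B} and using single-letter attributes written as strings. The helper names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, universe, fds):
    """attrs is a superkey iff its closure covers every attribute of R."""
    return closure(attrs, fds) == set(universe)

F = [("AB", "C"), ("B", "D"), ("C", "E"), ("EC", "B"), ("AC", "B")]
print(is_superkey("AB", "ABCDE", F))  # True
print(is_superkey("B", "ABCDE", F))   # False
```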
  • Slide 32
  • Closure: closure of attributes. There are several uses of the attribute closure algorithm. 2) Testing functional dependencies: to check if a functional dependency α → β holds (or, in other words, is in F+), just check if β ⊆ α+. Example: for relation R with U = {A, B, C, D, E} and F = {AB → C, B → D, C → E, EC → B, AC → B}, is BE → CD implied by F? (BE)+ = {B, E, D} does not include CD, so it is not implied. Is AB → E implied by F?
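The membership test β ⊆ α+ is a one-liner on top of attribute closure. A minimal sketch, assuming the slide's F reads {AB → C, B → D, C → E, EC → B, AC → B}; `implies` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """lhs -> rhs is in F+ exactly when rhs is contained in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)

F = [("AB", "C"), ("B", "D"), ("C", "E"), ("EC", "B"), ("AC", "B")]
print(implies(F, "BE", "CD"))  # False
print(implies(F, "AB", "E"))   # True
```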
  • Slide 33
  • Closure: closure of attributes. There are several uses of the attribute closure algorithm. 3) Computing the closure of F: for each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+, we output a functional dependency γ → S.
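This enumeration can be sketched directly. The code below assumes single-letter attributes and takes the small example U = {A, B, C}, F = {A → B, B → C} used earlier in the deck; the helper names `subsets` and `fd_closure` are illustrative. The empty set is included as a possible left-hand and right-hand side.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def subsets(attrs):
    """Yield every subset of attrs (including the empty set) as a frozenset."""
    attrs = sorted(attrs)
    for k in range(len(attrs) + 1):
        for combo in combinations(attrs, k):
            yield frozenset(combo)

def fd_closure(universe, fds):
    """F+ as a set of (gamma, S) pairs: for each gamma, every S within gamma+."""
    return {(gamma, s)
            for gamma in subsets(universe)
            for s in subsets(closure(gamma, fds))}

Fplus = fd_closure("ABC", [("A", "B"), ("B", "C")])
print(len(Fplus))  # 43
```

The size grows exponentially with the number of attributes, which is why attribute closure is used for individual tests instead of materializing F+.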
  • Slide 34
  • Cover. FD set cover: a set F of FDs on a table T is said to cover another set G of FDs on T if the set G of FDs can be derived by implication rules from the set F, or in other words, if G ⊆ F+. If F covers G and G covers F, then the two sets of FDs are said to be equivalent, and we write F ≡ G. If two sets of FDs are equivalent, they imply the same FDs. Example: consider the two sets of FDs on relation R(ABCDE): F = {B → CD, AD → E, B → A} and G = {B → CDE, B → ABC, AD → E}. Is F ≡ G or not?
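Equivalence of two FD sets can be tested without computing F+ or G+: F covers G when every FD of G passes the attribute-closure test under F. A minimal sketch, assuming the slide's sets read F = {B → CD, AD → E, B → A} and G = {B → CDE, B → ABC, AD → E}; `covers` and `equivalent` are hypothetical helper names.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(f, g):
    """F covers G if every FD in G follows from F."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)

def equivalent(f, g):
    return covers(f, g) and covers(g, f)

F = [("B", "CD"), ("AD", "E"), ("B", "A")]
G = [("B", "CDE"), ("B", "ABC"), ("AD", "E")]
print(equivalent(F, G))  # True
```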
  • Slide 35
  • Database Systems
  • Slide 36
  • Cover. Sets of functional dependencies may have redundant dependencies that can be inferred from the others. For example, A → C is redundant in {A → B, B → C, A → C}. Parts of a functional dependency may also be redundant. E.g. on the RHS: {A → B, B → C, A → CD} can be simplified to {A → B, B → C, A → D}. E.g. on the LHS: {A → B, B → C, AC → D} can be simplified to {A → B, B → C, A → D}. What we need is a cover of F that is a minimal set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies.
  • Slide 37
  • Minimal Cover. Step 1, decompose the right-hand sides of the FDs: create an equivalent set H of FDs with only single attributes on the right side. Step 2, erase extraneous attributes on the LHS: for α → β in F, attribute A is extraneous in α if A ∈ α and F logically implies (F − {α → β}) ∪ {(α − A) → β}; then replace α → β with (α − A) → β. Step 3, delete redundant FDs: for α → β in F, if (F − {α → β}) logically implies α → β, then delete α → β from F.
  • Slide 38
  • Minimal Cover. Example: for relation R with U = {A, B, C, D, E} and F = {A → BC, BCD → E, B → D, A → D, E → A}, compute the minimal cover of F. 1) F1 = {A → B, A → C, BCD → E, B → D, A → D, E → A}. 2) (BC)+ under F1 is BCDEA, which includes E, so D in the LHS of BCD → E is extraneous: F2 = {A → B, A → C, BC → E, B → D, A → D, E → A}. 3) For A → D: the closure of A under F2 − {A → D} is ABCED, which contains D, so A → D is redundant. Fmin = {A → B, A → C, BC → E, B → D, E → A}.
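The three steps can be sketched as follows, assuming single-letter attributes and reading the slide's F as {A → BC, BCD → E, B → D, A → D, E → A}. The function name `minimal_cover` is illustrative, and the result of such a procedure depends in general on the order in which attributes and FDs are examined; a different order may yield a different but equivalent minimal cover.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: split right-hand sides into single attributes.
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    # Step 2: drop extraneous left-hand-side attributes.
    for i, (lhs, a) in enumerate(g):
        for x in sorted(lhs):
            reduced = g[i][0] - {x}
            if reduced and a in closure(reduced, g):
                g[i] = (reduced, a)
    # Step 3: drop FDs implied by the remaining ones.
    i = 0
    while i < len(g):
        rest = g[:i] + g[i + 1:]
        if g[i][1] in closure(g[i][0], rest):
            g = rest
        else:
            i += 1
    return g

F = [("A", "BC"), ("BCD", "E"), ("B", "D"), ("A", "D"), ("E", "A")]
for lhs, a in minimal_cover(F):
    print("".join(sorted(lhs)), "->", a)
# A -> B, A -> C, BC -> E, B -> D, E -> A (one per line)
```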
  • Slide 41
  • Exercise. R = (A, B, C, D, E, F), F = {A → BC, E → CF, B → E, CD → EF}. 1. (AB)+ = ? 2. (AD)+ = ? Is AD → F implied by F? Page 307: 7.6, 7.7
  • Slide 42
  • KEY. K is a superkey for relation schema R if and only if K → R. K is a candidate key for R if and only if K → R, and for no α ⊂ K, α → R. Prime attribute: an attribute that appears in some candidate key. Non-prime attribute: an attribute that does NOT appear in any candidate key.
  • Slide 43
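  • 5.3 Finding candidate keys. For a relation R with attribute set U and FD set F, classify the attributes into four classes. L: attributes that appear only on the left-hand side of FDs in F. R: attributes that appear only on the right-hand side of FDs in F. LR: attributes that appear on both sides of FDs in F. N: attributes that appear in no FD of F. The classes used below are L, N, R and LR.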
  • Slide 44
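  • 5.3 Finding candidate keys of R with attribute set U and FD set F. (1) Classify the attributes of R into the L, R, LR and N classes; let X be the set of L and N attributes and Y the set of LR attributes. (2) Compute X+; if X+ = U, then X is the only candidate key of R and we are done; otherwise go to step (3). Example: R with U = {A, B, C, D, E} and F = {A → BC, CD → E, B → D, E → A}. Solution: (1) R has no L or N attributes; A, B, C, D, E are all LR attributes, so X = ∅ and Y = {A, B, C, D, E}. (2) X+ ≠ U, so go to step (3).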
  • Slide 45
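  • 5.3 Finding candidate keys (continued). (3) For each single attribute A in Y, if (XA)+ = U, then XA is a candidate key, and we remove A from Y: Y := Y − {A}. Example: R with U = {A, B, C, D, E} and F = {A → BC, CD → E, B → D, E → A}, X = ∅, Y = {A, B, C, D, E}. A+ = ABCDE = U, so A is a candidate key; the closures of B, C and D are not U; E+ = ABCDE = U, so E is a candidate key. Now Y = {B, C, D}; go to step (4).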
  • Slide 46
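  • 5.3 Finding candidate keys (continued). (4) For each pair of attributes Z taken from Y, compute (XZ)+; if (XZ)+ = U, then XZ is a candidate key. Example: R with U = {A, B, C, D, E} and F = {A → BC, CD → E, B → D, E → A}; after step (3), A and E are candidate keys and Y = {B, C, D}. (BC)+ = BCDEA = U, so BC is a candidate key; (BD)+ ≠ U, so BD is not; (CD)+ = CDEAB = U, so CD is a candidate key.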
  • Slide 47
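  • 5.3 Finding candidate keys (continued). Example: R with U = {A, B, C, D, E} and F = {A → BC, CD → E, B → D, E → A}; after step (3), A and E are candidate keys and Y = {B, C, D}; after step (4), BC and CD are candidate keys. (5) Taking three attributes from Y gives BCD, which contains BC, so BCD is a superkey but not a candidate key. The candidate keys of R are A, E, BC and CD.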
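The search over increasing subset sizes can be sketched in code. This is a minimal sketch rather than the slide's exact procedure: it assumes single-letter attributes, reads the slide's F as {A → BC, CD → E, B → D, E → A} (the reading under which A+ = ABCDE, as the slide states), places attributes that never appear on a right-hand side in every key, and tries combinations of the attributes that appear on both sides in order of size. The function name `candidate_keys` is illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(universe, fds):
    universe = set(universe)
    on_lhs = set().union(*(set(l) for l, _ in fds))
    on_rhs = set().union(*(set(r) for _, r in fds))
    core = universe - on_rhs              # never determined: must be in every key
    optional = sorted(on_lhs & on_rhs)    # attributes on both sides: may be in a key
    keys = []
    for k in range(len(optional) + 1):    # smallest candidates first
        for extra in combinations(optional, k):
            cand = core | set(extra)
            if any(key <= cand for key in keys):
                continue                  # contains a smaller key: not minimal
            if closure(cand, fds) == universe:
                keys.append(cand)
    return keys

F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(["".join(sorted(k)) for k in candidate_keys("ABCDE", F)])
# ['A', 'E', 'BC', 'CD']
```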
  • Slide 48
  • Normal Forms -- 1NF. A relational schema R is in first normal form if the domains of all attributes of R are atomic. There are NO composite attributes such as: customer(customer-id, name(first-name, middle-initial, last-name), date-of-birth). Each attribute is treated as a unit, even if it has several parts that carry individual information. Example: strings would normally be considered indivisible. In student number 130711***, 13 is the department number, but you should not extract and use it. Doing so is a bad idea: it leads to encoding of information in the application program rather than in the database.
  • Slide 49
  • Normal Forms -- 1NF. If a schema R is not in 1NF, then it is NOT a relational schema. But a relation R being in 1NF is not good enough. The relation Employee(emp_id, emp_name, emp_phone, dept_name, dept_phone, dept_mgrname, skill_id, skill_name, skill_date, skill_lvl) is in 1NF, yet it has Insert Anomaly, Delete Anomaly, Update Anomaly and Data Redundancy.
  • Slide 50
  • Normal Forms -- 2NF. Second normal form (2NF): a relation schema R with FD set F is said to be in 2NF if, for any functional dependency X → A implied by F that lies in R, where A is a single attribute that is not in X and is non-prime, X is not a proper subset of any key K of R. In other words, in 2NF there are NO non-prime attributes that depend partially on a candidate key. Example: R(A, B, C, D), F = {AB → C, AC → BD}. Candidate keys: AB, AC. AB → D and AC → D are FULL dependencies, so R ∈ 2NF.
  • Slide 51
  • Normal Forms -- 2NF. For example: is relation schema emp_info in 2NF? What are the candidate keys? What are the non-prime attributes? Test all FDs according to the definition of the normal form. emp_info is not in 2NF.
  • Slide 52
  • Database Systems
  • Slide 53
  • Normal Forms -- 2NF. emp_info(emp_id, emp_name, emp_phone, dept_name, dept_phone, dept_mgrname, skill_id, skill_name, skill_date, skill_lvl), F = { emp_id → emp_name, emp_phone, dept_name; dept_name → dept_phone, dept_mgrname; skill_id → skill_name; (emp_id, skill_id) → skill_date, skill_lvl }. Decomposition: emp(emp_id, emp_name, emp_phone, dept_name, dept_phone, dept_mgrname); skill(skill_id, skill_name); emp_skill(emp_id, skill_id, skill_date, skill_lvl). Each of these is in 2NF.
  • Slide 54
  • Normal Forms -- 2NF. For relation bor_loan(customer_id, loan_number, amount), F = {loan_number → amount}, CK: (customer_id, loan_number). bor_loan is NOT in 2NF, because borrower is an M:N relationship. Merging an M:N relationship with an entity it is associated with produces a NON-2NF relation schema.
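A 2NF check amounts to looking for non-prime attributes determined by a proper subset of a candidate key. The following is a minimal sketch, assuming attribute names are given as tuples of strings and that the candidate keys are supplied by the caller; `partial_dependencies` is a hypothetical helper name.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(fds, keys):
    """List (part_of_key, attribute) pairs that violate 2NF."""
    prime = set().union(*keys)
    found = []
    for key in keys:
        for k in range(1, len(key)):                     # proper, non-empty subsets
            for part in combinations(sorted(key), k):
                for a in sorted(closure(part, fds) - set(part) - prime):
                    found.append((part, a))
    return found

F = [(("loan_number",), ("amount",))]
keys = [{"customer_id", "loan_number"}]
print(partial_dependencies(F, keys))
# [(('loan_number',), 'amount')]
```

An empty list means no partial dependency was found, so the schema is in 2NF with respect to the given FDs and keys.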
  • Slide 55
  • Normal Forms -- 2NF. A relation R being in 2NF is not good enough. The relation emp(emp_id, emp_name, emp_phone, dept_name, dept_phone, dept_mgrname) is in 2NF, yet it has Insert Anomaly, Delete Anomaly, Update Anomaly and Data Redundancy.
  • Slide 56
  • Normal Forms -- 3NF. A relation schema R is in third normal form (3NF) if for all α → β in F+ at least one of the following holds: α → β is trivial (i.e., β ⊆ α) (such FDs do not exist in a canonical cover); α is a superkey for R; each attribute A in β − α is contained in a candidate key for R (or, for a canonical cover, A in β is a prime attribute). For example: SJP(S, J, P), FD: (S, J) → P, (J, P) → S. CK: (S, J), (J, P). The LHS of each FD is a superkey, so SJP is in 3NF.
  • Slide 57
  • Normal Forms -- 3NF. Another definition: a relation R is in 3NF if there are no nonprime attributes which are transitively dependent on a key for R. For example: loan_b(loan_number, branch_name, branch_city, assets), F = { loan_number → branch_name; branch_name → branch_city, assets }. Since loan_number → branch_name and branch_name → branch_city, the nonprime attribute branch_city is transitively dependent on the candidate key loan_number, so loan_b is NOT in 3NF.
  • Slide 58
  • Normal Forms -- 3NF. The two definitions are equivalent. Definition 1: a relation schema R is in third normal form (3NF) if for all α → β in F+ at least one of the following holds: α → β is trivial (i.e., β ⊆ α) (such FDs do not exist in a canonical cover); α is a superkey for R; each attribute A in β − α is contained in a candidate key for R (or, for a canonical cover, A in β is a prime attribute). Definition 2: a relation R is in 3NF if there are no nonprime attributes which are transitively dependent on a key for R.
  • Slide 59
  • Normal Forms -- 3NF. For example: emp(emp_id, emp_name, emp_phone, dept_name, dept_phone, dept_mgrname) is in 2NF, with F = { emp_id → emp_name, emp_phone, dept_name; dept_name → dept_phone, dept_mgrname }. For dept_name → dept_phone, dept_mgrname: dept_name is NOT a superkey, and dept_phone and dept_mgrname are NOT in any candidate key, so emp is NOT in 3NF. Equivalently, the nonprime attribute dept_phone is transitively dependent on the candidate key emp_id, so emp is NOT in 3NF.
  • Slide 60
  • Normal Forms -- 3NF. For example: emp(emp_id, emp_name, emp_phone, dept_name, dept_phone, dept_mgrname), F = { emp_id → emp_name, emp_phone, dept_name; dept_name → dept_phone, dept_mgrname }; emp is NOT in 3NF. Decomposition: emp(emp_id, emp_name, emp_phone, dept_name) with F = { emp_id → emp_name, emp_phone, dept_name }, so emp ∈ 3NF; dept(dept_name, dept_phone, dept_mgrname) with F = { dept_name → dept_phone, dept_mgrname }, so dept ∈ 3NF.
  • Slide 61
  • Normal Forms -- 3NF. A relation R being in 3NF is not good enough. For relation STC(S, T, C), with S: Student, T: Teacher, C: Course, and F = { (S, C) → T, (S, T) → C, T → C }: there is no nonprime attribute, so STC is in 3NF. By the other definition: in the first two FDs the LHS is a superkey, and C in T → C is a prime attribute, so STC is in 3NF. Yet it has Insert Anomaly, Delete Anomaly, Update Anomaly and Data Redundancy.
  • Slide 62
  • Normal Forms -- BCNF. A relation schema R is in BCNF (Boyce-Codd Normal Form) with respect to a set F of functional dependencies if, for all functional dependencies in F+ of the form α → β where α ⊆ R and β ⊆ R, at least one of the following holds: α → β is trivial (i.e., β ⊆ α); α is a superkey for R. For example: bor_loan(customer_id, loan_number, amount), F = { loan_number → amount }. bor_loan is not in BCNF, because loan_number is not a superkey. bor_loan is not in 2NF either; it is only in 1NF.
  • Slide 63
  • Normal Forms -- BCNF. Example 1: SJP(S, J, P), FD: (S, J) → P, (J, P) → S. CK: (S, J), (J, P). The LHS of each FD is a superkey, so SJP is in BCNF. Example 2: STC(S, T, C), F = { (S, C) → T, (S, T) → C, T → C }. There is no nonprime attribute, so STC is in 3NF. For T → C, T is not a superkey, so STC is NOT in BCNF.
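Both tests can be sketched with attribute closure. The code below assumes single-letter attributes, reads the slide's FDs as (S, J) → P, (J, P) → S for SJP and (S, C) → T, (S, T) → C, T → C for STC, and takes the candidate keys as input. The helper names are illustrative. Checking only the FDs listed in F is sufficient when testing the original schema R itself; it is not sufficient for sub-schemas produced by a decomposition.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(universe, fds):
    """FDs in F that are nontrivial and whose LHS is not a superkey."""
    u = set(universe)
    return [(l, r) for l, r in fds
            if not set(r) <= set(l) and closure(l, fds) != u]

def third_nf_violations(universe, fds, keys):
    """BCNF violations whose RHS also contains a non-prime attribute."""
    prime = set().union(*map(set, keys))
    return [(l, r) for l, r in bcnf_violations(universe, fds)
            if not (set(r) - set(l)) <= prime]

SJP = [("SJ", "P"), ("JP", "S")]
STC = [("SC", "T"), ("ST", "C"), ("T", "C")]
print(bcnf_violations("SJP", SJP))                    # []
print(bcnf_violations("STC", STC))                    # [('T', 'C')]
print(third_nf_violations("STC", STC, ["SC", "ST"]))  # []
```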
  • Slide 64
  • Normal Forms. Theorem: 1NF ⊃ 2NF ⊃ 3NF ⊃ BCNF. When asked which normal form a relation is in, give the highest normal form it satisfies.
  • Slide 65
  • Normal Forms. Relational database: emp(emp_id, emp_name, emp_phone, dept_name), F = { emp_id → emp_name, emp_phone, dept_name }, emp ∈ BCNF. dept(dept_name, dept_phone, dept_mgrname), F = { dept_name → dept_phone, dept_mgrname }, dept ∈ BCNF. skill(skill_id, skill_name), F = { skill_id → skill_name }, skill ∈ BCNF. emp_skill(emp_id, skill_id, skill_date, skill_lvl), F = { (emp_id, skill_id) → skill_date, skill_lvl }, emp_skill ∈ BCNF.
  • Slide 66
  • Normal Forms (4NF): Multivalued dependency. Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued dependency (MVD) α →→ β holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that: t1[α] = t2[α] = t3[α] = t4[α]; t3[β] = t1[β]; t3[R − β] = t2[R − β]; t4[β] = t2[β]; t4[R − β] = t1[R − β].
         X    Y    Z
    t1   x    y1   z1
    t2   x    y2   z2
    t3   x    y1   z2
    t4   x    y2   z1
  • Slide 67
  • Normal Forms. For example: WSC(W, S, C), with W: warehouse, S: safeguard, C: cargo. MVD: W →→ S, W →→ C.
    W    S    C
    w1   s1   c1
    w1   s1   c2
    w1   s1   c3
    w1   s2   c1
    w1   s2   c2
    w1   s2   c3
    w2   s3   c4
    w2   s3   c5
    w2   s4   c4
    w2   s4   c5
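The MVD definition can be checked on a concrete instance. A minimal sketch, assuming rows are tuples ordered like the attribute string "WSC" and that attribute names are single letters; `mvd_holds` is a hypothetical helper name. Only the tuple t3 is searched for explicitly, because t4 is found when the roles of t1 and t2 are swapped by the double loop.

```python
def mvd_holds(rows, attrs, alpha, beta):
    """Check alpha ->> beta on an instance; rows are tuples ordered like attrs."""
    idx = {a: i for i, a in enumerate(attrs)}
    rows = set(rows)
    for t1 in rows:
        for t2 in rows:
            if any(t1[idx[a]] != t2[idx[a]] for a in alpha):
                continue
            # t3 takes alpha and beta values from t1, everything else from t2
            t3 = tuple(t1[idx[a]] if (a in alpha or a in beta) else t2[idx[a]]
                       for a in attrs)
            if t3 not in rows:
                return False
    return True

wsc = {("w1", s, c) for s in ("s1", "s2") for c in ("c1", "c2", "c3")} | \
      {("w2", s, c) for s in ("s3", "s4") for c in ("c4", "c5")}
print(len(wsc))                                   # 10
print(mvd_holds(wsc, "WSC", "W", "S"))            # True
print(mvd_holds(wsc - {("w1", "s2", "c3")}, "WSC", "W", "S"))  # False
```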
  • Slide 68
  • Normal Forms. Consider a database classes(course, teacher, book) with MVD: course →→ teacher, course →→ book.
    classes
    course              teacher     book
    database            Avi         DB Concepts, Ullman
    database            Hank        DB Concepts, Ullman
    database            Sudarshan   DB Concepts, Ullman
    operating systems   Avi         OS Concepts, Stallings
    operating systems   Pete        OS Concepts, Stallings
  • Slide 69
  • Normal Forms. Consider a database classes(course, teacher, book). It is better to decompose classes into:
    teaches
    course              teacher
    database            Avi, Hank, Sudarshan
    operating systems   Avi, Jim
    text
    course              book
    database            DB Concepts, Ullman
    operating systems   OS Concepts, Shaw
  • Slide 70
  • Normal Forms -- 4NF. Fourth normal form (4NF): a relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if, for all multivalued dependencies in D+ of the form α →→ β, where α ⊆ R and β ⊆ R, at least one of the following holds: α →→ β is trivial (i.e., β ⊆ α or α ∪ β = R); α is a superkey for schema R. Here the closure D+ of D is the set of all functional and multivalued dependencies logically implied by D. If a relation is in 4NF, it is in BCNF.
  • Slide 71
  • Normal Forms -- 4NF. WSC(W, S, C): W →→ S, W →→ C. CTB(course, teacher, book): course →→ teacher, course →→ book. The formal definition above is supposed to formalize the notion that, given a particular value of X (course), it has associated with it a set of values of Y (teacher) and a set of values of Z (book), and these two sets are in some sense independent of each other. Which normal form is WSC in? Which normal form is CTB in?
  • Slide 72
  • Normal Forms -- 4NF. WSC(W, S, C): W →→ S, W →→ C has anomalies. Decomposition: WS(W, S) with W →→ S, and WC(W, C) with W →→ C. WS ∈ 4NF, WC ∈ 4NF.
    WSC: (w1, s1, c1), (w1, s1, c2), (w1, s1, c3), (w1, s2, c1), (w1, s2, c2), (w1, s2, c3), (w2, s3, c4), (w2, s3, c5), (w2, s4, c4), (w2, s4, c5)
    WS: (w1, s1), (w1, s2), (w2, s3), (w2, s4)
    WC: (w1, c1), (w1, c2), (w1, c3), (w2, c4), (w2, c5)
  • Slide 73
  • Decompositions. For relation R, a decomposition of R into k relations is ρ = { R1, R2, …, Rk } with two properties: (1) for each relation Ri, Ui is a proper subset of U; (2) U = U1 ∪ U2 ∪ … ∪ Uk. Given any specific instance r of R, the rows of r are projected onto the columns of each Ui as a result of the decomposition.
  • Slide 74
  • Lossless Decompositions. A decomposition of a relation R with an associated set F of FDs is said to be a lossless decomposition, or sometimes a lossless-join decomposition, if for any possible instance r of R it is guaranteed that r = Π_R1(r) ⋈ Π_R2(r) ⋈ … ⋈ Π_Rk(r). Example of a decomposition that is not lossless:
    ABC: (a1, 100, c1), (a2, 200, c2), (a3, 300, c3), (a4, 200, c4)
    AB: (a1, 100), (a2, 200), (a3, 300), (a4, 200)
    BC: (100, c1), (200, c2), (300, c3), (200, c4)
    AB JOIN BC: (a1, 100, c1), (a2, 200, c2), (a2, 200, c4), (a3, 300, c3), (a4, 200, c2), (a4, 200, c4)
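The example instance can be replayed in a few lines to see the spurious tuples appear. A minimal sketch using Python sets of tuples for the relation and its projections; the variable names are illustrative.

```python
# The instance r(A, B, C) from the slide
r = {("a1", 100, "c1"), ("a2", 200, "c2"), ("a3", 300, "c3"), ("a4", 200, "c4")}

# Projections onto (A, B) and (B, C)
ab = {(a, b) for a, b, c in r}
bc = {(b, c) for a, b, c in r}

# Natural join of the projections on the shared attribute B
joined = {(a, b, c) for a, b in ab for b2, c in bc if b == b2}

print(len(r), len(joined))   # 4 6
print(sorted(joined - r))    # [('a2', 200, 'c4'), ('a4', 200, 'c2')]
```

The join always contains the original tuples; the decomposition is lossy because it also contains tuples that were never in r.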
  • Slide 75
  • Lossless Decompositions. For the case of R = (R1, R2), we require that for all possible relations r on schema R, r = Π_R1(r) ⋈ Π_R2(r). Theorem: a decomposition of R into R1 and R2 is lossless-join if and only if at least one of the following dependencies is in F+: R1 ∩ R2 → R1; R1 ∩ R2 → R2.
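The theorem gives a direct test for binary decompositions: compute the closure of the common attributes and see whether it covers one of the two schemas. A minimal sketch, assuming single-letter attributes and using R = (A, B, C) with F = {A → B, B → C} as an illustration; `lossless_binary` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless_binary(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+."""
    r1, r2 = set(r1), set(r2)
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure

F = [("A", "B"), ("B", "C")]
print(lossless_binary("AB", "BC", F))   # True  (B -> BC)
print(lossless_binary("AB", "AC", F))   # True  (A -> AB)
print(lossless_binary("AB", "BC", []))  # False (no FD to justify the join)
```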
  • Slide 76
  • Dependency Preservation. Let Fi be the set of dependencies in F+ that include only attributes in Ri. A decomposition is dependency preserving if (F1 ∪ F2 ∪ … ∪ Fn)+ = F+. If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive.
  • Slide 77
  • Decompositions. Examples: R = (A, B, C), F = { A → B, B → C }. Decomposition 1: R1 = (A, B), R2 = (B, C). Is it lossless-join? Yes: R1 ∩ R2 = {B} and B → BC. Is it dependency preserving? Yes: (F1 ∪ F2)+ = { A → B, B → C }+ = F+. Decomposition 2: R1 = (A, B), R2 = (A, C). Is it lossless-join? Yes: R1 ∩ R2 = {A} and A → AB. Is it dependency preserving? No: (F1 ∪ F2)+ = { A → B, A → C }+ cannot imply B → C, so it is not dependency preserving.
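Dependency preservation can be tested without materializing F+. The sketch below uses a commonly taught procedure that is not shown on the slide: for each α → β in F, grow a result set from α by repeatedly adding (closure(result ∩ Ri) ∩ Ri) for every Ri, and check that β ends up inside. It assumes single-letter attributes; `preserves` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves(fds, schemas):
    """True if every FD in fds can be enforced using the schemas alone."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in schemas:
                ri = set(ri)
                # attributes of Ri determined by what we already have inside Ri
                gained = (closure(result & ri, fds) & ri) - result
                if gained:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True

F = [("A", "B"), ("B", "C")]
print(preserves(F, ["AB", "BC"]))  # True
print(preserves(F, ["AB", "AC"]))  # False (B -> C is lost)
```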
  • Slide 78
  • Goals of Normalization. Let R be a relation scheme with a set F of functional dependencies. Decide whether a relation scheme R is in good form. In the case that a relation scheme R is not in good form, decompose it into a set of relation schemes {R1, R2, ..., Rn} such that each relation scheme is in good form and the decomposition is a lossless-join decomposition. Preferably, the decomposition should also be dependency preserving.
  • Slide 79
  • BCNF Decomposition Algorithm. Example: R = (A, B, C), F = {A → B, B → C}, Key = {A}. R is not in BCNF (B → C holds but B is not a superkey). Decomposition: R1 = (B, C), R2 = (A, B). Algorithm:
    result := {R};
    done := false;
    compute F+;
    while (not done) do
      if (there is a schema Ri in result that is not in BCNF)
        then begin
          let α → β be a nontrivial functional dependency that holds on Ri
            such that α → Ri is not in F+, and α ∩ β = ∅;
          result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
        end
        else done := true;
    Note: each Ri is in BCNF, and the decomposition is lossless-join.
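The loop can be sketched in code for small schemas. This is a minimal sketch under stated assumptions, not the slide's exact procedure: it uses single-letter attributes, finds a violating α by brute force over subsets of Ri (exponential, fine for classroom examples), and always takes β as large as possible, namely (α+ ∩ Ri) − α. The helper names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(ri, fds):
    """Return (alpha, beta) for a nontrivial alpha -> beta on ri with alpha not a superkey."""
    for k in range(1, len(ri)):
        for alpha in combinations(sorted(ri), k):
            beta = (closure(alpha, fds) & ri) - set(alpha)
            if beta and set(alpha) | beta != ri:
                return set(alpha), beta
    return None

def bcnf_decompose(schema, fds):
    todo, done = [set(schema)], []
    while todo:
        ri = todo.pop()
        violation = find_violation(ri, fds)
        if violation is None:
            done.append(ri)                       # ri is already in BCNF
        else:
            alpha, beta = violation
            todo += [alpha | beta, ri - beta]     # (alpha, beta) and (Ri - beta)
    return done

result = bcnf_decompose("ABC", [("A", "B"), ("B", "C")])
print(sorted("".join(sorted(s)) for s in result))  # ['AB', 'BC']
```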
  • Slide 80
  • BCNF Decomposition Algorithm. Original relation R and functional dependencies F: R = (branch_name, branch_city, assets, customer_name, loan_number, amount), F = { branch_name → assets, branch_city; loan_number → amount, branch_name }, Key = { loan_number, customer_name }. Decomposition: for the FD branch_name → assets, branch_city, decompose into R1 = (branch_name, branch_city, assets) and R2 = (branch_name, customer_name, loan_number, amount). For the FD loan_number → amount, branch_name in R2, decompose into R21 = (branch_name, loan_number, amount) and R22 = (customer_name, loan_number). Final decomposition: R1, R21, R22.
  • Slide 81
  • BCNF Decomposition Algorithm. emp_info(emp_id, emp_name, emp_phone, dept_name, dept_phone, dept_mgrname, skill_id, skill_name, skill_date, skill_lvl), F = { emp_id → emp_name, emp_phone, dept_name; dept_name → dept_phone, dept_mgrname; skill_id → skill_name; (emp_id, skill_id) → skill_date, skill_lvl }, CK: (emp_id, skill_id). Decomposition to BCNF: for emp_id → emp_name, emp_phone, dept_name, dept_phone, dept_mgrname, we get R1 = (emp_id, emp_name, emp_phone, dept_name, dept_phone, dept_mgrname) with F1 = { emp_id → emp_name, emp_phone, dept_name; dept_name → dept_phone, dept_mgrname }, and R2 = (emp_id, skill_id, skill_name, skill_date, skill_lvl) with F2 = { skill_id → skill_name; (emp_id, skill_id) → skill_date, skill_lvl }.
  • Slide 82
  • E-R diagram vs. Normalization
    branch = (branch_name, branch_city, assets)
    customer = (customer_id, customer_name, customer_street, customer_city)
    loan = (loan_number, amount)
    account = (account_number, balance)
    employee = (employee_id, employee_name, telephone_number, start_date)
    dependent_name = (employee_id, dname)
    account_branch = (account_number, branch_name)
    loan_branch = (loan_number, branch_name)
    borrower = (customer_id, loan_number)
    depositor = (customer_id, account_number)
    cust_banker = (customer_id, employee_id, type)
    works_for = (worker_employee_id, manager_employee_id)
    payment = (loan_number, payment_number, payment_date, payment_amount)
    savings_account = (account_number, interest_rate)
    checking_account = (account_number, overdraft_amount)
  • Slide 83
  • E-R diagram vs. Normalization
    branch = (branch_name, branch_city, assets), FD = { branch_name → branch_city, assets }, branch ∈ BCNF
    customer = (customer_id, customer_name, customer_street, customer_city), FD = { customer_id → customer_name, customer_street, customer_city }, customer ∈ BCNF
    loan = (loan_number, amount, branch_name), FD = { loan_number → amount, branch_name }, loan ∈ BCNF
    account = (account_number, balance, branch_name), FD = { account_number → balance, branch_name }, account ∈ BCNF
  • Slide 84
  • E-R diagram vs. Normalization
    branch = (branch_name, branch_city, assets), FD = { branch_name → branch_city, assets }, branch ∈ BCNF
    employee = (employee_id, employee_name, telephone_number, start_date), FD = { employee_id → employee_name, telephone_number, start_date }, employee ∈ BCNF
    dependent_name = (employee_id, dname), MVD = { employee_id →→ dname }, dependent_name ∈ 4NF
    borrower = (customer_id, loan_number), FD = ∅, borrower ∈ BCNF
    depositor = (customer_id, account_number), FD = ∅, depositor ∈ BCNF
  • Slide 85
  • E-R diagram vs. Normalization
    cust_banker = (customer_id, employee_id, type), FD = { (customer_id, employee_id) → type }, cust_banker ∈ BCNF
    works_for = (worker_employee_id, manager_employee_id), FD = { worker_employee_id → manager_employee_id }, works_for ∈ BCNF
    payment = (loan_number, payment_number, payment_date, payment_amount)
    savings_account = (account_number, interest_rate)
    checking_account = (account_number, overdraft_amount)
  • Slide 86
  • E-R diagram vs. Normalization. Relation schemas derived from an E-R design are generally in 3NF!