DataBase System Haichang Gao, Software School, Xidian University
Major Content & Grade
Introduction *
The Relational Model ***
SQL ****
Transaction Management ***
Database Design (E-R) ***
Database Design (Normalization) ***
Slide 3
Part 2: Normalization
Introduction
Functional Dependencies
Normal Forms
Lossless Decompositions
Additional Design Considerations
Slide 4
Introduction
Normalization is another approach to the logical design of a relational database. The E-R approach and the normalization approach reinforce each other.
Normalization starts with a real-world situation to be modeled and lists the data items that are candidates to become column names in relational tables, together with a list of rules about how these data items are related.
The aim is to represent all these data items as attributes of tables that obey the restrictive conditions associated with what we call normal forms:
1NF --> 2NF --> 3NF --> BCNF --> 4NF --> 5NF
Slide 6
Design of the Bank Database
Suppose we combine borrower and loan to get
Schema: bor_loan = (customer_id, loan_number, amount)
The result is possible repetition of information, because borrower is an M:N relationship: a loan held by several customers repeats its amount in every row.
Slide 7
Design of the Bank Database
Consider combining loan_branch and loan.
Schema: loan_amt_br = (loan_number, amount, branch_name)
There is no repetition, because loan_branch is a 1:N relationship.
Slide 8
Design of the Bank Database
Example of a bad decomposition: after decomposing, we cannot reconstruct the original employee relation.
Slide 9
Design of the Bank Database
Combining loan_branch and loan into loan_amt_br = (loan_number, amount, branch_name) gives a good relation schema.
Combining borrower and loan into bor_loan = (customer_id, loan_number, amount) does NOT give a good relation schema.
How do we decide whether a particular relation R is in good form or not?
Suppose we had started with bor_loan. How would we know to split it up (decompose it) into borrower and loan?
Normalization theory provides the tools to answer these questions.
Slide 10
Employee Information: A Running Example
Each employee has from one up to a large number of skills useful to the company.
Slide 11
Employee Information: A Running Example
Slide 12
Anomalies of a Bad Database Design
Update Anomaly: A table T is subject to an update anomaly when changing a single attribute value for an entity instance or relationship instance represented in the table may require that several rows of T be updated.
Slide 13
Anomalies of a Bad Database Design
Delete Anomaly: A table T is subject to a delete anomaly when deleting some row of the table, to reflect the disappearance of some instance of an entity or relationship, can cause us to lose information about some instance of a different entity or relationship that we do not wish to forget.
Slide 14
Anomalies of a Bad Database Design
Insert Anomaly: A table T is subject to an insert anomaly when we cannot represent information about some entity or relationship instance without including information about some other instance of an entity or relationship that does not exist.
Slide 15
Anomalies of a Bad Database Design
Redundant Data: An entity instance or relationship instance represented in a table T may account for several rows of T.
Slide 16
Anomalies of a Bad Database Design
Remedy: normalize the relation, that is, decompose it.
Slide 17
Functional Dependencies
The functional dependency α → β holds on R if and only if, for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is,
t1[α] = t2[α]  ⇒  t1[β] = t2[β]
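The definition can be checked mechanically on a concrete instance. The sketch below is only an illustration: the helper name fd_holds and the sample bor_loan rows are assumptions, not part of the course material. Note that a single instance can refute a functional dependency but cannot prove that it holds for every legal relation.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds on this instance (a list of dicts)."""
    for t1, t2 in combinations(rows, 2):
        agree_lhs = all(t1[a] == t2[a] for a in lhs)
        agree_rhs = all(t1[a] == t2[a] for a in rhs)
        if agree_lhs and not agree_rhs:
            return False  # two tuples agree on lhs but differ on rhs
    return True

# Hypothetical instance of bor_loan: loan L-17 is shared by two customers.
bor_loan = [
    {"customer_id": "c1", "loan_number": "L-17", "amount": 1000},
    {"customer_id": "c2", "loan_number": "L-17", "amount": 1000},
    {"customer_id": "c1", "loan_number": "L-23", "amount": 2000},
]

print(fd_holds(bor_loan, ["loan_number"], ["amount"]))       # True
print(fd_holds(bor_loan, ["customer_id"], ["loan_number"]))  # False
```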
Slide 18
Functional Dependencies
In the emp_info table, we get:
emp_id → emp_name
emp_id → emp_phone ?
emp_id → dept_name ?
and emp_phone → emp_id
Slide 19
Functional Dependencies
Analyze the following tables (suppose they are valid). For each of T1, T2 and T3: does A → B hold? Does B → A hold?
Slide 20
Logical implications among functional dependencies
Inclusion Rule: Given a table T with a specified heading Head(T). If X and Y are sets of attributes contained in Head(T), and Y ⊆ X, then X → Y.
Proof. By definition, we need only demonstrate that if two rows u and v agree on X, they must agree on Y. But Y is a subset of X, so this is immediate.
Trivial Dependency: A trivial dependency is an FD of the form X → Y in a table T, where X ∪ Y ⊆ Head(T), that will hold for any possible content of the table T.
Given a trivial dependency X → Y in T, it must be the case that Y ⊆ X. E.g. A → A, AB → A.
Slide 21
Armstrong's Axioms (1974)
A1: Inclusion rule: if Y ⊆ X, then X → Y.
Proof: for any relation r of R and any two tuples t and s in r, if t[X] = s[X] then, because Y ⊆ X, t[Y] = s[Y]. Hence X → Y.
Example: (customer_name, loan_number) → customer_name; customer_name → customer_name.
Slide 22
Armstrong's Axioms (1974)
A2: Transitivity rule: if X → Y and Y → Z, then X → Z.
Proof: for any relation r of R and any two tuples t and s in r, if t[X] = s[X] then, by X → Y, t[Y] = s[Y], and then, by Y → Z, t[Z] = s[Z]. Hence X → Z.
Example: for relation S(sno, sname, sdept, dept_manager), sno → sdept and sdept → dept_manager, THEN sno → dept_manager.
Slide 23
Armstrong's Axioms (1974)
A3: Augmentation rule: if X → Y, then XZ → YZ.
Proof: for any relation r of R and any two tuples t and s in r, if t[XZ] = s[XZ] then t[X] = s[X] and t[Z] = s[Z]. By X → Y, t[Y] = s[Y], so t[YZ] = s[YZ]. Hence XZ → YZ.
Example: for relation S(sno, sname, sdept, dept_manager), sno → sdept, THEN (sno, sname) → (sdept, sname).
Slide 24
Some implications of Armstrong's Axioms
[1] Union rule: if X → Y and X → Z, then X → YZ.
Proof:
(1) X → Y (given)
(2) X → XY (augmentation of (1) with X)
(3) X → Z (given)
(4) XY → YZ (augmentation of (3) with Y)
(5) X → YZ (transitivity of (2) and (4))
So {X → Y, X → Z} implies X → YZ.
Example: S(sno, sname, sdept, dept_manager): sno → sname and sno → sdept, THEN sno → (sname, sdept).
Slide 25
Some implications of Armstrong's Axioms
[2] Decomposition rule: if X → YZ, then X → Y and X → Z.
Example: S(sno, sname, sdept, dept_manager): sno → (sname, sdept), THEN sno → sname and sno → sdept.
[3] Pseudotransitivity rule: if X → Y and WY → Z, then XW → Z.
[4] Set accumulation rule: if X → YZ and Z → W, then X → YZW.
Slide 26
Closure
The set of all functional dependencies logically implied by F is the closure of F, denoted by F+.
We can find all of F+ by applying Armstrong's Axioms:
if β ⊆ α, then α → β (reflexivity)
if α → β, then γα → γβ (augmentation)
if α → β and β → γ, then α → γ (transitivity)
Armstrong's Axioms are sound (they derive only valid dependencies) and complete (they derive all of F+).
Slide 27
Closure
Given R with U = {A, B, C} and F = {A → B, B → C}, the closure of F is:
F+ = { ∅ → ∅, A → ∅, A → A, ..., AB → A, (inclusion)
A → B, A → AB, AB → B, ..., ABC → BC, (augmentation)
B → C, ..., AB → AC, (augmentation)
A → C, ... } (transitivity)
Note: there are 43 non-duplicate FDs.
The closure of a set of functional dependencies includes all dependencies among the attributes of a relation. Drawback: it is too large to manage.
Slide 28
Closure
Algorithm to compute the closure of a set of functional dependencies F:
begin
  F+ = F
  repeat
    for each functional dependency f in F+
      apply the inclusion and augmentation rules on f
      add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
      if f1 and f2 can be combined using transitivity
        then add the resulting functional dependency to F+
  until F+ does not change any further
end
Slide 29
Closure of attributes
Given a set of attributes α, define the closure of α under F (denoted by α+) as the set of attributes that are functionally determined by α under F.
Algorithm to compute α+, the closure of α under F:
  result := α;
  while (changes to result) do
    for each β → γ in F do
      begin
        if β ⊆ result then result := result ∪ γ
      end
Slide 30
Closure of attributes
Example 1: Given R = (A, B, C, G, H, I) and F = {A → B, A → C, CG → H, CG → I, B → H}, (AG)+ = ?
1) result = AG
2) result = AGBC (A → B and A → C)
3) result = AGBCH (CG → H and CG ⊆ AGBC)
4) result = AGBCHI (CG → I and CG ⊆ AGBCH)
Example 2: Given R = (A, B, C, D, E) and F = {B → CD, AD → E, B → A}, (BC)+ = ?
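The attribute closure algorithm translates almost line for line into code. This is a minimal sketch, assuming single-letter attribute names so that a string such as "CG" can stand for the set {C, G}; the function name closure and the (lhs, rhs) pair representation are illustrative choices. It reproduces Example 1 and answers Example 2.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result |= set(rhs)
                changed = True
    return result

# Example 1: (AG)+
F1 = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print("".join(sorted(closure("AG", F1))))  # ABCGHI

# Example 2: (BC)+
F2 = [("B", "CD"), ("AD", "E"), ("B", "A")]
print("".join(sorted(closure("BC", F2))))  # ABCDE
```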
Slide 31
Closure of attributes
There are several uses of the attribute closure algorithm:
1) Testing for superkey: to test if α is a superkey, we compute α+ and check if α+ contains all attributes of R.
Example: for relation R with U = {A, B, C, D, E} and F = {AB → C, B → D, C → E, EC → B, AC → B}, is AB a superkey or not?
(AB)+ = {A, B, C, D, E} = U, so AB is a superkey.
Slide 32
Closure of attributes
There are several uses of the attribute closure algorithm:
2) Testing functional dependencies: to check if a functional dependency α → β holds (or, in other words, is in F+), just check if β ⊆ α+.
Theorem: α → β is implied by F if and only if β ⊆ α+.
Example: for relation R with U = {A, B, C, D, E} and F = {AB → C, B → D, C → E, EC → B, AC → B}:
Is BE → CD implied by F? (BE)+ = {B, E, D}, which does not include CD, so it is not implied.
Is AB → E implied by F?
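Both uses reduce to one closure computation. The sketch below assumes the example set reads F = {AB → C, B → D, C → E, EC → B, AC → B} (the arrows are missing in the source, so this reading is an assumption), and the helper names is_superkey and implies are illustrative.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, all_attrs, fds):
    """attrs is a superkey if its closure contains every attribute of R."""
    return closure(attrs, fds).issuperset(all_attrs)

def implies(fds, lhs, rhs):
    """lhs -> rhs is in F+ exactly when rhs is contained in the closure of lhs."""
    return set(rhs).issubset(closure(lhs, fds))

U = "ABCDE"
F = [("AB", "C"), ("B", "D"), ("C", "E"), ("EC", "B"), ("AC", "B")]

print(is_superkey("AB", U, F))  # True
print(implies(F, "BE", "CD"))   # False
print(implies(F, "AB", "E"))    # True
```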
Slide 33
Closure of attributes
There are several uses of the attribute closure algorithm:
3) Computing the closure of F: for each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+, we output a functional dependency γ → S.
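For a small schema this enumeration can be run directly. The sketch below applies it to U = {A, B, C}, F = {A → B, B → C}, counting FDs with empty sides as well (an assumption about how the figure of 43 is obtained); the helper names subsets and fd_closure are illustrative.

```python
from itertools import chain, combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result |= set(rhs)
                changed = True
    return result

def subsets(attrs):
    """All subsets of attrs as tuples, including the empty set."""
    items = sorted(attrs)
    return chain.from_iterable(combinations(items, n) for n in range(len(items) + 1))

def fd_closure(all_attrs, fds):
    """F+ as a set of (lhs, rhs) pairs of frozensets."""
    result = set()
    for gamma in subsets(all_attrs):
        for s in subsets(closure(gamma, fds)):
            result.add((frozenset(gamma), frozenset(s)))
    return result

F = [("A", "B"), ("B", "C")]
f_plus = fd_closure("ABC", F)
print(len(f_plus))  # 43
```

The count grows exponentially with the number of attributes, which is why F+ is rarely materialized in practice.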
Slide 34
Cover
FD Set Cover: A set F of FDs on a table T is said to cover another set G of FDs on T if the set G of FDs can be derived by implication rules from the set F, or in other words, if G ⊆ F+.
If F covers G and G covers F, then the two sets of FDs are said to be equivalent, and we write F ≡ G. If two sets of FDs are equivalent, they imply the same FDs.
Example: Consider the two sets of FDs on relation R(ABCDE): F = {B → CD, AD → E, B → A} and G = {B → CDE, B → ABC, AD → E}. Is F ≡ G or not?
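Equivalence can be tested without computing F+ or G+: F covers G when every FD of G follows from F by attribute closure. The sketch assumes G reads {B → CDE, B → ABC, AD → E}; the helper names covers and equivalent are illustrative.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result |= set(rhs)
                changed = True
    return result

def covers(f, g):
    """f covers g if every FD in g is implied by f."""
    return all(set(rhs).issubset(closure(lhs, f)) for lhs, rhs in g)

def equivalent(f, g):
    return covers(f, g) and covers(g, f)

F = [("B", "CD"), ("AD", "E"), ("B", "A")]
G = [("B", "CDE"), ("B", "ABC"), ("AD", "E")]
print(equivalent(F, G))  # True
```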
Slide 35
Database Systems
Slide 36
Cover
Sets of functional dependencies may have redundant dependencies that can be inferred from the others. For example, A → C is redundant in {A → B, B → C, A → C}.
Parts of a functional dependency may be redundant:
E.g. on RHS: {A → B, B → C, A → CD} can be simplified to {A → B, B → C, A → D}
E.g. on LHS: {A → B, B → C, AC → D} can be simplified to {A → B, B → C, A → D}
We need a minimal cover of F: a minimal set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies.
Slide 37
Minimal Cover
Step 1. Decompose the right-hand side of FDs: create an equivalent set H of FDs with only single attributes on the right side.
Step 2. Erase extraneous attributes on the LHS: for α → β in F, attribute A is extraneous in α if A ∈ α and F logically implies (F − {α → β}) ∪ {(α − A) → β}. Then replace α → β with (α − A) → β.
Step 3. Delete redundant FDs: for α → β in F, if (F − {α → β}) logically implies α → β, then delete α → β from F.
Slide 38
Minimal Cover
Example: for relation R with U = {A, B, C, D, E} and F = {A → BC, BCD → E, B → D, A → D, E → A}, compute the minimal cover of F.
1) F1 = {A → B, A → C, BCD → E, B → D, A → D, E → A}
2) For BCD → E: (BC)+ under F1 is BCDEA, which includes E, so D in the LHS of BCD → E is extraneous. F2 = {A → B, A → C, BC → E, B → D, A → D, E → A}
3) For A → D: (A)+ under F2 − {A → D} is ABCED, which includes D, so A → D is redundant.
Fmin = {A → B, A → C, BC → E, B → D, E → A}
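The three steps can be sketched in code as follows, assuming single-letter attribute names; the function name minimal_cover and the processing order (attributes and FDs tried in listed or sorted order) are choices made for this illustration. A set of FDs can have more than one minimal cover, so a different order may produce a different but equivalent result.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: a single attribute on every right-hand side.
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 2: drop extraneous left-hand-side attributes.
    for i, (lhs, a) in enumerate(g):
        for b in sorted(lhs):
            reduced = g[i][0] - {b}
            if reduced and a in closure(reduced, g):
                g[i] = (reduced, a)
    # Step 3: drop FDs that are implied by the remaining ones.
    result = list(dict.fromkeys(g))  # remove exact duplicates, keep order
    for fd in list(result):
        rest = [other for other in result if other != fd]
        if fd[1] in closure(fd[0], rest):
            result = rest
    return result

F = [("A", "BC"), ("BCD", "E"), ("B", "D"), ("A", "D"), ("E", "A")]
f_min = minimal_cover(F)
print(", ".join("".join(sorted(l)) + " -> " + r for l, r in f_min))
# A -> B, A -> C, BC -> E, B -> D, E -> A
```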
Slide 41
R = (A, B, C, D, E, F)
F = {A → BC, E → CF, B → E, CD → EF}
1. (AB)+ = ?
2. (AD)+ = ? Is AD → F implied by F?
Page 307: 7.6, 7.7
Slide 42
KEY
K is a superkey for relation schema R if and only if K → R.
K is a candidate key for R if and only if K → R, and for no α ⊂ K, α → R.
Prime attribute: an attribute that appears in some candidate key.
Non-prime attribute: an attribute that does NOT appear in any candidate key.
Slide 43
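5.3 Finding candidate keys
Given a relation schema R with attribute set U and FD set F, the attributes of R fall into four classes:
L class: attributes that appear only on the left-hand side of FDs in F
R class: attributes that appear only on the right-hand side of FDs in F
LR class: attributes that appear on both sides of FDs in F
N class: attributes that appear in no FD of F
L-class and N-class attributes belong to every candidate key, R-class attributes belong to no candidate key, and LR-class attributes may or may not belong to one.
Slide 44
5.3 Finding candidate keys
Algorithm, for R with attribute set U and FD set F:
(1) Classify the attributes of R into the L, R, LR and N classes. Let X be the set of L-class and N-class attributes and Y the set of LR-class attributes.
(2) If X+ = U, then X is the only candidate key of R and the algorithm ends; otherwise go to (3).
Example: R with U = {A, B, C, D, E}, F = {A → BC, CD → E, B → D, E → A}. Find the candidate keys of R.
(1) R has no L-class or N-class attributes; A, B, C, D, E are all LR class. So X = ∅ and Y = {A, B, C, D, E}.
(2) X+ ≠ U, so go to (3).
Slide 45
5.3 Finding candidate keys
(3) For each attribute A in Y: if (XA)+ = U, then XA is a candidate key and A is removed from Y (Y := Y − {A}). Then go to (4).
Example, continued:
(3) A+ = ABCDE = U, so A is a candidate key. The closures of B, C and D are not U. E+ = ABCDE = U, so E is a candidate key. Now Y = {B, C, D}.
Slide 46
5.3 Finding candidate keys
(4) For each set Z of two (then three, ...) attributes taken from Y: if XZ does not contain a candidate key already found, compute (XZ)+; if (XZ)+ = U, then XZ is a candidate key.
Example, continued:
(4) (BC)+ = BCDEA = U, so BC is a candidate key. (BD)+ ≠ U, so BD is not a candidate key. (CD)+ = CDEAB = U, so CD is a candidate key.
Slide 47
5.3 Finding candidate keys
Example, continued:
(5) BCD contains BC, so BCD is not a candidate key.
The candidate keys of R are A, E, BC and CD.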
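The result of the worked example can be cross-checked by brute force. The sketch below does not use the attribute classification from the slides; it simply tries attribute sets in order of increasing size and skips any set that contains a key already found. The function name candidate_keys is illustrative, and the approach is exponential, so it only suits small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    attrs = sorted(all_attrs)
    keys = []
    for n in range(1, len(attrs) + 1):
        for combo in combinations(attrs, n):
            x = set(combo)
            if any(k.issubset(x) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(x, fds).issuperset(attrs):
                keys.append(x)
    return keys

F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(["".join(sorted(k)) for k in candidate_keys("ABCDE", F)])
# ['A', 'E', 'BC', 'CD']
```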
Slide 48
Normal Forms -- 1NF
A relational schema R is in first normal form if the domains of all attributes of R are atomic.
NO composite attributes, such as: customer(customer-id, name(first-name, middle-initial, last-name), date-of-birth)
Each attribute is treated as a unit, even if it has several parts that carry individual information.
Example: Strings would normally be considered indivisible. In a student number such as 130711***, 13 is the department number, but you should not use it that way. Doing so is a bad idea: it leads to encoding of information in the application program rather than in the database.
Slide 49
Normal Forms -- 1NF
If a schema R is not in 1NF, then it is NOT a relational schema.
Being in 1NF is not good enough. For example, the relation
Employee(emp_id, emp_name, emp_phone, dept_name, dept_phone, dept_mgrname, skill_id, skill_name, skill_date, skill_lvl)
is in 1NF, but it has the insert anomaly, delete anomaly, update anomaly and data redundancy.
Slide 50
Normal Forms -- 2NF
Second normal form (2NF): A relation schema R with FD set F is said to be in 2NF if, for any functional dependency X → A implied by F that lies in R, where A is a single attribute that is not in X and is non-prime, X is not a proper subset of any key K of R.
In other words, in 2NF NO non-prime attribute is partially dependent on a candidate key.
Example: R(A, B, C, D), F = {AB → C, AC → BD}. Candidate keys: AB, AC. The only non-prime attribute is D, and AB → D and AC → D are FULL dependencies, so R is in 2NF.
Slide 51
Normal Forms -- 2NF
For example: is the relation schema emp_info in 2NF?
What are the candidate keys? What are the non-prime attributes?
Test all FDs according to the definition of the normal form.
emp_info is not in 2NF.
Slide 54
Normal Forms -- 2NF
For relation bor_loan(customer_id, loan_number, amount) with F = {loan_number → amount}, the candidate key is (customer_id, loan_number).
The non-prime attribute amount depends on only part of the key, so bor_loan is NOT in 2NF.
The reason is that borrower is an M:N relationship: merging an M:N relationship with an entity it associates produces a non-2NF relation schema.
Slide 55
Normal Forms -- 2NF
Being in 2NF is not good enough. For example, the relation
emp(emp_id, emp_name, emp_phone, dept_name, dept_phone, dept_mgrname)
is in 2NF, but it has the insert anomaly, delete anomaly, update anomaly and data redundancy.
Slide 56
Normal Forms -- 3NF
A relation schema R is in third normal form (3NF) if for all α → β in F+ at least one of the following holds:
α → β is trivial (i.e., β ⊆ α) (such FDs do not exist in a canonical cover)
α is a superkey for R
Each attribute A in β − α is contained in a candidate key for R (or, for a canonical cover, A in β is a prime attribute)
For example: SJP(S, J, P) with FDs (S, J) → P and (J, P) → S. CK: (S, J), (J, P). The LHS of each FD is a superkey, so SJP is in 3NF.
Slide 57
Normal Forms -- 3NF
Another definition: A relation R is in 3NF if there are no non-prime attributes that are transitively dependent on a key for R.
For example: loan_b(loan_number, branch_name, branch_city, assets), F = {loan_number → branch_name, branch_name → (branch_city, assets)}
Since loan_number → branch_name and branch_name → branch_city, the non-prime attribute branch_city is transitively dependent on the candidate key loan_number, so loan_b is NOT in 3NF.
Slide 58
Normal Forms -- 3NF
The two definitions are equivalent:
Definition 1: A relation schema R is in third normal form (3NF) if for all α → β in F+ at least one of the following holds:
α → β is trivial (i.e., β ⊆ α) (such FDs do not exist in a canonical cover)
α is a superkey for R
Each attribute A in β − α is contained in a candidate key for R (or, for a canonical cover, A in β is a prime attribute)
Definition 2: A relation R is in 3NF if there are no non-prime attributes that are transitively dependent on a key for R.
Slide 59
Normal Forms -- 3NF
For example: emp(emp_id, emp_name, emp_phone, dept_name, dept_phone, dept_mgrname) is in 2NF, with
F = {emp_id → (emp_name, emp_phone, dept_name), dept_name → (dept_phone, dept_mgrname)}
By the first definition: in dept_name → dept_phone, dept_name is NOT a superkey and dept_phone is NOT in any candidate key, so emp is NOT in 3NF.
By the second definition: the non-prime attribute dept_phone is transitively dependent on the candidate key emp_id, so emp is NOT in 3NF.
Slide 60
Normal Forms -- 3NF
For example: emp(emp_id, emp_name, emp_phone, dept_name, dept_phone, dept_mgrname) with
F = {emp_id → (emp_name, emp_phone, dept_name), dept_name → (dept_phone, dept_mgrname)}
is NOT in 3NF.
Decomposition:
emp(emp_id, emp_name, emp_phone, dept_name), F = {emp_id → (emp_name, emp_phone, dept_name)}: emp is in 3NF.
dept(dept_name, dept_phone, dept_mgrname), F = {dept_name → (dept_phone, dept_mgrname)}: dept is in 3NF.
Slide 61
Normal Forms -- 3NF
Being in 3NF is not good enough. For example, the relation STC(S, T, C), with S: Student, T: Teacher, C: Course, has
F = {(S, C) → T, (S, T) → C, T → C}
There is no non-prime attribute, so STC is in 3NF. Checked against the first definition: in the first two FDs the LHS is a superkey, and in T → C the attribute C is prime.
STC still has the insert anomaly, delete anomaly, update anomaly and data redundancy.
Slide 62
Normal Forms -- BCNF
A relation schema R is in BCNF (Boyce-Codd Normal Form) with respect to a set F of functional dependencies if, for all functional dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:
α → β is trivial (i.e., β ⊆ α)
α is a superkey for R
For example: bor_loan(customer_id, loan_number, amount), F = {loan_number → amount}
bor_loan is not in BCNF, because loan_number is not a superkey. In fact bor_loan is not even in 2NF; it is only in 1NF.
Slide 63
Normal Forms -- BCNF
Example 1: SJP(S, J, P) with FDs (S, J) → P and (J, P) → S. CK: (S, J), (J, P). The LHS of each FD is a superkey, so SJP is in BCNF.
Example 2: STC(S, T, C), F = {(S, C) → T, (S, T) → C, T → C}. There is no non-prime attribute, so STC is in 3NF. For T → C, T is not a superkey, so STC is NOT in BCNF.
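The two examples can be verified with a short test of each definition. This is a sketch under two assumptions: attribute names are single letters, and it is enough to test the FDs listed in F rather than all of F+ (which holds when the whole relation R is tested, but not for a schema produced by decomposition). The function names is_3nf and is_bcnf are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Brute-force search for minimal keys, smallest sets first."""
    attrs = sorted(all_attrs)
    keys = []
    for n in range(1, len(attrs) + 1):
        for combo in combinations(attrs, n):
            x = set(combo)
            if any(k.issubset(x) for k in keys):
                continue
            if closure(x, fds).issuperset(attrs):
                keys.append(x)
    return keys

def is_bcnf(all_attrs, fds):
    """Every nontrivial FD must have a superkey on its left-hand side."""
    for lhs, rhs in fds:
        if set(rhs).issubset(lhs):
            continue  # trivial FD
        if not closure(lhs, fds).issuperset(all_attrs):
            return False
    return True

def is_3nf(all_attrs, fds):
    """Like BCNF, but a violation is forgiven if the new attributes are prime."""
    prime = set().union(*candidate_keys(all_attrs, fds))
    for lhs, rhs in fds:
        if set(rhs).issubset(lhs):
            continue  # trivial FD
        if closure(lhs, fds).issuperset(all_attrs):
            continue  # left-hand side is a superkey
        if not (set(rhs) - set(lhs)).issubset(prime):
            return False
    return True

SJP = [("SJ", "P"), ("JP", "S")]
STC = [("SC", "T"), ("ST", "C"), ("T", "C")]
print(is_bcnf("SJP", SJP))                      # True
print(is_3nf("STC", STC), is_bcnf("STC", STC))  # True False
```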
Slide 64
Normal Forms
Theorem: 1NF ⊃ 2NF ⊃ 3NF ⊃ BCNF (every relation in BCNF is in 3NF, every relation in 3NF is in 2NF, and every relation in 2NF is in 1NF).
When asked which normal form a relation is in, give the highest normal form it satisfies.
Slide 65
Normal Forms
Relational database:
emp(emp_id, emp_name, emp_phone, dept_name), F = {emp_id → (emp_name, emp_phone, dept_name)}: emp is in BCNF.
dept(dept_name, dept_phone, dept_mgrname), F = {dept_name → (dept_phone, dept_mgrname)}: dept is in BCNF.
skill(skill_id, skill_name), F = {skill_id → skill_name}: skill is in BCNF.
emp_skill(emp_id, skill_id, skill_date, skill_lvl), F = {(emp_id, skill_id) → (skill_date, skill_lvl)}: emp_skill is in BCNF.
Slide 66
Normal Forms (4NF)
Multivalued dependency: Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued dependency (MVD) α →→ β holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R − β] = t2[R − β]
t4[β] = t2[β]
t4[R − β] = t1[R − β]

      X   Y    Z
t1    x   y1   z1
t2    x   y2   z2
t3    x   y1   z2
t4    x   y2   z1
Slide 67
Normal Forms
For example: WSC(W, S, C), with W: warehouse, S: safeguard, C: cargo
MVD: W →→ S, W →→ C

W    S    C
w1   s1   c1
w1   s1   c2
w1   s1   c3
w1   s2   c1
w1   s2   c2
w1   s2   c3
w2   s3   c4
w2   s3   c5
w2   s4   c4
w2   s4   c5
Slide 68
Normal Forms
Consider a database classes(course, teacher, book)
MVD: course →→ teacher, course →→ book

classes
course              teacher     book
database            Avi         DB Concepts
database            Avi         Ullman
database            Hank        DB Concepts
database            Hank        Ullman
database            Sudarshan   DB Concepts
database            Sudarshan   Ullman
operating systems   Avi         OS Concepts
operating systems   Avi         Stallings
operating systems   Pete        OS Concepts
operating systems   Pete        Stallings
Slide 69
Normal Forms
Consider a database classes(course, teacher, book). It is better to decompose classes into:

teaches
course              teacher
database            Avi
database            Hank
database            Sudarshan
operating systems   Avi
operating systems   Pete

text
course              book
database            DB Concepts
database            Ullman
operating systems   OS Concepts
operating systems   Stallings
Slide 70
Normal Forms -- 4NF
Fourth normal form (4NF): A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if, for all multivalued dependencies in D+ of the form α →→ β, where α ⊆ R and β ⊆ R, at least one of the following holds:
α →→ β is trivial (i.e., β ⊆ α or α ∪ β = R)
α is a superkey for schema R
Here the closure D+ of D is the set of all functional and multivalued dependencies logically implied by D.
If a relation is in 4NF, it is in BCNF.
Slide 71
Normal Forms -- 4NF
WSC(W, S, C): W →→ S, W →→ C
CTB(course, teacher, book): course →→ teacher, course →→ book
The formal definition above is meant to formalize the notion that a particular value of X (course) has associated with it a set of values of Y (teacher) and a set of values of Z (book), and that these two sets are in some sense independent of each other.
Which normal form is WSC in? Which normal form is CTB in?
Slide 72
Normal Forms -- 4NF
WSC(W, S, C): W →→ S, W →→ C
Because WSC has anomalies, decompose it into WS(W, S) with W →→ S and WC(W, C) with W →→ C.

WSC
W    S    C
w1   s1   c1
w1   s1   c2
w1   s1   c3
w1   s2   c1
w1   s2   c2
w1   s2   c3
w2   s3   c4
w2   s3   c5
w2   s4   c4
w2   s4   c5

WS
W    S
w1   s1
w1   s2
w2   s3
w2   s4

WC
W    C
w1   c1
w1   c2
w1   c3
w2   c4
w2   c5

WS is in 4NF. WC is in 4NF.
Slide 73
Decompositions
For a relation R, a decomposition of R into k relations is ρ = {R1, R2, ..., Rk} with two properties:
(1) For each relation Ri, Ui is a proper subset of U;
(2) U = U1 ∪ U2 ∪ ... ∪ Uk.
Given any specific instance r of R, the rows of r are projected onto the columns of each Ui as a result of the decomposition.
Slide 74
Lossless Decompositions
A decomposition ρ of a relation R with an associated set F of FDs is said to be a lossless decomposition, or sometimes a lossless-join decomposition, if, for any possible instance r of R, it is guaranteed that
r = r1 ⋈ r2 ⋈ ... ⋈ rk
where ri is the projection of r onto Ri.
Example: decompose ABC into AB and BC.

ABC
A    B     C
a1   100   c1
a2   200   c2
a3   300   c3
a4   200   c4

AB
A    B
a1   100
a2   200
a3   300
a4   200

BC
B     C
100   c1
200   c2
300   c3
200   c4

AB JOIN BC
A    B     C
a1   100   c1
a2   200   c2
a2   200   c4
a3   300   c3
a4   200   c2
a4   200   c4

AB JOIN BC ≠ ABC: the join contains the rows (a2, 200, c4) and (a4, 200, c2), which are not in ABC.
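The example can be replayed in a few lines: project the instance onto AB and BC, join the projections on B, and compare with the original. The tuple representation and variable names are illustrative.

```python
# The instance of ABC from the example, one tuple per row.
r = [("a1", 100, "c1"), ("a2", 200, "c2"), ("a3", 300, "c3"), ("a4", 200, "c4")]

# Projections onto AB and BC (sets remove duplicate rows).
ab = {(a, b) for a, b, c in r}
bc = {(b, c) for a, b, c in r}

# Natural join of AB and BC on the shared attribute B.
joined = {(a, b, c) for (a, b) in ab for (b2, c) in bc if b == b2}

print(len(joined))               # 6
print(sorted(joined - set(r)))   # [('a2', 200, 'c4'), ('a4', 200, 'c2')]
```

The two extra rows appear because B = 200 does not determine either A or C, so the join cannot tell which A value belongs with which C value.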
Slide 75
Lossless Decompositions
For the case of R = (R1, R2), we require that for all possible relations r on schema R
r = ΠR1(r) ⋈ ΠR2(r)
Theorem: A decomposition of R into R1 and R2 is lossless join if and only if at least one of the following dependencies is in F+:
R1 ∩ R2 → R1
R1 ∩ R2 → R2
Slide 76
Dependency Preservation
Let Fi be the set of dependencies in F+ that include only attributes in Ri.
A decomposition is dependency preserving if (F1 ∪ F2 ∪ ... ∪ Fn)+ = F+.
If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive.
Slide 77
Decompositions
Examples: R = (A, B, C), F = {A → B, B → C}
Decomposition 1: R1 = (A, B), R2 = (B, C)
Is it lossless-join? Yes: R1 ∩ R2 = {B} and B → BC.
Is it dependency preserving? Yes: (F1 ∪ F2)+ = {A → B, B → C}+ = F+.
Decomposition 2: R1 = (A, B), R2 = (A, C)
Is it lossless-join? Yes: R1 ∩ R2 = {A} and A → AB.
Is it dependency preserving? No: (F1 ∪ F2)+ = {A → B, A → C}+ cannot imply B → C.
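Both questions can be answered with attribute closures. In the sketch below, is_lossless applies the two-relation theorem, and preserves uses a standard closure-based test that avoids computing F+ (for each FD, grow the left-hand side using only what each Ri can see). The function names are illustrative and attribute names are assumed to be single letters.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary test: R1 ∩ R2 must determine all of R1 or all of R2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1).issubset(common_closure) or set(r2).issubset(common_closure)

def preserves(fds, schemas):
    """True if every FD in fds can be checked within the decomposed schemas."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in schemas:
                gained = closure(result & set(ri), fds) & set(ri)
                if not gained.issubset(result):
                    result |= gained
                    changed = True
        if not set(rhs).issubset(result):
            return False
    return True

F = [("A", "B"), ("B", "C")]
print(is_lossless("AB", "BC", F), preserves(F, ["AB", "BC"]))  # True True
print(is_lossless("AB", "AC", F), preserves(F, ["AB", "AC"]))  # True False
```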
Slide 78
Goals of Normalization
Let R be a relation scheme with a set F of functional dependencies.
Decide whether a relation scheme R is in good form.
In the case that a relation scheme R is not in good form, decompose it into a set of relation schemes {R1, R2, ..., Rn} such that
each relation scheme is in good form, and
the decomposition is a lossless-join decomposition.
Preferably, the decomposition should be dependency preserving.
Slide 79
BCNF Decomposition Algorithm
  result := {R};
  done := false;
  compute F+;
  while (not done) do
    if (there is a schema Ri in result that is not in BCNF)
    then begin
      let α → β be a nontrivial functional dependency that holds on Ri such that α → Ri is not in F+, and α ∩ β = ∅;
      result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
    end
    else done := true;
Note: each Ri is in BCNF, and the decomposition is lossless-join.
Example: R = (A, B, C), F = {A → B, B → C}, Key = {A}
R is not in BCNF (B → C but B is not a superkey).
Decomposition: R1 = (B, C), R2 = (A, B)
Slide 80
BCNF Decomposition Algorithm
Original relation R and functional dependencies F:
R = (branch_name, branch_city, assets, customer_name, loan_number, amount)
F = {branch_name → (assets, branch_city), loan_number → (amount, branch_name)}
Key = {loan_number, customer_name}
Decomposition:
For FD branch_name → (assets, branch_city):
R1 = (branch_name, branch_city, assets)
R2 = (branch_name, customer_name, loan_number, amount)
For the FD loan_number → (amount, branch_name) in R2:
R21 = (branch_name, loan_number, amount)
R22 = (customer_name, loan_number)
Final decomposition: R1, R21, R22
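The decomposition above can be reproduced with a closure-based variant of the algorithm. This is a sketch, not the exact procedure from the slides: instead of computing F+, it searches each schema Ri for an attribute set α whose closure (restricted to Ri) adds attributes without covering Ri, and splits on α → (α+ ∩ Ri − α). The function names and the search order are illustrative, and the subset search is exponential in the size of Ri.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of name lists."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result |= set(rhs)
                changed = True
    return result

def find_violation(ri, fds):
    """Return (alpha, beta) for an FD on ri that violates BCNF, or None."""
    attrs = sorted(ri)
    for n in range(1, len(attrs)):
        for combo in combinations(attrs, n):
            alpha = set(combo)
            determined = closure(alpha, fds) & ri
            beta = determined - alpha
            if beta and determined != ri:  # nontrivial, and alpha is not a superkey of ri
                return alpha, beta
    return None

def bcnf_decompose(all_attrs, fds):
    result = [frozenset(all_attrs)]
    done = False
    while not done:
        done = True
        for ri in result:
            violation = find_violation(ri, fds)
            if violation:
                alpha, beta = violation
                result.remove(ri)
                result += [frozenset(alpha | beta), frozenset(ri - beta)]
                done = False
                break  # restart the scan on the updated result
    return result

R = ["branch_name", "branch_city", "assets", "customer_name", "loan_number", "amount"]
F = [
    (["branch_name"], ["assets", "branch_city"]),
    (["loan_number"], ["amount", "branch_name"]),
]
for schema in bcnf_decompose(R, F):
    print(sorted(schema))
```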