Logical Database Design (3 of 3) John Ortiz. Lecture 7Logical Database Design (2)2 Normalization If...
-
Upload
tamara-towsley -
Category
Documents
-
view
219 -
download
1
Transcript of Logical Database Design (3 of 3) John Ortiz. Lecture 7Logical Database Design (2)2 Normalization If...
Lecture 7 Logical Database Design (2) 2
Normalization If a relation is not in BCNF or 3NF, we
refine it by decomposing it into two or more smaller relation schemas that are in the normal form.
Decomposition has to be used carefully, since there are potential problems.
What are desirable properties of a decomposition, and how to test them?
How to obtain a decomposition with some desirable properties?
Lecture 7 Logical Database Design (2) 3
Decomposition of a Relation Let R be a relation schema. A
decomposition of R, demoted by D = {R1, R2, ..., Rn}, is a set of relation schemas such that R = R1 ... Rn.
If {R1, R2, ..., Rn} is a decomposition of R and r is an instance of R, then
r R1(r) R2(r) . . . Rn(r)
Information may be lost (i.e. wrong tuples may be added by the natural join) due to a decomposition.
Lecture 7 Logical Database Design (2) 4
An Example of Information Loss Before
After
SID Name Cno Grade Room 101 Bill 1000 B 326 202 Mary 2000 A 326
SC
SR SG SID Name Cno Grade Room 101 Bill 1000 B 326 202 Mary 2000 A 326 101 Mary 1000 B 326 202 Bill 2000 A 326
SID Cno Grade Room 101 1000 B 326 202 2000 A 326
SGName Room Bill 326 Mary 326
SR
Lecture 7 Logical Database Design (2) 5
Lossless Join Decomposition Let R be a relation schema, and D = {R1,
R2, ..., Rn} be a decomposition of R. D is a lossless (non-additive) join decomposition of R if for every legal instance r of R, we have
r = R1(r) R2(r) . . . Rn(r) Theorem: Let F be a set of FDs over R, and
D = {R1, R2} be a decomposition of R. D is a lossless-join decomposition if and only if R1 R2 R1 - R2 is in F+; or R1 R2 R2 - R1 is in F+.
Lecture 7 Logical Database Design (2) 6
Lossless Join: An Example Consider F = {B AH, L CAt} over Bank-Loans(Bank, Assets, Headquarter,
Loan#, Customer, Amount).
Let D = {Banks(B,A,H), Loans(B,L,C,At)}. Since Banks Loans = B AH = Banks -
Loans is in F+ (since it is already in F), D is a lossless-join decomposition.
What if the decomposition contains more than two relations.
Lecture 7 Logical Database Design (2) 7
Test for Lossless Join *Algorithm TestLJ (Chase)Input: A relation schema R(A1, …, Am), a set of
FDs F, and a decomposition D = {R1, …, Rn}.
Output: Yes, if D is a Lossless join; no, otherwise.
Method:1. Create an n m table T (labeled by Ai and
Rj).
2. If Ri contains Aj, place aj at Ti,j. Otherwise,
place bij at Ti,j.
Lecture 7 Logical Database Design (2) 8
TestLJ (cont.) *3. Repeat for each FD X Y in F do For all rows with identical symbols on X
do make the symbols on Y identical. (choose aj over bij whenever possible)
Until no more change can be made.4. Return yes if there is a row of aj’s.
Otherwise, return no.
Lecture 7 Logical Database Design (2) 9
TestLJ: An ExampleContinue with the previous example. Set up the table T.
Enforce B AH.
B A H L C At BAH a1 a2 a3 b14 b15 b16
BLCAt a1 b22 b23 a4 a5 a6
B A H L C At BAH a1 a2 a3 b14 b15 b16
BLCAt a1 a2 a3 a4 a5 a6
Need to repeat until no more changes.
Lecture 7 Logical Database Design (2) 10
Dependency-Preserving Decomposition
Let F be a set of FDs over R, and D = {R1, R2, ..., Rn} be a decomposition of R. D is a dependency-preserving decomposition if
F+ = (R1(F) R2(F) . . . Rn(F))+
where for i = 1, …, nRi(F) = { X Y | X Y F and XY Ri}.
Restrict FDs to local relations. If all “global” FDs can be derived from “local” FDs, all dependencies are preserved.
Lecture 7 Logical Database Design (2) 11
Dependency Preservation: An Example
Consider F = {CS Z, Z C} over R(City, Street, Zipcode), and D ={R1(S, Z), R2(C, Z)}.
Then R1(F) = {} and R2(F) = {Z C} (consider non-trivial FDs only)
Since CS Z F+ but
CS Z (R1(F) R2(F))+,
D is not dependency-preserving.
Lecture 7 Logical Database Design (2) 12
Test for Dependency Preservation Algorithm TestDPInput: A relation schema R, A set of FDs F
over R, a decomposition D = {R1, R2, ..., Rn} of R.
Output: Yes, if D is dependency-preserving; no, otherwise.
Method: for every X Y F if Ri such that XY Ri
then X Y is preserved;
Lecture 7 Logical Database Design (2) 13
TestDP (cont.)
else W := X; repeat for i from 1 to n do
W := W ((W Ri)+ Ri);
until there is no change to W; if Y W then X Y is preserved; if every X Y is preserved then return yes; else return no. Derive global FDs using only local FDs.
Lecture 7 Logical Database Design (2) 14
TestDP: An example Consider F = {A B, B C, C D, D A }
over R(A, B, C, D), & D = {R1(A,B), R2(B,C), R3(C,D)}.
Is D a dependency-preserving decomposition? Since AB R1, A B is preserved. Since BC R2, B C is preserved. Since CD R3, C D is preserved.Since DA is not in any one of the three
relations, we need to compute W.
Lecture 7 Logical Database Design (2) 15
TestDP: An example (cont.) *
Initialization: W = D; first iteration: W = D ((D AB)+ AB) = D; W = D ((D BC)+ BC) = D; W = D ((D CD)+ CD) = D (D+ CD) = D (ABCD CD) = CD; W changed from D to CD.
Lecture 7 Logical Database Design (2) 16
TestDP: An example (cont.) *
second iteration:
W = CD ((CD AB)+ AB) = CD;
W = CD ((CD BC)+ BC)
= CD (C+ BC) = BCD;
W = BCD ((BCD CD)+ CD) = BCD; W changed from CD to BCD.
Lecture 7 Logical Database Design (2) 18
Normalization
It is good to have BCNF relation schemas. If a relation schema is not in BCNF, then
decompose it into a set of relation schemas: every new schema is in BCNF; it is lossless-join (can guarantee); it is dependency-preserving (no
guarantee). If not possible to have all nice properties,
be happy with a lossless join, dependency preserving 3NF decomposition (can guarantee)
Lecture 7 Logical Database Design (2) 19
Normalization to BCNF
Algorithm LLJD-BCNFInput: R: A relation schema F: A set of FDs satisfied by R.Output: A lossless-join decomposition D =
{R1, …, Rn}, such that each Ri is in BCNF.
Lecture 7 Logical Database Design (2) 20
Normalization to BCNF (cont.)
Method: D := {R}; while Ri D that is not in BCNF do
begin Find an FD X Y such that (1) Ri is not BCNF because of X Y, and
(2) XY Ri;
D := D - Ri {Ri - Y, XY}
end;
Lecture 7 Logical Database Design (2) 21
Normalization to BCNF (cont.) *
Theorem: Algorithm LLJD-BCNF is correct.Proof (sketch): Every schema in D is in BCNF because the
algorithm will not stop otherwise. D is a lossless-join decomposition because in
each iteration, Ri is decomposed into 2 smaller schemas (Ri - Y) and XY and they satisfy the condition:
(Ri - Y) XY = X Y = (XY - (Ri - Y)).
Lecture 7 Logical Database Design (2) 22
Normalization to BCNF: An Example
Consider F = {B AH, L CAt} over Bank-Loans(Bank, Assets, Headquarter,
Loan#, Customer, Amount), and a set of FDs,
Candidate keys: LBInitialization: D = {BAHLCAt }
Lecture 7 Logical Database Design (2) 23
Normalize to BCNF: An Example * 1st iteration: Ri = BAHLCAt is not in BCNF because B AH
is not a trivial FD and B is not a superkey. Replace BAHLCAt by BAH and BLCAt. Hence: D = {BAH, BLCAt}. BAH is in BCNF because in BAH, B is a
candidate key.
Lecture 7 Logical Database Design (2) 24
Normalize to BCNF: An Example *
2nd iteration: Ri = BLCAt is not in BCNF because L CAt is not a
trivial FD and L is not a superkey in BLCAt. Replace BLCAt by CLAt and BL.
Hence, D = {BAH, CLAt, BL}. CLAt is in BCNF because in CLAt, L is a candidate
key. BL is in BCNF (see theorem on next page).Final result: D = {BAH, CLAt, BL}. D happens to be dependency-preserving.
Any relation schema with exactly two attributes is in BCNF.
Lecture 7 Logical Database Design (2) 25
Normalize to BCNF: Another Ex. *
Consider R(City, Street, Zipcode), and F = {CS Z, Z C }. Candidate keys: CS, ZS.Initialization: D = {CSZ};1st iteration: R = CSZ is not in BCNF because Z C is
not a trivial FD and Z is not a superkey. D = {ZC, ZS}.D is not dependency-preserving because CS
Z is not preserved.
Lecture 7 Logical Database Design (2) 26
Equivalence of FD Sets Let F and G be two sets of FDs satisfied by
R. F and G are equivalent, denoted by F G, if F+ = G+.
Example: F = {B CD, AD E, B A} and G = {B CDE, B ABC, AD E} are equivalent.
Check to see that every FD in F is also in G+ and that every FD in G is also in F+
Lecture 7 Logical Database Design (2) 27
Extraneous Attributes Let F be a set of FDs. F contains an extraneous attribute A if
there is an FD X Y in F, such that either A X, and [F - {X Y} {X - {A} Y}] F; or A Y, and
[F - {X Y} {X Y - {A} }] F.
This is a “useless” attribute either at the left side or at the right side of an FD.
Lecture 7 Logical Database Design (2) 28
Summary
A good schema should have three properties: BCNF (or 3NF if BCNF can not be
obtained) Lossless join Dependency preserving
Lossless join BCNF decomposition is guaranteed, need to check for dependency preservation
Lossless join, dependency preserving 3NF decomposition is guaranteed (need to find the minimal cover)