Logical Database Design (3 of 3) John Ortiz. Lecture 7Logical Database Design (2)2 Normalization If...

Logical Database Design (3 of 3)

John Ortiz

Lecture 7 Logical Database Design (2) 2

Normalization If a relation is not in BCNF or 3NF, we

refine it by decomposing it into two or more smaller relation schemas that are in the normal form.

Decomposition has to be used carefully, since there are potential problems.

What are desirable properties of a decomposition, and how to test them?

How to obtain a decomposition with some desirable properties?


Decomposition of a Relation Let R be a relation schema. A

decomposition of R, demoted by D = {R1, R2, ..., Rn}, is a set of relation schemas such that R = R1 ... Rn.

If {R1, R2, ..., Rn} is a decomposition of R and r is an instance of R, then

r R1(r) R2(r) . . . Rn(r)

Information may be lost (i.e. wrong tuples may be added by the natural join) due to a decomposition.


An Example of Information Loss Before

After

SID Name Cno Grade Room 101 Bill 1000 B 326 202 Mary 2000 A 326

SC

SR SG SID Name Cno Grade Room 101 Bill 1000 B 326 202 Mary 2000 A 326 101 Mary 1000 B 326 202 Bill 2000 A 326

SID Cno Grade Room 101 1000 B 326 202 2000 A 326

SGName Room Bill 326 Mary 326

SR


Lossless Join Decomposition Let R be a relation schema, and D = {R1,

R2, ..., Rn} be a decomposition of R. D is a lossless (non-additive) join decomposition of R if for every legal instance r of R, we have

r = R1(r) R2(r) . . . Rn(r) Theorem: Let F be a set of FDs over R, and

D = {R1, R2} be a decomposition of R. D is a lossless-join decomposition if and only if R1 R2 R1 - R2 is in F+; or R1 R2 R2 - R1 is in F+.


Lossless Join: An Example Consider F = {B AH, L CAt} over Bank-Loans(Bank, Assets, Headquarter,

Loan#, Customer, Amount).

Let D = {Banks(B,A,H), Loans(B,L,C,At)}. Since Banks Loans = B AH = Banks -

Loans is in F+ (since it is already in F), D is a lossless-join decomposition.

What if the decomposition contains more than two relations.


Test for Lossless Join *Algorithm TestLJ (Chase)Input: A relation schema R(A1, …, Am), a set of

FDs F, and a decomposition D = {R1, …, Rn}.

Output: Yes, if D is a Lossless join; no, otherwise.

Method:1. Create an n m table T (labeled by Ai and

Rj).

2. If Ri contains Aj, place aj at Ti,j. Otherwise,

place bij at Ti,j.


TestLJ (cont.) *3. Repeat for each FD X Y in F do For all rows with identical symbols on X

do make the symbols on Y identical. (choose aj over bij whenever possible)

Until no more change can be made.4. Return yes if there is a row of aj’s.

Otherwise, return no.


TestLJ: An ExampleContinue with the previous example. Set up the table T.

Enforce B AH.

B A H L C At BAH a1 a2 a3 b14 b15 b16

BLCAt a1 b22 b23 a4 a5 a6

B A H L C At BAH a1 a2 a3 b14 b15 b16

BLCAt a1 a2 a3 a4 a5 a6

Need to repeat until no more changes.


Dependency-Preserving Decomposition

Let F be a set of FDs over R, and D = {R1, R2, ..., Rn} be a decomposition of R. D is a dependency-preserving decomposition if

F+ = (R1(F) R2(F) . . . Rn(F))+

where for i = 1, …, nRi(F) = { X Y | X Y F and XY Ri}.

Restrict FDs to local relations. If all “global” FDs can be derived from “local” FDs, all dependencies are preserved.


Dependency Preservation: An Example

Consider F = {CS Z, Z C} over R(City, Street, Zipcode), and D ={R1(S, Z), R2(C, Z)}.

Then R1(F) = {} and R2(F) = {Z C} (consider non-trivial FDs only)

Since CS Z F+ but

CS Z (R1(F) R2(F))+,

D is not dependency-preserving.


Test for Dependency Preservation Algorithm TestDPInput: A relation schema R, A set of FDs F

over R, a decomposition D = {R1, R2, ..., Rn} of R.

Output: Yes, if D is dependency-preserving; no, otherwise.

Method: for every X Y F if Ri such that XY Ri

then X Y is preserved;


TestDP (cont.)

else W := X; repeat for i from 1 to n do

W := W ((W Ri)+ Ri);

until there is no change to W; if Y W then X Y is preserved; if every X Y is preserved then return yes; else return no. Derive global FDs using only local FDs.


TestDP: An example Consider F = {A B, B C, C D, D A }

over R(A, B, C, D), & D = {R1(A,B), R2(B,C), R3(C,D)}.

Is D a dependency-preserving decomposition? Since AB R1, A B is preserved. Since BC R2, B C is preserved. Since CD R3, C D is preserved.Since DA is not in any one of the three

relations, we need to compute W.


TestDP: An example (cont.) *

Initialization: W = D; first iteration: W = D ((D AB)+ AB) = D; W = D ((D BC)+ BC) = D; W = D ((D CD)+ CD) = D (D+ CD) = D (ABCD CD) = CD; W changed from D to CD.


TestDP: An example (cont.) *

second iteration:

W = CD ((CD AB)+ AB) = CD;

W = CD ((CD BC)+ BC)

= CD (C+ BC) = BCD;

W = BCD ((BCD CD)+ CD) = BCD; W changed from CD to BCD.


Normalization

It is good to have BCNF relation schemas. If a relation schema is not in BCNF, then

decompose it into a set of relation schemas: every new schema is in BCNF; it is lossless-join (can guarantee); it is dependency-preserving (no

guarantee). If not possible to have all nice properties,

be happy with a lossless join, dependency preserving 3NF decomposition (can guarantee)


Normalization to BCNF

Algorithm LLJD-BCNFInput: R: A relation schema F: A set of FDs satisfied by R.Output: A lossless-join decomposition D =

{R1, …, Rn}, such that each Ri is in BCNF.


Normalization to BCNF (cont.)

Method: D := {R}; while Ri D that is not in BCNF do

begin Find an FD X Y such that (1) Ri is not BCNF because of X Y, and

(2) XY Ri;

D := D - Ri {Ri - Y, XY}

end;


Normalization to BCNF (cont.) *

Theorem: Algorithm LLJD-BCNF is correct.Proof (sketch): Every schema in D is in BCNF because the

algorithm will not stop otherwise. D is a lossless-join decomposition because in

each iteration, Ri is decomposed into 2 smaller schemas (Ri - Y) and XY and they satisfy the condition:

(Ri - Y) XY = X Y = (XY - (Ri - Y)).


Normalization to BCNF: An Example

Consider F = {B AH, L CAt} over Bank-Loans(Bank, Assets, Headquarter,

Loan#, Customer, Amount), and a set of FDs,

Candidate keys: LBInitialization: D = {BAHLCAt }


Normalize to BCNF: An Example * 1st iteration: Ri = BAHLCAt is not in BCNF because B AH

is not a trivial FD and B is not a superkey. Replace BAHLCAt by BAH and BLCAt. Hence: D = {BAH, BLCAt}. BAH is in BCNF because in BAH, B is a

candidate key.


Normalize to BCNF: An Example *

2nd iteration: Ri = BLCAt is not in BCNF because L CAt is not a

trivial FD and L is not a superkey in BLCAt. Replace BLCAt by CLAt and BL.

Hence, D = {BAH, CLAt, BL}. CLAt is in BCNF because in CLAt, L is a candidate

key. BL is in BCNF (see theorem on next page).Final result: D = {BAH, CLAt, BL}. D happens to be dependency-preserving.

Any relation schema with exactly two attributes is in BCNF.


Normalize to BCNF: Another Ex. *

Consider R(City, Street, Zipcode), and F = {CS Z, Z C }. Candidate keys: CS, ZS.Initialization: D = {CSZ};1st iteration: R = CSZ is not in BCNF because Z C is

not a trivial FD and Z is not a superkey. D = {ZC, ZS}.D is not dependency-preserving because CS

Z is not preserved.


Equivalence of FD Sets Let F and G be two sets of FDs satisfied by

R. F and G are equivalent, denoted by F G, if F+ = G+.

Example: F = {B CD, AD E, B A} and G = {B CDE, B ABC, AD E} are equivalent.

Check to see that every FD in F is also in G+ and that every FD in G is also in F+


Extraneous Attributes Let F be a set of FDs. F contains an extraneous attribute A if

there is an FD X Y in F, such that either A X, and [F - {X Y} {X - {A} Y}] F; or A Y, and

[F - {X Y} {X Y - {A} }] F.

This is a “useless” attribute either at the left side or at the right side of an FD.


Summary

A good schema should have three properties: BCNF (or 3NF if BCNF can not be

obtained) Lossless join Dependency preserving

Lossless join BCNF decomposition is guaranteed, need to check for dependency preservation

Lossless join, dependency preserving 3NF decomposition is guaranteed (need to find the minimal cover)


Look Ahead Next topic: SQL Overview & DDL Read textbook:

Chapter 8, 10.1-10.6

Logical Database Design (3 of 3) John Ortiz. Lecture 7Logical Database Design (2)2 Normalization If...

Documents

Transcript of Logical Database Design (3 of 3) John Ortiz. Lecture 7Logical Database Design (2)2 Normalization If...