Term 2, 2004, Lecture 3, Normalisation. Marian Ursu, Department of Computing, Goldsmiths College...


Normalisation

5


Outline

Boyce-Codd Normal Form (BCNF)

normalisation
• non-loss decomposition
• Heath’s theorem
• normalisation process
  – semantic assumptions and FDs
  – CKs
  – decomposition

normalisation vs dependency preservation
• a decomposition may yield a better solution than another one
• either-or situations: normalise or preserve FDs


2NF and 3NF

(optional)

2NF
• a relation is in 2NF if and only if it is in 1NF and every non-key attribute is irreducibly dependent on each candidate key

3NF (Zaniolo)
• R is a relation; X is any set of attributes of R; A is any single attribute of R; consider the following conditions:
  – X contains A
  – X contains a candidate key of R
  – A is contained in a candidate key of R
• if at least one of the three is true for every FD X → A, then R is in 3NF
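Zaniolo's test can be sketched in Python as a brute-force check over all attribute subsets. This is only an illustration for small schemas: the helper names (`closure`, `subsets`, `candidate_keys`, `is_3nf`, `fd`) and the relation name `TREATS` are assumptions, and the two relations used are the module and patient examples that appear later in the lecture.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def subsets(schema):
    """All non-empty subsets of `schema`, smallest first."""
    items = sorted(schema)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def candidate_keys(schema, fds):
    keys = []
    for x in subsets(schema):
        # Skip x if it contains a smaller key: it would be reducible.
        if not any(k <= x for k in keys) and closure(x, fds) >= schema:
            keys.append(x)
    return keys

def is_3nf(schema, fds):
    schema = frozenset(schema)
    prime = frozenset().union(*candidate_keys(schema, fds))
    for x in subsets(schema):
        determined = closure(x, fds)
        is_superkey = determined >= schema
        for a in determined & schema:
            # Zaniolo's three conditions for the FD x -> a.
            if not (a in x or is_superkey or a in prime):
                return False
    return True

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))

MODULES = {"M_id", "M_name", "Type", "Value"}
MODULE_FDS = [fd(["M_id"], ["M_name", "Type", "Value"]), fd(["Type"], ["Value"])]

TREATS = {"Patient", "Disease", "Doctor"}  # hypothetical relation name
TREATS_FDS = [fd(["Patient", "Disease"], ["Doctor"]), fd(["Doctor"], ["Disease"])]

print(is_3nf(MODULES, MODULE_FDS))  # False: Type -> Value fails all three conditions
print(is_3nf(TREATS, TREATS_FDS))   # True: every attribute belongs to some candidate key
```

The enumeration is exponential in the number of attributes, which is acceptable for classroom-sized relations but not for large schemas.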


BCNF

a relation is in Boyce/Codd normal form (BCNF) if and only if every non-trivial irreducible FD has a candidate key as its determinant

informally the determinant of each relevant FD is a CK
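The informal reading can be turned into a small check: a relation is in BCNF when no set of attributes determines something new without determining the whole relation. The sketch below assumes FDs are given as pairs of attribute sets; `closure`, `bcnf_violations`, `is_bcnf` and `fd` are illustrative names, and the module relation is borrowed from a later slide.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def bcnf_violations(schema, fds):
    """Determinants inside `schema` that are not superkeys of `schema`."""
    schema = frozenset(schema)
    bad = []
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = frozenset(combo)
            determined = closure(x, fds) & schema
            # x determines something beyond itself, yet not everything.
            if determined != x and determined != schema:
                bad.append(x)
    return bad

def is_bcnf(schema, fds):
    return not bcnf_violations(schema, fds)

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))

MODULES = ["M_id", "M_name", "Type", "Value"]
FDS = [fd(["M_id"], ["M_name", "Type", "Value"]), fd(["Type"], ["Value"])]

print(is_bcnf(MODULES, FDS))            # False
print(is_bcnf(["Type", "Value"], FDS))  # True
```

Checking superkeys rather than candidate keys is equivalent here, because the definition only looks at irreducible FDs.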


Example

devise examples in class
• relations in BCNF
• relations not in BCNF


BCNF

any relation can be non-loss decomposed into an equivalent set of BCNF relations

BCNF ⇒ 3NF ⇒ 2NF ⇒ 1NF

BCNF is still not guaranteed to be free of all update anomalies


Normalisation

the process of transforming a relation with redundancies into an “equivalent” set of relations that have fewer redundancies

“transformation” = projection
• input: one relation, say R
• output: many relations, say R1, …, Rn

“equivalent” = non-loss decomposition
• R1 join R2 … join Rn = R

R1, …, Rn should have normal forms higher than or equal to that of R


Non-loss decomposition

(Patient, Symptom, Doctor, Office, Diagnosis)
• semantic assumptions

exercise


Lossy Decomposition

(Patient, Symptom, Doctor, Office, Diagnosis)
• semantic assumptions

exercise


Heath’s theorem

can be used as the basis for normalisation

theorem

suppose
• R = (A, B, C), where A, B and C are disjoint sets of attributes
• A → B

then
• R = (A, B) join (A, C)

state “in English”
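One possible reading: if A determines B, then R can be split into its projections on (A, B) and (A, C) and rebuilt exactly by joining them. The sketch below illustrates this on a tiny made-up relation in which A → B holds, and contrasts it with a split on B, which is not a determinant. The helper names and the sample values are assumptions chosen for the illustration.

```python
def relation(rows):
    """A relation as a set of rows; each row is a frozenset of (attr, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def project(rel, attrs):
    """Projection onto `attrs`; duplicates disappear because the result is a set."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rel}

def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attributes."""
    out = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out

# Made-up data in which A -> B holds (each A value has a single B value).
R = relation([
    {"A": 1, "B": "x", "C": "p"},
    {"A": 1, "B": "x", "C": "q"},
    {"A": 2, "B": "x", "C": "p"},
])

# Heath's theorem: split on the determinant A.
heath = natural_join(project(R, {"A", "B"}), project(R, {"A", "C"}))

# Splitting on B instead: B determines neither A nor C.
lossy = natural_join(project(R, {"A", "B"}), project(R, {"B", "C"}))

print(heath == R)         # True: non-loss
print(len(lossy - R))     # 1 spurious row: (A=2, B=x, C=q)
```

The second join returns every original row plus one that was never in R, which is what "lossy" means: information about which rows existed is lost, not the rows themselves.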


Normalisation – rules of thumb

take as basis for normalisation/Heath’s theorem a “problem” FD

maximise B when applying Heath’s theorem, on the basis of A → B

try to maintain a one-to-one correspondence with real life entities


Normalisation

steps: semantic assumptions → FDs → CKs → decomposition


Simple example

(M_id, M_name, Type, Value)
• M_id → M_name
• M_id → Type
• M_id → Value
• Type → Value
• not BCNF

Heath’s theorem for Type → Value results in

(Type, Value) and (M_id, M_name, Type)

both relations are now in BCNF


Example (R)

R (project, task, max_budget, duration, payment_rate, contractor, contr_time)

FDs:

(project, task) → max_budget, duration

(task, max_budget, duration) → payment_rate

(project, task, contractor) → contr_time

candidate key: (project, task, contractor)
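The "CKs" step for this relation can be checked mechanically. The following sketch, with illustrative helper names, enumerates attribute sets by increasing size and keeps those whose closure covers R and that contain no smaller key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(schema, fds):
    schema = frozenset(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = frozenset(combo)
            if any(k <= x for k in keys):
                continue  # contains a smaller key, so x is reducible
            if closure(x, fds) >= schema:
                keys.append(x)
    return keys

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))

R = ["project", "task", "max_budget", "duration",
     "payment_rate", "contractor", "contr_time"]
FDS = [
    fd(["project", "task"], ["max_budget", "duration"]),
    fd(["task", "max_budget", "duration"], ["payment_rate"]),
    fd(["project", "task", "contractor"], ["contr_time"]),
]

keys = candidate_keys(R, FDS)
print([sorted(k) for k in keys])  # [['contractor', 'project', 'task']]
```

The result agrees with a quick manual argument: project, task and contractor never appear on the right-hand side of any FD, so every key must contain all three.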


Example – decomposition for R

Heath’s theorem for R (the initial relation), based on (task, max_budget, duration) → payment_rate, leads to:

R1 (task, max_budget, duration, payment_rate)

R2 (project, task, max_budget, duration, contractor, contr_time)

R1 is in BCNF

R2 is not in BCNF, due to (project, task) → max_budget, duration


Example – decomposition for R2

Heath’s theorem for R2, based on (project, task) → max_budget, duration, leads to:

R21 (project, task, max_budget, duration)

R22 (project, task, contractor, contr_time)

R21 is in BCNF

R22 is in BCNF


Example – solution

(task, max_budget, duration, payment_rate)

(project, task, max_budget, duration)

(project, task, contractor, contr_time)
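The two decomposition steps can be replayed in code. In this sketch `heath_split` (an assumed helper, not part of the lecture) takes a relation and the determinant of a "problem" FD, maximises B by using the closure of the determinant, and returns the two projections (A, B) and (A, C); `is_bcnf` is a brute-force check suitable only for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def is_bcnf(schema, fds):
    schema = frozenset(schema)
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = frozenset(combo)
            determined = closure(x, fds) & schema
            if determined != x and determined != schema:
                return False  # x is a determinant but not a superkey
    return True

def heath_split(schema, lhs, fds):
    """Split `schema` on lhs -> B, with B everything lhs determines in `schema`."""
    schema, a = frozenset(schema), frozenset(lhs)
    b = (closure(a, fds) & schema) - a
    c = schema - a - b
    return a | b, a | c

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))

R = {"project", "task", "max_budget", "duration",
     "payment_rate", "contractor", "contr_time"}
FDS = [
    fd(["project", "task"], ["max_budget", "duration"]),
    fd(["task", "max_budget", "duration"], ["payment_rate"]),
    fd(["project", "task", "contractor"], ["contr_time"]),
]

# Same order of "problem" FDs as on the slides.
R1, R2 = heath_split(R, ["task", "max_budget", "duration"], FDS)
R21, R22 = heath_split(R2, ["project", "task"], FDS)

for name, rel in [("R1", R1), ("R2", R2), ("R21", R21), ("R22", R22)]:
    print(name, sorted(rel), "BCNF" if is_bcnf(rel, FDS) else "not BCNF")
```

Starting from (project, task) → max_budget, duration instead happens to lead to the same three relations for this example, but in general the order in which "problem" FDs are chosen can change the outcome.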


Decomposition – 2 or more solutions

in the normalisation process, one (non-loss) decomposition may yield a better solution than another


Decomposition – 2 solutions – example

Modules(M_id, M_name, Type, Value)

solution #1
• Modules_Descr(M_id, M_name, Type)
• Type_Val(Type, Value)

solution #2
• Modules_Descr(M_id, M_name, Type)
• Module_Val(M_id, Value)

are they both non-loss? (apply Heath’s theorem)

is there one better than the other?


Solution #1 vs Solution #2

updates
• u1: insert the fact that a 3-semester module is worth 1.5cu
• u2: modify 1-semester modules; they are no longer worth 0.5cu, they are worth 0.75cu
• u3: change the type of a module but forget to change its value

solution #2
• u1 and u2 are impossible or very difficult to perform
• u3 is allowed

solution #1
• u1 and u2 are straightforward
• u3 is not allowed
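The three updates can be tried out with Python's built-in `sqlite3` module. The sketch below holds both solutions in one in-memory database; the module names and the initial values are made up for the illustration, and only the table and column names come from the slides.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Modules_Descr (M_id TEXT PRIMARY KEY, M_name TEXT, Type TEXT);
    CREATE TABLE Type_Val (Type TEXT PRIMARY KEY, Value REAL);    -- solution #1
    CREATE TABLE Module_Val (M_id TEXT PRIMARY KEY, Value REAL);  -- solution #2
""")
con.executemany("INSERT INTO Modules_Descr VALUES (?, ?, ?)", [
    ("m1", "Databases", "1 semester"),   # hypothetical modules
    ("m2", "Logic", "1 semester"),
    ("m3", "Project", "2 semester"),
])
con.executemany("INSERT INTO Type_Val VALUES (?, ?)",
                [("1 semester", 0.5), ("2 semester", 1.0)])
con.executemany("INSERT INTO Module_Val VALUES (?, ?)",
                [("m1", 0.5), ("m2", 0.5), ("m3", 1.0)])

# u1 and u2 in solution #1: one row each, and no module of that type has to exist.
con.execute("INSERT INTO Type_Val VALUES ('3 semester', 1.5)")
con.execute("UPDATE Type_Val SET Value = 0.75 WHERE Type = '1 semester'")

# u3: change the type of m1 and "forget" about its value.
con.execute("UPDATE Modules_Descr SET Type = '2 semester' WHERE M_id = 'm1'")

# Solution #1: the value of m1 follows its new type through the join.
sol1_value = con.execute(
    "SELECT Value FROM Modules_Descr JOIN Type_Val USING (Type) WHERE M_id = 'm1'"
).fetchone()[0]

# Solution #2: count the distinct values now attached to 2-semester modules.
sol2_values = con.execute(
    "SELECT COUNT(DISTINCT Value) FROM Modules_Descr JOIN Module_Val USING (M_id) "
    "WHERE Type = '2 semester'"
).fetchone()[0]

print(sol1_value)   # 1.0
print(sol2_values)  # 2, so Type -> Value no longer holds in solution #2
```

In solution #2 there is simply no table in which u1 could be recorded without inventing a module, and u2 would have to touch one row per affected module.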


Solution #1 vs Solution #2

solution #1 is more expressive
• certain facts cannot be expressed in solution #2; e.g. the value of a new type
• updates can be independently performed on the two component relations (i.e. all constraints are properly expressed)
• in solution #2, Type → Value is lost, so this constraint must be enforced by the user through procedural code

independent projections
• updates can be performed independently on each projection, without the danger of ending up with inconsistent data


Independent projections

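[Figure: FD diagrams for Modules(M_id, M_name, Type, Value) and its two decompositions.
Solution #1: (M_id, M_name, Type) and (Type, Value); all direct FDs are intra-relational, all transitive FDs are inter-relational.
Solution #2: (M_id, M_name, Type) and (M_id, Value); one transitive FD is intra-relational, one direct FD is lost.]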


Independent projections - Rissanen

R1 and R2 are two projections of R; R1 and R2 are independent if and only if
• every FD in R is a logical consequence of the FDs in R1 and R2
• the common attributes of R1 and R2 form a candidate key for at least one of R1 or R2

atomic relation
• cannot be decomposed into independent projections
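Both conditions can be tested by brute force on the module example. The sketch below is an illustration under the assumption that FDs are pairs of attribute sets; `projected_fds`, `is_candidate_key` and `independent` are made-up helper names.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def projected_fds(schema, fds):
    """X -> (X+ within schema) for every non-empty X in schema: the FDs of the projection."""
    schema = frozenset(schema)
    out = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = frozenset(combo)
            out.append((x, closure(x, fds) & schema))
    return out

def is_candidate_key(x, schema, fds):
    x, schema = frozenset(x), frozenset(schema)
    if not closure(x, fds) >= schema:
        return False
    # Irreducible: dropping any one attribute must break the superkey property.
    return all(not closure(x - {a}, fds) >= schema for a in x)

def independent(r1, r2, fds):
    r1, r2 = frozenset(r1), frozenset(r2)
    local = projected_fds(r1, fds) + projected_fds(r2, fds)
    fds_kept = all(rhs <= closure(lhs, local) for lhs, rhs in fds)
    common = r1 & r2
    key_ok = bool(common) and (is_candidate_key(common, r1, fds)
                               or is_candidate_key(common, r2, fds))
    return fds_kept and key_ok

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))

FDS = [fd(["M_id"], ["M_name", "Type", "Value"]), fd(["Type"], ["Value"])]
DESCR = {"M_id", "M_name", "Type"}

print(independent(DESCR, {"Type", "Value"}, FDS))  # True  (solution #1)
print(independent(DESCR, {"M_id", "Value"}, FDS))  # False (solution #2 loses Type -> Value)
```

Solution #2 passes the second condition (M_id is a candidate key of both projections) and fails only the first, which matches the slide's remark that one direct FD is lost.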


Dependency preservation

R was decomposed (normalisation) into R1, …, Rn

S: the set of FDs for R

S1, …, Sn: the sets of FDs for R1, …, Rn (each Si refers only to the attributes of Ri)

S’ = S1 ∪ … ∪ Sn (usually, S’ ≠ S)

the decomposition is dependency preserving if S’+ = S+
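Since S’ always follows from S, the test S’+ = S+ reduces to checking that every FD in S follows from S’. The sketch below uses the standard textbook procedure that avoids building S1, …, Sn explicitly: starting from the left-hand side of an FD, it repeatedly adds whatever can be derived inside a single Ri. The function names and the way FDs are encoded are assumptions of this illustration.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def preserves(parts, fds):
    """True if every FD in `fds` can be derived using only FDs local to one part."""
    parts = [frozenset(p) for p in parts]
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for p in parts:
                # What the attributes of z inside p determine inside p.
                gained = (closure(z & p, fds) & p) - z
                if gained:
                    z |= gained
                    changed = True
        if not rhs <= z:
            return False
    return True

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))

PROJECT_FDS = [
    fd(["project", "task"], ["max_budget", "duration"]),
    fd(["task", "max_budget", "duration"], ["payment_rate"]),
    fd(["project", "task", "contractor"], ["contr_time"]),
]
PROJECT_PARTS = [
    {"task", "max_budget", "duration", "payment_rate"},
    {"project", "task", "max_budget", "duration"},
    {"project", "task", "contractor", "contr_time"},
]

MODULE_FDS = [fd(["M_id"], ["M_name", "Type", "Value"]), fd(["Type"], ["Value"])]
SOLUTION_2 = [{"M_id", "M_name", "Type"}, {"M_id", "Value"}]

print(preserves(PROJECT_PARTS, PROJECT_FDS))  # True
print(preserves(SOLUTION_2, MODULE_FDS))      # False
```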


Normalisation vs dependency preservation

there are cases when there is an either-or situation regarding normalisation and the preservation of functional dependencies:
• either the relation is normalised and some FDs are lost,
• or no FDs are lost (they are expressed in the original relation), but the relation is not in the highest normal form possible

in this case, neither solution is inherently better than the other; other criteria will have to be considered to decide which is better


Normalisation vs dependency preservation: Example

(Patient, Disease, Doctor)
• a patient is treated by a single doctor for a certain disease
• each doctor only treats one kind of disease
• a doctor can treat more than one patient

FDs: (Patient, Disease) → Doctor; Doctor → Disease

is this relation in BCNF?

can you identify update anomalies?

consider also (Patient, Disease, Doctor, Treatment) with (Patient, Disease) → Treatment


Possible decompositions

• (Patient, Doctor) and (Patient, Disease): non-loss? (choose PKs)
• (Disease, Doctor) and (Patient, Disease): non-loss? (choose PKs)
• (Patient, Doctor) and (Doctor, Disease): Heath’s theorem (choose PKs)


BCNF vs dependency preservation

(Patient, Doctor) and (Doctor, Disease)

do not enforce an FD existing in the original specification, namely: (Patient, Disease) → Doctor

e.g. a patient can be given two doctors that treat the same disease (the system will not disallow this); the constraint would have to be maintained by procedural code
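The situation can be reproduced with a few made-up tuples: each projection satisfies its own key constraint, yet their join violates (Patient, Disease) → Doctor. The names and values below are hypothetical, and `satisfies` is an illustrative helper.

```python
def satisfies(rows, lhs, rhs):
    """True if the FD lhs -> rhs holds in `rows` (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first value seen for a key must match all later ones.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical contents of the two BCNF projections.
patient_doctor = [
    {"Patient": "Smith", "Doctor": "Dr Jones"},
    {"Patient": "Smith", "Doctor": "Dr Patel"},
]
doctor_disease = [
    {"Doctor": "Dr Jones", "Disease": "flu"},
    {"Doctor": "Dr Patel", "Disease": "flu"},
]

# Natural join on Doctor.
joined = [
    {**pd, **dd}
    for pd in patient_doctor
    for dd in doctor_disease
    if pd["Doctor"] == dd["Doctor"]
]

print(satisfies(doctor_disease, ["Doctor"], ["Disease"]))    # True
print(satisfies(joined, ["Patient", "Disease"], ["Doctor"]))  # False
```

Nothing declared on either projection rejects the second row of `patient_doctor`; detecting the problem requires looking at both relations together, which is the procedural check the slide refers to.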


BCNF vs dependency preservation

not every FD is expressible through normalisation

when the relation was in its original form (3NF)
• (Patient, Disease) → Doctor was expressed
  – a patient-disease pair could not be assigned more than one doctor
• Doctor → Disease was not expressed
  – generated update anomalies

in BCNF (decomposed)
• Doctor → Disease was expressed
• (Patient, Disease) → Doctor was not expressed
  – generated update anomalies (refer to previous slide)
  – this latter FD would not have been expressed even if the decomposition into all three 2-attribute relations had been considered


Conclusions

normal forms: formalisation of common sense

art → engineering

possibility for automation; difficult, because of non-determinism (more than one choice at each step)

BCNF
• always achievable
• not always free of update anomalies, because it cannot always express all the FDs existing in the problem

there are higher normal forms (4NF, 5NF), defined on the basis of other concepts (not FDs)