Functional Dependencies and Normal Forms (Part 1)...Database System Concepts, 7th Ed....


Database System Concepts, 7th Ed.

©Silberschatz, Korth and Sudarshan

See www.db-book.com for conditions on re-use

Functional Dependencies and Normal Forms

(Part 1)


Features of Good Relational Designs

Suppose we have the following table, where ID is the key

There is repetition of information.

If we want to update the building of the Comp. Sci. dept., we need to do it in all rows where Comp. Sci. appears.

We need to use null values (if we add a new department with no instructors).


Functional Dependencies

Suppose we have the following table, where ID is the key

The data above is repeated, but there is a pattern: the department name uniquely determines its building and budget.

This is a property of the real-world environment we want to model, not just of the specific table above.

In this case, we say that dept_name functionally determines building and budget.


Functional Dependencies (Cont.)

There are usually a variety of constraints (rules) on the data in the real world.

For example, some of the constraints that are expected to hold in a university database are:

• Students and instructors are uniquely identified by their ID.

• Each student and instructor has only one name.

• Each instructor and student is (primarily) associated with only one department.

• Each department has only one value for its budget, and only one associated building.


Functional Dependencies (Cont.)

A legal instance of a database (i.e., what we would like to admit as a valid instance of the database) is one where all the real-world constraints are satisfied.

Some real-world constraints can be expressed via so-called functional dependencies.


Functional Dependencies Definition

Let R(U) be a relation schema. A functional dependency of R is an expression of the form

X -> Y

where X ⊆ U and Y ⊆ U.

An instance r of R satisfies the functional dependency above if, whenever any two tuples t1 and t2 of r agree on the attributes X, they also agree on the attributes Y. That is,

t1[X] = t2[X] ⇒ t1[Y] = t2[Y]

Example: Consider R(A,B) with the following instance r.

A B

1 4

1 5

3 7

The instance satisfies B -> A, but not A -> B: the first two tuples agree on A (both 1) but differ on B (4 and 5).

To specify that our legal instances of the relation R(U) must satisfy a certain set of functional dependencies F, we write <R(U),F>.
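The definition can be checked mechanically on a concrete instance. The following is a minimal sketch, assuming tuples are represented as Python dictionaries; the function name `satisfies` is hypothetical and chosen only for this illustration.

```python
from itertools import combinations

def satisfies(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    for t1, t2 in combinations(rows, 2):
        agree_lhs = all(t1[a] == t2[a] for a in lhs)
        agree_rhs = all(t1[a] == t2[a] for a in rhs)
        if agree_lhs and not agree_rhs:
            return False
    return True

# The instance r of R(A, B) from the example.
r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]

print(satisfies(r, ["B"], ["A"]))  # True
print(satisfies(r, ["A"], ["B"]))  # False
```

Note that such a check only tells us whether one instance satisfies a dependency; whether the dependency should hold for all legal instances is a modelling decision.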


Functional Dependencies

Suppose we have the following table

<R(ID,name,salary,dept_name,building,budget), F>, where F contains:

• ID -> name,salary,dept_name,building,budget

• dept_name -> building,budget


Decomposition

Functional dependencies highlight the parts of a relation where repetition of information (redundancy) occurs.

We can try to avoid the repetition by decomposing the relation R into two or more relations. Functional dependencies give us suggestions on how to do it.

< Prof(ID,name,salary,dept_name), ID -> name,salary,dept_name >

< Dept(dept_name,building,budget), dept_name -> building,budget >

To reduce redundancy, we made dept_name the key of a new relation Dept, and the relation Prof only needs to refer to it through that key.

If we join the two tables together, we get the original table back.

However, we cannot decompose a relation arbitrarily; otherwise, the above property might be lost.


A Lossy Decomposition


Lossless Decomposition

Let <R(U), F> be a relation schema. A decomposition of <R(U), F> is a set of relation schemas

<R1(X1), F1>, …, <Rn(Xn), Fn>

where U = X1 ∪ ⋯ ∪ Xn.

We say that the decomposition is a lossless decomposition if there is no loss of information by replacing R with the n relations.

Formally, for every instance r of R that satisfies F:

• r = Π_X1(r) ⋈ ⋯ ⋈ Π_Xn(r)
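The equality in the formal condition can be tested on a single instance by projecting and re-joining. The sketch below is only an illustration: the helper names (`project`, `natural_join`, `is_lossless_on`) and the two-row sample instance are made up for this example, and checking one instance does not prove that a decomposition is lossless for every legal instance.

```python
def project(rows, attrs):
    """Projection: keep only attrs, dropping duplicate tuples."""
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in rows}

def natural_join(left, right):
    """Natural join of two sets of tuples of (attribute, value) pairs."""
    result = set()
    for lt in left:
        for rt in right:
            dl, dr = dict(lt), dict(rt)
            # Tuples join when they agree on all shared attributes.
            if all(dl[a] == dr[a] for a in dl.keys() & dr.keys()):
                result.add(tuple(sorted({**dl, **dr}.items())))
    return result

def is_lossless_on(rows, parts):
    """Check r = project(r, X1) join ... join project(r, Xn) for this one instance."""
    all_attrs = sorted(rows[0].keys())
    joined = project(rows, parts[0])
    for attrs in parts[1:]:
        joined = natural_join(joined, project(rows, attrs))
    return joined == project(rows, all_attrs)

# Hypothetical instance: two instructors who happen to share a name.
rows = [
    {"ID": 1, "name": "Kim", "dept_name": "Physics", "building": "Watson"},
    {"ID": 2, "name": "Kim", "dept_name": "Music", "building": "Packard"},
]

# Splitting on dept_name gives back exactly the original rows.
print(is_lossless_on(rows, [["ID", "name", "dept_name"], ["dept_name", "building"]]))  # True
# Splitting on name produces spurious tuples when re-joined.
print(is_lossless_on(rows, [["ID", "name"], ["name", "dept_name", "building"]]))  # False
```

In the second call the join returns four tuples instead of two: the extra tuples are exactly the "lost information", since we can no longer tell which Kim works in which department.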


Normal Forms

Note that it is not always advisable to decompose a relation:

• Splitting a relation into multiple relations might make some queries less efficient (we need to do more joins).

It is up to us, as designers, to understand how much our queries are affected, and decide whether we want to decompose or not.

If we decide to decompose our relation schema, then:

• The decomposition must be lossless (mandatory).

• The redundancies should be eliminated, or at least reduced as much as possible.

• The functional dependencies of the original schema should be preserved in the decomposition, if possible.

To achieve the above properties, we decompose our schema into schemas that are in a so-called normal form.

To define normal forms, we first need some auxiliary notions.


Closure of a Set of Functional Dependencies

Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F.

• If A -> B and B -> C, then we can infer that A -> C

• etc.

A set F of FDs logically implies an FD X -> Y if every instance that satisfies F also satisfies X -> Y.

The set of all functional dependencies logically implied by F is the closure of F.

We denote the closure of F by F+.

How can we compute the closure of a set F?


Closure of a Set of Functional Dependencies

We can compute F+, the closure of F, by repeatedly applying Armstrong's Axioms:

• Reflexive rule: if Y ⊆ X, then X -> Y

• Augmentation rule: if X -> Y, then XZ -> YZ

• Transitivity rule: if X -> Y and Y -> Z, then X -> Z

These rules are

• Sound -- they generate only functional dependencies that actually hold, and

• Complete -- they generate all functional dependencies that hold.

Additional rules:

• Union rule: if X -> Y holds and X -> Z holds, then X -> YZ holds.

• Decomposition rule: if X -> YZ holds, then X -> Y holds and X -> Z holds.

The above rules can be inferred from Armstrong's axioms.


Closure of a Set of Functional Dependencies

Example.

• F = { A -> BCDEFGH, CE -> A, BD -> E }

CE -> A and A -> BCDEFGH imply CE -> BCDEFGH (transitivity)

BD -> E implies BDC -> CE (augmentation with C)

BDC -> CE and CE -> A imply BDC -> A (transitivity)


Closure of a set of Attributes

Another notion we are going to need is the closure of a set of attributes. This is useful for finding the superkeys of a relation.

Given a set of functional dependencies F and a set of attributes X, the closure of X, denoted X+, is the set of all attributes that are functionally determined by X. How do we compute it?

• Start from X+ = X.

• If there is an FD Z -> W in F with Z ⊆ X+, add W to X+.

• Repeat until X+ does not change.

Example. F = {A -> B, B -> C, AC -> D}. What is (A)+?

Starting from A+ = A, we derive B (from A -> B) and then C (from B -> C), obtaining ABC. Now AC -> D applies, and thus A+ = ABCD. Hence A -> ABCD.
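The three steps translate almost directly into a loop. This is a minimal sketch, assuming single-letter attribute names so that a string such as "AC" can stand for the set {A, C}; the function name `closure` and the representation of FDs as (left, right) pairs are choices made for this illustration.

```python
def closure(attrs, fds):
    """Closure of attrs under fds; fds is a list of (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)          # Start from X+ = X.
    changed = True
    while changed:               # Repeat until X+ does not change.
        changed = False
        for lhs, rhs in fds:
            # If Z is contained in X+ and W adds something new, add W to X+.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [("A", "B"), ("B", "C"), ("AC", "D")]
print("".join(sorted(closure("A", F))))  # ABCD
```

The loop terminates because each pass either adds at least one attribute or stops, and there are only finitely many attributes to add.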


Closure of a set of Attributes (Cont.)

Consider <R(U), F>. The closure of a set X of attributes is very useful to check if X is a superkey of the relation R.

Just compute X+ and check whether X+ = U (that is, whether X -> U).

A superkey X might not be a (minimal) key:

• We might be able to remove some attributes from X, and still derive U.

Example. <R(A,B,C,D), F = {A -> B, B -> C, AC -> D}>.

• A+ = ABCD. A is a superkey (in this case, even a key).

• AC+ = ACBD. AC is a superkey, but not a key, because A alone already determines U.

• B+ = BC. B is not a superkey (and thus not a key).
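The superkey test, and a brute-force search for the minimal keys, can be sketched as follows. This is an illustration under the assumption of single-letter attribute names; `is_superkey` and `candidate_keys` are hypothetical helper names, and the exhaustive enumeration is only practical for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds (list of (lhs, rhs) pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, universe, fds):
    """X is a superkey exactly when X+ = U."""
    return closure(attrs, fds) == set(universe)

def candidate_keys(universe, fds):
    """Enumerate attribute sets smallest-first, skipping supersets of keys already found."""
    keys = []
    for size in range(1, len(universe) + 1):
        for combo in combinations(sorted(universe), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it cannot be minimal
            if is_superkey(candidate, universe, fds):
                keys.append(candidate)
    return keys

U = "ABCD"
F = [("A", "B"), ("B", "C"), ("AC", "D")]
print(is_superkey("AC", U, F))  # True
print(is_superkey("B", U, F))   # False
print(candidate_keys(U, F))     # [{'A'}]
```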


Trivial Functional Dependencies

A functional dependency X -> Y is trivial if Y ⊆ X.

A trivial functional dependency is satisfied by all instances of a relation.

Example:

• ID, name -> ID

• name -> name


Boyce-Codd Normal Form

We now have all the ingredients to define our first normal form.

A relation schema <R(U), F> is in Boyce-Codd Normal Form (BCNF) if for all functional dependencies X -> Y in F+ that are not trivial

• X is a superkey of R

Intuition: the only kind of redundancy that any relevant FD can describe is the one where data is determined by a key of the relation, and nothing else.

Since a key value is unique to each tuple, there is no redundancy in a schema in BCNF.


Boyce-Codd Normal Form (Cont.)

Example schema that is not in BCNF:

<ProfDept(ID,name,salary,dept_name,building,budget), F>

The set of functional dependencies is

• F = { ID -> name,salary,dept_name,building,budget,

dept_name -> building,budget }

The only key is ID. The second dependency violates the BCNF condition, because dept_name is not a superkey.

Suppose we decompose the relation schema into:

<Prof(ID,name,salary,dept_name), ID -> name,salary,dept_name>

<Dept(dept_name,building,budget), dept_name -> building,budget>

The above two schemas are in BCNF. The decomposition is also lossless.
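The violation in this example can be found by testing each listed dependency against the superkey condition. The sketch below is illustrative only: `bcnf_violations` is a hypothetical helper, FDs are represented as pairs of tuples of attribute names, and the check inspects only the dependencies that are written down rather than every dependency in F+.

```python
def closure(attrs, fds):
    """Closure of attrs under fds (list of (lhs, rhs) pairs of attribute-name tuples)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(universe, fds):
    """Return the non-trivial listed FDs whose left side is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        if not trivial and closure(lhs, fds) != set(universe):
            bad.append((lhs, rhs))
    return bad

prof_dept = {"ID", "name", "salary", "dept_name", "building", "budget"}
F = [
    (("ID",), ("name", "salary", "dept_name", "building", "budget")),
    (("dept_name",), ("building", "budget")),
]
print(bcnf_violations(prof_dept, F))
# [(('dept_name',), ('building', 'budget'))]

prof = {"ID", "name", "salary", "dept_name"}
dept = {"dept_name", "building", "budget"}
print(bcnf_violations(prof, [(("ID",), ("name", "salary", "dept_name"))]))  # []
print(bcnf_violations(dept, [(("dept_name",), ("building", "budget"))]))    # []
```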


Minimal cover

Computing the closure of F can be very hard and time consuming, as it can contain exponentially many FDs.

We solve the issue by focusing on a simpler equivalent version of F.

Consider a set F of functional dependencies. A minimal cover of F is a set of functional dependencies Fmin such that:

• Fmin+ = F+ (i.e., the two sets are equivalent)

• All functional dependencies in Fmin are of the form X -> A, with a single attribute A on the right side

• If we remove an FD from Fmin, or an attribute from the left side of an FD in Fmin, then Fmin is no longer equivalent to F, i.e., Fmin+ ≠ F+

So, Fmin contains a minimal amount of "information" to describe all the FDs implied by F.


Minimal cover

One can prove that a schema <R(U), F> is in BCNF iff the BCNF conditions are satisfied by Fmin. So, we can focus on Fmin.

How do we compute a minimal cover of F?

• Step 1: Normalize each X -> ABC… in F into X -> A, X -> B, X -> C, …

• Step 2: Until nothing more changes, if there is an FD XA -> B with A ∈ X+, then A is redundant on the left side and can be removed (leaving X -> B).

• Step 3: If there is an FD X -> A such that X+ contains A even when X -> A itself is not used to construct X+, then X -> A is redundant and can be removed.


Minimal cover

Example. <R(A,B,C,D,E), F>, with

F = { A -> BCE, CDB -> A, CD -> E, E -> B }

First, normalize:

A -> B, A -> C, A -> E, CDB -> A, CD -> E, E -> B

Remove left attributes:

CD+ = CDEB, so B is redundant in CDB -> A, which becomes CD -> A

C+ = C (nothing to do)

D+ = D (nothing to do)

Remove redundant FDs:

A can derive B without using A -> B: it can derive ACEB. So A -> B is redundant.

CD can derive E without using CD -> E: from CD -> A and A -> E. So CD -> E is redundant as well. No other FDs are redundant.

Fmin = { A -> C, A -> E, CD -> A, E -> B }
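The three steps for computing a minimal cover can be sketched in code. This is one possible rendering, assuming single-letter attribute names; `minimal_cover` is a hypothetical function name, and the order in which redundant FDs are tested is a choice of this sketch (a different order can, in general, yield a different minimal cover). Run on the example above, the sketch also drops CD -> E, since E is derivable from CD through CD -> A and A -> E.

```python
def closure(attrs, fds):
    """Closure of attrs under fds (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: one attribute on each right side.
    cover = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    # Step 2: drop an attribute b from a left side when the rest already determines b.
    changed = True
    while changed:
        changed = False
        for i, (lhs, a) in enumerate(cover):
            for b in sorted(lhs):
                rest = lhs - {b}
                if rest and b in closure(rest, cover):
                    cover[i] = (rest, a)
                    changed = True
                    break
            if changed:
                break
    cover = list(dict.fromkeys(cover))  # remove duplicates, keep order
    # Step 3: drop an FD when its right side is derivable without it.
    for fd in list(cover):
        others = [g for g in cover if g != fd]
        if fd[1] in closure(fd[0], others):
            cover = others
    return cover

F = [("A", "BCE"), ("CDB", "A"), ("CD", "E"), ("E", "B")]
for lhs, a in minimal_cover(F):
    print("".join(sorted(lhs)), "->", a)
# A -> C
# A -> E
# CD -> A
# E -> B
```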


Algorithm for BCNF decomposition

Algorithm: BCNF decomposition

• Input: <R(U), F> (where F is a minimal cover)

• Output: a decomposition of <R(U), F> in BCNF that is lossless

Choose some FD X -> A in F that violates the BCNF conditions.

Compute Y = X+ ∖ X and Z = U ∖ XY

Construct the two relation schemas:

• <R1(XY), (Π_XY(F+))min>, <R2(XZ), (Π_XZ(F+))min>

Here Π_S(F+) denotes the FDs in F+ that mention only attributes of S, and (…)min denotes a minimal cover.

If one of the two schemas is not yet in BCNF, decompose it again.


Algorithm for BCNF decomposition

Example. <R(A,B,C), F = {A -> B, B -> C}>

F is already a minimal cover (nothing to remove).

The only key is A (because it is the only minimal set of attributes functionally determining all the others).

A -> B satisfies the BCNF condition, but B -> C does not.

So, we compute all attributes that B can derive: B+ = BC

We now split R into two relations:

• one relation has attributes B+ = BC,

• the other has all the remaining attributes (A), plus B;

• B must stay in both, to allow the two relations to join.

<R1(B,C), B -> C>, <R2(A,B), A -> B>

Both schemas are in BCNF. Done.
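The decomposition procedure can be sketched recursively. This is a simplified illustration, not a definitive implementation: it assumes single-letter attribute names, the helper names (`project_fds`, `bcnf_decompose`) are hypothetical, the projected dependencies are computed by brute force over all attribute subsets and are not reduced to a minimal cover, and only the attribute sets of the resulting schemas are returned.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds (list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def project_fds(attrs, fds):
    """FDs implied by fds that mention only attrs (brute force, not minimised)."""
    projected = []
    for size in range(1, len(attrs)):
        for combo in combinations(sorted(attrs), size):
            lhs = frozenset(combo)
            rhs = (closure(lhs, fds) & attrs) - lhs
            if rhs:
                projected.append((lhs, frozenset(rhs)))
    return projected

def bcnf_decompose(attrs, fds):
    """Return a list of attribute sets, one per relation of the decomposition."""
    attrs = frozenset(attrs)
    for lhs, rhs in fds:
        x = frozenset(lhs)
        x_plus = closure(x, fds) & attrs
        # Violation: the FD is non-trivial and X is not a superkey.
        if not frozenset(rhs) <= x and x_plus != attrs:
            y = x_plus - x          # Y = X+ \ X
            z = attrs - x_plus      # Z = U \ XY
            r1, r2 = x | y, x | z   # X stays in both so the parts can be joined
            return (bcnf_decompose(r1, project_fds(r1, fds))
                    + bcnf_decompose(r2, project_fds(r2, fds)))
    return [attrs]

F = [("A", "B"), ("B", "C")]
for schema in bcnf_decompose("ABC", F):
    print("".join(sorted(schema)))
# BC
# AB
```

Each recursive call works on a strictly smaller attribute set (both Y and Z are non-empty when a violation is found), so the recursion terminates.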