Closure

28
Closure The closure of {B 1 …B k } under the set of FDs S, denoted by {B 1 B k } + , is defined as follows: {B 1 …B k } + = {B | any relation satisfies S will also satisfies B 1 …B k B}

description

Closure. The closure of { B 1 …B k } under the set of FDs S , denoted by { B 1 …B k } + , is defined as follows: { B 1 …B k } + = {B | any relation satisfies S will also satisfies B 1 …B k B}. Computing the closure. Given: the set S and { A 1 ,…,A n } - PowerPoint PPT Presentation

Transcript of Closure

Page 1: Closure

Closure

• The closure of {B1…Bk} under the set of FDs S, denoted by {B1…Bk}+, is defined as follows:

{B1…Bk}+ = {B | any relation satisfies S

will also satisfies B1…BkB}

Page 2: Closure

Computing the closure

• Given: the set S and {A1,…,An}• Compute: {A1,…,An}+- denote this set by X• Step 1: X = {A1,…,An}• Step 2: find a FD B1…BkB in S such that

{B1,…,Bk} X and B X, then X=X{B}• Step 3: repeat step 2 until nothing more can

be added to X, then go to step 4• Step 4: return X

Page 3: Closure

Example

• S = {ABC, BCAD, DE, CFB}• Compute {A,B}+ • Step 1: X = {A,B}• Step 2: X = X {C}={A,B,C} because ABC • Step 3 back to step 2: X = X {D} because

BCAD• Step 3 back to step 2: X = X {E} because DE• Step 3 back to step 2: nothing more• Step 3 go to step 4: return {A,B,C,D,E}

Page 4: Closure

Correctness of closure algorithm

• It computes true functional dependencies– proof: show that if B belongs to {A1,…,An}+ then A1…AnB

holds. By induction over the number of steps (n) used in adding an attribute B into the set X

• n=0 then B belongs to {A1,…,An} and so A1…AnB is a trivial functional dependency

• n n+1: if B is added to X in the step n+1, then A1…AnBj for all j by inductive hypothesis; this, together with B1…BkB, implies that A1…AnB

• It computes all functional dependencies– proof: show that if B does not belong to {A1,…,An}+ then A1…

AnB does not hold. By constructing an instance I of the relation R such that the FD does not hold.

A in the closure Others

111 … 111 000 … 000

111 … 111 111 … 111

Page 5: Closure

Simple questions

• What is {A1,…,An}+ if {A1,…,An} is a key of the relation?

• Can {A1,…,An}+ ={A1,…,An}?

• Does {B1,…,Bm} {A1,…,An} imply {B1,…,Bm}+ {A1,…,An}+?

Page 6: Closure

Transitive Rules

• Given A1…An B1…Bm

B1…Bm C1…Ck

• then A1…An C1…Ck

Page 7: Closure

Closing sets of FDs

• Given a set of FDs we can derive some other FDs using the rules about FDs (e.g. combining, splitting, and transitive)

• For a relation R, a set of FD is called a basis for R if all other FDs of R can be derived form it.

• A basis is minimal if none of its proper subsets is a basis.

Page 8: Closure

Projecting FDsB’s

R S• Given: – R with a set of FDs F– S (a new relation) is obtained by removing the attributes {B1,…,Bm} from

R • Questions: What are the FDs of S?• Answer: if A1…AnC1…Ck is a FD of R and none of the Bs appears

on the left or right side ({B1,…,Bm}{A1,…,An,C1,…,Ck}=) is a FD of S

Page 9: Closure

Projecting - Example

• Given R(A,B,C,D) with the FDs AB, BC, and CD.

• Remove the attribute B from R, we obtain a new relation S(A,C,D).

• What are the FDs of S?– AC?– AD?– CD? We can compute this by:

•Compute all the closure of every subset of {A,C,D}by using the FDs of R that do not contain B.

Page 10: Closure

Homework

3.5.1 Consider a relation with schema R(A,B,C,D) and FD’s ABC, CD, and DA.– What are all the nontrivial FD’s that follow from the given

FD’s? List only the FDs with one attribute on the right? (5pt)

– What are the keys of R? (5pt)

– What are the superkeys but not keys? (5pt)

3.5.3 Show that the following rule holds: (5pt)

if A1…AnB1…Bm and C1…CkD1…Dt hold

then A1…AnC1…Ck B1…BmD1…Dt also holds.

Page 11: Closure

For those whole like fun:

3.5.4 Does the following hold:– if AB then BA– if ABC and AC then BC

3.5.8 A set of attributes is closed if X+=X. What are the FDs of a relation R(A,B,C,D) if– all sets of four attributes are closed– the only closed sets are {} and {A,B,C,D}– the closed sets are {}, {A,B}, {A,B,C,D}

(note: the cases are considered separate)Stars: try the exercises with stars.

Page 12: Closure

Design of Relational Database Schema

title year length studioName starName filmType

Star Wars 1977 124 Fox Mark Hamill color

Star Wars 1977 124 Fox Harrison Ford color

Star Wars 1977 124 Fox Carrie Fisher color

Mighty Ducks 1991 104 Disney Emilio Estevez color

Wayne’s World 1992 95 Paramount Dana Carvey color

Wayne’s World 1992 95 Paramount Mike Meyers color

Some observations:• value of studioName is the same in several tuples • value of filmType is also repeated

What wrong with it?• redundancy store the same value unnecessary several time• update anormalies an update might require several changes • deletion anormalies losing information if delete a value

CAN WE AVOID THESE ANORMALIES?

Page 13: Closure

Possible ways to avoid anormalies (Intuition)

• The bad way: start again (Oh, no!)

• The natural way: try to decompose the given relation into two or more relations that– contain the same information– avoid the anormalies

Page 14: Closure

Exampletitle year length studioName starName filmType

Star Wars 1977 124 Fox Mark Hamill color

Star Wars 1977 124 Fox Harrison Ford color

Star Wars 1977 124 Fox Carrie Fisher color

Mighty Ducks 1991 104 Disney Emilio Estevez color

Wayne’s World 1992 95 Paramount Dana Carvey color

Wayne’s World 1992 95 Paramount Mike Meyers color

title year length studioName filmType

Star Wars 1977 124 Fox color

Mighty Ducks

1991 104 Disney color

Wayne’s World

1992 95 Paramount color

title year starName

Star Wars 1977 Mark Hamill

Star Wars 1977 Harrison Ford

Star Wars 1977 Carrie Fisher

Mighty Ducks 1991 Emilio Estevez

Wayne’s World 1992 Dana Carvey

Wayne’s World 1992 Mike Meyers

MovieStudioStar(title, year, length, studioName, starName, filmType)is decomposed into 2 relations MovieStudio(title, year, length, studioName, filmType) and StarsIn(title, year, starName)

Page 15: Closure

Decomposition

Given a relation R with schema {A1,…,An}. A decomposition of R into two relations S and T with schemas {B1,…,Bm} and {C1,…,Ck}, respectively, such that

1. {A1,…,An} = {B1,…,Bm} {C1,…,Ck}

2. The tuples in S are the projections onto {B1,…,Bm} of all the tuples in R.

3. The tuples in T are the projections onto {C1,…,Ck} of all the tuples in R.

Page 16: Closure

Example – Projectionstitle year length studioName starName filmType

Star Wars 1977 124 Fox Mark Hamill color

Star Wars 1977 124 Fox Harrison Ford color

Star Wars 1977 124 Fox Carrie Fisher color

Mighty Ducks 1991 104 Disney Emilio Estevez color

Wayne’s World 1992 95 Paramount Dana Carvey color

Wayne’s World 1992 95 Paramount Mike Meyers color

title year length studioName filmType

Star Wars 1977 124 Fox color

Mighty Ducks

1991 104 Disney color

Wayne’s World

1992 95 Paramount color

title year starName

Star Wars 1977 Mark Hamill

Star Wars 1977 Harrison Ford

Star Wars 1977 Carrie Fisher

Mighty Ducks 1991 Emilio Estevez

Wayne’s World 1992 Dana Carvey

Wayne’s World 1992 Mike Meyers

MovieStudioStar(title, year, length, studioName, starName, filmType)is decomposed into 2 relations MovieStudio(title, year, length, studioName, filmType) and StarsIn(title, year, starName)

How do we come up with this decomposition?

Page 17: Closure

Boyce-Codd Normal Form (BCNF)

• BCNF: a relation R is in BCNF iff: whenever there is a nontrivial FD A1…AnB for R, it is the case that {A1,…,An} is a superkey for R.

• Why this definition? Answer: if a relation is in BCNF then there is no anormaly.

• Example: MovieStudioStar(title, year, length, studioName, starName, filmType): not in BCNFMovieStudio(title, year, length, studioName, filmType): in BCNF StarsIn(title, year, starName): in BCNF

Page 18: Closure

Decomposition into BCNF

• Suppose that we decompose a relation R into two relations S and T which are in BCNF. The requirements for S and T:– S and T is a decomposition of R– it is possible to reconstruct R from S and T

• Will every decomposition of R satisfy these two conditions?

• What are the FDs of the new relations?

Page 19: Closure

Algorithm

• Given a relation R with the attributes {A1,…,An}.• Step 1: For every nontrivial FD B1…BmB if {B1,…,Bm}

is a superkey then returns R (no decomposition is needed)• Step 2: Takes a nontrivial FD B1…BmB such that {B1,

…,Bm} is not a superkey, then decomposes R into two relations S and T with the following schema:– S’s schema: {B1,…,Bm}+

– T’s schema: {B1,…,Bm} ({A1,…,An}\{B1,…,Bm}+)

• Repeat Step 1&2 for S and T until no decomposition is needed for every new relation; return the set of new relations as the result

Page 20: Closure

Example

The ‘new’ movie relation with the following attributes: {title,year,studioName,president,presAddress} (we call this set ALL)

with the FDs: {title yearstudioName, studioNamepresident, presidentpresAddress}Only one key: {title,year}studioNamepresident violated BCNFStep 2: takes studioNamepresident, decomposes into

– S with the schema {studioName}+={studioName,president,presAddress}– T with the schema

{studioName,title,year}={studioName} (ALL\ {studioName}+)Check: {studioName,title,year} is in BCNF (the first two FDs)

{studioName,president,presAddress} is not in BCNFContinue with the decomposition of S using presidentpresAddress and we get the

following two relation schemas: {president,presAddress} and {president,studioName} both are in BCNF.

The final result: {studioName,title,year}, {president,presAddress},{president,studioName}

Page 21: Closure

Recovering information from a decomposition

• Suppose that R with the schema {A1,…,An} is decomposed into two relations S and T according to the algorithm whose attributes are {B1,…,Bm}+ and {B1,…,Bm} ({A1,…,An}\{B1,…,Bm}+)

• The tuples of R can be obtained by joining all possible pairs of S and T where {B1,…,Bm} have the same values.

Page 22: Closure

Recovering …

t (R)

t’ (S)

t’’ (T)

the B’sthe rest of the closure others

Pro

ject

ion

Join

{B1,…,Bm}

{A1,…,An}\{B1,…,Bm}+{B1,…,Bm}+\ {B1,…,Bm}

Page 23: Closure

Example – Decomposition and Recoveringtitle year length studioName starName filmType

Star Wars 1977 124 Fox Mark Hamill color

Star Wars 1977 124 Fox Harrison Ford color

Star Wars 1977 124 Fox Carrie Fisher color

Mighty Ducks 1991 104 Disney Emilio Estevez color

Wayne’s World 1992 95 Paramount Dana Carvey color

Wayne’s World 1992 95 Paramount Mike Meyers color

title year length studioName filmType

Star Wars 1977 124 Fox color

Mighty Ducks

1991 104 Disney color

Wayne’s World

1992 95 Paramount color

title year starName

Star Wars 1977 Mark Hamill

Star Wars 1977 Harrison Ford

Star Wars 1977 Carrie Fisher

Mighty Ducks 1991 Emilio Estevez

Wayne’s World 1992 Dana Carvey

Wayne’s World 1992 Mike Meyers

MovieStudioStar(title, year, length, studioName, starName, filmType) is not in BCNFis decomposed into 2 relations that are in BCNF:MovieStudio(title, year, length, studioName, filmType) and StarsIn(title, year, starName)

Page 24: Closure

Some remarks• The algorithm will stop and output a set of BCNF relations.• Not every decomposition according to the algorithm is good • The FD’s for the new relations are determined by ‘projecting’.• If a decomposition is based on FDs (according to the algorithm)

then the recovering process will give us exactly the original relation.

• If a decomposition is not based on FDs then we might not be able to recover the original relation from the new ones: – Example: R(A,B,C) with AB and we decompose it into S(A,B) and

T(B,C):

A B C

1 2 3

4 2 5

A B

1 2

4 2

B C

2 3

2 5

A B C

1 2 3

1 2 5

4 2 3

4 2 5

Page 25: Closure

Third Normal Form (3NF)

• So far: if a relation is not in BCNF then anormalies arise.

• Given a relation Bookings with the attributes:– title: name of the movie

– theater: name of the theater where the movie is being shown

– city: the city where the theater is located

(a tuple (m,t,c): represents the fact that movie m is shown at theater t in city c)

Page 26: Closure

Bookings(title,theater,city)

• The FDs of the relations:– theater city

– title city theater

• theatercity violates the BCNF condition, why?• decomposition yields: {theater,city} and {theater,title} • Consider the relations:

theater city

Guild Menlo

Park Menlo

theater title

Guild Net

Park Net

theater title city

Guild Net Menlo

Park Net Menlo

recoveringViolate the FD title citytheater

Possible relations according to the FDsof each schema

Page 27: Closure

3NF

• A relaxation of the BCNF condition: a relation R is in 3NF if: whenever there is a nontrivial FD A1…AnB, either {A1,…,An} is a superkey or B is a member of some key.

• Bookings(title,theater,city) is in 3NF

Page 28: Closure

Checking BCNF and 3NF• Given R(A,B,C,D) with FDs ABC, CD, DA.• Question: Indicate the BCNF violations and 3NF violations.• Steps in answering the question:

– Step 1: compute all nontrivial FDs (right side: one att)– Step 2: find all keys– Step 3: find all the violations

• Step 1: ABC, CD, DA, ABD, CA, DBC, ACD• Step 2: Keys – {A,B}, {C,B}, and {D,B}• Step 3:

– BCNF violation: CD, DA, CA, ACD and their trivial extensions (e.g. CDD, DAA,…)

– 3NF violation: none