Schema Refinement SHIRAJ MOHAMED M | MIS 1. Learning Objectives Identify update, insertion and...


Schema Refinement


Learning Objectives

- Identify update, insertion and deletion anomalies
- Identify possible keys given an instance
- Identify possible functional dependencies in a relation
- Determine all keys in a schema


What is Schema Refinement?

Schema Refinement is the study of what should go where in a DBMS, or, which schemas are best to describe an application.

For example, consider this schema:

EmpDept
EID   Name   DeptID  DeptName
A01   Ali    12      Wing
A12   Eric   10      Tail
A13   Eric   12      Wing
A03   Tyler  12      Wing

Versus this one:

Emp
EID   Name   DeptID
A01   Ali    12
A12   Eric   10
A13   Eric   12
A03   Tyler  12

Dept
DeptID  DeptName
12      Wing
10      Tail

Which schema do you think is best? Why?


What’s wrong?*

The first problem students usually identify with the EmpDept schema is that it combines two different ideas: employee information and department information. But what is wrong with this?

1. If we separated the two concepts we could save space.

2. Combining the two ideas leads to some bad anomalies.

These two problems occur because DeptID determines DeptName, but DeptID is not a key. Let’s look into the anomalies further.


Anomalies, Redundancy*

What anomalies are associated with EmpDept?

Update Anomalies: If the Wing department changes its name, we must change multiple rows in EmpDept.

Insertion Anomalies: If a department has no employees, where do we store its name?

Deletion Anomalies: If A12 Eric quits, the information about the Tail department will be lost.

EmpDept
EID   Name   DeptID  DeptName
A01   Ali    12      Wing
A12   Eric   10      Tail
A13   Eric   12      Wing
A03   Tyler  12      Wing
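The update and deletion anomalies can be reproduced on the EmpDept instance with a few lines of Python. This is only an illustrative sketch: the list-of-dicts representation and the variable names are assumptions, not part of the slides.

```python
# EmpDept instance from the slide, one dict per row
# (this in-memory representation is an assumption made for illustration).
emp_dept = [
    {"EID": "A01", "Name": "Ali",   "DeptID": 12, "DeptName": "Wing"},
    {"EID": "A12", "Name": "Eric",  "DeptID": 10, "DeptName": "Tail"},
    {"EID": "A13", "Name": "Eric",  "DeptID": 12, "DeptName": "Wing"},
    {"EID": "A03", "Name": "Tyler", "DeptID": 12, "DeptName": "Wing"},
]

# Update anomaly: renaming department 12 touches every employee row in it.
rows_to_update = [r for r in emp_dept if r["DeptID"] == 12]
print(len(rows_to_update))  # 3

# Deletion anomaly: removing A12 removes the only row that mentions Tail.
after_delete = [r for r in emp_dept if r["EID"] != "A12"]
known_depts = {r["DeptName"] for r in after_delete}
print("Tail" in known_depts)  # False
```

The insertion anomaly does not show up as code here: a department without employees simply has no row to live in unless EID is left empty.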


Practice Anomalies, Redundancies*

Identify anomalies associated with this schema. Include update, insertion and deletion anomalies.

EnrollStud(StudID, ClassID, Grade, ProfID, StudName)

Why do these anomalies occur?


Practice Anomalies, Redundancies*

Update Anomaly: If a student changes his name, we must change every row for a class the student has taken. If a class changes its ProfID, we must change it in every row in which the class appears.

Insertion Anomaly: If a student has not taken a class, where do we store her name? If a class has no student grades recorded yet, where do we store its ProfID?

Deletion Anomaly: If a student drops her last course, the information about the student’s name will be lost. If the last student drops the course, the information about the ProfID will be lost.


Decomposition: A good solution

The intergalactic standard solution to the redundancy problem is to decompose redundant schemas, e.g., EmpDept becomes

Emp
EID   Name   DeptID
A01   Ali    12
A12   Eric   10
A13   Eric   12
A03   Tyler  12

Dept
DeptID  DeptName
12      Wing
10      Tail

The secret to understanding when and how to decompose schemas is Functional Dependencies, a generalization of keys.

When we say "X determines Y" we are stating a functional dependency.
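The decomposition of EmpDept into Emp and Dept can be sketched as two projections, followed by a join on DeptID to check that nothing was lost. The tuple layout and variable names below are assumptions made for illustration.

```python
# EmpDept rows as (EID, Name, DeptID, DeptName) tuples (layout is an assumption).
emp_dept = [
    ("A01", "Ali", 12, "Wing"),
    ("A12", "Eric", 10, "Tail"),
    ("A13", "Eric", 12, "Wing"),
    ("A03", "Tyler", 12, "Wing"),
]

# Project onto the two smaller schemas; using sets drops duplicate rows.
emp = {(eid, name, dept_id) for eid, name, dept_id, _ in emp_dept}
dept = {(dept_id, dept_name) for _, _, dept_id, dept_name in emp_dept}

# Joining Emp and Dept on DeptID gives back exactly the original rows.
rejoined = {
    (eid, name, dept_id, dept_name)
    for eid, name, dept_id in emp
    for other_id, dept_name in dept
    if dept_id == other_id
}

print(sorted(dept))                # [(10, 'Tail'), (12, 'Wing')]
print(rejoined == set(emp_dept))   # True
```

Each department name is now stored once, and the join recovers EmpDept because DeptID determines DeptName.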


Review Keys

Note that EID being a key* of EmpDept means that the values of EID are unique, and EID is minimal.

Remember: you cannot determine keys from an instance, only from “natural” information or from a domain expert.

Let’s practice keys by identifying possible keys in an instance.

*sometimes called a candidate key

EmpDept
EID   Name   DeptID  DeptName
A01   Ali    12      Wing
A12   Eric   10      Tail
A13   Eric   12      Wing
A03   Tyler  12      Wing


Identify Possible Keys*

Identify all possible keys based on this instance:

Time     Flight  Plane  Origin  Destination
9:57AM   157     abc    SEA     PDX
10:42AM  233     def    PDX     SEA
11:44AM  155     des    ORD     ATL
12:44PM  244     xdy    ATL     PDX
1:43PM   074     xyz    SEA     ATL
2:44PM   233     def    PDX     ATL
3:55PM   455     eff    MSP     SEA
5:44PM   120     ikk    MSP     PDX
7:55PM   233     abf    CHI     SEA



Possible keys are: {Time}, {Plane, Destination}, {Origin, Destination}
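The possible keys of an instance can also be found mechanically, by looking for minimal column sets whose values never repeat. The sketch below assumes the flight instance is stored as a list of tuples; the function name `possible_keys` is hypothetical.

```python
from itertools import combinations

COLUMNS = ("Time", "Flight", "Plane", "Origin", "Destination")
FLIGHTS = [
    ("9:57AM", "157", "abc", "SEA", "PDX"),
    ("10:42AM", "233", "def", "PDX", "SEA"),
    ("11:44AM", "155", "des", "ORD", "ATL"),
    ("12:44PM", "244", "xdy", "ATL", "PDX"),
    ("1:43PM", "074", "xyz", "SEA", "ATL"),
    ("2:44PM", "233", "def", "PDX", "ATL"),
    ("3:55PM", "455", "eff", "MSP", "SEA"),
    ("5:44PM", "120", "ikk", "MSP", "PDX"),
    ("7:55PM", "233", "abf", "CHI", "SEA"),
]

def possible_keys(columns, rows):
    """Return the minimal column sets whose values are unique in this instance."""
    keys = []
    for size in range(1, len(columns) + 1):
        for idxs in combinations(range(len(columns)), size):
            # A superset of a key already found is unique but not minimal.
            if any(set(k) <= set(idxs) for k in keys):
                continue
            projected = {tuple(row[i] for i in idxs) for row in rows}
            if len(projected) == len(rows):
                keys.append(idxs)
    return [tuple(columns[i] for i in k) for k in keys]

print(possible_keys(COLUMNS, FLIGHTS))
# [('Time',), ('Plane', 'Destination'), ('Origin', 'Destination')]
```

These are only possible keys: a later row could still repeat a Time or an (Origin, Destination) pair, which is why keys cannot be determined from an instance.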


Functional Dependencies

A key like EID has another property: If two rows have the same EID, then they have the same value of every other attribute. We say EID functionally determines all other attributes and write this Functional Dependency (FD):

EID -> Name, DeptID, DeptName

Is Name -> DeptID true?

No, because rows 2 and 3 have the same Name but not the same DeptID.

EmpDept
EID   Name   DeptID  DeptName
A01   Ali    12      Wing
A12   Eric   10      Tail
A13   Eric   12      Wing
A03   Tyler  12      Wing
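Whether an instance violates an FD can be checked by comparing rows that agree on the left-hand side. The following is a minimal sketch; the helper name `fd_holds` and the list-of-dicts representation are assumptions.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs (in this instance only)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for x; any later mismatch is a violation.
        if seen.setdefault(x, y) != y:
            return False
    return True

emp_dept = [
    {"EID": "A01", "Name": "Ali",   "DeptID": 12, "DeptName": "Wing"},
    {"EID": "A12", "Name": "Eric",  "DeptID": 10, "DeptName": "Tail"},
    {"EID": "A13", "Name": "Eric",  "DeptID": 12, "DeptName": "Wing"},
    {"EID": "A03", "Name": "Tyler", "DeptID": 12, "DeptName": "Wing"},
]

print(fd_holds(emp_dept, ["EID"], ["Name", "DeptID", "DeptName"]))  # True
print(fd_holds(emp_dept, ["Name"], ["DeptID"]))                     # False
```

A result of False proves the FD does not hold. A result of True only says that this particular instance does not contradict it.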


Functional Dependencies, ctd.

Do you see any more FDs in EmpDept?

Yes, the FD DeptID -> DeptName

DEFINITION: If A and B are sets of attributes in a relation, we say that A (functionally) determines B, or A -> B is a Functional Dependency (FD), if whenever two rows agree on A, they agree on B. In other words, the value of a row on A functionally determines its value on B.

There are two special kinds of FDs:

- Key FDs, X -> A where X contains a key
- Trivial FDs, such as Name -> Name, or Name, DeptID -> DeptID

EmpDept
EID   Name   DeptID  DeptName
A01   Ali    12      Wing
A12   Eric   10      Tail
A13   Eric   12      Wing
A03   Tyler  12      Wing


Identify (natural) FDs*

What are the (natural) FDs in these relations? Identify the key FDs but ignore trivial FDs.

Customer(CustID, Address, City, Zip, State)

EnrollStud(StudID, ClassID, Grade, ProfID, StudName, ProfName)



Customer(CustID, Address, City, Zip, State)

- CustID -> Address, City, Zip, State. This is a key FD.
- Address, City, State -> Zip
- Zip -> State

EnrollStud(StudID, ClassID, Grade, ProfID, StudName, ProfName)

- {StudID, ClassID} -> Grade, ProfID, StudName, ProfName. This is a key FD.
- StudID -> StudName
- ClassID -> ProfID, ProfName
- ProfID -> ProfName


What are FDs?

- An FD is a generalization of the concept of key.
- FDs, like keys and foreign keys, are a kind of integrity constraint (IC).
- Like other ICs, FDs are part of a relation’s schema.

For example, a schema might be:

Assigned(EmpID Int,
         JobID Int,
         EmpName varchar(20),
         percent real,
         EmpID references… ,
         JobID references…,
         PRIMARY KEY (EmpID, JobID))

FDs: EmpID -> EmpName


How to determine FDs

So far we have dealt with “natural” FDs.

Sometimes it’s not clear what FDs apply in a relation, e.g., zip codes vs cities, or Supplier(Name, Address, Crating, Discount), where it is unclear what the FDs are.

There are two ways to determine FDs:

- Infer them as “natural” FDs from your experience.
- You may be given them as part of the schema, by the instructor or by the customer.

As with keys, you cannot determine FDs from an instance! But you can tell if something is not an FD.


LO8.3: Identify Possible FDs*

Identify two possible non-key FDs based on this instance (the same instance used for Identify Possible Keys). Remember the possible keys for this instance are {Time}, {Plane, Destination}, {Origin, Destination}.

Time     Flight  Plane  Origin  Destination
9:57AM   157     abc    SEA     PDX
10:42AM  233     def    PDX     SEA
11:44AM  155     des    ORD     ATL
12:44PM  244     xdy    ATL     PDX
1:43PM   074     xyz    SEA     ATL
2:44PM   233     def    PDX     ATL
3:55PM   455     eff    MSP     SEA
5:44PM   120     ikk    MSP     PDX
7:55PM   233     abf    CHI     SEA



Possible FDs are Plane -> Flight and Plane -> Origin
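The same answer can be reached by brute force: try every pair of single attributes and keep the FDs that the instance does not contradict, skipping left-hand sides that are already possible keys. This is a sketch under assumptions; the names `fd_holds` and `single_attribute_fds` are hypothetical, and only single-attribute FDs are searched.

```python
COLUMNS = ("Time", "Flight", "Plane", "Origin", "Destination")
FLIGHTS = [
    ("9:57AM", "157", "abc", "SEA", "PDX"),
    ("10:42AM", "233", "def", "PDX", "SEA"),
    ("11:44AM", "155", "des", "ORD", "ATL"),
    ("12:44PM", "244", "xdy", "ATL", "PDX"),
    ("1:43PM", "074", "xyz", "SEA", "ATL"),
    ("2:44PM", "233", "def", "PDX", "ATL"),
    ("3:55PM", "455", "eff", "MSP", "SEA"),
    ("5:44PM", "120", "ikk", "MSP", "PDX"),
    ("7:55PM", "233", "abf", "CHI", "SEA"),
]

def fd_holds(rows, i, j):
    """True if column i determines column j in this instance."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[i], row[j]) != row[j]:
            return False
    return True

def single_attribute_fds(columns, rows):
    """List possible non-key FDs a -> b with one attribute on each side."""
    found = []
    for i, a in enumerate(columns):
        if len({row[i] for row in rows}) == len(rows):
            continue  # a is a possible key, so a -> b would be a key FD
        for j, b in enumerate(columns):
            if i != j and fd_holds(rows, i, j):
                found.append((a, b))
    return found

print(single_attribute_fds(COLUMNS, FLIGHTS))
# [('Plane', 'Flight'), ('Plane', 'Origin')]
```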


Reasoning about FDs

EmpDept(EID, Name, DeptID, DeptName)

Two natural FDs are EID -> DeptID and DeptID -> DeptName.

These two FDs imply the FD EID -> DeptName, because if two tuples agree on EID, then by the first FD they agree on DeptID, and then by the second FD they agree on DeptName.

The set of FDs implied by a given set F of FDs is called the closure of F and is denoted F+.


Armstrong’s Axioms

The closure of F can be computed using these axioms:

- Reflexivity: If X ⊇ Y, then X -> Y
- Augmentation: If X -> Y, then XZ -> YZ for any Z
- Transitivity: If X -> Y and Y -> Z, then X -> Z

Armstrong’s axioms are sound (they generate only FDs in F+ when applied to FDs in F) and complete (repeated application of these axioms will generate all FDs in F+).


Determining Keys

In order to determine if X is a key of a relation R, use this algorithm, which computes the attribute closure of X:

AttClos = X;   // Note: X is a set of attributes
Repeat until there is no change:
    If there is an FD U -> V with U ⊆ AttClos,
    then set AttClos = AttClos ∪ V

AttClos = R if and only if X contains a key (X is a superkey). X is a key if, in addition, no proper subset of X has this property.
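The attribute closure algorithm above translates almost line for line into Python. This is a minimal sketch: the function name `attribute_closure` and the representation of FDs as (left side, right side) pairs of sets are assumptions, and the FDs used are the EmpDept ones from the earlier slides.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    closure = set(attrs)
    changed = True
    while changed:  # repeat until there is no change
        changed = False
        for lhs, rhs in fds:
            # Apply U -> V when U is inside the closure and V adds something new.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

R = {"EID", "Name", "DeptID", "DeptName"}
FDS = [
    ({"EID"}, {"Name", "DeptID"}),
    ({"DeptID"}, {"DeptName"}),
]

print(attribute_closure({"EID"}, FDS) == R)      # True
print(attribute_closure({"DeptID"}, FDS) == R)   # False
print(sorted(attribute_closure({"DeptID"}, FDS)))  # ['DeptID', 'DeptName']
```

The closure of {EID} is all of R, so EID is a key of EmpDept, while DeptID only reaches DeptName.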


Determining the keys of R*

Given the schema R(A,B,C,D,E), in which A is a key, and the FDs BC -> A, DE -> C: what are all the keys of this schema?

Hint: any key must include A, BC or DE. Why?


Determining the keys of R*

R(A,B,C,D,E), A is a key, BC -> A, DE -> C.

Any key must include A, BC or DE because otherwise the Attribute Closure algorithm will never get started. Determining the keys will be done in three steps, one for A, one for BC and one for DE.

1. A: it is already a key, so we are done with this step. A is a key.

2. BC determines A, which determines everything else, so we are done. BC is a key.

3. DE -> DEC, dead end, so DE is not a key. Let’s try adding attributes to DE, in alphabetical order.
   - We can’t add A to DE, since the result would not be minimal (A is a key).
   - DEB -> DEBC, which contains the key BC, so DEB is a key.
   - DEC is a dead end, so DEC is not a key, and we can’t add anything to it to make a key (adding A or B would give a set that contains a key we have already seen).

Conclusion: The keys are A, BC and DEB. Notice how systematic we were. You’ll need it for the exercises and homework.
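The result can be cross-checked by brute force: compute the attribute closure of every subset of R, smallest first, and keep the minimal subsets whose closure is all of R. This is a sketch under assumptions, not the systematic method of the slide. The names `attribute_closure` and `candidate_keys` are hypothetical, and "A is a key" is encoded as the FD A -> BCDE.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def candidate_keys(schema, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            # Anything containing a smaller key is a superkey, not a key.
            if any(k <= x for k in keys):
                continue
            if attribute_closure(x, fds) == set(schema):
                keys.append(x)
    return keys

R = "ABCDE"
# Assumption: "A is a key" is written as A -> BCDE; the other two FDs are from the slide.
FDS = [("A", "BCDE"), ("BC", "A"), ("DE", "C")]

print(["".join(sorted(k)) for k in candidate_keys(R, FDS)])
# ['A', 'BC', 'BDE']
```

BDE is the same attribute set as DEB in the conclusion above. Enumerating all subsets is exponential in the number of attributes, so this is only practical for small schemas like this one.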
