Lecture 09: Functional Dependencies, Database Design
1
Lecture 09: Functional Dependencies,
Database Design
Monday, January 24, 2005
2
Outline
• Functional dependencies (3.4)
• Rules about FDs (3.5)
• Design of a Relational schema (3.6)
3
Functional Dependencies
Definition: A1, ..., Am → B1, ..., Bn holds in R if:
∀ t, t’ ∈ R, (t.A1 = t’.A1 ∧ ... ∧ t.Am = t’.Am ⇒ t.B1 = t’.B1 ∧ ... ∧ t.Bn = t’.Bn)
[Figure: relation R with columns A1 ... Am, B1 ... Bn and two rows t, t’]
If t, t’ agree on A1 ... Am, then t, t’ agree on B1 ... Bn.
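The definition can be read as a check over all pairs of tuples. The following is a minimal Python sketch of that reading; the function name fd_holds and the three-row relation are made up for illustration and are not from the lecture.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts)."""
    for t, u in combinations(rows, 2):
        agree_lhs = all(t[a] == u[a] for a in lhs)
        agree_rhs = all(t[b] == u[b] for b in rhs)
        # A violation is a pair that agrees on the left side but not on the right.
        if agree_lhs and not agree_rhs:
            return False
    return True

# Hypothetical relation R(A, B, C), invented for this sketch.
R = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "x", "C": 20},
    {"A": 2, "B": "y", "C": 10},
]
print("A -> B holds:", fd_holds(R, ["A"], ["B"]))
print("A -> C holds:", fd_holds(R, ["A"], ["C"]))
```

The pairwise loop is quadratic in the number of tuples; it is written this way only because it mirrors the definition directly.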
4
In Class: Find All FDs
Student Dept Course Room
Alice CSE C++ 020
Bob CSE C++ 020
Alice EE HW 040
Carol CSE DB 045
Dan CSE Java 050
Elsa CSE DB 045
Frank EE Circuits 020
5
Answer
Course → Dept, Room
Dept, Room → Course
Student, Dept → Course, Room
Student, Course → Dept, Room
Student, Room → Dept, Course
Do all FDs make sense in practice?
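The listed answers can be checked mechanically against the seven rows of the in-class table. This is a sketch only: the helper fd_holds and the tuple encoding of the table are assumptions of this example. Note that a check on one instance only shows that these rows do not violate an FD, which is why the question of whether an FD makes sense in practice remains.

```python
COLUMNS = ("Student", "Dept", "Course", "Room")
ROWS = [
    ("Alice", "CSE", "C++", "020"),
    ("Bob", "CSE", "C++", "020"),
    ("Alice", "EE", "HW", "040"),
    ("Carol", "CSE", "DB", "045"),
    ("Dan", "CSE", "Java", "050"),
    ("Elsa", "CSE", "DB", "045"),
    ("Frank", "EE", "Circuits", "020"),
]

def fd_holds(lhs, rhs):
    """Check lhs -> rhs by remembering the rhs value seen for each lhs value."""
    idx = {name: i for i, name in enumerate(COLUMNS)}
    seen = {}
    for row in ROWS:
        key = tuple(row[idx[a]] for a in lhs)
        val = tuple(row[idx[b]] for b in rhs)
        # setdefault stores val the first time key appears, then returns the stored value.
        if seen.setdefault(key, val) != val:
            return False
    return True

ANSWER = [
    (("Course",), ("Dept", "Room")),
    (("Dept", "Room"), ("Course",)),
    (("Student", "Dept"), ("Course", "Room")),
    (("Student", "Course"), ("Dept", "Room")),
    (("Student", "Room"), ("Dept", "Course")),
]
for lhs, rhs in ANSWER:
    print(", ".join(lhs), "->", ", ".join(rhs), ":", fd_holds(lhs, rhs))
print("Room -> Dept :", fd_holds(("Room",), ("Dept",)))
```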
6
Review: Closure
Example:
R(A,B,C,D,E,F)
A, B → C
A, D → E
B → D
A, F → B
Compute {A,B}+   X = {A, B, }
Compute {A,F}+   X = {A, F, }
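The closure computation reviewed here can be sketched as a fixpoint loop: keep adding the right side of any FD whose left side is already contained in X. The encoding of FDs as (lhs, rhs) pairs of Python sets is an assumption of this sketch.

```python
def closure(attrs, fds):
    """Compute the closure X+ of a set of attributes under a list of FDs.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD if its left side is covered and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# The FDs of R(A,B,C,D,E,F) from this slide.
FDS = [
    ({"A", "B"}, {"C"}),
    ({"A", "D"}, {"E"}),
    ({"B"}, {"D"}),
    ({"A", "F"}, {"B"}),
]
print(sorted(closure({"A", "B"}, FDS)))
print(sorted(closure({"A", "F"}, FDS)))
```

Running it fills in the two blanks of the exercise, so try the computation by hand first.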
7
Review: Inference
Example:
A, B → C
A, D → B
B → D
Step 1: Compute X+, for every X:
A+ = A, B+ = BD, C+ = C, D+ = D
AB+ = ABCD, AC+ = AC, AD+ = ABCD
ABC+ = ABD+ = ACD+ = ABCD (no need to compute: why?)
BCD+ = BCD, ABCD+ = ABCD
Step 2: Enumerate all FD’s X → Y, s.t. Y ⊆ X+ and X ∩ Y = ∅:
AB → CD, AD → BC, ABC → D, ABD → C, ACD → B
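The two steps can be sketched in Python as follows. The function implied_fds is a name chosen for this sketch; for each non-empty X it reports only the single FD X → (X+ − X), so its output also includes FDs that are given or follow directly from a given one (such as B → D), in addition to the ones listed on the slide.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implied_fds(all_attrs, fds):
    """Step 1 and 2: for every non-empty X, pair X with X+ - X when non-empty."""
    out = []
    for k in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), k):
            x = set(combo)
            rhs = closure(x, fds) - x  # guarantees X and Y are disjoint
            if rhs:
                out.append(("".join(sorted(x)), "".join(sorted(rhs))))
    return out

FDS = [({"A", "B"}, {"C"}), ({"A", "D"}, {"B"}), ({"B"}, {"D"})]
for lhs, rhs in implied_fds("ABCD", FDS):
    print(lhs, "->", rhs)
```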
8
Keys
• A superkey is a set of attributes A1, ..., An s.t. for any other attribute B, we have A1, ..., An → B
• In other words: a set of attributes X is a superkey if X+ = all attributes
• A key is a minimal superkey, i.e. a set of attributes which is a superkey and for which no proper subset is a superkey
9
Computing (Super)Keys
• Compute X+ for all sets X
• If X+ = all attributes, then X is a superkey
• List only the minimal X’s: these are the keys
Note: there can be many keys !
• Example: R(A,B,C), AB → C, BC → A. Keys: AB and BC
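The procedure above can be sketched as a brute-force search over attribute sets in order of increasing size, skipping any set that already contains a key. The function names are assumptions of this sketch, and the search is exponential in the number of attributes, so it is only practical for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def keys(all_attrs, fds):
    """Return all minimal superkeys, smallest first."""
    found = []
    for k in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), k):
            x = frozenset(combo)
            if any(key <= x for key in found):
                continue  # x contains a smaller key, so it is not minimal
            if closure(x, fds) == set(all_attrs):
                found.append(x)
    return found

# R(A,B,C) with AB -> C and BC -> A, as in the example on this slide.
FDS = [({"A", "B"}, {"C"}), ({"B", "C"}, {"A"})]
print([sorted(k) for k in keys("ABC", FDS)])
```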
10
Examples of Keys
• Product(name, price, category, color)
name, category → price
category → color
Key is: {name, category}; superkeys are all supersets
• Enrollment(student, address, course, room, time)
student → address
room, time → course
student, course → room, time
Keys are: [in class]
11
FD’s for E/R Diagrams
Given a relation constructed from an E/R diagram, what is its key?
Rule 1: If the relation comes from an entity set, the key of the relation is the set of attributes which is the key of the entity set.
[E/R diagram: entity set Person with attributes address, name, ssn]
Person(address, name, ssn)
12
FD’s for E/R Diagrams
[E/R diagram: Person (name, ssn) buys Product (name, price); the relationship buys has attribute date]
buys(name, ssn, date)
Rule 2: If the relation comes from a many-many relationship, the key of the relation is the set of all key attributes of the relations corresponding to the entity sets
13
FD’s for E/R Diagrams
Except: if there is an arrow from the relationship to E, then we don’t need the key of E as part of the relation key.
[E/R diagram: relationship Purchase connecting Product (name), Store (sname), Person (ssn), CreditCard (card-no)]
Purchase(name, sname, ssn, card-no)
14
FD’s for E/R Diagrams
More rules:
• Many-one, one-many, one-one relationships
• Multi-way relationships
• Weak entity sets
(Try to find them yourself, or check book)
15
FD’s for E/R Diagrams
Say: “the CreditCard determines the Person”
[E/R diagram: relationship Purchase connecting Product (name), Store (sname), Person (ssn), CreditCard (card-no)]
Purchase(name, sname, ssn, card-no)
card-no → ssn
Incomplete (what does it say?)
16
Relational Schema Design (or Logical Design)
Main idea:
• Start with some relational schema
• Find out its FD’s
• Use them to design a better relational schema
17
Data Anomalies
When a database is poorly designed we get anomalies:
Redundancy: data is repeated
Update anomalies: need to change in several places
Delete anomalies: may lose data when we don’t want to
18
Relational Schema Design
Recall set attributes (persons with several phones):
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 Seattle
Fred 123-45-6789 206-555-6543 Seattle
Joe 987-65-4321 908-555-2121 Westfield
SSN → Name, City
but not SSN → PhoneNumber
Anomalies:
• Redundancy = repeat data
• Update anomalies = Fred moves to “Bellevue”
• Deletion anomalies = Joe deletes his phone number: what is his city?
19
Relation Decomposition
Break the relation into two:
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 Seattle
Fred 123-45-6789 206-555-6543 Seattle
Joe 987-65-4321 908-555-2121 Westfield
Name SSN City
Fred 123-45-6789 Seattle
Joe 987-65-4321 Westfield
SSN PhoneNumber
123-45-6789 206-555-1234
123-45-6789 206-555-6543
987-65-4321 908-555-2121
Anomalies have gone:
• No more repeated data
• Easy to move Fred to “Bellevue” (how?)
• Easy to delete all Joe’s phone numbers (how?)
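One possible answer to the two "how?" questions, sketched in Python. The variable names and the choice of a dict keyed by SSN are assumptions of this sketch; the dict encoding is only valid because SSN → Name, City.

```python
PERSON_PHONE = [
    ("Fred", "123-45-6789", "206-555-1234", "Seattle"),
    ("Fred", "123-45-6789", "206-555-6543", "Seattle"),
    ("Joe", "987-65-4321", "908-555-2121", "Westfield"),
]

# Project onto (SSN, Name, City) and (SSN, PhoneNumber); duplicates collapse.
person = {ssn: (name, city) for name, ssn, phone, city in PERSON_PHONE}
phones = {(ssn, phone) for name, ssn, phone, city in PERSON_PHONE}

# Move Fred to Bellevue: one row changes, because the city is stored once.
name, _old_city = person["123-45-6789"]
person["123-45-6789"] = (name, "Bellevue")

# Delete all of Joe's phone numbers: his name and city survive in `person`.
phones = {(ssn, phone) for ssn, phone in phones if ssn != "987-65-4321"}

print(person)
print(sorted(phones))
```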
20
Relational Schema Design
Conceptual Model:
[E/R diagram: Person (name, ssn) buys Product (name, price)]
Relational Model: plus FD’s
Normalization: Eliminates anomalies
21
Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
is decomposed into
R1(A1, ..., An, B1, ..., Bm) and R2(A1, ..., An, C1, ..., Cp)
R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp
22
Decomposition
• Sometimes it is correct:
Name Price Category
Gizmo 19.99 Gadget
OneClick 24.99 Camera
Gizmo 19.99 Camera
Name Price
Gizmo 19.99
OneClick 24.99
Gizmo 19.99
Name Category
Gizmo Gadget
OneClick Camera
Gizmo Camera
Lossless decomposition
23
Incorrect Decomposition
• Sometimes it is not:
Name Price Category
Gizmo 19.99 Gadget
OneClick 24.99 Camera
Gizmo 19.99 Camera
Name Category
Gizmo Gadget
OneClick Camera
Gizmo Camera
Price Category
19.99 Gadget
24.99 Camera
19.99 Camera
What’s incorrect?
Lossy decomposition
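To see what goes wrong, one can project the Product table both ways and join the pieces back together. The following Python sketch does this; the helpers project and natural_join are written for this example only, and prices are kept as strings to avoid floating-point comparisons.

```python
ROWS = [
    {"Name": "Gizmo", "Price": "19.99", "Category": "Gadget"},
    {"Name": "OneClick", "Price": "24.99", "Category": "Camera"},
    {"Name": "Gizmo", "Price": "19.99", "Category": "Camera"},
]

def project(rows, cols):
    """Project dict rows onto cols; each result row is a sorted tuple of (column, value)."""
    return {tuple(sorted((c, r[c]) for c in cols)) for r in rows}

def natural_join(r1, r2):
    """Join two projected relations on all the columns they share."""
    out = set()
    for a in r1:
        for b in r2:
            da, db = dict(a), dict(b)
            if all(da[c] == db[c] for c in da.keys() & db.keys()):
                out.add(tuple(sorted({**da, **db}.items())))
    return out

original = project(ROWS, ["Name", "Price", "Category"])
lossless = natural_join(project(ROWS, ["Name", "Price"]),
                        project(ROWS, ["Name", "Category"]))
lossy = natural_join(project(ROWS, ["Name", "Category"]),
                     project(ROWS, ["Price", "Category"]))

print("first decomposition joins back to the original:", lossless == original)
for row in sorted(lossy - original):
    print("spurious tuple from second decomposition:", dict(row))
```

The join of the second decomposition contains every original tuple plus extra ones, so the original table cannot be recovered from it.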
24
Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
is decomposed into
R1(A1, ..., An, B1, ..., Bm) and R2(A1, ..., An, C1, ..., Cp)
If A1, ..., An → B1, ..., Bm
Then the decomposition is lossless
Note: don’t need A1, ..., An → C1, ..., Cp
Example: name → price, hence the first decomposition is lossless
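The condition can be tested with a closure computation: take the attributes shared by R1 and R2 (the A's) and check whether their closure covers all of R1. The sketch below also accepts the symmetric case where the closure covers R2, which is the same rule with the roles of the B's and C's swapped. The function name is_lossless_split is an assumption of this sketch.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_split(r1, r2, fds):
    """True if the shared attributes determine all of r1 or all of r2."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus

FDS = [({"name"}, {"price"})]  # name -> price, from the example
print(is_lossless_split({"name", "price"}, {"name", "category"}, FDS))
print(is_lossless_split({"name", "category"}, {"price", "category"}, FDS))
```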
25
Normal Forms
First Normal Form = all attributes are atomic
Second Normal Form (2NF) = old and obsolete
Third Normal Form (3NF) = will discuss
Boyce Codd Normal Form (BCNF) = will discuss
Others...
26
Boyce-Codd Normal Form
A simple condition for removing anomalies from relations:
In English (though a bit vague):
Whenever a set of attributes of R determines another attribute, it should determine all the attributes of R.
A relation R is in BCNF if:
If A1, ..., An → B is a non-trivial dependency
in R, then {A1, ..., An} is a superkey for R
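The BCNF condition can be sketched as a check over the listed FDs of a relation: every non-trivial FD must have a superkey on its left side. The function bcnf_violations is a name chosen for this sketch, and it assumes the FDs are stated for the relation being checked (not for a sub-relation obtained by projection).

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Return the listed FDs that are non-trivial but whose left side is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        nontrivial = not rhs <= lhs
        if nontrivial and closure(lhs, fds) != set(all_attrs):
            bad.append((lhs, rhs))
    return bad

FDS = [({"SSN"}, {"Name", "City"})]
# The person/phone relation from the earlier slides: SSN is not a superkey.
print(bcnf_violations({"Name", "SSN", "PhoneNumber", "City"}, FDS))
# Without PhoneNumber, SSN determines every attribute.
print(bcnf_violations({"Name", "SSN", "City"}, FDS))
```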
27
BCNF Decomposition Algorithm
Repeat
  choose A1, …, Am → B1, …, Bn that violates the BCNF condition
  split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others])
  continue with both R1 and R2
Until no more violations
[Figure: R split into R1 = B’s and A’s, R2 = A’s and Others]
Is there a 2-attribute relation that is not in BCNF?
In practice, we have a better algorithm (next):
28
Example
What are the dependencies?
SSN → Name, City
What are the keys?
{SSN, PhoneNumber}
Is it in BCNF?
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 Seattle
Fred 123-45-6789 206-555-6543 Seattle
Joe 987-65-4321 908-555-2121 Westfield
Joe 987-65-4321 908-555-1234 Westfield
29
Decompose it into BCNF
Name SSN City
Fred 123-45-6789 Seattle
Joe 987-65-4321 Westfield
SSN PhoneNumber
123-45-6789 206-555-1234
123-45-6789 206-555-6543
987-65-4321 908-555-2121
987-65-4321 908-555-1234
SSN → Name, City
Let’s check anomalies:
• Redundancy?
• Update?
• Delete?
30
Example Decomposition
Person(name, SSN, age, hairColor, phoneNumber)
SSN → name, age
age → hairColor
Decompose in BCNF (in class):
31
BCNF Decomposition Algorithm
BCNF_Decompose(R)
  find X s.t.: X ≠ X+ ≠ [all attributes]
  if (not found) then “R is in BCNF”
  let Y = X+ - X
  let Z = [all attributes] - X+
  decompose R into R1(X ∪ Y) and R2(X ∪ Z)
  continue to decompose R1 and R2
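A minimal Python sketch of BCNF_Decompose follows. It is one possible reading of the pseudocode, under these assumptions: X is searched by brute force over subsets in increasing size (smallest, alphabetically first violation wins), and for a sub-relation the closure is computed with the original FDs and then restricted to that sub-relation's attributes. A different choice of X can give a different, equally valid decomposition.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_decompose(attrs, fds):
    """Return a set of frozensets of attributes, each in BCNF."""
    attrs = frozenset(attrs)
    for k in range(1, len(attrs)):
        for combo in combinations(sorted(attrs), k):
            x = frozenset(combo)
            x_plus = closure(x, fds) & attrs  # closure restricted to this relation
            if x != x_plus and x_plus != attrs:
                y = x_plus - x
                z = attrs - x_plus
                return bcnf_decompose(x | y, fds) | bcnf_decompose(x | z, fds)
    return {attrs}  # no X found: R is in BCNF

PERSON = {"name", "SSN", "age", "hairColor", "phoneNumber"}
FDS = [({"SSN"}, {"name", "age"}), ({"age"}, {"hairColor"})]
for rel in sorted(bcnf_decompose(PERSON, FDS), key=sorted):
    print(sorted(rel))
```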
32
Example BCNF Decomposition
Person(name, SSN, age, hairColor, phoneNumber)
SSN → name, age
age → hairColor
Find X s.t.: X ≠ X+ ≠ [all attributes]
Iteration 1: Person
SSN+ = SSN, name, age, hairColor
Decompose into: P(SSN, name, age, hairColor), Phone(SSN, phoneNumber)
Iteration 2: P
age+ = age, hairColor
Decompose: People(SSN, name, age), Hair(age, hairColor), Phone(SSN, phoneNumber)
What is the key?
33
Example
• R(A,B,C,D) A → B, B → C
• In R: A+ = ABC ≠ ABCD
– Split into R1(A,B,C), R2(A,D)
• In R1: B+ = BC ≠ ABC
– Split into R11(B,C), R12(A,B)
• Final decomposition: R11(B,C), R12(A,B), R2(A,D)
• What happens if in R we first pick B+? Or AB+?
What are the keys?