The Relational Datalsir · The Relational Data Model Lecture 6 2 Outline ¥Relational Data Model...
Transcript of The Relational Datalsir · The Relational Data Model Lecture 6 2 Outline ¥Relational Data Model...
1
The Relational DataModel
Lecture 6
2
Outline
• Relational Data Model
• Functional Dependencies
• Logical Schema Design
Reading
Chapter 8
3
The Relational Data Model
Data
Modeling
Relational
Schema
Physical
storage
E/R diagrams Tables:
column names: attributes
rows: tuples
Complex
file organization
and index
structures.
Have seen
this in SQL
Have seen
this tooDiscuss next
4
Terminology
Name Price Category Manufacturer
gizmo $19.99 gadgets GizmoWorks
Power gizmo $29.99 gadgets GizmoWorks
SingleTouch $149.99 photography Canon
MultiTouch $203.99 household Hitachi
Tuples or rows or records
Attribute namesTable name or relation name
Products:
5
Schemas
Relational Schema:
– Relation name plus attribute names
– E.g.• Product(Name, Price, Category, Manufacturer)
– In practice we add the domain for each attribute
Database Schema
– Set of relational schemas
– E.g.• Product(Name, Price, Category, Manufacturer)
• Company(Name, Address, Phone), . . .
6
Instances
• Relational schema = R(A1,…,Ak)Instance = relation with k attributes
• Database schema = R1(…), R2(…), …, Rn(…)Instance = n relations, of types R1, R2, ..., Rn
This is all mathematics, not to be confused with SQLtables!
(What's the difference?)
7
Example
Name Price Category Manufacturer
gizmo $19.99 gadgets GizmoWorks
Power gizmo $29.99 gadgets GizmoWorks
SingleTouch $149.99 photography Canon
MultiTouch $203.99 household Hitachi
Relational schema: Product(Name, Price, Category, Manufacturer)
Instance:
8
Design Criteria
• A relational schema should ensure:– Data integrity
• data is consistent and satisfies integrityconstraints
– Data redundancy should be avoided
• The process of creating a relationalschema that follows certain rules iscalled normalisation– We will learn several of these rules
9
Another “Example”
3.9Carol
3.7Bob
3.8Alice
CoursesGPAName
OS
DB
Math
OS
DB
OS
Math
Student
How can this be expressed in a relational schema?
10
Outlook: First Normal Form (1NF)
• A database schema is in First Normal Form ifall tables are flat
3.9Carol
3.7Bob
3.8Alice
CoursesGPAName
OS
DB
Math
OS
DB
OS
Math
Student
3.9Carol
3.7Bob
3.8Alice
GPAName
Student
Course
OS
DB
Math
OSCarol
OSAlice
DBBob
Alice
Carol
Alice
Student Course
DB
Math
Math
Takes Course
11
Example cont.
OS3.7Bob
OS3.8Alice
DB3.8Alice
OS3.9Carol
DB3.7Bob
Math3.8Alice
CourseGPAName
Theoretically, also this flattened
table is in 1NF but it has several
problems:
• Update anomalies• Inserts etc.
Only a compound key between
Name and Course is possible
12
1NF cont.
• Features:– All values (attributes) are atomic
• For instance, no comma separated are valuesallowed!
– “Conventional” SQL based databasestypically adhere to the 1NF
• Relational Rule 1 (based on Codd!s 12 Rules):
– A table that has no multi-valued fields issaid to be in the first normal form.
13
Normal Forms: Overview
– 1st Normal Form (1NF)
– 2nd Normal Form (2NF)
– 3rd Normal Form (3NF)
– Boyce Codd Normal Form (BCNF)
• The higher the normal form, the moreredundancies are reduced
Str
ict
ord
erin
g
14
Outline
• Relational Data Model
• Functional Dependencies
• Logical Schema Design
15
Functional Dependencies (FD)
• A form of constraint– hence, part of the schema
• Finding them is part of the databasedesign
• Also used in normalising the relations
Warning: this is the most abstract, and “hardest” part ofthe course.
16
Functional Dependency:
Graphically
(Patrick O’Neil 1994)
17
FD cont.
• Important: the intent of the DB designer isexpressed
• “Two rows cannot agree in value on attributeA and disagree on B”.
If r1(A) = r2(A) then r1(B) = r2(B)
– “A functionally determines B”
– “B is functionally dependent on A”
• In other words: A must be unique
18
FD Definitions
Definition:
If two tuples agree on the attributes
then they must also agree on the attributes
Formally:
A1, A2, …, An ! B1, B2, …, Bm
A1, A2, …, An
B1, B2, …, Bm
19
Examples
• EmpID ! Name, Phone, Position
• Position ! Phone
• but Phone ! Position
EmpID Name Phone Position
E0045 Smith 1234 ClerkE1847 John 9876 SalesrepE1111 Smith 9876 SalesrepE9999 Mary 1234 Lawyer
20
Example
EmpID Name Phone Position
E0045 Smith 1234 Clerk
E1847 John 9876 Salesrep
E1111 Smith 9876 Salesrep
E9999 Mary 1234 Lawyer
Position ! Phone
Cle
rk
Sal
esre
p
Lay
er
.
.
.
Phone
Position
1234
9876
21
In General
• To check A ! B, erase all other columns
• check if the remaining relation is many-to-one (called functional in mathematics)
… A … B
X1 Y1
X2 Y2
… …
22
Typical Examples of FDs
Product: name ! price, manufacturer
Person: ssn ! name, age
Company: name ! stockprice, president
23
Example
Product(name, category, color, department, price)
name ! color
category ! department
color, category ! price
Consider these FDs:
What do they say?
24
ExampleFDs are constraints on relations:
• On some instances they hold
• On others they don’t
99ToysGreenGadgetTweaker
49ToysGreenGadgetGizmo
pricedepartmentcolorcategoryname
Does this instance satisfy all the FDs?
name ! color
category ! department
color, category ! price
25
Example
59Office-
supp.GreenStationaryGizmo
99ToysBlackGadgetTweaker
49ToysGreenGadgetGizmo
pricedepartmentcolorcategoryname
What about this one?
name ! color
category ! department
color, category ! price
26
Example
If some FDs are satisfied, then
others are satisfied too
If all these FDs are true:
name ! color
category ! department
color, category ! price
Then this FD also holds: name, category ! price
Why ??
27
Inference Rules for FDs
Is equivalent to
(1) Splitting rule
and
(2) Combining rule
Bm...B1An...A1
A1, A2, …, An ! B1, B2, …, Bm
A1, A2, …, An ! B1
A1, A2, …, An ! B2
. . . . .
A1, A2, …, An ! Bm
splitcombine
28
Inference Rules for FDs
(continued)
(3) Trivial Rule
Am…A1
where i = 1, 2, ..., n
Ai is a subset of A1..n
A1, A2, …, An ! Ai
29
Inference Rules for FDs
(continued)
(4) Transitive Closure Rule
If
and
then
A1, A2, …, An ! B1, B2, …, Bm
B1, B2, …, Bm ! C1, C2, …, Cp
A1, A2, …, An ! C1, C2, …, Cp
30
...C1 CpBm…B1Am…A1
31
Example (continued)
Start from the following FDs:
Infer the following FDs:
1. name ! color
2. category ! department
3. color, category ! price
8. name, category ! price
7. name, category ! color, category
6. name, category ! category
5. name, category ! color
4. name, category ! name
Which Rule
did we apply ?Inferred FD
32
Example (continued)
Answers:
Transitivity on 3, 78. name, category ! price
Split/combine on 5, 67. name, category ! color, category
Trivial rule6. name, category ! category
Transitivity on 4, 15. name, category ! color
Trivial rule4. name, category ! name
Which Rule
did we apply ?Inferred FD
1. name ! color
2. category ! department
3. color, category ! price
33
Another Example
• Enrollment(student, major, course, room,time)
student ! major
major, course ! room
course ! time
34
Another Rule
If
then
Augmentation follows from trivial rules and transitivity
How?
A1, A2, …, An ! B
A1, A2, …, An , C1, C2, …, Cp ! B
(5) Augmentation Rule
35
“Solving” the Augmentation
Rule
A1, A2, …, An, C1, C2, …, Cp ! A1, A2, …, An
A1, A2, …, An , C1, C2, …, Cp ! B
A1, A2, …, An ! B
Transitivity gives:
Trivial Rule
36
Summary of Rules
(1)Splitting (Decomposition) Rule
If X ! YZ, then X ! Y and X ! Z
(2) Combining (Union) Rule
If X ! Y and X ! Z, then X ! YZ
(3) Trivial (Reflexivity) Rule
If Y ! X, then X ! Y
(4) Transitive Closure Rule
If X ! Y and Y ! Z, then X ! Z
(5) Augmentation Rule
If X ! Y, then XZ ! Y, for every Z
37
Problem: infer ALL FDs
Given a set of FDs, infer all possible FDs
How to proceed ?
• Try all possible FDs, apply all rules– E.g. R(A, B, C, D): how many FDs are possible ?
• Answer: 24 subsets of attributes
• Drop trivial FDs, drop augmented FDs– Still way too many
• Better: use the Closure Algorithm (next)
38
FDs and Closure
• Typically, a relation R has a set of
defined functional dependencies F
• The closure of F (written F+) is the setof all functional dependencies that may
be derived from F
– F+ contains all defined FDs and derivedFDs
39
Closure of a set of AttributesGiven a set of attributes A1, …, An
The closure, {A1, …, An}+ , is the set of attributes B
s.t. A1, …, An ! B
name ! color
category ! department
color, category ! price
Example:
Closures:
name+ = {name, color}
{name, category}+ = {name, category, color, department, price}
color+ = {color}
40
Closure Algorithm
Start with X={A1, …, An}.
Repeat until X doesn’t change do:
if B1, …, Bn ! C is a FD and
B1, …, Bn are all in X
then add C to X. {name, category}+ =
{name, category, color,
department, price}
name ! color
category ! department
color, category ! price
Example:
41
Closure Algorithm more verbose
• Starting with the given set of attributes, repeatedly expand the set by
adding the right sides of FDs as soon as we have included their left sides.
• Eventually, we cannot expand the set any more, and the resulting set is
the closure.
1 Let X be a set of attributes that eventually will become the closure. First
we initialize X to be {A1, A2, …, An}.
2 Now, repeatedly search for some FD in X:
B1B2…Bm!C
such that all of Bs are in the set X, but C is not. We then add C to X.
3 Repeat step 2 as many times as necessary until no more attributes can
be added to X.
Since X can only grow, and the number of attributes is finite, eventually nothing
more can be added to X.
4 The set X after no more attributes can be added to it is the: {A1, A2, …,
An}+. (Alex Thomo 2006)
42
Example
Compute {A,B}+ X = {A, B, ? }
Compute {A, F}+ X = {A, F, ? }
R(A,B,C,D,E,F) A, B ! C
A, D ! E
B ! D
A, F ! B
43
Example (Solution)
Compute {A,B}+ X = {A, B, C, D, E }
Compute {A, F}+ X = {A, F, B, C, D, E }
R(A,B,C,D,E,F) A, B ! C
A, D ! E
B ! D
A, F ! B
44
Using Closure to Infer ALL FDs
A, B ! C
A, D ! B
B ! D
Example:
A+ = A, B+ = BD, C+ = C, D+ = D
AB+ = ABCD, AC+ = AC, AD+ = ABCD
ABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?)
BCD+ = BCD, ABCD+ = ABCD
45
Problem: Finding FDs
• Approach 1: During Database Design– Designer derives them from real-world knowledge
of users
– Problem: knowledge might not be available
• Approach 2: From a Database Instance– Analyze a given database instance and find all
FDs satisfied by that instance
– Useful if designers don!t get enough informationfrom users
– Problem: FDs might be artificial for the giveninstance
46
Find All FDs
020CircuitsEEFrank
045DBCSEElsa
050JavaCSEDan
045DBCSECarol
040HWEEAlice
020C++CSEBob
020C++CSEAlice
RoomCourseDeptStudent
47
Some Answers
Course ! Dept, Room
Dept, Room ! Course
Student, Dept ! Course, Room
Student, Course ! Dept, Room
Student, Room ! Dept, Course
Do all FDs
make sense
in practice ?
48
Keys
• A key is a set of attributes A1, ..., An s.t. for anyother attribute B, we have A1, ..., An ! B– Example
• A minimal key is a set of attributes which is akey and for which no subset is a key– Example
• Note: our course book calls them superkeyand key
A1, A2, A3, A4
A1, A2, A3, A4
49
Computing Keys
• Compute X+ for all sets X
• If X+ = all attributes, then X is a key
• List only the minimal keys
Note: there can be many minimal keys !
• Example: R(A,B,C), AB!C, BC!AMinimal keys: AB and BC
50
Let!s reuse the Closure
ExampleA, B ! C
A, D ! B
B ! D
Example:
A+ = A, B+ = BD, C+ = C, D+ = D
AB+ = ABCD, AC+ = AC, AD+ = ABCD
ABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?)
BCD+ = BCD, ABCD+ = ABCD
Minimal keys
Keys
51
More Examples of Keys
• Product(name, price, category, color)name, category ! price
category ! color
Keys are: {name, category} and all supersets
• Enrollment(student, address, course, room, time)student ! address
room, time ! course
student, course ! room, time
… done in class
52
Outline
• Relational Data Model
• Functional Dependencies
• Logical Schema Design
53
Relational Schema Design
(or Logical Schema Design)
Main idea:
• Start with some relational schema
• Find out its FDs
• Use them to design a better relational
schema
54
Data Anomalies
When a database is poorly designed we getanomalies:
Redundancy: data is repeated
Update anomalies: need to change in severalplaces
Delete anomalies: may lose data when we don!twant
55
Relational Schema Design
Anomalies:• Redundancy = repeat data
• Update anomalies = Fred moves to “Bellevue”
• Deletion anomalies = Joe deletes his phone number:
what is his city ?
Example: Persons with several phones
SSN ! Name, City
Westfield908-555-2121987-65-4321Joe
Seattle206-555-6543123-45-6789Fred
Seattle206-555-1234123-45-6789Fred
CityPhoneNumberSSNName
but not SSN ! PhoneNumber
56
Relation DecompositionBreak the relation into two:
Westfield987-65-4321Joe
Seattle123-45-6789Fred
CitySSNName
908-555-2121987-65-4321
206-555-6543123-45-6789
206-555-1234123-45-6789
PhoneNumberSSN
Anomalies have gone:• No more repeated data
• Easy to move Fred to “Bellevue” (how ?)
• Easy to delete all Joe’s phone number (how ?)
Westfield908-555-2121987-65-4321Joe
Seattle206-555-6543123-45-6789Fred
Seattle206-555-1234123-45-6789Fred
CityPhoneNumberSSNName
57
Relational Schema Design
PersonbuysProduct
name
price name ssn
Conceptual Model:
Relational Model:
plus FDs
Normalization:
Eliminates anomalies
58
Decompositions in General
R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)
59
Decomposition
• Sometimes it is correct:
Camera19.99Gizmo
Camera24.99OneClick
Gadget19.99Gizmo
CategoryPriceName
19.99Gizmo
24.99OneClick
19.99Gizmo
PriceName
CameraGizmo
CameraOneClick
GadgetGizmo
CategoryName
Lossless decomposition
60
Incorrect Decomposition
• Sometimes it is not:
Camera19.99Gizmo
Camera24.99OneClick
Gadget19.99Gizmo
CategoryPriceName
CameraGizmo
CameraOneClick
GadgetGizmo
CategoryName
Camera19.99
Camera24.99
Gadget19.99
CategoryPrice
What’s
incorrect ??
Lossy decomposition
61
Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
If A1, ..., An ! B1, ..., Bm
Then the decomposition is lossless
R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)
Example: name ! price, hence the first decomposition is lossless
Note: don’t need necessarily A1, ..., An ! C1, ..., Cp
62
Reminder: Normal Forms
First Normal Form = all attributes are atomic
Second Normal Form (2NF) = in principleold/obsolete
Third Normal Form (3NF) = this lecture
Boyce Codd Normal Form (BCNF) = this lecture
Others...
63
Third Normal Form (3NF)
• A relation is in 3NF if, and only if:– 1NF is satisfied
– Every non-key attribute is functionallydependent on the whole key
• i.e. no partial key functional dependencies
– No transitive functional dependencies:• A non-key attribute must not be functionally
dependent on another non-key attribute
– No data redundancies
64
Example
• Now we havereached 3NF
1NF but not 3NF
manager ! address(http://db.grussel.org)
65
Potential Problems with 3NF
• If a relation has more than 1 candidate key,anomalies may occur
• 3NF does not deal satisfactory withoverlapping candidate keys
• Need a stricter form:
– Boyce Codd Normal Form (BCNF)
• Example see:http://www.answers.com/topic/boyce-codd-normal-form
66
“Prod Cat, Stock Cat ! Range Code”
BCNF
67
Boyce-Codd Normal Form
In English (though a bit vague):
Whenever a set of attributes of R is determining another attribute,
it should determine all the attributes of R.
Every determinant (LHS of the FD) is a candidate key.
A relation R is in BCNF if:
If A1, ..., An ! B is a non-trivial dependency
in R , then {A1, ..., An} is a key for R
68
BCNF Decomposition
Algorithm
A’s OthersB’s
R1
Repeat
choose A1, …, Am ! B1, …, Bn that violates the BCNF condition
split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others])
continue with both R1 and R2
Until no more violations
R2
69
Example
What are the dependencies?
SSN ! Name, City
What are the keys?
{SSN, PhoneNumber}
Is it in BCNF?
Westfield908-555-1234987-65-4321Joe
Westfield908-555-2121987-65-4321Joe
Seattle206-555-6543123-45-6789Fred
Seattle206-555-1234123-45-6789Fred
CityPhoneNumberSSNName
70
Decompose it into BCNF
Westfield987-65-4321Joe
Seattle123-45-6789Fred
CitySSNName
908-555-1234987-65-4321
908-555-2121987-65-4321
206-555-6543123-45-6789
206-555-1234123-45-6789
PhoneNumberSSN
SSN ! Name, City
Let’s check anomalies:
• Redundancy ?
• Update ?
• Delete ?
71
Summary of BCNF
DecompositionFind a dependency that violates the BCNF condition:
A’s OthersB’s
R1 R2
Heuristics: choose B , B , … B “as large as possible”1 2 m
Decompose:
2-attribute
relations are BCNF
Continue until
there are no
BCNF violations
left.
A1, A2, …, An ! B1, B2, …, Bm
72
Example DecompositionPerson(name, SSN, age, hairColor, phoneNumber)
SSN ! name, age
age ! hairColor
Decompose in BCNF (in class):
Step 1: find all keys (How ? Compute S+, for various sets S)
Step 2: now decompose
73
Other Example
• R(A,B,C,D) A ! B, B ! C
• Key: AD
• Violations of BCNF (need to decompose)
– A ! B, A! C, A!BC
• Pick A! BC:
– split into R1(A,B,C) R2(A,D)
• What happens if we pick A ! B first ?
74
Lossless Decompositions
A decomposition is lossless if we can recover:
R(A,B,C)
R1(A,B) R2(A,C)
R!(A,B,C) should be the same as R(A,B,C)
R’ is in general larger than R. Must ensure R’ = R
Decompose
Recover
75
Lossless Decompositions
• Given R(A,B,C) s.t. A!B, thedecomposition into R1(A,B), R2(A,C) is
lossless
76
3NF: A Problem with BCNF
Unit Company Product
Unit Company
Unit Product
FDs: Unit ! Company; Company, Product ! Unit
So, there is a BCNF violation, and we decompose.
Unit ! Company
No FDs
Notice: we loose the FD: Company, Product ! Unit
77
So What!s the Problem?
Unit Company Product
Unit Company Unit Product
Galaga99 UW Galaga99 databases
Bingo UW Bingo databases
No problem so far. All local FD’s are satisfied.
Let’s put all the data back into a single table again (anomalies?):
Galaga99 UW databases
Bingo UW databases
Violates the dependency: company, product -> unit!
78
Solution: 3rd Normal Form
(3NF)
A relation R is in Third Normal Form if :
Whenever there is a nontrivial dependency A1, A2, ..., An ! B
for R , then {A1, A2, ..., An } is a key for R,
or B is part of a key.
Tradeoff:
BCNF = no anomalies, but may lose some FDs
3NF = keeps all FDs, but may have some anomalies
79
Summary
• Dependencies on attributes are
important when designing database
schema
– Functional dependencies
– Attributes should be dependent only on(primary) key
• Use normalised database schemas to
avoid certain anomalies with data