Cs3431 Normalization
Date posted: 21-Dec-2015
cs3431
Why Normalization?
To remove potential redundancy in the design.
Redundancy causes several anomalies: insert, delete and update.
Redundancy wastes storage, and often slows down query processing.
Examples follow.
What is Normalization?
Normalization uses the concept of dependencies: functional dependencies (FDs).
Technique used: decomposition. For example, break R (A, B, C, D) into R1 (A, B) and R2 (B, C, D).
Insert Anomaly
Student
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       ER
Question: Can we insert any professor?
Note: We cannot insert a professor who has no students.
Insert anomaly: We are not able to insert some “valid” values.
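The insert anomaly can be reproduced with a small sketch using Python's built-in sqlite3 module. It assumes that sNumber is the primary key of the single combined table (the slides do not state this explicitly), and the professor p3 / "XY" is a made-up value used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: sNumber is the primary key of the combined Student table.
conn.execute(
    """CREATE TABLE Student (
           sNumber TEXT NOT NULL PRIMARY KEY,
           sName   TEXT,
           pNumber TEXT,
           pName   TEXT
       )"""
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [("s1", "Dave", "p1", "MM"), ("s2", "Greg", "p2", "ER")],
)


def try_insert_professor_only(p_number, p_name):
    """Try to store a professor who has no student; return True on success."""
    try:
        conn.execute(
            "INSERT INTO Student (sNumber, sName, pNumber, pName) "
            "VALUES (NULL, NULL, ?, ?)",
            (p_number, p_name),
        )
        return True
    except sqlite3.IntegrityError:
        # The key attribute sNumber cannot be NULL, so the row is rejected.
        return False


# Hypothetical professor p3 with no students yet.
inserted = try_insert_professor_only("p3", "XY")
row_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print("professor-only insert accepted:", inserted)
```

The insert is rejected because a professor without a student has no value for the key attribute, which is the situation the note above describes.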
Delete Anomaly
Student
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       ER
Question: Can we delete a student and keep the professor’s information?
Note: We cannot delete a student who is the only student of a professor without also losing that professor.
Note: In both cases, the minimum cardinality of Professor in the corresponding ER schema is 0.
[ER diagram: Student (sNumber, sName) (1,1) HasAdvisor (years) (0,1) Professor (pNumber, pName)]
Delete anomaly: We are not able to perform a delete without losing some “valid” information.
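A minimal sketch of the delete anomaly, using plain Python tuples for the rows of the table above. The helper names professors and delete_student are illustrative only and do not come from the slides.

```python
# Rows of the combined Student table: (sNumber, sName, pNumber, pName).
student = [
    ("s1", "Dave", "p1", "MM"),
    ("s2", "Greg", "p2", "ER"),
]


def professors(rows):
    """Return the set of (pNumber, pName) pairs recorded in the table."""
    return {(p_number, p_name) for _, _, p_number, p_name in rows}


def delete_student(rows, s_number):
    """Return a copy of the table without the row of the given student."""
    return [row for row in rows if row[0] != s_number]


before = professors(student)
after = professors(delete_student(student, "s2"))

# Professors that were known before the delete but are gone afterwards.
lost = before - after
print("professors lost by deleting s2:", lost)
```

Deleting s2 also removes the only record of professor p2, even though nothing about that professor was meant to change.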
Update Anomaly
Student
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p1       MM
Question: Can we simply update a professor’s name?
Note: To update the name of a professor, we have to update it in multiple tuples.
[ER diagram: Student (sNumber, sName) (1,1) HasAdvisor (years) (0,*) Professor (pNumber, pName)]
Note: The maximum cardinality of Professor in the corresponding ER schema is *.
Update anomaly: To update a value, we have to update multiple rows. Update anomalies are due to redundancy.
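The update anomaly can be sketched the same way. The new name "NN" and the helper names below are hypothetical; the point is only that changing one row leaves the table inconsistent until every row for the professor has been changed.

```python
# The Student table from the update-anomaly example, one dict per row.
student = [
    {"sNumber": "s1", "sName": "Dave", "pNumber": "p1", "pName": "MM"},
    {"sNumber": "s2", "sName": "Greg", "pNumber": "p1", "pName": "MM"},
]


def names_for(rows, p_number):
    """Return every name currently stored for the given professor."""
    return {row["pName"] for row in rows if row["pNumber"] == p_number}


def rename_professor(rows, p_number, new_name):
    """Rename a professor in every row; return the number of rows touched."""
    touched = 0
    for row in rows:
        if row["pNumber"] == p_number:
            row["pName"] = new_name
            touched += 1
    return touched


# A careless update that changes only one row ("NN" is a made-up name).
student[0]["pName"] = "NN"
after_partial_update = names_for(student, "p1")  # two different names for p1

# A correct update has to touch every row that mentions p1.
rows_touched = rename_professor(student, "p1", "NN")
after_full_update = names_for(student, "p1")
print(after_partial_update, rows_touched, after_full_update)
```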
Normalization
We need a method to find which such “dependencies” exist between attributes: functional dependencies.
We need a method to remove such harmful dependencies when they exist: relational decomposition.
Keys : Revisited
A key for a relation R (a1, a2, …, an) is a set of attributes, K, that together uniquely determine the values for all attributes of R.
A key is minimal: no proper subset of K is a key.
A superkey need not be minimal.
A prime attribute is an attribute that is part of a key.
Functional Dependencies (FDs)
Student
sNumber  sName  address
1        Dave   144FL
2        Greg   320FL
Suppose we have the FD: sName → address
That is, there is a function from sName to address.
Meaning: For any two rows in the Student relation with the same value for sName, the value for address must be the same.
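The meaning of an FD can be checked against a concrete set of rows. The sketch below uses a hypothetical helper fd_holds; note that a single instance can only refute an FD, since an FD is a constraint on every legal instance of the relation. The third row is made up to show a violation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in this relation instance.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(left, right) != right:
            return False
    return True


student = [
    {"sNumber": 1, "sName": "Dave", "address": "144FL"},
    {"sNumber": 2, "sName": "Greg", "address": "320FL"},
]
holds = fd_holds(student, ["sName"], ["address"])

# Hypothetical extra row: a second Dave living at a different address.
violating = student + [{"sNumber": 3, "sName": "Dave", "address": "999FL"}]
still_holds = fd_holds(violating, ["sName"], ["address"])
print(holds, still_holds)
```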
FD and Keys
Student
sNumber  sName  address
1        Dave   144FL
2        Greg   320FL
Primary key: <sNumber>
FD: sName → address
Questions:
• Does a key imply functional dependencies? Which ones?
• Does a functional dependency imply keys? Which ones?
Observation: Any key (primary or candidate) or superkey of a relation R functionally determines all attributes of R.
Properties of FDs
Let A, B, C, Z be sets of attributes.
Reflexive (trivial FD): if B ⊆ A, then A → B
Transitive: if A → B and B → C, then A → C
Augmentation: if A → B, then AZ → BZ
Union: if A → B and A → C, then A → BC
Decomposition: if A → BC, then A → B and A → C
Note: These are sound and complete inference rules for FDs.
Inferring FDs
Suppose we have a relation R (A, B, C) and functional dependencies A → B, B → C, C → A.
Questions: What is a key for R? Should we split R into multiple relations?
We can infer A → ABC, B → ABC, C → ABC. Hence A, B, C are all keys.
Reasoning About FDs
An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold.
F+ = closure of F is the set of all FDs that are implied by F.
Computing the closure of a set of FDs can be expensive: the size of the closure is exponential in the number of attributes!
Reasoning About FDs
Computing the closure F+ of a set of FDs is too expensive.
Typically, we just need to know whether a given FD X → Y is in the closure of a set of FDs F.
Algorithm for an efficient check:
Compute the attribute closure of X (denoted X+) with respect to F: the set of all attributes A such that X → A is in F+. There is a linear-time algorithm to compute this.
Check if Y is in X+. If yes, then X → Y is in F+.
Reasoning About FDs (Contd.)
Does F = {A → B, B → C, CD → E} imply A → E?
Question: i.e., is A → E in the closure F+?
Equivalent question: Is E in the attribute closure A+?
Algorithm for Inference of FDs
Computing the closure of a set of attributes {A1, A2, …, An}, denoted {A1, A2, …, An}+
1. Let X = {A1, A2, …, An}
2. If there exists an FD B1, B2, …, Bm → C such that every Bi ∈ X, then X = X ∪ {C}
3. Repeat step 2 until no more attributes can be added.
4. {A1, A2, …, An}+ = X
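The four steps above can be sketched in Python as a simple fixpoint loop. This is not the linear-time algorithm; it rescans the FDs until nothing changes. As an assumption for brevity, each attribute is a single character and each FD is a (left side, right side) pair of strings, so "CD" stands for the set {C, D}.

```python
def attribute_closure(attrs, fds):
    """Return the closure of attrs under fds as a set of attribute names.

    attrs: iterable of single-character attribute names, e.g. "AD".
    fds:   list of (lhs, rhs) pairs, e.g. ("CD", "E") for CD -> E.
    """
    closure = set(attrs)                      # step 1
    changed = True
    while changed:                            # step 3: repeat until stable
        changed = False
        for lhs, rhs in fds:
            # Step 2: every left-side attribute is already in the closure.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure                            # step 4


# F = {A -> B, B -> C, CD -> E}
fds = [("A", "B"), ("B", "C"), ("CD", "E")]

closure_a = attribute_closure("A", fds)
closure_ad = attribute_closure("AD", fds)
a_implies_e = "E" in closure_a
print(sorted(closure_a), sorted(closure_ad), a_implies_e)
```

With these FDs, A alone never reaches D, so CD → E cannot fire and E stays outside the closure of A; adding D to the starting set changes that.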
Inferring FDs: Example 1
Consider R (A, B, C, D, E) with FDs A → B, B → C, CD → E.
Does A → E hold? (Is A → E in F+?)
Rephrase as: Is E in A+? Let us compute {A}+.
{A}+ = {A, B, C}
E is not in {A}+, so A → E is not implied by the given FDs.
Inferring FDs: Example 2
Given R (A, B, C) and FDs A → B, B → C, C → A, what are the possible keys for R?
Compute the closure of attributes:
{A}+ = {A, B, C}
{B}+ = {A, B, C}
{C}+ = {A, B, C}
So the keys for R are <A>, <B>, <C>.
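Key finding can be sketched on top of attribute closure: try attribute sets from smallest to largest and keep those whose closure is the whole schema and that contain no smaller key. This brute-force search is exponential in the number of attributes and is meant only for small examples. The function names and the single-character attribute encoding are assumptions made for this sketch.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Closure of attrs under fds; each FD is an (lhs, rhs) pair of strings."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def candidate_keys(schema, fds):
    """Return all minimal attribute sets whose closure is the full schema."""
    all_attrs = set(schema)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            # Skip supersets of a key already found: they are not minimal.
            if any(key <= candidate for key in keys):
                continue
            if attribute_closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


# R (A, B, C) with A -> B, B -> C, C -> A
keys_abc = candidate_keys("ABC", [("A", "B"), ("B", "C"), ("C", "A")])

# R (A, B, C, D, E) with A -> B, B -> C, CD -> E
keys_abcde = candidate_keys("ABCDE", [("A", "B"), ("B", "C"), ("CD", "E")])
print(keys_abc, keys_abcde)
```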
Decomposing Relations
StudentProf
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM
FDs: pNumber → pName
StudentProf is decomposed into:
Student
sNumber  sName  pNumber
s1       Dave   p1
s2       Greg   p2
Professor
pNumber  pName
p1       MM
p2       MM
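The decomposition above amounts to two projections of StudentProf. A minimal sketch, assuming set semantics (duplicate rows are dropped) and a hypothetical helper named project:

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicates but keeping row order."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


student_prof = [
    {"sNumber": "s1", "sName": "Dave", "pNumber": "p1", "pName": "MM"},
    {"sNumber": "s2", "sName": "Greg", "pNumber": "p2", "pName": "MM"},
]

# Split along the FD pNumber -> pName: pNumber stays in both relations.
student = project(student_prof, ["sNumber", "sName", "pNumber"])
professor = project(student_prof, ["pNumber", "pName"])
print(student)
print(professor)
```

Keeping pNumber in both relations is what later allows the two tables to be joined back together on that attribute.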
Decomposition: Lossless Join Property
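Student
sNumber  sName  pName
s1       Dave   MM
s2       Greg   MM
Professor
pNumber  pName
p1       MM
p2       MM
StudentProf (Student joined with Professor on pName)
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s1       Dave   p2       MM
s2       Greg   p1       MM
s2       Greg   p2       MM
Spurious tuples: the join contains rows, such as (s1, Dave, p2, MM) and (s2, Greg, p1, MM), that were not in the original StudentProf relation.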
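The spurious tuples can be reproduced with a small natural-join sketch. It assumes that the original StudentProf relation is the two-row table from the Decomposing Relations example (s1 advised by p1, s2 advised by p2); the helper natural_join is illustrative only.

```python
def natural_join(left, right):
    """Join two lists of dict rows on every attribute name they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[a] == r_row[a] for a in shared):
                joined.append({**l_row, **r_row})
    return joined


# Assumed original relation before the decomposition.
original = [
    {"sNumber": "s1", "sName": "Dave", "pNumber": "p1", "pName": "MM"},
    {"sNumber": "s2", "sName": "Greg", "pNumber": "p2", "pName": "MM"},
]

# Decomposition that shares only pName, which is not a key of either table.
student = [
    {"sNumber": "s1", "sName": "Dave", "pName": "MM"},
    {"sNumber": "s2", "sName": "Greg", "pName": "MM"},
]
professor = [
    {"pNumber": "p1", "pName": "MM"},
    {"pNumber": "p2", "pName": "MM"},
]

rejoined = natural_join(student, professor)
spurious = [row for row in rejoined if row not in original]
spurious_pairs = sorted((row["sNumber"], row["pNumber"]) for row in spurious)
print(len(rejoined), spurious_pairs)
```

Because both professors share the name MM, joining on pName pairs every student with every professor, so the join returns more rows than the original relation held.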