Post on 10-Sep-2015
Database Systems
Normalization
GCE (A/L) ICT Training for Teachers 1
Contact Persons
Name : Buddhika H. Kasthuriarachchy
Email : buddhika.h@sliit.lk
Phone : 0112 413900 ext: 4301
Mobile : 0773607507
Recommended Reading
https://sites.google.com/site/ictalnie/
Google user name : ictalnie2011
Password : ictalniepython
Fundamentals of Database Systems (5th Edition), Ramez Elmasri / Shamkant B. Navathe
Database Management Systems (2nd Edition), Raghu Ramakrishnan / Johannes Gehrke
Introduction
Conceptual Modeling is a subjective process
Therefore, the schema produced by the logical database design phase may not be very good (it may contain redundancies)
However, there are formalisms to ensure that the schema is good.
This process is called Normalization
Relational database schema = set of relations
Relation = set of attributes
How we group the attributes into relations is very important
Too many attributes in a relation:
- Wasted space
- Anomalies
Decomposing the relation into too many small relations risks losing:
- Lossless-join property
- Dependency-preserving property
Too many attributes
For example,
LECTURER(id, name, address, salary, deptno, dname, building)
Insertion Anomaly
1. Inserting a new lecturer into the LECTURER table
- Department information is repeated (we must ensure that the correct department information is inserted each time).
2. Inserting a department with no lecturers
- Impossible, because a null value for id is not allowed.
Deletion Anomalies
Deleting the last lecturer of a department also loses the information about that department
Update Anomalies
Updating a department's building must be done for every lecturer working in that department
When redundancies exist, we should decompose the relation into smaller relations
Lossless-join property: we might lose information if we decompose relations carelessly
Dependency-preserving property: the set of dependencies in S can be verified using the sets of dependencies in R1 and R2
Loss-less join property:
For example,
S (original relation):
S    P    D
S1   P1   D1
S2   P2   D2
S3   P1   D3

R1 (projection of S on S, P):
S    P
S1   P1
S2   P2
S3   P1

R2 (projection of S on P, D):
P    D
P1   D1
P2   D2
P1   D3
Joining them together, we get spurious tuples
R1 ⋈ R2:
S    P    D
S1   P1   D1
S1   P1   D3   (spurious)
S2   P2   D2
S3   P1   D1   (spurious)
S3   P1   D3
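The spurious tuples can be reproduced with a few lines of Python. This is only a minimal sketch: relations are modelled as sets of tuples, and the variable names (`R1`, `R2`, `joined`, `spurious`) are chosen for illustration.

```python
# The original relation S(S, P, D) from the slide, as a set of tuples.
S = {("S1", "P1", "D1"), ("S2", "P2", "D2"), ("S3", "P1", "D3")}

# Decompose S by projecting onto (S, P) and onto (P, D).
R1 = {(s, p) for (s, p, _) in S}
R2 = {(p, d) for (_, p, d) in S}

# Natural join of R1 and R2 on the shared attribute P.
joined = {(s, p, d) for (s, p) in R1 for (p2, d) in R2 if p == p2}

# Tuples that appear in the join but were never in S.
spurious = joined - S

print(len(joined))       # 5
print(sorted(spurious))  # [('S1', 'P1', 'D3'), ('S3', 'P1', 'D1')]
```

The join returns five tuples although S had only three, so this decomposition is not lossless.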
To avoid the above-mentioned issues in the relational schema, we can apply a formal process called Normalization
Normalization is based on functional dependencies
A functional dependency, denoted by X → Y, where X and Y are sets of attributes in relation R, specifies the following constraint:
Let t1 and t2 be any two tuples of relation R in any given instance
Whenever t1[X] = t2[X], then t1[Y] = t2[Y]
where ti[X] represents the values for X in tuple ti
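The definition can be turned into a small check over one relation instance. The sketch below assumes rows are Python dictionaries; the function name `fd_holds` and the sample LECTURER rows are hypothetical and used only for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this instance (a list of dicts)."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y for a new x, otherwise returns the stored value
        if seen.setdefault(x, y) != y:
            return False  # same X-value, different Y-value
    return True

# Hypothetical sample rows, not taken from the slides.
lecturers = [
    {"id": 1, "deptno": 10, "dname": "Computing"},
    {"id": 2, "deptno": 10, "dname": "Computing"},
    {"id": 3, "deptno": 20, "dname": "Business"},
]

print(fd_holds(lecturers, ["deptno"], ["dname"]))  # True
print(fd_holds(lecturers, ["dname"], ["id"]))      # False
```

Note that a single instance can only refute an FD. An FD is a constraint on every legal instance, so a `True` result here does not prove that the FD holds for the schema.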
Key points:
Redundancy is based on functional dependencies
Therefore, normalization is based on functional dependencies
Given some FDs, we can usually infer additional FDs:
A → B, B → C implies A → C
An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold.
F+ = closure of F is the set of all FDs that are implied by F.
How can we get F+?
Armstrong's Axioms (X, Y, Z are sets of attributes):
Reflexivity: If X ⊆ Y, then Y → X
Augmentation: If X → Y, then XZ → YZ for any Z
Transitivity: If X → Y and Y → Z, then X → Z
These are sound and complete inference rules for FDs!
A couple of additional rules (that follow from Armstrong's Axioms):
Union: If X → Y and X → Z, then X → YZ
Decomposition: If X → YZ, then X → Y and X → Z
Example: Contracts(cid, sid, jid, did, pid, qty, value), and:
C is the key: C → CSJDPQV
Project purchases each part using a single contract: JP → C
Dept purchases at most one part from a supplier: SD → P
JP → C, C → CSJDPQV imply JP → CSJDPQV
SD → P implies SDJ → JP
SDJ → JP, JP → CSJDPQV imply SDJ → CSJDPQV
Why is F+ important?
X → RHS in relation R
X is a subset of the attributes in relation R. If RHS contains all attributes of R, then X is a superkey.
If X is not a superkey, then values for X can repeat in different tuples, resulting in redundancy.
So determining F+ can help us find superkeys and check for any redundancy.
Computing the closure of a set of FDs can be expensive. (Size of closure is exponential in # attrs!)
Typically, we just want to check if a given FD X → Y is in the closure F+ of a set of FDs F. An efficient check:
Compute the attribute closure of X (denoted X+) wrt F:
Set of all attributes A such that X → A is in F+
There is a linear time algorithm to compute this.
Check if Y ⊆ X+
Algorithm to find X+:
closure = X;
repeat until there is no change: {
    if there is an FD U → V in F such that U ⊆ closure,
    then set closure = closure ∪ V
}
Does F = {A → B, B → C, CD → E} imply A → E?
i.e., is A → E in the closure F+? Equivalently, is E in A+?
We can use the attribute closure to find the keys of the relation. If X+ contains all attributes of the relation, then X is a superkey.
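The algorithm above can be sketched in Python as follows. The function name `attribute_closure` and the encoding of FDs as (left side, right side) pairs of single-letter attribute strings are assumptions made for this illustration. This simple loop is not the linear-time version, but it is enough for small examples.

```python
def attribute_closure(attrs, fds):
    """Compute X+ for the attribute set attrs under the FDs in fds.

    fds is a list of (lhs, rhs) pairs; each side is an iterable of
    attribute names (here: strings of single-letter attributes).
    """
    closure = set(attrs)
    changed = True
    while changed:  # repeat until there is no change
        changed = False
        for lhs, rhs in fds:
            # U -> V applies when U is a subset of the closure so far
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

F = [("A", "B"), ("B", "C"), ("CD", "E")]

print(sorted(attribute_closure("A", F)))   # ['A', 'B', 'C']
print("E" in attribute_closure("A", F))    # False
print(sorted(attribute_closure("AD", F)))  # ['A', 'B', 'C', 'D', 'E']
```

Since E is not in A+, F does not imply A → E. On the other hand, AD+ contains every attribute, so AD would be a superkey of a relation R(A, B, C, D, E).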
Schema Refinement Steps:
Determine F for relation R
Find all keys of R using attribute closure
Normalize
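The key-finding step can be illustrated with a brute-force search over attribute subsets, applied to the Contracts example from earlier. This is only a sketch: `candidate_keys` is a hypothetical helper, and the search is exponential in the number of attributes, so it only suits small schemas.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Compute the closure of attrs under fds (list of (lhs, rhs) pairs)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    all_attrs = set(attributes)
    keys = []
    # Try smaller subsets first so that every key found is minimal.
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if attribute_closure(combo, fds) == all_attrs:
                keys.append("".join(combo))
    return keys

# Contracts(C, S, J, D, P, Q, V) with the FDs given in the example.
F = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P")]
print(candidate_keys("CSJDPQV", F))  # ['C', 'JP', 'DJS']
```

The result agrees with the derivation in the example: C, JP and SDJ (printed here as DJS because attributes are sorted) each determine all attributes.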
There are many Normal Forms proposed to reduce redundancies
Some of the well-known ones are:
1st Normal Form
2nd Normal Form
3rd Normal Form
Boyce-Codd Normal Form
Lossless-join decomposition: A decomposition of R into X and Y is lossless-join w.r.t. a set of FDs F if, for every instance r that satisfies F:
π_X(r) ⋈ π_Y(r) = r
Theorem: This condition holds if the attributes common to X and Y contain a key for either X or Y
We can find a lossless-join decomposition for 1st NF, 2nd NF, 3rd NF and BCNF (will see later)
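The theorem gives a simple test for a two-way decomposition: compute the closure of the common attributes and see whether it covers X or Y. The sketch below applies it to the earlier S(S, P, D) example. The function name `is_lossless_binary` is hypothetical, and both FD sets are assumptions, since the slides give no FDs for S.

```python
def attribute_closure(attrs, fds):
    """Compute the closure of attrs under fds (list of (lhs, rhs) pairs)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def is_lossless_binary(x, y, fds):
    """True if the attributes common to X and Y determine all of X or all of Y."""
    common = set(x) & set(y)
    closure = attribute_closure(common, fds)
    return set(x) <= closure or set(y) <= closure

# Assumption: S(S, P, D) has no nontrivial FDs, as the sample instance suggests.
print(is_lossless_binary("SP", "PD", []))            # False

# Hypothetical FD P -> D (it does NOT hold in the sample instance).
print(is_lossless_binary("SP", "PD", [("P", "D")]))  # True
```

With no FDs, the common attribute P is a key of neither R1 nor R2, which matches the spurious tuples seen when joining them.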
Dependency-preserving property:
A decomposition of a relation R, with a set of functional dependencies F, into relations X and Y is said to be dependency preserving iff F+ = (FX ∪ FY)+, where FX and FY are the FDs in F+ that involve only attributes of X and only attributes of Y, respectively.
That is, a dependency-preserving decomposition allows us to enforce all FDs by examining a single relation instance.
We can always obtain a dependency-preserving decomposition for 1st NF, 2nd NF and 3rd NF. Not necessarily for BCNF (will see later)
Review of some terms
Superkey: A set of attributes S in relation R such that no two distinct tuples t1 and t2 have t1[S] = t2[S]
Key: A superkey with the additional property that removing any attribute from it leaves a set that is no longer a superkey
Candidate Key: Each key of a relation is called a candidate key
Primary Key: A candidate key is chosen to be the primary key
Prime Attribute: An attribute which is a member of some candidate key
Nonprime Attribute: An attribute which is not prime
1st Normal Form
A relation R is in first normal form (1NF) if domains of all attributes in the relation are atomic (simple & indivisible).
2nd Normal Form:
A relation R is in second normal form (2NF) if no nonprime attribute A in R is partially dependent on any key of R (that is, dependent on a proper subset of a key)
Example
EMP_PROJ(NIC, PNUM, HOURS, ENAME, PNAME, PLOC)
FD1: {NIC, PNUM} → HOURS
FD2: NIC → ENAME
FD3: PNUM → {PNAME, PLOC}
EP1(NIC, PNUM, HOURS)
EP2(NIC, ENAME)
EP3(PNUM, PNAME, PLOC)
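The partial dependencies that force this decomposition can be found mechanically. The sketch below makes several assumptions: {NIC, PNUM} is the only candidate key of EMP_PROJ, the three FDs are read off the decomposition into EP1, EP2 and EP3, and the function only inspects the FDs it is given rather than the full closure F+. The function name `partial_dependencies` is hypothetical.

```python
def partial_dependencies(key, fds):
    """List the given FDs whose left side is a proper subset of the key
    and whose right side contains at least one nonprime attribute.

    Assumes `key` is the only candidate key, so an attribute is prime
    exactly when it belongs to `key`.
    """
    key = set(key)
    found = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        nonprime = rhs - key
        if lhs < key and nonprime:  # "<" is the proper-subset test
            found.append((sorted(lhs), sorted(nonprime)))
    return found

key = {"NIC", "PNUM"}
fds = [
    ({"NIC", "PNUM"}, {"HOURS"}),   # full dependency on the key
    ({"NIC"}, {"ENAME"}),           # depends on part of the key
    ({"PNUM"}, {"PNAME", "PLOC"}),  # depends on part of the key
]

print(partial_dependencies(key, fds))
# [(['NIC'], ['ENAME']), (['PNUM'], ['PLOC', 'PNAME'])]
```

Each partial dependency reported here corresponds to one of the new relations: NIC → ENAME gives EP2 and PNUM → {PNAME, PLOC} gives EP3, while the full dependency on the key stays in EP1.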
3rd Normal Form:
A relation R is in 3rd normal form (3NF) if:
R is in 2NF, and
no nonprime attribute is transitively dependent on any key
Example:
EMP_DEPT(ENAME, SSN, BDATE, ADD, DNUM, DNAME, DMGR)
Decomposed into:
ED1(ENAME, SSN, BDATE, ADD, DNUM)
ED2(DNUM, DNAME, DMGR)