Database
Normalization
Designing Good Schemas
We know how to create schemas, but how do we create good schemas? What does good mean?
Schema quality measures: clear semantics of the attributes; minimal redundancy; minimal frequency of null values.
Functional Dependencies
A column Y of relational table R is functionally dependent upon column X of R if and only if:
Each value of X in R is associated with exactly one value of Y at any given time
Functional Dependencies
Y is functionally dependent upon X: the same as saying values of X identify values of Y
X → Y means that Y depends on X, or X identifies Y
If X → Y then XZ → YZ
If X → Y and Y → Z then X → Z
Examples
S# → Ename
{S#, P#} → Hours
If for each value of S# there is exactly one corresponding value for Sname, State, and City, then:
S# → {Sname, State, City}
Example
{S#, P#} → Qty (S# and P# together determine Qty)
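The definition above can be tested against sample data. The following is a minimal sketch, not part of the original slides: the function name fd_holds and the rows in works_on are hypothetical, chosen to match the S#, Ename, P#, Hours attributes in the examples.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every value of lhs maps to exactly one value of rhs in rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen; afterwards it returns
        # the stored value, so any mismatch means x maps to two different y's.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample rows (not from the slides).
works_on = [
    {"S#": 1, "Ename": "Ann", "P#": 10, "Hours": 5},
    {"S#": 1, "Ename": "Ann", "P#": 20, "Hours": 8},
    {"S#": 2, "Ename": "Bob", "P#": 10, "Hours": 5},
]

print(fd_holds(works_on, ["S#"], ["Ename"]))        # True
print(fd_holds(works_on, ["S#", "P#"], ["Hours"]))  # True
print(fd_holds(works_on, ["S#"], ["Hours"]))        # False
```

Note that a data sample can only refute an FD. Whether an FD truly holds is a statement about the meaning of the attributes, not about one particular table instance.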
Redundancy Example
Where's the redundancy?
Example FDs
Kinds of FDs shown: proper FDs, partial key FDs, transitive FDs
Partial Key FD
Normal Forms
Each normal form is a set of conditions on a schema that guarantees certain properties (relating to redundancy and update anomalies)
The two commonly used normal forms are third normal form (3NF) and Boyce-Codd normal form (BCNF)
Normalization
0NF to 1NF: remove multi-valued attributes
1NF to 2NF: remove partial dependencies
2NF to 3NF: remove transitive dependencies
3NF to BCNF: remove remaining anomalies from functional dependencies
BCNF to 4NF: remove multivalued dependencies
4NF to 5NF: remove remaining anomalies
1 NF
First normal form requires:
No multi-valued attributes
No composite attributes
No nested relations
To remove a violation, we create a new table or a new field (examples: telephone, visiting)
1NF Normalization
Properly translating ER multi-valued attributes achieves 1NF.
Still not a good solution, since we have redundancy in Dnumber and Dmgr_ssn. (This will be handled by 2NF.)
2 NF
A relation is in second normal form if it is in 1NF and no non-key attribute depends on only part of a multi-attribute primary key.
Example attributes: S#, P#, Hours, Cname, pname, Loc
2NF Normalization
Move the partial key and dependent attributes to a new relation.
Transitive Dependencies
X → Y is a transitive dependency (TD) if there exists Z ⊈ any key such that X → Z → Y
TDs can cause redundancy: if there are multiple values of X that determine the same value of Z, the value of Y for that value of Z is stored multiple times
3NF normalization: move (Z, Y) to a new relation in which Z is the primary key
3 NF
A relation is in 3NF if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key
3NF Normalization
Create new relation to hold the attributes in the transitive FD.
LHS of transitive FD becomes PK of new relation.
Transitive Dependency Example
I_OFFICE (instructor's office) is determined by the non-PK attribute INSTR
DEPT  COURSE  SECTION  ROOM    INSTR    I_OFFICE
COMP  51      1        WPC122  DOHERTY  CSB109
COMP  51      2        WPC219  CLIBURN  CSB107
COMP  163     1        WPC122  DOHERTY  CSB109
COMP  53      1        WPC130  BOWRING  CSB108
COMP  53      2        WPC130  CARMAN   CSB104
3NF Decomposition: Foreign Keys
Original relation: (DEPT, COURSE, SECTION, ROOM, INSTR, I_OFFICE)
Decomposition:
(DEPT, COURSE, SECTION, ROOM, INSTR), where INSTR is a foreign key
(INSTR, I_OFFICE), where INSTR is the primary key
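The decomposition can be checked on the sample section data. This is a minimal sketch in plain Python rather than SQL: the helper project and the variable names are hypothetical, and the rows are the five sections from the transitive dependency example. It projects the original table onto the two new relations, rejoins them on INSTR, and confirms that no information is lost.

```python
HEADER = ["DEPT", "COURSE", "SECTION", "ROOM", "INSTR", "I_OFFICE"]
ROWS = [
    ("COMP", 51, 1, "WPC122", "DOHERTY", "CSB109"),
    ("COMP", 51, 2, "WPC219", "CLIBURN", "CSB107"),
    ("COMP", 163, 1, "WPC122", "DOHERTY", "CSB109"),
    ("COMP", 53, 1, "WPC130", "BOWRING", "CSB108"),
    ("COMP", 53, 2, "WPC130", "CARMAN", "CSB104"),
]


def project(rows, header, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    idx = [header.index(a) for a in attrs]
    return sorted({tuple(r[i] for i in idx) for r in rows})


# The two relations of the decomposition.
sections = project(ROWS, HEADER, ["DEPT", "COURSE", "SECTION", "ROOM", "INSTR"])
instructors = project(ROWS, HEADER, ["INSTR", "I_OFFICE"])

# INSTR is the primary key of the new relation, so a dict lookup plays the
# role of following the foreign key from a section to its instructor.
office_of = dict(instructors)
rejoined = sorted(s + (office_of[s[4]],) for s in sections)

print(len(instructors))          # 4
print(rejoined == sorted(ROWS))  # True
```

DOHERTY's office appears twice in the original table but only once in the instructors relation, which is the redundancy the decomposition removes.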
Arthur Keller – CS 180
Normalization
Goal = BCNF = Boyce-Codd Normal Form = all FD’s follow from the fact “key → everything.”
Formally, R is in BCNF if for every nontrivial FD for R, say X → A, X is a superkey.
“Nontrivial” = right-side attribute not in left side.
Why?
1. Guarantees no redundancy due to FD’s.
2. Guarantees no update anomalies = one occurrence of a fact is updated, not all.
3. Guarantees no deletion anomalies = valid fact is lost when tuple is deleted.
Boyce-Codd Normal Form
Sample data for Course Section table
Because Prefix → Department, we know that (Prefix, Num, SecNum) could also be a primary key for this table.
Department   Prefix  Num  SecNum  CourseName                 Instructor
Mathematics  Math    101  1       Algebra I                  Al Jeebra
Mathematics  Math    101  2       Algebra I                  Al Jeebra
Mathematics  Math    201  1       Calculus I                 Kal Kuelus
Philosophy   Phil    201  1       Greek Thought              Arie Stottle
Philosophy   Phil    202  1       Euro Thought               Mike Angelo
Marketing    Mktg    410  1       Marketing Strategy         Marc Ekking
Marketing    SpMkg   401  1       Advanced Sports Marketing  Hulk Hogan
Example
Students(name, addr, phones, CarsLiked)
A student’s phones are independent of the cars they like.
Thus, each of a student’s phones appears with each of the cars they like in all combinations.
This repetition is unlike redundancy due to FD’s, of which name->addr is the only one.
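The "all combinations" effect is easy to see by generating the rows. This is a minimal sketch with hypothetical data: the function student_rows, the student, the phone numbers, and the car names are invented for illustration.

```python
from itertools import product


def student_rows(name, addr, phones, cars):
    """Build Students tuples: every phone is paired with every liked car."""
    return [(name, addr, phone, car) for phone, car in product(phones, cars)]


# Hypothetical student with 2 phones and 3 liked cars.
rows = student_rows(
    "Sue",
    "12 Elm St",
    ["555-0101", "555-0102"],
    ["Civic", "Golf", "Mini"],
)

print(len(rows))  # 6
for row in rows:
    print(row)
```

Two phones and three cars are five independent facts, yet they occupy six rows, and the address is repeated in every one of them.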
Example
Students(name, addr, CarsLiked, manf, favCar)
FD’s: name->addr favCar, CarsLiked->manf
Only key is {name, CarsLiked}.
In each FD, the left side is not a superkey.
Any one of these FD’s shows Students is not in BCNF.
Boyce-Codd Normal Form
We say a relation R is in BCNF if whenever X ->A is a nontrivial FD that holds in R, X is a superkey.
Remember: nontrivial means A is not a member of set X.
Remember, a superkey is any superset of a key (not necessarily a proper superset).
Example
Students(name, addr, CarsLiked, manf, favCar)
F = name->addr, name -> favCar, CarsLiked->manf
Pick BCNF violation name->addr.
Close the left side: {name}+ = {name, addr, favCar}.
Decomposed relations:
1. Students1(name, addr, favCar)
2. Students2(name, CarsLiked, manf)
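The closure step and the split can be sketched in code. This is a minimal illustration under stated assumptions, not a full BCNF algorithm: the function names closure, is_superkey, and decompose_once are hypothetical, FDs are represented as (left side, right side) pairs of sets, and only a single decomposition step is performed.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already covered.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, relation, fds):
    """attrs is a superkey of relation if its closure covers the relation."""
    return closure(attrs, fds) >= relation


def decompose_once(relation, fds, lhs):
    """Split relation on a violating left side: (closure, lhs + the rest)."""
    closed = closure(lhs, fds) & relation
    return closed, set(lhs) | (relation - closed)


R = {"name", "addr", "CarsLiked", "manf", "favCar"}
F = [({"name"}, {"addr"}), ({"name"}, {"favCar"}), ({"CarsLiked"}, {"manf"})]

print(is_superkey({"name"}, R, F))    # False, so name->addr violates BCNF
print(sorted(closure({"name"}, F)))   # ['addr', 'favCar', 'name']

students1, students2 = decompose_once(R, F, {"name"})
print(sorted(students1))              # ['addr', 'favCar', 'name']
print(sorted(students2))              # ['CarsLiked', 'manf', 'name']
```

Checking is_superkey({"CarsLiked"}, students2, F) returns False, so CarsLiked->manf still violates BCNF in Students2 and a further decomposition step would be needed.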
3NF and BCNF
3rd Normal Form (3NF) modifies the BCNF condition so we do not have to decompose in this problem situation.
X ->A violates 3NF if and only if X is not a superkey and A is not prime (that is, A is not a member of any key).
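The 3NF condition can be checked mechanically once the keys are known. This is a minimal sketch, assuming a brute-force key search that is only practical for small schemas: the function names closure, candidate_keys, and violates_3nf are hypothetical, and the schema is the Students relation from the earlier example. An attribute is prime if it belongs to at least one key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Brute force: try attribute sets by increasing size, keep minimal superkeys."""
    keys = []
    attrs = sorted(relation)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            # Smaller keys were found first, so a superset of one is not minimal.
            if closure(s, fds) >= relation and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def violates_3nf(lhs, a, relation, fds):
    """X -> A violates 3NF if it is nontrivial, X is no superkey, A is not prime."""
    prime = set().union(*candidate_keys(relation, fds))
    is_superkey = closure(lhs, fds) >= relation
    return a not in lhs and not is_superkey and a not in prime


R = {"name", "addr", "CarsLiked", "manf", "favCar"}
F = [({"name"}, {"addr"}), ({"name"}, {"favCar"}), ({"CarsLiked"}, {"manf"})]

print(candidate_keys(R, F) == [{"name", "CarsLiked"}])   # True
print(violates_3nf({"name"}, "addr", R, F))              # True
print(violates_3nf({"name", "CarsLiked"}, "manf", R, F)) # False
```

In this relation the non-prime right sides mean every BCNF violation is also a 3NF violation. The two normal forms differ only when the right side of a violating FD is prime.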
Exercises
The following relation schema is not in third normal form (3NF):
SHIPMENT(SID, FROM_CITY, TO_CITY, DISTANCE, WEIGHT)
Is this an example of a transitive dependency or a partial key dependency?
Give an equivalent schema that is in 3NF.
Exercises
This relation has been proposed to track Pacific alumni:
Alumni( SID, LastName, FirstName, Degree, YearAwarded, Phone).
Pacific allows students to receive multiple degrees, possibly in different years. Identify all FDs.
Give a new schema that is in third normal form.
Exercises
Consider the following relation schema:
Movie(title, genre, length, actor, sag_id, studio, studio_addr)
Every movie has a unique title. A movie may have multiple actors. Each actor has a unique sag_id. An actor may appear in multiple movies. A movie has exactly one studio, but a studio may produce more than one movie. Each studio has exactly one address.
Identify all functional dependencies.
Normalize the schema to 3NF.