Chapter 4 Normalization - Villanova Universitymdamian/Past/databasefa11/notes/...Chapter 4...
Data Normalization
• Formal process of decomposing relations with anomalies to produce smaller, well-structured, and stable relations
• Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data
Well-Structured Relations
• A relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies
• Goal is to avoid (minimize) anomalies
– Insertion Anomaly – adding new rows forces user to create duplicate data
– Deletion Anomaly – deleting a row may cause loss of other data representing completely different facts
– Modification Anomaly – changing data in a row forces changes to other rows because of duplication
General rule of thumb: a table should not pertain to more than one entity type
Example – Figure 4.2b
Question – Is this a relation?
Question – What's the primary key?
Anomalies in this Table
• Insertion – can't enter a new employee without having the employee take a class
• Deletion – if we remove employee 140, we lose information about the existence of a Tax Acc class
• Modification – giving a salary increase to employee 100 forces us to update multiple records
Why do these anomalies exist?
Because there are two themes (entity types – what are they?) in this one relation (two entity types were combined). This results in duplication, and an unnecessary dependency between the entities
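The three anomalies can be reproduced on a toy table. This is only a sketch: the rows below are hypothetical (the data of Figure 4.2b is not reproduced in these notes), and the helper names `can_insert`, `delete_employee`, and `raise_salary` are made up for illustration. The key is assumed to be (EmpID, CourseTitle).

```python
# Hypothetical rows: names, salaries and most course titles are invented.
# Columns: (EmpID, Name, Salary, CourseTitle); assumed key: (EmpID, CourseTitle).
rows = [
    (100, "Ann", 48000, "SPSS"),
    (100, "Ann", 48000, "Surveys"),
    (140, "Bob", 52000, "Tax Acc"),
]

def can_insert(row):
    """A row is insertable only if no part of the key is missing."""
    emp_id, _, _, course_title = row
    return emp_id is not None and course_title is not None

def delete_employee(table, emp_id):
    """Return the table without any row belonging to emp_id."""
    return [r for r in table if r[0] != emp_id]

def raise_salary(table, emp_id, new_salary):
    """Return (updated table, number of rows that had to change)."""
    updated = [(e, n, new_salary if e == emp_id else s, c)
               for (e, n, s, c) in table]
    changed = sum(1 for r in table if r[0] == emp_id)
    return updated, changed

def courses(table):
    """Course titles the table still knows about."""
    return {r[3] for r in table}

# Insertion anomaly: a new employee with no class has no complete key.
insert_ok = can_insert((150, "Cy", 42000, None))

# Deletion anomaly: removing employee 140 also erases the Tax Acc class.
tax_acc_survives = "Tax Acc" in courses(delete_employee(rows, 140))

# Modification anomaly: one raise for employee 100 touches several rows.
_, rows_changed = raise_salary(rows, 100, 50000)

print(insert_ok, tax_acc_survives, rows_changed)
```

With employee facts and course facts in separate tables, each of these operations would touch exactly one row and lose nothing else.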
Functional Dependencies
• Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
– A → B reads "Attribute B is functionally dependent on A"
– A → B means if two rows have same value of A they necessarily have same value of B
– FDs are determined by semantics: You can't say that an FD exists just by looking at data. But you can say whether it does not exist by looking at data.
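The last point can be made concrete with a small check. The sketch below assumes rows are stored as dictionaries; the function name `violates_fd` and the attributes A, B, C are placeholders, not part of the course material. A result of True disproves the FD; a result of False only means the sample data does not contradict it.

```python
def violates_fd(rows, lhs, rhs):
    """Return True if two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return True
    return False

# Placeholder data with generic attribute names.
sample = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "x", "C": 20},
    {"A": 2, "B": "y", "C": 10},
]

a_to_b = violates_fd(sample, ["A"], ["B"])  # no counterexample in the data
a_to_c = violates_fd(sample, ["A"], ["C"])  # A = 1 maps to both 10 and 20
c_to_a = violates_fd(sample, ["C"], ["A"])  # C = 10 maps to both 1 and 2
print(a_to_b, a_to_c, c_to_a)
```

Even though `a_to_b` is False, whether A → B really holds still depends on what the attributes mean, not on these three rows.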
Quick Check
• Id → Name?
• Age → Gender?
• Name → Id?
• Name, Age → Id?
Functional Dependencies and Keys
• Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
• Candidate Key
– Attribute that uniquely identifies a row in a relation
– Could be a combination of (non-redundant) attributes
– Each non-key field is functionally dependent on every candidate key
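One way to connect FDs and candidate keys is the attribute closure: a set of attributes is a candidate key if its closure covers the whole relation and no smaller subset does. The sketch below is a brute-force illustration, not an efficient algorithm. The function names are made up, and the FDs are those of the Student relation used in the 2NF slides of these notes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure covers the whole relation."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(all_attrs):
                keys.append(candidate)
    return keys

student_attrs = {"StudentId", "StuName", "CourseId", "CourseName", "Grade"}
student_fds = [
    ({"StudentId"}, {"StuName"}),
    ({"CourseId"}, {"CourseName"}),
    ({"StudentId", "CourseId"}, {"Grade"}),
]

keys = candidate_keys(student_attrs, student_fds)
print(keys)
```

For this relation the only candidate key found is the combination of StudentId and CourseId, since neither attribute alone determines Grade.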
Figure 4-23: Representing Functional Dependencies (cont.)
EmpID → ____________________
EmpID, CourseTitle → ____________________
First Normal Form (1NF)
• Only atomic attributes (simple, single-value)
• A primary key has been identified
• Every relation is in 1NF by definition
• Fig. 4-2a is not in 1st Normal Form (multivalued attributes); it is not a relation
• Fig. 4-2b is in 1st Normal Form (but is not a well-structured relation)
1NF Example
Student
StudentId  StuName  CourseId  CourseName  Grade
100        Mike     112       C++         A
100        Mike     111       Java        B
101        Susan    222       Database    A
140        Lorenzo  224       Graphics    B
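To illustrate what "only atomic attributes" means, the sketch below starts from a hypothetical non-1NF version of the Student data, in which each student carries a multivalued list of courses, and flattens it into the rows of the 1NF example. The nested layout and the name `to_1nf` are assumptions made for illustration only.

```python
# Hypothetical non-1NF form: one record per student with a multivalued
# "courses" attribute. The values match the Student table in the 1NF example.
nested = [
    {"StudentId": 100, "StuName": "Mike",
     "courses": [(112, "C++", "A"), (111, "Java", "B")]},
    {"StudentId": 101, "StuName": "Susan",
     "courses": [(222, "Database", "A")]},
    {"StudentId": 140, "StuName": "Lorenzo",
     "courses": [(224, "Graphics", "B")]},
]

def to_1nf(records):
    """Emit one flat row per (student, course) pair so every value is atomic."""
    flat = []
    for rec in records:
        for course_id, course_name, grade in rec["courses"]:
            flat.append((rec["StudentId"], rec["StuName"],
                         course_id, course_name, grade))
    return flat

student = to_1nf(nested)
for row in student:
    print(row)
```

The flat table needs the composite key (StudentId, CourseId), and the student name is now repeated once per course, which is the redundancy the later normal forms remove.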
Practice Exercise #7, page #196
1. Convert this table to a relation (named PART SUPPLIER) in 1NF
2. Draw a relational schema for PART SUPPLIER and show the functional dependencies. Identify a candidate key.
3. Identify each of the following: an insert anomaly, a delete anomaly, and a modification anomaly.
Figure 4-22: Steps in normalization
Table with multivalued attributes
  ↓ Remove multivalued attributes
First normal form (1NF)
  ↓ Remove partial dependencies
Second normal form (2NF)
  ↓ Remove …
Third normal form (3NF)
  ↓ Remove remaining anomalies resulting from multiple candidate keys
Boyce-Codd normal form (BC-NF)
  ↓ Remove multivalued dependencies
Fourth normal form (4NF)
  ↓ Remove remaining anomalies
Fifth normal form (5NF)
Second Normal Form (2NF)
• 1NF and no partial (functional) dependencies
• A partial dependency is when a non-key attribute depends on a part of the primary key
• Of course, this can only happen when the PK is composite
• Fig. 4-2b is NOT in 2NF
Functional Dependencies in Student
StudentId StuName CourseId CourseName Grade
Can represent FDs with arrows as above, or
• StudentId → StuName
• CourseId → CourseName
• StudentId, CourseId → Grade (and StuName, CourseName)
Any partial FDs?
Yes: StuName depends only on StudentId, and CourseName depends only on CourseId, each of which is just part of the composite key (StudentId, CourseId).
Therefore, NOT in 2nd Normal Form!!
2NF: Normalizing
• How do we convert the partial dependencies into normal ones? By breaking into more tables.
StudentId StuName CourseId CourseName Grade
• Becomes … (notice above arrows mean functional dependency, below they mean FK constraints)
StudentId StuName
CourseId CourseName
StudentId CourseId Grade
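The split into three tables can be sketched as three projections of the 1NF Student rows. The helper `project` and the table names `students`, `courses`, and `grades` are hypothetical (the slides leave the resulting tables unnamed); the data is the Student table from the 1NF example.

```python
COLUMNS = ("StudentId", "StuName", "CourseId", "CourseName", "Grade")
student = [
    (100, "Mike", 112, "C++", "A"),
    (100, "Mike", 111, "Java", "B"),
    (101, "Susan", 222, "Database", "A"),
    (140, "Lorenzo", 224, "Graphics", "B"),
]

def project(rows, columns, keep):
    """Project rows onto the columns in keep, dropping duplicate rows."""
    idx = [columns.index(c) for c in keep]
    result = []
    for row in rows:
        projected = tuple(row[i] for i in idx)
        if projected not in result:
            result.append(projected)
    return result

# One table per determinant: each non-key attribute now depends on a whole key.
students = project(student, COLUMNS, ("StudentId", "StuName"))
courses = project(student, COLUMNS, ("CourseId", "CourseName"))
grades = project(student, COLUMNS, ("StudentId", "CourseId", "Grade"))

print(students)
print(courses)
print(grades)
```

Mike's name is now stored once, and the StudentId and CourseId columns of `grades` act as foreign keys into the other two tables.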
You Try It Now …
SeriesId EpisodeId SeriesTitle EpisodeTitle AiringDate
• List all FDs
• Eliminate partial FDs, if any
Third Normal Form
• 2NF and no transitive dependencies
• A transitive dependency is when a non-key attribute depends on another non-key attribute
3NF Example
Course SectNum Classroom Capacity
• Classroom → Capacity
• Any partial FDs?
• Any transitive FDs?
3NF Example
Course SectNum Classroom Capacity
• Classroom → Capacity TRANSITIVE
• Any partial FDs? NO
• Any transitive FDs? YES!
• How do we eliminate it?
• By breaking into its own table
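The slide definitions of partial and transitive dependencies can be turned into a rough classifier. This is a simplified sketch: `classify_fd` is a made-up name, it only looks at where the determinant sits relative to the primary key, and the key (Course, SectNum) for the 3NF example is an assumption, since the notes do not state it in text.

```python
def classify_fd(lhs, rhs, primary_key):
    """Label the FD lhs -> rhs relative to a primary key.

    partial:    the determinant is a proper subset of the key
    transitive: the determinant contains no key attribute
    full:       the determinant is the whole key
    other:      anything the slide definitions do not cover
    """
    lhs, rhs, key = set(lhs), set(rhs), set(primary_key)
    if rhs & key:
        return "other"  # the dependent side includes a key attribute
    if lhs == key:
        return "full"
    if lhs < key:
        return "partial"
    if not lhs & key:
        return "transitive"
    return "other"

student_key = ("StudentId", "CourseId")
section_key = ("Course", "SectNum")  # assumed key of the 3NF example

name_fd = classify_fd(["StudentId"], ["StuName"], student_key)
grade_fd = classify_fd(["StudentId", "CourseId"], ["Grade"], student_key)
capacity_fd = classify_fd(["Classroom"], ["Capacity"], section_key)
print(name_fd, grade_fd, capacity_fd)
```

A relation with any "partial" FD fails 2NF, and one with any "transitive" FD fails 3NF, which matches the answers on the slide for both examples.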
3NF Normalization
Course SectNum Classroom
Classroom Capacity
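A quick sanity check on this decomposition is that joining the two smaller tables on Classroom gives back exactly the original rows, so no information is lost. The sketch below uses hypothetical sample rows (only the column names come from the slides) and made-up table names.

```python
# Hypothetical rows; columns are (Course, SectNum, Classroom, Capacity).
sections = [
    ("DB101", 1, "Room A", 30),
    ("DB101", 2, "Room A", 30),
    ("OS201", 1, "Room B", 24),
]

# Decompose: (Course, SectNum, Classroom) and (Classroom, Capacity).
section_room = sorted({(c, s, r) for (c, s, r, _) in sections})
room_capacity = sorted({(r, cap) for (_, _, r, cap) in sections})

# Natural join on Classroom to rebuild the original relation.
rejoined = sorted(
    (c, s, r, cap)
    for (c, s, r) in section_room
    for (r2, cap) in room_capacity
    if r == r2
)

print(room_capacity)
print(rejoined == sorted(sections))
```

The capacity of each room is now stored once, so changing it means updating a single row.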
You Try It …
StudentId ProgramId StudentName ProgramName
• Partial FDs? Eliminate, if any.
• Transitive FDs? Eliminate, if any.
Practice Exercise #15, page #198
Insertion anomaly? Deletion anomaly? Modification anomaly?
1. Develop a diagram that shows the functional dependencies in the SHIPMENT relation.
2. In what normal form is SHIPMENT? Why?
3. Convert SHIPMENT to 3NF if necessary. Show the resulting table(s) with the sample data presented in SHIPMENT.
Figure 4-22: Steps in normalization
Table with multivalued attributes
  ↓ Remove multivalued attributes
First normal form (1NF)
  ↓ Remove partial dependencies
Second normal form (2NF)
  ↓ Remove transitive dependencies
Third normal form (3NF)
  ↓ Remove remaining anomalies resulting from multiple candidate keys
Boyce-Codd normal form (BC-NF)
  ↓ Remove multivalued dependencies
Fourth normal form (4NF)
  ↓ Remove remaining anomalies
Fifth normal form (5NF)
Further Normalization
• Boyce-Codd Normal Form (BCNF)
– Slight difference with 3NF
– To be in 3NF but not in BCNF, a relation needs two composite candidate keys, with one attribute of one key depending on one attribute of the other
– Not very common
– If a table contains only one candidate key, 3NF and BCNF are equivalent.
• Fourth Normal Form (4NF)
– To break it, need to have multivalued dependencies, a generalization of functional dependencies
• Usually, if you're in 3NF you're in BCNF, 4NF, …
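BCNF is commonly stated as "every determinant of a non-trivial FD is a superkey". The sketch below checks that condition for the 3NF example from these notes, before and after decomposition. The function names are made up, and the FD Course, SectNum → Classroom is an assumption about that example's key.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Non-trivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(all_attrs)
    ]

key_fd = ({"Course", "SectNum"}, {"Classroom"})  # assumed key dependency
room_fd = ({"Classroom"}, {"Capacity"})

before = bcnf_violations(
    {"Course", "SectNum", "Classroom", "Capacity"}, [key_fd, room_fd])
after_sections = bcnf_violations({"Course", "SectNum", "Classroom"}, [key_fd])
after_rooms = bcnf_violations({"Classroom", "Capacity"}, [room_fd])

print(before, after_sections, after_rooms)
```

Here the two tables produced by the 3NF step already satisfy the BCNF condition, consistent with the remark that a relation in 3NF is usually also in BCNF.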
Logical Database Design
You have just learned some of the most important concepts and theories for developing a quality database: integrity constraints and normalization.
After learning some of the most important database concepts and theories...
WHAT'S NEXT?
Steps of Database Development
User view-1, User view-2, User view-3, …, User view-N
User interview & Integrated Model
Conceptual Schema (Model)
Logical Model (ERD or E/ERD)
Transformation (more relations produced)
(Six) Relations
NORMALIZATION (up to 3NF) (more tables created)
IMPLEMENTATION
Merging Relations (View Integration)
• In a project development process, there may be a number of separate E-R diagrams and user views created, and some of them may be redundant.
• Therefore, some relations should be merged to remove the redundancy.
Merging Relations (View Integration - An example)
EMPLOYEE1(EmployeeID, Name, Address, Phone)
EMPLOYEE2(EmployeeID, Name, Address, Jobcode, No_Years)
EMPLOYEE(EmployeeID, Name, Address, Phone, Jobcode, No_Years)
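When two views describe the same entity with the same primary key, the merge shown above amounts to taking the union of their attribute lists. A minimal sketch, assuming that identical attribute names really do mean the same thing in both views; `merge_relations` is a hypothetical helper:

```python
def merge_relations(*relations):
    """Union of attribute lists, keeping the order of first appearance."""
    merged = []
    for attrs in relations:
        for attr in attrs:
            if attr not in merged:
                merged.append(attr)
    return merged

employee1 = ["EmployeeID", "Name", "Address", "Phone"]
employee2 = ["EmployeeID", "Name", "Address", "Jobcode", "No_Years"]

employee = merge_relations(employee1, employee2)
print(employee)
```

A purely name-based merge like this is exactly where naming problems between views can creep in, so the attribute meanings should be checked before merging.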
Merging Relations (Problems on View Integration)
Issues to watch out for when merging entities from different ER models:
• Synonyms: Different names, same meaning.
• Homonyms: Same name, different meanings.
• Transitive Dependencies: even if relations are in 3NF prior to merging, they may not be after merging
Problems on View Integration
• Synonyms: Different names, same meaning.
STUDENT1(StudentID, Name)
STUDENT2(MatriculationNo, Name, Address)
STUDENT(StudentNo, Name, Address)
• Homonyms: Same name, different meanings.
STUDENT1(StudentID, Name, Address)
STUDENT2(StudentID, Name, PhoneNo, Address)
STUDENT(StudentID, Name, PhoneNo, Campus_Address, Permanent_Address)
Problems on View Integration
• Transitive Dependencies
STUDENT1(StudentID, Major)
STUDENT2(StudentID, Advisor)
• The result is ...
STUDENT(StudentID, Major, Advisor) ??NF
• Suppose that each major has exactly one advisor. After removing transitive dependencies:
STUDENT_Major(StudentID, Major)
Major_Advisor(Major, Advisor)