Post on 15-Jan-2017
Normalization
Re-edited by: Oum Saokosal
Master of Engineering in Information Systems,
Jeonju University, South Korea
012-252-752
oum_saokosal@yahoo.com
Normalization
Normalization: the process of converting complex data structures into simple, stable data structures.
The main idea is to avoid duplication of large data.
Why normalization? The relation derived from the user view or data store will most likely be unnormalized. The problem usually happens when an existing system uses an unstructured file, e.g. an MS Excel sheet.
The Three Steps of Normalization
The standard normalization has more than three steps:
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce-Codd Normal Form (BCNF)
- Fourth Normal Form (4NF)
- Fifth Normal Form (5NF)
- Domain/Key Normal Form (DKNF)
However, the first three steps (1NF, 2NF, 3NF) are sufficient for most practical database designs.
I. First Normal Form (1NF)
The official qualifications for 1NF are:
1. Each attribute must have a unique name.
2. Each attribute must have a single value.
3. Rows cannot be duplicated.
4. There are no repeating groups.
Additional:
1. Choose a primary key. The primary key can be a single attribute or a combination of attributes.
Name  DOB        Course   Payment
Sok   11/5/1990  IT       450 Dollars
Sao   4/4/1989   Mgt      400 Dollars
Chan  7/7/1991   IT, Mgt  IT: 450 Dollars; Mgt: 400 Dollars
Sok   11/5/1990  Mgt      400 Dollars
Sao   4/4/1989   Tour     1) 200 Dollars; 2) 200 Dollars

1. Each attribute has a unique name -> Good
2. The Payment attribute mixes data types (currency and string) -> Bad
3. No rows are duplicated -> Good
4. The Course and Payment attributes have repeating groups -> Bad
Name  DOB        Course  Payment ($)
Sok   11/5/1990  IT      450
Sao   4/4/1989   Mgt     400
Chan  7/7/1991   IT      450
Chan  7/7/1991   Mgt     400
Sok   11/5/1990  Mgt     400
Sao   4/4/1989   Tour    200
Sao   4/4/1989   Tour    200

All correct? Not yet. Choose a primary key.
- Name? No. Name has duplicated values.
- DOB, or Course, or Payment? No. Each one has duplicated values.
- Name and DOB? No. They still have duplicated values.
- Name and DOB and Course? No. Still duplicated.
- Combine all attributes? Still no. The last two rows are duplicated.
So what else can we do? Of course, there is a way: add a new attribute to be a primary key. So let’s call it ID.
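The search for a primary key above can be sketched in Python. This is only an illustration: the helper names `is_unique` and `unique_column_sets` are made up for this example, and the rows are copied from the table without the ID column.

```python
from itertools import combinations

# Rows of the 1NF table from the slide, before the ID column is added.
COLUMNS = ("Name", "DOB", "Course", "Payment")
ROWS = [
    ("Sok", "11/5/1990", "IT", 450),
    ("Sao", "4/4/1989", "Mgt", 400),
    ("Chan", "7/7/1991", "IT", 450),
    ("Chan", "7/7/1991", "Mgt", 400),
    ("Sok", "11/5/1990", "Mgt", 400),
    ("Sao", "4/4/1989", "Tour", 200),
    ("Sao", "4/4/1989", "Tour", 200),
]


def is_unique(rows, indexes):
    """Return True if the chosen columns never repeat the same combination of values."""
    projected = [tuple(row[i] for i in indexes) for row in rows]
    return len(set(projected)) == len(projected)


def unique_column_sets(rows, columns):
    """List every combination of columns that identifies each row uniquely."""
    found = []
    for size in range(1, len(columns) + 1):
        for indexes in combinations(range(len(columns)), size):
            if is_unique(rows, indexes):
                found.append(tuple(columns[i] for i in indexes))
    return found


# No combination works, because the last two rows are identical.
keys_without_id = unique_column_sets(ROWS, COLUMNS)

# A generated ID (1, 2, 3, ...) is unique by construction, so it can be the key.
rows_with_id = [(i, *row) for i, row in enumerate(ROWS, start=1)]
id_is_key = is_unique(rows_with_id, (0,))
```

Here `keys_without_id` comes out empty and `id_is_key` is true, which is why the extra ID attribute is needed.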
ID  Name  DOB        Course  Payment
1   Sok   11/5/1990  IT      450
2   Sao   4/4/1989   Mgt     400
3   Chan  7/7/1991   IT      450
4   Chan  7/7/1991   Mgt     400
5   Sok   11/5/1990  Mgt     300
6   Sao   4/4/1989   Tour    200
7   Sao   4/4/1989   Tour    200

Now it is completely in 1NF.
Next, check whether it is in 2NF.
II. Second Normal Form (2NF)
The official qualifications for 2NF are:
1. A table is already in 1NF.
2. All nonkey attributes are fully dependent on the primary key.
All partial dependencies are removed and placed in another table.
Assume you have the table below, whose primary key is (CourseID + Semester):

CourseID  Semester  Num Student  Course Name
IT101     2013-1    25           Database
IT101     2013-2    25           Database
IT102     2013-1    30           Web Prog
IT102     2013-2    35           Web Prog
IT103     2014-1    20           Networking

The Course Name depends only on CourseID, a part of the primary key, not on the whole primary key (CourseID + Semester). This is called a partial dependency.
Solution: Move Course Name, together with CourseID, into a new table.

CourseID  Course Name
IT101     Database
IT101     Database
IT102     Web Prog
IT102     Web Prog
IT103     Networking

Done? Oh no, the new table has duplicated rows, so it is not in 1NF yet.
You have to remove the duplicates too.
CourseID  Course Name
IT101     Database
IT102     Web Prog
IT103     Networking

CourseID  Semester  Num Student
IT101     2013-1    25
IT101     2013-2    25
IT102     2013-1    30
IT102     2013-2    35
IT103     2014-1    20

Relationship: one course (1) has many semester rows (M), linked by CourseID.
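The 2NF split above can be sketched as two projections of the original table. This is a minimal sketch, not a general tool: `project` is a hypothetical helper name, and the column positions are hard-coded for this one table.

```python
# The table from the slide, with primary key (CourseID, Semester).
COURSE_OFFERINGS = [
    # (CourseID, Semester, NumStudent, CourseName)
    ("IT101", "2013-1", 25, "Database"),
    ("IT101", "2013-2", 25, "Database"),
    ("IT102", "2013-1", 30, "Web Prog"),
    ("IT102", "2013-2", 35, "Web Prog"),
    ("IT103", "2014-1", 20, "Networking"),
]


def project(rows, indexes):
    """Keep only the given columns and drop duplicate rows, preserving order."""
    seen = {}
    for row in rows:
        seen.setdefault(tuple(row[i] for i in indexes), None)
    return list(seen)


# Course Name depends only on CourseID, so it moves to its own table.
course = project(COURSE_OFFERINGS, (0, 3))       # (CourseID, CourseName)
offering = project(COURSE_OFFERINGS, (0, 1, 2))  # (CourseID, Semester, NumStudent)

# Joining the two tables on CourseID rebuilds the original rows,
# so the split loses no information.
names = dict(course)
rejoined = [(cid, sem, num, names[cid]) for cid, sem, num in offering]
```

After the split, `course` holds three rows instead of five, and `rejoined` equals the original table.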
III. Third Normal Form (3NF)
The official qualifications for 3NF are:
1. A table is already in 2NF.
2. Nonprimary key attributes do not depend on other nonprimary key attributes (i.e. no transitive dependencies).
All transitive dependencies are removed and placed in another table.
Assume you have the table below, whose primary key is StudyID:

StudyID  Course Name  Teacher Name  Teacher Tel
1        Database     Sok Piseth    012 123 456
2        Database     Sao Kanha     0977 322 111
3        Web Prog     Chan Veasna   012 412 333
4        Web Prog     Chan Veasna   012 412 333
5        Networking   Pou Sambath   077 545 221

The Teacher Tel is a nonkey attribute, and the Teacher Name is also a nonkey attribute. But Teacher Tel depends on Teacher Name. This is called a transitive dependency.
Solution: Move Teacher Name and Teacher Tel together into a new table.

Teacher Name  Teacher Tel
Sok Piseth    012 123 456
Sao Kanha     0977 322 111
Chan Veasna   012 412 333
Chan Veasna   012 412 333
Pou Sambath   077 545 221

Done? Oh no, the new table has duplicated rows, so it is not in 1NF yet. So you have to remove the duplicates, and add a primary key.

Teacher Name  Teacher Tel
Sok Piseth    012 123 456
Sao Kanha     0977 322 111
Chan Veasna   012 412 333
Pou Sambath   077 545 221
Note about primary key:
- In theory, you can choose Teacher Name to be a primary key.
- But in practice, you should add Teacher ID as the primary key.
T.ID  Teacher Name  Teacher Tel
T1    Sok Piseth    012 123 456
T2    Sao Kanha     0977 322 111
T3    Chan Veasna   012 412 333
T4    Pou Sambath   077 545 221

StudyID  Course Name  T.ID
1        Database     T1
2        Database     T2
3        Web Prog     T3
4        Web Prog     T3
5        Networking   T4

Relationship: one teacher (1) has many study rows (M), linked by T.ID.
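The two 3NF tables above can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the table and column names (`teacher`, `study`, `teacher_id`, and so on) are chosen for this example, and the replacement phone number is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on
conn.executescript("""
    CREATE TABLE teacher (
        teacher_id   TEXT PRIMARY KEY,
        teacher_name TEXT NOT NULL,
        teacher_tel  TEXT NOT NULL
    );
    CREATE TABLE study (
        study_id    INTEGER PRIMARY KEY,
        course_name TEXT NOT NULL,
        teacher_id  TEXT NOT NULL REFERENCES teacher(teacher_id)
    );
""")
conn.executemany("INSERT INTO teacher VALUES (?, ?, ?)", [
    ("T1", "Sok Piseth", "012 123 456"),
    ("T2", "Sao Kanha", "0977 322 111"),
    ("T3", "Chan Veasna", "012 412 333"),
    ("T4", "Pou Sambath", "077 545 221"),
])
conn.executemany("INSERT INTO study VALUES (?, ?, ?)", [
    (1, "Database", "T1"),
    (2, "Database", "T2"),
    (3, "Web Prog", "T3"),
    (4, "Web Prog", "T3"),
    (5, "Networking", "T4"),
])


def joined(connection):
    """Rebuild the original wide table with a JOIN on the teacher ID."""
    return connection.execute("""
        SELECT s.study_id, s.course_name, t.teacher_name, t.teacher_tel
        FROM study AS s
        JOIN teacher AS t ON t.teacher_id = s.teacher_id
        ORDER BY s.study_id
    """).fetchall()


rows = joined(conn)

# Hypothetical new number: one UPDATE changes the phone for every study row of T3.
conn.execute("UPDATE teacher SET teacher_tel = ? WHERE teacher_id = ?",
             ("012 999 999", "T3"))
rows_after = joined(conn)
```

The phone number is now stored once per teacher, so a change needs a single UPDATE instead of one per study row.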
What about this table?

ID  Name  DOB        Course  Payment
1   Sok   11/5/1990  IT      450
2   Sao   4/4/1989   Mgt     400
3   Chan  7/7/1991   IT      450
4   Chan  7/7/1991   Mgt     400
5   Sok   11/5/1990  Mgt     300
6   Sao   4/4/1989   Tour    200
7   Sao   4/4/1989   Tour    200
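In the case of the above table, there is nothing to do for 2NF, because the primary key is only one attribute, not a combination of attributes. Therefore, you can skip 2NF and move to 3NF.
In 3NF, you must remove transitive dependencies.
- Name and DOB describe a student, not the row identified by ID. So move them to a Student table.
- Course describes a course, not the row identified by ID. So move it to a Course table, and keep the Payment amount in a Payment table that refers to both.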
Student
ID  Name  DOB
S1  Sok   11/5/1990
S2  Chan  7/7/1991
S3  Sao   4/4/1989

Course
CourseID  Course
C1        IT
C2        Mgt
C3        Tour

Payment
PID  SID  Course  Payment
1    S1   C1      $450
2    S3   C2      $400
3    S2   C1      $450
4    S2   C2      $400
5    S1   C2      $300
6    S3   C3      $200
7    S3   C3      $200
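Relationships: Student (1) to Payment (M), and Course (1) to Payment (M).
For the Payment table, it is not done yet. It is a relationship between Student and Course:
Student (M) -- Payment -- (N) Course, where the Payment relationship has the attributes PaymentID and Payment.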
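How the Payment rows link Student and Course can be sketched in plain Python. This is an illustration only: the lookup dictionaries `STUDENT_IDS` and `COURSE_IDS` are hypothetical helpers that use the IDs from the Student and Course tables, and the sketch assumes, as the slide does, that the same Name and DOB always mean the same student.

```python
# The 1NF table with the ID column, copied from the slide.
ENROLLMENTS = [
    # (ID, Name, DOB, Course, Payment)
    (1, "Sok", "11/5/1990", "IT", 450),
    (2, "Sao", "4/4/1989", "Mgt", 400),
    (3, "Chan", "7/7/1991", "IT", 450),
    (4, "Chan", "7/7/1991", "Mgt", 400),
    (5, "Sok", "11/5/1990", "Mgt", 300),
    (6, "Sao", "4/4/1989", "Tour", 200),
    (7, "Sao", "4/4/1989", "Tour", 200),
]

# IDs as assigned in the Student and Course tables.
STUDENT_IDS = {"Sok": "S1", "Chan": "S2", "Sao": "S3"}
COURSE_IDS = {"IT": "C1", "Mgt": "C2", "Tour": "C3"}

# Each student and each course is stored exactly once.
student = sorted({(STUDENT_IDS[name], name, dob) for _, name, dob, _, _ in ENROLLMENTS})
course = sorted((course_id, name) for name, course_id in COURSE_IDS.items())

# Each payment row keeps only its own ID, two foreign keys, and the amount.
payment = [
    (pid, STUDENT_IDS[name], COURSE_IDS[course_name], amount)
    for pid, name, _, course_name, amount in ENROLLMENTS
]
```

Each row in `payment` points to one student and one course, which is what makes Payment the many-to-many link between the two.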
Stop at 3NF
The most commonly used normal forms:
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
The highest level of normalization is not always desirable:
- More JOINs are required.
- Data retrieval performance is affected (high response time).
For most business database design purposes, 3NF is as high as we need to go in the normalization process.
Normalization in Real-World
When you newly create a table in a database tool, e.g. MS Access, SQL Server, MySQL, or Oracle, you won’t need all the steps.
- The mentioned tools already help you to satisfy 1NF.
- 2NF matters when the primary key is a combination of attributes, e.g. StudentName + DOB. But such a key is impractical.
- Mostly, you only use 3NF, because it removes all transitive dependencies.
Functional Dependency
A Bit More About Theory
Functional Dependencies
An important concept associated with normalization is functional dependency which describes the relationship between attributes.
Functional dependency can be divided into two types:
- Full functional dependency / Partial dependency (PD): will be used to transform 1NF → 2NF
- Transitive dependency (TD): will be used to transform 2NF → 3NF
Multivalued Attributes (or repeating groups): non-key attributes, or groups of non-key attributes, whose values are not uniquely identified (directly or indirectly) by the value of the Primary Key or a part of it; that is, they are not functionally dependent on it.
Relational Schema: STUDENT(Stud_ID, Name, (Course_ID, Units))
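Removing the repeating group (Course_ID, Units) from the STUDENT schema can be sketched as a flattening step. The slide gives only the schema, so the sample records below are hypothetical, and `flatten` is a made-up helper name.

```python
# Hypothetical sample data: the slide shows the schema but no rows.
# Each student carries a nested list of (Course_ID, Units) pairs.
students_unf = [
    {"Stud_ID": "S001", "Name": "Dara", "Courses": [("IT101", 3), ("IT102", 4)]},
    {"Stud_ID": "S002", "Name": "Vanna", "Courses": [("IT101", 3)]},
]


def flatten(records):
    """Turn each nested (Course_ID, Units) pair into its own flat row."""
    rows = []
    for record in records:
        for course_id, units in record["Courses"]:
            rows.append((record["Stud_ID"], record["Name"], course_id, units))
    return rows


# Result columns: (Stud_ID, Name, Course_ID, Units), one single value per attribute.
students_1nf = flatten(students_unf)
```

In the flat rows, Stud_ID alone no longer identifies a row, so the primary key becomes the combination (Stud_ID, Course_ID).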
Partial Dependency – when a non-key attribute is determined by a part, but not the whole, of a COMPOSITE primary key (the Primary Key must be a Composite Key).
Cust_ID → Name
Transitive Dependency – when a non-key attribute determines another non-key attribute.
Dept_ID → Dept_Name
Consider a relation R with attributes A and B, where attribute B is functionally dependent on attribute A. Let us say A is the PK of R.
To describe the relationship between attributes A and B, we say that “A functionally determines B”.
R(A, B): A → B (B is functionally dependent on A)
When a functional dependency exists, the attribute or group of attributes on the left-hand side of the arrow is called the determinant.
A → B (A functionally determines B, so A is the determinant)
staff
staffNO  sName  position    salary  branchNO
S21      Johan  Manager     3000    B005
S37      Ana    Assistant   1200    B003
S14      Daud   Supervisor  1800    B003
S9       Mary   Assistant   900     B007
S5       Siti   Manager     2400    B003
S41      Jani   Assistant   900     B005

branch
branchNO  bAddress
B005      123, Kepong
B007      456, Nilai
B003      789, PTP
Consider the attributes staffNO and position of the staff relation. For a specific staffNO (S21), we can determine the position of that member of staff as Manager. staffNO functionally determines position.
Staff number (S21) → Position (Manager)
staffNO → position (position is functionally dependent on staffNO)
However, the next figure illustrates that the opposite is not true: position does not functionally determine staffNO.
A member of staff holds one position; however, there may be several members of staff with the same position.
Position (Manager) → staff number (S21), staff number (S5)
position ↛ staffNO (staffNO is not functionally dependent on position)
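The two directions can be checked against the staff table with a small Python sketch. The helper name `determines` is made up for this example. Note that sample rows can only disprove a functional dependency; whether it always holds is a rule of the business, not of the data at hand.

```python
# The staff relation from the slide.
STAFF = [
    {"staffNO": "S21", "sName": "Johan", "position": "Manager", "salary": 3000, "branchNO": "B005"},
    {"staffNO": "S37", "sName": "Ana", "position": "Assistant", "salary": 1200, "branchNO": "B003"},
    {"staffNO": "S14", "sName": "Daud", "position": "Supervisor", "salary": 1800, "branchNO": "B003"},
    {"staffNO": "S9", "sName": "Mary", "position": "Assistant", "salary": 900, "branchNO": "B007"},
    {"staffNO": "S5", "sName": "Siti", "position": "Manager", "salary": 2400, "branchNO": "B003"},
    {"staffNO": "S41", "sName": "Jani", "position": "Assistant", "salary": 900, "branchNO": "B005"},
]


def determines(rows, lhs, rhs):
    """Return True if rows that agree on the lhs attributes always agree on the rhs attributes."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value for this key; any different value breaks the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


staff_to_position = determines(STAFF, ["staffNO"], ["position"])  # holds in these rows
position_to_staff = determines(STAFF, ["position"], ["staffNO"])  # broken by the two Managers
```

`position_to_staff` is false because Manager appears with both S21 and S5.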
Partial Dependencies:
Full functional dependency indicates that, if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A.
staff(staffNO, sName, position, salary, branchNO)
(staffNO, sName) → branchNO
True: each value of (staffNO, sName) is associated with a single value of branchNO.
However, branchNO is also functionally dependent on staffNO alone, a proper subset of (staffNO, sName), so this dependency is partial, not full.
Transitive Dependencies:
staff(staffNO, sName, position, salary, *branchNO)
branch(branchNO, bAddress)
staffNO → sName, position, salary, branchNO, bAddress
branchNO → bAddress
This is a transitive dependency: bAddress depends on staffNO via branchNO (staffNO → branchNO and branchNO → bAddress).
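Both the partial and the transitive case can be worked out from the declared dependencies with an attribute closure, a standard textbook technique that is not part of the slides. The sketch below assumes only the two direct dependencies of the staff and branch relations; the names `closure` and `is_fully_dependent` are made up for this example.

```python
from itertools import combinations

# Direct dependencies only: the key of each relation determines its other attributes.
FDS = [
    ({"staffNO"}, {"sName", "position", "salary", "branchNO"}),
    ({"branchNO"}, {"bAddress"}),
]


def closure(attributes, fds):
    """Return every attribute that can be determined from the given attributes."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the whole left side, we also know the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_fully_dependent(lhs, rhs, fds):
    """True if lhs determines rhs and no proper subset of lhs does."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(sorted(lhs), size):
            if rhs <= closure(subset, fds):
                return False
    return True


# Transitive: bAddress is reached from staffNO only by going through branchNO.
reaches_address = "bAddress" in closure({"staffNO"}, FDS)

# Partial: (staffNO, sName) determines branchNO, but staffNO alone already does.
pair_is_full = is_fully_dependent({"staffNO", "sName"}, {"branchNO"}, FDS)
single_is_full = is_fully_dependent({"staffNO"}, {"branchNO"}, FDS)
```

Here `reaches_address` and `single_is_full` are true, while `pair_is_full` is false, matching the two examples above.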
Normalization Process
A formal technique for analyzing relations based on their Primary Key (or candidate keys) and functional dependencies.
The technique is executed as a series of steps (stages). Each step corresponds to a specific normal form that has specific characteristics.
As normalization proceeds, the relations become progressively more restricted (stronger) in format and also less vulnerable to anomalies.
[Figure: data redundancies across the stages 0NF/UNF, 1NF, 2NF, 3NF]
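[Flowchart: normalization process from UNF to 3NF]
UNF: 1) Repeating group 2) PK is not defined
1NF: 1) Remove repeating group 2) Define PK. If the PK is composite (consists of several attributes), test for partial dependency.
2NF: if partial dependencies exist, remove them. Then test for transitive dependency.
3NF: if transitive dependencies exist, remove them.
Table-count labels in the figure: 1 table; 1 or 2 tables; 2 or 3 tables (more than 1 table); 3 or 4 tables.
Example labels in the figure: (a b …. TD) 1, (a ……. TD) 2, (b ….… TD) 3; (a, b → x, y), (a → c, d), (b → z), (c → d)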
Relation/Table Format
UNF: has repeating groups; PK not defined
1NF: no repeating groups; PK defined; test for partial dependency
2NF: no repeating groups; PK defined; no partial dependency; test for transitive dependency
3NF: no repeating groups; PK defined; no partial dependency; no transitive dependency
End of Chapter