Database Concept - Normalization (1NF, 2NF, 3NF)

Normalization Re-edited by: Oum Saokosal Master of Engineering in Information Systems, Jeonju University, South Korea 012-252-752 [email protected]


Page 1: Database Concept - Normalization (1NF, 2NF, 3NF)

Normalization


Page 2: Database Concept - Normalization (1NF, 2NF, 3NF)

Normalization

Normalization: the process of converting complex data structures into simple, stable data structures.

The main idea is to avoid duplication of large data.

Why normalization? The relation derived from the user view or data store will most likely be unnormalized. The problem usually happens when an existing system uses an unstructured file, e.g. in MS Excel.

Page 3: Database Concept - Normalization (1NF, 2NF, 3NF)

The Three Steps of Normalization

The standard normalization has more than three steps:
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce-Codd Normal Form (BCNF)
- Fourth Normal Form (4NF)
- Fifth Normal Form (5NF)
- Domain/Key Normal Form (DKNF)

However, the first three steps (1NF, 2NF, 3NF) are sufficient for most database designs.

Page 4: Database Concept - Normalization (1NF, 2NF, 3NF)

I. First Normal Form (1NF)

The official qualifications for 1NF are:
1. Each attribute must have a unique name.
2. Each attribute must have a single value.
3. Rows cannot be duplicated.
4. There are no repeating groups.

Additional:
1. Choose a primary key. The primary key can be a single attribute or combined attributes.

Page 5: Database Concept - Normalization (1NF, 2NF, 3NF)

Name  DOB        Course   Payment
Sok   11/5/1990  IT       450 Dollars
Sao   4/4/1989   Mgt      400 Dollars
Chan  7/7/1991   IT, Mgt  IT: 450 Dollars, Mgt: 400 Dollars
Sok   11/5/1990  Mgt      400 Dollars
Sao   4/4/1989   Tour     1) 200 Dollars 2) 200 Dollars

1. Each attribute has a unique name -> Good
2. The Payment attribute mixes data types (currency and string) -> Bad
3. No rows are duplicated -> Good
4. The Course and Payment attributes have repeating groups -> Bad
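The move from multi-valued cells to one value per cell can be sketched in Python. This is only an illustration, not part of the original slides: the list-based encoding of the cells and the `flatten` helper are assumptions, and a single course is assumed to apply to every payment listed in the same cell.

```python
# Unnormalized rows: Course and Payment may hold several values in one cell.
# The list encoding of each cell is an assumption made for this sketch.
unnormalized = [
    {"Name": "Sok", "DOB": "11/5/1990", "Course": ["IT"], "Payment": ["450 Dollars"]},
    {"Name": "Sao", "DOB": "4/4/1989", "Course": ["Mgt"], "Payment": ["400 Dollars"]},
    {"Name": "Chan", "DOB": "7/7/1991", "Course": ["IT", "Mgt"],
     "Payment": ["450 Dollars", "400 Dollars"]},
    {"Name": "Sok", "DOB": "11/5/1990", "Course": ["Mgt"], "Payment": ["400 Dollars"]},
    {"Name": "Sao", "DOB": "4/4/1989", "Course": ["Tour"],
     "Payment": ["200 Dollars", "200 Dollars"]},
]


def flatten(row):
    """Return one single-valued row per (course, payment) pair."""
    courses, payments = row["Course"], row["Payment"]
    if len(courses) == 1:
        # One course paid in several instalments: repeat the course.
        courses = courses * len(payments)
    return [
        {
            "Name": row["Name"],
            "DOB": row["DOB"],
            "Course": course,
            # "450 Dollars" -> 450, so Payment holds a single data type.
            "Payment": int(payment.split()[0]),
        }
        for course, payment in zip(courses, payments)
    ]


flat_rows = [flat for row in unnormalized for flat in flatten(row)]
for flat in flat_rows:
    print(flat)
```

Each resulting row holds exactly one course and one numeric payment, which addresses points 2 and 4 of the checklist.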

Page 6: Database Concept - Normalization (1NF, 2NF, 3NF)

Name  DOB        Course  Payment ($)
Sok   11/5/1990  IT      450
Sao   4/4/1989   Mgt     400
Chan  7/7/1991   IT      450
Chan  7/7/1991   Mgt     400
Sok   11/5/1990  Mgt     400
Sao   4/4/1989   Tour    200
Sao   4/4/1989   Tour    200

All correct? Not yet. Choose a primary key.

- Name? No. Name has duplicated values.
- DOB, Course, or Payment? No. Each one has duplicated values.
- Name and DOB? No. They still have duplicated values.
- Name and DOB and Course? No. Still duplicated.
- All attributes combined? Still no. The last two rows are duplicated.

So what else can we do? Of course, there is a way. Add a new attribute to be the primary key. So let's call it ID.
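The search for a primary key above can be reproduced with a short Python sketch. It is an illustration under assumptions: the helper names `is_unique` and `unique_attribute_sets` are hypothetical, and the check only tests uniqueness in this sample data (it does not reduce the result to minimal keys).

```python
from itertools import combinations

columns = ("Name", "DOB", "Course", "Payment")
rows = [
    ("Sok", "11/5/1990", "IT", 450),
    ("Sao", "4/4/1989", "Mgt", 400),
    ("Chan", "7/7/1991", "IT", 450),
    ("Chan", "7/7/1991", "Mgt", 400),
    ("Sok", "11/5/1990", "Mgt", 400),
    ("Sao", "4/4/1989", "Tour", 200),
    ("Sao", "4/4/1989", "Tour", 200),
]


def is_unique(rows, indexes):
    """True if no two rows share the same values in the given columns."""
    projected = [tuple(row[i] for i in indexes) for row in rows]
    return len(set(projected)) == len(projected)


def unique_attribute_sets(rows, columns):
    """Every combination of columns whose values identify each row."""
    found = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(range(len(columns)), size):
            if is_unique(rows, combo):
                found.append(tuple(columns[i] for i in combo))
    return found


# No combination works, because the last two rows are identical.
keys_without_id = unique_attribute_sets(rows, columns)

# Adding a surrogate ID column makes every row distinguishable.
rows_with_id = [(i,) + row for i, row in enumerate(rows, start=1)]
keys_with_id = unique_attribute_sets(rows_with_id, ("ID",) + columns)

print(keys_without_id)
print(keys_with_id[0])
```

With the ID column added, ID alone already identifies every row, so it can serve as the primary key.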

Page 7: Database Concept - Normalization (1NF, 2NF, 3NF)

ID  Name  DOB        Course  Payment
1   Sok   11/5/1990  IT      450
2   Sao   4/4/1989   Mgt     400
3   Chan  7/7/1991   IT      450
4   Chan  7/7/1991   Mgt     400
5   Sok   11/5/1990  Mgt     300
6   Sao   4/4/1989   Tour    200
7   Sao   4/4/1989   Tour    200

Now it is completely in 1NF.

Next, check whether it is also in 2NF.

Page 8: Database Concept - Normalization (1NF, 2NF, 3NF)

II. Second Normal Form (2NF)

The official qualifications for 2NF are:
1. A table is already in 1NF.
2. All nonkey attributes are fully dependent on the primary key.

All partial dependencies are removed and placed in another table.

Page 9: Database Concept - Normalization (1NF, 2NF, 3NF)

Assume you have the table below, with a composite primary key (CourseID + Semester):

CourseID  Semester  Num Student  Course Name
IT101     2013-1    25           Database
IT101     2013-2    25           Database
IT102     2013-1    30           Web Prog
IT102     2013-2    35           Web Prog
IT103     2014-1    20           Networking

Primary Key: CourseID + Semester

The Course Name depends only on CourseID, a part of the primary key, not on the whole primary key (CourseID + Semester). This is called a partial dependency.

Solution: Move Course Name, together with a copy of CourseID, into a new table.

Page 10: Database Concept - Normalization (1NF, 2NF, 3NF)

CourseID  Course Name
IT101     Database
IT101     Database
IT102     Web Prog
IT102     Web Prog
IT103     Networking

Done? Oh no, the new table is still not in 1NF, because it contains duplicated rows. You have to remove them too.

CourseID  Course Name
IT101     Database
IT102     Web Prog
IT103     Networking

CourseID  Semester  Num Student
IT101     2013-1    25
IT101     2013-2    25
IT102     2013-1    30
IT102     2013-2    35
IT103     2014-1    20

Relationship: one row of the first table (1) relates to many rows of the second table (M).
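The 2NF split can be sketched in Python as follows. The variable names (`offerings`, `course`, `course_semester`) are assumptions made for the illustration; the data is the course table from the example.

```python
# 1NF table with composite key (CourseID, Semester).
offerings = [
    ("IT101", "2013-1", 25, "Database"),
    ("IT101", "2013-2", 25, "Database"),
    ("IT102", "2013-1", 30, "Web Prog"),
    ("IT102", "2013-2", 35, "Web Prog"),
    ("IT103", "2014-1", 20, "Networking"),
]

# Course Name depends only on CourseID, so it moves to its own table.
# Building a set removes the duplicated rows in the same step.
course = sorted({(course_id, name) for course_id, _, _, name in offerings})

# The remaining attributes depend on the whole key (CourseID, Semester).
course_semester = [(course_id, sem, num) for course_id, sem, num, _ in offerings]

# Joining the two tables on CourseID gives back the original rows,
# so no information was lost by the split.
name_of = dict(course)
rejoined = [(cid, sem, num, name_of[cid]) for cid, sem, num in course_semester]

print(course)
print(rejoined == offerings)
```

The final comparison is a quick way to confirm that the decomposition is lossless for this data.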

Page 11: Database Concept - Normalization (1NF, 2NF, 3NF)

III. Third Normal Form (3NF)

The official qualifications for 3NF are:
1. A table is already in 2NF.
2. Nonprimary key attributes do not depend on other nonprimary key attributes (i.e. no transitive dependencies).

All transitive dependencies are removed and placed in another table.

Page 12: Database Concept - Normalization (1NF, 2NF, 3NF)

Assume you have the table below, with the primary key StudyID:

StudyID  Course Name  Teacher Name  Teacher Tel
1        Database     Sok Piseth    012 123 456
2        Database     Sao Kanha     0977 322 111
3        Web Prog     Chan Veasna   012 412 333
4        Web Prog     Chan Veasna   012 412 333
5        Networking   Pou Sambath   077 545 221

Primary Key: StudyID

The Teacher Tel is a nonkey attribute, and the Teacher Name is also a nonkey attribute. But Teacher Tel depends on Teacher Name. This is called a transitive dependency.

Solution: Remove Teacher Name and Teacher Tel together to create a new table.

Page 13: Database Concept - Normalization (1NF, 2NF, 3NF)

Teacher Name  Teacher Tel
Sok Piseth    012 123 456
Sao Kanha     0977 322 111
Chan Veasna   012 412 333
Chan Veasna   012 412 333
Pou Sambath   077 545 221

Done? Oh no, it is still not in 1NF, because a row is duplicated. So you have to remove the duplicated row and add a primary key.

Teacher Name  Teacher Tel
Sok Piseth    012 123 456
Sao Kanha     0977 322 111
Chan Veasna   012 412 333
Pou Sambath   077 545 221

Note about primary key:
- In theory, you can choose Teacher Name to be a primary key.
- But in practice, you should add Teacher ID as the primary key.

T.ID  Teacher Name  Teacher Tel
T1    Sok Piseth    012 123 456
T2    Sao Kanha     0977 322 111
T3    Chan Veasna   012 412 333
T4    Pou Sambath   077 545 221

StudyID  Course Name  T.ID
1        Database     T1
2        Database     T2
3        Web Prog     T3
4        Web Prog     T3
5        Networking   T4

Relationship: one teacher (1) relates to many study rows (M).
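As a rough Python sketch of this 3NF step, the loop below moves the teacher details into their own table and assigns each distinct teacher an ID. The names `teacher`, `study_3nf`, and the `T1`, `T2`, ... numbering scheme are assumptions chosen to match the tables in the example.

```python
# 2NF table: Teacher Tel depends on Teacher Name, not directly on StudyID.
study = [
    (1, "Database", "Sok Piseth", "012 123 456"),
    (2, "Database", "Sao Kanha", "0977 322 111"),
    (3, "Web Prog", "Chan Veasna", "012 412 333"),
    (4, "Web Prog", "Chan Veasna", "012 412 333"),
    (5, "Networking", "Pou Sambath", "077 545 221"),
]

teacher_ids = {}   # (name, tel) -> generated teacher ID
teacher = []       # new Teacher table: (T.ID, Teacher Name, Teacher Tel)
study_3nf = []     # remaining table: (StudyID, Course Name, T.ID)

for study_id, course_name, name, tel in study:
    if (name, tel) not in teacher_ids:
        # First time this teacher appears: create one row for them.
        teacher_id = f"T{len(teacher_ids) + 1}"
        teacher_ids[(name, tel)] = teacher_id
        teacher.append((teacher_id, name, tel))
    study_3nf.append((study_id, course_name, teacher_ids[(name, tel)]))

for row in teacher:
    print(row)
for row in study_3nf:
    print(row)
```

Because each teacher is stored once, a changed phone number only has to be updated in a single row.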

Page 14: Database Concept - Normalization (1NF, 2NF, 3NF)

ID  Name  DOB        Course  Payment
1   Sok   11/5/1990  IT      450
2   Sao   4/4/1989   Mgt     400
3   Chan  7/7/1991   IT      450
4   Chan  7/7/1991   Mgt     400
5   Sok   11/5/1990  Mgt     300
6   Sao   4/4/1989   Tour    200
7   Sao   4/4/1989   Tour    200

What about this table?

For the above table, there is nothing to do for 2NF, because the primary key is only one attribute, not combined attributes, so no partial dependency is possible. Therefore, you can skip 2NF and move to 3NF.

In 3NF, you must remove transitive dependencies. Name and DOB describe a student rather than the row identified by ID, so move them to their own table. Course describes a course rather than the row identified by ID, so move it to its own table as well.

Page 15: Database Concept - Normalization (1NF, 2NF, 3NF)

Student
ID  Name  DOB
S1  Sok   11/5/1990
S2  Chan  7/7/1991
S3  Sao   4/4/1989

Course
CourseID  Course
C1        IT
C2        Mgt
C3        Tour

Payment
PID  SID  Course  Payment
1    S1   C1      $450
2    S3   C2      $400
3    S2   C1      $450
4    S2   C2      $400
5    S1   C2      $300
6    S3   C3      $200
7    S3   C3      $200

Relationships: Student (1) to Payment (M); Course (1) to Payment (M).
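One possible way to express the three tables in an actual database is sketched below with Python's built-in sqlite3 module. The table and column names (`student`, `course`, `payment`, `amount`, and so on) are assumptions for this sketch, and the foreign keys in the payment rows are derived from the names and courses in the single-table version with IDs 1 to 7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid TEXT PRIMARY KEY, name TEXT, dob TEXT);
CREATE TABLE course  (course_id TEXT PRIMARY KEY, course TEXT);
CREATE TABLE payment (
    pid       INTEGER PRIMARY KEY,
    sid       TEXT REFERENCES student(sid),
    course_id TEXT REFERENCES course(course_id),
    amount    INTEGER
);
""")

conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [
    ("S1", "Sok", "11/5/1990"),
    ("S2", "Chan", "7/7/1991"),
    ("S3", "Sao", "4/4/1989"),
])
conn.executemany("INSERT INTO course VALUES (?, ?)", [
    ("C1", "IT"), ("C2", "Mgt"), ("C3", "Tour"),
])
conn.executemany("INSERT INTO payment VALUES (?, ?, ?, ?)", [
    (1, "S1", "C1", 450), (2, "S3", "C2", 400), (3, "S2", "C1", 450),
    (4, "S2", "C2", 400), (5, "S1", "C2", 300), (6, "S3", "C3", 200),
    (7, "S3", "C3", 200),
])

# Two JOINs rebuild the original single-table view from the 3NF tables.
rows = conn.execute("""
    SELECT p.pid, s.name, s.dob, c.course, p.amount
    FROM payment AS p
    JOIN student AS s ON s.sid = p.sid
    JOIN course  AS c ON c.course_id = p.course_id
    ORDER BY p.pid
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

The query shows the trade-off of normalization: each fact is stored once, but reading the combined view requires joining the tables.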

Page 16: Database Concept - Normalization (1NF, 2NF, 3NF)

For the Payment table, it is not done yet. It is a relationship between Student and Course.

[Diagram: Student (M) and Course (N) joined by the Payment relationship, which carries the attributes PaymentID and Payment]

Page 17: Database Concept - Normalization (1NF, 2NF, 3NF)

Stop at 3NF

The most commonly used normal forms:
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)

The highest level of normalization is not always desirable:
- More JOINs are required.
- Data retrieval performance suffers (higher response time).

For most business database design purposes, 3NF is as high as we need to go in the normalization process.

Page 18: Database Concept - Normalization (1NF, 2NF, 3NF)

Normalization in the Real World

When you create a new table in a database tool, e.g. MS Access, SQL Server, MySQL, or Oracle, you won't need all the steps.

These tools already help you satisfy 1NF.

2NF only matters when the primary key is made of combined attributes, e.g. StudentName + DOB. But such a key is impractical.

Mostly, you only need 3NF, because it removes all transitive dependencies.

Page 19: Database Concept - Normalization (1NF, 2NF, 3NF)

Functional Dependency

A Bit More About Theory

Page 20: Database Concept - Normalization (1NF, 2NF, 3NF)


Functional Dependencies

An important concept associated with normalization is functional dependency which describes the relationship between attributes.

Page 21: Database Concept - Normalization (1NF, 2NF, 3NF)

Functional Dependencies

Functional dependency can be divided into two types:
- Full functional dependency / partial dependency (PD)
  • Will be used to transform 1NF → 2NF
- Transitive dependency (TD)
  • Will be used to transform 2NF → 3NF

Page 22: Database Concept - Normalization (1NF, 2NF, 3NF)

Functional Dependencies

Multivalued Attributes (or repeating groups): non-key attributes, or groups of non-key attributes, whose values are not uniquely identified (directly or indirectly) by the value of the primary key or a part of it. In other words, they are not functionally dependent on the primary key.

Relational Schema: STUDENT(Stud_ID, Name, (Course_ID, Units))

Page 23: Database Concept - Normalization (1NF, 2NF, 3NF)

Functional Dependencies

Partial Dependency: when a non-key attribute is determined by a part, but not the whole, of a COMPOSITE primary key (the primary key must be a composite key).

Cust_ID → Name

Page 24: Database Concept - Normalization (1NF, 2NF, 3NF)

Functional Dependencies


Transitive Dependency – when a non-key attribute determines another non-key attribute.

Dept_ID → Dept_Name

Page 25: Database Concept - Normalization (1NF, 2NF, 3NF)

Functional Dependencies

Consider a relation R with attributes A and B, where attribute B is functionally dependent on attribute A. Let's say A is the PK of R.

To describe the relationship between attributes A and B, we say that "A functionally determines B".

A → B (B is functionally dependent on A)

R(A, B): A → B

Page 26: Database Concept - Normalization (1NF, 2NF, 3NF)

Functional Dependencies

When a functional dependency exists, the attribute or group of attributes on the left-hand side of the arrow is called the determinant.

A → B (A functionally determines B; A is the determinant)

Page 27: Database Concept - Normalization (1NF, 2NF, 3NF)

Functional Dependencies

staff
staffNO  sName  position    salary  branchNO
S21      Johan  Manager     3000    B005
S37      Ana    Assistant   1200    B003
S14      Daud   Supervisor  1800    B003
S9       Mary   Assistant   900     B007
S5       Siti   Manager     2400    B003
S41      Jani   Assistant   900     B005

branch
branchNO  bAddress
B005      123, Kepong
B007      456, Nilai
B003      789, PTP

Determinant: branchNO (branchNO → bAddress)

Page 28: Database Concept - Normalization (1NF, 2NF, 3NF)

Functional Dependencies

Consider the attributes staffNO and position of the staff relation. For a specific staffNO (S21), we can determine the position of that member of staff as Manager. staffNO functionally determines position.

Staff number (S21) → Position (Manager)

staffNO → position (position is functionally dependent on staffNO)

Page 29: Database Concept - Normalization (1NF, 2NF, 3NF)

Functional Dependencies

However, the opposite is not true: position does not functionally determine staffNO.

A member of staff holds one position; however, there may be several members of staff with the same position.

Position (Manager) → staff number (S21), staff number (S5)

position ↛ staffNO (staffNO is not functionally dependent on position)
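Whether a functional dependency holds in a given set of rows can be checked mechanically. The sketch below uses the staff relation from the example; the helper name `determines` is an assumption, and the check only proves or disproves a dependency for this sample data, not for every possible row.

```python
columns = ("staffNO", "sName", "position", "salary", "branchNO")
staff = [dict(zip(columns, values)) for values in [
    ("S21", "Johan", "Manager", 3000, "B005"),
    ("S37", "Ana", "Assistant", 1200, "B003"),
    ("S14", "Daud", "Supervisor", 1800, "B003"),
    ("S9", "Mary", "Assistant", 900, "B007"),
    ("S5", "Siti", "Manager", 2400, "B003"),
    ("S41", "Jani", "Assistant", 900, "B005"),
]]


def determines(rows, lhs, rhs):
    """True if equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


staff_to_position = determines(staff, ["staffNO"], ["position"])
position_to_staff = determines(staff, ["position"], ["staffNO"])

print(staff_to_position)   # staffNO -> position
print(position_to_staff)   # position -> staffNO
```

The second check fails because the position Manager appears with both S21 and S5.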

Page 30: Database Concept - Normalization (1NF, 2NF, 3NF)

Functional Dependencies

Partial Dependencies:

Full functional dependency indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A.

staff(staffNO, sName, position, salary, branchNO)

staffNO, sName → branchNO

True: each value of (staffNO, sName) is associated with a single value of branchNO.

However, branchNO is also functionally dependent on staffNO alone, which is a proper subset of (staffNO, sName). So branchNO is only partially dependent on (staffNO, sName).
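The test for a partial dependency can be sketched in Python by checking every proper subset of the determinant. The helper names `determines` and `partial_determinants` are assumptions, and the result only reflects the sample rows of the staff relation.

```python
from itertools import combinations

columns = ("staffNO", "sName", "position", "salary", "branchNO")
staff = [dict(zip(columns, values)) for values in [
    ("S21", "Johan", "Manager", 3000, "B005"),
    ("S37", "Ana", "Assistant", 1200, "B003"),
    ("S14", "Daud", "Supervisor", 1800, "B003"),
    ("S9", "Mary", "Assistant", 900, "B007"),
    ("S5", "Siti", "Manager", 2400, "B003"),
    ("S41", "Jani", "Assistant", 900, "B005"),
]]


def determines(rows, lhs, rhs):
    """True if equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def partial_determinants(rows, determinant, attribute):
    """Proper subsets of the determinant that already determine the attribute."""
    return [
        subset
        for size in range(1, len(determinant))
        for subset in combinations(determinant, size)
        if determines(rows, subset, [attribute])
    ]


subsets = partial_determinants(staff, ("staffNO", "sName"), "branchNO")
is_full_dependency = not subsets

print(subsets)
print(is_full_dependency)
```

In this sample, sName also shows up as a determinant, but only because every name happens to be unique in the six rows; staffNO is the subset the example relies on.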

Page 31: Database Concept - Normalization (1NF, 2NF, 3NF)

Functional Dependencies

Transitive Dependencies:

staff(staffNO, sName, position, salary, *branchNO)
branch(branchNO, bAddress)

staffNO → sName, position, salary, branchNO, bAddress

branchNO → bAddress

True for transitive dependency: bAddress depends on staffNO only via branchNO (staffNO → branchNO and branchNO → bAddress).
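The chain staffNO → branchNO → bAddress can be followed mechanically with an attribute closure, a standard textbook technique that the slides do not name. The sketch below is an illustration under that assumption; the function name `closure` and the way the dependencies are written down as tuples are choices made here.

```python
def closure(attributes, dependencies):
    """All attributes reachable from `attributes` through the dependencies."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in dependencies:
            # If we already know every attribute on the left, we learn the right.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Only the direct dependencies are declared.
dependencies = [
    (("staffNO",), ("sName", "position", "salary", "branchNO")),
    (("branchNO",), ("bAddress",)),
]

from_staff = closure({"staffNO"}, dependencies)
from_branch = closure({"branchNO"}, dependencies)

print(sorted(from_staff))
print(sorted(from_branch))
```

bAddress appears in the closure of staffNO even though no dependency links them directly, which is exactly the transitive dependency via branchNO.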

Page 32: Database Concept - Normalization (1NF, 2NF, 3NF)

Normalization Process

A formal technique for analyzing relations based on their primary key (or candidate keys) and functional dependencies.

The technique is executed as a series of steps (stages). Each step corresponds to a specific normal form that has specific characteristics.

As normalization proceeds, the relations become progressively more restricted (stronger) in format and also less vulnerable to anomalies.

[Figure: data redundancies across the stages 0NF/UNF → 1NF → 2NF → 3NF]

Page 33: Database Concept - Normalization (1NF, 2NF, 3NF)

Normalization Process

Stage  Relation/Table Format
UNF    - Has repeating groups
       - PK not defined
1NF    - No repeating group
       - PK defined
       - Test partial dependency
2NF    - No repeating group
       - PK defined
       - No partial dependency
       - Test transitive dependency
3NF    - No repeating group
       - PK defined
       - No partial dependency
       - No transitive dependency

Steps:
1. UNF → 1NF: remove the repeating groups and define the PK (a composite PK consists of several attributes).
2. 1NF → 2NF: test for partial dependency; if one exists, move it to another table.
3. 2NF → 3NF: test for transitive dependency; if one exists, move it to another table.

The number of tables grows along the way, from 1 table at the start to as many as 3 or 4 tables at 3NF.

Example dependencies: (a, b → x, y), (a → c, d), (b → z), (c → d)
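The dependency example in the process diagram can be read as (a, b → x, y), (a → c, d), (b → z), (c → d), assuming the arrows lost in the transcript and a composite primary key (a, b). Under those assumptions, a small Python sketch can label each dependency the way the two tests do; the function name `classify` and its labels are hypothetical.

```python
def classify(determinant, primary_key):
    """Label a dependency by comparing its determinant with the primary key."""
    lhs, key = set(determinant), set(primary_key)
    if lhs == key:
        return "full"          # depends on the whole key: fine in 2NF and 3NF
    if lhs < key:
        return "partial"       # depends on part of the key: remove for 2NF
    if not lhs & key:
        return "transitive"    # non-key determines non-key: remove for 3NF
    return "other"


primary_key = ("a", "b")
dependencies = [
    (("a", "b"), ("x", "y")),
    (("a",), ("c", "d")),
    (("b",), ("z",)),
    (("c",), ("d",)),
]

labels = [classify(lhs, primary_key) for lhs, _ in dependencies]

for (lhs, rhs), label in zip(dependencies, labels):
    print(", ".join(lhs), "->", ", ".join(rhs), ":", label)
```

Read this way, the two partial dependencies are split off at the 2NF step and the transitive one at the 3NF step, which matches the growing table counts in the diagram.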

Page 34: Database Concept - Normalization (1NF, 2NF, 3NF)

End of Chapter
