Normalization

Page 1: Normalization

Normalization

Page 2: Normalization

Intro

• Good database design must be matched with good table structures
• E-R diagrams help us with the overall design, but a good conceptual design does not necessarily lead to good table structures
• Tables are the basic building blocks of the database
• We wish to avoid data anomalies and redundancies by controlling the table structure logically
• The process of identifying and eliminating data anomalies and redundancies is called normalization

Page 3: Normalization

Redundancy & Anomalies

• Data redundancy = data stored in several places
  – Too much data redundancy causes problems: which value is correct?
  – Data integrity and consistency suffer
• Data anomaly = abnormal data relationships
  – Insertion anomaly - Can't add data because the entire primary key value is not known, e.g., primary key based on first, middle, and last name
  – Deletion anomaly - A deletion unintentionally removes other data as well, e.g., delete an employee but lose transaction data
  – Update anomaly - One change requires many updates, e.g., if you store customer names in transaction tables

Page 4: Normalization

Normalization

• Step-by-step process for eliminating data redundancies and anomalies
• Enables us to recognize bad table structures
• Enables us to create good table structures
• Stages are called normal forms, each better than the previous (fewer anomalies, less redundancy)
  – First normal form (1NF)
  – Second normal form (2NF)
  – Third normal form (3NF)

Page 5: Normalization

Normalization

• There are fourth and fifth normal forms that are seldom used

• The highest level is not always the most desirable
• In general, the higher the level the slower the database response, because more tables must be joined to answer a query

• Most professionally designed databases reach third normal form

Page 6: Normalization

Need for Normalization

• To recognize a good design, first look at a bad one
• Example: a construction company manages several projects, and its charges depend on each employee's position

Page 7: Normalization

Desired Report

Proj No  Proj Name  Emp No  Emp Name      Job Class  Chg/Hr  Hrs Billed  Tot Chg
1        Hurricane  101     John News     Elec Eng   65      13          845
                    102     David Senior  Comm Tech  60      16          960
                    104     Anne Ramoras  Comm Tech  60      19          1,140
                                                             Sub Tot     2,945
2        Coast      101     John News     Elec Eng   65      15          975
                    103     June Arbough  Biol Eng   55      17          935
                                                             Sub Tot     1,910
3        Satellite  104     Anne Ramoras  Comm Tech  60      18          1,080
                    102     David Senior  Comm Tech  60      14          840
                                                             Sub Tot     1,920
                                                             Total       6,775
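The Tot Chg column in the report is Chg/Hr multiplied by Hrs Billed, and the subtotals are sums per project. A minimal Python sketch that recomputes them from the report rows (the variable names `report_rows`, `subtotals`, and `grand_total` are illustrative, not part of the original slides):

```python
from collections import defaultdict

# (Proj No, Emp No, Chg/Hr, Hrs Billed), copied from the report
report_rows = [
    (1, 101, 65, 13),
    (1, 102, 60, 16),
    (1, 104, 60, 19),
    (2, 101, 65, 15),
    (2, 103, 55, 17),
    (3, 104, 60, 18),
    (3, 102, 60, 14),
]

# Sub Tot per project = sum of Chg/Hr * Hrs Billed over that project's rows
subtotals = defaultdict(int)
for proj_no, _emp_no, chg_hr, hrs in report_rows:
    subtotals[proj_no] += chg_hr * hrs

grand_total = sum(subtotals.values())
print(dict(subtotals), grand_total)
```

Because Tot Chg can always be derived this way, it does not need to be stored as a column in the table.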

Page 8: Normalization

Table

P_No  P_Name     E_No  E_Name        Job_Class  Chg_Hr  Hrs
1     Hurricane  101   John News     Elec Eng   65      13
                 102   David Senior  Comm Tech  60      16
                 104   Anne Ramoras  Comm Tech  60      19
2     Coast      101   John News     Elec Eng   65      15
                 103   June Arbough  Biol Eng   55      17
3     Satellite  104   Anne Ramoras  Comm Tech  60      18
                 102   David Senior  Comm Tech  60      14

Page 9: Normalization

Another View of Table

                 Group 1                                         Group 2                                         Etc.
P_No  P_Name     E_No1  E_Name1       Job_Class1  Chg_Hr1  Hrs1  E_No2  E_Name2       Job_Class2  Chg_Hr2  Hrs2
1     Hurricane  101    John News     Elec Eng    65       13    102    David Senior  Comm Tech   60       16
2     Coast      101    John News     Elec Eng    65       15    103    June Arbough  Biol Eng    55       17
3     Satellite  104    Anne Ramoras  Comm Tech   60       18    102    David Senior  Comm Tech   60       14

Page 10: Normalization

Problems

• P_No is intended to be the primary key but contains null values
• Data redundancies
  – Invite data inconsistencies (Elect Eng & EE)
• Anomalies
  – Update anomaly: modifying Job_Class for E_No 101 requires many alterations
  – Insert anomaly: to add a project row we need an employee
  – Deletion anomaly: if we delete E_No 101, we delete other vital data too
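The update and deletion anomalies can be seen directly on the sample data. The sketch below is only an illustration: it holds the table as a list of Python tuples (with P_No and P_Name filled in on every row) and the names `flat`, `rows_to_update`, and `lost_job_classes` are made up for this example.

```python
# One tuple per row: (P_No, P_Name, E_No, E_Name, Job_Class, Chg_Hr, Hrs)
flat = [
    (1, "Hurricane", 101, "John News", "Elec Eng", 65, 13),
    (1, "Hurricane", 102, "David Senior", "Comm Tech", 60, 16),
    (1, "Hurricane", 104, "Anne Ramoras", "Comm Tech", 60, 19),
    (2, "Coast", 101, "John News", "Elec Eng", 65, 15),
    (2, "Coast", 103, "June Arbough", "Biol Eng", 55, 17),
    (3, "Satellite", 104, "Anne Ramoras", "Comm Tech", 60, 18),
    (3, "Satellite", 102, "David Senior", "Comm Tech", 60, 14),
]

# Update anomaly: changing Job_Class for E_No 101 must touch every row
# in which employee 101 appears.
rows_to_update = [row for row in flat if row[2] == 101]

# Deletion anomaly: removing employee 101 also removes the only rows
# that record the charge rate for that employee's job class.
rates_before = {row[4]: row[5] for row in flat}
remaining = [row for row in flat if row[2] != 101]
rates_after = {row[4]: row[5] for row in remaining}
lost_job_classes = set(rates_before) - set(rates_after)

print(len(rows_to_update), lost_job_classes)
```

In this sample, employee 101 is the only one in the Elec Eng job class, so deleting that employee also loses the Elec Eng charge rate.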

Page 11: Normalization

Problems

• Data redundancy
  – If we add a new employee to project 2 we must type:

      2 Coast 104 Anne Ramoras Comm Tech 60 19

    • Wastes data entry time
    • Wastes storage space
    • Leads to data inconsistency
  – Huricane or Hurricane or Hurracaine

Page 12: Normalization

Conversion to 1NF

• Table above has repeating groups

• Each P_No has a group of entries

P_No  P_Name     E_No  E_Name        Job_Class  Chg_Hr  Hrs
1     Hurricane  101   John News     Elec Eng   65      13
                 102   David Senior  Comm Tech  60      16
                 104   Anne Ramoras  Comm Tech  60      19

Page 13: Normalization

1NF

• Eliminate repeating groups

• By adding entries in primary key column (at least)

P_No  P_Name     E_No  E_Name        Job_Class  Chg_Hr  Hrs
1     Hurricane  101   John News     Elec Eng   65      13
1     Hurricane  102   David Senior  Comm Tech  60      16
1     Hurricane  104   Anne Ramoras  Comm Tech  60      19
2     Coast      101   John News     Elec Eng   65      15
2     Coast      103   June Arbough  Biol Eng   55      17
3     Satellite  104   Anne Ramoras  Comm Tech  60      18
3     Satellite  102   David Senior  Comm Tech  60      14
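Eliminating the repeating groups amounts to copying the project number and name down into every row of the group. A small Python sketch of that step, assuming the blank cells of the original table are represented as `None` (the function name `fill_down` is hypothetical):

```python
# Original table: P_No and P_Name appear only on the first row of each group.
raw = [
    (1, "Hurricane", 101, "John News", "Elec Eng", 65, 13),
    (None, None, 102, "David Senior", "Comm Tech", 60, 16),
    (None, None, 104, "Anne Ramoras", "Comm Tech", 60, 19),
    (2, "Coast", 101, "John News", "Elec Eng", 65, 15),
    (None, None, 103, "June Arbough", "Biol Eng", 55, 17),
    (3, "Satellite", 104, "Anne Ramoras", "Comm Tech", 60, 18),
    (None, None, 102, "David Senior", "Comm Tech", 60, 14),
]


def fill_down(rows):
    """Repeat the most recent P_No and P_Name on rows where they are blank."""
    filled = []
    current = (None, None)
    for p_no, p_name, *rest in rows:
        if p_no is not None:
            current = (p_no, p_name)
        filled.append((*current, *rest))
    return filled


first_nf = fill_down(raw)
for row in first_nf:
    print(row)
```

After this step no row has a null in the intended key column, at the cost of repeating the project name on every row.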

Page 14: Normalization

Problems

• Primary key P_No does not uniquely identify all attributes in row

• Must create composite key made up of P_No & E_No

Page 15: Normalization

Dependency Diagram

• Helps us to discover relationships between entity attributes

• Upper arrows imply dependency on P_No & E_No

• Lower arrows imply dependency on only one attribute

P_No P_Name E_No E_Name Job_Class Chg_Hr Hrs

Page 16: Normalization

Dependencies

• Upper arrows
  – If you know P_No & E_No you can determine the other row values
• Lower arrows
  – Partial dependencies: based on only part of the key
  – P_Name only dependent on P_No
  – E_Name, Job_Class, Chg_Hr only dependent on E_No
• Dependency diagram may be written:
  – P_No, E_No → P_Name, E_Name, Job_Class, Chg_Hr, Hrs
  – P_No → P_Name
  – E_No → E_Name, Job_Class, Chg_Hr
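A dependency such as "P_Name depends only on P_No" can be checked against sample rows: every value of the determining attribute(s) must always appear with the same dependent value(s). The Python sketch below is an illustration with a hypothetical helper `fd_holds`. Note that sample data can only refute a dependency; whether it really holds is a matter of the business rules.

```python
COLUMNS = ("P_No", "P_Name", "E_No", "E_Name", "Job_Class", "Chg_Hr", "Hrs")
DATA = [
    (1, "Hurricane", 101, "John News", "Elec Eng", 65, 13),
    (1, "Hurricane", 102, "David Senior", "Comm Tech", 60, 16),
    (1, "Hurricane", 104, "Anne Ramoras", "Comm Tech", 60, 19),
    (2, "Coast", 101, "John News", "Elec Eng", 65, 15),
    (2, "Coast", 103, "June Arbough", "Biol Eng", 55, 17),
    (3, "Satellite", 104, "Anne Ramoras", "Comm Tech", 60, 18),
    (3, "Satellite", 102, "David Senior", "Comm Tech", 60, 14),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]


def fd_holds(rows, determinant, dependent):
    """Return True if each determinant value maps to a single dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


print(fd_holds(rows, ["P_No", "E_No"], ["Hrs"]))                   # full key
print(fd_holds(rows, ["P_No"], ["P_Name"]))                        # partial
print(fd_holds(rows, ["E_No"], ["E_Name", "Job_Class", "Chg_Hr"])) # partial
print(fd_holds(rows, ["E_No"], ["Hrs"]))                           # not a dependency
```

Hrs is the only attribute that needs the whole composite key: neither P_No alone nor E_No alone determines it in this data.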

Page 17: Normalization

New Table

• Composite primary key P_No & E_No

Charges Table

P_No  E_No  P_Name     E_Name        Job_Class  Chg_Hr  Hrs
1     101   Hurricane  John News     Elec Eng   65      13
1     102   Hurricane  David Senior  Comm Tech  60      16
1     104   Hurricane  Anne Ramoras  Comm Tech  60      19
2     101   Coast      John News     Elec Eng   65      15
2     103   Coast      June Arbough  Biol Eng   55      17
3     104   Satellite  Anne Ramoras  Comm Tech  60      18
3     102   Satellite  David Senior  Comm Tech  60      14

Page 18: Normalization

1NF Definition

1. All the key attributes are defined
   – A key attribute is any attribute that is part of the primary key
2. There are no repeating groups in the table
   – Each cell can contain one and only one value, rather than a set
3. All attributes are dependent on the primary key

Page 19: Normalization

Problems

• Contains partial dependencies
  – Dependencies based on only part of the primary key
• This makes the table subject to data redundancies and hence to data anomalies
• Redundancy is caused by the fact that every row entry requires duplicate data
  – E.g., suppose E_No 105 is entered 20 times; E_Name, Job_Class, and Chg_Hr must also be entered each time
• Anomalies caused by redundancy
  – E.g., an employee name may be spelled Dave Senior or D. Senior or David Senior

Page 20: Normalization

Conversion to 2NF

1. Starting with 1NF, write each of the key components on separate lines, then write the original key on the last line

   P_No
   E_No
   P_No E_No

• Each will become the key in a new table
• Original table split into three tables

Page 21: Normalization

Conversion to 2NF

2. Write the dependent attributes after each of the new keys using the dependency diagram

P_No P_Name

E_No E_Name, Job_Class, Chg_Hr

P_No E_No Hrs
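Splitting the table this way is a projection onto each key plus its dependent attributes, with duplicate rows dropped. A short Python sketch under that reading (the helper name `project` and the variable names are illustrative only):

```python
COLUMNS = ("P_No", "E_No", "P_Name", "E_Name", "Job_Class", "Chg_Hr", "Hrs")
charges = [
    (1, 101, "Hurricane", "John News", "Elec Eng", 65, 13),
    (1, 102, "Hurricane", "David Senior", "Comm Tech", 60, 16),
    (1, 104, "Hurricane", "Anne Ramoras", "Comm Tech", 60, 19),
    (2, 101, "Coast", "John News", "Elec Eng", 65, 15),
    (2, 103, "Coast", "June Arbough", "Biol Eng", 55, 17),
    (3, 104, "Satellite", "Anne Ramoras", "Comm Tech", 60, 18),
    (3, 102, "Satellite", "David Senior", "Comm Tech", 60, 14),
]


def project(rows, wanted):
    """Keep only the wanted columns and drop duplicate rows (order preserved)."""
    idx = [COLUMNS.index(name) for name in wanted]
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(tuple(row[i] for i in idx) for row in rows))


project_table = project(charges, ("P_No", "P_Name"))
employee_table = project(charges, ("E_No", "E_Name", "Job_Class", "Chg_Hr"))
assign_table = project(charges, ("P_No", "E_No", "Hrs"))

print(len(project_table), len(employee_table), len(assign_table))
```

Seven rows of repeated project and employee details collapse to three project rows and four employee rows, while the assignment data keeps all seven rows.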

Page 22: Normalization

Three New Tables

Project Table

P_No  P_Name
1     Hurricane
2     Coast
3     Satellite

Employee Table

E_No  E_Name        Job_Class  Chg_Hr
101   John News     Elec Eng   65
102   David Senior  Comm Tech  60
103   June Arbough  Biol Eng   55
104   Anne Ramoras  Comm Tech  60

Assign Table

P_No  E_No  Hrs
1     101   13
1     102   16
1     104   19
2     101   15
2     103   17
3     104   18
3     102   14

Page 23: Normalization

2NF Definition

1. Table is in 1NF and
2. It includes no partial dependencies (no attribute is dependent on only a portion of the primary key)

• Note: Since partial dependencies can exist only if there is a composite key, a table with a single attribute as primary key is automatically in 2NF if it is in 1NF

Page 24: Normalization

Problem - Transitive Dependency

• Note that Chg_Hr is dependent on Job_Class, but neither Chg_Hr nor Job_Class is part of the primary key

• This is called transitive dependency
  – A condition in which an attribute is functionally dependent on a non-key attribute (another attribute that is not part of the primary key)

• Transitive dependency yields data anomalies

Page 25: Normalization

Conversion to 3NF

• Break off the pieces that are identified by the transitive dependency arrows (lower arrows) in the dependency diagram
• Store them in a separate table

   P_No P_Name
   E_No E_Name, Job_Class
   P_No E_No Hrs
   Job_Class Chg_Hr

• Note: Job_Class must be retained in Employee table to establish a link to the newly created Job table
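The reason Job_Class stays in the Employee table can be shown by splitting the 2NF Employee table and then joining the pieces back together. This is a minimal Python sketch with made-up variable names, using a dictionary as a stand-in for the Job table:

```python
# 2NF Employee table: (E_No, E_Name, Job_Class, Chg_Hr)
employee_2nf = [
    (101, "John News", "Elec Eng", 65),
    (102, "David Senior", "Comm Tech", 60),
    (103, "June Arbough", "Biol Eng", 55),
    (104, "Anne Ramoras", "Comm Tech", 60),
]

# Job table: one entry per job class, holding its charge per hour
job_table = dict(sorted({(job, rate) for _, _, job, rate in employee_2nf}))

# 3NF Employee table: Chg_Hr removed, Job_Class kept as the link
employee_3nf = [(e_no, name, job) for e_no, name, job, _ in employee_2nf]

# Joining on Job_Class reproduces the original rows, so nothing was lost
rejoined = [(e_no, name, job, job_table[job]) for e_no, name, job in employee_3nf]
lossless = rejoined == employee_2nf

print(job_table)
print(lossless)
```

The charge for Comm Tech is now stored once instead of once per employee, and the link through Job_Class is enough to recover it for each employee.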

Page 26: Normalization

New Tables

Project Table

P_No  P_Name
1     Hurricane
2     Coast
3     Satellite

Employee Table

E_No  E_Name        Job_Class
101   John News     Elec Eng
102   David Senior  Comm Tech
103   June Arbough  Biol Eng
104   Anne Ramoras  Comm Tech

Assign Table

P_No  E_No  Hrs
1     101   13
1     102   16
1     104   19
2     101   15
2     103   17
3     104   18
3     102   14

Job Table

Job_Class  Chg_Hr
Biol Eng   55
Comm Tech  60
Elec Eng   65

Page 27: Normalization

3NF Definition

1. Table is in 2NF and

2. It contains no transitive dependencies

Page 28: Normalization

Problem

• Although the four tables are in 3NF, we have a potential problem

• The Job_Class is entered for each new employee in the Employee table

• For example, it is too easy to enter Electrical Engr, or EE, or El Eng

Page 29: Normalization

Problem

Employee Table

E_No  E_Name        Job_Class
101   John News     Elec Eng
102   David Senior  Comm Tech
103   June Arbough  Biol Eng
104   Anne Ramoras  Comm Tech
104   John Smith    Comm Tech
105   Alice White   Biol Eng
106   Bob Jones     Elec Eng

Page 30: Normalization

New Attribute

• Create a Job_Code attribute to serve as primary key in the Job table and as a foreign key in the Employee table

Page 31: Normalization

Changed Tables

Project Table

P_No  P_Name
1     Hurricane
2     Coast
3     Satellite

Employee Table

E_No  E_Name        Job_Code
101   John News     502
102   David Senior  501
103   June Arbough  500
104   Anne Ramoras  501

Assign Table

P_No  E_No  Hrs
1     101   13
1     102   16
1     104   19
2     101   15
2     103   17
3     104   18
3     102   14

Job Table

Job_Code  Job_Class  Chg_Hr
500       Biol Eng   55
501       Comm Tech  60
502       Elec Eng   65
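One way to try out these four tables is with Python's built-in sqlite3 module. The sketch below is an illustration, not the deck's own implementation: the column types, the NOT NULL and UNIQUE constraints, and the test employee 105 are assumptions added for the example. It loads the sample rows, rebuilds the per-project charges with a join, and shows that a foreign key rejects an unknown Job_Code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Project  (P_No INTEGER PRIMARY KEY, P_Name TEXT NOT NULL);
CREATE TABLE Job      (Job_Code INTEGER PRIMARY KEY,
                       Job_Class TEXT NOT NULL UNIQUE,   -- UNIQUE is an assumption
                       Chg_Hr INTEGER NOT NULL);
CREATE TABLE Employee (E_No INTEGER PRIMARY KEY, E_Name TEXT NOT NULL,
                       Job_Code INTEGER NOT NULL REFERENCES Job(Job_Code));
CREATE TABLE Assign   (P_No INTEGER NOT NULL REFERENCES Project(P_No),
                       E_No INTEGER NOT NULL REFERENCES Employee(E_No),
                       Hrs INTEGER NOT NULL,
                       PRIMARY KEY (P_No, E_No));
""")
conn.executemany("INSERT INTO Project VALUES (?, ?)",
                 [(1, "Hurricane"), (2, "Coast"), (3, "Satellite")])
conn.executemany("INSERT INTO Job VALUES (?, ?, ?)",
                 [(500, "Biol Eng", 55), (501, "Comm Tech", 60), (502, "Elec Eng", 65)])
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [(101, "John News", 502), (102, "David Senior", 501),
                  (103, "June Arbough", 500), (104, "Anne Ramoras", 501)])
conn.executemany("INSERT INTO Assign VALUES (?, ?, ?)",
                 [(1, 101, 13), (1, 102, 16), (1, 104, 19), (2, 101, 15),
                  (2, 103, 17), (3, 104, 18), (3, 102, 14)])

# Rebuild the charge subtotal per project by joining the normalized tables
subtotals = conn.execute("""
    SELECT a.P_No, SUM(j.Chg_Hr * a.Hrs)
    FROM Assign a
    JOIN Employee e ON e.E_No = a.E_No
    JOIN Job j ON j.Job_Code = e.Job_Code
    GROUP BY a.P_No
    ORDER BY a.P_No
""").fetchall()
print(subtotals)

# An employee row with a job code that is not in the Job table is refused
try:
    conn.execute("INSERT INTO Employee VALUES (105, 'New Hire', 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

Because an employee row can only carry a Job_Code that already exists in the Job table, variant spellings of a job class cannot be introduced through the Employee table.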

Page 32: Normalization

3NF Version

• Vast improvement over original design
• No data anomalies
  – In the Job table each job code has a single job class and charge per hour entry
    • No opportunities to use different values describing the same object
  – Similarly for Employee & Project tables: only one entry for each attribute
  – Also, Assign table has only what is needed
• Data redundancy has been minimized
  – Keys are redundant but these are small
  – Assign table is very active but requires only the P_No, E_No, and hours

Page 33: Normalization

Other Normal Forms

• 3NF is the most appropriate form for most applications

• There are other normal forms
• 4NF – Isolate independent multiple relationships
• 5NF – Isolate semantically related multiple relationships
• These are advanced and go beyond the scope of the course

Page 34: Normalization

Summary

• 1NF – Eliminate repeating groups
• 2NF – Eliminate partial dependencies
• 3NF – Eliminate transitive dependencies
• Tables are the critical building blocks for the database
• It is important to design their structure well
• Normalization is a formal process for doing this that reduces data redundancy and anomalies

Page 35: Normalization

End

References:

New Perspectives on Microsoft Access 2000, Introductory, by Adamski, Hommel, and Finnegan, Course Technology, 1999.

Access Database Design and Programming, Third Edition, by Roman, O’Reilly, 2002.

Database Systems: Design, Implementation, and Management, by Rob & Coronel, Boyd & Fraser, 1995.