Chapter 5 Normalization of Database Tables

59
Chapter 5 Normalization of Database Tables Database Systems: Design, Implementation, and Manageme Peter Rob & Carlos Coronel

description

Chapter 5 Normalization of Database Tables. Database Systems: Design, Implementation, and Management Peter Rob & Carlos Coronel. In this chapter, you will learn:. What normalization is and what role it plays in database design About the normal forms 1NF , 2NF , 3NF , BCNF , and 4NF - PowerPoint PPT Presentation

Transcript of Chapter 5 Normalization of Database Tables

Page 1: Chapter 5 Normalization of Database Tables

Chapter 5Normalization of Database TablesChapter 5Normalization of Database Tables

Database Systems: Design, Implementation, and Management

Peter Rob & Carlos Coronel

Page 2: Chapter 5 Normalization of Database Tables

In this chapter, you will learn:

What normalization is and what role it plays in database design

About the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF

How normal forms can be transformed from lower normal forms to higher normal forms

That normalization and E-R modeling are used concurrently to produce a good database design

That some situations require denormalization to generate information efficiently

Page 3: Chapter 5 Normalization of Database Tables

Database Tables and NormalizationDatabase Tables and Normalization Normalization

Process for evaluating and correcting table structures to minimize data redundancies

process for assigning attributes to tables. It reduces data redundancies helps eliminate data anomalies. produces controlled redundancies to link tables

Normalization works through a series of stages called normal forms: First normal form (1NF) Second normal form (2NF) Third normal form (3NF) Fourth normal form (4NF)

Page 4: Chapter 5 Normalization of Database Tables

Database Tables and NormalizationDatabase Tables and Normalization

Normalization 2NF is better than 1NF; 3NF is better than 2NF For most business database design purposes,

3NF is as high as we need to go in normalization process

The highest level of normalization is not always most desirable.

Page 5: Chapter 5 Normalization of Database Tables

Database Tables and NormalizationDatabase Tables and Normalization

The Need for Normalization

Case of a Construction Company

Building project -- Project number, Name, Employees assigned to the project.

Employee -- Employee number, Name, Job classification

The company charges its clients by billing the hours spent on each project. The hourly billing rate is dependent on the employee’s position.

Periodically, a report is generated. Table 5.1

The easiest way to generate the required report might seem to be a table whose contents correspond to the reporting requirements. Figure 5.1

Page 6: Chapter 5 Normalization of Database Tables
Page 7: Chapter 5 Normalization of Database Tables
Page 8: Chapter 5 Normalization of Database Tables

Need for Normalization:Problems with the Figure 5.1

The project number is intended to be a primary key, but it contains nulls.

The table displays data redundancies.

The table entries invite data inconsistencies.

The data redundancies yield the following anomalies:

Update anomalies. (modify JOB_CLASS for Employee 105)

Insertion anomalies. (a new Employee not yet assigned)

Deletion anomalies. ( Employee 103 quits)

Database Tables and NormalizationDatabase Tables and Normalization

Page 9: Chapter 5 Normalization of Database Tables

The Normalization ProcessThe Normalization Process

Page 10: Chapter 5 Normalization of Database Tables

Conversion to 1NF Repeating groups – a group of multiple entries can exist for any

single key attribute occurrence.

Repeating groups must be eliminatedAny project number (PROJ_NUM) can have a group of several data entries.

A relational table must not contain repeating groups.

Database Tables and NormalizationDatabase Tables and Normalization

Page 11: Chapter 5 Normalization of Database Tables

Step 1: Eliminate the Repeating Groups – Repeating groups can be eliminated by adding the appropriate entry in at least the primary key column(s).

Step 2: Identify the Primary Key• Uniquely identifies attribute values (rows)• Combination of PROJ_NUM and EMP_NUM

Page 12: Chapter 5 Normalization of Database Tables

Step 3: Identify all Dependencies Dependency Diagram

The primary key components are bold, underlined, and shaded in a different color.

The arrows above entities indicate all desirable dependencies ( dependencies based on PK )

The arrows below the dependency diagram indicate less desirable dependencies –

– partial dependencies ( dependencies based on only a part of PK )

– transitive dependencies ( nonprime attribute → nonprime attribute )

Prime attribute = Key attribute Nonprime attribute = Nonkey attribute

Database Tables and NormalizationDatabase Tables and Normalization

Page 13: Chapter 5 Normalization of Database Tables

Database Tables and NormalizationDatabase Tables and Normalization EMP_NUM → EMP_NAME, JOB_CLASS_, CHG_HOUR PROJ_NUM → PROJ_NAME JOB_CLASS → CHG_HOUR

Page 14: Chapter 5 Normalization of Database Tables

1NF Definition The term first normal form (1NF) describes the

tabular format in which: All the key attributes are defined. There are no repeating groups in the table.

Each row/col intersection can contain one and only one value, not set of values.

All attributes are dependent on the primary key.

All relational tables satisfy the 1NF requirements. 1NF Drawback

Partial dependencies (EMP_NUM → EMP_NAME, JOB_CLASS_,

CHG_HOUR)→→ data redundancies→→ data anomalies

Database Tables and NormalizationDatabase Tables and Normalization

Page 15: Chapter 5 Normalization of Database Tables

Conversion to 2NF Step 1: Identify All Key Components

Writing each key component on a separate line, and then writing the original key on the last line and

PROJ_NUM

EMP_NUM

PROJ_NUM, EMP_NUM

Step 2: Identify the Dependent Attributes Writing the dependent attributes after each new key.

PROJECT ( PROJ_NUM, PROJ_NAME)

EMPLOYEE ( EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)

ASSIGN ( PROJ_NUM, EMP_NUM, HOURS)

Database Tables and NormalizationDatabase Tables and Normalization

Page 16: Chapter 5 Normalization of Database Tables
Page 17: Chapter 5 Normalization of Database Tables

2NF Definition

A table is in 2NF if: It is in 1NF and

It includes no partial dependencies; that is, no attribute is dependent on only portion of primary key.

A table whose primary key is not compositemust automatically be in 2NF.BECAUSE a partial dependency can exist only if a table has a composite primary key

It is still possible for a table in 2NF to exhibit transitive dependency; that is, one or more attributes may be functionally dependent on nonkey attributes.

2NF Drawback Transitive dependencies (JOB_CLASS → CHG_HOUR)

→→ data redundancies→→ data anomalies

Database Tables and NormalizationDatabase Tables and Normalization

Page 18: Chapter 5 Normalization of Database Tables

Conversion to 3NF Create a separate table with attributes in a transitive

functional dependence relationship. Step 1: Identify Each New Determinant

JOB_CLASS Step 2: Identify the Dependent Attributes

JOB_CLASS → CHG_HOUR Step 3: Remove the Dependent Attributes from Transitive

Dependencies

EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)

JOB (JOB_CLASS, CHG_HOUR)

PROJECT (PROJ_NUM, PROJ_NAME)

ASSIGN (PROJ_NUM, EMP_NUM, HOURS)

Database Tables and NormalizationDatabase Tables and Normalization

Page 19: Chapter 5 Normalization of Database Tables
Page 20: Chapter 5 Normalization of Database Tables

3NF Definition A table is in 3NF if:

It is in 2NF and It contains no transitive dependencies.

Database Tables and NormalizationDatabase Tables and Normalization

Page 21: Chapter 5 Normalization of Database Tables

Improving the DesignImproving the Design

Table structures are cleaned up to eliminate the troublesome initial partial and transitive dependencies

Normalization cannot, by itself, be relied on to make good designs

It is valuable because its use helps eliminate data redundancies

Page 22: Chapter 5 Normalization of Database Tables

Improving the DesignImproving the Design

Issues to address in order to produce a good normalized set of tables: Evaluate PK Assignments Evaluate Naming Conventions Refine Attribute Atomicity Identify New Attributes Identify New Relationships Refine Primary Keys as Required for Data Granularity Maintain Historical Accuracy Evaluate Using Derived Attributes

Page 23: Chapter 5 Normalization of Database Tables

Improving the DesignImproving the Design Adding relationships

Project’s manager EMP_NUM as a FK in PROJECT.

3NF3NF

Project manager

Page 24: Chapter 5 Normalization of Database Tables

Improving the DesignImproving the Design PK assignment

JOB_CODE Naming conventions

JOB_CHG_HOUR JOB_CLASS >> JOB_DESCRIPTION

2NF2NF

Page 25: Chapter 5 Normalization of Database Tables

Improving the DesignImproving the Design Attribute atomicity

EMP_NAME >> EMP_LNAME,EMP_FNAME,EMP_INITIAL Adding attributes

EMP_HIREDATE JOB_CLASS

3NF3NF

Page 26: Chapter 5 Normalization of Database Tables

Improving the DesignImproving the Design Refining PKs

(EMP_NUM+PROJ_NUM) >> ASSIGN_NUM Maintaining historical accuracy

ASSIGN_CHG_HOUR <<>> JOB_CHG_HOUR Using derived attributes

ASSIGN_CHARGE = ASSIGN_HOURS × ASSIGN_CHG_HOUR

3NF3NF

Page 27: Chapter 5 Normalization of Database Tables
Page 28: Chapter 5 Normalization of Database Tables
Page 29: Chapter 5 Normalization of Database Tables

Surrogate Key ConsiderationsSurrogate Key Considerations

When primary key is considered to be unsuitable, designers use system-defined surrogate keys The DBMS can be used to have the system assign

the PK values (JOB_CODE)>> to ensure entity integrity

Page 30: Chapter 5 Normalization of Database Tables

Limitations on system-defined surrogate keysLimitations on system-defined surrogate keys

Data entries in Table 5.3 are inappropriate because they duplicate existing records

However, it does not prevent us from making the entries shown in Table 5.3. >> Multiple duplicate records problem

We still must ensure the uniqueness in JOB_DESCRIPTION through the use of a unique index.

Page 31: Chapter 5 Normalization of Database Tables

Boyce-Codd Normal Form (BCNF) A table is in Boyce-Codd normal form (BCNF)

if every determinant in the table is a candidate key.

(A determinant is any attribute whose value determines other values with a row.)

If a table contains only one candidate key, the 3NF and the BCNF are equivalent.

BCNF can be violated only if the table contains more than one candidate key

BCNF is a special case of 3NF.

Figure 5.7 illustrates a table that is in 3NF but not in BCNF.

Figure 5.8 shows how the table can be decomposed to conform to the BCNF form.

Database Tables and NormalizationDatabase Tables and Normalization

Page 32: Chapter 5 Normalization of Database Tables

A Table That Is In 3NF But Not In BCNF A + B → C, D

C → B : Not transitive dependencies (A nonkey attribute is the determinant of a key attribute) → → 3NF

C : Not candidate key → → Not In BCNF

Page 33: Chapter 5 Normalization of Database Tables

The Decomposition of a Table Structure to meet BCNF Requirements

A + B → C, D

C → B

A + C → D

C → B

A + C → B, D

C → B

Change the PK to A+C

Page 34: Chapter 5 Normalization of Database Tables

The Boyce-Codd Normal Form (BCNF)The Boyce-Codd Normal Form (BCNF)

STU_ID + STAFF_ID → CLASS_CODE, ENROLL_GRADE

CLASS_CODE → STAFF_ID

Page 35: Chapter 5 Normalization of Database Tables

Decomposition into BCNF

Figure 5.9

Page 36: Chapter 5 Normalization of Database Tables

Normalization and Database DesignNormalization and Database Design

Normalization should be part of the design process E-R Diagram provides macro view Normalization provides micro view of entities

Focuses on characteristics of specific entities A micro view of the entities within the ER diagram

Difficult to separate normalization from E-R diagramming Two techniques should be used concurrently

Page 37: Chapter 5 Normalization of Database Tables

Normalization and Database DesignNormalization and Database Design Database Design and Normalization Example:

(Construction Company)

Summary of Operations:

The company manages many projects.

Each project requires the services of many employees.

An employee may be assigned to several different projects.

Some employees are not assigned to a project and perform duties not specifically related to a project.

Some employees are part of a labor pool, to be shared by all project teams.

Each employee has a (single) primary job classification. This job classification determines the hourly billing rate.

Many employees can have the same job classification.

Page 38: Chapter 5 Normalization of Database Tables

Two Initial Entities: PROJECT (PROJ_NUM, PROJ_NAME) [ 3NF ]

EMPLOYEE ( EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_DESCRIPTION, JOB_CHG_HOUR)

No partial dep.

Transitive dep. JOB_DESCRIPTION → JOB_CHG_HOUR [ 2NF ]

Normalization and Database DesignNormalization and Database Design

Page 39: Chapter 5 Normalization of Database Tables

Normalization and Database DesignNormalization and Database Design

Page 40: Chapter 5 Normalization of Database Tables

Three Entities After Transitive Dependency Removed

PROJECT (PROJ_NUM, PROJ_NAME)

EMPLOYEE ( EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_CODE)

JOB ( JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR)

Normalization and Database DesignNormalization and Database Design

EMPLOYEE

Page 41: Chapter 5 Normalization of Database Tables

The Modified ERD For A Contracting Company

Because the normalization process yields an additional entity (JOB),we modify the initial ERD.

M

Is held by

Page 42: Chapter 5 Normalization of Database Tables

Normalization and Database DesignNormalization and Database Design

Page 43: Chapter 5 Normalization of Database Tables

Normalization and Database DesignNormalization and Database Design

Page 44: Chapter 5 Normalization of Database Tables

Creation of the Composite Entity ASSIGNMENT

Normalization and Database DesignNormalization and Database Design

The Final ( Implementable) ERD for the Contracting Company

Is held by

ASSIGNMENT

Page 45: Chapter 5 Normalization of Database Tables

Normalization and Database DesignNormalization and Database Design

Page 46: Chapter 5 Normalization of Database Tables

Attribute ASSIGN_HOUR is assigned to the composite entity ASSIGN.

“Manages” relationship is created between EMPLOYEE and PROJECT.

PROJECT (PROJ_NUM, PROJ_NAME, EMP_NUM)

ASSIGNMENT (ASSIGN_NUM, ASSIGN_DATE , ASSIGN_HOURS, ASSIGN_CHG_HOUR , ASSIGN_CHARGE,

EMP_NUM, PROJ_NUM)

EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, EMP_HIREDATE, JOB_CODE)

JOB (JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR)

Normalization and Database DesignNormalization and Database Design

Manages

Page 47: Chapter 5 Normalization of Database Tables

Normalization and Database DesignNormalization and Database Design

Page 48: Chapter 5 Normalization of Database Tables

Higher-Level Normal FormsHigher-Level Normal Forms

In some databases, multiple multivalued attributes exist

4NF Definition A table is in 4NF

if it is in 3NF and has no multiple sets of multivalued dependencies.

Page 49: Chapter 5 Normalization of Database Tables

Higher-Level Normal FormsHigher-Level Normal Forms An employee can have multiple assignments and can also be

involved in multiple service organization. EMP_SERVICE (volunteer work) and EMP_ASSIGN (assigned

project)each may have many different values.

The table contain two sets of multivalued dependencies.

1

Independent

Page 50: Chapter 5 Normalization of Database Tables

The solution is to eliminate the problems caused by independent multivalued dependencies.

A Set of Tables in 4NF

Page 51: Chapter 5 Normalization of Database Tables

EMPLOYEE

ORGANIZATION

PROJECT

Service

Assign

M

M

N

N

Page 52: Chapter 5 Normalization of Database Tables

DenormalizationDenormalization

Creation of normalized relations is important database design goal

Processing requirements should also be a goal

If tables decomposed to conform to normalization requirements

Number of database tables expands

Joining larger number of tables takes additional disk input/output (I/O) operations and processing logic Reduces system speed

Page 53: Chapter 5 Normalization of Database Tables

DenormalizationDenormalization Normalization is only one of many database design

goals.

Normalized (decomposed) tables require additional processing, ( Join : additional I/O operations )reduce system speed.

CUSTOMER( CUS_NUM, CUS_NAME, … , ZIP_CODE, CITY)

Is it really practical? ZIP(ZIP_CODE, CITY)

Some degree of denormalization →→ increase processing speed

transitive dep.

Page 54: Chapter 5 Normalization of Database Tables

DenormalizationDenormalization Normalization purity is often difficult to sustain in the modern

database environment.

The conflict between design efficiency, information requirements, and processing speed are often resolved through compromises that include denormalization.

In fact, we used a 2NF structure in the JOB table (Fig.5.6) to decrease the likelihood of referential integrity violations.

JOB table

Original PK: JOB_CLASS

New PK: JOB_CODE

EMPLOYEE table

Original FK: JOB_CLASS

New FK: JOB_CODE

Use denormalization cautiously.

Page 55: Chapter 5 Normalization of Database Tables

The Initial 1NF Structure

Page 56: Chapter 5 Normalization of Database Tables

Identifying the Possible PK Attributes

Page 57: Chapter 5 Normalization of Database Tables

Table Structures Based On The Selected PKs

Foreign key

Page 58: Chapter 5 Normalization of Database Tables

SummarySummary

Page 59: Chapter 5 Normalization of Database Tables

SummarySummary