Schema Refinement and Normal Forms - intUitiON KE…

The Evils of Redundancy

● Redundancy is at the root of several problems associated with relational schemas:
– redundant storage, insert/delete/update anomalies

● Integrity constraints, in particular functional dependencies, can be used to identify schemas with such problems and to suggest refinements.

● Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).

● Decomposition should be used judiciously:
– Is there reason to decompose a relation?
– What problems (if any) does the decomposition cause?

Functional Dependencies (FDs)

● Let R be a relational schema and let X and Y be two subsets of the set of all attributes of R. We say Y is functionally dependent on X, written X → Y, if the Y-values are determined by the X-values.

● In other words: for any two tuples in an instance of R, if their X-values agree, then their Y-values must also agree. (X and Y are sets of attributes.)
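The two-tuple definition can be turned into a small check. The sketch below is illustrative only: the function name `fd_holds` and the sample rows are made up for this example. Note that a single instance can only refute an FD; passing the check does not prove that the FD holds for every legal instance of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X-values, different Y-values
    return True


# Hypothetical register rows, invented for illustration.
register = [
    {"reg_no": "R001", "surname": "Mwangi"},
    {"reg_no": "R002", "surname": "Otieno"},
    {"reg_no": "R003", "surname": "Mwangi"},
]

print(fd_holds(register, ["reg_no"], ["surname"]))  # True
print(fd_holds(register, ["surname"], ["reg_no"]))  # False
```

The second call fails because two different registration numbers share one surname, which is the one-way behaviour an FD allows.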

Functional Dependencies (FDs)

● Example 1: In a university register at any one time a single surname (Y) is associated with a particular registration-number (X).

● Note that the dependency is only one-way (i.e. directional):
– registration-number determines surname, but the reverse is not true.

Functional Dependencies (FDs)

● Example 2: The national identification number determines the employee name: National_id → employee_name.

● The project number determines the project name and location: project_no → {project_name, project_location}.

● The employee and the project together determine the hours worked: {National_id, project_no} → hours.

Example: Constraints on Entity Set

● Consider the relation obtained from Hourly_Emps:
– Hourly_Emps (ssn, name, lot, rating, hrly_wages, hrs_worked)

● Notation: We can denote this relation schema by listing the attributes: SNLRWH
– This is really the set of attributes {S,N,L,R,W,H}.
– We can also refer to all attributes of a relation by using the relation name (e.g., Hourly_Emps for SNLRWH).

● Some FDs on Hourly_Emps:
– ssn is the key: S → SNLRWH
– rating determines hrly_wages: R → W

Example (Contd.)

● Problems due to R → W:
– Update anomaly: Can we change W in just the 1st tuple of SNLRWH?
– Insertion anomaly: What if we want to insert an employee and don’t know the hourly wage for his rating?
– Deletion anomaly: If we delete all employees with rating 5, we lose the information about the wage for rating 5!

Possible Solution

[Figure: Hourly_Emps decomposed into two smaller relations, Hourly_Emps2 and Wages]
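A rough sketch of this decomposition, under stated assumptions: the column split is taken to be Hourly_Emps2 = SNLRH and Wages = RW (the natural split for R → W, though the slide's figure did not survive), and the rows are invented for illustration. The helper names `project` and `natural_join` are likewise hypothetical.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Join two lists of dict rows on their shared attribute names."""
    joined = []
    for l in left:
        for r in right:
            common = set(l) & set(r)
            if all(l[a] == r[a] for a in common):
                joined.append({**l, **r})
    return joined


# Made-up instance of Hourly_Emps (S, N, L, R, W, H) that satisfies R -> W.
hourly_emps = [
    {"S": "111", "N": "Alice", "L": 48, "R": 8, "W": 10, "H": 40},
    {"S": "222", "N": "Brian", "L": 22, "R": 8, "W": 10, "H": 30},
    {"S": "333", "N": "Carol", "L": 35, "R": 5, "W": 7, "H": 30},
    {"S": "444", "N": "David", "L": 35, "R": 5, "W": 7, "H": 32},
]

hourly_emps2 = project(hourly_emps, ["S", "N", "L", "R", "H"])
wages = project(hourly_emps, ["R", "W"])  # one row per rating

rejoined = natural_join(hourly_emps2, wages)
by_ssn = lambda row: row["S"]
print(len(wages))  # 2
print(sorted(rejoined, key=by_ssn) == sorted(hourly_emps, key=by_ssn))  # True
```

Each rating's wage now lives in exactly one Wages row, and joining the two pieces on R gives back the original rows, so nothing is lost by the split.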

Normalization

● Normalization is the process of
– disallowing repeated groups of data in a relation, and
– organizing the data to minimize redundancy.

● Normalization usually involves dividing the database into two or more tables and defining the relationships between the tables.

Normalization

● The objective is to isolate data so that additions, deletions and modifications are made in just one table and then propagated through the rest of the database through the defined relationships.

Normal Form

● Example
– A company obtains parts from a number of suppliers.
– Each supplier is located in one city.
– A city can have more than one supplier located there, and each city has a status code associated with it.
– Each supplier may provide many parts.

UNNORMALISED DATA

Supplier  Status  City    P#                      Quantity
S1        20      London  P1, P2, P3, P4, P5, P6  300, 200, 400, 200, 100, 100
S2        10      Paris   P1, P2                  300, 400
S3        10      Paris   P2                      200
S4        20      London  P2, P4, P5              200, 300, 400

First Normal Form (1NF)

● A table is in first normal form if
– every attribute is a simple (atomic) attribute
– there are no duplicate rows in the table => there is a designated primary key
– there are no repeating groups or arrays

First normal form

● All values of the columns are atomic

1NF

● FIRST NORMAL FORM (1NF) ≡ ELIMINATE REPEATING GROUPS
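Eliminating the repeating groups can be sketched as a flattening step that produces one row per (supplier, part) pair. The data comes from the unnormalised supplier table; pairing each part with the quantity in the same position is an assumption about how that table was laid out, and the variable names are illustrative.

```python
# Unnormalised rows: the P# and Quantity cells each hold a list of values.
unnormalised = [
    ("S1", 20, "London", ["P1", "P2", "P3", "P4", "P5", "P6"],
     [300, 200, 400, 200, 100, 100]),
    ("S2", 10, "Paris", ["P1", "P2"], [300, 400]),
    ("S3", 10, "Paris", ["P2"], [200]),
    ("S4", 20, "London", ["P2", "P4", "P5"], [200, 300, 400]),
]

# Flatten: one atomic row per part supplied (assumes parts and quantities
# are listed in matching order).
first = [
    (supplier, status, city, part, qty)
    for supplier, status, city, parts, qtys in unnormalised
    for part, qty in zip(parts, qtys)
]

print(len(first))  # 12
print(first[0])    # ('S1', 20, 'London', 'P1', 300)

# (supplier, part) identifies each row, so it can serve as the primary key.
keys = {(row[0], row[3]) for row in first}
print(len(keys) == len(first))  # True
```

Every cell in `first` is now atomic, at the cost of repeating each supplier's status and city once per part.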

Anomalies with 1NF

● INSERT.
– The fact that a certain supplier (S5) is located in a particular city (Athens) cannot be added until that supplier has supplied a part.

● DELETE.
– If a supplier's only row is deleted (e.g. the single row for S3), then not only is the information about quantity and part lost but also the information about the supplier.

● UPDATE.
– If supplier S1 moved from London to New York, then six rows would have to be updated with this new information.

2NF

● A relational table is in second normal form (2NF) if
– it is in 1NF, and
– every non-key column is fully dependent upon the whole primary key.

● Is FIRST in 2NF?
– S# → city, status
– city → status
– (S#, P#) → qty

● No: city and status depend on S# alone, which is only part of the primary key (S#, P#).
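The 2NF question for FIRST can be checked mechanically with attribute closure. This is a minimal sketch, assuming (S#, P#) is the primary key and that the three FDs listed are the only ones; `closure` and `partial_dependencies` are illustrative names, not part of any library.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, non_key, fds):
    """Map each proper subset of the key to the non-key columns it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & non_key
            if determined:
                found[subset] = sorted(determined)
    return found


fds = [
    (frozenset({"s#"}), frozenset({"city", "status"})),
    (frozenset({"city"}), frozenset({"status"})),
    (frozenset({"s#", "p#"}), frozenset({"qty"})),
]
all_attrs = {"s#", "p#", "city", "status", "qty"}
key = frozenset({"s#", "p#"})

print(closure(key, fds) == all_attrs)                    # True: (s#, p#) is a key
print(partial_dependencies(key, all_attrs - key, fds))   # {('s#',): ['city', 'status']}
```

A non-empty result means some non-key column depends on only part of the key, so the table is not in 2NF.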

Decompose 1NF into 2NF

● Identify any determinants that are part of the composite primary key (other than the entire composite key), and the columns they determine.

● Create and name a new table for each determinant and the unique columns it determines.

● Move the determined columns from the original table to the new table. The determinant becomes the primary key of the new table.

● Delete the columns you just moved from the original table except for the determinant which will serve as a foreign key.

● The original table may be renamed to maintain semantic meaning.
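Applied to FIRST, the steps above might look like the following sketch. The determinant is S#, which determines status and city. The name SUPPLIER is used for the new table, and SUPPLIER_PARTS is a hypothetical name for what remains of the original table.

```python
# FIRST in 1NF: (s#, status, city, p#, qty), primary key (s#, p#).
FIRST = [
    ("S1", 20, "London", "P1", 300), ("S1", 20, "London", "P2", 200),
    ("S1", 20, "London", "P3", 400), ("S1", 20, "London", "P4", 200),
    ("S1", 20, "London", "P5", 100), ("S1", 20, "London", "P6", 100),
    ("S2", 10, "Paris", "P1", 300), ("S2", 10, "Paris", "P2", 400),
    ("S3", 10, "Paris", "P2", 200),
    ("S4", 20, "London", "P2", 200), ("S4", 20, "London", "P4", 300),
    ("S4", 20, "London", "P5", 400),
]

# New table for the determinant s# and the columns it determines.
supplier = sorted({(s, status, city) for s, status, city, _, _ in FIRST})

# Original table minus the moved columns; s# stays behind as a foreign key.
supplier_parts = sorted({(s, p, qty) for s, _, _, p, qty in FIRST})

print(len(supplier), len(supplier_parts))  # 4 12

# Joining on s# reproduces FIRST, so the decomposition loses no information.
rejoined = sorted(
    (s, status, city, p, qty)
    for s, status, city in supplier
    for s2, p, qty in supplier_parts
    if s == s2
)
print(rejoined == sorted(FIRST))  # True
```

Each supplier's city and status are now stored once instead of once per part supplied.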

2NF

● SECOND NORMAL FORM (2NF) ≡ ELIMINATE PARTIAL DEPENDENCIES

Problems of 2NF

● INSERT.
– The fact that a particular city has a certain status (Rome has a status of 50) cannot be inserted until there is a supplier in the city.

● DELETE.
– Deleting a row in SUPPLIER removes the association between supplier and city and, if it is the last supplier in that city, also destroys the status information about the city.

3NF

● A relational table is in third normal form (3NF)
– if it is already in 2NF, and
– every non-key column is non-transitively dependent upon its primary key. In other words, all non-key attributes are functionally dependent only upon the primary key.

● SUPPLIER is in 2NF but not in 3NF because it contains a transitive dependency (S# → city and city → status).
– A transitive dependency occurs when a non-key column that is determined by the primary key is the determinant of other columns.
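A transitive dependency of this kind can be spotted by looking for a determinant that is not a superkey but still determines a non-key column. The sketch below assumes SUPPLIER has the columns (s#, city, status) with primary key s#, and that the listed FDs are the only ones; it inspects only those stated FDs, not every FD they imply.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


attrs = {"s#", "city", "status"}
key = {"s#"}
fds = [
    ({"s#"}, {"city", "status"}),
    ({"city"}, {"status"}),
]

violations = []
for lhs, rhs in fds:
    is_superkey = closure(lhs, fds) >= attrs
    non_key_targets = rhs - lhs - key
    if not is_superkey and non_key_targets:
        violations.append((sorted(lhs), sorted(non_key_targets)))

print(violations)  # [(['city'], ['status'])]
```

Here city is a non-key column that determines status, so status depends on S# only through city.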

Decompose to 3NF

● Identify any determinants, other than the primary key, and the columns they determine.

● Create and name a new table for each determinant and the unique columns it determines.

● Move the determined columns from the original table to the new table. The determinant becomes the primary key of the new table.

● Delete the columns you just moved from the original table except for the determinant which will serve as a foreign key.

● The original table may be renamed to maintain semantic meaning.
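For SUPPLIER, the determinant other than the primary key is city, which determines status. A minimal sketch of the steps, using the supplier rows from the running example; the table names SUPPLIER_CITY and CITY_STATUS are assumptions chosen for this illustration.

```python
# SUPPLIER in 2NF: (s#, status, city), primary key s#.
SUPPLIER = [
    ("S1", 20, "London"),
    ("S2", 10, "Paris"),
    ("S3", 10, "Paris"),
    ("S4", 20, "London"),
]

# New table for the determinant city and the column it determines.
city_status = sorted({(city, status) for _, status, city in SUPPLIER})

# Original table without status; city remains as a foreign key.
supplier_city = sorted({(s, city) for s, _, city in SUPPLIER})

print(city_status)  # [('London', 20), ('Paris', 10)]

# Joining on city gives back SUPPLIER.
rejoined = sorted(
    (s, status, city)
    for s, city in supplier_city
    for city2, status in city_status
    if city == city2
)
print(rejoined == sorted(SUPPLIER))  # True
```

Each city's status is now recorded in a single row, independent of how many suppliers are located there.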

3NF results

Advantages of 3NF

● It eliminates redundant data.

● INSERT.
– Facts about the status of a city (e.g. Rome has a status of 50) can be added even though there is no supplier in that city.
– Likewise, facts about new suppliers can be added even though they have not yet supplied parts.

● DELETE.
– Information about parts supplied can be deleted without destroying information about a supplier or a city.

● UPDATE.
– Changing the location of a supplier or the status of a city requires modifying only one row.
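These three points can be tried out with Python's built-in `sqlite3` module. The schema below is one possible 3NF layout, not necessarily the one on the missing "3NF results" slide: the table names are assumptions, `s_no` and `p_no` stand in for S# and P# because `#` is not valid in an unquoted SQL identifier, and the new London status of 30 is a made-up value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE city_status (
    city   TEXT PRIMARY KEY,
    status INTEGER NOT NULL
);
CREATE TABLE supplier (
    s_no TEXT PRIMARY KEY,
    city TEXT NOT NULL REFERENCES city_status(city)
);
CREATE TABLE supplier_parts (
    s_no TEXT NOT NULL REFERENCES supplier(s_no),
    p_no TEXT NOT NULL,
    qty  INTEGER NOT NULL,
    PRIMARY KEY (s_no, p_no)
);
""")

# INSERT: a city's status can be recorded before any supplier exists there.
conn.execute("INSERT INTO city_status VALUES ('Rome', 50)")
conn.executemany("INSERT INTO city_status VALUES (?, ?)",
                 [("London", 20), ("Paris", 10)])

# INSERT: a supplier can be recorded before it has supplied any part.
conn.executemany("INSERT INTO supplier VALUES (?, ?)",
                 [("S1", "London"), ("S2", "Paris")])
conn.execute("INSERT INTO supplier_parts VALUES ('S1', 'P1', 300)")

# DELETE: removing supplied parts leaves supplier and city facts intact.
conn.execute("DELETE FROM supplier_parts WHERE s_no = 'S1'")
suppliers_left = conn.execute("SELECT COUNT(*) FROM supplier").fetchone()[0]

# UPDATE: changing a city's status touches exactly one row.
rows_changed = conn.execute(
    "UPDATE city_status SET status = 30 WHERE city = 'London'"
).rowcount

print(suppliers_left, rows_changed)  # 2 1
```

The foreign keys also guard the other direction: inserting a supplier in a city that has no row in `city_status` raises `sqlite3.IntegrityError`.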

Advanced NFs

● After 3NF, all normalization problems involve only tables which have three or more columns and all the columns are keys.

● Many practitioners argue that placing entities in 3NF is generally sufficient because it is rare that entities that are in 3NF are not also in 4NF and 5NF.

● They further argue that the benefits gained from transforming entities into 4NF and 5NF are so slight that it is not worth the effort.

Exercise 1

● An agency called Utalii supplies part-time/temporary staff to hotels throughout Kenya. The relation shown in Table 1 lists the time spent by agency staff working at two hotels.

Exercises

For Tables 1, 2 and 3:

i. What kinds of anomalies is this relation susceptible to? Provide examples of each kind based on Table 1.

ii. What is a suitable Primary Key for the relation?

iii. List all the Functional Dependencies present in the relation, in each case stating what kind of FD it is.

iv. Describe and illustrate the process of normalizing the table to 3NF. State any assumptions you make.

v. Write SQL statements to create the corresponding 3NF relations. The SQL statements should capture the constraints indicated as well as enforce referential integrity.

Table 1

ID_No     Contract No  Hours/Week  eName          hotelNo  Hotel Name  Hotel Location
23476512  C1024        16          Paul Mungai    H25      Interconn   Nairobi
22566083  C1024        24          Diana Achieng  H25      Interconn   Nairobi
22543267  C1025        28          Sarah Muthoni  H4       Hilton      Mombasa
23476512  C1025        16          Paul Mungai    H4       Hilton      Mombasa

Table 2

RentalID  Title                  CustomerID  DateBorrowed  Director                 DirectorRating  Price
1         Die Hard               1001        3/3/2017      John McTiernan           A               4.25
1         The last man standing  1001        3/3/2017      Walter Hill              B               4.25
1         Wedding Crashers       1001        3/3/2017      David Dobkin             D               5.5
2         Dodgeball              1002        3/4/2017      Rawson Marshall Thurber  C               5.5
2         Die Hard               1002        3/4/2017      John McTiernan           A               4.25
3         As good as it gets     1003        1/5/2017      James Brooks             D               4.25
4         Forrest Gump           1001        1/5/2017      Robert Zemeckis          C               4.25

Table 3

UnitID StudentID Date TutorID Topic Room Grade Book TutEmail

U1 St1 23.02.03 Tut1 GMT 629 4.7 Deumlich [email protected]

U2 St1 18.11.02 Tut3 GIn 631 5.1 Zehnder [email protected]

U1 St4 23.02.03 Tut1 GMT 629 4.3 Deumlich [email protected]

U5 St2 05.05.03 Tut3 PhF 632 4.9 Dümmlers [email protected]

U4 St2 04.07.03 Tut5 AVQ 621 5 SwissTopo [email protected]