Schema Refinement and Normal Forms
The Evils of Redundancy
● Redundancy is at the root of several problems associated with relational schemas:
– redundant storage, insert/delete/update anomalies
● Integrity constraints, in particular functional dependencies, can be used to identify schemas with such problems and to suggest refinements.
● Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).
● Decomposition should be used judiciously:
– Is there reason to decompose a relation?
– What problems (if any) does the decomposition cause?
Functional Dependencies (FDs)
● Let R be a relational schema and let X and Y be two subsets of the set of all attributes of R. We say Y is functionally dependent on X, written X → Y, if the Y-values are determined by the X-values.
● In other words: given two tuples in R, if the X values agree, then the Y values also automatically agree. (X and Y are sets of attributes.)
Functional Dependencies (FDs)
● Example 1: In a university register at any one time a single surname (Y) is associated with a particular registration-number (X).
● Note that the dependency is only one-way (i.e. directional):
– registration-number determines surname, but the reverse is not true.
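The one-way nature of Example 1 can be checked mechanically on a sample of data. The sketch below is illustrative only: the function name fd_holds, the attribute names reg_no and surname, and the three sample rows are all invented for this example and do not come from a real register.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in this relation instance."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two tuples agree on X but differ on Y
    return True


# Hypothetical instance, invented for illustration only.
students = [
    {"reg_no": "R001", "surname": "Otieno"},
    {"reg_no": "R002", "surname": "Kamau"},
    {"reg_no": "R003", "surname": "Otieno"},
]

print(fd_holds(students, ["reg_no"], ["surname"]))  # True
print(fd_holds(students, ["surname"], ["reg_no"]))  # False
```

Note that a sample can only refute an FD. An FD is a statement about every legal instance of R, so a True result here means "not violated by this data", not "guaranteed by the schema".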
Functional Dependencies (FDs)
● Example 2: The national identification number determines the employee name: national_id → employee_name.
● The project number determines the project name and location: project_no → {project_name, project_location}.
● An employee and a project together determine the hours worked: {national_id, project_no} → hours.
Example: Constraints on Entity Set
● Consider the relation obtained from Hourly_Emps:
– Hourly_Emps (ssn, name, lot, rating, hrly_wages, hrs_worked)
● Notation: We can denote this relation schema by listing the attributes: SNLRWH
– This is really the set of attributes {S,N,L,R,W,H}.
– We can also refer to all attributes of a relation by using the relation name (e.g., Hourly_Emps for SNLRWH).
● Some FDs on Hourly_Emps:
– ssn is the key: S → SNLRWH
– rating determines hrly_wages: R → W
Example (Contd.)
● Problems due to R → W:
– Update anomaly: Can we change W in just the 1st tuple of SNLRWH?
– Insertion anomaly: What if we want to insert an employee and don’t know the hourly wage for his rating?
– Deletion anomaly: If we delete all employees with rating 5, we lose the information about the wage for rating 5!
Possible Solution
● Decompose Hourly_Emps (SNLRWH) into two smaller relations: Hourly_Emps2 (SNLRH) and Wages (RW).
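A minimal sketch of this decomposition, assuming Hourly_Emps2 keeps every attribute except hrly_wages and Wages holds (rating, hrly_wages). The four sample tuples are invented for illustration. The sketch projects the original relation onto the two smaller schemas and then joins them back on rating to confirm that nothing is lost.

```python
# Hypothetical Hourly_Emps instance (S, N, L, R, W, H); all values are invented.
hourly_emps = [
    ("111", "Ann", 48, 8, 10, 40),
    ("222", "Bob", 22, 8, 10, 30),
    ("333", "Cat", 35, 5, 7, 30),
    ("444", "Dan", 35, 5, 7, 32),
]

# Project onto SNLRH (drop W) and onto RW; sets remove duplicate rows.
hourly_emps2 = {(ssn, name, lot, rating, hours)
                for (ssn, name, lot, rating, wage, hours) in hourly_emps}
wages = {(rating, wage)
         for (ssn, name, lot, rating, wage, hours) in hourly_emps}

# A natural join on rating rebuilds the original rows.
rejoined = {
    (ssn, name, lot, rating, wage, hours)
    for (ssn, name, lot, rating, hours) in hourly_emps2
    for (wage_rating, wage) in wages
    if rating == wage_rating
}

print(sorted(wages))                 # [(5, 7), (8, 10)]
print(rejoined == set(hourly_emps))  # True
```

Each rating's wage is now stored once in Wages, so changing it touches one row, and the wage for a rating survives even if every employee with that rating is deleted.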
Normalization
● Normalization is the process of
– disallowing repeated groups of data in a relation, and
– organizing the data to minimize redundancy.
● Normalisation usually involves dividing the database into two or more tables and defining the relationship between the tables.
Normalization
● The objective is to isolate data so that additions, deletions and modifications are made in just one table and then propagated through the rest of the database through the defined relationships.
Normal Form
● Example
– A company obtains parts from a number of suppliers.
– Each supplier is located in one city.
– A city can have more than one supplier located there, and each city has a status code associated with it.
– Each supplier may provide many parts.
UNNORMALISED DATA
Supplier  Status  City    P#                      Quantity
S1        20      London  P1, P2, P3, P4, P5, P6  300, 200, 400, 200, 100, 100
S2        10      Paris   P1, P2                  300, 400
S3        10      Paris   P2                      200
S4        20      London  P2, P4, P5              200, 300, 400
First Normal Form (1NF)
● A table is in first normal form if
– every attribute is a simple (atomic) attribute
– there are no duplicate rows in the table => there is a designated primary key
– there are no repeating groups or arrays
First normal form
● All values of the columns are atomic
1NF
● FIRST NORMAL FORM (1NF) ≡ ELIMINATE REPEATING GROUPS
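Eliminating the repeating groups in the supplier data can be sketched as follows. The data is the unnormalised supplier table from the example, with each part paired with its quantity by position. The variable names are chosen for this sketch.

```python
# Unnormalised rows: each supplier carries a repeating group of (part, quantity).
unnormalised = [
    ("S1", 20, "London", [("P1", 300), ("P2", 200), ("P3", 400),
                          ("P4", 200), ("P5", 100), ("P6", 100)]),
    ("S2", 10, "Paris", [("P1", 300), ("P2", 400)]),
    ("S3", 10, "Paris", [("P2", 200)]),
    ("S4", 20, "London", [("P2", 200), ("P4", 300), ("P5", 400)]),
]

# One row per (supplier, part): every column now holds a single atomic value.
first = [
    (supplier, status, city, part, qty)
    for supplier, status, city, parts in unnormalised
    for part, qty in parts
]

for row in first:
    print(row)

# (supplier, part) identifies each row, so it can serve as the primary key.
keys = [(supplier, part) for supplier, _, _, part, _ in first]
print(len(first), len(set(keys)))  # 12 12
```

The flattened table repeats status and city on every row for a supplier, which is the redundancy that the later normal forms remove.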
Anomalies with 1NF
● INSERT.
– The fact that a certain supplier (s5) is located in a particular city (Athens) cannot be added until that supplier has supplied a part.
● DELETE.
– If the only row for a supplier is deleted, then not only is the information about quantity and part lost but also the information about the supplier.
● UPDATE.
– If supplier s1 moved from London to New York, then six rows would have to be updated with this new information.
2NF
● A relational table is in second normal form (2NF) if
– it is in 1NF, and
– every non-key column is fully dependent upon the whole primary key.
● Is FIRST in 2NF?
– S# → city, status
– City → status
– (S#, P#) → qty
– No: city and status depend on S# alone, which is only part of the primary key (S#, P#).
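One way to answer the question mechanically is to compute attribute closures for each proper subset of the key. The sketch below uses the three FDs listed for FIRST. The helper names closure and partial_dependencies are made up for this example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, non_key, fds):
    """Map each proper, non-empty subset of key to the non-key attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            determined = closure(subset, fds) & set(non_key)
            if determined:
                found[subset] = determined
    return found


# The FDs listed for FIRST, with primary key (s#, p#).
fds = [
    (("s#",), ("city", "status")),
    (("city",), ("status",)),
    (("s#", "p#"), ("qty",)),
]

partial = partial_dependencies(("s#", "p#"), ("city", "status", "qty"), fds)
print({k: sorted(v) for k, v in partial.items()})  # {('s#',): ['city', 'status']}
```

A non-empty result means some non-key column depends on only part of the key, so the table is not in 2NF.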
Decompose 1NF into 2NF
● Identify any determinants that are part of the composite primary key (other than the entire composite key), and the columns they determine.
● Create and name a new table for each determinant and the unique columns it determines.
● Move the determined columns from the original table to the new table. The determinant becomes the primary key of the new table.
● Delete the columns you just moved from the original table except for the determinant which will serve as a foreign key.
● The original table may be renamed to maintain semantic meaning.
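The steps above can be sketched on the FIRST table. The determinant S# and the columns it determines (status, city) move to a new table, called SUPPLIER as in the rest of these notes. The name supplier_parts for the renamed original table is an assumption made for this sketch.

```python
# FIRST in 1NF: (s#, status, city, p#, qty), taken from the supplier example.
first = [
    ("S1", 20, "London", "P1", 300), ("S1", 20, "London", "P2", 200),
    ("S1", 20, "London", "P3", 400), ("S1", 20, "London", "P4", 200),
    ("S1", 20, "London", "P5", 100), ("S1", 20, "London", "P6", 100),
    ("S2", 10, "Paris", "P1", 300), ("S2", 10, "Paris", "P2", 400),
    ("S3", 10, "Paris", "P2", 200),
    ("S4", 20, "London", "P2", 200), ("S4", 20, "London", "P4", 300),
    ("S4", 20, "London", "P5", 400),
]

# New table for determinant s#: one row per supplier, s# is its primary key.
supplier = sorted({(s, status, city) for s, status, city, _, _ in first})

# The original table keeps the full key (s#, p#) and qty; s# is now a foreign key.
supplier_parts = [(s, p, qty) for s, _, _, p, qty in first]

# Joining the two tables on s# rebuilds FIRST, so no information was lost.
rejoined = sorted(
    (s, status, city, p, qty)
    for s, p, qty in supplier_parts
    for s2, status, city in supplier
    if s == s2
)

print(supplier)
# [('S1', 20, 'London'), ('S2', 10, 'Paris'), ('S3', 10, 'Paris'), ('S4', 20, 'London')]
print(rejoined == sorted(first))  # True
```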
2NF
● SECOND NORMAL FORM (2NF) ≡ ELIMINATE PARTIAL DEPENDENCIES
Problems of 2NF
● INSERT.
– The fact that a particular city has a certain status (Rome has a status of 50) cannot be inserted until there is a supplier in the city.
● DELETE.
– Deleting any row in SUPPLIER destroys the status information about the city as well as the association between supplier and city.
3NF
● A relational table is in third normal form (3NF) if
– it is already in 2NF, and
– every non-key column is non-transitively dependent upon its primary key. In other words, all non-key attributes are functionally dependent only upon the primary key.
● SUPPLIER is in 2NF but not in 3NF because it contains a transitive dependency.
– A transitive dependency occurs when a non-key column that is determined by the primary key is the determinant of other columns (here S# → city and city → status).
Decompose to 3NF
● Identify any determinants, other than the primary key, and the columns they determine.
● Create and name a new table for each determinant and the unique columns it determines.
● Move the determined columns from the original table to the new table. The determinant becomes the primary key of the new table.
● Delete the columns you just moved from the original table except for the determinant which will serve as a foreign key.
● The original table may be renamed to maintain semantic meaning.
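Applied to SUPPLIER, the determinant that is not the primary key is city (city → status). A minimal sketch of the split follows. The table names city_status and supplier_city are assumptions made for this example. The rows are those of the supplier example.

```python
# SUPPLIER in 2NF: (s#, status, city), from the supplier example.
supplier = [
    ("S1", 20, "London"),
    ("S2", 10, "Paris"),
    ("S3", 10, "Paris"),
    ("S4", 20, "London"),
]

# New table for the determinant city: city becomes its primary key.
city_status = sorted({(city, status) for _, status, city in supplier})

# The original table drops status and keeps city as a foreign key.
supplier_city = [(s, city) for s, _, city in supplier]

# city must be unique in the new table for it to act as a primary key.
cities = [city for city, _ in city_status]
print(len(cities) == len(set(cities)))  # True
print(city_status)                      # [('London', 20), ('Paris', 10)]
print(supplier_city)
# [('S1', 'London'), ('S2', 'Paris'), ('S3', 'Paris'), ('S4', 'London')]
```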
3NF results
Advantages of 3NF
● It eliminates redundant data.
● INSERT.
– Facts about the status of a city (e.g. Rome has a status of 50) can be added even though there is no supplier in that city.
– Likewise, facts about new suppliers can be added even though they have not yet supplied parts.
● DELETE.
– Information about parts supplied can be deleted without destroying information about a supplier or a city.
● UPDATE.
– Changing the location of a supplier or the status of a city requires modifying only one row.
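These three advantages can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the table and column names (city_status, supplier, supplier_parts, s_no, p_no) are invented for the example, and the status of 30 given to Athens is made up because the notes do not state one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE city_status (
    city   TEXT PRIMARY KEY,
    status INTEGER NOT NULL
);
CREATE TABLE supplier (
    s_no TEXT PRIMARY KEY,
    city TEXT NOT NULL REFERENCES city_status(city)
);
CREATE TABLE supplier_parts (
    s_no TEXT NOT NULL REFERENCES supplier(s_no),
    p_no TEXT NOT NULL,
    qty  INTEGER NOT NULL,
    PRIMARY KEY (s_no, p_no)
);
""")

# INSERT: a city with no supplier, and a supplier with no parts.
conn.execute("INSERT INTO city_status VALUES ('London', 20)")
conn.execute("INSERT INTO city_status VALUES ('Rome', 50)")
conn.execute("INSERT INTO city_status VALUES ('Athens', 30)")  # status 30 is invented
conn.execute("INSERT INTO supplier VALUES ('S1', 'London')")
conn.execute("INSERT INTO supplier VALUES ('S5', 'Athens')")
conn.execute("INSERT INTO supplier_parts VALUES ('S1', 'P1', 300)")

# DELETE: removing a shipment keeps the supplier and the city.
conn.execute("DELETE FROM supplier_parts WHERE s_no = 'S1' AND p_no = 'P1'")
suppliers_left = conn.execute("SELECT COUNT(*) FROM supplier").fetchone()[0]

# UPDATE: changing a city's status touches exactly one row.
updated_rows = conn.execute(
    "UPDATE city_status SET status = 30 WHERE city = 'London'"
).rowcount

# Referential integrity: a supplier in an unknown city is rejected.
try:
    conn.execute("INSERT INTO supplier VALUES ('S6', 'Oslo')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(suppliers_left, updated_rows, fk_enforced)  # 2 1 True
```

The foreign keys are what "propagate" the relationships between the tables: each fact lives in one table, and the other tables refer to it by key.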
Advanced NFs
● Beyond 3NF, the remaining normalization problems involve only tables that have three or more columns, all of which are part of the key.
● Many practitioners argue that placing entities in 3NF is generally sufficient because it is rare that entities that are in 3NF are not also in 4NF and 5NF.
● They further argue that the benefits gained from transforming entities into 4NF and 5NF are so slight that it is not worth the effort.
Exercise 1
● An agency called Utalii supplies part-time/temporary staff to hotels throughout Kenya. The relation shown in Table 1 lists the time spent by agency staff working at two hotels.
Exercises
For Tables 1, 2 and 3:
i. What kinds of anomalies is this relation susceptible to? Provide examples of each kind based on the table.
ii. What is a suitable Primary Key for the relation?
iii. List all the Functional Dependencies present in the relation, in each case stating what kind of FD it is.
iv. Describe and illustrate the process of normalizing the table to 3NF. State any assumptions you make.
v. Write SQL statements to create the corresponding 3NF relations. The SQL statements should capture the constraints indicated as well as enforce referential integrity.
Table 1
ID_No     ContractNo  Hours/Week  eName          hotelNo  HotelName  HotelLocation
23476512  C1024       16          Paul Mungai    H25      Interconn  Nairobi
22566083  C1024       24          Diana Achieng  H25      Interconn  Nairobi
22543267  C1025       28          Sarah Muthoni  H4       Hilton     Mombasa
23476512  C1025       16          Paul Mungai    H4       Hilton     Mombasa
Table 2
RentalID  Title                  CustomerID  DateBorrowed  Director                 DirectorRating  Price
1         Die Hard               1001        3/3/2017      John McTiernan           A               4.25
1         The last man standing  1001        3/3/2017      Walter Hill              B               4.25
1         Wedding Crashers       1001        3/3/2017      David Dobkin             D               5.5
2         Dodgeball              1002        3/4/2017      Rawson Marshall Thurber  C               5.5
2         Die Hard               1002        3/4/2017      John McTiernan           A               4.25
3         As good as it gets     1003        1/5/2017      James Brooks             D               4.25
4         Forrest Gump           1001        1/5/2017      Robert Zemeckis          C               4.25
Table 3
UnitID  StudentID  Date      TutorID  Topic  Room  Grade  Book       TutEmail
U1      St1        23.02.03  Tut1     GMT    629   4.7    Deumlich   [email protected]
U2      St1        18.11.02  Tut3     GIn    631   5.1    Zehnder    [email protected]
U1      St4        23.02.03  Tut1     GMT    629   4.3    Deumlich   [email protected]
U5      St2        05.05.03  Tut3     PhF    632   4.9    Dümmlers   [email protected]
U4      St2        04.07.03  Tut5     AVQ    621   5      SwissTopo  [email protected]