The Relational Model – Functional Dependencies & Normalization


Page 1: The Relational Model – Functional Dependencies & Normalization

The Relational Model – Functional Dependencies & Normalization

Page 2: The Relational Model – Functional Dependencies & Normalization

Objectives

Optimal Database Design

Selection Of Appropriate Relations/Tables For A Given Set Of Attributes

Minimize Update Anomalies:
  Redundancy
  Update
  Inconsistent Data
  Additions
  Deletions

Page 3: The Relational Model – Functional Dependencies & Normalization

Definition of Anomaly

Something that deviates from our expectations

Page 4: The Relational Model – Functional Dependencies & Normalization

Example

CUSTNUMB  CUSTNAME   CUSTADDR      SNUMB  SLSRNAME
123       Jones, R.  19 Oak St.    3      Adams, M.
456       Lan, J.    4 Pine St.    6      Smith, R.
461       Chu, W.    22 Main St.   12     Brown, M.
489       Obie, S.   76 High St.   6      Smith, R.
514       Wise, R    17 Birch St.  3      Adams, M.
...       ...        ...           ...    ...
999       Side, E.   87 Bay St.    12     Brown, N.

Page 5: The Relational Model – Functional Dependencies & Normalization

Specific Anomalies In This Relation

Redundancy: Why repeat the sales rep name for Adams in each record? If Adams has 500 customers, her name is repeated 500 times!

Update: Suppose sales rep Mary Adams marries and changes her name. How many rows do we need to update?

Inconsistent data: Notice that Brown's first initial varies: M, N

Additions: New sales rep J. Doe can't be entered until he has a customer

Deletion: Delete all customers of Adams, and we lose the name of the sales rep Adams

Page 6: The Relational Model – Functional Dependencies & Normalization

Decomposition Of Relations

The previous table can be decomposed into the following two tables:

CUSTNUMB  CUSTNAME   CUSTADDR      SNUMB
123       Jones, R.  19 Oak St.    3
456       Lan, J.    4 Pine St.    6
461       Chu, W.    22 Main St.   12
489       Obie, S.   76 High St.   6
514       Wise, R    17 Birch St.  3
...       ...        ...           ...
999       Side, E.   87 Bay St.    12

SNUMB  SLSRNAME
3      Adams, M.
6      Smith, R.
12     Brown, M.

Page 7: The Relational Model – Functional Dependencies & Normalization

Notice That This Decomposition Resolved All Database Anomalies

REDUNDANCY: NONE EXISTS

UPDATE: JUST CHANGE MARY ADAMS' LAST NAME (ONCE) IN salesrep relation

INCONSISTENT DATA: IMPOSSIBLE - M. BROWN'S NAME APPEARS ONLY ONCE!

ADDITIONS: ADD NEW SLSR J. DOE TO salesrep relation

DELETIONS: WE CAN DELETE ALL OF ADAMS' CUSTOMERS AND STILL HAVE ADAMS IN salesrep
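As a minimal sketch of this decomposition, the Python below splits the five fully listed (and consistent) rows of the example table into two projections and rejoins them on SNUMB. The variable names are illustrative assumptions, not part of the slides. It shows that each sales rep name is stored once and that no information is lost.

```python
# The five fully listed, consistent rows of the combined relation from the example.
customers = [
    (123, "Jones, R.", "19 Oak St.", 3, "Adams, M."),
    (456, "Lan, J.", "4 Pine St.", 6, "Smith, R."),
    (461, "Chu, W.", "22 Main St.", 12, "Brown, M."),
    (489, "Obie, S.", "76 High St.", 6, "Smith, R."),
    (514, "Wise, R", "17 Birch St.", 3, "Adams, M."),
]

# Projection 1: customer attributes plus the sales rep number.
customer = {(num, name, addr, snumb) for num, name, addr, snumb, _ in customers}

# Projection 2: one row per sales rep; duplicates collapse because this is a set.
salesrep = {(snumb, sname) for _, _, _, snumb, sname in customers}

# A natural join on SNUMB rebuilds the original rows.
rejoined = {
    (num, name, addr, snumb, sname)
    for num, name, addr, snumb in customer
    for s, sname in salesrep
    if s == snumb
}

print(len(salesrep))               # number of sales rep rows after decomposition
print(rejoined == set(customers))  # whether the join reproduces the original rows
```

Because the join reproduces the original rows exactly, the split loses nothing, which is the idea behind a lossless join decomposition.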

Page 8: The Relational Model – Functional Dependencies & Normalization

Conceptual Tools Needed For Decomposition

Functional Dependencies
Lossless Join Decomposition
Normal Forms

Page 9: The Relational Model – Functional Dependencies & Normalization

Functional Dependencies

Common Issue in Designing a New Database From Existing Data

We have obtained one or more tables of existing data (such as from a spreadsheet or extracts from an existing corporate database).

The data is to be stored in a new database.

DATABASE DESIGN QUESTION: Should the data be stored as received, or should it be transformed for storage?

Page 10: The Relational Model – Functional Dependencies & Normalization

Should We Combine ORDER_ITEM and SKU_DATA into One Table (SKU_DATA)?

Should we store these two tables as they are, or should we combine them into one table in our new database?

Page 11: The Relational Model – Functional Dependencies & Normalization

But First—

We need to understand:
The relational model
Relational model terminology

Page 12: The Relational Model – Functional Dependencies & Normalization

The Relational Model

Introduced in 1970

Created by E.F. Codd

He was an IBM engineer

The model used mathematics known as “relational algebra”

Now the standard model for commercial DBMS products.

Page 13: The Relational Model – Functional Dependencies & Normalization

Important Relational Model Terms

Entity
Relation
Functional Dependency
Determinant
Candidate Key
Composite Key
Primary Key
Surrogate Key
Foreign Key
Referential integrity constraint
Normal Form
Multivalued Dependency (new for us)

Page 14: The Relational Model – Functional Dependencies & Normalization

Entity

An entity is some identifiable thing that users want to track:
Customers
Computers
Sales

Page 15: The Relational Model – Functional Dependencies & Normalization

Relations

A relation is a two-dimensional table that has the following characteristics:
Rows contain data about an entity.
Columns contain data about attributes of the entity.
All entries in a column are of the same kind.
Each column has a unique name.
Cells of the table hold a single value.
The order of the columns is unimportant.
The order of the rows is unimportant.
No two rows may be identical.

Page 16: The Relational Model – Functional Dependencies & Normalization

A Typical Relation

Page 17: The Relational Model – Functional Dependencies & Normalization

Tables That Are Not Relations: Multiple Entries per Cell

Page 18: The Relational Model – Functional Dependencies & Normalization

Tables That Are Not Relations: Table with Required Row Order

Page 19: The Relational Model – Functional Dependencies & Normalization

A Valid Relation with Values of Different Length

Page 20: The Relational Model – Functional Dependencies & Normalization

An INVALID relation (Cells in a valid relation are supposed to hold a single value, but the Phone “cells” for Employees 400 and 700 hold multiple phone numbers)

Page 21: The Relational Model – Functional Dependencies & Normalization

Alternative Terminology

Although not all tables are relations, as we have seen on the previous slides, the terms table and relation are generally used interchangeably.

The following sets of terms are equivalent:

Page 22: The Relational Model – Functional Dependencies & Normalization

Functional Dependency

A functional dependency occurs when the value of one (set of) attribute(s) determines the value of a second (set of) attribute(s):

StudentID ----> StudentName

StudentID ----> (DormName, DormRoom, Fee)

The attribute on the left side of the functional dependency is called the determinant.

Functional dependencies may be based on equations:

ExtendedPrice = Quantity X UnitPrice

(Quantity, UnitPrice) ----> ExtendedPrice

But functional dependencies are definitely not equations!
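One way to make the definition concrete is to test a dependency against sample rows. The sketch below uses a hypothetical helper, fd_holds, and made-up order rows (neither comes from the slides). It also illustrates why a dependency is not an equation: the equation can be rearranged, but the dependency only runs from determinant to dependent.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs ----> rhs holds in rows (a list of dicts).

    The dependency holds when equal determinant values never appear
    with different dependent values.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows, invented for illustration only.
order_items = [
    {"OrderNumber": 1, "Quantity": 2, "UnitPrice": 5.0, "ExtendedPrice": 10.0},
    {"OrderNumber": 2, "Quantity": 1, "UnitPrice": 10.0, "ExtendedPrice": 10.0},
    {"OrderNumber": 3, "Quantity": 2, "UnitPrice": 5.0, "ExtendedPrice": 10.0},
]

forward = fd_holds(order_items, ["Quantity", "UnitPrice"], ["ExtendedPrice"])
backward = fd_holds(order_items, ["ExtendedPrice"], ["Quantity"])
print(forward, backward)
```

A check like this can only refute a dependency; rows that happen to satisfy it do not prove that it holds for all future data.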

Page 23: The Relational Model – Functional Dependencies & Normalization

Functional Dependencies Are Not Equations: An Example

We can deduce the following set of functional dependencies from the above diagram:

ObjectColor ----> Weight
ObjectColor ----> Shape
ObjectColor ----> (Weight, Shape)

But does Shape functionally determine anything? (NO!)

Page 24: The Relational Model – Functional Dependencies & Normalization

Composite Determinants

Composite determinant: a determinant of a functional dependency that consists of more than one attribute.

(StudentName, ClassName) ----> (Grade)

Page 25: The Relational Model – Functional Dependencies & Normalization

Functional Dependency Rules (Not a complete list)

If A ----> (B, C), then A ----> B and A ----> C

If (A,B) ----> C, then, in general, neither A nor B determines C by itself

Page 26: The Relational Model – Functional Dependencies & Normalization

Functional Dependency Review

A functional dependency occurs when the value of one (or set of) attribute(s) determines the value of a second (or set of) attribute(s):

StudentID ----> StudentName
StudentID ----> (DormName, DormRoom, Fee)

The attribute on the left side of the functional dependency is called the determinant; the attribute on the right side is called the dependent.

Functional dependencies may be based on equations:

ExtendedPrice = Quantity X UnitPrice
(Quantity, UnitPrice) ----> ExtendedPrice

Functional dependencies are not equations

Page 27: The Relational Model – Functional Dependencies & Normalization

Composite Determinants

Composite determinant: A determinant of a functional dependency that consists of more than one attribute

Example of a Composite Determinant: (StudentName, ClassName) ----> (Grade)

Page 28: The Relational Model – Functional Dependencies & Normalization

Find the functional dependencies in the SKU_DATA Table

Ask yourself the question – if we know the value of a particular attribute, will that value determine a unique value of some other attribute? (If “yes,” then we have a functional dependency between the attributes.)

Page 29: The Relational Model – Functional Dependencies & Normalization

Functional Dependencies in the SKU_DATA Table

SKU ----> (SKU_Description, Department, Buyer)

SKU_Description ----> (SKU, Department, Buyer)

Buyer ----> Department

Page 30: The Relational Model – Functional Dependencies & Normalization

Find the functional dependencies in the ORDER_ITEM Table

Page 31: The Relational Model – Functional Dependencies & Normalization

Functional dependencies in ORDER_ITEM Table

(OrderNumber, SKU) ----> (Quantity, Price, ExtendedPrice)

Note that OrderNumber by itself does not functionally determine any other attribute.

While SKU, from the data, does appear to functionally determine Price, we always need to be very careful in making inferences from data. Prices may change in the future, and the price might often be tied to a particular order. So, we would prefer to use the composite of SKU and OrderNumber as a determinant in a functional dependency, rather than SKU by itself.

(Quantity, Price) ----> (ExtendedPrice)

Note that this is derived from the equation ExtendedPrice = Quantity * Price

Page 32: The Relational Model – Functional Dependencies & Normalization

When are determinant values unique?

A determinant has unique values (i.e., all values are different) in a relation if, and only if, it functionally determines every other attribute in the relation.

So, in SKU_DATA, SKU has all different (unique) values, and it functionally determines every attribute in the table. On the other hand, Buyer, though a determinant, does not have unique values, and does not functionally determine all the other attributes in the relation.

So, you cannot find the determinants of all functional dependencies simply by looking for unique values in one column.

Page 33: The Relational Model – Functional Dependencies & Normalization

A     B     C     D     E
a(1)  b(1)  c(1)  d(1)  e(1)
a(1)  b(1)  c(2)  d(1)  e(1)
a(2)  b(1)  c(1)  d(1)  e(1)
a(2)  b(2)  c(1)  d(2)  e(1)
a(2)  b(2)  c(2)  d(3)  e(2)

BC ----> D (True or False?)
B ----> A (True or False?)
D ----> BE (True or False?)
AB ----> C (True or False?)

Page 34: The Relational Model – Functional Dependencies & Normalization

The Answers

BC ----> D (True)
B ----> A (False: b(1) appears with both a(1) and a(2))
D ----> BE (True)
AB ----> C (False: a(1), b(1) appears with both c(1) and c(2))
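The four dependencies can also be checked mechanically against the five sample rows. The following is a small sketch; the helper name holds and the shortened value labels (a1 for a(1), and so on) are assumptions made for the example.

```python
COLUMNS = "ABCDE"
ROWS = [
    ("a1", "b1", "c1", "d1", "e1"),
    ("a1", "b1", "c2", "d1", "e1"),
    ("a2", "b1", "c1", "d1", "e1"),
    ("a2", "b2", "c1", "d2", "e1"),
    ("a2", "b2", "c2", "d3", "e2"),
]


def holds(lhs, rhs):
    """Return True if lhs ----> rhs holds in ROWS (lhs, rhs are strings of column letters)."""
    seen = {}
    for row in ROWS:
        key = tuple(row[COLUMNS.index(a)] for a in lhs)
        value = tuple(row[COLUMNS.index(a)] for a in rhs)
        # A second, different dependent value for the same key breaks the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True


for lhs, rhs in [("BC", "D"), ("B", "A"), ("D", "BE"), ("AB", "C")]:
    print(f"{lhs} ----> {rhs}: {holds(lhs, rhs)}")
```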

Page 35: The Relational Model – Functional Dependencies & Normalization

Deducing Functional Dependencies

Since BC ----> D and D ----> BE, can we conclude that BC ----> BE ?

YES! (We will call this transitivity)

If BC ----> D and BC ----> A, can we conclude that D ----> A ?

NO! Nor can we conclude A ----> D.

Page 36: The Relational Model – Functional Dependencies & Normalization

Superkeys & FD's

A superkey is an attribute or a set of attributes that identify an entity UNIQUELY.

In a relation (table), a SUPERKEY is any column or set of columns whose values can be used to distinguish one row from another.

Since a superkey identifies each item uniquely, it functionally determines all the attributes of a relation.

STUID is a superkey
SOCSEC is a superkey
STUNAME is NOT a superkey
STUID,STUNAME IS a superkey
STUID,ANY OTHER SET OF ATTRIBUTES is a superkey

Page 37: The Relational Model – Functional Dependencies & Normalization

The Formal Theory Definition Of A Superkey

A set of attributes K is a superkey of relation (table) R, if K ----> R

In other words, a superkey functionally determines all the attributes in R

Page 38: The Relational Model – Functional Dependencies & Normalization

More On Superkeys

A superkey is a candidate key if it is minimal, i.e., if X is a superkey, then X minus {any attribute of X} is NOT a superkey.

A primary key is a candidate key which we choose to be THE "key."

Page 39: The Relational Model – Functional Dependencies & Normalization

Superkeys, Candidate Keys And Primary Keys

Superkey: a set of attributes which functionally determines all of the attributes in the relation

Candidate key: from the set of superkeys, we eliminate all those superkeys which have "extra" attributes (a superkey will have an "extra" attribute if, when we remove this attribute, the resulting set of attributes is also a superkey).

Primary key: if there is more than 1 candidate key, then the candidate key we choose for THE key is called the primary key - if there is exactly 1 candidate key, then that candidate key is the primary key.

Page 40: The Relational Model – Functional Dependencies & Normalization

Example - Obtain Candidate Keys

Consider the following scheme from an airline database system: ( P (pilot), F (flight#), D (date), T (scheduled time to depart) )

We have the following FD's:
F ----> T
PDT ----> F
FD ----> P

Provide some superkeys: PDT is a superkey, and FD is a superkey.

Is PDT a candidate key? PD is not a superkey, nor is DT, nor is PT. So, PDT is a candidate key.

FD is also a candidate key, since neither F nor D is a superkey.
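The reasoning in this example can be sketched with the standard attribute-closure computation: a set of attributes is a superkey when its closure under the FDs is the whole scheme, and a candidate key when no smaller superkey is contained in it. The function names below are illustrative assumptions.

```python
from itertools import combinations

ATTRS = frozenset("PFDT")
# Each FD is (determinant, dependent), written as strings of attribute letters.
FDS = [("F", "T"), ("PDT", "F"), ("FD", "P")]


def closure(attrs, fds=FDS):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs=ATTRS, fds=FDS):
    """Return the minimal superkeys, trying smaller attribute sets first."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            is_superkey = closure(candidate, fds) == set(attrs)
            # Skip any superkey that contains an already-found smaller key.
            if is_superkey and not any(key <= candidate for key in keys):
                keys.append(candidate)
    return keys


print(closure("PD"))     # PD alone determines nothing new
print(candidate_keys())  # the minimal superkeys of the airline scheme
```

Trying every subset is exponential in the number of attributes, which is acceptable for a four-attribute teaching example but not for wide tables.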

Page 41: The Relational Model – Functional Dependencies & Normalization

Surrogate Keys

A surrogate key is an artificial attribute/column added to a relation to serve as a primary key:

Often DBMS supplied
Short, numeric and never changes – an ideal primary key!
Has artificial values that are meaningless to users
Normally hidden in forms and reports

Page 42: The Relational Model – Functional Dependencies & Normalization

Example of Surrogate Keys

(NOTE: The primary key of each relation is listed beneath it)

RENTAL_PROPERTY without surrogate key:
RENTAL_PROPERTY (Street, City, State/Province, Zip/PostalCode, Country, Rental_Rate)
Primary key: (Street, City, State/Province, Zip/PostalCode, Country)

RENTAL_PROPERTY with surrogate key:
RENTAL_PROPERTY (PropertyID, Street, City, State/Province, Zip/PostalCode, Country, Rental_Rate)
Primary key: PropertyID

Page 43: The Relational Model – Functional Dependencies & Normalization

Trivial FD's

A functional dependency is defined to be trivial if it is satisfied by every relation.

Example of a trivial functional dependency: AB ----> A is satisfied by every relation involving A and B.

Page 44: The Relational Model – Functional Dependencies & Normalization

Trivial FD's

Generalization and rule for trivial FD's:

An FD is trivial if it has the form: X ----> Y, where Y is a subset of X.

So, ABCD ----> ABC is a trivial FD.

A trivial FD does not make a significant statement about real world constraints - we are thus only interested in non-trivial FD's.
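The subset rule translates directly into a one-line test. In this sketch the function name is_trivial is an assumption, and attribute sets are written as strings of single-letter attribute names.

```python
def is_trivial(lhs, rhs):
    """An FD lhs ----> rhs is trivial when every attribute of rhs already appears in lhs."""
    return set(rhs) <= set(lhs)


print(is_trivial("AB", "A"))      # the example AB ----> A
print(is_trivial("ABCD", "ABC"))  # the example ABCD ----> ABC
print(is_trivial("A", "AB"))      # B is not part of the left side
```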

Page 45: The Relational Model – Functional Dependencies & Normalization

Another FD “Rule”

If (A,B) ----> C, then, in general, neither A nor B by itself will functionally determine C.

Page 46: The Relational Model – Functional Dependencies & Normalization

Normal Forms

There are numerous "normal forms" which are categorizations based upon the kinds of “problems” that relations have.

These will be discussed:

First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)

Page 47: The Relational Model – Functional Dependencies & Normalization

FIRST NORMAL FORM

A relation is in first normal form (1NF) iff every attribute in every row can contain only a single value. A 1NF relation cannot have any row that contains a repeating grouping of attribute values.

Page 48: The Relational Model – Functional Dependencies & Normalization

Example Of A Relation Not In 1NF

Ordnumb  Orddte  Partnumb  Numbord
12489    30109   AX12      11
12491    30209   BT04      1
                 BZ66      1
12495    30409   CX11      2

We can convert the above table to 1NF by flattening:

Ordnumb  Orddte  Partnumb  Numbord
12489    30109   AX12      11
12491    30209   BT04      1
12491    30209   BZ66      1
12495    30409   CX11      2
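Flattening can be sketched as repeating the order-level values once per part. The nested representation below (a Parts list inside each order) is an assumed encoding of the non-1NF table, chosen only for illustration.

```python
# Non-1NF form: order 12491 holds a repeating group of two parts.
orders = [
    {"Ordnumb": 12489, "Orddte": 30109, "Parts": [("AX12", 11)]},
    {"Ordnumb": 12491, "Orddte": 30209, "Parts": [("BT04", 1), ("BZ66", 1)]},
    {"Ordnumb": 12495, "Orddte": 30409, "Parts": [("CX11", 2)]},
]

# 1NF form: one row per (order, part), with the order columns repeated.
flat = [
    (order["Ordnumb"], order["Orddte"], partnumb, numbord)
    for order in orders
    for partnumb, numbord in order["Parts"]
]

for row in flat:
    print(row)
```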

Page 49: The Relational Model – Functional Dependencies & Normalization

Second Normal Form

Definition: an attribute is a non-key attribute if it is not a part of the primary key

Definition: A relation is in second normal form (2NF) if it is in first normal form and no non-key attribute is dependent on only a portion of the primary key (when the primary key is composite - consisting of 2 or more attributes)

Page 50: The Relational Model – Functional Dependencies & Normalization

Example Of A Relation In 1NF, But Not 2NF

Ordnumb  Orddte  Partnumb  PartDesc  Numbord  Quoprice
12489    90509   AX12      MOUSE     11       14.95
12491    90509   BT04      DRV270G   1        120.99
12491    90509   BZ66      DRV180G   1        80.95
12495    90709   AX12      MOUSE     4        14.95

The following FD's hold on this relation:
Ordnumb ----> Orddte
Partnumb ----> PartDesc
Ordnumb, Partnumb ----> Numbord, Quoprice

The relation is NOT in 2NF because PartDesc is dependent on only a portion of the primary key, and similarly for Orddte.

Page 51: The Relational Model – Functional Dependencies & Normalization

Transform Relation To 2NF

First, take each subset of the set of attributes which make up the primary key, and begin a relation with this subset as its primary key:

(Ordnumb)
(Partnumb)
(Ordnumb, Partnumb)

Then, place each of the other attributes with the appropriate primary key, i.e., place each one with the minimal collection on which it depends:

(Ordnumb, Orddte)
(Partnumb, PartDesc)
(Ordnumb, Partnumb, Numbord, Quoprice)
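Applied to the four sample rows of the order relation, the three resulting relations are simply projections with duplicates removed. This is a sketch; the helper project and the variable names are assumptions.

```python
COLUMNS = ("Ordnumb", "Orddte", "Partnumb", "PartDesc", "Numbord", "Quoprice")
ROWS = [
    (12489, 90509, "AX12", "MOUSE", 11, 14.95),
    (12491, 90509, "BT04", "DRV270G", 1, 120.99),
    (12491, 90509, "BZ66", "DRV180G", 1, 80.95),
    (12495, 90709, "AX12", "MOUSE", 4, 14.95),
]


def project(columns):
    """Keep only the named columns and drop duplicate rows."""
    positions = [COLUMNS.index(c) for c in columns]
    return sorted({tuple(row[i] for i in positions) for row in ROWS})


orders = project(("Ordnumb", "Orddte"))
parts = project(("Partnumb", "PartDesc"))
order_lines = project(("Ordnumb", "Partnumb", "Numbord", "Quoprice"))

print(orders)
print(parts)       # MOUSE is now stored once instead of twice
print(order_lines)
```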

Page 52: The Relational Model – Functional Dependencies & Normalization

Third Normal Form

A relation is in Third Normal Form (3NF) iff it is in Second Normal Form and there is no non-key attribute which is functionally dependent upon another non-key attribute in any functional dependency

("each non-key attribute must depend upon the key, the whole key, and nothing but the key")

Page 53: The Relational Model – Functional Dependencies & Normalization

Example Of Relation In 2NF, But Not 3NF

Consider STUDENT (STUID, STUNAME, MAJOR, CREDITS, FSJS) with the following FD's:
Stuid ----> Stuname, Major, Credits, FSJS
Credits ----> FSJS

Since attribute FSJS depends on Credits, STUDENT is not in 3NF.

To create 3NF here, form a new relation (STATS) with the functionally dependent attribute and its determinant:

R1: STU2 (Stuid, Stuname, Major, Credits)
R2: STATS (Credits, FSJS)

Page 54: The Relational Model – Functional Dependencies & Normalization

Boyce-Codd Normal Form (BCNF)

Reminder: a determinant is an attribute (or collection of attributes) that functionally determines another attribute (or set of attributes), i.e., it is the LHS of a functional dependency

Example: in sosec ---------> stuname, sosec is a determinant

Def.: A relation is in Boyce-Codd normal form if every determinant is a candidate key

Page 55: The Relational Model – Functional Dependencies & Normalization

Another Example Of 2NF Relation (Not In 3NF And Not In BCNF)

GIVEN: PC (TAGNUM, COMPID, EMPNUM,EMPNAME,LOCATION)

and given the following functional dependencies:

FD1: TAGNUM ----> COMPID, EMPNUM, EMPNAME, LOCATION

FD2: EMPNUM ----> EMPNAME

Page 56: The Relational Model – Functional Dependencies & Normalization

This Relation Satisfies 2NF, But Not 3NF Or BCNF

TAGNUM  COMPID  EMPNUM  EMPNAME       LOCATION
32808   M759    611     DINH, M.      ACCOUNTING
37691   B121    124     ALVAREZ, R    SALES
57772   C007    567     FEINSTEIN, B  INFO SYSTEMS
59836   B221    124     ALVAREZ, R    HOME
77740   M759    567     FEINSTEIN, B  HOME

Page 57: The Relational Model – Functional Dependencies & Normalization

Some Anomalies Present In This Relation

UPDATE: If Betty Feinstein gets married, must change more than 1 record

INCONSISTENT DATA: Potential problem due to redundancy

ADDITIONS: New employee 347 cannot be added until a pc is assigned

Page 58: The Relational Model – Functional Dependencies & Normalization

Why Is The PC Relation Not In 3NF Or Boyce Codd Normal Form?

1) It is in 2NF (there is no non-key attribute dependent on only a portion of the primary key, since the primary key consists of only 1 attribute)

2) The primary key is TAGNUM.

3) The only candidate key is TAGNUM.

4) There are 2 determinants - TAGNUM and EMPNUM.

5) Since EMPNUM is a determinant but not a candidate key, the relation is not in BCNF. It is not in 3NF either, because the non-key attribute EMPNAME is functionally dependent on the non-key attribute EMPNUM.
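The argument above can be sketched as a mechanical test: compute the closure of each determinant and flag any determinant whose closure is not the whole relation. The sketch tests for superkeys, which for the single-attribute determinants here coincides with being a candidate key. Function names are illustrative assumptions.

```python
ATTRS = {"TAGNUM", "COMPID", "EMPNUM", "EMPNAME", "LOCATION"}
# FD1 and FD2 for the PC relation, as (determinant, dependents).
FDS = [
    ({"TAGNUM"}, {"COMPID", "EMPNUM", "EMPNAME", "LOCATION"}),
    ({"EMPNUM"}, {"EMPNAME"}),
]


def closure(attrs):
    """Return every attribute functionally determined by attrs under FDS."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in FDS:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations():
    """Return the FDs whose determinant does not determine the whole relation."""
    return [(lhs, rhs) for lhs, rhs in FDS if closure(lhs) != ATTRS]


print(bcnf_violations())
```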

Page 59: The Relational Model – Functional Dependencies & Normalization

Changing Our PC Relation To 3NF

PC (TAGNUM, COMPID, EMPNUM, EMPNAME, LOCATION) is replaced by

PC (TAGNUM, COMPID, EMPNUM, LOCATION)

and

EMPLOYEE (EMPNUM, EMPNAME)

Page 60: The Relational Model – Functional Dependencies & Normalization

Transforming A 3NF Relation To BCNF

1) For each determinant that is not a candidate key, remove from the relation the attributes which are functionally determined by this determinant.

2) Create a new table containing the determinant together with all the attributes from the original relation which were functionally determined by this determinant.

3) Make the determinant the primary key of this new relation.
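A minimal sketch of these three steps, applied to the PC relation and its offending dependency EMPNUM ----> EMPNAME, might look as follows. The function split_on is a hypothetical helper that handles one offending FD at a time; it is not a complete BCNF decomposition algorithm.

```python
def split_on(relation, determinant, dependents):
    """Apply the three steps to one FD (determinant ----> dependents).

    relation and determinant are lists of attribute names; dependents is a set.
    """
    # Step 1: remove the determined attributes from the original relation.
    remaining = [a for a in relation if a not in dependents]
    # Step 2: the new relation holds the determinant plus what it determines.
    new_relation = list(determinant) + [a for a in relation if a in dependents]
    # Step 3: the determinant becomes the primary key of the new relation.
    primary_key = tuple(determinant)
    return remaining, new_relation, primary_key


pc = ["TAGNUM", "COMPID", "EMPNUM", "EMPNAME", "LOCATION"]
pc2, employee, employee_pk = split_on(pc, ["EMPNUM"], {"EMPNAME"})

print(pc2)
print(employee, employee_pk)
```

The determinant stays behind in the original relation, where it acts as a foreign key to the new one.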

Page 61: The Relational Model – Functional Dependencies & Normalization

Important Points

A relation in 3NF may or may not be in Boyce Codd Normal Form

BUT, a relation in Boyce Codd Normal Form will ALWAYS be in 3NF.

{Some textbooks consider Boyce Codd Normal Form to be "the" third Normal Form. Ours does not. }

Page 62: The Relational Model – Functional Dependencies & Normalization

Example of a relation in 3NF which is NOT in BCNF

SID  MAJOR       FACNAME
100  Math        Cauchy
150  Psychology  Jung
200  Math        Riemann
250  Math        Cauchy
300  Psychology  Perls
300  Math        Riemann

Suppose that, in a given university:
1. Students may have one or more majors.
2. A major may have several faculty members as advisers.
3. A faculty member can advise in only one major area.

Page 63: The Relational Model – Functional Dependencies & Normalization

Things to note from this example

The primary key is not SID !!

The primary key consists of two attributes: SID and MAJOR.

There is an important functional dependency corresponding to the statement "A Faculty member can advise students in only one major area." FACNAME -----> MAJOR

The relation IS in 2NF, since there are no non-key attributes dependent on only a portion of the primary key.

The relation is in 3NF, but NOT in BCNF.
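These observations can be checked against the six sample rows of the ADVISOR relation. The helper fd_holds below is an illustrative assumption; it reports whether a dependency is consistent with the listed data.

```python
COLUMNS = ("SID", "MAJOR", "FACNAME")
ADVISOR = [
    (100, "Math", "Cauchy"),
    (150, "Psychology", "Jung"),
    (200, "Math", "Riemann"),
    (250, "Math", "Cauchy"),
    (300, "Psychology", "Perls"),
    (300, "Math", "Riemann"),
]


def fd_holds(lhs, rhs):
    """Return True if lhs ----> rhs holds in ADVISOR (lhs, rhs are tuples of column names)."""
    seen = {}
    for row in ADVISOR:
        key = tuple(row[COLUMNS.index(c)] for c in lhs)
        value = tuple(row[COLUMNS.index(c)] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


print(fd_holds(("SID",), ("MAJOR",)))          # student 300 has two majors
print(fd_holds(("SID", "MAJOR"), ("FACNAME",)))  # the composite key determines the adviser
print(fd_holds(("FACNAME",), ("MAJOR",)))      # each adviser works in one major
```

FACNAME is a determinant that does not identify a row on its own, which is exactly the situation that BCNF rules out.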

Page 64: The Relational Model – Functional Dependencies & Normalization

The ADVISOR relation transformed to Boyce Codd Normal Form

STU-ADV (SID, FACNAME)

SID  FACNAME
100  Cauchy
150  Jung
200  Riemann
250  Cauchy
300  Perls
300  Riemann

ADV-MAJOR (FACNAME, MAJOR)

FACNAME  MAJOR
Cauchy   Math
Jung     Psychology
Riemann  Math
Perls    Psychology

Page 65: The Relational Model – Functional Dependencies & Normalization

Going Directly to BCNF

Page 66: The Relational Model – Functional Dependencies & Normalization

Example 1 of Going Directly to BCNF: The SKU_DATA Table

Page 67: The Relational Model – Functional Dependencies & Normalization

Working Through The Example

SKU_DATA (SKU, SKU_Description, Department, Buyer)

Identify the FDs:
a) SKU ----> (SKU_Description, Department, Buyer)
b) SKU_Description ----> (SKU, Department, Buyer)
c) Buyer ----> Department

SKU and SKU_Description are candidate keys, Buyer is NOT a candidate key, so SKU_DATA is not in BCNF. Placing the columns of the problem FD (c) into a separate relation, with the determinant Buyer as the primary key, and making Buyer a foreign key in the SKU_DATA2 relation, we obtain:

SKU_DATA2 (SKU, SKU_Description, Buyer)
BUYER (Buyer, Department)

Where SKU_DATA2.Buyer must exist in BUYER.Buyer

Page 68: The Relational Model – Functional Dependencies & Normalization

The Resulting Populated SKU_DATA2 and BUYER Relations, in BCNF

Page 69: The Relational Model – Functional Dependencies & Normalization

Example 2 of Going Directly to BCNF: The EQUIPMENT_REPAIR Table

Page 70: The Relational Model – Functional Dependencies & Normalization

Working Through The Example

EQUIPMENT_REPAIR (ItemNumber, Type, AcquisitionCost, RepairNumber, RepairDate, RepairAmount)

Identify the FDs:
a) ItemNumber ----> (Type, AcquisitionCost)
b) RepairNumber ----> (ItemNumber, Type, AcquisitionCost, RepairDate, RepairAmount)

RepairNumber is a candidate key, ItemNumber is NOT a candidate key, so EQUIPMENT_REPAIR is not in BCNF. Placing the columns of the problem FD (a) into a separate relation, with the determinant ItemNumber as the primary key, and making ItemNumber a foreign key in the REPAIR relation, we obtain:

ITEM (ItemNumber, Type, AcquisitionCost)
REPAIR (RepairNumber, RepairDate, RepairAmount, ItemNumber)

Where REPAIR.ItemNumber must exist in ITEM.ItemNumber
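The referential integrity constraint between REPAIR and ITEM can be sketched with Python's built-in sqlite3 module. The column types and all inserted values below are hypothetical, chosen only to show a repair for a nonexistent item being rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE ITEM (
        ItemNumber      INTEGER PRIMARY KEY,
        Type            TEXT,
        AcquisitionCost REAL
    )""")
conn.execute("""
    CREATE TABLE REPAIR (
        RepairNumber INTEGER PRIMARY KEY,
        RepairDate   TEXT,
        RepairAmount REAL,
        ItemNumber   INTEGER NOT NULL REFERENCES ITEM (ItemNumber)
    )""")

# Hypothetical rows: one item and one repair that refers to it.
conn.execute("INSERT INTO ITEM VALUES (100, 'Drill Press', 3500.0)")
conn.execute("INSERT INTO REPAIR VALUES (2000, '2011-05-05', 375.0, 100)")

# A repair for item 999, which does not exist in ITEM, should be rejected.
try:
    conn.execute("INSERT INTO REPAIR VALUES (2001, '2011-05-07', 50.0, 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

repair_count = conn.execute("SELECT COUNT(*) FROM REPAIR").fetchone()[0]
print(rejected, repair_count)
```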

Page 71: The Relational Model – Functional Dependencies & Normalization

The Resulting Populated REPAIR and ITEM Relations, in BCNF

Page 72: The Relational Model – Functional Dependencies & Normalization

SUMMARY OF NORMAL FORMS WE HAVE COVERED

1NF – A table that qualifies as a relation is in 1NF

2NF – A relation is in 2NF if all of its nonkey attributes are dependent on all of the primary key

3NF – A relation is in 3NF if it is in 2NF and there is no non-key attribute which is functionally dependent upon another non-key attribute in any functional dependency, or, equivalently, there are no determinants except the primary key (or, equivalently, there are no transitive dependencies {i.e., there are no FDs where A ----> B and B ----> C})

Boyce-Codd Normal Form (BCNF) – A relation is in BCNF if every determinant is a candidate key