The Relational Model – Functional
Dependencies & Normalization
Objectives Optimal Database Design
Selection Of Appropriate Relations/Tables For A Given Set Of Attributes
Minimize Update AnomaliesRedundancyUpdateInconsistent DataAdditionsDeletions
Definition of Anomaly
Something that deviates from our expectations
ExampleCUSTNUMB CUSTNAME CUSTADDR SNUMB SLSRNAME
123 Jones, R. 19 Oak St. 3 Adams, M.
456 Lan, J. 4 Pine St. 6 Smith, R.
461 Chu, W. 22 Main St. 12 Brown, M.
489 Obie, S. 76 High St. 6 Smith, R.
514 Wise, R 17 Birch St. 3 Adams, M.
... ... ... ... ...
999 Side, E. 87 Bay St. 12 Brown, N.
Specific Anomalies In This Relation
RedundancyWhy repeat the Sales Rep Name for Adams in each record? Suppose Adams has 500 customers? That means 500 times you repeat Adams’ name!
Update Suppose Slsr Mary Adams marries and changes her name? How many rows do we need to update?
Inconsistent dataNotice Brown's first initial varies : M, N
AdditionsNew Slsr J. Doe can't be entered until he has a customer
DeletionDelete all customers of Adams, and we lose the name of the salesrep Adams
Decomposition Of RelationsThe previous table can be decomposed into the following
two tablesCUSTNUMB CUSTNAME CUSTADDR SNUMB
123 Jones, R. 19 Oak St. 3456 Lan, J. 4 Pine St. 6461 Chu, W. 22 Main St. 12489 Obie, S. 76 High St. 6514 Wise, R 17 Birch St. 3... ... ... ...
999 Side, E. 87 Bay St. 12
SNUMB SLSRNAME3 Adams, M.
6 Smith, R.
12 Brown, M.
Notice That This Decomposition Resolved All Database Anomalies
REDUNDANCYNONE EXISTS
UPDATEJUST CHANGE MARY ADAMS' LAST NAME (ONCE) IN salesrep relation
INCONSISTENT DATAIMPOSSIBLE - M. BROWN'S NAME APPEARS ONLY ONCE!
ADDITIONSADD NEW SLSR J. DOE TO salesrep relation
DELETIONSWE CAN DELETE ALL OF ADAMS' CUSTOMERS AND STILL HAVE ADAMS IN salesrep
Conceptual Tools Needed For Decomposition
Functional DependenciesLossless Join DecompositionNormal Forms
Functional DependenciesCommon Issue in Designing a New Database From Existing Data
We have obtained one or more tables of existing data (such as from a spreadsheet or extracts from an existing corporate database).
The data is to be stored in a new database.
DATABASE DESIGN QUESTION: Should the data be stored as received, or should it be transformed for storage?
Should We Combine ORDER_ITEM and SKU_DATA into One Table (SKU_DATA)?
Should we store these two tables as they are, or should we combine them into one table in our new database?
But First—
We need to understand:The relational modelRelational model terminology
The Relational Model
Introduced in 1970
Created by E.F. CoddHe was an IBM engineerThe model used mathematics known as
“relational algebra”
Now the standard model for commercial DBMS products.
Important Relational Model Terms
EntityRelationFunctional DependencyDeterminantCandidate KeyComposite KeyPrimary KeySurrogate KeyForeign KeyReferential integrity constraintNormal FormMultivalued Dependency (new for us)
Entity
An entity is some identifiable thing that users want to track:CustomersComputersSales
RelationsA relation is a two-dimensional table that has the following characteristics: Rows contain data about an entity. Columns contain data about attributes of
the entity. All entries in a column are of the same kind. Each column has a unique name. Cells of the table hold a single value. The order of the columns is unimportant. The order of the rows is unimportant. No two rows may be identical
A Typical Relation
Tables That Are Not Relations:Multiple Entries per Cell
Tables That Are Not Relations:Table with Required Row
Order
A Valid Relation with Values of Different Length
An INVALID relation (Cells in a valid relation are supposed to hold a single value, but the Phone “cell” for Employees 400 and 700 have multiple phone numbers)
Alternative TerminologyAlthough not all tables are relations, as we have seen on the previous slides, the terms table and relation are generally used interchangeably.The following sets of terms are equivalent:
Functional Dependency
A functional dependency occurs when the value of one (set of) attribute(s) determines the value of a second (set of) attribute(s):
StudentID StudentName
StudentID (DormName, DormRoom, Fee)
The attribute on the left side of the functional dependency is called the determinant.
Functional dependencies may be based on equations:ExtendedPrice = Quantity X UnitPrice
(Quantity, UnitPrice) ExtendedPrice
But, function dependencies are definitely not equations!
Functional Dependencies Are Not Equations: An Example
ObjectColor Weight ObjectColor Shape ObjectColor (Weight, Shape)
We can deduce the following set of Functional Dependencies from the above diagram
But, does Shape functionally determine anything? (NO!)
Composite DeterminantsComposite determinant: a determinant of a functional dependency that consists of more than one attribute.
(StudentName, ClassName) (Grade)
Functional Dependency Rules(Not a complete list)
If A (B, C), then A B and A C
If (A,B) C, then neither A nor B determines C by itself
Functional Dependency Review
A functional dependency occurs when the value of one (or set of) attribute(s) determines the value of a second (or set of) attribute(s):
StudentID StudentNameStudentID (DormName, DormRoom, Fee)
The attribute on the left side of the functional dependency is called the determinant, the attribute on the right side is called the dependent.Functional dependencies may be based on equations:
ExtendedPrice = Quantity X UnitPrice(Quantity, UnitPrice) ExtendedPrice
Function dependencies are not equations
Composite Determinants
Composite determinant: A determinant of a functional dependency that consists of more than one attribute
Example of a Composite Determinant: (StudentName, ClassName) (Grade)
Find the functional dependencies in the SKU_DATA Table
Ask yourself the question – if we know the value of a particular attribute, will that value determine a unique value of some other attribute? (If “yes,” then we have a functional dependency between the attributes.)
Functional Dependencies in the SKU_DATA Table
SKU (SKU_Description, Department, Buyer)
SKU_Description (SKU, Department, Buyer)
Buyer Department
Find the functional dependencies in the ORDER_ITEM Table
Functional dependencies in ORDER_ITEM Table
(OrderNumber, SKU) (Quantity, Price, ExtendedPrice) Note that OderNumber by itself does not functionally
determine any other attribute While SKU, from the data, does appear to functionally
determine Price, we always need to be very careful in making inferences from data. Prices may change in the future, and the price might often be tied to a particular order. So, we would prefer to use the composite of SKU and OrderNumber as a determinant in a functional dependency, rather than SKU by itself.
(Quantity, Price) (ExtendedPrice) Note that this is derived from the equation
ExtendedPrice = Quantity * Price
When are determinant values unique?A determinant has unique values (i.e., all values are different) in a relation if, and only if, it functionally determines every other attribute in the relation So, in SKU_Data, SKU has all different (unique) values,
and it functionally determines every attribute in the table. On the other hand, Buyer, though a determinant, does not have unique values, and does not functionally determine all the other attributes in the relation.
So, you cannot find the determinants of all functional dependencies simply by looking for unique values in one column
A B C D Ea(1) b(1) c(1) d(1) e(1)a(1) b(1) c(2) d(1) e(1)a(2) b(1) c(1) d(1) e(1)a(2) b(2) c(1) d(2) e(1)a(2) b(2) c(2) d(3) e(2) BC ----> D (True or False?)
B ----> A (True or False?) D ----> BE (True or False?) AB ----> C (True or False?)
The Answers
BC ----> D (True or False?) B ----> A (True or False?) D ----> BE (True or False?) AB ----> C (True or False?)
Deducing Functional Dependencies
Since BC ----> D and D ----> BE, can we conclude that BC ----> BE ?
YES! (We will call this transitivity)
If BC ----> D and BC ----> A, can we conclude that D ----> A ?
NO! Nor can we conclude A ----> D.
Superkeys & FD's A superkey is an attribute or a set of attributes that identify
an entity UNIQUELY. In a relation (table), a SUPERKEY is any column or set of
columns whose values can be used to distinguish onerow from another.
Since a superkey identifies each item uniquely, it functionally determines all the attributes of a relation. STUID is a superkey SOCSEC is a superkey STUNAME is NOT a superkey STUID,STUNAME IS a superkey STUID,ANY OTHER SET OF ATTRIBUTES is a superkey
The Formal Theory Definition Of A Superkey
A set of attributes K is a superkey of relation (table) R, if K ----> R
In other words, a superkey functionally determines all the attributes in R
More On SuperkeysA superkey is a candidate key if it is minimal, i.e., if X is a superkey, then X minus {any attribute of X} is NOT a superkey.
A primary key is a candidate key which we choose to be THE "key."
Superkeys, Candidate Keys And Primary Keys
Superkey: a set of attributes which functionally determines all of the attributes in the relation
Candidate key:from the set of superkeys, we eliminate all those superkeys which have "extra" attributes (a superkey will have an "extra" attribute if, when we remove this attribute, the resulting set of attributes is also a superkey).
Primary key: if there is more than 1 candidate key, then the candidate key we choose for THE key is called the primary key - if there is exactly 1 candidate key, then that candidate key is the primary key.
Example - Obtain Candidate KeysConsider the following scheme from an airline database system:( P(pilot) , F(flight# ), D(date), T (scheduled time to depart) )
We have the following FD's : F ----> T PDT ----> F FD ----> P
Provide some superkeys: PDT is a superkey, and FD is a superkey. Is PDT a candidate key?
PD is not a superkey, nor is DT, nor is PT. So, PDT is a candidate key.
FD is also a candidate key, since neither F or D are superkeys.
Surrogate Keys
A surrogate key is an artificial attribute/column added to a relation to serve as a primary key:Often DBMS suppliedShort, numeric and never changes – an
ideal primary key!Has artificial values that are meaningless
to usersNormally hidden in forms and reports
Example of Surrogate Keys(NOTE: The primary key of the relation is underlined below)
RENTAL_PROPERTY without surrogate key:RENTAL_PROPERTY (Street, City,State/Province, Zip/PostalCode, Country, Rental_Rate)
RENTAL_PROPERTY with surrogate key: RENTAL_PROPERTY (PropertyID, Street, City,
State/Province, Zip/PostalCode, Country, Rental_Rate
Trivial FD'sA functional dependency is defined to be trivial if it is satisfied by every relation
Example of a trivial functional dependency: AB ----> A is satisfied by every
relation involving A.
Trivial Fd'sGeneralization and rule for trivial FD's:
An FD is trivial if it has the form: X ----> Y, where Y is a subset of X.
So, ABCD ----> ABC is a trivial FD.
A trivial FD does not make a significant statement about real world constraints - we are thus only interested in non-trivial FD's.
Another FD “Rule”
If (A,B) C, then neither A nor B by itself will functionally determine C.
Normal Forms There are numerous "normal forms" which
are categorizations based upon the kinds of “problems” that relations have.
These will be discussed:
First Normal Form (1NF)Second Normal Form (2NF)Third Normal Form (3NF)Boyce-Codd Normal Form (BCNF)
FIRST NORMAL FORM
A relation is in first normal form (1NF) iff every attribute in every row can contain only a single value. A 1NF relation cannot have any row that contains a repeating grouping of attribute values.
Example Of A Relation Not In 1NF
Ordnumb Orddte Partnumb Numbord12489 30109 AX12 1112491 30209 BT04 1 BZ66 112495 30409 CX11 2
*We can convert the above table to 1NF by flattening *
Ordnumb Orddte Partnumb Numbord12489 30109 AX12 1112491 30209 BT04 112491 30209 BZ66 112495 30409 CX11 2
Second Normal Form
Definition: an attribute is a non-key attribute if it is not a part of the primary key
Definition: A relation is in second normal form (2NF) if it is in first normal form and no non-key attribute is dependent on only a portion of the primary key (when the primary key is composite - consisting of 2 or more attributes)
Example Of A Relation In 1NF,
But Not 2NFOrdnumb Orddte Partnumb PartDesc Numbord Quoprice12489 90509 AX12 MOUSE 11 14.9512491 90509 BT04 DRV270G 1 120.9912491 90509 BZ66 DRV180G 1 80.9512495 90709 AX12 MOUSE 4 14.95
*****The following FD's hold on this relation*******Ordnumb ----> OrddtePartnumb ---> PartDescOrdnumb, Partnumb ----> Numbord, Quoprice
******The relation is NOT in 2NF because ...*********PartDesc is dependent on only a portion of primary key,
and similarly for Orddte
Transform Relation To 2NF First, take each subset of the set of attributes which make up
the primary key, and begin a relation with this subset as its primary key
(Ordnumb)(Partnumb)(Ordnumb, Partnumb)
Then, place each of the other attributes with the appropriate primary key, i.e., place each one with the minimal collection on which it depends
(Ordnumb, Orddte)(Partnumb, Partdesc)(Ordnumb, Partnumb, Numbord, Quoteprice)
Third Normal Form A relation is in Third Normal Form (3NF) iff it is
in Second Normal Form and there is no non-key attribute which is functionally dependent upon another non-key attribute in any functional dependency
("each non-key attribute must depend upon the key, the whole key, and nothing but the key")
Example Of Relation In 2NF, But Not 3NF Consider STUDENT(STUID, STUNAME, MAJOR,
CREDITS, FSJS) with the following FD's:Stuid ----> Stuname, Major, Credits, FSJSCredits---> FSJS
Since attribute FSJS depends on credits, student is not in 3NF
To create 3NF here, form a new relation (STATS) with the functionally dependent attribute and its determinant
STU2 ( Stuid, Stuname, Major, Credits) R1
STATS ( Credits, FSJS ) R2
Boyce-Codd Normal Form (BCNF) Reminder: a determinant is an attribute (or
collection of attributes) that functionally determines another attribute (or set of attributes), i.e., it is the LHS of a functional dependency
Example: in sosec ---------> stuname, sosec is a determinant
Def.: A relation is in Boyce-Codd normal form if every determinant is a candidate key
Another Example Of 2NF Relation (Not In 3NF And Not In BCNF)
GIVEN: PC (TAGNUM, COMPID, EMPNUM,EMPNAME,LOCATION)
and given the following functional dependencies: FD1: TAGNUM ---->COMPID,EMPNUM,EMPNAME,LOCATION.
FD2: EMPNUM-----> EMPNAME
This Relation Satisfies 2NF, But Not 3NF Or BCNF
TAGNUM COMPID EMPNUM EMPNAMELOCATION
32808 M759 611 DINH, M. ACCOUNTING
37691 B121 124 ALVAREZ, R SALES
57772 C007 567 FEINSTEIN, BINFO
SYSTEMS
59836 B221 124 ALVAREZ, R HOME
77740 M759 567 FEINSTEIN, B HOME
Some Anomalies Present In This Relation
UPDATE: If Betty Feinstein gets married, must change more than 1 record
INCONSISTENT DATA: Potential problem due to redundancy
ADDITIONS: New employee 347 cannot be added until a pc is assigned
Why Is The PC Relation Not In 3NF Or Boyce Codd Normal Form?
1) It is in 2NF (there is no non-key attribute dependent on only a portion of the primary key, since the primary key consists of only 1 attribute)
2) The primary key is TAGNUM.
3) The only candidate key is TAGNUM.
4) There are 2 determinants - TAGNUM AND EMPNUM.5) Since EMPNUM is a determinant but not a candidate key, the relation is not in BCNF. And it's not in 3NF either.
Changing Our PC Relation To 3NF
PC (TAGNUM, COMPID, EMPNUM, EMPNAME, LOCATION) is replaced by
PC (TAGNUM, COMPID, EMPNUM, LOCATION)
and
EMPLOYEE (EMPNUM, EMPNAME)
Transforming A 3NF Relation To BCNF
1) For each determinant that is not a candidate key, remove from the relation the attributes which are functionally determined by this determinant.
2) Create a new table containing all the attributes from the original relation which were functionally determined by this determinant.
3) Make the determinant the primary key of this new relation.
Important PointsA relation in 3NF may or may not be in
Boyce Codd Normal Form
BUT, a relation in Boyce Codd Normal Form will ALWAYS be in 3NF.
{Some textbooks consider Boyce Codd Normal Form to be "the" third Normal Form. Ours does not. }
Example of a relation in 3NF which is NOT in BCNF
SID MAJOR FACNAME100 Math Cauchy150 Psychology Jung200 Math Riemann250 Math Cauchy300 Psychology Perls300 Math Riemann
Suppose that, in a given university:1. Students may have one or more majors.2. A major may have several faculty members as as advisers.3. A faculty member can advise in only one major area.
Things to note from this example
The primary key is not SID !!
The primary key consists of two attributes: SID and MAJOR.
There is an important functional dependency corresponding to the statement "A Faculty member can advise students in only one major area." FACNAME -----> MAJOR
The relation IS in 2NF, since there are no non-key attributes dependent on only a portion of the primary key.
The relation is in 3NF, but NOT in BCNF.
The ADVISOR relation transformed to Boyce Codd
Normal Form
SID FACNAME
100 Cauchy
150 Jung
200 Riemann
250 Cauchy
300 Perls
300 Riemann
STU-ADV(SID, FACNAME)
FACNAME MAJOR
Cauchy MathJung Psychology
Riemann MathPerls Psychology
ADV-MAJOR(FACNAME, Major)
Going Directly to BCNF
Example 1 of Going Directly to BCNFThe SKU_DATA TABLE
Working Through The Example
SKU_DATA (SKU, SKU_Description, Department, Buyer)Identify the FDs:
a) SKU (SKU_Description, Department, Buyer)b) SKU_Description (SKU, Department, Buyer)c) Buyer Department
SKU and SKU_Description are candidate keys, Buyer is NOT a candidate key, so SKU_DATA is not in BCNF. Placing the columns of the problem FD (c) into a separate relation, with the determinant Buyer as the primary key, and making Buyer a foreign key in the SKU_DATA relation, we obtain:
SKU_DATA2 (SKU, SKU_Description, Buyer)BUYER (Buyer, Department)
Where BUYER.Buyer must exist in SKU_DATA2.Buyer
The Resulting Populated SKU_DATA2 and BUYER Relations, in BCNF
Example 2 of Going Directly to BCNFThe EQUIPMENT_REPAIR table
Working Through The ExampleEQUIPMENT_REPAIR (ItemNumber, Type, AcquisitionCost,
RepairNumber, RepairDate, RepairAmount)Identify the FDs:
a) ItemNumber (Type, AcquisitionCost)b) RepairNumber (ItemNumber, Type, AcquisitionCost,
RepairDate, RepairAmount)
RepairNumber is a candidate key, ItemNumber is NOT a candidate key, so EQUIPMENT_REPAIR is not in BCNF. Placing the columns of the problem FD (a) into a separate relation, with the determinant ItemNumber as the primary key, and making ItemNumber a foreign key in the REPAIR relation, we obtain:
ITEM (ItemNumber, Type, AcquisitionCost)REPAIR (RepairNumber, RepairDate, RepairAmount, ItemNumber, )
Where REPAIR.ItemNumber must exist in ITEM.ItemNumber
The Resulting Populated REPAIR and ITEM Relations, in BCNF
SUMMARY OF NORMAL FORMS WE HAVE COVERED
1NF – A table that qualifies as a relation is in 1NF
2NF – A relation is in 2NF if all of its nonkey attributes are dependent on all of the primary key
3NF – A relation is in 3NF if it is in 2NF and there is no non-key attribute which is functionally dependent upon another non-key attribute in any functional dependency, or, equivalently, there are no determinants except the primary key, (or, equivalently, there are no transitive dependencies {i.e., there are no FDs where A B and B C} )
Boyce-Codd Normal Form (BCNF) – A relation is in BCNF if every determinant is a candidate key
Top Related