CSC 240 (Blum) 1

More on normalization

Normalization Review

In the normalization approach, one has data in a table-like format and begins to rearrange the data into smaller tables that minimize the data redundancy without losing any of the information in the relationships.

This decomposition can be done in steps, each step called a Normal Form having more stringent conditions than the previous.

Normalization Starting Point: Un-normalized table

The first step in normalization is to eliminate any multi-valued fields by flattening the table. A single row of the table with multi-valued fields is replaced with many rows, one for each value in the multi-valued field.

When multi-valued fields are thus eliminated, the table is said to be in the First Normal Form.
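
The flattening step can be sketched in a few lines of Python. This is only an illustration: the list-valued fields and the function name flatten are assumptions made for the sketch, and the sample rows use just the house members named in the example that follows.

```python
# Hypothetical un-normalized rows: the last two fields hold parallel lists
# (last names and first names of a state's house members).
unnormalized = [
    ("PA", "Pennsylvania", "Harrisburg",
     ["Borski", "Brady", "Coyne"], ["Robert A.", "Robert A.", "William J."]),
    ("NJ", "New Jersey", "Trenton",
     ["Andrews", "Ferguson"], ["Robert E.", "Mike"]),
]


def flatten(rows):
    """Emit one row per value of the multi-valued fields (First Normal Form)."""
    flat = []
    for symbol, name, capital, last_names, first_names in rows:
        for last, first in zip(last_names, first_names):
            flat.append((symbol, name, capital, last, first))
    return flat


flat_rows = flatten(unnormalized)
for row in flat_rows:
    print(row)
```

Every field of every resulting row holds a single value, at the cost of repeating the state fields once per member.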

Un-normalized table example

stateSymbol | stateName | stateCapital | lastName | firstName
PA | Pennsylvania | Harrisburg | Borski, Brady, Coyne, … | Robert A., Robert A., William J., …
NJ | New Jersey | Trenton | Andrews, Ferguson, … | Robert E., Mike, …

Flattened table (now in the First Normal Form)

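stateSymbol | stateName | stateCapital | lastName | firstName
PA | Pennsylvania | Harrisburg | Borski | Robert A.
PA | Pennsylvania | Harrisburg | Brady | Robert A.
PA | Pennsylvania | Harrisburg | Coyne | William J.
… | … | … | … | …
NJ | New Jersey | Trenton | Andrews | Robert E.
NJ | New Jersey | Trenton | Ferguson | Mike
… | … | … | … | …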

A note on primary keys of flattened tables

Recall that a primary key must uniquely identify each record in a table.

When a table is flattened to eliminate multi-valued fields, one of those previously multi-valued fields must be the primary key or part of a composite primary key.

Example

StudentID | LastName | FirstName | Middle | Maj1 | Maj2 | Dept | CrseNo | CourseName | Semester | Grade
9874923 | McDermott | Mary | Margaret | ENG | | ENG | 108 | Writing II | Fall 2002 | A-
 | | | | | | PHL | 151 | Critical Thinking | Fall 2002 | B+
 | | | | | | MTH | 150 | Thinking Math | Fall 2002 | B
 | | | | | | ENG | 230 | American Lit | Spring 2003 | B+
 | | | | | | REL | 150 | Rel in America | Spring 2003 | B-
9840495 | Jameson | John | James | CSC | INFT | CSC | 230 | Programming | Fall 2002 | A-
 | | | | | | MTH | 160 | Discrete Math | Fall 2002 | B-
 | | | | | | ENG | 107 | Writing I | Fall 2002 | B+
 | | | | | | PHL | 151 | Critical Thinking | Spring 2003 | B
 | | | | | | REL | 150 | Rel in America | Spring 2003 | B+

When the table is flattened, something about the course must be part of the primary key – in this case StudentID alone no longer identifies a row, so we also need Dept, CrseNo and Semester.

The Gain: search-ability

Flattening the file will help with searching and querying. We do not have to look for “Coyne” buried in the midst of a long list that must be parsed (broken into pieces), etc.

If you can imagine that you might want to search on it, it should be “atomic” in a field of a record by itself.

The loss: data redundancy

Data redundancy is the unnecessary repetition of information.

Data redundancy makes it hard to maintain data integrity (correctness), e.g. when you change your address with one department in an organization, but the other departments still have your old address.

Some repetition is necessary to maintain relationships.

Flattened table (now in the First Normal Form)

stateSymbol | stateName | stateCapital | lastName | firstName
PA | Pennsylvania | Harrisburg | Borski | Robert A.
PA | Pennsylvania | Harrisburg | Brady | Robert A.
PA | Pennsylvania | Harrisburg | Coyne | William J.
… | … | … | … | …

• Repetition of PA is necessary to maintain the relationship between a HouseMember and a State.

• Repetition of Pennsylvania and Harrisburg is unnecessary.

Identifying and reducing redundancy

Pennsylvania and Harrisburg are unnecessarily repeated because the relationship is fully realized by the stateSymbol alone.

The stateName and stateCapital are uniquely determined by the stateSymbol.

stateName and stateCapital are said to be functionally dependent on stateSymbol.
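
A functional dependency can be tested against sample data with a short check like the one below. This is a sketch, not a definitive tool: the function name holds and the dict-per-row layout are assumptions, and passing the check on sample rows only shows that the data do not contradict the dependency. Whether the dependency truly holds is a matter of what the fields mean.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent is not contradicted by rows.

    rows is a list of dicts; determinant and dependent are lists of field
    names. The dependency fails when two rows agree on the determinant
    fields but disagree on the dependent fields.
    """
    seen = {}
    for row in rows:
        key = tuple(row[f] for f in determinant)
        value = tuple(row[f] for f in dependent)
        if seen.setdefault(key, value) != value:
            return False
    return True


fields = ["stateSymbol", "stateName", "stateCapital", "lastName", "firstName"]
rows = [dict(zip(fields, values)) for values in [
    ("PA", "Pennsylvania", "Harrisburg", "Borski", "Robert A."),
    ("PA", "Pennsylvania", "Harrisburg", "Brady", "Robert A."),
    ("PA", "Pennsylvania", "Harrisburg", "Coyne", "William J."),
    ("NJ", "New Jersey", "Trenton", "Andrews", "Robert E."),
    ("NJ", "New Jersey", "Trenton", "Ferguson", "Mike"),
]]

print(holds(rows, ["stateSymbol"], ["stateName", "stateCapital"]))  # True
print(holds(rows, ["stateSymbol"], ["lastName"]))                   # False
```

The second call fails because PA appears with several different last names, so stateSymbol does not determine lastName uniquely.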

Uniquely!!

In a Billboard example, the song may be a duet and thus song would not determine the artist uniquely.

Jay-Z and Alicia Keys both sing “Empire State of Mind,” thus “Empire State of Mind” cannot be used to determine the artist uniquely.

Reducing Redundancy

Identifying functional dependencies is the key to reducing redundancy.

There are a few normal forms (Second Normal Form, Third Normal Form and Boyce-Codd Normal Form) which eliminate increasing degrees of redundancy.

What distinguishes the various forms is the field(s) upon which something is functionally dependent and the type of functional dependence.

Determinant

To avoid awkward phrases like “the field(s) upon which something is functionally dependent”, we introduce the term determinant.

The determinant attribute determines some other attribute, i.e. this other attribute is functionally dependent upon the determinant field.

stateName depends on stateSymbol.

stateSymbol is the determinant of stateName.

Types of functional dependence

Attribute B is functionally dependent on Attribute A if knowing the value of A means one can in turn know the value of B uniquely.

Attribute A (the determinant) may be a composite attribute – i.e., made up of more than one field.

If the full knowledge of A (all of its composite fields) is necessary to determine B, then B is fully dependent on A.

If only partial knowledge of A (some of its composite fields) is enough to determine B, then B is partially dependent on A.

Not a two-way street

Recall that B being functionally dependent on A does not mean A is functionally dependent on B.

dateOfBirth is functionally dependent on socSecNum, but socSecNum is not functionally dependent on dateOfBirth.

Transitive dependence

If attribute B is functionally dependent on A and attribute C is functionally dependent on B, then C is said to be transitively dependent on A, provided that A is not functionally dependent on C.

A Transitive Dependency Example

If employeeNumber is functionally dependent on socSecNum and salary is functionally dependent on employeeNumber, then salary is “transitively functionally dependent” on socSecNum.

Rephrased: if socSecNum is a determinant of employeeNumber and employeeNumber is a determinant of salary, then socSecNum is a determinant of salary.
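
The chaining of determinants can be made concrete with a small sketch. It is deliberately simplified: each dependency is a pair of single attributes, only directly listed dependencies are chained, and the function name transitive_dependencies is an assumption for illustration. The exclusion test follows the condition as worded in these slides (A must not depend on C).

```python
def transitive_dependencies(fds):
    """Return (A, B, C) triples where A -> B and B -> C are both listed.

    fds is a list of (determinant, dependent) pairs of single attributes.
    A triple is skipped when C -> A is also listed.
    """
    found = []
    for a, b in fds:
        for b2, c in fds:
            if b == b2 and c != a and (c, a) not in fds:
                found.append((a, b, c))
    return found


fds = [("socSecNum", "employeeNumber"), ("employeeNumber", "salary")]
chains = transitive_dependencies(fds)
print(chains)  # [('socSecNum', 'employeeNumber', 'salary')]
```

Here employeeNumber is the intermediate attribute through which salary depends on socSecNum.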

Another Transitive Dependency Example

Imagine a table that records the sale of a stock – a StockTransaction table.

If stockID is functionally dependent on transactionID and stockName is functionally dependent on stockID, then stockName is transitively dependent on transactionID.

Primary key

Recall the primary key is an attribute or set of attributes that uniquely identify each row in a table.

Thus every attribute that is not part of the primary key is functionally dependent on the primary key.

Rephrased: The primary key is a determinant of any non-primary-key attribute.

The level of decomposition (the Normal Form) is determined by the type of dependence the field has on the primary key.

Second Normal Form

Eliminating any field (via table decomposition) that partially depends on the primary key puts the table into Second Normal Form (provided the table was in the First Normal Form prior to decomposition).

Note that a table with a simple (non-composite) primary key is necessarily in Second Normal Form.

Second Normal Form Example

CharacterFeaturedInEpisode(characterID, episodeID, firstName, lastName, characterDescription, title, episodeDescription, originalAirDate)

The fields characterID and episodeID (underlined in the original notation) serve as the primary key. The primary key is composite.

The attributes firstName, lastName and characterDescription are partially functionally dependent on the primary key because they are determined only by characterID.

The attributes title, episodeDescription and originalAirDate are partially functionally dependent because they are determined only by episodeID.
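
The partial dependencies in this example can be listed mechanically once the functional dependencies are written down. The sketch below assumes the dependencies are stored as a dict from determinant to dependent attributes; that layout and the function name partial_dependencies are illustrative choices, not part of the original example.

```python
primary_key = frozenset({"characterID", "episodeID"})

# Functional dependencies stated in the example: determinant -> dependents.
fds = {
    frozenset({"characterID"}): {"firstName", "lastName", "characterDescription"},
    frozenset({"episodeID"}): {"title", "episodeDescription", "originalAirDate"},
}


def partial_dependencies(primary_key, fds):
    """Map each non-key attribute to the proper subset of the key that determines it."""
    partial = {}
    for determinant, dependents in fds.items():
        if determinant < primary_key:  # proper subset of the composite key
            for attribute in dependents - primary_key:
                partial[attribute] = determinant
    return partial


partial = partial_dependencies(primary_key, fds)
for attribute in sorted(partial):
    print(attribute, "depends only on", sorted(partial[attribute]))
```

Any attribute reported here is a sign that the table is not in Second Normal Form under this choice of primary key.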

Second Normal Form Example (Cont.)

Create tables whose primary keys are non-empty subsets of the primary key of the original table (proper subsets as well as the full key).

Place the non-primary-key attributes in the table in which they are fully dependent.

Second Normal Form Example (Cont.)

Character(characterID, lastName, firstName, characterDescription)

Episode(episodeID, title, episodeDescription, originalAirDate)

EpisodeFeature(characterID, episodeID)

Note that although no field depends on both characterID and episodeID, we must keep the table with both keys to maintain the relationship (the lossless join property).
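
The need for the linking table can be checked on a tiny data set. The sketch below is an assumption-laden illustration: the sample rows are invented, only two of the descriptive fields are kept for brevity, and project and natural_join are hand-written helpers rather than any library's API.

```python
def project(rows, fields):
    """Project rows (dicts) onto fields, dropping duplicate results."""
    result = []
    for row in rows:
        slim = {f: row[f] for f in fields}
        if slim not in result:
            result.append(slim)
    return result


def natural_join(left, right):
    """Join two lists of dicts on whatever field names they share."""
    joined = []
    for left_row in left:
        for right_row in right:
            shared = set(left_row) & set(right_row)
            if all(left_row[f] == right_row[f] for f in shared):
                joined.append({**left_row, **right_row})
    return joined


def as_set(rows):
    """Order-independent form of a list of dict rows, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


# Invented sample data for CharacterFeaturedInEpisode (fields abbreviated).
original = [
    {"characterID": 1, "episodeID": 10, "firstName": "Ann", "title": "Pilot"},
    {"characterID": 1, "episodeID": 11, "firstName": "Ann", "title": "Reunion"},
    {"characterID": 2, "episodeID": 10, "firstName": "Bob", "title": "Pilot"},
]

character = project(original, ["characterID", "firstName"])
episode = project(original, ["episodeID", "title"])
episode_feature = project(original, ["characterID", "episodeID"])

rejoined = natural_join(natural_join(episode_feature, character), episode)
without_link = natural_join(character, episode)

print(as_set(rejoined) == as_set(original))  # True: nothing lost or invented
print(len(without_link))                     # 4: one spurious pairing appears
```

Without EpisodeFeature the two remaining tables share no field, so joining them pairs every character with every episode and the original relationship cannot be recovered.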

Some Redundancy May Remain

In the house-member example, the firstName, lastName combination may serve as the primary key; each of the other attributes (stateSymbol, stateName and stateCapital) is fully dependent on the primary key. So the table is in Second Normal Form. But clearly there is still redundancy.

stateSymbol | stateName | stateCapital | lastName | firstName
PA | Pennsylvania | Harrisburg | Borski | Robert A.
PA | Pennsylvania | Harrisburg | Brady | Robert A.
PA | Pennsylvania | Harrisburg | Coyne | William J.
… | … | … | … | …

Third Normal Form

Eliminating any field (via table decomposition) that transitively depends on the primary key puts the table into Third Normal Form (provided the table was in the Second Normal Form prior to decomposition).

firstName, lastName determines stateSymbol which in turn determines stateName and stateCapital. (transitive dependence)

stateSymbol | stateName | stateCapital | lastName | firstName
PA | Pennsylvania | Harrisburg | Borski | Robert A.
PA | Pennsylvania | Harrisburg | Brady | Robert A.
PA | Pennsylvania | Harrisburg | Coyne | William J.
… | … | … | … | …

Decomposition into the Third Normal Form

Create another table that has as a primary key the attribute which is the intermediate attribute in the transitive dependence.

(lastName, firstName, stateSymbol)

(stateSymbol, stateName, stateCapital)
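
A minimal sketch of what this decomposition buys, assuming the two tables are held as a Python list and dict (the variable and function names are illustrative only): the state details are stored once, and the full row is rebuilt by looking up the stateSymbol.

```python
# (lastName, firstName, stateSymbol)
members = [
    ("Borski", "Robert A.", "PA"),
    ("Brady", "Robert A.", "PA"),
    ("Coyne", "William J.", "PA"),
]

# (stateSymbol, stateName, stateCapital), keyed by stateSymbol
states = {"PA": ("Pennsylvania", "Harrisburg")}


def member_with_state(member):
    """Rebuild the flattened row by following stateSymbol into the state table."""
    last, first, symbol = member
    state_name, state_capital = states[symbol]
    return (symbol, state_name, state_capital, last, first)


full_rows = [member_with_state(m) for m in members]
print(full_rows[0])
```

A correction to a state's name or capital now touches one entry in states instead of one row per house member.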

Another transitive dependence example

Customer(customerID, lastName, firstName, street, city, state, zipcode, stateTax, cityTax)

There is a simple primary key, so the table is in Second Normal Form (2NF).

But the city tax is dependent on the city and the state tax is dependent on the state.

In fact city and state are dependent on zipcode.

Another transitive dependence example (Cont.)

Customer(customerID, lastName, firstName, street, zipcode)

AddressInfo(zipcode, city, state, cityTax, stateTax)

There could be further decomposition since stateTax depends on state.

For practical purposes, many draw the line at some of these decompositions even if they do reduce data redundancy. A question to ask is how likely an update anomaly is for the particular set of data.

Zipcode Dependence (Cont.)

In fact city and state may also be dependent on zipcode.

Sometimes a small city shares a zipcode with a bordering city or neighborhood of a bordering city.

It also depends on whether one means a 5-digit zipcode or a 9-digit zipcode.

Another transitive dependence example (Cont.)

Customer(customerID, lastName, firstName, street, zipcode)

ZipInfo(zipcode, city, state, cityTax, stateTax)

There could be further decomposition since stateTax depends on state.

Another transitive dependence example (Cont.)

Customer(customerID, lastName, firstName, street, zipcode)

ZipInfo(zipcode, city, state)

StateTax(state, stateTax)

CityTax(state, city, cityTax)

Recall the price

While redundancy has its price (increased storage and the possibility for update anomalies), minimizing redundancy also has a price: it introduces more tables.

More tables means more joins when it comes to querying the database.
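
The cost in joins can be seen with Python's built-in sqlite3 module. This is a sketch under assumptions: the column types, the sample customers and the sample zipcode row are invented for illustration, and only the Customer and ZipInfo tables from the customer example are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ZipInfo (
        zipcode TEXT PRIMARY KEY,
        city    TEXT,
        state   TEXT
    );
    CREATE TABLE Customer (
        customerID INTEGER PRIMARY KEY,
        lastName   TEXT,
        firstName  TEXT,
        street     TEXT,
        zipcode    TEXT REFERENCES ZipInfo(zipcode)
    );
    -- Invented sample data.
    INSERT INTO ZipInfo VALUES ('19141', 'Philadelphia', 'PA');
    INSERT INTO Customer VALUES (1, 'Doe', 'Jane', '1 Main St', '19141');
    INSERT INTO Customer VALUES (2, 'Roe', 'Richard', '2 Main St', '19141');
""")

# City and state no longer live in Customer, so reading them needs a join.
rows = conn.execute("""
    SELECT c.lastName, z.city, z.state
    FROM Customer AS c
    JOIN ZipInfo AS z ON c.zipcode = z.zipcode
    ORDER BY c.customerID
""").fetchall()
print(rows)
conn.close()
```

Decomposing further into StateTax and CityTax would add one more join for each extra table a query needs to read.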

Introducing primary keys

If one has an awkward primary key:

• Perhaps it is composite, e.g. FirstName, LastName

• Perhaps it may change, e.g. ItemName

then it is valid to introduce an ID to serve as a primary key.

Just don’t let the introduction of a simple key get in the way of eliminating data redundancy. This can be a problem with Second Normal Form, which is defined as having no partial dependence on the primary key. Thus the 2NF decomposition can depend on one’s choice of primary key.

Some Redundancy May Remain at 3NF: Lot Example

Let us say the land within a county is broken up into lots and each lot is assigned a number.

A county is also broken down into municipalities (cities, townships, etc.).

The lots are assessed at some value. The table might look like:

LotAssessment(lotID, county, municipality, assessment)

Some Redundancy May Remain at 3NF: Lot Example (Cont.)

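The next stage is to select a primary key. There are two candidate keys (underlined in the original slide; inferred here from the slides that follow):

1. LotAssessment(lotID, county, municipality, assessment), with key lotID, county

2. LotAssessment(lotID, county, municipality, assessment), with key lotID, municipality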

Some Redundancy May Remain at 3NF: Lot Example (Cont.)

The next thing to note in this example is that county is functionally dependent on municipality.

1. LotAssessment(lotID, county, municipality, assessment), with key lotID, county

2. LotAssessment(lotID, county, municipality, assessment), with key lotID, municipality

Note that choice two is not in 2NF: county depends on municipality, which is only part of its key.

Some Redundancy May Remain at 3NF: Lot Example (Cont.)

The second choice

LotAssessment(lotID, county, municipality, assessment)

is decomposed as follows:

LotAssessment(lotID, municipality, assessment)

CityCounty(municipality, county)

Some Redundancy May Remain at 3NF: Lot Example (Cont.)

The first choice

LotAssessment(lotID, county, municipality, assessment)

on the other hand, is in 2NF and 3NF.

There is no partial dependence on the primary key.

There is no transitive dependency on the primary key.

Some Redundancy May Remain at 3NF: Lot Example (Cont.)

A possible feature of tables that are in 3NF but may still have redundancy is that there are various candidate keys from which one chooses the primary key.

We will introduce a generalization and/or extension of the Normal Form idea to ensure that we get further in our redundancy reduction independent of our initial choice of primary key.

Generalization: Primary Candidate Key

One way to avoid the type of problem that occurred in the Lot example is to extend the definitions of the Second and Third Normal Forms.

To extend the definitions of the Second and Third Normal Forms, replace the term “depends on the primary key” with “depends on any candidate key.”

Return to the lot

Note that the first lot table is not in generalized 2NF because there is a partial dependence on a candidate key.

So with the generalized version of 2NF, this table would be decomposed.

In fact, the decomposed table would have the second candidate key as its primary key.

In effect, you are forced to choose the candidate key that yields decomposition.

Boyce-Codd Normal Form

After Third Normal Form, the next stricter form is called the Boyce-Codd Normal Form.

A table is in Boyce-Codd Normal Form if the only determinants (attributes that determine other attributes) are candidate keys.
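
The Boyce-Codd condition can be tested by computing attribute closures. The sketch below applies it to the lot example under stated assumptions: the dependency lotID, county → municipality, assessment is inferred from the lot discussion rather than quoted from it, the helper names are hypothetical, and the test used is the usual formalization that every determinant must determine all attributes of the table (be a superkey).

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the determinants that do not determine every attribute."""
    return [lhs for lhs, rhs in fds if closure(lhs, fds) != attributes]


attributes = {"lotID", "county", "municipality", "assessment"}
fds = [
    # Assumed: a lot number is unique within its county.
    (frozenset({"lotID", "county"}), frozenset({"municipality", "assessment"})),
    # Stated in the lot example: county depends on municipality.
    (frozenset({"municipality"}), frozenset({"county"})),
]

violations = bcnf_violations(attributes, fds)
print(violations)  # [frozenset({'municipality'})]
```

municipality is a determinant without being a key, so LotAssessment fails the Boyce-Codd test whichever candidate key was picked as primary.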

Primary Key Choice: Tutoring Example

Let us say we have a tutoring center in which tutees (students) come in to see tutors to be tutored in a course.

A tutor is assigned to a tutee for a particular course.

Thus the tutor-tutee pair is a determinant of course.

The tutor and tutee meet on a certain date at a certain time and are assigned a room for the tutoring session.

A tutor and tutee meet at most once a day.

Primary Key Choice: Tutoring Example (Cont.)

The starting table might look like:

Tutoring(tutor, tutee, course, room, date, time)

The next stage is to identify the primary key. In this example, there are many choices, i.e. there are many candidate keys.

Primary Key Choice: Tutoring Example (Cont.)

On a given date, at a given time, in a given room, there can only be one tutor-tutee pair studying a course. So one choice for the primary key is date, time, room.

Tutoring(tutor, tutee, course, room, date, time)

This table is in 2NF but not 3NF because tutor-tutee → course is the second part of a transitive dependence of course on the primary key.

Primary Key Choice: Tutoring Example (Cont.)

On a given date, a given tutor-tutee pair meets just once to study their assigned course. So another choice for the primary key is date, tutor, tutee.

Tutoring(tutor, tutee, course, room, date, time)

This table is not in 2NF because course is partially dependent on the primary key.

Primary Key Choice: Tutoring Example (Cont.)

On a given date, at a given time, a given tutor meets a tutee in a room to study their assigned course. So another choice for the primary key is date, time, tutor.

Tutoring(tutor, tutee, course, room, date, time)

This table is in 2NF. It is not in 3NF, but the intermediate attribute (tutor-tutee) consists partly of a primary-key field and partly of a non-primary-key field.

Primary Key Choice: Tutoring Example (Cont.)

On a given date, at a given time, a given tutee meets a tutor in a room to study their assigned course. So another choice for the primary key is date, time, tutee.

Tutoring(tutor, tutee, course, room, date, time)

This table is in 2NF. It is not in 3NF, but the intermediate attribute (tutor-tutee) consists partly of a primary-key field and partly of a non-primary-key field.

Primary Key Choice: Tutoring Example (Cont.)

While the tutoring example has to be decomposed to reach the 3NF, it does demonstrate that there can be many different primary keys.

If the generalized version of the 2NF is used then decomposition occurs there (because it occurs there for one of the candidate keys).
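
The candidate keys of the Tutoring table can be enumerated by brute force from its functional dependencies. This is a sketch under assumptions: the dependency list is a reading of the rules stated in the tutoring example, the helper names are hypothetical, and trying every attribute subset is only practical for small tables like this one.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return the minimal attribute sets whose closure is the whole table."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            combo = frozenset(combo)
            if any(key <= combo for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == attributes:
                keys.append(combo)
    return keys


attributes = {"tutor", "tutee", "course", "room", "date", "time"}
fds = [
    (frozenset({"tutor", "tutee"}), frozenset({"course"})),
    (frozenset({"date", "time", "room"}), frozenset({"tutor", "tutee"})),
    (frozenset({"date", "tutor", "tutee"}), frozenset({"time", "room"})),
    (frozenset({"date", "time", "tutor"}), frozenset({"tutee", "room"})),
    (frozenset({"date", "time", "tutee"}), frozenset({"tutor", "room"})),
]

keys = candidate_keys(attributes, fds)
for key in keys:
    print(sorted(key))
```

Under these assumed dependencies the search finds the same four keys discussed above: date, time, room; date, tutor, tutee; date, time, tutor; and date, time, tutee.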

Primary Key Choice: Tutoring Example (Cont.)

The resulting tables are

Tutoring(tutor, tutee, room, date, time)

Subject(tutor, tutee, course)

This decomposition does not force a choice of the primary key in the first table.

Many candidate keys

Another example with many possible keys is the football schedule table. We could use

• Week, HostName

• Week, AwayName

• Date, HostName

• Date, AwayName

week | date | hostCity | hostState | hostName | awayCity | awayState | awayName | hostScore | awayScore
1 | 9/12/04 | Philadelphia | PA | Eagles | New York | NY | Giants | 31 | 17
1 | 9/13/04 | Charlotte | NC | Panthers | Green Bay | WI | Packers | 14 | 24
… | … | … | … | … | … | … | … | … | …
2 | 9/19/04 | Kansas City | MO | Chiefs | Charlotte | NC | Panthers | 17 | 28

Choose Date over Week

Note that Date → Week (but not vice versa).

In the third and fourth choices this is a partial dependence on the primary key – and so Second Normal Form requires a table consisting of (Date, Week).

Generalized Second Normal Form would also require the decomposition because it is a partial dependence on a candidate key.

Boyce-Codd Normal Form would require the decomposition because the determinant (Date) is not by itself a candidate key.
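
The one-way nature of the Date and Week dependency can be checked against the schedule rows shown in the football example. The sketch below is illustrative only: it uses just the three (week, date) pairs visible in that table, and the variable names are assumptions.

```python
from collections import defaultdict

# (week, date) pairs taken from the visible rows of the schedule table.
schedule = [(1, "9/12/04"), (1, "9/13/04"), (2, "9/19/04")]

weeks_per_date = defaultdict(set)
dates_per_week = defaultdict(set)
for week, date in schedule:
    weeks_per_date[date].add(week)
    dates_per_week[week].add(date)

# A dependency X -> Y holds on this data if every X value maps to one Y value.
date_determines_week = all(len(weeks) == 1 for weeks in weeks_per_date.values())
week_determines_date = all(len(dates) == 1 for dates in dates_per_week.values())

print(date_determines_week)  # True
print(week_determines_date)  # False: week 1 spans two dates
```

This is why a separate (Date, Week) table keyed on Date is the natural result of the decomposition.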

Connolly and Begg: Review of Normalization (UNF to BCNF)

References

Database Systems, Rob and Coronel

Database Systems, Connolly and Begg

Fundamentals of Relational Databases, Mata-Toledo and Cushman

Concepts of Database Management, Pratt and Adamski