Database Principles Relational Database Design II.

Page 1: Database Principles Relational Database Design II.

Database Principles

Relational Database Design II

Page 2: Database Principles Relational Database Design II.

Design Objective:

• Turn bad tables into good.
  – Create a set of tables, each about one thing.

• Do so without loss of information.
  – The new set of tables is related in such a way that joining the new tables recreates exactly the information found in the original tables.

Page 3: Database Principles Relational Database Design II.

It is all about Information:

• How is information captured within a table?

Supplier

Sno  Sname  Location
s1   Acme   NY
s2   Ajax   Bos
s3   Apex   Chi
s4   Ace    LA
s5   A-1    Phil

• In the above table, a table about Suppliers, the “information” is that:
  – every supplier has a unique ID
  – every supplier has a unique name
  – every supplier has a unique location

Page 4: Database Principles Relational Database Design II.

Table Information and Enterprise Rules:

• The rules:
  – every supplier has a unique ID
  – every supplier has a unique name
  – every supplier has a unique location
  are called Enterprise Rules (ER).

• Enterprise Rules are rules that come from
  – domain knowledge,
  – how the business or organization is run,
  – what the “experts” know about the business.

Page 5: Database Principles Relational Database Design II.

Example:

• Instead of “every supplier has a unique location”:

Supplier, pk = {Sno}

Sno  Sname  Location
s1   Acme   NY
s2   Ajax   Bos
s3   Apex   Chi
s4   Ace    LA
s5   A-1    Phil

• Suppose we are told that a supplier can have several locations:

Supplier, pk = {Sno}

Sno  Sname  Location
s1   Acme   NY, LA
s2   Ajax   Bos
s3   Apex   Chi, Phil, SF
s4   Ace    LA
s5   A-1    Phil

This is not permitted: each row-column intersection should contain a single value.

Page 6: Database Principles Relational Database Design II.

Example (cont):

• Suppose we are told that a supplier can have several locations – a better solution:

Supplier, pk = {Sno}

Sno  Sname
s1   Acme
s2   Ajax
s3   Apex
s4   Ace
s5   A-1

SupplierLocation, pk = {Sno, Location}

Sno  Location
s1   NY
s1   LA
s2   Bos
s3   Chi
s3   Phil
s3   SF
s4   LA
s5   Phil

• Point to be Made: Different Enterprise Rules result in different table configurations.
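The two-table design loses nothing: joining Supplier and SupplierLocation on Sno gives back one row per supplier/location pair. A minimal Python sketch of that join follows; the list-of-tuples representation and the helper name `natural_join` are assumptions made for illustration, not part of the slides.

```python
# Hypothetical in-memory copies of the two tables shown on the slide.
supplier = [
    ("s1", "Acme"), ("s2", "Ajax"), ("s3", "Apex"), ("s4", "Ace"), ("s5", "A-1"),
]
supplier_location = [
    ("s1", "NY"), ("s1", "LA"), ("s2", "Bos"), ("s3", "Chi"),
    ("s3", "Phil"), ("s3", "SF"), ("s4", "LA"), ("s5", "Phil"),
]


def natural_join(suppliers, locations):
    """Join the two tables on their shared column Sno."""
    # Sno is the key of Supplier, so each Sno maps to exactly one Sname.
    name_of = dict(suppliers)
    return [(sno, name_of[sno], loc) for sno, loc in locations]


joined = natural_join(supplier, supplier_location)
for row in joined:
    print(row)

# Projecting the join back onto each table's columns recovers both tables.
recovered_supplier = {(sno, name) for sno, name, _ in joined}
recovered_location = {(sno, loc) for sno, _, loc in joined}
```

Each (Sno, Sname, Location) row carries a single value per column, so the multi-valued cells from the previous slide are gone while every fact is still recoverable.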

Page 7: Database Principles Relational Database Design II.

How to Capture Enterprise Rules:

• A functional dependency (FD) is a functional relationship among attributes of a table, whose definition may vary over time.

• The difference between a mathematical function and a functional dependency is that the latter may have its definition change over time.

Function Definition: f : A --> B is a function if for every a ∈ A there is a unique b ∈ B such that f(a) = b.

Page 8: Database Principles Relational Database Design II.

Example of Functional Dependencies:

• Consider the following:

name_of : SNOs --> SNAMEs

location_of : SNOs --> LOCATIONs

• Both name_of and location_of are functional dependencies but may not both be functions.

Enterprise Rules:

  – Every Supplier has a unique name. Change over Time: Not likely.
  – Every Supplier has a unique Location. Change over Time: Possibly.

Page 9: Database Principles Relational Database Design II.

Functional Dependencies for Capturing Enterprise Rules

• Write down all the functional dependencies in the following table:

Cardholder, pk = {borrowerid}

borrowerid  b_name  b_addr  b_status  loan_limit

borrowerid --> b_name
borrowerid --> b_addr
borrowerid --> b_status
borrowerid --> loan_limit
b_status --> loan_limit

• If you capture the table information in FDs you have also captured the Enterprise Rules of the business.

Page 10: Database Principles Relational Database Design II.

Many Enterprise Rules of a business are captured by the Functional Dependencies found among the columns of the tables that hold the data of the business.

As a database designer, if you capture the FDs of the various tables of the database you also understand the rules by which the organization goes about its business.

Page 11: Database Principles Relational Database Design II.

Why do we need FDs?

• Because we must never lose any information turning bad tables into good, we must know before we start what information we have.

• Relational Database Design starts with listing all FDs of all existing tables.

Page 12: Database Principles Relational Database Design II.

FD Notation:

• Suppose R = {A, B, C, D, E, F} is a table schema.

• Further suppose X ⊆ R and that there is a functional dependency from X to D. We write this:

R: A B C D E F

X --> D

Page 13: Database Principles Relational Database Design II.

Different Ways of Describing FDs:

• If R is a table and A, B ∈ R then the following are equivalent:

– A --> B

– B depends on A

– A determines B

– B is determined by A

– if you know (the value of) A then you know (the value of) B

Page 14: Database Principles Relational Database Design II.

Example:

• Consider the table:

Student

StudentID  SSN  Fname  Lname  DOB  Address
. . .

• Claim: StudentID --> DOB

• Alternatively: if you know the value of the StudentID then you can determine exactly the DOB.

• Justification: there is an Enterprise Rule that says “everyone has a unique DOB”.

Page 15: Database Principles Relational Database Design II.

Example:

• What other FDs exist in this table:

Student

StudentID  SSN  Fname  Lname  DOB  Address
. . .

StudentID --> DOB
StudentID --> SSN
StudentID --> Fname
StudentID --> Lname
StudentID --> Address

SSN --> StudentID
SSN --> Fname
SSN --> Lname
SSN --> Address
SSN --> DOB

What about: StudentID, Address --> DOB

Where do all these FDs come from?

Answer: Enterprise Rules

Page 16: Database Principles Relational Database Design II.

How Does Knowing the FDs tell Us If A Table Is Good?

• Remember, a table is “good” if it is about one thing.

• A table is always “about” whatever its key identifies.

• Therefore, a table is good if it only includes info about its key.

• An FD with a table key on the left-hand side tells us info about the key and consequently keeps the table “good”.

• Other FDs are consequently “bad”.

Page 17: Database Principles Relational Database Design II.

Good FDs and Good Tables:

• Def’n: A functional dependency is good in a table if it is of the form

table key --> some attribute

Supplier, pk = {Sno}

Sno  Sname  Location
s1   Acme   NY
s2   Ajax   Bos
s3   Apex   Chi
s4   Ace    LA
s5   A-1    Phil

FDs:

Sno --> Sname
Sno --> Location

Hence the only FDs are “good” and consequently the table is also “good”.

NOTE: The fact that in the current version of the table there are no two suppliers with the same name does not mean that the world of business has an Enterprise Rule that says it must always be so. So we can’t say Sname --> Location.

Page 18: Database Principles Relational Database Design II.

Something To Remember:

• The previous example points out an important fact.

We can’t be looking at current rows in the table, which reflect today’s reality, to decide what FDs exist or don’t exist. FDs reflect Enterprise Rules and FDs come from knowing about the business over the long term, not just what happens to hold true today.
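To make the point concrete, here is a small Python sketch, assuming rows are stored as dictionaries and using a hypothetical helper `fd_violations`. Today's rows can refute an FD by showing two rows that break it, but finding no violation proves nothing about the Enterprise Rules. The sixth supplier row is invented for illustration.

```python
def fd_violations(rows, lhs, rhs):
    """Return pairs of rows that agree on the lhs columns but differ on the rhs columns."""
    first_seen = {}  # lhs values -> first row seen with those values
    violations = []
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if key in first_seen:
            earlier = first_seen[key]
            if any(earlier[c] != row[c] for c in rhs):
                violations.append((earlier, row))
        else:
            first_seen[key] = row
    return violations


supplier = [
    {"Sno": "s1", "Sname": "Acme", "Location": "NY"},
    {"Sno": "s2", "Sname": "Ajax", "Location": "Bos"},
    {"Sno": "s3", "Sname": "Apex", "Location": "Chi"},
    {"Sno": "s4", "Sname": "Ace", "Location": "LA"},
    {"Sno": "s5", "Sname": "A-1", "Location": "Phil"},
]

# Today's rows happen to satisfy Sname --> Location ...
before = fd_violations(supplier, ["Sname"], ["Location"])

# ... but nothing in the business forbids a second supplier called Acme
# (hypothetical row, not from the slides).
supplier.append({"Sno": "s6", "Sname": "Acme", "Location": "SF"})
after = fd_violations(supplier, ["Sname"], ["Location"])

print(len(before), len(after))  # 0 1
```

The FD Sno --> Sname, Location still shows no violation after the insert, which is consistent with it being a genuine Enterprise Rule, though only the rule itself can confirm that.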

Page 19: Database Principles Relational Database Design II.

Why is this a good thing?

• Because looking for FDs is a design-phase activity. We don’t need to have the tables already built.

• This means that we don’t need to build the tables only to find out later they are bad; we can avoid bad tables from the get-go.

Page 20: Database Principles Relational Database Design II.

What is Our Basic Job:

• To find out if a table is “good” we must first find all FDs in a table and decide which ones are “good” and which ones are not.

• In order to know which FDs are “good” we need to know all the keys of a table.

• Keys to a table are either super keys, which have more columns than they need, or are as small as they can be, in which case they are candidate keys.

• Super keys always contain a candidate key.

• Basic Job: Find all candidate keys.

Page 21: Database Principles Relational Database Design II.

Reasoning Rule #1:

• Composition/Decomposition:

X --> A, B is equivalent to X --> A and X --> B

• The argument:
  – X --> A, B means X determines both A and B. This is true either as a pair or individually so we can conclude

    X --> A, B implies X --> A and X --> B

  – if X --> A and X --> B then knowing X means you know both A and B, either individually or as a pair. Hence

    X --> A and X --> B implies X --> A, B

Page 22: Database Principles Relational Database Design II.

Reasoning Rule #2:

• Identity:

X --> X

• The argument:
  – If you know X then you know X.

Page 23: Database Principles Relational Database Design II.

Reasoning Rule #3:

• Transitivity:

If X --> Y and Y --> Z then X --> Z

• The argument:
  – If you know X then you know Y.
  – But if you know Y you know Z.
  – Hence if you know X you know Z.

• Note: This is nothing more than function composition applied to FDs:

if f : A --> B and g : B --> C then g ∘ f : A --> C

Page 24: Database Principles Relational Database Design II.

Reasoning Rule #4:

• Trivial:

X --> ∅

• The argument:
  – This says that X determines the empty set.
  – In other words, if you know X there is nothing else you need or want to know.

Page 25: Database Principles Relational Database Design II.

Reasoning Rule #5:

• Augmentation:

If X --> Y then X, A --> Y

• The argument:
  – This says that if X determines Y then including extra columns in X does not alter this.

Page 26: Database Principles Relational Database Design II.

Reasoning Rule #6:

• Augmentation+:

If X --> Y and A --> B then X, A --> Y, B

• The argument:
  – This says that if X determines Y and A determines B, then X together with A determine Y together with B.
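Augmentation+ does not have to be taken on faith. One way to see it, sketched here using only Augmentation (Rule #5) and Composition (Rule #1) as stated on the earlier slides, is:

```latex
\begin{align*}
X \to Y \;&\Rightarrow\; X, A \to Y && \text{(Augmentation: add $A$ to the left)}\\
A \to B \;&\Rightarrow\; X, A \to B && \text{(Augmentation: add $X$ to the left)}\\
X, A \to Y \ \text{and}\ X, A \to B \;&\Rightarrow\; X, A \to Y, B && \text{(Composition)}
\end{align*}
```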

Page 27: Database Principles Relational Database Design II.

The 6 Reasoning Rules for FDs:

• Composition/Decomposition: X --> A, B is equivalent to X --> A and X --> B

• Identity: X --> X

• Transitivity: If X --> Y and Y --> Z then X --> Z

• Trivial: X --> ∅

• Augmentation: If X --> Y then X, A --> Y

• Augmentation+: If X --> Y and A --> B then X, A --> Y, B

Page 28: Database Principles Relational Database Design II.

Be Careful:

• Notice, for example, that you can’t reason Augmentation backwards. You can’t say:

If X, A --> Y then X --> Y

• A simple counter-example:

CourseNumber, SectionNumber --> ProfessorName

holds, but neither

SectionNumber --> ProfessorName

nor

CourseNumber --> ProfessorName

is true.
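A few rows are enough to see the counter-example at work. The sketch below uses invented course data and a hypothetical helper `determines`. A `False` result refutes an FD outright, while a `True` result only says that these particular rows do not contradict it.

```python
# Hypothetical rows: (CourseNumber, SectionNumber, ProfessorName)
sections = [
    ("CS101", 1, "Smith"),
    ("CS101", 2, "Jones"),
    ("CS202", 1, "Jones"),
]


def determines(rows, lhs_idx, rhs_idx):
    """True if no two rows agree on the lhs columns but differ on the rhs column."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs_idx)
        # setdefault stores the first rhs value seen for this key and returns it.
        if seen.setdefault(key, row[rhs_idx]) != row[rhs_idx]:
            return False
    return True


both = determines(sections, (0, 1), 2)       # CourseNumber, SectionNumber
course_only = determines(sections, (0,), 2)  # CourseNumber alone
section_only = determines(sections, (1,), 2) # SectionNumber alone
print(both, course_only, section_only)  # True False False
```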

Page 29: Database Principles Relational Database Design II.

Exercise:

• Consider the table:

• The following sentences describe the data in the table:

StudID CrsID SecID Semester StudName CrsDesc Grade Bldg RoomNum RoomCap TTSlot

sid cid sec sem sn desc gd bg rn cap tt

A student, sid, whose name is sn enrolls in section sec of course cid in semester sem. The course description is desc. The student gets a grade of gd. The course was held in room rn of the building bg. The capacity of the room is cap. The time table slot for the course was tt.

Page 30: Database Principles Relational Database Design II.

Exercise:

• To find FDs we need to focus on Enterprise Rules

StudID CrsID SecID Semester StudName CrsDesc Grade Bldg RoomNum RoomCap TTSlot

sid cid sec sem sn desc gd bg rn cap tt

StudID --> StudName

CrsID --> CrsDesc

Bldg, RoomNum --> RoomCap

CrsID, SecID, Semester --> TTSlot

CrsID, SecID, Semester --> Bldg

CrsID, SecID, Semester --> RoomNum

StudID, CrsID, SecID, Semester --> Grade

Bldg, RoomNum, TTSlot, Semester --> CrsID, SecID

CrsID, SecID, Semester --> RoomCap, TTSlot, Bldg

Reason: Composition + Transitivity

Page 31: Database Principles Relational Database Design II.

Exercise:

• Find the FDs in this table:

Cardholder, pk = {borrowerid}

borrowerid  b_name  b_addr  b_status  loan_limit

borrowerid --> b_name, b_addr, b_status, loan_limit    (good)

b_status --> loan_limit    (bad)

b_name, b_addr --> borrowerid (?)    (good)

NOTE: The last FD, if it is an FD, shows us an important fact; namely if Y determines a key then Y too is a key. Reason: If X is a key then X --> R. If Y --> X then by transitivity Y --> R and so Y is a key too.

Page 32: Database Principles Relational Database Design II.

Finding Candidate keys:

• In order to know if an FD is good or bad, and so know if it belongs in the table or not, we need to know if a given set of columns (the columns in the left hand side of the FD) is a key (contains a candidate key).

• To find all keys we must first find all candidate keys.

• Every key contains a candidate key.

Page 33: Database Principles Relational Database Design II.

Relationship Between FDs and Keys

• Can we recognize a set of columns as being a key by looking at a FD?

• Recall that a key is a set of columns whose values uniquely determine the remaining values in a row.

• Another way of putting this is:

if X is a key to R and r1 and r2 are rows of R then: r1[X] = r2[X] implies r1 = r2

In other words, if you know X you know everything (R):

X --> R

• X --> R characterizes a key.

Page 34: Database Principles Relational Database Design II.

Exercise:

• X is a key to R if and only if X --> (R \ X)

R = X ∪ (R \ X)

If X is a key to R we know X --> R by definition, so

X --> X, (R \ X)

From the Decomposition Rule we can say X --> X and X --> (R \ X). In other words,

X --> (R \ X)

Now suppose X --> (R \ X). We already know X --> X by the Identity Rule. By the Composition Rule we can combine the last two FDs:

X --> X, (R \ X) = R

So X is a key.

Page 35: Database Principles Relational Database Design II.

Find Candidate Keys:

• Consider:

R: A B C D E F

• Let’s assume that the FDs we know about are:
  – A, B --> C -- FD 1
  – D, E --> B -- FD 2
  – F --> D -- FD 3
  – B, E --> F -- FD 4
  – D --> A -- FD 5

Page 36: Database Principles Relational Database Design II.

Find Candidate Keys (2):

R: A B C D E F

A, B --> C -- FD 1
D, E --> B -- FD 2
F --> D -- FD 3
B, E --> F -- FD 4
D --> A -- FD 5

We know we are looking for something like X --> R \ X where we can’t make X any smaller. This means an FD with all six columns, either on the left or the right hand side.

(i) D --> A -- FD5
(ii) D, B --> A, B -- (i), Identity(B), Aug+
(iii) D, B --> C -- (ii), FD1, Tran

(v) D, E --> B, D -- FD2, Identity(D), Aug+
(vi) D, E --> C -- (v), (iii), Tran

(viii) D, E --> B, E -- FD2, Iden(E), Aug+
(ix) D, E --> F -- (viii), FD4, Tran

(x) D, E --> A -- FD5, Aug
(xi) D, E --> A, B, C, F -- (x), FD2, (vi), (ix), Comp

So {D, E} is a key since it determines all other columns. It is a candidate key if we can’t remove either D or E. If we could then it should be possible to prove either D --> E or E --> D using only the original 5 FDs.

Clearly D --> E is impossible since nothing determines E (E is missing from all right-hand sides).

E --> D is also impossible. The best we can do is B, E --> F --> D.
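The hand derivation above can be cross-checked mechanically. The sketch below uses the standard attribute-closure computation, a technique the slides do not name: start from a set of columns and keep adding the right-hand side of every FD whose left-hand side is already covered. The function name `closure` and the set-based encoding of the FDs are assumptions for illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the whole left-hand side, we learn the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


R = set("ABCDEF")
FDS = [
    (set("AB"), set("C")),  # FD 1
    (set("DE"), set("B")),  # FD 2
    (set("F"), set("D")),   # FD 3
    (set("BE"), set("F")),  # FD 4
    (set("D"), set("A")),   # FD 5
]

print(closure("DE", FDS) == R)     # True: {D, E} determines every column
print(sorted(closure("D", FDS)))   # ['A', 'D']: D alone is not a key
print(sorted(closure("E", FDS)))   # ['E']: E alone is not a key
```

Since {D, E} reaches all of R while neither D nor E does on its own, the output agrees with the conclusion that {D, E} is a candidate key.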

Page 37: Database Principles Relational Database Design II.

Finding Candidate Keys (3)

• Conclusion: {D, E} is a candidate key to the table R.

• Are there any other candidate keys? You must always ask and try to answer this question.

Lemma: If X is a key to R and Y --> X then Y is a key to R.

Proof: Y --> X and X --> R so Y --> R. Hence Y is a key to R.

So can we solve the following query?

? --> D, E

Page 38: Database Principles Relational Database Design II.

Finding Candidate Keys (4)

• Solve: ? --> D, E

A, B --> C -- FD 1
D, E --> B -- FD 2
F --> D -- FD 3
B, E --> F -- FD 4
D --> A -- FD 5

F --> D -- FD 3

(xii) F, E --> D, E -- FD 3, Ident(E), Aug+

• So {F, E} is a key to R. It is a CK if it can't be made any smaller. Making it “smaller” means that either F --> E or E --> F.

• F --> E is impossible since once again, E does not appear on the right-hand-side of any original FD.

• E --> F is impossible too: B, E --> F is the best we can do.

• {F, E} is another CK.

Page 39: Database Principles Relational Database Design II.

Finding Candidate Keys (5)

• Solve: ? --> F, E

A, B --> C -- FD 1
D, E --> B -- FD 2
F --> D -- FD 3
B, E --> F -- FD 4
D --> A -- FD 5

B, E --> F -- FD 4

(xiii) B, E --> F, E -- FD 4, Ident(E), Aug+

• So {B, E} is a key to R. It is a CK if it can't be made any smaller. Making it “smaller” means that either B --> E or E --> B.

• B --> E is impossible since once again, E does not appear on the right-hand-side of any original FD.

• E --> B is impossible too: D, E --> B is the best we can do.

• {B, E} is another CK.

Page 40: Database Principles Relational Database Design II.

Observations:

• The candidate keys are {B, E}, {D, E} and {F, E}.

• If an attribute (E in the last example) does not appear on the right-hand-side of any basic FD then it belongs to every candidate key.
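Both observations can be checked by brute force on a schema this small. The sketch below is not a method from the slides: it tries every subset of columns in order of size, keeps those whose closure is all of R, and skips supersets of keys already found. The names `closure` and `candidate_keys` are hypothetical, and the FDs are the five from the worked example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Enumerate the minimal column sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            # A superset of a key found earlier is a super key, not a candidate key.
            if any(k <= candidate for k in keys):
                continue
            if closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys


FDS = [
    (set("AB"), set("C")),  # FD 1
    (set("DE"), set("B")),  # FD 2
    (set("F"), set("D")),   # FD 3
    (set("BE"), set("F")),  # FD 4
    (set("D"), set("A")),   # FD 5
]

keys = candidate_keys("ABCDEF", FDS)
print([sorted(k) for k in keys])  # [['B', 'E'], ['D', 'E'], ['E', 'F']]

# Attributes that never appear on a right-hand side must be in every candidate key.
on_some_rhs = set().union(*(rhs for _, rhs in FDS))
never_on_rhs = set("ABCDEF") - on_some_rhs
print(never_on_rhs, all(never_on_rhs <= k for k in keys))  # {'E'} True
```

The enumeration is exponential in the number of columns, so it is only a sanity check for classroom-sized schemas.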

Page 41: Database Principles Relational Database Design II.

Find Candidate Keys Exercise:

• Consider:

R: A B C D E F

• Let’s assume that the FDs we know about are:
  – D --> C, A -- FD 1
  – B --> F -- FD 2
  – F --> C, E -- FD 3
  – B --> D -- FD 4

Page 42: Database Principles Relational Database Design II.

Answer

D --> C, A -- FD 1
B --> F -- FD 2
F --> C, E -- FD 3
B --> D -- FD 4

• OBS: B belongs to every CK (it does not appear on the right-hand side of any FD).

B --> F -- FD2
B --> D -- FD4
(i) B --> C, A -- FD4, FD1, transitivity
(ii) F --> E -- FD3, decomposition
(iii) B --> E -- FD2, (ii), transitivity
B --> A, C, D, E, F -- (i), FD4, (iii), FD2, composition

• The above shows {B} is a key. Since it is a singleton, it is a candidate key.

• Every candidate key must contain B, and a candidate key cannot be strictly larger than another key, so any other candidate key is also {B}.

• {B} is the only candidate key.

Page 43: Database Principles Relational Database Design II.

Find Candidate Keys Exercise:

• Consider:

R: A B C D E F

• Let’s assume that the FDs we know about are:
  – B, E --> C -- FD 1
  – A, F --> B -- FD 2
  – C --> A, D -- FD 3
  – B --> E -- FD 4

Page 44: Database Principles Relational Database Design II.

Answer

B, E --> C -- FD 1
A, F --> B -- FD 2
C --> A, D -- FD 3
B --> E -- FD 4

OBS: F belongs to every CK.

(i) B --> B, E -- FD4, ident(B), aug+
(ii) B --> C -- (i), FD1, trans
(iii) B --> A, D -- (ii), FD3, transitivity
(iv) B --> A, C, D, E -- (i)-(iii), composition
(v) B, F --> A, C, D, E -- (iv), aug

• The above shows {B, F} is a key. F cannot be removed (it belongs to all CKs).

• To remove B we would have to derive it from F alone. The only FD with B on the right-hand side is FD 2, which needs A and F on the left. The only FD that yields A is FD 3, which needs C. The only FD that yields C is FD 1, which needs B.

• Hence in order to derive B we must already have B, so B cannot be removed.

• Hence {B, F} is a candidate key.

Page 45: Database Principles Relational Database Design II.

Answer (Additional Candidate Keys)

• {B, F} is a candidate key.

• F belongs to every CK.

B, E --> C -- FD 1
A, F --> B -- FD 2
C --> A, D -- FD 3
B --> E -- FD 4

? --> B, F

(vi) A, F --> B -- FD2
(vii) A, F --> B, F -- (vi), ident(F), aug+

• Hence {A, F} is a key. It is a CK because
  – F can't be removed (see above)
  – A can't be removed since {F} is not a key

C, F --> A, F -- FD3, decomp, ident(F), aug+

• {C, F} is a third CK.

Page 46: Database Principles Relational Database Design II.

Finding Candidate keys:

• Observation: For any attribute that appears on the right-hand side of an FD (and not on its left-hand side), there is a candidate key that excludes it.

Suppose X --> A and A is not in X. Then X, R \ (X ∪ {A}) --> A by Augmentation. Since all attributes of R appear in the above FD, the left hand side is a key and so by definition contains a candidate key. Since A does not appear on the left hand side, the CK does not contain A.
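The argument is constructive, and a short Python sketch can follow it step by step: start from the key R \ {A}, then drop columns one at a time for as long as what remains is still a key. The greedy shrinking step and the names `closure` and `shrink_to_candidate_key` are assumptions added for illustration. The FDs are those of the last exercise (B, E --> C; A, F --> B; C --> A, D; B --> E).

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def shrink_to_candidate_key(superkey, schema, fds):
    """Drop attributes one at a time while the remainder is still a key."""
    key = set(superkey)
    for attr in sorted(superkey):
        if closure(key - {attr}, fds) == set(schema):
            key.discard(attr)
    return key


R = "ABCDEF"
FDS = [
    (set("BE"), set("C")),  # FD 1
    (set("AF"), set("B")),  # FD 2
    (set("C"), set("AD")),  # FD 3
    (set("B"), set("E")),   # FD 4
]

found = {}
for a in "ABCDE":  # every attribute that appears on some right-hand side
    start = set(R) - {a}  # the key built in the argument above
    found[a] = shrink_to_candidate_key(start, R, FDS)
    print(a, "is excluded by candidate key", sorted(found[a]))
```

Because the closure can only shrink when columns are removed, a column that could not be dropped earlier can never become droppable later, so a single pass leaves a set that cannot be made any smaller.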