RELATIONAL DATABASE DESIGN Good Database Design Principles

Page 1

Basic Concepts
• a database is a collection of logically related records
• a relational database stores its data in two-dimensional tables
• a table is a two-dimensional structure made up of rows (tuples, records) and columns (attributes, fields)
• example: a table of students engaged in sports activities, where a student is allowed to participate in at most one activity:

StudentID  Activity  Fee
100        Skiing    200
150        Swimming  50
175        Squash    50
200        Swimming  50

Table Characteristics
• each row is unique and stores data about one entity
• row order is unimportant
• each column has a unique attribute name
• each column (attribute) description (metadata) is stored in the database
  • in Access, metadata is stored and manipulated via the Table Design View grid
• column order is unimportant
• all entries in a column have the same data type
  • Access examples: Text(50), Number(Integer), Date/Time
• each cell contains atomic data: no lists or sub-tables
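The table characteristics above can be sketched in code. This is a minimal illustration, assuming Python's built-in sqlite3 module in place of Access and a hypothetical table name STUDENT_ACTIVITY: each column gets a unique name and a declared type, and declaring StudentID as the primary key makes the database reject a second row for the same student.

```python
import sqlite3

# In-memory database; SQLite stands in for Access in this sketch.
conn = sqlite3.connect(":memory:")

# Each column has a unique name and a declared type; StudentID is the
# primary key, so no two rows can share a StudentID.
conn.execute("""
    CREATE TABLE STUDENT_ACTIVITY (
        StudentID INTEGER PRIMARY KEY,
        Activity  TEXT NOT NULL,
        Fee       INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO STUDENT_ACTIVITY VALUES (?, ?, ?)",
    [(100, "Skiing", 200), (150, "Swimming", 50),
     (175, "Squash", 50), (200, "Swimming", 50)],
)

# A second row for student 100 would break row uniqueness, so it is rejected.
try:
    conn.execute("INSERT INTO STUDENT_ACTIVITY VALUES (100, 'Golf', 65)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Row order is unimportant: ask for an order explicitly when you need one.
rows = conn.execute(
    "SELECT StudentID, Activity, Fee FROM STUDENT_ACTIVITY ORDER BY StudentID"
).fetchall()
print(duplicate_rejected, rows)
```

Unlike Access, SQLite treats declared column types as affinities rather than strict rules (unless the table is declared STRICT), so the data-type characteristic is only loosely enforced in this sketch.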

Page 2

Primary Keys
• a primary key is an attribute or a collection of attributes whose value(s) uniquely identify each row in a relation
• a primary key should be minimal: it should not contain unnecessary attributes
• we assume that a student is allowed to participate in at most one activity:

StudentID  Activity  Fee
100        Skiing    200
150        Swimming  50
175        Squash    50
200        Swimming  50

• the only possible primary key in the above table is StudentID
• sometimes there is more than one possible choice; each possible choice is called a candidate key
• what if we allow the students to participate in more than one activity?

StudentID  Activity  Fee
100        Skiing    200
100        Golf      65
175        Squash    50
175        Swimming  50
200        Swimming  50
200        Golf      65

• now the only possible primary key is the combined value of (StudentID, Activity)
• such a multi-attribute primary key is called a composite key or concatenated key
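One way to see why StudentID works as a key for the first table but not the second is to test uniqueness directly against the sample rows. The helper is_unique below is hypothetical, written for this illustration only. Note that sample data can only rule a key out; whether an attribute set really is a key depends on the business rule (here, whether a student may take more than one activity).

```python
# Hypothetical helper: do the given columns take distinct values in every row?
def is_unique(rows, columns):
    seen = set()
    for row in rows:
        value = tuple(row[c] for c in columns)
        if value in seen:
            return False
        seen.add(value)
    return True

one_activity = [
    {"StudentID": 100, "Activity": "Skiing", "Fee": 200},
    {"StudentID": 150, "Activity": "Swimming", "Fee": 50},
    {"StudentID": 175, "Activity": "Squash", "Fee": 50},
    {"StudentID": 200, "Activity": "Swimming", "Fee": 50},
]
many_activities = [
    {"StudentID": 100, "Activity": "Skiing", "Fee": 200},
    {"StudentID": 100, "Activity": "Golf", "Fee": 65},
    {"StudentID": 175, "Activity": "Squash", "Fee": 50},
    {"StudentID": 175, "Activity": "Swimming", "Fee": 50},
    {"StudentID": 200, "Activity": "Swimming", "Fee": 50},
    {"StudentID": 200, "Activity": "Golf", "Fee": 65},
]

print(is_unique(one_activity, ["StudentID"]))                 # True
print(is_unique(many_activities, ["StudentID"]))              # False
print(is_unique(many_activities, ["StudentID", "Activity"]))  # True
```

Since neither StudentID nor Activity is unique on its own in the second table, the pair (StudentID, Activity) is also minimal, which is what makes it a composite key rather than just a unique combination.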

Page 3

Composite Keys
• a table can have only one primary key
• but sometimes the primary key can be made up of several fields
• concatenation means putting two things next to one another: the concatenation of “burger” and “foo” is “burgerfoo”
• consider the following table of cars:

LicensePlate  State  Make    Model     Year
LVR120        NJ     Honda   Accord    2003
BCX50P        NJ     Buick   Regal     1998
LVR120        CT     Toyota  Corolla   2002
908HYY        MA     Ford    Windstar  2001
UHP33X        NJ     Nissan  Altima    2006

• LicensePlate is not a possible primary key, because two different cars can have the same license plate number if they are from different states
• but if we concatenate LicensePlate and State, the resulting value of (LicensePlate, State) must be unique
  • example: “LVR120NJ” and “LVR120CT”
• therefore, (LicensePlate, State) is a possible primary key (a candidate key)
• sometimes we may invent a new attribute to serve as a primary key (sometimes called a synthetic key)
  • if no suitable primary key is available
  • or, to avoid composite keys
  • in Access, “AutoNumber” fields can serve this purpose
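A composite key and a synthetic key can both be declared in SQL. The sketch below assumes SQLite (through Python's sqlite3 module) rather than Access; the table names CAR and CAR2 are hypothetical. In SQLite, a column declared INTEGER PRIMARY KEY is numbered automatically when no value is supplied, which plays roughly the role of an Access AutoNumber field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Composite primary key: neither column alone is unique, the pair is.
conn.execute("""
    CREATE TABLE CAR (
        LicensePlate TEXT NOT NULL,
        State        TEXT NOT NULL,
        Make         TEXT,
        Model        TEXT,
        Year         INTEGER,
        PRIMARY KEY (LicensePlate, State)
    )
""")
conn.executemany(
    "INSERT INTO CAR VALUES (?, ?, ?, ?, ?)",
    [
        ("LVR120", "NJ", "Honda", "Accord", 2003),
        ("BCX50P", "NJ", "Buick", "Regal", 1998),
        ("LVR120", "CT", "Toyota", "Corolla", 2002),  # same plate, other state: allowed
        ("908HYY", "MA", "Ford", "Windstar", 2001),
        ("UHP33X", "NJ", "Nissan", "Altima", 2006),
    ],
)

# Repeating an existing (LicensePlate, State) pair is rejected.
try:
    conn.execute("INSERT INTO CAR VALUES ('LVR120', 'NJ', 'Ford', 'Focus', 2005)")
    pair_rejected = False
except sqlite3.IntegrityError:
    pair_rejected = True

# Synthetic-key alternative: CarID is numbered automatically (1, 2, ...),
# roughly like an Access AutoNumber field.
conn.execute("""
    CREATE TABLE CAR2 (
        CarID        INTEGER PRIMARY KEY,
        LicensePlate TEXT NOT NULL,
        State        TEXT NOT NULL,
        UNIQUE (LicensePlate, State)
    )
""")
conn.execute("INSERT INTO CAR2 (LicensePlate, State) VALUES ('LVR120', 'NJ')")
conn.execute("INSERT INTO CAR2 (LicensePlate, State) VALUES ('LVR120', 'CT')")
car_ids = [r[0] for r in conn.execute("SELECT CarID FROM CAR2 ORDER BY CarID")]
print(pair_rejected, car_ids)
```

Keeping the UNIQUE (LicensePlate, State) constraint on the synthetic-key version is a design choice: the invented CarID identifies rows conveniently, but the real-world rule that a plate is unique within a state still has to be enforced somewhere.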

Page 4

Foreign Keys
• a foreign key is an attribute or a collection of attributes whose values are intended to match the primary key of some related record (usually in a different table)
• example: the STATE and CITY tables below

STATE table:

StateAbbrev  StateName     UnionOrder  StateBird       StatePopulation
CT           Connecticut   5           American robin  3,287,116
MI           Michigan      26          robin           9,295,297
SD           South Dakota  40          pheasant        696,004
TN           Tennessee     16          mockingbird     4,877,185
TX           Texas         28          mockingbird     16,986,510

CITY table:

StateAbbrev  CityName   CityPopulation
CT           Hartford   139,739
CT           Madison    14,031
CT           Portland   8,418
MI           Lansing    127,321
SD           Madison    6,257
SD           Pierre     12,906
TN           Nashville  488,374
TX           Austin     465,622
TX           Portland   12,224

• primary key in STATE relation: StateAbbrev
• primary key in CITY relation: (StateAbbrev, CityName)
• foreign key in CITY relation: StateAbbrev
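The foreign key relationship between CITY and STATE can be declared so that the database checks it. This is a minimal sketch, assuming SQLite via Python's sqlite3 module; note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON. The state code 'ZZ' and city 'Nowhere' are made-up values used to trigger the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("""
    CREATE TABLE STATE (
        StateAbbrev     TEXT PRIMARY KEY,
        StateName       TEXT NOT NULL,
        UnionOrder      INTEGER,
        StateBird       TEXT,
        StatePopulation INTEGER
    )
""")
conn.execute("""
    CREATE TABLE CITY (
        StateAbbrev    TEXT NOT NULL REFERENCES STATE (StateAbbrev),
        CityName       TEXT NOT NULL,
        CityPopulation INTEGER,
        PRIMARY KEY (StateAbbrev, CityName)
    )
""")
conn.executemany("INSERT INTO STATE VALUES (?, ?, ?, ?, ?)", [
    ("CT", "Connecticut", 5, "American robin", 3287116),
    ("MI", "Michigan", 26, "robin", 9295297),
    ("SD", "South Dakota", 40, "pheasant", 696004),
    ("TN", "Tennessee", 16, "mockingbird", 4877185),
    ("TX", "Texas", 28, "mockingbird", 16986510),
])
conn.executemany("INSERT INTO CITY VALUES (?, ?, ?)", [
    ("CT", "Hartford", 139739), ("CT", "Madison", 14031),
    ("CT", "Portland", 8418), ("MI", "Lansing", 127321),
    ("SD", "Madison", 6257), ("SD", "Pierre", 12906),
    ("TN", "Nashville", 488374), ("TX", "Austin", 465622),
    ("TX", "Portland", 12224),
])

# A city whose StateAbbrev matches no STATE row violates the foreign key.
try:
    conn.execute("INSERT INTO CITY VALUES ('ZZ', 'Nowhere', 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Following the foreign key back to STATE with a join.
madisons = conn.execute("""
    SELECT c.CityName, s.StateName
    FROM CITY AS c JOIN STATE AS s ON c.StateAbbrev = s.StateAbbrev
    WHERE c.CityName = 'Madison'
    ORDER BY s.StateName
""").fetchall()
print(orphan_rejected, madisons)
```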

Page 5

Outline Notation
STATE(StateAbbrev, StateName, UnionOrder, StateBird, StatePopulation)
CITY(StateAbbrev, CityName, CityPopulation)
    StateAbbrev foreign key to STATE
• underline all parts of each primary key (here, StateAbbrev in STATE, and both StateAbbrev and CityName in CITY)
• note foreign keys with “attribute foreign key to TABLE”

Entity-Relationship Diagrams
• one-to-many relationships: to determine the direction, always start with “one”
  • “one city is in one state”
  • “one state contains many cities”
• the foreign key is always on the “many” side; otherwise it could not be atomic (it would have to be a list)
• we will study other kinds of relationships (one-to-one and many-to-many) shortly

Page 6

Functional Dependency
• attribute B is functionally dependent on attribute A (written A → B) if, given a value of attribute A, there is only one possible corresponding value of attribute B
  • that is, any two rows with the same value of A must have the same value for B
• attribute A is the determinant of attribute B if attribute B is functionally dependent on attribute A
• in the STATE relation above, StateAbbrev is a determinant of all other attributes
• in the STATE relation, the attribute StateName is also a determinant of all other attributes
• so, StateAbbrev and StateName are both candidate keys for STATE
• in the CITY relation above, the attributes (StateAbbrev, CityName) together are a determinant of the attribute CityPopulation
• in the CITY relation, the attribute CityName is not a determinant of the attribute CityPopulation, because multiple cities in the table may have the same name (e.g., Madison, CT and Madison, SD)
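The rule “any two rows with the same value of A must have the same value for B” translates directly into a check over a set of rows. The function fd_holds is a hypothetical helper for illustration; as with keys, a sample table can show that a dependency fails but cannot prove that it always holds.

```python
# Hypothetical helper: does lhs -> rhs hold in the given rows?
# (Any two rows that agree on lhs must also agree on rhs.)
def fd_holds(rows, lhs, rhs):
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

city = [
    {"StateAbbrev": s, "CityName": n, "CityPopulation": p}
    for s, n, p in [
        ("CT", "Hartford", 139739), ("CT", "Madison", 14031),
        ("CT", "Portland", 8418), ("MI", "Lansing", 127321),
        ("SD", "Madison", 6257), ("SD", "Pierre", 12906),
        ("TN", "Nashville", 488374), ("TX", "Austin", 465622),
        ("TX", "Portland", 12224),
    ]
]

print(fd_holds(city, ["StateAbbrev", "CityName"], ["CityPopulation"]))  # True
print(fd_holds(city, ["CityName"], ["CityPopulation"]))                 # False
```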

Page 7

Dependency Diagrams
• a dependency diagram or bubble diagram is a pictorial representation of functional dependencies
• an attribute is represented by an oval
• you draw an arrow from A to B when attribute A is a determinant of attribute B
• example: when students were only allowed one sports activity, we have ACTIVITY(StudentID, Activity, Fee), with primary key StudentID
• example: when students can have multiple activities, we have ACTIVITY(StudentID, Activity, Fee), with composite primary key (StudentID, Activity)

[Dependency diagram: ovals for StudentID, Activity, and Fee]

Page 8

Partial Dependencies
• a partial dependency is a functional dependency whose determinant is part of the primary key (but not all of it)
• example: ACTIVITY(StudentID, Activity, Fee) with composite primary key (StudentID, Activity): Activity → Fee is a partial dependency

Transitive Dependencies
• a transitive dependency is a functional dependency whose determinant is not the primary key, part of the primary key, or a candidate key
• example: ACTIVITY(StudentID, Activity, Fee) with primary key StudentID: Activity → Fee is a transitive dependency

[Dependency diagram: ovals for StudentID, Activity, and Fee]
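The partial/transitive distinction depends only on how the determinant relates to the keys, so it can be written as a small classification function. This sketch follows the two definitions above literally, with one added assumption: a determinant that contains a whole key (a superkey) is treated as acceptable. The function name classify and its arguments are hypothetical.

```python
def classify(determinant, primary_key, candidate_keys=()):
    """Classify a functional dependency by its determinant.

    determinant, primary_key: lists of attribute names.
    candidate_keys: other candidate keys, each given as a list of attribute names.
    """
    d = frozenset(determinant)
    pk = frozenset(primary_key)
    keys = [pk] + [frozenset(k) for k in candidate_keys]
    if any(k <= d for k in keys):
        return "ok"          # the whole primary key, a candidate key, or a superset of one
    if d < pk:
        return "partial"     # part of the primary key, but not all of it
    return "transitive"      # not the primary key, part of it, or a candidate key

# Several activities per student: key (StudentID, Activity), Activity -> Fee.
print(classify(["Activity"], ["StudentID", "Activity"]))          # partial
# One activity per student: key StudentID, Activity -> Fee.
print(classify(["Activity"], ["StudentID"]))                      # transitive
# STATE: StateName is a candidate key, so StateName -> StateBird is fine.
print(classify(["StateName"], ["StateAbbrev"], [["StateName"]]))  # ok
```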

Page 9

Database Anomalies
• anomalies are problems caused by bad database design
• example: ACTIVITY(StudentID, Activity, Fee)

StudentID  Activity  Fee
100        Skiing    200
100        Golf      65
175        Squash    50
175        Swimming  50
200        Swimming  50
200        Golf      65

• an insertion anomaly occurs when a row cannot be added to a relation because not all data are available (or one has to invent “dummy” data)
  • example: we want to store that scuba diving costs $175, but have no place to put this information until a student takes up scuba diving (unless we create a fake student)
• a deletion anomaly occurs when data is deleted from a relation and other critical data are unintentionally lost
  • example: if we delete the rows with StudentID = 100, we forget that skiing costs $200
• an update anomaly occurs when one must make many changes to reflect the modification of a single datum
  • example: if the cost of swimming changes, then every row whose Activity is Swimming must be changed too
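The three anomalies can be reproduced on the ACTIVITY table. This is a sketch assuming SQLite via Python's sqlite3 module. The key columns are declared NOT NULL because SQLite otherwise allows NULL in a non-integer primary key, and the new swimming fee of 55 is an invented value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Un-normalized design: the fee is repeated on every row for that activity.
conn.execute("""
    CREATE TABLE ACTIVITY (
        StudentID INTEGER NOT NULL,
        Activity  TEXT NOT NULL,
        Fee       INTEGER,
        PRIMARY KEY (StudentID, Activity)
    )
""")
conn.executemany("INSERT INTO ACTIVITY VALUES (?, ?, ?)", [
    (100, "Skiing", 200), (100, "Golf", 65), (175, "Squash", 50),
    (175, "Swimming", 50), (200, "Swimming", 50), (200, "Golf", 65),
])

# Insertion anomaly: the scuba diving fee cannot be stored without a student.
try:
    conn.execute("INSERT INTO ACTIVITY VALUES (NULL, 'ScubaDiving', 175)")
    scuba_stored = True
except sqlite3.IntegrityError:
    scuba_stored = False

# Deletion anomaly: removing student 100 also removes the only record
# of what skiing costs.
conn.execute("DELETE FROM ACTIVITY WHERE StudentID = 100")
skiing_fee = conn.execute(
    "SELECT Fee FROM ACTIVITY WHERE Activity = 'Skiing'").fetchone()
print(skiing_fee)  # None

# Update anomaly: changing the swimming fee on only one row leaves the
# table contradicting itself.
conn.execute(
    "UPDATE ACTIVITY SET Fee = 55 WHERE StudentID = 175 AND Activity = 'Swimming'")
swim_fees = conn.execute(
    "SELECT DISTINCT Fee FROM ACTIVITY WHERE Activity = 'Swimming' ORDER BY Fee"
).fetchall()
print(swim_fees)  # [(50,), (55,)]
```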

Page 10

Cause of Anomalies
• anomalies are primarily caused by:
  • data redundancy: replication of the same field in multiple tables, other than foreign keys
  • functional dependencies whose determinants are not candidate keys, including
    • partial dependencies
    • transitive dependencies
• example: ACTIVITY(StudentID, Activity, Fee)

StudentID  Activity  Fee
100        Skiing    200
100        Golf      65
175        Squash    50
175        Swimming  50
200        Swimming  50
200        Golf      65

• Activity by itself is not a candidate key, so we get anomalies (in this case, from a partial dependency)

[Dependency diagram: ovals for StudentID, Activity, and Fee]

Page 11

Fixing Anomalies (Normalizing)
• break up tables so all dependencies are from primary (or candidate) keys

PARTICIPATING(StudentID, Activity)
    Activity foreign key to ACTIVITIES
ACTIVITIES(Activity, Fee)

PARTICIPATING table:

StudentID  Activity
100        Skiing
100        Golf
150        Swimming
175        Squash
175        Swimming
200        Swimming
200        Golf

ACTIVITIES table:

Activity     Fee
Skiing       200
Golf         65
Swimming     50
Squash       50
ScubaDiving  200

Page 12

• the above relations do not have any of the anomalies
  • we can add the cost of scuba diving in ACTIVITIES even though no one has taken it in PARTICIPATING
  • if StudentID 100 drops Skiing, no skiing-related data will be lost
  • if the cost of swimming changes, that cost need only be changed in one place (the ACTIVITIES table)
• the Activity field is in both tables, but that is needed to relate (“join”) the information in the two tables

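Repeating the same kinds of operations against the split design shows each anomaly disappearing. The sketch again assumes SQLite via Python's sqlite3 module, uses the PARTICIPATING and ACTIVITIES names from the outline notation, and uses an invented new swimming fee of 55.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE ACTIVITIES (Activity TEXT PRIMARY KEY, Fee INTEGER NOT NULL)")
conn.execute("""
    CREATE TABLE PARTICIPATING (
        StudentID INTEGER NOT NULL,
        Activity  TEXT NOT NULL REFERENCES ACTIVITIES (Activity),
        PRIMARY KEY (StudentID, Activity)
    )
""")
conn.executemany("INSERT INTO ACTIVITIES VALUES (?, ?)", [
    ("Skiing", 200), ("Golf", 65), ("Swimming", 50), ("Squash", 50),
])
conn.executemany("INSERT INTO PARTICIPATING VALUES (?, ?)", [
    (100, "Skiing"), (100, "Golf"), (150, "Swimming"), (175, "Squash"),
    (175, "Swimming"), (200, "Swimming"), (200, "Golf"),
])

# No insertion anomaly: scuba diving gets a fee before anyone signs up.
conn.execute("INSERT INTO ACTIVITIES VALUES ('ScubaDiving', 200)")

# No deletion anomaly: student 100 drops skiing, and the skiing fee survives.
conn.execute("DELETE FROM PARTICIPATING WHERE StudentID = 100 AND Activity = 'Skiing'")
skiing_fee = conn.execute(
    "SELECT Fee FROM ACTIVITIES WHERE Activity = 'Skiing'").fetchone()[0]

# No update anomaly: a new swimming fee is changed in exactly one row.
changed = conn.execute(
    "UPDATE ACTIVITIES SET Fee = 55 WHERE Activity = 'Swimming'").rowcount

# The shared Activity column lets us join the two tables back together.
swimmers = conn.execute("""
    SELECT p.StudentID, a.Fee
    FROM PARTICIPATING AS p JOIN ACTIVITIES AS a ON p.Activity = a.Activity
    WHERE p.Activity = 'Swimming'
    ORDER BY p.StudentID
""").fetchall()
print(skiing_fee, changed, swimmers)
```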

Page 13

Good Database Design Principles
1. no redundancy
   • a field is stored in only one table, unless it happens to be a foreign key
   • replication of foreign keys is permissible, because they allow two tables to be joined together
2. no “bad” dependencies
   • in the dependency diagram of any relation in the database, the determinant should be the whole primary key, or a candidate key. Violations of this rule include:
     • partial dependencies
     • transitive dependencies

• normalization is the process of eliminating “bad” dependencies by splitting up tables and linking them with foreign keys
• “normal forms” are categories that classify how completely a table has been normalized
• there are six commonly recognized normal forms (NF):
  • First Normal Form (1NF)
  • Second Normal Form (2NF)
  • Third Normal Form (3NF)
  • Boyce-Codd Normal Form (BCNF)
  • Fourth Normal Form (4NF)
  • Fifth Normal Form (5NF)

Page 14

First Normal Form
• a table is said to be in first normal form (1NF) if all its attributes are atomic. Attributes that are not atomic go by the names
  • nested relations, nested tables, or sub-tables
  • repeating groups or repeating sections
  • list-valued attributes
• example of a table that is not in first normal form:

ClientID  ClientName         VetID  VetName  PetID  PetName  PetType
2173      Barbara Hennessey  27     PetVet   1      Sam      Bird
                                             2      Hoober   Dog
                                             3      Tom      Hamster
4519      Vernon Noordsy     31     PetCare  2      Charlie  Cat
8005      Sandra Amidon      27     PetVet   1      Beefer   Dog
                                             2      Kirby    Cat
8112      Helen Wandzell     24     PetsRUs  3      Kirby    Dog

CLIENT(ClientID, ClientName, VetID, VetName, PET(PetID, PetName, PetType))

• this kind of nested or hierarchical form is a very natural way for people to think about or view data
• however, the relational database philosophy claims that it may not be a very good way for computers to store some kinds of data
• over the years, a lot of information systems have stored data in this kind of format, but they were not relational databases

Page 15

• in order to eliminate the nested relation, pull out the nested relation and form a new table
• be sure to include the old key in the new table so that you can connect the tables back together

CLIENT(ClientID, ClientName, VetID, VetName)
PET(ClientID, PetID, PetName, PetType)
    ClientID foreign key to CLIENT

[Dependency diagrams for CLIENT (ClientID, ClientName, VetID, VetName) and PET (ClientID, PetID, PetName, PetType)]

• in this particular example, note that PetID is only unique within sets of pets with the same owner, so the primary key of PET is the composite (ClientID, PetID)
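The 1NF conversion (pull out the nested relation, copy the old key into it) can be sketched as a plain Python transformation. The nested records mirror the sample table above; the function name to_first_normal_form is hypothetical.

```python
# Nested (non-1NF) client records: each client carries a list of pets.
clients_nested = [
    {"ClientID": 2173, "ClientName": "Barbara Hennessey", "VetID": 27, "VetName": "PetVet",
     "Pets": [(1, "Sam", "Bird"), (2, "Hoober", "Dog"), (3, "Tom", "Hamster")]},
    {"ClientID": 4519, "ClientName": "Vernon Noordsy", "VetID": 31, "VetName": "PetCare",
     "Pets": [(2, "Charlie", "Cat")]},
    {"ClientID": 8005, "ClientName": "Sandra Amidon", "VetID": 27, "VetName": "PetVet",
     "Pets": [(1, "Beefer", "Dog"), (2, "Kirby", "Cat")]},
    {"ClientID": 8112, "ClientName": "Helen Wandzell", "VetID": 24, "VetName": "PetsRUs",
     "Pets": [(3, "Kirby", "Dog")]},
]

def to_first_normal_form(nested):
    """Split nested records into flat CLIENT and PET rows (all values atomic)."""
    client_rows, pet_rows = [], []
    for c in nested:
        client_rows.append((c["ClientID"], c["ClientName"], c["VetID"], c["VetName"]))
        for pet_id, pet_name, pet_type in c["Pets"]:
            # Copy the old key (ClientID) into each pet row so the tables can be rejoined.
            pet_rows.append((c["ClientID"], pet_id, pet_name, pet_type))
    return client_rows, pet_rows

client_rows, pet_rows = to_first_normal_form(clients_nested)
print(len(client_rows), len(pet_rows))  # 4 7

# PetID alone repeats across owners; (ClientID, PetID) does not.
pet_ids = [p[1] for p in pet_rows]
print(len(set(pet_ids)) == len(pet_ids))                       # False
print(len({(p[0], p[1]) for p in pet_rows}) == len(pet_rows))  # True
```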

Page 16

Second Normal Form
• recall: a partial dependency occurs when
  • you have a composite primary key
  • a non-key attribute depends on part of the primary key, but not all of it
• a table in 1NF is said to be in second normal form (2NF) if it does not contain any partial dependencies
• example of a partial dependency (Activity → Fee): ACTIVITY(StudentID, Activity, Fee) on pages 7, 8, and 10

[Dependency diagram: ovals for StudentID, Activity, and Fee]

• our new CLIENT-PET database does not have any partial dependencies
• so, it is already in second normal form
• but it still has a transitive dependency (ClientID → VetID → VetName):

[Dependency diagram for CLIENT: ClientID, ClientName, VetID, VetName]

Page 17

Third Normal Form
• recall: a transitive dependency happens when a non-key attribute depends on another non-key attribute, and that attribute could not have been used as an alternative primary key (the same applies to a combination of several attributes)
• a table in 2NF is said to be in third normal form (3NF) if it does not contain any transitive dependencies
• in order to eliminate the transitive dependency, we split the CLIENTS table again:

CLIENTS(ClientID, ClientName, VetID)
    VetID foreign key to VETS
PETS(ClientID, PetID, PetName, PetType)
    ClientID foreign key to CLIENTS
VETS(VetID, VetName)

[Dependency diagrams for CLIENT (ClientID, ClientName, VetID), PET (ClientID, PetID, PetName, PetType), and VET (VetID, VetName)]

Page 18

Third Normal Form (Cont.)
• CLIENTS-PETS-VETS database in third normal form:

VETS table:

VetID  VetName
27     PetVet
31     PetCare
24     PetsRUs

CLIENTS table:

ClientID  ClientName         VetID
2173      Barbara Hennessey  27
4519      Vernon Noordsy     31
8005      Sandra Amidon      27
8112      Helen Wandzell     24

PETS table:

ClientID  PetID  PetName  PetType
2173      1      Sam      Bird
2173      2      Hoober   Dog
2173      3      Tom      Hamster
4519      2      Charlie  Cat
8005      1      Beefer   Dog
8005      2      Kirby    Cat
8112      3      Kirby    Dog

• the database consists of three types of entities, stored as distinct relations in separate tables:
  • clients (CLIENTS)
  • pets (PETS)
  • vets (VETS)
• there is no redundancy (only foreign keys are replicated)
• there are no partial or transitive dependencies

[Figure: the CLIENTS-PETS-VETS database with MS Access table relationships]
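Declaring the three 3NF tables with their foreign keys, and joining them back together, shows that no information was lost in the split. This is a sketch assuming SQLite via Python's sqlite3 module rather than MS Access; the new vet name 'PetVet Clinic' is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE VETS (
        VetID   INTEGER PRIMARY KEY,
        VetName TEXT NOT NULL
    );
    CREATE TABLE CLIENTS (
        ClientID   INTEGER PRIMARY KEY,
        ClientName TEXT NOT NULL,
        VetID      INTEGER REFERENCES VETS (VetID)
    );
    CREATE TABLE PETS (
        ClientID INTEGER NOT NULL REFERENCES CLIENTS (ClientID),
        PetID    INTEGER NOT NULL,
        PetName  TEXT,
        PetType  TEXT,
        PRIMARY KEY (ClientID, PetID)
    );
""")
conn.executemany("INSERT INTO VETS VALUES (?, ?)",
                 [(27, "PetVet"), (31, "PetCare"), (24, "PetsRUs")])
conn.executemany("INSERT INTO CLIENTS VALUES (?, ?, ?)", [
    (2173, "Barbara Hennessey", 27), (4519, "Vernon Noordsy", 31),
    (8005, "Sandra Amidon", 27), (8112, "Helen Wandzell", 24),
])
conn.executemany("INSERT INTO PETS VALUES (?, ?, ?, ?)", [
    (2173, 1, "Sam", "Bird"), (2173, 2, "Hoober", "Dog"),
    (2173, 3, "Tom", "Hamster"), (4519, 2, "Charlie", "Cat"),
    (8005, 1, "Beefer", "Dog"), (8005, 2, "Kirby", "Cat"),
    (8112, 3, "Kirby", "Dog"),
])

# Joining on the foreign keys rebuilds the original client/vet/pet view.
view = conn.execute("""
    SELECT c.ClientName, v.VetName, p.PetName
    FROM PETS AS p
    JOIN CLIENTS AS c ON p.ClientID = c.ClientID
    JOIN VETS    AS v ON c.VetID = v.VetID
    ORDER BY c.ClientID, p.PetID
""").fetchall()

# Renaming a vet touches one row, however many clients use that vet.
conn.execute("UPDATE VETS SET VetName = 'PetVet Clinic' WHERE VetID = 27")
petvet_clients = conn.execute("""
    SELECT COUNT(*) FROM CLIENTS AS c JOIN VETS AS v ON c.VetID = v.VetID
    WHERE v.VetName = 'PetVet Clinic'
""").fetchone()[0]
print(len(view), view[0], petvet_clients)
```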

Page 19

Normal Forms and Normalization
• the distinctions between third normal form (3NF), Boyce-Codd normal form (BCNF), fourth normal form (4NF), and fifth normal form (5NF) are subtle
  • the difference between 3NF and BCNF has to do with overlapping sets of attributes that could be used as primary keys (composite candidate keys); 4NF and 5NF deal with multivalued and join dependencies
  • for our purposes, it is enough to know about 3NF
• you need to be able to put a database in 3NF
  • that is more important than recognizing 1NF and 2NF
• key factors to recognize 3NF:
  • all attributes are atomic: this gives you 1NF
  • every determinant in every relation is the whole primary key (or could have been chosen as an alternative primary key): this guarantees no partial or transitive dependencies
• redesigning a database so it is in 3NF is called normalization

Page 20

Example With Multiple Candidate Keys

DRIVER(License#, SocialSecurity#, Gender, BirthDate)

[Dependency diagram: License#, SocialSecurity#, Gender, BirthDate]

• the dependencies SocialSecurity# → Gender and SocialSecurity# → BirthDate are not considered transitive, because we could have chosen SocialSecurity# (instead of License#) as the primary key for the table
• this kind of design will not give rise to anomalies
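Whether an attribute could have been chosen as the primary key can be checked with the standard attribute-closure computation: repeatedly apply the functional dependencies until no new attributes are added, and see whether the result covers every attribute. The dependencies listed for DRIVER are an assumption (each driver has exactly one license number and one Social Security number), and closure is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Attributes functionally determined by `attrs` under the dependencies `fds`.

    fds is a list of (determinant, dependents) pairs of attribute sets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole determinant is already known, add its dependents.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

driver_attrs = {"License#", "SocialSecurity#", "Gender", "BirthDate"}
# Assumed dependencies: one license and one SSN per driver, and vice versa.
driver_fds = [
    ({"License#"}, {"SocialSecurity#", "Gender", "BirthDate"}),
    ({"SocialSecurity#"}, {"License#", "Gender", "BirthDate"}),
]

for candidate in ({"License#"}, {"SocialSecurity#"}, {"Gender"}):
    print(sorted(candidate), closure(candidate, driver_fds) == driver_attrs)
# ['License#'] True
# ['SocialSecurity#'] True
# ['Gender'] False
```

A single attribute whose closure is the whole relation is automatically minimal, so both License# and SocialSecurity# come out as candidate keys, while Gender does not.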

Page 21

Normalization Example: Hardware Store Database
• the ORDERS table:

OrderNum  CustCode  OrderDate  CustName  ProdDescr       ProdPrice  Quantity
10001     5217      11/22/94   Williams  Hammer          $8.99      2
10001     5217      11/22/94   Williams  Screwdriver     $4.45      1
10002     5021      11/22/94   Johnson   Clipper         $18.22     1
10002     5021      11/22/94   Johnson   Screwdriver     $4.45      3
10002     5021      11/22/94   Johnson   Crowbar         $11.07     1
10002     5021      11/22/94   Johnson   Saw             $14.99     1
10003     4118      11/22/94   Lorenzo   Hammer          $8.99      1
10004     6002      11/22/94   Kopiusko  Saw             $14.99     1
10004     6002      11/22/94   Kopiusko  Screwdriver     $4.45      2
10005     5021      11/23/94   Johnson   Cordless drill  $34.95     1

• note: in practice, we would also want to have product codes as well as descriptions, and use the product codes as keys to identify products. Here, we identify products by their ProdDescr to keep the number of fields down.

Page 22

Example: Hardware Store Database (Cont.)
ORDERS(OrderNum, ProdDescr, CustCode, OrderDate, CustName, ProdPrice, Quantity)

• conversion of the hardware store database to 2NF:

QUANTITY(OrderNum, ProdDescr, Quantity)
    OrderNum foreign key to ORDERS
    ProdDescr foreign key to PRODUCTS
PRODUCTS(ProdDescr, ProdPrice)
ORDERS(OrderNum, CustCode, OrderDate, CustName)

[Dependency diagrams for QUANTITY, PRODUCTS, and ORDERS; ORDERS still contains a transitive dependency (CustCode → CustName)]

Page 23

Example: Hardware Store Database (Cont.)
• conversion of the ORDERS relation to 3NF:

QUANTITY(OrderNum, ProdDescr, Quantity)
    OrderNum foreign key to ORDERS
    ProdDescr foreign key to PRODUCTS
PRODUCTS(ProdDescr, ProdPrice)
ORDERS(OrderNum, CustCode, OrderDate)
    CustCode foreign key to CUSTOMERS
CUSTOMERS(CustCode, CustName)

[Dependency diagrams for QUANTITY, PRODUCTS, ORDERS, and CUSTOMERS]
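The hardware store decomposition amounts to projecting the flat ORDERS rows onto each new table's key and dependent columns. The helper project below is hypothetical; it raises an error if the sample data contradicts the assumed dependency (for example, one CustCode appearing with two different names). Prices are kept as plain floats only for brevity.

```python
orders_flat = [
    # (OrderNum, CustCode, OrderDate, CustName, ProdDescr, ProdPrice, Quantity)
    (10001, 5217, "11/22/94", "Williams", "Hammer", 8.99, 2),
    (10001, 5217, "11/22/94", "Williams", "Screwdriver", 4.45, 1),
    (10002, 5021, "11/22/94", "Johnson", "Clipper", 18.22, 1),
    (10002, 5021, "11/22/94", "Johnson", "Screwdriver", 4.45, 3),
    (10002, 5021, "11/22/94", "Johnson", "Crowbar", 11.07, 1),
    (10002, 5021, "11/22/94", "Johnson", "Saw", 14.99, 1),
    (10003, 4118, "11/22/94", "Lorenzo", "Hammer", 8.99, 1),
    (10004, 6002, "11/22/94", "Kopiusko", "Saw", 14.99, 1),
    (10004, 6002, "11/22/94", "Kopiusko", "Screwdriver", 4.45, 2),
    (10005, 5021, "11/23/94", "Johnson", "Cordless drill", 34.95, 1),
]

def project(rows, key_cols, value_cols):
    """Build {key: values}, failing if one key maps to two different values."""
    table = {}
    for row in rows:
        key = tuple(row[i] for i in key_cols)
        values = tuple(row[i] for i in value_cols)
        if table.setdefault(key, values) != values:
            raise ValueError(f"{key} does not determine a single value: {values}")
    return table

# Column positions in orders_flat.
ORDER, CUST, DATE, NAME, PROD, PRICE, QTY = range(7)

quantity = project(orders_flat, [ORDER, PROD], [QTY])
products = project(orders_flat, [PROD], [PRICE])
orders = project(orders_flat, [ORDER], [CUST, DATE])
customers = project(orders_flat, [CUST], [NAME])

print(len(quantity), len(products), len(orders), len(customers))  # 10 6 5 4
```

Customer 5021 (Johnson) appears on five rows of the original ORDERS table but only once in customers, which is the redundancy the 3NF design removes.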

Page 24

Example: Video Store Database
• the CUSTOMER relation:

CustomerID  Phone         LastName    FirstName  Address          City           State  ZipCode
1           502-666-7777  Johnson     Martha     125 Main St.     Alvaton        KY     42122
2           502-888-6464  Smith       Jack       873 Elm St.      Bowling Green  KY     42101
3           502-777-7575  Washington  Elroy      95 Easy St.      Smith’s Grove  KY     42171
4           502-333-9494  Adams       Samuel     746 Brown Dr.    Alvation       KY     42122
5           502-474-4746  Steinmetz   Susan      15 Speedway Dr.  Portland       TN     37148
…

• the RENTALFORM relation:

TransID  RentDate  CustomerID  VideoID  Copy#  Title                Rent
1        4/18/95   3           1        2      2001: Space Odyssey  $1.50
1        4/18/95   3           6        3      Clockwork Orange     $1.50
2        4/18/95   7           8        1      Hopscotch            $1.50
2        4/18/95   7           2        1      Apocalypse Now       $2.00
2        4/18/95   7           6        1      Clockwork Orange     $1.50
3        4/18/95   8           9        1      Luggage of the Gods  $2.50
…

• a customer can rent multiple videos as part of the same transaction
• multiple copies of the same video exist
  • the Copy# field stores the number of the copy, which is unique only among copies of the same video
• one customer cannot rent two copies of the same video at the same time
• although it has two tables, the database still contains some anomalies

Page 25

Example: Video Store Database (Cont.)
• relations for the video store database:
  • CUSTOMER(CustomerID, Phone, Name, Address, City, State, ZipCode)
  • RENTALFORM(TransID, RentDate, CustomerID, VideoID, Copy#, Title, Rent)
• dependency diagram for the video store database:

[Dependency diagram with attributes CustomerID, Phone, Name, Address, City, State, Zip (CUSTOMER) and TransID, RentDate, CustomerID, VideoID, Copy#, Title, Rent (RENTALFORM)]

Page 26

Example: Video Store Database (Cont.)
• video store database after eliminating partial and transitive dependencies:

CUSTOMER(CustomerID, Phone, Name, Address, City, State, ZipCode)
RENTAL(TransID, RentDate, CustomerID)
    CustomerID foreign key to CUSTOMER
VIDEO(VideoID, Title, Rent)
VIDEOSRENTED(TransID, VideoID, Copy#)
    TransID foreign key to RENTAL
    VideoID foreign key to VIDEO

[Dependency diagrams for CUSTOMER, RENTAL, VIDEO, and VIDEOSRENTED]
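The final video store design can be sketched in SQL as well. This assumes SQLite via Python's sqlite3 module and takes (TransID, VideoID) as the primary key of VIDEOSRENTED, an assumption based on the rule that a customer cannot rent two copies of the same video at the same time. Only the sample rows shown earlier are used, and CUSTOMER is trimmed to two columns; customers 7 and 8 fall in the elided part of the CUSTOMER table, so only their IDs are stored. Copy# is quoted because # is not allowed in an unquoted SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE CUSTOMER (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT
    );
    CREATE TABLE RENTAL (
        TransID    INTEGER PRIMARY KEY,
        RentDate   TEXT NOT NULL,
        CustomerID INTEGER NOT NULL REFERENCES CUSTOMER (CustomerID)
    );
    CREATE TABLE VIDEO (
        VideoID INTEGER PRIMARY KEY,
        Title   TEXT NOT NULL,
        Rent    REAL NOT NULL
    );
    -- Assumed key: at most one copy of a given video per transaction.
    CREATE TABLE VIDEOSRENTED (
        TransID INTEGER NOT NULL REFERENCES RENTAL (TransID),
        VideoID INTEGER NOT NULL REFERENCES VIDEO (VideoID),
        "Copy#" INTEGER NOT NULL,
        PRIMARY KEY (TransID, VideoID)
    );
""")

# Names for customers 7 and 8 are not in the sample, so they stay NULL.
conn.executemany("INSERT INTO CUSTOMER VALUES (?, ?)",
                 [(3, "Elroy Washington"), (7, None), (8, None)])
conn.executemany("INSERT INTO RENTAL VALUES (?, ?, ?)",
                 [(1, "4/18/95", 3), (2, "4/18/95", 7), (3, "4/18/95", 8)])
conn.executemany("INSERT INTO VIDEO VALUES (?, ?, ?)", [
    (1, "2001: Space Odyssey", 1.50), (2, "Apocalypse Now", 2.00),
    (6, "Clockwork Orange", 1.50), (8, "Hopscotch", 1.50),
    (9, "Luggage of the Gods", 2.50),
])
conn.executemany("INSERT INTO VIDEOSRENTED VALUES (?, ?, ?)", [
    (1, 1, 2), (1, 6, 3), (2, 8, 1), (2, 2, 1), (2, 6, 1), (3, 9, 1),
])

# A second copy of video 1 in transaction 1 violates the assumed key.
try:
    conn.execute("INSERT INTO VIDEOSRENTED VALUES (1, 1, 3)")
    second_copy_rejected = False
except sqlite3.IntegrityError:
    second_copy_rejected = True

# Rent is stored once per video; joins recover each transaction's total.
totals = conn.execute("""
    SELECT r.TransID, SUM(v.Rent)
    FROM RENTAL AS r
    JOIN VIDEOSRENTED AS vr ON r.TransID = vr.TransID
    JOIN VIDEO AS v ON vr.VideoID = v.VideoID
    GROUP BY r.TransID
    ORDER BY r.TransID
""").fetchall()
print(second_copy_rejected, totals)
```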

Page 27

Example: Video Store Database (Cont.)

[Figure: table relationships for the video store database]

Page 28

Summary of Guidelines for Database Design
• identify the entities involved in the database
• identify the fields relevant for each entity and define the corresponding relations
• determine the primary key of each relation
• avoid data redundancy, but have some common fields so that tables can be joined together
• ensure that all the required database processing can be done using the defined relations
• normalize the relations by splitting them into smaller ones