Database Theory and Terminology, Part 2


Page 1: Database Theory  and Terminology, Part 2

Database Theory and Terminology, Part 2

Page 2: Database Theory  and Terminology, Part 2

How Many Tables?

• Databases for real businesses tend to have a lot of tables, but not always the right number.

• Normalization generally results in more tables.

• However, beginning database designers frequently create too many tables in ways that have nothing to do with normalization. The most common of these are:

– Using two tables in a one-to-one relationship.

– Making separate tables based on an attribute.

Page 3: Database Theory  and Terminology, Part 2

One-to-one Relationship

• A one-to-one relationship between two tables is when each record in one table corresponds to one or zero records in the other table.

• One-to-one relationships can legitimately be used in supertype/subtype situations (coming soon), and rarely in other situations.

• Beginners frequently use them unnecessarily, using two tables where only one is needed.

• The next slide gives an example.

• The two tables on top are in a one-to-one relationship on the StudentId field.

• This only complicates the database. The tables are easily combined into one.

Page 4: Database Theory  and Terminology, Part 2

BAD EXAMPLE!

BETTER!

Page 5: Database Theory  and Terminology, Part 2

Separating Tables by an Attribute

• The most common type of error (at least for 373 students) is creating multiple tables for a single entity, separating the records based on the value of a single attribute.

• This results in a database with a lot of tables that is slow and difficult to query.

• Several examples follow.

Page 6: Database Theory  and Terminology, Part 2

Too Many Tables

• It is not uncommon for beginning database designers to think that different tables are used to represent different categories.

• Here is a design for a database meant to hold the chemical elements.

• As you can see, each table has exactly the same fields.

• The only thing separating the tables is the “Series” of the elements: Actinides, NobleGases, Nonmetals, etc.

• By recognizing that Series is really just another attribute of elements, all of these tables can be combined into one table containing all elements.

BAD EXAMPLE!

Page 7: Database Theory  and Terminology, Part 2

• Adding a “Series” column allows all of the elements to be stored in a single table.

GOOD EXAMPLE!

Page 8: Database Theory  and Terminology, Part 2

Same fields? One table.

• Obviously, these “tables” came from the Elements table, which is where the data actually belongs.

• Note that the Elements table has all of the same fields as each table on the previous slide, plus a Series field. This allows elements from all series to be stored in a single table, which is more efficient and easier to query.

• At least at the level of chemistry we are looking at here, “Series” is an attribute of the “Element” entity, not an entity in itself.

• Breaking up a single entity into multiple tables based on one attribute is bad database design.
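To make the point concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Access. The column names other than Series and the four sample rows are my own assumptions, since the original table images are not part of this transcript. It shows that with Series stored as an ordinary column, "all the noble gases" is just a WHERE clause, and a count per series is a single GROUP BY instead of one query per table.

```python
import sqlite3

# Hypothetical schema: a single Elements table with Series as a plain column.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Elements (
        AtomicNumber INTEGER PRIMARY KEY,
        Symbol       TEXT NOT NULL,
        Name         TEXT NOT NULL,
        Series       TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO Elements VALUES (?, ?, ?, ?)",
    [
        (2, "He", "Helium", "Noble gas"),
        (8, "O", "Oxygen", "Nonmetal"),
        (10, "Ne", "Neon", "Noble gas"),
        (92, "U", "Uranium", "Actinide"),
    ],
)


def elements_in_series(series):
    """Return the names of all elements in one series, by atomic number."""
    rows = conn.execute(
        "SELECT Name FROM Elements WHERE Series = ? ORDER BY AtomicNumber",
        (series,),
    )
    return [name for (name,) in rows]


# What used to be a separate "NobleGases" table is now a filter.
noble_gases = elements_in_series("Noble gas")

# One query covers every series at once.
series_counts = dict(
    conn.execute("SELECT Series, COUNT(*) FROM Elements GROUP BY Series")
)
```

With one table per series, the second query would have needed a UNION across every table, and a newly added series would have meant a new table plus a rewritten query.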

Page 9: Database Theory  and Terminology, Part 2

Same Fields--Baseball

BAD EXAMPLE!

Page 10: Database Theory  and Terminology, Part 2

• This is a big improvement over the previous slide; however,

• The “TEAM” field is not a good choice to be a part of the primary key, since it uses the names of the teams.

• If this were a database actually used by Major League Baseball or ESPN, teams would be assigned a TeamNumber surrogate key which would be used in all related tables (like players, schedules, results).

BETTER EXAMPLE

Page 11: Database Theory  and Terminology, Part 2

Same Fields--Players

BAD EXAMPLE!

Page 12: Database Theory  and Terminology, Part 2

• In a simple database, this could be an acceptable table. It has all of the sports in a single table, and it has a good primary key.

• However, in a more heavy-duty database, StudentName would likely be divided into LastName, FirstName, and MiddleInitial fields, and SportName would be replaced with a SportID foreign key field which would link to a Sports table.

OKAY EXAMPLE

Page 13: Database Theory  and Terminology, Part 2

Multi-Single Table Parents

BAD EXAMPLE!

BETTER EXAMPLE

Page 14: Database Theory  and Terminology, Part 2

Same Fields, Same Table

• If you have two tables that have exactly the same fields, they almost certainly represent the same entity.

• Therefore, the tables should be combined, adding a field to hold the attribute that you had used to separate them.

Page 15: Database Theory  and Terminology, Part 2

Different Fields? Different Tables.

• The Customers and Products entities from GuateTours have no attributes in common.

• Trying to put them into the same table would make no sense.

• It would also violate every conceivable level of normalization.

Page 16: Database Theory  and Terminology, Part 2

But… Isn’t there something in between “all fields” and “no fields” in common?

• Good question! How about the Customers and Employees tables in GuateTours?

They share three fields in common, and even the primary keys are pretty similar. Should we combine them into a single table or not?

Page 17: Database Theory  and Terminology, Part 2

Another good question!

• In this case, we could try to combine employees and customers into a single Persons table, with a “PersonType” field to tell us whether a particular record is an employee or a customer.

• However, this ends up with a lot of blank cells, and some confusion as well. Who is the next customer’s boss? What is employee Jose’s PartySize?

Page 18: Database Theory  and Terminology, Part 2

Separate Tables

• In this case, keeping Employees and Customers in separate tables is the right choice.

• They have enough different fields, and

• It is unlikely that anyone will run frequent queries to get information from both tables, such as a list of the names and phone numbers of all employees and customers.

• Although both are examples of people, to the business they are treated completely differently.

• Therefore, separate tables.

Page 19: Database Theory  and Terminology, Part 2

Super Types and Sub Types

• This example is based on pages 184 to 188 of “Databases Demystified” (available on reserve).

• Here’s the relationship diagram; explanation on the following slides:

Page 20: Database Theory  and Terminology, Part 2

Super Types and Sub Types

• The Customer table is called a “super type”; it contains the fields shared by all types of customers of a particular business.

• The IndividualCustomer and CommercialCustomer tables are called “sub types”; they contain fields specific to those types of customers.

Page 21: Database Theory  and Terminology, Part 2

Super Types and Sub Types

• Both sub type tables are linked to the Customer table with one-to-one relationships; every customer in either sub type is matched with a single record in the Customer table, and each customer in the Customer table appears at most once in a sub type table.

Page 22: Database Theory  and Terminology, Part 2

Super Types and Sub Types

• After you have learned to create queries in Access and in SQL, you will see that:

– We could easily recreate a “complete” list of Individual Customers by running an INNER JOIN query between the IndividualCustomer and Customer tables.

– We can also quickly prepare a mailing or calling list for all customers with a simple query on the Customer table.
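As a preview of those two queries, here is a rough sketch in Python's sqlite3 module. The table names and the CompanyType field come from the slides; the other columns (Name, Phone, BirthDate) and the sample rows are assumptions made up for illustration. Making CustomerID the primary key of each sub type table is one way to guarantee that a customer appears at most once there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
    -- Super type: fields shared by every customer (columns are hypothetical).
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        Phone      TEXT
    );
    -- Sub types: the primary key is also a foreign key to Customer,
    -- which is what makes each relationship one-to-one.
    CREATE TABLE IndividualCustomer (
        CustomerID INTEGER PRIMARY KEY REFERENCES Customer(CustomerID),
        BirthDate  TEXT
    );
    CREATE TABLE CommercialCustomer (
        CustomerID  INTEGER PRIMARY KEY REFERENCES Customer(CustomerID),
        CompanyType TEXT
    );
    INSERT INTO Customer VALUES (1, 'Ana Lopez', '555-0101'),
                                (2, 'Acme Tools', '555-0102');
    INSERT INTO IndividualCustomer VALUES (1, '1990-04-12');
    INSERT INTO CommercialCustomer VALUES (2, 'Corporation');
""")

# "Complete" individual customers: shared fields joined to sub type fields.
individual_customers = conn.execute("""
    SELECT c.Name, c.Phone, i.BirthDate
    FROM IndividualCustomer AS i
    INNER JOIN Customer AS c ON c.CustomerID = i.CustomerID
    ORDER BY c.CustomerID
""").fetchall()

# Calling list for everyone: the super type alone is enough.
calling_list = conn.execute(
    "SELECT Name, Phone FROM Customer ORDER BY CustomerID"
).fetchall()
```

Notice that the calling list never touches the sub type tables, which is the practical payoff of keeping the shared fields in one place.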

Page 23: Database Theory  and Terminology, Part 2

One-to-One Relationships

• The relationship between a super type table and its related sub type tables is one-to-one.

• Each record in one table corresponds to at most one record in the related table.

• The relationship between a supertype and its subtypes is one of the few places where it is necessary or appropriate to have one-to-one relationships.

Page 24: Database Theory  and Terminology, Part 2

Super Types, Sub Types Summary

• Breaking up the Customer table into subtypes while retaining common fields in the super type Customer table makes sense.

– It provides organization, recognizing that the two types of customers share attributes, but

– It also avoids the confusion that would be caused if all customers were included in a single table (what is the CompanyType of an individual?).

– For many purposes, a company will treat all customers the same way (mailings, sale prices).

– In contrast, most businesses would not treat customers and employees the same way:

• not only would many fields be different, but

• how they are used in the database is different. Therefore,

• keeping them in separate tables is appropriate.

Page 25: Database Theory  and Terminology, Part 2

Lookup Tables

• I cropped this part of the relationship diagram out of the earlier slides.

• This shows that the “CompanyType” field of the CommercialCustomer table is related to the only field in the CustomerTypes table.

• That table is called a “Lookup” table—a limited set of values from which a particular field should be chosen.

Page 26: Database Theory  and Terminology, Part 2

Lookup Tables

• It is also common to have a two-field lookup table: the allowable values along with a numeric primary key.

• The advantage of either type of lookup table is that it doesn’t allow database users to make up their own entries, which might be incorrect, misspelled, or otherwise inappropriate.
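Here is a small sketch of a one-field lookup table at work, again using Python's sqlite3 module as a stand-in for Access. The CustomerTypes and CommercialCustomer table names and the CompanyType field come from the slides; the CompanyName column, the three allowed values, and the add_company helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    -- One-field lookup table: the allowed values ARE the primary key.
    CREATE TABLE CustomerTypes (CompanyType TEXT PRIMARY KEY);
    CREATE TABLE CommercialCustomer (
        CustomerID  INTEGER PRIMARY KEY,
        CompanyName TEXT NOT NULL,              -- hypothetical column
        CompanyType TEXT NOT NULL
            REFERENCES CustomerTypes(CompanyType)
    );
    INSERT INTO CustomerTypes VALUES
        ('Corporation'), ('Partnership'), ('Sole Proprietorship');
""")


def add_company(customer_id, name, company_type):
    """Hypothetical helper: True if the row was stored, False if rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO CommercialCustomer VALUES (?, ?, ?)",
                (customer_id, name, company_type),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = add_company(1, "Acme Tools", "Sole Proprietorship")
rejected = add_company(2, "Bolt Brothers", "sole propritorship")  # misspelled
stored_rows = conn.execute(
    "SELECT COUNT(*) FROM CommercialCustomer"
).fetchone()[0]
```

The misspelled entry never reaches the table, so a later query for 'Sole Proprietorship' does not have to guess at every variant a user might have typed.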

Page 27: Database Theory  and Terminology, Part 2

Lookup tables

• The table below demonstrates what can happen if you use text fields instead of lookups.

• Try writing a query to find all sole proprietorships in that table! (Assuming there are a lot more records.) Actually, don’t.

There are constructs in programming very similar to lookup tables. Anyone know what they are called? (Jeopardy music…)

Page 28: Database Theory  and Terminology, Part 2

Redundancy is Bad in Tables, Not in Lectures!

• Good relational database design is about optimizing how the data is STORED, not how it is DISPLAYED.

• Most “tables” you have seen—in books, in lectures, on the web—were probably optimized for display, not for storage.

• Relational database tables are designed for consistency and to reduce redundancy. They are not designed for appearance.

• When we learn SQL and Visual Basic, we will look at various ways to display the data stored in relational database tables.

Page 29: Database Theory  and Terminology, Part 2

Relationships

• In the Guate Tours database, go to the Database Tools tab on the ribbon.

• Click on “Relationships”. You should see this:

Page 30: Database Theory  and Terminology, Part 2

What the relationship diagram shows

• This is the relationship diagram for this database.

• This diagram basically tells Access which fields in a table are foreign keys, that is, which fields are primary keys of other tables.

• For example, the EmployeeID field in the Tours table is a foreign key: it is linked to the primary key of the Employees table.

• The “1” and the “∞” symbol indicate that this relationship is “one-to-many”.

• That is, each tour has one employee, but each employee can work on many tours.

Page 31: Database Theory  and Terminology, Part 2

What Relationships Are

• The technical term for a relationship is “foreign-key constraint”.

• This means that when you place a value in a foreign-key field, it should have a matching primary-key value in the related table.

• For example, we assign an employee to a tour by putting his/her EmployeeID number in the EmployeeID field in the Tours table.

• The relationship (foreign-key constraint) requires that the matching EmployeeID already exists in the Employees table.

Page 32: Database Theory  and Terminology, Part 2

Examining Relationships

• If you right-click on one of the relationship lines, a context menu appears:

• Selecting “Edit Relationship” brings up this window:

• It shows the fields in the two tables that are related.

Page 33: Database Theory  and Terminology, Part 2

Enforce Referential Integrity

• “Enforce Referential Integrity” means that you are in a serious relationship; you’re not going to get out of this one easily!

• If you check this (as I will require you to do for assignments), Access will not allow you to enter a value which doesn’t exist in the related table.

• You see that the Tours table’s EmployeeID field is related to the Employees table’s primary key.

• Watch what happens if I try to assign a tour to a non-existent employee.

Page 34: Database Theory  and Terminology, Part 2

Access as Assistant

• “You cannot add or change a record because a related record is required in table ‘Employees.’”

• In other words, Access is telling me “You asked me to enforce referential integrity, and that’s what I’m doin’! You gotta problem with that?”

• Basically, Access is helping me to teach you about foreign keys. One of the things you’ll learn to hate about Access, but I’ve learned to like.
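The same behavior can be reproduced outside Access. The sketch below uses Python's sqlite3 module, where the `PRAGMA foreign_keys` setting plays roughly the role of the “Enforce Referential Integrity” checkbox. The Tours and Employees tables and the EmployeeID field come from the slides; the other columns, the tour name, employee number 99, and the try_assign helper are made up for the example.

```python
import sqlite3

# Hypothetical cut-down version of the Guate Tours tables.
SCHEMA = """
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL
    );
    CREATE TABLE Tours (
        TourID     INTEGER PRIMARY KEY,
        TourName   TEXT NOT NULL,
        EmployeeID INTEGER REFERENCES Employees(EmployeeID)
    );
    INSERT INTO Employees VALUES (1, 'Jose');
"""


def try_assign(enforce):
    """Try to give a tour to employee 99, who does not exist."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = " + ("ON" if enforce else "OFF"))
    conn.executescript(SCHEMA)
    try:
        conn.execute("INSERT INTO Tours VALUES (1, 'Volcano Hike', 99)")
        return "accepted"
    except sqlite3.IntegrityError as err:
        return "rejected: " + str(err)
    finally:
        conn.close()


without_enforcement = try_assign(enforce=False)  # orphan row slips in
with_enforcement = try_assign(enforce=True)      # the database refuses
```

With enforcement off, the tour is quietly stored with an employee who does not exist, which is exactly the kind of bad data the checkbox is there to prevent.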

Page 35: Database Theory  and Terminology, Part 2

Creating Relationships

• To create relationships, you need to open the Relationships window. You do this from the Database Tools tab on the ribbon.

• The easiest way to create a relationship is to drag a field from one table to another. The relationship properties box will appear:

Page 36: Database Theory  and Terminology, Part 2

Referential Integrity Must Be Enforced!

• As I said before, I will require you to check the “Enforce Referential Integrity” box in your relationships. This will accomplish three things:

1. It will protect the integrity of your data.

2. It will give Access the opportunity to teach you a lesson or two.

3. It will annoy and frustrate you at times.

Page 37: Database Theory  and Terminology, Part 2

Cascading

• I don’t want you to check the two other checkboxes: Cascade Update Related Fields and Cascade Delete Related Records.

• Cascade Update might happen if you changed a primary key value. Perhaps you have a customer named Joe Superstitious, who just happens to have been assigned customer number 13. He thinks that’s bad luck, so you agree to change it for him. Cascade updates would cause all records in related tables (Orders, for example) to change CustomerID values of 13 to his new CustomerID.

• Maybe he’s so superstitious he won’t ever shop with you again; he wants to cancel his account. Cascade Delete would remove all related records, such as all the orders that Joe had placed over the years.

• There are other ways to deal with these situations (simply adding a True/False “Active” column to the Customers table does the trick). Cascade update and delete destroy data and are therefore dangerous and not recommended.
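To see the difference, here is a sketch that builds the same two tables twice in Python's sqlite3 module: once with `ON DELETE CASCADE` (the SQL counterpart of Cascade Delete Related Records) and once without it, using an Active flag instead. The Customers and Orders tables, CustomerID 13, and the Active column come from the slide; the build and order_count helpers and the two sample orders are assumptions.

```python
import sqlite3


def build(on_delete_clause):
    """Create Customers/Orders with Joe (13) and two of his orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(f"""
        CREATE TABLE Customers (
            CustomerID INTEGER PRIMARY KEY,
            Name       TEXT NOT NULL,
            Active     INTEGER NOT NULL DEFAULT 1   -- 1 = True, 0 = False
        );
        CREATE TABLE Orders (
            OrderID    INTEGER PRIMARY KEY,
            CustomerID INTEGER NOT NULL
                REFERENCES Customers(CustomerID) {on_delete_clause}
        );
        INSERT INTO Customers (CustomerID, Name)
            VALUES (13, 'Joe Superstitious');
        INSERT INTO Orders VALUES (1, 13), (2, 13);
    """)
    return conn


def order_count(conn):
    return conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]


# Cascade delete: removing Joe silently removes his order history too.
cascading = build("ON DELETE CASCADE")
cascading.execute("DELETE FROM Customers WHERE CustomerID = 13")
orders_after_cascade = order_count(cascading)

# No cascade: mark Joe inactive and keep every order.
flagged = build("")
flagged.execute("UPDATE Customers SET Active = 0 WHERE CustomerID = 13")
try:
    flagged.execute("DELETE FROM Customers WHERE CustomerID = 13")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True  # referential integrity protects the orders
orders_after_flag = order_count(flagged)
```

In the second database the delete is refused outright because Joe still has orders, so the only way to retire him is the Active flag, and the sales history survives.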

Page 38: Database Theory  and Terminology, Part 2

Relationship Types

• The most common types of relationships in databases are one-to-many and many-to-many.

• Oftentimes the distinction depends on how the business is run. In our example, the Employees to Tours relationship is one-to-many: One employee can work on many tours, but each tour has only one employee assigned.

Page 39: Database Theory  and Terminology, Part 2

Many-to-Many Relationships

• If your tours became larger, it is certainly possible that you might have more than one employee assigned to a tour. The relationship would then be many-to-many. One employee can work many tours, and one tour can have many employees.

• The Guate Tours database already has two many-to-many relationships: Customers-Tours, and Orders-Products. A tour usually has many customers, and a customer can sign up for many tours. An order can contain many types of products, and a particular product can be a part of many orders.

Page 40: Database Theory  and Terminology, Part 2

Representing Many-to-Many Relationships

• Access won’t allow you to directly define a many-to-many relationship (neither will any other DBMS)

• Many-to-many relationships are created using an intersection table: a table with a compound primary key which is composed of the primary keys of the two related tables.

• The intersection table is then related to each of the other tables with one-to-many relationships.

Page 41: Database Theory  and Terminology, Part 2

Many-to-Many Examples

• Look back at the Relationships diagram in GuateTours.

• The two intersection tables (which implement the many-to-many relationships) are CustomerTour and OrderDetails.

• Note that the primary key of CustomerTour includes the keys from the two related tables PLUS the TourDate (since a customer might take the same tour more than once).

• The primary key of OrderDetails is composed of the primary keys of Orders and Products. Quantity is included here because it is a property of the combination of the order and the product: How many of THIS product are included in THIS order.
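Here is roughly what the OrderDetails intersection table looks like when written out, sketched with Python's sqlite3 module. The Orders, Products, and OrderDetails table names and the Quantity field are from the slides; the key column names, product names, and sample rows are assumptions, since the diagram itself is not in this transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Orders   (OrderID   INTEGER PRIMARY KEY);
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY,
                           ProductName TEXT NOT NULL);
    -- Intersection table: compound primary key made of the two foreign keys.
    CREATE TABLE OrderDetails (
        OrderID   INTEGER NOT NULL REFERENCES Orders(OrderID),
        ProductID INTEGER NOT NULL REFERENCES Products(ProductID),
        Quantity  INTEGER NOT NULL,   -- belongs to the order/product pair
        PRIMARY KEY (OrderID, ProductID)
    );
    INSERT INTO Orders VALUES (1), (2);
    INSERT INTO Products VALUES (10, 'Map'), (11, 'Water bottle');
    INSERT INTO OrderDetails VALUES (1, 10, 2), (1, 11, 1), (2, 10, 5);
""")

# One order, many products.
order_1 = conn.execute("""
    SELECT p.ProductName, d.Quantity
    FROM OrderDetails AS d
    INNER JOIN Products AS p ON p.ProductID = d.ProductID
    WHERE d.OrderID = 1
    ORDER BY p.ProductID
""").fetchall()

# One product, many orders.
orders_with_maps = [
    order_id
    for (order_id,) in conn.execute(
        "SELECT OrderID FROM OrderDetails WHERE ProductID = 10 ORDER BY OrderID"
    )
]

# The compound key stops the same product being listed twice on one order.
try:
    conn.execute("INSERT INTO OrderDetails VALUES (1, 10, 3)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
```

Each side of the many-to-many relationship is just a one-to-many relationship into OrderDetails, and the compound key is why a second "Map" line on order 1 is refused rather than stored.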

Page 42: Database Theory  and Terminology, Part 2

A Business Decision

• Whether a relationship is one-to-many or many-to-many is frequently a business decision.

• Suppose that you buy your office supplies from Office Depot, Office Max, or Staples.

• For simplicity, you buy all of your paper from Office Depot, all of your printer supplies from Office Max, and all of your tacks and staples from Staples.

• In this case, each supplier supplies many products, but each product comes from only one supplier. This is a one-to-many relationship:

Page 43: Database Theory  and Terminology, Part 2

More flexible, more complex

• Using only one supplier for each product is simple, but it could be costing you money. Why not buy all products from all suppliers when they are on sale?

• This creates a many-to-many relationship:

• Note that Price has been moved to the intersection table, since the price for each product may vary from store to store.
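A quick sketch shows why Price has to live in the intersection table. This again uses Python's sqlite3 module; the supplier names are from the slides, while the SupplierProducts table name, the key columns, the prices, and the cheapest_supplier helper are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY,
                            SupplierName TEXT NOT NULL);
    CREATE TABLE Products  (ProductID INTEGER PRIMARY KEY,
                            ProductName TEXT NOT NULL);
    -- Hypothetical intersection table; Price depends on BOTH keys.
    CREATE TABLE SupplierProducts (
        SupplierID INTEGER NOT NULL REFERENCES Suppliers(SupplierID),
        ProductID  INTEGER NOT NULL REFERENCES Products(ProductID),
        Price      REAL NOT NULL,
        PRIMARY KEY (SupplierID, ProductID)
    );
    INSERT INTO Suppliers VALUES
        (1, 'Office Depot'), (2, 'Office Max'), (3, 'Staples');
    INSERT INTO Products VALUES (1, 'Paper'), (2, 'Tacks');
    -- Made-up prices, for illustration only.
    INSERT INTO SupplierProducts VALUES
        (1, 1, 4.50), (2, 1, 4.25), (3, 1, 4.75),
        (1, 2, 2.00), (3, 2, 1.50);
""")


def cheapest_supplier(product_name):
    """Return (SupplierName, Price) for the lowest price, or None."""
    return conn.execute("""
        SELECT s.SupplierName, sp.Price
        FROM SupplierProducts AS sp
        INNER JOIN Suppliers AS s ON s.SupplierID = sp.SupplierID
        INNER JOIN Products  AS p ON p.ProductID  = sp.ProductID
        WHERE p.ProductName = ?
        ORDER BY sp.Price
        LIMIT 1
    """, (product_name,)).fetchone()


best_paper = cheapest_supplier("Paper")
best_tacks = cheapest_supplier("Tacks")
```

If Price had stayed in the Products table, there would be room for only one price per product, and the question "who is cheapest this week?" could not even be asked.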

Page 44: Database Theory  and Terminology, Part 2

Many-to-Many

• It is harder to design many-to-many relationships, and to write application code for them; however,

• Chances are that in many cases where you think that a one-to-many relationship is enough, you will eventually need the flexibility of a many-to-many relationship.

– Will employees really do ONLY one thing?

– Will players play ONLY one position?

• If the answer is “Maybe not,” use a many-to-many relationship.

Page 45: Database Theory  and Terminology, Part 2

Reflexive Relationships

• Sometimes, a field in a table relates to another field in the same table.

• This usually indicates some sort of hierarchy within the records in the table.

• In GuateTours, I added a BossID field to the Employees table. This field gets filled with the EmployeeID of that employee’s boss.

• Some DBMSs allow you to draw relationship diagrams which show reflexive relationships directly, with an arrow from BossID up to EmployeeID.

• Access doesn’t let you do this. To show a reflexive relationship, you must show a second copy of the Employees table and create the relationship between the original and the copy.
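The "second copy" trick in the Access diagram has a direct counterpart in queries: you join the table to itself under two different aliases. A minimal sketch in Python's sqlite3 module follows. The Employees table with its EmployeeID and BossID fields comes from the slide; the Name column and the three sample employees are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        -- Reflexive foreign key: points back at this same table.
        BossID     INTEGER REFERENCES Employees(EmployeeID)
    );
    -- Hypothetical staff: Maria has no boss; Jose and Ana report to her.
    INSERT INTO Employees VALUES
        (1, 'Maria', NULL), (2, 'Jose', 1), (3, 'Ana', 1);
""")

# "e" is the employee copy of the table and "b" is the boss copy.
# LEFT JOIN keeps the person at the top, whose BossID is empty.
employee_and_boss = conn.execute("""
    SELECT e.Name, b.Name
    FROM Employees AS e
    LEFT JOIN Employees AS b ON b.EmployeeID = e.BossID
    ORDER BY e.EmployeeID
""").fetchall()
```

There is still only one Employees table in the database; the two aliases simply let the query treat it as "employees" on one side and "bosses" on the other.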

Page 46: Database Theory  and Terminology, Part 2

Quick Review

• You have now been introduced to much of the theory and terminology of relational databases.

• Being comfortable with the terminology will be crucial to your understanding the theory and practice of database design using third normal form.

• Therefore, here’s a quick review of some of the definitions you’ve seen (and will need to know for the rest of this lecture, as well as for exams):

Page 47: Database Theory  and Terminology, Part 2

Definitions

• Database: A collection of interrelated data items that are managed as a single unit.

• Relational Database: A collection of tables, the relationships between them, and auxiliary items such as views and stored procedures. The tables are organized according to the principles first described by E.F. Codd.

• DBMS: Database Management System, the computer software that organizes the data on computers and manages access to it. Examples include Oracle, MySQL, DB2, and Microsoft’s SQL Server (for large-scale databases) and Access (for smaller databases).

Page 48: Database Theory  and Terminology, Part 2

Definitions

• Relation: A set of ordered tuples. Relations are represented by tables in databases (not by relationships!)

• Entity: A generic noun, representing a class of things, but not one particular thing.

• Attribute, Field, Column: Properties possessed by entities. These are known as “fields” or “columns” in database tables.

Page 49: Database Theory  and Terminology, Part 2

Definitions

• Tuple, Record, Row: The theorist’s “tuple” becomes a “record” or “row” in a database table.

• The three anomalies: Insert, Update, and Delete. These are caused by trying to store information about more than one entity in a single table. We’ll look at these further next week.

• 3NF: Third Normal Form. This will be the main topic for next week.

Page 50: Database Theory  and Terminology, Part 2

Definitions

• OLTP: Online Transaction Processing. This type of database is used in the day-to-day operation of a business. It is designed to handle frequent changes, frequent requests for small amounts of data, and multiple concurrent users. It is the type of database that requires 3NF, and what we will be discussing next week.

• OLAP: Online Analytical Processing. Databases composed of historical data which isn’t being constantly updated. OLAP databases are used for analyzing performance, not for day-to-day operations. They do not require 3NF.

• Normalization: Modifying the design of a database so that its tables are in 3NF.

Page 51: Database Theory  and Terminology, Part 2

Definitions

• Table Design: Defining the fields that make up a table, including identifying data types and assigning primary keys.

• Populating a Table: Adding rows of data to a table.

• Constraint: A restriction on values that can be entered into a column. Setting the data type is one type of constraint; adding numeric ranges or min/max text lengths is another; and primary and foreign keys are a third type of constraint.

• Primary Key: One or more columns in a table which (together) uniquely identify a row (distinguish it from all others in the table).

• Candidate Key: Any field or combination of fields that could serve as the primary key.

Page 52: Database Theory  and Terminology, Part 2

Definitions

• Simple Key: A primary key consisting of one field.

• Compound Key: A primary key consisting of two or more fields.

• Natural Key: A pre-existing or ready-made field which can serve as the primary key for a table.

• Surrogate Key: A field (usually numeric) added to a table as the primary key when no natural keys are available.

Page 53: Database Theory  and Terminology, Part 2

Definitions

• Foreign Key: A field (or fields) in a table that is not the primary key in that table, but IS the primary key in another table.

• Referential Integrity: This is a property of a relationship in Access which tells Access to take the relationship seriously by enforcing the foreign-key constraint. Entering a value in the foreign-key column of one table will require that that value already exist in the primary-key column of the other.

• Intersection Table: A table used to implement a many-to-many relationship. The primary key of the intersection table is the combination of the primary keys of the two related tables.