Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML,...

63
Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th , 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Transcript of Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML,...

Page 1: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

1

Instructor: Craig Duckett

Lecture 05: Tuesday, October 13th, 2015Transactions, Acid Test, DML, DDL, Normalization

BIT275: Database Design (Fall 2015)

Page 2: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

2

Assignment 1 is GRADED Woot! Woot!

• MID-TERM EXAM is LECTURE 10, Tuesday, November 3rd • Assignment 2 is due LECTURE 11, Thursday, November 5th,

in StudentTracker by MIDNIGHT

Page 3: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

3

Tuesday (LECTURE 5)• Database Design for Mere Mortals: Chapter 4

Thursday (LECTURE 6)• The Language of SQL: • Chapter 5: Sorting Data• Chapter 6: Column Based Logic

Page 4: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

4

• Transactions• The Acid Test• Data Manipulation Language (DML)• Data Definition Language (DDL)• Normalization• Denormalization

Page 5: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

5

Transactions and the ACID Test

Page 6: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Transactions and the ACID TestWhen we're entering or changing data in our databases, we'll often need to work with transactions. A transaction is an incredibly important thing in the world of databases. And to understand them, the best way is to think about what we think of as a transaction in the real world. If you talk about making a transaction, you often mean something commercial. You hand over $15 to a bookseller, the bookseller gives you the book. That's the transaction, and it's important that both of those things happen. If you hand over the money, you expect to get the book. If they hand you the book, they expect to get the money. Either both of these things happen, or neither of them do.

Now in a computing system, a classic example of transactions is a banking system. Imagine that you log on to your bank's website, and you want to transfer $2,000 from your savings account to your checking account.

Page 7: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Transactions and the ACID Test

Now, that's going to require two updates to this data, one to subtract $2,000 from the savings account, and the other to add $2,000 to the checking account. Now if the first part of this happens successfully, subtracting $2,000 and then we try and add $2,000, but there's a problem with the second part, perhaps it's locked for editing. Well, we will need to reverse the first part of this transaction.

Page 8: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Transactions and the ACID Test

We never want to be in a situation where $2,000 has been debited from the first account, but it didn't get added to the second. So, a transaction is how you define a combined unit of work, either both of these things happen or neither of them do, and the first change will be reversed instantly by the database if any part of the transaction fails.

Page 9: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Transactions and the ACID Test

Now, there's a common acronym that you'll come across when working with transactions in a database, ACID. A transaction must be Atomic, Consistent, Isolated, and Durable.

Being Atomic, and this is the Greek word atom meaning an indivisible unit, it refers to the core idea that this transaction must completely happen or not at all. So, whether there are two steps in the transaction or 20 steps, they're all contained within the transaction. They either all complete successfully, or they all return to the original state.

There is no such thing as a transaction that halfway occurs.

Page 10: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Transactions and the ACID TestNow whether the reason for the transaction failing is that the database had a power failure or ran out of space to hold a new data, or there was an application error, it doesn't matter. Atomic is the all or nothing rule. Now Consistency means that any transaction must take the database from one valid state to another valid state based on the rules of the database. So, even if a transaction is successfully atomic, it still cannot result in a situation that violates any of the integrity rules defined in a database.

Isolation refers to the data and the transaction being essentially locked for that moment in which the transaction is occurring. So, while we're trying to change a balance on one of our account records, another part of the system must not be allowed to access that data until the first transaction has finished. And durability refers to the transaction being considered robust. If it happens and the database says this transaction has happened successfully, then the transaction is guaranteed.

Say if you go to a travel website, purchase a flight on an aircraft, and you're guaranteed seat to A, then you should be able to regard that transaction as being durable. Even if half a second later, the database suffers a power failure and shuts down, when it reboots, that transaction will have survived that failure and they are not going to sell that seat to someone else simply because there was a glitch in the system half a second after you made your purchase. Now, the great thing about working with most Database Management Systems is these capabilities are built into the system. You don't have to worry about how to program these. You just need to know when to tell the database, this is a transaction; these three things, these four things, these 10 things must be done together. That's what we'll see how to do a little later on.

Page 11: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

11

Data Manipulation Language (DML)

Page 12: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Manipulation Language (DML)

A data manipulation language (DML) is a family of computer languages including commands permitting users to manipulate data in a database. This manipulation involves inserting data into database tables, retrieving existing data, deleting data from existing tables and modifying existing data. DML resembles simple English language and enhances efficient user interaction with the system.

So far, we've been focused on using the SELECT statement to read information out of a database, but that begs the question, how did the data get in there in the first place? We have several different keywords in SQL for inserting, updating, and deleting information. Now, this goes back to the idea that any computer system, not just databases that deals with storing data needs to provide four fundamental functions, the ability to Create, Read, Update, and Delete often using the acronym CRUD.

Page 13: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Manipulation Language (DML)Data manipulation languages have their functional capability organized by the initial word in a statement, which is almost always a verb. In the case of SQL, these verbs are:

SELECT ... FROM ... WHERE ... INSERT INTO ... VALUES ... UPDATE ... SET ... WHERE ... DELETE FROM ... WHERE ...

Well, if you want to work with CRUD in SQL, we don't use these exact words here. There are a couple of them that are different.

We know already that when we want to read information out of the database, we use the SQL keyword SELECT, and if we want to create, we're going to use the SQL keyword INSERT.

Now update and delete there's no conflict there. They are just UPDATE and DELETE in SQL.

Page 14: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Manipulation Language (DML)So, let's take a look at the ones we haven't covered yet. I'm going to talk about inserting first. So, the assumption is we're inserting a new row of data into a particular table and the general format is INSERT INTO, then your table name, then a list of columns and then the VALUES that you want to insert into those columns.

Page 15: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Manipulation Language (DML)So, if we're going to insert into this simple imaginary Employee table, I'll use the name of the table, and then inside parenthesis, the name of the columns that I'm going to provide values for.

Page 16: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Manipulation Language (DML)

Now, you specifically name columns because you don't always provide data for every single column in a new row. Some of these columns might be null. Some columns might provide their own default values and quite often you might have some columns that are automatically generating primary keys, and you certainly don't want to try and insert your own values in those situations. So, you name the columns you're going to provide values for.

Page 17: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Manipulation Language (DML)And then after the keyword values, you provide those values. If there are strings they need to be in single quotes, if they are numeric key, you don't need the quotes.

The columns, and you values must match in number, in order, and type. You can't say you're going to provide five columns and then only provide values for four. And also the data types must match. If the column is defined as a number, you can try and insert a string into that column.

Page 18: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Manipulation Language (DML)

If I go ahead and execute this, what it's going to do is create a new row in that table. Now, I'm making the assumption here that the way the database has been defined, Employee ID will be an automatically-generating primary key. So, we didn't need to provide a value there. Hire Date? We probably could have provided one, but I'm imagining that it was set up to do a default value of the today's date for this row to be created. And let's imagine that email was allowable to have null values there. So, if we didn't provide one, it's just null.

Page 19: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Manipulation Language (DML)If I go ahead and execute this, what it's going to do is create a new row in that table. Now, I'm making the assumption here that the way the database has been defined, Employee ID will be an automatically-generating primary key. So, we didn't need to provide a value there. Hire Date? We probably could have provided one, but I'm imagining that it was set up to do a default value of the today's date for this row to be created. And let's imagine that email was allowable to have null values there. So, if we didn't provide one, it's just null.

We don't have to rank manual insert statements for every new row in our database. It's much more likely that this process will be done programmatically. The people will be using a more pleasant user interface to enter data, an application, or web-based portal that used PHP or ASP pages, but this is the kind of insert statement that's going on behind the scenes.

http://faculty.cascadia.edu/cduckett/bit275/documentation/ToyAuction.sql.txtEXAMPLE:

Page 20: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Manipulation Language (DML)We know we can read information out of the database, but what if we want to change something? It's quite often that that we'll change information in the database once it has been created, and we use the update statement for that. The format here is UPDATE, table name. You're going to use the keyword SET, and you're going to change a particular column. So, what column and what value do you want it to be and then almost always you'll be using a WHERE clause. This is exactly the same as doing where in the select statement that allows you to limit the amount of rows that this update will apply to.

Page 21: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Manipulation Language (DML)So, an example would be, if we have that row that we just inserted, and I wanted to update it to change the email column, I'd say UPDATE the Employee table. I'm going to set the column code Email equal to, provide the string value, and then the WHERE condition.

Page 22: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Manipulation Language (DML)In this case, the easiest way would be employee ID equals 734, and if we run that it will reach directly into that row, and change that one value.

Now technically, you don't have to have a WHERE clause, but just as leaving off a WHERE clause in the Select statement would return all the rows in that table leaving off a WHERE clause in an update statement would mean that you would set this column value for every single row in that table. Now, occasionally, that is what you want in an update, but more often you will restrict your updates to one row or a handful of rows. So, we use WHERE clauses for that and using where in an update, it's exactly the same as using it in a select

Page 23: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Manipulation Language (DML)Finally, DELETE. DELETE is actually the simplest because it's all the about the where. The format is DELETE FROM, particular table name, WHERE condition. We don't have to name any columns in this query because delete just works on rows. If you delete a row, you delete an entire row. You can't delete part of a row. So, there's no point in naming any of the columns here.

Page 24: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Manipulation Language (DML)If you just want to blank out some information in an existing row, well, that's an update. So, that WHERE clause and the DELETE just defines which rows and which table you're going to delete.

So, we'd say to DELETE FROM Employee, WHERE. In this case employee ID equals 734. This is about as specific as we could possibly get.

Page 25: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Manipulation Language (DML)Now, you can delete multiple rows at a time the same way that you could use a where clause to select the multiple rows at the time, but for heaven sake take care with your delete statements, almost everyone who has worked with the databases for a while has a horror story about the DELETE statement that was written a little to casually. Because if you write something as simple as this and execute it, the database management system is just going to say, "Well, you're the boss and go ahead and delete every single row in this employee table." It won't ask, are you sure? It won't ask you to confirm anything. It will just do it. DELETE is a very powerful keyword.

Page 26: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Manipulation Language (DML)

So in this case, I'll do a select staff, an employee where employee ID equals 734 and just confirm that the result I would expect back, in this case one row, well, that's what would also affect if I was doing an update, and that's what I'd affect if I was doing a delete.

So, I'll create it as a select first and then change the select box to delete. Execute that, and we go to bit better faith that what we're actually affecting is just one row. So, to good practice to make sure the where clause is only selecting the rows you expect before you turn it into a delete or an update statement.

Page 27: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

27

Data Definition Language (DDL)

Page 28: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Definition Language (DDL)

Now the majority of SQL that you'll write will be things like SELECT statements, INSERTS, UPDATES, and DELETES but that's not the whole picture. These keywords are lumped together as the DML part of SQL. Data Manipulation Language is that part of the sequel language that deals with working with and manipulating data in your table. So, selecting it, inserting it, updating it, deleting it, but none of these words actually allow us to alter the structure of the database itself.

Page 29: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Definition Language (DDL)

So, how do we create a new table? How would we say what columns exist? How would we define primary keys? We can't use any of these words for that. But that can be done in SQL using a part of the language that is lumped together under the phrase DDL or Data Definition Language. The three key words there are Create, Alter, and Drop. These words let us change the structure of the database itself. Let's take a look at a couple of them.

Page 30: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Definition Language (DDL)

If I want to write SQL to create a new table, this is the format that I'll use. The word CREATE, and then the table name you just make it up, it's whatever you want to call it and then in parentheses, the definition of your columns. This is after all what a table is it's simply naming the columns and giving them data types in any rules they must comply to.

Page 31: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Definition Language (DDL)So, if I wanted to create a new employee table, I would say CREATE Employee and then inside parenthesis, here's how I would provide column definitions. We start off with EmployeeID as an INTEGER which is PRIMARY KEY. And these are key words we're using to tell SQL the importance of and the rules these columns must comply to. I'm not going to get into the specific of these for reasons I'll cover in just a moment, but the general format should be at least understandable about what this would do. It's going to create a new table with five columns, an integer, three VARCHAR, and that means the variable length character field. I can recognize even phrases like NOT NULL or NULL. So, it looks like we don't have to put anything in the department column and then we're finishing off with salary as an integer. Now, obviously in a production database, these can be far more complex than this but this is the basics of how it's done.

Page 32: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Definition Language (DDL)

Now, if we move on a little bit and realize we need to change the definition of it, we can use the key word, ALTER. Now, typically would alter a table. If you give it the table name and then you give it a bit of information about what you're trying to change. In this case, I'm saying I'm adding a new column. That column is called Email. It's going to be a variable length character string of 100 characters.

Page 33: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Definition Language (DDL)

If I have a problem with what I created earlier, I can use the last of the three keywords which is DROP. And we would simply say something like DROP TABLE Employee. We wouldn't need any WHERE clauses, any column definitions. We're simply saying destroy the entire table, again, something to be very, very careful about working with.

Page 34: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Data Definition Language (DDL)Now, I wanted to introduce these phrases, but I'm really not going to spend much more time on these words here for two reasons. First, many people who work on the actual design and structure of the database will use a visual application like MySQL Workbench or SQL Server Management Studio, rather than hand coding create an Alter statements.

It's true that if you're in a database administration role you all might then spend a lot of time with these kinds of data definition statements, Create, Alter, and Drop, but regular day to day developers working in these databases typically don't. You shouldn't be spending most of your time changing the definition of the database. You should be spending most of your time manipulating the actual data in it.

It's worth mentioning that there are a couple of other keywords in SQL that are typically lumped together under the idea of data control keywords. This is where you can either grant or revoke permissions for people in the database. This is not an area we're going to cover in this course because that really gets into the specifics of things like the operating system and the authentication schemes that you're using for your particular database. So, it's just not general content.

Page 35: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

35

Normalization

The 5 Rules of Normalization

Page 36: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Normalization

Once we've started to plan out our tables, our columns, and relationships, we do something called Database Normalization. This is a process where you take your database design, and you apply a set of formal criteria of rules called Normal Forms. These were developed about 40 years ago mainly by Edgar Codd, the father of relational databases. And we step through them 1, 2, 3, first normal form, second normal form, and third normal form. There are others but these are the important ones.

Page 37: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

NormalizationNormalization should be carried out for every database you design. It's really not that hard, even though, yes, when you first start reading about database normalization, you'll run into phrases like:

But you don't have to get into all this language unless you are mathematically inclined. The entire point of normalization is to make your database easier and more reliable to work with. You usually will end up creating a few new tables as part of the process. But the end result is your database will contain a minimum of duplicate or redundant data. It will contain data that's easy to get to, easier to edit, and maintain, and you can preform operations, even difficult ones on your database without creating garbage in it, without invalidating the state of it.

If you're a working database administrator or database designer, you can do normalization in your sleep. It's a core competency of the job. It's important. And as you'll see, we've already been doing a little of it.

Page 38: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

First Normal Form (1NF)

Before we apply the first set of criteria, what's called first normal form, often shortened to 1NF, I'm taking as a given that we already have our columns and our primary keys specified.

First normal form says that each of your columns and each of your tables should contain one value, just one value, and there should be no repeating groups.

Okay, what does this actually mean?

Page 39: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

First Normal Form (1NF)

Well, let's say I begin developing a database for my company and one of my tables is an Employee table, very simple stuff, EmployeeID, LastName, FirstName, and so on. And we allocate every employee a computer.

I want to keep track of that, so we'll add a ComputerSerial column to keep track of who has what. Now, this is actually okay right now. This technically is in first normal form. Here's the problem.

Page 40: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

First Normal Form (1NF)Let's say I figured out that some of our employees need a Mac and a PC to do the testing. Others need a desktop and a laptop. So, several people have multiple computers, and I want to keep track of all of them. There is a couple of ways that I could deal with this. I could just start stuffing extra data into that one column. We could start putting commas or vertical bars or any other delimiter and put in multiple values in the one ComputerSerial column.

This is just something you just do not do in Relational Database Design. We're violating first normal form.

Understand the relational databases will happily deal with hundreds of tables. Each table could have hundreds of columns and millions of rows. But they do not want columns that have a variable amount of values. You would find it hard to search directly for a serial number. You'd find it hard to sort. You'd find it hard to maintain. So, it's not in first normal form if you do this because first normal form demands that every column, every field contains one and only one value.

Page 41: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

First Normal Form (1NF)So, what we might do then is go back to the original way, and instead start adding new columns. So, ComputerSerial2, ComputerSerial3, this is what's called a repeating group, and there should be no repeating groups. The classic sign of a repeating group column is a column with the same name, and the number tacked onto the end of it just to make it unique, because usually this is a sign of an inflexible design.

Sure, if we could guarantee that there would only ever be two or three, that's fine. But what happens when we want to add the tablet and the smart phone? What happens when one employee manages testing and needs to be associated with six computers? We don't want to require a change to the database schema just because we buy a new computer. So, what do we do here?

Page 42: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

First Normal Form (1NF)Well, what we do is the same thing for a lot of these normalization steps. We'll take this data out of the Employee table, and put it in its own table.

Page 43: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

First Normal Form (1NF)This then has relationships. We create a one-to-many relationship between employee, and this new computer, or it could be called an asset table or whatever else makes sense. And it has a foreign key back to the Employee table. I can take any EmployeeID like 551, follow it to the Computer table, and find his two computers or 553, find his three computers, there are no repeating values, no repeating groups in either table. And this will get us into first normal form.

Page 44: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

First Normal Form (1NF)This then has relationships. We create a one-to-many relationship between employee, and this new computer, or it could be called an asset table or whatever else makes sense. And it has a foreign key back to the Employee table. I can take any EmployeeID like 551, follow it to the Computer table, and find his two computers or 553, find his three computers, there are no repeating values, no repeating groups in either table. And this will get us into first normal form.

Now, it's very common that the solution to a normalization issue is to create a new table. Sometimes, it's a one-to-many relationship like this, other times it might even require a many-to-many with a linking table.

Page 45: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Second Normal Form (2NF)

Before you attempt to go into second normal form or 2NF, well first, you have to be in first normal form. You don't pick and choose between them. You go through this one, two, three. Now whereas first normal form is about the idea of repeating values in a particular column, second normal form, and third normal form are all about the relationship between your columns that are your keys, and your other columns that aren't your keys. The second normal form has the rather puzzling official description that any non-key field should be dependent on the entire primary key. And that is about as simple as it can get phrased.

Page 46: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Second Normal Form (2NF)Now, when I say the word field, it usually refers to the idea that the actual value in a particular column position for a particular row. But what does this actually mean? Well, for most of what we've done in this course, this actually won't be an issue for us. Second normal form is only ever a problem when we're using a Composite Primary Key. That is a primary key made of two or more columns. So, let me show you a table that currently is in first normal form but not in second normal form. Going back to the idea of a database for a training center, I have an Events table here that has an ID of a Course, a Date, CourseTitle, Room, Capacity, AvailableSeats, and so on. Now, what's actually happening here is this table has been defined to use two columns as the primary key. It's a composite primary key.

Page 47: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Second Normal Form (2NF)Now, the issue with second normal form is that if you use a composite key, you need to look closely at the other columns in this table. So, going along to my non-key columns, I have CourseTitle, SQL Fundamentals, Room 4A, Capacity is 12, there are 4 seats available. A lot of this information would be unique to this one entry, this one course on this particular date. That's fine. But second normal form asks that all of my non-key columns, everything that isn't part of the key, so Course Title, Room, Capacity, Available, they all have to be dependent on the entire primary key. Now, that is the case for Room and Capacity and Available. These are unique values based on the fact that we're running this particular Date, this particular Course, and this particular room with a certain number of seats available. It will always be different. But CourseTitle, well, I could get that just from half of the key. I could get that just from the first part of the key. It has no connection to the Date whatsoever. SQL Fundamentals will always be based on SQL101.

Page 48: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Second Normal Form (2NF)

It doesn't matter if it's being run in March or April or May. Now, this might sound a little bit ivory tower. But here would be the impact.

What happens if somebody reached into this table, and they changed that Course ID, but they didn't change the title? Now, we've got a conflict. We might have the wrong title for the wrong piece of data. That's because my data now isn't in second normal form, and we're trying to fix that conflict from ever happening. So, how do we fix it?

Page 49: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Second Normal Form (2NF)Well, once again, we're going to rip out the CourseTitle. We're going to create a separate Courses table, where we want to again map the ID of the course into its own row. So, we'll always have one specific title for one specific ID. And then we create a one-to-many relationship between Events and Course. And removing that from the Event table means that everything in that table is now based on the entire key, particular course, at a particular date which may have a different room or different capacity, different number of available seats.

Page 50: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Third Normal Form (3NF)

Now, let's take a look at the third normal form. Well, as plainly as this can be described, it's that no non-key field, meaning, a column that is not part of the primary key is dependent on another non-key field. It is in a way similar to second normal form. Second normal form asks can I figure out any of the values in this row from just part of the composite key? While third normal form asks can I figure out any of the values in this row from any of the other values in this row? And I shouldn't be able to do that.

Page 51: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Third Normal Form (3NF)Let's take a look at an example. I've got this updated version of the Events and Courses table from the previous example. So, it's in both first normal form, it doesn't have any repeating values or repeating groups, and it's in second normal form. Meaning, there's no part of this that's dependent on just on a piece of the key. What I need to do for third normal form is look at my non-key fields, Room, Capacity, Availability. If I scan the entire row, let's take the first row, we've got SQL101 course occurring on the 1st of March. There is apparently 4 seats available. It's in Room 4A with a capacity of 12. Now, this is at a first look at it perfectly acceptable, because this course could be being scheduled in a different room every time with a different number of available seats as we start to sell different seats for a particular date. That's all okay.

Page 52: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Third Normal Form (3NF)

Here's the problem. It's between Room and Capacity. These are both non-key fields. These columns aren't part of the primary key. But if I look down the column for Room, I see 4A has 12 seats capacity, 4A has 12 seats, 7B has 14 seats. So, if every time we're in Room 4A, we always have 12 seats or every time we're in 7B, we always have 14 seats. I don't need to repeat that information. I could figure out capacity from Room and Room alone. I have one non-key field that is based on another non-key field. So, we don't need these to be stored in the same table.What we need to do is, you guessed it, split some of this information out into its own table.

Page 53: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Third Normal Form (3NF)So, we need to pull out Capacity from the Event table, and just keep Room. And that's as long as Room will always tell us a fixed capacity, we'd create our own table for it, 4A always has 12, 7B always has 14, and so on. Now, we're in third normal form, no non-key field is dependent on another non-key field.

Now, as you're seeing, it's all about the redundancy of the information. This is what we're trying to do with normalization.

Page 54: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Third Normal Form (3NF)Now, another example of third normal form would be something like this, which is very common.

Let's say we've got an OrderItem table, which is calculating different parts of an invoice. So, it has a ProductID with a Quantity, a UnitPrice, and a Total.

Page 55: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Third Normal Form (3NF)Now, you don't have to worry about how this might relate to different tables. All I'm interested in looking at is this part. We've got Quantity for UnitPrice of $10, Total is $40. Here is the issue. We can see that Total is based purely as on Quantity times UnitPrice. Now, Quantity and UnitPrice are both non-key fields. So we're figuring out Total from these other two non-key fields. We don't need to do this. We don't need to store this in the database. We don't want to store information in your table that's easily ascertained by adding other non-key fields together, or in this case multiplying them. One of the main reasons for this is to prevent any conflicts.

If in this example I have a row that says we have a Quantity of 4 and the UnitPrice of 10, but the Total says 50, well, where is the problem? There is a problem. How do we do it? How would we fix it?

Page 56: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Third Normal Form (3NF)Is the Total wrong or is the Quantity wrong? Your data doesn't make sense anymore. So, we would remove that Total column form this table. We can figure it out when we need to figure it out. Now, third normal form will help you figure out these potential problems.

Now just a quick side bar, in cases like this where you might find a total useful in the table, many database systems offer you the option of defining a computed or calculated column. It's not actually stored in the database, it is a convenient read-only fiction. Its value is automatically calculated based on the other columns in the table, and you may find that useful from time to time.

Page 57: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

57

Denormalization

Page 58: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Database DenormalizationSo, we should always take our database design through the first, second, and third normal forms. There are more criteria available. There are fourth, fifth, and sixth normal forms. There's something called Boyce-Codd normal form. But taking it to third normal form is the usual expectation in a business environment, and certainly all we need to cover in a course like this one. Now, you will actually find a lot of tables out there intentionally break normalization rules and some others seem like they do but they actually don't.

Page 59: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Denormalization

Here's one example. Let's say we've got an Employee table, and I'm storing an Email and a Phone number.

Technically, this can be described as breaking first normal form. It's a repeating group. But in practice, you may find it more convenient to just allow an Email and Email2 column or perhaps a HomePhone and MobilePhone column rather than splitting everything out into multiple tables and having to follow relationships every single time you read or write this data. This will be referred to as a de-normalization decision. You're consciously making the choice that something could be normalized out into another table. You could follow the official rules. But for convenience and/or for performance, you're not going to.

Page 60: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Denormalization

Normalizing a table like this, thinking that I've immediately spotted a non-key field dependency, that would actually be taking it too far and making things more inconvenient.

And the question is you really want to understand your data before you can make all these choices whether to normalize or de-normalize. And you might de-normalize to make things a bit more efficient, but do it knowingly instead of accidentally.

Page 61: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Denormalization

These really the three steps that we would go through, first normal form, second normal form, and third normal form.

First being about having no repeating values and no repeating groups, second normal form, no values based on just part of say half of a composite key, and third normal form, none of your non-key values should be based on or determined from another non-key value.

Taking your database design through these three central criteria will vastly improve the quality of your data.

Page 62: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

Denormalization

One example that can seem like a normalization and/or de-normalization issue but really isn't any table that's full of address information. This situation can be a little deceptive. If I look at a table like this, and I can see I've got Zip code being stored as the last column here. Theoretically, I could figure out what the City, and the State are just from the Zip Code, if I separated them out into their own table. So technically, I have non-key fields, City and State that are dependent on another non-key field, Zip, that could be figured out from Zip alone. However, this kind of case is not the full story because while it might be true 99% of the time that a Zip code maps to a particular City or Town, there are some cases where multiple towns or cities are allowed in the same zip code, some Zip codes even cross multiple states.

Page 63: Instructor: Craig Duckett Lecture 05: Tuesday, October 13 th, 2015 Transactions, Acid Test, DML, DDL, Normalization 1 BIT275: Database Design (Fall 2015)

63

ICE 05