CSC 343 Introduction to Databases Summer 2017csc343h/summer/content/lectures...•What is a Database...

CSC 343 Introduction to Databases Summer 2018

CSC 343 Introduction to Databases

Summer 2018

WELCOME ALL

• This is the Introduction to Databases course

• Tamanna Chhabra

WELCOME ALL

• PhD from Aalto University, Finland

• Research papers in various international forums

• 4 years of teaching experience at college and university level

• My email is [email protected]

Course Outline

• Introduction to database management systems.

• The relational data model.

• Relational algebra.

• Querying and updating databases: the query language SQL.

• Application programming with SQL.

• Integrity constraints, normal forms, and database design.

• Elements of database system technology: query processing, transaction management.

Course Marking Scheme

Work            Weight   Comment
3 assignments   30%      10% each
Homework        10%      1% each, due 6:00 pm each Thursday (except the midterm week)
Midterm         15%
Final exam      45%      You must earn at least 40% to pass the course

Admin Stuff

Important: Read the course syllabus

• Communication: website: required reading

Piazza: our FAQs and pinned posts are required reading

your questions: to Piazza please

personal matters: email or visit me

Office hours: 4:00-5:30 pm on Thursday in BA3201

Active lectures

• Activities like:

team problem solving, reviewing other students’ solutions, and short quizzes.

• All three hours will be here, with me.

• We probably won’t use the “tutorial” time slot until next week.

Recommended Resources

• Ullman and Widom, “A First Course in Database Systems”, third edition.

• Jennifer Widom's online mini-courses from Stanford.

Expectations: Classroom Policy

• My role as a teacher

Organized class sessions

Post PPTs and other material on time

Keep room for your input in class

Post grades on time

Zero tolerance for favouritism

Maintain positive learning environment

Expectations: Classroom Policy

• Your role as a student

Respectful behavior (disrespectful behavior will be referred to the student conduct office)

Professionalism: punctuality and participation are expected

No social media during class time; devices are allowed for in-class activities

Assignment Policies

• You may work with a partner on assignments.

• You may not dissolve a partnership without permission.

• Assignments must be submitted via MarkUs.

• Late policy: You have 6 grace tokens. Each can be used for a 2-hour extension with no penalty.

To-do list

• Anyone new to the CS Teaching Labs:

Your account name is your UTORid.

Check your email account declared on Acorn for a message with your password.

Try logging in.

• Read the course syllabus.

• Bookmark the course website http://www.cdf.toronto.edu/~csc343h/summer/

Learning outcomes

• What is a database?

• What is a Database Management System?

• Evolution of DBMS

• Components of a DBMS

To understand what a database is, we need to know the difference between data and information.

Data and information

Data: Raw, unorganized facts that need to be processed. Data can be something simple and seemingly random and useless until it is organized. Data by itself is not significant.

Information: Data that has been processed, organized, structured, or presented in a given context so as to make it useful. Information is significant by itself.

What is a database?

What is a database?

• Database (DB): A collection of information that exists over a period of time.

• The related information when placed in an organized form makes a database.

Operations on Databases

• To add new information

• To view or retrieve the stored information

• To modify or edit the existing information

• To remove or delete the unwanted information

• To arrange the information in a desired order, etc.
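
The operations listed above can be tried out in a few lines. The following is a minimal sketch using Python's built-in sqlite3 module; the Contacts table, its columns, and the sample rows are made up for illustration and are not part of the course material.

```python
import sqlite3

# In-memory database; the Contacts table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Contacts (name TEXT, phone TEXT)")

# Add new information.
conn.executemany(
    "INSERT INTO Contacts (name, phone) VALUES (?, ?)",
    [("Smith", "555-0101"), ("Jones", "555-0102"), ("Adams", "555-0103")],
)

# Modify existing information.
conn.execute("UPDATE Contacts SET phone = ? WHERE name = ?", ("555-0199", "Smith"))

# Remove unwanted information.
conn.execute("DELETE FROM Contacts WHERE name = ?", ("Jones",))

# Retrieve the stored information, arranged in a desired order.
rows = conn.execute("SELECT name, phone FROM Contacts ORDER BY name").fetchall()
print(rows)
```

Each operation in the list corresponds to one SQL statement: INSERT, UPDATE, DELETE, and SELECT with ORDER BY.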

Database and Computers

• There are two approaches to storing data in computers: the file-based approach and the database approach.

Evolution of Database Management Systems

File Based Approach

• A file system is a method for storing and organizing computer files.

• Programmers used programming languages such as COBOL and C++ to write applications that directly access files to perform data management services and provide information to users.

Early database management systems: files

• The first commercial database systems evolved from file systems.

File systems allow storage of large amounts of data.

They do not guarantee data safety (data can be lost if it is not backed up).

They do not resolve the problem of several users modifying the same file concurrently.

They provide no query language for the data in files.

Programs must be written to extract even the most elementary information from a set of files.

Relational databases: key idea

• Think in terms of tables, not bits on disk.

• A database system should present the user with a view of data organized as tables (also called relations).

• Queries can be expressed in a very high-level language, which greatly increases the efficiency of database programmers.

Our dream system:

• Allows users to create new databases and specify their schema (the logical structure of the data) in a simple language.

• Enables data query and modification, using a simple language.

• Supports intelligent storage of very large amounts of data.

• Enforces constraints (for example, it does not allow the insertion of two different employees with the same SIN).
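
As a minimal sketch of the SIN constraint mentioned above, the following uses Python's built-in sqlite3 module. The Employees table layout and the sample SIN are assumptions made for illustration; UNIQUE is one way of asking the DBMS to reject a duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Employees table; UNIQUE asks the DBMS to reject duplicate SINs.
conn.execute("CREATE TABLE Employees (sin TEXT UNIQUE, name TEXT)")
conn.execute("INSERT INTO Employees VALUES ('123456789', 'Smith')")

try:
    # A different employee with the same SIN: the DBMS should refuse this row.
    conn.execute("INSERT INTO Employees VALUES ('123456789', 'Jones')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
print(rejected, count)
```

The application does not check for duplicates itself; the constraint is declared once in the schema and the DBMS enforces it on every insertion.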

Our dream system:

• Controls access to data from many users at once (concurrency), without allowing “bad” interactions that can compromise the consistency of the data.

• Recovers from software failures and crashes.

Such a system exists:

• Database Management System (DBMS) - complex software for storing and managing databases.

So what is a database?

• A database is a collection of data managed by a DBMS.

Example

• Suppose a file called Employees stores records with the fields/columns (emp_code, name, dept_code).

• Another file, called Departments, stores records with the fields (dept_code, dept_name).

• Suppose now that, given an employee, for instance one named “Smith”, we want to find out which department he works for.

Solution

• In the absence of a DBMS, we have to write a program which will:

1. open the file Employees

2. declare a variable of the same type as the records stored in the file

3. scan the file: while the end of the file has not been reached, assign the current record to the above variable. If the value of the name field is “Smith”, remember the value of the dept_code field. Suppose it is “100”.

Solution

4. search in a similar way for a record with “100” as the dept_code in the Departments file

5. print the dept_name once the matching dept_code is found

A very painful and time-consuming procedure.
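
The five steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the two files are simulated with in-memory CSV text, the sample rows and the department names are made up, and the function name find_department is hypothetical. Unlike step 3 as written, the sketch stops scanning at the first match.

```python
import csv
import io

# Stand-ins for the two files on disk; the sample rows are made up.
employees_file = io.StringIO("emp_code,name,dept_code\n1,Jones,200\n2,Smith,100\n")
departments_file = io.StringIO("dept_code,dept_name\n100,Sales\n200,Research\n")

def find_department(name):
    # Steps 1-3: scan Employees and remember the dept_code of the first match.
    employees_file.seek(0)
    dept_code = None
    for record in csv.DictReader(employees_file):
        if record["name"] == name:
            dept_code = record["dept_code"]
            break
    if dept_code is None:
        return None
    # Steps 4-5: scan Departments for the remembered dept_code.
    departments_file.seek(0)
    for record in csv.DictReader(departments_file):
        if record["dept_code"] == dept_code:
            return record["dept_name"]
    return None

dept_name = find_department("Smith")
print(dept_name)
```

Even with a CSV library doing the record parsing, the program has to spell out how to find the answer, one loop per file.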

Modern Solution

• Compare it to the short and elegant SQL query:

SELECT dept_name
FROM Employees, Departments
WHERE Employees.name = 'Smith'
  AND Employees.dept_code = Departments.dept_code;
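
To see the query run, here is a minimal sketch using Python's built-in sqlite3 module. The table names follow the file names given in the example (Employees and Departments); the column types and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (emp_code INTEGER, name TEXT, dept_code INTEGER)")
conn.execute("CREATE TABLE Departments (dept_code INTEGER, dept_name TEXT)")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "Jones", 200), (2, "Smith", 100)],
)
conn.executemany(
    "INSERT INTO Departments VALUES (?, ?)",
    [(100, "Sales"), (200, "Research")],
)

query = """
    SELECT dept_name
    FROM Employees, Departments
    WHERE Employees.name = 'Smith'
      AND Employees.dept_code = Departments.dept_code
"""
result = conn.execute(query).fetchall()
print(result)
```

The query states what is wanted; how the two tables are scanned and matched is left to the DBMS.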

Early applications of DBMS’s

• Airline reservation systems

• Banking systems

Data composed of many small items, and various queries and modifications on them.

Case 1: Airline Reservation Systems

• Here the items include:

Reservations by a single customer on a single flight, including such information as the assigned seat…

Flight information: the airport they fly from and to, their departure and arrival times…

Ticket information: prices, requirements, and availability.

• Typical queries ask for:

Flights leaving about a certain time from one given city to another, seats available, prices.

Case 1: Airline Reservation Systems

• Typical data modifications include:

Making a reservation on a flight for a customer, assigning a seat, etc.

Case 1: Airline Reservation Systems

• Many agents access parts of the data at any given time. The DBMS must allow concurrent accesses and prevent problems such as two agents assigning the same seat simultaneously.

• The DBMS should also protect against loss of records if the system suddenly fails.

Case 2: Banking Systems

• Data items include:

Customers, their names, addresses, etc.

Accounts, and their balances

Loans, and their balances

Connections between customers and their accounts and loans.

• Typical queries are those for account and loan balances.

• Typical modifications are those representing a withdrawal from or deposit to an account.

Banking Systems

• In banking systems, failures cannot be tolerated.

E.g., once the money has been ejected from an ATM, the bank must record the debit, even if the power immediately fails.

Example of a Relational DB

• Relations = Tables. Columns are “headed” by attribute names.

• Rows = Tuples

Accounts Relation

AccountNo   Balance   Type
12345       1000.0    Savings
67890       2846.9    Checking
...         ...       ...

Example of a Relational DB

• Example queries

1. What's the balance of account “67890”?

SELECT balance FROM Accounts WHERE accountNo = 67890;

2. Which are the savings accounts with negative balances?

SELECT accountNo FROM Accounts WHERE type = 'Savings' AND balance < 0;
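
Both queries can be run against the Accounts relation with Python's built-in sqlite3 module. This is a minimal sketch: the column types are assumptions, and the third row (an overdrawn savings account) is made up so that the second query returns something. The string literal is written as 'Savings' to match the capitalization stored in the table, because string comparison is case-sensitive in SQLite by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (accountNo INTEGER, balance REAL, type TEXT)")
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?, ?)",
    [
        (12345, 1000.0, "Savings"),
        (67890, 2846.9, "Checking"),
        (11111, -50.0, "Savings"),  # made-up row with a negative balance
    ],
)

# Query 1: the balance of account 67890.
balance = conn.execute(
    "SELECT balance FROM Accounts WHERE accountNo = 67890"
).fetchall()

# Query 2: savings accounts with negative balances.
overdrawn = conn.execute(
    "SELECT accountNo FROM Accounts WHERE type = 'Savings' AND balance < 0"
).fetchall()

print(balance, overdrawn)
```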

Multiple choice questions

• Duplication of data in several places is called _______________.

Data Inconsistency

Data Redundancy

Data Isolation

None of the above

Problems with data redundancy

• Same information is stored in more than one file.

• For example: Data between the Payroll and the Personnel department is duplicated.

• Change of address reflected only in Personnel and not in Payroll.

• Pay slips would be sent to the wrong address.
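
The Payroll and Personnel scenario can be mimicked in a few lines of Python. This is a toy sketch: the two dictionaries stand in for the two departments' separate files, and the names and addresses are made up.

```python
# Two departments each keep their own copy of an employee's address (made-up data).
personnel = {"Smith": "10 Old Road"}
payroll = {"Smith": "10 Old Road"}

# The change of address reaches Personnel only.
personnel["Smith"] = "25 New Avenue"

# The two copies now disagree, and Payroll still uses the stale one.
inconsistent = [name for name in personnel if payroll.get(name) != personnel[name]]
pay_slip_address = payroll["Smith"]
print(inconsistent, pay_slip_address)
```

If the address were stored in only one place, there would be no second copy to fall out of date.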

Multiple choice questions

• Data redundancy increases the cost of storing and retrieving data.

True

False

Multiple choice questions

• If the common fields in redundant files do not match, the result is _____________.

Data Redundancy

Data Integrity Problem

Data Isolation

Data Inconsistency

Multiple choice questions

• Which of the following terms refers to the correctness and completeness of the data in a database?

Data security

Data independence

Data integrity

Data model

Example

• Data redundancy can lead to loss of data integrity.

Multiple choice questions

• When multiple users try to access the same piece of data at the same time, it is called _______________.

Data integrity

Concurrency

Data independence

None of these

Components of a Database Management System

DBMS Architecture

• The “cylindrical” component contains not only data, but also metadata, i.e. info about the structure of data.

Metadata

• If the DBMS is relational, metadata includes: names of relations, names of attributes of those relations, and data types for those attributes (e.g., integer or character string).
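
A relational DBMS typically lets you query its metadata. The following is a minimal sketch in Python's built-in sqlite3 module, using SQLite's sqlite_master catalog table and its PRAGMA table_info command; other systems expose metadata differently. The Accounts column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (accountNo INTEGER, balance REAL, type TEXT)")

# Names of relations, read from SQLite's built-in catalog table.
relations = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# Attribute names and declared types; PRAGMA table_info returns
# (cid, name, type, notnull, dflt_value, pk) for each column.
attributes = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(Accounts)")
]

print(relations)
print(attributes)
```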

Storage Manager

• The job of the Storage Manager is to obtain data from the data storage, and return new data to the data storage when updated.

Query Processor

• The Query Processor:

Handles queries and modifications to the data.

Finds the best way to carry out a requested operation.

Example: Query optimization

• A bank has a DB with two tables: Customers (name, SIN, address), Accounts (accountNo, balance, SIN)

• Query: “Find the balances of all accounts of which Sally is the owner.”

Example: Query optimization

• SQL: SELECT Accounts.balance FROM Customers, Accounts WHERE Customers.SIN = Accounts.SIN AND Customers.name = 'Sally';

Example: Query optimization

• If executed naively, this query:

1. Pairs the tuples of the tables specified in the FROM clause into a new table R.

2. Chooses from R the tuples satisfying the condition in the WHERE clause.

3. Produces as the answer only the values of the attributes in the SELECT clause.

The performance would be terrible, because the set of all pairs of tuples is usually enormous (quadratic in size).

Example: Query optimization

• The query processor will cleverly create a plan which inexpensively:

Retrieves the tuple for “Sally” and gets her SIN

Retrieves the account tuples for this SIN
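
The difference between the naive evaluation and the cheaper plan can be illustrated with plain Python lists. This is a toy sketch, not how a real query processor is implemented: the sample tuples are made up, and the "work" counters simply count tuples or pairs examined. A real DBMS would usually also use indexes to avoid scanning whole tables.

```python
# Made-up sample data for Customers(name, SIN, address)
# and Accounts(accountNo, balance, SIN).
customers = [("Sally", 111, "1 King St"), ("Bob", 222, "2 Queen St"), ("Ann", 333, "3 Bay St")]
accounts = [(12345, 1000.0, 111), (67890, 2846.9, 222), (24680, 50.0, 111), (13579, 75.5, 333)]

def naive_plan():
    # Pair every customer with every account, then filter, then project.
    pairs = [(c, a) for c in customers for a in accounts]
    matching = [(c, a) for (c, a) in pairs if c[1] == a[2] and c[0] == "Sally"]
    return [a[1] for (_, a) in matching], len(pairs)

def selective_plan():
    # Find Sally's SIN first, then look at accounts only for that SIN.
    sins = {c[1] for c in customers if c[0] == "Sally"}
    examined = len(customers) + len(accounts)  # one pass over each table
    return [a[1] for a in accounts if a[2] in sins], examined

naive_balances, naive_work = naive_plan()
plan_balances, plan_work = selective_plan()
print(naive_balances, naive_work)
print(plan_balances, plan_work)
```

Both plans return the same balances, but the naive one builds a number of pairs equal to the product of the table sizes, while the selective one touches each table once.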

Transaction manager

• The Transaction Manager assures that:

several queries running simultaneously do not interfere with each other, and that

the system will not end up with corrupted data even if there is a power failure.

Transaction Manager

• The Transaction Manager interacts with:

the Query Manager, because it may need to delay certain query operations to avoid conflicts.

A DBMS is a very complex system.

Good news: it has already been built for you to use.

Modern DBMS’s guarantee:

• Controlling Redundancy:

In a file system, each application has its own private files, which cannot be shared between multiple applications.

This can often lead to considerable redundancy.

With a centralized database, most of this redundancy can be controlled.

Modern DBMS’s guarantee:

• Integrity can be enforced:

Integrity means that the data in the database is always accurate, so that incorrect information cannot be stored in it.

To achieve this, integrity constraints are enforced on the database.

Modern DBMS’s guarantee:

• Inconsistency can be avoided:

When the same data is duplicated and changes made at one site are not propagated to the other site, the result is inconsistency.

So if the redundancy is removed, the chance of having inconsistent data is also removed.

Modern DBMS’s guarantee:

• Data can be shared:

Because the data is centralized, unlike in a file system, it can be shared among applications.

Modern DBMS’s guarantee:

• Providing Backup and Recovery:

For example, if the computer fails in the middle of an update program, the recovery subsystem is responsible for making sure that the database is restored to the state it was in before the program started executing.
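
The idea of restoring the state from before an interrupted update can be sketched with a transaction in Python's built-in sqlite3 module. This is an illustration under stated assumptions: the failure is simulated by raising an exception (not a real crash or power failure), the transfer function and its fail_midway flag are hypothetical, and the account rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (accountNo INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [(12345, 1000.0), (67890, 2846.9)])
conn.commit()

def transfer(amount, source, target, fail_midway=False):
    # "with conn" commits if the block finishes and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE Accounts SET balance = balance - ? WHERE accountNo = ?",
            (amount, source),
        )
        if fail_midway:
            raise RuntimeError("simulated failure in the middle of the update")
        conn.execute(
            "UPDATE Accounts SET balance = balance + ? WHERE accountNo = ?",
            (amount, target),
        )

try:
    transfer(100.0, 12345, 67890, fail_midway=True)
except RuntimeError:
    pass

# The half-finished debit was rolled back, so both balances are unchanged.
balances = conn.execute(
    "SELECT accountNo, balance FROM Accounts ORDER BY accountNo"
).fetchall()
print(balances)
```

Either both updates take effect or neither does; the database never keeps only the debit.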

Modern DBMS’s guarantee:

• Concurrency Control:

The DBMS provides mechanisms that give multiple users concurrent access to the data.

Database studies

• Design of databases (data modeling).

How to structure information?

How to connect data items?

What constraints should the data satisfy?

• Database programming.

How to query and modify the database?

How is database programming combined with conventional programming?

Database studies

• Database system implementation.

How does one build a DBMS, including such matters as query processing, transaction processing and organizing storage for efficient access?

Thanks to Marina Barsky and Diane Horton for the material.