CSC 343 Introduction to Databases
Summer 2018
WELCOME ALL
• This is the Introduction to Databases course
• Tamanna Chhabra
• PhD from Aalto University, Finland
• Research papers in various international forums
• 4 years of teaching experience at college and university level
• My email is [email protected]
Course Outline
• Introduction to database management systems.
• The relational data model.
• Relational algebra.
• Querying and updating databases: the query language SQL.
• Application programming with SQL.
• Integrity constraints, normal forms, and database design.
• Elements of database system technology: query processing, transaction management.
Course Marking Scheme
Work | Weight | Comment
3 assignments | 30% | 10% each
Homework | 10% | 1% each, due 6:00 pm each Thursday (except the midterm week)
Midterm | 15% |
Final exam | 45% | You must earn at least 40% to pass the course
Admin Stuff
Important: Read the course syllabus
• Communication:
Website: required reading
Piazza: our FAQs and pinned posts are required reading
Your questions: to Piazza, please
Personal matters: email or visit me
Office hours: 4:00-5:30 pm on Thursday in BA3201
Active lectures
• Activities like:
team problem solving, reviewing other students’ solutions, and short quizzes.
• All three hours will be here, with me.
• We probably won’t use the “tutorial” time slot until next week.
Recommended Resources
• Ullman and Widom, “A First Course in Database Systems”, third edition.
• Jennifer Widom's online mini-courses from Stanford.
Expectations- Classroom Policy
• My role as a teacher
Organized class sessions
Post PPTs and other material on time
Keep room for your input in class
Post grades on time
Zero tolerance for favouritism
Maintain positive learning environment
Expectations- Classroom Policy
• Your role as a student
Respectful behavior (disrespectful behavior will be referred to the student conduct office)
Professionalism – punctuality and participation are expected
No social media during class time; devices are allowed for in-class activities
Assignment Policies
• You may work with a partner on assignments.
• You may not dissolve a partnership without permission.
• Assignments must be submitted via MarkUs.
• Late policy: You have 6 grace tokens. Each can be used for a 2-hour extension with no penalty.
To-do list
• Anyone new to the CS Teaching Labs:
Your account name is your UTORid.
Check your email account declared on Acorn for a message with your password.
Try logging in.
• Read the course syllabus.
• Bookmark the course website http://www.cdf.toronto.edu/~csc343h/summer/
Learning outcomes
• What is a database?
• What is a Database Management System?
• Evolution of DBMS
• Components of a DBMS
To understand what a database is, we need to know the difference between data and information.
Data and information
Data | Information
Data is raw, unorganized facts that need to be processed. Data can be something simple and seemingly random and useless until it is organized. | When data is processed, organized, structured or presented in a given context so as to make it useful, it is called information.
Data by itself is not significant. | Information is significant by itself.
What is a database?
• Database (DB): A collection of information that exists over a period of time.
• The related information when placed in an organized form makes a database.
Operations on Databases
• To add new information
• To view or retrieve the stored information
• To modify or edit the existing information
• To remove or delete the unwanted information
• To arrange the information in a desired order, etc.
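The operations listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the Contacts table and its rows are hypothetical and do not come from the course material.

```python
import sqlite3

# Hypothetical table and rows, used only to illustrate the operations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Contacts (name TEXT, city TEXT)")

# Add new information.
conn.executemany(
    "INSERT INTO Contacts (name, city) VALUES (?, ?)",
    [("Smith", "Toronto"), ("Jones", "Ottawa"), ("Adams", "Montreal")],
)

# Modify existing information.
conn.execute("UPDATE Contacts SET city = ? WHERE name = ?", ("Kingston", "Jones"))

# Remove unwanted information.
conn.execute("DELETE FROM Contacts WHERE name = ?", ("Adams",))

# Retrieve the stored information, arranged in a desired order.
rows = conn.execute("SELECT name, city FROM Contacts ORDER BY name").fetchall()
print(rows)
```

Each operation maps to one SQL statement: INSERT, UPDATE, DELETE, and SELECT with ORDER BY.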
Database and Computers
• There are two approaches to storing data in computers: the file-based approach and the database approach.
Evolution of Database Management Systems
File Based Approach
• A file system is a method for storing and organizing computer files.
• Programmers used programming languages such as COBOL and C++ to write applications that directly access files to perform data management services and provide information to users.
Early database management systems: files
• First commercial database systems evolved from file systems.
File systems allow storage of large amounts of data.
They do not guarantee data safety (data can be lost if not backed up).
They do not resolve the problem of several users modifying the same file concurrently.
They offer no query language for the data in files.
Programs must be written to extract even the most elementary information from a set of files.
Relational databases: key idea
• Think in terms of tables, not bits on disk.
• A database system should present the user with a view of data organized as tables (also called relations).
• Queries could be expressed in a very high-level language, which greatly increases the efficiency of database programmers.
Our dream system:
• Allows users to create new databases and specify their schema (the logical structure of the data) in a simple language
• Enables data query and modification, using a simple language
• Supports intelligent storage of very large amounts of data.
• Enforces constraints (e.g., does not allow the insertion of two different employees with the same SIN).
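A minimal sketch of the SIN constraint, using Python's sqlite3 module. The Employees schema and the sample SIN are assumptions made for this example; the point is only that the DBMS, not the application, rejects the second insertion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: sin is declared UNIQUE, so the DBMS itself rejects duplicates.
conn.execute("CREATE TABLE Employees (sin TEXT UNIQUE, name TEXT)")
conn.execute("INSERT INTO Employees VALUES ('111-111-111', 'Smith')")

try:
    # A different employee with the same SIN violates the constraint.
    conn.execute("INSERT INTO Employees VALUES ('111-111-111', 'Jones')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
print(duplicate_rejected, employee_count)
```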
Our dream system:
• Controls access to data from many users at once (concurrency), without allowing “bad” interactions that can corrupt the consistency of the data.
• Recovers from software failures and crashes.
Such a system exists:
• Database Management System (DBMS) - complex software for storing and managing databases.
So what is a database?
• A database is a collection of data managed by a DBMS.
Example
• Suppose a file called Employees stores records with the fields/columns (emp_code, name, dept_code).
• Another file, called Department, stores records with the fields (dept_code, dept_name).
• Given an employee, for instance one named “Smith”, we want to find out which department that employee works for.
Solution
• In the absence of a DBMS, we have to write a program that will:
1. open the file Employees
2. declare a variable of the same type as the records stored in the file
3. scan the file: while the end of the file is not yet encountered, assign the current record to the above variable. If the value of the name field is “Smith”, then remember the value of the dept_code field. Suppose it is “100”
4. search in a similar way for a record with “100” for the dept_code in the Department file
5. print the dept_name once the record with that dept_code is found
This is a very painful and time-consuming procedure.
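The hand-written procedure can be sketched as follows. The slides do not say how the files are stored, so this sketch assumes CSV files with a header row; the function name find_department and the sample rows are hypothetical.

```python
import csv
import os
import tempfile


def find_department(emp_path, dept_path, emp_name):
    """Scan two CSV files by hand to find an employee's department name."""
    dept_code = None
    with open(emp_path, newline="") as emp_file:        # step 1: open Employees
        for record in csv.DictReader(emp_file):          # steps 2-3: scan record by record
            if record["name"] == emp_name:
                dept_code = record["dept_code"]          # remember the dept_code
                break
    if dept_code is None:
        return None
    with open(dept_path, newline="") as dept_file:      # step 4: scan Department
        for record in csv.DictReader(dept_file):
            if record["dept_code"] == dept_code:
                return record["dept_name"]               # step 5: report the match
    return None


# Hypothetical sample data written to temporary files.
workdir = tempfile.mkdtemp()
emp_path = os.path.join(workdir, "Employees.csv")
dept_path = os.path.join(workdir, "Department.csv")
with open(emp_path, "w", newline="") as f:
    csv.writer(f).writerows(
        [["emp_code", "name", "dept_code"], ["1", "Smith", "100"], ["2", "Jones", "200"]]
    )
with open(dept_path, "w", newline="") as f:
    csv.writer(f).writerows(
        [["dept_code", "dept_name"], ["100", "Sales"], ["200", "Research"]]
    )

result = find_department(emp_path, dept_path, "Smith")
print(result)
```

Every new question about the data needs another function like this one, which is the cost the file-based approach imposes.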
Modern Solution
• Compare it to the short and elegant SQL query
SELECT dept_name FROM Employees, Department WHERE Employees.name = 'Smith' AND Employees.dept_code = Department.dept_code;
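To try this query, one option is Python's sqlite3 module with an in-memory database. The table definitions follow the field lists given earlier; the column types and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (emp_code INTEGER, name TEXT, dept_code INTEGER);
    CREATE TABLE Department (dept_code INTEGER, dept_name TEXT);
    -- Hypothetical sample rows.
    INSERT INTO Employees VALUES (1, 'Smith', 100), (2, 'Jones', 200);
    INSERT INTO Department VALUES (100, 'Sales'), (200, 'Research');
""")

# Same query as on the slide, with the employee name passed as a parameter.
query = """
    SELECT dept_name
    FROM Employees, Department
    WHERE Employees.name = ?
      AND Employees.dept_code = Department.dept_code
"""
smith_departments = [row[0] for row in conn.execute(query, ("Smith",))]
print(smith_departments)
```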
Early applications of DBMS’s
• Airline reservation systems
• Banking systems
Data composed of many small items, and various queries and modifications on them.
Case 1: Airline Reservation Systems
• Here the items include:
Reservations by a single customer on a single flight, including such information as assigned seat…
Flight information – the airport they fly from and to, their departure and arrival times…
Ticket information – prices, requirements, and availability.
• Typical queries ask for:
Flights leaving about a certain time from one given city to another, seats available, prices.
Case 1: Airline Reservation Systems
• Typical data modifications include:
Making a reservation on a flight for a customer, assigning a seat, etc.
Case 1: Airline Reservation Systems
• Many agents access parts of the data at any given time. DBMS must allow concurrent accesses and prevent problems such as two agents assigning the same seat simultaneously.
• DBMS should also protect against loss of records if the system suddenly fails.
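One way to picture the seat-assignment problem is a single conditional update that claims a seat only if it is still free. This is a sketch under assumptions: the Seats table, the flight number, and the assign_seat helper are hypothetical, and the two agents are simulated one after the other rather than truly in parallel. A real DBMS relies on its own locking to make such an update safe under concurrent access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per seat; passenger is NULL while the seat is free.
conn.execute(
    "CREATE TABLE Seats (flight TEXT, seat TEXT, passenger TEXT, "
    "PRIMARY KEY (flight, seat))"
)
conn.execute("INSERT INTO Seats VALUES ('AC101', '12A', NULL)")
conn.commit()


def assign_seat(conn, flight, seat, passenger):
    """Claim the seat only if it is still free; return True on success."""
    with conn:  # commits on success, rolls back on an exception
        cursor = conn.execute(
            "UPDATE Seats SET passenger = ? "
            "WHERE flight = ? AND seat = ? AND passenger IS NULL",
            (passenger, flight, seat),
        )
        return cursor.rowcount == 1


# Two agents try to assign the same seat.
first = assign_seat(conn, "AC101", "12A", "Smith")
second = assign_seat(conn, "AC101", "12A", "Jones")
holder = conn.execute(
    "SELECT passenger FROM Seats WHERE flight = 'AC101' AND seat = '12A'"
).fetchone()[0]
print(first, second, holder)
```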
Case 2: Banking Systems
• Data items include:
Customers, their names, addresses, etc.
Accounts, and their balances
Loans, and their balances
Connections between customers and their accounts and loans.
• Typical queries are those for account and loan balances.
• Typical modifications are those representing a withdrawal from or deposit to an account.
Banking Systems
• In banking systems failures cannot be tolerated.
E.g., once the money has been ejected from an ATM, the bank must record the debit, even if the power immediately fails.
Example of a Relational DB
• Relations = Tables. Columns are “headed” by attribute names.
• Rows = Tuples
Accounts Relation
AccountNo | Balance | Type
12345 | 1000.0 | Savings
67890 | 2846.9 | Checking
... | ... | ...
Example of a Relational DB
• Query examples
1. What's the balance of account “67890”?
SELECT balance FROM Accounts WHERE accountNo = 67890;
2. Which are the savings accounts with negative balances?
SELECT accountNo FROM Accounts WHERE type = 'Savings' AND balance < 0;
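Both queries can be tried against a small in-memory copy of the Accounts relation using Python's sqlite3 module. The first two rows come from the table above; the third row and the column types are assumptions, added so that the second query has something to return. The type value is spelled 'Savings' as it appears in the table, since string comparison is case-sensitive in SQLite by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (accountNo INTEGER, balance REAL, type TEXT)")
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?, ?)",
    [
        (12345, 1000.0, "Savings"),
        (67890, 2846.9, "Checking"),
        (24680, -50.0, "Savings"),  # hypothetical row with a negative balance
    ],
)

# Query 1: the balance of account 67890.
balance = conn.execute(
    "SELECT balance FROM Accounts WHERE accountNo = 67890"
).fetchone()[0]

# Query 2: savings accounts with negative balances.
overdrawn = [
    row[0]
    for row in conn.execute(
        "SELECT accountNo FROM Accounts WHERE type = 'Savings' AND balance < 0"
    )
]
print(balance, overdrawn)
```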
Multiple choice questions
• Duplication of data at several places is called _______________.
Data Inconsistency
Data Redundancy
Data Isolation
None of the above
Problems with data redundancy
• Same information is stored in more than one file.
• For example: Data between the Payroll and the Personnel department is duplicated.
• A change of address is recorded only in Personnel and not in Payroll.
• Pay slips are then sent to the wrong address.
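The Payroll and Personnel scenario can be reproduced in a few lines with Python's sqlite3 module. The two tables, their columns, and the addresses are hypothetical stand-ins for the two departments' files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables standing in for the two departments' files.
    CREATE TABLE Personnel (emp_code INTEGER, address TEXT);
    CREATE TABLE Payroll   (emp_code INTEGER, address TEXT);
    INSERT INTO Personnel VALUES (1, '10 King St');
    INSERT INTO Payroll   VALUES (1, '10 King St');
""")

# The change of address reaches Personnel only.
conn.execute("UPDATE Personnel SET address = '25 Queen St' WHERE emp_code = 1")

# Find employees whose two stored addresses now disagree.
mismatches = conn.execute("""
    SELECT Personnel.emp_code, Personnel.address, Payroll.address
    FROM Personnel, Payroll
    WHERE Personnel.emp_code = Payroll.emp_code
      AND Personnel.address <> Payroll.address
""").fetchall()
print(mismatches)
```

Storing the address in one place only would make this kind of mismatch impossible.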
Multiple choice questions
• Data redundancy increases the cost of storing and retrieving data.
True
False
Multiple choice questions
• If the common fields in redundant files do not match, the result is _____________.
Data Redundancy
Data Integrity Problem
Data Isolation
Data Inconsistency
Multiple choice questions
• Which of the following terms refers to the correctness and completeness of the data in a database?
Data security
Data independence
Data integrity
Data model
Example
• Data redundancy can lead to loss of data integrity.
Multiple choice questions
• When multiple users try to access the same piece of data at the same time, it is called
Data integrity
Concurrency
Data independence
None of these
Components of a Database Management System
DBMS Architecture
• The “cylindrical” component contains not only data, but also metadata, i.e., information about the structure of the data.
Metadata
• If the DBMS is relational, metadata includes: names of relations, names of attributes of those relations, and data types for those attributes (e.g., integer or character string).
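As one concrete case, SQLite exposes this kind of metadata through its sqlite_master catalog table and the table_info pragma. The sketch below uses Python's sqlite3 module; the Accounts schema is an assumption, and other DBMSs expose their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (accountNo INTEGER, balance REAL, type TEXT)")

# Names of relations, read from SQLite's built-in catalog table.
relations = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# Names and declared types of the attributes of one relation.
# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
attributes = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(Accounts)")
]
print(relations, attributes)
```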
Storage Manager
• The job of the Storage Manager is to obtain data from the data storage, and return new data to the data storage when updated.
Query Processor
• The Query Processor handles queries and modifications to the data.
It finds the best way to carry out a requested operation.
Example: Query optimization
• A bank has a DB with two tables: Customers (name, SIN, address), Accounts (accountNo, balance, SIN)
• Query: “Find the balances of all accounts of which Sally is the owner.”
Example: Query optimization
• SQL: SELECT Accounts.balance FROM Customers, Accounts WHERE Customers.SIN = Accounts.SIN AND Customers.name = 'Sally';
Example: Query optimization
• If executed naively, this query:
Pairs tuples of the tables specified in the FROM clause into a new table R.
Chooses from R the tuples satisfying the condition in the WHERE clause.
Produces as the answer only the values of the attributes in the SELECT clause.
The performance would be terrible, because the set of all pairs of tuples is usually enormous (quadratic in size).
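The naive evaluation can be written out directly in Python. This is a sketch on hypothetical sample data, with plain tuples standing in for the two tables; it mirrors the three steps and is not how a DBMS stores or scans data.

```python
from itertools import product

# Hypothetical sample data; tuples follow the column order of the two tables.
customers = [  # (name, SIN, address)
    ("Sally", "111", "1 Main St"),
    ("Bob", "222", "2 Elm St"),
]
accounts = [  # (accountNo, balance, SIN)
    (12345, 1000.0, "111"),
    (67890, 2846.9, "222"),
    (13579, 20.0, "111"),
]

# Step 1: pair every Customers tuple with every Accounts tuple.
R = list(product(customers, accounts))

# Step 2: keep the pairs that satisfy the WHERE clause.
selected = [(c, a) for c, a in R if c[1] == a[2] and c[0] == "Sally"]

# Step 3: output only the attribute named in the SELECT clause.
naive_balances = [a[1] for c, a in selected]
print(len(R), naive_balances)
```

With 2 customers and 3 accounts R already holds 6 pairs; with a million rows in each table it would hold 10^12.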
Example: Query optimization
• Query processor will cleverly create a plan which inexpensively:
Retrieves the tuple for “Sally” and gets her SIN
Retrieves the account tuples for this SIN
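A rough picture of such a plan, assuming the DBMS has indexes on Customers.name and Accounts.SIN. In this sketch Python dictionaries stand in for those indexes and the sample data is hypothetical; a real query processor chooses its plan from the indexes and statistics actually available.

```python
from collections import defaultdict

# Hypothetical sample data; tuples follow the column order of the two tables.
customers = [  # (name, SIN, address)
    ("Sally", "111", "1 Main St"),
    ("Bob", "222", "2 Elm St"),
]
accounts = [  # (accountNo, balance, SIN)
    (12345, 1000.0, "111"),
    (67890, 2846.9, "222"),
    (13579, 20.0, "111"),
]

# Dictionaries standing in for indexes on Customers.name and Accounts.SIN.
sins_by_name = defaultdict(list)
for name, sin, _address in customers:
    sins_by_name[name].append(sin)

accounts_by_sin = defaultdict(list)
for account in accounts:
    accounts_by_sin[account[2]].append(account)


def balances_for(name):
    """Look up the customer's SIN, then fetch only that SIN's accounts."""
    return [
        balance
        for sin in sins_by_name.get(name, [])
        for (_account_no, balance, _sin) in accounts_by_sin.get(sin, [])
    ]


planned_balances = balances_for("Sally")
print(planned_balances)
```

The lookups touch only Sally's tuples, so no table of all pairs is ever built.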
Transaction manager
• The Transaction Manager assures that:
several queries running simultaneously do not interfere with each other, and that
the system will not end up with corrupted data even if there is a power failure.
Transaction Manager
• The Transaction Manager interacts with:
the Query Manager, because it may need to delay certain query operations to avoid conflicts.
A DBMS is a very complex system.
Good news: it has already been built for you to use.
Modern DBMS’s guarantee:
• Controlling Redundancy:
In a file system, each application has its own private files, which cannot be shared among multiple applications.
This can often lead to considerable redundancy.
With a centralized database, most of it can be controlled.
Modern DBMS’s guarantee:
• Integrity can be enforced: the data in the database is kept accurate by rejecting information that violates declared rules.
These rules are integrity constraints enforced on the database.
Modern DBMS’s guarantee:
• Inconsistency can be avoided:
When the same data is duplicated and changes made at one site are not propagated to the other site, inconsistency arises.
So if the redundancy is removed, the chance of having inconsistent data is removed as well.
Modern DBMS’s guarantee:
• Data can be shared:
Because the data is centralized, unlike in a file system, it can be shared among applications and users.
Modern DBMS’s guarantee:
• Providing Backup and Recovery:
For example, if the computer fails in the middle of an update program, the recovery subsystem is responsible for making sure that the database is restored to the state it was in before the program started executing.
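The restore-to-previous-state behaviour can be imitated with a transaction in Python's sqlite3 module. This is only a sketch: a Python exception stands in for the failure, the transfer helper and account rows are hypothetical, and a real recovery subsystem handles actual crashes using its log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (accountNo INTEGER PRIMARY KEY, balance REAL)")
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?)", [(12345, 1000.0), (67890, 2846.9)]
)
conn.commit()


def transfer(conn, source, target, amount, fail_midway=False):
    """Move money between two accounts as one all-or-nothing transaction."""
    with conn:  # commit if the block finishes, roll back if it raises
        conn.execute(
            "UPDATE Accounts SET balance = balance - ? WHERE accountNo = ?",
            (amount, source),
        )
        if fail_midway:
            raise RuntimeError("simulated failure in the middle of the update")
        conn.execute(
            "UPDATE Accounts SET balance = balance + ? WHERE accountNo = ?",
            (amount, target),
        )


try:
    transfer(conn, 12345, 67890, 100.0, fail_midway=True)
except RuntimeError:
    pass  # the half-finished transfer has been rolled back

balances_after_failure = conn.execute(
    "SELECT accountNo, balance FROM Accounts ORDER BY accountNo"
).fetchall()
print(balances_after_failure)
```

After the simulated failure both balances are unchanged, as if the update program had never started.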
Modern DBMS’s guarantee:
• Concurrency Control:
The DBMS provides mechanisms that allow multiple users to access the data concurrently.
Database studies
• Design of databases (data modeling).
How to structure information?
How to connect data items?
What constraints should the data satisfy?
• Database programming.
How to query and modify the database?
How is database programming combined with
conventional programming?
Database studies
• Database system implementation.
How does one build a DBMS, including such matters as
query processing, transaction processing and organizing
storage for efficient access?
Thanks to Marina Barsky and Diane Horton for the material.