Introduction to Database Systems CSE 444

Lecture #1, January 3, 2005



Staff
• Instructor: Dan Suciu
  – Allen, Room 662, [email protected]
  – Office hours: Wednesday, 10:30-11:30 (advance email recommended)
• TAs:
  – Ashish Gupta, [email protected]
  – Victor Tung, [email protected]
  – Office hours: TBA (check mailing list)


Communications
• Web page: http://www.cs.washington.edu/444/
  – Lectures will be available here
  – Homeworks will be posted here
  – The project description will be here
• Mailing list:
  – please subscribe (see instructions on the Web page)


Textbook(s)

Main textbook, available at the bookstore:

• Database Systems: The Complete Book, Hector Garcia-Molina, Jeffrey Ullman, Jennifer Widom

Most chapters are good. Some are not (functional dependencies).

COME TO CLASS! Slides are good, and we discuss in class.


Other Texts
On reserve at the Engineering Library:
• Database Management Systems, Ramakrishnan
  – very comprehensive
• XQuery from the Experts, Katz, Ed.
  – The reference on XQuery
• Fundamentals of Database Systems, Elmasri, Navathe
  – very widely used, but we don’t use it
• Foundations of Databases, Abiteboul, Hull, Vianu
  – Mostly theory of databases
• Data on the Web, Abiteboul, Buneman, Suciu
  – XML and other new/advanced stuff


Other Required Readings

There will be reading assignments from the Web:
• SQL for Web Nerds, by Philip Greenspun, http://philip.greenspun.com/sql/
• Others, especially for XML

For SQL, a good source of information is the MSDN library (on your Windows machine)


Outline for Today’s Lecture

• Overview of database systems
  – Reading assignment for Friday: Introduction from SQL for Web Nerds, http://philip.greenspun.com/sql/

• Course Outline


What Is a Relational Database Management System ?

Database Management System = DBMS
Relational DBMS = RDBMS

• A collection of files that store the data

• A big C program written by someone else that accesses and updates those files for you


Where are RDBMS used ?

• Backend for traditional “database” applications

• Backend for large Websites

• Backend for Web services


Example of a Traditional Database Application

Suppose we are building a system to store the information about:
• students
• courses
• professors
• who takes what, who teaches what


Can we do it without a DBMS ?

Sure we can! Start by storing the data in files:

students.txt courses.txt professors.txt

Now write C or Java programs to implement specific tasks


Doing it without a DBMS...

• Enroll “Mary Johnson” in “CSE444”:

Write a C program to do the following:

Read ‘students.txt’
Read ‘courses.txt’
Find&update the record “Mary Johnson”
Find&update the record “CSE444”
Write “students.txt”
Write “courses.txt”

CRASH !
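The failure mode on this slide can be sketched in a few lines. This is only an illustration in Python rather than C, under assumptions not in the slides: each file holds lines of the form `key:item,item`, and the helper names (`enroll`, `add_item`, `SimulatedCrash`) are hypothetical. The "crash" is simulated by raising an exception between the two writes.

```python
import os
import tempfile


class SimulatedCrash(Exception):
    """Stands in for a power failure or a killed process (hypothetical)."""


def add_item(line, key, item):
    # Assumed record format: "key:item1,item2". Append item to the matching record.
    name, _, items = line.partition(":")
    if name != key:
        return line
    values = [v for v in items.split(",") if v]
    return name + ":" + ",".join(values + [item])


def enroll(directory, student, course, crash_between_writes=False):
    students_path = os.path.join(directory, "students.txt")
    courses_path = os.path.join(directory, "courses.txt")

    # Read both files.
    with open(students_path) as f:
        students = f.read().splitlines()
    with open(courses_path) as f:
        courses = f.read().splitlines()

    # Find & update the two records in memory.
    students = [add_item(line, student, course) for line in students]
    courses = [add_item(line, course, student) for line in courses]

    # Write the first file ...
    with open(students_path, "w") as f:
        f.write("\n".join(students) + "\n")
    if crash_between_writes:
        raise SimulatedCrash("crashed after students.txt, before courses.txt")
    # ... and only then the second one.
    with open(courses_path, "w") as f:
        f.write("\n".join(courses) + "\n")


def demo():
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "students.txt"), "w") as f:
            f.write("Mary Johnson:\n")
        with open(os.path.join(d, "courses.txt"), "w") as f:
            f.write("CSE444:\n")
        try:
            enroll(d, "Mary Johnson", "CSE444", crash_between_writes=True)
        except SimulatedCrash:
            pass
        with open(os.path.join(d, "students.txt")) as f:
            students = f.read()
        with open(os.path.join(d, "courses.txt")) as f:
            courses = f.read()
        return students, courses
```

After `demo()`, students.txt says Mary takes CSE444 while courses.txt does not list her: the two files disagree, and nothing in the program can tell which one is right.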


Problems without a DBMS...

• System crashes:
  – What is the problem ?
• Large data sets (say 50GB)
  – Why is this a problem ?
• Simultaneous access by many users
  – Lock students.txt – what is the problem ?

Read ‘students.txt’
Read ‘courses.txt’
Find&update the record “Mary Johnson”
Find&update the record “CSE444”
Write “students.txt”
Write “courses.txt”

CRASH !


Enter a DBMS

[Figure: Data files, Database server (someone else’s C program), connection (ODBC, JDBC), Applications]

“Two tier system” or “client-server”


Functionality of a DBMS
The programmer sees SQL, which has two components:
• Data Definition Language - DDL
• Data Manipulation Language - DML
  – query language

Behind the scenes the DBMS has:
• Query engine
• Query optimizer
• Storage management
• Transaction Management (concurrency, recovery)


How the Programmer Sees the DBMS

• Start with DDL to create tables:

CREATE TABLE Students (
    Name CHAR(30),
    SSN CHAR(9) PRIMARY KEY NOT NULL,
    Category CHAR(20)
) . . .

• Continue with DML to populate tables:

INSERT INTO Students
VALUES ('Charles', '123456789', 'undergraduate')
. . . .
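The DDL and DML statements on this slide can be tried without a database server. The sketch below is an assumption-laden stand-in: it uses Python's built-in `sqlite3` module and an in-memory database instead of the SQL Server used in the course, and the function name `build_students_table` is made up for illustration.

```python
import sqlite3


def build_students_table():
    # In-memory SQLite database (a stand-in for the course's SQL Server).
    conn = sqlite3.connect(":memory:")
    # DDL: define the table.
    conn.execute(
        """CREATE TABLE Students (
               Name CHAR(30),
               SSN CHAR(9) PRIMARY KEY NOT NULL,
               Category CHAR(20))"""
    )
    # DML: populate it. The ? placeholders keep values out of the SQL text.
    conn.execute(
        "INSERT INTO Students VALUES (?, ?, ?)",
        ("Charles", "123456789", "undergraduate"),
    )
    conn.commit()
    return conn


conn = build_students_table()
rows = conn.execute("SELECT Name, SSN, Category FROM Students").fetchall()
```

Inserting a second row with the same SSN raises `sqlite3.IntegrityError`, which is the PRIMARY KEY constraint from the DDL being enforced by the system rather than by application code.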


How the Programmer Sees the DBMS

• Tables:

Students:
SSN          Name     Category
123-45-6789  Charles  undergrad
234-56-7890  Dan      grad
…            …

Takes:
SSN          CID
123-45-6789  CSE444
123-45-6789  CSE444
234-56-7890  CSE142
…

Courses:
CID     Name               Quarter
CSE444  Databases          fall
CSE541  Operating systems  winter

• Still implemented as files, but behind the scenes can be quite complex

“data independence” = separate logical view from physical implementation
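One way to see data independence in action: change the physical layout and check that the logical answer does not move. The sketch below is an illustration only, using Python's `sqlite3` (not the course's SQL Server); the index name `TakesBySSN` and the helper `takes_for` are hypothetical.

```python
import sqlite3


def takes_for(conn, ssn):
    # Logical view: which courses does this SSN take?
    return conn.execute(
        "SELECT CID FROM Takes WHERE SSN = ? ORDER BY CID", (ssn,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Takes (SSN CHAR(11), CID CHAR(6))")
conn.executemany(
    "INSERT INTO Takes VALUES (?, ?)",
    [("123-45-6789", "CSE444"), ("234-56-7890", "CSE142")],
)

before = takes_for(conn, "123-45-6789")
# Physical change: add an index. The query text above stays exactly the same.
conn.execute("CREATE INDEX TakesBySSN ON Takes (SSN)")
after = takes_for(conn, "123-45-6789")
```

`before` and `after` are equal; only the way the system finds the rows may have changed.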


Transactions

• Enroll “Mary Johnson” in “CSE444”:

BEGIN TRANSACTION;

INSERT INTO Takes
    SELECT Students.SSN, Courses.CID
    FROM Students, Courses
    WHERE Students.name = 'Mary Johnson' and Courses.name = 'CSE444'

-- More updates here....

IF everything-went-OK THEN COMMIT;
ELSE ROLLBACK

If system crashes, the transaction is still either committed or aborted
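The commit-or-rollback pattern on this slide can be sketched with Python's `sqlite3` module, again as a stand-in for SQL Server. Several things here are assumptions for illustration: the table layouts, Mary's SSN, the function names, and the fact that the course is matched on `CID` (following the Courses table shown earlier) rather than on its name. The failure is simulated by raising an exception before COMMIT.

```python
import sqlite3


def make_db():
    # isolation_level=None: no implicit transactions, we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE Students (SSN TEXT PRIMARY KEY, Name TEXT)")
    conn.execute("CREATE TABLE Courses (CID TEXT PRIMARY KEY, Name TEXT)")
    conn.execute("CREATE TABLE Takes (SSN TEXT, CID TEXT)")
    # Hypothetical sample data.
    conn.execute("INSERT INTO Students VALUES ('345-67-8901', 'Mary Johnson')")
    conn.execute("INSERT INTO Courses VALUES ('CSE444', 'Databases')")
    return conn


def enroll(conn, student_name, cid, fail_midway=False):
    conn.execute("BEGIN")
    try:
        conn.execute(
            "INSERT INTO Takes "
            "SELECT Students.SSN, Courses.CID FROM Students, Courses "
            "WHERE Students.Name = ? AND Courses.CID = ?",
            (student_name, cid),
        )
        if fail_midway:
            # Simulated problem after the insert but before COMMIT.
            raise RuntimeError("simulated failure before COMMIT")
        conn.execute("COMMIT")
        return True
    except Exception:
        # Undo everything done since BEGIN.
        conn.execute("ROLLBACK")
        return False


def count_takes(conn):
    return conn.execute("SELECT COUNT(*) FROM Takes").fetchone()[0]
```

Calling `enroll(conn, "Mary Johnson", "CSE444", fail_midway=True)` leaves Takes empty even though the INSERT ran; a second call without the failure adds exactly one row.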


Transactions

• A transaction = sequence of statements that either all succeed, or all fail

• Transactions have the ACID properties:
  A = atomicity
  C = consistency
  I = isolation
  D = durability


Queries

• Find all courses that “Mary” takes

SELECT C.name
FROM Students S, Takes T, Courses C
WHERE S.name = 'Mary' and S.ssn = T.ssn and T.cid = C.cid

• What happens behind the scene ?
  – Query processor figures out how to answer the query efficiently.
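The three-table query on this slide can be run as written against a small test database. The sketch below uses Python's `sqlite3` in place of SQL Server; the SSNs, the sample rows, and the function name `courses_taken_by` are made up for illustration.

```python
import sqlite3


def courses_taken_by(conn, student_name):
    # Same join as on the slide, with the student name passed as a parameter.
    rows = conn.execute(
        "SELECT C.name "
        "FROM Students S, Takes T, Courses C "
        "WHERE S.name = ? AND S.ssn = T.ssn AND T.cid = C.cid "
        "ORDER BY C.name",
        (student_name,),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Students (ssn TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE Takes (ssn TEXT, cid TEXT);
    CREATE TABLE Courses (cid TEXT PRIMARY KEY, name TEXT);
    -- Hypothetical sample rows.
    INSERT INTO Students VALUES ('111-11-1111', 'Mary');
    INSERT INTO Students VALUES ('222-22-2222', 'Dan');
    INSERT INTO Courses VALUES ('CSE444', 'Databases');
    INSERT INTO Courses VALUES ('CSE541', 'Operating systems');
    INSERT INTO Takes VALUES ('111-11-1111', 'CSE444');
    INSERT INTO Takes VALUES ('222-22-2222', 'CSE541');
    """
)
```

The query says what is wanted, not how to compute it; the program never states which table to scan first or how to match the rows.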


Queries, behind the scene

Declarative SQL query:

SELECT C.name
FROM Students S, Takes T, Courses C
WHERE S.name = 'Mary' and S.ssn = T.ssn and T.cid = C.cid

Imperative query execution plan:

[Figure: plan tree with operators sname (projection), cid=cid (join with Courses), name=“Mary” (selection), sid=sid (join of Students and Takes)]

The optimizer chooses the best execution plan for a query
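To make "imperative plan" concrete, here is a toy executor in plain Python. It is a sketch, not how any real DBMS is implemented: rows are dictionaries, the column names (`sid`, `sname`, `cid`, `cname`) are assumptions, and the final projection returns the course name, as the SQL query does. Two plans are shown: one joins first and filters later, the other filters Students first. Both give the same answer, which is what leaves an optimizer free to pick the cheaper one.

```python
def hash_join(left, right, left_key, right_key):
    # Build a hash table on the right input, then probe it with each left row.
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    for l_row in left:
        for r_row in index.get(l_row[left_key], []):
            yield {**l_row, **r_row}


def select(rows, predicate):
    # Keep only the rows that satisfy the predicate.
    return (row for row in rows if predicate(row))


def project(rows, column):
    # Keep a single column.
    return [row[column] for row in rows]


def is_mary(row):
    return row["sname"] == "Mary"


def plan_join_then_filter(students, takes, courses):
    joined = hash_join(students, takes, "sid", "sid")
    marys = select(joined, is_mary)
    with_courses = hash_join(marys, courses, "cid", "cid")
    return project(with_courses, "cname")


def plan_filter_then_join(students, takes, courses):
    marys = select(students, is_mary)
    joined = hash_join(marys, takes, "sid", "sid")
    with_courses = hash_join(joined, courses, "cid", "cid")
    return project(with_courses, "cname")


# Hypothetical sample data.
students = [{"sid": 1, "sname": "Mary"}, {"sid": 2, "sname": "Dan"}]
takes = [{"sid": 1, "cid": "CSE444"}, {"sid": 2, "cid": "CSE541"}]
courses = [
    {"cid": "CSE444", "cname": "Databases"},
    {"cid": "CSE541", "cname": "Operating systems"},
]
```

The second plan joins fewer rows because it discards non-matching students before touching Takes; on large tables that difference is the kind of thing an optimizer weighs.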


Database Systems

• The big commercial database vendors:
  – Oracle
  – IBM (DB2)
  – Microsoft (SQL Server)
  – Sybase

• Some free database systems (Unix):
  – Postgres
  – MySQL
  – Predator

• In CSE444 we use SQL Server.


New Trends in Databases

• Object-relational DBs
• Main memory DBs
• XML XML XML !
  – Relational databases with XML support
  – Middleware between XML and relational databases
  – Native XML database systems
  – Large-scale XML message systems
  – Lots of research here at UW on XML and databases

• Security


Course Outline

Part I
• SQL (Chapter 7)
• The relational data model (Chapter 3)
• Database design (Chapters 2, 3, 7)
• XML, XPath, XQuery

Midterm: Monday, February 7 (in class)


Course Outline

Part II
• SQL Access Control (security)
• Transactions
• Data storage, indexes (Chapters 11-13)
• Query execution and optimization (Chapters 15, 16)
• Recovery (Chapter 17)

Final: Wednesday, March 16th, 2:30-4:20, MGH 241 (this room)


Out of Town

• I will be out of town during three lectures

• Ashish Gupta will be guest lecturer


Structure

• Prerequisites: Data structures course (CSE-326).

• Work & Grading:
  – Homework: 25% (4 of them; some light programming)
  – Project: 30% (next)
  – Midterm: 15%
  – Final: 25%
  – Intangibles: 5%


The Project

• Models data management needs of a company

• Will have three phases– Correspond to Real World phases of system

evolution in a company


So what is this course about, really ?

A bit of everything !
• Languages: SQL, XPath, XQuery
• Theory (Functional dependencies, normal forms)
• Algorithms and data structures (in Part II)
• Lots of programming and hacking for the project

Most importantly: how to meet Real World needs