1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

29
1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005

Transcript of 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

Page 1: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

1

Introduction to Database SystemsCSE 444

Lecture #1

January 3, 2005

Page 2: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

2

Staff• Instructor: Dan Suciu

– Allen, Room 662, [email protected]

– Office hours: Wednesday, 10:30-11:30 (advanced email recommended)

• TAs:– Ashish Gupta, [email protected]

– Victor Tung, [email protected]

– Office hours: TBA (check mailing list)

Page 3: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

3

Communications• Web page:

http://www.cs.washington.edu/444/– Lectures will be available here– Homeworks will be posted here– The project description will be here

• Mailing list:

– please subscribe (see instructions on the Web page)

Page 4: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

4

Textbook(s)

Main textbook, available at the bookstore:

• Database Systems: The Complete Book, Hector Garcia-Molina, Jeffrey Ullman,Jennifer Widom

Most chapters are good. Some are not (functional dependecies).COME TO CLASS ! Slides are good, and we discuss in class.

Page 5: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

5

Other Texts

On reserve at the Engineering Library:• Database Management Systems, Ramakrishnan

– very comprehensive

• XQuery from the Experts, Katz, Ed.– The reference on XQuery

• Fundamentals of Database Systems, Elmasri, Navathe– very widely used, but we don’t use it

• Foundations of Databases, Abiteboul, Hull, Vianu – Mostly theory of databases

• Data on the Web, Abiteboul, Buneman, Suciu– XML and other new/advanced stuff

Page 6: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

6

Other Required Readings

There will be reading assignments from the Web:

• SQL for Web Nerds, by Philip Greenspun, http://philip.greenspun.com/sql/

• Others, especially for XML

For SQL, a good source of information is the MSDN library (on your Windows machine)

Page 7: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

7

Outline for Today’s Lecture

• Overview of database systems– Reading assignment for Friday:

Introduction from SQL for Web Nerdshttp://philip.greenspun.com/sql/

• Course Outline

Page 8: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

8

What Is a Relational Database Management System ?

Database Management System = DBMS

Relational DBMS = RDBMS

• A collection of files that store the data

• A big C program written by someone else that accesses and updates those files for you

Page 9: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

9

Where are RDBMS used ?

• Backend for traditional “database” applications

• Backend for large Websites

• Backend for Web services

Page 10: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

10

Example of a Traditional Database Application

Suppose we are building a system

to store the information about:

• students

• courses

• professors

• who takes what, who teaches what

Page 11: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

11

Can we do it without a DBMS ?

Sure we can! Start by storing the data in files:

students.txt courses.txt professors.txt

Now write C or Java programs to implement specific tasks

Page 12: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

12

Doing it without a DBMS...

• Enroll “Mary Johnson” in “CSE444”:

Read ‘students.txt’Read ‘courses.txt’Find&update the record “Mary Johnson”Find&update the record “CSE444”Write “students.txt”Write “courses.txt”

Read ‘students.txt’Read ‘courses.txt’Find&update the record “Mary Johnson”Find&update the record “CSE444”Write “students.txt”Write “courses.txt”

Write a C program to do the following:

CRASH !

Page 13: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

13

Problems without an DBMS...

• System crashes:

– What is the problem ?

• Large data sets (say 50GB)– Why is this a problem ?

• Simultaneous access by many users– Lock students.txt – what is the problem ?

Read ‘students.txt’Read ‘courses.txt’Find&update the record “Mary Johnson”Find&update the record “CSE444”Write “students.txt”Write “courses.txt”

Read ‘students.txt’Read ‘courses.txt’Find&update the record “Mary Johnson”Find&update the record “CSE444”Write “students.txt”Write “courses.txt”

CRASH !

Page 14: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

14

Enters a DMBS

Data files

Database server(someone else’s

C program) Applications

connection

(ODBC, JDBC)

“Two tier system” or “client-server”

Page 15: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

15

Functionality of a DBMS

The programmer sees SQL, which has two components:• Data Definition Language - DDL• Data Manipulation Language - DML

– query language

Behind the scenes the DBMS has:• Query engine• Query optimizer• Storage management• Transaction Management (concurrency, recovery)

Page 16: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

16

How the Programmer Sees the DBMS

• Start with DDL to create tables:

• Continue with DML to populate tables:

CREATE TABLE Students (Name CHAR(30)SSN CHAR(9) PRIMARY KEY NOT NULL,Category CHAR(20)

) . . .

CREATE TABLE Students (Name CHAR(30)SSN CHAR(9) PRIMARY KEY NOT NULL,Category CHAR(20)

) . . .

INSERT INTO StudentsVALUES(‘Charles’, ‘123456789’, ‘undergraduate’). . . .

INSERT INTO StudentsVALUES(‘Charles’, ‘123456789’, ‘undergraduate’). . . .

Page 17: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

17

How the Programmer Sees the DBMS

• Tables:

• Still implemented as files, but behind the scenes can be quite complex

SSN Name Category 123-45-6789 Charles undergrad 234-56-7890 Dan grad … …

SSN CID 123-45-6789 CSE444 123-45-6789 CSE444 234-56-7890 CSE142 …

Students: Takes:

CID Name Quarter CSE444 Databases fall CSE541 Operating systems winter

Courses:

“data independence” = separate logical view from physical implementation

Page 18: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

18

Transactions

• Enroll “Mary Johnson” in “CSE444”:BEGIN TRANSACTION;

INSERT INTO Takes SELECT Students.SSN, Courses.CID FROM Students, Courses WHERE Students.name = ‘Mary Johnson’ and Courses.name = ‘CSE444’

-- More updates here....

IF everything-went-OK THEN COMMIT;ELSE ROLLBACK

BEGIN TRANSACTION;

INSERT INTO Takes SELECT Students.SSN, Courses.CID FROM Students, Courses WHERE Students.name = ‘Mary Johnson’ and Courses.name = ‘CSE444’

-- More updates here....

IF everything-went-OK THEN COMMIT;ELSE ROLLBACK

If system crashes, the transaction is still either committed or aborted

Page 19: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

19

Transactions

• A transaction = sequence of statements that either all succeed, or all fail

• Transactions have the ACID properties:A = atomicity

C = consistency

I = isolation

D = durability

Page 20: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

20

Queries

• Find all courses that “Mary” takes

• What happens behind the scene ?– Query processor figures out how to answer the

query efficiently.

SELECT C.nameFROM Students S, Takes T, Courses CWHERE S.name=“Mary” and S.ssn = T.ssn and T.cid = C.cid

SELECT C.nameFROM Students S, Takes T, Courses CWHERE S.name=“Mary” and S.ssn = T.ssn and T.cid = C.cid

Page 21: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

21

Queries, behind the scene

Imperative query execution plan:

SELECT C.nameFROM Students S, Takes T, Courses CWHERE S.name=“Mary” and S.ssn = T.ssn and T.cid = C.cid

SELECT C.nameFROM Students S, Takes T, Courses CWHERE S.name=“Mary” and S.ssn = T.ssn and T.cid = C.cid

Declarative SQL query

Students Takes

sid=sid

sname

name=“Mary”

cid=cid

Courses

The optimizer chooses the best execution plan for a query

Page 22: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

22

Database Systems

• The big commercial database vendors:– Oracle– IBM (DB2)– Microsoft (SQL Server)– Sybase

• Some free database systems (Unix) :– Postgres– MySQL– Predator

• In CSE444 we use SQL Server.

Page 23: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

23

New Trends in Databases

• Object-relational DBs• Main memory DBs• XML XML XML !

– Relational databases with XML support– Middleware between XML and relational databases– Native XML database systems– Large-scale XML message systems– Lots of research here at UW on XML and databases

• Security

Page 24: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

24

Course Outline

Part I

• SQL (Chapter 7)

• The relational data model (Chapter 3)

• Database design (Chapters 2, 3, 7)

• XML, XPath, XQuery

Midterm: Monday, February 7 (in class)

Page 25: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

25

Course Outline

Part II• SQL Access Control (security)• Transactions• Data storage, indexes (Chapters 11-13)• Query execution and optimization (Chapter 15,16)• Recovery (Chapter 17)

Final: Wednesday, March 16th, 2:30-4:20, MGH 241 (this room)

Page 26: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

26

Out of Town

• I will be out of town during three lectures

• Ashish Gupta will be guest lecturer

Page 27: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

27

Structure

• Prerequisites: Data structures course (CSE-326).

• Work & Grading:– Homework: 25% (4 of them; some light programming)

– Project: 30% (next)

– Midterm: 15%

– Final: 25%

– Intangibles: 5%

Page 28: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

28

The Project

• Models data management needs of a company

• Will have three phases– Correspond to Real World phases of system

evolution in a company

Page 29: 1 Introduction to Database Systems CSE 444 Lecture #1 January 3, 2005.

29

So what is this course about, really ?

A bit of everything !• Languages: SQL, XPath, XQuery• Theory (Functional dependencies, normal forms)• Algorithms and data structures (in Part II)• Lots of programming and hacking for the project

Most importantly: how to meet Real World needs