Databases 2010

The Relational Model and SQL

Christian S. Jensen

Computer Science, Aarhus University

Acknowledgments: revised version of slides developed by Michael I. Schwartzbach

What is a Database?

Efficient, convenient, and safe storage of, and multi-user access to, massive amounts of persistent data

Queries are much more general than searching

Main Entry: da·ta·base

Pronunciation: \ˈdā-tə-ˌbās, ˈda- also ˈdä-\

Function: noun

Date: circa 1962

: a usually large collection of data organized especially for rapid search and retrieval (as by a computer)

database (transitive verb)

Examples of databases
• Bank accounts
• Blog archives
• Google.com
• Human genome
• Amazon.com
• Student records

Data Model

A (mathematical) representation of data
• tables/relations
• trees
• graphs

Operations on data
• insert, delete, update, query

Constraints on data
• data types
• uniqueness
• dependencies

The Relational Data Model

Data is stored in tables (relations)

Simple but flexible, and supports many real-world applications

name     age  city
Joe      22   London
Jacques  27   Paris
Jose     34   Madrid

Terminology
• schema: the header of the table, here (name, age, city)
• row (tuple): one line of the table, e.g., (Joe, 22, London)
• column: all values under one header, e.g., London, Paris, Madrid
• attribute: the name of a column, e.g., age
• attribute value: a single entry in a row, e.g., 27

The Relational Data Model

Data is stored in tables (relations)

Abstract tables
• invariant under permutation of rows and columns
• no information is stored in the order

May or may not allow duplicate rows

The following two tables represent the same relation:

name     age  city
Joe      22   London
Jacques  27   Paris
Jose     34   Madrid

city    name     age
Madrid  Jose     34
London  Joe      22
Paris   Jacques  27

NULL Values

An attribute value may be NULL
• it is unknown
• no value exists
• it is unknown or does not exist

NULL values are treated specially

animal             color   zoo
lion               yellow  Copenhagen
crocodile          green   London
Tyrannosaurus Rex  NULL    NULL
polar bear         white   Berlin

Advantages of the Relational Model

A simple, intuitive model

Often convenient for real-life data
• but richer models are also needed, e.g., XML

An elegant mathematical foundation
• set and multi-set theory
• relational algebra and calculi

Allows efficient algorithms

Industrial-strength implementations are available

Schemas

Relation schema
• name of the relation
• names of the attributes
• types of the attributes
• constraints

Database schema
• collection of all relation schemas

Running Example

The database behind a tiny calendar system
• Rooms
• People
• Meetings
• Participants
• Equipment

Rooms

room: the name of a room
capacity: the number of people that it will hold

room        capacity
Turing-216  6
Ada-333     26
Store-Aud   286

People

userid: unique user name
name: ordinary name
group: vip, tap, phd
office: a room or NULL

userid    name                 group  office
csj       Christian S. Jensen  vip    Turing-216
doina     Doina Bucur          phd    NULL
bnielsen  Kai Birger Nielsen   tap    Hopper-017

Meetings

meetid: a unique id
date: the date of the meeting
slot: 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
owner: the userid of the owner
what: a textual description

meetid  date        slot  owner  what
34716   2010-08-23  14    csj    dDB
34717   2010-08-23  15    csj    dDB
42835   2010-08-16  10    mis    TA-meeting

Participants

meetid: the id of the meeting
pid: a userid or a room
status: u(nknown), a(ccept), d(ecline)

meetid  pid        status
34716   Store-Aud  a
34716   csj        a
42835   sigurd     d

Equipment

room: the name of a room
type: the type of equipment

room        type
Store-Aud   projector
Store-Aud   whiteboard
Hopper-017  mini-fridge

SQL

Structured Query Language

Invented by IBM in the 1970s (many versions)

High-Level, “declarative,” no low-level manipulations

Algebraic foundations

Representations, operations, constraints

Query optimization

DB2, Oracle, SQL Server, MySQL, …

Declaring Tables (1/3)

CREATE TABLE Rooms (
    room     VARCHAR(15),
    capacity INT
);

CREATE TABLE People (
    name   VARCHAR(40),
    office VARCHAR(15),
    userid VARCHAR(15),
    group  CHAR(3)
);

Declaring Tables (2/3)

CREATE TABLE Meetings (
    meetid INT,
    date   DATE,
    slot   INT,
    owner  VARCHAR(15),
    what   VARCHAR(40)
);

Declaring Tables (3/3)

CREATE TABLE Participants (
    meetid INT,
    pid    VARCHAR(15),
    status CHAR(1)
);

CREATE TABLE Equipment (
    room VARCHAR(15),
    type VARCHAR(20)
);

SQL Types

INT         217
CHAR(2)     'aa', 'ab', '12', '++'
VARCHAR(5)  '', '12345', 'foo', 'x''y'
FLOAT       3.14, 42, 0.0018
DATE        '2008-08-25'
TIME        '14:15:00'
CLOB        a text file
BLOB        a movie
XML         an XML document

Refinements

NOT NULL
• the value cannot be NULL

DEFAULT value
• a default value is specified

UNIQUE
• the value is unique in the table
• unless it is NULL

PRIMARY KEY
• the value is unique in the table
• the value is never NULL
• special syntax for multi-attribute primary keys

Refined Tables (1/3)

CREATE TABLE Rooms (
    room     VARCHAR(15) PRIMARY KEY,
    capacity INT NOT NULL
);

CREATE TABLE People (
    name   VARCHAR(40) NOT NULL,
    office VARCHAR(15),
    userid VARCHAR(15) PRIMARY KEY,
    group  CHAR(3)
);

Refined Tables (2/3)

CREATE TABLE Meetings (
    meetid INT PRIMARY KEY,
    date   DATE,
    slot   INT,
    owner  VARCHAR(15) NOT NULL,
    what   VARCHAR(40)
);

Refined Tables (3/3)

CREATE TABLE Participants (
    meetid INT NOT NULL,
    pid    VARCHAR(15) NOT NULL,
    status CHAR(1) DEFAULT 'u'
);

CREATE TABLE Equipment (
    room VARCHAR(15) NOT NULL,
    type VARCHAR(20) NOT NULL,
    PRIMARY KEY (room, type)
);
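The effect of these refinements can be observed by running the declarations. The following is a minimal sketch that assumes Python's built-in sqlite3 module as the engine (the slides do not prescribe one, and other systems may report violations differently). The helper name violates is hypothetical and only serves the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Participants (
        meetid INT NOT NULL,
        pid    VARCHAR(15) NOT NULL,
        status CHAR(1) DEFAULT 'u'
    );
    CREATE TABLE Equipment (
        room VARCHAR(15) NOT NULL,
        type VARCHAR(20) NOT NULL,
        PRIMARY KEY (room, type)
    );
""")

def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# DEFAULT: status is omitted, so the default 'u' is stored.
conn.execute("INSERT INTO Participants (meetid, pid) VALUES (34716, 'csj')")
default_status = conn.execute("SELECT status FROM Participants").fetchone()[0]

# NOT NULL: a row without a pid is rejected.
null_rejected = violates("INSERT INTO Participants (meetid) VALUES (42835)")

# PRIMARY KEY (room, type): the same pair cannot be stored twice,
# but the same room with another type is fine.
conn.execute("INSERT INTO Equipment VALUES ('Store-Aud', 'projector')")
duplicate_rejected = violates("INSERT INTO Equipment VALUES ('Store-Aud', 'projector')")
other_type_rejected = violates("INSERT INTO Equipment VALUES ('Store-Aud', 'whiteboard')")

print(default_status, null_rejected, duplicate_rejected, other_type_rejected)
```

The multi-attribute key constrains the combination of room and type, not each attribute separately.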

SELECT-FROM-WHERE

The basic form of an SQL query

SELECT desired attributes
FROM one or more tables
WHERE condition about the involved rows

Which meetings ("what") has csj arranged?

SELECT what
FROM Meetings
WHERE owner = 'csj';

meetid  date        slot  owner  what
34716   2010-08-23  14    csj    dDB
34717   2010-08-23  15    csj    dDB
42835   2010-08-16  10    mis    TA-meeting

Simple Example

what
dDB
dDB

Loop Semantics for Single Table

Loop through all rows in the table

Check if the condition is true

Project the rows onto the desired attributes

Note that duplicates are kept...
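The loop semantics can be written out directly. The following is only a sketch of the meaning of a single-table query, not of how a database system executes it; the function select_from_where and the in-memory list of rows are hypothetical names introduced for illustration.

```python
# In-memory copy of the Meetings table from the running example.
meetings = [
    {"meetid": 34716, "date": "2010-08-23", "slot": 14, "owner": "csj", "what": "dDB"},
    {"meetid": 34717, "date": "2010-08-23", "slot": 15, "owner": "csj", "what": "dDB"},
    {"meetid": 42835, "date": "2010-08-16", "slot": 10, "owner": "mis", "what": "TA-meeting"},
]

def select_from_where(table, attributes, condition):
    """Loop semantics for a single table: filter each row, then project it."""
    result = []
    for row in table:                       # loop through all rows
        if condition(row):                  # check if the condition is true
            # project the row onto the desired attributes
            result.append(tuple(row[a] for a in attributes))
    return result                           # a list, so duplicates are kept

# SELECT what FROM Meetings WHERE owner = 'csj';
answer = select_from_where(meetings, ["what"], lambda r: r["owner"] == "csj")
print(answer)
```

The result contains dDB twice, once for each qualifying row, because nothing in the loop removes duplicates.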

Renaming in SELECT

The selected attributes can be given new names

SELECT name, group AS category
FROM People
WHERE office = 'Ada-230';

name                 category
Vaida Ceikute        phd
Rasmus Ibsen-Jensen  phd

Expressions in SELECT

The attributes may have computed values

SELECT owner, date, slot*60 AS minute
FROM Meetings
WHERE owner = 'csj';

owner  date        minute
csj    2010-08-23  840
csj    2010-08-23  900

Conditions in WHERE

AND, OR, NOT, =, <>, <, >, <=, >=, LIKE, ...

SELECT owner, what
FROM Meetings
WHERE slot >= 12 AND slot < 16
      AND what LIKE '%beer%';

owner  what
mis    Afternoon beer
mis    Belgian beer testing
mis    Return empty beer bottles

3-Valued Logic

Arithmetic operations on NULL yield NULL

Any comparison with NULL yields unknown

This gives 3 truth values: true (tt), false (ff), unknown (u)

Boolean connectives are defined appropriately

The WHERE clause accepts a row only if the result is true

AND  tt  ff  u
tt   tt  ff  u
ff   ff  ff  ff
u    u   ff  u

OR   tt  ff  u
tt   tt  tt  tt
ff   tt  ff  u
u    tt  u   u

NOT
tt   ff
ff   tt
u    u
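The three connectives can be spelled out as small functions. This is a sketch under the assumption that Python's None stands in for unknown; the names and3, or3, not3, and eq3 are hypothetical and not part of SQL.

```python
# Truth values: True (tt), False (ff), and None (u, unknown).

def and3(a, b):
    if a is False or b is False:    # ff dominates AND
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    if a is True or b is True:      # tt dominates OR
        return True
    if a is None or b is None:
        return None
    return False

def not3(a):
    return None if a is None else (not a)

def eq3(x, y):
    """Any comparison with NULL (None) yields unknown."""
    if x is None or y is None:
        return None
    return x == y

# x = 1 OR NOT (x = 1) is a tautology in 2-valued logic,
# but it is unknown when x is NULL.
x = None
condition = or3(eq3(x, 1), not3(eq3(x, 1)))
accepted = condition is True        # WHERE accepts only true
print(condition, accepted)
```

Because WHERE accepts a row only when the condition is true, an unknown result has the same effect as false for filtering.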

A Surprise?

People

userid    name                 group  office
csj       Christian S. Jensen  vip    Turing-216
doina     Doina Bucur          phd    NULL
bnielsen  Kai Birger Nielsen   tap    Hopper-017

SELECT userid
FROM People
WHERE office = 'Turing-216' OR office <> 'Turing-216';

The result does not contain doina, since both comparisons with NULL yield unknown:

userid
csj
bnielsen

Testing for NULL

The same People table, queried with IS NULL:

SELECT userid
FROM People
WHERE office IS NULL;

userid
doina
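Both queries can be tried on the three sample rows. The sketch below assumes Python's built-in sqlite3 module as the engine; in SQLite, group is a reserved word, so the column name is quoted here, which is a detail of this sketch rather than of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "group" must be quoted in SQLite because GROUP is a keyword.
conn.execute(
    'CREATE TABLE People (userid TEXT PRIMARY KEY, name TEXT, "group" TEXT, office TEXT)'
)
conn.executemany("INSERT INTO People VALUES (?, ?, ?, ?)", [
    ("csj", "Christian S. Jensen", "vip", "Turing-216"),
    ("doina", "Doina Bucur", "phd", None),          # None is stored as NULL
    ("bnielsen", "Kai Birger Nielsen", "tap", "Hopper-017"),
])

# Looks like a tautology, but the row with a NULL office is not returned.
surprise = [row[0] for row in conn.execute(
    "SELECT userid FROM People "
    "WHERE office = 'Turing-216' OR office <> 'Turing-216' "
    "ORDER BY userid"
)]

# IS NULL is the explicit test for NULL.
no_office = [row[0] for row in conn.execute(
    "SELECT userid FROM People WHERE office IS NULL"
)]

print(surprise, no_office)
```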

Multiple Relations

Who has booked meetings on August 23, 2010?

SELECT name
FROM People, Meetings
WHERE date = '2010-08-23' AND
      owner = userid;

The relations are joined

Multiple Relations Example

Meetings

meetid  date        slot  owner  what
34716   2010-08-23  14    csj    dDB
34717   2010-08-23  15    csj    dDB
42835   2010-08-16  10    mis    TA-meeting

People

userid    name                 group  office
csj       Christian S. Jensen  vip    Turing-216
doina     Doina Bucur          phd    NULL
bnielsen  Kai Birger Nielsen   tap    Hopper-017

General Loop Semantics

Loop through all rows in all tables

For each combination
• check if the condition is true
• project the rows onto the desired attributes

Note that duplicates are still kept...
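For two tables, the general loop semantics amounts to a nested loop over every pair of rows. The sketch below uses itertools.product for the combinations and trimmed-down copies of the People and Meetings tables (only the attributes the query needs); it describes the meaning of the query, not an efficient evaluation strategy.

```python
from itertools import product

# Trimmed copies of the sample tables.
people = [
    {"userid": "csj", "name": "Christian S. Jensen"},
    {"userid": "doina", "name": "Doina Bucur"},
    {"userid": "bnielsen", "name": "Kai Birger Nielsen"},
]
meetings = [
    {"meetid": 34716, "date": "2010-08-23", "owner": "csj"},
    {"meetid": 34717, "date": "2010-08-23", "owner": "csj"},
    {"meetid": 42835, "date": "2010-08-16", "owner": "mis"},
]

# SELECT name FROM People, Meetings
# WHERE date = '2010-08-23' AND owner = userid;
names = []
for person, meeting in product(people, meetings):   # every combination of rows
    if meeting["date"] == "2010-08-23" and meeting["owner"] == person["userid"]:
        names.append(person["name"])                # project onto name

print(names)
```

The name appears twice because csj owns two meetings on that date and duplicates are kept.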

Prefixing Attribute Variables

Avoid possible name clashes

SELECT People.name
FROM People, Meetings
WHERE Meetings.date = '2010-08-23' AND
      Meetings.owner = People.userid;

Multiple Relations

Who shares a room?

userid  name                 group  office
csj     Christian S. Jensen  vip    Turing-216
vaida   Vaida Ceikute        phd    Turing-216
ira     Ira Assent           vip    Turing-217

roomie1              roomie2
Christian S. Jensen  Vaida Ceikute

Naming Row Variables

Enables self-joins

SELECT p1.name AS roomie1, p2.name AS roomie2
FROM People p1, People p2
WHERE p1.office = p2.office AND
      p1.userid <> p2.userid;

A table of all roommates...

Avoiding Symmetric Pairs

SELECT p1.name AS roomie1,
       p2.name AS roomie2
FROM People p1, People p2
WHERE p1.office = p2.office AND
      p1.userid < p2.userid;
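The difference between <> and < in the self-join can be checked on the three-person sample table. This sketch assumes Python's built-in sqlite3 module as the engine and leaves out the group column, which the query does not use; an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (userid TEXT, name TEXT, office TEXT)")
conn.executemany("INSERT INTO People VALUES (?, ?, ?)", [
    ("csj", "Christian S. Jensen", "Turing-216"),
    ("vaida", "Vaida Ceikute", "Turing-216"),
    ("ira", "Ira Assent", "Turing-217"),
])

query = """
    SELECT p1.name AS roomie1, p2.name AS roomie2
    FROM People p1, People p2
    WHERE p1.office = p2.office AND p1.userid {op} p2.userid
    ORDER BY roomie1
"""

# <> keeps both (a, b) and (b, a); < keeps one pair per set of roommates.
both_orders = conn.execute(query.format(op="<>")).fetchall()
one_order = conn.execute(query.format(op="<")).fetchall()

print(both_orders)
print(one_order)
```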

Aggregation

The SELECT clause may involve aggregate functions
• SUM
• AVG
• COUNT
• MIN
• MAX

NULLs are ignored in these computations

Except that COUNT(*) counts all rows
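The treatment of NULL in aggregates can be seen by adding one row with an unknown capacity to the Rooms sample data. This is a sketch assuming Python's built-in sqlite3 module as the engine; the room Unknown-000 is hypothetical and exists only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Rooms (room TEXT, capacity INT)")
conn.executemany("INSERT INTO Rooms VALUES (?, ?)", [
    ("Turing-216", 6),
    ("Ada-333", 26),
    ("Store-Aud", 286),
    ("Unknown-000", None),   # hypothetical room whose capacity is NULL
])

count_all, count_capacity, total, average = conn.execute(
    "SELECT COUNT(*), COUNT(capacity), SUM(capacity), AVG(capacity) FROM Rooms"
).fetchone()

# COUNT(*) counts all four rows; the other aggregates skip the NULL.
print(count_all, count_capacity, total, average)
```

The average is computed over the three known capacities only, so the NULL row does not pull it down.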

Requirements

Aggregation of a column x with values a1, a2, a3, ..., an computes

a1 ⊗ a2 ⊗ a3 ⊗ ... ⊗ an

for some operator ⊗

This is only well-formed if ⊗ is
• commutative: a ⊗ b = b ⊗ a
• associative: (a ⊗ b) ⊗ c = a ⊗ (b ⊗ c)

since the rows may be permuted
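The requirement can be tested by folding the same values in every possible row order. In this sketch the helper results_over_all_orders is a hypothetical name; addition and max give one result regardless of order, while subtraction, which is neither commutative nor associative, does not.

```python
from functools import reduce
from itertools import permutations
import operator

values = [6, 26, 286]   # the capacities from the Rooms table

def results_over_all_orders(op, xs):
    """Fold xs with op in every row order and collect the distinct results."""
    return {reduce(op, order) for order in permutations(xs)}

sum_results = results_over_all_orders(operator.add, values)
max_results = results_over_all_orders(max, values)
sub_results = results_over_all_orders(operator.sub, values)

print(sum_results)   # a single value: well-formed as an aggregate
print(max_results)   # a single value: well-formed as an aggregate
print(sub_results)   # several values: depends on the row order
```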

Simple Example

What is the average capacity of a room?

SELECT AVG(capacity) AS average
FROM Rooms;

average
106

Avoiding Duplicates

SELECT DISTINCT removes duplicates

This is expensive!

But sometimes necessary...

What kinds of equipment do we have?

SELECT DISTINCT type
FROM Equipment;

Avoiding Duplicates in Aggregation

How many kinds of equipment do we have?

SELECT COUNT(DISTINCT type) AS number
FROM Equipment;

number
4

Scalar Functions

Lots of useful functions are available
• integer and float functions
• string functions
• calendar functions
• ...

SELECT CHARACTER_LENGTH(name, CODEUNITS16),
       UPPER(group)
FROM People;

Subqueries

Any query in parentheses can be used in
• FROM clauses
• WHERE clauses

A query may be used as a value
• if it returns only one row and one column
• otherwise, a run-time error occurs

Simple Example

Who shares an office with Ira?

SELECT name
FROM People
WHERE office = (SELECT office
                FROM People
                WHERE userid = 'ira');

Membership Tests

IN and NOT IN test membership in tables

Who has csj arranged to meet?

SELECT pid
FROM Participants
WHERE meetid IN (SELECT meetid
                 FROM Meetings
                 WHERE owner = 'csj')
      AND
      pid NOT IN (SELECT room
                  FROM Rooms);

Participants

meetid  pid        status
34716   Store-Aud  a
34716   csj        a
42835   sigurd     d

Meetings

meetid  date        slot  owner  what
34716   2010-08-23  14    csj    dDB
34717   2010-08-23  15    csj    dDB
42835   2010-08-16  10    mis    TA-meeting

Correlated Subqueries

Which meetings exceed the capacity of a room?

SELECT meetid
FROM Meetings
WHERE (SELECT COUNT(DISTINCT pid)
       FROM Participants
       WHERE meetid = Meetings.meetid AND
             status <> 'd' AND
             pid NOT IN (SELECT room
                         FROM Rooms)
      )
      >
      (SELECT capacity
       FROM Rooms, Participants
       WHERE room = pid AND meetid = Meetings.meetid)
;

Static nested scope rules: both subqueries refer to Meetings.meetid of the enclosing query

EXISTS and NOT EXISTS

Check for emptiness or non-emptiness of a table

Who is alone in an office?

SELECT name
FROM People p1
WHERE NOT EXISTS (
    SELECT *
    FROM People
    WHERE office = p1.office AND
          userid <> p1.userid
);

ANY and ALL

Allow comparisons against
• any row in a subquery
• all rows in a subquery

Which are the latest meetings that are planned?

SELECT what
FROM Meetings
WHERE date >= ALL(
    SELECT date FROM Meetings
);

UNION, INTERSECT, and EXCEPT

Treat tables with the same schema as sets
• remove duplicates (unless ALL is added)
• compute ∪, ∩, and \

Who does not participate in a meeting they have arranged themselves?

(SELECT owner AS userid, meetid
 FROM Meetings)
EXCEPT
(SELECT pid AS userid, meetid
 FROM Participants);

The JOIN Operator

T1 JOIN T2 ON condition

is syntactic sugar for:

SELECT *
FROM T1, T2
WHERE condition

Dangling Rows and FULL JOIN

T1 JOIN T2 ON condition

A row in T1 or T2 that does not match a row in the other table is dangling

An ordinary JOIN throws away dangling rows

A FULL JOIN preserves dangling rows by padding them with NULL values

A LEFT or RIGHT JOIN preserves dangling rows from one argument only
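The effect on dangling rows can be seen with the Rooms and Equipment sample data, where two rooms have no equipment and one piece of equipment sits in a room that is not listed. This is a sketch assuming Python's built-in sqlite3 module as the engine; FULL JOIN is left out because older SQLite versions do not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Rooms (room TEXT, capacity INT);
    CREATE TABLE Equipment (room TEXT, type TEXT);
    INSERT INTO Rooms VALUES
        ('Turing-216', 6), ('Ada-333', 26), ('Store-Aud', 286);
    INSERT INTO Equipment VALUES
        ('Store-Aud', 'projector'), ('Store-Aud', 'whiteboard'),
        ('Hopper-017', 'mini-fridge');
""")

query = """
    SELECT r.room, e.type
    FROM Rooms r {kind} Equipment e ON r.room = e.room
    ORDER BY r.room, e.type
"""

# Ordinary JOIN: dangling rows on both sides are thrown away.
inner = conn.execute(query.format(kind="JOIN")).fetchall()
# LEFT JOIN: dangling Rooms rows are kept and padded with NULL (None).
left = conn.execute(query.format(kind="LEFT JOIN")).fetchall()

print(inner)
print(left)
```

The mini-fridge in Hopper-017 is dangling on the right-hand side, so it appears in neither result.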

Simple Example

In which offices are meetings planned?

All offices with meetings or NULL

SELECT office, meetid
FROM People LEFT JOIN Participants
     ON pid = office;

Only those offices with meetings

SELECT office, meetid
FROM People JOIN Participants
     ON pid = office;

People and Participants

userid    name                 group  office
csj       Christian S. Jensen  vip    Turing-216
doina     Doina Bucur          phd    NULL
bnielsen  Kai Birger Nielsen   tap    Hopper-017

meetid  pid        status
34716   Store-Aud  a
34716   csj        a
42835   sigurd     d

Grouping

SELECT-FROM-WHERE-GROUP BY

Rows are grouped by a set of attributes

Aggregations in SELECT are done for each group

The attributes in SELECT must be either
• aggregates or
• mentioned in the GROUP BY clause

Simple Example

How many meetings has each person arranged?

SELECT owner, COUNT(meetid) AS number
FROM Meetings
GROUP BY owner;

owner     number
amoeller  4
kjensen   1
csj       3

Advanced Example

What is the average number of invitations for the meetings that each person has arranged?

SELECT owner, AVG(pidno) AS average
FROM (SELECT owner,
             m.meetid,
             COUNT(pid) AS pidno
      FROM Meetings m, Participants p
      WHERE m.meetid = p.meetid
      GROUP BY owner, m.meetid)
GROUP BY owner;

HAVING

A HAVING clause may eliminate some groups

Which offices have more than one occupant?

SELECT office
FROM People
GROUP BY office
HAVING COUNT(*) > 1;

Attributes in HAVING must be aggregates or mentioned in GROUP BY
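Grouping and HAVING can be tried on the three-person People table used for the roommate queries. This sketch assumes Python's built-in sqlite3 module as the engine and keeps only the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (userid TEXT, name TEXT, office TEXT)")
conn.executemany("INSERT INTO People VALUES (?, ?, ?)", [
    ("csj", "Christian S. Jensen", "Turing-216"),
    ("vaida", "Vaida Ceikute", "Turing-216"),
    ("ira", "Ira Assent", "Turing-217"),
])

# One row per group, with the aggregate computed for each group.
per_office = conn.execute(
    "SELECT office, COUNT(*) FROM People GROUP BY office ORDER BY office"
).fetchall()

# HAVING filters whole groups after the aggregation.
shared = conn.execute(
    "SELECT office FROM People GROUP BY office HAVING COUNT(*) > 1"
).fetchall()

print(per_office)
print(shared)
```

WHERE filters rows before grouping, whereas HAVING filters groups afterwards, which is why the aggregate may appear in HAVING.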

Modifications

SQL commands may modify the database

Three kinds of modifications
• insert one or more rows
• delete one or more rows
• update existing rows or columns

Modifications do not return a result

Inserting a Single Row

INSERT INTO table VALUES (list of values);

INSERT INTO Participants
VALUES (42432, 'mis', 'a');

Optionally specify attribute names:

INSERT INTO Participants(pid, status, meetid)
VALUES ('mis', 'a', 42432);

Missing values are NULL or defaults

Inserting a Subquery

Invite everyone Anders meets with to his Belgian beer tasting

INSERT INTO Participants (
    SELECT 46432 AS meetid, pid, 'u' AS status
    FROM Meetings, Participants
    WHERE Meetings.meetid = Participants.meetid
          AND owner = 'amoeller'
          AND pid <> 'amoeller'
          AND pid NOT IN (SELECT room FROM Rooms));

Deleting Some Rows

DELETE FROM table WHERE condition;

Delete Christian's office

DELETE FROM Rooms
WHERE room = 'Turing-216';

Delete all offices

DELETE FROM Rooms;

Deleting a Subquery

Delete all people with a roommate

DELETE FROM People p
WHERE EXISTS(
    SELECT *
    FROM People
    WHERE office = p.office
          AND userid <> p.userid
);

Meaning of Deletion

First the condition is computed for all rows

Then the deletions are performed

Otherwise the last person in a multi-person office would not be deleted!
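The two-phase meaning of deletion can be contrasted with a naive row-by-row deletion. The following sketch models the table as a plain Python list; has_roommate is a hypothetical helper, and the interleaved variant is shown only as the behavior that SQL avoids.

```python
# (userid, office) pairs from the roommate example.
people = [("csj", "Turing-216"), ("vaida", "Turing-216"), ("ira", "Turing-217")]

def has_roommate(person, table):
    """True if some other row in table has the same office."""
    userid, office = person
    return any(o == office and u != userid for u, o in table)

# SQL semantics: first compute the condition for all rows, then delete.
doomed = [p for p in people if has_roommate(p, people)]
two_phase = [p for p in people if p not in doomed]

# Naive alternative: delete each row as soon as its condition holds,
# so later rows are checked against an already shrunken table.
interleaved = list(people)
for p in people:
    if has_roommate(p, interleaved):
        interleaved.remove(p)

print(two_phase)     # both roommates are gone
print(interleaved)   # the last roommate survives
```

In the interleaved variant vaida is checked after csj has already been removed, so she no longer appears to have a roommate.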

Update

UPDATE table
SET attribute assignments
WHERE condition;

Move Anders to a smaller office

UPDATE People
SET office = 'Turing-213'
WHERE userid = 'amoeller';

SQL is Everywhere