Advanced Database Systems - UAntwerpenadrem.uantwerpen.be/sites/default/files/adbs-lect1_0.pdf ·...


Advanced Database Systems

Floris Geerts

University of Antwerp

Floris Geerts (University of Antwerp) Advanced Database Systems 1 / 384


Introduction Overview

What is a database?

Database = a very large, integrated collection of data.

Models real-world organisations (e.g. enterprise, university, genome, ...):
  - entities (e.g. students, modules, genes)
  - relationships (e.g. Joe is taking AD)

A DBMS is a software package designed to store, manage and query databases.


Database prehistory

[Figure: data entry, storage and retrieval, query processing, sorting]


Why use a database?

Why?

A DBMS provides generic functionality that otherwise would have to be implemented over and over again.

Data independence;

Efficient access;

Data integrity and security;

Uniform data administration;

Concurrent access, recovery from crashes; and

Reduced application development time.


Why study databases?

Everybody needs them.

They are connected to most other areas of computer science:
  - programming languages and software engineering;
  - algorithms;
  - logic, discrete math, and theory of computation (essential for data organization and query languages); and
  - systems issues: concurrency, operating systems, file organization and networks.

There are lots of interesting problems, both in database research and in implementation.

Good design is always a challenge.


Modeling data

How to model the data?

[Figure: DBMS]


Data models

Data model = a collection of concepts for describing data:
  - Relations, attributes, tuples (relational model)
  - Classes, subclasses, attributes, objects (object oriented)
  - Entities, relationships, attributes (entity-relationship)

A schema is a description of a particular collection of data using a given data model.

The relational model of data is the most widely used model today:
  - Main concept: relation/table with rows and columns
  - Every relation has a schema which describes the table.

Munros:  MId  MName          Lat     Long   Height  Rating
         1    The Saddle     57.167  5.384  1010    4
         2    Ladhar Bheinn  57.067  5.750  1020    4
         3    Schiehallion   56.667  4.098  1083    2.5
         4    Ben Nevis      56.780  5.002  1343    1.5


Munros

Sir Hugh Thomas Munro (1856-1919)

Scottish mountaineer

List of mountains in Scotland over 3,000 feet (914.4 m), known as the Munros.

283 Munros in total (in 2009)


Levels of abstraction

Data in a DBMS is described at three levels of abstraction:

  - Views (external schemas) describe how users see the data
  - The conceptual schema defines the logical structure
  - The physical schema describes the files and indexes used

[Figure: External Schema 1, External Schema 2, External Schema 3 on top of the Conceptual Schema, on top of the Physical Schema, on top of the Disk]

Schemas are defined using a data definition language (DDL); data is modified and queried using a data manipulation language (DML).


Example database

External Schema (View): All Munros that are not climbed

NotClimbed (MId: integer, MName: char(30))

Conceptual Schema:

Hikers (HId: integer, HName: char(30), Skill: char(3), Age: integer)

Munros (MId: integer, MName: char(30), Lat: real, Long: real, Height: integer, Rating: real)

Climbs (HId: integer, MId: integer, Date: date, Time: integer)

Physical Schema:
  - which relations are stored as unordered files;
  - which index structures are used (e.g., on first attributes).
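As a rough illustration of an external schema defined over the conceptual schema, the NotClimbed view could be sketched as follows. This is a minimal sketch, not the course's own code: it assumes SQLite (through Python's standard sqlite3 module) as the DBMS, keeps only a few Munros columns, and reuses the sample rows shown on the other slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual schema (only the columns needed here).
CREATE TABLE Munros (MId INTEGER PRIMARY KEY, MName CHAR(30), Height INTEGER);
CREATE TABLE Climbs (HId INTEGER, MId INTEGER, Date DATE, Time INTEGER);

-- External schema: a view over the conceptual schema.
CREATE VIEW NotClimbed AS
    SELECT MId, MName FROM Munros
    WHERE MId NOT IN (SELECT MId FROM Climbs);
""")

conn.executemany("INSERT INTO Munros VALUES (?, ?, ?)", [
    (1, "The Saddle", 1010), (2, "Ladhar Bheinn", 1020),
    (3, "Schiehallion", 1083), (4, "Ben Nevis", 1343),
])
conn.executemany("INSERT INTO Climbs VALUES (?, ?, ?, ?)", [
    (123, 1, "10/10/88", 5), (123, 3, "11/08/87", 2.5), (313, 1, "12/08/89", 4),
    (214, 2, "08/07/92", 7), (313, 2, "06/07/94", 5),
])

# Users of the view never mention Climbs; they just query NotClimbed.
not_climbed = conn.execute("SELECT MId, MName FROM NotClimbed ORDER BY MId").fetchall()
print(not_climbed)  # [(4, 'Ben Nevis')]
```

If the conceptual schema were later reorganised, only the view definition would need to change; queries against NotClimbed could stay the same.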


Data independence

Applications insulated from how data is structured and stored

Logical data independence: protection from changes in the logical structure of the data
  - When the conceptual schema changes, views can be redefined
  - Users can query the same way as before

Physical data independence: protection from changes in the physical structure of the data
  - When the physical schema changes, the conceptual schema stays the same
  - Storage details are hidden from upper layers


Efficiency

There are things that we like to do quickly and efficiently:
  - Give me all Munros higher than 1000m
  - Who climbed Ben Nevis?

We would like to program these as quickly as possible.

Such questions involving data stored in a DBMS are called queries.

A DBMS ensures that such queries can be answered efficiently using powerful query languages.


Concurrency control

Concurrent execution of user queries is essential for good DBMS performance:
  - Disk access is slow, so the disk is used most efficiently when several users are served concurrently

Interleaving actions of different user programs/requests can lead to inconsistency:
  - e.g. money being transferred out of an account twice at the same time when the funds cover only one transaction

DBMS ensures such problems do not occur!
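The double-withdrawal problem can be made concrete with a small sketch. It does not run truly concurrent sessions; it only shows why the funds check and the update should form one atomic step. The Accounts table, its columns and the withdraw helper are hypothetical, and SQLite (via Python's sqlite3 module) is assumed as the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (AId INTEGER PRIMARY KEY, Balance INTEGER)")
conn.execute("INSERT INTO Accounts VALUES (1, 100)")
conn.commit()


def withdraw(conn, account_id, amount):
    """Withdraw only if funds suffice; check and update are a single statement."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE Accounts SET Balance = Balance - ? "
            "WHERE AId = ? AND Balance >= ?",
            (amount, account_id, amount),
        )
    # rowcount is 1 if the account had enough money, 0 otherwise.
    return cur.rowcount == 1


first = withdraw(conn, 1, 100)
second = withdraw(conn, 1, 100)  # funds cover only one transfer
balance = conn.execute("SELECT Balance FROM Accounts WHERE AId = 1").fetchone()[0]
print(first, second, balance)  # True False 0
```

A program that first reads the balance and later writes the new value in a separate step would be exposed to exactly the interleaving described above, unless the DBMS isolates the two transactions.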


DBMS structure

A typical DBMS has a layered architecture

Concurrency control and recovery not shown

One of several possible variations

[Figure: layers, top to bottom]
  - Query optimization & execution
  - Relational operators
  - File access
  - Buffer management
  - Disk management
  - Disk

Some “real” DBMS

mysql: www.mysql.org, open source, quite powerful

PostgreSQL: www.postgresql.org, open source, powerful

Microsoft Access: simple system, lots of nice GUI wrappers

Commercial systems:
  - Oracle 11g (www.oracle.com/database)
  - SQL Server 2008 (www.microsoft.com/sql)
  - DB2 (www.ibm.com/db2)


Introduction Relational model

Outline

1 Introduction
    Overview
    Relational model
    Relational query languages

2 Storage and indexing

3 Query evaluation

4 Query optimisation

5 Transactions, concurrency, and recovery

6 Parallel data management


Why study the relational model?

It's the dominant model in the marketplace
  - Vendors: Microsoft, Oracle, IBM, ...
  - Open source: PostgreSQL, mysql, ...

SQL is the industrial realisation of the relational model

SQL has been standardised (several times)

Most of the commercial systems have substantially extended the standard!

SQL = Structured Query Language


The relational model: early history

Proposed by E.F. Codd (IBM San Jose) in 1970
  - Prior to this the dominant model was the network model (CODASYL)

Mid 70s: prototypes
  - Sequel at IBM San Jose
  - INGRES at UC Berkeley

1976-: System R at IBM San Jose
  - Transactions
  - Query optimiser
  - Extended β-testing

Then... commercial systems...

[Figure: E.F. Codd]


The relational model: basics

A relational database is a collection of relations.

A relation consists of two parts:

  - Relation instance: a table, with columns and rows.

  - Relation schema: specifies the name of the relation, plus the name and type of each column.

You can think of a relation instance as a set of rows or tuples.


Example

Relation schema:

Climbs (HId: integer, MId: integer, Date: date, Time: integer)

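Here Climbs is the relation name; HId, MId, Date and Time are the field names (attribute names); integer and date are the domains.

In general (and more formally):

R(f1 : D1, ..., fn : Dn)

where R is the relation name, each fi is a field name (attribute name) and each Di is the domain of fi.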


Example

Relation instance:

Munros:  MId  MName          Lat     Long   Height  Rating
         1    The Saddle     57.167  5.384  1010    4
         2    Ladhar Bheinn  57.067  5.750  1020    4
         3    Schiehallion   56.667  4.098  1083    2.5
         4    Ben Nevis      56.780  5.002  1343    1.5

Hikers:  HId  HName    Skill  Age
         123  Edmund   EXP    80
         214  Arnold   BEG    25
         313  Bridget  EXP    33
         212  James    MED    27

Climbs:  HId  MId  Date      Time
         123  1    10/10/88  5
         123  3    11/08/87  2.5
         313  1    12/08/89  4
         214  2    08/07/92  7
         313  2    06/07/94  5

Each table is labelled with its relation name. The header row lists the field names, every other row is a tuple (record, row), and every column is a field (attribute, column).


Some terminology

A domain is a set of values. All domains in a relation must be atomic (indivisible).

Given a relation R(f1 : D1, ..., fn : Dn), R is said to have arity (degree) n.

Given a relation instance, its cardinality is the number of rows.
  - For example, in Climbs, cardinality = 5 and arity = 4; the domain of HId is integer and that of Date is date.

Beware:
  - Attributes within a table have different names; and
  - Tables have different names.


Relations and sets

An instance of a relation R(f1 : D1, ..., fn : Dn) can be defined more formally as a subset of

{⟨f1 : d1, ..., fn : dn⟩ | d1 ∈ D1, ..., dn ∈ Dn}.

Thus a relation is a set of tuples:

  - There is no ordering of the tuples in the table; and

  - There are no duplicate rows in the table.
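The set view of a relation, together with arity and cardinality, can be illustrated with a small sketch. It models the Climbs instance as a Python frozenset of tuples; this representation, and the deliberately repeated row, are illustrative assumptions only.

```python
# Field names of Climbs, taken from its relation schema.
climbs_fields = ("HId", "MId", "Date", "Time")

rows = [
    (123, 1, "10/10/88", 5),
    (123, 3, "11/08/87", 2.5),
    (313, 1, "12/08/89", 4),
    (214, 2, "08/07/92", 7),
    (313, 2, "06/07/94", 5),
    (313, 2, "06/07/94", 5),  # repeated on purpose, for illustration only
]

# A relation instance is a set of tuples: duplicates collapse, order is irrelevant.
climbs = frozenset(rows)

arity = len(climbs_fields)   # number of fields
cardinality = len(climbs)    # number of distinct tuples
order_irrelevant = climbs == frozenset(reversed(rows))

print(arity, cardinality, order_irrelevant)  # 4 5 True
```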


SQL

SQL is the ubiquitous language for relational databases.

Standardised by ANSI/ISO in 1986, 1989, 1992 and 1999.

Most DBMS support SQL-92, and currently most features of SQL-99 are covered as well.

Part of SQL is a Data Definition Language (DDL) that supports:
  - creation of tables;
  - deletion of tables; and
  - modification of tables.


Creating tables

Consider

Munros(MId:int, MName:string, Lat:real, Long:real, Height:int, Rating:real)

Hikers(HId:int, HName:string, Skill:string, Age:int)

Climbs(HId:int, MId:int, Date:date, Time:int)

In its simplest use, SQL’s DDL provides a name and a type for each column of a table.

    CREATE TABLE Hikers (
        HId   INTEGER,
        HName CHAR(40),
        Skill CHAR(3),
        Age   INTEGER );

Note that the domain of each field is specified and enforced by the DBMS.


Removing and altering tables

We can delete both the schema information and all the tuples, e.g.

DROP TABLE Hikers;

We can alter existing schemas, e.g. by adding an extra field

    ALTER TABLE Hikers
    ADD COLUMN gender CHAR(2);

(every tuple is extended by a so-called null value),

or change the domain of a field:

    ALTER TABLE Hikers
    ALTER COLUMN gender CHAR(1);

or remove a field:

    ALTER TABLE Hikers
    DROP COLUMN gender;
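The effect of adding a column on existing tuples can be checked with a short sketch. It assumes SQLite via Python's sqlite3 module and exercises only ADD COLUMN; the exact syntax for changing or dropping a column varies between systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Hikers (HId INTEGER, HName CHAR(40), Skill CHAR(3), Age INTEGER)"
)
conn.executemany(
    "INSERT INTO Hikers VALUES (?, ?, ?, ?)",
    [(123, "Edmund", "EXP", 80), (214, "Arnold", "BEG", 25)],
)

# Extend the schema; the tuples that already exist get a null in the new field.
conn.execute("ALTER TABLE Hikers ADD COLUMN gender CHAR(2)")

rows = conn.execute("SELECT HName, gender FROM Hikers ORDER BY HId").fetchall()
print(rows)  # [('Edmund', None), ('Arnold', None)]
```

Python's sqlite3 module reports SQL null values as None.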


Adding and deleting tuples

Can insert tuples into a table, e.g.

    INSERT INTO Hikers (HId, HName, Skill, Age)
    VALUES (314, 'Sam', 'Exp', 26);

Can remove tuples satisfying certain conditions, e.g.

    DELETE
    FROM Hikers H
    WHERE H.HName = 'Arnold';

Can update tuples satisfying certain conditions, e.g.,

    UPDATE Hikers H
    SET H.Age = H.Age + 1
    WHERE H.HName = 'Arnold';

More ways of changing things will be considered in the labs.


Updating tuples: old value semantics

Consider the following update:

    UPDATE Hikers H
    SET H.Age = H.Age + 1
    WHERE H.Age <= 25;

and instance:

Hikers:  HId  HName    Skill  Age
         123  Edmund   EXP    80
         214  Arnold   BEG    25
         313  Bridget  EXP    33
         212  James    MED    27

The WHERE clause is evaluated first (on the old values); the update (SET) is applied second.


After the update, only Arnold (whose old Age was 25) has changed:

Hikers:  HId  HName    Skill  Age
         123  Edmund   EXP    80
         214  Arnold   BEG    26
         313  Bridget  EXP    33
         212  James    MED    27
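The old value semantics can be tried out with a small sketch, assuming SQLite via Python's sqlite3 module. The hiker Zoe (age 24) is hypothetical and added only to show that a tuple is incremented once, even though its new value still satisfies the condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Hikers (HId INTEGER, HName CHAR(30), Skill CHAR(3), Age INTEGER)"
)
conn.executemany("INSERT INTO Hikers VALUES (?, ?, ?, ?)", [
    (123, "Edmund", "EXP", 80),
    (214, "Arnold", "BEG", 25),
    (313, "Bridget", "EXP", 33),
    (212, "James", "MED", 27),
    (999, "Zoe", "BEG", 24),  # hypothetical extra hiker, not on the slides
])

# The condition is tested against the old Age of each tuple.
conn.execute("UPDATE Hikers SET Age = Age + 1 WHERE Age <= 25")

ages = dict(conn.execute("SELECT HName, Age FROM Hikers"))
print(ages["Arnold"], ages["Zoe"], ages["James"])  # 26 25 27
```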


Integrity constraints (IC)

IC: condition that must be true for any instance of the database, e.g., domain constraints.
  - ICs are specified when the schema is defined.
  - ICs are checked when relations are modified.

A legal instance of a relation is one that satisfies all specified ICs.
  - The DBMS should not allow illegal instances.

If the DBMS checks ICs, stored data is more faithful to real-world meaning.
  - Avoids data entry errors, too!


Primary key constraints

A set of fields is a key for a relation if:
  1. No two distinct tuples can have the same values in all key fields, and
  2. This is not true for any proper subset of the key.

If part 2 is false, the set of fields is a superkey.

If there is more than one key for a relation, one of the keys is chosen (by the DBA) to be the primary key.

E.g., HId is a key for Hikers. (What about HName?) The set {HId, HName} is a superkey.


Key constraints


    CREATE TABLE Hikers (
        HId   INTEGER,
        HName CHAR(30),
        Skill CHAR(3),
        Age   INTEGER,
        CONSTRAINT Blah PRIMARY KEY (HId) );

    CREATE TABLE Climbs (
        HId  INTEGER,
        MId  INTEGER,
        Date DATE,
        Time INTEGER,
        PRIMARY KEY (HId, MId, Date) );

The CONSTRAINT keyword is optional and only serves to give the constraint a name.

Updates that violate key constraints are rejected (and if constraints are named, the error message will include those names).

Do you think the key in the second example is the right choice? Be careful when assigning primary keys...


    CREATE TABLE Hikers (
        HId   INTEGER,
        HName CHAR(30),
        Skill CHAR(3),
        Age   INTEGER,
        UNIQUE (HName, Age),
        PRIMARY KEY (HId) );

Other keys can be specified using UNIQUE.

A tuple can only be referred to from elsewhere by storing its primary key fields.

An index can be built on top of the primary key fields to optimize access.
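How a DBMS rejects updates that violate key constraints can be seen in a short sketch. It assumes SQLite via Python's sqlite3 module; the constraint name HikersKey and the helper try_insert are made up for the example, and the exact error text differs between systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Hikers (
        HId   INTEGER,
        HName CHAR(30),
        Skill CHAR(3),
        Age   INTEGER,
        UNIQUE (HName, Age),
        CONSTRAINT HikersKey PRIMARY KEY (HId) )
""")
conn.execute("INSERT INTO Hikers VALUES (123, 'Edmund', 'EXP', 80)")
conn.commit()


def try_insert(row):
    """Insert one hiker; report whether the DBMS accepted the tuple."""
    try:
        with conn:  # rolls back automatically if the insert fails
            conn.execute("INSERT INTO Hikers VALUES (?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"


dup_primary = try_insert((123, "Arnold", "BEG", 25))  # HId 123 already used
dup_unique = try_insert((214, "Edmund", "MED", 80))   # (HName, Age) already used
fresh = try_insert((214, "Arnold", "BEG", 25))        # violates nothing

print(dup_primary)
print(dup_unique)
print(fresh)
```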


Foreign keys

Foreign key: set of fields in one relation that is used to “refer” to a tuple in another relation.
  - Must correspond to the primary key of the second relation.
  - Like a “logical pointer”.

E.g., we expect any MId value in the Climbs table to be included in the MId column of the Munros table. Similarly for HId.

    CREATE TABLE Climbs (
        HId  INTEGER,
        MId  INTEGER,
        Date DATE,
        Time INTEGER,
        PRIMARY KEY (HId, MId, Date),
        FOREIGN KEY (HId) REFERENCES Hikers(HId),
        FOREIGN KEY (MId) REFERENCES Munros(MId) );


Munros:  MId  MName          Lat     Long   Height  Rating
         1    The Saddle     57.167  5.384  1010    4
         2    Ladhar Bheinn  57.067  5.750  1020    4
         3    Schiehallion   56.667  4.098  1083    2.5
         4    Ben Nevis      56.780  5.002  1343    1.5

Hikers:  HId  HName    Skill  Age
         123  Edmund   EXP    80
         214  Arnold   BEG    25
         313  Bridget  EXP    33
         212  James    MED    27

Climbs:  HId  MId  Date      Time
         123  1    10/10/88  5
         123  3    11/08/87  2.5
         313  1    12/08/89  4
         214  2    08/07/92  7
         313  2    06/07/94  5


A foreign key can refer to the same relation.

E.g., extend Hikers with a Partner field containing the partner's HId. Declare this field as a foreign key referring to Hikers.

Hikers:  HId  HName    Skill  Age  Partner
         123  Edmund   EXP    80   214
         214  Arnold   BEG    25   123
         313  Bridget  EXP    33   null
         212  James    MED    27   null

A null in the Partner column marks a nonexistent partner.

No null values are allowed in primary key fields such as HId (they are used to identify tuples).


Enforcing integrity constraints

Consider Climbs and Munros; MId in Climbs is a foreign key that references Munros.

What should be done if a Climbs tuple with a non-existent Munro id is inserted? (Reject it!)

What should be done if a Munro tuple is deleted?
  - Also delete all Climbs tuples that refer to it.
  - Disallow deletion of a Munro tuple that is referred to.
  - Set MId in Climbs tuples that refer to it to a default MId (e.g., null in case it is not a primary key field).

Similar if the primary key of a Munro tuple is updated.


Integrity in SQL-99


SQL/99 supports all 4 options on deletes and updates.
  - Default is NO ACTION (delete/update is rejected)
  - CASCADE (also delete all tuples that refer to the deleted tuple)
  - SET NULL / SET DEFAULT (sets the foreign key value of the referencing tuple)

The default value has to be specified when creating the table.

    CREATE TABLE Climbs (
        HId  INTEGER,
        MId  INTEGER,
        Date DATE DEFAULT '7/10/2009',
        Time INTEGER,
        PRIMARY KEY (HId, MId, Date),
        FOREIGN KEY (HId) REFERENCES Hikers(HId)
            ON DELETE NO ACTION
            ON UPDATE SET DEFAULT,
        FOREIGN KEY (MId) REFERENCES Munros(MId)
            ON DELETE CASCADE
            ON UPDATE SET DEFAULT );
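The behaviour of NO ACTION and CASCADE on deletes can be observed in a small sketch. It assumes SQLite via Python's sqlite3 module, where foreign key checking has to be switched on per connection. The tables are trimmed to the columns needed, the ON UPDATE clauses are left out, and the helper attempt is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Hikers (HId INTEGER PRIMARY KEY, HName CHAR(30));
CREATE TABLE Munros (MId INTEGER PRIMARY KEY, MName CHAR(30));
CREATE TABLE Climbs (
    HId INTEGER, MId INTEGER, Date DATE, Time INTEGER,
    PRIMARY KEY (HId, MId, Date),
    FOREIGN KEY (HId) REFERENCES Hikers(HId) ON DELETE NO ACTION,
    FOREIGN KEY (MId) REFERENCES Munros(MId) ON DELETE CASCADE
);
INSERT INTO Hikers VALUES (123, 'Edmund');
INSERT INTO Munros VALUES (1, 'The Saddle');
INSERT INTO Climbs VALUES (123, 1, '10/10/88', 5);
""")


def attempt(sql):
    """Run one statement; return False if an integrity constraint rejects it."""
    try:
        with conn:  # rolls back automatically on failure
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


bad_insert = attempt("INSERT INTO Climbs VALUES (123, 99, '01/01/90', 3)")  # no Munro 99
hiker_delete = attempt("DELETE FROM Hikers WHERE HId = 123")  # NO ACTION: still referenced
munro_delete = attempt("DELETE FROM Munros WHERE MId = 1")    # CASCADE: climbs go too
remaining = conn.execute("SELECT COUNT(*) FROM Climbs").fetchone()[0]

print(bad_insert, hiker_delete, munro_delete, remaining)  # False False True 0
```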


Where do ICs come from?

ICs are based upon the semantics of the real-world enterprise that is being described in the database relations.

We can check a database instance to see if an IC is violated, but we can NEVER infer that an IC is true by looking at an instance.
  - An IC is a statement about all possible instances!
  - From example, we know HName is not a key, but the assertion that HId is a key is given to us.

Key and foreign key ICs are the most common; more general ICs are supported too.
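That an instance can refute a key constraint but never establish it can be sketched as follows. The function violates_key and the instance, which contains two hikers named Arnold, are hypothetical and only serve as illustration.

```python
def violates_key(rows, key_fields):
    """Return True if two rows of the instance agree on all key_fields.

    Assumes rows holds no exact duplicates (a relation is a set of tuples).
    """
    seen = set()
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key in seen:
            return True
        seen.add(key)
    return False


# Hypothetical instance: two different hikers share a name.
hikers = [
    {"HId": 214, "HName": "Arnold", "Skill": "BEG", "Age": 25},
    {"HId": 215, "HName": "Arnold", "Skill": "EXP", "Age": 41},
    {"HId": 313, "HName": "Bridget", "Skill": "EXP", "Age": 33},
]

hname_refuted = violates_key(hikers, ["HName"])  # True: HName cannot be a key
hid_refuted = violates_key(hikers, ["HId"])      # False: no violation in THIS instance

print(hname_refuted, hid_refuted)  # True False
```

The False result for HId says only that this instance is consistent with HId being a key; that HId really is a key has to come from the semantics of the enterprise.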


Introduction Relational query languages


Relational query languages

Query languages allow the manipulation and retrieval of data from a database.

The relational model supports simple, powerful query languages:

  - strong formal foundation; and

  - allows for much (provably correct) optimisation.

NOTE: Query languages are not (necessarily) programming languages.


Formal relational query languages

Relational Algebra

Simple “operational” model, useful for expressing execution plans.

Relational Calculus

Logical model (declarative), useful for theoretical results.

Both languages were introduced by Codd in a series of papers.

They have equivalent expressive power.

They are the key to understanding SQL query processing!


Preliminaries

A query is applied to relation instances, and the result of a query is also a relation instance.

[Figure: input instance, query, output instance]

For a given query, the schemas of the input relations are fixed.

The query will then execute over any valid instance.

The schema of the result can also be determined (and is fixed for the given query).


Relational algebra

Basic operations:
  - Selection (σ): selects a subset of rows from a relation.
  - Projection (π): deletes unwanted columns from a relation.
  - Cross-product (×): allows us to combine two relations.
  - Set-difference (−): allows us to subtract relations.
  - Union (∪): allows us to union relations.
  - Renaming (ρ): allows us to rename relations and field names.

Additional operations:
  - Intersection, join, division.
  - Not essential, but (very!) useful (especially join).

Closure

Since each operation returns a relation, operations can be composed! (One says that the algebra is closed.)
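Closure can be illustrated with a small sketch of selection and projection as functions from relations to relations. The representation (a pair of field names and a frozenset of tuples) and the helper names make_relation, select and project are assumptions made for this example, not part of any library.

```python
def make_relation(fields, rows):
    """A relation: a tuple of field names plus a set of value tuples."""
    return (tuple(fields), frozenset(tuple(r) for r in rows))


def select(rel, predicate):
    """Selection: keep the tuples for which predicate(tuple-as-dict) holds."""
    fields, rows = rel
    return (fields, frozenset(r for r in rows if predicate(dict(zip(fields, r)))))


def project(rel, keep):
    """Projection: keep only the listed fields (duplicates vanish, it is a set)."""
    fields, rows = rel
    idx = [fields.index(f) for f in keep]
    return (tuple(keep), frozenset(tuple(r[i] for i in idx) for r in rows))


munros = make_relation(
    ("MId", "MName", "Lat", "Long", "Height", "Rating"),
    [
        (1, "The Saddle", 57.167, 5.384, 1010, 4),
        (2, "Ladhar Bheinn", 57.067, 5.750, 1020, 4),
        (3, "Schiehallion", 56.667, 4.098, 1083, 2.5),
        (4, "Ben Nevis", 56.780, 5.002, 1343, 1.5),
    ],
)

# Each operation returns a relation, so the output of one can feed the next.
tall = select(munros, lambda t: t["Height"] > 1050)
result = project(tall, ("MName", "Rating"))

print(result[0])          # ('MName', 'Rating')
print(sorted(result[1]))  # [('Ben Nevis', 1.5), ('Schiehallion', 2.5)]
```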


Projection

Choose a set of field names A and a table R.

π_A(R) extracts the columns in A from the table.

Example, given Munros =

MId  MName          Lat     Long   Height  Rating
1    The Saddle     57.167  5.384  1010    4
2    Ladhar Bheinn  57.067  5.750  1020    4
3    Schiehallion   56.667  4.098  1083    2.5
4    Ben Nevis      56.780  5.002  1343    1.5

π_{MId,Rating}(Munros) is

MId  Rating
1    4
2    4
3    2.5
4    1.5

Provides the user with a view by hiding some attributes.
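A minimal Python sketch of projection, assuming a relation is modelled as a list of dicts keyed by field name (this representation and the helper name `project` are illustrative choices, not part of the course material):

```python
# Rows are dicts keyed by attribute name; a relation is a list of rows.
# This representation is an assumption made for illustration.
MUNROS = [
    {"MId": 1, "MName": "The Saddle", "Lat": 57.167, "Long": 5.384, "Height": 1010, "Rating": 4},
    {"MId": 2, "MName": "Ladhar Bheinn", "Lat": 57.067, "Long": 5.750, "Height": 1020, "Rating": 4},
    {"MId": 3, "MName": "Schiehallion", "Lat": 56.667, "Long": 4.098, "Height": 1083, "Rating": 2.5},
    {"MId": 4, "MName": "Ben Nevis", "Lat": 56.780, "Long": 5.002, "Height": 1343, "Rating": 1.5},
]


def project(relation, attrs):
    """Keep only the columns in attrs, dropping duplicate rows (set semantics)."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:  # a relation instance is a set, so skip repeats
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result


ids_and_ratings = project(MUNROS, ["MId", "Rating"])
for row in ids_and_ratings:
    print(row)
```

The output has the same four rows as the input, because MId is different in every row.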


Projection – continued

Suppose the result of a projection has a repeated value. How do we treat it?

$\pi_{\text{Rating}}(\text{Munros})$ is

Rating
4
4
2.5
1.5

or

Rating
4
2.5
1.5

?

In “pure” relational algebra the answer is always a set (recall that we defined a relation instance as a set).

However, SQL and some other languages return a multiset for some operations, from which duplicates may be eliminated by a further operation. (Why? Eliminating duplicates is expensive in practice.)
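The two possible answers can be contrasted in a few lines of Python. This is only a sketch: `Counter` stands in for a multiset and `set` for a pure relation, and the variable names are illustrative.

```python
from collections import Counter

# Rating column of Munros, in table order (data from the slide).
ratings = [4, 4, 2.5, 1.5]

# Multiset (bag) answer: duplicates are kept, as SQL's plain SELECT does.
bag = Counter(ratings)

# Set answer: duplicates are removed, as in pure relational algebra
# (or with SQL's SELECT DISTINCT).
distinct = set(ratings)

print(sorted(bag.elements(), reverse=True))  # every occurrence
print(sorted(distinct, reverse=True))        # one copy of each value
```

Building `distinct` requires hashing (or, alternatively, sorting) every value, which hints at why duplicate elimination costs extra work on large tables.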


Selection

Chooses tuples that satisfy some condition.

Selection $\sigma_{C}(R)$ takes a table $R$ and extracts those rows from it that satisfy the condition $C$.

For example, $\sigma_{\text{Height} > 1050}(\text{Munros})$ =

MId  MName         Lat     Long   Height  Rating
3    Schiehallion  56.667  4.098  1083    2.5
4    Ben Nevis     56.780  5.002  1343    1.5

What can go into a condition C?


Selection - continued

Conditions are built up from:

- Comparisons on attributes: $R.A = R.A'$, $R.A \neq R.A'$.

- Comparisons on values. E.g., Height > 1000, MName = "Ben Nevis".

- Predicates constructed from these using $\lor$ (or), $\land$ (and), $\lnot$ (not). E.g. (Lat > 57 $\land$ Height > 1000) $\lor$ (Height = Lat).

A selection provides the user with a view of data by hiding tuples that do not satisfy the condition the user wants.
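As a sketch, a selection condition can be modelled in Python as a function from a row to a Boolean. The helper name `select` and the dict-based rows are assumptions made for illustration; the data and both conditions come from the slides.

```python
MUNROS = [
    {"MId": 1, "MName": "The Saddle", "Lat": 57.167, "Long": 5.384, "Height": 1010, "Rating": 4},
    {"MId": 2, "MName": "Ladhar Bheinn", "Lat": 57.067, "Long": 5.750, "Height": 1020, "Rating": 4},
    {"MId": 3, "MName": "Schiehallion", "Lat": 56.667, "Long": 4.098, "Height": 1083, "Rating": 2.5},
    {"MId": 4, "MName": "Ben Nevis", "Lat": 56.780, "Long": 5.002, "Height": 1343, "Rating": 1.5},
]


def select(relation, condition):
    """Return the rows of relation for which condition(row) is true."""
    return [row for row in relation if condition(row)]


# sigma_{Height > 1050}(Munros)
tall = select(MUNROS, lambda r: r["Height"] > 1050)

# sigma_{(Lat > 57 AND Height > 1000) OR (Height = Lat)}(Munros)
compound = select(
    MUNROS,
    lambda r: (r["Lat"] > 57 and r["Height"] > 1000) or r["Height"] == r["Lat"],
)

print([r["MName"] for r in tall])
print([r["MName"] for r in compound])
```

Note that the schema of the result is unchanged: selection removes rows, never columns.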


Combining selection and projection

Find the names and ages of all hikers with age > 30.

Relational algebra query

Q1 = $\pi_{\text{HName},\,\text{Age}}(\sigma_{\text{Age} > 30}(\text{Hikers}))$

An equivalent relational algebra query

Q2 = $\sigma_{\text{Age} > 30}(\pi_{\text{HName},\,\text{Age}}(\text{Hikers}))$

The same declarative query can be translated into more than one procedural query.


Combining selection and projection

Are Q1 and Q2 the same?

Semantically they are, as they produce the same result.

But they differ in terms of efficiency:

- Q1 scans Hikers, selects some tuples, and then scans only the selected tuples.

- Q2 scans Hikers, projects out two attributes, and then scans the result again.

Q1 is likely to be more efficient than Q2.

Procedural languages can be optimized...
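The comparison of Q1 and Q2 can be made concrete with a small Python sketch. It uses the Hikers table from these slides; the helpers `select` and `project` and the list-of-dicts representation are illustrative assumptions, and the row counts only hint at cost rather than measure it.

```python
HIKERS = [
    {"HId": 123, "HName": "Edmund", "Skill": "EXP", "Age": 80},
    {"HId": 214, "HName": "Arnold", "Skill": "BEG", "Age": 25},
    {"HId": 313, "HName": "Bridget", "Skill": "EXP", "Age": 33},
    {"HId": 212, "HName": "James", "Skill": "MED", "Age": 27},
]


def select(relation, condition):
    return [row for row in relation if condition(row)]


def project(relation, attrs):
    """Projection with set semantics: duplicate rows are dropped."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def older_than_30(row):
    return row["Age"] > 30


# Q1: select first, then project the (smaller) intermediate result.
q1_intermediate = select(HIKERS, older_than_30)
q1 = project(q1_intermediate, ["HName", "Age"])

# Q2: project first, then select from the (full-size) intermediate result.
q2_intermediate = project(HIKERS, ["HName", "Age"])
q2 = select(q2_intermediate, older_than_30)

print(q1 == q2)                                    # same answer
print(len(q1_intermediate), len(q2_intermediate))  # rows fed to the second operator
```

Both plans return Edmund and Bridget, but the second operator of Q1 sees two rows while that of Q2 sees four.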


Set operations – union

If two tables have the same structure, we can perform set operations.

Same structure means union-compatible:
- Same number of fields; and
- Corresponding fields (taken from left to right) have the same domains.

Example:

Hikers =

HId  HName    Skill  Age
123  Edmund   EXP    80
214  Arnold   BEG    25
313  Bridget  EXP    33
212  James    MED    27

Climbers =

HId  HName   Expertise  Age
214  Arnold  BEG        25
898  Jane    MED        39

Hikers $\cup$ Climbers =

HId  HName    Skill  Age
123  Edmund   EXP    80
214  Arnold   BEG    25
313  Bridget  EXP    33
212  James    MED    27
898  Jane     MED    39

Output schema is that of the first relation (Hikers in the example).
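The union-compatibility check and the choice of output schema can be sketched in Python as follows. Here a relation is modelled as a pair (schema, set of tuples), and domains are approximated by Python types; both are assumptions made for this illustration.

```python
# A relation is modelled as (schema, rows): a tuple of field names and a set of tuples.
HIKERS = (
    ("HId", "HName", "Skill", "Age"),
    {(123, "Edmund", "EXP", 80), (214, "Arnold", "BEG", 25),
     (313, "Bridget", "EXP", 33), (212, "James", "MED", 27)},
)
CLIMBERS = (
    ("HId", "HName", "Expertise", "Age"),
    {(214, "Arnold", "BEG", 25), (898, "Jane", "MED", 39)},
)


def union_compatible(r, s):
    """Same number of fields, and matching domains position by position.

    Domains are approximated by Python types here (an assumption).
    """
    (r_schema, r_rows), (s_schema, s_rows) = r, s
    if len(r_schema) != len(s_schema):
        return False
    r_types = {tuple(type(v) for v in row) for row in r_rows}
    s_types = {tuple(type(v) for v in row) for row in s_rows}
    return len(r_types | s_types) <= 1


def union(r, s):
    if not union_compatible(r, s):
        raise ValueError("relations are not union-compatible")
    return (r[0], r[1] | s[1])  # output schema is that of the first relation


schema, rows = union(HIKERS, CLIMBERS)
print(schema)
print(len(rows))
```

Field names play no role in the check: Skill and Expertise differ in name but occupy the same position and share a domain. Arnold appears in both inputs but only once in the result.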


Set operations – set difference

We can also take the difference of two union-compatible tables:

Hikers $-$ Climbers =

HId  HName    Skill  Age
123  Edmund   EXP    80
313  Bridget  EXP    33
212  James    MED    27

Again, output schema is that of the first relation.


Set operations – intersection

It turns out we can implement intersection in terms of other operations:

$R \cap S = R - (R - S)$

Although it is mathematically nice to have fewer operators, this may not be an efficient way to implement intersection.

Intersection is also a special case of a join, which we’ll shortly discuss.
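The identity can be checked on the Hikers and Climbers tables with Python's built-in sets. This is a check on one instance, not a proof; rows are modelled as plain tuples for illustration.

```python
# Rows as tuples (HId, HName, Skill, Age); data from the slides.
HIKERS = {
    (123, "Edmund", "EXP", 80),
    (214, "Arnold", "BEG", 25),
    (313, "Bridget", "EXP", 33),
    (212, "James", "MED", 27),
}
CLIMBERS = {
    (214, "Arnold", "BEG", 25),
    (898, "Jane", "MED", 39),
}

difference = HIKERS - CLIMBERS                 # Hikers - Climbers
via_difference = HIKERS - (HIKERS - CLIMBERS)  # R - (R - S)
direct = HIKERS & CLIMBERS                     # built-in intersection

print(via_difference == direct)
print(direct)
```

The route via two differences touches the data twice, which illustrates why a dedicated intersection operator may be cheaper.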


Cross product (Cartesian product)

The basic operation is the Cartesian product, $R \times S$, which concatenates every tuple in $R$ with every tuple in $S$.

Example:

A   B
a1  b1
a2  b2

$\times$

C   D
c1  d1
c2  d2
c3  d3

=

A   B   C   D
a1  b1  c1  d1
a1  b1  c2  d2
a1  b1  c3  d3
a2  b2  c1  d1
a2  b2  c2  d2
a2  b2  c3  d3
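A short Python sketch of the product, using `itertools.product` from the standard library to enumerate all pairs. The function name `cross_product` is an illustrative choice.

```python
from itertools import product

R = [("a1", "b1"), ("a2", "b2")]                # schema (A, B)
S = [("c1", "d1"), ("c2", "d2"), ("c3", "d3")]  # schema (C, D)


def cross_product(r, s):
    """Concatenate every tuple of r with every tuple of s."""
    return [x + y for x, y in product(r, s)]


RS = cross_product(R, S)
for row in RS:
    print(row)
```

The result has `len(R) * len(S)` rows, here 2 times 3.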


Cartesian product – continued

What happens when we form a product of two tables with columns with the same name?

Recall the schemas: Hikers(HId, HName, Skill, Age) and Climbs(HId, MId, Date, Time). What is the schema of Hikers $\times$ Climbs?

Various possibilities including:
- Forget the conflicting name (as in R&G): ((HId), HName, Skill, Age, (HId), MId, Date, Time). Allow positional references (by number) to columns.
- Label the conflicting columns with 1, 2, ...: (HId.1, HName, Skill, Age, HId.2, MId, Date, Time).

Neither of these is satisfactory. The product operation is no longer commutative (a property that is useful in optimization).


Cartesian product – continued

If $R_1$ has $n$ tuples and $R_2$ has $m$ tuples then $R_1 \times R_2$ has $n \times m$ tuples.

This is an expensive operation: if $R_1$ and $R_2$ both have 1 000 tuples (small relations) then $R_1 \times R_2$ has 1 000 000 tuples (a large relation).

Query processors try to avoid building products; instead they attempt to build only subsets which contain relevant information.


Renaming

To avoid confusion about attribute names, one can use the renaming operator $\rho$:

$\rho(C(1 \to \text{sid1},\ 5 \to \text{sid2}),\ \text{Hikers} \times \text{Climbs})$

This operator
- names the result relation $C$; and
- explicitly renames the fields at positions 1 and 5 to sid1 and sid2.

In general,

$\rho(R(\text{oldname} \to \text{newname}, \ldots, \text{position} \to \text{newname}),\ E)$,

where $E$ is a relational algebra expression.
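Renaming acts on the schema only, which the following Python sketch makes explicit. The function `rename` and its dict-based argument are hypothetical; positions are 1-based to match the notation above.

```python
def rename(schema, renames):
    """Return a new schema with some fields renamed.

    renames maps either a 1-based position (int) or an old field name (str)
    to the new field name. A name-based entry renames the first field with
    that name, so positions are needed when names clash.
    """
    new_schema = list(schema)
    for old, new in renames.items():
        index = old - 1 if isinstance(old, int) else schema.index(old)
        new_schema[index] = new
    return tuple(new_schema)


# Schema of Hikers x Climbs, with the conflicting name HId appearing twice.
product_schema = ("HId", "HName", "Skill", "Age", "HId", "MId", "Date", "Time")

c_schema = rename(product_schema, {1: "sid1", 5: "sid2"})
print(c_schema)

# Renaming by name, as in oldname -> newname.
print(rename(("MId", "Height"), {"Height": "Height'"}))
```

After the positional renaming every field of the product has a distinct name, so later operators can refer to fields by name again.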


Natural join

For obvious reasons of efficiency we rarely use unconstrained cross products in practice.

A natural join ($\bowtie$) produces the set of all merges of tuples that agree on their commonly named fields.

Example:

HId  MId  Date      Time
123  1    10/10/88  5
123  3    11/08/87  2.5
313  1    12/08/89  4
214  2    08/07/92  7
313  2    06/07/94  5

$\bowtie$

HId  HName    Skill  Age
123  Edmund   EXP    80
214  Arnold   BEG    25
313  Bridget  EXP    33
212  James    MED    27

=

HId  MId  Date      Time  HName    Skill  Age
123  1    10/10/88  5     Edmund   EXP    80
123  3    11/08/87  2.5   Edmund   EXP    80
313  1    12/08/89  4     Bridget  EXP    33
214  2    08/07/92  7     Arnold   BEG    25
313  2    06/07/94  5     Bridget  EXP    33
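The example can be reproduced with a naive nested-loop sketch in Python. Rows are dicts keyed by field name and `natural_join` is an illustrative helper; a real query processor would normally use a more efficient join method.

```python
CLIMBS = [
    {"HId": 123, "MId": 1, "Date": "10/10/88", "Time": 5},
    {"HId": 123, "MId": 3, "Date": "11/08/87", "Time": 2.5},
    {"HId": 313, "MId": 1, "Date": "12/08/89", "Time": 4},
    {"HId": 214, "MId": 2, "Date": "08/07/92", "Time": 7},
    {"HId": 313, "MId": 2, "Date": "06/07/94", "Time": 5},
]
HIKERS = [
    {"HId": 123, "HName": "Edmund", "Skill": "EXP", "Age": 80},
    {"HId": 214, "HName": "Arnold", "Skill": "BEG", "Age": 25},
    {"HId": 313, "HName": "Bridget", "Skill": "EXP", "Age": 33},
    {"HId": 212, "HName": "James", "Skill": "MED", "Age": 27},
]


def natural_join(r, s):
    """Merge every pair of rows that agree on all commonly named fields."""
    result = []
    for x in r:
        for y in s:
            common = x.keys() & y.keys()
            if all(x[a] == y[a] for a in common):
                result.append({**x, **y})  # shared fields appear only once
    return result


joined = natural_join(CLIMBS, HIKERS)
for row in joined:
    print(row)
```

James (HId 212) has no row in Climbs, so he does not appear in the result.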


Natural Join – cont.

Natural join has interesting relationships with other operations. What is $R \bowtie S$ when

- $R = S$

- $R$ and $S$ have no column names in common

- $R$ and $S$ have all column names in common, i.e., they are union compatible

Natural join has nice properties (assuming fields are identified by names):

- Commutative: $R \bowtie S = S \bowtie R$

- Associative: $R \bowtie (S \bowtie T) = (R \bowtie S) \bowtie T$

- Hence we can always simply write $R_1 \bowtie R_2 \bowtie \cdots \bowtie R_k$.
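The three special cases and the two properties can be explored with a name-based sketch in Python. Here a row is a frozenset of (field name, value) pairs, so field order is irrelevant; this encoding, the helper names and the tiny example relations are all assumptions made for illustration.

```python
def row(**fields):
    """A row identified by field names only: a frozenset of (name, value) pairs."""
    return frozenset(fields.items())


def natural_join(r, s):
    out = set()
    for x in r:
        for y in s:
            merged = x | y
            # x and y agree on their shared names exactly when no field name
            # ends up with two different values in the merged row.
            if len(dict(merged)) == len(merged):
                out.add(merged)
    return out


R = {row(A=1, B=2), row(A=3, B=4)}
S = {row(A=1, B=2), row(A=5, B=6)}   # same field names as R
T = {row(C=7), row(C=8)}             # no field names in common with R

print(natural_join(R, R) == R)               # R = S: the join returns R
print(len(natural_join(R, T)))               # no common names: the cross product
print(natural_join(R, S) == R & S)           # all names common: the intersection
print(natural_join(R, T) == natural_join(T, R))  # commutative
```

On this instance the join of a relation with itself gives the relation back, disjoint schemas give the product, and identical schemas give the intersection.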


Conditional Join

Extension of natural join in which a join condition is specified:

$R \bowtie_{C} S$ for $\sigma_{C}(R \times S)$

Special case in which the join condition consists of equality conditions is called the equijoin.

A natural join is an equijoin in which equalities are specified on all common fields.
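A conditional join can be sketched in Python as a selection applied to a cross product. Rows are positional tuples, and the second condition below (a Time threshold of 4) is made up for illustration.

```python
from itertools import product

# (HId, HName, Skill, Age)
HIKERS = [
    (123, "Edmund", "EXP", 80),
    (214, "Arnold", "BEG", 25),
    (313, "Bridget", "EXP", 33),
    (212, "James", "MED", 27),
]
# (HId, MId, Date, Time)
CLIMBS = [
    (123, 1, "10/10/88", 5),
    (123, 3, "11/08/87", 2.5),
    (313, 1, "12/08/89", 4),
    (214, 2, "08/07/92", 7),
    (313, 2, "06/07/94", 5),
]


def conditional_join(r, s, condition):
    """Keep the concatenated pairs (x, y) of r x s that satisfy condition(x, y)."""
    return [x + y for x, y in product(r, s) if condition(x, y)]


# Equijoin on HId: the condition is a single equality.
equijoin = conditional_join(HIKERS, CLIMBS, lambda h, c: h[0] == c[0])

# A more general condition: same HId and Time > 4.
long_climbs = conditional_join(HIKERS, CLIMBS, lambda h, c: h[0] == c[0] and c[3] > 4)

print(len(equijoin), len(long_climbs))
```

In this sketch the equijoin keeps both HId columns (positions 1 and 5), whereas the natural join shown earlier keeps a single copy.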


Interaction of the relational algebra operators

$\pi_{A}(R \cup S) = \pi_{A}(R) \cup \pi_{A}(S)$

$\sigma_{C}(R \cup S) = \sigma_{C}(R) \cup \sigma_{C}(S)$

$(R \cup S) \bowtie T = (R \bowtie T) \cup (S \bowtie T)$

$T \bowtie (R \cup S) = (T \bowtie R) \cup (T \bowtie S)$.
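The first two laws can be sanity-checked on the Hikers and Climbers tables. The sketch below verifies them on one instance only, which is not a proof; the positional helpers `project` and `select` are illustrative.

```python
# Rows as tuples (HId, HName, Skill, Age); relations as sets of rows.
R = {
    (123, "Edmund", "EXP", 80),
    (214, "Arnold", "BEG", 25),
    (313, "Bridget", "EXP", 33),
    (212, "James", "MED", 27),
}
S = {
    (214, "Arnold", "BEG", 25),
    (898, "Jane", "MED", 39),
}


def project(rows, positions):
    return {tuple(row[i] for i in positions) for row in rows}


def select(rows, condition):
    return {row for row in rows if condition(row)}


def over_30(row):
    return row[3] > 30


name_age = [1, 3]  # 0-based positions of HName and Age

projection_law = project(R | S, name_age) == project(R, name_age) | project(S, name_age)
selection_law = select(R | S, over_30) == select(R, over_30) | select(S, over_30)

print(projection_law, selection_law)
```

Laws of this kind let an optimizer push projections and selections below a union, so that each branch produces less data.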


Division

Suppose we have two tables with schemas $R(A,B)$ and $S(B)$. $R/S$ is defined to be the set of $A$ values in $R$ which are paired (in $R$) with all $B$ values in $S$.

That is, the set of all $x$ for which $\pi_{B}(S) \subseteq \pi_{B}(\sigma_{A=x}(R))$.

$R/S = \pi_{A}(R) - \pi_{A}((\pi_{A}(R) \bowtie \pi_{B}(S)) - R)$

The general definition of division extends this idea to more than one attribute.
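Both characterisations of division can be written down in Python and compared. The relations below are made-up examples (hiker and peak labels are hypothetical), and the result is returned as a plain set of $A$ values rather than as one-column tuples.

```python
from itertools import product

# R(A, B): which hiker (A) is paired with which peak (B). Values are made up.
R = {
    ("h1", "p1"), ("h1", "p2"),
    ("h2", "p1"),
    ("h3", "p1"), ("h3", "p2"), ("h3", "p3"),
}
# S(B): the peaks every answer must be paired with.
S = {("p1",), ("p2",)}


def divide_by_definition(r, s):
    """All x such that pi_B(S) is a subset of pi_B(sigma_{A=x}(R))."""
    b_values = {b for (b,) in s}
    a_values = {a for (a, _) in r}
    return {x for x in a_values if b_values <= {b for (a, b) in r if a == x}}


def divide_by_algebra(r, s):
    """pi_A(R) - pi_A((pi_A(R) x S) - R), using only basic operations."""
    a_values = {a for (a, _) in r}                            # pi_A(R)
    all_pairs = {(a, b) for a, (b,) in product(a_values, s)}  # pi_A(R) x S
    missing = all_pairs - r                                   # required pairs absent from R
    disqualified = {a for (a, _) in missing}                  # pi_A of those
    return a_values - disqualified


print(divide_by_definition(R, S))
print(divide_by_algebra(R, S))
```

Here h2 is disqualified because the pair (h2, p2) is missing from R, so both functions return h1 and h3.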


Examples

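The names of people who have climbed The Saddle.

$\pi_{\text{HName}}(\sigma_{\text{MName}=\text{"The Saddle"}}(\text{Munros} \bowtie \text{Hikers} \bowtie \text{Climbs}))$

Note the optimization to:

$\pi_{\text{HName}}(\sigma_{\text{MName}=\text{"The Saddle"}}(\text{Munros}) \bowtie \text{Hikers} \bowtie \text{Climbs})$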

In what order would you perform the joins?


Examples – cont.

The highest Munro(s)

This is more tricky. We first find the peaks (their MIds) that are lower than some other peak.

LowerIds = $\pi_{\text{MId}}(\sigma_{\text{Height} < \text{Height'}}(\text{Munros} \bowtie \pi_{\text{Height'}}(\rho(\text{Height} \to \text{Height'},\ \text{Munros}))))$

(we could have used $\times$ instead of $\bowtie$ here)

Now we find the MIds of peaks that are not in this set (they must be the peaks with maximum height).

MaxIds = $\pi_{\text{MId}}(\text{Munros}) - \text{LowerIds}$

Finally we get the names:

$\pi_{\text{MName}}(\text{MaxIds} \bowtie \text{Munros})$
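The three steps can be traced in Python on the Munros table. This is a sketch with rows cut down to (MId, MName, Height); the variable names mirror the expressions above, and no aggregate such as `max` is used.

```python
# (MId, MName, Height); the other Munros columns are omitted for brevity.
MUNROS = [
    (1, "The Saddle", 1010),
    (2, "Ladhar Bheinn", 1020),
    (3, "Schiehallion", 1083),
    (4, "Ben Nevis", 1343),
]

# The renamed and projected copy of Munros: a one-column table of heights (Height').
other_heights = {(h,) for (_, _, h) in MUNROS}

# LowerIds: pair every peak with every height and keep Height < Height'.
lower_ids = {mid for (mid, _, h) in MUNROS for (h2,) in other_heights if h < h2}

# MaxIds: all MIds minus those that are lower than some other peak.
max_ids = {mid for (mid, _, _) in MUNROS} - lower_ids

# Join back with Munros to get the names.
names = {name for (mid, name, _) in MUNROS if mid in max_ids}

print(lower_ids, max_ids, names)
```

If several peaks shared the maximum height, none of them would land in `lower_ids`, so all of them would be returned.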


Examples – cont.

The names of hikers who have climbed all Munros

We start by finding the set of HId,MId pairs for which the hiker has not climbed that peak. We do this by subtracting part of the Climbs table from the set of all HId,MId pairs.

NotClimbed = $(\pi_{\text{HId}}(\text{Hikers}) \bowtie \pi_{\text{MId}}(\text{Munros})) - \pi_{\text{HId},\,\text{MId}}(\text{Climbs})$

(we could have used $\times$ instead of $\bowtie$ here)

The HIds in this table identify the hikers who have not climbed some peak. By subtraction we get the HIds of hikers who have climbed all peaks:

ClimbedAll = $\pi_{\text{HId}}(\text{Hikers}) - \pi_{\text{HId}}(\text{NotClimbed})$

A join gets us the desired information:

$\pi_{\text{HName}}(\text{Hikers} \bowtie \text{ClimbedAll})$
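The same construction in Python, as a sketch. Hikers and Climbs are reduced to the columns the query needs, and the function `climbed_all` is a hypothetical helper; the second call restricts the set of peaks to MIds 1 and 2 purely to obtain a non-empty answer.

```python
from itertools import product

HIKERS = {123: "Edmund", 214: "Arnold", 313: "Bridget", 212: "James"}  # HId -> HName
CLIMBS = {(123, 1), (123, 3), (313, 1), (214, 2), (313, 2)}            # (HId, MId) pairs


def climbed_all(hikers, munro_ids, climbs):
    all_pairs = set(product(hikers, munro_ids))      # every (HId, MId) combination
    not_climbed = all_pairs - climbs                 # NotClimbed
    hikers_missing_a_peak = {hid for (hid, _) in not_climbed}
    climbed_all_ids = set(hikers) - hikers_missing_a_peak  # ClimbedAll
    return {hikers[hid] for hid in climbed_all_ids}  # names via the final join


print(climbed_all(HIKERS, {1, 2, 3, 4}, CLIMBS))  # all four Munros from the slides
print(climbed_all(HIKERS, {1, 2}, CLIMBS))        # hypothetical restriction to two peaks
```

With the slide data nobody has climbed Ben Nevis (MId 4), so the first answer is empty; for peaks 1 and 2 only Bridget qualifies. This is the division of the (HId, MId) pairs by the MIds, written out with product and difference.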


What we cannot compute with relational algebra

Aggregate operations. E.g. “The number of hikers who have climbed Schiehallion” or “The average age of hikers”. These are possible in SQL which has numerous extensions to the relational algebra.

Recursive queries. Given a table Parent(Parent, Child) compute the Ancestor table. This appears to call for an arbitrary number of joins.

Non-relational data. For example, lists, arrays, multisets (bags); or relations that are nested. These are ruled out by the relational data model, but they are important and are the province of object-oriented databases and “complex-object”/XML query languages.

Of course, we can always compute such things if we can talk to a database from a full-blown (Turing complete) programming language.
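As a sketch of that last remark, a host language such as Python handles both aggregates and recursion easily. The hiker and climb data come from the slides; the Parent table and the function name `ancestors` are made up for illustration.

```python
# Aggregates over data from the slides.
ages = [80, 25, 33, 27]                                      # Age column of Hikers
climbs = {(123, 1), (123, 3), (313, 1), (214, 2), (313, 2)}  # (HId, MId) pairs
SCHIEHALLION = 3                                             # MId of Schiehallion

climbed_schiehallion = len({hid for (hid, mid) in climbs if mid == SCHIEHALLION})
average_age = sum(ages) / len(ages)


def ancestors(parent):
    """Repeat the join with parent until no new pairs appear (a fixpoint)."""
    result = set(parent)
    while True:
        # (a, b) in result and (b, c) in parent yield the pair (a, c)
        step = {(a, c) for (a, b) in result for (b2, c) in parent if b == b2}
        if step <= result:
            return result
        result |= step


# Hypothetical Parent(Parent, Child) table.
PARENT = {("Ann", "Bob"), ("Bob", "Cid"), ("Cid", "Dee")}

print(climbed_schiehallion, average_age)
print(sorted(ancestors(PARENT)))
```

The number of loop iterations depends on the data (the depth of the family tree), which is exactly what a single fixed relational algebra expression cannot express.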


Relational calculus

Declarative way of writing queries;

Ignorant of how things are computed;

Equivalent to relational algebra: Every query that can be expressed in the relational algebra can be expressed in the calculus, and vice versa.
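As a loose analogy (not a formal definition of the calculus), a Python set comprehension reads much like a calculus expression such as $\{(h.\text{HName}, h.\text{Age}) \mid h \in \text{Hikers} \land h.\text{Age} > 30\}$: it states which tuples are wanted, not the order of the steps. The sketch below contrasts it with an algebra-style, step-by-step formulation of the same query; the `Hiker` named tuple is an illustrative choice.

```python
from collections import namedtuple

Hiker = namedtuple("Hiker", ["HId", "HName", "Skill", "Age"])

HIKERS = {
    Hiker(123, "Edmund", "EXP", 80),
    Hiker(214, "Arnold", "BEG", 25),
    Hiker(313, "Bridget", "EXP", 33),
    Hiker(212, "James", "MED", 27),
}

# Calculus style: describe the answer.
calculus_style = {(h.HName, h.Age) for h in HIKERS if h.Age > 30}

# Algebra style: spell out the steps, a selection followed by a projection.
selected = {h for h in HIKERS if h.Age > 30}
algebra_style = {(h.HName, h.Age) for h in selected}

print(calculus_style == algebra_style)
```

Both formulations return the same relation, in line with the equivalence stated above.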
