Introduction - Architecture of DBMS, Organizational...


Transcript of Introduction - Architecture of DBMS, Organizational...

  • Introduction

    Architecture of a DBMS

    Organizational Matters

    Recap

    Motivation

    General stuff

    Relational model/SQL-DDL

    Relational algebra/SQL DML

    Database Systems

    Lecture �: Introduction, Architecture of DBMS, Organizational matters & Recap

    Database Systems

    Welcome all . . .

    . . . to this course, whose lectures are primarily about digging in the mud of database system internals.

    • While others talk about SQL and graphical query interfaces, we will
        • learn how DBMSs can access files on hard disks without paying too much for I/O traffic,
        • see how to organize data on disk and which kinds of "maps" for huge amounts of data we can use to avoid getting lost,
        • assess what it means to sort/combine/filter data volumes that exceed main memory size by far, and
        • learn how user queries are represented and executed inside the database kernel.

    Architecture of a DBMS / Course Outline

    [Figure: architecture of a DBMS. Web forms, applications and the SQL interface issue SQL commands to the DBMS. The DBMS comprises the Parser, Optimizer, Operator Evaluator and Executor; Files and Access Methods; the Buffer Manager; and the Disk Space Manager; together with the Transaction Manager, Lock Manager and Recovery Manager. The database consists of data files, indices, . . . Part of the figure is marked "this course".]

    Figure inspired by Ramakrishnan/Gehrke: "Database Management Systems", McGraw-Hill ����.

    Organizational Matters

    Lectures

    When: Mondays, ��:��–��:��
    Where: G.���

    Website

    http://adrem.ua.ac.be/database-systems

    Contains slides/corrections/previous exams (with answers).

    Printed slides

    Will be available for purchase at Cursusdienst, Room U���, Groenenborgerlaan ������� Antwerpen

    Organizational Matters

    Examination

    • Written exam
    • Open book (you can bring anything that you want)

    No separate exercise sessions, no homeworks or project work.

    Contact Information

    • Email: �[email protected]
    • Office: G.���b
    • Contact hours: Only by appointment
    • Booking appointment: Only by email

    Reading Material

    Slides should be self-contained.

    However, when in need of more information:

    • Raghu Ramakrishnan and Johannes Gehrke. Database Management Systems. McGraw-Hill.

    (any other database book will do as well.)

    What is a database?

    • Database = a very large, integrated collection of data.
    • Models a real-world organisation (e.g. enterprise, university, genome, ...):
        • entities (e.g. students, modules, genes)
        • relationships (e.g. Joe is taking the Database Systems course)

    • A DBMS is a software package designed to store, manage and query databases.

    Database prehistory

    [Figure panels: Data entry; Storage and retrieval; Query processing; Sorting]

    Why use a database?

    Why?

    A DBMS provides generic functionality that otherwise would have to be implemented over and over again.

    • Data independence;
    • Efficient access;
    • Data integrity and security;
    • Uniform data administration;
    • Concurrent access, recovery from crashes; and
    • Reduced application development time.

    Why study databases?

    • Everybody needs them.
    • They are connected to most other areas of computer science:
        • programming languages and software engineering;
        • algorithms;
        • logic, discrete math, and theory of computation (essential for data organization and query languages); and
        • systems issues: concurrency, operating systems, file organization and networks.
    • There are lots of interesting problems, both in database research and in implementation.

    How to store the data (=database design) is always a challenge.

    Modeling data

    How to model the data?

    [Figure label: DBMS]

    Data models

    • Data model = a collection of concepts for describing data:
        • Relations, attributes, tuples (relational model)
        • Classes, subclasses, attributes, objects (object oriented)
        • Entities, relationships, attributes (entity-relationship)
    • A schema is a description of a particular collection of data using a given data model.
    • The relational model of data is the most widely used model today:
        • Main concept: relation/table with rows and columns
        • Every relation has a schema which describes the table.

    Munros:
        MId  MName          Lat     Long   Height  Rating
        1    The Saddle     57.167  5.384  1010    4
        2    Ladhar Bheinn  57.067  5.750  1020    4
        3    Schiehallion   56.667  4.098  1083    2.5
        4    Ben Nevis      56.780  5.002  1343    1.5

    Munros

    • Sir Hugh Thomas Munro (1856–1919)
    • Scottish mountaineer
    • List of mountains in Scotland over 3,000 feet (914.4 m), known as the Munros.
    • ��� Munros in total (in ����)

    Levels of abstraction

    Data in a DBMS is described at three levels of abstraction:
    • Views/External Schemas describe how users see the data
    • Conceptual Schema defines the logical structure
    • Physical Schema describes the files and indexes used

    [Figure: External Schema 1, External Schema 2 and External Schema 3 sit on top of the Conceptual Schema, which sits on top of the Physical Schema, which sits on top of the Disk.]

    Schemas are defined using a data definition language (DDL); data is modified/queried using a data manipulation language (DML).

    Example database

    • External Schema (View): all Munros that are not climbed
        NotClimbed (MId: integer, MName: char(30))

    • Conceptual Schema:
        Hikers (HId: integer, HName: char(30), Skill: char(3), Age: integer)
        Munros (MId: integer, MName: char(30), Lat: real, Long: real, Height: integer, Rating: real)
        Climbs (HId: integer, MId: integer, Date: date, Time: integer)

    • Physical Schema:
        • which relations are stored as unordered files
        • which index structures are used (e.g., on first attributes)
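    The external and conceptual levels can be tried out in a small sketch. It uses Python's built-in sqlite3 module, which is an assumption of this example and not a system named in the slides. The view definition (Munros with no entry in Climbs) is one plausible reading of "not climbed", and only two sample Munros are loaded.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Conceptual schema (two of the three relations are enough here).
CREATE TABLE Munros (MId INTEGER, MName CHAR(30), Lat REAL, Long REAL,
                     Height INTEGER, Rating REAL);
CREATE TABLE Climbs (HId INTEGER, MId INTEGER, Date DATE, Time INTEGER);
INSERT INTO Munros VALUES (1, 'The Saddle', 57.167, 5.384, 1010, 4),
                          (4, 'Ben Nevis', 56.780, 5.002, 1343, 1.5);
INSERT INTO Climbs VALUES (123, 1, '10/10/88', 5);
-- External schema: a view defined on top of the conceptual schema.
CREATE VIEW NotClimbed AS
  SELECT MId, MName FROM Munros
  WHERE MId NOT IN (SELECT MId FROM Climbs);
""")
not_climbed = con.execute("SELECT MId, MName FROM NotClimbed").fetchall()
print(not_climbed)
```

    Users of NotClimbed query it like a table; how Munros and Climbs are stored stays hidden from them.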

    Data independence

    • Applications are insulated from how data is structured and stored
    • Logical data independence: protection from changes in the logical structure of the data
        • When the conceptual schema changes, views can be redefined
        • User can query the same way as before
    • Physical data independence: protection from physical changes in the structure of the data
        • When the physical schema changes, the conceptual schema stays the same
        • Storage details are hidden from upper layers

    Efficiency

    • There are things that we like to do quickly and efficiently:
        • Give me all Munros higher than ����m
        • Who climbed Ben Nevis?
    • We would like to program these as quickly as possible.
    • Such questions involving data stored in a DBMS are called queries.
    • DBMS ensures that such queries can be answered efficiently using powerful query languages.
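    As an illustration, the two questions can be phrased as SQL queries. This is a sketch using Python's sqlite3 module (an assumption, not the course's system) on a cut-down version of the lecture's sample data; the helper names `higher_than` and `who_climbed` and the height limit 1050 are made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Munros (MId INTEGER, MName CHAR(30), Height INTEGER);
CREATE TABLE Hikers (HId INTEGER, HName CHAR(30));
CREATE TABLE Climbs (HId INTEGER, MId INTEGER);
INSERT INTO Munros VALUES (1, 'The Saddle', 1010), (2, 'Ladhar Bheinn', 1020),
                          (3, 'Schiehallion', 1083), (4, 'Ben Nevis', 1343);
INSERT INTO Hikers VALUES (123, 'Edmund'), (214, 'Arnold'),
                          (313, 'Bridget'), (212, 'James');
INSERT INTO Climbs VALUES (123, 1), (123, 3), (313, 1), (214, 2), (313, 2);
""")

def higher_than(limit):
    """Names of all Munros higher than `limit` metres, lowest first."""
    rows = con.execute(
        "SELECT MName FROM Munros WHERE Height > ? ORDER BY Height", (limit,))
    return [name for (name,) in rows]

def who_climbed(munro):
    """Names of the hikers who climbed the Munro with the given name."""
    rows = con.execute(
        """SELECT DISTINCT H.HName
           FROM Hikers H JOIN Climbs C ON H.HId = C.HId
                         JOIN Munros M ON M.MId = C.MId
           WHERE M.MName = ? ORDER BY H.HName""", (munro,))
    return [name for (name,) in rows]

print(higher_than(1050))
print(who_climbed("The Saddle"))
print(who_climbed("Ben Nevis"))  # nobody in the sample data climbed Ben Nevis
```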

    Concurrency control

    • Concurrent execution of user queries is essential for good DBMS performance:
        • Disk access is slow, so efficient access is essential when several users access the data concurrently
    • Interleaving actions of different user programs/requests can lead to inconsistency:
        • e.g. when two transfers out of the same account run simultaneously although the funds cover only one of them
    • DBMS ensures such problems do not occur!

    DBMS structure

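    [Figure: DBMS structure. Web forms, applications and the SQL interface issue SQL commands to the DBMS. The DBMS comprises the Parser, Optimizer, Operator Evaluator and Executor; Files and Access Methods; the Buffer Manager; and the Disk Space Manager; together with the Transaction Manager, Lock Manager and Recovery Manager. The database consists of data files, indices, . . .]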

    Some “real” DBMS

    • mysql: www.mysql.org, open source, quite powerful
    • PostgreSQL: www.postgresql.org, open source, powerful
    • Microsoft Access: simple system, lots of nice GUI wrappers
    • Commercial systems:
        • Oracle ��g (www.oracle.com/database)
        • SQL Server ���� (www.microsoft.com/sql)
        • DB2 (www.ibm.com/db2)

    Relational model?

    • It's the dominant model in the marketplace
        • Vendors: Microsoft, Oracle, IBM
        • Open source: PostgreSQL, mysql, ...
    • SQL is the industrial realisation of the relational model
    • SQL has been standardised (several times)
    • Most of the commercial systems have substantially extended the standard!

    SQL

    SQL=Structured Query Language

    The relational model: early history

    • Proposed by E.F. Codd (IBM San José), 1970
    • Prior to this the dominant model was the network model (CODASYL)
    • Mid 70's: prototypes
        • Sequel at IBM San José
        • INGRES at UC Berkeley
    • ����-: System R at IBM San José
        • Transactions
        • Query optimiser
        • Extended β-testing
    • Then... commercial systems...

    Figure: E.F. Codd

    The relational model: basics

    • A relational database is a collection of relations.
    • A relation consists of two parts:
        • Relation instance: a table, with columns and rows.
        • Relation schema: specifies the name of the relation, plus the name and type of each column.

    You can think of a relation instance as a set of rows or tuples.

    Example

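    • Relation schema:

        Climbs (HId: integer, MId: integer, Date: date, Time: integer)

      Here Climbs is the relation name, HId, MId, Date and Time are field names (attribute names), and integer and date are domains.

    • In general (and more formally):

        $R(f_1 : D_1, \ldots, f_n : D_n)$

      where $R$ is the relation name, each $f_i$ is a field name (attribute name) and each $D_i$ is a domain.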

    Example

    • Relation instance:

    Munros:
        MId  MName          Lat     Long   Height  Rating
        1    The Saddle     57.167  5.384  1010    4
        2    Ladhar Bheinn  57.067  5.750  1020    4
        3    Schiehallion   56.667  4.098  1083    2.5
        4    Ben Nevis      56.780  5.002  1343    1.5

    Hikers:
        HId  HName    Skill  Age
        123  Edmund   EXP    80
        214  Arnold   BEG    25
        313  Bridget  EXP    33
        212  James    MED    27

    Climbs:
        HId  MId  Date      Time
        123  1    10/10/88  5
        123  3    11/08/87  2.5
        313  1    12/08/89  4
        214  2    08/07/92  7
        313  2    06/07/94  5

    (Figure annotations: "Munros", "Hikers" and "Climbs" are relation names; the header row lists the field names; the remaining rows are the tuples/records/rows; the columns are the fields (attributes, columns).)

    Some terminology

    • A domain is a set of values. All domains in a relation must be atomic (indivisible).
    • Given a relation $R(f_1 : D_1, \ldots, f_n : D_n)$, R is said to have arity (degree) n.
    • Given a relation instance, its cardinality is the number of rows.
        • For example, in Climbs, cardinality = 5 and arity = 4; the domain of HId is integer and that of Date is date.

    Beware:
    • Attributes within a table have different names; and
    • Tables have different names.

    Relations and sets

    • A relation $R(f_1 : D_1, \ldots, f_n : D_n)$ can be defined more formally as a set of tuples drawn from
      $\{\langle f_1 : d_1, \ldots, f_n : d_n \rangle \mid d_1 \in D_1, \ldots, d_n \in D_n\}$.
    • Thus a relation is a set of tuples:
        • There is no ordering of the tuples in the table; and
        • There are no duplicate rows in the table.

    SQL

    • SQL is the ubiquitous language for relational databases.
    • Standardised by ANSI/ISO in ����, ��, �� and ����.
    • Most DBMS support SQL-�� and currently most features of SQL-�� are covered as well.
    • Part of SQL is a Data Definition Language (DDL) that supports:
        • creation of tables;
        • deletion of tables; and
        • modification of tables.

    Creating tables

    • Consider
        Munros(MId:int, MName:string, Lat:real, Long:real, Height:int, Rating:real)
        Hikers(HId:int, HName:string, Skill:string, Age:int)
        Climbs(HId:int, MId:int, Date:date, Time:int)

    • In its simplest use, SQL's DDL provides a name and a type for each column of a table.

        CREATE TABLE Hikers (
            HId   INTEGER,
            HName CHAR(30),
            Skill CHAR(3),
            Age   INTEGER )

    • Note that the domain of each field is specified and enforced by the DBMS.
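    A minimal sketch of running this DDL and reading the schema back, assuming Python's built-in sqlite3 module (not a system named in the slides). Note that SQLite records the declared column types but enforces them only loosely, whereas most other DBMSs reject values outside the declared domain.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE Hikers (
        HId   INTEGER,
        HName CHAR(30),
        Skill CHAR(3),
        Age   INTEGER
    )""")

# PRAGMA table_info yields one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(name, ctype)
           for _, name, ctype, *_ in con.execute("PRAGMA table_info(Hikers)")]
print(columns)
```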

    Removing and altering tables

    • We can delete both the schema information and all the tuples, e.g.

        DROP TABLE Hikers;

    • We can alter existing schemas, e.g. by adding an extra field

        ALTER TABLE Hikers
        ADD COLUMN gender CHAR(�);

      (every tuple is extended by a so-called null value).

    • or change the domain of a field:

        ALTER TABLE Hikers
        ALTER COLUMN gender CHAR(�);

    • or remove a field

        ALTER TABLE Hikers
        DROP COLUMN gender;
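    The effect of adding a column and of dropping a table can be checked with a short sketch, again assuming Python's sqlite3 module. The column width 1 is a hypothetical choice for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Hikers (HId INTEGER, HName CHAR(30), "
            "Skill CHAR(3), Age INTEGER)")
con.execute("INSERT INTO Hikers VALUES (123, 'Edmund', 'EXP', 80)")

# Add a field; the width 1 is an assumption made for this sketch.
con.execute("ALTER TABLE Hikers ADD COLUMN gender CHAR(1)")
row = con.execute("SELECT HId, gender FROM Hikers").fetchone()
print(row)  # the existing tuple was extended by a null value (None in Python)

# Dropping the table removes the schema information and all tuples.
con.execute("DROP TABLE Hikers")
tables = con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)
```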

    Adding and deleting tuples

    • Can insert tuples into a table, e.g.

        INSERT INTO Hikers (HId, HName, Skill, Age)
        VALUES (���, 'Sam', 'Exp', ��);

    • Can remove tuples satisfying certain conditions, e.g.

        DELETE FROM Hikers H
        WHERE H.HName = 'Arnold';

    • Can update tuples satisfying certain conditions, e.g.,

        UPDATE Hikers H
        SET H.Age = H.Age + �
        WHERE H.HName = 'Arnold';

    • More ways of changing/inserting data (LOAD, ...)
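    The three statements can be exercised in a few lines. This is a sketch assuming Python's sqlite3 module; Sam's HId and age and the increment of 1 are made-up values, and the table alias is omitted because SQLite does not need it here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Hikers (HId INTEGER, HName CHAR(30), "
            "Skill CHAR(3), Age INTEGER)")
con.executemany("INSERT INTO Hikers VALUES (?, ?, ?, ?)",
                [(123, "Edmund", "EXP", 80), (214, "Arnold", "BEG", 25)])

# Insert: HId 999 and age 40 are hypothetical values for 'Sam'.
con.execute("INSERT INTO Hikers (HId, HName, Skill, Age) "
            "VALUES (999, 'Sam', 'EXP', 40)")

# Update every tuple that satisfies the condition.
con.execute("UPDATE Hikers SET Age = Age + 1 WHERE HName = 'Arnold'")
arnold_age = con.execute(
    "SELECT Age FROM Hikers WHERE HName = 'Arnold'").fetchone()[0]

# Delete every tuple that satisfies the condition.
con.execute("DELETE FROM Hikers WHERE HName = 'Arnold'")
remaining = [name for (name,) in
             con.execute("SELECT HName FROM Hikers ORDER BY HId")]
print(arnold_age, remaining)
```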

    Updating tuples: old value semantics

    • Consider the following update:

        UPDATE Hikers H
        SET H.Age = H.Age + �
        WHERE H.Age
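    The condition of the update above is cut off in this transcript, so the following sketch substitutes a hypothetical one (Age < 30, increment 1). It assumes the usual reading of old value semantics: the WHERE condition and the right-hand side are evaluated on the values before the update, so each qualifying tuple is changed exactly once. Python's sqlite3 module is used as a stand-in DBMS.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Hikers (HId INTEGER, HName CHAR(30), "
            "Skill CHAR(3), Age INTEGER)")
con.executemany("INSERT INTO Hikers VALUES (?, ?, ?, ?)", [
    (123, "Edmund", "EXP", 80), (214, "Arnold", "BEG", 25),
    (313, "Bridget", "EXP", 33), (212, "James", "MED", 27)])

# Hypothetical condition and increment. The condition is tested against
# the old Age values, so Arnold (25) and James (27) are bumped once each
# and are not updated again although their new ages are still below 30.
con.execute("UPDATE Hikers SET Age = Age + 1 WHERE Age < 30")
ages = dict(con.execute("SELECT HName, Age FROM Hikers"))
print(ages)
```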

    Integrity constraints (IC)

    • IC: condition that must be true for any instance of the database, e.g., domain constraints.
        • ICs are specified when the schema is defined.
        • ICs are checked when relations are modified.
    • A legal instance of a relation is one that satisfies all specified ICs.
        • DBMS should not allow illegal instances.
    • If the DBMS checks ICs, stored data is more faithful to real-world meaning.
        • Avoids data entry errors, too!

    Primary key constraints

    • A set of fields is a key for a relation if:
        1. No two distinct tuples can have the same values in all key fields, and
        2. This is not true for any proper subset of the key.
    • Part 2 false? A superkey.
    • If there is more than one key for a relation, one of the keys is chosen (by the DBA) to be the primary key.
    • E.g., HId is a key for Hikers. The set {HId, HName} is a superkey.

    Key constraints

        CREATE TABLE Hikers (
            HId   INTEGER,
            HName CHAR(30),
            Skill CHAR(3),
            Age   INTEGER,
            CONSTRAINT Blah PRIMARY KEY (HId) );

        CREATE TABLE Climbs (
            HId  INTEGER,
            MId  INTEGER,
            Date DATE,
            Time INTEGER,
            PRIMARY KEY (HId, MId, Date) );

    • CONSTRAINT is optional and only provides a name for the constraint.
    • Updates that violate key constraints are rejected (and if constraints are named, the error message will include those names).
    • Do you think the key in the second example is the right choice? Be careful when assigning primary keys...
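    A sketch of a rejected update, assuming Python's sqlite3 module as the DBMS. The constraint name HikersKey and the second hiker are made up for the example; the exact wording of the error message differs between systems.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE Hikers (
        HId INTEGER, HName CHAR(30), Skill CHAR(3), Age INTEGER,
        CONSTRAINT HikersKey PRIMARY KEY (HId))""")
con.execute("INSERT INTO Hikers VALUES (123, 'Edmund', 'EXP', 80)")

try:
    # A second tuple with the same key value violates the constraint.
    con.execute("INSERT INTO Hikers VALUES (123, 'Edwin', 'BEG', 30)")
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)

count = con.execute("SELECT COUNT(*) FROM Hikers").fetchone()[0]
print(rejected, count)
```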

    Key constraints

        CREATE TABLE Hikers (
            HId   INTEGER,
            HName CHAR(30),
            Skill CHAR(3),
            Age   INTEGER,
            UNIQUE (HName, Age),
            PRIMARY KEY (HId) );

    • Other keys can be specified using UNIQUE.
    • A tuple can only be referred to from elsewhere by storing its primary key fields.
    • An index can be built on top of the primary key fields to optimize access.

    Foreign keys

    • Foreign key: a set of fields in one relation that is used to "refer" to a tuple in another relation.
        • Must correspond to the primary key of the second relation.
        • Like a "logical pointer".
    • E.g., we expect any MId value in the Climbs table to be included in the MId column of the Munros table. Similarly for HId.

        CREATE TABLE Climbs (
            HId  INTEGER,
            MId  INTEGER,
            Date DATE,
            Time INTEGER,
            PRIMARY KEY (HId, MId, Date),
            FOREIGN KEY (HId) REFERENCES Hikers(HId),
            FOREIGN KEY (MId) REFERENCES Munros(MId) )
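    A sketch of the foreign keys at work, assuming Python's sqlite3 module. SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`; the Hikers and Munros tables are cut down to two columns, and the Munro id 99 is a made-up value that does not exist.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
con.executescript("""
CREATE TABLE Hikers (HId INTEGER PRIMARY KEY, HName CHAR(30));
CREATE TABLE Munros (MId INTEGER PRIMARY KEY, MName CHAR(30));
CREATE TABLE Climbs (
    HId INTEGER, MId INTEGER, Date DATE, Time INTEGER,
    PRIMARY KEY (HId, MId, Date),
    FOREIGN KEY (HId) REFERENCES Hikers(HId),
    FOREIGN KEY (MId) REFERENCES Munros(MId));
INSERT INTO Hikers VALUES (123, 'Edmund');
INSERT INTO Munros VALUES (1, 'The Saddle');
INSERT INTO Climbs VALUES (123, 1, '10/10/88', 5);
""")

try:
    # MId 99 does not occur in Munros, so this tuple must be rejected.
    con.execute("INSERT INTO Climbs VALUES (123, 99, '11/08/87', 2.5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

climbs = con.execute("SELECT HId, MId FROM Climbs").fetchall()
print(rejected, climbs)
```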

    Foreign keys

    Munros:
        MId  MName          Lat     Long   Height  Rating
        1    The Saddle     57.167  5.384  1010    4
        2    Ladhar Bheinn  57.067  5.750  1020    4
        3    Schiehallion   56.667  4.098  1083    2.5
        4    Ben Nevis      56.780  5.002  1343    1.5

    Hikers:
        HId  HName    Skill  Age
        123  Edmund   EXP    80
        214  Arnold   BEG    25
        313  Bridget  EXP    33
        212  James    MED    27

    Climbs:
        HId  MId  Date      Time
        123  1    10/10/88  5
        123  3    11/08/87  2.5
        313  1    12/08/89  4
        214  2    08/07/92  7
        313  2    06/07/94  5

    Foreign keys

    • A foreign key can refer to the same relation.
    • E.g., extend Hikers with a Partner field containing the partner's HId. Declare this field as a foreign key referring to Hikers.

    Hikers:
        HId  HName    Skill  Age  Partner
        123  Edmund   EXP    80   214
        214  Arnold   BEG    25   123
        313  Bridget  EXP    33   null
        212  James    MED    27   null

    (Figure annotations: the null entries in Partner mark nonexisting partners; the HId column has no null values.)

    • No null values in primary key fields (they are used to identify tuples).

    Enforcing integrity constraints

    • Consider Climbs and Munros; Climbs has a foreign key that references Munros.
    • What should be done if a Climbs tuple with a non-existent Munro id is inserted? (Reject it!)
    • What should be done if a Munro tuple is deleted?
        • Also delete all Climbs tuples that refer to it.
        • Disallow deletion of a Munro tuple that is referred to.
        • Set MId in Climbs tuples that refer to it to a default MId (e.g., null, in case it is not a primary key field).
    • Similar if the primary key of a Munro tuple is updated.

    Integrity in SQL-��

    • SQL/�� supports all � options on deletes and updates.
        • Default is NO ACTION (delete/update is rejected)
        • CASCADE (also delete all tuples that refer to the deleted tuple)
        • SET NULL / SET DEFAULT (sets the foreign key value of the referencing tuple)
    • Default value has to be specified when creating the table.

        CREATE TABLE Climbs (
            HId  INTEGER,
            MId  INTEGER,
            Date DATE,
            Time INTEGER,
            PRIMARY KEY (HId, MId, Date),
            FOREIGN KEY (HId) REFERENCES Hikers(HId)
                ON DELETE NO ACTION
                ON UPDATE SET DEFAULT,
            FOREIGN KEY (MId) REFERENCES Munros(MId)
                ON DELETE CASCADE
                ON UPDATE SET DEFAULT )
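    The CASCADE option can be observed in a short sketch, assuming Python's sqlite3 module with foreign key checking switched on. To keep it small, only the foreign key to Munros is declared and the ON UPDATE clause is left out.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
con.executescript("""
CREATE TABLE Munros (MId INTEGER PRIMARY KEY, MName CHAR(30));
CREATE TABLE Climbs (
    HId INTEGER, MId INTEGER, Date DATE, Time INTEGER,
    PRIMARY KEY (HId, MId, Date),
    FOREIGN KEY (MId) REFERENCES Munros(MId) ON DELETE CASCADE);
INSERT INTO Munros VALUES (1, 'The Saddle'), (2, 'Ladhar Bheinn');
INSERT INTO Climbs VALUES (123, 1, '10/10/88', 5), (313, 1, '12/08/89', 4),
                          (214, 2, '08/07/92', 7);
""")

# Deleting a Munro also deletes all Climbs tuples that refer to it.
con.execute("DELETE FROM Munros WHERE MId = 1")
left = con.execute("SELECT HId, MId FROM Climbs").fetchall()
print(left)
```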

    Where do ICs come from?

    • ICs are based upon the semantics of the real-world enterprise that is being described in the database relations.
    • We can check a database instance to see if an IC is violated, but we can NEVER infer that an IC is true by looking at an instance.
        • An IC is a statement about all possible instances!
    • Key and foreign key ICs are the most common; more general ICs are supported too.

    Relational query languages

    • Query languages allow the manipulation and retrieval of data from a database.
    • The relational model supports simple, powerful query languages:
        • strong formal foundation; and
        • allows for much (provably correct) optimisation.
    • NOTE: Query languages are not (necessarily) programming languages.

    Formal relational query languages

    Relational Algebra

    Simple “operational” model, useful for expressing execution plans.

    Relational Calculus

    Logical model (declarative), useful for theoretical results.

    • Both languages were introduced by Codd in a series of papers.
    • They have equivalent expressive power.

    SQL

    Standardized query language used for specifying queries in DBMS.

    • Relational algebra is the key to understanding SQL query processing!

    Preliminaries

    • A query is applied to relation instances, and the result of a query is also a relation instance.

    [Figure: input instance → query → output instance]

    • For a given query, the schemas of the input relations are fixed.
    • The query will then execute over any valid instance.
    • The schema of the result can also be determined (and is fixed for the given query).

    Relational algebra

    • Basic operations:
        • Selection (σ): Selects a subset of rows from relation.
        • Projection (π): Deletes unwanted columns from relation.
        • Cross-product (×): Allows us to combine two relations.
        • Set-difference (−): Allows us to subtract relations.
        • Union (∪)

    Projection

    • Choose a set of field names A and a table R
    • π_A(R) extracts the columns in A from the table.
    • Example, given Munros =

        MId  MName          Lat     Long   Height  Rating
        1    The Saddle     57.167  5.384  1010    4
        2    Ladhar Bheinn  57.067  5.750  1020    4
        3    Schiehallion   56.667  4.098  1083    2.5
        4    Ben Nevis      56.780  5.002  1343    1.5

    • π_{MId,Rating}(Munros) is

        MId  Rating
        1    4
        2    4
        3    2.5
        4    1.5

    Provides the user with a view by hiding some attributes.
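    Projection is easy to sketch outside a DBMS. The following Python fragment is an illustration only: a relation is modelled as a pair of field names and a set of tuples, and the function name `project` is a choice made for this example.

```python
def project(rel, wanted):
    """pi_wanted(R): keep only the wanted columns; the result is again a set."""
    fields, rows = rel
    idx = [fields.index(f) for f in wanted]
    return tuple(wanted), {tuple(row[i] for i in idx) for row in rows}

munros = (
    ("MId", "MName", "Lat", "Long", "Height", "Rating"),
    {
        (1, "The Saddle", 57.167, 5.384, 1010, 4),
        (2, "Ladhar Bheinn", 57.067, 5.750, 1020, 4),
        (3, "Schiehallion", 56.667, 4.098, 1083, 2.5),
        (4, "Ben Nevis", 56.780, 5.002, 1343, 1.5),
    },
)

id_rating = project(munros, ("MId", "Rating"))
ratings = project(munros, ("Rating",))
print(sorted(id_rating[1]))
print(sorted(ratings[1]))  # the repeated rating 4 appears only once in a set
```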

    Projection – continued

    • Suppose the result of a projection has a repeated value, how do we treat it?

    • π_{Rating}(Munros) is

        Rating            Rating
        4                 4
        4          or     2.5      ?
        2.5               1.5
        1.5

    • In "pure" relational algebra the answer is always a set (recall that we defined a relation instance as a set).
    • However, SQL and some other languages return a multiset for some operations, from which duplicates may be eliminated by a further operation. (Why? Eliminating duplicates is expensive in practice.)
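    The difference between the two answers can be seen directly in SQL. This sketch assumes Python's sqlite3 module and a Munros table reduced to the two relevant columns; DISTINCT is the "further operation" that removes duplicates.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Munros (MId INTEGER, Rating REAL)")
con.executemany("INSERT INTO Munros VALUES (?, ?)",
                [(1, 4), (2, 4), (3, 2.5), (4, 1.5)])

# Plain SELECT keeps duplicates (multiset semantics).
bag = [r for (r,) in con.execute("SELECT Rating FROM Munros ORDER BY MId")]

# DISTINCT eliminates them (set semantics).
as_set = [r for (r,) in
          con.execute("SELECT DISTINCT Rating FROM Munros ORDER BY Rating")]
print(bag, as_set)
```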

    Selection

    • Chooses tuples that satisfy some condition.
    • Selection σ_C(R) takes a table R and extracts those rows from it that satisfy the condition C.
    • For example, σ_{Height > 1050}(Munros) =

        MId  MName         Lat     Long   Height  Rating
        3    Schiehallion  56.667  4.098  1083    2.5
        4    Ben Nevis     56.780  5.002  1343    1.5

    • What can go into a condition C?

    Selection - continued

    • Conditions are built up from:
        • Comparisons on attributes, e.g. R.A = R.A′ (other comparison operators are allowed as well).
        • Comparisons on values, e.g. Height > ����, MName = "Ben Nevis".
        • Predicates constructed from these using ∨ (or), ∧ (and), ¬ (not). E.g. (Lat > �� ∧ Height > ����) ∨ (Height = Lat).

    A selection provides the user with a view of data by hiding tuples that do not satisfy the condition the user wants.
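    Selection can be sketched in the same spirit: a condition is any Boolean function on a tuple. In this illustrative Python fragment a relation is a pair of field names and a set of tuples; the function name `select` and the constants 57 and 1000 in the compound condition are assumptions of the example.

```python
def select(rel, condition):
    """sigma_C(R): keep the rows for which condition(row as dict) holds."""
    fields, rows = rel
    return fields, {row for row in rows if condition(dict(zip(fields, row)))}

munros = (
    ("MId", "MName", "Lat", "Long", "Height", "Rating"),
    {
        (1, "The Saddle", 57.167, 5.384, 1010, 4),
        (2, "Ladhar Bheinn", 57.067, 5.750, 1020, 4),
        (3, "Schiehallion", 56.667, 4.098, 1083, 2.5),
        (4, "Ben Nevis", 56.780, 5.002, 1343, 1.5),
    },
)

# Comparison on a value.
high = select(munros, lambda t: t["Height"] > 1050)
high_names = {row[1] for row in high[1]}

# Predicate built with and/or; the constants are hypothetical.
mixed = select(munros, lambda t: (t["Lat"] > 57 and t["Height"] > 1000)
                                 or t["Height"] == t["Lat"])
mixed_ids = {row[0] for row in mixed[1]}
print(sorted(high_names), sorted(mixed_ids))
```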

    Combining selection and projection

    • Find the names and ages of all climbers of age > ��.
    • Relational algebra query

        Q1 = π_{HName,Age}(σ_{Age>��}(Hikers))

    • An equivalent relational algebra query

        Q2 = σ_{Age>��}(π_{HName,Age}(Hikers))

    The same declarative query can be translated into more than one procedural query.

    Combining selection and projection

    • Are Q1 and Q2 the same?
    • They are semantically, as they produce the same result.
    • But they differ in terms of efficiency:
      • Q1 scans Hikers, selects some tuples, and then only scans the selected tuples.
      • Q2 scans Hikers, projects out two attributes, and then scans the result again.
    • Q1 is likely to be more efficient than Q2.
    • Procedural languages can be optimized.
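    The two evaluation orders can be compared on a toy relation. The sketch below assumes made-up HId and Age values and an age threshold of 30; the function names `q1` and `q2` are chosen here for illustration.

```python
# Hypothetical Hikers tuples: (HId, HName, Skill, Age); the numbers are made up.
Hikers = [
    (123, "Edmund", "EXP", 80),
    (214, "Arnold", "BEG", 25),
    (313, "Bridget", "EXP", 33),
    (212, "James", "MED", 27),
]


def q1(hikers, min_age):
    # Select first, then project only the surviving tuples.
    selected = [t for t in hikers if t[3] > min_age]
    return {(t[1], t[3]) for t in selected}


def q2(hikers, min_age):
    # Project every tuple first, then scan the projected result again.
    projected = {(t[1], t[3]) for t in hikers}
    return {t for t in projected if t[1] > min_age}


same = q1(Hikers, 30) == q2(Hikers, 30)
print(same, sorted(q1(Hikers, 30)))
```

    Both plans return the same set; the difference is only in how many tuples the second step has to touch.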


    Set operations – union

    • If two tables have the same structure, we can perform set operations.

    • Same structure means union-compatible:
      • Same number of fields; and
      • Corresponding fields (taken from left to right) have the same domains.

    • Example:

      Hikers =
      HId  HName    Skill  Age
      ���  Edmund   EXP    ��
      ���  Arnold   BEG    ��
      ���  Bridget  EXP    ��
      ���  James    MED    ��

      Climbers =
      HId  HName   Expertise  Age
      ���  Arnold  BEG        ��
      ���  Jane    MED        ��

      Hikers ∪ Climbers =
      HId  HName    Skill  Age
      ���  Edmund   EXP    ��
      ���  Arnold   BEG    ��
      ���  Bridget  EXP    ��
      ���  James    MED    ��
      ���  Jane     MED    ��

    • Output schema is that of the first relation (Hikers in the example).


    Set operations – set difference

    • We can also take the difference of two union-compatible tables:

      Hikers − Climbers =
      HId  HName    Skill  Age
      ���  Edmund   EXP    ��
      ���  Bridget  EXP    ��
      ���  James    MED    ��

    • Again, output schema is that of the first relation.


    Set operations – intersection

    • It turns out we can implement intersection in terms of other operations:

      R ∩ S = R − (R − S)

    • Although it is mathematically nice to have fewer operators, this may not be an efficient way to implement intersection.

    • Intersection is also a special case of a join, which we'll discuss shortly.
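    The set operations map directly onto Python sets of tuples. In this sketch the HId and Age values are made up, and union compatibility is simply assumed (both relations hold 4-tuples with matching domains).

```python
# Hypothetical union-compatible relations: (HId, HName, Skill, Age); numbers are made up.
Hikers = {
    (123, "Edmund", "EXP", 80),
    (214, "Arnold", "BEG", 25),
    (313, "Bridget", "EXP", 33),
    (212, "James", "MED", 27),
}
Climbers = {
    (214, "Arnold", "BEG", 25),
    (898, "Jane", "MED", 39),
}

union = Hikers | Climbers          # Hikers ∪ Climbers
difference = Hikers - Climbers     # Hikers − Climbers
# Intersection expressed with difference only: R ∩ S = R − (R − S)
intersection = Hikers - (Hikers - Climbers)

print(len(union), len(difference), intersection)
```

    The derived intersection agrees with Python's built-in `&` operator on these sets, which is the identity stated above.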


    Cross product (Cartesian product)

    • The basic operation is the Cartesian product, R × S, which concatenates every tuple in R with every tuple in S.

    • Example:

      A   B
      a�  b�
      a�  b�

      ×

      C   D
      c�  d�
      c�  d�
      c�  d�

      =

      A   B   C   D
      a�  b�  c�  d�
      a�  b�  c�  d�
      a�  b�  c�  d�
      a�  b�  c�  d�
      a�  b�  c�  d�
      a�  b�  c�  d�


    Cartesian product – continued

    • What happens when we form a product of two tables with columns with the same name?

    • Recall the schemas: Hikers(HId, HName, Skill, Age) and Climbs(HId, MId, Date, Time). What is the schema of Hikers × Climbs?


    Renaming

    • To avoid confusion about attribute names, one can use the renaming operator ρ:

      ρ(C(� → sid�, � → sid�), Hikers × Climbs)

    • This operator
      • names the result relation C; and
      • explicitly renames the fields at positions � and � to sid� and sid�.

    • In general,

      ρ(R(oldname → newname, …, position → newname), E),

    where E is a relational algebra expression.


    Cartesian product – continued

    • If R1 has n tuples and R2 has m tuples then R1 × R2 has n × m tuples.

    • This is an expensive operation: if R1 and R2 both have � ��� tuples (small relations) then R1 × R2 has � ��� ��� tuples (a large relation).

    • Query processors try to avoid building products; instead they attempt to build only subsets that contain relevant information.
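    The n × m growth is easy to see with `itertools.product`. The relation contents below are placeholder values assumed for the sketch.

```python
from itertools import product

# Hypothetical relations with n = 2 and m = 3 tuples.
R1 = [("a1", "b1"), ("a2", "b2")]
R2 = [("c1", "d1"), ("c2", "d2"), ("c3", "d3")]

# Cartesian product: concatenate every tuple of R1 with every tuple of R2.
R1xR2 = [r + s for r, s in product(R1, R2)]

print(len(R1), len(R2), len(R1xR2))
```

    The result always has len(R1) * len(R2) tuples, which is why a product of two modest tables can already be too large to materialise.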


    Natural join

    • For obvious reasons of efficiency we rarely use unconstrained cross products in practice.

    • A natural join (⋈) produces the set of all merges of tuples that agree on their commonly named fields.

    • Example:

      HId  MId  Date      Time
      ���  �    ��/��/��  �
      ���  �    ��/��/��  �.�
      ���  �    ��/��/��  �
      ���  �    ��/��/��  �
      ���  �    ��/��/��  �

      ⋈

      HId  HName    Skill  Age
      ���  Edmund   EXP    ��
      ���  Arnold   BEG    ��
      ���  Bridget  EXP    ��
      ���  James    MED    ��

      =

      HId  MId  Date      Time  HName    Skill  Age
      ���  �    ��/��/��  �     Edmund   EXP    ��
      ���  �    ��/��/��  �.�   Edmund   EXP    ��
      ���  �    ��/��/��  �     Bridget  EXP    ��
      ���  �    ��/��/��  �     Arnold   BEG    ��
      ���  �    ��/��/��  �     Bridget  EXP    ��
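    A naive nested-loop version of the natural join, as a sketch. Rows are modelled as dictionaries keyed by field name; the HId and MId values are made up and `natural_join` is a name chosen here, not one from the slides.

```python
def natural_join(r, s):
    """Merge every pair of rows that agree on all commonly named fields."""
    out = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                out.append({**a, **b})  # shared fields appear once in the merged row
    return out


# Hypothetical data: ids are made up, only the names come from the slides.
Climbs = [
    {"HId": 123, "MId": 1},
    {"HId": 123, "MId": 3},
    {"HId": 313, "MId": 1},
    {"HId": 214, "MId": 2},
    {"HId": 313, "MId": 2},
]
Hikers = [
    {"HId": 123, "HName": "Edmund"},
    {"HId": 214, "HName": "Arnold"},
    {"HId": 313, "HName": "Bridget"},
    {"HId": 212, "HName": "James"},
]

joined = natural_join(Climbs, Hikers)
print([(row["HName"], row["MId"]) for row in joined])
```

    James has no climbs in this sample, so he does not appear in the result. A real system would avoid comparing every pair of rows; this version only mirrors the definition.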


    Natural Join – cont.

    • Natural join has interesting relationships with other operations. What is R ⋈ S when
      • R = S?
      • R and S have no column names in common?
      • R and S have all column names in common, i.e., they are union-compatible?

    • Natural join has nice properties (assuming fields are identified by names):
      • Commutative: R ⋈ S = S ⋈ R
      • Associative: R ⋈ (S ⋈ T) = (R ⋈ S) ⋈ T
      • Hence we can always simply write R1 ⋈ R2 ⋈ ⋯ ⋈ Rk.


    Conditional Join

    • Extension of natural join in which a join condition C is specified: R ⋈_C S stands for σ_C(R × S).

    • The special case in which the join condition consists only of equality conditions is called the equijoin.

    • A natural join is an equijoin in which equalities are specified on all common fields.


    Interaction of the relational algebra operators

    • π_A(R ∪ S) = π_A(R) ∪ π_A(S)
    • σ_C(R ∪ S) = σ_C(R) ∪ σ_C(S)
    • (R ∪ S) ⋈ T = (R ⋈ T) ∪ (S ⋈ T)
    • T ⋈ (R ∪ S) = (T ⋈ R) ∪ (T ⋈ S)


    Examples

    • The names of people who have climbed The Saddle:

      π_HName(σ_{MName="The Saddle"}(Munros ⋈ Hikers ⋈ Climbs))

    • Note the optimization to:

      π_HName(σ_{MName="The Saddle"}(Munros) ⋈ Hikers ⋈ Climbs)

    • In what order would you perform the joins?


    Examples – cont

    • The highest Munro(s)
    • This is more tricky. We first find the peaks (their MIds) that are lower than some other peak. LowerIds =

    πMId(σHeight


    Examples – cont

    • The names of hikers who have climbed all Munros
    • We start by finding the set of (HId, MId) pairs for which the hiker has not climbed that peak.

    • We do this by subtracting part of the Climbs table from the set of all (HId, MId) pairs:

      NotClimbed = π_HId(Hikers) ⋈ π_MId(Munros) − π_{HId,MId}(Climbs)
      (we could have used × instead of ⋈ here)

    • The HIds in this table identify the hikers who have not climbed some peak. By subtraction we get the HIds of hikers who have climbed all peaks:

      ClimbedAll = π_HId(Hikers) − π_HId(NotClimbed)

    • A join gets us the desired information:

      π_HName(Hikers ⋈ ClimbedAll)
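    The same three steps can be traced with Python sets. All identifiers and values below are assumptions for the sketch; only the structure follows the derivation above.

```python
# Hypothetical data: HId -> HName, a set of MIds, and (HId, MId) climb pairs.
hiker_names = {123: "Edmund", 214: "Arnold", 313: "Bridget"}
munro_ids = {1, 2}
climbs = {(123, 1), (123, 2), (214, 1), (313, 2)}

# All (HId, MId) pairs, i.e. pi_HId(Hikers) x pi_MId(Munros).
all_pairs = {(h, m) for h in hiker_names for m in munro_ids}

# NotClimbed: pairs for which the hiker has not climbed that peak.
not_climbed = all_pairs - climbs

# ClimbedAll: hikers that never show up in NotClimbed.
climbed_all = set(hiker_names) - {h for h, _ in not_climbed}

# Final join back to Hikers to get the names.
result = {hiker_names[h] for h in climbed_all}
print(result)
```

    "For all" is expressed here as two subtractions: first find the counterexamples, then remove everyone who has one.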


    SQL DML

    Part of SQL is also the Data Manipulation Language (DML), which is used to pose queries to the DBMS.
    SQL adds more expressive power to relational algebra by supporting:
    • grouping
    • aggregation
    • ...


    Basic SQL Query

    SELECT [DISTINCT] target-list
    FROM relation-list
    WHERE condition

    • relation-list: A list of table names. A table name may be followed by a “range variable” (an alias).

    • target-list: A list of attributes of the tables in relation-list, or expressions built on these.

    • condition: Much like a condition in the relational algebra. Some more elaborate predicates (e.g. string matching using regular expressions) are available.

    • DISTINCT: This optional keyword indicates that duplicates should be eliminated from the result. The default is that duplicates are not eliminated.


    Less basic SQL query

    SELECT [DISTINCT] select-list
    FROM from-list
    WHERE qualification
    GROUP BY grouping-list
    HAVING group-qualification

    • group-qualification: must have a single value per group!


    SQL: selfstudy

    Recall what SQL is and understand its relationship with relational algebra. (I will put some SQL slides online.)


    Limitations of SQL

    • Cannot express everything
    • Balance between expressiveness and efficiency
    • If more power is needed: use a programming language that allows interaction with the DBMS.
      • Almost every common programming language offers such support.


    Basic SQL query

    SELECT [DISTINCT] select-list
    FROM from-list
    WHERE qualification

    • from-list: A list of table names. A table name may be followed by a “range variable” (an alias);

    • select-list: A list of attributes of the tables in from-list, or expressions built on these;

    • qualification: Much like a condition in the relational algebra. Some more elaborate predicates (e.g. string matching using regular expressions) are available in SQL; and finally

    • DISTINCT: This optional keyword indicates that duplicates should be eliminated from the result. The default is that duplicates are not eliminated.


    Conceptual evaluation strategy

    1. Compute the cross product of tables in from-list;
    2. Discard tuples that fail the qualification;
    3. Delete attributes not in select-list; and finally
    4. If DISTINCT then eliminate duplicates.

    Warning

    This is probably a very bad way of executing the query, and a good query optimizer will use all sorts of tricks to find efficient strategies to compute the query answer.
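    The four conceptual steps can be written down almost literally. The sketch below assumes rows are dictionaries whose keys are already prefixed with the range variable (e.g. "H.HId"), so that the merged product rows have no name clashes; the function name `evaluate` and all data values are made up.

```python
from itertools import product


def evaluate(from_list, qualification, select_list, distinct=False):
    # 1. Cross product of the tables in from_list (merge the dicts of each combination).
    rows = [
        dict(item for row in combo for item in row.items())
        for combo in product(*from_list)
    ]
    # 2. Discard tuples that fail the qualification.
    rows = [r for r in rows if qualification(r)]
    # 3. Delete attributes not in select_list.
    result = [tuple(r[a] for a in select_list) for r in rows]
    # 4. If DISTINCT, eliminate duplicates (keeping first occurrences).
    if distinct:
        result = list(dict.fromkeys(result))
    return result


# Hypothetical tables for: SELECT H.HName, C.MId FROM Hikers H, Climbs C
#                          WHERE H.HId = C.HId AND C.Time >= 5
hikers = [{"H.HId": 123, "H.HName": "Edmund"}, {"H.HId": 214, "H.HName": "Arnold"}]
climbs = [
    {"C.HId": 123, "C.MId": 1, "C.Time": 5},
    {"C.HId": 123, "C.MId": 3, "C.Time": 2.5},
    {"C.HId": 214, "C.MId": 2, "C.Time": 6},
]

answer = evaluate(
    [hikers, climbs],
    lambda r: r["H.HId"] == r["C.HId"] and r["C.Time"] >= 5,
    ["H.HName", "C.MId"],
)
print(answer)
```

    Step 1 builds all len(hikers) * len(climbs) combinations before anything is filtered, which is exactly the cost the warning above refers to.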


    Select-Project queries

    SELECT *
    FROM Munros
    WHERE Lat > ��;

    (* means all attributes)

    gives

      MId  MName          Lat     Long   Height  Rating
      �    The Saddle     ��.���  �.���  ����    �
      �    Ladhar Bheinn  ��.���  �.���  ����    �

    SELECT Height, Rating
    FROM Munros;

    gives

      Height  Rating
      ����    �
      ����    �
      ����    �.�
      ����    �.�


    Product

    SELECT *
    FROM Hikers, Climbs

    gives

      HId  HName    Skill  Age  HId  MId  Date      Time
      ���  Edmund   EXP    ��   ���  �    ��/��/��  �
      ���  Arnold   BEG    ��   ���  �    ��/��/��  �
      ���  Bridget  EXP    ��   ���  �    ��/��/��  �
      ���  James    MED    ��   ���  �    ��/��/��  �
      ���  Edmund   EXP    ��   ���  �    ��/��/��  �.�
      ���  Arnold   BEG    ��   ���  �    ��/��/��  �.�
      ...  ...      ...    ...  ...  ...  ...       ...

    • Note that column names get duplicated. (One tries not to let this happen.)


    Product with selection (join)

    SELECT H.HName, C.MId
    FROM Hikers H, Climbs C
    WHERE H.HId = C.HId
      AND C.Time >= �

    gives

      HName    MId
      Edmund   �
      Arnold   �
      Bridget  �

    • Note the use of aliases (range variables) H and C.
      • When we want to join a table to itself, they are essential.

    • Good practice is to always use them.


    Product (using renaming)

    SELECT H.HId, H.HName, H.Skill, H.Age,
           C.HId AS HId�, C.MId, C.Date, C.Time
    FROM Hikers H, Climbs C

    gives

      HId  HName    Skill  Age  HId�  MId  Date      Time
      ���  Edmund   EXP    ��   ���   �    ��/��/��  �
      ���  Arnold   BEG    ��   ���   �    ��/��/��  �
      ���  Bridget  EXP    ��   ���   �    ��/��/��  �
      ���  James    MED    ��   ���   �    ��/��/��  �
      ���  Edmund   EXP    ��   ���   �    ��/��/��  �.�
      ���  Arnold   BEG    ��   ���   �    ��/��/��  �.�
      ...  ...      ...    ...  ...   ...  ...       ...

    • Columns can be relabelled using AS.


    Duplicate elimination

    SELECT Rating
    FROM Routes;

    gives

      Rating
      �
      �
      �.�
      �.�

    SELECT DISTINCT Rating
    FROM Routes;

    gives

      Rating
      �
      �.�
      �.�


    String matching

    • LIKE is a predicate that can be used in the WHERE clause. _ is a wildcard that denotes any single character; % stands for 0 or more characters.

    SELECT *
    FROM Munros
    WHERE MName LIKE 'S%on'

    gives

      MId  MName         Lat     Long   Height  Rating
      �    Schiehallion  ��.���  �.���  ����    �.�
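    The two LIKE wildcards can be tried out with Python's built-in sqlite3 module. The table contents are made up for the sketch, and SQLite is just one convenient engine; note that its LIKE is case-insensitive for ASCII letters by default, which other systems may not share.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Munros (MId INTEGER, MName TEXT)")
# Hypothetical MIds; only the names come from the slides.
con.executemany(
    "INSERT INTO Munros VALUES (?, ?)",
    [(1, "The Saddle"), (2, "Ladhar Bheinn"), (3, "Schiehallion"), (4, "Ben Nevis")],
)

# % matches zero or more characters.
percent = [r[0] for r in con.execute(
    "SELECT MName FROM Munros WHERE MName LIKE 'S%on'")]

# _ matches exactly one character.
underscore = [r[0] for r in con.execute(
    "SELECT MName FROM Munros WHERE MName LIKE 'Ben Nevi_'")]

print(percent, underscore)
```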


    Arithmetic

    • Arithmetic can be used in the SELECT part of the query as well as in the WHERE part.

    SELECT MName, Height * �.�� AS HeightInFeet
    FROM Munros
    WHERE Lat + Long > ��;

    gives

      MName          HeightInFeet
      The Saddle     ����
      Ladhar Bheinn  ����


    Ordering output

    • ORDER BY can be used to sort the output in ascending order based on some attributes;
    • ORDER BY ... DESC does this in descending order.
    • In case of multiple attributes, sorting is done according to the order in which the attributes are listed.

    SELECT *
    FROM Munros
    ORDER BY Long;

    gives

      MId  MName          Lat     Long   Height  Rating
      �    Schiehallion   ��.���  �.���  ����    �.�
      �    Ben Nevis      ��.���  �.���  ����    �.�
      �    The Saddle     ��.���  �.���  ����    �
      �    Ladhar Bheinn  ��.���  �.���  ����    �


    Ordering output – continued

    SELECT *
    FROM Munros
    ORDER BY Long DESC;

    gives

      MId  MName          Lat     Long   Height  Rating
      �    Ladhar Bheinn  ��.���  �.���  ����    �
      �    The Saddle     ��.���  �.���  ����    �
      �    Ben Nevis      ��.���  �.���  ����    �.�
      �    Schiehallion   ��.���  �.���  ����    �.�


    Ordering output – continued

    SELECT *
    FROM Munros
    ORDER BY Rating, MName;

    gives

      MId  MName          Lat     Long   Height  Rating
      �    Ben Nevis      ��.���  �.���  ����    �.�
      �    Schiehallion   ��.���  �.���  ����    �.�
      �    Ladhar Bheinn  ��.���  �.���  ����    �
      �    The Saddle     ��.���  �.���  ����    �


    Ordering output – continued

    SELECT *
    FROM Munros
    ORDER BY Rating, MName DESC;

    gives

      MId  MName          Lat     Long   Height  Rating
      �    Ben Nevis      ��.���  �.���  ����    �.�
      �    Schiehallion   ��.���  �.���  ����    �.�
      �    The Saddle     ��.���  �.���  ����    �
      �    Ladhar Bheinn  ��.���  �.���  ����    �


    Set operations – union

    SELECT HId
    FROM Hikers
    WHERE Skill = 'EXP'
    UNION
    SELECT HId
    FROM Climbs
    WHERE MId = �;

    gives

      HId
      ���
      ���

    • The default is to eliminate duplicates from the union.
    • To preserve duplicates, use UNION ALL.


    Union compatibility

    SELECT HId
    FROM Hikers
    UNION
    SELECT MId
    FROM Climbs;

    gives

      HId
      ���������������

    SELECT HName
    FROM Hikers
    UNION
    SELECT MId
    FROM Munros;

    gives Error!!!

    • It means that the types as determined by the order of the columns must agree.

    • The column names are taken from the first operand.


    Intersection and difference

    • The operator names are INTERSECT for ∩, and MINUS (sometimes EXCEPT) for −.

    • These are set operations (they eliminate duplicates).

    • MINUS ALL and INTERSECT ALL retain duplicates.


    Nested queries

    • The predicate x IN S tests for set membership. Consider:

    SELECT HId
    FROM Climbs
    WHERE HId IN (SELECT HId
                  FROM Hikers
                  WHERE Age < ��);

    and

    SELECT HId
    FROM Climbs
    INTERSECT
    SELECT HId
    FROM Hikers
    WHERE Age < ��

    • A “difference” can be written as:

    SELECT HId
    FROM Climbs
    WHERE HId NOT IN (SELECT HId
                      FROM Hikers
                      WHERE Age < ��);


    Correlated Nested Queries

    • “Correlated” means that the inner query uses a variable bound in an outer scope.

    SELECT HId FROM Hikers h
    WHERE EXISTS (SELECT * FROM Climbs c
                  WHERE h.HId = c.HId AND c.MId = �);

    SELECT HId FROM Hikers h
    WHERE NOT EXISTS (SELECT * FROM Climbs c
                      WHERE h.HId = c.HId);

    SELECT HId FROM Hikers h
    WHERE EXISTS UNIQUE (SELECT * FROM Climbs c
                         WHERE h.HId = c.HId);

    • EXISTS = non-empty, NOT EXISTS = empty, EXISTS UNIQUE = singleton set.


    Comparisons with sets

    • x op ANY S means x op s for some s ∈ S (S is a set)
    • x op ALL S means x op s for all s ∈ S

    SELECT HName, Age
    FROM Hikers
    WHERE Age >= ALL (SELECT Age
                      FROM Hikers)

    SELECT HName, Age
    FROM Hikers
    WHERE Age > ANY (SELECT Age
                     FROM Hikers
                     WHERE HName = 'Arnold')

    What do these mean?
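    One way to explore the question is to mirror ANY and ALL with Python's `any()` and `all()`. The ages below are made up; the two comprehensions follow the two queries above under that assumption.

```python
# Hypothetical ages; only the names come from the slides.
ages = {"Edmund": 80, "Arnold": 25, "Bridget": 33, "James": 27}

# Age >= ALL (SELECT Age FROM Hikers): at least as old as every hiker.
at_least_all = [name for name, age in ages.items()
                if all(age >= other for other in ages.values())]

# Age > ANY (SELECT Age FROM Hikers WHERE HName = 'Arnold'):
# older than at least one hiker called Arnold.
arnold_ages = [age for name, age in ages.items() if name == "Arnold"]
greater_than_any = [name for name, age in ages.items()
                    if any(age > other for other in arnold_ages)]

print(at_least_all, greater_than_any)
```

    As in SQL, the ALL test is trivially true and the ANY test trivially false when the inner set is empty.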


    SQL is compositional

    • You can use a SELECT ... expression wherever you can use a table name.

    • Consider the query: “Find the names of hikers who have not climbed any peak.”

    SELECT HName
    FROM (SELECT HId
          FROM Hikers
          MINUS
          SELECT HId
          FROM Climbs) Temp,
         Hikers
    WHERE Temp.HId = Hikers.HId;


    Views

    • To make complex queries understandable, we should decompose them into understandable pieces.

    • E.g. we want to say something like:

    NC := SELECT HId
          FROM Hikers
          MINUS
          SELECT HId
          FROM Climbs;

    and then

    SELECT HName
    FROM NC, Hikers
    WHERE NC.HId = Hikers.HId;

    Instead we write

    CREATE VIEW NC
    AS SELECT HId
       FROM Hikers
       MINUS
       SELECT HId
       FROM Climbs;

    and then

    SELECT HName
    FROM NC, Hikers
    WHERE NC.HId = Hikers.HId;


    Evaluating queries on views

    CREATE VIEW MyPeaks
    AS SELECT MName, Height
       FROM Munros

    and

    SELECT *
    FROM MyPeaks
    WHERE MName = 'Ben Nevis'

    get rewritten to:

    SELECT MName, Height
    FROM Munros
    WHERE MName = 'Ben Nevis'

    • Is this always a good idea?
    • Sometimes it is better to materialise a view.


    Universal quantification

    • “The names of hikers who have climbed all Munros”

    CREATE VIEW NotClimbed    -- HId has not climbed MId
    AS SELECT HId, MId FROM Hikers, Munros
       MINUS
       SELECT HId, MId FROM Climbs

    CREATE VIEW ClimbedAll    -- HIds of climbers who have climbed all peaks
    AS SELECT HId FROM Hikers
       MINUS
       SELECT HId FROM NotClimbed

    SELECT HName
    FROM Hikers, ClimbedAll
    WHERE Hikers.HId = ClimbedAll.HId


    Universal quantification – an alternative

    • The HIds of hikers who have climbed all peaks.

    SELECT HId
    FROM Hikers h
    WHERE NOT EXISTS
      (SELECT MId    -- Peaks not climbed by h.
       FROM Munros m
       WHERE NOT EXISTS
         (SELECT *
          FROM Climbs c
          WHERE h.HId = c.HId
            AND c.MId = m.MId))

    It's not clear whether this version is any more comprehensible!
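    The doubly nested NOT EXISTS query can be run as-is on a small in-memory database. The sketch uses Python's built-in sqlite3 module and made-up table contents; it assumes only the three columns the query needs.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical, minimal versions of the three tables.
con.executescript("""
CREATE TABLE Hikers (HId INTEGER, HName TEXT);
CREATE TABLE Munros (MId INTEGER, MName TEXT);
CREATE TABLE Climbs (HId INTEGER, MId INTEGER);
INSERT INTO Hikers VALUES (123, 'Edmund'), (214, 'Arnold'), (313, 'Bridget');
INSERT INTO Munros VALUES (1, 'The Saddle'), (2, 'Ladhar Bheinn');
INSERT INTO Climbs VALUES (123, 1), (123, 2), (214, 1), (313, 2);
""")

# "There is no peak m such that h has no climb of m."
query = """
SELECT h.HId
FROM Hikers h
WHERE NOT EXISTS
  (SELECT m.MId
   FROM Munros m
   WHERE NOT EXISTS
     (SELECT *
      FROM Climbs c
      WHERE h.HId = c.HId AND c.MId = m.MId))
"""
climbed_all = [row[0] for row in con.execute(query)]
print(climbed_all)
```

    Only the hiker with a climb for every MId survives; the other two each miss one peak in this sample.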


    SQL so far

    So far what we have seen extends relational algebra in several ways:
    • Use of multisets/bags as well as sets (SELECT DISTINCT, UNION ALL, etc.).
    • Arithmetic and more predicates in WHERE, and arithmetic in SELECT output.
    • Sorting output using ORDER BY.

    These are minor extensions.

    • A more interesting extension is the use of aggregate functions.


    Counting

    SELECT COUNT(MId)
    FROM Munros;

    and

    SELECT COUNT(Rating)
    FROM Munros;

    both give the same answer (to within attribute labels):

      COUNT(Rating)
      �

    • Why?
    • To fix the answer to the second, use SELECT COUNT(DISTINCT Rating)
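    The three counts can be compared side by side with Python's built-in sqlite3 module. The ratings below are made up, chosen so that two peaks share a rating.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Munros (MId INTEGER, Rating REAL)")
# Hypothetical ratings; two rows deliberately share the same value.
con.executemany("INSERT INTO Munros VALUES (?, ?)",
                [(1, 4), (2, 4), (3, 2.5), (4, 1.5)])

count_ids, count_ratings, count_distinct = con.execute(
    "SELECT COUNT(MId), COUNT(Rating), COUNT(DISTINCT Rating) FROM Munros"
).fetchone()

print(count_ids, count_ratings, count_distinct)
```

    COUNT(Rating) counts rows with a non-NULL rating, duplicates included, so it matches COUNT(MId) here; only the DISTINCT form counts rating levels.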


    Aggregate operators

    • COUNT([DISTINCT] A): The number of (unique) values in column A.
    • SUM([DISTINCT] A): The sum of (unique) values in column A.
    • AVG([DISTINCT] A): The average of (unique) values in column A.
    • MAX(A): The maximum value in column A.
    • MIN(A): The minimum value in column A.

    Note: These cannot be nested!


    Aggregate operators

    • If a SELECT clause contains an aggregate operator, it must only use aggregate operators, unless a GROUP BY clause is present.

    • SELECT MName, AVG(Rating) FROM Munros; is incorrect,

    • SELECT COUNT(DISTINCT MName), COUNT(Rating) FROM Munros; is allowed.

    • We next discuss GROUP BY.


    GROUP BY

    • So far, we've only applied aggregate operators to all tuples satisfying some condition.

    • Sometimes, we want to apply the aggregate operators to each of several groups of tuples:
      • “Find the number of Munros for each rating level”:
      • In general, we don't know how many rating levels there are and which rating levels exist in the Munros table.
      • Suppose that we know that the rating levels are �.�, �.� or �; then we can write the following three queries, for i = �.�, �.�, �:

    SELECT COUNT(M.Rating)
    FROM Munros M
    WHERE M.Rating = i;


    GROUP BY

    • Better:

    SELECT Rating, COUNT(*)
    FROM Munros
    GROUP BY Rating;

    gives

      Rating  COUNT(*)
      �.�     �
      �.�     �
      �       �

    • Note: the non-aggregate attribute appears in GROUP BY, as required.


    Query with GROUP BY

    SELECT [DISTINCT] select-list
    FROM from-list
    WHERE qualification
    GROUP BY grouping-list

    • select-list: A list of attributes of the tables in from-list, or terms with aggregate operators.
    • grouping-list: A list of attributes from tables in from-list.
    • Note: Every (non-aggregate) attribute name in select-list must appear in grouping-list!


    Query with GROUP BY: Conceptual evaluation

    1. The cross product of from-list is computed, tuples that fail qualification are discarded, “unnecessary” fields are deleted, and the remaining tuples are partitioned into groups by the value of attributes in grouping-list.

    2. One answer tuple is generated per qualifying group.


    GROUP BY – selecting on the “grouped” attributes

    SELECT Rating, AVG(Height)
    FROM Munros
    GROUP BY Rating
    HAVING Rating > � AND COUNT(*) > �;

    gives

      Rating  AVG(Height)
      �       ����


    Query with GROUP BY and HAVING

    SELECT [DISTINCT] select-list
    FROM from-list
    WHERE qualification
    GROUP BY grouping-list
    HAVING group-qualification

    • group-qualification: must have a single value per group!


    Query with GROUP BY and HAVING: evaluation

    1. The cross product of from-list is computed, tuples that fail qualification are discarded, “unnecessary” fields are deleted, and the remaining tuples are partitioned into groups by the value of attributes in grouping-list.

    2. The group-qualification is then applied to eliminate some groups. Expressions in group-qualification must have a single value per group!
       • In effect, an attribute in group-qualification that is not an argument of an aggregate operator must also appear in grouping-list.

    3. One answer tuple is generated per qualifying group.
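    The partition, filter, and aggregate steps can be traced by hand in Python. The sketch assumes made-up heights and ratings and the group condition "Rating > 2 AND COUNT(*) > 1"; it mirrors the conceptual evaluation, not how an engine would actually execute the query.

```python
from collections import defaultdict

# Hypothetical (MName, Height, Rating) rows; the numbers are made up.
munros = [
    ("The Saddle", 1010, 4),
    ("Ladhar Bheinn", 1020, 4),
    ("Schiehallion", 1083, 2.5),
    ("Ben Nevis", 1343, 1.5),
]

# Step 1: partition the tuples into groups by the grouping attribute (Rating).
groups = defaultdict(list)
for _name, height, rating in munros:
    groups[rating].append(height)

# Step 2: apply the group qualification (HAVING Rating > 2 AND COUNT(*) > 1).
qualifying = {rating: heights for rating, heights in groups.items()
              if rating > 2 and len(heights) > 1}

# Step 3: one answer tuple per qualifying group: (Rating, AVG(Height)).
result = {rating: sum(heights) / len(heights)
          for rating, heights in qualifying.items()}

print(result)
```

    Both parts of the group condition have a single value per group: Rating is the grouping attribute and COUNT(*) is an aggregate.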


    GROUP BY - examples

    SELECT H.HName, AVG(M.Height)
    FROM Hikers H, Munros M, Climbs C
    WHERE H.HId = C.HId AND M.MId = C.MId
    GROUP BY H.HName
    HAVING MAX(C.Time)


    GROUP BY - example

    SELECT M.MName
    FROM Munros M, Climbs C
    WHERE M.MId = C.MId AND C.Time > (SELECT MAX(C�.Time)
                                      FROM Climbs C�
                                      WHERE C�.Date = '��/��/����');

    • “Find those Munros whose climbs lasted longer than any of the climbs done yesterday”

    • The result of an aggregate can be used in the WHERE clause as well:

    SELECT M.MName
    FROM Munros M, Climbs C
    WHERE M.MId = C.MId AND C.Time > �.� * (SELECT MAX(C�.Time)
                                            FROM Climbs C�
                                            WHERE C�.Date = '��/��/����');

  • Storage

    Magnetic Disks
      Access Time
      Sequential vs. Random Access

    I/O Parallelism
      RAID Levels �, �, and �

    Alternative Storage Techniques
      Solid-State Disks
      Network-Based Storage

    Managing Space
      Free Space Management

    Buffer Manager
      Pinning and Unpinning
      Replacement Policies

    Databases vs. Operating Systems

    Files and Records
      Heap Files
      Free Space Management
      Inside a Page
      Alternative Page Layouts

    Recap

    Lecture �: Storage
    Disks, Buffer Manager, Files, …

    Database Systems


    Database Architecture

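    [Figure: DBMS architecture. Web forms, applications, and the SQL interface send SQL commands to the DBMS, whose components are the Parser, Optimizer, Operator Evaluator, Executor, Files and Access Methods, Buffer Manager, Disk Space Manager, Transaction Manager, Lock Manager, and Recovery Manager; the DBMS sits on top of the database (data files, indices, …).]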


    The Memory Hierarchy

    level                  capacity          latency
    CPU (with registers)   bytes             < � ns
    caches                 kilo-/megabytes   < �� ns
    main memory            gigabytes         ��–��� ns
    hard disks             terabytes         �–�� ms
    tape library           petabytes         varies

    • Fast (but expensive and small) memory close to the CPU
    • Larger, slower memory at the periphery
    • DBMSs try to hide latency by using the fast memory as a cache.

    Magnetic Disks

    [Figure: disk drive with labeled heads, arm, platter, rotation, track, sector, and block]

    • A stepper motor positions an array of disk heads on the requested track
    • Platters (disks) steadily rotate
    • Disks are managed in blocks: the system reads/writes data one block at a time

    Photo: http://www.metallurgy.utah.edu/

    Access Time

    Data blocks can only be read and written if disk heads and platters are positioned accordingly.

    • This design has implications for the access time to read/write a given block:

    Definition (Access Time)

    1. Move disk arms to desired track (seek time $t_s$)
    2. Disk controller waits for desired block to rotate under disk head (rotational delay $t_r$)
    3. Read/write data (transfer time $t_{tr}$)

    ⇒ access time: $t = t_s + t_r + t_{tr}$
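
    The definition above can be turned into a small calculator. This is a minimal sketch: the function name `access_time_ms` and the drive parameters in the example call (3.4 ms average seek, 15 000 rpm, 163 MB/s) are assumptions chosen to resemble a server-class drive, and 1 MB is taken as 1000 kB.

```python
def access_time_ms(seek_ms: float, rpm: float, block_kb: float,
                   transfer_mb_per_s: float) -> float:
    """Return t = t_s + t_r + t_tr in milliseconds for a single block."""
    # t_r: on average the platter has to turn half a revolution
    # before the requested block passes under the head.
    rotational_delay_ms = 0.5 * 60_000.0 / rpm
    # t_tr: block size divided by the transfer rate (1 MB = 1000 kB here).
    transfer_ms = block_kb / (transfer_mb_per_s * 1000.0) * 1000.0
    return seek_ms + rotational_delay_ms + transfer_ms


# Assumed example drive, not taken from a data sheet.
t = access_time_ms(seek_ms=3.4, rpm=15_000, block_kb=8, transfer_mb_per_s=163)
print(f"access time for an 8 kB block: {t:.2f} ms")
```

    With these assumed parameters the seek and rotational terms dominate; the transfer term contributes only a few hundredths of a millisecond.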

    Example: Seagate Cheetah ��K.� (��� GB, server-class drive)

    • Seagate Cheetah ��K.� performance characteristics:
      • � disks, � heads, avg. ��� kB/track, ��� GB capacity
      • rotational speed: �� ��� rpm (revolutions per minute)
      • average seek time: �.� ms
      • transfer rate ≈ ��� MB/s

    What is the access time to read an � kB data block?

    average seek time                                    $t_s$ = �.�� ms
    average rotational delay: 1/2 · 1/(�� ��� min⁻¹)     $t_r$ = �.�� ms
    transfer time for � kB: � kB / (��� MB/s)            $t_{tr}$ = �.�� ms
    access time for an � kB data block                   $t$ = �.�� ms

    Sequential vs. Random Access

    Example (Read � ��� blocks of size � kB)

    • random access:
      $t_{rnd}$ = � ��� · �.�� ms = �.�� s

    • sequential read of adjacent blocks:
      $t_{seq}$ = $t_s$ + $t_r$ + � ��� · $t_{tr}$ + �� · $t_{s,\text{track-to-track}}$
                = �.�� ms + �.�� ms + �� ms + �.� ms ≈ ��.� ms

      The Seagate Cheetah ��K.� stores an average of ��� kB per track, with a �.� ms track-to-track seek time; our � kB blocks are spread across �� tracks.

    ⇒ Sequential I/O is much faster than random I/O
    ⇒ Avoid random I/O whenever possible
    ⇒ As soon as we need at least ��.� ms / �,��� ms = �.��% of a file, we are better off reading the entire file sequentially
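
    The comparison can be reproduced with a few lines of code. The sketch below follows the two formulas on this slide; all numeric values (1000 blocks of 8 kB, 5.45 ms access time, 512 kB per track, 0.2 ms track-to-track seek) are assumptions for illustration, and the function names are hypothetical.

```python
import math


def random_read_ms(n_blocks: int, access_ms: float) -> float:
    """Every block pays the full access time t = t_s + t_r + t_tr."""
    return n_blocks * access_ms


def sequential_read_ms(n_blocks: int, block_kb: float, seek_ms: float,
                       rot_ms: float, transfer_ms: float,
                       track_kb: float, track_to_track_ms: float) -> float:
    """One initial seek and rotational delay, then transfers plus track switches."""
    # Number of tracks the adjacent blocks are spread across.
    tracks = math.ceil(n_blocks * block_kb / track_kb)
    return seek_ms + rot_ms + n_blocks * transfer_ms + tracks * track_to_track_ms


# Assumed example values, not taken from a data sheet.
t_rnd = random_read_ms(1000, access_ms=5.45)
t_seq = sequential_read_ms(1000, block_kb=8, seek_ms=3.4, rot_ms=2.0,
                           transfer_ms=0.05, track_kb=512,
                           track_to_track_ms=0.2)

# Reading a fraction f of the file randomly costs f * t_rnd, so a full
# sequential scan is cheaper once f exceeds t_seq / t_rnd.
break_even = t_seq / t_rnd
print(f"random: {t_rnd / 1000:.2f} s, sequential: {t_seq:.1f} ms")
print(f"break-even fraction of the file: {break_even:.2%}")
```

    The break-even fraction is what a query optimizer weighs, in spirit, when it chooses between an index lookup (random I/O) and a full scan (sequential I/O).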

    Performance Tricks

    • Disk manufacturers play a number of tricks to improve performance:

    track skewing
      Align sector � of each track to avoid rotational delay during longer sequential scans

    request scheduling
      If multiple requests have to be served, choose the one that requires the smallest arm movement (SPTF: shortest positioning time first, elevator algorithms)

    zoning
      Outer tracks are longer than the inner ones. Therefore, divide outer tracks into more sectors than inner tracks
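
    Request scheduling can be illustrated with a greedy sketch that always serves the pending request closest to the current arm position. This is a simplification: it measures arm movement in tracks only, whereas a real SPTF scheduler also accounts for rotational position. The function names and the request list are hypothetical.

```python
def shortest_seek_first(start_track: int, requests: list[int]) -> list[int]:
    """Greedily serve the pending request with the smallest arm movement."""
    pending = list(requests)
    order = []
    position = start_track
    while pending:
        # Pick the request whose track is closest to the current position.
        nearest = min(pending, key=lambda track: abs(track - position))
        pending.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


def total_arm_movement(start_track: int, order: list[int]) -> int:
    """Sum of track distances travelled when serving requests in this order."""
    distance = 0
    position = start_track
    for track in order:
        distance += abs(track - position)
        position = track
    return distance


requests = [95, 10, 52, 60, 48]
scheduled = shortest_seek_first(50, requests)
print("arrival order:  ", requests, total_arm_movement(50, requests))
print("scheduled order:", scheduled, total_arm_movement(50, scheduled))
```

    A purely greedy policy can starve requests far from the arm; elevator algorithms avoid this by sweeping the arm in one direction at a time.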

    Evolution of Hard Disk Technology

    Disk seek and rotational latencies have only marginally improved over the last years (≈ ��� per year)

    But:

    • Throughput (i.e., transfer rates) improves by ≈ ��� per year
    • Hard disk capacity grows by ≈ ��� every year

    Therefore:

    • Random access cost hurts even more as time progresses

    Example (� Years Ago: Seagate Barracuda ����.�)

    Read �K blocks of � kB sequentially/randomly: ��� ms / �� ��� ms

    Ways to Improve I/O Performance

    The latency penalty is hard to avoid

    But:

    • Throughput can be increased rather easily by exploiting parallelism
    • Idea: Use multiple disks and access them in parallel, try to hide latency

    A recent �� system (IBM DB2 �.� on AIX) uses

    • ��,��� disk drives (��.� GB each, ��,��� rpm) (!) plus � ���.� GB internal SCSI drives,
    • connected with �� � Gbit fibre channel adapters,
    • yielding � million transactions per minute

    Disk Mirroring

    • Replicate data onto multiple disks:

    [Figure: the same sequence of blocks is copied to each disk]

    • Achieves I/O parallelism only for reads
    • Improved failure tolerance: can survive one disk failure
    • This is also known as RAID 1 (mirroring without parity)
      (RAID: Redundant Array of Inexpensive Disks)

    Disk Striping

    • Distribute data over disks:

    [Figure: consecutive blocks are spread round-robin across the disks]

    • Full I/O parallelism for read and write operations
    • High failure risk (here: � times risk of single disk failure)!
    • Also known as RAID 0 (striping without parity)
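
    A minimal sketch of striping, assuming 0-based block numbers, round-robin placement, and independent disk failures (all three are assumptions for illustration; the function names are hypothetical). It shows where a logical block lands and why the failure risk grows with the number of disks.

```python
def stripe_location(block_no: int, n_disks: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk)."""
    return block_no % n_disks, block_no // n_disks


def array_failure_probability(p_disk: float, n_disks: int) -> float:
    """Without redundancy, the array is lost as soon as any one disk fails."""
    return 1.0 - (1.0 - p_disk) ** n_disks


for block_no in range(6):
    disk, offset = stripe_location(block_no, n_disks=3)
    print(f"block {block_no} -> disk {disk}, offset {offset}")

# For small per-disk failure probabilities this is close to n_disks * p_disk.
print(f"array failure probability: {array_failure_probability(0.01, 3):.4f}")
```

    Because neighbouring blocks sit on different disks, a scan over consecutive blocks keeps all disks busy at once.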

    Disk Striping with Parity

    • Distribute data and parity information over ≥ � disks:

    [Figure: data blocks and parity blocks are spread across the disks, with the parity block of each stripe placed on a different disk]

    • High I/O parallelism
    • Fault tolerance: any one disk may fail without data loss
      (with dual parity/RAID 6: two disks may fail)
    • Distribute parity (e.g., XOR) information over disks, separating data and associated parity
    • Also known as RAID 5 (striping with distributed parity)
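
    The XOR parity mentioned above can be sketched as follows. The helper names `xor_parity` and `reconstruct` and the sample blocks are hypothetical; the sketch covers a single stripe with equally sized blocks and ignores how parity is rotated across disks.

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing block: XOR of the survivors and the parity."""
    return xor_parity(list(surviving_blocks) + [parity])


stripe = [b"abcd", b"efgh", b"ijkl"]   # data blocks of one stripe
parity = xor_parity(stripe)            # stored on a different disk

# Suppose the disk holding the second block fails.
recovered = reconstruct([stripe[0], stripe[2]], parity)
print(recovered == stripe[1])
```

    Recovery works because XOR is its own inverse: combining the parity with all surviving blocks cancels them out and leaves the missing block.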

    Solid-State Disks

    Solid state disks (SSDs) have emerged as an alternative to conventional hard disks

    • SSDs provide very low-latency random read access (< �.�� ms)
    • Random writes, however, are significantly slower than on traditional magnetic drives:
      • (Blocks of) pages have to be erased before they can be updated
      • Once pages have been erased, sequentially writing them is almost as fast as reading

    [Figure: read and write times for flash vs. magnetic disk. Samsung �� GB flash disk; ���� bytes read/written randomly. Source: Koltsidas and Viglas. Flashing up the Storage Layer. VLDB 2008.]

    SSDs: Page-Level Writes, Block-Level Deletes

    • Typical page size: ��� kB
    • SSDs erase blocks of pages: block ≈ �� pages (� MB)

    Example (Perform block-level delete to accommodate new data pages)

    [Figure: illustration taken from arstechnica.com (The SSD Revolution)]
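
    The page-level write / block-level erase asymmetry can be modelled with a toy class. This is a deliberately simplified sketch: the class `FlashBlock` is hypothetical, and real SSD controllers usually write updates to a different, already erased page and reclaim stale pages later, rather than rewriting the block in place as done here.

```python
class FlashBlock:
    """Toy model of one erase block: pages are written individually,
    but can only be cleared by erasing the whole block."""

    def __init__(self, pages_per_block: int) -> None:
        self.pages = [None] * pages_per_block  # None marks an erased page
        self.erase_count = 0

    def erase(self) -> None:
        self.pages = [None] * len(self.pages)
        self.erase_count += 1

    def write(self, page_no: int, data: bytes) -> None:
        if self.pages[page_no] is None:
            # Erased page: can be programmed directly.
            self.pages[page_no] = data
            return
        # Page already holds data: save the block contents, erase the
        # whole block, then write everything back with the update applied.
        saved = list(self.pages)
        saved[page_no] = data
        self.erase()
        for i, content in enumerate(saved):
            if content is not None:
                self.pages[i] = content


block = FlashBlock(pages_per_block=4)
block.write(0, b"a")
block.write(1, b"b")
print("erases after two fresh writes:", block.erase_count)
block.write(0, b"c")   # updating a used page forces a block erase
print("erases after one update:", block.erase_count)
```

    The model makes the cost visible: a single-page update touches every valid page in the block, which is why random writes are expensive on flash.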

    Example: Seagate Pulsar.� (��� GB, server-class solid state drive)

    • Seagate Pulsar.� performance characteristics:
      • NAND flash memory, ��� GB capacity
      • standard �.�" enclosure, no moving/rotating parts
      • data read/written in pages of ��� kB size
      • transfer rate ≈ ��� MB/s

    What is the access time to read an � kB data block?

    no seek time                                   $t_s$ = �.�� ms
    no rotational delay                            $t_r$ = �.�� ms
    transfer time for � kB: ��� kB / (��� MB/s)    $t_{tr}$ = �.�� ms
    access time for an � kB data block             $t$ = �.�� ms

    Sequential vs. Random Access with SSDs

    Example (Read � ��� blocks of size � kB)

    • random access:
      $t_{rnd}$ = � ��� · �.�� ms = �.� s

    • sequential read of adjacent pages:
      $t_{seq}$ = ⌈� ��� · � kB / ��� kB⌉ · $t_{tr}$ ≈ ��.� ms

      The Seagate Pulsar.� (sequentially) reads data in ��� kB chunks.

    ⇒ Sequential I/O still beats random I/O
      (but random I/O is more feasible again)

    • Adapting database technology to these characteristics is a current research topic
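
    The SSD comparison can be sketched in the same way as for magnetic disks. All values below (128 kB flash pages, 400 MB/s transfer rate, 1000 blocks of 8 kB) are assumptions for illustration, and 1 MB is taken as 1000 kB. The sketch assumes, as on the slide, that reading even a small block costs the transfer of one full page.

```python
import math

# Assumed example values, not taken from a data sheet.
PAGE_KB = 128
TRANSFER_MB_PER_S = 400
N_BLOCKS = 1000
BLOCK_KB = 8

# Time to transfer one flash page; there is no seek or rotational delay.
page_ms = PAGE_KB / (TRANSFER_MB_PER_S * 1000.0) * 1000.0

# Random access: every 8 kB block costs one full page read.
t_rnd = N_BLOCKS * page_ms

# Sequential access: adjacent blocks share pages, so only
# ceil(total data / page size) page reads are needed.
pages_needed = math.ceil(N_BLOCKS * BLOCK_KB / PAGE_KB)
t_seq = pages_needed * page_ms

print(f"random: {t_rnd:.0f} ms, sequential: {t_seq:.2f} ms "
      f"({pages_needed} page reads)")
```

    The gap between the two comes only from the ratio of page size to block size, which is far smaller than the gap caused by seeks and rotation on a magnetic disk.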

    Network-Based Storage

    Today the network is not a bottleneck any more:

    Storage media/interface   Transfer rate
    Hard disk                 ���–��� MB/s
    Serial ATA                ��� MB/s
    Ultra-��� SCSI            ��� MB/s
    ��-Gbit Ethernet          �,��� MB/s
    Infiniband QDR            ��,��� MB/s

    For comp