Advanced Database Management...

144
Advanced Database Management Systems Database Management Systems Alvaro A A Fernandes School of Computer Science, University of Manchester AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 1 / 144

Transcript of Advanced Database Management...

Page 1: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Advanced Database Management SystemsDatabase Management Systems

Alvaro A A Fernandes

School of Computer Science, University of Manchester

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 1 / 144

Page 2: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Outline

Database Management Systems: Definition

Database Management Systems: Languages

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 2 / 144

Page 3: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

DBMSs Defined

Database Management SystemsWhat are they?

Definition

A database management system (DBMS) is

I a software package

I for supporting applications

I aimed at managing very large volumes of data

I efficiently and reliably

I transparently with respect to the underlying hardware and networkinfrastructures

I and projecting a coherent, abstract model

I of the underlying reality of concern to applications

I through high-level linguistic abstractions.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 3 / 144

Page 4: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

DBMSs Defined

Managing Very Large Volumes of DataWhat does it mean?

managing storing for querying (often, updating as well, and morerecently searching, exploring, discovering)

very large for a given underlying hardware and network infrastructure,volumes that require special strategies to enable scale-out instorage with no scale-down in processing are very large

data basically, facts describing an entity (e.g., the employee withid=’123’ has name=’Jane’, the employee with id=’456’ hasname=’John’), grouped in collections (e.g., Employee) andrelated to one another (e.g., Jane manages John)

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 4 / 144

Page 5: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

DBMSs Defined

Projecting a Coherent Abstract ModelWhat does it mean?

I A data model comprises a set of abstract, domain-independentconcepts with which data in the database can be described.

I Such concepts (e.g., primary/foreign keys) induce integrityconstraints (e.g., referential ones, i.e., a foreign key in one relationmust appear as primary key in another).

I A schema describes the domain-specific concepts that go into adatabase in terms of the domain-independent concepts available inthe given data model.

I A database instance (i.e., the database state at a point in the lifecycle of the database) is a snapshot of the world as captured in dataand must always be valid with respect to the schema.

I Each DBMS supports one data model (e.g., the relational model),but many supported data models subsume others (e.g., theobject-relational model).

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 5 / 144

Page 6: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

DBMSs Defined

High-LevelWhat does it mean?

I physical schema: storage-levelstructures

I conceptual/logical schema:domain-specific,application-independent conceptsorganized as a collection of entities(e.g., relations) and relationships (e.g.,expressed by means of primary/foreignkeys)

I external schema: application-specificconcepts/requirements as a collection ofviews/queries over the logical schema

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 6 / 144

Page 7: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

DBMSs Defined

Linguistic Abstractions (1)What does it mean?

A DBMS typically supports three sub-languages:

DDL A data definition language to formulate schema-levelconcepts.

DML A data manipulation language to formulate changes to beeffected in a database instance.

(D)QL A (data) query language to formulate retrieval requestsover a database instance.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 7 / 144

Page 8: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

DBMSs Defined

Linguistic Abstractions (2)What does it mean?

The best known DBMS language is SQL, different parts of which serve asDDL, DML and QL for (primarily) relational DBMSs.

I The goal of a DDL is todescribeapplication-independentnotions (i.e., at the physicaland logical levels).

I The goal of a DML and a(D)QL is describeapplication-specific notions(i.e., at the view level).

I CREATE TABLE

Employee

(id CHAR(3),name VARCHAR(30),branch VARCHAR(10))

I INSERT INTO

Employee

VALUES (’123’,’Jane’,’Manchester’)

INSERT INTO

Employee

VALUES (’456’,’John’,’Edinburgh’)

I SELECT id

FROM Employee

WHERE branch = ’Manchester’

I CREATE VIEW

ManchesterEmployees

AS

SELECT id

FROM Employee

WHERE branch = ’Manchester’

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 8 / 144

Page 9: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

DBMS Languages

Database Languages (1)Are they any different?

I Database languages are different, by design, from general-purposeprogramming languages.

I Each sub-language caters for distinct needs that in a general-purposeprogramming language are not factored out.

I DDL statements update the stored metadata schema-level conceptsand correspond to variable and function declarations (but haveside-effects).

I DML statements update the database, i.e., change (equivalently, causea transition from) one database instance into another, and correspondto assignments.

I DQL expressions do not have side-effects, and correspond to (pure)expressions.

I Database languages are designed to operate on collections, withoutexplicit iteration, whereas most general-purpose programminglanguage are founded on explicit iteration.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 9 / 144

Page 10: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

DBMS Languages

Database Languages (2)Are they any limited?

I QLs are limited, by design, compared with general-purposeprogramming languages.

I They are not Turing-complete.

I Even (ANSI) SQL was not Turing-complete until proceduralconstructs (i.e., so called persistent stored modules) were introducedto make it so.

I They are not meant for complex calculations nor for operating onanything other than large collections.

I These limitations make QLs declarative and give them simple formalsemantics (in calculus and in algebraic form).

I These properties in turn make it possible for a declarative query to berewritten by an optimizer into an efficient procedural execution plan.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 10 / 144

Page 11: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

DBMS Languages

SummaryDatabase Management Systems

I The added-value that DBMSs deliver to organizations stems fromcontrolled use of abstraction.

I The adoption of application-independent data models provides acommon formal framework upon which several levels of descriptionbecome available.

I Such descriptions are conveyed in formal languages that separateconcerns and facilitate the implementation of efficient evaluationengines.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 11 / 144

Page 12: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

DBMS Languages

Advanced Database Management SystemsArchitecture: Classical Case and Variations

Alvaro A A Fernandes

School of Computer Science, University of Manchester

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 12 / 144

Page 13: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Outline

The Classical Case

Strengths and Weaknesses of Classical DBMSs

Variations on the Classical Case

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 13 / 144

Page 14: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

The Classical Case

The Architecture of DBMS-Centred ApplicationsThree Tiers

I A classical DBMS supportson-line transaction processing(OLTP) applications.

I Applications send queries andtransactions that the DBMSconverts into OLTP tuple-basedoperations over the data store.

I What comes back is structureddata (e.g., records) as tables,i.e., collections of tuples.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 14 / 144

Page 15: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

The Classical Case

Application InterfacesThree Routes

I Occasional, non-knowledgeableusers are served by graphicaluser interfaces (GUI), which areoften form-based.

I Most application programmingtends to benefit from databaseconnectivity middleware (e.g.,ODBC, JDBC, JDO, etc.).

I If necessary, from ageneral-purpose programminglanguage, applications caninvoke DBMS services.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 15 / 144

Page 16: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

The Classical Case

Internal Architecture of Classical DBMSsFour+One Main Service Types

I Query Processing

I Transaction Processing

I Concurrency Control

I Recovery

I Storage Management

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 16 / 144

Page 17: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

The Classical Case

The Query Processing StackDeclarative-to-Procedural, Equivalence-Preserving Program Transformation

1. Parse the declarative query

2. Translate to obtain an algebraicexpression

3. Rewrite into a canonical,heuristically-efficient logicalquery execution plan (QEP)

4. Select the algorithms andaccess methods to obtain aquasi-optimal, costed concreteQEP

5. Execute (the typicallyinterpretable form of) theprocedural QEP

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 17 / 144

Page 18: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

The Classical Case

Data Store ContentsFour Main Types of Data

I Metadata includes schema-levelinformation and a description ofthe underlying computinginfrastructure.

I Statistics are mostlyinformation about thetrend/summary characteristicsof past database instances.

I Indices are built for efficientaccess to the data.

I Data is what the databaseinstance contains.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 18 / 144

Page 19: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

The Classical Case

SummaryClassical DBMSs

I Classical DBMSs have been incredibly successful in underpinning theday-to-day life of organizations.

I Their internal functional architecture had not, until very recently,changed much over the last three decades.

I More recently, this architecture has been perceived as being unable todeliver certain kinds of services to business.

I Under pressure from business interests and reacting to opportunitiescreated by new computing infrastructures, classical DBMSs have beentransforming themselves into different kinds of advanced DBMSs thatthis course will explore.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 19 / 144

Page 20: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Strengths and Weaknesses

Classical DBMSs: StrengthsWhat are classical database management systems good at?

I Classical database management systems (DBMSs) have been verysuccessful.

I In the last four decades, they have become an indispensableinfrastructural component of organizations.

I They play a key role in reliably and efficiently reflecting thetransaction-level unfolding of operations in the value-adding chain ofan organization.

I Each transaction (e.g., an airline reservation, a credit card payment,an item sold in a checkout) is processed soundly, reliably, andefficiently.

I Effects are propagated throughout the organization.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 20 / 144

Page 21: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Strengths and Weaknesses

Classical DBMSs: WeaknessesWhere do classical database management systems come short?

I Classical database management systems (DBMSs) assume that:

1. Data is structured in the form of records2. Only on-line transaction processing (OLTP) is needed3. Data and computational resources are centralized4. There is central control over central resources5. There is no need for dynamically responding in real-time to external

events6. There is no need for embedding in the physical world in which

organizations exist

I This is too constraining for most modern businesses.

I Classical DBMSs support fewer needs of organizations than they usedto.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 21 / 144

Page 22: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Strengths and Weaknesses

Classical DBMSs: TrendsHow are DBMSs evolving?

I Most cutting-edge research in databases is geared towards supporting:

1. Un- and semi-structured data too2. On-line analytical processing (OLAP) too3. Distributed data and computational resources4. Absence of central control over distributed resources5. Dynamic response in real-time to external events6. Embedding in the physical world in which the organization exists

I DBMSs that exhibit these capabilities are advanced in the sense usedhere.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 22 / 144

Page 23: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Variations

Beyond Structured Data

I Add support to un- andsemi-structured data indocument form

I Stored in content repositories

I Using keyword-based search andaccess methods for graph/treefragments

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 23 / 144

Page 24: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Variations

Beyond OLTP

I Preprocess, aggregate andmaterialize separately

I Add support for OLAP

I Using multidimensional,denormalized logical schemas

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 24 / 144

Page 25: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Variations

Parallelization (1)Shared-Disk Parallelism

I Place a fast interconnectbetween memory andcomparatively slow disks

I Parallelize disk usage to avoidsecondary-memory contention

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 25 / 144

Page 26: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Variations

Parallelization (2)Shared-Memory Parallelism

I Place a fast interconnectbetween processor and memory

I Parallelize memory usage toavoid primary-memorycontention

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 26 / 144

Page 27: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Variations

Parallelization (3)Shared-Nothing Parallelism

I Place a fast interconnectbetween full processing units

I Parallelize processing usingblack-boxes that are locallyresource-rich

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 27 / 144

Page 28: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Variations

Distribution (1)Multiple DBMSs

I Harness distributed resources

I Using a full-fledged local orwide-area network asinterconnect

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 28 / 144

Page 29: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Variations

Distribution (2)Global, Integrated DBMSs

I Renounce central control

I Harness heterogeneous,autonomous, distributedresources

I Project a global view bymediation overwrapper-homogenized localsources

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 29 / 144

Page 30: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Variations

Distribution (3)Peer-to-Peer DBMSs

I Renounce global view and queryexpressiveness

I Benefit from inherent scalabilityover large-scale,extremely-wide-area networks

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 30 / 144

Page 31: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Variations

Distribution (4)Data Stream Management Systems

I Enable dynamic response inreal-time to external events

I Placing queries that executeperiodically or reactively

I Over data that is pushed ontothe system in the form ofunbounded streams

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 31 / 144

Page 32: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Variations

Distribution (5)Sensor Network Data Management

I Embed data-driven processes inthe physical world

I Overlaying query processingover an ad-hoc wireless networkof intelligent sensor nodes

I Over pull-based data streams

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 32 / 144

Page 33: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Variations

Advanced DBMSs (1)Why do they matter?

OLAP/DM Companies need to make more, and more complex, decisionsmore often and more effectively to remain competitive.

Text-/XML-DBMSs The ubiquity and transparency of networks meansdata can take many forms, is everywhere, and can beprocessed anywhere.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 33 / 144

Page 34: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Variations

Advanced DBMSs (2)Why do they matter?

P/P2P/DDBMSs For both data and computation, provision of resourcesis now largely servicized and can be negotiated, or harvested.

Stream DMSs Widespread cross-enterprise integration means thatcompanies must be able to respond in real-time to eventsstreaming in from their commercial and financial environment

Sensor DMSs Most companies are aiming to sense and respond not just tothe commercial and financial environment but to the physicalenvironment too.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 34 / 144

Page 35: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Variations

SummaryAdvanced DBMSs

I Architecturally, advanced DBMSs characterize different responses toI modern functional and non-functional application requirementsI the availability of advanced computing and networking infrastructures

I Advanced DBMSs are motivated by real needs of modernorganizations, in both the industrial and the scientific arena.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 35 / 144

Page 36: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Variations

Advanced Database Management SystemsThe Relational Case: Data Model, Databases, Languages

Alvaro A A Fernandes

School of Computer Science, University of Manchester

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 36 / 144

Page 37: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Outline

Relational Model

Relational Databases

Relational Query Languages

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 37 / 144

Page 38: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Model

The Relational Model of DataWhy?

I Conceptually simple:I one single, collection-valued, domain-independent type

I Formally elegant:I a very constrained system of first-order logic with both a model- and a

proof-theoretic viewI gives rise to (formally equivalent) declarative and procedural languages,

i.e., the domain and the tuple relational calculi, and the relationalalgebra, resp.

I Practical:I underlies SQLI possible to implement efficientlyI has been so implemented many times

I Flexible:I often accommodates useful extensions

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 38 / 144

Page 39: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Databases

Relational Databases (1)Definitions (1)

Definition

A relational database is a set of relations.

Example

D = { Students, Enrolled, Courses, . . . }

Definition

A relation is defined by its schema and consists of a collection oftuples/rows/records that conform to that schema.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 39 / 144

Page 40: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Databases

Relational Databases (2)Definitions (2)

Definition

A schema defines the relation name and the name and domain/type ofits attributes/columns/fields.

Example

Students (stdid: integer, name: string, login: string, age: integer, gpa:real)

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 40 / 144

Page 41: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Databases

Relational Databases (3)Definitions (3)

Definition

Given its schema, a relation (instance) is a subset of the Cartesianproduct induced by the domain of its attributes.

Definition

A tuple in a relation instance is an element in the Cartesian productdefined by the relation schema that the instance conforms to.

Example

stdid name login age gpa

53666 Jones jones@cs 18 3.453688 Smith smith@eecs 18 3.253650 Smith smith@math 19 3.8

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 41 / 144

Page 42: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Databases

Relational Databases (4)Underlying Assumptions

I Relations are classically considered to be a set (hence, all tuples areunique and their order does not determine identity).

I In practice (e.g., in SQL), relations are multisets/bags, i.e., they maycontain duplicate tuples, but their order still does not determineidentity.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 42 / 144

Page 43: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Databases

Relational Databases (5)Definitions (4)

Definition

The number of attributes in a relation schema defines its arity/degree.

Definition

The number of tuples in a relation defines its cardinality.

Example

I arity(Students) = 5

I cardinality(Students) = |Students| = 3

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 43 / 144

Page 44: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Databases

Relational Databases (6)Integrity Constraints (1)

Definition

An integrity constraint (IC) is a property that must be true for alldatabase instances.

Example

Domain Constraint: The value of an attribute belongs to theschema-specified domain.

I ICs are specified when the schema is defined.

I ICs are checked when a relation is modified.

I A legal instance of a relation is one that satisfies all specified ICs.

I A DBMS does not normally allow an illegal instance to be stored (orto result from an update operation).

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 44 / 144

Page 45: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Databases

Relational Databases (7)Integrity Constraints (2)

Definition

1. A set of fields is a key for a relation if both:

1.1 No two distinct tuples can have the same values for those fields.1.2 This is not true for any subset of those fields.

2. A superkey is not a key, it is a set of fields that properly contains akey.

3. If there is more than one key for a relation, each such key is called acandidate key.

4. The primary key is uniquely chosen from the candidate keys.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 45 / 144

Page 46: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Databases

Relational Databases (8)Integrity Constraints (3)

Example

I {stdid} is a key in Students, so is {login}, {name} is not.

I {stdid, name} is a superkey in Students.

I {stdid} and {login} are candidate keys in Students

I {stdid} may have been chosen to be the primary key out of thecandidate keys.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 46 / 144

Page 47: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Databases

Relational Databases (9)Integrity Constraints (4)

I A foreign key is set of fields in one relation thatI appears as the primary key in another relationI can be used to refer to tuples in that other relationI by acting like a logical pointerI it expresses a relationship between two entities

I A DBMS does not normally allow an operation whose outcomeviolates referential integrity.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 47 / 144

Page 48: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Databases

Relational Databases (10)Integrity Constraints (5)

Example

Enrolled (cid: string, grade: string, studentid: string)

Example

cid grade studentid

CS101 A 53666MA102 B 53688BM222 B 53650

I E.g. {studentid} is a foreign key in Enrolled using the {stdid} primarykey of Students to refer to the latter.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 48 / 144

Page 49: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Databases

Relational Databases (11)Where Do Integrity Constraints Come From?

I ICs are an aspect of how an organization chooses to model its datarequirements in the form of database relations.

I Just like a schema must be asserted, so must ICs.

I We can check a database instance to see if an IC is violated, but wecannot infer that an IC is true by looking at an instance.

I An IC is a statement about all possible instances, not about theparticular instance we happen to be looking at.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 49 / 144

Page 50: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Databases

SummaryRelational Databases

I Since their conception in the late 1960s, and particularly after thefirst successful, industrial-strength implementations appeared in the1970s, relational databases have attracted a great deal of praise fortheir useful elegance.

I Over the 1980s, relational databases became the dominant paradigm,a position they still hold (with some notable evolutionary additions).

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 50 / 144

Page 51: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Query Languages

Relational Query Languages (1)Why do they matter?

I They were a novel contribution and are a major strength of therelational model.

I They support simple, powerful, well-founded querying of data.

I Requests are specified declaratively and delegated to the DBMS forefficient evaluation.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 51 / 144

Page 52: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Query Languages

Relational Query Languages (2)Why are they special?

I The success of this approach depends onI the definition of a pair of query languages, one declarative and one

proceduralI being given a formal semantics that

I allows their equivalence to be provedI the mapping from one to other to be formalized

I with closure properties (i.e., any expression evaluates to an output ofthe same type as its arguments) for recursive composition.

I In view of such results, the DBMS can implement adomain-independent query processing stack, such as we have seen ina previous lecture.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 52 / 144

Page 53: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Query Languages

Relational Query Languages (3)Declarative and Procedural, Abstract and Concrete

I By declarative we mean a language in which we describe the desiredanswer without describing how to compute it.

I By procedural we mean a language in which we describe the desiredanswer by describing how to compute it.

I By abstract we mean that the language does not have a concrete (letalone, standardized) syntax (and thus, no reference implementation).

I By concrete we mean that the language does have a concrete(ideally, standardized) syntax (and thus, often a referenceimplementation as well).

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 53 / 144

Page 54: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Query Languages

Relational Query Languages (4)Domain/Tuple Relational Calculi, Relational Algebra, SQL

I Classically, the relational model defines three expressively-equivalentabstract languages:

I the domain relational calculus (DRC)I the tuple relational calculus (TRC)I the relational algebra (RA)

I RA is procedural (more on it later), DRC and TRC are declarative(see the Bibliography for more on those).

I SQL (for Structured Query Language) is a concrete languagewhose core is closely related to TRC.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 54 / 144

Page 55: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Query Languages

Relational Query Languages (5)A First Glimpse

Example

Retrieve the gpa of students with age greater than 18.

TRC: {A | ∃S ∈ Students (S .age > 18 ∧ A.gpa = S .gpa)}RA: πgpa(σage>18(Students))

SQL: SELECT S.gpa FROM Students S WHERE S.age > 18

Answer:gpa

3.8

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 55 / 144

Page 56: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Query Languages

Relational Query Languages (6)What do these queries compute?

Example

TRC: {A | ∃S ∈ Students ∃E ∈ Enrolled (S .stdid = E .studentid ∧E .grade =′B ′ ∧ A.name = S .name ∧ A.cid = E .cid)}

RA: πname,cid (σgrade= ′B′(Students ./stdid=studentid Enrolled))

SQL: SELECT S.name, E.cid

FROM Students S, Enrolled E

WHERE S.stdid = E.studentid AND E.grade = ’B’

Retrieve the names of the students who had a grade ’B’ and the course inwhich they did so. Answer:

name cid

Jones CS101AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 56 / 144

Page 57: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Query Languages

Relational Query Languages (7)Views as Named Queries

I A relation instance is normally defined extensionally (i.e., at eachpoint in time we can enumerate the tuples that belong to it).

I A view defines a relation instance intensionally (i.e., by means of anexpression that, when evaluated against a database instance,produces the corresponding relation instance).

I A view explicitly assigns a name to the relation it defines andimplicitly characterizes its schema (given that the type of theexpression can be inferred).

I Typically, (the substantive part of) the view definition language is the(D)QL,

I For a view, therefore, the DBMS only need store the query expression,not a set of tuples, as the latter can be obtained, whenever needed,by evaluating the former.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 57 / 144

Page 58: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Query Languages

Relational Query Languages (8)View Definition and Usage

I Start with:

Example

CREATE VIEW TopStudents (sname, stid, courseid)

AS SELECT S.name, S.stdid, E.cid

FROM Students S, Enrolled E

WHERE S.stdid = E.studentid and E.grade = ’B’

I Follow up with:

Example

SELECT T.sname, T.courseid

FROM TopStudents T

I This should look familiar.AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 58 / 144

Page 59: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Query Languages

Relational Query Languages (9)Why are views useful?

I Views can be used to present necessary information (or a summarythereof), while hiding details in underlying relation(s).

I Views can be used to project specific abstractions to specificapplications.

I Views can be materialized (e.g., in a data warehouse, to make OLAPfeasible).

I By ‘materializing’ a view we mean evaluating the view and storing theresult.

I Views are a useful mechanism for controlled fragmentation andintegration in distributed environments.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 59 / 144

Page 60: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Query Languages

SummaryRelational Model, Databases, Query Languages

I The relational model remains the best formal foundation for the studyof DBMSs.

I It brings out the crucial role of query languages in providingconvenient mechanisms for interacting with the data.

I It lies behind the most successful DBMSs used by organizations today.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 60 / 144

Page 61: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Relational Query Languages

Advanced Database Management SystemsA Relational Algebra

Alvaro A A Fernandes

School of Computer Science, University of Manchester

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 61 / 144

Page 62: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Outline

Preliminaries

Example Relation Instances

Primitive and Derived Operations

Extensions to the Algebra

Operation Definitions

Example Expressions

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 62 / 144

Page 63: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Preliminaries

Preliminaries

I A query is applied to relation instances.

I The result of a query is also a relation instance.

I The schemas of the input/argument relations of a query are fixed.

I The schema for the result of a given query is statically known by typeinference on the schemas of the input/argument relations.

I Recall that relational algebra is closed (equiv., has a closureproperty), i.e., the input(s) and output of any relational-algebraicexpression is a relation instance.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 63 / 144

Page 64: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Example Relation Instances

Relational Algebra (1)Relation Instances Used in Examples (1)

I Let Sailors, Boats andReservations be examplerelations.

I Their schemas are on the right.

I Underlined sets of fields denotea key.

ExampleSailors (sid: integer, sname: string, rating: integer, age: real)

Boats (bid: integer, bname: string, colour: string)

Reservations (sid: integer, bid: integer, day: date)

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 64 / 144

Page 65: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Example Relation Instances

Relational Algebra (2)Relation Instances Used in Examples (2)

I The various relation instancesused on the right.

I Note that there are two relationinstances (viz., S1 and S2) forSailors, one for Reservations(viz., R1), and none, yet, forBoats.

I Fields in an instance of one ofthese relations are referred to byname or by position (in whichcase the order is that ofappearance, left to right, in theschema).

ExampleS1 =

sid sname rating age

22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

S2 =sid sname rating age

28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

R1 =sid bid day

22 101 10/10/9658 103 12/11/96

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 65 / 144

Page 66: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Primitive and Derived Operations

Relational Algebra (3)Primitive Operations

σ : selection returns those rows in the singleargument-relation that satisfy the given predicate

π : projection deletes those columns in the singleargument-relation that are not explicitly asked for

× : Cartesian (or cross) product concatenates each row inthe first argument relation with each row in the second toform a row in the output

\ : (set) difference returns the rows in the first argumentrelation that are not in the second

∪ : (set) union returns the rows that are either in the first orin the second argument relation (or in both)

The above is a complete set: any other relational-algebra can be derivedby a combination of the above.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 66 / 144

Page 67: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Primitive and Derived Operations

Relational Algebra (4)Derived Operations

∩ : R ∩ S ⇔ (R ∪ S) \ ((R \ S) ∪ (S \ R)(set) intersection returns the rows that are both in the firstand in the second argument relation

./ : R ./θ S ⇔ σθ(R × S)join concatenates each row in the first argument relationwith each row in the second and forms with them a row inthe output, provided that it satisfies the given predicate

÷ : (see below for derivation)division returns the rows in the first argument relation thatare associated with every row in the second argument relation

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 67 / 144

Page 68: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Extensions

Relational Algebra (5)Extensions (1)

Useful extensions for clarity of exposition are:

ρ : renaming returns the same relation instance passed asargument but assigns the given name(s) to the outputrelation (or any of its attributes)

← : assignment assigns the left-hand side name to the relationinstance resulting from evaluating the right-hand sideexpression

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 68 / 144

Page 69: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Extensions

Relational Algebra (6)Extensions (2)

Extensions that change the expressiveness of classical relation algebrainclude (see, e.g., [Silberschatz et al., 2005]):

I generalized projection, which allows arithmetic expressions (and notjust attribute names) to be specified

I (group-by) aggregation, which allows functions (such as count,sum, max, min, avg) to be applied on some attribute (possibly overpartitions defined by the given group-by attribute)

I other kinds of join (e.g., semijoin, antijoin, [left|right] outer join)

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 69 / 144

Page 70: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Definitions

Selection

I σθ(R) = {x | ∃x ∈ R (θ(x))}I Rows in the single input relation R

that satisfy the given selectioncondition (i.e., a Boolean expressionon the available attributes) are inthe output relation O.

I No duplicate rows can appear in O,so the cardinality of O cannot belarger than that of R.

I The schema of O is identical to theschema of R.

I The arity of O is the same as thatof R.

Example

σrating>8(S2) =

sid sname rating age

28 yuppy 9 35.058 rusty 10 35.0

σsid>10∧age<45.0(σrating>8(S2)) =

sid sname rating age

28 yuppy 9 35.058 rusty 10 35.0

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 70 / 144

Page 71: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Definitions

Projection

I πa1,...,an (R) = {y | ∃x ∈ R (y .a1 =x .a1 ∧ . . . ∧ y .an = x .an)}

I Columns in the single input relationR that are not in the givenprojection list do not appear in theoutput relation O.

I Duplicate rows might appear in Ounless they are explicitly removed,and, if so, the cardinality of O is thesame as that of R.

I The schema of O maps one-to-oneto the given projection list, so thearity of O cannot be larger thanthat of R.

Example

πsname,rating (S2) =

sname rating

yuppy 9lubber 8guppy 5rusty 10

πsname,rating (σrating>8(S2)) =

sname rating

yuppy 9rusty 10

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 71 / 144

Page 72: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Definitions

Set Operations (1)Union, Intersection, Difference

I These binary operations have theexpected set-theoretic semantics.

I Both input arguments R and S musthave compatible schemas (i.e., theirarity must be the same and thecolumns have to have the same typesone-to-one, left to right).

I The arity of O is identical to that of I .

I The cardinality of O may be largerthan that of the largest between in Rand S in the case of union, but not inthe case of intersection and difference.

Example

S1 ∪ S2 =

sid sname rating age

22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.028 yuppy 9 35.044 guppy 5 35.0

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 72 / 144

Page 73: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Definitions

Set Operations (2)Union, Intersection, Difference

Example

S1 ∩ S2

sid sname rating age

31 lubber 8 55.558 rusty 10 35.0

Example

S1 \ S2

sid sname rating age

22 dustin 7 45.0

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 73 / 144

Page 74: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Definitions

Cartesian/Cross Product

I R × S = {xy | ∃x ∈ R ∃y ∈S}

I The schema of O is theconcatenation of theschemas of R and S , unlessthere is a name clash, inwhich case renaming can beused.

I The arity of O is the sum ofthe arities of R and S .

I The cardinality of O is theproduct of the cardinalitiesof R and S .

Exampleρ1→sid1, 5→sid2(S1× R1) =

sid1 sname rating age sid2 bid day

22 dustin 7 45.0 22 101 10/10/9622 dustin 7 45.0 58 103 12/11/9631 lubber 8 55.5 22 101 10/10/9631 lubber 8 55.5 58 103 12/11/9658 rusty 10 35.0 22 101 10/10/9658 rusty 10 35.0 58 103 12/11/96

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 74 / 144

Page 75: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Definitions

Joins (1)θ-Join

I R ./θ S = {xy | ∃x ∈ R ∃y ∈S (θ(xy))}

I R ./θ S ≡ σθ(R × S)

I The schema of O is as forCartesian product, as is arity.

I The cardinality of O cannotbe larger than the product ofthe cardinalities of R and S .

Exampleρ1→sid1, 5→sid2(S1 ./S1.sid<R1.sid R1) =

sid1 sname rating age sid2 bid day

22 dustin 7 45.0 58 103 12/11/9631 lubber 8 55.5 58 103 12/11/96

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 75 / 144

Page 76: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Definitions

Joins (2)Equijoin and Natural Join

I An equijoin is a θ-join in whichall terms in θ are equalities.

I In an equijoin, the schema of Ois as for Cartesian product butonly one of the equated columnsis projected out, so the arityreduces by one for each suchcase.

I A natural join is an equijoin onall common columns.

I One only needs list the commoncolumns in the condition.

ExampleS1 ./sid R1 =

sid sname rating age bid day

22 dustin 7 45.0 101 10/10/9658 rusty 10 35.0 103 12/11/96

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 76 / 144

Page 77: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Definitions

Division (1)Through Examples

I In integer division, giventwo integers A and B,A÷ B is the largestinteger Q such thatQ × B ≤ A.

I In relational division,given two relations Rand S , R ÷ S is thelargest relation instanceO such that O × S ⊆ R.

I If R lists suppliers andparts they supply, and Sparts, then R ÷ S listssuppliers of all S parts.

ExampleA =

sno pno

s1 p1s1 p2s1 p3s1 p4s2 p1s2 p2s3 p2s4 p2s4 p4

ExampleB1 =

pno

p2

B2 =pno

p2p4

B3 =pno

p1p2p4

ExampleA÷ B1 =

sno

s1s2s3s4

A÷ B2 =sno

s1s4

A÷ B3 =sno

s1

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 77 / 144

Page 78: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Definitions

Division (2)Through Rewriting (1)

I Division, like join, can be defined by rewriting into primitiveoperations but, unlike join, it is not used very often, so most DBMSsdo not implement special algorithms for it.

I The schema of O is the schema of R minus the columns shared withS , so the arity of O cannot be as large as that of R.

I The cardinality of O cannot be larger than that of R.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 78 / 144

Page 79: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Definitions

Division (3)Through Rewriting (2)

I Abusing notation, we can define division in terms of primitiveoperators as follows.

I Let r and s be relations with schemas R and S , respectively, and letS ⊆ R, then:

I T1 ← πR−S (r)× s computes the Cartesian product of πR−S (r) and sso that each tuple t ∈ πR−S (r) is paired with every s-tuple.

I T2 ← πR−S,S (r) merely reorders the attributes of r in preparation forthe set operation to come.

I T3 ← πR−S (T1 − T2) only retains those tuples t ∈ πR−S (r) such thatfor some tuple u in s, tu 6∈ r .

I r ÷ s ← πR−S (r)− T3 only retains those tuples t ∈ πR−S (r) such thatfor all tuples u in s, tu ∈ r .

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 79 / 144

Page 80: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Definitions

Generalized Projection

I Recall that generalized projectionallows arithmetic expressionsinvolving attribute names (and notjust the latter) in the projection list.

I The first example to the rightreturns the sailor names with theirassociated ranking tripled.

I There is a further extended versionthat allows concomitant renamingas shown in the second example tothe right.

Exampleπsname,rating∗3(S2) =

sname rating

yuppy 27lubber 24guppy 15rusty 30

πsname,rating∗3→triplerating (S2) ≡ρ2→triplerating (πsname,rating∗3(S2)) =

sname triplerating

yuppy 27lubber 24guppy 15rusty 30

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 80 / 144

Page 81: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Definitions

Aggregation

I Recall that aggregation reducesa collection of values into asingle values by the applicationof a function such as count,sum, max, min or avg.

I It is also possible to form groupsby attribute values, e.g., to takethe average rating by age.

I Concomitant renaming can alsobe used.

Example

γavg(age)→averageage(S2) =

averageage

40.125

ageγavg(rating)→averagerating (S2) =

age averagerating

35.0 855.5 8

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 81 / 144

Page 82: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Examples

Example Relational Algebra Expressions (1)More Relation Instances

I For the next batch ofexamples, the variousrelation instances used areas shown.

ExampleS3 =

sid sname rating age

22 dustin 7 45.029 brutus 1 33.031 lubber 8 55.532 andy 8 25.558 rusty 10 35.064 horatio 7 35.071 zorba 10 16.074 horatio 9 35.085 art 3 25.595 bob 3 63.5

ExampleR2 =

sid bid day

22 101 10/10/9822 102 10/10/9822 103 08/10/9822 104 07/10/9831 102 10/11/9831 103 06/11/9831 104 12/11/9864 101 05/09/9864 102 08/09/9864 103 08/09/98

B1 =bid bname colour

101 interlake blue102 interlake red103 clipper green104 marine red

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 82 / 144

Page 83: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Examples

Example Relational Algebra Expressions (2)Find the names of the sailors who have reserved boat 103

I O1 = πsname(σbid=103(Reservations ./ Sailors))

I O2 = πsname((σbid=103(Reservations)) ./ Sailors)

I O1 ≡ O2

Example

sname

dustinlubberhoratio

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 83 / 144

Page 84: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Examples

Example Relational Algebra Expressions (3)Find the names of the sailors who have reserved a red boat

I Information about boat colour is only available in Boats, so an extrajoin is needed.

I O1 = πsname(σcolor=red (Boats) ./ (Reservations ./ Sailors))

I O2 = πsname(πsid (πbid (σcolor=red (Boats)) ./ Reservations) ./ Sailors)

I O1 ≡ O2

Example

sname

dustinlubberhoratio

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 84 / 144

Page 85: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Examples

Example Relational Algebra Expressions (4)Find the names of the sailors who have reserved a red or a green boat

I Using assignment:

1. T1 ← (σcolour=red (Boats) ∪ σcolour=green(Boats))2. O ← πsname(T1 ./ (Reservations ./ Sailors))

I Or:

1. T1 ← (σcolour=red∨colour=green(Boats)2. O ← πsname(T1 ./ (Reservations ./ Sailors))

Example

sname

dustinlubberhoratio

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 85 / 144

Page 86: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Examples

Example Relational Algebra Expressions (5)Find the names of the sailors who have reserved a red and a green boat

I Replacing ∪ with ∩ in the previous example doesn’t work.I Using assignment:

1. T1 ← πsid (σcolour=red (Boats) ./ Reservations)2. T2 ← πsid (σcolour=green(Boats) ./ Reservations)3. O ← πsname((T1 ∩ T2) ./ Sailors)

I On the other hand, πsname((T1 ∪ T2) ./ Sailors) does work for theprevious example.

Example

sname

dustinlubber

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 86 / 144

Page 87: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Examples

Example Relational Algebra Expressions (6)Find the ids of sailors older than 20 who have not reserved a red boat

I Using assignment:

1. T1 ← πsid (σage>20(Sailors)2. T2 ← πsid ((σcolour=red (Boats) ./ Reservations) ./ Sailors)3. O ← T1 \ T2

Example

sid

293258748595

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 87 / 144

Page 88: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Examples

Example Relational Algebra Expressions (7)Find the names of sailors who have reserved all boats

I The word all suggests the need for division.

I Projections are essential to arrange the schemas appropriately.

I Joins may be needed to recover columns that had to be dropped.I 1. T1 ← πsid,bid (Reservations)÷ πbid (Boats)

2. O ← πsname(T1 ./ Sailors)

Example

sname

dustin

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 88 / 144

Page 89: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Examples

SummaryRelational Algebra

I An algebra, often extending, or modelled on, the relational algebralies at the heart of most advanced DBMSs.

I It is the most used target formalism for the internal representation oflogical plans.

I Rewriting that is based on logical-algebraic equivalences is animportant task in query optimization (as we will discuss later).

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 89 / 144

Page 90: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Examples

Advanced Database Management SystemsSQL

Alvaro A A Fernandes

School of Computer Science, University of Manchester

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 90 / 144

Page 91: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Outline

Example Relation Instances Again

Syntax and Semantics

Example SQL Queries

More Syntax and Semantics

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 91 / 144

Page 92: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Example Relation Instances Again

SQLRelation Instances Used in Examples

I Recall the relation instances onthe right.

I These have been used beforeand will be used again, asbefore, in the examples thatfollow.

ExampleS1 =

sid sname rating age

22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

S2 =sid sname rating age

28 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

R1 =sid bid day

22 101 10/10/9658 103 12/11/96

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 92 / 144

Page 93: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Syntax and Semantics

Core, Informal SQL Syntax (1)The SELECT, FROM and WHERE Clauses

Definition

SELECT [DISTINCT] 〈 target list 〉FROM 〈 relation list 〉WHERE 〈 qualification 〉

I The SELECT clause defines which columns participate in the result,i.e., it plays the role of the relational-algebraic π operation.

I The FROM clause defines which relations are used as inputs, i.e., itcorresponds to the leaves in a relational-algebraic expression.

I The WHERE clause defines the (possibly complex) predicateexpression which a row must satisfy to participate in the result, i.e., itsupplies the predicates for relational-algebraic operations like σ and ./.

I DISTINCT is an optional keyword indicating that duplicates must beremoved from the answer.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 93 / 144

Page 94: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Syntax and Semantics

Core, Informal SQL Syntax (2)The Arguments to SELECT, FROM and WHERE Clauses

I The argument to a FROM clause is a list of relation names (possiblywith a range variable after each name, which allows a row in thatrelation to be referred to elsewhere in the query).

I It is good practice to use range variables, so use them.

I The argument to a SELECT clause is a list of expressions based onattributes taken from the relations in the relation list.

I If ’*’ is used instead of a list, all available attributes are projected out.

I The argument to a WHERE clause is referred to as a qualification,i.e., a Boolean expression whose terms are comparisons (of the formE op const or E1 op E2 where E ,E1,E2 are, typically, attributes takenfrom the relations in the relation list, andop ∈ {<,>,=, >=,=<,<>}) combined using the connectives AND,OR and NOT.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 94 / 144

Page 95: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Syntax and Semantics

Core, Informal SQL Semantics (1)Three-to-Four Steps to the Answer

I To characterize the semantics of a SQL query using a direct,clause-by-clause translation into a relational-algebraic expression thatevaluates to the correct answer, do the following:

1. Compute the cross-product of relations in the FROM list, call it J.2. Discard tuples in J that fail the qualification, call the result S.3. Delete from S any attribute that is not in the SELECT list, call the

result P.4. If DISTINCT is specified, eliminate duplicate rows in P to obtain the

result A, otherwise A=P.

I While as an evaluation strategy, the procedure above is likely to bevery inefficient, it provides a simple, clear characterization of theanswer to a query.

I As we will see, an optimizer is likely to find more efficient evaluationstrategies to compute the same answer.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 95 / 144

Page 96: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Syntax and Semantics

Core, Informal SQL Semantics (2)Find the names of the sailors who have reserved boat 103.

ExampleSELECT S.snameFROM Sailors S, Reservations RWHERE S.sid = R.sid AND R.bid = 103

ExampleAssume the database state contains {S1, R1}. Then, in Step 1,S1× R1 =

sid1 sname rating age sid2 bid day

22 dustin 7 45.0 22 101 10/10/9622 dustin 7 45.0 58 103 12/11/9631 lubber 8 55.5 22 101 10/10/9631 lubber 8 55.5 58 103 12/11/9658 rusty 10 35.0 22 101 10/10/9658 rusty 10 35.0 58 103 12/11/96

In Step 2, σS.sid=R.sid∧R.bid=103(S1× R1)

sid1 sname rating age sid2 bid day

58 rusty 10 35.0 58 103 12/11/96

In Step 3, πsname (σS.sid=R.sid∧R.bid=103(S1× R1))

sname

rusty

No DISTINCT implies no Step 4.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 96 / 144

Page 97: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Syntax and Semantics

Core, Informal SQL Semantics (3)Find the ids of sailors who have reserved at least one boat

ExampleSELECT S.sidFROM Sailors S, Reservations RWHERE S.sid = R.sid

ExampleAssume the database state contains {S1, R1}. Then, Step 1,S1× R1 produces the same results as in the previous example.

In Step 2, σS.sid=R.sid (S1× R1)

sid1 sname rating age sid2 bid day

22 dustin 7 45.0 22 101 10/10/9658 rusty 10 35.0 58 103 12/11/96

In Step 3, πsid (σS.sid=R.sid (S1× R1))

sid

2258

No DISTINCT implies no Step 4.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 97 / 144

Page 98: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Syntax and Semantics

Core, Informal SQL Syntax (3)Expressions and Strings

I Arithmetic expressions andstring pattern matching can alsobe used.

I AS and = are two ways to namefields in result.

I LIKE is used for stringmatching. ’ ’ stands for any onecharacter and ’%’ stands forzero or more arbitrarycharacters.

Example

SELECT S.age,age1=S.age-5,2*S.age AS age2

FROM Sailors SWHERE S.sname LIKE ’Y %Y’

Over S2, the answer would be:

age age1 age 2

35.0 30.0 70.0

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 98 / 144

Page 99: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Example SQL Queries

Example SQL Queries (1)More Relation Instances

I For the next batch ofexamples, the variousrelation instances used arealso known from previousexamples.

ExampleS3 =

sid sname rating age

22 dustin 7 45.029 brutus 1 33.031 lubber 8 55.532 andy 8 25.558 rusty 10 35.064 horatio 7 35.071 zorba 10 16.074 horatio 9 35.085 art 3 25.595 bob 3 63.5

ExampleR2 =

sid bid day

22 101 10/10/9822 102 10/10/9822 103 08/10/9822 104 07/10/9831 102 10/11/9831 103 06/11/9831 104 12/11/9864 101 05/09/9864 102 08/09/9864 103 08/09/98

B1 =bid bname colour

101 interlake blue102 interlake red103 clipper green104 marine red

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 99 / 144

Page 100: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Example SQL Queries

Example SQL Queries (2)Find the names of the sailors who have reserved a red boat

I Recall πsname(σcolor=red (Boats) ./ (Reservations ./ Sailors))

Example

SELECT S.snameFROM Sailors S, Boats B, Reservations RWHERE S.sid = R.sid

AND R.bid = B.bidAND B.colour = ’red’

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 100 / 144

Page 101: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Example SQL Queries

Example SQL Queries (3)Find the sids of the sailors who have reserved a red or a green boat

ExampleEither: SELECT S.sid

FROM Sailors S, Boats B, Reservations RWHERE S.sid = R.sid

AND R.bid = B.bidAND (B.colour = ’red’ OR B.colour = ’green’)

Or: SELECT S1.sidFROM Sailors S1, Boats B1, Reservations R1WHERE S1.sid = R1.sid

AND R1.bid = B1.bidAND B1.colour = ’red’

UNIONSELECT S2.sidFROM Sailors S2, Boats B2, Reservations R2WHERE S2.sid = R2.sid

AND R2.bid = B2.bidAND B2.colour = ’green’

Set difference is captured by EXCEPT, e.g., to find the sids of the sailorswho have reserved a red but not a green boat.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 101 / 144

Page 102: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Example SQL Queries

Example SQL Queries (4)Beware being quick when there are quirks

ExampleContrast this query: SELECT S.sid

FROM Sailors S, Boats B, Reservations RWHERE S.sid = R.sid

AND R.bid = B.bidAND (B.colour = ’red’ AND B.colour = ’green’)

with this one: SELECT S.sidFROM Sailors S, Boats B, Reservations RWHERE S.sid = R.sid

AND R.bid = B.bidAND B.colour = ’red’

INTERSECTSELECT S.sidFROM Sailors S, Boats B, Reservations RWHERE S.sid = R.sid

AND R.bid = B.bidAND B.colour = ’green’

and this one: SELECT S.sidFROM Sailors S, Boats B1, Reservations R1,

Boats B2, Reservations R2,WHERE S.sid = R1.sid AND S.sid=R2.Sid

AND R1.bid = B1.bid AND R2.Bid = B2.BidAND (B1.colour = ’red’ AND B2.colour = ’green’)

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 102 / 144

Page 103: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

More Syntax and Semantics

Core, Informal SQL Syntax (4)Nested Queries

Example

SELECT S.snameFROM Sailors SWHERE S.sid IN ( SELECT R.sid

FROM Reservations RWHERE R.bid = 103

)

I The ability to nest queries is a powerful feature of SQL.

I Queries can be nested in the WHERE, FROM and HAVING clauses.

I To understand the semantics of nested queries, think of a nestedloop, i.e., for each Sailors tuple, compute the nested query and whichpass the IN qualification.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 103 / 144

Page 104: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

More Syntax and Semantics

Core, Informal SQL Syntax (5)Correlated Nested Queries

I We could have used NOT IN to find sailor who have not reservedboat 103.

I Another set comparison operator (implicitly, with the empty set) isEXISTS, and see the Bibliography for yet more.

I It is also possible to correlate the queries via shared range variables.

Example

SELECT S.snameFROM Sailors SWHERE EXISTS ( SELECT *

FROM Reservations RWHERE R.bid = 103 AND R.sid = S.sid

)

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 104 / 144

Page 105: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

More Syntax and Semantics

Core, Informal SQL Syntax (6)Aggregate Operators

Definition

COUNT ([DISTINCT] A) the number of (unique) values in the A columnSUM ([DISTINCT] A) the sum of all (unique) values in the A columnAVG ([DISTINCT] A) the average of all (unique) values in the A columnMAX (A) the maximum value in the A columnMIN (A) the minimum value in the A column

I Aggregate operators are another significant extension of relationalalgebra in SQL.

I They take a collection of values as input and return a single value asoutput.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 105 / 144

Page 106: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

More Syntax and Semantics

Example SQL Queries (5)Aggregation Queries

Example

I SELECT COUNT(*)FROM Sailors S

I SELECT AVG(S.age)FROM Sailors SWHERE S.rating = 10

I SELECT COUNT(DISTINCT S.rating)FROM Sailors SWHERE S.name = ’horatio’ OR S.name = ’dusting’

I SELECT S.snameFROM Sailors SWHERE S.rating = ( SELECT MAX(S2.rating)

FROM Sailors S2WHERE S2.sname = ’horatio’

)AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 106 / 144

Page 107: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

More Syntax and Semantics

Example SQL Queries (6)Find the name and the age of the oldest sailor(s)

Example

I SELECT S.sname, MAX(S.age) -- Illegal SQL!FROM Sailors S

I SELECT S.sname, S.ageFROM Sailors SWHERE S.age = ( SELECT MAX(S2.age)

FROM Sailors S2)

I The first query is illegal: if a SELECT clause uses an aggregateoperation either it must do so for all attributes in the clause or else itmust contain a GROUP BY clause.

I The second query, with a nested query, is legal and correct.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 107 / 144

Page 108: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

More Syntax and Semantics

Core, Informal SQL Syntax (7)Partitioned Aggregation

Definition

SELECT [DISTINCT] 〈 target list 〉FROM 〈 relation list 〉WHERE 〈 qualification 〉GROUP BY 〈 grouping list 〉HAVING 〈 grouping qualification 〉

I A group is a partition of rows that agree on the values of theattributes in the grouping list.

I We can mix attribute names and applications of aggregate operationsin the target list but the attribute names must be a subset of thegrouping list, so that each row in the result corresponds to one group.

I The grouping qualification determines whether a row is produced inthe answer for a given group.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 108 / 144

Page 109: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

More Syntax and Semantics

Core, Informal SQL Semantics (4)Three More Steps Than Before to the Answer

I To characterize what is the answer to this extended form of an SQLquery:

1. Compute the cross-product of relations in the FROM list, call it J.2. Discard tuples in J that fail the qualification, call the result S.3. Delete from S any attribute that is not in the SELECT list, call the

result P.4. Sort P into groups by the value of attributes in the GROUP BY list,

call the result G.5. Discard groups in G that fail the grouping qualification, call the result

H.6. Generate one answer tuple per qualifying group, call the result T.7. If DISTINCT is specified, eliminate duplicate rows in T to obtain the

result A, otherwise A=T.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 109 / 144

Page 110: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

More Syntax and Semantics

Core, Informal SQL Semantics (5)Find the age of the youngest sailor that is at least 18, for each rating with at least 2 suchsailors

ExampleSELECT S.rating, MIN(S.age) AS minageFROM Sailors SWHERE S.age >= 18GROUP BY S.ratingHAVING COUNT(*) >= 2

Assume the database state contains {S3}. Then,in Step 1, S3 =

sid sname rating age

22 dustin 7 45.029 brutus 1 33.031 lubber 8 55.532 andy 8 25.558 rusty 10 35.064 horatio 7 35.071 zorba 10 16.074 horatio 9 35.085 art 3 25.595 bob 3 63.5

ExampleAfter Steps 2 and 3, we have:

rating age

7 45.01 33.08 55.58 25.5

10 35.07 35.09 35.03 25.53 63.5

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 110 / 144

Page 111: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

More Syntax and Semantics

Core, Informal SQL Semantics (6)Find the age of the youngest sailor that is at least 18, for each rating with at least 2 suchsailors

ExampleAfter Step 4, we have:

rating age

1 33.0

3 25.53 63.5

7 45.07 35.0

8 55.58 25.5

9 35.0

10 35.0

ExampleAfter Step 5, we have:

rating age

3 25.53 63.5

7 45.07 35.0

8 55.58 25.5

After Steps 6 and 7, we have:rating minage

3 25.57 35.08 25.5

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 111 / 144

Page 112: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

More Syntax and Semantics

Example SQL Queries (7)For each red boat, find the number of reservations for this boat

Example

SELECT B.bid, COUNT(*) AS reservationCountFROM Boats B, Reservations RWHERE R,bid = B.bid AND B.colour = ’red’GROUP BY B.Bid

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 112 / 144

Page 113: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

More Syntax and Semantics

Null Values

I Field values in a tuple are sometimes unknown (e.g., a rating has notbeen assigned) or inapplicable (e.g., someone who is not married hasno value to go in a ‘spouse name’ column).

I SQL provides a special value null for such situations.I The presence of null complicates many issues.

I Special operators are needed to check if value is/is not null.I Is rating > 8 true or false when rating is equal to null? What about

AND, OR and NOT connectives?I We need a 3-valued logic (true, false and unknown).I The meaning of many constructs must be defined carefully (e.g.,

WHERE clause eliminates rows that don’t evaluate to true).I New operators (in particular, outer joins) become possible and are

often needed.

I In SQL, null values can be disallowed when columns are defined.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 113 / 144

Page 114: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

More Syntax and Semantics

SummarySQL

I SQL was an important factor in the early acceptance of the relationalmodel because it is more natural than earlier, procedural querylanguages.

I SQL is relationally complete (in fact, it has significantly moreexpressive power than relational algebra).

I Even queries that can be expressed in RA can often be expressedmore naturally in SQL.

I There are usually many alternative ways to write a query, so anoptimizer is needed to find an efficient evaluation plan.

I In practice, users need to be aware of how queries are optimized andevaluated for best results.

I NULL for unknown field values brings many complications

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 114 / 144

Page 115: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

More Syntax and Semantics

Advanced Database Management SystemsQuery Processing: Logical Optimization

Alvaro A A Fernandes

School of Computer Science, University of Manchester

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 115 / 144

Page 116: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Outline

Query Processing in a Nutshell

Equivalence-Based Rewriting of Logical QEPs

An Approach to Logical Rewriting

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 116 / 144

Page 117: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Query Processing in a Nutshell

Overview of Query Processing (1)The Query Processing Stack

1. Parse the declarative query

2. Translate to obtain an algebraicexpression

3. Rewrite into a canonical,heuristically-efficient logicalquery execution plan (QEP)

4. Select the algorithms andaccess methods to obtain aquasi-optimal, costed concreteQEP

5. Execute (the typicallyinterpretable form of) theprocedural QEP

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 117 / 144

Page 118: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Query Processing in a Nutshell

Overview of Query Processing (2)Example Relations

I Let Flights and UsedFor be examplerelations.

I Flights asserts which flight numberdeparts from where to where and itsdeparture and arrival times.

I UsedFor asserts which plane is usedfor each flight on which weekday.

I Usable asserts which plane type(e.g., a 767) can be used for whichflight.

I Certified asserts which pilot can flywhich plane type.

I Their schemas are shownbelow.

I Underlined sets of fieldsdenote a key.

Example (Schemas)Flights (fltno: string, from: string, to: string, dep: date,

arr: date)UsedFor (planeid:string, fltid: string, weekday: string)

Usable (flid:string, pltype: string)

Certified (pilid:string, planetype: string)

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 118 / 144

Page 119: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Query Processing in a Nutshell

Overview of Query Processing (3)Example SQL Query and Corresponding Translator Output

Example (SQL)

SELECT U.planeid, F.fromFROM Flights F, UsedFor UWHERE F.fltno = U.fltid

Example (Algebraic Expression)πU.planeid,F.from(σF.fltno=U.fltid (Flights × UsedFor))

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 119 / 144

Page 120: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Query Processing in a Nutshell

Overview of Query Processing (4)Algebraic Expression and Corresponding Algebraic Operator Tree

Example (Algebraic Expression)πU.planeid,F.from(σF.fltno=U.fltid (Flights × UsedFor))

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 120 / 144

Page 121: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Query Processing in a Nutshell

Overview of Query Processing (5)Example Rewriting Rule and Corresponding Outcome

Example (Join Insertion Rule)

σθ(R × S)⇔ (R ./θ S)

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 121 / 144

Page 122: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Query Processing in a Nutshell

Overview of Query Processing (6)Rewritten Operator Tree and Corresponding QEP

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 122 / 144

Page 123: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Query Processing in a Nutshell

Overview of Query Processing (7)Translate, then Rewrite

I The outcome of translation is a relational-algebraic expression derivedfrom a direct, clause-by-clause translation.

I A relational-algebraic expression can be represented as an algebraicoperator tree.

I By applying rewrite rules, the (usually naive) algebraic operator treecan be rewritten into a heuristically-efficient canonical form, oftencalled the logical QEP for the query.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 123 / 144

Page 124: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Query Processing in a Nutshell

Overview of Query Processing (8)Compute Costs to Select the Algorithms, then Evaluate

I Typically, every algebraic operator can be implemented by differentconcrete algorithms.

I Using cost models, the plan selector considers which concretealgorithm to use for each operator in the logical QEP.

I The various concrete algorithms that implement each operator oftenadopt an iterator pattern.

I The result of this process is a physical QEP, i.e., one that expresses aconcrete computational process in the actual environment in whichthe query is to be evaluated.

I The nodes (i.e., the selected concrete algorithms) in a physical QEPare often referred to as physical operators.

I The physical QEP can be compiled into an executable or, morecommonly, remains an interpretable structure that a queryevaluation engine knows how to process.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 124 / 144

Page 125: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Query Processing in a Nutshell

Overview of Query Processing (9)The Big Questions

I There are three main issues in query optimization:

1. For a given algebraic expression, which rewrite rules to use tocanonize it? This determines what heuristic optimization decisions arecarried out.

2. For a given canonical algebraic expression, which differentconcrete algorithm assignments to consider? This determines thesearch space for finding the desired QEP.

3. If the desired plan is the one that results in the shortestresponse time, what cost models should be used to estimate theresponse time of a QEP? This determines the (necessarilysub-optimal) choice of which QEP to use to evaluate the query.

I The above sequence of questions (with the strategy they imply) wasfirst proposed in 1979 in IBM’s System R, the first practical relationalDBMS, and remains the dominant paradigm.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 125 / 144

Page 126: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Equivalence-Based Rewriting of Logical QEPs

Relational-Algebraic Equivalences (1)Logical Optimization (1)

I Logical optimization involves the transformation of an expression E ina language L into an equivalent expression E ′ also in L where E ′ islikely to admit of a more efficient evaluation than E .

I Transformations are often expressed as rewrite rules that expressrelational-algebraic equivalences of various kinds.

I The different rewrite rules have different purposes, among which:I to break down complex predicates in selections and long attribute lists

in projections in order to enable more rewrites;I to move selections (which are cardinality-reducing) and projections

(which are arity-reducing) upstream (i.e., towards the leaves) andthereby reduce the data volumes that downstream operators mustcontend with;

I to enable entire subtrees to be implemented by a single efficientalgorithm.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 126 / 144

Page 127: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Equivalence-Based Rewriting of Logical QEPs

Relational-Algebraic Equivalences (2)Logical Optimization (2)

I Recall that an associative law states that two applications of anoperation ω can be performed in either order:(x ω y) ω z ⇔ x ω (y ω z)

I Recall that a commutative law states that the result an operation ωis independent of the order of the operands:(x ω y)⇔ (y ω x)

I For example:

1. + is both commutative and associative.2. − is neither.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 127 / 144

Page 128: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Equivalence-Based Rewriting of Logical QEPs

Relational-Algebraic Equivalences (3)Primitive/Derived Transformations

I The definitions of derived operations in terms of primitive ones arerelational-algebraic equivalences.

I The most widely used among these (call it R0 now, we have alludedto it already as a join-insertion rule) rewrites a Cartesian productfollowed by a selection into a join if the selection condition is a joincondition, i.e.:

σθ(R × S) ⇔ (R ./θ S)

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 128 / 144

Page 129: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Equivalence-Based Rewriting of Logical QEPs

Relational-Algebraic Equivalences (4)Selection

I R1: A conjunctive selection condition can be broken up into acascade (i.e., a sequence) of individual σ operations, i.e.:

σθ1∧...∧θn (R) ⇔ σθ1(. . . (σθn (R)) . . .)

I R2: The σ operation is commutative, i.e.:

σθ1(σθ2(R)) ⇔ σθ2(σθ1(R))

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 129 / 144

Page 130: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Equivalence-Based Rewriting of Logical QEPs

Relational-Algebraic Equivalences (5)Projection

I R3: In a cascade (i.e., a sequence) of individual π operations all butthe last one can be ignored, i.e.:

πL1(. . . (πLn (R)) . . .) ⇔ πL1(R)

I R4: The σ and π operations commute if the selection condition θonly involves attributes in the projection list a1, . . . , an, i.e.:

πa1,...,an (σθ(R)) ⇔ σθ(πa1,...,an (R))

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 130 / 144

Page 131: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Equivalence-Based Rewriting of Logical QEPs

Relational-Algebraic Equivalences (6)Commutativity of Join and Cartesian Product (1)

I R5: Both the ./ and the × operations are commutative, i.e.:

R ./θ S ⇔ S ./θ RR × S ⇔ S × R

I R6.1: The σ and ./ (resp., ×) operations commute if the selectioncondition θ only involves attributes in one of the operands, i.e.:

σθ(R ./ S) ⇔ σθ(R) ./ S

I R6.2: The σ and ./ (resp., ×) operations commute if the selectioncondition θ is of the form θ1 ∧ θ2 and θ1 only involves attributes inone operand and θ2 only involves attributes in the other, i.e.:

σθ(R ./ S) ⇔ σθ1(R) ./ σθ2(S)

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 131 / 144

Page 132: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Equivalence-Based Rewriting of Logical QEPs

Relational-Algebraic Equivalences (7)Commutativity of Join and Cartesian Product (2)

I R7.1: The π and ./ (resp., ×) operations commute if the projectionlist is of the form L = a1, . . . , am, b1, . . . , bn, the ai are attributes ofone operand, the bj are attributes of the other, and the join conditionθ only involves attributes in L, i.e.:

πL(R ./θ S) ⇔ (πa1,...,am (R)) ./θ (πb1,...,bn (S))

I R7.2: If the join condition θ includes R-attributes am+1, . . . , am+k andS−attributes bn+1, . . . , bn+l not in L, then they need also to beprojected from R and S and a final projection of L is still required, i.e.:

πL(R ./θ S)⇔ πL(πa1,...,am,am+1,...,am+k(R)) ./θ (πb1,...,bn,bn+1,...,bn+l

(S))

I R7.3: Since there is no predicate in a Cartesian product R7.1 alwaysapplies with ./θ replaced with ×.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 132 / 144

Page 133: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Equivalence-Based Rewriting of Logical QEPs

Relational-Algebraic Equivalences (8)Commutativity of Set Operations (1)

I R8: Both the ∪ and the ∩ (but not the \) operations arecommutative, i.e.:

R ∪ S ⇔ S ∪ RR ∩ S ⇔ S ∩ R

I R9: The ./, ×, ∪ and ∩ operations are individually associative, i.e.:

(R ./ S) ./ T ⇔ R ./ (S ./ T )(R × S)× T ⇔ R × (S × T )(R ∪ S) ∪ T ⇔ R ∪ (S ∪ T )(R ∩ S) ∩ T ⇔ R ∩ (S ∩ T )

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 133 / 144

Page 134: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Equivalence-Based Rewriting of Logical QEPs

Relational-Algebraic Equivalences (9)Commutativity of Set Operations (2)

I R10: The σ operation commutes with the ∪, ∩, and \ operations,i.e.:

σθ(R ∪ S) ⇔ (σθ(R)) ∪ (σθ(S))σθ(R ∩ S) ⇔ (σθ(R)) ∩ (σθ(S))σθ(R \ S) ⇔ (σθ(R)) \ (σθ(S))

I R11: The π operation commutes with the ∪ operation, i.e.:

πL(R ∪ S) ⇔ (πL(R) ∪ πL(S))

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 134 / 144

Page 135: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Equivalence-Based Rewriting of Logical QEPs

Relational-Algebraic Equivalences (10)Other Transformations

I Predicates can also be rewritten (e.g., using De Morgan’s laws topush negation into conjuncts and disjuncts).

I Note that using R1 to rewrite separate predicates into a conjunctionthereof could, depending on the compiler, lead to a contradiction, inwhich case the selection would be satisfied by no tuple in the input,resulting in an empty result.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 135 / 144

Page 136: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Rewriting

A Heuristic Algebraic Optimization Strategy (1)

1. Use R1 to break up complex select conditions into individual ones.This creates opportunities for pushing selections down towards theleaves, thereby reducing the cardinality of intermediate results.

2. Use R2, R4, R6 and R10, which define commutativity for select andpushing selections down towards the leaves.

3. Use R5, R8 and R9, which define commutativity and associativity forbinary operations, to rearrange the leaf nodes. The following criteriacan be used:

3.1 Ensure that the leaves under the most restrictive selections (i.e., thosethat reduce the size of the result) are positioned to execute first(typically, as the left operand). This can rely on selectivity estimatescomputed from metadata in the system catalogue, as we shall see.

3.2 Avoid causing Cartesian products to be used, i.e., override the abovecriterion if it would not lead to a join being placed above the operands.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 136 / 144

Page 137: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Rewriting

A Heuristic Algebraic Optimization Strategy (2)

4. Use R0 to combine selections on the result of Cartesian products intoa join. This allows efficient join algorithms to be used

5. Use R3, R4, R7 and R11, which define how project cascades andcommutes with other operations, to break up and move projectionlists down towards the leaves, thereby reducing the arity ofintermediate results. Only those attributes needed downstream in theplan need be kept from any operation.

6. Identify subtrees that express a composition of operations for whichthere is available a single, specific algorithm that computes the resultof the composition.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 137 / 144

Page 138: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Rewriting

An Example Run (1)Schemas and Query

Example (Simplified Schema)

Employee (fname, mname, lname, bday, address, sex, sal, NI, dno)Project (pname, pnumber, ploc, dnum)WorksOn (eNI, pno, hours)

Example (SQL Query)

SELECT E.lname

FROM Employee E, WorksOn W, Project P

WHERE P.pname = ’Scorpio’

AND P.pnumber = W.pno AND W.eNI = e.NI

AND E.bday > ’1954.12.08’

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 138 / 144

Page 139: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Rewriting

An Example Run (2)

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 139 / 144

Page 140: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Rewriting

An Example Run (3)

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 140 / 144

Page 141: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Rewriting

An Example Run (4)

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 141 / 144

Page 142: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Rewriting

SummaryEquivalence-Based Rewriting of Logical QEPs

I For a given algebraic expression, deciding which rewrite rules to useto canonize it is a major issue in query processing.

I This determines what heuristic optimization decisions are carried out.

I Equivalence-based rewriting of logical QEPs is a classical strategy.

I It is sufficiently well-established and well-understood for there to be aconsensus on what works fairly well in the classical cases.

I There is great uncertainty as to what extent rewriting is also useful inseveral kinds of advanced DBMSs.

I In any case, cost-based optimization is always fundamentally required.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 142 / 144

Page 143: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Rewriting

Acknowledgements

The material presented mixes original material by the author and byNorman Paton as well as material adapted from

I [Elmasri and Navathe, 2006]

I [Garcia-Molina et al., 2002]

I [Ramakrishnan and Gehrke, 2003]

I [Silberschatz et al., 2005]

The author gratefully acknowledges the work of the authors cited whileassuming complete responsibility any for mistake introduced in theadaptation of the material.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 143 / 144

Page 144: Advanced Database Management Systemsstudentnet.cs.manchester.ac.uk/pgt/2011/COMP60732/ADBMS-slide… · Advanced Database Management Systems Database Management Systems Alvaro A A

Rewriting

References

Elmasri, R. and Navathe, S. B. (2006).Fundamentals of Database Systems.Addison-Wesley, 5th edition.

Garcia-Molina, H., Ullman, J. D., and Widom, J. (2002).Database Systems: The Complete Book.Pearson Education Limited, 1st edition.

Ramakrishnan, R. and Gehrke, J. (2003).Database Management Systems.McGraw-Hill Education - Europe, 3rd edition.

Silberschatz, A., Korth, H. F., and Sudarshan, S. (2005).Database System Concepts.McGraw-Hill Education - Europe, 5th edition.

AAAF (School of CS, Manchester) Advanced DBMSs 2011-2012 144 / 144