CS 542 Overview of query processing

CS 542 Database Management Systems Relational Database Programming J Singh January 24, 2011

Page 1: CS 542 Overview of query processing

CS 542 Database Management Systems

Relational Database Programming

J Singh

January 24, 2011

Page 2: CS 542 Overview of query processing

© J Singh, 2011

Simple SQL Queries (p1)

Relation BROWSER_TABLE

SELECT * FROM BROWSER_TABLE WHERE ENGINE = 'Gecko'

Browser | Engine | Platform | Engine Version
Internet Explorer 6 | Trident | Win 98+ | 6
Internet Explorer 7 | Trident | Win XP SP2+ | 7
AOL browser (AOL desktop) | Trident | Win XP | 6
Firefox 3.0 | Gecko | Win 2k+ / OSX.3+ | 1.9
Camino 1.5 | Gecko | OSX.3+ | 1.8
Netscape Browser 8 | Gecko | Win 98SE+ | 1.7
Netscape Navigator 9 | Gecko | Win 98+ / OSX.2+ | 1.8

Result:

Browser | Engine | Platform | Engine Version
Firefox 3.0 | Gecko | Win 2k+ / OSX.3+ | 1.9
Camino 1.5 | Gecko | OSX.3+ | 1.8
Netscape Browser 8 | Gecko | Win 98SE+ | 1.7
Netscape Navigator 9 | Gecko | Win 98+ / OSX.2+ | 1.8

1. Start with the Relation

2. Select (σ) Rows

Page 3: CS 542 Overview of query processing

Simple SQL Queries (p2)

Relation BROWSER_TABLE

SELECT BROWSER, PLATFORM FROM BROWSER_TABLE

WHERE ENGINE = 'Gecko'

Browser | Engine | Platform | Engine Version
Internet Explorer 6 | Trident | Win 98+ | 6
Internet Explorer 7 | Trident | Win XP SP2+ | 7
AOL browser (AOL desktop) | Trident | Win XP | 6
Firefox 3.0 | Gecko | Win 2k+ / OSX.3+ | 1.9
Camino 1.5 | Gecko | OSX.3+ | 1.8
Netscape Browser 8 | Gecko | Win 98SE+ | 1.7
Netscape Navigator 9 | Gecko | Win 98+ / OSX.2+ | 1.8

Result:

Browser | Platform
Firefox 3.0 | Win 2k+ / OSX.3+
Camino 1.5 | OSX.3+
Netscape Browser 8 | Win 98SE+
Netscape Navigator 9 | Win 98+ / OSX.2+

1. Start with the Relation

2. Select (σ) Rows

3. Project (π) Columns

Page 4: CS 542 Overview of query processing

Simple SQL Queries (p3)

Relation BROWSER_TABLE

SELECT BROWSER, PLATFORM AS OS FROM BROWSER_TABLE

WHERE ENGINE = 'Gecko'

Browser | Engine | Platform | Engine Version
Internet Explorer 6 | Trident | Win 98+ | 6
Internet Explorer 7 | Trident | Win XP SP2+ | 7
AOL browser (AOL desktop) | Trident | Win XP | 6
Firefox 3.0 | Gecko | Win 2k+ / OSX.3+ | 1.9
Camino 1.5 | Gecko | OSX.3+ | 1.8
Netscape Browser 8 | Gecko | Win 98SE+ | 1.7
Netscape Navigator 9 | Gecko | Win 98+ / OSX.2+ | 1.8

Result:

Browser | OS
Firefox 3.0 | Win 2k+ / OSX.3+
Camino 1.5 | OSX.3+
Netscape Browser 8 | Win 98SE+
Netscape Navigator 9 | Win 98+ / OSX.2+

1. Start with the Relation

2. Select (σ) Rows

3. Project (π) Columns

4. Rename (ρ) Columns
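The select, project, and rename steps can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than the course's database, and the column names (BROWSER, ENGINE, PLATFORM, ENGINE_VERSION) are assumed from the slide's query text.

```python
import sqlite3

# In-memory stand-in for the slide's BROWSER_TABLE (column names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BROWSER_TABLE "
    "(BROWSER TEXT, ENGINE TEXT, PLATFORM TEXT, ENGINE_VERSION TEXT)"
)
conn.executemany(
    "INSERT INTO BROWSER_TABLE VALUES (?, ?, ?, ?)",
    [
        ("Internet Explorer 6", "Trident", "Win 98+", "6"),
        ("Internet Explorer 7", "Trident", "Win XP SP2+", "7"),
        ("AOL browser (AOL desktop)", "Trident", "Win XP", "6"),
        ("Firefox 3.0", "Gecko", "Win 2k+ / OSX.3+", "1.9"),
        ("Camino 1.5", "Gecko", "OSX.3+", "1.8"),
        ("Netscape Browser 8", "Gecko", "Win 98SE+", "1.7"),
        ("Netscape Navigator 9", "Gecko", "Win 98+ / OSX.2+", "1.8"),
    ],
)

# WHERE selects rows, the SELECT list projects columns, AS renames a column.
cursor = conn.execute(
    "SELECT BROWSER, PLATFORM AS OS FROM BROWSER_TABLE WHERE ENGINE = 'Gecko'"
)
column_names = [description[0] for description in cursor.description]
gecko_rows = cursor.fetchall()

print(column_names)
for row in gecko_rows:
    print(row)
```

Without an ORDER BY clause the row order is not guaranteed, so the output should be treated as a set of rows.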

Page 5: CS 542 Overview of query processing

SQL Conditions

• In WHERE clause:

– String1 = String2, String1 > String2 and other comparison operators

• Comparisons are controlled by 'collations', e.g.,

– COLLATE Latin1_General_CI_AS (Latin1 collation, case insensitive, accent sensitive)

• For other available collations, check your database

• Collations can be specified at three levels

– For the entire database

– For an attribute, in CREATE TABLE

– In the WHERE clause

– LIKE String (pattern matching), e.g.,

• 'John Wayne' LIKE 'John%'

• 'John Wayne' LIKE '% W_yne'
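A quick way to check the pattern and collation rules is to evaluate them as constant expressions. This sketch assumes SQLite via Python's sqlite3 module. SQLite has no Latin1_General_CI_AS collation, so its built-in NOCASE collation stands in for a case-insensitive one, and the helper name evaluate is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expression):
    """Evaluate a constant SQL expression and return its single value."""
    return conn.execute("SELECT " + expression).fetchone()[0]


# % matches any run of characters, _ matches exactly one character.
like_prefix = evaluate("'John Wayne' LIKE 'John%'")
like_single = evaluate("'John Wayne' LIKE '% W_yne'")
like_miss = evaluate("'John Wayne' LIKE 'Wayne%'")

# The default (BINARY) collation is case sensitive; NOCASE is not.
binary_equal = evaluate("'gecko' = 'GECKO'")
nocase_equal = evaluate("'gecko' = 'GECKO' COLLATE NOCASE")

print(like_prefix, like_single, like_miss, binary_equal, nocase_equal)
```

SQLite reports TRUE as 1 and FALSE as 0. Other databases name their collations differently, so the COLLATE clause would need adjusting there.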

Page 6: CS 542 Overview of query processing

SQL Special Data Types (p1)

• Dates and Times (look them up)

• NULL values (⊥ in Relational Algebra)

– Can mean one of three things:

• Value is unknown

• Value is inapplicable (e.g., spouse name for a single person)

• Value not shown – perhaps because of security concerns

– Regardless of the cause, NULL can not be treated as a constant

• Operations with NULLs

– NULL + number → NULL

– NULL * number → NULL

– NULL = NULL → UNKNOWN

– X IS NULL → TRUE or FALSE (depending on X)

– NULL * 0 → NULL

– NULL - NULL → NULL
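The NULL rules above can be checked directly. This sketch assumes SQLite through Python's sqlite3 module, where both NULL and UNKNOWN come back to Python as None. The helper name evaluate is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expression):
    """Evaluate a constant SQL expression and return its single value."""
    return conn.execute("SELECT " + expression).fetchone()[0]


null_plus_number = evaluate("NULL + 5")    # arithmetic with NULL gives NULL
null_times_zero = evaluate("NULL * 0")     # not 0, even though x * 0 = 0 for numbers
null_minus_null = evaluate("NULL - NULL")  # not 0, even though x - x = 0 for numbers
null_equals_null = evaluate("NULL = NULL") # UNKNOWN, surfaced as None
null_is_null = evaluate("NULL IS NULL")    # IS NULL always gives TRUE or FALSE
five_is_null = evaluate("5 IS NULL")

print(null_plus_number, null_times_zero, null_minus_null)
print(null_equals_null, null_is_null, five_is_null)
```

The last two lines show why tests for missing values should use IS NULL rather than = NULL.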

Page 7: CS 542 Overview of query processing

SQL Special Data Types (p2)

• UNKNOWN values

– Result from comparison with NULLs

– Other comparisons yield TRUE or FALSE

• UNKNOWN means neither TRUE nor FALSE

– Operations when combined with other logical values

• UNKNOWN AND TRUE → UNKNOWN

• UNKNOWN AND FALSE → FALSE

• UNKNOWN OR TRUE → TRUE

• UNKNOWN OR FALSE → UNKNOWN

• NOT UNKNOWN → UNKNOWN
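One common way to remember this table is to treat TRUE as 1, UNKNOWN as 1/2 and FALSE as 0, so that AND is the minimum, OR is the maximum, and NOT is 1 minus the value. The sketch below is only an illustration of that encoding, and the function names are made up for the example.

```python
# Numeric encoding of SQL's three truth values.
TRUE, UNKNOWN, FALSE = 1.0, 0.5, 0.0


def and3(a, b):
    """Three-valued AND: the smaller of the two truth values."""
    return min(a, b)


def or3(a, b):
    """Three-valued OR: the larger of the two truth values."""
    return max(a, b)


def not3(a):
    """Three-valued NOT: flips TRUE and FALSE, leaves UNKNOWN unchanged."""
    return 1.0 - a


NAMES = {TRUE: "TRUE", UNKNOWN: "UNKNOWN", FALSE: "FALSE"}

print("UNKNOWN AND TRUE  ->", NAMES[and3(UNKNOWN, TRUE)])
print("UNKNOWN AND FALSE ->", NAMES[and3(UNKNOWN, FALSE)])
print("UNKNOWN OR TRUE   ->", NAMES[or3(UNKNOWN, TRUE)])
print("UNKNOWN OR FALSE  ->", NAMES[or3(UNKNOWN, FALSE)])
print("NOT UNKNOWN       ->", NAMES[not3(UNKNOWN)])
```

A WHERE clause keeps a row only when its condition evaluates to TRUE, so rows whose condition is UNKNOWN are dropped just like FALSE ones.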

Page 8: CS 542 Overview of query processing

Ordering Results

Relation BROWSER_TABLE

SELECT BROWSER, PLATFORM FROM BROWSER_TABLE

WHERE ENGINE = 'Gecko' ORDER BY ENGINE_VERSION, BROWSER

Browser | Engine | Platform | Engine Version
Internet Explorer 6 | Trident | Win 98+ | 6
Internet Explorer 7 | Trident | Win XP SP2+ | 7
AOL browser (AOL desktop) | Trident | Win XP | 6
Firefox 3.0 | Gecko | Win 2k+ / OSX.3+ | 1.9
Camino 1.5 | Gecko | OSX.3+ | 1.8
Netscape Browser 8 | Gecko | Win 98SE+ | 1.7
Netscape Navigator 9 | Gecko | Win 98+ / OSX.2+ | 1.8

Result:

Browser | Platform
Netscape Browser 8 | Win 98SE+
Camino 1.5 | OSX.3+
Netscape Navigator 9 | Win 98+ / OSX.2+
Firefox 3.0 | Win 2k+ / OSX.3+

1. Start with the Relation

2. Select (σ) Rows

3. Order Rows

4. Project (π) Columns

Page 9: CS 542 Overview of query processing

Detour: World Database

• A sample MySQL database downloadable from the web

• Has 3 tables: City, Country, CountryLanguage

– City

• ID, Name, CountryCode, District, Population

– Country

• Code, Name, Continent, Region, SurfaceArea, IndepYear, Population, LifeExpectancy, GNP, GNPOld, LocalName, GovernmentForm, HeadOfState, Capital, Code2

– CountryLanguage

• CountryCode, Language, IsOfficial, Percentage

– The three tables are 'connected' by the CountryCode attribute.

Page 10: CS 542 Overview of query processing

Joins

• Find all cities in Estonia

SELECT City.Name

FROM City, Country

WHERE Country.Name = 'Estonia'

AND City.CountryCode = Country.Code ;

• Find all countries where Dutch is the official language

SELECT Country.Name

FROM Country, CountryLanguage

WHERE CountryLanguage.CountryCode = Country.Code

AND CountryLanguage.Language = 'Dutch'

AND CountryLanguage.isOfficial = 'T' ;

Page 11: CS 542 Overview of query processing

Join Semantics – Nested Loops

• Find all cities in Estonia

SELECT City.Name FROM City, Country

WHERE Country.Name = 'Estonia'

AND City.CountryCode = Country.Code

• Is equivalent to

For each tuple t1 in City:
    For each tuple t2 in Country:
        If the WHERE clause is satisfied:
            Accumulate <t1, t2> into a result set
Project City.Name from the accumulated result set
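The nested-loop reading of the query can be written out almost word for word. This is a sketch of the semantics only, not of how a real query processor would execute the join, and the three city rows are a tiny illustrative sample rather than the full world database.

```python
# Tiny illustrative sample of the City and Country relations.
city = [
    {"Name": "Tallinn", "CountryCode": "EST"},
    {"Name": "Tartu", "CountryCode": "EST"},
    {"Name": "Helsinki", "CountryCode": "FIN"},
]
country = [
    {"Code": "EST", "Name": "Estonia"},
    {"Code": "FIN", "Name": "Finland"},
]


def cities_in(country_name):
    """Nested-loop evaluation of the two-table SELECT from the slide."""
    result_set = []
    for t1 in city:                       # for each tuple t1 in City
        for t2 in country:                # for each tuple t2 in Country
            # the WHERE clause of the query
            if t2["Name"] == country_name and t1["CountryCode"] == t2["Code"]:
                result_set.append((t1, t2))
    # project City.Name from the accumulated pairs
    return [t1["Name"] for t1, _ in result_set]


print(cities_in("Estonia"))
```

The loops visit every pair of tuples, so the work grows with the product of the two table sizes.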

Page 12: CS 542 Overview of query processing

Join Semantics – Relational Algebra

• Find all cities in Estonia

SELECT City.Name

FROM City, Country

WHERE Country.Name = 'Estonia'

AND City.CountryCode = Country.Code

• Is equivalent to

π_{A1}( σ_{B1 = 'Estonia' AND A2 = B2} (A × B) )

Where A = City, B = Country,

A1 = City.Name, A2 = City.CountryCode, B1 = Country.Name, B2 = Country.Code

Page 13: CS 542 Overview of query processing

Self-Joins

• Find all districts in Kenya that have more than one city

SELECT distinct c1.district

FROM city c1, city c2, country

WHERE c1.name != c2.name

AND c1.district = c2.district

AND country.code = c1.countrycode

AND country.code = c2.countrycode

AND country.name = 'kenya';

– The same table (city) gets used with two names, c1 and c2

Page 14: CS 542 Overview of query processing

Set Operators

• Find all districts in Kenya that have exactly one city

( SELECT distinct city.district

FROM city, country

WHERE country.code = city.countrycode

AND country.name = 'kenya' )

EXCEPT

( SELECT distinct c1.district

FROM city c1, city c2, country

WHERE c1.name != c2.name

AND c1.district = c2.district

AND country.code = c1.countrycode

AND country.code = c2.countrycode

AND country.name = 'kenya' );

• Both sides must yield tuples with the same attributes (the same schema)

• UNION and INTERSECT are used the same way
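The EXCEPT query can be exercised on a handful of rows. This sketch assumes SQLite via Python's sqlite3 module and uses a few illustrative city rows, not the real world database. SQLite does not accept parentheses around the two halves of a compound SELECT, so they are dropped, and the second half requires both cities to share a district so that "more than one city" is counted per district.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (code TEXT, name TEXT)")
conn.execute("CREATE TABLE city (name TEXT, countrycode TEXT, district TEXT)")
conn.execute("INSERT INTO country VALUES ('KEN', 'Kenya')")
# Illustrative rows only.
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [
        ("Nairobi", "KEN", "Nairobi"),
        ("Mombasa", "KEN", "Coast"),
        ("Nakuru", "KEN", "Rift Valley"),
        ("Eldoret", "KEN", "Rift Valley"),
    ],
)

QUERY = """
SELECT DISTINCT city.district
FROM city, country
WHERE country.code = city.countrycode
  AND country.name = 'Kenya'
EXCEPT
SELECT DISTINCT c1.district
FROM city c1, city c2, country
WHERE c1.name != c2.name
  AND c1.district = c2.district
  AND country.code = c1.countrycode
  AND country.code = c2.countrycode
  AND country.name = 'Kenya'
"""

# Districts with at least one city, minus districts with two or more.
single_city_districts = sorted(row[0] for row in conn.execute(QUERY))
print(single_city_districts)
```

String comparison in SQLite is case sensitive by default, which is why the sketch spells the country name 'Kenya'.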

Page 15: CS 542 Overview of query processing

Subqueries

• A different way to structure queries (without using joins)

SELECT ___________________

FROM _____Subquery 3____

WHERE _____Subquery 1____

_____Subquery 2____

Page 16: CS 542 Overview of query processing

Subqueries Returning Scalars

• Find all cities in Estonia

SELECT City.Name

FROM City, Country

WHERE Country.Name = 'Estonia'

AND City.CountryCode = Country.Code

• Can also be written as

SELECT Name

FROM City

WHERE CountryCode =

(SELECT Code FROM Country WHERE Name = 'Estonia')

• The two forms are equivalent except when…

Page 17: CS 542 Overview of query processing

Conditions Returning Relations

• Find all countries where Dutch is the official language

SELECT Country.Name

FROM Country, CountryLanguage

WHERE CountryLanguage.CountryCode = Country.Code

AND CountryLanguage.Language = 'Dutch'

AND isOfficial = 'T' ;

• Can also be written as

SELECT Name FROM Country

WHERE Code IN

( SELECT CountryCode FROM CountryLanguage

WHERE Language = 'Dutch' AND isOfficial = 'T' );
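To see that the join form and the IN-subquery form return the same countries, both can be run against a few sample rows. The sketch assumes SQLite via Python's sqlite3 module. The rows are illustrative and only the columns the queries touch are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country (Code TEXT, Name TEXT)")
conn.execute(
    "CREATE TABLE CountryLanguage (CountryCode TEXT, Language TEXT, isOfficial TEXT)"
)
# Illustrative rows only.
conn.executemany(
    "INSERT INTO Country VALUES (?, ?)",
    [("NLD", "Netherlands"), ("BEL", "Belgium"), ("CAN", "Canada")],
)
conn.executemany(
    "INSERT INTO CountryLanguage VALUES (?, ?, ?)",
    [
        ("NLD", "Dutch", "T"),
        ("BEL", "Dutch", "T"),
        ("BEL", "French", "T"),
        ("CAN", "Dutch", "F"),
        ("CAN", "English", "T"),
    ],
)

JOIN_FORM = """
SELECT Country.Name
FROM Country, CountryLanguage
WHERE CountryLanguage.CountryCode = Country.Code
  AND CountryLanguage.Language = 'Dutch'
  AND CountryLanguage.isOfficial = 'T'
"""

SUBQUERY_FORM = """
SELECT Name FROM Country
WHERE Code IN
  ( SELECT CountryCode FROM CountryLanguage
    WHERE Language = 'Dutch' AND isOfficial = 'T' )
"""

join_names = sorted(row[0] for row in conn.execute(JOIN_FORM))
subquery_names = sorted(row[0] for row in conn.execute(SUBQUERY_FORM))
print(join_names, subquery_names)
```

The two forms can differ in how many times a country appears if the inner relation matches it more than once, which is why the comparison here is on sorted name lists over data with no such duplicates.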

Page 18: CS 542 Overview of query processing

Conditions Returning Tuples

• Find all countries where Dutch is the official language

SELECT Name FROM Country

WHERE Code IN

( SELECT CountryCode FROM CountryLanguage

WHERE Language = 'Dutch' AND isOfficial = 'T' );

• Can also be written as

SELECT Name FROM Country

WHERE (Code, 'T') IN

( SELECT CountryCode, isOfficial FROM CountryLanguage

WHERE Language = 'Dutch' );

Page 19: CS 542 Overview of query processing

Subqueries in FROM clauses

• Total population of all countries with Dutch as the official language

SELECT Name FROM Country

WHERE Code IN

( SELECT CountryCode FROM CountryLanguage

WHERE Language = 'Dutch' AND isOfficial = 'T' );
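The query above lists the countries but does not yet total their populations. One possible way to finish the job, sketched here as an assumption rather than as the slide's intended answer, is to use that query as a subquery in the FROM clause and aggregate over it. The sketch runs on SQLite via Python's sqlite3 module, and the population figures are made-up round numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country (Code TEXT, Name TEXT, Population INTEGER)")
conn.execute(
    "CREATE TABLE CountryLanguage (CountryCode TEXT, Language TEXT, isOfficial TEXT)"
)
# Made-up populations, for illustration only.
conn.executemany(
    "INSERT INTO Country VALUES (?, ?, ?)",
    [
        ("NLD", "Netherlands", 16000000),
        ("BEL", "Belgium", 10000000),
        ("CAN", "Canada", 31000000),
    ],
)
conn.executemany(
    "INSERT INTO CountryLanguage VALUES (?, ?, ?)",
    [("NLD", "Dutch", "T"), ("BEL", "Dutch", "T"), ("CAN", "Dutch", "F")],
)

# The inner SELECT acts as a temporary relation named dutch; the outer
# SELECT aggregates over it.
QUERY = """
SELECT SUM(dutch.Population)
FROM ( SELECT Name, Population FROM Country
       WHERE Code IN
         ( SELECT CountryCode FROM CountryLanguage
           WHERE Language = 'Dutch' AND isOfficial = 'T' ) ) AS dutch
"""

total_population = conn.execute(QUERY).fetchone()[0]
print(total_population)
```

A subquery in FROM needs an alias (dutch here) in most databases so that its columns can be referenced.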

Page 20: CS 542 Overview of query processing

Cross Joins

• Populations of cities in Finland relative to Aruba & Singapore

SELECT

city.name as City,

city.population as Population,

cntry.name as Country,

(city.population * 100 / cntry.population) as 'Percent'

FROM

(SELECT * FROM CITY WHERE CountryCode = 'fin') AS city

CROSS JOIN

(SELECT * FROM Country WHERE Code='abw' OR Code='sgp')

AS cntry;

Page 21: CS 542 Overview of query processing

Theta Joins

• Cross Join with a condition

– The most common form of JOIN

• All cities in Finland with a population more than double that of Aruba

SELECT

cty.name as City,

cty.population as Population,

cntry.name as Country,

(cty.population * 100 / cntry.population) as 'Percent'

FROM

( SELECT * FROM City WHERE CountryCode = 'fin') AS cty

JOIN (SELECT * FROM Country WHERE Code='abw') AS cntry

ON cty.population > 2*cntry.population;

Page 22: CS 542 Overview of query processing

Outer Joins

• Selecting elements of a table regardless of whether they are present in the other table.

• Cities starting with 'TOK' and countries starting with 'J'

SELECT c.*, r.name as Country

FROM

(select * from city where city.name like 'tok%') as c

LEFT OUTER JOIN

(select * from country where country.code like 'j%') as r

ON (c.countrycode=r.code);

• Yields 6 cities, 5 in Japan and Tokat in Turkey

• What if we had done RIGHT OUTER JOIN?
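The behaviour of the outer join can be explored on a few rows. The sketch assumes SQLite via Python's sqlite3 module and a tiny illustrative sample instead of the world database. Because older SQLite versions lack RIGHT OUTER JOIN, the right-hand variant is imitated by swapping the operands of a LEFT OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, countrycode TEXT)")
conn.execute("CREATE TABLE country (code TEXT, name TEXT)")
# Illustrative rows only.
conn.executemany(
    "INSERT INTO city VALUES (?, ?)",
    [("Tokyo", "JPN"), ("Tokat", "TUR"), ("Osaka", "JPN")],
)
conn.executemany(
    "INSERT INTO country VALUES (?, ?)",
    [("JPN", "Japan"), ("TUR", "Turkey"), ("JAM", "Jamaica")],
)

CITIES = "(SELECT * FROM city WHERE name LIKE 'tok%') AS c"
COUNTRIES = "(SELECT * FROM country WHERE code LIKE 'j%') AS r"

# Every matching city is kept; the country is NULL when there is no partner.
left_rows = set(conn.execute(
    f"SELECT c.name, r.name FROM {CITIES} LEFT OUTER JOIN {COUNTRIES} "
    "ON c.countrycode = r.code"
))

# Swapping the operands keeps every matching country instead.
swapped_rows = set(conn.execute(
    f"SELECT r.name, c.name FROM {COUNTRIES} LEFT OUTER JOIN {CITIES} "
    "ON c.countrycode = r.code"
))

print(left_rows)
print(swapped_rows)
```

Tokat survives the first query with a NULL country because Turkey's code does not start with 'j'. In the swapped query Jamaica survives with a NULL city instead.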

Page 23: CS 542 Overview of query processing

Review and Contrast Joins

• MySQL does not implement FULL OUTER JOIN

– How can we get it if we need it?

• Are CROSS JOIN and FULL OUTER JOIN the same thing?

• Table A has 3 rows, table B has 5 rows.

– How many rows does A CROSS JOIN B have?

– How many rows does A LEFT OUTER JOIN B have?

– How about A RIGHT OUTER JOIN B?

– A FULL OUTER JOIN B?

– A INNER JOIN B?

Page 24: CS 542 Overview of query processing

Reading Assignment

• Section 6.4

• Section 6.5

– Keep timing considerations in mind

• SQL completely evaluates the query before making any changes

Page 25: CS 542 Overview of query processing

Transactions

• ACID

– Atomicity

• Sets of database operations that need to be accomplished atomically, either they all get done or none do. E.g., during money transfer,

– If money is taken out of one account, it must be added to the other

– Consistency

• Enforce constraints on types, values, foreign keys

• Maintain relationships among data elements (see Atomicity)

– Isolation

• Each transaction must appear to be executed as if no other transaction is executing at the same time.

– Durability

• Once committed, the change is permanent.

Page 26: CS 542 Overview of query processing

Detour: Transaction Scenario

• Real Time Bank (RTB) is an on-line bank.

– RTB executes money transfers as soon as requests are entered

– RTB shows up-to-the-minute account balances

– Transactions that would create a negative balance are denied

• Scenario

– Initially, Alice has $250, Bob has $100, Cathy has $150

– Transactions:

1. Alice pays Bob $200

2. Bob pays Cathy $150

3. Cathy pays Alice $250

• Interesting aside: only transaction order 1, 2, 3 will succeed

– At a Nightly Processing Bank, transaction order would be irrelevant
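The claim about transaction order is easy to confirm by brute force. The sketch below replays the three transfers in every possible order under RTB's rule that a payment is denied if it would overdraw the payer. The data comes from the scenario, and the function and variable names are made up for the example.

```python
from itertools import permutations

INITIAL = {"Alice": 250, "Bob": 100, "Cathy": 150}
TRANSFERS = {
    1: ("Alice", "Bob", 200),
    2: ("Bob", "Cathy", 150),
    3: ("Cathy", "Alice", 250),
}


def all_succeed(order):
    """Replay the transfers in the given order; False if any would overdraw."""
    balances = dict(INITIAL)
    for number in order:
        payer, payee, amount = TRANSFERS[number]
        if balances[payer] < amount:
            return False  # RTB denies the transfer
        balances[payer] -= amount
        balances[payee] += amount
    return True


successful_orders = [
    order for order in permutations(TRANSFERS) if all_succeed(order)
]
print(successful_orders)
```

A nightly batch would only need the net effect of the three transfers, which is the same in every order.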

Page 27: CS 542 Overview of query processing

Transaction Atomicity

• Work by example: Alice pays Bob $200

BEGIN TRANSACTION;

UPDATE Accounts
SET balance = balance - 200
WHERE Owner = 'Alice';

IF (0 > (SELECT balance FROM Accounts WHERE Owner = 'Alice'),
    ROLLBACK TRANSACTION) -- Note: Pidgin SQL syntax, not valid in a real dialect

UPDATE Accounts
SET balance = balance + 200
WHERE Owner = 'Bob';

COMMIT TRANSACTION;
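Since the IF step above is pidgin SQL, here is one way the same logic might look when the balance check is done by the calling program. The sketch assumes SQLite via Python's sqlite3 module. The Accounts schema and the transfer function are invented for the example.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / COMMIT / ROLLBACK statements below are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Accounts (Owner TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?)",
    [("Alice", 250), ("Bob", 100), ("Cathy", 150)],
)


def transfer(payer, payee, amount):
    """Move money atomically; undo the debit if it would overdraw the payer."""
    conn.execute("BEGIN")
    conn.execute(
        "UPDATE Accounts SET balance = balance - ? WHERE Owner = ?",
        (amount, payer),
    )
    (balance,) = conn.execute(
        "SELECT balance FROM Accounts WHERE Owner = ?", (payer,)
    ).fetchone()
    if balance < 0:
        conn.execute("ROLLBACK")  # the debit above is undone
        return False
    conn.execute(
        "UPDATE Accounts SET balance = balance + ? WHERE Owner = ?",
        (amount, payee),
    )
    conn.execute("COMMIT")
    return True


def balances():
    """Return the current balances as a dictionary keyed by owner."""
    return dict(conn.execute("SELECT Owner, balance FROM Accounts"))


print(transfer("Alice", "Bob", 200), balances())
print(transfer("Alice", "Bob", 200), balances())
```

The second call fails and leaves both accounts exactly as they were, which is the all-or-nothing behaviour that atomicity promises.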

Page 28: CS 542 Overview of query processing

Transaction Isolation

• Isolation levels and the problems they leave behind:

– READ UNCOMMITTED

• Dirty Read – data of an uncommitted transaction visible to others

– READ COMMITTED: only committed data is visible

• Non-repeatable Read – a transaction re-reads some data and finds that it has changed because another transaction committed

– REPEATABLE READ: place locks on all data that are used in the transaction

• Phantom Read – a transaction re-executes a query returning a set of rows and finds a different set of rows

– SERIALIZABLE: As if all transactions occur in a completely isolated fashion

• Can be too restrictive: may not support enough transaction volume

• Note: Not every database offers each isolation level.

Choose the isolation level with care!

Page 29: CS 542 Overview of query processing

CS 542 Database Management Systems

Database Logic – The Foundation of Datalog

Page 30: CS 542 Overview of query processing

About Datalog

• Intellectual debt to Prolog, the logic programming language

• Responsible for addition of recursion to SQL-99.

– Extends SQL but still leaves it Turing-incomplete

• Introductory example:

– Facts:

• Par(sally, john), Par(martha, mary), Par(mary, peter), Par(john, peter)

– Rules:

• Sib(x, y) <- Par(x, p) AND Par(y, p) AND x <> y

• Cousin(x, y) <- Sib(x, y)

• Cousin(x, y) <- Par(x, xp) AND Par(y, yp) AND Cousin(xp, yp)

– Cousin(sally, martha)

Page 31: CS 542 Overview of query processing

Why Data Logic?

• Why is SQL not sufficient?

– Deductive rules express things that go in both FROM and WHERE clauses

– Allow for stating general requirements that are more difficult to state correctly in SQL

– Allow us to take advantage of research in logic programming and AI

Page 32: CS 542 Overview of query processing

The Formalism of Rules

• The Head is true if all the subgoals are true

• The rule applies for all values of its arguments

• A variable appearing in the head is distinguished; otherwise it is nondistinguished.

Ancestor(x, y) <- Parent(x, z) AND Ancestor(z, y)

– Head = consequent, a single subgoal: Ancestor(x, y)

– Read the symbol <- as "if"

– Body = antecedent = AND of subgoals: Parent(x, z) AND Ancestor(z, y)

Page 33: CS 542 Overview of query processing

Interpreting Rules

• The head is true for given values of the distinguished variables if there exist values of the non-distinguished variables that make all subgoals of the body true.

• For the head to be true, all variables must appear in some non-negated subgoal of the body

• Unsafe examples:

Page 34: CS 542 Overview of query processing

IDB/EDB

• Convention: Predicates begin with a capital, variables begin with lowercase

– e.g., Ancestor (x, y)

• Fact predicates are atoms represented as relations

– If a tuple exists, that fact is true

– Otherwise, false

– A predicate representing a stored relation is called an extensional database (EDB).

• Subgoals of a rule may be facts or may themselves be rules

– EDB when it is a fact

– Intensional database (IDB) when it is a “derived relation”

• Rule heads are always IDBs

father
john | tony
peter | mary

mother
mary | bob

Page 35: CS 542 Overview of query processing

Computing IDB Relations Bottom-up

• As long as there is no negation of IDB subgoals, each IDB relation grows with each iteration

– At least, it does not shrink

• Since relations are finite, the loop eventually terminates

• For some rules there is no guarantee that the loop terminates.

– Considered unsafe

empty out all IDB relations
REPEAT
    FOR (each IDB predicate p) DO
        evaluate p using current values of all relations;
UNTIL (no IDB relation is changed)

Rule | Why unsafe?
isHappy(x) <- isRich(y) | We know y but the possibilities for x are infinite
Bachelor(x) <- NOT isMarried(x) | Negated, may remove x
IsCheap(x) <- x < 10 | Infinite possibilities
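The bottom-up loop can be sketched for the Sib and Cousin rules from the introductory example. This is a naive fixpoint computation under the assumption that Par(c, p) means p is a parent of c. The function names are invented for the example, and each rule is hand-translated into a Python set comprehension.

```python
# EDB: Par(child, parent), taken from the introductory example.
PAR = {
    ("sally", "john"),
    ("martha", "mary"),
    ("mary", "peter"),
    ("john", "peter"),
}


def eval_sib(par):
    """Sib(x, y) <- Par(x, p) AND Par(y, p) AND x <> y"""
    return {(x, y) for (x, p) in par for (y, q) in par if p == q and x != y}


def eval_cousin(par, sib, cousin):
    """Cousin(x, y) <- Sib(x, y)
    Cousin(x, y) <- Par(x, xp) AND Par(y, yp) AND Cousin(xp, yp)"""
    derived = set(sib)
    derived |= {
        (x, y) for (x, xp) in par for (y, yp) in par if (xp, yp) in cousin
    }
    return derived


def bottom_up(par):
    """Start with empty IDB relations and repeat until nothing changes."""
    sib, cousin = set(), set()
    while True:
        new_sib = eval_sib(par)
        new_cousin = eval_cousin(par, new_sib, cousin)
        if new_sib == sib and new_cousin == cousin:
            return sib, cousin
        sib, cousin = new_sib, new_cousin


sib, cousin = bottom_up(PAR)
print(sorted(sib))
print(sorted(cousin))
```

Because neither rule negates an IDB subgoal, each pass can only add tuples, and the finite EDB bounds how many can ever be added, so the loop stops.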

Page 36: CS 542 Overview of query processing

Computing IDB Relations Top-Down (p1)

• EDB: Par(c,p) = p is a parent of c.

• Generalized cousins: people with common ancestors one or more generations back:

Sib(x,y) <- Par(x,p) AND Par(y,p) AND x<>y

Cousin(x,y) <- Sib(x,y)

Cousin(x,y) <- Par(x,xp) AND Par(y,yp)

AND Cousin(xp,yp)

• Form a dependency graph whose nodes = IDB predicates.

• Arc X -> Y if and only if there is a rule with X in the head and Y in the body.

• Cycle = recursion; no cycle = no recursion.

Page 37: CS 542 Overview of query processing

Computing IDB Relations Top-down (p2)

• The recursion eventually terminates unless:

– A distinguished variable

1. does not appear in a subgoal

2. only appears in a negated subgoal

3. only appears in an arithmetic subgoal

– Same 3 conditions for variables in an arithmetic subgoal

– Same 3 conditions for variables in a negated subgoal

for IDB predicate p(x, y, …)
    FOR EACH subgoal of p DO
        IF subgoal is IDB, recursive call;
        IF subgoal is EDB, look up

Rule | Why unsafe?
isHappy(x) <- isRich(y) | We know x but the possibilities for y are infinite
Bachelor(x) <- NOT isMarried(x) | Negated, may result in infinite recursion
IsCheap(x) <- x < 10 | It's safe!

Page 38: CS 542 Overview of query processing

Safe Rules

• A rule is safe if:

1. Each distinguished variable,

2. Each variable in an arithmetic subgoal, and

3. Each variable in a negated subgoal,

also appears in a nonnegated, relational subgoal.

• Safe rules prevent infinite results.
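The safety condition is mechanical enough to check in a few lines. In this sketch a rule is described by its head variables and a list of subgoals, each tagged as "relational", "negated" or "arithmetic". That encoding and the function name is_safe are assumptions made for the example.

```python
def is_safe(head_vars, subgoals):
    """Check the three-part safety condition for a single rule.

    subgoals is a list of (kind, variables) pairs, where kind is one of
    "relational", "negated" or "arithmetic".
    """
    # Variables that appear in some nonnegated, relational subgoal.
    bound = set()
    for kind, variables in subgoals:
        if kind == "relational":
            bound.update(variables)

    # Variables that the definition requires to be bound.
    must_be_bound = set(head_vars)
    for kind, variables in subgoals:
        if kind in ("negated", "arithmetic"):
            must_be_bound.update(variables)

    return must_be_bound <= bound


# isHappy(x) <- isRich(y): x is in the head but in no relational subgoal.
happy_safe = is_safe(["x"], [("relational", ["y"])])

# Bachelor(x) <- NOT isMarried(x): x appears only in a negated subgoal.
bachelor_safe = is_safe(["x"], [("negated", ["x"])])

# Sib(x, y) <- Par(x, p) AND Par(y, p) AND x <> y
sib_safe = is_safe(
    ["x", "y"],
    [
        ("relational", ["x", "p"]),
        ("relational", ["y", "p"]),
        ("arithmetic", ["x", "y"]),
    ],
)

print(happy_safe, bachelor_safe, sib_safe)
```

A Bachelor rule becomes safe once a relational subgoal supplies the candidates, for example by adding a hypothetical Person(x) subgoal to the body.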

Page 39: CS 542 Overview of query processing

Evaluating Datalog Programs

• As long as there is no recursion, we can pick an order to evaluate the IDB predicates, so that all the predicates in the body of its rules have already been evaluated.

• If an IDB predicate has more than one rule, each rule contributes tuples to its relation.

Page 40: CS 542 Overview of query processing

Expressive Power of Datalog

• Without recursion, Datalog can express all and only the queries of core relational algebra.

– The same as SQL select-from-where, without aggregation and grouping.

• But with recursion, Datalog can express more than these languages.

• Yet still not Turing-complete.

Page 41: CS 542 Overview of query processing

SQL Rule Definitions & Usage

• Definition of Datalog Rules:

WITH

[RECURSIVE] <RuleName> (<arguments>)

AS <query>;

• Invocation of Datalog Rules:

<SQL query about EDB, IDB>

Page 42: CS 542 Overview of query processing

SQL Recursion Example (p1)

• Find Sally's cousins

– Using Recursive definition introduced earlier

– Par (child, parent) is the EDB

• Expected SQL Query

SELECT y

FROM Cousin

WHERE x = 'Sally';

• But first, we need to define the IDB Cousin

Page 43: CS 542 Overview of query processing

SQL Recursion Example (p2)

• WITH Clause (non-recursive)

WITH Sib(x, y) AS

(SELECT p1.child, p2.child

FROM Par p1, Par p2

WHERE p1.parent = p2.parent

AND p1.child <> p2.child);

• WITH Clause (recursive)

RECURSIVE Cousin(x, y) AS

(SELECT * FROM Sib)

UNION

(SELECT p1.child, p2.child

FROM Par p1, Par p2, Cousin

WHERE p1.parent = Cousin.x

AND p2.parent = Cousin.y);
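The two definitions can be combined into one runnable statement. The sketch assumes SQLite via Python's sqlite3 module, where a single WITH RECURSIVE keyword introduces all the definitions and the recursive relation is referenced once in the second half of the UNION. Other databases may want slightly different syntax. The Par rows are the facts from the introductory Datalog example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Par (child TEXT, parent TEXT)")
conn.executemany(
    "INSERT INTO Par VALUES (?, ?)",
    [
        ("sally", "john"),
        ("martha", "mary"),
        ("mary", "peter"),
        ("john", "peter"),
    ],
)

QUERY = """
WITH RECURSIVE
  Sib(x, y) AS (
    SELECT p1.child, p2.child
    FROM Par p1, Par p2
    WHERE p1.parent = p2.parent
      AND p1.child <> p2.child
  ),
  Cousin(x, y) AS (
    SELECT x, y FROM Sib
    UNION
    SELECT p1.child, p2.child
    FROM Par p1, Par p2, Cousin
    WHERE p1.parent = Cousin.x
      AND p2.parent = Cousin.y
  )
SELECT y FROM Cousin WHERE x = 'sally'
"""

sallys_cousins = [row[0] for row in conn.execute(QUERY)]
print(sallys_cousins)
```

UNION rather than UNION ALL matters here: it discards tuples that were already derived, which is what lets the recursion stop.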

Page 44: CS 542 Overview of query processing

Next meeting

• January 31

• Sections 7.1 – 7.3

• Sections 8.1, 8.3 – 8.4

• Discussion of presentation topic proposals