Databases and Information Systems 1 Prof. Dr. Stefan Böttcher Fakultät EIM, Institut für Informatik Universität Paderborn WS 2009 / 2010 Contents: selectivity query trees optimization goals query rewrite rules common sub-expression identification

Databases and Information Systems 1

Prof. Dr. Stefan Böttcher
Fakultät EIM, Institut für Informatik
Universität Paderborn
WS 2009 / 2010

Contents:
• selectivity
• query trees
• optimization goals
• query rewrite rules
• common sub-expression identification

Databases and Information Systems I – WS 2009/2010 – Logical Query Optimization

Query optimization in relational databases

Java → C (speed-up factor 2-5)
C → Assembler (speed-up factor 2-5), but
SQL → optimized SQL (speed-up factor 1-100 and more). Why?

- logical query optimization
- physical query optimization (next chapter)

Operators of the relational algebra

R1 U R2 = union of relations R1 and R2

R1 ∩ R2 = intersection of relations R1 and R2

R1 X R2 = cartesian product of relations R1 and R2

R1 |X|_C R2 = join of relations R1 and R2 with condition C

R1 - R2 = difference of relations R1 and R2

π_{A1,…,An}(R) = projection of relation R to the attributes A1, …, An

σ_C(R) = selection of the subset of tuples of relation R that match condition C

Selectivity of queries

Selectivity is significant for the size of intermediate results.

Selectivity of the selection with condition C:

  selectivity( σ_C(R) ) = | σ_C(R) | / | R |

Selectivity of the join with condition C:

  selectivity( R1 |X|_C R2 ) = | R1 |X|_C R2 | / ( | R1 | * | R2 | )

In practice, selectivity is estimated (e.g. based on samples or histograms).
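The two selectivity definitions can be tried out on small data. The following is a minimal sketch, assuming relations are stored as Python lists of tuples; the function names and the toy data are hypothetical and only serve as illustration. It computes exact selectivities by counting, whereas a real optimizer would estimate them.

```python
def selection_selectivity(relation, condition):
    """|σ_C(R)| / |R| for a relation given as a list of rows."""
    selected = [row for row in relation if condition(row)]
    return len(selected) / len(relation)


def join_selectivity(r1, r2, condition):
    """|R1 |X|_C R2| / (|R1| * |R2|), computed by a nested-loop join."""
    matches = sum(1 for a in r1 for b in r2 if condition(a, b))
    return matches / (len(r1) * len(r2))


# Hypothetical toy data: courses are (courseID, title),
# enrollments are (sID, courseID).
courses = [(1, "databases"), (2, "compilers"), (3, "databases"), (4, "logic")]
enrollments = [(10, 1), (11, 1), (12, 2), (13, 3), (14, 4), (15, 4)]

sel = selection_selectivity(courses, lambda c: c[1] == "databases")
join = join_selectivity(courses, enrollments, lambda c, e: c[0] == e[1])
print(sel)   # 2 of the 4 courses match
print(join)  # 6 matching pairs out of 4 * 6 = 24 candidate pairs
```

A low selectivity value means that the operator keeps only a small fraction of its input, which is why such operators are good candidates for early evaluation.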

Goal of logical query optimization

SQL queries of the form

  select A1,...,An from R1,...,Rm where C

correspond to the algebra expression

  π_{A1,...,An}( σ_C( R1 X ... X Rm ) )

Problem: the cartesian product yields very large intermediate results.

Task: obtain the same result with smaller intermediate results,
e.g. move selections and projections inside expressions as far as possible.
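To make the correspondence concrete, the canonical expression can be evaluated literally: build the full cartesian product, filter it, then project. The sketch below assumes relations are lists of dicts; the helper `canonical` and the toy data are hypothetical.

```python
from itertools import product

# Hypothetical toy relations as lists of dicts.
B = [{"sID": 1, "lastName": "Meier"}, {"sID": 2, "lastName": "Schmidt"}]
E = [{"sID": 1, "courseID": 7}, {"sID": 2, "courseID": 8}, {"sID": 2, "courseID": 7}]


def canonical(relations, condition, attributes):
    """Evaluate π_attributes(σ_condition(R1 X ... X Rm)) step by step."""
    combos = list(product(*relations))                  # full cartesian product
    selected = [c for c in combos if condition(*c)]     # selection on top
    result = {tuple(attributes(*c)) for c in selected}  # projection (set semantics)
    return len(combos), result


size, names = canonical(
    [B, E],
    lambda b, e: b["sID"] == e["sID"] and e["courseID"] == 7,
    lambda b, e: (b["lastName"],),
)
print(size)   # number of tuples in the intermediate cartesian product
print(names)
```

Even on this tiny input the product holds 6 tuples for a 2-tuple result; the gap grows multiplicatively with every additional relation in the from clause.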

Algebra tree for SQL query – example

Write a letter to each student enrolled in a database course:

select B.firstName, B.lastName
from bachelorStudent B, enroll E, course C
where C.title = 'databases' and C.courseID = E.courseID and B.sID = E.sID

      π B.firstName, B.lastName
      |
      σ C.title = 'databases' and C.courseID = E.courseID and B.sID = E.sID
      |
      X
     / \
    X   C
   / \
  B   E

Logical query optimization – example (2)

Assumptions: university database with the following relations:

bachelorStudent B : 10,000 bachelor students, each taking 5 courses on average
enroll E          : 50,000 enrollments
course C          : 1,000 courses, 2 of which have the title 'databases' and have 100 enrolled students each

SQL query: return firstName and lastName of all bachelor students in a database course

select B.firstName, B.lastName
from bachelorStudent B, enroll E, course C
where C.title = 'databases' and C.courseID = E.courseID and B.sID = E.sID

Logical query optimization – example (3)

SQL query: return firstName and lastName of all bachelor students in a database course

select B.firstName, B.lastName
from bachelorStudent B, enroll E, course C
where C.title = 'databases' and C.courseID = E.courseID and B.sID = E.sID

      π B.firstName, B.lastName
      |
      σ C.title = 'databases' and C.courseID = E.courseID and B.sID = E.sID
      |
      X
     / \
    X   C
   / \
  B   E

Assumptions: 10,000 bachelor students, each taking 5 courses on average; 1,000 courses, 2 of which have the title 'databases' and have 100 enrolled students each.

Logical query optimization – example (4)

select B.firstName, B.lastName
from bachelorStudent B, enroll E, course C
where C.title = 'databases' and C.courseID = E.courseID and B.sID = E.sID

      π B.firstName, B.lastName
      |
      σ C.title = 'databases' and C.courseID = E.courseID and B.sID = E.sID
      |
      X
     / \
    X   C
   / \
  B   E

Sizes of the (intermediate) results in tuples:

  B                    10,000
  E                    50,000
  C                    1,000
  B X E                500,000,000
  (B X E) X C          500,000,000,000
  after selection σ    200
  after projection π   200

Assumptions: 10,000 bachelor students, each taking 5 courses on average; 1,000 courses, 2 of which have the title 'databases' and have 100 enrolled students each.
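The sizes of the unoptimized plan follow directly from the assumptions, since a cartesian product multiplies the sizes of its arguments. A short sketch of the arithmetic (the variable names are chosen for this illustration only):

```python
# Relation sizes taken from the assumptions above.
B, E, C = 10_000, 50_000, 1_000

b_x_e = B * E            # cartesian product B X E
b_x_e_x_c = b_x_e * C    # cartesian product (B X E) X C
result = 2 * 100         # 2 'databases' courses with 100 enrolled students each

print(f"{b_x_e:,}")      # 500,000,000
print(f"{b_x_e_x_c:,}")  # 500,000,000,000
print(result)            # 200
```

Only 200 of the 500,000,000,000 intermediate tuples survive the selection, which is the motivation for evaluating selections earlier.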

Logical query optimization – example (5)

A possible optimization:

                π B.firstName, B.lastName
                |
               |X| C.courseID = E.courseID
              /    \
            |X|     σ C.title = 'databases'
      B.sID = E.sID |
          /   \     C
         B     E

Sizes annotated in the tree (in tuples):

  B                        10,000
  E                        50,000
  C                        1,000
  B |X| E                  250,000 as annotated (by the stated assumptions, each of the 50,000 enrollments joins with exactly one student, which would give 50,000)
  result of upper join     200
  after projection π       200

select B.firstName, B.lastName
from bachelorStudent B, enroll E, course C
where C.title = 'databases' and C.courseID = E.courseID and B.sID = E.sID

Assumptions: 10,000 bachelor students, each taking 5 courses on average; 1,000 courses, 2 of which have the title 'databases' and have 100 enrolled students each.

Logical query optimization – example (6)

A better optimization:

      π B.firstName, B.lastName
      |
     |X| B.sID = E.sID
    /   \
   B    |X| C.courseID = E.courseID
       /   \
      E     σ C.title = 'databases'
            |
            C

Sizes of the (intermediate) results in tuples:

  C                              1,000
  σ C.title = 'databases' (C)    2
  E                              50,000
  E |X| σ(C)                     200
  B                              10,000
  B |X| ( E |X| σ(C) )           200
  after projection π             200

Assumptions: 10,000 bachelor students, each taking 5 courses on average; 1,000 courses, 2 of which have the title 'databases' and have 100 enrolled students each.

select B.firstName, B.lastName
from bachelorStudent B, enroll E, course C
where C.title = 'databases' and C.courseID = E.courseID and B.sID = E.sID
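The sizes of this plan can be checked on synthetic data. The sketch below builds relations that match the stated assumptions; the concrete IDs, names and the enrollment pattern are invented for illustration, and the joins are written as simple lookups rather than as a real join algorithm.

```python
# Synthetic data matching the assumptions (all concrete values are invented).
B = [(s, f"first{s}", f"last{s}") for s in range(10_000)]   # (sID, firstName, lastName)
C = [(c, "databases" if c < 2 else f"course{c}")             # (courseID, title)
     for c in range(1_000)]

E = []  # (sID, courseID)
for s in range(10_000):
    own = []
    if s < 200:
        own.append(s // 100)   # students 0-99 take course 0, students 100-199 course 1
    k = 0
    while len(own) < 5:        # fill up to 5 courses with non-database courses
        own.append(2 + (s * 5 + k) % 998)
        k += 1
    E.extend((s, c) for c in own)

# Better plan: select on C first, then join with E, then join with B.
db_courses = {c for (c, title) in C if title == "databases"}
e_join_c = [(s, c) for (s, c) in E if c in db_courses]
students = {s: (first, last) for (s, first, last) in B}
result = [students[s] for (s, c) in e_join_c]

print(len(E), len(db_courses), len(e_join_c), len(result))
```

Under these assumptions no intermediate result of the plan is larger than its largest input relation.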

Rules of logical query optimization (1)

Union, intersection, cartesian product and join are commutative and associative.

R1 U R2 = R2 U R1
R1 ∩ R2 = R2 ∩ R1
R1 X R2 = R2 X R1
R1 |X|_C R2 = R2 |X|_C R1

( R1 U R2 ) U R3 = R1 U ( R2 U R3 )
( R1 ∩ R2 ) ∩ R3 = R1 ∩ ( R2 ∩ R3 )
( R1 X R2 ) X R3 = R1 X ( R2 X R3 )
( R1 |X|_C1 R2 ) |X|_C2 R3 = R1 |X|_C1 ( R2 |X|_C2 R3 )

The last rule requires that C1 only uses attributes of R1 and R2, and that C2 only uses attributes of R2 and R3.
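These laws can be checked on small examples. The sketch below assumes that a tuple is modelled as an order-independent set of (attribute, value) pairs, so that the attribute order of a product does not matter; the helpers `row`, `product` and `join` are hypothetical names for this illustration.

```python
def row(**attrs):
    """A tuple as an order-independent set of (attribute, value) pairs."""
    return frozenset(attrs.items())


def product(r1, r2):
    """Cartesian product; attribute names of r1 and r2 must be disjoint."""
    return {a | b for a in r1 for b in r2}


def join(r1, r2, cond):
    """Join with condition: product followed by a filter on the combined tuple."""
    return {t for t in product(r1, r2) if cond(dict(t))}


R1 = {row(a=1), row(a=2)}
R2 = {row(b=1), row(b=3)}
R3 = {row(c=3), row(c=4)}

c1 = lambda t: t["a"] == t["b"]   # uses attributes of R1 and R2 only
c2 = lambda t: t["b"] <= t["c"]   # uses attributes of R2 and R3 only

# Commutativity
assert product(R1, R2) == product(R2, R1)
assert join(R1, R2, c1) == join(R2, R1, c1)

# Associativity
assert product(product(R1, R2), R3) == product(R1, product(R2, R3))
assert join(join(R1, R2, c1), R3, c2) == join(R1, join(R2, R3, c2), c1)
```

A passing check on one example is of course no proof, but it helps to see which side conditions the rules rely on.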

Rules of logical query optimization (2)

Whenever the selection condition is a conjunction, the selection can be split into separate selections, and their order can be swapped:

σ_{C1 and C2}(R) = σ_C1( σ_C2(R) ) = σ_C2( σ_C1(R) )

Push selections inside union, difference and intersection:

σ_C( R1 U R2 ) = σ_C(R1) U σ_C(R2)
σ_C( R1 - R2 ) = σ_C(R1) - σ_C(R2)
σ_C( R1 ∩ R2 ) = σ_C(R1) ∩ σ_C(R2)
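The selection rules can be verified with Python sets standing in for relations. This is a minimal sketch; the relations and the conditions c1 and c2 are hypothetical examples.

```python
def select(cond, relation):
    """σ_cond(relation) for a relation given as a Python set."""
    return {t for t in relation if cond(t)}


R1 = {1, 2, 3, 4, 5, 6}
R2 = {4, 5, 6, 7, 8}
c1 = lambda t: t % 2 == 0   # hypothetical condition C1
c2 = lambda t: t > 3        # hypothetical condition C2

# Splitting a conjunction and swapping the order of the selections.
both = select(lambda t: c1(t) and c2(t), R1)
assert both == select(c1, select(c2, R1)) == select(c2, select(c1, R1))

# Pushing a selection inside union, difference and intersection.
assert select(c1, R1 | R2) == select(c1, R1) | select(c1, R2)
assert select(c1, R1 - R2) == select(c1, R1) - select(c1, R2)
assert select(c1, R1 & R2) == select(c1, R1) & select(c1, R2)
print(both)
```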

 

Rules of logical query optimization (3)

Push a selection inside a join, i.e. to a join argument:

σ_C( R1 |X|_C2 R2 ) = σ_C(R1) |X|_C2 R2    if C only uses attributes of R1

Push a selection inside an argument of a cartesian product:

σ_C( R1 X R2 ) = σ_C(R1) X R2    if C only uses attributes of R1

If this is impossible for both R1 and R2, i.e. C uses attributes of R1 and of R2, replace the selection applied to the cartesian product with a join:

σ_C( R1 X R2 ) = R1 |X|_C R2
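The effect of these rules can be seen on a small example. The sketch below uses nested loops for product and join and hypothetical toy relations; it compares a selection applied on top of a join with the same selection pushed into the join argument.

```python
def product(r1, r2):
    return {(a, b) for a in r1 for b in r2}


def join(r1, r2, cond):
    return {(a, b) for a in r1 for b in r2 if cond(a, b)}


# Hypothetical toy relations: R1 rows are (id, title), R2 rows are (sID, id).
R1 = {(1, "databases"), (2, "logic")}
R2 = {(10, 1), (11, 1), (12, 2)}

only_r1 = lambda a: a[1] == "databases"   # condition using attributes of R1 only
both = lambda a, b: a[0] == b[1]          # condition using attributes of R1 and R2

# Selection on top of a join vs. selection pushed into the R1 argument.
on_top = {(a, b) for (a, b) in join(R1, R2, both) if only_r1(a)}
pushed = join({a for a in R1 if only_r1(a)}, R2, both)
assert on_top == pushed

# A selection over a cartesian product that needs both sides becomes a join.
assert {(a, b) for (a, b) in product(R1, R2) if both(a, b)} == join(R1, R2, both)
print(len(product(R1, R2)), len(join(R1, R2, both)), len(pushed))
```

The pushed variant feeds only the matching R1 tuples into the join, so the join has less input to process while producing the same result.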

Rules of logical query optimization (4)

The order of projection and selection can be swapped if the projection yields all attributes needed for the selection condition:

σ_C( π_{A1,...,Am}(R1) ) = π_{A1,...,Am}( σ_C(R1) )    if C only uses attributes from A1,...,Am

Push projection inside union:

π_{A1,...,Am}( R1 U R2 ) = π_{A1,...,Am}(R1) U π_{A1,...,Am}(R2)

Push projection into a join, i.e. apply it to a join argument, if the join attributes are contained in the pushed-down projection:

π_{A1,...,Am}( R1 |X|_C R2 ) = π_{A1,...,Am}( π_{A1,...,Am,AC1,...,ACn}(R1) |X|_C R2 )

where AC1,...,ACn are the attributes of R1 needed to check the join condition C.

Projections can be combined, and additional projections can be inserted:

π_{A1,...,Am}(R1) = π_{A1,...,Am}( π_{A1,...,Am,AC1,...,ACn}(R1) )
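The projection rules can be illustrated in the same style. This sketch assumes rows are dicts and projections return sets, so duplicates disappear; all helper names, attribute names and data are hypothetical. The inner projection keeps the wanted attribute lastName plus the join attribute sID.

```python
def project(attrs, rows):
    """π_attrs(rows); rows are dicts, duplicates vanish in the result set."""
    return {tuple((a, r[a]) for a in attrs) for r in rows}


def as_rows(projected):
    """Turn a projection result back into a list of dict rows."""
    return [dict(t) for t in projected]


def join(r1, r2, cond):
    return [{**a, **b} for a in r1 for b in r2 if cond(a, b)]


# Hypothetical toy relations.
R1 = [{"sID": 1, "lastName": "Meier", "city": "Paderborn"},
      {"sID": 2, "lastName": "Schmidt", "city": "Bielefeld"}]
R2 = [{"eID": 1, "courseID": 7}, {"eID": 1, "courseID": 8}]
cond = lambda a, b: a["sID"] == b["eID"]

# Projection on top of the join vs. an extra projection pushed into R1.
direct = project(["lastName"], join(R1, R2, cond))
pushed = project(["lastName"],
                 join(as_rows(project(["lastName", "sID"], R1)), R2, cond))
assert direct == pushed

# An additional inner projection does not change the outer projection.
assert project(["lastName"], R1) == project(
    ["lastName"], as_rows(project(["lastName", "sID"], R1)))
print(direct)
```

The pushed-down projection drops the attribute city before the join, which makes the tuples flowing into the join narrower.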

Logical query optimization - steps

Represent the SQL query as a logical query tree.

Apply the following optimizations to this query tree:

• split and push down selections

• combine selections and cartesian products into joins

• determine the join sequence with the smallest intermediate results

• where possible, push down and insert projections
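The first two steps can be sketched in a few lines. The code below is an illustrative toy, not a complete optimizer: the tuple-based tree encoding, the function names, and the representation of each conjunct as a pair (aliases used, condition text) are all assumptions made for this sketch. It neither chooses a join order nor handles projections.

```python
def aliases(node):
    """Aliases of all base relations below a tree node."""
    kind = node[0]
    if kind == "rel":
        return {node[1]}
    if kind == "product":
        return aliases(node[1]) | aliases(node[2])
    if kind == "select":
        return aliases(node[2])
    return aliases(node[2]) | aliases(node[3])   # join


def push_down(conds, node):
    """Push the conjuncts in conds as far down into node as possible."""
    if not conds:
        return node
    if node[0] == "product":
        left, right = node[1], node[2]
        to_left = [c for c in conds if c[0] <= aliases(left)]
        to_right = [c for c in conds if c[0] <= aliases(right)]
        rest = [c for c in conds if c not in to_left and c not in to_right]
        new_left, new_right = push_down(to_left, left), push_down(to_right, right)
        if rest:   # a selection over a product becomes a join
            return ("join", rest, new_left, new_right)
        return ("product", new_left, new_right)
    return ("select", conds, node)   # the selection stays directly above this node


def show(node):
    """Render a tree as a one-line algebra expression."""
    kind = node[0]
    if kind == "rel":
        return node[1]
    if kind == "product":
        return f"({show(node[1])} X {show(node[2])})"
    cond = " and ".join(text for _, text in node[1])
    if kind == "select":
        return f"σ[{cond}]({show(node[2])})"
    return f"({show(node[2])} |X|[{cond}] {show(node[3])})"


# The example query: selection with three conjuncts over (B X E) X C.
B, E, C = ("rel", "B"), ("rel", "E"), ("rel", "C")
tree = ("product", ("product", B, E), C)
conds = [
    (frozenset({"C"}), "C.title = 'databases'"),
    (frozenset({"C", "E"}), "C.courseID = E.courseID"),
    (frozenset({"B", "E"}), "B.sID = E.sID"),
]
plan = push_down(conds, tree)
print(show(plan))
```

For the example query this yields a join of B and E on B.sID = E.sID, joined on C.courseID = E.courseID with the selection on C, i.e. a plan in which both cartesian products have been replaced by joins. Choosing a cheaper join order requires size estimates and is left out here.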

Logical query optimization - exercises

1. Represent the following SQL query as a logical query tree:

   select E.courseID from bachelorStudent B, enroll E where B.lastName = 'Meier' and B.sID = E.sID

   Optimize it step by step and write down the optimized logical query.

2. Assume 1,000 courses and 10,000 bachelor students, each taking 5 courses on average; 4 of the students have lastName 'Meier'.

   Compute the selectivity of the selection and of the join in the optimized query.

Finding common sub-expressions - goal

Given: an SQL query represented as a logical query tree that contains two sub-trees R1 and R2, each the argument of an operator op.

• R1 = R2 ? If yes: reuse!

• R1 ⊆ R2 ? If yes: reuse!

• otherwise: recompute

Finding common sub-expressions – (2)

R1 = R2 ? → reuse!
R1 ⊆ R2 ? → reuse!
otherwise → recompute

R1 = R2 ? Normalize R1 and R2 by applying algebra rules, then compare the normalized queries.

R1 ⊆ R2 ? Examples:

σ_C1(R) ⊆ σ_C2(R)    if C1 implies C2 ( i.e. (not C1 or C2) = true )

R1 |X|_C R2 ⊆ R1 |X| R2
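The containment of selections can be illustrated as follows. In this sketch the conditions c1 and c2 are hypothetical, with c1 implying c2; it also shows why containment allows reuse: the smaller result can be derived from the stored larger one without reading R again.

```python
def select(cond, relation):
    """σ_cond(relation) for a relation given as a Python set."""
    return {t for t in relation if cond(t)}


R = set(range(20))
c1 = lambda t: t > 10   # stronger condition
c2 = lambda t: t > 5    # weaker condition; c1 implies c2

small, large = select(c1, R), select(c2, R)
assert small <= large                 # σ_C1(R) ⊆ σ_C2(R)

# Reuse: the smaller result is obtained by filtering the larger one.
assert select(c1, large) == small
print(len(small), len(large))
```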

Finding common sub-expressions – (3)

R1 ⊆ R2 ? → reuse!
otherwise → recompute

Use the monotonicity of union, intersection, join, cartesian product, selection and projection.

If R1 ⊆ R3 and R2 ⊆ R4 then

R1 U R2 ⊆ R3 U R4
R1 ∩ R2 ⊆ R3 ∩ R4
R1 X R2 ⊆ R3 X R4
R1 |X|_C R2 ⊆ R3 |X|_C R4
R1 - R4 ⊆ R3 - R2
π_{A1,…,An}(R1) ⊆ π_{A1,…,An}(R3)
σ_C(R1) ⊆ σ_C(R3)
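The monotonicity rules can be checked on small sets of tuples. The relations, the join condition, the selection condition and the projected column in this sketch are hypothetical choices; note that for the difference the subtracted relations are swapped, because subtracting more leaves less.

```python
# Hypothetical relations of (number, letter) tuples with R1 ⊆ R3 and R2 ⊆ R4.
R1 = {(1, "x"), (2, "y")}
R3 = R1 | {(3, "x")}
R2 = {(2, "y"), (5, "z")}
R4 = R2 | {(3, "x")}


def product(a, b):
    return {(s, t) for s in a for t in b}


def join(a, b):                 # join condition: equal first attribute
    return {(s, t) for s in a for t in b if s[0] == t[0]}


def select(r):                  # selection condition: first attribute > 1
    return {t for t in r if t[0] > 1}


def project(r):                 # projection to the first attribute
    return {t[0] for t in r}


assert R1 | R2 <= R3 | R4
assert R1 & R2 <= R3 & R4
assert product(R1, R2) <= product(R3, R4)
assert join(R1, R2) <= join(R3, R4)
assert R1 - R4 <= R3 - R2       # difference: the subtracted sides are swapped
assert project(R1) <= project(R3)
assert select(R1) <= select(R3)
```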

Summary – logical query optimization

Goal: minimize intermediate results

SQL query → logical query tree → apply transformation rules → search for common sub-expressions