Databases and Information Systems 1
Prof. Dr. Stefan Böttcher
Fakultät EIM, Institut für Informatik
Universität Paderborn
WS 2009 / 2010
Contents:
• selectivity
• query trees
• optimization goals
• query rewrite rules
• common sub-expression identification
Databases and Information Systems I – WS 2009/2010 – Logical Query Optimization
Query optimization in relational databases
Java → C (speed-up factor 2-5)
C → Assembler (speed-up factor 2-5)
but SQL → optimized SQL (speed-up factor 1-100 and more). Why?
- logical query optimization
- physical query optimization (next chapter)
Operators of the relational algebra
R1 U R2 = union of relations R1 and R2
R1 ∩ R2 = intersection of relations R1 and R2
R1 X R2 = cartesian product of relations R1 and R2
R1 |X|_C R2 = join of relations R1 and R2 with condition C
R1 - R2 = difference of relations R1 and R2
π_{A1,…,An}( R ) = projection of relation R to the attributes A1, …, An
σ_C( R ) = selection of the subset of tuples of relation R that match condition C
Selectivity of queries
significant for size of intermediate result
selectivity of the selection with condition C:
selectivity( σ_C( R ) ) = | σ_C( R ) | / | R |
selectivity of the join with condition C:
selectivity( R1 |X|_C R2 ) = | R1 |X|_C R2 | / ( | R1 | * | R2 | )
estimated (e.g. based on samples, histograms)
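The two definitions can be tried out on small in-memory relations. The following is a minimal sketch, assuming relations are plain Python lists of dictionaries; the function names and the toy data are hypothetical and not part of the lecture.

```python
from itertools import product


def selection_selectivity(relation, condition):
    """Return |sigma_C(R)| / |R| for a relation given as a list of rows."""
    if not relation:
        return 0.0
    return sum(1 for row in relation if condition(row)) / len(relation)


def join_selectivity(r1, r2, condition):
    """Return |R1 join_C R2| / (|R1| * |R2|)."""
    if not r1 or not r2:
        return 0.0
    matches = sum(1 for a, b in product(r1, r2) if condition(a, b))
    return matches / (len(r1) * len(r2))


# Hypothetical toy data, not the relation sizes used in the lecture example.
course = [
    {"courseID": 1, "title": "databases"},
    {"courseID": 2, "title": "algebra"},
    {"courseID": 3, "title": "logic"},
    {"courseID": 4, "title": "databases"},
]
enroll = [
    {"sID": 10, "courseID": 1},
    {"sID": 11, "courseID": 1},
    {"sID": 10, "courseID": 3},
]

# 2 of 4 courses match the selection; 3 of 12 (enroll, course) pairs match the join.
sel = selection_selectivity(course, lambda c: c["title"] == "databases")
join_sel = join_selectivity(
    enroll, course, lambda e, c: e["courseID"] == c["courseID"]
)
print(sel, join_sel)
```

A real system does not scan the data like this; as noted above, it estimates these values, e.g. from samples or histograms.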
Goal of logical query optimization
SQL queries of the form
select A1,...,An from R1,...,Rm where C
correspond to the algebra expression
π_{A1,...,An}( σ_C( R1 X ... X Rm ) )
→ very large intermediate results
task: obtain the same result with smaller intermediate results
e.g. move selection and projection inside expressions as far as possible
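To see where the large intermediate result comes from, the unoptimized expression can be evaluated literally. This is a sketch under assumptions: rows are dictionaries with qualified attribute names, the helper names are hypothetical, and the data is a toy instance.

```python
from itertools import product


def cartesian(*relations):
    """R1 X ... X Rm: merge one row of every relation into a combined row."""
    combined_rows = []
    for rows in product(*relations):
        combined = {}
        for row in rows:
            combined.update(row)
        combined_rows.append(combined)
    return combined_rows


def select(rows, condition):
    """sigma_C: keep the rows that satisfy the condition."""
    return [row for row in rows if condition(row)]


def project(rows, attributes):
    """pi_A with set semantics: duplicates collapse."""
    return {tuple(row[a] for a in attributes) for row in rows}


# Hypothetical toy relations with qualified attribute names.
B = [
    {"B.sID": 1, "B.lastName": "Meier"},
    {"B.sID": 2, "B.lastName": "Schmidt"},
    {"B.sID": 3, "B.lastName": "Weber"},
]
E = [
    {"E.sID": 1, "E.courseID": 7},
    {"E.sID": 3, "E.courseID": 7},
    {"E.sID": 3, "E.courseID": 8},
]

intermediate = cartesian(B, E)  # |B| * |E| rows, regardless of the condition
result = project(
    select(intermediate, lambda r: r["B.sID"] == r["E.sID"] and r["E.courseID"] == 7),
    ["B.lastName"],
)
print(len(intermediate), sorted(result))
```

The product always has |B| * |E| rows, although only a few of them survive the selection.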
Algebra tree for SQL query – example
Write a letter to each student enrolled in a database course
select B.firstName, B.lastName
from bachelorStudent B, enroll E, course C
where C.title = 'databases' and C.courseID = E.courseID and B.sID = E.sID

        π B.firstName, B.lastName
        |
        σ C.title = 'databases' and C.courseID = E.courseID and B.sID = E.sID
        |
        X
       / \
      X   C
     / \
    B   E
Logical query optimization – example (2)
assumptions: university database with the following relations
bachelorStudent B : 10000 bachelor students, each taking 5 courses on average
enroll E : 50000 enrollments
course C : 1000 courses, 2 of which have title 'databases' and have 100 enrolled students each
SQL query: return firstName and lastName of all bachelor students in a database course
select B.firstName, B.lastName
from bachelorStudent B, enroll E, course C
where C.title = 'databases' and C.courseID = E.courseID and B.sID = E.sID
Logical query optimization – example (4)
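        π B.firstName, B.lastName
        |
        σ C.title = 'databases' and C.courseID = E.courseID and B.sID = E.sID
        |
        X
       / \
      X   C
     / \
    B   E

result sizes (number of tuples):
B : 10,000
E : 50,000
C : 1,000
B X E : 500,000,000
( B X E ) X C : 500,000,000,000
after the selection : 200
after the projection : 200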
Logical query optimization – example (5)
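A possible optimization:

                   π B.firstName, B.lastName
                   |
                  |X| C.courseID = E.courseID
                 /   \
B.sID = E.sID  |X|    σ C.title = 'databases'
               / \    |
              B   E   C

result sizes (number of tuples):
B : 10,000
E : 50,000
C : 1,000
B |X| E : 50,000
σ( C ) : 2
upper join : 200
after the projection : 200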
Logical query optimization – example (6)
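A better optimization:

   π B.firstName, B.lastName
   |
  |X| B.sID = E.sID
  / \
 B  |X| C.courseID = E.courseID
    / \
   E   σ C.title = 'databases'
       |
       C

result sizes (number of tuples):
B : 10,000
E : 50,000
C : 1,000
σ( C ) : 2
E |X| σ( C ) : 200
B |X| ( E |X| σ( C ) ) : 200
after the projection : 200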
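The intermediate result sizes of the unoptimized tree and the two optimized trees can be recomputed from the stated assumptions. The sketch below is plain arithmetic; the variable names are hypothetical, and it additionally assumes that every enrollment refers to exactly one bachelor student, so that the join of B and E on sID has as many tuples as E.

```python
# Cardinalities from the assumptions of the example.
B, E, C = 10_000, 50_000, 1_000
db_courses = 2                       # courses with title 'databases'
db_enrollments = db_courses * 100    # 100 enrolled students per database course

# Unoptimized tree: selection on top of two cartesian products.
naive = {
    "B X E": B * E,
    "(B X E) X C": B * E * C,
    "selection": db_enrollments,
}

# Possible optimization: join B and E first, select on C, then join.
# Assumption: each enrollment matches exactly one student, so |B join E| = |E|.
possible = {
    "B join E": E,
    "sigma(C)": db_courses,
    "upper join": db_enrollments,
}

# Better optimization: select on C first, join with E, then join with B.
better = {
    "sigma(C)": db_courses,
    "E join sigma(C)": db_enrollments,
    "B join (E join sigma(C))": db_enrollments,
}


def largest_intermediate(plan):
    """Size of the largest intermediate result produced by a plan."""
    return max(plan.values())


for name, plan in [("naive", naive), ("possible", possible), ("better", better)]:
    print(name, largest_intermediate(plan))
```

All three plans return the same 200 tuples; they differ only in how large the intermediate results become on the way.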
Rules of logical query optimization (1)
Union, intersection, cartesian product and join are commutative and associative.
R1 U R2 = R2 U R1
R1 ∩ R2 = R2 ∩ R1
R1 X R2 = R2 X R1
R1 |X|_C R2 = R2 |X|_C R1
( R1 U R2 ) U R3 = R1 U ( R2 U R3 )
( R1 ∩ R2 ) ∩ R3 = R1 ∩ ( R2 ∩ R3 )
( R1 X R2 ) X R3 = R1 X ( R2 X R3 )
( R1 |X|_C1 R2 ) |X|_C2 R3 = R1 |X|_C1 ( R2 |X|_C2 R3 )
if C1 only uses attributes of R1 and R2, and C2 only uses attributes of R2 and R3
Rules of logical query optimization (2)
whenever the selection condition is a conjunction, the selection can be split into separate selections and their order can be swapped:
σ_{C1 and C2}( R ) = σ_C1( σ_C2( R ) ) = σ_C2( σ_C1( R ) )
push selections inside union, difference and intersection:
σ_C( R1 U R2 ) = σ_C( R1 ) U σ_C( R2 )
σ_C( R1 - R2 ) = σ_C( R1 ) - σ_C( R2 )
σ_C( R1 ∩ R2 ) = σ_C( R1 ) ∩ σ_C( R2 )
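These selection rules can be checked on a small instance. The following sketch models relations as Python sets of integers and conditions as predicates; the data and the names `sigma`, `c1` and `c2` are hypothetical. A single instance is a sanity check, not a proof of the rules.

```python
def sigma(condition, relation):
    """Selection on a relation modelled as a set."""
    return {t for t in relation if condition(t)}


def c1(x):
    return x % 2 == 0


def c2(x):
    return x > 3


def c1_and_c2(x):
    return c1(x) and c2(x)


# Hypothetical relations with a single integer attribute.
R1 = {1, 2, 3, 4, 5, 6}
R2 = {4, 5, 6, 7, 8}

# Splitting a conjunction and swapping the order of the selections.
split_ok = sigma(c1_and_c2, R1) == sigma(c1, sigma(c2, R1)) == sigma(c2, sigma(c1, R1))

# Pushing the selection inside union, difference and intersection.
union_ok = sigma(c1, R1 | R2) == sigma(c1, R1) | sigma(c1, R2)
difference_ok = sigma(c1, R1 - R2) == sigma(c1, R1) - sigma(c1, R2)
intersection_ok = sigma(c1, R1 & R2) == sigma(c1, R1) & sigma(c1, R2)

print(split_ok, union_ok, difference_ok, intersection_ok)
```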
Rules of logical query optimization (3)
push selection inside a join, i.e. to a join argument:
σ_C( R1 |X|_C2 R2 ) = σ_C( R1 ) |X|_C2 R2
if C only uses attributes of R1
push selection inside an argument of a cartesian product:
σ_C( R1 X R2 ) = σ_C( R1 ) X R2
if C only uses attributes of R1
if this is impossible for both R1 and R2, i.e. C uses attributes of R1 and of R2: substitute the selection applied to the cartesian product with a join
σ_C( R1 X R2 ) = R1 |X|_C R2
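The two rules for cartesian products can be illustrated on a toy instance. In this sketch, relations are lists of tuples, the attributes of R1 come first in every combined tuple, and all names and data are hypothetical.

```python
from itertools import product

# Hypothetical relations: R1(id, label) and R2(ref, tag).
R1 = [(1, "a"), (2, "b"), (3, "c")]
R2 = [(1, "x"), (3, "y"), (3, "z")]


def cross(r1, r2):
    """Cartesian product; R1 attributes occupy the first positions."""
    return [a + b for a, b in product(r1, r2)]


def sigma(condition, relation):
    return [t for t in relation if condition(t)]


def join(r1, r2, condition):
    """Join that tests the condition while pairing, without building the product."""
    return [a + b for a in r1 for b in r2 if condition(a + b)]


def only_r1(t):
    return t[0] >= 2          # uses only the attribute id of R1


def both(t):
    return t[0] == t[2]       # id of R1 = ref of R2, uses both arguments


# Selection that uses only R1 attributes: push it into the R1 argument.
pushed_ok = sigma(only_r1, cross(R1, R2)) == cross(sigma(only_r1, R1), R2)

# Selection that uses attributes of both arguments: replace by a join.
joined = join(R1, R2, both)
join_ok = sigma(both, cross(R1, R2)) == joined

print(pushed_ok, join_ok, joined)
```

The join variant never materializes the 9-tuple product as a list, which is the point of the substitution.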
Rules of logical query optimization (4)
the order of projection and selection can be swapped, if the projection yields all attributes needed for the selection condition:
σ_C( π_{A1,...,Am}( R1 ) ) = π_{A1,...,Am}( σ_C( R1 ) )
if C only uses attributes of A1,...,Am.
push projection inside union:
π_{A1,...,Am}( R1 U R2 ) = π_{A1,...,Am}( R1 ) U π_{A1,...,Am}( R2 )
push projection into the join, i.e. apply it to a join argument, if the join attributes are contained in the projection:
π_{A1,...,Am}( R1 |X|_C R2 ) = π_{A1,...,Am}( ( π_{A1,...,Am,AC1,...,ACn}( R1 ) ) |X|_C R2 )
where AC1,...,ACn are the attributes of R1 needed to check the join condition C, and the inner projection keeps only those of A1,...,Am that belong to R1.
projections can be combined and inserted additionally:
π_{A1,...,Am}( R1 ) = π_{A1,...,Am}( π_{A1,...,Am,AC1,...,ACn}( R1 ) )
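The rule for pushing a projection into a join argument can be checked on a small example. This is a sketch with hypothetical helper names and toy data; rows are dictionaries, and the projection uses set semantics.

```python
def project(rows, attributes):
    """pi_A with set semantics; returns rows restricted to the given attributes."""
    unique = {tuple(row[a] for a in attributes) for row in rows}
    return [dict(zip(attributes, values)) for values in unique]


def join(r1, r2, condition):
    return [{**a, **b} for a in r1 for b in r2 if condition(a, b)]


def as_set(rows, attributes):
    """Compare results independently of row order."""
    return {tuple(row[a] for a in attributes) for row in rows}


# Hypothetical toy relations.
B = [
    {"sID": 1, "firstName": "Anna", "lastName": "Meier"},
    {"sID": 2, "firstName": "Ben", "lastName": "Meier"},
    {"sID": 3, "firstName": "Cara", "lastName": "Weber"},
]
E = [
    {"eSID": 1, "courseID": 7},
    {"eSID": 2, "courseID": 7},
    {"eSID": 2, "courseID": 8},
]


def same_student(b, e):
    return b["sID"] == e["eSID"]


# Left-hand side: project after joining the full relations.
direct = as_set(join(B, E, same_student), ["lastName"])

# Right-hand side: first reduce B to the requested attribute plus the join
# attribute sID, then join, then apply the outer projection.
narrow_B = project(B, ["lastName", "sID"])
pushed = as_set(join(narrow_B, E, same_student), ["lastName"])

print(direct == pushed, direct)
```

Keeping sID in the inner projection is essential: without it the join condition could no longer be evaluated.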
Logical query optimization - steps
represent the SQL query as a logical query tree
apply the following optimizations to this query tree
• split and push down selections
• combine selections and cartesian products to joins
• determine the join sequence with the smallest intermediate results
• where possible push down and insert projections
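The first two steps can be sketched as a small tree rewrite. The representation below is an assumption made for illustration: a tree of cartesian products is a nested pair of relation aliases, and each conjunct of the where-clause is a pair of its text and the set of aliases it uses. Every conjunct is assumed to use at least one alias. Join ordering and projections are not covered.

```python
def aliases(tree):
    """Set of relation aliases below a node; a leaf is an alias string."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return aliases(left) | aliases(right)


def push_selections(tree, conjuncts):
    """Attach each conjunct to the lowest node that covers all aliases it uses.

    A product node that receives conjuncts becomes a join; a leaf that
    receives conjuncts becomes a selection on that relation.
    """
    if isinstance(tree, str):
        here = [text for text, used in conjuncts if used <= {tree}]
        return ("select", here, tree) if here else tree
    left, right = tree
    left_aliases, right_aliases = aliases(left), aliases(right)
    to_left = [(t, u) for t, u in conjuncts if u <= left_aliases]
    to_right = [(t, u) for t, u in conjuncts if u <= right_aliases]
    here = [
        t for t, u in conjuncts
        if not u <= left_aliases and not u <= right_aliases
    ]
    new_left = push_selections(left, to_left)
    new_right = push_selections(right, to_right)
    if here:
        return ("join", here, new_left, new_right)
    return ("product", new_left, new_right)


# The example query: ( B X E ) X C with three conjuncts.
tree = (("B", "E"), "C")
conjuncts = [
    ("C.title = 'databases'", frozenset({"C"})),
    ("C.courseID = E.courseID", frozenset({"C", "E"})),
    ("B.sID = E.sID", frozenset({"B", "E"})),
]

plan = push_selections(tree, conjuncts)
print(plan)
```

For the example query this yields the tree shown as a possible optimization; reaching the better tree additionally requires choosing a different join sequence.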
Logical query optimization - exercises
1. represent the following SQL query as a logical query tree
select E.courseID from bachelorStudent B, enroll E where B.lastName = 'Meier' and B.sID = E.sID
optimize it step by step and write down the optimized logical query
2. assume 1000 courses and 10000 bachelor students, each taking 5 courses on average; 4 of the students have lastName 'Meier'
compute the selectivity of the selection and of the join in the optimized query
Finding common sub-expressions - goal
SQL query represented as a logical query tree that contains two sub-trees R1 and R2, each the input of an operator op
R1 = R2 ?
R1 ⊆ R2 ?
if so: reuse instead of recompute
Finding common sub-expressions – (2)
R1 = R2 ? R1 ⊆ R2 ? if so: reuse instead of recompute
R1 = R2 ? normalize R1 and R2 by applying algebra rules, then compare the normalized queries
R1 ⊆ R2 ?
σ_C1( R ) ⊆ σ_C2( R )
if C1 implies C2 ( i.e. (not C1 or C2) = true )
R1 |X|_C R2 ⊆ R1 |X| R2
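The containment test via implication can be illustrated by brute force. This sketch checks (not C1 or C2) for every value of a small finite domain; the domain, the relation and the two conditions are hypothetical. On a domain that is not exhaustive such a check only gives evidence, so a real optimizer would reason about the conditions symbolically.

```python
def implies_on(domain, c1, c2):
    """True if (not C1 or C2) holds for every value in the finite test domain."""
    return all((not c1(x)) or c2(x) for x in domain)


def sigma(condition, relation):
    return {x for x in relation if condition(x)}


def c1(x):
    return 10 < x < 20


def c2(x):
    return x > 5


domain = range(0, 100)      # hypothetical attribute domain
R = {3, 7, 12, 15, 42}      # hypothetical relation over that domain

forward = implies_on(domain, c1, c2)     # C1 implies C2
backward = implies_on(domain, c2, c1)    # C2 does not imply C1
contained = sigma(c1, R) <= sigma(c2, R)

print(forward, backward, contained)
```

Because C1 implies C2, the result of the selection with C1 can be obtained from the stored result of the selection with C2 by applying C1 to it.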
Finding common sub-expressions – (3)
R1 ⊆ R2 ? if so: reuse instead of recompute
use monotonicity of union, intersection, join, cartesian product, selection and projection
If R1 ⊆ R3 and R2 ⊆ R4 then
R1 U R2 ⊆ R3 U R4
R1 ∩ R2 ⊆ R3 ∩ R4
R1 X R2 ⊆ R3 X R4
R1 |X|_C R2 ⊆ R3 |X|_C R4
R1 - R4 ⊆ R3 - R2
π_{A1,…,An}( R1 ) ⊆ π_{A1,…,An}( R3 )
σ_C( R1 ) ⊆ σ_C( R3 )
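The monotonicity properties can be spot-checked with Python sets. The relations below are hypothetical single-attribute examples chosen so that R1 ⊆ R3 and R2 ⊆ R4; one instance illustrates the rules but does not prove them. The join case is omitted because it follows from the product and selection cases.

```python
from itertools import product

# Hypothetical relations with R1 a subset of R3 and R2 a subset of R4.
R1, R3 = {1, 2}, {1, 2, 3}
R2, R4 = {2}, {2, 3, 4}


def sigma(relation):
    """Selection with the example condition x > 1."""
    return {x for x in relation if x > 1}


checks = {
    "union": (R1 | R2) <= (R3 | R4),
    "intersection": (R1 & R2) <= (R3 & R4),
    "cartesian product": set(product(R1, R2)) <= set(product(R3, R4)),
    # Difference is monotone in its first and anti-monotone in its second
    # argument, which is why the rule pairs R1 - R4 with R3 - R2.
    "difference": (R1 - R4) <= (R3 - R2),
    "selection": sigma(R1) <= sigma(R3),
}

print(checks)
```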
Summary – logical query optimization
Goal: minimize intermediate results
SQL query → logical query tree → apply transformation rules → search common sub-expressions