Dynamic Query Optimization

Transparent Access to Grid Data Objects | IBM Confidential © 2003 IBM Corporation Progressive Query Processing | ACM SIGMOD 2004 © 2004 IBM Corporation 2 Dynamic Query Optimization

description

Dynamic Query Optimization. Problems with static optimization: cost function instability (the cardinality error of an n-way join grows exponentially with n); unknown run-time bindings for host variables; changing environment parameters (amount of available space, concurrency rate, etc.).

Transcript of Dynamic Query Optimization

Robust Query Processing through Progressive Optimization
Problems with static optimization
Cost function instability: the cardinality error of an n-way join grows exponentially with n
Unknown run-time bindings for host variables
Changing environment parameters: amount of available space, concurrency rate, etc.
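The exponential growth in the first point can be illustrated with a small sketch. It assumes, purely for illustration, that each join's selectivity estimate is off by the same relative factor and that the errors multiply; the function name and the 30% per-join error are hypothetical and do not come from the slides.

```python
def compounded_error(per_join_error: float, n_joins: int) -> float:
    """Multiplicative cardinality error after n_joins joins.

    Assumption (not from the slides): every join estimate is off by the
    same factor (1 + per_join_error) and the errors multiply.
    """
    return (1.0 + per_join_error) ** n_joins


if __name__ == "__main__":
    # Print how a modest 30% per-join error compounds as joins are added.
    for n in (1, 2, 4, 8):
        print(n, round(compounded_error(0.30, n), 2))
```

Under these assumptions the error factor grows geometrically with the number of joins, which is why a plan chosen once at compile time can be far from optimal for large join queries.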
Static optimization comes in two flavours:
Optimize query Q, store the plan, run it whenever Q is posed
Every time Q is posed, optimize it and run it
Early Solutions
run several plans simultaneously for a short time, and then select one “best” plan and run it for a long time
at every point in a standard query plan where the optimizer cannot accurately estimate the selectivity of an input, a choose-plan operator is inserted
[Figure labels: Select, Choose-Plan, unbound predicate]
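A minimal sketch of the choose-plan idea: the decision between alternatives is deferred until the selectivity of the unbound predicate is known at run time. The class, the function names, and both cost formulas are hypothetical and chosen only to make the example runnable.

```python
from dataclasses import dataclass
from typing import Callable, List

TABLE_ROWS = 1_000_000  # hypothetical table size


@dataclass
class Alternative:
    name: str
    cost: Callable[[float], float]  # cost as a function of predicate selectivity


def choose_plan(alternatives: List[Alternative], selectivity: float) -> Alternative:
    """Pick the cheapest alternative once the run-time selectivity is known."""
    return min(alternatives, key=lambda a: a.cost(selectivity))


# Two hypothetical alternatives below the choose-plan operator.
alts = [
    # Index scan: small fixed cost plus a per-qualifying-row cost.
    Alternative("index_scan", lambda s: 50 + 4.0 * s * TABLE_ROWS),
    # Table scan: cost independent of selectivity.
    Alternative("table_scan", lambda s: 0.5 * TABLE_ROWS),
]

if __name__ == "__main__":
    for s in (0.001, 0.5):
        print(s, choose_plan(alts, s).name)
```

With these made-up cost functions, a highly selective binding favours the index scan and an unselective one favours the table scan; the point is only that the choice is made after the host variable is bound.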
Limitations:
Can only collect statistics that can be gathered in one pass
Not useful for pipelined execution
Assume 8MB memory available and 4.2MB necessary for each hash-join
The optimizer allocates 4.2MB for the first hash-join and 250KB for the second (causing it to execute in two passes)
During execution, the statistics collector finds that only 7,500 tuples are produced by the filter
The memory manager allocates each of the two hash-joins 2.05MB
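The slides do not state the memory manager's policy. The sketch below is one simple policy that reproduces the numbers in the example: grant each hash join its full requirement if it still fits, otherwise fall back to a minimum allocation. Treating 250KB as that minimum is an assumption, as are the function and parameter names.

```python
from typing import List


def allocate(available_mb: float, needs_mb: List[float],
             minimum_mb: float = 0.25) -> List[float]:
    """Greedy allocation: full requirement if it fits, else the minimum.

    Assumption (not from the slides): minimum_mb is the smallest
    allocation with which a hash join can still run, in two passes.
    """
    grants = []
    remaining = available_mb
    for need in needs_mb:
        if need <= remaining:
            grants.append(need)        # one-pass execution
            remaining -= need
        else:
            grants.append(minimum_mb)  # forced into two passes
            remaining -= minimum_mb
    return grants


if __name__ == "__main__":
    # Compile-time estimate: each hash join is believed to need 4.2MB.
    print(allocate(8.0, [4.2, 4.2]))
    # After the statistics collector reports the smaller filter output,
    # each join needs only 2.05MB and both fit in memory.
    print(allocate(8.0, [2.05, 2.05]))
```

Re-running the allocation with the observed requirements lets both joins execute in a single pass without changing the plan itself.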
Query Plan Modification
Once the statistics are available, modify the plan on the fly
Hard to implement!
select avg(Temp1.selectattr1),
Submit a new query using the partial results
SELECT count(*) FROM cars c, accidents a, owners o
WHERE c.id = a.cid AND c.id = o.cid AND
c.make = 'Honda' AND c.model = 'Accord'
Over-specified queries
Mis-estimated single-predicate selectivity
Out-of-date statistics
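The Honda/Accord query shows why over-specified predicates are mis-estimated: the two predicates are correlated, but an optimizer that assumes independence multiplies their selectivities. The toy table below is invented for illustration only; the helper name is hypothetical.

```python
# Hypothetical toy "cars" table as (make, model) pairs; 100 rows in total.
rows = (
    [("Honda", "Accord")] * 10
    + [("Honda", "Civic")] * 10
    + [("Toyota", "Camry")] * 40
    + [("Ford", "Focus")] * 40
)


def selectivity(pred) -> float:
    """Fraction of rows that satisfy the predicate."""
    return sum(1 for r in rows if pred(r)) / len(rows)


sel_make = selectivity(lambda r: r[0] == "Honda")    # 20 of 100 rows
sel_model = selectivity(lambda r: r[1] == "Accord")  # 10 of 100 rows

# Independence assumption: multiply the single-predicate selectivities.
independent_estimate = sel_make * sel_model

# Actual selectivity: every Accord is a Honda, so the make predicate
# filters out nothing beyond what the model predicate already removed.
actual = selectivity(lambda r: r == ("Honda", "Accord"))

if __name__ == "__main__":
    print("independence estimate:", independent_estimate)
    print("actual selectivity:   ", actual)
```

In this toy data the independence estimate is five times too low, and the underestimate propagates into every join above the scan of cars.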
SQL Compilation
1. Monitor
2. Analyze
3. Feedback
4. Exploit
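The slide only names the four steps. One possible reading, sketched under that assumption: monitoring records actual row counts, analysis compares them with the estimates, feedback stores a correction factor, and later compilations exploit it. The store, the function names, and the use of a simple ratio as the correction are all hypothetical.

```python
from typing import Dict

# Hypothetical feedback store: predicate text -> correction factor.
adjustments: Dict[str, float] = {}


def analyze_and_feedback(predicate: str, estimated_rows: float,
                         actual_rows: float) -> None:
    """Analyze a monitored execution and record a correction factor."""
    if estimated_rows > 0:
        adjustments[predicate] = actual_rows / estimated_rows


def exploit(predicate: str, raw_estimate: float) -> float:
    """Apply the stored correction (if any) to a new estimate."""
    return raw_estimate * adjustments.get(predicate, 1.0)


if __name__ == "__main__":
    # A monitored run saw 400 rows where 100 were estimated.
    analyze_and_feedback("make = 'Honda'", 100, 400)
    # A later compilation scales its raw estimate accordingly.
    print(exploit("make = 'Honda'", 50))
```

This kind of feedback helps future queries only; it does nothing for the query that exposed the error, which is the gap the following slides address.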
Can detect problem early!
Correct the plan dynamically before we waste any more time!
May never execute this exact query again
Parameter markers
Rare correlations
Complex predicates
Check Estimated cardinalities vs. Actuals at runtime
When checking fails:
Correct the cardinality estimates based on the actual cardinalities
Re-optimize the query, possibly exploiting already performed work
Questions:
When is an error big enough to be worth reoptimizing?
Tradeoff between opportunity (# reoptimization points) and
risk (performance regression)
Performed updates
Don’t reoptimize if the plan is almost finished
Very low risk of regression
Provides safeguard for hash-join, merge-join, etc.
Lazy Checking with Eager Materialization
Pro-actively add dams to enable checkpointing
E.g. outer of nested-loops join
Eager Checking
It may be too late to wait until the dam is complete
Check cardinalities before tuples are inserted into the dam
Can extrapolate to estimate final cardinality
DAM
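Eager checking can be sketched as a counting wrapper placed on the input of the dam: it raises as soon as the running count exceeds the upper bound, instead of waiting for the dam to fill. The exception and function names are hypothetical, and this sketch only detects violations of the upper bound; it does not extrapolate, and a lower bound can only be verified once the input is exhausted.

```python
from typing import Iterable, Iterator


class ReoptimizationNeeded(Exception):
    """Raised when the observed cardinality leaves the assumed range."""

    def __init__(self, seen: int):
        super().__init__(f"cardinality bound exceeded after {seen} tuples")
        self.seen = seen


def eager_check(tuples: Iterable, upper_bound: int) -> Iterator:
    """Pass tuples through to the dam, counting them on the way.

    Raises ReoptimizationNeeded as soon as more than upper_bound tuples
    have arrived, before the dam is complete.
    """
    seen = 0
    for t in tuples:
        seen += 1
        if seen > upper_bound:
            raise ReoptimizationNeeded(seen)
        yield t


if __name__ == "__main__":
    try:
        dam = list(eager_check(range(10), upper_bound=5))
    except ReoptimizationNeeded as exc:
        print("re-optimize:", exc)
```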
Cardinality is actual cardinality
Re-invoke query compiler
ELSE continue execution
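The checking logic on this slide can be sketched as follows, assuming each checkpoint carries a validity range for the cardinality it observes. The class, the function signature, and the dictionary of estimates are hypothetical; the actual operator interface is not given in the slides.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ValidityRange:
    low: float
    high: float


def check(op_name: str, actual_cardinality: int, valid: ValidityRange,
          estimates: Dict[str, float]) -> bool:
    """Return True if the query compiler should be re-invoked."""
    if valid.low <= actual_cardinality <= valid.high:
        return False  # ELSE branch: continue execution with the current plan
    # The estimate is replaced by the actual cardinality before re-optimizing.
    estimates[op_name] = actual_cardinality
    return True


if __name__ == "__main__":
    estimates = {"filter_cars": 1000.0}
    valid = ValidityRange(low=500, high=2000)
    print(check("filter_cars", 1200, valid, estimates))  # within range
    print(check("filter_cars", 7500, valid, estimates))  # out of range
    print(estimates)
```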
Validity Range Determination (1)
At a given operator, what input cardinality change will cause a plan change? I.e., for which cardinalities is this plan valid?
In general, equivalent to parametric optimization
Super-exponential explosion of alternative plans to consider
Finds optimal plan for each value range, for each subset of predicates,
So we focus on changes in a single operator
Local decision
Disadvantage: Pessimistic model, since it misses reoptimization opportunities
cost(P1, est_cardouter) < cost(P2, est_cardouter)
Estimate upper and lower bounds on cardouter s.t. P2 dominates P1
Use bounds to update (narrow) the validity range of outer (likewise for inner)
Applies to arbitrary operators
Can be applied all the way up the plan tree
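A sketch of how an upper bound on the outer cardinality could be found for one pair of alternatives. The slides do not say which numerical method is used; plain doubling plus bisection is swapped in here for simplicity, and it assumes the two cost curves cross at most once above the estimate. Both cost functions are hypothetical.

```python
from typing import Callable

CostFn = Callable[[float], float]


def upper_validity_bound(cost_p1: CostFn, cost_p2: CostFn, est_card: float,
                         max_card: float = 1e12, tol: float = 1.0) -> float:
    """Largest outer cardinality (within tol) at which P1 is still cheaper.

    Assumes est_card > 0 and a single crossover point above est_card.
    """
    assert cost_p1(est_card) < cost_p2(est_card), "P1 must win at the estimate"
    lo, hi = est_card, est_card * 2
    # Doubling phase: grow hi until P2 is at least as cheap as P1.
    while hi < max_card and cost_p1(hi) < cost_p2(hi):
        lo, hi = hi, hi * 2
    if cost_p1(hi) < cost_p2(hi):
        return float("inf")  # P2 never dominates within max_card
    # Bisection phase: narrow the crossover down to the tolerance.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cost_p1(mid) < cost_p2(mid):
            lo = mid
        else:
            hi = mid
    return lo


def cost_nested_loop(card_outer: float) -> float:
    """Hypothetical P1: one index lookup per outer tuple."""
    return 2.0 * card_outer


def cost_hash_join(card_outer: float) -> float:
    """Hypothetical P2: fixed build cost plus a cheap probe per outer tuple."""
    return 5000.0 + 0.5 * card_outer


if __name__ == "__main__":
    print(upper_validity_bound(cost_nested_loop, cost_hash_join, 1000.0))
```

With these made-up costs the crossover lies near 3,333 outer tuples, so a checkpoint on the outer input would use roughly that value as the upper end of its validity range. A lower bound could be found the same way by searching downward from the estimate.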
[Figure: alternative plans P1 and P2 over the same outer and inner inputs; labels L1, L2, P, Q]
Lineitem × Orders query
N1,M1,H1: Orders as outer
N1: ISCAN on inner
N2,M2,H2: Lineitem as outer
N1
H2
M1
Upper bounds vary
Upper bounds are still set conservatively, so no false reoptimization is triggered
Re-optimization Opportunities with POP
4-way Join:
Box: 25th to 75th percentile of queries
[Figure: Response Time Scatter Plot and Speedup Bar Chart; embedded chart data (columns Query, Time, Rows) omitted]
Ingres: adaptive nested loop join
XJoin, Tukwila: adaptive hash join
Pang/Carey/Livny, Zhang/Larson: dynamic memory adjustment

SteMs: adaptation of join algorithms, spanning trees, …
Conclusions
POP makes plans for complex queries more robust to optimizer misestimates
Significant performance improvement on real workloads
Overhead of re-optimization is very low, scales with DB size
Validity ranges tell us how risky a plan is
Can be used for many applications to act upon cardinality sensitivity
Future Work:
# concurrent applications
Actual run time, actual # I/Os
Avoid re-optimization too late in the plan or if the cost of optimization is too high
Re-optimization in shared-nothing query plans
Extend validity ranges to more general plan robustness measures
Overhead about 2-3%
Hence will go down with higher data sizes
Execution time
If invalid:
Associate with MQT for that TEMP / SORT
Name of GTT
Re-compile query!
Re-compilation time
Possible regressions, due to
Change of plan (one of 2 off-setting errors gets fixed by POP)
Re-optimization overhead (minimal)
Getting MQT “match” structure back to Optimizer
Prototype not yet “industrial strength”
Now hung at top-most level (presumes no statement cache)
Schema may have changed since 1st compilation
This would change the QGM for this query, invalidating the MQT “matching”
[Figures: chart over Queries with series LC (above HJ), Default Selectivity Estimate, and Correct Selectivity Estimate; axis tick values omitted]