Semantic Query Optimization Techniques November 16, 2005 By : Mladen Kovacevic.
-
Background
1980s: the idea emerged that semantic information stored in databases as integrity constraints could be used for query optimization.
Semantic: of or relating to meaning or the study of meaning (http://wordnet.princeton.edu)
Integrity constraints: preserve data consistency when changes are made in the database.
No extensive implementation existed as of 1999.
-
Introduction
Query optimization is a key factor in improving query execution time in relational database systems.
Query execution can be improved by:
Analyzing integrity information and rewriting queries to exploit it (Join Elimination and Predicate Introduction, JE & PI)
Avoiding expensive sorting costs (Order Optimization)
Exploiting uniqueness: knowing that rows will be unique avoids extra sorts (EU)
-
Presentation Overview
Semantic Query Optimization techniques
Join Elimination (JE)
Predicate Introduction (PI)
Order Optimization (OO)
Exploiting Uniqueness (EU)
-
Some Motivation
Two SQO techniques are described, as demonstrated in DB2 UDB:
Predicate Introduction
Join Elimination
Reasons: rewriting queries by hand showed that these two provided consistent optimization; they are practical to implement and extendible to other DBMSs.
Data sets used: TPC-D and APB-1 OLAP benchmarks
Only REFERENTIAL INTEGRITY constraints and CHECK CONSTRAINTS were used!
-
Semantic Query Optimization (SQO) Techniques
Join Elimination: some joins need NOT be evaluated because their result is known a priori (more on this later).
Join Introduction: adding a join can help if the added relation is small compared to the original relations and the join is highly selective.
Predicate Elimination: a predicate known to be always true can be eliminated from the query (for example, a DISTINCT clause on a primary key; see Exploiting Uniqueness).
Predicate Introduction: new predicates on indexed attributes can result in a much faster access plan.
Detecting the Empty Answer Set: if the query predicates are inconsistent with the integrity constraints, the query has no answer.
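Empty-answer detection can be illustrated with a minimal sketch. It assumes a single numeric column whose check constraint and query predicate can each be modelled as a closed interval; the `Interval` class, the column, and the bounds are hypothetical illustrations and not the mechanism of any particular DBMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [low, high] of values allowed for one column."""
    low: float
    high: float

    def intersect(self, other: "Interval") -> "Interval | None":
        """Return the overlap of two intervals, or None if they are disjoint."""
        low = max(self.low, other.low)
        high = min(self.high, other.high)
        return Interval(low, high) if low <= high else None


def answer_is_provably_empty(check: Interval, predicate: Interval) -> bool:
    """True when no row can satisfy both the constraint and the predicate."""
    return check.intersect(predicate) is None


# Hypothetical constraint: CHECK (discount BETWEEN 0.0 AND 0.10)
check_constraint = Interval(0.0, 0.10)

# Hypothetical query predicates: discount >= 0.5, and discount BETWEEN 0.05 AND 0.07
impossible_predicate = Interval(0.5, float("inf"))
possible_predicate = Interval(0.05, 0.07)

empty_case = answer_is_provably_empty(check_constraint, impossible_predicate)
nonempty_case = answer_is_provably_empty(check_constraint, possible_predicate)
```

When the intervals are disjoint, the query can be answered with an empty result without touching the table.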
-
Why are SQO implementations not used?
Deductive databases: in many cases SQO techniques were designed for deductive databases, and so did not appear useful in a relational database context.
CPU & I/O speeds similar: when the techniques were being developed, CPU and I/O speeds were not as dramatically different as they are now (savings in I/O were not worth the added CPU time).
Lack of integrity constraints: it was thought that many integrity constraints are needed for SQO to be useful.
-
Two-stage Optimizer
Examples of SQO techniques are always designed for a two-stage optimizer.
Stage 1: logically equivalent queries are created (DB2's query rewrite optimization).
Stage 2: plans are generated for all of these queries, and the one with the lowest estimated cost is chosen (DB2's query plan optimization). Plan choices include join order, join methods, join site in a distributed database, method for accessing the input table, etc.
-
Join Elimination
Simple case: eliminate a relation when the join is over tables related through a referential integrity constraint and the primary key table is referenced only in the join.
VIEW DEFINITION
CREATE VIEW Supplier_Info (n, a, c) AS
SELECT s_name, s_address, n_name
FROM tpcd.supplier, tpcd.nation
WHERE s_nationkey = n_nationkey
QUERY
SELECT n, a
FROM Supplier_Info
-
Join Elimination (cont)
Query can be rewritten internally as:
SELECT s_name, s_address
FROM tpcd.supplier
Why do such a simple rewrite?
The user may not have access to the supplier table, and/or may only know about the view.
GUI query managers sometimes generate such naive queries, so they need to be optimized.
Non-programmers often write queries and may not even think about this.
An algorithm for generic redundant join removal is provided in the paper.
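The equivalence behind this rewrite can be checked on a toy database. The sketch below uses Python's built-in `sqlite3` rather than DB2, drops the `tpcd` schema qualifier, and loads a few made-up rows; all of that is assumed for illustration. The rewrite is only safe because `s_nationkey` is declared NOT NULL and references `nation`, so every supplier row joins with exactly one nation row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the foreign key
conn.executescript("""
    CREATE TABLE nation (
        n_nationkey INTEGER PRIMARY KEY,
        n_name      TEXT NOT NULL
    );
    CREATE TABLE supplier (
        s_suppkey   INTEGER PRIMARY KEY,
        s_name      TEXT NOT NULL,
        s_address   TEXT NOT NULL,
        s_nationkey INTEGER NOT NULL REFERENCES nation (n_nationkey)
    );
    CREATE VIEW supplier_info AS
        SELECT s_name AS n, s_address AS a, n_name AS c
        FROM supplier, nation
        WHERE s_nationkey = n_nationkey;
    -- Made-up sample rows, for illustration only.
    INSERT INTO nation VALUES (1, 'CANADA'), (2, 'FRANCE');
    INSERT INTO supplier VALUES
        (10, 'Supplier#10', '1 Main St', 1),
        (11, 'Supplier#11', '2 Rue A', 2),
        (12, 'Supplier#12', '3 King St', 1);
""")

# Query against the view: the join with nation is evaluated.
via_view = conn.execute(
    "SELECT n, a FROM supplier_info ORDER BY n"
).fetchall()

# Rewritten query: the join with nation has been eliminated.
rewritten = conn.execute(
    "SELECT s_name, s_address FROM supplier ORDER BY s_name"
).fetchall()

same_rows = via_view == rewritten
```

If the foreign key column allowed NULLs, suppliers without a nation would disappear from the view but not from the rewritten query, so the two would no longer be equivalent.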
-
Example Join Elimination
SELECT p_name, p_retailprice, s_name, s_address
FROM tpcd.lineitem, tpcd.partsupp, tpcd.part, tpcd.supplier
WHERE p_partkey = ps_partkey and s_suppkey = ps_suppkey
  and ps_partkey = l_partkey and ps_suppkey = l_suppkey
  and l_shipdate between '1994-01-01' and '1996-06-30'
  and l_discount >= 0.1
GROUP BY p_name, p_retailprice, s_name, s_address
ORDER BY p_name, s_name
Schema diagram: PART (PARTKEY), SUPPLIER (SUPPKEY), PARTSUPP (PARTKEY, SUPPKEY), LINEITEM (PARTKEY, SUPPKEY), linked by one-to-many relationships.
-
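Example: Join Elimination
Are there any immediate improvements to be seen here?
p_partkey = ps_partkey and s_suppkey = ps_suppkey and ps_partkey = l_partkey and ps_suppkey = l_suppkey
Equivalent columns: P_PARTKEY, PS_PARTKEY, L_PARTKEY
Equivalent columns: S_SUPPKEY, PS_SUPPKEY, L_SUPPKEY
Join predicates in the query: P_PARTKEY = PS_PARTKEY, PS_PARTKEY = L_PARTKEY, S_SUPPKEY = PS_SUPPKEY, PS_SUPPKEY = L_SUPPKEY
Implied by transitivity: S_SUPPKEY = L_SUPPKEY, P_PARTKEY = L_PARTKEY, so PART and SUPPLIER can be joined to LINEITEM directly and the join with PARTSUPP becomes a candidate for elimination.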
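The grouping of join columns into equivalence classes can be sketched with a small union-find. This is only an illustration of the transitivity reasoning, not the algorithm from the paper; the function names are hypothetical.

```python
def equivalence_classes(equalities):
    """Group column names connected by equality predicates (union-find)."""
    parent = {}

    def find(col):
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]  # path halving
            col = parent[col]
        return col

    for left, right in equalities:
        parent[find(left)] = find(right)

    groups = {}
    for col in list(parent):
        groups.setdefault(find(col), set()).add(col)
    return sorted(sorted(group) for group in groups.values())


# The four join predicates from the example query.
join_predicates = [
    ("p_partkey", "ps_partkey"),
    ("s_suppkey", "ps_suppkey"),
    ("ps_partkey", "l_partkey"),
    ("ps_suppkey", "l_suppkey"),
]
classes = equivalence_classes(join_predicates)


def implied(left, right):
    """True when left = right follows from the predicates by transitivity."""
    return any(left in group and right in group for group in classes)
```

Here `implied("p_partkey", "l_partkey")` and `implied("s_suppkey", "l_suppkey")` both hold, which is what allows PART and SUPPLIER to be joined to LINEITEM without going through PARTSUPP.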
-
Results
100 MB database size
Execution time: 58.5 sec -> 38.25 sec (35% improvement)
I/O cost: 4631 -> 1498 page reads (67% improvement)
Chart: Join Elimination Optimizing Query 1, Pages Read (Original: 4631, Optimized: 1498)
Chart: Join Elimination Optimizing Query 1, Execution Time in seconds (Original: 58.5, Optimized: 38.25)
Query 2 execution time (from the chart data): 6331.64 sec -> 79.33 sec (99% improvement)
-
Results - OLAP Environment
On OLAP (online analytical processing) servers using a star schema (one fact table with several dimension tables), improvements ranged from 2% to 96%.
In these cases much of the improvement came from reduced CPU cost rather than I/O, because the dimension tables were small enough to fit into memory.
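The quoted range can be reproduced from the per-query execution times in the chart data for this experiment (queries I1 to I10, original versus optimized, in seconds). The helper name below is hypothetical.

```python
def improvement_percent(original, optimized):
    """Relative reduction in execution time, as a percentage."""
    return (original - optimized) / original * 100.0


# (original seconds, optimized seconds) per OLAP query, from the chart data.
olap_seconds = {
    "I1": (98.2, 9.7),
    "I2": (576.2, 23.9),
    "I3": (11.3, 10.9),
    "I4": (12.5, 11.4),
    "I5": (504.3, 167.2),
    "I6": (586.4, 231.0),
    "I7": (523.5, 268.5),
    "I8": (5.6, 4.9),
    "I9": (5.2, 5.1),
    "I10": (4.7, 4.3),
}

improvements = {
    name: improvement_percent(original, optimized)
    for name, (original, optimized) in olap_seconds.items()
}
smallest_query = min(improvements, key=improvements.get)
largest_query = max(improvements, key=improvements.get)
```

The smallest gain is for I9 and the largest for I2, which round to the 2% and 96% endpoints stated above.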
-
Chart: Join Elimination Optimized Query in OLAP Environment, Execution Time (seconds) by Query Name
Query   Original   Optimized
I1      98.2       9.7
I2      576.2      23.9
I3      11.3       10.9
I4      12.5       11.4
I5      504.3      167.2
I6      586.4      231
I7      523.5      268.5
I8      5.6        4.9
I9      5.2        5.1
I10     4.7        4.3
-
Predicate Introduction
Techniques discussed:
Index Introduction: add a new predicate on an attribute if an index exists on that attribute. Assumption: index retrieval is better than a table scan. Is this always good?
Scan Reduction: reduce the number of tuples that qualify for a join. Problem: not very common; it is unlikely that there will be any check constraints or predicates with inequalities on join columns.
Detecting an empty query answer set (results not shown, since query execution time is essentially 0).
-
Example - Predicate Introduction
SELECT sum(l_extendedprice * l_discount) as revenue
FROM tpcd.lineitem
WHERE l_shipdate >= date('1994-01-01')
  and l_shipdate < date('1994-01-01') + 1 year
  and l_discount between .06 - 0.01 and .06 + 0.01
  and l_quantity < 24;
Check Constraint: l_shipdate <= l_receiptdate
-
Example - Predicate Introduction
SELECT sum(l_extendedprice * l_discount) as revenue
FROM tpcd.lineitem
WHERE l_shipdate >= date('1994-01-01')
  and l_shipdate < date('1994-01-01') + 1 year
  and l_receiptdate >= date('1994-01-01')
  and l_discount between .06 - 0.01 and .06 + 0.01
  and l_quantity < 24;
Check Constraint: l_shipdate <= l_receiptdate, which together with l_shipdate >= date('1994-01-01') implies l_receiptdate >= date('1994-01-01')
Why would we want to do this? So that the optimizer can choose a plan that uses the index.
Is this always good? NO! If most of the rows in the table need to be returned, a table scan should be used instead.
-
Predicate Introduction - Algorithm
Input: the set of all check constraints defined for a database and the set of all predicates in the query.
Output: the set of all non-redundant formulas derivable from the input set. This answer set can then be added to the query, but only a few of the formulas are potentially useful. The goal in the paper was to choose additions that would guarantee improvement.
Conditions in the paper: a conservative approach that introduces only predicates that will make the plan optimizer use an index, and insists that only one index be available with the query predicate.
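The derivation step can be sketched as follows. This is a deliberately narrow illustration, not the paper's algorithm: it handles only check constraints of the form column <= column and query predicates of the form column >= constant or column <= constant, and it performs a single pass rather than a full closure. The constraint `l_shipdate <= l_receiptdate`, the index on `l_receiptdate`, and all class and function names are assumptions made for the example.

```python
from typing import NamedTuple


class ColumnLE(NamedTuple):
    """Check constraint of the form smaller <= larger between two columns."""
    smaller: str
    larger: str


class Bound(NamedTuple):
    """Query predicate of the form column <op> constant, op in {'>=', '<='}."""
    column: str
    op: str
    constant: str


def derive_predicates(checks, predicates):
    """Return new bounds implied by combining check constraints with predicates."""
    known = set(predicates)
    derived = []
    for check in checks:
        for pred in predicates:
            new = None
            if pred.op == ">=" and pred.column == check.smaller:
                # smaller >= c and smaller <= larger  =>  larger >= c
                new = Bound(check.larger, ">=", pred.constant)
            elif pred.op == "<=" and pred.column == check.larger:
                # larger <= c and smaller <= larger  =>  smaller <= c
                new = Bound(check.smaller, "<=", pred.constant)
            if new is not None and new not in known:
                known.add(new)
                derived.append(new)
    return derived


def useful_for_index(derived, indexed_columns):
    """Keep only derived predicates on columns that have an index."""
    return [pred for pred in derived if pred.column in indexed_columns]


checks = [ColumnLE("l_shipdate", "l_receiptdate")]
query_predicates = [Bound("l_shipdate", ">=", "1994-01-01")]

derived = derive_predicates(checks, query_predicates)
candidates = useful_for_index(derived, indexed_columns={"l_receiptdate"})
```

Filtering by indexed columns mirrors the conservative condition above: a derived predicate is only worth adding if it can steer the optimizer towards an index.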
-
Predicate Introduction - Results
Chart: Predicate Introduction Estimated Costs (internal units) by Query
Query   Original   Optimized
P1      54031      18308
P2      113316     11222
P3      108676     69623
P4      128342     35974
P5      195098     133665
-
Predicate Introduction - Results
Chart: Predicate Introduction Execution Times (seconds) by Query
Query   Original   Optimized
P1      13.5       5.4
P2      24.9       4.9
P3      25.1       58.3
P4      46.4       38.6
P5      56.5       98.3
Why do P3 and P5 take longer after optimization?
-
Why Longer Execution for P3/P5?
P2 and P3 are the same except for the following:
P2:
SELECT ...
FROM ...
WHERE l_shipdate >= date ('1998-09-01')
  and l_shipdate < date ('1998-09-01') + 1 month
P3:
SELECT ...
FROM ...
WHERE l_shipdate >= date ('1995-09-01')
  and l_shipdate < date ('1995-09-01') + 1 month
The data in the table makes the difference: for P2, 2% of the tuples fall in the range, while for P3, 48% do. BOTH plans choose an index scan, but P3's result is so large that a table scan is better in this case.
The cost model underestimates the cost of locking/unlocking index pages.
The estimated number of tuples goes down because of the reduction factor problem (the reduction factor of the newly added predicate is multiplied in).
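The reduction factor problem comes down to simple arithmetic. The sketch below assumes an optimizer that multiplies the reduction factors of all predicates as if they were independent; the table size and the factor for the introduced predicate are made-up numbers, and only the 48% figure comes from the slide.

```python
def estimated_rows(table_rows, reduction_factors):
    """Cardinality estimate under the independence assumption."""
    estimate = float(table_rows)
    for factor in reduction_factors:
        estimate *= factor
    return estimate


TABLE_ROWS = 600_000        # hypothetical table size
SHIPDATE_FACTOR = 0.48      # fraction of tuples in range for P3 (from the slide)
RECEIPTDATE_FACTOR = 0.50   # hypothetical factor for the introduced predicate

# Estimate for the original query.
before = estimated_rows(TABLE_ROWS, [SHIPDATE_FACTOR])

# Estimate after the implied predicate is added. The predicate is redundant,
# so the true row count is unchanged, yet the estimate is halved.
after = estimated_rows(TABLE_ROWS, [SHIPDATE_FACTOR, RECEIPTDATE_FACTOR])
```

Because the introduced predicate is implied by the existing ones, the real result size stays the same while the estimate shrinks, which makes an index scan look cheaper than it actually is.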
-
Adjustments for Reduction Factor Problem
Add a new predicate only when it contains a major column of an index and a scan of that index is sufficient to answer the query (thus, no table scan is necessary).
Original Index :
New Index :
Chart: Predicate Introduction Execution Times (seconds) by Query, modified results
Query   Original   Optimized
P3      21.3       10.9
P5      52.2       45.6
-
Order Optimization Techniques
Access plan strategies exploit the physical orderings provided either by indexes or by sorting.
GOAL: optimize the sorting strategy
Techniques
Pushing down sorts in joins
Minimizing the number of sorting columns
Detecting when sorting can be avoided because of predicates, keys or indexes
Order Optimization: detecting when indexes provide an interesting order, so that sorting can be either avoided or used as sparingly as possible.
Interesting Orders: arise when, as a side effect, a join produces rows in a sorted order that can be taken advantage of later (by another join, ORDER BY, GROUP BY, or DISTINCT).
-
Fundamental Operators
Order optimization requires the following operations:
Reduce Order
Test Order
Cover Order
Homogenize Order
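As a rough illustration of the first two operators, the sketch below assumes that Reduce Order drops sort columns that cannot affect the ordering (columns bound to a constant by a predicate, or functionally determined by earlier columns), and that Test Order checks whether a reduced interesting order is a prefix of a reduced available order. It ignores ascending/descending and column equivalences, and the function and column names are hypothetical, so it should be read as a simplification rather than the operators as defined in the referenced paper.

```python
def reduce_order(order, constant_columns=frozenset(), fds=()):
    """Drop columns that cannot affect the ordering.

    order: tuple of column names, most significant first.
    constant_columns: columns fixed by predicates such as col = 5.
    fds: iterable of (determinant_columns, dependent_column) pairs.
    """
    reduced = []
    for col in order:
        if col in constant_columns or col in reduced:
            continue
        available = set(reduced) | set(constant_columns)
        determined = any(
            dependent == col and set(determinant) <= available
            for determinant, dependent in fds
        )
        if not determined:
            reduced.append(col)
    return tuple(reduced)


def satisfies_order(interesting, order_property, constant_columns=frozenset(), fds=()):
    """True when rows sorted by order_property are also sorted by interesting."""
    want = reduce_order(interesting, constant_columns, fds)
    have = reduce_order(order_property, constant_columns, fds)
    return have[: len(want)] == want


# ORDER BY x, y with predicate x = 10: an index ordered on y alone is enough.
sort_avoided = satisfies_order(("x", "y"), ("y",), constant_columns={"x"})

# ORDER BY k, d where k is a key (k determines d): an order on k is enough.
key_case = satisfies_order(("k", "d"), ("k",), fds=[(("k",), "d")])

# No helpful predicates or dependencies: an order on (x, y) does not give (y, x).
sort_needed = not satisfies_order(("y", "x"), ("x", "y"))
```

When the test succeeds, the optimizer can skip the sort for that ORDER BY, GROUP BY, or DISTINCT.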
-
Order Optimization Results
-
Exploiting Uniqueness
Checking whether a query contains unnecessary DISTINCT clauses.
How does this make improvements?
Removing duplicates is performed by SORTING, a costly operation.
Example: removing the DISTINCT keyword from a query when it is applied to the primary key itself (since primary keys are, by definition, distinct).
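A small check of this example, using Python's built-in `sqlite3` with a made-up `part` table (the table contents are hypothetical; SQLite stands in for DB2 here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (
        p_partkey INTEGER PRIMARY KEY,
        p_name    TEXT NOT NULL
    );
    -- Made-up sample rows; two parts share the same name.
    INSERT INTO part VALUES (1, 'bolt'), (2, 'nut'), (3, 'bolt');
""")

# The select list contains the primary key, so no two result rows can be equal
# and DISTINCT is redundant.
with_distinct = conn.execute(
    "SELECT DISTINCT p_partkey, p_name FROM part ORDER BY p_partkey"
).fetchall()
without_distinct = conn.execute(
    "SELECT p_partkey, p_name FROM part ORDER BY p_partkey"
).fetchall()

# Counter-example: p_name alone is not a key, so here DISTINCT does matter.
names_distinct = conn.execute(
    "SELECT DISTINCT p_name FROM part ORDER BY p_name"
).fetchall()
names_all = conn.execute(
    "SELECT p_name FROM part ORDER BY p_name"
).fetchall()
```

The first pair of queries returns identical rows, so the duplicate-removal step can be dropped; the second pair shows why the rewrite must be justified by a key and cannot be applied blindly.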
-
How to exploit uniqueness?
Using knowledge about:
Keys
Table Constraints
Query Predicates
Uniqueness cannot always be tested efficiently, so we look for a sufficient condition.
-
Summary
Important outcome: experimental evidence shows that SQO can provide an effective enhancement to traditional query optimization.
Join Elimination: geared towards OLAP environments (where it is very useful). It does not depend on the existence of complex integrity constraints; the semantic reasoning uses only referential integrity constraints. Easy to implement and execute.
Predicate Introduction: guaranteeing improvements is more difficult and requires rather severe restrictions, which limits the applicability of this approach.
Order Optimization: uses functional dependencies and table information to create a smart access plan that avoids or optimizes sort operations.
Exploiting Uniqueness: uniqueness is powerful when it reduces the number of expensive sorts. Discovering ways to truly exploit this technique is quite tricky and case-specific.
-
References
Qi Cheng, Jarek Gryz, Fred Koo, et al.: Implementation of Two Semantic Query Optimization Techniques in DB2 Universal Database. Proceedings of the 25th VLDB Conference, Edinburgh, Scotland, 1999.
David E. Simmen, Eugene J. Shekita, Timothy Malkemus: Fundamental Techniques for Order Optimization. SIGMOD Conference 1996: 57-67.
G. N. Paulley, Per-Åke Larson: Exploiting Uniqueness in Query Optimization. ICDE 1994: 68-79.
-
The End.
Stage 1: Query Rewrite Phase
Rule-based system: easy to expand when needed without worrying about other areas of code.
Rewrites include predicate pushdown, subquery to join transformation, magic sets transformation, handling duplicates, merging of views, and decorrelating complex subqueries.
< 1% of query execution time is spent on the query rewrite phase.
Stage 2: Query Plan Optimization
DB2's limitation: only ONE of the potential queries can be passed to the query plan optimizer; thus, the query generated by SQO MUST be faster (not always the case!).
Advantage: this stops the optimizer from spending more time on optimization than on query execution itself.