Semantic Query Optimization Techniques November 16, 2005 By : Mladen Kovacevic.


  • Background: in the 1980s, it was recognized that semantic information stored in databases as integrity constraints could be used for query optimization.

    semantic: of or relating to meaning or the study of meaning (http://wordnet.princeton.edu)

    integrity: preserving data consistency when changes are made to the database.

    No extensive implementation existed as of 1999.

  • Introduction: query optimization is a key factor in improving query execution time in relational database systems.

    Query execution can be improved by:

    Analyzing integrity information and rewriting queries to exploit it (JE & PI)

    Avoiding expensive sorts (Order Optimization)

    Exploiting uniqueness: knowing that rows are guaranteed to be unique lets the optimizer avoid extra sorts (EU)

  • Presentation Overview: Semantic Query Optimization techniques

    Join Elimination (JE)

    Predicate Introduction (PI)

    Order Optimization (OO)

    Exploiting Uniqueness (EU)

  • Some Motivation: two SQO techniques are described, as demonstrated in DB2 UDB: Predicate Introduction and Join Elimination.

    Reasons: rewriting queries by hand showed that these two provided consistent optimization; they are practical to implement; and they are extendible to other DBMSs.

    Data sets used: the TPC-D and APB-1 OLAP benchmarks

    Only REFERENTIAL INTEGRITY constraints and CHECK CONSTRAINTS were used!

  • Semantic Query Optimization (SQO) Techniques

    Join Elimination: some joins need NOT be evaluated, because their result is known a priori (more on this later).

    Join Introduction: adding a join can help if the added relation is small compared to the original relations and highly selective.

    Predicate Elimination: a predicate known to be always true can be eliminated from the query (a related case: a DISTINCT clause on a primary key is redundant; see Exploiting Uniqueness).

    Predicate Introduction: new predicates on indexed attributes can result in a much faster access plan.

    Detecting the Empty Answer Set: if the query predicates are inconsistent with the integrity constraints, the query has no answer.

  • Why are SQO implementations not used?

    Deductive databases: in many cases, SQO techniques were designed for deductive databases, so they did not appear useful in a relational database context.

    Similar CPU & I/O speeds: when the techniques were developed, CPU and I/O speeds were not as dramatically different, so the I/O savings were not worth the added CPU time.

    Lack of integrity constraints: it was thought that many integrity constraints are needed for SQO to be useful.

  • Two-stage Optimizer: the SQO techniques in these examples were designed for a two-stage optimizer.

    Stage 1: logically equivalent queries are created (DB2's query rewrite optimization).

    Stage 2: plans are generated for these queries and the one with the lowest estimated cost is chosen (DB2's query plan optimization). Decisions include join order, join methods, the join site in a distributed database, the access method for each input table, etc.

  • Join Elimination, the simple case: eliminate a relation when the join is over tables related through a referential integrity constraint and the primary key table is referenced only in the join (if the foreign key column is nullable, an IS NOT NULL predicate on it must be kept).

    View definition:

      CREATE VIEW Supplier_Info (n, a, c) AS
      SELECT s_name, s_address, n_name
      FROM tpcd.supplier, tpcd.nation
      WHERE s_nationkey = n_nationkey

    Query:

      SELECT n, a
      FROM Supplier_Info

  • Join Elimination (cont.): the query can be rewritten internally as:

      SELECT s_name, s_address
      FROM tpcd.supplier

    Why do such a simple rewrite?

    The user may not have access to the supplier table and/or may only know about the view. GUI query tools sometimes generate such redundant queries, so they need to be optimized. Non-programmers often write queries and may not think about this at all.

    An algorithm for generic redundant join removal is given in the paper.
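    The rewrite is safe only because every supplier row matches exactly one nation row. A minimal sketch using Python's built-in sqlite3 module (assumptions: SQLite stands in for DB2, the tpcd. schema prefix is dropped, and the sample rows are made up) checks that the view query and the rewritten query return the same rows when s_nationkey is NOT NULL and the foreign key is enforced:

```python
import sqlite3

SCHEMA = """
CREATE TABLE nation (
    n_nationkey INTEGER PRIMARY KEY,
    n_name      TEXT NOT NULL
);
CREATE TABLE supplier (
    s_suppkey   INTEGER PRIMARY KEY,
    s_name      TEXT NOT NULL,
    s_address   TEXT NOT NULL,
    -- NOT NULL + foreign key: each supplier joins exactly one nation row
    s_nationkey INTEGER NOT NULL REFERENCES nation (n_nationkey)
);
CREATE VIEW Supplier_Info (n, a, c) AS
    SELECT s_name, s_address, n_name
    FROM supplier, nation
    WHERE s_nationkey = n_nationkey;
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO nation VALUES (?, ?)",
                     [(1, "CANADA"), (2, "FRANCE")])
    conn.executemany("INSERT INTO supplier VALUES (?, ?, ?, ?)",
                     [(1, "S1", "Addr 1", 1), (2, "S2", "Addr 2", 1),
                      (3, "S3", "Addr 3", 2)])
    return conn


def compare(conn: sqlite3.Connection):
    # Query as written against the view (joins supplier with nation).
    via_view = sorted(conn.execute("SELECT n, a FROM Supplier_Info").fetchall())
    # Query after join elimination: nation is no longer referenced.
    rewritten = sorted(conn.execute(
        "SELECT s_name, s_address FROM supplier").fetchall())
    return via_view, rewritten


if __name__ == "__main__":
    conn = build_db()
    via_view, rewritten = compare(conn)
    print(via_view == rewritten)
    try:
        # A dangling reference would break the equivalence; the constraint rejects it.
        conn.execute("INSERT INTO supplier VALUES (4, 'S4', 'Addr 4', 99)")
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```

    If s_nationkey were nullable, rows with a NULL key would vanish from the view but not from the rewritten query, which is why the IS NOT NULL caveat matters.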

  • Example: Join Elimination

      SELECT p_name, p_retailprice, s_name, s_address
      FROM tpcd.lineitem, tpcd.partsupp, tpcd.part, tpcd.supplier
      WHERE p_partkey = ps_partkey
        AND s_suppkey = ps_suppkey
        AND ps_partkey = l_partkey
        AND ps_suppkey = l_suppkey
        AND l_shipdate BETWEEN '1994-01-01' AND '1996-06-30'
        AND l_discount >= 0.1
      GROUP BY p_name, p_retailprice, s_name, s_address
      ORDER BY p_name, s_name

    Schema (1-to-many relationships): PART (PARTKEY) -> PARTSUPP (PARTKEY, SUPPKEY); SUPPLIER (SUPPKEY) -> PARTSUPP; PARTSUPP (PARTKEY, SUPPKEY) -> LINEITEM (PARTKEY, SUPPKEY)

  • Example: Join Elimination. Any immediate improvements that can be seen here?

    Join predicates: P_PARTKEY = PS_PARTKEY, PS_PARTKEY = L_PARTKEY, S_SUPPKEY = PS_SUPPKEY, PS_SUPPKEY = L_SUPPKEY

    By transitivity: P_PARTKEY = L_PARTKEY and S_SUPPKEY = L_SUPPKEY

    (L_PARTKEY, L_SUPPKEY) in LINEITEM references the key (PS_PARTKEY, PS_SUPPKEY) of PARTSUPP, and no other PARTSUPP column is used, so the join with PARTSUPP can be eliminated.
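    The transitive reasoning on this slide can be sketched as computing equivalence classes of join columns. In this hypothetical helper (function names and the tuple representation are illustrative assumptions), columns of the eliminated table are dropped and the surviving members of each class are re-linked. It assumes the elimination itself has already been shown safe (foreign key from LINEITEM to PARTSUPP, non-null key columns, no other PARTSUPP columns referenced).

```python
from typing import Dict, List, Set, Tuple

Predicate = Tuple[str, str]  # (left column, right column) of an equality join


def equivalence_classes(predicates: List[Predicate]) -> List[Set[str]]:
    """Group columns that are transitively equal (union-find)."""
    parent: Dict[str, str] = {}

    def find(col: str) -> str:
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]  # path halving
            col = parent[col]
        return col

    for left, right in predicates:
        parent[find(left)] = find(right)

    classes: Dict[str, Set[str]] = {}
    for col in list(parent):
        classes.setdefault(find(col), set()).add(col)
    return list(classes.values())


def eliminate_table(predicates: List[Predicate], prefix: str) -> List[Predicate]:
    """Rewrite join predicates so that no column starting with `prefix` remains."""
    rewritten: List[Predicate] = []
    for cls in equivalence_classes(predicates):
        kept = sorted(c for c in cls if not c.startswith(prefix))
        rewritten.extend(zip(kept, kept[1:]))  # chain survivors: a = b, b = c, ...
    return sorted(rewritten)


if __name__ == "__main__":
    joins = [("p_partkey", "ps_partkey"), ("s_suppkey", "ps_suppkey"),
             ("ps_partkey", "l_partkey"), ("ps_suppkey", "l_suppkey")]
    for left, right in eliminate_table(joins, "ps_"):
        print(f"{left} = {right}")
```

    With the slide's four join predicates this yields l_partkey = p_partkey and l_suppkey = s_suppkey, i.e. the query joins LINEITEM directly with PART and SUPPLIER.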

  • Results (100 MB database)

    Execution time: 58.5 s -> 38.25 s (35% improvement)

    I/O cost: 4631 -> 1498 page reads (67% improvement)

    [Charts: Join Elimination Optimizing Query 1 Execution Time (seconds) and Join Elimination Optimizing Query 1 Pages Read, original vs. optimized. The embedded chart data also lists Query 2: 6331.64 s -> 79.33 s (about 99% improvement).]

  • Results: OLAP Environment. On OLAP (online analytical processing) servers using a star schema (one fact table with several dimension tables), improvements ranged from 2% to 96%.

    In these cases, much of the improvement came from reduced CPU cost rather than I/O, because the dimension tables were small enough to fit in memory.

  • Join Elimination Optimized Query in OLAP Environment (execution time in seconds, original -> optimized)

    I1: 98.2 -> 9.7

    I2: 576.2 -> 23.9

    I3: 11.3 -> 10.9

    I4: 12.5 -> 11.4

    I5: 504.3 -> 167.2

    I6: 586.4 -> 231

    I7: 523.5 -> 268.5

    I8: 5.6 -> 4.9

    I9: 5.2 -> 5.1

    I10: 4.7 -> 4.3
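    As a quick arithmetic check of the 2% to 96% range quoted for the OLAP environment, this sketch recomputes the percentage reduction for each query from the OLAP chart data (I1-I10), and also recomputes the Query 1 figures from the Results slide.

```python
from typing import Dict, Tuple

# Execution times in seconds, (original, optimized), from the OLAP chart data.
OLAP_TIMES: Dict[str, Tuple[float, float]] = {
    "I1": (98.2, 9.7), "I2": (576.2, 23.9), "I3": (11.3, 10.9),
    "I4": (12.5, 11.4), "I5": (504.3, 167.2), "I6": (586.4, 231.0),
    "I7": (523.5, 268.5), "I8": (5.6, 4.9), "I9": (5.2, 5.1),
    "I10": (4.7, 4.3),
}


def improvement(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before


def improvement_range(times: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Smallest and largest percentage reduction over all queries."""
    values = [improvement(orig, opt) for orig, opt in times.values()]
    return min(values), max(values)


if __name__ == "__main__":
    low, high = improvement_range(OLAP_TIMES)
    print(f"OLAP: {low:.0f}% to {high:.0f}%")
    print(f"Q1 time: {improvement(58.5, 38.25):.1f}%")
    print(f"Q1 pages: {improvement(4631, 1498):.1f}%")
```

    It prints 2% to 96% for the OLAP queries (I9 and I2 are the extremes). The Query 1 page-read reduction comes out at about 67.7%, which the slide reports as 67%.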

  • Predicate Introduction: techniques discussed

    Index Introduction: add a new predicate on an attribute if an index exists on that attribute. Assumption: index retrieval is better than a table scan. Is this always true?

    Scan Reduction: reduce the number of tuples that qualify for a join. Problem: rarely applicable, since check constraints or predicates with inequalities on join columns are uncommon.

    Detecting the empty answer set (results not shown, since query execution time is essentially 0).
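    Detecting an empty answer set amounts to finding a contradiction between the query predicates and the check constraints. A minimal sketch, assuming every predicate has already been normalized to a bound on a single numeric column (the representation, names, and the l_discount check constraint are hypothetical; real constraint reasoning also handles column-to-column comparisons and other data types):

```python
import math
from typing import Dict, List, Tuple

# (column, operator, constant) with operator in {"<", "<=", ">", ">=", "="}
Bound = Tuple[str, str, float]


def is_empty(predicates: List[Bound]) -> bool:
    """True if the conjunction of single-column bounds is unsatisfiable."""
    # Per column: [low, low_inclusive, high, high_inclusive]
    ranges: Dict[str, list] = {}
    for col, op, value in predicates:
        low, low_inc, high, high_inc = ranges.get(col, [-math.inf, False, math.inf, False])
        if op in (">", ">=", "="):
            inclusive = op != ">"
            # Keep the tighter lower bound; at equal values, exclusive is tighter.
            if value > low or (value == low and not inclusive):
                low, low_inc = value, inclusive
        if op in ("<", "<=", "="):
            inclusive = op != "<"
            # Keep the tighter upper bound; at equal values, exclusive is tighter.
            if value < high or (value == high and not inclusive):
                high, high_inc = value, inclusive
        ranges[col] = [low, low_inc, high, high_inc]
    for low, low_inc, high, high_inc in ranges.values():
        if low > high or (low == high and not (low_inc and high_inc)):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical check constraint: 0 <= l_discount <= 0.10
    check = [("l_discount", ">=", 0.0), ("l_discount", "<=", 0.10)]
    query = [("l_discount", ">", 0.2)]
    print(is_empty(check + query))  # True: the query cannot return any rows
```

    When the test succeeds, the query can be answered without touching the data, which is why its execution time is essentially 0.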

  • Example - Predicate Introduction

      SELECT SUM(l_extendedprice * l_discount) AS revenue
      FROM tpcd.lineitem
      WHERE l_shipdate >= DATE('1994-01-01')
        AND l_shipdate < DATE('1994-01-01') + 1 YEAR
        AND l_discount BETWEEN .06 - 0.01 AND .06 + 0.01
        AND l_quantity < 24;

    Check constraint: l_shipdate <= l_receiptdate

  • Example - Predicate Introduction (with the introduced predicate)

      SELECT SUM(l_extendedprice * l_discount) AS revenue
      FROM tpcd.lineitem
      WHERE l_shipdate >= DATE('1994-01-01')
        AND l_shipdate < DATE('1994-01-01') + 1 YEAR
        AND l_receiptdate >= DATE('1994-01-01')
        AND l_discount BETWEEN .06 - 0.01 AND .06 + 0.01
        AND l_quantity < 24;

    Check constraint: l_shipdate <= l_receiptdate

    Why would we want to do this? So that the optimizer can choose a plan that uses the index on l_receiptdate. Is this always good? NO! If most of the rows in the table need to be returned, a table scan is the better choice.
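    The trade-off behind this "NO!" can be made concrete with a deliberately crude cost model (an assumption for illustration, not DB2's cost model): an unclustered index scan costs roughly one random page fetch per qualifying row, while a table scan reads every page once sequentially, so the index only wins below some selectivity. The 4:1 random-to-sequential cost ratio and the table size are made up.

```python
def index_scan_cost(rows: int, selectivity: float, random_io: float = 4.0) -> float:
    """Hypothetical cost: one random page read per qualifying row (unclustered index)."""
    return rows * selectivity * random_io


def table_scan_cost(pages: int, seq_io: float = 1.0) -> float:
    """Hypothetical cost: every page of the table is read once, sequentially."""
    return pages * seq_io


def prefer_index(rows: int, pages: int, selectivity: float) -> bool:
    """Choose the index only when its estimated cost is lower."""
    return index_scan_cost(rows, selectivity) < table_scan_cost(pages)


def break_even_selectivity(rows: int, pages: int,
                           random_io: float = 4.0, seq_io: float = 1.0) -> float:
    """Selectivity at which both plans cost the same; above it, scan the table."""
    return (pages * seq_io) / (rows * random_io)


if __name__ == "__main__":
    rows, pages = 600_000, 60_000  # made-up table: 10 rows per page
    for sel in (0.02, 0.48):
        plan = "index scan" if prefer_index(rows, pages, sel) else "table scan"
        print(f"selectivity {sel:.0%}: {plan}")
    print(f"break-even selectivity: {break_even_selectivity(rows, pages):.1%}")
```

    Under these assumptions a 2% predicate favors the index and a 48% predicate favors the table scan, with the crossover at 2.5%; real optimizers also account for clustering, prefetching and buffer hits.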

  • Predicate Introduction - Algorithm. Input: the set of all check constraints defined for the database and the set of all predicates in the query.

    Output: the set of all non-redundant formulas derivable from the input set. Any of these could be added to the query, but only a few are potentially useful. The goal in the paper was to choose additions that would guarantee improvement.

    Conditions in the paper: a conservative approach that introduces only predicates that will make the plan optimizer use an index, insisting that only one index be available with the query predicate.
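    A much-simplified sketch of the derivation step, under assumptions that are ours rather than the paper's: check constraints are restricted to the form a <= b between two columns, query predicates are lower bounds col >= date, and a derived predicate is kept only if it is on a new column that carries an index. The function names, the check constraint l_shipdate <= l_receiptdate, and the index on l_receiptdate are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

Constraint = Tuple[str, str]  # (a, b) encodes the check constraint a <= b


def derive_lower_bounds(lower: Dict[str, str],
                        constraints: List[Constraint]) -> Dict[str, str]:
    """Propagate `col >= value` through `a <= b` constraints until nothing changes.
    Values are ISO dates ('YYYY-MM-DD'), so string comparison matches date order."""
    derived = dict(lower)
    changed = True
    while changed:
        changed = False
        for a, b in constraints:
            # a >= v and a <= b imply b >= v; keep only the tightest (largest) bound
            if a in derived and (b not in derived or derived[a] > derived[b]):
                derived[b] = derived[a]
                changed = True
    return derived


def introduce_predicates(lower: Dict[str, str], constraints: List[Constraint],
                         indexed: Set[str]) -> List[str]:
    """Return derived predicates worth adding: new columns that carry an index."""
    derived = derive_lower_bounds(lower, constraints)
    return [f"{col} >= DATE('{value}')"
            for col, value in sorted(derived.items())
            if col not in lower and col in indexed]


if __name__ == "__main__":
    query_bounds = {"l_shipdate": "1994-01-01"}  # from the query's WHERE clause
    checks = [("l_shipdate", "l_receiptdate")]   # assumed: l_shipdate <= l_receiptdate
    print(introduce_predicates(query_bounds, checks, indexed={"l_receiptdate"}))
```

    This reproduces the introduced predicate l_receiptdate >= DATE('1994-01-01') from the example; without an index on l_receiptdate nothing would be added.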

  • Predicate Introduction - Results

    [Chart: Predicate Introduction Estimated Costs; estimated cost (internal units) of queries P1-P5, original vs. optimized]

  • Predicate Introduction - Results: Why?

    [Chart: Predicate Introduction Execution Times (seconds, original -> optimized)]

    P1: 13.5 -> 5.4

    P2: 24.9 -> 4.9

    P3: 25.1 -> 58.3

    P4: 46.4 -> 38.6

    P5: 56.5 -> 98.3

  • Why Longer Execution for P3/P5? P2 and P3 are identical except for the following:

    P2:

      SELECT ...
      FROM ...
      WHERE l_shipdate >= DATE('1998-09-01')
        AND l_shipdate < DATE('1998-09-01') + 1 MONTH

    P3:

      SELECT ...
      FROM ...
      WHERE l_shipdate >= DATE('1995-09-01')
        AND l_shipdate < DATE('1995-09-01') + 1 MONTH

    The difference: 2% of the tuples fall in P2's range, while 48% fall in P3's. Yet BOTH plans choose an index scan! P3's range is so large that a table scan is better in this case. Two causes:

    The cost model underestimates the cost of locking/unlocking index pages.

    The estimated number of tuples drops because of the reduction factor problem: the selectivity of the newly added (redundant) predicate is multiplied into the estimate.
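    The reduction factor problem can be reproduced on synthetic data (entirely made up here). The introduced predicate is implied by the original predicate plus the check constraint shipdate <= receiptdate, so it removes no rows, yet an estimator that multiplies selectivities as if the predicates were independent lowers its row estimate.

```python
from typing import List, Tuple

Row = Tuple[int, int]  # (shipdate, receiptdate) as day numbers


def make_rows(n: int = 100, lag: int = 5) -> List[Row]:
    """Synthetic rows that satisfy the check constraint shipdate <= receiptdate."""
    return [(day, day + lag) for day in range(n)]


def estimate_vs_actual(rows: List[Row], start: int) -> Tuple[float, int]:
    def original(r: Row) -> bool:    # query predicate: shipdate >= start
        return r[0] >= start

    def introduced(r: Row) -> bool:  # derived predicate: receiptdate >= start
        return r[1] >= start

    n = len(rows)
    sel_original = sum(original(r) for r in rows) / n
    sel_introduced = sum(introduced(r) for r in rows) / n
    # Independence assumption: multiply the individual reduction factors.
    estimate = n * sel_original * sel_introduced
    # Reality: the introduced predicate is implied, so it removes no extra rows.
    actual = sum(original(r) and introduced(r) for r in rows)
    return estimate, actual


if __name__ == "__main__":
    est, act = estimate_vs_actual(make_rows(), start=50)
    print(f"estimated rows: {est:.1f}, actual rows: {act}")
```

    Here the estimate falls from 50 rows to 27.5 while the true count stays at 50; an underestimate of this kind makes an index plan look cheaper than it is.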

  • Adjustments for Reduction Factor Problem: add a new predicate only when it contains the major (leading) column of an index and a scan of that index alone is sufficient to answer the query (so no access to the table itself is necessary).

    Original Index:

    New Index:

    [Chart: Predicate Introduction Execution Times after the adjustment (seconds, original -> optimized)]

    P3: 21.3 -> 10.9

    P5: 52.2 -> 45.6
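    The adjusted rule amounts to a simple eligibility test. A sketch with hypothetical names, assuming an index is described by its ordered key columns and the query by the set of columns it references; the covering index below is made up for illustration.

```python
from typing import Iterable, Sequence


def may_introduce(pred_column: str, index_columns: Sequence[str],
                  query_columns: Iterable[str]) -> bool:
    """Adjusted rule: add a predicate on `pred_column` only if it is the major
    (leading) column of the index AND the index covers every column the query
    needs, so the query can be answered by an index-only scan."""
    if not index_columns or index_columns[0] != pred_column:
        return False
    return set(query_columns) <= set(index_columns)


if __name__ == "__main__":
    needed = {"l_receiptdate", "l_shipdate", "l_discount", "l_quantity", "l_extendedprice"}
    covering = ("l_receiptdate", "l_shipdate", "l_discount", "l_quantity", "l_extendedprice")
    print(may_introduce("l_receiptdate", covering, needed))            # True
    print(may_introduce("l_receiptdate", ("l_receiptdate",), needed))  # False: table access needed
```

    Requiring an index-only plan sidesteps the underestimated per-row table fetches, which is where the P3/P5 slowdowns came from.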

  • Order Optimization Techniques: access plan strategies exploit the physical orderings provided either by indexes or by sorting.

    GOAL: optimize the sorting strategy

    Techniques:

    Pushing down sorts in joins

    Minimizing the number of sorting columns

    Detecting when sorting can be avoided because of predicates, keys, or indexes

    Order Optimization: detecting when indexes provide an interesting order, so that sorting can be avoided or used as sparingly as possible.

    Interesting Orders: rows produced in sorted order as a side effect of an operation such as a join, which can be exploited later (by another join, ORDER BY, GROUP BY, or DISTINCT).

  • Fundamental Operators: order optimization requires the following operations:

    Reduce Order

    Test Order

    Cover Order

    Homogenize Order
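    A simplified sketch of the first two operators in the spirit of Simmen et al. (1996): Reduce Order rewrites each column to the head of its equivalence class and drops columns that are constant or functionally determined by the columns kept before them; Test Order (here satisfies_order) reduces both orders and checks a prefix match. Assumptions: ascending order only, functional dependencies are already expressed over class heads, Cover and Homogenize are omitted, and all names and representations are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], str]  # (determinant columns, dependent column)


def reduce_order(order: Iterable[str], equiv: Dict[str, str],
                 constants: Set[str], fds: List[FD]) -> List[str]:
    """Rewrite columns to equivalence-class heads; drop columns that are constant
    or functionally determined by the columns kept before them."""
    const_heads = {equiv.get(c, c) for c in constants}
    reduced: List[str] = []
    for col in order:
        head = equiv.get(col, col)
        if head in const_heads or head in reduced:
            continue  # constants and repeats never change the order
        if any(dep == head and lhs <= set(reduced) for lhs, dep in fds):
            continue  # determined by a prefix already kept
        reduced.append(head)
    return reduced


def satisfies_order(interesting: Iterable[str], physical: Iterable[str],
                    equiv: Dict[str, str], constants: Set[str], fds: List[FD]) -> bool:
    """Test Order: True if the physical (e.g. index) order satisfies the interesting order."""
    wanted = reduce_order(interesting, equiv, constants, fds)
    have = reduce_order(physical, equiv, constants, fds)
    return have[:len(wanted)] == wanted


if __name__ == "__main__":
    # WHERE a = 5 ... ORDER BY a, b with an index on (b): no sort needed
    print(satisfies_order(["a", "b"], ["b"], {}, {"a"}, []))
    # ORDER BY id, name where id is a key (id -> name) and an index on (id)
    print(satisfies_order(["id", "name"], ["id"], {}, set(), [(frozenset({"id"}), "name")]))
    # WHERE x.k = y.k ORDER BY x.k, rows arriving in y.k order
    print(satisfies_order(["x.k"], ["y.k"], {"x.k": "y.k"}, set(), []))
```

    All three cases print True, meaning the sort can be skipped; ORDER BY b over rows arriving in a order would still need a sort.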

  • Order Optimization Results

  • Exploiting Uniqueness: checking whether a query contains unnecessary DISTINCT clauses. How does this improve performance?

    Removing duplicates is performed by SORTING, a costly operation.

    Example: removing the DISTINCT keyword from a query when it is applied to the primary key itself (primary key values are, by definition, distinct).

  • How to exploit uniqueness? Using knowledge about:

    Keys

    Table constraints

    Query predicates

    Uniqueness cannot always be tested efficiently, so we look for a sufficient condition instead.
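    A simplified sufficient test in the spirit of Paulley and Larson. The representation and names are hypothetical, and it assumes keys contain no NULLs, column names are unique across tables (as with the TPC-D prefixes), and predicates are conjunctive equalities in the WHERE clause. DISTINCT is treated as redundant if, for every table in the FROM clause, some key is fixed by the projected columns, by columns equated to constants, or by equalities and keys that propagate from them. If the test fails, DISTINCT may or may not be needed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class Table:
    columns: Set[str]
    keys: List[Set[str]]  # candidate keys; assumed to contain no NULLs


def distinct_is_redundant(tables: Dict[str, Table], select_cols: Iterable[str],
                          constant_cols: Iterable[str] = (),
                          equalities: Iterable[Tuple[str, str]] = ()) -> bool:
    """Sufficient (not necessary) test: True means DISTINCT can be dropped."""
    equalities = list(equalities)
    bound = set(select_cols) | set(constant_cols)  # columns fixed by each result row
    changed = True
    while changed:
        changed = False
        # col1 = col2 in WHERE: if one side is fixed, so is the other
        for a, b in equalities:
            if (a in bound) != (b in bound):
                bound |= {a, b}
                changed = True
        # a fixed key fixes every column of its table
        for table in tables.values():
            if any(key <= bound for key in table.keys) and not table.columns <= bound:
                bound |= table.columns
                changed = True
    # every table's row is identified by the output -> no duplicates possible
    return all(any(key <= bound for key in table.keys) for table in tables.values())


if __name__ == "__main__":
    supplier = Table({"s_suppkey", "s_name", "s_nationkey"}, [{"s_suppkey"}])
    nation = Table({"n_nationkey", "n_name"}, [{"n_nationkey"}])
    both = {"supplier": supplier, "nation": nation}
    join = [("s_nationkey", "n_nationkey")]
    # SELECT DISTINCT s_suppkey, n_name FROM supplier, nation WHERE s_nationkey = n_nationkey
    print(distinct_is_redundant(both, {"s_suppkey", "n_name"}, equalities=join))  # True
    # SELECT DISTINCT s_name, n_name ...: test fails, so DISTINCT is kept
    print(distinct_is_redundant(both, {"s_name", "n_name"}, equalities=join))     # False
```

    The fixpoint loop is what lets the supplier key reach the nation key through the join predicate; a single-table query projecting its primary key is the trivial case from the previous slide.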

  • Summary

    Important outcome: experimental evidence showing that SQO can provide an effective enhancement to traditional query optimization.

    Join Elimination: geared towards OLAP environments (where it is very useful). It does not depend on the existence of complex integrity constraints; the semantic reasoning used involves only referential integrity constraints. Easy to implement and execute.

    Predicate Introduction: guaranteeing improvements is more difficult and requires rather severe restrictions (which limit the applicability of this approach).

    Order Optimization: using functional dependencies and table information to build a smart access plan that avoids or optimizes sort operations.

    Exploiting Uniqueness: uniqueness is powerful when it reduces the number of expensive sorts. Finding practical ways to exploit it is tricky and case-specific.

  • References

    Qi Cheng, Jarek Gryz, Fred Koo, et al.: Implementation of Two Semantic Query Optimization Techniques in DB2 Universal Database. Proceedings of the 25th VLDB Conference, Edinburgh, Scotland, 1999.

    David E. Simmen, Eugene J. Shekita, Timothy Malkemus: Fundamental Techniques for Order Optimization. SIGMOD Conference 1996: 57-67.

    G. N. Paulley, Per-Åke Larson: Exploiting Uniqueness in Query Optimization. ICDE 1994: 68-79.

  • The End.

    Stage 1: Query Rewrite Phase. A rule-based system, easy to extend when needed without worrying about other areas of the code. Rewrites include predicate pushdown, subquery-to-join transformation, magic sets transformation, duplicate handling, view merging, and decorrelation of complex subqueries.

    Less than 1% of query execution time is spent in the query rewrite phase.

    Stage 2: Query Plan Optimization. DB2's limitation: only ONE of the candidate queries can be passed to the query plan optimizer, so the query generated by SQO MUST be faster (which is not always the case!).

    Advantage: this keeps the optimizer from spending more time on optimization than on query execution itself.