
Semantic Query Optimization Techniques

November 16, 2005

By : Mladen Kovacevic


Background
• In the 1980s it was recognized that semantic information stored in databases as integrity constraints could be used for query optimization.
• Semantic: “of or relating to meaning or the study of meaning” (http://wordnet.princeton.edu)
• Integrity: preserving data consistency when changes are made in the database.
• No extensive implementation existed as of 1999.


Introduction
• A key factor in a relational database system’s improvement in query execution time is query optimization.
• Query execution can be improved by:
– Analyzing integrity information and rewriting queries to exploit it (JE & PI)
– Avoiding expensive sorting costs (Order Optimization)
– Exploiting uniqueness: knowing that rows will be unique avoids extra sorts (EU)


Presentation Overview

• Semantic Query Optimization techniques

– Join Elimination (JE)

– Predicate Introduction (PI)

– Order Optimization (OO)

– Exploiting Uniqueness (EU)


Some Motivation
• Describing two SQO techniques demonstrated in DB2 UDB:
– Predicate Introduction
– Join Elimination
• Reasons:
– rewriting queries by hand showed that these two provided consistent optimization
– practical to implement
– extendible to other DBMSs
• Data sets used: TPC-D and APB-1 OLAP benchmarks
• Only REFERENTIAL INTEGRITY constraints and CHECK constraints were used!


Semantic Query Optimization (SQO) Techniques

• Join Elimination: some joins need NOT be evaluated, since the result may be known a priori (more on this later).
• Join Introduction: adding a join can help if the added relation is small compared to the original relations and the join is highly selective.
• Predicate Elimination: if a predicate is known to be always true, it can be eliminated from the query (for example, a DISTINCT clause on a primary key; this is uniqueness exploitation!).
• Predicate Introduction: new predicates on indexed attributes can result in a much faster access plan.
• Detecting the Empty Answer Set: if the query predicates are inconsistent with the integrity constraints, the query has no answer.
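The last technique in the list above can be illustrated with a minimal sketch. It assumes the simplest possible case: one check constraint and one query predicate, each restricting the same column to a closed numeric range. The function name `is_provably_empty` and the discount constraint are hypothetical and are not taken from the slides or from DB2.

```python
import math


def is_provably_empty(check_range, query_range):
    """Return True when two closed ranges on the same column cannot overlap."""
    check_low, check_high = check_range
    query_low, query_high = query_range
    # The ranges intersect only if the larger lower bound does not exceed
    # the smaller upper bound.
    return max(check_low, query_low) > min(check_high, query_high)


# Hypothetical check constraint: l_discount BETWEEN 0.00 AND 0.10
CHECK_DISCOUNT = (0.00, 0.10)

# Query predicate l_discount >= 0.50 contradicts the constraint.
empty_for_high_discount = is_provably_empty(CHECK_DISCOUNT, (0.50, math.inf))

# Query predicate l_discount BETWEEN 0.05 AND 0.07 is consistent with it.
empty_for_mid_discount = is_provably_empty(CHECK_DISCOUNT, (0.05, 0.07))
```

When the function returns True, the query could be answered with an empty result without touching the table. A False result only means that this particular test found no contradiction.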


Why are SQO implementations not used?
• Deductive databases: in many cases SQO techniques were designed for deductive databases, and so did not appear to be useful in a relational database context.
• CPU & I/O speeds similar: when the techniques were being developed, CPU and I/O speeds were not as dramatically different (savings in I/O were not worth the added CPU time).
• Lack of integrity constraints: it was thought that many integrity constraints are needed for SQO to be useful.


Two-stage Optimizer
• Examples of SQO techniques were always designed for a two-stage optimizer:
– Stage 1: logically equivalent queries are created (DB2’s query rewrite optimization)
– Stage 2: plans are generated for all of these queries, and the one with the lowest estimated cost is chosen (DB2’s query plan optimization)
• Plan choices include join order, join methods, join site in a distributed database, method for accessing input tables, etc.
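The two stages described above can be sketched as a toy pipeline. This only illustrates the control flow and is not how DB2 is implemented: queries are plain strings, and `optimize`, `drop_nation_join` and `toy_cost` are hypothetical names invented for this example.

```python
def optimize(query, rewrite_rules, estimate_cost):
    """Toy two-stage optimizer: rewrite first, then pick the cheapest candidate."""
    # Stage 1: collect the original query plus every logically equivalent rewrite.
    candidates = [query]
    for rule in rewrite_rules:
        rewritten = rule(query)
        if rewritten is not None:
            candidates.append(rewritten)
    # Stage 2: choose the candidate with the lowest estimated cost.
    return min(candidates, key=estimate_cost)


def drop_nation_join(query):
    """Hypothetical rewrite rule standing in for join elimination."""
    if query == "supplier JOIN nation":
        return "supplier"
    return None  # rule does not apply


def toy_cost(query):
    """Hypothetical cost model: one unit per table that must be read."""
    return len(query.split(" JOIN "))


best = optimize("supplier JOIN nation", [drop_nation_join], toy_cost)
```

Keeping the original query among the candidates means a rewrite is only chosen when the cost model prefers it.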


Join Elimination
• Simple: eliminate a relation when the join is over tables related through a referential integrity constraint and the primary key table is referenced only in the join.

VIEW DEFINITION
CREATE VIEW Supplier_Info (n, a, c) AS
SELECT s_name, s_address, n_name
FROM tpcd.supplier, tpcd.nation
WHERE s_nationkey = n_nationkey

QUERY
SELECT n, a
FROM Supplier_Info


Join Elimination (cont’d)
• The query can be rewritten internally as:

SELECT s_name, s_address
FROM tpcd.supplier

Why do such a simple rewrite?
• The user may not have access to the supplier table, and/or may only know about the view.
• Sometimes GUI managers create these “dumb” queries, so they need to be optimized.
• Non-programmers often write queries and may not even think about this.
• An algorithm for generic redundant join removal is provided in the paper.
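The equivalence behind this rewrite can be checked on a small scale. The sketch below uses Python's built-in sqlite3 module rather than DB2, drops the tpcd schema prefix, and uses made-up rows. It also assumes that s_nationkey is declared NOT NULL, since a NULL foreign key would make the join drop that supplier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE nation (
    n_nationkey INTEGER PRIMARY KEY,
    n_name      TEXT NOT NULL
);
CREATE TABLE supplier (
    s_suppkey   INTEGER PRIMARY KEY,
    s_name      TEXT NOT NULL,
    s_address   TEXT NOT NULL,
    s_nationkey INTEGER NOT NULL REFERENCES nation (n_nationkey)
);
CREATE VIEW Supplier_Info (n, a, c) AS
    SELECT s_name, s_address, n_name
    FROM supplier, nation
    WHERE s_nationkey = n_nationkey;
INSERT INTO nation VALUES (1, 'CANADA'), (2, 'FRANCE');
INSERT INTO supplier VALUES
    (10, 'Supplier#10', '1 Main St', 1),
    (11, 'Supplier#11', '2 High St', 2),
    (12, 'Supplier#12', '3 Low St', 1);
""")

# Query against the view: joins supplier with nation.
via_view = sorted(conn.execute("SELECT n, a FROM Supplier_Info").fetchall())

# Rewritten query: the join with nation has been eliminated.
rewritten = sorted(conn.execute("SELECT s_name, s_address FROM supplier").fetchall())

same_answer = via_view == rewritten
```

Every supplier row matches exactly one nation row because of the foreign key and the primary key on nation, so the join neither removes nor duplicates suppliers. That is why the nation table can be dropped when none of its columns is selected.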


Example - Join Elimination

SELECT p_name, p_retailprice, s_name, s_address
FROM tpcd.lineitem, tpcd.partsupp, tpcd.part, tpcd.supplier
WHERE p_partkey = ps_partkey and s_suppkey = ps_suppkey and
      ps_partkey = l_partkey and ps_suppkey = l_suppkey and
      l_shipdate between '1994-01-01' and '1996-06-30' and
      l_discount >= 0.1
GROUP BY p_name, p_retailprice, s_name, s_address
ORDER BY p_name, s_name

[Figure: schema diagram showing PART (PARTKEY), SUPPLIER (SUPPKEY), PARTSUPP (PARTKEY, SUPPKEY) and LINEITEM (PARTKEY, SUPPKEY), linked by 1-to-many relationships.]


Example: Join Elimination
• Are there any immediate improvements that can be seen here?

p_partkey = ps_partkey and s_suppkey = ps_suppkey and
ps_partkey = l_partkey and ps_suppkey = l_suppkey

By transitivity of equality:
P_PARTKEY = PS_PARTKEY and PS_PARTKEY = L_PARTKEY give P_PARTKEY = L_PARTKEY
S_SUPPKEY = PS_SUPPKEY and PS_SUPPKEY = L_SUPPKEY give S_SUPPKEY = L_SUPPKEY

PART and SUPPLIER can therefore be joined to LINEITEM directly, which makes the join with PARTSUPP a candidate for elimination.
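The derivation of new equalities from the join predicates can be sketched with a small union-find structure that groups columns known to be equal. This is an illustrative helper, not the algorithm from the paper; `equivalence_classes` is a hypothetical name.

```python
def equivalence_classes(equalities):
    """Group column names that are connected by a chain of equality predicates."""
    parent = {}

    def find(col):
        # Locate the representative of col's class, shortening paths on the way.
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]
            col = parent[col]
        return col

    for left, right in equalities:
        parent[find(left)] = find(right)  # merge the two classes

    classes = {}
    for col in list(parent):
        classes.setdefault(find(col), set()).add(col)
    return sorted(sorted(members) for members in classes.values())


# Join predicates from the example query.
predicates = [
    ("p_partkey", "ps_partkey"),
    ("s_suppkey", "ps_suppkey"),
    ("ps_partkey", "l_partkey"),
    ("ps_suppkey", "l_suppkey"),
]
classes = equivalence_classes(predicates)
```

Any two columns in the same class may be equated, so p_partkey = l_partkey and s_suppkey = l_suppkey follow without being written in the query.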


Results
• 100 MB db size
• Execution Time: 58.5 sec -> 38.25 sec (35% improvement)
• I/O Cost: 4631 -> 1498 page reads (67% improvement)

[Figure: "Join Elimination Optimizing Query 1 Pages Read", bar chart of pages read: Original 4631, Optimized 1498.]

[Figure: "Join Elimination Optimizing Query 1 Execution Time", bar chart of seconds: Original 58.5, Optimized 38.25.]


Results - OLAP Environment
• In OLAP (online analytical processing) servers using a star schema (one fact table with several dimension tables), improvements ranged from 2% to 96%.
– In these cases, much of the improvement came from CPU cost instead of I/O, because the dimension tables were small enough to fit into memory.


[Figure: "Join Elimination Optimized Query in OLAP Environment", execution time in seconds (axis 0 to 700) for queries I1 to I10.]


Predicate Introduction
• Techniques discussed:
– Index Introduction: add a new predicate on an attribute if an index exists on that attribute.
   – Assumption: index retrieval is better than a table scan. Is this always true?
– Scan Reduction: reduce the number of tuples that qualify for a join.
   – Problem: not very common; it is unlikely that there will be any check constraints or predicates with inequalities on join columns.
• Detecting an empty query answer set (not shown, as query execution time is essentially 0)


Example - Predicate Introduction

SELECT sum(l_extendedprice * l_discount) as revenue
FROM tpcd.lineitem
WHERE l_shipdate >= date('1994-01-01') and
      l_shipdate < date('1994-01-01') + 1 year and
      l_discount between .06 - 0.01 and .06 + 0.01 and
      l_quantity < 24;

Check Constraint: l_shipdate <= l_receiptdate
Index: l_receiptdate

• Maintaining semantics, we can add:
– l_receiptdate >= date('1994-01-01')


Example - Predicate Introduction

SELECT sum(l_extendedprice * l_discount) as revenue
FROM tpcd.lineitem
WHERE l_shipdate >= date('1994-01-01') and
      l_shipdate < date('1994-01-01') + 1 year and
      l_receiptdate >= date('1994-01-01') and
      l_discount between .06 - 0.01 and .06 + 0.01 and
      l_quantity < 24;

Check Constraint: l_shipdate <= l_receiptdate
Index: l_receiptdate

• Maintaining semantics, we can add:
– l_receiptdate >= date('1994-01-01')
• Why would we want to do this? So that the optimizer can choose a plan using the index. Is this always good?
• NO! What if most of the rows in the table need to be returned? Then a table scan should be used instead.
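That the introduced predicate preserves the query's meaning can be checked on a toy table. The sketch below uses Python's built-in sqlite3 module instead of DB2, stores dates as ISO-8601 strings (which compare correctly as text), replaces the `date(...) + 1 year` arithmetic with a literal upper bound, and uses made-up rows. The index name `idx_receiptdate` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lineitem (
    l_shipdate    TEXT NOT NULL,
    l_receiptdate TEXT NOT NULL,
    l_quantity    INTEGER NOT NULL,
    CHECK (l_shipdate <= l_receiptdate)
);
CREATE INDEX idx_receiptdate ON lineitem (l_receiptdate);
INSERT INTO lineitem VALUES
    ('1993-12-20', '1994-01-05', 10),
    ('1994-03-01', '1994-03-15', 20),
    ('1994-12-30', '1995-01-10', 5),
    ('1995-02-01', '1995-02-20', 7);
""")

ORIGINAL = """
SELECT rowid FROM lineitem
WHERE l_shipdate >= '1994-01-01' AND l_shipdate < '1995-01-01'
ORDER BY rowid
"""

# Same query plus the predicate implied by the check constraint.
WITH_INTRODUCED = """
SELECT rowid FROM lineitem
WHERE l_shipdate >= '1994-01-01' AND l_shipdate < '1995-01-01'
  AND l_receiptdate >= '1994-01-01'
ORDER BY rowid
"""

original_rows = conn.execute(ORIGINAL).fetchall()
introduced_rows = conn.execute(WITH_INTRODUCED).fetchall()
```

Both queries select the same rows, because every row with l_shipdate >= '1994-01-01' must also have l_receiptdate >= '1994-01-01' under the check constraint. Whether the extra predicate makes the query faster is a separate question that depends on how many rows qualify.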


Predicate Introduction - Algorithm
• Input: the set of all check constraints defined for a database and the set of all predicates in the query.
• Output: the set of all non-redundant formulas derivable from the input set. This answer set can then be added to the query, but only a few of the formulas are potentially useful.
• The goal in the paper was to choose additions that would guarantee improvement.
• Conditions in the paper: a conservative approach of introducing only predicates that will make the plan optimizer use an index, insisting that only one index be available with the query predicate.
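A heavily simplified sketch of this derive-then-filter idea follows. It is not the algorithm from the paper: it assumes check constraints only of the form column <= column and query predicates only of the form column >= constant or column <= constant, and the names `ColumnLE`, `Bound`, `derive_bounds` and `useful` are hypothetical.

```python
from typing import NamedTuple


class ColumnLE(NamedTuple):
    """Check constraint of the form low <= high between two columns."""
    low: str
    high: str


class Bound(NamedTuple):
    """Query predicate of the form column <op> constant."""
    column: str
    op: str
    value: str


def derive_bounds(checks, predicates):
    """Derive new bounds implied by check constraints and query predicates."""
    derived = set()
    for chk in checks:
        for pred in predicates:
            if pred.op == ">=" and pred.column == chk.low:
                # low >= c and low <= high imply high >= c
                derived.add(Bound(chk.high, ">=", pred.value))
            elif pred.op == "<=" and pred.column == chk.high:
                # high <= c and low <= high imply low <= c
                derived.add(Bound(chk.low, "<=", pred.value))
    return derived - set(predicates)  # drop formulas already in the query


def useful(derived, indexed_columns):
    """Keep only derived predicates on columns that have an index."""
    return {bound for bound in derived if bound.column in indexed_columns}


checks = [ColumnLE("l_shipdate", "l_receiptdate")]
query_predicates = [
    Bound("l_shipdate", ">=", "1994-01-01"),
    Bound("l_shipdate", "<", "1995-01-01"),
]
candidates = useful(derive_bounds(checks, query_predicates), {"l_receiptdate"})
```

For the example query this yields the single candidate l_receiptdate >= '1994-01-01'. The strict upper bound on l_shipdate produces nothing, since an upper bound on the smaller column says nothing about the larger one.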


Predicate Introduction - Results

[Figure: "Predicate Introduction Estimated Costs", estimated cost in internal units (axis 0 to 250000) for queries P1 to P5, Original vs. Optimized.]


Predicate Introduction - Results

[Figure: "Predicate Introduction Execution Times", execution time in seconds (axis 0 to 120) for queries P1 to P5, Original vs. Optimized, with a "Why?" callout.]


Why Longer Execution for P3/P5?
• P2 and P3 are the same except for the following:

P2:
SELECT ...
FROM ...
WHERE l_shipdate >= date('1998-09-01') and
      l_shipdate < date('1998-09-01') + 1 month

P3:
SELECT ...
FROM ...
WHERE l_shipdate >= date('1995-09-01') and
      l_shipdate < date('1995-09-01') + 1 month

• The difference lies in the data: for P2, 2% of the tuples fall in the range, while for P3, 48% of the tuples do. BOTH plans choose an index scan! For P3 the range is so large that a table scan is better.
• Two reasons the optimizer still picks the index scan:
1. The cost model underestimates the cost of locking/unlocking index pages.
2. The estimated number of tuples goes down because of the reduction factor problem (the reduction factor of the newly added predicate is multiplied in).
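The reduction factor problem in point 2 can be made concrete with a little arithmetic. The sketch assumes an optimizer that treats predicates as independent and multiplies their selectivities. The table size and the shipdate selectivity are made-up numbers; only the 48% figure comes from the slide.

```python
def estimated_rows(table_rows, selectivities):
    """Cardinality estimate under the independence assumption."""
    estimate = float(table_rows)
    for selectivity in selectivities:
        estimate *= selectivity  # each predicate is assumed to filter independently
    return estimate


TABLE_ROWS = 600_000     # hypothetical table size
SEL_SHIPDATE = 0.01      # hypothetical: fraction of rows shipped in the chosen month
SEL_RECEIPTDATE = 0.48   # introduced predicate; 48% of tuples qualify for P3

before = estimated_rows(TABLE_ROWS, [SEL_SHIPDATE])
after = estimated_rows(TABLE_ROWS, [SEL_SHIPDATE, SEL_RECEIPTDATE])
```

The introduced predicate is implied by the original ones, so the true result size does not change. The estimate nevertheless drops to 48% of its previous value, which makes the index plan look cheaper than it really is.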


Adjustments for Reduction Factor Problem
• Add the new predicate only when it contains a major column of an index and a scan of that index is sufficient to answer the query (thus, no table scan is necessary).
• Original Index: <receiptdate, discount, quantity, extendedprice>
• New Index: <receiptdate, discount, quantity, extendedprice, shipdate, partkey, suppkey, orderkey>

[Figure: "Predicate Introduction Execution Times", execution time in seconds (axis 0 to 60) for queries P3 and P5, Original vs. Optimized.]


Order Optimization Techniques
• Access plan strategies exploit the physical orderings provided either by indexes or by sorting.
• GOAL: optimize the sorting strategy
• Techniques
– Pushing down sorts in joins
– Minimizing the number of sorting columns
– Detecting when sorting can be avoided because of predicates, keys or indexes
• Order Optimization: detecting when indexes provide an interesting order, so that sorting can be either avoided or used as sparingly as possible.
• Interesting Orders: when a join produces rows in sorted order as a side effect, that order can be taken advantage of later (if another join, ORDER BY, GROUP BY or DISTINCT is needed).


Fundamental Operators
• Order optimization requires the following operations:
– Reduce Order
– Test Order
– Cover Order
– Homogenize Order


Order Optimization Results


Exploiting Uniqueness
• Checking whether the query contains unnecessary DISTINCT clauses
– How does this make improvements?
• Removing duplicates is performed by SORTING, a costly operation.
• Example: removing the DISTINCT keyword from a query if it is applied to the primary key itself (since primary keys are, by definition, distinct).


How to exploit uniqueness?
• Using knowledge about:
– Keys
– Table Constraints
– Query Predicates
• Uniqueness cannot always be tested efficiently, so we look for a sufficient condition.
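One simple sufficient condition can be sketched for a query over a single table: DISTINCT is redundant if the selected columns, together with any columns fixed to a constant by equality predicates, cover some candidate key. This is a simplified illustration rather than the test from the cited paper, and `distinct_is_redundant` is a hypothetical name.

```python
def distinct_is_redundant(select_columns, candidate_keys, constant_columns=frozenset()):
    """Sufficient (not necessary) test that a single-table result has no duplicates."""
    # Columns whose values are known for each result row: projected or constant.
    known = set(select_columns) | set(constant_columns)
    return any(set(key) <= known for key in candidate_keys)


PARTSUPP_KEYS = [("ps_partkey", "ps_suppkey")]  # composite primary key

# SELECT DISTINCT ps_partkey, ps_suppkey, ps_availqty FROM partsupp
covers_key = distinct_is_redundant(
    ["ps_partkey", "ps_suppkey", "ps_availqty"], PARTSUPP_KEYS
)

# SELECT DISTINCT ps_suppkey FROM partsupp WHERE ps_partkey = 42
key_via_predicate = distinct_is_redundant(
    ["ps_suppkey"], PARTSUPP_KEYS, constant_columns={"ps_partkey"}
)

# SELECT DISTINCT ps_suppkey FROM partsupp
not_proven = distinct_is_redundant(["ps_suppkey"], PARTSUPP_KEYS)
```

A False result does not prove that duplicates exist. It only means this test could not rule them out, so the DISTINCT has to stay.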


Summary
• Important outcome: experimental evidence showing that SQO can provide an effective enhancement to traditional query optimization.
– Join Elimination: geared towards OLAP environments (where it is very useful)
   – Independent of the existence of complex integrity constraints; semantic reasoning is used only about referential integrity constraints
   – Easy to implement and execute
– Predicate Introduction: guaranteeing improvements is more difficult and requires rather severe restrictions (which limits the applicability of this approach).
– Order Optimization: functional dependencies and table information are used to create a “smart” access plan that avoids or optimizes sort operations.
– Exploiting Uniqueness: uniqueness is powerful when it reduces the number of expensive sorts. Discovering true ways of exploiting this technique is quite tricky and specific.


References
• Qi Cheng, Jarek Gryz, Fred Koo, et al.: Implementation of Two Semantic Query Optimization Techniques in DB2 Universal Database. Proceedings of the 25th VLDB Conference, Edinburgh, Scotland, 1999.
• David E. Simmen, Eugene J. Shekita, Timothy Malkemus: Fundamental Techniques for Order Optimization. SIGMOD Conference 1996: 57-67
• G. N. Paulley, Per-Åke Larson: Exploiting Uniqueness in Query Optimization. ICDE 1994: 68-79


The End.