Optimizing Queries and Diverse Data Sources Laura M. Haas Donald Kossmann Edward L. Wimmers Jun Yang...

Post on 27-Dec-2015



Optimizing Queries and Diverse Data Sources

Laura M. Haas

Donald Kossmann

Edward L. Wimmers

Jun Yang

Presented By

Siddhartha Dasari

Overview

• Problem Definition

• Architectural View of Garlic

• Query Plan Generation

• Query Optimization

• Conclusions

Trying to solve

• The use of middleware systems.

• Optimize queries over sources with varying query processing capabilities.

• Use of cost based model.

• Implementation of the Garlic approach.

• Cost-based (Disco, Garlic, ... )

• Quality-based (Object Globe, HiQIQ, … )

• Adaptive query optimization (Telegraph, Tukwila, … )

• Capability-based (Tsimmis, Infomaster )

Optimizing Queries

Reasons to optimize

• People don’t like to wait, they want the programs to be fast.

• Response time should be as short as possible.

• A faster server alone does not fix a slow query.

How to optimize?

• Filter as much as possible

• Where clause is most important in your query.

• Never write “select *”; name only the fields you need.

• Join two tables on all of the keys that relate them.

How do they do Query Optimization?

• Processing costs (estimated from cost model of CPU, I/O)

• Communication costs (estimated using constants in catalog)

• Cost to initiate sub queries & methods (estimated using constants in catalog)

• Wrapper costs (estimated by wrapper)

• Plans are pruned upon enumeration

• Plan A not used as building block for more complex plan if cheaper alternatives available

• Plans with unique properties are not pruned
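The cost components and the pruning rule above can be sketched in Python. This is only an illustration, not Garlic's optimizer: the `Plan` class, its field names, the `prune` helper, the cost numbers, and the choice to sum the four components into one total are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    processing: float      # CPU and I/O cost from the cost model
    communication: float   # estimated from constants in the catalog
    initiation: float      # cost to initiate sub-queries and methods
    wrapper: float         # cost reported by the wrapper
    properties: frozenset  # hypothetical labels, e.g. sort order or source

    @property
    def cost(self) -> float:
        # Assumption: the total is a plain sum of the four components.
        return self.processing + self.communication + self.initiation + self.wrapper


def prune(plans):
    """Keep only the cheapest plan for each distinct set of properties."""
    best = {}
    for plan in plans:
        kept = best.get(plan.properties)
        if kept is None or plan.cost < kept.cost:
            best[plan.properties] = plan
    return list(best.values())


candidates = [
    Plan("A", 10.0, 5.0, 1.0, 4.0, frozenset({"source=engine"})),
    Plan("B", 6.0, 5.0, 1.0, 2.0, frozenset({"source=engine"})),
    Plan("C", 9.0, 8.0, 1.0, 7.0, frozenset({"source=engine", "sorted=name"})),
]
survivors = prune(candidates)
```

Plan A is dropped because B has the same properties at lower cost, while C survives despite its higher cost because no other plan offers its sort order.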

How to optimize? (cont..)

Example:

• Query: Select * From Employees

In program: add a filter on Dept or use the command: if Dept = 'R&D'

Corrected:

Select Name, Salary From Employees Where Dept = 'R&D'

• For i = 1 to 2000

Call Query: Select Salary From Employees Where EmpID = Parameter(i)

Corrected (assuming the parameters are the IDs 1 to 2000):

Select Salary From Employees Where EmpID >= 1 And EmpID <= 2000
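Both corrections can be tried with Python's built-in `sqlite3` module. The sketch below is an assumption-laden illustration: the table layout, the generated rows, and the variable names are invented for the demo, and the loop parameters are taken to be the IDs 1 to 2000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmpID INTEGER PRIMARY KEY, Name TEXT, Dept TEXT, Salary REAL)"
)
# Made-up data: 3000 employees, odd IDs in R&D, even IDs in Sales.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [(i, f"emp{i}", "R&D" if i % 2 else "Sales", 1000.0 + i) for i in range(1, 3001)],
)

# Slow pattern: one query per employee, 2000 round trips.
looped = []
for emp_id in range(1, 2001):
    row = conn.execute(
        "SELECT Salary FROM Employees WHERE EmpID = ?", (emp_id,)
    ).fetchone()
    looped.append(row[0])

# Corrected pattern: a single range query returns the same salaries.
batched = [
    r[0]
    for r in conn.execute(
        "SELECT Salary FROM Employees WHERE EmpID >= 1 AND EmpID <= 2000 ORDER BY EmpID"
    )
]

# Filter and project in the query instead of in the program.
rnd = conn.execute(
    "SELECT Name, Salary FROM Employees WHERE Dept = ?", ("R&D",)
).fetchall()
```

The range query only replaces the loop when the looked-up IDs form a contiguous range; for arbitrary IDs an `IN` list or a join against a parameter table would be needed.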

Garlic Architecture

• Wrapper acts as an interface between query services and data sources.

• Catalog contains local/global schemas.

• Query services contains the query language processor and distributed query execution engine.

• Query language processor generates execution plan based on input.

• Query execution engine passes sub-queries to wrappers and assembles final result.

• Assembly may include performing joins, applying predicates, sorting, aggregates

What do wrappers do?

• Wrappers can wrap various types of data sources.

• Garlic wrappers are specific to Garlic: each provides an interface to a data source using Garlic’s internal protocols.

• Data described in an OO model, methods can be applied on data.

• Data source notifies wrapper of capabilities using rules.

• Wrapper does not have to reflect full query functionality of data source.

What are STARS?

• STARs = STrategy Alternative Rules

• Rules are high-level, declarative, compact specification of legal alternatives

• STARs define high-level constructs from low level database operators or other STARs.

JoinRoot(T1, T2, P) = { PermutedJoin(T1, T2, P), PermutedJoin(T2, T1, P) }
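One way to picture a STAR is as a function that returns every legal alternative for a construct. The sketch below is an assumption, not Garlic's rule language: `permuted_join` and `join_root` are hypothetical Python stand-ins, and a plan is reduced to a tuple.

```python
def permuted_join(outer, inner, predicate):
    # Hypothetical stand-in: a real rule would expand further into join POPs.
    return ("JOIN", outer, inner, predicate)


def join_root(t1, t2, predicate):
    """Return both join orders as legal alternatives, as in the JoinRoot rule."""
    return [
        permuted_join(t1, t2, predicate),
        permuted_join(t2, t1, predicate),
    ]


alternatives = join_root(
    "Employees", "Departments", "Employees.Dept = Departments.Name"
)
```

Each element of `alternatives` is a candidate plan fragment; choosing between them is left to the cost model.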

How are plans constructed?

• Tuples are operated upon by POPs (Plan Operators)

• A POP generally corresponds to one executable operator

• POPs include: join, sort, filter, fetch, temp, scan, pushdown (work to be performed by source)

• POPs have properties that describe the specifics of the operations.

• Source property records where output stream comes from (needed?)

Example

• Push Down POP performs operations on the data source

• Data sources only return OID

• Wrappers take Push Downs and perform them on sources by translation into query or API calls

• Source property shows where execution occurs

• Properties of POPs are functions of parent POP (i.e. predicates)

• Additional properties: cost, card
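A plan built from POPs can be modeled as a small tree whose nodes carry the properties listed above. The following is a minimal sketch under stated assumptions: the `POP` class, its field names, the wrapper name, and all cost and cardinality figures are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class POP:
    op: str                                      # e.g. "pushdown", "fetch", "sort"
    inputs: list = field(default_factory=list)   # child POPs feeding this one
    source: str = "garlic"                       # where the operator executes
    predicates: tuple = ()                       # predicates applied so far
    cost: float = 0.0                            # cost of this operator alone
    card: int = 0                                # estimated output cardinality

    def total_cost(self) -> float:
        """Cost of this operator plus everything beneath it."""
        return self.cost + sum(child.total_cost() for child in self.inputs)


# The source evaluates the predicate and returns only OIDs.
pushdown = POP(
    "pushdown",
    source="relational_wrapper",
    predicates=("Dept = 'R&D'",),
    cost=12.0,
    card=40,
)
# The engine fetches the needed attributes for those OIDs, then sorts.
fetch = POP("fetch", inputs=[pushdown], predicates=pushdown.predicates, cost=8.0, card=40)
plan = POP("sort", inputs=[fetch], predicates=fetch.predicates, cost=3.0, card=40)
```

Here the `source` field distinguishes work done at the wrapped source from work done in the engine.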

What do stars do?

•Stars can be viewed as a grammar with some set of rules.

•A Star determines how POPs can be combined in a plan.

Here f1, f2, … are the names of STARs or POPs.

Stars (cont…)

A star can retrieve columns that are needed by another star.

Access root STAR to create plans

How is plan enumeration performed?

• Access Root STAR is used to create plans to select all attributes used in query (no real variability in plans, performs a Push Down).

• Join Root STAR is used to create plans to perform joins.

• Finish Root STAR is used to include any missing parts of the query (i.e. projections, ordering)

• Pruning for query optimization performed throughout to minimize number of plans to enumerate.
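The three enumeration phases can be sketched as a small bottom-up search. This is a toy model, not Garlic's enumerator: the table names, cardinalities, cost formula, and the dictionary layout of a plan are all invented for illustration, and pruning keeps a single cheapest plan per table set rather than one per property set.

```python
from itertools import combinations

# Hypothetical cardinalities, made up for the example.
CARD = {"Course": 100, "Professor": 20, "Complaint": 500}


def access_root(table):
    # One pushdown plan per table.
    return {"tables": frozenset([table]), "plan": f"PushDown({table})",
            "cost": CARD[table], "card": CARD[table]}


def join_root(left, right):
    # Toy cost: both inputs plus one probe per outer tuple.
    return {"tables": left["tables"] | right["tables"],
            "plan": f"Join({left['plan']}, {right['plan']})",
            "cost": left["cost"] + right["cost"] + left["card"],
            "card": min(left["card"], right["card"])}


def finish_root(plan):
    # Add the final projection over the join result.
    return {**plan, "plan": f"Project({plan['plan']})",
            "cost": plan["cost"] + plan["card"]}


def enumerate_plans(tables):
    best = {frozenset([t]): access_root(t) for t in tables}
    for size in range(2, len(tables) + 1):
        for subset in combinations(tables, size):
            key = frozenset(subset)
            for k in range(1, size):
                for left_tables in combinations(subset, k):
                    left = best[frozenset(left_tables)]
                    right = best[key - frozenset(left_tables)]
                    candidate = join_root(left, right)
                    # Prune: keep only the cheapest plan for this table set.
                    if key not in best or candidate["cost"] < best[key]["cost"]:
                        best[key] = candidate
    return finish_root(best[frozenset(tables)])


result = enumerate_plans(["Course", "Professor", "Complaint"])
```

Because a pruned plan is never used as a building block, larger joins are built only from the surviving sub-plans.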

How are data source capabilities determined?

• Wrapper implements STARs that describe the capability of each data source.

• STARs follow POP structure mentioned previously

• Simple STARs can model basic capabilities of data sources

• Complex capability is arguably not needed, as Garlic’s query engine can make up for it.

• Wrapper can iteratively add STARs to:

– Introduce source quickly into mediated schema

– Improve performance
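The idea that the engine compensates for what a source cannot do can be sketched as follows. This is a hypothetical illustration: the predicate format, the `EVAL` table, the `split_predicates` and `run` helpers, and the sample mail rows are assumptions, and a real wrapper would express its capabilities through STARs rather than a set of operator names.

```python
import operator

# Hypothetical operator table shared by the simulated source and the engine.
EVAL = {
    "=": operator.eq,
    ">": operator.gt,
    "contains": lambda text, word: word in text,
}


def split_predicates(predicates, supported_ops):
    """Split predicates into those pushed to the source and those left to the engine."""
    pushed, compensated = [], []
    for column, op, value in predicates:
        target = pushed if op in supported_ops else compensated
        target.append((column, op, value))
    return pushed, compensated


def run(rows, predicates, supported_ops):
    pushed, compensated = split_predicates(predicates, supported_ops)
    # Simulated source: applies only the predicates it supports.
    from_source = [r for r in rows if all(EVAL[op](r[c], v) for c, op, v in pushed)]
    # Engine: applies whatever the source could not.
    return [r for r in from_source if all(EVAL[op](r[c], v) for c, op, v in compensated)]


mail = [
    {"sender": "alice", "subject": "grade complaint"},
    {"sender": "alice", "subject": "room booking"},
    {"sender": "bob", "subject": "grade complaint"},
]
query = [("sender", "=", "alice"), ("subject", "contains", "grade")]

# A simple wrapper that supports only equality still yields the right answer.
basic = run(mail, query, supported_ops={"="})
# A richer wrapper pushes both predicates down; the result is unchanged.
richer = run(mail, query, supported_ops={"=", "contains"})
```

Adding a capability changes only where the work happens, which is why a wrapper can start simple and be extended later for performance.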

Modeling wrappers using stars

• University offers courses, course descriptions and an online complaint mechanism.

• That is, three kinds of source: relational, text and mail.

• The mail has a sender, date, body and subject

Modeling wrapper using STARs

• The class objects have attributes like courses and professor.

Modeling Wrappers using STARs

Disco optimizer

Tsimmis

Future Trends

Any Questions?

Thank you