Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang...

26
Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari

Transcript of Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang...

Page 1: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

Optimizing Queries and Diverse Data Sources

Laura M. Hass

Donald Kossman

Edward L. Wimmers

Jun Yang

Presented By

Siddhartha Dasari

Page 2: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

Overview

• Problem Definition

• Architectural View of Garlic

• Query Plan Generation

• Query Optimization

• Conclusions

Page 3: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

Trying to solve

• The use of middleware systems.

• Optimize queries over sources with varying query processing capabilities.

• Use of cost based model.

• Implementation of garlic approach.

Page 4: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

• Cost-based (Disco, Garlic, ... )

• Quality-based (Object Globe, HiQIQ, … )

• Adaptive query optimization (Telegraph,

• Tukwila, … )

• Capability-based (Tsimmis, Infomaster )

Page 5: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

Optimizing Queries

Reasons to optimize

• People don’t like to wait, they want the programs to be fast.

• The response time should be smaller.

• Even if you use a faster server, this is proved wrong.

Page 6: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

How to optimize?

• Filter as much as possible

• Where clause is most important in your query.

• Never write “select * “ specify the correct fields you want to know.

• Join the two tables by using all keys that are related to the tables.

Page 7: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

How do they do QueryOptimization?

• Processing costs (estimated from cost model of CPU, I/O)

• Communication costs (estimated using constants in catalog)

• Cost to initiate sub queries & methods (estimated using constants in catalog)

• Wrapper costs (estimated by wrapper)• Plans are pruned upon enumeration• Plan A not used as building block for more complex plan

if cheaper alternatives available• Plans with unique properties are not pruned

Page 8: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

How to optimize? (cont..)

Example:• Query : Select * From Employees

In Program : Add a filter on Dept or use command : if Dept = R&D

  Corrected :

Select Name, Salary From Employees Where Dept = R&D

• For i = 1 to 2000

Call Query : Select salary From Employees Where EmpID = Parameter(i)

Corrected:

Select salary From Employees Where EmpID >= 1 and EmpID <= 2000

Page 9: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

Garlic Architecture

• Wrapper acts as a interface between query services and data sources.

• Catalog contains local/global schemas.

Page 10: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

• Query services contains the query language processor and distributed query execution engine.

• Query language processor generates execution plan based on input.

• Query execution engine passes sub-queries to wrappers and assembles final result.

• Assembly may include performing joins, applying predicates, sorting, aggregates

Page 11: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

What do wrappers do?

• Wrappers can wrap various types of data sources.

• Garlic wrappers are specific to Garlic provides interface to data source using Garlic’s internal protocols.

• Data described in an OO model, methods can be applied on data.

• Data source notifies wrapper of capabilities using rules.

• Wrapper does not have to reflect full query functionality of data source.

Page 12: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

What are STARS?

• STARs = STrategy Alternative Rules• Rules are high-level, declarative, compact specification

of legal alternatives• STARs define high-level constructs from low level

database operators or other STARs.

JoinRoot(T1,T2,p)={ Permuted Join(T1,T2,P)

Permuted Join(T2,T1,P)

Page 13: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

How are plans constructed?

• Tuples are operated upon by POPs (Plan Operators)• A POP generally corresponds to one executable

operator• POPs include: join, sort, filter, fetch, temp, scan,

pushdown (work to be performed by source)• POPs have properties that describes the specifics of the

operations.• Source property records where output stream comes

from (needed?)

Page 14: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

Example

•Push Down POP performs operationson the data source

•Data sources only return OID

•Wrappers take Push Downs andperforms them on sources bytranslation into query or API calls

•Source property shows whereexecution occurs

•Properties of POPs are functions ofparent POP (I.e. predicates)

•Additional properties: cost, card

Page 15: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.
Page 16: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

What do stars do?

•Stars can be viewed as a grammar with some set of rules.

•A Star determines how POPs can be combined in a plan.

Here f1, f2 … are the name of the star or POPs.

Page 17: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

Stars (cont…)A star can retrieve columns that are needed by another star.

Access root STAR to create plans

Page 18: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

How is plan enumeration performed?

• Access Root STAR is used to create plans to select all attributes used in query (no real variability in plans, performs a Push Down).

• Join Root STAR is used to create plans to perform joins.• Finish Root STAR is used to include any missing parts of

the query (i.e. projections, ordering)• Pruning for query optimization performed throughout to

minimize number of plans to enumerate.

Page 19: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

How are data sourcecapabilities determined?

• Wrapper implements STARs that describe the capability of each data source.

• STARs follow POP structure mentioned previously• Simple STARs can model basic capabilities of data

sources• Complex capability is arguably not needed as• Garlic’s query engine can make up for it.• Wrapper can iteratively add STARs to:

– Introduce source quickly into mediated schema– Improve performance

Page 20: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

Modeling wrappers using stars

• University Offers a course, course description and online complaint mechanisms.

• That is (relational, text, mail)

• The mail has a sender, date, body and subject

Page 21: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

Modeling wrapper using STARs• The class objects have attributes like courses and

professor.

Page 22: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

Modeling Wrappers using STARs

Page 23: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

Disco optimizer

Page 24: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

Tsimmis

Page 25: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

Future Trends

Page 26: Optimizing Queries and Diverse Data Sources Laura M. Hass Donald Kossman Edward L. Wimmers Jun Yang Presented By Siddhartha Dasari.

Any Questions?

Thank you