Schema-Based Query Optimization for XQuery over XML Streams Hong Su Elke A. Rundensteiner Murali Mani Worcester Polytechnic Institute, Massachusetts, USA VLDB 2005


Schema-Based Query Optimization for XQuery over XML Streams

Hong Su, Elke A. Rundensteiner, Murali Mani

Worcester Polytechnic Institute, Massachusetts, USA

VLDB 2005


Schema-Based Query Optimization (SQO)

Schema knowledge can be utilized to optimize queries

Well studied in deductive/relational databases: join elimination, predicate elimination, detection of empty answer sets, …

Equally applicable to XML for flat value filtering


SQO for XML Pattern Retrieval

General XML SQO: applicable to both static and streaming XML

E.g.: query tree minimization [Amer-Yahia+02]

Static-XML-specific SQO: focuses on expediting random access of data

E.g.: query rewrite using “extents” (indices built on element types) [Fernandez+98], …

Stream-specific XML SQO: focuses on expediting token-by-token sequential access of data


Stream Specific SQO Example

Query: /seller[shipTo]

Schema: <!element seller((billTo,shipTo)|sameAddr, …)>

Input: <seller><sameAddr>…<url>…<url></seller>

Plan without schema: Buffer seller element; Retrieve /shipTo

Plan with schema: Buffer seller element; Retrieve /shipTo; Retrieve /sameAddr and, when it is retrieved, skip computation

Buffer contents shown in the figure: "buffer:" (empty) and "buffer:<seller>"
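The difference between the two plans can be sketched in a few lines of Python. This is only an illustration under assumptions: the token format, the function name `evaluate_seller`, and the `ending_marks` parameter are hypothetical and do not come from the slides or from the Raindrop implementation.

```python
def evaluate_seller(events, ending_marks=frozenset()):
    """Evaluate /seller[shipTo] over the tokens of one <seller> element.

    events: list of ("start" | "end", tag) pairs (assumed token format).
    ending_marks: child tags that, per the schema, rule out a later <shipTo>
                  (hypothetical parameter for this sketch).
    Returns (matched, number_of_tokens_buffered).
    """
    buffer = []
    depth = 0
    has_ship_to = False
    for kind, tag in events:
        if kind == "start":
            depth += 1
            if depth == 2:  # direct child of <seller>
                if tag == "shipTo":
                    has_ship_to = True
                elif tag in ending_marks and not has_ship_to:
                    # The schema says shipTo can no longer appear: stop early.
                    return False, len(buffer)
        buffer.append((kind, tag))
        if kind == "end":
            depth -= 1
    return has_ship_to, len(buffer)


SELLER_WITH_SAME_ADDR = [
    ("start", "seller"), ("start", "sameAddr"), ("end", "sameAddr"),
    ("start", "url"), ("end", "url"), ("start", "url"), ("end", "url"),
    ("end", "seller"),
]

without_schema = evaluate_seller(SELLER_WITH_SAME_ADDR)
with_schema = evaluate_seller(SELLER_WITH_SAME_ADDR, frozenset({"sameAddr"}))
```

Without the schema the whole seller element is buffered before the predicate is known to fail; with `sameAddr` as an ending mark the sketch gives up as soon as `<sameAddr>` starts.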


Related Work

YFilter [Diao02] and XSM [Ludäscher 03]: use schema to decide whether pattern results are recursive, or the types of child elements. Essentially propose general XML SQO.

FluXQuery [Koch+04]: uses schema to minimize buffer size. Complementary to our focus (we aim to skip unnecessary computations).

SIX [Gupta+03]: uses indices interleaved with XML data to reduce parsing. Could be combined with our techniques.


Challenge: Constraint Useful?

Schema: <!element seller((billTo,shipTo)|sameAddr, …)>

Query: /seller/shipTo

Plan: Retrieve /shipTo; Retrieve /sameAddr

When /sameAddr is retrieved, there is nothing to save: /shipTo is the only pattern retrieval

Query: /seller[shipTo]/billTo

Plan: Retrieve /shipTo; Retrieve /billTo; Retrieve /sameAddr

When /sameAddr is retrieved, there is nothing to save: /billTo has already been retrieved


Challenge: Benefits/Overhead?

Maximal benefits: no beneficial optimization should be missed

Any failed pattern should be detected as early as possible

Minimal overhead: no redundant optimization should be introduced

Whether a particular pattern fails should not be repeatedly checked


Challenge: Plan Execution

Optimization at lower level than query rewrite

Specific physical implementations are needed

Query: /seller[shipTo]

Schema: <!element seller((billTo,shipTo)|sameAddr, …)>

Plan: Buffer seller element; Retrieve /shipTo; Retrieve /sameAddr (with a "when retrieved" trigger)

No query can capture this optimization


Outline

SQO Technique Design

SQO Application

Execution of Optimized Plan

Experimentations


Physical Implementation of Pattern Retrieval

Note: it is important to understand the physical stream engine implementation in order to design effective SQO

Our implementation: widely used automata implementation [e.g., Tukwila, YFilter]


Example Query and its Automata

Query:

for $a in /auctions/auction, $b in $a/seller[shipTo]
where $b/*/phone=“508-123-4567”
return <auction> for $c in $a/item where $c//keyword=“auto” return $b/*/phone </auction>

[Figure: automaton with states 0, 1, 2, 9, 10, 11, 12 and transitions labeled auctions, auction, seller, shipTo, primary/secondary (*), phone; input stream <auctions> <auction> … <phone> … </phone>; stack snapshots include [0], [1][0], [2,3][1][0], [11]…[2,3][1][0], [12#]…; #: buffering flag]


Example Query and its Automata

[Figure: automaton and stack trace repeated from the previous slide; # marks the buffering flag]

Opt. opportunities:

1. avoid transitions as much as possible

2. revoke buffering flag as soon as possible
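To make "transitions" concrete, here is a minimal sketch of a stack-based automaton for path patterns. It is an assumption-laden illustration: the `TRANSITIONS` wiring is loosely modeled on the state numbers in the figure but is not a transcription of it, and `run_automaton` is a hypothetical helper, not code from Raindrop, Tukwila, or YFilter.

```python
# Hypothetical wiring: (state, label) -> set of target states.
# "*" is a wildcard label that matches any element name.
TRANSITIONS = {
    (0, "auctions"): {1},
    (1, "auction"): {2},
    (2, "seller"): {9},
    (9, "shipTo"): {10},
    (9, "*"): {11},
    (11, "phone"): {12},
}


def run_automaton(events, transitions=TRANSITIONS, start=0):
    """Process ("start" | "end", tag) tokens with a stack of active state sets.

    A start tag pushes the set of states reached from the top of the stack;
    an end tag pops it. Returns the final stack (bottom first) and the number
    of transitions fired, which is the cost SQO tries to avoid.
    """
    stack = [{start}]
    transitions_fired = 0
    for kind, tag in events:
        if kind == "start":
            next_states = set()
            for state in stack[-1]:
                for label in (tag, "*"):
                    targets = transitions.get((state, label), set())
                    transitions_fired += len(targets)
                    next_states |= targets
            stack.append(next_states)
        else:
            stack.pop()
    return [sorted(states) for states in stack], transitions_fired


path_to_phone = [("start", t) for t in
                 ("auctions", "auction", "seller", "primary", "phone")]
final_stack, fired = run_automaton(path_to_phone)
```

Every start tag inside a seller element costs at least a lookup in this model, which is why cutting transitions once a pattern is known to fail saves work.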


Is Constraint Useful for Opt.?

Constraints used to find “ending marks” of a pattern within a context element

<!element seller((billTo, shipTo)|sameAddr?, …)>

<sameAddr> is ending mark of /shipTo within seller element context


Is Constraint Useful for Opt.?

Ending mark helpful if Context element can be filtered out earlier:


Is Constraint Useful for Opt.?

Ending mark helpful if Context element can be filtered out earlier:

Pattern may fail to appear

Query: for $a in /auctions/auction, $b in $a/seller …

With schema <!element auction(seller, …)>: ending mark for $a/seller is not helpful

With schema <!element auction(seller?, …)>: ending mark for $a/seller is helpful


Is Constraint Useful for Opt.?

Ending mark helpful if Context element can be filtered out earlier:

Pattern may fail to appear

Pattern is required

Schema: <!element item (category?, desc, …)>

Query: for $c in $a/item return <c>$a/category</c>

Ending mark for $a/category is not helpful

Query: for $c in $a/item[category] return <c>$a/category</c>

Ending mark for $a/category is helpful


Is Constraint Useful for Opt.?

Ending mark helpful if Context element can be filtered out earlier:

Pattern may fail to appear

Pattern is required

and the early filtering can be beneficial:

Transitions may happen after ending marks

Buffering flags may be raised before ending marks
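The criteria above combine into a simple predicate. The sketch below only restates them in code; the `PatternInfo` fields and the function name are hypothetical, and how each flag is derived from the query and schema is not shown here.

```python
from dataclasses import dataclass


@dataclass
class PatternInfo:
    """Facts about one pattern within its context element (assumed inputs)."""
    may_fail: bool                # schema allows the pattern to be absent
    is_required: bool             # query needs the pattern for the context to qualify
    transitions_after_mark: bool  # automaton work would still happen after the mark
    buffering_before_mark: bool   # a buffering flag may already be raised by then


def ending_mark_is_helpful(p: PatternInfo) -> bool:
    # The context element can be filtered out earlier ...
    can_filter_earlier = p.may_fail and p.is_required
    # ... and doing so actually saves transitions or buffering.
    filtering_pays_off = p.transitions_after_mark or p.buffering_before_mark
    return can_filter_earlier and filtering_pays_off
```

For instance, a pattern that the schema guarantees to appear (`may_fail=False`) never yields a helpful ending mark, whatever the other flags say.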


SQO Design

Helpful ending marks identified by our SQO

Three SQO rules designed using:

Occurrence constraints

Exclusive constraints

Order constraints


Example SQO Rule

Use occurrence constraint

Event-condition-action output by rule

Query: for $a in /auctions/auction, $b in $a/seller where $b/*/phone = “508-1234567” …

Schema:

<!element seller(primary, secondary, …)>

<!element primary (phone)>

<!element secondary (phone)>

Event: second </phone> is encountered in a seller

Condition: $b/*/phone = “508-1234567” not satisfied yet

Action: skip the remaining computations within the current seller element
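The event-condition-action rule above can be sketched as a small token loop. This is an illustrative sketch only: the token format with `("text", value)` entries, the name `check_phone`, and the `max_phones` parameter are assumptions, with `max_phones=2` standing in for the occurrence constraint derived from the schema.

```python
def check_phone(tokens, target, max_phones=2):
    """Evaluate $b/*/phone = target over the tokens of one <seller> element.

    tokens: list of ("start", tag), ("text", value), or ("end", tag).
    max_phones: how many phone elements the schema allows per seller
                (2 here: one under primary, one under secondary).
    Returns (satisfied, tokens_examined).
    """
    phones_closed = 0
    in_phone = False
    satisfied = False
    examined = 0
    for kind, value in tokens:
        examined += 1
        if kind == "start" and value == "phone":
            in_phone = True
        elif kind == "text" and in_phone and value == target:
            satisfied = True
        elif kind == "end" and value == "phone":
            in_phone = False
            phones_closed += 1
            # Event: last possible </phone>. Condition: predicate not satisfied.
            if phones_closed == max_phones and not satisfied:
                return False, examined  # Action: skip the rest of this seller
    return satisfied, examined


SELLER = [
    ("start", "seller"),
    ("start", "primary"), ("start", "phone"), ("text", "111"),
    ("end", "phone"), ("end", "primary"),
    ("start", "secondary"), ("start", "phone"), ("text", "222"),
    ("end", "phone"), ("end", "secondary"),
    ("start", "url"), ("end", "url"),
    ("end", "seller"),
]

no_match = check_phone(SELLER, "508-1234567")
match = check_phone(SELLER, "222")
```

When no phone matches, the loop stops at the second `</phone>` instead of scanning the rest of the seller element.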


Outline

SQO Technique Design

SQO Application

Execution of Optimized Plan

Experimentations


Properties of SQO Application

Maximal benefits

Minimal overhead


Maximal Benefit

Definition of “rule independence”

Proof of “maximal benefits” given

If rules are all independent, as long as each rule is applied on each pattern, maximal benefits are ensured


Minimal Overhead: Redundancy

Same pattern redundancy: multiple ending marks adopted for the same pattern

Query: for $a in /auctions/auction, $b in $a/seller[shipTo] …

Schema: <!element seller ( shipTo?, billTo, url )>

Constraints:

Ending mark <billTo> for $b/shipTo: <billTo> is guaranteed to capture failure of /shipTo

Ending mark <url> for $b/shipTo: redundant
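One way to picture how same-pattern redundancy could be avoided for a plain sequence content model is sketched below. It is an assumption, not the local applier from the paper: the list-of-pairs schema encoding and `useful_ending_marks` are hypothetical, and keeping optional siblings that precede the first required one is this sketch's own choice.

```python
def useful_ending_marks(content_model, pattern):
    """Pick non-redundant ending marks for `pattern` in a sequence content model.

    content_model: ordered list of (element_name, is_required) pairs, e.g.
                   (shipTo?, billTo, url) -> [("shipTo", False), ("billTo", True), ...].
    Any sibling declared after `pattern` signals that `pattern` can no longer
    appear. The first required sibling is guaranteed to show up, so every
    sibling after it is redundant as an ending mark.
    """
    names = [name for name, _ in content_model]
    position = names.index(pattern)
    marks = []
    for name, is_required in content_model[position + 1:]:
        marks.append(name)
        if is_required:
            break
    return marks


SELLER_MODEL = [("shipTo", False), ("billTo", True), ("url", True)]
marks_for_ship_to = useful_ending_marks(SELLER_MODEL, "shipTo")
```

For the schema on this slide the sketch keeps `billTo` and drops `url`, matching the redundancy noted above.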


Minimal Overhead: Redundancy?

Parent-child pattern redundancy: ending marks of child patterns filter the parent pattern early

Query: for $a in /auctions/auction, $b in $a/seller[shipTo] …

Schema: <!element auction (seller, bidder)> <!element seller (shipTo, billTo?)> (figure annotations: “optional”, “required”)

Constraints:

<billTo> for $b/shipTo: can be used to capture failure of $a/seller[shipTo]

<bidder> for $a/seller: redundant

Second schema shown: <!element auction (seller, bidder)> <!element seller (shipTo, billTo)>


SQO Application Algorithm

Input:

XQuery represented as a tree

XML Schema represented as a graph

Processing:

Query tree traversed top-down; “maximal benefits” ensured

Local/regional appliers applied to each tree node

Same pattern redundancy excluded by local applier

Parent-child pattern redundancy excluded by regional applier

Output:

Event-condition-actions attached to tree nodes


Outline

SQO Technique Design Guideline

SQO Application

Execution of Optimized Plan

Experimentations


Encoding ECAs in Automata

E: push-in or pop-out of a state

C: pattern result buffer checked

A: actions include:

Suspend computations by removing automata transitions

Clean up results generated within the current context element

Prepare for recovering computation for the next context element (e.g., back up transitions)
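The suspend and recover actions can be sketched as removing transitions from a table and keeping a backup. This is a hedged illustration: `SuspendableAutomaton` and its methods are hypothetical names, and the transition table uses the same assumed `(state, label) -> states` encoding as a plain dictionary.

```python
class SuspendableAutomaton:
    """Transition table whose outgoing edges can be cut and later restored."""

    def __init__(self, transitions):
        # (state, label) -> set of target states
        self.transitions = dict(transitions)
        self._backup = {}

    def suspend(self, states):
        """Action: cut every transition leaving the given states."""
        for key in [k for k in self.transitions if k[0] in states]:
            self._backup[key] = self.transitions.pop(key)

    def restore(self):
        """Recover the cut transitions for the next context element."""
        self.transitions.update(self._backup)
        self._backup.clear()


automaton = SuspendableAutomaton({
    (2, "seller"): {9},
    (2, "item"): {5},
    (9, "shipTo"): {10},
})
automaton.suspend({2})
remaining_after_suspend = sorted(automaton.transitions)
automaton.restore()
remaining_after_restore = sorted(automaton.transitions)
```

While suspended, start tags seen under state 2 find no transition to follow, so no further automaton work is done until `restore` is called at the start of the next context element.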


Example: ECAs in Automata

Query:

for $a in /auctions/auction, $b in $a/seller[shipTo]
where $b/*/phone=“508-123-4567”
return <auction> for $c in $a/item … </auction>

Event: 1st <sameAddr> encountered

Condition: none

Action: cut all transitions from

1. q2

2. States reachable via […]: q3

3. States between q2 and q13: q9

[Figure: automaton with states 0, 1, 2, 3, 5, 9, 10, 11, 12, 13 and transitions labeled auctions, auction, seller, shipTo, sameAddr, item, primary/secondary, phone; ECA tuples (1, startTag, none, state 2) and (…, state 3); input tokens shown include …<auction> <seller>, <sameAddr> </sameAddr>, <primary> </primary>, <item> </item>]


Outline

SQO technique design guideline

SQO application

Execution of optimized plan

Experimentations


Optimization Affected by?

[Figure: execution time ratio (without SQO / with SQO), y-axis from 0 to 5, plotted against selectivity of the pattern with ending marks, x-axis from 0% to 100%; series: Minor Unit Gain, Medium Unit Gain, Major Unit Gain]

• How often pattern fails (pattern selectivity)

• How much gain each early filtering brings (unit gain)


Necessity of Design Guideline

[Figure: x-axis: selectivity of pattern with the only useful ending mark; series: plan without SQO, plan with SQO (1 ending mark), plan with SQO but no guideline considered (30 ending marks)]


Conclusion

First SQO on streaming XML

Support SQO on nested XQuery with “*” or “//”

Offer criteria of “useful” constraints

Ensure maximal benefits and minimal overhead in SQO application

Provide execution strategy in widely-used automata-based model

Implement SQO optimizer in Raindrop system (VLDB’04 demo)

Experimentally demonstrate SQO brings significant improvement with little overhead


Visit our XQuery engine over XML stream project (RAINDROP) website

http://davis.wpi.edu/dsrg/raindrop/

Supported by USA National Science Foundation and IBM PhD Fellowship