Lazy Query Evaluation for Active XML

Lazy Query Evaluation for Active XML. Abiteboul, Benjelloun, Cautis, Manolescu, Milo, Preda (INRIA Futurs). Presented by Grigoris Karvounarakis, Univ. of Pennsylvania, CIS 650, October 14, 2004.


Transcript of Lazy Query Evaluation for Active XML

Page 1: Lazy Query Evaluation for Active XML


UNIVERSITY of PENNSYLVANIA Grigoris Karvounarakis October 04

Lazy Query Evaluation for Active XML

Abiteboul, Benjelloun, Cautis, Manolescu, Milo, Preda

INRIA Futurs

presented by: Grigoris Karvounarakis Univ. of Pennsylvania CIS 650

October 14, 2004

Page 2: Lazy Query Evaluation for Active XML


Active XML

function nodes

Page 3: Lazy Query Evaluation for Active XML


Tree Pattern Queries

result nodes

descendant edge

Page 4: Lazy Query Evaluation for Active XML


Tree Pattern Queries

Similar to Pattern Trees from the TAX/TLC algebra

+ variable nodes, used to bind variables to sub-trees (variable nodes with the same name must be mapped to elements with the same tag name)

+ result nodes

Embedding (of a query q into a document d) = match

Result of an embedding = bindings of the output variables on the witness tree
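
The notion of an embedding can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the `Node` and `Pattern` classes and the `embeddings` function are hypothetical names, string values are modelled as leaf nodes, and the rule that equally named variables must map to equally tagged elements is not enforced.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    label: str                                  # element name or string value
    children: list = field(default_factory=list)

@dataclass
class Pattern:
    label: str                                  # "*" matches any label
    axis: str = "child"                         # edge from the parent: "child" or "descendant"
    var: Optional[str] = None                   # output variable bound here, if any
    children: list = field(default_factory=list)

def descendants(node):
    """All proper descendants of a document node."""
    for child in node.children:
        yield child
        yield from descendants(child)

def embeddings(p, d):
    """Yield one dict of variable bindings per embedding of pattern p rooted at node d."""
    if p.label != "*" and p.label != d.label:
        return
    partial = [{p.var: d} if p.var else {}]
    for cp in p.children:
        candidates = d.children if cp.axis == "child" else list(descendants(d))
        # Every way of matching this pattern child extends every partial binding.
        partial = [
            {**bound, **sub}
            for bound in partial
            for cand in candidates
            for sub in embeddings(cp, cand)
        ]
    yield from partial
```

An empty result from `embeddings` corresponds to the "no embedding" situation: some branch of the pattern has no match in the current document.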

Page 5: Lazy Query Evaluation for Active XML


No embedding …

Page 6: Lazy Query Evaluation for Active XML


No embedding …

… but if we evaluate function call 1

Page 7: Lazy Query Evaluation for Active XML


Embedding Example

Page 8: Lazy Query Evaluation for Active XML


Embedding Example

Page 9: Lazy Query Evaluation for Active XML


Embedding Example

[Figure: embedding with output variables X and Y]

Page 10: Lazy Query Evaluation for Active XML


Relevant rewriting

Function node 1 (getNearbyRestos) is a relevant function node

In general, a function node is relevant if there exists some rewriting of the document in which some of the nodes it produces belong to a match

Rewriting the document by invoking relevant function nodes produces a sequence of relevant rewritings d1 →v1 d2 →v2 … dn

A document that contains no calls that are relevant to a query q is said to be complete for q
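
A single rewriting step can be sketched as follows. This is a minimal illustration, not the system's actual mechanism: the `Node` class, the `is_call` flag, and the `service` callable are assumptions, and the sketch replaces the call node by its result, which is one possible convention.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    is_call: bool = False          # True for function nodes

def invoke(parent, call, service):
    """One rewriting step d -> d': replace `call` under `parent` by the forest it returns."""
    result = service(call)         # hypothetical callable standing in for the remote service
    i = next(k for k, child in enumerate(parent.children) if child is call)
    parent.children[i:i + 1] = result   # the result may itself contain new function nodes

def function_nodes(node):
    """All function nodes below `node`; an empty result means no calls remain at all."""
    for child in node.children:
        if child.is_call:
            yield child
        else:
            yield from function_nodes(child)
```

A document with no remaining function nodes is trivially complete for every query; completeness for a specific q only requires that no relevant call remains.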

Page 11: Lazy Query Evaluation for Active XML


Problem definition

Given an Active XML document d and a query q, find an efficient way to evaluate the query over the document

Naïve approach: interleave query evaluation with function calls

Better: try to compute (a superset of) the relevant function calls for q and execute q over the rewriting of d (that results from executing these function calls)

Page 12: Lazy Query Evaluation for Active XML

Problem definition

Given an Active XML document d and a query q, find an efficient way to evaluate the query over the document

Naïve approach: interleave query evaluation with function calls

Better: try to compute (a superset of) the relevant function calls for q and execute q over the rewriting of d (that results from executing these function calls)

Efficiency tradeoff:

time to compute the approximation of the set of relevant functions (larger for a more accurate approximation)

time to execute the function calls (smaller for a more accurate approximation)

time to execute the query over the resulting rewriting of the document (smaller document for a more accurate approximation)

Page 13: Lazy Query Evaluation for Active XML

Outline

Definitions

Finding relevant calls

Sequencing relevant calls

Improving accuracy

Reducing detection time

Conclusions - Discussion

Page 14: Lazy Query Evaluation for Active XML


Linear Path Queries

/*()

/nyHotels/*()

/nyHotels/hotel/*()

/nyHotels/hotel/name/*()

/nyHotels/hotel/rating/*()

/nyHotels/hotel/nearby/*()

/nyHotels/hotel/nearby//*()

/nyHotels/hotel/nearby//restaurant/*()

/nyHotels/hotel/nearby//restaurant/name/*()

/nyHotels/hotel/nearby//restaurant/address/*()

/nyHotels/hotel/nearby//restaurant/rating/*()
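
The list above can be generated mechanically from the query tree. A minimal Python sketch, assuming the query is given as nested (label, axis, children) tuples over element names only (value nodes such as the hotel name are left out); the function name and the tuple encoding are illustrative and not taken from the paper:

```python
def linear_path_queries(query):
    """One LPQ per position where a function call could contribute to a match."""
    out = ["/*()"]                      # a call directly at the root position
    def walk(node, prefix):
        label, axis, children = node
        path = prefix + axis + label
        out.append(path + "/*()")       # a call as a child of this node
        if any(child[1] == "//" for child in children):
            out.append(path + "//*()")  # a call anywhere below, for descendant edges
        for child in children:
            walk(child, path)
    walk(query, "")
    return out

query = ("nyHotels", "/", [
    ("hotel", "/", [
        ("name", "/", []),
        ("rating", "/", []),
        ("nearby", "/", [
            ("restaurant", "//", [
                ("name", "/", []),
                ("address", "/", []),
                ("rating", "/", []),
            ]),
        ]),
    ]),
])

for lpq in linear_path_queries(query):
    print(lpq)
```

Under these assumptions the printed paths coincide with the eleven queries listed on the slide.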

Page 15: Lazy Query Evaluation for Active XML

Linear Path Queries

Correct, but usually inaccurate

Ignores filtering conditions on the path from the root or in other branches that could make some of the functions irrelevant (e.g., a getNearbyRestos() function node under a hotel cannot be relevant if the hotel's rating is not “*****”)

Page 16: Lazy Query Evaluation for Active XML

Node Focused Queries

For each node in the query tree, replace it with an OR node (to add a branch *() that matches any function, as with LPQs)

Then, for every node v in the resulting query tree, create qv = q – {v and its subtree}, with output node fv at the position of the *() OR-sibling of v

Each such query tree involves the path from the root to the node (as in LPQs) plus any parts of the tree that would have to be matched anyway for the whole query tree to match.

Page 17: Lazy Query Evaluation for Active XML

NFQ Example

[Figure: query tree with nodes nyHotels, hotel, name (“Best Western”), rating (“*****”), nearby, restaurant, name, address, rating (“*****”), output variables X and Y, and *() function-node branches]

Page 18: Lazy Query Evaluation for Active XML

NFQ Example

[Figure: query tree with nodes nyHotels, hotel, name (“Best Western”), rating (“*****”), nearby, restaurant, name, address, rating (“*****”), output variables X and Y, and *() function-node branches]

Page 19: Lazy Query Evaluation for Active XML

NFQ Example

[Figure: NFQ with nodes nyHotels and *()]

Page 20: Lazy Query Evaluation for Active XML

NFQ Example

[Figure: NFQ with nodes nyHotels and *()]

Page 21: Lazy Query Evaluation for Active XML

NFQ Example

[Figure: NFQ with nodes nyHotels and *()]

Page 22: Lazy Query Evaluation for Active XML

Another NFQ Example

[Figure: query tree with nodes nyHotels, hotel, name (“Best Western”), rating (“*****”), nearby, restaurant, name, address, rating (“*****”), output variables X and Y, and *() function-node branches]

Page 23: Lazy Query Evaluation for Active XML

Another NFQ Example

[Figure: NFQ with nodes nyHotels, hotel, name (“Best Western”), rating (“*****”), nearby, and *() function-node branches]

Page 24: Lazy Query Evaluation for Active XML

Another NFQ Example

[Figure: NFQ with nodes nyHotels, hotel, name (“Best Western”), rating (“*****”), nearby, and *() function-node branches]

Page 25: Lazy Query Evaluation for Active XML

Another NFQ Example

[Figure: NFQ with nodes nyHotels, hotel, name (“Best Western”), rating (“*****”), nearby, and *() function-node branches]

Page 26: Lazy Query Evaluation for Active XML


Node Focused Queries

Assuming that functions can return data of arbitrary type, the function nodes that are relevant for a query q are precisely the ones retrieved by the NFQs of q

Page 27: Lazy Query Evaluation for Active XML

Outline

Definitions

Finding relevant calls

Sequencing relevant calls

Improving accuracy

Reducing detection time

Conclusions - Discussion

Page 28: Lazy Query Evaluation for Active XML

Sequencing relevant calls

Naïve NFQA algorithm:

1. Evaluate all NFQs

2. Pick one of the returned functions, say fv

3. Evaluate the function and rewrite the document (d →fv d’)

4. Repeat from step 1 until all NFQs return empty results (i.e., there are no more relevant calls)

After every iteration, although the NFQs remain the same, their results can change (since evaluating a function in step 3 can introduce new function nodes or make some results irrelevant)
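
The naïve loop can be written down in a few lines. This is a sketch under assumptions: `evaluate` (returns the function nodes an NFQ retrieves on the current document) and `invoke` (returns the rewritten document) are hypothetical callables supplied by the caller, and the choice of which returned call to pick is arbitrary.

```python
def nfqa(doc, nfqs, evaluate, invoke):
    """Naive NFQA: re-evaluate every NFQ after each single function call."""
    while True:
        # Step 1: evaluate all NFQs on the current document.
        relevant = [call for q in nfqs for call in evaluate(q, doc)]
        if not relevant:
            # Step 4: no relevant calls remain, so doc is complete for the query.
            return doc
        # Steps 2-3: pick one returned call, invoke it, and rewrite the document.
        doc = invoke(doc, relevant[0])
```

Calls that no NFQ retrieves are never invoked, which is the lazy part of the evaluation; the cost is that all NFQs are re-run after every call.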

Page 29: Lazy Query Evaluation for Active XML

Improving NFQA

“Predict” when NFQ results could not possibly have changed and avoid reevaluating them

Identify dependences between NFQs and the effect of executing the functions they return

Page 30: Lazy Query Evaluation for Active XML

Influence of NFQs

[Figure: NFQ1 (nodes nyHotels and *()) and NFQ2 (nodes nyHotels, hotel, name (“Best Western”), rating (“*****”), nearby, and *() function-node branches)]

NFQ1 can influence NFQ2, but not vice versa

Page 31: Lazy Query Evaluation for Active XML

Influence of NFQs

NFQ1 may influence NFQ2 iff the output function node of NFQ1 is an ancestor (in the query tree) of the output node of NFQ2

Two NFQs belong to the same layer if they may influence each other (directly or transitively)

Inside every layer, we have to reevaluate every NFQ after every function call

Multiple equivalent NFQs (i.e., in the same layer) can only exist under //, because, not knowing the output type, either node could appear as a descendant of the other. E.g., for //a and //b: in /a/b, //a matches /a and //b matches /a/b, while in /b/a, //b matches /b and //a matches /b/a

Page 32: Lazy Query Evaluation for Active XML

Influence of NFQs

L1 < L2 iff some NFQ in L1 may influence (directly or transitively) some NFQ in L2

We have to process L1 before L2 (without having to process L1 again afterwards)

When processing of L1 has finished, the OR-nodes corresponding to the returned functions are redundant, and thus the NFQs in L2 can be simplified by removing them
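
Layers and their processing order can be derived from the direct "may influence" relation. The sketch below is one possible way to do it, assuming the relation is given as a predicate `may_influence(q, r)`; the function name `layers` and the closure-based approach are illustrative choices, not the paper's algorithm.

```python
def layers(nfqs, may_influence):
    """Group NFQs into layers and return the layers in processing order."""
    # Reflexive-transitive closure of the direct "may influence" relation.
    reach = {q: {q} | {r for r in nfqs if may_influence(q, r)} for q in nfqs}
    changed = True
    while changed:
        changed = False
        for q in nfqs:
            extra = set().union(*(reach[r] for r in reach[q])) - reach[q]
            if extra:
                reach[q] |= extra
                changed = True
    # Two NFQs share a layer iff each one reaches the other.
    groups = []
    for q in nfqs:
        for group in groups:
            if q in reach[group[0]] and group[0] in reach[q]:
                group.append(q)
                break
        else:
            groups.append([q])
    # A layer reaches strictly more NFQs than any layer it influences,
    # so sorting by reach size yields a valid order (L1 before L2).
    return sorted(groups, key=lambda group: -len(reach[group[0]]))
```

Layers that do not influence each other may come out in either order, which is harmless since neither has to wait for the other.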

Page 33: Lazy Query Evaluation for Active XML

Parallelizing calls

Let qlin be the linear path from the root to the output node of NFQ q, not inclusive (note: qlin is a regular expression)

Two NFQs q, q’ that belong to the same layer are independent iff the regular languages of qlin and q’lin have no word in common

E.g., //a and //b are independent, but //a//c and //b//c are not (both match /a/b/c)

If all NFQs in a layer are independent, we can call all functions returned by the same NFQ in a step of NFQA in parallel

Other sufficient conditions could exist, too …
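
The independence test amounts to checking that two regular languages do not intersect. A minimal sketch, assuming each linear path is encoded as a list of (axis, label) steps with axis "/" or "//" and without wildcard labels; the encoding and the function names are illustrative. It explores the product of the two path automata and looks for a word accepted by both.

```python
from collections import deque

def may_share_word(p, q):
    """True iff some sequence of labels is matched by both linear paths."""
    # One fresh symbol stands for every label that appears in neither path.
    alphabet = {label for _, label in p + q} | {"#other"}

    def step(path, i, symbol):
        """States reachable from state i of the path automaton on one symbol."""
        if i == len(path):
            return set()
        axis, label = path[i]
        nxt = set()
        if axis == "//":
            nxt.add(i)              # skip an intermediate element
        if label == symbol:
            nxt.add(i + 1)          # consume this step
        return nxt

    start, goal = (0, 0), (len(p), len(q))
    seen, todo = {start}, deque([start])
    while todo:
        i, j = todo.popleft()
        if (i, j) == goal:
            return True
        for symbol in alphabet:
            for a in step(p, i, symbol):
                for b in step(q, j, symbol):
                    if (a, b) not in seen:
                        seen.add((a, b))
                        todo.append((a, b))
    return False

def independent(p, q):
    return not may_share_word(p, q)
```

For instance, `independent([("//", "a")], [("//", "b")])` holds, while the paths //a//c and //b//c are not independent because both accept the word a, b, c.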

Page 34: Lazy Query Evaluation for Active XML

Outline

Definitions

Finding relevant calls

Sequencing relevant calls

Improving accuracy

Reducing detection time

Conclusions - Discussion

Page 35: Lazy Query Evaluation for Active XML

Using types

Use the function return type to “predict” the shape of the data that a function call can return

Similar to checking for the existence of a possible rewriting

If this shape cannot match the (corresponding part of the) query pattern, the call can be discarded

In some cases, one can go further and restrict not only the output type but also the specific names of functions that could match

Refined NFQs

Use the set of function names of appropriate return type instead of *()

Use F-guides (later) to make them even more refined
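
Replacing *() by a set of function names can be sketched as a lookup over declared return types. Everything about the signature table below is an assumption made for illustration: the function names appear in the slides, but the return types attached to them here are hypothetical, and real return types would be richer than a set of element names.

```python
def refine(required_label, signatures):
    """Names of functions whose declared return type may produce `required_label`."""
    return sorted(
        name
        for name, returns in signatures.items()
        if required_label in returns or "*" in returns   # "*" = arbitrary return type
    )

# Hypothetical return types, for illustration only.
signatures = {
    "getRating": {"rating"},
    "getNearbyRestos": {"restaurant"},
    "getNearbyMuseums": {"museum"},
    "getHotels": {"hotel"},
}

print(refine("restaurant", signatures))
```

A function with an unknown or arbitrary return type (marked "*" in this sketch) always stays in the set, so the refinement never drops a call that might be relevant.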

Page 36: Lazy Query Evaluation for Active XML

Refined NFQ example

[Figure: NFQ with nodes nyHotels, hotel, name (“Best Western”), rating (“*****”), nearby, and *() function-node branches]

Page 37: Lazy Query Evaluation for Active XML

Refined NFQ example

[Figure: the same NFQ with some *() branches replaced by the function names getRating and getNearbyRestos]

Page 38: Lazy Query Evaluation for Active XML


Pushing queries

Similar to pushing selections on scans in relational queries or pushing queries to data sources in mediator systems

Reduce amount of (useless) data that are transferred (assuming functions correspond to remote (web) services), by filtering irrelevant matches and projecting only on output variable nodes

Page 39: Lazy Query Evaluation for Active XML

Outline

Definitions

Finding relevant calls

Sequencing relevant calls

Improving accuracy

Reducing detection time

Conclusions - Discussion

Page 40: Lazy Query Evaluation for Active XML

Lenient rewriting

Trade accuracy for efficiency

Use XPath or LPQs instead of NFQs (faster processing)

Use a lenient form of type checking (ignoring order and cardinality of elements)

Page 41: Lazy Query Evaluation for Active XML

Function call guides

Similar to dataguides, for function calls

One occurrence for each path that leads to some function node + pointers to function nodes

Page 42: Lazy Query Evaluation for Active XML

Function call guides

Similar to dataguides, for function calls

One occurrence for each path that leads to some function node + pointers to function nodes

Paths that don’t lead to functions are left out

Page 43: Lazy Query Evaluation for Active XML

Function call guides

Similar to dataguides, for function calls

One occurrence for each path that leads to some function node + pointers to function nodes

[Figure annotations: pointers to getRating calls; pointers to getNearbyRestos, getNearbyMuseums calls; pointers to getHotels calls]
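
An F-guide can be approximated by a map from label paths to the function nodes found at those paths. The sketch below is a simplification under assumptions: the document is encoded as nested (label, children) tuples, a label ending in "()" marks a function node, and the guide is a flat dictionary rather than a tree; `build_fguide` is a hypothetical name.

```python
from collections import defaultdict

def build_fguide(doc):
    """Map each label path that leads to a function node to the calls found there."""
    guide = defaultdict(list)

    def walk(node, path):
        label, children = node
        if label.endswith("()"):
            guide[path].append(node)        # keep a pointer to the function node
            return
        for child in children:
            walk(child, path + (label,))

    walk(doc, ())
    return dict(guide)
```

Paths that lead to no function node never get an entry, so the guide stays small even when the document itself is large.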

Page 44: Lazy Query Evaluation for Active XML

Function call guides

Use F-guides for:

Generation of refined NFQs (use the return type within the appropriate F-guide part to get only function names that can indeed appear in the corresponding tree fragment)

Efficient approximation of relevant function nodes: evaluate queries (NFQs) on the F-guide; evaluate queries on the original document using LPQs

Initial filtering: can get rid of NFQs for nodes that don’t have any children in the F-guide

Page 45: Lazy Query Evaluation for Active XML

Conclusions

Active XML: interesting new area

Nothing fundamentally novel

Applies known tools (distributed processing, lazy evaluation) in a new context, giving new life to documents

Greatest challenge: formulate the right research questions well

Answers to these well-formulated questions are fairly easy.

Contributions of this paper:

Formulates such an interesting question

Thorough understanding of different aspects of the problem (accuracy vs. performance and their effect on overall efficiency)

Page 46: Lazy Query Evaluation for Active XML


Questions?