
Le Thi Thu Thuy*, Doan Dai Duong*, Virendrakumar C. Bhavsar* and Harold Boley**

*Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada

{Thuy_Thi_Thu.Le, Duong_Dai.Doan, bhavsar}@unb.ca

**Institute for Information Technology - e-Business, NRC, Fredericton, NB, Canada

[email protected]

A Bottom-up Strategy for Query Decomposition

First IEEE International Conference on Digital Information Management (ICDIM)

December 6-8, 2006


Agenda

Introduction
Lausen and Marron (LM) Approach
Proposed Approach
Query Decomposition Algorithm
Additional Cases of Input Queries
Conclusion


Introduction

Utilization of available heterogeneous web data sources is still a demanding task

Automatic retrieval of relevant data from distributed and chaotic sources

Avoid generation of such data from scratch

Global-As-View (GAV)
  Distributed data sources follow their own schemas
  Integration system integrates heterogeneous schemas into a global schema
  Users interact with the integration system through a global schema


Introduction

Query of a user is based on a global schema
  It cannot be directly employed to query distributed sources, because the global schema and the distributed sources have different structures

To access data from distributed sources, the global query must be decomposed into subqueries conforming to the structures of the distributed sources

Query decomposition plays an important role

General Scenario for Query Decomposition of Distributed Databases

[Figure: Users pose a query against the integrated data. A QUERY DECOMPOSITION step produces Query 1, Query 2, ..., Query n for System 1, System 2, ..., System n, and a DATA CONVERSION step turns Result 1, Result 2, ..., Result n into integrated data. Sample data: student records with Fname, room and nationality elements. Annotation: "Mappings are needed".]


Problems with Mappings

Building mappings is a difficult task
Mappings are normally handcrafted

Can we decompose a global query into subqueries without mappings?


Lausen and Marron (LM02) Approach

XML data sources
Use XPath query

Qglobal='/p1/p2/…/pi/…/pn-1/pn'

Decompose global query into subqueries without mappings

Use top-down approach
  Process from top (root node) of a tree (schema) to its bottom (leaf nodes)
  Process global query from left to right (P1 → Pn)

G. Lausen and P.J. Marron, “Adaptive evaluation techniques for querying XML-based E-Catalogs,” DBLP, 2002, pp. 19-28.


Proposed Approach

XML data sources
Use XPath query

Qglobal='/p1/p2/…/pi/…/pn-1/pn'

Decompose global query into subqueries without mappings

Use bottom-up approach
  Process from bottom (leaf nodes) of a tree (schema) to the top (root node)
  Process global query from right to left (Pn → P1)
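
The right-to-left processing order can be illustrated with a minimal Python sketch. It assumes the query is an absolute path whose filter expressions contain no '/', and it uses the concrete query from the deck's example; the function names are illustrative and do not come from the paper.

```python
def location_steps(xpath):
    """Split an absolute path '/p1/p2/.../pn' into its parts [p1, ..., pn].

    Assumption: no '/' occurs inside a filter expression such as [price<200].
    """
    return [step for step in xpath.split("/") if step]


def right_to_left(xpath):
    """Return the parts in the order a bottom-up strategy visits them: pn, ..., p1."""
    return list(reversed(location_steps(xpath)))


q_global = "/department/mobile/products/jammer[price<200]"
for part in right_to_left(q_global):
    print(part)
```

A top-down strategy would iterate over location_steps(q_global) directly; the bottom-up strategy only reverses the visiting order.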


Example

a. Global schema

b. SESP schema

c. BIGGER schema

Fig. 1. Example of a global schema and two local schemas (from LM)

XPath query based on global schema

Qglobal='/p1/p2/…/pi/…/pn-1/pn'

Qglobal='/department/mobile/products/jammer[price<200]'

Find QSESP and QBIGGER for schemas SESP and BIGGER?

QSESP='/products/jammer[price<200]'

QBIGGER='/department/mobile/jammer[price<200]'


Query Decomposition Algorithm

Given Qglobal='/p1/p2/…/pi/…/pn-1/pn'

Take rightmost part pn to evaluate
  If pn is not found in the local schema, there is no subquery for that schema. Stop the algorithm
  Else, pn is found at a node in the tree (local schema); mark that node so that the next search will only be performed on its ancestor nodes

Sequentially, consider pi (i = n-1, ..., 1) of the query for evaluation
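
The steps above can be sketched in Python as follows. This is a simplified reading, not the authors' implementation: it assumes element names are unique within a local schema, represents the schema as a hypothetical child-to-parent dictionary (root mapped to None), and assumes filter expressions contain no '/'. The SESP and BIGGER dictionaries are inferred from the two subqueries given in the example, since the schema figures are not reproduced in the text.

```python
def element_name(step):
    """Strip a filter expression: 'jammer[price<200]' -> 'jammer'."""
    return step.split("[", 1)[0]


def decompose(q_global, parent):
    """Bottom-up sketch: build a local subquery from a global path query.

    q_global: absolute path '/p1/.../pn'.
    parent:   child -> parent map of the local schema; the root maps to None.
    Returns the subquery, or None when pn does not occur in the schema.
    """
    steps = [s for s in q_global.split("/") if s]
    last = steps[-1]
    if element_name(last) not in parent:
        return None                          # pn not found: no subquery
    subquery = last
    anchor = parent[element_name(last)]      # next searches look at ancestors only

    for step in reversed(steps[:-1]):        # p(n-1), ..., p1
        if anchor is None:                   # root already matched
            break
        node = anchor
        while node is not None and node != element_name(step):
            node = parent[node]              # climb towards the root
        if node is None:
            continue                         # part absent from this schema: skip it
        separator = "/" if node == anchor else "//"
        subquery = step + separator + subquery
        anchor = parent[node]

    return ("/" if anchor is None else "//") + subquery


# Local schemas inferred from the example's results (assumption).
SESP = {"products": None, "jammer": "products", "price": "jammer"}
BIGGER = {"department": None, "mobile": "department",
          "jammer": "mobile", "price": "jammer"}

Q = "/department/mobile/products/jammer[price<200]"
print(decompose(Q, SESP))
print(decompose(Q, BIGGER))
```

With these assumed schemas the sketch reproduces QSESP and QBIGGER from the example. A part that matches an ancestor other than the direct parent is joined with '//', which is how this sketch interprets the '//' assignments in the flowchart.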


Flowchart of the algorithm

[Flowchart; the connections between boxes were lost in extraction. Recoverable elements:]

Initialization: Anchor:=LeftmostLeafNode, Subquery:='', i:=|Qglobal|

Decisions:
  Check(Pi, Anchor): Pi exists in the local schema S from Anchor up to the root node
  Pi is matched with the root of S
  Pi=Anchor
  Subquery=''
  i>1
  i=1

Assignments:
  Subquery:=Pi
  Subquery:=Pi'/'Subquery
  Subquery:=Pi'//'Subquery
  Subquery:='/'Subquery
  Subquery:='//'Subquery
  Anchor:=father(Pi) in S
  i:=i-1

Terminals: Return Subquery, Stop


Additional Cases of Input Queries

(Constraints in Queries)

XPath queries contain constraints (filter expressions)

Qglobal := '/department/mobile/products/jammer[price<200]'

Idea
  Examine price before jammer.
  Avoid transforming the whole query if price does not exist in local schema

Considerable reduction in execution time


Additional Cases of Input Queries

(Constraints in Queries)

In this case, no subquery is generated for the local schema from the global query

Qglobal := '/department/mobile/products/jammer[price<200]'

products
  jammer
    name
    company

Fig. 2. Local schema without price leaf node (adapted from LM)
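
The constraint-first check can be sketched as below. The sketch assumes a filter expression starts with the element name it constrains (as in [price<200]) and that the local schema is available as a plain set of element names; both the helper names and the regular expression are illustrative, not taken from the paper.

```python
import re


def constraint_names(q_global):
    """Element names that open a filter expression, e.g. 'price' in [price<200]."""
    return re.findall(r"\[\s*([A-Za-z_][\w.-]*)", q_global)


def may_have_subquery(q_global, schema_names):
    """Cheap pre-check: False if any constrained element is missing from the schema."""
    return all(name in schema_names for name in constraint_names(q_global))


Q = "/department/mobile/products/jammer[price<200]"
FIG2_SCHEMA = {"products", "jammer", "name", "company"}   # no price leaf node

print(constraint_names(Q))
print(may_have_subquery(Q, FIG2_SCHEMA))
```

When the pre-check returns False, the remaining parts of the query need not be examined for that schema, which is where the reduction in execution time would come from.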


Algorithm Analysis

We evaluate the parts of the input query from right to left and the XML tree from bottom to top

Worst case
  No subquery for a local schema
  The rightmost part Pn of global query has to be compared to all nodes of local schema
  Time complexity for a query having n parts to full k-ary tree of height h
  Number of nodes in full k-ary tree of height h: (k^(h+1) - 1) / (k - 1)
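
The node count of a full k-ary tree is a standard geometric sum, and it bounds the number of comparisons needed when pn must be checked against every node. A small check in Python, assuming the root sits at level 0 so that a tree of height h has h+1 levels; the function names are illustrative.

```python
def full_kary_nodes(k, h):
    """Closed form for the number of nodes in a full k-ary tree of height h."""
    if k == 1:
        return h + 1                      # degenerate case: a single chain
    return (k ** (h + 1) - 1) // (k - 1)


def nodes_by_level(k, h):
    """Same count obtained by summing k**level over levels 0..h."""
    return sum(k ** level for level in range(h + 1))


for k, h in [(2, 3), (3, 2), (4, 4)]:
    print(k, h, full_kary_nodes(k, h), nodes_by_level(k, h))
```

For k = 2 the closed form reduces to 2^(h+1) - 1, the node count of a full binary tree of height h.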


Algorithm Analysis

Best case
  The rightmost part pn matches a leaf node of the tree at the first comparison
  The same holds for all pi nodes at upper levels of the tree
  Time complexity for a query having n parts to a full k-ary tree of height h
  min(n,h), because the algorithm stops when either
    all n parts of Qglobal are processed, or
    all nodes from bottom to top of the tree (with height h) are traversed


LM Approach – Algorithm Analysis

Transform XPath query

Qglobal='/p1/p2/…/pi/…/pn-1/pn'

into local subqueries for local schemas

At each pi, to evaluate pi for a binary tree of height h, three operators are used to compute and select suitable elements from the global query to form local subqueries
  No transformation: 1 unit time
  Subquery generalization: 2^(h+1)-1 unit times
  Subquery elimination: 1 unit time


LM Approach: Algorithm Analysis (cont.)

Time complexity to evaluate the whole query

Time complexity of the algorithm for a query having n parts given a full k-ary tree of height h


Comparison

LM Algorithm Our algorithm

Worst case

Best case


Conclusion

Proposed an efficient bottom-up algorithm for query decomposition without predefined mappings

Global query is efficiently processed based on its constraints

Our algorithm can be extended to work not only with XPath queries, but also with general path expressions like those in Object-Oriented Databases
