A Bottom-up Strategy for Query Decomposition
Le Thi Thu Thuy*, Doan Dai Duong*, Virendrakumar C. Bhavsar* and Harold Boley**
*Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada
{Thuy_Thi_Thu.Le, Duong_Dai.Doan, bhavsar}@unb.ca
**Institute for Information Technology - e-Business, NRC, Fredericton, NB, Canada
First IEEE International Conference on Digital Information Management (ICDIM)
December 6-8, 2006
Agenda
Introduction
Lausen and Marron (LM) Approach
Proposed Approach
Query Decomposition Algorithm
Additional Cases of Input Queries
Conclusion
Introduction
Utilization of available heterogeneous web data sources is still a demanding task:
Automatic retrieval of relevant data from distributed and chaotic sources
Avoid generation of such data from scratch
Global-As-View (GAV):
Distributed data sources follow their own schemas
The integration system integrates the heterogeneous schemas into a global schema
Users interact with the integration system through the global schema
Introduction
A user's query based on the global schema cannot be employed directly to query the distributed sources, because their structures differ from the global schema and the data are distributed
To access data from distributed sources, the global query must be decomposed into subqueries conforming to the structures of the distributed sources
Query decomposition therefore plays an important role
Figure: General Scenario for Query Decomposition of Distributed Databases
Users pose a query over the integrated data (e.g., for <student> elements with Fname and nationality). QUERY DECOMPOSITION splits it into Query 1, Query 2, …, Query n for System 1, System 2, …, System n (sources A, B, C holding <student> records with Fname, room and nationality), and DATA CONVERSION turns Result 1, Result 2, …, Result n back into integrated data. Mappings are needed.
Problems with Mappings
Building mappings is a difficult task
Mappings are normally handcrafted
Can we decompose a global query into subqueries without mappings?
Lausen and Marron (LM02) Approach
XML data sources
Use XPath query
Qglobal='/p1/p2/…/pi/…/pn-1/pn'
Decompose global query into subqueries without mappings
Use top-down approach: process from the top (root node) of a tree (schema) to its bottom (leaf nodes)
Process global query from left to right (P1 → Pn)
G. Lausen and P.J. Marron, “Adaptive evaluation techniques for querying XML-based E-Catalogs,” DBLP, 2002, pp. 19-28.
Proposed Approach
XML data sources
Use XPath query
Qglobal='/p1/p2/…/pi/…/pn-1/pn'
Decompose global query into subqueries without mappings
Use bottom-up approach: process from the bottom (leaf nodes) of a tree (schema) to the top (root node)
Process global query from right to left (Pn → P1)
Example
Fig. 1. Example of a global schema and two local schemas (from LM): a. Global schema, b. SESP schema, c. BIGGER schema
XPath query based on global schema
Qglobal='/p1/p2/…/pi/…/pn-1/pn'
Qglobal= '/department/mobile/products/jammer[price<200]'
Find QSESP and QBIGGER for schemas SESP and BIGGER?
QSESP ='/products/jammer[price<200]'
QBIGGER='/department/mobile/jammer[price<200]'
Query Decomposition Algorithm
Given Qglobal='/p1/p2/…/pi/…/pn-1/pn'
Take the rightmost part pn to evaluate first
If pn is not found in the local schema, there is no subquery for that schema; stop the algorithm
Else, pn is found at a node in the tree (local schema); mark that node so that the next search is performed only on its ancestor nodes
Then consider pi (i = n-1, ..., 1) of the query for evaluation, one at a time
Flowchart of the algorithm
1. Start: Anchor := LeftmostLeafNode, Subquery := '', i := |Qglobal|
2. Check(Pi, Anchor): does Pi exist in the local schema S from Anchor up to the root node? If not: when Subquery = '', Stop (no subquery for S); otherwise go to step 5
3. If it does: when Subquery = '', Subquery := Pi; otherwise Subquery := Pi'/'Subquery if Pi = Anchor, else Subquery := Pi'//'Subquery. Then Anchor := father(Pi) in S
4. If Pi is matched with the root of S: Subquery := '/'Subquery; Return Subquery
5. If i > 1: i := i-1 and go to step 2. If i = 1: Subquery := '//'Subquery; Return Subquery
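The flowchart above can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the authors' implementation: the `Node` class, the helper names, and the tie-breaking rule (when pn occurs at several nodes, the first one in document order is taken rather than scanning from the leftmost leaf) are assumptions. Only a predicate on the last step is supported, and it is copied unchanged into the subquery. The two schemas in the usage lines are likewise assumptions, chosen only to be consistent with the QSESP and QBIGGER results of the example; Fig. 1 itself is not reproduced.

```python
import re
from typing import Iterator, List, Optional

# Last step of the query: element name plus optional predicate,
# e.g. "jammer[price<200]" -> ("jammer", "[price<200]").
STEP = re.compile(r"^([^\[\]]+)(\[.*\])?$")


class Node:
    """A local-schema element with a parent pointer for upward search."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List["Node"] = []

    def add(self, name: str) -> "Node":
        child = Node(name, self)
        self.children.append(child)
        return child


def preorder(node: Node) -> Iterator[Node]:
    """Yield all schema nodes in document order."""
    yield node
    for child in node.children:
        yield from preorder(child)


def find_upward(name: str, start: Optional[Node]) -> Optional[Node]:
    """First node called `name` on the path from `start` up to the root."""
    while start is not None:
        if start.name == name:
            return start
        start = start.parent
    return None


def decompose(q_global: str, root: Node) -> Optional[str]:
    """Bottom-up decomposition of '/p1/.../pn' for one local schema.

    Returns the local subquery, or None if pn is not in the schema.
    """
    parts = q_global.strip("/").split("/")
    m = STEP.match(parts[-1])
    if m is None:
        raise ValueError(f"unsupported last step: {parts[-1]!r}")
    last, predicate = m.group(1), m.group(2) or ""
    # pn: search the whole schema (first match in document order).
    node = next((n for n in preorder(root) if n.name == last), None)
    if node is None:
        return None  # no subquery for this schema: stop
    subquery = last + predicate
    top, anchor = node, node.parent  # later parts: ancestors only
    for part in reversed(parts[:-1]):  # p(n-1), ..., p1
        match = find_upward(part, anchor)
        if match is None:
            continue  # pi not found above the anchor: leave it out
        # '/' if pi is the parent of the previous match, '//' if levels were skipped
        subquery = part + ("/" if match is anchor else "//") + subquery
        top, anchor = match, match.parent
    # Absolute path if the topmost match is the schema root, else descendant axis.
    return ("/" if top is root else "//") + subquery


# Hypothetical schemas consistent with the example's expected subqueries.
sesp = Node("products")
sesp.add("jammer").add("price")
bigger = Node("department")
bigger.add("mobile").add("jammer").add("price")

q = "/department/mobile/products/jammer[price<200]"
print(decompose(q, sesp))    # /products/jammer[price<200]
print(decompose(q, bigger))  # /department/mobile/jammer[price<200]
```

Because each pi with i < n is compared only with nodes on a single path towards the root, such a step visits at most h + 1 nodes in this sketch; only the search for pn may touch the whole tree.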
Additional Cases of Input Queries
(Constraints in Queries)
XPath queries may contain constraints (filter expressions), e.g.
Qglobal := '/department/mobile/products/jammer[price<200]'
Idea: examine price before jammer, so that the whole query is not transformed when price does not exist in the local schema
Considerable reduction in execution time for local schemas that lack the constrained element
Additional Cases of Input Queries
(Constraints in Queries)
Qglobal :=
'/department/mobile/products/jammer[price<200]'
Fig. 2. Local schema without price leaf node (adapted from LM): products, with child jammer, whose children are name and company
In this case, the global query yields no subquery for the local schema, because price does not occur in it
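The pre-check can be sketched as a separate test that runs before any transformation of the query. In this hypothetical sketch, the constraint is assumed to be a simple comparison such as `[price<200]` on the last step, and "examine price before jammer" is read as: find a `price` node first, then require that its parent is named `jammer`. The schema mirrors Fig. 2; the name `constraint_feasible` and the `Node` class are illustrative assumptions.

```python
import re
from typing import Iterator, List, Optional

# Simple constraint on the last step, e.g. "jammer[price<200]":
# group 1 = constrained step (jammer), group 2 = tested element (price).
CONSTRAINED_STEP = re.compile(r"^(\w+)\[\s*(\w+)\s*(?:<=|>=|!=|<|>|=)")


class Node:
    """Local-schema element with a parent pointer."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List["Node"] = []

    def add(self, name: str) -> "Node":
        child = Node(name, self)
        self.children.append(child)
        return child


def preorder(node: Node) -> Iterator[Node]:
    """Yield all schema nodes in document order."""
    yield node
    for child in node.children:
        yield from preorder(child)


def constraint_feasible(q_global: str, root: Node) -> bool:
    """True if the last step's constraint can be evaluated in this schema.

    False means the schema gets no subquery, so the rest of the
    query need not be transformed at all.
    """
    last_step = q_global.rstrip("/").rsplit("/", 1)[-1]
    m = CONSTRAINED_STEP.match(last_step)
    if m is None:
        return True  # no supported constraint: nothing to pre-check
    step, tested = m.groups()
    # Bottom-up: locate the tested leaf first, then check its parent.
    return any(
        n.name == tested and n.parent is not None and n.parent.name == step
        for n in preorder(root)
    )


# Fig. 2: products -> jammer -> {name, company}; no price leaf.
fig2 = Node("products")
jammer = fig2.add("jammer")
jammer.add("name")
jammer.add("company")

q = "/department/mobile/products/jammer[price<200]"
print(constraint_feasible(q, fig2))  # False: no subquery for this schema
```

Running this check first means a schema such as Fig. 2 is rejected after one scan for `price`, without processing department, mobile or products.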
Algorithm Analysis
We evaluate the parts of the input query from right to left, and the XML tree from bottom to top
Worst case: no subquery for a local schema
The rightmost part pn of the global query has to be compared with all nodes of the local schema
Time complexity for a query having n parts on a full k-ary tree of height h
Number of nodes in a full k-ary tree of height h
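The node-count formula on this slide was lost in extraction. Counting the k^j nodes on each level j = 0, …, h of a full k-ary tree (root at level 0, k ≥ 2) gives the standard identity below; since in the worst case pn is compared with every node, the number of comparisons grows as O(k^h). This is derived from the slide's description, not reproduced from the slide itself.

```latex
N(k,h) \;=\; \sum_{j=0}^{h} k^{j} \;=\; \frac{k^{h+1}-1}{k-1} \quad (k \ge 2),
\qquad N(2,h) \;=\; 2^{h+1}-1 .
```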
Algorithm Analysis
Best case: the rightmost part pn matches a leaf node of the tree at the first comparison, and the same holds for every pi at the upper levels of the tree
Time complexity for a query having n parts on a full k-ary tree of height h: min(n,h), because the algorithm stops when either
all n parts of Qglobal have been processed, or
all nodes from the bottom to the top of the tree (with height h) have been traversed
LM Approach – Algorithm Analysis
Transform the XPath query
Qglobal='/p1/p2/…/pi/…/pn-1/pn'
into local subqueries for local schemas
At each pi, for a binary tree of height h, three operators are used to compute and select suitable elements from the global query to form local subqueries:
No transformation: 1 unit time
Subquery generalization: 2^(h+1) - 1 unit times
Subquery elimination: 1 unit time
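Taking the per-part costs listed for the LM operators at face value, generalization is the most expensive operator, so one part costs at most 2^(h+1) - 1 units on a binary tree of height h, and a query with n parts is bounded as below. This is an upper bound derived here from the stated costs, not the LM authors' own formula.

```latex
T_{\mathrm{LM}}(n,h) \;\le\; \sum_{i=1}^{n} \max\{\,1,\; 2^{h+1}-1,\; 1\,\} \;=\; n\,\bigl(2^{h+1}-1\bigr).
```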
LM Approach: Algorithm Analysis (cont.)
Time complexity to evaluate the whole query
Time complexity of the algorithm for a query having n parts given a full k-ary tree of height h
Comparison
LM Algorithm Our algorithm
Worst case
Best case
Conclusion
Proposed an efficient bottom-up algorithm for query decomposition without predefined mappings
Global query is efficiently processed based on its constraints
Our algorithm can be extended to work not only with XPath queries, but also with general path expressions like those in Object-Oriented Databases