Evaluation of Partial Path Queries on XML Data
Stefanos Souldatos (NTUA, Greece), Xiaoying Wu (NJIT, USA), Dimitri Theodoratos (NJIT, USA), Theodore Dalamagas (NTUA, Greece), Timos Sellis (NTUA, Greece)
Partial path queries
Query processing
Query evaluation
Experiments
Conclusion
Difficulties on Querying XML Data
[Figure: XML tree of theHotel.gr with nodes labelled Athens, Creta, Island, City, Location, Center, Poros, Chania and Heraklio]
Search problem
Name: Xiaoying Wu
Place: Athens Center, Heraklio
Purpose: Sightseeing (Parthenon, 438 BC; Phaistos’ Disk, 1700 BC)
Problem: structural difference
Search problem
Name: Theodore Dalamagas
Place: Islands
Purpose: Sea sports (windsurf, jet ski)
Problem: structural inconsistency
Search problem
Name: Dimitri Theodoratos
Place: Heraklio
Purpose: HDMS Conference (HDMS 2008)
Problem: unknown structure
Search problem
Name: Stefanos Souldatos
Place: Any island (1400 islands)
Purpose: Escape from PhD!
Problem: multiple sources (theHotel.gr, hotels.gr, holidays.gr)
Can we use existing query languages (XPath, XQuery) to express our queries?
Can we use existing techniques to evaluate our queries?
Path Queries in XPath
No structure (keywords):
//theHotel.gr[descendant-or-self::*[ancestor-or-self::City][ancestor-or-self::Island]]
Full structure (path patterns):
/theHotel.gr/City//Island
Partial path queries:
//theHotel.gr//City[descendant-or-self::*[ancestor-or-self::Island]]
Partial Path Queries
root node (optional)
query node labelled by “a”
child relationship
descendant relationship
[Figure: an example partial path query with root r and query nodes labelled a, b, c, d, a, c]
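The meaning of such a query can be made concrete with a brute-force check against a single root-to-leaf path. The sketch below is only an illustration, not the evaluation method of the talk: the function name `match_on_path`, the node identifiers and the sample paths are all hypothetical, and it assumes that every query node must map to a position on the same path, that a child edge forces adjacent positions, and that a descendant edge forces a strictly deeper position.

```python
from itertools import product


def match_on_path(labels, child, desc, path):
    """Return every embedding of a partial path query into one root-to-leaf path.

    labels: dict mapping a query node id to its label
    child, desc: sets of (x, y) pairs meaning x/y and x//y
    path: list of element labels, root first
    """
    nodes = sorted(labels)
    # A query node may only be mapped to positions that carry its label.
    candidates = [
        [i for i, lab in enumerate(path) if lab == labels[n]] for n in nodes
    ]
    results = []
    for combo in product(*candidates):
        pos = dict(zip(nodes, combo))
        ok_child = all(pos[y] == pos[x] + 1 for x, y in child)
        ok_desc = all(pos[y] > pos[x] for x, y in desc)
        if ok_child and ok_desc:
            results.append(pos)
    return results


# Hypothetical query: City below theHotel.gr, Island anywhere on the same path.
labels = {"h": "theHotel.gr", "c": "City", "i": "Island"}
desc = {("h", "c")}
print(match_on_path(labels, set(), desc, ["theHotel.gr", "Island", "City"]))
print(match_on_path(labels, set(), desc, ["theHotel.gr", "City", "Island"]))
```

Both sample paths match, which is the point of leaving the position of Island unspecified: the same query covers either ordering of City and Island.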
[Figure: QUERY PROCESSING turns a partial path query (nodes r, a, b, c, d, a, c) into a partial path query in canonical form (nodes r, a, b, c, d, a), which is the input of QUERY EVALUATION]
Query Processing
1. Full form
2. Satisfiability
3. Redundant nodes
4. Canonical form
1. Full form
[Figure: inference rules such as IR1 and IR4 applied to the example query to derive its full form]
INFERENCE RULES
(IR1) |- r//ai
(IR2) x/y |- x//y
(IR3) x//y, y//z |- x//z
(IR4) x/ai, x//bj |- ai//bj
(IR5) ai/x, bj//x |- bj//ai
(IR6) x/y, y/w, x//z, z//w |- x/z
(IR7) x/y, x//z, w/z, w//y |- x/z
(IR8) x/y, y/w, x/z |- z/w
(IR9) x//y, y//w, x/z |- z//w
(IR10) x/y, w/y, w/z |- x/z
(IR11) x//y, w/y, w//z |- x//z
(IR12) x/y, y/w, z/w |- x/z
(IR13) x//y, y//w, z/w |- x//z
x, y, z, w: query nodes
ai/bj: nodes labelled by a/b
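As a rough illustration of how inference rules are applied until nothing new can be derived, the sketch below closes the descendant relation under IR1, IR2 and IR3 only. It is a partial sketch under stated assumptions: the remaining rules (IR4 to IR13) are not implemented, and the function name `full_form`, its parameters and the sample query are hypothetical.

```python
def full_form(nodes, child, desc, root=None):
    """Close the descendant relation under IR1, IR2 and IR3 (a subset of the rules).

    nodes: iterable of query node ids
    child, desc: sets of (x, y) pairs meaning x/y and x//y
    root: id of the optional root node r, or None
    """
    closed = set(desc)
    if root is not None:
        # IR1: the root is an ancestor of every other query node.
        closed |= {(root, n) for n in nodes if n != root}
    # IR2: every child relationship is also a descendant relationship.
    closed |= set(child)
    # IR3: transitivity, repeated until a fixpoint is reached.
    changed = True
    while changed:
        changed = False
        for x, y in list(closed):
            for y2, z in list(closed):
                if y == y2 and (x, z) not in closed:
                    closed.add((x, z))
                    changed = True
    return closed


# Hypothetical query: a/b, b//c, with root r.
print(sorted(full_form({"r", "a", "b", "c"}, {("a", "b")}, {("b", "c")}, root="r")))
```

The fixpoint loop is deliberately naive; it is quadratic per round, which is acceptable for queries with a handful of nodes.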
2. Satisfiability
A query is unsatisfiable if its full form contains a trivial cycle:
[Figure: trivial cycle pattern over nodes x and y]
3. Redundant nodes
A node y is redundant if one of the following patterns occurs:
[Figure: four patterns a), b), c) and d) over nodes x, y and z]
4. Canonical form
canonical form of satisfiable query = full form – (relationships derived by IR2) – (relationships derived by IR3) – redundant nodes
The canonical form of a query is a directed acyclic graph (dag).
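One possible reading of "full form – IR2 – IR3" is to drop every descendant relationship that IR2 or IR3 could re-derive from the relationships that remain. The sketch below follows that reading and is an assumption rather than the authors' procedure: it does not remove redundant nodes, it treats a derived x//x as the trivial cycle, and the name `canonical_edges` and the sample input are hypothetical.

```python
def canonical_edges(child, closed_desc):
    """Drop descendant pairs that IR2 or IR3 could re-derive (redundant nodes are not handled).

    child: set of (x, y) pairs meaning x/y
    closed_desc: descendant pairs of the full form, assumed transitively closed
    Returns (child edges, remaining descendant edges).
    """
    # Assumption: a derived x//x is what signals an unsatisfiable query.
    if any(x == y for x, y in closed_desc):
        raise ValueError("unsatisfiable: the full form contains a cycle")
    child = set(child)
    kept = set()
    for x, z in closed_desc:
        if (x, z) in child:
            continue  # re-derivable from x/z by IR2
        # Re-derivable by IR3 if some y lies strictly between x and z.
        via = any((y, z) in closed_desc for x2, y in closed_desc if x2 == x)
        if not via:
            kept.add((x, z))
    return child, kept


full = {("r", "a"), ("r", "b"), ("r", "c"), ("a", "b"), ("b", "c"), ("a", "c")}
print(canonical_edges({("a", "b")}, full))
```

For the sample input the remaining edges form the chain r//a, a/b, b//c, which is a dag as the slide states.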
Evaluation Algorithms
Based on PathStack [Bruno et al. ’02]:
1. Produce all possible path queries…
2. Decompose into root-to-leaf paths…
PartialMJ: Decompose a spanning tree into paths…
Extending PathStack [Bruno et al. ’02]:
PartialPathStack: Produce a topological order of the query nodes and extend PathStack to handle it…
Based on PathStack
1. Producing all possible path queries…
[Figure: a partial path query over nodes r, a, b, c, d, e, f, g and several of the path queries it gives rise to]
Problems:
too many queries to evaluate
multiple traversals of the XML tree
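To see why the number of path queries grows quickly, one can enumerate every total order of the query nodes that respects the query's edges. This is a simplified sketch, not the procedure of the talk: it ignores the choice between child and descendant edges in each resulting path and the possibility that same-label nodes coincide, and the name `all_orders` and the sample queries are hypothetical.

```python
def all_orders(nodes, edges):
    """Yield every total order of `nodes` in which x precedes y for each edge (x, y)."""
    preds = {n: {x for x, y in edges if y == n} for n in nodes}

    def extend(prefix, remaining):
        if not remaining:
            yield list(prefix)
            return
        for n in sorted(remaining):
            # A node can be placed once all of its predecessors are placed.
            if preds[n] <= set(prefix):
                yield from extend(prefix + [n], remaining - {n})

    yield from extend([], set(nodes))


# Hypothetical query: r above a, b and c, with no order among a, b and c.
orders = list(all_orders({"r", "a", "b", "c"}, {("r", "a"), ("r", "b"), ("r", "c")}))
print(len(orders))
```

With three unordered nodes below r there are already 3! = 6 orders, and each additional unordered node multiplies the count further.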
Based on PathStack
2. Decomposing into root-to-leaf paths…
[Figure: the query decomposed into root-to-leaf paths, each evaluated with PathStack]
Problems:
path overlaps
more than one component to evaluate
intermediate results
Based on PathStack
PartialMJ. Using a spanning tree…
Remove edges to create a spanning tree
[Figure: edges are removed from the query to obtain a spanning tree, which is decomposed into paths evaluated with PathStack]
Join conditions (identity, structural, path)
Problems:
path overlaps
more than one component to evaluate
intermediate results
Extending PathStack
PartialPathStack. Employ a topological order…
[Figure: the query nodes r, a, b, c, d, e, f, g arranged in a topological order and evaluated by PartialPathStack]
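A topological order of the query dag can be produced with Kahn's algorithm. The sketch below shows only that ordering step, not PartialPathStack itself; the function name `topological_order` and the edge set of the sample query are hypothetical, since the slide figure is not recoverable.

```python
from collections import deque


def topological_order(nodes, edges):
    """Return one topological order of a query dag using Kahn's algorithm."""
    indegree = {n: 0 for n in nodes}
    succs = {n: [] for n in nodes}
    for x, y in edges:
        succs[x].append(y)
        indegree[y] += 1
    # Start from the nodes without incoming edges (sorted for a stable result).
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in sorted(succs[n]):
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("query graph has a cycle")
    return order


# Hypothetical dag over the query nodes r, a, b, d, c, e.
edges = {("r", "a"), ("a", "b"), ("a", "d"), ("b", "c"), ("d", "c"), ("d", "e")}
print(topological_order(["r", "a", "b", "d", "c", "e"], edges))
```

A node with several incoming edges (c in the sample) is emitted only after all of its predecessors, which is the property a stack-per-node evaluation would rely on.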
PartialPathStack Example
[Figure: step-by-step run of PartialPathStack on a query over nodes r, a, b, d, c, e and an XML tree with nodes r, a1, b1, c1, c2, d1, d2, e1, e2, using one stack per query node (Sr, Sa, Sb, Sd, Sc, Se), with the sink nodes marked and each output step flagged]
results
ra1b1d1c1e1
ra1b1d1c2e1
ra1b1d1c1e2
only one component to evaluate
no intermediate results
Evaluation Algorithms
Problems:
Many queries / components to evaluate
Path overlaps
Intermediate results
Algorithms:
Produce all path queries…
Decompose into paths…
PartialMJ (spanning tree)
PartialPathStack
PartialPathStack vs PathStack
PathStack
• Path queries
• Indegree = 1
• Outdegree = 1
• O(input + output)
PartialPathStack
• Partial path queries
• Indegree > 1
• Outdegree > 1
• O(input*indegree + output*outdegree)
Queries Used in the Experiments
[Figure: the four query graphs used in the experiments, over nodes r, a, b, c, d, e, f, labelled Q1/Q5, Q2/Q6, Q3/Q7 and Q4/Q8]
Experiment 1
Execution time on Treebank… 2.5 million nodes
[Figure: execution time plot on Treebank; annotations: path queries, too many results]
Execution time on Synthetic data… 2.5 million nodes (IBM AlphaWorks XML generator)
Experiment 2
Execution time varying the size of the XML tree… (1 - 3 million nodes)
[Figure: execution time of PartialMJ and PartialPathStack; panels Q2, Q3 and Q7]
Conclusion
                          Evaluation   Containment   Heuristics for Containment
Partial Path Queries      CIKM ’07     SSDBM ’06     CIKM ’06
Queries with repetitions  ?            SSDBM ’06     CIKM ’06
Partial Tree Queries      ?            SSDBM ’06     CIKM ’06
Questions?