Suggestion of Promising Result Types for XML Keyword Search
Joint work with Jianxin Li, Chengfei Liu and Rui Zhou (Swinburne University of Technology, Australia)
Wei Wang, University of New South Wales, Australia
Outline
• Motivation
• Scoring Result Types
• Query Processing Algorithms
• Experimental Study
• Conclusions
Motivation
Keyword queries are easy for casual users
• No need to know a query language or the schema of the data
Keyword queries are inherently imprecise. How do we find relevant results?
• Browse all relevant results: impossible or unusable!
• Restrict the results
  1. XSEarch, SLCA/ELCA, and their variants
  2. Return result instances from the most likely query result type (XReal and our work)
"If it has a syntax, it isn't user friendly" -- /usr/game/fortune
Query Result Types
Query = {1980, art}
Result types:
• /root/students/student
• /root/books/book
A result type is a label path such that at least one of its corresponding instances contains all the search keywords
Intuition: users want to fetch instances of a certain entity type with keyword predicates
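The definition of a result type can be illustrated with a small Python sketch. The toy document, the helper names `tokens` and `result_types`, and the whitespace tokenization are all assumptions made for illustration; only the label paths come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document; element names follow the slide's label paths.
DOC = """
<root>
  <students>
    <student><name>Ann</name><born>1980</born><interest>art</interest></student>
    <student><name>Bob</name><born>1975</born><interest>music</interest></student>
  </students>
  <books>
    <book><title>art in 1980</title><year>1999</year></book>
    <book><title>databases</title><year>1980</year></book>
  </books>
</root>
"""

def tokens(elem):
    """All whitespace-separated, lower-cased tokens in the text of elem's subtree."""
    return set(" ".join(elem.itertext()).lower().split())

def result_types(root, keywords):
    """Label paths with at least one instance whose subtree contains every keyword."""
    wanted = {k.lower() for k in keywords}
    found = set()

    def visit(elem, path):
        path = f"{path}/{elem.tag}"
        if wanted <= tokens(elem):
            found.add(path)
        for child in elem:
            visit(child, path)

    visit(root, "")
    return found

types = result_types(ET.fromstring(DOC), ["1980", "art"])
print(sorted(types))
```

Read literally, the definition also admits ancestor paths such as /root, and the sketch makes no attempt to filter them out. Choosing among the candidates is the job of the scoring described on the following slides.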
Ranking
Query = {1980, art}
Result types:
• /root/students/student
• /root/books/book
Score each result type and select the most promising one (i.e., the one with the highest score)
Subtleties: 1 return type → n query templates → n*m result instances
Scoring Individual Results /1
score(result type) = aggregate(score(instance1), score(instance2), …)
Need to score each individual result instance R
1. Not all matches are equal in terms of content
   • Inverse element frequency: ief(x) = N / (# nodes containing the token x)
   • E.g., weight(ni contains a) = log(ief(a))
[Figure: example result with keyword matches a, b, c]
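The content weight can be sketched in a few lines of Python. This is a minimal sketch under two assumptions that the slide does not state: N is taken to be the total number of nodes, and the logarithm is the natural one. The node list and the function names are hypothetical.

```python
import math

def ief(token, node_tokens):
    """Inverse element frequency: N / number of nodes containing the token.

    Assumption: N is the total number of nodes in node_tokens.
    """
    n_total = len(node_tokens)
    n_containing = sum(1 for toks in node_tokens if token in toks)
    if n_containing == 0:
        raise ValueError(f"token {token!r} does not occur in any node")
    return n_total / n_containing

def weight(token, node_tokens):
    """Weight of a match on token; natural log is an assumption."""
    return math.log(ief(token, node_tokens))

# Hypothetical collection: one token set per node.
nodes = [
    {"art", "1980"}, {"art"}, {"music"}, {"1980"},
    {"art", "history"}, {"databases"}, {"xml"}, {"search"},
]

for tok in ("art", "1980", "xml"):
    print(tok, ief(tok, nodes), weight(tok, nodes))
```

Rarer tokens receive larger weights, so a match on a token that occurs in a single node counts for more than a match on one that occurs in many.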
Scoring Individual Results /2
score(result type) = aggregate(score(instance1), score(instance2), …)
Need to score each individual result instance R
2. Not all matches are equal in terms of structure
   • Distance between the match and the root of the subtree
   • Also considers the average depth of the XML tree to attenuate the impact of long paths
[Figure: example subtree with matches a, b, c; dist = 3]
Scoring Individual Results /3
score(result type) = aggregate(score(instance1), score(instance2), …)
Need to score each individual result instance R
3. Favor tightly-coupled results
   • When calculating dist(), discount the shared path segments
[Figure: a loosely coupled result vs. a tightly coupled result]
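The slides do not spell out how shared path segments are discounted. One possible reading, shown here purely as an assumption, is to count every edge between the result root and the matches only once. Matches are given as hypothetical Dewey-style paths relative to the result root.

```python
def naive_distance(paths):
    """Sum of root-to-match distances, counting shared edges repeatedly."""
    return sum(len(p) for p in paths)

def discounted_distance(paths):
    """Assumed discount: each distinct edge below the root is counted once.

    An edge is identified by the path prefix that ends at its lower node.
    """
    edges = {p[:i] for p in paths for i in range(1, len(p) + 1)}
    return len(edges)

# Hypothetical matches, as child-index paths below the result root.
loose = [(0, 0, 0), (1, 0, 0)]  # matches diverge directly below the root
tight = [(0, 0, 0), (0, 0, 1)]  # matches share two of their three segments

print(naive_distance(loose), discounted_distance(loose))
print(naive_distance(tight), discounted_distance(tight))
```

Both results have the same undiscounted distance, but the tightly coupled one ends up with the smaller discounted distance, which is the behaviour the slide asks for.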
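Scoring Individual Results
score(result type) = aggregate(score(instance1), score(instance2), …)
Need to score each individual result instance R
The final formula:
$$
\mathrm{Score}(R,Q) =
\begin{cases}
\sum_{i=1}^{n} \mathrm{weight}(k_i), & \text{if } \mathrm{dist}'(N,n_i) = 0;\\[4pt]
\dfrac{\sum_{i=1}^{n} \mathrm{weight}(k_i)}{\left(\sum_{i=1}^{n} \mathrm{dist}'(N,n_i) - \mu_1\right)^{\mu_2}}, & \text{otherwise.}
\end{cases}
$$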
Scoring Return Types
score(result type) = aggregate(score(instance1), score(instance2), …)
aggregate: sum up the top-k instance scores
• k = average number of instances over all query result types
• Pad with 0.0 if necessary
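The aggregation step can be sketched as follows. The instance scores are made-up numbers, the function name is hypothetical, and rounding the average to an integer k is an assumption, since the slide does not say how a fractional average is handled.

```python
def aggregate(instance_scores_by_type):
    """Sum the top-k instance scores of each result type.

    k is the average number of instances per result type (rounded, by assumption).
    Types with fewer than k instances are padded with 0.0.
    """
    counts = [len(scores) for scores in instance_scores_by_type.values()]
    k = max(1, round(sum(counts) / len(counts)))
    totals = {}
    for rtype, scores in instance_scores_by_type.items():
        top = sorted(scores, reverse=True)[:k]
        top += [0.0] * (k - len(top))  # explicit padding, as on the slide
        totals[rtype] = sum(top)
    return k, totals

# Hypothetical instance scores for the two result types of the running example.
scores = {
    "/root/students/student": [0.9, 0.4],
    "/root/books/book": [0.8, 0.7, 0.6, 0.1],
}
k, totals = aggregate(scores)
best = max(totals, key=totals.get)
print(k, totals, best)
```

Fixing k across result types keeps a type with many weak instances from outscoring a type with a few strong ones simply by volume.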
Query Processing Algorithms
INL (Inverted Node List-based Algorithm)
• Merge all relevant nodes and group the merged results by result type
  • Uses an inverted index + Dewey encoding
• Calculate the score of each result type with the ranking function
  • Only needs to keep the top-k best scores for each result type
• Slow because there is no pruning or skipping
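The role of Dewey encoding can be shown with a short sketch: the lowest common ancestor of several nodes is the longest common prefix of their Dewey labels. The code below is not the INL algorithm itself. It enumerates all combinations by brute force, and the index contents, the label-path table, and the function names are hypothetical.

```python
from itertools import product

def lca(labels):
    """Longest common prefix of Dewey labels, i.e. the label of their LCA."""
    prefix = []
    for parts in zip(*labels):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return tuple(prefix)

# Hypothetical inverted index: keyword -> Dewey labels of the nodes containing it.
INDEX = {
    "1980": [(1, 1, 1, 2), (1, 2, 2, 2)],
    "art": [(1, 1, 1, 3), (1, 2, 1, 1)],
}

# Hypothetical mapping from Dewey labels to label paths.
LABEL_PATH = {
    (1,): "/root",
    (1, 1, 1): "/root/students/student",
    (1, 2): "/root/books",
}

def group_by_result_type(keywords):
    """Group the LCA of every keyword-node combination by its label path."""
    groups = {}
    for combo in product(*(INDEX[k] for k in keywords)):
        root = lca(combo)
        groups.setdefault(LABEL_PATH[root], []).append(root)
    return groups

groups = group_by_result_type(["1980", "art"])
print(groups)
```

Because the prefix test needs only the labels, no access to the document tree is required once the inverted lists have been read.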
SDI Algorithm
How to be more efficient?
• Approximately compute the score of each return type
• Prune some of the less likely return types
SDI (Statistic Distribution Information-based Algorithm)
• Based on several additional indexes: keyword-path index, enhanced F&B index (with distributional info)
• Generate query templates by merging distinct paths
• Estimate the score of each query template
• Aggregate the scores for each result type
Keyword-Path Index & Query Templates
Maps each keyword to the set of label paths that characterize all its occurrences
• 1980 → { root/students/student/born, root/books/book/year, root/books/book/title }
• art → { root/students/student/interest, root/books/book/title }
Merge the label paths to obtain query templates
• root/students/student[born ~ 1980][interest ~ art]
• root/books/book/title
• root/books/book[year ~ 1980][title ~ art]
Iteratively ascend to the parent label path if a query template has no estimated result
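The merging step can be sketched by taking one label path per keyword and using their longest common prefix as the template's base path. Two choices below are assumptions rather than something the slide states: combinations that meet only at the document root are skipped, and a keyword that matches the base path itself adds no predicate. The function names are hypothetical.

```python
from itertools import product

# Keyword-path index from the slide.
KEYWORD_PATHS = {
    "1980": ["root/students/student/born", "root/books/book/year", "root/books/book/title"],
    "art": ["root/students/student/interest", "root/books/book/title"],
}

def common_prefix(paths):
    """Longest common sequence of leading labels shared by all paths."""
    prefix = []
    for parts in zip(*(p.split("/") for p in paths)):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return prefix

def query_templates(index, keywords):
    """Merge one label path per keyword into a template with keyword predicates."""
    templates = set()
    for combo in product(*(index[k] for k in keywords)):
        prefix = common_prefix(combo)
        if len(prefix) <= 1:  # assumption: ignore merges that meet only at the root
            continue
        template = "/".join(prefix)
        for keyword, path in zip(keywords, combo):
            rest = path.split("/")[len(prefix):]
            if rest:  # assumption: no predicate when the keyword sits on the base path
                template += f"[{'/'.join(rest)} ~ {keyword}]"
        templates.add(template)
    return templates

templates = query_templates(KEYWORD_PATHS, ["1980", "art"])
print(sorted(templates))
```

Under these assumptions the sketch yields the three templates listed on the slide. The ascend-to-parent fallback for templates without estimated results is not modelled.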
Enhanced F&B Index
More refined XSketch synopsis
Estimates the size of certain simple queries, e.g.,
• size(root/books/book[title])
• size(root/books/book[name])
Hardly handles correlation, e.g.,
• size(root/books/book[title][name])
Structural Distribution
size(root/books/book[title][name]) = 0
Value Distribution
Estimation
• /root/books/book: 6
• /root/books/book/year: 1
• /root/books/book/title: 1
• h_book[year][title] = 4/6
• f_1980|year = 3/5
• f_art|title = 4/5
Final estimation = 1.92
[Figure: label-path tree root/books/book with children year and title, annotated with the keywords 1980 and art]
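The slide's numbers are consistent with multiplying the instance count of the result type by the structural fraction and by each value fraction: 6 × 4/6 × 3/5 × 4/5 = 1.92. The sketch below reproduces that arithmetic. Treating the estimate as a plain product, which amounts to assuming the two value predicates are independent, is an inference from the numbers, and the function and parameter names are hypothetical.

```python
def estimate(type_count, structural_fraction, value_fractions):
    """Estimated number of instances matching a query template.

    Assumption: the estimate is the product of the type's instance count,
    the fraction of instances having the required structure, and the
    fraction of each required child that contains its keyword.
    """
    result = type_count * structural_fraction
    for fraction in value_fractions:
        result *= fraction
    return result

# Numbers from the slide for root/books/book[year ~ 1980][title ~ art].
book_count = 6                # instances of /root/books/book
has_year_and_title = 4 / 6    # h_book[year][title]
year_is_1980 = 3 / 5          # f_1980|year
title_has_art = 4 / 5         # f_art|title

estimated = estimate(book_count, has_year_and_title, [year_is_1980, title_has_art])
print(round(estimated, 2))
```

The estimate is a fractional expected count rather than an actual number of instances, which is sufficient for ranking and pruning result types.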
Recap
SDI
• Retrieve the relevant label paths via the keyword-path index
• Generate query templates by merging distinct paths
• Estimate the score of each query template
• Aggregate the scores for each result type
Experiment Setup /1
Three real datasets used:
• NASA: astronomical data
• UWM: course data derived from university websites
• DBLP: computer science journals and proceedings
Experiment Setup /2
18 keyword queries:
Quality of Suggestions
• NASA: XReal focuses on only one node at a higher level, while INL and SDI can reach the more detailed nodes.
• UWM: For Q12 and Q32, INL and SDI predict more meaningful results than XReal does. For Q42, XReal does “better”.
• DBLP: All methods produce the same results because the structure of DBLP is so flat.
Efficiency
• SDI’s speedup over XReal: 3x to 10x
• Speedup is even more significant on the other two datasets
Conclusions
• Alleviates the inherent imprecision of keyword queries by scoring their result types
  • Can return instances from only the most promising one
  • Or take the score into consideration in the final ranking function
• Efficient estimation-based method to find the most promising return types
• Experimental results demonstrate both the effectiveness and the efficiency of the proposed approach
Q & A
Our Keyword Search Project Homepage: http://www.cse.unsw.edu.au/~weiw/project/SPARK.html
Related Work
[Liu & Chen, SIGMOD07]
• Classifies XML nodes into one of three node types
• However, it only identifies a specific return node type for each result.
XReal [Bao et al, ICDE09]
• Summarizes the statistical information between element nodes and all tokens in the leaf nodes
• An IR-style method is used to infer the result type based on the statistics.
• However, it does not model the correlation among the XML elements and values.