1
Ranking Inexact Answers
2
Ranking Issues
• When inexact querying is allowed, there may be MANY answers
– Different answers have different levels of incompleteness
• Ranking the answers allows the user to quickly see the (hopefully) most relevant answers
• Preference: Create answers in ranking order
– Why is this important?
• We will consider several different approaches to this problem
3
Tree Pattern Relaxation
Amer-Yahia, Cho, Srivastava
EDBT 2002
4
Tree Patterns
• Queries are tree patterns, as considered in previous lessons
[Figure: tree pattern with root Book and children Collection and Editor; Name and Address hang below Editor. A double line indicates a descendant edge.]
5
Relaxed Queries
• Four types of “relaxations” are allowed on the trees
• Node Generalization: Assume that we know a relationship of types/super-types among labels. Allow a label to be changed to its super-type
[Figure: the original pattern (Book; Collection, Editor; Name, Address) next to the relaxed pattern, in which the root label Book is changed to Document.]
6
Relaxed Queries
• Leaf Node Deletion: Delete a leaf node (and its incoming edge) from the tree
[Figure: the original pattern (Book; Collection, Editor; Name, Address) next to the relaxed pattern, in which the leaf Collection has been deleted.]
7
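Relaxed Queries
• Edge Generalization: Change a parent-child edge to an ancestor-descendant edge
[Figure: the original pattern (Book; Collection, Editor; Name, Address) next to a relaxed pattern in which a parent-child edge (apparently Book-Collection) is replaced by an ancestor-descendant edge.]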
8
Relaxed Queries
• Subtree Promotion: A query subtree can be promoted so that it is directly connected to its former grandparent by an ancestor-descendant edge
[Figure: the original pattern (Book; Collection, Editor; Name, Address) next to a relaxed pattern in which Name is promoted to hang directly below Book.]
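The four relaxations can be sketched as small functions over a tree pattern. This is only an illustration, not the representation used by Amer-Yahia, Cho and Srivastava: the `PNode` class, the `SUPERTYPE` table and all function names are assumptions made for this example, and the Book-to-Document super-type is taken from the slide on node generalization.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class PNode:
    label: str
    parent: Optional[str]  # name of the parent pattern node, None for the root
    edge: Optional[str]    # "c" = parent-child, "d" = ancestor-descendant


# Hypothetical type hierarchy: label -> super-type.
SUPERTYPE = {"Book": "Document"}

# The running example pattern Q, keyed by node name.
Q: Dict[str, PNode] = {
    "Book": PNode("Book", None, None),
    "Collection": PNode("Collection", "Book", "c"),
    "Editor": PNode("Editor", "Book", "c"),
    "Name": PNode("Name", "Editor", "c"),
    "Address": PNode("Address", "Editor", "d"),
}


def generalize_node(p, name):
    """Node generalization: replace the label by its super-type."""
    out = dict(p)
    out[name] = replace(p[name], label=SUPERTYPE[p[name].label])
    return out


def delete_leaf(p, name):
    """Leaf node deletion: drop a leaf together with its incoming edge."""
    if any(n.parent == name for n in p.values()):
        raise ValueError(f"{name} is not a leaf")
    return {k: v for k, v in p.items() if k != name}


def generalize_edge(p, name):
    """Edge generalization: turn the edge into `name` into a descendant edge."""
    out = dict(p)
    out[name] = replace(p[name], edge="d")
    return out


def promote_subtree(p, name):
    """Subtree promotion: attach `name` to its former grandparent by a descendant edge."""
    parent = p[name].parent
    if parent is None or p[parent].parent is None:
        raise ValueError(f"{name} has no grandparent")
    out = dict(p)
    out[name] = replace(p[name], parent=p[parent].parent, edge="d")
    return out


# Relaxations compose: promote Name, then delete the leaf Collection.
relaxed = delete_leaf(promote_subtree(Q, "Name"), "Collection")
```

Each function returns a new pattern and leaves its input untouched, so a composed relaxation is just nested calls.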
9
Composing Relaxations
• Relaxations can be composed. Are the following relaxations of Q?
[Figure: Q (Book; Collection, Editor; Name, Address) and three candidate patterns: Book with Collection; Book with Collection, Address and Name; Document with Address.]
10
Approximate Answers and Ranking
• An approximate answer to Q is an exact answer to a relaxed query derived from Q
• In order to give different answers different rankings, tree patterns are weighted
• Each node and edge has 2 weights: the value when exactly satisfied and the value when satisfied by a relaxation
[Figure: the pattern Q (Book; Collection, Editor; Name, Address) with a weight pair on each node and edge: (7, 1), (4, 3), (2, 1), (6, 0), (5, 0), (8, 5), (6, 0), (4, 0), (3, 0).]
• A fragment of a document that exactly satisfies the query will have a score of 45
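The score of 45 is the sum of the nine exact weights. A minimal sketch of the scoring rule follows. It lists the weight pairs in the order they appear on the slide, without tying them to specific nodes or edges, and it assumes that a component that is not matched at all contributes 0; the function name and the status strings are made up for the example.

```python
# (exact weight, relaxed weight) for the 5 nodes and 4 edges of Q,
# in the order in which they appear on the slide.
WEIGHTS = [(7, 1), (4, 3), (2, 1), (6, 0), (5, 0), (8, 5), (6, 0), (4, 0), (3, 0)]


def score(statuses):
    """statuses[i] is "exact", "relaxed" or "missing" for component i."""
    total = 0
    for (exact_w, relaxed_w), status in zip(WEIGHTS, statuses):
        if status == "exact":
            total += exact_w
        elif status == "relaxed":
            total += relaxed_w
        # "missing" adds nothing (assumption of this sketch)
    return total


exact_score = score(["exact"] * 9)      # every node and edge matched exactly
relaxed_score = score(["relaxed"] * 9)  # every node and edge matched by relaxation
```

With these weights an exact match scores 45, while an answer in which every component is satisfied only by relaxation scores 10.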
11
Example Ranking
[Figure: the weighted pattern Q, with weight pairs (7, 1), (4, 3), (2, 1), (6, 0), (5, 0), (8, 5), (6, 0), (4, 0), (3, 0), next to a document fragment with nodes Book, Person, Details, Name (Sam) and Address (NY).]
• How much would this answer score?
13
Problem Definition
Given an XML document D, a weighted tree pattern Q and a threshold t, find all approximate answers of Q in D whose scores are ≥ t
• Naive strategy to solve the problem:
– Find all relaxations of Q
– For each relaxation, compute all exact answers
– Remove answers with a score below t
• Is this a good strategy?
14
Problem Definition
Given an XML document D, a weighted tree pattern Q and a threshold t, find all approximate answers of Q in D whose scores are ≥ t
• A better strategy to compute an answer to a relaxation of a query:
– Intuition: Compute the query as a series of joins
– Can use stack-merge algorithms (studied before) for computing joins
– Filter out intermediate results whose scores are too low
15
The Query Plan
• We now show how to derive a plan for evaluating queries in this setting
• First, we show how an exact plan is derived
• Then, we consider how each individual relaxation can be added in
• Finally, we show the complete relaxed plan
16
Query Plan: Exact Answers
[Figure: the weighted pattern Q (Book; Collection, Editor; Name, Address) next to its exact evaluation plan, a series of joins over the inputs Book, Collection, Editor, Name and Address.]
Join conditions in the plan:
– c(Book, Collection)
– c(Book, Editor)
– c(Editor, Name)
– d(Editor, Address)
c(x,y) = y is a child of x
d(x,y) = y is a descendant of x
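The two predicates can be tested cheaply if each element carries a region label. The sketch below assumes a (start, end, level) encoding, which is a common choice for structural joins but is not stated on the slides; the `Elem` type and the sample document positions are hypothetical.

```python
from typing import NamedTuple


class Elem(NamedTuple):
    label: str
    start: int  # position where the element opens
    end: int    # position where the element closes
    level: int  # depth in the document tree


def d(x, y):
    """d(x, y): y is a descendant of x (y's region lies strictly inside x's)."""
    return x.start < y.start and y.end < x.end


def c(x, y):
    """c(x, y): y is a child of x (a descendant exactly one level deeper)."""
    return d(x, y) and y.level == x.level + 1


# Hypothetical fragment: Book > Editor > {Name, Details > Address}
book = Elem("Book", 1, 10, 1)
editor = Elem("Editor", 2, 9, 2)
name = Elem("Name", 3, 4, 3)
details = Elem("Details", 5, 8, 3)
address = Elem("Address", 6, 7, 4)
```

Here `c(book, editor)` and `c(editor, name)` hold, while Address satisfies `d(editor, address)` but not `c(editor, address)` because it sits two levels below Editor.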
17
Query Plan: Exact Answers
[Figure: the same weighted pattern Q and exact plan as on the previous slide, with join conditions c(Book, Collection), c(Book, Editor), c(Editor, Name) and d(Editor, Address).]
• Remember, to compute a join, e.g., of Book and Collection, we actually find the list of Books and the list of Collections (from the index) and perform the stack-merge algorithm
18
Adding Relaxations into Plan
• Node generalization: Book relaxed to Document
[Figure: the exact plan with the Book input replaced by Document.]
– c(Book, Collection) becomes c(Document, Collection)
– c(Book, Editor) becomes c(Document, Editor)
– c(Editor, Name) and d(Editor, Address) are unchanged
19
Adding Relaxations into Plan
• Edge generalization: Relax the Editor-Name edge
[Figure: the exact plan over Book, Collection, Editor, Name and Address, with the Editor-Name join condition relaxed.]
– c(Book, Collection), c(Book, Editor) and d(Editor, Address) are unchanged
– c(Editor, Name) becomes: c(Editor, Name) or (not exists c(Editor, Name) and d(Editor, Name))
– Written in short as: c(Editor, Name) or d(Editor, Name)
• We only allow relaxations when a direct child does not exist
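The condition "c(Editor, Name) or (not exists c(Editor, Name) and d(Editor, Name))" can be sketched as a function that reports whether a pair matches exactly, matches only through the relaxation, or does not match. The parent-pointer document `DOC` and all function names are assumptions made for this illustration.

```python
# Hypothetical document: node id -> (label, parent id)
DOC = {
    1: ("Book", None),
    2: ("Editor", 1),
    3: ("Details", 2),
    4: ("Name", 3),     # Name below Editor 2, but not a direct child
    5: ("Editor", 1),
    6: ("Name", 5),     # direct Name child of Editor 5
    7: ("Details", 5),
    8: ("Name", 7),     # deeper Name below Editor 5
}


def is_child(doc, x, y):
    return doc[y][1] == x


def is_descendant(doc, x, y):
    parent = doc[y][1]
    while parent is not None:
        if parent == x:
            return True
        parent = doc[parent][1]
    return False


def has_child_with_label(doc, x, label):
    return any(lab == label and par == x for lab, par in doc.values())


def match_edge(doc, x, y):
    """Return "exact", "relaxed" or None for the relaxed edge between x and y."""
    if is_child(doc, x, y):
        return "exact"
    # The relaxation applies only when x has no direct child with y's label.
    if not has_child_with_label(doc, x, doc[y][0]) and is_descendant(doc, x, y):
        return "relaxed"
    return None
```

Editor 2 has no direct Name child, so `match_edge(DOC, 2, 4)` is a relaxed match. Editor 5 has one, so the deeper Name 8 is rejected rather than matched through the relaxation.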
20
Adding Relaxations into Plan
• Subtree Promotion: Promote the tree rooted at Name
[Figure: the exact plan over Book, Collection, Editor, Name and Address, with the Editor-Name join condition relaxed.]
– c(Book, Collection), c(Book, Editor) and d(Editor, Address) are unchanged
– c(Editor, Name) becomes: c(Editor, Name) or (not exists c(Editor, Name) and d(Book, Name))
– Written in short as: c(Editor, Name) or d(Book, Name)
21
Adding Relaxations into Plan
• Leaf Node Deletion: Make Address optional
[Figure: the exact plan with join conditions c(Book, Collection), c(Book, Editor), c(Editor, Name) and d(Editor, Address); the join that brings in Address is marked as an outer join.]
• Outer join operator: join when possible, but do not discard values that cannot be joined
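The effect of the outer join can be shown with a plain nested-loop version. It is not a stack-merge join and is meant only to illustrate the semantics; the function name and the sample data are hypothetical.

```python
def outer_join(left, right, pred):
    """Left outer join: keep every left value, paired with None if nothing joins."""
    out = []
    for l in left:
        matches = [r for r in right if pred(l, r)]
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))  # Address is optional: the Editor survives
    return out


# Hypothetical inputs: editor ids, and (editor id, address) pairs.
editors = ["e1", "e2"]
addresses = [("e1", "NY")]

joined = outer_join(editors, addresses, lambda e, a: a[0] == e)
# joined == [("e1", ("e1", "NY")), ("e2", None)]
```

An ordinary join would drop `e2`; the outer join keeps it, which is what lets answers without an Address remain in the result.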
22
Combining All Possible Relaxations
• All approximate answers can be derived from the following query plan
[Figure: the weighted pattern Q next to the fully relaxed plan over the inputs Document, Collection, Editor, Name and Address.]
Join conditions in the relaxed plan:
– c(Book, Collection) OR d(Document, Collection)
– c(Document, Editor) OR d(Document, Editor)
– c(Editor, Name) OR d(Editor, Name) OR d(Document, Name)
– d(Editor, Address) OR d(Document, Address)
23
Creating “Best Answers”
• Want to find answers whose ranking is over
the threshold t
• Naive solution: Create all answers. Delete
answers with low ranking
• Algorithm Thres: Goal of the algorithm is to
prune intermediate answers that cannot
possibly meet the specified threshold
24
Associating Nodes with Maximal Weight
• The maximal weight of a node in the evaluation plan is the largest value by which the score of an intermediate answer computed for that node can grow
[Figure: the fully relaxed plan over Document, Collection, Editor, Name and Address, with join conditions c(Book, Collection) OR d(Document, Collection); c(Document, Editor) OR d(Document, Editor); c(Editor, Name) OR d(Editor, Name) OR d(Document, Name); d(Editor, Address) OR d(Document, Address).]
25
[Figure: the weighted pattern Q next to the fully relaxed plan, with each plan node annotated with its maximal weight. Annotations as they appear on the slide: (38), (39), (30), (40), (39), (41), (21), (7), (0).]
26
Algorithm Thres
• The relaxed query evaluation plan is computed bottom-up
– Note that the joins are computed for all matching intermediate results at the same time
• At each step, intermediate results are computed, along with their scores
• If the sum of an intermediate result's score and the maximal weight of the current node is less than the threshold, prune the intermediate result
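The pruning test itself is a single comparison per intermediate result. The following is a minimal sketch of that step only, not the full algorithm; the function name and the sample scores and maximal weight are illustrative values, not a worked solution of the example on the slides.

```python
def prune(intermediate, max_weight, threshold):
    """Keep the (answer, score) pairs that can still reach the threshold.

    max_weight is the largest amount by which a score can still grow
    above the current plan node.
    """
    return [(a, s) for a, s in intermediate if s + max_weight >= threshold]


# Illustrative values: threshold 35, scores can still grow by at most 21.
results = [("r1", 16), ("r2", 7)]
kept = prune(results, max_weight=21, threshold=35)
# r1: 16 + 21 = 37 >= 35, kept;  r2: 7 + 21 = 28 < 35, pruned
```

Because the maximal weight is an upper bound on future growth, a pruned result could never have reached the threshold, so no qualifying answer is lost.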
27
Example: Threshold = 35
[Figure: a document fragment with nodes Book, Editor, Details, Name (Sam) and Address (NY); the weighted pattern Q; and the fully relaxed plan annotated with the maximal weights (38), (39), (30), (40), (39), (41), (21), (7), (0). The values 7, 7, 16 and 27 are shown along the plan.]
• When will the answer be pruned?
28
Test Yourself
29
Example Ranking
[Figure: the weighted pattern Q, with weight pairs (7, 1), (4, 3), (2, 1), (6, 0), (5, 0), (8, 5), (6, 0), (4, 0), (3, 0), next to a document fragment with nodes Document, Collection, Name (Sam) and Address (NY).]
• How much would this answer score?
30
Query Plan
[Figure: a weighted pattern with root Book and children Collection and Editor; Name below Editor; FName and LName below Name. Weight pairs as they appear on the slide: (8, 5), (7, 1), (4, 3), (2, 1), (5, 0), (6, 0), (6, 0), (2, 1), (2, 1), (2, 0), (1, 0).]
1. What will the exact plan look like?
2. What will the plan look like if all possible relaxations are added?
3. What is the maximal weight by which the score of an intermediate answer can grow, for each node?