Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1...
-
Upload
loreen-carpenter -
Category
Documents
-
view
221 -
download
2
Transcript of Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1...
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo1Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo
Graph Query Reformulation with DiversityDavide Mottin, University of TrentoFrancesco Bonchi, Yahoo Labs - Francesco Gullo, Yahoo Labs
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo2
Issues with Pattern SearchO
OH OSQuery
510 matches
• Too many matches• Results are not
grouped
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo3
Solution: Discovering Specializations
O
OH OS 510 matches
O
OH
OS
CH3
O
OH OS
SH
O
OH OS
H3C
O
OH OS
CH3
O
OH
OS
CH3HS
448 matches 46 matches 114 matches
382 matches 46 matches
Specializations
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo4
Applications
• Finding groups of molecules having a particular reagent
• Analyze a set of proteins to find diseases• Workflow optimization• Anomaly detection in a network• 3D shape search with similar properties
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo5
Dealing with Specializations in Web and Relational Data
• Faceted Search• present aspects of the results [Roy08]
• Query reformulation• Modify some of the query conditions
• In structured databases [Mishra09]• In web search [Dang10]
First Study of Problem on GRAPHS
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo6
Graph Query Reformulation
Results
Query
Specializations:query supergraphs
…
Exponential numberof specializations
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo7
Challenges
• The number of reformulation is exponential• Quantify the interestingness of a reformulation• Finding query specializations is NP-complete
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo8
A Naïve Approach: k-most frequent super-patterns
Query
480 matches
450 matches
100 matches
SuperPatterns
30 matches420 matches
Until k patterns are found:- Retrieve the most frequent super-
patternFrequent ≠ Interesting
!
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo9
Our Approach
Graph Query Reformulation with Diversity• Finds k meaningful specializations efficiently
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo10
Finding Meaningful Specializations
Results
Query
Coverage Diversity
Find k meaningful specializations:1. Span all the results
2. Present different aspects of the results
?
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo11
Diversity Matters
Results
Query
Objective function f(Q)
λ = 1• Non optimal: f({Q1’,Q2’}) = 7
• Optimal: f({Q3’,Q4’}) = 8
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo12
Problem
Graph Query Reformulation with Diversity
12
Theorem (NP-hardness)The problem reduces to MAX-SUM Diversification Problem, so it is NP-hard
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo13
Solution: Greedy Algorithm
13 Greedy
While k-specializations are not found
1. Find the specialization leading to the maximum increment of the objective function (marginal gain)
2. Add the specialization to the results
TheoremThe algorithm is a ½-approximation
Finding the maximum gain is #P-complete
[Valiant79]
Solution
Fast_MMPG: Branch and bound algorithm to efficiently find the specialization with the maximum marginal gain
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo14
The multiplicity vector
Results
0 0 0 0 01 1 0 0 02 2 1 1 02 2 2 2 02 3 3 3 1
Output set of specializations
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo15
Upper bound on the Marginal gain
LemmaThe marginal gain increases if the multiplicity of the considered item is where |Q| is the number of reformulations in the reformulated set constructed so far.
Upper bound : is the value of the objective function considering only results with multiplicity
Theorem
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo16
Upper bound
Results
0 0 0 0 01 2 1 1 1
Output set of specializations
1 2 1 1 1
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo17
Until the reformulation with the maximum upper bound and marginal gain is not found1. Expand the reformulation with the max upper
bound2. Prune Reformulations with marginal gain
smaller than the upper bound so far
The Fast_MMPG Algorithm
upper bound
marginal gain
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo18
Experimental Setup
18
• Datasets: • AIDS: 10k chemical compounds
• Financial: 17k transaction workflows
• Web: 13k interactions with a recommender system
• Baseline algorithms: • k-freq: returns top-k frequent supergraphs of a query
• LIndex: informative patterns index
• Experiments: • Time and objective function value varying k, query size, λ
• Anecdotal
• Scalability
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo19
Time Comparison
Number of specializations1. k-freq runs only slighly faster2. Time increases linearly in k3. Fast_MMPG has real-time
performance
Query size1. Fast_MMPG comparable to k-
freq2. Time decreases with query
size (less reformulations)
number of reformulations (k)
query size
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo20
Objective function gain
20
Analysis1. Lambda correctly moves the objective function towards
diversity2. k-freq only captures coverage
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo21
Qualitative evaluation
k-freq
Fast_MMPG
C O
O OH
C
O CH3
C
O Fe
C
O NH2
C
O
CH3
C
CH3
O CH3
C
O CH3
C C
O CH2
C C
O NH2
C
O CH2
C NH
Query
Analysis• k-freq finds specialization of the same superquery
• Fast_MMPG returns reformulations with more diversified structures
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo22
Conclusions
• First study of the problem of query reformulation in graph databases
• Principled objective function optimizing coverage and diversity
• Algorithmic solutions with quality guarantees and real time responses
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo23
Questions?
Thank you!