Wolf Siberski 1
Wolf Siberski
What do you mean? – Determining the Intent of Keyword Queries on Structured Data
Wolf Siberski 2
Overview
■ Motivation
■ Approaches in keyword search on structured data
■ QUICK – Query Intent Construction for Keywords
  ■ User interaction
  ■ Algorithm
  ■ Evaluation
■ Conclusion
Wolf Siberski 3
The Information Search Process
■ What is my search objective?
■ What exactly do I want to know?
■ How do I express my search request?
■ Which result satisfies my information need?
Sutcliffe/Ennis: Towards a cognitive theory of information retrieval
[Figure: search process model. Stages: Identify problem, Articulate needs, Query formulation, Execute query, Evaluate results. Other labels: Information problem, User goals, Need types, Concepts, Domain knowledge, Information system knowledge, Information System, Results, Unsatisfactory results, Successful search, Failed search.]
Wolf Siberski 4
IMDB Example – Keyword search
In which movies did they both act?
Have they been working together?
Keyword query: Brad Pitt Angelina Jolie
Keyword query: IMDb Brad Pitt Angelina Jolie
Wolf Siberski 5
IMDB Example – Database search
In which movies did they both act?
Brad Pitt Angelina Jolie
Are they working together, too?
SELECT M.Title, M.Year
FROM Movie M, Actor A1, Actor A2, ActsIn R1, ActsIn R2
WHERE A1.Name = 'Brad Pitt' AND A2.Name = 'Angelina Jolie'
  AND R1.ActorId = A1.Id AND R2.ActorId = A2.Id
  AND R1.MovieId = R2.MovieId AND M.Id = R1.MovieId
Schema:
Movie (PK id, title, year)
Actor (PK id, name)
MovieCharacter (name, FK1 actsIn, FK2 actedBy)
M.Title                       M.Year
101 Biggest Celebrity Oops    2004
Mr. & Mrs. Smith              2005
Stars on Trial                2005
The 72nd Academy Awards       2000
…
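To make the structured query concrete, here is a minimal sketch that runs it against an in-memory SQLite database. The table layout (Movie, Actor, ActsIn with ActorId and MovieId columns) is an assumption inferred from the column names used in the query, not taken from the schema diagram. The data is a tiny sample: one movie from the result list above plus one hypothetical placeholder row that should not match.

```python
import sqlite3

# Assumed schema: inferred from the column names used in the slide's query.
SCHEMA = """
CREATE TABLE Movie  (Id INTEGER PRIMARY KEY, Title TEXT, Year INTEGER);
CREATE TABLE Actor  (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ActsIn (ActorId INTEGER, MovieId INTEGER);
"""

# Same join as on the slide, with the two actor names as parameters.
QUERY = """
SELECT M.Title, M.Year
FROM Movie M, Actor A1, Actor A2, ActsIn R1, ActsIn R2
WHERE A1.Name = ? AND A2.Name = ?
  AND R1.ActorId = A1.Id AND R2.ActorId = A2.Id
  AND R1.MovieId = R2.MovieId AND M.Id = R1.MovieId
"""


def movies_with_both(conn, actor1, actor2):
    """Return (title, year) rows for movies in which both actors appear."""
    return conn.execute(QUERY, (actor1, actor2)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO Movie VALUES (?, ?, ?)", [
    (1, "Mr. & Mrs. Smith", 2005),
    (2, "Placeholder Solo Movie", 2001),  # hypothetical row, only one of the actors
])
conn.executemany("INSERT INTO Actor VALUES (?, ?)", [
    (1, "Brad Pitt"),
    (2, "Angelina Jolie"),
])
conn.executemany("INSERT INTO ActsIn VALUES (?, ?)", [(1, 1), (2, 1), (1, 2)])

print(movies_with_both(conn, "Brad Pitt", "Angelina Jolie"))
```

The self-join on ActsIn (R1 and R2) is what a plain keyword search cannot express: both actors have to be linked to the same movie.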
Wolf Siberski 6
Context
■ Trend: general information captured as structured data (DBpedia, LinkedData, etc.)
■ Limited support for complex information needs
  ■ Keywords: limited expressivity, but user-friendly
  ■ Structured queries: high expressivity, but difficult to master
New ways to access this data required
Wolf Siberski 7
IR on Structured Data (Incomplete)
■ Not a new idea (Universal Relation, 1984)
1. Relevance notion for structured data
  ■ Extract data subgraphs (tuple joins) matching the query
  ■ Rank results according to relevance score
  ■ BANKS, DISCOVER, SPARK, EASE, etc.
  ■ Can serve the "head" of the user distribution, but not the long tail
  ■ Low quality of relevance judgements [Coffman/Weaver, CIKM10]
2. Form builder
  ■ Enable visual construction of user-defined query forms
  ■ Requires exploration of database schema
Wolf Siberski 8
QUICK – Keyword Search on Databases
■ User starts with keyword search
■ QUICK guides user through query construction process
■ Combines
  ■ Ease of use of keyword search
  ■ Expressivity of database queries
G. Zenz, X. Zhou, E. Minack, W. Siberski, and W. Nejdl: From keywords to semantic queries – Incremental query construction on the semantic web. Journal of Web Semantics, Elsevier, 2009. http://dx.doi.org/10.1016/j.websem.2009.07.005
Wolf Siberski 9
QUICK Search Process
[Figure: QUICK search process, an interaction between the user and QUICK]
1. User enters keywords, e.g. Brad Pitt Angelina Jolie
2. QUICK computes possible query intentions and selection options, e.g. Is “Brad” part of a movie title? Is “Brad” part of an actor name? …
3. User selects the intended interpretation, e.g. “Brad” is part of an actor name; QUICK returns the refined interpretation with new selection options
4. User selects the intended query, e.g. Find movies where both Brad Pitt and Angelina Jolie are actors
5. QUICK computes the results for the query
6. User evaluates the results:
M.Title             M.Year
101 Biggest Ce…     2004
Mr. & Mrs. Smith    2005
Stars on Trial      2005
Wolf Siberski 10
QUICK – Concepts
■ RDF Schema
■ Query Template
  ■ Query pattern on the schema
  ■ Contains only free variables
■ Semantic Query
  ■ Interpretation of a keyword query
  ■ Produced from query template by binding keywords
[Figure: RDF schema with the classes Movie (title), MovieCharacter (name) and Actor (name), connected by actsIn (MovieCharacter to Movie) and actedBy (MovieCharacter to Actor). Example query templates over this schema, and example semantic queries for the keywords "brad pitt": Actor name "brad pitt"; Actor name "brad" with MovieCharacter name "pitt"; Actor name "pitt" with Movie title "brad".]
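The step from query template to semantic query can be sketched as follows. This is an illustrative model, not QUICK's data structures: the names QueryTemplate, SemanticQuery and bind_keywords are hypothetical, a template is reduced to its (class, attribute) slots, and the joins between classes are left out. The sketch enumerates every way to assign each keyword to one slot; it does not check whether the keyword actually occurs in that attribute.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class QueryTemplate:
    # Free variables of the template, e.g. (("Actor", "name"), ("Movie", "title")).
    slots: tuple


@dataclass(frozen=True)
class SemanticQuery:
    template: QueryTemplate
    # One tuple of keywords per slot; an empty tuple means the variable stays free.
    bindings: tuple


def bind_keywords(template, keywords):
    """Enumerate every assignment of each keyword to exactly one template slot."""
    queries = []
    for assignment in product(range(len(template.slots)), repeat=len(keywords)):
        per_slot = [[] for _ in template.slots]
        for keyword, slot_index in zip(keywords, assignment):
            per_slot[slot_index].append(keyword)
        queries.append(SemanticQuery(template, tuple(tuple(k) for k in per_slot)))
    return queries


template = QueryTemplate(slots=(("Actor", "name"), ("Movie", "title")))
queries = bind_keywords(template, ["brad", "pitt"])
for query in queries:
    print(dict(zip(template.slots, query.bindings)))
```

With two slots and two keywords this yields four interpretations, including Actor name "brad pitt" and Actor name "pitt" with Movie title "brad" from the figure.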
Wolf Siberski 11
Query Guide
■ Query Hierarchy
  ■ Semantic queries ordered by sub-query relationship
■ Query Guide
  ■ Graph including paths to all possible queries
[Figure: query guide for the keywords "brad pitt". Nodes are partial semantic queries such as Movie title "brad", Movie title "pitt", Actor name "brad", Actor name "pitt", MovieCharacter name "brad" and MovieCharacter name "pitt". Paths lead to complete queries such as Actor name "brad pitt", Actor name "brad" with MovieCharacter name "pitt" (via actedBy), and Actor name "pitt" with Movie title "brad" (via actsIn and actedBy).]
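The sub-query ordering can be illustrated with a small sketch. It assumes a simplified representation that is not taken from the slides: a semantic query is a frozenset of triples, either a keyword binding such as ("Actor", "name", "pitt") or a schema edge such as ("MovieCharacter", "actedBy", "Actor"), and one query is a sub-query of another if its triples are a subset. Variable identity (for example two different Actor variables) is ignored.

```python
def is_subquery(a, b):
    """True if every triple of query a also occurs in query b."""
    return a <= b


def hierarchy_edges(queries):
    """Return (parent, child) pairs where parent is a maximal proper sub-query of child."""
    edges = []
    for child in queries:
        parents = [q for q in queries if q < child]
        for parent in parents:
            # Keep only direct parents: skip those contained in another parent.
            if not any(parent < other for other in parents):
                edges.append((parent, child))
    return edges


actor_pitt = frozenset({("Actor", "name", "pitt")})
movie_brad = frozenset({("Movie", "title", "brad")})
joined = actor_pitt | movie_brad | frozenset({
    ("MovieCharacter", "actsIn", "Movie"),
    ("MovieCharacter", "actedBy", "Actor"),
})

edges = hierarchy_edges([actor_pitt, movie_brad, joined])
for parent, child in edges:
    print(sorted(parent), "->", sorted(child))
```

Both partial queries are parents of the joined query, which matches the idea that each user selection moves one step down the hierarchy towards a complete query.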
Wolf Siberski 15
Query Guide Construction – Offline Stage
■ Generate all Query Templates
  ■ Start with one-variable queries
  ■ Produce all possible combinations
  ■ Repeat until max. join path length reached
■ Build Inverted Index
  ■ Terms -> Attributes
  ■ Enables fast keyword-query mapping at runtime
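Both offline steps can be sketched in a few lines. The sketch rests on simplifying assumptions that are not from the slides: a template is a connected set of schema edges with at most one variable per class (so templates with two Actor variables are not produced), the maximum join path length is approximated by a limit on the number of edges, and the function names are hypothetical.

```python
from collections import defaultdict

# Schema edges as shown in the concept figure.
SCHEMA_EDGES = [
    ("MovieCharacter", "actsIn", "Movie"),
    ("MovieCharacter", "actedBy", "Actor"),
]


def generate_templates(classes, edges, max_edges):
    """Grow connected templates edge by edge, up to max_edges join edges."""
    # A template is a pair (frozenset of classes, frozenset of edges).
    frontier = {(frozenset({c}), frozenset()) for c in classes}
    templates = set(frontier)
    for _ in range(max_edges):
        next_frontier = set()
        for nodes, used in frontier:
            for edge in edges:
                source, _, target = edge
                if edge not in used and (source in nodes or target in nodes):
                    next_frontier.add((nodes | {source, target}, used | {edge}))
        frontier = next_frontier - templates
        templates |= frontier
    return templates


def build_inverted_index(rows):
    """Map each lower-cased term to the (class, attribute) pairs it occurs in."""
    index = defaultdict(set)
    for cls, attribute, value in rows:
        for term in value.lower().split():
            index[term].add((cls, attribute))
    return index


templates = generate_templates({"Movie", "Actor", "MovieCharacter"}, SCHEMA_EDGES, max_edges=2)
index = build_inverted_index([
    ("Actor", "name", "Brad Pitt"),
    ("Movie", "title", "Mr. & Mrs. Smith"),
])
print(len(templates), dict(index))
```

At query time, looking up each keyword in the index gives the attributes it can bind to, which in turn selects the templates worth instantiating.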
Wolf Siberski 16
Query Guide Construction – Online Stage
■ Identify possible queries (leaves of the query guide)
  ■ Extract partial query graph from template graph
■ Problem: query space can be very large
Find minimal query guide
■ Cost function: number of steps + number of inspected suggestions
■ Minimal guide: smallest maximum cost
■ Depth/width tradeoff: a guide can be too flat or too deep
  ■ Optimum: ln(n) split
[Figure: a guide that is too flat (one level with many options) next to a guide that is too deep (many levels with few options)]
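The depth/width tradeoff can be made tangible with a small calculation. The model below is an assumption for illustration, not the cost model of the paper: the guide is a uniform tree over n possible queries with a fixed number of options per step, and in the worst case the user inspects every option at every step. It does not attempt to derive the optimum stated on the slide.

```python
def guide_depth(n_queries, branching):
    """Smallest depth d with branching**d >= n_queries (integer arithmetic)."""
    if branching < 2:
        raise ValueError("branching must be at least 2")
    depth, reachable = 0, 1
    while reachable < n_queries:
        reachable *= branching
        depth += 1
    return depth


def worst_case_cost(n_queries, branching):
    """Number of steps plus number of inspected suggestions along the longest path."""
    depth = guide_depth(n_queries, branching)
    return depth + depth * branching


n = 1000
for b in (2, 4, 10, 32, 1000):
    print(f"options per step: {b:4d}  worst-case cost: {worst_case_cost(n, b)}")
```

Under these assumptions a very flat guide (all 1000 queries offered at once) and a very deep guide (binary choices) both cost more than a guide with a handful of options per step.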
Wolf Siberski 17
Greedy Query Guide Construction
■ Finding Minimal Guide: NP-Hard
■ Use approach similar to set cover approximation
■ Determine nodes (= refinement options) top-down
■ Greedily select node leading to the lowest cost
– Cost estimation: minimally incurred cost
■ Repeat until all nodes are covered
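For reference, this is a sketch of the classic greedy set cover approximation that the slide compares the approach to. It is not QUICK's algorithm: QUICK selects nodes by estimated guide cost, whereas this sketch simply picks the option that covers the most still-uncovered queries. The query identifiers and option labels are made up for the example.

```python
def greedy_cover(universe, options):
    """Classic greedy set cover: repeatedly pick the option covering most uncovered items."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(options, key=lambda name: len(options[name] & uncovered))
        gained = options[best] & uncovered
        if not gained:
            raise ValueError("options do not cover the universe")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical semantic queries q1..q5 and the refinement options that lead to them.
queries = {"q1", "q2", "q3", "q4", "q5"}
options = {
    '"brad" in Actor.name': {"q1", "q2", "q3"},
    '"brad" in Movie.title': {"q4", "q5"},
    '"pitt" in Actor.name': {"q1", "q4"},
}

print(greedy_cover(queries, options))
```

The greedy choice is made top-down, one level of refinement options at a time, and the loop ends once every query is reachable through some selected option.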
Wolf Siberski 18
Evaluation – Experiment Settings
■ IMDB database
  ■ Semantic Web representation
■ Queries from AOL query log
  ■ Selection criteria
    – Movie-related
    – 2-5 keywords
    – Refers to at least 2 entities
  ■ Manual assessment of query intention
■ Search process
  ■ Manual input of keywords
  ■ Selection of correct option according to query intention
Wolf Siberski 19
Evaluation – Guide Quality
■ Intended construction option usually among top 3
■ Usually 3-5 clicks needed to construct query
■ Effective also for large query spaces
Wolf Siberski 20
Conclusion
■ Query construction with QUICK
  ■ Highly effective construction process
  ■ All intentions can be constructed
  ■ No query language or schema knowledge required
■ Further directions
  ■ Combine with relevance heuristics (IQP)
  ■ More flexible user interaction
    – Use facets for keyword bindings
    – Better multi-term support
  ■ Optimized query guide generation
    – Exploit entity notion (QUnits)
    – Progressive query guide creation
  ■ Connect to QbE/Query Form Creation
Wolf Siberski 21
Evaluation – Performance
No. of terms    Initialization time (ms)    Response time (ms)
2               98                          2
3               993                         19
4               16,797                      1,035
>4              31,838                      3,290
All             3,659                       314
■ Initialization takes too much time for long queries
  ■ RDF store as bottleneck (creation of query hierarchy)
■ After initialization, response time is OK
Wolf Siberski 22
Optimizations
■ Identification of semantic queries
  ■ Index template subsets by attribute to enable fast filtering of queries without results
  ■ Enable fast disjunction of template subsets (e.g., "and" on bitsets)
■ QCG generation
  ■ Parallel subquery computation
  ■ Caching of frequent subqueries
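One way to read the bitset idea is sketched below, under assumptions that are not from the slides: each attribute is indexed with a bitmask of the template ids that contain it, Python integers serve as bitsets, and the template ids and the index contents are invented for the example. For each keyword the masks of its candidate attributes are OR-ed, and the per-keyword masks are then AND-ed, so only templates that can host every keyword survive.

```python
def templates_mask(template_ids):
    """Encode a set of template ids as an integer bitset."""
    mask = 0
    for template_id in template_ids:
        mask |= 1 << template_id
    return mask


# Hypothetical index: attribute -> bitset of templates containing that attribute.
ATTRIBUTE_TO_TEMPLATES = {
    ("Actor", "name"): templates_mask([0, 2, 3]),
    ("Movie", "title"): templates_mask([1, 2, 3]),
    ("MovieCharacter", "name"): templates_mask([3]),
}


def candidate_templates(keyword_attributes):
    """keyword_attributes holds one list of candidate attributes per keyword."""
    result = -1  # all bits set
    for attributes in keyword_attributes:
        keyword_mask = 0
        for attribute in attributes:
            keyword_mask |= ATTRIBUTE_TO_TEMPLATES.get(attribute, 0)
        result &= keyword_mask  # a template must be able to host every keyword
    return [i for i in range(result.bit_length()) if (result >> i) & 1]


# "brad" may occur in actor names or movie titles, "pitt" only in actor names.
print(candidate_templates([
    [("Actor", "name"), ("Movie", "title")],
    [("Actor", "name")],
]))
```

Templates filtered out this way never have to be instantiated or sent to the RDF store, which is where the slides locate the initialization bottleneck.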
Wolf Siberski 23
Misc Ideas
■ Use Google's KDD annotated Named Entity Recognition test set (Piggyback, http://sites.google.com/site/massiciara/)
Wolf Siberski 24
Cross Connections
■ Thomas Gottron: Traditional features (e.g. TF) not useful for very short text
■ Hinrich Schütze: entity-related queries often ambiguous
■ Michael Granitzer: cycle of refinement/exploration
■ Norbert Fuhr: generate clusters based on possible queries and let users select the right cluster