A Framework for Evaluating Database Keyword Search Strategies

Joel Coffman and Alfred C. Weaver
University of Virginia

28 October 2010
Coffman and Weaver Evaluating Database Keyword Search 28 October 2010 1 / 18
Outline
Introduction
Evaluation Framework
Experiments
Conclusion
Keyword Search
- Preferred means of data exploration and retrieval online
  - > 4 billion searches daily
- Desire to extend the paradigm to relational databases

Example: Who played Professor Henry Jones in Indiana Jones and the Last Crusade?

Person
  id  name
  10  Ford, Harrison
  11  Connery, Sean

Character
  id  name
  7   Indiana Jones
  9   Professor Henry Jones

Movie
  id  title
  18  Raiders of the Lost Ark
  19  Indiana Jones and the Last Crusade

Cast
  personId  characterId  movieId
  10        7            18
  10        7            19
  11        9            19

Definition (query result): a tree of tuples that is reduced with respect to the query.

Result for this query: Indiana Jones and the Last Crusade -- Professor Henry Jones -- Connery, Sean

Which would you rather write?

SELECT Person.name
FROM Person, Character, Movie, Cast
WHERE Person.id = Cast.personId
  AND Character.id = Cast.characterId
  AND Movie.id = Cast.movieId
  AND Character.name = 'Professor Henry Jones'
  AND Movie.title = 'Indiana Jones and the Last Crusade';

or simply "Henry Jones Last Crusade"
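The toy schema above can be exercised end to end; a minimal sketch using Python's built-in sqlite3 module (table and column names taken from the slide; note that Cast must be quoted in SQLite because CAST is a reserved word):

```python
import sqlite3

# Build the toy IMDb-style schema from the slide in an in-memory database.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Character (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Movie (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE "Cast" (personId INTEGER, characterId INTEGER, movieId INTEGER);

INSERT INTO Person VALUES (10, 'Ford, Harrison'), (11, 'Connery, Sean');
INSERT INTO Character VALUES (7, 'Indiana Jones'), (9, 'Professor Henry Jones');
INSERT INTO Movie VALUES (18, 'Raiders of the Lost Ark'),
                         (19, 'Indiana Jones and the Last Crusade');
INSERT INTO "Cast" VALUES (10, 7, 18), (10, 7, 19), (11, 9, 19);
""")

# The hand-written SQL the slide contrasts with the keyword query
# "Henry Jones Last Crusade".
rows = cur.execute("""
SELECT Person.name
FROM Person, Character, Movie, "Cast"
WHERE Person.id = "Cast".personId
  AND Character.id = "Cast".characterId
  AND Movie.id = "Cast".movieId
  AND Character.name = 'Professor Henry Jones'
  AND Movie.title = 'Indiana Jones and the Last Crusade';
""").fetchall()
print(rows)  # -> [('Connery, Sean',)]
```

A keyword search system must reconstruct the same join tree (Person -- Cast -- Character/Movie) automatically from the three query terms.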
The Problem
Relational keyword search has been a hot topic since 2002
- Evaluations are ad hoc, with no standardization

Example (search effectiveness): DISCOVER, Hristidis et al., Liu et al., SPARK, Xu et al.
- ~16-fold improvement in search effectiveness claimed during the past decade
- Liu et al. claim to be better than Google
- SPARK achieves a Mean Reciprocal Rank (MRR) of 1.0
- The best systems at TREC score ~0.8 (Webber, 2010)

Hypothesis: existing evaluations overstate retrieval effectiveness
Survey of Existing Evaluations
- Existing experiments are unrepeatable
  - Few details included in the literature
  - Datasets, query workloads, and relevance assessments not released
- Query workloads vary widely
  - 12–1100 queries included in experiments
  - Too few representative queries
- Experiments focus on performance; less than half consider search effectiveness
  - Little comparison between systems
Outline
Introduction
  Background
  Motivation
Evaluation Framework
  Datasets
  Queries
  Relevance Assessments
Experiments
Conclusion
Datasets
- 3 datasets
- Subsets of IMDb and Wikipedia used in experiments
  - Evaluate systems that assume the index fits in memory

Table: dataset characteristics

Dataset               Size (MB)  Relations  Tuples
MONDIAL                      10         28     17K
IMDb                        427          6    1.7M
Wikipedia                   378          6    0.2M
IMDb (original)            9017         44   44.3M
Wikipedia (original)        670         42    1.6M
Query Workload
- 50 information needs per dataset (the accepted minimum for evaluating retrieval systems)
- Query statistics are similar to those of queries submitted to Internet search engines

                  Search log                     Synthesized
Dataset           |Q|          [q]     avg(q)    |Q|    [q]    avg(q)
MONDIAL           —            —       —          50    1–5    2.04
IMDb              101,903      1–96    2.71       50    1–26   3.88
Wikipedia         122,956      1–95    2.87       50    1–6    2.66
Overall           20,527,863   1–245   2.37      150    1–26   2.86

Legend
|Q|     total number of queries
[q]     range in number of query terms
avg(q)  average number of terms per query
Relevance Assessments
- Binary relevance assessments

Relevant results per query

Dataset     [r]     avg(r)
MONDIAL     1–35    5.90
IMDb        1–35    4.32
Wikipedia   1–13    3.26
Overall     1–35    4.49

Legend
[r]     range in number of relevant results per query
avg(r)  average number of relevant results per query
Outline
Introduction
  Background
  Motivation
Evaluation Framework
  Datasets
  Queries
  Relevance Assessments
Experiments
Conclusion
Experiments: Objectives and Metrics
- Determine the search effectiveness of different systems
  - Mean Reciprocal Rank (MRR)
  - Mean Average Precision (MAP)
- Measure the impact that the number of results retrieved has on the metrics
  - Interpolated precision
- Measure the correlation of results from different systems
  - Minimizing Kendall distance (Kmin)
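MRR and MAP are standard IR effectiveness measures; a minimal sketch of both under the framework's binary relevance assumption (function names are illustrative, not from the paper):

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, result in enumerate(ranked, start=1):
        if result in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k that hold a relevant result."""
    hits, total = 0, 0.0
    for rank, result in enumerate(ranked, start=1):
        if result in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_over_queries(metric, runs):
    """MRR or MAP: average a per-query metric over (ranked, relevant) pairs."""
    return sum(metric(ranked, relevant) for ranked, relevant in runs) / len(runs)
```

An MRR of 1.0 means the first relevant result appeared at rank 1 for every single query in the workload, which is why SPARK's reported score is suspiciously perfect.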
Systems
- 2 major approaches to keyword search in relational databases
  - Relational
    - Specific to relational databases
    - Use IR-style ranking functions
  - Proximity search
    - Applicable to arbitrary data graphs
    - Minimizes the total weight of result trees
- 8 systems published in major proceedings
  - ...plus our own ranking scheme, structured cover density ranking (CD)
Single-Entity Effectiveness
- Proximity search systems handle single-entity queries well but are not scalable

Figure: mean reciprocal rank for queries targeting a single tuple.
Overall Effectiveness
- No ranking scheme outperforms all others
- IR-style ranking (excluding CD) is generally not as good as proximity search

Figure: mean average precision across all queries.
Additional Experiments
Number of results retrieved
- Precision-recall curves are inaccurate above 40% recall for small k
- k should be at least double the number of relevant results

Ranking correlation
- Ranking functions derived from a common ancestor produce similar results
- Prefer simpler ranking functions
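The effect of a small k on the precision-recall curve can be made concrete with standard interpolated precision (the maximum precision achieved at the target recall level or beyond); a minimal sketch:

```python
def interpolated_precision(pr_points, recall_levels):
    """Interpolated precision at each requested recall level.

    pr_points: iterable of (recall, precision) pairs for one ranked list.
    Returns, for each level, the max precision at that recall or beyond,
    or 0.0 if the ranked list never reaches that recall.
    """
    points = list(pr_points)
    return [
        max((p for r, p in points if r >= level), default=0.0)
        for level in recall_levels
    ]
```

Truncating the ranked list at a small k removes the high-recall points, so every interpolated value beyond the recall actually reached collapses to 0.0, which is the inaccuracy above 40% recall noted on the slide.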
Outline
Introduction
  Background
  Motivation
Evaluation Framework
  Datasets
  Queries
  Relevance Assessments
Experiments
Conclusion
Conclusions
- Existing evaluations are ad hoc and lack standardization
  - Standardized evaluation is critical to progress
- Our evaluation benchmark is the first designed for keyword search within relational databases
  - Datasets, queries, and relevance assessments are available to other researchers
- No existing ranking scheme is most effective on all workloads
  - Improve ranking by considering additional factors
Questions?

Download the datasets, queries, and relevance assessments:
http://www.cs.virginia.edu/~jmc7tp/projects/search