A Framework for Evaluating Database Keyword Search Strategies

Joel Coffman and Alfred C. Weaver
University of Virginia

28 October 2010
Coffman and Weaver Evaluating Database Keyword Search 28 October 2010 1 / 18
Outline
Introduction
Evaluation Framework
Experiments
Conclusion
Keyword Search
- Preferred means of data exploration and retrieval online
  - > 4 billion searches daily
- Desire to extend the paradigm to relational databases

Example: Who played Professor Henry Jones in Indiana Jones and the Last Crusade?

Person
  id  name
  10  Ford, Harrison
  11  Connery, Sean

Character
  id  name
  7   Indiana Jones
  9   Professor Henry Jones

Movie
  id  title
  18  Raiders of the Lost Ark
  19  Indiana Jones and the Last Crusade

Cast
  personId  characterId  movieId
  10        7            18
  10        7            19
  11        9            19

Definition (query result): a tree of tuples that is reduced with respect to the query.

Result for this query: Indiana Jones and the Last Crusade -- Professor Henry Jones -- Connery, Sean

Which would you rather write?

SELECT Person.name
FROM Person, Character, Movie, Cast
WHERE Person.id = Cast.personId
  AND Character.id = Cast.characterId
  AND Movie.id = Cast.movieId
  AND Character.name = 'Professor Henry Jones'
  AND Movie.title = 'Indiana Jones and the Last Crusade';

or simply "Henry Jones Last Crusade"
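The toy schema above can be exercised end to end; a minimal sketch using Python's built-in sqlite3 module (table and column names taken from the slide; note that Cast must be quoted in SQLite because CAST is a reserved word):

```python
import sqlite3

# Build the toy IMDb-style schema from the slide in an in-memory database.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Character (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Movie (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE "Cast" (personId INTEGER, characterId INTEGER, movieId INTEGER);

INSERT INTO Person VALUES (10, 'Ford, Harrison'), (11, 'Connery, Sean');
INSERT INTO Character VALUES (7, 'Indiana Jones'), (9, 'Professor Henry Jones');
INSERT INTO Movie VALUES (18, 'Raiders of the Lost Ark'),
                         (19, 'Indiana Jones and the Last Crusade');
INSERT INTO "Cast" VALUES (10, 7, 18), (10, 7, 19), (11, 9, 19);
""")

# The hand-written SQL the slide contrasts with the keyword query
# "Henry Jones Last Crusade".
rows = cur.execute("""
SELECT Person.name
FROM Person, Character, Movie, "Cast"
WHERE Person.id = "Cast".personId
  AND Character.id = "Cast".characterId
  AND Movie.id = "Cast".movieId
  AND Character.name = 'Professor Henry Jones'
  AND Movie.title = 'Indiana Jones and the Last Crusade';
""").fetchall()
print(rows)  # -> [('Connery, Sean',)]
```

A keyword search system must reconstruct the same join tree (Person -- Cast -- Character/Movie) automatically from the three query terms.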
The Problem
Relational keyword search has been a hot topic since 2002
- Evaluations are ad hoc, with no standardization

Example (search effectiveness): DISCOVER, Hristidis et al., Liu et al., SPARK, Xu et al.
- ~16-fold improvement in search effectiveness claimed during the past decade
- Liu et al. claim to be better than Google
- SPARK achieves a Mean Reciprocal Rank (MRR) of 1.0
- The best systems at TREC score ~0.8 (Webber, 2010)

Hypothesis: existing evaluations overstate retrieval effectiveness
Survey of Existing Evaluations
- Existing experiments are unrepeatable
  - Few details included in the literature
  - Datasets, query workloads, and relevance assessments not released
- Query workloads vary widely
  - 12–1100 queries included in experiments
  - Too few representative queries
- Experiments focus on performance; less than half consider search effectiveness
  - Little comparison between systems
Outline
Introduction
  Background
  Motivation
Evaluation Framework
  Datasets
  Queries
  Relevance Assessments
Experiments
Conclusion
Datasets
- 3 datasets
- Subsets of IMDb and Wikipedia used in experiments
  - Evaluate systems that assume the index fits in memory

Table: dataset characteristics

Dataset               Size (MB)  Relations  Tuples
MONDIAL                      10         28     17K
IMDb                        427          6    1.7M
Wikipedia                   378          6    0.2M
IMDb (original)            9017         44   44.3M
Wikipedia (original)        670         42    1.6M
Query Workload
- 50 information needs per dataset (the accepted minimum for evaluating retrieval systems)
- Query statistics are similar to those of queries submitted to Internet search engines

                  Search log                     Synthesized
Dataset           |Q|          [q]     avg(q)    |Q|    [q]    avg(q)
MONDIAL           —            —       —          50    1–5    2.04
IMDb              101,903      1–96    2.71       50    1–26   3.88
Wikipedia         122,956      1–95    2.87       50    1–6    2.66
Overall           20,527,863   1–245   2.37      150    1–26   2.86

Legend
|Q|     total number of queries
[q]     range in number of query terms
avg(q)  average number of terms per query
Relevance Assessments
- Binary relevance assessments

Relevant results per query

Dataset     [r]     avg(r)
MONDIAL     1–35    5.90
IMDb        1–35    4.32
Wikipedia   1–13    3.26
Overall     1–35    4.49

Legend
[r]     range in number of relevant results per query
avg(r)  average number of relevant results per query
Outline
Introduction
  Background
  Motivation
Evaluation Framework
  Datasets
  Queries
  Relevance Assessments
Experiments
Conclusion
Experiments: Objectives and Metrics
- Determine the search effectiveness of different systems
  - Mean Reciprocal Rank (MRR)
  - Mean Average Precision (MAP)
- Measure the impact that the number of results retrieved has on the metrics
  - Interpolated precision
- Measure the correlation of results from different systems
  - Minimizing Kendall distance (Kmin)
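MRR and MAP are standard IR effectiveness measures; a minimal sketch of both under the framework's binary relevance assumption (function names are illustrative, not from the paper):

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, result in enumerate(ranked, start=1):
        if result in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k that hold a relevant result."""
    hits, total = 0, 0.0
    for rank, result in enumerate(ranked, start=1):
        if result in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_over_queries(metric, runs):
    """MRR or MAP: average a per-query metric over (ranked, relevant) pairs."""
    return sum(metric(ranked, relevant) for ranked, relevant in runs) / len(runs)
```

An MRR of 1.0 means the first relevant result appeared at rank 1 for every single query in the workload, which is why SPARK's reported score is suspiciously perfect.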
Systems
- 2 major approaches to keyword search in relational databases
  - Relational
    - Specific to relational databases
    - Use IR-style ranking functions
  - Proximity search
    - Applicable to arbitrary data graphs
    - Minimizes the total weight of result trees
- 8 systems published in major proceedings
  - ...plus our own ranking scheme, structured cover density ranking (CD)
Single-Entity Effectiveness
- Proximity search systems handle single-entity queries well but are not scalable

Figure: mean reciprocal rank for queries targeting a single tuple.
Overall Effectiveness
- No ranking scheme outperforms all others
- IR-style ranking (excluding CD) is generally not as good as proximity search

Figure: mean average precision across all queries.
Additional Experiments
Number of results retrieved
- Precision-recall curves are inaccurate above 40% recall for small k
- k should be at least double the number of relevant results

Ranking correlation
- Ranking functions derived from a common ancestor produce similar results
- Prefer simpler ranking functions
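The effect of a small k on the precision-recall curve can be made concrete with standard interpolated precision (the maximum precision achieved at the target recall level or beyond); a minimal sketch:

```python
def interpolated_precision(pr_points, recall_levels):
    """Interpolated precision at each requested recall level.

    pr_points: iterable of (recall, precision) pairs for one ranked list.
    Returns, for each level, the max precision at that recall or beyond,
    or 0.0 if the ranked list never reaches that recall.
    """
    points = list(pr_points)
    return [
        max((p for r, p in points if r >= level), default=0.0)
        for level in recall_levels
    ]
```

Truncating the ranked list at a small k removes the high-recall points, so every interpolated value beyond the recall actually reached collapses to 0.0, which is the inaccuracy above 40% recall noted on the slide.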
Outline
Introduction
  Background
  Motivation
Evaluation Framework
  Datasets
  Queries
  Relevance Assessments
Experiments
Conclusion
Conclusions
- Existing evaluations are ad hoc and lack standardization
  - Standardized evaluation is critical to progress
- Our evaluation benchmark is the first designed for keyword search within relational databases
  - Datasets, queries, and relevance assessments are available to other researchers
- No existing ranking scheme is most effective on all workloads
  - Improve ranking by considering additional factors
Questions?

Download the datasets, queries, and relevance assessments:
http://www.cs.virginia.edu/~jmc7tp/projects/search