Set Retrieval 2.0
© 2008 Endeca Technologies, Inc. All rights reserved.
Set Retrieval 2.0
Daniel Tunkelang, Chief Scientist, Endeca
howdy!
• 1988 – 1992
• 1993 – 1998
• 1999 -
overview
what’s right with search today?
what’s wrong with search today?
how do we fix it?
let’s quickly review some history…
1947: Hans Peter Luhn
1968: Gerard Salton
1972: Karen Spärck Jones
1980s: lots of progress
1990s – 2000s: WWW
today
so, do we all feel lucky?
recession? what recession?
ask the users…
…though they do have complaints
78% wish search engines could read their minds
what frustrates users most?
– 25%: deluge of results
– 24%: too many paid listings
– 19%: inability to understand their keywords
– 19%: disorganized / random results
The State of Search, Autobytel & Kelton Research, Oct ’07
web search vs. enterprise search
“Search on the internet is solved. I always find what I need.
But why not in the enterprise?
Seems like a solution waiting to happen.”
- a Fortune 500 CTO
enterprise users really have complaints
Why is Joe the Knowledge Worker so upset?
– 49%: finding the information needed to do their job is difficult and time consuming
– 50%: findability within organization worse than on their own consumer-facing site
Market IQ Report on Findability, AIIM, June ’08
selection bias?
the library and information science critique
• models
– relevance is subjective
• evaluation
– neglects interactivity
• tools
– no support for exploration
the rebuttal
"Tell us what to do, and we will do it."
besides, search is 90% solved
we need to call a truce
- real, effective systems
- that support interaction
- cost-effective to evaluate
let’s go back to the 80s for a moment
then vs. now
• known-item search was an open problem
– now it’s a commodity
• library and information science ideas of the 80s
– ahead of their time
• now we can find known items
– let’s tackle more ambitious information needs
requirements
transparency
control
guidance
precision = fraction of retrieved documents that are relevant
recall = fraction of relevant documents that are retrieved
[diagram: retrieved documents vs. relevant documents]
set retrieval
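The two definitions above can be computed directly on sets of document ids. This is a minimal sketch; the function name `precision_recall` and the toy ids are assumptions for illustration, not something from the talk.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # documents that are both retrieved and relevant
    # Guard against empty sets so the ratios are always defined.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 relevant, 2 in common.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Retrieving more documents can only raise or hold recall, while precision typically drops as less certain matches are added, which is the tension the next slide plots.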
the classic trade-off
[plot: precision vs. recall]
set retrieval: 2 out of 3
set retrieval 2.0 = set retrieval + guidance
Did you mean: guidance
Related Searches
– Guidance Counselor Salary
– Guidance Counselor Job Description
– Definition of Guidance
– Guidance Counseling
– History of Guidance Counseling
– Child Guidance
– Career Guidance
– What Is the Meaning of Guidance
– Free Marriage Counseling
– Problems in Marriage
– Career Exploration
– Role of School Counselor
guidance vs. mind reading
• system can’t read your mind
• spouse / best friend can’t read your mind
• sometimes you can’t read your own mind
so where does guidance come from?
it’s people!
human-computer information retrieval
• don’t just guess the user’s intent
– optimize communication
• de-emphasize the top ten documents
– response is a set of documents
• think beyond single queries
– support refinement and exploration
hcir cheats the trade-off
[plot: precision vs. recall]
but how do we get there?
set retrieval 2.0
• set retrieval that responds to queries with
– overview of the user's current context
– organized set of options for exploration
• contextual summaries of document sets
– optimize system’s communication with user
• query refinement options
– optimize user’s communication with system
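One way to picture such a response is as a small data structure: the matching set, a summary of it, and the refinements that would narrow it. The sketch below is an illustration under stated assumptions, not Endeca's implementation; `DOCS`, `FACETS`, and `respond` are hypothetical names, and the toy records are made up.

```python
from collections import Counter

# Hypothetical toy collection; fields and values are illustrative only.
DOCS = [
    {"id": 1, "type": "microwave", "brand": "A", "color": "white"},
    {"id": 2, "type": "microwave", "brand": "B", "color": "black"},
    {"id": 3, "type": "ceiling fan", "brand": "A", "color": "white"},
    {"id": 4, "type": "microwave", "brand": "A", "color": "black"},
]
FACETS = ("type", "brand", "color")


def respond(docs, selections):
    """Answer a query with a set, a summary of that set, and refinements."""
    # The response is the whole matching set, not a ranked top ten.
    matches = [d for d in docs
               if all(d.get(f) == v for f, v in selections.items())]
    # Contextual summary: value counts for every facet not yet selected.
    summary = {f: Counter(d[f] for d in matches)
               for f in FACETS if f not in selections}
    # Refinement options: only values that would actually narrow the set.
    refinements = [(f, v, n)
                   for f, counts in summary.items()
                   for v, n in counts.items()
                   if n < len(matches)]
    return {"count": len(matches), "summary": summary,
            "refinements": refinements}


response = respond(DOCS, {"type": "microwave"})
print(response["count"])        # 3
print(response["refinements"])
```

The summary is the system talking to the user; the refinement list is the vocabulary the user gets for talking back.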
faceted search guides refinement
showing the right facets: microwaves
showing the right facets: ceiling fans
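The slides do not say how the "right" facets are chosen for each result set. One plausible heuristic, offered here purely as an assumption, is to keep facets that most results carry and rank them by how evenly they split the set (entropy). `rank_facets`, `min_coverage`, and the sample records are all hypothetical.

```python
import math
from collections import Counter


def rank_facets(docs, min_coverage=0.5):
    """Rank facets for a result set by the entropy of their value counts."""
    scores = {}
    fields = {f for d in docs for f in d}
    for f in fields:
        values = [d[f] for d in docs if f in d]
        # Skip facets that too few results carry.
        if len(values) / len(docs) < min_coverage:
            continue
        counts = Counter(values)
        # A facet with a single value cannot refine anything.
        if len(counts) < 2:
            continue
        total = len(values)
        scores[f] = -sum((n / total) * math.log2(n / total)
                         for n in counts.values())
    # Highest entropy first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical result sets for two different queries.
microwaves = [
    {"brand": "A", "wattage": "1000W"},
    {"brand": "A", "wattage": "1200W"},
    {"brand": "B", "wattage": "700W"},
]
fans = [
    {"brand": "A", "blade_span": "52in"},
    {"brand": "A", "blade_span": "44in"},
]
print(rank_facets(microwaves))  # ['wattage', 'brand']
print(rank_facets(fans))        # ['blade_span']
```

The point of the sketch is only that the facet list depends on the current result set: wattage matters for microwaves, blade span for ceiling fans, and a facet with one value is dropped.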
query-driven clarification before refinement
Matching Categories include:
Appliances > Small Appliances > Irons & Steamers
Appliances > Small Appliances > Microwaves & Steamers
Bath > Sauna & Spas > Steamers
Kitchen > Bakeware & Cookware > Cookware > Open Stock Pots > Double Boilers & Steamers
Kitchen > Small Appliances > Steamers
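A minimal sketch of query-driven clarification, assuming the query is matched against the leaf of each category path by simple substring search. The five steamer paths come from the slide; the toaster path, the `matching_categories` name, and the matching rule itself are assumptions added for illustration.

```python
CATEGORIES = [
    "Appliances > Small Appliances > Irons & Steamers",
    "Appliances > Small Appliances > Microwaves & Steamers",
    "Bath > Sauna & Spas > Steamers",
    "Kitchen > Bakeware & Cookware > Cookware > Open Stock Pots"
    " > Double Boilers & Steamers",
    "Kitchen > Small Appliances > Steamers",
    "Kitchen > Small Appliances > Toasters",  # hypothetical non-match
]


def matching_categories(query, categories):
    """Return category paths whose leaf node mentions the query term."""
    q = query.strip().lower()
    matches = []
    for path in categories:
        leaf = path.split(">")[-1].strip().lower()
        if q and q in leaf:
            matches.append(path)
    return matches


options = matching_categories("steamers", CATEGORIES)
if len(options) > 1:
    # Ambiguous query: ask the user which sense they mean before refining.
    print("Matching Categories include:")
    for path in options:
        print(path)
```

When the query maps to several unrelated branches, offering the branches first avoids showing facets for the wrong kind of steamer.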
results-driven clarification before refinement
Search: storage
taxonomies are so 1990s
dynamic topic facet
Subject
Electronic data processing (1002)
Distributed processing (937)
Parallel processing (619)
Computer networks (562)
Fault-tolerant-computing (365)
Show more…
facets populated using entity extraction
apple production
bootstrap on folksonomies
or learn from users
hcir using set retrieval 2.0
emphasize set summaries over ranked lists
establish a dialog between the user and the data
enable exploration and discovery
think outside the (search) box
• best-first search works for many use cases
• but not for some of the most valuable ones
• set retrieval 2.0 = set retrieval + guidance
• human-computer information retrieval
thank you
communication 1.0
email: [email protected]
communication 2.0
blog: http://thenoisychannel.com
twitter: http://twitter.com/dtunkelang