Design for Interaction

Post on 09-May-2015

2.744 views 4 download

description

Design for Interaction by Daniel Tunkelang, Chief Scientist of Endeca An invited presentation at SIGMOD '09 (http://sigmod09.org/) Research in information retrieval has focused on presenting the most relevant results to a user in response to a free-text search query. Research in database systems assumes a model where the user enters a formal query, and the results are exactly those the user requested. Neither community has emphasized user interaction—a critical concern for practical information access. As William Goffman noted in the 1960s and Nick Belkin continually reminds us today, the relationship between a document and query, though necessary, is not sufficient to determine relevance—yet ranked retrieval approaches rely heavily or exclusively on this relationship. Meanwhile, recent work on database usability by Jeff Naughton and H.V. Jagadish surfaces the rigidity of database systems that return nothing unless users know how to formulate precise queries. This talk presents human-computer information retrieval (HCIR) as a general approach that addresses some of the key challenges facing both research communities. A vision first put forward by Gary Marchionini, HCIR expects people and systems to work together to implement information access. Such an approach requires rethinking information access not as a matching or ranking problem, but rather as a communication problem. Specifically, we need interfaces that optimize the bidirectional communication between the user and the system, thus optimizing the symbiotic division of labor between the two. This talk reviews the history of HCIR efforts and presents ongoing work to implement the HCIR vision. In particular, it presents an interactive set retrieval approach that responds to queries with an overview of the user's current context and an organized set of options for incremental exploration.

Transcript of Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.

design for interaction

Daniel TunkelangChief Scientist, Endeca

© 2009 Endeca Technologies, Inc. All rights reserved.2

about me

Organizing SIGIR ’09 Industry Track in Boston on July 22nd!

© 2009 Endeca Technologies, Inc. All rights reserved.3

about endeca

250M+ end users

per month

250M+ end users

per month600+ customers

$100M+ annual sales

leading provider ofsearch applications

© 2009 Endeca Technologies, Inc. All rights reserved.4

what i hope you learn from this talk

the db and ir perspectives have a common thread

convergence may be upon us

but we need interaction to make it work

© 2009 Endeca Technologies, Inc. All rights reserved.5

overview

don't put all your eggs in one basket

design for interaction

human-computer information retrieval

© 2009 Endeca Technologies, Inc. All rights reserved.6

don’t put all your eggs in one basket

Still Life with Basket and Broken Eggs by Michael Edwards, 2008

© 2009 Endeca Technologies, Inc. All rights reserved.7

the db approach: perfection in, perfection out

http://www.storeitfoodsblog.com/category/food-preparation/meat-grinder/

© 2009 Endeca Technologies, Inc. All rights reserved.8

db usability researchers recognize the pain

© 2009 Endeca Technologies, Inc. All rights reserved.9

sql is hard

Making Database Systems Usable[Jagadish et al., SIGMOD 2007]

• labor-intensive query construction

• lengthy query evaluation

• high query reformulation cost

__sql

© 2009 Endeca Technologies, Inc. All rights reserved.10

data sucks and users are lazy

Extracting Problems for Databaseand IR Researchers[Naughton, Spring 2008 North East DB/IR Day]

• real data is– incomplete– inconsistent– incorrect

• users don’t want to learn– data schemas– structured query languages we’re not gonna take it!

© 2009 Endeca Technologies, Inc. All rights reserved.11

the ir way: don’t worry, be happy

http://adsoftheworld.com/media/print/mcdonalds_burger_mysteries

© 2009 Endeca Technologies, Inc. All rights reserved.12

ir for db people: what would google do?

information Need query select from results

rank using IR model

USER:

SYSTEM:tf-idf PageRank

© 2009 Endeca Technologies, Inc. All rights reserved.13

assumptions of relevance-centric ir approach

• self-awareness

• self-expression

• model knows best

• answer is a document

• one-shot query

© 2009 Endeca Technologies, Inc. All rights reserved.14

life is not a batch

• db approach expects too much of user• ir approach expects too much of system

both approaches act as if it allcomes down to a single query

is that your final answer question?

© 2009 Endeca Technologies, Inc. All rights reserved.15

design for interaction

The Future of Social Interaction by Jim Stoten

© 2009 Endeca Technologies, Inc. All rights reserved.16

changes assumptions about what to optimize

recall

pre

cis

ion

complexity relevance

communication

© 2009 Endeca Technologies, Inc. All rights reserved.17

how do we optimize communication?

transparency

control

guidance

© 2009 Endeca Technologies, Inc. All rights reserved.18

ir offers a black box

ca c'est la caisse. le mouton que tu veux est dedans.

© 2009 Endeca Technologies, Inc. All rights reserved.19

db / set retrieval offers 2 out of 3

transparency

control

guidance

© 2009 Endeca Technologies, Inc. All rights reserved.20

but we need it all!

• set retrieval is a failure in the ir world– though quite successful in the db world!

• but ranked retrieval is inherently crippled– no transparency, control, or guidance!

how do we optimize for communication?

© 2009 Endeca Technologies, Inc. All rights reserved.21

human-computer information retrieval

• don’t just guess the user’s intent• increase user responsibility and control• require and reward human intellectual effort

“Toward Human-Computer Information Retrieval”

Gary Marchionini

© 2009 Endeca Technologies, Inc. All rights reserved.22

great idea

how?

© 2009 Endeca Technologies, Inc. All rights reserved.23

treat query construction as a process

A Case for Interaction[Koenemann and Belkin, 1996]

• used term feedback to improve alerting queries

• users select from suggested terms

• 17 – 34% improvement in precision @ 30

• users liked the feedback interface

© 2009 Endeca Technologies, Inc. All rights reserved.24

expose the facets of semistructured content

© 2009 Endeca Technologies, Inc. All rights reserved.25

success in the lab and the field

• favored in user studies by Marti Hearst– http://flamenco.berkeley.edu/

• ubiquitous in ecommerce– amazon.com– eBay– endeca powers 42 of top 100 online retailers

• taking over media, libraries, enterprise, etc.

© 2009 Endeca Technologies, Inc. All rights reserved.26

even a few db folks have drunk the kool-aid

DataGuides[Goldman and Widom, VLDB 1997]

• user-friendly schema summaries

Magnet[Sinha and Karger, SIGMOD 2005]

• navigation and refinement options

common theme: semistructured

© 2009 Endeca Technologies, Inc. All rights reserved.27

what is semistructured data?

• one universe

• self-describing

• blends data / meta-data

© 2009 Endeca Technologies, Inc. All rights reserved.28

data modeling flexibility

• no a-priori schema– integrated sources without up-front schema design

• richer modeling capabilities tame data complexity– hierarchy, multi-valued fields, sparse fields

• schema flexibility eases schema evolution– new entity types, new data source

Databases Content ManagementERP

Groupware and Collaboration

WWW Internet

SOA, ESB,Web ServiceFile Systems

© 2009 Endeca Technologies, Inc. All rights reserved.29

semantically direct queries

<shirt><sku>1234</sku><sleeve>Long</sleeve><desc>Classic end-on-end shirt</desc><price>39.99</price><salePrice>29.99</salePrice><color>Blue</color><color>Yellow</color><color>White</color>...

</shirt> <trousers><sku>1579</sku><price>59.99</price><color>Khaki</color>...

</trousers>

which on-sale itemsare available in blue?

<buyingGuide><title>Selecting the right ski coat for you.</title><file>skiguide.pdf</file><keyword>ski</keyword><keyword>coat</keyword>...

</buyingGuide>

which attributescharacterize on-saleblue items?

price, sleeve,color, salePrice,brand, fabric, …

© 2009 Endeca Technologies, Inc. All rights reserved.30

but let’s make this concrete

Uh oh, I’m presenting atSIGMOD! Better find a good

book about databases!

© 2009 Endeca Technologies, Inc. All rights reserved.31

quick, to the goog-mobile!

not quite…

© 2009 Endeca Technologies, Inc. All rights reserved.32

i know, i’ll go to the library!

#%@$!

© 2009 Endeca Technologies, Inc. All rights reserved.33

let’s try a little hcir…

© 2009 Endeca Technologies, Inc. All rights reserved.34

hcir works for news too

© 2009 Endeca Technologies, Inc. All rights reserved.35

life in a semistructured world

• search is a great starting point– users can’t / won’t initiate structured queries

• ranked lists are an inadequate ending point– search queries are lossy projections of intent

• hcir leads users down a garden path to structure

© 2009 Endeca Technologies, Inc. All rights reserved.36

lots of trade-offs

“everything should be made as simple as possible, but no simpler”

“speed of thought” vs. “going nowhere quickly”

“to err is human, but to really foul things up requires a computer”

simple interfaces don’talways yield satisfaction

© 2009 Endeca Technologies, Inc. All rights reserved.37

users want the triumvirate

• transparency• control• guidance

transparency and control are easy

guidance requires cleverness

© 2009 Endeca Technologies, Inc. All rights reserved.38

in closing

all of us want to help people access information

the best help is to help them help themselves

design for interaction thoughtransparency, control, guidance

© 2009 Endeca Technologies, Inc. All rights reserved.39

thank you…and come to SIGIR!

communication 1.0email: dt@endeca.com

communication 2.0blog: http://thenoisychannel.com

twitter: http://twitter.com/dtunkelang

SIGIR: July 19-23 in Boston Industry Track on July 22nd!