Design for Interaction

39
© 2009 Endeca Technologies, Inc. All rights reserved. design for interaction Daniel Tunkelang Chief Scientist, Endeca

description

Design for Interaction by Daniel Tunkelang, Chief Scientist of Endeca An invited presentation at SIGMOD '09 (http://sigmod09.org/) Research in information retrieval has focused on presenting the most relevant results to a user in response to a free-text search query. Research in database systems assumes a model where the user enters a formal query, and the results are exactly those the user requested. Neither community has emphasized user interaction—a critical concern for practical information access. As William Goffman noted in the 1960s and Nick Belkin continually reminds us today, the relationship between a document and query, though necessary, is not sufficient to determine relevance—yet ranked retrieval approaches rely heavily or exclusively on this relationship. Meanwhile, recent work on database usability by Jeff Naughton and H.V. Jagadish surfaces the rigidity of database systems that return nothing unless users know how to formulate precise queries. This talk presents human-computer information retrieval (HCIR) as a general approach that addresses some of the key challenges facing both research communities. A vision first put forward by Gary Marchionini, HCIR expects people and systems to work together to implement information access. Such an approach requires rethinking information access not as a matching or ranking problem, but rather as a communication problem. Specifically, we need interfaces that optimize the bidirectional communication between the user and the system, thus optimizing the symbiotic division of labor between the two. This talk reviews the history of HCIR efforts and presents ongoing work to implement the HCIR vision. In particular, it presents an interactive set retrieval approach that responds to queries with an overview of the user's current context and an organized set of options for incremental exploration.

Transcript of Design for Interaction

Page 1: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.

design for interaction

Daniel TunkelangChief Scientist, Endeca

Page 2: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.2

about me

Organizing SIGIR ’09 Industry Track in Boston on July 22nd!

Page 3: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.3

about endeca

250M+ end users

per month

250M+ end users

per month600+ customers

$100M+ annual sales

leading provider ofsearch applications

Page 4: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.4

what i hope you learn from this talk

the db and ir perspectives have a common thread

convergence may be upon us

but we need interaction to make it work

Page 5: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.5

overview

don't put all your eggs in one basket

design for interaction

human-computer information retrieval

Page 6: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.6

don’t put all your eggs in one basket

Still Life with Basket and Broken Eggs by Michael Edwards, 2008

Page 7: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.7

the db approach: perfection in, perfection out

http://www.storeitfoodsblog.com/category/food-preparation/meat-grinder/

Page 8: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.8

db usability researchers recognize the pain

Page 9: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.9

sql is hard

Making Database Systems Usable[Jagadish et al., SIGMOD 2007]

• labor-intensive query construction

• lengthy query evaluation

• high query reformulation cost

__sql

Page 10: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.10

data sucks and users are lazy

Extracting Problems for Databaseand IR Researchers[Naughton, Spring 2008 North East DB/IR Day]

• real data is– incomplete– inconsistent– incorrect

• users don’t want to learn– data schemas– structured query languages we’re not gonna take it!

Page 11: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.11

the ir way: don’t worry, be happy

http://adsoftheworld.com/media/print/mcdonalds_burger_mysteries

Page 12: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.12

ir for db people: what would google do?

information Need query select from results

rank using IR model

USER:

SYSTEM:tf-idf PageRank

Page 13: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.13

assumptions of relevance-centric ir approach

• self-awareness

• self-expression

• model knows best

• answer is a document

• one-shot query

Page 14: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.14

life is not a batch

• db approach expects too much of user• ir approach expects too much of system

both approaches act as if it allcomes down to a single query

is that your final answer question?

Page 15: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.15

design for interaction

The Future of Social Interaction by Jim Stoten

Page 16: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.16

changes assumptions about what to optimize

recall

pre

cis

ion

complexity relevance

communication

Page 17: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.17

how do we optimize communication?

transparency

control

guidance

Page 18: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.18

ir offers a black box

ca c'est la caisse. le mouton que tu veux est dedans.

Page 19: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.19

db / set retrieval offers 2 out of 3

transparency

control

guidance

Page 20: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.20

but we need it all!

• set retrieval is a failure in the ir world– though quite successful in the db world!

• but ranked retrieval is inherently crippled– no transparency, control, or guidance!

how do we optimize for communication?

Page 21: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.21

human-computer information retrieval

• don’t just guess the user’s intent• increase user responsibility and control• require and reward human intellectual effort

“Toward Human-Computer Information Retrieval”

Gary Marchionini

Page 22: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.22

great idea

how?

Page 23: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.23

treat query construction as a process

A Case for Interaction[Koenemann and Belkin, 1996]

• used term feedback to improve alerting queries

• users select from suggested terms

• 17 – 34% improvement in precision @ 30

• users liked the feedback interface

Page 24: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.24

expose the facets of semistructured content

Page 25: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.25

success in the lab and the field

• favored in user studies by Marti Hearst– http://flamenco.berkeley.edu/

• ubiquitous in ecommerce– amazon.com– eBay– endeca powers 42 of top 100 online retailers

• taking over media, libraries, enterprise, etc.

Page 26: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.26

even a few db folks have drunk the kool-aid

DataGuides[Goldman and Widom, VLDB 1997]

• user-friendly schema summaries

Magnet[Sinha and Karger, SIGMOD 2005]

• navigation and refinement options

common theme: semistructured

Page 27: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.27

what is semistructured data?

• one universe

• self-describing

• blends data / meta-data

Page 28: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.28

data modeling flexibility

• no a-priori schema– integrated sources without up-front schema design

• richer modeling capabilities tame data complexity– hierarchy, multi-valued fields, sparse fields

• schema flexibility eases schema evolution– new entity types, new data source

Databases Content ManagementERP

Groupware and Collaboration

WWW Internet

SOA, ESB,Web ServiceFile Systems

Page 29: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.29

semantically direct queries

<shirt><sku>1234</sku><sleeve>Long</sleeve><desc>Classic end-on-end shirt</desc><price>39.99</price><salePrice>29.99</salePrice><color>Blue</color><color>Yellow</color><color>White</color>...

</shirt> <trousers><sku>1579</sku><price>59.99</price><color>Khaki</color>...

</trousers>

which on-sale itemsare available in blue?

<buyingGuide><title>Selecting the right ski coat for you.</title><file>skiguide.pdf</file><keyword>ski</keyword><keyword>coat</keyword>...

</buyingGuide>

which attributescharacterize on-saleblue items?

price, sleeve,color, salePrice,brand, fabric, …

Page 30: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.30

but let’s make this concrete

Uh oh, I’m presenting atSIGMOD! Better find a good

book about databases!

Page 31: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.31

quick, to the goog-mobile!

not quite…

Page 32: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.32

i know, i’ll go to the library!

#%@$!

Page 33: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.33

let’s try a little hcir…

Page 34: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.34

hcir works for news too

Page 35: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.35

life in a semistructured world

• search is a great starting point– users can’t / won’t initiate structured queries

• ranked lists are an inadequate ending point– search queries are lossy projections of intent

• hcir leads users down a garden path to structure

Page 36: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.36

lots of trade-offs

“everything should be made as simple as possible, but no simpler”

“speed of thought” vs. “going nowhere quickly”

“to err is human, but to really foul things up requires a computer”

simple interfaces don’talways yield satisfaction

Page 37: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.37

users want the triumvirate

• transparency• control• guidance

transparency and control are easy

guidance requires cleverness

Page 38: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.38

in closing

all of us want to help people access information

the best help is to help them help themselves

design for interaction thoughtransparency, control, guidance

Page 39: Design for Interaction

© 2009 Endeca Technologies, Inc. All rights reserved.39

thank you…and come to SIGIR!

communication 1.0email: [email protected]

communication 2.0blog: http://thenoisychannel.com

twitter: http://twitter.com/dtunkelang

SIGIR: July 19-23 in Boston Industry Track on July 22nd!