2003.12.02 - SLIDE 1IS 202 – FALL 2003

Lecture 23: Interfaces for Information Retrieval II

Prof. Ray Larson & Prof. Marc Davis

UC Berkeley SIMS

Tuesday and Thursday 10:30 am - 12:00 pm

Fall 2003

http://www.sims.berkeley.edu/academics/courses/is202/f03/

SIMS 202: Information Organization and Retrieval

2003.12.02 - SLIDE 2IS 202 – FALL 2003

Lecture Overview

• Review of Last Time
– Introduction to HCI
– Why Interfaces Don’t Work
– Early Visions: Memex

• Interfaces for Information Retrieval II

• Discussion Questions

• Action Items for Next Time

Credit for some of the slides in this lecture goes to Marti Hearst and Warren Sack

2003.12.02 - SLIDE 4IS 202 – FALL 2003

“Drawing the Circles”

2003.12.02 - SLIDE 5

Human-Computer Interaction (HCI)

• Human
– The end-users of a program
– The others in the organization
– The designers of the program
• Computer
– The machines the programs run on
• Interaction
– The users tell the computers what they want
– The computers communicate results
– The computer may also tell users what the computer wants them to do

2003.12.02 - SLIDE 6IS 202 – FALL 2003

Shneiderman’s Design Principles

• Provide informative feedback

• Permit easy reversal of actions

• Support an internal locus of control

• Reduce working memory load

• Provide alternative interfaces for expert and novice users

2003.12.02 - SLIDE 7IS 202 – FALL 2003

HCI for IR

• Information seeking is an imprecise process

• UI should aid users in understanding and expressing their information needs
– Help formulate queries
– Select among available information sources
– Understand search results
– Keep track of the progress of their search

2003.12.02 - SLIDE 8

How to Design and Build UIs

• Task analysis

• Rapid prototyping

• Evaluation

• Implementation

[Figure: Design, Prototype, Evaluate]

Iterate at every stage!

2003.12.02 - SLIDE 9IS 202 – FALL 2003

Evaluation Techniques

• Qualitative vs. quantitative methods
• Qualitative (non-numeric, discursive, ethnographic)
– Focus groups
– Interviews
– Surveys
– User observation
– Participatory design sessions
• Quantitative (numeric, statistical, empirical)
– User testing
– System testing

2003.12.02 - SLIDE 10IS 202 – FALL 2003

Why Interfaces Don’t Work

• Because…
– We still think of using the interface
– We still talk of designing the interface
– We still talk of improving the interface

• “We need to aid the task, not the interface to the task.”

• “The computer of the future should be invisible.”

2003.12.02 - SLIDE 11IS 202 – FALL 2003

“What Dr. Bush Foresees”

Cyclops Camera
Worn on forehead, it would photograph anything you see and want to record. Film would be developed at once by dry photography.

Microfilm
It could reduce Encyclopaedia Britannica to volume of a matchbox. Material cost: 5¢. Thus a whole library could be kept in a desk.

Vocoder
A machine which could type when talked to. But you might have to talk a special phonetic language to this mechanical supersecretary.

Thinking machine
A development of the mathematical calculator. Give it premises and it would pass out conclusions, all in accordance with logic.

Memex
An aid to memory. Like the brain, Memex would file material by association. Press a key and it would run through a “trail” of facts.

2003.12.02 - SLIDE 12IS 202 – FALL 2003

Interaction Paradigms for IR

• Direct manipulation
– Query specification
– Query refinement
– Result selection
• Delegation
– Agents
– Recommender systems
– Filtering

2003.12.02 - SLIDE 14IS 202 – FALL 2003

HCI For IR

• Browsing
– Visualizing collections and documents
– Navigating collections and documents
• Searching
– Formulating queries
– Visualizing results
– Navigating results
– Refining queries
– Selecting results

2003.12.02 - SLIDE 15IS 202 – FALL 2003

Information Visualization

• Utility
– Inherently visual data
– Making the abstract concrete
– Making the invisible visible
• Techniques
– Icons
– Color highlighting
– Brushing and linking
– Panning and zooming
– Focus-plus-context
– Magic lenses
– Animation

2003.12.02 - SLIDE 16IS 202 – FALL 2003

Mapping

• Logical structure of the information
– Hierarchy
– Rank
– Proximity
– Similarity distance
– Term frequency
– History of changes
– Etc.
• Perceptual representation of the information
– Outlines, trees, graphs
– Color, size, shape, distance
– Symbolic icons
– Animation, interaction
– Etc.

2003.12.02 - SLIDE 17IS 202 – FALL 2003

Task = Information Access

• The standard interaction model for information access

1) Start with an information need
2) Select a system and collections to search on
3) Formulate a query
4) Send the query to the system
5) Receive the results
6) Scan, evaluate, and interpret the results
7) Stop, or
8) Reformulate the query and go to Step 4
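The numbered steps above can be read as a loop. The following is a minimal sketch of that loop, not anything shown on the slides: the function name `search_session` and its callback parameters (`search`, `is_satisfied`, `reformulate`) are hypothetical stand-ins for the system and the user's judgment, and `max_rounds` is an added safety limit.

```python
from typing import Callable, List


def search_session(
    initial_query: str,
    search: Callable[[str], List[str]],
    is_satisfied: Callable[[List[str]], bool],
    reformulate: Callable[[str, List[str]], str],
    max_rounds: int = 5,
) -> List[str]:
    """Run steps 3 to 8 of the standard model until the user stops."""
    query = initial_query                    # step 3: formulate a query
    results: List[str] = []
    for _ in range(max_rounds):              # hypothetical cap on iterations
        results = search(query)              # steps 4-5: send query, receive results
        if is_satisfied(results):            # steps 6-7: evaluate, then stop
            break
        query = reformulate(query, results)  # step 8: reformulate, back to step 4
    return results
```

Steps 1 and 2 (the information need and the choice of collection) happen before the loop starts, which is why the HCI questions that follow treat "where does a user start?" separately.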

2003.12.02 - SLIDE 18IS 202 – FALL 2003

HCI Questions for IR

• Where does a user start?
– Faced with a large set of collections, how can a user choose one to begin with?

• How will a user formulate a query?

• How will a user scan, evaluate, and interpret the results?

• How can a user reformulate a query?

2003.12.02 - SLIDE 19IS 202 – FALL 2003

HCI for IR: Collection Selection

Question 1: Where does the user start?

2003.12.02 - SLIDE 20IS 202 – FALL 2003

Starting Points for Search

• Faced with a prompt or an empty entry form … how to start?
– Lists of sources
– Overviews
  • Clusters
  • Category Hierarchies/Subject Codes
  • Co-citation links
– Examples, Wizards, and Guided Tours
– Automatic source selection

2003.12.02 - SLIDE 21IS 202 – FALL 2003

List of Sources

• Have to guess based on the name

• Requires prior exposure/experience

2003.12.02 - SLIDE 22IS 202 – FALL 2003

Old Lexis-Nexis Interface

2003.12.02 - SLIDE 23IS 202 – FALL 2003

Overviews

• Supervised (manual) category overviews
– Yahoo!
– HiBrowse
– MeSHBrowse
• Unsupervised (automated) groupings
– Clustering
– Kohonen feature maps

2003.12.02 - SLIDE 24IS 202 – FALL 2003

Yahoo! Interface

2003.12.02 - SLIDE 25IS 202 – FALL 2003

Summary: Category Labels

• Advantages
– Interpretable
– Capture summary information
– Describe multiple facets of content
– Domain dependent, and so descriptive
• Disadvantages
– Do not scale well (for organizing documents)
– Domain dependent, so costly to acquire
– May mismatch users’ interests

2003.12.02 - SLIDE 26IS 202 – FALL 2003

Text Clustering

• What clustering does
– Finds overall similarities among groups of documents
– Finds overall similarities among groups of tokens
– Picks out some themes, ignores others
• How clustering works
– Cluster entire collection
– Find cluster centroid that best matches the query
– Problems with clustering
  • It is expensive
  • It doesn’t work well
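The second step under "How clustering works" (matching a query against cluster centroids) can be sketched as below. This is only an illustration under stated assumptions: the clusters are taken as already computed and non-empty, documents are represented by raw term counts, and similarity is cosine similarity. The slides do not say which representation or similarity measure was used, and all function names here are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_vector(text: str) -> Counter:
    """Raw term-frequency vector for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def centroid(docs: List[str]) -> Dict[str, float]:
    """Average term-frequency vector of a non-empty cluster."""
    total: Counter = Counter()
    for doc in docs:
        total.update(tf_vector(doc))
    return {term: count / len(docs) for term, count in total.items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_cluster(query: str, clusters: Dict[str, List[str]]) -> str:
    """Name of the cluster whose centroid is most similar to the query."""
    q = tf_vector(query)
    return max(clusters, key=lambda name: cosine(q, centroid(clusters[name])))
```

Even in this toy form the cost concern on the slide is visible: every centroid depends on every document in its cluster, so the whole collection has to be processed before the first query can be matched.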

2003.12.02 - SLIDE 27IS 202 – FALL 2003

Scatter/Gather Interface

2003.12.02 - SLIDE 28IS 202 – FALL 2003

“ThemeScapes” Clustering

2003.12.02 - SLIDE 29IS 202 – FALL 2003

Kohonen Feature Maps on Text

2003.12.02 - SLIDE 30IS 202 – FALL 2003

Summary: Clustering

• Advantages
– Get an overview of main themes
– Domain independent
• Disadvantages
– Many of the ways documents could group together are not shown
– Not always easy to understand what they mean
– Can’t see what documents are about
– Documents may be forced into one position in semantic space
– Hard to view titles
• Perhaps more suited for pattern discovery
– Problem: often only one view on the space

2003.12.02 - SLIDE 31IS 202 – FALL 2003

HCI for IR: Query Formulation

• Question 2: How will a user formulate a query?

2003.12.02 - SLIDE 32IS 202 – FALL 2003

Query Specification

• Interaction styles (Shneiderman 97)
– Command language
– Form fill
– Menu selection
– Direct manipulation
– Natural language

• What about gesture, eye-tracking, or implicit inputs like reading habits?

2003.12.02 - SLIDE 33IS 202 – FALL 2003

Command-Based Query Specification

• COMMAND ATTRIBUTE value CONNECTOR …
– FIND PA shneiderman AND TW interface

• What are the ATTRIBUTE names?

• What are the COMMAND names?

• What are allowable values?
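The three questions above are exactly what a parser for such a command language has to hard-code. The sketch below parses the pattern COMMAND ATTRIBUTE value CONNECTOR … under assumptions that go beyond the slide: only `FIND`, `PA`, and `TW` are taken from the example, the connector set `AND`/`OR`/`NOT` is a guess (the slide shows `AND` only), and the function name `parse_command` is hypothetical.

```python
from typing import List, Tuple

COMMANDS = {"FIND"}                # only FIND appears on the slide
ATTRIBUTES = {"PA", "TW"}          # only these two codes appear on the slide
CONNECTORS = {"AND", "OR", "NOT"}  # assumption: the slide shows AND only

Clause = Tuple[str, str, str]      # (leading connector, attribute, value)


def parse_command(text: str) -> Tuple[str, List[Clause]]:
    """Parse 'COMMAND ATTRIBUTE value CONNECTOR ATTRIBUTE value ...'."""
    tokens = text.split()
    if not tokens or tokens[0].upper() not in COMMANDS:
        raise ValueError("unknown command")
    command = tokens[0].upper()
    clauses: List[Clause] = []
    connector = ""                 # the first clause has no leading connector
    i = 1
    while i < len(tokens):
        attribute = tokens[i].upper()
        if attribute not in ATTRIBUTES:
            raise ValueError(f"unknown attribute: {tokens[i]}")
        i += 1
        value: List[str] = []
        while i < len(tokens) and tokens[i].upper() not in CONNECTORS:
            value.append(tokens[i])
            i += 1
        if not value:
            raise ValueError(f"missing value for {attribute}")
        clauses.append((connector, attribute, " ".join(value)))
        if i < len(tokens):        # consume the connector before the next clause
            connector = tokens[i].upper()
            i += 1
            if i == len(tokens):
                raise ValueError("dangling connector")
    return command, clauses
```

A user who types an attribute code that is not in the table gets only an error, which illustrates why command languages demand prior knowledge of the names and allowable values.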

2003.12.02 - SLIDE 34IS 202 – FALL 2003

Form-Based Query Specification

2003.12.02 - SLIDE 35IS 202 – FALL 2003

Form-Based Query Specification

2003.12.02 - SLIDE 36IS 202 – FALL 2003

Direct Manipulation Query Specification

2003.12.02 - SLIDE 37IS 202 – FALL 2003

Menu-Based Query Specification

2003.12.02 - SLIDE 38IS 202 – FALL 2003

Natural Language Query

• AskJeeves
– http://www.ask.com/

2003.12.02 - SLIDE 39IS 202 – FALL 2003

HCI for IR: Viewing Results

• Question 3: How will a user scan, evaluate, and interpret the results?

2003.12.02 - SLIDE 40IS 202 – FALL 2003

Display of Retrieval Results

• Goal
– Minimize time/effort for deciding which documents to examine in detail
• Idea
– Show the roles of the query terms in the retrieved documents, making use of document structure

2003.12.02 - SLIDE 41IS 202 – FALL 2003

Putting Results in Context

• Interfaces should
– Give hints about the roles terms play in the collection
– Give hints about what will happen if various terms are combined
– Show explicitly why documents are retrieved in response to the query
– Summarize compactly the subset of interest

2003.12.02 - SLIDE 42IS 202 – FALL 2003

Putting Results in Context

• Visualizations of query term distribution
– KWIC, TileBars, SeeSoft, Virtual Shakespeare
• Visualizing shared subsets of query terms
– InfoCrystal, VIBE
• Table of contents as context
– SuperBook, Cha-Cha

2003.12.02 - SLIDE 43IS 202 – FALL 2003

KWIC (Keyword in Context)
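The KWIC slide is image-only in this transcript. As a rough illustration of the idea, the sketch below shows each occurrence of a keyword with a few words of surrounding text. The function name `kwic`, the `width` parameter, and the bracket markup are all hypothetical choices, not taken from the slide.

```python
from typing import List


def kwic(text: str, keyword: str, width: int = 3) -> List[str]:
    """Return one context line per occurrence of keyword in text."""
    words = text.split()
    lines: List[str] = []
    for i, word in enumerate(words):
        # Compare case-insensitively and ignore surrounding punctuation.
        if word.lower().strip(".,;:!?\"'") == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines
```

For example, calling `kwic(text, "would", width=2)` on the Memex description quoted earlier in the lecture yields lines such as `Memex [would] file material`.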

2003.12.02 - SLIDE 44IS 202 – FALL 2003

TileBars

• Graphical representation of term distribution and overlap
• Simultaneously indicate
– Relative document length
– Query term frequencies
– Query term distributions
– Query term overlap
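The four properties listed above can all be read off a small grid with one row per query term and one column per document segment. The sketch below builds such a grid as an illustration only: it assumes fixed-size word windows as segments (a simplification chosen here, not something the slide specifies), and the names `tilebar`, `overlap`, and `render` are hypothetical.

```python
from typing import Dict, List


def tilebar(text: str, terms: List[str], segment_size: int = 5) -> Dict[str, List[int]]:
    """Count each query term per segment; the row length reflects document length."""
    words = [w.lower().strip(".,;:!?") for w in text.split()]
    segments = [words[i:i + segment_size] for i in range(0, len(words), segment_size)]
    return {term: [seg.count(term.lower()) for seg in segments] for term in terms}


def overlap(rows: Dict[str, List[int]]) -> List[bool]:
    """True for each segment in which every query term occurs."""
    return [all(column) for column in zip(*rows.values())]


def render(rows: Dict[str, List[int]]) -> str:
    """Text rendering: darker characters stand for higher term frequency."""
    shades = " .:#"
    return "\n".join(
        f"{term:>12} |" + "".join(shades[min(count, 3)] for count in counts) + "|"
        for term, counts in rows.items()
    )
```

Two rows that are dark in the same columns indicate passages that discuss both terms together, which is the overlap pattern the example slides contrast with documents that mention the terms only in passing.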

2003.12.02 - SLIDE 45IS 202 – FALL 2003

TileBars Example

Query terms: DBMS (Database Systems), Reliability

What roles do they play in retrieved documents?

• Mainly about both DBMS & reliability

• Mainly about DBMS, discusses reliability

• Mainly about, say, banking, with a subtopic discussion on DBMS/Reliability

• Mainly about high-tech layoffs

2003.12.02 - SLIDE 46IS 202 – FALL 2003

TileBars Example

2003.12.02 - SLIDE 47IS 202 – FALL 2003

SeeSoft (Eick & Wills 95)

2003.12.02 - SLIDE 48IS 202 – FALL 2003

David Small: Virtual Shakespeare

2003.12.02 - SLIDE 49IS 202 – FALL 2003

Other Approaches

• Show how often each query term occurs in sets of retrieved documents
– VIBE (Korfhage ’91)
– InfoCrystal (Spoerri ’94)
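The quantity behind displays of this kind is a count of retrieved documents for each combination of query terms. A minimal sketch of that computation follows. It assumes whitespace tokenization and exact term matching, and the function name `term_subset_counts` is hypothetical. It does not reproduce the layout of either system.

```python
from collections import Counter
from typing import List


def term_subset_counts(docs: List[str], terms: List[str]) -> Counter:
    """Count documents by the exact subset of query terms they contain."""
    counts: Counter = Counter()
    for doc in docs:
        words = set(doc.lower().split())
        present = frozenset(t for t in terms if t.lower() in words)
        if present:  # documents containing no query term are skipped
            counts[present] += 1
    return counts
```

With n query terms there are up to 2^n - 1 non-empty subsets to display, which is one way to see why more than a handful of terms becomes hard to handle.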

2003.12.02 - SLIDE 50IS 202 – FALL 2003

VIBE (Olson et al. 93, Korfhage 93)

2003.12.02 - SLIDE 51IS 202 – FALL 2003

InfoCrystal (Spoerri 94)

2003.12.02 - SLIDE 52IS 202 – FALL 2003

Problems with InfoCrystal

• Can’t see proximity or frequency of terms within documents

• Quantities not represented graphically

• More than 4 terms hard to handle

• No help in selecting terms to begin with

2003.12.02 - SLIDE 53IS 202 – FALL 2003

Cha-Cha (Chen & Hearst 98)

• Shows “Table-Of-Contents”-like view, like SuperBook

• Focus+Context using hyperlinks to create the TOC

• Integrates Web Site structure navigation with search

2003.12.02 - SLIDE 54IS 202 – FALL 2003

HCI for IR: Query Reformulation

• Question 4: How can a user reformulate a query?

2003.12.02 - SLIDE 55IS 202 – FALL 2003

Query Reformulation

• Thesaurus expansion
– Suggest terms similar to query terms
• Relevance feedback
– Suggest terms (and documents) similar to retrieved documents that have been judged to be relevant
– “More like this” interaction

2003.12.02 - SLIDE 56IS 202 – FALL 2003

Relevance Feedback

• Modify existing query based on relevance judgements
– Extract terms from relevant documents and add them to the query
– And/or re-weight the terms already in the query
• Two main approaches
– Automatic (pseudo-relevance feedback)
– Users select relevant documents
  • Users/system select terms from an automatically generated list
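Both modifications above (adding terms and re-weighting existing ones) can be sketched in a few lines. The version below is a simplified Rocchio-style update that uses positive feedback only. The slide does not name a specific formula, so the weighting scheme, the default values of `alpha`, `beta`, and `top_k`, and the function name `expand_query` are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def expand_query(
    query: Dict[str, float],
    relevant_docs: List[str],
    alpha: float = 1.0,
    beta: float = 0.75,
    top_k: int = 3,
) -> Dict[str, float]:
    """Re-weight query terms and add the top_k most frequent new terms."""
    term_counts: Counter = Counter()
    for doc in relevant_docs:
        term_counts.update(doc.lower().split())
    n = max(len(relevant_docs), 1)  # avoid division by zero with no feedback

    # Candidate expansion terms: most frequent terms not already in the query.
    candidates = [t for t, _ in term_counts.most_common() if t not in query][:top_k]

    new_query = {term: alpha * weight for term, weight in query.items()}
    for term in list(query) + candidates:
        new_query[term] = new_query.get(term, 0.0) + beta * term_counts[term] / n
    return new_query
```

In the automatic (pseudo-relevance) approach, `relevant_docs` would simply be the top-ranked results. In the user-driven approaches, the `candidates` list is what would be shown to the user for selection before the query is rerun.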

2003.12.02 - SLIDE 57IS 202 – FALL 2003

Revealing Internals

• Opaque (black box)
– (Like web search engines)
• Transparent
– (See used terms after Relevance Feedback)
• Penetrable
– (Choose suggested terms before Relevance Feedback)

• Which do you think worked best?

2003.12.02 - SLIDE 58IS 202 – FALL 2003

Effectiveness Results

• Subjects using Relevance Feedback showed 17% - 34% better performance than without Relevance Feedback

• Subjects in the penetrable case did 15% better as a group than those in the opaque and transparent cases

2003.12.02 - SLIDE 59IS 202 – FALL 2003

Summary: Relevance Feedback

• Iterative query modification can improve precision and recall for a standing query

• In at least one study, users were able to make good choices by seeing which terms were suggested for Relevance Feedback and selecting among them

• So … “more like this” can be useful!

2003.12.02 - SLIDE 60IS 202 – FALL 2003

Summary: HCI for IR

• Focus on the task, not the tool
• Be aware of
– User abilities and differences
– Prior work and innovations
– Design guidelines and rules-of-thumb
• Iterate, iterate, iterate
• It is very difficult to design good UIs
• It is very difficult to evaluate search UIs
• Better interfaces in future should produce better IR experiences

2003.12.02 - SLIDE 62IS 202 – FALL 2003

Discussion Questions

• Arthur Law on Interfaces for IR
– Using visualization in web information retrieval revealed poor results for navigation. However, this study was conducted in 1998. Are people more accustomed to these tools now with websites such as "http://www.smartmoney.com/marketmap/"? Perhaps this method of navigation will be better for the computer generation and their higher comfort level for using the web.

2003.12.02 - SLIDE 63IS 202 – FALL 2003


• Arthur Law on Interfaces for IR
– There are various examples of command line approaches and visual approaches. Individuals perform differently with each method, so will the next step involve combining these methods to optimize each person's task of information retrieval? Or will a dominant company, e.g., LexisNexis or Google, enforce one method of doing queries?

2003.12.02 - SLIDE 64IS 202 – FALL 2003


• Paul Laskowski on Interfaces for IR
– MIR describes at least six sources of contextual information for the documents returned by a query: metadata, term scores, location of terms in each document, combinations of terms present in each document, tables of contents, and hyperlink structure. Which of these sources provides the most help for selecting relevant documents (or does it depend on the task)? Which types of context can help with reformulating a query? In the case of the location of terms, several tools are listed that graphically show where terms are placed in each document. I imagine using this to select documents where the terms appear in the same paragraph. Should this process be automated so that documents score higher when the search terms are near to each other? In what other ways might I use this information?

2003.12.02 - SLIDE 65IS 202 – FALL 2003


• Brooke Maury on Interfaces for IR
– In chapter 10.7, Hearst discusses an application developed by Kozierok and Maes that keeps track of a user’s activities and makes recommendations based on previous action or situations. What impact does this “assistant/agent” application have on privacy? Is this too heavy a price to pay for achieving a positive human computer exchange or a more successful retrieval? If a system is charged with “looking over the shoulder” of a user, is there an ethical imperative to encrypt that information or otherwise provide safeguards against the misuse or abuse of that information?

2003.12.02 - SLIDE 66IS 202 – FALL 2003


• Brooke Maury on Interfaces for IR
– The study by Koenemann and Belkin suggests that the most effective systems will allow users total control and access to what information is used for decision-making (they call such applications ‘penetrable’). The system developed by Kozierok & Maes makes a number of important decisions without input from the user. Should K & M’s application be more ‘penetrable’?

2003.12.02 - SLIDE 67IS 202 – FALL 2003


• Dan Perkel on Interfaces for IR
– While the web "has suddenly made vast quantities of information available globally" (MIR, 322), some would argue that it also comes at the price of a giant step backwards in terms of interfaces (as one example, compare the functionality of and types of interaction allowed by an email web app such as YahooMail/HotMail with an email client such as Eudora/Outlook/AppleMail). What does this say about the future of visualization techniques for IR? What needs to happen (technically, business-wise, other) for a top search engine to add an interactive visualization component to its search results?

2003.12.02 - SLIDE 68IS 202 – FALL 2003


• Joseph Hall on Interfaces for IR
– In section 10.9 of MIR: "The field of information visualization needs some new ideas about how to display large, abstract information spaces intuitively." This seems to be the "holy grail" of HCI: something that can intuitively deal with large information spaces... with feeble human brains providing imperfect queries. For example, a nowhere-near feeble brain and pretty direct query is evidenced by danah boyd's most recent blog entry: turtles all the way down http://www.zephoria.org/thoughts/archives/000889.html#000889 In this blog entry, danah has already queried the state-of-the-art search tool, Google, and unfortunately came across conflicting results.
– While Google can handle large information spaces, sometimes the PageRank algorithm is just not enough. Seeing as humans tend to think in terms of "concentration"[1], what are some of the "penetrable" ways that IR tools could more effectively facilitate the human thought process instead of simply retrieving information?
– [1] An old card game that requires remembering exactly where you saw a certain card for retrieval later.

2003.12.02 - SLIDE 70IS 202 – FALL 2003

Next Time

• Wishter DEMO!

• Final Exam Review
