2003.12.02 - SLIDE 1 - IS 202 – FALL 2003
Lecture 23: Interfaces for Information Retrieval II
Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2003
http://www.sims.berkeley.edu/academics/courses/is202/f03/
SIMS 202: Information Organization and Retrieval
2003.12.02 - SLIDE 2 - IS 202 – FALL 2003
Lecture Overview
• Review of Last Time
– Introduction to HCI
– Why Interfaces Don’t Work
– Early Visions: Memex
• Interfaces for Information Retrieval II
• Discussion Questions
• Action Items for Next Time
Credit for some of the slides in this lecture goes to Marti Hearst and Warren Sack
2003.12.02 - SLIDE 5
Human-Computer Interaction (HCI)
• Human
– The end-users of a program
– The others in the organization
– The designers of the program
• Computer
– The machines the programs run on
• Interaction
– The users tell the computers what they want
– The computers communicate results
– The computer may also tell users what the computer wants them to do
2003.12.02 - SLIDE 6 - IS 202 – FALL 2003
Shneiderman’s Design Principles
• Provide informative feedback
• Permit easy reversal of actions
• Support an internal locus of control
• Reduce working memory load
• Provide alternative interfaces for expert and novice users
2003.12.02 - SLIDE 7 - IS 202 – FALL 2003
HCI for IR
• Information seeking is an imprecise process
• UI should aid users in understanding and expressing their information needs
– Help formulate queries
– Select among available information sources
– Understand search results
– Keep track of the progress of their search
2003.12.02 - SLIDE 8
How to Design and Build UIs
• Task analysis
• Rapid prototyping
• Evaluation
• Implementation
Design, Prototype, Evaluate: iterate at every stage!
2003.12.02 - SLIDE 9 - IS 202 – FALL 2003
Evaluation Techniques
• Qualitative vs. quantitative methods
• Qualitative (non-numeric, discursive, ethnographic)
– Focus groups
– Interviews
– Surveys
– User observation
– Participatory design sessions
• Quantitative (numeric, statistical, empirical)
– User testing
– System testing
2003.12.02 - SLIDE 10 - IS 202 – FALL 2003
Why Interfaces Don’t Work
• Because…
– We still think of using the interface
– We still talk of designing the interface
– We still talk of improving the interface
• “We need to aid the task, not the interface to the task.”
• “The computer of the future should be invisible.”
2003.12.02 - SLIDE 11 - IS 202 – FALL 2003
“What Dr. Bush Foresees”
Cyclops Camera: Worn on forehead, it would photograph anything you see and want to record. Film would be developed at once by dry photography.
Microfilm: It could reduce Encyclopaedia Britannica to volume of a matchbox. Material cost: 5¢. Thus a whole library could be kept in a desk.
Vocoder: A machine which could type when talked to. But you might have to talk a special phonetic language to this mechanical supersecretary.
Thinking machine: A development of the mathematical calculator. Give it premises and it would pass out conclusions, all in accordance with logic.
Memex: An aid to memory. Like the brain, Memex would file material by association. Press a key and it would run through a “trail” of facts.
2003.12.02 - SLIDE 12 - IS 202 – FALL 2003
Interaction Paradigms for IR
• Direct manipulation
– Query specification
– Query refinement
– Result selection
• Delegation
– Agents
– Recommender systems
– Filtering
2003.12.02 - SLIDE 14 - IS 202 – FALL 2003
HCI for IR
• Browsing
– Visualizing collections and documents
– Navigating collections and documents
• Searching
– Formulating queries
– Visualizing results
– Navigating results
– Refining queries
– Selecting results
2003.12.02 - SLIDE 15 - IS 202 – FALL 2003
Information Visualization
• Utility
– Inherently visual data
– Making the abstract concrete
– Making the invisible visible
• Techniques
– Icons
– Color highlighting
– Brushing and linking
– Panning and zooming
– Focus-plus-context
– Magic lenses
– Animation
2003.12.02 - SLIDE 16 - IS 202 – FALL 2003
Mapping
• Logical structure of the information
– Hierarchy
– Rank
– Proximity
– Similarity distance
– Term frequency
– History of changes
– Etc.
• Perceptual representation of the information
– Outlines, trees, graphs
– Color, size, shape, distance
– Symbolic icons
– Animation, interaction
– Etc.
2003.12.02 - SLIDE 17 - IS 202 – FALL 2003
Task = Information Access
• The standard interaction model for information access
1) Start with an information need
2) Select a system and collections to search on
3) Formulate a query
4) Send the query to the system
5) Receive the results
6) Scan, evaluate, and interpret the results
7) Stop, or
8) Reformulate the query and go to Step 4
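The numbered steps can be read as a loop. The following is only a minimal sketch of that reading: the function name `search_session`, its callable parameters, the `max_rounds` cutoff, and the toy document list are all assumptions made for illustration, and step 2 (choosing a system and collection) is taken as already done.

```python
from typing import Callable, List, Optional


def search_session(
    need: str,
    formulate: Callable[[str], str],
    run_query: Callable[[str], List[str]],
    satisfied: Callable[[List[str]], bool],
    reformulate: Callable[[str, List[str]], Optional[str]],
    max_rounds: int = 5,
) -> List[str]:
    """Steps 3-8 of the standard model as a loop (all callables are hypothetical)."""
    query = formulate(need)                      # step 3: formulate a query
    results: List[str] = []
    for _ in range(max_rounds):
        results = run_query(query)               # steps 4-5: send query, receive results
        if satisfied(results):                   # steps 6-7: evaluate, then stop
            break
        new_query = reformulate(query, results)  # step 8: reformulate, back to step 4
        if new_query is None:
            break
        query = new_query
    return results


# Toy collection and toy behaviours, invented for this example.
docs = ["memex trails", "tilebars term distribution", "relevance feedback study"]
results = search_session(
    need="feedback",
    formulate=lambda need: need + " xyz",        # a deliberately poor first query
    run_query=lambda q: [d for d in docs if all(t in d for t in q.split())],
    satisfied=lambda r: len(r) > 0,
    reformulate=lambda q, r: " ".join(q.split()[:-1]) or None,  # drop the last term
)
print(results)
```

In this toy run the first query matches nothing, so the loop drops a term and tries again, which is the "go to Step 4" arc of the model.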
2003.12.02 - SLIDE 18 - IS 202 – FALL 2003
HCI Questions for IR
• Where does a user start?
– Faced with a large set of collections, how can a user choose one to begin with?
• How will a user formulate a query?
• How will a user scan, evaluate, and interpret the results?
• How can a user reformulate a query?
2003.12.02 - SLIDE 19 - IS 202 – FALL 2003
HCI for IR: Collection Selection
Question 1: Where does the user start?
2003.12.02 - SLIDE 20 - IS 202 – FALL 2003
Starting Points for Search
• Faced with a prompt or an empty entry form … how to start?
– Lists of sources
– Overviews: clusters, category hierarchies/subject codes, co-citation links
– Examples, Wizards, and Guided Tours
– Automatic source selection
2003.12.02 - SLIDE 21 - IS 202 – FALL 2003
List of Sources
• Have to guess based on the name
• Requires prior exposure/experience
2003.12.02 - SLIDE 23 - IS 202 – FALL 2003
Overviews
• Supervised (manual) category overviews
– Yahoo!
– HiBrowse
– MeSHBrowse
• Unsupervised (automated) groupings
– Clustering
– Kohonen feature maps
2003.12.02 - SLIDE 25 - IS 202 – FALL 2003
Summary: Category Labels
• Advantages
– Interpretable
– Capture summary information
– Describe multiple facets of content
– Domain dependent, and so descriptive
• Disadvantages
– Do not scale well (for organizing documents)
– Domain dependent, so costly to acquire
– May mismatch users’ interests
2003.12.02 - SLIDE 26 - IS 202 – FALL 2003
Text Clustering
• What clustering does
– Finds overall similarities among groups of documents
– Finds overall similarities among groups of tokens
– Picks out some themes, ignores others
• How clustering works
– Cluster entire collection
– Find cluster centroid that best matches the query
– Problems with clustering: it is expensive, and it doesn’t work well
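The "find cluster centroid that best matches the query" step can be sketched as follows. This is an illustration under stated assumptions, not the method of any system named in these slides: the clusters are taken as already computed, documents are plain bags of words, similarity is cosine, and the helper names (`vectorize`, `centroid`, `cosine`, `best_cluster`) and the sample clusters are invented for the example.

```python
import math
from collections import Counter
from typing import Dict, List


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts for one document or query."""
    return Counter(text.lower().split())


def centroid(vectors: List[Counter]) -> Dict[str, float]:
    """Average term weight over the (non-empty) list of vectors in a cluster."""
    total: Counter = Counter()
    for vector in vectors:
        total.update(vector)
    return {term: count / len(vectors) for term, count in total.items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_cluster(query: str, clusters: Dict[str, List[str]]) -> str:
    """Return the name of the cluster whose centroid is closest to the query."""
    q = vectorize(query)
    centroids = {
        name: centroid([vectorize(doc) for doc in members])
        for name, members in clusters.items()
    }
    return max(centroids, key=lambda name: cosine(q, centroids[name]))


# Hypothetical, hand-made clusters; a real system would compute these.
clusters = {
    "databases": ["dbms reliability and recovery", "dbms query optimization"],
    "interfaces": ["search interface design", "interface evaluation with users"],
}
print(best_cluster("interface design", clusters))
```

Even this small version hints at the cost noted on the slide: every centroid depends on every document in its cluster, so the whole collection has to be processed before the first query can be matched.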
2003.12.02 - SLIDE 30 - IS 202 – FALL 2003
Summary: Clustering
• Advantages
– Get an overview of main themes
– Domain independent
• Disadvantages
– Many of the ways documents could group together are not shown
– Not always easy to understand what they mean
– Can’t see what documents are about
– Documents may be forced into one position in semantic space
– Hard to view titles
• Perhaps more suited for pattern discovery
– Problem: often only one view on the space
2003.12.02 - SLIDE 31 - IS 202 – FALL 2003
HCI for IR: Query Formulation
• Question 2: How will a user formulate a query?
2003.12.02 - SLIDE 32 - IS 202 – FALL 2003
Query Specification
• Interaction styles (Shneiderman 97)
– Command language
– Form fill
– Menu selection
– Direct manipulation
– Natural language
• What about gesture, eye-tracking, or implicit inputs like reading habits?
2003.12.02 - SLIDE 33 - IS 202 – FALL 2003
Command-Based Query Specification
• COMMAND ATTRIBUTE value CONNECTOR …
– FIND PA shneiderman AND TW interface
• What are the ATTRIBUTE names?
• What are the COMMAND names?
• What are allowable values?
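To make the COMMAND ATTRIBUTE value CONNECTOR pattern concrete, here is a small parsing sketch. It is a guess at one possible grammar rather than the syntax of any real catalog system: it assumes single-word values, treats attribute codes such as PA and TW as opaque strings, and assumes the connector set AND/OR/NOT (the slide shows only AND). The function name `parse_command` is invented.

```python
from typing import List, Tuple

CONNECTORS = {"AND", "OR", "NOT"}  # assumed set; the slide itself only shows AND


def parse_command(line: str) -> Tuple[str, List[Tuple[str, str]], List[str]]:
    """Split 'COMMAND ATTR value CONNECTOR ATTR value ...' into its parts."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty command")
    command, rest = tokens[0].upper(), tokens[1:]
    clauses: List[Tuple[str, str]] = []
    connectors: List[str] = []
    i = 0
    while i < len(rest):
        if i + 1 >= len(rest):
            raise ValueError(f"attribute {rest[i]!r} has no value")
        clauses.append((rest[i].upper(), rest[i + 1]))  # (attribute, value) pair
        i += 2
        if i < len(rest):
            if rest[i].upper() not in CONNECTORS:
                raise ValueError(f"expected a connector, got {rest[i]!r}")
            connectors.append(rest[i].upper())
            i += 1
            if i >= len(rest):
                raise ValueError("command ends with a connector")
    return command, clauses, connectors


parsed = parse_command("FIND PA shneiderman AND TW interface")
print(parsed)
```

The error branches illustrate the slide's point: a user who does not already know the command names, attribute codes, and allowable values gets little help from a command language beyond a rejection message.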
2003.12.02 - SLIDE 39 - IS 202 – FALL 2003
HCI for IR: Viewing Results
• Question 3: How will a user scan, evaluate, and interpret the results?
2003.12.02 - SLIDE 40 - IS 202 – FALL 2003
Display of Retrieval Results
• Goal
– Minimize time/effort for deciding which documents to examine in detail
• Idea
– Show the roles of the query terms in the retrieved documents, making use of document structure
2003.12.02 - SLIDE 41 - IS 202 – FALL 2003
Putting Results in Context
• Interfaces should
– Give hints about the roles terms play in the collection
– Give hints about what will happen if various terms are combined
– Show explicitly why documents are retrieved in response to the query
– Summarize compactly the subset of interest
2003.12.02 - SLIDE 42 - IS 202 – FALL 2003
Putting Results in Context
• Visualizations of query term distribution
– KWIC, TileBars, SeeSoft, Virtual Shakespeare
• Visualizing shared subsets of query terms
– InfoCrystal, VIBE
• Table of contents as context
– SuperBook, Cha-Cha
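Of the techniques listed, KWIC (keyword in context) is the simplest to show in code. The sketch below is a generic illustration, not the implementation of any system named here; the function name `kwic`, the `width` parameter, the bracket markup, and the sample sentence are assumptions.

```python
from typing import List


def kwic(text: str, term: str, width: int = 3) -> List[str]:
    """Return each occurrence of `term` with up to `width` words on either side."""
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        # Compare case-insensitively and ignore trailing punctuation.
        if word.lower().strip(".,;:!?") == term.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines


text = "Memex would file material by association. Press a key and Memex would run a trail."
hits = kwic(text, "memex", width=2)
for line in hits:
    print(line)
```

Each output line shows why a document matched, which is the "show explicitly why documents are retrieved" goal in its most basic form.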
2003.12.02 - SLIDE 44 - IS 202 – FALL 2003
TileBars
• Graphical representation of term distribution and overlap
• Simultaneously indicate
– Relative document length
– Query term frequencies
– Query term distributions
– Query term overlap
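The four properties listed above can be read off a small grid with one row per query term and one column per document segment. The sketch below is a rough text-only approximation: fixed-size word windows stand in for whatever segmentation an actual TileBars display uses, and the names `tilebar` and `render`, the shading characters, and the sample document are all invented for illustration.

```python
from typing import List


def tilebar(document: str, query_terms: List[str], segment_size: int = 5) -> List[List[int]]:
    """One row per query term, one column per segment; cells hold term counts."""
    words = document.lower().split()
    segments = [words[i:i + segment_size] for i in range(0, len(words), segment_size)]
    return [[segment.count(term.lower()) for segment in segments] for term in query_terms]


def render(grid: List[List[int]]) -> List[str]:
    """Draw each row as text; a darker character means more hits in that segment."""
    shades = " .:#"
    return ["".join(shades[min(count, len(shades) - 1)] for count in row) for row in grid]


doc = ("dbms reliability matters because a dbms must recover "
       "banks audit loans daily and banks report reliability")
grid = tilebar(doc, ["dbms", "reliability"])
for term, row in zip(["dbms", "reliability"], render(grid)):
    print(f"{term:12s}|{row}|")
```

The number of columns reflects relative document length, the shading reflects term frequency, the position of shaded cells reflects distribution, and a column shaded in both rows marks overlap.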
2003.12.02 - SLIDE 45 - IS 202 – FALL 2003
TileBars Example
Query terms: DBMS (Database Systems), Reliability
What roles do they play in retrieved documents?
• Mainly about both DBMS & reliability
• Mainly about DBMS, discusses reliability
• Mainly about, say, banking, with a subtopic discussion on DBMS/Reliability
• Mainly about high-tech layoffs
2003.12.02 - SLIDE 49 - IS 202 – FALL 2003
Other Approaches
• Show how often each query term occurs in sets of retrieved documents
– VIBE (Korfhage ’91)
– InfoCrystal (Spoerri ’94)
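The underlying bookkeeping for this kind of display is a partition of the retrieved documents by which query terms each one contains. The sketch below shows only that grouping step, with no claim about how VIBE or InfoCrystal compute or draw it; the function name `group_by_terms` and the four sample documents are assumptions.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List


def group_by_terms(docs: Dict[str, str], query_terms: List[str]) -> Dict[FrozenSet[str], List[str]]:
    """Map each subset of query terms to the documents containing exactly that subset."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        present = frozenset(t for t in query_terms if t.lower() in words)
        if present:  # documents matching no query term are left out
            groups[present].append(doc_id)
    return dict(groups)


# Hypothetical retrieved set.
docs = {
    "d1": "dbms reliability survey",
    "d2": "dbms tuning guide",
    "d3": "bank reliability report",
    "d4": "high-tech layoffs",
}
groups = group_by_terms(docs, ["dbms", "reliability"])
for terms, members in groups.items():
    print(sorted(terms), members)
```

With n query terms there are up to 2^n - 1 non-empty subsets, so the number of groups to display grows quickly as terms are added.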
2003.12.02 - SLIDE 52 - IS 202 – FALL 2003
Problems with InfoCrystal
• Can’t see proximity or frequency of terms within documents
• Quantities not represented graphically
• More than 4 terms hard to handle
• No help in selecting terms to begin with
2003.12.02 - SLIDE 53 - IS 202 – FALL 2003
Cha-Cha (Chen & Hearst 98)
• Shows “Table-Of-Contents”-like view, like SuperBook
• Focus+Context using hyperlinks to create the TOC
• Integrates Web Site structure navigation with search
2003.12.02 - SLIDE 54 - IS 202 – FALL 2003
HCI for IR: Query Reformulation
• Question 4: How can a user reformulate a query?
2003.12.02 - SLIDE 55 - IS 202 – FALL 2003
Query Reformulation
• Thesaurus expansion
– Suggest terms similar to query terms
• Relevance feedback
– Suggest terms (and documents) similar to retrieved documents that have been judged to be relevant
– “More like this” interaction
2003.12.02 - SLIDE 56 - IS 202 – FALL 2003
Relevance Feedback
• Modify existing query based on relevance judgements
– Extract terms from relevant documents and add them to the query
– And/or re-weight the terms already in the query
• Two main approaches
– Automatic (pseudo-relevance feedback)
– Users select relevant documents
• Users/system select terms from an automatically generated list
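Both operations, adding terms from relevant documents and re-weighting existing query terms, can be sketched in a few lines. The update below is in the spirit of a Rocchio-style formula, which these slides do not name, so treat it as an assumed stand-in: the weights `alpha` and `beta`, the `top_k` cutoff, the function name `expand_query`, and the sample documents are all illustrative choices.

```python
from collections import Counter
from typing import Dict, List


def expand_query(
    query: Dict[str, float],
    relevant_docs: List[str],
    alpha: float = 1.0,
    beta: float = 0.5,
    top_k: int = 2,
) -> Dict[str, float]:
    """Re-weight existing query terms and add the top_k new terms from relevant docs."""
    # Average term frequency across the documents judged relevant.
    totals: Counter = Counter()
    for doc in relevant_docs:
        totals.update(doc.lower().split())
    n = max(len(relevant_docs), 1)
    avg = {term: count / n for term, count in totals.items()}

    # Re-weight the terms already in the query.
    new_query = {t: alpha * w + beta * avg.get(t, 0.0) for t, w in query.items()}

    # Add the strongest new terms (ties broken alphabetically).
    candidates = sorted((t for t in avg if t not in query), key=lambda t: (-avg[t], t))
    for term in candidates[:top_k]:
        new_query[term] = beta * avg[term]
    return new_query


query = {"interface": 1.0}
relevant = ["search interface evaluation", "interface evaluation study"]
updated = expand_query(query, relevant, top_k=1)
print(updated)
```

In the automatic (pseudo-relevance) variant, `relevant_docs` would simply be the top-ranked results; in the user-driven variant, the `candidates` list is what would be shown for the user to pick from.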
2003.12.02 - SLIDE 57 - IS 202 – FALL 2003
Revealing Internals
• Opaque (black box)
– (Like web search engines)
• Transparent
– (See used terms after Relevance Feedback)
• Penetrable
– (Choose suggested terms before Relevance Feedback)
• Which do you think worked best?
2003.12.02 - SLIDE 58 - IS 202 – FALL 2003
Effectiveness Results
• Subjects using Relevance Feedback showed 17%-34% better performance than subjects without Relevance Feedback
• Subjects in the penetrable case did 15% better as a group than those in the opaque and transparent cases
2003.12.02 - SLIDE 59 - IS 202 – FALL 2003
Summary: Relevance Feedback
• Iterative query modification can improve precision and recall for a standing query
• In at least one study, users were able to make good choices by seeing which terms were suggested for Relevance Feedback and selecting among them
• So … “more like this” can be useful!
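As a reminder of what "improve precision and recall" measures, here is a minimal sketch using the standard set-based definitions. The function name `precision_recall` and the document IDs are invented for the example; the numbers are a toy illustration and are not results from the study cited above.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {"d1", "d5"}                                            # toy relevance judgements
before = precision_recall({"d1", "d2", "d3", "d4"}, relevant)      # original query
after = precision_recall({"d1", "d5", "d3", "d4"}, relevant)       # after one feedback round
print(before, after)
```

Here a single reformulation that swaps one non-relevant result for a relevant one raises both measures at once.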
2003.12.02 - SLIDE 60 - IS 202 – FALL 2003
Summary: HCI for IR
• Focus on the task, not the tool
• Be aware of
– User abilities and differences
– Prior work and innovations
– Design guidelines and rules-of-thumb
• Iterate, iterate, iterate
• It is very difficult to design good UIs
• It is very difficult to evaluate search UIs
• Better interfaces in future should produce better IR experiences
2003.12.02 - SLIDE 62 - IS 202 – FALL 2003
Discussion Questions
• Arthur Law on Interfaces for IR
– Using visualization in web information retrieval revealed poor results for navigation. However, this study was conducted in 1998. Are people more accustomed to these tools now with websites such as "http://www.smartmoney.com/marketmap/"? Perhaps this method of navigation will be better for the computer generation and their higher comfort level for using the web.
2003.12.02 - SLIDE 63 - IS 202 – FALL 2003
Discussion Questions
• Arthur Law on Interfaces for IR
– There are various examples of command line approaches and visual approaches. Individuals perform differently with each method, so will the next step involve combining these methods to optimize each person's task of information retrieval? Or will a dominant company, e.g., LexisNexis or Google, enforce one method of doing queries?
2003.12.02 - SLIDE 64 - IS 202 – FALL 2003
Discussion Questions
• Paul Laskowski on Interfaces for IR
– MIR describes at least six sources of contextual information for the documents returned by a query: metadata, term scores, location of terms in each document, combinations of terms present in each document, tables of contents, and hyperlink structure. Which of these sources provides the most help for selecting relevant documents (or does it depend on the task)? Which types of context can help with reformulating a query? In the case of the location of terms, several tools are listed that graphically show where terms are placed in each document. I imagine using this to select documents where the terms appear in the same paragraph. Should this process be automated so that documents score higher when the search terms are near to each other? In what other ways might I use this information?
2003.12.02 - SLIDE 65 - IS 202 – FALL 2003
Discussion Questions
• Brooke Maury on Interfaces for IR
– In chapter 10.7, Hearst discusses an application developed by Kozierok and Maes that keeps track of a user’s activities and makes recommendations based on previous action or situations. What impact does this “assistant/agent” application have on privacy? Is this too heavy a price to pay for achieving a positive human computer exchange or a more successful retrieval? If a system is charged with “looking over the shoulder” of a user, is there an ethical imperative to encrypt that information or otherwise provide safeguards against the misuse or abuse of that information?
2003.12.02 - SLIDE 66 - IS 202 – FALL 2003
Discussion Questions
• Brooke Maury on Interfaces for IR
– The study by Koenemann and Belkin suggests that the most effective systems will allow users total control and access to what information is used for decision-making (They call such applications ‘penetrable.’). The system developed by Kozierok & Maes makes a number of important decisions without input from the user. Should K & M’s application be more ‘penetrable’?
2003.12.02 - SLIDE 67 - IS 202 – FALL 2003
Discussion Questions
• Dan Perkel on Interfaces for IR
– While the web "has suddenly made vast quantities of information available globally" (MIR, 322) some would argue that it also comes at the price of a giant step backwards in terms of interfaces (As one example, compare the functionality of and types of interaction allowed by an email web app such as YahooMail/HotMail with an email client such as Eudora/Outlook/AppleMail). What does this say about the future of visualization techniques for IR? What needs to happen (technically, business-wise, other) for a top search engine to add an interactive visualization component to its search results?
2003.12.02 - SLIDE 68 - IS 202 – FALL 2003
Discussion Questions
• Joseph Hall on Interfaces for IR
– In section 10.9 of MIR: "The field of information visualization needs some new ideas about how to display large, abstract information spaces intuitively." This seems to be the "holy grail" of HCI. Something that can intuitively deal with large information spaces... with feeble human brains providing imperfect queries. For example, a nowhere-near feeble brain and pretty direct query is evidenced by danah boyd's most recent blog entry: turtles all the way down http://www.zephoria.org/thoughts/archives/000889.html#000889 In this blog entry, danah has already queried the state-of-the-art search tool, Google, and unfortunately came across conflicting results.
– While Google can handle large information spaces, sometimes the PageRank algorithm is just not enough. Seeing as humans tend to think in terms of "concentration"[1], what are some of the "penetrable" ways that IR tools could more effectively facilitate the human thought process instead of simply retrieving information?
– [1] An old card game that requires remembering exactly where you saw a certain card for retrieval later.