Haystack: Per-User Information Environment 1999 Conference on Information and Knowledge Management...
-
Upload
heriberto-barlow -
Category
Documents
-
view
217 -
download
0
Transcript of Haystack: Per-User Information Environment 1999 Conference on Information and Knowledge Management...
Haystack: Per-User Information Environment
1999 Conference on Information and Knowledge Management
Eytan Adar et al
Presented by Xiao Hu
CS491CXZ
What is Haystack?
A software for organizing and retrieving personal information
Totally personalized One user, one Haystack
Personal digital bookshelfA prototype
Haystack and IR
IR large corpus precision-recall metric “expert”relevance judge
IF (collaborative filtering) preference for similar
users require explicit user input
Haystack personal collection user’s satisfaction particular user focus on searching specific to one usercan observe user’s
implicit information needs
All Users / Groups of Users A Single User
Haystack Functionality
Automated data gathering Information maximization gathering as much information as possible
Customized information collectionAdaptation to individual query needs
A IR system that adapts to its user
?
General Data Model
Accommodate all information arbitrary pieces of data metadata links between them
Facilitate data growth new data user’s annotation user’s information behavior
A semantic network
full text searching
bibliographic info. Searching
associate searching
adapt to the user
General Data Model (illustration)
Haystack Document
%!PS Postscript …
tie.Body
Glue todays …
http://goose.lcs.mit.edu …
File Type Guess Rule
Postscript …
needle.Binary
tie.Text
needle.Text
needle.Text
needle.Text
needle.Text
tie.Location.URL
tie.file Type.Postscript
tie.Creator
Needle tie Bale
General Data Model (summary)
Inheritance hierarchy Straw needle: primitive information bale: collection of related straws tie: relationship b/w straws
Metadata representationRecursive metadata annotation Interface Haystack to external “services”
Index agents controlling external devices
System Architecture
Database, searching engine an adapter as interface to various engines
Core Haystack system (root server) data model implementation operation-system-like services
Client level services user interface proxy services data augmenting
services
annotation, querying, browsing
observing interaction with external information resources
modifying data, adding links,…
System Architecture (illustration)
“Real” world
Proxy services: WWW, Mail,
Filesystem, etc.
User Event Driven Clients: Type Guesset, Fetch Services, etc.
Internet Services: Java, WWW, Emacs, Command Line, etc. Query
“Helper” Services
Internet Services: Java, WWW, Emacs, Command Line, etc.
Query Query
Haystack Data
Other Info. Sys. IR System Lore DBM
Client Layer
Service Layer
Database Layer
Haystack Root Server
Indexing in Haystack
Straws generate textual informationIR system stores such informationInfo. from each straw will be regarded as
one unit of indexing allows to associate pieces of information
Incrementally indexing whenever a series of changes happen
Information gathering
User’s explicit annotationUser’s behaviors observed by the system
interaction with outside world (www, emails) interaction with Haystack
building query paths – adapting to the user’s style
Analyzing corpus already in Haystack indexing metadata extraction adding links between documents
User’s explicit annotation
Probably the best information sourceMight not be realisticNicer interface to encourage usersHCI studies
Observers
Proxy services WWW, email proxies
Recording webpages the user sees Tracing the path of browsing Recording visiting time ……
Query observer Using query interactions to mold the data model
to the user
Plug in new data
Adding links b/w nodes
Facilitating retrieval
Query Observer
Integrates queries into the data modelQuery straw
a bale, containing query text, rank of docs, …. attached nodes of matched documents annotations from user’s choices relevance
feedback
tuned to a particular user
Query path a chain of query straws in a single searching good for future retrievals:
presenting similar query terms adapting relevance of documents
by reindexing documents with text of the query path
Information gathering
User’s explicit annotationUser’s behaviors observed by the system
interaction with outside world (www, emails) interaction with Haystack
building query paths – adapting to the user’s style
Analyzing corpus already in Haystack indexing metadata extraction adding links between documents
Data augmenting clients
Data driven clients
Data augmenting clients
“Real” world
Proxy services: WWW, Mail,
Filesystem, etc.
User Event Driven Clients: Type Guesset, Fetch Services, etc.
Internet Services: Java, WWW, Emacs, Command Line, etc. Query
“Helper” Services
Internet Services: Java, WWW, Emacs, Command Line, etc.
Query Query
Haystack Data
Other Info. Sys. IR System Lore DBM
Client Layer
Service Layer
Database Layer
Haystack Root Server
Data augmenting clients
digesting existing information, generating new information
Independent but cooperating Fetch clients Type inference clients Extractor clients Field finder clients
Triggered by events: data changes in Haystack
Data augmenting clients: example
Figure 3: Data driven clients in action
Fetch client
Type inference client
Extractor client
(a) (b)
Document Bits
(0010100111…)
URL URL
(http://web.mit.edu…)
Document
(Haystack: A…) URL Document Bits
Type
URL
Document Bits
Type (Postscript)
Document Document
(d) (c)
Document
SummaryA prototype of a personalized information
organization and retrieval systemRelationship with IRGeneral Data Model
graph, straws, …System Architecture
three layers: DB, core, clientsData Gathering
three approaches