Personal Information Search and Discovery
Amélie Marian - Rutgers University - SinFra 2015
Personal data is everywhere
Personal data is exploding
• More and more devices and systems are capturing all parts of our lives:
  – Actively: emails, social media, calendar, contacts…
  – Passively: GPS, records of financial transactions, records of purchases
  – Stealthily: clicks, searches, interactions, TV viewing habits
The time for Personal Information Management Systems is now!
“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.”
- Vannevar Bush, The Atlantic Monthly, 1945
Personal Information Management Challenges
• Data fragmentation
  – Storage, archiving
  – Data integration
  – Data maintenance
  – Synchronization
  – Data quality
• Data ownership
  – Access control
  – Privacy
  – Sharing
• Functionalities
  – Search
  – Knowledge discovery and data mining
  – Internet of things
Saving Personal Data – Old School
Searching Personal Data – Old School…
File cabinet around 1888
Personal Information Management – the Digital Age
% grep PIMS /usr/amelie/presentations
First-generation PIMS – Desktop based
• Storage
  – Archival, safe-keeping
• Organization
  – Structure
  – Different file types
• Finding and re-finding information
  – Different from traditional IR/Web search systems
  – Keyword searches not ideal
Desktop Search Tools
• Google Desktop Search (defunct)
• Apple Spotlight
• Windows Search
These tools use IR-style keyword searches with some metadata filtering.
They lead to frustration when users cannot find information they know they have.
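To make that limitation concrete, here is a minimal Python sketch of IR-style keyword search with metadata filtering, in the spirit of (not taken from) the tools listed above. The `FileRecord` type, the `keyword_search` function, and scoring by the number of matched query terms are hypothetical simplifications. A file is found only if it contains the literal query words, which is what frustrates a user who remembers the context of a document rather than its wording.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FileRecord:
    path: str
    text: str
    file_type: str  # metadata, e.g. "pdf" or "txt"
    modified: date  # metadata: last-modified date


def tokenize(text: str) -> set:
    # Lowercase words with simple surrounding punctuation stripped
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return {w for w in words if w}


def keyword_search(files, query: str, file_type: Optional[str] = None,
                   after: Optional[date] = None) -> list:
    """Rank files by how many query terms they contain (a crude IR score),
    keeping only files that pass the optional metadata filters."""
    terms = tokenize(query)
    hits = []
    for f in files:
        if file_type is not None and f.file_type != file_type:
            continue
        if after is not None and f.modified < after:
            continue
        score = len(terms & tokenize(f.text))
        if score > 0:
            hits.append((score, f.path))
    hits.sort(key=lambda h: (-h[0], h[1]))  # best score first, then by path
    return [path for _, path in hits]


# Hypothetical sample collection
files = [
    FileRecord("talk.pdf", "Personal information search slides", "pdf", date(2015, 6, 1)),
    FileRecord("note.txt", "Search for my keys!", "txt", date(2014, 1, 1)),
]
print(keyword_search(files, "personal search"))  # ['talk.pdf', 'note.txt']
```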
Some Past PIMS projects
• Lifestreams: time-oriented streams
• Stuff I’ve Seen: history of web behavior
• Haystack: uniform data model
• Connections, Seetrieve: task-based organization
• Dataspaces: semantic connections, data integration
• deskWeb: looks at the social network graph
Various uses of:
  – Context
  – Time
  – Social network
Limitations:
  – Limited data integration
  – Local storage
  – Basic functionalities
A changing landscape
Cloud-based model
Heterogeneous data types and formats
Need for richer functionalities
The Future of Personal Information Search
Life-logging
From Memex to MyLifeBits
• Memex: memory index or memory extender
  – Hypertext system envisioned by Vannevar Bush in 1945
  – Compress and store all of one’s books, records, and communications…
  – Provide an “enlarged intimate supplement to one’s memory”
• MyLifeBits
  – Microsoft Research project with Gordon Bell
  – Captures all documents read or produced by Bell, CDs, emails, web pages browsed, phone and instant messaging conversations, etc.
Hypermnesia
Exceptionally exact or vivid memory, especially as associated with certain mental illnesses
For a user: can we live knowing that every word and every move leaves a trace?
For the ecosystem: we cannot store all the data we produce, because storage resources are limited
“Forgetting Is Key to a Healthy Mind”, Scientific American (image: Aaron Goodman)
A key issue is selecting which information to keep
Memory Tasks
• The “five Rs” of memory tasks (Sellen and Whittaker, CACM 2010):
  – Recollecting
  – Reminiscing
  – Retrieving
  – Reflecting
  – Remembering intentions
Recollecting
• Task-based memory process
• Retracing steps to recollect information
  – “Where did I leave my keys?”
  – “When was the last time I saw Pierre?”
• Follow a series of cues to identify information
Need: Connections between memory objects (integration and navigation)
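Recollecting can be pictured as a walk over connected memory objects. A minimal Python sketch, assuming a hypothetical graph in which each memory object (a string id) links to the objects that share a contextual cue with it; the hypothetical `recollect` function does a breadth-first search from a remembered starting point and returns the chain of cues leading to the first object that satisfies the user’s goal.

```python
from collections import deque


def recollect(graph, start, is_target, max_hops=3):
    """Breadth-first walk from `start`, following links between memory
    objects, until an object satisfying `is_target` is found.
    Returns the path of object ids (the chain of cues), or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if is_target(node):
            return path
        if len(path) - 1 >= max_hops:  # do not expand beyond max_hops links
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical memory graph: edges join objects sharing a person, place, or time
graph = {
    "photo:cafe": ["calendar:lunch-with-pierre"],
    "calendar:lunch-with-pierre": ["email:pierre-thanks", "gps:cafe"],
}
print(recollect(graph, "photo:cafe", lambda n: n.startswith("email:")))
# ['photo:cafe', 'calendar:lunch-with-pierre', 'email:pierre-thanks']
```

Breadth-first order returns the shortest chain of cues, which mirrors retracing the fewest steps back to the target memory.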
Reminiscing
• Browsing through past memories to re-live them
• Experience-based (no specific goal in mind)
  – E.g., looking at old photos
Need: Connections between memory objects (integration and navigation)
Retrieving
• Retrieving specific information
  – Files, documents, pictures
  – Data snippets
• Use of metadata
• Can be combined with recollection
Need: Query model, indexes, and search algorithms
Reflecting
• Learning from the past
  – Identify patterns
  – Personal data analysis
• Towards a Personal Knowledge Base (PKB)
  – Individual vs. shared knowledge
  – Privacy concerns
Need: Knowledge Discovery and Mining techniques designed for personal data
Remembering Intentions
• Focus on prospective memory
  – To-do lists
  – Appointment reminders
• Active focus of commercial companies
  – Google Now
  – Notification apps (time- or location-based)
  – Microsoft Personal Agent project?
Need: NLP techniques designed for personal data
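Time- and location-based notification apps reduce to checking trigger conditions against the user’s current context. A minimal Python sketch, assuming a hypothetical `Reminder` type with an optional due time and an optional (latitude, longitude) trigger; the 200 m default radius and the haversine great-circle distance are illustrative choices, not those of any particular product. It does not address the NLP need above, only the triggering step.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


@dataclass
class Reminder:
    text: str
    due: Optional[datetime] = None               # time-based trigger
    place: Optional[Tuple[float, float]] = None  # (lat, lon) location trigger
    radius_m: float = 200.0                      # fire within this distance


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def due_reminders(reminders, now, position):
    """Return the text of every reminder whose time or place trigger fires."""
    fired = []
    for r in reminders:
        time_hit = r.due is not None and now >= r.due
        place_hit = (r.place is not None
                     and haversine_m(position, r.place) <= r.radius_m)
        if time_hit or place_hit:
            fired.append(r.text)
    return fired


# Hypothetical reminders: one time-based, one location-based
reminders = [
    Reminder("Call Pierre", due=datetime(2015, 6, 1, 9, 0)),
    Reminder("Buy bread", place=(40.5008, -74.4474)),
]
print(due_reminders(reminders, datetime(2015, 6, 1, 8, 0), (40.5008, -74.4474)))
# ['Buy bread']
```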
One more wish: Serendipity
• Hearing by chance a song that is going to totally obsess you
• A suggested book that will change your life
• Entering this small restaurant that you will remember forever
This is serendipitous
• A perfect search engine
• A perfect recommendation system
• A perfect computer assistant
Efficient, but not exciting: they lack serendipity
Design programs that help introduce serendipity into our lives, with a focus on the experience
Digital Self Project at Rutgers University
• Personal data is rich in contextual information
  – We remember our data based on contextual cues
• Individualized, context-aware personal information management tool
  – Integrate users’ fragmented data
  – Support personal information search
  – Build a personal knowledge base
Faculty
  – Amélie Marian
  – Thu Nguyen
  – Alex Borgida
Students (past and present)
  – Daniela Vianna
  – Valia Kalokiri
  – Alicia-Michelle Yong
  – Chaolun Xia
Digital Self Architecture
• Data Collection
  – Identification, retrieval, storage
  – Personal Extraction Tool: https://github.com/ameliemarian/DigitalSelf
• Data Integration
  – Multidimensional, context-aware, unified data model
  – w5h Model
• Search
  – Based on the natural memory retrieval process
  – Context-aware, approximate
  – w5h Search
• Knowledge Discovery
  – Find connections and patterns
  – Integrate user behavior and feedback
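Integration amounts to mapping each source’s schema onto one shared model. A minimal Python sketch, assuming hypothetical raw email and calendar records (the field names `subject`, `sender`, `to`, `sent`, `title`, `attendees`, `location`, and `start` are invented for illustration); each record is mapped onto the six w5h dimensions (what, who, where, when, why, how) and the results are merged into one timeline. This is not the actual schema of the Digital Self extraction tool.

```python
from datetime import datetime


def email_to_w5h(msg: dict) -> dict:
    """Map a hypothetical email record onto the six w5h dimensions."""
    return {
        "what": msg["subject"],
        "who": sorted({msg["sender"], *msg["to"]}),
        "where": None,  # emails rarely carry a location
        "when": msg["sent"],
        "why": None,    # intent is not explicit in the raw data
        "how": "email",
    }


def event_to_w5h(event: dict) -> dict:
    """Map a hypothetical calendar event onto the same dimensions."""
    return {
        "what": event["title"],
        "who": sorted(event["attendees"]),
        "where": event.get("location"),
        "when": event["start"],
        "why": None,
        "how": "calendar",
    }


def integrate(emails, events):
    """Merge both sources into one list of w5h records ordered by 'when'."""
    unified = [email_to_w5h(m) for m in emails]
    unified += [event_to_w5h(e) for e in events]
    return sorted(unified, key=lambda r: r["when"])


# Hypothetical sample data
emails = [{"subject": "Draft slides", "sender": "alice", "to": ["bob"],
           "sent": datetime(2015, 5, 20, 14, 0)}]
events = [{"title": "Project meeting", "attendees": ["bob", "carol"],
           "location": "Room 101", "start": datetime(2015, 5, 18, 10, 0)}]
for record in integrate(emails, events):
    print(record["when"], record["how"], record["what"], record["who"])
```

Leaving `why` empty rather than guessing reflects that some dimensions must be inferred later, if at all.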
w5h - Context-aware Data Model
• Personal information is rich in contextual information
  – Metadata
  – Application data
  – Environment knowledge
• Cognitive psychology: contextual cues are strong triggers for autobiographical memories
• Personal information can be modeled and indexed along six dimensions (what, who, where, when, why, and how): the w5h model
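Indexing along the six dimensions lets a query combine whichever cues the user remembers. A minimal Python sketch, assuming a hypothetical `W5HIndex` class that keeps one inverted index per dimension and ranks objects by how many query cues they match, so an object matching only some cues is still returned (a crude form of approximate search). It illustrates per-dimension indexing only and is not the project’s w5h search.

```python
from collections import defaultdict

DIMENSIONS = ("what", "who", "where", "when", "why", "how")


class W5HIndex:
    """Toy index with one inverted index per w5h dimension (hypothetical)."""

    def __init__(self):
        # dimension -> lowercase value -> set of object ids
        self.index = {d: defaultdict(set) for d in DIMENSIONS}

    def add(self, obj_id, record):
        """record maps a dimension name to a list of string values."""
        for dim in DIMENSIONS:
            for value in record.get(dim, []):
                self.index[dim][value.lower()].add(obj_id)

    def search(self, **cues):
        """Score each object by the number of cues it matches, so objects that
        match only some remembered cues are still returned (approximate)."""
        scores = defaultdict(int)
        for dim, value in cues.items():
            for obj_id in self.index[dim].get(value.lower(), ()):
                scores[obj_id] += 1
        # Most cues matched first; ties broken by id for a stable order
        return sorted(scores, key=lambda o: (-scores[o], o))


# Hypothetical memory objects
idx = W5HIndex()
idx.add("e1", {"what": ["slides"], "who": ["Pierre"], "how": ["email"]})
idx.add("p1", {"what": ["photo"], "who": ["Pierre"], "where": ["Paris"], "how": ["phone"]})
idx.add("c1", {"what": ["meeting"], "who": ["Thu"], "where": ["Paris"]})
print(idx.search(who="pierre", where="paris"))  # ['p1', 'c1', 'e1']
```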
Preliminary Results: MRR (Mean Reciprocal Rank)
Bold: statistically significant (p < 0.05)
• w5h: context-aware search over w5h indexes
• Text: MongoDB text index over integrated data
• Solr: text index on raw data
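Mean Reciprocal Rank takes, for each test query, the reciprocal of the rank at which the first relevant result appears (0 if none appears) and averages over all queries: MRR = (1/|Q|) · Σ_i 1/rank_i. A minimal Python sketch of that computation; the hypothetical function name and the example runs below are made up and are not the study’s data.

```python
def mean_reciprocal_rank(ranked_lists, relevant):
    """ranked_lists[i]: ranked result ids returned for query i.
    relevant[i]: set of ids judged relevant for query i.
    Each query contributes 1/rank of its first relevant result, or 0 if none."""
    if not ranked_lists or len(ranked_lists) != len(relevant):
        raise ValueError("expected a non-empty list of queries and one relevance set per query")
    total = 0.0
    for results, rel in zip(ranked_lists, relevant):
        for rank, item in enumerate(results, start=1):
            if item in rel:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


# Made-up example: first relevant hit at rank 1, at rank 2, and never
runs = [["a", "b", "c"], ["d", "e"], ["f"]]
judgments = [{"a"}, {"e"}, {"x"}]
print(mean_reciprocal_rank(runs, judgments))  # 0.5
```

MRR rewards systems that place the one item the user is looking for near the top, which suits re-finding tasks where a single known item is the target.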
Conclusions
• The time for better Personal Information Management is now!
  – Many exciting research challenges
  – Important ethical and societal implications
• The ability to search and recover past information is a critical feature of future PIMS
  – Need to take the specific characteristics of personal information search into account
References
PIMS:
• As We May Think. Vannevar Bush, The Atlantic Monthly, 1945.
• Personal Information Management. W. Jones and J. Teevan, editors. University of Washington Press, 2007.
• Beyond Total Capture: A Constructive Critique of Lifelogging. Sellen and Whittaker, CACM 2010.
• A Tool for Personal Data Extraction. Vianna, Yong, Xia, Marian, and Nguyen, IIWeb 2014.
• Microsoft’s Stuff I’ve Seen project. Dumais et al., SIGIR 2003.
• MyLifeBits. Gemmell, Bell, and Lueder, CACM 2006.
• deskWeb. Zerr et al., SIGIR 2010.
• Connections. Soules and Ganger, SOSP 2005.
• Seetrieve. Gyllstrom and Soules, IUI 2008.
• Lifestreams. Fertig, Freeman, and Gelernter, CHI 1996.
• Haystack. Karger et al., CIDR 2005.
Data Integration:
• A Survey of Approaches to Automatic Schema Matching. Rahm and Bernstein, 2001.
• Principles of Data Integration. Doan, Halevy, and Ives, 2012.
• Principles of Dataspace Systems. Halevy, Franklin, and Maier, CACM 2006.