Clef 2015 Keynote Grefenstette September 8, 2015, Toulouse

Transcript of Clef 2015 Keynote Grefenstette September 8, 2015, Toulouse

Page 1: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Personal Information Systems and Personal Semantics

Gregory Grefenstette

CLEF 2015

September 8, 2015

Page 2: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Information is moving from the Web to Apps

Each person generates a lot of data

Two communities use it now

Search in one’s own data is the future

Four ways to search

We need personal facets

Page 3: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

2015 CLEF 2015 Grefenstette - 3

http://www.statista.com/statistics/263795/number-of-available-apps-in-the-apple-app-store/

Apple announced that 100 billion apps had been downloaded from its App Store (June 2015)

Page 4: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

2014

Page 5: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Another trend

Page 6: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Smart Glasses

http://en.wikipedia.org/wiki/File:A_Google_Glass_wearer.jpg

http://en.wikipedia.org/wiki/File:Aimoneyetap.jpg

http://en.wikipedia.org/wiki/File:Golden-i_3.8_Headset_Computer.png

Sony US Patent Application 20130069850

Microsoft US Patent Application 20120293548

Page 7: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

https://www.youtube.com/watch?v=b7I7JuQXttw

Page 8: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Okay … Apps, Quantified Self, Smart Glasses

Step back to NOW

Page 9: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Personal Big Data

Page 10: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Personal Big Data

Email sent

Email received

Social network posts

IP address location

SMS, chats

Search history

Web pages visited

Media viewed

Credit card purchases

Call data

GPS locations

Vital signs

Activity/inactivity

Lifestyle

Conversations

Reading

People seen

Noises heard

Page 11: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Who uses this data today?

Surely, each person should have the same access to their own data

Page 12: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Impediments to using our own data

•  Data Silos

•  Ownership

•  Privacy

•  Big Data Problems

   •  Variety

   •  Volume

   •  Merging -- Semantics

Page 13: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Supposing we could get all our data back into our own hands, how could we search it?

Short course on 4 types of search

Page 14: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Search Engines – Cranfield/SMART Model

ftp://ftp.cs.cornell.edu/pub/smart/cran

queries

.I 6
.W ventricular septal defect occurring in association with aortic regurgitation
.I 7
.W radioisotopes in heart scanning. mainly used in diagnosis of pericardial effusions. also used to study tumors, heart enlargement, aneurysms and pericardial thickening. technetium, rihsa, radioactive hippurate, cholegraffin are used.
.I 8
.W the effects of drugs on the bone marrow of man and animals, …

qrels (query number, relevant document number)

5 332   5 333
6 112   6 115   6 116   6 118   6 122   6 238   6 239   6 242   6 260   6 309   6 320   6 321   6 323
7 92   7 121   7 189   7 389   7 390   7 391   7 392   7 393
8 52   8 60

documents

conditions .
.I 237
cisternal fluid oxygen ...
using a beckman micro-oxyg..
tension simultaneously in the..
and in arterial blood under..
that the cisternal oxygen..
oxygen tension of the surroun.
the available free oxygen...
duration in the cerebral...
.I 238
ventricular septal defect obstruction .
a case of ventricular...
lesion and infundibular...
coronary cusp of the aortic..
septal defect, was demonstra..
as a polyp-like mass in the...
catheterization and angiocard
ventricular outflow obstr...
.I 239
functional adaptations of the congenital heart disease ....
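The Cranfield/SMART setup pairs a fixed set of queries and documents with relevance judgments (qrels), so that any ranking can be scored offline. The following is a minimal Python sketch of that scoring step, assuming the qrels are whitespace-separated "query number, document number" pairs as they appear on the slide. The function names and the ranked list being scored are hypothetical and do not come from the talk or from the SMART distribution.

```python
from collections import defaultdict


def parse_qrels(text):
    """Read whitespace-separated 'query_id doc_id' pairs into {query_id: set(doc_ids)}."""
    numbers = [int(tok) for tok in text.split()]
    if len(numbers) % 2:
        raise ValueError("qrels must contain an even number of integers")
    relevant = defaultdict(set)
    for query_id, doc_id in zip(numbers[0::2], numbers[1::2]):
        relevant[query_id].add(doc_id)
    return dict(relevant)


def precision_recall(ranked_doc_ids, relevant_doc_ids):
    """Score one result list against the set of documents judged relevant."""
    retrieved = set(ranked_doc_ids)
    hits = len(retrieved & relevant_doc_ids)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant_doc_ids) if relevant_doc_ids else 0.0
    return precision, recall


# Judgments for queries 7 and 8, copied from the slide.
QRELS = parse_qrels("7 92 7 121 7 189 7 389 7 390 7 391 7 392 7 393 8 52 8 60")

# Hypothetical system output for query 7: four documents, two of them judged relevant.
precision, recall = precision_recall([92, 5, 389, 17], QRELS[7])
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With eight documents judged relevant for query 7, the hypothetical run above scores a precision of 0.50 and a recall of 0.25.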

Page 15: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Search Engines – Cranfield/SMART Model

Page 16: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Schedules
3 Economics, Education, Society
33 Economics and Management
338 Industries, Products
338.1 – 338.4 Specific kinds of industries
338.4 Secondary Industries and Services
338.47 Goods and Services

Built from
338.471 – 338.479 Subdivisions for Goods and Services

Schedules
338.476 Technology
338.4767 Manufacturing
338.47677 Textiles
338.476772 Textiles of Seed hair fibres
338.4767721 Cotton

Built from
338.47677210 Facet Indicator for Standard Subdivision

Table 1
338.476772109 Historical, geographic, persons treatment

Built from
338.4767721094 Europe Western Europe

Table 2
338.47677210942 England and Wales
338.476772109427 Northwestern England and Isle of Man
338.4767721094276 Lancashire

“The Lancashire cotton industry : a study in economic development”
Assigned DDC Code: 338.4767721094276

Search Engines – Dewey Decimal Faceted Model
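In the Dewey example, every prefix of the assigned number is itself a facet, so the full code can be read back as a path from the broadest class down to "Lancashire". The Python sketch below illustrates that reading. The captions are copied from the slide and are not a complete DDC table, and the helper names `prefixes` and `facet_path` are hypothetical.

```python
# Captions copied from the slide; this is only the fragment needed for one code.
CAPTIONS = {
    "3": "Economics, Education, Society",
    "33": "Economics and Management",
    "338": "Industries, Products",
    "338.4": "Secondary Industries and Services",
    "338.47": "Goods and Services",
    "338.476": "Technology",
    "338.4767": "Manufacturing",
    "338.47677": "Textiles",
    "338.476772": "Textiles of Seed hair fibres",
    "338.4767721": "Cotton",
    "338.47677210": "Facet Indicator for Standard Subdivision",
    "338.476772109": "Historical, geographic, persons treatment",
    "338.4767721094": "Europe Western Europe",
    "338.47677210942": "England and Wales",
    "338.476772109427": "Northwestern England and Isle of Man",
    "338.4767721094276": "Lancashire",
}


def prefixes(notation):
    """Yield every prefix of a Dewey number, restoring the dot after three digits."""
    digits = notation.replace(".", "")
    for n in range(1, len(digits) + 1):
        head = digits[:n]
        yield head if n <= 3 else f"{head[:3]}.{head[3:]}"


def facet_path(notation, captions=CAPTIONS):
    """Return (number, caption) pairs for each prefix that has a known caption."""
    return [(p, captions[p]) for p in prefixes(notation) if p in captions]


for number, caption in facet_path("338.4767721094276"):
    print(f"{number:<18} {caption}")
```

A search interface built on such a path can broaden a query by truncating the number (for example, everything under 338.4767721, Cotton) without any change to the stored codes.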

Page 17: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Search Engines – Dewey Decimal Faceted Model

Page 18: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

2 Other Search Models: Maps, Time Intervals

Page 19: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Past Attempts

Page 20: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

MyLifeBits

Gemmell, Jim, Gordon Bell, and Roger Lueder. "MyLifeBits: a personal database for everything." Communications of the ACM 49.1 (2006): 88-95.

"But even with convenient classifications and labels ready to apply, we are still asking the user to become a filing clerk – manually annotating every document, email, photo, or conversation."

Page 21: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

LifeLog

"…The user can order the life-log agent to add retrieval keys (annotation) with an arbitrary name by simple operations on his cellular phone while the agent is capturing a life-log video. This enables the agent to identify a scene that the user wants to remember throughout his life, and thus the user can access easily to the videos that were captured during precious experiences"

Aizawa, Kiyoharu, Tetsuro Hori, Shinya Kawasaki, and Takayuki Ishikawa. "Capture and efficient retrieval of life log." In Pervasive 2004 Workshop on Memory and Sharing Experiences, pp. 15-20. 2004.

Page 22: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Stuff I’ve Seen

…Research in cognitive psychology has found that people remember information, particularly older information, not in terms of exact time, but in terms of key episodes, such as a child’s birthday, exotic travel,…

Cutrell, Edward, Susan T. Dumais, and Jaime Teevan. "Searching to eliminate personal information management." Communications of the ACM 49.1 (2006): 58-64

Page 23: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

PERSON

…we define the general category for user’s activity in advance, such as ordinary activity and extra-ordinary activity. In ordinary activity is related to the activity in home or office. Generally, the activities occurred outside of those area, they are classified as extraordinary activities. In addition to these pre-defined activities, users can add their own activity through our learning based structure… For some duration, we record whole activities of user. For the repeated activities at same time, in same place with similar objects, our activity engine will register as user defined activities by asking in which category those can be included.

Kim, Ig-Jae, et al. "PERSON: personalized experience recoding and searching on networked environment." Proceedings of the 3rd ACM workshop on Continuous archival and retrieval of personal experiences. ACM, 2006.

Page 24: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Personal Data Prototype

…Landmarks of tags are defined by the frequency of tags that are assigned to each item of personal data. A tag that has been in heavy use during a period of time is a candidate for a landmark. A tag that has rarely been used during a long period of time is also a candidate for a landmark. Outliers are candidates for landmarks in time-series data, such as home energy use, the number of steps walked, and histories of body weight. Data that exceed pre-defined or user-defined thresholds are also candidates. Other landmarks are public landmarks, which include shocking public news, bestsellers, blockbuster films, and annual rankings of top Web-search words. We can recall our own experiences on those days from these landmarks.

Teraoka, Teruhiko. "Organization and exploration of heterogeneous personal data collected in daily life." Human-Centric Computing and Information Sciences 2.1 (2012): 1-15.
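The landmark idea in the quotation (outliers in a time series, or values past a threshold, become memory cues) can be illustrated with a short Python sketch. The quoted text does not say which outlier test is used, so the z-score rule, the function name `landmark_days`, and the step counts below are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def landmark_days(series, z_cutoff=2.0, threshold=None):
    """Return the days whose value is an outlier or exceeds a user-defined threshold.

    series maps a day label to a numeric reading (steps, energy use, weight, ...).
    """
    values = list(series.values())
    mu, sigma = mean(values), pstdev(values)
    landmarks = []
    for day, value in series.items():
        is_outlier = sigma > 0 and abs(value - mu) / sigma > z_cutoff
        over_threshold = threshold is not None and value > threshold
        if is_outlier or over_threshold:
            landmarks.append(day)
    return landmarks


# Hypothetical daily step counts; one day stands out.
STEPS = {
    "2015-09-01": 6200,
    "2015-09-02": 5900,
    "2015-09-03": 6100,
    "2015-09-04": 6000,
    "2015-09-05": 21500,
    "2015-09-06": 6300,
    "2015-09-07": 5800,
}

print(landmark_days(STEPS))
```

On this sample only 2015-09-05 is flagged by the outlier rule. Passing `threshold=6150` would also flag the two ordinary days that exceed that value.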

Page 25: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Dublin City University

Qiu, Zhengwei. "A lifelogging system supporting multimodal access." PhD diss., Dublin City University, 2013.

Wang, Peng, and Alan F. Smeaton. "Aggregating semantic concepts for event representation in lifelogging." Proceedings of the International Workshop on Semantic Web Information Management. ACM, 2011.

Page 26: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Okay, we’ve seen
-- Apps / QS
-- Personal Big Data
-- Some early attempts

Everyone says
Time is important
Maps are important
String search is important

but… Facets, what are our personal facets? How can we automate them?

Page 27: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

2015 PTraces Grefenstette - 27

swimming

Page 28: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

swimming

(my) people involved in something about swimming

Page 29: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

swimming

things I’ve bought involving swimming

Page 30: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

swimming

(my) photos and facebook posts related to swimming

Page 31: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

swimming

emails about swimming things

Page 32: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

swimming

places I’ve been involving swimming

Page 33: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

swimming

days involving swimming things

Page 34: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

swimming

phone calls about swimming things…

Page 35: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

swimming
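The preceding slides show one query, "swimming", answered separately for each kind of personal data (people, purchases, photos and posts, emails, places, days, phone calls). A minimal Python sketch of that grouping follows. The record format, the sample records, and the function name `facet_by_source` are hypothetical, and plain substring matching stands in for whatever retrieval method a real system would use.

```python
from collections import defaultdict

# Hypothetical personal records: (source type, text).
RECORDS = [
    ("email", "Swimming club fees are due Friday"),
    ("purchase", "Mirrored swimming goggles"),
    ("photo", "Kids at the lake"),
    ("call", "Dentist appointment"),
    ("post", "Great swimming session this morning"),
]


def facet_by_source(records, term):
    """Group the records that mention `term` by the kind of data they come from."""
    term = term.lower()
    groups = defaultdict(list)
    for source, text in records:
        if term in text.lower():
            groups[source].append(text)
    return dict(groups)


for source, texts in facet_by_source(RECORDS, "swimming").items():
    print(source, texts)
```

The photo captioned "Kids at the lake" is not returned, because string matching alone cannot tell that a lake may be a place for swimming.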

Page 36: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Rather Self-Centred, no?

Page 37: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Personal Information System

Personal archives

Induction semantic dimensions

Personal semantic hierarchies

Crowdsourced semantic hierarchies (e.g. Wikipedia)

Expert semantic hierarchies (e.g. MeSH)

Ingest/Annotate/Merge

Page 38: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

swimming

Knitting

poker

Painting

…

Page 39: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Expert >>> Crowdsourcing >>> Personal Ontology Folksonomy Models

Page 40: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Expert >>> Crowdsourcing >>> Personal Models Folksonomy Models

Page 41: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Expert >>> Crowdsourcing >>> Personal Models Folksonomy Models

Page 42: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Knitting>Knitting_methods_for_shaping>Short_row_(knitting)
Knitting>Knitting_stitches
Knitting>Knitting_stitches>List_of_knitting_stitches
Knitting>Knitting_stitches>Basic_knitted_fabrics
Knitting>Knitting_stitches>Decrease_(knitting)
Knitting>Knitting_stitches>Dip_stitch
Knitting>Knitting_stitches>Drop-stitch_knitting
Knitting>Knitting_stitches>Elongated_stitch
Knitting>Knitting_stitches>Fair_Isle_(technique)
Knitting>Knitting_stitches>Grafting_(knitting)
Knitting>Knitting_stitches>Loop_knitting
Knitting>Knitting_stitches>Pick_up_stitches_(knitting)
Knitting>Knitting_stitches>Plaited_stitch_(knitting)
Knitting>Knitting_stitches>Slip-stitch_knitting
Knitting>Knitting_stitches>Yarn_over
Knitting>Knitting_tools_and_materials
Knitting>Knitting_tools_and_materials>Eisaku_Noro_Company
Knitting>Knitting_tools_and_materials>Hank_(textile)
Knitting>Knitting_tools_and_materials>Knitting_machine
Knitting>Knitting_tools_and_materials>Knitting_Nancy
Knitting>Knitting_tools_and_materials>Knitting_needle
Knitting>Knitting_tools_and_materials>Knitting_needle_cap
Knitting>Knitting_tools_and_materials>Lazy_Kate
Knitting>Knitting_tools_and_materials>Liaghra
Knitting>Knitting_tools_and_materials>Nostepinne
Knitting>Knitting_tools_and_materials>Row_counter_(hand_knitting)
Knitting>Knitting_tools_and_materials>Stitch_holder
Knitting>Knitting_tools_and_materials>Stocking_frame
Knitting>Knitting_tools_and_materials>Variegated_yarn
Knitting>Knitting_tools_and_materials>Yarn

Expert >>> Crowdsourcing >>> Personal Models Folksonomy Models
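Category paths written as `A>B>C`, like the knitting paths on this slide, can be folded into a browsable facet tree. The Python sketch below shows one way to do it, using a handful of the paths from the slide. The function name `build_tree` and the nested-dictionary representation are assumptions for illustration, not the structure used in the talk.

```python
def build_tree(paths, sep=">"):
    """Fold 'A>B>C' paths into nested dictionaries, one level per path segment."""
    root = {}
    for path in paths:
        node = root
        for label in path.split(sep):
            node = node.setdefault(label, {})
    return root


# A few of the paths shown on the slide.
PATHS = [
    "Knitting>Knitting_stitches>Yarn_over",
    "Knitting>Knitting_stitches>Dip_stitch",
    "Knitting>Knitting_tools_and_materials>Yarn",
    "Knitting>Knitting_tools_and_materials>Knitting_needle",
]

tree = build_tree(PATHS)
print(sorted(tree["Knitting"]))
```

Each level of the resulting dictionary lists the sub-facets available at that point, which is the information a faceted browsing interface needs to offer its next set of choices.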

Page 43: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Expert >>> Crowdsourcing >>> Personal Models Folksonomy Models

Page 44: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Page 45: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Page 46: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Well, no….

Page 47: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Tweet

Less than 12 hours until I am in the pool crying... thankful for mirrored goggles

Swimming>pool Swimming>goggles

facets

I’d want this …
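The annotation wanted for the tweet (attaching `Swimming>pool` and `Swimming>goggles`) can be approximated by looking each word up in a facet vocabulary. The Python sketch below assumes such a vocabulary already exists. The `FACETS` dictionary and the `annotate` function are hypothetical, and the talk's point is precisely that vocabularies like this are not yet available.

```python
import re

# Hypothetical personal facet vocabulary: surface word -> facet path.
FACETS = {
    "pool": "Swimming>pool",
    "goggles": "Swimming>goggles",
    "fins": "Swimming>fins",
}


def annotate(text, facets=FACETS):
    """Return the facet paths triggered by words in `text`, in order of first mention."""
    found = []
    for word in re.findall(r"[a-z]+", text.lower()):
        facet = facets.get(word)
        if facet and facet not in found:
            found.append(facet)
    return found


TWEET = "Less than 12 hours until I am in the pool crying... thankful for mirrored goggles"
print(annotate(TWEET))
```

For the tweet on the slide this returns the two facets shown there, `Swimming>pool` and `Swimming>goggles`.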

Page 48: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

swimming -- weightlifting, cycling, gymnastics, judo, table, volleyball, archery, rowing, badminton, track, water, taekwondo, tennis, field, diving, handball, boxing, softball, karate, pentathlon, fencing, athletics, triathlon, wrestling, soccer

http://webdocs.cs.ualberta.ca/~lindek/downloads.htm

Distributional Semantics (1.5 billion words)

WordNet

Page 49: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Existing taxonomies are for societal exchanges

Do you want to buy this?

What famous person did this when?

What can we make for this?

We are missing a description of what is related to us, doing something…

specific vocabularies loose taxonomies … facets

Page 50: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Something like…

Sports/swimming/backstroke
Sports/swimming/on my back
Sports/swimming/breaststroke
Sports/swimming/fins
Sports/swimming/goggles
Sports/swimming/fast lane
Sports/swimming/slow lane
Sports/swimming/laps
Sports/swimming/lifeguard
Sports/swimming/pool
Sports/swimming/lake
Sports/swimming/ocean
Sports/swimming/Neuilly Nautic Centre
Sports/swimming/South Hills Pool
Sports/swimming/towel
Sports/swimming/25m
Sports/swimming/cap
Sports/swimming/swim suit

Page 51: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

http://www.notsoboringlife.com/list-of-hobbies/

Not just swimming!

Page 52: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Conclusion on Personal facets

There is a lot of work to do

•  for predictable needs (hobbies, pastimes, sports), we do not have the basic facets we need

•  for personal information (family, friends, familiar places), we have very little

•  And this should be multilingual, too

Page 56: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Conclusion: Searching Personal Big Data

•  Information is moving from the Web into Apps

•  People are generating information in these siloed Apps

•  People generate more digital information every day

•  Wearable computing will create even more

•  At one point, people will want their information back

•  When you have too much information, you need facets

•  The facets for organizing personal information will be needed and do not yet exist

•  There are billions of cell phone users. They will all want this. You should start working on it.

Page 57: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Thank you !

www.inria.fr

Page 58: Clef 2015 Keynote Grefenstette  September 8, 2015, Toulouse

Gurrin, Cathal and Smeaton, Alan F. and Doherty, Aiden R. (2014) LifeLogging: personal big data. Foundations and Trends in Information Retrieval, 8 (1). pp. 1-125. ISSN 1554-0677

Content type            | Per day          | Volume per day | Volume per year
Video                   | 16 hours         | 90 GB          | 33 TB
Autographer Camera      | 3000 images      | 1.3 GB         | 480 GB
Audio                   | 16 hours         | 630 MB         | 230 GB
Microsoft Sensecam      | 4500 images      | 82 MB          | 30 GB
Accelerometer           | 58,000 readings  | 138 KB         | 50 MB
Locations               | 10,000 readings  | 27 KB          | 10 MB
Bluetooth Interactions  | 400 (estimated)  | 5 MB           | 2 GB
Words heard or read     | 100,000          | 700 KB         | 255 MB
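The yearly figures in the table can be checked against the daily ones with a little arithmetic. The Python sketch below assumes decimal units (1 GB = 10^9 bytes) and a 365-day year. Neither assumption is stated on the slide, and the helper names `per_year` and `humanize` are hypothetical.

```python
# Assumption: decimal units, as the slide does not say which convention it uses.
UNIT_BYTES = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def per_year(amount, unit, days=365):
    """Convert a daily volume such as (90, 'GB') into bytes per year."""
    return amount * UNIT_BYTES[unit] * days


def humanize(n_bytes):
    """Format a byte count using the largest unit that keeps the value at or above 1."""
    for unit in ("TB", "GB", "MB", "KB"):
        if n_bytes >= UNIT_BYTES[unit]:
            return f"{n_bytes / UNIT_BYTES[unit]:.2f} {unit}"
    return f"{n_bytes} B"


# Daily volumes for three rows of the table.
DAILY = {"Video": (90, "GB"), "Audio": (630, "MB"), "Accelerometer": (138, "KB")}

for name, (amount, unit) in DAILY.items():
    print(name, humanize(per_year(amount, unit)))
```

Under these assumptions the three rows come to 32.85 TB, 229.95 GB and 50.37 MB per year, which agrees with the rounded values of 33 TB, 230 GB and 50 MB in the table.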