Hasler2014
![Page 1: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/1.jpg)
Hasler Stiftung SmartWorld Workshop, June 19, 2014, Thun — Switzerland
Reclaim your Digital Life
![Page 2: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/2.jpg)
Motivation (1/3)
Commoditization of digital equipment
■ Desktops, laptops, netbooks, mobile phones, tablets, e-book readers, set-top boxes, personal GPSs, digital cameras, TVs, etc.
Fragmentation of information across devices
![Page 3: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/3.jpg)
Motivation (2/3)
The story of my life...
■ Where are the pictures of my niece’s birthday?
■ How should I consolidate/back up my emails?
Fortunately there’s the cloud, right?
![Page 4: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/4.jpg)
Motivation (3/3)
2014 twist on Personal Information Management: lifelogging, health-monitoring
■ Everylog, Memoto, Google Glasses, Nike's FuelBand, FitBit, Samsung GearFit & competitors...
➡ Urgent need to index & integrate continuous personal feeds for automated processing
![Page 5: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/5.jpg)
Problem Definition
Personal digital information is today fragmented and externalized
➡ “Each site is a silo, walled off from the others…” [TBL 10.2010]
■ Data partitioning
■ Loss of governance
How shall one automatically reclaim and meaningfully organize his/her digital information dispersed online and on various devices to generate useful digital memories?
![Page 6: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/6.jpg)
MEM0R1ES...
...a highly available, secure, scalable, and semantically rich platform to extract, preserve, integrate and expose personal information for a smarter world
![Page 7: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/7.jpg)
the MEM0R1ES Team
Prof. Dr. Philippe Cudré-Mauroux
Prof. Dr. Karl Aberer
Prof. Dr. Maria Sokhn
Julien Tscherrig
Joël Dumoulin
Michele Catasta
Dr. Gianluca Demartini
Alberto Tonon
![Page 8: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/8.jpg)
Last Year…
Device & Service Wrappers [EIA-FR]
■ Generic Wrapper Architecture: SMTP, Gmail, Google Drive, Facebook, DBPedia, Flickr, LinkedIn
■ Browser wrapper [EPFL]: lifelogging rich features (context, user activities and focus, etc.) from the browser
Storage Infrastructure
■ Multi-purpose, declarative & elastic storage layer [UNIFR]
![Page 9: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/9.jpg)
Result from the Digital Reclaiming
➡ Heterogeneous Graphs of Entities
Information duplication
Sometimes with different facets
Missing information
![Page 10: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/10.jpg)
Today’s Focus
Meaningful information integration from heterogeneous graphs of entities
1. Entity Search (AOR)
2. Entity Typing (TRank)
3. Entity Clustering (ZenCrowd, MemorySense, PREDIct)
4. Entity Elicitation (Transactive Search)
Use-case: leveraging digital mem0r1es from a conference participation (demonstrators)
![Page 11: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/11.jpg)
1. Entity Search [UNIFR]
Main idea: combine unstructured and structured search to find relevant entities in the graph
■ Inverted index to locate first candidates
■ Graph queries to refine the results
  ■ Graph traversals (queries on object properties)
  ■ Graph neighborhoods (queries on data type properties)
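The two-step idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's actual implementation: the toy entities, the property names, the one-hop expansion and the `damping` factor are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy graph: entity id -> textual label.
LABELS = {
    "e1": "ISWC 2013 conference Sydney",
    "e2": "TRank paper entity types",
    "e3": "Sydney opera house photo",
    "e4": "Alberto Tonon",
}
# Hypothetical object-property edges: subject -> [(property, object)].
EDGES = {
    "e2": [("presentedAt", "e1"), ("author", "e4")],
    "e3": [("takenDuring", "e1")],
}

def build_inverted_index(labels):
    """Map each lowercase token to the set of entities whose label contains it."""
    index = defaultdict(set)
    for entity, text in labels.items():
        for token in text.lower().split():
            index[token].add(entity)
    return index

def keyword_candidates(index, query):
    """Step 1: score entities by the number of query tokens they match."""
    scores = defaultdict(float)
    for token in query.lower().split():
        for entity in index.get(token, ()):
            scores[entity] += 1.0
    return dict(scores)

def expand_with_graph(scores, edges, damping=0.5):
    """Step 2: one-hop traversal; neighbours of a candidate get a damped share."""
    refined = defaultdict(float, scores)
    for subject, links in edges.items():
        for _prop, obj in links:
            if obj in scores:
                refined[subject] += damping * scores[obj]
            if subject in scores:
                refined[obj] += damping * scores[subject]
    # Highest score first, ties broken by entity id for determinism.
    return sorted(refined.items(), key=lambda kv: (-kv[1], kv[0]))

def search(query):
    index = build_inverted_index(LABELS)
    return expand_with_graph(keyword_candidates(index, query), EDGES)

if __name__ == "__main__":
    for entity, score in search("iswc conference"):
        print(entity, score)
```

In this toy setting the keyword step only finds the conference entity; the graph step then surfaces the paper and the photo that are linked to it, which plain text matching would have missed.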
![Page 12: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/12.jpg)
1. Entity Search
➡ up to 25% MAP improvement over BM25!
![Page 13: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/13.jpg)
2. Entity Typing [UNIFR+EPFL]
Entities can have many types (facets)
■ Which fine-grained types are most relevant given the context?
[Figure: type hierarchy of an example entity, from Thing, Agent and Person down to fine-grained types: Living People, American Billionaires, People from King County, People from Seattle, Windows People, American People of Scottish Descent, Harvard University People, American Computer Programmers, American Philanthropists]
![Page 14: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/14.jpg)
2. Entity Typing
Integrates BigData types from the Web of data
■ Tree of 447’260 types
■ Rooted on <owl:Thing>
■ Depth of 19
Ranks relevant types by analyzing the context
■ Textual context
■ Graph context
■ Decision trees
■ Linear regression
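As a rough illustration of context-based type ranking, the sketch below scores each candidate type with a linear combination of two features: its depth in the type tree and its token overlap with the surrounding text. The tiny hierarchy, the two features and the weights `w_depth` and `w_text` are assumptions made for this example; they are not the features or the learned model used by TRank.

```python
# Hypothetical fragment of a type tree rooted on Thing (child -> parent).
PARENT = {
    "Thing": None,
    "Agent": "Thing",
    "Person": "Agent",
    "American Computer Programmers": "Person",
    "American Philanthropists": "Person",
}

def depth(type_name, parent=PARENT):
    """Number of edges between a type and the root of the tree."""
    d = 0
    while parent[type_name] is not None:
        type_name = parent[type_name]
        d += 1
    return d

def text_overlap(type_name, context):
    """Fraction of the type label's tokens that occur in the textual context."""
    type_tokens = set(type_name.lower().split())
    context_tokens = set(context.lower().split())
    return len(type_tokens & context_tokens) / len(type_tokens)

def rank_types(candidates, context, w_depth=0.3, w_text=0.7):
    """Rank types by a weighted sum of normalised depth and text overlap."""
    max_depth = max(depth(t) for t in candidates) or 1
    scored = [
        (w_depth * depth(t) / max_depth + w_text * text_overlap(t, context), t)
        for t in candidates
    ]
    return [t for _score, t in sorted(scored, key=lambda st: (-st[0], st[1]))]

if __name__ == "__main__":
    context = "he donated billions as one of the leading american philanthropists"
    for t in rank_types(list(PARENT), context):
        print(t)
```

Deeper types are favoured because they are more specific, and the textual feature then separates fine-grained types that sit at the same depth.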
![Page 15: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/15.jpg)
3. Entity Clustering
Several efforts to cluster entities into meaningful groups depending on context:
PREDIct [EIA-FR]
■ Extracts Web information through wrappers
■ Models topics through Latent Dirichlet Allocation
■ Predictions based on topic trends
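A trend-based prediction can be illustrated with a simple least-squares extrapolation. The sketch assumes that a topic model such as Latent Dirichlet Allocation has already produced, for each time period, the share of documents assigned to each topic; the topic names, the numbers and the linear trend model are hypothetical and only stand in for whatever PREDIct actually uses.

```python
def linear_forecast(series):
    """Fit y = a + b*x by least squares on x = 0..n-1 and return the value at x = n.

    Requires at least two observations.
    """
    n = len(series)
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, series)) / var_x
    intercept = mean_y - slope * mean_x
    return intercept + slope * n

# Hypothetical per-period topic shares (e.g. one value per conference edition).
TOPIC_SHARES = {
    "crowdsourcing": [0.10, 0.20, 0.30],
    "linked data": [0.50, 0.40, 0.30],
}

def rising_topics(shares):
    """Topics whose forecast share exceeds their last observed share."""
    return sorted(
        topic for topic, series in shares.items()
        if linear_forecast(series) > series[-1]
    )

if __name__ == "__main__":
    for topic, series in TOPIC_SHARES.items():
        print(topic, round(linear_forecast(series), 3))
    print("rising:", rising_topics(TOPIC_SHARES))
```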
![Page 16: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/16.jpg)
3. Entity Clustering
MemorySense [EPFL]
■ Clusters mobile data into macro-activities
■ Leverages location, machine-learning and an activity ontology
B-hist [UNIFR+EPFL+EIA-FR]
■ Better browser history clustering through entity typing and machine-learning
![Page 17: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/17.jpg)
4. Entity Elicitation [EPFL+UNIFR]
Filling the gaps in mem0r1es entity graphs
■ e.g., ‘who also attended WWW03 last year?’
■ Traditional methods (Web crawling, machine-learning, micro-task crowdsourcing) are insufficient
  ■ Errors and lack of discriminative features (➘precision)
  ■ Lack of public data (➘recall)
![Page 18: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/18.jpg)
4. Entity Elicitation
Adapting the concept of transactive memories (group memories) from psychology
➡Transactive search methods to elicit information
■ Social network analysis (to direct the search)
■ Crowdsourcing (to get the information)
■ 46% improvement (F1) over best alternative
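The combination of the two ingredients can be sketched as follows: the social network decides whom to ask, and the answers collected from those people are aggregated by voting. Everything specific here is an assumption for illustration: the contact list, the tie strengths, the event label, the `ask` callback (a stand-in for a real crowdsourcing task) and the vote threshold are hypothetical, and this is not the Hippocampus implementation.

```python
from collections import Counter

# Hypothetical social network: contact -> tie strength and events attended.
CONTACTS = {
    "alice": {"tie": 0.9, "events": {"WWW2013", "ISWC2013"}},
    "bob": {"tie": 0.4, "events": {"WWW2013"}},
    "carol": {"tie": 0.8, "events": {"SIGIR2012"}},
}

def rank_contacts(contacts, event):
    """Direct the search: keep contacts who were at the event, strongest ties first."""
    relevant = [
        (info["tie"], name)
        for name, info in contacts.items()
        if event in info["events"]
    ]
    return [name for _tie, name in sorted(relevant, key=lambda tn: (-tn[0], tn[1]))]

def elicit(contacts, event, ask, k=2, min_votes=2):
    """Get the information: ask the top-k contacts and keep answers with enough votes.

    `ask(name, event)` is a hypothetical callback returning the people that
    `name` remembers seeing at `event`.
    """
    votes = Counter()
    for name in rank_contacts(contacts, event)[:k]:
        votes.update(set(ask(name, event)))
    return sorted(person for person, n in votes.items() if n >= min_votes)

if __name__ == "__main__":
    canned = {"alice": ["dave", "erin"], "bob": ["dave"]}
    print(elicit(CONTACTS, "WWW2013", lambda name, event: canned.get(name, [])))
```

Requiring agreement between several respondents trades some recall for precision; lowering `min_votes` to 1 would accept every single recollection.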
![Page 19: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/19.jpg)
Demo
Use-case on scientific conference memories
Based on 4 demonstrators:
■ Visualizing clustered mobile data (MemorySense)
■ Information elicitation through Transactive Search (Hippocampus)
■ Browsing clustered Web history (B-hist)
■ Clustering and prediction of topics based on extracted information (PREDIct)
![Page 20: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/20.jpg)
Dissemination (1)
Papers at top research venues:
■ Alberto Tonon, Gianluca Demartini, Philippe Cudré-Mauroux: Combining inverted indices and structured search for ad-hoc object retrieval. SIGIR 2012.
■ Alberto Tonon, Michele Catasta, Gianluca Demartini, Philippe Cudré-Mauroux, Karl Aberer: TRank, Ranking Entity Types Using the Web of Data. International Semantic Web Conference ISWC 2013.
■ Michele Catasta, Alberto Tonon, Djellel Eddine Difallah, Gianluca Demartini, Karl Aberer, Philippe Cudré-Mauroux: Hippocampus, answering memory queries using transactive search. WWW 2014.
■ Michele Catasta, Alberto Tonon, Vincent Pasquier, Gianluca Demartini, Karl Aberer, Philippe Cudré-Mauroux: B-hist, Better Entity-Centric Search over Personal Web Browsing History. International Semantic Web Conference ISWC 2014.
■ Michele Catasta, Alberto Tonon, Gianluca Demartini, Jean-Eudes Ranvier, Karl Aberer, Philippe Cudré-Mauroux: B-hist, Entity-Centric Search over Personal Web Browsing History. Journal of Web Semantics, 2014 (to appear).
■ Michele Catasta, Alberto Tonon, Djellel Eddine Difallah, Gianluca Demartini, Karl Aberer, Philippe Cudré-Mauroux: TransactiveDB: Tapping into Collective Human Memories. PVLDB, 2014 (in revision).
■ Julien Tscherrig, Philippe Cudré-Mauroux, Elena Mugellini, Omar Abou Khaled, Maria Sokhn: SemantiConverter: A Flexible Framework to Convert Semi-Structured Data into RDF. Submitted for publication.
![Page 21: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/21.jpg)
Dissemination (2)
Android app on Google Play
Open-source release of most components
■ https://github.com/MEM0R1ES
ISWC 2013 Best-Paper Award nominee (TRank)
Semantic Web Challenge 2013 Finalist (B-hist)
Wall Street Journal mention (B-hist, 30.10.2013)
Technology transfer
■ Extracting entities (Google Zurich)
■ MemorySense (Samsung)
■ TRank (Yahoo!)
Start-up (?)
![Page 22: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/22.jpg)
Current Research Directions
Modelling tail-entities
Transactive DB operator
Automatic capture of important memories
■ Google Glasses
Software integration
![Page 23: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/23.jpg)
Conclusions
Exciting project
■ Important, timely societal issues
■ Fundamental research questions
  ■ Data Storage, Data Integration, Data Clustering, Data Elicitation
Stimulating collaboration
■ Involving 3 (4) institutions
➡ Thanks to all partners for their contributions!
A number of tangible results already
■ Open-source software components
■ Publications at top research venues
■ Industry transfer
![Page 24: Hasler2014](https://reader033.fdocuments.in/reader033/viewer/2022052822/554e8c55b4c90526358b4b25/html5/thumbnails/24.jpg)
Thanks a lot for your attention,
… and many thanks to the Hasler Stiftung for funding this project!
Questions?