HUMAN COMPUTATION IN THE LINKED DATA MANAGEMENT LIFE CYCLE
ELENA SIMPERL
UNIVERSITY OF SOUTHAMPTON
7/18/2013
1st PRELIDA workshop
HUMAN COMPUTATION
Outsourcing to humans the tasks that machines find difficult to solve (accuracy, efficiency, costs)
SEMANTIC TECHNOLOGIES ARE ALL ABOUT AUTOMATION
…but many tasks rely on human input
• Modeling a domain
• Integrating data sources originating from different contexts
• Producing semantic markup for various types of digital artifacts
• ...
DIMENSIONS OF HUMAN COMPUTATION SYSTEMS
• What: tasks that require basic human skills
• How: distribution, coordination, aggregation
• Quality: closed vs. open answers; ground truth; quantitative vs. qualitative; who is the evaluator?
• Optimize!: incentives; reduce problem size; task assignment
GAMES WITH A PURPOSE (GWAP)
Human computation disguised as casual games
Tasks are divided into parallelizable atomic units (challenges) solved (consensually) by players
Game models
• Single vs. multi-player
• Selection agreement vs. input agreement vs. inversion-problem games
MICROTASK CROWDSOURCING
Similar types of tasks, but a different incentives model (monetary reward, PPP)
Successfully applied to transcription, classification, content generation, data collection, image tagging, website feedback, usability tests…
THE SAME, BUT DIFFERENT
• Tasks leveraging common human skills, appealing to large audiences
• Selection of domain and task more constrained in games to create typical UX
• Tasks decomposed into smaller units of work to be solved independently
• Complex workflows
  • Creating a casual game experience vs. patterns in microtasks
• Quality assurance
  • Synchronous interaction in games
  • Levels of difficulty and near-real-time feedback in games
  • Many methods applied in both cases (redundancy, votes, statistical techniques)
• Different set of incentives and motivators
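The redundancy and voting methods mentioned above can be sketched as follows. This is a minimal illustration, not a method taken from the slides: the function name `majority_vote`, the `min_agreement` threshold and the sample votes are all assumptions made for the example.

```python
from collections import Counter


def majority_vote(answers, min_agreement=0.5):
    """Aggregate redundant answers for one task.

    Returns (label, share). The label is None when the most frequent
    answer does not exceed the (hypothetical) agreement threshold.
    """
    if not answers:
        return None, 0.0
    label, count = Counter(answers).most_common(1)[0]
    share = count / len(answers)
    if share <= min_agreement:
        return None, share  # no clear consensus, e.g. ask more contributors
    return label, share


# Invented sample data: each task was answered by several players/workers.
votes = {
    "task-1": ["yes", "yes", "no"],
    "task-2": ["yes", "no"],
}
results = {task: majority_vote(answers) for task, answers in votes.items()}
```

Tasks that come back with `None` could be reissued to additional contributors; statistical techniques that weight contributors by reliability would replace the plain count.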
HYBRID SYSTEMS
[Diagram relating the physical world (people and devices) to the virtual world (network of social interactions); labels: design and composition, participation and data supply, model of social interaction]
Dave Robertson
EXAMPLE: HYBRID DATA INTEGRATION

paper             conf
Data integration  VLDB-01
Data mining       SIGMOD-02

title         author  email   venue
OLAP          Mike    mike@a  ICDE-02
Social media  Jane    jane@b  PODS-05

Generate plausible matches
– paper = title, paper = author, paper = email, paper = venue
– conf = title, conf = author, conf = email, conf = venue
Ask users to verify
– Does attribute paper match attribute author?
– Yes / No / Not sure

[McCann, Shen, Doan, ICDE 2008]
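The two steps of the example (generate plausible matches, then ask users to verify) can be sketched in a few lines. This is an illustration under assumptions, not the implementation from the cited paper: every attribute pair is treated as plausible, and the helper names `candidate_matches` and `verification_question` are made up for the sketch.

```python
from itertools import product

# Attribute names taken from the example tables above.
source_attrs = ["paper", "conf"]
target_attrs = ["title", "author", "email", "venue"]


def candidate_matches(source, target):
    """Enumerate every source/target attribute pair as a plausible match."""
    return list(product(source, target))


def verification_question(src, tgt):
    """Phrase one candidate match as a closed question for a user."""
    return f"Does attribute {src} match attribute {tgt}?"


candidates = candidate_matches(source_attrs, target_attrs)
questions = [verification_question(s, t) for s, t in candidates]
```

A real system would likely prune the candidate list with an automatic matcher before asking anyone, so that users only see the pairs worth verifying.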
EXAMPLES FROM THE LINKED DATA WORLD
WHAT IS DIFFERENT ABOUT SEMANTIC SYSTEMS?
Semantic Web tools vs. applications
• Intelligent (specialized) Web sites (portals) with improved (local) search based on vocabularies and ontologies
• X2X integration (often combined with Web services)
• Knowledge representation, communication and exchange
TASKS NAMED IN METHODOLOGIES ARE TOO HIGH-LEVEL
Crowdsource very specific tasks that are (highly) divisible
• Labeling (in different languages)
• Finding relationships
• Populating the ontology
• Aligning and interlinking
• Ontology-based annotation
• Validating the results of automatic methods
• …
Think about the context of the application (social structure) and about how to hide tasks behind existing practices and tools
TASTE IT! TRY IT!
• Restaurant review Android app developed in the Insemtives project
• Uses DBpedia concepts to generate structured reviews
• Uses mechanism design/gamification to configure incentives
• User study
  • 2274 reviews by 180 reviewers referring to 900 restaurants, using 5667 DBpedia concepts
https://play.google.com/store/apps/details?id=insemtives.android&hl=en
[Figure: bar chart by venue type (cafe, fast food, pub, restaurant) showing the number of reviews, the number of semantic annotations (type of cuisine) and the number of semantic annotations (dishes)]
LODREFINE
http://research.zemanta.com/crowds-to-the-rescue/
DBPEDIA CURATION
http://aksw.org/Projects/TripleCheckMate.html
CROWDMAP
Experiments using MTurk, CrowdFlower and established benchmarks
Enhancing the results of automatic techniques
Fast, accurate, cost-effective
[Sarasua, Simperl, Noy, ISWC2012]

                      PRECISION  RECALL
CartP 301-304         0.53       1.0
100R50P Edas-Iasted   0.8        0.42
100R50P Ekaw-Iasted   1.0        0.7
100R50P Cmt-Ekaw      1.0        0.75
100R50P ConfOf-Ekaw   0.93       0.65
Imp 301-304           0.73       1.0
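For readers unfamiliar with the two measures, the sketch below shows how precision and recall are typically computed for a set of crowd-validated correspondences against a reference alignment. The correspondences are invented placeholders (the `a:` and `b:` names do not come from the benchmarks above), and the function is a generic illustration rather than the evaluation code used in the experiments.

```python
def precision_recall(found, reference):
    """Precision and recall of a found alignment against a reference alignment."""
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical reference alignment between two ontologies "a" and "b".
reference = {
    ("a:Paper", "b:Article"),
    ("a:Author", "b:Writer"),
    ("a:Review", "b:Review"),
    ("a:Person", "b:Person"),
    ("a:Chair", "b:SessionChair"),
}

# Hypothetical crowd output: three correct correspondences and one wrong one.
crowd = {
    ("a:Paper", "b:Article"),
    ("a:Author", "b:Writer"),
    ("a:Review", "b:Review"),
    ("a:Person", "b:Organization"),
}

precision, recall = precision_recall(crowd, reference)
```

With these placeholder sets, three of the four crowd correspondences are correct and three of the five reference correspondences are found.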
ONTOLOGY POPULATION
LINKED DATA CURATION
PROBLEMS AND CHALLENGES
• What is feasible and how can tasks be optimally translated into microtasks?
  • Examples: data quality assessment for technical and contextual features; subjective vs. objective tasks (also in modeling); open-ended questions
• What to show to users
  • Natural language descriptions of Linked Data/SPARQL
  • How much context
  • What form of rendering
  • How about links?
• How to combine with automatic tools
  • Which results to validate
    • Low precision (no fun for gamers...)
    • Low recall (vs. all possible questions)
• How to embed it into an existing application
  • Tasks are fine-grained, perceived as an additional burden on top of the actual functionality
• What to do with the resulting data?
  • Integration into existing practices
  • Vocabularies!
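One way to approach the "what to show to users" question is to render a triple as a natural-language question, falling back to the local name of an IRI when no label is known. The sketch below is only an assumption about how this could look: the helper names, the question template and the example triple are illustrative and not taken from any of the tools mentioned in these slides.

```python
def local_name(iri):
    """Derive a readable name from the last segment of an IRI."""
    tail = iri.rstrip("/#")
    for sep in ("#", "/"):
        if sep in tail:
            tail = tail.rsplit(sep, 1)[1]
    return tail.replace("_", " ")


def triple_to_question(subject, predicate, obj, labels=None):
    """Render one triple as a yes/no verification question.

    `labels` is an optional (hypothetical) IRI-to-label lookup; IRIs
    without a label fall back to their local name.
    """
    labels = labels or {}

    def name(iri):
        return labels.get(iri, local_name(iri))

    return f'Is it correct that "{name(subject)}" has {name(predicate)} "{name(obj)}"?'


# Illustrative triple; whether it exists in any dataset is not claimed here.
question = triple_to_question(
    "http://dbpedia.org/resource/University_of_Southampton",
    "http://dbpedia.org/ontology/city",
    "http://dbpedia.org/resource/Southampton",
)
```

How much surrounding context to add (types, descriptions, links to the source page) is exactly the open design choice the list above raises; the template here shows none.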