CrowdFill: Collecting Structured Data from the Crowd
Hyunjung Park, Jennifer Widom
Stanford University
Goal
•Collect high-quality structured data from the crowd, while capping total monetary cost and keeping latency low
6/25/2014 Hyunjung Park 2
name   | nationality | position | caps | goals
       | Brazil      |          |      |
Messi  |             | FW       |      |
Klose  | Germany     |          | 133  |
Traditional Microtask-based Approach
1. Decompose the data collection task into a set of microtasks, e.g.,
   “What position does Klose play?”
   “How many goals has Messi scored?”
2. Each worker provides specific pieces of data via microtasks
3. Assemble the collected pieces of data into the final table
CrowdFill’s Table-filling Approach
1. Present an entire partially-filled table to all participating workers
2. Each worker contributes what they know to the table by filling in empty cells, and voting on data entered by others
3. Propagate worker actions in real-time to synchronize the table across all workers
Outline
•Formal model
•Overall architecture
•Concurrent operations
•Satisfying values constraint
•Compensation scheme
•Experimental evaluation
•Related work
Formal Model: Schema
•Table Specification
  - Column definitions and primary key
  - SoccerPlayer(name, nationality, position, caps, goals)
•Scoring Function
  - Accept a row r if and only if $f(u_r, d_r) > 0$, where $u_r$ and $d_r$ are its upvote and downvote counts
  - e.g., “majority of three or more”:

$$f(u_r, d_r) = \begin{cases} u_r - d_r & \text{if } u_r + d_r \ge 2 \\ 0 & \text{otherwise} \end{cases}$$
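The scoring rule on this slide can be sketched in a few lines of Python. The names `score`, `accepted`, and `min_votes` are illustrative and not from the slides; the default threshold of 2 is taken from the formula as shown.

```python
def score(upvotes: int, downvotes: int, min_votes: int = 2) -> int:
    """Net vote count once at least `min_votes` votes are in, else 0."""
    if upvotes + downvotes >= min_votes:
        return upvotes - downvotes
    return 0


def accepted(upvotes: int, downvotes: int) -> bool:
    """A row is accepted if and only if its score is strictly positive."""
    return score(upvotes, downvotes) > 0
```

Under this sketch a row with 2 upvotes and 0 downvotes is accepted, while a row with a single upvote is not, because it has too few votes to score above zero.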
Formal Model: Constraints
•Values Constraint
  - Final table S must “match” template T (a partially-filled table)
•Cardinality Constraint
  - Final table S must contain at least N rows
  - Special case of values constraint

Template T:
name   | nationality | position
       | Argentina   |
       |             | FW

A final table S that matches T:
name   | nationality | position
Messi  | Argentina   | FW
Rooney | England     | FW
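One way to read “match”, assumed here rather than quoted from the slides, is that every template row must be paired with a distinct row of S that agrees with it on all of the template row's non-empty cells. Under that assumption, checking the values constraint is a bipartite matching problem. The sketch below uses a plain augmenting-path search; the function names and the dictionary row representation are hypothetical.

```python
from typing import Dict, List, Optional, Set

Row = Dict[str, Optional[str]]


def row_matches(template_row: Row, row: Row) -> bool:
    """True if `row` agrees with every non-empty cell of `template_row`."""
    return all(row.get(col) == val
               for col, val in template_row.items() if val is not None)


def satisfies_template(final_rows: List[Row], template: List[Row]) -> bool:
    """True if each template row can be paired with its own final row."""
    assigned: Dict[int, int] = {}  # final-row index -> template-row index

    def try_assign(t: int, seen: Set[int]) -> bool:
        for s, row in enumerate(final_rows):
            if s in seen or not row_matches(template[t], row):
                continue
            seen.add(s)
            # Take a free row, or re-route the template row that holds it.
            if s not in assigned or try_assign(assigned[s], seen):
                assigned[s] = t
                return True
        return False

    return all(try_assign(t, set()) for t in range(len(template)))


template = [
    {"name": None, "nationality": "Argentina", "position": None},
    {"name": None, "nationality": None, "position": "FW"},
]
final = [
    {"name": "Messi", "nationality": "Argentina", "position": "FW"},
    {"name": "Rooney", "nationality": "England", "position": "FW"},
]
print(satisfies_template(final, template))      # both template rows covered
print(satisfies_template(final[:1], template))  # one row cannot cover both
```

In this reading, a cardinality constraint of N rows is a template of N entirely empty rows, which is why it is a special case of the values constraint.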
Formal Model: Candidate Table
•Candidate table R
  - Exposed to clients
  - Primary key not enforced
  - Each row annotated with its upvote and downvote counts

name    | nationality | position | upvotes | downvotes
Messi   | Argentina   | FW       | 2       | 0
Ronaldo | Portugal    | FW       | 3       | 0
Ronaldo | Portugal    | MF       | 2       | 1
Neymar  | Brazil      |          | 0       | 1
Formal Model: Operations
•Primitive Operations on R
  - Insert a new empty row into R
  - Fill in an empty column of a row with a value
  - Upvote a complete row
  - Downvote a non-empty row

Example, starting from this candidate table:
name    | nationality | position | upvotes | downvotes
Messi   | Argentina   | FW       | 2       | 0
Ronaldo | Portugal    | FW       | 3       | 0
Ronaldo | Portugal    | MF       | 2       | 1
Neymar  | Brazil      |          | 0       | 1

A new row is then added and changed by one operation at a time (the four rows above stay unchanged):
operation        | name  | nationality | position | upvotes | downvotes
insert           |       |             |          | 0       | 0
fill name        | Klose |             |          | 0       | 0
fill nationality | Klose | Germany     |          | 0       | 0
fill position    | Klose | Germany     | DF       | 0       | 0
upvote           | Klose | Germany     | DF       | 1       | 0
downvote         | Klose | Germany     | DF       | 1       | 1
downvote         | Klose | Germany     | DF       | 1       | 2
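The four primitive operations and the Klose example can be replayed with a small single-process sketch. The class and method names below are hypothetical, and the sketch ignores concurrency entirely; how the real system represents and merges operations is described in the paper, not here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

COLUMNS = ("name", "nationality", "position")


@dataclass
class CandidateRow:
    values: Dict[str, Optional[str]] = field(
        default_factory=lambda: {c: None for c in COLUMNS})
    upvotes: int = 0
    downvotes: int = 0

    def is_complete(self) -> bool:
        return all(v is not None for v in self.values.values())

    def is_empty(self) -> bool:
        return all(v is None for v in self.values.values())


class CandidateTable:
    def __init__(self) -> None:
        self.rows: List[CandidateRow] = []

    def insert(self) -> int:
        """Insert a new empty row and return its index."""
        self.rows.append(CandidateRow())
        return len(self.rows) - 1

    def fill(self, i: int, column: str, value: str) -> None:
        """Fill in an empty column of row i."""
        if self.rows[i].values[column] is not None:
            raise ValueError("only empty cells can be filled")
        self.rows[i].values[column] = value

    def upvote(self, i: int) -> None:
        """Upvote row i, which must be complete."""
        if not self.rows[i].is_complete():
            raise ValueError("only complete rows can be upvoted")
        self.rows[i].upvotes += 1

    def downvote(self, i: int) -> None:
        """Downvote row i, which must be non-empty."""
        if self.rows[i].is_empty():
            raise ValueError("only non-empty rows can be downvoted")
        self.rows[i].downvotes += 1


# Replay the example: insert, three fills, one upvote, two downvotes.
table = CandidateTable()
k = table.insert()
table.fill(k, "name", "Klose")
table.fill(k, "nationality", "Germany")
table.fill(k, "position", "DF")
table.upvote(k)
table.downvote(k)
table.downvote(k)
print(table.rows[k])
```

The preconditions mirror the slide: upvotes need a complete row, while a downvote is allowed as soon as a row has any value.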
Formal Model: Final Table
•Final table S
  - Derived from candidate table R
  - Contains each complete row r in R such that
    - $f(u_r, d_r) > 0$, and
    - $f(u_r, d_r)$ is the highest score of any row with the same primary key as r

Candidate table R:
name    | nationality | position | upvotes | downvotes
Messi   | Argentina   | FW       | 2       | 0
Ronaldo | Portugal    | FW       | 3       | 0
Ronaldo | Portugal    | MF       | 2       | 1
Neymar  | Brazil      |          | 0       | 1
Klose   | Germany     | DF       | 1       | 2

Final table S:
name    | nationality | position
Messi   | Argentina   | FW
Ronaldo | Portugal    | FW
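The derivation of S from R can be sketched as a single pass over the candidate rows. Several details are assumptions for illustration only: the primary key is taken to be `name`, ties between equally scored rows keep the first one seen, and the scoring function uses the threshold of 2 from the earlier formula.

```python
from typing import Dict, List, Optional, Tuple

Values = Dict[str, Optional[str]]
CandidateRow = Tuple[Values, int, int]  # (values, upvotes, downvotes)


def score(upvotes: int, downvotes: int, min_votes: int = 2) -> int:
    """Net vote count once at least `min_votes` votes are in, else 0."""
    return upvotes - downvotes if upvotes + downvotes >= min_votes else 0


def final_table(candidate: List[CandidateRow],
                key: Tuple[str, ...] = ("name",)) -> List[Values]:
    """Keep, per primary key, the best complete row with a positive score."""
    best: Dict[Tuple[Optional[str], ...], Tuple[int, Values]] = {}
    for values, up, down in candidate:
        if any(v is None for v in values.values()):
            continue  # incomplete rows never reach the final table
        s = score(up, down)
        if s <= 0:
            continue  # row not accepted by the scoring function
        k = tuple(values[c] for c in key)
        if k not in best or s > best[k][0]:
            best[k] = (s, values)
    return [values for _, values in best.values()]


def row(name, nationality, position) -> Values:
    return {"name": name, "nationality": nationality, "position": position}


R = [
    (row("Messi", "Argentina", "FW"), 2, 0),
    (row("Ronaldo", "Portugal", "FW"), 3, 0),
    (row("Ronaldo", "Portugal", "MF"), 2, 1),
    (row("Neymar", "Brazil", None), 0, 1),
    (row("Klose", "Germany", "DF"), 1, 2),
]
S = final_table(R)
for r in S:
    print(r["name"], r["nationality"], r["position"])
```

On the slide's candidate table this keeps the Messi row and the higher-scored of the two Ronaldo rows, and drops the incomplete Neymar row and the net-negative Klose row.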
CrowdFill Architecture
[Architecture diagram. Components: Front-end Server, Back-end Server, Database, Execution Server, Central Client, Worker Clients, Web Interface, Crowdsourcing Marketplace. Arrow labels: task acceptance; task setup, payment; results collection; table specs, payment; data entry.]
Concurrent Operations
•Model designed to minimize effects of concurrency (details in paper)
  - Operations are easily merged
  - Conflicts are resolved seamlessly
•Convergence theorem
  - Architecture ensures server and all clients apply the same operations, possibly in different orders
  - Theorem guarantees that server and all clients converge to the same candidate table whenever the system quiesces
Satisfying Values Constraint
•Values constraint
  - Final table S must match template T
•Worker clients
  - Perform fill, upvote, and downvote operations
  - Need not be aware of the template T
•Special “Central client”
  - Automatically populates new rows to guide the final table S towards the template T
•Probable Row Invariant (PRI)
  - R always contains just enough “probable” rows matching template T
  - PRI maintained based on maximum bipartite matching
Compensation Scheme: Overview
•After data collection
  - Allocate a total monetary budget based on each worker’s overall contribution to the final table
  - Encourage workers to submit useful work
  - Make total monetary cost predictable
•During data collection
  - Provide estimated compensation for individual actions to keep workers engaged
Compensation Scheme: Contribution
•Given final table S, operation op contributed to S if:
  - op filled in a cell in S (“direct” contribution)
  - op first provided a value for S while creating a subset of a row in S (“indirect” contribution)
  - op upvoted a row in S
  - op downvoted a combination of values not present in S
Compensation Scheme: Allocation
•Uniform allocation
  - Each cell and contributing vote has the same compensation
  - Each cell divided into direct and indirect contributions
•Column-weighted allocation
  - Take into account varying difficulty of filling in different columns and casting votes
•Dual-weighted allocation
  - Also take into account that entering new key values can get progressively more difficult as the table fills up
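The uniform and column-weighted schemes can be illustrated with a rough proportional split of the budget. Everything specific in this sketch is an assumption: the `Contribution` record, the even split of a cell between direct and indirect contributions (`direct_share=0.5`), and the example weights. The slides do not give these values, and the dual-weighted scheme is not modeled.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Contribution(NamedTuple):
    worker: str
    kind: str                     # "direct", "indirect", or "vote"
    column: Optional[str] = None  # column of the cell; None for votes


def allocate(contributions: List[Contribution],
             budget: float,
             column_weights: Optional[Dict[str, float]] = None,
             vote_weight: float = 1.0,
             direct_share: float = 0.5) -> Dict[str, float]:
    """Split `budget` among workers in proportion to contribution weight.

    With column_weights=None every cell and every vote weighs the same
    (uniform allocation); otherwise a cell weighs as much as its column.
    """
    def weight(c: Contribution) -> float:
        if c.kind == "vote":
            return vote_weight
        cell = 1.0 if column_weights is None else column_weights[c.column]
        share = direct_share if c.kind == "direct" else 1.0 - direct_share
        return cell * share

    total = sum(weight(c) for c in contributions)
    if total == 0:
        return {}
    payout: Dict[str, float] = defaultdict(float)
    for c in contributions:
        payout[c.worker] += budget * weight(c) / total
    return dict(payout)


work = [
    Contribution("alice", "direct", "name"),
    Contribution("alice", "indirect", "name"),
    Contribution("bob", "vote"),
]
uniform = allocate(work, budget=10.0)
weighted = allocate(work, budget=10.0, column_weights={"name": 3.0})
print(uniform)
print(weighted)
```

Because payouts are proportional shares of a fixed budget, the total cost stays predictable no matter how many operations workers perform.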
Experimental Evaluation: Setting
•SoccerPlayer(name, nationality, position, caps, goals, date-of-birth)
•Scoring function: “majority of three or more”
•Goal: information about 20 players with caps between 80 and 99
•Five volunteer workers
•Total monetary budget: $10
•Dual-weighted allocation scheme
Experimental Evaluation: Summary
•In our representative run
  - Overall latency: 10m 44s
  - #Rows in the candidate table: 23
  - Final compensations: $0.51, $1.68, $2.08, $2.24, $3.49
  - No “slowdown” in obtaining new primary keys
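As a quick arithmetic check, which is not a result stated on the slide, the five final compensations can be summed and compared with the $10 total budget from the experimental setting. `Decimal` is used to avoid floating-point rounding noise.

```python
from decimal import Decimal

# Final compensations reported for the five workers, in dollars.
compensations = [Decimal("0.51"), Decimal("1.68"), Decimal("2.08"),
                 Decimal("2.24"), Decimal("3.49")]
budget = Decimal("10")

total = sum(compensations)
print(total)            # 10.00
print(total == budget)  # True
```

The payouts add up to exactly the budget, which is consistent with the stated goal of making the total monetary cost predictable.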
Accuracy of Estimated Compensation
Related Work
•Crowdsourcing structured data
  - CrowdDB [Franklin et al. 2011]
  - Deco [Park et al. 2012]
•Real-time cooperative editing systems
  - Convergence [Ellis and Gibbs 1989]
  - Intention preservation [Sun et al. 1998]
•Monetary compensation for crowdsourcing
  - Incentive designs [Shaw et al. 2011]
Summary
•CrowdFill’s novel table-filling approach
  - Real-time collaboration among workers
  - Intuitive data entry interface
  - Compensation based on contribution
•In the paper:
  - Full description of the formal model
  - PRI maintenance algorithm with examples
  - More details about the compensation scheme
  - More experimental results
Thank you