![Page 1: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/1.jpg)
Making Sense at Scale with Algorithms, Machines & People
Michael Franklin
EECS, Computer Science
UC Berkeley
Emory U
December 7, 2011
![Page 2: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/2.jpg)
Defining the Big Data Problem
Size + Complexity = Answers that don’t meet quality, time and cost requirements.
![Page 3: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/3.jpg)
The State of the Art
Algorithms
Machines
People
search
Watson/IBM
![Page 4: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/4.jpg)
Needed: A Holistic Approach
search
Watson/IBM
Machines
People
Algorithms
![Page 5: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/5.jpg)
AMP Team
• 8 (primary) Faculty at Berkeley
• Databases, Machine Learning, Networking, Security, Systems, …
• 4 Partner Applications
– Participatory Sensing: Mobile Millennium (Alex Bayen, Civ Eng)
– Collective Discovery: Opinion Space (Ken Goldberg, IEOR)
– Urban Planning and Simulation: UrbanSim (Paul Waddell, Env. Des.)
– Cancer Genomics/Personalized Medicine (Taylor Sittler, UCSF)
![Page 6: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/6.jpg)
Big Data Opportunity
• The Cancer Genome Atlas (TCGA)
– 20 cancer types x 500 patients each x (1 tumor genome + 1 normal genome) = 5 petabytes
– David Haussler (UCSC): Datacenter online 12/11?
– Intel donates AMP Lab cluster, put it next to TCGA
Slide from David Haussler, UCSC, “Cancer Genomics,” AMP retreat, 5/24/11
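The TCGA sizing above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes decimal units (1 PB = 10^15 bytes) and that the 5 petabytes are spread evenly over all genomes, neither of which the slide states.

```python
# Back-of-the-envelope check of the TCGA sizing on the slide.
# Assumptions (not from the slide): decimal petabytes, data spread evenly.
CANCER_TYPES = 20
PATIENTS_PER_TYPE = 500
GENOMES_PER_PATIENT = 2           # one tumor genome + one normal genome
TOTAL_BYTES = 5 * 10**15          # 5 petabytes


def genome_count() -> int:
    """Total number of genomes implied by the slide."""
    return CANCER_TYPES * PATIENTS_PER_TYPE * GENOMES_PER_PATIENT


def bytes_per_genome() -> int:
    """Average storage per genome if the total is split evenly."""
    return TOTAL_BYTES // genome_count()


if __name__ == "__main__":
    print(genome_count(), "genomes")
    print(bytes_per_genome() / 10**9, "GB per genome")
```

Under these assumptions the slide's figures work out to 20,000 genomes at roughly 250 GB each.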
![Page 7: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/7.jpg)
Berkeley Systems Lab Model
Industrial Collaboration:
“Two Feet In” Model:
![Page 8: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/8.jpg)
Berkeley Data Analytics System (BDAS)
[Diagram: the BDAS stack]
Roles: Infra. Builder, Algo/Tools, Data Collector, Data Analyst
Components: Higher Query Languages / Processing Frameworks; Resource Management; Storage; Data Collector; Crowd Interface; Analytics Libraries, Data Integration; Data Source Selector; Result Control Center; Visualization
Vertical labels: Quality Control; Monitoring/Debugging
A Top-to-Bottom Rethinking of the Big Data Analytics Stack integrating Algorithms, Machines, and People
![Page 9: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/9.jpg)
Algos: More Data = Better Answers
Given an inferential goal and a fixed computational budget, provide a guarantee (supported by an algorithm and an analysis) that the quality of inference will increase monotonically as data accrue (without bound)
Error bars on every answer!
[Figure: Estimate (y-axis) vs. # of data points (x-axis), with the true answer marked]
![Page 10: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/10.jpg)
Towards ML/Systems Co-Design
• Some ingredients of a system that can estimate and manage statistical risk:
– Distributed bootstrap (bag of little bootstraps; BLB)
– Subsampling (stratified)
– Active sampling (cf. crowd-sourcing)
– Bias estimation (especially with crowd-sourced data)
– Distributed optimization
– Streaming versions of classical ML algorithms
– Streaming distributed bootstrap
• These all must be scalable, and robust
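To make the first ingredient concrete, here is a minimal single-machine sketch of the bag of little bootstraps, used to put an error bar on a sample mean. It is an illustration, not the AMP Lab implementation: the function names, the subset exponent `gamma=0.6`, and the subset and resample counts are all assumptions chosen for the example.

```python
import random
import statistics
from collections import Counter


def weighted_mean(values, counts):
    """Mean of `values` where values[i] is repeated counts[i] times."""
    return sum(v * c for v, c in zip(values, counts)) / sum(counts)


def blb_standard_error(data, num_subsets=5, num_resamples=50, gamma=0.6, rng=None):
    """Estimate the standard error of the mean with a bag of little bootstraps.

    Each subset holds only b = n**gamma distinct points, but every resample
    has nominal size n, represented compactly as multinomial counts.
    """
    rng = rng or random.Random(0)
    n = len(data)
    b = max(2, int(n ** gamma))
    per_subset = []
    for _ in range(num_subsets):
        subset = rng.sample(data, b)              # little subset, no replacement
        estimates = []
        for _ in range(num_resamples):
            draws = Counter(rng.choices(range(b), k=n))   # n draws over b points
            counts = [draws.get(i, 0) for i in range(b)]
            estimates.append(weighted_mean(subset, counts))
        per_subset.append(statistics.stdev(estimates))    # quality on this subset
    return statistics.fmean(per_subset)           # average across subsets


if __name__ == "__main__":
    gen = random.Random(42)
    sample = [gen.gauss(0.0, 1.0) for _ in range(2000)]
    print("BLB standard error of the mean:", blb_standard_error(sample))
```

In a distributed setting each subset could be handled by a different worker, since a subset and its resample counts are small; that is the property the slide's "distributed bootstrap" bullet points at.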
![Page 11: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/11.jpg)
Machines Agenda
• New software stack to
– Effectively manage cluster resources
– Effectively extract value out of big data
• Projects:
– “Datacenter OS”: Extend Mesos distributed resource manager
– Common Runtime: Structured, Unstructured, Streaming, Sampling, …
– New Processing frameworks & storage systems: E.g., Spark – parallel environment for iterative algorithms
• Example: QuickSilver Query Processor
– Allows users to navigate tradeoff space (quality, time, and cost) for complex queries
![Page 12: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/12.jpg)
QuickSilver: Where do We Want to Go?
Today: “simple” queries on PBs of data take hours
Ideal: sub-second arbitrary queries on PBs of data
Goal: compute complex queries on PBs of data in < x seconds with < y% error
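The "< y% error" goal can be illustrated with a small sampling loop: answer an average query from a sample, and keep doubling the sample until a normal-approximation error bar is tight enough. This is a sketch of the general idea only, not QuickSilver's design; the function name, the starting sample size, and the use of a 95% interval (`z=1.96`) are assumptions.

```python
import math
import random
import statistics


def approximate_mean(population, max_relative_error, start=100, z=1.96, rng=None):
    """Approximate the mean of `population` (at least 2 values) from a sample.

    Doubles the sample size until the half-width of the confidence interval
    is at most max_relative_error * |mean|, or the whole population is read.
    Returns (mean, half_width, sample_size).
    """
    rng = rng or random.Random(0)
    n = min(max(2, start), len(population))
    while True:
        sample = rng.sample(population, n)
        mean = statistics.fmean(sample)
        half_width = z * statistics.stdev(sample) / math.sqrt(n)
        # When n covers the population the mean is exact; the reported
        # half-width is then conservative.
        if n == len(population) or half_width <= max_relative_error * abs(mean):
            return mean, half_width, n
        n = min(2 * n, len(population))


if __name__ == "__main__":
    gen = random.Random(7)
    values = [gen.uniform(50.0, 150.0) for _ in range(100_000)]
    estimate, error_bar, rows_read = approximate_mean(values, 0.01)
    print(f"mean ~ {estimate:.2f} +/- {error_bar:.2f} using {rows_read} rows")
```

The tradeoff the slide describes shows up directly: a looser error target stops at a smaller sample and therefore returns sooner.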
![Page 13: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/13.jpg)
People
• Make people an integrated part of the system!
– Leverage human activity
– Leverage human intelligence (crowdsourcing)
Use the crowd to:
• Find missing data
• Integrate data
• Make subjective comparisons
• Recognize patterns
• Solve problems
[Diagram: Machines + Algorithms, with arrows labeled “data, activity”, “Questions”, and “Answers”]
![Page 14: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/14.jpg)
Human-Tolerant Computing
People throughout the analytics lifecycle?
• Inconsistent answer quality
• Incentives
• Latency & Variance
• Open vs. Closed world
• Hybrid Human/Machine Design
Approaches:
• Statistical Methods for error and bias
• Quality-conscious Interface design
• Cost (time, quality)-based optimization
![Page 15: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/15.jpg)
CROWDSOURCING EXAMPLES
![Page 16: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/16.jpg)
Citizen Science
NASA “Clickworkers” circa 2000
![Page 17: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/17.jpg)
Citizen Journalism/Participatory Sensing
![Page 18: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/18.jpg)
Expert Advice
![Page 19: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/19.jpg)
Data collection
• Freebase
![Page 20: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/20.jpg)
One View of Crowdsourcing
From Quinn & Bederson, “Human Computation: A Survey and Taxonomy of a Growing Field”, CHI 2011.
![Page 21: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/21.jpg)
Industry View
![Page 22: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/22.jpg)
Participatory Culture - Explicit
![Page 23: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/23.jpg)
Participatory Culture – Implicit
John Murrell: GM SV 9/17/09
“…every time we use a Google app or service, we are working on behalf of the search sovereign, creating more content for it to index and monetize or teaching it something potentially useful about our desires, intentions and behavior.”
![Page 24: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/24.jpg)
Types of Tasks
| Task Granularity | Examples |
| --- | --- |
| Complex Tasks | Build a website; Develop a software system; Overthrow a government? |
| Simple Projects | Design a logo and visual identity; Write a term paper |
| Macro Tasks | Write a restaurant review; Test a new website feature; Identify a galaxy |
| Micro Tasks | Label an image; Verify an address; Simple entity resolution |

Inspired by the report: “Paid Crowdsourcing”, Smartsheet.com, 9/15/2009
![Page 25: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/25.jpg)
Amazon Mechanical Turk (AMT)
![Page 26: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/26.jpg)
A Programmable Interface
• Amazon Mechanical Turk API
• Requesters place Human Intelligence Tasks (HITs) via “createHit()” API
– Parameters include: # of replicas, expiration, User Interface, …
• Requesters approve jobs and payment via “getAssignments()”, “approveAssignments()”
• Workers (a.k.a. “turkers”) choose jobs, do them, get paid
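The request/answer/approve cycle above can be walked through with a toy, in-memory stand-in. Everything in this sketch is hypothetical: `ToyMarketplace` and its methods only echo the call names quoted on the slide and are not the Mechanical Turk API; they make no network calls and pay no one.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Assignment:
    worker_id: str
    answer: str
    approved: bool = False


@dataclass
class Hit:
    question: str
    replicas: int          # how many workers may answer this HIT
    reward_cents: int      # paid per approved assignment (an assumption)
    assignments: List[Assignment] = field(default_factory=list)


class ToyMarketplace:
    """Hypothetical in-memory model of the requester/worker cycle."""

    def __init__(self) -> None:
        self._hits: Dict[int, Hit] = {}

    def create_hit(self, question: str, replicas: int, reward_cents: int) -> int:
        hit_id = len(self._hits) + 1
        self._hits[hit_id] = Hit(question, replicas, reward_cents)
        return hit_id

    def submit(self, hit_id: int, worker_id: str, answer: str) -> bool:
        """A worker answers; refused once replicas are used up or on repeats."""
        hit = self._hits[hit_id]
        repeat = any(a.worker_id == worker_id for a in hit.assignments)
        if repeat or len(hit.assignments) >= hit.replicas:
            return False
        hit.assignments.append(Assignment(worker_id, answer))
        return True

    def get_assignments(self, hit_id: int) -> List[Assignment]:
        return list(self._hits[hit_id].assignments)

    def approve_assignments(self, hit_id: int) -> int:
        """Approve all pending assignments; return the cents paid out."""
        hit = self._hits[hit_id]
        pending = [a for a in hit.assignments if not a.approved]
        for assignment in pending:
            assignment.approved = True
        return len(pending) * hit.reward_cents


if __name__ == "__main__":
    market = ToyMarketplace()
    hit = market.create_hit("Label this image", replicas=3, reward_cents=1)
    for worker, label in [("w1", "bridge"), ("w2", "bridge"), ("w3", "boat")]:
        market.submit(hit, worker, label)
    print([a.answer for a in market.get_assignments(hit)])
    print("paid (cents):", market.approve_assignments(hit))
```

The replica count is the knob that later slides rely on: asking several workers the same question is what makes majority voting possible.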
![Page 27: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/27.jpg)
Worker’s View
![Page 28: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/28.jpg)
Requester’s View
![Page 29: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/29.jpg)
CrowdDB: A Radical New Idea?
“The hope is that, in not too many years, human brains and computing machines will be coupled together very tightly, and that the resulting partnership will think as no human brain has ever thought and process data in a way not approached by the information-handling machines we know today.”
J.C.R. Licklider, “Man-Computer Symbiosis”, 1960
![Page 30: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/30.jpg)
Problem: DB-hard Queries
SELECT Market_Cap
FROM Companies
WHERE Company_Name = “IBM”

Number of Rows: 0
Problem: Entity Resolution

| Company_Name | Address | Market Cap |
| --- | --- | --- |
| Google | Googleplex, Mtn. View CA | $170Bn |
| Intl. Business Machines | Armonk, NY | $203Bn |
| Microsoft | Redmond, WA | $206Bn |
![Page 31: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/31.jpg)
DB-hard Queries
SELECT Market_Cap
FROM Companies
WHERE Company_Name = “Apple”

Number of Rows: 0
Problem: Closed World Assumption

| Company_Name | Address | Market Cap |
| --- | --- | --- |
| Google | Googleplex, Mtn. View CA | $170Bn |
| Intl. Business Machines | Armonk, NY | $203Bn |
| Microsoft | Redmond, WA | $206Bn |
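The two zero-row results above can be reproduced in an ordinary relational engine. The sketch below loads the three rows from the slides into an in-memory SQLite database; the helper names and the choice to store the market cap as text are assumptions made for the example.

```python
import sqlite3


def build_companies_db() -> sqlite3.Connection:
    """Create an in-memory table holding the three rows shown on the slides."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Companies ("
        "Company_Name TEXT PRIMARY KEY, Address TEXT, Market_Cap TEXT)"
    )
    conn.executemany(
        "INSERT INTO Companies VALUES (?, ?, ?)",
        [
            ("Google", "Googleplex, Mtn. View CA", "$170Bn"),
            ("Intl. Business Machines", "Armonk, NY", "$203Bn"),
            ("Microsoft", "Redmond, WA", "$206Bn"),
        ],
    )
    return conn


def market_cap(conn: sqlite3.Connection, name: str) -> list:
    """Exact-match lookup, as in the slide's query."""
    rows = conn.execute(
        "SELECT Market_Cap FROM Companies WHERE Company_Name = ?", (name,)
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_companies_db()
    print(market_cap(db, "IBM"))                      # stored under another name
    print(market_cap(db, "Apple"))                    # not stored at all
    print(market_cap(db, "Intl. Business Machines"))  # exact spelling matches
```

The engine cannot tell the two empty results apart: in one case the entity is present under a different name, in the other it is simply absent, and both look like "no such company".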
![Page 32: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/32.jpg)
DB-hard Queries
SELECT Top_1 Image
FROM Pictures
WHERE Topic = “Business Success”
ORDER BY Relevance

Number of Rows: 0
Problem: Subjective Comparison
![Page 33: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/33.jpg)
CrowdDB
Use the crowd to answer DB-hard queries
Where to use the crowd:
• Find missing data
• Make subjective comparisons
• Recognize patterns
But not:
• Anything the computer already does well
M. Franklin et al. CrowdDB: Answering Queries with Crowdsourcing, SIGMOD 2011
![Page 34: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/34.jpg)
CrowdSQL
DDL Extensions:

Crowdsourced columns
CREATE TABLE company (
  name STRING PRIMARY KEY,
  hq_address CROWD STRING);

Crowdsourced tables
CREATE CROWD TABLE department (
  university STRING,
  department STRING,
  phone_no STRING)
  PRIMARY KEY (university, department);

DML Extensions:

CrowdEqual:
SELECT * FROM companies WHERE Name ~= “Big Blue”

CROWDORDER operators (currently UDFs):
SELECT p FROM picture
WHERE subject = "Golden Gate Bridge"
ORDER BY CROWDORDER(p, "Which pic shows better %subject");
![Page 35: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/35.jpg)
User Interface Generation
• A clear UI is key to response time and answer quality.
• Can leverage the SQL Schema to auto-generate UI (e.g., Oracle Forms, etc.)
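As a rough illustration of schema-driven UI generation, the sketch below turns a list of column names into an HTML form: known values are shown as read-only context, and missing values become input fields for the worker. The function name and the HTML layout are assumptions for the example, not CrowdDB's generated templates.

```python
from html import escape


def generate_form(table, columns, known_values):
    """Build an HTML form asking a worker to fill in the missing columns.

    columns: column names in schema order.
    known_values: values already in the database, shown as context.
    """
    lines = [f'<form data-table="{escape(table)}">']
    for column in columns:
        label = escape(column)
        if column in known_values:
            value = escape(str(known_values[column]))
            lines.append(f"  <p>{label}: {value}</p>")
        else:
            lines.append(
                f'  <label>{label}: <input type="text" name="{label}"></label>'
            )
    lines.append('  <button type="submit">Submit</button>')
    lines.append("</form>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_form("company", ["name", "hq_address"], {"name": "IBM"}))
```

Escaping every schema and data value matters here, because the generated page is shown to outside workers.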
![Page 36: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/36.jpg)
![Page 37: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/37.jpg)
Subjective Comparisons
MTFunction
• Implements the CROWDEQUAL and CROWDORDER comparison
• Takes some description and a type (equal, order) parameter
• Quality control again based on majority vote
• Ordering can be further optimized (e.g., three-way comparisons vs. two-way comparisons)
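The majority-vote quality control mentioned above can be sketched in a few lines. The ranking step is an assumption added for illustration: it orders items by how many pairwise comparisons they win, which is one simple way to turn two-way votes into an ordering and is not claimed to be what CrowdDB does internally.

```python
from collections import Counter


def majority_vote(answers):
    """Return the answer given by a strict majority of workers, else None."""
    if not answers:
        return None
    top, top_count = Counter(answers).most_common(1)[0]
    return top if top_count > len(answers) / 2 else None


def crowd_order(items, votes):
    """Rank items by the number of pairwise comparisons they win.

    votes maps a pair (a, b) to the list of worker picks for that pair.
    Pairs without a strict majority count for neither item; ties in the
    final ranking keep the input order.
    """
    wins = {item: 0 for item in items}
    for picks in votes.values():
        winner = majority_vote(picks)
        if winner is not None:
            wins[winner] += 1
    return sorted(items, key=lambda item: -wins[item])


if __name__ == "__main__":
    pair_votes = {
        ("p1", "p2"): ["p2", "p2", "p1"],
        ("p1", "p3"): ["p3", "p3", "p3"],
        ("p2", "p3"): ["p2", "p3", "p2"],
    }
    print(crowd_order(["p1", "p2", "p3"], pair_votes))
```

With three assignments per comparison, a single careless worker cannot flip a result on their own.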
![Page 38: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/38.jpg)
Does it Work?: Picture ordering
Query:
SELECT p FROM picture
WHERE subject = "Golden Gate Bridge"
ORDER BY CROWDORDER(p, "Which pic shows better %subject");

Data-Size: 30 subject areas, with 8 pictures each
Batching: 4 orderings per HIT
Replication: 3 Assignments per HIT
Price: 1 cent per HIT

(turker-votes, turker-ranking, expert-ranking)
![Page 39: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/39.jpg)
User Interface vs. Quality
(Department first) (Professor first) (De-normalized Probe)
≈20% Error-Rate ≈80% Error-Rate
≈10% Error-Rate
To get information about Professors and their Departments…
![Page 40: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/40.jpg)
Can we build a “Crowd Optimizer”?
SELECT *
FROM Restaurant
WHERE city = …
![Page 41: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/41.jpg)
Price vs. Response Time
[Figure: percentage of HITs that have at least one assignment completed (y-axis, 0% to 100%) vs. time in minutes (x-axis, 0 to 60), one series per price: $0.01, $0.02, $0.03, $0.04. 5 Assignments, 100 HITs]
![Page 42: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/42.jpg)
Turker Affinity and Errors
[Figure axis: Turker Rank]
[Franklin, Kossmann, Kraska, Ramesh, Xin: CrowdDB: Answering Queries with Crowdsourcing. SIGMOD, 2011]
![Page 43: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/43.jpg)
Turker Affinity and Errors
[Figure axis: Turker Rank]
[Franklin, Kossmann, Kraska, Ramesh, Xin: CrowdDB: Answering Queries with Crowdsourcing. SIGMOD, 2011]
![Page 44: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/44.jpg)
Can we build a “Crowd Optimizer”?
SELECT *
FROM Restaurant
WHERE city = …
I would do work for this requester again.
This guy should be shunned.
I advise not clicking on his “information about restaurants” hits.
Hmm... I smell lab rat material.
be very wary of doing any work for this requester…
![Page 45: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/45.jpg)
Processor Relations?
Tim Klas Kraska
HIT Group » I recently did 299 HITs for this requester.… Of the 299 HITs I completed, 11 of them were rejected without any reason being given. Prior to this I only had 14 rejections, a .2% rejection rate. I currently have 8522 submitted HITs, with a .3% rejection rate after the rejections from this requester (25 total rejections). I have attempted to contact the requester and will update if I receive a response. Until then be very wary of doing any work for this requester, as it appears that they are rejecting about 1 in every 27 HITs being submitted. posted by …
fair: 2 / 5, fast: 4 / 5, pay: 2 / 5, comm: 0 / 5
![Page 46: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/46.jpg)
Open World = No Semantics?
SELECT * FROM Crowd_Sourced_Table
• What does the above query return?
• In the old world, it was just a table scan.
• In the crowdsourced world:
– Which answers are “Right”?
– When to stop?
• BioStatistics to the Rescue????
![Page 47: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/47.jpg)
Open World = No Semantics?
Select * From Crowd_Sourced_Table
Species Acquisition Curve for Data
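One classical tool from the species-estimation literature is the Chao1 richness estimator, which guesses how many distinct items exist in total from how many were seen exactly once and exactly twice. The sketch below uses its bias-corrected form. The slides do not say which estimator the authors use, so this is offered as a baseline example only, and it assumes answers are drawn independently, which crowd workers may not satisfy.

```python
from collections import Counter


def chao1_estimate(answers):
    """Bias-corrected Chao1 estimate of the total number of distinct answers.

    observed: distinct answers seen so far
    f1: answers seen exactly once; f2: answers seen exactly twice
    estimate = observed + f1 * (f1 - 1) / (2 * (f2 + 1))
    """
    frequencies = Counter(answers)
    observed = len(frequencies)
    f1 = sum(1 for count in frequencies.values() if count == 1)
    f2 = sum(1 for count in frequencies.values() if count == 2)
    return observed + f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    flavors = ["vanilla", "vanilla", "vanilla", "chocolate", "chocolate",
               "mint", "mango", "pistachio"]
    seen = len(set(flavors))
    print(f"seen {seen} distinct, estimated total {chao1_estimate(flavors)}")
```

A possible stopping rule is to keep collecting answers until the estimate and the observed count are close; many answers seen only once signal that the table is still far from complete.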
![Page 48: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/48.jpg)
Why the Crowd is Different
• Classical Approaches don’t quite work
• Incorrect Answers
– Chicago is not a State
– How to spell Mississippi?
• Streakers vs. Samplers
– Individuals sample without replacement
– and Worker/Task affinity
• List Walking
– e.g., Google “Ice Cream Flavors”
• The above can be detected and mitigated to some extent.
![Page 49: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/49.jpg)
How Can You Trust the Crowd?
• General Techniques
– Approval Rate / Demographic Restrictions
– Qualification Test
– Gold Sets/Honey Pots
– Redundancy
– Verification/Review
– Justification/Automatic Verification
• Query Specific Techniques
• Worker Relationship Management
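The gold-set technique in the list above can be sketched as follows: mix in tasks whose answers are already known, score each worker on those, and keep only workers above a threshold. The function names and the 0.8 threshold are assumptions for illustration.

```python
def worker_accuracy(responses, gold):
    """Score each worker on the gold tasks they answered.

    responses: {worker_id: {task_id: answer}}
    gold: {task_id: known_correct_answer}
    Workers who answered no gold task get no score.
    """
    accuracy = {}
    for worker, answers in responses.items():
        graded = [answers[task] == gold[task] for task in answers if task in gold]
        if graded:
            accuracy[worker] = sum(graded) / len(graded)
    return accuracy


def trusted_workers(responses, gold, min_accuracy=0.8):
    """Workers whose gold-task accuracy meets the (hypothetical) threshold."""
    scores = worker_accuracy(responses, gold)
    return {worker for worker, score in scores.items() if score >= min_accuracy}


if __name__ == "__main__":
    gold_answers = {"g1": "Illinois", "g2": "Mississippi"}
    submitted = {
        "w1": {"g1": "Illinois", "g2": "Mississippi", "t1": "Texas"},
        "w2": {"g1": "Chicago", "g2": "Mississippi", "t1": "Texas"},
        "w3": {"t1": "Ohio"},
    }
    print(worker_accuracy(submitted, gold_answers))
    print(trusted_workers(submitted, gold_answers))
```

Workers with no gold answers are left unscored rather than trusted or rejected, which leaves room for the other techniques in the list, such as redundancy, to cover them.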
![Page 50: Making Sense at Scale with Algorithms, Machines & People Michael Franklin EECS, Computer Science UC Berkeley Emory U December 7, 2011.](https://reader036.fdocuments.in/reader036/viewer/2022062422/56649ee45503460f94bf30fc/html5/thumbnails/50.jpg)
Making Sense at Scale
• Data Size is only part of the challenge
• Balance quality, cost and time for a given problem
• To address, we must Holistically integrate Algorithms, Machines, and People
amplab.cs.berkeley.edu