CROWDSOURCING
Transcript of CROWDSOURCING
CROWDSOURCING
Massimo Poesio
Part 4: Dealing with crowdsourced data
THE DATA
• The result of crowdsourcing, in whatever form, is a mass of often inconsistent judgments
• Need techniques for identifying reliable annotations and reliable annotators
– In the Phrase Detectives context, to discriminate between genuine ambiguity and disagreements due to error
THE ANDROCLES EXAMPLE
SOME APPROACHES
• Majority voting
• But: it ignores the substantial differences in behavior between annotators
• Alternatives:
– Removing bad annotators, e.g. using clustering
– Weighting annotators
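The contrast between plain majority voting and weighted voting can be sketched as follows; the annotator weights here are assumed reliabilities, not estimates produced by any particular model:

```python
# Majority voting vs. weighted voting over crowd labels (illustrative sketch).
from collections import defaultdict

def majority_vote(labels):
    """labels: list of (annotator, label) pairs for one item."""
    counts = defaultdict(int)
    for _, label in labels:
        counts[label] += 1
    return max(counts, key=counts.get)

def weighted_vote(labels, weights):
    """weights: dict mapping annotator -> reliability weight (assumed known here)."""
    scores = defaultdict(float)
    for annotator, label in labels:
        scores[label] += weights.get(annotator, 1.0)
    return max(scores, key=scores.get)

item = [("a1", "coref"), ("a2", "coref"), ("a3", "not-coref")]
print(majority_vote(item))                                       # coref
print(weighted_vote(item, {"a1": 0.2, "a2": 0.2, "a3": 0.9}))    # not-coref
```

With equal weights the two schemes coincide; weighting only changes the outcome when the minority annotators are judged more reliable.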
SNOW ET AL
SNOW ET AL: WEIGHTING ANNOTATORS
LATENT MODELS OF ANNOTATION QUALITY
• The problem of reaching a conclusion on the basis of judgments by separate experts that may often be in disagreement is a longstanding one in epidemiology
• A number of techniques have been developed, including
– Dawid and Skene 1979 (also used by Passonneau & Carpenter)
– Latent Annotation model (Uebersax 1994)
– Raykar et al 2010
• Recently, Carpenter has been developing an explicit Hierarchical Bayesian model (2008)
DAWID AND SKENE 1979
• Model consists of likelihoods for
1. annotations (labels from annotators)
2. categories (true labels) for items, given
3. annotator accuracies and biases
4. prevalence of labels
• Frequentist estimation of 2–4 given 1
• Optional regularization of estimates (for 3 and 4)
A GENERATIVE MODEL OF THE ANNOTATION TASK
• What all of these models do is to provide an EXPLICIT PROBABILISTIC MODEL of the observations in terms of annotators, labels, and items
THE DATA
• K: number of possible labels
• J: number of annotators
• I: number of items
• N: total number of annotations of the I items produced by the J annotators
• y_{i,j}: label produced for item i by coder j
THE DATA: BY ITEM
| ITEM | CODER 1 | CODER 2 | … | CODER J |
|------|---------|---------|---|---------|
| 1 | y_{1,1} | y_{1,2} | … | |
| 2 | y_{2,1} | | … | |
| 3 | | | | |
| 4 | | y_{4,2} | | y_{4,J} |
| … | | | | |
| I | y_{I,1} | | … | |
THE DATA: BY ANNOTATIONS
| ANNOTATION | LABEL |
|------------|-------|
| 1 | y_{1,1} |
| 2 | y_{1,2} |
| 3 | y_{2,3} |
| 4 | … |
| … | … |
| N | |
THE ANNOTATION TABLE
| ANNOTATION | ii_n | jj_n | y_{ii_n,jj_n} |
|------------|------|------|---------------|
| 1 | 1 | 1 | A |
| 2 | 1 | 2 | A |
| 3 | 2 | 3 | B |
| 4 | … | … | … |
| … | … | … | … |
| N | | | |
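The annotation table above can be represented in code as a long-format list of records, one per annotation, each carrying the item index ii_n, the annotator index jj_n, and the label; the field names are illustrative:

```python
# Long-format annotation data: one record per annotation (ii_n, jj_n, label).
from collections import namedtuple

Annotation = namedtuple("Annotation", ["ii", "jj", "y"])

annotations = [
    Annotation(ii=1, jj=1, y="A"),
    Annotation(ii=1, jj=2, y="A"),
    Annotation(ii=2, jj=3, y="B"),
]

# Recover the sparse item-by-coder view: (item, coder) -> label.
by_item = {(a.ii, a.jj): a.y for a in annotations}
print(by_item[(1, 2)])  # A
```

The long format is convenient precisely because not every coder labels every item: missing cells in the item-by-coder table simply have no record.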
A GENERATIVE MODEL OF THE ANNOTATION TASK
• The probabilistic model specifies the probability of a particular label on the basis of PARAMETERS specifying the behavior of the annotators, the prevalence of the labels, etc
• In Bayesian models, these parameters are specified in terms of PROBABILITY DISTRIBUTIONS
THE PARAMETERS OF THE MODEL
• z_i: the ACTUAL category of item i
• Θ_{j,k,k’}: ANNOTATOR RESPONSE
– the probability that annotator j labels an item as k’ when it belongs to category k
• π_k: PREVALENCE
– the probability that an item belongs to category k
DISTRIBUTIONS
• Each of the parameters is characterized in terms of a PROBABILITY DISTRIBUTION
• When we have some information on the data, these distributions can be used to characterize their behavior
– E.g., annotators may all be equally good / there may be a skew
• Otherwise, just defaults
DISTRIBUTIONS
• Prevalence of labels (PRIOR)– π ~ Dir(α)
• Annotator j’s response to item of category k (PRIOR)– Θ_{j,k} ~ Dir(β_k)
• True category of item i (LIKELIHOOD):– z_i ~ Categorical(π)
• Label from j for item i (LIKELIHOOD):– y_{i,j} ~ Categorical(Θ_{j,z_i})
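The four sampling statements above can be sketched directly in numpy; the sizes and the flat hyperparameters α and β are illustrative defaults, not values from any particular study:

```python
# A sketch of the generative annotation model: sample prevalence, annotator
# response matrices, true categories, and observed labels in that order.
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 50, 5, 3            # items, annotators, label categories

alpha = np.ones(K)            # hyperparameter of the prevalence prior
beta = np.ones(K)             # hyperparameter of each annotator-response row

pi = rng.dirichlet(alpha)                        # pi ~ Dir(alpha)
theta = np.array([[rng.dirichlet(beta)           # theta[j, k] ~ Dir(beta_k)
                   for _ in range(K)]
                  for _ in range(J)])

z = rng.choice(K, size=I, p=pi)                  # z_i ~ Categorical(pi)
y = np.array([[rng.choice(K, p=theta[j, z[i]])   # y_{i,j} ~ Categorical(theta[j, z_i])
               for j in range(J)]
              for i in range(I)])
print(y.shape)  # (50, 5)
```

Simulating data this way is also a standard sanity check for an inference procedure: fit the model to the simulated y and verify that π and Θ are recovered.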
TYPES OF ANNOTATORS: SPAMMY
(RESPONSE TO ALL ITEMS THE SAME)
TYPES OF ANNOTATORS: BIASED
(HAS SKEW IN RESPONSE – COMMON IN LOW PREVALENCE DATA)
QUICK INTRO TO DIRICHLET
• The Dirichlet distribution is often seen in Bayesian models (e.g., Latent Dirichlet Allocation, LDA) because it is a CONJUGATE PRIOR of the MULTINOMIAL distribution
BINOMIAL AND MULTINOMIAL
CONJUGATE PRIOR
• In Bayesian inference the objective is to compute a POSTERIOR on the basis of a LIKELIHOOD and a PRIOR:

P(A|B) = P(B|A) P(A) / P(B)

• A CONJUGATE PRIOR of distribution D is a distribution such that if it is used for the prior, then the posterior also has that shape
– E.g., ‘Dirichlet is a conjugate prior of the multinomial’ means that if the likelihood is a multinomial and the prior is Dirichlet then the posterior is also Dirichlet.
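Dirichlet–multinomial conjugacy can be verified numerically: the posterior is again a Dirichlet whose parameters are the prior α plus the observed category counts. A sketch with illustrative numbers:

```python
# Conjugacy in action: Dirichlet prior + categorical observations
# => posterior is Dir(alpha + counts).
import numpy as np

alpha = np.array([1.0, 1.0, 1.0])      # Dirichlet prior over K = 3 labels
observed = [0, 2, 2, 1, 2]             # observed category indices

counts = np.bincount(observed, minlength=3)   # [1, 1, 3]
alpha_post = alpha + counts                   # posterior is Dir([2, 2, 4])

# Posterior mean of each category probability:
print(alpha_post / alpha_post.sum())   # [0.25 0.25 0.5]
```

This closed-form update is why Dirichlet priors make inference in models like Dawid and Skene's tractable: no numerical integration is needed to update beliefs about π or the rows of Θ.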
DIRICHLET DISTRIBUTION
CATEGORICAL
• The categorical distribution generalizes the Bernoulli distribution, which specifies the probability of a given outcome in a binary trial, to K possible outcomes
– E.g., the Bernoulli gives the probability of getting a head in a single coin toss
– Cf. the BINOMIAL distribution, which specifies the probability of getting N heads
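The three distributions can be contrasted with one draw each; the parameter values are illustrative:

```python
# Bernoulli, categorical, and binomial side by side: one binary draw,
# one draw over K outcomes, and a count of successes over 10 trials.
import numpy as np

rng = np.random.default_rng(1)

head = rng.random() < 0.5                    # Bernoulli(0.5): one coin toss
label = rng.choice(3, p=[0.2, 0.3, 0.5])     # Categorical over K = 3 outcomes
n_heads = rng.binomial(n=10, p=0.5)          # Binomial: heads in 10 tosses
print(head, label, n_heads)
```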
A GRAPHICAL VIEW OF THE MODEL
THE PROBABILISTIC MODEL OF A GIVEN LABEL
AN EXAMPLE
PROBABILISTIC INFERENCE
• Probabilistic inference techniques are used to INFER the parameters from the data and thereby compute the probabilities of the true labels
– Often: Expectation Maximization (EM)
• The EM implementation in R used by Carpenter & Passonneau to estimate the parameters is available from
– https://github.com/bob-carpenter/anno
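As an illustration of the idea (a minimal Python sketch, not the R implementation linked above), an EM loop for the Dawid and Skene model alternates between estimating per-item category posteriors (E-step) and re-estimating prevalence and annotator response matrices (M-step); the toy data and majority-vote initialization are assumptions:

```python
# Minimal Dawid & Skene EM sketch (assumes every coder labels every item).
import numpy as np

def dawid_skene(y, K, n_iter=50):
    """y: (I, J) array of labels in 0..K-1; returns item posteriors q,
    prevalence pi, and annotator response matrices theta."""
    I, J = y.shape
    # Initialize item posteriors from majority counts.
    q = np.array([np.bincount(y[i], minlength=K) for i in range(I)], dtype=float)
    q /= q.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # M-step: prevalence and annotator response matrices.
        pi = q.mean(axis=0)
        theta = np.full((J, K, K), 1e-6)           # small floor avoids log(0)
        for i in range(I):
            for j in range(J):
                theta[j, :, y[i, j]] += q[i]
        theta /= theta.sum(axis=2, keepdims=True)
        # E-step: posterior over true categories for each item.
        log_q = np.tile(np.log(pi), (I, 1))
        for i in range(I):
            for j in range(J):
                log_q[i] += np.log(theta[j, :, y[i, j]])
        q = np.exp(log_q - log_q.max(axis=1, keepdims=True))
        q /= q.sum(axis=1, keepdims=True)
    return q, pi, theta

# Two coders agree everywhere; a third disagrees on half the items.
y = np.array([[0, 0, 0], [1, 1, 0], [1, 1, 1], [0, 0, 1]])
q, pi, theta = dawid_skene(y, K=2)
print(q.argmax(axis=1))  # [0 1 1 0]
```

Because coders 0 and 1 agree on every item, EM learns near-identity response matrices for them and discounts coder 2, so the inferred labels follow the reliable pair rather than a flat majority.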
APPLICATION TO WORD SENSE DISTRIBUTION (CARPENTER & PASSONNEAU, 2013, 2014)
• Carpenter and Passonneau used the Dawid and Skene model to compare manual annotators with turkers on the word sense annotation of the MASC corpus
THE MASC CORPUS
• Manually Annotated SubCorpus (MASC)
– 500K-word subset of the Open American National Corpus (OANC)
• Multiple genres: technical manuals, poetry, news, dialogue, etc.
• 16 types of annotation (not all manual)
– part of speech, phrases, word sense, named entity, ...
• 100-item word-sense corpus
– balanced by genre and part of speech (noun, verb, adjective)
MASC WORDSENSE
• 100 words balanced between adjectives, nouns, and verbs
• 1000 sentences for each word
• Annotated using WordNet senses for these words
• ~1M tokens
MASC Wordsense: annotation using trained annotators
• pre-training on 50 items
• independent labeling of 1000 items
• 100 items labeled by 3 or 4 annotators
• agreement on these 100 items reported
• only a single round of annotation; most items singly annotated
Annotation using trained annotators
• College students from Vassar, Barnard, Columbia
• 2–3 years of work on the project
• General training plus per-word training
• Supervised by
– Becky Passonneau
– Nancy Ide (maintainer of MASC)
– Christiane Fellbaum (maintainer of WordNet)
Annotation using crowdsourcing
• 45 randomly selected words balanced across nouns, verbs, and adjectives were reannotated using crowdsourcing
• 1000 instances per word
• 25+ annotators per instance
• high number of annotators to
– estimate difficulty
– reject independence of labels
Differences from trained situation
• Annotators not trained
• Not told to look at WordNet
• Each HIT:
– 10 sentences for the same word
– WordNet senses listed under the word
METHODS
• Passonneau & Carpenter used their model to
– evaluate the prevalence of labels in different ways
– evaluate annotator response
PREVALENCE ESTIMATION
ASSESSMENT OF QUALITY
ANNOTATOR RESPONSE
AGREEMENT RATES
OTHER MODELS
• Raykar et al, 2010
• Carpenter, 2008
RAYKAR ET AL 2010
• Simultaneously ESTIMATES THE GROUND TRUTH from noisy labels, produces an ASSESSMENT OF THE ANNOTATORS, and LEARNS A CLASSIFIER
– Based on logistic regression
• Bayesian (includes priors on the annotators)
ANNOTATORS
• Annotator j is characterized by her/his
– SENSITIVITY: the ability to recognize positive cases
• α_j = P(y_j = 1 | y = 1)
– SPECIFICITY: the ability to recognize negative cases
• β_j = P(y_j = 0 | y = 0)
RAYKAR ET AL
Raykar et al propose a version of the EM algorithm that can be used to estimate P(O|θ), as well as the sensitivity and specificity of each annotator:

P(O | θ) = ∏_{i=1}^{N} P(y_i^1, …, y_i^R | x_i, θ)

P(O | θ) = ∏_{i=1}^{N} [a_i p_i + b_i (1 − p_i)]

Carpenter developed a fully Bayesian version of the approach based on gradient descent
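In the binary two-coin formulation, a_i collects each annotator's sensitivity terms for item i and b_i the corresponding specificity terms, and p_i is the classifier's probability that the item is positive. A sketch of that per-item likelihood; the sensitivity and specificity values below are assumptions, not estimates:

```python
# Per-item likelihood in a two-coin annotator model:
#   a_i = P(labels | y=1), b_i = P(labels | y=0),
#   item likelihood = a_i * p_i + b_i * (1 - p_i).
import numpy as np

def item_likelihood(labels, sens, spec, p_i):
    """labels: binary labels from each annotator for one item."""
    labels = np.asarray(labels)
    a_i = np.prod(sens ** labels * (1 - sens) ** (1 - labels))   # P(labels | y=1)
    b_i = np.prod((1 - spec) ** labels * spec ** (1 - labels))   # P(labels | y=0)
    return a_i * p_i + b_i * (1 - p_i)

sens = np.array([0.9, 0.8, 0.7])     # alpha_j = P(y_j=1 | y=1), assumed
spec = np.array([0.95, 0.85, 0.6])   # beta_j  = P(y_j=0 | y=0), assumed
print(item_likelihood([1, 1, 0], sens, spec, p_i=0.5))
```

EM then alternates between these per-item quantities and updated estimates of the sensitivities, specificities, and classifier weights.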
CARPENTER
DISAGREEMENT IN INTERPRETATION
AMBIGUITY: REFERENT
15.12 M: we’re gonna take the engine E3
15.13 : and shove it over to Corning
15.14 : hook [it] up to [the tanker car]
15.15 : _and_
15.16 : send it back to Elmira
(from the TRAINS-91 dialogues collected at the University of Rochester)
AMBIGUITY: REFERENT
About 160 workers at a factory that made paper for the Kent filters were exposed to asbestos in the 1950s.
Areas of the factory were particularly dusty where the crocidolite was used.
Workers dumped large burlap sacks of the imported material into a huge bin, poured in cotton and acetate fibers and mechanically mixed the dry fibers in a process used to make filters.
Workers described "clouds of blue dust" that hung over parts of the factory, even though exhaust fans ventilated the area.
AMBIGUITY: EXPLETIVES
'I beg your pardon!' said the Mouse, frowning, but very politely: 'Did you speak?'
'Not I!' said the Lory hastily.
'I thought you did,' said the Mouse. '--I proceed. "Edwin and Morcar, the earls of Mercia and Northumbria, declared for him: and even Stigand, the patriotic archbishop of Canterbury, found it advisable--"'
'Found WHAT?' said the Duck.
'Found IT,' the Mouse replied rather crossly: 'of course you know what "it" means.'
OTHER DATA: WORDSENSE DISAMBIGUATION (Passonneau et al 2010)
And our ideas of what constitutes a FAIR wage or a FAIR return on capital are historically contingent … {sense1, sense1, sense1, sense2, sense2, sense2}
… the federal government … is wrangling for its FAIR share of the dividend … {sense1, sense1, sense2, sense2, sense8, sense8}
OTHER DATA: POS (Plank, Hovy & Søgaard 2014)
Noam goes OUT tonight {ADP, PRT}
Noam likes SOCIAL media {ADJ, NOUN}
REFERENCES
• Passonneau & Carpenter, 2014. The Benefits of a Model of Annotation. TACL. To appear.
• Raykar, Yu, Zhao, Valadez, Florin, Bogoni, & Moy, 2010. Learning from crowds. Journal of Machine Learning Research.