Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data...
Transcript of Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data...
![Page 1: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/1.jpg)
Ac#ve Learning
Burr Se0les
Machine Learning 10-‐701 / 15-‐781 Nov 14, 2012
![Page 2: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/2.jpg)
Let’s Play 20 Ques#ons!
• I’m thinking of something; ask me yes/no ques#ons to figure out what it is…
![Page 3: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/3.jpg)
How Do We Automate Inquiry? A Though Experiment
![Page 4: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/4.jpg)
A Thought Experiment
• suppose you are on an Earth convoy sent to colonize planet Zelgon
people who ate the round Zelgian fruits found them tasty!
people who ate the rough Zelgian fruits found them gross!
![Page 5: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/5.jpg)
Poisonous vs. Yummy Alien Fruits
• there is a con#nuous range of round-‐to-‐rough fruit shapes on Zelgon:
you need to learn how to classify fruits as safe or noxious
and you need to do this while risking as li0le as possible (i.e., colonist health)
![Page 6: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/6.jpg)
Supervised Learning Approach
problem:
PAC theory tells us we need O(1/ε) tests to obtain an error rate of ε…
a lot of people might get sick in the process!
![Page 7: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/7.jpg)
Can We Do Be0er?
this is just a binary search…
requiring O(1/ε) fruits (e.g., samples) but only O (log2 1/ε) tests (e.g., queries)
our first “ac#ve learning” algorithm!
![Page 8: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/8.jpg)
Supervised Learning
raw unlabeled data
supervised learner induces a classifier
labeled training instances
expert / oracle analyzes experiments
to determine labels
random sample
![Page 9: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/9.jpg)
Ac#ve Learning
raw unlabeled data
expert / oracle analyzes experiments
to determine labels
active learner induces a classifier
inspect the unlabeled data
request labels for selected data
![Page 10: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/10.jpg)
Learning Curves
text classification: baseball vs. hockey
active learning
passive learning
bette
r
cost (e.g., number of instance queries)
![Page 11: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/11.jpg)
Who Uses Ac#ve Learning?
Sentiment analysis for blogs; Noisy relabeling – Prem Melville!
Biomedical NLP & IR; Computer-aided diagnosis – Balaji Krishnapuram
MS Outlook voicemail plug-in [Kapoor et al., IJCAI'07]; “A variety of prototypes that are in use throughout the company.” – Eric Horvitz
“While I can confirm that we're using active learning in earnest on many problem areas… I really can't provide any more details than that. Sorry to be so opaque!" – David Cohn
![Page 12: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/12.jpg)
Ac#ve Learning Scenarios
![Page 13: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/13.jpg)
Problems with Query Synthesis
13
an early real-‐world applica#on: neural-‐net queries synthesized for handwri0en digits [Lang & Baum, 1992]
problem: humans couldn’t interpret the queries!
ideally, we can ensure that the queries come from the underlying “natural” distribu>on
![Page 14: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/14.jpg)
Ac#ve Learning Scenarios
more common in theory papers
more common in application papers
![Page 15: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/15.jpg)
Ac>ve Learning Approaches (1) Uncertainty Sampling
![Page 16: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/16.jpg)
Zelgian Fruits Revisited
• let’s interpret our Zelgian fruit binary search in terms of a probabilis7c classifier:
0.5 0.0
1.0 0.5 0.5
![Page 17: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/17.jpg)
Uncertainty Sampling
• query instances the learner is most uncertain about
400 instances sampled from 2 class Gaussians
random sampling 30 labeled instances
(accuracy=0.7)
uncertainty sampling 30 labeled instances
(accuracy=0.9)
[Lewis & Gale, SIGIR’94]
![Page 18: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/18.jpg)
Common Uncertainty Measures
least confident
margin
entropy
![Page 19: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/19.jpg)
Common Uncertainty Measures
note: for binary tasks, these are func#onally equivalent!
![Page 20: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/20.jpg)
Common Uncertainty Measures
illustration of preferred (dark red) posterior distributions in a 3-label classification task
note: for mul#-‐class tasks, these are not equivalent!
![Page 21: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/21.jpg)
Informa#on-‐Theore#c Interpreta#on
• the “surprisal” I is a measure (in bits, nats, etc.) of the informa#on content for outcome y of variable Y:
• so this is how “informa#ve” the oracle’s label y will be
• but the learner doesn’t know the oracle’s answer yet! we can es#mate it as an expecta7on over all possible labels:
• which is entropy-‐based uncertainty sampling
![Page 22: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/22.jpg)
Uncertainty Sampling in Prac#ce
• pool-‐based ac#ve learning: – evaluate each x in U – rank and query the top K instances
– retrain, repeat
• selec#ve sampling: – threshold a “region of uncertainty,” e.g., [0.2, 0.8] – observe new instances, but only query those that fall within the region
– retrain, repeat
![Page 23: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/23.jpg)
Uncertainty Sampling: Example
target function
active neural net (stream-based uncertainty sampling)
20 40 60 80 100
neural net trained from 100 random pixels
100
![Page 24: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/24.jpg)
Simple and Widely-‐Used
• text classifica>on – Lewis & Gale ICML’94;
• POS tagging – Dagan & Engelson, ICML’95;
Ringger et al., ACL’07
• disambigua>on – Fujii et al., CL’98;
• parsing – Hwa, CL’ 04
• informa>on extrac>on – Scheffer et al., CAIDA’01;
Se0les & Craven, EMNLP’08
• word segmenta>on – Sassano, ACL’02
• speech recogni>on – Tur et al., SC’05
• translitera>on – Kuo et al., ACL’06
• transla>on – Haffari et al., NAACL’09
![Page 25: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/25.jpg)
Uncertainty Sampling: Failure?!
active neural net (stream-based uncertainty sampling)
20 40 60 80 100
target function initial sample
![Page 26: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/26.jpg)
What To Do?
• uncertainty sampling only uses the confidence of one single classifier – e.g., a “point es#mate” for parametric models
– this classifier can become overly confident about instances is really knows nothing about!
• instead, let’s consider a different no#on of “uncertainty”… about the classifier itself
![Page 27: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/27.jpg)
Ac>ve Learning Approaches (2) Hypothesis Space Search
![Page 28: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/28.jpg)
Remember Version Spaces?
• the set of all classifiers that are consistent with the labeled training data
• the larger the version space V, the less likely each possible classifier is… we want queries to reduce |V|
![Page 29: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/29.jpg)
Alien Fruits Revisited
• let’s try interpre#ng our binary search in terms of a version space search:
possible classifiers (thresholds): 8 4 2 1
![Page 30: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/30.jpg)
Version Space Search
• in general, the version space V may be too large to enumerate, or to measure the size |V| through analy#cal trickery
• observa>on: for the Zelgian fruits example, uncertainty sampling and version space search gave us the same queries!
• how far can uncertainty sampling get us?
![Page 31: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/31.jpg)
Version Spaces for SVMs
F (feature space)
“version space duality” (Vapnik, 1998)
points in F correspond to hyperplanes in H
and vice versa H (hypothesis space)
SVM with largest margin is the center of the largest
hypersphere in V
![Page 32: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/32.jpg)
Bisec#ng the SVM Version Space
• hence, uncertainty sampling is a special case of version space search for SVMs (and other so-‐called “max-‐margin” classifiers)
F (feature space)
points closest to the class boundary in F correspond to planes that bisect V in H H (hypothesis space)
![Page 33: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/33.jpg)
Query By Disagreement (QBD)
• in general, uncertainty doesn’t cut it
• idea: we wish to quickly eliminate bad hypotheses; train two classifiers G and S which represent the two “extremes” of the version space
• if these two models disagree, the instances falls within the “region of uncertainty”
[Cohn et al., ML’94]
![Page 34: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/34.jpg)
Neural Network Triangles Revisited
uncertainty sampling:
QBD:
target function initial sample
![Page 35: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/35.jpg)
Query By Commi0ee (QBC)
• simpler, more general approach
• train a commi0ee of classifiers C – no need to maintain G and S
– commi0ee can be any size
• query instances for which commi0ee members disagree
[Seung et al., COLT’92]
![Page 36: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/36.jpg)
QBC in Prac#ce
• selec#ve sampling: – train a commi0ee C – observe new instances, but only query those for which there is disacreement (or a lot of disagreement)
– retrain, repeat
• pool-‐based ac#ve learning: – train a commi0ee C – measure disagreement for each x in U – rank and query the top K instances – retrain, repeat
![Page 37: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/37.jpg)
QBC Design Decisions
• how to build a commi0ee: – “sample” models from P(θ|L)
• [Dagan & Engelson, ICML’95; McCallum & Nigam, ICML’98]
– standard ensembles (e.g., boos#ng, bagging) • [Abe & Mamitsuka, ICML’98]
• how to measure disagreement (many): – “XOR” commi0ee classifica#ons – view vote distribu#ons as probabili#es, use uncertainty measures…
![Page 38: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/38.jpg)
QBC Disagreement Measures
• “sov” vote entropy:
• average Kullback-‐Liebler (KL) divergence:
![Page 39: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/39.jpg)
QBC Disagreement Measures
heatmaps illustrating query heuristics for a 3-label classification task using multinomial logistic regression (e.g., a MaxEnt model)
![Page 40: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/40.jpg)
QBC Disagreement Measures
uncertain hypotheses; but in agreement
confident hypotheses; but in disagreement
SVE cannot tell either of these apart
KL divergence will query this
![Page 41: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/41.jpg)
Informa#on-‐Theore#c Interpreta#on
• we want to query the instance whose label contains maximal mutual informa#on about the version space: I(Y; V)
• consider the iden#ty:
• this jus#fies querying instances which will reduce |V| ≈ H(V) in expecta#on
![Page 42: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/42.jpg)
Informa#on-‐Theore#c Interpreta#on
• an alternate, equivalent iden#ty:
• which, under a few simple assump#ons, reduces to the KL-‐divergence heuris#c for QBC
![Page 43: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/43.jpg)
Limita#ons of Version Space Search
imagine Zelgon has both grey and red fruits, with different thresholds?
there are two queries A and B both bisect V
which query will result is the lowest classification error?
![Page 44: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/44.jpg)
Ac>ve Learning Approaches (3) Using the Data Distribu#on
![Page 45: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/45.jpg)
Expected Error Reduc#on
• minimize the expected 1/0 loss of a query x
[Roy & McCallum, ICML’01; Zhu et al., ICML-WS’03]
sum over unlabeled instances
0/1 error of x’ after retraining with x
expectation over possible labelings of x
![Page 46: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/46.jpg)
Expected Error Reduc#on
• minimize the expected log loss of a query x
[Roy & McCallum, ICML’01; Zhu et al., ICML-WS’03]
sum over unlabeled instances
entropy of x’ after retraining with x
expectation over labelings of x
![Page 47: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/47.jpg)
Text Classifica#on Examples [Roy & McCallum, ICML’01]
![Page 48: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/48.jpg)
Text Classifica#on Examples [Roy & McCallum, ICML’01]
![Page 49: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/49.jpg)
Informa#on-‐Theore#c Interpreta#on
• aim to maximize the informa7on gain over U uncertainty before query expected loss
distribute the sum
drop this constant term
![Page 50: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/50.jpg)
Poor Scalability
• expected error reduc#on tries to directly op#mize the loss of interest, but…
• quickly becomes intrac#ble – logis#c regression requires O(ULG) #me – MaxEnt would require O(M2ULG) #me
![Page 51: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/51.jpg)
Approxima#on: Density-‐Weigh#ng
• assume that the informa#on gained per unlabeled instance x’ is propor#onal to its similarity to the query x:
density term (i.e., similarity)
“base” utility measure
![Page 52: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/52.jpg)
Ac>ve Learning++ Beyond Instance Queries
![Page 53: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/53.jpg)
Beyond Instance Queries
• most research in ac#ve learning has been based on a few simple assump#ons: – “cost” is propor#onal to training set size – queries must be unlabeled instances – there is only a single classifier to train
![Page 54: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/54.jpg)
1. Real Annota#on Costs
54
[Settles et al., 2008]
[Results supported by Aurora et al., ALNLP’09; Vijayanarasimhan & Grauman, CVPR’09]
information extraction
computer vision
subjectivity analysis
information extraction
empirical study of 7me as labeling cost for four data sets:
![Page 55: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/55.jpg)
Strategies for Variable Annota#on Costs
• use the current trained model assist with automa#c pre-‐annota#on – some successes [Baldridge & Osbourne ’04; Culo0a & McCallum ’05; Baldridge & Palmer ’09; Felt et al. ’12]
• train a regression cost model in parallel (i.e., to predict #me or $$) and incorporate that into the query selec#on heuris#c – mixed results [Se0les et al. ’08; Haertel et al. ’08; Tomanek and Hahn ’10]
![Page 56: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/56.jpg)
2. New Query Types
• in many NLP applica#ons, “features” are discrete variables with seman#c meaning: – words – affixes – capitaliza#on – other orthographic pa0erns
• what if ac#ve learning systems could ask about “feature labels,” too?
[Druck et al., EMNLP’09; Settles, EMNLP’11]
![Page 57: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/57.jpg)
DUALIST
57
[Settles, EMNLP’11]
![Page 58: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/58.jpg)
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 60 120 180 240 300 360
DUALISTactive / docs
passive / docs 0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 60 120 180 240 300 360
DUALISTactive / docs
passive / docs
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 60 120 180 240 300 360
DUALISTactive / docs
passive / docs
Results: Movie Reviews
time (seconds) time (seconds)
accu
racy
ac
cura
cy
58
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 60 120 180 240 300 360
DUALISTactive / docs
passive / docs
![Page 59: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/59.jpg)
Results: WebKB
time (seconds) time (seconds)
accu
racy
ac
cura
cy
59
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
0 60 120 180 240 300 360
DUALISTactive / docs
passive / docs
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
0 60 120 180 240 300 360
DUALISTactive / docs
passive / docs
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
0 60 120 180 240 300 360
DUALISTactive / docs
passive / docs
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
0 60 120 180 240 300 360
DUALISTactive / docs
passive / docs
![Page 60: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/60.jpg)
0
0.2
0.4
0.6
0.8
1
0 60 120 180 240 300 360
DUALISTactive / docs
passive / docs
Results: Science
time (seconds) time (seconds)
accu
racy
ac
cura
cy
60
0
0.2
0.4
0.6
0.8
1
0 60 120 180 240 300 360
DUALISTactive / docs
passive / docs
0
0.2
0.4
0.6
0.8
1
0 60 120 180 240 300 360
DUALISTactive / docs
passive / docs
0
0.2
0.4
0.6
0.8
1
0 60 120 180 240 300 360
DUALISTactive / docs
passive / docs
![Page 61: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/61.jpg)
3. Mul#-‐Task, Mul#-‐View Ac#ve Learning
• CMU’s NELL (Never Ending Language Learner)
• given: an ontology (schema), access to the Web, and a few seed examples per predicate, and periodic access to humans
• task: run 24x7 each day, popula#ng a knowledge base with new facts – learning to read and reading to learn …
[Carlson et al., AAAI’10]
![Page 62: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/62.jpg)
NELL’s Architecture
Knowledge Integrator!
candidate facts!
Data Resources
(e.g., corpora)#
beliefs!
Knowledge Base!
CPL# CSEAL# CMC# RL#
Subcomponent Learners!
• mul#ple tasks/views constrain each other, helping to prevent concept driv (“checks and balances”)
• to date: >1.5 million beliefs at 80% precision
![Page 63: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/63.jpg)
One View: CPL (contextual pa0erns)
63
![Page 64: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/64.jpg)
Another View: CMC (orthographic features)
64
![Page 65: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/65.jpg)
Gender Issues
![Page 66: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/66.jpg)
I proudly voted for _ _ is still the governor _ is the Republican nominee _ signed the legislation _ signed this bill
impeachment proceedings of _ _ 's inaugural _ signs bill endorsed _ vice presidential candidates like _
• these CPL pa0erns are generally correlated with males across the Web
• even though CMC learned that “Sarah” is a female name, these pa0erns ini#ally overwhelmed all other evidence, and NELL predicted male
• these days, NELL uses mul#-‐task/view ac#ve learning algorithms to iden#fy beliefs with “conflic#ng” evidence, and query them
![Page 67: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/67.jpg)
Interes#ng Open Issues
• be0er cost-‐sensi#ve approaches • “crowdsourced” labels (noisy oracles) • batch ac#ve learning (many queries at once)
• HCI / user interface issues • data reusability
![Page 68: Ac#ve&Learning&epxing/Class/10701-12f/Lecture/settles...Supervised&Learning& raw unlabeled data supervised learner induces a classifier labeled training instances expert / oracle analyzes](https://reader034.fdocuments.in/reader034/viewer/2022042216/5ebe7249acf22d215e58e13f/html5/thumbnails/68.jpg)
For Further Reading…
new book published by Morgan &
Claypool
free to download from the CMU
campus network
active-learning.net