Restrict learning to a model-dependent “easy” set of samples General form of objective:
Introduce an indicator of “easiness” $v_i$:
K determines the threshold for a sample being easy; it is annealed over successive iterations until all samples are used
Self-Paced Learning for Latent Variable Models M. Pawan Kumar, Ben Packer, and Daphne Koller
Motivation
Learning Latent Variable Models
Experiments
Intuitions from Human Learning:
• all information at once may be confusing => bad local minima
• start with “easy” examples the learner is prepared to handle
Maximize log likelihood: $\max_w \sum_i \log P(x_i, y_i; w)$
Iterate:
• Find expected value of hidden variables using current w
• Update w to maximize log likelihood subject to this expectation
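The two alternating steps can be illustrated on a toy problem. The following is a minimal sketch, not the models used on the poster: it assumes an equal-weight mixture of two unit-variance 1-D Gaussians, where the hidden variable is the component that generated each point. The function name `em_two_gaussians` and its parameters are hypothetical.

```python
import math


def em_two_gaussians(xs, init_means=(-1.0, 1.0), n_iters=50):
    """EM for an equal-weight mixture of two unit-variance 1-D Gaussians.

    Toy assumption: only the two means are learned. The hidden variable
    of each point is the index of the component that generated it.
    """
    mu0, mu1 = init_means
    for _ in range(n_iters):
        # E-step: posterior probability that each point came from component 1,
        # computed with the current parameters.
        resp = []
        for x in xs:
            p0 = math.exp(-0.5 * (x - mu0) ** 2)
            p1 = math.exp(-0.5 * (x - mu1) ** 2)
            resp.append(p1 / (p0 + p1))
        # M-step: responsibility-weighted means maximize the expected
        # log likelihood under the posteriors from the E-step.
        w1 = sum(resp)
        w0 = len(xs) - w1
        mu0 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / w0
        mu1 = sum(r * x for r, x in zip(resp, xs)) / w1
    return mu0, mu1


if __name__ == "__main__":
    data = [-3.1, -2.9, -3.0, 3.0, 2.9, 3.1]
    print(em_two_gaussians(data))
```

On well-separated data such as the list in the `__main__` block, the two estimated means settle near the two cluster centers. With a poor initialization EM can instead stall in a bad local optimum, which is the failure mode the self-paced schedule is meant to mitigate.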
Self-Paced Learning
[Figure panels: Large K, Medium K, Small K]
Optimization
Object Classification – Mammals Dataset
Image label y is object class only, h is bounding box
$\Psi(x_i, y_i, h_i)$ is HOG features in bounding box (offset by class)
Motif Finding – UniProbe Dataset
x is DNA sequence, h is motif position, y is binding affinity
Handwriting Recognition – MNIST
x is raw image, y is digit, h is image rotation, use linear kernel
Digit pairs: 1 vs. 7, 2 vs. 7, 3 vs. 8, 8 vs. 9
Noun Phrase Coreference – MUC6
x consists of pairwise features between pairs of nouns
y is a clustering of nouns
h specifies a forest of nouns s.t. each tree is a cluster of nouns
Aim: To learn an accurate set of parameters for latent variable models
[Figure: Standard Learning vs. Self-Paced Learning]
Latent Variable Models
x : input or observed variables
y : output or observed variables
h : hidden/latent variables
Example: y = “Deer”
Goal: Given D = {(x1,y1), …, (xn,yn)}, learn parameters w.
Expectation-Maximization for Maximum Likelihood
Latent Struct SVM [2]
Minimize upper bound on risk:
$\min_w \|w\|^2 + C \sum_i \max_{y',h'} \left[ w \cdot \Psi(x_i, y', h') + \Delta(y_i, y', h') \right] - C \sum_i \max_h \left[ w \cdot \Psi(x_i, y_i, h) \right]$
Iterate:
• Impute hidden variables
• Update weights to minimize upper bound on risk given these hidden variables
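The two steps iterated for the latent structural SVM can be written out explicitly. This is a sketch consistent with the CCCP procedure of [2]; the iteration index $t$ and the imputed value $h_i^*$ are notation introduced here for illustration.

```latex
\begin{aligned}
\text{Impute:} \quad & h_i^* = \arg\max_{h} \; w_t \cdot \Psi(x_i, y_i, h) \\
\text{Update:} \quad & w_{t+1} = \arg\min_{w} \; \|w\|^2
  + C \sum_i \max_{y',h'} \left[ w \cdot \Psi(x_i, y', h') + \Delta(y_i, y', h') \right]
  - C \sum_i w \cdot \Psi(x_i, y_i, h_i^*)
\end{aligned}
```

With $h_i^*$ fixed, the subtracted term is linear in $w$, so the update step is a convex problem even though the full objective is not.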
Initialize K to be large
Iterate:
    Run inference over h
    Alternately update w and v:
        v set by sorting $l_i(w)$, comparing to threshold 1/K
        Perform normal update for w over subset of data
    Until convergence
    Anneal K ← K/μ
Until all $v_i = 1$ and the objective cannot be reduced within tolerance
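The loop structure can be sketched on a deliberately simple problem. The code below is an illustrative assumption, not the poster's implementation: it has no hidden variables and no regularizer, fits a single scalar w with loss l_i(w) = (x_i - w)^2, and the name `self_paced_mean` and its parameters are hypothetical.

```python
def self_paced_mean(xs, K=100.0, mu=1.3, w=0.0, tol=1e-9, max_inner=100):
    """Self-paced estimate of a scalar w with loss l_i(w) = (x_i - w)**2.

    Toy assumptions: no hidden variables, r(w) = 0, and mu > 1 so that
    the threshold 1/K grows each time K is annealed.
    Returns the final w and a list of (K, number of easy samples).
    """
    history = []
    while True:
        # Alternately update v and w until w stops moving.
        for _ in range(max_inner):
            losses = [(x - w) ** 2 for x in xs]
            # v_i = 1 exactly when the sample's loss is below 1/K.
            v = [1 if loss < 1.0 / K else 0 for loss in losses]
            if not any(v):
                break  # nothing is easy yet; keep w and relax K
            easy = [x for x, vi in zip(xs, v) if vi]
            # Normal update for w over the easy subset (here: their mean).
            new_w = sum(easy) / len(easy)
            converged = abs(new_w - w) < tol
            w = new_w
            if converged:
                break
        history.append((K, sum(v)))
        if all(v):
            return w, history
        K /= mu  # anneal: K <- K / mu


if __name__ == "__main__":
    samples = [1.0, 1.1, 0.9, 1.05, 0.95, 5.0]
    w, history = self_paced_mean(samples, w=1.0)
    print(w, history[0], history[-1])
```

In this toy run the sample 5.0 is admitted only once 1/K exceeds its loss, so early rounds fit the clustered samples alone and the final round uses every sample.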
Easier subsets are used in early iterations, which avoids learning from samples whose hidden variables are imputed incorrectly
[Figure panels: Iteration 1, Iteration 3, Iteration 5, Iteration 7]
$\min_w \; r(w) + \sum_i l_i(w)$
$\min_{w,\, v \in \{0,1\}^n} \; r(w) + \sum_i v_i\, l_i(w) - \frac{1}{K} \sum_i v_i$
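With w held fixed, the objective with indicators separates over samples, which gives the threshold rule for v in closed form. The short derivation below is a sketch that assumes each $v_i$ is binary.

```latex
\begin{aligned}
\min_{v \in \{0,1\}^n} \; & r(w) + \sum_i v_i\, l_i(w) - \frac{1}{K} \sum_i v_i
  = r(w) + \min_{v \in \{0,1\}^n} \sum_i v_i \left( l_i(w) - \frac{1}{K} \right) \\
\Rightarrow \; & v_i = 1 \ \text{if } l_i(w) < \frac{1}{K}, \qquad v_i = 0 \ \text{otherwise.}
\end{aligned}
```

Each term is minimized independently: including sample i lowers the objective only when its loss is below 1/K, so decreasing K admits progressively harder samples.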
h = Bounding Box
Bengio et al. [1]: user-specified ordering
• task-specific
• onerous on user
• “easy for human” ≠ “easy for computer”
• “easy for Learner A” ≠ “easy for Learner B”
“Self-paced” schedule of examples is automatically set by the learner
[Figure: results for Fold 1 to Fold 5, CCCP vs. SPL. Panels: Training Error (%), Objective, Test Error (%). Row labels: Object Classification, Motif Finding.]
Compare Self-Paced Learning to standard CCCP as in [2]
Discussion
• Self-paced strategy outperforms state of the art
• Global solvers for biconvex optimization may improve accuracy
• Method is ideally suited to handle multiple levels of annotations
[1] Y. Bengio, J. Louradour, R. Collobert, and J. Weston. Curriculum learning. In ICML, 2009.
[2] C.-N. Yu and T. Joachims. Learning structural SVMs with latent variables. In ICML, 2009.
[Figure: results for Protein 1 to Protein 5, CCCP vs. SPL. Panels: Training Error (%), Objective, Test Error (%).]