Loss-based Learning with Weak Supervision

Transcript of Loss-based Learning with Weak Supervision

Page 1: Loss-based Learning  with Weak Supervision

Loss-based Learning with Weak Supervision

M. Pawan Kumar

Page 2: Loss-based Learning  with Weak Supervision

About the Talk

• Methods that use the latent structured SVM

• A little math-heavy

• Work still in its initial stages

Page 3: Loss-based Learning  with Weak Supervision

• Latent SSVM

• Ranking

• Brain Activation Delays in M/EEG

• Probabilistic Segmentation of MRI

Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009

Outline

Page 4: Loss-based Learning  with Weak Supervision

Weakly Supervised Data

Input x

Output y ∈ {-1,+1}

Hidden h

[Example image: input x, label y = +1, hidden annotation h]

Page 5: Loss-based Learning  with Weak Supervision

Weakly Supervised Classification

Feature Φ(x,h)

Joint Feature Vector

Ψ(x,y,h)


Page 6: Loss-based Learning  with Weak Supervision

Weakly Supervised Classification

Feature Φ(x,h)

Joint Feature Vector

Ψ(x,+1,h) = [Φ(x,h); 0]


Page 7: Loss-based Learning  with Weak Supervision

Weakly Supervised Classification

Feature Φ(x,h)

Joint Feature Vector

Ψ(x,-1,h) = [0; Φ(x,h)]


Page 8: Loss-based Learning  with Weak Supervision

Weakly Supervised Classification

Feature Φ(x,h)

Joint Feature Vector

Ψ(x,y,h)

Score f : Ψ(x,y,h) → (-∞, +∞)

Optimize score over all possible y and h


Page 9: Loss-based Learning  with Weak Supervision

Scoring function

wTΨ(x,y,h)

Prediction

y(w),h(w) = argmaxy,h wTΨ(x,y,h)

Latent SSVM
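Below is a minimal numpy sketch of this prediction rule. It assumes a finite pool of candidate hidden values whose features Φ(x,h) are already extracted, and it builds Ψ with the two-slot construction from the earlier slides; the exhaustive double loop stands in for whatever inference procedure a real application would use.

```python
import numpy as np

def joint_feature(phi, y):
    """Psi(x, y, h): Phi(x, h) fills the first slot for y = +1, the second for y = -1."""
    d = phi.shape[0]
    psi = np.zeros(2 * d)
    if y == +1:
        psi[:d] = phi          # Psi(x, +1, h) = [Phi(x,h); 0]
    else:
        psi[d:] = phi          # Psi(x, -1, h) = [0; Phi(x,h)]
    return psi

def predict(w, candidate_phis):
    """y(w), h(w) = argmax over y and h of wT Psi(x, y, h), by exhaustive search."""
    best_score, best_y, best_h = -np.inf, None, None
    for h, phi in enumerate(candidate_phis):      # candidate_phis[h] = Phi(x, h)
        for y in (+1, -1):
            score = float(w @ joint_feature(phi, y))
            if score > best_score:
                best_score, best_y, best_h = score, y, h
    return best_y, best_h
```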

Page 10: Loss-based Learning  with Weak Supervision

Training data {(xi,yi), i = 1,2,…,n}

Minimize the empirical risk specified by the loss function:

w* = argminw Σi Δ(yi,yi(w))

Highly non-convex in w

Cannot regularize w to prevent overfitting

Learning Latent SSVM

Page 11: Loss-based Learning  with Weak Supervision

Δ(yi,yi(w)) = Δ(yi,yi(w)) + wTΨ(xi,yi(w),hi(w)) - wTΨ(xi,yi(w),hi(w))

Δ(yi,yi(w)) ≤ Δ(yi,yi(w)) + wTΨ(xi,yi(w),hi(w)) - maxhi wTΨ(xi,yi,hi)

Δ(yi,yi(w)) ≤ maxy,h {wTΨ(xi,y,h) + Δ(yi,y)} - maxhi wTΨ(xi,yi,hi)

Training data {(xi,yi), i = 1,2,…,n}

Learning Latent SSVM

Page 12: Loss-based Learning  with Weak Supervision

Training data {(xi,yi), i = 1,2,…,n}

minw ||w||2 + C Σi ξi

s.t. wTΨ(xi,y,h) + Δ(yi,y) - maxhi wTΨ(xi,yi,hi) ≤ ξi, for all y, h

Difference-of-convex program in w

Local minimum or saddle point solution (CCCP)

Learning Latent SSVM
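For reference, the same learning problem written out in LaTeX; the constraint is enforced for every label y, every hidden value h, and every training example i:

$$
\begin{aligned}
\min_{w}\;& \|w\|^{2} + C \sum_{i} \xi_{i} \\
\text{s.t.}\;& w^{\top}\Psi(x_{i}, y, h) + \Delta(y_{i}, y) - \max_{\bar h_{i}} w^{\top}\Psi(x_{i}, y_{i}, \bar h_{i}) \;\le\; \xi_{i} \qquad \forall\, y, h,\;\; \forall\, i
\end{aligned}
$$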

Page 13: Loss-based Learning  with Weak Supervision

CCCP

Start with an initial estimate of w

Impute hidden variables (loss independent):

hi* = argmaxh wTΨ(xi,yi,h)

Update w (loss dependent):

minw ||w||2 + C Σi ξi

s.t. wTΨ(xi,y,h) + Δ(yi,y) - wTΨ(xi,yi,hi*) ≤ ξi, for all y, h

Repeat until convergence
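A sketch of this CCCP loop; impute_hidden and update_w are hypothetical callbacks standing in for the loss-independent imputation step and the loss-dependent convex SSVM solve (e.g. a cutting-plane or subgradient solver for the problem above).

```python
import numpy as np

def cccp(impute_hidden, update_w, w0, max_iter=20, tol=1e-6):
    """Alternate the loss-independent imputation step and the loss-dependent
    w update until the weights stop changing."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        # Loss independent: h_i* = argmax_h wT Psi(x_i, y_i, h) for every example i
        h_star = impute_hidden(w)
        # Loss dependent: solve the convex SSVM above with the h_i fixed to h_i*
        w_new = update_w(h_star, w)
        if np.linalg.norm(w_new - w) < tol:
            break
        w = w_new
    return w
```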

Page 14: Loss-based Learning  with Weak Supervision

Recap

Scoring function

wTΨ(x,y,h)

Prediction

y(w),h(w) = argmaxy,h wTΨ(x,y,h)

Learning

minw ||w||2 + C Σi ξi

s.t. wTΨ(xi,y,h) + Δ(yi,y) - maxhi wTΨ(xi,yi,hi) ≤ ξi, for all y, h

Page 15: Loss-based Learning  with Weak Supervision

• Latent SSVM

• Ranking

• Brain Activation Delays in M/EEG

• Probabilistic Segmentation of MRI

Joint Work with Aseem Behl and C. V. Jawahar

Outline

Page 16: Loss-based Learning  with Weak Supervision

Ranking

[Figure: six images ranked 1–6]

Average Precision = 1

Page 17: Loss-based Learning  with Weak Supervision

Ranking

[Figure: the six images in alternative rankings]

Average Precision = 1, Accuracy = 1

Average Precision = 0.92, Accuracy = 0.67

Average Precision = 0.81

Page 18: Loss-based Learning  with Weak Supervision

Ranking

During testing, AP is frequently used

During training, a surrogate loss is used

Contradictory to loss-based learning

Optimize AP directly
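To make the contrast concrete, here is the standard AP computation for a ranked list of binary labels. The three example orderings are hypothetical six-image rankings (three positives, three negatives) chosen to reproduce the AP values quoted two slides back; the actual orderings on the slides may differ.

```python
def average_precision(ranked_labels):
    """AP of a ranked list of binary labels (1 = positive, 0 = negative):
    the mean of precision@k over the positions k that hold a positive."""
    precisions, num_pos = [], 0
    for k, label in enumerate(ranked_labels, start=1):
        if label == 1:
            num_pos += 1
            precisions.append(num_pos / k)
    return sum(precisions) / max(num_pos, 1)

print(average_precision([1, 1, 1, 0, 0, 0]))   # 1.0
print(average_precision([1, 1, 0, 1, 0, 0]))   # ~0.92
print(average_precision([1, 0, 1, 1, 0, 0]))   # ~0.81
```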

Page 19: Loss-based Learning  with Weak Supervision

• Latent SSVM

• Ranking
  – Supervised Learning
  – Weakly Supervised Learning
  – Latent AP-SVM
  – Experiments

• Brain Activation Delays in M/EEG

• Probabilistic Segmentation of MRI

Outline

Yue, Finley, Radlinski and Joachims, 2007

Page 20: Loss-based Learning  with Weak Supervision

Supervised Learning - Input

Training images X

Bounding boxes H = {HP, HN}

[Figure: positive (P) and negative (N) training images with annotated boxes]

Page 21: Loss-based Learning  with Weak Supervision

Supervised Learning - Output

Ranking matrix Y

Yik =

+1 if i is better ranked than k

-1 if k is better ranked than i

0 if i and k are ranked equally

Optimal ranking Y*
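Purely for illustration, a ranking matrix of this form can be materialized from a list of rank positions as follows (the position vector `rank` is a hypothetical input; in the formulation Y is an optimization variable).

```python
import numpy as np

def ranking_matrix(rank):
    """Y[i, k] = +1 if sample i is ranked above sample k, -1 if below, 0 if tied.
    rank[i] is the position of sample i (1 = best)."""
    rank = np.asarray(rank)
    return np.sign(rank[None, :] - rank[:, None]).astype(int)

Y = ranking_matrix([1, 2, 3, 4, 5, 6])   # Y[0, 1] == +1: the rank-1 sample outranks the rank-2 sample
```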

Page 22: Loss-based Learning  with Weak Supervision

SSVM Formulation

Joint feature vector

Ψ(X,Y,{HP,HN}) = (1 / |P||N|) Σi∈P Σk∈N Yik (Φ(xi,hi) - Φ(xk,hk))

Scoring function

wTΨ(X,Y,{HP,HN})
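A direct, unoptimized sketch of this joint feature vector, assuming the box features Φ(xi,hi) are precomputed and Y is indexed by (positive, negative) pairs as in the sum above.

```python
import numpy as np

def ranking_joint_feature(Y, phi_pos, phi_neg):
    """Psi(X, Y, {H_P, H_N}) = (1 / (|P||N|)) * sum over i in P, k in N of
    Y[i, k] * (Phi(x_i, h_i) - Phi(x_k, h_k))."""
    P, N = len(phi_pos), len(phi_neg)
    psi = np.zeros_like(phi_pos[0], dtype=float)
    for i in range(P):
        for k in range(N):
            psi += Y[i, k] * (phi_pos[i] - phi_neg[k])
    return psi / (P * N)
```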

Page 23: Loss-based Learning  with Weak Supervision

Prediction using SSVM

Y(w) = argmaxY wTΨ(X,Y, {HP,HN})

Sort by value of sample score wTΦ(xi,hi)

Same as standard binary SVM
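In code, this prediction step is nothing more than a sort by sample score (a minimal sketch):

```python
import numpy as np

def rank_by_sample_score(w, phis):
    """argmax over Y of wT Psi reduces to sorting samples by wT Phi(x_i, h_i)."""
    scores = np.array([w @ phi for phi in phis])
    return np.argsort(-scores)      # sample indices from rank 1 downwards
```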

Page 24: Loss-based Learning  with Weak Supervision

Learning SSVM

minw Δ(Y*,Y(w))

Loss = 1 – AP of prediction

Page 25: Loss-based Learning  with Weak Supervision

Learning SSVM

Δ(Y*,Y(w)) = Δ(Y*,Y(w)) + wTΨ(X,Y(w),{HP,HN}) - wTΨ(X,Y(w),{HP,HN})

Page 26: Loss-based Learning  with Weak Supervision

Learning SSVM

Δ(Y*,Y(w)) ≤ Δ(Y*,Y(w)) + wTΨ(X,Y(w),{HP,HN}) - wTΨ(X,Y*,{HP,HN})

Page 27: Loss-based Learning  with Weak Supervision

Learning SSVM

minw ||w||2 + C ξ

s.t. maxY {Δ(Y*,Y) + wTΨ(X,Y,{HP,HN})} - wTΨ(X,Y*,{HP,HN}) ≤ ξ

Page 28: Loss-based Learning  with Weak Supervision

Learning SSVM

minw ||w||2 + C ξ

s.t. maxY {Δ(Y*,Y) + wTΨ(X,Y,{HP,HN})} - wTΨ(X,Y*,{HP,HN}) ≤ ξ

Loss Augmented Inference

Page 29: Loss-based Learning  with Weak Supervision

Loss Augmented Inference

[Figure: positives placed at ranks 1–3]

Rank positives according to sample scores

Page 30: Loss-based Learning  with Weak Supervision

Loss Augmented Inference

[Figure: positives at ranks 1–3, negatives at ranks 4–6]

Rank negatives according to sample scores

Page 31: Loss-based Learning  with Weak Supervision

Loss Augmented Inference

[Figure: negatives being interleaved among the positives]

• Slide the best negative to a higher rank
• Continue until the score stops increasing
• Slide the next negative to a higher rank
• Continue until the score stops increasing
• Terminate after considering the last negative

Optimal loss augmented inference
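A sketch of this greedy procedure, written directly from the slide's description: positives and negatives are each sorted by sample score, and every negative in turn is slid to a higher rank while the loss-augmented objective (1 - AP) + wTΨ keeps increasing. The encoding x[j] = number of positives ranked above the j-th negative is an implementation choice; the optimality argument for this style of procedure is due to Yue et al., 2007.

```python
import numpy as np

def ap_from_positions(x, P):
    """AP of an interleaving where x[j] = number of positives ranked above negative j."""
    x = np.asarray(x)
    ap = 0.0
    for i in range(1, P + 1):                 # the i-th ranked positive
        m_i = int(np.sum(x < i))              # negatives ranked above it
        ap += i / (i + m_i)
    return ap / P

def objective(x, pos_scores, neg_scores):
    """Loss-augmented objective (1 - AP) + wT Psi for the interleaving x."""
    P, N = len(pos_scores), len(neg_scores)
    pairwise = 0.0
    for j, b in enumerate(neg_scores):
        for i, a in enumerate(pos_scores, start=1):
            y_ij = +1 if i <= x[j] else -1    # is positive i ranked above negative j?
            pairwise += y_ij * (a - b)
    return (1.0 - ap_from_positions(x, P)) + pairwise / (P * N)

def loss_augmented_inference(pos_scores, neg_scores):
    """Greedy procedure from the slides: sort both lists by score, then slide each
    negative (best first) to a higher rank while the objective keeps increasing."""
    pos_scores = sorted(pos_scores, reverse=True)
    neg_scores = sorted(neg_scores, reverse=True)
    P, N = len(pos_scores), len(neg_scores)
    x = [P] * N                               # start with every negative below all positives
    for j in range(N):
        floor = x[j - 1] if j > 0 else 0      # keep negatives in their own score order
        while x[j] > floor:
            trial = list(x)
            trial[j] -= 1                     # slide negative j up past one more positive
            if objective(trial, pos_scores, neg_scores) > objective(x, pos_scores, neg_scores):
                x = trial
            else:
                break
    return x                                  # x[j] = positives ranked above negative j
```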

Page 32: Loss-based Learning  with Weak Supervision

Recap

Scoring function

wTΨ(X,Y,{HP,HN})

Y(w) = argmaxY wTΨ(X,Y, {HP,HN})

Prediction

Learning

Using optimal loss augmented inference

Page 33: Loss-based Learning  with Weak Supervision

• Latent SSVM

• Ranking
  – Supervised Learning
  – Weakly Supervised Learning
  – Latent AP-SVM
  – Experiments

• Brain Activation Delays in M/EEG

• Probabilistic Segmentation of MRI

Outline

Page 34: Loss-based Learning  with Weak Supervision

Weakly Supervised Learning - Input

Training images X

Page 35: Loss-based Learning  with Weak Supervision

Weakly Supervised Learning - Latent

Training images X

Bounding boxes HP

All bounding boxes in negative images are negative

Page 36: Loss-based Learning  with Weak Supervision

Intuitive Prediction Procedure

Select the best bounding boxes in all images

Page 37: Loss-based Learning  with Weak Supervision

Intuitive Prediction Procedure

[Figure: the selected boxes placed at ranks 1–6]

Rank them according to their sample scores

Page 38: Loss-based Learning  with Weak Supervision

Ranking matrix Y

Yik =

+1 if i is better ranked than k

-1 if k is better ranked than i

0 if i and k are ranked equally

Optimal ranking Y*

Weakly Supervised Learning - Output

Page 39: Loss-based Learning  with Weak Supervision

Latent SSVM Formulation

Joint feature vector

Ψ(X,Y,{HP,HN}) = (1 / |P||N|) Σi∈P Σk∈N Yik (Φ(xi,hi) - Φ(xk,hk))

Scoring function

wTΨ(X,Y,{HP,HN})

Page 40: Loss-based Learning  with Weak Supervision

Prediction using Latent SSVM

maxY,H wTΨ(X,Y, {HP,HN})

Page 41: Loss-based Learning  with Weak Supervision

Prediction using Latent SSVM

maxY,H wT Σi∈P Σk∈N Yik (Φ(xi,hi) - Φ(xk,hk))

Choose best bounding box for positives

Choose worst bounding box for negatives

Not what we wanted

Page 42: Loss-based Learning  with Weak Supervision

Learning Latent SSVM

minw Δ(Y*,Y(w))

Loss = 1 – AP of prediction

Page 43: Loss-based Learning  with Weak Supervision

Learning Latent SSVM

Δ(Y*,Y(w)) = Δ(Y*,Y(w)) + wTΨ(X,Y(w),{HP(w),HN(w)}) - wTΨ(X,Y(w),{HP(w),HN(w)})

Page 44: Loss-based Learning  with Weak Supervision

Learning Latent SSVM

Δ(Y*,Y(w)) ≤ Δ(Y*,Y(w)) + wTΨ(X,Y(w),{HP(w),HN(w)}) - maxH wTΨ(X,Y*,{HP,HN})

Page 45: Loss-based Learning  with Weak Supervision

Learning Latent SSVM

minw ||w||2 + C ξ

s.t. maxY,H {Δ(Y*,Y) + wTΨ(X,Y,{HP,HN})} - maxH wTΨ(X,Y*,{HP,HN}) ≤ ξ

Page 46: Loss-based Learning  with Weak Supervision

Learning Latent SSVM

minw ||w||2 + C ξ

s.t. maxY,H {Δ(Y*,Y) + wTΨ(X,Y,{HP,HN})} - maxH wTΨ(X,Y*,{HP,HN}) ≤ ξ

Loss Augmented Inference

Cannot be solved optimally

Page 47: Loss-based Learning  with Weak Supervision

Recap

Unintuitive prediction

Unintuitive objective function

Non-optimal loss augmented inference

Can we do better?

Page 48: Loss-based Learning  with Weak Supervision

• Latent SSVM

• Ranking
  – Supervised Learning
  – Weakly Supervised Learning
  – Latent AP-SVM
  – Experiments

• Brain Activation Delays in M/EEG

• Probabilistic Segmentation of MRI

Outline

Page 49: Loss-based Learning  with Weak Supervision

Latent AP-SVM Formulation

Joint feature vector

Ψ(X,Y,{HP,HN}) = (1 / |P||N|) Σi∈P Σk∈N Yik (Φ(xi,hi) - Φ(xk,hk))

Scoring function

wTΨ(X,Y,{HP,HN})

Page 50: Loss-based Learning  with Weak Supervision

Prediction using Latent AP-SVM

Choose best bounding box for all samples

Optimize over the ranking

hi(w) = argmaxh wTΦ(xi,h)

Y(w) = argmaxY wTΨ(X,Y, {HP(w),HN(w)})

Sort by sample scores
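A minimal sketch of this prediction procedure, assuming every image comes with a pool of candidate boxes whose features Φ(xi,h) are precomputed.

```python
import numpy as np

def latent_ap_svm_predict(w, candidate_phis_per_image):
    """h_i(w) = argmax_h wT Phi(x_i, h) for every image, then rank the images
    by the score of their best box."""
    best_boxes, best_scores = [], []
    for phis in candidate_phis_per_image:          # one feature vector per candidate box
        scores = np.array([w @ phi for phi in phis])
        h = int(np.argmax(scores))
        best_boxes.append(h)
        best_scores.append(scores[h])
    ranking = np.argsort(-np.array(best_scores))   # image indices from rank 1 downwards
    return list(ranking), best_boxes
```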

Page 51: Loss-based Learning  with Weak Supervision

Learning Latent AP-SVM

minw Δ(Y*,Y(w))

Loss = 1 – AP of prediction

Page 52: Loss-based Learning  with Weak Supervision

Learning Latent AP-SVM

Δ(Y*,Y(w)) = Δ(Y*,Y(w)) + wTΨ(X,Y(w),{HP(w),HN(w)}) - wTΨ(X,Y(w),{HP(w),HN(w)})

Page 53: Loss-based Learning  with Weak Supervision

Learning Latent AP-SVM

Δ(Y*,Y(w)) ≤ Δ(Y*,Y(w)) + wTΨ(X,Y(w),{HP(w),HN(w)}) - wTΨ(X,Y*,{HP(w),HN(w)})

Page 54: Loss-based Learning  with Weak Supervision

Learning Latent AP-SVM

Δ(Y*,Y(w)) ≤ maxY,HN {Δ(Y*,Y) + wTΨ(X,Y,{HP(w),HN})} - wTΨ(X,Y*,{HP(w),HN})

Page 55: Loss-based Learning  with Weak Supervision

Learning Latent AP-SVM

minw ||w||2 + C ξ

s.t. minHP [ maxY,HN {Δ(Y*,Y) + wTΨ(X,Y,{HP,HN})} - wTΨ(X,Y*,{HP,HN}) ] ≤ ξ

HP(w) is the HP minimizing the above upper bound
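The resulting Latent AP-SVM learning problem from this slide, written out in LaTeX:

$$
\begin{aligned}
\min_{w}\;& \|w\|^{2} + C\,\xi \\
\text{s.t.}\;& \min_{H_{P}}\, \max_{Y, H_{N}} \Big\{ \Delta(Y^{*}, Y) + w^{\top}\Psi(X, Y, \{H_{P}, H_{N}\}) - w^{\top}\Psi(X, Y^{*}, \{H_{P}, H_{N}\}) \Big\} \;\le\; \xi
\end{aligned}
$$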

Page 56: Loss-based Learning  with Weak Supervision

Start with an initial estimate of w

CCCP

Impute hidden variables

Update w

Repeat until convergence

Page 57: Loss-based Learning  with Weak Supervision

Imputing Hidden Variables

Choose the best bounding boxes according to sample score

The above imputation is optimal.

Page 58: Loss-based Learning  with Weak Supervision

Start with an initial estimate of w

CCCP

Impute hidden variables

Update w

Repeat until convergence

Page 59: Loss-based Learning  with Weak Supervision

Loss Augmented Inference

Choose best bounding boxes according to sample score

Page 60: Loss-based Learning  with Weak Supervision

Loss Augmented Inference

[Figure: the selected boxes at ranks 1–6]

• Slide the best negative to a higher rank
• Continue until the score stops increasing
• Slide the next negative to a higher rank
• Continue until the score stops increasing
• Terminate after considering the last negative

Optimal loss augmented inference

Page 61: Loss-based Learning  with Weak Supervision

Recap

Intuitive prediction

Intuitive objective function

Optimal loss augmented inference

Performance in practice?

Page 62: Loss-based Learning  with Weak Supervision

• Latent SSVM

• Ranking
  – Supervised Learning
  – Weakly Supervised Learning
  – Latent AP-SVM
  – Experiments

• Brain Activation Delays in M/EEG

• Probabilistic Segmentation of MRI

Outline

Page 63: Loss-based Learning  with Weak Supervision

• VOC 2011 action classification

• 10 action classes + other

• 2424 ‘trainval’ images

• 2424 ‘test’ images
  – Hidden annotations
  – Evaluated using a remote server
  – Only AP values are computed

Dataset

Page 64: Loss-based Learning  with Weak Supervision

• Latent SSVM with 0/1 loss (latent SVM)
  – Relative loss weight C
  – Relative positive sample weight J
  – Robustness threshold K

• Latent SSVM with AP loss (latent SSVM)
  – Relative loss weight C
  – Approximate greedy inference algorithm

• 5 random initializations

• 5-fold cross-validation (80-20 split)

Baselines

Page 65: Loss-based Learning  with Weak Supervision

Cross-Validation

Statistically significant improvement

Page 66: Loss-based Learning  with Weak Supervision

Test

[Bar chart: test AP (36–46) for Latent SVM, Latent SSVM and Latent AP-SVM]

Page 67: Loss-based Learning  with Weak Supervision

• Latent SSVM

• Ranking

• Brain Activation Delays in M/EEG

• Probabilistic Segmentation of MRI

Outline

Joint Work with Wojciech Zaremba, Alexander Gramfort and Matthew Blaschko

IPMI 2013

Page 68: Loss-based Learning  with Weak Supervision

M/EEG Data

Page 69: Loss-based Learning  with Weak Supervision

M/EEG Data

Faster activation (familiar with task)

Page 70: Loss-based Learning  with Weak Supervision

M/EEG Data

Slower activation (bored with task)

Page 71: Loss-based Learning  with Weak Supervision

Classifying M/EEG Data

Statistically significant improvement

Page 72: Loss-based Learning  with Weak Supervision

Functional Connectivity

• visual cortex → deep subcortical source
• visual cortex → higher level cognitive processing

Connected components have similar delay

Page 73: Loss-based Learning  with Weak Supervision

• Latent SSVM

• Ranking

• Brain Activation Delays in M/EEG

• Probabilistic Segmentation of MRI

Outline

Joint Work with Pierre-Yves Baudin, Danny Goodman, Puneet Kumar, Nikos Paragios,Noura Azzabou, Pierre Carlier

MICCAI 2013

Page 74: Loss-based Learning  with Weak Supervision

Training Data

Annotators provide a ‘hard’ segmentation

Page 75: Loss-based Learning  with Weak Supervision

Training Data

Annotators provide a ‘hard’ segmentation

Random Walks provides a ‘soft’ segmentation

Best ‘soft’ segmentation?

Page 76: Loss-based Learning  with Weak Supervision

Segmentation

Statistically significant improvement

Page 77: Loss-based Learning  with Weak Supervision

To Conclude …

• Choice of loss function matters during training

• Many interesting latent variables
  – Computer Vision (onerous annotations)
  – Medical Imaging (impossible annotations)

• Large-scale experiments
  – Other problems
  – General loss
  – Efficient optimization

Page 78: Loss-based Learning  with Weak Supervision

Questions?

http://www.centrale-ponts.fr/personnel/pawan

Page 79: Loss-based Learning  with Weak Supervision

SPLENDID

Nikos Paragios, Equipe Galen, INRIA Saclay

Daphne Koller, DAGS, Stanford

Machine Learning: Weak Annotations, Noisy Annotations

Applications: Computer Vision, Medical Imaging

Self-Paced Learning for Exploiting Noisy, Diverse or Incomplete Data

Visits between INRIA Saclay and Stanford University
