
Modeling Latent Variable Uncertainty for Loss-based Learning

M. Pawan Kumar
École Centrale Paris, École des Ponts ParisTech, INRIA Saclay, Île-de-France

Ben Packer
Stanford University

Daphne Koller
Stanford University

Aim: Accurate learning with weakly supervised data

Train: Input x_i, Output y_i

Bison

Deer

Elephant

Giraffe

Llama

Rhino

Object Detection

Input x
Output y = "Deer"
Latent variable h
Feature Ψ(x,y,h) (e.g. HOG)
Function f : Ψ(x,y,h) → (-∞, +∞)

Prediction: (y(f), h(f)) = argmax_{y,h} f(Ψ(x,y,h))

Learning: f* = argmin_f Objective(f)

Aim: Find a suitable objective function to learn f*

f* = argmin_f Objective(f)

The objective should encourage accurate prediction under a user-specified criterion for accuracy.

Outline

• Previous Methods
• Our Framework
• Optimization
• Results
• Ongoing and Future Work

Latent SVM

Linear function parameterized by w

Prediction: (y(w), h(w)) = argmax_{y,h} w^T Ψ(x,y,h)

Learning: min_w Σ_i Δ(y_i, y_i(w), h_i(w)), with a user-defined loss Δ

✔ Loss-based learning
✖ Loss independent of the true (unknown) latent variable
✖ Doesn't model uncertainty in latent variables
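As a minimal sketch (toy stand-ins of mine, not the authors' code), Latent SVM prediction is an exhaustive argmax of the linear score over every candidate (y, h) pair:

```python
import numpy as np

def predict_latent_svm(w, x, labels, latent_vals, psi):
    """Return argmax_{y,h} of the linear score w^T psi(x, y, h)."""
    best, best_score = None, -np.inf
    for y in labels:            # candidate outputs
        for h in latent_vals:   # candidate latent values
            score = w @ psi(x, y, h)
            if score > best_score:
                best, best_score = (y, h), score
    return best

# Toy joint feature map and candidate sets (hypothetical, for illustration).
psi = lambda x, y, h: np.array([x * (y + 1), float(h)])
w = np.array([0.5, -0.2])
print(predict_latent_svm(w, x=1.0, labels=[0, 1], latent_vals=[0, 1, 2], psi=psi))
```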

Expectation Maximization

Joint probability: P_θ(y,h|x) = exp(θ^T Ψ(x,y,h)) / Z

Prediction: (y(θ), h(θ)) = argmax_{y,h} P_θ(y,h|x) = argmax_{y,h} θ^T Ψ(x,y,h)

Learning: max_θ Σ_i log P_θ(y_i|x_i) = max_θ Σ_i log Σ_{h_i} P_θ(y_i, h_i | x_i)

✔ Models uncertainty in latent variables
✖ Doesn't model accuracy of latent variable prediction
✖ No user-defined loss function
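A sketch of the EM learning objective for a discrete latent space, the marginal log-likelihood Σ_i log Σ_h P_θ(y_i, h | x_i); the feature map and data below are toy assumptions:

```python
import numpy as np
from scipy.special import logsumexp

def log_marginal_likelihood(theta, data, labels, latent_vals, psi):
    """sum_i log sum_h P_theta(y_i, h | x_i), with
    P_theta(y, h | x) = exp(theta^T psi(x, y, h)) / Z."""
    total = 0.0
    for x, y in data:
        # Scores for every (label, latent) pair; Z normalizes over all of them.
        scores = np.array([[theta @ psi(x, yy, h) for h in latent_vals]
                           for yy in labels])
        total += logsumexp(scores[labels.index(y)]) - logsumexp(scores)
    return total

psi = lambda x, y, h: np.array([x * (y + 1), float(h)])
theta = np.array([0.3, 0.1])
print(log_marginal_likelihood(theta, [(1.0, 0), (2.0, 1)], [0, 1], [0, 1, 2], psi))
```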

Outline

• Previous Methods
• Our Framework
• Optimization
• Results
• Ongoing and Future Work

Problem

Model Uncertainty in Latent Variables

Model Accuracy of Latent Variable Predictions

Solution

Model uncertainty in latent variables
Model accuracy of latent variable predictions

Use two different distributions for the two different tasks:
P_θ(h_i|y_i,x_i), a distribution over the latent variable h_i, models the uncertainty.
P_w(y_i,h_i|x_i), concentrated on the prediction (y_i(w), h_i(w)), models the accuracy.

[Figure: P_θ(h_i|y_i,x_i) plotted over h_i, alongside the point mass P_w(y_i,h_i|x_i) at (y_i(w), h_i(w)); the true pair (y_i, h_i) is marked on the axis]
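For a finite latent space, the two distributions can be sketched as a softmax over h (for P_θ) and a point mass at the score-maximizing pair (for P_w); everything below is a toy illustration, not the paper's code:

```python
import numpy as np

psi = lambda x, y, h: np.array([x * (y + 1), float(h)])  # toy feature map

def p_theta(theta, x, y, latent_vals):
    """P_theta(h | y, x): softmax of theta^T psi over h, modeling uncertainty."""
    scores = np.array([theta @ psi(x, y, h) for h in latent_vals])
    p = np.exp(scores - scores.max())  # stabilized exponentiation
    return p / p.sum()

def p_w_support(w, x, labels, latent_vals):
    """P_w(y, h | x) is a point mass; return its single support point."""
    pairs = [(y, h) for y in labels for h in latent_vals]
    scores = [w @ psi(x, y, h) for y, h in pairs]
    return pairs[int(np.argmax(scores))]

print(p_theta(np.array([0.3, 0.1]), 1.0, 0, [0, 1, 2]))
print(p_w_support(np.array([0.5, -0.2]), 1.0, [0, 1], [0, 1, 2]))
```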

The Ideal Case: No latent variable uncertainty, correct prediction

[Figure: P_θ(h_i|y_i,x_i) puts all its mass on a single latent value, and the prediction coincides with the truth: (y_i(w), h_i(w)) = (y_i, h_i)]

In Practice: Restrictions in the representation power of models

[Figure: P_θ(h_i|y_i,x_i) spreads its mass over several latent values, and the prediction (y_i(w), h_i(w)) need not coincide with the true (y_i, h_i)]

Our Framework: Minimize the dissimilarity between the two distributions

The dissimilarity is a user-defined measure; we use Rao's Dissimilarity Coefficient between P_w and P_θ. Its first term is the expected loss of the prediction under P_θ:

H_i(w,θ) = Σ_h Δ(y_i, h, y_i(w), h_i(w)) P_θ(h|y_i,x_i)

Its second term, subtracted with weight β, is the self-dissimilarity of P_θ:

H_i(θ,θ) = Σ_{h,h'} Δ(y_i, h, y_i, h') P_θ(h|y_i,x_i) P_θ(h'|y_i,x_i)

The third term, (1-β) Δ(y_i(w), h_i(w), y_i(w), h_i(w)), is the self-dissimilarity of the point mass P_w, which vanishes. The learning problem is therefore

min_{w,θ} Σ_i H_i(w,θ) - β H_i(θ,θ)
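A per-example sketch of the objective H_i(w,θ) - β H_i(θ,θ) over a finite latent space; the helper names and the 0/1-style toy loss are mine, not the paper's:

```python
def objective_terms(delta, p_h, latent_vals, y_i, y_pred, h_pred, beta):
    """H_i(w,theta) - beta * H_i(theta,theta) for one example, with a
    discrete latent space and p_h[k] = P_theta(latent_vals[k] | y_i, x_i)."""
    # H_i(w,theta): expected loss of the prediction under P_theta.
    H_w_theta = sum(p * delta(y_i, h, y_pred, h_pred)
                    for h, p in zip(latent_vals, p_h))
    # H_i(theta,theta): expected self-dissimilarity of P_theta.
    H_theta_theta = sum(p * q * delta(y_i, h, y_i, hp)
                        for h, p in zip(latent_vals, p_h)
                        for hp, q in zip(latent_vals, p_h))
    return H_w_theta - beta * H_theta_theta

# Toy: 0/1 loss on (y, h) pairs, uniform P_theta over three latent values.
delta = lambda y, h, yp, hp: float((y, h) != (yp, hp))
print(objective_terms(delta, [1/3, 1/3, 1/3], [0, 1, 2],
                      y_i=0, y_pred=0, h_pred=1, beta=0.5))
```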

Outline

• Previous Methods
• Our Framework
• Optimization
• Results
• Ongoing and Future Work

Optimization: min_{w,θ} Σ_i H_i(w,θ) - β H_i(θ,θ)

Initialize the parameters to w_0 and θ_0
Repeat until convergence:
  Fix w and optimize θ
  Fix θ and optimize w
End
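The loop above is plain block-coordinate descent; a skeleton might look like the following, with the two inner solvers as placeholders for the θ-step and w-step described next:

```python
def learn(w0, theta0, optimize_theta, optimize_w, objective,
          tol=1e-6, max_iters=100):
    """Alternate the two partial minimizations until the objective plateaus."""
    w, theta = w0, theta0
    prev = float("inf")
    for _ in range(max_iters):
        theta = optimize_theta(w, theta)  # fix w, optimize theta
        w = optimize_w(w, theta)          # fix theta, optimize w
        obj = objective(w, theta)
        if prev - obj < tol:              # convergence check
            break
        prev = obj
    return w, theta

# Trivial usage with dummy steps that shrink a scalar objective.
w, theta = learn(1.0, 1.0,
                 optimize_theta=lambda w, t: 0.5 * t,
                 optimize_w=lambda w, t: 0.5 * w,
                 objective=lambda w, t: w * w + t * t)
print(w, theta)
```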

Optimization of θ: min_θ Σ_i Σ_h Δ(y_i, h, y_i(w), h_i(w)) P_θ(h|y_i,x_i) - β H_i(θ,θ)

Case I: y_i(w) = y_i
Case II: y_i(w) ≠ y_i

[Figures: the distribution P_θ(h_i|y_i,x_i) in each case, with h_i(w) marked when the predicted output is correct]

Solved by stochastic subgradient descent.
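A sketch of one stochastic subgradient step on the expected-loss term, using the exponential-family identity ∇_θ E_{P_θ}[Δ] = E[Δ·Ψ] - E[Δ]·E[Ψ]; the -β H_i(θ,θ) term is omitted for brevity, and the feature map and loss are toy assumptions, not the paper's implementation:

```python
import numpy as np

def theta_subgradient_step(theta, x, y_i, y_pred, h_pred, latent_vals,
                           psi, delta, lr=0.1):
    """One step on sum_h Delta(y_i, h, y_pred, h_pred) * P_theta(h | y_i, x),
    via d/dtheta E[Delta] = E[Delta * psi] - E[Delta] * E[psi]."""
    feats = np.array([psi(x, y_i, h) for h in latent_vals])
    scores = feats @ theta
    p = np.exp(scores - scores.max())
    p /= p.sum()                                     # P_theta(h | y_i, x)
    losses = np.array([delta(y_i, h, y_pred, h_pred) for h in latent_vals])
    grad = (losses * p) @ feats - (losses @ p) * (p @ feats)
    return theta - lr * grad

psi = lambda x, y, h: np.array([x * (y + 1), float(h)])
delta = lambda y, h, yp, hp: float((y, h) != (yp, hp))
theta = theta_subgradient_step(np.array([0.3, 0.1]), x=1.0, y_i=0,
                               y_pred=0, h_pred=1, latent_vals=[0, 1, 2],
                               psi=psi, delta=delta)
print(theta)
```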

Optimization of w: min_w Σ_i Σ_h Δ(y_i, h, y_i(w), h_i(w)) P_θ(h|y_i,x_i)

This is an expected loss, so it models the uncertainty. The form of the optimization is similar to Latent SVM, and it is solved with the Concave-Convex Procedure (CCCP).

Observation: when Δ is independent of the true h, our framework is equivalent to Latent SVM.
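CCCP alternates between linearizing the concave part of the objective and solving the remaining convex problem; a bare skeleton under that reading (the two stages are placeholders, not the paper's exact subroutines):

```python
import numpy as np

def cccp(w0, impute, solve_convex, max_iters=50, tol=1e-6):
    """CCCP skeleton: impute latent variables (linearized concave part),
    then solve the convex subproblem, until w stops moving."""
    w = w0
    for _ in range(max_iters):
        h_star = impute(w)            # best latent values under current w
        w_new = solve_convex(h_star)  # structured-SVM-like convex step
        if np.linalg.norm(np.asarray(w_new) - np.asarray(w)) < tol:
            return w_new
        w = w_new
    return w
```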

Outline

• Previous Methods
• Our Framework
• Optimization
• Results
• Ongoing and Future Work

Object Detection

Bison

Deer

Elephant

Giraffe

Llama

Rhino

Input x

Output y = "Deer"
Latent variable h

Mammals Dataset

60/40 Train/Test Split

5 Folds

Train: Input x_i, Output y_i

Results – 0/1 Loss

[Chart: average test loss on each of the five folds, LSVM vs. our method; our method's improvement is statistically significant]

Results – Overlap Loss

[Chart: average test loss on each of the five folds, LSVM vs. our method]
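The deck does not spell out the overlap loss; in detection settings it is commonly 1 minus the intersection-over-union of the latent bounding boxes, so the following is a hedged sketch of that common choice, not necessarily the paper's exact definition:

```python
def overlap_loss(box_a, box_b):
    """1 - IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # intersection height
    inter = iw * ih
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return 1.0 - inter / union if union > 0 else 1.0

print(overlap_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # boxes overlap in a unit square
```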

Action Detection

Input x
Output y = "Using Computer"
Latent variable h

PASCAL VOC 2011

60/40 Train/Test Split

5 Folds

Jumping

Phoning

Playing Instrument

Reading

Riding Bike

Riding Horse

Running

Taking Photo

Using Computer

Walking

Train: Input x_i, Output y_i

Results – 0/1 Loss

[Chart: average test loss on each of the five folds, LSVM vs. our method; our method's improvement is statistically significant]

Results – Overlap Loss

[Chart: average test loss on each of the five folds, LSVM vs. our method; our method's improvement is statistically significant]

Outline

• Previous Methods
• Our Framework
• Optimization
• Results
• Ongoing and Future Work

(Remaining slides deleted.)