Inference in generative models of images and video John Winn MSR Cambridge May 2004.

Overview

Generative vs. conditional models

Combined approach

Inference in the flexible sprite model

Extending the model

Generative vs. conditional models

We have an image I and latent variables H which we wish to infer, e.g. object position, orientation, class. There will also be other sources of variability, e.g. illumination, parameterised by θ.

Generative model: P(H, θ, I)

Conditional model: P(H, θ | I) or P(H | I)
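The two kinds of model are linked by the product rule. A short worked relation, using only the symbols defined above and assuming for illustration that H is discrete and θ is continuous (swap sums and integrals otherwise):

```latex
\begin{align}
P(H, \theta \mid I) &= \frac{P(H, \theta, I)}{P(I)},
\qquad P(I) = \sum_{H} \int P(H, \theta, I)\, d\theta, \\
P(H \mid I) &= \int P(H, \theta \mid I)\, d\theta .
\end{align}
```

So a generative model implies the conditional distributions, at the cost of computing the normaliser P(I), whereas a conditional model represents P(H | I) directly.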

Conditional models use features

Features are functions of I which aim to be informative about H but invariant to θ.

Examples: edge features, corner features, blob features

Conditional models

Using features f(I), train a conditional model e.g. using labelled data

P(H | I) = g(f(I))

Example: Viola & Jones face detection using rectangle features and AdaBoost
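A minimal sketch of the form P(H | I) = g(f(I)) with a rectangle feature in the style of Viola & Jones, computed from an integral image. The logistic function `g`, its `weight` and `bias`, and the tiny test image are hypothetical stand-ins chosen for illustration; a boosted detector would combine many such features rather than one.

```python
import math

def integral_image(img):
    """Return S where S[r][c] is the sum of img over rows < r and columns < c."""
    rows, cols = len(img), len(img[0])
    S = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            S[r + 1][c + 1] = S[r][c + 1] + row_sum
    return S

def rect_sum(S, top, left, height, width):
    """Sum of the pixels in a rectangle using four lookups in the integral image."""
    bottom, right = top + height, left + width
    return S[bottom][right] - S[top][right] - S[bottom][left] + S[top][left]

def two_rect_feature(S, top, left, height, width):
    """f(I): left half minus right half of a rectangle (width assumed even)."""
    half = width // 2
    return (rect_sum(S, top, left, height, half)
            - rect_sum(S, top, left + half, height, half))

def g(feature_value, weight=0.1, bias=0.0):
    """Hypothetical conditional model: logistic function of a single feature."""
    return 1.0 / (1.0 + math.exp(-(weight * feature_value + bias)))

# Toy image: bright left half, dark right half.
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
S = integral_image(image)
f_value = two_rect_feature(S, 0, 0, 3, 4)
p_h_given_i = g(f_value)
```

The integral image makes each rectangle sum a constant-time operation, which is one reason inference with such features is fast.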

Conditional models

Advantages

Simple - only model variables of interest

Inference is fast - due to use of features and simple model

Disadvantages

Non-robust

Difficult to compare different models

Difficult to combine different models

Generative models

A generative model defines a process of generating the image pixels I from the latent variables H and θ, giving a joint distribution over all variables: P(H, θ, I)

Learning and inference carried out using standard machine learning techniques e.g. Expectation Maximisation, MCMC, variational methods.

No features!
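To make the idea concrete, here is a toy generative model, not the one used in the talk: a 1D "image" in which a single object pixel sits at position H with brightness θ on a zero background, with Gaussian pixel noise. The names `log_joint`, `THETAS` and the noise level `sigma` are assumptions made for this sketch. Because the latent space is tiny, exact inference by enumeration stands in for EM, MCMC or variational methods.

```python
import math

THETAS = [0.5, 1.0]  # hypothetical illumination levels

def log_gauss(x, mean, sigma):
    """Log density of a Gaussian with the given mean and standard deviation."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def log_joint(h, theta, image, sigma=0.1):
    """log P(H=h, theta, I) with uniform priors over position and illumination."""
    log_prior = -math.log(len(image)) - math.log(len(THETAS))
    log_lik = 0.0
    for i, pixel in enumerate(image):
        mean = theta if i == h else 0.0   # object pixel vs. background pixel
        log_lik += log_gauss(pixel, mean, sigma)
    return log_prior + log_lik

def posterior_over_h(image):
    """P(H | I): sum the joint over theta, then normalise over positions."""
    log_p = [[log_joint(h, t, image) for t in THETAS] for h in range(len(image))]
    m = max(max(row) for row in log_p)            # subtract max for stability
    unnorm = [sum(math.exp(v - m) for v in row) for row in log_p]
    z = sum(unnorm)
    return [u / z for u in unnorm]

image = [0.02, -0.05, 0.48, 0.01, 0.03]
post = posterior_over_h(image)
best_h = max(range(len(post)), key=post.__getitem__)
```

Every pixel contributes to the likelihood, so no hand-designed features are needed; the price is that the whole latent space has to be explored.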

Generative models

Example: image modeled as layers of ‘flexible’ sprites.

Generative models

Advantages

Accurate – as the entire image is modeled

Can compare different models

Can combine different models

Can generate new images

Disadvantages

Inference is difficult due to local minima

Inference is slower due to complex model

Limitations on model complexity

Combined approach

Use a generative model, but speed up inference using proposal distributions given by a conditional model.

A proposal R(X) suggests a new distribution over some of the latent variables X ⊆ {H, θ}.

Inference is extended to allow accepting or rejecting the proposal e.g. depending on whether it improves the model evidence.
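One way the accept/reject step could look, as a sketch only: assume the model evidence is tracked through a variational lower bound, and accept a proposal R(X) when it raises that bound. The function names, the three-state latent variable and its joint probabilities are all hypothetical.

```python
import math

def lower_bound(q, log_joint):
    """Variational bound: sum_x q(x) * (log P(x, I) - log q(x)), skipping q(x) = 0."""
    return sum(p * (log_joint[x] - math.log(p)) for x, p in q.items() if p > 0)

def consider_proposal(q, r, log_joint):
    """Return (distribution, accepted); switch to r only if the bound improves."""
    if lower_bound(r, log_joint) > lower_bound(q, log_joint):
        return r, True
    return q, False

# Hypothetical log P(X = x, I) for three states of a latent variable X.
log_joint = {"a": math.log(0.01), "b": math.log(0.20), "c": math.log(0.04)}

q_current = {"a": 1.0}            # current estimate, stuck on a poor state
r_good = {"b": 0.8, "c": 0.2}     # proposal that matches the joint well
r_bad = {"a": 0.5, "c": 0.5}      # proposal that matches it poorly

q1, accepted1 = consider_proposal(q_current, r_good, log_joint)
q2, accepted2 = consider_proposal(q1, r_bad, log_joint)
```

Here the first proposal is accepted and the second rejected. A poor proposal costs only one bound evaluation, so the conditional model does not need to be reliable for the combined scheme to stay correct.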

Using proposals in an MCMC framework

Figure panels: proposals for text and faces; accepted proposals

From Tu et al, 2003

Generative model: textured regions combined with face and text models

Conditional model: face and text detector using AdaBoost (Viola & Jones)

Figure panels: proposals for text and faces; reconstructed image

From Tu et al, 2003

Proposals in the flexible sprite model

Flexible sprite model

x: set of images, e.g. frames from a video

π, f: sprite shape and appearance

T: sprite transform for this image (discretised)

m: transformed mask instance for this image

b: background

Inference method & problems

Apply variational inference with factorised Q distribution

Slow – since we have to search entire discrete transform space

Limited size of transform space e.g. translations only (160×120)

Many local minima
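The cost of the exhaustive search can be sketched on a toy problem. The code below is not the flexible sprite model: it assumes a known 1D sprite, a zero background, wrap-around translations and Gaussian pixel noise, and it scores every candidate translation to obtain a normalised Q(T). The last line assumes one candidate translation per pixel offset of a 160×120 image.

```python
import math

def translation_posterior(image, sprite, sigma=1.0):
    """Score every translation of a 1D sprite exhaustively and return Q(T)."""
    n = len(image)
    log_scores = []
    for t in range(n):                        # one pass per candidate transform
        rendered = [0.0] * n                  # zero background (assumption)
        for i, value in enumerate(sprite):
            rendered[(t + i) % n] = value     # paste the sprite at offset t
        sq_err = sum((x - r) ** 2 for x, r in zip(image, rendered))
        log_scores.append(-0.5 * sq_err / sigma ** 2)
    m = max(log_scores)                       # subtract max for stability
    weights = [math.exp(s - m) for s in log_scores]
    z = sum(weights)
    return [w / z for w in weights]

sprite = [1.0, 2.0, 1.0]
image = [0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
q_t = translation_posterior(image, sprite, sigma=0.5)
best_t = max(range(len(q_t)), key=q_t.__getitem__)

# Size of a translation-only space with one candidate per pixel offset.
n_translations = 160 * 120
```

Each update touches every transform, so the work grows linearly with the size of the transform space; adding rotation or scale multiplies that size again, which is why the space is kept small.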

Proposals in the flexible sprite model

We wish to create a proposal R(T).

Cannot use features of the image directly until object appearance found.

Use features of the inferred mask.

Figure: nodes π, m and T, with the proposal marked.

Moment-based features

Use the first and second moments of the inferred mask as features. Learn a proposal distribution R(T).

Figure labels: true location; centre of gravity (C-of-G) of mask; contour of proposal distribution over object location

Can also use R to get a probabilistic bound on T.
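The moment features themselves are simple to compute. A minimal sketch, assuming the inferred mask is a 2D grid of values in [0, 1] indexed as `mask[row][col]`; the function name and the example mask are hypothetical, and the step that learns R(T) from these features is not shown.

```python
def mask_moments(mask):
    """First and second moments of a soft mask.

    Returns ((mean_row, mean_col), (var_row, cov_row_col, var_col)),
    all weighted by the mask values.
    """
    total = sum(sum(row) for row in mask)
    if total == 0:
        raise ValueError("mask is empty")
    cells = [(r, c, v) for r, row in enumerate(mask) for c, v in enumerate(row)]
    mean_r = sum(r * v for r, _, v in cells) / total
    mean_c = sum(c * v for _, c, v in cells) / total
    var_r = sum(v * (r - mean_r) ** 2 for r, _, v in cells) / total
    var_c = sum(v * (c - mean_c) ** 2 for _, c, v in cells) / total
    cov = sum(v * (r - mean_r) * (c - mean_c) for r, c, v in cells) / total
    return (mean_r, mean_c), (var_r, cov, var_c)

mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
centre, spread = mask_moments(mask)
```

The first moment gives the centre of gravity of the mask, and the second moments describe its extent and orientation, which is what makes them candidates for proposing the object location.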

Iteration #1

Iteration #2

Iteration #3

Iteration #4

Iteration #5

Iteration #6

Iteration #7

Results on scissors video.

On average, ~1% of transform space searched. Always converges, independent of initialisation.

Figure panels: original; reconstruction; foreground only

Beyond translation

Extended transform space

Figure panels: original; reconstruction

Figure panels: normalised video; learned sprite appearance

Corner features

Figure panels: learned sprite appearance; masked normalised image

Corner feature proposals

Preliminary results

Future directions

Extensions to the generative model

Very wide range of possible extensions:

Local appearance model e.g. patch-based

Multiple layered objects

Object classes

Illumination modelling

Incorporation of object-specific models e.g. faces

Articulated models

Further investigation of using proposals

Investigate other bottom-up features, including:

Optical flow

Color/texture

Use of standard invariant features e.g. SIFT

Discriminative models for particular object classes e.g. faces, text

Figure: flexible sprite model with variables π, m, f, b, T, x and N.