
MC949 - Computer Vision Lecture #13

Prof. Dr. Anderson Rocha

Microsoft Research Faculty Fellow; Affiliate Member, Brazilian Academy of Sciences

Reasoning for Complex Data (Recod) Lab. Institute of Computing, University of Campinas (Unicamp)

Campinas, SP, Brazil

anderson.rocha@ic.unicamp.br http://www.ic.unicamp.br/~rocha

Hidden Variables, the EM Algorithm, and Mixtures of Gaussians

These lecture slides are based on slides by several researchers, such as James Hays, Derek Hoiem, Alexei Efros, Steve Seitz, David Forsyth, and many others. Many thanks to all of these authors.

Reading: PRML, Chapter 9: Secs. 9.1 and 9.2

Today’s Class

• Examples of Missing Data Problems
  – Detecting outliers

• Background
  – Maximum Likelihood Estimation
  – Probabilistic Inference

• Dealing with “Hidden” Variables
  – EM algorithm, Mixture of Gaussians
  – Hard EM

Slide: Derek Hoiem

Missing Data Problems: Outliers

You want to train an algorithm to predict whether a photograph is attractive. You collect annotations from Mechanical Turk. Some annotators try to give accurate ratings, but others answer randomly.

Challenge: Determine which people to trust and the average rating by accurate annotators.

Photo: Jam343 (Flickr)

Annotator ratings: 10, 8, 9, 2, 8

Missing Data Problems: Object Discovery

You have a collection of images and have extracted regions from them. Each is represented by a histogram of “visual words”.

Challenge: Discover frequently occurring object categories, without pre-trained appearance models.

http://www.robots.ox.ac.uk/~vgg/publications/papers/russell06.pdf

Missing Data Problems: Segmentation

You are given an image and want to label each pixel as foreground or background.

Challenge: Segment the image into figure and ground without knowing what the foreground looks like in advance.

Foreground

Background

Slide: Derek Hoiem

Missing Data Problems: Segmentation

Challenge: Segment the image into figure and ground without knowing what the foreground looks like in advance.

Three steps:
1. If we had labels, how could we model the appearance of foreground and background?
2. Once we have modeled the fg/bg appearance, how do we compute the likelihood that a pixel is foreground?
3. How can we get both labels and appearance models at once?

Foreground

Background

Slide: Derek Hoiem

Maximum Likelihood Estimation

1. If we had labels, how could we model the appearance of foreground and background?

Foreground

Background

Slide: Derek Hoiem

Maximum Likelihood Estimation

data: $\mathbf{x} = \{x_1, \ldots, x_N\}$; parameters: $\theta$

$$\hat{\theta} = \operatorname*{argmax}_{\theta}\, p(\mathbf{x} \mid \theta) = \operatorname*{argmax}_{\theta} \prod_{n=1}^{N} p(x_n \mid \theta)$$

Gaussian distribution:

$$p(x_n \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_n - \mu)^2}{2\sigma^2}\right)$$

MLE for the Gaussian:

$$\hat{\mu} = \frac{1}{N}\sum_n x_n \qquad\qquad \hat{\sigma}^2 = \frac{1}{N}\sum_n \left(x_n - \hat{\mu}\right)^2$$

Slide: Derek Hoiem
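These closed-form estimates follow from setting the derivative of the log-likelihood to zero; a short derivation for $\hat{\mu}$ (not spelled out on the original slide) is:

$$\log p(\mathbf{x} \mid \mu, \sigma) = -\frac{N}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_n (x_n - \mu)^2$$

$$\frac{\partial}{\partial \mu} \log p(\mathbf{x} \mid \mu, \sigma) = \frac{1}{\sigma^2}\sum_n (x_n - \mu) = 0 \quad\Longrightarrow\quad \hat{\mu} = \frac{1}{N}\sum_n x_n$$

Setting the derivative with respect to $\sigma^2$ to zero yields $\hat{\sigma}^2$ in the same way.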

Example: MLE

>> mu_fg = mean(im(labels))
mu_fg = 0.6012
>> sigma_fg = sqrt(mean((im(labels)-mu_fg).^2))
sigma_fg = 0.1007
>> mu_bg = mean(im(~labels))
mu_bg = 0.4007
>> sigma_bg = sqrt(mean((im(~labels)-mu_bg).^2))
sigma_bg = 0.1007
>> pfg = mean(labels(:));

Parameters used to generate: fg: mu=0.6, sigma=0.1; bg: mu=0.4, sigma=0.1

Slide: Derek Hoiem
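A minimal sketch to reproduce this example on synthetic data (my own construction: the image layout is an assumption, the generating parameters come from the caption above, and `im`/`labels` match the slide’s variable names):

% Synthesize a 100x100 image: top half foreground, bottom half background
% (assumed layout; the slide does not specify the true mask).
labels = false(100,100); labels(1:50,:) = true;   % fg mask
im = 0.4 + 0.1*randn(100,100);                    % bg: mu=0.4, sigma=0.1
im(labels) = 0.6 + 0.1*randn(nnz(labels),1);      % fg: mu=0.6, sigma=0.1

% MLE of the per-class Gaussians, exactly as on the slide
mu_fg = mean(im(labels));
sigma_fg = sqrt(mean((im(labels)-mu_fg).^2));
mu_bg = mean(im(~labels));
sigma_bg = sqrt(mean((im(~labels)-mu_bg).^2));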

Probabilistic Inference

2. Once we have modeled the fg/bg appearance, how do we compute the likelihood that a pixel is foreground?

Foreground

Background

Slide: Derek Hoiem

Probabilistic Inference

Compute the likelihood that a particular model generated a sample ($z_n$: component or label):

$$p(z_n = m \mid x_n, \theta) = \frac{p(z_n = m, x_n \mid \theta)}{p(x_n \mid \theta)} = \frac{p(z_n = m, x_n \mid \theta)}{\sum_k p(z_n = k, x_n \mid \theta)} = \frac{p(x_n \mid z_n = m, \theta)\; p(z_n = m \mid \theta)}{\sum_k p(x_n \mid z_n = k, \theta)\; p(z_n = k \mid \theta)}$$

Slide: Derek Hoiem

Example: Inference

>> pfg = 0.5;
>> px_fg = normpdf(im, mu_fg, sigma_fg);
>> px_bg = normpdf(im, mu_bg, sigma_bg);
>> pfg_x = px_fg*pfg ./ (px_fg*pfg + px_bg*(1-pfg));

Learned parameters: fg: mu=0.6, sigma=0.1; bg: mu=0.4, sigma=0.1. Output: p(fg | im).

Slide: Derek Hoiem

Mixture of Gaussians* Example: Matting

Figure and result from “Bayesian Matting”, Chuang et al. 2001

Dealing with Hidden Variables

3. How can we get both labels and appearance models at once?

Foreground

Background

Slide: Derek Hoiem

Mixture of Gaussians

$$p(x_n \mid \boldsymbol{\mu}, \boldsymbol{\sigma}^2, \boldsymbol{\pi}) = \sum_m p(x_n, z_n = m \mid \boldsymbol{\mu}, \boldsymbol{\sigma}^2, \boldsymbol{\pi})$$

$$p(x_n, z_n = m \mid \boldsymbol{\mu}, \boldsymbol{\sigma}^2, \boldsymbol{\pi}) = p(x_n \mid \mu_m, \sigma_m^2)\, p(z_n = m \mid \pi_m) = \frac{1}{\sqrt{2\pi\sigma_m^2}} \exp\!\left(-\frac{(x_n - \mu_m)^2}{2\sigma_m^2}\right) \cdot \pi_m$$

$z_n = m$: mixture component; $\pi_m$: component prior; $\mu_m, \sigma_m^2$: component model parameters

Slide: Derek Hoiem
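As a concrete illustration (not from the slides), a minimal MATLAB sketch that evaluates this 1-D mixture density, with assumed parameter vectors `mus`, `sigmas`, and `pis`:

% p(x | mu, sigma^2, pi) for a 1-D Gaussian mixture.
% x: N x 1 column vector; mus, sigmas, pis: 1 x M parameter vectors.
mog_pdf = @(x, mus, sigmas, pis) ...
    sum(normpdf(repmat(x, 1, numel(mus)), ...
                repmat(mus, numel(x), 1), ...
                repmat(sigmas, numel(x), 1)) .* repmat(pis, numel(x), 1), 2);

% Example: two components with equal priors
p = mog_pdf((0:0.01:1)', [0.4 0.6], [0.1 0.1], [0.5 0.5]);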

Mixture of Gaussians

With enough components, a mixture of Gaussians can approximate any probability density function
– Widely used as a general-purpose pdf estimator

Slide: Derek Hoiem

Segmentation with Mixture of Gaussians

Pixels come from one of several Gaussian components
– We don’t know which pixels come from which components
– We don’t know the parameters for the components

Slide: Derek Hoiem

Simple solution

1. Initialize parameters
2. Compute the probability of each hidden variable given the current parameters
3. Compute new parameters for each model, weighted by the likelihood of the hidden variables
4. Repeat 2-3 until convergence

Slide: Derek Hoiem

Mixture of Gaussians: Simple Solution

1. Initialize parameters

2. Compute likelihood of hidden variables for current parameters:
$$\alpha_{nm} = p(z_n = m \mid x_n, \boldsymbol{\mu}^{(t)}, \boldsymbol{\sigma}^{2(t)}, \boldsymbol{\pi}^{(t)})$$

3. Estimate new parameters for each model, weighted by likelihood:
$$\hat{\mu}_m^{(t+1)} = \frac{\sum_n \alpha_{nm}\, x_n}{\sum_n \alpha_{nm}} \qquad \hat{\sigma}_m^{2\,(t+1)} = \frac{\sum_n \alpha_{nm} \left(x_n - \hat{\mu}_m^{(t+1)}\right)^2}{\sum_n \alpha_{nm}} \qquad \hat{\pi}_m^{(t+1)} = \frac{\sum_n \alpha_{nm}}{N}$$

(A MATLAB sketch of this loop follows below.)

Slide: Derek Hoiem
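A minimal MATLAB sketch of this loop for a 1-D, two-component mixture (my own illustration of the updates above; `im` is the image from the earlier example, and the initialization is arbitrary):

x = im(:); N = numel(x); M = 2;                       % data, number of components
mus = [0.3 0.7]; sigmas = [0.2 0.2]; pis = [0.5 0.5]; % arbitrary initialization

for t = 1:100
    % Step 2 (E-step): alpha(n,m) = p(z_n = m | x_n, current parameters)
    alpha = normpdf(repmat(x,1,M), repmat(mus,N,1), repmat(sigmas,N,1)) ...
            .* repmat(pis,N,1);
    alpha = alpha ./ repmat(sum(alpha,2), 1, M);      % normalize over components

    % Step 3 (M-step): re-estimate parameters, weighted by alpha
    w = sum(alpha,1);                                 % 1 x M effective counts
    mus = (alpha' * x)' ./ w;
    sigmas = sqrt(sum(alpha .* (repmat(x,1,M) - repmat(mus,N,1)).^2, 1) ./ w);
    pis = w / N;
end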

Expectation Maximization (EM) Algorithm

Goal:
$$\hat{\theta} = \operatorname*{argmax}_{\theta} \log\!\left(\sum_{\mathbf{z}} p(\mathbf{x}, \mathbf{z} \mid \theta)\right)$$

The log of a sum is intractable, so we bound it with Jensen’s inequality, $f(\mathrm{E}[X]) \geq \mathrm{E}[f(X)]$ for concave functions such as $f(x) = \log(x)$. See www.stanford.edu/class/cs229/notes/cs229-notes8.ps for a proof.

Slide: Derek Hoiem
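To spell out how the bound is obtained (a standard step, not shown on the slide): for any distribution $q(\mathbf{z})$ over the hidden variables,

$$\log \sum_{\mathbf{z}} p(\mathbf{x}, \mathbf{z} \mid \theta) = \log \sum_{\mathbf{z}} q(\mathbf{z}) \frac{p(\mathbf{x}, \mathbf{z} \mid \theta)}{q(\mathbf{z})} \;\geq\; \sum_{\mathbf{z}} q(\mathbf{z}) \log \frac{p(\mathbf{x}, \mathbf{z} \mid \theta)}{q(\mathbf{z})}$$

EM chooses $q(\mathbf{z}) = p(\mathbf{z} \mid \mathbf{x}, \theta^{(t)})$, which makes the bound tight at $\theta = \theta^{(t)}$; maximizing the bound over $\theta$ is the M-step that follows.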

Expectation Maximization (EM) Algorithm

Goal:
$$\hat{\theta} = \operatorname*{argmax}_{\theta} \log\!\left(\sum_{\mathbf{z}} p(\mathbf{x}, \mathbf{z} \mid \theta)\right)$$

1. E-step: compute
$$\mathrm{E}_{\mathbf{z} \mid \mathbf{x}, \theta^{(t)}}\!\left[\log p(\mathbf{x}, \mathbf{z} \mid \theta)\right] = \sum_{\mathbf{z}} p(\mathbf{z} \mid \mathbf{x}, \theta^{(t)}) \log p(\mathbf{x}, \mathbf{z} \mid \theta)$$

2. M-step: solve
$$\theta^{(t+1)} = \operatorname*{argmax}_{\theta} \sum_{\mathbf{z}} p(\mathbf{z} \mid \mathbf{x}, \theta^{(t)}) \log p(\mathbf{x}, \mathbf{z} \mid \theta)$$

Slide: Derek Hoiem

EM for Mixture of Gaussians (on board)

$$p(x_n \mid \boldsymbol{\mu}, \boldsymbol{\sigma}^2, \boldsymbol{\pi}) = \sum_m p(x_n, z_n = m \mid \boldsymbol{\mu}, \boldsymbol{\sigma}^2, \boldsymbol{\pi}) = \sum_m \frac{1}{\sqrt{2\pi\sigma_m^2}} \exp\!\left(-\frac{(x_n - \mu_m)^2}{2\sigma_m^2}\right) \cdot \pi_m$$

1. E-step:
$$\alpha_{nm} = p(z_n = m \mid x_n, \boldsymbol{\mu}^{(t)}, \boldsymbol{\sigma}^{2(t)}, \boldsymbol{\pi}^{(t)})$$

2. M-step:
$$\hat{\mu}_m^{(t+1)} = \frac{\sum_n \alpha_{nm}\, x_n}{\sum_n \alpha_{nm}} \qquad \hat{\sigma}_m^{2\,(t+1)} = \frac{\sum_n \alpha_{nm} \left(x_n - \hat{\mu}_m^{(t+1)}\right)^2}{\sum_n \alpha_{nm}} \qquad \hat{\pi}_m^{(t+1)} = \frac{\sum_n \alpha_{nm}}{N}$$

Slide: Derek Hoiem

EM Algorithm

• Maximizes a lower bound on the data likelihood at each iteration

• Each step increases the data likelihood
  – Converges to a local maximum

• Common tricks in the derivation (a worked example follows below)
  – Find terms that sum or integrate to 1
  – Use a Lagrange multiplier to deal with constraints

Slide: Derek Hoiem
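As a worked example of the Lagrange-multiplier trick (my derivation, following the standard one): the mixture weights must satisfy $\sum_m \pi_m = 1$, so the M-step for $\boldsymbol{\pi}$ maximizes

$$\sum_{n,m} \alpha_{nm} \log \pi_m + \lambda \Big(1 - \sum_m \pi_m\Big)$$

Setting the derivative with respect to $\pi_m$ to zero gives $\pi_m = \frac{1}{\lambda}\sum_n \alpha_{nm}$; the constraint then forces $\lambda = \sum_{n,m} \alpha_{nm} = N$, recovering the update $\hat{\pi}_m = \frac{\sum_n \alpha_{nm}}{N}$ from the slides.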

Mixture of Gaussians demos

• http://www.cs.cmu.edu/~alad/em/
• http://lcn.epfl.ch/tutorial/english/gaussian/html/

Slide: Derek Hoiem

“Hard EM”

• Same as EM, except compute z* as the most likely values for the hidden variables

• K-means is an example (see the sketch below)

• Advantages
  – Simpler: can be applied when EM cannot be derived
  – Sometimes works better if you want to make hard predictions at the end

• But
  – Generally, the pdf parameters are not as accurate as with EM

Slide: Derek Hoiem
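A minimal sketch of the hard-EM variant for the same 1-D mixture (same assumed variables as in the earlier EM sketch; only the E-step changes, replacing soft weights with a hard argmax):

for t = 1:100
    % Hard E-step: pick the single most likely component for each point
    lik = normpdf(repmat(x,1,M), repmat(mus,N,1), repmat(sigmas,N,1)) ...
          .* repmat(pis,N,1);
    [~, zstar] = max(lik, [], 2);          % N x 1 hard assignments

    % M-step: re-estimate each component from its assigned points only
    % (a component that receives no points would need re-initialization)
    for m = 1:M
        idx = (zstar == m);
        mus(m) = mean(x(idx));
        sigmas(m) = sqrt(mean((x(idx) - mus(m)).^2));
        pis(m) = mean(idx);
    end
end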


What’s wrong with this prediction?

P(foreground | image)

Slide: Derek Hoiem

Next class

• MRFs and Graph-cut Segmentation

Slide: Derek Hoiem

Missing data problems

• Outlier detection
  – You expect that some of the data are valid points and some are outliers, but…
  – You don’t know which are inliers
  – You don’t know the parameters of the distribution of inliers

• Modeling probability densities
  – You expect that most of the colors within a region come from one of a few normally distributed sources, but…
  – You don’t know which source each pixel comes from
  – You don’t know the Gaussian means or covariances

• Discovering patterns or “topics”
  – You expect that there are some recurring “topics” of codewords within an image collection. You want to…
  – Figure out the frequency of each codeword within each topic
  – Figure out which topic each image belongs to

Slide: Derek Hoiem

Running Example: Segmentation

1. Given examples of foreground and background
   – Estimate distribution parameters
   – Perform inference
2. Knowing that foreground and background are “normally distributed”
3. Some initial estimate, but no good knowledge of foreground and background