lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... ·...
Transcript of lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... ·...
![Page 1: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/1.jpg)
Mixture Models & EM algorithm Probabilis5c Graphical Models, Lecture 13
David Sontag New York University
Slides adapted from Carlos Guestrin, Dan Klein, Luke Ze@lemoyer, Dan Weld, Vibhav Gogate, and Andrew Moore
![Page 2: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/2.jpg)
Mixture of Gaussians model
µ1 µ2
µ3
• P(Y): There are k components
• P(X|Y): Each component generates data from a mul5variate Gaussian with mean μi and covariance matrix Σi
Each data point is sampled from a genera&ve process:
1. Choose component i with probability P(y=i)
2. Generate datapoint ~ N(mi, Σi )
Gaussian mixture model (GMM)
![Page 3: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/3.jpg)
MulVvariate Gaussians
Σ = arbitrary (semidefinite) matrix: -‐ specifies rotaVon (change of basis) -‐ eigenvalues specify relaVve elongaVon
P(X = x j |Y = i) =1
(2π )m/2 || Σi ||1/2 exp −
12x j −µi( )
TΣi−1 x j −µi( )
#
$%&
'( P(X=xj)=
![Page 4: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/4.jpg)
Mixtures of Gaussians (1)
Old Faithful Data Set
Time to ErupV
on
DuraVon of Last ErupVon
![Page 5: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/5.jpg)
Mixtures of Gaussians (1)
Old Faithful Data Set
Single Gaussian Mixture of two Gaussians
![Page 6: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/6.jpg)
Mixtures of Gaussians (2)
Combine simple models into a complex model:
Component
Mixing coefficient
K=3
![Page 7: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/7.jpg)
Mixtures of Gaussians (3)
![Page 8: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/8.jpg)
Unsupervised learning Model data as mixture of mulVvariate Gaussians
![Page 9: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/9.jpg)
Unsupervised learning Model data as mixture of mulVvariate Gaussians
![Page 10: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/10.jpg)
Unsupervised learning Model data as mixture of mulVvariate Gaussians
Shown is the posterior probability that a point was generated from ith Gaussian: Pr(Y = i | x)
![Page 11: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/11.jpg)
ML esVmaVon in supervised se_ng
• Univariate Gaussian
• Mixture of Mul&variate Gaussians
ML esVmate for each of the MulVvariate Gaussians is given by:
Just sums over x generated from the k’th Gaussian
µML =1n
xnj=1
n
∑ ΣML =1n
x j −µML( ) x j −µML( )T
j=1
n
∑k k k k
![Page 12: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/12.jpg)
That was easy! But what if unobserved data?
• MLE: – argmaxθ ∏j P(yj,xj)
– θ: all model parameters • eg, class probs, means, and variances
• But we don’t know yj’s!!! • Maximize marginal likelihood:
– argmaxθ ∏j P(xj) = argmax ∏j ∑k=1 P(Yj=k, xj) K
![Page 13: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/13.jpg)
EM: Two Easy Steps Objec5ve: argmaxθ lg∏j ∑k=1
K P(Yj=k, xj |θ) = ∑j lg ∑k=1K P(Yj=k, xj|θ)
Data: {xj | j=1 .. n}
• E-‐step: Compute expectaVons to “fill in” missing y values according to current parameters, θ
– For all examples j and values k for Yj, compute: P(Yj=k | xj, θ)
• M-‐step: Re-‐esVmate the parameters with “weighted” MLE esVmates
– Set θ = argmaxθ ∑j ∑k P(Yj=k | xj, θ) log P(Yj=k, xj |θ)
Especially useful when the E and M steps have closed form soluVons!!!
![Page 14: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/14.jpg)
Simple example: learn means only! Consider: • 1D data • Mixture of k=2 Gaussians
• Variances fixed to σ=1 • DistribuVon over classes is uniform
• Just need to esVmate μ1 and μ2
€
P(x,Yj = k)k=1
K
∑j=1
m
∏ ∝ exp −12σ2
x − µk2⎡
⎣ ⎢ ⎤
⎦ ⎥ P(Yj = k)
k=1
K
∑j=1
m
∏
.01 .03 .05 .07 .09
=2
![Page 15: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/15.jpg)
EM for GMMs: only learning means Iterate: On the t’th iteraVon let our esVmates be
λt = { μ1(t), μ2(t) … μK(t) } E-‐step
Compute “expected” classes of all datapoints
M-‐step
Compute most likely new μs given class expectaVons €
P Yj = k x j ,µ1...µK( )∝ exp − 12σ 2 x j − µk
2⎛
⎝ ⎜
⎞
⎠ ⎟ P Yj = k( )
€
µk = P Yj = k x j( )
j=1
m
∑ x j
P Yj = k x j( )j=1
m
∑
![Page 16: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/16.jpg)
Gaussian Mixture Example: Start
![Page 17: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/17.jpg)
Amer first iteraVon
![Page 18: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/18.jpg)
Amer 2nd iteraVon
![Page 19: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/19.jpg)
Amer 3rd iteraVon
![Page 20: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/20.jpg)
Amer 4th iteraVon
![Page 21: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/21.jpg)
Amer 5th iteraVon
![Page 22: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/22.jpg)
Amer 6th iteraVon
![Page 23: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/23.jpg)
Amer 20th iteraVon
![Page 24: lecture13 Gaussian mixtures - MIT CSAILpeople.csail.mit.edu/dsontag/courses/pgm13/slides/... · lecture13_Gaussian_mixtures.pptx Author: David Sontag Created Date: 5/2/2013 8:19:27](https://reader033.fdocuments.in/reader033/viewer/2022042306/5ed1f53d07912b79177938bd/html5/thumbnails/24.jpg)
Jensen’s inequality
• Theorem: log ∑z P(z) f(z) ≥ ∑z P(z) log f(z) – e.g., Binary case for convex func5on f: