Lecture 19: More EM
Transcript of Lecture 19: More EM
Lecture 19: More EM
Machine Learning, April 15, 2010
Last Time
• Expectation Maximization
• Gaussian Mixture Models
Today
• EM Proof
  – Jensen’s Inequality
• Clustering sequential data
  – EM over HMMs
  – EM in any Graphical Model
• Gibbs Sampling
Gaussian Mixture Models
How can we be sure GMM/EM works?
• We’ve already seen that there are multiple clustering solutions for the same data.
  – Non-convex optimization problem
• Can we prove that we’re approaching some maximum, even if many exist?
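This non-convexity is easy to see empirically. The sketch below (an illustration, not code from the lecture) fits a two-component 1-D GMM with EM; within any single run the log-likelihood never decreases, but different random initializations can settle at different local optima.

```python
import math
import random

def gauss_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, n_iter=50, seed=0):
    """Minimal EM for a 2-component 1-D Gaussian mixture.
    Returns the fitted parameters plus the per-iteration log-likelihoods,
    which EM is guaranteed never to decrease."""
    rng = random.Random(seed)
    pi, mu, var = 0.5, [rng.choice(xs), rng.choice(xs)], [1.0, 1.0]
    ll_trace = []
    for _ in range(n_iter):
        # E-step: responsibilities r_i = p(z = 0 | x_i, theta).
        resp, ll = [], 0.0
        for x in xs:
            p0 = pi * gauss_pdf(x, mu[0], var[0])
            p1 = (1 - pi) * gauss_pdf(x, mu[1], var[1])
            resp.append(p0 / (p0 + p1))
            ll += math.log(p0 + p1)
        ll_trace.append(ll)
        # M-step: responsibility-weighted MLE updates.
        n0 = sum(resp)
        n1 = len(xs) - n0
        pi = n0 / len(xs)
        mu[0] = sum(r * x for r, x in zip(resp, xs)) / n0
        mu[1] = sum((1 - r) * x for r, x in zip(resp, xs)) / n1
        var[0] = max(1e-6, sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n0)
        var[1] = max(1e-6, sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n1)
    return pi, mu, var, ll_trace
```

Running `em_gmm_1d` on the same data with several different seeds and comparing the final log-likelihoods is a quick way to observe the multiple-optima problem the slide describes.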
Bound maximization
• Since we can’t optimize the GMM parameters directly, maybe we can find the maximum of a lower bound.
• Technically: optimize a concave lower bound of the initial non-convex function.
EM as a bound maximization problem
• Need to define a function Q(x,Θ) such that
  – Q(x,Θ) ≤ l(x,Θ) for all x,Θ
  – Q(x,Θ) = l(x,Θ) at a single point
  – Q(x,Θ) is concave
EM as bound maximization
• Claim: for the GMM likelihood, the expected complete-data log-likelihood used in the GMM MLE updates is a concave lower bound.
EM Correctness Proof
• Prove that l(x,Θ) ≥ Q(x,Θ)
  – Start from the likelihood function
  – Introduce the hidden variable (the mixture components in a GMM)
  – Condition on a fixed value of θt
  – Apply Jensen’s Inequality (coming soon…)
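The chain of steps above can be written out explicitly; the slide's equations are not in the transcript, so this is a standard reconstruction of the bound:

```latex
\begin{aligned}
l(x,\Theta) &= \log p(x \mid \Theta)
  && \text{likelihood function} \\
 &= \log \sum_{z} p(x, z \mid \Theta)
  && \text{introduce hidden variable } z \\
 &= \log \sum_{z} p(z \mid x, \Theta^{t})\,
      \frac{p(x, z \mid \Theta)}{p(z \mid x, \Theta^{t})}
  && \text{multiply by } 1 \text{ using fixed } \Theta^{t} \\
 &\ge \sum_{z} p(z \mid x, \Theta^{t})\,
      \log \frac{p(x, z \mid \Theta)}{p(z \mid x, \Theta^{t})}
  \;\equiv\; Q(x, \Theta)
  && \text{Jensen's inequality (log is concave)}
\end{aligned}
```

Equality holds at Θ = Θt, which gives exactly the "touches at a single point" property required of Q.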
EM Correctness Proof
GMM Maximum Likelihood Estimation
The missing link: Jensen’s Inequality
• If f is concave (convex down): f(E[X]) ≥ E[f(X)]
• Incredibly important tool for dealing with mixture models.
• In particular, if f(x) = log(x), then log(E[X]) ≥ E[log(X)].
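Jensen's inequality is easy to sanity-check numerically. A quick sketch with f = log and a uniform distribution over a handful of sample points:

```python
import math

# Jensen's inequality for concave f: f(E[X]) >= E[f(X)].
# Check it with f = log and X uniform over these samples.
xs = [0.5, 1.0, 2.0, 4.0, 8.0]
mean_x = sum(xs) / len(xs)                            # E[X]
log_of_mean = math.log(mean_x)                        # f(E[X])
mean_of_log = sum(math.log(x) for x in xs) / len(xs)  # E[f(X)]
assert log_of_mean >= mean_of_log                     # Jensen holds
```

Note the inequality runs the right way for EM: it lets us pull the log inside the sum over hidden variables, at the cost of only lower-bounding the likelihood.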
Generalizing EM from GMM
• Notice: the EM optimization proof never used the exact form of the GMM, only the introduction of a hidden variable, z.
• Thus, we can generalize the form of EM to broader types of latent variable models.
General form of EM
• Given a joint distribution p(X, Z | Θ) over observed variables X and latent variables Z
• Want to maximize the likelihood p(X | Θ)
1. Initialize the parameters Θold
2. E-Step: Evaluate p(Z | X, Θold)
3. M-Step: Re-estimate parameters by maximizing the expectation of the complete-data log-likelihood
4. Check for convergence of parameters or likelihood
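The four steps above can be sketched on a toy latent-variable model. The two-coin mixture below is an assumed example, not a model from the lecture: each trial uses one of two coins (the hidden z, with a uniform prior), we observe only (heads, tails) counts, and EM recovers both coin biases.

```python
def em_two_coins(flip_counts, n_iter=20):
    """General EM recipe on a two-coin mixture (illustrative sketch).
    flip_counts: list of (heads, tails) per trial; which coin was used
    is hidden. Returns estimated biases [theta0, theta1]."""
    theta = [0.3, 0.7]  # step 1: initialize the parameters
    for _ in range(n_iter):
        # Step 2 (E): posterior p(z = coin0 | trial, theta_old) per trial,
        # accumulated as expected heads/tails credited to each coin.
        stats = [0.0, 0.0, 0.0, 0.0]  # [h0, t0, h1, t1]
        for h, t in flip_counts:
            lik0 = theta[0] ** h * (1 - theta[0]) ** t
            lik1 = theta[1] ** h * (1 - theta[1]) ** t
            r0 = lik0 / (lik0 + lik1)  # responsibility of coin 0
            stats[0] += r0 * h
            stats[1] += r0 * t
            stats[2] += (1 - r0) * h
            stats[3] += (1 - r0) * t
        # Step 3 (M): closed-form maximizer of the expected
        # complete-data log-likelihood (weighted coin-flip MLE).
        theta = [stats[0] / (stats[0] + stats[1]),
                 stats[2] / (stats[2] + stats[3])]
    # Step 4 (convergence check) is replaced by a fixed iteration count here.
    return theta
```

The same skeleton, with different E-step posteriors and M-step maximizers, is what the following slides instantiate for HMMs and general graphical models.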
Applying EM to Graphical Models
• Now we have a general form for learning parameters for latent variables.
  – Take a guess
  – Expectation: evaluate the likelihood
  – Maximization: re-estimate parameters
  – Check for convergence
Clustering over sequential data
• Recall HMMs
• We only looked at training supervised HMMs.
• What if you believe the data is sequential, but you can’t observe the state?
EM on HMMs
• Also known as the Baum-Welch algorithm
• Recall the HMM parameters: the initial state distribution, the transition probabilities, and the emission probabilities.
• Now the training counts are estimated.
EM on HMMs
• Standard EM Algorithm
  – Initialize
  – E-Step: evaluate expected likelihood
  – M-Step: reestimate parameters from expected likelihood
  – Check for convergence
EM on HMMs
• Guess: Initialize the parameters
• E-Step: Compute the expected sufficient statistics (state-occupancy and transition counts) under the current parameters
EM on HMMs
• But what are these E{…} quantities?
• They can be efficiently calculated from JTA potentials and separators.
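On a chain-structured model, JTA message passing reduces to the forward-backward algorithm, so the E{…} state-occupancy quantities are just posterior marginals. A small illustrative sketch (not the lecture's code) for a discrete HMM:

```python
def forward_backward(obs, pi, A, B):
    """Posterior state marginals gamma[t][i] = p(z_t = i | o_1..o_T)
    for a discrete HMM: the expected state-occupancy counts.
    pi: initial distribution, A[i][j]: transition probs,
    B[i][o]: emission probs. Uses per-step rescaling for stability."""
    n, T = len(pi), len(obs)
    # Forward pass: alpha[t] is the normalized filtering distribution.
    alpha, scale = [], []
    prev = [pi[i] * B[i][obs[0]] for i in range(n)]
    for t in range(T):
        if t > 0:
            prev = [B[i][obs[t]] * sum(alpha[t - 1][j] * A[j][i]
                                       for j in range(n))
                    for i in range(n)]
        s = sum(prev)
        scale.append(s)
        alpha.append([p / s for p in prev])
    # Backward pass, rescaled with the same constants.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / scale[t + 1]
    # Combine: smoothed posteriors, each row summing to 1.
    return [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
```

Pairwise transition posteriors (the expected transition counts that feed the Baum-Welch M-step) come from the same alpha/beta quantities, corresponding to the JTA separator marginals on the chain.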
EM on HMMs
• Standard EM Algorithm
  – Initialize
  – E-Step: evaluate expected likelihood
    • JTA algorithm
  – M-Step: reestimate parameters from expected likelihood
    • Using expected values from JTA potentials and separators
  – Check for convergence
Training latent variables in Graphical Models
• Now consider a general Graphical Model with latent variables.
EM on Latent Variable Models
• Guess
  – Easy: just assign random values to the parameters
• E-Step: Evaluate the likelihood
  – We can use JTA to evaluate the likelihood
  – And marginalize expected parameter values
• M-Step: Re-estimate parameters
  – This can get trickier
Maximization Step in Latent Variable Models
• Why is this easy in HMMs, but difficult in general Latent Variable Models?
• Example: a graphical model in which a node has many parents
Junction Trees
• In general, we have no guarantee that we can isolate a single variable.
• We need to estimate each marginal separately.
• “Dense graphs”
M-Step in Latent Variable Models
• M-Step: Reestimate parameters
  – Keep k-1 parameters fixed (to the current estimate)
  – Identify a better guess for the free parameter
M-Step in Latent Variable Models
• M-Step: Reestimate parameters
  – Gibbs Sampling
  – This is helpful if it’s easier to sample from a conditional than it is to integrate to get the marginal
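That trade-off (easy conditionals, awkward joint) is the whole point of Gibbs sampling. A minimal sketch on an assumed toy target, a standard bivariate normal with correlation rho, where each full conditional is a 1-D Gaussian:

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples=20000, seed=0):
    """Gibbs sampling sketch (illustrative example, not the lecture's model).
    Target: standard bivariate normal with correlation rho. The full
    conditionals are 1-D Gaussians, trivial to sample from:
        x | y ~ N(rho * y, 1 - rho^2)
        y | x ~ N(rho * x, 1 - rho^2)
    Alternating these draws leaves the joint invariant."""
    rng = random.Random(seed)
    sd = math.sqrt(1 - rho * rho)  # conditional standard deviation
    x = y = 0.0
    samples = []
    for i in range(n_samples):
        x = rng.gauss(rho * y, sd)  # draw from p(x | y)
        y = rng.gauss(rho * x, sd)  # draw from p(y | x)
        if i >= 1000:               # discard burn-in sweeps
            samples.append((x, y))
    return samples
```

Sample moments computed from the output should approach the target's: means near 0 and sample correlation near rho, which is a quick way to check the sampler.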
EM on Latent Variable Models
• Guess
  – Easy: just assign random values to the parameters
• E-Step: Evaluate the likelihood
  – We can use JTA to evaluate the likelihood
  – And marginalize expected parameter values
• M-Step: Re-estimate parameters
  – Either JTA potentials and marginals, OR
  – Sampling
Today
• EM as bound maximization
• EM as a general approach to learning parameters for latent variables
• Sampling
Next Time
• Model Adaptation
  – Using labeled and unlabeled data to improve performance
• Model Adaptation Application
  – Speaker Recognition
    • UBM-MAP