Introduction to Machine Learning - Brown...
Transcript of Introduction to Machine Learning - Brown...
![Page 1: Introduction to Machine Learning - Brown Universitycs.brown.edu/courses/csci1950-f/spring2011/lectures/2011... · 2011-04-22 · Introduction to Machine Learning Brown University](https://reader030.fdocuments.in/reader030/viewer/2022041104/5f0490ab7e708231d40e9933/html5/thumbnails/1.jpg)
Introduction to Machine Learning
Brown University CSCI 1950-F, Spring 2011 Prof. Erik Sudderth
Lecture 21: EM for Factor Analysis
Many figures courtesy Kevin Murphy’s textbook, Machine Learning: A Probabilistic Perspective
![Page 2: Introduction to Machine Learning - Brown Universitycs.brown.edu/courses/csci1950-f/spring2011/lectures/2011... · 2011-04-22 · Introduction to Machine Learning Brown University](https://reader030.fdocuments.in/reader030/viewer/2022041104/5f0490ab7e708231d40e9933/html5/thumbnails/2.jpg)
Principal Components Analysis (PCA)
3D Data
Best 2D Projection
Best 1D Projection
![Page 3: Introduction to Machine Learning - Brown Universitycs.brown.edu/courses/csci1950-f/spring2011/lectures/2011... · 2011-04-22 · Introduction to Machine Learning Brown University](https://reader030.fdocuments.in/reader030/viewer/2022041104/5f0490ab7e708231d40e9933/html5/thumbnails/3.jpg)
PCA as Low Rank Approximation
![Page 4: Introduction to Machine Learning - Brown Universitycs.brown.edu/courses/csci1950-f/spring2011/lectures/2011... · 2011-04-22 · Introduction to Machine Learning Brown University](https://reader030.fdocuments.in/reader030/viewer/2022041104/5f0490ab7e708231d40e9933/html5/thumbnails/4.jpg)
Maximizes Variance & Minimizes Error
C. Bishop, Pattern Recognition & Machine Learning
![Page 5: Introduction to Machine Learning - Brown Universitycs.brown.edu/courses/csci1950-f/spring2011/lectures/2011... · 2011-04-22 · Introduction to Machine Learning Brown University](https://reader030.fdocuments.in/reader030/viewer/2022041104/5f0490ab7e708231d40e9933/html5/thumbnails/5.jpg)
Probabilistic PCA & Factor Analysis
C. Bishop, Pattern Recognition & Machine Learning
•! Both Models: Data is a linear function of low-dimensional latent coordinates, plus Gaussian noise
•! Factor analysis: is a general diagonal matrix •! Probabilistic PCA: is a multiple of identity matrix
![Page 6: Introduction to Machine Learning - Brown Universitycs.brown.edu/courses/csci1950-f/spring2011/lectures/2011... · 2011-04-22 · Introduction to Machine Learning Brown University](https://reader030.fdocuments.in/reader030/viewer/2022041104/5f0490ab7e708231d40e9933/html5/thumbnails/6.jpg)
Lower Bounds on Marginal Likelihood
C. Bishop, Pattern Recognition & Machine Learning
![Page 7: Introduction to Machine Learning - Brown Universitycs.brown.edu/courses/csci1950-f/spring2011/lectures/2011... · 2011-04-22 · Introduction to Machine Learning Brown University](https://reader030.fdocuments.in/reader030/viewer/2022041104/5f0490ab7e708231d40e9933/html5/thumbnails/7.jpg)
Expectation Maximization Algorithm
C. Bishop, Pattern Recognition & Machine Learning
E Step: Optimize distribution on hidden variables given parameters
M Step: Optimize parameters given distribution on hidden variables
![Page 8: Introduction to Machine Learning - Brown Universitycs.brown.edu/courses/csci1950-f/spring2011/lectures/2011... · 2011-04-22 · Introduction to Machine Learning Brown University](https://reader030.fdocuments.in/reader030/viewer/2022041104/5f0490ab7e708231d40e9933/html5/thumbnails/8.jpg)
EM Algorithm for Probabilistic PCA
C. Bishop, Pattern Recognition & Machine Learning
True Principal Components
![Page 9: Introduction to Machine Learning - Brown Universitycs.brown.edu/courses/csci1950-f/spring2011/lectures/2011... · 2011-04-22 · Introduction to Machine Learning Brown University](https://reader030.fdocuments.in/reader030/viewer/2022041104/5f0490ab7e708231d40e9933/html5/thumbnails/9.jpg)
EM Algorithm for Probabilistic PCA
C. Bishop, Pattern Recognition & Machine Learning
E-Step #1 (Projection)
![Page 10: Introduction to Machine Learning - Brown Universitycs.brown.edu/courses/csci1950-f/spring2011/lectures/2011... · 2011-04-22 · Introduction to Machine Learning Brown University](https://reader030.fdocuments.in/reader030/viewer/2022041104/5f0490ab7e708231d40e9933/html5/thumbnails/10.jpg)
EM Algorithm for Probabilistic PCA
C. Bishop, Pattern Recognition & Machine Learning
M-Step #1 (Regression)
![Page 11: Introduction to Machine Learning - Brown Universitycs.brown.edu/courses/csci1950-f/spring2011/lectures/2011... · 2011-04-22 · Introduction to Machine Learning Brown University](https://reader030.fdocuments.in/reader030/viewer/2022041104/5f0490ab7e708231d40e9933/html5/thumbnails/11.jpg)
EM Algorithm for Probabilistic PCA
C. Bishop, Pattern Recognition & Machine Learning
E-Step #2 (Projection)
![Page 12: Introduction to Machine Learning - Brown Universitycs.brown.edu/courses/csci1950-f/spring2011/lectures/2011... · 2011-04-22 · Introduction to Machine Learning Brown University](https://reader030.fdocuments.in/reader030/viewer/2022041104/5f0490ab7e708231d40e9933/html5/thumbnails/12.jpg)
EM Algorithm for Probabilistic PCA
C. Bishop, Pattern Recognition & Machine Learning
M-Step #2 (Regression)
![Page 13: Introduction to Machine Learning - Brown Universitycs.brown.edu/courses/csci1950-f/spring2011/lectures/2011... · 2011-04-22 · Introduction to Machine Learning Brown University](https://reader030.fdocuments.in/reader030/viewer/2022041104/5f0490ab7e708231d40e9933/html5/thumbnails/13.jpg)
EM Algorithm for Probabilistic PCA
C. Bishop, Pattern Recognition & Machine Learning
Converged Solution
![Page 14: Introduction to Machine Learning - Brown Universitycs.brown.edu/courses/csci1950-f/spring2011/lectures/2011... · 2011-04-22 · Introduction to Machine Learning Brown University](https://reader030.fdocuments.in/reader030/viewer/2022041104/5f0490ab7e708231d40e9933/html5/thumbnails/14.jpg)
Why use the EM Algorithm for PCA? •! For large datasets, can be more computationally efficient
than an eigendecomposition or SVD •! Regularization: can put priors on model parameters,
do Bayesian model order selection, etc. •! Cleanly handles cases where some entries of the data
matrix are unobserved or missing (e.g., movie ratings) •! Generalizes to other models where there is no closed form
for the maximum likelihood estimates (e.g., factor analysis)
•! Probabilistic PCA models all rotations of the input data equally well (are basis vectors meaningful?)
•! Factor analysis models all element-wise rescalings of the input data equally well (better when varying units)
Probabilistic PCA or Factor Analysis
![Page 15: Introduction to Machine Learning - Brown Universitycs.brown.edu/courses/csci1950-f/spring2011/lectures/2011... · 2011-04-22 · Introduction to Machine Learning Brown University](https://reader030.fdocuments.in/reader030/viewer/2022041104/5f0490ab7e708231d40e9933/html5/thumbnails/15.jpg)
Factor Analysis Example
Features of Cars in 2004