Advanced Topics in Learning and Vision
Transcript of Advanced Topics in Learning and Vision
Overview
• EM Algorithm
• Mixture of Factor Analyzers
• Mixture of Probabilistic Principal Component Analyzers
• Isometric Mapping
• Locally Linear Embedding
• Linear regression
• Logistic regression
• Linear classifier
• Fisher linear discriminant
Lecture 4 (draft) 1
Announcements
• More course material available on the course web page
• Code: PCA, FA, MoG, MFA, MPPCA, LLE, and Isomap
• Reading (due Oct 25):
- Fisher linear discriminant: Eigenfaces vs. Fisherfaces [1]
- Support vector machine: [3] or [2]
References
[1] P. Belhumeur, J. Hespanha, and D. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711–720, 1997.
[2] A. Mohan, C. Papageorgiou, and T. Poggio. Example-based object detection in images by components. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(4):349–361, 2001.
[3] M. Pontil and A. Verri. Support vector machines for 3D object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(6):637–646, 1998.
Mixture of Gaussians
p(x) = ∑_{k=1}^K π_k N(x|µ_k, Σ_k),  ∑_{k=1}^K π_k = 1,  0 ≤ π_k ≤ 1 (1)

where π_k is the mixing parameter, describing the contribution of the k-th Gaussian component in explaining x.
• Given data X = {x_1, . . . , x_N}, we want to determine the model parameters θ = {π_k, µ_k, Σ_k}.
- X is observable.
- The contribution of each data point x_i to the j-th Gaussian component, γ_j(x_i), is a hidden variable that can be derived from X and θ.
- θ is unknown.
• If we know θ, we can compute γj(xi).
• If we know γj(xi), we can compute θ.
• Chicken and egg problem.
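The mixture density in (1) can be sketched in a few lines of NumPy; the function names here are illustrative, not from any particular library:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate normal density N(x | mu, Sigma) for a single point x."""
    d = mu.shape[0]
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm

def mixture_density(x, pis, mus, Sigmas):
    """p(x) = sum_k pi_k N(x | mu_k, Sigma_k), with the pi_k summing to one."""
    return sum(pi * gaussian_pdf(x, mu, S) for pi, mu, S in zip(pis, mus, Sigmas))
```

Evaluating `mixture_density` at a point returns the weighted sum of the K component densities.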
EM algorithm
• Expectation Maximization
• First take some initial guess of the model parameters and compute the expectation of the hidden variables
• Iterative procedure
• Start with some initial guess and refine it
• Very useful technique
• Variational learning
• Generalized EM algorithm
EM Algorithm for Mixture of Gaussians
• Log likelihood function
ln p(X|π, µ, Σ) = ∑_{i=1}^N ln{∑_{k=1}^K π_k N(x_i|µ_k, Σ_k)} (2)
• No closed-form solution (the sum over components appears inside the log function)
• E (Expectation) step: Given all the current model parameters, compute theexpectation of the hidden variables
• M (Maximization) step: Optimize the log likelihood with respect to modelparameters
EM Algorithm for Mixture of Gaussians: M Step
ln L = ∑_{i=1}^N ln(∑_{k=1}^K π_k N_{ki}) (3)

where

N_{ki} = N(x_i|µ_k, Σ_k) (4)
Take the derivative of ln L w.r.t. µ_j:

∂ ln L/∂µ_j = ∑_{i=1}^N [π_j N_{ji} / ∑_{k=1}^K π_k N_{ki}] (1/N_{ji}) ∂N_{ji}/∂µ_j = 0 (5)

Note

(1/N_{ji}) ∂N_{ji}/∂µ_j = Σ_j^{−1}(x_i − µ_j) (6)
Let γ_j(x_i) = π_j N_{ji} / ∑_{k=1}^K π_k N_{ki}, i.e., the normalized probability of x_i being generated from the j-th Gaussian component. Then
∑_{i=1}^N γ_j(x_i) Σ_j^{−1}(x_i − µ_j) = 0 (7)
Thus,
µ_j = ∑_{i=1}^N γ_j(x_i) x_i / ∑_{i=1}^N γ_j(x_i) (8)
Likewise, take partial derivatives w.r.t. π_j and Σ_j:
π_j = (1/N) ∑_{i=1}^N γ_j(x_i) (9)

Σ_j = ∑_{i=1}^N γ_j(x_i)(x_i − µ_j)(x_i − µ_j)^T / ∑_{i=1}^N γ_j(x_i) (10)
Note that γ_j(x_i) plays a weighting role.
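The M-step updates (8)–(10) can be sketched in NumPy, assuming the responsibilities γ have already been computed and are stored as an N×K array (the function name is illustrative):

```python
import numpy as np

def m_step(X, gamma):
    """M-step: re-estimate pi_j, mu_j, Sigma_j from responsibilities.
    X: (N, d) data; gamma: (N, K) responsibilities gamma_j(x_i)."""
    N, d = X.shape
    Nk = gamma.sum(axis=0)                      # effective counts per component
    pis = Nk / N                                # eq. (9)
    mus = (gamma.T @ X) / Nk[:, None]           # eq. (8), weighted means
    Sigmas = []
    for j in range(gamma.shape[1]):
        diff = X - mus[j]
        # eq. (10): responsibility-weighted covariance
        Sigmas.append((gamma[:, j, None] * diff).T @ diff / Nk[j])
    return pis, mus, np.array(Sigmas)
```

With all responsibilities equal to one (a single component), the updates reduce to the sample mean and covariance, as expected.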
EM Algorithm for Mixture of Gaussians: E Step
• Compute the expected value of hidden variable γj
γ_j(x_i) = π_j N(x_i|µ_j, Σ_j) / ∑_{k=1}^K π_k N(x_i|µ_k, Σ_k) (11)
• Interpret the mixing coefficients as prior probabilities
p(x_i) = ∑_{j=1}^K π_j N(x_i|µ_j, Σ_j) = ∑_{j=1}^K p(j) p(x_i|j) (12)
• Thus, γj(xi) corresponds to posterior probabilities (responsibilities)
p(j|x_i) = p(j) p(x_i|j) / p(x_i) = π_j N(x_i|µ_j, Σ_j) / ∑_{k=1}^K π_k N(x_i|µ_k, Σ_k) = γ_j(x_i) (13)
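The E-step (11) as a NumPy sketch; it evaluates the Gaussian densities directly rather than calling a library routine:

```python
import numpy as np

def e_step(X, pis, mus, Sigmas):
    """E-step: gamma_j(x_i) = pi_j N(x_i|mu_j, Sigma_j) / sum_k pi_k N(x_i|mu_k, Sigma_k).
    X: (N, d); pis: (K,); mus: (K, d); Sigmas: (K, d, d). Returns (N, K)."""
    N, d = X.shape
    K = len(pis)
    dens = np.empty((N, K))
    for j in range(K):
        diff = X - mus[j]
        inv = np.linalg.inv(Sigmas[j])
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigmas[j]))
        # pi_j N(x_i | mu_j, Sigma_j) for all i at once
        dens[:, j] = pis[j] * np.exp(-0.5 * np.einsum('ni,ij,nj->n', diff, inv, diff)) / norm
    return dens / dens.sum(axis=1, keepdims=True)   # each row sums to one
```

A point equidistant from two identical, equally weighted components receives responsibility 0.5 from each.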
EM for Factor Analysis
• Factor analysis: x = Λz + ε, with z ∼ N(0, I) and ε ∼ N(0, Ψ), Ψ diagonal
• Log likelihood: L = log ∏_i (2π)^{−d/2} |Ψ|^{−1/2} exp{−(1/2)(x_i − Λz)^T Ψ^{−1}(x_i − Λz)}
• Hidden variable: z; model parameters: Λ, Ψ.
• E-step, with β = Λ^T(ΛΛ^T + Ψ)^{−1}:

E[z|x] = βx
E[zz^T|x] = Var(z|x) + E[z|x] E[z|x]^T = I − βΛ + βxx^Tβ^T (14)
• M-step:
Λ^new = (∑_{i=1}^N x_i E[z|x_i]^T)(∑_{i=1}^N E[zz^T|x_i])^{−1}

Ψ^new = (1/N) diag{∑_{i=1}^N x_i x_i^T − Λ^new E[z|x_i] x_i^T} (15)

where the diag operator sets all off-diagonal elements to zero.
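One full EM iteration for factor analysis, following (14)–(15); a NumPy sketch assuming zero-mean data and β = Λ^T(ΛΛ^T + Ψ)^{−1}:

```python
import numpy as np

def fa_em_step(X, Lam, Psi):
    """One EM iteration for factor analysis x = Lam z + eps, eps ~ N(0, Psi).
    X: (N, d) zero-mean data; Lam: (d, m); Psi: (d, d) diagonal."""
    N, d = X.shape
    m = Lam.shape[1]
    # E-step: beta = Lam^T (Lam Lam^T + Psi)^{-1}
    beta = Lam.T @ np.linalg.inv(Lam @ Lam.T + Psi)
    Ez = X @ beta.T                                   # E[z|x_i] stacked, (N, m)
    S = X.T @ X                                       # sum_i x_i x_i^T
    # sum_i E[zz^T|x_i] = N(I - beta Lam) + beta (sum_i x_i x_i^T) beta^T
    sumEzz = N * (np.eye(m) - beta @ Lam) + beta @ S @ beta.T
    # M-step, eq. (15)
    Lam_new = (X.T @ Ez) @ np.linalg.inv(sumEzz)
    Psi_new = np.diag(np.diag(S - Lam_new @ (Ez.T @ X))) / N
    return Lam_new, Psi_new
```

Iterating this step to convergence recovers the maximum-likelihood Λ and Ψ; note Ψ stays diagonal by construction.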
Mixture of Factor Analyzers (MFA)
• Assume we have K factor analyzers indexed by ω_k, k = 1, . . . , K; ω_k = 1 when the data point is generated by the k-th factor analyzer.
• The generative mixture model:
p(x) = ∑_{k=1}^K ∫ p(x|z, ω_k) p(z|ω_k) p(ω_k) dz (16)

where

p(z|ω_k) = p(z) = N(0, I) (17)
• Allow each factor analyzer to model the data covariance structure in a different part of the input space
p(x|z, ωk) = N(µk + Λkz,Ψ) (18)
EM for Mixture of Factor Analyzers
• For the E step, we need to compute the expectations of all hidden variables
E[ω_k|x_i] ∝ p(x_i, ω_k) = p(ω_k) p(x_i|ω_k) = π_k N(x_i|µ_k, Λ_kΛ_k^T + Ψ)
E[ω_k z|x_i] = E[ω_k|x_i] E[z|ω_k, x_i]
E[ω_k zz^T|x_i] = E[ω_k|x_i] E[zz^T|ω_k, x_i] (19)
• The model parameters are {(µ_k, Λ_k, π_k)_{k=1}^K, Ψ}.
• For the M step, take derivatives of the log likelihood with respect to the model parameters to obtain new µ_k, Λ_k, π_k, and Ψ.
• Read “The EM Algorithm for Mixtures of Factor Analyzers,” by Ghahramaniand Hinton for details.
• Also read Ghahramani’s lecture notes.
EM for Mixture of Probabilistic PCA
• Based on factor analyzers
• Read “Mixtures of Probabilistic Principal Component Analyzers,” by Tippingand Bishop.
MFA: Applications
• Modeling the manifolds of images of handwritten digits with a mixture of factor analyzers [Hinton et al. 97].
• Modeling the multimodal density of faces for recognition and detection [Frey et al. 98] [Yang et al. 00].
• Analyzing layers of appearance and motion [Frey and Jojic 99].
• A mixture of factor analyzers concurrently performs clustering and dimensionality reduction.
• Able to model nonlinear manifolds well.
Nonlinear Principal Component Analysis (NLPCA)
• Aims to better model nonlinear manifolds
• Based on a multi-layer (5-layer) perceptron
• The middle layer represents the feature space of the NLPCA transform
• Two additional hidden layers provide the nonlinearity
• Auto-encoder, auto-associator, bottleneck network.
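The bottleneck architecture can be sketched as a forward pass in NumPy (training code omitted). The layer widths, the tanh nonlinearity, and the function names are illustrative choices, not a prescribed design:

```python
import numpy as np

def autoencoder_forward(X, params):
    """Forward pass of a 5-layer autoencoder: d -> h -> m -> h -> d.
    The middle (bottleneck) layer of width m is the NLPCA feature space;
    the two hidden tanh layers supply the nonlinearity.
    params: list of four (W, b) pairs, one per weight layer."""
    H1 = np.tanh(X @ params[0][0] + params[0][1])    # encoder hidden layer
    Z = H1 @ params[1][0] + params[1][1]             # bottleneck: NLPCA features
    H2 = np.tanh(Z @ params[2][0] + params[2][1])    # decoder hidden layer
    Xhat = H2 @ params[3][0] + params[3][1]          # reconstruction of X
    return Z, Xhat

def init_params(d, h, m, rng):
    """Small random weights and zero biases for the four layers."""
    sizes = [(d, h), (h, m), (m, h), (h, d)]
    return [(0.1 * rng.standard_normal(s), np.zeros(s[1])) for s in sizes]
```

Training would minimize the reconstruction error ||X − Xhat||²; the learned Z then plays the role of the nonlinear principal components.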
Recap
• Linear dimensionality reduction:
- Assume data is generated from a subspace
- Determine the subspace with PCA or FA (i.e., the subspace is spanned by the principal components)
• Nonlinear dimensionality reduction:
- Model data with a mixture of locally linear subspaces
- Use a mixture of PCA or a mixture of FA
• Mixture methods have local coordinate systems
• Need to find transformation between coordinate systems
Isometric Mapping (Isomap) [Tenenbaum et al. 00]
• Preserving pairwise distance structure
• Approximate geodesic distance
• Nonlinear dimensionality reduction
• Use a global coordinate system
• Aim to find intrinsic dimensionality
Multidimensional Scaling (MDS)
• Analyze pairwise similarities of entities to gain insight into the underlying structure
• Based on a matrix of pairwise similarities
• Metric or non-metric
• Useful for data visualization
• Can be used for dimensionality reduction
• Preserve the pairwise similarity measure
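Classical (metric) MDS can be sketched in a few lines of NumPy; here τ is the standard double-centering that converts squared distances into inner products:

```python
import numpy as np

def classical_mds(D, M):
    """Classical metric MDS: embed N points in R^M so that Euclidean
    distances approximate the given pairwise distance matrix D (N x N)."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # tau(D): distances -> inner products
    w, V = np.linalg.eigh(B)                   # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:M]              # take the top M
    scale = np.sqrt(np.maximum(w[idx], 0.0))   # clip small negative eigenvalues
    return V[:, idx] * scale                   # (N, M) embedding coordinates
```

For points that genuinely lie in an M-dimensional Euclidean space, the embedding reproduces the pairwise distances exactly.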
Isomap: Algorithm
• Isomap:
- Construct the neighborhood graph: define a graph G over all data points by connecting points i and j if they are neighbors.
- Compute the shortest paths: for each pair of points i and j, compute their shortest path in G to obtain the geodesic distance matrix D_G.
- Construct the M-dimensional embedding: apply classical MDS to D_G to construct an M-dimensional Euclidean space Y. The coordinates y_i are obtained by minimizing

E = ||τ(D_G) − τ(D_Y)||_{L²} (20)

where τ converts distances into inner products that uniquely characterize the geometry of the data.
• The global minimum of (20) is obtained by setting the y_i to the top M eigenvectors of τ(D_G).
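The three steps above can be combined into a small NumPy sketch. Floyd-Warshall is used here for the shortest paths for self-containedness; a practical implementation would use a sparse-graph solver. The function name and the symmetrized k-NN rule are illustrative choices:

```python
import numpy as np

def isomap(X, k, M):
    """Isomap sketch: kNN graph -> geodesic distances -> classical MDS.
    X: (N, d) data; k: number of neighbors; M: embedding dimension.
    Assumes the kNN graph is connected."""
    N = X.shape[0]
    D = np.linalg.norm(X[:, None] - X[None], axis=-1)   # Euclidean distances
    # Neighborhood graph: keep edges to the k nearest neighbors (symmetrized).
    G = np.full((N, N), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(N):
        nbrs = np.argsort(D[i])[1:k + 1]
        G[i, nbrs] = D[i, nbrs]
        G[nbrs, i] = D[nbrs, i]
    # Shortest paths (Floyd-Warshall) approximate the geodesic distances D_G.
    for m in range(N):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    # Classical MDS on D_G: double-center, then take the top M eigenvectors.
    J = np.eye(N) - np.ones((N, N)) / N
    B = -0.5 * J @ (G ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:M]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
```

For points on a straight line the geodesic and Euclidean distances coincide, so a 1-D embedding recovers the arc length between the endpoints.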
Isomap: Applications
[Figures: intrinsic low-dimensional embedding; interpolation using the low-dimensional embedding]
Isomap: Applications
• Object recognition: memory-based recognition
• Object tracking: trajectory along inferred nonlinear manifold
• Video synthesis: interpolate along trajectory on nonlinear manifold
“Representation analysis and synthesis of lip images using dimensionality reduction,” Aharon and Kimmel, IJCV 2005.
Isomap: Applications
• States of a moving object move smoothly along a low dimensional manifold.
• Discover the underlying manifold using Isomap
• Learn the mapping between the input data and the corresponding points on the low dimensional manifold using a mixture of factor analyzers.
• Learn a dynamical model based on the points on the low dimensionalmanifold.
• Use particle filter for tracking.
“Learning object intrinsic structure for robust visual tracking,” Wang et al., CVPR 2003.
Locally Linear Embedding (LLE) [Roweis et al. 00]
• Capture local geometry by linear reconstruction
• Map high dimensional data to global internal coordinates
Locally Linear Embedding: Algorithm
• LLE
- For each point, determine its neighbors.
- Reconstruct each point with linear weights on its neighbors by minimizing

E(W) = ∑_i |x_i − ∑_j w_{ij} x_j|² (21)

to find W.
- Map to embedded coordinates: fix W and project x ∈ R^d to y ∈ R^M (M < d) by minimizing

φ(Y) = ∑_i |y_i − ∑_j w_{ij} y_j|² (22)
• The embedding cost (22) defines an unconstrained optimization problem. Adding a normalization constraint turns it into an eigenvalue problem.
• The bottom M nonzero eigenvectors provide an ordered set of orthogonal coordinates.
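The two LLE steps can be sketched in NumPy. The regularization of the local covariance is a common practical addition rather than part of the original formulation, and the function name is illustrative:

```python
import numpy as np

def lle(X, k, M, reg=1e-3):
    """LLE sketch: reconstruction weights (eq. 21), then embedding (eq. 22).
    X: (N, d) data; k: neighbors per point; M: embedding dimension."""
    N = X.shape[0]
    D = np.linalg.norm(X[:, None] - X[None], axis=-1)
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(D[i])[1:k + 1]         # k nearest neighbors of x_i
        Z = X[nbrs] - X[i]                       # neighbors centered on x_i
        C = Z @ Z.T                              # local covariance
        C += reg * np.trace(C) * np.eye(k)       # regularize for stability
        w = np.linalg.solve(C, np.ones(k))       # solve for the weights
        W[i, nbrs] = w / w.sum()                 # enforce sum-to-one constraint
    # Embedding: bottom eigenvectors of (I - W)^T (I - W), skipping the
    # constant eigenvector with eigenvalue ~0.
    Mcost = (np.eye(N) - W).T @ (np.eye(N) - W)
    _, V = np.linalg.eigh(Mcost)                 # eigenvalues ascending
    return V[:, 1:M + 1]                         # (N, M) coordinates
```

The returned columns are the bottom nonzero eigenvectors, i.e., the embedded coordinates with the normalization constraint built in.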
LLE: Applications
[Figures: learning the embedding of facial expression images for synthesis; learning the embedding of lip images for synthesis]
Isomap and LLE
• Embedding rather than mapping function
• No probabilistic interpretation
• No generative model
• Do not take temporal information into consideration
• Unsupervised learning
Further Study
• Kernel PCA.
• Principal Curve.
• Laplacian Eigenmap.
• Hessian Isomap.
• Spectral clustering.
• Unified view of spectral embedding and clustering.
• Global coordination of local generative models:
- Global coordination of local linear representations
- Automatic alignment of local representations