EE462 MLCV Lecture 3-4: Clustering (1hr), Gaussian Mixture and EM (1hr). Tae-Kyun Kim.
Vector Clustering
Data points (green), 2D vectors, are grouped into two homogeneous clusters (blue and red). Clustering is achieved by an iterative algorithm (left to right). The cluster centres are marked x.
Pixel Clustering (Image Quantisation)
Image pixels are represented by 3D vectors x ∈ R³ of R, G, B values. The vectors are grouped into K = 10, 3, 2 clusters and represented by the mean values of the respective clusters.
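The quantisation step above can be sketched with a minimal K-means in Python (NumPy only; the function name, deterministic initialisation, and fixed iteration count are illustrative assumptions, not part of the lecture):

```python
import numpy as np

def quantise_pixels(pixels, k, n_iters=20):
    """Group (N, 3) RGB pixel vectors into k clusters and replace each
    pixel by the mean colour of its cluster (image quantisation)."""
    pixels = np.asarray(pixels, dtype=float)
    # Illustrative deterministic init: k pixels evenly spaced through the array.
    centres = pixels[np.linspace(0, len(pixels) - 1, k).astype(int)].copy()
    for _ in range(n_iters):
        # Assign each pixel to its nearest cluster centre in RGB space.
        d = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centre to the mean colour of the pixels assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centres[j] = pixels[labels == j].mean(axis=0)
    return centres[labels], labels  # quantised pixels, cluster indices
```

With small K the output image keeps only K distinct colours, as in the K = 10, 3, 2 example above.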
Patch Clustering
Image patches are harvested around interest points from a large number of images. They are represented by finite-dimensional vectors of dimension D (e.g. SIFT descriptors, or raw 20×20 pixel patches giving D = 400), and clustered to form a visual dictionary of K codewords.
Lecture 9-10 (BoW)
Image Clustering
Whole images are represented as finite-dimensional vectors. Homogeneous vectors are grouped together in Euclidean space.
Lecture 9-10 (BoW)
K-means vs GMM
Two standard methods are K-means and the Gaussian Mixture Model (GMM). K-means assigns data points to the nearest clusters, while a GMM represents the data by multiple Gaussian densities.
Hard clustering: a data point is assigned to a single cluster.
Soft clustering: a data point is explained probabilistically by a mixture of multiple Gaussians.
Matrix and Vector Derivatives
Matrix and vector derivatives are obtained by taking element-wise derivatives first and then reforming them into matrices and vectors.
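As a worked instance of this element-wise recipe (it is the derivative used later in the K-means M step), differentiating a squared Euclidean distance with respect to a vector:

```latex
\frac{\partial}{\partial \mu_i}\,\lVert \mathbf{x}-\boldsymbol{\mu}\rVert^{2}
  = \frac{\partial}{\partial \mu_i}\sum_{j=1}^{D}(x_j-\mu_j)^{2}
  = -2\,(x_i-\mu_i),
\qquad\text{so}\qquad
\frac{\partial}{\partial \boldsymbol{\mu}}\,\lVert \mathbf{x}-\boldsymbol{\mu}\rVert^{2}
  = -2\,(\mathbf{x}-\boldsymbol{\mu}).
```

Each component is differentiated separately, and the results are stacked back into a vector of the same shape as μ.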
K-means Clustering
Given a data set {x1,…, xN} of N observations in a D-dimensional space, our goal is to partition the data set into K clusters or groups.
The vectors μk, k = 1,…,K, represent the k-th cluster, e.g. the centres of the clusters.
Binary indicator variables rnk ∈ {0, 1}, k = 1,…,K, are defined for each data point xn.
1-of-K coding scheme: if xn is assigned to cluster k, then rnk = 1 and rnj = 0 for j ≠ k.
The objective function that measures distortion is
J = Σn=1..N Σk=1..K rnk ‖xn − μk‖².
We must find the {rnk} and {μk} that minimise J.
Iterative solution: first we choose some initial values for μk, then we repeat the following two steps until convergence.
Step 1: We minimise J with respect to rnk, keeping μk fixed. J is a linear function of rnk, so we have the closed-form solution
rnk = 1 if k = argminj ‖xn − μj‖², and rnk = 0 otherwise.
Step 2: We minimise J with respect to μk, keeping rnk fixed. J is quadratic in μk. Setting its derivative with respect to μk to zero gives
μk = Σn rnk xn / Σn rnk.
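The two alternating steps can be sketched as follows (a minimal NumPy sketch; the deterministic initialisation and fixed iteration count are illustrative assumptions rather than the lecture's prescription):

```python
import numpy as np

def kmeans(X, K, n_iters=50):
    """Minimise J = sum_n sum_k r_nk ||x_n - mu_k||^2 by alternating Steps 1 and 2."""
    X = np.asarray(X, dtype=float)
    # Initial values for mu_k: K evenly spaced data points (random init is also common).
    mu = X[np.linspace(0, len(X) - 1, K).astype(int)].copy()
    for _ in range(n_iters):
        # Step 1: mu fixed, minimise over r_nk -> assign each point to nearest centre.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        r = d2.argmin(axis=1)
        # Step 2: r fixed, minimise over mu_k -> mu_k = mean of its assigned points.
        for k in range(K):
            if np.any(r == k):
                mu[k] = X[r == k].mean(axis=0)
    J = ((X - mu[r]) ** 2).sum()  # final distortion
    return mu, r, J
```

Each pass through the loop cannot increase J, which is the basis of the convergence argument on the next slide.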
Since each step decreases J, the procedure is guaranteed to converge. However, it converges only to a local minimum: its result depends on the initial values of μk.
Generalisation of K-means
K-means can be generalised using a more generic dissimilarity measure V(xn, μk). The objective function to minimise is
J = Σn Σk rnk V(xn, μk), where V(xn, μk) = (xn − μk)ᵀ Σk⁻¹ (xn − μk)
and Σk denotes the covariance matrix of the k-th cluster. In 2D, Σk = [σx² σxy; σyx σy²].
Cluster shapes by different Σk:
Σk = I: circles of the same size.
Σk an isotropic matrix: circles of different sizes.
Σk a diagonal matrix: ellipses.
Σk a full matrix: rotated ellipses.
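The generalised dissimilarity V is the squared Mahalanobis distance; a minimal sketch (the helper name is hypothetical):

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """V(x, mu) = (x - mu)^T Sigma^{-1} (x - mu): squared Mahalanobis distance."""
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    # Solving Sigma v = d avoids forming the explicit inverse.
    return float(d @ np.linalg.solve(Sigma, d))
```

With Σk = I this reduces to the squared Euclidean distance of plain K-means; an anisotropic Σk stretches the unit ball into ellipses, which is exactly the cluster-shape behaviour listed above.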
Statistical Pattern Recognition Toolbox for Matlab
http://cmp.felk.cvut.cz/cmp/software/stprtool/
…\stprtool\probab\cmeans.m
Mixture of Gaussians
Denote by z a 1-of-K representation: zk ∈ {0, 1} and Σk zk = 1.
We define the joint distribution p(x, z) by a marginal distribution p(z) and a conditional distribution p(x|z).
Lecture 11-12 (Prob. Graphical models)
In the graphical model, z is the hidden variable and x is the observable variable (the data).
The marginal distribution over z is written in terms of the mixing coefficients πk:
p(zk = 1) = πk, where 0 ≤ πk ≤ 1 and Σk=1..K πk = 1.
The marginal distribution is in the form
p(z) = Πk=1..K πk^zk.
Similarly, the conditional distribution is
p(x|zk = 1) = N(x|μk, Σk), i.e. p(x|z) = Πk=1..K N(x|μk, Σk)^zk.
Marginalising over z gives the Gaussian mixture
p(x) = Σz p(z) p(x|z) = Σk=1..K πk N(x|μk, Σk).
The conditional probability p(zk = 1|x), denoted by γ(zk), is obtained by Bayes' theorem:
γ(zk) = πk N(x|μk, Σk) / Σj=1..K πj N(x|μj, Σj).
We view πk as the prior probability of zk = 1, and γ(zk) as the posterior probability.
γ(zk) is the responsibility that component k takes for explaining the observation x.
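Computing the responsibilities for one observation can be sketched as follows (NumPy; both function names are hypothetical):

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    """Density of a multivariate Gaussian N(x | mu, Sigma)."""
    D = len(x)
    d = x - mu
    norm = np.sqrt(((2 * np.pi) ** D) * np.linalg.det(Sigma))
    return np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)) / norm

def responsibilities(x, pi, mus, Sigmas):
    """gamma(z_k) = pi_k N(x|mu_k, Sigma_k) / sum_j pi_j N(x|mu_j, Sigma_j)."""
    num = np.array([p * gauss_pdf(x, m, S) for p, m, S in zip(pi, mus, Sigmas)])
    return num / num.sum()  # posteriors sum to 1 over the K components
```

The numerator is prior times likelihood, and the shared denominator normalises so that the K responsibilities sum to one, exactly as Bayes' theorem prescribes.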
Maximum Likelihood Estimation
Given a data set X = {x1,…, xN}, the log of the likelihood function is
ln p(X|π, μ, Σ) = Σn=1..N ln { Σk=1..K πk N(xn|μk, Σk) },
maximised subject to 0 ≤ πk ≤ 1 and Σk πk = 1.
Setting the derivative of ln p(X|π, μ, Σ) with respect to μk to zero, we obtain
μk = (1/Nk) Σn γ(znk) xn, where Nk = Σn γ(znk).
Similarly, setting the derivative with respect to Σk to zero gives
Σk = (1/Nk) Σn γ(znk)(xn − μk)(xn − μk)ᵀ.
Finally, we maximise ln p(X|π, μ, Σ) with respect to the mixing coefficients πk. Since the πk are constrained, we use a Lagrange multiplier: to maximise an objective function f(x) subject to a constraint g(x) = 0, we maximise f(x) + λg(x). This yields πk = Nk/N.
Refer to the Optimisation course or http://en.wikipedia.org/wiki/Lagrange_multiplier
EM (Expectation Maximisation) for Gaussian Mixtures
1. Initialise the means μk, covariances Σk and mixing coefficients πk.
2. E step: Evaluate the responsibilities using the current parameter values:
γ(znk) = πk N(xn|μk, Σk) / Σj=1..K πj N(xn|μj, Σj).
3. M step: Re-estimate the parameters using the current responsibilities:
μk_new = (1/Nk) Σn γ(znk) xn
Σk_new = (1/Nk) Σn γ(znk)(xn − μk_new)(xn − μk_new)ᵀ
πk_new = Nk/N, where Nk = Σn γ(znk).
4. Evaluate the log likelihood
ln p(X|π, μ, Σ) = Σn=1..N ln { Σk=1..K πk N(xn|μk, Σk) }
and check for convergence of either the parameters or the log likelihood. If the convergence criterion is not satisfied, return to step 2.
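Steps 1-4 can be sketched end-to-end (NumPy only; the initialisation scheme, fixed iteration count, and small covariance regulariser are illustrative assumptions):

```python
import numpy as np

def em_gmm(X, K, n_iters=50):
    """EM for a K-component Gaussian mixture, following steps 1-4."""
    N, D = X.shape
    # 1. Initialise means, covariances and mixing coefficients
    #    (deterministically from evenly spaced data points, for illustration).
    mu = X[np.linspace(0, N - 1, K).astype(int)].copy()
    Sigma = np.stack([np.eye(D) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    ll = -np.inf
    for _ in range(n_iters):
        # 2. E step: gamma_nk = pi_k N(x_n|mu_k,Sigma_k) / sum_j pi_j N(x_n|mu_j,Sigma_j).
        gamma = np.empty((N, K))
        for k in range(K):
            d = X - mu[k]
            quad = np.einsum('nd,nd->n', d @ np.linalg.inv(Sigma[k]), d)
            norm = np.sqrt(((2 * np.pi) ** D) * np.linalg.det(Sigma[k]))
            gamma[:, k] = pi[k] * np.exp(-0.5 * quad) / norm
        dens = gamma.sum(axis=1)    # p(x_n) under the current mixture
        ll = np.log(dens).sum()     # 4. log likelihood (convergence check)
        gamma /= dens[:, None]
        # 3. M step: re-estimate the parameters from the responsibilities.
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            d = X - mu[k]
            # Small diagonal term keeps Sigma_k invertible (regularisation assumption).
            Sigma[k] = (gamma[:, k, None] * d).T @ d / Nk[k] + 1e-6 * np.eye(D)
        pi = Nk / N
    return mu, Sigma, pi, ll
```

A production loop would stop when the log likelihood (or the parameters) change by less than a threshold, rather than after a fixed number of iterations.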
Statistical Pattern Recognition Toolbox for Matlab
http://cmp.felk.cvut.cz/cmp/software/stprtool/
…\stprtool\visual\pgmm.m
…\stprtool\demos\demo_emgmm.m
Information Theory
The amount of information can be viewed as the degree of surprise at learning the value of x.
If we have two unrelated events x and y, then h(x, y) = h(x) + h(y). Since p(x, y) = p(x)p(y), h(x) must be the logarithm of p(x):
h(x) = −log₂ p(x),
where the minus sign ensures that information is positive or zero.
Lecture 7 (Random forest)
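A minimal numeric check of this definition (the helper name is hypothetical):

```python
import math

def self_information(p):
    """h(x) = -log2 p(x): the information, in bits, gained by observing
    an event of probability p. Rarer events are more surprising."""
    return -math.log2(p)
```

A certain event (p = 1) carries no information, and the information of two independent events adds, matching h(x, y) = h(x) + h(y).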