Page 2
[Figure: scatter plot of the sample data points; both axes span -0.4 to 1.4]
Segmentation
Spectral Clustering
–Graph-cut
–Normalized graph-cut
Expectation Maximization (EM) clustering
Page 4
Graph Theory Terminology
Graph G(V,E)
–Set of vertices and edges
–Numbers represent weights
Graphs for Clustering
–Points are vertices
–Weights reduced with distance
–Segmentation: look for a minimum cut in the graph
–Typical edge weight between points a and b:
  w_{a,b} = \exp\left(-\frac{\|x_a - x_b\|^2}{2\sigma^2}\right)
[Figure: example graph with a cut separating vertex subsets A and B]
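As a concrete companion to the weight formula above, here is a minimal NumPy sketch that builds such an affinity matrix for a set of points (the function name and the default sigma are our choices):

```python
import numpy as np

def affinity_matrix(points, sigma=1.0):
    """Gaussian (RBF) affinity: w[a, b] = exp(-||x_a - x_b||^2 / (2 sigma^2))."""
    diffs = points[:, None, :] - points[None, :, :]   # pairwise differences, shape (n, n, d)
    sq_dists = np.sum(diffs ** 2, axis=-1)            # squared Euclidean distances
    return np.exp(-sq_dists / (2.0 * sigma ** 2))     # weights shrink with distance
```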
Page 5
Spectral Clustering: Graph-cut
–Undirected, weighted graph G = (V,E) represented as affinity matrix A
–Use eigenvectors for segmentation
Assume k elements and c clusters
Represent cluster n with a vector w_n of k components
Values represent cluster association; normalize so that w_n^T w_n = 1
Extract good clusters: select the w_n that maximizes w_n^T A w_n
Solution: w_n is an eigenvector of A; select the eigenvector with the largest eigenvalue
[Figure: example weighted graph with vertices numbered 1 through 9, from Forsyth & Ponce]
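A minimal NumPy sketch of this eigenvector trick (the 0.5 association threshold is our illustrative choice; practical systems pick it from the data or recurse on the remainder):

```python
import numpy as np

def dominant_cluster(A, threshold=0.5):
    """Graph-cut by leading eigenvector: entries of the top eigenvector of the
    affinity matrix A indicate association with the 'dominant' cluster."""
    eigvals, eigvecs = np.linalg.eigh(A)    # A is symmetric; eigh sorts eigenvalues ascending
    w = eigvecs[:, -1]                      # eigenvector with the largest eigenvalue
    w = np.abs(w) / np.abs(w).max()         # normalize association strengths to [0, 1]
    return w > threshold                    # boolean mask of points in the dominant cluster
```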
Page 6
Spectral Clustering: Normalized Cut
–Addresses drawbacks of graph-cut
–Define the association between vertex subset A and the full set V as:
  assoc(A,V) = \sum_{u \in A,\, t \in V} w(u,t)
–Previously maximized assoc(A,A); now also wish to minimize assoc(A,V). Define the normalized cut as:
  Ncut(A,B) = \frac{cut(A,B)}{assoc(A,V)} + \frac{cut(A,B)}{assoc(B,V)}
[Figure: the ideal cut contrasted with cuts of lesser weight than the ideal cut]
Page 7
Spectral Clustering: Normalized Cuts Algorithm
–Define the degree matrix D(i,i) = \sum_j A(i,j), where A is the affinity matrix
–Define a vector x depicting cluster membership: x_i = 1 if point i is in A, and -1 otherwise
–Define a real-valued approximation y to x
–We now wish to minimize the objective function:
  \frac{y^T (D - A)\, y}{y^T D\, y}
–This constitutes solving the generalized eigenvalue problem:
  (D - A)\, y = \lambda D\, y
–Solution is the eigenvector with the second smallest eigenvalue
–If the normalized cut value is over some threshold, re-partition the graph
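Under these definitions, a minimal SciPy sketch of one bipartition step (the recursion and the threshold test on the normalized-cut value are omitted; the function name is ours):

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(A):
    """One step of normalized cuts: solve (D - A) y = lambda * D y and split
    on the sign of the eigenvector with the second smallest eigenvalue."""
    D = np.diag(A.sum(axis=1))          # degree matrix D(i,i) = sum_j A(i,j); must be positive
    eigvals, eigvecs = eigh(D - A, D)   # generalized symmetric eigenproblem, ascending order
    y = eigvecs[:, 1]                   # second smallest eigenvalue's eigenvector
    return y > 0                        # sign gives the two-way partition
```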
Page 8
Probabilistic Mixture Resolving Approach to Clustering
Expectation Maximization (EM) Algorithm
–Density estimation of data points in an unsupervised setting
–Finds ML estimates when the data depend on latent variables
E step – compute the expectation of the likelihood, treating the latent variables as if observed
M step – compute ML estimates of the parameters by maximizing the above
–Start with a Gaussian Mixture Model:
  p(x|\Theta) = \sum_{j=1}^{g} \alpha_j\, p_j(x|\theta_j)
–Segmentation: reformulate as a missing-data problem; latent variable Z provides the labeling
–Gaussian bivariate PDF:
  p(x_k|\theta_j) = \frac{1}{2\pi\, |\Sigma_j|^{1/2}} \exp\left(-\frac{1}{2}(x_k - \mu_j)^T \Sigma_j^{-1} (x_k - \mu_j)\right)
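For reference, this density can be evaluated directly; a small sketch (the function name is ours; scipy.stats.multivariate_normal would do the same job):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian density p(x | mu, Sigma) from the slide."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / (((2 * np.pi) ** (d / 2)) * np.sqrt(np.linalg.det(Sigma)))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)
```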
Page 9
Probabilistic Mixture Resolving Approach to Clustering
EM Process
–Maximize the log-likelihood function:
  \log p(X|\Theta) = \sum_{n=1}^{N} \log \sum_{j=1}^{g} \alpha_j\, p(x_n|\theta_j)
–Not trivial; introduce Z and denote the complete data Y = [X^T Z^T]^T:
  \log p(Y|\Theta) = \sum_{n=1}^{N} \sum_{j=1}^{g} z_{nj} \log\big[\alpha_j\, p(x_n|\theta_j)\big]
–If we knew the above (complete) data, ML would be easy:
  \hat{\Theta} = \arg\max_{\Theta} \log p(Y|\Theta)
Page 10
Probabilistic Mixture Resolving Approach to Clustering
EM steps
–E step – compute the expected cluster memberships:
  E[z_{ik}] = \frac{\alpha_i\, p(x_k|\theta_i)}{\sum_{j=1}^{g} \alpha_j\, p(x_k|\theta_j)}
–M step – update the parameter estimates:
  \alpha_i = \frac{1}{N} \sum_{n=1}^{N} p(i|x_n, \Theta)
  \mu_i = \frac{\sum_{n=1}^{N} x_n\, p(i|x_n, \Theta)}{\sum_{n=1}^{N} p(i|x_n, \Theta)}
  \Sigma_i = \frac{\sum_{n=1}^{N} p(i|x_n, \Theta)\, (x_n - \mu_i)(x_n - \mu_i)^T}{\sum_{n=1}^{N} p(i|x_n, \Theta)}
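Putting the two steps together, a minimal NumPy sketch of the EM loop for a Gaussian mixture (random-point initialization and the fixed iteration count are our simplifications; p(i|x_n, Θ) from the slide is the responsibility matrix r below):

```python
import numpy as np

def em_gmm(X, g, n_iter=100, seed=0):
    """A minimal EM sketch for a Gaussian mixture with g components."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    mu = X[rng.choice(N, g, replace=False)].astype(float)  # means start at random points
    Sigma = np.array([np.cov(X.T) for _ in range(g)])      # covariances start at data cov
    alpha = np.full(g, 1.0 / g)                            # equal mixing weights

    for _ in range(n_iter):
        # E step: responsibilities r[n, i] = E[z_ni]
        r = np.zeros((N, g))
        for i in range(g):
            diff = X - mu[i]
            inv = np.linalg.inv(Sigma[i])
            norm = 1.0 / (((2 * np.pi) ** (d / 2)) * np.sqrt(np.linalg.det(Sigma[i])))
            r[:, i] = alpha[i] * norm * np.exp(-0.5 * np.sum(diff @ inv * diff, axis=1))
        r /= r.sum(axis=1, keepdims=True)

        # M step: re-estimate alpha, mu, Sigma from the responsibilities
        Nk = r.sum(axis=0)
        alpha = Nk / N
        mu = (r.T @ X) / Nk[:, None]
        for i in range(g):
            diff = X - mu[i]
            Sigma[i] = (r[:, i, None] * diff).T @ diff / Nk[i]
    return alpha, mu, Sigma, r
```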
Page 14
Conclusions
For a simple case like the four-Gaussian example, both algorithms perform well, as can be seen from the results
From the literature (k = number of clusters):
–EM is good for small k; gives only a coarse segmentation for large k
  Needs to know the number of components to cluster
  Initial conditions are essential; prior knowledge helps accelerate convergence and reach a good local/global maximum of the likelihood
–Ncut gives good results for large k
  For a fully connected graph, the space and computation time requirements are intensive
–Graph cut's first-eigenvector approach finds points in the 'dominant' cluster
  Not very consistent; the literature advocates the normalized approach
–In the end, the choice is a tradeoff depending on the source data
Page 15
References (for slide images)
J. Shi & J. Malik, "Normalized Cuts and Image Segmentation" – http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf
C. Bishop, "Latent Variables, Mixture Models and EM" – http://cmp.felk.cvut.cz/cmp/courses/recognition/Resources/_EM/Bishop-EM.ppt
R. Nugent & L. Stanberry, "Spectral Clustering" – http://www.stat.washington.edu/wxs/Stat593-s03/Student-presentations/SpectralClustering2.ppt
S. Candemir, "Graph-based Algorithms for Segmentation" – http://www.bilmuh.gyte.edu.tr/BIL629/special%20section-%20graphs/GraphBasedAlgorithmsForComputerVision.ppt
W. H. Liao, "Segmentation: Graph-Theoretic Clustering" – http://www.cs.nccu.edu.tw/~whliao/acv2008/segmentation_by_graph.ppt
D. Forsyth & J. Ponce, "Computer Vision: A Modern Approach"
Page 17
K-means (used by some clustering algorithms)
Determine the Euclidean distance of each object in the data set to the (randomly picked) center points
Construct K clusters by assigning each point to its closest center
Move the center points to the true centers (means) of the resulting clusters
Repeat the last two steps until the assignments no longer change
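A minimal NumPy sketch of this loop (the fixed iteration count is our simplification of the stopping rule):

```python
import numpy as np

def kmeans(X, K, n_iter=50, seed=0):
    """Minimal K-means: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)].astype(float)  # random initial centers
    for _ in range(n_iter):
        # assign each point to its closest center (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned points
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels, centers
```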
Page 18
Responsibilities
Responsibilities r_{nk} assign data points to clusters, with r_{nk} \in \{0,1\} such that \sum_k r_{nk} = 1
Example: 5 data points and 3 clusters
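Written out, a hypothetical responsibility matrix for the slide's example of 5 data points and 3 clusters (the particular assignments are made up for illustration; a 1 in column k means point n belongs to cluster k):

```python
import numpy as np

# hypothetical hard assignments: points 0 and 3 -> cluster 0; 1 -> cluster 1; 2 and 4 -> cluster 2
R = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [1, 0, 0],
              [0, 0, 1]])
assert (R.sum(axis=1) == 1).all()   # each data point belongs to exactly one cluster
```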
Page 20
Minimizing the Cost Function
  J = \sum_n \sum_k r_{nk}\, \|x_n - \mu_k\|^2
E-step: minimize J w.r.t. r_{nk}
–assigns each data point to the nearest prototype
M-step: minimize J w.r.t. \mu_k, which gives
  \mu_k = \frac{\sum_n r_{nk}\, x_n}{\sum_n r_{nk}}
–each prototype is set to the mean of the points in that cluster
Convergence is guaranteed since there is a finite number of possible settings for the responsibilities
Page 21
Limitations of K-means
Hard assignments of data points to clusters
–a small shift of a data point can flip it to a different cluster
Not clear how to choose the value of K
–the value must be chosen beforehand
–Solution: replace the 'hard' clustering of K-means with the 'soft' probabilistic assignments of EM
Not robust to outliers
–data far from a centroid may pull the centroid away from the true one
Page 24
EM Algorithm – Informal Derivation Let us proceed by simply differentiating the log likelihood Setting derivative with respect to equal to zero gives
giving
which is simply the weighted mean of the data
Page 25
Ng, Jordan, Weiss Algorithm
Form the matrix L = D^{-1/2} A D^{-1/2}
Find x_1, x_2, ..., x_k, the k largest eigenvectors of L
These form the columns of the new matrix X
–Note: we have reduced the dimension from n×n to n×k
Page 26
Ng, Jordan, Weiss Algorithm
Form the matrix Y
–Renormalize each of X's rows to have unit length:
  Y_{ij} = X_{ij} \big/ \left(\sum_j X_{ij}^2\right)^{1/2}
–Y is in R^{n×k}
Treat each row of Y as a point in R^k
Cluster into k clusters via K-means
Final Cluster Assignment
–Assign point i to cluster j iff row i of Y was assigned to cluster j
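A minimal sketch of the full pipeline, reusing the `kmeans` sketch from the K-means slide above (all other names are our choices):

```python
import numpy as np

def njw_spectral(A, k, n_iter=50, seed=0):
    """Sketch of the Ng-Jordan-Weiss algorithm on an affinity matrix A."""
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    L = D_inv_sqrt @ A @ D_inv_sqrt            # L = D^{-1/2} A D^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    X = eigvecs[:, -k:]                        # k largest eigenvectors as columns
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)   # renormalize rows to unit length
    labels, _ = kmeans(Y, k, n_iter=n_iter, seed=seed) # cluster the rows of Y
    return labels                              # point i gets the cluster of row i of Y
```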
Page 27
Reasoning for Ng, Jordan, Weiss
If we eventually use K-means, why not just apply K-means to the original data?
This method allows us to cluster non-convex regions
Page 28
User’s Prerogative
Choice of k, the number of clusters
Choice of scaling factor– Realistically, search over and pick value that gives the tightest
clusters
Choice of clustering method
2
Page 29
Comparison of Methods
Perona & Freeman
–Matrix used: affinity matrix A
–Procedure: 1st (largest) eigenvector x, from Ax = \lambda x; recursive procedure, but can be used non-recursively with the k largest eigenvectors for simple cases
Shi & Malik
–Matrix used: D - A, with D the degree matrix, D(i,i) = \sum_j A(i,j)
–Procedure: 2nd smallest generalized eigenvector, from (D - A)x = \lambda D x; also recursive
Ng, Jordan, Weiss
–Matrix used: affinity A; user inputs k
–Procedure: normalizes A, finds the k largest eigenvectors, forms X; normalizes X's rows and clusters them
Page 30
Advantages/Disadvantages
Perona & Freeman– For block diagonal affinity matrices, the first eigenvector finds points in
the “dominant” cluster; not very consistent Shi & Malik
– 2nd generalized eigenvector minimizes affinity between groups by affinity within each group; no guarantee, constraints
Ng, Jordan, Weiss– Again depends on choice of k
– Claim: effectively handles clusters whose overlap or connectedness varies across clusters