Fast, Accurate Spectral Clustering Using Locally Linear Landmarks
Max Vladymyrov Google Inc.
Miguel Á. Carreira-Perpiñán EECS, UC Merced
May 17, 2017
Focus on the problem of Spectral Clustering, where we try to split the dataset into a set of K meaningful clusters.
• Compute the affinity matrix (e.g. Gaussian ).
• Construct the degree matrix and graph Laplacian (e.g. unnormalized ).
• find the low-dimensional projection , by minimizing
• solution is given in the closed form by trailing eigenvectors of
• apply k-means on normalized projection to achieve the final clustering.
Problem: does not scale when the number of points is large!
Spectral clustering
Y 2 RD⇥N
W 2 RN⇥Nwij = e�k(yi�yj)
2k/2�2
D = diag (W1)L = D�W
X 2 RK⇥N
minX tr�XLXT
�, s.t. XDXT = I, XD1 = 0
D� 12LD� 1
2
Spectral clusteringY 2 RD⇥N
X 2 RK⇥N
W 2 RN⇥N
k-means
compute affinities
eigs()
4
Learning with landmarks
Applications:• When is so large that the direct solution is infeasible.• To select hyperparameters (e.g. for Gaussian kernel: ) efficiently
even if is not large (since a grid search over these requires solving the eigenproblem many times).
• As an out-of-sample extension.
N
Goal is find a fast, approximate solution for the embedding using only the subset of points from .
X
k,�N
XOriginal datasetY Affinity matrix
YeY L ⌧ N
5
Learning with landmarks
Reduced affinity matrixLandmarks)
eY eXXOriginal datasetY Affinity matrix
6
Learning with landmarks
Reduced affinity matrixLandmarks eY eX
Problems:• We need a way to project the non-landmark points, e.g. with
Nyström method (Talwalkar el at, 2008).• Solving only on a subset only, uses the information about the
landmarks, but ignores information about non-landmarks. This requires using many landmarks to represent the data manifold well.
• If too few landmarks are used:‣ Bad solution for the landmark projection .‣…and bad prediction for the non-landmarks.
eX = e
x1 . . . , exL
eY
7
with reduced affinities .
Locally Linear Landmarks (LLL)
• Each projection then can be interpreted as a locally linear function of the landmarks:xn =
PLl=1 zlnex, n = 1, . . . , N ) X = eXZ
• Solving the original eigenproblem of with this constraint results in a reduced eigenproblem of the same form but of on :
• After is found, the non-landmarks are predicted as .(out-of-sample mapping).
mineX tr⇣eXeAeXT
⌘s.t. eXeBeXT = I
(Vladymyrov and Carreira-Perpiñán, ’13)
• For a set of landmarks find a local reconstruction matrix (e.g. by solving a liner system ). Locality is enforced by using only nearby landmarks when reconstructing .
eY Z 2 RN⇥L
Y = eYZ
N ⇥NeXL⇥ L
eA = ZLZT , eB = ZDZT
eX X = eXZ
• Final k-means step on to find a resulting clusters.X
Y
8
LLL: reduced affinities
yn
eyieyj
compute reconstruction weights using Y = eYZ
Z eAreduced affinities
landmark projection eX
projectionX = eXZ
k-means
eigs()
eA = ZLZ
LLL for spectral clustering
• Applied spectral clustering on 256x256 cameraman image (N=65536, D=3) with different number of landmarks.
OriginalExact SL, t=66 s,
N=65536LLL, e=1%, t=1.82 s,
L=2475LLL, e=10%, t=0.9 s,
L=14
Clus
terin
g er
ror
# of landmarks Runtime, s
LLL for spectral clustering• Projection error reports the error of the projection matrix X with respect to the
projection matrix of the exact LE.Pr
ojec
tion
erro
r
# of landmarks Runtime, s• Clustering error reports the final clustering error with respect to the exact LE.
time needed to construct
matrix Z
time to compute exact SC
L=N
LLL for custom affinities
1. Constrained spectral clustering. User provides additional must- and cannot- constrains.
2. Affinity aggregation. Using a linear combination of different affinities.
3. Motion Segmentation. affinity based on the spatio-temporal graph.
——————4. Proximity graph. Non-Gaussian affinity built using MST on
multiple affinities.5. LLL for hyperparameter selection. Fast search of the
parameter space for the optimal set of hyperharapeters.6. Out of sample extension. On-the-fly clustering assignment
(using the landmark re-projection formula ).X = eXZ
Constrained spectral clustering• We follow framework from (Lu and Carreira-Perpiñán, 2008).• User additionally provides must and cannot link constraints using the sparse
matrix . New affinities are computed as:
• Problems:• inverse is slow: solve by rearranging the elements
inside sparse matrix .• is much denser:
• Using LLL reduced affinities:
By precomputing and rearranging terms the overall complexity is and we don’t need to compute .
O(cNL)W
ZW
Q = (I+MW)�1M
W = (W�1 +M)�1 = W �W(I+MW)�1MW,
ZLZT = ZDZT � ZWZT
= Z diag (W1)ZT � Z diag (WQW1)ZT � ZWZT + ZWQWZT
WM
M
Constrained spectral clustering
Image size N Time, seconds
SC,Exact SC,LLL CSC,Exact CSC,LLL
Small (64⇥ 94) 6 016 4.47 0.87 5.14 0.51Medium (160⇥ 240) 38 400 44 4.66 104.49 6.51Large (321⇥ 481) 154 401 512.01 48.19 out-of-memory 59.98
• Problem of foreground segmentation:• added few constraints for an image of three different sizes,• run Spectral Clustering (SC) and Constrained Spectral Clustering (CSC).
• LLL achieves 10x speedup for SC and >20x for CSC.
Affinity aggregation for spectral clustering
We follow (Huang et al, 2012) that propose to combine many affinities together.• For a weighted affinity matrix minimize:
• Solve by alternating minimization between and . For each iteration: - : 1D root-finding that can be solved in a few iterations.- : eigendecomposition of .
• Using LLL:
Each of can be precomputed just once and are . The rest of operations are independent from .
W =PK
k=1 v2kW
(k)
minX,v XLXT , s.t. XDXT = I,vT1 = 1,
eL = ZDZT � ZWZT =KX
k=1
v2kZD(k)ZT �
KX
k=1
v2kZW(k)ZT .
ZW(k)ZT L⇥ L
v XvX L
N
Affinity aggregation for spectral clustering
• We used N=11368 faces from CMU-PIE dataset (each face is an 64x64 image in near frontal position).
• We used 3 types of affinities: • Local Binary Pattern (LBP),• Gabor texture,• Eigenfaces.
• Combined affinities achieve lower error than any affinity on its own.• LLL achieves similar accuracy with 35x speedup.
Motion segmentation
• 41 frames with each frame 120x160 image. Spatio-temporal affinities with 2 spatial, 3 color and 1 for a frame number. Overall N=787200 with D=6.
• Used L=5000 landmarks and t=3 minutes.Original LLL
Nystrom (full affinity) Nystrom (sparse affinity)
17
Conclusions• Spectral clustering is a useful framework for clustering data,
however, it does not scale well due to a eigendecomposition of a large NxN affinity matrix.
• Local Linear Landmarks provides a simple approximation algorithm that constructs smaller LxL affinities by incorporating local connection between all the affinities.
• The algorithm robustly shows >10x speedup with error within 1% of the exact method.
• It can be easily extended to a variety of useful settings:- constrained spectral clustering,- affinity aggregation,- motion segmentation,- etc.
Top Related