
Page 1:

Spectral Clustering

by HU Pili

June 16, 2013

Page 2:

Outline

• Clustering Problem

• Spectral Clustering Demo

• Preliminaries

◦ Clustering: K-means Algorithm

◦ Dimensionality Reduction: PCA, KPCA.

• Spectral Clustering Framework

• Spectral Clustering Justification

• Other Spectral Embedding Techniques

Main reference: Hu 2012 [4]


Page 3:

Clustering Problem

Figure 1. Abstract your target using a feature vector (axes: Height, Weight)

Page 4:

Clustering Problem

Figure 2. Cluster the data points into K groups (K = 2 here)

Page 5:

Clustering Problem

Figure 3. Gain insights into your data (the clusters are labelled "Thin" and "Fat")

Page 6:

Clustering Problem

Figure 4. The center is representative (knowledge)

Page 7:

Clustering Problem

Figure 5. Use the knowledge for prediction

Page 8:

Review: Clustering

We learned the general steps of Data Mining / Knowledge Discovery through a clustering example:

1. Abstract your data in the form of vectors.

2. Run learning algorithms.

3. Gain insights / extract knowledge / make predictions.

We focus on step 2.

Page 9:

Spectral Clustering Demo

Figure 6. Data Scatter Plot

Page 10:

Spectral Clustering Demo

Figure 7. Standard K-Means

Page 11:

Spectral Clustering Demo

Figure 8. Our Sample Spectral Clustering

Page 12:

Spectral Clustering Demo

The algorithm is simple:

• K-Means:

[idx, c] = kmeans(X, K);

• Spectral Clustering:

epsilon = 0.7;
D = dist(X');              % pairwise distance matrix
A = double(D < epsilon);   % epsilon-ball similarity matrix
[V, Lambda] = eigs(A, K);  % K largest eigenvectors of A
[idx, c] = kmeans(V, K);   % K-Means on the embedded points

Page 13:

Review: The Demo

The usual case in data mining:

• No weak algorithms.

• Preprocessing is as important as the algorithm.

• The problem looks easier in another space (the secret is coming soon).

Transformation to another space:

• High to low: dimensionality reduction, low-dimension embedding. e.g. spectral clustering.

• Low to high. e.g. Support Vector Machine (SVM).

Page 14:

Secrets of Preprocessing

Figure 9. The similarity graph: connect points within an ε-ball.

D = dist(X'); A = double(D < epsilon);

Page 15:

Secrets of Preprocessing

Figure 10. 2-D embedding using the largest 2 eigenvectors.

[V, Lambda] = eigs(A, K);

Page 16:

Secrets of Preprocessing

Figure 11. Even better after projecting to the unit circle (not used in our sample but more widely applicable; Brand 2003 [3]).

Page 17:

Secrets of Preprocessing

Figure 12. Angle histograms (panels: all points, cluster 1, cluster 2): the two clusters are concentrated and nearly perpendicular to each other.

Page 18:

Notations

• Data points: X_{n×N} = [x_1, x_2, …, x_N]. N points, each of n dimensions, organized in columns.

• Feature extraction: φ(x_i), of n̂ dimensions.

• Eigenvalue decomposition: A = UΛU^T. First d columns of U: U_d.

• Feature matrix: Φ_{n̂×N} = [φ(x_1), φ(x_2), …, φ(x_N)].

• Low-dimension embedding: Y_{N×d}. Embeds N points into d-dimensional space.

• Number of clusters: K.

Page 19:

K-Means

• Initialize centers m_1, …, m_K.

• Iterate until convergence:

◦ Cluster assignment: C_i = argmin_j ‖x_i − m_j‖ for all data points, 1 ≤ i ≤ N.

◦ Update clusters: S_j = {i : C_i = j}, 1 ≤ j ≤ K.

◦ Update centers: m_j = (1/|S_j|) Σ_{i∈S_j} x_i.
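A minimal MATLAB sketch of these steps (illustrative only, not the slide's code; kmeans_sketch is a hypothetical name, and X is n×N with points in columns, as in the notation above):

function [C, m] = kmeans_sketch(X, K, iters)
  % Minimal K-Means (Lloyd's algorithm). X: n-by-N, points in columns.
  N = size(X, 2);
  m = X(:, randperm(N, K));            % initialize centers from K random points
  for t = 1:iters
    % squared distance from every center to every point (K-by-N)
    D2 = sum(m.^2, 1)' + sum(X.^2, 1) - 2 * (m' * X);
    [~, C] = min(D2, [], 1);           % assignment: C(i) = argmin_j ||x_i - m_j||
    for j = 1:K                        % update centers as cluster means
      if any(C == j)
        m(:, j) = mean(X(:, C == j), 2);
      end
    end
  end
end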

Page 20:

Remarks: K-Means

1. A chicken-and-egg problem.

2. How to initialize the centers?

3. How to determine the hyperparameter K?

4. The decision boundary is a straight line:

• C_i = argmin_j ‖x_i − m_j‖

• The boundary between clusters i and j satisfies ‖x − m_i‖ = ‖x − m_j‖,

• which expands to m_i^T m_i − m_j^T m_j = 2 x^T (m_i − m_j): linear in x.

We address the 4th point by transforming the data into a space where a straight-line boundary is enough.

Page 21:

Principal Component Analysis

Figure 13. Error-minimization formulation (legend: Data, PC, Error).

Page 22:

Principal Component Analysis

Assume x_i is already centered (easy to preprocess). Project the points onto the space spanned by U_{n×d} with minimum error:

min_{U ∈ R^{n×d}} J(U) = Σ_{i=1}^N ‖U U^T x_i − x_i‖²

s.t. U^T U = I

Page 23:

Principal Component Analysis

Transform to trace maximization:

max_{U ∈ R^{n×d}} Tr[U^T (X X^T) U]

s.t. U^T U = I

This is a standard problem in matrix theory:

• The solution U is given by the d largest eigenvectors of X X^T (those corresponding to the largest eigenvalues), i.e. X X^T U = U Λ.

• We usually denote Σ_{n×n} = X X^T, because X X^T can be interpreted as the covariance matrix of the n variables. (Again, note that X is centered.)

Page 24:

Principal Component Analysis

About U U^T x_i:

• x_i: the data point in the original n-dimensional space.

• U_{n×d}: the d-dimensional space (its axes) expressed in the coordinates of the original n-D space. The principal axes.

• U^T x_i: the coordinates of x_i in the d-dimensional space; the d-dimensional embedding; the projection of x_i onto the d-D space, expressed in d-D coordinates. The principal components.

• U U^T x_i: the projection of x_i onto the d-D space, expressed in n-D coordinates.

Page 25:

Principal Component Analysis

From principal axes to principal components:

• One data point: U^T x_i

• Matrix form: (Y^T)_{d×N} = U^T X

Relation between covariance and similarity:

(X^T X) Y = X^T X (U^T X)^T
= X^T (X X^T) U
= X^T U Λ
= Y Λ

Observation: the columns of Y are eigenvectors of X^T X.

Page 26:

Principal Component Analysis

Two operational approaches:

• Decompose (X X^T)_{n×n} and let (Y^T)_{d×N} = U^T X.

• Decompose (X^T X)_{N×N} and get Y directly.

Implications:

• In practice, choose whichever matrix is smaller.

• X^T X hints that we can do more with this structure.

Remarks:

• The principal components, i.e. Y, the d-dimensional embedding, are what we want in most cases. e.g. we can do clustering on the coordinates given by Y. A sketch comparing the two approaches follows.
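A small MATLAB sketch of the two routes (illustrative; random, centered data):

% Two equivalent PCA routes; X is n-by-N and centered.
X = randn(5, 200);  X = X - mean(X, 2);
d = 2;
% Route 1: EVD of the n-by-n covariance XX', then project.
[U, Lambda] = eigs(X * X', d);
Y1 = (U' * X)';                  % N-by-d principal components
% Route 2: EVD of the N-by-N similarity X'X, getting Y directly.
[Y2, ~] = eigs(X' * X, d);
% Columns agree up to sign and scale: Y1(:,j) = (+/-)sqrt(Lambda(j,j)) * Y2(:,j),
% since eigs returns unit-norm eigenvectors while Y = X'U has norm sqrt(lambda).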

Page 27:

Kernel PCA

Settings:

• A feature extraction function:

φ: R^n → R^n̂

mapping the original n-dimensional features to n̂ dimensions.

• Matrix organization:

Φ_{n̂×N} = [φ(x_1), φ(x_2), …, φ(x_N)]

• Now Φ plays the role of the "data points" in the PCA formulation. Try to embed them into d dimensions.

Page 28:

Kernel PCA

According to the analysis of PCA, we can operate on:

• (Φ Φ^T)_{n̂×n̂}: the covariance matrix of the features

• (Φ^T Φ)_{N×N}: the similarity/affinity matrix (in spectral clustering language); the Gram/kernel matrix (in kernel PCA language)

The observation:

• n̂ can be very large. e.g. φ: R^n → R^∞.

• (Φ^T Φ)_{i,j} = φ^T(x_i) φ(x_j). We don't need an explicit φ; we only need k(x_i, x_j) = φ^T(x_i) φ(x_j).

Page 29:

Kernel PCA

k(x_i, x_j) = φ^T(x_i) φ(x_j) is the "kernel". One important property, by definition:

• k(x_i, x_j) is a positive semidefinite function.

• K is a positive semidefinite matrix.

Some example kernels:

• Linear: k(x_i, x_j) = x_i^T x_j. Degenerates to ordinary PCA.

• Polynomial: k(x_i, x_j) = (1 + x_i^T x_j)^p

• Gaussian: k(x_i, x_j) = e^{−‖x_i − x_j‖² / (2σ²)}

Page 30:

Remarks: KPCA

• Avoids explicit construction of high-dimensional (maybe infinite-dimensional) features.

• Enables a research direction of its own: kernel engineering.

• The discussion above assumes Φ is centered! See Bishop 2006 [2] for how to center this matrix using only the kernel function, or the "double centering" technique in [4]; the sketch below applies it.

• Out-of-sample embedding is the real difficulty; see Bengio 2004 [1].
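A minimal MATLAB sketch of KPCA with a Gaussian kernel and double centering (an illustrative sketch under the remarks above, not a library routine; sigma is an assumed parameter):

% Kernel PCA sketch: Gaussian kernel + double centering. X: n-by-N.
X = randn(2, 100);  d = 2;  sigma = 1.0;
D2 = sum(X.^2, 1)' + sum(X.^2, 1) - 2 * (X' * X);  % squared distances, N-by-N
Kmat = exp(-D2 / (2 * sigma^2));                   % Gaussian kernel matrix
N = size(X, 2);
J = eye(N) - ones(N) / N;                          % centering matrix
Kc = J * Kmat * J;                                 % "double centering" of K
[V, Lambda] = eigs(Kc, d);                         % top-d eigenpairs
Y = V * sqrt(Lambda);                              % N-by-d embedding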

Page 31:

Review: PCA and KPCA

• Minimum error formulation of PCA

• Two equivalent implementation approaches:

◦ covariance matrix

◦ similarity matrix

• The similarity matrix is more convenient to manipulate and leads to KPCA.

• The kernel is Positive Semi-Definite (PSD) by definition: K = Φ^T Φ.

Page 32:

Spectral Clustering Framework

A bold guess:

• Decomposing K = Φ^T Φ gives a good low-dimension embedding. The inner product measures similarity, i.e. k(x_i, x_j) = φ^T(x_i) φ(x_j), so K is a similarity matrix.

• In the computation, we never actually look at Φ.

• We can specify K directly and perform EVD:

X_{n×N} → K_{N×N}

• What if we directly give a similarity measure K, without the PSD constraint?

That leads to general spectral clustering.

Page 33:

Spectral Clustering Framework

1. Get the similarity matrix A_{N×N} from the data points X. (A: affinity matrix; adjacency matrix of a graph; similarity graph)

2. EVD: A = UΛU^T. Use U_d (or a post-processed version, see [4]) as the d-dimensional embedding.

3. Perform clustering on the d-D embedding.

Page 34:

Spectral Clustering Framework

Review our naive spectral clustering demo:

1. epsilon = 0.7;
D = dist(X');              % pairwise distance matrix
A = double(D < epsilon);   % similarity matrix: epsilon-ball graph

2. [V, Lambda] = eigs(A, K); % embedding: K largest eigenvectors

3. [idx, c] = kmeans(V, K);  % clustering on the embedding

Page 35:

Remarks: SC Framework

• We started by relaxing A (K) in KPCA.

• Does losing PSD mean losing the KPCA justification? Not exactly:

A′ = A + σI

is PSD for a large enough σ and has the same eigenvectors as A, so the embedding is unchanged.

• The real tricks (see [4] Section 2 for details):

◦ How to form A?

◦ Decompose A or a variant of A (e.g. L = D − A)?

◦ Use the EVD result directly (e.g. U) or a variant (e.g. UΛ^{1/2})?

Page 36:

Similarity graph

Input is high-dimensional data (e.g. it comes in the form of X). Common constructions (two are sketched below):

• k-nearest neighbour

• ε-ball

• mutual k-NN

• complete graph (with Gaussian kernel weights)
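A MATLAB sketch of two of these constructions (the parameters k and sigma are illustrative):

% Building similarity graphs from data X (n-by-N).
X = randn(2, 100);  k = 5;  sigma = 0.5;
D2 = sum(X.^2, 1)' + sum(X.^2, 1) - 2 * (X' * X);  % squared distances
N = size(X, 2);
% k-nearest-neighbour graph: connect each point to its k closest others.
[~, ord] = sort(D2, 2);
A_knn = zeros(N);
for i = 1:N
  A_knn(i, ord(i, 2:k+1)) = 1;     % ord(i,1) is i itself; skip it
end
A_knn = max(A_knn, A_knn');        % symmetrize: i~j if either direction is a k-NN
% Complete graph with Gaussian kernel weights.
A_full = exp(-D2 / (2 * sigma^2));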

Page 37:

Similarity graph

Input is a distance matrix: (D^(2))_{i,j} is the squared distance between points i and j (it may not come from raw x_i, x_j).

c = [x_1^T x_1, …, x_N^T x_N]^T

D^(2) = c 1^T + 1 c^T − 2 X^T X

J = I − (1/N) 1 1^T

X^T X = −(1/2) J D^(2) J

Remarks:

• See MDS. A numerical sketch of this recovery follows.
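A numerical sketch of this recovery (illustrative; the check at the end assumes X was centered):

% Recover the Gram matrix X'X from squared distances (classical MDS trick).
X = randn(2, 50);  X = X - mean(X, 2);   % centered data, used only to verify
G  = X' * X;                             % true Gram matrix
c  = diag(G);                            % c = [x_1'x_1, ..., x_N'x_N]'
D2 = c + c' - 2 * G;                     % squared-distance matrix D^(2)
N  = size(G, 1);
J  = eye(N) - ones(N) / N;               % J = I - (1/N) 1 1'
G_rec = -0.5 * J * D2 * J;               % recovered Gram matrix
% norm(G - G_rec, 'fro') is ~0 because X was centered.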

Page 38:

Similarity graph

Input is a graph:

• Just use it.

• Or do some enhancements, e.g. geodesic distance. See [4] Section 2.1.3 for some possible methods.

After getting the graph (from any of the inputs above):

• Adjacency matrix A; Laplacian matrix L = D − A.

• Normalized versions (sketched below):

◦ A_left = D^{−1} A, A_sym = D^{−1/2} A D^{−1/2}

◦ L_left = D^{−1} L, L_sym = D^{−1/2} L D^{−1/2}
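A short MATLAB sketch of these matrices on a toy graph (a 10-cycle is used so every degree is nonzero):

% Adjacency and Laplacian variants for a toy undirected graph (a 10-cycle).
A = circshift(eye(10), 1);  A = A + A';   % each vertex connects to 2 neighbours
deg = sum(A, 2);
D = diag(deg);
L = D - A;                                % unnormalized Laplacian
Dinv  = diag(1 ./ deg);                   % assumes no isolated vertices
Dhalf = diag(1 ./ sqrt(deg));
A_left = Dinv * A;     A_sym = Dhalf * A * Dhalf;
L_left = Dinv * L;     L_sym = Dhalf * L * Dhalf;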

Page 39:

EVD of the Graph

Matrix types:

• Adjacency series: use the eigenvectors of the largest eigenvalues.

• Laplacian series: use the eigenvectors of the smallest eigenvalues.

Page 40:

Remarks: SC Framework

• There are many possibilities in the construction of the similarity matrix and in the post-processing of the EVD.

• Not all of these combinations have justifications.

• Once a combination is shown to work, it may not be very hard to find a justification.

• Existing works actually start from very differently flavoured formulations.

• Only one property is common to all: they involve EVD, aka "spectral analysis"; hence the name.

Page 41:

Spectral Clustering Justification

• Cut-based argument (the mainstream; the origin)

• Random-walk escaping probability

• Commute time: L^{−1} encodes the effective resistance (this is where UΛ^{−1/2} comes from)

• Low-rank approximation

• Density estimation

• Matrix perturbation

• Polarization (the demo)

See [4] for pointers.

Page 42:

Cut Justification

Normalized Cut (Shi 2000 [5]):

NCut = Σ_{i=1}^K cut(C_i, V − C_i) / vol(C_i)

With the characteristic vector χ_i ∈ {0, 1}^N for C_i:

NCut = Σ_{i=1}^K (χ_i^T L χ_i) / (χ_i^T D χ_i)

Page 43:

Relax χ_i to real values:

min_{v_i ∈ R^N} Σ_{i=1}^K v_i^T L v_i

s.t. v_i^T D v_i = 1
v_i^T v_j = 0, ∀ i ≠ j

This is the generalized eigenvalue problem:

L v = λ D v

which is equivalent to EVD on:

L_left = D^{−1} L
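A sketch of solving the relaxed problem (toy graph; 'smallestabs' assumes a reasonably recent MATLAB, otherwise eig(full(L), full(D)) works for small problems):

% Relaxed NCut: K smallest generalized eigenvectors of L v = lambda D v.
A = circshift(eye(10), 1);  A = A + A';      % toy graph: a 10-cycle
D = diag(sum(A, 2));
L = D - A;
K = 2;
[V, lambda] = eigs(L, D, K, 'smallestabs');  % generalized eigenproblem
[idx, c] = kmeans(V, K);                     % cluster rows of the embedding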

Page 44:

Matrix Perturbation Justification

• When the graph is ideally separable, i.e. consists of multiple connected components, A and L have characteristic (piecewise-constant) eigenvectors.

• When it is not ideally separable but a sparse cut exists, A can be viewed as an ideally separable matrix plus a small perturbation.

• A small perturbation of the matrix entries leads to a small perturbation of the eigenvectors.

• The eigenvectors are then not too far from piecewise constant: easy to separate with simple algorithms like K-Means.

Page 45:

Low Rank Approximation

The similarity matrix A is generated by inner products in some unknown space that we want to recover. We want to minimize the recovery error:

min_{Y ∈ R^{N×d}} ‖A − Y Y^T‖_F²

This is the standard low-rank approximation problem, which leads to the EVD of A:

Y = U Λ^{1/2}
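A sketch of this factorization (A is built PSD here so that Λ has no negative entries and Λ^{1/2} stays real):

% Best rank-d symmetric factorization of a similarity matrix A.
B = rand(5, 50);  A = B' * B;      % a PSD "similarity" (Gram) matrix, 50-by-50
d = 2;
[U, Lambda] = eigs(A, d);          % d largest eigenpairs
Y = U * sqrt(Lambda);              % N-by-d embedding; Y*Y' is the best rank-d fit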

Page 46:

Spectral Embedding Techniques

See [4] for some pointers: MDS, Isomap, PCA, KPCA, LLE, LEmap, HEmap, SDE, MVE, SPE. The difference, as said, lies mostly in the construction of A.

Page 47:

Bibliography

[1] Y. Bengio, J. Paiement, P. Vincent, O. Delalleau, N. Le Roux, and M. Ouimet. Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. Advances in Neural Information Processing Systems, 16:177–184, 2004.

[2] C. Bishop. Pattern Recognition and Machine Learning, volume 4. Springer, New York, 2006.

[3] M. Brand and K. Huang. A unifying theorem for spectral embedding and clustering. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003.

[4] P. Hu. Spectral clustering survey, May 2012.

[5] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.

Page 48:

Thanks

Q/A

Some supplementary slides for details are attached.


Page 49:

SVD and EVD

Definition of the Singular Value Decomposition (SVD):

X_{n×N} = U_{n×k} Σ_{k×k} V_{N×k}^T

Definition of the Eigenvalue Decomposition (EVD):

A = X^T X

A = U Λ U^T

Relations:

X^T X = V Σ² V^T

X X^T = U Σ² U^T
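A quick numerical check of these relations (illustrative):

% Verify the SVD/EVD relations numerically.
X = randn(4, 10);
[U, S, V] = svd(X, 'econ');          % X = U*S*V', S is 4-by-4 here
norm(X * X' - U * S^2 * U', 'fro')   % ~0: eigenvectors of XX' are U
norm(X' * X - V * S^2 * V', 'fro')   % ~0: eigenvectors of X'X are V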

Page 50:

Remarks:

• SVD requires U^T U = I, V^T V = I and σ_i > 0 (Σ = diag(σ_1, …, σ_k)). This guarantees the uniqueness of the solution.

• EVD by itself has no such constraints: any U and Λ satisfying A U = U Λ will do. The requirement U^T U = I is likewise there to guarantee uniqueness (e.g. in PCA). Another benefit is the numerical stability of the subspace spanned by U: an orthogonal layout is more error-resilient.

• The computation of the SVD is done via EVD.

• Watch out for the terms and the objects they refer to.

Page 51:

Out of Sample Embedding

• Given a new data point x ∈ R^n that is not in X, how do we find its low-dimension embedding y ∈ R^d?

• In PCA, we have the principal axes U (X X^T = U Λ U^T). Out-of-sample embedding is simple: y = U^T x.

• U_{n×d} is in fact a compact representation of the knowledge.

• In KPCA and the different variants of SC, we operate on the similarity graph and have no such compact representation. It is thus hard to compute the out-of-sample embedding explicitly.

• See [1] for some research on this.

Page 52:

Gaussian Kernel

The Gaussian kernel (let τ = 1/(2σ²)):

k(x_i, x_j) = e^{−‖x_i − x_j‖² / (2σ²)} = e^{−τ ‖x_i − x_j‖²}

Use the Taylor expansion:

e^x = Σ_{k=0}^∞ (1/k!) x^k

Rewrite the kernel:

k(x_i, x_j) = e^{−τ (x_i − x_j)^T (x_i − x_j)}
= e^{−τ x_i^T x_i} · e^{−τ x_j^T x_j} · e^{2τ x_i^T x_j}

Page 53:

Focus on the last factor:

e^{2τ x_i^T x_j} = Σ_{k=0}^∞ (1/k!) (2τ x_i^T x_j)^k

It is hard to write out the explicit form when x_i ∈ R^n, n > 1. We demonstrate the case n = 1, where x_i and x_j are scalars:

e^{2τ x_i x_j} = Σ_{k=0}^∞ (1/k!) (2τ)^k x_i^k x_j^k
= Σ_{k=0}^∞ c(k) · x_i^k x_j^k

Page 54:

The feature vector is (infinite-dimensional):

φ(x) = e^{−τ x²} [√c(0), √c(1) x, √c(2) x², …, √c(k) x^k, …]^T

Verify that:

k(x_i, x_j) = φ(x_i) · φ(x_j)

This shows that the Gaussian kernel implicitly maps 1-D data into an infinite-dimensional feature space. The numerical sketch below checks the truncated expansion.
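A numerical sketch of this verification for scalar inputs (kmax truncates the infinite expansion; the values are illustrative):

% Truncated feature map vs. the exact Gaussian kernel (n = 1 case).
tau = 0.5;  xi = 0.8;  xj = -0.3;  kmax = 20;
c = (2 * tau).^(0:kmax) ./ factorial(0:kmax);            % c(k) = (2*tau)^k / k!
phi = @(x) exp(-tau * x^2) * (sqrt(c) .* x.^(0:kmax));   % truncated phi(x)
k_approx = phi(xi) * phi(xj)';                           % inner product
k_exact  = exp(-tau * (xi - xj)^2);
% k_approx converges to k_exact as kmax grows.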