Data Mining Spectral Clustering Junli Zhu SS 2005.

Data Mining

Spectral Clustering

Junli Zhu

SS 2005

Introduction

• Motivation

• Algorithms

spectral clustering

kernel k-means

Spectral Clustering Algorithm

Given a set of points $S = \{s_1, \dots, s_n\}$ in $\mathbb{R}^l$:

1. Form the affinity matrix $A \in \mathbb{R}^{n \times n}$ with $A_{ij} = \exp\left(-\|s_i - s_j\|^2 / (2\sigma^2)\right)$ for $i \neq j$, and $A_{ii} = 0$.

2. Let $D$ be the diagonal matrix with $D_{ii} = \sum_{j=1}^{n} A_{ij}$, and set $L = D^{-1/2} A D^{-1/2}$.

3. Find the eigenvectors $x_1, x_2, \dots, x_k$ of $L$ that belong to the $k$ largest eigenvalues, and form the matrix $X = [x_1\; x_2\; \dots\; x_k]$.

4. Form the matrix $Y$ from $X$ by normalizing each row of $X$ to length 1.

5. Treat each row of $Y$ as a point in $\mathbb{R}^k$ and group the rows into $k$ clusters with k-means or another method.

6. Finally, assign the original point $s_i$ to cluster $j$ if and only if row $i$ of $Y$ was assigned to cluster $j$.
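The six steps can be sketched in NumPy as follows. This is a minimal illustration, not the speaker's implementation: the function names `kmeans` and `spectral_clustering`, the default `sigma=1.0`, and the farthest-point initialisation of k-means are assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd k-means; farthest-point initialisation is an assumption."""
    rng = np.random.default_rng(seed)
    centers = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        # Pick the point farthest from all centers chosen so far.
        d2 = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_clustering(S, k, sigma=1.0):
    # Step 1: Gaussian affinity matrix with zero diagonal.
    sq = ((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Step 2: L = D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Step 3: eigh returns eigenvalues in ascending order, so take the last k columns.
    _, vecs = np.linalg.eigh(L)
    X = vecs[:, -k:]
    # Step 4: normalize every row of X to unit length.
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Steps 5 and 6: cluster the rows of Y; row i's label is the label of s_i.
    return kmeans(Y, k)
```

Calling `spectral_clustering(S, 2)` on an `(n, l)` array `S` returns one cluster index per point. The choice of `sigma` controls how quickly the affinity decays with distance and usually has to be tuned to the data.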

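Analysis

• Ideal case

Let $k = 3$, i.e. three clusters $S_1, S_2, S_3$ of sizes $n_1, n_2, n_3$. Let $\hat{A}_{ij} = 0$ if $s_i$ and $s_j$ lie in different clusters, and $\hat{A}_{ij} = A_{ij}$ otherwise. With the points ordered by cluster, $\hat{A}$ and $\hat{L}$ are block diagonal:

$$\hat{A} = \begin{pmatrix} \hat{A}^{(11)} & 0 & 0 \\ 0 & \hat{A}^{(22)} & 0 \\ 0 & 0 & \hat{A}^{(33)} \end{pmatrix}, \qquad \hat{L} = \begin{pmatrix} \hat{L}^{(11)} & 0 & 0 \\ 0 & \hat{L}^{(22)} & 0 \\ 0 & 0 & \hat{L}^{(33)} \end{pmatrix},$$

where

$$\hat{L}^{(ii)} = (\hat{D}^{(ii)})^{-1/2}\, \hat{A}^{(ii)}\, (\hat{D}^{(ii)})^{-1/2}.$$

Stacking the first eigenvector $x_1^{(i)}$ of each block $\hat{L}^{(ii)}$ gives

$$\hat{X} = \begin{pmatrix} x_1^{(1)} & 0 & 0 \\ 0 & x_1^{(2)} & 0 \\ 0 & 0 & x_1^{(3)} \end{pmatrix} \in \mathbb{R}^{n \times 3},$$

and after row normalization

$$\hat{Y} = \begin{pmatrix} \hat{Y}^{(1)} \\ \hat{Y}^{(2)} \\ \hat{Y}^{(3)} \end{pmatrix} = \begin{pmatrix} \vec{1} & 0 & 0 \\ 0 & \vec{1} & 0 \\ 0 & 0 & \vec{1} \end{pmatrix} R,$$

where $\vec{1}$ is an all-ones column of the appropriate length and $R$ is an orthogonal $3 \times 3$ matrix.

The degree of point $j$ within cluster $i$ is

$$\hat{d}^{(i)}_j = \sum_{k} \hat{A}^{(ii)}_{jk}, \qquad \hat{D}^{(ii)} = \mathrm{diag}\left(\hat{d}^{(i)}_1, \hat{d}^{(i)}_2, \dots, \hat{d}^{(i)}_{n_i}\right).$$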

Proposition 1

Let $\hat{A}$'s off-diagonal blocks $\hat{A}^{(ij)}$, $i \neq j$, be zero. Also assume that each cluster $S_i$ is connected. Then there exist $k$ orthogonal vectors $r_1, \dots, r_k$ ($r_i^T r_j = 1$ if $i = j$, 0 otherwise) so that $\hat{Y}$'s rows satisfy

$$\hat{y}^{(i)}_j = r_i$$

for all $i = 1, \dots, k$, $j = 1, \dots, n_i$.
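The proposition can be checked numerically on a small block-diagonal affinity matrix. The sketch below is only an illustration under assumed values: the cluster sizes `[4, 3, 5]` and the random within-cluster affinities are made up, and the names `A_hat`, `L_hat`, `Y_hat`, and `R` are chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [4, 3, 5]                     # assumed cluster sizes n_1, n_2, n_3
n, k = sum(sizes), len(sizes)

# Block-diagonal affinity: positive inside a cluster (so each cluster is
# connected), exactly zero between clusters.
A_hat = np.zeros((n, n))
start = 0
for m in sizes:
    B = rng.uniform(0.5, 1.0, (m, m))
    B = (B + B.T) / 2.0
    np.fill_diagonal(B, 0.0)
    A_hat[start:start + m, start:start + m] = B
    start += m

d_hat = A_hat.sum(axis=1)
L_hat = A_hat / np.sqrt(np.outer(d_hat, d_hat))

vals, vecs = np.linalg.eigh(L_hat)    # ascending eigenvalues
X_hat = vecs[:, -k:]
Y_hat = X_hat / np.linalg.norm(X_hat, axis=1, keepdims=True)

labels = np.repeat(np.arange(k), sizes)
# One representative row of Y_hat per cluster; these play the role of r_1..r_k.
R = np.array([Y_hat[labels == i][0] for i in range(k)])

print(np.round(R @ R.T, 6))           # close to the identity matrix
```

All rows of `Y_hat` that belong to the same cluster coincide, and the `k` distinct rows are mutually orthogonal, so any reasonable clustering of the rows recovers the true clusters.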

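General Case

• Eigengap

• Assumption A1. There exists $\delta > 0$ so that, for all $i = 1, \dots, k$,

$$\lambda_2^{(i)} \le 1 - \delta,$$

where $\lambda_2^{(i)}$ is the second largest eigenvalue of $\hat{L}^{(ii)}$.

• Assumption A1.1. Define the Cheeger constant of the cluster $S_i$ to be

$$h(S_i) = \min_{I} \frac{\sum_{j \in I,\, k \notin I} \hat{A}^{(ii)}_{jk}}{\min\left\{ \sum_{j \in I} \hat{d}^{(i)}_j,\; \sum_{k \notin I} \hat{d}^{(i)}_k \right\}},$$

where the outer minimum is over all index subsets $I \subseteq \{1, \dots, n_i\}$. Assume that there exists $\delta > 0$ so that

$$\frac{(h(S_i))^2}{2} \ge \delta$$

for all $i$.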
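To make the two eigengap assumptions concrete, the following sketch computes the Cheeger constant of a small cluster by brute force and compares it with the second largest eigenvalue of the normalized affinity matrix. By Cheeger's inequality for the normalized Laplacian, $\lambda_2 \le 1 - h^2/2$, which is why a bound of the A1.1 type gives a gap of the A1 type. The helper names and the two toy graphs (a 6-clique and two triangles joined by a weak edge) are assumptions for illustration, and the subset enumeration is only feasible for very small clusters.

```python
import numpy as np
from itertools import combinations

def cheeger_constant(A):
    """Brute-force h(S) over all non-empty proper index subsets I."""
    n = A.shape[0]
    d = A.sum(axis=1)
    best = np.inf
    for size in range(1, n):
        for subset in combinations(range(n), size):
            I = np.zeros(n, dtype=bool)
            I[list(subset)] = True
            cut = A[np.ix_(I, ~I)].sum()
            best = min(best, cut / min(d[I].sum(), d[~I].sum()))
    return best

def second_eigenvalue(A):
    """Second largest eigenvalue of D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    L = A / np.sqrt(np.outer(d, d))
    return np.linalg.eigvalsh(L)[-2]

# A tightly connected cluster: the complete graph on 6 vertices.
clique = np.ones((6, 6)) - np.eye(6)

# A poorly connected cluster: two triangles joined by one weak edge.
barbell = np.zeros((6, 6))
barbell[:3, :3] = barbell[3:, 3:] = 1.0
np.fill_diagonal(barbell, 0.0)
barbell[2, 3] = barbell[3, 2] = 0.1

for name, A in [("clique", clique), ("barbell", barbell)]:
    h = cheeger_constant(A)
    print(name, "h =", h, "lambda_2 =", second_eigenvalue(A), "bound =", 1 - h ** 2 / 2)
```

The weakly joined graph has a small Cheeger constant and a second eigenvalue close to 1, so it admits only a tiny $\delta$; the clique admits a large one.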

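• Assumption A2. There is some fixed $\epsilon_1 > 0$ so that for all $i_1, i_2 \in \{1, \dots, k\}$, $i_1 \neq i_2$, we have

$$\sum_{j \in S_{i_1}} \sum_{k \in S_{i_2}} \frac{A_{jk}^2}{\hat{d}_j\, \hat{d}_k} \le \epsilon_1.$$

• Assumption A3. For some fixed $\epsilon_2 > 0$, for every $i = 1, \dots, k$, $j \in S_i$, we have

$$\frac{\sum_{k:\, k \notin S_i} A_{jk}}{\hat{d}_j} \le \epsilon_2 \left( \sum_{k,\, l \in S_i} \frac{A_{kl}^2}{\hat{d}_k\, \hat{d}_l} \right)^{-1/2}.$$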

• Assumption A4. There is some constant $C > 0$ so that for every $i = 1, \dots, k$, $j = 1, \dots, n_i$, we have

$$\hat{d}^{(i)}_j \ge \frac{\sum_{k=1}^{n_i} \hat{d}^{(i)}_k}{C\, n_i}.$$

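• Theorem 2

Let assumptions A1, A2, A3 and A4 hold. Set

$$\epsilon = \sqrt{k(k-1)\,\epsilon_1 + k\,\epsilon_2^2}.$$

If $\delta > (2 + \sqrt{2})\,\epsilon$, then there exist $k$ orthogonal vectors $r_1, \dots, r_k$ ($r_i^T r_j = 1$ if $i = j$, 0 otherwise) so that $Y$'s rows satisfy

$$\frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left\| y^{(i)}_j - r_i \right\|_2^2 \le 4C \left(4 + 2\sqrt{k}\right)^2 \frac{\epsilon^2}{\left(\delta - \sqrt{2}\,\epsilon\right)^2}.$$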

• Kernel k-means

Before clustering, points are mapped to a higher-dimensional feature space using a nonlinear function $\phi$.

• Weighted kernel k-means

Weighted Kernel k-means

• Each point $a$ has a weight $w(a)$.

• Let $\pi_j$ denote the clusters and $\{\pi_j\}_{j=1}^{k}$ the partitioning of the points. With the nonlinear function $\phi$, define the objective function

$$D(\{\pi_j\}_{j=1}^{k}) = \sum_{j=1}^{k} \sum_{a \in \pi_j} w(a)\, \left\| \phi(a) - m_j \right\|^2,$$

where

$$m_j = \frac{\sum_{b \in \pi_j} w(b)\, \phi(b)}{\sum_{b \in \pi_j} w(b)}.$$

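• Euclidean distance from $\phi(a)$ to $m_j$:

$$\left\| \phi(a) - \frac{\sum_{b \in \pi_j} w(b)\,\phi(b)}{\sum_{b \in \pi_j} w(b)} \right\|^2 = \phi(a)\cdot\phi(a) \;-\; \frac{2 \sum_{b \in \pi_j} w(b)\, \phi(a)\cdot\phi(b)}{\sum_{b \in \pi_j} w(b)} \;+\; \frac{\sum_{b,\, c \in \pi_j} w(b)\, w(c)\, \phi(b)\cdot\phi(c)}{\left(\sum_{b \in \pi_j} w(b)\right)^2}.$$

Every term involves only inner products $\phi(a)\cdot\phi(b)$, so the distance can be computed from the kernel matrix $K$ without evaluating $\phi$ explicitly.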
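The expansion of the distance can be verified with a kernel whose feature map is known. The sketch below uses the linear kernel $K = P P^T$ (so $\phi$ is the identity) and compares the kernel-only formula with the distance to an explicitly computed weighted mean. The function name `kernel_distances`, the random data, and the fixed label vector are assumptions for illustration.

```python
import numpy as np

def kernel_distances(K, w, labels, k):
    """Return an (n, k) array of ||phi(a) - m_j||^2 computed from K alone."""
    n = K.shape[0]
    dist = np.empty((n, k))
    for j in range(k):
        idx = labels == j
        wj = w[idx]
        sj = wj.sum()
        second = 2.0 * (K[:, idx] @ wj) / sj              # cross term
        third = wj @ K[np.ix_(idx, idx)] @ wj / sj ** 2   # ||m_j||^2
        dist[:, j] = np.diag(K) - second + third
    return dist

rng = np.random.default_rng(0)
P = rng.normal(size=(8, 3))                 # points; phi is the identity here
w = rng.uniform(0.5, 2.0, 8)                # positive weights
labels = np.array([0, 0, 0, 1, 1, 1, 1, 1])
K = P @ P.T                                 # linear kernel matrix

dist = kernel_distances(K, w, labels, 2)

# Reference: explicit weighted means in input space.
for j in range(2):
    idx = labels == j
    m_j = (w[idx, None] * P[idx]).sum(axis=0) / w[idx].sum()
    print(np.allclose(dist[:, j], ((P - m_j) ** 2).sum(axis=1)))
```

With a nonlinear kernel the explicit mean is not available, but the same function still applies because it touches only entries of `K`.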

Algorithm weighted-kernel-kmeans($K$, $k$, $w$, $C_1, \dots, C_k$)

Input: $K$: kernel matrix, $k$: number of clusters, $w$: weights for each point

Output: $C_1, \dots, C_k$: partitioning of the points

1. Initialize the $k$ clusters: $C_1^{(0)}, \dots, C_k^{(0)}$.

2. Set $t = 0$.

3. For each point $a$, find its new cluster index as

$$j^*(a) = \arg\min_j \left\| \phi(a) - m_j \right\|^2,$$

where the distances are computed from the kernel matrix $K$.

4. Compute the updated clusters as

$$C_j^{(t+1)} = \{ a : j^*(a) = j \}.$$

5. If not converged, set $t = t + 1$ and go to Step 3; otherwise, stop.
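A compact NumPy version of this loop might look as follows. It is a sketch under assumptions: clusters are represented by an integer label vector rather than sets, "converged" is taken to mean that no label changed, an emptied cluster simply stays empty, and `max_iter` is an added safety cap that is not part of the slide.

```python
import numpy as np

def weighted_kernel_kmeans(K, k, w, init_labels, max_iter=100):
    """Weighted kernel k-means on a kernel matrix K with point weights w."""
    labels = np.asarray(init_labels).copy()   # Step 1: initial clusters
    diag = np.diag(K)
    for _ in range(max_iter):                 # Steps 2 and 5: iteration counter
        dist = np.full((K.shape[0], k), np.inf)
        for j in range(k):
            idx = labels == j
            if not idx.any():
                continue                      # empty cluster: never chosen again
            wj = w[idx]
            sj = wj.sum()
            # ||phi(a) - m_j||^2 for all a, expressed through K only.
            dist[:, j] = (diag
                          - 2.0 * (K[:, idx] @ wj) / sj
                          + wj @ K[np.ix_(idx, idx)] @ wj / sj ** 2)
        new_labels = dist.argmin(axis=1)      # Step 3: nearest mean per point
        if np.array_equal(new_labels, labels):
            break                             # Step 5: converged
        labels = new_labels                   # Step 4: updated clusters
    return labels
```

A typical call passes a Gaussian kernel matrix, unit weights `np.ones(n)`, and any initial labelling; like ordinary k-means, the result depends on that initialisation.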

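Spectral Clustering with Normalized Cuts

• Given a graph $G = (V, E, A)$, where $V$ is the set of vertices, $E$ is the set of edges between the vertices, and $A$ is the edge affinity matrix. For vertex subsets $\mathcal{A}, \mathcal{B} \subseteq V$ define

$$\mathrm{links}(\mathcal{A}, \mathcal{B}) = \sum_{i \in \mathcal{A},\, j \in \mathcal{B}} A(i, j),$$

$$\mathrm{normlinkratio}(\mathcal{A}, \mathcal{B}) = \frac{\mathrm{links}(\mathcal{A}, \mathcal{B})}{\mathrm{links}(\mathcal{A}, V)}.$$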
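The two definitions translate directly into code. The sketch below also averages the ratios over a partition, following the normalized cut criterion; the function names mirror the slide, while `normalized_cut` and the 4-vertex example graph are assumptions for illustration.

```python
import numpy as np

def links(A, rows, cols):
    """Sum of affinities A(i, j) with i in `rows` and j in `cols`."""
    return A[np.ix_(rows, cols)].sum()

def normlinkratio(A, rows, cols):
    """links(rows, cols) normalized by links(rows, V)."""
    all_vertices = np.arange(A.shape[0])
    return links(A, rows, cols) / links(A, rows, all_vertices)

def normalized_cut(A, clusters):
    """Average of normlinkratio(V_j, V \\ V_j) over the clusters V_j."""
    n = A.shape[0]
    total = 0.0
    for Vj in clusters:
        rest = np.setdiff1d(np.arange(n), Vj)
        total += normlinkratio(A, Vj, rest)
    return total / len(clusters)

# Two strongly linked pairs {0, 1} and {2, 3}, weakly linked to each other.
A = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])

print(normalized_cut(A, [[0, 1], [2, 3]]))   # cuts only the weak edges
print(normalized_cut(A, [[0, 2], [1, 3]]))   # cuts the strong edges
```

The partition that separates the two strongly linked pairs has the much smaller value, which is the behaviour the criterion is designed to reward.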

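• Let $D$ be the diagonal matrix with $D_{ii} = \sum_j A_{ij}$. The normalized cut criterion

$$\text{minimize} \quad \frac{1}{k} \sum_{j=1}^{k} \mathrm{normlinkratio}(V_j,\, V \setminus V_j)$$

over partitions $V_1, \dots, V_k$ of $V$ is equivalent to the problem

$$\text{maximize} \quad \frac{1}{k}\, \mathrm{trace}(Z^T A Z),$$

where $Z = X (X^T D X)^{-1/2}$, $X$ is an $n \times k$ indicator matrix, and $Z^T D Z = I_k$.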
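The equivalence rests on the identity $\frac{1}{k}\mathrm{trace}(Z^T A Z) = 1 - \frac{1}{k}\sum_j \mathrm{normlinkratio}(V_j, V \setminus V_j)$, which follows because $\mathrm{links}(V_j, V_j) + \mathrm{links}(V_j, V \setminus V_j) = \mathrm{links}(V_j, V)$. The following sketch checks the identity and the constraint $Z^T D Z = I_k$ on an assumed 4-vertex graph with an assumed partition.

```python
import numpy as np

# Assumed example graph and partition (not from the slides).
A = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
labels = np.array([0, 0, 1, 1])
k = 2

d = A.sum(axis=1)
D = np.diag(d)
X = np.eye(k)[labels]                         # n x k indicator matrix
# X^T D X is diagonal for an indicator matrix, so its inverse square root
# is obtained entrywise on the diagonal.
Z = X @ np.diag(1.0 / np.sqrt(np.diag(X.T @ D @ X)))

# Normalized cut computed directly from the definition.
ncut = 0.0
for j in range(k):
    inside = labels == j
    ncut += A[np.ix_(inside, ~inside)].sum() / A[inside, :].sum()
ncut /= k

trace_value = np.trace(Z.T @ A @ Z) / k
print(ncut, trace_value, ncut + trace_value)  # the two values sum to 1
```

Since the two quantities always sum to 1, minimizing the normalized cut and maximizing the trace select the same partition.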

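• Let $\tilde{Z} = D^{1/2} Z$ and relax the constraints so that only $\tilde{Z}^T \tilde{Z} = I_k$ is required:

$$\text{maximize} \quad \mathrm{trace}\left( \tilde{Z}^T D^{-1/2} A D^{-1/2} \tilde{Z} \right).$$

To solve this problem, set the matrix $\tilde{Z}$ to the top $k$ eigenvectors of the matrix $D^{-1/2} A D^{-1/2}$.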
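Because the relaxation drops the requirement that $Z$ come from an indicator matrix, its optimum (the sum of the $k$ largest eigenvalues) is an upper bound on the trace of every actual partition. The sketch below checks this by enumerating all 2-way partitions of a small random graph; the graph size, the random affinities, and the helper `partition_trace` are assumptions for illustration.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n, k = 7, 2
A = rng.uniform(0.1, 1.0, (n, n))
A = (A + A.T) / 2.0
np.fill_diagonal(A, 0.0)
d = A.sum(axis=1)
M = A / np.sqrt(np.outer(d, d))               # D^{-1/2} A D^{-1/2}

# Relaxed solution: eigenvectors for the k largest eigenvalues.
vals, vecs = np.linalg.eigh(M)
Z_tilde = vecs[:, -k:]
relaxed = np.trace(Z_tilde.T @ M @ Z_tilde)

def partition_trace(labels):
    """trace(Z^T A Z) for the partition encoded by an integer label vector."""
    X = np.eye(k)[labels]
    Z = X / np.sqrt(X.T @ d)                  # X (X^T D X)^{-1/2}
    return np.trace(Z.T @ A @ Z)

# Best discrete partition by exhaustive search (feasible only for tiny n).
best = max(partition_trace(np.array(lab))
           for lab in product(range(k), repeat=n)
           if len(set(lab)) == k)

print(best, relaxed)
```

In practice the eigenvectors are then turned back into a discrete partition, for example by clustering the rows as in the spectral clustering algorithm.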

• Normalized cuts using weighted kernel k-means

Let $W = D$ and $K = D^{-1} A D^{-1}$. The trace maximization of weighted kernel k-means,

$$\mathrm{trace}\left( Y^T W^{1/2} K W^{1/2} Y \right),$$

is equivalent to the trace maximization for normalized cut,

$$\mathrm{trace}\left( Y^T D^{-1/2} A D^{-1/2} Y \right),$$

when $\tilde{Z} = Y$.

• Kernel k-means using eigenvectors

Compute the first $k$ eigenvectors of the matrix $W^{1/2} K W^{1/2}$.
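The correspondence can be checked numerically. With $W = D$ and $K = D^{-1} A D^{-1}$ the matrix $W^{1/2} K W^{1/2}$ equals $D^{-1/2} A D^{-1/2}$, and the weighted kernel k-means objective equals $\mathrm{trace}(W^{1/2} K W^{1/2})$ minus the trace term, so minimizing the objective maximizes the normalized cut trace. The random graph and the fixed partition below are assumptions for illustration; note that this $K$ need not be positive semidefinite, so the identity is verified purely algebraically.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 2
A = rng.uniform(0.1, 1.0, (n, n))
A = (A + A.T) / 2.0
np.fill_diagonal(A, 0.0)
d = A.sum(axis=1)
labels = np.array([0, 0, 0, 1, 1, 1, 1, 0])   # an arbitrary partition

w = d                                          # W = D
K = A / np.outer(d, d)                         # K = D^{-1} A D^{-1}

# Weighted kernel k-means objective, evaluated through kernel entries only.
objective = 0.0
for j in range(k):
    idx = labels == j
    wj, sj = w[idx], w[idx].sum()
    Kjj = K[np.ix_(idx, idx)]
    dist = np.diag(K)[idx] - 2.0 * (Kjj @ wj) / sj + wj @ Kjj @ wj / sj ** 2
    objective += wj @ dist

X = np.eye(k)[labels]
Z = X / np.sqrt(X.T @ d)                       # X (X^T D X)^{-1/2}
Y = np.sqrt(d)[:, None] * Z                    # Y = Z_tilde = D^{1/2} Z
M = np.sqrt(d)[:, None] * K * np.sqrt(d)[None, :]   # W^{1/2} K W^{1/2}

trace_kernel = np.trace(Y.T @ M @ Y)
trace_ncut = np.trace(Z.T @ A @ Z)
print(trace_kernel, trace_ncut, objective)
```

The two traces agree for every partition, which is what allows the weighted kernel k-means iteration to be used as a normalized cut optimizer.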

• Conclusion

Spectral Clustering

Kernel k-means
