Power Iteration Clustering
Speaker: Xiaofei Di, 2010.10.11
Outline
• Authors
• Abstract
• Background
• Power Iteration Clustering (PIC)
• Conclusion
Authors
• Frank Lin
PhD student, Language Technologies Institute, School of Computer Science, Carnegie Mellon University
http://www.cs.cmu.edu/~frank/
• William W. Cohen
Associate Research Professor, Machine Learning Department, Carnegie Mellon University
http://www.cs.cmu.edu/~wcohen/
Abstract
• We present a simple and scalable graph clustering method called power iteration clustering (PIC).
• PIC finds a very low-dimensional embedding of a dataset using truncated power iteration on a normalized pair-wise similarity matrix of the data. This embedding turns out to be an effective cluster indicator, consistently outperforming widely used spectral methods such as NCut on real datasets.
• PIC is very fast on large datasets, running over 1000 times faster than an NCut implementation based on the state-of-the-art IRAM eigenvector computation technique.
摘要 (Abstract)
• This paper proposes a simple and scalable graph clustering method: power iteration clustering (PIC).
• PIC applies truncated power iteration to a normalized pairwise similarity matrix of the data to find a very low-dimensional embedding of the dataset. This embedding happens to be a very effective cluster indicator, so that on real datasets it consistently outperforms widely used spectral clustering methods such as NCut.
• On large datasets PIC is very fast, over 1000 times faster than an NCut implementation based on the best available eigenvector computation techniques.
Background 1 ---- Spectral Clustering
• dataset: $X = \{x_1, x_2, \ldots, x_n\}$
• similarity function: $s(x_i, x_j) \ge 0$
• affinity matrix: $W_{ij} = s(x_i, x_j)$
• degree matrix: $D$ is diagonal, with $d_i = \sum_j W_{ij}$
• normalized affinity matrix: $NA = D^{-1} W$
• unnormalized graph Laplacian matrix: $L = D - W$
• normalized symmetric Laplacian matrix: $L = I - D^{-1/2} W D^{-1/2}$
• normalized random-walk Laplacian matrix: $L = I - D^{-1} W$
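A minimal numpy sketch of these definitions, assuming a precomputed symmetric, nonnegative similarity matrix S (the function name is illustrative, not from the slides):

import numpy as np

def graph_matrices(S):
    """Build the matrices above from a symmetric, nonnegative similarity matrix S."""
    W = np.asarray(S, dtype=float)          # affinity matrix W_ij = s(x_i, x_j)
    d = W.sum(axis=1)                       # degrees d_i = sum_j W_ij
    D = np.diag(d)                          # degree matrix
    NA = W / d[:, None]                     # normalized affinity NA = D^{-1} W
    L_unnorm = D - W                        # unnormalized Laplacian L = D - W
    d_isqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - (W * d_isqrt[:, None]) * d_isqrt[None, :]  # I - D^{-1/2} W D^{-1/2}
    L_rw = np.eye(len(d)) - NA              # random-walk Laplacian I - D^{-1} W
    return W, D, NA, L_unnorm, L_sym, L_rw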
Background 2 ---- Power Iteration Method
• Advantage: does not compute a matrix decomposition.
• Disadvantages: finds only the dominant eigenvalue, and converges slowly.
• An eigenvalue algorithm:
Input: an initial vector $b_0$ and the matrix $A$
Iteration: $b_{k+1} = \frac{A b_k}{\|A b_k\|}$
• Convergence, under the assumptions that:
$A$ has an eigenvalue that is strictly greater in magnitude than its other eigenvalues, and
the starting vector $b_0$ has a nonzero component in the direction of an eigenvector associated with the dominant eigenvalue,
a subsequence of $(b_k)$ converges to an eigenvector associated with the dominant eigenvalue.
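A minimal numpy sketch of this iteration (the function name and tolerance are illustrative, not from the slides):

import numpy as np

def power_iteration(A, b0, num_iters=1000, tol=1e-10):
    """Repeatedly apply A and renormalize; converges to a dominant eigenvector
    under the two assumptions above."""
    b = b0 / np.linalg.norm(b0)
    for _ in range(num_iters):
        b_next = A @ b
        b_next /= np.linalg.norm(b_next)
        if np.linalg.norm(b_next - b) < tol:
            break
        b = b_next
    return b

# Example: dominant eigenvector of a small symmetric matrix.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
v = power_iteration(A, np.array([1.0, 1.0]))
print(v, v @ A @ v)  # eigenvector and its Rayleigh quotient (the dominant eigenvalue)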
Power Iteration Clustering (PIC)
Unfortunately, since the sum of each row of NA is 1, the largest eigenvector of NA (the smallest of L) is a constant vector with eigenvalue 1.
Fortunately, the intermediate vectors during the convergence process are interesting.
Example: $x_i \in \mathbb{R}^2$ and $s(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$
Conclusion: PI first converges locally within a cluster.
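To see this behavior concretely, a small numpy demo with two Gaussian blobs (a sketch; all names and parameter values are illustrative, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
# Two well-separated 2-D clusters of 20 points each.
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(4, 0.3, (20, 2))])

# Gaussian similarity s(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)).
sigma = 1.0
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / (2 * sigma**2))
NA = W / W.sum(axis=1, keepdims=True)   # row-stochastic: NA = D^{-1} W

v = rng.random(len(X))
for t in range(1, 31):
    v = NA @ v
    v /= np.abs(v).sum()
    if t % 10 == 0:
        # Within-cluster spread shrinks quickly (local convergence),
        # while the two cluster means stay distinct for many more steps.
        print(t, v[:20].std(), v[20:].std(), v[:20].mean() - v[20:].mean())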
PI's Convergence
Let W = NA (normalized affinity matrix). Then:
• W has eigenvectors $e_1, \ldots, e_n$ with eigenvalues $\lambda_1, \ldots, \lambda_n$
• $\lambda_1 = 1$, and $e_1$ is constant
• $\lambda_2, \ldots, \lambda_k$ are larger than the remaining ones
Spectral representation of $a$: $\mathrm{spec}(a) = \langle e_2(a), \ldots, e_k(a) \rangle$
Spectral distance between $a$ and $b$: $\mathrm{spec}(a, b) = \left( \sum_{j=2}^{k} [e_j(a) - e_j(b)]^2 \right)^{1/2}$
Assume $W^t$ has an $(\alpha, \beta)$-eigengap between the $k$-th and $(k+1)$-th eigenvector, i.e. $\lambda_k^t / \lambda_2^t \ge \alpha$ and $\lambda_{k+1}^t / \lambda_2^t \le \beta$, and that every $W$ is $e$-bounded.
$v^t$ is the $t$-th iteration of PI.
The $(v^0, t)$ distance between $a$ and $b$: $\mathrm{signal}^t(a, b) = \left| \sum_{j=2}^{n} [e_j(a) - e_j(b)]\, c_j \lambda_j^t \right|$
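The step connecting PI to these quantities is a short expansion of $v^t$ in the eigenbasis of $W$ (a sketch; the per-step renormalization is ignored since it only rescales $v^t$):

$v^t = W^t v^0 = c_1 \lambda_1^t e_1 + c_2 \lambda_2^t e_2 + \cdots + c_n \lambda_n^t e_n$

Since $\lambda_1 = 1$ and $e_1$ is constant, the first term shifts every entry of $v^t$ equally, so differences between entries depend only on the remaining terms:

$v^t(a) - v^t(b) = \sum_{j=2}^{n} [e_j(a) - e_j(b)]\, c_j \lambda_j^t$

which is exactly $\mathrm{signal}^t(a, b)$ up to sign; the eigengap makes the first $k-1$ of these terms dominate for moderate $t$.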
$\mathrm{signal}^t$ is an approximation of spec, but:
a) compressed to the small radius $R$
b) has components distorted by $c_j$ and $\lambda_j^t$
c) has terms that are additively combined (rather than Euclidean)
a) The size of the radius is of no importance in clustering, because most clustering methods are based on relative distances, not absolute ones.
b) The importance of the dimension associated with the i-th eigenvector is downweighted by (a power of) its eigenvalue, which often improves performance for spectral methods.
c) For many natural problems, W is approximately block-stochastic, and hence the first k eigenvectors are approximately piecewise constant over the k clusters.
It is easy to see that when spec(a,b) is small, signal must also be small. However, when a and b are in different clusters, since the terms are signed and additively combined, it is possible that they "cancel out" and make a and b seem to be in the same cluster. Fortunately, this seems to be uncommon in practice when the cluster number k is not too large.
So for a large enough $\alpha$ and a small enough $t$, $\mathrm{signal}^t$ is very likely a good cluster indicator.
Early stopping for PI
• velocity at t: $\delta^t = v^t - v^{t-1}$
• acceleration at t: $\epsilon^t = \delta^t - \delta^{t-1}$
• if $\|\epsilon^t\|_{\infty} < \hat{\epsilon}$, then stop PI
While the clusters are "locally converging", the rate of convergence changes rapidly; whereas during the final global convergence, the convergence rate appears more stable.
1. $\hat{\epsilon} = \frac{1 \times 10^{-5}}{n}$, where n is the number of data instances.
2. $v^0 = V(A)$, where $V(A)_i = \frac{\sum_j A_{ij}}{\sum_{i,j} A_{ij}}$.
3. $V = [v_1, \ldots, v_k]$ (one dimension is good enough).
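Putting the pieces together, a minimal numpy sketch of the PIC loop described in these slides (names are illustrative; the final k-means step on the one-dimensional embedding follows the paper, and scikit-learn is assumed to be available):

import numpy as np
from sklearn.cluster import KMeans

def pic(A, k, max_iter=1000):
    """Power iteration clustering on a symmetric, nonnegative affinity matrix A."""
    n = A.shape[0]
    W = A / A.sum(axis=1, keepdims=True)   # NA = D^{-1} A, row-stochastic
    v = A.sum(axis=1) / A.sum()            # v^0 = V(A): degree-based starting vector
    eps_hat = 1e-5 / n                     # stopping threshold from note 1 above
    delta_prev = np.zeros(n)
    for _ in range(max_iter):
        v_next = W @ v
        v_next /= np.abs(v_next).sum()     # keep v^t on a fixed scale
        delta = v_next - v                 # velocity delta^t = v^t - v^{t-1}
        accel = delta - delta_prev         # acceleration eps^t = delta^t - delta^{t-1}
        v, delta_prev = v_next, delta
        if np.abs(accel).max() < eps_hat:  # ||eps^t||_inf < eps_hat -> stop PI
            break
    # Cluster the one-dimensional embedding v^t.
    return KMeans(n_clusters=k, n_init=10).fit_predict(v.reshape(-1, 1))

Here v is the one-dimensional PIC embedding; running k-means on it assigns the final cluster labels.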
Experiments (1/3)
• Purity: cluster purity
• NMI: normalized mutual information
• RI: Rand index. The Rand index (or Rand measure) is a measure of the similarity between two data clusterings. Given a set S of n elements and two partitions of S to compare, X and Y, define:
a = the number of pairs of elements in S that are in the same set in X and in the same set in Y
b = the number of pairs of elements in S that are in different sets in X and in different sets in Y
c = the number of pairs of elements in S that are in the same set in X and in different sets in Y
d = the number of pairs of elements in S that are in different sets in X and in the same set in Y
Then: $RI = \frac{a + b}{a + b + c + d} = \frac{a + b}{\binom{n}{2}}$
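A direct pair-counting sketch of this definition (the function name is illustrative):

from itertools import combinations

def rand_index(labels_x, labels_y):
    """Rand index between two clusterings given as per-element label sequences."""
    pairs = list(combinations(range(len(labels_x)), 2))
    agree = sum(
        (labels_x[i] == labels_x[j]) == (labels_y[i] == labels_y[j])
        for i, j in pairs
    )  # counts a + b: the pairs on which the two clusterings agree
    return agree / len(pairs)

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # identical up to label renaming -> 1.0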
Experiments (2/3)
Experimental comparisons on accuracy of PIC
Experimental comparisons on eigenvalue weighting
Experiments (3/3)
Experimental comparisons on scalability
NCutE uses the slower, classic eigenvalue decomposition method to find all eigenvectors. NCutI uses the fast Implicitly Restarted Arnoldi Method (IRAM) for the top k eigenvectors.
Synthetic dataset
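For reference, ARPACK, which scipy wraps, is an IRAM implementation; a sketch of computing the top-k eigenvectors of a symmetric matrix this way (the matrix and k here are illustrative):

import numpy as np
from scipy.sparse.linalg import eigsh  # ARPACK, an IRAM implementation

rng = np.random.default_rng(0)
M = rng.random((200, 200))
A = (M + M.T) / 2                       # symmetric matrix, e.g. an affinity matrix
k = 5
vals, vecs = eigsh(A, k=k, which='LA')  # k algebraically largest eigenpairs
print(vals.shape, vecs.shape)           # (5,), (200, 5)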
Conclusion
• Novel
• Simple
• Efficient
Appendix ---- NCut
Appendix ---- NJW