Markov Cluster Algorithm
Introduction
Important Concepts in MCL Algorithm
MCL Algorithm
The Features of MCL Algorithm
Summary
Graph Clustering
Intuition:
◦ Highly connected nodes should be in the same cluster.
◦ Weakly connected nodes should be in different clusters.
Model:
◦ A random walk may start at any node.
◦ Starting at node r, if a random walk reaches node t with high probability, then r and t should be clustered together.
Markov Clustering (MCL)
Markov process
◦ The probability that a random walk will take an edge at node u depends only on u and the given edge.
◦ It does not depend on the previous route.
◦ This assumption simplifies the computation.
MCL
A flow network is used to approximate the partition.
An initial amount of flow is injected into each node.
At each step, a percentage of the flow goes from a node to its neighbors via the outgoing edges.
MCL
Edge weight
◦ Similarity between two nodes.
◦ Considered as the bandwidth or connectivity.
◦ If an edge has a higher weight than another, then more flow will pass over that edge.
◦ The amount of flow is proportional to the edge weight.
◦ If there are no edge weights, then we can assign the same weight to all edges.
Intuition of MCL
Two natural clusters
When the flow reaches the border points, it is more likely to turn back than to cross the border.
[Figure: two clusters, with border nodes labeled A and B]
MCL
When the flow reaches A, it has four possible outcomes.
◦ Three lead back into the cluster, one leaks out.
◦ ¾ of the flow will return, only ¼ leaks.
Flow will accumulate in the center of a cluster (island).
The border nodes will starve.
Simulation of random flow in a graph
Two Operations: Expansion and Inflation
Intrinsic relationship between MCL process result and cluster structure
Popular description: partition the graph so that
Intra-partition similarity is the highest
Inter-partition similarity is the lowest
Observation 1:
The number of higher-length paths in G is large for pairs of vertices lying in the same dense cluster
and small for pairs of vertices belonging to different clusters.
Observation 2:
A Random Walk in G that visits a dense cluster will likely not leave the cluster until many of its vertices have been visited
Definitions
n×n adjacency matrix A.
◦ A(i,j) = weight on the edge from i to j
◦ If the graph is undirected, A(i,j) = A(j,i), i.e. A is symmetric
n×n transition matrix P.
◦ P is row stochastic
◦ P(i,j) = probability of stepping to node j from node i: $P(i,j) = A(i,j) / \sum_k A(i,k)$
n×n Laplacian matrix L.
◦ $L(i,i) = \sum_k A(i,k) - A(i,i)$ and $L(i,j) = -A(i,j)$ for $i \neq j$, i.e. $L = D - A$ where D is the diagonal matrix of the row sums of A
◦ Symmetric positive semi-definite for undirected graphs
◦ Singular
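A minimal Python sketch of these definitions follows. The 3-node path graph and the function names `transition_matrix` and `laplacian` are illustrative assumptions, not taken from the slides, and every node is assumed to have at least one edge.

```python
def transition_matrix(A):
    """Row-stochastic P with P[i][j] = A[i][j] / sum_k A[i][k]."""
    P = []
    for row in A:
        degree = sum(row)  # assumption: degree > 0 for every node
        P.append([a / degree for a in row])
    return P


def laplacian(A):
    """L = D - A, where D is the diagonal matrix of the row sums of A."""
    n = len(A)
    return [[(sum(A[i]) if i == j else 0) - A[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical example: undirected path graph 0 - 1 - 2 with unit weights.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

P = transition_matrix(A)
L = laplacian(A)
print(P)  # every row of P sums to 1
print(L)  # every row of L sums to 0, which is why L is singular
```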
Definitions
[Figure: an example graph shown with its adjacency matrix A (edge weights 1) and its transition matrix P (edge probabilities 1, 1/2, 1/2, 1)]
What is a random walk
[Figure: a random walk on the example graph (edge probabilities 1, 1/2, 1/2, 1), shown step by step at t=0, t=1, t=2 and t=3]
Probability Distributions
$x_t(i)$ = probability that the surfer is at node i at time t
$x_{t+1}(i) = \sum_j (\text{probability of being at node } j) \cdot \Pr(j \to i) = \sum_j x_t(j)\, P(j,i)$
$x_{t+1} = x_t P = x_{t-1} P^2 = x_{t-2} P^3 = \dots = x_0 P^{t+1}$
What happens when the surfer keeps walking for a long time?
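The update rule can be tried out with a few lines of Python. This is only a sketch: the matrix P below is a hypothetical row-stochastic example (a 3-node path graph with a self-loop on every node), and the name `step` is an assumption.

```python
def step(x, P):
    """One step of the walk: x_next[i] = sum_j x[j] * P[j][i]."""
    n = len(P)
    return [sum(x[j] * P[j][i] for j in range(n)) for i in range(n)]


# Hypothetical transition matrix: path graph 0 - 1 - 2 plus self-loops.
P = [[1 / 2, 1 / 2, 0.0],
     [1 / 3, 1 / 3, 1 / 3],
     [0.0, 1 / 2, 1 / 2]]

x = [1.0, 0.0, 0.0]  # the surfer starts at node 0
for t in range(60):
    x = step(x, P)

# For this particular P the distribution settles near [2/7, 3/7, 2/7],
# regardless of the starting node.
print(x)
```

For this example the long-run distribution no longer depends on where the surfer started, which is the effect the closing question points to.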
Flow Formulation
• Flow: transition probability from one node to another.
• Flow matrix: matrix with the flows among all nodes; the ith column represents the flows out of the ith node. Each column sums to 1.
[Figure: path graph 1 - 2 - 3; nodes 1 and 3 each send flow 1 to node 2, and node 2 sends flow 0.5 to each of nodes 1 and 3]
Flow matrix:
      1     2     3
1     0    0.5    0
2    1.0    0    1.0
3     0    0.5    0
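The flow matrix for the three-node example can be reproduced by normalizing each column of the adjacency matrix. The sketch below assumes unit edge weights and uses 0-based indices; the function name `flow_matrix` is illustrative.

```python
def flow_matrix(A):
    """Column-stochastic matrix: column j holds the flows out of node j."""
    n = len(A)
    col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
    return [[A[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


# Path graph 1 - 2 - 3 (stored as indices 0, 1, 2), unit weights assumed.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

F = flow_matrix(A)
for row in F:
    print(row)
```

Note that this convention is column stochastic, whereas the transition matrix P defined earlier is row stochastic; for an undirected graph the two are transposes of each other.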
Measure or sample any of these quantities (higher-length paths, random walks) and deduce the cluster structure from the behavior of the sampled quantities.
Cluster structure will show itself as a peaked distribution of the quantities.
A lack of cluster structure will result in a flat distribution.
Markov Chain
Random Walk on Graph
Some Definitions in MCL
A Random Process with Markov Property
Markov Property: given the present state, future states are independent of the past states
At each step the process may change its state from the current state to another state, or remain in the same state, according to a certain probability distribution.
A walker starts at some arbitrary vertex.
He successively visits new vertices by arbitrarily selecting one of the outgoing edges.
There is not much difference between a random walk and a finite Markov chain.
Simple Graph
A simple graph is an undirected graph in which every nonzero weight equals 1.
Associated Matrix
The associated matrix of G, denoted $M_G$, is defined by setting the entry $(M_G)_{pq}$ equal to $w(v_p, v_q)$.
Markov Matrix
The Markov matrix associated with a graph G is denoted by $T_G$ and is formally defined by letting its qth column be the qth column of $M_G$, normalized so that it sums to 1.
The associated matrix and the Markov matrix are actually computed for the matrix $M_G + I$.
I denotes the identity matrix (diagonal entries equal to 1, all other entries 0).
This adds a loop to every vertex of the graph, because a walker may stay in the same place in his next step.
Find Higher-Length Paths
Starting point: for the associated matrix M, the quantity $(M^k)_{pq}$ has a straightforward interpretation as the number of paths of length k between $v_p$ and $v_q$.
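The path-counting interpretation can be checked on a small graph. The sketch below uses a hypothetical example of two triangles joined by a single edge, and counts paths in which vertices may repeat; the helper names are illustrative.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matpow(M, k):
    """k-th power of M by repeated multiplication (k >= 1)."""
    result = M
    for _ in range(k - 1):
        result = matmul(result, M)
    return result


# Hypothetical simple graph: triangle {0, 1, 2}, triangle {3, 4, 5}, edge 2 - 3.
M = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]

M3 = matpow(M, 3)
print(M3[0][1])  # length-3 paths between 0 and 1 (same triangle): 3
print(M3[0][5])  # length-3 paths between 0 and 5 (different triangles): 1
```

Even at length 3, the pair inside one triangle is joined by more paths than the pair that straddles the bridge, in line with Observation 1.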
[Figure: the matrices $M_G$ and $(M_G + I)^2$ for an example graph]
Flow is easier within dense regions than across sparse boundaries.
However, in the long run, this effect disappears.
Powers of the matrix can be used to find higher-length paths, but the effect diminishes as the flow goes on.
Idea: how can we change the distribution of transition probabilities so that preferred neighbours are further favoured and less popular neighbours are demoted?
MCL solution: raise all the entries in a given column to a certain power greater than 1 (e.g. squaring), then rescale the column so that it sums to 1 again.
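As a small worked example of this rescaling step, the sketch below squares a single hypothetical column and renormalizes it. The function name `inflate_column` and the sample values are assumptions for illustration.

```python
def inflate_column(col, r=2):
    """Raise each entry to the power r, then rescale so the column sums to 1."""
    powered = [v ** r for v in col]
    total = sum(powered)
    return [v / total for v in powered]


col = [0.5, 0.25, 0.25]          # hypothetical transition probabilities
inflated = inflate_column(col)   # squares are 0.25, 0.0625, 0.0625
print(inflated)                  # approximately [0.667, 0.167, 0.167]
```

The preferred neighbour rises from 1/2 to 2/3 while the other two fall from 1/4 to 1/6 each.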
Expansion operation: taking a power of the matrix; expands flow within dense regions.
Inflation operation: entry-wise power followed by column rescaling; eliminates unfavoured regions.
The MCL algorithm
Input: A, the adjacency matrix.
Initialize M to $M_G$, the canonical transition matrix: $M := M_G := (A + I) D^{-1}$
Expand: M := M*M. Enhances flow to well-connected nodes as well as to new nodes.
Inflate: M := M.^r (r usually 2), then renormalize columns. Increases inequality in each column. “Rich get richer, poor get poorer.”
Prune: saves memory by removing entries close to zero.
Converged? If no, return to Expand. If yes, output clusters.
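The loop above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the pruning threshold, the convergence tolerance, the iteration cap, the example graph, and the shortcut of assigning each node to the row carrying most of its flow are all choices made here for the example.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def normalize_columns(M):
    n = len(M)
    sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
    return [[M[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(A, r=2, prune_below=1e-6, tol=1e-9, max_iter=100):
    n = len(A)
    # Initialize: add self-loops (A + I), then make every column sum to 1.
    M = normalize_columns([[A[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(M, M)                                   # Expand
        inflated = normalize_columns(
            [[v ** r for v in row] for row in expanded])          # Inflate
        pruned = normalize_columns(
            [[v if v >= prune_below else 0.0 for v in row]
             for row in inflated])                                # Prune
        delta = max(abs(pruned[i][j] - M[i][j])
                    for i in range(n) for j in range(n))
        M = pruned
        if delta < tol:                                           # Converged?
            break
    return M


# Hypothetical input: triangle {0, 1, 2}, triangle {3, 4, 5}, edge 2 - 3.
A = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]

M = mcl(A)
clusters = {}
for j in range(len(A)):
    # Simplification: assign node j to the row that receives most of its flow.
    attractor = max(range(len(A)), key=lambda i: M[i][j])
    clusters.setdefault(attractor, []).append(j)
print(sorted(clusters.values()))
```

Dense lists keep the sketch self-contained; a practical implementation would store M as a sparse matrix, which is what makes the pruning step save memory.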
Multi-level Regularized MCL
Input Graph → Coarsen → Intermediate Graph → Coarsen → . . . → Coarsen → Coarsest Graph
Coarsest and intermediate graphs: run Curtailed R-MCL, project flow.
Input Graph: run R-MCL to convergence, output clusters.
◦ Faster to run on smaller graphs first
◦ Captures global topology of graph
◦ Initializes flow matrix of refined graph
http://www.micans.org/mcl/ani/mcl-animation.html
Find attractors: node a is an attractor if $M_{aa}$ is nonzero.
Find attractor systems: if a is an attractor, then the set of its neighbours is called an attractor system.
If a node has an arc connected to any node of an attractor system, the node belongs to the same cluster as that attractor system.
Attractor set = {1,2,3,4,5,6,7,8,9,10}
The attractor systems are {1,2,3}, {4,5,6,7}, {8,9}, {10}
The overlapping clusters are {1,2,3,11,12,15}, {4,5,6,7,13}, {8,9,12,13,14,15}, {10,12,13}
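Reading clusters off a converged matrix can be sketched as below. The 4-node limit matrix is a hypothetical hand-made example (not the 15-node graph from the slide), stored column-wise so that column j holds the flow out of node j; the function name and the tolerance `eps` are assumptions.

```python
def clusters_from_limit(M, eps=1e-9):
    """Return (attractors, clusters) for a converged column-stochastic M."""
    n = len(M)
    # A node a is an attractor if its diagonal entry is nonzero.
    attractors = [a for a in range(n) if M[a][a] > eps]
    clusters = []
    for a in attractors:
        # Every node that sends flow to attractor a joins a's cluster.
        members = sorted(j for j in range(n) if M[a][j] > eps)
        if members not in clusters:
            clusters.append(members)
    return attractors, clusters


# Hypothetical limit matrix: nodes 0 and 3 are attractors, node 1 flows
# to 0, and node 2 splits its flow evenly between 0 and 3.
M = [[1.0, 1.0, 0.5, 0.0],
     [0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.5, 1.0]]

attractors, clusters = clusters_from_limit(M)
print(attractors)  # [0, 3]
print(clusters)    # [[0, 1, 2], [2, 3]]
```

Node 2 appears in both clusters, which is how overlapping clusters such as those listed above can arise.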
How many steps are required before the algorithm converges to an idempotent matrix?
The number is typically somewhere between 10 and 100.
The effect of inflation on cluster granularity
R denotes the inflation constant. a denotes the loop weight.
MCL simulates random walks on a graph to find clusters.
Expansion promotes dense regions, while inflation demotes the less favoured regions.
There is an intrinsic relationship between the MCL result and the cluster structure.