A scalable multilevel algorithm for community structure detection
Melih Onus (Arizona State University), Hristo Djidjev (Los Alamos National Laboratory)
Models and Algorithms for the Web Graph (WAW 2006) November 29 – December 2, 2006
Community Structure Detection Problem
The problem of identifying communities in a network is usually modeled as a graph clustering problem:
– Vertices correspond to individual items
– Edges describe relationships
– The communities correspond to subgraphs with
  • Dense connections between vertices in the same subgraph
  • Fewer connections between vertices in different subgraphs
Motivation: Why detect communities?
– Analyze and understand the information contained in the huge amount of data available on the WWW
– Find related commercial items
– Recommendation systems
Important for:
– Social networks
– Ad-hoc networks
– Protein interaction networks
– Genetic networks
Motivation: Why detect communities?
Predict how much someone is going to love a movie based on their movie preferences
Grand Prize: $1,000,000
Outline of the talk
– Previous work
– Graph partitioning problem
– Our approach
  • Modularity
  • Reduction
  • Multilevel graph partitioning
– Experimental results
– Conclusions
Previous Work
Two main classes:
– Agglomerative methods (addition of edges)
– Divisive methods (removal of edges)
Algorithms based on:
– Laplacian matrix
– Centrality measures
– Flow models
– Random walks
– Resistor networks
– Optimization
These methods are either not fast enough or not accurate enough
Graph Partitioning Problem
Given a graph G(V, E), find a partition such that:
– The partition is balanced (i.e., all subsets have roughly the same number of vertices)
– The cut size is minimized (i.e., the number of edges with endpoints in different subsets is minimized)
Previous work:
– Kernighan-Lin algorithm
– Spectral partitioning
– Multilevel algorithms
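A minimal Python sketch that measures both objectives for a given partition: the cut size and the number of vertices in each part. The edge-list and dict representation and the name `partition_stats` are assumptions made for illustration.

```python
from collections import Counter


def partition_stats(edges, part):
    """Return (cut size, part sizes) for a partition of an undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once
    part:  dict mapping every vertex to a part label
    """
    # An edge is cut when its endpoints lie in different parts.
    cut = sum(1 for u, v in edges if part[u] != part[v])
    # The part sizes show how balanced the partition is.
    sizes = dict(Counter(part.values()))
    return cut, sizes


# Two triangles joined by the edge (2, 3), split at that edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(partition_stats(edges, part))  # (1, {0: 3, 1: 3})
```

Both splits of this graph into three and three vertices are balanced, but only the split at the bridge edge reaches the minimum cut size of 1.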
Kernighan - Lin Algorithm
Find an initial random partition
Improve by a greedy procedure that swaps pairs of vertices from different partitions
Minimize the size of the cut set
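A simplified Python sketch of one Kernighan-Lin pass, assuming an unweighted graph given as adjacency sets and a two-way partition with labels 0 and 1. It follows the classic scheme (tentatively swap the best unlocked pair, lock both vertices, then keep only the most profitable prefix of swaps), but it recomputes every gain from scratch, so it is much slower than a tuned implementation. `kl_pass` is a hypothetical name.

```python
from itertools import product


def kl_pass(adj, part):
    """One pass of a simplified Kernighan-Lin bisection refinement.

    adj:  dict mapping each vertex to the set of its neighbours
    part: dict mapping each vertex to side 0 or 1
    Returns a new partition whose cut size is no larger than the input's.
    """
    part = dict(part)
    side0 = [v for v in part if part[v] == 0]
    side1 = [v for v in part if part[v] == 1]
    locked, swaps, gains = set(), [], []

    def d(x):
        # D(x) = external degree - internal degree under the current partition.
        ext = sum(1 for y in adj[x] if part[y] != part[x])
        return ext - (len(adj[x]) - ext)

    for _ in range(min(len(side0), len(side1))):
        # Best unlocked pair: gain = D(a) + D(b) - 2 * [a and b are adjacent].
        best = None
        for a, b in product(side0, side1):
            if a in locked or b in locked:
                continue
            g = d(a) + d(b) - 2 * (b in adj[a])
            if best is None or g > best[0]:
                best = (g, a, b)
        g, a, b = best
        part[a], part[b] = 1, 0  # tentative swap
        locked.update((a, b))
        swaps.append((a, b))
        gains.append(g)

    # Keep the prefix of swaps with the largest positive cumulative gain.
    best_k, best_total, total = 0, 0, 0
    for k, g in enumerate(gains, start=1):
        total += g
        if total > best_total:
            best_k, best_total = k, total
    for a, b in swaps[best_k:]:
        part[a], part[b] = 0, 1  # undo the unprofitable swaps
    return part


# Two triangles joined by the edge (2, 3), starting from a poor split (cut size 5).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
start = {0: 0, 1: 0, 3: 0, 2: 1, 4: 1, 5: 1}
print(kl_pass(adj, start))  # vertices 0, 1, 2 on side 0 and 3, 4, 5 on side 1
```

The first swap (3 with 2) has gain 4 and brings the cut from 5 down to 1; the later swaps have non-positive cumulative gain and are undone.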
Graph Partitioning vs Graph Clustering
Graph partitioning:
– Minimize cut size
– Equal number of vertices in each subset
– Number of subsets is an input
Graph clustering:
– Find clusters
– Community sizes may differ
– Number of subsets varies
Algorithms for graph partitioning cannot be used directly to produce a good-quality clustering
Our approach
– Convert the original graph G into a complete edge-weighted graph G'
– Find a min-cut of G' using a modified graph partitioning method
– This produces a good-quality (high-modularity) clustering of G
Modularity
– A useful measure of clustering quality, introduced by Newman [6]
– Modularity of a partitioning = (number of edges within communities) - (expected number of such edges)
– We are trying to find a division of the graph with high modularity
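A minimal Python sketch of the modularity computation. It assumes Newman's normalized form Q = Σ_c [l_c/m - (D_c/2m)^2], where l_c is the number of edges inside community c, D_c the total degree of its vertices, and m the number of edges; this is one standard way to make "expected number of such edges" precise, not necessarily the exact normalization used in the talk. `modularity` is a hypothetical helper.

```python
from collections import defaultdict


def modularity(edges, part):
    """Newman's modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  l_c / m - (D_c / (2m)) ** 2,
    where l_c = number of edges inside c, D_c = total degree of the
    vertices in c, and m = total number of edges.
    """
    m = len(edges)
    inside = defaultdict(int)  # l_c
    degree = defaultdict(int)  # D_c
    for u, v in edges:
        degree[part[u]] += 1
        degree[part[v]] += 1
        if part[u] == part[v]:
            inside[part[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(round(modularity(edges, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}), 4))  # 0.3571
print(modularity(edges, {v: 0 for v in range(6)}))  # 0.0
```

For two triangles joined by one edge, splitting at the bridge gives Q = 2(3/7 - 1/4) = 5/14, while a single community containing everything always has Q = 0.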
Reduction
Graph Clustering Problem: The problem of finding a clustering of maximum modularity in G
Min-Cut Problem: The problem of finding a minimum cut in a complete edge-weighted graph G'
Maximize modularity of a partitioning
= (number of edges within communities) - (expected number of such edges)
$$\mathrm{Weight}(i, j) = \begin{cases} 1 - p_{ij}, & \text{if } (i, j) \in E(G) \\ -p_{ij}, & \text{if } (i, j) \notin E(G) \end{cases}$$
Minimize (-modularity)
= (cut size) - (expected cut size)
= total weight of the edges of G' that cross the cut
Graph Clustering Problem: Maximize modularity
Min-Cut Problem: Minimize cut size
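The reduction can be checked numerically on a small graph. The sketch below builds the weights of G' explicitly, sums them over the vertex pairs that cross a bisection, and compares the result with cut size minus expected cut size. It assumes the Chung–Lu probabilities p_ij = d_i d_j / 2m and uses exact fractions; `cut_weight_in_g_prime` is a hypothetical name.

```python
from fractions import Fraction
from itertools import combinations


def cut_weight_in_g_prime(edges, part):
    """Total weight of the G' edges crossing a two-way partition.

    G' is complete: every vertex pair (i, j) gets weight 1 - p_ij if (i, j)
    is an edge of G and -p_ij otherwise. This sketch assumes the Chung-Lu
    probabilities p_ij = d_i * d_j / (2m).
    """
    m = len(edges)
    deg = {v: 0 for v in part}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    edge_set = {frozenset(e) for e in edges}
    total = Fraction(0)
    for i, j in combinations(part, 2):  # all pairs, since G' is complete
        if part[i] != part[j]:
            a_ij = 1 if frozenset((i, j)) in edge_set else 0
            total += a_ij - Fraction(deg[i] * deg[j], 2 * m)
    return total


edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

# Right-hand side of the reduction: cut size minus expected cut size w1*w2/(2m).
cut = sum(1 for u, v in edges if part[u] != part[v])
w = {0: 0, 1: 0}
for u, v in edges:
    w[part[u]] += 1
    w[part[v]] += 1
expected_cut = Fraction(w[0] * w[1], 2 * len(edges))

print(cut_weight_in_g_prime(edges, part), cut - expected_cut)  # -5/2 -5/2
```

Here the cut weight is -5/2. Under Newman's normalization, the modularity of a bisection equals minus this weight divided by m (here 5/14), so the minimum-weight cut of G' is the maximum-modularity bisection.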
Random Graph Models
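Erdős–Rényi model: $p_{ij} = \frac{2m}{n^2}$, where n and m are the numbers of vertices and edges of G
$$\mathrm{Weight}(i, j) = \begin{cases} 1 - p_{ij}, & \text{if } (i, j) \in E(G) \\ -p_{ij}, & \text{if } (i, j) \notin E(G) \end{cases}$$
$p_{ij}$: the probability that there is an edge between vertices i and j in a random graph from a given distribution
Chung–Lu model: $p_{ij} = \frac{d_i d_j}{2m}$, where $d_i$ is the degree of vertex i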
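The two null models can be written as small functions. The Erdős–Rényi form p = 2m/n² is a reading of a garbled formula (it is what the Chung–Lu formula gives when every vertex has the average degree 2m/n), so treat it as an assumption. The function names are hypothetical.

```python
from fractions import Fraction


def p_erdos_renyi(n, m):
    """Edge probability under the Erdos-Renyi model (assumed form 2m / n^2)."""
    return Fraction(2 * m, n * n)


def p_chung_lu(deg, m, i, j):
    """Edge probability under the Chung-Lu model: d_i * d_j / (2m)."""
    return Fraction(deg[i] * deg[j], 2 * m)


# 6-cycle: every degree equals the average 2m/n, so the two models agree.
cycle_deg = {v: 2 for v in range(6)}
print(p_chung_lu(cycle_deg, 6, 0, 3), p_erdos_renyi(6, 6))  # 1/3 1/3

# Star with centre 0 and five leaves: Chung-Lu follows the degrees.
star_deg = {0: 5, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1}
print(p_chung_lu(star_deg, 5, 0, 1), p_chung_lu(star_deg, 5, 1, 2))  # 1/2 1/10
```

A useful property of the Chung–Lu model is that summing $p_{ij}$ over all j (including j = i) returns $d_i$, so the null model preserves every vertex's degree in expectation. The Erdős–Rényi model ignores degrees entirely.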
Multilevel graph partitioning
A fast and accurate method for producing high-quality partitions
[Figure: the multilevel scheme, with coarsening, partitioning, and uncoarsening phases]
Consists of three phases:
– Coarsening phase
– Partitioning phase
– Uncoarsening and refinement phase
Coarsening Phase
Find a maximal matching and collapse each matched edge into a single vertex
Coarsen recursively to obtain a sequence of graphs $G = G_1, G_2, \ldots, G_k$
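A minimal sketch of one coarsening step, assuming an unweighted graph given as adjacency sets: visit the vertices in random order, pair each unmatched vertex with a random unmatched neighbour (which yields a maximal matching), then contract each matched pair into one coarse vertex. METIS itself offers several matching schemes and also keeps edge weights; this sketch only records vertex weights (total degree of the merged vertices) and drops edge multiplicities. `coarsen_once` is a hypothetical name.

```python
import random


def coarsen_once(adj, seed=0):
    """One coarsening step: random maximal matching, then contraction.

    adj: dict vertex -> set of neighbours (unweighted, undirected, no self-loops).
    Returns (coarse_adj, mapping, vweight):
      mapping[v] is the coarse vertex that fine vertex v is merged into,
      vweight[c] is the total degree of the fine vertices merged into c.
    """
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)  # visit vertices in random order
    taken, mapping, next_id = set(), {}, 0
    for u in order:
        if u in taken:
            continue
        # Pair u with a random untaken neighbour, if any; otherwise u stays single.
        free = sorted(v for v in adj[u] if v not in taken)
        group = [u, rng.choice(free)] if free else [u]
        for x in group:
            taken.add(x)
            mapping[x] = next_id
        next_id += 1
    coarse = {c: set() for c in range(next_id)}
    vweight = {c: 0 for c in range(next_id)}
    for u in adj:
        vweight[mapping[u]] += len(adj[u])
        for v in adj[u]:
            if mapping[u] != mapping[v]:  # the edge inside a matched pair disappears
                coarse[mapping[u]].add(mapping[v])
    return coarse, mapping, vweight


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
coarse, mapping, vweight = coarsen_once(adj)
print(len(adj), "->", len(coarse), "vertices")
```

Applying `coarsen_once` repeatedly to its own output (with vertex weights carried along) would produce the sequence of progressively smaller graphs described above.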
Partitioning Phase
Partition the coarsest graph $G_k$ using greedy graph growing partitioning
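A simplified sketch of greedy graph growing, assuming an unweighted graph: starting from a seed vertex, repeatedly add the outside vertex whose move reduces the cut the most, until the region reaches a target size. Because the result depends on the seed, one could run it from several seeds and keep the smallest cut. `grow_bisection` is a hypothetical name, and the fixed target size is a partitioning-style assumption.

```python
def grow_bisection(adj, seed, target):
    """Greedy graph growing: grow part 0 from `seed` until it has `target` vertices.

    adj: dict vertex -> set of neighbours (unweighted, undirected).
    Returns dict vertex -> 0 (grown region) or 1 (the rest).
    """
    region = {seed}

    def gain(v):
        # Decrease in cut size if v joins the region:
        # its edges into the region stop being cut, its other edges become cut.
        inside = sum(1 for y in adj[v] if y in region)
        return inside - (len(adj[v]) - inside)

    while len(region) < target:
        outside = [v for v in adj if v not in region]
        region.add(max(outside, key=gain))
    return {v: 0 if v in region else 1 for v in adj}


# Two triangles joined by the edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(grow_bisection(adj, seed=0, target=3))  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```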
Uncoarsening and Refinement Phase
Project the partition $P_i$ of $G_i$ back to a partition $P_{i-1}$ of $G_{i-1}$
$G_{i-1}$ has more degrees of freedom than $G_i$
Refine $P_{i-1}$ using the KL algorithm
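Projection itself is simple: each vertex of the finer graph inherits the part of the coarse vertex it was merged into. A minimal sketch, assuming each coarsening level is recorded as a dict from fine vertices to coarse vertices; the function names are hypothetical and the refinement step is only marked by a comment.

```python
def project(coarse_part, mapping):
    """Project a partition of G_i onto G_{i-1}.

    mapping[v] is the vertex of G_i that vertex v of G_{i-1} was merged into,
    so each fine vertex simply inherits the part of its coarse vertex.
    """
    return {v: coarse_part[c] for v, c in mapping.items()}


def uncoarsen(coarsest_part, mappings):
    """Project from G_k back to G_1.

    mappings[i] maps the vertices of G_{i+1} to those of G_{i+2}
    (0-based list, so mappings[0] is the G_1 -> G_2 step).
    """
    part = coarsest_part
    for mapping in reversed(mappings):
        part = project(part, mapping)
        # A refinement step (for example a KL pass) would improve `part` here.
    return part


# G_1 has 6 vertices; G_2 merges {0, 1} and {3, 4}; G_3 merges G_2's {0, 1} and {2, 3}.
level1 = {0: 0, 1: 0, 2: 1, 3: 2, 4: 2, 5: 3}
level2 = {0: 0, 1: 0, 2: 1, 3: 1}
print(uncoarsen({0: 0, 1: 1}, [level1, level2]))  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

Projection never changes the cut, because an edge of the finer graph crosses the partition exactly when the corresponding coarse edge does; only the refinement step can improve it.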
Implementation
Our implementation is based on the graph partitioning package METIS [3], which employs a multilevel strategy
To convert the graph partitioning algorithm into a clustering algorithm:
– The optimal clustering might not be balanced, so we ignore the restrictions that control the sizes of the parts.
– The number of parts in the optimal clustering is not known, so we employ a recursive bisection procedure.
– The original graph G might be sparse, while the transformed graph G' is complete, so our algorithm does not explicitly generate G'.
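A toy sketch of recursive bisection that never builds G': the weight of a cut in G' is computed directly as cut size minus the expected cut size $w_a w_b / 2m$ (Chung–Lu model assumed). A split is kept only while its G' cut weight is negative, i.e. only while it increases modularity; this stopping rule is an assumption. The exhaustive `toy_bisect` is only a stand-in for the multilevel bisection and works on tiny graphs only; all names are hypothetical.

```python
from itertools import combinations


def cut_weight(adj, deg, m, side_a, side_b):
    """Weight of the cut (side_a, side_b) in G', computed without building G'.

    Equals cut size - expected cut size, with the Chung-Lu expectation
    w_a * w_b / (2m). Degrees are taken in the whole graph.
    """
    cut = sum(1 for u in side_a for v in adj[u] if v in side_b)
    w_a = sum(deg[v] for v in side_a)
    w_b = sum(deg[v] for v in side_b)
    return cut - w_a * w_b / (2 * m)


def toy_bisect(adj, deg, m, vertices):
    """Stand-in for the multilevel bisection: try every split (tiny graphs only)."""
    vertices = sorted(vertices)
    best = None
    for r in range(1, len(vertices)):
        for chosen in combinations(vertices, r):
            a = set(chosen)
            b = set(vertices) - a
            w = cut_weight(adj, deg, m, a, b)
            if best is None or w < best[0]:
                best = (w, a, b)
    return best


def recursive_clusters(adj):
    """Bisect recursively; keep a split only if its G' cut weight is negative."""
    deg = {v: len(adj[v]) for v in adj}
    m = sum(deg.values()) // 2

    def split(vs):
        if len(vs) < 2:
            return [vs]
        w, a, b = toy_bisect(adj, deg, m, vs)
        if w >= 0:  # no split lowers -Modularity, so vs stays one cluster
            return [vs]
        return split(a) + split(b)

    return split(set(adj))


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(recursive_clusters(adj))  # [{0, 1, 2}, {3, 4, 5}]
```

The number of clusters is not an input: the recursion stops on its own once no bisection of a cluster has negative weight in G'.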
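Modularity: Erdős–Rényi Model
$(-\text{Modularity}) = \text{cut size} - n_1 n_2 p$
$n_1, n_2$: numbers of vertices in the two parts
Erdős–Rényi model: $p_{ij} = p = \frac{2m}{n^2}$
After moving one vertex from part 2 to part 1: $(-\text{Modularity})' = \text{cut size}' - (n_1 + 1)(n_2 - 1)p$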
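Modularity: Chung–Lu Model
$(-\text{Modularity}) = \text{cut size} - \frac{w_1 w_2}{2m}$
After moving a vertex v from part 2 to part 1: $(-\text{Modularity})' = \text{cut size}' - \frac{(w_1 + w(v))(w_2 - w(v))}{2m}$
$w_i$: sum of the degrees of the vertices in part i; $w(v)$: sum of the degrees of the vertices represented by v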
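These formulas let a refinement step evaluate a single vertex move in time proportional to the degree of v. Only $n_1, n_2$ (or $w_1, w_2$ and $w(v)$) are needed, not a pass over all pairs of G'. A minimal sketch with part labels 1 and 2 as on the slides; the function names are hypothetical, and `cut_change` assumes v currently lies in part 2.

```python
def cut_change(adj, part, v):
    """Change in cut size when vertex v moves from part 2 to part 1."""
    to_1 = sum(1 for y in adj[v] if part[y] == 1)  # these edges stop being cut
    to_2 = sum(1 for y in adj[v] if part[y] == 2)  # these edges become cut
    return to_2 - to_1


def delta_er(d_cut, n1, n2, p):
    """Change in (-Modularity), Erdos-Renyi: expected cut n1*n2*p -> (n1+1)*(n2-1)*p."""
    return d_cut - ((n1 + 1) * (n2 - 1) - n1 * n2) * p


def delta_cl(d_cut, w1, w2, wv, m):
    """Change in (-Modularity), Chung-Lu: w1*w2/2m -> (w1+w(v))*(w2-w(v))/2m."""
    return d_cut - ((w1 + wv) * (w2 - wv) - w1 * w2) / (2 * m)


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
part = {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2}
# Move vertex 3 (degree 3) from part 2 to part 1; here w1 = w2 = 7 and m = 7.
print(delta_cl(cut_change(adj, part, 3), 7, 7, 3, 7))
```

The result is 23/14 > 0, so this move would increase (-Modularity) and a refinement step would reject it.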
Analysis
Time complexity: O(n + m)
Experiments:
– Random graphs
– k-community graphs
– nd.edu
Experiment I: Random Graphs
We generated random graphs with 128 vertices and 4 communities of size 32 each
The expected degree of any vertex is 16
The expected out-degree (number of edges from a vertex to vertices outside its community) varies
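A sketch of a generator for such graphs. The parameter values (128 vertices, 4 communities of 32, expected degree 16) come from the slide; the construction with one within-community probability and one between-community probability is an assumption, matching the standard Girvan–Newman benchmark with these numbers. `planted_partition` is a hypothetical name.

```python
import random


def planted_partition(n_groups=4, group_size=32, avg_degree=16, z_out=4, seed=0):
    """Random graph with planted communities.

    Each vertex expects avg_degree neighbours, z_out of them outside its group.
    Returns (edges, community) where community[v] is v's group index.
    """
    rng = random.Random(seed)
    n = n_groups * group_size
    p_in = (avg_degree - z_out) / (group_size - 1)  # within-group edge probability
    p_out = z_out / (n - group_size)  # between-group edge probability
    community = [v // group_size for v in range(n)]
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if community[u] == community[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return edges, community


edges, community = planted_partition(z_out=4, seed=1)
print(len(community), "vertices, average degree", 2 * len(edges) / len(community))
```

Increasing `z_out` toward 12 makes the communities progressively harder to detect, since each vertex then has as many expected neighbours outside its group as inside it.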
Experiment II: k-community graphs
We generated graphs with k communities
Each community has 100 vertices; the expected number of edges inside a community equals the expected number of edges leaving it.
The probability of an edge within a community varies between 0.1 and 0.5.
Results show that the graphs are clustered approximately 99% correctly.
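Assuming edges between communities appear with a single uniform probability $p_{\text{out}}$ (an assumption), the condition above fixes $p_{\text{out}}$ from the within-community probability $p_{\text{in}}$:

$$\mathbb{E}[\text{edges inside}] = p_{\text{in}} \binom{100}{2} = 4950\, p_{\text{in}}, \qquad \mathbb{E}[\text{edges leaving}] = p_{\text{out}} \cdot 100 \cdot 100(k-1)$$
$$4950\, p_{\text{in}} = 10^4 (k-1)\, p_{\text{out}} \;\Longrightarrow\; p_{\text{out}} = \frac{0.495\, p_{\text{in}}}{k-1}$$

For example, $p_{\text{in}} = 0.5$ and $k = 4$ give $p_{\text{out}} = 0.0825$. Since each internal edge has both endpoints in the community, every vertex then expects about twice as many neighbours inside its community (49.5) as outside it (24.75).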
Experiment III: nd.edu
The data set is the complete map of the nd.edu domain, containing 325,729 documents and 1,090,108 links
Our algorithm clusters this graph into 280 clusters with modularity 0.925579
This high modularity indicates a strong community structure in the graph
We show the dendrogram generated by our algorithm.
The sizes of the rectangles are proportional to the sizes of the communities.
Conclusions
– Community structure detection problem
– A scalable algorithm
– Based on multilevel graph partitioning
– Uses modularity as a quality measure