Graph Based Clustering
Summer School
“Achievements and Applications of Contemporary Informatics,
Mathematics and Physics” (AACIMP 2011)
August 8-20, 2011, Kiev, Ukraine
Erik Kropat
University of the Bundeswehr Munich Institute for Theoretical Computer Science,
Mathematics and Operations Research
Neubiberg, Germany
Real World Networks
• Biological Networks
− Gene regulatory networks
− Metabolic networks
− Neural networks
− Food webs
• Technological Networks
− Telecommunication networks
− Internet
− Power grids
(Figures: food web, power grid)
Real World Networks
• Social Networks
− Communication networks
− Organizational networks
− Social media
− Online communities
• Economic Networks
− Financial market networks
− Trade networks
− Collaboration networks
(Figures: social networks, economic networks)
Source: Frank Schweitzer et al., “Economic Networks: The New Challenges,” Science 325, no. 5939 (July 24, 2009): 422-425.
Graph Theory
• Graph theory can provide more detailed information about the inner structure of the data set in terms of
− cliques (subsets of nodes where each pair of elements is connected)
− clusters (highly connected groups of nodes)
− centrality (important nodes, hubs)
− outliers (unimportant nodes)
• Applications
− social network analysis
− diffusion of information
− spreading of diseases or rumours
⇒ marketing campaigns, viral marketing, social network advertising
Graph-Based Clustering
• A collection of widely used clustering algorithms based on graph theory.
• Organizes the information in large data sets so that users can access the required information faster.
Idea
• Objects are represented as nodes in a complete or connected graph.
• Assign a weight to each branch between two nodes x and y.
The weight is defined by the distance d(x,y) between the nodes.
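This construction can be sketched in a few lines. A minimal example, assuming hypothetical 2-D objects and the Euclidean metric as d(x,y); any objects and any distance function would work the same way:

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+)

# Hypothetical objects, represented as points in the plane.
points = {"a": (0, 0), "b": (1, 0), "c": (5, 5), "d": (6, 5)}

# Complete graph: one weighted branch for every pair of nodes,
# with weight d(x, y).
edges = {(x, y): dist(points[x], points[y])
         for x, y in combinations(sorted(points), 2)}

print(edges[("a", "b")])  # weight of branch (a, b) = 1.0
```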
(Figure: clustering based on the distance between clusters, which is derived from the distance between objects)
Idea
(Figure: graph → minimal spanning tree → clusters)
Graph Based Clustering
Hierarchical method
(1) Determine a minimal spanning tree (MST)
(2) Delete branches iteratively
The new connected components form the clusters.
Minimal Spanning Trees
Minimal Spanning Tree
A minimal spanning tree of a connected graph G = (V, E)
is a connected subgraph with minimal total weight
that contains all nodes of G and has no cycles.
(Figure: graph G = (V, E) on nodes a, b, c, d with branch weights 1, 3, 4, 5, 6, 8, and its minimal spanning tree)
Minimal spanning trees can be calculated with...
(1) Prim’s algorithm.
(2) Kruskal’s algorithm.
Example – Prim’s Algorithm
(Figure: example graph on nodes a, b, c, d with branch weights 1, 3, 4, 5, 6, 8)
Set VT = {a}, ET = { }
Choose an edge (x,y) with minimal weight such that x ∈ VT and y ∉ VT.
VT = {a,b} and ET = { (a,b) }.
Example – Prim’s Algorithm
Choose an edge (x,y) with minimal weight such that x ∈ VT and y ∉ VT.
VT = {a,b,d} and ET = { (a,b), (a,d) }.
Choose an edge (x,y) with minimal weight such that x ∈ VT and y ∉ VT.
VT = {a,b,c,d} and ET = { (a,b), (a,d),(b,c) }.
(Figure: resulting minimal spanning tree on nodes a, b, c, d)
Prim’s Algorithm
INPUT: Weighted graph G = (V, E), undirected + connected
OUTPUT: Minimal spanning tree T = (VT, ET)
(1) Set VT = {v}, ET = { }, where v is an arbitrary node from V (starting point).
(2) REPEAT
(3) Choose an edge (a,b) with minimal weight, such that a ∈ VT and b ∉ VT.
(4) Set VT = VT ∪ {b} and ET = ET ∪ { (a,b) }.
(5) UNTIL VT = V
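The steps above can be sketched in Python. The edge-weight assignment in the example graph below is an assumption (the slides give the weights 1, 3, 4, 5, 6, 8 but not which pair each belongs to); it is chosen so that the run reproduces the tree ET = { (a,b), (a,d), (b,c) } from the worked example:

```python
import heapq

def prim_mst(graph, start):
    """Prim's algorithm: grow a tree from `start`, always adding the
    cheapest edge (a, b) with a in VT and b not in VT.
    `graph` maps node -> {neighbour: weight}, undirected and connected."""
    VT, ET = {start}, []
    # Priority queue of candidate edges (weight, a, b) with a in VT.
    pq = [(w, start, b) for b, w in graph[start].items()]
    heapq.heapify(pq)
    while pq and len(VT) < len(graph):
        w, a, b = heapq.heappop(pq)
        if b in VT:            # this edge no longer leaves the tree
            continue
        VT.add(b)
        ET.append((a, b, w))
        for c, wc in graph[b].items():
            if c not in VT:
                heapq.heappush(pq, (wc, b, c))
    return ET

# Assumed weight assignment for the example graph on a, b, c, d.
G = {"a": {"b": 1, "c": 5, "d": 3},
     "b": {"a": 1, "c": 4, "d": 8},
     "c": {"a": 5, "b": 4, "d": 6},
     "d": {"a": 3, "b": 8, "c": 6}}
print(prim_mst(G, "a"))  # [('a', 'b', 1), ('a', 'd', 3), ('b', 'c', 4)]
```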
Kruskal’s Algorithm
INPUT: Weighted graph G = (V, E), undirected + connected
OUTPUT: Minimal spanning tree T = (VT, ET)
(1) Set VT = V, ET = { }, H = E.
(2) Initialize a queue to contain all edges in G, using the weights in ascending order as keys.
(3) WHILE H ≠ { }
(4) Choose an edge e ∈ H with minimal weight.
(5) Set H = H \ {e}.
(6) If (VT, ET ∪ {e}) has no cycles, then ET = ET ∪ {e} .
(7) END
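A sketch of Kruskal's algorithm in Python; the cycle test of step (6) is implemented with a union-find structure over the components, which is the standard choice. The example graph is the same assumed weight assignment used for Prim's algorithm above:

```python
def kruskal_mst(nodes, edges):
    """Kruskal's algorithm: scan the edges by ascending weight and keep
    an edge iff it connects two different components (i.e. adding it to
    (VT, ET) creates no cycle). Union-find tracks the components."""
    parent = {v: v for v in nodes}

    def find(v):                         # root of v's component
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    ET = []
    for a, b, w in sorted(edges, key=lambda e: e[2]):  # the ascending queue
        ra, rb = find(a), find(b)
        if ra != rb:                     # no cycle: keep the edge
            parent[ra] = rb
            ET.append((a, b, w))
    return ET

# Assumed example graph on nodes a, b, c, d.
E = [("a", "b", 1), ("a", "c", 5), ("a", "d", 3),
     ("b", "c", 4), ("b", "d", 8), ("c", "d", 6)]
print(kruskal_mst("abcd", E))  # [('a', 'b', 1), ('a', 'd', 3), ('b', 'c', 4)]
```

Both algorithms return a tree of the same total weight; Kruskal's version needs no starting node.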
Branch Deletion
Delete Branches - Different Strategies
(1) Delete the branch with maximum weight.
(2) Delete inconsistent branches.
(3) Delete by analysis of weights.
(1) Delete the branch with maximum weight
• In each step, create two new clusters by deleting the branch with maximum weight.
• Repeat until the given number of clusters is reached.
(Figure: minimum spanning tree with branch weights 2, 2, 2, 2, 2, 3, 4, 6)
Ordered weights of branches: 6, 4, 3, 2, 2, 2, 2, 2.
Example: Delete the branch with maximum weight
Ordered weights of branches: 6, 4, 3, 2, 2, 2, 2, 2.
Step 1: Delete the branch with weight 6 ⇒ 2 clusters
Step 2: Delete the branch with weight 4 ⇒ 3 clusters
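Strategy (1) can be sketched as follows. Since each deletion splits one cluster into two, deleting the k−1 heaviest branches of the MST yields k clusters. The chain-shaped MST below is an assumption; the slides give only the weights 6, 4, 3, 2, 2, 2, 2, 2, not the tree's shape:

```python
def mst_clusters(nodes, mst_edges, k):
    """Strategy (1): delete the k-1 branches of maximum weight from the
    MST; the resulting connected components are the k clusters."""
    keep = sorted(mst_edges, key=lambda e: e[2])[: len(mst_edges) - (k - 1)]
    # Adjacency of the remaining forest.
    neigh = {v: set() for v in nodes}
    for a, b, _ in keep:
        neigh[a].add(b)
        neigh[b].add(a)
    # Collect the connected components by depth-first search.
    seen, clusters = set(), []
    for v in nodes:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(neigh[u] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters

# Assumed chain-shaped MST carrying the slide's weights.
mst = [("a", "b", 2), ("b", "c", 2), ("c", "d", 6), ("d", "e", 3),
       ("e", "f", 4), ("f", "g", 2), ("g", "h", 2), ("h", "i", 2)]
print(mst_clusters("abcdefghi", mst, 3))  # deletes the branches with weights 6 and 4
```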
(2) Delete inconsistent branches
• A branch e is inconsistent if the corresponding weight d_e
is (much) larger than a reference value d̄_e.
• The reference value d̄_e can be defined as the average weight of all branches adjacent to e.
Example: If the branches adjacent to e have weights 3, 2 and 1, then
d̄_e = (3 + 2 + 1) / 3 = 2.
Since d_e = 6 > 2 = d̄_e, the branch e is inconsistent.
(3) Delete by analysis of weights
• Perform an “analysis” of all branch weights in the MST and determine a threshold S.
• The threshold can be estimated from a histogram of the branch weights (= lengths of branches).
• Delete a branch if its weight is higher than the threshold S.
(Figure: histogram of branch weights with threshold S)
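One simple way to pick such a threshold automatically, standing in for the visual histogram analysis, is to place S in the largest gap of the sorted branch weights; this is a sketch of one plausible heuristic, not the only choice:

```python
def threshold_from_gap(weights):
    """Strategy (3), crude variant: put the threshold S in the middle of
    the widest gap between consecutive sorted branch weights."""
    ws = sorted(weights)
    gaps = [(b - a, a, b) for a, b in zip(ws, ws[1:])]
    _, lo, hi = max(gaps)        # widest gap between consecutive weights
    return (lo + hi) / 2

# Branch weights from the earlier example MST.
weights = [2, 2, 2, 2, 2, 3, 4, 6]
S = threshold_from_gap(weights)
print(S)  # 5.0 -> only the branch with weight 6 is deleted
```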
Exercise
Find a minimal spanning tree and provide a clustering of the graph by deleting all inconsistent branches.
(Figure: weighted graph on nodes a–g with branch weights 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 20)
Example
Set VT = {a}, ET = { }.
Repeatedly choose an edge (x,y) with minimal weight such that x ∈ VT and y ∉ VT, until VT = V.
(Figure: resulting minimal spanning tree)
Example
For each branch calculate the reference value
(average weight of adjacent branches)
(Figure: minimal spanning tree on nodes a–g with branch weights 1, 2, 3, 4, 5, 6 and reference values 3, 3, 4.5, 3.6, 5, 4 in parentheses)
Example
Delete inconsistent branches
(weight is larger than the reference value)
(Figure: remaining branches after deleting the inconsistent branches ⇒ 2 clusters)
Noise?
Summary
• In graph-based clustering, objects are represented as nodes in a complete or connected graph.
• The distance between two objects is given by the weight of the corresponding branch.
• Hierarchical method
(1) Determine a minimal spanning tree (MST)
(2) Delete branches iteratively
• Visualization of information in large datasets.
Literature
• V. Kumar, M. Steinbach, P.-N. Tan: Introduction to Data Mining. Addison Wesley, 2005.
• J.A. Dunne, R.J. Williams, N.D. Martinez, R.A. Wood, D.H. Erwin: Compilation and Network Analyses of Cambrian Food Webs. PLoS Biol 6(4): e102. doi:10.1371/journal.pbio.0060102
• F. Schweitzer, G. Fagiolo, D. Sornette, F. Vega-Redondo, A. Vespignani, D.R. White: Economic Networks: The New Challenges. Science 325, no. 5939 (July 24, 2009): 422–425.
Other work mentioned in the presentation.
Thank you very much!