Community structure in complex networks
V.A. Traag
KITLV, Leiden, the Netherlands
e-Humanities, KNAW (Royal Netherlands Academy of Arts and Sciences), Amsterdam, the Netherlands
February 21, 2014
Overview
1 What are communities in networks? How do we find them?
2 Where are those small communities?
3 When are communities significant?
4 What should I remember? And what’s next?
Part I: What are communities? How do we find them?
What is a community?
• Everybody has an intuitive idea.
• Yet there is no single, agreed-upon definition.
• Common core:
Groups of nodes that are
  - relatively densely connected within, and
  - relatively sparsely connected between groups.
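General community detection
• Reward links inside a community with weight $a_{ij}$.
• Punish missing links inside a community with weight $b_{ij}$.
• General quality function:
$$H = \sum_{ij} \left(A_{ij} a_{ij} - (1 - A_{ij}) b_{ij}\right) \delta(\sigma_i, \sigma_j),$$
where $A_{ij}$ is the adjacency matrix, $\sigma_i$ is the community of node $i$, and $\delta(\sigma_i, \sigma_j) = 1$ if $i$ and $j$ are in the same community (0 otherwise).
[Figure: example network with nodes labelled 0 to 11]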
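The general quality function can be transcribed directly, which makes it easy to plug in different weights. This is a minimal, unoptimised sketch, assuming a 0/1 adjacency matrix and weights given as functions of $(i, j)$; the names `quality`, `cpm_a`, `cpm_b`, `rn_a` and `rn_b` are hypothetical. The sum runs over all ordered pairs, including $i = j$, exactly as written in $H$.

```python
from itertools import product


def quality(A, sigma, a, b):
    """H = sum_ij (A_ij a_ij - (1 - A_ij) b_ij) delta(sigma_i, sigma_j).

    A: 0/1 adjacency matrix (list of lists), sigma: community label per node,
    a, b: functions (i, j) -> weight for present and missing links.
    """
    n = len(A)
    return sum(
        A[i][j] * a(i, j) - (1 - A[i][j]) * b(i, j)
        for i, j in product(range(n), repeat=2)
        if sigma[i] == sigma[j]  # only pairs inside the same community count
    )


gamma = 0.5


def cpm_a(i, j):  # Constant Potts Model: reward 1 - gamma for a present link
    return 1 - gamma


def cpm_b(i, j):  # ... and penalty gamma for a missing link
    return gamma


def rn_a(i, j):   # Ronhovde & Nussinov: reward 1 for a present link
    return 1


def rn_b(i, j):   # ... and penalty gamma for a missing link
    return gamma


# Two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1
sigma = [0, 0, 0, 1, 1, 1]
print(quality(A, sigma, cpm_a, cpm_b), quality(A, [0] * 6, cpm_a, cpm_b))
```

With CPM weights and $\gamma = 0.5$, the two-triangle partition scores 3.0 and the single community scores -4.0, so the split is preferred.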
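Different weights
No a-priori constraints on the weights $a_{ij}$, $b_{ij}$.
Model | $a_{ij}$ | $b_{ij}$
Reichardt & Bornholdt | $1 - b_{ij}$ | $\gamma p_{ij}$
Arenas, Fernandez & Gomez | $1 - b_{ij}$ | $p_{ij}(\gamma) - \gamma\delta_{ij}$
Ronhovde & Nussinov | $1$ | $\gamma$
Constant Potts Model | $1 - \gamma$ | $\gamma$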
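Substituting the table's weights into the general quality function shows how the models relate. This short derivation is only algebra on the formulas of the previous slides; whenever $a_{ij} = 1 - b_{ij}$ (Reichardt & Bornholdt, Arenas et al., Constant Potts Model), the reward and the penalty collapse into a single term:

```latex
A_{ij} a_{ij} - (1 - A_{ij}) b_{ij} = A_{ij}(1 - b_{ij}) - (1 - A_{ij}) b_{ij} = A_{ij} - b_{ij},
\qquad\text{so}\qquad
H = \sum_{ij} \left(A_{ij} - b_{ij}\right)\delta(\sigma_i, \sigma_j).

% Reichardt & Bornholdt:  H = \sum_{ij} (A_{ij} - \gamma p_{ij}) \delta(\sigma_i, \sigma_j)
% Constant Potts Model:   H = \sum_{ij} (A_{ij} - \gamma)      \delta(\sigma_i, \sigma_j)
% Ronhovde & Nussinov (a_{ij} = 1, b_{ij} = \gamma):
%                         H = \sum_{ij} \left((1 + \gamma) A_{ij} - \gamma\right) \delta(\sigma_i, \sigma_j)
```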
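Modularity
• Null model $p_{ij}$, with the constraint $\sum_{ij} p_{ij} = 2m$, where $m$ is the number of edges.
• A popular null model is the configuration model, $p_{ij} = \frac{k_i k_j}{2m}$, with $k_i$ the degree of node $i$.
• With Reichardt & Bornholdt weights and $\gamma = 1$, this leads to modularity:
$$Q = \sum_{ij} \left(A_{ij} - \frac{k_i k_j}{2m}\right) \delta(\sigma_i, \sigma_j).$$
• As a sum over communities:
$$Q = 2\sum_c \left(e_c - \langle e_c \rangle\right),$$
where $e_c$ is the number of edges inside community $c$ and $\langle e_c \rangle = K_c^2/4m$ is its expected value under the configuration model, with $K_c = \sum_{i \in c} k_i$. The factor 2 appears because $\sum_{ij}$ counts every pair twice; dividing $Q$ by $2m$ gives the usual normalized modularity.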
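The two forms of $Q$ can be checked against each other on a small graph. A minimal sketch, assuming an unweighted edge list without self-loops and a dict mapping each node to its community; the function names are hypothetical. Both functions compute the unnormalized $Q$ used on this slide.

```python
from collections import defaultdict


def modularity_pairs(edges, membership):
    """Q = sum_ij (A_ij - k_i k_j / 2m) delta(sigma_i, sigma_j) over ordered pairs (i, j)."""
    degree = defaultdict(int)
    adjacency = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adjacency[(u, v)] += 1  # store both directions: A is symmetric
        adjacency[(v, u)] += 1
    two_m = 2 * len(edges)
    return sum(
        adjacency.get((i, j), 0) - degree[i] * degree[j] / two_m
        for i in degree
        for j in degree
        if membership[i] == membership[j]
    )


def modularity_communities(edges, membership):
    """Q = 2 sum_c (e_c - <e_c>) with <e_c> = K_c^2 / 4m."""
    m = len(edges)
    internal = defaultdict(int)  # e_c: edges inside community c
    total = defaultdict(int)     # K_c: summed degree of community c
    for u, v in edges:
        total[membership[u]] += 1
        total[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return 2 * sum(internal[c] - total[c] ** 2 / (4 * m) for c in total)


# Two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
q = modularity_pairs(edges, membership)
print(q, modularity_communities(edges, membership), q / (2 * len(edges)))
```

Both forms give $Q = 5$ here; dividing by $2m = 14$ gives the familiar normalized value $5/14 \approx 0.357$.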
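Optimising modularity
Initial communities
Move node 0
Move node 5
Move node 11
No more improvement
[Figure: node moves on the example network, followed by the aggregated graph in which each community becomes a single node with weighted links]
Aggregate graph, and repeat the same procedure.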
Louvain algorithm
1 Move node i to the community that improves the quality most (greedy).
2 Repeat (1) until there is no more improvement.
3 Contract the graph (communities → nodes).
4 Repeat (1)-(3) until there is no more improvement.
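The four steps can be sketched in plain Python for the Reichardt & Bornholdt quality with the configuration model ($\gamma = 1$ gives modularity). This is an illustrative sketch under stated assumptions, not the reference Louvain implementation: all function names are hypothetical, the graph is unweighted on input, and nodes are visited in a seeded random order. For real use, established implementations exist, such as networkx's `louvain_communities` or igraph's `Graph.community_multilevel`.

```python
import random
from collections import defaultdict


def from_edges(edges):
    """Symmetric weighted adjacency dict; a self-loop adds 2 to A_uu (matrix convention)."""
    adj = defaultdict(dict)
    for u, v in edges:
        adj[u][v] = adj[u].get(v, 0.0) + 1.0
        adj[v][u] = adj[v].get(u, 0.0) + 1.0
    return dict(adj)


def local_moving(adj, gamma, rng):
    """Steps 1-2: move single nodes greedily until no move improves the quality."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    community = {u: u for u in adj}   # start from singletons
    total = dict(degree)              # K_C: summed degree of each community
    moved_any, improved = False, True
    while improved:
        improved = False
        order = list(adj)
        rng.shuffle(order)
        for i in order:
            old = community[i]
            total[old] -= degree[i]   # temporarily take i out of its community
            links = defaultdict(float)  # k_{i,C}: link weight from i into community C
            for j, w in adj[i].items():
                if j != i:
                    links[community[j]] += w
            # Gain (up to a constant factor) of putting i into C: k_{i,C} - gamma k_i K_C / 2m.
            best, best_gain = old, links[old] - gamma * degree[i] * total[old] / two_m
            for c, k_ic in links.items():
                gain = k_ic - gamma * degree[i] * total[c] / two_m
                if gain > best_gain + 1e-12:  # strict improvement only, so the loop ends
                    best, best_gain = c, gain
            total[best] += degree[i]
            if best != old:
                community[i] = best
                moved_any = improved = True
    return community, moved_any


def aggregate(adj, community):
    """Step 3: contract communities into nodes, summing link weights A_CD = sum A_uv."""
    new_adj = defaultdict(lambda: defaultdict(float))
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            new_adj[community[u]][community[v]] += w
    return {c: dict(nbrs) for c, nbrs in new_adj.items()}


def louvain(edges, gamma=1.0, seed=0):
    """Step 4: repeat local moving and aggregation until a level makes no move."""
    rng = random.Random(seed)
    adj = from_edges(edges)
    membership = {u: u for u in adj}  # original node -> node of the current graph
    while True:
        community, moved = local_moving(adj, gamma, rng)
        if not moved:
            return membership
        membership = {u: community[c] for u, c in membership.items()}
        adj = aggregate(adj, community)


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = louvain(edges)
print(membership)
```

On two triangles joined by one edge, the first level recovers the two triangles; in the contracted two-node graph no further move helps, so the procedure stops.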
Part II: Where are those small communities?
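Resolution limit
• Modularity may miss ‘small’ communities.
• In a ring of $q$ cliques of $n_c$ nodes each, with neighbouring cliques joined by a single link, two neighbouring cliques are merged when
$$\gamma_{RB} < \frac{q}{n_c(n_c - 1) + 2}.$$
• The number of communities scales as $\sqrt{\gamma_{RB}\, m}$.
• For a general null model the problem remains, since $\sum_{ij} p_{ij} = 2m$.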
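The merge condition for the ring of cliques can be checked numerically. A minimal sketch, assuming each clique's last node links to the next clique's first node and that $q$ is even so the cliques pair up exactly; `ring_of_cliques`, `rb_quality` and `merge_gain` are hypothetical names. With the ordered-pair convention of $H$, the gain from merging all cliques in pairs works out to $q - \gamma_{RB}(n_c(n_c - 1) + 2)$, which is positive exactly when the condition above holds.

```python
from collections import defaultdict


def ring_of_cliques(q, n_c):
    """Edge list of q cliques of n_c nodes; consecutive cliques share one link."""
    edges = []
    for c in range(q):
        base = c * n_c
        edges += [(base + a, base + b) for a in range(n_c) for b in range(a + 1, n_c)]
        edges.append((base + n_c - 1, ((c + 1) % q) * n_c))  # link to the next clique
    return edges


def rb_quality(edges, membership, gamma):
    """Ordered-pair RB quality with the configuration model: sum_c 2 e_c - gamma K_c^2 / 2m."""
    two_m = 2 * len(edges)
    internal, total = defaultdict(int), defaultdict(int)
    for u, v in edges:
        total[membership[u]] += 1
        total[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(2 * internal[c] - gamma * total[c] ** 2 / two_m for c in total)


def merge_gain(q, n_c, gamma):
    """Quality with cliques merged in neighbouring pairs minus quality with separate cliques."""
    edges = ring_of_cliques(q, n_c)
    n = q * n_c
    cliques = {v: v // n_c for v in range(n)}
    pairs = {v: v // (2 * n_c) for v in range(n)}  # cliques 0+1, 2+3, ...
    return rb_quality(edges, pairs, gamma) - rb_quality(edges, cliques, gamma)


n_c = 5
for q in (10, 30):
    threshold = q / (n_c * (n_c - 1) + 2)
    print(q, round(merge_gain(q, n_c, 1.0), 6), "merge" if 1.0 < threshold else "keep")
```

For 5-cliques, modularity ($\gamma_{RB} = 1$) keeps the cliques separate in a ring of 10, but merges them in pairs in a ring of 30, although the cliques themselves are identical.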
Resolution-limit-free
• Ronhovde & Nussinov (RN) model ($a_{ij} = 1$, $b_{ij} = \gamma$).
• Claim: resolution-limit-free, as the merge condition depends only on ‘local’ variables:
$$\gamma_{RN} < \frac{1}{n_c^2 - 1}.$$
• But if we take $p_{ij} = k_i k_j$ (without the factor $1/2m$), we obtain
$$\gamma_{RB} < \frac{1}{2\left(n_c(n_c - 1) + 2\right)^2},$$
which also depends only on ‘local’ variables. Hence, also resolution-limit-free?
• Problems of scale remain.
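The same ring-of-cliques check with RN weights shows why the RN merge condition is called ‘local’. A minimal sketch under the same assumptions as a ring of cliques with single links between neighbours and even $q$; the names are hypothetical, and the constant diagonal terms of $H$ are left out because they do not depend on the partition. The gain from merging cliques in pairs is $q(1 - \gamma_{RN}(n_c^2 - 1))$, so its sign does not depend on $q$.

```python
from collections import defaultdict


def ring_of_cliques(q, n_c):
    """Edge list of q cliques of n_c nodes; consecutive cliques share one link."""
    edges = []
    for c in range(q):
        base = c * n_c
        edges += [(base + a, base + b) for a in range(n_c) for b in range(a + 1, n_c)]
        edges.append((base + n_c - 1, ((c + 1) % q) * n_c))
    return edges


def rn_quality(edges, membership, gamma):
    """RN quality over ordered pairs i != j: sum_c [2 e_c - gamma (n_c (n_c - 1) - 2 e_c)]."""
    size, internal = defaultdict(int), defaultdict(int)
    for c in membership.values():
        size[c] += 1
    for u, v in edges:
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(
        2 * internal[c] - gamma * (size[c] * (size[c] - 1) - 2 * internal[c])
        for c in size
    )


def rn_merge_gain(q, n_c, gamma):
    """RN quality with cliques merged in neighbouring pairs minus that with separate cliques."""
    edges = ring_of_cliques(q, n_c)
    n = q * n_c
    cliques = {v: v // n_c for v in range(n)}
    pairs = {v: v // (2 * n_c) for v in range(n)}
    return rn_quality(edges, pairs, gamma) - rn_quality(edges, cliques, gamma)


for q in (10, 30):  # threshold for n_c = 5 is 1 / 24, about 0.0417
    print(q, rn_merge_gain(q, 5, 0.04), rn_merge_gain(q, 5, 0.05))
```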
[Figures: Resolution limit; Resolution-limit-free]
Defining resolution-limit-free
Definition (Resolution-limit-free)
An objective function $H$ is called resolution-limit-free if, whenever a partition $C$ is optimal for $G$, every subpartition $D \subset C$ is also optimal for the subgraph $H(D) \subset G$ induced by $D$.
Theorem (Swap optimal subpartitions)
If $C$ is optimal and contains the subpartition $D$, we can replace $D$ by another optimal subpartition $D'$ of $H(D)$.
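On a tiny graph the definition can be checked by brute force: find the optimal partition, then verify that every subpartition is optimal for its induced subgraph. A minimal sketch, assuming the Constant Potts Model written over unordered pairs as $\sum_c (e_c - \gamma \binom{n_c}{2})$, which differs from $H$ only by a factor 2 and a constant; all names are hypothetical. This checks one example, it is not a proof, and enumerating all partitions is only feasible for a handful of nodes.

```python
from itertools import combinations


def set_partitions(items):
    """Yield every partition of a list of items (Bell-number many)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for k in range(len(partition)):  # add `first` to an existing block
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        yield [[first]] + partition       # or give it its own block


def cpm_quality(edges, partition, gamma):
    """CPM over unordered pairs: sum_c (e_c - gamma * n_c (n_c - 1) / 2)."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        sum(frozenset(pair) in edge_set for pair in combinations(comm, 2))
        - gamma * len(comm) * (len(comm) - 1) / 2
        for comm in partition
    )


def best_partition(nodes, edges, gamma):
    return max(set_partitions(list(nodes)), key=lambda p: cpm_quality(edges, p, gamma))


def subpartitions_stay_optimal(nodes, edges, gamma):
    """True if every subpartition D of the optimal C is optimal on the subgraph H(D)."""
    best = best_partition(nodes, edges, gamma)
    for r in range(1, len(best) + 1):
        for sub in combinations(best, r):
            sub_nodes = {v for comm in sub for v in comm}
            sub_edges = [(u, v) for u, v in edges if u in sub_nodes and v in sub_nodes]
            optimum = cpm_quality(sub_edges, best_partition(sub_nodes, sub_edges, gamma), gamma)
            if cpm_quality(sub_edges, list(sub), gamma) < optimum - 1e-9:
                return False
    return True


# Two 4-cliques joined by the edge (3, 4).
edges = [pair for block in (range(4), range(4, 8)) for pair in combinations(block, 2)]
edges.append((3, 4))
print(best_partition(range(8), edges, 0.5), subpartitions_stay_optimal(range(8), edges, 0.5))
```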
What methods are resolution-limit-free?
Resolution-limit-free methods
• RN and CPM can easily be proven to be resolution-limit-free.
• What about other weights $a_{ij}$ and $b_{ij}$?
Definition (Local weights)
Weights $a_{ij}$, $b_{ij}$ are called local whenever, for every subgraph $H \subset G$, the weights stay the same up to a common factor, i.e. $a_{ij}(G) = \lambda(H) a_{ij}(H)$ and $b_{ij}(G) = \lambda(H) b_{ij}(H)$.
Theorem (Local weights ⇒ resolution-limit-free)
An objective function $H$ is resolution-limit-free if its weights are local.
The converse is not true: a small perturbation of the weights (i.e. a non-local weight) will not change the optimal partition. But there are very few such exceptions.
Local methods are resolution-limit-free.
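Checking the locality definition directly for two of the models makes the contrast concrete. This worked comparison assumes the resolution parameter $\gamma$ is the same in $G$ and in the subgraph $H$, and writes $k_i(G)$, $m_G$ for degrees and edge counts measured in $G$ (likewise for $H$):

```latex
% CPM: the weights do not depend on the graph at all, so lambda(H) = 1.
a_{ij}(G) = 1 - \gamma = a_{ij}(H), \qquad b_{ij}(G) = \gamma = b_{ij}(H).

% Reichardt & Bornholdt with the configuration model:
b_{ij}(G) = \gamma \frac{k_i(G)\, k_j(G)}{2 m_G}, \qquad
b_{ij}(H) = \gamma \frac{k_i(H)\, k_j(H)}{2 m_H},
\qquad\text{so}\qquad
\frac{b_{ij}(G)}{b_{ij}(H)} = \frac{m_H}{m_G} \cdot \frac{k_i(G)\, k_j(G)}{k_i(H)\, k_j(H)}.
% This ratio must be one lambda(H) for all pairs i, j in H, which fails as soon as
% nodes of H lose different fractions of their degree when H is cut out of G.
```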
Part III: When are communities significant?
Modularity in non-modular graphs
Modularity as sign of community structure
• Normalized modularity lies in the range $-1 \le Q \le 1$.
• High modularity ⇒ community structure?
• Modularity higher than 0.3 is commonly seen as significant.
Many graphs have high modularity, but no community structure.
Modularity without community structure
[Figure: a partition of a graph without community structure, with Q = 0.31]
• Modularity $Q \not\approx 0$ for random graphs.
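Random graphs are not the only case: a graph in which every node is equivalent can still reach high modularity. A minimal sketch, assuming a simple cycle of $n$ nodes cut into $k$ equal contiguous arcs; `cycle_modularity` is a hypothetical name. Each arc has $n/k - 1$ internal edges and summed degree $2n/k$, so the normalized modularity is $1 - k/n - 1/k$.

```python
def cycle_modularity(n, k):
    """Normalized modularity of an n-node cycle cut into k equal contiguous arcs."""
    if n % k != 0:
        raise ValueError("k must divide n")
    m = n                                     # a cycle has as many edges as nodes
    edges = [(i, (i + 1) % n) for i in range(n)]
    community = {i: i // (n // k) for i in range(n)}
    internal = [0] * k                        # e_c
    degree_sum = [0] * k                      # K_c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in range(k))


for k in (2, 5, 10, 20):
    print(k, round(cycle_modularity(100, k), 4), 1 - k / 100 - 1 / k)
```

Cutting a 100-node cycle into 10 arcs already gives $Q = 0.8$, far above the usual 0.3 rule of thumb.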
Significance
How significant is a partition?
Significance
[Figure: example partitions with E = 14, E = 9 and E = 11 internal edges; panels labelled “Fixed partition” and “Better partition”]
• Not: the probability of finding $E$ edges within a fixed partition.
• But: the probability of finding some partition with $E$ edges.
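Subgraph probability
Decompose partition
• Probability to find a partition with $E$ edges.
• Probability to find communities with $e_c$ edges.
• Asymptotic estimate
• Probability for a subgraph of $n_c$ nodes with density $p_c$ in a random graph $G(n, p)$:
$$\Pr(S(n_c, p_c) \subseteq G(n, p)) \approx \exp\left[-n_c^2 D(p_c \parallel p)\right],$$
where $D(p_c \parallel p) = p_c \log\frac{p_c}{p} + (1 - p_c)\log\frac{1 - p_c}{1 - p}$ is the Kullback–Leibler divergence between Bernoulli distributions with parameters $p_c$ and $p$.
Significance
• Probability for all communities: $\Pr(\sigma) \approx \prod_c \exp\left[-n_c^2 D(p_c \parallel p)\right]$.
• Significance: $S(\sigma) = -\log \Pr(\sigma) = \sum_c n_c^2 D(p_c \parallel p)$.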
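The significance formula translates into a few lines of code. A minimal sketch, assuming $p$ is the overall edge density $m / \binom{n}{2}$, $p_c = e_c / \binom{n_c}{2}$ is the internal density of community $c$, and singleton communities (whose density is undefined) are skipped; `kl_bernoulli` and `significance` are hypothetical names.

```python
import math
from collections import defaultdict


def kl_bernoulli(q, p):
    """D(q || p) = q log(q/p) + (1 - q) log((1 - q)/(1 - p)), using 0 log 0 = 0."""
    d = 0.0
    if q > 0:
        d += q * math.log(q / p)
    if q < 1:
        d += (1 - q) * math.log((1 - q) / (1 - p))
    return d


def significance(edges, membership):
    """S(sigma) = sum_c n_c^2 D(p_c || p) for an unweighted, undirected graph."""
    n = len(membership)
    p = len(edges) / (n * (n - 1) / 2)  # overall density of the graph
    size, internal = defaultdict(int), defaultdict(int)
    for c in membership.values():
        size[c] += 1
    for u, v in edges:
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    s = 0.0
    for c, n_c in size.items():
        if n_c < 2:
            continue  # assumption: singletons contribute nothing
        p_c = internal[c] / (n_c * (n_c - 1) / 2)
        s += n_c ** 2 * kl_bernoulli(p_c, p)
    return s


# Two triangles joined by one edge: each triangle has density 1, the graph 7/15.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(significance(edges, membership), significance(edges, {v: 0 for v in range(6)}))
```

Placing the whole graph in one community gives $p_c = p$ and hence $S = 0$, while the two dense triangles give $S = 18 \log(15/7)$.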
Significance
[Figure: N, E and S plotted against γ on logarithmic axes, γ from 10^−3 to 10^0]
Final Chapter: What should I remember? And what’s next?
Conclusions
To remember
• Modularity can hide small communities.
• Local methods avoid this problem (RN, CPM).
• High modularity ⇏ significant: use significance.
What’s next?
• Various measures of significance: what’s the difference?
• Choose “correct” resolution ⇒ resolution limit?
Thank you! Questions?
Traag, Van Dooren & Nesterov. Narrow scope for resolution-limit-free community detection. Phys Rev E 84, 016114 (2011).
Traag, Krings & Van Dooren. Significant scales in community structure. Sci Rep 3, 2930 (2013).
Reichardt & Bornholdt. Statistical mechanics of community detection. Phys Rev E 74, 016110 (2006).
www.traag.net | [email protected] | @vtraag