Community Structure in Large Complex Networks

Community Structure in Large Complex Networks Liaoruo Wang and John E. Hopcroft Dept. of Computer Engineering & Computer Science, Cornell University In Proc. 7th Annual Conference on Theory and Applications of Models of Computation (TAMC), June 2010 Presented by Nam Nguyen


Transcript of Community Structure in Large Complex Networks

Page 1: Community Structure in Large Complex Networks

Community Structure in Large Complex Networks

Liaoruo Wang and John E. Hopcroft
Dept. of Computer Engineering & Computer Science, Cornell University

In Proc. 7th Annual Conference on Theory and Applications of Models of Computation (TAMC), June 2010

Presented by Nam Nguyen

Page 2: Community Structure in Large Complex Networks

Motivation
Introduction
Contributions of the paper
Definitions
WHISKER is NP-Complete
Algorithms

Agenda

Page 3: Community Structure in Large Complex Networks

Community structure (C.S.) is a classical but still active topic in complex networks.

Previous studies: Communities were assumed to be densely connected inside but sparsely connected outside.

A different point of view: We should disregard “whiskers” and focus on the “cores” of the network.

Motivation

Page 4: Community Structure in Large Complex Networks

Roughly speaking
◦ Whiskers: Subsets of vertices that are barely connected to the rest of the network.
◦ Cores: Connected subgraphs that are densely connected inside and well connected to the rest of the network, i.e., “real communities”.

Why?
◦ In real-world societies, communities are also well connected to the rest of the network.
◦ Imagine a close-knit community, CISE Dept., with only one connection to the outside world.

Precise definitions follow.

Introduction

Page 5: Community Structure in Large Complex Networks

More concrete definitions of “whiskers” and “cores” in a network.

WHISKER is NP-Complete.

Three heuristic algorithms for finding approximate cores.

Simulation results.

Contributions

Page 6: Community Structure in Large Complex Networks

Graph G = (V, E) undirected, with adjacency matrix A = (A_ij). For S ⊆ V, let S^c = V \ S.

Conductance of S

A suitable cut

Definition
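As a minimal sketch, the conductance of a vertex set can be computed from the adjacency matrix as shown here. The slide's own formula is not preserved in this transcript, so the code assumes the standard definition, cut(S, S^c) / min(vol(S), vol(S^c)), where vol sums the degrees of the vertices in a set. The names `conductance`, `adj`, and `subset` are hypothetical.

```python
def conductance(adj, subset):
    """Conductance of `subset` in an undirected graph.

    adj    -- symmetric adjacency matrix as a list of lists (weights allowed)
    subset -- iterable of vertex indices forming S
    Assumes the standard definition cut(S, S^c) / min(vol(S), vol(S^c)).
    """
    n = len(adj)
    s = set(subset)
    comp = [v for v in range(n) if v not in s]
    # Total weight of edges crossing from S to its complement.
    cut = sum(adj[i][j] for i in s for j in comp)
    # Volume = sum of (weighted) degrees of the vertices in the set.
    vol_s = sum(adj[i][j] for i in s for j in range(n))
    vol_c = sum(adj[i][j] for i in comp for j in range(n))
    denom = min(vol_s, vol_c)
    if denom == 0:
        raise ValueError("conductance is undefined when a side has zero volume")
    return cut / denom


# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

phi = conductance(adj, {0, 1, 2})
```

In this toy graph one triangle has a single cut edge and volume 7, so its conductance is low; a set that is "well connected to the rest of the network" in the sense of the slides would score higher.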

Page 7: Community Structure in Large Complex Networks

A k-whisker

A maximal k-whisker

Definition (cont’d)

Page 8: Community Structure in Large Complex Networks

A whisker

A maximal whisker

Definition (cont’d)

Page 9: Community Structure in Large Complex Networks

A core

Definition (cont’d)

Page 10: Community Structure in Large Complex Networks

Lemmas

Proof

The only suitable cut has size 26, while |S ∪ T| = 25 (26 > 25).

Page 11: Community Structure in Large Complex Networks

Lemmas (cont’d)

Proof

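(1a) $e_{xr} + e_{xz} + e_{yr} + e_{yz} \le v_x + v_y$

(1b) $e_{yr} + e_{xy} + e_{zr} + e_{xz} \le v_y + v_z$

(1c) $e_{xr} + e_{yr} + e_{zr} > v_x + v_y + v_z$

Adding (1a) and (1b), then using (1c), gives
$e_{xr} + 2e_{yr} + e_{zr} + e_{xy} + e_{yz} + 2e_{xz} \le v_x + 2v_y + v_z < e_{xr} + e_{yr} + e_{zr} + v_y$

Cancelling the common terms and dropping $2e_{xz} \ge 0$ leaves
$e_{yr} + e_{xy} + e_{yz} < v_y$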

Page 12: Community Structure in Large Complex Networks

NAE-3-SAT: The problem of determining whether there exists a truth assignment for a 3-CNF Boolean formula such that each clause has at least one true literal and at least one false literal.

Fact: NAE-3-SAT is NP-Complete [1]
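To make the not-all-equal condition concrete, here is a small brute-force sketch. It is an illustration of the problem definition only, not part of the paper's reduction. The literal encoding (`k` for x_k, `-k` for ¬x_k) and the function names are assumptions chosen for this example.

```python
from itertools import product


def nae_satisfied(clauses, assignment):
    """True if every clause has at least one true and one false literal.

    clauses    -- list of 3-tuples of nonzero ints; k means x_k, -k means NOT x_k
    assignment -- dict mapping variable index (1-based) to bool
    """
    for clause in clauses:
        values = [assignment[abs(lit)] == (lit > 0) for lit in clause]
        if all(values) or not any(values):
            return False
    return True


def nae_3sat_brute_force(clauses, n_vars):
    """Return one not-all-equal satisfying assignment, or None. Exponential time."""
    for bits in product([False, True], repeat=n_vars):
        assignment = {i + 1: bits[i] for i in range(n_vars)}
        if nae_satisfied(clauses, assignment):
            return assignment
    return None


example = [(1, 2, 3), (-1, 2, -3)]
solution = nae_3sat_brute_force(example, 3)

# Repeated literals are allowed here for brevity: the first clause forces
# x1 != x2 and the second forces x1 == x2, so no assignment works.
unsat = [(1, 2, 2), (1, -2, -2)]
no_solution = nae_3sat_brute_force(unsat, 2)
```

A useful property of NAE-3-SAT is its symmetry: flipping every variable of a satisfying assignment yields another satisfying assignment, which fits the two-column (TRUE/FALSE) picture used later in the reduction.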

WHISKER: Given an unweighted undirected graph, determine whether there exists a whisker or not.

WHISKER is NP-Complete (by a reduction from NAE-3-SAT)

NP-Completeness

Page 13: Community Structure in Large Complex Networks

Road map
◦ 1. Construct a special graph G of 2n vertices and show that G admits 2^n whiskers and no more.
◦ 2. Construct a G-like graph for the 3-SAT problem.
◦ 3. Reduce the NAE-3-SAT problem to WHISKER.

WHISKER is NP-Complete

Page 14: Community Structure in Large Complex Networks

WHISKER is in NP
Reduction from NAE-3-SAT to WHISKER
◦ Consider the following graph (constructed in polynomial time)
◦ At each row, pick exactly one vertex (i.e., either xi or ¬xi)
◦ The resulting subgraph of n vertices is a whisker
◦ The total number of whiskers is 2^n, and no more than that
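The counting step can be sketched as follows. This only enumerates the row selections (one of xi or ¬xi per row); it does not build the graph itself, whose edges appear in a figure that is not preserved in this transcript. The signed-integer encoding and the name `row_selections` are assumptions for illustration.

```python
from itertools import product


def row_selections(n):
    """Yield every way to pick exactly one of x_i / NOT x_i from each of n rows.

    A selection is a tuple of literals: +i stands for x_i, -i for NOT x_i.
    """
    for signs in product((1, -1), repeat=n):
        yield tuple(sign * (i + 1) for i, sign in enumerate(signs))


selections = list(row_selections(3))
```

Each selection corresponds to one truth assignment of the n variables, which is why the slide counts 2^n candidate whiskers.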

NP-Completeness

Page 15: Community Structure in Large Complex Networks

2^n whiskers and no more than that. Why?

Suppose there is a whisker W of 2k+j vertices

Cut size of W

By definition of suitable cut size, we have

which implies !!!!

NP-Complete

Page 16: Community Structure in Large Complex Networks

NAE-3-SAT ≤P WHISKER
Consider an instance of NAE-3-SAT with n variables and c clauses.
Construct G1, G2, …, Gc as follows

NP-Complete

Page 17: Community Structure in Large Complex Networks

NAE-3-SAT ≤P WHISKER
Now, combine all Gi’s and add up all edge weights to get G’.

Next: update G and G’, then combine them into G*. The 3CNF has a satisfying assignment if and only if G* contains a whisker.

NP-Complete

Page 18: Community Structure in Large Complex Networks

Update G ( )

Update G’
◦ Scale all edge weights of G’ by a small factor δ, where cn²δ << 1

All whiskers in new G are the same as in old G.

NP-Complete

Page 19: Community Structure in Large Complex Networks

G* = G + G’

Goal: If the 3CNF instance has a satisfying truth assignment, then selecting the true literal from each row of G* gives us a whisker of size n, and vice versa.

For any truth assignment of 3SAT, rearrange the literals into TRUE and FALSE columns.

If there is a satisfying not-all-equal assignment for 3SAT
◦ Each clause must have at least one TRUE and at least one FALSE literal.
◦ Not all the literals in each clause can be in the same column.
◦ For each clause i, Gi contains n² − 2 edges connecting its two columns.
◦ The total cut size is required to satisfy

NP-Complete

Page 20: Community Structure in Large Complex Networks

If there is NO satisfying not-all-equal assignment for 3SAT
◦ At least one clause i has all its literals located in the same column, which gives n² edges between the two columns of Gi.
◦ For the other (c − 1) clauses, there are at most (n² − 2) edges connecting their two columns. Total number of edges: (c − 1)(n² − 2) + n² = cn² − 2c + 2.
◦ Of course, we do not want selecting the true literal in each row to give us a whisker, thus

Combining the two inequalities, if ε and δ are chosen such that

then: if the 3CNF instance has a satisfying truth assignment, selecting the true literal from each row of G* gives us a whisker of size n, and vice versa.

◦ Hence, NAE-3-SAT ≤P WHISKER □
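The edge counts in the two cases can be checked with a short calculation. This sketch takes the slide's per-clause counts at their stated values (n² − 2 edges for a split clause, n² for an unsplit one) and verifies the closed form cn² − 2c + 2; the function names are hypothetical, and the inequalities on ε and δ are not reproduced because they are not preserved in this transcript.

```python
def edges_all_clauses_split(c, n):
    """Cross-column edges when every clause is split: c clauses, n^2 - 2 each."""
    return c * (n * n - 2)


def edges_one_clause_unsplit(c, n):
    """Slide's count when one clause is unsplit (n^2 edges) and the
    other c - 1 clauses contribute n^2 - 2 edges each."""
    return (c - 1) * (n * n - 2) + n * n


for c in range(1, 6):
    for n in range(3, 8):
        # Closed form quoted on the slide.
        assert edges_one_clause_unsplit(c, n) == c * n * n - 2 * c + 2
        # The two cases differ by exactly 2 edges, independent of c and n.
        assert edges_one_clause_unsplit(c, n) - edges_all_clauses_split(c, n) == 2

gap = edges_one_clause_unsplit(4, 5) - edges_all_clauses_split(4, 5)
```

The constant gap of 2 edges (before the weights of G’ are scaled by δ) is what the choice of ε and δ has to separate.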

NP-Complete

Page 21: Community Structure in Large Complex Networks

Heuristic Algorithms

Page 22: Community Structure in Large Complex Networks

On random graphs

◦ Alg 2 successfully finds an approximate core
◦ Alg 3 fails to find an approximate core
◦ The size of the core grows linearly with d = np (fixed n) and logarithmically with n (fixed d)
◦ Open question: does G(n,p) display core structure with high probability when p > 1/n?

Results

Page 23: Community Structure in Large Complex Networks

Textual graph
◦ Vertices and edges: words and their semantic correlations
◦ Data is crawled from 10K scientific papers of the KDD conference (1992-2003)
◦ Pointwise mutual information
◦ Total: 685 vertices and 6,432 edges
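A minimal sketch of how word-pair correlations could be scored with pointwise mutual information. It assumes the standard definition PMI(x, y) = log( p(x, y) / (p(x) p(y)) ) with probabilities estimated from document-level co-occurrence. How the paper estimated these probabilities and which threshold turned scores into edges is not stated in the slides, so the function `pmi_scores` and the toy documents are purely illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(documents):
    """PMI for every word pair that co-occurs in at least one document.

    documents -- list of iterables of words; each document counts a word once.
    Returns {(w1, w2): pmi} with w1 < w2, using natural logarithms.
    """
    n_docs = len(documents)
    word_count = Counter()
    pair_count = Counter()
    for doc in documents:
        words = sorted(set(doc))
        word_count.update(words)
        pair_count.update(combinations(words, 2))
    scores = {}
    for (w1, w2), joint in pair_count.items():
        p_joint = joint / n_docs
        p1 = word_count[w1] / n_docs
        p2 = word_count[w2] / n_docs
        scores[(w1, w2)] = math.log(p_joint / (p1 * p2))
    return scores


docs = [
    ["graph", "cluster"],
    ["graph", "cluster"],
    ["graph", "query"],
    ["index", "query"],
]
scores = pmi_scores(docs)
```

Pairs with positive PMI co-occur more often than independence would predict, so one plausible construction keeps an edge only for such pairs.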

Results

Page 24: Community Structure in Large Complex Networks

Both Alg 2 and Alg 3 successfully find approximate cores.

Higher values of λ indicate smaller core sizes.

Fig (b): the best community of the textual graph has a large conductance of 0.3, i.e., the best community has roughly as many internal edges as cut edges.

Alg 3 is believed to be more useful.

Results

Page 25: Community Structure in Large Complex Networks

Does a “whisker” make sense?

Comment

Page 26: Community Structure in Large Complex Networks

[1] Schaefer, T. J. The complexity of satisfiability problems. In Proc. 10th Ann. ACM Symp. on Theory of Computing (1978), Association for Computing Machinery, pp. 216-226.

Reference