Semidefinite Programming Relaxations for Recovering Hidden Communities

Jiaming Xu

Krannert School of Management, Purdue University

Joint work with Bruce Hajek (Illinois) and Yihong Wu (Yale)

December 17, 2016

Community detection in networks

• Observe local pairwise interactions between objects, e.g., social networks, biological networks ...

• Interested in global properties of objects, e.g., similarity


Goal: identify communities of similar objects, related to clustering and graph partitioning.

Jiaming Xu (Purdue): SDP for community detection

Stochastic block model [Holland-Laskey-Leinhardt ’83]

Planted partition model [Condon-Karp ’01]

[Figure: example network with p = 0.8 and q = 0.09]


SBM - adjacency matrix view

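[Figure: adjacency matrix in block form, with edge probability p in the diagonal blocks and q in the off-diagonal blocks]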

• n: total number of nodes

• k: number of communities

• p: within-community edge prob. q: across-community edge prob.
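As a rough illustration of the four parameters above, the following sketch samples a symmetric adjacency matrix from an SBM with k equal-sized communities. The function name `sample_sbm`, the use of Python's `random` module, and the convention that node i belongs to community `i // (n // k)` are assumptions made for this example only; they are not taken from the talk.

```python
import random

def sample_sbm(n, k, p, q, seed=None):
    """Sample (A, labels) from an SBM with k equal-sized communities.

    Assumption of this sketch: k divides n, and node i is placed in
    community i // (n // k).
    """
    if n % k != 0:
        raise ValueError("this sketch assumes k divides n")
    rng = random.Random(seed)
    size = n // k
    labels = [i // size for i in range(n)]
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Within-community pairs connect w.p. p, across-community pairs w.p. q.
            prob = p if labels[i] == labels[j] else q
            if rng.random() < prob:
                A[i][j] = A[j][i] = 1
    return A, labels

# Example draw with the edge probabilities shown on the earlier slide.
A, labels = sample_sbm(n=12, k=3, p=0.8, q=0.09, seed=0)
for row in A:
    print("".join(str(x) for x in row))
```

With nodes ordered by community, the printed matrix should look dense in the diagonal blocks and sparse elsewhere, which is the adjacency-matrix view described above.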


Exact recovery

$C^* \longrightarrow A \longrightarrow \hat{C}$

• Goal: exact recovery

$\mathbb{P}\{\hat{C} = C^*\} \xrightarrow{n \to \infty} 1$

• Alternatives:

  - almost exact recovery: [Mossel-Neeman-Sly ’14, Abbe-Sandon ’15, Montanari ’15, Zhang-Zhou ’15, Yun-Proutiere ’15] ...

  - correlated recovery: [Decelle-Krzakala-Moore-Zdeborova ’11, Mossel-Neeman-Sly ’12 ’13, Massoulie ’13] ...


Objectives of this talk

Exact recovery:

$\mathbb{P}\{\hat{C} = C^*\} \xrightarrow{n \to \infty} 1$

• Information limit: When is exact recovery possible (impossible)?

• Is the information limit achievable in polynomial time, e.g., via semidefinite programming?


Remainder of the talk

1 Two equal-sized communities

2 Multiple equal-sized communities

3 Conclusions


Two equal-sized communities: Binary symmetric SBM

Model:

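• n nodes partitioned into two communities of size $n/2$ ($\sigma^*_i = \pm 1$).

• i ∼ j independently w.p.

$$p = \frac{a \log n}{n} \ \text{ if } \sigma^*_i = \sigma^*_j, \qquad q = \frac{b \log n}{n} \ \text{ if } \sigma^*_i \neq \sigma^*_j$$

Remarks

• $a + b > 2$ is the connectivity threshold and necessary for exact recovery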



Two equal-sized communities: MLE ⇒ SDP relaxation

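• Maximum likelihood estimator (MLE): Assume p ≥ q

$$\max_{\sigma} \ \langle A, \sigma\sigma^\top \rangle \quad (\to \text{\# of in-cluster edges})$$

s.t. $\sigma_i \in \{\pm 1\}$, $i \in [n]$

     $\sigma^\top \mathbf{1} = 0$

• Lift $Y = \sigma\sigma^\top$:

$$\max_{Y} \ \langle A, Y \rangle$$

s.t. $\mathrm{rank}(Y) = 1$

     $Y_{ii} = 1$, $i \in [n]$

     $\langle J, Y \rangle = 0$

• SDP relaxation: replace $\mathrm{rank}(Y) = 1$ by $Y \succeq 0$:

$$\max_{Y} \ \langle A, Y \rangle$$

s.t. $Y \succeq 0$

     $Y_{ii} = 1$, $i \in [n]$

     $\langle J, Y \rangle = 0$

• Goal: $\mathbb{P}\left\{ \hat{Y}_{\mathrm{SDP}} = \begin{pmatrix} \mathbf{1} & -\mathbf{1} \\ -\mathbf{1} & \mathbf{1} \end{pmatrix} \right\} \to 1$ (all-ones diagonal blocks, all-minus-ones off-diagonal blocks)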
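To make the lifting step concrete, here is a minimal sketch that solves the MLE by exhaustive search on a tiny graph and checks that the lifted matrix Y = σσᵀ satisfies the constraints Y_ii = 1 and ⟨J, Y⟩ = 0. The helper names (`balanced_labelings`, `objective`, `mle_bisection`, `lift`) and the six-node toy graph are assumptions for illustration. The search is exponential in n and is not the SDP itself.

```python
from itertools import combinations

def balanced_labelings(n):
    """Yield every sigma in {+1,-1}^n with sum zero and sigma[0] = +1 (n even)."""
    for plus in combinations(range(1, n), n // 2 - 1):
        sigma = [-1] * n
        sigma[0] = 1  # fixes the global sign ambiguity sigma -> -sigma
        for i in plus:
            sigma[i] = 1
        yield sigma

def objective(A, sigma):
    """<A, sigma sigma^T> = 2 * (in-cluster edges - cross-cluster edges)."""
    n = len(A)
    return sum(A[i][j] * sigma[i] * sigma[j] for i in range(n) for j in range(n))

def mle_bisection(A):
    """Exhaustive search over balanced labelings; only feasible for tiny n."""
    return max(balanced_labelings(len(A)), key=lambda s: objective(A, s))

def lift(sigma):
    """Return Y = sigma sigma^T as a list of lists."""
    return [[si * sj for sj in sigma] for si in sigma]

# Toy graph (hypothetical): triangles {0,1,2} and {3,4,5} joined by edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

sigma_hat = mle_bisection(A)
Y = lift(sigma_hat)
print(sigma_hat)  # prints [1, 1, 1, -1, -1, -1]
```

Since the total number of edges is fixed, maximizing ⟨A, σσᵀ⟩ is the same as maximizing the number of in-cluster edges, which is why the objective is annotated that way above.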


Two equal-sized communities: Optimal recovery via SDP

Theorem (Abbe-Bandeira-Hall ’14, Mossel-Neeman-Sly ’14)

For two equal-sized communities with $p = a \log n / n$ and $q = b \log n / n$:

• If $(\sqrt{a} - \sqrt{b})^2 > 2$, exact recovery is achievable in polynomial time.

• If $(\sqrt{a} - \sqrt{b})^2 < 2$, exact recovery is impossible.

Theorem (Hajek-Wu-X. ’14)

SDP achieves the optimal recovery threshold $(\sqrt{a} - \sqrt{b})^2 > 2$.

Remarks

• originally conjectured in [Abbe-Bandeira-Hall ’14]

• independently proved by [Bandeira ’15]

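• $\mathbb{P}\left\{ \hat{Y}_{\mathrm{SDP}} = \begin{pmatrix} \mathbf{1} & -\mathbf{1} \\ -\mathbf{1} & \mathbf{1} \end{pmatrix} \right\} = 1 - n^{-\Omega(1)}$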
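The threshold in the theorems above is easy to evaluate numerically. The sketch below classifies a pair (a, b) against $(\sqrt{a}-\sqrt{b})^2 = 2$; the function names `recovery_gap` and `classify` and the string labels are hypothetical, chosen only for this illustration.

```python
import math

def recovery_gap(a, b):
    """Return (sqrt(a) - sqrt(b))^2 for the two-community model."""
    return (math.sqrt(a) - math.sqrt(b)) ** 2

def classify(a, b):
    """Compare the gap with the exact recovery threshold 2."""
    gap = recovery_gap(a, b)
    if gap > 2:
        return "achievable"
    if gap < 2:
        return "impossible"
    return "boundary"  # the theorems above do not cover equality

print(classify(9, 1))  # (3 - 1)^2 = 4 > 2, prints achievable
print(classify(4, 1))  # (2 - 1)^2 = 1 < 2, prints impossible
```

The second example has a + b = 5 > 2, so it lies above the connectivity threshold while exact recovery is still impossible: connectivity is necessary but not sufficient.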


Two equal-sized communities: Dual certificate argument

$$\hat{Y}_{\mathrm{SDP}} = \arg\max_{Y} \ \langle A, Y \rangle$$

s.t. $Y \succeq 0$ (dual variable $S \succeq 0$)

     $Y_{ii} = 1$ (dual variable $D = \mathrm{diag}\{d_i\}$)

     $\langle J, Y \rangle = 0$ (dual variable $\lambda \in \mathbb{R}$)

• $d_i$ = (# of nbrs in own cluster) − (# of nbrs in other cluster) $\sim \mathrm{Binom}(n/2 - 1, p) - \mathrm{Binom}(n/2, q)$

• $S = D - A + \lambda J \succeq 0$ if $\lambda \ge (p+q)/2$ and $\min_i d_i \ge \|A - \mathbb{E}[A]\|$

• $\min_i d_i = \Omega_P(\log n)$ if $\sqrt{a} - \sqrt{b} > \sqrt{2}$

• $\|A - \mathbb{E}[A]\| = O_P(\sqrt{\log n})$: 2nd-order stochastic dominance [Tomozei-Massoulie ’14] + result for iid matrix [Seginer ’00]
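The certificate construction can be checked on a small example. The sketch below builds $d_i$ and $S = D - A + \lambda J$ for a toy graph and verifies that $S\sigma^* = 0$, the complementary slackness part of the argument. The helper names, the toy graph, and the value λ = 0.5 are assumptions for illustration. The sketch does not test $S \succeq 0$, which would need an eigenvalue computation.

```python
def certificate(A, sigma, lam):
    """Build d and S = D - A + lam * J, with D = diag(d_i) and
    d_i = sum_j A[i][j] * sigma[i] * sigma[j]
        = (# nbrs in own cluster) - (# nbrs in other cluster)."""
    n = len(A)
    d = [sum(A[i][j] * sigma[i] * sigma[j] for j in range(n)) for i in range(n)]
    S = [[(d[i] if i == j else 0) - A[i][j] + lam for j in range(n)]
         for i in range(n)]
    return d, S

def matvec(M, v):
    """Plain matrix-vector product on lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Toy graph (hypothetical): triangles {0,1,2} and {3,4,5} joined by edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

sigma_star = [1, 1, 1, -1, -1, -1]
d, S = certificate(A, sigma_star, lam=0.5)  # lam = 0.5 is an arbitrary choice
residual = matvec(S, sigma_star)

print(d)  # prints [2, 2, 1, 1, 2, 2]
print(all(abs(r) < 1e-12 for r in residual))  # prints True
```

The identity holds for any λ because $J\sigma^* = 0$ for balanced communities and $(A\sigma^*)_i = d_i \sigma^*_i$ by the definition of $d_i$.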



k equal-sized communities: MLE ⇒ SDP relaxation

• MLE:

$$\max \ \sum_{\ell=1}^{k} \langle A, \theta_\ell \theta_\ell^\top \rangle$$

s.t. $\theta_\ell \in \{0, 1\}^n$

     $\langle \theta_\ell, \mathbf{1} \rangle = n/k$

     $\langle \theta_\ell, \theta_{\ell'} \rangle = 0$, $\ell \neq \ell'$

• Lift $Z = \sum_{\ell=1}^{k} \theta_\ell \theta_\ell^\top$ (equivalent formulation):

$$\max \ \langle A, Z \rangle$$

s.t. $\mathrm{rank}(Z) = k$

     $Z_{ii} = 1 \ \ \forall i \in [n]$

     $Z_{ij} \ge 0$, $\sum_j Z_{ij} = n/k$

• SDP relaxation: replace $\mathrm{rank}(Z) = k$ by $Z \succeq 0$:

$$\max \ \langle A, Z \rangle$$

s.t. $Z \succeq 0$

     $Z_{ii} = 1 \ \ \forall i \in [n]$

     $Z_{ij} \ge 0$, $\sum_j Z_{ij} = n/k$

Goal: $\mathbb{P}\left\{ \hat{Z}_{\mathrm{SDP}} = \begin{pmatrix} \mathbf{1} & & 0 \\ & \ddots & \\ 0 & & \mathbf{1} \end{pmatrix} \right\} \to 1$ (all-ones diagonal blocks, zeros elsewhere)
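As a small sanity check on the lifted constraints, the sketch below builds $Z = \sum_\ell \theta_\ell \theta_\ell^\top$ from a label vector and tests $Z_{ii} = 1$, $Z_{ij} \ge 0$, and $\sum_j Z_{ij} = n/k$. The function names `membership_matrix` and `satisfies_constraints` are hypothetical, and the check covers only the linear constraints, not rank or positive semidefiniteness.

```python
def membership_matrix(labels):
    """Z[i][j] = 1 if nodes i and j share a community, else 0.

    This equals sum_l theta_l theta_l^T, where theta_l is the 0/1
    indicator vector of community l.
    """
    return [[1 if a == b else 0 for b in labels] for a in labels]

def satisfies_constraints(Z, k):
    """Check Z_ii = 1, Z_ij >= 0 and sum_j Z_ij = n/k for every row."""
    n = len(Z)
    diag_ok = all(Z[i][i] == 1 for i in range(n))
    nonneg_ok = all(z >= 0 for row in Z for z in row)
    rows_ok = all(sum(row) * k == n for row in Z)  # avoids fractional n/k
    return diag_ok and nonneg_ok and rows_ok

balanced = membership_matrix([0, 0, 1, 1, 2, 2])
unbalanced = membership_matrix([0, 0, 0, 1, 2, 2])
print(satisfies_constraints(balanced, k=3))    # prints True
print(satisfies_constraints(unbalanced, k=3))  # prints False
```

The row-sum constraint is what encodes equal community sizes: the unbalanced labeling fails only that test.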


k equal-sized communities: optimal recovery via SDP

Theorem (Hajek-Wu-X. ’15)

For a fixed number $k$ of communities with $p = a \log n / n$ and $q = b \log n / n$:

• If $\sqrt{a} - \sqrt{b} > \sqrt{k}$, exact recovery is attained via SDP in polynomial time.

• If $\sqrt{a} - \sqrt{b} < \sqrt{k}$, exact recovery is impossible.

Remarks

• Extended to $k = o(\log n)$ in [Agarwal-Bandeira-Koiliaris-Kolla ’15]

• Extended to the case with multiple unequal-sized clusters [Perry-Wein ’15]

• Heterogeneous setting: [Yun-Proutiere ’14] and [Abbe-Sandon ’15]
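Rearranging the condition $\sqrt{a} - \sqrt{b} > \sqrt{k}$ gives $a > (\sqrt{k} + \sqrt{b})^2$, which shows how the required within-community density grows with k. A minimal sketch follows; the function names are hypothetical and the natural logarithm is assumed for log n.

```python
import math

def min_a_for_exact_recovery(b, k):
    """Infimum of a with sqrt(a) - sqrt(b) > sqrt(k); a must strictly exceed it."""
    return (math.sqrt(k) + math.sqrt(b)) ** 2

def edge_probabilities(n, a, b):
    """Return (p, q) = (a log n / n, b log n / n), natural log assumed."""
    scale = math.log(n) / n
    return a * scale, b * scale

print(min_a_for_exact_recovery(b=1, k=4))  # (2 + 1)^2, prints 9.0
print(min_a_for_exact_recovery(b=0, k=2))  # approximately 2
```

For k = 2 the condition reduces to $(\sqrt{a} - \sqrt{b})^2 > 2$ (with a > b), which matches the two-community threshold.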


When does SDP cease to be optimal?

Theorem (Hajek-Wu-X., COLT ’16)

• If $k \ll \log n$, SDP achieves the optimal exact recovery threshold.

• If $k \ge c \log n$, SDP is suboptimal by a constant factor.

• If $k \gg \log n$, SDP is order-suboptimal.

Remarks

• A “hard but informationally possible” regime is conjectured to exist for exact recovery when $k \gg \log n$ [Chen-X. ’14]


Concluding remarks

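[Figure: phase diagram in the $(\alpha, \beta)$ plane for $p = cq = \Theta(n^{-\alpha})$ and $s = \Theta(n^{\beta})$, with regions labeled impossible, hard, and easy, and a boundary labeled spectral condition]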


References

• B. Hajek, Y. Wu & J. X. Achieving exact cluster recovery threshold via semidefinite programming. (Transactions on IT ’16)

• B. Hajek, Y. Wu & J. X. Achieving exact cluster recovery threshold via semidefinite programming: Extensions. (Transactions on IT ’16)

• B. Hajek, Y. Wu & J. X. Semidefinite programs for exact recovery of a hidden community. (COLT ’16)

SDP in real networks

• Y. Chen, X. Li, and J. X. (2015), Convexified modularity maximization for degree-corrected stochastic block models. arXiv:1512.08425.

• Code available at http://people.orie.cornell.edu/yudong.chen/cmm

