arXiv:2111.01392v1 [cs.SI] 2 Nov 2021


Overlapping and nonoverlapping models


Huan Qing [email protected]

School of Mathematics

China University of Mining and Technology

Xuzhou, 221116, P.R. China

Abstract

Consider a directed network with $K_r$ row communities and $K_c$ column communities. Previous works found that modeling directed networks in which all nodes have the overlapping property requires $K_r = K_c$ for identifiability. In this paper, we propose an overlapping and nonoverlapping model (ONM) to study directed networks in which row nodes have the overlapping property while column nodes do not. The proposed model is identifiable when $K_r \leq K_c$. Meanwhile, we provide one identifiable model as an extension of ONM to model directed networks with variation in node degree. Two spectral algorithms with theoretical guarantees on consistent estimation are designed to fit the models. A small-scale numerical study is used to illustrate the algorithms.

Keywords: Community detection, directed networks, spectral clustering, asymptotic analysis, SVD.

1. Introduction

In the study of social networks, various models have been proposed to learn the latent structure of networks. Because the literature on community detection is extensive, we focus only on identifiable models that are closely relevant to our study. For undirected networks, the Stochastic Blockmodel (SBM) (Holland et al., 1983) is a classical and widely used generative model. The degree-corrected stochastic blockmodel (DCSBM) (Karrer and Newman, 2011) extends SBM by introducing degree heterogeneity. Under SBM and DCSBM, all nodes are pure in the sense that each node belongs to exactly one community. In real networks, however, some nodes may belong to multiple communities; such nodes have the overlapping (also known as mixed membership) property. To model undirected networks in which nodes have the overlapping property, Airoldi et al. (2008) design the Mixed Membership Stochastic Blockmodel (MMSB). Jin et al. (2017) introduce the degree-corrected mixed membership model (DCMM), which extends MMSB by considering degree heterogeneity. Zhang et al. (2020) design the OCCAM model, which is in fact equivalent to DCMM. Spectral methods with consistent estimation under the above models are provided in Rohe et al. (2011); Qin and Rohe (2013); Lei and Rinaldo (2015); Joseph and Yu (2016); Jin (2015); Jin et al. (2017); Mao et al. (2020, 2018). For directed networks in which all nodes have the nonoverlapping property, Rohe et al. (2016) propose the Stochastic co-Blockmodel (ScBM) and its extension DCScBM, which accounts for degree heterogeneity; ScBM (DCScBM) is an extension of SBM (DCSBM). ScBM and DCScBM can model nonoverlapping directed networks in which row nodes belong to $K_r$


row communities and column nodes belong to $K_c$ column communities, where $K_r$ can differ from $K_c$. Zhou and A. Amini (2019) and Qing and Wang (2021b) study the consistency of some adjacency-based spectral algorithms under ScBM. Wang et al. (2020) study the consistency of the spectral method D-SCORE under DCScBM when $K_r = K_c$. Qing and Wang (2021a) design the directed mixed membership stochastic blockmodel (DiMMSB) as an extension of ScBM and MMSB to model directed networks in which all nodes have the overlapping property. DiMMSB can also be seen as an extension of the two-way blockmodels with Bernoulli distribution of Airoldi et al. (2013). All the above models are identifiable under certain conditions. The identifiability of ScBM and DCScBM holds even when $K_r \neq K_c$, whereas DiMMSB is identifiable only when $K_r = K_c$. SBM, DCSBM, MMSB, DCMM and OCCAM trivially satisfy $K_r = K_c$ since they model undirected networks. In all the above models, row nodes and column nodes carry symmetric structural information: they are either all nonoverlapping or all overlapping. As the identifiability of DiMMSB shows, modeling a directed network in which all nodes have the overlapping property requires $K_r = K_c$. Naturally, there is a bridge model between ScBM and DiMMSB that can model a directed network in which row nodes and column nodes carry asymmetric structural information, that is, they differ in their overlapping property. In this paper, we introduce this model and name it the overlapping and nonoverlapping model.

Our contributions in this paper are as follows. We propose an identifiable model for directed networks, the overlapping and nonoverlapping model (ONM for short). ONM allows the row nodes and column nodes of a directed network to differ in their overlapping property. Without loss of generality, we let row nodes have the overlapping property while column nodes do not. The proposed model is identifiable when $K_r \leq K_c$. Recall that ScBM, which models nonoverlapping directed networks, is identifiable even when $K_r \neq K_c$, while DiMMSB, which models overlapping directed networks, is identifiable only when $K_r = K_c$; this is why we regard ONM, in which row nodes and column nodes differ in their overlapping property, as a bridge model between ScBM and DiMMSB. Just as DCScBM extends ScBM, we propose an identifiable model, the overlapping and degree-corrected nonoverlapping model (ODCNM), as an extension of ONM that accounts for degree heterogeneity. We construct two spectral algorithms to fit ONM and ODCNM, and show through a delicate spectral analysis that they yield consistent estimates under mild conditions. In particular, our theoretical results under ODCNM match those under ONM when ODCNM degenerates to ONM.

Notations. We use the following notation throughout. For any positive integer $m$, let $[m] := \{1, 2, \ldots, m\}$. For a vector $x$ and fixed $q > 0$, $\|x\|_q$ denotes its $l_q$-norm. For a matrix $M$, $M'$ denotes its transpose, $\|M\|$ the spectral norm, $\|M\|_F$ the Frobenius norm, and $\|M\|_{2\to\infty}$ the maximum $l_2$-norm over all rows of $M$. Let $\sigma_i(M)$ be the $i$-th largest singular value of $M$, and let $\lambda_i(M)$ denote the $i$-th largest eigenvalue of $M$ ordered by magnitude. $M(i,:)$ and $M(:,j)$ denote the $i$-th row and the $j$-th column of $M$, respectively. $M(S_r,:)$ and $M(:,S_c)$ denote the rows and columns of $M$ indexed by the sets $S_r$ and $S_c$, respectively. For any matrix $M$, we write $Y = \max(0, M)$ to mean $Y_{ij} = \max(0, M_{ij})$ for all $i, j$. For any matrix $M \in \mathbb{R}^{m\times m}$, let $\mathrm{diag}(M)$ be the $m\times m$ diagonal matrix whose $i$-th diagonal entry is $M(i,i)$. $\mathbf{1}$ is a column vector with all entries equal to one. $e_i$ is a column vector whose $i$-th entry is 1 while all other entries are zero.

2. The overlapping and nonoverlapping model

Consider a directed network $\mathcal{N} = (V_r, V_c, E)$, where $V_r = \{1, 2, \ldots, n_r\}$ is the set of row nodes, $V_c = \{1, 2, \ldots, n_c\}$ is the set of column nodes, and $E$ is the set of edges from row nodes to column nodes. Note that since row nodes can be different from column nodes, we may have $V_r \cap V_c = \emptyset$, where $\emptyset$ denotes the empty set. In this paper, we use the subscripts $r$ and $c$ to distinguish quantities for row nodes and column nodes. Let $A \in \{0,1\}^{n_r\times n_c}$ be the bi-adjacency matrix of $\mathcal{N}$ such that $A(i_r, i_c) = 1$ if there is a directed edge from row node $i_r$ to column node $i_c$, and $A(i_r, i_c) = 0$ otherwise.

We propose a new block model, which we call the overlapping and nonoverlapping model (ONM for short). ONM can model directed networks whose row nodes belong to $K_r$ overlapping row communities while column nodes belong to $K_c$ nonoverlapping column communities.

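For row nodes, let $\Pi_r \in \mathbb{R}^{n_r\times K_r}$ be the membership matrix of row nodes such that

$$\Pi_r(i_r, k) \geq 0 \ \text{ for } k \in [K_r], \qquad \|\Pi_r(i_r,:)\|_1 = 1 \quad \text{for } i_r \in [n_r]. \tag{1}$$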

Call row node $i_r$ pure if $\Pi_r(i_r,:)$ degenerates (i.e., one entry is 1 and the other $K_r - 1$ entries are 0) and mixed otherwise. Under this definition, each row node $i_r \in [n_r]$ has a mixed membership vector and may belong to more than one row community.

For column nodes, let $\ell$ be the $n_c \times 1$ vector whose $i_c$-th entry is $\ell(i_c) = k$ if column node $i_c$ belongs to the $k$-th column community, so that $\ell(i_c)$ takes values in $\{1, 2, \ldots, K_c\}$ for $i_c \in [n_c]$. Let $\Pi_c \in \mathbb{R}^{n_c\times K_c}$ be the membership matrix of column nodes such that

$$\Pi_c(i_c, k) = \begin{cases} 1, & \ell(i_c) = k, \\ 0, & \text{otherwise}, \end{cases} \qquad \|\Pi_c(i_c,:)\|_1 = 1 \quad \text{for } i_c \in [n_c],\ k \in [K_c]. \tag{2}$$

Under this definition, each column node $i_c \in [n_c]$ belongs to exactly one of the $K_c$ column communities. Hence all column nodes are pure.

In this paper, we assume that

$$K_r \leq K_c. \tag{3}$$

Eq. (3) is required for the identifiability of ONM. Let $P \in \mathbb{R}^{K_r\times K_c}$ be the probability matrix (also known as the connectivity matrix) such that

$$0 \leq P(k, l) \leq \rho \leq 1 \quad \text{for } k \in [K_r],\ l \in [K_c], \tag{4}$$

where $\rho$ controls the network sparsity and is called the sparsity parameter in this paper. For convenience, set $P = \rho\tilde{P}$, where $\tilde{P}(k, l) \in [0, 1]$ for $k \in [K_r], l \in [K_c]$, and $\max_{k\in[K_r], l\in[K_c]}\tilde{P}(k, l) = 1$ for model identifiability. For all pairs $(i_r, i_c)$ with $i_r \in [n_r], i_c \in [n_c]$, our model assumes that the $A(i_r, i_c)$ are independent Bernoulli random variables satisfying

$$\Omega := \Pi_r P \Pi_c', \qquad A(i_r, i_c) \sim \mathrm{Bernoulli}(\Omega(i_r, i_c)) \quad \text{for } i_r \in [n_r],\ i_c \in [n_c], \tag{5}$$

where $\Omega = \mathbb{E}[A]$ is called the population adjacency matrix in this paper.
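To make the generative mechanism in (1)-(5) concrete, the following is a minimal sketch of how one adjacency matrix could be drawn from ONM. The function names `onm_population` and `sample_adjacency`, the toy parameter values, and the use of 0-based community labels are illustrative assumptions and are not taken from the paper.

```python
import random

def onm_population(Pi_r, P, labels):
    """Omega = Pi_r P Pi_c'. Because Pi_c is one-hot, the column of Omega for
    column node j is Pi_r times column labels[j] of P (labels are 0-based here)."""
    return [[sum(w * P[k][l] for k, w in enumerate(row)) for l in labels]
            for row in Pi_r]

def sample_adjacency(Omega, rng):
    """Draw A(i, j) ~ Bernoulli(Omega(i, j)) independently for every pair."""
    return [[1 if rng.random() < p else 0 for p in row] for row in Omega]

rng = random.Random(0)                      # fixed seed, for reproducibility only
rho = 0.5                                   # sparsity parameter
P_tilde = [[1.0, 0.3, 0.2],
           [0.2, 0.9, 0.1]]                 # K_r = 2 <= K_c = 3, maximum entry 1
P = [[rho * v for v in row] for row in P_tilde]
Pi_r = [[1.0, 0.0], [0.0, 1.0],             # one pure row node per row community
        [0.6, 0.4], [0.5, 0.5]]             # two mixed row nodes
labels = [0, 1, 2, 0, 1]                    # column community of each column node
Omega = onm_population(Pi_r, P, labels)
A = sample_adjacency(Omega, rng)
```

Here `Omega[2][0]` equals 0.6 * 0.5 + 0.4 * 0.1 = 0.34, the membership-weighted average of the first column of $P$.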


Definition 1 Call model (1)-(5) the Overlapping and Nonoverlapping model (ONM) and denote it by $ONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c)$.

The following conditions are sufficient for the identifiability of ONM:

• (I1) rank(P ) = Kr, rank(Πr) = Kr and rank(Πc) = Kc.

• (I2) There is at least one pure row node for each of the Kr row communities.

Here, $\mathrm{rank}(\Pi_r) = K_r$ means that $\sum_{i_r=1}^{n_r}\Pi_r(i_r, k) > 0$ for all $k \in [K_r]$, and $\mathrm{rank}(\Pi_c) = K_c$ means that each column community has at least one column node. For $k \in [K_r]$, let $\mathcal{I}_r^{(k)} = \{i \in \{1, 2, \ldots, n_r\} : \Pi_r(i, k) = 1\}$. By condition (I2), $\mathcal{I}_r^{(k)}$ is nonempty for all $k \in [K_r]$. For each $k \in [K_r]$, select one row node from $\mathcal{I}_r^{(k)}$ to construct the index set $\mathcal{I}_r$, i.e., $\mathcal{I}_r$ contains the indices of $K_r$ pure row nodes, one from each row community. Without loss of generality, let $\Pi_r(\mathcal{I}_r, :) = I_{K_r}$ (Lemma 2.1 of Mao et al. (2020) uses a similar setting to design their spectral algorithms under MMSB). $\mathcal{I}_c$ is defined similarly for column nodes such that $\Pi_c(\mathcal{I}_c, :) = I_{K_c}$. The next proposition guarantees that ONM is identifiable once conditions (I1) and (I2) hold.

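Proposition 2 If conditions (I1) and (I2) hold, ONM is identifiable: for eligible $(P, \Pi_r, \Pi_c)$ and $(\check{P}, \check{\Pi}_r, \check{\Pi}_c)$, if $\Pi_r P \Pi_c' = \check{\Pi}_r \check{P} \check{\Pi}_c'$, then $P = \check{P}$, $\Pi_r = \check{\Pi}_r$, and $\Pi_c = \check{\Pi}_c$.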

ONM differs from previous models for directed networks in the following ways.

• When all row nodes are pure, ONM reduces to ScBM with $K_r$ row clusters and $K_c$ column clusters (Rohe et al., 2016). However, ONM allows row nodes to have overlapping memberships while ScBM does not. Meanwhile, for model identifiability, ScBM does not require $\mathrm{rank}(P) = K_r$ while ONM does; this can be seen as the cost ONM pays for modeling overlapping row nodes.

• Though DiMMSB (Qing and Wang, 2021a) can model directed networks whose row and column nodes both have overlapping memberships, DiMMSB requires $K_r = K_c$ for model identifiability. In comparison, ONM allows $K_r \leq K_c$ at the cost of losing the overlapping property of column nodes.

2.1 A spectral algorithm for fitting ONM

The primary goal of the proposed algorithm is to estimate the row membership matrix $\Pi_r$ and the column membership matrix $\Pi_c$ from the observed adjacency matrix $A$ with given $K_r$ and $K_c$.

We now discuss the intuition behind the design of our algorithm to fit ONM. Under conditions (I1) and (I2), by basic algebra, we have $\mathrm{rank}(\Omega) = K_r$. Let $\Omega = U_r \Lambda U_c'$ be the compact singular value decomposition of $\Omega$, where $U_r \in \mathbb{R}^{n_r\times K_r}$, $\Lambda \in \mathbb{R}^{K_r\times K_r}$, $U_c \in \mathbb{R}^{n_c\times K_r}$, $U_r'U_r = I_{K_r}$, $U_c'U_c = I_{K_r}$, and $I_{K_r}$ is the $K_r \times K_r$ identity matrix. Let $n_{c,k} = |\{i_c : \ell(i_c) = k\}|$ be the size of the $k$-th column community for $k \in [K_c]$. Let $n_{c,\max} = \max_{k\in[K_c]} n_{c,k}$ and $n_{c,\min} = \min_{k\in[K_c]} n_{c,k}$. Meanwhile, with a slight abuse of notation, let $n_{c,K_r}$ be the $K_r$-th largest size among all column communities. The following lemma guarantees that $U_r$ enjoys an ideal simplex structure and $U_c$ has $K_c$ distinct rows.



Lemma 3 Under $ONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c)$, there exist a unique $K_r \times K_r$ matrix $B_r$ and a unique $K_c \times K_r$ matrix $B_c$ such that

• $U_r = \Pi_r B_r$, where $B_r = U_r(\mathcal{I}_r, :)$. Meanwhile, $U_r(i_r, :) = U_r(\bar{i}_r, :)$ when $\Pi_r(i_r, :) = \Pi_r(\bar{i}_r, :)$ for $i_r, \bar{i}_r \in [n_r]$.

• $U_c = \Pi_c B_c$. Meanwhile, $U_c(i_c, :) = U_c(\bar{i}_c, :)$ when $\ell(i_c) = \ell(\bar{i}_c)$ for $i_c, \bar{i}_c \in [n_c]$, i.e., $U_c$ has $K_c$ distinct rows. Furthermore, when $K_r = K_c = K$, we have $\|B_c(k, :) - B_c(l, :)\|_F = \sqrt{\frac{1}{n_{c,k}} + \frac{1}{n_{c,l}}}$ for all $1 \leq k < l \leq K$.

Lemma 3 says that the rows of $U_r$ form a $K_r$-simplex in $\mathbb{R}^{K_r}$, which we call the Ideal Simplex (IS), with the $K_r$ rows of $B_r$ being the vertices. Such an IS is also found in Jin et al. (2017); Mao et al. (2020); Qing and Wang (2021a). Meanwhile, Lemma 3 says that $U_c$ has $K_c$ distinct rows, and if two column nodes $i_c$ and $\bar{i}_c$ are from the same column community, then $U_c(i_c, :) = U_c(\bar{i}_c, :)$.

Under ONM, to recover $\Pi_c$ from $U_c$, since $U_c$ has $K_c$ distinct rows, applying the k-means algorithm to all rows of $U_c$ returns the true column communities by Lemma 3. Meanwhile, since $U_c$ has $K_c$ distinct rows, we can set $\delta_c = \min_{k\neq l}\|B_c(k, :) - B_c(l, :)\|_F$ to measure the minimum center separation of $B_c$. By Lemma 3, $\delta_c \geq \sqrt{\frac{2}{n_{c,\max}}}$ when $K_r = K_c = K$ under $ONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c)$. However, when $K_r < K_c$, it is challenging to obtain a positive lower bound on $\delta_c$; see the proof of Lemma 3 for details.
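The lower bound on $\delta_c$ for the case $K_r = K_c = K$ follows in one step from the distance formula in Lemma 3. The short derivation below spells this out and uses only quantities already defined above.

```latex
\delta_c
  = \min_{k \neq l} \|B_c(k,:) - B_c(l,:)\|_F
  = \min_{k \neq l} \sqrt{\frac{1}{n_{c,k}} + \frac{1}{n_{c,l}}}
  \;\geq\; \sqrt{\frac{1}{n_{c,\max}} + \frac{1}{n_{c,\max}}}
  = \sqrt{\frac{2}{n_{c,\max}}},
\qquad \text{since } n_{c,k} \leq n_{c,\max} \text{ for all } k \in [K_c].
```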

Under ONM, to recover $\Pi_r$ from $U_r$, since $B_r$ is full rank, if $U_r$ and $B_r$ are known in advance, we can exactly recover $\Pi_r$ by setting $\Pi_r = U_r B_r'(B_r B_r')^{-1}$ by Lemma 3. Set $Y_r = U_r B_r'(B_r B_r')^{-1}$. Since $Y_r \equiv \Pi_r$ and $\|\Pi_r(i_r, :)\|_1 = 1$ for $i_r \in [n_r]$, we have

$$\Pi_r(i_r, :) = \frac{Y_r(i_r, :)}{\|Y_r(i_r, :)\|_1}, \qquad i_r \in [n_r].$$
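For completeness, the identity $Y_r \equiv \Pi_r$ can be verified directly from the first statement of Lemma 3. The derivation below only uses $U_r = \Pi_r B_r$ and the fact that the $K_r \times K_r$ matrix $B_r$ has full rank, so that $B_r B_r'$ is invertible.

```latex
Y_r = U_r B_r' (B_r B_r')^{-1}
    = \Pi_r B_r B_r' (B_r B_r')^{-1}
    = \Pi_r .
```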

Given $U_r$, since it enjoys the IS structure $U_r = \Pi_r B_r \equiv \Pi_r U_r(\mathcal{I}_r, :)$, as long as we can obtain the row corner matrix $U_r(\mathcal{I}_r, :)$ (i.e., $B_r$), we can recover $\Pi_r$ exactly. As mentioned in Jin et al. (2017); Mao et al. (2020), for such an ideal simplex, the successive projection (SP) algorithm of Gillis and Vavasis (2015) (for details of SP, see Algorithm 3) can be applied to $U_r$ with $K_r$ row communities to find $U_r(\mathcal{I}_r, :)$.

Based on the above analysis, we are now ready to give the following algorithm, which we call Ideal ONA. Input: $\Omega, K_r, K_c$ with $K_r \leq K_c$. Output: $\Pi_r$ and $\ell$.

• Let $\Omega = U_r \Lambda U_c'$ be the compact SVD of $\Omega$ such that $U_r \in \mathbb{R}^{n_r\times K_r}$, $U_c \in \mathbb{R}^{n_c\times K_r}$, $\Lambda \in \mathbb{R}^{K_r\times K_r}$, $U_r'U_r = I_{K_r}$, $U_c'U_c = I_{K_r}$.

• For row nodes,

– Run the SP algorithm on all rows of $U_r$ assuming there are $K_r$ row communities to obtain $U_r(\mathcal{I}_r, :)$. Set $B_r = U_r(\mathcal{I}_r, :)$.

– Set $Y_r = U_r B_r'(B_r B_r')^{-1}$. Recover $\Pi_r$ by setting $\Pi_r(i_r, :) = \frac{Y_r(i_r, :)}{\|Y_r(i_r, :)\|_1}$ for $i_r \in [n_r]$.

• For column nodes,

– Run k-means on $U_c$ assuming there are $K_c$ column communities, i.e., find the solution to the following optimization problem

$$M^* = \arg\min_{M \in \mathcal{M}_{n_c,K_r,K_c}} \|M - U_c\|_F^2,$$

where $\mathcal{M}_{n_c,K_r,K_c}$ denotes the set of $n_c \times K_r$ matrices with only $K_c$ different rows.

– Use $M^*$ to obtain the labels vector $\ell$ of column nodes.

Following a proof similar to that of Theorem 1 of Qing and Wang (2021a), Ideal ONA exactly recovers the row node memberships and the column node labels, which in turn verifies the identifiability of ONM. For convenience, we refer to the two steps for column nodes as “run k-means on $U_c$ assuming there are $K_c$ column communities to obtain $\ell$”.
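Since the SP step is only referenced here, a minimal sketch of the successive projection idea may help: repeatedly pick the row with the largest Euclidean norm, then project all rows onto the orthogonal complement of the picked row. This is a generic illustration of the technique of Gillis and Vavasis (2015), not a transcription of Algorithm 3; the function name `successive_projection` and the toy data are assumptions.

```python
def successive_projection(rows, k):
    """Return the indices of k rows selected by successive projection (SP).
    Assumes the rows span at least k dimensions, so no selected residual is zero."""
    resid = [list(map(float, r)) for r in rows]      # working copy of the rows
    chosen = []
    for _ in range(k):
        norms = [sum(v * v for v in r) for r in resid]
        j = max(range(len(resid)), key=norms.__getitem__)   # largest residual norm
        chosen.append(j)
        u, uu = resid[j], norms[j]
        # Project every residual onto the orthogonal complement of u.
        resid = [[a - (sum(x * y for x, y in zip(r, u)) / uu) * b
                  for a, b in zip(r, u)]
                 for r in resid]
    return chosen

# Toy ideal simplex: rows = Pi * B with two vertices (rows 1 and 3 are pure).
B = [[1.0, 0.2], [0.1, 0.9]]
Pi = [[0.5, 0.5], [1.0, 0.0], [0.3, 0.7], [0.0, 1.0]]
rows = [[sum(w * B[k][d] for k, w in enumerate(p)) for d in range(2)] for p in Pi]
corners = successive_projection(rows, 2)
```

On this toy input the selected indices are the two pure rows, which is the behaviour the Ideal ONA row step relies on.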

We now extend the ideal case to the real case. Let $\hat{U}_r \hat{\Lambda} \hat{U}_c'$ be the top-$K_r$-dimensional SVD of $A$ such that $\hat{U}_r \in \mathbb{R}^{n_r\times K_r}$, $\hat{U}_c \in \mathbb{R}^{n_c\times K_r}$, $\hat{\Lambda} \in \mathbb{R}^{K_r\times K_r}$, $\hat{U}_r'\hat{U}_r = I_{K_r}$, $\hat{U}_c'\hat{U}_c = I_{K_r}$, and $\hat{\Lambda}$ contains the top $K_r$ singular values of $A$. For the real case, we use $\hat{B}_r, \hat{B}_c, \hat{Y}_r, \hat{\Pi}_r, \hat{\Pi}_c$ given in Algorithm 1 to estimate $B_r, B_c, Y_r, \Pi_r, \Pi_c$, respectively. Algorithm 1, called the overlapping and nonoverlapping algorithm (ONA for short), is a natural extension of Ideal ONA to the real case. In ONA, we set the negative entries of $\hat{Y}_r$ to 0 by setting $\hat{Y}_r = \max(0, \hat{Y}_r)$, because the weights of any row node should be nonnegative while $\hat{U}_r \hat{B}_r'(\hat{B}_r \hat{B}_r')^{-1}$ may have negative entries. Note that if, in a directed network, column nodes have the overlapping property while row nodes do not, community detection can be carried out by applying our algorithm to the transpose of the adjacency matrix.

Algorithm 1 Overlapping and Nonoverlapping Algorithm (ONA)

Require: The adjacency matrix $A \in \mathbb{R}^{n_r\times n_c}$ of a directed network, the number of row communities $K_r$, and the number of column communities $K_c$ with $K_r \leq K_c$.
Ensure: The estimated $n_r \times K_r$ membership matrix $\hat{\Pi}_r$ for row nodes, and the estimated $n_c \times 1$ labels vector $\hat{\ell}$ for column nodes.
1: Compute $\hat{U}_r \in \mathbb{R}^{n_r\times K_r}$ and $\hat{U}_c \in \mathbb{R}^{n_c\times K_r}$ from the top-$K_r$-dimensional SVD of $A$.
2: For row nodes:
• Apply the SP algorithm (i.e., Algorithm 3) to the rows of $\hat{U}_r$ assuming there are $K_r$ row clusters to obtain the near-corners matrix $\hat{U}_r(\hat{\mathcal{I}}_r, :) \in \mathbb{R}^{K_r\times K_r}$, where $\hat{\mathcal{I}}_r$ is the index set returned by the SP algorithm. Set $\hat{B}_r = \hat{U}_r(\hat{\mathcal{I}}_r, :)$.
• Compute the $n_r \times K_r$ matrix $\hat{Y}_r = \hat{U}_r \hat{B}_r'(\hat{B}_r \hat{B}_r')^{-1}$. Set $\hat{Y}_r = \max(0, \hat{Y}_r)$ and estimate $\Pi_r(i_r, :)$ by $\hat{\Pi}_r(i_r, :) = \hat{Y}_r(i_r, :)/\|\hat{Y}_r(i_r, :)\|_1$, $i_r \in [n_r]$.
For column nodes: run k-means on $\hat{U}_c$ assuming there are $K_c$ column communities to obtain $\hat{\ell}$.
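The steps of Algorithm 1 can be sketched compactly with NumPy. The code below is an illustrative implementation under stated assumptions, not the author's code: the function names `sp_indices`, `kmeans_labels` and `ona` are hypothetical, the k-means routine uses a deterministic farthest-point initialisation (any k-means implementation could be substituted), and rows of the clipped matrix are assumed not to be all zero. Running it on a population matrix $\Omega$ corresponds to the ideal case, where recovery is exact up to a permutation of communities.

```python
import numpy as np
from itertools import permutations

def sp_indices(X, k):
    """Successive projection: indices of k (near-)corner rows of X."""
    R = np.array(X, dtype=float)
    idx = []
    for _ in range(k):
        j = int(np.argmax((R ** 2).sum(axis=1)))    # row with largest residual norm
        idx.append(j)
        u = R[j] / np.linalg.norm(R[j])
        R = R - np.outer(R @ u, u)                   # project out direction u
    return idx

def kmeans_labels(X, k, iters=100):
    """Lloyd's k-means with farthest-point initialisation (an assumption)."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    C = np.array(centers)
    for _ in range(iters):
        dist = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        lab = dist.argmin(axis=1)
        new_C = np.array([X[lab == j].mean(axis=0) if np.any(lab == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return lab

def ona(M, Kr, Kc):
    """Steps of Algorithm 1 applied to M (A in practice, Omega in the ideal case)."""
    U, _, Vt = np.linalg.svd(np.asarray(M, dtype=float), full_matrices=False)
    Ur, Uc = U[:, :Kr], Vt[:Kr].T                    # top-Kr singular vectors
    Br = Ur[sp_indices(Ur, Kr)]                      # estimated corner matrix
    Y = np.maximum(Ur @ Br.T @ np.linalg.inv(Br @ Br.T), 0.0)
    Pi_hat = Y / Y.sum(axis=1, keepdims=True)        # assumes no all-zero row
    return Pi_hat, kmeans_labels(Uc, Kc)

# Ideal-case check on a small population matrix Omega = Pi_r P Pi_c'.
P = 0.5 * np.array([[1, .3, .2, .3], [.2, .9, .1, .2], [.3, .2, .8, .3]])
Pi_r = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                 [.6, .3, .1], [.2, .5, .3], [.1, .1, .8]])
labels = np.array([0, 1, 2, 3, 0, 1, 2, 3])
Omega = Pi_r @ P @ np.eye(4)[labels].T
Pi_hat, lab = ona(Omega, 3, 4)
row_err = min(np.abs(Pi_hat[:, list(p)] - Pi_r).max() for p in permutations(range(3)))
same_partition = all((lab[i] == lab[j]) == (labels[i] == labels[j])
                     for i in range(8) for j in range(8))
```

In this ideal setting `row_err` is at the level of floating-point error and `same_partition` is true; with a sampled adjacency matrix in place of `Omega`, the same function returns estimates rather than exact recoveries.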

2.2 Main results for ONA

In this section, we show the consistency of our algorithm for fitting ONM as the number of row nodes $n_r$ and the number of column nodes $n_c$ increase. Throughout this paper, $K_r \leq K_c$ are two known integers. First, we make the following assumption.

Assumption 4 $\rho\max(n_r, n_c) \geq \log(n_r + n_c)$.

Assumption 4 controls the sparsity of the directed networks considered in the theoretical study. By Lemma 4 of Qing and Wang (2021a), we have the following lemma.

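Lemma 5 (Row-wise singular eigenvector error) Under $ONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c)$, when Assumption 4 holds, suppose $\sigma_{K_r}(\Omega) \geq C\sqrt{\rho(n_r + n_c)\log(n_r + n_c)}$. Then with probability at least $1 - o((n_r + n_c)^{-\alpha})$,

$$\|\hat{U}_r\hat{U}_r' - U_rU_r'\|_{2\to\infty} = O\left(\frac{\sqrt{K_r}\left(\kappa(\Omega)\sqrt{\frac{\max(n_r,n_c)\mu}{\min(n_r,n_c)}} + \sqrt{\log(n_r + n_c)}\right)}{\sqrt{\rho}\,\sigma_{K_r}(P)\,\sigma_{K_r}(\Pi_r)\sqrt{n_{c,K_r}}}\right),$$

where $\mu$ is the incoherence parameter defined as $\mu = \max\left(\frac{n_r\|U_r\|_{2\to\infty}^2}{K_r}, \frac{n_c\|U_c\|_{2\to\infty}^2}{K_r}\right)$.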

For convenience, set $\varpi = \|\hat{U}_r\hat{U}_r' - U_rU_r'\|_{2\to\infty}$ in this paper. To measure the performance of ONA on row node memberships, since row nodes have mixed memberships, we naturally use the $l_1$-norm difference between $\Pi_r$ and $\hat{\Pi}_r$. Since column nodes are all pure, we use the performance criterion defined in Joseph and Yu (2016) to measure the estimation error of ONA on column nodes. We introduce this measure of estimation error below.

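Let $\mathcal{T}_c = \{\mathcal{T}_{c,1}, \mathcal{T}_{c,2}, \ldots, \mathcal{T}_{c,K_c}\}$ be the true partition of column nodes $\{1, 2, \ldots, n_c\}$ obtained from $\ell$ such that $\mathcal{T}_{c,k} = \{i_c : \ell(i_c) = k\}$ for $k \in [K_c]$. Let $\hat{\mathcal{T}}_c = \{\hat{\mathcal{T}}_{c,1}, \hat{\mathcal{T}}_{c,2}, \ldots, \hat{\mathcal{T}}_{c,K_c}\}$ be the estimated partition of column nodes $\{1, 2, \ldots, n_c\}$ obtained from $\hat{\ell}$ of ONA such that $\hat{\mathcal{T}}_{c,k} = \{i_c : \hat{\ell}(i_c) = k\}$ for $k \in [K_c]$. The criterion is defined as

$$f_c = \min_{\pi\in S_{K_c}}\max_{k\in[K_c]} \frac{|\mathcal{T}_{c,k} \cap \hat{\mathcal{T}}^c_{c,\pi(k)}| + |\mathcal{T}^c_{c,k} \cap \hat{\mathcal{T}}_{c,\pi(k)}|}{n_{c,k}},$$

where $S_{K_c}$ is the set of all permutations of $\{1, 2, \ldots, K_c\}$ and the superscript $c$ denotes the complement. As mentioned in Joseph and Yu (2016), $f_c$ measures the maximum proportion of column nodes in the symmetric difference of $\mathcal{T}_{c,k}$ and $\hat{\mathcal{T}}_{c,\pi(k)}$.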
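The criterion $f_c$ can be evaluated by brute force over permutations when $K_c$ is small. The sketch below is one possible way to compute it; the function name `f_c`, the 0-based labels, and the toy label vectors are illustrative assumptions, and every true community is assumed to be nonempty.

```python
from itertools import permutations

def f_c(true_labels, est_labels, Kc):
    """min over permutations of the max over k of |T_k symmetric-difference That_pi(k)| / |T_k|."""
    n = len(true_labels)
    T = [{i for i in range(n) if true_labels[i] == k} for k in range(Kc)]
    That = [{i for i in range(n) if est_labels[i] == k} for k in range(Kc)]
    best = float("inf")
    for pi in permutations(range(Kc)):
        # |T ∩ That^c| + |T^c ∩ That| is the size of the symmetric difference.
        worst = max(len(T[k] ^ That[pi[k]]) / len(T[k]) for k in range(Kc))
        best = min(best, worst)
    return best

true_labels = [0, 0, 0, 1, 1, 1]
est_labels = [1, 1, 1, 0, 0, 1]      # communities swapped, node 5 misplaced
err = f_c(true_labels, est_labels, 2)
```

For this toy input the best permutation swaps the two labels, leaving one misplaced node in each symmetric difference, so the criterion equals 1/3.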

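The next theorem gives theoretical bounds on the estimated memberships for both row and column nodes, and is the main theoretical result for ONA.

Theorem 6 Under $ONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c)$, suppose the conditions in Lemma 5 hold. Then with probability at least $1 - o((n_r + n_c)^{-\alpha})$,

• for row nodes, there exists a permutation matrix $\mathcal{P}_r$ such that

$$\max_{i_r\in[n_r]}\|e_{i_r}'(\hat{\Pi}_r - \Pi_r\mathcal{P}_r)\|_1 = O\left(\varpi\,\kappa(\Pi_r'\Pi_r)K_r\sqrt{\lambda_1(\Pi_r'\Pi_r)}\right).$$

• for column nodes,

$$f_c = O\left(\frac{K_rK_c\max(n_r, n_c)\log(n_r + n_c)}{\sigma_{K_r}^2(P)\,\rho\,\delta_c^2\,\sigma_{K_r}^2(\Pi_r)\,n_{c,K_r}n_{c,\min}}\right).$$

In particular, when $K_r = K_c = K$,

$$f_c = O\left(\frac{K^2\max(n_r, n_c)\,n_{c,\max}\log(n_r + n_c)}{\sigma_K^2(P)\,\rho\,\sigma_K^2(\Pi_r)\,n_{c,\min}^2}\right).$$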


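Adding conditions similar to those of Corollary 3.1 in Mao et al. (2020), we have the following corollary.

Corollary 7 Under $ONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c)$, suppose the conditions in Lemma 5 hold, and further suppose that $\lambda_{K_r}(\Pi_r'\Pi_r) = O(\frac{n_r}{K_r})$ and $n_{c,\min} = O(\frac{n_c}{K_c})$. Then with probability at least $1 - o((n_r + n_c)^{-\alpha})$,

• for row nodes, when $K_r = K_c = K$,

$$\max_{i_r\in[n_r]}\|e_{i_r}'(\hat{\Pi}_r - \Pi_r\mathcal{P}_r)\|_1 = O\left(\frac{K^2\left(\sqrt{\frac{C\max(n_r,n_c)}{\min(n_r,n_c)}} + \sqrt{\log(n_r + n_c)}\right)}{\sigma_K(P)\sqrt{\rho n_c}}\right).$$

• for column nodes,

$$f_c = O\left(\frac{K_r^2K_c^3\max(n_r, n_c)\log(n_r + n_c)}{\sigma_{K_r}^2(P)\,\rho\,\delta_c^2\,n_rn_c^2}\right).$$

When $K_r = K_c = K$,

$$f_c = O\left(\frac{K^4\max(n_r, n_c)\log(n_r + n_c)}{\sigma_K^2(P)\,\rho\,n_rn_c}\right).$$

In particular, when $n_r = O(n)$, $n_c = O(n)$, $K_r = O(1)$ and $K_c = O(1)$,

• for row nodes, when $K_r = K_c$,

$$\max_{i_r\in[n_r]}\|e_{i_r}'(\hat{\Pi}_r - \Pi_r\mathcal{P}_r)\|_1 = O\left(\frac{\sqrt{\log(n)}}{\sigma_{K_r}(P)\sqrt{\rho n}}\right).$$

• for column nodes,

$$f_c = O\left(\frac{\log(n)}{\sigma_{K_r}^2(P)\,\rho\,\delta_c^2\,n^2}\right).$$

When $K_r = K_c = K$,

$$f_c = O\left(\frac{\log(n)}{\sigma_K^2(P)\,\rho\,n}\right).$$

When $K_r \neq K_c$, though it is challenging to obtain a lower bound on $\delta_c$, we can roughly take $\sqrt{\frac{2}{n_{c,\max}}}$ as the lower bound of $\delta_c$, since $\delta_c \geq \sqrt{\frac{2}{n_{c,\max}}}$ when $K_r = K_c$.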

When ONM degenerates to SBM by setting $\Pi_r = \Pi_c$ with all nodes pure, applying the separation condition and sharp threshold criterion developed in Qing (2021b) to the upper bounds on the error rates in Corollary 7 yields the classical separation condition of a balanced network and the sharp threshold of the Erdos-Renyi random graph $G(n, p)$ of Erdos and Renyi (2011), and this guarantees the optimality of our theoretical results.



3. The overlapping and degree-corrected nonoverlapping model

Just as DCSBM (Karrer and Newman, 2011) extends SBM by introducing node-specific parameters to allow for varying degrees, in this section we propose an extension of ONM that accounts for degree heterogeneity and establish theoretical guarantees for the algorithm fitting this model.

Let $\theta_c$ be an $n_c \times 1$ vector whose $i_c$-th entry is the degree heterogeneity of column node $i_c$, for $i_c \in [n_c]$. Let $\Theta_c$ be the $n_c \times n_c$ diagonal matrix whose $i_c$-th diagonal element is $\theta_c(i_c)$. The extended model for generating $A$ is as follows:

$$\Omega := \Pi_r P \Pi_c' \Theta_c, \qquad A(i_r, i_c) \sim \mathrm{Bernoulli}(\Omega(i_r, i_c)) \quad \text{for } i_r \in [n_r],\ i_c \in [n_c]. \tag{6}$$

Definition 8 Call model (1), (2), (3), (4), (6) the Overlapping and Degree-Corrected Nonoverlapping model (ODCNM) and denote it by $ODCNM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_c)$.

Note that, under ODCNM, the maximum element of $P$ can be larger than 1 since $\max_{i_c\in[n_c]}\theta_c(i_c)$ can also control the sparsity of the directed network $\mathcal{N}$. The following proposition guarantees that ODCNM is identifiable in terms of $P$, $\Pi_r$ and $\Pi_c$; this identifiability is similar to that of DCSBM and DCScBM.

Proposition 9 If conditions (I1) and (I2) hold, ODCNM is identifiable for membership matrices: for eligible $(P, \Pi_r, \Pi_c, \Theta_c)$ and $(\check{P}, \check{\Pi}_r, \check{\Pi}_c, \check{\Theta}_c)$, if $\Pi_r P \Pi_c' \Theta_c = \check{\Pi}_r \check{P} \check{\Pi}_c' \check{\Theta}_c$, then $\Pi_r = \check{\Pi}_r$ and $\Pi_c = \check{\Pi}_c$.

Remark 10 By setting $\theta_c(i_c) = \rho$ for $i_c \in [n_c]$, ODCNM reduces to ONM, which is why ODCNM can be seen as an extension of ONM. Meanwhile, though DCScBM (Rohe et al., 2016) can model directed networks with degree heterogeneity for both row and column nodes, DCScBM does not allow nodes to overlap. In comparison, ODCNM allows row nodes to have the overlapping property at the cost of losing degree heterogeneity for row nodes and requiring $K_r \leq K_c$ for model identifiability. Furthermore, another identifiable model that extends ONM by considering degree heterogeneity for row nodes with the overlapping property is provided in Appendix D, where we also explain why we do not extend ONM by considering degree heterogeneity for both row and column nodes.
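The reduction stated in Remark 10 can be checked in one line from (6). Taking $\theta_c(i_c) = \rho$ for all $i_c$ means $\Theta_c = \rho I_{n_c}$, and the scalar can be moved next to $P$:

```latex
\Omega = \Pi_r P \Pi_c' \Theta_c
       = \Pi_r P \Pi_c' (\rho I_{n_c})
       = \Pi_r (\rho P) \Pi_c' ,
```

which has the form (5) with $\rho P$ in the role of the ONM connectivity matrix.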

3.1 A spectral algorithm for fitting ODCNM

We now discuss the intuition behind the design of our algorithm to fit ODCNM. With a slight abuse of notation, we also use $U_r, U_c, B_r, B_c, \delta_c, Y_r$, and so on under ODCNM. Let $U_{c,*} \in \mathbb{R}^{n_c\times K_r}$ be the row-normalized version of $U_c$ such that $U_{c,*}(i_c, :) = \frac{U_c(i_c, :)}{\|U_c(i_c, :)\|_F}$ for $i_c \in [n_c]$. Then clustering the rows of $U_{c,*}$ by the k-means algorithm returns a perfect clustering of the column nodes, as guaranteed by the next lemma.

Lemma 11 Under $ODCNM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_c)$, there exist a unique $K_r \times K_r$ matrix $B_r$ and a unique $K_c \times K_r$ matrix $B_c$ such that

• $U_r = \Pi_r B_r$, where $B_r = U_r(\mathcal{I}_r, :)$. Meanwhile, $U_r(i_r, :) = U_r(\bar{i}_r, :)$ when $\Pi_r(i_r, :) = \Pi_r(\bar{i}_r, :)$ for $i_r, \bar{i}_r \in [n_r]$.

• $U_{c,*} = \Pi_c B_c$. Meanwhile, $U_{c,*}(i_c, :) = U_{c,*}(\bar{i}_c, :)$ when $\ell(i_c) = \ell(\bar{i}_c)$ for $i_c, \bar{i}_c \in [n_c]$. Furthermore, when $K_r = K_c = K$, we have $\|B_c(k, :) - B_c(l, :)\|_F = \sqrt{2}$ for all $1 \leq k < l \leq K$.

Recall that $\delta_c = \min_{k\neq l}\|B_c(k, :) - B_c(l, :)\|_F$. By Lemma 11, $\delta_c = \sqrt{2}$ when $K_r = K_c = K$ under $ODCNM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_c)$. However, when $K_r < K_c$, it is challenging to obtain a positive lower bound on $\delta_c$; see the proof of Lemma 11 for details.

Under ODCNM, to recover $\Pi_c$ from $U_c$, since $U_{c,*}$ has $K_c$ distinct rows, applying the k-means algorithm to all rows of $U_{c,*}$ returns the true column communities by Lemma 11. To recover $\Pi_r$ from $U_r$, we follow the same idea as under ONM.
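The role of row normalization can be illustrated numerically. From (6) and the SVD one can write $U_c = \Omega' U_r \Lambda^{-1} = \Theta_c \Pi_c G$ with $G = P'\Pi_r' U_r \Lambda^{-1}$, so each row of $U_c$ is a community-level row scaled by $\theta_c(i_c)$, and normalization removes that scale. The sketch below demonstrates this with made-up numbers; the matrix `G`, the labels, the degree values and the function name `row_normalize` are hypothetical.

```python
import math

def row_normalize(U):
    """Divide each row by its l2 norm; all-zero rows are returned unchanged."""
    out = []
    for row in U:
        nrm = math.sqrt(sum(v * v for v in row))
        out.append([v / nrm for v in row] if nrm > 0 else list(row))
    return out

G = [[0.8, 0.1], [0.2, 0.7], [0.5, -0.5]]     # hypothetical community-level rows (Kc = 3, Kr = 2)
labels = [0, 0, 1, 2, 1, 2]                   # 0-based column community labels
theta = [1.0, 0.25, 0.5, 0.9, 0.125, 0.3]     # column degree parameters
U_c = [[theta[i] * g for g in G[labels[i]]] for i in range(len(labels))]
U_star = row_normalize(U_c)
```

After normalization, column nodes 0 and 1 (both in community 0) share the same row even though their degree parameters differ by a factor of four, which is why k-means is applied to $U_{c,*}$ rather than to $U_c$.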

Based on the above analysis, we are now ready to give the following algorithm, which we call Ideal ODCNA. Input: $\Omega, K_r, K_c$ with $K_r \leq K_c$. Output: $\Pi_r$ and $\ell$.

• Let $\Omega = U_r \Lambda U_c'$ be the compact SVD of $\Omega$ such that $U_r \in \mathbb{R}^{n_r\times K_r}$, $U_c \in \mathbb{R}^{n_c\times K_r}$, $\Lambda \in \mathbb{R}^{K_r\times K_r}$, $U_r'U_r = I_{K_r}$, $U_c'U_c = I_{K_r}$. Let $U_{c,*}$ be the row-normalization of $U_c$.

• For row nodes,

– Run the SP algorithm on all rows of $U_r$ assuming there are $K_r$ row communities to obtain $U_r(\mathcal{I}_r, :)$. Set $B_r = U_r(\mathcal{I}_r, :)$.

– Set $Y_r = U_r B_r'(B_r B_r')^{-1}$. Recover $\Pi_r$ by setting $\Pi_r(i_r, :) = \frac{Y_r(i_r, :)}{\|Y_r(i_r, :)\|_1}$ for $i_r \in [n_r]$.

• For column nodes: run k-means on $U_{c,*}$ assuming there are $K_c$ column communities to obtain $\ell$.

Ideal ODCNA exactly recovers the row node memberships and the column node labels, which also supports the identifiability of ODCNM.

We now extend the ideal case to the real case. Let $\hat{U}_{c,*} \in \mathbb{R}^{n_c\times K_r}$ be the row-normalized version of $\hat{U}_c$ such that $\hat{U}_{c,*}(i_c, :) = \frac{\hat{U}_c(i_c, :)}{\|\hat{U}_c(i_c, :)\|_F}$ for $i_c \in [n_c]$. Algorithm 2, called the overlapping and degree-corrected nonoverlapping algorithm (ODCNA for short), is a natural extension of Ideal ODCNA to the real case.

3.2 Main results for ODCNA

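Set $\theta_{c,\max} = \max_{i_c\in[n_c]}\theta_c(i_c)$, $\theta_{c,\min} = \min_{i_c\in[n_c]}\theta_c(i_c)$, and $P_{\max} = \max_{k\in[K_r], l\in[K_c]}P(k, l)$. We make the following assumption.

Assumption 12 $P_{\max}\max(\theta_{c,\max}n_r, \|\theta_c\|_1) \geq \log(n_r + n_c)$.

By the proof of Lemma 4.3 of Qing (2021a), we have the following lemma.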

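Lemma 13 (Row-wise singular eigenvector error) Under $ODCNM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_c)$, when Assumption 12 holds, suppose $\sigma_{K_r}(\Omega) \geq C\sqrt{\theta_{c,\max}(n_r + n_c)\log(n_r + n_c)}$. Then with probability at least $1 - o((n_r + n_c)^{-\alpha})$,

$$\|\hat{U}_r\hat{U}_r' - U_rU_r'\|_{2\to\infty} = O\left(\frac{\sqrt{\theta_{c,\max}K_r}\left(\kappa(\Omega)\sqrt{\frac{\max(n_r,n_c)\mu}{\min(n_r,n_c)}} + \sqrt{\log(n_r + n_c)}\right)}{\theta_{c,\min}\,\sigma_{K_r}(P)\,\sigma_{K_r}(\Pi_r)\sqrt{n_{c,K_r}}}\right).$$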



Algorithm 2 Overlapping and Degree-Corrected Nonoverlapping Algorithm (ODCNA)

Require: The adjacency matrix $A \in \mathbb{R}^{n_r\times n_c}$ of a directed network, the number of row communities $K_r$, and the number of column communities $K_c$ with $K_r \leq K_c$.
Ensure: The estimated $n_r \times K_r$ membership matrix $\hat{\Pi}_r$ for row nodes, and the estimated $n_c \times 1$ labels vector $\hat{\ell}$ for column nodes.
1: Compute $\hat{U}_r \in \mathbb{R}^{n_r\times K_r}$ and $\hat{U}_c \in \mathbb{R}^{n_c\times K_r}$ from the top-$K_r$-dimensional SVD of $A$. Compute $\hat{U}_{c,*}$ from $\hat{U}_c$.
2: For row nodes:
• Apply the SP algorithm (i.e., Algorithm 3) to the rows of $\hat{U}_r$ assuming there are $K_r$ row clusters to obtain the near-corners matrix $\hat{U}_r(\hat{\mathcal{I}}_r, :) \in \mathbb{R}^{K_r\times K_r}$, where $\hat{\mathcal{I}}_r$ is the index set returned by the SP algorithm. Set $\hat{B}_r = \hat{U}_r(\hat{\mathcal{I}}_r, :)$.
• Compute the $n_r \times K_r$ matrix $\hat{Y}_r = \hat{U}_r \hat{B}_r'(\hat{B}_r \hat{B}_r')^{-1}$. Set $\hat{Y}_r = \max(0, \hat{Y}_r)$ and estimate $\Pi_r(i_r, :)$ by $\hat{\Pi}_r(i_r, :) = \hat{Y}_r(i_r, :)/\|\hat{Y}_r(i_r, :)\|_1$, $i_r \in [n_r]$.
For column nodes: run k-means on $\hat{U}_{c,*}$ assuming there are $K_c$ column communities to obtain $\hat{\ell}$.

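The next theorem is the main theoretical result for ODCNA, where we use the same measures as for ONA to assess the performance of ODCNA.

Theorem 14 Under $ODCNM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_c)$, suppose the conditions in Lemma 13 hold. Then with probability at least $1 - o((n_r + n_c)^{-\alpha})$,

• for row nodes,

$$\max_{i_r\in[n_r]}\|e_{i_r}'(\hat{\Pi}_r - \Pi_r\mathcal{P}_r)\|_1 = O\left(\varpi\,\kappa(\Pi_r'\Pi_r)K_r\sqrt{\lambda_1(\Pi_r'\Pi_r)}\right).$$

• for column nodes,

$$f_c = O\left(\frac{\theta_{c,\max}^2K_rK_c\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\,n_{c,\max}\log(n_r + n_c)}{\sigma_{K_r}^2(P)\,\theta_{c,\min}^4\,\delta_c^2\,m_{V_c}^2\,\sigma_{K_r}^2(\Pi_r)\,n_{c,K_r}n_{c,\min}}\right),$$

where $m_{V_c}$ is a parameter defined in the proof of this theorem, and it equals 1 when $K_r = K_c$. In particular, when $K_r = K_c = K$,

$$f_c = O\left(\frac{\theta_{c,\max}^2K^2\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\,n_{c,\max}\log(n_r + n_c)}{\sigma_K^2(P)\,\theta_{c,\min}^4\,\sigma_K^2(\Pi_r)\,n_{c,\min}^2}\right).$$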

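Adding some conditions on the model parameters, we have the following corollary.

Corollary 15 Under $ODCNM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_c)$, suppose the conditions in Lemma 13 hold, and further suppose that $\lambda_{K_r}(\Pi_r'\Pi_r) = O(\frac{n_r}{K_r})$ and $n_{c,\min} = O(\frac{n_c}{K_c})$. Then with probability at least $1 - o((n_r + n_c)^{-\alpha})$,

• for row nodes, when $K_r = K_c = K$,

$$\max_{i_r\in[n_r]}\|e_{i_r}'(\hat{\Pi}_r - \Pi_r\mathcal{P}_r)\|_1 = O\left(\frac{K^2\sqrt{\theta_{c,\max}}\left(\sqrt{\frac{C\max(n_r,n_c)}{\min(n_r,n_c)}} + \sqrt{\log(n_r + n_c)}\right)}{\theta_{c,\min}\,\sigma_K(P)\sqrt{n_c}}\right).$$

• for column nodes,

$$f_c = O\left(\frac{\theta_{c,\max}^2K_r^2K_c^2\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\log(n_r + n_c)}{\sigma_{K_r}^2(P)\,\theta_{c,\min}^4\,\delta_c^2\,m_{V_c}^2\,n_rn_c}\right).$$

When $K_r = K_c = K$,

$$f_c = O\left(\frac{\theta_{c,\max}^2K^4\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\log(n_r + n_c)}{\sigma_K^2(P)\,\theta_{c,\min}^4\,n_rn_c}\right).$$

In particular, when $n_r = O(n)$, $n_c = O(n)$, $K_r = O(1)$ and $K_c = O(1)$,

• for row nodes, when $K_r = K_c$,

$$\max_{i_r\in[n_r]}\|e_{i_r}'(\hat{\Pi}_r - \Pi_r\mathcal{P}_r)\|_1 = O\left(\frac{\sqrt{\theta_{c,\max}\log(n)}}{\theta_{c,\min}\,\sigma_K(P)\sqrt{n}}\right).$$

• for column nodes,

$$f_c = O\left(\frac{\theta_{c,\max}^2\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\log(n)}{\sigma_{K_r}^2(P)\,\theta_{c,\min}^4\,\delta_c^2\,m_{V_c}^2\,n^2}\right).$$

When $K_r = K_c = K$,

$$f_c = O\left(\frac{\theta_{c,\max}^2\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\log(n)}{\sigma_K^2(P)\,\theta_{c,\min}^4\,n^2}\right).$$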

When $K_r \neq K_c$, though it is challenging to obtain lower bounds on $\delta_c$ and $m_{V_c}$, we can roughly take $\sqrt{2}$ and 1 as the lower bounds of $\delta_c$ and $m_{V_c}$, respectively, since $\delta_c = \sqrt{2}$ and $m_{V_c} = 1$ when $K_r = K_c$. Meanwhile, if we further set $\theta_{c,\max} = O(\rho)$ and $\theta_{c,\min} = O(\rho)$, we have the following corollary.

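Corollary 16 Under $ODCNM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_c)$, suppose the conditions in Lemma 13 hold, and further suppose that $\lambda_{K_r}(\Pi_r'\Pi_r) = O(\frac{n_r}{K_r})$, $n_{c,\min} = O(\frac{n_c}{K_c})$, $\theta_{c,\max} = O(\rho)$ and $\theta_{c,\min} = O(\rho)$. Then with probability at least $1 - o((n_r + n_c)^{-\alpha})$,

• for row nodes, when $K_r = K_c = K$,

$$\max_{i_r\in[n_r]}\|e_{i_r}'(\hat{\Pi}_r - \Pi_r\mathcal{P}_r)\|_1 = O\left(\frac{K^2\left(\sqrt{\frac{C\max(n_r,n_c)}{\min(n_r,n_c)}} + \sqrt{\log(n_r + n_c)}\right)}{\sigma_K(P)\sqrt{\rho n_c}}\right).$$

• for column nodes,

$$f_c = O\left(\frac{K_r^2K_c^2\max(n_r, n_c)\log(n_r + n_c)}{\sigma_{K_r}^2(P)\,\rho\,\delta_c^2\,m_{V_c}^2\,n_rn_c}\right).$$

When $K_r = K_c = K$,

$$f_c = O\left(\frac{K^4\max(n_r, n_c)\log(n_r + n_c)}{\sigma_K^2(P)\,\rho\,n_rn_c}\right).$$

In particular, when $n_r = O(n)$, $n_c = O(n)$, $K_r = O(1)$ and $K_c = O(1)$,

• for row nodes, when $K_r = K_c$,

$$\max_{i_r\in[n_r]}\|e_{i_r}'(\hat{\Pi}_r - \Pi_r\mathcal{P}_r)\|_1 = O\left(\frac{\sqrt{\log(n)}}{\sigma_K(P)\sqrt{\rho n}}\right).$$

• for column nodes,

$$f_c = O\left(\frac{\log(n)}{\sigma_{K_r}^2(P)\,\rho\,\delta_c^2\,m_{V_c}^2\,n}\right).$$

When $K_r = K_c = K$,

$$f_c = O\left(\frac{\log(n)}{\sigma_K^2(P)\,\rho\,n}\right).$$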

By setting $\Theta_c = \rho I$, $ODCNM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_c)$ degenerates to $ONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c)$. Comparing Corollary 7 and Corollary 16, we see that the theoretical results under ODCNM are consistent with those under ONM when ODCNM degenerates to ONM in the case $K_r = K_c = K$.

4. Simulations

In this section, we present some simulations to investigate the performance of the proposed algorithms. We measure their performance by the Mixed-Hamming error rate (MHamm for short) for row nodes and the Hamming error rate (Hamm for short) for column nodes, defined as

$$\mathrm{MHamm} = \frac{\min_{\pi\in S_{K_r}}\|\hat{\Pi}_r\pi - \Pi_r\|_1}{n_r} \quad \text{and} \quad \mathrm{Hamm} = \frac{\min_{\pi\in S_{K_c}}\|\hat{\Pi}_c\pi - \Pi_c\|_1}{n_c},$$

where $\hat{\Pi}_c \in \mathbb{R}^{n_c\times K_c}$ is defined as $\hat{\Pi}_c(i_c, k) = 1$ if $\hat{\ell}(i_c) = k$ and 0 otherwise for $i_c \in [n_c], k \in [K_c]$.

For all simulations in this section, the parameters $(n_r, n_c, K_r, K_c, P, \rho, \Pi_r, \Pi_c, \Theta_c)$ are set as follows. Unless specified, set $n_r = 400$, $n_c = 300$, $K_r = 3$, $K_c = 4$. For column nodes, generate $\Pi_c$ by assigning each column node to one of the column communities with equal probability. Let each row community have 100 pure nodes, and let all the mixed


row nodes have memberships (0.6, 0.3, 0.1). $P = \rho\tilde{P}$ is set independently under ONM and ODCNM. Under ONM, $\rho$ is 0.5 in Experiment 1 and we study the influence of $\rho$ in Experiment 2. Under ODCNM, for $z_c \geq 1$, we generate the degree parameters for column nodes as follows: let $\theta_c \in \mathbb{R}^{n_c\times 1}$ be such that $1/\theta_c(i_c) \overset{iid}{\sim} U(1, z_c)$ for $i_c \in [n_c]$, where $U(1, z_c)$ denotes the uniform distribution on $[1, z_c]$. We study the influence of $z_c$ and $\rho$ under ODCNM in Experiments 3 and 4, respectively. For all settings, we report the averaged MHamm and the averaged Hamm over 50 repetitions.
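The two error rates used in the experiments can be computed by brute force over community permutations when $K_r$ and $K_c$ are small. The sketch below is one possible implementation; it assumes that $\|\cdot\|_1$ applied to a matrix means the sum of absolute entries, that labels are 0-based, and the function names (`mhamm`, `hamm`, `one_hot`, `l1_after_best_permutation`) are hypothetical.

```python
from itertools import permutations

def l1_after_best_permutation(est, truth):
    """Minimum over column permutations of the entrywise l1 distance to truth."""
    K = len(truth[0])
    return min(
        sum(abs(e[p[k]] - t[k]) for e, t in zip(est, truth) for k in range(K))
        for p in permutations(range(K))
    )

def mhamm(Pi_r_hat, Pi_r):
    """Mixed-Hamming error rate for row-node memberships."""
    return l1_after_best_permutation(Pi_r_hat, Pi_r) / len(Pi_r)

def one_hot(labels, K):
    """Membership matrix of pure nodes built from a label vector."""
    return [[1.0 if l == k else 0.0 for k in range(K)] for l in labels]

def hamm(labels_hat, labels, Kc):
    """Hamming error rate for column-node labels."""
    return l1_after_best_permutation(one_hot(labels_hat, Kc), one_hot(labels, Kc)) / len(labels)

row_error = mhamm([[0.0, 1.0], [0.5, 0.5]], [[1.0, 0.0], [0.6, 0.4]])
col_error = hamm([1, 1, 0, 1], [0, 0, 1, 1], 2)
```

Under the entrywise convention assumed here, each mislabeled column node contributes 2 to the numerator of Hamm, so one wrong node out of four gives `col_error` equal to 0.5.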

Experiment 1: Changing $n_c$ under ONM. Let $n_c$ range in $\{50, 100, 150, \ldots, 300\}$. For this experiment, $P$ is set as

$$P = \rho\begin{bmatrix} 1 & 0.3 & 0.2 & 0.3 \\ 0.2 & 0.9 & 0.1 & 0.2 \\ 0.3 & 0.2 & 0.8 & 0.3 \end{bmatrix}.$$

Let $\rho = 0.5$ for this experiment, which is designed under ONM. The numerical results are shown in panels (a) and (b) of Figure 1. The results show that ONA and ODCNA perform better as $n_c$ increases. The total run-time for this experiment is roughly 70 seconds. For row nodes, since both ONA and ODCNA apply the SP algorithm to $\hat{U}_r$ to estimate $\Pi_r$, the estimated row membership matrices of ONA and ODCNA are the same, and hence MHamm for ONA always equals that of ODCNA.

Experiment 2: Changing $\rho$ under ONM. $P$ is set as in Experiment 1, and we let $\rho$ range in $\{0.1, 0.2, \ldots, 1\}$ to study the influence of $\rho$ on the performance of ONA and ODCNA under ONM. The results are displayed in panels (c) and (d) of Figure 1. Both methods perform better as $\rho$ increases, since a larger $\rho$ generates more edges in the directed network. The total run-time for this experiment is roughly 136 seconds.

Experiment 3: Changing zc under ODCNM. P is set the same as in Experiment 1. Let zc range in {1, 2, . . . , 8}. Increasing zc decreases the number of edges generated under ODCNM. Panels (e) and (f) of Figure 1 display the simulation results of this experiment. The results show that, generally, increasing the variability of node degrees makes it harder to detect node memberships for both ONA and ODCNA. Though ODCNA is designed under ODCNM, it performs similarly to ONA for directed networks in which column nodes have varying degrees in this experiment, and this is consistent with our theoretical findings in Corollaries 7 and 15. Meanwhile, the total run-time for this experiment is around 131 seconds.

Experiment 4: Changing ρ under ODCNM. Set zc = 3, let P be the same as in Experiment 1, and let ρ range in {0.1, 0.2, . . . , 1} under ODCNM. Panels (g) and (h) of Figure 1 display the simulation results of this experiment. The performance of the two proposed methods is similar to that in Experiment 2. Meanwhile, the total run-time for this experiment is around 221 seconds.

5. Discussions

In this paper, we introduced the overlapping and nonoverlapping model and its extension that accounts for degree heterogeneity. The models can describe directed networks with Kr row communities and Kc column communities, in which a row node can belong to multiple row communities while a column node belongs to only one of the column communities. The



[Figure 1: Estimation errors of ONA and ODCNA. Panels: (a) Changing nc under ONM: MHamm. (b) Changing nc under ONM: Hamm. (c) Changing ρ under ONM: MHamm. (d) Changing ρ under ONM: Hamm. (e) Changing zc under ODCNM: MHamm. (f) Changing zc under ODCNM: Hamm. (g) Changing ρ under ODCNM: MHamm. (h) Changing ρ under ODCNM: Hamm. Horizontal axes: nc, ρ, or zc; vertical axes: MHamm or Hamm; curves: ONA and ODCNA.]


proposed models are identifiable when Kr ≤ Kc, under some other common constraints on the connectivity matrix and the membership matrices. By comparison, modeling a directed network in which row nodes have the overlapping property while column nodes do not is unidentifiable when Kr > Kc. Meanwhile, since previous works found that modeling directed networks in which both row and column nodes have the overlapping property is unidentifiable when Kr ≠ Kc, our identifiable ONM and ODCNM, as well as the DCONM in Appendix D, fill a gap in modeling overlapping directed networks when Kr ≠ Kc. These models provide exploratory tools for studying community structure in directed networks in which one side is overlapping while the other side is nonoverlapping. Two spectral algorithms are designed to fit ONM and ODCNM. We also showed estimation consistency under mild conditions for our methods. In particular, when ODCNM reduces to ONM, our theoretical results under ODCNM are consistent with those under ONM. Perhaps the main limitation of the models is that Kr and Kc are assumed to be given, a limitation shared by the ScBM and DCScBM of Rohe et al. (2016). In most community detection problems, the number of row communities and the number of column communities are unknown. A complete analysis therefore requires not only the algorithms and the consistency results described in this paper but also a method for estimating Kr and Kc. We leave the study of this problem to future work.

Appendix A. Successive Projection algorithm

Algorithm 3 is the Successive Projection algorithm.

Algorithm 3 Successive Projection (SP) (Gillis and Vavasis, 2015)

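Require: Near-separable matrix $Y_{sp} = S_{sp}M_{sp} + Z_{sp} \in \mathbb{R}^{m\times n}_{+}$, where $S_{sp}, M_{sp}$ should satisfy Assumption 1 of Gillis and Vavasis (2015); the number $r$ of columns to be extracted.
Ensure: Set of indices $\mathcal{K}$ such that $Y_{sp}(\mathcal{K}, :) \approx S$ (up to permutation)
1: Let $R = Y_{sp}$, $\mathcal{K} = \{\}$, $k = 1$.
2: While $R \neq 0$ and $k \leq r$ do
3:   $k_* = \arg\max_k \|R(k, :)\|_F$.
4:   $u_k = R(k_*, :)$.
5:   $R \leftarrow \left(I - \frac{u_k u_k'}{\|u_k\|_F^2}\right)R$.
6:   $\mathcal{K} = \mathcal{K} \cup \{k_*\}$.
7:   $k = k + 1$.
8: end while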
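A minimal NumPy sketch of the successive projection idea is given below. It is an illustration under stated assumptions rather than the implementation used in the paper: the function name `successive_projection` and the tolerance `tol` are introduced here, data points are stored as rows (matching the row selection in steps 3 and 4), and the projection of step 5 is therefore applied from the right, removing from every row its component along the selected row.

```python
import numpy as np

def successive_projection(Y, r, tol=1e-12):
    """Hypothetical helper: greedily pick up to r rows of Y that act as vertices."""
    R = np.array(Y, dtype=float)   # residual matrix, one data point per row
    selected = []
    for _ in range(r):
        norms = np.linalg.norm(R, axis=1)
        k_star = int(np.argmax(norms))        # row with the largest residual norm
        if norms[k_star] <= tol:              # residual is numerically zero: stop early
            break
        u = R[k_star].copy()
        # Remove from every row its component along u (row form of step 5).
        R = R - np.outer(R @ u, u) / (u @ u)
        selected.append(k_star)
    return selected
```

As a small usage example, if the first three rows of `Y` are the vertices and the remaining rows are strict convex combinations of them, `successive_projection(Y, 3)` returns the indices 0, 1, 2 in some order.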

Appendix B. Proofs under ONM

B.1 Proof of Proposition 2

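Proof Let $U_r\Lambda U_c'$ be the compact SVD of $\Omega$ as in Lemma 3, so that $\Omega = U_r\Lambda U_c'$. Since $\Omega = \Pi_r P\Pi_c' = \check{\Pi}_r\check{P}\check{\Pi}_c'$, we have $\Omega(\mathcal{I}_r, \mathcal{I}_c) = P = \check{P}$, which gives $P = \check{P}$. By Lemma 3, since $U_r = \Pi_r U_r(\mathcal{I}_r, :) = \check{\Pi}_r U_r(\mathcal{I}_r, :)$, we have $\Pi_r = \check{\Pi}_r$, where we have used the fact that the inverse of $U_r(\mathcal{I}_r, :)$ exists. Since $\Omega = \Pi_r P\Pi_c' = \check{\Pi}_r\check{P}\check{\Pi}_c' = \Pi_r P\check{\Pi}_c'$, we have $\Pi_r P\Pi_c' = \Pi_r P\check{\Pi}_c'$. By Lemma 7 of Qing and Wang (2021a), we have $P\Pi_c' = P\check{\Pi}_c'$, i.e., $\Pi_c X = \check{\Pi}_c X$, where we set $X = P' \in \mathbb{R}^{K_c\times K_r}$. Let $\check{\ell}$ be the $n_c \times 1$ vector of column node labels obtained from $\check{\Pi}_c$. For $i_c \in [n_c]$, $k \in [K_r]$, from $\Pi_c X = \check{\Pi}_c X$, we have $(\Pi_c X)(i_c, k) = \Pi_c(i_c, :)X(:, k) = X(\ell(i_c), k) = X(\check{\ell}(i_c), k)$, which means that we must have $\ell(i_c) = \check{\ell}(i_c)$ for all $i_c \in [n_c]$, i.e., $\ell = \check{\ell}$ and $\Pi_c = \check{\Pi}_c$. Note that, for the special case $K_r = K_c = K$, $\Pi_c = \check{\Pi}_c$ can be obtained easily: since $P\Pi_c' = P\check{\Pi}_c'$ and $P \in \mathbb{R}^{K\times K}$ is assumed to be full rank, we have $\Pi_c = \check{\Pi}_c$. Thus the proposition holds.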

B.2 Proof of Lemma 3

Proof For $U_r$, since $\Omega = U_r\Lambda U_c'$ and $U_c'U_c = I_{K_r}$, we have $U_r = \Omega U_c\Lambda^{-1}$. Recalling that $\Omega = \Pi_r P\Pi_c'$, we have $U_r = \Pi_r P\Pi_c'U_c\Lambda^{-1} = \Pi_r B_r$, where we set $B_r = P\Pi_c'U_c\Lambda^{-1}$. Since $U_r(\mathcal{I}_r, :) = \Pi_r(\mathcal{I}_r, :)B_r = B_r$, we have $B_r = U_r(\mathcal{I}_r, :)$. For $i_r \in [n_r]$, $U_r(i_r, :) = e_{i_r}'\Pi_r B_r = \Pi_r(i_r, :)B_r$, so we have $U_r(i_r, :) = U_r(\bar{i}_r, :)$ when $\Pi_r(i_r, :) = \Pi_r(\bar{i}_r, :)$.

For $U_c$, following a similar analysis as for $U_r$, we have $U_c = \Pi_c B_c$, where $B_c = P'\Pi_r'U_r\Lambda^{-1}$. Note that $B_c \in \mathbb{R}^{K_c\times K_r}$. Clearly, $U_c(i_c, :) = U_c(\bar{i}_c, :)$ when $\ell(i_c) = \ell(\bar{i}_c)$ for $i_c, \bar{i}_c \in [n_c]$. Now, we focus on the case $K_r = K_c = K$. For this case, since $B_c \in \mathbb{R}^{K_c\times K_r}$, $B_c$ is full rank when $K_r = K_c$. Since $I_{K_r} = I_K = U_c'U_c = B_c'\Pi_c'\Pi_c B_c$, we have $\Pi_c'\Pi_c = (B_c B_c')^{-1}$. Since $\Pi_c'\Pi_c = \mathrm{diag}(n_{c,1}, n_{c,2}, \ldots, n_{c,K})$, we have $B_c B_c' = \mathrm{diag}(\frac{1}{n_{c,1}}, \frac{1}{n_{c,2}}, \ldots, \frac{1}{n_{c,K}})$. When $K_r = K_c = K$, we have $B_c(k, :)B_c(l, :)' = 0$ for any $k \neq l$ and $k, l \in [K]$. Then we have

$$B_c B_c' = \mathrm{diag}(\|B_c(1, :)\|_F^2, \|B_c(2, :)\|_F^2, \ldots, \|B_c(K, :)\|_F^2) = \mathrm{diag}\Big(\frac{1}{n_{c,1}}, \frac{1}{n_{c,2}}, \ldots, \frac{1}{n_{c,K}}\Big),$$

and the lemma follows.

Note that when $K_r < K_c$, since $B_c$ is not full rank now, we cannot obtain $\Pi_c'\Pi_c = (B_c B_c')^{-1}$ from $I_{K_r} = B_c'\Pi_c'\Pi_c B_c$. Therefore, when $K_r < K_c$, the equality $\|B_c(k, :) - B_c(l, :)\|_F = \sqrt{\frac{1}{n_{c,k}} + \frac{1}{n_{c,l}}}$ does not hold for any $k \neq l$. We can only know that $U_c$ has $K_c$ distinct rows when $K_r < K_c$, but have no knowledge about the minimum distance between any two distinct rows of $U_c$.
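The structure established in this proof is easy to check numerically on a small noiseless example. The sketch below is only an illustration: the sizes, the community labels, and the 3 × 3 connectivity matrix are arbitrary choices made here with Kr = Kc = K = 3, not values taken from the paper. It verifies that Ur = Πr Ur(Ir, :), and that the rows of Bc are orthogonal with squared norms 1/nc,k.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3
# Rows 0..2 are pure row nodes (one per community); the rest are mixed.
Pi_r = np.vstack([np.eye(K), rng.dirichlet(np.ones(K), size=20)])
# Column communities of sizes 5, 7, 9 (every column node is pure).
labels = np.array([0] * 5 + [1] * 7 + [2] * 9)
Pi_c = np.eye(K)[labels]
P = np.array([[1.0, 0.3, 0.2],
              [0.2, 0.9, 0.1],
              [0.3, 0.2, 0.8]])      # arbitrary full-rank choice for this check
Omega = Pi_r @ P @ Pi_c.T

U, s, Vt = np.linalg.svd(Omega, full_matrices=False)
U_r, U_c = U[:, :K], Vt[:K].T        # leading K left and right singular vectors

# U_r = Pi_r U_r(I_r, :) with I_r = {0, 1, 2}.
rows_ok = np.allclose(U_r, Pi_r @ U_r[:K])

# B_c collects one row of U_c per column community (indices 0, 5, 12).
B_c = U_c[[0, 5, 12]]
n_c = np.bincount(labels)
gram_ok = np.allclose(B_c @ B_c.T, np.diag(1.0 / n_c))

# Separation between the first two community centers.
dist_01 = np.linalg.norm(B_c[0] - B_c[1])
print(rows_ok, gram_ok, dist_01, np.sqrt(1 / n_c[0] + 1 / n_c[1]))
```

Both checks are unaffected by the sign and rotation ambiguity of the SVD, since Bc Bc′ does not change when Uc is multiplied on the right by an orthogonal matrix.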

B.3 Proof of Theorem 6

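Proof For row nodes, when the conditions in Lemma 5 hold, by Theorem 2 of Qing and Wang (2021a), with probability at least $1 - o((n_r + n_c)^{-\alpha})$ for any $\alpha > 0$, there exists a permutation matrix $\mathcal{P}_r$ such that, for $i_r \in [n_r]$, we have

$$\|e_{i_r}'(\hat{\Pi}_r - \Pi_r\mathcal{P}_r)\|_1 = O\big(\varpi\,\kappa(\Pi_r'\Pi_r)K_r\sqrt{\lambda_1(\Pi_r'\Pi_r)}\big).$$

Next, we focus on column nodes. By the proof of Lemma 2.3 of Qing and Wang (2021b), there exists an orthogonal matrix $O$ such that

$$\|\hat{U}_c O - U_c\|_F \leq \frac{2\sqrt{2K_r}\,\|A - \Omega\|}{\sqrt{\lambda_{K_r}(\Omega'\Omega)}}. \tag{7}$$

Under $ONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c)$, by Lemma 10 of Qing and Wang (2021a), we have

$$\sqrt{\lambda_{K_r}(\Omega'\Omega)} \geq \rho\,\sigma_{K_r}(\tilde{P})\,\sigma_{K_r}(\Pi_r)\,\sigma_{K_r}(\Pi_c). \tag{8}$$

Since all column nodes are pure, $\sigma_{K_r}(\Pi_c) = \sqrt{n_{c,K_r}}$. By Lemma 3 of Qing and Wang (2021a), when Assumption (4) holds, with probability at least $1 - o((n_r + n_c)^{-\alpha})$, we have

$$\|A - \Omega\| = O\big(\sqrt{\rho\max(n_r, n_c)\log(n_r + n_c)}\big). \tag{9}$$

Substituting the two bounds in Eqs (8) and (9) into Eq (7), we have

$$\|\hat{U}_c O - U_c\|_F \leq C\,\frac{\sqrt{K_r\max(n_r, n_c)\log(n_r + n_c)}}{\sigma_{K_r}(\tilde{P})\sqrt{\rho}\,\sigma_{K_r}(\Pi_r)\sqrt{n_{c,K_r}}}. \tag{10}$$

Let $\varsigma > 0$ be a small quantity. By Lemma 2 in Joseph and Yu (2016), if

$$\frac{\sqrt{K_c}}{\varsigma}\|U_c - \hat{U}_c O\|_F\Big(\frac{1}{\sqrt{n_{c,k}}} + \frac{1}{\sqrt{n_{c,l}}}\Big) \leq \|B_c(k, :) - B_c(l, :)\|_F, \quad \text{for each } 1 \leq k \neq l \leq K_c, \tag{11}$$

then the clustering error $f_c = O(\varsigma^2)$. Recall that we set $\delta_c = \min_{k\neq l}\|B_c(k, :) - B_c(l, :)\|_F$ to measure the minimum center separation of $B_c$. Setting $\varsigma = \frac{2}{\delta_c}\sqrt{\frac{K_c}{n_{c,\min}}}\|U_c - \hat{U}_c O\|_F$ makes Eq (11) hold for all $1 \leq k \neq l \leq K_c$. Then we have $f_c = O(\varsigma^2) = O\Big(\frac{K_c\|U_c - \hat{U}_c O\|_F^2}{\delta_c^2 n_{c,\min}}\Big)$. By Eq (10), we have

$$f_c = O\Big(\frac{K_r K_c\max(n_r, n_c)\log(n_r + n_c)}{\sigma_{K_r}^2(\tilde{P})\rho\,\delta_c^2\,\sigma_{K_r}^2(\Pi_r)\,n_{c,K_r}n_{c,\min}}\Big).$$

Especially, when $K_r = K_c = K$, $\delta_c \geq \sqrt{\frac{2}{n_{c,\max}}}$ under $ONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c)$ by Lemma 3. When $K_r = K_c = K$, we have

$$f_c = O\Big(\frac{K^2\max(n_r, n_c)\,n_{c,\max}\log(n_r + n_c)}{\sigma_K^2(\tilde{P})\rho\,\sigma_K^2(\Pi_r)\,n_{c,\min}^2}\Big).$$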
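The choice of ς in the proof above can be checked directly against Eq (11). The short derivation below is a worked verification added for convenience. It uses only $n_{c,k} \geq n_{c,\min}$ and the definition of $\delta_c$, and it writes $\Delta$ for the Frobenius error between $U_c$ and its rotated estimate, a shorthand introduced here that does not appear in the original.

```latex
\begin{align*}
\frac{\sqrt{K_c}}{\varsigma}\,\Delta\Big(\frac{1}{\sqrt{n_{c,k}}}+\frac{1}{\sqrt{n_{c,l}}}\Big)
 &= \frac{\delta_c\sqrt{n_{c,\min}}}{2}\Big(\frac{1}{\sqrt{n_{c,k}}}+\frac{1}{\sqrt{n_{c,l}}}\Big)
 && \text{since } \varsigma=\frac{2}{\delta_c}\sqrt{\frac{K_c}{n_{c,\min}}}\,\Delta \\
 &\le \frac{\delta_c\sqrt{n_{c,\min}}}{2}\cdot\frac{2}{\sqrt{n_{c,\min}}}
 && \text{since } n_{c,k},\,n_{c,l}\ge n_{c,\min} \\
 &= \delta_c \;\le\; \|B_c(k,:)-B_c(l,:)\|_F
 && \text{by the definition of } \delta_c .
\end{align*}
```

Squaring the chosen ς then gives the stated rate, $\varsigma^2 = 4K_c\Delta^2/(\delta_c^2 n_{c,\min})$.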

B.4 Proof of Corollary 7

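Proof For row nodes, under the conditions of Corollary 7, we have

$$\max_{i_r\in[n_r]}\|e_{i_r}'(\hat{\Pi}_r - \Pi_r\mathcal{P}_r)\|_1 = O\Big(\varpi K_r\sqrt{\frac{n_r}{K_r}}\Big) = O\big(\varpi\sqrt{Kn_r}\big).$$

Under the conditions of Corollary 7, $\kappa(\Omega) = O(1)$ and $\mu \leq C$ for some $C > 0$ by the proof of Corollary 1 of Qing and Wang (2021a). Then, by Lemma 5, we have

$$\varpi = O\Bigg(\frac{\sqrt{K}\Big(\kappa(\Omega)\sqrt{\frac{\max(n_r,n_c)\mu}{\min(n_r,n_c)}} + \sqrt{\log(n_r + n_c)}\Big)}{\sqrt{\rho}\,\sigma_K(\tilde{P})\,\sigma_K(\Pi_r)\sqrt{n_{c,K_r}}}\Bigg) = O\Bigg(\frac{\sqrt{K}\Big(\sqrt{\frac{\max(n_r,n_c)}{\min(n_r,n_c)}} + \sqrt{\log(n_r + n_c)}\Big)}{\sqrt{\rho}\,\sigma_K(\tilde{P})\,\sigma_K(\Pi_r)\sqrt{n_{c,\min}}}\Bigg)$$

$$= O\Bigg(\frac{K^{1.5}\Big(\sqrt{\frac{\max(n_r,n_c)}{\min(n_r,n_c)}} + \sqrt{\log(n_r + n_c)}\Big)}{\sigma_K(\tilde{P})\sqrt{\rho\,n_r n_c}}\Bigg),$$

which gives that

$$\max_{i_r\in[n_r]}\|e_{i_r}'(\hat{\Pi}_r - \Pi_r\mathcal{P}_r)\|_1 = O\Bigg(\frac{K^{2}\Big(\sqrt{\frac{\max(n_r,n_c)}{\min(n_r,n_c)}} + \sqrt{\log(n_r + n_c)}\Big)}{\sigma_K(\tilde{P})\sqrt{\rho\,n_c}}\Bigg).$$

Note that, when $K_r < K_c$, we cannot draw the conclusion that $\mu \leq C$. This is because, when $K_r < K_c$, the inverse of $B_c B_c'$ does not exist since $B_c \in \mathbb{R}^{K_c\times K_r}$. Therefore, Lemma 8 of Qing and Wang (2021a) does not hold, and we cannot obtain the upper bound of $\|U_c\|_{2\to\infty}$, which makes it impossible to obtain the upper bound of $\mu$. This is the reason that we only consider the case $K_r = K_c$ for row nodes here.

For column nodes, under the conditions of Corollary 7, we have

$$f_c = O\Big(\frac{K_r K_c\max(n_r, n_c)\log(n_r + n_c)}{\sigma_{K_r}^2(\tilde{P})\rho\,\delta_c^2\,\sigma_{K_r}^2(\Pi_r)\,n_{c,K_r}n_{c,\min}}\Big) = O\Big(\frac{K_r K_c\max(n_r, n_c)\log(n_r + n_c)}{\sigma_{K_r}^2(\tilde{P})\rho\,\delta_c^2\,(n_r/K_r)(n_c/K_c)(n_c/K_c)}\Big)$$

$$= O\Big(\frac{K_r^2 K_c^3\max(n_r, n_c)\log(n_r + n_c)}{\sigma_{K_r}^2(\tilde{P})\rho\,\delta_c^2\,n_r n_c^2}\Big).$$

For the special case $K_r = K_c = K$, since $\frac{n_{c,\max}}{n_{c,\min}} = O(1)$ when $n_{c,\min} = O(\frac{n_c}{K})$, we have

$$f_c = O\Big(\frac{K^4\max(n_r, n_c)\log(n_r + n_c)}{\sigma_K^2(\tilde{P})\rho\,n_r n_c}\Big).$$

When $n_r = O(n)$, $n_c = O(n)$, $K_r = O(1)$ and $K_c = O(1)$, the corollary follows immediately by basic algebra.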

Appendix C. Proofs under ODCNM

C.1 Proof of Proposition 9

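Proof Since $\Omega = \Pi_r P\Pi_c'\Theta_c = \check{\Pi}_r\check{P}\check{\Pi}_c'\check{\Theta}_c = U_r\Lambda U_c'$, we have $U_r = \Pi_r U_r(\mathcal{I}_r, :) = \check{\Pi}_r U_r(\mathcal{I}_r, :)$ by Lemma 11, which gives that $\Pi_r = \check{\Pi}_r$. Since $U_{c,*} = \Pi_c B_c = \Pi_c U_{c,*}(\mathcal{I}_c, :) = \check{\Pi}_c U_{c,*}(\mathcal{I}_c, :)$ by Lemma 11, we have $\Pi_c = \check{\Pi}_c$.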

C.2 Proof of Lemma 11

Proof

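• For $U_r$: since $\Omega = U_r\Lambda U_c'$ and $U_c'U_c = I_{K_r}$, we have $U_r = \Omega U_c\Lambda^{-1}$. Recalling that $\Omega = \Pi_r P\Pi_c'\Theta_c$ under ODCNM, we have $U_r = \Pi_r P\Pi_c'\Theta_c U_c\Lambda^{-1} = \Pi_r B_r$, where $B_r = P\Pi_c'\Theta_c U_c\Lambda^{-1}$. Clearly, $U_r(i_r, :) = U_r(\bar{i}_r, :)$ holds when $\Pi_r(i_r, :) = \Pi_r(\bar{i}_r, :)$ for $i_r, \bar{i}_r \in [n_r]$.

• For $U_c$: let $D_c$ be a $K_c \times K_c$ diagonal matrix such that $D_c(k, k) = \frac{\|\Theta_c\Pi_c(:,k)\|_F}{\|\theta_c\|_F}$ for $k \in [K_c]$. Let $\Gamma_c$ be an $n_c \times K_c$ matrix such that $\Gamma_c(:, k) = \frac{\Theta_c\Pi_c(:,k)}{\|\Theta_c\Pi_c(:,k)\|_F}$ for $k \in [K_c]$. For such $D_c$ and $\Gamma_c$, we have $\Gamma_c'\Gamma_c = I_{K_c}$ and $\Omega = \Pi_r P\|\theta_c\|_F D_c\Gamma_c'$, i.e., $\Theta_c\Pi_c = \|\theta_c\|_F\Gamma_c D_c$. Since $\Omega = U_r\Lambda U_c'$ and $U_r'U_r = I_{K_r}$, we have $U_c = \Theta_c\Pi_c P'\Pi_r'U_r\Lambda^{-1}$. Since $\Theta_c\Pi_c = \|\theta_c\|_F\Gamma_c D_c$, we have $U_c = \Gamma_c\|\theta_c\|_F D_c P'\Pi_r'U_r\Lambda^{-1} = \Gamma_c V_c$, where we set $V_c = \|\theta_c\|_F D_c P'\Pi_r'U_r\Lambda^{-1} \in \mathbb{R}^{K_c\times K_r}$. Note that since $U_c'U_c = I_{K_r} = V_c'\Gamma_c'\Gamma_c V_c = V_c'V_c$, we have $V_c'V_c = I_{K_r}$. Now, for $i_c \in [n_c]$, $k \in [K_r]$, we have

$$U_c(i_c, k) = e_{i_c}'U_c e_k = e_{i_c}'\Gamma_c V_c e_k = \Gamma_c(i_c, :)V_c e_k$$

$$= \theta_c(i_c)\Big[\frac{\Pi_c(i_c, 1)}{\|\Theta_c\Pi_c(:, 1)\|_F}\ \ \frac{\Pi_c(i_c, 2)}{\|\Theta_c\Pi_c(:, 2)\|_F}\ \ \ldots\ \ \frac{\Pi_c(i_c, K_c)}{\|\Theta_c\Pi_c(:, K_c)\|_F}\Big]V_c e_k = \frac{\theta_c(i_c)}{\|\Theta_c\Pi_c(:, \ell(i_c))\|_F}V_c(\ell(i_c), k),$$

which gives that

$$U_c(i_c, :) = \frac{\theta_c(i_c)}{\|\Theta_c\Pi_c(:, \ell(i_c))\|_F}\big[V_c(\ell(i_c), 1)\ \ V_c(\ell(i_c), 2)\ \ \ldots\ \ V_c(\ell(i_c), K_r)\big] = \frac{\theta_c(i_c)}{\|\Theta_c\Pi_c(:, \ell(i_c))\|_F}V_c(\ell(i_c), :).$$

Then we have

$$U_{c,*}(i_c, :) = \frac{V_c(\ell(i_c), :)}{\|V_c(\ell(i_c), :)\|_F}. \tag{12}$$

Clearly, we have $U_{c,*}(i_c, :) = U_{c,*}(\bar{i}_c, :)$ when $\ell(i_c) = \ell(\bar{i}_c)$ for $i_c, \bar{i}_c \in [n_c]$. Let $B_c \in \mathbb{R}^{K_c\times K_r}$ be such that $B_c(l, :) = \frac{V_c(l,:)}{\|V_c(l,:)\|_F}$ for $l \in [K_c]$. Eq (12) gives $U_{c,*} = \Pi_c B_c$, which guarantees the existence of $B_c$.

Now we consider the case $K_r = K_c = K$. Since $V_c \in \mathbb{R}^{K_c\times K_r}$ and $U_c = \Gamma_c V_c \in \mathbb{R}^{n_c\times K_r}$, we have $V_c \in \mathbb{R}^{K\times K}$ and $\mathrm{rank}(V_c) = K$. Since $V_c'V_c = I_{K_r}$, we have $V_c'V_c = I_K$ when $K_r = K_c = K$. Then we have

$$V_c'V_c = I_K \Rightarrow V_c'V_c V_c' = V_c' \Rightarrow V_c'(V_c V_c' - I_K) = 0 \overset{\mathrm{rank}(V_c)=K}{\Longrightarrow} V_c V_c' = I_K. \tag{13}$$

Since $V_c V_c' = V_c'V_c = I_K$, we have $U_{c,*}(i_c, :) = V_c(\ell(i_c), :)$ by Eq (12), and $\|U_{c,*}(i_c, :) - U_{c,*}(\bar{i}_c, :)\|_F = \|V_c(\ell(i_c), :) - V_c(\ell(\bar{i}_c), :)\|_F = \sqrt{2}$ when $\ell(i_c) \neq \ell(\bar{i}_c)$ for $i_c, \bar{i}_c \in [n_c]$, i.e., $\|B_c(k, :) - B_c(l, :)\|_F = \sqrt{2}$ for $k \neq l \in [K]$.

Note that, when $K_r < K_c$, since $\mathrm{rank}(V_c) = K_r$ and $V_c \in \mathbb{R}^{K_c\times K_r}$, the inverse of $V_c$ does not exist. As a result, the last implication in Eq (13) does not hold, and the equality $\|B_c(k, :) - B_c(l, :)\|_F = \sqrt{2}$ is no longer guaranteed for $k \neq l \in [K_c]$.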
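The key consequence of this lemma when Kr = Kc = K, namely that the row-normalized right singular vectors take only K distinct values that are pairwise √2 apart, can be checked on a small noiseless example. The sketch below is an illustration only: the sizes, labels, degree parameters, and connectivity matrix are arbitrary choices made here, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 3
Pi_r = np.vstack([np.eye(K), rng.dirichlet(np.ones(K), size=20)])
labels = np.array([0] * 5 + [1] * 7 + [2] * 9)       # column community labels
Pi_c = np.eye(K)[labels]
theta_c = 1.0 / rng.uniform(1.0, 3.0, size=labels.size)  # column degree parameters
P = np.array([[1.0, 0.3, 0.2],
              [0.2, 0.9, 0.1],
              [0.3, 0.2, 0.8]])                       # arbitrary full-rank choice

# Omega = Pi_r P Pi_c' Theta_c; broadcasting scales column j by theta_c[j].
Omega = (Pi_r @ P @ Pi_c.T) * theta_c

_, _, Vt = np.linalg.svd(Omega, full_matrices=False)
U_c = Vt[:K].T
# Row-normalize U_c to remove the effect of the degree parameters.
U_c_star = U_c / np.linalg.norm(U_c, axis=1, keepdims=True)

# Pairwise distances between normalized rows.
D = np.linalg.norm(U_c_star[:, None, :] - U_c_star[None, :, :], axis=2)
same = labels[:, None] == labels[None, :]
print(D[same].max(), D[~same].min(), D[~same].max())
```

Nodes in the same column community give distance 0 and nodes in different communities give distance √2, regardless of how heterogeneous `theta_c` is.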



C.3 Proof of Theorem 14

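Proof For row nodes, when the conditions in Lemma 13 hold, by Theorem 2 of Qing and Wang (2021a), we have

$$\max_{i_r\in[n_r]}\|e_{i_r}'(\hat{\Pi}_r - \Pi_r\mathcal{P}_r)\|_1 = O\big(\varpi\,\kappa(\Pi_r'\Pi_r)K_r\sqrt{\lambda_1(\Pi_r'\Pi_r)}\big).$$

Next, we focus on column nodes. By the proof of Lemma 2.3 of Qing and Wang (2021b), there exists an orthogonal matrix $O$ such that

$$\|\hat{U}_c O - U_c\|_F \leq \frac{2\sqrt{2K_r}\,\|A - \Omega\|}{\sqrt{\lambda_{K_r}(\Omega'\Omega)}}. \tag{14}$$

Under $ODCNM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_c)$, by Lemma 4 of Qing (2021a), we have

$$\sqrt{\lambda_{K_r}(\Omega'\Omega)} \geq \theta_{c,\min}\,\sigma_{K_r}(P)\,\sigma_{K_r}(\Pi_r)\sqrt{n_{c,K_r}}. \tag{15}$$

By Lemma 4.2 of Qing (2021a), when Assumption (12) holds, with probability at least $1 - o((n_r + n_c)^{-\alpha})$, we have

$$\|A - \Omega\| = O\big(\sqrt{\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\log(n_r + n_c)}\big). \tag{16}$$

Substituting the two bounds in Eqs (15) and (16) into Eq (14), we have

$$\|\hat{U}_c O - U_c\|_F \leq C\,\frac{\sqrt{K_r\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\log(n_r + n_c)}}{\sigma_{K_r}(P)\,\theta_{c,\min}\,\sigma_{K_r}(\Pi_r)\sqrt{n_{c,K_r}}}. \tag{17}$$

For $i_c \in [n_c]$, by basic algebra, we have

$$\|\hat{U}_{c,*}(i_c, :)O - U_{c,*}(i_c, :)\|_F \leq \frac{2\|\hat{U}_c(i_c, :)O - U_c(i_c, :)\|_F}{\|U_c(i_c, :)\|_F}.$$

Setting $m_c = \min_{1\leq i_c\leq n_c}\|U_c(i_c, :)\|_F$, we have

$$\|\hat{U}_{c,*}O - U_{c,*}\|_F = \sqrt{\sum_{i_c=1}^{n_c}\|\hat{U}_{c,*}(i_c, :)O - U_{c,*}(i_c, :)\|_F^2} \leq \frac{2\|\hat{U}_c O - U_c\|_F}{m_c}.$$

Next, we provide a lower bound of $m_c$. By the proof of Lemma 11, we have

$$\|U_c(i_c, :)\|_F = \Big\|\frac{\theta_c(i_c)}{\|\Theta_c\Pi_c(:, \ell(i_c))\|_F}V_c(\ell(i_c), :)\Big\|_F = \frac{\theta_c(i_c)}{\|\Theta_c\Pi_c(:, \ell(i_c))\|_F}\|V_c(\ell(i_c), :)\|_F \geq \frac{\theta_c(i_c)}{\|\Theta_c\Pi_c(:, \ell(i_c))\|_F}m_{V_c} \geq \frac{\theta_{c,\min}}{\theta_{c,\max}\sqrt{n_{c,\max}}}m_{V_c},$$

where we set $m_{V_c} = \min_{k\in[K_c]}\|V_c(k, :)\|_F$. Note that when $K_r = K_c = K$, by the proof of Lemma 11, we know that $V_c V_c' = I_K$, which gives that $\|V_c(k, :)\|_F = 1$ for $k \in [K]$, i.e., $m_{V_c} = 1$ when $K_r = K_c = K$. However, when $K_r < K_c$, it is challenging to obtain a positive lower bound of $m_{V_c}$. Hence, we have $\frac{1}{m_c} \leq \frac{\theta_{c,\max}\sqrt{n_{c,\max}}}{\theta_{c,\min}m_{V_c}}$. Then, by Eq (17), we have

$$\|\hat{U}_{c,*}O - U_{c,*}\|_F = O\Big(\frac{\theta_{c,\max}\sqrt{K_r\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\,n_{c,\max}\log(n_r + n_c)}}{\sigma_{K_r}(P)\,\theta_{c,\min}^2\,m_{V_c}\,\sigma_{K_r}(\Pi_r)\sqrt{n_{c,K_r}}}\Big).$$

Let $\varsigma > 0$ be a small quantity. By Lemma 2 in Joseph and Yu (2016), if

$$\frac{\sqrt{K_c}}{\varsigma}\|U_{c,*} - \hat{U}_{c,*}O\|_F\Big(\frac{1}{\sqrt{n_{c,k}}} + \frac{1}{\sqrt{n_{c,l}}}\Big) \leq \|B_c(k, :) - B_c(l, :)\|_F, \quad \text{for each } 1 \leq k \neq l \leq K_c, \tag{18}$$

then the clustering error $f_c = O(\varsigma^2)$. Setting $\varsigma = \frac{2}{\delta_c}\sqrt{\frac{K_c}{n_{c,\min}}}\|U_{c,*} - \hat{U}_{c,*}O\|_F$ makes Eq (18) hold for all $1 \leq k \neq l \leq K_c$. Then we have $f_c = O(\varsigma^2) = O\Big(\frac{K_c\|U_{c,*} - \hat{U}_{c,*}O\|_F^2}{\delta_c^2 n_{c,\min}}\Big)$. By Eq (17), we have

$$f_c = O\Big(\frac{\theta_{c,\max}^2 K_r K_c\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\,n_{c,\max}\log(n_r + n_c)}{\sigma_{K_r}^2(P)\,\theta_{c,\min}^4\,\delta_c^2\,m_{V_c}^2\,\sigma_{K_r}^2(\Pi_r)\,n_{c,K_r}n_{c,\min}}\Big).$$

Especially, when $K_r = K_c = K$, $\delta_c = \sqrt{2}$ under $ODCNM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_c)$ by Lemma 11, and $m_{V_c} = 1$. When $K_r = K_c = K$, we have

$$f_c = O\Big(\frac{\theta_{c,\max}^2 K^2\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\,n_{c,\max}\log(n_r + n_c)}{\sigma_K^2(P)\,\theta_{c,\min}^4\,\sigma_K^2(\Pi_r)\,n_{c,\min}^2}\Big).$$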

C.4 Proof of Corollary 15

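Proof For row nodes, under the conditions of Corollary 15, we have

$$\max_{i_r\in[n_r]}\|e_{i_r}'(\hat{\Pi}_r - \Pi_r\mathcal{P}_r)\|_1 = O\Big(\varpi K_r\sqrt{\frac{n_r}{K_r}}\Big) = O\big(\varpi\sqrt{Kn_r}\big).$$

Under the conditions of Corollary 15, $\kappa(\Omega) = O(1)$ and $\mu \leq C\frac{\theta_{c,\max}^2}{\theta_{c,\min}^2} \leq C$ for some $C > 0$ by Lemma 2 of Qing (2021a). Then, by Lemma 13, we have

$$\varpi = O\Bigg(\frac{\sqrt{\theta_{c,\max}K_r}\Big(\kappa(\Omega)\sqrt{\frac{\max(n_r,n_c)\mu}{\min(n_r,n_c)}} + \sqrt{\log(n_r + n_c)}\Big)}{\theta_{c,\min}\,\sigma_{K_r}(P)\,\sigma_{K_r}(\Pi_r)\sqrt{n_{c,K_r}}}\Bigg)$$

$$= O\Bigg(\frac{\sqrt{\theta_{c,\max}K}\Big(\kappa(\Omega)\sqrt{\frac{\max(n_r,n_c)\mu}{\min(n_r,n_c)}} + \sqrt{\log(n_r + n_c)}\Big)}{\theta_{c,\min}\,\sigma_K(P)\,\sigma_K(\Pi_r)\sqrt{n_{c,\min}}}\Bigg)$$

$$= O\Bigg(\frac{K^{1.5}\sqrt{\theta_{c,\max}}\Big(\sqrt{\frac{\max(n_r,n_c)}{\min(n_r,n_c)}} + \sqrt{\log(n_r + n_c)}\Big)}{\theta_{c,\min}\,\sigma_K(P)\sqrt{n_r n_c}}\Bigg),$$

which gives that

$$\max_{i_r\in[n_r]}\|e_{i_r}'(\hat{\Pi}_r - \Pi_r\mathcal{P}_r)\|_1 = O\Bigg(\frac{K^{2}\sqrt{\theta_{c,\max}}\Big(\sqrt{\frac{\max(n_r,n_c)}{\min(n_r,n_c)}} + \sqrt{\log(n_r + n_c)}\Big)}{\theta_{c,\min}\,\sigma_K(P)\sqrt{n_c}}\Bigg).$$

The reason that we do not consider the case $K_r < K_c$ for row nodes is similar to that of Corollary 7, and we omit it here.

For column nodes, under the conditions of Corollary 15, we have

$$f_c = O\Big(\frac{\theta_{c,\max}^2 K_r K_c\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\,n_{c,\max}\log(n_r + n_c)}{\sigma_{K_r}^2(P)\,\theta_{c,\min}^4\,\delta_c^2\,m_{V_c}^2\,\sigma_{K_r}^2(\Pi_r)\,n_{c,K_r}n_{c,\min}}\Big) = O\Big(\frac{\theta_{c,\max}^2 K_r^2 K_c^2\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\log(n_r + n_c)}{\sigma_{K_r}^2(P)\,\theta_{c,\min}^4\,\delta_c^2\,m_{V_c}^2\,n_r n_c}\Big).$$

For the case $K_r = K_c = K$, we have

$$f_c = O\Big(\frac{\theta_{c,\max}^2 K^2\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\,n_{c,\max}\log(n_r + n_c)}{\sigma_K^2(P)\,\theta_{c,\min}^4\,\sigma_K^2(\Pi_r)\,n_{c,\min}^2}\Big) = O\Big(\frac{\theta_{c,\max}^2 K^4\max(\theta_{c,\max}n_r, \|\theta_c\|_1)\log(n_r + n_c)}{\sigma_K^2(P)\,\theta_{c,\min}^4\,n_r n_c}\Big).$$

When $n_r = O(n)$, $n_c = O(n)$, $K_r = O(1)$ and $K_c = O(1)$, the corollary follows immediately by basic algebra.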

Appendix D. The degree-corrected overlapping and nonoverlapping model

Here, we extend ONM by introducing degree heterogeneities for row nodes with the overlapping property in the directed network N. Let θr be an nr × 1 vector whose ir-th entry is the degree heterogeneity of row node ir, for ir ∈ [nr]. Let Θr be an nr × nr diagonal matrix whose ir-th diagonal element is θr(ir). The extended model for generating A is as follows:

$$\Omega := \Theta_r\Pi_r P\Pi_c', \qquad A(i_r, i_c) \sim \mathrm{Bernoulli}(\Omega(i_r, i_c)) \text{ for } i_r \in [n_r],\ i_c \in [n_c]. \tag{19}$$

Definition 17 Call model (1), (2), (3), (4), (19) the Degree-Corrected Overlapping and Nonoverlapping model (DCONM) and denote it by $DCONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_r)$.

The following conditions are sufficient for the identifiability of DCONM:

• (II1) rank(P ) = Kr, rank(Πr) = Kr, rank(Πc) = Kc, and P (k, k) = 1 for k ∈ [Kr].

• (II2) There is at least one pure row node for each of the Kr row communities.

For degree-corrected overlapping models, it is common to require that P has unit "diagonal" elements for model identifiability; see the identifiability requirements of the DCMM model of Jin et al. (2017) and the OCCAM model of Zhang et al. (2020). Following a proof similar to that of Lemma 3, we have the following lemma.


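Lemma 18 Under $DCONM_{n_r,n_c}(K_r, K_c, P, \Pi_r, \Pi_c, \Theta_r)$, there exist a unique $K_r \times K_r$ matrix $B_r$ and a unique $K_c \times K_r$ matrix $B_c$ such that

• $U_r = \Theta_r\Pi_r B_r$ where $B_r = \Theta_r^{-1}(\mathcal{I}_r, \mathcal{I}_r)U_r(\mathcal{I}_r, :)$.

• $U_c = \Pi_c B_c$. Meanwhile, $U_c(i_c, :) = U_c(\bar{i}_c, :)$ when $\ell(i_c) = \ell(\bar{i}_c)$ for $i_c, \bar{i}_c \in [n_c]$, i.e., $U_c$ has $K_c$ distinct rows. Furthermore, when $K_r = K_c = K$, we have $\|B_c(k, :) - B_c(l, :)\|_F = \sqrt{\frac{1}{n_{c,k}} + \frac{1}{n_{c,l}}}$ for all $1 \leq k < l \leq K$.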

The following proposition guarantees the identifiability of DCONM.

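Proposition 19 If conditions (II1) and (II2) hold, DCONM is identifiable: for eligible $(P, \Pi_r, \Pi_c, \Theta_r)$ and $(\check{P}, \check{\Pi}_r, \check{\Pi}_c, \check{\Theta}_r)$, if $\Theta_r\Pi_r P\Pi_c' = \check{\Theta}_r\check{\Pi}_r\check{P}\check{\Pi}_c'$, then $P = \check{P}$, $\Pi_r = \check{\Pi}_r$, $\Pi_c = \check{\Pi}_c$, $\Theta_r = \check{\Theta}_r$.

Proof By Lemma 18, since $U_c = \Pi_c B_c = \Pi_c U_c(\mathcal{I}_c, :) = \check{\Pi}_c U_c(\mathcal{I}_c, :)$, we have $\Pi_c = \check{\Pi}_c$. Since $\Omega(\mathcal{I}_r, \mathcal{I}_c) = \Theta_r(\mathcal{I}_r, \mathcal{I}_r)\Pi_r(\mathcal{I}_r, :)P\Pi_c'(\mathcal{I}_c, :) = \Theta_r(\mathcal{I}_r, \mathcal{I}_r)P = U_r(\mathcal{I}_r, :)\Lambda U_c'(\mathcal{I}_c, :)$, we have $\Theta_r(\mathcal{I}_r, \mathcal{I}_r) = \mathrm{diag}(U_r(\mathcal{I}_r, :)\Lambda U_c'(\mathcal{I}_c, :))$ by the condition that $P(k, k) = 1$ for $k \in [K_r]$. Therefore, we also have $\check{\Theta}_r(\mathcal{I}_r, \mathcal{I}_r) = \mathrm{diag}(U_r(\mathcal{I}_r, :)\Lambda U_c'(\mathcal{I}_c, :))$, which gives that $\Theta_r(\mathcal{I}_r, \mathcal{I}_r) = \check{\Theta}_r(\mathcal{I}_r, \mathcal{I}_r)$. Since $\Theta_r(\mathcal{I}_r, \mathcal{I}_r)P = U_r(\mathcal{I}_r, :)\Lambda U_c'(\mathcal{I}_c, :) = \check{\Theta}_r(\mathcal{I}_r, \mathcal{I}_r)\check{P}$, we have $P = \check{P}$. By Lemma 18, since $U_r = \Theta_r\Pi_r\Theta_r^{-1}(\mathcal{I}_r, \mathcal{I}_r)U_r(\mathcal{I}_r, :) = \check{\Theta}_r\check{\Pi}_r\check{\Theta}_r^{-1}(\mathcal{I}_r, \mathcal{I}_r)U_r(\mathcal{I}_r, :) = \check{\Theta}_r\check{\Pi}_r\Theta_r^{-1}(\mathcal{I}_r, \mathcal{I}_r)U_r(\mathcal{I}_r, :)$ and $U_r(\mathcal{I}_r, :)$ is a nonsingular matrix, we have $\Theta_r\Pi_r = \check{\Theta}_r\check{\Pi}_r$. Since $\|\Pi_r(i_r, :)\|_1 = \|\check{\Pi}_r(i_r, :)\|_1 = 1$ for $i_r \in [n_r]$, we have $\Pi_r = \check{\Pi}_r$ and $\Theta_r = \check{\Theta}_r$.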
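The recovery steps in this proof can be traced on a toy noiseless example. The sketch below is illustrative only and makes two simplifications that are assumptions of this example: it works with Ω(Ir, Ic) directly instead of the equal product Ur(Ir, :)ΛUc′(Ic, :), and it takes the index sets Ir and Ic as known. All numerical values are arbitrary choices satisfying P(k, k) = 1 and rank(P) = Kr.

```python
import numpy as np

K_r, K_c = 2, 3
P = np.array([[1.0, 0.4, 0.3],
              [0.2, 1.0, 0.5]])              # unit "diagonal": P[k, k] = 1
Pi_r = np.array([[1.0, 0.0],                  # pure row node, community 0
                 [0.0, 1.0],                  # pure row node, community 1
                 [0.7, 0.3],
                 [0.4, 0.6]])
theta_r = np.array([0.9, 0.5, 0.8, 0.6])      # row degree heterogeneities
col_labels = np.array([0, 1, 2, 0, 1, 2, 2])
Pi_c = np.eye(K_c)[col_labels]

Omega = (theta_r[:, None] * Pi_r) @ P @ Pi_c.T   # Omega = Theta_r Pi_r P Pi_c'

I_r = [0, 1]        # one pure row node per row community
I_c = [0, 1, 2]     # one column node per column community

# Omega(I_r, I_c) = Theta_r(I_r, I_r) P, so its "diagonal" gives theta_r on I_r.
M = Omega[np.ix_(I_r, I_c)]
theta_pure = np.diag(M)                  # entries M[k, k] of the rectangular block
P_rec = M / theta_pure[:, None]          # divide row k by theta_pure[k]

# Omega(:, I_c) = Theta_r Pi_r P, and P has full row rank, so P P^+ = I.
scaled = Omega[:, I_c] @ np.linalg.pinv(P_rec)   # equals Theta_r Pi_r
theta_rec = scaled.sum(axis=1)           # rows of Pi_r sum to one
Pi_r_rec = scaled / theta_rec[:, None]
print(theta_pure, theta_rec)
```

The last two lines mirror the final step of the proof: once ΘrΠr is known, the unit row sums of Πr separate the degree parameters from the memberships.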

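Remark 20 (The reason that we do not introduce a model as an extension of ONM by considering degree heterogeneities for both row and column nodes) Suppose we propose an extension model (call it nontrivial-extension-of-ONM, and ne-ONM for short) of ONM such that $E[A] = \Omega = \Theta_r\Pi_r P\Pi_c'\Theta_c$. If ne-ONM is identifiable, the following should hold: when $\Omega = \Theta_r\Pi_r P\Pi_c'\Theta_c = \check{\Theta}_r\check{\Pi}_r\check{P}\check{\Pi}_c'\check{\Theta}_c$, we have $\Theta_r = \check{\Theta}_r$, $\Pi_r = \check{\Pi}_r$, $P = \check{P}$, $\Pi_c = \check{\Pi}_c$ and $\Theta_c = \check{\Theta}_c$. Now we check the identifiability of ne-ONM. Following the proof of Proposition 19, since $\Omega(\mathcal{I}_r, \mathcal{I}_c) = \Theta_r(\mathcal{I}_r, \mathcal{I}_r)\Pi_r(\mathcal{I}_r, :)P\Pi_c'(\mathcal{I}_c, :)\Theta_c(\mathcal{I}_c, \mathcal{I}_c) = \Theta_r(\mathcal{I}_r, \mathcal{I}_r)P\Theta_c(\mathcal{I}_c, \mathcal{I}_c) = U_r(\mathcal{I}_r, :)\Lambda U_c'(\mathcal{I}_c, :)$, we have $\Theta_r(\mathcal{I}_r, \mathcal{I}_r)P = U_r(\mathcal{I}_r, :)\Lambda U_c'(\mathcal{I}_c, :)\Theta_c^{-1}(\mathcal{I}_c, \mathcal{I}_c)$. If we assume that $P(k, k) = 1$ for $k \in [K_r]$, we have $\Theta_r(\mathcal{I}_r, \mathcal{I}_r) = \mathrm{diag}(U_r(\mathcal{I}_r, :)\Lambda U_c'(\mathcal{I}_c, :)\Theta_c^{-1}(\mathcal{I}_c, \mathcal{I}_c))$. Similarly, we have $\check{\Theta}_r(\mathcal{I}_r, \mathcal{I}_r) = \mathrm{diag}(U_r(\mathcal{I}_r, :)\Lambda U_c'(\mathcal{I}_c, :)\check{\Theta}_c^{-1}(\mathcal{I}_c, \mathcal{I}_c))$, and it is impossible to guarantee the uniqueness of $\Theta_r(\mathcal{I}_r, \mathcal{I}_r)$, i.e., $\Theta_r(\mathcal{I}_r, \mathcal{I}_r) = \check{\Theta}_r(\mathcal{I}_r, \mathcal{I}_r)$, unless we further assume that $\Theta_c(\mathcal{I}_c, \mathcal{I}_c)$ is a fixed matrix. However, when we fix $\Theta_c(\mathcal{I}_c, \mathcal{I}_c)$ such that ne-ONM is identifiable, ne-ONM is nontrivial due to the fact that $\Theta_c(\mathcal{I}_c, \mathcal{I}_c)$ is fixed. ne-ONM is trivial only when we set $\Theta_c(\mathcal{I}_c, \mathcal{I}_c) = I_{K_c}$; however, in that case ne-ONM is actually DCONM. The above analysis explains why we do not extend ONM by considering $\Theta_r$ and $\Theta_c$ simultaneously.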
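The simplest instance of the ambiguity discussed in this remark is a global rescaling: dividing all row degree parameters by a constant and multiplying all column degree parameters by the same constant leaves Ω unchanged. The sketch below illustrates this special case with arbitrary toy values; the helper name `expected_adjacency` and the constant `c` are introduced here for illustration and are not part of the original analysis.

```python
import numpy as np

def expected_adjacency(theta_r, Pi_r, P, Pi_c, theta_c):
    """Hypothetical helper: Omega = Theta_r Pi_r P Pi_c' Theta_c for the ne-ONM form."""
    return (theta_r[:, None] * Pi_r) @ P @ (Pi_c * theta_c[:, None]).T

P = np.array([[1.0, 0.4, 0.3],
              [0.2, 1.0, 0.5]])                  # P[k, k] = 1 as in condition (II1)
Pi_r = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.3], [0.4, 0.6]])
Pi_c = np.eye(3)[np.array([0, 1, 2, 0, 1, 2, 2])]
theta_r = np.array([0.9, 0.5, 0.8, 0.6])
theta_c = np.array([0.5, 0.4, 0.3, 0.45, 0.25, 0.5, 0.35])

c = 2.0
Omega_1 = expected_adjacency(theta_r, Pi_r, P, Pi_c, theta_c)
# Same memberships and connectivity, but rescaled degree parameters.
Omega_2 = expected_adjacency(theta_r / c, Pi_r, P, Pi_c, c * theta_c)
print(np.allclose(Omega_1, Omega_2))
```

Two different parameter sets give the same Ω here even though P(k, k) = 1 in both, so the row degree parameters cannot be pinned down without fixing the column degree parameters on Ic.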

Following an idea similar to that of Qing (2021a), we can design a spectral algorithm with consistent estimation to fit DCONM. Compared with ONM and ODCNM, the identifiability requirement of DCONM on P is so strict that DCONM can only model directed networks generated from a P with unit "diagonal" elements. This is the reason we neither present DCONM in the main text nor pursue further algorithmic and theoretical study of it.



References

Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, and Eric P. Xing. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9:1981–2014, 2008.

Edoardo M. Airoldi, Xiaopei Wang, and Xiaodong Lin. Multi-way blockmodels for analyzing coordinated high-dimensional responses. The Annals of Applied Statistics, 7(4):2431–2457, 2013.

P. Erdos and A. Renyi. On the evolution of random graphs. pages 38–82, 2011.

Nicolas Gillis and Stephen A. Vavasis. Semidefinite programming based preconditioning for more robust near-separable nonnegative matrix factorization. SIAM Journal on Optimization, 25(1):677–698, 2015.

Paul W. Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.

Jiashun Jin. Fast community detection by SCORE. Annals of Statistics, 43(1):57–89, 2015.

Jiashun Jin, Zheng Tracy Ke, and Shengming Luo. Estimating network memberships by simplex vertex hunting. arXiv: Methodology, 2017.

Antony Joseph and Bin Yu. Impact of regularization on spectral clustering. Annals of Statistics, 44(4):1765–1791, 2016.

Brian Karrer and M. E. J. Newman. Stochastic blockmodels and community structure in networks. Physical Review E, 83(1):16107, 2011.

Jing Lei and Alessandro Rinaldo. Consistency of spectral clustering in stochastic block models. Annals of Statistics, 43(1):215–237, 2015.

Xueyu Mao, Purnamrita Sarkar, and Deepayan Chakrabarti. Overlapping clustering models, and one (class) svm to bind them all. In Advances in Neural Information Processing Systems, volume 31, pages 2126–2136, 2018.

Xueyu Mao, Purnamrita Sarkar, and Deepayan Chakrabarti. Estimating mixed memberships with sharp eigenvector deviations. Journal of the American Statistical Association, pages 1–13, 2020.

Tai Qin and Karl Rohe. Regularized spectral clustering under the degree-corrected stochastic blockmodel. Advances in Neural Information Processing Systems 26, pages 3120–3128, 2013.

Huan Qing. Directed degree corrected mixed membership model and estimating community memberships in directed networks. arXiv preprint arXiv:2109.07826, 2021a.

Huan Qing. A useful criterion on studying consistent estimation in community detection. arXiv preprint arXiv:2109.14950, 2021b.

Huan Qing and Jingli Wang. Directed mixed membership stochastic blockmodel. arXiv preprint arXiv:2101.02307, 2021a.

Huan Qing and Jingli Wang. Consistency of spectral clustering for directed network community detection. arXiv preprint arXiv:2109.10319, 2021b.

Karl Rohe, Sourav Chatterjee, and Bin Yu. Spectral clustering and the high-dimensional stochastic blockmodel. Annals of Statistics, 39(4):1878–1915, 2011.

Karl Rohe, Tai Qin, and Bin Yu. Co-clustering directed graphs to discover asymmetries and directional communities. Proceedings of the National Academy of Sciences of the United States of America, 113(45):12679–12684, 2016.

Zhe Wang, Yingbin Liang, and Pengsheng Ji. Spectral algorithms for community detection in directed networks. Journal of Machine Learning Research, 21:1–45, 2020.

Yuan Zhang, Elizaveta Levina, and Ji Zhu. Detecting overlapping communities in networks using spectral methods. SIAM Journal on Mathematics of Data Science, 2(2):265–283, 2020.

Zhixin Zhou and Arash A. Amini. Analysis of spectral clustering algorithms for community detection: the general bipartite setting. Journal of Machine Learning Research, 20(47):1–47, 2019.
