arXiv:1608.06196v4 [cs.SI] 10 Aug 2019 · 2019-08-13 · A framework for the construction of...




A framework for the construction of generative models for mesoscale structure in multilayer networks

Marya Bazzi,1, 2, 3, ∗ Lucas G. S. Jeub,1, 4, 5, ∗ Alex Arenas,6 Sam D. Howison,1 and Mason A. Porter1, 7, 8

1 Oxford Centre for Industrial and Applied Mathematics, Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
2 The Alan Turing Institute, London NW1 2DB, United Kingdom
3 Warwick Mathematics Institute, University of Warwick, Coventry CV4 7AL, United Kingdom
4 Center for Complex Networks and Systems Research, School of Informatics and Computing, Indiana University, Bloomington, Indiana 47408, USA
5 ISI Foundation, Turin, Italy
6 Departament d'Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili, 43007 Tarragona, Spain
7 CABDyN Complexity Centre, University of Oxford, Oxford OX1 1HP, United Kingdom
8 Department of Mathematics, University of California, Los Angeles, Los Angeles, California 90095, USA

Multilayer networks allow one to represent diverse and coupled connectivity patterns — e.g., time-dependence, multiple subsystems, or both — that arise in many applications and which are difficult or awkward to incorporate into standard network representations. In the study of multilayer networks, it is important to investigate mesoscale (i.e., intermediate-scale) structures, such as dense sets of nodes known as communities, to discover network features that are not apparent at the microscale or the macroscale. The ill-defined nature of mesoscale structure and its ubiquity in empirical networks make it crucial to develop generative models that can produce the features that one encounters in empirical networks. Key purposes of such generative models include generating synthetic networks with empirical properties of interest, benchmarking mesoscale-detection methods and algorithms, and inferring structure in empirical multilayer networks. In this paper, we introduce a framework for the construction of generative models for mesoscale structures in multilayer networks. Our framework provides a standardized set of generative models, together with an associated set of principles from which they are derived, for studies of mesoscale structures in multilayer networks. It unifies and generalizes many existing models for mesoscale structures in fully-ordered (e.g., temporal) and unordered (e.g., multiplex) multilayer networks. One can also use it to construct generative models for mesoscale structures in partially-ordered multilayer networks (e.g., networks that are both temporal and multiplex). Our framework has the ability to produce many features of empirical multilayer networks, and it explicitly incorporates a user-specified dependency structure between layers. We discuss the parameters and properties of our framework, and we illustrate examples of its use with benchmark models for community-detection methods and algorithms in multilayer networks.

I. INTRODUCTION

One can model many physical, technological, biological, financial, and social systems as networks. The simplest type of network is a graph [1], which consists of a set of nodes (which represent entities) and a set of edges between pairs of nodes that encode interactions between those entities. One can consider either unweighted graphs or weighted graphs, in which each edge has a weight that quantifies the strength of the associated interaction. Edges can also incorporate directions to represent asymmetric interactions or signs to differentiate between positive and negative interactions.

However, this relatively simple structure cannot capture many of the possible intricacies of connectivity patterns between entities. For example, in temporal networks [2, 3], nodes and/or edges change in time; and

∗ Both authors contributed equally.

in multiplex networks [4], multiple types of interactions can occur between the same pairs of nodes. To better account for the complexity, diversity, and dependencies in real-world interactions, one can represent such connectivity patterns using "multilayer networks" (see Section II A) [4–8]. We use the term single-layer network (which is also called a "monolayer network" or a "graph") for a multilayer network with a single layer (i.e., an 'ordinary' network), and we use the term "network" to refer to both single-layer and multilayer networks.

By using a single multilayer network instead of several independent single-layer networks, one can account for the fact that connectivity patterns in different layers often "depend" on each other. (We use the term dependent in a probabilistic sense throughout the paper: an object A depends on B if and only if P(A|B) ≠ P(A).) For example, the connectivity patterns in somebody's Facebook friendship network today may depend both on the connectivity patterns in that person's Facebook friendship network last year (temporal) and on the connectivity patterns in that person's Twitter followership network today




(multiplex). Data sets that have multilayer structures are increasingly available (e.g., see Table 2 of [4]). A natural type of multilayer network consists of a sequence of dependent single-layer networks, where layers may correspond to different temporal snapshots, different types of related interactions that occur during a given time interval, and so on. Following existing terminology [8, 9], we refer to an instance of a node in one "layer" as a "state node" (see Section II A).
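The probabilistic notion of dependence used above, P(A|B) ≠ P(A), can be illustrated with a small numerical check. The joint probabilities below are purely hypothetical, chosen only to make the arithmetic transparent:

```python
# Toy illustration of probabilistic dependence: A depends on B iff P(A|B) != P(A).
# Events (hypothetical): A = "edge present in this year's Facebook layer",
#                        B = "edge present in last year's Facebook layer".
p_joint = {
    (True, True): 0.30, (True, False): 0.10,
    (False, True): 0.20, (False, False): 0.40,
}

p_A = sum(p for (a, _), p in p_joint.items() if a)   # P(A) = 0.30 + 0.10 = 0.40
p_B = sum(p for (_, b), p in p_joint.items() if b)   # P(B) = 0.30 + 0.20 = 0.50
p_A_given_B = p_joint[(True, True)] / p_B            # P(A|B) = 0.30 / 0.50 = 0.60

# A depends on B because P(A|B) = 0.6 differs from P(A) = 0.4.
dependent = abs(p_A_given_B - p_A) > 1e-12
print(dependent)  # True
```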

There is a diverse set of models for multilayer networks. (We overview them in Section II C.) Many of these models take a specific type of dependency (e.g., temporal) as their starting point. In the present paper, we introduce a framework for the construction of generative models for multilayer networks that incorporate a wide variety of structures and dependencies. It is broad enough to unify many existing, more restrictive interlayer specifications, but it is also easy to customize to yield multilayer network models for many specific cases of interest. Key purposes of such generative models include (1) generating synthetic networks with empirical features of interest, (2) benchmarking methods and algorithms for detecting mesoscale structures, and (3) inferring structure in empirical multilayer networks.

A unifying framework

A key feature of multilayer networks is their flexibility, which allows one to incorporate many different types of data as part of a single structure. In this spirit, our goal is to provide a general, unifying framework that enables users to construct generative models of multilayer networks with a large variety of features of interest in empirical multilayer networks by appropriately constraining the parameter space. We accomplish this in two consecutive steps. First, we partition the set of state nodes of a multilayer network. Second, we allocate edges, given a multilayer partition. We focus on modeling dependency at the level of partitions (as was done in [10]), rather than with respect to edges. Additionally, we treat the process of generating a multilayer partition separately from that of generating edges for a given multilayer partition. This modular approach yields random structures that can capture a wide variety of interlayer-dependency structures, including temporal and/or multiplex networks, appearance and/or disappearance of entities, uniform or nonuniform dependencies between state nodes from different layers, and others. For a specified interlayer-dependency structure, one can then use any (single-layer or multilayer) network model with a planted partition (i.e., a planted-partition network model) to generate a wide variety of network features, including weighted edges, directed edges, and spatially-embedded layers.
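The two consecutive steps above can be sketched as a minimal pipeline. The function names and the samplers here are ours, not the paper's code [14]: step 1 uses a placeholder "null distribution" (independent uniform meso-set labels, ignoring interlayer dependency), and step 2 uses a simple planted-partition edge model with in-set and out-of-set edge probabilities:

```python
import random

# Step 1: sample a multilayer partition (state node -> meso-set label).
# Placeholder null distribution: independent uniform labels per state node.
def sample_partition(state_nodes, n_sets, rng):
    return {sn: rng.randrange(n_sets) for sn in state_nodes}

# Step 2: allocate edges given the planted partition, using any
# planted-partition network model. Placeholder: connect two state nodes
# with probability p_in if they share a meso-set and p_out otherwise.
def sample_edges(state_nodes, partition, p_in, p_out, rng):
    edges = []
    for idx, u in enumerate(state_nodes):
        for v in state_nodes[idx + 1:]:
            p = p_in if partition[u] == partition[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return edges

rng = random.Random(0)
state_nodes = [(i, (t,)) for i in range(1, 5) for t in (1, 2)]  # 4 nodes, 2 layers
partition = sample_partition(state_nodes, n_sets=2, rng=rng)
edges = sample_edges(state_nodes, partition, p_in=0.9, p_out=0.1, rng=rng)
```

In the actual framework, the interlayer-dependency structure would enter in step 1 (labels in one layer conditioned on labels in other layers); the uniform sampler above omits it for brevity.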

The flexibility of our framework to generate multilayer networks with a specified dependency structure between different layers makes it possible to (1) gain insight into whether, when, and how to build interlayer dependencies into methods for studying many different types of multilayer networks and (2) generate tunable "benchmark models" that allow a principled comparison of community-detection (and, more generally, "meso-set-detection") tools for multilayer networks, including for complicated situations that arise in many applications (such as networks that are both temporal and multiplex) but thus far have seldom or never been studied. In many benchmark models, one plants a partition of a network into well-separated "meso-sets" (e.g., communities), and one thereby imposes a so-called "ground truth" (should one wish to use such a notion) [11] that a properly deployed meso-set-detection method ought to be able to find. Benchmark networks with known structural properties can be important for (1) analyzing and comparing the performance of different meso-set-detection tools; (2) better understanding the inner workings of meso-set-detection tools; and (3) determining which tool(s) may be most appropriate in a given situation.

One can also use our framework to generate synthetic networks with desired empirical properties, to generate null networks, and to explore "detectability limits" of mesoscale structures. (See, for example, the study of detectability thresholds in [10] for a model that is a special case of our framework.) With some further work, it is also possible to use our framework to develop models for statistical inference. In our concluding discussion (see Section VII), we suggest directions for how to apply our framework to the task of statistical inference. Our intention in designing such a flexible framework is to ensure that the generative models that it provides remain useful as researchers consider progressively more general multilayer networks in the coming years.

Paper outline

Our paper proceeds as follows. In Section II, we give an introduction to multilayer networks, overview mesoscale structure in networks, and review related work on generative models for mesoscale structure. In Section III, we formally introduce the notation that we use throughout the paper. In Section IV, we explain how we generate a multilayer partition with a specified dependency structure between layers. We also give examples of how to constrain the parameter space of our framework to generate several different types of multilayer networks and discuss what we expect to be common use cases (including temporal structure [10, 12, 13] and multiplex structure [9, 13]) in detail. In Section V, we describe how we generate edges that are consistent with the planted partition. In Section VI, we illustrate the use of our framework as a way to construct benchmark models for multilayer community detection. In Section VII, we summarize our main results and briefly discuss possible extensions of our work to enable statistical inference on empirical multilayer networks with our framework.

Along with this paper, we include code [14] that users can modify to readily incorporate different types of "null distributions" (see Section IV A), "interlayer-dependency structures" (see Section IV B), and planted-partition network models (see Section V). The model instantiations that one needs for generating the figures in Section VI are available at https://dx.doi.org/10.5281/zenodo.3304059.

Figure 1. Toy example of a multilayer network with three physical nodes and two aspects. We represent undirected intralayer edges using solid black lines, directed interlayer edges using dotted red lines with an arrowhead, and undirected interlayer edges using dashed blue arcs. The first aspect is ordered and corresponds to time. It has two time points (which we label as "1" and "2"). Therefore, L1 = {1, 2}. The second aspect is unordered and represents a social-media platform. It has three elements: Facebook (labeled "1"), Twitter (labeled "2"), and LinkedIn (labeled "3"). Therefore, L2 = {1, 2, 3}. In this example, a state node takes the form (i, (α1, α2)), with physical node i ∈ {1, 2, 3}, aspect α1 ∈ {1, 2}, and aspect α2 ∈ {1, 2, 3}. The total number of layers is l = |L1| × |L2| = 6.

II. BACKGROUND AND RELATED WORK

A. Multilayer networks

The simplest type of network is a graph G = (V, E), where V = {1, . . . , n} is a set of nodes (which correspond to entities) and E ⊆ V × V is a set of edges. Using a graph, one can encode the presence or absence of connections (the edges) between entities (the nodes). However, in many situations, it is desirable to include more detailed information about connections between entities. A common extension is to allow each edge to have a weight, which one can use to represent the strength of a connection. We assign a weight to each edge using a weight function w : E → R.

As we mentioned in Section I, one can also generalize the notion of a graph to encode different aspects of connections between entities, such as connections at different points in time or multiple types of relationships. We adopt the framework of multilayer networks [4, 6–8, 15] to encode such connections in a network. In a multilayer network, a node can be present in a variety of different states, where each state is also characterized by a variety of different aspects. In this setting, edges connect state nodes, each of which is the instantiation of a given node in a particular state. The set of all state nodes of a given entity corresponds to a single physical node (which represents the entity), and the set of all state nodes in a given state is in one layer of a multilayer network. In the remainder of this paper, we use the terms "physical node" and "node" interchangeably and the terms "layer" and "state" interchangeably. One can think of the aspects as features that one needs to specify to identify the state of a node. In other words, a state is a collection of exactly one element from each aspect. For convenience, we introduce a mapping that assigns an integer label to each element of an aspect. That is, we map the la elements of the ath aspect to the elements of a set {1, . . . , la} of integers. Aspects can be unordered (e.g., social-media platform) or ordered (e.g., time). Most empirical investigations of multilayer networks focus on a single aspect (e.g., temporal [16–18] or multiplex [19–21]). However, many real-world situations include more than one aspect (e.g., a multiplex network that changes over time). For an ordered aspect, we require that the mapping respects the order of the aspect (e.g., ti → i for time, where t1 ≤ . . . ≤ tla is a set of discrete time points). A multilayer network can include an arbitrary number of ordered aspects and an arbitrary number of unordered aspects, and one can generalize these ideas further (e.g., by introducing a time horizon) [4].

To illustrate the above ideas, consider a hypothetical social network with connections on multiple social-media platforms ('Facebook', 'Twitter', and 'LinkedIn') between the same set of people at different points in time. In this example network, there are two aspects: time and social-media platform. (One can consider 'type of connection' (e.g., 'friendship' and 'following') as a third aspect, but we restrict our example to two aspects for simplicity.) The first aspect is ordered, and the number of its elements is equal to the number of time points or time intervals. (For simplicity, we often refer simply to "time points" in our discussions.) The second aspect is unordered and consists of three elements: 'Facebook' (which we label with the integer "1"), 'Twitter' (which we label with "2"), and 'LinkedIn' (which we label with "3"). If we assume that the time resolution is daily and spans the year 2010, an example of a state is the tuple ('01-Jan-2010', 'Twitter'), which is (1, 2) in our shorthand notation. We show a multilayer representation of this example social network in Fig. 1.
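Under this integer labeling, the set of states is the Cartesian product of the two aspects. A quick sketch (aspect sizes as in the example: 365 daily time points for 2010 and 3 platforms):

```python
from itertools import product

platforms = {1: "Facebook", 2: "Twitter", 3: "LinkedIn"}  # unordered aspect
n_days = 365                        # daily resolution over the year 2010
times = range(1, n_days + 1)        # ordered aspect: the map ti -> i preserves order

# A state is one element from each aspect; ('01-Jan-2010', 'Twitter') is (1, 2).
states = list(product(times, platforms))
assert (1, 2) in states
assert len(states) == n_days * len(platforms)   # l = l1 * l2 = 1095 layers
```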

Edges in a multilayer network can occur either between nodes in the same state (i.e., intralayer edges) or between nodes in different states (i.e., interlayer edges). An example of an intralayer edge in Fig. 1 is (1, (1, 1)) ↔ (2, (1, 1)), indicating that entity 1 is friends with entity 2 on Facebook at time 1. All interlayer edges in Fig. 1 are diagonal (because they connect state nodes that correspond to the same physical node). Interlayer edges between layers at different times for a given social-media platform are ordinal (because they connect state nodes with successive time labels [22]) and directed. Interlayer edges between concurrent layers that correspond to different social-media platforms are categorical and undirected. (In this example, such edges occur between state nodes from all pairs of concurrent layers, although one can envision situations in which interlayer edges occur only between state nodes from a subset of such layers.) An example of an ordinal interlayer edge in Fig. 1 is (1, (1, 1)) → (1, (2, 1)), indicating a connection from entity 1 in Facebook at time 1 to entity 1 in Facebook at time 2. An example of a categorical interlayer edge in Fig. 1 is (1, (1, 1)) ↔ (1, (1, 2)), indicating a connection between node 1 in Facebook at time 1 and node 1 in Twitter at time 1. One can also imagine other types of interlayer edges in an example like the one in Fig. 1, as there may be edges between state nodes at successive times and different social-media platforms (e.g., (1, (1, 1)) → (1, (2, 2))).
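These distinctions can be made mechanical. A small sketch (with a helper function of our own devising) that classifies an edge between state nodes of the form (i, (time, platform)) in the Fig. 1 example:

```python
def classify(edge):
    """Classify an edge ((i, (t1, s1)), (j, (t2, s2))) in the Fig. 1 example."""
    (i, (t1, s1)), (j, (t2, s2)) = edge
    if (t1, s1) == (t2, s2):
        return "intralayer"                   # same state (layer)
    kinds = ["interlayer"]
    if i == j:
        kinds.append("diagonal")              # same physical node
    if s1 == s2 and abs(t1 - t2) == 1:
        kinds.append("ordinal")               # successive times, same platform
    if t1 == t2 and s1 != s2:
        kinds.append("categorical")           # concurrent layers, different platforms
    return ", ".join(kinds)

print(classify(((1, (1, 1)), (2, (1, 1)))))   # intralayer
print(classify(((1, (1, 1)), (1, (2, 1)))))   # interlayer, diagonal, ordinal
print(classify(((1, (1, 1)), (1, (1, 2)))))   # interlayer, diagonal, categorical
```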

B. Mesoscale structures in networks

Given a (single-layer or multilayer) network representation of a system, it is often useful to apply a coarse-graining technique to investigate features that lie between those at the microscale (e.g., nodes, pairwise interactions between nodes, or local properties of nodes) and those at the macroscale (e.g., total edge weight, degree distribution, or mean clustering coefficient) [1]. One thereby studies mesoscale features such as community structure [23–26], core–periphery structure [27, 28], role structure [29], and others.

Our framework can produce multilayer networks with any of the above mesoscale structures. Notwithstanding this flexibility, an important situation is multilayer networks with "community structure", which is the most commonly studied type of mesoscale structure [1]. Community detection is part of the standard toolkit for studying single-layer networks [23, 24, 30], and efforts at community detection in multilayer networks have led to insights in applications such as brain and behavioral networks in neuroscience [31], financial correlation networks [16], committee and voting networks in political science [32, 33], networks of interactions between bacterial species [21], disease-spreading networks [12], social networks in Indian villages [34], and much more.

Loosely speaking, a community in a network is a set of nodes that are "more densely" connected to each other than they are to nodes in the rest of the network [23, 24, 30, 35, 36]. Typically, a 'good community' should be a set of nodes that are 'surprisingly well-connected' in some sense, but what one means by 'surprising' and 'well-connected' is often application-dependent and subjective. In many cases, a precise definition of "community" depends on the method that one uses to detect communities. In particular, many popular community-detection approaches in single-layer and multilayer networks define communities as sets in a partition of a network that optimizes a quality function such as modularity [1, 24, 37]; stability [38–40]; Infomap and its variants [9, 41]; likelihood functions that are derived from stochastic block models (SBMs), which are models for partitioning a network into sets of nodes with statistically homogeneous connectivity patterns [21, 42–45]; and others [4, 46, 47]. In this paper, we refer to a partition of the set of nodes of a single-layer network as a single-layer partition and to a partition of the set of state nodes of a multilayer network as a multilayer partition. The two primary differences between a community in a multilayer partition and a community in a single-layer partition are that (1) the former can include state nodes from different layers; and (2) "induced communities" (see Section III) in one layer of a multilayer network may depend on connectivity patterns in other layers. For many notions of (single-layer or multilayer) community structure — including the most prominent methods — one cannot exactly solve a community-assignment problem in polynomial time (unless P = NP) [30, 48–50]; and popular scalable heuristics currently have few or no theoretical guarantees on how closely an identified partition resembles an optimal partition. These issues apply more generally to the field of cluster analysis, such as in graph partitioning [51], and many of the problems that plague community detection also apply to detecting other kinds of mesoscale structures.

Throughout our paper, to make it clear which results and observations apply to community structure in particular and which apply to mesoscale structure more generally, we use the term "community" when referring to a set in a partition of the set of nodes (or the set of state nodes) of a network that corresponds to community structure and the term "meso-set" (see Section III) when referring to a set in a partition that corresponds to any type of mesoscale structure. (In particular, a community is a type of meso-set.)

C. Generative models for mesoscale structure

The ubiquity and diversity of mesoscale structures in empirical networks make it crucial to develop generative models of mesoscale structure that can produce features that one encounters in empirical networks. Broadly speaking, the goal of such generative models is to construct synthetic networks that resemble real-world networks when one appropriately constrains and/or calibrates the parameters of a model using information about a scenario or application. Generative models of mesoscale structure can serve a variety of purposes, such as (1) generating benchmark network models for comparing meso-set-detection methods and algorithms [10, 52–56]; (2) undertaking statistical inference on empirical networks [10, 43, 57]; (3) generating synthetic networks with a desired set of properties [20, 58]; (4) generating null models to take into account available information about an empirical network [59]; and (5) investigating "detectability limits" for mesoscale structure, as one can plant partitions that, under suitable conditions, cannot subsequently be detected algorithmically, despite the fact that they exist by construction [10, 60, 61].

One of the main challenges in constructing a realistic generative model (even for single-layer networks) is the breadth of possible empirical features in networks. The available generative models for mesoscale structure in single-layer networks usually focus on replicating a few empirical features at a time (rather than all of them at once): heterogeneous degree distributions and community-size distributions [43, 53, 62], edge-weight distribution [52, 57, 63], spatial embeddedness [12, 64], and so on. Multilayer networks inherit all of the empirical features of single-layer networks, and they also have a key additional one: dependencies between layers. These interlayer dependencies can be ordered (as in most models of temporal networks), unordered (as in multiplex networks), combinations of these, or something more complicated. However, despite this variety, existing generative models for mesoscale structure in multilayer networks allow only a restrictive set of interlayer dependencies. They assume either a temporal structure [10, 12, 54, 65–67], a simplified multiplex structure with the same planted partitions across all layers [20, 44, 68–70], or independent groups of layers such that layers in the same group have identical planted partitions [9, 21]. Using an alternative approach, a very recent model is able to generate multilayer partitions that satisfy the constraint that nonempty induced meso-sets in different layers that correspond to the same meso-set in the multilayer partition are identical [71]. Recent work [13] on the link between multilayer modularity maximization and maximum-likelihood estimation of multilayer SBMs allows either temporal or multiplex interlayer dependencies with induced partitions that can vary across layers, but it makes restrictive assumptions on interlayer dependencies (e.g., all layers have the same set of nodes, interdependencies are "diagonal" and "layer-coupled", and so on). Importantly, in all aforementioned generative models of mesoscale structure in multilayer networks, interlayer dependencies are either (1) not explicitly specifiable or (2) special cases of the framework that we discuss in the present paper.

III. NOTATION

We now present a comprehensive set of notation for multilayer networks. Different subsets of notation are useful for different situations.

We consider a multilayer network M = (VM, EM, V, L) with n = |V| nodes (i.e., physical nodes) and l = |L| layers (i.e., states). We use d to denote the number of aspects and La = {1, . . . , la} to denote the labels of the ath aspect (where a ∈ {1, . . . , d}). We use O to denote the set of ordered aspects and U to denote the set of unordered aspects. For each ordered aspect a ∈ O, we assume that the labels La reflect the order of the aspect. That is, for all α, β ∈ La, we require that α < β if and only if α precedes β. We say that a multilayer network is fully-ordered if U = ∅, unordered if O = ∅, and partially-ordered otherwise. The set L = L1 × . . . × Ld of states is the Cartesian product of the aspects, where a state α ∈ L is an integer-valued vector of length d and each entry specifies an element of the corresponding aspect.

Note that l = |L| = ∏da=1 la.

We use (i,α) ∈ VM ⊆ V × L to denote the state node (i.e., "node-layer tuple" [4]) of physical node i ∈ V in state α ∈ L. We include a state node in VM if and only if the corresponding node exists in that state. The edges EM ⊆ VM × VM in a multilayer network connect state nodes to each other. We use ((i,α), (j,β)) to denote a directed edge from (i,α) to (j,β). For two state nodes, (i,α) and (j,β), that are connected to each other via a directed edge ((i,α), (j,β)) ∈ EM, we say that (i,α) is an in-neighbor of (j,β) and (j,β) is an out-neighbor of (i,α). We categorize the edges into intralayer edges EL, which have the form ((i,α), (j,α)) and link entities i and j in the same state α, and interlayer (i.e., coupling) edges EC, which have the form ((i,α), (j,β)) for α ≠ β. We thereby decompose the edge set as EM = EL ∪ EC.
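To make this notation concrete, the following minimal sketch (our own hypothetical data structures, not code from the paper) represents a state node as a tuple (i, alpha), with alpha itself a tuple of aspect labels, and recovers in-neighbors and the decomposition EM = EL ∪ EC:

```python
# Hypothetical sketch: a state node is (i, alpha), where alpha is a
# tuple of aspect labels; an edge is a directed pair of state nodes.

def in_neighbors(edges, target):
    """State nodes with a directed edge into `target`."""
    return {src for (src, dst) in edges if dst == target}

def split_edges(edges):
    """Decompose the edge set E_M into intralayer edges E_L (same
    state alpha) and interlayer (coupling) edges E_C (alpha != beta)."""
    EL = {(src, dst) for (src, dst) in edges if src[1] == dst[1]}
    return EL, set(edges) - EL
```

For example, an edge ((1, (1,)), (1, (2,))) couples physical node 1 to itself across states (1,) and (2,), so it lands in EC.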

We define a weighted multilayer network by introducing a weight function w : EM → R (analogous to the weight function for weighted single-layer networks), which encodes the edge weights within and between layers. For an unweighted multilayer network, w(e) = 1 for all e ∈ EM. We encode the connectivity pattern of a multilayer network using an adjacency tensor A, which is analogous to the adjacency matrix for single-layer networks, with entries

    A^{j,β}_{i,α} = w(((i,α), (j,β))) if ((i,α), (j,β)) ∈ EM , and A^{j,β}_{i,α} = 0 otherwise .  (1)

Note that GM = (VM, EM) is a graph on the state nodes of the multilayer network M. We refer to GM as the flattened network of M. The adjacency matrix of the flattened network is the "supra-adjacency matrix" [4, 15, 72, 73] of the multilayer network. One obtains a supra-adjacency matrix by flattening [74] an adjacency tensor (see Eq. (1)) of a multilayer network. The multilayer network and the corresponding flattened network encode the same information [4, 75], provided one keeps track of the correspondence between the nodes in the flattened network and the physical nodes and layers of the multilayer network. In a similar vein, aspects provide a convenient way to keep track of the correspondence between state nodes and, for example, temporal and multiplex relationships.
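As a concrete illustration of flattening, here is a minimal sketch (our own helper, not from the paper) for a single-aspect multilayer network, using one common indexing convention in which state node (i, α) maps to row α·n + i of the supra-adjacency matrix:

```python
# Hypothetical sketch of flattening an adjacency tensor A[i][alpha][j][beta]
# (nested lists) into an (n*l) x (n*l) supra-adjacency matrix. We assume
# the convention that state node (i, alpha) maps to index alpha * n + i,
# with layers labeled 0, ..., l-1.

def flatten(A, n, l):
    """Return the supra-adjacency matrix of the adjacency tensor A."""
    size = n * l
    M = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for alpha in range(l):
            for j in range(n):
                for beta in range(l):
                    M[alpha * n + i][beta * n + j] = A[i][alpha][j][beta]
    return M
```

Keeping the map (i, α) ↦ α·n + i alongside the matrix is exactly the bookkeeping that makes the flattened network and the multilayer network encode the same information.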

We denote a multilayer partition with nset sets by S = {S1, . . . , S_{nset}}, where ⋃_{s=1}^{nset} Ss = VM and Ss ∩ Sr = ∅ for s ≠ r. We represent a partition S using a partition tensor S with entries S_{i,α}, where S_{i,α} = s if and only if the state node (i,α) is in the set Ss. We refer to a partition of a temporal network (a common type of multilayer network) as a temporal partition and a partition of a multiplex network (another common type of multilayer network) as a multiplex partition. A multilayer partition induces a partition S|α = {S1|α, . . . , S_{nset}|α} on each layer, where Ss|α = {i ∈ V : (i,α) ∈ Ss}. We refer to a set Ss of a partition S as a meso-set, to S|α as the induced partition of S on layer α, and to Ss|α as the induced meso-set of Ss on layer α. A community is a set Ss in a partition that corresponds to community structure. We call s ∈ {1, . . . , nset} the label of meso-set Ss. One way to examine overlapping meso-sets is by identifying multiple state nodes from a single layer with the same physical node.
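The induced partition S|α is straightforward to extract from a partition tensor. The sketch below (a hypothetical helper; the partition is stored as a dict from state nodes to labels, our own representation choice) returns the induced meso-sets on a given layer:

```python
# Hypothetical sketch: a multilayer partition stored as a dict that maps
# state nodes (i, alpha) to meso-set labels s.

def induced_partition(S, layer):
    """Return the induced partition on `layer` as a dict mapping each
    meso-set label s to the induced meso-set {i : (i, layer) in S_s}."""
    out = {}
    for (i, alpha), s in S.items():
        if alpha == layer:
            out.setdefault(s, set()).add(i)
    return out
```

Note that a label can appear in some layers and not others, which is how a meso-set's induced meso-sets can be empty in some layers.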

IV. GENERATING MULTILAYER PARTITIONS

The systematic analysis of dependencies between layers is a key motivation for analyzing a single multilayer network, instead of examining several single-layer networks independently. The goal of our partition-generation process is to model interlayer dependency in a way that can incorporate diverse types of dependencies. We now motivate our partition-modeling approach, and we describe it in detail in Sections IV A–D.

The complexity of dependencies between layers can make it difficult to explicitly specify a joint probability distribution for meso-set assignments, especially for unordered or partially-ordered multilayer networks. To address this issue, we require only the specification of conditional probabilities for a state node's meso-set assignment, given the assignments of all other state nodes. The idea of conditional models is old and follows naturally from Markov-chain theory [76, 77]. Specifying conditional models (which capture different dependency features separately) rather than joint models (which try to capture many dependency features at once) is convenient for numerous situations. For example, conditionally specified distributions have been applied to areas such as spatial data modeling [78], imputation of missing data [77], secure disclosure of information [79], dependence networks for combining databases from different sources [80], and Gibbs sampling [81, 82].

A problem with conditional models is that the different conditional probability distributions are not necessarily compatible, in the sense that there may not exist a joint distribution that realizes all conditional distributions [80, 82–85]. Although several methods have been developed in recent years for checking the compatibility of discrete conditional distributions, these either make restrictive assumptions on the conditional distributions or have scalability constraints that significantly limit practical use [77, 86]. Nevertheless, the employment of conditional models (even if potentially incompatible) is common [87], and they have been useful in many applications, provided that one cautiously handles any potential incompatibility [77]. For our use case, potential incompatibility arises for unordered or partially-ordered multilayer networks. An issue that can result from it is the non-uniqueness of a joint distribution. We carefully design our partition-generation process such that it is well-defined irrespective of whether conditional distributions are compatible. In particular, we show that convergence is guaranteed and we address non-uniqueness by appropriately sampling initial conditions (see Appendix B). We also suggest empirical checks to ensure that a generated partition reflects planted interlayer dependencies.

In our framework, we define conditional probabilities in two parts: we separately specify (1) independent layer-specific random components and (2) interlayer dependencies. For a choice of interlayer dependencies, there are several features that one may want to allow in a multilayer partition. These include variation in the numbers and sizes of meso-sets across layers (e.g., meso-sets can gain state nodes, lose state nodes, appear, and disappear) and the possibility for these meso-set variations to incorporate features of the application at hand. (For example, in a temporal network, one may not want a meso-set to reappear after it disappears.) We ensure that such features are possible for any choice of interlayer dependency via independent layer-specific null distributions that specify the set of possible meso-set assignments for each layer and determine the expected size and expected number of meso-sets in the absence of interlayer dependencies.

Interlayer dependencies should reflect the type of multilayer network that one is investigating. For example, in a temporal network, dependencies tend to be stronger between adjacent layers and weaker for pairs of layers that are farther apart. It is common to assume that interlayer dependencies are uniform across state nodes for a given pair of layers (i.e., interlayer dependencies are "layer-coupled" [4]) and occur only between state nodes that correspond to the same physical node (i.e., interlayer dependencies are "diagonal" [4]) [12, 16, 21, 33, 61, 72, 73, 88]. However, these assumptions are too restrictive in many cases (e.g., situations in which dependencies are state-node-dependent or in which dependencies exist between state nodes that do not correspond to the same physical node) [9, 46, 70, 89–91]. To ensure that one can relax these assumptions, we allow dependencies to be specified at the level of state nodes (or at the level of layers, when a user assumes that dependencies are layer-coupled). We encode these interlayer dependencies in a user-specified interlayer-dependency tensor that determines the extent to which a state node's meso-set assignment in one layer "depends directly" on the assignment of state nodes in other layers (see Sections IV B and C). Our independence assumption on the null distributions allows us to encode all of the interlayer dependencies in a single object (an interlayer-dependency tensor). The entries of an interlayer-dependency tensor specify an "interlayer-dependency network" and correspond to the causal links for the flow of information (in the information-theoretic sense) between different layers of a multilayer network. Therefore, they should reflect any constraints that one


wishes to impose on the direct flow of information between layers. Longer paths in an interlayer-dependency network yield indirect dependencies between structures in different layers.

After specifying the interlayer-dependency structure, we define the conditional meso-set assignment probabilities such that we either sample a state node's meso-set assignment from the corresponding null distribution or obtain it by copying the assignment of another state node (based on the interlayer-dependency tensor). Using these conditional probabilities, we define an iterative update process on the meso-set assignments of state nodes to generate multilayer partitions with dependencies between induced partitions in different layers. When updating the meso-set assignments of state nodes, we respect the order of an aspect (e.g., temporal ordering). For a fully-ordered multilayer network, our update process reduces to sequentially sampling an induced partition for each layer based on the induced partitions of previous layers. For an unordered multilayer network, our update process defines a Markov chain on the space of multilayer partitions. We sample partitions from a stationary distribution of this Markov chain. This sampling strategy is known as (pseudo-)Gibbs sampling [80–82, 87]. (We use the word "pseudo" because the conditional probabilities that we use to define the Markov chain are not necessarily compatible.) For a partially-ordered multilayer network, our update process combines these two sampling strategies.

In Section IV A, we describe possible choices for the independent layer-specific null distributions. In Section IV B, we explain our framework for generating a multilayer partition with general interlayer dependencies. In Section IV C, we focus on the specific situation in which interlayer dependencies are layer-coupled and diagonal, and we also assume that a physical node is present in all layers (i.e., the network is "fully interconnected" [4]). In Section IV D, we illustrate the properties of example temporal and multiplex partitions that are generated by models that we construct from our framework. Additionally, we take advantage of the tractability of the special case of temporal networks to highlight some of its properties analytically.

A. Null distribution

We denote the null distribution of layer α by P^α_0 and the set of all null distributions by P_0 = {P^α_0 , α ∈ L}. A simple choice for the null distributions is a categorical distribution, where for each layer α and each meso-set label s, we fix the probability p^α_s that an arbitrary state node in layer α is assigned to a meso-set s in the absence of interlayer dependencies. That is,

    P^α_0[s] = p^α_s for s ∈ {1, . . . , nset} , and P^α_0[s] = 0 otherwise ,  (2)

where nset is the total number of meso-sets in the multilayer partition and ∑_{s=1}^{nset} p^α_s = 1 for all α ∈ L. The set {1, . . . , nset} is the set of meso-set labels, and the support of a null distribution is the set of labels that have nonzero probability. In the absence of interlayer dependencies, a categorical null distribution corresponds to specifying independent multinomial distributions for the sizes of induced meso-sets on each layer and fixing the expected size n p^α_s of each induced meso-set. Therefore, by choosing the probabilities p^α_s, one has some control over the expected number and sizes of meso-sets in a sampled multilayer partition. A natural choice for p^α is to sample it from a Dirichlet distribution, which is the conjugate prior for the categorical distribution [92, 93]. One can think of the Dirichlet distribution, which is the multivariate form of the beta distribution, as a probability distribution over the space of all possible categorical distributions with a given number of categories. Any other (probabilistic or deterministic) choice for p^α is also possible. We give further detail about the Dirichlet distribution in Appendix A, where we discuss how one can vary its parameters to control the expected number and sizes of meso-sets in the absence of interlayer dependencies. This can allow a user to generate, for example, a null distribution with equally-sized meso-sets or a null distribution with a few large meso-sets and many small meso-sets. Furthermore, irrespective of the particular choice of categorical distribution, it may be desirable to have meso-sets that have a nonzero probability of obtaining state nodes from the null distribution in some, but not all, layers. In Appendix A, we give examples of how one can sample the support of the null distributions before the probability vectors to incorporate this property when modeling the birth and/or death of meso-sets in temporal networks and the presence and/or absence of meso-sets in multiplex networks.

In general, the choice of the null distributions can have a large effect on the set of sampled multilayer partitions and ought to be guided by one's use case. For our numerical examples in Section VI, we fix a value of nset ∈ {1, . . . , nl}, where n is the number of physical nodes in each layer and l is the number of layers (see Section III); and we use a symmetric Dirichlet distribution with parameters q = nset and θ = 1 (see Appendix A) to sample probability vectors p^α of length nset. This produces multilayer partitions in which the expected meso-set labels are the same across layers (and given by {1, . . . , nset}) in the absence of interlayer dependencies and for which the expected induced meso-set sizes (which are n p^α_s in the absence of interlayer dependencies) differ across layers.
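A Dirichlet-distributed categorical null distribution can be sketched as follows. This is our own minimal implementation (standard-library gamma variates; we assume the paper's parameters q and θ mean a symmetric Dirichlet with q categories and concentration θ), not the paper's code:

```python
import random

def dirichlet(theta, q):
    """Sample a probability vector of length q from a symmetric
    Dirichlet distribution with concentration theta, via the standard
    normalized-gamma construction."""
    g = [random.gammavariate(theta, 1.0) for _ in range(q)]
    total = sum(g)
    return [x / total for x in g]

def sample_null(p):
    """Draw a meso-set label s in {1, ..., len(p)} from the categorical
    null distribution with probabilities p (cf. Eq. (2))."""
    r, acc = random.random(), 0.0
    for s, ps in enumerate(p, start=1):
        acc += ps
        if r < acc:
            return s
    return len(p)  # guard against floating-point rounding
```

With θ = 1, all probability vectors of length q are equally likely; θ ≫ 1 concentrates mass near equally-sized meso-sets, and θ ≪ 1 favors a few large meso-sets and many small ones.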

B. General interlayer dependencies

We denote the user-specified interlayer-dependency tensor by P, where P^{j,β}_{i,α} is the probability that state node (j,β) copies its meso-set assignment from state node (i,α), for any two state nodes (i,α), (j,β) ∈ VM, and where P is fixed throughout the copying process. The probability that state node (j,β) copies its meso-set assignment from an arbitrary state node when state node (j,β)'s meso-set assignment is updated is

    p_{j,β} = ∑_{(i,α) ∈ VM} P^{j,β}_{i,α} ,  (3)

where we require that p_{j,β} ≤ 1 for all state nodes (j,β) ∈ VM. We also require that all intralayer probabilities are 0; that is, P^{j,α}_{i,α} = 0 for all i, j ∈ V and α ∈ L.

We say that a state node (j,β) depends directly on a state node (i,α) if and only if P^{j,β}_{i,α} is nonzero. By extension, we say that a layer β depends directly on a layer α if there exists at least one state node in layer β that depends directly on a state node in layer α. The interlayer-dependency tensor induces the interlayer-dependency network, whose edges are all interlayer, directed, and pointing in the direction of information flow between layers. The in-neighbors (see Section III) of a state node (j,β) in this network consist of the state nodes from which (j,β) can copy a meso-set assignment with nonzero probability. The support of the null distribution P^β_0 corresponds to the only possible meso-set assignments for a state node (j,β) whenever the state node does not copy its assignment from one of its in-neighbors in the interlayer-dependency network.

For a given null distribution P_0 and interlayer-dependency tensor P, a multilayer partition that results from our sampling process depends on four choices: (1) the way in which we update a state node assignment at a given step; (2) the order in which we update state node assignments; (3) the initial multilayer partition; and (4) the criterion for convergence of the iterative update process. We discuss points (1), (2), and (3) in the remainder of this subsection. We describe the sampling process in more detail and discuss convergence in Appendix B.

Update equation

A single meso-set-assignment update depends only on the choice of state node to update and on the current multilayer partition. Let τ be an arbitrary update step of the copying process. Suppose that we are updating the meso-set assignment of state node (j,β) at step τ and that the current multilayer partition is S(τ) (with partition tensor S(τ)). We update the meso-set assignment of state node (j,β) either by copying the meso-set assignment in S(τ) from one of its in-neighbors in the interlayer-dependency network or by obtaining a new, random meso-set assignment from the null distribution P^β_0 for layer β. In particular, with probability p_{j,β}, a state node (j,β) copies its meso-set assignment from one of its in-neighbors in the interlayer-dependency network; and with probability 1 − p_{j,β}, it obtains its meso-set assignment from the null distribution P^β_0. This yields the following update equation at step τ of our copying process:

    P[S_{j,β}(τ + 1) = s | S(τ)] = ∑_{(i,α) ∈ VM} P^{j,β}_{i,α} δ(S_{i,α}(τ), s) + (1 − p_{j,β}) P^β_0[S_{j,β} = s] .  (4)

The update equation (4) is at the heart of our framework. It is clear from Eq. (4) that the set of null distributions P_0 is responsible for the specification of meso-set assignments in the absence of interlayer dependencies (i.e., if P^{j,β}_{i,α} = 0 for all (i,α), (j,β)). In general, p_{j,β} determines the relative importance of interlayer dependencies and the null distribution on the meso-set assignment of state node (j,β). Specifically, when p_{j,β} = 0, the meso-set assignment of (j,β) depends only on the null distribution; and when p_{j,β} = 1, the meso-set assignment of (j,β) depends only on the meso-set assignments of its in-neighbors in the interlayer-dependency network.
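A single update of Eq. (4) can be sketched as follows. This is a hypothetical implementation with our own data structures (P stored as a dict mapping each state node to the copy probabilities of its in-neighbors; null_dists mapping each layer to a sampler callable), not the paper's code:

```python
import random

def update_assignment(jbeta, S, P, null_dists):
    """One update of Eq. (4) for state node jbeta = (j, beta): with
    probability P[(i, alpha) -> jbeta], copy the current assignment of
    in-neighbor (i, alpha); with the remaining probability
    1 - p_{j,beta}, draw a fresh label from layer beta's null
    distribution."""
    r, acc = random.random(), 0.0
    for ialpha, prob in P.get(jbeta, {}).items():
        acc += prob
        if r < acc:
            return S[ialpha]      # copy an in-neighbor's meso-set label
    _, beta = jbeta
    return null_dists[beta]()     # fresh draw from the null distribution
```

Because the copy probabilities for jbeta sum to p_{j,β} ≤ 1, falling through the loop occurs with probability 1 − p_{j,β}, matching the second term of Eq. (4).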

Update order

The order in which we update meso-set assignments of state nodes via Eq. (4) can influence a generated multilayer partition. As we mentioned in Section II A, an aspect of a multilayer network can be either ordered or unordered, and the update order is of particular importance when generating a multilayer partition with at least one ordered aspect. In particular, the structure of the interlayer-dependency tensor should reflect the causality that arises from the order of the aspect's elements. For an ordered aspect, structure in a given layer should depend directly only on structure in previous layers. Formally, for each ordered aspect a ∈ O of a multilayer network, we require that

    P^{j,β}_{i,α} > 0 ⇒ α_a ≤ β_a ,  (5)

where α_a denotes the element of state α corresponding to aspect a and where (as we stated in Section III) we require that the labels of an aspect's elements reflect the ordering of those elements [94]. For example, if we think of the interlayer edges in Fig. 1 as interlayer edges in the interlayer-dependency network, then Eq. (5) states that all edges with nonzero edge weights must respect the arrow of time. That is, it is impossible to have edges of the form ((j, (2, α2)), (i, (1, α2))), with nodes i, j ∈ {1, 2, 3} and aspect α2 ∈ {1, 2, 3}.

The update order for the state nodes also needs to reflect the causality from the ordered aspects. In particular, if a ∈ O is an ordered aspect and we update a state node in a layer β, then the final meso-set assignment for state nodes in layers α with α_a < β_a should already be fixed. For example, in Fig. 1, an edge direction should respect the arrow of time; and it is also necessary that the


assignments of all state nodes in the first time point are fixed before one updates the assignments of state nodes in the second time point. To achieve the latter, we divide the layers into sets of "order-equivalent" layers, such that two layers α and β are order-equivalent if and only if α_a = β_a for all a ∈ O. We also say that layer α precedes layer β if and only if α_a < β_a for all a ∈ O. (For example, the three layers in Fig. 1 that correspond to the first time point are order-equivalent and precede the three layers that correspond to the second time point.) Based on this equivalence relation, we obtain an ordered set of equivalence classes by inheriting the order from the ordered aspects. If we sort the classes of order-equivalent layers based on "lexicographic" order [95] of the ordered aspects, then (as a consequence of Eq. (5)) the meso-set assignment of state nodes in a given layer depends only on the assignment of state nodes in order-equivalent or preceding layers. This ensures that our partition-generation process reflects notions of causality that arise in multilayer networks with at least one ordered aspect.
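The grouping and ordering of layers described above can be sketched compactly. In this hypothetical helper (our own construction), layers are tuples of aspect labels and ordered_aspects lists the indices of the ordered aspects:

```python
from itertools import groupby

def order_classes(layers, ordered_aspects):
    """Group layers (tuples of aspect labels) into classes of
    order-equivalent layers (equal labels on every ordered aspect) and
    sort the classes lexicographically by the ordered aspects, so that
    each class depends only on order-equivalent or preceding layers."""
    key = lambda layer: tuple(layer[a] for a in ordered_aspects)
    # Sort primarily by the ordered-aspect labels (groupby needs sorted
    # input); the secondary sort just makes within-class order stable.
    layers = sorted(layers, key=lambda layer: (key(layer), layer))
    return [list(group) for _, group in groupby(layers, key=key)]
```

With no ordered aspects (an unordered network), every layer falls into a single class; with all aspects ordered (a fully-ordered network), every class is a singleton.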

Sampling process

The general idea of our sampling process is to simultaneously sample the meso-set assignment of state nodes within each class of order-equivalent layers and to sequentially sample the meso-set assignment of state nodes in non-order-equivalent layers, conditional on the fixed assignment of state nodes in preceding layers.

More specifically, our sampling algorithm proceeds as follows. First, we sample an initial multilayer partition from the null distributions (i.e., S_{i,α}(0) ∼ P^α_0). Second, we sample a partition for the first class of order-equivalent layers using (pseudo-)Gibbs sampling. In particular, we iteratively sample a layer uniformly at random from the first class of order-equivalent layers and update the meso-set assignment of all state nodes in that layer [96] based on Eq. (4). This defines a Markov chain on a subspace of multilayer partitions. We repeat the update process for sufficiently many iterations such that we approximately sample the meso-set assignments of the first class of order-equivalent layers from a stationary distribution of this Markov chain. Third, we sample state-node assignments for subsequent classes of order-equivalent layers in the same way, based on a fixed state-node assignment from preceding layers. In particular, at each update step τ in Eq. (4), a state node can either copy a fixed meso-set assignment from an in-neighbor in a preceding layer, copy a current meso-set assignment from an in-neighbor in an order-equivalent layer, or obtain a meso-set assignment from the null distribution.

For a fully-ordered multilayer network (i.e., a multilayer network with U = ∅), each class of order-equivalent layers consists only of a single layer. Consequently, one needs only a single update for each state node for the sampling process to converge. (Subsequent updates would constitute independent samples from the same distribution.) For an unordered multilayer network, our sampling algorithm reduces to (pseudo-)Gibbs sampling, because all layers are order-equivalent and there is thus only one class of order-equivalent layers. For a partially-ordered multilayer network, there are multiple non-singleton classes of order-equivalent layers. We use (pseudo-)Gibbs sampling on each class, conditional on the meso-set assignment of state nodes in preceding layers.

In Appendix B, we explain our sampling process in more detail. We describe our (pseudo-)Gibbs sampling procedure for each class of order-equivalent layers, and we show pseudo-code that one can use to sample partitions in unordered, partially-ordered, and fully-ordered multilayer networks in Algorithm 1. We also discuss the convergence properties of our sampling procedure for unordered and partially-ordered multilayer networks, including the effect of the potential incompatibility of the distributions that are defined by Eq. (4).
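The three-step process above can be condensed into a single sketch. This is not the paper's Algorithm 1; it is our own hypothetical implementation (layers as hashable labels, classes of order-equivalent layers given in lexicographic order, P and null_dists as in the earlier sketches), and the fixed sweep count stands in for a proper convergence criterion:

```python
import random

def sample_partition(classes, state_nodes, P, null_dists, sweeps=100):
    """Sketch of the sampling process: draw S(0) from the null
    distributions, then process the classes of order-equivalent layers
    in order; within each class, repeatedly update the state nodes of a
    randomly chosen layer via the Eq.-(4) rule ((pseudo-)Gibbs
    sampling), with preceding classes already fixed."""
    S = {(i, a): null_dists[a]() for (i, a) in state_nodes}
    for cls in classes:                       # preceding classes stay fixed
        members = [(i, a) for (i, a) in state_nodes if a in cls]
        for _ in range(sweeps):
            layer = random.choice(list(cls))  # uniformly random layer
            for (i, a) in members:
                if a != layer:
                    continue
                # Eq. (4): copy from an in-neighbor, else redraw.
                r, acc, new = random.random(), 0.0, None
                for src, prob in P.get((i, a), {}).items():
                    acc += prob
                    if r < acc:
                        new = S[src]
                        break
                S[(i, a)] = new if new is not None else null_dists[a]()
    return S
```

For a fully-ordered network, each class is a singleton and one sweep per class suffices, which recovers the sequential sampling described above.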

C. Layer-coupled and diagonal interlayer dependencies

As we mentioned at the beginning of Section IV, a particularly useful restriction of the interlayer-dependency tensor that still allows us to represent many situations of interest is to assume that it is layer-coupled and diagonal. For simplicity, we also assume that the multilayer network is fully interconnected. Under these assumptions, we can express the interlayer-dependency tensor P using a layer-coupled interlayer-dependency tensor P with elements P^{j,β}_{i,α} = δ(i, j) P^β_α. The probability that state node (j,β) copies its meso-set assignment from an arbitrary state node when (j,β)'s meso-set assignment is updated is given by

    p_β = ∑_{α ∈ L} P^β_α .  (6)

As before, we require that p_β ≤ 1 and P^α_α = 0 for all α ∈ L. The update equation (Eq. (4)) then simplifies to

    P[S_{j,β}(τ + 1) = s | S(τ)] = ∑_{α ∈ L} P^β_α δ(S_{j,α}(τ), s) + (1 − p_β) P^β_0[S_{j,β} = s] ,  (7)

which depends on the interlayer-dependency tensor P (which is now independent of state nodes) and the null distributions P_0. Each term P^β_α quantifies the extent to which an induced partition in layer β depends directly on an induced partition in layer α. By considering different P, one can generate multilayer networks that correspond to several important scenarios, including temporal networks, multiplex networks, and multilayer networks with more than one aspect (e.g., with combinations of temporal and multiplex features). That is, the above restriction reduces the dimension of the interlayer-dependency tensor significantly, while still allowing one to analyze several very important situations.

Figure 2. Layer-coupled interlayer-dependency tensors (which, in this case, are matrices) for different types of multilayer networks with a single aspect. (a) Layer-coupled temporal dependencies: P^β_α = δ(α + 1, β) p_β. In a typical temporal network, an induced partition in a layer depends directly only on the induced partition in the previous layer, so the only nonzero elements of the layer-coupled interlayer-dependency tensor occur in the first superdiagonal, and the copying probability in Eq. (6) is p_β ≤ 1. (b) Layer-coupled multiplex dependencies: P^β_α = (1 − δ(α, β)) p. In a typical multiplex network, an induced partition in a layer depends directly on the induced partitions in all other layers; in the layer-coupled example that we show, every off-diagonal element is p, so the copying probability in Eq. (6) is (l − 1) p ≤ 1 and is the same for all layers.

In Fig. 2, we show layer-coupled interlayer-dependency tensors for two types of single-aspect multilayer networks: a temporal network and a multiplex network. As we mentioned in Section IV, it is useful to think of interlayer dependencies as causal links for the flow of information between layers. In a temporal network, it is typical to assume that an induced partition in a layer depends directly only on induced partitions in the previous layer. There are thus l − 1 copying probabilities (one for each pair of consecutive layers), which we are free to choose. Common examples include choosing the same probability for each pair of consecutive layers [10, 12] or making some of the probabilities significantly smaller than the others to introduce change points [44, 97]. In a multiplex network, an induced partition in any layer can in principle depend directly on induced partitions in all other layers. This yields l(l − 1) copying probabilities to choose. In Fig. 2b, we illustrate the simplest case, in which each layer depends equally on every other layer. The layer-coupled interlayer-dependency tensors in Fig. 2 are matrices, so we sometimes refer to them and other examples of layer-coupled interlayer-dependency tensors with a single aspect as layer-coupled interlayer-dependency matrices.

Figure 3. Block-matrix representation of a layer-coupled interlayer-dependency tensor, with entries P^{β1,β2}_{α1,α2} = (1 − δ(α1, β1)) δ(α2, β2) p_{β1,β2} + δ(α1, β1) δ(α2 + 1, β2) p_{β1,β2}, for a multilayer network that is both temporal and multiplex. This example has more than one aspect, and it combines the features of the examples in Fig. 2. It has two classes of order-equivalent layers (where each corresponds to one time point), as indicated by the presence of two diagonal blocks. An induced partition in a layer depends directly on the induced partitions of its order-equivalent layers and on the induced partition of the corresponding layer in the preceding class of order-equivalent layers. If we think of the interlayer edges in Fig. 1 as edges in an interlayer-dependency network, this matrix with diagonal blocks of size 3 × 3 would be the corresponding layer-coupled interlayer-dependency tensor.

We can also generate multilayer networks with more than one aspect and can thereby combine temporal and multiplex features. In Fig. 3, we illustrate how to construct a layer-coupled interlayer-dependency tensor to generate such a multilayer network on a simple example with two aspects, one of which is multiplex and the other of which is temporal.
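The two layer-coupled interlayer-dependency matrices of Fig. 2 are easy to construct programmatically. The sketch below uses our own hypothetical helper functions (layers indexed 0, …, l−1; entries stored as P[beta][alpha], so summing a row gives the copying probability of Eq. (6)):

```python
def temporal_P(l, p):
    """Layer-coupled temporal dependency matrix (cf. Fig. 2a):
    P[beta][alpha] = p iff alpha + 1 == beta, so each layer copies only
    from its immediate predecessor; the first layer copies from no one."""
    return [[p if alpha + 1 == beta else 0.0 for alpha in range(l)]
            for beta in range(l)]

def multiplex_P(l, p_total):
    """Layer-coupled multiplex dependency matrix (cf. Fig. 2b): every
    off-diagonal entry is p_total / (l - 1), so each layer's total
    copying probability is p_total <= 1."""
    p = p_total / (l - 1)
    return [[0.0 if alpha == beta else p for alpha in range(l)]
            for beta in range(l)]
```

Making a few superdiagonal entries of temporal_P much smaller than the rest would introduce change points, and a block combination of the two constructions would give a tensor in the spirit of Fig. 3.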

D. Temporal and multiplex partitions

In Fig. 4 and Fig. 5, we show example multilayer partitions that we obtain with the interlayer-dependency tensors of Fig. 2. The examples in Fig. 4 are temporal, and the examples in Fig. 5 are multiplex. For simplicity, we assume in Fig. 2a that dependencies between contiguous layers are uniform (i.e., p_β = p ∈ [0, 1] for all β ∈ {2, . . . , l}). To illustrate the effect of the interlayer dependencies, we also show a heat map of the normalized mutual information (NMI) [98] between induced partitions in different layers. NMI is a measure of similarity between partitions, so we expect the values of NMI to


Figure 4. Example temporal partitions for (n, l) = (150, 100). (Recall that n is the number of physical nodes and l is the number of layers.) We use the interlayer-dependency tensor from Fig. 2a with uniform probabilities p_β = p for all β ∈ {1, . . . , l} and a Dirichlet null distribution with q = 1, θ = 1, and nset = 5 (see Appendix A). For (a) p = 0, (b) p = 0.5, (c) p = 0.85, (d) p = 0.95, (e) p = 0.99, and (f) p = 1, we show color-coded meso-set assignments (top) for a single example output partition and NMI values [98] (which range from 0 to 1; bottom) between induced partitions in different layers, averaged over a sample of 10 output partitions. The parameter values match those in the numerical examples in Section VI (with the exception of p = 0, which we include for completeness). We choose a node ordering for each visualization that (whenever possible) emphasizes "persistent" mesoscale structure [16]. We show only the first 15 layers of each multilayer partition, because (as one can see in the NMI heat maps) similarities between induced partitions for p < 1 decay steeply with the number of layers when dependencies exist only between contiguous layers.

reflect the planted dependencies. As expected, for thetemporal examples, partitions are most similar for adja-cent layers. Similarities decay with the distance betweenlayers and increase with the value of p. For the multiplexexamples, we obtain approximately uniform similaritiesbetween pairs of layers (for all pairs of layers), wherethe similarities increase with the value of p. We use theexamples of interlayer dependency of Figs. 4 and 5, aswell as non-uniform and multi-aspect examples, in our

1 5 10 15

1

500

1000

layer

node

Partition

1 5 10 15

1

5

10

15

layer

layer

NMI

(a) p = 0

1 5 10 15

1

500

1000

layer

node

Partition

1 5 10 15

1

5

10

15

layer

layer

NMI

(b) p = 0.5

1 5 10 15

1

500

1000

layer

node

Partition

1 5 10 15

1

5

10

15

layer

layer

NMI

(c) p = 0.85

1 5 10 15

1

500

1000

layer

node

Partition

1 5 10 15

1

5

10

15

layer

layer

NMI

(d) p = 0.95

1 5 10 15

1

500

1000

layer

node

Partition

1 5 10 15

1

5

10

15

layer

layer

NMI

(e) p = 0.99

1 5 10 15

1

500

1000

layer

node

Partition

1 5 10 15

1

5

10

15

layer

layer

NMI

(f) p = 1

Figure 5. Example multiplex partitions for (n, l) = (1000, 15).We use the interlayer-dependency tensor in Fig. 2b and aDirichlet null distribution with q = 1, θ = 1, and nset = 10(see Appendix A). We perform 200 updating iterations (seeAppendix B). For (a) p = 0, (b) p = 0.5, (c) p = 0.85, (d)p = 0.95, (e) p = 0.99, and (f) p = 1, where p = (l−1)p is theprobability that a state node copies its assignment from an-other state node, we show color-coded meso-set assignments( , top) for a single example output partition and NMIvalues [98] (0 1, bottom) between induced partitions indifferent layers averaged over a sample of 10 output parti-tions. (For the temporal case in Fig. 4, note that p = p.)The parameter values match those in the numerical examplesin Section VI (with the exception of p = 0, which we includefor completeness). We choose a node ordering that (wheneverpossible) emphasizes “persistent” mesoscale structure [16].

numerical experiments of Section VI.

To provide a detailed illustration of the steps that result in a multilayer partition, we focus on the temporal examples in Fig. 4. For the important special case of temporal interlayer dependencies, the generative model for multilayer partitions simplifies significantly. In particular, there is a single ordered aspect, so the layer index α ∈ N is a scalar and the order of the layers corresponds to temporal ordering. Furthermore, as we mentioned in Section IV B, for a fully-ordered multilayer network, we require that the order of the meso-set-assignment update process in Eq. (4) respects the order of the layers. The update order of the state nodes (1, α), . . . , (n, α) in any given layer α can be arbitrary (so we can update them simultaneously), but each update is conditional on the meso-set assignments of the state nodes in layer α − 1.

The update process that we described in Section IV B reduces to three steps: (1) initialize (in an arbitrary order) the meso-set assignments in layer α = 1; (2) take the meso-set assignment of (i, α + 1) to be that of (i, α) with probability p, and sample the meso-set assignment of (i, α + 1) from P_0^{α+1} with complementary probability 1 − p; and (3) repeat step (2) until α + 1 = l. As we mentioned in Section IV B, convergence is not an issue for this case (or, more generally, for any fully-ordered multilayer network), as we need only one iteration through the layers. This three-step generative model for temporal partitions was also suggested by Ghasemian et al. [50]. They used it to derive a detectability threshold for the case in which the null distributions are uniform across meso-sets (i.e., θ → ∞ in Section IV A) and intralayer edges are generated independently using the standard SBM. In other words, one replaces the degree-corrected SBM in Section V with the non-degree-corrected SBM of [42]. In Appendix C, we highlight properties of a sampled temporal partition that illustrate how the interplay between p and the null distributions affects the evolution of meso-sets across layers (e.g., the growth, birth, and death of induced meso-sets). The properties that we highlight hold for any choice of null distribution. In the same appendix, we subsequently illustrate that the particular choice of null distributions can greatly influence the resulting partitions. For example, a non-empty overlap between the supports of null distributions that correspond to contiguous layers is a necessary condition for meso-sets to gain new state nodes over time. The properties and observations in Appendix C are independent of one’s choice of planted-partition network model.
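As a concrete illustration, the following sketch implements these three steps under simplifying assumptions (it is not the paper’s Algorithm 1): we use a single fixed categorical null distribution over meso-set labels in place of the layer-dependent Dirichlet null distributions of Appendix A.

```python
import random

def sample_temporal_partition(n, l, p, null_weights, seed=0):
    """Copying process for a fully-ordered (temporal) multilayer partition.

    null_weights: unnormalized weights of the (assumed layer-independent)
    null distribution over meso-set labels.
    Returns layers, where layers[alpha][i] is the meso-set label of
    state node (i, alpha + 1).
    """
    rng = random.Random(seed)
    labels = range(len(null_weights))

    def from_null():
        return rng.choices(labels, weights=null_weights)[0]

    # Step (1): initialize the meso-set assignments in the first layer.
    layers = [[from_null() for _ in range(n)]]
    # Steps (2)-(3): each state node copies its assignment from the previous
    # layer with probability p and otherwise resamples from the null
    # distribution; repeat until all l layers are assigned.
    for _ in range(1, l):
        prev = layers[-1]
        layers.append([prev[i] if rng.random() < p else from_null()
                       for i in range(n)])
    return layers
```

With p = 1, every layer repeats the initial partition; with p = 0, the induced partitions are independent samples from the null distribution.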

V. GENERATING NETWORK EDGES

There are diverse types of multilayer networks [4, 8]. One common type of empirical multilayer network to study is one with only intralayer edges. There are also many empirical multilayer networks with both intralayer and interlayer edges (e.g., multimodal transportation networks [99]), as well as ones with only interlayer edges (e.g., temporal networks with edge delays, such as departures and arrivals of flights between airports [5, 100]). One can use our framework to generate all three types of examples, provided the underlying edge-generation model is a planted-partition network model.

Having generated a multilayer partition S with dependencies between induced partitions in different layers, the simplest way to generate edges is to use any single-layer planted-partition network model (e.g., SBMs [42, 43, 49, 57, 63, 101], models for spatially-embedded networks [12, 64], and so on) and to generate edges for each layer independently. This yields a multilayer network with only intralayer edges, such that any dependencies between different layers result only from dependencies between the partitions that are induced on the different layers. For our numerical experiments in Section VI, we use a single-layer network model that is a slight variant (avoiding the creation of self-edges and multi-edges) of the degree-corrected SBM (DCSBM) benchmark from [43], where “DCSBM benchmark” designates the specific type of DCSBM that was used in the numerical experiments of [43]. We include pseudocode for our implementation of the DCSBM benchmark in Algorithm 3 of Appendix D.

One can also include dependencies between layers other than those that are induced by planted mesoscale structures. For example, one can introduce dependencies between the parameters of a single-layer planted-partition network model by (1) sampling them from a common probability distribution (e.g., interlayer degree correlations [4, 102] in a DCSBM) or by (2) introducing interlayer edge correlations, given a single-layer partition on each layer [103]. For temporal networks, one can also incorporate “burstiness” [2, 3, 104] in the inter-event-time distribution of edges. In such a scenario, the probability for an edge to exist in a given layer depends not only on the induced partition on that layer but also on the existence of the edge in previous layers. For example, one can use a Hawkes process to specify the time points at which edges are active [105, 106].
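For instance, a univariate Hawkes process with an exponentially decaying kernel can be simulated with Ogata’s thinning algorithm; the sketch below (a generic illustration, not a construction from [105, 106]) could supply the activation times of a single edge.

```python
import random
from math import exp

def hawkes_event_times(mu, alpha, beta, t_max, seed=0):
    """Ogata thinning for a Hawkes process with conditional intensity
    lambda(t) = mu + alpha * sum over past events t_i of exp(-beta (t - t_i)).
    Because the intensity is non-increasing between events, lambda evaluated
    at the current time is a valid upper bound for the thinning step."""
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        lam_bar = mu + alpha * sum(exp(-beta * (t - s)) for s in times)
        t += rng.expovariate(lam_bar)  # candidate next event time
        if t >= t_max:
            return times
        lam_t = mu + alpha * sum(exp(-beta * (t - s)) for s in times)
        if rng.random() * lam_bar <= lam_t:  # accept with prob lam_t/lam_bar
            times.append(t)
```

Mapping the sampled times to discrete layers then yields bursty edge activations whose presence in one layer raises the probability of presence in subsequent layers.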

In Appendix D, we describe a generalization of the DCSBM in [43] to multilayer networks that one can use to generate intralayer edges and/or interlayer edges. Our generalization constitutes a framework within which we formulate the parameters of a single-layer DCSBM in a multilayer setting. One can use our multilayer DCSBM (M-DCSBM) framework to incorporate some of the features that we described in the previous paragraph (e.g., degree correlations) in a multilayer network with intralayer and/or interlayer edges.

VI. NUMERICAL EXAMPLES

In this section, we use our framework to construct benchmark models for multilayer community-detection methods and algorithms. We use the examples of interlayer-dependency tensors from Figs. 2 and 3 to generate benchmark models. We also consider a variant of Fig. 2b in which we split the layers into groups, use uniform dependencies between layers in the same group, and treat layers in different groups as independent of each other. These examples cover commonly studied temporal and multiplex dependencies, and they illustrate how one can generate benchmark multilayer networks with more than one aspect. We focus on a couple of popular quality functions for multilayer community detection in our numerical examples, rather than investigating any given method or algorithm in detail.

We compare the behavior of several variants of [107], which are similar to the locally greedy Louvain computational heuristic [108], to optimize a multilayer modularity objective function [16, 33] using the standard Newman–Girvan null model (which is a variant of a “configuration model” [109]). Modularity is an objective function that is often used to partition sets of nodes into communities that have a larger total internal edge weight than the expected total internal edge weight in the same sets in a “null network” [16], which is generated from some null model. Modularity maximization consists of finding a partition that maximizes this difference. For our numerical experiments, we use the generalization of modularity to multilayer networks of [33]. For multilayer modularity, the strength of the interactions between different layers is governed by an interlayer-coupling tensor that controls the incentive for state nodes in different layers to be assigned to the same community. We use multilayer modularity with uniform interlayer coupling, so the strength of the interactions between different layers of the network depends on a layer-independent and node-independent interlayer coupling weight ω ≥ 0. We use diagonal and categorical (i.e., between all pairs of layers) interlayer coupling with weight ω for the multiplex examples in Section VI A. We use diagonal and ordinal (i.e., between contiguous layers) interlayer coupling with weight ω for the temporal examples in Section VI B. In Appendix E, we describe the Louvain algorithm and the variants of it that we use in this section.
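To make the role of ω concrete, the following sketch builds a supra-modularity matrix for multilayer modularity with uniform diagonal interlayer coupling (a minimal illustration in our own notation; normalization conventions vary across implementations of [33]).

```python
import numpy as np

def supra_modularity_matrix(adjacencies, omega, coupling="ordinal", gamma=1.0):
    """B[(i, s), (j, r)] = (A_ijs - gamma * k_is * k_js / 2m_s) delta_sr
                           + delta_ij * C_jsr,
    where C_jsr = omega for coupled layer pairs (ordinal: |s - r| = 1;
    categorical: all s != r) and 0 otherwise."""
    n, l = adjacencies[0].shape[0], len(adjacencies)
    B = np.zeros((n * l, n * l))
    for s, A in enumerate(adjacencies):
        k = A.sum(axis=1)              # intralayer strengths in layer s
        two_m = max(k.sum(), 1.0)      # 2m_s, guarding against empty layers
        B[s*n:(s+1)*n, s*n:(s+1)*n] = A - gamma * np.outer(k, k) / two_m
        for r in range(l):
            coupled = (abs(s - r) == 1) if coupling == "ordinal" else (s != r)
            if coupled:
                # diagonal coupling: only copies of the same physical node
                B[s*n:(s+1)*n, r*n:(r+1)*n] += omega * np.eye(n)
    return B
```

Maximizing s·B·s over community assignments of the state nodes (up to a constant factor) is then the multilayer modularity-maximization problem.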

We compare the results of multilayer modularity maximization with those of multilayer InfoMap [9, 110], which uses an objective function called the “map equation” (which is not an equation), based on a discrete-time random walk and ideas from coding theory, to coarse-grain sets of nodes into communities [111]. In multilayer InfoMap, one uses a probability r ∈ [0, 1] called the “relaxation rate” to control the relative frequency with which a random walker remains in the same layer or moves to other layers. (A random walker cannot change layers when r = 0.) The relaxation rate thus controls the interactions between different layers of a multilayer network. We allow the random walker to move to all other layers when r ≠ 0 for the multiplex examples in Section VI A, and we allow the random walker to move only to adjacent layers for the temporal examples in Section VI B. In contrast to multilayer modularity (where ω = 0 yields single-layer modularity), single-layer InfoMap is not equivalent to choosing r = 0 in multilayer InfoMap, because placing state nodes that correspond to the same physical node in the same community can contribute positively to the quality function even when r = 0 [9]. Consequently, we compute single-layer InfoMap separately and reference it by the data point “s” on the horizontal axis.

In all experiments in this section, we generate a multilayer partition using our copying process in Section IV (see Algorithm 1 in Appendix B) and a multilayer network for a fixed planted partition using a slight variant of the DCSBM benchmark network model in [43] that avoids generating self-edges and multi-edges (see Algorithm 3 in Appendix D). This produces multilayer networks that have only intralayer edges and in which the connectivity patterns in different layers depend on each other. Following [43], we parametrize the DCSBM benchmark in terms of its distribution of expected degrees and a community-mixing parameter µ ∈ [0, 1] that controls the strength of the community structure in the sampled network edges. For µ = 0, all edges lie within communities; for µ = 1, edges are distributed independently of the communities, and the probability of observing an edge between two state nodes in the same layer depends only on the expected degrees of those two state nodes. We use a truncated power law for the distribution of expected degrees.
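The following toy sampler conveys the role of µ (it is a simplification, not Algorithm 3 of Appendix D): a fraction 1 − µ of each node’s stubs is matched within its community, a fraction µ is matched uniformly at random by degree, and self-edges and multi-edges are discarded.

```python
import random
from collections import defaultdict

def sample_mixing_layer(partition, expected_degrees, mu, seed=0):
    """Degree-corrected planted-partition sampler for a single layer.
    For mu = 0, all edges fall within communities; for mu = 1, edge
    placement depends only on the (expected) degrees."""
    rng = random.Random(seed)
    community_stubs = defaultdict(list)
    random_stubs = []
    for i, (g, k) in enumerate(zip(partition, expected_degrees)):
        community_stubs[g] += [i] * round((1 - mu) * k)
        random_stubs += [i] * round(mu * k)
    edges = set()
    for stubs in list(community_stubs.values()) + [random_stubs]:
        rng.shuffle(stubs)
        for a, b in zip(stubs[::2], stubs[1::2]):
            if a != b:  # discard self-edges; the set discards multi-edges
                edges.add((min(a, b), max(a, b)))
    return edges
```

Discarding self- and multi-edges mirrors the variant of the DCSBM benchmark that we use; it slightly biases realized degrees below their expected values.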

The DCSBM benchmark imposes community structure as an expected feature of the ensemble of networks that it generates. The definition of the mixing parameter µ of the DCSBM benchmark ensures that the planted partition remains community-like for any value of µ < 1, because intra-community edges are more likely to be observed (and inter-community edges are less likely to be observed) than in a network that is generated from a single-block DCSBM with the same expected degrees. Consequently, given sufficiently many samples from the same DCSBM benchmark (i.e., all samples have the same planted partition and the same expected degrees), one should be able to identify the planted community structure for any value of µ < 1 (where the necessary number of samples goes to infinity as µ → 1). This feature makes the DCSBM benchmark an interesting test of the ability of multilayer community-detection methods to aggregate information from multiple layers.

We use NMI [98] to compare the performance of different community-detection algorithms. For each of our partitions, we compute the mean of the NMI between the partition induced on each layer by the output partition and that induced by the planted partition. That is,

〈NMI〉(S, T) = (1/l) ∑_{α ∈ L} NMI(S|α, T|α) .

The quantity 〈NMI〉 is invariant under permutations of the meso-set labels within a layer. Consequently, 〈NMI〉 is well-suited to comparing multilayer community-detection methods with single-layer community-detection methods. In particular, it allows us to test whether multilayer community-detection methods can exploit dependencies between layers of a multilayer network when p > 0 (see Eq. (6)). In Appendix F, we show numerical experiments in which we compute NMI between multilayer partitions. We denote the NMI between two multilayer partitions by mNMI.
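A minimal implementation of this layer-averaged similarity is as follows (an illustration using one common normalization of mutual information, 2I/(H + H); reference [98] defines the specific normalization used in the paper).

```python
from collections import Counter
from math import log

def nmi(a, b):
    """Normalized mutual information between two labelings of equal length,
    normalized by the arithmetic mean of the two entropies."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))

    def entropy(counts):
        return -sum(v / n * log(v / n) for v in counts.values())

    mi = sum(v / n * log((v / n) / (pa[x] / n * pb[y] / n))
             for (x, y), v in pab.items())
    denom = entropy(pa) + entropy(pb)
    return 1.0 if denom == 0.0 else 2.0 * mi / denom

def mean_layer_nmi(S, T):
    """<NMI>(S, T): mean over layers alpha of NMI(S|alpha, T|alpha),
    where S[alpha] lists the meso-set labels of the nodes in layer alpha."""
    return sum(nmi(s, t) for s, t in zip(S, T)) / len(S)
```

Because NMI is invariant under relabeling within a layer, `mean_layer_nmi` scores a layer perfectly even when the output partition uses different meso-set labels than the planted partition.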

In all of our numerical experiments in Sections VI A and VI B, we sample the benchmark networks in the following way. For each value of p, we generate 10 sample partitions. For each sample partition and value of µ, we generate 10 sample multilayer networks. This yields 100 benchmark instantiations for each pair (p, µ). We run each community-detection algorithm 10 times on each instantiation. In Figs. 6–9, we show 〈NMI〉 between planted and recovered multilayer partitions averaged over sample partitions, sample networks, and algorithmic runs for each value of µ. The results for different planted partitions are generally similar. The only exception is InfoMap on temporal networks with p = 0.99 and p = 1, where we observe large differences between results for different partitions for certain values of µ.

Figures 6–9 correspond to different choices for the interlayer-dependency tensor. In each figure, rows correspond to different choices for the community-detection algorithm, and columns correspond to different values of the copying probabilities (with the strength of interlayer dependencies increasing from left to right). All benchmark instantiations that we use for Figs. 6–10 are available at https://dx.doi.org/10.5281/zenodo.3304059.

A. Multiplex examples

In this section, we consider two stylized examples of multiplex networks. Multiplex networks arise in a variety of applications, including international relations and trade [112–114], social networks [115], and ecological networks [116]. In our multiplex examples, we consider simple dependency structures in which we expect multilayer community-detection methods to outperform single-layer methods by exploiting interlayer dependencies.

In Fig. 6, we consider multiplex networks with uniform dependencies between community structure in different layers. In Fig. 7, we consider multiplex networks with nonuniform dependencies between community structure in different layers. In both figures, we parametrize the amount of interlayer dependency in a network by the probability p (see Eq. (6)) that a state node copies its community assignment from a neighbor in the interlayer-dependency network. All multilayer networks have n = 1000 physical nodes and l = 15 layers. Each node is present in every layer, so there are a total of 15000 state nodes. In Fig. 2b, we show the layer-coupled interlayer-dependency matrix that we use to generate the uniform multiplex networks. For the nonuniform multiplex networks, we split the layers into 3 groups of 5 layers each. We use uniform dependencies between layers in the same group, and layers in different groups are independent of each other. The resulting layer-coupled interlayer-dependency matrix is block diagonal, with diagonal blocks as in Fig. 2b and 0 entries in the off-diagonal blocks. We use a Dirichlet null distribution with nset = 10, θ = 1, and q = 1 (see Appendix A) to specify expected community sizes, and we use the M-DCSBM benchmark (see Appendix D) with ηk = 2, kmin = 3, and kmax = 150 to generate intralayer edges. We perform 200 iterations of our update process (see Appendix B).
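The block-diagonal layer-coupled interlayer-dependency matrix for the nonuniform case can be assembled as follows (a sketch with our own helper name):

```python
import numpy as np

def block_dependency_matrix(l, group_size, p):
    """Layer-coupled interlayer-dependency matrix: entry p between distinct
    layers in the same group of consecutive layers, and 0 between layers
    in different groups (and on the diagonal, because a layer does not
    depend on itself)."""
    M = np.zeros((l, l))
    for start in range(0, l, group_size):
        M[start:start + group_size, start:start + group_size] = p
    np.fill_diagonal(M, 0.0)
    return M
```

Setting `group_size = l` recovers the uniform multiplex dependency matrix of Fig. 2b.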

Comparing our results for GenLouvain (Figs. 6a–e and 7a–c) and GenLouvainRand with reiteration and post-processing (Figs. 6f–j and 7d–f), we see that the choice of optimization heuristic for a given quality function can significantly affect the quality of the resulting identified partitions. In particular, for GenLouvain, we observe two distinct regimes and what appears to be a sharp transition between them [117]. For ω < 1, we obtain a partition that is of similar quality to what we obtain by maximizing single-layer modularity; however, for ω > 1, we obtain a partition with identical induced partitions for each layer. We call the latter an aggregate partition. Although the aggregate partition can be more similar to the planted partition than the single-layer partition when p is sufficiently large (e.g., this occurs in Figs. 6d and e), we do not observe an interval of ω values between the two regimes in which GenLouvain recovers the planted partition better than both the single-layer and aggregate partitions. By contrast, GenLouvainRand with reiteration and post-processing identifies partitions that match the planted partition more closely than either the single-layer or aggregate partitions in most cases. The exceptions are uniform multiplex networks with p = 0.5 (see Fig. 6f), where it is unable to exploit the weak dependencies to outperform the single-layer case, and with p = 1 (see Fig. 6j), where the aggregate partition is always best. Most of the improvement in the results for GenLouvainRand over those for GenLouvain comes from reiteration. The additional randomization helps smooth out the transition at ω ≈ 1. Post-processing yields only a minor improvement in the value of 〈NMI〉. However, its effect is more pronounced if we compute the mNMI between multilayer partitions (see Appendix F).

Multilayer InfoMap exhibits some problematic behavior in these benchmark experiments. Our results for multilayer InfoMap are noticeably worse than those that we obtain with single-layer InfoMap (corresponding to the data point “s” on the horizontal axis) for networks with comparatively weak community structure (specifically, µ ≥ 0.4), unless the planted partitions for different layers are very similar (i.e., unless p ≥ 0.99). Furthermore, the methods that are based on multilayer modularity outperform multilayer InfoMap in our experiments for networks with weak community structure and similar layers (i.e., when both µ and p are close to 1). InfoMap does not identify meaningful community structure in networks with µ > 0.7 in any of our numerical examples, whereas GenLouvain and GenLouvainRand identify meaningful structure even for networks with very weak community structure (e.g., with µ = 0.9) for sufficiently large values of ω.

Our results for the nonuniform multiplex networks in Fig. 7 are similar to those for the uniform multiplex networks in Fig. 6. In particular, we see that GenLouvainRand can exploit dependencies between community structure in different layers (although the improvement over single-layer modularity is less pronounced than in the uniform examples), whereas GenLouvain cannot. The main difference is that, for p = 1, the aggregate partitions from GenLouvain and GenLouvainRand do not recover the planted partition for sufficiently large values of ω, because the induced partitions are not identical across layers in the planted partition. In principle, multilayer InfoMap should have an advantage over methods based on maximizing multilayer modularity in this case study, as the former’s quality function is designed to detect breaks in community structure, whereas multilayer modularity forces some persistence of community labels between any pair of layers. In practice, however, multilayer InfoMap correctly identifies the planted community structure only when it is particularly strong (as we show in Appendix F, it correctly identifies the different groups of layers for networks with µ ≤ 0.3), and it is outperformed by single-layer InfoMap for networks with µ ≥ 0.4.

[Figure 6 consists of fifteen panels: rows for GenLouvain, GenLouvainRand, and InfoMap; columns for p = 0.5, 0.85, 0.95, 0.99, and 1. Each panel plots 〈NMI〉 against ω (for the modularity-based methods) or against r (for InfoMap), with one curve per value of µ ∈ {0, 0.1, . . . , 1}.]

Figure 6. Multiplex networks with uniform interlayer dependencies. Effect of the interlayer coupling strength ω and the relaxation rate r on the ability of different community-detection algorithms to recover planted partitions as a function of the community-mixing parameter µ in benchmark networks with uniform multiplex dependencies (see Fig. 2b). Each multilayer network has 1000 nodes and 15 layers, and each node is present in all layers. All NMI values are means over 10 runs of the algorithms and 100 instantiations of the benchmarks. (See the introduction of Section VI.) Each curve corresponds to the mean NMI values that we obtain for a given value of µ, and the shaded area around a curve corresponds to the minimum and maximum NMI values that we obtain with the 10 sample partitions for a given value of p.

We suspect that at least some of the shortcomings of multilayer InfoMap in these experiments are due to the use of a Louvain-like optimization heuristic, rather than to flaws in InfoMap’s quality function. As we have seen from the results with the heuristics GenLouvain and GenLouvainRand, seemingly minor adjustments of an optimization heuristic can have large effects on the quality of the results. We are thus hopeful that similar adjustments can also improve the results for multilayer InfoMap.

B. Temporal examples

Temporal networks arise in many applications [2, 3], such as the study of brain dynamics [17], financial-asset correlations [16], and scientific citations [18]. When representing a temporal network as a multilayer network, one orders the layers in a causal way, such that structure in a particular layer depends directly only on structure in previous layers (and not in future layers). To generate multilayer networks that have such structure, we use our sampling process for fully-ordered multilayer networks and the interlayer-dependency tensor of Fig. 2a.

[Figure 7 consists of nine panels: rows for GenLouvain, GenLouvainRand, and InfoMap; columns for (p, pc) = (0.85, 0), (0.95, 0), and (1, 0). Each panel plots 〈NMI〉 against ω (for the modularity-based methods) or against r (for InfoMap), with one curve per value of µ ∈ {0, 0.1, . . . , 1}.]

Figure 7. Multiplex networks with nonuniform interlayer dependencies. Effect of the interlayer coupling strength ω and the relaxation rate r on the ability of different community-detection algorithms to recover planted partitions as a function of the community-mixing parameter µ in a multiplex benchmark with nonuniform interlayer dependencies. Each multilayer network has 1000 nodes and 15 layers, and each node is present in all layers. The layer-coupled interlayer-dependency matrix is a block-diagonal matrix with diagonal blocks of size 5 × 5. For each diagonal block, we set the value in the interlayer-dependency matrix to a value p (so each diagonal block has the same structure as the matrix in Fig. 2b); for each off-diagonal block, we set the value to pc = 0, thereby incorporating an abrupt change in community structure. All NMI values are means over 10 runs of the algorithms and 100 instantiations of the benchmarks. Each curve corresponds to the mean NMI values that we obtain for a given value of µ, and the shaded area around a curve corresponds to the minimum and maximum NMI values that we obtain with the 10 sample partitions for a given value of p.

We consider two stylized examples of temporal networks. In Fig. 8, we show results for temporal networks that have uniform dependencies between consecutive layers (and that hence tend to evolve gradually). To generate these networks, we set pβ = p for all layers in Fig. 2a. In Fig. 9, we show results for temporal networks with change points. To generate these networks, we set pβ = p for all layers except layers 25, 50, and 75, for which pβ = pc = 0, resulting in abrupt changes in community structure. Each multilayer network in these two examples has n = 150 physical nodes and l = 100 layers. Each node is present in every layer, so there are a total of 15000 state nodes. We use a Dirichlet null distribution to specify expected community sizes, and we set nset = 5, θ = 1, and q = 1 (see Appendix A). For the temporal networks with change points in Fig. 9, we choose the support of the null distribution so that communities after a change point have new labels. We use the M-DCSBM benchmark (see Appendix D) with ηk = 2, kmin = 3, and kmax = 30 to generate intralayer edges.
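In code, the vector of copying probabilities for this change-point construction is simply (using our own helper name; β indexes layers 2, . . . , l as in Fig. 2a):

```python
def copying_probabilities(l, p, change_points, pc=0.0):
    """p_beta for beta = 2, ..., l: equal to p everywhere except at the
    change-point layers, where p_beta = pc (so the partition of a
    change-point layer is resampled independently of the previous layer)."""
    return {beta: (pc if beta in change_points else p)
            for beta in range(2, l + 1)}
```

Passing an empty set of change points recovers the uniform temporal dependencies of Fig. 8.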

Both multilayer-modularity-based algorithms (i.e., GenLouvain and GenLouvainRand) can exploit interlayer dependencies for these temporal benchmark networks, and they identify partitions with significantly larger 〈NMI〉 than those that we obtain with single-layer modularity (i.e., with ω = 0). Typically, the peak of 〈NMI〉 seems to occur when 1 ≲ ω ≲ 4. When p < 1, one expects 〈NMI〉 to decrease for sufficiently large values of ω, as increasing ω further favors “persistence” [16] in the output partition that is not present in the multilayer planted partition.

For multilayer InfoMap, the results are less promising. For most parameter choices, the best result for multilayer InfoMap is at best similar to, and often worse than, the result for single-layer InfoMap. (The 〈NMI〉 value for single-layer InfoMap is the data point that we label with “s” on the horizontal axis.) An exception is the set of results for p = 1 and uniform temporal dependencies (see Fig. 8o). In this example (where induced partitions are the same across layers), increasing the value of the relaxation rate r enhances the recovery for all sampled planted partitions when µ ≤ 0.4 and for a subset of the sampled planted partitions when µ = 0.5 and µ = 0.6. The results for multilayer InfoMap on temporal benchmarks with uniform interlayer dependencies with p = 0.99 (see Fig. 8n for µ = 0.3 and µ = 0.4) and p = 1 (see Fig. 8o for µ = 0.5 and µ = 0.6) are the only instances in which we observe large differences in results for different partitions that are generated by our model with the same parameter values.

Comparing results for GenLouvain (see Figs. 8a–eand 9a–c) and GenLouvainRand with reiteration andpost-processing (see Figs. 8f–j and 9d–f), we see thatthe difference in results between the two optimizationheuristics for multilayer modularity is even more pro-nounced in these temporal examples than in the mul-tiplex examples in Section VI A. In particular, GenLou-vain exhibits an abrupt change in behavior near ω = 1;this is related to the transition behavior that was de-scribed in [16]. As we explained in Section VI A (wherewe also observed this phenomenon), this transition occursfor the following reason: for values of ω that are abovea certain threshold [16], only interlayer merges occur inthe first phase of the Louvain heuristic. This abrupt

Page 17: arXiv:1608.06196v4 [cs.SI] 10 Aug 2019 · 2019-08-13 · A framework for the construction of generative models for mesoscale structure in multilayer networks Marya Bazzi,1,2,3, Lucas

17

µ = 0 µ = 0.1 µ = 0.2 µ = 0.3 µ = 0.4 µ = 0.5

µ = 0.6 µ = 0.7 µ = 0.8 µ = 0.9 µ = 1

GenLouvain

0 2 4 6 8 100

0.2

0.4

0.6

0.8

1

ω

〈NM

I〉

(a) p = 0.5

0 2 4 6 8 100

0.2

0.4

0.6

0.8

1

ω〈N

MI〉

(b) p = 0.85

0 2 4 6 8 100

0.2

0.4

0.6

0.8

1

ω

〈NM

I〉

(c) p = 0.95

0 2 4 6 8 100

0.2

0.4

0.6

0.8

1

ω

〈NM

I〉

(d) p = 0.99

0 2 4 6 8 100

0.2

0.4

0.6

0.8

1

ω

〈NM

I〉

(e) p = 1

GenLouvainRand

0 2 4 6 8 100

0.2

0.4

0.6

0.8

1

ω

〈NM

I〉

(f) p = 0.5

0 2 4 6 8 100

0.2

0.4

0.6

0.8

1

ω

〈NM

I〉

(g) p = 0.85

0 2 4 6 8 100

0.2

0.4

0.6

0.8

1

ω

〈NM

I〉

(h) p = 0.95

0 2 4 6 8 100

0.2

0.4

0.6

0.8

1

ω

〈NM

I〉

(i) p = 0.99

0 2 4 6 8 100

0.2

0.4

0.6

0.8

1

ω

〈NM

I〉

(j) p = 1

InfoMap

s 0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

r

〈NM

I〉

(k) p = 0.5

s 0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

r

〈NM

I〉

(l) p = 0.85

s 0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

r

〈NM

I〉

(m) p = 0.95

s 0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

r

〈NM

I〉

(n) p = 0.99

s 0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

r

〈NM

I〉

(o) p = 1

Figure 8. Temporal networks with uniform interlayer dependencies. Effect of interlayer coupling strength ω and relaxationrate r on the ability of different community-detection algorithms to recover planted partitions as a function of the community-mixing parameter µ in a temporal benchmark with uniform interlayer dependencies (i.e., pβ = p ∈ [0, 1] for all β ∈ {2, . . . , l}in Fig. 2a). Each multilayer network has 150 nodes and 100 layers, and each node is present in all layers. All NMI values aremeans over 10 runs of the algorithms and 100 instantiations of the benchmark. (See the introduction of Section VI.) Each curvecorresponds to the mean NMI values that we obtain for a given value of µ, and the shaded area around a curve corresponds tothe minimum and maximum NMI values that we obtain with the 10 sample partitions for a given value of p.

transition no longer occurs with the additional randomization in GenLouvainRand, which exhibits smooth behavior as a function of ω. However, unlike in our multiplex experiments, we observe benefits from the behavior of GenLouvain in some situations, especially for networks with weak community structure (specifically, for µ ≥ 0.8). This phenomenon first becomes noticeable for networks with p = 0.95, and it becomes more pronounced for progressively larger values of p. Compare Figs. 8c–e with Figs. 8h–j; and compare Figs. 9b and c with Figs. 9e and f.

Our results for temporal benchmark networks with change points (see Fig. 9) are mostly similar to those for temporal benchmark networks with uniform dependencies when considering 〈NMI〉 to compare planted and recovered partitions. Only the results for p = 1 are noticeably different. (Compare Figs. 9c, f, and i with Figs. 8e, j, and o.) Because of the change points, the induced planted partitions for each layer are not identical, even when p = 1. Consequently, unlike in Fig. 8, the results for p = 1 in Fig. 9 are qualitatively similar to those for smaller values of p. In principle, one might expect multilayer InfoMap to have an advantage over multilayer-modularity-based methods in this experiment, as the former's quality function is designed to detect abrupt changes in community structure, whereas maximizing multilayer modularity always favors some persistence of community labels from one layer to the next. However, this theoretical advantage is not borne out in our experiment. When comparing the planted and recovered partitions using the NMI between multilayer partitions, we see that multilayer InfoMap can correctly recover the change points only for p = 1 and µ ∈ {0, 0.1} (see Appendix F).
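The comparisons above rely on normalized mutual information between partitions. The excerpt does not specify which NMI normalization the authors use, so the sketch below adopts one common convention (normalizing the mutual information by the arithmetic mean of the two label entropies); it is an illustration, not the paper's exact measure.

```python
# Sketch: NMI between two partitions, normalized by the mean of the entropies.
# The paper's exact NMI variant is not specified in this excerpt.
import math
from collections import Counter

def nmi(part_x, part_y):
    """NMI between two partitions given as equal-length label sequences."""
    assert len(part_x) == len(part_y)
    n = len(part_x)
    cx, cy = Counter(part_x), Counter(part_y)
    cxy = Counter(zip(part_x, part_y))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    hx, hy = entropy(cx), entropy(cy)
    # Mutual information I(X; Y) from the joint and marginal label frequencies.
    mi = sum((c / n) * math.log((c / n) / ((cx[x] / n) * (cy[y] / n)))
             for (x, y), c in cxy.items())
    if hx == 0 and hy == 0:  # both partitions are trivial
        return 1.0
    return mi / ((hx + hy) / 2)

# Identical partitions (up to a permutation of labels) give NMI = 1.
print(round(nmi([0, 0, 1, 1], [1, 1, 0, 0]), 9))  # 1.0
```

Because NMI is invariant under relabeling, it compares the grouping of state nodes rather than the labels themselves, which is what the benchmark comparisons require.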


[Figure 9: nine panels of 〈NMI〉 curves, one curve per µ ∈ {0, 0.1, . . . , 1}. Top row, GenLouvain: panels (a)–(c) show 〈NMI〉 versus ω for p = 0.85, 0.95, and 1 (each with pc = 0). Middle row, GenLouvainRand: panels (d)–(f) show 〈NMI〉 versus ω for the same values of p. Bottom row, InfoMap: panels (g)–(i) show 〈NMI〉 versus relaxation rate r for the same values of p.]

Figure 9. Temporal networks with nonuniform interlayer dependencies. Effect of interlayer coupling strength ω and relaxation rate r on the ability of different community-detection algorithms to recover planted partitions as a function of the community-mixing parameter µ in a temporal benchmark with nonuniform interlayer dependencies. Each multilayer network has 150 nodes and 100 layers, and each node is present in all layers. Every 25th layer (i.e., for layers 25, 50, and 75), we set the value of pβ in Fig. 2a to pc = 0, thereby introducing an abrupt change in community structure; and we set all other values of pβ in Fig. 2a to p. All NMI values are means over 10 runs of the algorithms and 100 instantiations of the benchmark. Each curve corresponds to the mean NMI values that we obtain for a given value of µ, and the shaded area around a curve corresponds to the minimum and maximum NMI values that we obtain with the 10 sample partitions for a given value of p.

C. Multi-aspect examples

[Figure 10: nine heat maps of 〈NMI〉 as a function of ωm (horizontal axis) and ωt (vertical axis) for p = 0.85, 0.95, 0.99 (top to bottom) and a = 0.1, 0.5, 0.9 (left to right). The maximum 〈NMI〉 values in panels (a)–(i) are 0.73, 0.76, 0.83, 0.78, 0.79, 0.88, 0.87, 0.81, and 0.86, respectively.]

Figure 10. Multi-aspect networks with uniform interlayer dependencies. Effect of the strength of temporal interlayer coupling ωt and multiplex interlayer coupling ωm on the ability of maximization of multilayer modularity to recover the planted partition in two-aspect multilayer networks that we generate using our framework. The first aspect is temporal (and thus ordered) and the second aspect is multiplex (and thus unordered). The heat maps show the mean 〈NMI〉 (which takes values between 0 and 1) over all layers of the NMI between induced planted partitions and recovered community structure. A black cross indicates the maximum value of 〈NMI〉. The dependence between structure in different layers increases from top to bottom, and the importance of the temporal aspect versus the multiplex aspect increases from left to right. We average our results for each pair of parameters over 10 runs of GenLouvainRand with reiteration. The generated networks have n = 100 nodes and l = 100 layers with |LT| = |LM| = 10, so there are a total of 10000 state nodes. We fix the community-mixing parameter of the M-DCSBM benchmark to be µ = 0.5.

In this section, we illustrate the ability of our framework to generate multilayer networks with more than one aspect. A common situation that gives rise to multilayer networks with two aspects is a multiplex network that changes over time (e.g., citation patterns that change over time [118]). In our framework, such a situation corresponds to the interlayer-dependency tensor in Fig. 3. For the illustrative examples in this section, we use uniform multiplex and temporal dependencies; and we parametrize the interlayer-dependency tensor as follows:

$$P^{\beta_t,\beta_m}_{\alpha_t,\alpha_m} = \delta(\alpha_t,\beta_t)\,\bigl(1 - \delta(\alpha_m,\beta_m)\bigr)\,\frac{(1-a)\,p}{l_m - 1} + \delta(\alpha_t + 1,\beta_t)\,\delta(\alpha_m,\beta_m)\,a\,p\,,$$

where a controls the balance between multiplex and temporal dependencies and p controls the overall strength of the interlayer dependencies.
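The formula above can be materialized directly. The following sketch (not the authors' code; the function name and dictionary layout are our own illustration) builds the two-aspect interlayer-dependency tensor as a sparse map from (target layer, source layer) pairs to copying probabilities, where each layer is a pair (αt, αm) of a temporal index and a multiplex index.

```python
# Sketch: materialize the two-aspect interlayer-dependency tensor defined
# above as a dictionary mapping (target layer, source layer) -> probability.
from itertools import product

def dependency_tensor(l_t, l_m, a, p):
    """P[(beta_t, beta_m), (alpha_t, alpha_m)] from the displayed formula."""
    P = {}
    layers = list(product(range(l_t), range(l_m)))
    for (at, am), (bt, bm) in product(layers, repeat=2):
        prob = 0.0
        if at == bt and am != bm:           # multiplex term
            prob += (1 - a) * p / (l_m - 1)
        if at + 1 == bt and am == bm:       # temporal term
            prob += a * p
        if prob > 0:
            P[(bt, bm), (at, am)] = prob
    return P

P = dependency_tensor(l_t=4, l_m=3, a=0.5, p=0.9)
# For a layer with a temporal predecessor, the total copying probability
# (the sum over all source layers) equals p.
beta = (2, 1)
print(round(sum(v for (b, _), v in P.items() if b == beta), 6))  # 0.9
```

The check at the end illustrates why p is the overall dependency strength: the multiplex contributions, each of size (1 − a)p/(lm − 1), and the single temporal contribution of size ap sum to p.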

To recover communities, we optimize a multi-aspect generalization,

$$Q(S) = \sum_{\substack{(i,\boldsymbol{\alpha}) \in V_M \\ (j,\boldsymbol{\beta}) \in V_M}} \left[ \left( A^{j,\boldsymbol{\beta}}_{i,\boldsymbol{\alpha}} - \frac{k^{\boldsymbol{\alpha}}_{i,\boldsymbol{\alpha}}\, k^{\boldsymbol{\beta}}_{j,\boldsymbol{\beta}}}{2 m_{\boldsymbol{\beta}}} \right) \delta(\boldsymbol{\alpha},\boldsymbol{\beta}) + C^{\boldsymbol{\beta}}_{\boldsymbol{\alpha}}\, \delta(i,j) \right] \delta(S_{i,\boldsymbol{\alpha}}, S_{j,\boldsymbol{\beta}})\,, \tag{8}$$

of the multilayer modularity function that was introduced in [33].
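As a concreteness check, Eq. (8) can be evaluated directly on a toy network. The sketch below is our own illustration (the data layout and function name are assumptions, and resolution-parameter conventions are omitted): a two-node, two-layer network with one intralayer edge per layer and uniform interlayer coupling ω, evaluated at the all-in-one-community partition.

```python
# Sketch: evaluate the multi-aspect modularity Q(S) of Eq. (8) on a toy
# network.  State nodes are (node, layer) pairs; `A` holds intralayer edge
# weights and `C` holds interlayer coupling values.

def modularity(state_nodes, A, C, S):
    """Q(S) as a sum over ordered pairs of state nodes."""
    # Intralayer strengths k and total edge weight 2m per layer.
    k = {(i, a): sum(A.get(((i, a), (j, b)), 0.0)
                     for (j, b) in state_nodes if b == a)
         for (i, a) in state_nodes}
    two_m = {}
    for (i, a) in state_nodes:
        two_m[a] = two_m.get(a, 0.0) + k[(i, a)]
    Q = 0.0
    for (i, a) in state_nodes:
        for (j, b) in state_nodes:
            if S[(i, a)] != S[(j, b)]:
                continue
            if a == b:  # intralayer term: adjacency minus null model
                Q += A.get(((i, a), (j, b)), 0.0) - k[(i, a)] * k[(j, b)] / two_m[a]
            if i == j:  # interlayer coupling term
                Q += C.get(((i, a), (j, b)), 0.0)
    return Q

nodes = [(i, a) for i in (0, 1) for a in (0, 1)]
A = {((0, a), (1, a)): 1.0 for a in (0, 1)}
A.update({((1, a), (0, a)): 1.0 for a in (0, 1)})
omega = 0.5
C = {((i, 0), (i, 1)): omega for i in (0, 1)}
C.update({((i, 1), (i, 0)): omega for i in (0, 1)})
S = {sn: 0 for sn in nodes}  # all state nodes in one community
print(modularity(nodes, A, C, S))  # 2.0
```

In this example the intralayer terms cancel exactly (each layer's null model matches its single edge), so Q reduces to the four interlayer coupling terms, 4ω = 2.0.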

We specify the interlayer-coupling tensor C (a multi-aspect generalization of the “interslice coupling” of [33]), such that it reflects the planted interlayer dependencies, by writing

$$C^{\beta_t,\beta_m}_{\alpha_t,\alpha_m} = \delta(\alpha_t,\beta_t)\,\bigl(1 - \delta(\alpha_m,\beta_m)\bigr)\,\omega_m + \delta(\alpha_t + 1,\beta_t)\,\delta(\alpha_m,\beta_m)\,\omega_t\,,$$

where ωt denotes the coupling parameter for the temporal dependencies and ωm denotes the coupling parameter for the multiplex dependencies.
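Because C has the same sparsity pattern as the interlayer-dependency tensor P (with probabilities replaced by coupling weights), it can be built the same way. This is an illustrative sketch with assumed names, not the authors' implementation.

```python
# Sketch: build the interlayer-coupling tensor C with the same sparsity
# pattern as the planted dependencies.
from itertools import product

def coupling_tensor(l_t, l_m, omega_t, omega_m):
    C = {}
    layers = list(product(range(l_t), range(l_m)))
    for (at, am), (bt, bm) in product(layers, repeat=2):
        w = 0.0
        if at == bt and am != bm:        # multiplex coupling: weight omega_m
            w += omega_m
        if at + 1 == bt and am == bm:    # temporal coupling: weight omega_t
            w += omega_t
        if w > 0:
            C[(bt, bm), (at, am)] = w
    return C

C = coupling_tensor(l_t=3, l_m=2, omega_t=1.0, omega_m=0.4)
print(C[(1, 0), (0, 0)], C[(1, 1), (1, 0)])  # 1.0 0.4
```

Varying ωt and ωm independently reproduces the parameter sweep of Fig. 10, in which the two couplings trade off against each other.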

In Fig. 10, we illustrate the effects of the two interlayer coupling parameters, ωt and ωm, on the extent to which multilayer modularity maximization can recover a planted partition. We use GenLouvainRand with reiteration but without post-processing to find a local optimum of Eq. (8). For all of the values of p and a that we consider in Fig. 10, we see some improvement in the performance of multilayer modularity maximization with nonzero interlayer coupling compared with the case where both ωt = 0 and ωm = 0. As the latter case corresponds to independently maximizing modularity on each layer of a network, this demonstrates that multilayer modularity is able to exploit the interlayer dependencies to better recover the planted partition. As expected, increasing a (which gives more weight to temporal dependencies) leads to a shift of the region of the parameter space with good planted-partition recovery to larger values of ωt and smaller values of ωm. For progressively larger p, we observe a small overall increase in the value of 〈NMI〉, but its dependence on ωt and ωm remains similar.

VII. CONCLUSIONS

We introduced a unifying and flexible framework for the construction of generative models for mesoscale structures in multilayer networks. The three most important features of our framework are the following: (1) it includes an explicitly parametrizable tensor P that controls interlayer-dependency structure; (2) it can generate an extremely general, diverse set of multilayer networks (including, e.g., temporal, multiplex, “multilevel” [119], and multi-aspect multilayer networks with uniform or nonuniform dependencies between state nodes); and (3) it is modular, as the process of generating a partition is separate from the process of generating edges, enabling a user to first generate a partition and then use any planted-partition network model. Along with our paper, we provide publicly available code [14] that users can modify to readily incorporate different types of null distributions (see Section IV A), interlayer-dependency structures (see Section IV B), and planted-partition network models (see Section V).

The ability to explicitly specify interlayer-dependency structure makes it possible for a user to control which layers depend directly on each other (by deciding which entries in the interlayer-dependency tensor are nonzero) and the strengths of such dependencies (by varying the magnitude of entries in the interlayer-dependency tensor). One can thereby generate multilayer networks with either a single aspect or multiple aspects (e.g., temporal and/or multiplex networks) and vary dependencies between layers from the extreme case in which induced partitions in a planted multilayer partition are the same across layers to the opposite extreme, in which induced partitions in a planted multilayer partition are generated independently for each layer from a null distribution for that layer. To the best of our knowledge, this level of generality is absent from existing generative models for mesoscale structures in multilayer networks, as those models tend to only consider networks with a single aspect (e.g., temporal or multiplex) or limited interlayer-dependency structures (e.g., with the same induced partitions in a planted multilayer partition across all layers).

We illustrated several examples of generative models that one can construct from our framework, with a focus on a few special cases of interest, rather than on trying to discuss as many situations as possible. We focused on community structure in our numerical experiments, because it is a commonly studied mesoscale structure, but one can also use our framework to construct generative models of intra-layer mesoscale structures other than community structure (e.g., core–periphery structure, bipartite structure, and so on) by taking advantage of our framework's ability to use any single-layer planted-partition network model. For our single-aspect examples, we assumed that interlayer dependencies exist either between all contiguous layers (a special case of temporal networks), between all layers (a special case of multiplex networks), or between all contiguous groups of layers (another special case of multiplex networks). For both our temporal and multiplex examples, we considered both uniform and nonuniform interlayer dependencies. We also combined some of these scenarios in an example with two aspects, a temporal aspect and a multiplex aspect. However, our framework's flexibility allows us to construct generative models of multilayer networks with more realistic features. For example, for temporal networks, one can introduce dependencies between a layer and all layers that follow it (such that Fig. 2a is an upper triangular matrix with nonzero entries above the diagonal) to incorporate memory effects [120]. One can also consider interlayer dependencies that are not layer-coupled. For example, dependencies can be diagonal but nonuniform, or they can be nonuniform and exist only between sets of related nodes.

In Section VI, we consider the commonly studied case of a multilayer network with only intralayer edges and connectivity patterns in the different layers that depend on each other. We use a slight variant of the DCSBM benchmark from [43] to generate edges for each layer (see Algorithm 3). However, other types of multilayer networks are also important [4], and one can readily combine our approach for generating multilayer partitions with different network generative models that capture various important features. For example, one can use an SBM to generate interlayer edges (e.g., using our M-DCSBM framework, which we discuss in Appendix D), or one can replace the degree-corrected SBM in Section V with any other planted-partition network model or other interesting models (e.g., other variants of SBMs [42, 49] and models for networks whose structure is affected by space (and perhaps spatially embedded) [12] or arbitrary latent features [64]). In all of these examples, dependencies between connectivity patterns in different layers arise only from a planted multilayer partition. It is also possible to modify our network generation process (see Section V) to incorporate additional dependencies between layers beyond those that are induced by a planted mesoscale structure, such as by introducing dependencies between a node's degree in different layers [102, 121] or burstiness [104] in inter-event-time distributions of edges.

Our work has the potential for many useful and interesting extensions, and we highlight three of these. First, although we have given some illustrative numerical examples in Section VI, the area of benchmarking community-detection methods in multilayer networks is far from fully developed. Generative models are useful tools for understanding the behavior of community-detection methods in detail and thus for suggesting ways of improving heuristic algorithms without losing scalability. One can use our framework to construct benchmark models that provide a test bed for gaining insight into the advantages and shortcomings of community-detection methods and algorithms (and, more generally, of meso-set-detection methods and algorithms). Importantly, we expect these benchmark models to be very informative for detecting potential artifacts of algorithms that can sometimes be masked in real-world applications. Second, a well-understood generative model can be a powerful tool for statistical inference (i.e., inferring the structure of a multilayer network rather than generating a multilayer network with a planted structure) [24]. For temporal networks, closed forms for the joint distribution of meso-set assignments have been derived for models that are special cases of our framework [10, 13]. These results may be useful for statistical inference. Additionally, it seems likely that it is possible to adapt Bayesian inference techniques (including Gibbs sampling [122–124] or variational methods [125]) that have been developed for SBMs both for inferring a multilayer partition and for inferring an interlayer-dependency tensor. A key advantage of the generality of the framework that we have developed in the present paper is that it may be possible to frame many model-selection questions in terms of posterior estimation of the interlayer-dependency tensor. Finally, it is important to model interdependent data streams and not just fixed data sets. For example, for any fully-ordered multilayer network, our generative model respects the causality of layers (e.g., temporal causality), and one can thus update a multilayer network with a new layer without the need to update any previous layers. It is critical to develop generative models that are readily adaptable to such situations, and our work in this paper is a step in this direction.

Appendix A: Example null distributions

In this appendix, we discuss parameter choices for a categorical null distribution and give concrete examples that can be useful for modeling mesoscale structure in temporal or multiplex networks.

In Section IV A, we described the general form of a categorical null distribution:

$$P^{\alpha}_0[s] = \begin{cases} p^{\alpha}_s\,, & s \in \{1, \dots, n_{\mathrm{set}}\}\,, \\ 0\,, & \text{otherwise}\,, \end{cases} \tag{A1}$$

where p^α_s is the probability for a state node in layer α to be assigned to a meso-set s in the absence of interlayer dependencies, n_set is the number of meso-sets in the multilayer partition, and ∑_{s=1}^{n_set} p^α_s = 1 for each α ∈ L. The support G^α of the null distribution P^α_0 is G^α = {s : P^α_0[s] ≠ 0}. We say that a label s is active in a layer α if it is in the support of the null distribution P^α_0 (i.e., if P^α_0[s] ≠ 0); and we say that a label is inactive in layer α if it is in the complement of the support of P^α_0 (i.e., if P^α_0[s] = 0).
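A categorical null distribution of the form (A1) amounts to inverse-CDF sampling over the active labels. The sketch below (with an assumed function name and list-based storage) draws a label from a per-layer probability vector.

```python
# Sketch: a categorical null distribution per layer (Eq. (A1)), stored as a
# list of per-label probabilities p^alpha_s; labels outside {1, ..., n_set}
# implicitly have probability zero.
import random

def sample_from_null(p_alpha, rng):
    """Draw one meso-set label s in {1, ..., n_set} with probability p_alpha[s-1]."""
    u = rng.random()
    cumulative = 0.0
    for s, prob in enumerate(p_alpha, start=1):
        cumulative += prob
        if u < cumulative:
            return s
    return len(p_alpha)  # guard against floating-point round-off

rng = random.Random(0)
p_alpha = [0.5, 0.3, 0.2]  # n_set = 3 active labels in this layer
draws = [sample_from_null(p_alpha, rng) for _ in range(10000)]
print(abs(draws.count(1) / len(draws) - 0.5) < 0.05)  # True
```

In the full framework, each layer carries its own such vector p^α, and the copying process of Eq. (B1) falls back to this distribution with probability 1 − p_{j,β}.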

A natural choice is to sample p^α from a Dirichlet distribution, which is the conjugate prior for the categorical distribution [92, 93]. The Dirichlet distribution over q variables has q parameters θ1, . . . , θq. Its probability density function is

$$p(x_1, \dots, x_q) = \frac{\Gamma\!\left(\sum_{i=1}^{q} \theta_i\right)}{\prod_{i=1}^{q} \Gamma(\theta_i)} \prod_{i=1}^{q} x_i^{\theta_i - 1}\,,$$

where xi ∈ (0, 1) and θi > 0 for each i ∈ {1, . . . , q}. The case in which all θi are equal is called a “symmetric Dirichlet distribution”, which we parametrize by the common value θ (the so-called “concentration parameter”) of the parameters and the number q of variables.
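The density above and the standard Gamma-normalization sampler can both be written with only the standard library; the function names below are our own. For θ = 1, the density is constant on the simplex and equals Γ(q), which gives an exact check.

```python
# Sketch: evaluate the symmetric Dirichlet density and sample from it.  A
# Dirichlet draw is a vector of Gamma(theta, 1) draws normalized to sum to 1.
import math
import random

def symmetric_dirichlet_pdf(x, theta):
    q = len(x)
    log_norm = math.lgamma(q * theta) - q * math.lgamma(theta)
    return math.exp(log_norm + (theta - 1) * sum(math.log(xi) for xi in x))

def sample_symmetric_dirichlet(theta, q, rng):
    gammas = [rng.gammavariate(theta, 1.0) for _ in range(q)]
    total = sum(gammas)
    return [g / total for g in gammas]

# For theta = 1, the density is uniform on the simplex and equals Gamma(q) = 2 for q = 3.
print(round(symmetric_dirichlet_pdf([0.2, 0.3, 0.5], 1.0), 6))  # 2.0

rng = random.Random(1)
p_alpha = sample_symmetric_dirichlet(theta=5.0, q=4, rng=rng)
print(round(sum(p_alpha), 6))  # 1.0
```

Sampling with large θ yields nearly uniform vectors p^α (similar meso-set sizes), whereas small θ concentrates mass on a few labels, matching the discussion of the concentration parameter below.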

The concentration parameter θ determines the types of discrete probability distributions that one is likely to obtain from the symmetric Dirichlet distribution. For θ = 1, the symmetric Dirichlet distribution is the continuous uniform distribution over the space of all discrete probability distributions with n_set states. As θ → ∞, the Dirichlet distribution becomes increasingly concentrated near the discrete uniform distribution, such that all entries in p^α are approximately equal. As θ → 0, it becomes increasingly concentrated away from the uniform distribution, such that p^α tends to have 1 (or a few) large entries and all other entries are close to 0. Consequently, to have very heterogeneous meso-set sizes, one would choose θ ≈ 1. To have all meso-sets be of similar sizes, one would choose a large value of θ. To have a few large meso-sets and many small meso-sets, one would choose θ to be sufficiently smaller than 1. The value of n_set also affects the amount of meso-set label overlap across layers in the absence of interlayer dependencies. For example, if p^α is the same for all layers, then larger values of n_set incentivize less label overlap across layers (because there are more possible labels for each layer) and smaller values of n_set incentivize more label overlap across layers (because there are fewer possible labels for each layer).

In some situations — e.g., when modeling the birth and death of communities in temporal networks or the appearance and disappearance of communities in multiplex networks — it is desirable to have meso-set labels that have a nonzero probability in (A1) only in some layers. For these situations, we suggest sampling the support of the distributions before sampling the probabilities p^α. Given the supports for each layer, one then samples the corresponding probabilities from a symmetric Dirichlet distribution (or any other distribution over categorical distributions). That is,

$$p^{\alpha}_{G^{\alpha}} \sim \mathrm{Dir}\bigl(\theta, |G^{\alpha}|\bigr)\,, \qquad p^{\alpha}_{\overline{G}^{\alpha}} = 0\,, \tag{A2}$$

where $\overline{G}^{\alpha} = \{s \in \{1, \dots, n_{\mathrm{set}}\} : P^{\alpha}_0[s] = 0\}$ is the complement of the support, with $n_{\mathrm{set}} = \max_{\alpha \in L} \max(G^{\alpha})$.

A simple example of a birth/death process for meso-sets is the following. First, fix a number of meso-sets and a support (i.e., active community labels) for the first layer. One then sequentially initializes the supports for the other layers by removing each meso-set in the support of the previous layer with probability rd ∈ [0, 1] and adding a number, sampled from a Poisson distribution with rate rb ∈ [0, ∞), of new meso-sets (with new labels that are not active in any previous layer). In temporal networks, for example, this ensures that if a meso-set label is not in the support of a given layer, then the label is also not in the support of any subsequent layers. For this process, the expected number 〈|G^α|〉 of meso-sets in a layer approaches rb/rd as one iterates through the layers. Therefore, one should initialize the size of the support for the first layer close to this value to avoid transients in the number of meso-sets. For this process, the lifetime of meso-sets follows a geometric distribution. The nature of the copying process that we use to introduce dependencies between induced partitions in different layers typically implies that meso-sets that have been removed from the support of the null distribution do not lose all of their members instantly, but instead shrink at a speed that depends on the values of the copying probabilities in the interlayer-dependency tensor.
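The birth/death support process can be sketched as follows (parameter and function names are our own; the Poisson draws use the standard inversion method so that only the standard library is needed). The key invariant — a label that leaves the support never returns — is checked at the end.

```python
# Sketch of the birth/death support process described above.
import math
import random

def poisson(rate, rng):
    """Inversion sampling for a Poisson random variate."""
    u, k, p = rng.random(), 0, math.exp(-rate)
    cumulative = p
    while u > cumulative:
        k += 1
        p *= rate / k
        cumulative += p
    return k

def birth_death_supports(n_layers, r_b, r_d, rng):
    next_label = round(r_b / r_d) + 1      # start near the stationary size r_b/r_d
    supports = [set(range(1, next_label))]
    for _ in range(n_layers - 1):
        # Each label survives with probability 1 - r_d ...
        survivors = {s for s in supports[-1] if rng.random() >= r_d}
        # ... and Poisson(r_b) brand-new labels are born.
        births = poisson(r_b, rng)
        new_labels = set(range(next_label, next_label + births))
        next_label += births
        supports.append(survivors | new_labels)
    return supports

rng = random.Random(42)
supports = birth_death_supports(n_layers=50, r_b=1.0, r_d=0.2, rng=rng)
# Once a label leaves the support it never returns, so each label's set of
# active layers is a contiguous block.
for label in set().union(*supports):
    active = [i for i, g in enumerate(supports) if label in g]
    assert active == list(range(active[0], active[-1] + 1))
print(len(supports))  # 50
```

With rb = 1 and rd = 0.2, the support sizes fluctuate around the stationary value rb/rd = 5, and each label's lifetime is geometrically distributed with parameter rd.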

One can also allow labels to appear and disappear when examining multiplex partitions. For example, given a value for n_set, one can generate the support for each layer by allowing each label s ∈ {1, . . . , n_set} to be present with some probability q and absent with complementary probability 1 − q. This yields sets of active and inactive meso-set labels for each layer. One can then sample the nonzero probabilities in p^α that correspond to active labels from a Dirichlet distribution and set p^α_s to 0 for each inactive label s. Because multiplex partitions are unordered, there is no notion of one layer occurring after another one, so we do not need to ensure that an inactive label in a given layer is also inactive in “subsequent” layers.
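Combining this multiplex support rule with Eq. (A2) gives a complete recipe for layer-specific null distributions: sample each layer's support independently, then fill the active entries of p^α from a symmetric Dirichlet (via normalized Gamma draws). The sketch below, with assumed names, is one way to do this.

```python
# Sketch: multiplex supports (each label active with probability q) combined
# with Eq. (A2): active entries of p^alpha come from a symmetric Dirichlet,
# inactive entries are set to zero.
import random

def multiplex_null_probs(n_set, n_layers, q, theta, rng):
    layers = []
    for _ in range(n_layers):
        support = {s for s in range(1, n_set + 1) if rng.random() < q}
        if not support:               # ensure at least one active label per layer
            support = {rng.randint(1, n_set)}
        gammas = {s: rng.gammavariate(theta, 1.0) for s in support}
        total = sum(gammas.values())
        p = [gammas[s] / total if s in support else 0.0
             for s in range(1, n_set + 1)]
        layers.append(p)
    return layers

rng = random.Random(7)
layers = multiplex_null_probs(n_set=6, n_layers=4, q=0.5, theta=1.0, rng=rng)
# Each layer's probabilities sum to 1, and inactive labels get probability 0.
print(all(abs(sum(p) - 1.0) < 1e-9 for p in layers))  # True
```

Because the supports are sampled independently per layer, a label can be inactive in one layer and active in another, which is exactly the freedom that unordered (multiplex) aspects allow.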

Appendix B: Partition sampling process

In this appendix, we provide a detailed description of the way in which we sample multilayer partitions, including a discussion of the convergence properties of our sampling process. As we mentioned in Section IV B, we assume that the multilayer partitions are generated by a copying process on the meso-set assignment of state nodes. Recall that the conditional probability distribution for the updated meso-set assignment of a state node, given the current state of the copying process (Eq. (4) in Section IV B), is

$$\begin{aligned} \mathbb{P}\bigl[S_{j,\boldsymbol{\beta}}(\tau + 1) = s \,\big|\, S(\tau)\bigr] &= \sum_{(i,\boldsymbol{\alpha}) \in V_M} P^{j,\boldsymbol{\beta}}_{i,\boldsymbol{\alpha}}\, \delta\bigl(S_{i,\boldsymbol{\alpha}}(\tau), s\bigr) + (1 - p_{j,\boldsymbol{\beta}})\, P^{\boldsymbol{\beta}}_0[S_{j,\boldsymbol{\beta}} = s]\,, \\ p_{j,\boldsymbol{\beta}} &= \sum_{(i,\boldsymbol{\alpha}) \in V_M} P^{j,\boldsymbol{\beta}}_{i,\boldsymbol{\alpha}}\,, \end{aligned} \tag{B1}$$

where P0 is a given set of independent layer-specific null distributions and P is a given interlayer-dependency tensor. Further recall that a multilayer network can have both ordered aspects and unordered aspects, where we denote the set of ordered aspects by O and the set of unordered aspects by U. Our goal is to sample multilayer partitions that are consistent with generation by Eq. (B1) while respecting the update order from its ordered aspects (if present). Recall that two layers, α and β, are order-equivalent (which we denote by α ∼O β) if αa = βa for all a ∈ O. Based on this equivalence relation, we obtain an ordered set of equivalence classes L/∼O by inheriting the order from the ordered aspects (where, for definiteness, we consider the lexicographic order [95] over ordered aspects). We want to simultaneously sample the meso-set assignment of state nodes within each class of order-equivalent layers, and we want to sequentially sample the meso-set assignment of state nodes in non-order-equivalent layers, conditional on the fixed assignment of state nodes in preceding layers (see Algorithm 1).


function SamplePartition(P, P0, O, nupd)
    for (i, α) ∈ VM do
        Si,α ∼ P0^α                      ▷ Initialize partition tensor using null distributions
    end for
    for [α] ∈ L/∼O do                    ▷ Loop over ordered aspects in lexicographic order
        for τ ∈ {1, . . . , |[α]| nupd} do
            α ∼ U([α])                   ▷ Sample uniformly from order-equivalent layers
            for i ∈ V do                 ▷ Loop over nodes in some order
                Si,α ∼ P[Si,α | S]       ▷ Update (i, α) according to Eq. (B1)
            end for
        end for
    end for
    return S
end function

ALG. 1. Pseudocode for generating multilayer partitions with interlayer-dependency tensor P, null distributions P0^α, and ordered aspects O. The parameter nupd is the expected number of updates for each state node.
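For the special case of a fully ordered (e.g., temporal) multilayer network, each equivalence class contains a single layer and nupd = 1 suffices, so Algorithm 1 reduces to one sequential sweep through the layers. The sketch below is our own simplified rendering of that case (names and data layout are assumptions), applying Eq. (B1) node by node.

```python
# Sketch of Algorithm 1 for a fully ordered (temporal) multilayer network.
# `P[(beta, alpha)]` is the copying probability from source layer alpha to
# target layer beta, and `null(alpha, rng)` samples a label from the layer's
# null distribution.
import random

def sample_partition(n_nodes, n_layers, P, null, rng):
    # Initialize every state node from its layer's null distribution.
    S = [[null(alpha, rng) for _ in range(n_nodes)] for alpha in range(n_layers)]
    for beta in range(n_layers):  # sweep layers in temporal order
        p_beta = sum(P.get((beta, alpha), 0.0) for alpha in range(n_layers))
        for i in range(n_nodes):
            u = rng.random()
            if u < p_beta:  # copy from a dependency layer, per Eq. (B1)
                cumulative = 0.0
                for alpha in range(n_layers):
                    cumulative += P.get((beta, alpha), 0.0)
                    if u < cumulative:
                        S[beta][i] = S[alpha][i]
                        break
            else:           # otherwise resample from the null distribution
                S[beta][i] = null(beta, rng)
    return S

null = lambda alpha, rng: rng.randint(1, 4)     # uniform categorical null
P = {(b, b - 1): 1.0 for b in range(1, 10)}     # p = 1: always copy predecessor
S = sample_partition(n_nodes=20, n_layers=10, P=P, null=null, rng=random.Random(3))
print(all(S[b] == S[0] for b in range(10)))  # True: identical induced partitions
```

The final check illustrates the p = 1 extreme discussed in the conclusions: with certain copying from the predecessor layer, every layer's induced partition is identical; with P empty, the layers would instead be independent draws from the null distributions.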

Scan order and compatibility

To sample the induced partitions for each class of order-equivalent layers, we use a technique that resembles Gibbs sampling [81]. As in Gibbs sampling, we define a Markov chain on the space of multilayer partitions by updating the meso-set assignment of state nodes by sampling from Eq. (B1). In Gibbs sampling, one assumes that the conditional distributions that are used to define this Markov chain are compatible, in the sense that there exists a joint distribution that realizes these conditional distributions [82–85]. However, for a general interlayer-dependency tensor, there is no guarantee that the distributions that are defined by Eq. (B1) are compatible. Following [80], we use the term “pseudo-Gibbs sampling” to refer to Gibbs sampling from a set of potentially incompatible conditional distributions. We can define different Markov chains by changing the way in which we select which conditional distribution to apply at each step (i.e., which state node to update at each step). The order in which one cycles through the conditional distributions is known as the scan order [87]. We use the term “sampling Markov chain” for the Markov chain that is defined by pseudo-Gibbs sampling with a specified scan order. Common scan orders are cycling through conditional distributions in a fixed order and sampling the update distribution uniformly at random from the set of conditional distributions. In pseudo-Gibbs sampling, the choice of scan order deserves careful attention, because different scan orders correspond to sampling from different distributions [87]. (By contrast, in Gibbs sampling, the choice of scan order influences only the speed of convergence.) In particular, when cycling through conditional distributions in a fixed order, conditional distributions that are applied later have an outsized influence on the stationary distributions of the sampling Markov chain [87]. Sampling the update distribution uniformly at random mitigates this problem, at the expense of introducing computational overhead.

We can improve on a purely random sampling strategy by exploiting the structure of the interlayer-dependency tensor. In particular, note that the conditional distributions for state nodes in the same layer are independent (as we assume that the interlayer-dependency tensor has no intralayer contributions) and hence compatible. When updating a set of state nodes from a given layer, we can update them in any order or even update them concurrently. To take advantage of this fact, we first sample a layer uniformly at random from the current class of order-equivalent layers and then update all state nodes in that layer. The sequence of layer updates defines a Markov chain on the space of multilayer partitions, and its convergence properties determine the level of effectiveness of our sampling algorithm.

Convergence guarantees

We now discuss two key properties — aperiodicity and ergodicity — that determine the convergence behavior of finite Markov chains.

First, note that sampling layers uniformly at random guarantees that the Markov chain is aperiodic. To see this, note that a sufficient condition for aperiodicity is that, for all states, there is a nonzero probability to transition from the state to itself. This clearly holds for our sampling chain, as it is possible for two consecutive updates to update the same layer, and (because the probability distributions in Eq. (B1) remain unchanged at the second update) there is a positive probability that the second update does not change the partition.

Second, note that if p_{i,α} < 1 for all state nodes (i, α) ∈ VM such that α ∈ [α], where [α] is a class of order-equivalent layers, then the sampling Markov chain for that class is ergodic over a subspace of multilayer partitions that includes the support G^{[α]} = ∏_{α∈[α]} G^α of the null distributions. We have already shown that the sampling Markov chain is aperiodic, so all that remains is to verify that there exists a sample path from any arbitrary initial partition S⁰ to any arbitrary multilayer partition S^g ∈ G^{[α]} in the support of the null distributions. Such a sample path clearly exists, as

$$\mathbb{P}\bigl[S_{i,\boldsymbol{\alpha}}(\tau + 1) = S^g_{i,\boldsymbol{\alpha}} \,\big|\, S_{i,\boldsymbol{\alpha}}(\tau) = S^0_{i,\boldsymbol{\alpha}}\bigr] \geq \frac{1}{|[\boldsymbol{\alpha}]|}\,(1 - p_{i,\boldsymbol{\alpha}})\, P^{\boldsymbol{\alpha}}_0[S^g_{i,\boldsymbol{\alpha}}] > 0\,.$$

Therefore, one can achieve the desired transition using one update for each state node.

Sample paths of a finite, aperiodic Markov chain converge in distribution to a stationary distribution of the Markov chain. However, different sample paths can converge to different distributions, and the eventual stationary distribution can depend both on the initial condition and on the transient behavior of the sample path. If the Markov chain is ergodic, the stationary distribution is unique, so it is independent of the initial condition and transient behavior.

After sufficiently many updates (i.e., by specifying a sufficiently large value for the expected number nupd of updates for each state node in Algorithm 1), we approximately sample the state of the Markov chains for each class of order-equivalent layers from their stationary distributions. For fully-ordered multilayer networks (i.e., for multilayer networks with U = ∅), setting nupd = 1 is sufficient for convergence, because the equivalence classes include only a single layer in this case and subsequent updates are thus independent samples from the same distribution.

Sampling initial conditions in an ‘unbiased’ way can be important for ensuring that one successfully explores the space of possible partitions, as the updating process is not necessarily ergodic over the support of the null distributions when ∑_{(i,α)∈VM} P^{j,β}_{i,α} = 1 for some state nodes. For example, if p_{i,α} = 1 for all (i, α) ∈ VM, then any partition in which all state nodes in each weakly connected component of the interlayer-dependency network have the same meso-set assignment is an absorbing state of the Markov chain.

However, provided the updating process has converged, any partition that one generates should still reflect the desired dependencies between induced partitions for different layers. Therefore, to circumvent the problem of non-unique stationary distributions when the updating process is not ergodic, we reinitialize the updating process by sampling from the null distribution for each partition that we want to generate. Reinitializing the initial partition from a fixed distribution for each sample always defines a unique joint distribution, which is a mixture of all possible stationary distributions when the updating process is not ergodic. When the updating process is ergodic, it is equivalent to sample partitions by reinitializing or to sample multiple partitions from a single long chain. Using a single long chain is usually more efficient when one needs many samples and it is not problematic if there is some dependence between samples [126]. In our case, however, we usually need only a few samples with the same parameters, and ensuring independence tends to be more important than ensuring perfect convergence (provided the generated partitions exhibit the desired interlayer-dependency structure). Using multiple chains thus has clear advantages even when the updating process is ergodic, and it is necessary to use multiple chains when it is not ergodic.

Convergence tests

It is difficult to determine whether a Markov chain has converged to a stationary distribution, mostly because it

[Figure 11 shows mNMI as a function of nupd for lags 1, 5, 10, 50, 100, and 200, in panels (a) p = 0.5, (b) p = 0.85, (c) p = 0.95, (d) p = 0.99, and (e) p = 1.]

Figure 11. Autocorrelation of the sampling Markov chain, as measured by mNMI. On the horizontal axis, we show the value of nupd that we use to generate the first partition in the comparison. The lag is the number of additional updates per state node that we use to generate the second partition in the comparison. We average results over a sample of 100 independent chains with different initial partitions. The shaded area indicates one standard deviation above and below the mean. Note that the large amount of noise for small values of the lag is not a result of differences between chains in the sample; instead, it is an inherent feature of all chains in the sample.

is difficult to distinguish the case of a slow-mixing chain becoming stuck in a particular part of state space from the case in which the chain has converged to a stationary distribution. There has been much work on trying to define convergence criteria for Markov chains [127], but none of the available approaches is entirely successful. In practice, one usually runs a Markov chain (or chains) for a predetermined number of steps. One checks manually for a few examples that the resulting chains exhibit behavior that is consistent with convergence by examining autocorrelations between samples of the same chain and cross-correlations between samples of independent chains with the same initial state. When feasible, one can also check whether parts of different, independent


[Figure 12 shows the mean absolute distance d as a function of nupd for lags 1, 5, 10, 50, 100, and 200, in panels (a) p = 0.5, (b) p = 0.85, (c) p = 0.95, (d) p = 0.99, and (e) p = 1.]

Figure 12. Convergence of the dependency pattern between induced partitions in different layers, as measured by the mean absolute distance d between NMI matrices (see Eq. (B2)). On the horizontal axis, we show the value of nupd that we use to generate the first partition in the comparison. The lag is the number of additional updates per state node that we use to generate the second partition in the comparison. We average results over a sample of 100 independent chains with different initial partitions. The shaded area indicates one standard deviation above and below the mean.

chains or different parts of the same chain are consistent with sampling from the same distribution.

To estimate the number of updates that we need in the numerical experiments of Section VI A, we examine the autocorrelation of the Markov chain in two different ways. In Fig. 11, we consider the multilayer NMI (mNMI) between sampled partitions at different steps of the Markov chain. We are interested in the behavior of the mNMI both as a function of nupd (the expected number of updates per state node that we use to generate the first partition in the comparison) and of the lag (the number of additional updates per state node that we use to generate the second partition in the comparison). As the Markov chain converges to a stationary distribution, the value of the mNMI for a given lag should become independent of nupd (so the curves in Fig. 11 should become flat). The results in Fig. 11 suggest that the number of updates that we need for convergence increases moderately with p; nupd ≈ 50 is sufficient for convergence even when p = 1. In Fig. 12, we examine whether the dependency pattern between induced partitions in different layers — which we characterize by the NMI between induced partitions in different layers — has converged, using the mean absolute distance

\[
d(S, T) \;=\; \frac{2}{l(l-1)} \sum_{\alpha < \beta} \bigl|\,\mathrm{NMI}(S|_\alpha, S|_\beta) - \mathrm{NMI}(T|_\alpha, T|_\beta)\,\bigr| \tag{B2}
\]

between the NMI matrices of induced partitions of a multilayer partition. (We showed examples of such NMI matrices in Figs. 4 and 5.) As with mNMI, the mean absolute distance d should become independent of nupd once the Markov chain has converged to a stationary distribution. This yields estimates for the minimum number of updates for convergence that are consistent with those from the results in Fig. 11. It is always safe to use a larger number of updates to sample partitions, and the results from Figs. 11 and 12 suggest that our choice of nupd = 200 is conservative.
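To make Eq. (B2) concrete, the following sketch (our own illustration, not the authors' code) computes d for two multilayer partitions stored as integer label arrays; the helper `nmi` uses the common arithmetic-mean normalization, which is an assumption on our part.

```python
import numpy as np

def nmi(x, y):
    """Normalized mutual information between two label vectors
    (arithmetic-mean normalization; other normalizations also exist)."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)   # joint label counts
    joint /= n
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))
    mask = joint > 0
    mi = np.sum(joint[mask] * np.log(joint[mask] / np.outer(px, py)[mask]))
    denom = 0.5 * (hx + hy)
    return mi / denom if denom > 0 else 1.0

def mean_abs_distance(S, T):
    """Eq. (B2): mean absolute distance between the NMI matrices of two
    multilayer partitions, stored as (n_nodes, n_layers) label arrays."""
    l = S.shape[1]
    total = sum(abs(nmi(S[:, a], S[:, b]) - nmi(T[:, a], T[:, b]))
                for a in range(l) for b in range(a + 1, l))
    return 2.0 * total / (l * (l - 1))
```

By construction, d(S, S) = 0, and d lies in [0, 1] because each NMI value does.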

Figs. 11 and 12 reveal additional important information about the behavior of our sampling process. We can estimate the mixing time of the sampling Markov chain at stationarity from Fig. 11, because the mixing time corresponds to the lag that is necessary for the mNMI to converge to 0. The mixing time at stationarity increases with p and becomes infinite as p → 1, because the Markov chain converges to an absorbing state. From Fig. 12, we observe that, for all values of p, the mean absolute distance d between NMI matrices converges to a value near 0 and becomes essentially independent of the lag. This corroborates our assumption that different partitions that we generate from our sampling process have similar interlayer-dependency structure.

Appendix C: A generative model for temporal networks

In this appendix, we examine the generative model for temporal networks that we considered in Section IV D. In this model, we assume uniform dependencies between contiguous layers (i.e., p_β = p ∈ [0, 1] for all β ∈ {2, . . . , l} in Fig. 2a). For this choice of interlayer-dependency tensor, our generative model reduces to Algorithm 2. In this case (and, more generally, for any fully-ordered multilayer network), convergence is automatic, because we need only one iteration through the layers.

A first important feature of the generative model in Algorithm 2 is that it respects the arrow of time. In particular, meso-set assignments in a given layer depend only on meso-set assignments in the previous layer (e.g., the previous temporal snapshot) and on the null distributions


function TemporalPartition(p, P0)
    S = 0                              ▷ Initialize partition tensor of appropriate size
    for i ∈ V do                       ▷ Loop over nodes in some order
        S_{i,1} ∼ P^1_0                ▷ Initialize induced partition on first layer using null distribution
    end for
    for α ∈ 2 . . . l do               ▷ Loop over layers in sequential order
        for i ∈ V do                   ▷ Loop over nodes in some order
            if rand() < p then         ▷ With probability p
                S_{i,α} = S_{i,α−1}    ▷ Copying update (Step C)
            else                       ▷ With probability 1 − p
                S_{i,α} ∼ P^α_0        ▷ Reallocation update (Step R)
            end if
        end for
    end for
    return S
end function

ALG. 2. Pseudocode for generating partitions of a temporal network with uniform interlayer dependencies p between successive layers (i.e., p_β = p for all β ∈ {2, . . . , l} in Fig. 2a).

P_0. That is, for all s ∈ {1, . . . , nl} and all α ∈ {2, . . . , l}, we have

\[
P\bigl[S_{i,\alpha} = s \,\big|\, S|_{\alpha-1}\bigr] \;=\; p\,\delta(S_{i,\alpha-1}, s) \;+\; (1-p)\,P^{\alpha}_{0}\bigl[S_{i,\alpha} = s\bigr]\,, \tag{C1}
\]

where S|_α is the n-node partition that is induced on layer α by the multilayer partition S. The value of p determines the relative importance of the previous layer and the null distribution. When p = 0, meso-set assignments in a given layer depend only on the null distribution of that layer (i.e., on the second term on the right-hand side of Eq. (C1)). When p = 1, meso-set assignments in a given layer are identical to those of the previous layer (and, by recursion, to meso-set assignments in all previous layers).
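As a minimal Python sketch of Algorithm 2 (our own transcription, assuming categorical null distributions P^α_0 given as probability vectors over meso-set labels; the function and variable names are ours):

```python
import numpy as np

def temporal_partition(p, P0, n, rng=None):
    """Sample a multilayer partition for a temporal network (cf. Algorithm 2).

    p   : copying probability between consecutive layers
    P0  : list of length l; P0[a] is the categorical null distribution
          (a probability vector over meso-set labels) for layer a
    n   : number of nodes (every node is present in every layer)
    Returns an (n, l) integer array S with S[i, a] the label of node i in layer a.
    """
    rng = np.random.default_rng(rng)
    l = len(P0)
    S = np.zeros((n, l), dtype=int)
    # Initialize the induced partition on the first layer from its null distribution.
    S[:, 0] = rng.choice(len(P0[0]), size=n, p=P0[0])
    for a in range(1, l):           # loop over layers in temporal order
        for i in range(n):          # loop over nodes
            if rng.random() < p:
                S[i, a] = S[i, a - 1]                       # copying update (Step C)
            else:
                S[i, a] = rng.choice(len(P0[a]), p=P0[a])   # reallocation update (Step R)
    return S
```

With p = 1, every layer reproduces the first layer's induced partition; with p = 0, the induced partitions on different layers are independent samples from the null distributions.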

Using Eq. (C1), we see that the marginal probability that a given state node has a specific meso-set assignment is

\[
\begin{aligned}
P[S_{i,\alpha} = s] &= p\,P[S_{i,\alpha-1} = s] + (1-p)\,P^{\alpha}_{0}[S_{i,\alpha} = s]\\
&= P^{1}_{0}[S_{i,1} = s]\,p^{\alpha-1} + (1-p)\sum_{\beta=2}^{\alpha-1} P^{\beta}_{0}[S_{i,\beta} = s]\,p^{\alpha-\beta} + (1-p)\,P^{\alpha}_{0}[S_{i,\alpha} = s]\,, \qquad \alpha > 1\,.
\end{aligned}
\]

Computing marginal probabilities can be useful for computing expected meso-set sizes for a given choice of null distributions.
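As an illustration (under the same assumptions as above, with categorical null distributions stored as probability vectors, 0-indexed layers, and names of our own choosing), one can compute these marginals either by iterating the recursion or via the unrolled closed-form sum; the two agree:

```python
import numpy as np

def marginals_recursive(p, P0):
    """Marginal label probabilities for each layer, by iterating
    P_a = p * P_{a-1} + (1 - p) * P0[a]."""
    probs = [np.asarray(P0[0], dtype=float)]
    for a in range(1, len(P0)):
        probs.append(p * probs[-1] + (1 - p) * np.asarray(P0[a], dtype=float))
    return probs

def marginal_closed_form(p, P0, a):
    """Closed-form marginal for 0-indexed layer a, unrolling the recursion."""
    if a == 0:
        return np.asarray(P0[0], dtype=float)
    out = np.asarray(P0[0], dtype=float) * p ** a
    for b in range(1, a + 1):
        out = out + (1 - p) * np.asarray(P0[b], dtype=float) * p ** (a - b)
    return out
```

Multiplying the marginal vector for a layer by the number of nodes n gives the expected induced meso-set sizes in that layer.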

We now highlight how the copying update (i.e., Step C) and the reallocation update (i.e., Step R) in Algorithm 2 govern the evolution of meso-set assignments between consecutive layers. Steps C and R deal with the movement of nodes by first removing some nodes ("subtraction") and then reallocating them ("addition"). In Step C, a meso-set label s in layer α can lose a number of nodes that ranges from none of them to all of them. It can keep all of its nodes in layer α + 1 (i.e., S_s|_{α+1} = S_s|_α), lose some of its nodes (i.e., S_s|_{α+1} ⊂ S_s|_α), or disappear entirely (i.e., S_s|_{α+1} = ∅ and S_s|_α ≠ ∅). The null distribution in Step R is responsible for a meso-set label s gaining new nodes (i.e., S_s|_{α+1} ⊄ S_s|_α) or for the appearance of a new meso-set label (i.e., S_s|_{α+1} ≠ ∅ and S_s|_α = ∅). When defining the null distributions P_0, it is necessary to consider the interplay between Step C and Step R.

To illustrate how the meso-set-assignment copying process and the null distribution in Algorithm 2 can interact with each other, we give the conditional probability that a label disappears in layer α and the conditional probability that a label appears in layer α. For all s ∈ {1, . . . , nl} and all α ∈ {2, . . . , l}, the conditional probability that a label disappears in layer α is

\[
\begin{aligned}
P\bigl[S_s|_\alpha = \emptyset \,\big|\, S|_{\alpha-1}\bigr] \;=\; &\bigl[(1-p)\,\bigl(1 - P^{\alpha}_{0}[S_{i,\alpha} = s]\bigr)\bigr]^{\left|S_s|_{\alpha-1}\right|}\\
&\times \bigl[p + (1-p)\,\bigl(1 - P^{\alpha}_{0}[S_{i,\alpha} = s]\bigr)\bigr]^{\,n - \left|S_s|_{\alpha-1}\right|}\,. \tag{C2}
\end{aligned}
\]

The expression (C2) depends only on our copying process and simplifies to \((1-p)^{\left|S_s|_{\alpha-1}\right|}\) when P^α_0[S_{i,α} = s] = 0 (i.e., when the probability of being assigned to label s is 0 under the null distribution of layer α). Furthermore, for progressively larger P^α_0[S_{i,α} = s], the probability that a label disappears is progressively smaller.

For all s ∈ {1, . . . , nl} and all α ∈ {2, . . . , l}, the conditional probability that a label appears in layer α is

\[
P\bigl[S_s|_\alpha \neq \emptyset \,\big|\, S_s|_{\alpha-1} = \emptyset\bigr] \;=\; 1 - \Bigl[\,p + (1-p)\Bigl(\sum_{S_r|_{\alpha-1} \in S|_{\alpha-1}} P^{\alpha}_{0}[S_{i,\alpha} = r]\Bigr)\Bigr]^{n}\,. \tag{C3}
\]

The expression (C3) gives the probability that at least one node in layer α has the label s, given that no node in layer α − 1 has the label s. When \(\sum_{S_r|_{\alpha-1} \in S|_{\alpha-1}} P^{\alpha}_{0}[S_{i,\alpha} = r] = 0\), the probability that a label appears depends only on our meso-set-assignment copying process and is given by 1 − p^n. Furthermore, larger values of \(\sum_{S_r|_{\alpha-1} \in S|_{\alpha-1}} P^{\alpha}_{0}[S_{i,\alpha} = r]\) reduce the probability that a label appears in layer α.

All of our discussion thus far in this appendix holds for any choice of P_0. In the next two paragraphs, we give two examples to illustrate some features of the categorical null distribution from Section IV A. In particular, we focus on the effect of the support of a categorical null distribution on a sampled multilayer partition. (See Section IV A for


definitions of some of the notation that we use below.) The support G_α of a categorical null distribution P^α_0 is G_α = {s : p^α_s ≠ 0}, where s ∈ {1, . . . , nset}. An important property of the support for Fig. 2a is that overlap between G_α and G_{α+1} (i.e., G_{α+1} ∩ G_α ≠ ∅) is a necessary condition for a meso-set in layer α to gain new members in layer α + 1.

Let c^α denote the vector of expected induced meso-set sizes in layer α (i.e., c^α_s = n p^α_s), and suppose that the probabilities p^α are the same in each layer (i.e., p^α = p for all α). The expected number of meso-set labels is then the same for each layer, and the expected number of nodes with meso-set label s is also the same in each layer and is given by c^α_s. This choice produces a temporal network in which nodes change meso-set labels across layers in a way that preserves both the expected number of induced meso-sets in a layer and the expected sizes of induced meso-sets in a layer.

Now suppose that we choose the p^α values such that their supports do not overlap (i.e., G_α ∩ G_β = ∅ for all α ≠ β). At each Step C in Algorithm 2, an existing meso-set label can then only lose members; and with probability 1 − p^n, at least one new label appears in each subsequent layer. For this case, one expects p c^α_s members of meso-set s in layer α to remain in meso-set s in layer α + 1 and (1 − p) c^α_s members of meso-set s in layer α to be assigned to new meso-sets (because labels do not overlap) in layer α + 1. This choice thus produces multilayer partitions in which the expected number of new meso-set labels per layer is nonzero (unless p = 1) and the expected size of a given induced meso-set decreases in time.

Appendix D: M-DCSBM sampling procedure

In this appendix, we discuss a generalization to multilayer networks of the degree-corrected SBM (DCSBM) from Ref. [43]. In other words, it is a multilayer DCSBM (M-DCSBM). The parameters of a general, directed M-DCSBM are a multilayer partition S (which determines the assignment of state nodes to meso-sets), a block tensor W (which determines the expected number of edges between meso-sets in different layers), and a set σ of state-node parameters (which determine the allocation of edges to state nodes in meso-sets). The probability of observing an edge (or the expected number of edges, if we allow multi-edges) from state node (i,α) to state node (j,β) with meso-set assignments r = S_{i,α} and s = S_{j,β} in an M-DCSBM is

\[
P\bigl[A^{j,\beta}_{i,\alpha} = 1\bigr] \;=\; \sigma^{\beta}_{i,\alpha}\, W^{s,\beta}_{r,\alpha}\, \sigma^{j,\beta}_{\alpha}\,, \tag{D1}
\]

where W^{s,β}_{r,α} is the expected number of edges from state nodes in layer α and meso-set S_r to state nodes in layer β and meso-set S_s, the quantity σ^β_{i,α} is the probability for an edge starting in meso-set S_r in layer α and ending in layer β to be attached to state node (i,α) (note that

function DCSBM(S, σ, W)
    A = 0                                      ▷ Initialize adjacency tensor of appropriate size
    for α ∈ L do                               ▷ Loop over layers
        for r ∈ 1 . . . |S| do
            for s ∈ r . . . |S| do
                                               ▷ Sample number of edges from a Poisson distribution
                if r = s then
                    m = Poisson(W^{r,α}_{r,α} / 2)
                else
                    m = Poisson(W^{s,α}_{r,α})
                end if
                e = 0                          ▷ Count sampled edges
                while e < m do
                    i = Sample(S_r|_α, σ_{S_r|_α,α})   ▷ Sample node i from induced community r in layer α with probability σ_{i,α}
                    j = Sample(S_s|_α, σ_{S_s|_α,α})   ▷ Sample node j from induced community s in layer α with probability σ_{j,α}
                    if i ≠ j and A^{j,α}_{i,α} = 0 then    ▷ Reject self-edges and multi-edges
                        A^{j,α}_{i,α} = 1, A^{i,α}_{j,α} = 1
                        e = e + 1
                    end if
                end while
            end for
        end for
    end for
    return A
end function

ALG. 3. Sampling multilayer networks from a DCSBM with community assignments S, node parameters σ, and block tensor W.

the dependence on S_r is implicit in σ^β_{i,α}), and σ^{j,β}_α is the probability for an edge starting in layer α and ending in meso-set S_s in layer β to be attached to state node (j,β). (Note that the dependence on S_s is implicit in σ^{j,β}_α.) For an undirected M-DCSBM, both the block tensor W and the state-node parameters σ are symmetric. That is, W^{r,α}_{s,β} = W^{s,β}_{r,α} and σ^{i,α}_β = σ^β_{i,α}.

The above M-DCSBM can generate multilayer networks with arbitrary expected layer-specific in-degrees and out-degrees for each state node. (The DCSBM of [43] can generate single-layer networks with arbitrary expected degrees.) Given a multilayer network with adjacency tensor A, the layer-α-specific in-degree of state node (j,β) is

\[
k^{j,\beta}_{\alpha} = \sum_{i \in V} A^{j,\beta}_{i,\alpha}\,,
\]

and the layer-β-specific out-degree of state node (i,α) is

\[
k^{\beta}_{i,\alpha} = \sum_{j \in V} A^{j,\beta}_{i,\alpha}\,.
\]


The layer-α-specific in-degree and out-degree of state node (i,α) are the "intralayer in-degree" and "intralayer out-degree" of (i,α). For an undirected multilayer network, layer-β-specific in-degrees and out-degrees are equal (i.e., k^{i,α}_β = k^β_{i,α}). We refer to their common value as the "layer-β-specific degree" of a state node. For an ensemble of networks generated from an M-DCSBM, the associated means are

\[
\langle k^{j,\beta}_{\alpha} \rangle = \sigma^{j,\beta}_{\alpha} \sum_{r=1}^{|S|} W^{s,\beta}_{r,\alpha}\,, \qquad s = S_{j,\beta}\,, \tag{D2}
\]

and

\[
\langle k^{\beta}_{i,\alpha} \rangle = \sigma^{\beta}_{i,\alpha} \sum_{s=1}^{|S|} W^{s,\beta}_{r,\alpha}\,, \qquad r = S_{i,\alpha}\,. \tag{D3}
\]

As we mentioned in Section V, we consider undirected multilayer networks with only intralayer edges for our experiments in Section VI. The block tensor W thus does not have any interlayer contributions (i.e., W^{s,β}_{r,α} = 0 if α ≠ β). Furthermore, we can reduce the number of node parameters that we need to specify the M-DCSBM to a single parameter σ_{i,α} = σ^α_{i,α} = σ^{i,α}_α for each state node (i,α).

We sample the expected intralayer degrees e_{i,α} = ⟨k^α_{i,α}⟩, where \(k^{\alpha}_{i,\alpha} = \sum_{j \in V} A^{j,\alpha}_{i,\alpha}\), for the state nodes from a truncated power law [128] with exponent η_k, minimum cutoff k_min, and maximum cutoff k_max. We then construct the block tensor W and state-node parameters σ for the M-DCSBM from the sampled expected degrees e and the meso-set assignments S. Let

\[
\kappa_{s,\alpha} = \sum_{i \in S_s|_\alpha} e_{i,\alpha}\,, \qquad S_s \in S\,,
\]

be the expected degree of meso-set s in layer α; and let

\[
w_\alpha = \frac{1}{2} \sum_{i \in V} e_{i,\alpha}
\]

be the expected number of edges in layer α. Consequently,

\[
\sigma_{i,\alpha} = \frac{e_{i,\alpha}}{\kappa_{s,\alpha}}\,, \qquad s = S_{i,\alpha}\,,
\]

is the probability for an intralayer edge in layer α to be attached to state node (i,α), given that the edge is attached to a state node that is in layer α and is part of the community S_{i,α}.

The elements

\[
W^{s,\beta}_{r,\alpha} = \delta(\alpha,\beta)\left((1-\mu)\,\delta(r,s)\,\kappa_{s,\alpha} + \mu\,\frac{\kappa_{r,\alpha}\,\kappa_{s,\alpha}}{2 w_\alpha}\right) \tag{D4}
\]

of the block tensor give, for r ≠ s, the expected number of edges between state nodes in meso-set s in layer β and state nodes in meso-set r in layer α. For s = r and β = α, the block-tensor element W^{s,β}_{r,α} gives twice the expected number of edges. One way to think of the DCSBM benchmark is that we categorize each edge that we want to sample as an intra-meso-set edge with probability 1 − µ or as a "random edge" (i.e., an edge that can be either an intra-meso-set edge or an inter-meso-set edge) with probability µ. To sample an edge, we sample two state nodes (which we then join by an edge). We call these two state nodes the "end points" of the edge. We sample the two end points of an intra-meso-set edge with a frequency that is proportional to the expected degrees of their associated state nodes, conditional on the end points being in the same meso-set. By contrast, we sample the two end points of a random edge with a frequency proportional to the expected degrees of their associated state nodes (without conditioning on meso-set assignment). We assume that the total number of edges in layer α is sampled from a Poisson distribution [129] with mean w_α. Although our procedure for sampling edges describes a potential algorithm for sampling networks from the DCSBM benchmark, it is usually more efficient to sample edges separately for each pair of meso-sets.
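The construction of σ and of the block tensor in Eq. (D4) can be sketched as follows (a minimal single-layer illustration of our own; `dcsbm_parameters` and its argument names are assumptions, not the authors' code):

```python
import numpy as np

def dcsbm_parameters(e, labels, mu):
    """Build node parameters sigma and a single-layer block matrix W
    in the spirit of Eq. (D4) from expected degrees and meso-set labels.

    e      : array of expected intralayer degrees e_{i,alpha}
    labels : integer meso-set label for each node (0-indexed)
    mu     : mixing parameter (0 = purely assortative, 1 = fully random)
    """
    e = np.asarray(e, dtype=float)
    labels = np.asarray(labels)
    nset = labels.max() + 1
    # kappa[s]: expected total degree of meso-set s; w: expected number of edges.
    kappa = np.array([e[labels == s].sum() for s in range(nset)])
    w = e.sum() / 2.0
    # sigma[i]: probability that an edge attached to meso-set labels[i] picks node i.
    sigma = e / kappa[labels]
    # W[r, s]: expected number of edges between meso-sets r and s
    # (twice the expected number when r == s).
    W = mu * np.outer(kappa, kappa) / (2.0 * w)
    W[np.diag_indices(nset)] += (1.0 - mu) * kappa
    return sigma, W
```

A useful sanity check on this construction: the row sums of W equal the meso-set degrees κ for every µ, so the expected degrees are preserved by the mixing parameter.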

In Algorithm 3, we show pseudocode for the specific instance of the model in Eq. (D1) that we use to sample intralayer network edges (independently for each induced partition) for a given multilayer partition in our numerical experiments of Section VI. The only difference between Algorithm 3 and the sampling algorithm of [43] is that we use rejection sampling to avoid creating self-edges and multi-edges. (In other words, if we sample an edge that has already been sampled or that is a self-edge, we do not include it in the multilayer network; instead, we resample.) Rejection sampling is efficient provided all blocks of the network are sufficiently sparse, such that the probability of generating multi-edges is small. For dense blocks of a network, we instead sample edges from independent Bernoulli distributions with success probability given by Eq. (D1). This algorithm for sampling networks from a DCSBM is very efficient; it scales linearly with the number of edges in a network.
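A minimal sketch of the rejection-sampling step of Algorithm 3 for one pair of distinct meso-sets (our own illustration under the sparsity assumption stated above; the reference implementation is available at [107]):

```python
import numpy as np

def sample_block_edges(nodes_r, sigma_r, nodes_s, sigma_s, mean_edges, rng=None):
    """Sample the edges between two distinct meso-sets r and s within one layer.

    The number of edges is Poisson(mean_edges) (i.e., Poisson of the relevant
    block-tensor entry); each end point is drawn with probability proportional
    to its node parameter sigma, and self-edges and multi-edges are rejected
    and redrawn. Assumes mean_edges is small relative to the number of possible
    node pairs (a sparse block), so that rejection sampling terminates quickly.
    """
    rng = np.random.default_rng(rng)
    m = rng.poisson(mean_edges)
    edges = set()
    while len(edges) < m:
        i = rng.choice(nodes_r, p=sigma_r)
        j = rng.choice(nodes_s, p=sigma_s)
        if i != j:                             # reject self-edges
            edges.add((min(i, j), max(i, j)))  # set membership rejects multi-edges
    return edges
```

For a diagonal block (r = s), one would instead draw m from Poisson of half the block-tensor entry, as in Algorithm 3.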

In the above discussion, we defined the parameters in Eq. (D1) to generate intralayer edges in a multilayer network by specifying layer-specific in-degrees and out-degrees. That is, the layer-β-specific in-degree and out-degree of state node (i,α) are 0 if β ≠ α. We can also extend the model in Eq. (D1) to generate interlayer edges. For a directed multilayer network with interlayer edges, we sample expected layer-β-specific in-degrees e^{i,α}_β and out-degrees e^β_{i,α} for each state node (i,α) and layer β from appropriate distributions. Given expected layer-specific degrees e and a multilayer partition S, we then construct the block tensor W and state-node parameters σ for the M-DCSBM analogously to the special case that we described above. Let

\[
\kappa^{s,\alpha}_{\beta} = \sum_{i \in S_s|_\alpha} e^{i,\alpha}_{\beta}\,, \qquad \kappa^{\beta}_{s,\alpha} = \sum_{i \in S_s|_\alpha} e^{\beta}_{i,\alpha}\,, \qquad S_s \in S
\]


be the expected layer-β-specific in-degree and out-degree of meso-set s in layer α; and let

\[
w^{\beta}_{\alpha} = \sum_{i \in V} e^{\beta}_{i,\alpha} = \sum_{i \in V} e^{i,\beta}_{\alpha}
\]

be the expected number of edges from layer α to layer β. (Note the consistency constraint on expected in-degrees and expected out-degrees.) It then follows that

\[
\sigma^{i,\alpha}_{\beta} = \frac{e^{i,\alpha}_{\beta}}{\kappa^{s,\alpha}_{\beta}}\,, \qquad \sigma^{\beta}_{i,\alpha} = \frac{e^{\beta}_{i,\alpha}}{\kappa^{\beta}_{s,\alpha}}\,, \qquad s = S_{i,\alpha}\,,
\]

and

\[
W^{s,\beta}_{r,\alpha} = (1-\mu)\,\delta(r,s)\,\frac{\kappa^{\beta}_{s,\alpha} + \kappa^{s,\beta}_{\alpha}}{2} + \mu\,\frac{\kappa^{\beta}_{r,\alpha}\,\kappa^{s,\beta}_{\alpha}}{w^{\beta}_{\alpha}}\,.
\]

For directed networks, the expected layer-specific in-degrees and out-degrees generated by the above model do not correspond exactly to the values from the input parameters e, except when µ = 1.

Appendix E: The Louvain algorithm and its variants

In this appendix, we describe the Louvain algorithm and variants of the Louvain algorithm that we use in the numerical experiments in Section VI.

The Louvain algorithm [108] for maximizing (single-layer or multilayer) modularity proceeds in two phases, which are repeated iteratively. Starting from an initial partition, one considers the state nodes one by one (in some order) and places each state node in a set that results in the largest increase of modularity. (If there is no move that improves modularity, a state node keeps the same assignment.) One repeats this first phase of the algorithm until reaching a local maximum. In the second phase of the Louvain algorithm, one obtains a new modularity matrix by aggregating the sets of state nodes that one obtains after the convergence of the first phase. One then applies the algorithm's first phase to the new modularity matrix and iterates both phases until one converges to a local maximum. The two Louvain-like algorithms that we use in Section VI differ in how they select which moves to make. The first is GenLouvain, which always chooses the move that maximally increases modularity; the second is GenLouvainRand, which chooses modularity-increasing moves at random, such that the probability of a particular move is proportional to the resulting increase in the quality function. (The latter is a variant of the algorithm "LouvainRand" in [16]; in that algorithm, one chooses modularity-increasing moves uniformly at random.) We use "reiteration" and "post-processing" to improve the output of GenLouvainRand. Reiteration entails taking an output partition of GenLouvainRand as an initial partition and reiterating the algorithm until the output partition no longer changes. Post-processing entails using

[Figure 13 panels: mNMI vs. interlayer coupling strength ω for GenLouvain, GenLouvainRand without post-processing, and GenLouvainRand, and mNMI vs. relaxation rate r for InfoMap, each for µ ∈ {0, 0.1, . . . , 1} and for p = 0.85, p = 0.95, and p = 1.]

Figure 13. Effect of interlayer coupling strength ω and relaxation rate r on the ability of different community-detection algorithms to recover planted partitions as a function of the community-mixing parameter µ in a uniform multiplex benchmark (see Fig. 2b). Each multilayer network has 1000 nodes and 15 layers, and each node is present in all layers.

the Hungarian algorithm [130] to optimize the "persistence" [16] (i.e., the number of interlayer edges for which both end points have the same community assignment) of each output partition without changing the induced partitions on each layer. All Louvain variants and related functionality are available at [107].
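To illustrate the difference between the two move-selection rules (a sketch of our own; the `gains` vector of modularity increases for one state node's candidate moves is a hypothetical input, not part of the actual implementations):

```python
import numpy as np

def choose_move(gains, rng=None):
    """Select a candidate move for one state node.

    GenLouvain picks the move with the largest modularity increase;
    GenLouvainRand picks among improving moves at random, with probability
    proportional to the increase in the quality function.
    Returns (greedy_choice, randomized_choice); None means 'keep current set'.
    """
    rng = np.random.default_rng(rng)
    gains = np.asarray(gains, dtype=float)
    improving = np.flatnonzero(gains > 0)
    if improving.size == 0:
        return None, None                  # no improving move: keep the assignment
    greedy = int(np.argmax(gains))         # GenLouvain rule
    weights = gains[improving] / gains[improving].sum()
    randomized = int(rng.choice(improving, p=weights))  # GenLouvainRand rule
    return greedy, randomized
```

The randomized rule explores a larger set of local maxima across runs, which is why one reiterates and post-processes its output as described above.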

Appendix F: Multilayer NMI experiments

In this appendix, we calculate the multilayer NMI (mNMI) between the multilayer output partition and the planted multilayer partition for the numerical experi-


[Figure 14 panels: mNMI vs. interlayer coupling strength ω for GenLouvain, GenLouvainRand without post-processing, and GenLouvainRand, and mNMI vs. relaxation rate r for InfoMap, each for µ ∈ {0, 0.1, . . . , 1} and for p ∈ {0.85, 0.95, 1} with pc = 0.]

Figure 14. Effect of interlayer coupling strength ω and relaxation rate r on the ability of different community-detection algorithms to recover planted partitions as a function of the community-mixing parameter µ in a multiplex benchmark with nonuniform interlayer dependencies. Each multilayer network has 1000 nodes and 15 layers, and each node is present in all layers. The layer-coupled interlayer-dependency matrix is a block-diagonal matrix with diagonal blocks of size 5 × 5. For each diagonal block, we set the value in the interlayer-dependency matrix to a value p (so each diagonal block has the same structure as the matrix in Fig. 2b); for each off-diagonal block, we set the value to pc = 0, thereby incorporating an abrupt change in community structure. We define the null distributions in these experiments so that there is no overlap in community labels between different groups of layers.

ments in Figs. 6–9. One needs to be cautious when interpreting values of mNMI; they depend on how one determines the correspondences between communities in different layers, and these correspondences can be ambigu-

[Figure 15 panels: mNMI vs. interlayer coupling strength ω for GenLouvain, GenLouvainRand without post-processing, and GenLouvainRand, and mNMI vs. relaxation rate r for InfoMap, each for µ ∈ {0, 0.1, . . . , 1} and for p = 0.85, p = 0.95, and p = 1.]

Figure 15. Effect of interlayer coupling strength ω and relaxation rate r on the ability of different community-detection algorithms to recover planted partitions as a function of the community-mixing parameter µ in a temporal benchmark with uniform interlayer dependencies (i.e., p_β = p ∈ [0, 1] for all β ∈ {2, . . . , l} in Fig. 2a). Each multilayer network has 150 nodes and 100 layers, and each node is present in all layers.

ous. For example, in a temporal network, if a community in one layer splits into multiple communities in the next layer, there are multiple plausible ways to define labels across layers (e.g., all new communities get new labels, or the largest new community keeps the original label). Moreover, different quality functions reward interlayer-assignment labeling in different ways, and mNMI conflates this issue with how well a quality function recovers structure within layers. In particular, a small value of mNMI can hide an optimal value of ⟨NMI⟩ (as ⟨NMI⟩ = 1 is necessary, but not sufficient, to have mNMI = 1).

We include only a subset of the values of p and p


[Figure 16 panels: mNMI vs. interlayer coupling strength ω for GenLouvain, GenLouvainRand without post-processing, and GenLouvainRand, and mNMI vs. relaxation rate r for InfoMap, each for µ ∈ {0, 0.1, . . . , 1} and for p ∈ {0.85, 0.95, 1} with pc = 0.]

Figure 16. Effect of interlayer coupling strength ω and relaxation rate r on the ability of different community-detection algorithms to recover planted partitions as a function of the community-mixing parameter µ in a temporal benchmark with nonuniform interlayer dependencies. Each multilayer network has 150 nodes and 100 layers, and each node is present in all layers. Every 25th layer (i.e., for layers 25, 50, and 75), we set the value of p_β in Fig. 2a to pc = 0 (thereby incorporating an abrupt change in community structure); we set all other values of p_β in Fig. 2a to p. We define the null distributions in these experiments so that there is no overlap in community labels between different groups of layers.

that we considered in Figs. 6 and 8, as these suffice for the purpose of this appendix. In the numerical experiments of Section VI, we observed that post-processing does not have a major effect on ⟨NMI⟩. This suggests that any misassignment across layers that underemphasizes the "persistence" of a multilayer partition (which post-processing tries to mitigate [16]) does not affect an algorithm's ability to identify structure within layers. (Recall that post-processing relabels an assignment without changing the induced partitions.) We include figures without post-processing in the experiments of this appendix, as we expect the effect to be more noticeable when examining NMI values between multilayer partitions (which account for assignments both within and across layers).

In Figs. 13 and 14, we use the same benchmark networks as in the multiplex examples in Figs. 6 and 7, respectively. In Figs. 15 and 16, we use the same benchmark networks as in the temporal examples of Figs. 8 and 9, respectively. All mNMI values are means over 10 runs of the algorithms and 100 instantiations of the benchmark. (See the introduction of Section VI for more details on how we generate these instantiations.) Each curve in each figure corresponds to the mean mNMI values that we obtain for a given value of the community-mixing parameter µ, and the shaded area around a curve corresponds to the minimum and maximum mNMI values that we obtain with the 10 sample partitions for a given value of ω or r.
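The aggregation just described (mean over runs and instantiations for each curve, min/max for the shaded band) can be sketched as follows. This is an illustrative helper with hypothetical variable names, not code from the paper; `mnmi_values` holds one list of mNMI scores per algorithm run for a single parameter setting.

```python
# Sketch: aggregate mNMI scores for one parameter setting into the plotted
# statistics. Variable names are illustrative assumptions, not the paper's code.
from statistics import mean

def summarize_mnmi(mnmi_values):
    """mnmi_values: list of lists, one inner list of mNMI scores per run."""
    flat = [v for run in mnmi_values for v in run]
    return {
        "mean": mean(flat),        # plotted curve value
        "band_low": min(flat),     # lower edge of shaded area
        "band_high": max(flat),    # upper edge of shaded area
    }
```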

For experiments with nonuniform interlayer dependencies in Figs. 14 and 16, it seems that InfoMap is better at detecting abrupt differences (“breaks”, such as in the form of change points for temporal examples) in community structure than both GenLouvain and GenLouvainRand, especially for the multiplex case (where this manifests for all examined values of p). However, multilayer InfoMap correctly identifies the planted community structure only when it is particularly strong (specifically, for µ ≤ 0.3). Recall from our experiments in Figs. 7 and 9 that both GenLouvain and GenLouvainRand are better than InfoMap at detecting an induced partition within a layer (especially in cases where the planted community structure is quite weak). The reason that multilayer modularity maximization does not perfectly recover breaks in community structure of the planted multilayer partition is that multilayer modularity maximization with uniform ordinal or categorical coupling (see the introduction of Section VI) incentivizes “persistence” of community labels between layers (and nonzero persistence is achievable even when one independently samples induced partitions on different layers) [16]. However, by design, our benchmark networks have no persistence whenever there is a break in community structure because of how we defined the support of the null distributions. Therefore, we do not expect multilayer modularity maximization to recover the planted labels between layers when pc = 0; this manifests as a de facto maximum in the mNMI between the output partition and the planted partition.

ACKNOWLEDGEMENTS

MB acknowledges a CASE studentship award from the EPSRC (BK/10/41). MB was also supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1. LGSJ acknowledges a CASE studentship award from the EPSRC (BK/10/39). MB, LGSJ, AA, and MAP were supported by FET-Proactive project PLEXMATH (FP7-ICT-2011-8; grant #317614) funded by the European Commission; and MB, LGSJ, and MAP were also supported by the James S. McDonnell Foundation (#220020177). MAP was also supported by the National Science Foundation (grant #1922952) through the Algorithms for Threat Detection (ATD) program. Computational resources that we used for this research were supported in part by the National Science Foundation under Grant No. CNS-0521433, in part by Lilly Endowment, Inc. through its support for the Indiana University Pervasive Technology Institute, and in part by the Indiana METACyt Initiative. The Indiana METACyt Initiative at Indiana University Bloomington was also supported in part by Lilly Endowment, Inc. We thank Sang Hoon Lee, Peter Mucha (and his research group), Brooks Paige, and Roxana Pamfil for helpful comments.

[1] M. E. J. Newman, Networks, 2nd ed. (Oxford University Press, 2018).

[2] P. Holme and J. Saramäki, “Temporal networks,” Phys. Rep. 519, 97–125 (2012).

[3] P. Holme, “Modern temporal network theory: A colloquium,” Eur. Phys. J. B 88, 234 (2015).

[4] M. Kivelä, A. Arenas, M. Barthélemy, J. P. Gleeson, Y. Moreno, and M. A. Porter, “Multilayer networks,” J. Complex Netw. 2, 203–271 (2014).

[5] S. Boccaletti, G. Bianconi, R. Criado, C. I. Del Genio, J. Gómez-Gardeñes, M. Romance, I. Sendiña-Nadal, Z. Wang, and M. Zanin, “The structure and dynamics of multilayer networks,” Phys. Rep. 544, 1–122 (2014).

[6] A. Aleta and Y. Moreno, “Multilayer networks in a nutshell,” Annu. Rev. Condens. Matter Phys. 10 (2019).

[7] G. Bianconi, Multilayer Networks: Structure and Function (Oxford University Press, 2018).

[8] M. A. Porter, “WHAT IS ... a multilayer network,” Notices Amer. Math. Soc. 65 (2018).

[9] M. De Domenico, A. Lancichinetti, A. Arenas, and M. Rosvall, “Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems,” Phys. Rev. X 5, 011027 (2015).

[10] A. Ghasemian, P. Zhang, A. Clauset, C. Moore, and L. Peel, “Detectability thresholds and optimal algorithms for community structure in dynamic networks,” Phys. Rev. X 6, 031005 (2016).

[11] L. Peel, D. B. Larremore, and A. Clauset, “The ground truth about metadata and community detection in networks,” Science Advances 3, e1602548 (2017).

[12] M. Sarzynska, E. A. Leicht, G. Chowell, and M. A. Porter, “Null models for community detection in spatially embedded, temporal networks,” J. Complex Netw. 4, 363–406 (2015).

[13] A. R. Pamfil, S. D. Howison, R. Lambiotte, and M. A. Porter, “Relating modularity maximization and stochastic block models in multilayer networks,” SIAM Journal on Mathematics of Data Science 1, 667–698 (2019).

[14] L. G. S. Jeub and M. Bazzi, “A generative model for mesoscale structure in multilayer networks,” https://github.com/MultilayerGM (2016–2019).

[15] M. De Domenico, A. Solé-Ribalta, E. Cozzo, M. Kivelä, Y. Moreno, M. A. Porter, S. Gómez, and A. Arenas, “Mathematical formulation of multilayer networks,” Phys. Rev. X 3, 041022 (2013).

[16] M. Bazzi, M. A. Porter, S. Williams, M. McDonald, D. J. Fenn, and S. D. Howison, “Community detection in temporal multilayer networks, with an application to correlation networks,” Multiscale Model. Simul. 14, 1–41 (2016).

[17] R. F. Betzel and D. S. Bassett, “Multi-scale brain networks,” NeuroImage 160, 73–83 (2017).

[18] D. Hric, K. Kaski, and M. Kivelä, “Stochastic block model reveals maps of citation patterns and their evolution in time,” J. Informetr. 12, 757–783 (2018).

[19] L. Cantini, E. Medico, S. Fortunato, and M. Caselle, “Detection of gene communities in multi-networks reveals cancer drivers,” Sci. Rep. 5, 17386 (2015).

[20] C. De Bacco, E. A. Power, D. B. Larremore, and C. Moore, “Community detection, link prediction, and layer interdependence in multilayer networks,” Phys. Rev. E 95, 042317 (2017).

[21] N. Stanley, S. Shai, D. Taylor, and P. J. Mucha, “Clustering network layers with the strata multilayer stochastic block model,” IEEE Trans. Network Sci. Eng. 3, 95–105 (2016).

[22] It is also possible to have ordinal layering in aspects other than time.

[23] M. A. Porter, J.-P. Onnela, and P. J. Mucha, “Communities in networks,” Notices Amer. Math. Soc. 56, 1082–1097, 1164–1166 (2009).

[24] S. Fortunato and D. Hric, “Community detection in networks: A user guide,” Phys. Rep. 659 (2016).

[25] G. Rossetti and R. Cazabet, “Community discovery in dynamic networks: A survey,” ACM Comput. Surv. 51, 35 (2018).

[26] H. Cherifi, G. Palla, B. K. Szymanski, and X. Lu, “On community structure in complex networks: Challenges and opportunities,” arXiv:1908.04901 (2019).

[27] P. Csermely, A. London, L.-Y. Wu, and B. Uzzi, “Structure and dynamics of core–periphery networks,” J. Complex Netw. 1, 93–123 (2013).

[28] P. Rombach, M. A. Porter, J. H. Fowler, and P. J. Mucha, “Core–periphery structure in networks (revisited),” SIAM Rev. 59, 619–646 (2017).

[29] R. A. Rossi and N. K. Ahmed, “Role discovery in networks,” IEEE Trans. Knowl. Data Eng. 27, 1112–1131 (2015).

[30] S. Fortunato, “Community detection in graphs,” Phys. Rep. 486, 75–174 (2010).

[31] M. Vaiana and S. F. Muldoon, “Multilayer brain networks,” Journal of Nonlinear Science, 1–23 (2018), available at https://doi.org/10.1007/s00332-017-9436-8.


[32] P. J. Mucha and M. A. Porter, “Communities in multislice voting networks,” Chaos 20 (2010).

[33] P. J. Mucha, T. Richardson, K. Macon, M. A. Porter, and J.-P. Onnela, “Community structure in time-dependent, multiscale, and multiplex networks,” Science 328, 876–878 (2010).

[34] E. Omodei and A. Arenas, “Untangling the role of diverse social dimensions in the diffusion of microfinance,” Applied Network Science 1, 14 (2016).

[35] M. T. Schaub, J.-C. Delvenne, M. Rosvall, and R. Lambiotte, “The many facets of community detection in complex networks,” Applied Network Science 2, 4 (2017).

[36] L. G. S. Jeub, P. Balachandran, M. A. Porter, P. J. Mucha, and M. W. Mahoney, “Think locally, act locally: The detection of small, medium-sized, and large communities in large networks,” Phys. Rev. E 91, 012821 (2015).

[37] M. E. J. Newman, “Finding community structure in networks using the eigenvectors of matrices,” Phys. Rev. E 74, 036104 (2006).

[38] G. Petri and P. Expert, “Temporal stability of network partitions,” Phys. Rev. E 90, 022813 (2014).

[39] R. Lambiotte, J.-C. Delvenne, and M. Barahona, “Random walks, Markov processes and the multiscale modular organization of complex networks,” IEEE Trans. Network Sci. Eng. 1, 76–90 (2015) (see also the precursor paper at arXiv:0812.1770, 2008).

[40] J.-C. Delvenne, S. N. Yaliraki, and M. Barahona, “Stability of graph communities across time scales,” Proc. Nat. Acad. Sci. U.S.A. 107, 12755–12760 (2010).

[41] M. Rosvall and C. T. Bergstrom, “Maps of random walks on complex networks reveal community structure,” Proc. Nat. Acad. Sci. U.S.A. 105, 1118–1123 (2008).

[42] P. W. Holland, K. B. Laskey, and S. Leinhardt, “Stochastic blockmodels: First steps,” Soc. Networks 5, 109–137 (1983).

[43] B. Karrer and M. E. J. Newman, “Stochastic blockmodels and community structure in networks,” Phys. Rev. E 83, 016107 (2011).

[44] T. P. Peixoto, “Inferring the mesoscale structure of layered, edge-valued, and time-varying networks,” Phys. Rev. E 92, 042807 (2015).

[45] I. Kloumann, J. Ugander, and J. Kleinberg, “Block models and personalized PageRank,” Proc. Nat. Acad. Sci. U.S.A. 114, 33–38 (2016).

[46] L. G. S. Jeub, M. W. Mahoney, P. J. Mucha, and M. A. Porter, “A local perspective on community structure in multilayer networks,” Network Science 5, 144–163 (2017).

[47] J. D. Wilson, J. Palowitch, S. Bhamidi, and A. B. Nobel, “Community extraction in multilayer networks with heterogeneous community structure,” J. Mach. Learn. Res. 18, 1–49 (2017).

[48] U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, and D. Wagner, “On modularity clustering,” IEEE Trans. Knowl. Data Eng. 20, 172–188 (2008).

[49] A. Z. Jacobs and A. Clauset, “A unified view of generative models for networks: Models, methods, opportunities, and challenges,” in NIPS Workshop on Networks: From Graphs to Rich Data (2014), arXiv:1411.4070.

[50] A. Ghasemian, H. Hosseinmardi, and A. Clauset, “Evaluating overfit and underfit in models of network community structure,” IEEE Trans. Knowl. Data Eng. (2019), available at 10.1109/TKDE.2019.2911585.

[51] S. E. Schaeffer, “Graph clustering,” Comput. Sci. Rev. 1, 27–64 (2007).

[52] A. Lancichinetti and S. Fortunato, “Benchmark graphs for testing community detection algorithms on directed and weighted graphs with overlapping communities,” Phys. Rev. E 80, 1 (2009).

[53] A. Lancichinetti, S. Fortunato, and F. Radicchi, “Benchmark graphs for testing community detection algorithms,” Phys. Rev. E 78, 046110 (2008).

[54] C. Granell, R. K. Darst, A. Arenas, S. Fortunato, and S. Gómez, “A benchmark model to assess community structure in evolving networks,” Phys. Rev. E 92, 012805 (2015).

[55] A. Condon and R. M. Karp, “Algorithms for graph partitioning on the planted partition model,” Random Struct. Algor. 18, 116–140 (2000).

[56] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proc. Nat. Acad. Sci. U.S.A. 99, 7821–7826 (2002).

[57] T. P. Peixoto, “Nonparametric weighted stochastic block models,” Phys. Rev. E 97, 012306 (2018).

[58] L. G. S. Jeub, O. Sporns, and S. Fortunato, “Multiresolution consensus clustering in networks,” Sci. Rep. 8, 3259 (2018).

[59] S. Paul and Y. Chen, “Null models and modularity based community detection in multi-layer networks,” arXiv:1608.00623 (2016).

[60] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, “Inference and phase transitions in the detection of modules in sparse networks,” Phys. Rev. Lett. 107, 065701 (2011).

[61] D. Taylor, R. S. Caceres, and P. J. Mucha, “Super-resolution community detection for layer-aggregated multilayer networks,” Phys. Rev. X 7, 031056 (2017).

[62] G. K. Orman, V. Labatut, and H. Cherifi, “Towards realistic artificial benchmark for community detection algorithms evaluation,” Int. J. Web Based Communities 9, 349–370 (2013).

[63] C. Aicher, A. Z. Jacobs, and A. Clauset, “Learning latent block structure in weighted networks,” J. Complex Netw. 3, 221–248 (2014).

[64] M. E. J. Newman and T. P. Peixoto, “Generalized communities in networks,” Phys. Rev. Lett. 115, 088701 (2015).

[65] X. Zhang, C. Moore, and M. E. J. Newman, “Random graph models for dynamic networks,” Eur. Phys. J. B 90, 200 (2017).

[66] Y. Hulovatyy and T. Milenković, “SCOUT: Simultaneous time segmentation and community detection in dynamic networks,” Sci. Rep. 6, 37557 (2016).

[67] M. Q. Pasta and F. Zaidi, “Network generation model based on evolution dynamics to generate benchmark graphs,” arXiv:1606.01169 (2016).

[68] S. Paul and Y. Chen, “Consistent community detection in multi-relational data through restricted multi-layer stochastic blockmodel,” Electron. J. Stat. 10, 3807–3870 (2016).

[69] T. Vallès-Català, F. A. Massucci, R. Guimerà, and M. Sales-Pardo, “Multilayer stochastic block models reveal the multilayer structure of complex networks,” Phys. Rev. X 6, 011036 (2016).

[70] P. Barbillon, S. Donnet, E. Lazega, and A. Bar-Hen, “Stochastic block models for multiplex networks: an application to networks of researchers,” J. Roy. Stat. Soc. A 180, 295–314 (2016).

[71] Y. Huang, A. Panahi, H. Krim, and L. Dai, “Community detection and improved detectability in multiplex networks,” arXiv:1909.10477 (2019).

[72] M. De Domenico, A. Solé-Ribalta, S. Gómez, and A. Arenas, “Navigability of interconnected networks under random failures,” Proc. Nat. Acad. Sci. U.S.A. 111, 8351–8356 (2014).

[73] S. Gómez, A. Díaz-Guilera, J. Gómez-Gardeñes, C. J. Pérez Vicente, Y. Moreno, and A. Arenas, “Diffusion dynamics on multiplex networks,” Phys. Rev. Lett. 110, 028701 (2013).

[74] Flattening (i.e., ‘matricizing’) [4] a tensor T entails writing its entries in matrix form. Consequently, starting from a tensor T with elements T^β_α, one obtains a matrix T with entries T_{α,β}, where there are bijective mappings between the indices.
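As an illustration of this flattening, the sketch below maps a rank-4 multilayer adjacency tensor A[i][s][j][r] (node i in layer s connected to node j in layer r) to a supra-adjacency matrix. The particular bijection (i, s) ↦ i · (number of layers) + s is one arbitrary choice of index mapping, assumed here for concreteness; any bijection works.

```python
# Sketch: flatten ("matricize") a rank-4 multilayer adjacency tensor into a
# supra-adjacency matrix. The index bijection used here is an arbitrary choice.

def flatten(A, n_nodes, n_layers):
    """Return M with M[idx(i, s)][idx(j, r)] = A[i][s][j][r]."""
    idx = lambda node, layer: node * n_layers + layer  # bijective index mapping
    size = n_nodes * n_layers
    M = [[0.0] * size for _ in range(size)]
    for i in range(n_nodes):
        for s in range(n_layers):
            for j in range(n_nodes):
                for r in range(n_layers):
                    M[idx(i, s)][idx(j, r)] = A[i][s][j][r]
    return M
```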

[75] T. G. Kolda and B. W. Bader, “Tensor decompositions and applications,” SIAM Rev. 51, 455–500 (2009).

[76] M. S. Bartlett, An Introduction to Stochastic Processes, 3rd ed. (Press Syndicate of the University of Cambridge, Cambridge, UK, 1978).

[77] S. van Buuren, Flexible Imputation of Missing Data, 2nd ed. (Chapman & Hall/CRC Interdisciplinary Statistics, Boca Raton, FL, USA, 2018).

[78] J. E. Besag, “Spatial interaction and the statistical analysis of lattice systems,” J. Roy. Stat. Soc. B 36, 192–236 (1974).

[79] S. E. Fienberg and A. B. Slavkovic, “Preserving the confidentiality of categorical statistical data bases when releasing information for association rules,” Data Min. Knowl. Discov. 11, 155–180 (2005).

[80] D. Heckerman, D. M. Chickering, C. Meek, R. Rounthwaite, and C. Kadie, “Dependency networks for inference, collaborative filtering, and data visualization,” J. Mach. Learn. Res. 1, 49–75 (2000).

[81] A. E. Gelfand, “Gibbs sampling,” J. Am. Stat. Assoc. 95, 1300–1304 (2000).

[82] J. P. Hobert and G. Casella, “Functional compatibility, Markov chains, and Gibbs sampling with improper posteriors,” J. Comput. Graph. Stat. 7, 42–60 (1998).

[83] B. C. Arnold and S. J. Press, “Compatible conditional distributions,” J. Am. Stat. Assoc. 84, 152–156 (1989).

[84] H.-Y. Chen, “Compatibility of conditionally specified models,” Stat. Probab. Lett. 80, 670–677 (2010).

[85] S.-H. Chen and E. H. Ip, “Behavior of the Gibbs sampler when conditional distributions are potentially incompatible,” J. Stat. Comput. Simul. 85, 3266–3275 (2015).

[86] K.-L. Kuo and Y. J. Wang, “A simple algorithm for checking compatibility among discrete conditional distributions,” Comput. Stat. Data Anal. 55, 2457–2462 (2011).

[87] K.-L. Kuo and Y. J. Wang, “Pseudo-Gibbs sampler for discrete conditional distributions,” Ann. Inst. Stat. Math. 71, 93–105 (2017).

[88] D. S. Bassett, N. F. Wymbs, M. A. Porter, P. J. Mucha, J. M. Carlson, and S. T. Grafton, “Dynamic reconfiguration of human brain networks during learning,” Proc. Nat. Acad. Sci. U.S.A. 108, 7641–7646 (2011).

[89] J. M. Buldú and M. A. Porter, “Frequency-based brain networks: From a multiplex framework to a full multilayer description,” Network Neuro. 2, 418–441 (2017).

[90] U. Aslak, M. Rosvall, and S. Lehmann, “Constrained information flows in temporal networks reveal intermittent communities,” Phys. Rev. E 97, 062312 (2018).

[91] E. Valdano, C. Poletto, and V. Colizza, “Infection propagator approach to compute epidemic thresholds on temporal networks: impact of immunity and of limited temporal resolution,” Eur. Phys. J. B 88, 1–11 (2015).

[92] H. Raiffa and R. Schlaifer, Applied Statistical Decision Theory, Wiley Classics Library (Wiley-Interscience, New York City, NY, USA, 2000).

[93] A. Agarwal and H. Daumé III, “A geometric view of conjugate priors,” Mach. Learn. 81, 99–113 (2010).

[94] For a fully ordered multilayer network (i.e., where U = ∅), condition (5) is equivalent to the existence of a matrix representation of the flattened interlayer-dependency tensor that is upper triangular. (See Fig. 2a for an example.) In the case of a partially ordered multilayer network, the same equivalence holds if we “aggregate” [15] the interlayer-dependency tensor over unordered aspects.

[95] One defines lexicographic order for the Cartesian product of ordered sets similarly to the way that words are ordered in a dictionary. For two elements e, f ∈ E = E_1 × … × E_n, we define e ≺ f if e_1 < f_1 or if there exists i such that e_i < f_i and e_{i′} = f_{i′} for all i′ < i.
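The lexicographic order of endnote [95] can be sketched directly in code: e precedes f exactly when, at the first index where the two tuples differ, the entry of e is smaller. (Python's built-in comparison of equal-length tuples implements the same order; the helper below is purely illustrative.)

```python
# Sketch: lexicographic order on tuples, as in endnote [95]. e strictly
# precedes f iff at the first differing index i we have e[i] < f[i].

def lex_precedes(e, f):
    """Return True if tuple e strictly precedes tuple f lexicographically."""
    for ei, fi in zip(e, f):
        if ei != fi:
            return ei < fi   # first differing coordinate decides the order
    return False             # equal tuples: neither strictly precedes the other
```

For equal-length tuples this agrees with Python's `<` on tuples, which is one way to sanity-check the definition.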

[96] The order in which we update state nodes in the same layer has no effect on the sampling process, because their conditional distributions are independent. One can even update them in parallel.

[97] L. Peel and A. Clauset, “Detecting change points in the large-scale structure of evolving networks,” in Proceedings of the 29th Conference on Artificial Intelligence (AAAI Press, Palo Alto, California, USA, 2015), pp. 2914–2920.

[98] Different choices for normalizing mutual information yield different versions of NMI [131]. We use joint-entropy normalization, which entails that

NMI(S, T) = I(S, T) / H(S, T),

where S and T denote two multilayer partitions, I denotes mutual information, and H denotes joint entropy. This variant of the NMI has the desirable property that

1 − NMI(S, T) = NVI(S, T) = VI(S, T) / H(S, T),

where NVI is a normalized version of variation of information (VI) [132, 133]. Variation of information is a metric for comparing partitions, and NVI preserves its metric properties [134].
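The quantities in endnote [98] can be sketched as follows, using I(S, T) = H(S) + H(T) − H(S, T) and VI(S, T) = 2H(S, T) − H(S) − H(T). Partitions are represented here as equal-length label sequences (one label per state node); this is an illustrative implementation, not the paper's code, and the handling of the degenerate case H(S, T) = 0 is a convention we assume.

```python
# Sketch: joint-entropy-normalized NMI and NVI, with 1 - NMI = NVI = VI / H(S,T).
from collections import Counter
from math import log

def _entropy(counts, n):
    """Shannon entropy (natural log) of an empirical distribution."""
    return -sum(c / n * log(c / n) for c in counts.values())

def nmi(S, T):
    """NMI(S, T) = I(S, T) / H(S, T), with joint-entropy normalization."""
    n = len(S)
    h_s, h_t = _entropy(Counter(S), n), _entropy(Counter(T), n)
    h_st = _entropy(Counter(zip(S, T)), n)   # joint entropy H(S, T)
    mutual_info = h_s + h_t - h_st           # I(S, T)
    return mutual_info / h_st if h_st > 0 else 1.0  # convention for trivial case

def nvi(S, T):
    """NVI(S, T) = VI(S, T) / H(S, T); equals 1 - NMI(S, T)."""
    n = len(S)
    h_st = _entropy(Counter(zip(S, T)), n)
    vi = 2 * h_st - _entropy(Counter(S), n) - _entropy(Counter(T), n)
    return vi / h_st if h_st > 0 else 0.0
```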

[99] R. Gallotti and M. Barthélemy, “Anatomy and efficiency of urban multimodal mobility,” Sci. Rep. 4, 6911 (2014).

[100] M. Zanin and F. Lillo, “Modelling the air transport with complex networks: A short review,” Eur. Phys. J. Spec. Top. 215, 5–21 (2013).

[101] S. Galhotra, A. Mazumdar, S. Pal, and B. Saha, “The geometric block model,” in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18) (2018).


[102] K.-M. Lee, J. Y. Kim, W.-K. Cho, K.-I. Goh, and I.-M. Kim, “Correlated multiplexity and connectivity of multiplex random networks,” New J. Phys. 14, 033027 (2012).

[103] A. R. Pamfil, S. D. Howison, and M. A. Porter, “Edge correlations in multilayer networks,” arXiv:1908.03875 (2019).

[104] R. Lambiotte, L. Tabourier, and J.-C. Delvenne, “Burstiness and spreading on temporal networks,” Eur. Phys. J. B 86, 320 (2013).

[105] D. J. Daley and D. Vere-Jones, An Introduction to the Theory of Point Processes, 2nd ed., Vol. 1 (Springer-Verlag, Heidelberg, Germany, 2003).

[106] A. G. Hawkes, “Spectra of some self-exciting and mutually exciting point processes,” Biometrika 58, 83–90 (1971).

[107] L. G. S. Jeub, M. Bazzi, I. S. Jutla, and P. J. Mucha, “A generalized Louvain method for community detection implemented in MATLAB,” https://github.com/GenLouvain/GenLouvain (2011–2019), version 2.2.

[108] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” J. Stat. Mech: Theory Exp. 2008, P10008 (2008).

[109] B. K. Fosdick, D. B. Larremore, J. Nishimura, and J. Ugander, “Configuring random graph models with fixed degree sequences,” SIAM Rev. 60, 315–355 (2018).

[110] We use version 0.18.2 of the InfoMap code with the “--two-level” option. The code is available at http://mapequation.org/code.

[111] M. Rosvall and C. T. Bergstrom, “An information-theoretic framework for resolving community structure in complex networks,” Proc. Nat. Acad. Sci. U.S.A. 104, 7327–7331 (2007).

[112] S. J. Cranmer, E. J. Menninga, and P. J. Mucha, “Kantian fractionalization predicts the conflict propensity of the international system,” Proc. Nat. Acad. Sci. U.S.A. 112, 11812–11816 (2015).

[113] M. Barigozzi, G. Fagiolo, and D. Garlaschelli, “Multinetwork of international trade: A commodity-specific analysis,” Phys. Rev. E 81, 046104 (2010).

[114] R. Mastrandrea, T. Squartini, G. Fagiolo, and D. Garlaschelli, “Reconstructing the world trade multiplex: The role of intensive and extensive biases,” Phys. Rev. E 90, 062804 (2014).

[115] W. Fan and A. Yeung, “Similarity between community structures of different online social networks and its impact on underlying community detection,” Commun. Nonlinear Sci. Numer. Simul. 20, 1015–1025 (2015).

[116] S. Kéfi, V. Miele, E. A. Wieters, S. A. Navarrete, and E. L. Berlow, “How structured is the entangled bank? The surprisingly simple organization of multiplex ecological networks leads to increased persistence and resilience,” PLoS Biol. 14, 1–21 (2016).

[117] We observe some erratic behavior at the transition as the interlayer coupling ω approaches 1 from below. For values of ω near 1 but smaller than 1, the GenLouvain algorithm has a tendency to place all state nodes in a single community. This observation is related to the transition behavior that was described in [16]. For values of ω above a certain threshold [16], only interlayer merges occur in the first phase of Louvain.

[118] M. Starnini, A. Baronchelli, and R. Pastor-Satorras, “Effects of temporal correlations in social multiplex networks,” Sci. Rep. 7 (2017).

[119] A. Lomi, G. Robins, and M. Tranmer, “Introduction to multilevel social networks,” Soc. Networks 44, 266–268 (2016).

[120] M. Rosvall, A. V. Esquivel, A. Lancichinetti, J. D. West, and R. Lambiotte, “Memory in network flows and its effects on spreading dynamics and community detection,” Nat. Commun. 5, 4630 (2014).

[121] V. Nicosia and V. Latora, “Measuring and modeling correlations in multiplex networks,” Phys. Rev. E 92, 032805 (2015).

[122] T. A. B. Snijders and K. Nowicki, “Estimation and prediction for stochastic blockmodels for graphs with latent block structure,” J. Classif. 14, 75–100 (1997).

[123] K. Nowicki and T. A. B. Snijders, “Estimation and prediction for stochastic blockstructures,” J. Am. Stat. Assoc. 96, 1077–1087 (2001).

[124] T. P. Peixoto, “Bayesian stochastic blockmodeling,” arXiv:1705.10225 (2017).

[125] P. Latouche, E. Birmelé, and C. Ambroise, “Variational Bayesian inference and complexity control for stochastic block models,” Stat. Modelling 12, 93–115 (2012).

[126] L. Tierney, “Markov chains for exploring posterior distributions,” Ann. Statist. 22, 1701–1728 (1994).

[127] M. K. Cowles and B. P. Carlin, “Markov chain Monte Carlo convergence diagnostics: A comparative review,” J. Am. Stat. Assoc. 91, 883–904 (1996).

[128] A power law with cutoffs x_min and x_max and exponent η is a continuous probability distribution with probability density function

p(x) = C x^{−η} for x_min ≤ x ≤ x_max, and p(x) = 0 otherwise,

where

C = (η − 1) / (x_min^{−(η−1)} − x_max^{−(η−1)})

is the normalization constant.

[129] The choice of a Poisson distribution for the number of edges ensures that this approach for sampling edges is approximately consistent with sampling edges independently from Bernoulli distributions with the success probabilities in Eq. (D1). The exact distribution for the number of edges is a Poisson–binomial distribution, from which it is difficult to sample; a Poisson–binomial distribution is well approximated by a Poisson distribution if none of the individual edge probabilities are too large.
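One standard way to sample from the truncated power law of endnote [128] is inverse-transform sampling: the CDF is F(x) = (x_min^{−(η−1)} − x^{−(η−1)}) / (x_min^{−(η−1)} − x_max^{−(η−1)}), which can be solved for x in closed form. The sketch below assumes η > 1 and is an illustrative helper, not code from the paper.

```python
# Sketch: inverse-transform sampling from a power law with cutoffs x_min, x_max
# and exponent eta > 1 (endnote [128]). Illustrative helper, not the paper's code.
import random

def sample_power_law(eta, x_min, x_max):
    """Draw one sample by inverting the CDF of the truncated power law."""
    u = random.random()
    a = x_min ** (-(eta - 1))
    b = x_max ** (-(eta - 1))
    # CDF: F(x) = (a - x^{-(eta-1)}) / (a - b); solve F(x) = u for x.
    return (a - u * (a - b)) ** (-1.0 / (eta - 1))
```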

[130] H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Res. Logist. Quart. 2, 83–97 (1955).

[131] Y. Y. Yao, “Information-theoretic measures for knowledge discovery and data mining,” in Entropy Measures, Maximum Entropy Principle and Emerging Applications, Studies in Fuzziness and Soft Computing, Vol. 119, edited by Karmeshu (Springer-Verlag, Heidelberg, Germany, 2003), pp. 115–136.

[132] M. Meilă, “Comparing clusterings by the variation of information,” in Learning Theory and Kernel Machines, Lecture Notes in Computer Science, Vol. 2777, edited by B. Schölkopf and M. K. Warmuth (Springer-Verlag, Heidelberg, Germany, 2003).

[133] M. Meilă, “Comparing clusterings — An information based distance,” J. Multivar. Anal. 98, 873–895 (2007).


[134] M. Li, J. H. Badger, X. Chen, S. Kwong, P. Kearney, and H. Zhang, “An information-based sequence distance and its application to whole mitochondrial genome phylogeny,” Bioinformatics 17, 149–154 (2001).