Applications of dense graph limits in probability …souravc/beam-ERGM-trans.pdfApplications of...

Applications of dense graph limits in probability and statistics

Sourav Chatterjee

(Courant Institute, NYU)

Based on joint works with Persi Diaconis and S. R. S. Varadhan

Sourav Chatterjee Dense graph limits in probability and statistics

Real world networks

- The last decade has seen an explosion in the study of real-world networks, e.g. rail and road networks, biochemical networks, data communication networks such as the Internet, and social networks.

- Concerted interdisciplinary effort to develop new mathematical network models to explain characteristics of observed real-world networks, such as power law degree behavior, small-world properties, and a high degree of clustering.

- Clustering/transitivity/reciprocity refers to the prevalence of triangles in a graph.

- That is, a friend of a friend is more likely to be a friend than a random individual.

- Most of the popular network models, such as the preferential attachment and the configuration models, are locally tree-like and thus do not model the transitivity observed in real social networks.


Exponential Random Graphs

- In the social science literature, efforts to mathematically model transitivity have centered around the so-called Exponential Random Graph Models (ERGMs), also called $p^*$ models.

- Statistically, ERGMs are exponential families of distributions on the space of graphs on a given number of vertices.

- The sufficient statistics are usually simple graph parameters, such as the number of edges, number of triangles, etc.

- Notable early papers are due to Holland and Leinhardt (1981) and Frank and Strauss (1986). General development in the book of Wasserman and Faust. Recent progress in Handcock (2003), Snijders et al. (2006), Park and Newman (2004, 2005), etc.


Example

- Consider the model on simple graphs with $n$ vertices,

$$p_{\beta_1,\beta_2}(G) = \exp\Big(\beta_1 E + \frac{\beta_2}{n}\Delta - n^2\psi_n(\beta_1,\beta_2)\Big),$$

where $E$, $\Delta$ denote the number of edges and triangles in the graph $G$, and $\psi_n$ is the normalizing constant.

- The $1/n$ scaling of the triangle term makes both $E$ and $\Delta/n$ of order $n^2$, which ensures non-trivial large-$n$ limits. Without this scaling, for large $n$, almost all graphs are empty or full.

- This model is studied by Strauss (1986), Park and Newman (2004, 2005), Haggstrom and Jonasson (1999), and many others.
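As an illustration of the two sufficient statistics, here is a minimal Python sketch (assuming NumPy, and a graph given as a symmetric 0/1 adjacency matrix with zero diagonal) that computes $E$, $\Delta$ and the unnormalized log-weight $\beta_1 E + \beta_2\Delta/n$. The function names are hypothetical; $\psi_n$ is not computed, since that would require summing over all $2^{n(n-1)/2}$ graphs.

```python
import numpy as np

def edges_and_triangles(A):
    """Return (E, Delta) for a simple graph given by its adjacency matrix."""
    A = np.asarray(A, dtype=np.int64)
    E = int(A.sum()) // 2                   # each edge appears twice in A
    Delta = int(np.trace(A @ A @ A)) // 6   # each triangle gives 6 closed walks of length 3
    return E, Delta

def log_weight(A, beta1, beta2):
    """Unnormalized log-probability beta1*E + (beta2/n)*Delta of the edge-triangle model."""
    n = len(A)
    E, Delta = edges_and_triangles(A)
    return beta1 * E + beta2 * Delta / n

# Example: the complete graph K_4 has 6 edges and 4 triangles.
K4 = np.ones((4, 4), dtype=np.int64) - np.eye(4, dtype=np.int64)
print(edges_and_triangles(K4))      # (6, 4)
print(log_weight(K4, 1.0, 2.0))     # 6 + 2*4/4 = 8.0
```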


Challenges

- For thirty years, nothing much could be done mathematically with these models. For example, there was no formula for $\psi_n$, nor any rigorously proven information about qualitative behavior.

- Approximation of the normalizing constant, necessary for evaluation of maximum likelihood estimates, is usually done with the aid of Markov chain Monte Carlo (MCMC).

- But: Bhamidi, Bresler and Sly (2008) have shown that, depending on $\beta_1$ and $\beta_2$,

  - either the model behaves like an Erdos-Renyi random graph (the uninteresting case),

  - or the usual MCMC algorithms take exponentially long to converge.

- An alternative (widely used) approach is via pseudo-likelihood methods. But their properties are poorly understood, and they have appreciably higher variability than the MLE.
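To make the MCMC approach concrete, here is a minimal sketch of one standard choice, single-edge heat-bath (Glauber) dynamics, for the edge-triangle model. It is an illustration under stated assumptions, not the specific chain analyzed by Bhamidi, Bresler and Sly. Resampling the pair $\{i,j\}$ changes $E$ by 1 and $\Delta$ by the number $c_{ij}$ of common neighbours of $i$ and $j$, so the edge is present with conditional probability $\sigma(\beta_1 + \beta_2 c_{ij}/n)$, where $\sigma(x) = 1/(1+e^{-x})$; the unknown $\psi_n$ cancels. The function names and parameters are hypothetical.

```python
import math
import numpy as np

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def glauber_edge_triangle(n, beta1, beta2, steps, seed=0):
    """Heat-bath dynamics for p(G) ∝ exp(beta1*E + beta2*Delta/n), started from the empty graph."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n, n), dtype=np.int64)
    for _ in range(steps):
        i, j = rng.choice(n, size=2, replace=False)   # a uniformly random pair of distinct vertices
        common = int(A[i] @ A[j])                      # triangles the edge {i, j} would close
        p = sigmoid(beta1 + beta2 * common / n)        # conditional probability that {i, j} is an edge
        A[i, j] = A[j, i] = 1 if rng.random() < p else 0
    return A

A = glauber_edge_triangle(30, beta1=0.0, beta2=0.0, steps=20_000)
print(A.sum() / (30 * 29))   # edge density; close to 1/2 when beta1 = beta2 = 0
```

With $\beta_2 = 0$ the edges are independent and the density should settle near $\sigma(\beta_1)$; the slow-mixing regime described above is exactly where a simple chain like this becomes impractical.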


An asymptotic formula for the edge-triangle model

Recall the model:

$$p_{\beta_1,\beta_2}(G) = \exp\Big(\beta_1 E + \frac{\beta_2}{n}\Delta - n^2\psi_n(\beta_1,\beta_2)\Big),$$

where $E$ and $\Delta$ are the number of edges and triangles in $G$ and $\psi_n$ is the normalizing constant.

Theorem (Chatterjee & Diaconis, 2011)

There is a negative constant $-c$, depending on $\beta_1$, such that when $-c < \beta_2 < \infty$,

$$\lim_{n\to\infty}\psi_n(\beta_1,\beta_2) = \sup_{0\le u\le 1}\Big(\frac{\beta_1 u}{2} + \frac{\beta_2 u^3}{6} - \frac{1}{2}u\log u - \frac{1}{2}(1-u)\log(1-u)\Big).$$

There is another negative constant $-d$, again depending on $\beta_1$, such that the formula is not valid if $\beta_2 < -d$.
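The right-hand side is a one-dimensional maximization, so it is easy to evaluate numerically. The following minimal sketch (assuming NumPy; a plain grid search, not a method from the paper) returns the supremum and the maximizer $u^*$, and is only meaningful in the range $-c < \beta_2 < \infty$ where the theorem applies. Setting the derivative to zero shows that the maximizer, which always lies strictly inside $(0,1)$, satisfies the fixed-point equation $u = 1/(1 + e^{-(\beta_1 + \beta_2 u^2)})$. The function names are hypothetical.

```python
import numpy as np

def half_entropy(u):
    """(1/2)[u log u + (1-u) log(1-u)] with the convention 0 log 0 = 0."""
    u = np.asarray(u, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        a = np.where(u > 0, u * np.log(u), 0.0)
        b = np.where(u < 1, (1 - u) * np.log(1 - u), 0.0)
    return 0.5 * (a + b)

def limit_psi(beta1, beta2, grid=200_001):
    """Grid search for sup_u [beta1*u/2 + beta2*u^3/6 - half_entropy(u)]; returns (value, maximizer)."""
    u = np.linspace(0.0, 1.0, grid)
    vals = beta1 * u / 2 + beta2 * u**3 / 6 - half_entropy(u)
    k = int(np.argmax(vals))
    return float(vals[k]), float(u[k])

print(limit_psi(0.0, 0.0))   # approximately (0.5*log 2, 0.5) = (0.3466, 0.5)
```

A check on the sketch: for $\beta_2 = 0$ the model is exactly Erdos-Renyi, and the supremum is $\tfrac12\log(1+e^{\beta_1})$, attained at $u^* = 1/(1+e^{-\beta_1})$.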


The symmetric phase and symmetry breaking

- The region where the formula is valid is called the symmetric phase in our paper. It was partially identified in an earlier work of Chatterjee & Dey (2009).

- In the symmetric phase, we prove that the model behaves essentially like an Erdos-Renyi graph (i.e. independent edges) with edge-probability $u^*$, where $u^*$ is the value of $u$ that solves the maximization problem in the theorem.

- When $\beta_2 < -d$, we prove that the model stops behaving like an Erdos-Renyi model and enters the region of broken symmetry.

- We do not understand this region very well. We can only prove that when $\beta_2$ is large and negative, the random graphs generated from the model look approximately like bipartite graphs.


Large negative β2

Figure: A simulated realization of the exponential random graph model on 20 nodes (labeled 1 to 20) with edges and triangles as sufficient statistics, where $\beta_1 = 120$ and $\beta_2 = -400$. (Picture by Sukhada Fadnavis.)


Degeneracy

- Researchers, e.g. Handcock (2003), have observed that models like the edge-triangle model tend to exhibit a certain degeneracy: as the parameter values vary, the random graphs are either very sparse or almost complete, skipping all intermediate structures.

- We have a theorem that rigorously proves this phenomenon.


Rigorous result about degeneracy


Let $G_n$ be a random graph from the edge-triangle model. Fix $\beta_1 < 0$. Let

$$c_1 := \frac{e^{\beta_1/2}}{1+e^{\beta_1/2}}, \qquad c_2 := 1 + \frac{1}{\beta_1}.$$

Suppose $|\beta_1|$ is so large that $c_1 < c_2$. Let $e(G_n)$ be the number of edges in $G_n$ and let $f(G_n) := e(G_n)/\binom{n}{2}$ be the edge density. Then there exists $q(\beta_1)$ such that if $\beta_2 < q(\beta_1)$, then as $n\to\infty$,

$$P(f(G_n) > c_1) \to 0,$$

and if $\beta_2 > q(\beta_1)$, then $P(f(G_n) < c_2) \to 0$.

Remark. The gap between $c_1$ and $c_2$ can be quite striking even for moderate values of $|\beta_1|$. For example, $\beta_1 = -10$ gives $c_1 \approx 0.007$ and $c_2 = 0.9$.
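The two thresholds are explicit, so the numbers in the remark are easy to reproduce. A minimal plain-Python sketch (the function name is hypothetical):

```python
import math

def degeneracy_thresholds(beta1):
    """Return (c1, c2) from the degeneracy theorem; requires beta1 < 0."""
    if beta1 >= 0:
        raise ValueError("the theorem assumes beta1 < 0")
    e = math.exp(beta1 / 2)
    c1 = e / (1 + e)        # upper bound on edge density below the threshold q(beta1)
    c2 = 1 + 1 / beta1      # lower bound on edge density above the threshold q(beta1)
    return c1, c2

c1, c2 = degeneracy_thresholds(-10.0)
print(round(c1, 4), round(c2, 4))   # 0.0067 0.9
```

For small $|\beta_1|$ the condition $c_1 < c_2$ fails (at $\beta_1 = -1$, for instance, $c_2 = 0$ while $c_1 \approx 0.38$), which is why the theorem requires $|\beta_1|$ to be large.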


Phase transitions and degeneracy

Figure: Plot of asymptotic edge density (on y-axis) vs. $\beta_2$ (on x-axis, from 0 to 1.5) for three different values of $\beta_1$: $-0.45$, $-0.35$ and $-0.8$.

More progress on this recently by Charles Radin and Mei Yin.


General formula for ψn


For any $\beta_1$ and $\beta_2$,

$$\lim_{n\to\infty}\psi_n(\beta_1,\beta_2) = \sup_f\Big(\frac{\beta_1}{2}\iint f(x,y)\,dx\,dy + \frac{\beta_2}{6}\iiint f(x,y)f(y,z)f(z,x)\,dx\,dy\,dz - \frac{1}{2}\iint\big\{f(x,y)\log f(x,y) + (1-f(x,y))\log(1-f(x,y))\big\}\,dx\,dy\Big),$$

where the supremum is over all measurable $f:[0,1]^2\to[0,1]$ satisfying $f(x,y) = f(y,x)$ and the integrals are from 0 to 1.

Remarks. (a) The symmetric phase is where the maximizer is a constant function. (b) There is a general version of this theorem in our paper which applies to essentially all exponential random graph models.
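Evaluating the functional at any particular $f$ gives a lower bound on $\lim\psi_n$. For a step function with $k$ equal blocks and block values $P_{ab}$, the three integrals reduce to the mean of $P$, $\operatorname{tr}(P^3)/k^3$, and the mean of the entropy term. The sketch below (assuming NumPy; names hypothetical, and the candidate functions are our own choices) uses this to compare constant functions with a bipartite-looking two-block function at the parameters of the earlier simulation, $\beta_1 = 120$, $\beta_2 = -400$. The two-block function scores higher, so the maximizer there is not constant, consistent with the broken-symmetry picture.

```python
import numpy as np

def block_objective(P, beta1, beta2):
    """Value of the variational functional at the step function with k equal blocks and values P."""
    P = np.asarray(P, dtype=float)
    k = P.shape[0]
    edges = P.mean()                        # ∫∫ f(x,y)
    triangles = np.trace(P @ P @ P) / k**3  # ∫∫∫ f(x,y) f(y,z) f(z,x)
    with np.errstate(divide="ignore", invalid="ignore"):
        ent = (np.where(P > 0, P * np.log(P), 0.0)
               + np.where(P < 1, (1 - P) * np.log(1 - P), 0.0))
    return beta1 / 2 * edges + beta2 / 6 * triangles - 0.5 * ent.mean()

beta1, beta2 = 120.0, -400.0
best_constant = max(block_objective([[u]], beta1, beta2) for u in np.linspace(0, 1, 2001))
bipartite = block_objective([[0.0, 1.0], [1.0, 0.0]], beta1, beta2)
print(best_constant < bipartite)   # True: a constant function is not the maximizer here
```

With $k = 1$ the function reduces to the one-variable expression in the symmetric-phase formula, which is a convenient sanity check.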


First step: counting graphs with a given property

- There are $2^{n(n-1)/2}$ simple graphs on $n$ labeled vertices.

- Question: Given a property $P$ and an integer $n$, roughly how many of these graphs have property $P$?

- For example, $P$ may be: #triangles $\ge tn^3$, where $t$ is a given constant.

- How this helps: The ability to count the number of graphs with a given number of triangles and a given number of edges will lead to the evaluation of the normalizing constant in the edge-triangle model.

- To make any progress, we need to assume some regularity on $P$. For example, we may demand that $P$ be continuous, or at least measurable, with respect to some metric.

- What metric? What space?


An abstract topological space of graphs

- Beautiful unifying theory developed by Laszlo Lovasz and coauthors V. T. Sos, B. Szegedy, C. Borgs, J. Chayes, K. Vesztergombi, A. Schrijver and M. Freedman in the last six years. Related to earlier works of Aldous, Hoover, Kallenberg.

- Let $G_n$ be a sequence of simple graphs whose number of nodes tends to infinity.

- For every fixed simple graph $H$, let $\hom(H,G)$ denote the number of homomorphisms of $H$ into $G$ (i.e. edge-preserving maps $V(H)\to V(G)$, where $V(H)$ and $V(G)$ are the vertex sets).

- This number is normalized to get the homomorphism density

$$t(H,G) := \frac{\hom(H,G)}{|V(G)|^{|V(H)|}}.$$

This gives the probability that a uniformly random mapping $V(H)\to V(G)$ is a homomorphism.
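For small graphs the homomorphism density can be computed straight from the definition by enumerating all $|V(G)|^{|V(H)|}$ maps. A brute-force sketch (hypothetical names; $H$ is given by an edge list on vertices $0,\dots,k-1$ and $G$ by an adjacency matrix):

```python
import itertools
import numpy as np

def hom_density(H_edges, k, A):
    """t(H, G): fraction of maps V(H) -> V(G) that send every edge of H to an edge of G."""
    A = np.asarray(A)
    n = A.shape[0]
    hits = 0
    for phi in itertools.product(range(n), repeat=k):   # all n**k maps
        if all(A[phi[a], phi[b]] for a, b in H_edges):
            hits += 1
    return hits / n**k

triangle = [(0, 1), (1, 2), (0, 2)]
K3 = np.ones((3, 3), dtype=int) - np.eye(3, dtype=int)
print(hom_density(triangle, 3, K3))   # 6/27: only the 6 bijections work
```

The cost grows like $n^k$, so this is only useful for checking small cases.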


Abstract space of graphs contd.

- Suppose that $t(H,G_n)$ tends to a limit $t(H)$ for every $H$.

- Then Lovasz & Szegedy proved that there is a natural “limit object” in the form of a function $f \in W$, where $W$ is the space of all measurable functions from $[0,1]^2$ into $[0,1]$ that satisfy $f(x,y) = f(y,x)$ for all $x, y$.

- Conversely, every such function arises as the limit of an appropriate graph sequence.

- This limit object determines all the limits of subgraph densities: if $H$ is a simple graph with $k$ vertices, then

$$t(H,f) = \int_{[0,1]^k}\prod_{(i,j)\in E(H)} f(x_i,x_j)\,dx_1\cdots dx_k.$$

- A sequence of graphs $\{G_n\}_{n\ge1}$ is said to converge to $f$ if for every finite simple graph $H$,

$$\lim_{n\to\infty} t(H,G_n) = t(H,f).$$
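The integral defining $t(H,f)$ can be approximated by Monte Carlo: draw $x_1,\dots,x_k$ independently and uniformly from $[0,1]$ and average the product over the edges of $H$. A minimal sketch assuming NumPy (hypothetical names); for $f(x,y) = xy$ and $H$ a triangle the integrand is $x_1^2x_2^2x_3^2$, so the exact value is $(\int_0^1 x^2\,dx)^3 = 1/27$.

```python
import numpy as np

def t_graphon_mc(H_edges, k, f, samples=200_000, seed=0):
    """Monte Carlo estimate of t(H, f) = ∫ prod_{(i,j) in E(H)} f(x_i, x_j) dx_1...dx_k."""
    rng = np.random.default_rng(seed)
    x = rng.random((samples, k))            # uniform points in [0, 1]^k
    prod = np.ones(samples)
    for a, b in H_edges:
        prod *= f(x[:, a], x[:, b])
    return prod.mean()

triangle = [(0, 1), (1, 2), (0, 2)]
print(t_graphon_mc(triangle, 3, lambda x, y: x * y))   # close to 1/27 ≈ 0.0370
```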


Example

- Consider the Erdos-Renyi random graph $G(n,p)$. Each edge is present with probability $p$, independent of other edges.

- For any fixed graph $H$,

$$t(H, G(n,p)) \to p^{|E(H)|} \text{ almost surely as } n\to\infty.$$

- On the other hand, if $f$ is the function that is identically equal to $p$, then $t(H,f) = p^{|E(H)|}$.

- Thus, the sequence of random graphs $G(n,p)$ converges almost surely to the non-random limit function $f(x,y) \equiv p$ as $n\to\infty$.
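A quick simulation illustrates this convergence for $H = K_2$ and $H = K_3$, using $t(K_2,G) = \sum_{ij}A_{ij}/n^2$ and $t(K_3,G) = \operatorname{tr}(A^3)/n^3$ for the adjacency matrix $A$. A sketch assuming NumPy; the parameters $n = 400$, $p = 0.3$ are arbitrary choices and the names are hypothetical.

```python
import numpy as np

def sample_gnp(n, p, rng):
    """Adjacency matrix of G(n, p): each of the n(n-1)/2 pairs is an edge independently w.p. p."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(np.int64)

def edge_and_triangle_density(A):
    """Return (t(K2, G), t(K3, G)) computed from the adjacency matrix."""
    n = A.shape[0]
    return A.sum() / n**2, np.trace(A @ A @ A) / n**3

rng = np.random.default_rng(1)
A = sample_gnp(400, 0.3, rng)
print(edge_and_triangle_density(A))   # close to (0.3, 0.027)
```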


Abstract space of graphs contd.

- The elements of $W$ are sometimes called ‘graphons’.

- A finite simple graph $G$ on $n$ vertices can also be represented as a graphon $f^G$ in a natural way:

$$f^G(x,y) = \begin{cases} 1 & \text{if } (\lceil nx\rceil, \lceil ny\rceil) \text{ is an edge in } G,\\ 0 & \text{otherwise.}\end{cases}$$

- Note that this allows all simple graphs, irrespective of the number of vertices, to be represented as elements of the single abstract space $W$.

- So, what is the topology on this space?
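The graphon $f^G$ is a step function: it takes the value $A_{ij}$ of the adjacency matrix on the square $((i-1)/n, i/n] \times ((j-1)/n, j/n]$. A direct Python transcription (hypothetical name; the measure-zero boundary $x = 0$ is assigned to vertex 1 as a convention of this sketch):

```python
import math
import numpy as np

def step_graphon(A):
    """Return f^G(x, y) = 1 if (ceil(nx), ceil(ny)) is an edge of G, else 0 (vertices labeled 1..n)."""
    A = np.asarray(A)
    n = A.shape[0]

    def f(x, y):
        i = max(math.ceil(n * x), 1)   # 1-based vertex label; x = 0 is mapped to vertex 1
        j = max(math.ceil(n * y), 1)
        return int(A[i - 1, j - 1])

    return f

# Single edge between vertices 1 and 2.
f = step_graphon([[0, 1], [1, 0]])
print(f(0.25, 0.75), f(0.25, 0.25))   # 1 0
```

For this step function $t(H, f^G) = t(H, G)$ for every $H$, which is what lets finite graphs and their limits be compared in the same space.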


The cut metric

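- For any $f, g \in W$, Frieze and Kannan defined the cut distance:

$$d_\square(f,g) := \sup_{S,T\subseteq[0,1]}\Big|\int_{S\times T}[f(x,y) - g(x,y)]\,dx\,dy\Big|,$$

where the supremum is over measurable sets $S, T$.

- Introduce an equivalence relation on $W$: say that $f \sim g$ if $f(x,y) = g_\sigma(x,y) := g(\sigma x, \sigma y)$ for some measure-preserving bijection $\sigma$ of $[0,1]$.

- Denote by $\tilde g$ the closure in $(W, d_\square)$ of the orbit $\{g_\sigma\}$.

- The quotient space is denoted by $\widetilde W$ and $\tau$ denotes the natural map $g \to \tilde g$.

- Since $d_\square$ is invariant under $\sigma$, one can define on $\widetilde W$ the natural distance $\delta_\square$ by

$$\delta_\square(\tilde f,\tilde g) := \inf_\sigma d_\square(f, g_\sigma) = \inf_\sigma d_\square(f_\sigma, g) = \inf_{\sigma_1,\sigma_2} d_\square(f_{\sigma_1}, g_{\sigma_2}),$$

making $(\widetilde W, \delta_\square)$ into a metric space.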
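For two step graphons built from graphs on the same $n$ labeled vertices, the supremum in $d_\square$ is attained at sets $S, T$ that are unions of the $n$ intervals, because the integral is bilinear in the indicator functions of $S$ and $T$. So $d_\square(f^{G_1}, f^{G_2}) = \max_{S,T\subseteq\{1,\dots,n\}} |\sum_{i\in S, j\in T}(A_{ij}-B_{ij})|/n^2$, and for a fixed $S$ the best $T$ collects all positive (or all negative) column sums. The brute-force sketch below (NumPy; exponential in $n$; names hypothetical) computes this. Minimizing over vertex relabelings gives only an upper bound on $\delta_\square$: relabelings are special measure-preserving bijections, but a general $\sigma$ can also split vertices.

```python
import itertools
import numpy as np

def cut_distance_labeled(A, B):
    """d_cut(f^A, f^B) for adjacency matrices on the same labeled vertex set (2^n subsets S)."""
    D = np.asarray(A, dtype=float) - np.asarray(B, dtype=float)
    n = D.shape[0]
    best = 0.0
    for mask in itertools.product([0.0, 1.0], repeat=n):
        c = np.array(mask) @ D                 # c[j] = sum_{i in S} D[i, j]
        best = max(best, c[c > 0].sum(), -c[c < 0].sum())
    return best / n**2

def delta_cut_upper(A, B):
    """Upper bound on delta_cut: minimize d_cut over all relabelings of B's vertices."""
    A, B = np.asarray(A), np.asarray(B)
    n = A.shape[0]
    return min(cut_distance_labeled(A, B[np.ix_(p, p)])
               for p in itertools.permutations(range(n)))

path_a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # path 1-2-3 (centre is vertex 2)
path_b = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]   # same path with the centre relabeled as vertex 3
print(cut_distance_labeled(path_a, path_b), delta_cut_upper(path_a, path_b))
```

The two paths are isomorphic, so the labeled distance is positive while the relabeling bound is 0, which is the point of passing to the quotient $\widetilde W$.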


Cut metric and graph limits

To any finite graph $G$, we associate the natural graphon $f^G$ and its orbit $\widetilde G = \tau f^G = \widetilde{f^G} \in \widetilde W$. One of the key results of the theory is the following:

Theorem (Borgs, Chayes, Lovasz, Sos & Vesztergombi)

A sequence of graphs $\{G_n\}_{n\ge1}$ converges to a limit $f \in W$ if and only if $\delta_\square(\widetilde G_n, \tilde f) \to 0$ as $n\to\infty$.

Another important result is:

Theorem (Lovasz & Szegedy)

The space $\widetilde W$ is compact under the metric $\delta_\square$.


Counting graphs with a given property

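- For any Borel set $\widetilde A \subseteq \widetilde W$, let

$$\widetilde A_n := \{\tilde h \in \widetilde A : \tilde h = \widetilde G \text{ for some } G \text{ on } n \text{ vertices}\}.$$

- Let $I(u) := \frac12 u\log u + \frac12(1-u)\log(1-u)$.

- For any $\tilde h \in \widetilde W$, let $I(\tilde h) := \iint I(h(x,y))\,dx\,dy$, where $h$ is any element of $\tilde h$.

Theorem (Chatterjee & Varadhan, 2010)

The function $I$ is well-defined and lower-semicontinuous on $\widetilde W$. For any measurable $\widetilde A \subseteq \widetilde W$,

$$-\inf_{\tilde h\in\operatorname{closure}(\widetilde A)} I(\tilde h) \ge \limsup_{n\to\infty}\frac{\log|\widetilde A_n|}{n^2} \ge \liminf_{n\to\infty}\frac{\log|\widetilde A_n|}{n^2} \ge -\inf_{\tilde h\in\operatorname{interior}(\widetilde A)} I(\tilde h).$$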


A simple application

- Under very special circumstances, the variational problem is known to have an explicit solution.

- For example, we can prove that the number of graphs on $n$ vertices with at least $tn^3$ triangles is $e^{n^2 f(t)(1+o(1))}$, where

$$f(t) = \begin{cases} \tfrac12\log 2 & \text{if } 0 \le t < \tfrac{1}{48},\\ -I\big((6t)^{1/3}\big) & \text{if } \tfrac{1}{48} \le t < \tfrac16,\\ -\infty & \text{if } t \ge \tfrac16. \end{cases}$$

- On the other hand, for the number of graphs with at most $tn^3$ triangles, we can prove such an explicit formula for $t$ sufficiently far from zero, and can show that the formula does not hold for $t$ sufficiently close to zero. However, we could not derive an explicit formula for small $t$.
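The explicit rate $f(t)$ is easy to tabulate. A plain-Python transcription (names hypothetical), using $I(u) = \tfrac12 u\log u + \tfrac12(1-u)\log(1-u)$ from the previous slide. The breakpoint $t = 1/48$ is where $(6t)^{1/3} = 1/2$, the edge density of a typical uniformly random graph, and $f$ is continuous there.

```python
import math

def I_rate(u):
    """I(u) = (1/2) u log u + (1/2)(1-u) log(1-u), with 0 log 0 = 0."""
    a = u * math.log(u) if u > 0 else 0.0
    b = (1 - u) * math.log(1 - u) if u < 1 else 0.0
    return 0.5 * (a + b)

def triangle_rate(t):
    """f(t): limit of (1/n^2) log #{graphs on n vertices with >= t n^3 triangles}."""
    if t < 0:
        raise ValueError("t must be nonnegative")
    if t < 1 / 48:
        return 0.5 * math.log(2)              # a typical graph already has enough triangles
    if t < 1 / 6:
        return -I_rate((6 * t) ** (1 / 3))    # edge density u with u^3/6 = t
    return -math.inf                          # no graph has that many triangles

for t in (0.0, 1 / 48, 0.05, 0.1, 1 / 6):
    print(round(t, 4), triangle_rate(t))
```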


Method of proof

- Counting graphs with a given property is essentially the same as proving a Large Deviation Principle (LDP) for the Erdos-Renyi random graph $G(n,p)$. For example,

$$\#\{\text{graphs on } n \text{ vertices satisfying } P\} = 2^{n(n-1)/2}\,\mathbb P(G(n,1/2) \text{ satisfies } P).$$

- The LDP can be proved by standard techniques for the weak topology on $W$ (Fenchel-Legendre transforms, the Gartner-Ellis theorem, etc.).

- However, the weak topology is not very interesting. For example, subgraph counts are not continuous with respect to the weak topology.

- The LDP for the topology of the cut metric does not follow via standard methods.
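For very small $n$ the counting identity can be checked by brute force, enumerating all $2^{n(n-1)/2}$ labeled graphs. The sketch below (plain Python, hypothetical names) counts triangle-free graphs on 4 vertices and finds 41 of the 64, so $\mathbb P(G(4,1/2)\text{ is triangle-free}) = 41/64$. Brute force is hopeless beyond tiny $n$, which is why the asymptotic theory is needed.

```python
import itertools

def count_graphs(n, has_property):
    """Return (#labeled graphs on n vertices with the property, total #graphs)."""
    pairs = list(itertools.combinations(range(n), 2))
    count = 0
    for bits in itertools.product((0, 1), repeat=len(pairs)):
        edges = {pair for pair, b in zip(pairs, bits) if b}
        if has_property(n, edges):
            count += 1
    return count, 2 ** len(pairs)

def triangle_free(n, edges):
    """True if no three vertices are pairwise adjacent (edges stored as sorted pairs)."""
    return not any({(a, b), (a, c), (b, c)} <= edges
                   for a, b, c in itertools.combinations(range(n), 3))

print(count_graphs(4, triangle_free))   # (41, 64)
```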


Szemeredi’s lemma

- Let $G = (V,E)$ be a simple graph of order $n$.

- For any $X, Y \subseteq V$, let $e_G(X,Y)$ be the number of $X$-$Y$ edges of $G$ and let

$$\rho_G(X,Y) := \frac{e_G(X,Y)}{|X||Y|}.$$

- Call a pair $(A,B)$ of disjoint sets $A, B\subseteq V$ $\varepsilon$-regular if all $X\subseteq A$ and $Y\subseteq B$ with $|X|\ge\varepsilon|A|$ and $|Y|\ge\varepsilon|B|$ satisfy $|\rho_G(X,Y) - \rho_G(A,B)| \le \varepsilon$.

- A partition $\{V_0,\dots,V_K\}$ of $V$ is called an $\varepsilon$-regular partition of $G$ if it satisfies the following conditions: (i) $|V_0| \le \varepsilon n$; (ii) $|V_1| = |V_2| = \cdots = |V_K|$; (iii) all but at most $\varepsilon K^2$ of the pairs $(V_i,V_j)$ with $1\le i<j\le K$ are $\varepsilon$-regular.

Theorem (Szemeredi’s lemma)

Given $\varepsilon > 0$ and $m \ge 1$, there exists $M = M(\varepsilon,m)$ such that every graph of order $\ge M$ admits an $\varepsilon$-regular partition $\{V_0,\dots,V_K\}$ for some $K\in[m,M]$.
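Checking $\varepsilon$-regularity exactly requires looking at every pair of large subsets $X \subseteq A$, $Y \subseteq B$. The sketch below (NumPy; the names and the random-sampling scheme are illustrative assumptions, not part of the lemma) computes $\rho_G$ and samples random subsets of the allowed sizes. It can refute $\varepsilon$-regularity by exhibiting a violating pair, but when it finds none that is only evidence, not a proof.

```python
import numpy as np

def rho(adj, X, Y):
    """Edge density rho_G(X, Y) = e_G(X, Y) / (|X| |Y|) between disjoint vertex sets."""
    return adj[np.ix_(X, Y)].sum() / (len(X) * len(Y))

def find_irregular_witness(adj, A_set, B_set, eps, trials=1000, seed=0):
    """Randomly search for X, Y with |X| >= eps|A|, |Y| >= eps|B| and |rho(X,Y) - rho(A,B)| > eps."""
    rng = np.random.default_rng(seed)
    adj = np.asarray(adj)
    A_set, B_set = np.asarray(A_set), np.asarray(B_set)
    base = rho(adj, A_set, B_set)
    min_x = max(1, int(np.ceil(eps * len(A_set))))
    min_y = max(1, int(np.ceil(eps * len(B_set))))
    for _ in range(trials):
        X = rng.choice(A_set, size=rng.integers(min_x, len(A_set) + 1), replace=False)
        Y = rng.choice(B_set, size=rng.integers(min_y, len(B_set) + 1), replace=False)
        if abs(rho(adj, X, Y) - base) > eps:
            return X, Y          # a certificate that (A, B) is not eps-regular
    return None                  # no violation found (NOT a proof of regularity)

# Half of part A is joined to all of part B, the other half to none of it.
n = 40
A_part, B_part = np.arange(0, 20), np.arange(20, 40)
adj = np.zeros((n, n), dtype=int)
adj[np.ix_(A_part[:10], B_part)] = 1
adj[np.ix_(B_part, A_part[:10])] = 1
print(find_irregular_witness(adj, A_part, B_part, eps=0.1) is not None)   # True
```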


Finishing the proof using Szemeredi’s lemma

- Suppose $G$ is a graph of order $n$ with $\varepsilon$-regular partition $\{V_0,\dots,V_K\}$.

- Let $G'$ be the random graph with independent edges where a vertex $u\in V_i$ is connected to a vertex $v\in V_j$ with probability $\rho_G(V_i,V_j)$.

- Using Szemeredi’s regularity lemma, one can prove that $\delta_\square(G,G') \approx 0$ with high probability if $K$ and $n$ are appropriately large and $\varepsilon$ is small.

- Let $f$ be the probability density of the law of $G(n,p)$ with respect to the law of $G'$. (This is easily computed and gives rise to the entropy function.) Then, since $G' \approx G$ with high probability,

$$\mathbb P(G(n,p)\approx G) \approx f(G)\,\mathbb P(G'\approx G) \approx f(G).$$

- Since the space $\widetilde W$ is compact, this allows us to approximate $\mathbb P(G(n,p)\in A)$ for any nice set $A$ by approximating $A$ as a finite union of small balls.
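The random graph $G'$ is a block model built from the partition. A minimal sketch assuming NumPy (hypothetical names). Two simplifying conventions here are our own, not taken from the slides: $V_0$ is treated as just another part, and for a pair inside a single part the density is computed as $\sum_{u,v\in V_i}A_{uv}/|V_i|^2$.

```python
import numpy as np

def sample_block_model(adj, labels, rng):
    """Sample G': u in V_i and v in V_j are joined independently with prob. rho_G(V_i, V_j)."""
    adj, labels = np.asarray(adj), np.asarray(labels)
    n = adj.shape[0]
    P = np.zeros((n, n))
    parts = [np.flatnonzero(labels == lab) for lab in np.unique(labels)]
    for Vi in parts:
        for Vj in parts:
            # block density; for Vi == Vj this is 2 e(Vi) / |Vi|^2 (our convention)
            P[np.ix_(Vi, Vj)] = adj[np.ix_(Vi, Vj)].sum() / (len(Vi) * len(Vj))
    upper = np.triu(rng.random((n, n)) < P, k=1)   # independent coins for pairs u < v
    return (upper | upper.T).astype(np.int64)

# Complete bipartite K_{3,3}: block densities are 0 inside parts and 1 across,
# so G' reproduces G exactly.
K33 = np.zeros((6, 6), dtype=np.int64)
K33[:3, 3:] = 1
K33[3:, :3] = 1
G_prime = sample_block_model(K33, [0, 0, 0, 1, 1, 1], np.random.default_rng(0))
print((G_prime == K33).all())   # True
```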


Solution of general ERGMs

- Let $T: W \to \mathbb R$ be a bounded continuous function on the pseudometric space $(W,\delta_\square)$.

- Let $\mathcal G_n$ denote the set of simple graphs on $n$ vertices.

- Then $T$ induces a probability mass function $p_n$ on $\mathcal G_n$:

$$p_n(G) := \exp\big(n^2 T(G) - n^2\psi_n\big),$$

where $\psi_n$ is the normalizing constant.

For $h\in W$, define

$$I(h) := \frac12\iint\big[h(x,y)\log h(x,y) + (1-h(x,y))\log(1-h(x,y))\big]\,dx\,dy.$$

Then $\lim_{n\to\infty}\psi_n = \sup_{h\in W}\big(T(h) - I(h)\big)$.


Summary

- Theory of graph limits gives a framework for proving the Large Deviation Principle for the Erdos-Renyi random graph $G(n,p)$.

- The LDP for $G(n,p)$ can be used to count the number of graphs with a given property, such as a prescribed number of edges and triangles.

- The graph counting theorem allows us to evaluate normalizing constants and maximum likelihood estimates in exponential random graph models and understand their qualitative behavior. The solutions involve variational problems.

- The general theorems can be specialized in simple situations to give useful byproducts such as proofs of degeneracy and other qualitative phenomena observed by practitioners.

- Another application: limits of graphs with a given degree sequence (joint work with Persi Diaconis and Allan Sly).

- Main open question: How to solve the variational problems in complicated models?


A future direction: alternating sign ERGMs

- As we saw, the edge-triangle model does not exhibit transitivity.

- Alternating sign ERGMs, introduced by Snijders et al. (2006), attempt to capture it.

- A one-parameter example:

$$p_\beta(G) \propto \exp\Big(\beta E - \frac{\beta L}{n} + \frac{\beta\Delta}{n}\Big),$$

where $E$, $L$ and $\Delta$ are the number of edges, 2-stars and triangles in $G$.

- In this model, we can prove that if $\beta$ is large, then the vertices automatically divide into two groups of roughly equal size, so that two vertices in the same group are connected by an edge with probability $\approx 1$, while two vertices in different groups are connected with probability $\approx 1/2$.

