Asymptotic normality of graph statistics

14
Journal of Statistical Planning and Inference 21 (1989) 209-222 North-Holland 209 ASYMPTOTIC NORMALITY OF GRAPH STATISTICS Krzysztof NOWICKI Department of Mathematical Statistics, University of Lund, Box 118, S-221 00 Lund, Sweden Received 27 February 1986; revised manuscript received 11 February 1988 Recommended by 0. Frank Abstract: Various types of graph statistics for Bernoulli graphs are represented as numerators of incomplete U-statistics. Asymptotic normality of these statistics is proved for Bernoulli graphs in which the edge probability is constant. In addition it is shown that subgraph counts asymp- totically are linear functions of the number of edges in the graph. AMS Subject Classification: Primary 05C99; Secondary 62699, 92A20 Key words: Random graphs; subgraph counts; induced subgraph counts; U-statistics; incomplete U-statistics; Markov graphs. 1. Introduction The statistical analysis of graph data is often based on subgraph counts. In some cases, subgraph counts are sufficient statistics, as, for example, in the case of certain Markov graphs, see Frank and Strauss (1986). However, for other models and for empirical graph data, subgraph counts are merely used to describe structural proper- ties. Kendall and Babington Smith (1940) and Moran (1947) considered measures of inconsistency and agreement between judges in paired comparisons. Several authors have proposed and investigated indices for measuring transitivity of con- tacts and friendship choices, e.g. Holland and Leinhardt (1971, 197.5), Harary and Kommel (1979), Frank (1980) and Frank and Harary (1982). In random graphs, the subgraph counts are random variables with complicated mutual dependencies. Even for very simple graph models, studies of their asymp- totic distributions have been restricted to special classes of subgraphs. Moon (1968) studied the distribution of the numbers of 3- and 4-cycles in a random tournament and proved their asymptotic normality. For the most common random graph model, so-called Bernoulli graphs (a graph in which edges occur independently with a common probability p), subgraph counts were investigated for so-called balanced subgraphs. The classical reference is two papers by Erdos and RCnyi (1959, 1960). Further references can be found in the text books by Bollobas (1985) and Palmer (1985) and in the survey articles by Karonski (1982, 1984). 0378-3758/89/$3.50 0 1989, Elsevier Science Publishers B.V. (North-Holland)

Transcript of Asymptotic normality of graph statistics

Page 1: Asymptotic normality of graph statistics

Journal of Statistical Planning and Inference 21 (1989) 209-222

North-Holland

209

ASYMPTOTIC NORMALITY OF GRAPH STATISTICS

Krzysztof NOWICKI

Department of Mathematical Statistics, University of Lund, Box 118, S-221 00 Lund, Sweden

Received 27 February 1986; revised manuscript received 11 February 1988

Recommended by 0. Frank

Abstract: Various types of graph statistics for Bernoulli graphs are represented as numerators of

incomplete U-statistics. Asymptotic normality of these statistics is proved for Bernoulli graphs

in which the edge probability is constant. In addition it is shown that subgraph counts asymp-

totically are linear functions of the number of edges in the graph.

AMS Subject Classification: Primary 05C99; Secondary 62699, 92A20

Key words: Random graphs; subgraph counts; induced subgraph counts; U-statistics; incomplete

U-statistics; Markov graphs.

1. Introduction

The statistical analysis of graph data is often based on subgraph counts. In some

cases, subgraph counts are sufficient statistics, as, for example, in the case of certain

Markov graphs, see Frank and Strauss (1986). However, for other models and for

empirical graph data, subgraph counts are merely used to describe structural proper-

ties. Kendall and Babington Smith (1940) and Moran (1947) considered measures

of inconsistency and agreement between judges in paired comparisons. Several

authors have proposed and investigated indices for measuring transitivity of con-

tacts and friendship choices, e.g. Holland and Leinhardt (1971, 197.5), Harary and

Kommel (1979), Frank (1980) and Frank and Harary (1982).

In random graphs, the subgraph counts are random variables with complicated

mutual dependencies. Even for very simple graph models, studies of their asymp-

totic distributions have been restricted to special classes of subgraphs. Moon (1968)

studied the distribution of the numbers of 3- and 4-cycles in a random tournament

and proved their asymptotic normality. For the most common random graph

model, so-called Bernoulli graphs (a graph in which edges occur independently with

a common probability p), subgraph counts were investigated for so-called balanced

subgraphs. The classical reference is two papers by Erdos and RCnyi (1959, 1960).

Further references can be found in the text books by Bollobas (1985) and Palmer

(1985) and in the survey articles by Karonski (1982, 1984).

0378-3758/89/$3.50 0 1989, Elsevier Science Publishers B.V. (North-Holland)

Page 2: Asymptotic normality of graph statistics

210 K. Nowicki / Normality of graph statistics

In statistical analysis, Bernoulli graphs can be useful as representations of pure

chance phenomena; they can thus be used to test if empirical graph data are

generated from some more complicated model than a Bernoulli graph (Frank,

1980). Furthermore, Bernoulli graphs have been used to describe networks of ir-

regularly appearing contacts between vertices; see Frank (1979). Barlow and Pro-

schan (1975) considered reliability problems by using Bernoulli graphs. Holland and

Leinhardt (1975, 1979) and Wasserman (1977) considered moments of dyad and

triad counts (subgraph counts of order 2 and 3) for simple random graph models,

while Frank (1979) gave the first two moments of subgraph counts of arbitrary order

for general random graph models.

The purpose of this paper is to study the asymptotic behaviour of subgraph

counts for Bernoulli graphs in which the edge probability p is constant. It is proved

that subgraph counts are asymptotically normally distributed. In addition, we show

that subgraph counts asymptotically are linear functions of the number of edges in

the graph. This allows us to obtain simultaneous distributions for multiple counts

directly from distributions of single counts.

In Section 2 we will give basic concepts and notations. Section 3 presents some

results from the theory of U-statistics. In Section 4 we will give the asymptotic

distribution of a single subgraph count with certain extensions in Section 5. Section

6 is devoted to investigations of the simultaneous distribution of subgraph counts,

as well as their application to testing a Bernoulli graph versus a ?/Iarkov graph.

Finally, in Section 7 we study induced subgraph count statistics, both for a single

count and for simultaneous counts.

2. Basic concepts and notations

LetN={l,..., n} be a set of elements called vertices, and let E be a subset of the

collection of n’= (;) unordered pairs of vertices. The elements of E are called

edges. A graph G consists of an ordered pair (N, E) of sets of vertices and edges.

The number of vertices is called the order of G while the number of edges is called

the size of G. The size of G can be at most n’ and a graph on n vertices with size

n’ is called a complete graph and is denoted K,,.

A subgraph H= (V, 0 of G is itself a graph whose vertex set V is a subset of N,

and edge set r is a subset of E. Moreover, if every pair of vertices u and u of His

connected by an edge in H whenever it is connected by an edge in G, then H is an

induced subgraph of G.

Two graphs G and Hare isomorphic if there is a one-to-one function Y from the

vertex set of G onto the vertex set of H, such that for any two vertices t and u of

G we have an edge between t and u in G if and only if we have an edge between

Y(t) and Y(u) in H. We let t(G,H) denote the number of subgraphs of G which

are isomorphic to H. In particular, let G= K, and let H be a graph of order u,

u < n. Then, we have that b(K,, H) = (E)t(K,,, H).

Page 3: Asymptotic normality of graph statistics

K. Nowicki / Normality of graph statistics 211

3. Asymptotic normality of an incomplete U-statistic

Hoeffding (1948) considered a sequence of independent identically distributed

random variables (i.i.d. T.v.) X,, . . . , X, and a symmetric function g(x,, . . . ,xk),

which we call a kernel. He estimated the parameter B =E(g(X,, . . . ,X,)) by

where the g;‘s are obtained by inserting the (t) possible ordered subsequences

X;,, **. , Xjp 1 Sj, < ... <j, 5 n, in g, the numbering of the gj being arbitrary. The

r.v. U, is called a complete U-statistic. Now, set

02= ~UwXl, . ..>&)IX.N (3.1)

and suppose that o*>O. Using Hajek’s projection method it can be proved that

fi(U,- 0) is asymptotically normally distributed with mean 0 and variance k2a2. In order to calculate U, we have to evaluate the sum of (z) terms. To reduce

these calculations Blom (1976) introduced an incomplete U-statistic

(3.2)

Denote by S the set of the m subsequences Xj,, . . . ,XJk in the g;‘s. Due to the strong

dependence between many of the gi’s in U. we can often choose S so that U is

asymptotically equivalent to U,. This is shown by the following theorem, which

was essentially given by Blom (1976).

The m subsequences Xj,, . . . , Xjp appearing in S, form m2 pairs. Let f,, c=

1 , . . . , k, be the number of pairs with c X-values in common, and set pc=fc/m2. Then we have

Theorem 1. Suppose that for m/n + 00,

and p1 = k*/n + o( 1 /n)

;i2Pj=Wn).

(3.3a)

(3.3b)

Then (A) &(U- 0) is asymptotically normally distributed with mean 0 and variance k*a*.

(B) nE(U- U*)2+0 as n-+m, where

and

U*=(k/n) i h(X,)-(k-1)0 i=l

h(X;)=E(g(X;,,...,Xj~)IXi,=xj), Isi,< ... <i,<n.

Page 4: Asymptotic normality of graph statistics

212 K. Nowicki / Normality of graph statistics

Proof. For the proof of part (A) we refer the reader to Blom (1976). To prove part

(B) we write

E(U- U*)2=E(U- CQ2+E(U,- U*)2+2E{(U- U,)(U,- U*)}.

Furthermore, as proved by Hoeffding (1948) we have that

nE(U,- U*)2+0, as n-a,

and, as proved by Blom (1976), under the assumption of our theorem

nE(U - U,,)2 --f 0, as ~1403.

Finally, by invoking the Minkowski inequality we obtain (B). q

We shall now give the two other conditions that ensure the asymptotic normality

of U. Let a,, t=l,..., n, denote the number of gi’s in U which contain X,, and a,,

the number of g,‘s which contain both X, and X,,. Then we have:

Theorem 2. If

and

5 af/m’= k2/n + o(l/n) I=1

c afu/m2=o(l/n), 15l<USrl

then the conditions (3.3a) and (3.3b) of Theorem 1 ure satisfied.

(3.4a)

(3.4b)

Proof. Consider the g;‘s appearing in U. There are Q: pairs (gi,gj) which have X,

in common and a:, p airs which have both X, and X, in common. We seek the

number f, + ... + fk of pairs with at least one X-value in common. Clearly, we have

the double inequality

,~,a:-,_,~~~~a~~~f~+...+fh~,~~a:.

Similary it is found that

f2 + .., +fkl c 4, I~/<Ul~

Dividing these inequalities by m2 and using the conditions of the theorem we have

P1f ... +P,=k2/n+o(l/n), P2+ ... +Pk=o(l/n).

Hence the conditions (3.3a) and (3.3b) are satisfied. q

In the sequel we shall be particularly interested in the numerators

(2, ,?I T,=cg, and T=zg,

i= I i=l

Page 5: Asymptotic normality of graph statistics

K. Nowicki / Normality of graph statistics 213

of U, and U. We call these quantities complete and incomplete T-statistics, respec-

tively. All asymptotic results for u0 and U hold, of course, also for T0 and T after

multiplication by a suitable factor.

4. Subgraph counts

Consider a Bernoulli graph G,, on a finite set { 1, . . . , n} of vertices, i.e. a graph

where edges occur independently with the same probability p, O<p< 1. Such a

graph can be described by the sequence X,,, llt<urn, of n’=(G) independent

T.v., where

X

i

1 if an edge occurs between vertices t and u, 121

=

0 otherwise.

We want to count the number of subgraphs of G,, with prescribed properties.

For this purpose we may use the theory of U-statistics. We begin with a simple prob-

lem leading to a complete T-statistic: count the number of all subgraphs with s

edges. In what follows, we consider only subgraphs without isolated vertices, i.e.

subgraphs defined by specifying their set of edges. Subgraph counts for arbitrary

graphs are then obtained by multiplying these subgraph counts by a suitable factor.

For this purpose introduce the symmetric function

g(x,, . . . . x,)=x,x,...x,,

and the corresponding complete T-statistic ”

T, = ‘i’ gi . i= 1

(4.1)

Here the gj’s are obtained by inserting in g all possible s-subsequences taken from

the sequence {X,,}. Clearly, gi= 1 if all X,, corresponding to the i-th subsequence

are equal to 1. Hence T, counts the number of all subgraphs with s edges.

It follows from Hoeffding (1948) that T, is asymptotically normally distributed.

In this case we have

6 = E(g) =ps.

Moreover, using (3.1) we obtain that

a*= V(E(gjX,,)) = V(p”P1X,,)=p2sP1(1 -a).

Hence we have proved the following theorem:

(4.2a)

(4.2b)

Theorem 3. For T, defined as before,

Ilti;(T,/(“,‘) -P”)

is asymptotically normally distributed with mean 0 and variance s2ptiP ‘(1 -p).

Page 6: Asymptotic normality of graph statistics

214 K. Nowicki / Normality of graph statistics

We now consider a fixed subgraph H of order v and size s and seek the number

T, = t(Gn,p, H) of subgraphs of G,, isomorphic to H.

Introduce an incomplete T-statistic

r(k’,Z, H)

TH= 1 ST; (4.3) i=l

with the same kernel g as in (4.1) and with t(K,, H) being the number of subgraphs

isomorphic to H in K,,. By suitable choice of a set SH consisting of t(K,, H) s-

subsequences from {XtU} and inserting these subsequences in g, TH counts the

number of subgraphs isomorphic to H.

For example, take s = 4, n = 4 and let H be a 4-cycle, i.e. an undirected graph on

four vertices in which each vertex is connected with two other vertices. Set

T,=g(X,,,X 13,X,,,X,,)+g(X,,,X,,,X23,X34)+g(X13,X,4,X23,X24).

Clearly, TH counts the number of 4-cycles in G4,p.

For any given H the incomplete T-statistic in (4.3) is asymptotically normally

distributed. To prove this we shall check that the conditions of Theorem 2 are

satisfied for n = ~1’.

Let a,, be the frequency of the single variable X,, in SH; also let a,,,+,, be the fre-

quency of the pair (X,,,X,,,,); and a,,.,, the frequency of the pair (X,,,X,,). Due

to the invariance of the structure of SH when vertices are relabeled, the frequences

a lu, a IL!, 10; 9 a Iu,uI(, do not depend on the indices. Let us therefore call them c, , c2, cj,

respectively. It is immediately seen that c1 =st(K,, H)/n’; taking m = t(K,, H),

a,=a,, in condition (3.4a) we see that this condition is fulfilled.

We now turn to condition (3.4b). Its left side takes the form

(3(:)c; + 3(;)c:)/(%% H))2, (4.4)

where we have used the fact that there are 3(t) pairs of type (X,,,X,Z) and 3(t)

pairs of type (X,,,X,,). In order to find c2 and c3 denote the order of H by v and

note that a pair (XrU,XwZ) can be completed to a subsequence of SH in al(zIj)

ways, where a, depends on the form of H but not on n. Moreover, a pair (X,,, X,,)

can be completed to a subsequence of SH in a2(EIi) ways, even here a2 depends on

the form of SH but not n. Hence c2 is of order nvm4, and c3 of order noP3. Noting

that t(K,, H) is of order no we see from (4.4) that the left side in condition (4.3b)

is of order K3. But n’ =(z). Thus the left side of (3.4b) converges faster than l/n’

and hence (3.4b) is satisfied for n = n’.

We now observe that we still have 8 and o2 as in (4.2ab). Thus, from Theorem

1 we obtain:

Theorem 4. Let TH count subgraphs isomorphic to a given graph H of size s and

order v in G,,. Then

@(T,/t(K,, H) - p”)

Page 7: Asymptotic normality of graph statistics

K. Nowicki / Normality of graph statistics 215

is asymptotically normally distributed with mean 0 and variance ~~p~~~‘(l -p).

Let us now investigate the asymptotic distributions for some special subgraph

count statistics. Frank and Strauss (1986) showed that the number of triangles and

the number of k-stars for k = 1, . . . ,n - 1 are sufficient statistics for any

homogeneous Markov graph of order n. The distributions of these statistics in the

case of a Bernoulli graph can be obtained from Theorem 4, namely:

Corollary 5. tf T is the number of triangles, we have that

@(T/(Y) -p3)

is asymptotically normally distributed with mean 0 and variance 9(p5 -p6).

Corollary 6. If S, is the number of k-stars, where a k-star has order k+ 1 and size k, we have that

1/55;(W((,“,,)(k+ 1))-ak)

is asymptotically normally distributed with mean 0 and variance k2(p2kP ’ -Pan).

5. Extensions

The ideas developed in Section 4 can be generalized to counting the number of

all subgraphs isomorphic to at least one of a given set of graphs H,, . . . , HL, each

of size s. As before, we consider only graphs without isolated vertices. For this pur-

pose introduce an incomplete T-statistic

T= f g; (5.1) i=l

with the same kernel g(xr, . . . ,x,) =x,x2 “.x, as before. Let S be the union of

SH,, .*. , S, and let m denote the number of subsequences in S. Clearly, T counts

the number of subgraphs isomorphic to at least one of HI, . . . , HL. Now, checking

the conditions of Theorem 2 for S we prove that T is asymptotically normally

distributed.

First, a,, is the sum of a:,, . . . , a; which are frequences of X,, in SH,, . . . ,SHI,

respectively. Hence

a?, = (W,, HI) + ... + t(K,,, HtA))2s2/n’2

and condition (3.4a) in Theorem 2 is fulfilled.

Second, the left side of condition (3.4b) takes the form

<3(~)c~ + 3(i)cf)/m2

where now the orders of magnitude of c2, c3 and m are determined by the highest

Page 8: Asymptotic normality of graph statistics

216 K. Nowicki / Normality of graph statistics

order among H,, . . . , HL. Hence condition (3.4b) holds true because it is true for

each S,.

Thus we have proved that T in (5.1) is asymptotically normally distributed and

the following theorem can be formulated:

Theorem 7. Let T count the number of all subgraphs isomorphic to at least one of a given set of graphs H,, . . . , HL, each of size s in G,,. Then

fi(T/m -p”), where

m = t(K,, H,) f ... + t(K,,, HL),

is asymptotically normally distributed with mean 0 and variance s2p2’-‘(1 -p).

In particular, let us count all subgraphs in G,, of order u and size s. Then we

obtain the following corollary to Theorem 7:

Corollary 8. Let T count all subgraphs in G,, of order v and size s, where s> (‘2 ‘). Then

fi(T/m -p”),

where

m = (C) (‘) ( )

, S

is asymptotically normally distributed with mean 0 and variance s’~~‘~‘(l -p).

6. Asymptotic linear dependency between subgraph counts

For the purpose of investigating the asymptotic distributions of multiple subgraph

counts and linear combinations of subgraph counts the following theorem is useful:

Theorem 9. Let Tn count subgraphs in G,, isomorphic to a given graph H of order v and size s. Then, it holds that

E(@(T,/t(K,,H)-p”)-fl(spSP’R/n’-spS))2+0, as n-)m,

where R is the number of edges in G,,,.

Proof. By Theorem l(B) we have that fl(Tn/t(K,,,H)-pS) is asymptotically equal (in mean square) to

6( n’ (s/n’) c

l~i<J~rl

Furthermore, we have that

(6.1)

Page 9: Asymptotic normality of graph statistics

Thus, (6.1) is equal to

K. Nowicki / Normality of graph statistics 217

and by substituting R = 1 ,SicjSn Xij we obtain the result of the theorem. 0

Consider now T=(T,,..., T,.) to be a vector of r subgraph counts. Each Tj is

assumed to count subgraphs isomorphic to a given graph Hj of order Vj and size Sj,

respectively. The following result is obtained from Theorem 9:

Corollary 10. The random vector (F,, . . . , F,.) where

~=~(Tj/(t(K,,Hj)-pSJ), j=l,..., r,

is asymptotically normally distributed with mean vector 0 and covariance matrix C=(Cij)ij=, of rank 1. Here

Cij= CiCj, where Ci=s;p’l~~, i= 1, . . . . r.

Proof. By Theorem 4 each component of the considered vector is asymptotically

normally distributed. Thus, it remains to prove that a linear combination

CT= 1 aj Fj is asymptotically normally distributed. First, we introduce

Xj=il;.-~(s,pSJ-‘R/n’-sjpSJ), j=l,...,r,

i.e. the distance between Fj and its projection on {X,, 1 <i< j<n}. We have that

EXj = 0 and, from Theorem 9, that V(Xj) = E(Xi) * 0 as n + 00. Second, we have

that E(C& 1 ajXj)2 + 0 as n + w. Namely,

Thus, we have that

E j$, ajc-@P(R/n’-p) >

‘-0 as n-+03

where

Finally, since R E Bin(n’,p) we have that fl(R/n’-p) and consequently CJ= 1 CXj Tj are asymptotically normally distributed.

Page 10: Asymptotic normality of graph statistics

218 K. Nowicki / Normality of graph siatistics

To calculate the covariance matrix we write

C,:i=C(ai+biR,aj+bjR)=b;bjVar(R).

Hence, since b, = sips’ ‘/fl, bj = sips/ ’ /@ and Var(R)=n’p(l -p) we obtain

the covariance matrix. Finally, since each component of the vector is a multiple of

R we obtain that the rank is equal to 1. 0

Another application of Theorem 9 is in the investigation of linear combinations

of subgraph counts. We will consider the example of testing a Bernoulli graph

against a Markov graph. Here, we devote ourselves to a Markov graph model used

by Frank and Strauss (1986) with probability function

P,(G)=c~‘exp(~r+as+rf),

where r, s, f are equal to the number of edges, 2-stars and triangles in graph G,

respectively. This model captures both transitivity and clustering in graph data and

can be seen as a natural generalisation of a Bernoulli graph model. In fact if

o = r = 0, P&G) reduces to a probability function of a Bernoulli graph G,, which

will be denoted by

P,(G) =p’(l -p)“‘-’ where p = e@/(l + ee).

Consider now the hypothesis

H,: P,(G) = c-l exp(pr)

against

H,: P,(G)=c,‘exp(elr+a,s+slt)

where e, ei, o, and ri are specified values of the parameters. By using the

Neyman-Pearson lemma we obtain that the critical region of the likelihood test is

of the type

(el-~)r+a,s+s~r>k.

Now, let R, S and T be the number of edges, 2-stars and triangles in Gn,p, respec-

tively. Then, from Theorem 9 we obtain that

S=2p(n-2)R-3p2(;)+&, and T=p’(n-2)R-2p3(;)+&,

where r.v. e, and E, are such that

E(E,) = E(E,) = 0 and I’(&,) = 0(n2), I/ = 0(n2).

Consequently, under Ha the critical region is approximately

((ei -e) + 2adn - 2) + tlp2(n - 2))R > k+ 3alp2(;) + 2rlp3(;),

from which we finally obtain that the test is of the following type:

R>k, if (~~-@)+2a,p(n-2)+r,p~(rz-2)>0,

R<k, if (~~-~)+2o,p(n-2)+s,p~(n-2)<0.

Page 11: Asymptotic normality of graph statistics

K. Nowicki / Normality of graph statistics 219

7. Induced subgrapb counts

Consider, as before, G,, to be a Bernoulli graph. We want to count the number

Tkd of induced subgraphs of G,,, isomorphic to a given graph H of order u and

size s. We assume that H does not have isolated vertices; for graph H with isolated

vertices we count instead induced subgraphs of G, 1 up isomorphic to the comple-

ment of H. Since an induced subgraph of G,, having ii, . . . , i,, say, as vertices, is

either isomorphic to H or nonisomorphic to H we can represent TEd as

rid= c g(Xi,!*, ..*3XjL,~ ,i,,) (7.1) l<il<...<i,5n

where

1

1 if the subgraph of G,, induced by

g(X,,+ . . . ,X;,_ , ;,) = i,, . . . , i, is isomorphic to H, 0 otherwise.

For instance, take H to be a cycle of size 4 and order 4. Then,

g(X@ Xik, Xi,, XJk, xjh xk/) =xijxi,xjkxk,(l - Xik)(l - xj/>

+xi,x;kx~,xk/(l -X,/)(1 - xjk)

+ xjkxi[xjkxj/( 1 - X0)( 1 - Xk,).

It is easy to verify that g is no longer a symmetric function and consequently Tgd is not a numerator of an incomplete U-statistic. However, T,$‘” can be written as a

linear combination of subgraph counts for all subgraphs Hi such that H C H; c K,. Namely,

Tkd= c (- l)S’-St(H;, H)T,, (7.2) HCH,CK,,

where Si is the size of Hi and TH, is the number of subgraphs of G,, isomorphic to

Hi. Thus we arrive at the following theorem:

Theorem 11. Let Tk” count induced subgraphs in G,, isomorphic to a given graph H of order v and size s. Then

I/TI;( rid/(:) - f3), where 0 = t(K,, H)pS(l -p)“‘-“,

is asymptotically normally distributed with mean 0 and variance e2(s- (y)p)‘/

(P(1 -P)).

Proof. Asymptotical normality of T2d follows from Corollary 10 since Tkd is a

linear combination of subgraph counts of the same order. Moreover, the expecta-

tion of Tkd can be easily obtained from representation (7.1) since each function

g(X,,,, ... ,xiv_ ,i,) is a sum of t(K,,H) random indicators, each with the expecta-

tion ps(l -p)“‘-“. In order to obtain the variance we use (7.2) and write

I/;;r(T;;ld/(;)-6’)=@( c HcH,cK,,

(-1,‘-‘t(Hj,H)TH,/(;)-8).

Page 12: Asymptotic normality of graph statistics

220 K. Nowicki / Normality of graph statistics

Thus we have to calculate

n’Var (-

,,Z, (- W”WiJW,X) I- ” >

which, by arguments as in the proof of Corollary 10, is asymptotically equal to

n’Var (

C (- l)S’~St(Hj,H)SiPSI~lf(Ku,H,)R/n’ HCH,CK,, >

and since Var(R) = n’pq to

P4 (

C ( - l)S’~St(Hj, H)s;pS’-’ t(K”, Hi) HcH,cK,, >

2. (7.3)

Finally, this expression can be simplified by calculating the expectation 13 using (7.2).

Namely, we obtain that

e=Hc;cK (- l)S’~St(H,,H)t(K,,Hi)PSf - I- ”

and consequently

(dB/dp) = 2 (- l)S’~Sf(H;,H)t(K,,H;)~ipS~~l. HcH,cK,,

Hence, expression (7.3) can be written pq(de/dp)2 and consequently variance is ob-

tained by evaluating the expression

d(W,,WpS(l-p)“‘-S)

dp

Remark 12. Observe that when p = s/(y) the asymptotic variance in Theorem 11 is

equal to 0 and we obtain only the convergence in probability to 0. Maehara (1987)

proved, independently, Theorem 11 when p = 0.5.

Let us now consider T=(T,,..., T,.) to be a vector of induced subgraph count

statistics, in which each component Tj counts induced subgraphs Hj of order uj and

size sJ, respectively. Then we can give the following theorem:

Theorem 13. We have that

(~(T,/(,:)-e,),...,~(T,/(G,)-er)

is asymptotically normally distributed with mean vector 0 and covariance matrix C = (Cij)ij= 1 of rank 1. Here

ej= t(K”, H,)$‘(l -p)“‘+

and C,= CiCj, Ci=e;(Si-(;)P)/J/m, i,j= 1, . . . . r.

Page 13: Asymptotic normality of graph statistics

K. Nowicki / Normality of graph statistics 221

Proof. The asymptotic normality can be proved by a method similar to that used

to prove Corollary 10. In order to establish covariance matrix C we use a method

similar to that used to prove Theorem Il. It allows us to conclude that

cjj = cov H,c;iX.,, (- ~)““-“‘~(H~,H;)~~P~~--I~(K~,H~)R/~,

c fl, r H,,, c K”

(- l)“‘flPsJt(H,, Hj)s,pS’“- ’ t(K,, H,)R/fl 1

and after calculation we obtain that

Finally, the evaluation of this expression gives the covariance matrix. 0 7

Consider now T= (T,, T,, T,, T,) to be a vector of triad counts for a Bernoulli

graph where the components T,. give the number of triads with r edges where

r=O , . . . ,3, i.e. T, is of order 3 and size r. Frank (1979) obtained the first- and

second-degree moments of triad counts. We can use Theorem 13 in order to

generalize his results. The following corollary gives the asymptotic distribution of T:

Corollary 14. Let T=(T,,,..., T,) be a vector of triad counts in G,,. Then

fl(T4;) - e),

where

13=(& ,..., f?,), f3,=(:)p’(l-p)3P’ for i=O ,..., 3,

is asymptotically normally distributed with mean vector 0 and covariance matrix C = (Cij):j=o where

and

Cij = Ci Cj

C,=(~)pi(l-p)3-i(3p-i)/ll(l-plp, i,j=O,...,3.

Acknowledgement

I am indebted to Professor Gunnar Blom and Professor Ove Frank for offering

so much of their time for discussion, valuable suggestions and comments. I also

want to thank Professor Svante Jansson who gave me valuable comments and Do-

cent Daniel Thorburn who further extended my understanding of the asymptotic

behaviour of subgraph counts.

Page 14: Asymptotic normality of graph statistics

222 K. Nowicki / Normality of graph statistics

References

Barlow, R.E. and F. Proschan (1975). Statistical Theory ofReliability and Life Testing. Holt, New York.

Blom, G. (1976). Some properties of incomplete U-statistics. Biometrika 63, 573-580.

BollobBs, B. (1985). Random Graphs. Academic Press, New York.

ErdGs, P. and A. Renyi (1959). On random graphs 1. Pubf. Math. Debrecen 6, 290-297.

Erdiis, P. and A. Renyi (1960). On the evolution of random graphs. Publ. Math. Insr. Hung. Acad. Sci.

5, 17-61.

Frank, 0. (1979). Moment properties of subgraph counts in stochastic graphs. Ann. New York Acad.

Sci. 319, 207-218.

Frank, 0. (1980). Transitivity in stochastic graphs and digraphs. J. Math. Social. 7, 199-213.

Frank, 0. and F. Harary (1982). Cluster inference by using ransitivity indices in empirical graphs. J.

Amer. Statist. Assoc. 77, 835-840.

Frank, 0. and D. Strauss (1986). Markov graphs. .I. Amer. Statist. Assoc. 81, 832-842.

Harary, F. and H. Kommel (1979). Matrix measures for transitivity and balance. J. Math. Social. 6,

199-210.

Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Statist.

19, 293-325.

Holland, P. and S. Leinhardt (1971). Transitivity in structural models of small groups. Comparitive

Group Studies 2, 107-124.

Holland, P. and S. Leinhardt (1975). Local structure in social networks. In: D. Heise, Ed., Sociological

Methodology 1976. Jossey-Bass, San Francisco, CA.

Holland, P. and S. Leinhardt, Eds. (1979). Structural Sociometry Social Networks, Surveys, Advances

and Commentaries. Academic Press, New York.

Karoriski, M. (1982). A review of random graphs. J. Graph Theory 6, 349-389.

Karoriski, M. (1984). Balanced Subgraphs of Large Random Graph. Uniwersytet im. Adama Mickiewic-

za w Poznaniu, Seria Matematyka nr. 7.

Kendall, M. and B. Babington Smith (1940). On the method of paired comparisons. Biometrika 31,

324-345.

Maehara, H. (1987). On the number of induced subgraphs of a random graph. Discrete Math. 64,

309-312.

Moon, J.W. (1968). Topics on Tournaments. Holt, New York.

hloran, P.A. (1947). On the method of paired comparisons. Biometrika 34, 363-365.

Palmer, E.M. (1985). Graphical Evolution. Wiley, New, York.

Wasrcrman, S.S. (1977). Random directed graph distributions and the triad census in social networks.

J. Math. Social. 5, 61-86.