On the asymptotic joint normality of quantiles from a multivariate ...
Asymptotic normality of graph statistics
-
Upload
krzysztof-nowicki -
Category
Documents
-
view
216 -
download
0
Transcript of Asymptotic normality of graph statistics
Journal of Statistical Planning and Inference 21 (1989) 209-222
North-Holland
209
ASYMPTOTIC NORMALITY OF GRAPH STATISTICS
Krzysztof NOWICKI
Department of Mathematical Statistics, University of Lund, Box 118, S-221 00 Lund, Sweden
Received 27 February 1986; revised manuscript received 11 February 1988
Recommended by 0. Frank
Abstract: Various types of graph statistics for Bernoulli graphs are represented as numerators of
incomplete U-statistics. Asymptotic normality of these statistics is proved for Bernoulli graphs
in which the edge probability is constant. In addition it is shown that subgraph counts asymp-
totically are linear functions of the number of edges in the graph.
AMS Subject Classification: Primary 05C99; Secondary 62699, 92A20
Key words: Random graphs; subgraph counts; induced subgraph counts; U-statistics; incomplete
U-statistics; Markov graphs.
1. Introduction
The statistical analysis of graph data is often based on subgraph counts. In some
cases, subgraph counts are sufficient statistics, as, for example, in the case of certain
Markov graphs, see Frank and Strauss (1986). However, for other models and for
empirical graph data, subgraph counts are merely used to describe structural proper-
ties. Kendall and Babington Smith (1940) and Moran (1947) considered measures
of inconsistency and agreement between judges in paired comparisons. Several
authors have proposed and investigated indices for measuring transitivity of con-
tacts and friendship choices, e.g. Holland and Leinhardt (1971, 197.5), Harary and
Kommel (1979), Frank (1980) and Frank and Harary (1982).
In random graphs, the subgraph counts are random variables with complicated
mutual dependencies. Even for very simple graph models, studies of their asymp-
totic distributions have been restricted to special classes of subgraphs. Moon (1968)
studied the distribution of the numbers of 3- and 4-cycles in a random tournament
and proved their asymptotic normality. For the most common random graph
model, so-called Bernoulli graphs (a graph in which edges occur independently with
a common probability p), subgraph counts were investigated for so-called balanced
subgraphs. The classical reference is two papers by Erdos and RCnyi (1959, 1960).
Further references can be found in the text books by Bollobas (1985) and Palmer
(1985) and in the survey articles by Karonski (1982, 1984).
0378-3758/89/$3.50 0 1989, Elsevier Science Publishers B.V. (North-Holland)
210 K. Nowicki / Normality of graph statistics
In statistical analysis, Bernoulli graphs can be useful as representations of pure
chance phenomena; they can thus be used to test if empirical graph data are
generated from some more complicated model than a Bernoulli graph (Frank,
1980). Furthermore, Bernoulli graphs have been used to describe networks of ir-
regularly appearing contacts between vertices; see Frank (1979). Barlow and Pro-
schan (1975) considered reliability problems by using Bernoulli graphs. Holland and
Leinhardt (1975, 1979) and Wasserman (1977) considered moments of dyad and
triad counts (subgraph counts of order 2 and 3) for simple random graph models,
while Frank (1979) gave the first two moments of subgraph counts of arbitrary order
for general random graph models.
The purpose of this paper is to study the asymptotic behaviour of subgraph
counts for Bernoulli graphs in which the edge probability p is constant. It is proved
that subgraph counts are asymptotically normally distributed. In addition, we show
that subgraph counts asymptotically are linear functions of the number of edges in
the graph. This allows us to obtain simultaneous distributions for multiple counts
directly from distributions of single counts.
In Section 2 we will give basic concepts and notations. Section 3 presents some
results from the theory of U-statistics. In Section 4 we will give the asymptotic
distribution of a single subgraph count with certain extensions in Section 5. Section
6 is devoted to investigations of the simultaneous distribution of subgraph counts,
as well as their application to testing a Bernoulli graph versus a ?/Iarkov graph.
Finally, in Section 7 we study induced subgraph count statistics, both for a single
count and for simultaneous counts.
2. Basic concepts and notations
LetN={l,..., n} be a set of elements called vertices, and let E be a subset of the
collection of n’= (;) unordered pairs of vertices. The elements of E are called
edges. A graph G consists of an ordered pair (N, E) of sets of vertices and edges.
The number of vertices is called the order of G while the number of edges is called
the size of G. The size of G can be at most n’ and a graph on n vertices with size
n’ is called a complete graph and is denoted K,,.
A subgraph H= (V, 0 of G is itself a graph whose vertex set V is a subset of N,
and edge set r is a subset of E. Moreover, if every pair of vertices u and u of His
connected by an edge in H whenever it is connected by an edge in G, then H is an
induced subgraph of G.
Two graphs G and Hare isomorphic if there is a one-to-one function Y from the
vertex set of G onto the vertex set of H, such that for any two vertices t and u of
G we have an edge between t and u in G if and only if we have an edge between
Y(t) and Y(u) in H. We let t(G,H) denote the number of subgraphs of G which
are isomorphic to H. In particular, let G= K, and let H be a graph of order u,
u < n. Then, we have that b(K,, H) = (E)t(K,,, H).
K. Nowicki / Normality of graph statistics 211
3. Asymptotic normality of an incomplete U-statistic
Hoeffding (1948) considered a sequence of independent identically distributed
random variables (i.i.d. T.v.) X,, . . . , X, and a symmetric function g(x,, . . . ,xk),
which we call a kernel. He estimated the parameter B =E(g(X,, . . . ,X,)) by
where the g;‘s are obtained by inserting the (t) possible ordered subsequences
X;,, **. , Xjp 1 Sj, < ... <j, 5 n, in g, the numbering of the gj being arbitrary. The
r.v. U, is called a complete U-statistic. Now, set
02= ~UwXl, . ..>&)IX.N (3.1)
and suppose that o*>O. Using Hajek’s projection method it can be proved that
fi(U,- 0) is asymptotically normally distributed with mean 0 and variance k2a2. In order to calculate U, we have to evaluate the sum of (z) terms. To reduce
these calculations Blom (1976) introduced an incomplete U-statistic
(3.2)
Denote by S the set of the m subsequences Xj,, . . . ,XJk in the g;‘s. Due to the strong
dependence between many of the gi’s in U. we can often choose S so that U is
asymptotically equivalent to U,. This is shown by the following theorem, which
was essentially given by Blom (1976).
The m subsequences Xj,, . . . , Xjp appearing in S, form m2 pairs. Let f,, c=
1 , . . . , k, be the number of pairs with c X-values in common, and set pc=fc/m2. Then we have
Theorem 1. Suppose that for m/n + 00,
and p1 = k*/n + o( 1 /n)
;i2Pj=Wn).
(3.3a)
(3.3b)
Then (A) &(U- 0) is asymptotically normally distributed with mean 0 and variance k*a*.
(B) nE(U- U*)2+0 as n-+m, where
and
U*=(k/n) i h(X,)-(k-1)0 i=l
h(X;)=E(g(X;,,...,Xj~)IXi,=xj), Isi,< ... <i,<n.
212 K. Nowicki / Normality of graph statistics
Proof. For the proof of part (A) we refer the reader to Blom (1976). To prove part
(B) we write
E(U- U*)2=E(U- CQ2+E(U,- U*)2+2E{(U- U,)(U,- U*)}.
Furthermore, as proved by Hoeffding (1948) we have that
nE(U,- U*)2+0, as n-a,
and, as proved by Blom (1976), under the assumption of our theorem
nE(U - U,,)2 --f 0, as ~1403.
Finally, by invoking the Minkowski inequality we obtain (B). q
We shall now give the two other conditions that ensure the asymptotic normality
of U. Let a,, t=l,..., n, denote the number of gi’s in U which contain X,, and a,,
the number of g,‘s which contain both X, and X,,. Then we have:
Theorem 2. If
and
5 af/m’= k2/n + o(l/n) I=1
c afu/m2=o(l/n), 15l<USrl
then the conditions (3.3a) and (3.3b) of Theorem 1 ure satisfied.
(3.4a)
(3.4b)
Proof. Consider the g;‘s appearing in U. There are Q: pairs (gi,gj) which have X,
in common and a:, p airs which have both X, and X, in common. We seek the
number f, + ... + fk of pairs with at least one X-value in common. Clearly, we have
the double inequality
,~,a:-,_,~~~~a~~~f~+...+fh~,~~a:.
Similary it is found that
f2 + .., +fkl c 4, I~/<Ul~
Dividing these inequalities by m2 and using the conditions of the theorem we have
P1f ... +P,=k2/n+o(l/n), P2+ ... +Pk=o(l/n).
Hence the conditions (3.3a) and (3.3b) are satisfied. q
In the sequel we shall be particularly interested in the numerators
(2, ,?I T,=cg, and T=zg,
i= I i=l
K. Nowicki / Normality of graph statistics 213
of U, and U. We call these quantities complete and incomplete T-statistics, respec-
tively. All asymptotic results for u0 and U hold, of course, also for T0 and T after
multiplication by a suitable factor.
4. Subgraph counts
Consider a Bernoulli graph G,, on a finite set { 1, . . . , n} of vertices, i.e. a graph
where edges occur independently with the same probability p, O<p< 1. Such a
graph can be described by the sequence X,,, llt<urn, of n’=(G) independent
T.v., where
X
i
1 if an edge occurs between vertices t and u, 121
=
0 otherwise.
We want to count the number of subgraphs of G,, with prescribed properties.
For this purpose we may use the theory of U-statistics. We begin with a simple prob-
lem leading to a complete T-statistic: count the number of all subgraphs with s
edges. In what follows, we consider only subgraphs without isolated vertices, i.e.
subgraphs defined by specifying their set of edges. Subgraph counts for arbitrary
graphs are then obtained by multiplying these subgraph counts by a suitable factor.
For this purpose introduce the symmetric function
g(x,, . . . . x,)=x,x,...x,,
and the corresponding complete T-statistic ”
T, = ‘i’ gi . i= 1
(4.1)
Here the gj’s are obtained by inserting in g all possible s-subsequences taken from
the sequence {X,,}. Clearly, gi= 1 if all X,, corresponding to the i-th subsequence
are equal to 1. Hence T, counts the number of all subgraphs with s edges.
It follows from Hoeffding (1948) that T, is asymptotically normally distributed.
In this case we have
6 = E(g) =ps.
Moreover, using (3.1) we obtain that
a*= V(E(gjX,,)) = V(p”P1X,,)=p2sP1(1 -a).
Hence we have proved the following theorem:
(4.2a)
(4.2b)
Theorem 3. For T, defined as before,
Ilti;(T,/(“,‘) -P”)
is asymptotically normally distributed with mean 0 and variance s2ptiP ‘(1 -p).
214 K. Nowicki / Normality of graph statistics
We now consider a fixed subgraph H of order v and size s and seek the number
T, = t(Gn,p, H) of subgraphs of G,, isomorphic to H.
Introduce an incomplete T-statistic
r(k’,Z, H)
TH= 1 ST; (4.3) i=l
with the same kernel g as in (4.1) and with t(K,, H) being the number of subgraphs
isomorphic to H in K,,. By suitable choice of a set SH consisting of t(K,, H) s-
subsequences from {XtU} and inserting these subsequences in g, TH counts the
number of subgraphs isomorphic to H.
For example, take s = 4, n = 4 and let H be a 4-cycle, i.e. an undirected graph on
four vertices in which each vertex is connected with two other vertices. Set
T,=g(X,,,X 13,X,,,X,,)+g(X,,,X,,,X23,X34)+g(X13,X,4,X23,X24).
Clearly, TH counts the number of 4-cycles in G4,p.
For any given H the incomplete T-statistic in (4.3) is asymptotically normally
distributed. To prove this we shall check that the conditions of Theorem 2 are
satisfied for n = ~1’.
Let a,, be the frequency of the single variable X,, in SH; also let a,,,+,, be the fre-
quency of the pair (X,,,X,,,,); and a,,.,, the frequency of the pair (X,,,X,,). Due
to the invariance of the structure of SH when vertices are relabeled, the frequences
a lu, a IL!, 10; 9 a Iu,uI(, do not depend on the indices. Let us therefore call them c, , c2, cj,
respectively. It is immediately seen that c1 =st(K,, H)/n’; taking m = t(K,, H),
a,=a,, in condition (3.4a) we see that this condition is fulfilled.
We now turn to condition (3.4b). Its left side takes the form
(3(:)c; + 3(;)c:)/(%% H))2, (4.4)
where we have used the fact that there are 3(t) pairs of type (X,,,X,Z) and 3(t)
pairs of type (X,,,X,,). In order to find c2 and c3 denote the order of H by v and
note that a pair (XrU,XwZ) can be completed to a subsequence of SH in al(zIj)
ways, where a, depends on the form of H but not on n. Moreover, a pair (X,,, X,,)
can be completed to a subsequence of SH in a2(EIi) ways, even here a2 depends on
the form of SH but not n. Hence c2 is of order nvm4, and c3 of order noP3. Noting
that t(K,, H) is of order no we see from (4.4) that the left side in condition (4.3b)
is of order K3. But n’ =(z). Thus the left side of (3.4b) converges faster than l/n’
and hence (3.4b) is satisfied for n = n’.
We now observe that we still have 8 and o2 as in (4.2ab). Thus, from Theorem
1 we obtain:
Theorem 4. Let TH count subgraphs isomorphic to a given graph H of size s and
order v in G,,. Then
@(T,/t(K,, H) - p”)
K. Nowicki / Normality of graph statistics 215
is asymptotically normally distributed with mean 0 and variance ~~p~~~‘(l -p).
Let us now investigate the asymptotic distributions for some special subgraph
count statistics. Frank and Strauss (1986) showed that the number of triangles and
the number of k-stars for k = 1, . . . ,n - 1 are sufficient statistics for any
homogeneous Markov graph of order n. The distributions of these statistics in the
case of a Bernoulli graph can be obtained from Theorem 4, namely:
Corollary 5. tf T is the number of triangles, we have that
@(T/(Y) -p3)
is asymptotically normally distributed with mean 0 and variance 9(p5 -p6).
Corollary 6. If S, is the number of k-stars, where a k-star has order k+ 1 and size k, we have that
1/55;(W((,“,,)(k+ 1))-ak)
is asymptotically normally distributed with mean 0 and variance k2(p2kP ’ -Pan).
5. Extensions
The ideas developed in Section 4 can be generalized to counting the number of
all subgraphs isomorphic to at least one of a given set of graphs H,, . . . , HL, each
of size s. As before, we consider only graphs without isolated vertices. For this pur-
pose introduce an incomplete T-statistic
T= f g; (5.1) i=l
with the same kernel g(xr, . . . ,x,) =x,x2 “.x, as before. Let S be the union of
SH,, .*. , S, and let m denote the number of subsequences in S. Clearly, T counts
the number of subgraphs isomorphic to at least one of HI, . . . , HL. Now, checking
the conditions of Theorem 2 for S we prove that T is asymptotically normally
distributed.
First, a,, is the sum of a:,, . . . , a; which are frequences of X,, in SH,, . . . ,SHI,
respectively. Hence
a?, = (W,, HI) + ... + t(K,,, HtA))2s2/n’2
and condition (3.4a) in Theorem 2 is fulfilled.
Second, the left side of condition (3.4b) takes the form
<3(~)c~ + 3(i)cf)/m2
where now the orders of magnitude of c2, c3 and m are determined by the highest
216 K. Nowicki / Normality of graph statistics
order among H,, . . . , HL. Hence condition (3.4b) holds true because it is true for
each S,.
Thus we have proved that T in (5.1) is asymptotically normally distributed and
the following theorem can be formulated:
Theorem 7. Let T count the number of all subgraphs isomorphic to at least one of a given set of graphs H,, . . . , HL, each of size s in G,,. Then
fi(T/m -p”), where
m = t(K,, H,) f ... + t(K,,, HL),
is asymptotically normally distributed with mean 0 and variance s2p2’-‘(1 -p).
In particular, let us count all subgraphs in G,, of order u and size s. Then we
obtain the following corollary to Theorem 7:
Corollary 8. Let T count all subgraphs in G,, of order v and size s, where s> (‘2 ‘). Then
fi(T/m -p”),
where
m = (C) (‘) ( )
, S
is asymptotically normally distributed with mean 0 and variance s’~~‘~‘(l -p).
6. Asymptotic linear dependency between subgraph counts
For the purpose of investigating the asymptotic distributions of multiple subgraph
counts and linear combinations of subgraph counts the following theorem is useful:
Theorem 9. Let Tn count subgraphs in G,, isomorphic to a given graph H of order v and size s. Then, it holds that
E(@(T,/t(K,,H)-p”)-fl(spSP’R/n’-spS))2+0, as n-)m,
where R is the number of edges in G,,,.
Proof. By Theorem l(B) we have that fl(Tn/t(K,,,H)-pS) is asymptotically equal (in mean square) to
6( n’ (s/n’) c
l~i<J~rl
Furthermore, we have that
(6.1)
Thus, (6.1) is equal to
K. Nowicki / Normality of graph statistics 217
and by substituting R = 1 ,SicjSn Xij we obtain the result of the theorem. 0
Consider now T=(T,,..., T,.) to be a vector of r subgraph counts. Each Tj is
assumed to count subgraphs isomorphic to a given graph Hj of order Vj and size Sj,
respectively. The following result is obtained from Theorem 9:
Corollary 10. The random vector (F,, . . . , F,.) where
~=~(Tj/(t(K,,Hj)-pSJ), j=l,..., r,
is asymptotically normally distributed with mean vector 0 and covariance matrix C=(Cij)ij=, of rank 1. Here
Cij= CiCj, where Ci=s;p’l~~, i= 1, . . . . r.
Proof. By Theorem 4 each component of the considered vector is asymptotically
normally distributed. Thus, it remains to prove that a linear combination
CT= 1 aj Fj is asymptotically normally distributed. First, we introduce
Xj=il;.-~(s,pSJ-‘R/n’-sjpSJ), j=l,...,r,
i.e. the distance between Fj and its projection on {X,, 1 <i< j<n}. We have that
EXj = 0 and, from Theorem 9, that V(Xj) = E(Xi) * 0 as n + 00. Second, we have
that E(C& 1 ajXj)2 + 0 as n + w. Namely,
Thus, we have that
E j$, ajc-@P(R/n’-p) >
‘-0 as n-+03
where
Finally, since R E Bin(n’,p) we have that fl(R/n’-p) and consequently CJ= 1 CXj Tj are asymptotically normally distributed.
218 K. Nowicki / Normality of graph siatistics
To calculate the covariance matrix we write
C,:i=C(ai+biR,aj+bjR)=b;bjVar(R).
Hence, since b, = sips’ ‘/fl, bj = sips/ ’ /@ and Var(R)=n’p(l -p) we obtain
the covariance matrix. Finally, since each component of the vector is a multiple of
R we obtain that the rank is equal to 1. 0
Another application of Theorem 9 is in the investigation of linear combinations
of subgraph counts. We will consider the example of testing a Bernoulli graph
against a Markov graph. Here, we devote ourselves to a Markov graph model used
by Frank and Strauss (1986) with probability function
P,(G)=c~‘exp(~r+as+rf),
where r, s, f are equal to the number of edges, 2-stars and triangles in graph G,
respectively. This model captures both transitivity and clustering in graph data and
can be seen as a natural generalisation of a Bernoulli graph model. In fact if
o = r = 0, P&G) reduces to a probability function of a Bernoulli graph G,, which
will be denoted by
P,(G) =p’(l -p)“‘-’ where p = e@/(l + ee).
Consider now the hypothesis
H,: P,(G) = c-l exp(pr)
against
H,: P,(G)=c,‘exp(elr+a,s+slt)
where e, ei, o, and ri are specified values of the parameters. By using the
Neyman-Pearson lemma we obtain that the critical region of the likelihood test is
of the type
(el-~)r+a,s+s~r>k.
Now, let R, S and T be the number of edges, 2-stars and triangles in Gn,p, respec-
tively. Then, from Theorem 9 we obtain that
S=2p(n-2)R-3p2(;)+&, and T=p’(n-2)R-2p3(;)+&,
where r.v. e, and E, are such that
E(E,) = E(E,) = 0 and I’(&,) = 0(n2), I/ = 0(n2).
Consequently, under Ha the critical region is approximately
((ei -e) + 2adn - 2) + tlp2(n - 2))R > k+ 3alp2(;) + 2rlp3(;),
from which we finally obtain that the test is of the following type:
R>k, if (~~-@)+2a,p(n-2)+r,p~(rz-2)>0,
R<k, if (~~-~)+2o,p(n-2)+s,p~(n-2)<0.
K. Nowicki / Normality of graph statistics 219
7. Induced subgrapb counts
Consider, as before, G,, to be a Bernoulli graph. We want to count the number
Tkd of induced subgraphs of G,,, isomorphic to a given graph H of order u and
size s. We assume that H does not have isolated vertices; for graph H with isolated
vertices we count instead induced subgraphs of G, 1 up isomorphic to the comple-
ment of H. Since an induced subgraph of G,, having ii, . . . , i,, say, as vertices, is
either isomorphic to H or nonisomorphic to H we can represent TEd as
rid= c g(Xi,!*, ..*3XjL,~ ,i,,) (7.1) l<il<...<i,5n
where
1
1 if the subgraph of G,, induced by
g(X,,+ . . . ,X;,_ , ;,) = i,, . . . , i, is isomorphic to H, 0 otherwise.
For instance, take H to be a cycle of size 4 and order 4. Then,
g(X@ Xik, Xi,, XJk, xjh xk/) =xijxi,xjkxk,(l - Xik)(l - xj/>
+xi,x;kx~,xk/(l -X,/)(1 - xjk)
+ xjkxi[xjkxj/( 1 - X0)( 1 - Xk,).
It is easy to verify that g is no longer a symmetric function and consequently Tgd is not a numerator of an incomplete U-statistic. However, T,$‘” can be written as a
linear combination of subgraph counts for all subgraphs Hi such that H C H; c K,. Namely,
Tkd= c (- l)S’-St(H;, H)T,, (7.2) HCH,CK,,
where Si is the size of Hi and TH, is the number of subgraphs of G,, isomorphic to
Hi. Thus we arrive at the following theorem:
Theorem 11. Let Tk” count induced subgraphs in G,, isomorphic to a given graph H of order v and size s. Then
I/TI;( rid/(:) - f3), where 0 = t(K,, H)pS(l -p)“‘-“,
is asymptotically normally distributed with mean 0 and variance e2(s- (y)p)‘/
(P(1 -P)).
Proof. Asymptotical normality of T2d follows from Corollary 10 since Tkd is a
linear combination of subgraph counts of the same order. Moreover, the expecta-
tion of Tkd can be easily obtained from representation (7.1) since each function
g(X,,,, ... ,xiv_ ,i,) is a sum of t(K,,H) random indicators, each with the expecta-
tion ps(l -p)“‘-“. In order to obtain the variance we use (7.2) and write
I/;;r(T;;ld/(;)-6’)=@( c HcH,cK,,
(-1,‘-‘t(Hj,H)TH,/(;)-8).
220 K. Nowicki / Normality of graph statistics
Thus we have to calculate
n’Var (-
,,Z, (- W”WiJW,X) I- ” >
which, by arguments as in the proof of Corollary 10, is asymptotically equal to
n’Var (
C (- l)S’~St(Hj,H)SiPSI~lf(Ku,H,)R/n’ HCH,CK,, >
and since Var(R) = n’pq to
P4 (
C ( - l)S’~St(Hj, H)s;pS’-’ t(K”, Hi) HcH,cK,, >
2. (7.3)
Finally, this expression can be simplified by calculating the expectation 13 using (7.2).
Namely, we obtain that
e=Hc;cK (- l)S’~St(H,,H)t(K,,Hi)PSf - I- ”
and consequently
(dB/dp) = 2 (- l)S’~Sf(H;,H)t(K,,H;)~ipS~~l. HcH,cK,,
Hence, expression (7.3) can be written pq(de/dp)2 and consequently variance is ob-
tained by evaluating the expression
d(W,,WpS(l-p)“‘-S)
dp
Remark 12. Observe that when p = s/(y) the asymptotic variance in Theorem 11 is
equal to 0 and we obtain only the convergence in probability to 0. Maehara (1987)
proved, independently, Theorem 11 when p = 0.5.
Let us now consider T=(T,,..., T,.) to be a vector of induced subgraph count
statistics, in which each component Tj counts induced subgraphs Hj of order uj and
size sJ, respectively. Then we can give the following theorem:
Theorem 13. We have that
(~(T,/(,:)-e,),...,~(T,/(G,)-er)
is asymptotically normally distributed with mean vector 0 and covariance matrix C = (Cij)ij= 1 of rank 1. Here
ej= t(K”, H,)$‘(l -p)“‘+
and C,= CiCj, Ci=e;(Si-(;)P)/J/m, i,j= 1, . . . . r.
K. Nowicki / Normality of graph statistics 221
Proof. The asymptotic normality can be proved by a method similar to that used
to prove Corollary 10. In order to establish covariance matrix C we use a method
similar to that used to prove Theorem Il. It allows us to conclude that
cjj = cov H,c;iX.,, (- ~)““-“‘~(H~,H;)~~P~~--I~(K~,H~)R/~,
c fl, r H,,, c K”
(- l)“‘flPsJt(H,, Hj)s,pS’“- ’ t(K,, H,)R/fl 1
and after calculation we obtain that
Finally, the evaluation of this expression gives the covariance matrix. 0 7
Consider now T= (T,, T,, T,, T,) to be a vector of triad counts for a Bernoulli
graph where the components T,. give the number of triads with r edges where
r=O , . . . ,3, i.e. T, is of order 3 and size r. Frank (1979) obtained the first- and
second-degree moments of triad counts. We can use Theorem 13 in order to
generalize his results. The following corollary gives the asymptotic distribution of T:
Corollary 14. Let T=(T,,,..., T,) be a vector of triad counts in G,,. Then
fl(T4;) - e),
where
13=(& ,..., f?,), f3,=(:)p’(l-p)3P’ for i=O ,..., 3,
is asymptotically normally distributed with mean vector 0 and covariance matrix C = (Cij):j=o where
and
Cij = Ci Cj
C,=(~)pi(l-p)3-i(3p-i)/ll(l-plp, i,j=O,...,3.
Acknowledgement
I am indebted to Professor Gunnar Blom and Professor Ove Frank for offering
so much of their time for discussion, valuable suggestions and comments. I also
want to thank Professor Svante Jansson who gave me valuable comments and Do-
cent Daniel Thorburn who further extended my understanding of the asymptotic
behaviour of subgraph counts.
222 K. Nowicki / Normality of graph statistics
References
Barlow, R.E. and F. Proschan (1975). Statistical Theory ofReliability and Life Testing. Holt, New York.
Blom, G. (1976). Some properties of incomplete U-statistics. Biometrika 63, 573-580.
BollobBs, B. (1985). Random Graphs. Academic Press, New York.
ErdGs, P. and A. Renyi (1959). On random graphs 1. Pubf. Math. Debrecen 6, 290-297.
Erdiis, P. and A. Renyi (1960). On the evolution of random graphs. Publ. Math. Insr. Hung. Acad. Sci.
5, 17-61.
Frank, 0. (1979). Moment properties of subgraph counts in stochastic graphs. Ann. New York Acad.
Sci. 319, 207-218.
Frank, 0. (1980). Transitivity in stochastic graphs and digraphs. J. Math. Social. 7, 199-213.
Frank, 0. and F. Harary (1982). Cluster inference by using ransitivity indices in empirical graphs. J.
Amer. Statist. Assoc. 77, 835-840.
Frank, 0. and D. Strauss (1986). Markov graphs. .I. Amer. Statist. Assoc. 81, 832-842.
Harary, F. and H. Kommel (1979). Matrix measures for transitivity and balance. J. Math. Social. 6,
199-210.
Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Statist.
19, 293-325.
Holland, P. and S. Leinhardt (1971). Transitivity in structural models of small groups. Comparitive
Group Studies 2, 107-124.
Holland, P. and S. Leinhardt (1975). Local structure in social networks. In: D. Heise, Ed., Sociological
Methodology 1976. Jossey-Bass, San Francisco, CA.
Holland, P. and S. Leinhardt, Eds. (1979). Structural Sociometry Social Networks, Surveys, Advances
and Commentaries. Academic Press, New York.
Karoriski, M. (1982). A review of random graphs. J. Graph Theory 6, 349-389.
Karoriski, M. (1984). Balanced Subgraphs of Large Random Graph. Uniwersytet im. Adama Mickiewic-
za w Poznaniu, Seria Matematyka nr. 7.
Kendall, M. and B. Babington Smith (1940). On the method of paired comparisons. Biometrika 31,
324-345.
Maehara, H. (1987). On the number of induced subgraphs of a random graph. Discrete Math. 64,
309-312.
Moon, J.W. (1968). Topics on Tournaments. Holt, New York.
hloran, P.A. (1947). On the method of paired comparisons. Biometrika 34, 363-365.
Palmer, E.M. (1985). Graphical Evolution. Wiley, New, York.
Wasrcrman, S.S. (1977). Random directed graph distributions and the triad census in social networks.
J. Math. Social. 5, 61-86.