Central Limit theory for the sample variance and correlation coefficient

Central limit theorems for variances and correlation coefficients

E. Omey and S. Van Gulck
HUB, Stormstraat 2, 1000 Brussels, Belgium
{edward.omey, stefan.vangulck}@hubrussel.be

March 2008

Abstract

In many textbooks, the central limit theorem plays a prominent role. In studying confidence intervals for the mean $\mu$, for example, the central limit theorem is fully exploited. For large samples from an arbitrary distribution with finite second moment, we can always construct confidence intervals and test hypotheses concerning $\mu$. In the same textbooks, the treatment of the variance $\sigma^2$ and the correlation coefficient $\rho$ is usually restricted to samples from normal distributions.

In this paper we give a general and simple central limit approach to these parameters and show that it is convenient but not necessary to restrict attention to normal samples. Among others we discuss central limit theorems for the sample variance $s^2$, the sample correlation coefficient $r$ and the ratio of sample variances $s_2^2/s_1^2$ for paired and for unpaired samples.


1 Introduction

Let $X_1, X_2, \ldots, X_n$ denote a sample from $X \sim A(\mu, \sigma^2)$, where $A$ is an arbitrary distribution with $\mu = E(X)$ and $\sigma^2 = \operatorname{Var}(X)$. The sample mean is given by

$$\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i.$$

It is well known that $E(\bar X) = \mu$ and that $\operatorname{Var}(\bar X) = \sigma^2/n$. Calculating probabilities concerning $\bar X$ is a more complicated problem. For small samples there are not many distributions for which the distribution of $\bar X$ is known. For large samples we can use the central limit theorem. The central limit theorem for $\bar X$ states that as $n \to \infty$, we have

$$\sqrt{n}\,\frac{\bar X - \mu}{\sigma} \overset{d}{\Longrightarrow} Z \sim N(0,1),$$

i.e. we have

$$P\left(\sqrt{n}\,\frac{\bar X - \mu}{\sigma} \le x\right) \to P(Z \le x).$$

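We use the notation $\bar X \approx N(\mu, \sigma^2/n)$. In many cases this approximation works sufficiently well.

The sample variances are given by

$$S^2 = \overline{(X-\mu)^2} = \frac{1}{n}\sum_{i=1}^{n}(X_i-\mu)^2,$$

$$s^2 = \frac{n}{n-1}\,\overline{(X-\bar X)^2} = \frac{n}{n-1}\left(\overline{X^2} - \bar X^2\right).$$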

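It is well known that $E(S^2) = E(s^2) = \sigma^2$. For the variance, we find that

$$\operatorname{Var}(S^2) = \frac{1}{n}\operatorname{Var}((X-\mu)^2).$$

Calculating the variance of $s^2$ is, in general, much more complicated. For a sample from the normal distribution $N(\mu, \sigma^2)$ there are no problems. In this case we have

$$\frac{nS^2}{\sigma^2} \sim \chi^2_n, \qquad \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1},$$

and for large $n$ we have

$$S^2 \approx N\left(\sigma^2, \frac{2\sigma^4}{n}\right), \qquad s^2 \approx N\left(\sigma^2, \frac{2\sigma^4}{n-1}\right).$$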

In the case of a sample from another distribution, these approximations are usually not valid. In Section 2 of this paper, we provide a central limit theorem for $S^2$ and for $s^2$.

In Section 3 we state and prove a multivariate central limit theorem and then apply a transfer theorem to obtain central limit theorems for the sample coefficient of variation $CV$, the sample correlation coefficient $r$ and the ratio of sample variances.


2 Central Limit Theorem for $S^2$ and $s^2$

2.1 Central limit theorem for $S^2$

In view of the definition of $S^2$, using the ordinary central limit theorem, we immediately obtain the following result.

Theorem 1 If $X_1, X_2, \ldots, X_n$ is a sample from $X$ where $E(X^4) < \infty$, then

$$P(\sqrt{n}(S^2 - \sigma^2) \le x) \to P(U \le x),$$

where $U \sim N(0, \sigma_U^2)$ with $\sigma_U^2 = \operatorname{Var}((X-\mu)^2)$.

Proof. Apply the central limit theorem to $Y_i = (X_i - \mu)^2$.

Remark. Note that $\sigma_U^2$ is related to the kurtosis $\gamma(X)$ of $X$. Recall that the kurtosis is defined as

$$\gamma(X) = \frac{E((X-\mu)^4)}{\sigma^4} - 3 = \frac{\operatorname{Var}((X-\mu)^2)}{\sigma^4} - 2.$$

We find that $\sigma_U^2 = (\gamma(X) + 2)\sigma^4$.

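2.2 Central limit theorem for $s^2$

To prove a central limit theorem for $s^2$, we rewrite $s^2$ as follows. We have

$$(n-1)s^2 = \sum_{i=1}^{n}\left(X_i - \mu - (\bar X - \mu)\right)^2 = \sum_{i=1}^{n}(X_i-\mu)^2 + n(\bar X-\mu)^2 - 2(\bar X-\mu)\sum_{i=1}^{n}(X_i-\mu) = nS^2 - n(\bar X-\mu)^2.$$

It follows that

$$\sqrt{n}(s^2-\sigma^2) = \frac{n}{n-1}\sqrt{n}(S^2-\sigma^2) + \frac{\sqrt{n}}{n-1}\sigma^2 - \frac{n}{n-1}\sqrt{n}(\bar X-\mu)^2. \qquad (1)$$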

We prove the following result.

Theorem 2 If $X_1, X_2, \ldots, X_n$ is a sample from $X$ where $E(X^4) < \infty$, then

$$P(\sqrt{n}(s^2 - \sigma^2) \le x) \to P(U \le x),$$

where $U \sim N(0, \sigma_U^2)$ with $\sigma_U^2 = \operatorname{Var}((X-\mu)^2)$.


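Proof. Consider (1) and write $\sqrt{n}(s^2-\sigma^2) = A_n + B_n$, where

$$A_n = \frac{n}{n-1}\sqrt{n}(S^2-\sigma^2), \qquad B_n = \frac{\sqrt{n}}{n-1}\sigma^2 - \frac{n}{n-1}\sqrt{n}(\bar X-\mu)^2.$$

Using Theorem 1, we have

$$P(A_n \le x) \to P(U \le x).$$

For the second term we have

$$B_n = \frac{\sqrt{n}}{n-1}\sigma^2 - \frac{n}{n-1}\,\sqrt{n}(\bar X-\mu)\,(\bar X-\mu).$$

Using the central limit theorem we have

$$P(\sqrt{n}(\bar X-\mu)/\sigma \le x) \to P(Z \le x),$$

and the law of large numbers gives $\bar X - \mu \overset{P}{\longrightarrow} 0$. It follows that $B_n \overset{P}{\longrightarrow} 0$. The result now follows.

Remarks.

1) In the previous result we used the following property: if $X_n \overset{d}{\Longrightarrow} X$ and $Y_n \overset{P}{\longrightarrow} 0$, then $X_n + Y_n \overset{d}{\Longrightarrow} X$.

2) In Section 4.1 we provide another proof of this result.

3) We find confidence intervals for $\sigma^2$ in the usual way. We have

$$\sigma^2 = s^2 \pm z_{\alpha/2}\frac{\sigma_U}{\sqrt{n}},$$

and using $\sigma_U^2 = (\gamma(X)+2)\sigma^4$ we find that

$$\sigma^2 = \frac{s^2}{1 \pm z_{\alpha/2}\sqrt{(\gamma(X)+2)/n}}.$$

In applications, we replace $\gamma(X)$ by the sample kurtosis $\hat\gamma$.

2.3 Special cases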

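1) If $X \sim N(\mu, \sigma^2)$, we have $E((X-\mu)^3) = 0$ and $E((X-\mu)^4) = 3\sigma^4$, and it follows that $\sigma_U^2 = 3\sigma^4 - \sigma^4 = 2\sigma^4$. We recover the known result.

2) If $X \sim BERN(p)$, then $\mu = p$ and, using $q = 1-p$, we have

$$E((X-\mu)^4) = p^4q + q^4p = pq(1-3pq).$$

Since $\sigma^2 = pq$, we find that $\sigma_U^2 = pq(1-3pq) - (pq)^2 = pq(1-4pq)$. Note that for $p = 1/2$ we have $\sigma_U^2 = 0$.

3) If $X \sim UNIF(-a, a)$, we have $\mu = 0$, $\sigma^2 = a^2/3$ and $E(X^4) = a^4/5$, so that $\sigma_U^2 = E(X^4) - \sigma^4 = a^4/5 - a^4/9 = 4a^4/45$. We find that $\sqrt{n}(s^2 - a^2/3) \overset{d}{\Longrightarrow} U \sim N(0, 4a^4/45)$.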
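The special cases can be checked with exact arithmetic. The sketch below is illustrative only: the helper names (`sigma_u2_discrete`, `sigma_u2_bernoulli`, `sigma_u2_uniform`) are assumptions introduced here, not part of the paper. It evaluates $\sigma_U^2 = E((X-\mu)^4) - \sigma^4$ directly for a Bernoulli variable, compares it with the closed form $pq(1-4pq)$, and evaluates $E(X^4) - (E(X^2))^2$ for the uniform distribution from the moments quoted above.

```python
from fractions import Fraction

def sigma_u2_discrete(values, probs):
    """Var((X - mu)^2) for a discrete distribution given by values and probabilities."""
    mu = sum(v * p for v, p in zip(values, probs))
    m2 = sum((v - mu) ** 2 * p for v, p in zip(values, probs))  # sigma^2
    m4 = sum((v - mu) ** 4 * p for v, p in zip(values, probs))  # fourth central moment
    return m4 - m2 ** 2

def sigma_u2_bernoulli(p):
    """Closed form pq(1 - 4pq) for X ~ BERN(p)."""
    q = 1 - p
    return p * q * (1 - 4 * p * q)

def sigma_u2_uniform(a):
    """E(X^4) - (E(X^2))^2 = a^4/5 - a^4/9 for X ~ UNIF(-a, a)."""
    return Fraction(a) ** 4 * (Fraction(1, 5) - Fraction(1, 9))

p = Fraction(3, 10)                       # hypothetical success probability
bern_direct = sigma_u2_discrete([0, 1], [1 - p, p])
bern_closed = sigma_u2_bernoulli(p)
unif = sigma_u2_uniform(3)                # hypothetical half-width a = 3
print(bern_direct, bern_closed, unif)
```

Using `Fraction` keeps the comparison exact, so the direct computation and the closed form can be compared with `==` rather than a tolerance.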


3 Multivariate central limit theorem

3.1 The central limit theorem

We prove the following theorem.

Theorem 3 Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ denote a sample from a bivariate distribution $(X, Y) \sim A(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$. Let $\bar X = n^{-1}\sum_{i=1}^{n} X_i$ and $\bar Y = n^{-1}\sum_{i=1}^{n} Y_i$. Then we have

$$P(\sqrt{n}(\bar X - \mu_1) \le x, \sqrt{n}(\bar Y - \mu_2) \le y) \to P(U \le x, V \le y),$$

where $(U, V)$ has a bivariate normal distribution $(U, V) \sim BN(0, 0, \sigma_1^2, \sigma_2^2, \rho)$.

Proof. For arbitrary $a$ and $b$ where $(a, b) \ne (0, 0)$, we consider $aX + bY$. Clearly we have

$$E(aX + bY) = a\mu_1 + b\mu_2,$$

$$\operatorname{Var}(aX + bY) = a^2\sigma_1^2 + b^2\sigma_2^2 + 2ab\rho\sigma_1\sigma_2.$$

Using the ordinary central limit theorem, we obtain that

$$\sqrt{n}(a\bar X + b\bar Y - a\mu_1 - b\mu_2) \overset{d}{\Longrightarrow} W,$$

where $W \sim N(0, a^2\sigma_1^2 + b^2\sigma_2^2 + 2ab\rho\sigma_1\sigma_2)$.

Clearly this limit can be identified as follows: we have

$$W \overset{d}{=} aU + bV,$$

where $(U, V)$ has a bivariate normal distribution $(U, V) \sim BN(0, 0, \sigma_1^2, \sigma_2^2, \rho)$.

The result now follows from the Cramér-Wold device.

Remark. The Cramér-Wold device states that for random vectors $(X_n, Y_n)$ we have

$$(X_n, Y_n) \overset{d}{\Longrightarrow} (U, V)$$

if and only if

$$\forall (a, b) \ne (0, 0): \quad aX_n + bY_n \overset{d}{\Longrightarrow} aU + bV.$$

This device is easy to prove by using generating functions or characteristic functions.

For random vectors with 3 or more components, we have a similar result with a similar proof.


Theorem 4 Let $(X_{1,j}, \ldots, X_{k,j})$, $j = 1, 2, \ldots, n$, denote a sample from a multivariate distribution $(X_1, X_2, \ldots, X_k) \sim A$ with means $E(X_i) = \mu_i$ and variance-covariance matrix $\Sigma = (\operatorname{Cov}(X_i, X_j))_{i,j=1}^{k}$. For each $i = 1, 2, \ldots, k$, let $\bar X_i = n^{-1}\sum_{j=1}^{n} X_{i,j}$. Then we have

$$P(\sqrt{n}(\bar X_1 - \mu_1) \le x_1, \sqrt{n}(\bar X_2 - \mu_2) \le x_2, \ldots, \sqrt{n}(\bar X_k - \mu_k) \le x_k) \to P(U_1 \le x_1, U_2 \le x_2, \ldots, U_k \le x_k),$$

where $(U_1, U_2, \ldots, U_k)$ has a multivariate normal distribution with $E(U_i) = 0$ and $\operatorname{Cov}(U_i, U_j) = \Sigma_{i,j}$.

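The following corollary will be useful.

Corollary 5 Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ denote a sample from a bivariate distribution $(X, Y) \sim A(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$ and suppose that $E(X^4 + Y^4) < \infty$. Consider the vectors

$$\vec{A} = (\bar X, \bar Y, \overline{X^2}, \overline{Y^2}, \overline{XY}), \qquad \vec{\mu} = (\mu_1, \mu_2, E(X^2), E(Y^2), E(XY)).$$

Then

$$P(\sqrt{n}(\vec{A} - \vec{\mu}) \le \vec{x}) \to P(\vec{V} \le \vec{x}),$$

where $\vec{V}$ has a multivariate normal distribution with means 0 and with (symmetric) variance-covariance matrix given by

$$\Sigma = \begin{pmatrix}
\sigma_1^2 & \operatorname{Cov}(X,Y) & \operatorname{Cov}(X,X^2) & \operatorname{Cov}(X,Y^2) & \operatorname{Cov}(X,XY) \\
 & \sigma_2^2 & \operatorname{Cov}(Y,X^2) & \operatorname{Cov}(Y,Y^2) & \operatorname{Cov}(Y,XY) \\
 & & \operatorname{Var}(X^2) & \operatorname{Cov}(X^2,Y^2) & \operatorname{Cov}(X^2,XY) \\
 & & & \operatorname{Var}(Y^2) & \operatorname{Cov}(Y^2,XY) \\
 & & & & \operatorname{Var}(XY)
\end{pmatrix} \qquad (2)$$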

3.2 Functions

Using the notations of Theorem 3, let us consider a new random variable $f(\bar X, \bar Y)$, where the function $f(x, y)$ is sufficiently smooth. Writing the first terms of a Taylor expansion, we have

$$f(x, y) = f(a, b) + \frac{\partial f}{\partial x}(a, b)(x-a) + \frac{\partial f}{\partial y}(a, b)(y-b) + \frac{1}{2}R,$$

where the remainder term $R$ is of the form

$$R = (x-a,\; y-b)\begin{pmatrix} f_{x,x}(\xi, \eta) & f_{x,y}(\xi, \eta) \\ f_{x,y}(\xi, \eta) & f_{y,y}(\xi, \eta) \end{pmatrix}\begin{pmatrix} x-a \\ y-b \end{pmatrix}.$$

Here $f_{x,x}$, $f_{x,y}$ and $f_{y,y}$ denote the second partial derivatives of $f$, and $\xi$ (resp. $\eta$) is between $x$ and $a$ (resp. $y$ and $b$). If these partial derivatives are bounded around $(a, b)$, then for some constant $c > 0$ we have

$$|R| \le c\left((x-a)^2 + (y-b)^2 + |(x-a)(y-b)|\right).$$


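Furthermore, if $|x-a| \le \epsilon$ and $|y-b| \le \epsilon$, we find that

$$\left|f(x, y) - f(a, b) - \frac{\partial f}{\partial x}(a, b)(x-a) - \frac{\partial f}{\partial y}(a, b)(y-b)\right| \le 3c\epsilon^2,$$

and hence also that

$$-3c\epsilon^2 + \frac{\partial f}{\partial x}(a, b)(x-a) + \frac{\partial f}{\partial y}(a, b)(y-b) \le f(x, y) - f(a, b) \le 3c\epsilon^2 + \frac{\partial f}{\partial x}(a, b)(x-a) + \frac{\partial f}{\partial y}(a, b)(y-b).$$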

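Now replace $(x, y)$ and $(a, b)$ by $(\bar X, \bar Y)$ and $(\mu_1, \mu_2)$ and define the following quantities:

$$\vec{\lambda} = (\lambda_1, \lambda_2) = \left(\frac{\partial f}{\partial x}(\mu_1, \mu_2), \frac{\partial f}{\partial y}(\mu_1, \mu_2)\right),$$

$$A_n = \lambda_1\sqrt{n}(\bar X - \mu_1) + \lambda_2\sqrt{n}(\bar Y - \mu_2),$$

$$K_n = \sqrt{n}\left(f(\bar X, \bar Y) - f(\mu_1, \mu_2)\right).$$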

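Note that Theorem 3 implies that $P(A_n \le x) \to P(W \le x) = P(\lambda_1 U + \lambda_2 V \le x)$.

If $|\bar X - \mu_1| \le \epsilon$ and $|\bar Y - \mu_2| \le \epsilon$, the previous analysis shows that

$$-3c\sqrt{n}\epsilon^2 + A_n \le K_n \le 3c\sqrt{n}\epsilon^2 + A_n.$$

Now consider $P(K_n \le x)$ and write $P(K_n \le x) = I + II$, where

$$I = P(K_n \le x, E), \qquad II = P(K_n \le x, E^c),$$

where $E$ is the event $E = \{|\bar X - \mu_1| \le \epsilon \text{ and } |\bar Y - \mu_2| \le \epsilon\}$, and $E^c$ its complement.

We have $II \le P(E^c) \le P(|\bar X - \mu_1| > \epsilon) + P(|\bar Y - \mu_2| > \epsilon)$. Using the inequality of Chebyshev, we obtain that

$$II \le \frac{\sigma_1^2 + \sigma_2^2}{n\epsilon^2}.$$

If we choose $\epsilon$ such that $n\epsilon^2 \to \infty$, we obtain that $II \to 0$.

For $I$, we have

$$I \le P(-3\sqrt{n}c\epsilon^2 + A_n \le x, E) \le P(A_n \le x + 3\sqrt{n}c\epsilon^2).$$

If we choose $\epsilon$ such that $\sqrt{n}\epsilon^2 \to 0$, we find, after taking limits for $n \to \infty$, that $I$ is bounded from above by $P(W \le x)$. A good choice of $\epsilon$ is for example $\epsilon = n^{-1/3}$, for which $n\epsilon^2 = n^{1/3} \to \infty$ and $\sqrt{n}\epsilon^2 = n^{-1/6} \to 0$. On the other hand, we have

$$I \ge P(3\sqrt{n}c\epsilon^2 + A_n \le x, E) = P(3\sqrt{n}c\epsilon^2 + A_n \le x) - P(3\sqrt{n}c\epsilon^2 + A_n \le x, E^c).$$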


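As before, we have $P(3\sqrt{n}c\epsilon^2 + A_n \le x) \to P(W \le x)$. For the other term, we have

$$P(3\sqrt{n}c\epsilon^2 + A_n \le x, E^c) \le P(E^c) \to 0.$$

We obtain that as $n \to \infty$, $I$ is bounded from below by $P(W \le x)$. We conclude that

$$P(K_n \le x) \to P(W \le x).$$

Clearly we have $E(W) = 0$ and for the variance we find that

$$\sigma_W^2 = \operatorname{Var}(W) = (\lambda_1, \lambda_2)\,\Sigma\begin{pmatrix}\lambda_1 \\ \lambda_2\end{pmatrix} = \vec{\lambda}\,\Sigma\,\vec{\lambda}^T,$$

where

$$\Sigma = \begin{pmatrix}\operatorname{Var}(X) & \operatorname{Cov}(X, Y) \\ \operatorname{Cov}(X, Y) & \operatorname{Var}(Y)\end{pmatrix}.$$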

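This approach can also be used for random vectors with 3 or more components. The general result is the following.

Theorem 6 Using the notations of Theorem 4, with $\vec{A} = (\bar X_1, \ldots, \bar X_k)$ and $\vec{\mu} = (\mu_1, \ldots, \mu_k)$, if $f$ is sufficiently smooth, we have

$$P(\sqrt{n}(f(\vec{A}) - f(\vec{\mu})) \le x) \to P(W \le x),$$

where $W \overset{d}{=} \sum_{i=1}^{k}\lambda_i U_i \sim N(0, \sigma_W^2)$ with $\lambda_i = (\partial f/\partial x_i)(\vec{\mu})$ and $\sigma_W^2 = \vec{\lambda}\,\Sigma\,\vec{\lambda}^T$.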

Remark. We can also consider vectors of functions. If $(f_1(\vec{x}), f_2(\vec{x}), \ldots, f_m(\vec{x}))$ is such a vector, it suffices to consider linear combinations of the form

$$h(\vec{x}) = u_1 f_1(\vec{x}) + u_2 f_2(\vec{x}) + \ldots + u_m f_m(\vec{x}),$$

where $(u_1, u_2, \ldots, u_m) \ne (0, 0, \ldots, 0)$. Now Theorem 6 and the Cramér-Wold device can be used.
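The transfer result can be sketched numerically: approximate the gradient of $f$ at the vector of means by central differences and form the quadratic form with the covariance matrix. The function names `gradient` and `delta_method_variance` and the choice of example are assumptions made for illustration, not part of the paper. The example takes $X \sim Exp(1)$ with $\vec{A} = (\bar X, \overline{X^2})$ and $f(x, y) = y - x^2$, using the raw moments $E(X^k) = k!$.

```python
def gradient(f, point, h=1e-5):
    """Central-difference approximation of the gradient of f at `point`."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        up[i] += h
        down = list(point)
        down[i] -= h
        grad.append((f(*up) - f(*down)) / (2 * h))
    return grad

def delta_method_variance(f, mu, cov, h=1e-5):
    """Quadratic form lambda * Sigma * lambda^T, lambda being the gradient of f at mu."""
    lam = gradient(f, mu, h)
    k = len(mu)
    return sum(lam[i] * cov[i][j] * lam[j] for i in range(k) for j in range(k))

# Assumed example: X ~ Exp(1), so E(X)=1, E(X^2)=2, E(X^3)=6, E(X^4)=24.
mu = [1.0, 2.0]                 # (E(X), E(X^2))
cov = [[1.0, 4.0],              # Var(X) = 1, Cov(X, X^2) = 6 - 1*2 = 4
       [4.0, 20.0]]             # Var(X^2) = 24 - 2^2 = 20
var_w = delta_method_variance(lambda x, y: y - x * x, mu, cov)
print(round(var_w, 6))
```

For this example the gradient is $(-2, 1)$, so the quadratic form is $4 - 16 + 20 = 8$, which agrees with $\operatorname{Var}((X-1)^2) = 9 - 1$ for the unit exponential distribution.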

4 Variance and coefficient of variation

4.1 The sample variance $s^2$

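Here is another proof of Theorem 2. Consider the vectors $\vec{A} = (\bar X, \overline{X^2})$, $\vec{\mu} = (\mu, E(X^2))$ and the function $f(x, y) = y - x^2$. In this case we find $f(\vec{A}) = (n-1)s^2/n$ and $f(\vec{\mu}) = \sigma^2$. Using $(\lambda_1, \lambda_2) = (-2\mu, 1)$ it follows from Theorem 6 that

$$P\left(\sqrt{n}\left(\frac{n-1}{n}s^2 - \sigma^2\right) \le x\right) \to P(W \le x),$$

where $W \sim N(0, \sigma_W^2)$ with

$$\sigma_W^2 = (-2\mu, 1)\begin{pmatrix}\operatorname{Var}(X) & \operatorname{Cov}(X, X^2) \\ \operatorname{Cov}(X, X^2) & \operatorname{Var}(X^2)\end{pmatrix}\begin{pmatrix}-2\mu \\ 1\end{pmatrix}$$

$$= 4\mu^2\operatorname{Var}(X) - 4\mu\operatorname{Cov}(X, X^2) + \operatorname{Var}(X^2) = \operatorname{Var}(X^2 - 2\mu X) = \operatorname{Var}((X-\mu)^2).$$

We can easily replace $(n-1)s^2/n$ by $s^2$ to recover Theorem 2.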

4.2 The sample coefficient of variation

In probability theory and statistics, the coefficient of variation ($CV$) is a normalized measure of dispersion of a probability distribution. It is defined as the ratio of the standard deviation to the mean: $CV = \sigma/\mu$. This is only defined for non-zero mean $\mu$, and is most useful for variables that are always positive. The sample coefficient of variation is given by

$$SCV = \frac{s}{\bar X}.$$

If $\mu \ne 0$, we have $\bar X \overset{a.s.}{\longrightarrow} \mu \ne 0$ and $SCV$ is well-defined a.s. for large $n$. Now we consider the vectors $\vec{A} = (\bar X, \overline{X^2})$, $\vec{\mu} = (\mu, E(X^2))$ and the function

$$f(x, y) = \frac{\sqrt{y - x^2}}{x}.$$

It is easy to see that $f(\vec{\mu}) = CV$ and that

$$f(\vec{A}) = \sqrt{\frac{n-1}{n}}\,SCV.$$

Straightforward calculations show that

$$(\lambda_1, \lambda_2) = \left(-\frac{E(X^2)}{\sigma\mu^2}, \frac{1}{2\mu\sigma}\right).$$

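Using Theorem 6, we find that

$$P\left(\sqrt{n}\left(\sqrt{\frac{n-1}{n}}\,SCV - CV\right) \le x\right) \to P(W \le x),$$

where $W \sim N(0, \sigma_W^2)$ with

$$\sigma_W^2 = \vec{\lambda}\begin{pmatrix}\operatorname{Var}(X) & \operatorname{Cov}(X, X^2) \\ \operatorname{Cov}(X, X^2) & \operatorname{Var}(X^2)\end{pmatrix}\vec{\lambda}^T = \frac{E^2(X^2)}{\mu^4} - \frac{E(X^2)}{\mu^3\sigma^2}\operatorname{Cov}(X, X^2) + \frac{1}{4\mu^2\sigma^2}\operatorname{Var}(X^2).$$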


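To simplify, note that

$$E((X-\mu)^3) = \operatorname{Cov}(X, X^2) - 2\mu\sigma^2,$$

$$\operatorname{Var}((X-\mu)^2) = \operatorname{Var}(X^2) + 4\mu^2\sigma^2 - 4\mu\operatorname{Cov}(X, X^2).$$

Now we find

$$\sigma_W^2 = \frac{E^2(X^2)}{\mu^4} - \left(\frac{E(X^2)}{\mu^3\sigma^2} - \frac{4\mu}{4\mu^2\sigma^2}\right)\left(E((X-\mu)^3) + 2\mu\sigma^2\right) + \frac{1}{4\mu^2\sigma^2}\left(\operatorname{Var}((X-\mu)^2) - 4\mu^2\sigma^2\right)$$

$$= \frac{(\mu^2+\sigma^2)^2}{\mu^4} - \frac{1}{\mu^3}E((X-\mu)^3) - \frac{2\sigma^2}{\mu^2} + \frac{1}{4\mu^2\sigma^2}\operatorname{Var}((X-\mu)^2) - 1$$

$$= \frac{\sigma^4}{\mu^4} - \frac{1}{\mu^3}E((X-\mu)^3) + \frac{1}{4\mu^2\sigma^2}\operatorname{Var}((X-\mu)^2).$$

In terms of the kurtosis $\gamma(X)$ and the skewness $\gamma_1(X) = \sigma^{-3}E((X-\mu)^3)$, we find that

$$\sigma_W^2 = \frac{\sigma^4}{\mu^4} - \frac{\sigma^3}{\mu^3}\gamma_1(X) + \frac{\sigma^2}{4\mu^2}\gamma(X) + \frac{\sigma^2}{2\mu^2}.$$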

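Remarks.

1) In the case of a normal distribution, we find that

$$\sigma_W^2 = \frac{\sigma^4}{\mu^4} + \frac{\sigma^2}{2\mu^2} = CV^4 + \frac{1}{2}CV^2.$$

In other cases, we see that $\sigma_W^2$ is influenced by $\gamma(X)$ and $\gamma_1(X)$.

2) In the case of an exponential distribution with parameter $\lambda$, we have

$$\mu = \sigma = 1/\lambda, \quad \gamma_1 = 2, \quad \gamma = 6,$$

and then $CV = \sigma_W^2 = 1$.

3) For the Poisson($\lambda$)-distribution, we have

$$\mu = \sigma^2 = \lambda, \quad \gamma_1 = \lambda^{-1/2}, \quad \gamma = \lambda^{-1},$$

and then $CV = \lambda^{-1/2}$ and

$$\sigma_W^2 = \frac{1}{2\lambda} + \frac{1}{4\lambda^2}.$$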
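The three remarks can be reproduced from the closed form for the asymptotic variance of the sample coefficient of variation. The sketch below is illustrative: the function name `cv_asymptotic_variance`, its argument names, and the numerical parameter values are assumptions chosen here, while the formula is the one stated in terms of skewness and kurtosis in this section.

```python
def cv_asymptotic_variance(mu, sigma, skew, excess_kurt):
    """Asymptotic variance of sqrt(n) * (SCV - CV) from the closed form of Section 4.2.

    skew is E((X - mu)^3) / sigma^3 and excess_kurt is E((X - mu)^4) / sigma^4 - 3.
    """
    c = sigma / mu  # population coefficient of variation
    return c**4 - c**3 * skew + c**2 * excess_kurt / 4 + c**2 / 2

# Exponential distribution with (hypothetical) rate 2: mu = sigma = 1/2, skew 2, kurtosis 6.
rate = 2.0
expo = cv_asymptotic_variance(1 / rate, 1 / rate, 2.0, 6.0)

# Poisson distribution with (hypothetical) parameter 4.
lam = 4.0
pois = cv_asymptotic_variance(lam, lam**0.5, lam**-0.5, 1 / lam)

# Normal distribution with (hypothetical) mean 10 and standard deviation 2.
normal = cv_asymptotic_variance(10.0, 2.0, 0.0, 0.0)

print(expo, pois, normal)
```

The exponential case returns 1 for any rate, since only the ratio $\sigma/\mu$ enters the formula.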

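4.3 The case $\mu = 0$

If $\mu = 0$, then $CV$ is not defined, but we can always calculate

$$\frac{1}{SCV} = \frac{\bar X}{s}.$$

If $\sigma^2 < \infty$, the central limit theorem together with $s^2 \overset{P}{\longrightarrow} \sigma^2$ shows that

$$\frac{\sqrt{n}}{SCV} = \frac{\sqrt{n}\,\bar X}{s} \overset{d}{\Longrightarrow} Z,$$

where $Z \sim N(0, 1)$. Now note that for $x > 0$, we have

$$P\left(\frac{\sqrt{n}}{SCV} > x\right) = P\left(0 < \frac{SCV}{\sqrt{n}} < \frac{1}{x}\right),$$

$$P\left(\frac{\sqrt{n}}{SCV} < -x\right) = P\left(-\frac{1}{x} < \frac{SCV}{\sqrt{n}} < 0\right).$$

As a consequence, we have

$$\frac{SCV}{\sqrt{n}} \overset{d}{\Longrightarrow} U = \frac{1}{Z}.$$

The reader can check that $U$ has a (symmetric) density given by

$$f_U(u) = \frac{1}{u^2}f_Z\left(\frac{1}{u}\right) = \frac{1}{u^2\sqrt{2\pi}}\exp\left(-\frac{1}{2u^2}\right).$$

The density is symmetric around 0 and behaves like $1/(u^2\sqrt{2\pi})$ for large $|u|$. It follows that $E|U| = \infty$, so that $E(U) = 0$ holds only in the principal-value sense, and $\sigma_U^2 = \infty$.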

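4.4 A t-statistic

In the place of $SCV$ we can study $T = 1/SCV = \bar X/s$. This is a quantity related to the t-statistic $t = (\bar X - \mu)/s$. As in Section 4.2, we obtain that

$$\sqrt{n}\left(T - \frac{\mu}{\sigma}\right) \overset{d}{\Longrightarrow} W,$$

where $W \sim N(0, \sigma_U^2)$ and, with $\sigma_W^2$ as in Section 4.2,

$$\sigma_U^2 = \frac{\mu^4}{\sigma^4}\sigma_W^2 = 1 - \frac{\mu}{\sigma}\gamma_1(X) + \frac{\mu^2}{4\sigma^2}\gamma(X) + \frac{\mu^2}{2\sigma^2}.$$

Note that for the t-statistic, we have the simpler result that

$$\sqrt{n}\,\frac{\bar X - \mu}{s} \overset{d}{\Longrightarrow} Z \sim N(0, 1).$$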

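4.5 The sample dispersion

Another related statistic is the dispersion $D = \sigma^2/\mu$. This measure is well defined for $\mu \ne 0$ and can be used, for example, to compare distributions with different means. The corresponding sample dispersion is given by

$$SD = \frac{s^2}{\bar X}.$$

To study $SD$, we consider $\vec{A} = (\bar X, \overline{X^2})$, $\vec{\mu} = (\mu, E(X^2))$ and the function $f(x, y) = (y - x^2)/x$. Clearly we have

$$f(\vec{A}) = \frac{n-1}{n}SD, \qquad f(\vec{\mu}) = D, \qquad \vec{\lambda} = \left(-\frac{\sigma^2}{\mu^2} - 2, \frac{1}{\mu}\right).$$

It readily follows that

$$\sqrt{n}(SD - D) \overset{d}{\Longrightarrow} W,$$

where $W \sim N(0, \sigma_W^2)$ with

$$\sigma_W^2 = \operatorname{Var}(\lambda_1 X + \lambda_2 X^2) = \sigma^2\left(\frac{\sigma^2}{\mu^2}(\gamma(X) + 2) + \frac{\sigma^4}{\mu^4} - 2\frac{\sigma^3}{\mu^3}\gamma_1(X)\right).$$

In the case of a normal distribution, we find that

$$\sigma_W^2 = \sigma^2\frac{\sigma^2}{\mu^2}\left(2 + \frac{\sigma^2}{\mu^2}\right).$$

If $\mu = 0$, we obtain first that

$$\sqrt{n}\,\frac{1}{SD} = \frac{\sqrt{n}\,\bar X}{s^2} \overset{d}{\Longrightarrow} \frac{1}{\sigma}Z,$$

where $Z \sim N(0, 1)$ (because $\sqrt{n}\,\bar X \overset{d}{\Longrightarrow} \sigma Z$ and $s^2 \overset{P}{\longrightarrow} \sigma^2$), and then it follows as in Section 4.3 that

$$\frac{1}{\sqrt{n}}SD \overset{d}{\Longrightarrow} \sigma\frac{1}{Z}.$$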
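As a worked check of the variance formula for the sample dispersion, consider a Poisson distribution. This example is an addition for illustration: it assumes $X \sim Poisson(\lambda)$, for which $\mu = \sigma^2 = \lambda$, the skewness is $\gamma_1(X) = \lambda^{-1/2}$, the kurtosis is $\gamma(X) = \lambda^{-1}$, and $D = 1$.

```latex
% Assumed example: X ~ Poisson(\lambda), so \sigma^2/\mu^2 = 1/\lambda,
% \sigma^4/\mu^4 = 1/\lambda^2 and \sigma^3/\mu^3 = \lambda^{-3/2}.
\begin{align*}
\sigma_W^2
  &= \sigma^2\left(\frac{\sigma^2}{\mu^2}\bigl(\gamma(X)+2\bigr)
     + \frac{\sigma^4}{\mu^4} - 2\,\frac{\sigma^3}{\mu^3}\,\gamma_1(X)\right) \\
  &= \lambda\left(\frac{1}{\lambda}\Bigl(\frac{1}{\lambda}+2\Bigr)
     + \frac{1}{\lambda^2} - \frac{2}{\lambda^{3/2}}\cdot\frac{1}{\lambda^{1/2}}\right) \\
  &= \lambda\cdot\frac{2}{\lambda} = 2 .
\end{align*}
```

Under this assumption the limit does not depend on $\lambda$: $\sqrt{n}(SD - 1)$ is approximately $N(0, 2)$ for large $n$.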

5 Sample covariance and correlation

5.1 The sample covariance

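Consider the vectors $\vec{A} = (\bar X, \bar Y, \overline{XY})$, $\vec{\mu} = (\mu_1, \mu_2, E(XY))$ and let $f(x, y, z) = z - xy$. In this case we find

$$f(\vec{A}) = \overline{XY} - \bar X\,\bar Y, \qquad f(\vec{\mu}) = \operatorname{Cov}(X, Y),$$

and, since $\partial f/\partial x = -y$ and $\partial f/\partial y = -x$, $\vec{\lambda} = (-\mu_2, -\mu_1, 1)$.

It follows that

$$P(\sqrt{n}(f(\vec{A}) - \operatorname{Cov}(X, Y)) \le x) \to P(W \le x),$$

where $W \sim N(0, \sigma_W^2)$ and

$$\sigma_W^2 = \vec{\lambda}\begin{pmatrix}\operatorname{Var}(X) & \operatorname{Cov}(X, Y) & \operatorname{Cov}(X, XY) \\ \operatorname{Cov}(X, Y) & \operatorname{Var}(Y) & \operatorname{Cov}(Y, XY) \\ \operatorname{Cov}(X, XY) & \operatorname{Cov}(Y, XY) & \operatorname{Var}(XY)\end{pmatrix}\vec{\lambda}^T.$$

Assuming first for simplicity that $\mu_1 = \mu_2 = 0$, we find $\vec{\lambda} = (0, 0, 1)$ and $\sigma_W^2 = \operatorname{Var}(XY)$. In the general case we find that

$$\sigma_W^2 = \operatorname{Var}(XY - \mu_2 X - \mu_1 Y) = \operatorname{Var}((X-\mu_1)(Y-\mu_2)).$$

Remark. If $X$ and $Y$ are independent, we have

$$\sigma_W^2 = E((X-\mu_1)^2(Y-\mu_2)^2) = \sigma_1^2\sigma_2^2.$$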

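5.2 The sample correlation coefficient

For a sample $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ from $(X, Y) \sim A(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$, the sample correlation coefficient is defined as

$$r = \frac{n}{n-1}\,\frac{\overline{XY} - \bar X\,\bar Y}{s_1 s_2}. \qquad (3)$$

For $n \to \infty$, a rough estimate gives

$$r \approx \frac{E(XY) - E(X)E(Y)}{\sigma_1\sigma_2} = \rho,$$

so that $r$ is an approximation of $\rho$. As in Corollary 5, we consider the vectors

$$\vec{A} = (\bar X, \bar Y, \overline{X^2}, \overline{Y^2}, \overline{XY}), \qquad \vec{\mu} = (\mu_1, \mu_2, E(X^2), E(Y^2), E(XY)),$$

and the function

$$f(a, b, c, d, e) = \frac{e - ab}{\sqrt{(c-a^2)(d-b^2)}}.$$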

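Now we find $f(\vec{\mu}) = \rho$, $f(\vec{A}) = r$, and the derivatives:

$$\frac{\partial f}{\partial a} = \frac{-b}{\sqrt{(c-a^2)(d-b^2)}} + \frac{(e-ab)\,a}{(c-a^2)\sqrt{(c-a^2)(d-b^2)}},$$

$$\frac{\partial f}{\partial b} = \frac{-a}{\sqrt{(c-a^2)(d-b^2)}} + \frac{(e-ab)\,b}{(d-b^2)\sqrt{(c-a^2)(d-b^2)}},$$

$$\frac{\partial f}{\partial c} = -\frac{1}{2}\,\frac{e-ab}{(c-a^2)\sqrt{(c-a^2)(d-b^2)}},$$

$$\frac{\partial f}{\partial d} = -\frac{1}{2}\,\frac{e-ab}{(d-b^2)\sqrt{(c-a^2)(d-b^2)}},$$

$$\frac{\partial f}{\partial e} = \frac{1}{\sqrt{(c-a^2)(d-b^2)}}.$$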


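It follows that

$$P(\sqrt{n}(r - \rho) \le x) \to P(W \le x), \qquad (4)$$

where $W \sim N(0, \sigma_W^2)$ and $\sigma_W^2 = \vec{\lambda}\,\Sigma\,\vec{\lambda}^T$ with $\Sigma$ given in (2).

In this case we have

$$\lambda_1 = \frac{-E(Y)}{\sigma_1\sigma_2} + \frac{\rho E(X)}{\sigma_1^2}, \qquad \lambda_2 = \frac{-E(X)}{\sigma_1\sigma_2} + \frac{\rho E(Y)}{\sigma_2^2},$$

$$\lambda_3 = -\frac{1}{2}\frac{\rho}{\sigma_1^2}, \qquad \lambda_4 = -\frac{1}{2}\frac{\rho}{\sigma_2^2}, \qquad \lambda_5 = \frac{1}{\sigma_1\sigma_2}.$$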

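In the special case where $\mu_1 = \mu_2 = 0$ and $\sigma_1 = \sigma_2 = 1$, we find $\vec{\lambda} = (0, 0, -\rho/2, -\rho/2, 1)$ and then we have

$$(\vec{\lambda}\Sigma)_1 = -\frac{\rho}{2}\operatorname{Cov}(X, X^2) - \frac{\rho}{2}\operatorname{Cov}(X, Y^2) + \operatorname{Cov}(X, XY),$$

$$(\vec{\lambda}\Sigma)_2 = -\frac{\rho}{2}\operatorname{Cov}(Y, X^2) - \frac{\rho}{2}\operatorname{Cov}(Y, Y^2) + \operatorname{Cov}(Y, XY),$$

$$(\vec{\lambda}\Sigma)_3 = -\frac{\rho}{2}\operatorname{Var}(X^2) - \frac{\rho}{2}\operatorname{Cov}(X^2, Y^2) + \operatorname{Cov}(X^2, XY),$$

$$(\vec{\lambda}\Sigma)_4 = -\frac{\rho}{2}\operatorname{Cov}(X^2, Y^2) - \frac{\rho}{2}\operatorname{Var}(Y^2) + \operatorname{Cov}(Y^2, XY),$$

$$(\vec{\lambda}\Sigma)_5 = -\frac{\rho}{2}\operatorname{Cov}(X^2, XY) - \frac{\rho}{2}\operatorname{Cov}(Y^2, XY) + \operatorname{Var}(XY),$$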

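and then (recall that $\mu_1 = \mu_2 = 0$ and $\sigma_1 = \sigma_2 = 1$) we have:

$$\sigma_W^2 = \vec{\lambda}\,\Sigma\,\vec{\lambda}^T = \frac{\rho^2}{4}\operatorname{Var}(X^2) + \frac{\rho^2}{4}\operatorname{Cov}(X^2, Y^2) - \frac{\rho}{2}\operatorname{Cov}(X^2, XY)$$

$$\quad + \frac{\rho^2}{4}\operatorname{Cov}(X^2, Y^2) + \frac{\rho^2}{4}\operatorname{Var}(Y^2) - \frac{\rho}{2}\operatorname{Cov}(Y^2, XY)$$

$$\quad - \frac{\rho}{2}\operatorname{Cov}(X^2, XY) - \frac{\rho}{2}\operatorname{Cov}(Y^2, XY) + \operatorname{Var}(XY)$$

$$= \frac{\rho^2}{4}\left(\operatorname{Var}(X^2) + 2\operatorname{Cov}(X^2, Y^2) + \operatorname{Var}(Y^2)\right) - \rho\left(\operatorname{Cov}(X^2, XY) + \operatorname{Cov}(Y^2, XY)\right) + \operatorname{Var}(XY)$$

$$= \frac{\rho^2}{4}\left(E(X^4) - 1 + 2E(X^2Y^2) - 2 + E(Y^4) - 1\right) - \rho\left(E(X^3Y) - \rho + E(XY^3) - \rho\right) + E(X^2Y^2) - \rho^2$$

$$= \frac{\rho^2}{4}\left(E(X^4) + 2E(X^2Y^2) + E(Y^4)\right) - \rho\left(E(X^3Y) + E(XY^3)\right) + E(X^2Y^2).$$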

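In the general case, we find that

$$\sigma_W^2 = \frac{\rho^2}{4}\left(E(X^{*4}) + 2E(X^{*2}Y^{*2}) + E(Y^{*4})\right) - \rho\left(E(X^{*3}Y^*) + E(X^*Y^{*3})\right) + E(X^{*2}Y^{*2}), \qquad (5)$$

where

$$X^* = \frac{X - E(X)}{\sigma_1} \quad \text{and} \quad Y^* = \frac{Y - E(Y)}{\sigma_2}.$$

The final result is that (4) holds with $\sigma_W^2$ given in (5).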

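Remarks.

1) We can rewrite $\sigma_W^2$ more compactly as follows. Assuming standardized variables, we have

$$\sigma_W^2 = \frac{\rho^2}{4}\left(\operatorname{Var}(X^2) + 2\operatorname{Cov}(X^2, Y^2) + \operatorname{Var}(Y^2)\right) - \rho\left(\operatorname{Cov}(X^2, XY) + \operatorname{Cov}(Y^2, XY)\right) + \operatorname{Var}(XY)$$

$$= \frac{\rho^2}{4}\operatorname{Var}(X^2 + Y^2) - \rho\operatorname{Cov}(X^2 + Y^2, XY) + \operatorname{Var}(XY) = \operatorname{Var}\left(\frac{\rho}{2}(X^2 + Y^2) - XY\right).$$

2) Note that the asymptotic variance $\sigma_W^2$ only depends on $\rho$ and fourth-order central moments of the underlying distribution.

3) If $\rho = 0$, we find that $\sigma_W^2 = E(X^{*2}Y^{*2})$.

4) If $X$ and $Y$ are independent, we have $\rho = 0$ and $\sigma_W^2 = E(X^{*2}Y^{*2}) = E(X^{*2})E(Y^{*2}) = 1$.

5) If $Y = a + bX$ with $b > 0$, we find $\rho = 1$, $Y^* = X^*$ and $\sigma_W^2 = 0$.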

5.3 Application

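To model dependence, one often uses a model of the following form. Starting from arbitrary independent random variables $A$ and $B$ we construct the vector $(X, Y) = (A, B + \theta A)$. Given a sample $(X_i, Y_i)$ we want to test e.g. the hypothesis $H_0: \theta = 0$ versus $H_a: \theta \ne 0$.

It is clear that

$$\operatorname{Var}(X) = \sigma_X^2 = \sigma_A^2,$$

$$\operatorname{Var}(Y) = \sigma_Y^2 = \sigma_B^2 + \theta^2\sigma_A^2,$$

$$\operatorname{Cov}(X, Y) = \theta\sigma_A^2,$$

$$\rho = \rho(X, Y) = \frac{\theta\sigma_A^2}{\sqrt{\sigma_A^2(\sigma_B^2 + \theta^2\sigma_A^2)}},$$

and we have $\rho = 0$ if and only if $\theta = 0$. Under $H_0$ the variables $X$ and $Y$ are independent, so that $\sigma_W^2 = 1$ and we have

$$\sqrt{n}\,r \overset{d}{\Longrightarrow} Z \sim N(0, 1).$$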


5.4 The bivariate normal case

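For a standard bivariate normal distribution $(X, Y) \sim BN(0, 0, 1, 1, \rho)$, we show how to calculate $\sigma_W^2$, cf. (5).

First note that $(U, V) = (X - \rho Y, Y)$ also has a bivariate normal distribution with

$$\operatorname{Cov}(U, V) = \operatorname{Cov}(X, Y) - \rho\operatorname{Cov}(Y, Y) = 0.$$

It follows that $U$ and $V$ are independent with $V \sim N(0, 1)$ and $U \sim N(0, 1-\rho^2)$.

For general $W \sim N(0, \sigma^2)$, the moment generating function is $\Phi_W(t) = \exp\left(\frac{1}{2}\sigma^2t^2\right)$, and then $E(W) = E(W^3) = 0$ and $E(W^2) = \sigma^2$, $E(W^4) = 3\sigma^4$.

Now observe that $Y = V$ and $X = U + \rho V$. We find

$$E(Y^4) = E(X^4) = 3;$$

$$E(YX^3) = E(Y^3X) = E(V^3U + \rho V^4) = 3\rho;$$

$$E(Y^2X^2) = E(V^2(U^2 + 2\rho UV + \rho^2V^2)) = (1-\rho^2) + 3\rho^2 = 1 + 2\rho^2.$$

It follows that

$$\sigma_W^2 = \frac{\rho^2}{4}(3 + 2 + 4\rho^2 + 3) - \rho(3\rho + 3\rho) + 1 + 2\rho^2 = \rho^4 - 2\rho^2 + 1 = (1-\rho^2)^2.$$

In general, for $(X, Y) \sim BN(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$, we also find that $\sigma_W^2 = (1-\rho^2)^2$, and then

$$r \approx N\left(\rho, \frac{(1-\rho^2)^2}{n}\right).$$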
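The bivariate normal computation can be verified by plugging the moments into formula (5). The following is a minimal sketch; the function names and the moment labels `m40`, `m22`, and so on (standing for $E(X^{*j}Y^{*k})$ of the standardized pair) are notation assumed here for illustration.

```python
def corr_asymptotic_variance(rho, m40, m04, m22, m31, m13):
    """Formula (5): asymptotic variance of sqrt(n) * (r - rho).

    m_jk denotes E(X*^j Y*^k) for the standardized variables X*, Y*.
    """
    return (rho**2 / 4) * (m40 + 2 * m22 + m04) - rho * (m31 + m13) + m22

def bivariate_normal_moments(rho):
    """Fourth-order moments of a standard bivariate normal pair, as derived in Section 5.4."""
    return dict(m40=3.0, m04=3.0, m22=1 + 2 * rho**2, m31=3 * rho, m13=3 * rho)

for rho in (0.0, 0.3, 0.8):
    v = corr_asymptotic_variance(rho, **bivariate_normal_moments(rho))
    print(rho, v, (1 - rho**2) ** 2)
```

For other distributions the same function can be fed with estimated standardized moments, which is how formula (5) would be used in practice when normality is in doubt.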

5.5 The t- and the F-transformation

The approach of the previous section can now be used to construct confidence intervals for $\rho$ and to test hypotheses concerning $\rho$.

5.5.1 Testing $H_0: \rho = 0$ versus $H_a: \rho \ne 0$

In the bivariate normal case it is often necessary to test $H_0: \rho = 0$ versus $H_a: \rho \ne 0$. For this purpose one usually uses the t-transformation:

$$t(x) = \frac{x}{\sqrt{1-x^2}}.$$

Observe that we have

$$t'(x) = \frac{1}{(1-x^2)\sqrt{1-x^2}}.$$

Under $H_0$ we have $t(\rho) = 0$ and $t'(\rho) = 1$, and then the t-transformation shows that

$$\sqrt{n}\,t(r) \overset{d}{\Longrightarrow} Z \sim N(0, 1).$$

Remark. Note that

$$t(r) - r = \frac{r^3}{\sqrt{1-r^2}\left(1 + \sqrt{1-r^2}\right)}.$$

Under $H_0$ it follows that

$$n^{3/2}(t(r) - r) \overset{d}{\Longrightarrow} \frac{1}{2}Z^3.$$

For large samples the t-transformation is therefore not very useful: $t(r)$ and $r$ differ only by a term of order $n^{-3/2}$.

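5.5.2 Testing $H_0: \rho = \rho_0$ versus $H_a: \rho \ne \rho_0$

To test $H_0: \rho = \rho_0$ versus $H_a: \rho \ne \rho_0$, where $\rho_0 \ne 0$, in the bivariate normal case, one usually uses the Fisher F-transformation:

$$F(x) = \frac{1}{2}\ln\left(\frac{1+x}{1-x}\right).$$

In this case we have

$$F'(x) = \frac{1}{1-x^2}.$$

The F-transformation leads to the popular result that

$$\sqrt{n}(F(r) - F(\rho)) \approx F'(\rho)\sqrt{n}(r - \rho).$$

Since $\sqrt{n}(r - \rho)$ has limiting variance $(1-\rho^2)^2$ in the bivariate normal case and $F'(\rho) = 1/(1-\rho^2)$, we obtain

$$\sqrt{n}(F(r) - F(\rho)) \overset{d}{\Longrightarrow} Z \sim N(0, 1).$$

This approach can also be used in the case where $\rho_0 = 0$.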
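The F-transformation translates directly into a test statistic and a confidence interval. The sketch below is an illustration under the bivariate normal assumption; the function names `fisher_ci` and `fisher_test_statistic` are introduced here, and the standard deviation $1/\sqrt{n}$ is the asymptotic value from the limit result of this section (many textbooks use $1/\sqrt{n-3}$ as a small-sample refinement, which is not used here).

```python
import math
from statistics import NormalDist

def fisher_test_statistic(r, rho0, n):
    """sqrt(n) * (F(r) - F(rho0)), approximately N(0, 1) under H0: rho = rho0."""
    return math.sqrt(n) * (math.atanh(r) - math.atanh(rho0))

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for rho obtained by inverting the F-transformation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    centre = math.atanh(r)            # F(r) = 0.5 * ln((1 + r) / (1 - r))
    half = z / math.sqrt(n)           # asymptotic standard deviation 1 / sqrt(n)
    return math.tanh(centre - half), math.tanh(centre + half)

lo, hi = fisher_ci(0.5, 100)          # hypothetical sample: r = 0.5, n = 100
print(lo, hi, fisher_test_statistic(0.5, 0.3, 100))
```

Because the interval is built on the transformed scale and mapped back with `tanh`, it always stays inside $(-1, 1)$ and is not symmetric around $r$.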

5.6 Spearman's rank correlation

To see whether or not two ordinal variables are associated, one can use Spearman's rank correlation coefficient $r_S$. In this case we start from the sample of ordinal variables $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ and we assign ranks going from 1 to $n$. The smallest $X$-value gets rank 1, the next smallest $X$-value gets rank 2, ..., and the largest of the $X$-values gets rank $n$. In a similar way we rank the $Y$-values. In the case of ties, we assign each tied observation the average of the ranks involved, cf. the example below.

Starting from $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, we thus obtain a sequence of ranks $(R_1, R_1^*), (R_2, R_2^*), \ldots, (R_n, R_n^*)$. The rank correlation $r_S$ is given by the ordinary correlation coefficient between the two rankings. We use the notation

$$r_S = r_S(X, Y) = r(R, R^*).$$

We calculate $r_S$ by using the general formula (3). For the ranks, formula (3) can be rewritten as

$$r_S = \frac{\sum R_iR_i^* - n\bar R\,\bar R^*}{\sqrt{\left(\sum R_i^2 - n\bar R^2\right)\left(\sum R_i^{*2} - n\bar R^{*2}\right)}}.$$


Now note that (with or without ties):

$$\sum R_i = \sum R_i^* = 1 + 2 + \ldots + n = \frac{n(n+1)}{2}.$$

If there are no ties, we also have:

$$\sum R_i^2 = \sum R_i^{*2} = 1 + 2^2 + \ldots + n^2 = \frac{n(n+1)(2n+1)}{6},$$

$$\sum R_i^2 - n\bar R^2 = \frac{n(n+1)(2n+1)}{6} - n\frac{(n+1)^2}{4} = \frac{n(n^2-1)}{12},$$

$$\frac{1}{2}\sum (R_i - R_i^*)^2 = \frac{n(n+1)(2n+1)}{6} - \sum R_iR_i^*.$$

In the case of no ties, after simplifying, we find that:

$$r_S = 1 - \frac{6\sum_{i=1}^{n}(R_i - R_i^*)^2}{n(n^2-1)}. \qquad (6)$$

For independent variables, we can use the result of Section 5.2 to conclude that

$$\sqrt{n}\,r_S \overset{d}{\Longrightarrow} Z \sim N(0, 1).$$

Remark. In the case of ties between variables, we assign each tied observation the average of the ranks involved. Formula (6) to calculate $r_S$ should then be modified. Consider the following example:

$$\begin{array}{rrrrrr}
X & Y & R & R^* & R - R^* & (R - R^*)^2 \\
\hline
3 & 10 & 1 & 1 & 0 & 0 \\
6 & 15 & 2 & 2 & 0 & 0 \\
9 & 30 & 3 & 4.5 & -1.5 & 2.25 \\
12 & 35 & 4 & 6 & -2 & 4 \\
15 & 25 & 5 & 3 & 2 & 4 \\
18 & 30 & 6 & 4.5 & 1.5 & 2.25 \\
21 & 50 & 7 & 8 & -1 & 1 \\
24 & 45 & 8 & 7 & 1 & 1
\end{array}$$

In the case of no ties we would have $\sum R_i^2 = \sum R_i^{*2} = 204$. In our example, we have $\sum R_i^2 = 204$ and $\sum R_i^{*2} = 203.5$. If there is 1 tie involving 2 observations, we see that there is a difference of 0.5.

In general, one can proceed as follows. Let

$t_2$ = the number of ties involving 2 observations;
$t_3$ = the number of ties involving 3 observations;
...
$t_k$ = the number of ties involving $k$ observations.

Now we calculate the correction factor

$$T = \frac{2^3 - 2}{12}t_2 + \frac{3^3 - 3}{12}t_3 + \ldots + \frac{k^3 - k}{12}t_k.$$

In the case of ties, we replace (6) by:

$$r_S = 1 - \frac{6\left(T + \sum_{i=1}^{n}(R_i - R_i^*)^2\right)}{n(n^2-1)}.$$
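The ranking procedure and the tie correction can be traced on the example data. The sketch below is illustrative: the helper names `average_ranks` and `pearson` are assumptions made here. It assigns average ranks, computes the ordinary correlation coefficient of the two rank vectors, and evaluates the tie-corrected formula with one tie involving two observations.

```python
import math

def average_ranks(values):
    """Ranks 1..n; tied values receive the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(u, v):
    """Ordinary correlation coefficient of two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

x = [3, 6, 9, 12, 15, 18, 21, 24]
y = [10, 15, 30, 35, 25, 30, 50, 45]
rx, ry = average_ranks(x), average_ranks(y)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))   # sum of squared rank differences
n = len(x)
T = (2**3 - 2) / 12 * 1                          # t_2 = 1: one tie of two observations
r_exact = pearson(rx, ry)                        # correlation of the rank vectors
r_corrected = 1 - 6 * (T + d2) / (n * (n**2 - 1))
print(ry, d2, r_exact, r_corrected)
```

On this example the correlation of the ranks is about 0.826 and the tie-corrected formula gives about 0.821, so the correction factor yields a close approximation rather than exactly the same number.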

6 Comparing variances

Testing hypotheses concerning differences between means is well known and can be found in any textbook about statistics. Less is known about comparing variances. In the case of unpaired samples from normal distributions, the distribution of the quotient of the sample variances $s_1^2/s_2^2$ can be determined and is related to an F-distribution. In general, the analysis of $s_1^2/s_2^2$ is more complicated. In this section we study $s_1^2/s_2^2$ for large samples. We consider unpaired samples as well as paired samples.

6.1 Unpaired samples

Suppose that we have unpaired samples $X_1, X_2, \ldots, X_n$ from $X \sim A(\mu_1, \sigma_1^2)$ and $Y_1, Y_2, \ldots, Y_m$ from $Y \sim B(\mu_2, \sigma_2^2)$. In order to test whether or not $\sigma_2^2 = \sigma_1^2$ one can use a test based on $s_1^2$ and $s_2^2$. We need the following lemma.

Lemma 7 Suppose that $E(X^4 + Y^4) < \infty$. As $n \to \infty$ and $m \to \infty$, we have

$$P(\sqrt{n}(s_1^2 - \sigma_1^2) \le x, \sqrt{m}(s_2^2 - \sigma_2^2) \le y) = P(\sqrt{n}(s_1^2 - \sigma_1^2) \le x)\,P(\sqrt{m}(s_2^2 - \sigma_2^2) \le y) \to P(U_1 \le x)\,P(U_2 \le y),$$

where $U_1 \sim N(0, \operatorname{Var}((X-\mu_1)^2))$ and $U_2 \sim N(0, \operatorname{Var}((Y-\mu_2)^2))$.

Proof. This follows from independence and Theorem 2.

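Now consider $K$ defined by

$$K = \frac{\sigma_2^2}{\sigma_1^2}\,\frac{s_1^2}{s_2^2} - 1.$$

Clearly we have

$$K = \frac{\sigma_2^2(s_1^2 - \sigma_1^2) - \sigma_1^2(s_2^2 - \sigma_2^2)}{s_2^2\sigma_1^2} = \frac{Q}{s_2^2\sigma_1^2}.$$

Now we write

$$\sqrt{n}\,Q = \sigma_2^2\sqrt{n}(s_1^2 - \sigma_1^2) - \sigma_1^2\sqrt{m}(s_2^2 - \sigma_2^2)\frac{\sqrt{n}}{\sqrt{m}}.$$

Using the notations of Lemma 7 we have the following result.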


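Theorem 8 Suppose that $E(X^4 + Y^4) < \infty$. If $n \to \infty$ and $m \to \infty$ in such a way that $n/m \to \tau^2$ ($0 \le \tau < \infty$), then

$$\sqrt{n}\,K \overset{d}{\Longrightarrow} V \overset{d}{=} \frac{1}{\sigma_1^2}U_1 - \tau\frac{1}{\sigma_2^2}U_2,$$

and $V \sim N(0, \sigma_V^2)$ with

$$\sigma_V^2 = \frac{1}{\sigma_1^4}\operatorname{Var}((X-\mu_1)^2) + \tau^2\frac{1}{\sigma_2^4}\operatorname{Var}((Y-\mu_2)^2). \qquad (7)$$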

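Proof. We clearly have

$$\sqrt{n}\,Q \overset{d}{\Longrightarrow} W,$$

where $W \overset{d}{=} \sigma_2^2U_1 - \tau\sigma_1^2U_2$. Using $s_i^2 \overset{P}{\longrightarrow} \sigma_i^2$ ($i = 1, 2$), it follows that

$$\sqrt{n}\,K \overset{d}{\Longrightarrow} \frac{1}{\sigma_1^2\sigma_2^2}W \overset{d}{=} \frac{1}{\sigma_1^2}U_1 - \tau\frac{1}{\sigma_2^2}U_2,$$

and the result follows.

Remarks.

1) If $\tau = \infty$, we can interchange the roles of $n$ and $m$.

2) From the practical point of view, we can use (7) to write

$$\frac{1}{n}\sigma_V^2 \approx \frac{1}{n}\,\frac{1}{\sigma_1^4}\operatorname{Var}((X-\mu_1)^2) + \frac{1}{m}\,\frac{1}{\sigma_2^4}\operatorname{Var}((Y-\mu_2)^2).$$

3) Note that the asymptotic variance depends on the kurtosis of the underlying distributions. We find that

$$\sigma_V^2 = \gamma(X) + 2 + \tau^2(\gamma(Y) + 2),$$

and then

$$\frac{1}{n}\sigma_V^2 \approx \frac{1}{n}(\gamma(X) + 2) + \frac{1}{m}(\gamma(Y) + 2).$$

4) In the special case of independent samples from normal distributions, we have $\sqrt{n}\,K \overset{d}{\Longrightarrow} V$, where $V \sim N(0, \sigma_V^2)$ with

$$\sigma_V^2 = \operatorname{Var}(X^{*2}) + \tau^2\operatorname{Var}(Y^{*2}),$$

where $X^*$ and $Y^*$ are the standardized $X$ and $Y$. Using the expressions of Section 5.4, we have $\operatorname{Var}(X^{*2}) = \operatorname{Var}(Y^{*2}) = 3 - 1 = 2$, so that $\sigma_V^2 = 2(1 + \tau^2)$, and then

$$\frac{1}{n}\sigma_V^2 \approx 2\left(\frac{1}{n} + \frac{1}{m}\right).$$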

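5) If $\sigma_1^2 = \sigma_2^2 = \sigma^2$, we can study the pooled variance given by:

$$s_p^2 = \frac{(n-1)s_1^2 + (m-1)s_2^2}{n+m-2}.$$

Now we find that

$$\sqrt{n}(s_p^2 - \sigma^2) = \frac{n-1}{n+m-2}\sqrt{n}(s_1^2 - \sigma^2) + \frac{m-1}{n+m-2}\,\frac{\sqrt{n}}{\sqrt{m}}\,\sqrt{m}(s_2^2 - \sigma^2).$$

It follows that

$$\sqrt{n}(s_p^2 - \sigma^2) \overset{d}{\Longrightarrow} W \overset{d}{=} \frac{\tau^2}{\tau^2+1}U_1 + \frac{\tau}{\tau^2+1}U_2.$$

In this case $W \sim N(0, \sigma_W^2)$, with

$$\sigma_W^2 = \left(\frac{\tau^2}{\tau^2+1}\right)^2\operatorname{Var}((X-\mu_1)^2) + \left(\frac{\tau}{\tau^2+1}\right)^2\operatorname{Var}((Y-\mu_2)^2).$$

In the case of samples from normal distributions with $\sigma_1^2 = \sigma_2^2 = \sigma^2$, we find that

$$\sigma_W^2 = 2\sigma^4\left(\frac{\tau^2}{\tau^2+1}\right)^2 + 2\sigma^4\left(\frac{\tau}{\tau^2+1}\right)^2 = 2\sigma^4\frac{\tau^2}{1+\tau^2} \approx 2\sigma^4\frac{n}{n+m}.$$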
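The practical form of Theorem 8 suggests a large-sample test of equal variances for unpaired samples. The sketch below is one possible plug-in version and not the authors' procedure: the function names are assumptions, the kurtosis is estimated by the simple moment estimator, and under $H_0: \sigma_1^2 = \sigma_2^2$ the statistic $K = s_1^2/s_2^2 - 1$ is divided by the approximate standard deviation $\sqrt{(\gamma(X)+2)/n + (\gamma(Y)+2)/m}$.

```python
import math
from statistics import variance

def excess_kurtosis(sample):
    """Moment estimator of E((X - mu)^4) / sigma^4 - 3."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    return m4 / m2**2 - 3

def variance_ratio_se(kurt_x, kurt_y, n, m):
    """Approximate standard deviation of K from Remark 3: sqrt(sigma_V^2 / n)."""
    return math.sqrt((kurt_x + 2) / n + (kurt_y + 2) / m)

def variance_ratio_z(x, y):
    """Approximate N(0, 1) statistic for H0: equal variances, with plug-in kurtosis."""
    k = variance(x) / variance(y) - 1     # K under H0, where sigma_2^2 / sigma_1^2 = 1
    se = variance_ratio_se(excess_kurtosis(x), excess_kurtosis(y), len(x), len(y))
    return k / se

# Hypothetical toy samples, far too small for the asymptotics; for illustration only.
x = [0, 2, 4, 6, 8, 10]
y = [1, 2, 3, 4, 5, 6]
print(variance_ratio_se(0.0, 0.0, 50, 200), variance_ratio_z(x, y))
```

With normal data the kurtosis terms vanish and the standard deviation reduces to $\sqrt{2(1/n + 1/m)}$, the value given in Remark 4.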

6.2 Paired samples

Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ denote a sample from an arbitrary bivariate distribution $(X, Y) \sim A(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$. We prove the following result.

Lemma 9 If $E(X^4 + Y^4) < \infty$, then

$$P(\sqrt{n}(s_1^2 - \sigma_1^2) \le x, \sqrt{n}(s_2^2 - \sigma_2^2) \le y) \to P(U_1 \le x, U_2 \le y),$$

where $(U_1, U_2)$ has a bivariate normal distribution with zero means and with variance-covariance matrix

$$\begin{pmatrix}\operatorname{Var}((X-\mu_1)^2) & \operatorname{Cov}((X-\mu_1)^2, (Y-\mu_2)^2) \\ \operatorname{Cov}((X-\mu_1)^2, (Y-\mu_2)^2) & \operatorname{Var}((Y-\mu_2)^2)\end{pmatrix}.$$

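Proof. Take arbitrary real numbers $(u, v) \ne (0, 0)$ and consider the vectors

$$\vec{A} = (\bar X, \bar Y, \overline{X^2}, \overline{Y^2}), \qquad \vec{\mu} = (\mu_1, \mu_2, E(X^2), E(Y^2)),$$

and the function $f(a, b, c, d) = u(c - a^2) + v(d - b^2)$. Clearly we have

$$f(\vec{A}) = u(\overline{X^2} - \bar X^2) + v(\overline{Y^2} - \bar Y^2) = \frac{n-1}{n}(us_1^2 + vs_2^2)$$

and $f(\vec{\mu}) = u\sigma_1^2 + v\sigma_2^2$. It is easy to see that $\vec{\lambda} = (-2u\mu_1, -2v\mu_2, u, v)$. The transfer results of Section 3.2 show that

$$P(\sqrt{n}(f(\vec{A}) - f(\vec{\mu})) \le x) \to P(W \le x),$$

where $W \sim N(0, \sigma_W^2)$ with $\sigma_W^2 = \vec{\lambda}\,\Sigma\,\vec{\lambda}^T$ and (showing the upper triangle of the symmetric matrix)

$$\Sigma = \begin{pmatrix}
\sigma_1^2 & \operatorname{Cov}(X, Y) & \operatorname{Cov}(X, X^2) & \operatorname{Cov}(X, Y^2) \\
 & \sigma_2^2 & \operatorname{Cov}(Y, X^2) & \operatorname{Cov}(Y, Y^2) \\
 & & \operatorname{Var}(X^2) & \operatorname{Cov}(X^2, Y^2) \\
 & & & \operatorname{Var}(Y^2)
\end{pmatrix}.$$

Straightforward calculations show that

$$\vec{\lambda}\,\Sigma\,\vec{\lambda}^T = \operatorname{Var}(u(X-\mu_1)^2 + v(Y-\mu_2)^2).$$

It follows that

$$P\left(\sqrt{n}\left(\frac{n-1}{n}(us_1^2 + vs_2^2) - (u\sigma_1^2 + v\sigma_2^2)\right) \le x\right) \to P(W \le x),$$

where $W \overset{d}{=} uU_1 + vU_2$, and $(U_1, U_2)$ has the desired bivariate normal distribution. The correction factor $(n-1)/n$ is asymptotically negligible. The result follows by using the Cramér-Wold device.

As in Theorem 8, we consider $K$ and now we conclude that $P(\sqrt{n}\,K \le x) \to P(V \le x)$, where

$$V \overset{d}{=} \frac{1}{\sigma_1^2}U_1 - \frac{1}{\sigma_2^2}U_2.$$

We find that $V \sim N(0, \sigma_V^2)$ with

$$\sigma_V^2 = \frac{\operatorname{Var}((X-\mu_1)^2)}{\sigma_1^4} + \frac{\operatorname{Var}((Y-\mu_2)^2)}{\sigma_2^4} - 2\frac{\operatorname{Cov}((X-\mu_1)^2, (Y-\mu_2)^2)}{\sigma_1^2\sigma_2^2}.$$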

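Remarks.

1) We can rewrite $\sigma_V^2$ more compactly as follows. Using the notation $X^* = (X - \mu_1)/\sigma_1$ and $Y^* = (Y - \mu_2)/\sigma_2$ we have

$$\sigma_V^2 = \operatorname{Var}(X^{*2}) + \operatorname{Var}(Y^{*2}) - 2\operatorname{Cov}(X^{*2}, Y^{*2}) = \operatorname{Var}(X^{*2} - Y^{*2}) = E((X^{*2} - Y^{*2})^2),$$

where the last equality holds because $E(X^{*2} - Y^{*2}) = 0$.

2) If we start from a sample from a bivariate normal distribution, we find (cf. Section 5.4) that

$$\sigma_V^2 = E(X^{*4}) + E(Y^{*4}) - 2E(X^{*2}Y^{*2}) = 3 + 3 - 2(1 + 2\rho^2) = 4(1 - \rho^2).$$

In the case of $\rho = 0$ we recover the result of the unpaired case with $\tau = 1$.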


