
arXiv:0803.4101v1 [math.ST] 28 Mar 2008

    The Annals of Statistics

2007, Vol. 35, No. 6, 2769–2794. DOI: 10.1214/009053607000000505

© Institute of Mathematical Statistics, 2007

MEASURING AND TESTING DEPENDENCE BY CORRELATION OF DISTANCES

By Gábor J. Székely,¹ Maria L. Rizzo and Nail K. Bakirov

Bowling Green State University, Bowling Green State University and USC Russian Academy of Sciences

Distance correlation is a new measure of dependence between random vectors. Distance covariance and distance correlation are analogous to product-moment covariance and correlation, but unlike the classical definition of correlation, distance correlation is zero only if the random vectors are independent. The empirical distance dependence measures are based on certain Euclidean distances between sample elements rather than sample moments, yet have a compact representation analogous to the classical covariance and correlation. Asymptotic properties and applications in testing independence are discussed. Implementation of the test and Monte Carlo results are also presented.

1. Introduction. Distance correlation provides a new approach to the problem of testing the joint independence of random vectors. For all distributions with finite first moments, distance correlation $\mathcal{R}$ generalizes the idea of correlation in two fundamental ways:

(i) $\mathcal{R}(X, Y)$ is defined for $X$ and $Y$ in arbitrary dimensions;
(ii) $\mathcal{R}(X, Y) = 0$ characterizes independence of $X$ and $Y$.

Distance correlation has properties of a true dependence measure, analogous to product-moment correlation $\rho$. Distance correlation satisfies $0 \le \mathcal{R} \le 1$, and $\mathcal{R} = 0$ only if $X$ and $Y$ are independent. In the bivariate normal case, $\mathcal{R}$ is a function of $\rho$, and $\mathcal{R}(X, Y) \le |\rho(X, Y)|$ with equality when $\rho = \pm 1$.

Throughout this paper $X$ in $\mathbb{R}^p$ and $Y$ in $\mathbb{R}^q$ are random vectors, where $p$ and $q$ are positive integers. The characteristic functions of $X$ and $Y$ are denoted $f_X$ and $f_Y$, respectively, and the joint characteristic function of $X$ and $Y$ is denoted $f_{X,Y}$.

Received November 2006; revised March 2007.

¹Supported by the National Science Foundation while working at the Foundation.

AMS 2000 subject classifications. Primary 62G10; secondary 62H20.

Key words and phrases. Distance correlation, distance covariance, multivariate independence.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2007, Vol. 35, No. 6, 2769–2794. This reprint differs from the original in pagination and typographic detail.


Distance covariance $\mathcal{V}$ can be applied to measure the distance $\|f_{X,Y}(t, s) - f_X(t) f_Y(s)\|$ between the joint characteristic function and the product of the marginal characteristic functions (the norm $\|\cdot\|$ is defined in Section 2), and to test the hypothesis of independence

$$H_0 : f_{X,Y} = f_X f_Y \quad \text{vs.} \quad H_1 : f_{X,Y} \ne f_X f_Y.$$

The importance of the independence assumption for inference arises, for example, in clinical studies with the case-only design, which uses only diseased subjects assumed to be independent in the study population. In this design, inferences on multiplicative gene interactions (see [1]) can be highly distorted when there is a departure from independence. Classical methods such as the Wilks Lambda [14] or Puri–Sen [8] likelihood ratio tests are not applicable if the dimension exceeds the sample size, or when distributional assumptions do not hold (see, e.g., [7] regarding the prevalence of nonnormality in biology and ecology). A further limitation of multivariate extensions of methods based on ranks is that they are ineffective for testing nonmonotone types of dependence.

We propose an omnibus test of independence that is easily implemented in arbitrary dimension. In our Monte Carlo results the distance covariance test exhibits superior power against nonmonotone types of dependence while maintaining good power in the multivariate normal case relative to the parametric likelihood ratio test. Distance correlation can also be applied as an index of dependence; for example, in meta-analysis [12] distance correlation would be a more generally applicable index than product-moment correlation, without requiring normality for valid inferences.

Theoretical properties of distance covariance and correlation are covered in Section 2, extensions in Section 3, and results for the bivariate normal case in Section 4. Empirical results are presented in Section 5, followed by a summary in Section 6.

2. Theoretical properties of distance dependence measures.

Notation. The scalar product of vectors $t$ and $s$ is denoted by $\langle t, s\rangle$. For complex-valued functions $f(\cdot)$, the complex conjugate of $f$ is denoted by $\overline{f}$ and $|f|^2 = f\overline{f}$. The Euclidean norm of $x$ in $\mathbb{R}^p$ is $|x|_p$. A sample from the distribution of $X$ in $\mathbb{R}^p$ is denoted by the $n \times p$ matrix $\mathbf{X}$, and the sample vectors (rows) are labeled $X_1, \dots, X_n$. A primed variable $X'$ is an independent copy of $X$; that is, $X$ and $X'$ are independent and identically distributed.

Definition 1. For complex functions $\gamma$ defined on $\mathbb{R}^p \times \mathbb{R}^q$, the $\|\cdot\|_w$-norm in the weighted $L_2$ space of functions on $\mathbb{R}^{p+q}$ is defined by

$$\|\gamma(t, s)\|^2_w = \int_{\mathbb{R}^{p+q}} |\gamma(t, s)|^2\, w(t, s)\, dt\, ds, \tag{2.1}$$


where $w(t, s)$ is an arbitrary positive weight function for which the integral above exists.

2.1. Choice of weight function. Using the $\|\cdot\|_w$-norm (2.1) with a suitable choice of weight function $w(t, s)$, we define a measure of dependence

$$\mathcal{V}^2(X, Y; w) = \|f_{X,Y}(t, s) - f_X(t) f_Y(s)\|^2_w = \int_{\mathbb{R}^{p+q}} |f_{X,Y}(t, s) - f_X(t) f_Y(s)|^2\, w(t, s)\, dt\, ds, \tag{2.2}$$

such that $\mathcal{V}^2(X, Y; w) = 0$ if and only if $X$ and $Y$ are independent. In this paper $\mathcal{V}$ will be analogous to the absolute value of the classical product-moment covariance. If we divide $\mathcal{V}(X, Y; w)$ by $\sqrt{\mathcal{V}(X; w)\,\mathcal{V}(Y; w)}$, where

$$\mathcal{V}^2(X; w) = \int_{\mathbb{R}^{2p}} |f_{X,X}(t, s) - f_X(t) f_X(s)|^2\, w(t, s)\, dt\, ds,$$

we have a type of unsigned correlation $\mathcal{R}_w$.

Not every weight function leads to an interesting $\mathcal{R}_w$, however. The coefficient $\mathcal{R}_w$ should be scale invariant, that is, invariant with respect to transformations $(X, Y) \mapsto (\epsilon X, \epsilon Y)$ for $\epsilon > 0$. We also require that $\mathcal{R}_w$ be positive for dependent variables. It is easy to check that if the weight function $w(t, s)$ is integrable, and both $X$ and $Y$ have finite variance, then the Taylor expansions of the underlying characteristic functions show that

$$\lim_{\epsilon \to 0} \frac{\mathcal{V}^2(\epsilon X, \epsilon Y; w)}{\sqrt{\mathcal{V}^2(\epsilon X; w)\,\mathcal{V}^2(\epsilon Y; w)}} = \rho^2(X, Y).$$

Thus for integrable $w$, if $\rho = 0$, then $\mathcal{R}_w$ can be arbitrarily close to zero even if $X$ and $Y$ are dependent. However, by applying a nonintegrable weight function, we obtain an $\mathcal{R}_w$ that is scale invariant and cannot be zero for dependent $X$ and $Y$. We do not claim that our choice of $w$ is the only reasonable one, but it will become clear in the following sections that our choice (2.4) results in very simple and applicable empirical formulas. (A more complicated weight function is applied in [2], which leads to a more computationally difficult statistic and does not have the interesting correlation form.)

The crucial observation is the following lemma.

Lemma 1. If $0 < \alpha < 2$, then for all $x$ in $\mathbb{R}^d$,

$$\int_{\mathbb{R}^d} \frac{1 - \cos\langle t, x\rangle}{|t|_d^{d+\alpha}}\, dt = C(d, \alpha)\, |x|^\alpha,$$

where

$$C(d, \alpha) = \frac{2\pi^{d/2}\,\Gamma(1 - \alpha/2)}{\alpha\, 2^\alpha\, \Gamma((d + \alpha)/2)}$$


and $\Gamma(\cdot)$ is the complete gamma function. The integrals at $0$ and $\infty$ are meant in the principal value sense: $\lim_{\delta \to 0} \int_{\mathbb{R}^d \setminus \{\delta B + \delta^{-1} B^c\}}$, where $B$ is the unit ball (centered at 0) in $\mathbb{R}^d$ and $B^c$ is the complement of $B$.

See [11] for the proof of Lemma 1. In the simplest case, $\alpha = 1$, the constant in Lemma 1 is

$$c_d = C(d, 1) = \frac{\pi^{(1+d)/2}}{\Gamma((1 + d)/2)}. \tag{2.3}$$

In view of Lemma 1, it is natural to choose the weight function

$$w(t, s) = (c_p\, c_q\, |t|_p^{1+p}\, |s|_q^{1+q})^{-1}, \tag{2.4}$$

corresponding to $\alpha = 1$. We apply the weight function (2.4) and the corresponding weighted $L_2$ norm $\|\cdot\|$, omitting the index $w$, and write the dependence measure (2.2) as $\mathcal{V}^2(X, Y)$. In integrals we also use the symbol $d\omega$, which is defined by

$$d\omega = (c_p\, c_q\, |t|_p^{1+p}\, |s|_q^{1+q})^{-1}\, dt\, ds.$$

For finiteness of $\|f_{X,Y}(t, s) - f_X(t) f_Y(s)\|^2$ it is sufficient that $E|X|_p < \infty$ and $E|Y|_q < \infty$. By the Cauchy–Bunyakovsky inequality,

$$|f_{X,Y}(t, s) - f_X(t) f_Y(s)|^2 = |E[(e^{i\langle t, X\rangle} - f_X(t))(e^{i\langle s, Y\rangle} - f_Y(s))]|^2 \le E|e^{i\langle t, X\rangle} - f_X(t)|^2\, E|e^{i\langle s, Y\rangle} - f_Y(s)|^2 = (1 - |f_X(t)|^2)(1 - |f_Y(s)|^2).$$

If $E(|X|_p + |Y|_q) < \infty$, then by Lemma 1 and Fubini's theorem it follows that

$$\int_{\mathbb{R}^{p+q}} |f_{X,Y}(t, s) - f_X(t) f_Y(s)|^2\, d\omega \le \int_{\mathbb{R}^p} \frac{1 - |f_X(t)|^2}{c_p |t|_p^{1+p}}\, dt \int_{\mathbb{R}^q} \frac{1 - |f_Y(s)|^2}{c_q |s|_q^{1+q}}\, ds \tag{2.5}$$
$$= E\int_{\mathbb{R}^p} \frac{1 - \cos\langle t, X - X'\rangle}{c_p |t|_p^{1+p}}\, dt \cdot E\int_{\mathbb{R}^q} \frac{1 - \cos\langle s, Y - Y'\rangle}{c_q |s|_q^{1+q}}\, ds = E|X - X'|_p\, E|Y - Y'|_q < \infty.$$

    Thus we have the following definitions.


    2.2. Distance covariance and distance correlation.

Definition 2 (Distance covariance). The distance covariance (dCov) between random vectors $X$ and $Y$ with finite first moments is the nonnegative number $\mathcal{V}(X, Y)$ defined by

$$\mathcal{V}^2(X, Y) = \|f_{X,Y}(t, s) - f_X(t) f_Y(s)\|^2 = \frac{1}{c_p c_q} \int_{\mathbb{R}^{p+q}} \frac{|f_{X,Y}(t, s) - f_X(t) f_Y(s)|^2}{|t|_p^{1+p}\, |s|_q^{1+q}}\, dt\, ds. \tag{2.6}$$

Similarly, distance variance (dVar) is defined as the square root of

$$\mathcal{V}^2(X) = \mathcal{V}^2(X, X) = \|f_{X,X}(t, s) - f_X(t) f_X(s)\|^2.$$

Remark 1. If $E(|X|_p + |Y|_q) = \infty$ but $E(|X|^\alpha_p + |Y|^\alpha_q) < \infty$ for some $0 < \alpha < 1$, then one can apply $\mathcal{V}^{(\alpha)}$ and $\mathcal{R}^{(\alpha)}$ (see Section 3.1); otherwise one can apply a suitable transformation of $(X, Y)$ into bounded random variables $(\tilde{X}, \tilde{Y})$ such that $\tilde{X}$ and $\tilde{Y}$ are independent if and only if $X$ and $Y$ are independent.

Definition 3 (Distance correlation). The distance correlation (dCor) between random vectors $X$ and $Y$ with finite first moments is the nonnegative number $\mathcal{R}(X, Y)$ defined by

$$\mathcal{R}^2(X, Y) = \begin{cases} \dfrac{\mathcal{V}^2(X, Y)}{\sqrt{\mathcal{V}^2(X)\,\mathcal{V}^2(Y)}}, & \mathcal{V}^2(X)\mathcal{V}^2(Y) > 0, \\[1ex] 0, & \mathcal{V}^2(X)\mathcal{V}^2(Y) = 0. \end{cases} \tag{2.7}$$

Clearly the definition of $\mathcal{R}$ in (2.7) suggests an analogy with the product-moment correlation coefficient $\rho$. Analogous properties are established in Theorem 3. The relation between $\mathcal{V}$, $\mathcal{R}$ and $\rho$ in the bivariate normal case will be established in Theorem 7.

The distance dependence statistics are defined as follows. For an observed random sample $(\mathbf{X}, \mathbf{Y}) = \{(X_k, Y_k) : k = 1, \dots, n\}$ from the joint distribution of random vectors $X$ in $\mathbb{R}^p$ and $Y$ in $\mathbb{R}^q$, define

$$a_{kl} = |X_k - X_l|_p, \qquad \bar{a}_{k\cdot} = \frac{1}{n}\sum_{l=1}^n a_{kl}, \qquad \bar{a}_{\cdot l} = \frac{1}{n}\sum_{k=1}^n a_{kl},$$
$$\bar{a}_{\cdot\cdot} = \frac{1}{n^2}\sum_{k,l=1}^n a_{kl}, \qquad A_{kl} = a_{kl} - \bar{a}_{k\cdot} - \bar{a}_{\cdot l} + \bar{a}_{\cdot\cdot},$$


$k, l = 1, \dots, n$. Similarly, define $b_{kl} = |Y_k - Y_l|_q$ and $B_{kl} = b_{kl} - \bar{b}_{k\cdot} - \bar{b}_{\cdot l} + \bar{b}_{\cdot\cdot}$, for $k, l = 1, \dots, n$.

Definition 4. The empirical distance covariance $\mathcal{V}_n(\mathbf{X}, \mathbf{Y})$ is the nonnegative number defined by

$$\mathcal{V}^2_n(\mathbf{X}, \mathbf{Y}) = \frac{1}{n^2}\sum_{k,l=1}^n A_{kl} B_{kl}. \tag{2.8}$$

Similarly, $\mathcal{V}_n(\mathbf{X})$ is the nonnegative number defined by

$$\mathcal{V}^2_n(\mathbf{X}) = \mathcal{V}^2_n(\mathbf{X}, \mathbf{X}) = \frac{1}{n^2}\sum_{k,l=1}^n A^2_{kl}. \tag{2.9}$$

Although it may not be immediately obvious that $\mathcal{V}^2_n(\mathbf{X}, \mathbf{Y}) \ge 0$, this fact, as well as the motivation for the definition of $\mathcal{V}_n$, will be clear from Theorem 1 below.

Definition 5. The empirical distance correlation $\mathcal{R}_n(\mathbf{X}, \mathbf{Y})$ is the square root of

$$\mathcal{R}^2_n(\mathbf{X}, \mathbf{Y}) = \begin{cases} \dfrac{\mathcal{V}^2_n(\mathbf{X}, \mathbf{Y})}{\sqrt{\mathcal{V}^2_n(\mathbf{X})\,\mathcal{V}^2_n(\mathbf{Y})}}, & \mathcal{V}^2_n(\mathbf{X})\mathcal{V}^2_n(\mathbf{Y}) > 0, \\[1ex] 0, & \mathcal{V}^2_n(\mathbf{X})\mathcal{V}^2_n(\mathbf{Y}) = 0. \end{cases} \tag{2.10}$$

Remark 2. The statistic $\mathcal{V}_n(\mathbf{X}) = 0$ if and only if every sample observation is identical. Indeed, if $\mathcal{V}_n(\mathbf{X}) = 0$, then $A_{kl} = 0$ for $k, l = 1, \dots, n$. Thus $0 = A_{kk} = -\bar{a}_{k\cdot} - \bar{a}_{\cdot k} + \bar{a}_{\cdot\cdot}$ implies that $\bar{a}_{k\cdot} = \bar{a}_{\cdot k} = \bar{a}_{\cdot\cdot}/2$, and therefore

$$0 = A_{kl} = a_{kl} - \bar{a}_{k\cdot} - \bar{a}_{\cdot l} + \bar{a}_{\cdot\cdot} = a_{kl} = |X_k - X_l|_p,$$

so $X_1 = \cdots = X_n$.

It is clear that $\mathcal{R}_n$ is easy to compute, and in the following sections it will be shown that $\mathcal{R}_n$ is a good empirical measure of dependence.
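Indeed, the double-centering recipe above translates directly into a few lines of linear algebra. The following NumPy sketch (our illustration; the paper ships no code and all names here are ours) computes $\mathcal{V}^2_n$ and $\mathcal{R}^2_n$ from two samples:

```python
import numpy as np

def _dist_matrix(Z):
    """Pairwise Euclidean distance matrix of the n x d sample Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def _double_center(D):
    """A_kl = a_kl - abar_k. - abar_.l + abar_.. (Section 2.2)."""
    return D - D.mean(axis=1, keepdims=True) - D.mean(axis=0, keepdims=True) + D.mean()

def dcov2(X, Y):
    """Empirical squared distance covariance V_n^2(X, Y), eq. (2.8)."""
    A = _double_center(_dist_matrix(X))
    B = _double_center(_dist_matrix(Y))
    return (A * B).mean()

def dcor2(X, Y):
    """Empirical squared distance correlation R_n^2(X, Y), eq. (2.10)."""
    vxy = dcov2(X, Y)
    vxx, vyy = dcov2(X, X), dcov2(Y, Y)
    return vxy / np.sqrt(vxx * vyy) if vxx * vyy > 0 else 0.0

# Example: a nonlinear dependence that product-moment correlation misses.
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 1))
y = x ** 2
print(dcor2(x, y))                           # clearly positive
print(np.corrcoef(x[:, 0], y[:, 0])[0, 1])   # typically near zero
```

Note the $O(n^2)$ memory for the distance matrices, matching the computational complexity noted in Section 5.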

2.3. Properties of distance covariance. It would have been natural, but less elementary, to define $\mathcal{V}_n(\mathbf{X}, \mathbf{Y})$ as $\|f^n_{X,Y}(t, s) - f^n_X(t) f^n_Y(s)\|$, where

$$f^n_{X,Y}(t, s) = \frac{1}{n}\sum_{k=1}^n \exp\{i\langle t, X_k\rangle + i\langle s, Y_k\rangle\}$$

is the empirical characteristic function of the sample $\{(X_1, Y_1), \dots, (X_n, Y_n)\}$, and

$$f^n_X(t) = \frac{1}{n}\sum_{k=1}^n \exp\{i\langle t, X_k\rangle\}, \qquad f^n_Y(s) = \frac{1}{n}\sum_{k=1}^n \exp\{i\langle s, Y_k\rangle\}$$


are the marginal empirical characteristic functions of the $\mathbf{X}$ sample and the $\mathbf{Y}$ sample, respectively. Our first theorem shows that the two definitions are equivalent.

Theorem 1. If $(\mathbf{X}, \mathbf{Y})$ is a sample from the joint distribution of $(X, Y)$, then

$$\mathcal{V}^2_n(\mathbf{X}, \mathbf{Y}) = \|f^n_{X,Y}(t, s) - f^n_X(t) f^n_Y(s)\|^2.$$

Proof. Lemma 1 implies that there exist constants $c_p$ and $c_q$ such that for all $X$ in $\mathbb{R}^p$ and $Y$ in $\mathbb{R}^q$,

$$\int_{\mathbb{R}^p} \frac{1 - \exp\{i\langle t, X\rangle\}}{|t|_p^{1+p}}\, dt = c_p |X|_p, \tag{2.11}$$
$$\int_{\mathbb{R}^q} \frac{1 - \exp\{i\langle s, Y\rangle\}}{|s|_q^{1+q}}\, ds = c_q |Y|_q, \tag{2.12}$$
$$\int_{\mathbb{R}^p}\int_{\mathbb{R}^q} \frac{1 - \exp\{i\langle t, X\rangle + i\langle s, Y\rangle\}}{|t|_p^{1+p}\, |s|_q^{1+q}}\, dt\, ds = c_p c_q |X|_p |Y|_q, \tag{2.13}$$

where the integrals are understood in the principal value sense. For simplicity, consider the case $p = q = 1$. The distance between the empirical characteristic functions in the weighted norm with $w(t, s) = (\pi^2 t^2 s^2)^{-1}$ involves $|f^n_{X,Y}(t, s)|^2$, $|f^n_X(t) f^n_Y(s)|^2$ and $f^n_{X,Y}(t, s)\,\overline{f^n_X(t) f^n_Y(s)}$. For the first we have

$$f^n_{X,Y}(t, s)\,\overline{f^n_{X,Y}(t, s)} = \frac{1}{n^2}\sum_{k,l=1}^n \cos(X_k - X_l)t \cos(Y_k - Y_l)s + V_1,$$

where $V_1$ represents terms that vanish when the integral $\|f^n_{X,Y}(t, s) - f^n_X(t) f^n_Y(s)\|^2$ is evaluated. The second expression is

$$f^n_X(t) f^n_Y(s)\,\overline{f^n_X(t) f^n_Y(s)} = \frac{1}{n^2}\sum_{k,l=1}^n \cos(X_k - X_l)t \cdot \frac{1}{n^2}\sum_{k,l=1}^n \cos(Y_k - Y_l)s + V_2,$$

and the third is

$$f^n_{X,Y}(t, s)\,\overline{f^n_X(t) f^n_Y(s)} = \frac{1}{n^3}\sum_{k,l,m=1}^n \cos(X_k - X_l)t \cos(Y_k - Y_m)s + V_3,$$

where $V_2$ and $V_3$ represent terms that vanish when the integral is evaluated. To evaluate the integral $\|f^n_{X,Y}(t, s) - f^n_X(t) f^n_Y(s)\|^2$, apply Lemma 1 and statements (2.11), (2.12) and (2.13), using

$$\cos u \cos v = 1 - (1 - \cos u) - (1 - \cos v) + (1 - \cos u)(1 - \cos v).$$


After cancellation in the numerator of the integrand it remains to evaluate integrals of the type

$$\int_{\mathbb{R}^2} (1 - \cos(X_k - X_l)t)(1 - \cos(Y_k - Y_l)s)\, \frac{dt}{t^2}\, \frac{ds}{s^2} = \int_{\mathbb{R}} (1 - \cos(X_k - X_l)t)\, \frac{dt}{t^2} \int_{\mathbb{R}} (1 - \cos(Y_k - Y_l)s)\, \frac{ds}{s^2} = c_1^2\, |X_k - X_l|\, |Y_k - Y_l|.$$

For random vectors $X$ in $\mathbb{R}^p$ and $Y$ in $\mathbb{R}^q$, the same steps are applied using $w(t, s) = \{c_p c_q |t|_p^{1+p} |s|_q^{1+q}\}^{-1}$. Thus

$$\|f^n_{X,Y}(t, s) - f^n_X(t) f^n_Y(s)\|^2 = S_1 + S_2 - 2S_3, \tag{2.14}$$

where

$$S_1 = \frac{1}{n^2}\sum_{k,l=1}^n |X_k - X_l|_p\, |Y_k - Y_l|_q, \tag{2.15}$$
$$S_2 = \frac{1}{n^2}\sum_{k,l=1}^n |X_k - X_l|_p \cdot \frac{1}{n^2}\sum_{k,l=1}^n |Y_k - Y_l|_q, \tag{2.16}$$
$$S_3 = \frac{1}{n^3}\sum_{k=1}^n \sum_{l,m=1}^n |X_k - X_l|_p\, |Y_k - Y_m|_q. \tag{2.17}$$

To complete the proof we need to verify the algebraic identity

$$\mathcal{V}^2_n(\mathbf{X}, \mathbf{Y}) = S_1 + S_2 - 2S_3. \tag{2.18}$$

For the proof of (2.18) see the Appendix. Then (2.14) and (2.18) imply that $\mathcal{V}^2_n(\mathbf{X}, \mathbf{Y}) = \|f^n_{X,Y}(t, s) - f^n_X(t) f^n_Y(s)\|^2$. □

Theorem 2. If $E|X|_p < \infty$ and $E|Y|_q < \infty$, then almost surely

$$\lim_{n\to\infty} \mathcal{V}_n(\mathbf{X}, \mathbf{Y}) = \mathcal{V}(X, Y). \tag{2.19}$$

Proof. Define

$$\xi_n(t, s) = \frac{1}{n}\sum_{k=1}^n e^{i\langle t, X_k\rangle + i\langle s, Y_k\rangle} - \frac{1}{n}\sum_{k=1}^n e^{i\langle t, X_k\rangle} \cdot \frac{1}{n}\sum_{k=1}^n e^{i\langle s, Y_k\rangle},$$

so that $\mathcal{V}^2_n = \|\xi_n(t, s)\|^2$. Then after elementary transformations

$$\xi_n(t, s) = \frac{1}{n}\sum_{k=1}^n u_k v_k - \frac{1}{n}\sum_{k=1}^n u_k \cdot \frac{1}{n}\sum_{k=1}^n v_k,$$


where $u_k = \exp(i\langle t, X_k\rangle) - f_X(t)$ and $v_k = \exp(i\langle s, Y_k\rangle) - f_Y(s)$. For each $\delta > 0$ define the region

$$D(\delta) = \{(t, s) : \delta \le |t|_p \le 1/\delta,\ \delta \le |s|_q \le 1/\delta\} \tag{2.20}$$

and random variables

$$\mathcal{V}^2_{n,\delta} = \int_{D(\delta)} |\xi_n(t, s)|^2\, d\omega.$$

For any fixed $\delta > 0$, the weight function $w(t, s)$ is bounded on $D(\delta)$. Hence $\mathcal{V}^2_{n,\delta}$ is a combination of V-statistics of bounded random variables. For each $\delta > 0$, by the strong law of large numbers (SLLN) for V-statistics, it follows that almost surely

$$\lim_{n\to\infty} \mathcal{V}^2_{n,\delta} = \mathcal{V}^2_{\delta} = \int_{D(\delta)} |f_{X,Y}(t, s) - f_X(t) f_Y(s)|^2\, d\omega.$$

Clearly $\mathcal{V}^2_{\delta}$ converges to $\mathcal{V}^2$ as $\delta$ tends to zero. Now it remains to prove that almost surely

$$\limsup_{\delta\to 0}\ \limsup_{n\to\infty}\ |\mathcal{V}^2_{n,\delta} - \mathcal{V}^2_n| = 0. \tag{2.21}$$

For each $\delta > 0$,

$$|\mathcal{V}^2_{n,\delta} - \mathcal{V}^2_n| \le \int_{\{|t|_p < \delta\} \cup \{|t|_p > 1/\delta\}} |\xi_n(t, s)|^2\, d\omega + \int_{\{|s|_q < \delta\} \cup \{|s|_q > 1/\delta\}} |\xi_n(t, s)|^2\, d\omega. \tag{2.22}$$

For $z = (z_1, z_2, \dots, z_p)$ in $\mathbb{R}^p$, define the function

$$G(y) = \int_{|z| < y} \frac{1 - \cos z_1}{|z|_p^{1+p}}\, dz.$$


Here $|v_k|^2 = 1 + |f_Y(s)|^2 - e^{i\langle s, Y_k\rangle}\overline{f_Y(s)} - e^{-i\langle s, Y_k\rangle} f_Y(s)$, thus

$$\int_{\mathbb{R}^q} \frac{|v_k|^2\, ds}{c_q |s|_q^{1+q}} = 2 E_Y|Y_k - Y| - E|Y - Y'| \le 2(|Y_k| + E|Y|),$$

where the expectation $E_Y$ is taken with respect to $Y$, and $Y' \overset{D}{=} Y$ is independent of $Y_k$. Further, after a suitable change of variables, an analogous bound holds for the integral in $t$.


Corollary 1. If $E(|X|_p + |Y|_q) < \infty$, then almost surely

$$\lim_{n\to\infty} \mathcal{R}^2_n(\mathbf{X}, \mathbf{Y}) = \mathcal{R}^2(X, Y).$$

The definition of dCor suggests that our distance dependence measures are analogous in at least some respects to the corresponding product-moment correlation. By analogy, certain properties of classical correlation and variance definitions should also hold for dCor and dVar. These properties are established in Theorems 3 and 4.

Theorem 3 (Properties of dCor).

(i) If $E(|X|_p + |Y|_q) < \infty$, then $0 \le \mathcal{R} \le 1$, and $\mathcal{R}(X, Y) = 0$ if and only if $X$ and $Y$ are independent.
(ii) $0 \le \mathcal{R}_n \le 1$.
(iii) If $\mathcal{R}_n(\mathbf{X}, \mathbf{Y}) = 1$, then there exist a vector $a$, a nonzero real number $b$ and an orthogonal matrix $C$ such that $\mathbf{Y} = a + b\mathbf{X}C$.

Proof. In (i), $\mathcal{R}(X, Y)$ exists whenever $X$ and $Y$ have finite first moments, and $X$ and $Y$ are independent if and only if the numerator

$$\mathcal{V}^2(X, Y) = \|f_{X,Y}(t, s) - f_X(t) f_Y(s)\|^2$$

of $\mathcal{R}^2(X, Y)$ is zero. Let $U = e^{i\langle t, X\rangle} - f_X(t)$ and $V = e^{i\langle s, Y\rangle} - f_Y(s)$. Then

$$|f_{X,Y}(t, s) - f_X(t) f_Y(s)|^2 = |E[UV]|^2 \le (E[|U|\,|V|])^2 \le E[|U|^2]\, E[|V|^2] = (1 - |f_X(t)|^2)(1 - |f_Y(s)|^2).$$

Thus

$$\int_{\mathbb{R}^{p+q}} |f_{X,Y}(t, s) - f_X(t) f_Y(s)|^2\, d\omega \le \int_{\mathbb{R}^{p+q}} (1 - |f_X(t)|^2)(1 - |f_Y(s)|^2)\, d\omega;$$

hence $0 \le \mathcal{R}(X, Y) \le 1$, and (ii) follows by a similar argument.

(iii) If $\mathcal{R}_n(\mathbf{X}, \mathbf{Y}) = 1$, then the arguments below show that $\mathbf{X}$ and $\mathbf{Y}$ are similar almost surely; thus the dimensions of the linear subspaces spanned by $\mathbf{X}$ and $\mathbf{Y}$, respectively, are almost surely equal. (Here similar means that $\mathbf{Y}$ and $\epsilon\mathbf{X}$ are isometric for some $\epsilon \ne 0$.) For simplicity we can suppose that $\mathbf{X}$ and $\mathbf{Y}$ are in the same Euclidean space and both span $\mathbb{R}^p$. From the Cauchy–Bunyakovsky inequality it is easy to see that $\mathcal{R}_n(\mathbf{X}, \mathbf{Y}) = 1$ if and only if $A_{kl} = \epsilon B_{kl}$ for some factor $\epsilon$. Suppose that $|\epsilon| = 1$. Then

$$|X_k - X_l|_p = |Y_k - Y_l|_q + d_k + d_l$$


for all $k, l$, for some constants $d_k, d_l$. Then with $k = l$ we obtain $d_k = 0$ for all $k$. Now one can apply a geometric argument: the two samples are isometric, so $\mathbf{Y}$ can be obtained from $\mathbf{X}$ through operations of shift, rotation and reflection, and hence $\mathbf{Y} = a + b\mathbf{X}C$ for some vector $a$, $b = \epsilon$ and orthogonal matrix $C$. If $|\epsilon| \ne 1$ and $\epsilon \ne 0$, apply the geometric argument to $\epsilon\mathbf{X}$ and $\mathbf{Y}$, and it follows that $\mathbf{Y} = a + b\mathbf{X}C$, where $b = \epsilon$. □

Theorem 4 (Properties of dVar). The following properties hold for random vectors with finite first moments:

(i) $\mathrm{dVar}(X) = 0$ implies that $X = E[X]$ almost surely.
(ii) $\mathrm{dVar}(a + bCX) = |b|\,\mathrm{dVar}(X)$ for all constant vectors $a$ in $\mathbb{R}^p$, scalars $b$, and $p \times p$ orthonormal matrices $C$.
(iii) $\mathrm{dVar}(X + Y) \le \mathrm{dVar}(X) + \mathrm{dVar}(Y)$ for independent random vectors $X$ in $\mathbb{R}^p$ and $Y$ in $\mathbb{R}^p$.

Proof. (i) If $\mathrm{dVar}(X) = 0$, then

$$(\mathrm{dVar}(X))^2 = \int_{\mathbb{R}^p}\int_{\mathbb{R}^p} \frac{|f_{X,X}(t, s) - f_X(t) f_X(s)|^2}{c_p^2\, |t|_p^{p+1}\, |s|_p^{p+1}}\, dt\, ds = 0,$$

or equivalently, $f_{X,X}(t, s) = f_X(t + s) = f_X(t) f_X(s)$ for all $t, s$. That is, $f_X(t) = e^{i\langle c, t\rangle}$ for some constant vector $c$, and hence $X$ is a constant vector almost surely.

Statement (ii) is obvious.

(iii) Let $\Delta = f_{X+Y,X+Y}(t, s) - f_{X+Y}(t) f_{X+Y}(s)$, $\Delta_1 = f_{X,X}(t, s) - f_X(t) f_X(s)$ and $\Delta_2 = f_{Y,Y}(t, s) - f_Y(t) f_Y(s)$. If $X$ and $Y$ are independent, then

$$\Delta = f_X(t + s) f_Y(t + s) - f_X(t) f_X(s) f_Y(t) f_Y(s)$$
$$= [f_X(t + s) - f_X(t) f_X(s)] f_Y(t + s) + f_X(t) f_X(s) [f_Y(t + s) - f_Y(t) f_Y(s)] = \Delta_1 f_Y(t + s) + f_X(t) f_X(s) \Delta_2,$$

and therefore $|\Delta|^2 \le |\Delta_1|^2 + |\Delta_2|^2 + 2|\Delta_1|\,|\Delta_2|$. Equality holds if and only if $\Delta_1 \Delta_2 = 0$, that is, if and only if $X$ or $Y$ is a constant almost surely. □

2.4. Asymptotic properties of $n\mathcal{V}^2_n$. Our proposed test of independence is based on the statistic $n\mathcal{V}^2_n/S_2$. If $E(|X|_p + |Y|_q) < \infty$, we prove that under independence $n\mathcal{V}^2_n/S_2$ converges in distribution to a quadratic form

$$Q \overset{D}{=} \sum_{j=1}^\infty \lambda_j Z_j^2, \tag{2.25}$$


where the $Z_j$ are independent standard normal random variables, the $\{\lambda_j\}$ are nonnegative constants that depend on the distribution of $(X, Y)$, and $E[Q] = 1$. A test of independence that rejects independence for large $n\mathcal{V}^2_n/S_2$ is statistically consistent against all alternatives with finite first moments.

Let $\zeta(\cdot)$ denote a complex-valued zero-mean Gaussian random process with covariance function

$$R(u, u_0) = (f_X(t - t_0) - f_X(t) f_X(t_0))(f_Y(s - s_0) - f_Y(s) f_Y(s_0)),$$

where $u = (t, s)$, $u_0 = (t_0, s_0) \in \mathbb{R}^p \times \mathbb{R}^q$.

Theorem 5 (Weak convergence). If $X$ and $Y$ are independent and $E(|X|_p + |Y|_q) < \infty$, then

$$n\mathcal{V}^2_n = \|\zeta_n\|^2 \overset{D}{\longrightarrow} \|\zeta\|^2,$$

where $\zeta_n = \sqrt{n}\,\xi_n$.

Proof. For $\delta > 0$ we construct a sequence of random variables $\{Q_n(\delta)\}$ with the following properties:

(i) $Q_n(\delta)$ converges in distribution to a random variable $Q(\delta)$;
(ii) $E|Q_n(\delta) - \|\zeta_n\|^2|$ is arbitrarily small for all sufficiently small $\delta$;
(iii) $E|Q(\delta) - \|\zeta\|^2|$ is arbitrarily small for all sufficiently small $\delta$.

Then the weak convergence of $\|\zeta_n\|^2$ to $\|\zeta\|^2$ follows from the convergence of the corresponding characteristic functions.

The sequence $Q_n(\delta)$ is defined as follows. Given $\epsilon > 0$, choose a partition $\{D_k\}_{k=1}^N$ of $D(\delta)$ (2.20) into $N = N(\epsilon)$ measurable sets with diameter at most $\epsilon$. Define

$$Q_n(\delta) = \sum_{k=1}^N \int_{D_k} |\zeta_n|^2\, d\omega.$$

For a fixed $M > 0$ let

$$\beta(\epsilon) = \sup_{u, u_0} E\big|\, |\zeta_n(u)|^2 - |\zeta_n(u_0)|^2\, \big|,$$

where the supremum is taken over all $u = (t, s)$ and $u_0 = (t_0, s_0)$ such that $\max\{|t|, |t_0|, |s|, |s_0|\} < M$ and $|t - t_0|^2 + |s - s_0|^2 < \epsilon^2$. Then $\lim_{\epsilon\to 0}\beta(\epsilon) = 0$ for every fixed $M > 0$, and for fixed $\delta > 0$,

$$E\bigg|\int_{D(\delta)} |\zeta_n(u)|^2\, d\omega - Q_n(\delta)\bigg| \le \beta(\epsilon)\int_{D(\delta)} d\omega \underset{\epsilon\to 0}{\longrightarrow} 0.$$


On the other hand,

$$\bigg|\int_{D(\delta)} |\zeta_n(u)|^2\, d\omega - \int_{\mathbb{R}^{p+q}} |\zeta_n(u)|^2\, d\omega\bigg| \le \int_{\{|t|_p < \delta\} \cup \{|t|_p > 1/\delta\}} |\zeta_n(u)|^2\, d\omega + \int_{\{|s|_q < \delta\} \cup \{|s|_q > 1/\delta\}} |\zeta_n(u)|^2\, d\omega.$$

By similar steps as in the proof of Theorem 2, one can derive that

$$E\int_{\{|t|_p < \delta\} \cup \{|t|_p > 1/\delta\}} |\zeta_n(u)|^2\, d\omega \le \frac{n-1}{n}\,\big(E|X_1 - X_2|\, G(|X_1 - X_2|\,\delta) + w_p\,\delta\big)\, E|Y_1 - Y_2| \underset{\delta\to 0}{\longrightarrow} 0,$$

where $w_p$ is a constant depending only on $p$, and similarly

$$E\int_{\{|s|_q < \delta\} \cup \{|s|_q > 1/\delta\}} |\zeta_n(u)|^2\, d\omega \underset{\delta\to 0}{\longrightarrow} 0.$$

Similar inequalities also hold for the random process $\zeta(t, s)$ with

$$Q(\delta) = \sum_{k=1}^N \int_{D_k} |\zeta(u)|^2\, d\omega.$$

The weak convergence of $Q_n(\delta)$ to $Q(\delta)$ as $n \to \infty$ follows from the multivariate central limit theorem, and therefore $n\mathcal{V}^2_n = n\|\xi_n\|^2 \overset{D}{\longrightarrow} \|\zeta\|^2$. □

Corollary 2. If $E(|X|_p + |Y|_q) < \infty$, then:

(i) If $X$ and $Y$ are independent, then $n\mathcal{V}^2_n/S_2 \overset{D}{\longrightarrow} Q$, where $Q$ is a nonnegative quadratic form of centered Gaussian random variables (2.25) and $E[Q] = 1$.
(ii) If $X$ and $Y$ are dependent, then $n\mathcal{V}^2_n/S_2 \overset{P}{\longrightarrow} \infty$.

Proof. (i) The independence of $X$ and $Y$ implies that $\zeta_n$, and thus $\zeta$, is a zero-mean process. According to Kuo [5], Chapter 1, Section 2, the squared norm $\|\zeta\|^2$ of the zero-mean Gaussian process $\zeta$ has the representation

$$\|\zeta\|^2 \overset{D}{=} \sum_{j=1}^\infty \lambda_j Z_j^2, \tag{2.26}$$

where the $Z_j$ are independent standard normal random variables, and the nonnegative constants $\{\lambda_j\}$ depend on the distribution of $(X, Y)$. Hence, under


independence, $n\mathcal{V}^2_n$ converges in distribution to a quadratic form (2.26). It follows from (2.5) that

$$E\|\zeta\|^2 = \int_{\mathbb{R}^{p+q}} R(u, u)\, d\omega = \int_{\mathbb{R}^{p+q}} (1 - |f_X(t)|^2)(1 - |f_Y(s)|^2)\, d\omega = E(|X - X'|_p\, |Y - Y'|_q).$$

By the SLLN for V-statistics, $S_2 \overset{\text{a.s.}}{\longrightarrow} E(|X - X'|_p\, |Y - Y'|_q)$. Therefore $n\mathcal{V}^2_n/S_2 \overset{D}{\longrightarrow} Q$, where $E[Q] = 1$ and $Q$ is the quadratic form (2.25).

(ii) Suppose that $X$ and $Y$ are dependent and $E(|X|_p + |Y|_q) < \infty$. Then $\mathcal{V}(X, Y) > 0$, Theorem 2 implies that $\mathcal{V}^2_n(\mathbf{X}, \mathbf{Y}) \overset{\text{a.s.}}{\longrightarrow} \mathcal{V}^2(X, Y) > 0$, and therefore $n\mathcal{V}^2_n(\mathbf{X}, \mathbf{Y}) \overset{P}{\longrightarrow} \infty$. By the SLLN, $S_2$ converges to a constant, and therefore $n\mathcal{V}^2_n/S_2 \overset{P}{\longrightarrow} \infty$. □

Theorem 6. Suppose $T(\mathbf{X}, \mathbf{Y}, \alpha, n)$ is the test that rejects independence if

$$\frac{n\mathcal{V}^2_n(\mathbf{X}, \mathbf{Y})}{S_2} > (\Phi^{-1}(1 - \alpha/2))^2,$$

where $\Phi(\cdot)$ denotes the standard normal cumulative distribution function, and let $\alpha(\mathbf{X}, \mathbf{Y}, n)$ denote the achieved significance level of $T(\mathbf{X}, \mathbf{Y}, \alpha, n)$. If $E(|X|_p + |Y|_q) < \infty$, then for all $0 < \alpha \le 0.215$:

(i) $\lim_{n\to\infty} \alpha(\mathbf{X}, \mathbf{Y}, n) \le \alpha$;
(ii) $\sup_{\mathbf{X},\mathbf{Y}}\{\lim_{n\to\infty} \alpha(\mathbf{X}, \mathbf{Y}, n) : \mathcal{V}(X, Y) = 0\} = \alpha$.

Proof. (i) The following inequality is proved as a special case of a theorem of Székely and Bakirov [9], page 189. If $Q$ is a quadratic form of centered Gaussian random variables with $E[Q] = 1$, then

$$P\{Q \ge (\Phi^{-1}(1 - \alpha/2))^2\} \le \alpha$$

for all $0 < \alpha \le 0.215$.

(ii) For Bernoulli random variables $X$ and $Y$ we have $\mathcal{R}_n(\mathbf{X}, \mathbf{Y}) = |\hat\rho(\mathbf{X}, \mathbf{Y})|$. By the central limit theorem, under independence $\sqrt{n}\,\hat\rho(\mathbf{X}, \mathbf{Y})$ is asymptotically normal. Thus, in case $X$ and $Y$ are independent Bernoulli variables, the quadratic form $Q$ contains only one term, $Q = Z_1^2$, and the upper bound $\alpha$ is achieved. □


Thus, a test rejecting independence of $X$ and $Y$ when $n\mathcal{V}^2_n/S_2 \ge (\Phi^{-1}(1 - \alpha/2))^2$ has an asymptotic significance level of at most $\alpha$. The asymptotic test criterion could be quite conservative for many distributions. Alternatively, one can estimate the critical value for the test by conditioning on the observed sample, which is discussed in Section 5.
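For example, at $\alpha = 0.1$ the asymptotic cutoff $(\Phi^{-1}(1 - \alpha/2))^2$ is about 2.706; a one-line computation (ours, for illustration):

```python
from scipy.stats import norm

alpha = 0.1  # Theorem 6 requires 0 < alpha <= 0.215
cutoff = norm.ppf(1 - alpha / 2) ** 2
print(cutoff)  # ~2.7055; reject independence when n * V_n^2 / S2 exceeds this
```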

Remark 3. If $E|X|^2_p < \infty$ and $E|Y|^2_q < \infty$, then $E[|X|_p |Y|_q] < \infty$, so by Lemma 1 and Fubini's theorem we can evaluate

$$\mathcal{V}^2(X, Y) = E[|X_1 - X_2|_p\, |Y_1 - Y_2|_q] + E|X_1 - X_2|_p\, E|Y_1 - Y_2|_q - 2E[|X_1 - X_2|_p\, |Y_1 - Y_3|_q].$$

If second moments exist, Theorem 2 and weak convergence can be established by V-statistic limit theorems [13]. Under the null hypothesis of independence, $\mathcal{V}^2_n$ is a degenerate kernel V-statistic. The first-order degeneracy follows from inequalities proved in [10]. Thus $n\mathcal{V}^2_n$ converges in distribution to a quadratic form (2.26).
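The moment form above is easy to check by Monte Carlo on independent copies $(X_1, Y_1), (X_2, Y_2), (X_3, Y_3)$ of the pair; the sketch below (our illustration, using an arbitrary dependent pair of our choosing) estimates the three expectations directly:

```python
import numpy as np

rng = np.random.default_rng(2)

def draw(n):
    """One i.i.d. copy of the pair: here Y = X + noise, an arbitrary dependent example."""
    x = rng.standard_normal(n)
    return x, x + 0.5 * rng.standard_normal(n)

n = 200_000
x1, y1 = draw(n)
x2, y2 = draw(n)
x3, y3 = draw(n)

t1 = np.mean(np.abs(x1 - x2) * np.abs(y1 - y2))           # E|X1-X2||Y1-Y2|
t2 = np.mean(np.abs(x1 - x2)) * np.mean(np.abs(y1 - y2))  # E|X1-X2| E|Y1-Y2|
t3 = np.mean(np.abs(x1 - x2) * np.abs(y1 - y3))           # E|X1-X2||Y1-Y3|
print(t1 + t2 - 2 * t3)  # Monte Carlo estimate of V^2(X, Y); positive here
```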

    3. Extensions.

3.1. The class of α-distance dependence measures. We introduce a one-parameter family of distance dependence measures indexed by a positive exponent $\alpha$. In our definition of dCor we have applied the exponent $\alpha = 1$.

Suppose that $E(|X|^\alpha_p + |Y|^\alpha_q) < \infty$. Let $\mathcal{V}^{(\alpha)}$ denote the α-distance covariance, which is the nonnegative number defined by

$$\mathcal{V}^{2(\alpha)}(X, Y) = \|f_{X,Y}(t, s) - f_X(t) f_Y(s)\|^2_\alpha = \frac{1}{C(p, \alpha)\, C(q, \alpha)} \int_{\mathbb{R}^{p+q}} \frac{|f_{X,Y}(t, s) - f_X(t) f_Y(s)|^2}{|t|_p^{\alpha+p}\, |s|_q^{\alpha+q}}\, dt\, ds.$$

Similarly, $\mathcal{R}^{(\alpha)}$ denotes α-distance correlation, which is the square root of

$$\mathcal{R}^{2(\alpha)} = \frac{\mathcal{V}^{2(\alpha)}(X, Y)}{\sqrt{\mathcal{V}^{2(\alpha)}(X)\,\mathcal{V}^{2(\alpha)}(Y)}}, \qquad 0 < \mathcal{V}^{2(\alpha)}(X),\, \mathcal{V}^{2(\alpha)}(Y) < \infty,$$

and $\mathcal{R}^{(\alpha)} = 0$ if $\mathcal{V}^{2(\alpha)}(X)\,\mathcal{V}^{2(\alpha)}(Y) = 0$.

The α-distance dependence statistics are defined by replacing the exponent 1 with exponent α in the distance dependence statistics (2.8), (2.9) and (2.10). That is, replace $a_{kl} = |X_k - X_l|_p$ with $a_{kl} = |X_k - X_l|^\alpha_p$ and replace $b_{kl} = |Y_k - Y_l|_q$ with $b_{kl} = |Y_k - Y_l|^\alpha_q$, for $k, l = 1, \dots, n$.

Theorem 2 can be generalized for $\|\cdot\|_\alpha$-norms, so that almost sure convergence of $\mathcal{V}^{(\alpha)}_n \to \mathcal{V}^{(\alpha)}$ follows if the α-moments are finite. Similarly, one can prove the weak convergence and statistical consistency for exponents $0 < \alpha < 2$, provided that α-moments are finite.


The case $\alpha = 2$ leads to the counterpart of classical correlation and covariance. In fact, if $p = q = 1$, then $\mathcal{R}^{(2)} = |\rho|$, $\mathcal{R}^{(2)}_n = |\hat\rho|$ and $\mathcal{V}^{(2)}_n = 2|\hat\sigma_{xy}|$, where $\hat\sigma_{xy}$ is the maximum likelihood estimator of $\mathrm{Cov}(X, Y)$.
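For $p = q = 1$ the identity $\mathcal{R}^{(2)}_n = |\hat\rho|$ is easy to verify empirically: squaring the distances before double centering reduces the statistic to the product-moment quantities. A sketch (ours):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(100)
y = 0.6 * x + rng.standard_normal(100)

def center(d):
    return d - d.mean(1, keepdims=True) - d.mean(0, keepdims=True) + d.mean()

a = np.abs(x[:, None] - x[None, :]) ** 2   # alpha = 2 "distances"
b = np.abs(y[:, None] - y[None, :]) ** 2
A, B = center(a), center(b)
r2 = (A * B).mean() / np.sqrt((A * A).mean() * (B * B).mean())
print(np.sqrt(r2), abs(np.corrcoef(x, y)[0, 1]))   # equal up to rounding
```

This also illustrates why $\alpha = 2$ falls outside the $0 < \alpha < 2$ family: with squared distances the statistic vanishes for uncorrelated but dependent variables, so only linear association is detected.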

3.2. Affine invariance. Group invariance is an important concept in statistical inference (see Eaton [3] or Giri [4]), particularly when any transformation of data and/or parameters by some group element constitutes an equivalent problem for inference. For the problem of testing independence, which is preserved under the group of affine transformations, it is natural to consider dependence measures that are affine invariant. Although $\mathcal{R}(X, Y)$ as defined by (2.7) is not affine invariant, it is clearly invariant with respect to the group of orthogonal transformations

$$X \mapsto a_1 + b_1 C_1 X, \qquad Y \mapsto a_2 + b_2 C_2 Y, \tag{3.1}$$

where $a_1, a_2$ are arbitrary vectors, $b_1, b_2$ are arbitrary nonzero numbers and $C_1, C_2$ are arbitrary orthogonal matrices. We can also define a distance correlation that is affine invariant.

For random samples $\mathbf{X}$ from the distribution of $X$ in $\mathbb{R}^p$ and $\mathbf{Y}$ from the distribution of $Y$ in $\mathbb{R}^q$, define the scaled samples $\mathbf{X}^*$ and $\mathbf{Y}^*$ by

$$\mathbf{X}^* = \mathbf{X} S_X^{-1/2}, \qquad \mathbf{Y}^* = \mathbf{Y} S_Y^{-1/2}, \tag{3.2}$$

where $S_X$ and $S_Y$ are the sample covariance matrices of $\mathbf{X}$ and $\mathbf{Y}$, respectively. Although the sample vectors in (3.2) are not invariant to affine transformations, the distances $|X^*_k - X^*_l|$ and $|Y^*_k - Y^*_l|$, $k, l = 1, \dots, n$, are invariant to affine transformations. Then the affine distance correlation statistic $\mathcal{R}^*_n(\mathbf{X}, \mathbf{Y})$ between random samples $\mathbf{X}$ and $\mathbf{Y}$ is the square root of

$$\mathcal{R}^{*2}_n(\mathbf{X}, \mathbf{Y}) = \frac{\mathcal{V}^2_n(\mathbf{X}^*, \mathbf{Y}^*)}{\sqrt{\mathcal{V}^2_n(\mathbf{X}^*)\,\mathcal{V}^2_n(\mathbf{Y}^*)}}.$$

Properties established in Section 2 also hold for $\mathcal{V}^*_n$ and $\mathcal{R}^*_n$, because the transformation (3.2) simply replaces the weight function $\{c_p c_q |t|_p^{1+p} |s|_q^{1+q}\}^{-1}$ with the weight function $\{c_p c_q |S_X^{1/2} t|_p^{1+p} |S_Y^{1/2} s|_q^{1+q}\}^{-1}$.
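Computationally, the affine version just whitens each sample by $S^{-1/2}$ (here via the symmetric inverse square root) before the distance computations. The sketch below (ours, with ad hoc helper names) also demonstrates the invariance numerically:

```python
import numpy as np

def _cdist(Z):
    d = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((d ** 2).sum(axis=-1))

def _ctr(D):
    return D - D.mean(1, keepdims=True) - D.mean(0, keepdims=True) + D.mean()

def dcor2(X, Y):
    A, B = _ctr(_cdist(X)), _ctr(_cdist(Y))
    return (A * B).mean() / np.sqrt((A * A).mean() * (B * B).mean())

def whiten(Z):
    # Z -> Z S^{-1/2}, with the symmetric inverse square root of the sample covariance
    vals, vecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    return Z @ vecs @ np.diag(vals ** -0.5) @ vecs.T

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 3))
Y = X @ rng.standard_normal((3, 2)) + 0.5 * rng.standard_normal((100, 2))

M = rng.standard_normal((3, 3))               # an (almost surely invertible) linear map on X
print(dcor2(whiten(X), whiten(Y)))
print(dcor2(whiten(X @ M + 1.0), whiten(Y)))  # same value: affine invariance
```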

4. Results for the bivariate normal distribution. Let $X$ and $Y$ have standard normal distributions with $\mathrm{Cov}(X, Y) = \rho(X, Y) = \rho$. Introduce the function

$$F(\rho) = \int_{\mathbb{R}^2} |f_{X,Y}(t, s) - f_X(t) f_Y(s)|^2\, \frac{dt}{t^2}\, \frac{ds}{s^2}.$$

Then $\mathcal{V}^2(X, Y) = F(\rho)/c_1^2 = F(\rho)/\pi^2$ and

$$\mathcal{R}^2(X, Y) = \frac{\mathcal{V}^2(X, Y)}{\sqrt{\mathcal{V}^2(X, X)\,\mathcal{V}^2(Y, Y)}} = \frac{F(\rho)}{F(1)}. \tag{4.1}$$


Theorem 7. If $X$ and $Y$ are standard normal, with correlation $\rho = \rho(X, Y)$, then

(i) $\mathcal{R}(X, Y) \le |\rho|$;
(ii) $\mathcal{R}^2(X, Y) = \dfrac{\rho \arcsin \rho + \sqrt{1 - \rho^2} - \rho \arcsin(\rho/2) - \sqrt{4 - \rho^2} + 1}{1 + \pi/3 - \sqrt{3}}$;
(iii) $\inf_{\rho \ne 0} \dfrac{\mathcal{R}(X, Y)}{|\rho|} = \lim_{\rho \to 0} \dfrac{\mathcal{R}(X, Y)}{|\rho|} = \dfrac{1}{2(1 + \pi/3 - \sqrt{3})^{1/2}} \approx 0.89066$.

Proof. (i) If $X$ and $Y$ are standard normal with correlation $\rho$, then

$$F(\rho) = \int_{\mathbb{R}^2} |e^{-(t^2+s^2)/2 - \rho t s} - e^{-t^2/2} e^{-s^2/2}|^2\, \frac{dt}{t^2}\, \frac{ds}{s^2}$$
$$= \int_{\mathbb{R}^2} e^{-t^2 - s^2}(1 - 2e^{-\rho t s} + e^{-2\rho t s})\, \frac{dt}{t^2}\, \frac{ds}{s^2}$$
$$= \int_{\mathbb{R}^2} e^{-t^2 - s^2} \sum_{n=2}^\infty \frac{2^n - 2}{n!}\, (-\rho t s)^n\, \frac{dt}{t^2}\, \frac{ds}{s^2}$$
$$= \int_{\mathbb{R}^2} e^{-t^2 - s^2} \sum_{k=1}^\infty \frac{2^{2k} - 2}{(2k)!}\, (\rho t s)^{2k}\, \frac{dt}{t^2}\, \frac{ds}{s^2}$$
$$= \rho^2 \sum_{k=1}^\infty \frac{2^{2k} - 2}{(2k)!}\, \rho^{2(k-1)} \int_{\mathbb{R}^2} e^{-t^2 - s^2} (t s)^{2(k-1)}\, dt\, ds,$$

where the odd-order terms vanish because the integrand is odd in $t$ for odd $n$.

Thus $F(\rho) = \rho^2 G(\rho)$, where $G(\rho)$ is a sum with all nonnegative terms. The function $G(\rho)$ is clearly nondecreasing in $|\rho|$ and $G(\rho) \le G(1)$. Therefore

$$\mathcal{R}^2(X, Y) = \frac{F(\rho)}{F(1)} = \frac{\rho^2 G(\rho)}{G(1)} \le \rho^2,$$

or equivalently, $\mathcal{R}(X, Y) \le |\rho|$.

(ii) Note that $F(0) = F'(0) = 0$, so $F(\rho) = \int_0^\rho \int_0^x F''(z)\, dz\, dx$. The second derivative of $F$ is

$$F''(z) = \frac{d^2}{dz^2} \int_{\mathbb{R}^2} e^{-t^2 - s^2}(1 - 2e^{-z t s} + e^{-2 z t s})\, \frac{dt}{t^2}\, \frac{ds}{s^2} = 4V(z) - 2V\Big(\frac{z}{2}\Big),$$

where

$$V(z) = \int_{\mathbb{R}^2} e^{-t^2 - s^2 - 2 z t s}\, dt\, ds = \frac{\pi}{\sqrt{1 - z^2}}.$$

Here we have applied a change of variables, used the fact that the eigenvalues of the quadratic form $t^2 + s^2 + 2 z t s$ are $1 \pm z$, and $\int_{\mathbb{R}} e^{-\lambda t^2}\, dt = (\pi/\lambda)^{1/2}$. Then

$$F(\rho) = \int_0^\rho \int_0^x \bigg(\frac{4\pi}{\sqrt{1 - z^2}} - \frac{2\pi}{\sqrt{1 - z^2/4}}\bigg)\, dz\, dx = 4\pi \int_0^\rho \big(\arcsin x - \arcsin(x/2)\big)\, dx$$
$$= 4\pi\big(\rho \arcsin \rho + \sqrt{1 - \rho^2} - \rho \arcsin(\rho/2) - \sqrt{4 - \rho^2} + 1\big),$$

and (4.1) implies (ii).

(iii) In the proof of (i) we have that $\mathcal{R}/|\rho|$ is a nondecreasing function of $|\rho|$, and $\lim_{|\rho|\to 0} \mathcal{R}(X, Y)/|\rho| = (1 + \pi/3 - \sqrt{3})^{-1/2}/2$ follows from (ii). □

Fig. 1. Dependence coefficient $\mathcal{R}^2$ (solid line) and correlation $\rho^2$ (dashed line) in the bivariate normal case.

The relation between $\mathcal{R}$ and $\rho$ derived in Theorem 7 is shown by the plot of $\mathcal{R}^2$ versus $\rho^2$ in Figure 1.
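Expression (ii) of Theorem 7 is straightforward to evaluate; the sketch below (ours) reproduces the endpoint $\mathcal{R}^2(1) = 1$ and the limiting ratio of (iii):

```python
import numpy as np

def dcor2_bvn(rho):
    """R^2(X, Y) for standard bivariate normal with correlation rho (Theorem 7(ii))."""
    r = np.asarray(rho, dtype=float)
    num = (r * np.arcsin(r) + np.sqrt(1 - r ** 2)
           - r * np.arcsin(r / 2) - np.sqrt(4 - r ** 2) + 1)
    return num / (1 + np.pi / 3 - np.sqrt(3))

print(dcor2_bvn(1.0))                    # 1.0
rho = 1e-3
print(np.sqrt(dcor2_bvn(rho)) / rho)     # ~0.89066, the infimum in Theorem 7(iii)
```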

5. Empirical results. In this section we summarize Monte Carlo power comparisons of our proposed distance covariance test with three classical tests for multivariate independence. The likelihood ratio test (LRT) of the hypothesis $H_0 : \Sigma_{12} = 0$, with $\Sigma$ unknown, is based on

$$\Lambda = \frac{\det(S)}{\det(S_{11})\,\det(S_{22})} = \frac{\det(S_{22} - S_{21} S_{11}^{-1} S_{12})}{\det(S_{22})}, \tag{5.1}$$

where $\det(\cdot)$ is the determinant, $S$, $S_{11}$ and $S_{22}$ denote the sample covariances of $(\mathbf{X}, \mathbf{Y})$, $\mathbf{X}$ and $\mathbf{Y}$, respectively, and $S_{12}$ is the sample covariance $\widehat{\mathrm{Cov}}(\mathbf{X}, \mathbf{Y})$.

  • 7/28/2019 Distance Correlation

    20/26

    20 G. J. SZEKELY, M. L. RIZZO AND N. K. BAKIROV

Table 1
Empirical Type-I error rates for 10,000 tests at nominal significance level 0.1 in Example 1 (ρ = 0), using B replicates for V and Bartlett's approximation for W, T and S

(a) Multivariate normal, p = q = 5 | (b) t(1), p = q = 5

  n    B  |   V      W      T      S    |   V      W      T      S
 25  400  | 0.1039 0.1089 0.1212 0.1121 | 0.1010 0.3148 0.1137 0.1120
 30  366  | 0.0992 0.0987 0.1145 0.1049 | 0.0984 0.3097 0.1102 0.1078
 35  342  | 0.0977 0.1038 0.1091 0.1011 | 0.1060 0.3102 0.1087 0.1054
 50  300  | 0.0990 0.0953 0.1052 0.1011 | 0.1036 0.2904 0.1072 0.1037
 70  271  | 0.1001 0.0983 0.1031 0.1004 | 0.1000 0.2662 0.1013 0.0980
100  250  | 0.1019 0.0954 0.0985 0.0972 | 0.1025 0.2433 0.0974 0.1019

(c) t(2), p = q = 5 | (d) t(3), p = q = 5

  n    B  |   V      W      T      S    |   V      W      T      S
 25  400  | 0.1037 0.1612 0.1217 0.1203 | 0.1001 0.1220 0.1174 0.1134
 30  366  | 0.0959 0.1636 0.1070 0.1060 | 0.1017 0.1224 0.1143 0.1062
 35  342  | 0.0998 0.1618 0.1080 0.1073 | 0.1074 0.1213 0.1131 0.1047
 50  300  | 0.1033 0.1639 0.1010 0.0969 | 0.1050 0.1166 0.1065 0.1046
 70  271  | 0.1029 0.1590 0.1063 0.0994 | 0.1037 0.1200 0.1020 0.1002
100  250  | 0.0985 0.1560 0.1050 0.1007 | 0.1019 0.1176 0.1066 0.1033

Under multivariate normality,

$$W = -n \log \Lambda = -n \log \det(I - S_{22}^{-1} S_{21} S_{11}^{-1} S_{12})$$

has the Wilks Lambda distribution $\Lambda(q, n - 1 - p, p)$ [14]. Puri and Sen [8], Chapter 8, proposed similar tests based on more general sample dispersion matrices $T = (T_{ij})$. The Puri–Sen tests replace $S$, $S_{11}$, $S_{12}$ and $S_{22}$ in (5.1) with $T$, $T_{11}$, $T_{12}$ and $T_{22}$. For example, $T$ can be a matrix of Spearman's rank correlation statistics. For a sign test the dispersion matrix has entries

$$\frac{1}{n}\sum_{j=1}^n \mathrm{sign}(Z_{jk} - \hat{Z}_k)\,\mathrm{sign}(Z_{jm} - \hat{Z}_m),$$

where $\hat{Z}_k$ is the sample median of the $k$th variable. Critical values of the Wilks Lambda and Puri–Sen statistics are given by Bartlett's approximation: if $n$ is large and $p, q > 2$, then

$$-\Big(n - \tfrac{1}{2}(p + q + 3)\Big) \log \det(I - S_{22}^{-1} S_{21} S_{11}^{-1} S_{12})$$

has an approximate $\chi^2(pq)$ distribution ([6], Section 5.3.2b).
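For reference, the Bartlett-approximate Wilks test takes only a few lines (a sketch under the stated large-$n$, $p, q > 2$ conditions; our code, not the simulation program used for the tables):

```python
import numpy as np
from scipy.stats import chi2

def wilks_bartlett(X, Y, alpha=0.1):
    """Bartlett-approximate Wilks test of H0: Sigma_12 = 0 (large n, p, q > 2)."""
    n, p = X.shape
    q = Y.shape[1]
    S = np.cov(np.hstack([X, Y]), rowvar=False)
    S11, S12 = S[:p, :p], S[:p, p:]
    S21, S22 = S[p:, :p], S[p:, p:]
    M = np.eye(q) - np.linalg.solve(S22, S21) @ np.linalg.solve(S11, S12)
    stat = -(n - (p + q + 3) / 2) * np.log(np.linalg.det(M))
    return stat, stat > chi2.ppf(1 - alpha, p * q)

rng = np.random.default_rng(5)
X, Y = rng.standard_normal((100, 5)), rng.standard_normal((100, 5))
print(wilks_bartlett(X, Y))  # under H0, rejects in roughly 10% of repetitions
```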

To implement the distance covariance test for small samples, we obtain a reference distribution for $n\mathcal{V}^2_n$ under independence by conditioning on the observed sample, that is, by computing replicates of $n\mathcal{V}^2_n$ under random permutations of the indices of the $\mathbf{Y}$ sample. We obtain good control of Type-I error (see Table 1) with a small number of replicates; for this study we used $200 + \lfloor 5000/n \rfloor$ replicates. Implementation is straightforward due to the simple form of the statistic. The statistic $n\mathcal{V}^2_n$ has $O(n^2)$ time and space computational complexity. Source code for the test implementation is available from the authors upon request.
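A permutation implementation needs only the statistic and an index shuffle; here is a sketch (ours, not the authors' source code; the replicate count follows the 200 + ⌊5000/n⌋ rule quoted above):

```python
import numpy as np

def dcov_test(X, Y, rng, n_perm=None):
    """Permutation test of independence based on n * V_n^2."""
    n = len(X)
    if n_perm is None:
        n_perm = 200 + 5000 // n          # replicate rule used in the study
    d = lambda Z: np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1))
    c = lambda D: D - D.mean(1, keepdims=True) - D.mean(0, keepdims=True) + D.mean()
    A, B = c(d(X)), c(d(Y))
    stat = n * (A * B).mean()
    # Permuting the Y sample indices permutes the rows and columns of B identically.
    reps = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.permutation(n)
        reps[i] = n * (A * B[np.ix_(idx, idx)]).mean()
    pval = (1 + (reps >= stat).sum()) / (1 + n_perm)
    return stat, pval

rng = np.random.default_rng(7)
X = rng.standard_normal((30, 5))
print(dcov_test(X, X * rng.standard_normal((30, 5)), rng))  # small p-value expected
```

Since $S_2$ is invariant under permutation of the $\mathbf{Y}$ indices, basing the test on $n\mathcal{V}^2_n$ or on $n\mathcal{V}^2_n/S_2$ yields the same permutation p-value.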

Each example compares the empirical power of the dCov test (labeled V) with the Wilks Lambda statistic (W), the Puri–Sen rank correlation statistic (S) and the Puri–Sen sign statistic (T). Empirical power is computed as the proportion of significant tests on 10,000 random samples at significance level 0.1.

Fig. 2. Example 1(a): Empirical power at 0.1 significance and sample size n, multivariate normal alternative.

Example 1. In 1(a) the marginal distributions of $\mathbf{X}$ and $\mathbf{Y}$ are standard multivariate normal in dimensions $p = q = 5$ and $\mathrm{Cov}(X_k, Y_l) = \rho$ for $k, l = 1, \dots, 5$. The results displayed in Figure 2 are based on 10,000 tests for each of the sample sizes $n = 25{:}50{:}1$, $55{:}100{:}5$, $110{:}200{:}10$ with $\rho = 0.1$. As expected, the Wilks LRT is optimal in this case, but the power of the dCov test is quite close to W. Table 1 gives empirical Type-I error rates for this example when $\rho = 0$.

In Examples 1(b)–1(d) we repeat 1(a) under identical conditions except that the random variables $X_k$ and $Y_l$ are generated from the $t(\nu)$ distribution. Table 1 gives empirical Type-I error rates for $\nu = 1, 2, 3$ when $\rho = 0$. Empirical power for the alternative $\rho = 0.1$ is compared in Figures 3–5. (The Wilks LRT has inflated Type-I error for $\nu = 1, 2, 3$, so a power comparison with W is not meaningful, particularly for $\nu = 1, 2$.)

Example 2. The distribution of $X$ is standard multivariate normal ($p = 5$), and $Y_{kj} = X_{kj}\,\epsilon_{kj}$, $j = 1, \dots, p$, where the $\epsilon_{kj}$ are independent standard normal variables, independent of $\mathbf{X}$. Comparisons are based on 10,000 tests for each of the sample sizes $n = 25{:}50{:}1$, $55{:}100{:}5$, $110{:}240{:}10$. The results displayed in Figure 6 show that the dCov test is clearly superior to the LRT tests. This alternative is an example where the rank correlation and sign tests do not exhibit power increasing with sample size.

  • 7/28/2019 Distance Correlation

    22/26

    22 G. J. SZEKELY, M. L. RIZZO AND N. K. BAKIROV

Fig. 3. Example 1(b): Empirical power at 0.1 significance and sample size n, t(1) alternative.

Fig. 4. Example 1(c): Empirical power at 0.1 significance and sample size n, t(2) alternative.

Example 3. The distribution of $X$ is standard multivariate normal ($p = 5$), and $Y_{kj} = \log(X^2_{kj})$, $j = 1, \dots, p$. Comparisons are based on 10,000 tests for each of the sample sizes $n = 25{:}50{:}1$, $55{:}100{:}5$. Simulation results are


Fig. 5. Example 1(d): Empirical power at 0.1 significance and sample size n, t(3) alternative.

Fig. 6. Example 2: Empirical power at 0.1 significance and sample size n against the alternative $Y = X\epsilon$.

displayed in Figure 7. This is an example of a nonlinear relation where $n\mathcal{V}^2_n$ achieves very good power while none of the LRT-type tests performs well.


Fig. 7. Example 3: Empirical power at 0.1 significance and sample size n against the alternative $Y = \log(X^2)$.
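The alternatives of Examples 2 and 3 are easy to regenerate for experimentation; a sketch (ours; dcov_test refers to the permutation sketch earlier in this section):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 50, 5
X = rng.standard_normal((n, p))

Y2 = X * rng.standard_normal((n, p))  # Example 2: Y_kj = X_kj * eps_kj
Y3 = np.log(X ** 2)                   # Example 3: Y_kj = log(X_kj^2)

# With the permutation test sketched above:
# print(dcov_test(X, Y2, rng)[1], dcov_test(X, Y3, rng)[1])  # both p-values small
```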

6. Summary. We have introduced new distance measures of dependence: dCov, analogous to covariance, and dCor, analogous to correlation, defined for all random vectors with finite first moments. The dCov test of multivariate independence based on the statistic $n\mathcal{V}^2_n$ is statistically consistent against all dependent alternatives with finite expectation. Empirical results suggest that the dCov test may be more powerful than the parametric LRT when the dependence structure is nonlinear, while in the multivariate normal case the dCov test was quite close in power to the likelihood ratio test. Our proposed statistics are sensitive to all types of departures from independence, including nonlinear or nonmonotone dependence structures.

APPENDIX

Proof of statement (2.18). By definition (2.8),

$$n^2 \mathcal{V}^2_n = \sum_{k,l=1}^n \big[ a_{kl} b_{kl} - a_{kl}\bar{b}_{k\cdot} - a_{kl}\bar{b}_{\cdot l} + a_{kl}\bar{b}_{\cdot\cdot} - \bar{a}_{k\cdot} b_{kl} + \bar{a}_{k\cdot}\bar{b}_{k\cdot} + \bar{a}_{k\cdot}\bar{b}_{\cdot l} - \bar{a}_{k\cdot}\bar{b}_{\cdot\cdot}$$
$$\qquad\quad - \bar{a}_{\cdot l} b_{kl} + \bar{a}_{\cdot l}\bar{b}_{k\cdot} + \bar{a}_{\cdot l}\bar{b}_{\cdot l} - \bar{a}_{\cdot l}\bar{b}_{\cdot\cdot} + \bar{a}_{\cdot\cdot} b_{kl} - \bar{a}_{\cdot\cdot}\bar{b}_{k\cdot} - \bar{a}_{\cdot\cdot}\bar{b}_{\cdot l} + \bar{a}_{\cdot\cdot}\bar{b}_{\cdot\cdot} \big], \tag{A.1}$$

and in the notation $\tilde{a}_k = n\bar{a}_{k\cdot}$, $\tilde{a}_l = n\bar{a}_{\cdot l}$, $\tilde{b}_k = n\bar{b}_{k\cdot}$ and $\tilde{b}_l = n\bar{b}_{\cdot l}$, each of the sixteen sums in (A.1) reduces to a multiple of one of the quantities $S_1$, $S_2$, $S_3$. Applying the identities

$$S_1 = \frac{1}{n^2}\sum_{k,l=1}^n a_{kl} b_{kl}, \tag{A.2}$$
$$S_2 = \frac{1}{n^2}\sum_{k,l=1}^n a_{kl} \cdot \frac{1}{n^2}\sum_{k,l=1}^n b_{kl} = \bar{a}_{\cdot\cdot}\,\bar{b}_{\cdot\cdot}, \tag{A.3}$$
$$n^2 S_2 = n^2 \bar{a}_{\cdot\cdot}\bar{b}_{\cdot\cdot} = \frac{\sum_{k=1}^n \tilde{a}_k}{n} \cdot \frac{\sum_{l=1}^n \tilde{b}_l}{n} = \frac{1}{n^2}\sum_{k,l=1}^n \tilde{a}_k \tilde{b}_l, \tag{A.4}$$
$$S_3 = \frac{1}{n^3}\sum_{k=1}^n \sum_{l,m=1}^n a_{kl} b_{km} = \frac{1}{n^3}\sum_{k=1}^n \tilde{a}_k \tilde{b}_k = \frac{1}{n}\sum_{k=1}^n \bar{a}_{k\cdot}\bar{b}_{k\cdot} \tag{A.5}$$

to (A.1), we obtain

$$n^2\mathcal{V}^2_n = n^2 S_1 - n^2 S_3 - n^2 S_3 + n^2 S_2 - n^2 S_3 + n^2 S_3 + n^2 S_2 - n^2 S_2 - n^2 S_3 + n^2 S_2 + n^2 S_3 - n^2 S_2 + n^2 S_2 - n^2 S_2 - n^2 S_2 + n^2 S_2 = n^2(S_1 + S_2 - 2S_3). \qquad \square$$

Acknowledgments. The authors are grateful to the Associate Editor and referee for many suggestions that greatly improved the paper.

    REFERENCES

[1] Albert, P. S., Ratnasinghe, D., Tangrea, J. and Wacholder, S. (2001). Limitations of the case-only design for identifying gene-environment interactions. Amer. J. Epidemiol. 154 687–693.

[2] Bakirov, N. K., Rizzo, M. L. and Székely, G. J. (2006). A multivariate nonparametric test of independence. J. Multivariate Anal. 97 1742–1756. MR2298886

[3] Eaton, M. L. (1989). Group Invariance Applications in Statistics. IMS, Hayward, CA. MR1089423

[4] Giri, N. C. (1996). Group Invariance in Statistical Inference. World Scientific, River Edge, NJ. MR1626154

[5] Kuo, H. H. (1975). Gaussian Measures in Banach Spaces. Lecture Notes in Math. 463. Springer, Berlin. MR0461643

[6] Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis. Academic Press, London. MR0560319


[7] Potvin, C. and Roff, D. A. (1993). Distribution-free and robust statistical methods: Viable alternatives to parametric statistics? Ecology 74 1617–1628.

[8] Puri, M. L. and Sen, P. K. (1971). Nonparametric Methods in Multivariate Analysis. Wiley, New York. MR0298844

[9] Székely, G. J. and Bakirov, N. K. (2003). Extremal probabilities for Gaussian quadratic forms. Probab. Theory Related Fields 126 184–202. MR1990053

[10] Székely, G. J. and Rizzo, M. L. (2005). A new test for multivariate normality. J. Multivariate Anal. 93 58–80. MR2119764

[11] Székely, G. J. and Rizzo, M. L. (2005). Hierarchical clustering via joint between-within distances: Extending Ward's minimum variance method. J. Classification 22 151–183. MR2231170

[12] Tracz, S. M., Elmore, P. B. and Pohlmann, J. T. (1992). Correlational meta-analysis: Independent and nonindependent cases. Educational and Psychological Measurement 52 879–888.

[13] von Mises, R. (1947). On the asymptotic distribution of differentiable statistical functionals. Ann. Math. Statist. 18 309–348. MR0022330

[14] Wilks, S. S. (1935). On the independence of k sets of normally distributed statistical variables. Econometrica 3 309–326.

G. J. Székely
Department of Mathematics and Statistics
Bowling Green State University
Bowling Green, Ohio 43403
USA
and
Rényi Institute of Mathematics
Hungarian Academy of Sciences
Hungary
E-mail: [email protected]

M. L. Rizzo
Department of Mathematics and Statistics
Bowling Green State University
Bowling Green, Ohio 43403
USA
E-mail: [email protected]

N. K. Bakirov
Institute of Mathematics
USC Russian Academy of Sciences
112 Chernyshevskii St.
450000 Ufa
Russia
