
A JACKKNIFE EMPIRICAL LIKELIHOOD APPROACH TO GOODNESS-OF-FIT DEGENERATE U-STATISTICS TESTING

By Hanxiang Peng∗

Indiana University Purdue University at Indianapolis

Motivated by applications to goodness-of-fit testing with degenerate U-statistics, the jackknife empirical likelihood approach for U-statistics is generalized to degenerate U-statistics. The proposed empirical likelihood based goodness-of-fit tests are asymptotically distribution free. The asymptotic theory is presented for testing a shift in location, spatial depth testing for central symmetry, a Cramér-von Mises type statistic based on the M-distribution for testing a specific distribution, the Cramér-von Mises statistic for testing a specific distribution, and simplicial depth testing for angular symmetry.

1. Introduction. The empirical likelihood approach was introduced by Owen (1988, 1990) to construct confidence intervals in a nonparametric setting; see also Owen (2001). As a likelihood approach possessing nonparametric properties, it does not require us to specify a distribution for the data and often yields more efficient estimates of the parameters. It allows the data to decide the shape of confidence regions and is Bartlett correctable (DiCiccio, Hall and Romano, 1991). The approach has been developed for various situations, e.g., generalized linear models (Kolaczyk, 1994), the local linear smoother (Chen and Qin, 2000), partially linear models (Shi and Lau, 2000; Wang and Jing, 2003), parametric and semiparametric models in multiresponse regression (Chen and Van Keilegom, 2009), linear regression with censored data (Zhou and Li, 2008), and plug-in estimates of nuisance parameters in estimating equations in the context of survival analysis (Qin and Jing, 2001; Wang and Jing, 2001; Li and Wang, 2003). Algorithms, calibration and higher-order precision of the approach can be found in Hall and La Scala (1990), Emerson and Owen (2009) and Liu and Chen (2010), among others. It is especially convenient for incorporating side information expressed through equality constraints. Qin and Lawless (1994) linked empirical likelihood with finitely many estimating equations. These estimating equations serve as finitely many equality constraints.

∗Supported by NSF Grant DMS 0940365.
AMS 2000 subject classifications: Primary 62G10; secondary 62G15, 62G20.
Keywords and phrases: infinitely many constraints; nuisance parameter; estimated constraint functions; Kendall's τ test for independence; Theil's test for slope; U-statistics; Wilcoxon signed rank test.




Recently Hjort, McKeague and Van Keilegom (2009) extended the scope of the empirical likelihood method. In particular, they developed a general theory for constraints with nuisance parameters and considered the case with infinitely many constraints. Their results for infinitely many constraints, however, do not allow for nuisance parameters. In this paper we will fill this gap and in the process improve on their results.

Our paper is organized as follows. In Section 2, we give several important examples that motivate our research. The emphasis in these examples is on goodness-of-fit testing. The proposed empirical likelihood based goodness-of-fit tests are asymptotically distribution free. In Section 3, we justify the definition of the jackknife empirical likelihood for U-statistics. In Section 4, Wilks theorems for degenerate U-statistics are developed for fixed dimension with estimated constraints. In Section 5, we discuss the jackknife empirical likelihood for degenerate U-statistics when side information is available.

2. Motivating Examples. In this section, we give examples that motivated the research in this paper.

Example 1. Testing Shift of Location. Suppose that $(X_i, Y_i)$, $i = 1, \dots, n$ is a random sample from a continuous distribution $F$, that $(U_i, V_i)$, $i = 1, \dots, n$ is a random sample from a continuous distribution $G$, and that the two random samples are independent. One is interested in testing the null hypothesis that the location parameters of the two distributions are identical against the alternative that one location is a shift of the other, i.e., $H_0: \theta_F = \theta_G$ against $H_1: \theta_F = \theta_G + \delta$, where $\delta \neq 0$. Assume the second moments of $F$ and $G$ are finite and identical. Mathur and Smith (2008) proposed the degenerate U-statistic
$$U_n(h_{\mathrm{mat}}) = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} h_{\mathrm{mat}}(S_i, S_j),$$
where $h_{\mathrm{mat}}(s_1, s_2) = s_1 \cdot s_2 - (x_1, y_1, x_2, y_2) \cdot (u_2, v_2, u_1, v_1)$ with $s_1 = (x_1, y_1, u_1, v_1)$ and $s_2 = (x_2, y_2, u_2, v_2)$. To simplify the notation, let us assume $\theta_F = \theta_G = 0$. Then it can be shown that under the null hypothesis,
$$nU_n(h_{\mathrm{mat}}) \Longrightarrow \xi_{\mathrm{mat}} = \lambda_1(Z_1^2 - 1) + \lambda_2(Z_2^2 - 1),$$


where $\lambda_{1,2} = 0.5\,E(U^2 + V^2) \pm \big[(E(U^2 - V^2))^2 + 4(E(UV))^2\big]^{1/2}$. Hence it follows from Theorem 4.2 that under the null hypothesis,
$$-2\log\mathcal{R}_n(h_{\mathrm{mat}}) \Longrightarrow (1/4\sigma^2_{h_{\mathrm{mat}}})\,\xi_{\mathrm{mat}}^2,$$
provided that $X, Y, U, V$ have finite eighth moments.
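For concreteness, here is a minimal Python sketch of $U_n(h_{\mathrm{mat}})$; the helper names (`h_mat`, `u_stat_hmat`) and the simulated data are ours for illustration, not part of the paper.

```python
import numpy as np

def h_mat(s1, s2):
    # h_mat(s1, s2) = s1 . s2 - (x1, y1, x2, y2) . (u2, v2, u1, v1),
    # where s = (x, y, u, v) stacks one observation from each sample.
    x1, y1, u1, v1 = s1
    x2, y2, u2, v2 = s2
    return s1 @ s2 - np.dot([x1, y1, x2, y2], [u2, v2, u1, v1])

def u_stat_hmat(S):
    # U_n(h_mat) averaged over all pairs i < j.
    n = len(S)
    total = sum(h_mat(S[i], S[j]) for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2.0)

rng = np.random.default_rng(0)
n = 50
X, Y = rng.normal(size=n), rng.normal(size=n)   # sample from F
U, V = rng.normal(size=n), rng.normal(size=n)   # sample from G
S = np.column_stack([X, Y, U, V])
print(n * u_stat_hmat(S))   # n * U_n, to compare with the weighted chi-square limit
```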

Example 2. Spatial Depth Testing for Central Symmetry. Let $X_1, \dots, X_n$ be independent $\mathbb{R}^d$-valued random variables with a common distribution $F$. The (sample) spatial depth function (Chaudhuri, 1996) is given by
$$\mathrm{SD}_n(x) = 1 - \Big|\frac{1}{n}\sum_{j=1}^n \psi(X_j - x)\Big|,$$
where $\psi(x) = x/|x|$ for $x \in \mathbb{R}^d$, $x \neq 0$, and $\psi(0) = 0$ is the spatial sign (unit) function. The (population) spatial depth function is $\mathrm{SD}(x) = 1 - |E(\psi(X - x))|$. Any vector that maximizes the spatial depth is called a spatial median. Suppose $F$ is centrally symmetric about $\theta$, so that $X - \theta$ and $\theta - X$ are identically distributed. Milasevic and Ducharme (1987) showed that the center $\theta$ of central symmetry is the unique spatial median if the distribution $F$ of $X$ is not concentrated on a line in $\mathbb{R}^d$, and clearly $\mathrm{SD}(\theta) = 1$. We shall use this property to test multivariate central symmetry. Our approach is empirical likelihood. Were we to take the usual procedure of constructing the empirical likelihood, we would end up with nonlinear constraint equations. Specifically, because $\mathrm{SD}(x) = 1 - |E(\psi(X - x))|$, we would confront the empirical likelihood
$$\mathcal{R}_n = \sup\Big\{\prod_{j=1}^n n\pi_j : \pi \in \mathcal{P}_n,\ \Big|\sum_{j=1}^n \pi_j\psi(X_j - x)\Big| = 1 - \mathrm{SD}(x)\Big\}.$$
This form of empirical likelihood involves equations nonlinear in the weights $\pi_j$'s, with the consequence that we cannot even find an explicit formula for the weights $\pi_j$'s as we usually do. Using U-statistics, we can avoid the nonlinear constraint equations and work with the usual linear constraints in the weights, so that we obtain the usual form of empirical likelihood. This is made possible by the simple identity
$$|E(\psi(X - x))|^2 = E(\psi(X - x))^\top E(\psi(X - x)) = E\big(\psi(X_1 - x)^\top\psi(X_2 - x)\big).$$

Thus we take $h_{\mathrm{sd}}(x_1, x_2 \mid x) = \psi(x_1 - x)^\top\psi(x_2 - x)$ as the kernel for fixed $x$ and calculate the jackknife values $V_{nj}$'s of the U-statistic
$$U_n(h_{\mathrm{sd}}; x) = \binom{n}{2}^{-1}\sum_{1 \le i < j \le n} h_{\mathrm{sd}}(X_i, X_j \mid x)$$


and look at the empirical likelihood
$$\mathcal{R}_n(h_{\mathrm{sd}}; x) = \sup\Big\{\prod_{j=1}^n n\pi_j : \pi \in \mathcal{P}_n,\ \sum_{j=1}^n \pi_j V_{nj} = (1 - \mathrm{SD}(x))^2\Big\}.$$
Here we will study the case that $x$ is the center $\theta$ of central symmetry, so that $\mathrm{SD}(\theta) = 1$. In this case, the kernel is degenerate; indeed, since under the null hypothesis of symmetry $E(\psi(X_2 - \theta)) = 0$, it follows that
$$E(h_{\mathrm{sd}}(x_1, X_2)) = \psi(x_1 - \theta)^\top E(\psi(X_2 - \theta)) = 0.$$
Thus the corresponding U-statistic $U_n(h_{\mathrm{sd}}) = U_n(h_{\mathrm{sd}}; \theta)$ is degenerate, and with it is associated the empirical likelihood $\mathcal{R}_n(h_{\mathrm{sd}}) = \mathcal{R}_n(h_{\mathrm{sd}}; \theta)$. Clearly $|\psi(x)| \le 1$, so by Corollary 4.1, $\mathcal{R}_n(h_{\mathrm{sd}})$ converges in distribution to the limit variable given in the corollary. Let us now calculate the eigenvalues of the integral operator. Because the spatial depth function $\mathrm{SD}(x)$ is location invariant, it is without loss of generality to assume $\theta = 0$. Let $\lambda$ be a nonzero eigenvalue and $\varphi$ the associated eigenfunction, so that they must satisfy

$$\psi(x)^\top\int\psi(y)\varphi(y)\,dF(y) = \lambda\varphi(x), \quad x \in \mathcal{X}.$$
From this we solve $\varphi(y) = \lambda^{-1}\psi(y)^\top\int\psi(z)\varphi(z)\,dF(z)$ and, substituting this into the integral above, we obtain
$$\psi(x)^\top\Psi\int\psi(y)\varphi(y)\,dF(y) = \lambda^2\varphi(x), \quad x \in \mathcal{X},$$
where $\Psi = \int\psi(x)\psi(x)^\top\,dF(x) = \int xx^\top/|x|^2\,dF(x)$. Using the above expression for $\varphi$ again, we immediately arrive at the equation
$$\psi(x)^\top(\lambda I - \Psi)\int\psi(y)\varphi(y)\,dF(y) = 0,$$
where $I$ denotes the $d \times d$ identity matrix. Repeating this process $k$ times, we get
$$\psi(x)^\top(\lambda^k I - \Psi^k)\int\psi(y)\varphi(y)\,dF(y) = 0, \quad k = 1, 2, \dots.$$
Multiplying the above equation by $\psi(x)$ and integrating with respect to $F$, we obtain
$$\Psi(\lambda^k I - \Psi^k)\int\psi(y)\varphi(y)\,dF(y) = 0, \quad k = 1, 2, \dots.$$


Since $\Psi$ is nonsingular and there must be nonzero solutions, it follows that the eigenvalues $\lambda$ are the solutions of the equation
$$(2.1) \qquad \det(\lambda^k I - \Psi^k) = 0, \quad k = 1, 2, \dots,$$
where $\det(M)$ denotes the determinant of the matrix $M$. Suppose $X$ is spherically symmetric, that is, $X$ and $\Gamma X$ are identically distributed for any $d \times d$ orthogonal matrix $\Gamma$. Then one has $\Psi = I/d$; see, e.g., page 903 of Chaudhuri (1996). Thus under the null hypothesis of spherical symmetry, there are two distinct eigenvalues $\pm 1/\sqrt{d}$. Let us denote the corresponding kernel by $h_{\mathrm{sd,ss}}$. Then we have
$$-2\log\mathcal{R}_n(h_{\mathrm{sd,ss}}) \Longrightarrow (Z_1^2 - Z_2^2)/\sqrt{d}.$$
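A small Python sketch of the degenerate spatial-sign U-statistic $U_n(h_{\mathrm{sd}}; \theta)$ at a hypothesized center $\theta$ follows; it uses the pairwise-sum identity $\sum_{i<j} a_i^\top a_j = (|\sum_i a_i|^2 - \sum_i |a_i|^2)/2$, and the function names are ours.

```python
import numpy as np

def spatial_sign(x):
    # psi(x) = x / |x| for x != 0 and psi(0) = 0.
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else np.zeros_like(x)

def u_stat_sd(X, theta):
    # U_n(h_sd; theta) with kernel psi(x1 - theta)' psi(x2 - theta).
    psis = np.array([spatial_sign(x - theta) for x in X])
    n = len(X)
    s = psis.sum(axis=0)
    # sum_{i<j} a_i' a_j = (|sum a|^2 - sum |a_i|^2) / 2, divided by C(n, 2)
    return (s @ s - np.sum(psis * psis)) / (n * (n - 1))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))                   # spherically symmetric about 0
print(100 * u_stat_sd(X, theta=np.zeros(3)))    # n * U_n under the null
```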

Example 3. Cramér-von Mises Type Statistic: M-distribution Testing for a Specific Distribution. Let $X_1, \dots, X_n$ be independent $\mathbb{R}^d$-valued random variables with a distribution $P$. We are interested in testing the null hypothesis that the underlying multivariate distribution is a Borel probability measure $\pi$ on $\mathbb{R}^d$ against the alternative that it is not, i.e., $H_0: P = \pi$ against $H_1: P \neq \pi$. Koltchinskii (1997) introduced a multivariate extension of the Cramér-von Mises statistic (the $\omega^2$-test, as the author calls it), given by
$$W_n^2 = n\int_{\mathbb{R}^d} |F_n(s) - F(s)|^2\,\pi(ds),$$
where $F(s) = \int_{\mathbb{R}^d}\psi(s - x)\,\pi(dx)$, $s \in \mathbb{R}^d$, is an M-distribution function and $F_n(s) = \frac{1}{n}\sum_{j=1}^n \psi(s - X_j)$ is the empirical M-distribution function. Here $\psi$ is the spatial sign function given in Example 2. He showed that if $\pi$ is nonatomic, then the limit distribution of $W_n^2$ is that of $\int |\xi_\pi(s)|^2\,\pi(ds)$, where
$$\xi_\pi(s) = \int_{\mathbb{R}^d}\psi(s - x)\,G_\pi(dx), \quad s \in \mathbb{R}^d,$$
with $G_\pi$ the $\pi$-Brownian bridge process (a Gaussian process with mean zero and covariance $E(G_\pi(f)G_\pi(g)) = \pi(fg) - \pi(f)\pi(g)$). In the same fashion as in Example 2, we look at the jackknife empirical likelihood

$$\mathcal{R}_n(h_{\mathrm{ko}}) = \sup\Big\{\prod_{j=1}^n n\pi_j : \pi \in \mathcal{P}_n,\ \sum_{j=1}^n \pi_j V_{nj} = 0\Big\},$$
where the $V_{nj}$'s are the jackknife values of the U-statistic
$$U_n(h_{\mathrm{ko}}) = \binom{n}{2}^{-1}\sum_{1 \le i < j \le n} h_{\mathrm{ko}}(X_i, X_j),$$


where $h_{\mathrm{ko}}(x_1, x_2) = \int_{\mathbb{R}^d}\bar\psi(x_1 - s)^\top\bar\psi(x_2 - s)\,\pi(ds)$ with $\bar\psi$ the centered version, i.e., $\bar\psi(x - s) = \psi(x - s) - \int_{\mathbb{R}^d}\psi(y - s)\,\pi(dy)$. Since under the null hypothesis $E_\pi(h_{\mathrm{ko}}(x_1, X_2)) = 0$ and $\sigma^2_{\mathrm{ko}} = E(h_{\mathrm{ko}}^2(X_1, X_2))$ is finite, it follows that $U_n(h_{\mathrm{ko}})$ is a degenerate U-statistic. Besides, $h_{\mathrm{ko}}$ is bounded by 4, so it has a finite fourth moment and
$$nU_n(h_{\mathrm{ko}}) \Longrightarrow \xi_{\mathrm{ko}} = \sum_{j=1}^\infty \lambda_j(Z_j^2 - 1),$$
where the $Z_j$'s are i.i.d. standard normal random variables and the $\lambda_j$'s are the eigenvalues of the integral equation
$$\int h_{\mathrm{ko}}(x, y)\varphi(y)\,d\pi(y) = \lambda\varphi(x), \quad x \in \mathbb{R}^d,\ \varphi \in L_2(\pi).$$
Hence it follows from Theorem 4.2 that under the null hypothesis,
$$-2\log\mathcal{R}_n(h_{\mathrm{ko}}) \Longrightarrow \xi_{\mathrm{ko}}^2/4\sigma^2_{\mathrm{ko}}.$$
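The kernel $h_{\mathrm{ko}}$ involves an integral over $\pi$, which in practice can be approximated by Monte Carlo. A sketch under the assumption $\pi = N(0, I_d)$, with our hypothetical helper `h_ko_mc`; both the outer integral and the centering term are approximated on the same draws from $\pi$.

```python
import numpy as np

def spatial_sign(x):
    # psi along the last axis, with psi(0) = 0.
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    safe = np.where(norm > 0, norm, 1.0)
    return np.where(norm > 0, x / safe, 0.0)

def h_ko_mc(x1, x2, pi_draws):
    # Monte Carlo approximation of
    #   h_ko(x1, x2) = int psibar(x1 - s)' psibar(x2 - s) pi(ds),
    # with the centering psibar(x - s) = psi(x - s) - int psi(y - s) pi(dy);
    # both integrals are approximated on the same draws from pi (assumption).
    s = pi_draws                                      # (M, d) draws from pi
    p1 = spatial_sign(x1 - s)                         # psi(x1 - s), (M, d)
    p2 = spatial_sign(x2 - s)
    center = spatial_sign(s[:, None, :] - s[None, :, :]).mean(axis=0)  # (M, d)
    return np.mean(np.sum((p1 - center) * (p2 - center), axis=1))

rng = np.random.default_rng(2)
d, M = 2, 500
pi_draws = rng.normal(size=(M, d))                    # pi = N(0, I_2), an assumption
x1, x2 = rng.normal(size=d), rng.normal(size=d)
print(h_ko_mc(x1, x2, pi_draws))
```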

Example 4. Cramér-von Mises Statistic Testing for a Specific Distribution. Let $X_1, \dots, X_n$ be independent random variables with a common distribution $F$. The Cramér-von Mises statistic for testing the null hypothesis that the underlying cumulative distribution function is $F$ is given by
$$\eta_n = n\int(\mathbb{F}_n - F)^2\,dF = \frac{1}{n}\sum_{i=1}^n h_{\mathrm{cv}}(X_i, X_i) + \frac{2}{n}\sum_{1 \le i < j \le n} h_{\mathrm{cv}}(X_i, X_j),$$
where $\mathbb{F}_n$ denotes the empirical distribution function and $h_{\mathrm{cv}}(s, t) = \int(\mathbf{1}[s \le x] - F(x))(\mathbf{1}[t \le x] - F(x))\,dF(x)$. Under the null hypothesis, the mean of the first average is $1/6$ and the mean of the second average is zero. This suggests that we look at the jackknife empirical likelihood

$$\mathcal{R}_n(h_{\mathrm{cv}}) = \sup\Big\{\prod_{j=1}^n n\pi_j : \pi \in \mathcal{P}_n,\ \sum_{j=1}^n \pi_j V_{nj} = 0\Big\},$$
where the $V_{nj}$'s are the jackknife values of the U-statistic
$$U_n(h_{\mathrm{cv}}) = \binom{n}{2}^{-1}\sum_{1 \le i < j \le n} h_{\mathrm{cv}}(X_i, X_j).$$
Since under the null hypothesis $E(h_{\mathrm{cv}}(x_1, X_2)) = 0$ and $\sigma^2_{\mathrm{cv}} = E(h_{\mathrm{cv}}^2(X_1, X_2))$ is finite, it follows that $U_n(h_{\mathrm{cv}})$ is a degenerate U-statistic.


Besides, $h_{\mathrm{cv}}$ is bounded by 1, so it has a finite fourth moment and
$$nU_n(h_{\mathrm{cv}}) \Longrightarrow \xi_{\mathrm{cv}} = \sum_{j=1}^\infty j^{-2}\pi^{-2}(Z_j^2 - 1),$$
where the $Z_j$'s are i.i.d. standard normal random variables; see, e.g., Example 12.13 of van der Vaart (2000). Hence it follows from Theorem 4.2 that under the null hypothesis,
$$-2\log\mathcal{R}_n(h_{\mathrm{cv}}) \Longrightarrow \xi_{\mathrm{cv}}^2/4\sigma^2_{\mathrm{cv}}.$$
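For continuous $F$, substituting $u = F(x)$ in the defining integral gives the closed form $h_{\mathrm{cv}}(s, t) = \tfrac{1}{2}(F(s)^2 + F(t)^2) - \max(F(s), F(t)) + \tfrac{1}{3}$. A minimal Python sketch of $U_n(h_{\mathrm{cv}})$ for testing a standard normal null follows; the helper names are ours.

```python
import numpy as np
from scipy.stats import norm

def h_cv(a, b):
    # h_cv on the uniform scale, a = F(s), b = F(t).
    return 0.5 * (a ** 2 + b ** 2) - np.maximum(a, b) + 1.0 / 3.0

def u_stat_cv(X, cdf):
    # Degenerate U-statistic U_n(h_cv) summed over all pairs i < j.
    u = cdf(X)
    n = len(u)
    A, B = np.meshgrid(u, u)
    H = h_cv(A, B)
    total = (H.sum() - np.trace(H)) / 2.0       # off-diagonal half
    return total / (n * (n - 1) / 2.0)

rng = np.random.default_rng(3)
X = rng.normal(size=200)
print(200 * u_stat_cv(X, norm.cdf))             # n * U_n under the null
```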

Example 5. Simplicial Depth Testing for Angular Symmetry. Let $X_1, \dots, X_n$ be independent $\mathbb{R}^d$-valued random variables with a common distribution $F$. The (sample) simplicial depth function (Liu, 1990) on $\mathbb{R}^d$ with respect to $F$ is defined as
$$\mathrm{SPD}_n(x) = \binom{n}{d+1}^{-1}\sum_{1 \le i_1 < \dots < i_{d+1} \le n}\mathbf{1}\big[x \in S[X_{i_1}, \dots, X_{i_{d+1}}]\big],$$
where $S[x_1, \dots, x_{d+1}]$ denotes the closed simplex with vertices $x_1, \dots, x_{d+1}$. This is a U-statistic. The population simplicial depth function is given by $\mathrm{SPD}(x) = P(x \in S[X_1, \dots, X_{d+1}])$. Let $\theta$ be the center of angular symmetry of the distribution $F$, so that $(X - \theta)/|X - \theta|$ and $(\theta - X)/|X - \theta|$ are identically distributed. Angular symmetry is broader than the usual central symmetry, and the latter implies the former. Liu (1990) showed in her Theorem 4 that if $F$ is absolutely continuous on $\mathbb{R}^d$ and angularly symmetric about $\theta$, then $\mathrm{SPD}(x)$ is uniquely maximized at the angular symmetric center $\theta$ and $\mathrm{SPD}(\theta) = 2^{-d}$. Further, she pointed out in Remark B that $\mathrm{SPD}_n(\theta) - 2^{-d}$ is a degenerate U-statistic, that is,
$$E(\mathrm{SPD}_n(\theta) - 2^{-d} \mid X_i) = 0, \quad i = 1, \dots, n.$$
Therefore, $U_n(h_{\mathrm{liu}}) = \mathrm{SPD}_n(\theta) - 2^{-d}$ is a degenerate U-statistic with kernel $h_{\mathrm{liu}}(x_1, \dots, x_{d+1} \mid \theta) = \mathbf{1}[\theta \in S[x_1, \dots, x_{d+1}]] - 2^{-d}$ and can be used to test angular symmetry. Specifically, if $\theta$ is a hypothesized center of angular symmetry, then a small value of $U_n(h_{\mathrm{liu}})$ rejects the null hypothesis. This motivates us to introduce the empirical likelihood
$$\mathcal{R}_n(h_{\mathrm{liu}}) = \sup\Big\{\prod_{j=1}^n n\pi_j : \pi \in \mathcal{P}_n,\ \sum_{j=1}^n \pi_j V_{nj} = 0\Big\},$$


where the $V_{nj}$'s are the jackknife values of the U-statistic $U_n(h_{\mathrm{liu}})$. Since $h_{\mathrm{liu}}$ is bounded by one, it has a finite fourth moment and
$$nU_n(h_{\mathrm{liu}}) \Longrightarrow \xi_{\mathrm{liu}} = \sum_{j=1}^\infty \lambda_j(Z_j^2 - 1),$$
where the $Z_j$'s are i.i.d. standard normal random variables and the $\lambda_j$'s are the eigenvalues of the integral equation
$$\lambda g(x) = \int h_{2,\mathrm{liu}}(x, y)g(y)\,dF(y), \quad x \in \mathbb{R}^d,\ g \in L_2(F).$$
Here $h_{2,\mathrm{liu}}(x, y) = E(h_{\mathrm{liu}}(x, y, X_3, \dots, X_{d+1}))$, $x, y \in \mathbb{R}^d$. Hence it follows from Theorem 4.2 that under the null hypothesis,
$$-2\log\mathcal{R}_n(h_{\mathrm{liu}}) \Longrightarrow \xi_{\mathrm{liu}}^2\big/\big((d+1)(d+1)!\,\sigma^2_{\mathrm{liu}}\big),$$
where $\sigma^2_{\mathrm{liu}} = E(h_{\mathrm{liu}}^2(X_1, \dots, X_{d+1}))$.
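The kernel only requires a point-in-simplex test. A brief Python sketch for $d = 2$, where the simplices are triangles and containment is checked via the signs of edge cross products; `in_simplex` and `u_stat_liu` are our names, not the paper's.

```python
import numpy as np
from itertools import combinations

def cross2(u, v):
    # z-component of the 2-D cross product.
    return u[0] * v[1] - u[1] * v[0]

def in_simplex(theta, verts):
    # theta in the closed triangle with rows of verts as vertices:
    # all edge cross products share a sign (zero allowed on the boundary).
    a, b, c = verts
    d1 = cross2(b - a, theta - a)
    d2 = cross2(c - b, theta - b)
    d3 = cross2(a - c, theta - c)
    return min(d1, d2, d3) >= 0 or max(d1, d2, d3) <= 0

def u_stat_liu(X, theta, d=2):
    # U_n(h_liu) = SPD_n(theta) - 2^{-d}, kernel 1[theta in S] - 2^{-d}.
    vals = [in_simplex(theta, X[list(idx)])
            for idx in combinations(range(len(X)), d + 1)]
    return np.mean(vals) - 2.0 ** (-d)

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 2))                    # angularly symmetric about 0
print(u_stat_liu(X, theta=np.zeros(2)))         # near zero under the null
```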

3. Jackknife Empirical Likelihood for U-statistics. We will first recall some facts about one-sample U-statistics. Let $(\Omega, \mathcal{A}, P)$ be a probability space, and $X_1, \dots, X_n$ a random sample from an unknown distribution function $F$ under $P$. Let $h: \mathbb{R}^m \mapsto \mathbb{R}$ be a known function that is permutation symmetric in its $m$ arguments. A U-statistic with kernel $h$ of order $m$ is defined as
$$U_n \equiv U_{nm}(h) = \binom{n}{m}^{-1}\sum_{1 \le i_1 < \dots < i_m \le n} h(X_{i_1}, \dots, X_{i_m}), \quad n \ge 2.$$
Throughout we assume $h$ is $F^m$-square integrable, that is, $h \in L_2(F^m)$, where $L_2(F^m) = \{f : \int f^2\,dF^m < \infty\}$. We shall abbreviate $\theta = E(h) := E(h(X_1, \dots, X_m)) = \int h\,dF^m$, $\mathbb{P}_n f = n^{-1}\sum_{j=1}^n f(X_j)$ and $Pf = E(f(X))$ with $X$ an i.i.d. copy of $X_1$. Then $U_n$ is an unbiased estimate of $\theta$. Let $h_m = h$ and for $c = 1, \dots, m-1$ let $h_c(x_1, \dots, x_c) = E(h(x_1, \dots, x_c, X_{c+1}, \dots, X_m))$. Then $h_c$ is a version of the conditional expectation, that is,
$$h_c(x_1, \dots, x_c) = E(h(X_1, \dots, X_m) \mid X_1 = x_1, \dots, X_c = x_c).$$
Let $\delta_x$ be the point mass at $x \in \mathbb{R}^d$. We now define
$$h_c^*(x_1, \dots, x_c) = (\delta_{x_1} - P)\cdots(\delta_{x_c} - P)P^{m-c}h, \quad c = 0, 1, \dots, m.$$


With these, the useful Hoeffding decomposition can be stated as
$$(3.1) \qquad U_n - \theta = \sum_{c=1}^m \binom{m}{c}U_{nc}(h_c^*).$$
Let $U_{n-1}^{(-j)}$ denote the U-statistic based on the $n-1$ observations $X_1, \dots, X_{j-1}, X_{j+1}, \dots, X_n$. The jackknife pseudo-values of the U-statistic are defined as
$$V_{nj} = nU_n - (n-1)U_{n-1}^{(-j)}, \quad j = 1, \dots, n.$$
From (3.1) we immediately derive
$$(3.2) \qquad V_{nj} = \theta + m\bar h_1(X_j) + R_{nj}, \quad j = 1, \dots, n,$$
where $R_{nj} = \sum_{c=2}^m \binom{m}{c}\big(nU_{nc}(h_c^*) - (n-1)U_{(n-1)c}^{(-j)}(h_c^*)\big)$ and $\bar f = f - Pf$ denotes the centered version of $f$.

Using the orthogonality properties of the $U_{nc}(h_c^*)$'s, one calculates
$$E(R_{nj}) = E(R_{n1}) = 0, \qquad E(R_{nj}^2) = E(R_{n1}^2) = O(1/n), \quad j = 1, \dots, n.$$
Thus for each fixed $j$, $V_{nj} - \theta - m\bar h_1(X_j) \to 0$ in quadratic mean, and hence in probability, as $n$ tends to infinity. This shows that each jackknife value $V_{nj}$ depends asymptotically on $X_j$, and hence $V_{nj}$, $j = 1, \dots, n$ are asymptotically independent. As a result, if $\pi_j$ is a probability mass placed at $X_j$, then approximately the same probability mass $\pi_j$ is placed at the jackknife value $V_{nj}$ for $j = 1, \dots, n$; because of the asymptotic independence of the jackknife values, the joint likelihood is approximately the product of the $\pi_j$'s. This suggests that we introduce the jackknife empirical likelihood of Jing et al. (2009), with side information, as follows:
$$\mathcal{R}_n(h, g) = \sup\Big\{\prod_{j=1}^n n\pi_j : \pi \in \mathcal{P}_n,\ \sum_{j=1}^n \pi_j(V_{nj} - \theta) = 0,\ \sum_{j=1}^n \pi_j g(X_j) = 0\Big\},$$
where $g$ is an $\mathbb{R}^r$-valued measurable function such that $\int g\,dF = 0$ and $\int |g|^2\,dF$ is finite. Here $r$ is the number of equalities that express the side information, and we call such equalities constraints.
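Computationally, the pseudo-values $V_{nj}$ require only the leave-one-out U-statistics. A sketch for an order-2 kernel, where $U_{n-1}^{(-j)}$ is obtained from the row sums of the kernel matrix (our implementation, for illustration only):

```python
import numpy as np

def jackknife_values(X, h):
    # V_nj = n U_n - (n-1) U_{n-1}^{(-j)} for a symmetric order-2 kernel h.
    n = len(X)
    H = np.array([[h(X[i], X[j]) if i != j else 0.0
                   for j in range(n)] for i in range(n)])
    total = H.sum() / 2.0                        # sum over i < j
    U_n = total / (n * (n - 1) / 2.0)
    row = H.sum(axis=1)                          # sum_k h(X_j, X_k), k != j
    U_loo = (total - row) / ((n - 1) * (n - 2) / 2.0)
    return n * U_n - (n - 1) * U_loo

rng = np.random.default_rng(5)
X = rng.normal(size=40)
V = jackknife_values(X, lambda s, t: s * t)      # degenerate kernel when E X = 0
print(V.mean())                                  # the pseudo-value mean equals U_n
```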

4. Main results. We need the following result, which is a special case of Lemma 5.2 of Peng and Schick (2010).


Lemma 4.1. Let $x_1, \dots, x_N$ be $r$-dimensional vectors. Set
$$x^* = \max_{1 \le j \le N}|x_j|, \qquad \bar x = \frac{1}{N}\sum_{j=1}^N x_j, \qquad S = \frac{1}{N}\sum_{j=1}^N x_j x_j^\top,$$
$$\mathcal{R} = \max\Big\{\prod_{i=1}^N N\pi_i : \pi \in \mathcal{P}_N,\ \sum_{i=1}^N \pi_i x_i = 0\Big\}.$$
Let $\lambda$ denote the smallest and $\Lambda$ the largest eigenvalue of the matrix $S$. Then the inequality $\lambda > 5|\bar x|x^*$ implies that there is a unique $\zeta$ satisfying
$$(4.1) \qquad 1 + \zeta^\top x_j > 0, \quad j = 1, \dots, N,$$
$$(4.2) \qquad \sum_{j=1}^N \frac{x_j}{1 + \zeta^\top x_j} = 0,$$
$$(4.3) \qquad \mathcal{R} = \prod_{j=1}^N \frac{1}{1 + \zeta^\top x_j},$$
and
$$(4.4) \qquad \Big|-2\log\mathcal{R} - N\bar x^\top S^{-1}\bar x\Big| \le \frac{N|\bar x|^3 x^{(3)}}{(\lambda - |\bar x|x^*)^3} + \frac{\Lambda^2}{\lambda^2}\,\frac{4N|\bar x|^4 x^{(4)}}{(\lambda - |\bar x|x^*)^4},$$
where $x^{(k)} = \sup_{|u|=1} N^{-1}\big|\sum_{j=1}^N (u^\top x_j)^k\big|$, $k = 3, 4$.
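Lemma 4.1 is constructive: $\zeta$ solves the score equation (4.2), which is routinely computed by Newton's method, after which $-2\log\mathcal{R} = 2\sum_j \log(1 + \zeta^\top x_j)$ by (4.3). A minimal sketch, assuming the origin lies in the convex hull of the $x_j$'s and omitting step-size safeguards:

```python
import numpy as np

def el_log_ratio(x, iters=50, tol=1e-10):
    # Newton's method for the score equation (4.2); returns -2 log R via (4.3),
    # i.e. 2 * sum_j log(1 + zeta' x_j). No safeguarding, for brevity.
    N, r = x.shape
    zeta = np.zeros(r)
    for _ in range(iters):
        w = 1.0 + x @ zeta                      # 1 + zeta' x_j, must stay > 0
        grad = (x / w[:, None]).sum(axis=0)     # left-hand side of (4.2)
        hess = -(x / w[:, None] ** 2).T @ x     # Jacobian of the score in zeta
        step = np.linalg.solve(hess, -grad)
        zeta += step
        if np.linalg.norm(step) < tol:
            break
    return 2.0 * np.sum(np.log1p(x @ zeta))

rng = np.random.default_rng(6)
x = rng.normal(size=(200, 2))
print(el_log_ratio(x))   # approximately chi-square(2) under a mean-zero model
```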

We shall now derive the Wilks theorems by applying Lemma 4.1 to the case where the vectors $x_j$ are replaced by random vectors. We are interested both in the case where the dimension of the random vectors is fixed and in the case where it grows with $n$.

Let $Z_1, \dots, Z_n$ be random vectors from some unknown distribution $F$. Let $T_{n1}, \dots, T_{nN}$ be $r_n$-dimensional measurable functions of the random vectors, where $N = N_n \ge n$ increases with $n$. With these random functions we associate the empirical likelihood
$$\mathcal{R}_n = \sup\Big\{\prod_{j=1}^{N_n} N_n\pi_j : \pi \in \mathcal{P}_{N_n},\ \sum_{j=1}^{N_n}\pi_j T_{nj} = 0\Big\}.$$


To study the asymptotic behavior of $\mathcal{R}_n$ we introduce
$$T_n^* = \max_{1 \le j \le N_n}|T_{nj}|, \qquad \bar T_n = \frac{1}{N_n}\sum_{j=1}^{N_n} T_{nj}, \qquad S_n = \frac{1}{N_n}\sum_{j=1}^{N_n} T_{nj}T_{nj}^\top,$$
and
$$T_n^{(k)} = \sup_{|u|=1}\Big|\frac{1}{N_n}\sum_{j=1}^{N_n}(u^\top T_{nj})^k\Big|, \quad k = 3, 4,$$
and let $\lambda_n = \lambda_{\min}(S_n)$ and $\Lambda_n = \lambda_{\max}(S_n)$ denote the smallest and largest eigenvalues of $S_n$, i.e.,
$$\lambda_n = \inf_{|u|=1} u^\top S_n u \qquad \text{and} \qquad \Lambda_n = \sup_{|u|=1} u^\top S_n u.$$
Peng and Schick (2010) investigated the asymptotic behavior of the empirical likelihood when $S_n$ can be approximated by a sequence $W_n$ of dispersion matrices which is regular in the sense that the smallest eigenvalue $\lambda_{\min}(W_n)$ is bounded away from zero and the largest eigenvalue $\lambda_{\max}(W_n)$ is bounded away from infinity uniformly in $n$. Here we shall relax this assumption and allow $\lambda_{\min}(W_n)$ and $\lambda_{\max}(W_n)$ to approach zero as $n$ tends to infinity; see (4.5) below. Motivated by (A1)–(A4) of Peng and Schick (2011), we introduce the following conditions.

(A1) For some $\nu \ge 1$, $n^{\nu/2-1}T_n^* = o_p(r_n^{-1/2})$;
(A2) $n^\nu|\bar T_n|^2 = O_p(r_n)$;
(A3) There is a sequence of $r_n \times r_n$ dispersion matrices $W_n$ such that
$$(4.5) \qquad 0 < \liminf_{n \to \infty} n^{\nu-1}\lambda_{\min}(W_n) \le \limsup_{n \to \infty} n^{\nu-1}\lambda_{\max}(W_n) < \infty,$$
$$(4.6) \qquad n^{\nu-1}|S_n - W_n|_o = o_p(r_n^{-1/2});$$
(A4) $n^{3\nu/2-2}T_n^{(3)} = o_p(r_n^{-1})$ and $n^{2\nu-3}T_n^{(4)} = o_p(r_n^{-3/2})$.

Clearly when $\nu = 1$ these conditions correspond to (A1)–(A4) of Peng and Schick (2011).

It follows from (A1) and (A2) that $n^{\nu-1}T_n^*|\bar T_n| = o_p(1)$, and from (A3) that $P(a \le n^{\nu-1}\lambda_n \le n^{\nu-1}\Lambda_n \le A) \to 1$ as $n$ tends to infinity, where $a$ and $A$ denote the lower and upper limits in (4.5). Thus $P(\lambda_n > 5T_n^*|\bar T_n|) = P(n^{\nu-1}\lambda_n > 5n^{\nu-1}T_n^*|\bar T_n|) \to 1$ as $n$ tends to infinity.


Consequently, by Lemma 4.1, there exists an $r_n$-dimensional random vector $\zeta_n$ which is uniquely determined, on the event inside the latter probability, by the properties
$$(4.7) \qquad 1 + \zeta_n^\top T_{nj} > 0, \quad j = 1, \dots, N_n,$$
$$(4.8) \qquad \frac{1}{N_n}\sum_{j=1}^{N_n}\frac{T_{nj}}{1 + \zeta_n^\top T_{nj}} = 0,$$
and
$$\mathcal{R}_n = \prod_{j=1}^{N_n}\frac{1}{1 + \zeta_n^\top T_{nj}}.$$
By (A3), $S_n$ is invertible except on an event whose probability tends to zero, and
$$n^{\nu-1}\big(\lambda_n - T_n^*|\bar T_n|\big) = a - o_p(1), \qquad \Lambda_n/\lambda_n = O_P(1).$$
By (A2) and (A4),
$$n^{3(\nu-1)}|\bar T_n|^3 T_n^{(3)} = o_p(r_n^{1/2}) \qquad \text{and} \qquad n^{4(\nu-1)}|\bar T_n|^4 T_n^{(4)} = o_p(r_n^{1/2}).$$
Thus, under (A1)–(A4), the following expansion follows from (4.4):
$$(4.9) \qquad -2(nN_n^{-1})\log\mathcal{R}_n = n\bar T_n^\top S_n^{-1}\bar T_n + o_p(r_n^{1/2}).$$
From (A3) we can also derive the rate $n^{-\nu+1}|S_n^{-1} - W_n^{-1}|_o = o_p(r_n^{-1/2})$. Thus, if (A1)–(A4) hold, then (4.9) holds with $S_n$ replaced by $W_n$:
$$(4.10) \qquad -2(nN_n^{-1})\log\mathcal{R}_n = n\bar T_n^\top W_n^{-1}\bar T_n + o_p(r_n^{1/2}).$$
In view of the inequalities $T_n^{(3)} \le \Lambda_n T_n^*$ and $T_n^{(4)} \le \Lambda_n(T_n^*)^2$, a sufficient condition for (A1) and (A4) is given by

(B1) $n^{\nu/2-1}T_n^* = o_P(r_n^{-1})$ and $n^{\nu-1}\Lambda_n = O_P(1)$.

Let us first deal with the case that the dimension $r_n$ does not increase with $n$. In this case (B1) and (A2) are implied by $T_n^* = o_p(n^{-\nu/2+1})$, $\bar T_n = O_p(n^{-\nu/2})$ and $\Lambda_n = O_P(n^{-\nu+1})$, and (A3) is implied by the condition that $n^{\nu-1}S_n = W(\nu) + o_p(1)$ for some positive definite matrix $W(\nu)$. Thus we have arrived at the following conclusion.


Theorem 4.1. Let $r_n = r$ for all $n$. Suppose there exists some $\nu \ge 1$ such that
$$(4.11) \qquad T_n^* = o_p(n^{-\nu/2+1}), \qquad n^{\nu/2}\bar T_n \Longrightarrow T(\nu), \qquad n^{\nu-1}S_n = W(\nu) + o_p(1)$$
for some random vector $T(\nu)$ and positive definite matrix $W(\nu)$, both depending on $\nu$. Then
$$-2(nN_n^{-1})\log\mathcal{R}_n \Longrightarrow T(\nu)^\top W(\nu)^{-1}T(\nu).$$

Proof. The desired result follows from $n^{\nu/2}\bar T_n \Longrightarrow T(\nu)$ and (4.10).

Theorem 4.1 does not require independence of the random vectors $T_{n1}, \dots, T_{nN}$, even though some sort of asymptotic independence is essential to justify the likelihood. This is important when dealing with estimated constraint functions and U-statistics, as we shall see below.

4.1. Wilks Theorems for degenerate U-statistics with side information. In this section, we study the asymptotic behavior of the jackknife empirical likelihood of degenerate U-statistics. We first quote a limit theorem for degenerate U-statistics from page 169 of van der Vaart (2000). Let $H_j(x)$ be the $j$th Hermite polynomial with leading term $x^j$, so that $\int H_i(x)H_j(x)\phi(x)\,dx = 0$ for $i \neq j$; the first few are $H_0(x) = 1$, $H_1(x) = x$, $H_2(x) = x^2 - 1$ and $H_3(x) = x^3 - 3x$. Here $\phi$ denotes the density of the standard normal.

Lemma 4.2. Assume $h: \mathcal{X}^m \mapsto \mathbb{R}$ is a symmetric kernel such that $0 < \sigma_h^2 = \int h^2\,dP^m < \infty$ and $\int h(x_1, \dots, x_{m-1}, \cdot)\,dP = 0$ for all $x_j$'s in $\mathcal{X}$. Let $1 = b_0, b_1, b_2, \dots$ be an orthonormal basis of $L_2(\mathcal{X}, \mathcal{A}, P)$. Then
$$(4.12) \qquad n^{m/2}U_{n,m}(h) \Longrightarrow \sum_{k \in \mathbb{N}^m} P(hb_{k_1}\cdots b_{k_m})\prod_{i=1}^{d(k)} H_{a_i(k)}\big(G(\psi_i(k))\big),$$
where $G$ is the $P$-Brownian bridge process, the $\psi_i(k)$'s are the distinct functions among the $b_{k_i}$'s, with $d(k)$ the number of distinct functions, and $a_i(k)$ is the number of times $\psi_i(k)$ appears among the $b_{k_i}$'s. The variance of the limit variable is $m!\sigma_h^2$.

The kernel $h$ and the associated U-statistic in the above lemma are referred to as completely degenerate, and the limit distribution is called the Gaussian chaos.


Let $h$ be a degenerate kernel and $V_{n1}, \dots, V_{nn}$ the jackknife values of the U-statistic $U_{n,m}(h)$ with kernel $h$. We now look at the jackknife empirical likelihood without side information,
$$\mathcal{R}_n(h) = \sup\Big\{\prod_{j=1}^n n\pi_j : \pi \in \mathcal{P}_n,\ \sum_{j=1}^n \pi_j(V_{nj} - \theta) = 0\Big\}.$$

Theorem 4.2. Suppose $h$ is a completely degenerate kernel with finite fourth moment $E(h^4(X_1, \dots, X_m)) < \infty$. Then
$$-2\log\mathcal{R}_n(h) \Longrightarrow \xi^2/(m\,m!\,\sigma_h^2),$$
where $\xi$ is the limit variable given in Lemma 4.2.

Remark 4.1. For $m = 2$, the limit variable $\xi$ can be represented as $\xi = \sum_{k=1}^\infty \lambda_k(Z_k^2 - 1)$, where the $Z_k$'s are independent random variables with common standard normal distribution, and the $\lambda_k$'s are the eigenvalues of the operator $A\varphi(x) = \int h(x, y)\varphi(y)\,dF(y)$, $x \in \mathcal{X}$, $\varphi \in L_2(F)$. Details may be found, e.g., in Examples 12.9 and 12.11 of van der Vaart (2000).
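Numerically, the $\lambda_k$'s of Remark 4.1 can be approximated by the eigenvalues of the matrix $(h(X_i, X_j)/n)_{i,j}$, which discretizes the operator $A$ under the empirical distribution; this is a standard device, sketched below with our helper `kernel_eigs`.

```python
import numpy as np

def kernel_eigs(X, h, top=5):
    # Approximate the eigenvalues of A phi(x) = int h(x, y) phi(y) dF(y)
    # by the spectrum of the Gram matrix H_ij = h(X_i, X_j) / n (F ~ F_n).
    n = len(X)
    H = np.array([[h(xi, xj) for xj in X] for xi in X]) / n
    eig = np.linalg.eigvalsh(H)                 # H is symmetric
    return eig[np.argsort(-np.abs(eig))][:top]  # largest in magnitude first

rng = np.random.default_rng(7)
X = rng.normal(size=300)
# Degenerate kernel h(s, t) = st: the operator has the single eigenvalue
# E(X^2) = 1 with eigenfunction phi(x) = x.
print(kernel_eigs(X, lambda s, t: s * t))
```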

4.2. Quadratic case. Let $\mathcal{H}$ be a real Hilbert space with inner product $\langle\cdot,\cdot\rangle$. Let $F$ be a distribution function on a measurable space $(\mathcal{X}, \mathcal{A})$, and let $K$ be a measurable function from $\mathcal{X}$ to $\mathcal{H}$ such that $E(K(X)) = \int K(x)\,dF(x) = 0$. Let $X_1, \dots, X_n$ be independent random variables with common distribution $F$. We consider the U-statistic
$$U_n(K) = \binom{n}{2}^{-1}\sum_{i<j} h(X_i, X_j),$$
where $h(x, y) = \langle K(x), K(y)\rangle$. Clearly $E(h(x, X_2)) = 0$, so the kernel is degenerate. By Theorem 4.2 and Remark 4.1, we have the following result.

Corollary 4.1. Suppose $K$ is a bounded measurable function such that $E(K(X)) = 0$ and $\sigma_K^2 = E(\langle K(X_1), K(X_2)\rangle^2) > 0$. Then
$$-2\log\mathcal{R}_n(K) \Longrightarrow (1/4\sigma_K^2)\,\xi_K^2,$$
where $\xi_K = \sum_{k=1}^\infty \lambda_k(Z_k^2 - 1)$ with the $\lambda_k$'s the eigenvalues of the operator $A\varphi(x) = \int\langle K(x), K(y)\rangle\varphi(y)\,dF(y)$, $x \in \mathcal{X}$, $\varphi \in L_2(F)$.

For $m$-tuple vectors $\mathbf{i}, \mathbf{j}$ with elements in $\{1, \dots, n\}$, we abuse set notation and denote by $\mathbf{i} \cup \mathbf{j}$ the union of the elements of $\mathbf{i}$ and $\mathbf{j}$, and by $\mathbf{i} \cap \mathbf{j}$ their common elements. Set $X_{\mathbf{i}} = (X_{i_1}, \dots, X_{i_m})$ for $\mathbf{i} = (i_1, \dots, i_m)$. In what follows, $\mathbf{i}, \mathbf{j}, \mathbf{k}, \mathbf{l}$ denote vectors with components in $\{1, \dots, n\}$. The following lemma gives some useful properties of completely degenerate U-statistics.


Lemma 4.3. Suppose $h$ is a completely degenerate kernel with finite second moment. Then
$$(4.13) \qquad E(h(X_{\mathbf{i}})h(X_{\mathbf{j}})) = \sigma_h^2\mathbf{1}[\mathbf{i} = \mathbf{j}]$$
for every pair of $m$-tuple vectors $\mathbf{i}, \mathbf{j}$ with components in $\{1, \dots, n\}$. Suppose $h$ additionally has finite fourth moment. If one of $h(X_{\mathbf{i}}), h(X_{\mathbf{j}}), h(X_{\mathbf{k}}), h(X_{\mathbf{l}})$ does not equal any of the rest, then
$$(4.14) \qquad E(h(X_{\mathbf{i}})h(X_{\mathbf{j}})h(X_{\mathbf{k}})h(X_{\mathbf{l}})) = 0$$
for every $m$-tuple vectors $\mathbf{i}, \mathbf{j}, \mathbf{k}, \mathbf{l}$ with components in $\{1, \dots, n\}$. Moreover,
$$(4.15) \qquad E(h^2(X_{\mathbf{i}})h^2(X_{\mathbf{j}})) = \sigma_h^4\mathbf{1}[\mathbf{i} \cap \mathbf{j} = \emptyset] + E(h^4)\mathbf{1}[\mathbf{i} = \mathbf{j}].$$
Recall for $c = 1, \dots, m-1$, $h_c(x_1, \dots, x_c) = \int h(x_1, \dots, x_c, y_{c+1}, \dots, y_m)\,dF(y_{c+1})\cdots dF(y_m)$. Set $h_m = h$, $h_0 = E(h) = E(h(X_1, \dots, X_m))$ and $\sigma_h^2 = E(h^2)$.

Proof. By the complete degeneracy of $h$, $h_s(x_1, \dots, x_s) = 0$ for $s = 0, 1, \dots, m-1$. Using the independence of $X_1, \dots, X_n$, we have
$$E\big(h(X_{\mathbf{i}})h(X_{\mathbf{j}})\big) = E\Big(E\big(h(X_{\mathbf{i}}) \mid X_s : s \in \mathbf{i} \cap \mathbf{j}\big)E\big(h(X_{\mathbf{j}}) \mid X_s : s \in \mathbf{i} \cap \mathbf{j}\big)\Big) = E(h_c^2) = E(h^2)\mathbf{1}[c = m] = \sigma_h^2\mathbf{1}[\mathbf{i} = \mathbf{j}],$$
where $c$ is the number of common elements of $\mathbf{i}$ and $\mathbf{j}$. This shows (4.13). Analogously, one proves (4.14)–(4.15).

Proof. We will prove Theorem 4.2 by verifying the three conditions in (4.11) of Theorem 4.1 with $\nu = m$. Recall $h_{m-1,j}^*(x_1, \dots, x_{m-1}) = h_m^*(X_j, x_1, \dots, x_{m-1})$ for every $j$. Because of the complete degeneracy of the kernel $h$, $h_m^* = h$ and $h_c^* = h_c$ for $c = 1, \dots, m-1$; hence $U_{n-1,m-1}(h_{m-1,j})$ and $U_{n-1,m}^{(-j)}(h_m)$ are uncorrelated for every $j$, and
$$\mathrm{Var}\big(U_{n-1,m-1}(h_{m-1,1})\big) = \binom{n-1}{m-1}^{-1}\sigma_h^2, \qquad \mathrm{Var}\big(U_{n-1,m}^{(-1)}(h_m)\big) = \binom{n-1}{m}^{-1}\sigma_h^2.$$
Under the complete degeneracy, the only nonzero term in the decomposition of the pseudo-values is the one with $c = m$, which gives
$$(4.16) \qquad V_{nj} = mU_{n-1,m-1}(h_{m-1,j}) - (m-1)U_{n-1,m}^{(-j)}(h_m), \quad j = 1, \dots, n.$$


This and the above properties yield $E(V_{nj}^2) = E(V_{n1}^2)$ and
$$E(V_{n1}^2) = m^2\,\mathrm{Var}\big(U_{n-1,m-1}(h_{m-1,1})\big) + (m-1)^2\,\mathrm{Var}\big(U_{n-1,m}^{(-1)}(h_m)\big)$$
$$(4.17) \qquad = \Big(m^2\binom{n-1}{m-1}^{-1} + (m-1)^2\binom{n-1}{m}^{-1}\Big)\sigma_h^2 \approx m\,m!\,n^{-m+1}\sigma_h^2,$$
where $a_n \approx b_n$ means $a_n/b_n \to 1$ as $n$ tends to infinity. Thus we derive, for $\varepsilon > 0$,
$$P\Big(\max_{1 \le j \le n} n^{(m-1)/2}|V_{nj}| > n^{1/2}\varepsilon\Big) \le \sum_{j=1}^n P\big(n^{(m-1)/2}|V_{nj}| > n^{1/2}\varepsilon\big) = nP\big(n^{(m-1)/2}|V_{n1}| > n^{1/2}\varepsilon\big)$$
$$\le \frac{1}{\varepsilon^2}E\big(n^{m-1}|V_{n1}|^2\,\mathbf{1}[n^{(m-1)/2}|V_{n1}| > n^{1/2}\varepsilon]\big) \to 0, \quad n \to \infty.$$

This shows
$$(4.18) \qquad \max_{1 \le j \le n}|V_{nj}| = o_p(n^{-m/2+1}),$$
and verifies the first equality of (4.11) with $\nu = m$. To verify the third equality of (4.11), set
$$\bar V_n = \frac{1}{n}\sum_{j=1}^n\big(V_{nj}^2 - E(V_{nj}^2)\big),$$
and prove below that
$$(4.19) \qquad E(\bar V_n^2) = o(n^{-2(m-1)}).$$
Accordingly, for $\varepsilon > 0$, by Chebyshev's inequality,
$$\varepsilon^2 P(|\bar V_n| > n^{-m+1}\varepsilon) \le E\big((n^{m-1}\bar V_n)^2\,\mathbf{1}[|n^{m-1}\bar V_n| > \varepsilon]\big) \to 0.$$
From this and (4.17) it immediately follows that
$$(4.20) \qquad n^{m-1}\,\frac{1}{n}\sum_{j=1}^n V_{nj}^2 = m\,m!\,\sigma_h^2 + o_P(1),$$
giving the third equality of (4.11) with $W(m) = m\,m!\,\sigma_h^2$.


Let us now verify (4.19). Since $h$ is symmetric and the $X_i$'s are i.i.d., $\mathrm{Cov}(V_{ni}^2, V_{nj}^2)$ equals $\mathrm{Cov}(V_{n1}^2, V_{n2}^2)$ or $\mathrm{Var}(V_{n1}^2)$ according as $i \neq j$ or $i = j$, so that
$$(4.21) \qquad E(\bar V_n^2) = (1/n)\,\mathrm{Var}(V_{n1}^2) + (1 - 1/n)\,\mathrm{Cov}(V_{n1}^2, V_{n2}^2).$$
Let us first calculate the variance. Using expression (4.16) and the inequality $(a + b)^4 \le 8(a^4 + b^4)$, we have
$$\mathrm{Var}(V_{n1}^2) \le 8\big(m^4 E\big(U_{n-1,m-1}(h_{m-1,1})^4\big) + (m-1)^4 E\big(U_{n-1,m}^{(-1)}(h_m)^4\big)\big).$$
Write $2 \le \mathbf{i} \le n$ for $2 \le i_1 < \dots < i_m \le n$. We then express the latter expectation as
$$E\big(U_{n-1,m}^{(-1)}(h_m)^4\big) = \binom{n-1}{m}^{-4}\sum_{2 \le \mathbf{i}, \mathbf{j}, \mathbf{k}, \mathbf{l} \le n} E\big(h(X_{\mathbf{i}})h(X_{\mathbf{j}})h(X_{\mathbf{k}})h(X_{\mathbf{l}})\big).$$

Because of the complete degeneracy of the kernel $h$, the only nonvanishing terms in the above sum are those which contain either $\sigma_h^4$ or $E(h^4)$. By (4.14) and (4.15), the number of such terms is $\binom{4}{2}\binom{n-1}{m}\binom{n-m-1}{m}$ for the former and $\binom{n-1}{m}$ for the latter, so that
$$E\big(U_{n-1,m}^{(-1)}(h_m)^4\big) = \binom{4}{2}\binom{n-1}{m}^{-3}\binom{n-m-1}{m}\sigma_h^4 + \binom{n-1}{m}^{-3}E(h^4).$$
Again by (4.14) and (4.15), and using a conditioning argument (conditioning on $X_1$), we get
$$E\big(U_{n-1,m-1}(h_{m-1,1})^4\big) = \binom{4}{2}\binom{n-1}{m-1}^{-3}\binom{n-m}{m-1}\sigma_h^4 + \binom{n-1}{m-1}^{-3}E(h^4).$$
It follows accordingly that
$$(4.22) \qquad \mathrm{Var}(V_{n1}^2) = O(n^{-2(m-1)}).$$
We now deal with the covariance term in (4.21). To simplify the notation, set
$$U_{11} = U_{n-1,m-1}(h_{m-1,1}), \qquad U_{12} = U_{n-1,m}^{(-1)}(h_m),$$
and define $U_{21}, U_{22}$ similarly. For a random variable $U$, introduce the tilde and wide-tilde operations $\tilde U^2 = (\tilde U)^2 = (U - E(U))^2$ and $\widetilde{U^2} = U^2 - E(U^2)$. Using identity (4.16) and a vector-matrix representation, we express $\widetilde{V_{ni}^2} = a^\top u_i$, where
$$a = \big(m^2,\ (m-1)^2,\ -2m(m-1)\big)^\top, \qquad u_i = \big(\widetilde{U_{i1}^2},\ \widetilde{U_{i2}^2},\ \widetilde{U_{i1}U_{i2}}\big)^\top, \quad i = 1, 2.$$


Thus we obtain the concise expression
$$(4.23) \qquad \mathrm{Cov}(V_{n1}^2, V_{n2}^2) = E\big(\widetilde{V_{n1}^2}\,\widetilde{V_{n2}^2}\big) = E(a^\top u_1 u_2^\top a) = a^\top A a,$$
where $A = E(u_1 u_2^\top)$. Since $A$ is symmetric, it suffices to find the magnitudes of the entries $A_{ij}$ for $i \ge j$, $i, j = 1, 2, 3$. For brevity, introduce the notation $\mathbf{i}_2 = (i_2, \dots, i_m)$ and $\mathbf{i}_3 = (i_3, \dots, i_m)$ for $\mathbf{i} = (i_1, \dots, i_m)$. With these, we write $\binom{n-1}{m-1}^4 A_{11} = \binom{n-1}{m-1}^4 E\big(\widetilde{U_{11}^2}\,\widetilde{U_{21}^2}\big)$ as
$$\sum_{1 \notin \mathbf{i}_2,\ 1 \notin \mathbf{j}_2}\ \sum_{2 \notin \mathbf{k}_2,\ 2 \notin \mathbf{l}_2}\mathrm{Cov}\big(h(X_1, X_{\mathbf{i}_2})h(X_1, X_{\mathbf{j}_2}),\ h(X_2, X_{\mathbf{k}_2})h(X_2, X_{\mathbf{l}_2})\big).$$
It follows from (4.13) that
$$E\big(h(X_1, X_{\mathbf{i}_2})h(X_1, X_{\mathbf{j}_2})\big) = \sigma_h^2\mathbf{1}[\mathbf{i}_2 = \mathbf{j}_2].$$
This and (4.14) yield that the covariance in the above sum equals zero unless Case 1: $\mathbf{i}_2 = \mathbf{j}_2$ and $\mathbf{k}_2 = \mathbf{l}_2$, or Case 2: $\mathbf{i}_2 \neq \mathbf{j}_2$ and $\mathbf{k}_2 \neq \mathbf{l}_2$ but $(1, \mathbf{i}_2) = (2, \mathbf{k}_2), (1, \mathbf{j}_2) = (2, \mathbf{l}_2)$ or $(1, \mathbf{i}_2) = (2, \mathbf{l}_2), (1, \mathbf{j}_2) = (2, \mathbf{k}_2)$. In Case 1, the covariance equals
$$\mathrm{Cov}\big(h^2(X_1, X_{\mathbf{i}_2}),\ h^2(X_2, X_{\mathbf{k}_2})\big) = E\big(h^2(X_1, X_{\mathbf{i}_2})h^2(X_2, X_{\mathbf{k}_2})\big) - \sigma_h^4 = \sigma_h^4\mathbf{1}[\mathbf{i}_2 \cap \mathbf{k}_2 = \emptyset] - \sigma_h^4.$$
There are $\binom{n-1}{m-1}^2$ terms of the latter kind and $\binom{n-1}{m-1}\binom{n-m}{m-1}$ of the former kind. In the first part of Case 2, the covariance equals
$$E\big(h^2(X_1, X_{\mathbf{i}_2})h^2(X_1, X_{\mathbf{j}_2})\big) = \sigma_h^4\mathbf{1}[\mathbf{i}_2 \cap \mathbf{j}_2 = \emptyset]\mathbf{1}[2 \in \mathbf{i}_2]\mathbf{1}[2 \in \mathbf{j}_2] = 0.$$
Consequently, in view of $\binom{n-1}{m-1}\big/\binom{n-m}{m-1} \to 1$ as $n$ tends to infinity, we obtain
$$(4.24) \qquad A_{11} = \sigma_h^4\binom{n-1}{m-1}^{-2}\Big(\binom{n-1}{m-1}\Big/\binom{n-m}{m-1} - 1\Big) = o(n^{-2(m-1)}).$$

Analogously, we express $\binom{n-1}{m}^4 A_{22} = \binom{n-1}{m}^4 E\big(\widetilde{U_{12}^2}\,\widetilde{U_{22}^2}\big)$ as
$$\sum_{1 \notin \mathbf{i},\ 1 \notin \mathbf{j}}\ \sum_{2 \notin \mathbf{k},\ 2 \notin \mathbf{l}}\mathrm{Cov}\big(h(X_{\mathbf{i}})h(X_{\mathbf{j}}),\ h(X_{\mathbf{k}})h(X_{\mathbf{l}})\big),$$
and show
$$(4.25) \qquad A_{22} = O(n^{-2m}).$$


The Cauchy-Schwarz inequality, (4.24) and (4.25) yield
$$A_{33}^2 = \big|E\big(\widetilde{U_{11}U_{12}}\,\widetilde{U_{21}U_{22}}\big)\big|^2 \le E\big(\widetilde{U_{11}^2}\,\widetilde{U_{21}^2}\big)\,E\big(\widetilde{U_{12}^2}\,\widetilde{U_{22}^2}\big) = o(n^{-4m+2}),$$
so that $A_{33} = o(n^{-2m+1})$. In the same manner, one shows
$$A_{12} = E\big(\widetilde{U_{11}^2}\,\widetilde{U_{22}^2}\big) = O(n^{-2m+1}), \qquad A_{13} = E\big(\widetilde{U_{11}^2}\,\widetilde{U_{21}U_{22}}\big) = O(n^{-2m+1}), \qquad A_{23} = E\big(\widetilde{U_{12}^2}\,\widetilde{U_{21}U_{22}}\big) = O(n^{-2m}).$$
These and (4.23)–(4.25) imply that
$$\mathrm{Cov}(V_{n1}^2, V_{n2}^2) = o(n^{-2(m-1)}),$$
so that the desired (4.19) is now immediate in view of (4.22).

5. Empirical likelihood for degenerate U-statistics with side information. Side information is common in semiparametric models, and its use may lead to better statistical inference, such as more powerful tests or more efficient estimation. The empirical likelihood approach is especially convenient for incorporating side information expressed via equalities. In the present section, we study the empirical likelihood of degenerate U-statistics with side information which is expressed via U-statistics of the same order of degeneracy. Specifically, we shall be concerned with vector U-statistics whose components have the same order of degeneracy. With this formulation we in fact lose no generality, as we justify next. If instead side information is expressed by a square integrable function $g$ satisfying the equality $E(g(X)) = 0$, then the side information can be expressed by the degenerate U-statistic $U_n(h_g)$ with kernel $h_g(x_1, x_2) = g(x_1)g(x_2)$, $x_1, x_2 \in \mathcal{X}$, degeneracy clearly following from the equality $E(h_g(x_1, X_2)) = 0$, $x_1 \in \mathcal{X}$. In all the examples considered, degeneracy occurs in the case that $0 = E(h_1^2(X_1)) < E(h_2^2(X_1, X_2)) < \infty$. Moreover, in this case there exist nice formulas for the limit variables; see Remark 4.1. Thus we shall focus on the empirical likelihood of such degenerate U-statistics. To this end, let $h^{(k)}: \mathcal{X}^{m_k} \mapsto \mathbb{R}$, $k = 1, \dots, r$ be measurable functions with all $m_k \ge 2$, and set $U_n(h^{(1)}, \dots, h^{(r)}) = \big(U_n(h^{(1)}), \dots, U_n(h^{(r)})\big)^\top$. Let $V_{nj}^{(k)}$, $j = 1, \dots, n$ be the jackknife values of the U-statistic $U_n(h^{(k)})$ for $k = 1, \dots, r$, and set $V_{nj} = (V_{nj}^{(1)}, \dots, V_{nj}^{(r)})^\top$. We now consider the empirical likelihood
$$\mathcal{R}_n(h^{(1)}, \dots, h^{(r)}) = \sup\Big\{\prod_{j=1}^n n\pi_j : \pi \in \mathcal{P}_n,\ \sum_{j=1}^n \pi_j V_{nj} = 0\Big\}.$$


In order to obtain the limit distribution of the empirical likelihood, we first derive the limit distribution of the vector U-statistic. Our approach is based on orthogonal expansions of kernels; see, for example, Dewan and Rao (2000) and Serfling (1980). The result is stated below. Recall that $h_2^{(k)}(x_1, x_2) = E(h^{(k)}(x_1, x_2, X_3, \dots, X_{m_k})) - E(h^{(k)})$, $x_1, x_2 \in \mathcal{X}$, is the centered version. For $a, b \in \mathbb{R}^m$, define the componentwise product of $a$ and $b$ by $a \times b = (a_1b_1, \dots, a_mb_m)^\top$.

Theorem 5.1. Suppose that $h^{(1)}, \dots, h^{(r)}$ have finite second moments such that $E(h_2^{(k)}(x, X_1)) = 0$, $x \in \mathcal{X}$, and $E(h_2^{(k)}(X_1, X_2)^2) > 0$ for every $k$. Let $1 = b_0, b_1, b_2, \dots$ be an orthonormal basis of $L_2(\mathcal{X}, \mathcal{A}, P)$. Then
$$nU_n(h^{(1)}, \dots, h^{(r)}) \Longrightarrow \xi(h^{(1)}, \dots, h^{(r)}) := \sum_{j=1}^\infty \lambda_j(h^{(1)}, \dots, h^{(r)})(Z_j^2 - 1),$$
where $\lambda_j(h^{(1)}, \dots, h^{(r)}) = (a/2) \times \big(\lambda_j(h^{(1)}), \dots, \lambda_j(h^{(r)})\big)^\top$ with $a = \big(m_1(m_1 - 1), \dots, m_r(m_r - 1)\big)^\top$, and $\lambda_j(h^{(k)})$ the Fourier coefficients of $h_2^{(k)}(x, y)$ with respect to the basis, i.e.,
$$\lambda_j(h^{(k)}) = \int h_2^{(k)}(x, y)b_j(x)b_j(y)\,dP(x)\,dP(y), \quad j = 0, 1, \dots.$$

Proof. The proof is similar to that of the theorem on page 194 of Serfling (1980); here we give a sketch and provide details where there appear to be essential differences. For brevity, we give the proof for the case $r = 2$, and set $h = h^{(1)}$, $g = h^{(2)}$, $\lambda_j = \lambda_j(h)$, $\mu_j = \lambda_j(g)$. Following page 190 of Serfling (1980), for the degenerate U-statistic $U_n(h)$, define its "projection" $\hat U_n(h)$ by
$$\hat U_n(h) - \theta_1 = \sum_{1 \le i < j \le n} E(U_n(h) \mid X_i, X_j) - \binom{n}{2}\theta_1,$$
where $\theta := (\theta_1, \theta_2)^\top = (E(h), E(g))^\top$. Then (2) and (3) on page 190 of Serfling (1980) with $c = 2$ read
$$(5.1) \qquad \hat U_n(h) - \theta_1 = \frac{m(m-1)}{n(n-1)}\sum_{1 \le i < j \le n} h_2(X_i, X_j)$$
and
$$E\big((U_n(h) - \hat U_n(h))^2\big) = O(n^{-3}).$$


The above of course also holds for the kernel $g$. Thus
$$E(|U_n(h, g) - \hat U_n(h, g)|^2) = O(n^{-3}),$$
implying $n(U_n(h, g) - \hat U_n(h, g)) \stackrel{p}{\to} 0$; hence the limit law of $n(U_n(h, g) - \theta)$ is identical to that of $n(\hat U_n(h, g) - \theta)$. Consequently, we are left to show that the random vectors
$$n(\hat U_n(h, g) - \theta) = \frac{1}{n-1}\sum_{1 \le i < j \le n} a \times (h_2, g_2)^\top(X_i, X_j)$$
converge in distribution to the random vector
$$(a/2) \times (Y_h, Y_g)^\top.$$
Here $Y_h = \sum_{j=1}^\infty \lambda_j(Z_j^2 - 1)$ and $Y_g = \sum_{j=1}^\infty \mu_j(Z_j^2 - 1)$. Putting
$$T_n = \frac{1}{n}\sum_{i \neq j} h_2(X_i, X_j), \qquad S_n = \frac{1}{n}\sum_{i \neq j} g_2(X_i, X_j),$$
we have
$$n(\hat U_n(h, g) - \theta) = \frac{n}{2(n-1)}\,a \times (T_n, S_n)^\top.$$
Thus our objective is to show
$$(T_n, S_n) \Longrightarrow (Y_h, Y_g).$$
This can be carried out by showing
$$E(\exp(i(tT_n + sS_n))) \to E(\exp(i(tY_h + sY_g))), \quad s, t \in \mathbb{R},$$
where $i^2 = -1$. The above limit is clearly implied by the following three inequalities:
$$(5.2) \qquad |E[\exp(i(tT_n + sS_n)) - \exp(i(tT_{nK} + sS_{nK}))]| < \varepsilon, \quad n \to \infty,$$
$$(5.3) \qquad |E[\exp(i(tT_{nK} + sS_{nK})) - \exp(i(tY_{hK} + sY_{gK}))]| < \varepsilon, \quad n \to \infty,$$
for arbitrary fixed $K$, and
$$(5.4) \qquad |E[\exp(i(tY_{hK} + sY_{gK})) - \exp(i(tY_h + sY_g))]| < \varepsilon, \quad K \to \infty,$$
for arbitrary $\varepsilon > 0$ and $s, t \in \mathbb{R}$, where
$$T_{nK} = \frac{1}{n}\sum_{i \neq j}\sum_{k=1}^K \lambda_k b_k(X_i)b_k(X_j), \qquad S_{nK} = \frac{1}{n}\sum_{i \neq j}\sum_{k=1}^K \mu_k b_k(X_i)b_k(X_j),$$


$$Y_{hK} = \sum_{k=1}^K \lambda_k(Z_k^2 - 1), \qquad Y_{gK} = \sum_{k=1}^K \mu_k(Z_k^2 - 1).$$

Let us first state a few pertinent properties of orthogonal expansions. We express
$$(5.5) \qquad h_2(x, y) = \sum_{k=1}^\infty \lambda_k b_k(x)b_k(y)$$
as the mean square limit of $\sum_{k=1}^K \lambda_k b_k(x)b_k(y)$ as $K \to \infty$, that is,
$$(5.6) \qquad \lim_{K \to \infty} E\Big(h_2(X_1, X_2) - \sum_{k=1}^K \lambda_k b_k(X_1)b_k(X_2)\Big)^2 = 0.$$
In the same sense,
$$(5.7) \qquad h_1(x) = \sum_{k=1}^\infty \lambda_k b_k(x)E(b_k(X)).$$
Therefore, since $h$ is degenerate,
$$E(b_k(X)) = 0, \quad k = 1, 2, \dots.$$
Furthermore, by the properties of projections and the orthonormal basis,
$$(5.8) \qquad E\Big(h_2(X_1, X_2) - \sum_{k=1}^K \lambda_k b_k(X_1)b_k(X_2)\Big)^2 = E(h_2^2(X_1, X_2)) - \sum_{k=1}^K \lambda_k^2,$$
whence by Parseval's identity,
$$(5.9) \qquad \sum_{k=1}^\infty \lambda_k^2 = E(h_2^2(X_1, X_2)) < \infty.$$
Obviously (5.5)–(5.9) also hold for the kernel $g$. By the representation (5.5),
$$T_n = \frac{1}{n}\sum_{i \neq j}\sum_{k=1}^\infty \lambda_k b_k(X_i)b_k(X_j), \qquad S_n = \frac{1}{n}\sum_{i \neq j}\sum_{k=1}^\infty \mu_k b_k(X_i)b_k(X_j).$$
Using the inequality
$$(5.10) \qquad |\exp(iz) - 1| \le |z|$$


for arbitrary real $z$, we have
$$|E(\exp(i(tT_n + sS_n))) - E(\exp(i(tT_{nK} + sS_{nK})))| \le (|s| + |t|)\big(E^{1/2}(|T_n - T_{nK}|^2) + E^{1/2}(|S_n - S_{nK}|^2)\big).$$
This, (5.9) and the following two inequalities, from (5) on page 197 of Serfling (1980), immediately yield (5.2):
$$(5.11) \qquad E((T_n - T_{nK})^2) \le 2\sum_{k=K+1}^\infty \lambda_k^2, \qquad E((S_n - S_{nK})^2) \le 2\sum_{k=K+1}^\infty \mu_k^2.$$

Following pages 198–199 of Serfling (1980), we write
$$(T_{nK}, S_{nK}) = \sum_{k=1}^K (\lambda_k, \mu_k)(B_{kn}^2 - C_{kn}),$$
where
$$B_{kn} = n^{-1/2}\sum_{i=1}^n b_k(X_i), \qquad C_{kn} = n^{-1}\sum_{i=1}^n b_k^2(X_i).$$
An application of the central limit theorem yields
$$(B_{1n}, \dots, B_{Kn}) \Longrightarrow N(0, I_{K \times K}), \quad n \to \infty.$$
Also, by the strong law of large numbers,
$$(C_{1n}, \dots, C_{Kn}) \stackrel{\mathrm{a.s.}}{\longrightarrow} (1, \dots, 1), \quad n \to \infty.$$
Consequently, by Slutsky's theorem, for arbitrary fixed $K$,
$$(T_{nK}, S_{nK}) \Longrightarrow \sum_{k=1}^K (\lambda_k, \mu_k)(Z_k^2 - 1) = (Y_{hK}, Y_{gK}).$$

This implies (5.3). Finally, by (5.10),
$$|E(\exp(i(tY_{hK} + sY_{gK}))) - E(\exp(i(tY_h + sY_g)))| \le E|\exp(i(t[Y_{hK} - Y_h] + s[Y_{gK} - Y_g])) - 1|$$
$$\le (|s| + |t|)(E|Y_{hK} - Y_h| + E|Y_{gK} - Y_g|) \le (|s| + |t|)\big(E^{1/2}(|Y_h - Y_{hK}|^2) + E^{1/2}(|Y_g - Y_{gK}|^2)\big)$$
$$\le 2(|s| + |t|)E^{1/2}(|Z_1^2 - 1|^2)\Big(\Big[\sum_{k=K+1}^\infty \lambda_k^2\Big]^{1/2} + \Big[\sum_{k=K+1}^\infty \mu_k^2\Big]^{1/2}\Big).$$
This immediately yields (5.4) in view of the convergence in (5.9), and finishes the proof.


Theorem 5.2. Suppose that $h^{(1)}, \dots, h^{(r)}$ have finite fourth moments such that $E(h_2^{(k)}(x, X_1)) = 0$, $x \in \mathcal{X}$, and $E(h_2^{(k)}(X_1, X_2)^2) > 0$ for every $k$. Assume the $r \times r$ matrix $W$ with $(i, j)$ entry $W_{ij} = \mathrm{Cov}\big(h_2^{(i)}, h_2^{(j)}\big)$ is positive definite. Then
$$-2\log\mathcal{R}_n(h^{(1)}, \dots, h^{(r)}) \Longrightarrow \xi(h^{(1)}, \dots, h^{(r)})^\top W^{-1}\xi(h^{(1)}, \dots, h^{(r)}),$$
where $\xi(h^{(1)}, \dots, h^{(r)})$ is given in Theorem 5.1.

Proof. For brevity, we give the proof for the case $r = 2$, and set $h = h^{(1)}$, $g = h^{(2)}$, $\lambda_j = \lambda_j(h)$, $\mu_j = \lambda_j(g)$, $v_{nj} = V_{nj}^{(1)}$ and $u_{nj} = V_{nj}^{(2)}$. The proof is carried out by applying Theorem 4.1. This amounts to verifying the three conditions in (4.11) of that theorem with $T_{nj} = (v_{nj}, u_{nj})^\top$, $r = 2$, $\nu = 2$, $N_n = n$ and $W(\nu) = W$ given in (5.12). We shall use the results established in the proof of Theorem 4.2, in which the jackknife values "$V_{nj}$" were verified to satisfy the three conditions. Specifically, the "$V_{nj}$" there were shown to satisfy (4.18), so taking them to be $v_{nj}$ and $u_{nj}$ with $m = 2$, we conclude that $V_{nj} = (v_{nj}, u_{nj})^\top$ satisfies the first condition. Since the "$V_{nj}$" were proved to satisfy (4.20), we again take them to be $v_{nj}$ and $u_{nj}$ with $m = 2$ to obtain
$$\sum_{j=1}^n v_{nj}^2 = 4\sigma_h^2 + o_P(1), \qquad \sum_{j=1}^n u_{nj}^2 = 4\sigma_g^2 + o_P(1).$$
These and the following equality imply the third condition in (4.11):
$$(5.12) \qquad \sum_{j=1}^n v_{nj}u_{nj} = 4\,\mathrm{Cov}(h, g) + o_P(1),$$
which shall be proved next.



Indiana University Purdue University at Indianapolis
Department of Mathematical Sciences
Indianapolis, IN 46202-3267
USA
E-mail: [email protected]