Semiparametric Likelihood Ratio Inference Revisited

December 2000

Abstract

We extend the Semiparametric Likelihood Ratio Theorem of Murphy and Van der Vaart for one-dimensional parameters to Euclidean parameters of any dimension. The asymptotic distribution of the likelihood ratio statistic for testing a k-dimensional Euclidean parameter is shown to be the usual $\chi^2_k$ under the null hypothesis. This result is useful not only for testing purposes but also in forming likelihood ratio based confidence regions. We also obtain the behavior of the likelihood ratio statistic under local (contiguous) alternatives; this is non-central $\chi^2_k$, with a non-centrality parameter involving the direction of perturbation of the null hypothesis and the efficient information matrix for the Euclidean parameter under the null hypothesis in the problem. Some applications are provided.

1 Introduction

A rigorous treatment of semiparametric likelihood ratio inference was first established in MURPHY AND VAN DER VAART (1997). Models indexed by an infinite dimensional parameter are considered, and real valued functionals of the class of distributions under study are treated. It is shown that under proper conditions the usual likelihood ratio statistic, defined as twice the difference between the maximum values of the log-likelihood in the unconstrained case (where the class of distributions is allowed to vary freely) and the constrained case (where the parameter is allowed to vary only so that the real valued functional of the distribution assumes a pre-specified value), converges in distribution under the null hypothesis to a $\chi^2_1$ random variable. This result can then be used not only in hypothesis testing but also to construct confidence sets for the real valued functional. The primary device used in their Likelihood Ratio Statistic Theorem is that, for each value of the functional, an "approximately least favorable" one-dimensional submodel is constructed (defining curves in the parameter space). The submodel is approximately least favorable in the sense that the score function obtained from it at the true values turns out to be the efficient score function in the semiparametric problem, which yields the linear approximation (up to scaling by its dispersion matrix) to the centered and scaled maximum likelihood estimate of the real valued parameter. The idea is then to minorize and majorize the LRT by random quantities that are functions of the scores from the submodels and their derivatives (which can be handled easily under the assumptions of the theorem) and to show that both these quantities converge to the same $\chi^2_1$ limit. Thus the whole problem is essentially brought down to a one-dimensional parametric setting in which framework things can be handled without having to resort to complicated techniques. However, some of the regularity conditions of the theorem can indeed be quite difficult to verify, as illustrated in MURPHY AND VAN DER VAART (1997), and need extensive applications of Empirical Process theory.

When the real valued functional is replaced by a k-dimensional one, these ideas undergo natural extensions. The general program of this paper is the following. We first extend the theorem of Murphy and Van der Vaart to a k-dimensional parameter. We establish, under appropriate regularity conditions, the distribution of the likelihood ratio statistic under the null hypothesis. The curves in the parameter space now become surfaces, and instead of one-dimensional parametric submodels we construct k-dimensional parametric submodels; the conditions imposed become natural extensions of those in the one-dimensional case.

Since the limit is $\chi^2_k$, the cut-off for the test (equivalently, the threshold defining the confidence region) is simply the $(1-\alpha)$'th quantile of the $\chi^2_k$ distribution. Likelihood ratio based confidence sets are well-behaved and in several respects nicer than confidence sets obtained from the asymptotic distribution of MLE's, since the latter involve estimating unknown features of the underlying distribution from the data. In the likelihood ratio context this is obviated owing to the pivotal nature of the limiting distribution. Another nice property of likelihood ratio based confidence sets is their invariance under transformations, which confidence sets obtained by other procedures (for example, ones based on the asymptotic distribution of the MLE's or Wald-type confidence ellipsoids) do not enjoy. If we take a smooth one to one transformation of $\theta$, then, for a fixed confidence level, the likelihood-ratio based confidence set for the transformed parameter is precisely the image of the likelihood-ratio based confidence set for the original $\theta$ under the transformation. Finally, likelihood ratio based confidence sets are more "data-driven". The shape of likelihood-based confidence sets generally reflects emphasis in the observed data; in other words, the data dictate the shape of the region. This is a property that other methods often lack and is especially true for the Wald-based method, which always yields confidence ellipsoids centered around the MLE. See BANERJEE (2000) for a more extensive discussion of the above issues. A discussion of likelihood ratio based estimation of confidence regions is also available in MEEKER AND ESCOBAR (1995).
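The inversion of the likelihood ratio statistic against the $\chi^2_k$ quantile can be sketched numerically as follows. This is a minimal illustration, not a construction taken from the paper: the profile log-likelihood `profile_loglik` and the candidate grid are hypothetical placeholders, and grid inversion is just one convenient implementation.

```python
import numpy as np
from scipy.stats import chi2

def lr_confidence_region(profile_loglik, theta_grid, k, alpha=0.05):
    """Invert the likelihood ratio statistic over a grid of candidate theta values.

    profile_loglik : callable returning sup_{nuisance} log-likelihood at theta
                     (a hypothetical placeholder for whatever profiling is feasible).
    theta_grid     : iterable of candidate k-dimensional theta values.
    k              : dimension of the Euclidean parameter being tested.
    """
    logliks = np.array([profile_loglik(theta) for theta in theta_grid])
    max_loglik = logliks.max()                  # unconstrained maximum over the grid
    lrt = 2.0 * (max_loglik - logliks)          # likelihood ratio statistic at each theta
    cutoff = chi2.ppf(1.0 - alpha, df=k)        # (1 - alpha)'th quantile of chi^2_k
    keep = lrt <= cutoff                        # theta's retained in the region
    return [theta for theta, inside in zip(theta_grid, keep) if inside]
```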

Another problem that we address is the power issue. We derive expressions for the power of the semiparametric likelihood ratio test under local alternatives. To this end we typically assume our parameter to vary in a Hilbert or a Banach space; if $\psi_0$ is the true value of the parameter we consider alternatives of the form $\psi_0 + h/\sqrt{n} + o(1/\sqrt{n})$. These correspond to the curve $t \mapsto \psi_0 + t\,h + o(t)$ through the parameter space (at least for small values of $t$) with gradient $h$ at the point $0$. More accurately, it is really the space of gradients that needs to be embedded in a Hilbert or Banach space. If we now have Hellinger differentiability of the class of probability measures along this path, the local alternatives are also contiguous alternatives, and contiguity theory can be used to show that the LRT converges under this sequence of (contiguous) alternatives to a non-central $\chi^2$ with k degrees of freedom. Expressions for the non-centrality parameter can then be obtained; these involve a quadratic form in the efficient information matrix for the corresponding estimation problem and parallel the expressions obtained in the parametric case.
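Once the non-central $\chi^2_k$ limit is available, the limiting power of the level-$\alpha$ test is obtained by evaluating the non-central distribution at the central $\chi^2_k$ cut-off. A minimal numerical sketch (the dimension and non-centrality value below are made-up illustrations, not computed from any model in the paper):

```python
from scipy.stats import chi2, ncx2

def limiting_power(k, delta, alpha=0.05):
    """Limiting power of the level-alpha likelihood ratio test when the statistic
    converges to a non-central chi^2_k with non-centrality parameter delta."""
    cutoff = chi2.ppf(1.0 - alpha, df=k)        # critical value under the null (delta = 0)
    return ncx2.sf(cutoff, df=k, nc=delta)      # P(chi^2_k(delta) > cutoff)

# Hypothetical numbers: testing a 2-dimensional parameter with delta = 5.
print(limiting_power(k=2, delta=5.0))
```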

The paper is organized as follows. In Section 2 we present background on semiparametric likelihood ratios and on differentiable functionals. Statements of results are given in Section 3; these comprise the asymptotic distribution of the semiparametric likelihood ratio statistic under the null hypothesis and that under contiguous alternatives. Section 4 contains proofs of these results. Section 5 treats the special case of partitioned parameters, in which the Euclidean parameter of interest is one component of the full parameter, Section 6 provides applications, and Section 7 contains some concluding comments.

2 Semiparametric Likelihood Ratios and Differentiable Functionals

Semiparametric Likelihood Ratios:

Consider a situation where the observations $X_1, X_2, \ldots, X_n$ are random samples from the distribution $P_\psi$, where $\psi$ belongs to a set $\Psi$. We assume a topology on the set $\Psi$ ($\Psi$ will be seen to be a subset of a Banach or Hilbert space). Given the definition of the likelihood for an observation (in many cases this is a minor modification of $p_\psi(\cdot)$, where $p_\psi(\cdot)$ is defined as
$$p_\psi(\cdot) = \frac{dP_\psi}{d\mu}(\cdot),$$
$\mu$ being some common dominating $\sigma$-finite measure), the likelihood ratio statistic for testing the null hypothesis $\theta(\psi) = \theta_0$ is given by
$$\mathrm{lrt}_n(\theta_0) = 2\left[\sum_{i=1}^n \log\bigl(\mathrm{lik}(\hat\psi, X_i)\bigr) - \sum_{i=1}^n \log\bigl(\mathrm{lik}(\hat\psi_0, X_i)\bigr)\right],$$
where $\hat\psi$ and $\hat\psi_0$ are the unconstrained and constrained maximizers of the likelihood, respectively. In the cases we consider, $\theta$ takes values in $\mathbb{R}^k$.
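In practice $\mathrm{lrt}_n(\theta_0)$ is computed by maximizing the log-likelihood with and without the constraint $\theta(\psi) = \theta_0$. The sketch below uses a finite-dimensional stand-in for $\psi$ (a simplification of the semiparametric setting); the callables `loglik` and `theta_of_psi` are hypothetical placeholders.

```python
import numpy as np
from scipy.optimize import minimize

def lrt_statistic(loglik, theta_of_psi, theta0, psi_init):
    """Likelihood ratio statistic 2*[sup loglik - sup_{theta(psi)=theta0} loglik].

    loglik       : callable psi -> summed log-likelihood (hypothetical placeholder).
    theta_of_psi : callable psi -> k-vector theta(psi) (hypothetical placeholder).
    theta0       : null value of the Euclidean parameter.
    psi_init     : starting value for the optimizer.
    """
    neg = lambda psi: -loglik(np.asarray(psi))
    unconstrained = minimize(neg, psi_init, method="BFGS")
    constraint = {"type": "eq",
                  "fun": lambda psi: theta_of_psi(np.asarray(psi)) - theta0}
    constrained = minimize(neg, psi_init, method="SLSQP", constraints=[constraint])
    # .fun holds the minimized *negative* log-likelihood, so the difference is >= 0.
    return 2.0 * (constrained.fun - unconstrained.fun)
```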

On Differentiable Functionals:

Assume now that $\psi$ lies in a subset $\Psi$ of a Hilbert space $\mathcal{H}$ and that the distributions $\{P_\psi : \psi \in \Psi\}$ are dominated by a common $\sigma$-finite measure $\mu$. Also assume Hellinger differentiability of the model with respect to a certain collection of paths through the true parameter $\psi_0$, with score operator $A$. Consider $\theta$ as a map from the space $\Psi$ to $\mathbb{R}^k$; the map is called pathwise norm-differentiable at $\psi_0$ if there exists a continuous linear map $\dot\theta$ such that, along every path with gradient $h$, $\theta(\psi_t) - \theta(\psi_0) = t\,\dot\theta(h) + o(t)$.

A necessary and sufficient condition for pathwise norm differentiability is due to Van der Vaart (1991), and failure of this condition implies the non-existence of regular estimators of $\theta$. When the condition for differentiability holds, the derivative of the induced functional on the model can be composed with the score operator $A$ and its adjoint $A^*$, and it is easily shown that the necessary and sufficient condition amounts to the following: for each $1 \le j \le k$ there exists a solution to the equation $A^* x = \dot\theta_j$ in the closure of the range of $A$, where $\dot\theta_j$ denotes the $j$'th component of the derivative. We denote the unique vector of solutions by $g_0 = (g_{0,1}, \ldots, g_{0,k})$; it is also the case that, for each $j$, $g_{0,j}$ is the influence function for the $j$'th component of $\theta$ (where $e_j$ denotes the $j$'th canonical basis vector). Since a linear map on $\mathbb{R}^k$ is determined by the values it assumes at the canonical basis vectors, we call $g_0$ the efficient influence function and identify it with the vector of its values at the canonical basis vectors. In many semiparametric situations it is this vector that provides the linear approximation to the centered and scaled MLE of $\theta$, the parameter of interest.

To summarize: $g_0$ is the efficient influence function for the estimation of $\theta$ at the parameter $\psi_0$, and its dispersion matrix acts as the information bound. Suppose that this matrix is invertible. One can then exhibit a direction $h_0$ in the space of gradients whose score is expressible through $A$ and $A^*$ and such that a one-dimensional submodel with this gradient (up to a multiple) achieves the upper bound for the information, namely the variance of the efficient influence function; $h_0$ is therefore the least favorable direction in the space of gradients for the estimation of $\theta$. To see this, note that the information along a one-dimensional submodel with gradient $h$ is $\|Ah\|^2$.
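The relations of this subsection can be collected in display form; this is a compact, schematic restatement of the standard framework of Van der Vaart (1991) as reconstructed above (with $\tilde I_0$ denoting the efficient information matrix), not a verbatim transcription of the paper's displays:
$$A : \mathcal{H} \to L_2^0(P_{\psi_0}), \qquad \theta : \Psi \to \mathbb{R}^k, \qquad A^* g_{0,j} = \dot\theta_j \ \text{ for some } g_{0,j} \in \overline{\mathcal{R}(A)}, \quad 1 \le j \le k,$$
$$g_0 = (g_{0,1}, \ldots, g_{0,k}), \qquad \mathrm{Cov}_{P_{\psi_0}}\bigl(g_0(X)\bigr) = \tilde I_0^{-1}, \qquad \text{information along a gradient } h \;=\; \|Ah\|^2 .$$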

The notions introduced here will be useful in characterizing the expressions for the power under local alternatives.

3 Asymptotic distribution of Semiparametric Likelihood Ratio Statistics: statements of results.

We first state a theorem giving the asymptotic distribution of the log-likelihood ratio statistic under the null hypothesis.

3.1 General Semiparametric Likelihood Ratio Statistic Theorem

Let $\psi_0$ denote the true value of the parameter, with $\theta_0 = \theta(\psi_0)$ and $P_0 = P_{\psi_0}$. We make the following assumptions.

• A.1 $\hat\theta_n = \theta(\hat\psi)$, the value of the Euclidean parameter at the unconstrained MLE, is asymptotically linear in the efficient influence function; that is,
$$\sqrt{n}\,(\hat\theta_n - \theta_0) = \tilde I_0^{-1}\,\sqrt{n}\,\mathbb{P}_n\,\tilde l_0 + o_{P_0}(1), \qquad (3.3)$$
where $\tilde l_0$ denotes the efficient score function for $\theta$ at the true parameter and $\tilde I_0$ the efficient information matrix.

• A.2 For every $\theta$ in a neighbourhood $U$ of $\theta_0$ and every $\psi$ in $\Psi$, assume that there exists a surface $t \mapsto \psi_t(\theta, \psi)$, defined for $t$ in $U$, taking values in $\Psi$ and satisfying the following:

(a) the surface passes through $\psi$, i.e. $\psi_\theta(\theta, \psi) = \psi$;

(b) the Euclidean parameter along the surface equals the index, i.e. $\theta(\psi_t(\theta, \psi)) = t$;

(c) the log-likelihood along the surface, $\ell(x; t, \psi) = \log \mathrm{lik}(\psi_t(\theta, \psi), x)$, is twice continuously differentiable in $t$ for every $x$, with derivatives $\dot\ell$ and $\ddot\ell$ with respect to $t$;

(d) for the derivatives $\dot\ell$ and $\ddot\ell$ in (c),
$$\dot\ell(\cdot;\theta_0,\psi_0) = \tilde l_0, \qquad (3.4)$$
and, for any random sequences $\tilde\theta_n \to_p \theta_0$ and $\hat\psi_n \to_p \psi_0$,
$$\sqrt{n}\,(\mathbb{P}_n - P_0)\bigl(\dot\ell(\cdot;\tilde\theta_n,\hat\psi_n) - \dot\ell(\cdot;\theta_0,\psi_0)\bigr) \to 0 \ \text{ under } P_0, \ \text{ and}\quad \mathbb{P}_n\,\ddot\ell(\cdot;\tilde\theta_n,\hat\psi_n) \to_p -\tilde I_0, \qquad (3.5)$$
$$\sqrt{n}\,P_0\,\bigl(\dot\ell(\cdot;\theta_0,\hat\psi_n)\bigr) \to_p 0. \qquad (3.6)$$

• A.3 Suppose that both the unconstrained and constrained maximizers of the likelihood, $\hat\psi$ and $\hat\psi_0$, are consistent under $P_0$.

Theorem 3.1 If A.1, A.2 and A.3 hold, then
$$\mathrm{lrt}_n(\theta_0) = n\,(\hat\theta_n - \theta_0)^T\,\tilde I_0\,(\hat\theta_n - \theta_0) + o_{P_0}(1) \to_d Z^T \tilde I_0\, Z,$$
where $Z \sim N_k(0, \tilde I_0^{-1})$; that is, $\mathrm{lrt}_n(\theta_0) \to_d \chi^2_k$.

Comments: As in the one-dimensional case, the efficient score function for $\theta$ is the projection of the ordinary score for $\theta$ (the derivative of the log-likelihood with the nuisance part held fixed) onto the orthocomplement of the closed linear span of the nuisance scores, which is a closed subspace of the Hilbert space $L_2(P_0)$; and it is this efficient score function that provides, through its relation to the efficient influence function, the linear approximation to the centered and scaled MLE of $\theta$, the parameter of interest. Even in situations where the nuisance parameter does not admit $\sqrt{n}$-consistent estimation, it is often the case that $\tilde I_0^{-1}\tilde l_0$ is the efficient influence function as defined in Van der Vaart's differentiability theorem. This of course pertains to situations where the parameter of interest $\theta$ is pathwise norm-differentiable and can be estimated at the $\sqrt{n}$ rate, which is the case in most situations of interest; it is however not necessarily the case that $\sqrt{n}$-consistent estimators exist for the nuisance parameters as well.

3.2 Power under Local (Contiguous) Alternatives.

The following theorem characterizes the limiting power of the likelihood ratio test under a sequence of local alternatives converging to the true value of the parameter.

Theorem 3.2 Consider the likelihood ratio statistic $\mathrm{lrt}_n(\theta_0)$ for testing the null hypothesis $\theta = \theta_0$. Let $\psi_0$ in the null hypothesis be the true value of the parameter and consider local alternatives of the form $\psi_n = \psi_0 + h/\sqrt{n} + o(n^{-1/2})$ at stage $n$. Assume that the model is Hellinger differentiable along the curve $t \mapsto \psi_0 + t h + o(t)$ with derivative operator $A$ according to the representation in Section 2. Also assume that the conditions of Theorem 3.1 hold under $P_0$. Then:

• Under this sequence of local alternatives the likelihood ratio statistic converges in distribution to $Z^T \tilde I_0 Z$, where $Z$ follows a normal distribution with mean $\tilde I_0^{-1} c$ and covariance matrix $\tilde I_0^{-1}$; this is a non-central $\chi^2_k$ distribution with non-centrality parameter $c^T \tilde I_0^{-1} c$, where $\tilde I_0$ is the efficient information matrix and $c$ is the covariance, under $P_0$, between $Ah$ and $\tilde l_0$, the efficient score.

• The conclusions of the theorem remain valid when the parameter varies in a Banach space: pathwise norm differentiability is characterized in the same way as in the Hilbert space setting, and the efficient score and efficient influence functions behave in the same manner, but the formulas require proper re-interpretation since there is not a natural identification between a Banach space and its dual.

4 Proofs of Results in Section 3.

We first present a proof of Theorem 3.1.

Proof of Theorem 3.1: We show that the likelihood ratio statistic can be sandwiched as
$$Q_n + o_p(1) \;\le\; \mathrm{lrt}_n(\theta_0) \;\le\; \bar Q_n + o_p(1),$$
where each of $Q_n$ and $\bar Q_n$ converges in distribution to $\chi^2_k$. The $o_p(1)$ terms that appear in the minorant and the majorant need not be the same. The result of the theorem then follows immediately.

We have
$$\mathrm{lrt}_n(\theta_0) = 2n\,\mathbb{P}_n\bigl[\ln\mathrm{lik}(\hat\psi, \cdot)\bigr] - 2n\,\mathbb{P}_n\bigl[\ln\mathrm{lik}(\hat\psi_0, \cdot)\bigr] \;\le\; 2n\,\mathbb{P}_n\bigl[\ln\mathrm{lik}(\hat\psi, \cdot) - \ln\mathrm{lik}\bigl(\psi_{\theta_0}(\hat\theta_n, \hat\psi), \cdot\bigr)\bigr],$$
since $\mathbb{P}_n[\ln\mathrm{lik}(\psi_{\theta_0}(\hat\theta_n,\hat\psi),\cdot)] \le \mathbb{P}_n[\ln\mathrm{lik}(\hat\psi_0,\cdot)]$. The last inequality is a consequence of the fact that $\theta(\psi_{\theta_0}(\hat\theta_n,\hat\psi)) = \theta_0$ and that $\hat\psi_0$ by definition is the maximizer of $\mathbb{P}_n[\ln\mathrm{lik}(\psi,\cdot)]$ over the null hypothesis $\theta(\psi) = \theta_0$. Writing $\ell(\cdot;t,\hat\psi)$ for the log-likelihood along the surface anchored at $(\hat\theta_n,\hat\psi)$, so that $\ell(\cdot;\hat\theta_n,\hat\psi) = \ln\mathrm{lik}(\hat\psi,\cdot)$ and $\ell(\cdot;\theta_0,\hat\psi) = \ln\mathrm{lik}(\psi_{\theta_0}(\hat\theta_n,\hat\psi),\cdot)$, we thus have the following string of inequalities:
$$\mathrm{lrt}_n(\theta_0) \le 2n\,\mathbb{P}_n\bigl[\ell(\cdot;\hat\theta_n,\hat\psi) - \ell(\cdot;\theta_0,\hat\psi)\bigr] = 2n\,(\hat\theta_n - \theta_0)^T\,\mathbb{P}_n\,\dot\ell(\cdot;\hat\theta_n,\hat\psi) \;-\; n\,(\hat\theta_n - \theta_0)^T\,\bigl[\mathbb{P}_n\,\ddot\ell(\cdot;\theta_n^*,\hat\psi)\bigr]\,(\hat\theta_n - \theta_0),$$
by a two-term Taylor expansion in $t$, where $\theta_n^*$ lies on the line segment joining $\theta_0$ and $\hat\theta_n$. Now, since $\hat\psi = \psi_{\hat\theta_n}(\hat\theta_n,\hat\psi)$ maximizes the likelihood over $\Psi$ and the surface takes its values in $\Psi$, the map $t \mapsto \mathbb{P}_n\,\ell(\cdot;t,\hat\psi)$ is maximized at $t = \hat\theta_n$, whence $\mathbb{P}_n\,\dot\ell(\cdot;\hat\theta_n,\hat\psi) = 0$ eventually ($\hat\theta_n$ being interior to $U$ with probability tending to one). The conditions in the statement of the theorem, and also the fact that $\hat\theta_n$ and $\hat\psi$ are consistent, show immediately that the second term converges to the asymptotic quadratic form. Thus we have shown that
$$\mathrm{lrt}_n(\theta_0) \le n\,(\hat\theta_n - \theta_0)^T\,\tilde I_0\,(\hat\theta_n - \theta_0) + o_p(1),$$
where, by A.1, $\sqrt{n}\,(\hat\theta_n - \theta_0) \to_d N_k(0, \tilde I_0^{-1})$, so that the majorant converges in distribution to $\chi^2_k$.

The reverse inequality follows in the following way. Recall that $\hat\psi$ is the global maximizer of the likelihood and that $\theta(\hat\psi_0) = \theta_0$ since $\hat\psi_0$ lies in the null hypothesis. We then have
$$\mathrm{lrt}_n(\theta_0) = 2n\,\mathbb{P}_n\bigl[\ln\mathrm{lik}(\hat\psi,\cdot)\bigr] - 2n\,\mathbb{P}_n\bigl[\ln\mathrm{lik}(\hat\psi_0,\cdot)\bigr] \;\ge\; 2n\,\mathbb{P}_n\bigl[\ln\mathrm{lik}\bigl(\psi_{\hat\theta_n}(\theta_0,\hat\psi_0),\cdot\bigr) - \ln\mathrm{lik}(\hat\psi_0,\cdot)\bigr] = 2n\,\mathbb{P}_n\bigl[\ell(\cdot;\hat\theta_n,\hat\psi_0) - \ell(\cdot;\theta_0,\hat\psi_0)\bigr],$$
since the surface anchored at $(\theta_0,\hat\psi_0)$ takes its values in $\Psi$ and $\hat\psi$ maximizes the likelihood over all of $\Psi$. A two-term Taylor expansion in $t$ about $\theta_0$ gives
$$2n\,\mathbb{P}_n\bigl[\ell(\cdot;\hat\theta_n,\hat\psi_0) - \ell(\cdot;\theta_0,\hat\psi_0)\bigr] = 2n\,(\hat\theta_n-\theta_0)^T\,\mathbb{P}_n\,\dot\ell(\cdot;\theta_0,\hat\psi_0) + n\,(\hat\theta_n-\theta_0)^T\,\bigl[\mathbb{P}_n\,\ddot\ell(\cdot;\theta_n^{**},\hat\psi_0)\bigr]\,(\hat\theta_n-\theta_0),$$
with $\theta_n^{**}$ on the segment joining $\theta_0$ and $\hat\theta_n$. By the conditions of the theorem, the first term is $2n\,(\hat\theta_n-\theta_0)^T\,\mathbb{P}_n\,\tilde l_0 + o_p(1) = 2n\,(\hat\theta_n-\theta_0)^T\,\tilde I_0\,(\hat\theta_n-\theta_0) + o_p(1)$ by (3.3), while the second term is $-n\,(\hat\theta_n-\theta_0)^T\,\tilde I_0\,(\hat\theta_n-\theta_0) + o_p(1)$. We thus obtain the following inequality:
$$\mathrm{lrt}_n(\theta_0) \ge n\,(\hat\theta_n-\theta_0)^T\,\tilde I_0\,(\hat\theta_n-\theta_0) + o_p(1).$$
Now this and the majorization above, in conjunction, finish the proof.

A few more words about the proof of the theorem. Note that the above argument uses the surfaces only for $\psi = \hat\psi$ and $\psi = \hat\psi_0$ and for $t$ in a neighborhood of $\theta_0$. It assumes that $\hat\theta_n$ belongs to $U$ and that $\hat\psi$ and $\hat\psi_0$ belong to F. This is slightly stronger than what consistency of $\hat\psi$ and $\hat\psi_0$ and of $\hat\theta_n$ for $\theta_0$ guarantees; namely that, with probability tending to 1, $\hat\psi$ and $\hat\psi_0$ belong to F and $\hat\theta_n$ belongs to U eventually. But this is all that is needed to make the argument of the theorem work. The slightly stronger assumption we use helps in obviating some minor technicalities while keeping the key ideas of the proof intact.

Proof of Theorem 3.2: Hellinger differentiability of the model along the curve $t \mapsto \psi_0 + t h + o(t)$ implies that
$$\int\left[\frac{dP_{\psi_t}^{1/2} - dP_{\psi_0}^{1/2}}{t} - \tfrac12\,(Ah)\, dP_{\psi_0}^{1/2}\right]^2 \to 0$$
as $t \to 0$. Now set
$$W_n = \log \prod_{i=1}^n \frac{p_{\psi_n}(X_i)}{p_{\psi_0}(X_i)}.$$
This is the log likelihood ratio based on observations $X_1, X_2, \ldots, X_n$ at stage $n$. By Lemma 3.10.11 of VAN DER VAART AND WELLNER (1996) we obtain a LAN expansion for the log-likelihood ratio as shown below:
$$W_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n Ah(X_i) - \frac{1}{2}\,\|Ah\|^2_{P_0} + o_{P_0}(1), \qquad \|Ah\|^2_{P_0} = \int (Ah)^2\, dP_0 .$$

The convergences in probability established above, the multivariate CLT and Slutsky's theorem, applied in conjunction, allow us to conclude that, under $P_0$,
$$\Bigl(\sqrt{n}\,\mathbb{P}_n\,\tilde l_0,\; W_n\Bigr) \to_d N_{k+1}\!\left(\begin{pmatrix}0\\[2pt] -\tfrac12\|Ah\|^2_{P_0}\end{pmatrix},\; \begin{pmatrix}\tilde I_0 & c\\[2pt] c^T & \|Ah\|^2_{P_0}\end{pmatrix}\right),$$
where
$$c = \mathrm{Cov}_{P_0}\bigl(\tilde l_0(X),\, Ah(X)\bigr).$$
By LeCam's third lemma, under the sequence $\{P_n\}$ of local alternatives, $\sqrt{n}\,\mathbb{P}_n\,\tilde l_0$ converges in distribution to $N_k(c, \tilde I_0)$; the limiting mean shifts by the covariance with the log-likelihood ratio. Note that, by assumption A.1,
$$\sqrt{n}\,(\hat\theta_n - \theta_0) = \tilde I_0^{-1}\,\sqrt{n}\,\mathbb{P}_n\,\tilde l_0 + o_{P_0}(1),$$
and the remainder stays $o_{P_n}(1)$ by contiguity. This shows that
$$\sqrt{n}\,(\hat\theta_n - \theta_0) \to_d N_k\bigl(\tilde I_0^{-1} c,\; \tilde I_0^{-1}\bigr) \quad\text{under } \{P_n\}.$$
We now note that all the conditions in Theorem 3.1 continue to hold under $\{P_n\}$ by contiguity, the only modification being that, under $P_n$, $\sqrt{n}\,(\hat\theta_n - \theta_0)$ now converges in distribution to $N_k(\tilde I_0^{-1}c, \tilde I_0^{-1})$. Consequently $\mathrm{lrt}_n(\theta_0)$, which in this case is still $n\,(\hat\theta_n - \theta_0)^T\,\tilde I_0\,(\hat\theta_n - \theta_0) + o_{P_n}(1)$, converges to

$Z^T \tilde I_0 Z$ with $Z \sim N_k(\tilde I_0^{-1}c, \tilde I_0^{-1})$, that is, to a non-central $\chi^2_k$ distribution with non-centrality parameter $c^T \tilde I_0^{-1} c$. Note that the non-centrality parameter depends on the direction of perturbation $h$ only through $c = \mathrm{Cov}_{P_0}(\tilde l_0, Ah)$. Furthermore, if $h$ is chosen to be the least favorable direction for the $i$'th component of $\theta$, then, using the expression of the efficient influence function through the adjoint $A^*$, the covariance $c$ can be computed explicitly. This completes the proof of Theorem 3.2.
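Numerically, the non-centrality parameter of Theorem 3.2 is just a quadratic form; in the special case where the perturbation acts on the Euclidean parameter alone, so that $c = \tilde I_0 h_1$ (as in Theorem 6.1 below), it reduces to $h_1^T \tilde I_0 h_1$. A small sketch with made-up matrices, not taken from any model in the paper:

```python
import numpy as np

def noncentrality(c, I0):
    """Non-centrality parameter c' I0^{-1} c of the limiting non-central chi^2_k,
    given c = Cov(efficient score, Ah) and the efficient information matrix I0."""
    c = np.asarray(c, dtype=float)
    return float(c @ np.linalg.solve(np.asarray(I0, dtype=float), c))

# Hypothetical numbers for a 2-dimensional Euclidean parameter.
I0 = np.array([[2.0, 0.3], [0.3, 1.0]])
h1 = np.array([0.5, -1.0])
# When the perturbation acts on the Euclidean parameter alone, c = I0 h1 and the
# non-centrality reduces to h1' I0 h1 (the quadratic form appearing in Theorem 6.1).
c = I0 @ h1
print(noncentrality(c, I0), h1 @ I0 @ h1)   # the two numbers agree
```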

5 The Special Case of Partitioned Parameters

In this section we obtain expressions for the power of the likelihood ratio test under local alternatives when the parameter of interest has two components, the first, $\theta$, belonging to a Euclidean space and the second, $\eta$, being infinite dimensional.

Let $\mathcal{P} = \{P_{\theta,\eta} : \theta \in \Theta, \eta \in \mathcal{H}\}$, where $\Theta$ is an open subset of $\mathbb{R}^k$ and $\mathcal{H}$ is some subset of a Banach space $\mathcal{Q}$. Consider a fixed set of paths in $\mathcal{H}$ of the form $\eta_t = \eta + t\beta + o(t)$ with $\beta \in \mathcal{Q}$. Let $\mathcal{C}$ denote the closed linear span of the gradients $\beta$. Now consider paths of the form $(\theta + th, \eta_t) \in \Theta \times \mathcal{H}$, and assume Hellinger differentiability with respect to this set of paths. Thus
$$\int\left[\frac{dP_{\theta+th,\eta_t}^{1/2} - dP_{\theta,\eta}^{1/2}}{t} - \tfrac12\,\bigl(l_\theta^T h + l_\eta\beta\bigr)\, dP_{\theta,\eta}^{1/2}\right]^2 \to 0 .$$

Now, $\mathbb{R}^k \times \mathcal{C}$ is a Banach space itself with the product topology, and in this situation the score operator $A : \mathbb{R}^k \times \mathcal{C} \to L_2^0(P_{\theta,\eta})$ is
$$A(h, \beta) = l_\theta^T h + l_\eta\,\beta .$$
Note that $l_\theta$ is the score function for $\theta$ and that $l_\eta$, the score operator for $\eta$, the nuisance parameter, is a bounded linear map from $\mathcal{C}$ to $L_2^0(P_{\theta,\eta})$; $l_\eta^*$ denotes the adjoint operator.

The efficient score function for $\theta$ is obtained by subtracting from $l_\theta$ its orthogonal projection onto the closure of the range of $l_\eta$,
$$\tilde l_\theta = l_\theta - \Pi\bigl(l_\theta \mid \overline{\mathcal{R}(l_\eta)}\bigr),$$
and the efficient information matrix is defined as
$$\tilde I(\theta) = P_{\theta,\eta}\bigl[\tilde l_\theta\,\tilde l_\theta^T\bigr].$$
Also consider a map $\chi$ of the Euclidean parameter $\theta$ into $\mathbb{R}^m$; the derivative of the induced functional on $\mathbb{R}^k \times \mathcal{C}$, namely $(h, \beta) \mapsto \chi'(\theta)\,h$, is a bounded linear map from $\mathbb{R}^k \times \mathcal{C}$ to $\mathbb{R}^m$. As before, a necessary and sufficient condition for pathwise norm differentiability of $\chi$ is the existence of solutions $g_{0,i}$, $i = 1, \ldots, m$, of the corresponding adjoint equations for each $i$. By arguments similar to VAN DER VAART (1991) we can show that the necessary and sufficient condition translates to a range condition which holds whenever the efficient information matrix $\tilde I(\theta)$ is invertible. In what follows we shall assume that this is the case. We shall also assume that $\chi'(\theta)$ is of full row rank ($m$). The efficient influence function (as before identified with its values at the canonical basis vectors) is then given by $g_0 = \chi'(\theta)\,\tilde I(\theta)^{-1}\,\tilde l_\theta$, and the dispersion matrix of $g_0$, which acts as the information bound for the estimation of $\chi$, is given by
$$J = \chi'(\theta)\,\tilde I(\theta)^{-1}\,\chi'(\theta)^T$$
and is invertible under our assumptions.

Denote the parameter $(\theta,\eta)$ by $\psi$. Consider the problem of testing the null hypothesis $\chi = \chi_0$ against $\chi \neq \chi_0$. Let $(\theta_0, \eta_0) \in \mathcal{H}_0$ be the true value of the parameter. Assume that all the conditions of the LRT theorem hold, with (3.3) being satisfied with $g_0$ (keep in mind here that $\chi$ plays the role of $\theta$ in Theorem 3.1, while $(\theta,\eta)$ plays the role of $\psi$); thus we have
$$\sqrt{n}\,\bigl(\chi(\hat\theta_n) - \chi_0\bigr) = \sqrt{n}\,\mathbb{P}_n\,g_0 + o_{P_0}(1),$$
with $J$ playing the role of the inverse efficient information. Consider now local alternatives of the form $(\theta_0 + h_1/\sqrt{n},\, \eta_{1/\sqrt{n}})$; by Theorem 3.2 the likelihood ratio statistic for testing $\chi = \chi_0$ converges under these alternatives to a non-central $\chi^2_m$ distribution. When $\chi$ is the identity map (so that $m = k$), as it is in many examples, the non-centrality parameter reduces to $h_1^T\,\tilde I(\theta_0)\,h_1$, since $\chi'(\theta_0)$ is the identity matrix and the expression
$$\tilde I(\theta_0) = P_{\theta_0,\eta_0}\bigl[\tilde l_{\theta_0}\,\tilde l_{\theta_0}^T\bigr]$$
is the efficient information for estimating $\theta$ at the true parameter value $(\theta_0, \eta_0)$.
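As a purely finite-dimensional analogue of the quantities in this section (in the semiparametric problem $l_\eta$ is an operator rather than a matrix), the efficient information for $\theta$ and the information bound for a smooth map $\chi(\theta)$ can be computed as follows; all matrices below are made-up illustrations, not taken from any model in the paper.

```python
import numpy as np

def efficient_information(I_tt, I_te, I_ee):
    """Finite-dimensional analogue of the efficient information for theta:
    I_tt - I_te I_ee^{-1} I_te', the information left after projecting the
    theta-scores on the (here finite-dimensional) nuisance scores."""
    return I_tt - I_te @ np.linalg.solve(I_ee, I_te.T)

def information_bound(chi_dot, I_tilde):
    """Information bound J = chi'(theta) I_tilde^{-1} chi'(theta)' for estimating chi(theta)."""
    return chi_dot @ np.linalg.solve(I_tilde, chi_dot.T)

# Hypothetical full information matrix partitioned into (theta, eta) blocks.
I_tt = np.array([[4.0, 1.0], [1.0, 3.0]])
I_te = np.array([[0.5, 0.2, 0.1], [0.0, 0.4, 0.3]])
I_ee = np.diag([2.0, 2.5, 1.5])
I_tilde = efficient_information(I_tt, I_te, I_ee)
J = information_bound(np.eye(2), I_tilde)   # chi(theta) = theta: J is the inverse efficient information
```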

6 Applications of the Results in Section 3

Here we apply some of the results of Section 3. As an application of Theorem 3.1 we derive the limiting distribution of the log-likelihood ratio statistic for the regression parameter in the Cox Model with right censoring, following the formulation in VAN DER VAART (1998). The one-dimensional version of this model is treated briefly in MURPHY AND VAN DER VAART (2000) from the angle of semiparametric profile likelihood methods. The current status model with a d-dimensional Euclidean parameter can also be treated in the framework of this theorem along the same lines as in the one-dimensional case treated in MURPHY AND VAN DER VAART (1997) and hence is not discussed here (for details see BANERJEE (2000)). For a treatment of this model in the profile likelihood framework see MURPHY AND VAN DER VAART (2000).

We also illustrate the use of Theorem 3.2 in the Cox Model with Right Censoring and also in the Double Censoring Model and Mixture Model discussed in MURPHY AND VAN DER VAART (1997). For more applications of this theorem see BANERJEE (2000).

6.1 The Cox Model with Right Censoring.

In what follows, whenever we mention densities without specification they are assumed to be with respect to Lebesgue measure.

Consider $n$ i.i.d. observations $(Y_i, \delta_i, Z_i)$, where $T_i$ is the failure time of the $i$'th individual, $D_i$ the observation time, $Z_i$ a $k$-dimensional vector of covariates, $Y_i = T_i \wedge C_i$ and $\delta_i = 1\{T_i \le C_i\}$, with $C_i$ the censoring time constructed from $D_i$ as described below. Let the conditional density of $T$ given $Z = z$ be denoted by $f(t \mid z)$, and let $g$ denote the marginal density of $Z$ with respect to a dominating measure $\nu$. We now make the following assumptions, which are in force throughout in all that follows:

• A.1 The marginal density of $Z$ is strictly positive, with support a compact subset of $\mathbb{R}^k$.

• A.2 $D$ has a strictly positive density on a compact interval $[0, \tau]$; the censoring time $C$ equals $D$ if $D$ is less than $\tau$ and equals $\tau$ otherwise, so that $C$ is a truncated observation time. Thus $C$ has support $[0, \tau]$ and $P(C = \tau) > 0$. (This condition is crucial to the asymptotics in that it entails the boundedness of a certain function away from 0, which in turn yields the continuous invertibility of the information operator for the nuisance parameter, leading to an explicit form for the efficient score function in this problem.) Also, under our assumptions, $P(T \ge \tau) > 0$ and $P(T > C) > 0$.

Straightforward computations readily yield the joint density of $(Y, \delta, Z)$ as follows:
$$p_{\theta,\Lambda}(y, \delta, z) = \bigl(\lambda(y)\, e^{\theta^T z}\bigr)^{\delta}\, \exp\bigl(-e^{\theta^T z}\,\Lambda(y)\bigr)\;\times\;\text{(factors involving the censoring distribution and } g(z)\text{)}$$
for $(y, \delta, z) \in [0, \tau] \times \{0, 1\} \times \mathcal{Z}$, with respect to some dominating measure. In the above expression the remaining factors involve only the conditional distribution of the censoring time given $Z$ (its density for $c < \tau$ and its mass at $\tau$) and the marginal density $g(z)$, and do not involve the parameters $(\theta, \Lambda)$.

For purposes of defining the likelihood and maximizing the same, we can forget the terms that do not involve the parameters and define the likelihood for one observation as
$$\mathrm{lik}(\theta, \Lambda)(y, \delta, z) = \bigl(\Lambda\{y\}\, e^{\theta^T z}\bigr)^{\delta}\, \exp\bigl(-e^{\theta^T z}\,\Lambda(y)\bigr). \qquad (6.14)$$
In maximizing the likelihood, only cumulative hazards that have a jump at the observed $Y_i$'s matter. It is clear from the form of the likelihood that one can restrict attention to cumulative hazard functions that are constant between two successive observation times. Under these constraints the MLE of $\Lambda$ is uniquely determined.
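For numerical computation of $\mathrm{lrt}_n(\theta_0)$ in this model it is convenient to use the standard fact that, for fixed $\theta$, profiling out a step-function $\Lambda$ yields a Breslow-type estimator, so the profile log-likelihood equals the Cox log partial likelihood up to an additive constant that cancels in the difference. This is a computational shortcut, not the device used in the proofs below; the sketch assumes no tied event times.

```python
import numpy as np
from scipy.optimize import minimize

def cox_log_partial_likelihood(theta, y, delta, z):
    """Cox log partial likelihood for right-censored data (no ties assumed).
    y: observation times, delta: event indicators (1 = failure), z: n x k covariates."""
    theta = np.asarray(theta, dtype=float)
    y, delta, z = np.asarray(y), np.asarray(delta), np.asarray(z)
    eta = z @ theta                                   # linear predictors theta'z_i
    loglik = 0.0
    for i in np.where(delta == 1)[0]:
        at_risk = y >= y[i]                           # risk set at the i'th failure time
        loglik += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return loglik

def cox_lrt(theta0, y, delta, z):
    """Likelihood ratio statistic for H0: theta = theta0; to be compared with a
    chi^2_k quantile, as in Theorem 6.1."""
    neg = lambda th: -cox_log_partial_likelihood(th, y, delta, z)
    fit = minimize(neg, x0=np.zeros(np.asarray(z).shape[1]), method="BFGS")
    return 2.0 * (cox_log_partial_likelihood(fit.x, y, delta, z)
                  - cox_log_partial_likelihood(theta0, y, delta, z))
```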

We now have the following theorem.

Theorem 6.1 Consider testing the null hypothesis $H_0 : \theta = \theta_0$ against $H_1 : \theta \neq \theta_0$ on the basis of $n$ i.i.d. observations from the distribution of $X = (Y, \delta, Z)$. Let $\mathrm{lrt}_n(\theta_0)$ denote the likelihood ratio statistic, i.e. twice the log of the likelihood ratio, based on these $n$ observations, where the likelihood for one observation is given by (6.14). Let $(\theta_0, \Lambda_0) \in H_0$ be the true value of the parameter and assume that $\Lambda_0$ has a continuous Lebesgue density $\lambda_0$. Denote the distribution of $X$ under $(\theta_0, \Lambda_0)$ by $P_0$. Then,
$$\text{under } P_0, \quad \mathrm{lrt}_n(\theta_0) \to_d W,$$
where $W$ is a $\chi^2_k$ random variable. Furthermore, if we consider the following sequence of local alternatives $\{P_n\}$, where $P_n$ is the distribution of $X$ under $\bigl(\theta_0 + h_1/\sqrt{n},\; \Lambda_0 + (1/\sqrt{n})\int_0^{\cdot} h_2\, d\Lambda_0\bigr)$, with $h_2$ being a bounded function in $L_2(\Lambda_0)$, then
$$\mathrm{lrt}_n(\theta_0) \to_d W' \quad\text{under } P_n,$$
where $W'$ is a non-central $\chi^2_k$ random variable with non-centrality parameter $\Delta = h_1^T\, I_0\, h_1$, and $I_0$ is the efficient information matrix for the Euclidean parameter $\theta$ in this problem under the true parameter value $(\theta_0, \Lambda_0)$; it is explicitly defined in what follows.

Proof: It can be shown under the assumptions made above that the MLE $(\hat\theta_n, \hat\Lambda_n)$ is consistent for $(\theta_0, \Lambda_0)$ under the product of the Euclidean topology and the topology of uniform convergence on $[0, \tau]$ (VAN DER VAART (1998)). It is also the case that the MLE of $\Lambda$ in the constrained case ($\theta = \theta_0$), denoted by $\hat\Lambda_{0,n}$, converges in probability to $\Lambda_0$ under the topology of uniform convergence.

It can be shown that the above model is Hellinger differentiable with respect to the set of paths $(\theta_0 + t h_1, \Lambda_t)$, where $\Lambda_t(\cdot) = \int_0^{\cdot} (1 + t\,h_2)\, d\Lambda_0$ and $h_2$ is a bounded measurable function on $[0, \tau]$ (in view of the boundedness of $h_2$, this is a valid cumulative hazard for $t$ sufficiently small). The score operator with respect to this set of paths at the true parameter value $(\theta_0, \Lambda_0)$ is obtained by straightforward differentiation, and this gives
$$A(h_1, h_2)(y, \delta, z) = h_1^T\, l_\theta(y, \delta, z) + (l_\Lambda h_2)(y, \delta, z),$$
where
$$l_\theta(y, \delta, z) = z\bigl(\delta - e^{\theta_0^T z}\,\Lambda_0(y)\bigr)$$
is the score function for $\theta$ at the true parameter value, and
$$(l_\Lambda h_2)(y, \delta, z) = \delta\, h_2(y) - e^{\theta_0^T z}\int_0^y h_2\, d\Lambda_0$$
is the score for the nuisance parameter. It is not difficult to show that $A$ is a continuous linear map from $\mathbb{R}^k \times L_2(\Lambda_0)$ to $L_2(P_0)$.

Now, the continuity of $A$ implies that $l_\Lambda$, henceforth referred to as $B_0$, is a continuous linear map from $L_2(\Lambda_0)$ to $L_2(P_0)$. Consider now the adjoint operator $B_0^*$. Standard computations show that
$$(B_0^* B_0\, h)(y) = h(y)\, M_0(y), \qquad M_0(y) = E\bigl(1\{Y \ge y\}\, e^{\theta_0^T Z}\bigr).$$
Note that $B_0^* B_0$ is a continuous operator from $L_2(\Lambda_0)$ to itself. Also, $M_0$ is a bounded, non-negative and monotone decreasing function on $[0, \tau]$ that is bounded away from $0$ (this is where the assumption $P(C = \tau) > 0$, and hence $P(Y \ge \tau) > 0$, is used). Consequently $1/M_0$ is bounded, and hence $B_0^* B_0$ is continuously invertible with
$$\bigl((B_0^* B_0)^{-1} h\bigr)(y) = \frac{h(y)}{M_0(y)}.$$

The projection operator onto the closure of the range of $B_0$ is $B_0 (B_0^* B_0)^{-1} B_0^*$, which is a continuous linear map, and standard computations show that
$$\bigl(B_0^*\, l_\theta\bigr)(y) = M_1(y) \equiv E\bigl(Z\, 1\{Y \ge y\}\, e^{\theta_0^T Z}\bigr),$$
where $M_1$ is vector-valued. Thus the least favorable direction is
$$h^*(y) = \frac{M_1(y)}{M_0(y)},$$
and the efficient score function is
$$\tilde l_0(y, \delta, z) = l_\theta - B_0 h^* = \delta\Bigl(z - \frac{M_1(y)}{M_0(y)}\Bigr) - e^{\theta_0^T z}\int_0^y \Bigl(z - \frac{M_1(u)}{M_0(u)}\Bigr)\, d\Lambda_0(u).$$

The efficient information matrix is then given by
$$I_0 = P_0\bigl[\tilde l_0\,\tilde l_0^T\bigr],$$
and is nonsingular under mild regularity conditions. This corresponds to $\nu(P_{\theta,\Lambda}) = \theta$ being pathwise norm-differentiable at $P_0 = P_{\theta_0,\Lambda_0}$.

It can be shown that in this model $\hat\theta_n$ is asymptotically efficient, with the asymptotic covariance matrix being given by the inverse of the efficient information matrix. In other words we have
$$\sqrt{n}\,(\hat\theta_n - \theta_0) = I_0^{-1}\,\sqrt{n}\,\mathbb{P}_n\,\tilde l_0 + o_{P_0}(1),$$
and thus assumption A.1 of Theorem 3.1 holds in this case. A derivation of the above follows from the discussions on Likelihood Equations in Section 25.12 of VAN DER VAART (1998); $I_0^{-1}\tilde l_0$ acts as the influence function for $\theta$ in the presence of the nuisance parameter $\Lambda$. For a detailed treatment see BANERJEE (2000).

We now define the approximately least favorable submodels in the following way. For $t$ in a neighborhood of $\theta$, define
$$\Lambda_t(\theta, \Lambda)(\cdot) = \int_0^{\cdot}\bigl(1 + (\theta - t)^T h^*(s)\bigr)\, d\Lambda(s)$$
and set
$$\ell\bigl(x; t, (\theta, \Lambda)\bigr) = \ln \mathrm{lik}\bigl(t, \Lambda_t(\theta, \Lambda)\bigr)(x),$$
which, by the boundedness of $h^*$, is a genuine log-likelihood for $t$ sufficiently close to $\theta$ and is furthermore twice continuously differentiable in $t$ for every $x$. Straightforward differentiation and plugging in $\theta_0$ for $t$ and $(\theta_0, \Lambda_0)$ for $(\theta, \Lambda)$ show that
$$\dot\ell\bigl(x; \theta_0, (\theta_0, \Lambda_0)\bigr) = \tilde l_0(x),$$
and that $\dot\ell$ is continuous in its arguments. This verifies condition (3.4) in Theorem 3.1. We next compute $\ddot\ell(x; t, (\theta, \Lambda))$, the second derivative of $\ln \mathrm{lik}(t, \Lambda_t(\theta, \Lambda))(x)$ with respect to $t$; the resulting expression involves $h^*$, integrals of the form $\int_0^y h^*\, d\Lambda$ and the factor $e^{t^T z}$, and is continuous in its arguments and appropriately bounded.

It remains to check only conditions (3.5) and (3.6) of Theorem 3.1. These are checked by verifying the conditions of an auxiliary lemma and by establishing that a certain "unbiasedness" condition holds; together these entail that conditions (3.5) and (3.6) hold (for a discussion of the "unbiasedness condition", which is needed in such cases, see MURPHY AND VAN DER VAART (1997)). Verification of the conditions of the lemma in this model is not too difficult but somewhat tedious, and involves extensive use of preservation properties of Donsker classes. The details are skipped here but can be found in BANERJEE (2000). We complete our proof of the null distribution of the LRT by establishing the unbiasedness condition, i.e.

under $P_0$. Now, $P_0\,\dot\ell(\cdot;\theta_0,(\theta_0,\Lambda_0)) = 0$ by construction, so it suffices to show that
$$P_0\,\dot\ell\bigl(\cdot;\theta_0,(\theta_0,\hat\Lambda_{0,n})\bigr) = o_{P_0}\bigl(n^{-1/2}\bigr).$$
Now,
$$\dot\ell\bigl(x;\theta_0,(\theta_0,\Lambda)\bigr) = \delta\bigl(z - h^*(y)\bigr) - e^{\theta_0^T z}\int_0^y \bigl(z - h^*(u)\bigr)\, d\Lambda(u),$$
so that $\dot\ell(x;\theta_0,(\theta_0,\hat\Lambda_{0,n}))$ has the same form with $\Lambda_0$ replaced by $\hat\Lambda_{0,n}$. Taking expectations under $P_0$ (with $\hat\Lambda_{0,n}$ treated as a fixed function) and interchanging the order of integration in both terms, we obtain
$$P_0\,\dot\ell\bigl(\cdot;\theta_0,(\theta_0,\Lambda)\bigr) = \int \bigl(M_1(u) - h^*(u)\, M_0(u)\bigr)\, d\Lambda_0(u) - \int \bigl(M_1(u) - h^*(u)\, M_0(u)\bigr)\, d\Lambda(u) = 0$$
for any cumulative hazard $\Lambda$, since $h^* = M_1/M_0$. Hence the unbiasedness condition holds.

Denoting the full parameter by $\psi = (\theta, \Lambda)$, with $\Lambda$ the nuisance parameter, we can now obtain expressions for the power. If we consider $\theta$ as the Euclidean parameter and identify the closed linear span of the nuisance gradients $h_2$ with the space $\mathcal{C}$ of Section 5, the score operator is then the operator $A$ defined above; this is what we take as the derivative operator in Theorem 3.2, with $\chi$ being the identity map playing the role of the functional of Section 5. Following Section 5, we find that under the sequence of local alternatives being considered the LRT statistic converges to a non-central $\chi^2_k$ random variable with non-centrality parameter $\Delta = h_1^T I_0 h_1$. A direct derivation is also possible but is not pursued here.

This finishes the proof.

6.2 The Double Censoring Model

For a treatment of the double-censoring model we refer the reader to Sections 2 and 4 of MURPHY AND VAN DER VAART (1997). The double censoring model can be looked upon as a missing data model where the full (non-missing) data consists of $(T, L, R)$, $T$ denoting the failure time, $L$ denoting the left censoring time and $R$ denoting the right censoring time, but one only gets to observe $(U, D)$, with $U$ and $D$ being measurable functions of $(T, L, R)$. In fact, $U = L$ and $D = 1$ if $T \le L$, $U = T$ and $D = 2$ if $L < T \le R$, and $U = R$ and $D = 3$ if $T > R$. The parameter here is $F$, the distribution of the failure time; the distributions of the censoring times are assumed known and $T$ is assumed to be independent of $(L, R)$. The distribution of $L$ is denoted $G_L$, with density $g_L$, and that of $R$ is denoted $G_R$ and assumed to have density $g_R$.

Proposition: Let $F_0$ be the true value of the parameter and let $h$ be a bounded, left-continuous function of bounded variation with mean $0$ under $F_0$. Consider local alternatives $F_n$ with $dF_n = (1 + h/\sqrt{n})\, dF_0$. Then, under the sequence $\{P_{F_n}\}$, the likelihood ratio statistic converges in distribution to a non-central $\chi^2$ random variable whose non-centrality parameter is $c^T \tilde I_0^{-1} c$, where $\tilde I_0$ is the efficient information for the Euclidean parameter of interest and $c$ is the covariance between the score $Ah$ along the path and the efficient score.

Proof: Let $f_0$ denote the density of $F_0$, the true distribution in $\mathcal{H}_0$, with respect to some dominating measure $\mu$. For any bounded measurable $h$ with mean $0$ under $F_0$, consider paths of the form $f_t$ so that $f_t = (1 + t h)\, f_0$. Such a path lies in $\mathcal{H}$ and can be seen to be of the form $f_0 + t\, h f_0 + o(t)$. The gradient of this path is identified with $h$; so the closed linear span of the gradients can be identified with $\mathcal{L}_2^0(F_0)$. Hellinger differentiability of the "full data" model, where $(T, L, R)$ are observed, with respect to the set of paths considered above follows directly from the fact that $f_t^{1/2} = f_0^{1/2}\bigl(1 + \tfrac12 t h\bigr) + o(t)$ in $\mathcal{L}_2(\mu)$. Proposition A.5.5 of BKRW then ensures Hellinger differentiability of the double-censoring model, which is a missing data model, with respect to the given set of paths, at the point $F_0$. The score operator, $A : \mathcal{L}_2^0(F_0) \to \mathcal{L}_2^0(P_{F_0})$, has the following form:

$$Ah(u, d) = \frac{\int_{[0,u]} h\, dF_0}{F_0(u)}\, 1(d = 1) \;+\; h(u)\, 1(d = 2) \;+\; \frac{\int_{(u,\infty)} h\, dF_0}{1 - F_0(u)}\, 1(d = 3),$$
and is, in fact, the conditional expectation operator,
$$Ah(u, d) = E\bigl[h(T) \mid U = u, D = d\bigr].$$

The adjoint operator $A^* : \mathcal{L}_2^0(P_{F_0}) \to \mathcal{L}_2^0(F_0)$ is then given by
$$A^* g(t) = E\bigl[g(U, D) \mid T = t\bigr].$$

Now, with the path $f_t = (1 + t h)\, f_0$ and the score operator above, and following the notation of Section 2, we have
$$c = \mathrm{Cov}_{P_{F_0}}\bigl(\tilde l_0(U, D),\, Ah(U, D)\bigr) = \int \bigl(A^*\tilde l_0\bigr)\, h\, f_0\, d\mu .$$
Since the efficient influence function admits the representation required in assumption A.1, the LRT statistic converges, under the local alternatives, to a non-central $\chi^2$ distribution with non-centrality parameter $c^T \tilde I_0^{-1} c$, by a direct application of Theorem 3.2. This finishes the proof.

6.3 A Mixture Model

In this mixture model, the variables denoted as $U$ and $V$ are, given $Z$, independent and exponentially distributed with hazards $Z$ and $\theta Z$ respectively. We observe $(U, V)$ but not $Z$. Here $\eta$, the distribution of $Z$, is a completely unknown distribution on $(0, \infty)$, and we have that the density of an observation is
$$p_{\theta,\eta}(u, v) = \int z e^{-z u}\, \theta z e^{-\theta z v}\, d\eta(z).$$
We have the following proposition.

Proposition: Let $(\theta_0, \eta_0)$ be the true value of the parameter and let $h$ be a bounded function with $\int h^2\, d\eta_0 < \infty$. Then, under the sequence of local alternatives described below, the likelihood ratio statistic converges to a non-central $\chi^2_1$ distribution whose non-centrality parameter involves the efficient information for $\theta$ at $(\theta_0, \eta_0)$.

Proof. Here $\mu$ is some dominating measure on $(0, \infty)$ with respect to which $Z$ has a density; $\dot l_0(\theta_0, \eta_0)$ denotes the score for the parameter of interest $\theta$ at the parameter value $(\theta_0, \eta_0)$, and $l_\eta$ denotes the score operator for the nuisance parameter. Let $S$ denote the closure of the range of $l_\eta$ in $L_2^0(P_{\theta_0,\eta_0})$; it is a subspace of $L_2^0(P_{\theta_0,\eta_0})$ that comprises functions of $U + \theta_0 V$.

Consider now local alternatives of the form $(\theta_0 + h_1/\sqrt{n},\, \eta_n)$ as in the statement of the proposition. Hellinger differentiability then ensures that the local alternatives considered above are contiguous. For the mixture model, assumption A.1 of Theorem 3.1 is satisfied with the following representation holding true:
$$\sqrt{n}\,(\hat\theta_n - \theta_0) = \tilde I_0^{-1}\,\sqrt{n}\,\mathbb{P}_n\,\tilde l_0(\theta_0, \eta_0) + o_{P_0}(1),$$
with $\tilde l_0(\theta_0, \eta_0)(U, V)$ the efficient score and $\tilde I_0$ the efficient information at $(\theta_0, \eta_0)$. It is not difficult to show that the conditional expectation of $\dot l_0(\theta_0, \eta_0)$ given $U + \theta_0 V$ lies in $S$ and hence is the projection of $\dot l_0(\theta_0, \eta_0)$ into $S$; for a proof of this see BANERJEE (2000). It now follows from the discussion in Section 5 that the likelihood ratio statistic under the contiguous alternatives converges to a $\chi^2_1$ distribution with non-centrality parameter given by the quadratic form $h_1^2\,\tilde I_0$.

This finishes the proof.

7 Comments

While the Semiparametric Likelihood Ratio Theorem as expounded here is closely related to the Profile Likelihood Theorem of MURPHY AND VAN DER VAART (2000), the required conditions are not identical. It involves the consistency of the MLE's which maximize the likelihood and conditions, among them the consistency requirements and an "unbiasedness condition", that are stronger than the corresponding ones related to Theorem 3.1; see the Discussion section in Chapter 2 of BANERJEE (2000) and also MURPHY AND VAN DER VAART (1997) and MURPHY AND VAN DER VAART (2000). The trade-off though is that the asymptotic efficiency of the MLE for the Euclidean parameter, with the efficient score function furnishing the linear approximation, can actually be derived from the quadratic expansion of the Profile Likelihood Theorem and does not need to be assumed, as it is in Theorem 3.1. However, given the consistency of maximum likelihood estimates and the asymptotic efficiency of the MLE for a particular model, the likelihood ratio theorem certainly seems to provide a straightforward route to deducing the asymptotic distribution of the likelihood ratio statistic.

Potential applications of Theorem 3.1 lie in the semiparametric regression models studied in LAWLESS, KALBFLEISCH AND WILD (1999) under two-phase sampling designs. Consistency has been proved in VAN DER VAART AND WELLNER, and asymptotic normality and efficiency of the MLE's have been established in BRESLOW, McNENEY AND WELLNER (2000). A potential application also lies in the framework of Empirical Likelihood Methods as studied by QIN AND LAWLESS (1994). It remains to be seen whether empirical likelihood ratio results can be encapsulated in the semiparametric likelihood ratio framework, possibly under certain regularity conditions. The difficulty here seems to be in coming up with the surfaces defining the "approximately least favorable" submodels, because of the nature in which the likelihood is defined in the empirical likelihood framework.

For the Euclidean parameter of interest, the construction of likelihood ratio based confidence sets has been implemented and discussed in BANERJEE (2000), largely because likelihood ratio based confidence sets have the attractive properties discussed in the Introduction; for sufficient conditions to check the same, refer to Subsection 2.9.2 in Chapter 2 of BANERJEE (2000).

ACKNOWLEDGEMENTS: I would like to thank my advisor Jon, who introduced me to semiparametric inference, for the many discussions I have had with him.

References

Banerjee, M. (2000). Likelihood Ratio Inference in Regular and Non-regular Problems. Ph.D. dissertation, University of Washington.

Begun, J.M., Hall, W.J., Huang, W. and Wellner, J.A. (1983). Information and asymptotic efficiency in parametric-nonparametric models. Ann. Statist. 11, 432-462.

Bickel, P.J., Klaassen, C.A.J., Ritov, Y. and Wellner, J.A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press, Baltimore.

Breslow, N., McNeney, B. and Wellner, J.A. (2000). Large sample theory for semiparametric regression models with two-phase outcome dependent sampling. Technical Report No. 381, Department of Statistics, University of Washington.

Huang, J. (1996). Efficient estimation for the Cox model with interval censoring. Ann. Statist. 24, 540-568.

Lawless, J.F., Kalbfleisch, J.D. and Wild, C.J. (1999). Semiparametric methods for response-selective and missing data problems in regression. J. Roy. Statist. Soc. B 61, 413-438.

Meeker, W.Q. and Escobar, L.A. (1995). Teaching about approximate confidence regions based on maximum likelihood estimation. The Amer. Statist. 49, 48-53.

Murphy, S.A. and Van der Vaart, A.W. (1997). Semiparametric likelihood ratio inference. Ann. Statist. 25, 1471-1509.

Murphy, S.A. and Van der Vaart, A.W. (2000). On profile likelihood. J. Amer. Statist. Assoc. 95, 449-465.

Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations. Ann. Statist. 22, 300-325.

Van der Vaart, A.W. (1991). On differentiable functionals. Ann. Statist. 19, 178-201.

Van der Vaart, A.W. (1998). Asymptotic Statistics. Cambridge University Press, Cambridge.

Van der Vaart, A.W. and Wellner, J.A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York.