Semiparametric Likelihood Ratio Inference Revisited

December 2000

Abstract

We extend the Semiparametric Likelihood Ratio Theorem of Murphy and Van der Vaart for one-dimensional parameters to Euclidean parameters of any dimension. The asymptotic distribution of the likelihood ratio statistic for testing a k-dimensional Euclidean parameter is shown to be the usual $\chi^2_k$ under the null hypothesis. This result is useful not only for testing purposes but also in forming likelihood ratio based confidence regions. We also obtain the behavior of the likelihood ratio statistic under local (contiguous) alternatives; this is non-central $\chi^2_k$ with a non-centrality parameter involving the direction of perturbation of the null hypothesis and the efficient information matrix for the Euclidean parameter under the null hypothesis in the problem. Some applications are provided.






    1 Introduction

    A rigorous [...]

    first established in [...] indexed [...] an infinite dimensional [...] considered and real [...] of the class of distributions under [...] are studied. It is shown that under proper conditions the usual likelihood ratio statistic, which is defined [...] the difference between the maximum values of the [...] in the unconstrained [...] the class of distributions vary [...] and constrained cases [...]

    this is the $(1 - \alpha)$'th [...] of the [...]. Likelihood ratio based confidence [...] well-behaved and much nicer than confidence sets obtained [...] the distribution of MLE's since these involve [...] unknown [...] of the underlying distribution from the data. In the likelihood ratio context this is obviated due [...] the [...] of the distribution. The other nice property of likelihood ratio based confidence sets is that [...] which confidence sets obtained [...] other procedures ([...] ones [...] the distribution of the MLE's or the Wald method [...] confidence ellipsoids) do not. If we take a smooth one to one transformation of $\theta$, the [...] of [...] for a fixed confidence [...] the likelihood-ratio based confidence set for the transformed parameter is precisely the image of the likelihood-ratio based confidence set for the original $\theta$ under the transformation. [...] likelihood ratio based confidence sets are more "data-driven". The shape of likelihood-based confidence sets generally reflects emphasis in the observed data or in other words, the data dictate the shape of the region. This is a property that other methods often lack and is especially true for the Wald-based method which always yields confidence ellipsoids centered around the MLE. See BANERJEE [...] for a more extensive discussion of the above issues. A discussion of likelihood ratio based estimation of confidence regions is also available in MEEKER AND ESCOBAR (1995).
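The transformation property described above can be checked numerically in a simple parametric setting. The following is a minimal sketch and is not taken from the paper: it assumes an i.i.d. exponential sample with rate theta, summarized by the sample size `n` and the sample sum `s`, and compares 95% likelihood ratio and Wald intervals for theta with those for the mean mu = 1/theta. All function names and the bisection helper are illustrative assumptions.

```python
import math

CHI2_1_95 = 3.841458820694124  # 0.95 quantile of the chi-square law with 1 df
Z_975 = 1.959963984540054      # 0.975 quantile of the standard normal law


def loglik_rate(theta, n, s):
    """Exponential log-likelihood in the rate parametrization."""
    return n * math.log(theta) - theta * s


def loglik_mean(mu, n, s):
    """The same log-likelihood written in terms of the mean mu = 1 / theta."""
    return -n * math.log(mu) - s / mu


def bisect(f, lo, hi, iterations=200):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lr_interval(loglik, mle, n, s):
    """Parameter values whose likelihood ratio statistic stays below the cutoff."""
    top = loglik(mle, n, s)

    def excess(x):
        return 2.0 * (top - loglik(x, n, s)) - CHI2_1_95

    return bisect(excess, mle * 1e-6, mle), bisect(excess, mle, mle * 1e6)


def wald_interval(mle, n):
    """Wald interval; in both parametrizations the standard error is mle / sqrt(n)."""
    half = Z_975 * mle / math.sqrt(n)
    return mle - half, mle + half


if __name__ == "__main__":
    n, s = 20, 10.0  # hypothetical data summary
    lr_theta = lr_interval(loglik_rate, n / s, n, s)
    lr_mu = lr_interval(loglik_mean, s / n, n, s)
    print("LR interval for mu:        ", lr_mu)
    print("image of LR interval:      ", (1 / lr_theta[1], 1 / lr_theta[0]))
    wald_theta = wald_interval(n / s, n)
    print("Wald interval for mu:      ", wald_interval(s / n, n))
    print("image of Wald interval:    ", (1 / wald_theta[1], 1 / wald_theta[0]))
```

In this sketch the likelihood ratio interval for mu coincides with the image of the interval for theta, while the two Wald intervals differ, which is the contrast the paragraph draws.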

    Another problem that we address is the power issue. We derive expressions for the power of the semiparametric likelihood ratio test under local alternatives. To this end we typically assume our parameter to vary in a Hilbert or a Banach space; if $\psi_0$ is the true value of the parameter we consider alternatives of the form $\psi_0 + h/\sqrt{n} + o(1/\sqrt{n})$. These correspond to the curve $t \mapsto \psi_0 + th + o(t)$ through the parameter space (at least for small values of $t$) with gradient $h$ at the point 0. More accurately it is really the space of gradients that needs to be embedded in a Hilbert or Banach space. If we now have Hellinger differentiability of the class of probability measures along this path the local alternatives are also contiguous alternatives and contiguity theory can be used to show that the LRT converges under this sequence of (contiguous) alternatives to a non-central $\chi^2$ with $k$ degrees of freedom. Expressions for the non-centrality parameter can then be obtained; these involve a quadratic form in the efficient information matrix for the corresponding problem and parallel the expressions obtained in the parametric case.
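Once the non-centrality parameter is available, the limiting power of a level-alpha test is the upper tail of a non-central chi-square law evaluated at the central chi-square critical value. The sketch below is an illustration only, using the standard library: it assumes the non-centrality has the quadratic form h' I h, uses the convention that a non-central chi-square with parameter delta is a Poisson(delta/2) mixture of central chi-squares, and takes a hypothetical direction `h` and information matrix `info`. None of the names come from the paper.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-17 * total and n < 100000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def chi2_cdf(x, k):
    """CDF of the central chi-square law with k degrees of freedom."""
    return reg_lower_gamma(0.5 * k, 0.5 * x)


def chi2_quantile(p, k):
    """Quantile of the central chi-square law, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, k) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ncx2_cdf(x, k, delta):
    """Non-central chi-square CDF as a Poisson(delta / 2) mixture of central ones."""
    lam = 0.5 * delta
    weight = math.exp(-lam)
    covered = 0.0
    total = 0.0
    j = 0
    while covered < 1.0 - 1e-14 and j < 10000:
        total += weight * chi2_cdf(x, k + 2 * j)
        covered += weight
        j += 1
        weight *= lam / j
    return total


def quadratic_form(h, info):
    """h' * info * h for a list h and a list-of-rows matrix info."""
    size = len(h)
    return sum(h[i] * info[i][j] * h[j] for i in range(size) for j in range(size))


def limiting_power(h, info, alpha=0.05):
    """Limiting power of the level-alpha test under the local direction h."""
    k = len(h)
    delta = quadratic_form(h, info)
    critical = chi2_quantile(1.0 - alpha, k)
    return 1.0 - ncx2_cdf(critical, k, delta)


if __name__ == "__main__":
    info = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical efficient information matrix
    h = [1.0, -0.5]                  # hypothetical direction of the alternative
    print("non-centrality:", quadratic_form(h, info))
    print("limiting power:", limiting_power(h, info))
```

With `h` equal to zero the function returns the nominal level, and the power grows as the quadratic form grows.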

    [...] follows. In Section 2 we [...] on differentiable functionals

    [...]. Statements [...] results are in Section 3. These [...] semiparametric likelihood ratio statistic under the null and that

    alternatives. In [...] of these results. Section 5 [...] parameter of interest

    2 Semiparametric Likelihood Ratios and Differentiable Functionals.

    Semiparametric Likelihood Ratios:

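    Consider a situation where the observations $X_1, \ldots, X_n$ are a random sample from the distribution $P_\psi$ where $\psi$ belongs to a set $\Psi$. We assume a [...] on the set $\Psi$; $\Psi$ will be seen to be a subset of a Banach or Hilbert space. Given the definition of the likelihood $\mathrm{lik}(\psi, x)$ for an observation $x$ (in many cases $\mathrm{lik}(\psi, x)$ is a minor modification of $p_\psi(x)$, where $p_\psi(\cdot)$ is defined as $dP_\psi/d\mu$, $\mu$ being some common dominating $\sigma$-finite measure), the likelihood ratio statistic for testing the null hypothesis $\theta(\psi) = \theta_0$ is given by:

    $$\mathrm{lrt}_n(\theta_0) = 2n\,\mathbb{P}_n\bigl(\log \mathrm{lik}(\hat\psi_n, X)\bigr) - 2n\,\mathbb{P}_n\bigl(\log \mathrm{lik}(\hat\psi_0, X)\bigr),$$

    where $\hat\psi_n$ and $\hat\psi_0$ are the unconstrained and constrained maximizers of the likelihood respectively. In the cases we consider, $\theta$ takes values in $\mathbb{R}^k$.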

    On Differentiable Functionals:

    [...] lies in a subset $\Psi$ of Hilbert space $\mathcal{H}$ and [...] a common $\sigma$-finite measure $\mu$. Also assume Hellinger differentiability

    certain collection

    Assume now [...] are dominated

    with respect


    map

    consider $\theta$ as a [...] from the space [...] norm-differentiable if there exists a linear map

    where [...] has gradient [...] due to Van der Vaart

    pathwise norm differentiability


    and failure of this condition implies the non-existence of [...] estimators of $\theta$. When the condition for [...] is [...] we [...] that [...] $\circ A$ so that [...] $= A^* \circ$ [...]. It is now easily shown that the above necessary and sufficient [...] is [...] to the [...] for each $1 \le j \le k$ there exists a solution to the equation $A^* x = $ [...] in [...]. We denote the unique vector of solutions [...] $g_0 = $ [...]; it is also the case that for each $j$,

    [...] (where $e_j$ denotes the $j$'th canonical basis vector) and [...] assumes values [...] we conclude that

    its values at the [...] linear

    We [...] the efficient influence function and [...] it with the vector $g_0$, [...] canonical basis vectors. In many semiparametric situations it is this vector that provides

    the [...] centered and [...] MLE of $\theta$, the [...] interest.

    conclude the information [...] is dominated [...] in this case [...] is the efficient influence function for the estimation of $\kappa$ at [...] parameter [...] suppose that [...] is invertible. Then define [...]. Then [...] that for each $i$, [...] the fact that [...] we find that

    [...] expressible as [...] $= A^*$ [...] where $h_0 = $ [...]. Thus [...] favorable direction in the space of [...] for the estimation of $\kappa$, in the sense [...]

    dimensional submodel with this [...] a multiple) as [...] achieves the upper bound for the information bound, namely the variance of [...]. Hence [...] in this situation. To see this note that

    Also [...]. The information bound for a submodel with [...] as gradient is

    The notions introduced here will be useful in characterizing the expressions for the power under local alternatives.

    3 Asymptotic distribution of Semiparametric Likelihood Ratio Statistics: statements of results.

    We first state a theorem on the asymptotic distribution of the log-likelihood ratio statistic under the null hypothesis.

    3.1 General Semiparametric Likelihood Ratio Statistic Theorem

    the value pm'anlet,or with

    • A.2 [...]


    assume that [...] there exists a surface

    [...] neighborhood $U$ of [...] and every [...] in [...] values in $\Psi$ and that satisfies the following:

    The surface [...] $t$ satisfies:

    [...]

    is twice continuously differentiable in $t$ for every [...] with derivatives $\dot{l}$ and $\ddot{l}$ with respect to $t$.

    (d) For the derivatives $\dot{l}$ and $\ddot{l}$ in (c)

    i(-;fJo, =l.


    for any random 0 -t p fJo and0,under Po and

    10 (3.5)

    (l(·; fJo, l) o. (3.6)

    • A.3 Suppose that both the unconstrained and constrained maximizers of the likelihood, $\hat\psi_n$ and $\hat\psi_0$, are consistent under $P_0$.

    Theorem 3.1 If A.1, A.2 and A.3 hold, then

    (On fJo) 1010 Z '"



    parameter space [...] for $\theta$ as defined [...]. As in the

    in the [...] and [...] is the gradient [...] which is closed [...] Hilbert

    the derivative with [...] fixed collection [...] efficient score function for $\theta$ is the [...]

    above the [...] the range [...] it is this efficient [...] function that provides

    its [...] to the centered and scaled MLE of $\theta$, [...] interest. Even in situations where the [...] of interest does not

    it is often the case that [...] is the efficient influence function as defined in Van der Vaart's differentiability theorem. This of course [...] to situations where the [...] of interest $\theta$ is pathwise [...] and can be estimated at a [...] but this is the case in [...] situations of interest. It is however not the case that [...] estimators exist for the nuisance parameters as well.

    3.2 Power under Local (Contiguous) Alternatives.

    The theorem characterizes the limiting power of the likelihood ratio test under a sequence of local alternatives converging to the true value of the parameter.

    Theorem 3.2 Consider the likelihood ratio statistic $\mathrm{lrt}_n(\theta_0)$ for testing the null $\theta = \theta_0$. Let $\psi_0$ in the null hypothesis be the true value of the parameter and consider local alternatives of the form $\psi_0 + h/\sqrt{n} + o(n^{-1/2})$ at stage $n$. Assume that the model is Hellinger differentiable along the curve [...] with derivative operator $A$ [...] to the representation in [...]. Also assume that conditions A.1 to A.4 of Theorem 3.1 hold under [...]. Then

    • Under this sequence of local alternatives the likelihood ratio statistic converges in distribution to $Z$ where $Z$ follows a [...] distribution with [...] where [...] the covariance matrix [...] and $c$ is the covariance between $Ah$ and [...] in


    • Athen also

    [...] theorem remain valid for a Banach [...] norm differentiability is characterized in the [...] as in the Hilbert space [...] proper and efficient influence functions behave in the same [...] but

    since there is not a natural identification between a Banach space and its dual.

    4 Proofs of Results in Section 3.

    We first present a proof of Theorem 3.1.

    Proof of Theorem 3.1: We show that at each [...] it is the case that:

    [...] $+\, o_p(1) \le$ [...]

    where [...] converges in distribution to [...]. The $o_p(1)$ terms that appear in the minorant and the majorant need not be the same. The result of the theorem then follows immediately.

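    We have:

    $$\mathrm{lrt}_n(\theta_0) = 2n\,\mathbb{P}_n\bigl(\ln \mathrm{lik}(\hat\psi, \cdot)\bigr) - 2n\,\mathbb{P}_n\bigl(\ln \mathrm{lik}(\hat\psi_0, \cdot)\bigr)$$

    $$\le 2n\,\mathbb{P}_n\bigl[\ln \mathrm{lik}(\psi_{\hat\theta}(\hat\psi), \cdot) - \ln \mathrm{lik}(\psi_{\theta_0}(\hat\psi), \cdot)\bigr]$$

    since $\hat\psi = \psi_{\hat\theta}(\hat\psi)$ and $\mathbb{P}_n[\ln \mathrm{lik}(\psi_{\theta_0}(\hat\psi), \cdot)] \le \mathbb{P}_n[\ln \mathrm{lik}(\hat\psi_0, \cdot)]$. The last inequality is a consequence of the fact that $\theta(\psi_{\theta_0}) = \theta_0$ and by definition $\hat\psi_0$ is the maximiser of $\mathbb{P}_n[\ln \mathrm{lik}(\psi, \cdot)]$ over the null hypothesis $\theta(\psi) = \theta_0$. We thus have the following string of inequalities: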

    (eo) < [1(-; iJ,.J), iJ, + iJ .



    with t,them. Now

    t, where alies between aand line JOluirlg

  • In Hence

    in the statement of the theoremand also the fact that (j and

    ) ShllW·lTifT mlnlE~d111teJy that the ~1t"rJ"..,f"1f'Thus we have shown that:

    < 1)

    where [...] $\sqrt{n}(\hat\theta - \theta_0)$ [...]. The reverse inequality follows in the following way. Recall that [...] is the global maximizer of

    the and since 00 _ \eVe then have


    (In(lik('0, -)) )

    [In(lik( 'Ij}fJ ' -))

    0, 1(-; 00 ,



    ') I-) iJ


    conditions and the first term is [...]. We thus [...] the following inequality


    Now [...] and [...] in conjunction finish the proof.

    A few more words about the proof of the theorem. Note that the above [...] uses the surfaces [...] only for [...] and $\psi = $ [...] and for $t$ in a neighborhood of $\theta_0$. It assumes that [...] to $U$ and that [...] and [...] belong to $F$. This is [...] than what

    [...] and [...] for [...] and of [...] for $\theta_0$ guarantees; namely that, with probability tending to 1, [...] belong to $F$ and [...] belongs to $U$ eventually. But this is all that is needed to make the proof of the theorem work. The slightly stronger assumption we use helps in obviating some minor technicalities while keeping the key ideas of the proof intact.

    Proof of Theorem 3.2: Hellinger differentiability of the model along the curve [...] that

    $$\int \left[ \frac{dP_{\psi_t}^{1/2} - dP_{\psi_0}^{1/2}}{t} - \frac{1}{2}(Ah)\,dP_{\psi_0}^{1/2} \right]^2 \to 0$$

    as $t \to 0$. Now set

    [...]

    This is the log likelihood ratio based on observations $X_1, X_2, \ldots, X_n$ at stage $n$. By Lemma 3.10.11 of VAN DER VAART AND WELLNER (1996) we obtain a LAN expansion for the log-likelihood ratio as shown below:



    [...] converges in probability

    and [...] are [...] conclude that

    CLT in conjunction with Slutsky's theorem

    ( )under with J1 . and

    ( c )Here

    c = Cov 7) l) 7)

    LeCam's third lemma

    under the sequence

    1p,=(c, 2

    the IlmltlIlg, page


    Note that

    This shows that :

    under [...]. We now note that all the conditions in Theorem 3.1 hold under

    [...] modification being that


    now converges in distribution to [...]. Consequently [...] which in this case is still [...] converges to

    Note that

    Furthermore if [...] component of $\theta$ [...]

    direction for [...] showing that [...] $= A^*$ [...]. Hence

    the $i$'th


    [...]

    This completes the proof of Theorem 3.2.

    5 The Special Case of Partitioned Parameters

    In this section we obtain expressions for the power of the likelihood ratio test under local alternatives when the parameter of interest has two components, the first, $\theta$, belonging to a Euclidean space and the second, $\eta$, being infinite dimensional.

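    Let $\mathcal{P} = \{P_{\theta,\eta} : \theta \in \Theta, \eta \in H\}$ where $\Theta$ is an open subset of $\mathbb{R}^k$ and $H$ is some subset of a Banach space $Q$. Consider a fixed set of paths in $H$ of the form $\eta_t = \eta + t\beta + o(t)$ with $\beta \in Q$. Let $C$ denote the closed linear span of the $\beta$'s. Now consider paths of the form $(\theta + th, \eta_t) \in \Theta \times H$, and assume Hellinger differentiability with respect to this set of paths. Thus

    $$\int \left[ \frac{dP_{\theta+th,\eta_t}^{1/2} - dP_{\theta,\eta}^{1/2}}{t} - \frac{1}{2}\bigl(\dot{l}_\theta^T h + \dot{l}_\eta \beta\bigr)\,dP_{\theta,\eta}^{1/2} \right]^2 \to 0.$$

    Now, $\mathbb{R}^k \times C$ is a Banach space itself with the product topology and in this situation the score operator $A : \mathbb{R}^k \times C \to L_2^0$ is

    $$A(h, \beta) = \dot{l}_\theta^T h + \dot{l}_\eta \beta,$$

    where $\dot{l}_\theta$ is the score function for $\theta$ and $\dot{l}_\eta$ the score operator [...]. Note [...]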

  • the


    orthogonal projection of

    function for $\theta$ and the efficient information


    Ii Ii

    . Also consider [...] that


    map [...] parameter; thus

    [...] a bounded linear map from [...] $C \to$ [...]. As before a necessary and sufficient condition for pathwise norm differentiability of $\kappa$ is the existence of [...] for $i = 1, \ldots, m$ such that

    for each $i$. By arguments similar to VAN DER VAART (1991) we can show that the necessary and sufficient condition translates to:

    [...]

    which is the case if [...] is invertible. In what follows we shall assume that this is the case. We shall also assume that $\kappa_1(\theta)$ is of full row rank ($m$). The efficient influence function

    ([...] before identified with the values at the basis vectors) is then given [...] $g_0 = $ [...], and the dispersion matrix for $g_0$ which acts as the information bound for the estimation of $\kappa$ is given by $\kappa_1(\theta)$ [...] $\kappa_1(\theta)^T$ and is invertible under our assumptions.

    Denote the parameter $(\theta, \eta)$ by [...]. Consider the problem of testing the null $\kappa = \kappa_0$ against $\kappa \neq \kappa_0$. Let $(\theta_0, \eta_0) \in \mathcal{H}_0$ be the true value of the parameter. Assume that all the conditions of the LRT theorem hold with (3.3) being satisfied with $g_0$ (keep in mind here that $\kappa$ here

    [...] the role of $\theta$ in Theorem 3.1 while [...] plays the role of [...]); thus we have:

    with [...] consider

    When [...] in many [...] the non-centrality parameter reduces to

    is the identity matrix and the expression

    is the efficient information for estimating $\theta$ [...] parameter value
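A finite-dimensional analogue may help fix ideas about the efficient information for $\theta$ in a partitioned parameter. The sketch below is an illustration under a simplifying assumption that is not made in the paper: the nuisance parameter is a Euclidean vector, so the full information matrix can be written in blocks and the efficient information for $\theta$ is the Schur complement $I_{\theta\theta} - I_{\theta\eta} I_{\eta\eta}^{-1} I_{\eta\theta}$, that is, the variance of the $\theta$-score after projecting out the nuisance scores. The matrices and function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b (a square, b a matrix) by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(map(float, a[i])) + list(map(float, b[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def efficient_information(info, k):
    """Schur complement of the nuisance block; the first k coordinates are theta."""
    n = len(info)
    i_tt = [row[:k] for row in info[:k]]
    i_te = [row[k:] for row in info[:k]]
    i_et = [row[:k] for row in info[k:]]
    i_ee = [row[k:] for row in info[k:]]
    x = solve(i_ee, i_et)  # I_ee^{-1} I_et, of shape (n - k) x k
    return [
        [i_tt[r][c] - sum(i_te[r][j] * x[j][c] for j in range(n - k)) for c in range(k)]
        for r in range(k)
    ]


def noncentrality(h, matrix):
    """Quadratic form h' * matrix * h."""
    size = len(h)
    return sum(h[i] * matrix[i][j] * h[j] for i in range(size) for j in range(size))


if __name__ == "__main__":
    info = [[4.0, 2.0], [2.0, 3.0]]  # hypothetical full information, theta first
    eff = efficient_information(info, 1)
    print("efficient information:", eff)
    print("non-centrality, nuisance unknown:", noncentrality([1.0], eff))
    print("non-centrality, nuisance known:  ", noncentrality([1.0], [[4.0]]))
```

The efficient information is never larger than the $\theta$ block of the full information, so in this analogue not knowing the nuisance parameter can only reduce the non-centrality and hence the limiting power.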

    6 Applications of the Results in Section 3

    Here we illustrate some of the results of Section 3. As an application of Theorem 3.1 we derive the asymptotic distribution of the log-likelihood ratio statistic for the regression parameter in the Cox Model with right censoring following the formulation in VAN DER VAART (1998). The one-dimensional version of this model is treated briefly in MURPHY AND VAN DER VAART (2000) from the angle of semiparametric profile likelihood methods. The current status model with a d dimensional Euclidean parameter can also be treated in the framework of this theorem along the same lines as in the one-dimensional case treated in MURPHY AND VAN DER VAART (1997) and hence is not discussed here (for details see BANERJEE (2000)). For a treatment of this model in the profile likelihood framework see MURPHY AND VAN DER VAART [...]

    We also illustrate the use of Theorem 3.2 in the Cox Model with Right Censoring and also in the Double Censoring Model and Mixture Model discussed in MURPHY AND VAN DER VAART (1997). For more applications of this theorem see BANERJEE (2000).

    6.1 The Cox Model with Right Censoring.

    In what follows whenever we mention densities without specification they are assumed to be with respect to Lebesgue measure.

    Consider $n$ i.i.d observations [...], where [...] is the failure time of the $i$'th individual, [...] the observation time and [...] is a $k$-dimensional vector of covariates. Let the conditional density of [...] be denoted by [...]. Let [...] denote the marginal density

    [...] dominating measure $\nu$. We now make the [...] throughout in all that follows:

    • A.1 the marginal density of $Z$ is strictly positive with support a compact subset of [...]

    • [...] positive density on the [...]

    [...] mass

    i.i.d observations [...] and is strictly [...] (the [...] made above condition is crucial to the asymptotics in that it entails the boundedness of a certain function away [...] 0 that in turn [...] the continuous invertibility of the information operator for the nuisance parameter leading to an explicit form for the efficient score function in this problem.). Also, under our [...]

    Straightforward computations readily yield the joint density of (Y, Z) as follows:

    likelihood [...] cumulative hazard [...] that have a [...] at the observed $Y_i$ matter. It is [...] from the [...] for the likelihood that one can [...] cumulative

    hazard function constant between two successive observation times. Under these constraints the MLE of [...] is uniquely determined.

    We now have the following theorem.

    Theorem 6.1 Consider the null hypothesis $H_0 : \theta = \theta_0$ [...] $H_1 : \theta \neq \theta_0$ [...] the basis [...] i.i.d. observations [...] the distribution [...] $X$. Let $\mathrm{lrt}_n(\theta_0)$ denote the likelihood ratio statistic

    [...] the ratio as based on these $n$ observations where the likelihood [...] one observation is [...] by 6.14. Let [...] $\in H_0$ be the true value of the [...] and assume

    [...] has a continuous Lebesgue [...] $\lambda_0$. Denote the distribution [...] $X$ under $(\theta_0,$ [...]. [...] under $P_0$,

    $$\mathrm{lrt}_n(\theta_0) \to_d W$$

    where $W$ is a $\chi^2_k$ random variable. Furthermore, if we consider the following sequence of local alternatives $P_n$ where $P_n$ is the distribution of $X$ under $(\theta_0 + h_1/\sqrt{n},\ \Lambda_0 + (1/\sqrt{n}) \int h_2\, d\Lambda_0)$ with

    [...] being a bounded function in $L_2(\Lambda_0)$ then

    $$\mathrm{lrt}_n(\theta_0) \to_d W \quad \text{under } P_n$$

    where [...] is a non-central $\chi^2_k$ random variable with non-centrality parameter $\Delta = h_1^T I_0 h_1$ and $I_0$ is the efficient information matrix for the Euclidean parameter $\theta$ in this problem under the true parameter value $(\theta_0, \Lambda_0)$ and is explicitly defined in what follows.

    Proof: It can be shown under the assumptions made above that the MLE $(\hat\theta_n, \hat\Lambda_n)$ is consistent for $(\theta_0, \Lambda_0)$ under the product of the Euclidean topology and the topology of uniform convergence on $[0, \tau]$ ([...] VAN DER VAART (1999)). It is also the case that the MLE of $\Lambda$ in the constrained case ($\theta = \theta_0$) and denoted by $\hat\Lambda_0$ converges in probability to $\Lambda_0$ under the topology of uniform convergence.

    [...] the above model is Hellinger [...] the set of paths [...] where [...] function on [...] has density [...] of this is a [...] for $t$ sufficiently


    parameter value with [...] to [...] a bounded measurable

    (in view of the boundedness [...] operator with [...] to this set of

    [...] Straightforward [...]

    and this is [...] that

    [...] is the score function for $\theta$ [...] the true parameter value and is

    parameter value. It is [...] and is the score [...] for the nuisance parameter [...] show that $A$ is a continuous linear map from

    Now, the continuity of $A$ [...] that [...] henceforth referred to as [...] is a continuous linear map from [...]. Consider now the adjoint operator [...]. Standard computations

    show that:

    [...]

    Note that [...] continuous operator from [...] to itself. Also

    [...]

    is a bounded non-negative and monotone decreasing function on

    [...]. Consequently [...] is bounded and hence in [...] is continuously invertible with

    that is bounded away from

    and furthermore [...]


    [...] operator into the closure of the range [...] is a continuous linear map, and that

  • that


    where is vector - valued. Thus

    and the efficient score function

    _ ( iVf] (y))il y----~ (y)

    io io

    6z -

    The efficient information matrix is then given by:

    and is nonsingular under mild regularity conditions. This corresponds to $\nu(P_{\theta,\Lambda}) = \theta$ being pathwise norm differentiable at $P_0 = P_{\theta_0,\Lambda_0}$.

    It can be shown that in this model $\hat\theta_n$ is asymptotically efficient with the asymptotic covariance matrix being given by the inverse of the efficient information matrix. In other words we have:


    with [...] and thus assumption A.1 of Theorem 3.1 holds in this case. A derivation of the above follows from the discussions on Likelihood Equations in 25.12

    [...] VAART [...]. [...] acts as the influence function for $\theta$ in the presence of $\Lambda$ [...]. For a [...] see BANERJEE [...]

    We now define the approximately least favorable submodels in the following way. Define


  • z

    -In lik

    lik which is also by t,furthermore cOJt1tinu,owsly differentiable in t for every

    plllggmg in 00 for t and for that:

    is continuous in forIt is now to



    , 0

    This verifies condition 3.4 in Theorem 3.1. We next compute [...] $\ln \mathrm{lik}$ [...].



    h7dA .. 0

    'y )T *(t-O) I h dA./0


    It remains to check only conditions (3.5) and (3.6) of Theorem 3.1. These are checked by verifying the conditions of the following lemma and by establishing that the [...] condition holds ([...] a discussion of the "unbiasedness condition" which [...] condition [...] cases see [...]

    certain "unbiasedness" [...] with [...] conditions which [...] hold.

    below [...] condition entail that [...] conditions

  • 0, 0,

    Verification of the above conditions in this model is not too difficult but somewhat tedious and involves extensive [...] of [...] for Donsker classes. The details are skipped.

    [...] with $\Lambda$ [...] nuisance parameter [...] expressions for the power. If we consider [...] parameter and identify the [...] the [...] in Section 5, [...] closed linear span of the [...] be $C$ [...] Section 3.3. The score [...] is then defined

    this is what we take [...] where $A$ is

    with [...] the role of [...]. Following Section 5 with $\kappa$ being the [...] we find that under the sequence of local alternatives being considered the LRT statistic converges to a [...] random variable with [...] parameter $\Delta = $ [...]. A [...] derivation is also possible but is not pursued here.

    This finishes the proof.

    6.2 The Double Censoring Model

    For a treatment of the double-censoring model we refer the reader to Sections 2 and 4 of MURPHY AND VAN DER VAART (1997). The double censoring model can be looked upon as a missing data model where the full (non-missing) data consists of $(T, L, R)$, $T$ denoting the failure time, $L$ denoting the left censoring time and $R$ denoting the right censoring time, but one only [...] to observe $(U, D)$ with $U$ and $D$ being measurable functions of $(T, L, R)$. In fact $U = L$ and $D = 1$ if $T \le L$, $U = T$ and $D = 2$ if $L < T \le R$, and $U = R$ and $D = 3$ if $T > R$. The parameter here is

    the distribution of the failure time; the distributions of the censoring times are assumed known and $T$ is assumed to be [...] of [...]. The distribution of $L$ is denoted [...]

    density $g_L$, and that of $R$ is denoted [...] and assumed [...] have density $g_R$.

    left-continuous function of bounded variation [...] under $F$.


    [...] is [...] bounded function in [...] distribution with non-centrality

    true value and

    under the sequence [...] converges to [...] where [...] is the efficient information

    c Iflh

    Proof: Let $f_0$ denote the density of $F_0$, the true distribution in $\mathcal{H}_0$, with respect [...] some measure. For any bounded measurable $h$ with mean 0 under $F_0$ consider

    [...] of the form [...] so that $f_t = (1 + th) f_0$. Such a path lies in [...] and can be seen to be of the form $f_0$ [...] $+\, o(t)$. The gradient of this path is

    identified with $h$; so the closed linear span of the gradients can be identified with $L_2^0(F_0)$. Hellinger differentiability of the [...] data" model, where $(T, L, R)$ are observed, with respect to the

    set of paths considered above, follows directly from the fact that $f_t^{1/2} = f_0^{1/2}$ [...] $+\, o(t)$ in $L_2(\mu)$. Proposition A.5.5 of BKRW then ensures Hellinger differentiability of the double-censoring model, which is a missing data model with respect to the given set of paths, at the point $F_0$. The score operator, $A : L_2^0(F_0) \to L_2^0(P_{F_0})$, has the following form:

    $$Ah(u, d) = \frac{\int_{[0,u]} h\,dF_0}{F_0(u)}\,1(d = 1) + h(u)\,1(d = 2) + \frac{\int_{(u,\infty)} h\,dF_0}{1 - F_0(u)}\,1(d = 3)$$

    and is in fact, the conditional expectation operator,

    $$Ah(u, d) = E[h(T) \mid U = u, D = d].$$

    The adjoint operator $A^* : L_2^0(P_{F_0}) \to L_2^0(F_0)$ is then given by:

    $$A^* g(t) = E[g(U, D) \mid T = t].$$
    Now, with (1 and f we have:

    and follovvjng: notation :,ec1:lOn 2 have:


    Let [...] the following representation:

    the LRT statistic converges [...] where


    a direct application [...]

  • with

    lo(Oo, is the scoreis the score operator

    ). Notice that SV.

    Here $\mu$ is some dominating measure on [...] wrt which $Z$ has a density [...] operator for the parameter of interest $\theta$ at the parameter value $(\theta_0, F_0)$ and [...] for the nuisance parameter. Let $S$ denote the closure of the range of [...] in [...] is a [...] of the [...], the subspace of $L_2^0$ [...] that comprises functions of $U + $ [...]

    Consider now local alternatives of the form $(\theta_0 + h_1/\sqrt{n},$ [...]$)$ as in the statement of the proposition. Hellinger differentiability then ensures that the local alternatives considered above are contiguous. For the mixture model Assumption A.1 of Theorem 3.1 is satisfied with the following representation holding true:

    fi ([) - (0 ) = 10 j

    withlo (eo, (U,

    and [...]. It is not difficult to show that the conditional [...] of [...] lies in $S$ and hence is the projection of [...] into $S$ ([...] a [...] of this

    see BANERJEE [...]). It now follows from the discussion in Section 5 that the likelihood ratio statistic under the contiguous alternatives converges to a [...] distribution with non-centrality

    This finishes the proof.

    7 Comments

    [...] Semiparametric Likelihood Ratio Theorem as expounded [...] While

    It

    the [...] of the MLE's which [...] the [...] likelihood [...] and a stronger

    than the one related to Theorem 3.1, namely the consistency requirements [...] "unbiasedness condition"

    Discussion section in Chapter 2 of BANERJEE [...] and also MURPHY [...] and MURPHY [...] VAN DER VAART [...]. The trade-off though is that the

    asymptotic efficiency of the MLE for the Euclidean [...] with the efficient score function as the linear approximation, can actually be derived from the quadratic [...] of the Profile Likelihood Theorem and does not need to be assumed as in Theorem 3.1. However given the

    [...] of maximum likelihood estimates and the asymptotic efficiency of the MLE for a [...] model, the likelihood ratio theorem certainly seems to provide a straightforward route to deducing the asymptotic distribution of the likelihood ratio statistic.

    least [...] the [...] required conditions and lead [...] the [...] space that [...] the

    Potential applications of Theorem 3.1 lie in the semiparametric regression models studied in LAWLESS, KALBFLEISCH AND WILD (1999) under two-phase sampling designs. Consistency has been proved in VAN DER VAART AND WELLNER [...] and asymptotic normality and efficiency of the MLE's have been established in BRESLOW, McNENEY AND WELLNER [...]. A [...] application also lies in the framework of Empirical Likelihood Methods as studied

    [...] LAWLESS [...]. It remains to be seen whether empirical likelihood ratio [...] encapsulated in the semiparametric likelihood ratio framework, possibly under certain regularity conditions. [...] difficulty here seems to be in coming up with the surfaces

    "least favorable" submodels because of the nature in which [...] defined [...] likelihood ratio framework.

    Euclidean parameter of interest

    construction of likelihood based [...] has been [...] and discussed in

    because likelihood ratio based confidence [...] have

    sufficient conditions [...] check the same [...] refer to Subsection 2.9.2 in Chapter 2

    ACKNOWLEDGEMENTS: I would like to thank my advisor Jon [...] who introduced me [...] semiparametric inference, for the many discussions I have had with him.


    Banerjee, M. (2000). Likelihood Ratio Inference in Regular and Non-regular Problems. Ph.D. [...], University of Washington.

    Begun, J.M., Hall, W.J., Huang, W. and Wellner, J.A. (1983). Information and asymptotic efficiency in parametric-nonparametric models. Ann. Statist. 11, 432-462.

    Bickel, P., Klaassen, C., Ritov, Y. and Wellner, J.A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press, Baltimore.

    Breslow, N., McNeney, B. and Wellner, J.A. (2000). Large sample theory for semiparametric regression models with two-phase outcome dependent sampling. Technical Report No. 381, Department of Statistics, University of Washington.

    Huang, J. (1996). Efficient estimation for the Cox model with interval censoring. Ann. Statist. 24, 540-568.

    Lawless, J.F., Kalbfleisch, J.D. and Wild, C.J. (1999). Semiparametric methods for response-selective and missing data problems in regression. J. Roy. Statist. Soc. B 61, 413-438.

    Meeker, W.Q. and Escobar, L.A. (1995). Teaching about approximate confidence regions based on maximum likelihood estimation. The Amer. Statist. 49, 48-53.

    Murphy, S. and Van der Vaart, A. (1997). Semiparametric likelihood ratio inference. Ann. Statist. 25, 1471-1509.

    Murphy, S. and Van der Vaart, A. (2000). On profile likelihood. J. Amer. Statist. Assoc. 95 [...]

    Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations. Ann. Statist. 22, 300-325.

    Van der Vaart, A. (1991). On differentiable functionals. Ann. Statist. 19, 178-201.

    Van der Vaart, A. and Wellner, J.A. (1996). Weak Convergence and Empirical [...]