Non-parametric Hypothesis Testing and Confidence Intervals with Doubly Censored Data


KUN CHEN [email protected]

Department of Statistics, University of Kentucky, Lexington, KY 40506-0027, USA

MAI ZHOU [email protected]

Department of Statistics, University of Kentucky, Lexington, KY 40506-0027, USA

Received July 20, 2000; Revised April 22, 2002; Accepted July 1, 2002

Abstract. The non-parametric maximum likelihood estimator (NPMLE) of the distribution function with doubly

censored data can be computed using the self-consistent algorithm (Turnbull, 1974). We extend the self-consistent

algorithm to include a constraint on the NPMLE. We then show how to construct confidence intervals and test

hypotheses based on the NPMLE via the empirical likelihood ratio. Finally, we present some numerical

comparisons of the performance of the above method with another method that makes use of the influence

functions.

Keywords: modified self-consistent algorithm, empirical likelihood ratio, influence function, EM algorithm

1. Introduction

When data are doubly censored, the non-parametric maximum likelihood estimator

(NPMLE) of the distribution function has no explicit form and has to be computed via

some iteration (Turnbull, 1976; Chang and Yang, 1987; Zhou, 1995; Mykland and Ren,

1996; Zhan and Wellner, 1999). The estimation of the asymptotic variance of the NPMLE

is even more involved (Chang, 1990).

The self-consistency algorithm is a general way of computing non-parametric distribu-

tional estimators when data are not completely observed (Efron, 1967; Turnbull, 1974;

Tsai and Crowly, 1985). In particular, the NPMLE of a distribution function based on an

i.i.d. sample must satisfy the self-consistent equation (Gill, 1989). It is a special case of the

EM algorithm (Dempster, Laird and Rubin, 1977) when the parameter is the distribution

function itself.

In this paper we extend the self-consistent algorithm to compute the NPMLE under the

constraint F(T) = θ for doubly censored data. This constrained estimator is useful in the

construction of the empirical likelihood ratio. As an application, we show how this in turn will

allow us to find confidence intervals and test hypotheses for F(T ) with doubly censored data.

Another approach to construct confidence intervals based on the NPMLE with doubly

censored data is to estimate the influence function/variance of the NPMLE. But the

influence functions are implicitly defined and not easy to estimate and thus may not be

very practical for larger samples. We will compare the two approaches for one example in

this paper.

Lifetime Data Analysis, 9, 71-91, 2003. © 2003 Kluwer Academic Publishers. Printed in The Netherlands.

Doubly censored data can occur when observations are subject to both right and left censoring, i.e., each patient is only observed within a window of time; outside this window we only know whether the event time falls below or above it.

We end this section with a formal definition of the relevant notation and basic

assumptions.

Let X_1, ..., X_n be positive random variables denoting the sample of lifetimes, independent and identically distributed with a continuous distribution F_0. The censoring mechanism is such that X_i is observable if and only if it lies inside the interval [Z_i, Y_i]. The Z_i and Y_i are positive random variables with distribution functions G_{L0} and G_{R0} respectively, and Z_i ≤ Y_i with probability 1. If X_i is not inside [Z_i, Y_i], the exact value of X_i cannot be determined; we only know whether X_i is less than Z_i or greater than Y_i, and we observe Z_i or Y_i correspondingly.

The variable X_i is said to be left censored if X_i < Z_i and right censored if X_i > Y_i. The available information may be expressed by a pair of random variables (T_i, δ_i), where

\[
T_i = \max(\min(X_i, Y_i), Z_i), \qquad
\delta_i =
\begin{cases}
1 & \text{if } Z_i \le X_i \le Y_i \\
0 & \text{if } X_i > Y_i \\
2 & \text{if } X_i < Z_i
\end{cases}
\qquad i = 1, 2, \ldots, n. \tag{1.1}
\]
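The mapping from a latent triple (X_i, Z_i, Y_i) to the observed pair (T_i, δ_i) in (1.1) can be sketched in code. This is an illustrative helper of ours, not part of the paper's software; the function name is an assumption:

```python
def doubly_censor(x, z, y):
    """Map a latent lifetime x and censoring window [z, y] to the
    observed pair (T, delta) as defined in (1.1)."""
    if z <= x <= y:
        return x, 1      # uncensored: X_i itself is observed
    elif x > y:
        return y, 0      # right censored: we only see Y_i
    else:
        return z, 2      # left censored: we only see Z_i
```

Note that T_i = max(min(X_i, Y_i), Z_i) reproduces exactly these three cases.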

The modified (constrained) self-consistent equation for doubly censored data is derived in Section 2. We generalize the self-consistent algorithm to deal with several constraints in Section 3. We then explain how the constrained self-consistent estimate may be used to obtain confidence intervals via the empirical likelihood ratio in Section 4. The influence function is discussed in Section 5. We carry out simulations and apply our algorithm to some doubly censored data in Section 6.

2. Modified Self-Consistent Equation Under a Constraint

Based on data (1.1), suppose for a given T and θ we are interested in testing the hypothesis

\[
H_0: F(T) = \theta \quad \text{vs.} \quad H_1: F(T) \ne \theta. \tag{2.1}
\]

As we will see later, a key step to accomplish this is to compute the NPMLE of F(t)

under H0 from the data (1.1). We shall focus on computing the constrained NPMLE in this

section.

First let us write down the empirical log-likelihood function log L(F) for doubly censored data. The log-likelihood involves all n observations, but we can decompose it into two parts: observations before T and observations after T. We define w_i ≥ 0, i = 1, 2, ..., n_1, as the jumps of F before T, with Σ w_i = θ; similarly ν_i ≥ 0, i = 1, 2, ..., n_2, with Σ ν_i = 1 − θ, as the jumps after T. Then the log-likelihood function can be written as

\[
\begin{aligned}
\log L(F) &= \log L_1 + \log L_2 \\
&= \sum_{\delta_i=1,\, i \le n_1} \log(w_i)
 + \sum_{\delta_i=0,\, i \le n_1} \log[1 - F(t_i)]
 + \sum_{\delta_i=2,\, i \le n_1} \log[F(t_i-)] \\
&\quad + \sum_{\eta_i=1,\, i \le n_2} \log(\nu_i)
 + \sum_{\eta_i=0,\, i \le n_2} \log[1 - F(s_i)]
 + \sum_{\eta_i=2,\, i \le n_2} \log[F(s_i-)]
\end{aligned} \tag{2.2}
\]

where log L_1 denotes the log-likelihood for the n_1 observations before T and log L_2 the log-likelihood for the n_2 observations after T (n_1 + n_2 = n). Here the t_i are the observed times before T and the s_i the observed times after T; δ_i is the censoring indicator before T and η_i the censoring indicator after T.
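As a concrete illustration of the first half of (2.2), the sketch below evaluates log L_1 for a toy configuration of jumps before T. The function and data names are ours, and the jump masses are assumed to sum to θ:

```python
import math

def log_lik_before_T(obs, jumps, theta):
    # obs: list of (t_i, delta_i) with all t_i <= T.
    # jumps: {t_i: w_i} for the uncensored points, with sum(w_i) = theta.
    ll = 0.0
    for t, d in obs:
        if d == 1:                       # exact observation: log w_i
            ll += math.log(jumps[t])
        elif d == 0:                     # right censored: log[1 - F(t_i)],
            # where 1 - F(t_i) = (1 - theta) + mass after t_i within [0, T]
            ll += math.log((1.0 - theta)
                           + sum(w for s, w in jumps.items() if s > t))
        else:                            # left censored: log[F(t_i-)]
            ll += math.log(sum(w for s, w in jumps.items() if s < t))
    return ll
```

The second half of (2.2) is evaluated symmetrically with the ν_i and a baseline mass of θ to the left of T.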

Notice that because F(T) = θ, we have

\[
1 - F(t_i) = 1 - F(T) + F(T) - F(t_i) = 1 - \theta + w_{i+1} + \cdots + w_{n_1},
\]

and F(t_i−) = w_1 + w_2 + ... + w_{i−1}. The first part of the likelihood (2.2), log L_1, only involves the w_i, the jumps of F before time T, as in (2.3). Similarly the second part, log L_2, only involves the ν_i, the jumps of F after time T, as in (2.4). Therefore we can maximize the whole likelihood in two independent steps: maximize log L_1 with respect to the w_i and then maximize log L_2 with respect to the ν_i.

If a distribution function F maximizes the log-likelihood under H_0, then the derivative of the log-likelihood at this F must equal 0. To facilitate taking the derivative (under H_0), we define a directional change of the w_i in the direction h(·), parameterized by a scalar λ. Since the w_i must sum to θ and the ν_i must sum to 1 − θ, we define them as follows.

For jumps before time T, we define

\[
w_i = w_i(\lambda) = \frac{\Delta F(t_i)}{1 + \lambda h(t_i)} \cdot \frac{\theta}{C(\lambda)},
\qquad \text{with } C(\lambda) = \sum_{i=1}^{n_1} \frac{\Delta F(t_i)}{1 + \lambda h(t_i)}. \tag{2.3}
\]

Clearly C(0) = θ. For jumps after time T, we define

\[
\nu_i = \nu_i(\lambda) = \frac{\Delta F(s_i)}{1 + \lambda h(s_i)} \cdot \frac{1 - \theta}{D(\lambda)},
\qquad \text{with } D(\lambda) = \sum_{i=1}^{n_2} \frac{\Delta F(s_i)}{1 + \lambda h(s_i)}. \tag{2.4}
\]

Clearly D(0) = 1 − θ.
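The tilted weights (2.3) can be sketched numerically. This illustrative helper (names ours) divides each jump by 1 + λh(t_i) and renormalizes so that the weights always sum to θ:

```python
def tilted_weights(jumps, h, lam, theta):
    # jumps: list of (t_i, dF_i) pairs; h: direction function; lam: lambda.
    raw = [dF / (1.0 + lam * h(t)) for t, dF in jumps]
    C = sum(raw)                         # C(lambda) in (2.3)
    return [theta * r / C for r in raw]  # w_i(lambda), summing to theta
```

At λ = 0 the denominators are all 1, C(0) equals the total mass, and the original (constrained) jumps are recovered.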


Remark 2.1. The form of (2.3) and (2.4) is inspired by Owen's (1988) results. We could also use the definition

\[
w_i^*(\lambda) = \frac{\Delta F(t_i)[1 + \lambda h(t_i)]}{C^*(\lambda)},
\qquad \text{with } C^*(\lambda) = \sum_{i:\, t_i < T} \Delta F(t_i)[1 + \lambda h(t_i)]
\]

in the computation of derivatives, but the result of Owen (1988) points towards the form (2.3) and (2.4). The derivative with respect to λ at λ = 0 is then the partial/directional derivative in the direction h. Since w_i = ΔF(t_i) achieves the maximum, the derivative of the log-likelihood with respect to λ must be zero for any direction h(t). This leads to equation (A.1) (see Appendix A). In particular, the choice h(t) = I[t ≤ u] for −∞ < u < ∞ gives the modified self-consistent equation under the constraint (2.1). In the case of right-censored data without a constraint, this choice of h(·) gives rise to the familiar self-consistent equation of Efron (1967) for the Kaplan-Meier estimator (Kaplan and Meier, 1958). This choice of h was also used by Gill (1989). We shall use the convention

\[
1 - F(t_i) = \sum_{j:\, t_j > t_i} \Delta F(t_j), \qquad
F(t_i) = \sum_{j:\, t_j \le t_i} \Delta F(t_j).
\]
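To make the connection in Remark 2.1 concrete, here is a sketch of Efron's self-consistency iteration for right-censored data without a constraint; its fixed point is the Kaplan-Meier estimator. This is our illustrative code (assuming no censoring after the last event time), not the paper's implementation:

```python
def km_self_consistent(times, delta, tol=1e-10, max_iter=500):
    """Efron's (1967) self-consistency iteration for right-censored data
    (delta = 1: event observed, delta = 0: censored).  The fixed point
    is the Kaplan-Meier estimator."""
    n = len(times)
    deaths = sorted({t for t, d in zip(times, delta) if d == 1})
    # crude initial guess: equal jumps at the event times
    F = {u: (i + 1.0) / len(deaths) for i, u in enumerate(deaths)}
    for _ in range(max_iter):
        Fold = dict(F)

        def Fval(x):
            # step-function value of the current estimate at x
            return max([Fold[u] for u in deaths if u <= x], default=0.0)

        F = {}
        for u in deaths:
            s = 0.0
            for t, d in zip(times, delta):
                if d == 1 and t <= u:
                    s += 1.0                                 # event before u
                elif d == 0 and t <= u:
                    # censored at t: redistribute its conditional mass
                    s += (Fval(u) - Fval(t)) / (1.0 - Fval(t))
            F[u] = s / n
        if max(abs(F[u] - Fold[u]) for u in deaths) < tol:
            break
    return F
```

On the toy sample (1, 2, 3, 4) with the observation at 2 censored, the iteration converges to the Kaplan-Meier values F(1) = 0.25, F(3) = 0.625, F(4) = 1.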

(a) The constrained self-consistent equation for F(u) when u ≤ T is:

\[
F(u) = \frac{\theta}{n_1} \Biggl\{
\sum_{\delta_i=1,\, i \le n_1} I[t_i \le u]
+ \sum_{\delta_i=0,\, i \le n_1} \frac{\frac{1-\theta}{\theta}\, F(u)}{1 - F(t_i)}
+ \sum_{\delta_i=0,\, i \le n_1} \frac{F(u) - F(t_i)}{1 - F(t_i)}\, I[t_i \le u]
+ \sum_{\delta_i=2,\, i \le n_1} \frac{F(\min(u, t_i-))}{F(t_i-)}
\Biggr\}, \tag{2.5}
\]

where all t_i ≤ T.

(b) Similarly, the constrained self-consistent equation for u > T is:

\[
F(u) = \theta + \frac{1-\theta}{n_2} \Biggl\{
\sum_{\eta_i=1,\, i \le n_2} I[s_i \le u]
+ \sum_{\eta_i=0,\, i \le n_2} \frac{F(u) - F(s_i)}{1 - F(s_i)}\, I[s_i \le u]
+ \sum_{\eta_i=2,\, i \le n_2} \frac{\frac{\theta}{1-\theta}\,[F(u) - \theta]}{F(s_i-)}
+ \sum_{\eta_i=2,\, i \le n_2} \frac{F(\min(u, s_i-)) - \theta}{F(s_i-)}
\Biggr\}, \tag{2.6}
\]

where all s_i > T.

For the derivation of the above self-consistent equations, see Appendix A. We summarize the result in the following theorem.

THEOREM 2.1 The NPMLE of F(·) under hypothesis (2.1) based on doubly censored data (1.1) must satisfy the modified self-consistent equations (2.5) and (2.6). □


Using the above modified self-consistent equations, we can iteratively compute the estimator of the distribution F under the constraint: we substitute the current estimate F into the right-hand side of (2.5) or (2.6) to obtain a new estimate F_new. The initial estimator should have probability mass at the locations where the NPMLE may have jumps, and it should satisfy the constraint. We used the NPMLE without the constraint, rescaled to satisfy the constraint, as the initial estimator.

THEOREM 2.2 The estimator obtained by the above modified self-consistent equations is at least a local maximizer of the likelihood function under (2.1).

Proof: See Appendix B. □
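The initial estimator described above, the unconstrained NPMLE rescaled to satisfy F(T) = θ, can be sketched as follows. This is illustrative code of ours; `masses` maps the jump locations of the unconstrained NPMLE to their probability masses:

```python
def scaled_initial_estimate(masses, T, theta):
    # Rescale unconstrained NPMLE jump masses so that the total mass at
    # or before T is theta and the mass after T is 1 - theta.
    before = {t: m for t, m in masses.items() if t <= T}
    after = {t: m for t, m in masses.items() if t > T}
    sb, sa = sum(before.values()), sum(after.values())
    out = {t: m * theta / sb for t, m in before.items()}
    out.update({t: m * (1.0 - theta) / sa for t, m in after.items()})
    return out
```

The rescaled masses then serve as the starting point for iterating (2.5) and (2.6).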

3. Modified Self-Consistent Equations Under Many Constraints

In this section, we extend the modified self-consistent algorithm to handle hypotheses

(constraints) at several times:

\[
\begin{aligned}
H_0 &: F(T_1) = \theta_1,\ F(T_2) = \theta_2,\ \ldots,\ F(T_k) = \theta_k \\
H_1 &: F(T_j) \ne \theta_j \text{ for at least one } j,\ 1 \le j \le k.
\end{aligned}
\]

For simplicity, we only give the proof for k = 2. Following steps similar to those for the simple hypothesis, we can decompose the log-likelihood function into three parts: before time T_1, between times T_1 and T_2, and after time T_2.

\[
\begin{aligned}
\log L(F) &= \log L_1(F) + \log L_2(F) + \log L_3(F) \\
&= \sum_{\delta_i=1,\, i \le n_1} \log(w_i)
 + \sum_{\delta_i=0,\, i \le n_1} \log[1 - F(t_i)]
 + \sum_{\delta_i=2,\, i \le n_1} \log[F(t_i-)] \\
&\quad + \sum_{\eta_i=1,\, i \le n_2} \log(z_i)
 + \sum_{\eta_i=0,\, i \le n_2} \log[1 - F(x_i)]
 + \sum_{\eta_i=2,\, i \le n_2} \log[F(x_i-)] \\
&\quad + \sum_{c_i=1,\, i \le n_3} \log(\nu_i)
 + \sum_{c_i=0,\, i \le n_3} \log[1 - F(s_i)]
 + \sum_{c_i=2,\, i \le n_3} \log[F(s_i-)].
\end{aligned} \tag{3.1}
\]

In the above equation we use the following notation. w_i, z_i and ν_i are the jumps of F at the ith observation before T_1, between T_1 and T_2, and after T_2, respectively; t_i, x_i and s_i are the observations before T_1, between T_1 and T_2, and after T_2, respectively; δ_i, η_i and c_i are the censoring indicators before T_1, between T_1 and T_2, and after T_2, respectively. There are n_1 observations before T_1, n_2 observations between T_1 and T_2, and n_3 observations after T_2; the total number of observations is n_1 + n_2 + n_3 = n. Note that the constraints are

\[
\sum_{i=1}^{n_1} w_i = \theta_1, \qquad
\sum_{i=1}^{n_2} z_i = \theta_2 - \theta_1, \qquad
\sum_{i=1}^{n_3} \nu_i = 1 - \theta_2.
\]

The maximization of (3.1) can be done separately on log L_1, log L_2 and log L_3. The maximization of log L_1 and log L_3 is exactly the same as the maximization of log L_1 and log L_2 in Section 2. To maximize log L_2 in (3.1), we define the jumps between T_1 and T_2 as

\[
z_i = z_i(\lambda) = \frac{\Delta F(x_i)}{1 + \lambda h(x_i)} \cdot \frac{\theta_2 - \theta_1}{B(\lambda)},
\qquad \text{with } B(\lambda) = \sum_{i=1}^{n_2} \frac{\Delta F(x_i)}{1 + \lambda h(x_i)},
\]

and B(0) = θ_2 − θ_1. The self-consistent equation related to log L_2, for T_1 ≤ u ≤ T_2, turns out to be:

\[
\begin{aligned}
F(u) = \theta_1 + \frac{\theta_2 - \theta_1}{n_2} \Biggl\{
& \sum_{\eta_i=1,\, i \le n_2} I(x_i \le u)
+ \sum_{\eta_i=0,\, i \le n_2} \frac{\frac{1-\theta_2}{\theta_2-\theta_1}\,[F(u) - \theta_1]}{1 - F(x_i)} \\
& + \sum_{\eta_i=0,\, i \le n_2} \frac{[F(u) - F(x_i)]\, I[x_i \le u]}{1 - F(x_i)}
+ \sum_{\eta_i=2,\, i \le n_2} \frac{F(\min(x_i, u)) - \theta_1}{F(x_i-)} \\
& + \sum_{\eta_i=2,\, i \le n_2} \frac{\frac{\theta_1}{\theta_2-\theta_1}\,[F(u) - \theta_1]}{F(x_i-)}
\Biggr\},
\end{aligned} \tag{3.2}
\]

where T_1 ≤ x_i ≤ T_2 (for the details, see Appendix C).

4. Empirical Likelihood Ratio Tests and Confidence Intervals

Let F̂ be the NPMLE of F which maximizes the log-likelihood log L(F), defined in (2.2), over all distributions. Let F̃ denote the NPMLE of F under H_0, which maximizes the log-likelihood only among distributions that satisfy H_0. We define the empirical likelihood ratio as

\[
R(H_0) = \frac{L(\tilde F)}{L(\hat F)}.
\]

The likelihood ratio test statistic is

\[
-2 \log R(H_0) = -2 \log \frac{\max_{H_0} L(F)}{\max_{H_0 + H_1} L(F)}
= 2\bigl[\log L(\hat F) - \log L(\tilde F)\bigr].
\]


The method described in Section 2 enables us to compute the constrained NPMLE F̃, which maximizes the log-likelihood under H_0: F(T) = θ. Using the usual self-consistent algorithm (Turnbull, 1974), we can compute the NPMLE F̂ without any constraint. Once we have these estimates, we can easily compute the empirical likelihood ratio. We use chi-square theory to carry out hypothesis testing and construct confidence intervals. For the general theory of the empirical likelihood ratio, see Owen (1988, 2001). For a more technical result that is directly applicable to our setup here, see Theorem 2.1 of Murphy and Van der Vaart (1997).

If the observed −2 log R(H_0) is greater than χ²_{1,α} (the 100(1 − α)th percentile of the χ² distribution with 1 degree of freedom), we reject H_0 at significance level α. To construct a confidence interval for F(T), we can test different hypotheses with fixed T and various θ's, and the confidence interval is just

\[
\bigl\{\theta : -2 \log R(H_0: F(T) = \theta) \le \chi^2_{1,\alpha}\bigr\}. \tag{4.1}
\]

We can also construct a confidence interval for the percentile F^{-1}(θ) as follows: test many hypotheses with fixed θ and various T's, and form the confidence interval as

\[
\bigl\{T : -2 \log R(H_0: F(T) = \theta) \le \chi^2_{1,\alpha}\bigr\}. \tag{4.2}
\]

In particular, a confidence interval for the median can be obtained with θ = 1/2.
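In the special case of no censoring, both NPMLEs have closed forms (uniform weights θ/n_1 before T and (1 − θ)/n_2 after T for F̃, versus 1/n each for F̂), so −2 log R(H_0) reduces to 2[n_1 log(n_1/(nθ)) + n_2 log(n_2/(n(1 − θ)))]. The sketch below (illustrative code of ours, not from the paper) uses this closed form to invert the test as in (4.1):

```python
import math

def neg2_log_R_uncensored(n1, n2, theta):
    # -2 log empirical likelihood ratio for H0: F(T) = theta when the
    # sample is uncensored: only the counts n1 (observations <= T) and
    # n2 = n - n1 matter (cf. Remark 6.1 in Section 6).
    n = n1 + n2
    return 2.0 * (n1 * math.log(n1 / (n * theta))
                  + n2 * math.log(n2 / (n * (1.0 - theta))))

def el_confidence_interval(n1, n2, cutoff=3.84, grid=10_000):
    # Invert the test as in (4.1): keep every theta on a fine grid with
    # -2 log R <= the chi-square(1, 0.05) cutoff.
    accepted = [k / grid for k in range(1, grid)
                if neg2_log_R_uncensored(n1, n2, k / grid) <= cutoff]
    return min(accepted), max(accepted)
```

With censored data the same inversion applies, except that each −2 log R must be computed via the constrained self-consistent algorithm of Section 2.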

5. Influence Function of NPMLE and Its Estimation

The influence function (or influence curve) is a general technique for obtaining the variance of a random process (and more). In the analysis of F(·) with doubly censored samples, there are three influence functions, corresponding to right-, left- and non-censored observations. Chang (1990) computed these asymptotic influence functions for the process √n(F_n(t) − F(t)); he obtained the following:

\[
\sqrt{n}\,(F_n(t) - F(t)) = \int_0^T IC_1(t,s)\, dq^{(n)}_1(s)
+ \int_0^T IC_0(t,s)\, dq^{(n)}_0(s)
+ \int_0^T IC_2(t,s)\, dq^{(n)}_2(s) + o_p(1),
\]

where

\[
q^{(n)}_j(t) = \sqrt{n} \left[ \frac{1}{n}\sum_i I[T_i \le t,\, \delta_i = j]
- E\!\left( \frac{1}{n}\sum_i I[T_i \le t,\, \delta_i = j] \right) \right],
\qquad j = 0, 1, 2.
\]


From the above, we could try to estimate the three asymptotic influence functions, ÎC_j(t, s), and then estimate the variance of √n(F̂(t) − F(t)) by

\[
\int_0^T \hat{IC}^2_1(t,s)\, dq^{(n)}_1(s)
+ \int_0^T \hat{IC}^2_0(t,s)\, dq^{(n)}_0(s)
+ \int_0^T \hat{IC}^2_2(t,s)\, dq^{(n)}_2(s)
- \left[ \int_0^T \hat{IC}_1(t,s)\, dq^{(n)}_1(s)
+ \int_0^T \hat{IC}_0(t,s)\, dq^{(n)}_0(s)
+ \int_0^T \hat{IC}_2(t,s)\, dq^{(n)}_2(s) \right]^2.
\]

However, those influence functions are only defined implicitly via Fredholm integral equations that involve not only the unknown distribution function F, but also the unknown censoring distributions. We plug in the (self-consistent) estimates of those distribution functions, discretize the Fredholm integral equations into matrix equations, and solve for ÎC_j. For details see Chang (1990) and Press et al. (1993), Chapter 18.
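The discretize-and-solve step for a second-kind Fredholm equation can be sketched generically. In this illustration of ours the matrix inversion is replaced by a fixed-point iteration (valid when the kernel is a contraction); the paper's implementation solves the discretized matrix equation directly:

```python
def solve_fredholm(g, K, a, b, m=200, iters=200):
    # Discretize the second-kind Fredholm equation
    #     f(t) = g(t) + integral_a^b K(t, s) f(s) ds
    # on an m-point grid (midpoint rule) and solve the resulting linear
    # system by fixed-point iteration.
    h = (b - a) / m
    grid = [a + (j + 0.5) * h for j in range(m)]
    f = [g(t) for t in grid]
    for _ in range(iters):
        f = [g(t) + h * sum(K(t, s) * fs for s, fs in zip(grid, f))
             for t in grid]
    return grid, f
```

For example, with g ≡ 1 and the constant kernel K ≡ 1/2 on [0, 1], the exact solution is f ≡ 2, and the iteration converges to it geometrically.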

In this approach we need to solve a matrix equation that results from discretizing the corresponding integral equations for the influence functions. If we discretize at few points, the estimate will not be very accurate; if we use many discretizing points, we get a big matrix and the computation is slow. For large sample sizes with many discretizing points, this approach is computationally expensive (it requires inverting a large matrix). In our experience, when the censored observations (both right and left) exceed 500, it becomes slow in our implementation, since we discretize at the observed censoring times.

Figure 1. Q-Q plot of −2 log-likelihood ratios vs. χ²(1) percentiles for the normal distribution when sample size = 100.

On the other hand, for the self-consistent algorithm we observe only slight slowing

for sample sizes over 2,000. Besides, the memory requirement for the influence function calculation is of order n², while it is only of order n for the self-consistent algorithm. In other implementations the speed will undoubtedly vary, but the compromise between speed and accuracy (the number of discretizing points) of the influence function estimator is always there. The accuracy of these influence-function-based variance estimators is largely unknown. On the other hand, we have a pretty good indication of the quality of the χ² approximation in the empirical likelihood case (Figures 1, 2, 3 and 4). It is also a generally accepted fact that for smaller sample sizes the likelihood ratio tests are more accurate than the Wald tests.

In Section 6 we use the influence function to estimate the variance of F̂(T) and construct the confidence intervals illustrated in (6.1), (6.2), (6.3) and (6.4). Comparisons are made among the different confidence intervals, using the influence function approach and the empirical likelihood ratio approach described in Section 4.

6. Applications, Simulations and Examples

In this section, we apply our methodology to test hypotheses and construct confidence intervals with doubly censored data.

Figure 2. Q-Q plot of −2 log-likelihood ratios vs. χ²(1) percentiles for the normal distribution when sample size = 25.


6.1. Simulation: Hypothesis Testing

In our next two simulations, we only examined the size of the test (i.e., under H_0). We took normally distributed samples of size 100 and size 25, respectively, for each run, and each entry in Table 2 was based on 5,000 runs. The data were generated via (1.1), where X_i, Y_i, Z_i are distributed as in Table 1.

The tests of H_0: F(T) = θ based on the doubly censored data (T_i, δ_i), as described in Section 2, were carried out. Table 2 reports the percentages of rejecting H_0: F(T) = θ at the nominal significance level α = 0.05 (or 0.10). For example, the second row gives the results of testing H_0: F(8) = 0.16, where F(·) is the cumulative distribution function of the normal distribution with mean μ = 10 and standard deviation σ = 2. The percentages were computed as the number of −2 log-likelihood ratios greater than the critical value χ²_{1,0.05} = 3.84 (or χ²_{1,0.10} = 2.71), divided by the number of runs, 5,000. From Table 2 we can see the probabilities of rejecting H_0 are pretty close to the nominal level α = 0.05 (or 0.10), since we have chosen T and θ values that make H_0 hold (Prob(N(μ = 10, σ = 2) ≤ 10) = 0.5, etc.).
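The entries of Table 2 are empirical rejection rates, which can be sketched in one line (illustrative code of ours):

```python
def rejection_rate(stats, critical=3.84):
    # Fraction of simulated -2 log-likelihood ratios exceeding the
    # chi-square(1) critical value (3.84 at alpha = 0.05, 2.71 at 0.10).
    return sum(s > critical for s in stats) / len(stats)
```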

Similarly to the above, we took exponentially distributed samples of size 100 and size 25, respectively, for our final simulation, also based on 5,000 runs. The data were generated via (1.1) using the X_i, Y_i, Z_i distributions listed in Table 3.

Again, Table 4 shows that the percentages of rejecting H_0 are close to the nominal level α = 0.05 (or 0.10), since the T and θ values we chose make H_0 hold.

Figure 3. Q-Q plot of −2 log-likelihood ratios vs. χ²(1) percentiles for the exponential distribution when sample size = 100.


Figures 1, 2, 3 and 4 are Q-Q plots of the −2 log-likelihood ratios for the above two simulations versus the χ²(1) percentiles. At the point 3.84 (or 2.71), if the −2 log-likelihood ratio line is above the dashed 45° line, the rejection probability is greater than 5% (or 10%); otherwise, it is less than 5% (or 10%).

Remark 6.1. The Q-Q plots for the exponentially distributed light-censored simulations (Figures 3(a) and 4(a)) look somewhat more discrete than the others. We ran 5,000 uncensored simulations with the exponential distribution, and the Q-Q plot is similar to Figures 3(a) and 4(a). Since the constraint is F(T) = θ, F̂ and F̃ depend only on the numbers of observations before (n_1) and after (n_2) time T. If n_1 and n_2 are the same in two uncensored samples, then the −2 log-likelihood ratios will be the same. For this reason, the uncensored and light-censored plots look more discrete. The censoring smoothed the plot somewhat and made the chi-square approximation better.

Table 1. Generating normally distributed samples, i = 1, 2, ..., n.

Case                                 X_i               Z_i              Y_i
light-censored (10%-20% censored)    N(μ = 10, σ = 2)  N(μ = 6, σ = 2)  exp(1) + Z_i + 8
medium-censored (20%-40% censored)   N(μ = 10, σ = 2)  N(μ = 7, σ = 2)  exp(1) + Z_i + 5
heavy-censored (40%-60% censored)    N(μ = 10, σ = 2)  N(μ = 9, σ = 2)  exp(1) + Z_i + 2

Figure 4. Q-Q plot of −2 log-likelihood ratios vs. χ²(1) percentiles for the exponential distribution when sample size = 25.


6.2. Example: Confidence Intervals – A Case Study

We compare the confidence intervals obtained via the empirical likelihood ratio as in Section 4 to confidence intervals obtained by directly estimating the asymptotic variance of F̂(T) and then forming the Wald (1 − α) confidence interval

\[
\hat F(T) \pm Z_{(1-\alpha/2)} \sqrt{\widehat{\mathrm{Var}}[\hat F(T)]}, \tag{6.1}
\]

where Z_{(1−α/2)} is the 100(1 − α/2)th percentile of the standard normal distribution. Better confidence intervals may be obtained on a transformed scale. We may use the log-log transformation to obtain

\[
\left[ \hat S(T)^b,\ \hat S(T)^{1/b} \right],
\qquad \text{where } b = \exp\!\left[ \frac{Z_{(1-\alpha/2)} \sqrt{\widehat{\mathrm{Var}}[\hat S(T)]}}{\hat S(T)\, |\log \hat S(T)|} \right], \tag{6.2}
\]

where Ŝ(·) = 1 − F̂(·); or the logit transformation to obtain

\[
\left( \frac{e^a}{1 + e^a},\ \frac{e^b}{1 + e^b} \right), \tag{6.3}
\]

Table 2. The percentage of rejecting H_0: F(T) = θ at α = 0.05 and 0.10.

T    θ     Sample size   Light censored   Medium censored   Heavy censored
10   0.5   n = 100       5.38%/10.36%     5.02%/10.04%      4.66%/12.26%
           n = 25        4.62%/10.76%     4.82%/10.12%      4.92%/10.14%
8    0.16  n = 100       5.04%/10.2%      5.02%/10.4%       5.70%/10.44%
           n = 25        5.12%/12.8%      6.08%/11.52%      5.58%/10.72%

Table 3. Generating exponentially distributed samples, i = 1, 2, ..., n.

Case                                 X_i      Z_i       Y_i
light-censored (10%-20% censored)    exp(2)   exp(15)   exp(1) + Z_i + 1
medium-censored (20%-40% censored)   exp(2)   exp(8)    exp(1) + Z_i + 0.3
heavy-censored (40%-60% censored)    exp(2)   exp(5)    exp(1) + Z_i

Table 4. The percentage of rejecting H_0: F(T) = θ at α = 0.05 and 0.10.

T     θ     Sample size   Light censored   Medium censored   Heavy censored
0.5   0.63  n = 100       4.18%/9.06%      4.50%/9.58%       4.94%/9.98%
            n = 25        4.58%/9.68%      4.70%/9.74%       5.40%/10.2%
0.3   0.45  n = 100       4.72%/8.64%      4.56%/9.34%       4.48%/9.38%
            n = 25        4.52%/10.18%     4.82%/10.18%      5.46%/10.68%


where

\[
a = \log \frac{\hat F(T)}{1 - \hat F(T)}
- \frac{Z_{(1-\alpha/2)}}{\hat F(T)\,(1 - \hat F(T))} \sqrt{\widehat{\mathrm{Var}}[\hat F(T)]}
\quad \text{and} \quad
b = \log \frac{\hat F(T)}{1 - \hat F(T)}
+ \frac{Z_{(1-\alpha/2)}}{\hat F(T)\,(1 - \hat F(T))} \sqrt{\widehat{\mathrm{Var}}[\hat F(T)]}.
\]

It is not easy to obtain a Wald-type confidence interval for the median (or other quantiles), but we can invert the test of F(T) = 0.5 with the estimated variance. The confidence set for the median may therefore be obtained as the set of points T that satisfy the following condition:

\[
\left\{ T : -Z_{(1-\alpha/2)} \le
\frac{\hat F(T) - 0.5}{\sqrt{\widehat{\mathrm{Var}}[\hat F(T)]}}
\le Z_{(1-\alpha/2)} \right\}. \tag{6.4}
\]
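A sketch of the three Wald-type intervals (6.1)-(6.3), given an estimate F̂(T) and its estimated variance, follows. This is illustrative code of ours; each function returns the interval endpoints in increasing order:

```python
import math

def wald_ci(F, var, z=1.96):
    # plain Wald interval (6.1); can fall outside [0, 1]
    half = z * math.sqrt(var)
    return F - half, F + half

def loglog_ci(F, var, z=1.96):
    # log-log transformed interval (6.2), stated for S = 1 - F
    # (Var[S-hat(T)] equals Var[F-hat(T)])
    S = 1.0 - F
    b = math.exp(z * math.sqrt(var) / (S * abs(math.log(S))))
    return tuple(sorted((1.0 - S ** b, 1.0 - S ** (1.0 / b))))

def logit_ci(F, var, z=1.96):
    # logit transformed interval (6.3); always inside (0, 1)
    half = z * math.sqrt(var) / (F * (1.0 - F))
    expit = lambda x: math.exp(x) / (1.0 + math.exp(x))
    center = math.log(F / (1.0 - F))
    return expit(center - half), expit(center + half)
```

The two transformed intervals respect the [0, 1] range of a distribution function, while the plain Wald interval (6.1) need not, as Table 7 illustrates.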

We will use the following doubly censored data as our example to construct confidence

intervals.

Turnbull and Weiss (1978) reported part of a study conducted at the Stanford-Palo Alto Peer Counseling Program (see Hamburg et al. (1975) for details of the study). In this study, 191 California high school boys were asked, "When did you first use marijuana?" The answers were either the exact age (uncensored observations); or "I never used it," giving right-censored observations at the boys' current ages; or "I have used it but cannot recall just when the first time was," giving left-censored observations. The estimated median age at first marijuana use among these high school boys is 14 years. Table 5 shows the results of this study.

Suppose we wish to obtain a 95% confidence interval for the median age of first-time marijuana use. Table 6 shows the 95% confidence intervals for the median age obtained with (4.2) and (6.4). Table 7 shows four different 95% confidence intervals for θ at the median age, obtained with (4.1), (6.1), (6.2) and (6.3).

From Table 7 we can see that the 95% confidence interval obtained by the empirical likelihood ratio (4.1) is narrower than the other three intervals obtained with (6.1), (6.2) and (6.3). The two transformed intervals obtained by (6.2) and (6.3) are wider than (4.1) but close to each other; we do not know which transformation is better. We believe the empirical likelihood ratio approach is preferable because we do not need to know the best transformation, and the simulations in the previous section show the chi-square approximations are pretty accurate. The Wald confidence interval (6.1) includes numbers less than 0 and greater than 1; these are not reasonable confidence limits, since a distribution function must lie between 0 and 1. Therefore, in this example we conclude that the empirical likelihood ratio approach works better than Wald's approach.


6.3. Software Used

The simulations were carried out using Splus 3.4 for Unix on HP workstations. The code

we used also work in R (Gentleman and Ihaka, 1996). The R code have already been

packaged and posted to the R homepage http://cran.r-project.org as contributed package

dblcens (Zhou et al., 2001).

7. Discussion

In this paper, we proposed a modified self-consistent algorithm under constraints of the type F(T) = θ. We used it to test hypotheses and construct confidence intervals for survival probabilities and quantiles via empirical likelihood ratios for doubly censored data. Simulations show the resulting tests have fairly accurate size under H_0. This method does

Table 5. Marijuana use in high school boys.

Age    Number of exact    Number who have yet    Number who have started
       observations       to smoke marijuana     smoking at an earlier age
10          4                    0                        0
11         12                    0                        0
12         19                    2                        0
13         24                   15                        1
14         20                   24                        2
15         13                   18                        3
16          3                   14                        2
17          1                    6                        3
18          0                    0                        1
>18         4                    0                        0
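Table 5 can be encoded directly as data; the following snippet (ours, with the ">18" row coded as 19 purely for illustration) checks that the counts add up to the 191 boys surveyed:

```python
# age: (exact, not yet smoked [right censored], started earlier [left censored])
marijuana = {
    10: (4, 0, 0), 11: (12, 0, 0), 12: (19, 2, 0), 13: (24, 15, 1),
    14: (20, 24, 2), 15: (13, 18, 3), 16: (3, 14, 2), 17: (1, 6, 3),
    18: (0, 0, 1), 19: (4, 0, 0),   # ">18" coded as 19 for illustration
}
n_exact = sum(v[0] for v in marijuana.values())
n_right = sum(v[1] for v in marijuana.values())
n_left = sum(v[2] for v in marijuana.values())
total = n_exact + n_right + n_left
```

The sample is thus moderately censored: 100 exact ages, 79 right-censored and 12 left-censored observations.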

Table 6. 95% confidence intervals for the median age (14).

Method                                                              Confidence interval
{T : −2 log R(H_0: F(T) = 0.5) ≤ 3.84}  (4.2)                       (11.00000, 16.99991)
{T : −1.96 ≤ (F̂(T) − 0.5)/√(V̂ar[F̂(T)]) ≤ 1.96}  (6.4)            (10, 19)

Table 7. 95% confidence intervals for F(14).

Method                                              Confidence interval
{θ : −2 log R(H_0: F(14) = θ) ≤ 3.84}  (4.1)        (0.2798736, 0.7195857)
[1 − Ŝ(14)^b, 1 − Ŝ(14)^{1/b}]  (6.2)              (0.0378062, 0.9999983)
(e^a/(1 + e^a), e^b/(1 + e^b))  (6.3)               (0.01719094, 0.9842504)
F̂(14) ± 1.96 √(V̂ar[F̂(14)])  (6.1)                (−0.533258, 1.511003)


not give a simultaneous confidence band for the whole distribution function. We also illustrated how to estimate the influence function of the NPMLE F̂. The accuracy of this estimator deserves more study.

Appendix A

A.1. Derivation of the Self-consistent Equation Before Time T

(a) The log-likelihood function before time T is

\[
\begin{aligned}
\log L_1 &= \sum_{\delta_i=1,\, i \le n_1} \log(w_i)
+ \sum_{\delta_i=0,\, i \le n_1} \log[1 - F(t_i)]
+ \sum_{\delta_i=2,\, i \le n_1} \log[F(t_i-)] \\
&= \sum_{\delta_i=1,\, i \le n_1} \log(w_i)
+ \sum_{\delta_i=0,\, i \le n_1} \log\Bigl[(1-\theta) + \sum_{t_j > t_i,\, j \le n_1} w_j\Bigr]
+ \sum_{\delta_i=2,\, i \le n_1} \log\Bigl(\sum_{t_j < t_i,\, j \le n_1} w_j\Bigr).
\end{aligned}
\]

To facilitate taking the derivative, we substitute w_i(λ) as in (2.3):

\[
\begin{aligned}
\log L_1 &= \sum_{\delta_i=1,\, i \le n_1} \log\Bigl[\frac{\Delta F(t_i)}{1 + \lambda h(t_i)} \cdot \frac{\theta}{C(\lambda)}\Bigr]
+ \sum_{\delta_i=0,\, i \le n_1} \log\Bigl[(1-\theta) + \sum_{t_j > t_i,\, j \le n_1} \frac{\Delta F(t_j)}{1 + \lambda h(t_j)} \cdot \frac{\theta}{C(\lambda)}\Bigr] \\
&\quad + \sum_{\delta_i=2,\, i \le n_1} \log\Bigl[\sum_{t_j < t_i,\, j \le n_1} \frac{\Delta F(t_j)}{1 + \lambda h(t_j)} \cdot \frac{\theta}{C(\lambda)}\Bigr] \\
&= n_1 \log\frac{\theta}{C(\lambda)}
+ \sum_{\delta_i=1,\, i \le n_1} \log\frac{\Delta F(t_i)}{1 + \lambda h(t_i)}
+ \sum_{\delta_i=0,\, i \le n_1} \log\Bigl[\frac{(1-\theta)\,C(\lambda)}{\theta} + \sum_{t_j > t_i,\, j \le n_1} \frac{\Delta F(t_j)}{1 + \lambda h(t_j)}\Bigr] \\
&\quad + \sum_{\delta_i=2,\, i \le n_1} \log\Bigl[\sum_{t_j < t_i,\, j \le n_1} \frac{\Delta F(t_j)}{1 + \lambda h(t_j)}\Bigr].
\end{aligned}
\]

Now we are ready to take the derivative with respect to λ:

\[
\frac{\partial \log L_1}{\partial \lambda}
= -n_1 \frac{C'(\lambda)}{C(\lambda)}
- \sum_{\delta_i=1,\, i \le n_1} \frac{h(t_i)}{1 + \lambda h(t_i)}
+ \sum_{\delta_i=0,\, i \le n_1}
\frac{\frac{1-\theta}{\theta} C'(\lambda) - \sum_{t_j > t_i} \frac{\Delta F(t_j) h(t_j)}{[1 + \lambda h(t_j)]^2}}
{\frac{1-\theta}{\theta} C(\lambda) + \sum_{t_j > t_i} \frac{\Delta F(t_j)}{1 + \lambda h(t_j)}}
- \sum_{\delta_i=2,\, i \le n_1}
\frac{\sum_{t_j < t_i} \frac{\Delta F(t_j) h(t_j)}{[1 + \lambda h(t_j)]^2}}
{\sum_{t_j < t_i} \frac{\Delta F(t_j)}{1 + \lambda h(t_j)}}.
\]

If we set λ = 0, the above can be simplified to

\[
\frac{\partial \log L_1}{\partial \lambda}\Big|_{\lambda=0}
= \frac{n_1}{\theta} \sum_{i=1}^{n_1} \Delta F(t_i) h(t_i)
- \sum_{\delta_i=1,\, i \le n_1} h(t_i)
- \sum_{\delta_i=0,\, i \le n_1}
\frac{\frac{1-\theta}{\theta} \sum_{i=1}^{n_1} \Delta F(t_i) h(t_i)}{(1-\theta) + \sum_{t_j > t_i} \Delta F(t_j)}
- \sum_{\delta_i=0,\, i \le n_1}
\frac{\sum_{t_j > t_i} \Delta F(t_j) h(t_j)}{(1-\theta) + \sum_{t_j > t_i} \Delta F(t_j)}
- \sum_{\delta_i=2,\, i \le n_1}
\frac{\sum_{t_j < t_i} \Delta F(t_j) h(t_j)}{\sum_{t_j < t_i} \Delta F(t_j)}.
\]

This derivative must be zero since F is the NPMLE. Thus we get

\[
\sum_{i=1}^{n_1} \Delta F(t_i) h(t_i)
= \frac{\theta}{n_1} \Biggl\{
\sum_{\delta_i=1,\, i \le n_1} h(t_i)
+ \sum_{\delta_i=0,\, i \le n_1}
\frac{\frac{1-\theta}{\theta} \sum_{i=1}^{n_1} \Delta F(t_i) h(t_i)}{1 - F(t_i)}
+ \sum_{\delta_i=0,\, i \le n_1}
\frac{\sum_{t_j > t_i} \Delta F(t_j) h(t_j)}{1 - F(t_i)}
+ \sum_{\delta_i=2,\, i \le n_1}
\frac{\sum_{t_j < t_i} \Delta F(t_j) h(t_j)}{F(t_i-)}
\Biggr\}. \tag{A.1}
\]

If we take h(t_i) = I[t_i ≤ u] for u ≤ T, then (A.1) becomes equation (2.5).


A.2. Derivation of the Self-consistent Equation After Time T

Similarly we can obtain the constrained self-consistent equation for F when u > T.

(b) The log-likelihood function after time T is

\[
\begin{aligned}
\log L_2 &= \sum_{\eta_i=1,\, i \le n_2} \log(\nu_i)
+ \sum_{\eta_i=0,\, i \le n_2} \log[1 - F(s_i)]
+ \sum_{\eta_i=2,\, i \le n_2} \log[F(s_i-)] \\
&= \sum_{\eta_i=1,\, i \le n_2} \log(\nu_i)
+ \sum_{\eta_i=0,\, i \le n_2} \log\Bigl(\sum_{s_j > s_i,\, j \le n_2} \nu_j\Bigr)
+ \sum_{\eta_i=2,\, i \le n_2} \log\Bigl(\theta + \sum_{s_j < s_i,\, j \le n_2} \nu_j\Bigr).
\end{aligned}
\]

Again substituting ν_i(λ) as in (2.4),

\[
\begin{aligned}
\log L_2 &= \sum_{\eta_i=1,\, i \le n_2} \log\Bigl[\frac{\Delta F(s_i)}{1 + \lambda h(s_i)} \cdot \frac{1-\theta}{D(\lambda)}\Bigr]
+ \sum_{\eta_i=0,\, i \le n_2} \log\Bigl[\sum_{s_j > s_i} \frac{\Delta F(s_j)}{1 + \lambda h(s_j)} \cdot \frac{1-\theta}{D(\lambda)}\Bigr] \\
&\quad + \sum_{\eta_i=2,\, i \le n_2} \log\Bigl[\theta + \sum_{s_j < s_i} \frac{\Delta F(s_j)}{1 + \lambda h(s_j)} \cdot \frac{1-\theta}{D(\lambda)}\Bigr] \\
&= n_2 \log\frac{1-\theta}{D(\lambda)}
+ \sum_{\eta_i=1,\, i \le n_2} \log\frac{\Delta F(s_i)}{1 + \lambda h(s_i)}
+ \sum_{\eta_i=0,\, i \le n_2} \log\Bigl[\sum_{s_j > s_i} \frac{\Delta F(s_j)}{1 + \lambda h(s_j)}\Bigr] \\
&\quad + \sum_{\eta_i=2,\, i \le n_2} \log\Bigl[\frac{\theta\, D(\lambda)}{1-\theta} + \sum_{s_j < s_i} \frac{\Delta F(s_j)}{1 + \lambda h(s_j)}\Bigr].
\end{aligned}
\]

Taking the derivative with respect to $\lambda$ gives

\[
\begin{aligned}
\frac{\partial \log L_2}{\partial\lambda}
&= -\,n_2\frac{D'(\lambda)}{D(\lambda)}
- \sum_{\delta_i=1,\, i\le n_2} \frac{h(s_i)}{1+\lambda h(s_i)}
- \sum_{\delta_i=0,\, i\le n_2}
\frac{\sum_{s_j>s_i} \dfrac{\Delta F(s_j)h(s_j)}{[1+\lambda h(s_j)]^2}}
{\sum_{s_j>s_i} \dfrac{\Delta F(s_j)}{1+\lambda h(s_j)}} \\
&\quad + \sum_{\delta_i=2,\, i\le n_2}
\frac{\dfrac{\theta}{1-\theta}D'(\lambda) - \sum_{s_j<s_i} \dfrac{\Delta F(s_j)h(s_j)}{[1+\lambda h(s_j)]^2}}
{\dfrac{\theta}{1-\theta}D(\lambda) + \sum_{s_j<s_i} \dfrac{\Delta F(s_j)}{1+\lambda h(s_j)}}.
\end{aligned}
\]

Setting $\lambda = 0$ (so that $D(0) = 1-\theta$ and $D'(0) = -\sum_{i=1}^{n_2}\Delta F(s_i)h(s_i)$) and equating the derivative to zero, we obtain

\[
\begin{aligned}
\sum_{i=1}^{n_2}\Delta F(s_i)h(s_i)
&= \frac{1-\theta}{n_2}\Bigg\{
\sum_{\delta_i=1,\, i\le n_2} h(s_i)
+ \sum_{\delta_i=0,\, i\le n_2}
\frac{\sum_{s_j>s_i}\Delta F(s_j)h(s_j)}{\sum_{s_j>s_i}\Delta F(s_j)} \\
&\qquad + \sum_{\delta_i=2,\, i\le n_2}
\frac{\dfrac{\theta}{1-\theta}\sum_{i=1}^{n_2}\Delta F(s_i)h(s_i)}{\theta + \sum_{s_j<s_i}\Delta F(s_j)}
+ \sum_{\delta_i=2,\, i\le n_2}
\frac{\sum_{s_j<s_i}\Delta F(s_j)h(s_j)}{\theta + \sum_{s_j<s_i}\Delta F(s_j)}
\Bigg\}.
\end{aligned}
\]

If we take $h(s_i) = I[s_i \le u]$ for $u > T$, the above equation becomes (2.6).
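A key feature of the substitution $v_i(\lambda)$ is that it preserves the constraint by construction: the tilted masses after $T$ sum to $1-\theta$ for every $\lambda$. A minimal numerical sketch (our own illustrative Python, with arbitrary synthetic masses and tilt, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(3)
m, theta = 8, 0.4
dF = rng.uniform(0.1, 1.0, m)
dF *= (1 - theta) / dF.sum()        # masses after time T sum to 1 - theta
h = rng.uniform(0.0, 1.0, m)        # any bounded tilt with 1 + lam*h > 0
assert abs(np.sum(dF) - (1 - theta)) < 1e-12    # D(0) = 1 - theta
for lam in (-0.5, 0.0, 0.3, 1.0):
    D = np.sum(dF / (1 + lam * h))              # D(lambda)
    v = dF / (1 + lam * h) * (1 - theta) / D    # tilted masses v_i(lambda)
    assert abs(v.sum() - (1 - theta)) < 1e-12   # constraint holds for all lam
```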


Appendix B

Proof of Theorem 2.2: Taking the second derivative of the log likelihood function $\log L_1$ with respect to $\lambda$, then setting $\lambda = 0$ and $h(t_i) = I[t_i \le u]$, we can simplify the derivative as follows:

\[
\begin{aligned}
\left.\frac{\partial^2 \log L_1}{\partial\lambda^2}\right|_{\lambda=0}
&= -\frac{2n_1}{\theta}F(u) + \frac{n_1}{\theta^2}[F(u)]^2
+ \sum_{\delta_i=1,\, i\le n_1} I[t_i\le u]
+ 2\sum_{\delta_i=0,\, i\le n_1} \frac{\dfrac{1-\theta}{\theta}F(u)}{1-F(t_i)} \\
&\quad + 2\sum_{\delta_i=0,\, i\le n_1} \frac{[F(u)-F(t_i)]\,I[t_i\le u]}{1-F(t_i)}
- \sum_{\delta_i=0,\, i\le n_1}
\frac{\Big\{\dfrac{1-\theta}{\theta}F(u) + [F(u)-F(t_i)]\,I[t_i\le u]\Big\}^2}{[1-F(t_i)]^2} \\
&\quad + 2\sum_{\delta_i=2,\, i\le n_1} \frac{F(\min(u,t_i-))}{F(t_i-)}
- \sum_{\delta_i=2,\, i\le n_1} \left[\frac{F(\min(u,t_i-))}{F(t_i-)}\right]^2.
\end{aligned}
\]

Substituting F(u) as in (2.5),

\[
\begin{aligned}
\left.\frac{\partial^2 \log L_1}{\partial\lambda^2}\right|_{\lambda=0}
&= -\sum_{\delta_i=1,\, i\le n_1} I[t_i\le u] + \frac{n_1}{\theta^2}[F(u)]^2
- \sum_{\delta_i=0,\, i\le n_1}
\frac{\Big\{\dfrac{1-\theta}{\theta}F(u) + [F(u)-F(t_i)]\,I[t_i\le u]\Big\}^2}{[1-F(t_i)]^2} \\
&\quad - \sum_{\delta_i=2,\, i\le n_1} \left[\frac{F(\min(u,t_i-))}{F(t_i-)}\right]^2 \\
&= -\,n_1\Bigg\{\frac{1}{n_1}\Bigg[
\sum_{\delta_i=1,\, i\le n_1} I^2[t_i\le u]
+ \sum_{\delta_i=0,\, i\le n_1}
\frac{\Big\{\dfrac{1-\theta}{\theta}F(u) + [F(u)-F(t_i)]\,I[t_i\le u]\Big\}^2}{[1-F(t_i)]^2} \\
&\qquad\qquad + \sum_{\delta_i=2,\, i\le n_1} \left(\frac{F(\min(u,t_i-))}{F(t_i-)}\right)^2
\Bigg] - \left(\frac{F(u)}{\theta}\right)^2\Bigg\} \\
&= -\,n_1\Bigg\{\frac{1}{n_1}\sum_{i=1}^{n_1} a_i^2 - \bigg[\frac{1}{n_1}\sum_{i=1}^{n_1} a_i\bigg]^2\Bigg\},
\end{aligned}
\]

where
\[
\sum_{i=1}^{n_1} a_i^2 = \sum_{\delta_i=1,\, i\le n_1} I^2[t_i\le u]
+ \sum_{\delta_i=0,\, i\le n_1}
\frac{\Big\{\dfrac{1-\theta}{\theta}F(u) + [F(u)-F(t_i)]\,I[t_i\le u]\Big\}^2}{[1-F(t_i)]^2}
+ \sum_{\delta_i=2,\, i\le n_1} \left[\frac{F(\min(u,t_i-))}{F(t_i-)}\right]^2
\]
and
\[
\sum_{i=1}^{n_1} a_i = \sum_{\delta_i=1,\, i\le n_1} I[t_i\le u]
+ \sum_{\delta_i=0,\, i\le n_1}
\frac{\dfrac{1-\theta}{\theta}F(u) + [F(u)-F(t_i)]\,I[t_i\le u]}{1-F(t_i)}
+ \sum_{\delta_i=2,\, i\le n_1} \frac{F(\min(u,t_i-))}{F(t_i-)}.
\]

By (2.5), $\frac{1}{n_1}\sum_{i=1}^{n_1} a_i = F(u)/\theta$, and by the Cauchy-Schwarz inequality $\frac{1}{n_1}\sum a_i^2 \ge \big[\frac{1}{n_1}\sum a_i\big]^2$, so the second derivative of $\log L_1$ is non-positive. The second derivative of $\log L_2$ is also non-positive by similar steps. □
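The final step rests on the elementary fact that $\frac{1}{n}\sum a_i^2 - \big[\frac{1}{n}\sum a_i\big]^2$ is an empirical variance, hence non-negative. A quick numerical confirmation (our own illustrative Python, with arbitrary random $a_i$):

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(1000):
    a = rng.normal(size=rng.integers(2, 50))
    # (1/n) sum a_i^2 - ((1/n) sum a_i)^2 is the empirical variance of the a_i
    gap = np.mean(a**2) - np.mean(a)**2
    assert gap >= -1e-12   # non-negative up to floating-point rounding
```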

Appendix C

Derivation of the Self-consistent Equations Between Times $T_1$ and $T_2$

\[
\begin{aligned}
\log L_2 &= \sum_{\delta_i=1,\, i\le n_2} \log z_i
+ \sum_{\delta_i=0,\, i\le n_2} \log[1-F(x_i)]
+ \sum_{\delta_i=2,\, i\le n_2} \log F(x_i-) \\
&= \sum_{\delta_i=1,\, i\le n_2} \log z_i
+ \sum_{\delta_i=0,\, i\le n_2} \log[1-F(T_2)+F(T_2)-F(x_i)]
+ \sum_{\delta_i=2,\, i\le n_2} \log[F(x_i-)-F(T_1)+F(T_1)] \\
&= \sum_{\delta_i=1,\, i\le n_2} \log z_i
+ \sum_{\delta_i=0,\, i\le n_2} \log\Big[(1-\theta_2) + \sum_{x_j>x_i,\, j\le n_2} z_j\Big]
+ \sum_{\delta_i=2,\, i\le n_2} \log\Big[\sum_{x_j<x_i,\, j\le n_2} z_j + \theta_1\Big].
\end{aligned}
\]

Substituting $z_i(\lambda)$ into the above equation, we get

\[
\begin{aligned}
\log L_2 &= \sum_{\delta_i=1,\, i\le n_2} \log\!\left[\frac{\Delta F(x_i)}{1+\lambda h(x_i)}\,\frac{\theta_2-\theta_1}{B(\lambda)}\right]
+ \sum_{\delta_i=0,\, i\le n_2} \log\!\left[(1-\theta_2) + \sum_{x_j>x_i,\, j\le n_2} \frac{\Delta F(x_j)}{1+\lambda h(x_j)}\,\frac{\theta_2-\theta_1}{B(\lambda)}\right] \\
&\quad + \sum_{\delta_i=2,\, i\le n_2} \log\!\left[\sum_{x_j<x_i,\, j\le n_2} \frac{\Delta F(x_j)}{1+\lambda h(x_j)}\,\frac{\theta_2-\theta_1}{B(\lambda)} + \theta_1\right] \\
&= n_2\log\frac{\theta_2-\theta_1}{B(\lambda)}
+ \sum_{\delta_i=1,\, i\le n_2} \log\frac{\Delta F(x_i)}{1+\lambda h(x_i)}
+ \sum_{\delta_i=0,\, i\le n_2} \log\!\left[\frac{1-\theta_2}{\theta_2-\theta_1}B(\lambda) + \sum_{x_j>x_i,\, j\le n_2} \frac{\Delta F(x_j)}{1+\lambda h(x_j)}\right] \\
&\quad + \sum_{\delta_i=2,\, i\le n_2} \log\!\left[\sum_{x_j<x_i,\, j\le n_2} \frac{\Delta F(x_j)}{1+\lambda h(x_j)} + \frac{\theta_1}{\theta_2-\theta_1}B(\lambda)\right].
\end{aligned}
\]

Taking the derivative with respect to $\lambda$,

\[
\begin{aligned}
\frac{\partial \log L_2}{\partial\lambda}
&= -\,n_2\frac{B'(\lambda)}{B(\lambda)}
- \sum_{\delta_i=1,\, i\le n_2} \frac{h(x_i)}{1+\lambda h(x_i)}
+ \sum_{\delta_i=0,\, i\le n_2}
\frac{\dfrac{1-\theta_2}{\theta_2-\theta_1}B'(\lambda) - \sum_{x_j>x_i} \dfrac{\Delta F(x_j)h(x_j)}{[1+\lambda h(x_j)]^2}}
{\dfrac{1-\theta_2}{\theta_2-\theta_1}B(\lambda) + \sum_{x_j>x_i} \dfrac{\Delta F(x_j)}{1+\lambda h(x_j)}} \\
&\quad + \sum_{\delta_i=2,\, i\le n_2}
\frac{-\sum_{x_j<x_i,\, j\le n_2} \dfrac{\Delta F(x_j)h(x_j)}{[1+\lambda h(x_j)]^2} + \dfrac{\theta_1}{\theta_2-\theta_1}B'(\lambda)}
{\sum_{x_j<x_i,\, j\le n_2} \dfrac{\Delta F(x_j)}{1+\lambda h(x_j)} + \dfrac{\theta_1}{\theta_2-\theta_1}B(\lambda)}.
\end{aligned}
\]

If we set $\lambda = 0$, with $B(0) = \theta_2-\theta_1$ and $B'(0) = -\sum_{i=1}^{n_2}\Delta F(x_i)h(x_i)$, the above equation can be simplified to

\[
\begin{aligned}
\left.\frac{\partial \log L_2}{\partial\lambda}\right|_{\lambda=0}
&= \frac{n_2}{\theta_2-\theta_1}\sum_{i=1}^{n_2}\Delta F(x_i)h(x_i)
- \sum_{\delta_i=1,\, i\le n_2} h(x_i) \\
&\quad - \sum_{\delta_i=0,\, i\le n_2}
\frac{\dfrac{1-\theta_2}{\theta_2-\theta_1}\sum_{i=1}^{n_2}\Delta F(x_i)h(x_i) + \sum_{x_j>x_i}\Delta F(x_j)h(x_j)}
{(1-\theta_2) + \sum_{x_j>x_i}\Delta F(x_j)} \\
&\quad - \sum_{\delta_i=2,\, i\le n_2}
\frac{\sum_{x_j<x_i}\Delta F(x_j)h(x_j) + \dfrac{\theta_1}{\theta_2-\theta_1}\sum_{i=1}^{n_2}\Delta F(x_i)h(x_i)}
{\theta_1 + \sum_{x_j<x_i}\Delta F(x_j)}.
\end{aligned}
\]

Since the derivative must equal zero, we rearrange the terms and obtain

\[
\begin{aligned}
\sum_{i=1}^{n_2}\Delta F(x_i)h(x_i)
&= \frac{\theta_2-\theta_1}{n_2}\Bigg\{
\sum_{\delta_i=1,\, i\le n_2} h(x_i)
+ \sum_{\delta_i=0,\, i\le n_2}
\frac{\dfrac{1-\theta_2}{\theta_2-\theta_1}\sum_{i=1}^{n_2}\Delta F(x_i)h(x_i) + \sum_{x_j>x_i}\Delta F(x_j)h(x_j)}
{(1-\theta_2) + \sum_{x_j>x_i}\Delta F(x_j)} \\
&\qquad + \sum_{\delta_i=2,\, i\le n_2}
\frac{\sum_{x_j<x_i}\Delta F(x_j)h(x_j) + \dfrac{\theta_1}{\theta_2-\theta_1}\sum_{i=1}^{n_2}\Delta F(x_i)h(x_i)}
{\theta_1 + \sum_{x_j<x_i}\Delta F(x_j)}
\Bigg\}.
\end{aligned}
\]


Finally, we take $h(x_i) = I[x_i \le u]$ for $T_1 \le u \le T_2$, and the above equation can be rewritten as (3.2).

References

M. N. Chang and G. L. Yang, ‘‘Strong consistency of a non-parametric estimator of the survival function with

doubly censored data,’’ Ann. Statist. vol. 15, pp. 1536–1547, 1987.

M. N. Chang, ‘‘Weak convergence of a self-consistent estimator of the survival function with doubly censored

data,’’ Ann. Statist. vol. 18, pp. 391–404, 1990.

A. P. Dempster, N. M. Laird, and D. B. Rubin, ‘‘Maximum likelihood from incomplete data via the EM algorithm (with discussion),’’ J. Roy. Statist. Soc., Ser. B, vol. 39, pp. 1–38, 1977.

B. Efron, ‘‘The two sample problem with censored data,’’ Proc. Fifth Berkeley Symp. Math. Statist. Probab. vol. 4, pp. 831–883, 1967.

R. Gentleman and R. Ihaka, ‘‘R: A language for data analysis and graphics,’’ J. of Computational and Graphical Statistics vol. 5, pp. 299–314, 1996.

R. Gill, ‘‘Non- and semi-parametric maximum likelihood estimators and the von Mises method (I),’’ Scand. J. Statist. vol. 16, pp. 97–128, 1989.

B. A. Hamburg, H. C. Kraemer, and W. Jahnke, ‘‘A hierarchy of drug use in adolescence: behavioral and attitudinal correlates of substantial drug use,’’ American Journal of Psychiatry vol. 132, pp. 1155–1163, 1975.

E. Kaplan and P. Meier, ‘‘Non-parametric estimation from incomplete observations,’’ J. Amer. Statist. Assoc. vol. 53, pp. 457–481, 1958.

S. Murphy and A. W. van der Vaart, ‘‘Semi-parametric likelihood ratio inference,’’ Ann. Statist. vol. 25, pp. 1471–1509, 1997.

P. A. Mykland and J. J. Ren, ‘‘Algorithms for Computing Self-Consistent and Maximum Likelihood Estimators

with Doubly Censored Data,’’ Ann. Statist. vol. 24, pp. 1740–1764, 1996.

A. Owen, ‘‘Empirical likelihood ratio confidence intervals for a single functional,’’ Biometrika vol. 75,

pp. 237–249, 1988.

A. Owen, Empirical Likelihood, Chapman & Hall: London, 2001.

W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes in C: The Art of Scientific Computing,

Cambridge Univ. Press, Chapter 18.1, pp. 791–794, 1993.

B. W. Turnbull, ‘‘Non-parametric estimation of a survivorship function with doubly censored data,’’ J. Amer.

Statist. Assoc. vol. 69, pp. 169–173, 1974.

B. W. Turnbull, ‘‘The empirical distribution function with arbitrarily grouped, censored and truncated data,’’ J. Roy. Statist. Soc., Ser. B, vol. 38, pp. 290–295, 1976.

B. W. Turnbull and L. Weiss, ‘‘A likelihood ratio statistic for testing goodness of fit with randomly censored data,’’ Biometrics vol. 34, pp. 367–375, 1978.

W. Y. Tsai and J. Crowley, ‘‘A large sample study of generalized maximum likelihood estimator from incomplete

data via self-consistency,’’ Ann. Statist. vol. 13, no. 4, pp. 1317–1334, 1985.

Y. Zhan and J. Wellner, ‘‘A hybrid algorithm for computation of the NPMLE from censored data,’’ J. Amer.

Statist. Assoc. vol. 92, pp. 945–959, 1999.

M. Zhou, http://lib.stat.cmu.edu/s/d009newr, 1995.

M. Zhou, L. Lee, and K. Chen, ‘‘dblcens,’’ contributed R package, http://cran.r-project.org, 2001.
