Non-parametric Hypothesis Testing and Confidence Intervals with Doubly Censored Data
KUN CHEN [email protected]
Department of Statistics, University of Kentucky, Lexington, KY 40506-0027, USA
MAI ZHOU [email protected]
Department of Statistics, University of Kentucky, Lexington, KY 40506-0027, USA
Received July 20, 2000; Revised April 22, 2002; Accepted July 1, 2002
Abstract. The non-parametric maximum likelihood estimator (NPMLE) of the distribution function with doubly
censored data can be computed using the self-consistent algorithm (Turnbull, 1974). We extend the self-consistent
algorithm to include a constraint on the NPMLE. We then show how to construct confidence intervals and test
hypotheses based on the NPMLE via the empirical likelihood ratio. Finally, we present some numerical
comparisons of the performance of the above method with another method that makes use of the influence
functions.
Keywords: modified self-consistent algorithm, empirical likelihood ratio, influence function, EM algorithm
1. Introduction
When data are doubly censored, the non-parametric maximum likelihood estimator
(NPMLE) of the distribution function has no explicit form and must be computed by
some iterative method (Turnbull, 1976; Chang and Yang, 1987; Zhou, 1995; Mykland and
Ren, 1996; Zhan and Wellner, 1999). The estimation of the asymptotic variance of the
NPMLE is even more involved (Chang, 1990).
The self-consistency algorithm is a general way of computing non-parametric distribu-
tional estimators when data are not completely observed (Efron, 1967; Turnbull, 1974;
Tsai and Crowley, 1985). In particular, the NPMLE of a distribution function based on an
i.i.d. sample must satisfy the self-consistent equation (Gill, 1989). It is a special case of the
EM algorithm (Dempster, Laird and Rubin, 1977) when the parameter is the distribution
function itself.
In this paper we extend the self-consistent algorithm to compute the NPMLE under the
constraint F(T) = θ for doubly censored data. This constrained estimator is useful in the
construction of the empirical likelihood ratio. As an application, we show how this in turn
allows us to find confidence intervals and test hypotheses for F(T) with doubly censored data.
Another approach to constructing confidence intervals based on the NPMLE with doubly
censored data is to estimate the influence function/variance of the NPMLE. But the
influence functions are only implicitly defined and are not easy to estimate, so this
approach may not be very practical for larger samples. We will compare the two approaches
on one example in this paper.
Lifetime Data Analysis, 9, 71–91, 2003. © 2003 Kluwer Academic Publishers. Printed in The Netherlands.
Doubly censored data can occur when observations are subject to both right and left
censoring, i.e., each patient is only watched within a window of observation time;
outside the window we only know whether the event time lies below or above it.
We end this section with a formal definition of the relevant notation and basic
assumptions.
Let X1, ..., Xn be positive random variables denoting the sample of lifetimes, which are
independent and identically distributed with a continuous distribution F0. The censoring
mechanism is such that Xi is observable if and only if it lies inside the interval [Zi, Yi].
The Zi and Yi are positive random variables with distribution functions G_L0 and G_R0
respectively, and Zi ≤ Yi with probability 1. If Xi is not inside [Zi, Yi], the exact value
of Xi cannot be determined. We only know whether Xi is less than Zi or greater than Yi,
and we observe Zi or Yi correspondingly.
The variable Xi is said to be left censored if Xi < Zi and right censored if Xi > Yi. The
available information may be expressed by a pair of random variables (Ti, δi), where
Ti = max(min(Xi, Yi), Zi) and

    δi = 1 if Zi ≤ Xi ≤ Yi;   δi = 0 if Xi > Yi;   δi = 2 if Xi < Zi;   i = 1, 2, ..., n.   (1.1)
The modified (constrained) self-consistent equation for doubly censored data is derived
in Section 2. We generalize the self-consistent algorithm to deal with several constraints in
Section 3. We then explain how the constrained self-consistent estimator may be used to
obtain confidence intervals via the empirical likelihood ratio in Section 4. The influence
function is discussed in Section 5. We carry out simulations and apply our algorithm to
some doubly censored data in Section 6.
2. Modified Self-Consistent Equation Under a Constraint
Based on data (1.1), suppose for a given T and θ we are interested in testing the hypothesis

    H0: F(T) = θ   vs.   H1: F(T) ≠ θ.   (2.1)
As we will see later, a key step to accomplish this is to compute the NPMLE of F(t)
under H0 from the data (1.1). We shall focus on computing the constrained NPMLE in this
section.
First let us write down the empirical log-likelihood function log L(F) for doubly
censored data. The log-likelihood function involves all n observations, but we can
decompose it into two parts: observations before and after T. We define wi ≥ 0,
i = 1, 2, ..., n1, with Σ_{i=1}^{n1} wi = θ, corresponding to the jumps of F before T;
similarly vi ≥ 0, i = 1, 2, ..., n2, with Σ_{i=1}^{n2} vi = 1 − θ, for the jumps after T.
Then the log-likelihood function can be written as

    log L(F) = log L1 + log L2
             = Σ_{δi=1, i≤n1} log(wi) + Σ_{δi=0, i≤n1} log[1 − F(ti)] + Σ_{δi=2, i≤n1} log[F(ti−)]
             + Σ_{δi=1, i≤n2} log(vi) + Σ_{δi=0, i≤n2} log[1 − F(si)] + Σ_{δi=2, i≤n2} log[F(si−)],   (2.2)

where log L1 denotes the log-likelihood for the n1 observations before T and log L2 the
log-likelihood for the n2 observations after T (n1 + n2 = n). Here ti are the observed
times before T, si are the observed times after T, and δi is the censoring indicator.
Notice that because F(T) = θ, we have

    1 − F(ti) = 1 − F(T) + F(T) − F(ti) = 1 − θ + w(i+1) + ... + w(n1),

and F(ti−) = w1 + w2 + ... + w(i−1). The first part of the above likelihood (2.2), log L1,
only involves wi, the jumps of F before time T; similarly the second part, log L2, only
involves vi, the jumps of F after time T. Therefore we can maximize the whole likelihood
in two independent steps: maximize log L1 with respect to the wi and then maximize
log L2 with respect to the vi.
If a distribution function F maximizes the log likelihood under H0, then the derivative of
the log likelihood at this F must equal 0. To facilitate taking this derivative (under H0),
we define a directional change of the wi in a direction h(·), parameterized by λ. Since the
wi must sum to θ and the vi must sum to 1 − θ, we define them as follows:
For the jumps before time T, we define

    wi = wi(λ) = [ΔF(ti)/(1 + λh(ti))] · θ/C(λ),   with   C(λ) = Σ_{i=1}^{n1} ΔF(ti)/(1 + λh(ti)).   (2.3)

Clearly C(0) = θ. For the jumps after time T, we define

    vi = vi(λ) = [ΔF(si)/(1 + λh(si))] · (1 − θ)/D(λ),   with   D(λ) = Σ_{i=1}^{n2} ΔF(si)/(1 + λh(si)).   (2.4)

Clearly D(0) = 1 − θ.
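The renormalization in (2.3) and (2.4) makes the tilted jumps sum to the required total for every λ, not only at λ = 0. A minimal numerical check of this property (a sketch; the function name and the numbers are our own, not from the paper):

```python
import numpy as np

def tilted_jumps(dF, h, lam, total):
    """Tilted jumps in the spirit of (2.3)/(2.4):
    w_i(lam) = [dF_i/(1 + lam*h_i)] * total/C(lam),
    with C(lam) = sum_i dF_i/(1 + lam*h_i), so the
    w_i(lam) sum to `total` for every lam."""
    raw = dF / (1.0 + lam * h)       # dF_i / (1 + lam*h_i)
    return raw * total / raw.sum()   # renormalize to the required total

# jumps before T summing to theta = 0.5, and an arbitrary direction h
dF = np.array([0.1, 0.2, 0.2])
h = np.array([1.0, 0.0, 2.0])
w = tilted_jumps(dF, h, lam=0.3, total=0.5)
```

At λ = 0 the tilted jumps reduce to the original jumps, which is consistent with C(0) = θ and D(0) = 1 − θ.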
Remark 2.1. The form of (2.3) and (2.4) is inspired by Owen's (1988) results. We
could also use the definition

    wi*(λ) = ΔF(ti)[1 + λh(ti)] · θ/C*(λ),   with   C*(λ) = Σ_{i: ti<T} ΔF(ti)[1 + λh(ti)],

in the computation of derivatives, but the result of Owen (1988) points towards the form
(2.3) and (2.4). The derivative with respect to λ at λ = 0 is then the partial/directional
derivative in the direction h. Since wi = ΔF(ti) achieves the maximum, the derivative of
the log likelihood with respect to λ must be zero for any direction h(t). This leads to
equation (A.1) (see Appendix A). In particular, the choice h(t) = I[t ≤ u] for −∞ < u < ∞
gives the modified self-consistent equation under the constraint (2.1). In the case of
right censored data without a constraint, this choice of h(·) gives rise to the familiar
self-consistent equation of Efron (1967) for the Kaplan-Meier estimator (Kaplan and
Meier, 1958). This choice of h was also used by Gill (1989). We shall use the convention

    1 − F(ti) = Σ_{j: tj>ti} ΔF(tj),   F(ti) = Σ_{j: tj≤ti} ΔF(tj).
(a) The constrained self-consistent equation for F(u) when u ≤ T is:

    F(u) = (θ/n1) { Σ_{δi=1, i≤n1} I[ti ≤ u]
                  + Σ_{δi=0, i≤n1} [(1−θ)/θ] F(u) / [1 − F(ti)]
                  + Σ_{δi=0, i≤n1} [F(u) − F(ti)] I[ti ≤ u] / [1 − F(ti)]
                  + Σ_{δi=2, i≤n1} F(min(u, ti−)) / F(ti−) },   (2.5)

where all ti ≤ T.
(b) Similarly, the constrained self-consistent equation for u > T is:

    F(u) = θ + [(1−θ)/n2] { Σ_{δi=1, i≤n2} I[si ≤ u]
                          + Σ_{δi=0, i≤n2} [F(u) − F(si)] I[si ≤ u] / [1 − F(si)]
                          + Σ_{δi=2, i≤n2} [ F(min(u, si−)) − θ + (θ/(1−θ))(F(u) − θ) ] / F(si−) },   (2.6)

where all si > T.
For the derivation of the above self-consistent equations, see Appendix A. We
summarize the result in the following theorem.

THEOREM 2.1 The NPMLE of F(·) under hypothesis (2.1) based on doubly censored data
(1.1) must satisfy the modified self-consistent equations (2.5) and (2.6). □
Using the above modified self-consistent equations, we can iteratively compute the
constrained estimator of the distribution F: we substitute the current estimate F into the
right-hand side of (2.5) and (2.6) to obtain a new estimate Fnew. The initial estimator
should put probability mass at the locations where the NPMLE may have jumps, and it
should satisfy the constraint. We used the NPMLE without the constraint, rescaled to
satisfy the constraint, as the initial estimator.
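For concreteness, the iteration just described can be sketched in code. The sketch below is our own illustration (the function name and data layout are ours): it places probability mass only at the exact observation times, applies the constrained updates for u ≤ T and u > T with the current estimate on the right-hand side, and does not guard against zero denominators, so it is an outline of the scheme rather than a full implementation.

```python
import numpy as np

def constrained_self_consistent(t, d, T, theta, n_iter=200):
    """Sketch of the constrained self-consistent iteration.
    t: observed times; d: indicator (1 exact, 0 right-censored,
    2 left-censored); constraint F(T) = theta.  Mass is placed at
    the exact observation times only (a simplifying assumption)."""
    t, d = np.asarray(t, float), np.asarray(d)
    sup = np.sort(np.unique(t[d == 1]))   # support points of the jumps
    w = np.ones(len(sup)) / len(sup)      # initial jumps, rescaled so
    lo = sup <= T                         # that the constraint holds
    w[lo] *= theta / w[lo].sum()
    w[~lo] *= (1.0 - theta) / w[~lo].sum()
    n1 = int((t <= T).sum())
    n2 = len(t) - n1

    def F(u, strict=False):               # current F(u) or F(u-)
        return w[sup < u].sum() if strict else w[sup <= u].sum()

    for _ in range(n_iter):
        Fnew = np.empty(len(sup))
        for k, u in enumerate(sup):
            s = 0.0
            if u <= T:                    # update for u <= T
                for ti, di in zip(t[t <= T], d[t <= T]):
                    if di == 1:           # exact observation
                        s += float(ti <= u)
                    elif di == 0:         # right-censored
                        s += ((1 - theta) / theta * F(u)
                              + (F(u) - F(ti)) * (ti <= u)) / (1 - F(ti))
                    else:                 # left-censored
                        s += (F(ti, True) if ti <= u else F(u)) / F(ti, True)
                Fnew[k] = theta / n1 * s
            else:                         # update for u > T
                for si, di in zip(t[t > T], d[t > T]):
                    if di == 1:
                        s += float(si <= u)
                    elif di == 0:
                        s += (F(u) - F(si)) * (si <= u) / (1 - F(si))
                    else:
                        Fm = F(si, True) if si <= u else F(u)
                        s += (Fm - theta
                              + theta / (1 - theta) * (F(u) - theta)) / F(si, True)
                Fnew[k] = theta + (1 - theta) / n2 * s
        w = np.diff(Fnew, prepend=0.0)    # new jumps from the new CDF
    return sup, w
```

On a small artificial data set one can check that the iteration keeps the total mass equal to 1 and the constraint F(T) = θ satisfied at every step, as the theory predicts.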
THEOREM 2.2 The estimator obtained by the above modified self-consistent equations is at
least a local maximizer of the likelihood function under (2.1).

Proof: See Appendix B. □
3. Modified Self-Consistent Equations Under Many Constraints
In this section, we extend the modified self-consistent algorithm to handle hypotheses
(constraints) at several time points:

    H0: F(T1) = θ1, F(T2) = θ2, ..., F(Tk) = θk
    H1: F(Tj) ≠ θj for at least one j, 1 ≤ j ≤ k.

For simplicity, we only give the proof for k = 2. Following similar steps as in the
simple hypothesis case, we can decompose the log-likelihood function into three parts:
before time T1, between times T1 and T2, and after time T2.
    log L(F) = log L1(F) + log L2(F) + log L3(F)
             = Σ_{δi=1, i≤n1} log(wi) + Σ_{δi=0, i≤n1} log[1 − F(ti)] + Σ_{δi=2, i≤n1} log[F(ti−)]
             + Σ_{ηi=1, i≤n2} log(zi) + Σ_{ηi=0, i≤n2} log[1 − F(xi)] + Σ_{ηi=2, i≤n2} log[F(xi−)]
             + Σ_{ci=1, i≤n3} log(vi) + Σ_{ci=0, i≤n3} log[1 − F(si)] + Σ_{ci=2, i≤n3} log[F(si−)].   (3.1)

In the above equation we use the following notation: wi, zi and vi are the jumps of F at
the ith observation before T1, between T1 and T2, and after T2, respectively; ti, xi and si
are the observations before T1, between T1 and T2, and after T2, respectively; δi, ηi and
ci are the corresponding censoring indicators. There are n1 observations before T1, n2
observations between T1 and T2, and n3 observations after T2; the total number of
observations is n1 + n2 + n3 = n. Note that the constraints are Σ_{i=1}^{n1} wi = θ1,
Σ_{i=1}^{n2} zi = θ2 − θ1, and Σ_{i=1}^{n3} vi = 1 − θ2.
The maximization of (3.1) can be done separately on log L1, log L2 and log L3. The
maximization of log L1 and log L3 is exactly the same as the maximization of log L1 and
log L2 in Section 2. To maximize the log L2 in (3.1), we define the jumps between T1 and
T2 as

    zi = zi(λ) = [ΔF(xi)/(1 + λh(xi))] · (θ2 − θ1)/B(λ),   with   B(λ) = Σ_{i=1}^{n2} ΔF(xi)/(1 + λh(xi)),

and B(0) = θ2 − θ1. The self-consistent equation related to log L2, for T1 ≤ u ≤ T2, turns
out to be:
    F(u) = θ1 + [(θ2 − θ1)/n2] { Σ_{ηi=1, i≤n2} I[xi ≤ u]
                               + Σ_{ηi=0, i≤n2} [(1−θ2)/(θ2−θ1)] [F(u) − θ1] / [1 − F(xi)]
                               + Σ_{ηi=0, i≤n2} [F(u) − F(xi)] I[xi ≤ u] / [1 − F(xi)]
                               + Σ_{ηi=2, i≤n2} [F(min(u, xi−)) − θ1] / F(xi−)
                               + Σ_{ηi=2, i≤n2} [θ1/(θ2−θ1)] [F(u) − θ1] / F(xi−) },   (3.2)

where T1 ≤ xi ≤ T2 (for the details, see Appendix C).
4. Empirical Likelihood Ratio Tests and Confidence Intervals
Let F̂ be the NPMLE of F which maximizes the log likelihood log L(F) defined in (2.2)
over all distributions. Let F̃ denote the NPMLE of F under H0, which maximizes the log
likelihood only among distributions that satisfy H0. We define the empirical likelihood
ratio as:

    R(H0) = L(F̃) / L(F̂).

The likelihood ratio test statistic is:

    −2 log R(H0) = −2 log [ max_{H0} L(F) / max_{H0+H1} L(F) ] = 2 [ log L(F̂) − log L(F̃) ].
The method described in Section 2 enables us to compute the constrained NPMLE F̃,
which maximizes the log likelihood under H0: F(T) = θ. Using the usual self-consistent
algorithm (Turnbull, 1974), we can compute the NPMLE F̂ without any constraint. Once
we have these estimates, we can easily compute the empirical likelihood ratio. We use
chi-square theory to carry out hypothesis tests and construct confidence intervals. For the
general theory of the empirical likelihood ratio, see Owen (1988, 2001). For a more
technical result that is directly applicable to our setup here, see Theorem 2.1 of Murphy
and Van der Vaart (1997).
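The unconstrained self-consistent (EM) iteration itself is short: the E-step spreads each censored observation's unit mass over the support according to the current estimate, and the M-step averages. The following sketch is our own illustration, with the support restricted to the exact observation times (an assumption):

```python
import numpy as np

def turnbull_em(t, d, n_iter=500):
    """Sketch of the unconstrained self-consistent/EM iteration for
    doubly censored data.  d[i]: 1 = exact, 0 = right-censored at t[i],
    2 = left-censored at t[i]."""
    t, d = np.asarray(t, float), np.asarray(d)
    sup = np.sort(np.unique(t[d == 1]))   # support: exact times (assumed)
    w = np.ones(len(sup)) / len(sup)      # initial guess: uniform
    for _ in range(n_iter):
        new = np.zeros_like(w)
        for ti, di in zip(t, d):
            if di == 1:                   # exact: unit mass at ti
                new[np.searchsorted(sup, ti)] += 1.0
            else:                         # censored: spread the unit mass
                m = sup > ti if di == 0 else sup < ti
                new[m] += w[m] / w[m].sum()
        w = new / len(t)                  # average over all n observations
    return sup, w
```

With purely exact data this reduces to the empirical distribution after one pass.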
If the observed −2 log R(H0) is greater than χ²_{1,α} (the 100(1−α)th percentile of the
χ² distribution with 1 degree of freedom), we reject H0 at significance level α. To
construct a confidence interval for F(T), we can test different hypotheses with fixed T and
various θ's; the confidence interval is just

    { θ : −2 log R(H0: F(T) = θ) ≤ χ²_{1,α} }.   (4.1)

We can also construct a confidence interval for the percentile F^(−1)(θ) as follows: test
many hypotheses with fixed θ and various T's, and form the confidence interval as:

    { T : −2 log R(H0: F(T) = θ) ≤ χ²_{1,α} }.   (4.2)

In particular, a confidence interval for the median can be obtained with θ = 1/2.
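In code, inverting the test is a scan over candidate values. The sketch below assumes a function neg2logR(θ) returning the observed −2 log R for the constraint F(T) = θ; here a toy quadratic curve stands in for the actual doubly censored computation:

```python
import numpy as np

def elr_confint(neg2logR, grid, cutoff=3.84):
    """Invert the ELR test: keep every candidate value whose
    -2 log R does not exceed the chi-square cutoff
    (3.84 = 95th percentile of chi-square with 1 df)."""
    kept = [g for g in grid if neg2logR(g) <= cutoff]
    return (min(kept), max(kept)) if kept else None

# toy curve standing in for the real -2 log R(H0: F(T) = theta)
toy = lambda th: ((th - 0.5) / 0.1) ** 2
lo, hi = elr_confint(toy, np.linspace(0.0, 1.0, 1001))
```

The same scan with fixed θ over candidate T values gives the quantile interval of (4.2).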
5. Influence Function of NPMLE and Its Estimation
The influence function (or influence curve) is a general technique for obtaining the
variance of a random process (and more). In the analysis of F(·) with doubly censored
samples, there are three influence functions corresponding to right, left and non-censored
observations. Chang (1990) computed those asymptotic influence functions for the process
√n (Fn(t) − F(t)); he obtained the following:

    √n (Fn(t) − F(t)) = ∫_0^T IC1(t, s) dq1^(n)(s) + ∫_0^T IC0(t, s) dq0^(n)(s)
                      + ∫_0^T IC2(t, s) dq2^(n)(s) + op(1),

where

    qj^(n)(t) = √n [ (1/n) Σ_i I[Ti ≤ t, δi = j] − E( (1/n) Σ_i I[Ti ≤ t, δi = j] ) ],   j = 0, 1, 2.
From the above, we could try to estimate the three asymptotic influence functions,
ÎCj(t, s), and then estimate the variance of √n (F̂(t) − F(t)) by

    ∫_0^T ÎC1²(t, s) dq1^(n)(s) + ∫_0^T ÎC0²(t, s) dq0^(n)(s) + ∫_0^T ÎC2²(t, s) dq2^(n)(s)
    − [ ∫_0^T ÎC1(t, s) dq1^(n)(s) + ∫_0^T ÎC0(t, s) dq0^(n)(s) + ∫_0^T ÎC2(t, s) dq2^(n)(s) ]².
However, those influence functions are only defined implicitly via Fredholm integral
equations that involve not only the unknown distribution function F but also the unknown
censoring distributions. We plug in the (self-consistent) estimates of those distribution
functions, discretize the Fredholm integral equations into matrix equations, and solve for
the ÎCj. For details see Chang (1990) and Press et al. (1993), Chapter 18.
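The discretization step is the standard Nyström method for a Fredholm equation of the second kind: replace the integral by a quadrature rule and solve the resulting linear system. A generic sketch follows (our own illustration, not Chang's actual equations, whose kernels involve the estimated distribution functions):

```python
import numpy as np

def nystrom_solve(kernel, f, grid, weights):
    """Solve phi(t) = f(t) + int K(t, s) phi(s) ds by discretizing
    on `grid` with quadrature `weights`: (I - K*W) phi = f."""
    K = kernel(grid[:, None], grid[None, :]) * weights[None, :]
    return np.linalg.solve(np.eye(len(grid)) - K, f(grid))

# check on a case solvable by hand: K = 1/2 and f = 1 on [0, 1]
# give the constant solution phi = 1/(1 - 1/2) = 2
n = 101
grid = np.linspace(0.0, 1.0, n)
wts = np.full(n, 1.0 / (n - 1))
wts[[0, -1]] /= 2.0                      # trapezoid-rule weights
phi = nystrom_solve(lambda t, s: 0.5 + 0.0 * t * s,
                    lambda t: np.ones_like(t), grid, wts)
```

The cost of forming and solving this n × n system is what makes a large number of discretizing points expensive.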
In this approach we need to solve a matrix equation that results from discretizing the
corresponding integral equations for the influence functions. If we discretize at only a few
points, the estimate will not be very accurate; if we use many discretizing points, we get a
big matrix and the computation is slow. For large sample sizes with many discretizing
points, this approach is very computationally expensive (it requires inverting a large
matrix). In our experience, when the censored observations (both right and left) exceed 500,
Figure 1. Q-Q plot of −2 log-likelihood ratios vs. χ²_(1) percentiles for the normal distribution when sample size = 100.
it becomes slow in our implementation, since we are discretizing at the observed censoring
times. On the other hand, for the self-consistent algorithm we observe only slight slowing
for sample sizes over 2,000. Besides, the memory requirement for the influence function
calculation is of order n², while it is only of order n for the self-consistent algorithm.
In other implementations the speed will undoubtedly vary, but the compromise between
speed and accuracy (the number of discretizing points) of the influence function estimator
is always there. The accuracy of these influence-function-based variance estimators is
largely unknown. On the other hand, we have a fairly good indication of the quality of the
χ² approximation in the empirical likelihood case (Figures 1, 2, 3 and 4). It is also a
generally accepted fact that for smaller sample sizes likelihood ratio tests are more
accurate than Wald tests.
In Section 6 we use the influence function to estimate the variance of F̂(T) and construct
the confidence intervals given in (6.1), (6.2), (6.3) and (6.4). Comparisons are made among
the different confidence intervals, using the influence function approach and the empirical
likelihood ratio approach described in Section 4.
6. Applications, Simulations and Examples
In this section, we apply our methodology to test hypotheses and construct confidence
intervals with doubly censored data.
Figure 2. Q-Q plot of −2 log-likelihood ratios vs. χ²_(1) percentiles for the normal distribution when sample size = 25.
6.1. Simulation: Hypothesis Testing
In our next two simulations, we only examined the size of the test (i.e., under H0). We took
normally distributed samples of size 100 and size 25 respectively for each run, and each
entry in Table 2 was based on 5,000 runs. The data were generated via (1.1), where Xi, Yi, Zi
are distributed as in Table 1.
The tests of H0: F(T) = θ based on the doubly censored data (Ti, δi), as described in
Section 2, were carried out. Table 2 reports the percentages of rejecting H0: F(T) = θ at
the nominal significance level α = 0.05 (or 0.10). For example, the second row gives the
results of testing H0: F(8) = 0.16, where F(·) is the cumulative distribution function of
the normal distribution with mean μ = 10 and standard deviation σ = 2. The percentages
were computed as the number of −2 log-likelihood ratios greater than the critical value
χ²_{1,0.05} = 3.84 (or χ²_{1,0.1} = 2.71), divided by the number of runs, 5,000. From
Table 2 we can see the probabilities of rejecting H0 are quite close to the nominal level
α = 0.05 (or 0.10), since we have chosen T and θ values that make H0 hold
(Prob(N(μ = 10, σ = 2) ≤ 10) = 0.5, etc.).
Similarly, we took exponentially distributed samples of size 100 and size 25 respectively
for our final simulation, also based on 5,000 runs. The data were generated via (1.1)
using the Xi, Yi, Zi distributions listed in Table 3.
Again, Table 4 shows that the percentages of rejecting H0 are close to the nominal level
α = 0.05 (or 0.10), since the T and θ values we chose make H0 hold.
Figure 3. Q-Q plot of −2 log-likelihood ratios vs. χ²_(1) percentiles for the exponential distribution when sample size = 100.
Figures 1, 2, 3 and 4 are Q-Q plots of the −2 log-likelihood ratios for the above two
simulations versus the χ²_(1) percentiles. At the point 3.84 (or 2.71), if the −2
log-likelihood ratio line is above the dashed 45° line, the rejection probability is
greater than 5% (or 10%); otherwise it is less than 5% (or 10%).
Remark 6.1. The Q-Q plots for the exponentially distributed light-censored simulations
(Figures 3(a) and 4(a)) are noticeably more discrete than the others. We ran 5,000
uncensored simulations with the exponential distribution, and the Q-Q plot was similar to
Figures 3(a) and 4(a). Since the constraint is F(T) = θ, F̂ and F̃ depend only on the
numbers of observations before (n1) and after (n2) time T. If n1 and n2 are the same in
two uncensored samples, the −2 log-likelihood ratios will be the same. For this reason,
the uncensored and light-censored plots look more discrete. The censoring somewhat
smoothed the plot and made the chi-square approximation better.
Table 1. Generating normally distributed samples, i = 1, 2, ..., n.

    Case                                     Xi              Zi              Yi
    light-censored (10%–20% censored)        N(μ=10, σ=2)    N(μ=6, σ=2)     exp(1) + Zi + 8
    medium-censored (20%–40% censored)       N(μ=10, σ=2)    N(μ=7, σ=2)     exp(1) + Zi + 5
    heavy-censored (40%–60% censored)        N(μ=10, σ=2)    N(μ=9, σ=2)     exp(1) + Zi + 2
Figure 4. Q-Q plot of −2 log-likelihood ratios vs. χ²_(1) percentiles for the exponential distribution when sample size = 25.
6.2. Example: Confidence Intervals – A Case Study
We compare the confidence intervals obtained via the empirical likelihood ratio as in
Section 4 to confidence intervals obtained by directly estimating the asymptotic variance
of F̂(T) and then forming the Wald (1 − α) confidence interval:

    F̂(T) ± Z_(1−α/2) √( Var̂[F̂(T)] ),   (6.1)

where Z_(1−α/2) is the 100(1 − α/2)th percentile of the standard normal distribution.
Better confidence intervals may be obtained on a transformed scale. We may use the
log-log transformation to obtain:

    [ Ŝ(T)^b, Ŝ(T)^(1/b) ],   where   b = exp{ Z_(1−α/2) √( Var̂[Ŝ(T)] ) / ( Ŝ(T) |log Ŝ(T)| ) },   (6.2)

where S(·) = 1 − F(·); or the logit transformation to obtain:

    [ e^a/(1 + e^a), e^b/(1 + e^b) ],   (6.3)
Table 2. The percentage of rejecting H0: F(T) = θ at α = 0.05 and 0.10.

    T    θ     Sample size   Light censored   Medium censored   Heavy censored
    10   0.5   n = 100       5.38%/10.36%     5.02%/10.04%      4.66%/12.26%
               n = 25        4.62%/10.76%     4.82%/10.12%      4.92%/10.14%
    8    0.16  n = 100       5.04%/10.2%      5.02%/10.4%       5.70%/10.44%
               n = 25        5.12%/12.8%      6.08%/11.52%      5.58%/10.72%
Table 3. Generating exponentially distributed samples, i = 1, 2, ..., n.

    Case                                  Xi        Zi        Yi
    light-censored (10%–20% censored)     exp(2)    exp(15)   exp(1) + Zi + 1
    medium-censored (20%–40% censored)    exp(2)    exp(8)    exp(1) + Zi + 0.3
    heavy-censored (40%–60% censored)     exp(2)    exp(5)    exp(1) + Zi
Table 4. The percentage of rejecting H0: F(T) = θ at α = 0.05 and 0.10.

    T     θ      Sample size   Light censored   Medium censored   Heavy censored
    0.5   0.63   n = 100       4.18%/9.06%      4.50%/9.58%       4.94%/9.98%
                 n = 25        4.58%/9.68%      4.70%/9.74%       5.40%/10.2%
    0.3   0.45   n = 100       4.72%/8.64%      4.56%/9.34%       4.48%/9.38%
                 n = 25        4.52%/10.18%     4.82%/10.18%      5.46%/10.68%
where

    a = log[ F̂(T)/(1 − F̂(T)) ] − Z_(1−α/2) √( Var̂[F̂(T)] ) / [ F̂(T)(1 − F̂(T)) ]

and

    b = log[ F̂(T)/(1 − F̂(T)) ] + Z_(1−α/2) √( Var̂[F̂(T)] ) / [ F̂(T)(1 − F̂(T)) ].
It is not easy to obtain a Wald-type confidence interval for the median (or other
quantiles), but we can invert the test of F(T) = 0.5 with the estimated variance. The
confidence set for the median may therefore be obtained as the set of points T satisfying:

    { T : −Z_(1−α/2) ≤ [F̂(T) − 0.5] / √( Var̂[F̂(T)] ) ≤ Z_(1−α/2) }.   (6.4)
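Given F̂(T) and an estimated variance, the intervals (6.1)–(6.3) take a few lines each. A sketch with illustrative inputs (the numbers below are ours, not the estimates from the data of this section):

```python
import math

def wald_intervals(Fhat, var, z=1.96):
    """Plain (6.1), log-log (6.2) and logit (6.3) style intervals
    for F(T), given Fhat = estimated F(T), var = estimated variance."""
    se = math.sqrt(var)
    plain = (Fhat - z * se, Fhat + z * se)            # untransformed
    S = 1.0 - Fhat                                    # S(T); same variance
    b = math.exp(z * se / (S * abs(math.log(S))))     # log-log factor
    loglog = tuple(sorted((1 - S ** b, 1 - S ** (1 / b))))
    a_lo = math.log(Fhat / (1 - Fhat)) - z * se / (Fhat * (1 - Fhat))
    a_hi = math.log(Fhat / (1 - Fhat)) + z * se / (Fhat * (1 - Fhat))
    expit = lambda x: math.exp(x) / (1 + math.exp(x))
    return plain, loglog, (expit(a_lo), expit(a_hi))

plain, loglog, logit_ci = wald_intervals(0.5, 0.01)
```

Unlike the untransformed interval, the two transformed intervals always stay inside (0, 1).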
We will use the following doubly censored data as our example to construct confidence
intervals.
Turnbull and Weiss (1978) reported part of a study conducted at Stanford-Palo Alto Peer
Counseling Program (see Hamburg et al. (1975) for details of the study). In this study, 191
California high school boys were asked ''When did you first use marijuana?'' The answers
were either the exact age (uncensored observations); or ''I never used it,'' which gives
right-censored observations at the boys' current ages; or ''I have used it but cannot
recall just when the first time was,'' which gives left-censored observations. The
estimated median age of first marijuana use for these high school boys is 14 years.
Table 5 shows the results of this study.
Suppose we are interested in obtaining a 95% confidence interval for the median age of
first-time marijuana use. Table 6 shows the 95% confidence intervals for the median age
using (4.2) and (6.4). Table 7 shows four different 95% confidence intervals for θ at the
median age, using (4.1), (6.1), (6.2) and (6.3).
From Table 7 we can see that the 95% confidence interval obtained by the empirical
likelihood ratio (4.1) is narrower than the other three intervals (6.1), (6.2) and (6.3).
The two transformed intervals obtained by (6.2) and (6.3) are wider than (4.1) but close
to each other. We do not know which transformation is better. We believe the empirical
likelihood ratio approach is preferable because we do not need to know the best
transformation, and the simulations in the previous section show that the chi-square
approximations are quite accurate. The Wald confidence interval (6.1) includes values
less than 0 and greater than 1; these are not reasonable confidence limits since a
distribution function must lie between 0 and 1. Therefore, in this example we conclude
that the empirical likelihood ratio approach works better than Wald's approach.
6.3. Software Used
The simulations were carried out using Splus 3.4 for Unix on HP workstations. The code
we used also works in R (Gentleman and Ihaka, 1996). The R code has been packaged and
posted to the R homepage http://cran.r-project.org as the contributed package
dblcens (Zhou et al., 2001).
7. Discussion
In this paper, we proposed a modified self-consistent algorithm under constraints of the
type F(T) = θ. We used it to test hypotheses and construct confidence intervals for
survival probabilities and quantiles via empirical likelihood ratios for doubly censored
data. Simulations show the resulting tests have fairly accurate size under H0. This method does
Table 5. Marijuana use in high school boys.

    Age   Number of exact   Number who have yet   Number who have started
          observations      to smoke marijuana    smoking at an earlier age
    10    4                 0                     0
    11    12                0                     0
    12    19                2                     0
    13    24                15                    1
    14    20                24                    2
    15    13                18                    3
    16    3                 14                    2
    17    1                 6                     3
    18    0                 0                     1
    >18   4                 0                     0
Table 6. 95% confidence intervals for the median age (14).

    Method                                                        Confidence interval
    {T : −2 log R(H0: F(T) = 0.5) ≤ 3.84}                  (4.2)  (11.00000, 16.99991)
    {T : −1.96 ≤ [F̂(T) − 0.5]/√(Var̂[F̂(T)]) ≤ 1.96}        (6.4)  (10, 19)
Table 7. 95% confidence intervals for F(14).

    Method                                                 Confidence interval
    {θ : −2 log R(H0: F(14) = θ) ≤ 3.84}            (4.1)  (0.2798736, 0.7195857)
    [1 − Ŝ(14)^b, 1 − Ŝ(14)^(1/b)]                  (6.2)  (0.0378062, 0.9999983)
    [e^a/(1 + e^a), e^b/(1 + e^b)]                  (6.3)  (0.01719094, 0.9842504)
    F̂(14) ± 1.96 √(Var̂[F̂(14)])                     (6.1)  (−0.533258, 1.511003)
not give a simultaneous confidence band for the whole distribution function. We also
illustrated how to estimate the influence function of the NPMLE F̂. The accuracy of this
estimator deserves further study.
Appendix A
A.1. Derivation of the Self-consistent Equation Before Time T
(a) The log likelihood function before time T:

    log L1 = Σ_{δi=1, i≤n1} log(wi) + Σ_{δi=0, i≤n1} log[1 − F(ti)] + Σ_{δi=2, i≤n1} log[F(ti−)]
           = Σ_{δi=1, i≤n1} log(wi) + Σ_{δi=0, i≤n1} log[ (1 − θ) + Σ_{tj>ti, j≤n1} wj ]
           + Σ_{δi=2, i≤n1} log( Σ_{tj<ti, j≤n1} wj ).
To facilitate taking the derivative we substitute wi(λ) as in (2.3):

    log L1 = Σ_{δi=1, i≤n1} log[ ΔF(ti)/(1 + λh(ti)) · θ/C(λ) ]
           + Σ_{δi=0, i≤n1} log[ (1 − θ) + Σ_{tj>ti, j≤n1} ΔF(tj)/(1 + λh(tj)) · θ/C(λ) ]
           + Σ_{δi=2, i≤n1} log[ Σ_{tj<ti, j≤n1} ΔF(tj)/(1 + λh(tj)) · θ/C(λ) ]

           = Σ_{δi=1, i≤n1} log(θ/C(λ)) + Σ_{δi=1, i≤n1} log[ ΔF(ti)/(1 + λh(ti)) ]
           + Σ_{δi=0, i≤n1} log(θ/C(λ)) + Σ_{δi=0, i≤n1} log[ ((1−θ)/θ) C(λ) + Σ_{tj>ti, j≤n1} ΔF(tj)/(1 + λh(tj)) ]
           + Σ_{δi=2, i≤n1} log(θ/C(λ)) + Σ_{δi=2, i≤n1} log[ Σ_{tj<ti, j≤n1} ΔF(tj)/(1 + λh(tj)) ]

           = n1 log(θ/C(λ)) + Σ_{δi=1, i≤n1} log[ ΔF(ti)/(1 + λh(ti)) ]
           + Σ_{δi=0, i≤n1} log[ ((1−θ)/θ) C(λ) + Σ_{tj>ti, j≤n1} ΔF(tj)/(1 + λh(tj)) ]
           + Σ_{δi=2, i≤n1} log[ Σ_{tj<ti, j≤n1} ΔF(tj)/(1 + λh(tj)) ].
Now we are ready to take the derivative with respect to λ:

    ∂log L1/∂λ = −n1 C′(λ)/C(λ) − Σ_{δi=1, i≤n1} h(ti)/(1 + λh(ti))
               + Σ_{δi=0, i≤n1} [ ((1−θ)/θ) C′(λ) − Σ_{tj>ti} ΔF(tj)h(tj)/(1 + λh(tj))² ]
                               / [ ((1−θ)/θ) C(λ) + Σ_{tj>ti} ΔF(tj)/(1 + λh(tj)) ]
               − Σ_{δi=2, i≤n1} [ Σ_{tj<ti} ΔF(tj)h(tj)/(1 + λh(tj))² ]
                               / [ Σ_{tj<ti} ΔF(tj)/(1 + λh(tj)) ].
If we set λ = 0, and use C(0) = θ and C′(0) = −Σ_{i=1}^{n1} ΔF(ti)h(ti), the above can be
simplified to

    ∂log L1/∂λ |_{λ=0} = (n1/θ) Σ_{i=1}^{n1} ΔF(ti)h(ti) − Σ_{δi=1, i≤n1} h(ti)
        − Σ_{δi=0, i≤n1} [ ((1−θ)/θ) Σ_{i=1}^{n1} ΔF(ti)h(ti) ] / [ (1 − θ) + Σ_{tj>ti} ΔF(tj) ]
        − Σ_{δi=0, i≤n1} [ Σ_{tj>ti} ΔF(tj)h(tj) ] / [ (1 − θ) + Σ_{tj>ti} ΔF(tj) ]
        − Σ_{δi=2, i≤n1} [ Σ_{tj<ti} ΔF(tj)h(tj) ] / [ Σ_{tj<ti} ΔF(tj) ].
This derivative must be zero since F is the NPMLE. Thus we get

    Σ_{i=1}^{n1} ΔF(ti)h(ti) = (θ/n1) { Σ_{δi=1, i≤n1} h(ti)
        + Σ_{δi=0, i≤n1} [ ((1−θ)/θ) Σ_{i=1}^{n1} ΔF(ti)h(ti) ] / [ 1 − F(ti) ]
        + Σ_{δi=0, i≤n1} [ Σ_{tj>ti} ΔF(tj)h(tj) ] / [ 1 − F(ti) ]
        + Σ_{δi=2, i≤n1} [ Σ_{tj<ti} ΔF(tj)h(tj) ] / F(ti−) }.   (A.1)

If we take h(ti) = I[ti ≤ u] for u ≤ T, then (A.1) becomes equation (2.5).
A.2. Derivation of the Self-consistent Equation After Time T
Similarly we can obtain the constrained self-consistent equation for F when u > T.
(b) The log likelihood function after time T is

    log L2 = Σ_{δi=1, i≤n2} log(vi) + Σ_{δi=0, i≤n2} log[1 − F(si)] + Σ_{δi=2, i≤n2} log[F(si−)]
           = Σ_{δi=1, i≤n2} log(vi) + Σ_{δi=0, i≤n2} log( Σ_{sj>si, j≤n2} vj )
           + Σ_{δi=2, i≤n2} log( θ + Σ_{sj<si, j≤n2} vj ).
Again substituting vi(λ) as in (2.4),

    log L2 = Σ_{δi=1, i≤n2} log[ ΔF(si)/(1 + λh(si)) · (1 − θ)/D(λ) ]
           + Σ_{δi=0, i≤n2} log[ Σ_{sj>si} ΔF(sj)/(1 + λh(sj)) · (1 − θ)/D(λ) ]
           + Σ_{δi=2, i≤n2} log[ θ + Σ_{sj<si} ΔF(sj)/(1 + λh(sj)) · (1 − θ)/D(λ) ]

           = n2 log((1 − θ)/D(λ)) + Σ_{δi=1, i≤n2} log[ ΔF(si)/(1 + λh(si)) ]
           + Σ_{δi=0, i≤n2} log[ Σ_{sj>si} ΔF(sj)/(1 + λh(sj)) ]
           + Σ_{δi=2, i≤n2} log[ (θ/(1−θ)) D(λ) + Σ_{sj<si} ΔF(sj)/(1 + λh(sj)) ].
Taking the derivative with respect to λ, it follows that

    ∂log L2/∂λ = −n2 D′(λ)/D(λ) − Σ_{δi=1, i≤n2} h(si)/(1 + λh(si))
               − Σ_{δi=0, i≤n2} [ Σ_{sj>si} ΔF(sj)h(sj)/(1 + λh(sj))² ]
                               / [ Σ_{sj>si} ΔF(sj)/(1 + λh(sj)) ]
               + Σ_{δi=2, i≤n2} [ (θ/(1−θ)) D′(λ) − Σ_{sj<si} ΔF(sj)h(sj)/(1 + λh(sj))² ]
                               / [ (θ/(1−θ)) D(λ) + Σ_{sj<si} ΔF(sj)/(1 + λh(sj)) ].
If we set λ = 0, the derivative must be zero; thus

    Σ_{i=1}^{n2} ΔF(si)h(si) = ((1 − θ)/n2) { Σ_{δi=1, i≤n2} h(si)
        + Σ_{δi=0, i≤n2} [ Σ_{sj>si} ΔF(sj)h(sj) ] / [ Σ_{sj>si} ΔF(sj) ]
        + Σ_{δi=2, i≤n2} [ (θ/(1−θ)) Σ_{j=1}^{n2} ΔF(sj)h(sj) ] / [ θ + Σ_{sj<si} ΔF(sj) ]
        + Σ_{δi=2, i≤n2} [ Σ_{sj<si} ΔF(sj)h(sj) ] / [ θ + Σ_{sj<si} ΔF(sj) ] }.

If we take h(si) = I[si ≤ u] for u > T, the above equation becomes (2.6).
Appendix B
Proof of Theorem 2.2: Taking the second derivative of the log likelihood function log L1
with respect to λ, setting λ = 0 and h(ti) = I[ti ≤ u], we can write the derivative as
follows:

    ∂²log L1/∂λ² |_{λ=0} = −(2n1/θ) F(u) + (n1/θ²) [F(u)]² + Σ_{δi=1, i≤n1} I[ti ≤ u]
        + 2 Σ_{δi=0, i≤n1} ((1−θ)/θ) F(u) / [1 − F(ti)]
        + 2 Σ_{δi=0, i≤n1} [F(u) − F(ti)] I[ti ≤ u] / [1 − F(ti)]
        − Σ_{δi=0, i≤n1} { ((1−θ)/θ) F(u) + [F(u) − F(ti)] I[ti ≤ u] }² / [1 − F(ti)]²
        + 2 Σ_{δi=2, i≤n1} F(min(u, ti−)) / F(ti−)
        − Σ_{δi=2, i≤n1} [ F(min(u, ti−)) / F(ti−) ]².
Substituting F(u) as in (2.5),
D2log L1
D2j¼0 ¼ �
X�i¼1;i�n1
I½ti�u þn1
�2½FðuÞ2
�X
�i¼0;i�n1
1��� FðuÞ þ ½FðuÞ � FðtiÞI½ti�u
$ %2½1� FðtiÞ2
�X
�i¼2;i�n1
Fðminðu; ti�ÞÞF ðti�Þ
" #2
¼ �n11
n1
X�i¼1;i�n1
I2½ti�u þX
�i¼0;i�n1
1��� FðuÞ þ ½FðuÞ � FðtiÞI½ti�u
$ %2½1� FðtiÞ2
"(
þX
�i¼2;i�n1
Fðminðu; ti�ÞÞ
Fðti�Þ
!2#� FðuÞ
�
� �2)
¼ �n11
n1
Xn1i¼1
a2i �1
n1
Xn1i¼1
ai
" #28<:9=;
where

    Σ_{i=1}^{n1} ai² = Σ_{δi=1, i≤n1} I²[ti ≤ u]
        + Σ_{δi=0, i≤n1} { ((1−θ)/θ) F(u) + [F(u) − F(ti)] I[ti ≤ u] }² / [1 − F(ti)]²
        + Σ_{δi=2, i≤n1} ( F(min(u, ti−)) / F(ti−) )²

and

    Σ_{i=1}^{n1} ai = Σ_{δi=1, i≤n1} I[ti ≤ u]
        + Σ_{δi=0, i≤n1} { ((1−θ)/θ) F(u) + [F(u) − F(ti)] I[ti ≤ u] } / [1 − F(ti)]
        + Σ_{δi=2, i≤n1} F(min(u, ti−)) / F(ti−).

By (2.5) and the Cauchy-Schwarz inequality, it is clear that the second derivative of
log L1 is non-positive. The second derivative of log L2 is also non-positive by similar
steps. □
Appendix C
Derivation of the Self-consistent Equation Between Times T1 and T2
    log L2 = Σ_{ηi=1, i≤n2} log(zi) + Σ_{ηi=0, i≤n2} log[1 − F(xi)] + Σ_{ηi=2, i≤n2} log[F(xi−)]
           = Σ_{ηi=1, i≤n2} log(zi) + Σ_{ηi=0, i≤n2} log[ 1 − F(T2) + F(T2) − F(xi) ]
           + Σ_{ηi=2, i≤n2} log[ F(xi−) − F(T1) + F(T1) ]
           = Σ_{ηi=1, i≤n2} log(zi) + Σ_{ηi=0, i≤n2} log[ (1 − θ2) + Σ_{xj>xi, j≤n2} zj ]
           + Σ_{ηi=2, i≤n2} log[ Σ_{xj<xi, j≤n2} zj + θ1 ].
Substituting zi(λ) into the above equation, we get

    log L2 = Σ_{ηi=1, i≤n2} log[ ΔF(xi)/(1 + λh(xi)) · (θ2 − θ1)/B(λ) ]
           + Σ_{ηi=0, i≤n2} log[ (1 − θ2) + Σ_{xj>xi, j≤n2} ΔF(xj)/(1 + λh(xj)) · (θ2 − θ1)/B(λ) ]
           + Σ_{ηi=2, i≤n2} log[ Σ_{xj<xi, j≤n2} ΔF(xj)/(1 + λh(xj)) · (θ2 − θ1)/B(λ) + θ1 ]

           = n2 log((θ2 − θ1)/B(λ)) + Σ_{ηi=1, i≤n2} log[ ΔF(xi)/(1 + λh(xi)) ]
           + Σ_{ηi=0, i≤n2} log[ ((1−θ2)/(θ2−θ1)) B(λ) + Σ_{xj>xi, j≤n2} ΔF(xj)/(1 + λh(xj)) ]
           + Σ_{ηi=2, i≤n2} log[ Σ_{xj<xi, j≤n2} ΔF(xj)/(1 + λh(xj)) + (θ1/(θ2−θ1)) B(λ) ].
Taking the derivative with respect to λ,

    ∂log L2/∂λ = −n2 B′(λ)/B(λ) − Σ_{ηi=1, i≤n2} h(xi)/(1 + λh(xi))
               + Σ_{ηi=0, i≤n2} [ ((1−θ2)/(θ2−θ1)) B′(λ) − Σ_{xj>xi} ΔF(xj)h(xj)/(1 + λh(xj))² ]
                               / [ ((1−θ2)/(θ2−θ1)) B(λ) + Σ_{xj>xi} ΔF(xj)/(1 + λh(xj)) ]
               + Σ_{ηi=2, i≤n2} [ −Σ_{xj<xi, j≤n2} ΔF(xj)h(xj)/(1 + λh(xj))² + (θ1/(θ2−θ1)) B′(λ) ]
                               / [ Σ_{xj<xi, j≤n2} ΔF(xj)/(1 + λh(xj)) + (θ1/(θ2−θ1)) B(λ) ].
If we set λ = 0, the above equation can be simplified to

    ∂log L2/∂λ |_{λ=0} = (n2/(θ2 − θ1)) Σ_{i=1}^{n2} ΔF(xi)h(xi) − Σ_{ηi=1, i≤n2} h(xi)
        − Σ_{ηi=0, i≤n2} [ ((1−θ2)/(θ2−θ1)) Σ_{i=1}^{n2} ΔF(xi)h(xi) + Σ_{xj>xi} ΔF(xj)h(xj) ]
                        / [ (1 − θ2) + Σ_{xj>xi} ΔF(xj) ]
        − Σ_{ηi=2, i≤n2} [ Σ_{xj<xi} ΔF(xj)h(xj) + (θ1/(θ2−θ1)) Σ_{i=1}^{n2} ΔF(xi)h(xi) ]
                        / [ θ1 + Σ_{xj<xi} ΔF(xj) ].
Since the derivative must equal zero, we rearrange the terms and obtain

    Σ_{i=1}^{n2} ΔF(xi)h(xi) = ((θ2 − θ1)/n2) { Σ_{ηi=1, i≤n2} h(xi)
        + Σ_{ηi=0, i≤n2} [ ((1−θ2)/(θ2−θ1)) Σ_{i=1}^{n2} ΔF(xi)h(xi) + Σ_{xj>xi} ΔF(xj)h(xj) ]
                        / [ (1 − θ2) + Σ_{xj>xi} ΔF(xj) ]
        + Σ_{ηi=2, i≤n2} [ Σ_{xj<xi} ΔF(xj)h(xj) + (θ1/(θ2−θ1)) Σ_{i=1}^{n2} ΔF(xi)h(xi) ]
                        / [ θ1 + Σ_{xj<xi} ΔF(xj) ] }.

Finally, we take h(xi) = I[xi ≤ u] for T1 ≤ u ≤ T2, and the above equation can be
rewritten as (3.2).
References
M. N. Chang and G. L. Yang, ‘‘Strong consistency of a non-parametric estimator of the survival function with
doubly censored data,’’ Ann. Statist. vol. 15, pp. 1536–1547, 1987.
M. N. Chang, ‘‘Weak convergence of a self-consistent estimator of the survival function with doubly censored
data,’’ Ann. Statist. vol. 18, pp. 391–404, 1990.
A. P. Dempster, N. M. Laird, and D. B. Rubin, ‘‘Maximum likelihood from incomplete data via the EM algorithm
(with discussion),’’ J. Roy. Statist. Soc. Ser. B vol. 39, pp. 1–38, 1977.
R. Gentleman and R. Ihaka, ‘‘R: A Language for data analysis and graphics,’’ J. of Computational and Graphical
Statistics vol. 5, pp. 299–314, 1996.
B. Efron, ‘‘The two sample problem with censored data,’’ Proc. Fifth Berkeley Symp. Math. Statist. Probab. vol. 4,
pp. 831–883, 1967.
R. Gill, ‘‘Non- and semi-parametric maximum likelihood estimator and the Von Mises method (I),’’ Scand. J.
Statist. vol. 16, pp. 97–128, 1989.
B. A. Hamburg, H. C. Kraemer, and W. Jahnke, ‘‘A hierarchy of drug use in adolescence: behavioral and
attitudinal correlates of substantial drug use,’’ American Journal of Psychiatry vol. 132, pp. 1155–1163, 1975.
E. Kaplan and P. Meier, ‘‘Non-parametric estimation from incomplete observations,’’ J. Amer. Statist. Assoc.
vol. 53, pp. 457–481, 1958.
S. Murphy and Van der Vaart, ‘‘Semi-parametric likelihood ratio inference,’’ Ann. Statist. vol. 25, pp. 1471–1509,
1997.
P. A. Mykland and J. J. Ren, ‘‘Algorithms for Computing Self-Consistent and Maximum Likelihood Estimators
with Doubly Censored Data,’’ Ann. Statist. vol. 24, pp. 1740–1764, 1996.
A. Owen, ‘‘Empirical likelihood ratio confidence intervals for a single functional,’’ Biometrika vol. 75,
pp. 237–249, 1988.
A. Owen, Empirical Likelihood, Chapman & Hall: London, 2001.
W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes in C: The Art of Scientific Computing,
Cambridge Univ. Press, Chapter 18.1, pp. 791–794, 1993.
B. W. Turnbull, ‘‘Non-parametric estimation of a survivorship function with doubly censored data,’’ J. Amer.
Statist. Assoc. vol. 69, pp. 169–173, 1974.
B. W. Turnbull, ‘‘The empirical distribution function with arbitrarily grouped, censored and truncated data,’’
J. Roy. Statist. Soc. Ser. B vol. 38, pp. 290–295, 1976.
B. W. Turnbull and Weiss, ‘‘A likelihood ratio statistic for testing goodness of fit with randomly censored data,’’
Biometrics vol. 34, pp. 367–375, 1978.
W. Y. Tsai and J. Crowley, ‘‘A large sample study of generalized maximum likelihood estimator from incomplete
data via self-consistency,’’ Ann. Statist. vol. 13, no. 4, pp. 1317–1334, 1985.
Y. Zhan and J. Wellner, ‘‘A hybrid algorithm for computation of the NPMLE from censored data,’’ J. Amer.
Statist. Assoc. vol. 92, pp. 945–959, 1999.
M. Zhou, http://lib.stat.cmu.edu/s/d009newr, 1995.
M. Zhou, L. Lee, and K. Chen, http://cran.r-project.org as contributed package dblcens, 2001.