An optimal completion of the product limit estimator
ARTICLE IN PRESS
Statistics & Probability Letters 76 (2006) 913–919
www.elsevier.com/locate/stapro
An optimal completion of the product limit estimator
Zhiqiang Chen*,1, Eswar Phadia
Department of Mathematics, The William Paterson University of New Jersey, Wayne, NJ 07470, USA
Received 4 June 2003; accepted 7 September 2005
Available online 14 November 2005
Abstract
It is well known that the product limit estimator is undefined beyond the largest observation if it is censored. Some
completion methods are suggested in the literature (see e.g. [Efron, 1967. The two sample problem with censored data.
Proceedings of the 5th Berkeley Symposium] and [Gill, 1980. Censoring and stochastic integrals. Mathematical Centre
Tract No. 124, Mathematisch Centrum, Amsterdam]). In this note, we propose a completion method that is optimal in the
sense that the expected value of the integrated squared error loss function is minimized. This method yields an estimator
that falls between the above two extremes and possesses the same large sample properties. New bounds for the biases are
also derived for the above-mentioned cases.
© 2005 Elsevier B.V. All rights reserved.
MSC: primary 62C15; 62D05
Keywords: Bias bound; Censored data; Kaplan–Meier estimator; Loss function; Proportional hazard model
1. Introduction
The product limit (PL) estimator introduced by Kaplan and Meier (1958) has been used extensively in practice even though it is known that the estimator is undefined beyond the last observation (in the sense of ordered observations) if it happens to be a right censored observation. To overcome this shortfall, Efron (1967) suggested that it be defined as zero beyond the last observation, whether it is censored or not. He termed such an estimator self-consistent. Gill (1980), on the other hand, recommended that it be defined as the value of the estimator at the last observation. Some other completion methods have been suggested in the literature without adequate justification. In this note, we propose a completion method that is optimal in the sense that the expected value of the integrated squared error loss function is minimized. In our treatment, we use the exact formula of the kth moment derived by Phadia and Shao (1999). This approach yields an estimator that falls between the above two extremes but has the same large sample properties. New bounds for biases are also derived. Although our focus is on the proportional hazard model, a general approach is adopted.
0167-7152/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.spl.2005.10.023
*Corresponding author. Tel.: +1 973 720 3382; fax: +1 973 720 3622.
E-mail address: [email protected] (Z. Chen).
1 Partially supported by CFR, College of Science and Health, William Paterson University.
The organization of the paper is as follows. In Section 2 we introduce some notations and preliminaries. The main result is presented in Section 3. Bias bounds are then presented in Section 4.
2. Notations and preliminaries
Let $X_i$, $i = 1, 2, \ldots, n$, be an i.i.d. sample from $F$, and let $Y_i$, $i = 1, 2, \ldots, n$, i.i.d. with distribution $G$, be the censoring variables, independent of the $X_i$. As in the context of survival analysis, we observe $Z_i = \min\{X_i, Y_i\}$ and $\delta_i = I(X_i \le Y_i)$, $i = 1, 2, \ldots, n$. We assume that both $F$ and $G$ are continuous and defined on the interval $[0, \infty)$. If we further assume that $\delta_i$ and $Z_i$ are independent, we have the proportional hazard model. Kaplan and Meier (1958) introduced the PL estimator for the survival function $S(t) = 1 - F(t)$, defined as

$$\hat S(t) = \prod_{i=1}^{n-1} \left(1 - \frac{\delta_{i:n}}{n-i+1}\right)^{I(Z_{i:n} \le t)},$$
where $Z_{i:n}$ denotes the $i$th ordered observation among the $Z_i$'s and $\delta_{i:n}$ corresponds to $Z_{i:n}$. However, for $t > Z_{n:n}$ and $\delta_{n:n} = 0$, the estimator was undefined. Efron (1967) introduced the notion of self-consistency and suggested that it be defined as 0, while Gill (1980) defined it as

$$(1 - \delta_{n:n}) \prod_{i=1}^{n-1} \left(1 - \frac{\delta_{i:n}}{n-i+1}\right)^{I(Z_{i:n} \le t)}, \quad \text{for } t > Z_{n:n}.$$
It is well known that $E\hat S_E(t) \le S(t) \le E\hat S_G(t)$, where $\hat S_E$ and $\hat S_G$ stand for Efron's and Gill's versions, respectively. Since the survival function is monotone, any sensible modification of the last term of the PL estimator will be

$$\hat S_c(t) = c\,(1 - \delta_{n:n}) \prod_{i=1}^{n-1} \left(1 - \frac{\delta_{i:n}}{n-i+1}\right)^{I(Z_{i:n} \le t)}, \quad \text{for } t > Z_{n:n},$$
where $c$ will be a quantity between 0 and 1. Clearly, the extreme values $c = 0$ and $c = 1$ yield Efron's and Gill's versions, respectively. The goal of this article is to suggest a completion which minimizes the mean squared error loss $E(\hat S - S)^2 = E(\hat F - F)^2$. But instead of point-wise consideration, we will consider the loss function $\int_0^\infty (\hat F(t) - F(t))^2\,dF(t)$ and try to determine a constant $c$ which minimizes

$$E\int_0^\infty (\hat F(t) - F(t))^2\,dF(t) = -\int_0^\infty E(\hat S - S)^2\,dS.$$

The results of Phadia and Shao (1999) on the exact formula for $E\hat S^k(t)$ will be used.
For this purpose, let $H_1(t) = P(Z \le t, X > Y) = \int_0^t S\,dG$, $H_2(t) = P(Z \le t, X \le Y) = \int_0^t \bar G\,dF$, and $H(t) = P(Z \le t)$, where $\bar G = 1 - G$. For a positive integer $k$, define $\phi_j^{(k)}(t) = H_1(t) + ((n-j)/(n-j+1))^k H_2(t)$. For $i = 1, 2, \ldots, n$, denote $Z_{i:n} = Z_i$, so that $Z_1 \le Z_2 \le \cdots \le Z_n$ are the ordered observations and $\delta_i$ corresponds to $Z_i$. Then Phadia and Shao (1999) showed that
$$E\hat S^k(t) = \sum_i \binom{n}{i} \bar H^{\,n-i}\; i! \int I(z_i < t) \prod_{j=1}^{i} d\big(\phi_j^{(k)}(z_j)\big),$$
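where $\bar H = 1 - H$ and the summation extends to $n - 1$ for Efron's version and to $n$ for Gill's version. They also used a linear approximation for the integral and obtained

$$i! \int I(z_i < t) \prod_{j=1}^{i} d\big(\phi_j^{(k)}(z_j)\big) \doteq \prod_{j=1}^{i} \phi_j^{(k)}(t), \tag{2.1}$$

yielding $E\hat S^k(t) = \sum_i \binom{n}{i} \bar H^{\,n-i} \prod_{j=1}^{i} \phi_j^{(k)}(t)$.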
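As an illustration of the completed estimator $\hat S_c$ defined in this section, the following minimal Python sketch evaluates it at a point $t$ from the observed pairs $(Z_i, \delta_i)$. The function name `pl_completed` and its arguments are hypothetical choices made for this illustration and do not come from the paper; the sketch assumes there are no ties among the $Z_i$, in line with the continuity of $F$ and $G$.

```python
def pl_completed(z, delta, t, c=1.0):
    """Product limit estimate of S(t) with completion constant c.

    z     : observed times Z_i = min(X_i, Y_i)
    delta : indicators, 1 if the observation is uncensored, else 0
    c     : completion constant in [0, 1] (hypothetical argument name);
            c = 0 gives Efron's version, c = 1 gives Gill's version.
    Assumes no ties among the observed times.
    """
    n = len(z)
    pairs = sorted(zip(z, delta))          # order the observations by time
    surv = 1.0
    # Factors for the first n - 1 ordered observations.
    for rank, (z_i, d_i) in enumerate(pairs[:-1], start=1):
        if z_i <= t:
            surv *= 1.0 - d_i / (n - rank + 1)
    z_last, d_last = pairs[-1]
    if d_last == 1:
        # Largest observation uncensored: the estimate drops to 0 there.
        if z_last <= t:
            surv = 0.0
    elif t > z_last:
        # Largest observation censored: scale the last value by c.
        surv *= c
    return surv
```

With `c=0.0` the value beyond a censored largest observation is zero, and with `c=1.0` it stays at the last value of the product, so the two extreme versions discussed above are both covered by the same function.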
3. Optimal completion results
We are now ready to derive the main result.
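Theorem 1. The critical value of $c$ such that the corresponding completion minimizes $E\int_0^\infty (\hat F(t) - F(t))^2\,dF(t)$ is

$$c = \frac{\int_0^\infty S(t)\left[\int I(z_n < t) \prod_{j=1}^{n} d\big(\phi_j^{(1)}(z_j)\big)\right] dS(t)}{\int_0^\infty \left[\int I(z_n < t) \prod_{j=1}^{n} d\big(\phi_j^{(2)}(z_j)\big)\right] dS(t)}, \tag{3.1}$$

which is approximately

$$\frac{\int_0^\infty S(t) \prod_{j=1}^{n} \phi_j^{(1)}(t)\,dS(t)}{\int_0^\infty \prod_{j=1}^{n} \phi_j^{(2)}(t)\,dS(t)}.$$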
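Proof. Differentiating the loss function with respect to $c$, we get

$$\frac{d}{dc}\left\{-\int_0^\infty E(\hat S(t) - S(t))^2\,dS(t)\right\} = \int_0^\infty 2S(t)\,\frac{d}{dc}\{E(\hat S(t))\}\,dS(t) - \int_0^\infty \frac{d}{dc}\{E(\hat S^2(t))\}\,dS(t).$$

Since

$$\int_0^\infty 2S(t)\,\frac{d}{dc}\{E(\hat S(t))\}\,dS(t) = 2\int_0^\infty S(t)\, n! \int I(z_n < t) \prod_{j=1}^{n} d\big(\phi_j^{(1)}(z_j)\big)\,dS(t)$$

and

$$\int_0^\infty \frac{d}{dc}\{E(\hat S^2(t))\}\,dS(t) = 2c\int_0^\infty n! \int I(z_n < t) \prod_{j=1}^{n} d\big(\phi_j^{(2)}(z_j)\big)\,dS(t),$$

we get the critical value

$$c = \frac{\int_0^\infty S(t)\left[\int I(z_n < t) \prod_{j=1}^{n} d\big(\phi_j^{(1)}(z_j)\big)\right] dS(t)}{\int_0^\infty \left[\int I(z_n < t) \prod_{j=1}^{n} d\big(\phi_j^{(2)}(z_j)\big)\right] dS(t)} > 0.$$

The approximate value is obtained trivially by making the substitution given in (2.1). □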
Since $E\int_0^\infty (\hat F_c(t) - F(t))^2\,dF(t) = -\int_0^\infty E(\hat S_c - S)^2\,dS$ is a quadratic function of $c$, we see from the above proof that, if $c \in (0, 1]$, then $\hat S_c$ is the optimal completion. However, if $c > 1$, Gill's version is optimal, because the constant $c$ has to lie between 0 and 1 for the completion to make sense. It is believed that the above critical value $c$ is between 0 and 1, but it is rather difficult to prove this in general. However, it is true for the proportional hazard model, as shown in Corollary 3. In any case, we define the completion to be optimal for values of $c$ in the interval $(0, 1]$.
Remark 1. Our optimal completion of the PL estimator and the two versions mentioned above differ only in the value beyond the last observation. In fact they are all very close and the differences tend to zero exponentially fast. This can be seen for instance as follows (see Phadia and Van Ryzin, 1980). With $E_i = \{Z_i \le u < Z_{i+1}\}$ and $Z_{n+1} = \infty$, we have

$$E\{|\hat S_E(t) - \hat S_c(t)|^2\} = E\{|\hat S_E(t) - \hat S_c(t)|^2 \mid E_n\} \cdot P(E_n).$$

The first factor on the right-hand side is finite and, with $\bar H = 1 - H$,

$$P(E_n) = P(Z_n \le u) = (1 - \bar H(u))^n = \exp[-n \ln((1 - \bar H(u))^{-1})] \to 0 \quad \text{if } \bar H(u) > 0.$$

Thus all of their asymptotic properties should be the same. Similar results hold for $\hat S_G$.
Under the proportional hazard model, $Z$ and $\delta$ are independent and therefore $H_2(t) = P(Z \le t, \delta = 1) = P(\delta = 1)P(Z \le t) = \gamma H(t)$ and $H_1(t) = (1 - \gamma)H(t)$, where $\gamma = P(\delta = 1) = P(X \le Y)$. The next theorem gives a neat expression for $c$.
Theorem 2. Under the proportional hazard model, the critical value of $c$ in (3.1) reduces to

$$c = \frac{1}{2} \prod_{i=1}^{n} \left(\frac{i(i - \gamma)(i + \gamma)}{(i + 2\gamma)[(i - \gamma)^2 + \gamma(1 - \gamma)]}\right). \tag{3.2}$$
Proof. By substituting the values of $H_1(t)$ and $H_2(t)$, it is easy to see that the constant $c$ derived in Theorem 1 reduces to

$$c = \frac{\int_0^\infty S(t)H^n(t)\,dS(t)}{\int_0^\infty H^n(t)\,dS(t)} \prod_{j=1}^{n} \frac{(1 - \gamma) + \frac{n-j}{n-j+1}\gamma}{(1 - \gamma) + \left(\frac{n-j}{n-j+1}\right)^2 \gamma}.$$
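To evaluate $\int_0^\infty H^n(t)\,dS(t)$ and $\int_0^\infty S(t)H^n(t)\,dS(t)$, note that $S\bar G = 1 - H$ and $d\bar H = \frac{1}{\gamma}\bar G\,dS$, where $\bar H = 1 - H$. Now using the technique of integration by parts, we have

$$\begin{aligned}
\int_0^\infty H^n(t)\,dS(t) &= \int_0^\infty n S(t) H^{n-1}(t)\,d\bar H(t) \\
&= \frac{n}{\gamma} \int_0^\infty H^{n-1}(t) S(t) \bar G(t)\,dS(t) \\
&= \frac{n}{\gamma} \left[\int_0^\infty H^{n-1}(t)\,dS(t) - \int_0^\infty H^n(t)\,dS(t)\right].
\end{aligned}$$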
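Therefore by induction,

$$\begin{aligned}
\int_0^\infty H^n(t)\,dS(t) &= \frac{n}{n + \gamma} \int_0^\infty H^{n-1}(t)\,dS(t) \\
&= \frac{n}{n + \gamma} \cdot \frac{n - 1}{(n - 1) + \gamma} \int_0^\infty H^{n-2}(t)\,dS(t) \\
&= \frac{n!}{\prod_{i=1}^{n} (i + \gamma)} \int_0^\infty dS(t) \\
&= -\frac{n!}{\prod_{i=1}^{n} (i + \gamma)}.
\end{aligned}$$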
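Similarly,

$$\begin{aligned}
\int_0^\infty S(t)H^n(t)\,dS(t) &= \frac{1}{2} \int_0^\infty S^2(t)\, n H^{n-1}(t)\,d\bar H(t) \\
&= \frac{n}{2\gamma} \int_0^\infty S(t) H^{n-1}(t) S(t) \bar G(t)\,dS(t) \\
&= \frac{n}{2\gamma} \int_0^\infty S(t)[H^{n-1}(t) - H^n(t)]\,dS(t),
\end{aligned}$$

and by induction, we obtain

$$\begin{aligned}
\int_0^\infty S(t)H^n(t)\,dS(t) &= \frac{n}{n + 2\gamma} \int_0^\infty S H^{n-1}\,dS \\
&= \frac{n!}{\prod_{i=1}^{n} (i + 2\gamma)} \int_0^\infty S\,dS \\
&= -\frac{n!}{2\prod_{i=1}^{n} (i + 2\gamma)}.
\end{aligned}$$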
Hence

$$\frac{\int_0^\infty S(t)H^n(t)\,dS(t)}{\int_0^\infty H^n(t)\,dS(t)} = \frac{1}{2} \prod_{i=1}^{n} \frac{i + \gamma}{i + 2\gamma}.$$
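The two closed forms just obtained can be checked numerically. The sketch below is an illustration under an added assumption: it takes the proportional hazard model in the form $\bar H = S^{1/\gamma}$ (equivalently $\bar G = S^{\beta}$ with $\gamma = 1/(1+\beta)$), so that the substitution $s = S(t)$ turns $-\int_0^\infty H^n\,dS$ into $\int_0^1 (1 - s^{1/\gamma})^n\,ds$. The helper names `midpoint` and `check_integrals` are hypothetical.

```python
import math


def midpoint(f, m=20_000):
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    h = 1.0 / m
    return h * sum(f((k + 0.5) * h) for k in range(m))


def check_integrals(gamma, n):
    """Compare numeric and closed-form values of the two integrals.

    Assumes H = 1 - S**(1/gamma) (proportional hazard model) and works on
    the scale s = S(t). Returns ((numeric, closed), (numeric, closed)) for
    -int H^n dS and -int S H^n dS, both of which are positive.
    """
    def h_n(s):
        return (1.0 - s ** (1.0 / gamma)) ** n

    num1 = midpoint(h_n)
    num2 = midpoint(lambda s: s * h_n(s))
    fact = math.factorial(n)
    closed1 = fact / math.prod(i + gamma for i in range(1, n + 1))
    closed2 = fact / (2.0 * math.prod(i + 2.0 * gamma for i in range(1, n + 1)))
    return (num1, closed1), (num2, closed2)
```

For example, `check_integrals(0.5, 3)` compares the quadrature values with $6/13.125$ and $1/8$, the closed forms for $n = 3$ and $\gamma = 0.5$.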
Now we will simplify $\prod_{j=1}^{n} \dfrac{(1 - \gamma) + \frac{n-j}{n-j+1}\gamma}{(1 - \gamma) + \left(\frac{n-j}{n-j+1}\right)^2 \gamma}$:

$$\begin{aligned}
\prod_{j=1}^{n} \frac{(1 - \gamma) + \frac{n-j}{n-j+1}\gamma}{(1 - \gamma) + \left(\frac{n-j}{n-j+1}\right)^2 \gamma}
&= \prod_{j=1}^{n} \frac{(n-j+1)^2(1 - \gamma) + (n-j+1)(n-j)\gamma}{(n-j+1)^2(1 - \gamma) + (n-j)^2\gamma} \\
&= \prod_{i=1}^{n} \frac{i^2(1 - \gamma) + i(i-1)\gamma}{i^2(1 - \gamma) + (i-1)^2\gamma} \\
&= \prod_{i=1}^{n} \frac{i(i - \gamma)}{(i - \gamma)^2 + \gamma(1 - \gamma)},
\end{aligned}$$

where in the second equality above, we substituted $n - j + 1 = i$. Putting together these terms we get (3.2). □
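A small simulation can serve as a sanity check of (3.2). The sketch below is illustrative only and rests on assumptions not in the paper: lifetimes and censoring times are exponential with rates `a` and `b` (so $\gamma = a/(a+b)$ and the proportional hazard model holds), and the minimizer is written in a per-sample form obtained by our own algebra. When the largest observation is censored, the part of the loss beyond $Z_{n:n}$ is $\int_0^{s}(cP - u)^2\,du$ with $P$ the product over the first $n-1$ observations and $s = S(Z_{n:n})$, which gives $c = E[P s^2] / (2E[P^2 s])$, the expectations being restricted to samples whose largest observation is censored. The name `monte_carlo_c` is hypothetical.

```python
import math
import random


def monte_carlo_c(a, b, n, reps, seed=0):
    """Monte Carlo estimate of the loss-minimizing completion constant.

    Assumption for illustration: X ~ Exp(rate a), Y ~ Exp(rate b), hence
    S(t) = exp(-a t) and gamma = a / (a + b).
    """
    rng = random.Random(seed)
    num = 0.0
    den = 0.0
    for _ in range(reps):
        obs = []
        for _ in range(n):
            x = rng.expovariate(a)
            y = rng.expovariate(b)
            obs.append((min(x, y), 1 if x <= y else 0))
        obs.sort()
        z_last, d_last = obs[-1]
        if d_last == 1:
            continue  # estimator is 0 beyond Z_{n:n}; c plays no role
        # Product over the first n - 1 ordered observations.
        p = 1.0
        for idx, (_, d) in enumerate(obs[:-1]):
            p *= 1.0 - d / (n - idx)
        s = math.exp(-a * z_last)  # S evaluated at the largest observation
        num += p * s * s
        den += 2.0 * p * p * s
    return num / den
```

For `a = b = 1` and `n = 3`, formula (3.2) with $\gamma = 0.5$ evaluates to $0.375 \times 26.25/26 \approx 0.3786$, and `monte_carlo_c(1.0, 1.0, 3, 100_000)` should land close to that value up to simulation noise.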
Thus, the optimality constant $c$ depends on the proportional hazard rate $\gamma = P(\delta = 1)$. Since $\delta$ is observable, $\gamma$, and hence $c$ and $\hat S_c$, can easily be estimated from the given data.
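A minimal sketch of this plug-in step follows. It assumes, as one natural choice that the paper does not spell out, that $\gamma$ is estimated by the observed fraction of uncensored cases; the names `optimal_c` and `plug_in_c` are hypothetical. The $i = 1$ factor of (3.2) is written in its cancelled form $(1+\gamma)/(1+2\gamma)$ so that the function also works at $\gamma = 1$.

```python
def optimal_c(gamma, n):
    """Evaluate formula (3.2) for 0 <= gamma <= 1 and sample size n >= 1."""
    # i = 1 factor after cancelling (1 - gamma); includes the leading 1/2.
    c = 0.5 * (1.0 + gamma) / (1.0 + 2.0 * gamma)
    for i in range(2, n + 1):
        q = (i - gamma) ** 2 + gamma * (1.0 - gamma)
        c *= i * (i - gamma) * (i + gamma) / ((i + 2.0 * gamma) * q)
    return c


def plug_in_c(delta):
    """Estimate gamma from the censoring indicators and plug it into (3.2).

    The sample proportion of uncensored observations is an assumed
    estimator of gamma, chosen here for illustration.
    """
    n = len(delta)
    gamma_hat = sum(delta) / n
    return gamma_hat, optimal_c(gamma_hat, n)
```

For instance, `optimal_c(0.5, 1)` and `optimal_c(0.5, 2)` both equal 0.375, because the $i = 2$ factor of (3.2) is exactly 1 at $\gamma = 0.5$.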
Our next corollary shows that $c$ is within the $(0, 1)$ range, as we expected, and therefore $\hat S_c$ is optimal.
Corollary 3. Under the proportional hazard model, the optimality constant $c$, a function of $\gamma$, satisfies

$$\frac{1}{4} \cdot \frac{4 - \gamma^2}{4 - \gamma^2 + 5\gamma(1 - \gamma)} < c < \frac{4 - \gamma^2}{4 - \gamma^2 + 5\gamma(1 - \gamma)},$$

and hence $c \in (0, 1)$.
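Proof. Write

$$\begin{aligned}
c(\gamma) &= \frac{(1 + \gamma)}{2(1 + 2\gamma)} \cdot \frac{2(2 - \gamma)(2 + \gamma)}{(2 + 2\gamma)(4 - 3\gamma)} \cdot \prod_{i=3}^{n} \frac{i(i - \gamma)(i + \gamma)}{(i + 2\gamma)[(i - \gamma)^2 + \gamma(1 - \gamma)]} \\
&= \frac{4 - \gamma^2}{4 - \gamma^2 + 5\gamma(1 - \gamma)} \cdot \frac{1}{2} \prod_{i=3}^{n} \frac{i(i - \gamma)(i + \gamma)}{(i + 2\gamma)[(i - \gamma)^2 + \gamma(1 - \gamma)]} \\
&= \frac{4 - \gamma^2}{4 - \gamma^2 + 5\gamma(1 - \gamma)} \cdot f(\gamma), \quad \text{say.}
\end{aligned}$$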
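Denote

$$L(\gamma) \triangleq \frac{1}{2} \cdot \prod_{i=3}^{n} \frac{i(i - 2\gamma)}{(i - \gamma)^2 + \gamma(1 - \gamma)} \quad \text{and} \quad U(\gamma) \triangleq \frac{1}{2} \cdot \prod_{i=3}^{n} \frac{i(i + \gamma)}{(i + 2\gamma)(i - \gamma)}.$$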
Then we have

$$L(\gamma) \le f(\gamma) = \frac{1}{2} \prod_{i=3}^{n} \frac{i(i - \gamma)(i + \gamma)}{(i + 2\gamma)[(i - \gamma)^2 + \gamma(1 - \gamma)]} \le U(\gamma).$$
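Simple calculation shows that

$$[\ln 2U(\gamma)]' = \sum_{i=3}^{n} \left[\frac{1}{i + \gamma} - \frac{2}{i + 2\gamma} + \frac{1}{i - \gamma}\right] = \sum_{i=3}^{n} \frac{2\gamma(2i + \gamma)}{(i - \gamma)(i + \gamma)(i + 2\gamma)} > 0,$$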
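so $U(\gamma)$ is an increasing function of $\gamma$ when $\gamma \in (0, 1)$; therefore

$$f(\gamma) \le U(\gamma) \le U(1) = \frac{1}{2} \prod_{i=3}^{n} \frac{i(i + 1)}{(i + 2)(i - 1)} = \frac{1}{2} \cdot \frac{n}{2} \cdot \frac{4}{n + 2} = \frac{n}{n + 2} < 1.$$

Similarly, $[\ln 2L(\gamma)]' < 0$, so $f(\gamma) \ge L(\gamma) \ge L(1) = n/(4(n - 1)) > \frac{1}{4}$. □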
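The chain of inequalities in this proof can be spot-checked numerically. The following sketch evaluates $L(\gamma)$, $f(\gamma)$ and $U(\gamma)$ as defined above for a given $n \ge 3$; the function name `tail_products` is a hypothetical choice, and the check is a numerical illustration on a grid, not a substitute for the proof.

```python
def tail_products(gamma, n):
    """Return (L, f, U) from the proof of Corollary 3.

    Each quantity is 1/2 times a product over i = 3, ..., n; for n < 3 the
    products are empty and all three values equal 1/2.
    """
    low = mid = up = 0.5
    for i in range(3, n + 1):
        q = (i - gamma) ** 2 + gamma * (1.0 - gamma)
        low *= i * (i - 2.0 * gamma) / q
        mid *= i * (i - gamma) * (i + gamma) / ((i + 2.0 * gamma) * q)
        up *= i * (i + gamma) / ((i + 2.0 * gamma) * (i - gamma))
    return low, mid, up
```

At $\gamma = 1$ the outer products telescope, so `tail_products(1.0, n)` returns $L(1) = n/(4(n-1))$ and $U(1) = n/(n+2)$, the two constants used at the end of the proof.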
4. Bias bounds
Let $\hat S_E(t)$ and $\hat S_G(t)$ be as defined earlier, Efron's and Gill's completions, respectively, and let $\hat S_c(t)$ be the optimal completion considered in this article. Further, let $B(t) = E\hat S(t) - S(t)$ be the bias, with a suffix indicating the bias of a particular completion. Efron (1967) showed that $E\hat S_E(t) \le S(t)$, and Klein (1991) proved that $S(t) \le E\hat S_G(t)$. A special case of Zhou (1988) shows that $|B_G| \le \int_0^t H^n(t)\,dF(t)$. We now give a different bias bound.
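Proposition 4. For both Efron's and Gill's completion, the bound on bias $B$ is

$$|B| < n! \int I(z_n < t) \prod_{j=1}^{n} d\big(\phi_j^{(1)}(z_j)\big) \approx \prod_{j=1}^{n} \left(H(t) - \frac{1}{j} H_2(t)\right),$$

and for the completion discussed in this article, the bound is

$$|B_c| \le \max\{c, 1 - c\}\, n! \int I(z_n < t) \prod_{j=1}^{n} d\big(\phi_j^{(1)}(z_j)\big) \approx \max\{c, 1 - c\} \prod_{j=1}^{n} \left(H(t) - \frac{1}{j} H_2(t)\right).$$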
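Proof. We have

$$0 < B_G = E\hat S_E(t) - S(t) + n! \int I(z_n < t) \prod_{j=1}^{n} d\big(\phi_j^{(1)}(z_j)\big) < n! \int I(z_n < t) \prod_{j=1}^{n} d\big(\phi_j^{(1)}(z_j)\big),$$

and

$$0 > B_E = E\hat S_E(t) - S(t) > -n! \int I(z_n < t) \prod_{j=1}^{n} d\big(\phi_j^{(1)}(z_j)\big).$$

Therefore, $|B| < n! \int I(z_n < t) \prod_{j=1}^{n} d\big(\phi_j^{(1)}(z_j)\big)$ for both Efron's and Gill's versions. Now

$$B_c = E\hat S_G(t) - S(t) + (c - 1)\, n! \int I(z_n < t) \prod_{j=1}^{n} d\big(\phi_j^{(1)}(z_j)\big) \ge -(1 - c)\, n! \int I(z_n < t) \prod_{j=1}^{n} d\big(\phi_j^{(1)}(z_j)\big),$$

and

$$B_c = E\hat S_E(t) - S(t) + c\, n! \int I(z_n < t) \prod_{j=1}^{n} d\big(\phi_j^{(1)}(z_j)\big) \le c\, n! \int I(z_n < t) \prod_{j=1}^{n} d\big(\phi_j^{(1)}(z_j)\big).$$

So, $|B_c| \le \max\{c, 1 - c\}\, n! \int I(z_n < t) \prod_{j=1}^{n} d\big(\phi_j^{(1)}(z_j)\big)$. □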
In the case of the proportional hazard model,

$$n! \int I(z_n < t) \prod_{j=1}^{n} d\big(\phi_j^{(1)}(z_j)\big) = H^n(t) \prod_{i=1}^{n} \left(1 - \frac{1}{i}\gamma\right)$$

(see Chen et al., 1982); therefore we have the following.
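Corollary 5. In the case of the proportional hazard model, the bounds on the biases reduce to

$$|B| < H^n(t) \prod_{i=1}^{n} \left(1 - \frac{1}{i}\gamma\right) \quad \text{and} \quad |B_c| \le \max\{c, 1 - c\}\, H^n(t) \prod_{i=1}^{n} \left(1 - \frac{1}{i}\gamma\right).$$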
The above bias bounds are easy to obtain. However, they are new to the best of our knowledge. To see how good these bounds are, we compare them with Zhou's (1988) result. In the case of the proportional hazard model, the earlier computation shows that $\int_0^t H^n(x)\,dF(x) \to n!/\prod_{i=1}^{n}(i + \gamma)$ as $t \to \infty$. Since $H^n(t)\prod_{i=1}^{n}(1 - (1/i)\gamma) \to (1/n!)\prod_{i=1}^{n}(i - \gamma)$ as $t \to \infty$, and clearly $(1/n!)\prod_{i=1}^{n}(i - \gamma) < n!/\prod_{i=1}^{n}(i + \gamma)$ when $\gamma \in (0, 1)$, the new bias bound given above is sharper for $t \to \infty$. The question of whether the new bound is always better (that is, whether $\int_0^t H^n(t)\,dF(t) \ge n! \int I(z_n < t) \prod_{j=1}^{n} d\big(\phi_j^{(1)}(z_j)\big)$ for all $t > 0$) is still open.
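To get a feel for the size of the gap at $t \to \infty$, the two limits can be tabulated. The sketch below implements the bounds of Corollary 5 and the limit of Zhou's bound under the proportional hazard model; the function names `new_bound`, `completed_bound` and `zhou_limit` are hypothetical and chosen only for this illustration.

```python
import math


def new_bound(h_t, gamma, n):
    """Corollary 5 bound for Efron's and Gill's versions.

    h_t is the value H(t) in [0, 1]; the bound is H(t)^n * prod(1 - gamma/i).
    """
    return h_t ** n * math.prod(1.0 - gamma / i for i in range(1, n + 1))


def completed_bound(h_t, gamma, n, c):
    """Corollary 5 bound for the completion with constant c."""
    return max(c, 1.0 - c) * new_bound(h_t, gamma, n)


def zhou_limit(gamma, n):
    """Limit of Zhou's bound as t -> infinity: n! / prod(i + gamma)."""
    return math.factorial(n) / math.prod(i + gamma for i in range(1, n + 1))
```

Setting `h_t = 1.0` gives the limiting value $(1/n!)\prod_{i=1}^{n}(i - \gamma)$ of the new bound. For $n = 1$ and $\gamma = 0.5$ this is $0.5$, against $1/1.5 \approx 0.667$ for `zhou_limit(0.5, 1)`.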
References
Chen, Y.Y., Hollander, M., Langberg, N.A., 1982. Small-sample results for the Kaplan–Meier estimator. J. Amer. Statist. Assoc. 77,
141–144.
Efron, B., 1967. The two sample problem with censored data. Proceedings of the 5th Berkeley Symposium vol. 4, pp. 831–852.
Gill, R.D., 1980. Censoring and stochastic integrals. Mathematical Centre Tract No. 124. Mathematisch Centrum, Amsterdam.
Kaplan, E.L., Meier, P., 1958. Nonparametric estimation from incomplete observations. J. Amer. Statist. Assoc. 53, 457–481.
Klein, J.P., 1991. Small sample moments of some estimators of the variance of the Kaplan–Meier and Nelson–Aalen estimators. Scand. J.
Statist. 18, 333–340.
Phadia, E., Shao, Y., 1999. Exact moments of the product limit estimator. Statist. Probab. Lett. 41, 277–286.
Phadia, E.G., Van Ryzin, J., 1980. A note on convergence rates for the product limit estimator. Ann. Statist. 8, 673–678.
Zhou, M., 1988. Two sided bias bounds of the Kaplan–Meier estimator. Probab. Theory Related Fields 79, 165–173.