  • Maximum Likelihood Estimation for

    Type I Censored Weibull Data

    Including Covariates

    Fritz Scholz∗

    Mathematics & Computing Technology, Boeing Phantom Works

    August 22, 1996

    Revised January 5, 2001¹

    ∗The author wishes to thank James King for several inspiring conversations concerning the uniqueness theorem presented in Section 2, Siavash Shahshahani for showing him more than 30 years ago that it follows from Morse Theory, John Betts for overcoming programming difficulties in using HDNLPR, and Bill Meeker for helpful comments on computational aspects.

    ¹The revision mainly corrects misprints, some notational inconsistencies, and indicates the new calling sequence for compiling the Fortran program.

  • Abstract

    This report deals with the specific problem of deriving maximum likelihood estimates of the regression model parameters when the residual errors are governed by a Gumbel distribution. As an additional complication the observed responses are permitted to be type I or multiply censored. Since the log-transform of a 2-parameter Weibull random variable has a Gumbel distribution, the results extend to Weibull regression models, where the log of the Weibull scale parameter is modeled linearly as a function of covariates. In the Weibull regression model the covariates thus act as multiplicative modifiers of the underlying scale parameter.

    A general theorem for establishing a unique global maximum of a smooth function is presented. The theorem was previously published by Mäkeläinen et al. (1981) with a sketch of a proof. The proof presented here is much shorter than their unpublished proof.

    Next, the Gumbel/Weibull regression model is introduced together with its censoring mechanism. Using the above theorem the existence and uniqueness of maximum likelihood estimates for the posed specific Weibull/Gumbel regression problem for type I censored responses is characterized in terms of sufficient and easily verifiable conditions, which are conjectured to be also necessary.

    As part of an efficient optimization algorithm for finding these maximum likelihood estimates it is useful to have good starting values. These are found by adapting the iterative least squares algorithm of Schmee and Hahn (1979) to the Gumbel/Weibull case. FORTRAN code for computing the maximum likelihood estimates was developed using the optimization routine HDNLPR. Some limited experience of this algorithm with simulated data is presented as well as the results for a specific example from Gertsbakh (1989).

  • 1 Introduction

    In the theory of maximum likelihood estimation it is shown, subject to regularity conditions, that the likelihood equations have a consistent root. The problems that arise in identifying the consistent root among possibly several roots were discussed by Lehmann (1980). It is therefore of interest to establish, whenever possible, that the likelihood equations have a unique root. For example, for exponential family distributions it is easily shown, subject to mild regularity conditions, that the log-likelihood function is strictly concave, which in turn entails that the log-likelihood equations have at most one root. However, such global concavity cannot always be established. Thus one may ask to what extent the weaker property of local concavity of the log-likelihood function at all roots of the likelihood equations implies that there can be at most one root. Uniqueness arguments along these lines, although incomplete, may be found in Kendall and Stuart (1973, p. 56), Turnbull (1974), and Copas (1975), for example.

    However, it also was pointed out by Tarone and Gruenhage (1975) that a function of two variables may have an infinity of strict local maxima and no other critical points, i.e. no saddle points or minima. To resolve this issue, a theorem is presented which is well known to mathematicians as a special application of Morse Theory, cf. Milnor (1963) and also Arnold (1978), p. 262. Namely, on an island the number of minima minus the number of saddle points plus the number of maxima is always one. The specialization of the theorem establishing conditions for a unique global maximum was first presented to the statistical community by Mäkeläinen et al. (1981). Since Morse Theory is rather deep and since Mäkeläinen et al. only give an outline of a proof, leaving the lengthy details to a technical report, a short (one page) and more accessible proof is given here. It is based on the elementary theory of ordinary differential equations.

    It is noted here that although Mäkeläinen et al. have priority in publishing the theorem presented here, a previous version of this paper had been submitted for publication, but was withdrawn and issued as a technical report (Scholz, 1981), when the impending publication of Mäkeläinen et al. became known. Aside from these two independent efforts there was a third by Barndorff-Nielsen and Blæsild (1980), similarly preempted, which remained as a technical report. Their proof of the result appears to depend on Morse Theory. Similar results under weaker assumptions may be found in Gabrielsen (1982, 1986). Other approaches, via a multivariate version of Rolle's theorem, were examined in Rai and van Ryzin (1982).


  • 2 The Uniqueness Theorem

    In addition to the essential strict concavity at all critical points the uniqueness theorem invokes a compactness condition which avoids the problems pointed out by Tarone and Gruenhage (1975) and which are illustrated in Figure 1. The theorem can be stated as follows:

    Theorem 1 Let G be an open, connected subset of R^n and let f : G → R be twice continuously differentiable on G with the following two properties:

    i) For any x ∈ G with grad f(x) = 0 the Hessian D²f(x) is negative definite, i.e. all critical points are strict local maxima.

    ii) For any x ∈ G the set {y ∈ G : f(y) ≥ f(x)} is compact.

    Then f has exactly one critical point, hence one global maximum and no other local maxima on G.

    Proof: By i) all critical points are isolated, i.e., for each critical point x ∈ G of f there exists an ε(x) > 0 such that

    B_ε(x)(x) = {y ∈ R^n : |y − x| < ε(x)} ⊂ G

    contains no other critical point besides x, and such that

    g(x) := sup{ f(y) : y ∈ ∂B_ε(x)(x) } < f(x) .

    Let

    U_d(x)(x) = { y ∈ B_ε(x)(x) : f(y) > f(x) − d(x) }

    with 0 < d(x) < f(x) − g(x); then ∂U_d(x)(x) ⊂ B_ε(x)(x) (note that f(y) = f(x) − d(x) for y ∈ ∂U_d(x)(x)). Consider now the following vector function

    h(z) = grad f(z) · |grad f(z)|^−2 ,

    which is well defined and continuously differentiable on G − C, where C is the set of critical points of f in G. Hence the differential equation ż(t) = h(z(t)) with initial


    condition z(0) = z₀ ∈ G − C has a unique right maximal solution z(t; 0, z₀) on some interval [0, t₀), t₀ > 0, see Hartman (1964), pp. 8-13. Note that f(z(t; 0, z₀)) = f(z₀) + t for t ∈ [0, t₀). It follows from ii) that t₀ must be finite. Consider now the following compact set:

    K = {y ∈ G : f(y) ≥ f(z₀)} − ⋃_{x∈C} U_d(x)(x) .

    Then z(t; 0, z₀) ∉ K for t near t₀, see Hartman, pp. 12-13. Hence for t near t₀, z(t; 0, z₀) ∈ U_d(x)(x) for some x ∈ C. From the construction of U_d(x)(x) it is clear that once such a solution enters U_d(x)(x) it will never leave it. For x ∈ C let P(x) be the set containing x and all those points z₀ ∈ G − C whose solutions z(t; 0, z₀) will wind up in U_d(x)(x). It has been shown that {P(x) : x ∈ C} forms a partition of G. Since z(t; 0, z₀) is a continuous function of z₀ ∈ G − C, see Hartman, p. 94, it follows that each P(x), x ∈ C, is open. Since G is assumed to be connected, i.e., G cannot be the disjoint union of nonempty open sets, one concludes that all but one of the P(x), x ∈ C, must be empty. Q.E.D.

    Remark: It is clear that a disconnected set G allows for easy counterexamples of the theorem. Assumption ii) is violated in the example presented by Tarone and Gruenhage: f(x, y) = −exp(−2y) − exp(−y) sin(x). Figure 1 shows the contour lines of f(x, y) in the upper plot and the corresponding perspective in the lower plot. In thicker line width is indicated the contour f(x, y) = 0, given by y = −log(−sin(x)) over the intervals where sin(x) < 0. This latter contour is unbounded since y → ∞ as sin(x) → 0 at those interval endpoints. Thus the level set {(x, y) : f(x, y) ≥ 0} is unbounded. What is happening in this example is that there are saddle points at infinity which act as the connecting agent between the local maxima.

    Assumption ii) may possibly be replaced by weaker assumptions; however, it appears difficult to formulate such assumptions without impinging on the simplicity of theorem and proof. The following section will illustrate the utility of the theorem in the context of censored Weibull data with covariates. However, it should be noted that many other examples exist.
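    As a small numerical illustration of the remark above (a sketch with our own names, not from the report), one can check that the points (3π/2 + 2kπ, log 2) are critical points of the Tarone-Gruenhage function and that the Hessian there is negative definite, so each is a strict local maximum:

```python
import math

# Tarone-Gruenhage example: infinitely many strict local maxima and no
# other critical points, so assumption ii) of Theorem 1 must fail.
def f(x, y):
    return -math.exp(-2 * y) - math.exp(-y) * math.sin(x)

# Analytic critical points: x = 3*pi/2 + 2*k*pi, y = log(2).
x0, y0 = 3 * math.pi / 2, math.log(2.0)

def grad(x, y, h=1e-6):
    # central finite differences for the gradient
    gx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    gy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return gx, gy

def hessian(x, y, h=1e-4):
    # central finite differences for the second derivatives
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

gx, gy = grad(x0, y0)
fxx, fxy, fyy = hessian(x0, y0)
# 2x2 negative definiteness: fxx < 0 and det(Hessian) > 0
neg_def = fxx < 0 and fxx * fyy - fxy**2 > 0
print(f(x0, y0), gx, gy, neg_def)
```

    The function value at each such point is 1/4, matching the 0.24 contour visible in Figure 1.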


  • Figure 1: Contour and Perspective Plots of f(x, y) = −exp(−2y) − exp(−y) sin(x)

    [Contour plot of f over 0 ≤ x ≤ 12, −0.5 ≤ y ≤ 1.5, with contour levels from −3 to 0.24, and a corresponding perspective plot of the same surface below; plots not reproduced in this transcript.]

  • 3 Weibull Regression Model Involving Censored Data

    Consider the following linear model:

    y_i = Σ_{j=1}^p u_{ij} β_j + σ ε_i = u′_i β + σ ε_i ,  i = 1, …, n

    where ε_1, …, ε_n are independent random errors, identically distributed according to the extreme value or Gumbel distribution with density f(x) = exp[x − exp(x)] and cumulative distribution function F(x) = 1 − exp[−exp(x)]. The n × p matrix U = (u_{ij}) of constant regression covariates is assumed to be of full rank p, with n > p. The unknown parameters σ, β_1, …, β_p will be estimated by the method of maximum likelihood, which here is taken to be the solution to the likelihood equations.

    The above model can also arise from the following Weibull regression model:

    P(T_i ≤ t) = 1 − exp( −[t/α(u_i)]^γ )

    which, after using the log transformation Y_i = log(T_i), results in

    P(Y_i ≤ y) = 1 − exp[ −exp( (y − log[α(u_i)]) / (1/γ) ) ] = 1 − exp[ −exp( (y − µ(u_i)) / σ ) ] .

    Using the identifications σ = 1/γ and µ(u_i) = log[α(u_i)] = u′_i β this reduces to the previous linear model with the density f.

    Rather than observing the responses y_i completely, the data are allowed to be censored, i.e., for each observation y_i one either observes it or some censoring time c_i. The response y_i is observed whenever c_i ≥ y_i and otherwise one observes c_i, and one knows whether the observation is a y_i or a c_i. One will also always know the corresponding covariates u_{ij}, j = 1, …, p, for i = 1, …, n. Such censoring is called type I censoring or, since the censoring time points c_i can take on multiple values, one also speaks of multiply censored data. Thus the data consist of

    S = {(x_1, δ_1, u_1), …, (x_n, δ_n, u_n)} ,

    where x_i = y_i and δ_i = 1 when y_i ≤ c_i, and x_i = c_i and δ_i = 0 when y_i > c_i. The number of uncensored observations is denoted by r = Σ_{i=1}^n δ_i and the index sets of uncensored and censored observations by D and C, respectively, i.e.,


    D = {i : δ_i = 1, i = 1, …, n} = {i_1, …, i_r} and C = {i : δ_i = 0, i = 1, …, n} .

    Furthermore, denote the uncensored observations and corresponding covariates by

    y_D = (y_{i_1}, …, y_{i_r})′ and U_D = the r × p matrix with rows u′_{i_1}, …, u′_{i_r} .

    The likelihood function of the data S is

    L(β, σ) = ∏_{i∈D} (1/σ) exp[ (x_i − u′_iβ)/σ − exp((x_i − u′_iβ)/σ) ] × ∏_{i∈C} exp[ −exp((x_i − u′_iβ)/σ) ]

    and the corresponding log-likelihood is

    ℓ(β, σ) = log[L(β, σ)] = Σ_{i∈D} [ (x_i − u′_iβ)/σ − exp((x_i − u′_iβ)/σ) ] − Σ_{i∈D} log σ − Σ_{i∈C} exp( (x_i − u′_iβ)/σ ) .
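    The log-likelihood above is straightforward to evaluate directly; the following is a minimal Python sketch (the function name and data layout are ours, not from the report):

```python
import math

def gumbel_loglik(beta, sigma, x, delta, U):
    """Log-likelihood l(beta, sigma) for type I censored Gumbel regression.

    x[i]    : observed value (failure time y_i if delta[i]==1, else censoring time c_i)
    delta[i]: 1 = uncensored, 0 = censored
    U[i]    : covariate vector u_i (list of length p)
    """
    ll = 0.0
    for xi, di, ui in zip(x, delta, U):
        z = (xi - sum(u * b for u, b in zip(ui, beta))) / sigma
        if di == 1:            # density term for an observed failure
            ll += z - math.exp(z) - math.log(sigma)
        else:                  # survival term for a censored value
            ll += -math.exp(z)
    return ll

# tiny check: one uncensored point at the fitted location with sigma = 1
print(gumbel_loglik([0.0], 1.0, [0.0], [1], [[1.0]]))  # -1.0
```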

    3.1 Conditions for Unique Maximum Likelihood Estimates

    Here conditions will be stated under which the maximum likelihood estimates of β and σ exist and are unique. It seems that this issue has not yet been addressed in the literature although software for finding the maximum likelihood estimates exists and is routinely used. Some problems with such software have been encountered and situations have been discovered in which the maximum likelihood estimates, understood as roots of the likelihood equations

    ∂ℓ(β, σ)/∂σ = 0 and ∂ℓ(β, σ)/∂β_j = 0 , for j = 1, …, p ,  (1)

    do not exist. Thus it seems worthwhile to explicitly present the conditions which guarantee unique solutions to the likelihood equations. These conditions appear to be


    reasonable and not unduly restrictive. In fact, it is conjectured that these conditions are also necessary, but this has not been pursued.

    Theorem 2 Let r ≥ 1 and the columns of U_D be linearly independent. Then for y_D not in the column space of U_D, or for y_D = U_D β̂ for some β̂ with x_i > u′_i β̂ for some i ∈ C, the likelihood equations (1) have a unique solution which represents the location of the global maximum of ℓ(β, σ) over R^p × (0, ∞).

    Comments: The above assumption concerning U_D is stronger than assuming that the columns of U be linearly independent. Also, the event that y_D is in the column space of U_D technically has probability zero if r > p, but may occur due to rounding or data granularity problems.

    When r ≥ p and y_D = U_D β̂ with x_i ≤ u′_i β̂ for all i ∈ C, it is easily seen that ℓ(β̂, σ) → ∞ as σ → 0. From the point of view of likelihood maximization this would point to (β̂, σ̂) = (β̂, 0) as the maximum likelihood estimates, provided one extends the permissible range of σ from (0, ∞) to [0, ∞). However, the conventional large sample normality theory does not apply here, since it is concerned with the roots of the likelihood equations.

    The additional requirement x_i > u′_i β̂ for some i ∈ C gives the extra information that is needed to get out of the degenerate case, namely the linear pattern y_D = U_D β̂, because the actual observation y_i implied by the censored case x_i > u′_i β̂ will also satisfy that inequality since y_i > x_i and thus break the linear pattern and yield a σ̂ > 0. This appears to have been overlooked by Nelson (1982) when on page 392 he suggests that when estimating k parameters one should have at least k distinct failure times, otherwise the estimates do not exist. Although his recommendation was made in a more general context it seems that the conditions of Theorem 2 may have some bearing on other situations as well.
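    The conditions of Theorem 2 can be checked mechanically for a given data set. The following Python sketch does so for the illustrative case p = 2 (intercept and one covariate); the function name, tolerances, and data are ours:

```python
# Check of the Theorem 2 conditions for p = 2 (intercept + slope).
# The MLE is unique when U_D has full column rank and either y_D is not
# in the column space of U_D, or some censored x_i exceeds its fitted value.
def checks(xs, delta, us):
    D = [i for i, d in enumerate(delta) if d == 1]
    C = [i for i, d in enumerate(delta) if d == 0]
    r = len(D)
    # least squares fit on the uncensored points via 2x2 normal equations
    n, su = r, sum(us[i] for i in D)
    suu = sum(us[i] ** 2 for i in D)
    sy = sum(xs[i] for i in D)
    suy = sum(us[i] * xs[i] for i in D)
    det = n * suu - su * su          # 0 iff columns of U_D are dependent
    full_rank = abs(det) > 1e-12
    if not full_rank:
        return r, full_rank, None
    b1 = (n * suy - su * sy) / det
    b0 = (sy - b1 * su) / n
    rss = sum((xs[i] - b0 - b1 * us[i]) ** 2 for i in D)
    # degenerate: y_D exactly on the fitted line and no censored point above it
    degenerate = rss < 1e-20 and all(xs[i] <= b0 + b1 * us[i] for i in C)
    return r, full_rank, not degenerate

# generic data: a unique MLE is expected
print(checks([1.0, 1.2, 1.7, 2.0], [1, 1, 1, 0], [0.0, 0.5, 1.0, 1.5]))
```

    The second return flag reports full column rank, the third whether the uniqueness conditions hold.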

    Proof: First it is shown that any critical point (β, σ) of ℓ is a strict local maximum. In the process the equations resulting from grad ℓ(β, σ) = 0 are used to simplify the Hessian or matrix of second derivatives of ℓ at those critical points. This simplified Hessian is then shown to be negative definite. The condition grad ℓ(β, σ) = 0 results in the following equations:

    ∂ℓ/∂σ = −r/σ − Σ_{i∈D} (x_i − u′_iβ)/σ² + Σ_{i=1}^n [(x_i − u′_iβ)/σ²] exp( (x_i − u′_iβ)/σ )

          = −r/σ − (1/σ) [ Σ_{i∈D} z_i − Σ_{i=1}^n z_i exp(z_i) ] = 0  (2)

    with z_i = (x_i − u′_iβ)/σ, and

    ∂ℓ/∂β_j = −(1/σ) [ Σ_{i∈D} u_{ij} − Σ_{i=1}^n u_{ij} exp( (x_i − u′_iβ)/σ ) ]

          = −(1/σ) [ Σ_{i∈D} u_{ij} − Σ_{i=1}^n u_{ij} exp(z_i) ] = 0  for j = 1, …, p .  (3)
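    As a sanity check on equations (2) and (3), the following Python sketch compares the analytic partial derivatives with central finite differences of the log-likelihood on a small made-up data set (all names and data are ours):

```python
import math

def loglik(beta, sigma, x, delta, U):
    # type I censored Gumbel regression log-likelihood
    ll = 0.0
    for xi, di, ui in zip(x, delta, U):
        z = (xi - sum(u * b for u, b in zip(ui, beta))) / sigma
        ll += (z - math.exp(z) - math.log(sigma)) if di else -math.exp(z)
    return ll

def grad_analytic(beta, sigma, x, delta, U):
    p = len(beta)
    z = [(xi - sum(u * b for u, b in zip(ui, beta))) / sigma
         for xi, ui in zip(x, U)]
    r = sum(delta)
    # equation (2)
    dsig = -r / sigma - (sum(zi for zi, di in zip(z, delta) if di)
                         - sum(zi * math.exp(zi) for zi in z)) / sigma
    # equation (3)
    dbeta = [-(sum(U[i][j] for i in range(len(x)) if delta[i])
               - sum(U[i][j] * math.exp(z[i]) for i in range(len(x)))) / sigma
             for j in range(p)]
    return dbeta, dsig

# made-up data for the check
x = [0.3, 1.1, -0.4, 0.8, 2.0]
delta = [1, 1, 0, 1, 0]
U = [[1.0, 0.1], [1.0, 0.5], [1.0, 0.9], [1.0, 0.3], [1.0, 0.7]]
beta, sigma = [0.2, -0.5], 0.8

db, ds = grad_analytic(beta, sigma, x, delta, U)
h = 1e-6
nds = (loglik(beta, sigma + h, x, delta, U)
       - loglik(beta, sigma - h, x, delta, U)) / (2 * h)
ndb = []
for j in range(len(beta)):
    bp, bm = beta[:], beta[:]
    bp[j] += h
    bm[j] -= h
    ndb.append((loglik(bp, sigma, x, delta, U)
                - loglik(bm, sigma, x, delta, U)) / (2 * h))
err = max(abs(a - b) for a, b in zip(db + [ds], ndb + [nds]))
print(err)
```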

    The Hessian, or matrix H of second partial derivatives of ℓ with respect to (β, σ), is made up of the following terms for 1 ≤ j, k ≤ p:

    ∂²ℓ(β, σ)/∂β_j ∂β_k = −(1/σ²) Σ_{i=1}^n u_{ij} u_{ik} exp(z_i)  (4)

    ∂²ℓ(β, σ)/∂β_j ∂σ = (1/σ²) [ Σ_{i∈D} u_{ij} − Σ_{i=1}^n u_{ij} exp(z_i) − Σ_{i=1}^n u_{ij} z_i exp(z_i) ]  (5)

    ∂²ℓ(β, σ)/∂σ² = (1/σ²) [ r + 2 Σ_{i∈D} z_i − 2 Σ_{i=1}^n z_i exp(z_i) − Σ_{i=1}^n z_i² exp(z_i) ]  (6)

    From (2) one gets

    Σ_{i=1}^n z_i exp(z_i) − Σ_{i∈D} z_i = r

    and one can simplify (6) to

    ∂²ℓ(β, σ)/∂σ² = r/σ² − 2r/σ² − (1/σ²) Σ_{i=1}^n z_i² exp(z_i) = −(1/σ²) [ r + Σ_{i=1}^n z_i² exp(z_i) ] .


    Using (3) one can simplify (5) to

    ∂²ℓ(β, σ)/∂β_j ∂σ = −(1/σ²) Σ_{i=1}^n z_i u_{ij} exp(z_i) .

    Thus the matrix H of second partial derivatives of ℓ at any critical point is

    H = −(1/σ²) [ Σ_{i=1}^n exp(z_i) u_i u′_i    Σ_{i=1}^n z_i exp(z_i) u_i
                  Σ_{i=1}^n z_i exp(z_i) u′_i   r + Σ_{i=1}^n z_i² exp(z_i) ] = −(1/σ²) B .

    Letting w_i = exp(z_i) / Σ_{j=1}^n exp(z_j) and W = Σ_{j=1}^n exp(z_j) one can write

    B = W [ Σ_{i=1}^n w_i u_i u′_i    Σ_{i=1}^n w_i z_i u_i
            Σ_{i=1}^n w_i z_i u′_i   r/W + Σ_{i=1}^n w_i z_i² ] .

    In this matrix the upper left p × p diagonal submatrix Σ_{i=1}^n w_i u_i u′_i is positive definite. This follows from

    a′ ( Σ_{i=1}^n w_i u_i u′_i ) a = Σ_{i=1}^n w_i |a′ u_i|² > 0

    for every a ∈ R^p − {0}, provided the columns of U are linearly independent, which follows from our assumption about U_D. The lower right diagonal element r + W Σ_{i=1}^n w_i z_i² of B is positive since r ≥ 1.

    The last step in showing B to be positive definite is to verify that det(B) > 0. To this end let

    V = ( Σ_{i=1}^n w_i u_i u′_i )^{−1}

    and note that for r > 0 one has

    det(B) = W det( Σ_{i=1}^n w_i u_i u′_i ) × [ r/W + Σ_{i=1}^n w_i z_i² − ( Σ_{i=1}^n w_i z_i u′_i ) V ( Σ_{i=1}^n w_i z_i u_i ) ] > 0


    since

    0 ≤ Σ_{i=1}^n w_i ( z_i − u′_i V Σ_{j=1}^n w_j z_j u_j )²

    = Σ_{i=1}^n w_i z_i² − 2 Σ_{i=1}^n w_i z_i u′_i V Σ_{j=1}^n w_j z_j u_j + Σ_{i=1}^n w_i ( u′_i V Σ_{j=1}^n w_j z_j u_j ) ( u′_i V Σ_{j=1}^n w_j z_j u_j )

    = Σ_{i=1}^n w_i z_i² − 2 Σ_{i=1}^n w_i z_i u′_i V Σ_{j=1}^n w_j z_j u_j + ( Σ_{j=1}^n w_j z_j u′_j ) V ( Σ_{i=1}^n w_i u_i u′_i ) V ( Σ_{j=1}^n w_j z_j u_j )

    = Σ_{i=1}^n w_i z_i² − ( Σ_{i=1}^n w_i z_i u′_i ) V ( Σ_{i=1}^n w_i z_i u_i ) .
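    The final identity is a weighted Cauchy-Schwarz inequality; for p = 1 it can be spot-checked numerically (a sketch with our own names and randomly generated values):

```python
import math
import random

# Numerical spot-check (p = 1) of the inequality behind det(B) > 0:
# with V = (sum w_i u_i^2)^(-1) one should always have
# sum w_i z_i^2 - (sum w_i z_i u_i) * V * (sum w_i z_i u_i) >= 0.
random.seed(1)
min_slack = float("inf")
for _ in range(1000):
    z = [random.uniform(-3, 3) for _ in range(6)]
    u = [random.uniform(-2, 2) for _ in range(6)]
    e = [math.exp(zi) for zi in z]
    W = sum(e)
    w = [ei / W for ei in e]          # weights sum to one
    V = 1.0 / sum(wi * ui * ui for wi, ui in zip(w, u))
    b = sum(wi * zi * ui for wi, zi, ui in zip(w, z, u))
    slack = sum(wi * zi * zi for wi, zi in zip(w, z)) - b * V * b
    min_slack = min(min_slack, slack)
print(min_slack)
```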

    To claim the existence of unique maximum likelihood estimates it remains to demonstrate the compactness condition ii) of Theorem 1. It will be shown that

    a) ℓ(β, σ) → −∞ uniformly in β ∈ R^p as σ → 0 or σ → ∞, and

    b) for any ε > 0 and ε ≤ σ ≤ 1/ε one has sup{ ℓ(β, σ) : |β| ≥ ρ } → −∞ as ρ → ∞.

    Compact sets in R^{p+1} are characterized by being bounded and closed. Using the continuous mapping ψ(β, σ) = (β, log(σ)), map the half space K⁺ = R^p × (0, ∞) onto R^{p+1}. According to Theorem 4.14 of Rudin (1976), ψ maps compact subsets of K⁺ into compact subsets of R^{p+1}, the latter being characterized as closed and bounded. This allows the characterization of compact subsets of K⁺ as those that are closed and for which |β| and σ are bounded above and for which σ is bounded away from zero.

    Because of the continuity of ℓ(β, σ) the set

    Q₀ = {(β, σ) : ℓ(β, σ) ≥ ℓ(β₀, σ₀)}


    is closed and bounded and bounded away from the hyperplane σ = 0. These boundedness properties of Q₀ are seen by contradiction. If Q₀ did not have these properties, then there would be a sequence (β_n, σ_n) with either σ_n → 0 or σ_n → ∞, or with 0 < ε < σ_n < 1/ε and |β_n| → ∞. For either of these two cases the above claims a) and b) state that ℓ(β_n, σ_n) → −∞, which violates ℓ(β_n, σ_n) ≥ ℓ(β₀, σ₀). This completes the main argument of the proof of Theorem 2, subject to demonstrating the claims a) and b).

    To see a), first deal with the case in which y_D is not in the column space of U_D. This entails that for all β ∈ R^p

    |y_D − U_D β| ≥ |y_D − U_D β̂| = κ > 0 , where β̂ = [U′_D U_D]^{−1} U′_D y_D .

    Thus max{ |x_i − u′_iβ| : i ∈ D } ≥ κ̃ > 0 uniformly in β ∈ R^p and, using the inequality x − exp(x) ≤ −|x| for all x ∈ R, one has

    Σ_{i∈D} [ (x_i − u′_iβ)/σ − exp( (x_i − u′_iβ)/σ ) ] ≤ −Σ_{i∈D} |(x_i − u′_iβ)/σ|  (7)

    ≤ −max{ |x_i − u′_iβ| : i ∈ D } / σ ≤ −κ̃/σ → −∞ ,

    as σ → 0, and thus uniformly in β ∈ R^p

    ℓ(β, σ) ≤ −r log(σ) − κ̃/σ → −∞ as σ → 0 .

    To deal with the other case, where U_D β̂ = y_D and x_i > u′_i β̂ for some i ∈ C, take a neighborhood of β̂

    B_ρ(β̂) = { β : |β − β̂| ≤ ρ }

    with ρ > 0 chosen sufficiently small so that

    |u′_iβ − u′_i β̂| ≤ (x_i − u′_i β̂)/2 for all β ∈ B_ρ(β̂).

    This in turn implies

    x_i − u′_iβ = x_i − u′_i β̂ + u′_i β̂ − u′_iβ ≥ x_i − u′_i β̂ − (x_i − u′_i β̂)/2 = (x_i − u′_i β̂)/2

    for all β ∈ B_ρ(β̂). For some κ′ > 0 and for all β ∉ B_ρ(β̂) one has |y_D − U_D β| ≥ κ′. Bounding the first term of the log-likelihood ℓ(β, σ) as in (7) for all β ∉ B_ρ(β̂), and bounding the last term of the log-likelihood by

    −exp( (x_i − u′_iβ)/σ ) ≤ −exp( (x_i − u′_i β̂)/(2σ) ) for all β ∈ B_ρ(β̂),

    one finds again that either of these bounding terms will dominate the middle term, −r log σ, of ℓ(β, σ) as σ → 0. Thus again uniformly in β ∈ R^p one has ℓ(β, σ) → −∞ as σ → 0.

    As for σ → ∞, note x − exp(x) ≤ −1 and one has

    ℓ(β, σ) ≤ −r log(σ) − r → −∞ as σ → ∞

    uniformly in β ∈ R^p. This establishes a).

    Now let ε ≤ σ ≤ 1/ε. From our assumption that the columns of U_D are linearly independent it follows that

    inf{ |U_D β| : |β| = 1 } = m > 0 ,

    where m² is the smallest eigenvalue of U′_D U_D. Thus for all β ∈ R^p

    |U_D β − y_D| ≥ |U_D β| − |y_D| ≥ m|β| − |y_D| ,

    and using the inequality Σ |x_i| ≥ √(Σ x_i²) one has

    ℓ(β, σ) ≤ −r log(σ) − Σ_{i∈D} |(x_i − u′_iβ)/σ| ≤ −r log(σ) − |U_D β − y_D|/σ ≤ −r log(σ) − (m|β| − |y_D|)/σ → −∞ as |β| → ∞ ,

    again uniformly in β with |β| ≥ K, as K → ∞. This concludes the proof of Theorem 2.


  • 4 Solving the Likelihood Equations

    The previous section showed that the solution to the likelihood equations is unique and coincides with the unique global maximum of the likelihood function. This section discusses some computational issues that arise in solving for these maximum likelihood estimates. One can either use a multidimensional root finding algorithm to solve the likelihood equations or one can use an optimization algorithm on the likelihood or log-likelihood function. It appears that in either case one can run into difficulties when trying to evaluate the exponential terms exp([x_i − u′_iβ]/σ). Depending on the choice of σ and β this term could easily overflow and terminate all further calculation. Such overflow leads to a likelihood that is practically zero, indicating that σ and β are far away from the optimum. It seems that this problem is what troubles the algorithm survreg in S-PLUS. In some strongly censored data situations survreg simply crashes with overflow messages. One such data set is given in Table 1 with a dagger indicating the three failure times. The histogram of this data set is given in Figure 2 with the three failure cases indicated by dots below the histogram.

    Table 1: Heavily Censored Sample

    626.1 651.7 684.7 686.3 698.2 707.7 709.8 714.7 718.0 719.6
    720.9 721.9 726.7 740.3 752.9 760.3 764.0 764.8 768.3 773.6
    774.4 784.1 785.3 788.9 790.3 793.2 794.0 806.1 816.2 825.0
    826.5 829.8† 832.3 839.4 840.5 843.1 845.2 849.1 849.2† 856.2
    856.8 859.1 868.9† 869.3 881.1 887.8 890.5 898.2 921.5 934.8

    In the case of simple Weibull parameter estimation without covariates this overflow problem can be finessed in the likelihood equations by rewriting these equations so that the exponential terms only appear simultaneously in numerator and denominator of some ratio, see equation (4.2.2) in Lawless (1982). One can then use a common scaling factor so that none of the exponential terms overflow.
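    The common-scaling idea can be sketched as follows for a ratio of the kind that appears in the likelihood equations; factoring exp(max z_i) out of numerator and denominator leaves the ratio unchanged but avoids overflow (Python, names ours):

```python
import math

# Ratios such as sum(z_i * exp(z_i)) / sum(exp(z_i)) overflow for large
# z_i if computed naively, but are stable after factoring out exp(max z_i).
def ratio_naive(z):
    return sum(zi * math.exp(zi) for zi in z) / sum(math.exp(zi) for zi in z)

def ratio_scaled(z):
    m = max(z)
    num = sum(zi * math.exp(zi - m) for zi in z)
    den = sum(math.exp(zi - m) for zi in z)
    return num / den

z_small = [0.5, 1.0, 1.5]
print(ratio_naive(z_small), ratio_scaled(z_small))  # the two agree

z_big = [800.0, 810.0, 820.0]   # math.exp overflows on these values
try:
    ratio_naive(z_big)
    naive_ok = True
except OverflowError:
    naive_ok = False
print(naive_ok, ratio_scaled(z_big))
```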

    In the current case with covariates it appears that this same trick will not work. Thus it is proposed to proceed as follows. Find a starting value (β₀, σ₀) by way of the Schmee-Hahn regression algorithm presented below. It is assumed that the starting value will not suffer from the overflow problems mentioned before.

    Next, employ an optimization algorithm that allows for the possibility that the function to be optimized may not be able to return a function value, gradient or Hessian


    Figure 2: Histogram for Data Table 1

    [Histogram of the 50 observations of Table 1 (cycles, roughly 650 to 950), with the three failure cases indicated by dots below the histogram; plot not reproduced in this transcript.]

    at a desired location. In that case the optimization algorithm should reduce its step size and try again. The function box which calculates the function value, gradient and Hessian should take care in trapping exponential overflow problems, i.e., state when they cannot be resolved. These problems typically happen only far away from the function optimum, where the log-likelihood drops off to −∞. Another precaution is to switch from σ to α = log(σ) in the optimization process. Furthermore, it was found that occasionally it was useful to rescale σ, x_i and u_{ij} by a common scale factor so that σ is in the vicinity of one. This is easily done using the preliminary Schmee-Hahn estimates.

    Optimization algorithms usually check convergence based on the gradient (among other criteria) and the gradient is proportional to the scale of the function to be optimized. Thus it is useful to rescale the log-likelihood to get its minimum value into a proper range, near one. This can be done approximately by evaluating the absolute value of the log-likelihood at the initial estimate and rescaling the log-likelihood function by dividing by that absolute value.


  • 4.1 Schmee-Hahn Regression Estimates with Censored Data

    This method was proposed by Schmee and Hahn (1979) as a simple estimation method for dealing with type I censored data with covariates. It can be implemented by using a least squares algorithm in iterative fashion.

    We assume the following regression model

    Y_i = β_1 X_{i1} + … + β_p X_{ip} + σ e_i ,  i = 1, …, n

    or in vector/matrix notation

    (Y_1, …, Y_n)′ = (X_{ij}) (β_1, …, β_p)′ + σ (e_1, …, e_n)′ ,

    or more compactly

    Y = X β + σ e .

    Here Y is the vector of observations, X is the matrix of covariates corresponding to Y, β is the vector of regression coefficients, and σe is the vector of independent and identically distributed error terms with E(e_i) = 0 and var(e_i) = 1. We denote the common density of the e_i by g₀(z). Often one has X_{i1} = 1 for i = 1, …, n. In that case the model has an intercept.

    Rather than observing this full data set (Y, X) one observes the Y_i in partially censored form, i.e., there are censoring values c′ = (c_1, …, c_n) such that Y_i is observed whenever Y_i ≤ c_i; otherwise the value c_i is observed. Also, it is always known whether the observed value is a Y_i or a c_i. This is indicated by δ_i = 1 and δ_i = 0, respectively. Thus the observed censored data consist of

    D = (Ỹ, X, δ)

    where δ′ = (δ_1, …, δ_n) and Ỹ′ = (Ỹ_1, …, Ỹ_n) with

    Ỹ_i = Y_i if δ_i = 1 (i.e. when Y_i ≤ c_i) and Ỹ_i = c_i if δ_i = 0 (i.e. when Y_i > c_i).

    Based on these data the basic algorithm consists in treating the observations initially as though they were not censored and applying the least squares method to (Ỹ, X) to find initial estimates (σ̂₀, β̂′₀) of (σ, β′).


    Next, replace the censored values by their expected values, i.e., replace Ỹ_i by

    Ỹ_{i,1} = E(Y_i | Y_i > c_i ; σ, β′) whenever δ_i = 0 ,

    computed by setting (σ, β′) = (σ̂₀, β̂′₀). Denote this modified Ỹ vector by Ỹ₁. Again treat this modified data set as though it is not censored and apply the least squares method to (Ỹ₁, X) to find new estimates (σ̂₁, β̂′₁) of (σ, β′). Repeat the above step of replacing censored Ỹ_i values by estimated expected values

    Ỹ_{i,2} = E(Y_i | Y_i > c_i ; σ, β′) whenever δ_i = 0 ,

    this time using (σ, β′) = (σ̂₁, β̂′₁). This process can be iterated until some stopping criterion is satisfied: either the iterated regression estimates (σ̂_j, β̂′_j) no longer change much or the residual sum of squares has stabilized.
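    A minimal sketch of this iteration for normal errors and p = 2 (intercept plus one covariate) follows, using the normal-case conditional mean µ(x) + σϕ(a)/(1 − Φ(a)) for the censored values; all names and data are ours, and a fixed iteration count stands in for a real stopping criterion:

```python
import math

def phi(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def ols(y, u):
    # least squares fit y ~ b0 + b1*u via 2x2 normal equations
    n = len(y)
    su, suu = sum(u), sum(ui * ui for ui in u)
    sy, suy = sum(y), sum(ui * yi for ui, yi in zip(u, y))
    det = n * suu - su * su
    b1 = (n * suy - su * sy) / det
    b0 = (sy - b1 * su) / n
    res = [yi - b0 - b1 * ui for yi, ui in zip(y, u)]
    sig = math.sqrt(sum(r * r for r in res) / (n - 2))
    return b0, b1, sig

def schmee_hahn(ytil, delta, u, iters=50):
    y = list(ytil)
    for _ in range(iters):
        b0, b1, sig = ols(y, u)
        for i, (ci, di) in enumerate(zip(ytil, delta)):
            if di == 0:      # replace censored value by E(Y | Y > c)
                a = (ci - b0 - b1 * u[i]) / sig
                y[i] = b0 + b1 * u[i] + sig * phi(a) / (1 - Phi(a))
    return b0, b1, sig

# small synthetic example (hypothetical data)
u     = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]
ytil  = [1.1, 1.4, 2.0, 1.9, 2.5, 2.6, 3.0, 3.0]
delta = [1,   1,   1,   1,   1,   0,   1,   0]   # two censoring times
b0, b1, sig = schmee_hahn(ytil, delta, u)
print(round(b0, 3), round(b1, 3), round(sig, 3))
```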

    In order to carry out the above algorithm one needs to have a computational expression for

    E(Y | Y > c ; σ, β′) ,

    where

    Y = β_1 x_1 + … + β_p x_p + σ e = µ(x) + σ e

    and the error term e has density g₀(z). Then Y has density

    g(y) = (1/σ) g₀( (y − µ(x))/σ ) .

    The conditional density of Y, given that Y > c, is

    g_c(y) = g(y)/[1 − G(c)] for y > c, and g_c(y) = 0 for y ≤ c.

    The formula for E(Y | Y > c ; σ, β′) is derived for two special cases, namely for g₀(z) = ϕ(z), the standard normal density with distribution function Φ(z), and for

    g₀(z) = δ g̃₀(δz − γ) = δ exp[δz − γ − exp(δz − γ)] ,

    where δ = π/√6 ≈ 1.28255 and γ ≈ 0.57721566 is Euler's constant. Here g̃₀(z) = exp[z − exp(z)] is the standard form of the Gumbel density with mean −γ and standard deviation δ. Thus g₀(z) is the standardized density with mean zero and variance one. The distribution function of g₀(z) is denoted by G₀(z) = G̃₀(δz − γ) and is given by

    G₀(z) = 1 − exp(−exp[δz − γ]) .

    The Gumbel distribution is covered for its intimate connection to the Weibull distribution.

    When g₀(z) = ϕ(z), utilizing ϕ′(z) = −zϕ(z) one finds

    E(Y | Y > c ; σ, β′) = ∫_c^∞ y g_c(y) dy

    = [1 − Φ( (c − µ(x))/σ )]^{−1} ∫_c^∞ y (1/σ) ϕ( (y − µ(x))/σ ) dy

    = [1 − Φ( (c − µ(x))/σ )]^{−1} ∫_{[c−µ(x)]/σ}^∞ [µ(x) + σz] ϕ(z) dz

    = µ(x) − σ [1 − Φ( (c − µ(x))/σ )]^{−1} ∫_{[c−µ(x)]/σ}^∞ ϕ′(z) dz

    = µ(x) + σ [1 − Φ( (c − µ(x))/σ )]^{−1} ϕ( (c − µ(x))/σ ) ,

    which is simple enough to evaluate for given σ and µ(x).
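    The closed form can be checked against direct numerical integration; a Python sketch (names ours):

```python
import math

# Check E(Y | Y > c) = mu + sigma * phi(a) / (1 - Phi(a)), a = (c - mu)/sigma,
# against direct numerical integration of y * g(y) (Simpson's rule).
def phi(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def cond_mean_formula(mu, sigma, c):
    a = (c - mu) / sigma
    return mu + sigma * phi(a) / (1 - Phi(a))

def cond_mean_numeric(mu, sigma, c, upper=12.0, m=20000):
    # integrate y*g(y) on [c, mu + upper*sigma], m must be even
    lo, hi = c, mu + upper * sigma
    h = (hi - lo) / m
    def f(y):
        return y * phi((y - mu) / sigma) / sigma
    s = f(lo) + f(hi)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * f(lo + k * h)
    integral = s * h / 3
    return integral / (1 - Phi((c - mu) / sigma))

err = abs(cond_mean_formula(2.0, 0.5, 2.3) - cond_mean_numeric(2.0, 0.5, 2.3))
print(err)
```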

    For g₀(z) = δ exp[δz − γ − exp(δz − γ)] one obtains in similar fashion

    E(Y | Y > c ; σ, β′) = ∫_c^∞ y g_c(y) dy

    = [1 − G₀( (c − µ(x))/σ )]^{−1} ∫_c^∞ y (1/σ) g₀( (y − µ(x))/σ ) dy

    = [1 − G₀( (c − µ(x))/σ )]^{−1} ∫_{[c−µ(x)]/σ}^∞ [µ(x) + σz] g₀(z) dz

    = µ(x) + σ [1 − G₀( (c − µ(x))/σ )]^{−1} ∫_{[c−µ(x)]/σ}^∞ z g₀(z) dz .


    Here, substituting and integrating by parts, one has

    ∫_a^∞ z g₀(z) dz = ∫_a^∞ [δz − γ + γ] exp[δz − γ − exp(δz − γ)] dz

    = δ^{−1} ∫_{exp(δa−γ)}^∞ [log(t) + γ] exp(−t) dt

    = δ^{−1} ( δa exp[−exp(δa − γ)] + ∫_{exp(δa−γ)}^∞ exp(−t) t^{−1} dt )

    = a exp[−exp(δa − γ)] + δ^{−1} E1[exp(δa − γ)] .

    Here E1(z) is the exponential integral function, see Abramowitz and Stegun (1972). There one also finds various approximation formulas for

    E1(z) = ∫_z^∞ exp(−t) t^{−1} dt ,

    namely for 0 ≤ z ≤ 1 and coefficients a_i given in Table 2 one has

    E1(z) = −log(z) + a_0 + a_1 z + a_2 z² + a_3 z³ + a_4 z⁴ + a_5 z⁵ + ε(z)

    with |ε(z)| < 2 × 10⁻⁷, and for 1 ≤ z

    Combining the above one obtains the following formula for E(Y | Y > c ; σ, β′):

    E(Y | Y > c ; σ, β′) = µ(x) + δ^{−1} σ exp[ exp( (c − µ(x))/(σ/δ) − γ ) ]
    × ( (c − µ(x))/(σ/δ) · exp[ −exp( (c − µ(x))/(σ/δ) − γ ) ] + E1[ exp( (c − µ(x))/(σ/δ) − γ ) ] )

    = c + δ^{−1} σ exp[ exp( (c − µ(x))/(σ/δ) − γ ) ] E1[ exp( (c − µ(x))/(σ/δ) − γ ) ] .
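    This formula can likewise be checked numerically; the sketch below (names ours) computes exp(λ)E1(λ) by quadrature from the integral representation exp(λ)E1(λ) = ∫₀^∞ exp(−λs)/(1 + s) ds and compares the closed form with direct numerical integration of the conditional mean:

```python
import math

GAMMA = 0.57721566490153286     # Euler's constant
DELTA = math.pi / math.sqrt(6)  # scale of the standardized Gumbel density

def g0(z):
    t = DELTA * z - GAMMA
    return DELTA * math.exp(t - math.exp(t))

def G0(z):
    return 1 - math.exp(-math.exp(DELTA * z - GAMMA))

def expE1(lam, m=200000, upper=60.0):
    # exp(lam)*E1(lam) = integral_0^inf exp(-lam*s)/(1+s) ds (trapezoid rule)
    h = upper / m
    s = 0.5 * (1.0 + math.exp(-lam * upper) / (1 + upper))
    for k in range(1, m):
        x = k * h
        s += math.exp(-lam * x) / (1 + x)
    return s * h

def cond_mean_formula(mu, sigma, c):
    lam = math.exp(DELTA * (c - mu) / sigma - GAMMA)
    return c + sigma / DELTA * expE1(lam)

def cond_mean_numeric(mu, sigma, c, m=20000, upper=10.0):
    # direct trapezoid integration of y * g(y) on [c, mu + upper*sigma]
    lo, hi = c, mu + upper * sigma
    h = (hi - lo) / m
    def f(y):
        return y * g0((y - mu) / sigma) / sigma
    s = 0.5 * (f(lo) + f(hi))
    for k in range(1, m):
        s += f(lo + k * h)
    integral = s * h
    return integral / (1 - G0((c - mu) / sigma))

err = abs(cond_mean_formula(2.0, 0.5, 2.2) - cond_mean_numeric(2.0, 0.5, 2.2))
print(err)
```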

Note that for \(\epsilon=\exp(\delta[c-\mu(x)]/\sigma-\gamma)\approx 0\) one has

\[
\begin{aligned}
E(Y\mid Y>c;\sigma,\beta')
&= \mu(x)+\delta^{-1}\sigma\exp(\epsilon)\left[(\gamma+\log(\epsilon))\exp(-\epsilon)+E_1(\epsilon)\right] \\
&= \mu(x)+\delta^{-1}\sigma\exp(\epsilon)
\left(\left[\gamma+\log(\epsilon)\right]\left[1-\epsilon+O(\epsilon^2)\right]
-\log(\epsilon)-\gamma+a_1\epsilon+O(\epsilon^2)\right) \\
&= \mu(x)+\delta^{-1}\sigma\exp(\epsilon)\left[(a_1-\gamma)\epsilon-\epsilon\log(\epsilon)\right]
+O(\epsilon^2\log(\epsilon))
\end{aligned}
\]

where \(a_1\) is as in Table 2. In particular, in the limiting case as \(\epsilon\to 0\), one has

\[
E(Y\mid Y>c;\sigma,\beta')\longrightarrow \mu(x)\,.
\]

This makes intuitive sense since in that case the censored observation is so low as to provide no information about the actual failure time. In that case it is reasonable to replace a “completely missing” observation by its mean value.

For \(\lambda=\exp(\delta[c-\mu(x)]/\sigma-\gamma)\) very large one has

\[
E(Y\mid Y>c;\sigma,\beta')
= c+\delta^{-1}\sigma\exp(\lambda)E_1(\lambda)
= c+\delta^{-1}\sigma\left(\frac{1}{\lambda}+O(1/\lambda^2)\right)\approx c\,.
\]
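Both limiting regimes are easy to check numerically. The sketch below assumes the standardization of \(g_0\) implied by the limit \(E(Y\mid Y>c)\to\mu(x)\), namely \(\gamma\) equal to Euler's constant and \(\delta=\pi/\sqrt{6}\) (so that \(g_0\) has mean 0 and variance 1); these values are an assumption, as they are not stated in this excerpt. It computes \(\exp(\lambda)E_1(\lambda)\) as one quantity so that neither factor overflows for large \(\lambda\):

```python
import math

# Assumed standardization of g0 (mean 0, variance 1): gamma = Euler's constant,
# delta = pi/sqrt(6).  These values are NOT stated in this excerpt.
GAMMA = 0.57721566
DELTA = math.pi / math.sqrt(6.0)

A = [-0.57721566, 0.99999193, -0.24991055,
      0.05519968, -0.00976004,  0.00107857]

def exp_e1(lam):
    """exp(lam) * E1(lam), computed without forming exp(lam) for large lam."""
    if lam <= 1.0:                     # A&S polynomial for E1 on (0, 1]
        s = 0.0
        for a_k in reversed(A):
            s = s * lam + a_k
        return math.exp(lam) * (-math.log(lam) + s)
    # For lam > 1 use exp(lam)*E1(lam) = integral_0^inf exp(-s)/(lam + s) ds.
    n, upper = 200000, 50.0
    h = upper / n
    total = 0.0
    for i in range(n + 1):
        s_i = i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * math.exp(-s_i) / (lam + s_i) * h
    return total

def cond_mean(c, mu, sigma, delta=DELTA, gamma=GAMMA):
    """E(Y | Y > c) = c + (sigma/delta) * exp(lam) * E1(lam)."""
    lam = math.exp(delta * (c - mu) / sigma - gamma)
    return c + (sigma / delta) * exp_e1(lam)

mu, sigma = 5.0, 0.5
print(cond_mean(0.0, mu, sigma))   # c far below mu(x): result tends to mu(x)
print(cond_mean(8.0, mu, sigma))   # c far above mu(x): result tends to c
```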


  • 4.2 Some Specific Examples and Simulation Experiences

The data set in Table 4 is taken from Gertsbakh (1989) for illustrative and comparative purposes. It gives the log-lifetimes for 40 tested motors under different temperature and load conditions. The failure indicator is one when the motor failed and zero when it was still running at the termination of the test. The maximum likelihood estimates for the regression coefficients and scale parameter were given by Gertsbakh as the entries in the first row of Table 5. The corresponding estimates as computed by our algorithm are given to the same number of digits in the second row of that table. The results are reasonably close to each other.

The data in Table 1 can be taken as another example, although here there are no covariates. This however provides an independent way of gauging the accuracy of our algorithm, since in that case we have an independent double precision algorithm based on root solving. The answers by these two methods are given in Table 6 to the relevant number of digits for comparison. The agreement is very good (at least nine digits) in this particular example.
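For the no-covariate case such a root-solving cross-check can be sketched as follows. The sketch solves the standard profile equation for the Weibull shape parameter under type I censoring by bisection (the power sums run over all observations, failed or censored, while the mean of the logs runs over failures only). The data are synthetic, not those of Table 1, and the code is an illustration rather than the report's actual root solver:

```python
import math, random

def weibull_mle_censored(times, failed):
    """MLE (scale, shape) for type I censored Weibull data by root solving.

    Solves the profile score equation for the shape b by bisection:
        sum(t^b log t)/sum(t^b) - 1/b - mean(log t over failures) = 0,
    where the first two sums run over ALL observations (failed or censored).
    """
    r = sum(failed)
    mean_logf = sum(math.log(t) for t, f in zip(times, failed) if f) / r

    def score(beta):
        s0 = sum(t ** beta for t in times)
        s1 = sum(t ** beta * math.log(t) for t in times)
        return s1 / s0 - 1.0 / beta - mean_logf

    lo, hi = 1e-3, 100.0               # score is increasing in beta
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    scale = (sum(t ** beta for t in times) / r) ** (1.0 / beta)
    return scale, beta

# Synthetic type I censored Weibull sample (true scale 1000, shape 20, cutoff 1000).
random.seed(1)
times, failed = [], []
for _ in range(200):
    t = 1000.0 * (-math.log(1.0 - random.random())) ** (1.0 / 20.0)
    times.append(min(t, 1000.0))
    failed.append(1 if t <= 1000.0 else 0)
print(weibull_mle_censored(times, failed))
```

Once the shape root is bracketed, bisection is about as robust a double precision check as one can ask for, which is the point of such an independent algorithm.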

As another check on the algorithm various simulations were performed, either with noncensored samples or with various degrees of censoring. In all cases only one covariate was used. For the noncensored case 1000 samples each were generated at sample sizes n = 5, 20, 50, 100. The data were generated according to the Gumbel model with a linear model β1 + β2ui, with β1 = 1 and β2 = 2. The ui were randomly generated from a uniform distribution over (0, 1). The scale parameter was σ = .5. Figures 3 and 4 illustrate the results. The dashed vertical line in the histogram for σ̂ is located at σ√((n − 2)/n). It appears to be a better indication of the mean of the σ̂. Equivalently one should compare σ̂√(n/(n − 2)) against σ = .5. The n − 2 “accounts” for the two degrees of freedom lost in estimating β1 and β2. Judging from these limited simulation results it appears that the factor √(n/(n − 2)) corrects for the small sample bias reasonably well.

Figures 5-7 illustrate the statistical properties of the maximum likelihood estimates for medium and heavily censored samples of size n = 50, 500 and 1000. The censoring was done as follows. For each lifetime Yi in the sample a random censoring time Vi = .5 + 3γWi was generated, with Wi taken from a uniform (0, 1) distribution. The smaller of Yi and Vi was then taken as the i-th observation and the censoring indicator was set appropriately. The parameter γ controls the censoring. A small value of γ means heavy censoring and larger γ means medium to light censoring. In


this simulation γ = .2 and γ = 1 were used.
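The censoring mechanism just described is easy to reproduce. The sketch below generates Gumbel responses under the simulation model of the previous paragraphs (β1 = 1, β2 = 2, σ = .5, with the mean-0/variance-1 standardization of g0 assumed, as it is not stated in this excerpt) and reports the resulting censoring fractions for γ = .2 and γ = 1; the report's γ is written `gam` in the code to avoid a clash with Euler's constant:

```python
import math, random

# Assumed standardization of the Gumbel errors (mean 0, variance 1):
# delta = pi/sqrt(6), gamma = Euler's constant; not stated in this excerpt.
DELTA, GAMMA = math.pi / math.sqrt(6.0), 0.57721566

def censoring_fraction(gam, n=1000, b1=1.0, b2=2.0, sigma=0.5, seed=7):
    """Fraction of observations censored under V_i = .5 + 3*gam*W_i."""
    rng = random.Random(seed)
    censored = 0
    for _ in range(n):
        u = rng.random()                              # covariate, uniform (0, 1)
        mu = b1 + b2 * u
        w = math.log(-math.log(1.0 - rng.random()))   # standard smallest extreme value
        y = mu + sigma * (w + GAMMA) / DELTA          # Gumbel response with mean mu
        v = 0.5 + 3.0 * gam * rng.random()            # random censoring time
        if v < y:
            censored += 1
    return censored / n

print("gamma = .2:", censoring_fraction(0.2))   # heavy censoring
print("gamma = 1 :", censoring_fraction(1.0))   # medium censoring
```

As in the report, the small-γ scenario censors a much larger fraction of the sample.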

The presentations in Figures 5-7 plot for each sample the estimate versus the corresponding censoring fraction. Originally N = 1000 samples were generated under each censoring scenario, but under n = 50 and heavy censoring two samples did not permit a solution, since at least 49 lifetimes were censored in those cases. The percentages given in these plots indicate the proportion of estimates above the respective target line. The percentages given in parentheses use the dashed target line, which as in Figures 3-4 is an attempt at bias correction. Note how the increasing sample size entails a reduction in the scatter of the estimates. Also note how the scatter increases with increasing censoring fraction.

Also shown in each plot of Figures 5-7 is the least squares regression line to indicate trends in the estimates against the censoring fraction. It appears that for heavy censoring there is a definite trend for the intercept estimates β̂1. Namely, as the censoring fraction increases so does the intercept estimate. We do not know whether this effect has been discussed in the literature. The usefulness of this relationship is questionable, since one usually does not know whether the regression line is above or below the target line, since the latter is unknown. Note that the median of the estimates β̂1 is close to target.

    4.3 The Fortran Code GMLE

The file with the Fortran subroutine GMLE, developed out of the above considerations, is called gmle.f and is documented in Appendix A. Although the source code for it could easily be made available, it still requires linking with three BCSLIB subroutine libraries, namely optlib, bcsext, and bcslib. Once one has written an appropriate driver for GMLE (which may be contained in the file gmledrv.f, also available) one needs to compile these as follows on a Sun workstation

    f77 gmledrv.f gmle.f -loptlib -lbcsext -lbcslib .


  • Table 4: Motor Failure Data, Two Factors (from Gertsbakh, p. 206)

    log failure  rescaled  rescaled temper.  failure     log failure  rescaled  rescaled temper.  failure
    time         load      index             indicator   time         load      index             indicator
    5.45          1         1                1           5.15         -1         1                1
    5.74          1         1                1           6.11         -1         1                1
    5.80          1         1                1           6.11         -1         1                1
    6.37          1         1                1           6.23         -1         1                1
    6.49          1         1                1           6.28         -1         1                1
    6.91          1         1                1           6.32         -1         1                1
    7.02          1         1                1           6.41         -1         1                1
    7.10          1         1                0           6.56         -1         1                1
    7.10          1         1                0           6.61         -1         1                1
    7.10          1         1                0           6.90         -1         1                0
    5.07          1        -1                1           3.53         -1        -1                1
    5.19          1        -1                1           4.22         -1        -1                1
    5.22          1        -1                1           4.73         -1        -1                1
    5.58          1        -1                1           5.22         -1        -1                1
    5.83          1        -1                1           5.46         -1        -1                1
    6.09          1        -1                1           5.58         -1        -1                1
    6.25          1        -1                1           5.61         -1        -1                1
    6.30          1        -1                0           5.97         -1        -1                1
    6.30          1        -1                0           6.02         -1        -1                1
    6.30          1        -1                0           6.10         -1        -1                0


  • Table 5: Comparison of MLE’s for Data in Table 4

    Source      intercept   load          temperature   scale
                            coefficient   coefficient
    Gertsbakh   6.318       0.253         0.391         0.539
    our code    6.317       0.253         0.391         0.538

    Table 6: Comparison of MLE’s for Data in Table 1

    Source              scale         shape
                        parameter     parameter
    root solver         952.3774020   23.90139575
    optimization code   952.3774021   23.90139576


  • Figure 3: 1000 Simulations at n = 5 and n = 20 (uncensored)

    [Histograms of the beta_1, beta_2, and sigma estimates, for n = 5 (top row) and n = 20 (bottom row); plot content not reproducible in this transcript.]

  • Figure 4: 1000 Simulations at n = 50 and n = 100 (uncensored)

    [Histograms of the beta_1, beta_2, and sigma estimates, for n = 50 (top row) and n = 100 (bottom row); plot content not reproducible in this transcript.]

  • Figure 5: 1000 Simulations at n = 50, medium and heavy censoring

    [Scatterplots of the beta_1, beta_2, and sigma estimates versus censoring fraction for n = 50. Medium censoring (censoring fractions roughly .2-.6): 48.9 % of the beta_1, 49.5 % of the beta_2, and 40.4 % (45 %) of the sigma estimates lie above their target lines. Heavy censoring (censoring fractions roughly .65-.95): the corresponding percentages are 42.1 %, 49.1 %, and 42.6 % (44.9 %). Plot content not reproducible in this transcript.]

  • Figure 6: 1000 Simulations at n = 500, medium and heavy censoring
