Chapter 4 Parametric modelling - Lancaster University

Chapter 4 Parametric modelling


Parametric models

• Regression: observation = predictor + error (or signal + noise).

The big intellectual step forward is to separate

• (i) the predictor model, which contains the covariates (explanatory variables), from

• (ii) the error model, a model for the random variation, e.g. normal.

Generalising, generalised linear models (GLMs) are built from

• a linear predictor, which gives a regression-type term, and

• an exponential-family (EF) model for the uncertainty, or natural random variation.


Exercise 4.1 Interest lies in comparing the survival of two groups: is this modelled within the predictor or within the error?


Sol: 4.1 In parametric survival modelling a group effect may be represented in either: e.g. as different mean lifetimes in the linear predictor, or as different survival functions in the error distribution.


Exercise 4.2 Is a model for a lifetime variable a model for the predictor, for the error or for the observed value?


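Sol: 4.2 Both: a lifetime model specifies the distribution of the observed value, which combines the predictor and the error. We need to think about what modelling means.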


Exercise 4.3 How are the hazard and survivor functions expressed within this framework: observation, predictor or error?


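Sol: 4.3 obs = data + model → h: the hazard h (and likewise the survivor function S) is a way of specifying the model for the random variation, i.e. the error distribution.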


Probability distributions for lifetime variables


Exponential distribution

The exponential distribution is the ‘canonical model’ for survival analysis.

lifetime T ∼ Exp(λ) λ > 0, T ≥ 0

pdf f(t) = λ exp(−λt), t ≥ 0;

survivor function S(t) = exp(−λt), t ≥ 0;

integrated hazard H(t) = λt, t ≥ 0;

hazard function h(t) = λ, t ≥ 0.
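A quick numerical check of these four functions, as a minimal sketch in base R: `dexp`/`pexp` use the same rate parametrisation λ, and λ = 0.5 is an arbitrary illustrative value, not taken from the notes.

```r
# Exponential model: check h(t) = f(t)/S(t) = lambda and H(t) = -log S(t) = lambda * t
lambda <- 0.5                                    # illustrative rate (assumption)
t <- seq(0, 10, by = 0.5)

f <- dexp(t, rate = lambda)                      # pdf
S <- pexp(t, rate = lambda, lower.tail = FALSE)  # survivor function P(T > t)
h <- f / S                                       # hazard
H <- -log(S)                                     # integrated hazard

print(range(h))                  # both ends equal lambda: the hazard is constant
print(all.equal(H, lambda * t))  # TRUE
```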


Exercise 4.4 Suggest an appropriate scale on which to plot the survivor function.


Sol: 4.4 log S(t) = −λt is linear in t, so plot (t, log S(t)).


Exercise 4.5 Suggest how covariates can be represented in this model.


Sol: 4.5 Through λ (note that E(T) = 1/λ), by letting λ depend on the covariates through a link function; i.e. this is an instance of a GLM.


Weibull distribution

The Weibull distribution has scale parameter λ > 0 and shape parameter α > 0.

lifetime T ∼ Weibull(α, λ), λ > 0, α > 0, T ≥ 0

pdf $f(t) = \alpha\lambda^{\alpha} t^{\alpha-1} e^{-(\lambda t)^{\alpha}}$, t ≥ 0;

survivor function $S(t) = e^{-(\lambda t)^{\alpha}}$, t ≥ 0;

integrated hazard $H(t) = (\lambda t)^{\alpha}$, t ≥ 0;

hazard function $h(t) = \lambda^{\alpha}\alpha t^{\alpha-1}$, t ≥ 0.
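Base R's `dweibull`/`pweibull` use a different parametrisation (shape a, scale b, with S(t) = exp{−(t/b)^a}), so the notes' Weibull(α, λ) corresponds to shape = α and scale = 1/λ. The sketch below checks this mapping numerically; the parameter values are illustrative.

```r
# Weibull(alpha, lambda) as parametrised in the notes, written out by hand
alpha  <- 2.1     # illustrative shape
lambda <- 0.0122  # illustrative lambda (1/lambda is a time scale)
t <- seq(1, 150, by = 1)

f_notes <- alpha * lambda^alpha * t^(alpha - 1) * exp(-(lambda * t)^alpha)
S_notes <- exp(-(lambda * t)^alpha)
H_notes <- (lambda * t)^alpha
h_notes <- lambda^alpha * alpha * t^(alpha - 1)

# Base R: shape = alpha, scale = 1/lambda
f_R <- dweibull(t, shape = alpha, scale = 1 / lambda)
S_R <- pweibull(t, shape = alpha, scale = 1 / lambda, lower.tail = FALSE)

print(all.equal(f_notes, f_R))          # pdf
print(all.equal(S_notes, S_R))          # survivor function
print(all.equal(h_notes, f_R / S_R))    # hazard = pdf / survivor
print(all.equal(H_notes, -log(S_R)))    # integrated hazard = -log S
```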


Exercise 4.6 Suggest an appropriate scale on which to plot the survivor function.


Sol: 4.6 $\log S(t) = -(\lambda t)^{\alpha}$, so

$\log[-\log S(t)] = \alpha\log\lambda + \alpha\log t$ is linear in log t:

so plot (log t, log[−log S(t)]).

The line has slope α and intercept α log λ, so rough estimates of (α, λ) can be read from the graph.
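To see how rough estimates can be read off such a plot, the sketch below simulates uncensored Weibull data with assumed true values α = 2 and λ = 0.02, uses the simple plotting positions 1 − i/(n + 1) as the empirical survivor function (an assumption; with censored data a KM estimate would be used instead), and fits a least-squares line: the slope estimates α and exp(intercept/slope) estimates λ.

```r
# Weibull plot: log[-log S(t)] = alpha*log(lambda) + alpha*log(t)
set.seed(1)
alpha  <- 2      # assumed true shape
lambda <- 0.02   # assumed true lambda, i.e. base R scale = 1/lambda = 50
n <- 500
t <- sort(rweibull(n, shape = alpha, scale = 1 / lambda))

# Empirical survivor function at the ordered times (no censoring), using the
# plotting positions 1 - i/(n + 1) so that S stays strictly between 0 and 1
S_emp <- 1 - seq_len(n) / (n + 1)
y <- log(-log(S_emp))
x <- log(t)

fit <- lm(y ~ x)
alpha_hat  <- unname(coef(fit)[2])                   # slope estimates alpha
lambda_hat <- exp(unname(coef(fit)[1]) / alpha_hat)  # intercept = alpha*log(lambda)

if (interactive()) {
  plot(x, y, xlab = "log t", ylab = "log[-log S(t)]")
  abline(fit, col = "red")
}
print(c(alpha_hat = alpha_hat, lambda_hat = lambda_hat))
```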


The parameters:

λ is the scale parameter, and

α is the shape parameter.

• α > 1, increasing hazard function

• α < 1, decreasing hazard function

• α = 1, Weibull(1, λ) = Exp(λ).

The scale of the hazard is determined by λ.


Looking back, for example, at the bearings data, we have near linearity on the complementary log–log scale but not on the logarithmic scale.

The Weibull distribution is therefore a better model for these data than the exponential distribution.


Weibull moments

The Weibull(α, λ) distribution has the following moments:

Expectation $E(T) = \Gamma(1 + 1/\alpha)/\lambda$;

Variance $\mathrm{var}(T) = [\Gamma(1 + 2/\alpha) - \Gamma(1 + 1/\alpha)^2]/\lambda^2$.
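These moment formulas can be checked by numerical integration. The sketch integrates on the standardised scale u = λt, where U = λT ∼ Weibull(α, 1), and then rescales, which keeps `integrate` well behaved; the parameter values are illustrative.

```r
alpha  <- 2.1     # illustrative shape
lambda <- 0.0122  # illustrative lambda

# U = lambda * T ~ Weibull(alpha, 1), so E(T^k) = E(U^k) / lambda^k
EU1 <- integrate(function(u) u   * dweibull(u, shape = alpha), 0, Inf, rel.tol = 1e-8)$value
EU2 <- integrate(function(u) u^2 * dweibull(u, shape = alpha), 0, Inf, rel.tol = 1e-8)$value
mean_numeric <- EU1 / lambda
var_numeric  <- (EU2 - EU1^2) / lambda^2

mean_formula <- gamma(1 + 1 / alpha) / lambda
var_formula  <- (gamma(1 + 2 / alpha) - gamma(1 + 1 / alpha)^2) / lambda^2

print(c(mean_numeric = mean_numeric, mean_formula = mean_formula))
print(c(var_numeric = var_numeric, var_formula = var_formula))
```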

One way of understanding the behaviour of a variable T with the Weibull(α, λ) distribution is via the following exercise.


Exercise 4.7 Prove, using the survivor functions, that if T ∼ Weibull(α, λ) and $Z = T^{\alpha}$, then $Z \sim \mathrm{Exp}(\lambda^{\alpha})$.

Also show that $(\lambda T)^{\alpha} \sim \mathrm{Exp}(1)$.


Sol: 4.7

$S_Z(z) = P(Z > z) = P(T^{\alpha} > z) = P(T > z^{1/\alpha}) = S_T(z^{1/\alpha}) = \exp\{-(\lambda z^{1/\alpha})^{\alpha}\} = \exp\{-\lambda^{\alpha} z\}$,

which is the survivor function of the $\mathrm{Exp}(\lambda^{\alpha})$ distribution.

The second part of the exercise is proved similarly (and just as easily). Homework.
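A numerical companion to the proof, not a substitute for it: the first part compares S_T(z^(1/α)) with the Exp(λ^α) survivor function at a few points, and the second part checks the homework claim by simulation. The values of α and λ are illustrative.

```r
alpha  <- 2.1     # illustrative shape
lambda <- 0.0122  # illustrative lambda

# Part 1: for Z = T^alpha, S_Z(z) = S_T(z^(1/alpha)) should equal exp(-lambda^alpha * z)
z <- c(1e3, 5e3, 1e4, 3e4)
S_Z   <- pweibull(z^(1 / alpha), shape = alpha, scale = 1 / lambda, lower.tail = FALSE)
S_exp <- pexp(z, rate = lambda^alpha, lower.tail = FALSE)
print(all.equal(S_Z, S_exp))

# Part 2 (the homework claim): (lambda * T)^alpha should behave like Exp(1)
set.seed(2)
E <- (lambda * rweibull(1e5, shape = alpha, scale = 1 / lambda))^alpha
print(c(mean = mean(E), var = var(E)))   # both close to 1 for Exp(1)
```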


The extreme value distribution

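The extreme value distribution has parameters µ and σ:

lifetime T ∼ EV(µ, σ), σ > 0, µ, T ∈ ℝ

pdf $f(t) = \sigma^{-1}\exp\left(\frac{t-\mu}{\sigma}\right)\exp\left[-\exp\left(\frac{t-\mu}{\sigma}\right)\right]$, −∞ < t < ∞;

survivor function $S(t) = \exp\left[-\exp\left(\frac{t-\mu}{\sigma}\right)\right]$, −∞ < t < ∞;

integrated hazard $H(t) = \exp\left(\frac{t-\mu}{\sigma}\right)$, −∞ < t < ∞;

hazard function $h(t) = \sigma^{-1}\exp\left(\frac{t-\mu}{\sigma}\right)$, −∞ < t < ∞.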


EV moments

Expectation E(T) = µ − γσ, where γ = 0.5772... is Euler's constant.

Variance var(T) = π²σ²/6 ≈ 1.645σ².

The extreme value distribution is obtained by a logarithmic transformation of an exponential random variable.


Exercise 4.8 Show that if Z ∼ Exp(1) and T = µ + σ log(Z), then T ∼ EV(µ, σ).


Sol: 4.8

$S_T(t) = P(T > t) = P(\mu + \sigma\log Z > t) = P(Z > e^{(t-\mu)/\sigma}) = S_Z(e^{(t-\mu)/\sigma}) = \exp[-\exp\{(t-\mu)/\sigma\}]$,

which is the survivor function of the EV(µ, σ) distribution, as claimed.
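A simulation check of Exercise 4.8 and of the EV moments: the sketch draws Z ∼ Exp(1), forms T = µ + σ log Z for illustrative values µ = 2 and σ = 0.5, and compares sample moments and the empirical survivor function with the formulas above. Euler's constant is computed as −digamma(1).

```r
mu <- 2; sigma <- 0.5             # illustrative parameters
set.seed(3)
Z   <- rexp(1e5)                  # Z ~ Exp(1)
Tev <- mu + sigma * log(Z)        # should be EV(mu, sigma)

euler_gamma <- -digamma(1)        # Euler's constant 0.5772...
print(c(mean_sim = mean(Tev), mean_formula = mu - euler_gamma * sigma))
print(c(var_sim = var(Tev), var_formula = pi^2 * sigma^2 / 6))

# Empirical versus theoretical survivor function S(t) = exp[-exp{(t - mu)/sigma}]
t0    <- c(0.5, 1.5, 2, 2.5)
S_emp <- sapply(t0, function(t) mean(Tev > t))
S_th  <- exp(-exp((t0 - mu) / sigma))
print(round(rbind(S_emp, S_th), 3))
```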


The standard EV distribution has µ = 0 and σ = 1, with pdf $\exp(t - e^{t})$.


As with the Weibull distribution, there is an asymptotic argument (beyond the scope of this course), analogous to the central limit theorem for sample means, which suggests that the extreme value distribution may be appropriate for modelling lifetimes in some special circumstances.

However, unlike the Weibull distribution, the domain of T is (−∞, ∞), so negative lifetimes have non-zero probability.

This is obviously a limitation of the model, though it is still often used.

In particular, $P(T < 0) = 1 - \exp\left[-\exp\left(-\frac{\mu}{\sigma}\right)\right]$, which can be made very small for suitable values of µ and σ, so in practice the failings of the model are not so great.
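To see how small P(T < 0) can be, a tiny helper (hypothetical name `p_negative`) evaluates the formula; `-expm1(-x)` computes 1 − exp(−x) accurately when x is small. The (µ, σ) values are illustrative.

```r
# P(T < 0) = 1 - S(0) = 1 - exp[-exp(-mu/sigma)] for T ~ EV(mu, sigma)
p_negative <- function(mu, sigma) -expm1(-exp(-mu / sigma))   # hypothetical helper

print(p_negative(mu = 3, sigma = 0.5))   # about 0.0025
print(p_negative(mu = 3, sigma = 1))     # about 0.049
```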


Other distributions

Other distributions are also used; these include the log–normal, gamma and log–logistic distributions, which have the following density functions.


The Lognormal distribution

• T ∼ lognormal(µ, σ²), σ > 0, t > 0

• T = exp(X) where X ∼ Normal(µ, σ²)

• log T ∼ Normal(µ, σ²)

• $T = e^{\mu}(e^{Z})^{\sigma}$ where Z ∼ Normal(0, 1)

• $f(t) = (2\pi\sigma^2 t^2)^{-1/2}\exp\{-(\log t - \mu)^2/(2\sigma^2)\}$.

$E[T] = \exp(\mu + \tfrac{1}{2}\sigma^2)$.


Basic functions for the standard log–normal distribution

[Figure: five panels plotting f(x), F(x), S(x), h(x) and H(x) against x for 0 ≤ x ≤ 15.]

Density, distribution, survivor, hazard and cumulative hazard functions for the standard lognormal distribution.
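The curves in the figure can be reproduced numerically with base R's `dlnorm`/`plnorm` (meanlog = µ, sdlog = σ). The sketch computes the five functions for the standard lognormal (µ = 0, σ = 1), confirms that the hazard rises to a peak and then falls, and checks E[T] = exp(µ + σ²/2) by numerical integration.

```r
x <- seq(0.01, 15, by = 0.01)
f  <- dlnorm(x)                        # standard lognormal: meanlog = 0, sdlog = 1
Fx <- plnorm(x)                        # distribution function
S  <- plnorm(x, lower.tail = FALSE)    # survivor function
h  <- f / S                            # hazard
H  <- -log(S)                          # cumulative hazard

x_peak <- x[which.max(h)]
print(x_peak)            # the hazard peaks at an interior point, then decreases

# Mean: E[T] = exp(mu + sigma^2/2) = exp(0.5) for the standard lognormal
ET <- integrate(function(t) t * dlnorm(t), 0, Inf, rel.tol = 1e-8)$value
print(c(numeric = ET, formula = exp(0.5)))
```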


The Gamma distribution

• T ∼ Gamma(α, λ), λ > 0, α > 0, t > 0

• $f(t) = \lambda^{\alpha} t^{\alpha-1}\exp(-\lambda t)/\Gamma(\alpha)$

• E[T] = α/λ, var[T] = α/λ²

• $X_1, \ldots, X_n$ iid Exp(λ) ⇒ $T = \sum_{i=1}^{n} X_i \sim \mathrm{Gamma}(n, \lambda)$
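A sketch checking both bullets: the density above is base R's `dgamma` with shape = α and rate = λ, and sums of n iid Exp(λ) draws have the Gamma(n, λ) mean and variance. Parameter values are illustrative.

```r
alpha  <- 3; lambda <- 0.5        # illustrative shape and rate
t <- seq(0.1, 20, by = 0.1)

# Density from the notes versus base R (shape = alpha, rate = lambda)
f_notes <- lambda^alpha * t^(alpha - 1) * exp(-lambda * t) / gamma(alpha)
print(all.equal(f_notes, dgamma(t, shape = alpha, rate = lambda)))

# Sum of n iid Exp(lambda) variables should be Gamma(n, lambda)
set.seed(4)
n <- 3
sums <- replicate(2e4, sum(rexp(n, rate = lambda)))
print(c(mean_sim = mean(sums), mean_formula = n / lambda))
print(c(var_sim = var(sums), var_formula = n / lambda^2))
```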


The log-logistic distribution

The logistic distribution

X ∼ logistic(µ, σ), σ > 0, −∞ < x < ∞.

Its density is bell-shaped and symmetric about µ, like a normal density, but it has a simple closed-form survivor function $S_X(x) = \dfrac{1}{1 + \exp\{(x-\mu)/\sigma\}}$.

• T ∼ log-logistic(µ, σ), σ > 0, t > 0,

- T = exp(X) where X ∼ logistic(µ, σ),

- log T ∼ logistic(µ, σ).

• $S_T(t) = \dfrac{1}{1 + (t e^{-\mu})^{1/\sigma}}$,

pdf log–logistic

$f(t) = \dfrac{(\sigma t)^{-1}\exp\{(\log t - \mu)/\sigma\}}{[1 + \exp\{(\log t - \mu)/\sigma\}]^2}$, t > 0.
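Because the survivor function is in closed form, it can be checked directly against base R's logistic functions applied to log t (`plogis`/`dlogis` take location = µ and scale = σ); the density picks up the Jacobian 1/t from the change of variables x = log t. The values of µ and σ are illustrative.

```r
mu <- 1; sigma <- 0.5             # illustrative parameters
t <- seq(0.1, 20, by = 0.1)

# Survivor function: closed form versus P(X > log t) for X ~ logistic(mu, sigma)
S_closed <- 1 / (1 + (t * exp(-mu))^(1 / sigma))
S_R      <- plogis(log(t), location = mu, scale = sigma, lower.tail = FALSE)
print(all.equal(S_closed, S_R))

# Density: closed form versus logistic density of log t times the Jacobian 1/t
u <- exp((log(t) - mu) / sigma)
f_closed <- u / (sigma * t * (1 + u)^2)
f_R      <- dlogis(log(t), location = mu, scale = sigma) / t
print(all.equal(f_closed, f_R))
```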


The logistic distribution is similar in shape to the normal distribution.

It and the log–logistic distribution are more manageable than the normal and log–normal when we encounter censored data, since their survivor functions have simple closed forms.


MLE

We fit probability models to censored lifetime data using maximum likelihood.

A big advantage of maximum likelihood (ML) is that it deals with censoring naturally, and the analysis extends to include covariate effects.

Assume n lifetime measurements $t_1, t_2, \ldots, t_n$ with associated censoring indicators $\delta_1, \delta_2, \ldots, \delta_n$ ($\delta_i = 1$ if $t_i$ is an observed failure time, $\delta_i = 0$ if it is right censored).

Assume the $t_i$ are iid, that is, they derive from a common distribution with unknown parameters θ.

We derive the likelihood function.


Exercise 4.9 Give a generic definition of the likelihood function.


Sol: 4.9

L(paras) = P(realised values | paras), or L(θ) = P(T = t | θ).

There is an obvious problem: with continuous models this probability is 0.

Fudge: P(t ≤ T < t + dt | θ) ≈ f_T(t | θ) dt, and we only need L(θ) up to a constant of proportionality, so define L(θ) = f_T(t | θ).

But how does this work with censored observations?


Sol: 4.9 (continued) Take a right-censored observation.

We observe t∗, so that in the model T ≥ t∗, and L(paras) = P(realised value | paras) becomes

L(θ) = P(T ≥ t∗ | θ) = S_T(t∗ | θ).


This calculation requires the censoring mechanism to be independent of the lifetimes; otherwise the likelihood is invalid.

Combining the two cases, an uncensored observation contributes f(t_i | θ) = h(t_i | θ)S(t_i | θ) and a censored one contributes S(t_i | θ), so

$L(\theta) = \prod_{i=1}^{n} h(t_i|\theta)^{\delta_i} S(t_i|\theta)$.

Likelihood inference now proceeds in the standard way, giving maximum likelihood estimates, standard errors and (asymptotic) confidence intervals.

For most lifetime models these procedures cannot be carried out analytically, so numerical techniques are required; but for the exponential distribution . . .
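The censored likelihood can be coded generically: each uncensored time contributes log f(t_i) and each censored time log S(t_i). The sketch below (hypothetical helper `censored_loglik`, made-up data) maximises it numerically with `optimize` for the exponential model; for this model the result should agree with the closed form m/Σt_i derived next.

```r
# Right-censored log-likelihood: delta = 1 (observed failure) contributes log f(t),
# delta = 0 (right censored) contributes log S(t)
censored_loglik <- function(logf, logS, time, delta) {   # hypothetical helper
  sum(delta * logf(time) + (1 - delta) * logS(time))
}

time  <- c(2, 3, 5, 8, 10)    # made-up data
delta <- c(1, 1, 0, 1, 0)     # 5 and 10 are right censored

negloglik_exp <- function(lambda) {
  -censored_loglik(function(t) dexp(t, rate = lambda, log = TRUE),
                   function(t) pexp(t, rate = lambda, lower.tail = FALSE, log.p = TRUE),
                   time, delta)
}

opt <- optimize(negloglik_exp, interval = c(1e-4, 2))
print(c(numerical = opt$minimum, closed_form = sum(delta) / sum(time)))
```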


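Exponential MLE

The likelihood becomes

$L(\lambda) = \prod_{i=1}^{n} h(t_i|\lambda)^{\delta_i} S(t_i|\lambda) = \prod_{i=1}^{n} \lambda^{\delta_i}\exp(-\lambda t_i)$,

$\ell(\lambda) = \sum_{i=1}^{n}\delta_i\log\lambda - \sum_{i=1}^{n}\lambda t_i = m\log\lambda - \lambda\sum_{i=1}^{n} t_i$,

where $m = \sum_{i=1}^{n}\delta_i$ is the number of uncensored observations.

Setting the score to 0 gives the MLE $\hat\lambda = m / \sum_{i=1}^{n} t_i$.

With uncensored data we get the usual MLE $1/\bar t$.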


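As

$\mathrm{var}(\hat\lambda) \approx \left(-\dfrac{d^2\ell}{d\lambda^2}\Big|_{\hat\lambda}\right)^{-1}$,

the approximate variance is

$\mathrm{var}(\hat\lambda) \approx \hat\lambda^2/m$.

Approximate 95% CI: $\hat\lambda \pm 1.96\sqrt{\mathrm{var}(\hat\lambda)}$.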


Exercise 4.10 With lifetime data 5, 7∗, 11 (∗ denotes a right-censored time), find the MLE and an asymptotic 95% CI.


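Sol: 4.10 With m = 2 uncensored times and Σt_i = 23, $\hat\lambda = 2/23 = 0.08695652$, and

$1.96\sqrt{\hat\lambda^2/m} = 1.96\hat\lambda/\sqrt{m} = 0.1205156$,

giving the CI 0.087 ± 0.121, i.e. (−0.034, 0.207).

Not great, as it covers negative values.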
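The arithmetic of Sol 4.10, as a short R sketch using the formulas above, with δ_i = 0 marking the censored 7∗.

```r
time  <- c(5, 7, 11)
delta <- c(1, 0, 1)               # 7* is right censored

m <- sum(delta)                   # number uncensored
lambda_hat <- m / sum(time)       # 2/23
se <- lambda_hat / sqrt(m)        # sqrt(lambda_hat^2 / m)
half_width <- 1.96 * se
ci <- lambda_hat + c(-1, 1) * half_width

print(lambda_hat)   # 0.08695652
print(half_width)   # 0.1205156
print(ci)           # the lower limit is negative
```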


Exercise 4.11 Find the Fisher information for this exponential likelihood.

Why does it not depend on the lifetimes?


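Sol: 4.11

$\ell(\lambda) = m\log\lambda - \lambda\sum_{i=1}^{n} t_i$

$\ell'(\lambda) = \dfrac{m}{\lambda} - \sum_{i=1}^{n} t_i$

$\ell''(\lambda) = -\dfrac{m}{\lambda^2}$, so the information is $m/\lambda^2$.

It does not depend on the lifetimes because, mathematically, the lifetimes enter the score only through the additive constant $-\sum t_i$, which vanishes on differentiating again; the data enter only through the number m of uncensored observations.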
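A numerical illustration of why the information does not involve the lifetimes: the sketch approximates −ℓ″(λ) by a central finite difference for two data sets with the same censoring pattern (m = 2) but very different times, and both agree with m/λ². The second data set and λ = 0.1 are illustrative.

```r
# Exponential log-likelihood and a central finite-difference second derivative
loglik_exp  <- function(lambda, time, delta) sum(delta) * log(lambda) - lambda * sum(time)
second_diff <- function(fun, x, eps = 1e-4) (fun(x + eps) - 2 * fun(x) + fun(x - eps)) / eps^2

lambda0 <- 0.1
delta   <- c(1, 0, 1)                              # same censoring pattern, m = 2
info_a  <- -second_diff(function(l) loglik_exp(l, c(5, 7, 11), delta), lambda0)
info_b  <- -second_diff(function(l) loglik_exp(l, c(50, 70, 110), delta), lambda0)

print(c(info_a = info_a, info_b = info_b, m_over_lambda2 = sum(delta) / lambda0^2))  # all about 200
```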


Exercise 4.12 Write down the log-likelihood for the Weibull lifetime distribution.


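Sol: 4.12

$L(\alpha, \lambda) = \prod_{i=1}^{n} h(t_i|\alpha,\lambda)^{\delta_i} S(t_i|\alpha,\lambda) = \prod_{i=1}^{n} [\lambda^{\alpha}\alpha t_i^{\alpha-1}]^{\delta_i} e^{-(\lambda t_i)^{\alpha}}$

$\ell(\alpha, \lambda) = \sum_{i=1}^{n}\left\{\delta_i\log[\lambda^{\alpha}\alpha t_i^{\alpha-1}] - (\lambda t_i)^{\alpha}\right\}$

$= (\alpha\log\lambda + \log\alpha)\sum_{i=1}^{n}\delta_i + (\alpha - 1)\sum_{i=1}^{n}\delta_i\log t_i - \lambda^{\alpha}\sum_{i=1}^{n} t_i^{\alpha}$.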
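A sketch coding this log-likelihood directly (hypothetical helper `weibull_loglik`, made-up data), checking it against base R's `dweibull`/`pweibull` with shape = α and scale = 1/λ, and then maximising it numerically with `optim` over the logs of both parameters (to keep them positive), since no closed-form MLE exists here.

```r
# Weibull log-likelihood from Sol 4.12, with delta = 1 for observed failures
weibull_loglik <- function(alpha, lambda, time, delta) {   # hypothetical helper
  (alpha * log(lambda) + log(alpha)) * sum(delta) +
    (alpha - 1) * sum(delta * log(time)) -
    lambda^alpha * sum(time^alpha)
}

time  <- c(12, 30, 45, 60, 80)   # made-up data
delta <- c(1, 1, 0, 1, 0)

# Check against base R densities (shape = alpha, scale = 1/lambda)
alpha <- 2.1; lambda <- 0.0122
direct <- sum(dweibull(time[delta == 1], shape = alpha, scale = 1 / lambda, log = TRUE)) +
  sum(pweibull(time[delta == 0], shape = alpha, scale = 1 / lambda,
               lower.tail = FALSE, log.p = TRUE))
print(all.equal(weibull_loglik(alpha, lambda, time, delta), direct))

# Numerical MLE, optimising over (log alpha, log lambda) to keep both positive
negll <- function(p) -weibull_loglik(exp(p[1]), exp(p[2]), time, delta)
fit <- optim(c(0, log(1 / mean(time))), negll)
print(c(alpha_hat = exp(fit$par[1]), lambda_hat = exp(fit$par[2])))
```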


Exercise 4.13 The standard errors computed from the Fisher information are asymptotic, in the sense that for large n the coverage probability of a confidence interval is numerically accurate.

Briefly outline the mathematical argument that justifies this statement.

Illustrate with the exponential distribution.


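Sol: 4.13 The exponential log-likelihood, written with the observed times t and then with the random times T, is

$\ell(\lambda|t) = m\log\lambda - \lambda\sum_{i=1}^{n} t_i$,  $\ell(\lambda|T) = m\log\lambda - \lambda\sum_{i=1}^{n} T_i$,

so the score is $\ell'(\lambda|T) = \sum_{i=1}^{n}(\delta_i/\lambda - T_i)$.

The rhs is a sum of independent random variables with finite second moment and so, appropriately standardised, it is asymptotically normal by the CLT; a Taylor expansion of the score about λ carries this asymptotic normality over to $\hat\lambda$.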
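The asymptotic claim can be illustrated by simulation: for uncensored Exp(λ) samples of size n = 200, the Wald interval λ̂ ± 1.96 λ̂/√n should cover the true λ in roughly 95% of replicates. The values of λ, n and the number of replicates are illustrative.

```r
set.seed(5)
lambda <- 0.1; n <- 200; reps <- 2000   # illustrative settings

covered <- replicate(reps, {
  t <- rexp(n, rate = lambda)           # uncensored, so m = n
  lambda_hat <- n / sum(t)
  half_width <- 1.96 * lambda_hat / sqrt(n)
  abs(lambda_hat - lambda) <= half_width
})
print(mean(covered))   # estimated coverage, close to the nominal 0.95
```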


MLE using R

The functions Surv and survreg in the R package survival provide the means to find MLEs numerically for parametric models.


Example 4.14

library(survival)

load("./m353data.Rdata")

attach(bearings) ; class(bearings)

survtime = Surv(time,cens) ; class(survtime)

survtime

Surv(time,cens) contains the censoring

information.


Example 4.15

survreg(survtime~1, data=bearings, dist='exponential')

Coefficients: (Intercept) 4.279729

Scale fixed at 1

Loglik(model)= -121.4

sum(cens)/sum(time) # analytic mle of lambda

log(sum(cens)/sum(time))

The constant model ~1 fits an intercept but no covariates.

There appears to be a mismatch between the scales of the two MLEs: why?
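A sketch for exploring this question. Since the bearings data are not reproduced here, it uses the `lung` data shipped with the survival package as a stand-in (there status 2 = death, 1 = censored) and compares the survreg intercept with −log(m/Σt_i), the negative log of the analytic MLE of λ.

```r
library(survival)

# lung (shipped with survival): status 1 = censored, 2 = dead
fit <- survreg(Surv(time, status) ~ 1, data = lung, dist = "exponential")

m <- sum(lung$status == 2)              # number of uncensored times
lambda_hat <- m / sum(lung$time)        # analytic MLE of lambda

print(c(intercept = unname(coef(fit)), minus_log_lambda_hat = -log(lambda_hat)))
```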


Fitting the Weibull

Consider the lifetime variable T of an individual with covariates x (e.g. age, sex, . . . ).

We require a linear predictor and a link function.

Within the Weibull family it is natural to model the parameter λ in terms of x.

As λ > 0, it is also natural to use a logarithmic link

λ = exp(β′x)

for some parameter vector β.


Recall that if T ∼ Weibull(α, λ), then $T^{\alpha} \sim \mathrm{Exp}(\lambda^{\alpha})$ and $(\lambda T)^{\alpha} \sim \mathrm{Exp}(1)$.

So letting $E = (\lambda T)^{\alpha}$, so that E ∼ Exp(1), and taking logs,

$\log T \sim -\log\lambda + \frac{1}{\alpha}\log E$.

The R function survreg takes the default logarithmic link and fits models of the form

log T ∼ β′x + σ log E.

Comparing these expressions, R models

• −log λ = β′x as the linear predictor (so R's β has the opposite sign to the β in λ = exp(β′x));

• the R scale parameter σ = 1/α as the reciprocal of the Weibull shape parameter.
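This correspondence can be checked by simulation: generate Weibull lifetimes with assumed α = 2 and λ = 0.02 (so base R scale = 1/λ = 50), apply independent uniform censoring (an illustrative choice), fit with survreg, and convert back via α̂ = 1/σ̂ and λ̂ = exp(−intercept).

```r
library(survival)
set.seed(6)
alpha <- 2; lambda <- 0.02; n <- 2000            # assumed true values
t_true    <- rweibull(n, shape = alpha, scale = 1 / lambda)
cens_time <- runif(n, 0, 150)                     # independent censoring (illustrative)
time  <- pmin(t_true, cens_time)
delta <- as.numeric(t_true <= cens_time)          # 1 = observed failure

fit <- survreg(Surv(time, delta) ~ 1, dist = "weibull")
alpha_hat  <- 1 / fit$scale                 # R scale sigma = 1/alpha
lambda_hat <- exp(-unname(coef(fit)))       # intercept = -log(lambda)
print(c(alpha_hat = alpha_hat, lambda_hat = lambda_hat))
```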


Exercise 4.16 Fitting the bearings data with the Weibull distribution gives

survreg(survtime~1, data=bearings, dist='weibull')

Coefficients: (Intercept) 4.405188

Scale= 0.4757721

Interpret the estimates in terms of the standard Weibull parameters.


Sol: 4.16

$-\log\hat\lambda = 4.405188$, so that $\hat\lambda = 0.0122 = 1/81.87$;

$1/\hat\alpha = 0.4757721$, so that $\hat\alpha = 2.10$.


Exercise 4.17 Use this code to compare the KM estimate of the survivor function with the parametric estimate from the fitted Weibull distribution.

plot(survfit(Surv(time,cens) ~ 1, data=bearings), col='blue')  # KM estimate

t = seq(0.001, 150, length=100)

H = (exp(-4.405188)*t)^(1/0.4757721)  # integrated hazard (lambda*t)^alpha

S = exp(-H)

lines(t, S, type="l", col='red') ; grid()


Sol: 4.17 Fitted T ∼ Weibull(2.10, 0.0122).

From the graph: excellent fit.


Model comparison

Does the Weibull distribution provide a better fit to these data than the exponential distribution?

The exponential distribution is obtained from the Weibull distribution by fixing α = 1, i.e. R scale = 1.

Under the null hypothesis α = 1, standard likelihood theory tells us that, approximately, 2(ℓ1 − ℓ2) ∼ χ²₁, where ℓ1 and ℓ2 are the maximised log-likelihoods of the Weibull and exponential models.

There is 1 degree of freedom, as there is a single constraint, α = 1, in the sub-model.

From the R output:

2(ℓ1 − ℓ2) = 2(−113.7 − (−121.4)) = 15.4,

which is highly significant when compared with the χ²₁ distribution.

We conclude that the Weibull distribution gives a much improved fit.
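The p-value for this likelihood ratio test can be computed with `pchisq`, using the two maximised log-likelihoods quoted above.

```r
loglik_weibull <- -113.7   # maximised log-likelihoods quoted above
loglik_exp     <- -121.4

lrt <- 2 * (loglik_weibull - loglik_exp)             # 15.4
p_value <- pchisq(lrt, df = 1, lower.tail = FALSE)   # upper tail of chi-squared(1)
print(c(lrt = lrt, p_value = p_value))
```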


Fitting covariates

We look at two examples of fitting models of the form

log T ∼ β′x + σ log E,

which includes both the exponential and Weibull models.


Example: leukaemia data

There are n = 33 observations, none censored, and 2 explanatory variables: wbc and ag.

The first few observations are

leuk

   wbc      ag time
1 2300 present   65
2  750 present  156
3 4300 present  100

hist(wbc)


wbc measures the white blood cell count; its distribution is skewed, suggesting a logarithmic transformation of this variable.

ag is a 2-level factor indicating presence or absence of the compound in the patient's blood.

First, we fit an additive model in which the linear predictor −log λ has a different intercept for the two ag groups, but the effect (slope) of log(wbc) is the same in both groups.

This corresponds to the model formula log(wbc)+ag.


attach(leuk)

out = survreg(Surv(time)~log(wbc)+ag, leuk, dist='weibull')

summary(out)

Value Std.Error z p

(Intercept) 5.8524 1.323 4.425 9.66e-06

log(wbc) -0.3103 0.131 -2.363 1.81e-02

agpresent 1.0206 0.378 2.699 6.95e-03

Log(scale) 0.0399 0.139 0.287 7.74e-01

Scale= 1.04


Loglik(intercept only)= -153.6

summary(out)$loglik # -146.4988


Exercise 4.18 Write out the fitted version of the model

$\log T \sim -\log\lambda + \frac{1}{\alpha}\log E$,  $-\log\lambda = \beta'x$.

Does a larger wbc lead to longer remission times?


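Sol: 4.18

$\log T \sim -\log\hat\lambda + 1.04\log E$ (the R scale 1.04 estimates 1/α, so $\hat\alpha = 1/1.04 \approx 0.96$),

$-\log\hat\lambda = 5.85 - 0.31x + 1.02\,I(\text{ag present})$,

where x represents log(wbc).

A larger wbc leads to shorter remission times, as the coefficient of x is negative.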


The fit of the model (the estimated scale 1.04 is close to 1, and the test of log(scale) = 0 has p = 0.774) suggests that an exponential lifetime distribution might be adequate.

out = survreg(Surv(time)~log(wbc)+ag, leuk, dist='exponential')

summary(out)

Value Std.Error z p

(Intercept) 5.815 1.263 4.60 4.15e-06

log(wbc) -0.304 0.124 -2.45 1.44e-02

agpresent 1.018 0.364 2.80 5.14e-03

Scale fixed at 1

Exponential distribution

Loglik(model)= -146.5 Loglik(intercept only)= -155.5

summary(out)$loglik # -146.5405


The deviance between the two models is 2(−146.4988 − (−146.5405)) = 0.0834, which is tiny compared with χ²₁: the null hypothesis α = 1 is not rejected.

The regression coefficients are clearly significant.

Hence this exponential model with formula log(wbc)+ag represents our ‘best’ model.

The estimated lifetime distribution for a patient is T ∼ Exp(λ̂), where

for a patient with ag absent

$-\log\hat\lambda = 5.815 - 0.304\log(\mathrm{wbc})$,

while for a patient with ag present

$-\log\hat\lambda = 5.815 + 1.018 - 0.304\log(\mathrm{wbc})$.
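Since E(T) = 1/λ for the exponential, the fitted mean remission time is exp(−log λ̂). The sketch (hypothetical helper `mean_remission`, illustrative wbc values) evaluates it from the rounded coefficients above: ag present multiplies the mean by exp(1.018) ≈ 2.77 at any wbc, and the mean falls as wbc rises.

```r
# Fitted exponential model: -log(lambda) = 5.815 + 1.018*I(ag present) - 0.304*log(wbc)
mean_remission <- function(wbc, ag_present) {              # hypothetical helper
  eta <- 5.815 + 1.018 * ag_present - 0.304 * log(wbc)     # eta = -log(lambda)
  exp(eta)                                                 # E(T) = 1/lambda = exp(eta)
}

wbc <- c(1000, 10000, 50000)   # illustrative white blood cell counts
print(rbind(ag_absent  = mean_remission(wbc, 0),
            ag_present = mean_remission(wbc, 1)))
```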


Diagnostics

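The two models under consideration are the exponential and the Weibull.

Assessing model fit needs to adjust for the covariate effects.

If $T_i \sim \mathrm{Exp}(\lambda_i)$, then $\lambda_i T_i \sim \mathrm{Exp}(1)$;

if $T_i \sim \mathrm{Weibull}(\alpha, \lambda_i)$, then $(\lambda_i T_i)^{\alpha} \sim \mathrm{Exp}(1)$;

and the rhs is the same for all i.

So use the standardised lifetimes $\hat\lambda_i t_i$ (or $(\hat\lambda_i t_i)^{\hat\alpha}$) for diagnostics: if the model is adequate, their KM estimate should resemble the Exp(1) survivor function, a straight line on a log scale.

out = survreg(Surv(time)~log(wbc) + ag, leuk, dist='exponential')

ntimes <- time*exp(-out$linear.predictors)  # lambda_i * t_i, as lambda_i = exp(-linear predictor)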


plot(survfit(Surv(ntimes) ~ 1), xlab='t', ylab='S(t)',  # KM

col='blue', log=TRUE); grid()

[Figure: KM estimate S(t) of the standardised lifetimes against t (0 to 4), with S(t) on a log scale (0.05 to 1).]

Residual plot of exponential fit to leuk data.

The plot is reasonably linear, except at long lifetimes where there is some evidence of curvature.

However, there are fewer data there.
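A self-contained version of this diagnostic, under the assumption that the course's leuk data are the same as `MASS::leuk`: fit the exponential model, form λ̂_i t_i = t_i exp(−η_i) with η_i the linear predictor, and compare their KM curve with exp(−t). With no censoring and an intercept in the model, the MLE score equation forces the standardised lifetimes to average exactly 1, which makes a useful sanity check.

```r
library(survival)
library(MASS)   # assumption: MASS::leuk is the leuk data used in these notes

fit <- survreg(Surv(time) ~ log(wbc) + ag, data = leuk, dist = "exponential")
print(coef(fit))

# Standardised lifetimes lambda_i * t_i, with lambda_i = exp(-linear predictor)
r  <- leuk$time * exp(-fit$linear.predictors)
km <- survfit(Surv(r) ~ 1)

if (interactive()) {
  plot(km, log = TRUE, xlab = "t", ylab = "S(t)", col = "blue")
  tt <- seq(0, max(r), length.out = 100)
  lines(tt, exp(-tt), lty = 2)          # Exp(1) reference: straight on the log scale
}
print(mean(r))   # 1 at the MLE, up to convergence tolerance
```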


Example: gehan data

This diagnostic procedure works equally well with censored data.

Consider the gehan data: fit the treatment group as a factor and use a Weibull distribution.

attach(gehan)

out = survreg(Surv(time,cens)~treat, gehan, dist='weibull')

Coefficients: (Intercept) treatcontrol

3.515687 -1.267335

Scale= 0.7321944

Loglik(model)= -106.6 Loglik(intercept only)= -116.4

ntimes = time*exp(-out$linear.predictors)

ntimes = (ntimes)^(1/out$scale) # alpha = 1/Rscale


plot(survfit(Surv(ntimes,cens) ~ 1, data=gehan),

col='blue', log=TRUE)
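The same idea for the censored gehan data, assuming they coincide with `MASS::gehan` (columns time, cens, treat): standardise with (λ̂_i t_i)^α̂, where α̂ = 1/σ̂, and compare the KM curve of the standardised times with exp(−t).

```r
library(survival)
library(MASS)   # assumption: MASS::gehan is the gehan data used in these notes

fit <- survreg(Surv(time, cens) ~ treat, data = gehan, dist = "weibull")
alpha_hat <- 1 / fit$scale                     # Weibull shape = 1 / R scale
print(c(coef(fit), scale = fit$scale))

# Standardised lifetimes (lambda_i * t_i)^alpha, with lambda_i = exp(-linear predictor)
r  <- (gehan$time * exp(-fit$linear.predictors))^alpha_hat
km <- survfit(Surv(r, gehan$cens) ~ 1)

if (interactive()) {
  plot(km, log = TRUE, xlab = "t", ylab = "S(t)", col = "blue")
  tt <- seq(0, max(r), length.out = 100)
  lines(tt, exp(-tt), lty = 2)                 # Exp(1) reference
}
```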


Exercise 4.19 Comment on the diagnostic plot and interpret the fitted model.


Sol: 4.19 The diagnostic based on $(\hat\lambda_i t_i)^{\hat\alpha}$ suggests that the Weibull is adequate.

The fitted lifetime distribution is

T ∼ Weibull(1/0.73, λ̂_i) = Weibull(1.37, λ̂_i),

where for a patient in the treatment group

−log λ̂ = 3.516,

while for a patient in the control group

−log λ̂ = 3.516 − 1.267 = 2.249.

The lifetimes are increased by taking the treatment, since the treatment group has the smaller λ̂.

