Parametric joint modelling of longitudinal and survival data
Chapter 4 Parametric modelling - Lancaster University
Parametric models
• “regression: obs = predictor + error”;
(or signal + noise)
The big intellectual step forward:
• (i) predictor model, contains covariates,
explanatory variables;
• (ii) error model, model for random variation,
e.g. normal.
Generalise: GLMs are built from
a linear predictor, which gives a regression-type
term, and
an exponential family (EF) model for the uncertainty or natural
random variation.
Exercise 4.1 Interest lies in comparing the
survival of two groups: is this modelled within the
predictor or within the error?
Sol: 4.1 In parametric survival modelling a
group effect may be represented in either.
e.g. different mean lifetimes in a linear predictor
e.g. different survival functions in the error
distribution.
Exercise 4.2 Is a model for a lifetime variable a
model for the predictor or for the error or for the
observed value?
Sol: 4.2 Both.
Need to think about what modelling means.
Exercise 4.3 How are the hazard and survivor
functions expressed within this framework:
observation, predictor or error?
Sol: 4.3
obs=data + model →h
Probability distributions for lifetime
variables
Exponential distribution
The exponential distribution is the ‘canonical
model’ for survival analysis.
lifetime T ∼ Exp(λ) λ > 0, T ≥ 0
pdf f(t) = λ exp(−λt), t ≥ 0;
survivor function S(t) = exp(−λt), t ≥ 0;
integrated hazard H(t) = λt, t ≥ 0;
hazard function h(t) = λ, t ≥ 0.
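A quick numerical check of the four functions above, as a minimal base-R sketch. The rate lambda = 0.2 and the time grid are arbitrary illustrative choices, not values from the notes. It confirms the general identities h = f/S and S = exp(−H) for this model.

```r
# Illustrative rate; any lambda > 0 would do (assumed value, not from the notes)
lambda <- 0.2
t <- seq(0, 20, by = 0.5)

f <- lambda * exp(-lambda * t)   # pdf
S <- exp(-lambda * t)            # survivor function
H <- lambda * t                  # integrated hazard
h <- rep(lambda, length(t))      # hazard function (constant in t)

# General identities: h = f / S and S = exp(-H)
check_hazard   <- all.equal(f / S, h)
check_survivor <- all.equal(S, exp(-H))

# Base R's exponential functions agree with the formulas
check_dexp <- all.equal(f, dexp(t, rate = lambda))
check_pexp <- all.equal(S, pexp(t, rate = lambda, lower.tail = FALSE))
```

The constant hazard is what makes the exponential the simplest lifetime model: the other three functions follow from the single number lambda.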
Exercise 4.4 Suggest an appropriate scale to
plot the survivor function.
Sol: 4.4
logS(t) = −λt linear:
so plot (t, logS(t)).
Exercise 4.5 Suggest how covariates can be
represented in this model.
Sol: 4.5
Through λ, note E(T ) = 1/λ.
i.e. this is an instance of a glm.
Weibull distribution
The Weibull distribution has scale parameter
λ > 0 and shape parameter α > 0.
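lifetime $T \sim \mathrm{Weibull}(\alpha, \lambda)$, $\lambda > 0$, $\alpha > 0$, $T \ge 0$
pdf $f(t) = \alpha \lambda^{\alpha} t^{\alpha-1} e^{-(\lambda t)^{\alpha}}$, $t \ge 0$;
survivor function $S(t) = e^{-(\lambda t)^{\alpha}}$, $t \ge 0$;
integrated hazard $H(t) = (\lambda t)^{\alpha}$, $t \ge 0$;
hazard function $h(t) = \lambda^{\alpha} \alpha t^{\alpha-1}$, $t \ge 0$.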
Exercise 4.6 Suggest an appropriate scale to
plot the survivor function.
Sol: 4.6
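$\log S(t) = -(\lambda t)^{\alpha}$ and
$\log[-\log S(t)] = \alpha \log \lambda + \alpha \log t$, which is linear in $\log t$:
so plot $(\log t, \log[-\log S(t)])$.
The linear plot allows rough estimates of (α, λ)
to be read from the graph: the slope is α and the intercept is α log λ.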
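To see how (α, λ) are read off the plot, here is a small base-R sketch. It uses idealised "data" placed at exact Weibull quantiles, so the points are exactly linear; the values alpha = 2 and lambda = 0.5 are illustrative assumptions. With real data S(t) would be replaced by an estimate such as Kaplan–Meier, and the line would only be approximate.

```r
# Illustrative parameter values (assumptions, not estimates from the notes)
alpha  <- 2.0
lambda <- 0.5

# Idealised "data": exact Weibull quantiles at survivor levels 0.9, ..., 0.1.
# R's Weibull uses scale = 1 / lambda in the parametrisation of these notes.
S <- seq(0.9, 0.1, by = -0.1)
t <- qweibull(1 - S, shape = alpha, scale = 1 / lambda)

# On the complementary log-log scale the points lie on a straight line
x <- log(t)
y <- log(-log(S))
fit <- lm(y ~ x)

alpha_hat  <- unname(coef(fit)[2])                    # slope = alpha
lambda_hat <- exp(unname(coef(fit)[1]) / alpha_hat)   # intercept = alpha * log(lambda)

plot(x, y, xlab = "log(t)", ylab = "log(-log S(t))")
abline(fit)
```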
The parameters:
λ is the scale parameter, and
α is the shape parameter.
• α > 1, increasing hazard function
• α < 1, decreasing hazard function
• α = 1, Weibull(1, λ) = Exp(λ).
The scale of the hazard is determined by λ.
Looking back, for example, at the bearings
data, we have near linearity on the complementary
log–log scale but not the logarithmic scale.
The Weibull distribution is a better model for the
data than the exponential distribution.
Weibull moments
The Weibull(α, λ) distribution has the following
moments:
Expectation $E(T) = \Gamma(1 + 1/\alpha)/\lambda$;
Variance
$\mathrm{var}(T) = [\Gamma(1 + 2/\alpha) - \Gamma(1 + 1/\alpha)^2]/\lambda^2$.
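The two moment formulas can be checked by numerical integration. The sketch below uses base R only; alpha = 2 and lambda = 0.5 are illustrative assumptions, and dweibull is called with scale = 1/lambda to match the parametrisation used here.

```r
alpha  <- 2.0   # illustrative shape (assumed)
lambda <- 0.5   # illustrative scale parameter (assumed)

# Weibull(alpha, lambda) density in the parametrisation of these notes
f <- function(t) dweibull(t, shape = alpha, scale = 1 / lambda)

# First and second moments by numerical integration
m1 <- integrate(function(t) t   * f(t), 0, Inf)$value
m2 <- integrate(function(t) t^2 * f(t), 0, Inf)$value

# Closed forms for the mean and variance
mean_formula <- gamma(1 + 1 / alpha) / lambda
var_formula  <- (gamma(1 + 2 / alpha) - gamma(1 + 1 / alpha)^2) / lambda^2

c(numeric = m1, formula = mean_formula)
c(numeric = m2 - m1^2, formula = var_formula)
```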
One way of understanding the behaviour of a
variable T which has the Weibull(α, λ)
distribution is via the following exercise.
Exercise 4.7 Prove using the survivor functions
that if $T \sim \mathrm{Weibull}(\alpha, \lambda)$ and $Z = T^{\alpha}$ then
$Z \sim \mathrm{Exp}(\lambda^{\alpha})$.
Also show that $(\lambda T)^{\alpha} \sim \mathrm{Exp}(1)$.
Sol: 4.7
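$S_Z(z) = P(Z > z)$
$= P(T^{\alpha} > z)$
$= P(T > z^{1/\alpha})$
$= S_T(z^{1/\alpha})$
$= \exp\{-(\lambda z^{1/\alpha})^{\alpha}\} = \exp\{-\lambda^{\alpha} z\}$,
which is the survivor function of the $\mathrm{Exp}(\lambda^{\alpha})$
distribution.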
The second part of the lemma is proved similarly
(and just as easily.) Homework.
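Both claims of Exercise 4.7 can be checked numerically (a check, not a proof). The sketch compares survivor functions on a grid using base R; alpha = 1.7 and lambda = 0.3 are illustrative assumptions.

```r
alpha  <- 1.7   # illustrative values (assumed)
lambda <- 0.3
z <- seq(0.1, 50, length.out = 200)

# Part 1: P(T^alpha > z) = P(T > z^(1/alpha)), from the Weibull survivor function
S_Z   <- pweibull(z^(1 / alpha), shape = alpha, scale = 1 / lambda, lower.tail = FALSE)
S_exp <- pexp(z, rate = lambda^alpha, lower.tail = FALSE)   # Exp(lambda^alpha)
max_diff_1 <- max(abs(S_Z - S_exp))

# Part 2: P((lambda * T)^alpha > u) = P(T > u^(1/alpha) / lambda)
u <- seq(0.01, 10, length.out = 200)
S_U    <- pweibull(u^(1 / alpha) / lambda, shape = alpha, scale = 1 / lambda,
                   lower.tail = FALSE)
S_exp1 <- pexp(u, rate = 1, lower.tail = FALSE)             # Exp(1)
max_diff_2 <- max(abs(S_U - S_exp1))
```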
The extreme value distribution
The extreme value distribution has parameters µ
and σ:
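lifetime $T \sim \mathrm{EV}(\mu, \sigma)$, $\sigma > 0$, $\mu, T \in \mathbb{R}$
pdf $f(t) = \sigma^{-1} \exp\left(\frac{t-\mu}{\sigma}\right) \exp\left[-\exp\left(\frac{t-\mu}{\sigma}\right)\right]$, $-\infty < t < \infty$;
survivor function $S(t) = \exp\left[-\exp\left(\frac{t-\mu}{\sigma}\right)\right]$, $-\infty < t < \infty$;
integrated hazard $H(t) = \exp\left(\frac{t-\mu}{\sigma}\right)$, $-\infty < t < \infty$;
hazard function $h(t) = \sigma^{-1} \exp\left(\frac{t-\mu}{\sigma}\right)$, $-\infty < t < \infty$.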
EV moments
Expectation $E(T) = \mu - \gamma\sigma$ where
$\gamma = 0.5772\ldots$ is Euler's constant.
Variance $\mathrm{var}(T) = \pi^2\sigma^2/6 \approx 1.645\sigma^2$.
The extreme value distribution is obtained by a
logarithm transformation of an exponential
random variable.
Exercise 4.8 Show that if Z ∼ Exp(1) and
T = µ + σ log (Z) then T ∼ EV(µ, σ).
Sol: 4.8
$S_T(t) = P(T > t) = P(\mu + \sigma \log Z > t)$
$= P(Z > \exp\{(t-\mu)/\sigma\}) = S_Z(\exp\{(t-\mu)/\sigma\}) = \exp[-\exp\{(t-\mu)/\sigma\}]$,
which is the survivor function of the EV(µ, σ)
distribution as claimed.
The standard EV distribution has µ = 0 and
σ = 1 with pdf exp (t − exp t).
Like the Weibull distribution, there is some asymptotic
argument (beyond the scope of this course), analogous to the
central limit theorem for sample means, which suggests that
the extreme value distribution may be appropriate for
modelling lifetimes in some special circumstances.
However, unlike the Weibull distribution, the domain of T is
(−∞,∞), so that negative lifetimes will have a non–zero
probability.
This is obviously a limitation in the use of the model, though it
is still often used.
In particular, $P(T < 0) = 1 - \exp[-\exp(-\mu/\sigma)]$, which can be
made very small for suitable values of µ and σ so in practice
the failings of the model are not so great.
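To get a feel for how small P(T < 0) can be, the formula can be evaluated directly. A minimal base-R sketch; the parameter values are illustrative assumptions only, and the function name is hypothetical.

```r
# P(T < 0) = 1 - S(0) for T ~ EV(mu, sigma)
p_negative <- function(mu, sigma) 1 - exp(-exp(-mu / sigma))

# Illustrative (assumed) values of mu, with sigma fixed at 1
mu <- c(0, 2, 5, 10)
p  <- p_negative(mu, sigma = 1)
data.frame(mu = mu, sigma = 1, p_negative = p)
```

The probability depends on the parameters only through the ratio µ/σ, so it shrinks quickly once µ is several multiples of σ.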
Other distributions
Other distributions are also used: these include the
log–normal,
gamma and
log–logistic distributions,
which have the following density functions.
The Lognormal distribution
• T ∼ lognormal(µ, σ2) σ > 0 , t > 0
• T = exp(X) where X ∼ Normal(µ, σ2)
• logT ∼ Normal(µ, σ2)
• $T = e^{\mu}(e^{Z})^{\sigma}$ where $Z \sim \mathrm{Normal}(0, 1)$
• $f(t) = (2\pi\sigma^2 t^2)^{-1/2} \exp\{-(\log t - \mu)^2/(2\sigma^2)\}$.
$E[T] = \exp\left(\mu + \tfrac{1}{2}\sigma^2\right)$.
Basic functions for the standard log–normal
distribution
[Figure: five panels showing f(x), F(x), S(x), h(x) and H(x) against x, for x from 0 to 15.]
Density, distribution, survivor, hazard and
cumulative hazard functions for the standard
lognormal distribution.
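The five curves can be redrawn in base R from dlnorm and plnorm. This is a sketch of one way to do it, not the code used for the original figure; the grid from 0.01 to 15 is an assumption chosen to match the axis range.

```r
x <- seq(0.01, 15, length.out = 500)

f   <- dlnorm(x)                       # density
cdf <- plnorm(x)                       # distribution function F(x)
S   <- plnorm(x, lower.tail = FALSE)   # survivor function
h   <- f / S                           # hazard
H   <- -log(S)                         # cumulative hazard

op <- par(mfrow = c(2, 3))
plot(x, f,   type = "l", ylab = "f(x)")
plot(x, cdf, type = "l", ylab = "F(x)")
plot(x, S,   type = "l", ylab = "S(x)")
plot(x, h,   type = "l", ylab = "h(x)")
plot(x, H,   type = "l", ylab = "H(x)")
par(op)
```

The hazard panel is the informative one: unlike the Weibull, the lognormal hazard rises to a peak and then declines.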
The Gamma distribution
• T ∼ Gamma(α, λ) λ > 0 , α > 0 , t > 0
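• $f(t) = \lambda^{\alpha} t^{\alpha-1} \exp(-\lambda t)/\Gamma(\alpha)$
• $E[T] = \alpha/\lambda$, $\mathrm{var}[T] = \alpha/\lambda^2$
• $X_1, \ldots, X_n$ iid $\mathrm{Exp}(\lambda)$
$\Rightarrow T = \sum_{i=1}^{n} X_i \sim \mathrm{Gamma}(n, \lambda)$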
The log-logistic distribution
The logistic distribution
X ∼ logistic(µ, σ) σ > 0 ,−∞ < x < ∞.
It looks like Normal(µ, σ²) but has a simple
closed form $S_X(x) = \dfrac{1}{1 + \exp\left(\frac{x-\mu}{\sigma}\right)}$.
• T ∼ log-logistic(µ, σ), σ > 0, t > 0,
- T = exp(X) where X ∼ logistic(µ, σ),
- log T ∼ logistic(µ, σ).
• $S_T(t) = \dfrac{1}{1 + (t \exp(-\mu))^{1/\sigma}}$,
pdf log–logistic
$f(t) = \dfrac{1}{\sigma t} \, \dfrac{\exp\{(\log t - \mu)/\sigma\}}{[1 + \exp\{(\log t - \mu)/\sigma\}]^2}$, $t > 0$.
The logistic distribution is similar in shape to the
normal distribution.
It and the log–logistic are more manageable than
the normal and log–normal when we encounter
censored data, since their survivor functions have
simple closed forms.
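As an illustration of the closed form, the log-logistic survivor function can be coded in one line and checked against base R's logistic distribution applied to log t. The function name and the parameter values below are illustrative assumptions.

```r
# Survivor function of log-logistic(mu, sigma): 1 / (1 + (t * exp(-mu))^(1/sigma))
S_loglogistic <- function(t, mu, sigma) 1 / (1 + (t * exp(-mu))^(1 / sigma))

mu    <- 1     # illustrative (assumed) values
sigma <- 0.5
t     <- c(0.5, 1, 2, 5, 10)

closed_form <- S_loglogistic(t, mu, sigma)

# Same quantity via log T ~ logistic(mu, sigma), using base R's plogis
via_plogis <- plogis(log(t), location = mu, scale = sigma, lower.tail = FALSE)

cbind(t, closed_form, via_plogis)
```

A censored observation contributes S(t) to the likelihood, so for this model each censored term is a single arithmetic expression, with no numerical integration needed.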
MLE
We fit probability models to censored lifetime
data using maximum likelihood.
A big plus for ML is that it can deal with
censoring naturally, and the
analysis extends to include covariate effects.
Assume n lifetime measurements $t_1, t_2, \ldots, t_n$
with associated censoring indicators $\delta_1, \delta_2, \ldots, \delta_n$
(δi = 1 if uncensored, δi = 0 if censored).
Assume iid: that is, the $t_i$ derive from a common
distribution, with unknown parameters θ.
We derive the likelihood function.
Exercise 4.9 Give a generic definition of the
likelihood function.
Sol: 4.9
L(paras) = P(realised values| paras)
or
L(θ) = P(T = t|θ).
Obvious problem: with continuous models this probability is 0.
Fudge: we want L(θ) ∝ P(T = t|θ), so define
$L(\theta) = f_T(t|\theta)$.
But how does this work with censored
observations?
Sol: 4.9 (continued) Take a right-censored observation.
We observe t∗, so in the model T ≥ t∗,
and L(paras) = P(realised value| paras)
becomes
$L(\theta) = P(T \ge t^{*}|\theta) = S_T(t^{*}|\theta)$.
If $t_i$ corresponds to an uncensored observation,
then the likelihood contribution is the pdf $f(t_i|\theta)$;
if $t_i$ corresponds to a censored lifetime, then
$T_i \ge t_i$, and the likelihood contribution is $S(t_i|\theta)$.
By independence the overall likelihood is:
$L(\theta) = \prod_{\delta_i=1} f(t_i|\theta) \prod_{\delta_i=0} S(t_i|\theta)$
$= \prod_{\delta_i=1} h(t_i|\theta) S(t_i|\theta) \prod_{\delta_i=0} S(t_i|\theta)$
$= \prod_{i=1}^{n} h(t_i|\theta)^{\delta_i} S(t_i|\theta)$.
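The final product form translates directly into code: on the log scale each observation contributes δi log h(ti) + log S(ti). Below is a minimal base-R sketch. The function and argument names are illustrative, and the four data values are made up for the example.

```r
# Generic censored log-likelihood: sum of delta_i * log h(t_i) + log S(t_i).
# `hazard` and `survivor` are user-supplied functions of (t, theta);
# these names are illustrative, not from the notes.
censored_loglik <- function(theta, t, delta, hazard, survivor) {
  sum(delta * log(hazard(t, theta)) + log(survivor(t, theta)))
}

# Exponential model: h(t) = lambda, S(t) = exp(-lambda * t)
exp_hazard   <- function(t, lambda) rep(lambda, length(t))
exp_survivor <- function(t, lambda) exp(-lambda * t)

# Made-up data: four lifetimes, the third one right censored
t     <- c(2, 3, 6, 8)
delta <- c(1, 1, 0, 1)

ll <- censored_loglik(0.1, t, delta, exp_hazard, exp_survivor)

# Same value from the first form: pdf for uncensored, survivor for censored
ll_alt <- sum(dexp(t[delta == 1], rate = 0.1, log = TRUE)) +
  sum(pexp(t[delta == 0], rate = 0.1, lower.tail = FALSE, log.p = TRUE))
```

Passing the hazard and survivor functions as arguments means the same likelihood code serves any of the lifetime models in this chapter.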
This calculation requires the independence of the
censoring mechanism and the lifetimes,
otherwise the likelihood is invalid.
Likelihood inference now proceeds in the standard
way, giving
maximum likelihood estimates,
standard errors and confidence intervals
(asymptotic).
For most lifetime models the procedures cannot
be implemented analytically and so numerical
techniques are required;
but for the exponential distribution . . .
Exponential mle
The likelihood becomes
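$L(\lambda) = \prod_{i=1}^{n} h(t_i|\lambda)^{\delta_i} S(t_i|\lambda) = \prod_{i=1}^{n} \lambda^{\delta_i} \exp(-\lambda t_i)$,
$\ell(\lambda) = \sum_{i=1}^{n} \delta_i \log\lambda - \sum_{i=1}^{n} \lambda t_i$
$= m \log\lambda - \lambda \sum_{i=1}^{n} t_i$
where m is the number uncensored.
Setting the score to 0 gives the mle $\hat\lambda = m / \sum_{i=1}^{n} t_i$.
With uncensored data we get the usual mle $1/\bar{t}$.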
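The closed-form mle can be compared with a direct numerical maximisation of the log-likelihood. A base-R sketch follows; the data are made up for illustration and the search interval passed to optimize is an assumption.

```r
t     <- c(2, 3, 6, 8)   # made-up lifetimes
delta <- c(1, 1, 0, 1)   # 1 = observed, 0 = right censored

# Exponential log-likelihood: m * log(lambda) - lambda * sum(t)
loglik <- function(lambda) sum(delta) * log(lambda) - lambda * sum(t)

# Closed-form mle: number uncensored / total observed time
lambda_closed <- sum(delta) / sum(t)

# Numerical maximisation over a bracketing interval (interval is an assumption)
opt <- optimize(loglik, interval = c(1e-6, 5), maximum = TRUE, tol = 1e-8)
lambda_numeric <- opt$maximum
```

The censored lifetime still adds to the denominator (time at risk) but not to the numerator m.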
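As
$\mathrm{var}(\hat\lambda) \approx \left(-\dfrac{d^2\ell}{d\lambda^2}\right)^{-1} \Big|_{\hat\lambda}$
the approximate variance is
$\mathrm{var}(\hat\lambda) \approx \hat\lambda^2/m$.
Approx ci: $\hat\lambda \pm 1.96\sqrt{\mathrm{var}(\hat\lambda)}$.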
Exercise 4.10 With lifetime data 5, 7∗, 11 (where ∗ marks a censored lifetime) find
the mle and an asymptotic 95% ci.
Sol: 4.10 $\hat\lambda = 2/23 = 0.08695652$,
$1.96\sqrt{\hat\lambda^2/m} = 1.96\,\hat\lambda/\sqrt{m} = 0.1205156$,
giving ci 0.087 ± 0.121.
Not great, as the interval covers negative values.
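The arithmetic of this solution in base R, as a minimal sketch; the variable names are illustrative.

```r
t     <- c(5, 7, 11)
delta <- c(1, 0, 1)      # the lifetime 7 is censored

m          <- sum(delta)                  # number uncensored
lambda_hat <- m / sum(t)                  # mle, 2/23
se         <- lambda_hat / sqrt(m)        # sqrt(lambda_hat^2 / m)
half_width <- 1.96 * se
ci         <- lambda_hat + c(-1, 1) * half_width
```

With only m = 2 uncensored lifetimes the standard error is of the same order as the estimate itself, which is why the lower limit falls below zero.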
Exercise 4.11 Find the Fisher information for
this exponential likelihood.
Why does it not depend on the lifetimes?
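Sol: 4.11
$\ell(\lambda) = m \log\lambda - \lambda \sum_{i=1}^{n} t_i$
$\ell'(\lambda) = \dfrac{m}{\lambda} - \sum_{i=1}^{n} t_i$
$\ell''(\lambda) = -\dfrac{m}{\lambda^2}$,
so the information is $-\ell''(\lambda) = m/\lambda^2$.
Mathematically, it does not depend on the lifetimes
because they affect the score function only through an
additive constant, which vanishes on differentiating again.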
Exercise 4.12 Write down the log–likelihood for
the Weibull lifetime distribution.
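Sol: 4.12
$L(\alpha, \lambda) = \prod_{i=1}^{n} h(t_i|\alpha, \lambda)^{\delta_i} S(t_i|\alpha, \lambda)$
$= \prod_{i=1}^{n} [\lambda^{\alpha} \alpha t_i^{\alpha-1}]^{\delta_i} e^{-(\lambda t_i)^{\alpha}}$
$\ell(\alpha, \lambda) = \sum_{i=1}^{n} \left\{ \delta_i \log[\lambda^{\alpha} \alpha t_i^{\alpha-1}] - (\lambda t_i)^{\alpha} \right\}$
$= (\alpha \log\lambda + \log\alpha) \sum_{i=1}^{n} \delta_i + (\alpha - 1) \sum_{i=1}^{n} \delta_i \log t_i - \lambda^{\alpha} \sum_{i=1}^{n} t_i^{\alpha}$.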
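The Weibull log-likelihood has no closed-form maximiser, so it is maximised numerically. The sketch below codes the final expression and hands it to base R's optim. The data are made up (they are not the bearings data), and optimising over (log α, log λ) is a design choice of this sketch to keep both parameters positive.

```r
# Weibull log-likelihood as a function of (alpha, lambda)
weibull_loglik <- function(alpha, lambda, t, delta) {
  (alpha * log(lambda) + log(alpha)) * sum(delta) +
    (alpha - 1) * sum(delta * log(t)) -
    lambda^alpha * sum(t^alpha)
}

# Made-up illustrative data; 0 marks a censored time
t     <- c(17, 28, 33, 42, 48, 51, 54, 68, 84, 93, 105, 128)
delta <- c( 1,  1,  1,  1,  0,  1,  1,  1,  0,  1,   1,   1)

# Exponential mle and its log-likelihood (the alpha = 1 sub-model)
lambda_exp <- sum(delta) / sum(t)
ll_exp     <- weibull_loglik(1, lambda_exp, t, delta)

# Maximise over (log alpha, log lambda), starting from the exponential fit
negll <- function(par) -weibull_loglik(exp(par[1]), exp(par[2]), t, delta)
fit   <- optim(c(0, log(lambda_exp)), negll)

alpha_hat  <- exp(fit$par[1])
lambda_hat <- exp(fit$par[2])
ll_weibull <- -fit$value
```

Starting from the exponential fit guarantees that the maximised Weibull log-likelihood is at least as large as the exponential one, as it must be for nested models.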
Exercise 4.13 The standard errors computed
from the Fisher information are asymptotic in the
sense that for large n the coverage probability of
a confidence interval is numerically accurate.
Briefly outline the mathematical argument that
justifies this statement.
Illustrate with the exponential distribution.
Sol: 4.13 The exponential log-likelihood is
$\ell(\lambda|t) = m \log\lambda - \lambda \sum_{i=1}^{n} t_i$,
and, regarded as a function of the random lifetimes,
$\ell(\lambda|T) = m \log\lambda - \lambda \sum_{i=1}^{n} T_i$.
But the rhs is a
sum of independent random variables
with finite second moment and so, if
appropriately standardised,
is asymptotically Normal by the CLT.
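The coverage claim can be illustrated by simulation. This base-R sketch estimates the coverage of the interval λ̂ ± 1.96 λ̂/√m for uncensored exponential data. The true rate, sample size, number of replicates and seed are all arbitrary assumptions.

```r
set.seed(1)            # arbitrary seed, for reproducibility only
lambda <- 0.1          # illustrative true rate (assumed)
n      <- 200          # sample size per simulated data set
reps   <- 2000         # number of simulated data sets

covered <- replicate(reps, {
  t          <- rexp(n, rate = lambda)      # uncensored lifetimes, so m = n
  lambda_hat <- n / sum(t)                  # mle
  half_width <- 1.96 * lambda_hat / sqrt(n) # 1.96 * sqrt(lambda_hat^2 / m)
  abs(lambda_hat - lambda) <= half_width    # does the interval cover lambda?
})
coverage <- mean(covered)
```

Rerunning with a small n (say 5) gives a feel for how far the asymptotic approximation can be trusted in small samples.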
mle using R
The functions Surv and survreg in the
R-package survival provide the means to
numerically find mle for parametric models.
Example 4.14
library(survival)
load("./m353data.Rdata")
attach(bearings) ; class(bearings)
survtime = Surv(time,cens) ; class(survtime)
survtime
Surv(time,cens) contains the censoring
information.
Example 4.15
survreg(survtime~1,data=bearings,dist='exponential')
Coefficients: (Intercept) 4.279729
Scale fixed at 1
Loglik(model)= -121.4
sum(cens)/sum(time) # analytic mle of lambda
log(sum(cens)/sum(time))
The constant model 1 fits the intercept but no
covariates.
There appears to be a mismatch between the scales
of the two mles?
Fitting the Weibull
Consider lifetime variable T of individual with
covariates x (e.g. age, sex, . . . ).
We require a linear predictor and link function.
Within the Weibull family it is natural to model
the parameter λ in terms of x.
As λ > 0 it is also natural to use a logarithmic
link
λ = exp(β′x)
for some parameter vector β.
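Recall that if $T \sim \mathrm{Weibull}(\alpha, \lambda)$, then
$T^{\alpha} \sim \mathrm{Exp}(\lambda^{\alpha})$ and $(\lambda T)^{\alpha} \sim \mathrm{Exp}(1)$.
So letting $E = (\lambda T)^{\alpha}$ and taking logs,
$\log T \sim -\log\lambda + \dfrac{1}{\alpha} \log E$.
The R function survreg takes the default
logarithmic link and fits models of the form
$\log T \sim \beta'x + \sigma \log E$.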
Comparing these expressions, R models
• − logλ = β′x is the linear predictor;
• the R-scale parameter σ = 1/α is the reciprocal
of the Weibull shape parameter.
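The conversion between survreg output and (α, λ) can be sketched as follows. The data frame is made up for illustration (it is not the bearings data). For the exponential fit the intercept can be checked against the analytic mle, since the intercept estimates −log λ.

```r
library(survival)

# Made-up illustrative data; cens = 0 marks a censored time
d <- data.frame(
  time = c(17, 28, 33, 42, 48, 51, 54, 68, 84, 93, 105, 128),
  cens = c( 1,  1,  1,  1,  0,  1,  1,  1,  0,  1,   1,   1)
)

# Exponential fit: the intercept estimates -log(lambda)
fit_exp       <- survreg(Surv(time, cens) ~ 1, data = d, dist = "exponential")
lambda_exp    <- exp(-unname(coef(fit_exp)))
lambda_closed <- sum(d$cens) / sum(d$time)   # analytic mle m / sum(t)

# Weibull fit: convert survreg's (intercept, scale) to (alpha, lambda)
fit_wei    <- survreg(Surv(time, cens) ~ 1, data = d, dist = "weibull")
alpha_hat  <- 1 / fit_wei$scale              # shape = 1 / R scale
lambda_hat <- exp(-unname(coef(fit_wei)))    # lambda = exp(-intercept)
```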
Exercise 4.16 Fitting the bearings data with the
Weibull distribution gives
survreg(survtime~1,data=bearings,dist='weibull')
Coefficients:(Intercept) 4.405188
Scale= 0.4757721
Loglik(model)= -113.7
Interpret the estimates in terms of the standard
Weibull parameters.
Sol: 4.16
$-\log\hat\lambda = 4.405188$ so that
$\hat\lambda = 0.0122 = 1/81.87$,
$1/\hat\alpha = 0.4757721$ so that
$\hat\alpha = 2.10$.
Exercise 4.17 Use this code to compare the
KM estimate of the survivor function with the
parametric estimate from the fitted Weibull
distribution.
plot(survfit(Surv(time,cens)~1,data=bearings),col='blue') # KM estimate
t = seq(0.001,150,length=100)
H = (exp(-4.405188)*t)^(1/0.4757721) # integ hazard
S = exp(-H)
lines(t,S,type="l",col='red') ; grid()
Sol: 4.17 Fitted T ∼ Weibull(2.10, 0.0122).
From the graph:
excellent fit.
Model comparison
Does the Weibull distribution provide a better fit to
these data than the exponential distribution?
The exponential distribution is obtained by fixing
α = 1, or the R scale = 1, within the Weibull
distribution.
Under the null hypothesis, that α = 1, standard
likelihood theory tells us that approximately
$2(\ell_1 - \ell_2) \sim \chi^2_1$,
where ℓ1 and ℓ2 are the maximised log-likelihoods of the
Weibull and exponential models.
There is 1 degree of freedom as there is a single
constraint in the sub–model α = 1.
R output:
$2(\ell_1 - \ell_2) = 2(-113.7 - (-121.4)) = 15.4$,
which is significant when compared with the $\chi^2_1$
distribution.
We conclude that the Weibull distribution gives a
much improved fit.
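The comparison with the χ² distribution on 1 degree of freedom can be made explicit in base R. A minimal sketch using the two log-likelihoods from the R output; the variable names are illustrative.

```r
loglik_weibull     <- -113.7   # Loglik(model) from the Weibull fit
loglik_exponential <- -121.4   # Loglik(model) from the exponential fit

# Likelihood ratio statistic and its p-value on 1 degree of freedom
lrt     <- 2 * (loglik_weibull - loglik_exponential)
p_value <- pchisq(lrt, df = 1, lower.tail = FALSE)

# 5% critical value of chi-squared on 1 df, for comparison
crit <- qchisq(0.95, df = 1)
```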
Fitting covariates
We look at two examples of fitting models of the
form
logT ∼ β′x + σ logE
which includes both exponential and Weibull
models.
Example: leukaemia data
There are n = 33 observations,
none are censored, and
2 explanatory variables: wbc and ag.
The first few observations are
leuk
   wbc      ag time
1 2300 present   65
2  750 present  156
3 4300 present  100
hist(wbc)
wbc measures white blood cell count,
its distribution is skewed, suggesting a logarithmic
transformation of this variable.
ag is a 2–level factor indicating presence or
absence of the compound in the patient’s blood.
First, we fit an additive model in which the linear
predictor − logλ has a different intercept for the
two ag groups, but the effect of log(wbc) is
constant within each group.
This corresponds to the model formula
log(wbc)+ag.
attach(leuk)
out = survreg(
Surv(time)~log(wbc)+ag,leuk,dist='weibull')
summary(out)
             Value Std.Error      z        p
(Intercept) 5.8524     1.323  4.425 9.66e-06
log(wbc)   -0.3103     0.131 -2.363 1.81e-02
agpresent   1.0206     0.378  2.699 6.95e-03
Log(scale)  0.0399     0.139  0.287 7.74e-01
Scale= 1.04
Loglik(model)= -146.5
Loglik(intercept only)= -153.6
summary(out)$loglik # -146.4988
Exercise 4.18 Write out the fitted version of
the model
$\log T \sim -\log\lambda + \dfrac{1}{\alpha} \log E$,
$-\log\lambda = \beta'x$.
Does a larger wbc lead to longer remission times?
Sol: 4.18
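$\log T \sim -\log\hat\lambda + 1.04 \log E$ (the R scale 1.04 estimates $1/\alpha$, so $\hat\alpha = 1/1.04 \approx 0.96$),
$-\log\hat\lambda = 5.85 - 0.31x + 1.02\,I(\mathrm{ag})$
where x represents log(wbc) and I(ag) indicates that ag is present.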
A larger wbc leads to shorter remission times as
the coefficient of x is negative.
The fitted scale (1.04) is close to 1, which suggests that an exponential
lifetime distribution might be adequate.
out = survreg(
Surv(time)~log(wbc)+ag,leuk,dist='exponential')
summary(out)
             Value Std.Error     z        p
(Intercept)  5.815     1.263  4.60 4.15e-06
log(wbc)    -0.304     0.124 -2.45 1.44e-02
agpresent    1.018     0.364  2.80 5.14e-03
Scale fixed at 1
Exponential distribution
Loglik(model)= -146.5 Loglik(intercept only)= -155.5
summary(out)$loglik # -146.5405
The deviance between the two models is
2(−146.4988 − (−146.5405)) = 0.0834, which is
tiny compared to $\chi^2_1$: the hypothesis α = 1 is not rejected.
The regression coefficients are clearly significant.
Hence, this exponential model with formula
log(wbc)+ag represents our ‘best’ model.
The estimated lifetime distribution for a patient is
T ∼ Exp(λ)
for a patient without ag
− log λ = 5.815 − 0.304 log (wbc),
while for a patient with ag present
− log λ = 5.815 + 1.018 − 0.304 log (wbc).
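Since E(T) = 1/λ for the exponential distribution, the fitted mean lifetime is the exponential of the linear predictor. The base-R sketch below evaluates it for a few white blood cell counts. The wbc values and the function name are illustrative; the coefficients are copied from the output above.

```r
# Fitted coefficients of the exponential model, copied from the output above
b0    <- 5.815
b_wbc <- -0.304
b_ag  <- 1.018

# Fitted mean lifetime E(T) = 1 / lambda = exp(-log lambda)
mean_lifetime <- function(wbc, ag_present) {
  exp(b0 + b_wbc * log(wbc) + b_ag * as.numeric(ag_present))
}

wbc <- c(750, 2300, 10000)   # illustrative white blood cell counts
fitted_means <- data.frame(
  wbc        = wbc,
  ag_absent  = mean_lifetime(wbc, FALSE),
  ag_present = mean_lifetime(wbc, TRUE)
)
fitted_means
```

At any fixed wbc, the ratio of the two columns is exp(1.018), so ag acts multiplicatively on the mean lifetime.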
Diagnostics
The two models under consideration are the
exponential and the Weibull.
Assessing model fit needs to adjust for the
covariate effects.
Ti ∼ Exp(λi) implies λiTi ∼ Exp(1), and
Ti ∼ Weibull(α, λi) implies $(\lambda_i T_i)^{\alpha}$ ∼ Exp(1),
and the rhs is the same for all i.
So use the standardised lifetimes λiti (raised to the power α for the Weibull) for diagnostics.
out = survreg(Surv(time)~log(wbc) + ag, leuk, dist='exponential')
ntimes <- time*exp(-out$linear.predictors) # lambda_i * t_i
plot(survfit(Surv(ntimes)~1),xlab='t', ylab='S(t)', # KM
col='blue', log=TRUE); grid()
[Figure: Kaplan–Meier estimate of S(t) for the standardised lifetimes, t from 0 to 4, with S(t) on a logarithmic axis from 0.05 to 1.]
Residual plot of exponential fit to leuk data.
The plot is reasonably linear, except at long
lifetimes where there is some evidence of
curvature.
However there is less data there.
Example: gehan data
This diagnostic procedure works equally well with
censored data.
Consider the gehan data, fit the treatment group
as a factor and use a Weibull distribution.
attach(gehan)
out = survreg(Surv(time,cens)~treat,gehan,dist='weibull')
Coefficients: (Intercept) treatcontrol
3.515687 -1.267335
Scale= 0.7321944
Loglik(model)= -106.6 Loglik(intercept only)= -116.4
ntimes = time*exp(-out$linear.predictors)
ntimes = (ntimes)^(1/out$scale) # alpha = 1/Rscale
plot(survfit(Surv(ntimes,cens)~1,data=gehan),
col='blue', log=TRUE)
Exercise 4.19 Comment on the diagnostic plot
and interpret the fitted model.
Sol: 4.19 The diagnostic based on $(\lambda_i T_i)^{\alpha}$
suggests the Weibull is adequate.
The fitted lifetime distribution is
$T \sim \mathrm{Weibull}(1/0.73, \lambda_i)$
where for a patient in the treatment group
− log (λ) = 3.516
while for a patient in the control group
− log (λ) = 3.516 − 1.267 = 2.25.
The lifetimes are increased by taking the
treatment.
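The size of the treatment effect can be expressed through fitted medians and means. The base-R sketch below uses the coefficients from the output above. The median formula follows from solving S(t) = 1/2, and the mean uses the Weibull moment formula from earlier in the chapter. Variable names are illustrative.

```r
scale_R <- 0.7321944          # survreg scale
alpha   <- 1 / scale_R        # Weibull shape = 1 / R scale

# Linear predictor -log(lambda) for each group, from the fitted coefficients
lp     <- c(treatment = 3.515687, control = 3.515687 - 1.267335)
lambda <- exp(-lp)

# Median solves S(t) = 1/2, i.e. (lambda * t)^alpha = log 2
median_T <- log(2)^(1 / alpha) / lambda

# Mean from E(T) = Gamma(1 + 1/alpha) / lambda
mean_T <- gamma(1 + 1 / alpha) / lambda

# Ratio of treatment to control lifetimes
ratio <- unname(median_T["treatment"] / median_T["control"])
```

Because α is common to both groups, the ratio is exp(1.267335) whether medians or means are compared: the treatment rescales the whole lifetime distribution.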