Statistics for Survival Data, Day 2
WBL 17–19
Alain Hauser
2018-08-27
Alain Hauser Survival Analysis / WBL 17–19 2018-08-27 1 / 176
Part IV
Regression Models
Learning objectives
- Explain the model assumptions behind parametric regression models
- Fit a regression model in R
- Read off the fitted model from an R output, and interpret it
- Assess whether a fitted model is appropriate (model validation)
- Perform model or variable selection using forward or backward search

Section 1
Weibull regression
Repetition: Weibull model in logarithmic time scale
Recall the Weibull model in logarithmic time scale:
- Let $T$ be Weibull distributed, and set $Y := \log T$.
- We have seen that $Y$ belongs to a location-scale family.
- More precisely, $Y$ has probability density
\[
f_Y(y) = \alpha e^{\alpha (y - \log\lambda)} \exp\left(-e^{\alpha (y - \log\lambda)}\right)
       = \frac{1}{\sigma} \exp\left(\frac{y - \mu}{\sigma} - \exp\left(\frac{y - \mu}{\sigma}\right)\right)
\]
  with $\sigma := 1/\alpha$, $\mu := \log\lambda$.
- Hence we can write $Y = \mu + \sigma Z$, where $Z$ has the standard extreme value distribution: $f_Z(z) = \exp(z - e^z)$.

Weibull model for two groups
Recall the two-sample problem with the heroin data set:
- We fitted a Weibull model to both clinics, using the same shape but different scale parameters.
- On the logarithmic time scale, this means we fitted individual $\mu$'s but a common $\sigma$:
    Clinic 1: $Y_1 = \mu_1 + \sigma Z$
    Clinic 2: $Y_2 = \mu_2 + \sigma Z$
- Introduce a binary explanatory variable $X$, setting $X = 0$ for addicts in clinic 1 and $X = 1$ for addicts in clinic 2 (indicator variable for clinic 2).
- The model can be rewritten as $Y = \beta_0 + \beta_1 X + \sigma Z$, with $\beta_0 = \mu_1$, $\beta_1 = \mu_2 - \mu_1$.

Heroin example: adding more explanatory variables
- Apart from the clinic, we have more explanatory variables in the heroin data set:
    - prison: indicates whether the patient has a prison record or not
    - dose: patient's methadone dose (continuous variable)
- They could also have an influence on the remission time $T$ (and on $Y = \log T$).
- Hence we can extend our Weibull regression model:
\[
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \sigma Z ,
\]
  $X_1$: indicator variable for clinic 2, $X_2$: indicator variable for prison record, $X_3$: methadone dose

Heroin example: Weibull regression in R
R syntax is no surprise:

    library(survival)
    addicts.weib.full <- survreg(Surv(survt, status) ~ clinic + prison + dose,
                                 data = addicts, dist = "weibull")
    summary(addicts.weib.full)

    ##
    ## Call:
    ## survreg(formula = Surv(survt, status) ~ clinic + prison + dose,
    ##     data = addicts, dist = "weibull")
    ##                Value Std. Error     z       p
    ## (Intercept)  4.81389    0.27499 17.51 < 2e-16
    ## clinic2      0.70904    0.15722  4.51 6.5e-06
    ## prisonyes   -0.22947    0.12079 -1.90   0.057
    ## dose         0.02443    0.00459  5.32 1.0e-07
    ## Log(scale)  -0.31495    0.06756 -4.66 3.1e-06
    ##
    ## Scale= 0.73
    ##
    ## Weibull distribution
    ## Loglik(model)= -1084.5   Loglik(intercept only)= -1114.9
    ## Chisq= 60.89 on 3 degrees of freedom, p= 3.8e-13
    ## Number of Newton-Raphson Iterations: 7
    ## n= 238

General Weibull regression model I
Definition (Weibull regression model)
Let $T$ be an event time, and $X_1, \ldots, X_p$ explanatory variables. The general Weibull regression model looks as follows:
\[
Y := \log T , \quad Y = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p + \sigma Z ,
\]
where $Z$ has the standard extreme value distribution: $f_Z(z) = \exp(z - e^z)$.

- $\mu = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p$ is called the linear predictor.
- The model parameters $\beta_0, \ldots, \beta_p$ and $\sigma$ can be estimated by maximum likelihood.
- The MLE can account for censored data; therefore survival data should always be fitted with survreg, and not, e.g., with glm.

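The definition can be tried out on simulated data. The following is a minimal sketch, not part of the heroin analysis: the data frame sim, the covariate x, the "true" parameter values and the censoring mechanism are all assumptions made up for illustration. It uses the fact that the logarithm of a standard exponential variable has density exp(z - e^z), simulates right-censored event times from the model, and checks that survreg roughly recovers the parameters.

```r
library(survival)

set.seed(42)
n <- 2000
beta0 <- 5; beta1 <- 0.7; sigma <- 0.73   # "true" values, chosen for illustration only

x <- rbinom(n, size = 1, prob = 0.5)      # hypothetical binary covariate
z <- log(rexp(n))                         # log of Exp(1) has density exp(z - e^z)
t.event <- exp(beta0 + beta1 * x + sigma * z)
t.cens <- rexp(n, rate = 1 / 1000)        # assumed independent random censoring times

sim <- data.frame(time = pmin(t.event, t.cens),
                  status = as.numeric(t.event <= t.cens),
                  x = x)

# Maximum-likelihood fit that accounts for the censored observations
sim.fit <- survreg(Surv(time, status) ~ x, data = sim, dist = "weibull")
est <- c(coef(sim.fit), scale = sim.fit$scale)
print(round(est, 3))
```

The estimates should lie close to the values used for the simulation; how close depends on the sample size and on the amount of censoring.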
General Weibull regression model II
- The Weibull regression model is normally written in logarithmic time scale (as before).
- Translated back to the original time scale, the survivor function reads
\[
S(t; x_1, \ldots, x_p) = P[T > t \mid X_1 = x_1, \ldots, X_p = x_p]
  = \exp\left\{ -\exp\left[ \alpha \left( \log(t) - \beta_0 - \beta_1 x_1 - \ldots - \beta_p x_p \right) \right] \right\}
\]
  with $\alpha = 1/\sigma$.
- The hazard function is hence
\[
h(t; x_1, \ldots, x_p) = \frac{-\frac{\partial S}{\partial t}(t; x_1, \ldots, x_p)}{S(t; x_1, \ldots, x_p)}
  = \alpha \exp\left[ (\alpha - 1) \log(t) - \alpha (\beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p) \right]
  = \alpha t^{\alpha - 1} \exp\left[ -\alpha (\beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p) \right]
\]

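The two formulas can be checked numerically. The sketch below plugs in the point estimates printed in the Weibull summary output for a patient in clinic 1 without prison record and with dose 60; the helper names surv.weib and haz.weib, the evaluation time t0 = 300 and the step size are assumptions made for illustration. It compares the closed-form hazard with a finite-difference approximation of -S'/S.

```r
# Point estimates copied from the summary output of the full Weibull model
beta <- c(intercept = 4.81389, clinic2 = 0.70904, prisonyes = -0.22947, dose = 0.02443)
alpha <- 1 / exp(-0.31495)     # alpha = 1 / sigma, with sigma = exp(Log(scale))

# Linear predictor: clinic 1, no prison record, dose 60
lp <- beta[["intercept"]] + beta[["dose"]] * 60

# Survivor and hazard function of the Weibull regression model (hypothetical helpers)
surv.weib <- function(t, lp, alpha) exp(-exp(alpha * (log(t) - lp)))
haz.weib  <- function(t, lp, alpha) alpha * t^(alpha - 1) * exp(-alpha * lp)

# Central finite difference for h = -S'/S at an arbitrary time point
t0 <- 300; eps <- 1e-4
haz.numeric <- -(surv.weib(t0 + eps, lp, alpha) - surv.weib(t0 - eps, lp, alpha)) /
  (2 * eps) / surv.weib(t0, lp, alpha)
print(c(closed.form = haz.weib(t0, lp, alpha), numeric = haz.numeric))
```

Both numbers should agree up to numerical error, which confirms that the hazard formula is the derivative-based definition applied to the survivor function.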
Predicting survival times
Heroin example: consider an addict without prison record and a methadone dose of 60 (in units of the data set).

Linear predictor for clinic 1 and 2:

    newdata <- data.frame(clinic = c("1", "2"),
                          prison = c("no", "no"), dose = c(60, 60))
    predict(addicts.weib.full, type = "lp", newdata = newdata)

    ##        1        2
    ## 6.279409 6.988450

Predicted median (50% quantile) of remission time for both clinics:

    predict(addicts.weib.full, type = "quantile",
            newdata = newdata, p = 0.5)

    ##        1        2
    ## 408.2661 829.6140

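The predicted quantiles can be reproduced by hand. Solving S(t) = 1 - p for t in the Weibull survivor function gives t_p = exp(lp + sigma * log(-log(1 - p))), where lp is the linear predictor. The sketch below applies this to the linear predictors and the Log(scale) value printed above; the function name weib.quantile is a hypothetical helper, and small deviations from the predict output are expected because the inputs are rounded.

```r
lp <- c(clinic1 = 6.279409, clinic2 = 6.988450)  # linear predictors printed above
sigma <- exp(-0.31495)                           # scale = exp(Log(scale)) from the summary

# p-quantile of the event time, obtained by inverting the survivor function
weib.quantile <- function(p, lp, sigma) exp(lp + sigma * log(-log(1 - p)))

med <- weib.quantile(0.5, lp, sigma)    # medians for both clinics
q75 <- weib.quantile(0.75, lp, sigma)   # 75% quantiles for both clinics
print(round(rbind(median = med, q75 = q75), 1))
```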
Confidence intervals for quantiles
- With argument se.fit = TRUE, predict also returns the estimated standard error for the quantiles.
- To calculate confidence intervals for the quantiles, it's better to get quantiles and standard errors on the logarithmic time scale; use type = "uquantile" then:

    uquant <- predict(addicts.weib.full, type = "uquantile",
                      newdata = newdata, p = 0.75, se.fit = TRUE)
    quant.ci <- data.frame(
      ci.lwr = exp(uquant$fit - qnorm(0.975)*uquant$se.fit),
      est = exp(uquant$fit),
      ci.upr = exp(uquant$fit + qnorm(0.975)*uquant$se.fit))
    quant.ci

    ##      ci.lwr       est    ci.upr
    ## 1  571.0249  677.0832  802.8401
    ## 2 1001.2688 1375.8618 1890.5968

Model validation: Tukey-Anscombe and Q-Q plot
Tukey-Anscombe plot
- Plot of residuals vs. fitted values
- With survival data, we only know the residuals of non-censored individuals
- It usually makes sense to plot residuals and fitted values in logarithmic time scale
- The plot should show residuals that have a similar distribution over all fitted values (no trend, no cone, etc.)

Q-Q plot of residuals
- Plot of empirical quantiles of the residuals vs. theoretical quantiles expected under the error distribution (in our case, Weibull distribution, or extreme value distribution if on logarithmic time scale)
- The Q-Q plot should show a straight line

TA plot and Q-Q plot in R

    library(car)
    par(mfrow = c(1, 2), cex = 0.6)
    lin.pred <- predict(addicts.weib.full, type = "lp")[addicts$status == 1]
    log.resid <- log(addicts$survt[addicts$status == 1]) - lin.pred
    plot(lin.pred, log.resid, main = "TA plot",
         xlab = "log(fitted values)", ylab = "log(residuals)")
    qqPlot(exp(log.resid), dist = "weibull",
           shape = 1/addicts.weib.full$scale,
           main = "Q-Q plot", xlab = "Theor. quantiles", ylab = "Emp. quantiles")

[Figure: left panel "TA plot", log(residuals) vs. log(fitted values); right panel "Q-Q plot", empirical vs. theoretical quantiles]

Model validation for discrete explanatory variables
- Assume we have only discrete explanatory variables.
- Remember from the Weibull model without explanatory variables:
\[
\log(-\log(S)) = \alpha \log(t) - \alpha \log(\lambda)
\]
- Here, we have $\log(\lambda) = \mu = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p$.
- A plot of $\log(-\log(S))$ vs. $\log(t)$ should show straight, parallel lines:
    - one line per group (= set of subjects sharing the same levels of explanatory variables)
    - intercept of each group's line given by the values of the explanatory variables
- In practice, we can plot $\log(-\log(S))$ vs. $\log(t)$ with $S$ estimated by the Kaplan-Meier estimator.

Graphical model validation: heroin example
Let's include only clinic and prison as explanatory variables, so the Kaplan-Meier estimator in all groups is still reliable:

    addicts.km.strat <- survfit(Surv(survt, status) ~ clinic + prison,
                                data = addicts)
    addicts.table <- summary(addicts.km.strat)
    plot(log(addicts.table$time), log(-log(addicts.table$surv)),
         xlab = "log(t)", ylab = "log(-log(S))", pch = 20, col = addicts.table$strata)

[Figure: log(-log(S)) vs. log(t), one coloured point series per combination of clinic and prison]

A closer look at the Weibull summary
The summary of a Weibull regression model automatically prints z and p values for the coefficients:

    ##
    ## Call:
    ## survreg(formula = Surv(survt, status) ~ clinic + prison + dose,
    ##     data = addicts, dist = "weibull")
    ##                Value Std. Error     z       p
    ## (Intercept)  4.81389    0.27499 17.51 < 2e-16
    ## clinic2      0.70904    0.15722  4.51 6.5e-06
    ## prisonyes   -0.22947    0.12079 -1.90   0.057
    ## dose         0.02443    0.00459  5.32 1.0e-07
    ## Log(scale)  -0.31495    0.06756 -4.66 3.1e-06
    ##
    ## Scale= 0.73
    ##
    ## Weibull distribution
    ## Loglik(model)= -1084.5   Loglik(intercept only)= -1114.9
    ## Chisq= 60.89 on 3 degrees of freedom, p= 3.8e-13
    ## Number of Newton-Raphson Iterations: 7
    ## n= 238

How are these p values calculated?

Recall: Fisher information matrix
- Denote the vector of all model parameters by $\theta$: $\theta = (\beta_0, \beta_1, \ldots, \beta_p, \sigma)$
- Recall (Part III): the MLE $\hat\theta$ is asymptotically normally distributed around the true parameter vector $\theta_0$:
\[
\hat\theta \approx \mathcal{N}(\theta_0, I^{-1}(\theta_0))
\]
- $I(\theta_0)$ is the Fisher information matrix with entries
\[
\left(I(\theta)\right)_{jk} := -E\left[ \frac{\partial^2}{\partial\theta_j \partial\theta_k} \log L(\theta) \right]
\]

Confidence intervals and tests for coefficients
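- Consequence 1: a component $\hat\theta_k$ of the estimated parameter vector $\hat\theta$ has standard error
\[
\mathrm{se}(\hat\theta_k) = \sqrt{\left(I(\theta_0)^{-1}\right)_{kk}} ,
\]
  which can be estimated as
\[
\widehat{\mathrm{se}}(\hat\theta_k) = \sqrt{\left(I(\hat\theta)^{-1}\right)_{kk}}
\]
- Consequence 2: an approximate confidence interval for $\theta_k$ at confidence level $1 - \alpha$ (here $\alpha$ denotes the error level, not the Weibull shape parameter) is given by
\[
\left[ \hat\theta_k - \Phi^{-1}(1 - \alpha/2) \cdot \widehat{\mathrm{se}}(\hat\theta_k), \; \hat\theta_k + \Phi^{-1}(1 - \alpha/2) \cdot \widehat{\mathrm{se}}(\hat\theta_k) \right]
\]
- Consequence 3: under the null hypothesis $H_0: \theta_k = 0$, the test statistic $Z = \hat\theta_k / \widehat{\mathrm{se}}(\hat\theta_k)$ is approximately standard normally distributed.
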
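As a small worked example of the confidence interval formula, the following sketch computes an approximate 95% interval for the coefficient of clinic2, using only the estimate and standard error printed in the Weibull summary output. The variable names are chosen for illustration, and the result inherits the rounding of the printed values.

```r
est <- 0.70904   # coefficient of clinic2, from the summary output
se  <- 0.15722   # its estimated standard error, from the summary output
level <- 0.95    # confidence level 1 - alpha

q <- qnorm(1 - (1 - level) / 2)   # standard normal quantile Phi^{-1}(1 - alpha/2)
ci <- c(lower = est - q * se, upper = est + q * se)
print(round(ci, 3))
```

The interval does not contain 0, which agrees with the small p value reported for clinic2.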
Heroin example: manual calculation of p value
To illustrate the Z test, we can “manually” calculate the p value of the variable clinic2:

- Extract the estimated variance of the parameter vector (i.e. the inverse Fisher information matrix):

    I.inv <- addicts.weib.full$var

- Estimate the standard error of the coefficient for clinic2:

    (se <- sqrt(I.inv[2, 2]))

    ## [1] 0.1572246

- Calculate the Z statistic and the corresponding p value:

    (z <- coef(addicts.weib.full)[2]/se)

    ##  clinic2
    ## 4.509734

    2*(pnorm(abs(z), lower.tail = FALSE))

    ##      clinic2
    ## 6.490885e-06

Section 2
Model Selection
Heroin example: remove variable prison
- From the summary of the full Weibull model for the heroin data set, we see that prison is not significant (at the 5% level).
- Can we remove the variable from the model?
- Compare the full and the reduced model with a likelihood ratio test:

    addicts.weib.red <- survreg(Surv(survt, status) ~ clinic + dose,
                                data = addicts, dist = "weibull")
    anova(addicts.weib.red, addicts.weib.full, test = "Chisq")

    ##                    Terms Resid. Df    -2*LL Test Df Deviance   Pr(>Chi)
    ## 1          clinic + dose       234 2172.503      NA       NA         NA
    ## 2 clinic + prison + dose       233 2168.953    =  1 3.549613 0.05955934

Likelihood ratio test I
- The likelihood ratio test can be used to compare arbitrary nested models.
- The test statistic is always
\[
D := -2 \log\left( \frac{\text{likelihood of null model}}{\text{likelihood of alternative model}} \right)
\]
- When the alternative (larger) model has $k$ parameters more than the null (smaller) model, the test statistic has the asymptotic distribution
\[
D \approx \chi^2_k
\]
  under the null hypothesis.

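The test statistic and its p value can be recomputed from the -2*LL column of the anova output for the comparison with and without prison. This is a sketch with the printed (rounded) values; the variable names are chosen for illustration.

```r
m2ll.null <- 2172.503   # -2 * log-likelihood of the reduced model (clinic + dose)
m2ll.alt  <- 2168.953   # -2 * log-likelihood of the full model (clinic + prison + dose)
k <- 1                  # the full model has one parameter more

# D = -2 * log(L_null / L_alt) = (-2 log L_null) - (-2 log L_alt)
D <- m2ll.null - m2ll.alt
p.value <- pchisq(D, df = k, lower.tail = FALSE)
print(c(D = D, p.value = p.value))
```

Up to rounding, this reproduces the Deviance and Pr(>Chi) entries of the anova table.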
Heroin example: comparing nested models
We compare two Weibull models that differ by 2 explanatory variables:

    addicts.weib.red2 <- survreg(Surv(survt, status) ~ clinic,
                                 data = addicts, dist = "weibull")
    anova(addicts.weib.red2, addicts.weib.full, test = "Chisq")

    ##                    Terms Resid. Df    -2*LL Test Df Deviance     Pr(>Chi)
    ## 1                 clinic       235 2200.155      NA       NA           NA
    ## 2 clinic + prison + dose       233 2168.953    =  2 31.20223 1.676959e-07

Likelihood ratio test II
Formal representation of the likelihood ratio test:
1. Model: $Y = \log T = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p + \sigma Z$, with $Z$ having the standard extreme value distribution
2. Null hypothesis: $H_0: \beta_1 = \beta_2 = \ldots = \beta_k = 0$
   Alternative hypothesis: $H_A$: at least one of $\beta_1, \beta_2, \ldots, \beta_k$ is $\neq 0$
3. Test statistic:
\[
D = -2 \log\left( \frac{\text{likelihood of null model}}{\text{likelihood of alternative model}} \right)
\]
   Distribution of $D$ under $H_0$: $D \approx \chi^2_k$ ($\chi^2$ distribution with $k$ degrees of freedom, asymptotically)
4. Choose significance level: e.g. $\alpha = 5\%$
5. Range of rejection: $K = [q, \infty)$, where $q$ is the $(1 - \alpha)$-quantile of the $\chi^2$ distribution with $k$ degrees of freedom
6. Test decision: reject $H_0$ if $D \in K$

Akaike information criterion
- Alternative to the likelihood ratio test: Akaike information criterion (AIC)
- AIC assigns a score to every model: $\mathrm{AIC} = 2k - 2 \log(\text{likelihood})$; $k$: number of parameters of the model
- AIC “penalizes complexity”
- Model selection procedure: from a given set of candidate models, take the one that minimizes the AIC
- Heroin example:

    AIC(addicts.weib.full)

    ## [1] 2178.953

    AIC(addicts.weib.red)

    ## [1] 2180.503

    AIC(addicts.weib.red2)

    ## [1] 2206.155

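The AIC values can be verified by hand from the -2*LL values in the anova tables. The sketch below assumes that k counts the regression coefficients including the intercept plus the scale parameter, which matches the printed AIC values; the vector names are chosen for illustration.

```r
# -2 * log-likelihood of the three Weibull models, from the anova output
m2ll <- c(full = 2168.953, red = 2172.503, red2 = 2200.155)

# Number of parameters: regression coefficients (incl. intercept) plus the scale
k <- c(full = 5, red = 4, red2 = 3)

aic <- 2 * k + m2ll
print(aic)
best <- names(which.min(aic))   # candidate model with the smallest AIC
print(best)
```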
Likelihood ratio test and AIC: comparison
                LR test                   AIC
Advantage       p-value                   can compare non-nested models
Disadvantage    only for nested models    not clear how a small difference in AIC must be interpreted

Model selection: overview
- When fitting a Weibull regression model with many explanatory variables, not all of them are significant in general (heroin example: prison).
- As with other regression techniques, we should perform model or variable selection to get rid of non-significant variables.
- Reasons:
    - avoid overfitting
    - improve interpretability of the model
    - improve predictive power of the model

Stepwise model selection I
- Theoretically best approach for model selection: exhaustive search:
    1. fit every possible model
    2. keep the “best” one according to some criterion (e.g., the one that minimizes the AIC, or the BIC, or similar)
- With $p$ explanatory variables, there are $2^p$ possible models → exhaustive search is infeasible even with moderate $p$
- Computationally feasible alternative: greedy or stepwise search, either using a likelihood ratio test or a model selection criterion such as AIC.

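To see where the count of 2^p comes from, the following sketch enumerates all candidate models for the three explanatory variables of the heroin example as formula strings. It only builds the formulas and fits nothing; the object names are chosen for illustration.

```r
vars <- c("clinic", "prison", "dose")

# Every row states for each variable whether it is included (TRUE) or not (FALSE)
include <- expand.grid(rep(list(c(FALSE, TRUE)), length(vars)))

# Right-hand side of the model formula for every row; "1" is the intercept-only model
rhs <- apply(include, 1, function(inc) {
  if (any(inc)) paste(vars[inc], collapse = " + ") else "1"
})
formulas <- paste("Surv(survt, status) ~", rhs)
print(formulas)

# Number of candidate models for larger numbers of explanatory variables
print(2^c(10, 20, 30))
```

With three variables the list is short, but the last line shows how quickly the number of candidate models grows with p.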
Stepwise model selection II
General idea of greedy search: instead of exhaustively searching the full model space, traverse it in small steps, adding or removing one explanatory variable at a time. Two approaches:

- Backward selection
    - start with the full model
    - sequentially drop the variable whose removal reduces the AIC the most
    - stop when the AIC cannot be reduced further
- Forward selection
    - start with the empty model
    - sequentially add the variable whose addition reduces the AIC the most
    - stop when the AIC cannot be reduced further

Do these methods necessarily find the model with the lowest AIC?

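The backward variant can be written out as a short loop. The following is a minimal sketch on simulated data, not the heroin data: the data frame dat, the covariates x1, x2 and noise, the effect sizes and the helper fit.aic are all assumptions made for illustration. It is meant to show the mechanics of the greedy search, not to replace a ready-made implementation.

```r
library(survival)

set.seed(1)
n <- 500
dat <- data.frame(x1 = rbinom(n, 1, 0.5), x2 = rnorm(n), noise = rnorm(n))
z <- log(rexp(n))                                   # standard extreme value errors
t.event <- exp(5 + 0.7 * dat$x1 + 0.5 * dat$x2 + 0.7 * z)   # noise has no effect
t.cens <- rexp(n, rate = 1 / 1000)                  # assumed random censoring
dat$time <- pmin(t.event, t.cens)
dat$status <- as.numeric(t.event <= t.cens)

# AIC of the Weibull regression model with the given explanatory variables
fit.aic <- function(vars) {
  rhs <- if (length(vars) == 0) "1" else paste(vars, collapse = " + ")
  AIC(survreg(as.formula(paste("Surv(time, status) ~", rhs)),
              data = dat, dist = "weibull"))
}

current <- c("x1", "x2", "noise")    # start with the full model
repeat {
  if (length(current) == 0) break
  aic.now <- fit.aic(current)
  # AIC after dropping each of the remaining variables in turn
  aic.drop <- vapply(current, function(v) fit.aic(setdiff(current, v)), numeric(1))
  if (min(aic.drop) >= aic.now) break               # no single removal improves the AIC
  current <- setdiff(current, names(which.min(aic.drop)))
}
print(current)
```

Each pass fits only as many models as there are variables left, which is what makes the search feasible; the price is that models off the greedy path are never examined.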
Backward selection in R
    library(MASS)
    addicts.bw <- stepAIC(addicts.weib.full, direction = "backward",
                          trace = 0)
    summary(addicts.bw)

    ##
    ## Call:
    ## survreg(formula = Surv(survt, status) ~ clinic + prison + dose,
    ##     data = addicts, dist = "weibull")
    ##                Value Std. Error     z       p
    ## (Intercept)  4.81389    0.27499 17.51 < 2e-16
    ## clinic2      0.70904    0.15722  4.51 6.5e-06
    ## prisonyes   -0.22947    0.12079 -1.90   0.057
    ## dose         0.02443    0.00459  5.32 1.0e-07
    ## Log(scale)  -0.31495    0.06756 -4.66 3.1e-06
    ##
    ## Scale= 0.73
    ##
    ## Weibull distribution
    ## Loglik(model)= -1084.5   Loglik(intercept only)= -1114.9
    ## Chisq= 60.89 on 3 degrees of freedom, p= 3.8e-13
    ## Number of Newton-Raphson Iterations: 7
    ## n= 238

Forward selection in R
    addicts.empty <- survreg(Surv(survt, status) ~ 1,
                             data = addicts, dist = "weibull")
    addicts.fw <- stepAIC(addicts.empty, direction = "forward",
                          scope = list(upper = ~ clinic + prison + dose), trace = 0)
    summary(addicts.fw)

    ##
    ## Call:
    ## survreg(formula = Surv(survt, status) ~ dose + clinic + prison,
    ##     data = addicts, dist = "weibull")
    ##                Value Std. Error     z       p
    ## (Intercept)  4.81389    0.27499 17.51 < 2e-16
    ## dose         0.02443    0.00459  5.32 1.0e-07
    ## clinic2      0.70904    0.15722  4.51 6.5e-06
    ## prisonyes   -0.22947    0.12079 -1.90   0.057
    ## Log(scale)  -0.31495    0.06756 -4.66 3.1e-06
    ##
    ## Scale= 0.73
    ##
    ## Weibull distribution
    ## Loglik(model)= -1084.5   Loglik(intercept only)= -1114.9
    ## Chisq= 60.89 on 3 degrees of freedom, p= 3.8e-13
    ## Number of Newton-Raphson Iterations: 7
    ## n= 238

Section 3
Log-Normal and Log-Logistic Regression
Regression models for location-scale families
- Recall the model in Weibull regression: in logarithmic time scale
\[
Y = \log T = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p + \sigma Z ,
\]
  where $Z$ has the standard extreme value distribution.
- Standard representation valid for all location-scale families: if $Y$ has density $f(y; \mu, \sigma)$ from a location-scale family, then $Y = \mu + \sigma Z$ with $Z$ having the “standard” distribution $f(z; 0, 1)$.
- Hence we use the same ansatz for regression models with log-normal and log-logistic distributions:
\[
Y = \log T = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p + \sigma Z ,
\]
  where $Z$ has the standard normal (log-normal case) or standard logistic (log-logistic case) distribution.

Log-normal regression model
- If the event time $T$ has a log-normal distribution, we model its dependence on explanatory variables as
\[
Y = \log T = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p + \sigma Z ,
\]
  where $Z$ has the standard normal distribution: $Z \sim \mathcal{N}(0, 1)$.
- The survivor function has the form
\[
S(t \mid X_1, \ldots, X_p) = 1 - \Phi\left( \frac{1}{\sigma} \left( \log(t) - \beta_0 - \beta_1 X_1 - \ldots - \beta_p X_p \right) \right)
\]

Log-normal regression in R
Fitting a log-normal model is completely analogous to fitting a Weibull model:

    addicts.lnorm.full <- survreg(Surv(survt, status) ~ clinic + prison + dose,
                                  data = addicts, dist = "lognormal")
    summary(addicts.lnorm.full)

    ##
    ## Call:
    ## survreg(formula = Surv(survt, status) ~ clinic + prison + dose,
    ##     data = addicts, dist = "lognormal")
    ##               Value Std. Error     z      p
    ## (Intercept) 3.98333    0.34663 11.49 <2e-16
    ## clinic2     0.57649    0.17648  3.27 0.0011
    ## prisonyes  -0.30904    0.15431 -2.00 0.0452
    ## dose        0.03367    0.00568  5.93  3e-09
    ## Log(scale)  0.07476    0.05930  1.26 0.2074
    ##
    ## Scale= 1.08
    ##
    ## Log Normal distribution
    ## Loglik(model)= -1097.8   Loglik(intercept only)= -1123.7
    ## Chisq= 51.85 on 3 degrees of freedom, p= 3.2e-11
    ## Number of Newton-Raphson Iterations: 4
    ## n= 238

Model validation: TA and Q-Q plot
Model validation is analogous to the Weibull case; it makes sense to make the Q-Q plot on the logarithmic time scale. R code for the Q-Q plot (rest as for Weibull regression):

    qqPlot(log.resid, dist = "norm",
           sd = addicts.lnorm.full$scale,
           main = "Q-Q plot", xlab = "Theor. quantiles (normal)", ylab = "Emp. quantiles")

[Figure: left panel "TA plot", log(residuals) vs. log(fitted values); right panel "Q-Q plot", empirical quantiles vs. theoretical quantiles (normal), observations 86 and 110 labelled]

Log-logistic regression model
- If the event time $T$ has a log-logistic distribution, we model its dependence on explanatory variables as
\[
Y = \log T = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p + \sigma Z ,
\]
  where $Z$ has the standard logistic distribution: $f(z) = \frac{e^{-z}}{(1 + e^{-z})^2}$.
- The survivor function has the form
\[
S(t \mid X_1, \ldots, X_p) = \frac{1}{1 + \exp\left[ \alpha \left( \log(t) - \beta_0 - \beta_1 X_1 - \ldots - \beta_p X_p \right) \right]}
\]
  with $\alpha = 1/\sigma$.

Log-logistic regression in R
Fitting a log-logistic model is completely analogous to fitting a Weibull model:

    addicts.llogis.full <- survreg(Surv(survt, status) ~ clinic + prison + dose,
                                   data = addicts, dist = "loglogistic")
    summary(addicts.llogis.full)

    ##
    ## Call:
    ## survreg(formula = Surv(survt, status) ~ clinic + prison + dose,
    ##     data = addicts, dist = "loglogistic")
    ##                Value Std. Error     z       p
    ## (Intercept)  4.14387    0.33829 12.25 < 2e-16
    ## clinic2      0.58060    0.17157  3.38 0.00071
    ## prisonyes   -0.29127    0.14396 -2.02 0.04305
    ## dose         0.03161    0.00552  5.73 1.0e-08
    ## Log(scale)  -0.53314    0.06863 -7.77 7.9e-15
    ##
    ## Scale= 0.587
    ##
    ## Log logistic distribution
    ## Loglik(model)= -1093.9   Loglik(intercept only)= -1120
    ## Chisq= 52.18 on 3 degrees of freedom, p= 2.7e-11
    ## Number of Newton-Raphson Iterations: 4
    ## n= 238

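Because all three models share the form Y = lp + sigma * Z, their predicted medians can be compared directly: the median of T is exp(lp + sigma * z_med), where z_med is the median of the respective standard error distribution. The sketch below uses the rounded coefficients and scales from the three summary outputs for a patient in clinic 1 without prison record and with dose 60; the data frame fits and its column names are assumptions made for illustration.

```r
# Intercept, dose coefficient and scale copied from the three summary outputs
fits <- data.frame(
  model     = c("weibull", "lognormal", "loglogistic"),
  intercept = c(4.81389, 3.98333, 4.14387),
  dose      = c(0.02443, 0.03367, 0.03161),
  scale     = c(0.73, 1.08, 0.587)
)

# Linear predictor: clinic 1, no prison record, dose 60
lp <- fits$intercept + fits$dose * 60

# Median of the standard error distribution Z in each model:
# extreme value: log(log(2)); standard normal: 0; standard logistic: 0
z.med <- c(log(log(2)), qnorm(0.5), qlogis(0.5))

fits$median <- exp(lp + fits$scale * z.med)
print(fits[, c("model", "median")])
```

For this covariate combination the three fitted models give medians of a similar order of magnitude, so differences between them show up mainly in the tails, which is where the Q-Q plots are most informative.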
Model validation: TA and Q-Q plot
Model validation is analogous to the Weibull case; it makes sense to make the Q-Q plot on the logarithmic time scale. R code for the Q-Q plot (rest as for Weibull regression), with the logistic distribution as reference:

    qqPlot(log.resid, dist = "logis",
           scale = addicts.llogis.full$scale,
           main = "Q-Q plot", xlab = "Theor. quantiles (logistic)", ylab = "Emp. quantiles")

[Figure: left panel "TA plot", log(residuals) vs. log(fitted values); right panel "Q-Q plot", empirical quantiles vs. theoretical quantiles (logistic), observations 86 and 110 labelled]

Part V
Cox Proportional Hazards Model
Learning objectives
- Write the general form of a Cox proportional hazards model
- Explain the difference to parametric regression models
- Fit a Cox PH model in R
- Interpret the R output of a Cox PH fit, especially the hazard ratios
- Perform model validation with graphical methods and tests

Cox proportional hazards (PH) model
- Considered so far: parametric regression models for survival data
- In the absence of explanatory variables, we have seen the powerful, non-parametric Kaplan-Meier estimator
- The Cox proportional hazards (PH) model combines the flexibility of non-parametric models with the interpretability of regression models
- The Cox PH model is one of the most popular models in survival analysis, especially in the analysis of medical data

Cox PH model
Cox PH model
The Cox proportional hazards model assumes that the hazard at time $t$ has the following form:
\[
h(t; x_1, \ldots, x_p) = h_0(t) \cdot \exp(\beta_1 x_1 + \ldots + \beta_p x_p)
\]
$h_0(t)$ is called the baseline hazard.

Notes on the Cox PH model
- The baseline hazard is not specified more precisely, i.e. has no parametric form → the Cox PH model is called a semiparametric model
- The baseline hazard is the same for all subjects
- The explanatory variables are assumed to be time-independent
- The hazard function must, by definition, always be positive; the exponential function in the Cox PH model ensures this (comparable approach to logistic regression)

Why don't we need a parameter $\beta_0$, contrary to regression models?

Partial likelihood
- The hazard of the Cox PH model is not fully parametric → no MLE possible
- Therefore, consider the partial likelihood $L_c$ instead of the likelihood:
\[
L_c(\beta) = \prod_{i : \delta_i = 1} L_i(\beta) ,
\]
  $L_i(\beta)$ = probability that an individual with the covariates of subject $i$ has a failure at $t_i$, given that there is one failure in the risk set of $t_i$
- In the “normal” likelihood, there is no conditioning involved.

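For event times without ties, the factors take the standard form L_i(beta) = exp(beta * x_i) / sum of exp(beta * x_j) over all subjects j still at risk at t_i. This explicit form is not stated on the slide and is added here as an assumption-labelled illustration. The sketch below evaluates the log partial likelihood by hand on a small made-up data set (toy, with a single covariate x) and compares it with the values stored by coxph.

```r
library(survival)

# Made-up toy data without tied event times; status 1 = event, 0 = censored
toy <- data.frame(time   = c(5, 8, 12, 15, 20, 25),
                  status = c(1, 1, 0, 1, 1, 0),
                  x      = c(1, 0, 1, 0, 1, 0))

# Log partial likelihood for a single covariate (hypothetical helper)
log.pl <- function(beta, d) {
  eta <- beta * d$x
  sum(vapply(which(d$status == 1), function(i) {
    at.risk <- d$time >= d$time[i]          # risk set at the event time of subject i
    eta[i] - log(sum(exp(eta[at.risk])))
  }, numeric(1)))
}

fit <- coxph(Surv(time, status) ~ x, data = toy)

# coxph stores the log partial likelihood at beta = 0 and at the estimate
print(c(by.hand.null = log.pl(0, toy), coxph.null = fit$loglik[1]))
print(c(by.hand.fit = log.pl(coef(fit), toy), coxph.fit = fit$loglik[2]))
```

The baseline hazard does not appear anywhere in log.pl, which illustrates why the coefficients can be estimated without specifying it.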
Fitting a Cox PH model
- Usual approach of estimating the model (Cox, 1972), implemented in the R function coxph: maximize the partial likelihood
- This approach makes the baseline hazard a nuisance parameter
- Hence, we only get an estimate of the coefficients $\beta$ in a first step
- This is normally sufficient since we are usually only interested in the coefficients: they define hazard ratios (see later).
- Consequence: if we want to plot the fitted survivor function, we must use a Kaplan-Meier estimator in addition to coxph

Estimating Cox PH model in R
Cox PH models can be fitted with the function coxph from the survival package:

    addicts.cox <- coxph(Surv(survt, status) ~ clinic + prison + dose,
                         data = addicts)
    summary(addicts.cox)

    ## Call:
    ## coxph(formula = Surv(survt, status) ~ clinic + prison + dose,
    ##     data = addicts)
    ##
    ##   n= 238, number of events= 150
    ##
    ##                coef exp(coef)  se(coef)      z Pr(>|z|)
    ## clinic2   -1.009896  0.364257  0.214889 -4.700 2.61e-06 ***
    ## prisonyes  0.326555  1.386184  0.167225  1.953   0.0508 .
    ## dose      -0.035369  0.965249  0.006379 -5.545 2.94e-08 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ##           exp(coef) exp(-coef) lower .95 upper .95
    ## clinic2      0.3643     2.7453    0.2391    0.5550
    ## prisonyes    1.3862     0.7214    0.9988    1.9238
    ## [...]

Lots of output. . .
    ## Call:
    ## coxph(formula = Surv(survt, status) ~ clinic + prison + dose,
    ##     data = addicts)
    ##
    ##   n= 238, number of events= 150
    ##
    ##                coef exp(coef)  se(coef)      z Pr(>|z|)
    ## clinic2   -1.009896  0.364257  0.214889 -4.700 2.61e-06 ***
    ## prisonyes  0.326555  1.386184  0.167225  1.953   0.0508 .
    ## dose      -0.035369  0.965249  0.006379 -5.545 2.94e-08 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ##           exp(coef) exp(-coef) lower .95 upper .95
    ## clinic2      0.3643     2.7453    0.2391    0.5550
    ## prisonyes    1.3862     0.7214    0.9988    1.9238
    ## dose         0.9652     1.0360    0.9533    0.9774
    ##
    ## Concordance= 0.665  (se = 0.026 )
    ## Rsquare= 0.238   (max possible= 0.997 )
    ## Likelihood ratio test= 64.56  on 3 df,   p=6e-14
    ## Wald test            = 54.12  on 3 df,   p=1e-11
    ## Score (logrank) test = 56.32  on 3 df,   p=4e-12

Annotations on the output:
- Coefficients $\hat\beta_j$ (column coef)
- Hazard ratios: exponentiated coefficients $e^{\hat\beta_j}$ (column exp(coef))
- p-values for global significance tests (last three lines)

Hazard ratio I
- Consider two subjects $i$ and $j$ with explanatory variables $x_{i1}, \ldots, x_{ip}$ and $x_{j1}, \ldots, x_{jp}$, resp.
- Their hazard ratio is
\[
HR = \frac{h(t; x_{i1}, \ldots, x_{ip})}{h(t; x_{j1}, \ldots, x_{jp})}
\]
- The HR says how much more likely it is that subject $i$ has an event in the next time unit than subject $j$.
- Heroin example: the HR between patients $i$ and $j$ says how much more likely it is that patient $i$ is released the next day than patient $j$.
- If the event refers to death, the HR expresses how much higher the instantaneous death probability is for subject $i$ than for subject $j$.

Hazard ratio II
Why is the hazard ratio interesting?
- Heroin example: suppose subjects $i$ and $j$ both have no prison record ($x_{i2} = x_{j2} = 0$) and got the same methadone dose ($x_{i3} = x_{j3}$), but subject $i$ was treated in clinic 2 ($x_{i1} = 1$) and subject $j$ in clinic 1 ($x_{j1} = 0$).
- Then the hazard ratio of subjects $i$ and $j$ is
\[
HR = \frac{h_0(t) \cdot \exp(\beta_1 x_{i1} + \ldots + \beta_p x_{ip})}{h_0(t) \cdot \exp(\beta_1 x_{j1} + \ldots + \beta_p x_{jp})}
   = \frac{e^{\beta_1 \cdot 1}}{e^{\beta_1 \cdot 0}} = e^{\beta_1}
\]

Hazard ratio in a Cox PH model
Hazard ratio in Cox PH model
In a Cox PH model, the hazard ratio of the $j$-th explanatory variable is defined as
\[
HR_j = e^{\beta_j}
\]

Example: heroin data. The HR of the variable clinic, $HR_1 = e^{\beta_1}$, says how much more likely it is ...
- ... that a patient from clinic 2 is released the next day ...
- ... compared to a patient from clinic 1, ...
- ... given or assuming that they coincide in all other explanatory variables (prison record and methadone dose)

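The hazard ratios and their confidence limits in the coxph output can be recomputed from the coef and se(coef) columns: the interval is built on the coefficient scale and then exponentiated. The sketch below uses the printed values; the object names and the extra calculation for a dose difference of 10 units are additions for illustration.

```r
# Coefficients and standard errors copied from the coxph summary output
coefs <- c(clinic2 = -1.009896, prisonyes = 0.326555, dose = -0.035369)
ses   <- c(clinic2 = 0.214889, prisonyes = 0.167225, dose = 0.006379)

q <- qnorm(0.975)   # standard normal quantile for a 95% interval
hr <- data.frame(HR    = exp(coefs),
                 lower = exp(coefs - q * ses),
                 upper = exp(coefs + q * ses))
print(round(hr, 4))

# Hazard ratio for a difference of 10 dose units, all else being equal
hr.dose10 <- exp(10 * coefs[["dose"]])
print(round(hr.dose10, 4))
```

The first table should match the columns exp(coef), lower .95 and upper .95 of the output up to rounding. For a continuous variable such as dose, the hazard ratio refers to a difference of one unit, so larger differences multiply on the coefficient scale before exponentiating.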
Heroin example: hazard ratio of clinic
- The estimated HR of the variable clinic is $\widehat{HR}_1 = 0.3643$.
- What does this mean?
- Suppose that you, as a politician, have to decide whether clinic 1 or clinic 2 does a better job (i.e. is releasing patients earlier).
- Do you take the HR from a Cox PH model, or the log-rank test from before? Why?

Plotting a fitted Cox PH model I
- The output of coxph cannot be plotted directly:

    plot(addicts.cox)

  gives an error! (Why?)
- Example: estimate the survivor function for two patients with no prison record and a mean methadone dose, one in clinic 1 and one in clinic 2:

    sample.data <- data.frame(
      clinic = c("1", "2"), prison = rep("no", 2), dose = rep(mean(addicts$dose), 2))
    sample.surv <- survfit(addicts.cox, newdata = sample.data)

Plotting a fitted Cox PH model II
... and plot the fitted survivor function:

    plot(sample.surv, col = c(1, 2), conf.int = TRUE,
         xlab = "Time [days]", ylab = "S(t)")
    legend("bottomleft", bty = "n", lty = 1, col = 1:2,
           legend = sprintf("clinic %d", 1:2))

[Figure: fitted survivor functions S(t) vs. time in days (0 to 1000) for clinic 1 and clinic 2, with confidence bands]

Cox PH assumption revisited I
- Recall the proportional hazards assumption:
\[
h(t; x_1, \ldots, x_p) = h_0(t) \cdot \exp(\beta_1 x_1 + \ldots + \beta_p x_p)
\]
- Hence
\[
\log h(t; x_1, \ldots, x_p) = \log h_0(t) + \beta_1 x_1 + \ldots + \beta_p x_p
\]
- Consequence: plots of $\log h(t; x_1, \ldots, x_p)$ for different groups (assuming discrete, or discretized explanatory variables) should show parallel lines
- In practice, we can use kernel estimates (see Part II) of the hazard functions

Cox PH assumption revisited II

- A more common approach for graphical model validation only involves the Kaplan-Meier estimator, and no estimate of the hazard functions
- The cumulative hazard function $H(t; x_1, \ldots, x_p) = \int_0^t h(u; x_1, \ldots, x_p) \, du$ can be decomposed as
\[
H(t; x_1, \ldots, x_p) = H_0(t) \cdot \exp(\beta_1 x_1 + \ldots + \beta_p x_p), \quad H_0(t) = \int_0^t h_0(u) \, du
\]
- Taking the logarithm yields
\[
\log(H(t; x_1, \ldots, x_p)) = \log(H_0(t)) + \sum_{j=1}^{p} \beta_j x_j
\]
- Recall that $H(t; x_1, \ldots, x_p) = -\log S(t; x_1, \ldots, x_p)$

Model validation: does the PH assumption hold?
Practical consequence of last line:

Graphical test for PH assumption
If the proportional hazards assumption holds, a plot of log(−log(S)) vs. t for different groups of subjects shows parallel curves.

The PH assumption can be tested variable by variable.

[Figure: two example plots of −ln(−ln S) vs. time, one with curves for females and males, one with curves for low, medium and high (figures from Kleinbaum and Klein, 2005)]

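The parallelism can be spelled out in one line. Combining the logarithm of the cumulative hazard with $H = -\log S$ from the derivation above, and writing $x$ and $\tilde{x}$ for the covariate values of two groups (notation introduced here for illustration only), the baseline term $\log H_0(t)$ cancels:

```latex
\[
\log\bigl(-\log S(t; x)\bigr) - \log\bigl(-\log S(t; \tilde{x})\bigr)
  = \sum_{j=1}^{p} \beta_j \,(x_j - \tilde{x}_j)
\]
```

The right-hand side does not depend on $t$, so the vertical distance between the two curves is constant over time; a distance that clearly changes with $t$ speaks against the PH assumption for the variable that defines the groups.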
Heroin example: testing the PH assumption I
Plotting log(− log(S)) vs. t for the covariate clinic:
addicts.km.clinic <-
  survfit(Surv(survt, status) ~ clinic, data = addicts)
addicts.table <- summary(addicts.km.clinic)
# the x axis shows the untransformed time t, so it is labelled "t"
plot(addicts.table$time,
     log(-log(addicts.table$surv)),
     col = addicts.table$strata,
     xlab = "t", ylab = "log(-log(S))",
     pch = 20)
[Figure: scatter plot of log(-log(S)) against time for the two clinics, one colour per clinic]
Heroin example: testing the PH assumption II
Simpler way: using function survplot from the rms package.
As survplot is not compatible with survfit output, we have to fit the KM estimator using npsurv (which does exactly the same thing as survfit ...):
library(rms)
survplot(npsurv(Surv(survt, status) ~ clinic, data = addicts),
         loglog = TRUE, xlab = "Time", ylab = "log(-log(S))")
Heroin example: testing the PH assumption III
Plots for all three covariates:
[Figure: three panels of log(-log(S)) against time (0 to 960). Clinic: clinic=1 vs. clinic=2. Prison: prison=no vs. prison=yes. Dose: dose >= median(dose) FALSE vs. TRUE.]
clinic violates the Cox PH assumption!
Cox-Snell residuals I
Plot of log(− log(S)) vs. t:
  - Advantages: easy to produce, easy to understand
  - Disadvantage: not so easy to see whether lines are parallel: where they are steep, they seem to be closer than where they are flat.
Cox-Snell residuals can be used to do a different graphical model validation; it's more difficult to understand, but easier to judge.
Cox-Snell residuals II
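Let $x_{i1}, \ldots, x_{ip}$ denote the covariates of subject i, and $Y_i = \min\{T_i, C_i\}$ its event or censoring time.
Cox-Snell residual:
$$r_{Ci} := H_0(Y_i) \cdot \exp(\beta_1 x_{i1} + \ldots + \beta_p x_{ip}), \qquad H_0(t) := \int_0^t h_0(u)\, du.$$
In practice, $H_0$ and the $\beta_j$ are replaced by their estimates from the fitted model.
It can be shown that the Cox-Snell residuals behave like a (censored) sample from an exponential distribution with rate 1 when the Cox PH assumptions are met.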
Checking Cox-Snell residuals in R
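In R, we get Cox-Snell residuals as follows:
# addicts.cox$residuals holds the martingale residuals of the fitted Cox model
cox.snell <- abs(addicts$status - addicts.cox$residuals)
We can compare them graphically with an exponential distribution as follows:
library(car)  # provides qqPlot
# the rate only rescales the x axis; what matters is whether the points form a straight line
qqPlot(cox.snell, dist = "exp", rate = mean(cox.snell))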
[Figure: QQ-plot of cox.snell (0 to 4) against exp quantiles (0 to 10); the two most extreme points are labelled 9 and 84]
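An alternative to the QQ-plot is to treat the Cox-Snell residuals themselves as censored data, estimate their cumulative hazard with the Nelson-Aalen estimator, and compare it with the identity line. The sketch below does this on simulated data; the covariate x, the simulation settings and all object names are assumptions for illustration and have nothing to do with the heroin data set.

```r
library(survival)

set.seed(2)
n <- 400
x <- rnorm(n)                                   # one covariate (assumed)
t_event <- rexp(n, rate = 0.05 * exp(0.8 * x))  # PH holds by construction
t_cens <- runif(n, min = 0, max = 60)           # independent censoring
time <- pmin(t_event, t_cens)
status <- as.numeric(t_event <= t_cens)

fit <- coxph(Surv(time, status) ~ x)

# Cox-Snell residual = event indicator - martingale residual
cox_snell <- status - residuals(fit, type = "martingale")

# Nelson-Aalen estimate of the cumulative hazard of the residuals,
# keeping the original censoring indicator
cs_fit <- survfit(Surv(cox_snell, status) ~ 1)

# For a well-fitting model the step function stays close to the line y = x
plot(cs_fit$time, cs_fit$cumhaz, type = "s",
     xlab = "Cox-Snell residual", ylab = "Estimated cumulative hazard")
abline(0, 1, lty = 2)
```

Deviations in the right tail should be read with care, since only few residuals are that large.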
Goodness of fit testing approach
Goodness of fit testing approach is an alternative to graphical model validation
Pro: provides a single p-value (→ more objective)
Contra: tests for violations of the assumption → test must be done the “wrong way” (the PH assumption is the null hypothesis, so a non-significant result is read as support for it), with an unknown type II error rate
Rough idea of goodness of fit test:
  - calculate residuals for each of the explanatory variables
  - check whether the residuals are not correlated to survival time
Schoenfeld residuals and goodness of fit test
Consider the heroin data set again.
Suppose subject i has an event at time $t_{(j)}$. Then his or her Schoenfeld residual for the variable dose is the difference between his or her methadone dose and a weighted mean methadone dose of all individuals still at risk at time $t_{(j)}$.
The mean is weighted by the hazard of the patients in the risk set.
The goodness of fit test for the variable dose now tests the null hypothesis that the correlation coefficient between the Schoenfeld residuals for dose and the survival time is 0.
For categorical explanatory variables, the calculation of the Schoenfeld residual is a bit different, but similar.
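The verbal definition of the Schoenfeld residual can be written out in a few lines of R. The following is a minimal sketch on simulated data with a single continuous covariate; the covariate is called dose only to mirror the text, and its values, the simulation settings and all object names are hypothetical. The weights are the relative hazards exp(coef * dose) from the fitted model, and the event times are continuous, so there are no ties.

```r
library(survival)

set.seed(3)
n <- 200
dose <- rnorm(n, mean = 60, sd = 15)            # hypothetical covariate values
t_event <- rexp(n, rate = 0.01 * exp(-0.03 * (dose - 60)))
t_cens <- runif(n, min = 0, max = 400)          # independent censoring
time <- pmin(t_event, t_cens)
status <- as.numeric(t_event <= t_cens)

fit <- coxph(Surv(time, status) ~ dose)
w <- exp(coef(fit) * dose)                      # relative hazard of each subject

# Indices of the subjects with an event, ordered by event time
ord <- order(time)
event_idx <- ord[status[ord] == 1]

# Schoenfeld residual at each event time: covariate of the subject with the
# event minus the hazard-weighted mean covariate over the risk set
schoenfeld_manual <- sapply(event_idx, function(i) {
  at_risk <- time >= time[i]
  dose[i] - sum(w[at_risk] * dose[at_risk]) / sum(w[at_risk])
})

# Correlation with the event times; values near 0 are compatible with PH
rho <- cor(schoenfeld_manual, time[event_idx])
print(rho)
```

The hand-computed values can be compared with residuals(fit, type = "schoenfeld") from the survival package, which returns one residual per event.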
Goodness of fit test in R
In R, the goodness of fit can be tested with the function cox.zph from the survival package:
cox.zph(addicts.cox)
##               rho chisq        p
## clinic2   -0.2578 11.19 0.000824
## prisonyes -0.0382  0.22 0.639369
## dose       0.0724  0.70 0.402749
## GLOBAL         NA 12.62 0.005546
We get separate p-values for each explanatory variable as well as a global p-value. Again: clinic seems to violate the PH assumption.
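For a scripted check it can be handy to extract the p-values from the cox.zph result instead of reading them off the printout. A minimal sketch on simulated data follows; the data frame d and the covariates x1 and x2 are hypothetical. The set of columns printed by cox.zph depends on the installed version of the survival package, so only the p column is used here.

```r
library(survival)

set.seed(42)
n <- 300
x1 <- rbinom(n, size = 1, prob = 0.5)           # hypothetical binary covariate
x2 <- rnorm(n)                                  # hypothetical continuous covariate
t_event <- rexp(n, rate = 0.1 * exp(0.7 * x1 + 0.3 * x2))
t_cens <- runif(n, min = 0, max = 30)           # independent censoring
d <- data.frame(time = pmin(t_event, t_cens),
                status = as.numeric(t_event <= t_cens),
                x1 = x1, x2 = x2)

fit <- coxph(Surv(time, status) ~ x1 + x2, data = d)
zph <- cox.zph(fit)

# One p-value per covariate plus the global test
p_values <- zph$table[, "p"]
print(p_values)
```

Since the simulated hazards are proportional by construction, small p-values here would be false alarms.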
Example: survival of lung cancer patients
Data set from the Insel hospital: overall survival (variable os) was measured for 67 patients with non-small cell lung cancer (NSCLC)
Additional variables:
  - clinical variables (grade, stage, age, preop, other, rt, drug)
  - gene expression measurements (cda, gldc, rrm2, tk1, tyms)
Question: which genes or clinical variables are good predictors for the overall survival of patients?
We clearly have too many explanatory variables to fit a good model → variable selection
Variable selection for Cox models
Variable selection for Cox models works exactly as for parametric regression models.
Manual approach: iteratively remove least significant variable, compare larger and smaller model with likelihood-ratio test
Automatic approach: use forward or backward selection based on AIC
Lung cancer data set: manual variable selection
We demonstrate the first step of manual variable selection for the lung cancer data set:
nsclc.cox.full <- coxph(Surv(os, status) ~ ., data = nsclc)
summary(nsclc.cox.full)
The long output is omitted here; gldc is the least significant variable, hence we remove it:
nsclc.cox.red <- update(nsclc.cox.full, . ~ . - gldc)
# library(lmtest)
anova(nsclc.cox.red, nsclc.cox.full, test = "Chisq")
## Analysis of Deviance Table
##  Cox model: response is Surv(os, status)
##  Model 1: ~ grade + stage + age + preop + other + rt + drug + cda + rrm2 + tk1 + tyms
##  Model 2: ~ grade + stage + age + preop + other + rt + drug + cda + gldc + rrm2 + tk1 + tyms
##    loglik  Chisq Df P(>|Chi|)
## 1 -37.681
## 2 -37.677 0.0078  1    0.9295
The test gives no evidence that gldc improves the fit (p = 0.93), so we can indeed accept the simpler model.
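The likelihood-ratio test in the anova output can be reproduced by hand from the two log-likelihoods. The sketch below only uses the rounded values printed above, so the result agrees with the table up to rounding; the object names are chosen here for illustration.

```r
# Log-likelihoods copied from the anova output (rounded to three decimals)
loglik_red  <- -37.681   # model without gldc
loglik_full <- -37.677   # model with gldc

# Likelihood-ratio statistic; one variable was removed, hence 1 degree of freedom
lr_stat <- 2 * (loglik_full - loglik_red)
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

print(c(statistic = lr_stat, p = p_value))
```

The same two lines work for any pair of nested Cox models, with df set to the number of removed coefficients.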
Lung cancer data set: automatic variable selection
The well-known function stepAIC is a more convenient way of doing variable selection, e.g. with backward selection:
library(MASS)
nsclc.cox.red <- stepAIC(nsclc.cox.full, direction = "backward", trace = 0)
summary(nsclc.cox.red)
## Call:
## coxph(formula = Surv(os, status) ~ grade + age + other + rt +
##     drug + tyms, data = nsclc)
##
##   n= 67, number of events= 13
##
##                coef  exp(coef)   se(coef)      z Pr(>|z|)
## grade     1.418e+00  4.131e+00  6.125e-01  2.316  0.02057 *
## age       7.679e-02  1.080e+00  3.383e-02  2.270  0.02320 *
## otheryes  2.830e+00  1.695e+01  9.676e-01  2.925  0.00344 **
## rtno     -2.244e+00  1.060e-01  1.152e+00 -1.948  0.05140 .
## rtyes    -2.176e+01  3.551e-10  8.298e+03 -0.003  0.99791
## drug      9.185e-01  2.506e+00  3.627e-01  2.533  0.01132 *
## tyms     -3.857e-01  6.800e-01  1.583e-01 -2.436  0.01485 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [...]
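To read the coefficient table, it helps to recompute one row by hand. The sketch below takes the rounded coef and se(coef) of grade from the output above and derives the hazard ratio, the Wald z statistic and its two-sided p-value; the approximate 95% confidence interval is an addition that is not part of the printed output, and the object names are chosen here for illustration.

```r
# Values copied from the row "grade" of the summary output (rounded)
coef_grade <- 1.418
se_grade   <- 0.6125

hazard_ratio <- exp(coef_grade)                 # column exp(coef)
z_value <- coef_grade / se_grade                # column z
p_value <- 2 * pnorm(-abs(z_value))             # column Pr(>|z|)

# Approximate 95% Wald confidence interval for the hazard ratio
ci <- exp(coef_grade + c(-1, 1) * qnorm(0.975) * se_grade)

print(c(hr = hazard_ratio, z = z_value, p = p_value, lower = ci[1], upper = ci[2]))
```

A hazard ratio of about 4 means that, with the other variables held fixed, the estimated hazard is roughly four times higher per unit increase of grade; the wide interval reflects that the fit rests on only 13 events.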
References I
David R. Cox. Regression models and life tables. Journal of the Royal Statistical Society, 34:187–220, 1972.
David G. Kleinbaum and Mitchel Klein. Survival Analysis: A Self-Learning Text. Springer, 2005.