Survival analysis
First example of the day
• Small cell lung cancer
• Median survival time: 8-10 months
• 2-year survival is 10%
• New treatment showed median survival of 13.2 months
Progressively censored observations
Current life table
• Completed dataset
Cohort life table
• Analysis "on the fly"
Problem
Do patients survive longer after treatment 1 than after treatment 2?
Possible solutions:
• ANOVA on mean survival time?
• ANOVA on median survival time?
• 100 person-years of observation: How long has the average person been in the study?
• 10 persons being observed for 10 years
• 100 persons being observed for 1 year
Life table analysis
A sub-set of 13 patients undergoing the same treatment
Time interval chosen to be 3 months
$n_i$: number of patients starting a given period
$d_i$: number of terminal events; in this example, progression/response
$w_i$: number of patients who have not yet been in the study long enough to finish this period
Number exposed to risk: $n_i - w_i/2$, assuming that patients withdraw in the middle of the period on average
$q_i = d_i / (n_i - w_i/2)$: proportion of patients terminating in the period
$p_i = 1 - q_i$: proportion of patients surviving the period
$S_i = p_i \, p_{i-1} \cdots p_1$: cumulative proportion surviving, a product of conditional probabilities
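The life table quantities above can be sketched in a few lines of Python. This is a minimal illustration only: the interval counts are hypothetical and are not the 13-patient data set shown on the slides, and the function name `life_table` is made up for this example.

```python
def life_table(n_start, events, withdrawals):
    """Actuarial life table.

    n_start     -- number of patients entering the first interval
    events      -- d_i, terminal events in each interval
    withdrawals -- w_i, patients censored in each interval
    """
    rows = []
    n_i = n_start
    s_i = 1.0
    for d_i, w_i in zip(events, withdrawals):
        exposed = n_i - w_i / 2   # number exposed to risk
        q_i = d_i / exposed       # proportion terminating in the interval
        p_i = 1 - q_i             # proportion surviving the interval
        s_i *= p_i                # cumulative proportion surviving
        rows.append((n_i, d_i, w_i, exposed, q_i, p_i, s_i))
        n_i -= d_i + w_i          # patients entering the next interval
    return rows


# Hypothetical counts for four 3-month intervals (not the slide data).
table = life_table(n_start=13, events=[2, 1, 3, 1], withdrawals=[0, 2, 1, 2])
for n_i, d_i, w_i, exposed, q_i, p_i, s_i in table:
    print(f"n={n_i:2d} d={d_i} w={w_i} exposed={exposed:4.1f} "
          f"q={q_i:.3f} p={p_i:.3f} S={s_i:.3f}")
```

Each row reuses the cumulative survival of the previous row, which is why the last column can only stay level or decrease.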
Survival curves
How long will a lung cancer patient keep having cancer on this particular treatment?
Kaplan-Meier
Simple example with only 2 "terminal events".
Confidence interval of the Kaplan-Meier method
E.g. after 32 months:
$SE(S_i) = S_i \sqrt{\frac{d_i}{n_i (n_i - d_i)}}$
$SE(S_i) = 0.9 \sqrt{\frac{1}{10 (10 - 1)}} = 0.0949$
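A minimal Python sketch of the Kaplan-Meier estimate with a standard error, for illustration. It uses Greenwood's formula, which sums the term $d/(n(n-d))$ over all event times so far and therefore reduces to the single-term expression above at the first event. The follow-up times are hypothetical, chosen so that the first event occurs at 32 months with 10 patients at risk; the function name is an assumption of this example.

```python
import math


def kaplan_meier(times, observed):
    """Return [(time, S, SE)] for each distinct event time.

    times    -- follow-up time of each patient
    observed -- True for a terminal event, False if censored
    """
    data = sorted(zip(times, observed))
    n_at_risk = len(data)
    s = 1.0
    greenwood_sum = 0.0
    result = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # All patients sharing this follow-up time (events and censorings).
        same_time = [obs for tt, obs in data[i:] if tt == t]
        d = sum(same_time)
        if d > 0:
            s *= (n_at_risk - d) / n_at_risk
            if n_at_risk > d:
                greenwood_sum += d / (n_at_risk * (n_at_risk - d))
            result.append((t, s, s * math.sqrt(greenwood_sum)))
        n_at_risk -= len(same_time)
        i += len(same_time)
    return result


# Hypothetical data: 10 patients, terminal events at 32 and 62 months only.
times = [32, 40, 45, 50, 51, 60, 62, 70, 75, 80]
observed = [True, False, False, False, False, False, True, False, False, False]
curve = kaplan_meier(times, observed)
for t, s, se in curve:
    print(f"t={t}: S={s:.3f}, SE={se:.4f}, "
          f"approx. 95% CI = {s - 1.96 * se:.3f} to {s + 1.96 * se:.3f}")
```

The interval S ± 1.96·SE is the usual normal approximation; it can run outside 0 to 1 when few patients remain at risk.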
Confidence interval of the Kaplan-Meier method
Survival plot for all data on treatment 1
Are there differences between the treatments?
Comparing Two Survival Curves
One could use the confidence intervals…
But what if the confidence intervals fail to overlap at only some points?
• Logrank statistics
• Hazard ratio
• Mantel-Haenszel methods
Comparing Two Survival Curves
The logrank statistics
Aka Mantel-logrank statistics
Aka Cox-Mantel-logrank statistics
Comparing Two Survival Curves
Five steps to the logrank statistics table
1. Divide the data into intervals (e.g. 10 months)
2. Count the number of patients at risk in the groups and in total
3. Count the number of terminal events in the groups and in total
4. Calculate the expected numbers of terminal events, e.g. for interval 31-40: 44 at risk in grp1 and 46 in grp2, 4 terminal events, so the expected terminal events are 4x(44/90) and 4x(46/90)
5. Calculate the total
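Steps 4 and 5 can be sketched as follows. The interval 31-40 uses the numbers quoted above; the function name and the list layout are assumptions made for this illustration, and further intervals would have to be filled in from the full table.

```python
def expected_events(at_risk_1, at_risk_2, events_total):
    """Expected terminal events per group in one interval, assuming
    both groups share the same risk (the null hypothesis)."""
    total_at_risk = at_risk_1 + at_risk_2
    e1 = events_total * at_risk_1 / total_at_risk
    e2 = events_total * at_risk_2 / total_at_risk
    return e1, e2


# Interval 31-40: 44 and 46 patients at risk, 4 terminal events in total.
e1, e2 = expected_events(44, 46, 4)
print(f"{e1:.3f} {e2:.3f}")  # 1.956 2.044

# Step 5: sum over all intervals (only one interval is listed here).
intervals = [(44, 46, 4)]
total_e1 = sum(expected_events(*row)[0] for row in intervals)
total_e2 = sum(expected_events(*row)[1] for row in intervals)
```

Within every interval the two expected counts add up to the observed number of events, so the totals E1 + E2 equal the total number of events.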
Comparing Two Survival Curves
Smells like chi-square statistics
$\chi^2 = \sum_{\text{all treatments}} \frac{(O - E)^2}{E}$
$\chi^2 = \frac{(23 - 17.07)^2}{17.07} + \frac{(12 - 17.93)^2}{17.93} = 4.02$
df = 1; p < 0.05
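The logrank chi-square can be checked with a short Python sketch using the observed and expected totals from the slide. The p-value helper relies on the identity between the chi-square distribution with 1 degree of freedom and the squared standard normal; the function names are assumptions of this example.

```python
import math


def logrank_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over the treatment groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value_1df(x):
    """Upper-tail p-value of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(x / 2))


chi2 = logrank_chi_square([23, 12], [17.07, 17.93])
p = chi_square_p_value_1df(chi2)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

With two treatment groups the statistic has one degree of freedom, so it is compared with the critical value 3.84 at the 5% level.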
Comparing Two Survival Curves
Hazard ratio
$\text{Hazard ratio} = \frac{O_1 / E_1}{O_2 / E_2} = \frac{23 / 17.07}{12 / 17.93} = 2.01$
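The same totals give the hazard ratio in one line of Python; the function name is made up for this sketch.

```python
def hazard_ratio(o1, e1, o2, e2):
    """Ratio of observed/expected terminal events in group 1 versus group 2."""
    return (o1 / e1) / (o2 / e2)


hr = hazard_ratio(23, 17.07, 12, 17.93)
print(round(hr, 2))  # 2.01
```

A value of about 2 means that terminal events occurred roughly twice as fast in group 1 as in group 2.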
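Comparing Two Survival Curves
Mantel-Haenszel test
Is the OR significantly different from 1?
For a 2x2 table with cells a, b (first row) and c, d (second row) and grand total n:
$OR = \frac{a / b}{c / d}$
Look at cell (1,1):
Estimated value: $E(a_i) = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$
Variance: $V(a_i) = \frac{(a + c)(b + d)(a + b)(c + d)}{n^2 (n - 1)}$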
Comparing Two Survival Curves
Mantel-Haenszel test
$\chi^2_{M-H} = \frac{\left(\sum_i a_i - \sum_i E(a_i)\right)^2}{\sum_i V(a_i)} = 1.12$
df = 1; p > 0.05
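A minimal Python sketch of the Mantel-Haenszel statistic, assuming one 2x2 table (a, b, c, d) per time interval and no continuity correction. The tables below are hypothetical, so the result will not reproduce the 1.12 quoted on the slide, whose underlying tables are not part of this transcript.

```python
def mantel_haenszel_chi_square(tables):
    """tables -- one (a, b, c, d) tuple per interval, where a is cell (1,1)."""
    sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # the variance is undefined for fewer than 2 patients
        sum_a += a
        sum_e += (a + b) * (a + c) / n  # row total * column total / grand total
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    return (sum_a - sum_e) ** 2 / sum_v


# Hypothetical interval tables: (events grp1, survivors grp1,
#                                events grp2, survivors grp2)
tables = [(3, 41, 1, 45), (5, 30, 2, 38), (4, 20, 3, 25)]
print(f"M-H chi-square = {mantel_haenszel_chi_square(tables):.2f}")
```

As with the logrank statistic, the result is compared with the chi-square distribution with 1 degree of freedom.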
Hazard function
$H = \frac{d}{f + c}$
$H = -\log(S_i)$
d is the number of terminal events
f is the sum of failure times
c is the sum of censored times
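A small Python sketch of the hazard estimate d / (f + c), with hypothetical follow-up times in months. The relation between cumulative hazard and survival is shown with S = 0.9 as an example value; the function names are assumptions of this illustration.

```python
import math


def hazard_rate(failure_times, censored_times):
    """Terminal events per unit of observed time."""
    d = len(failure_times)   # number of terminal events
    f = sum(failure_times)   # sum of failure times
    c = sum(censored_times)  # sum of censored times
    return d / (f + c)


def cumulative_hazard(s):
    """Cumulative hazard corresponding to a survival proportion s."""
    return -math.log(s)


# Hypothetical data: 3 events and 2 censored patients (months).
h = hazard_rate([5, 8, 12], [10, 15])
print(f"hazard = {h:.3f} events per patient-month")
print(f"cumulative hazard at S = 0.9: {cumulative_hazard(0.9):.4f}")
```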
Logistic regression
Who survived Titanic?
The sinking of Titanic
The Titanic sank on the night of April 14th 1912 with 2228 souls on board; 705 survived. A dataset covering 1309 passengers has survived. Who survived?
The data
Sibsp is the number of siblings and/or spouses accompanying
Parch is the number of parents and/or children accompanying
Some values are missing
Can we predict who will survive Titanic II?
pclass survived name sex age sibsp parch
1 1 Allen, Miss. Elisabeth Walton female 29 0 0
1 1 Allison, Master. Hudson Trevor male 0.9167 1 2
1 0 Allison, Miss. Helen Loraine female 2 1 2
1 0 Allison, Mr. Hudson Joshua Creighton male 30 1 2
1 0 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female 25 1 2
1 1 Anderson, Mr. Harry male 48 0 0
1 1 Andrews, Miss. Kornelia Theodosia female 63 1 0
1 0 Andrews, Mr. Thomas Jr male 39 0 0
1 1 Appleton, Mrs. Edward Dale (Charlotte Lamson) female 53 2 0
Analyzing the data in a (too) simple manner
• Associations between factors without considering interactions
Could we use multiple linear regression to predict survival?
$E(y) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$
Multiple linear regression:
• Response variable is defined between -inf and +inf
• Normally distributed
Logistic regression:
• Response variable is defined between 0 and 1
• Bernoulli distributed
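Logit transformation is modeled linearly
$\ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$
The logistic function
$p = \frac{\exp(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}{1 + \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)} = \frac{1}{1 + \exp(-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n))}$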
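The logit transformation and the logistic function are inverses of each other, which a short Python sketch can make concrete. The function names and the coefficient values are illustrative assumptions, not fitted parameters.

```python
import math


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1 - p))


def logistic(z):
    """Probability corresponding to the log-odds z."""
    return 1 / (1 + math.exp(-z))


def linear_predictor(betas, xs):
    """beta_0 + beta_1 x_1 + ... + beta_n x_n; betas[0] is the intercept."""
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))


# Illustrative coefficients and inputs only.
z = linear_predictor([-1.0, 0.5, 2.0], [3.0, 0.25])
p = logistic(z)
print(f"z = {z:.2f}, p = {p:.3f}, logit(p) = {logit(p):.2f}")
```

Because the logistic function maps any real z into the range 0 to 1, the linear part can be fitted freely without producing impossible probabilities.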
The sigmoidal curve
$p = \frac{1}{1 + e^{-z}}$
$z = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$
[Figure: sigmoidal curve of p against x for $\beta_0 = 0$, $\beta_1 = 1$]
The sigmoidal curve
• The intercept basically just shifts the curve along the input variable
[Figure: sigmoidal curves of p against x for $\beta_1 = 1$ with $\beta_0 = 0$, $\beta_0 = 2$ and $\beta_0 = -2$]
The sigmoidal curve
[Figure: sigmoidal curves of p against x for $\beta_0 = 0$ with $\beta_1 = 1$, $\beta_1 = 2$ and $\beta_1 = 0.5$]
• The intercept basically just shifts the curve along the input variable
• Large regression coefficient → risk factor strongly influences the probability
The sigmoidal curve
[Figure: sigmoidal curves of p against x for $\beta_0 = 0$ with $\beta_1 = 1$ and $\beta_1 = -1$]
• The intercept basically just shifts the curve along the input variable
• Large regression coefficient → risk factor strongly influences the probability
• Positive regression coefficient → risk factor increases the probability
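The effect of the two parameters can be tabulated with a short Python sketch. It evaluates the curve for the six parameter pairs shown in the plot legends at x = -2, 0 and 2; the function name is an assumption of this example.

```python
import math


def p_of_x(x, beta0, beta1):
    """Sigmoidal curve for a single input variable x."""
    return 1 / (1 + math.exp(-(beta0 + beta1 * x)))


# Parameter pairs (beta0, beta1) taken from the plot legends.
pairs = [(0, 1), (2, 1), (-2, 1), (0, 2), (0, 0.5), (0, -1)]
for beta0, beta1 in pairs:
    values = ", ".join(f"{p_of_x(x, beta0, beta1):.3f}" for x in (-2, 0, 2))
    print(f"beta0={beta0:>2}, beta1={beta1:>4}: p(-2), p(0), p(2) = {values}")
```

With beta0 = 2 and beta1 = 1 the curve passes through p = 0.5 at x = -2 rather than at x = 0; a larger beta1 gives values closer to 0 or 1 at the same x; a negative beta1 mirrors the curve.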
Logistic regression of the Titanic data
Logistic regression of the Titanic data – passenger class
1. Summary of data
2. Coding of the dependent variable
3. Coding of the categorical explanatory variable:
   First class: 1
   Second class: 2
   Third class: reference
Logistic regression of the Titanic data – passenger class
A fit of the null model, basically just the intercept. Usually not interesting.
• The overall probability of survival is 500/1309 = 0.382. The cutoff is 0.5, so all passengers are classified as non-survivors.
• Basically tests if the null model is sufficient. It almost certainly is not.
• Shows that survival is related to pclass (which is not in the null model)
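The null model can be reproduced by hand from the two counts quoted above. This Python sketch derives the intercept and the share of passengers classified correctly when everyone is given the same prediction; the derived figures follow from 500 and 1309 and are not read off the slide output.

```python
import math

survivors, total = 500, 1309

p_null = survivors / total                   # overall probability of survival
intercept = math.log(p_null / (1 - p_null))  # log-odds, the only parameter

# With a cutoff of 0.5 every passenger receives the same prediction.
predicted_survive = p_null >= 0.5
correct = survivors if predicted_survive else total - survivors
accuracy = correct / total

print(f"p = {p_null:.3f}, intercept = {intercept:.3f}, "
      f"correctly classified = {accuracy:.1%}")
```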
Logistic regression of the Titanic data – passenger class
1. Omnibus test: Uses the likelihood ratio (LR) to test whether adding the pclass variable to the model makes it better. It did! But the comparison is against the null model, so no surprise.
2. Model Summary: Other measures of the goodness of fit.
3. Classification table: By including pclass, 67.7% of passengers were correctly categorized.
4. Variables in the equation: The first line repeats that pclass has a significant effect on survival. B is the fitted logistic regression coefficient. Exp(B) is the odds ratio, so the odds of survival for first-class passengers are 4.7 (3.6-6.3) times those of passengers in third class (the reference class).
Logistic regression of the Titanic data – Adding age to the model
Oops… Some data points are missing, and the null model is poorer
Logistic regression of the Titanic data – Adding age to the model
• Cox and Snell's R-square increased from 0.093 to 0.141, indicating a better model
• With this model we can correctly classify 69.1% of passengers; passenger class alone classified only 67.7%
Logistic regression of the Titanic data – Adding age to the model
• Age has a significant influence on survival.
• The odds ratio of age is 0.963
• So the odds of a 31 year old are 0.963 times the odds of a 30 year old.
• Or the odds for a 30 year old to survive are 1/0.963 = 1.038 times those of a 31 year old
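Because the odds ratio applies per year of age, it compounds over larger age differences. A minimal Python sketch, using the odds ratio 0.963 quoted above; the 10-year comparison is an added illustration and the function name is made up.

```python
import math

odds_ratio_per_year = 0.963
b_age = math.log(odds_ratio_per_year)  # coefficient B on the log-odds scale


def odds_multiplier(age_difference):
    """Factor by which the odds of survival change for a given age difference."""
    return math.exp(b_age * age_difference)


print(round(odds_multiplier(1), 3))   # 0.963, one year older
print(round(odds_multiplier(-1), 3))  # 1.038, one year younger
print(f"{odds_multiplier(10):.3f}")   # ten years older
```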
Logistic regression of the Titanic data – Age alone
• The model is extremely poor
• Consequently age appears to be insignificant in estimating survival.
Logistic regression of the Titanic data – Adding family and sex
• The model is becoming better
Logistic regression of the Titanic data – Using the model to predict
• What is the probability that a 25 year old woman accompanied only by her husband holding a second class ticket would survive Titanic?
$z = -2.703 - 0.041 \times 25 + 2.552 + 1.718 + 0.925 = 1.4670$
$p = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-1.4670}} = 0.8126$
Using the model to predict survival
• What is the probability that a 25 year old woman accompanied only by her husband holding a second class ticket would survive Titanic?
$z = -3.929 - 0.589 \times (-5)/14.41 + 1.718 + 2.552 + 0.926 = 1.4714$
$p = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-1.4714}} = 0.8133$
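The prediction can be checked with a few lines of Python. The coefficients are copied from the slides as printed; which covariate each of the three unlabelled terms belongs to is not stated in the transcript, so they are kept unlabelled here. Reading (-5)/14.41 as a standardised age, (25 - 30) divided by a standard deviation of 14.41, is an assumption.

```python
import math


def logistic(z):
    """Probability corresponding to the log-odds z."""
    return 1 / (1 + math.exp(-z))


age = 25

# Model with age in years: intercept, age term, three further terms as printed.
z_raw = -2.703 - 0.041 * age + 2.552 + 1.718 + 0.925

# Model with (assumed) standardised age: (25 - 30) / 14.41.
z_std = -3.929 - 0.589 * (-5) / 14.41 + 1.718 + 2.552 + 0.926

for label, z in [("age in years", z_raw), ("standardised age", z_std)]:
    print(f"{label}: z = {z:.4f}, p = {logistic(z):.4f}")
```

Both parameterisations give a survival probability of about 0.81 for this passenger.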
Is it realistic that Leonardo survives and the chick dies?