Survival Analysis


Transcript of Survival Analysis

Page 1: Survival Analysis

Survival Analysis

Page 2: Survival Analysis

Survival Analysis

• Statistical methods for analyzing longitudinal data on the occurrence of events.

• Events may include death, injury, onset of illness, recovery from illness (binary or dichotomous variables) or transition above or below the clinical threshold of a meaningful continuous variable (e.g. CD4 counts).

• Accommodates data from randomized clinical trials or cohort studies.

Page 3: Survival Analysis

Randomized Clinical Trial (RCT)

[Diagram: from the target population, a disease-free, at-risk cohort is randomly assigned to Intervention or Control; each arm is followed over TIME to Disease or Disease-free.]

Page 4: Survival Analysis

Randomized Clinical Trial (RCT)

[Diagram: from the target population, a patient population is randomly assigned to Treatment or Control; each arm is followed over TIME to Cured or Not cured.]

Page 5: Survival Analysis

Randomized Clinical Trial (RCT)

[Diagram: from the target population, a patient population is randomly assigned to Treatment or Control; each arm is followed over TIME to Dead or Alive.]

Page 6: Survival Analysis

Survival analysis

• Primary focus is ‘time-to-event’ or “survival” time, e.g. time until death, time until recurrence, time until remission, time until CD4 count declines or drops below a certain level, etc.

• Events may include all kinds of “positive” or “negative” events, e.g. time until a tumor shrinks by 20%, time until death, time until an alcoholic relapses and begins drinking again, etc.

• Can use single and combined endpoints, e.g. time until death and time until CD4 count declines are single endpoints, while time until either CD4 count declines or death occurs is a combined endpoint.

• Problem: the event of interest may never be observed!

Page 7: Survival Analysis

Censoring

• Most survival analyses must deal with a key problem called censoring.

• Censoring occurs when the event of interest is not observed, for whatever reason, so we do not know the exact “survival” time.

There are generally three reasons why censoring occurs:

1) a person does not experience the event before the study ends.

2) a person is lost to follow-up during the study period.

3) a person “withdraws” from the study for whatever reason.

Page 8: Survival Analysis

The incidence rate of death for renal replacement therapy (RRT) patients

Survival times of eight patients at risk of death on RRT. The inclusion period was 1996-2000, whereas follow-up ended on 31 December 2005.

[Figure: follow-up lines for patients 1-8, drawn in calendar time (01-01-1996 to 31-12-2005, with inclusion until 31-12-2000) and again aligned from start to end of follow-up. Each line runs from the start of RRT either to death (status: event) or to censoring at recovery of renal function, loss to follow-up, or the end of follow-up on 31-12-2005 (status: censored); the aligned version also marks a death due to a competing cause.]

Example – Survival time on RRT: events and censored observations

Incident RRT patients in the ERA-EDTA Registry were included in an analysis of patient survival on RRT. As in most survival studies, patients were recruited over a period of time (1996-2000, the inclusion period) and were observed up to a specific date (31 December 2005, the end of the follow-up period). During this period the event of interest was ‘death while on RRT’, whereas censoring took place at recovery of renal function, at loss to follow-up, and on 31 December 2005.
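The registry example turns calendar dates into a follow-up time and a status (event or censored). A minimal Python sketch of that bookkeeping follows; the function name, its arguments and the patient dates are hypothetical, and it assumes that recovery of renal function and loss to follow-up are recorded simply as an exit date without death.

```python
from datetime import date

END_OF_FOLLOW_UP = date(2005, 12, 31)  # end of follow-up in the registry example


def time_and_status(start_rrt, exit_date=None, died=False):
    """Return (days on RRT, status): status 1 = death on RRT (event), 0 = censored.

    exit_date is the (hypothetical) date of death, recovery of renal function or
    loss to follow-up; None means the patient was still on RRT when follow-up ended.
    """
    if exit_date is None or exit_date > END_OF_FOLLOW_UP:
        # Not exited by 31-12-2005: censored at the end of follow-up
        return (END_OF_FOLLOW_UP - start_rrt).days, 0
    # Death is the event; recovery of renal function or loss to follow-up is censoring
    return (exit_date - start_rrt).days, int(died)


# Hypothetical patients (dates invented for illustration only)
patients = [
    (date(1996, 3, 1), date(1999, 6, 15), True),    # death on RRT -> event
    (date(1997, 5, 10), date(1998, 1, 20), False),  # recovery of renal function -> censored
    (date(2000, 9, 1), None, False),                # still on RRT on 31-12-2005 -> censored
]
for start, exit_, died in patients:
    print(time_and_status(start, exit_, died))
```

Deaths occurring after 31 December 2005 are treated as censored at that date, because they fall outside the observation period.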

Page 9: Survival Analysis

Assumptions related to censoring

• At any time, patients who are censored have the same survival prospects as those who continue to be followed.
– This is sometimes problematic: e.g. in the calculation of survival on dialysis, censoring at the time of transplantation is needed because these patients are no longer at risk of death on dialysis; however, dialysis patients on the transplant waiting list do not have the same prospects as dialysis patients who are not on the waiting list.

• Survival probabilities are assumed to be the same for subjects recruited early and late in the study.
– This may be tested by splitting a cohort of patients into those recruited early and those recruited late and checking whether their survival curves differ.

Page 10: Survival Analysis

Kaplan Meier method

• Observed survival times are first sorted in ascending order, starting with the patient with the shortest survival time, and presented in a table.

Example 2 - Survival probability in RRT patients with ESRD due to diabetes mellitus and other causes

In a sample of 50 RRT patients taken from a study on diabetes mellitus, survival time started running at the moment a patient was included in the study, in this case at the start of RRT. Patients were followed until death or censoring. The survival probability was calculated using the Kaplan-Meier method. Subsequently, the survival of patients with ESRD due to diabetes mellitus was compared with the survival of those with ESRD due to other causes.

• Used to estimate survival probabilities and to compare survival of different groups.

Page 11: Survival Analysis

Kaplan Meier method

• At the start of the study all 50 patients were alive: the proportion surviving and the cumulative survival were both 1.00.

• When the first patient died on day 34 after the start of RRT, the proportion surviving was 49/50 = 0.9800 = 98%. To calculate the cumulative survival, this proportion surviving was multiplied by the cumulative survival of 1.00 from the previous step, so the cumulative survival dropped to 0.9800.

• When the second patient died on day 35, the proportion surviving was 48/49 = 0.9796. To obtain the cumulative survival at day 35, this proportion was again multiplied by the cumulative survival of 0.9800 from the previous step, so the cumulative survival dropped that day to 0.9600.

• When a third patient died on day 44, the proportion surviving was 47/48 = 0.9792, and the cumulative survival dropped to 0.9600 × 0.9792 = 0.9400.

• On day 57, however, a patient was withdrawn alive from the study (censored). The proportion surviving that day was 47/47 = 1.00, because this patient did not die but was withdrawn alive. As a result the cumulative survival did not drop that day but remained unchanged at 0.9400.

Time in days | Number at risk | Deaths | Withdrawn alive (censored) | Proportion surviving on this day | Cumulative survival† | Cumulative mortality
0 | 50 | 0 | 0 | 1.00 | 1.00 | 0
34 | 50 | 1 | 0 | 49/50 = 0.9800 | 0.9800 | 0.0200
35 | 49 | 1 | 0 | 48/49 = 0.9796 | 0.9600 | 0.0400
44 | 48 | 1 | 0 | 47/48 = 0.9792 | 0.9400 | 0.0600
57 | 47 | 0 | 1 | 47/47 = 1 | 0.9400 | 0.0600
… | … | … | … | … | … | …
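The product-limit calculation in the table can be sketched in a few lines of Python. This is an illustrative implementation, not the software used for the example: the function name is hypothetical, and only the first four observations (deaths on days 34, 35 and 44, one withdrawal on day 57) come from the table; the remaining 46 patients are given a placeholder censoring time of 2000 days, which does not affect the first rows.

```python
def kaplan_meier(times, events):
    """Return rows (time, number at risk, deaths, censored, cumulative survival).

    events[i] is 1 for a death and 0 for a patient withdrawn alive (censored).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    rows = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # collect all observations at time t
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        surv *= (at_risk - deaths) / at_risk      # multiply in the proportion surviving this day
        rows.append((t, at_risk, deaths, censored, surv))
        at_risk -= deaths + censored              # deaths and withdrawals leave the risk set
    return rows


# Deaths on days 34, 35, 44 and the day-57 withdrawal from the table;
# the other 46 patients get a placeholder censoring time (hypothetical)
times = [34, 35, 44, 57] + [2000] * 46
events = [1, 1, 1, 0] + [0] * 46
rows = kaplan_meier(times, events)
for row in rows[:4]:
    print(row[0], row[1], row[2], row[3], round(row[4], 4))
```

Censored patients are counted in the risk set on the day they are withdrawn and removed afterwards, which is why a censoring leaves the cumulative survival unchanged but lowers the denominator for later deaths.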

Page 12: Survival Analysis

Kaplan Meier method

• Cumulative survival is the probability of surviving the current period multiplied by the probability of having survived all previous periods.

• All subjects at risk, including those who do not experience the event during the observation period, can contribute survival time to the denominator of the incidence rate.

• Censoring reduces the number of persons at risk without, at that moment, changing the cumulative survival.

[Figure: Kaplan-Meier survival function, cumulative survival (0.0 to 1.0) against time (0 to 3500 days), with censored observations marked.]

Page 13: Survival Analysis

Kaplan Meier method

• The median survival is the point in time, measured from inclusion, at which the cumulative survival drops below 50%; in this case it is 1708 days.

• It is not directly related to the number of deaths or the number of subjects still at risk.

• Why mean survival is used less frequently:
– Survival data are usually highly skewed.
– With censoring, one does not know if and when the person will experience the event, which complicates the calculation of the mean.
– To calculate a mean survival, one would need to wait until all persons had experienced the event.

Time in days | Number at risk | Deaths | Withdrawn alive (censored) | Proportion surviving on this day | Cumulative survival† | Cumulative mortality
0 | 50 | 0 | 0 | 1.00 | 1.00 | 0
34 | 50 | 1 | 0 | 49/50 = 0.9800 | 0.9800 | 0.0200
35 | 49 | 1 | 0 | 48/49 = 0.9796 | 0.9600 | 0.0400
44 | 48 | 1 | 0 | 47/48 = 0.9792 | 0.9400 | 0.0600
57 | 47 | 0 | 1 | 47/47 = 1 | 0.9400 | 0.0600
… | … | … | … | … | … | …
1650 | 18 | 1 | 0 | 17/18 = 0.9444 | 0.5289 | 0.4711
1708 | 17 | 1 | 0 | 16/17 = 0.9412 | 0.4978 | 0.5022
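Reading the median off the Kaplan-Meier curve amounts to finding the first time at which the cumulative survival falls below 0.5. A small sketch, assuming the curve is given as parallel lists of event times and cumulative survival values (the function name is hypothetical); definitions differ slightly between software packages when the curve sits exactly at 0.5.

```python
def median_survival(times, surv, threshold=0.5):
    """First time at which the cumulative survival drops below `threshold`.

    Returns None if the curve never drops below it (e.g. heavy censoring),
    in which case the median survival cannot be estimated.
    """
    for t, s in zip(times, surv):
        if s < threshold:
            return t
    return None


# Last two rows of the table: at day 1708 the survival is multiplied by 16/17
s_1708 = 0.5289 * 16 / 17
print(round(s_1708, 4), median_survival([1650, 1708], [0.5289, s_1708]))
```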

Page 14: Survival Analysis

Log-rank Test

• Most popular method of comparing the survival of groups.

• Takes the whole follow-up period into account.

• Tests the null hypothesis that there is no difference between the populations being studied in the probability of an event at any time point.

[Figure: Kaplan-Meier curves of cumulative survival against time (0 to 3500 days) for ESRD due to diabetes and ESRD due to other causes, with censored observations marked for each group.]

P = 0.04

Page 15: Survival Analysis

Example: Remission time of acute leukemia

• Purpose: evaluate the drug’s ability to maintain remissions
• Patients randomly assigned
• Study terminated after 1 year
• Different follow-up times due to sequential enrollment

Remission times in weeks (+ denotes a censored time):

6-MP: 6, 6, 6, 7, 10, 13, 16, 22, 23, 6+, 9+, 10+, 11+, 17+, 19+, 20+, 25+, 32+, 32+, 34+, 35+

Placebo: 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23

Page 16: Survival Analysis

Gehan 6-MP Example

[Figure: survival estimate against time (weeks, 0 to 30) for 6-MP (n=21) and Placebo (n=21). Log-Rank: 10.47; Chi-square: 17.68; df = 1; p < 0.001]
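A log-rank test compares, at each distinct event time, the observed number of events in one group with the number expected if both groups shared the same hazard, and sums these over the whole follow-up. The sketch below implements the usual two-group version with the hypergeometric variance in plain Python (function name hypothetical). It uses the full 21-patient 6-MP arm of the classic Freireich/Gehan remission data (event = 1 for relapse, 0 for censored); it is not the JMP computation, and figures displayed by different programs need not match it exactly.

```python
import math


def log_rank(times1, events1, times2, events2):
    """Two-group log-rank test: returns (chi-square with 1 df, p-value, O1, E1)."""
    data = [(t, e, 1) for t, e in zip(times1, events1)] + \
           [(t, e, 2) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e})
    o1 = e1 = var = 0.0
    for t in event_times:
        n1 = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk just before t
        n2 = sum(1 for ti, _, g in data if ti >= t and g == 2)
        d1 = sum(1 for ti, ei, g in data if ti == t and ei and g == 1)
        d = sum(1 for ti, ei, _ in data if ti == t and ei)
        n = n1 + n2
        o1 += d1
        e1 += d * n1 / n                                      # expected events in group 1
        if n > 1:
            var += n1 * n2 * d * (n - d) / (n * n * (n - 1))  # hypergeometric variance
    chi2 = (o1 - e1) ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    return chi2, p, o1, e1


# Remission times in weeks; event 1 = relapse, 0 = censored ("+")
six_mp = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35]
six_mp_event = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
placebo = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23]
placebo_event = [1] * len(placebo)

chi2, p, o1, e1 = log_rank(six_mp, six_mp_event, placebo, placebo_event)
print(f"O1 = {o1:.0f}, E1 = {e1:.2f}, chi-square = {chi2:.2f}, p = {p:.1e}")
```

The 6-MP group has far fewer relapses than expected under equal hazards, and the resulting p-value is below 0.001, in line with the conclusion on the slide.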

Page 17: Survival Analysis

Example: Remission time of acute leukemia

6-MP (Group = 1): 6, 6, 6, 6+, 7, 9+, 10, 10+, 11+, 13, 16, 17+, 19+, 20+, 22, 23, 25+, 32+, 32+, 34+, 35+

Placebo (Group = 2): 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23

In JMP, a censor code of 1 denotes a censored time and 0 a non-censored (observed) time.

E.g. for Group 1, the first 8 observations: 6, 6, 6, 6+, 7, 9+, 10, 10+
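Converting the "+" notation into a time column and a censor column is mechanical. A minimal sketch (the function name is hypothetical) that follows the stated JMP convention, censor code 1 for a censored time and 0 for an observed one:

```python
def to_jmp_rows(values):
    """Convert entries like '6' or '6+' into (time, censor) pairs.

    censor = 1 for a censored time (written with '+'), 0 for an observed event time.
    """
    rows = []
    for v in values:
        v = v.strip()
        if v.endswith("+"):
            rows.append((int(v[:-1]), 1))  # censored
        else:
            rows.append((int(v), 0))       # event observed
    return rows


group1_first8 = "6, 6, 6, 6+, 7, 9+, 10, 10+".split(",")
print(to_jmp_rows(group1_first8))
```

Many other packages expect the opposite indicator (an event flag with 1 = event), so it is worth checking which convention a given procedure uses before entering the data.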

Page 18: Survival Analysis

Example: Remission time of acute leukemia

[Figure: Kaplan-Meier curves for Group 1 (6-MP) and Group 2 (Placebo).]

We can clearly see that the remission (“survival”) time is longer for the treatment (6-MP) group than for the control group. The log-rank and Wilcoxon tests comparing the “survival” experience of the two groups suggest that a statistically significant difference exists (p < .0001).

Page 19: Survival Analysis

Retrospective cohort study: from the December 2003 BMJ, “Aspirin, ibuprofen, and mortality after myocardial infarction: retrospective cohort study”

Page 20: Survival Analysis

What the Kaplan Meier method and the log-rank test can and cannot do…

• Together the Kaplan-Meier method and the log-rank test make it possible to:
– Estimate survival probabilities, and
– Compare survival between groups

• However:
– One cannot adjust for confounding variables, i.e. no multivariable analysis
– They do not provide an estimate of the effect size and the related confidence interval

→ In those cases one needs a regression technique like the Cox proportional hazards model (Cox PH Model)

Page 21: Survival Analysis

Cox Proportional Hazard Model

• Before we can talk about the Cox PH model we need to consider some characteristics and terminology associated with survival time distributions.

• Here survival times might be time until death, but these times can also represent other outcomes such as time until remission, time until relapse, etc.

Page 22: Survival Analysis

Introduction to survival distributions

• T_i, the event time for individual i, is a random variable having a probability distribution.

• Different models for survival data are distinguished by different choices for the distribution of Ti.

Page 23: Survival Analysis

Describing Survival Distributions

The idea is this: assume that times-to-event for individuals in your dataset follow a continuous probability distribution (typically right-skewed, generally not normal!). For every possible time t after baseline, there is a certain probability that an individual will have an event at (more precisely, in a small interval around) time t. For example, human beings have a certain probability of dying at ages 3, 25, 80, and 140:

P(T=3), P(T=25), P(T=80), and P(T=140).

These probabilities are obviously vastly different.

Page 24: Survival Analysis

Probability density function: f(t)

In the case of human longevity, T_i is unlikely to follow a normal distribution, because the probability of death is highest not in middle age but at the beginning and end of life. Hypothetical data:

People have a high chance of dying in their 70s and 80s, but a smaller chance of dying in their 90s and 100s, because few people live long enough to die at these ages.

Page 25: Survival Analysis

Probability density function: f(t)

Shows how failure times are distributed. If there were no censoring, a histogram of the survival times of, say, ESRD patients would give us an impression of what the probability density function, f(t), looks like.

The smoothed curve added to the histogram is a visualization of f(t) based upon a sample of patients with ESRD.

Page 26: Survival Analysis

Survival function: 1 - F(t)

The goal of survival analysis is to estimate and compare the survival experiences of different groups. Survival experience is described by the cumulative survival function:

$$S(t) = P(T > t) = 1 - P(T \le t) = 1 - F(t)$$

Example: if t = 100 years, S(100) = S(t=100) is the probability of surviving beyond 100 years.

F(t) is the CDF corresponding to the density f(t), and is “more interesting” than f(t).

Page 27: Survival Analysis


Cumulative Survival

Same hypothetical data, plotted as cumulative distribution rather than density:

[Figure: cumulative survival curve, shown alongside the earlier density f(t) for comparison.]

S(t) = P(T > t)

Page 28: Survival Analysis


Cumulative survival, S(t) = P(T > t)

S(80) = P(T > 80)

S(20) = P(T > 20)

Page 29: Survival Analysis


Hazard Function h(t): a new concept

[Figure: hazard plotted against age.]

Hazard rate is an instantaneous incidence rate. Think of it as your risk of dying at this very moment, given that you are still alive, like a speedometer on a car racing towards death.

Page 30: Survival Analysis

Hazard function h(t)

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$

In words: the probability, per unit of time, that if you survive to t, you will succumb to the event in the next instant.

Hazard from density and survival:

$$h(t) = \frac{f(t)}{S(t)}$$
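The relation h(t) = f(t)/S(t) can be checked numerically against the limit definition. The sketch below assumes a Weibull distribution for T (shape k = 1.5, scale 10), chosen only because its f, F and S have simple closed forms; it is not a model used elsewhere in these slides.

```python
import math

# Hypothetical Weibull model for T (shape k, scale lam), used only for illustration
k, lam = 1.5, 10.0


def S(t):
    """Survival function S(t) = P(T > t)."""
    return math.exp(-((t / lam) ** k))


def F(t):
    """Cumulative failure F(t) = P(T <= t) = 1 - S(t)."""
    return 1.0 - S(t)


def f(t):
    """Density f(t) = dF/dt for the Weibull model."""
    return (k / lam) * (t / lam) ** (k - 1) * S(t)


def h_limit(t, dt=1e-6):
    """Approximate P(t <= T < t + dt | T >= t) / dt for a small dt."""
    return (F(t + dt) - F(t)) / S(t) / dt


for t in (1.0, 5.0, 20.0):
    # limit definition vs. f(t)/S(t): the two values should agree closely
    print(t, round(h_limit(t), 5), round(f(t) / S(t), 5))
```

For a continuous T, P(T ≥ t) equals S(t), which is why the conditional probability is divided by S(t).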

Page 31: Survival Analysis

Hazard h(t) vs. Density f(t)

This is subtle, but the idea is:

• When you are born, you have a certain probability of dying at any age; that’s the probability density.
– Example: a woman born today has, say, a 1% chance of dying at 80 years.

• However, as you survive for a while, your probabilities keep changing (think: conditional probability).
– Example: a woman who is 79 today has, say, a 5% chance of dying at 80 years.

Page 32: Survival Analysis


A possible set of probability density, failure, survival, and hazard functions:

f(t) = density function

F(t) = cumulative failure = P(T ≤ t)

S(t) = cumulative survival

h(t) = hazard function

Page 33: Survival Analysis

Cox Proportional Hazards Model

• Model for the hazard function as a function of covariates/predictors/independent variables.

• The interpretation of the estimated coefficients in the model is similar to that of the coefficients in a logistic regression model:

• Logistic regression → odds ratios (OR)

• Cox PH model → hazard ratios (HR)

To understand the distinction between ORs and HRs we need to discuss the difference between incidence rates and proportions.

Page 34: Survival Analysis

Incidence Rate vs. Proportion

• Incidence (hazard) rate - number of new cases of disease per population at-risk per unit time (or mortality rate, if outcome is death).

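• Cumulative incidence - proportion of the population at risk that develops the disease (becomes a new case) in a given time period.

• Hazard or rate ratio (HR) is the ratio of incidence rates.

• Risk ratio (RR) is the ratio of proportions (cumulative incidences); the odds ratio (OR) is the ratio of the corresponding odds, p/(1 - p).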
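A small numerical illustration of the difference between rates and proportions, using entirely hypothetical counts and person-years for two groups (function name also hypothetical):

```python
def summarize(cases, n_at_start, person_years):
    """Return (incidence rate, cumulative incidence, odds) for one group."""
    rate = cases / person_years  # new cases per person-year at risk
    risk = cases / n_at_start    # proportion developing disease in the period
    odds = risk / (1 - risk)     # odds corresponding to that proportion
    return rate, risk, odds


# Hypothetical counts: exposed group A and unexposed group B, 1000 people each
rate_a, risk_a, odds_a = summarize(cases=50, n_at_start=1000, person_years=4800)
rate_b, risk_b, odds_b = summarize(cases=25, n_at_start=1000, person_years=4900)

print("rate ratio:", round(rate_a / rate_b, 3))  # ratio of incidence rates
print("risk ratio:", round(risk_a / risk_b, 3))  # ratio of proportions (RR)
print("odds ratio:", round(odds_a / odds_b, 3))  # ratio of odds (OR)
```

With these made-up numbers the three ratios are close (about 2.04, 2.00 and 2.05) because the outcome is fairly rare; they drift apart as the outcome becomes common.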

Page 35: Survival Analysis

Cox Proportional Hazards Model

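The Cox PH model for individuals with k covariate values $x_1, x_2, \ldots, x_k$ says the hazard function for these individuals is given by:

$$h(t \mid x_1, \ldots, x_k) = h_0(t)\, \exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k)$$

where $h_0(t)$ is the baseline hazard function, which is assumed to be the same for all individuals. The covariates then multiply the baseline hazard by the factor $\exp(\beta_1 x_1 + \cdots + \beta_k x_k)$ to give a covariate-specific hazard function.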
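As a sketch of how the coefficients can be estimated, the block below fits a Cox PH model with a single covariate (x = 1 for placebo, 0 for 6-MP) to the classic Freireich/Gehan remission data by Newton-Raphson on the partial likelihood, using Breslow's approximation for tied event times. The 0/1 coding, the tie method and the function names are assumptions for illustration; JMP's output in these slides may use a different coding and tie handling, so its coefficients need not match.

```python
import math

# Classic Freireich/Gehan remission data (weeks); event: 1 = relapse, 0 = censored
six_mp = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35]
six_mp_event = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
placebo = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23]

times = six_mp + placebo
events = six_mp_event + [1] * len(placebo)
x = [0] * len(six_mp) + [1] * len(placebo)  # covariate: 1 = placebo, 0 = 6-MP


def score_and_information(beta):
    """Score U(beta) and information I(beta) of the Breslow partial log-likelihood."""
    u = info = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        risk = [xi for ti, xi in zip(times, x) if ti >= t]  # risk set just before t
        died = [xi for ti, ei, xi in zip(times, events, x) if ti == t and ei]
        w = [math.exp(beta * xi) for xi in risk]            # exp(beta * x) weights
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, risk))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, risk))
        d = len(died)
        u += sum(died) - d * s1 / s0                # observed minus expected covariate sum
        info += d * (s2 / s0 - (s1 / s0) ** 2)      # d times the weighted risk-set variance
    return u, info


def fit_cox(tol=1e-10, max_iter=50):
    """Newton-Raphson for beta; returns (beta, standard error)."""
    beta = 0.0
    for _ in range(max_iter):
        u, info = score_and_information(beta)
        step = u / info
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_information(beta)
    return beta, 1.0 / math.sqrt(info)


beta, se = fit_cox()
print(f"coef = {beta:.3f}, SE = {se:.3f}, HR (placebo vs 6-MP) = {math.exp(beta):.2f}")
```

The baseline hazard $h_0(t)$ never has to be estimated: it cancels from every term of the partial likelihood, which is what makes the Cox model semi-parametric.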

Page 36: Survival Analysis

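Hazard Ratio (HR)

Consider population i, which consists of all individuals with covariate values $x_{i1}, \ldots, x_{ik}$, and population j, which consists of all individuals with covariate values $x_{j1}, \ldots, x_{jk}$. Then the hazard ratio comparing population i to population j is given by:

$$HR = \frac{h_0(t)\, \exp\left(\sum_{m=1}^{k} \beta_m x_{im}\right)}{h_0(t)\, \exp\left(\sum_{m=1}^{k} \beta_m x_{jm}\right)} = \exp\left(\sum_{m=1}^{k} \beta_m (x_{im} - x_{jm})\right)$$

The baseline hazard cancels, so the HR does not depend on t.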

Page 37: Survival Analysis

Hazard Ratio (HR) – for dichotomous covariates

Example 1: Suppose we are modeling the hazard function for developing lung cancer using smoking status ($x_1$, coded +1 for smokers and -1 for non-smokers) and age ($x_2$) as covariates. Find the HR for 60-year-old smokers ($x_1 = +1$, $x_2 = 60$) vs. 60-year-old non-smokers ($x_1 = -1$, $x_2 = 60$):

$$HR = \exp\left(\beta_1 (1 - (-1)) + \beta_2 (60 - 60)\right) = e^{2\beta_1}$$

Thus the HR associated with smoking for 60-year-old individuals is $e^{2\beta_1}$. Notice the similarity to the interpretation of coefficients in a logistic regression model.

Note: The particular age is irrelevant as long as it is the same for both populations being compared.

Page 38: Survival Analysis

Hazard Ratio (HR) – for continuous covariates

Example 2: Suppose we are modeling the hazard function for developing lung cancer using smoking status and age as covariates. Find the HR for 70-year-old smokers ($x_1 = +1$, $x_2 = 70$) vs. 60-year-old smokers ($x_1 = +1$, $x_2 = 60$):

$$HR = \exp\left(\beta_1 (1 - 1) + \beta_2 (70 - 60)\right) = e^{10\beta_2}$$

Thus the HR associated with a 10-year increase in age starting at age 60 is $e^{10\beta_2}$. Notice the similarity to the interpretation of coefficients for continuous variables in a logistic regression model. Note: This would be the same if we compared any two ages that are 10 years apart. Also, smoking status is irrelevant as long as it is the same for both populations being considered.
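The general HR formula and the two examples can be checked with a few lines of Python. The coefficient values are hypothetical, chosen only to make the arithmetic visible; smoking is coded +1/-1 as in the examples, and the function name is likewise hypothetical.

```python
import math


def hazard_ratio(betas, x_i, x_j):
    """HR comparing covariate pattern x_i with x_j in a Cox PH model.

    exp(sum_m beta_m * (x_im - x_jm)); the baseline hazard h0(t) cancels out.
    """
    return math.exp(sum(b * (a - c) for b, a, c in zip(betas, x_i, x_j)))


# Hypothetical coefficients: beta1 for smoking (+1 smoker, -1 non-smoker), beta2 per year of age
betas = [0.4, 0.05]

hr_smoking = hazard_ratio(betas, [+1, 60], [-1, 60])  # Example 1: exp(2 * beta1)
hr_age = hazard_ratio(betas, [+1, 70], [+1, 60])      # Example 2: exp(10 * beta2)

print(hr_smoking, math.exp(2 * betas[0]))
print(hr_age, math.exp(10 * betas[1]))
```

Changing the shared age in Example 1, or the shared smoking status in Example 2, leaves the HR unchanged, matching the notes above.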

Page 39: Survival Analysis

Example: Remission time for acute leukemia

Here we have two dichotomous covariates in a Cox PH model for remission time.

The hazard ratio (HR) for females is then 0.752, so females have a lower estimated risk of relapse than males. The HR for males is the reciprocal, 1/0.752 = 1.33, so males have 1.33 times the risk of relapse. These are only point estimates, however, so we also need to consider CIs.

Page 40: Survival Analysis

Example: Remission time for acute leukemia

Here we have two dichotomous covariates in a Cox PH model for remission time.

The hazard ratio (HR) for receiving the active treatment (6-MP) is 0.2005, and the HR for those receiving placebo is therefore 1/0.2005 = 4.988; thus those receiving placebo have about 5 times the risk of relapse. Again we should examine CIs for these HRs.

Page 41: Survival Analysis

Example: Remission time for acute leukemia

Here we have one continuous covariate in the Cox PH model for remission time.

Covariate: the log (base 2) of the white blood cell count, log2(WBC).

The estimated coefficient for log2(WBC) is 1.59. A unit increase in log2(WBC) corresponds to doubling the WBC, so if we compare two populations of patients, one with double the WBC of the other, the estimated HR is $e^{1.59} \approx 4.9$ (the 4.92 in the output presumably reflects the unrounded coefficient). So the population with double the WBC has about 4.92 times the risk of relapse.
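Because the covariate is log2(WBC), any fold change in WBC translates into a difference in log2(WBC), and the HR follows directly. A minimal sketch using the quoted coefficient of 1.59 (the function name is hypothetical):

```python
import math

BETA_LOG2_WBC = 1.59  # estimated coefficient for log2(WBC) quoted above


def hr_for_fold_change(fold, beta=BETA_LOG2_WBC):
    """HR comparing WBC = fold * w with WBC = w in a model that uses log2(WBC).

    log2(fold * w) - log2(w) = log2(fold), so HR = exp(beta * log2(fold)).
    """
    return math.exp(beta * math.log2(fold))


print(round(hr_for_fold_change(2), 2))  # doubling the WBC
print(round(hr_for_fold_change(4), 2))  # quadrupling = two doublings
```

A fourfold difference in WBC gives the square of the doubling HR, since it corresponds to two unit steps in log2(WBC).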

Next we consider a Cox PH model using treatment, sex, and log2(WBC) as covariates.

Page 42: Survival Analysis

Example: Remission time for acute leukemia

Next we consider a Cox PH model using treatment, sex, and log2(WBC) as covariates. We can see that both treatment and log2(WBC) are statistically significant, while the sex of the patient is not. JMP can be used to calculate the risk ratios or hazard ratios (HR).

The estimated HR associated with not receiving the 6-MP therapy is 4.02 with a CI (1.698, 10.307) and the estimated HR associated with doubling the WBC is 4.92 with a CI (2.65, 9.73).

Page 43: Survival Analysis

Example: Remission time for acute leukemia


The estimated HR for males vs. females is 1.30; however, the CI includes 1, so we cannot conclude that males have an increased risk of relapse. This is further supported by the p-value of .5596.

Page 44: Survival Analysis

Summary of Survival Analysis

Survival analysis involves making inferences about the time until an event occurs. Because of the prospective nature of these studies, there are frequently censored time observations.

The Kaplan-Meier method allows us to describe, both visually and numerically, the survival experience of subjects in our study.

The log-rank test allows us to compare the survival experience of subjects across treatment groups.

The Cox Proportional Hazards Model allows us to examine the relationship between the survival experience of subjects and covariates that might be related to their survival; or to look at group/treatment differences adjusted for other covariates.