01/2014 1
EPI 5344: Survival Analysis in Epidemiology
Introduction to concepts and basic methods
February 25, 2014
Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa
Survival concepts (1)
• Cohort studies
  – Follow up a pre-defined group of people for a period of time, which can be:
    • The same time for everyone
    • Different times for different people
  – Determine which people achieve a specified outcome.
  – Outcomes could be many different things, such as:
    • Death
      – Any cause or cause-specific
    • Onset of new disease
    • Resumption of smoking in someone who had quit
    • Recidivism for drug use or criminal activity
    • Change in numerical measure such as blood pressure
      – Longitudinal data analysis
Survival concepts (2)
• Cohort studies
  – Traditional approach to cohorts assumes everyone is followed for the same time
    • incidence proportion
    • logistic regression modeling
  – If follow-up time varies, what do you do with subjects who don’t make it to the end of the study?
    • Censoring
  – Cohort studies can provide more information than presence/absence of outcome.
    • Time when outcome occurred
    • Type of outcome (competing outcomes)
  – Can look at rate or speed of development of outcome
    • incidence rate
    • person-time
Survival concepts (3)
• Time to event analysis
  – Survival Analysis (general term)
  – Life tables
  – Kaplan-Meier curves
  – Actuarial methods
  – Log-rank test
  – Cox modeling (proportional hazards)
• Strong link to engineering
  – Failure time studies
Survival concepts (4)
• Analysis of cohort studies (from epidemiology)
  – Incidence proportion (cumulative incidence)
    • Select a point in time as the end of follow-up.
    • Compare groups using t-test, CIR (RR)
    • Issues include:
      – What point in time to use?
      – What if not all subjects remain under follow-up that long?
      – Ignores information from subjects who don’t get the outcome or reach the time point
      – What is the incidence proportion for the outcome ‘death’ if we set the follow-up time to 200 years?
        » Will always be 100%
Survival concepts (5)
• Analysis of cohort studies (from epidemiology)
  – Incidence rate (density)
    • Based on person-time of follow-up
    • Can include information on drop-outs, etc.
    • Closely linked to survival analysis methods
Survival concepts (6)
• Cumulative incidence
  – The probability of becoming ill over a pre-defined period of time.
  – No units
  – Range 0-1
• Incidence density (rate)
  – The rate at which people get ill during person-time of follow-up
    • Units: 1/time or cases/person-time
    • Range 0 to +∞
  – Very closely related to hazard rate.
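The difference between the two measures can be sketched in a few lines of Python. This is only an illustration: the follow-up records below are hypothetical and the function names are not from the course.

```python
# Hypothetical follow-up records: (years of follow-up, 1 = became ill, 0 = did not)
cohort = [(2.0, 1), (5.0, 0), (3.5, 1), (5.0, 0), (1.0, 0), (4.0, 1)]

def cumulative_incidence(records):
    """Proportion of the cohort who became ill (no units, range 0 to 1)."""
    cases = sum(event for _, event in records)
    return cases / len(records)

def incidence_rate(records):
    """Cases per unit of person-time (units 1/time, range 0 to +infinity)."""
    cases = sum(event for _, event in records)
    person_time = sum(time for time, _ in records)
    return cases / person_time

ci = cumulative_incidence(cohort)
rate = incidence_rate(cohort)
print(f"Cumulative incidence: {ci:.3f}")
print(f"Incidence rate: {rate:.3f} cases per person-year")
```

Note that the subject followed for only 1 year without illness counts as a full non-case in the proportion, but contributes only 1 person-year to the rate.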
Measuring Time (1)
• Need to consider:
  – Units to use to measure time
    • Normally, years/months/days
    • Time of events is usually measured as ‘calendar time’
    • Other measures are possible (e.g. hours)
  – ‘Scale’ to be used
    • time on study
    • age
    • calendar date
  – Time ‘0’ (‘origin of time’)
    • The point when time starts
Time Scale (1)
• Time of events is usually measured as ‘calendar time’
• Can be represented by ‘time lines’ in a graph
• Conceptual idea used in analyses
Patient #1 enters on Feb 15, 2000 and dies on Nov 8, 2000
Patient #2 enters on July 2, 2000 and is lost (censored) on April 23, 2001
Patient #3 enters on June 5, 2001 and is still alive (censored) at the end of the follow-up period
Patient #4 enters on July 13, 2001 and dies on December 12, 2002
[Figure: time lines for the four patients on the calendar-time scale; D = death, C = censored]
Time Scale (2)
• In survival analysis, focus is commonly on ‘study time’
  – How long after a patient starts follow-up do their events occur?
  – Particularly common choice for RCTs
  – Need to define a ‘time 0’, the point when study time starts accumulating for each patient.
• Most epidemiologists recommend using ‘age’ as the time scale for etiological studies
  – We’ll focus on time since a defining event, but remember this for the future.
Origin of Time (1)
• Choice of time ‘0’ affects analysis
  – can produce very different regression coefficients and model fit
• Preferred origin is often unavailable
• More than one origin may make sense
  – no clear criterion to choose which to use
Time ‘0’ (2)
• No best time ‘0’ for all situations
  – Depends on study objectives and design
• RCT of Rx
  – ‘0’ = date of randomization
• Prognostic study
  – ‘0’ = date of disease onset
  – Inception cohort
  – Often use: date of disease diagnosis
Time ‘0’ (3)
• ‘Point source’ exposure
  – Date of event
    • Hiroshima atomic bomb
    • Dioxin spill, Seveso, Italy
Time ‘0’ (4)
• Chronic exposure
  – Date of study entry
  – Date of first exposure
  – Age (preferred origin/time scale)
• Issues
  – There often is no first exposure (or no clear date of 1st exposure)
  – Recruitment long after 1st exposure
    • Immortal person time
    • Lack of info on early events
  – ‘Attained age’ as time scale
Time ‘0’ (5)
• Calendar time can be very important
  – studies of incidence/mortality trends
• In survival analysis, focus is on ‘study time’
  – When after a patient starts follow-up do their events occur?
• Need to change time lines to reflect new time scale
[Figure: time lines for the same four patients as in Time Scale (1), redrawn on the study-time scale; D = death, C = censored]
[Figure: Study course for patients in cohort; calendar axis marked 2001, 2003, 2013]
Time ‘0’ (5)
• Can be interested in more than one ‘event’ and thus more than one ‘time to event’
• An example
  – Patients treated for malignant melanoma
  – Treated with ‘A’ or ‘B’
  – Expected to influence both time to relapse and survival
Time ‘0’ (6)
• Some studies have more than one outcome event
• Let’s use this to illustrate SAS code to compute time-to-event.
• Four time points:
  – Date of surgery: Time ‘0’
  – Relapse
  – Death
  – Last follow-up (if still alive without relapse)
• Event #1: earliest of relapse/death/end
• Event #2: earliest of death/end
Time ‘0’
• How do we compute the ‘time on study’ for each of these events?
• Convert to days (weeks, months, years) from time ‘0’ for each person
• SAS reads date data using ‘date format’
  – stored as # days since Jan 1, 1960.
SAS code to create event variables
data melanoma;
  set melanoma;

  /* dfsevent = 1 if the patient relapsed or died, 0 otherwise */
  dfsevent = 1 - (date_of_relapse = .)*(date_of_death = .);

  /* survevent = 1 if the patient died, 0 if alive at last follow-up */
  survevent = (date_of_death ne .);

  /* Survival time in months (days divided by 30.4) */
  if (survevent = 0) then survtime = (date_of_last - date_of_surg)/30.4;
  else survtime = (date_of_death - date_of_surg)/30.4;

  /* Disease-free survival time in months: earliest of relapse, death, last follow-up */
  if (dfsevent = 0) then dfstime = (date_of_last - date_of_surg)/30.4;
  else if (date_of_relapse ne .) then dfstime = (date_of_relapse - date_of_surg)/30.4;
  else if (date_of_relapse = . and date_of_death ne .) then dfstime = (date_of_death - date_of_surg)/30.4;
  else dfstime = .E;
run;
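For readers without SAS, the same logic can be sketched in Python. This is a rough translation under stated assumptions, not the course's code: `None` stands in for a SAS missing date, the function names are invented for illustration, and the patient dates are hypothetical.

```python
from datetime import date

DAYS_PER_MONTH = 30.4  # same days-to-months conversion as the SAS code

def months_between(start, end):
    """Elapsed time from start to end, in months."""
    return (end - start).days / DAYS_PER_MONTH

def event_times(surgery, last_follow_up, relapse=None, death=None):
    """Return (dfsevent, dfstime, survevent, survtime) for one patient.

    None plays the role of a SAS missing date.
    """
    # Overall survival: event = death, otherwise censored at last follow-up
    survevent = 1 if death is not None else 0
    survtime = months_between(surgery, death if survevent else last_follow_up)

    # Disease-free survival: event = relapse or death, whichever is recorded first
    dfsevent = 1 if (relapse is not None or death is not None) else 0
    if dfsevent == 0:
        dfs_end = last_follow_up
    elif relapse is not None:
        dfs_end = relapse
    else:
        dfs_end = death
    dfstime = months_between(surgery, dfs_end)
    return dfsevent, dfstime, survevent, survtime

# Hypothetical patient who relapsed and later died
relapsed = event_times(date(2001, 1, 1), date(2003, 6, 30),
                       relapse=date(2002, 1, 1), death=date(2003, 1, 1))
# Hypothetical patient still alive and relapse-free at last follow-up
censored = event_times(date(2001, 1, 1), date(2002, 1, 1))
print(relapsed)
print(censored)
```

As in the SAS version, time ‘0’ is the date of surgery, and a patient with no relapse and no death is censored at the last follow-up date for both outcomes.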
Survival curve (1)
• What can we do with data which includes time-to-event?
• Might be nice to see a picture of the number of people surviving from the start to the end of follow-up.
Sample Data: Mortality, no losses
Year    # still alive    # dying in the year
2000    10,000           2,000
2001     8,000           1,600
2002     6,400           1,280
2003     5,120           1,024
2004     4,096             820
[Figure: number of people still alive, by year. Slide annotation: not the right axis for a survival curve]
Survival curve (2)
• Previous graph has a problem
  – What if some people were lost to follow-up?
  – Plotting the number of people still alive would effectively say that the lost people had all died.
Sample Data: Mortality, with losses to follow-up
Year    # still alive    # dying in the year    Lost to follow-up
2000    10,000           2,000                  1,000
2001     7,000           1,400                    800
2002     4,800             960                    500
2003     3,340             670                    400
2004     2,270             460                    260
Each year’s starting count is the previous year’s count minus that year’s deaths and losses (e.g. 10,000 - 2,000 - 1,000 = 7,000 for 2001).
Survival curve (2)
• Instead
  – True survival curve plots the probability of surviving.
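A minimal Python sketch of that idea, using the sample table with losses to follow-up. One simplifying assumption is made here that the slides do not state: everyone alive at the start of a year is treated as at risk for the whole year, as if losses happened at year end. An actuarial life table would usually adjust the at-risk count for losses during the interval.

```python
# (year, alive at start of year, deaths in year, lost in year), from the table with losses
rows = [
    (2000, 10000, 2000, 1000),
    (2001, 7000, 1400, 800),
    (2002, 4800, 960, 500),
    (2003, 3340, 670, 400),
    (2004, 2270, 460, 260),
]

def survival_curve(rows):
    """Cumulative probability of surviving to the end of each year."""
    curve = []
    surviving = 1.0
    for year, at_risk, deaths, _lost in rows:
        surviving *= 1 - deaths / at_risk  # conditional survival for this year
        curve.append((year, surviving))
    return curve

curve = survival_curve(rows)
for year, prob in curve:
    print(year, round(prob, 3))

# Naive alternative: count of people still observed alive after 2004, as a fraction
last_year, last_at_risk, last_deaths, last_lost = rows[-1]
naive_fraction = (last_at_risk - last_deaths - last_lost) / rows[0][1]
print("naive fraction still observed alive:", naive_fraction)
```

The probability-based curve ends near 0.33, while the raw count of people still observed alive is only 0.155 of the original cohort, because the raw count treats everyone lost to follow-up as if they had died.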
Survival Curves (1)
• Primary outcome is ‘time to event’
• Also need to know ‘type of event’
Person    Type     Time
1         Death    100
2         Alive    200
3         Lost     150
4         Death     65
And so on
Survival Curves (2)
• Censored
  – People who do not have the targeted outcome (e.g. death)
• For now, assume no censoring
• How do we represent the ‘time’ data in a statistical method?
  – Histogram of death times - f(t)
  – Survival curve - S(t)
  – Hazard curve - h(t)
• To know one is to know them all
Histogram of death time
- Skewed to right
- pdf or f(t)
- CDF or F(t)
- Area under ‘pdf’ from ‘0’ to ‘t’
$F(t) = \int_0^t f(x)\,dx$
[Figure: histogram of death times, with F(t) shown as the area under the pdf from 0 to t]
Survival curves (3)
• Plot % of group still alive (or % dead)
S(t) = survival curve
     = % still surviving at time ‘t’
     = P(survive to time ‘t’)
Cumulative mortality = 1 - S(t)
     = F(t)
     = Cumulative incidence
[Figure: survival curve S(t) and cumulative incidence of death CI(t) = 1 - S(t), plotted against t]
‘Rate’ of dying
• Consider these 2 survival curves
• Which has the better survival profile?
  – Both have S(3) = 0
[Figure: survival curves for groups ‘A’ and ‘B’]
Survival curves (4)
• Most people would prefer to be in group ‘A’ than group ‘B’.
  – Death rate is lower in first two years.
  – Will live longer than in pop ‘B’
• Concept is called:
  – Hazard: Survival analysis/stats
  – Force of mortality: Demography
  – Incidence rate/density: Epidemiology
• DEFINITION
  – h(t) = rate of dying at time ‘t’ GIVEN that you have survived to time ‘t’
  – Similar to asking the speed of your car given that you are two hours into a five hour trip from Ottawa to Toronto
• Slight detour and then back to main theme
Survival Curves (5)
Conditional Probability
h(t0) = rate of failing at ‘t0’ conditional on surviving to t0
Requires the ‘conditional survival curve’:
$S^*(t) = S(t) / S(t_0)$ for $t \ge t_0$
Essentially, you are re-scaling S(t) so that S*(t0) = 1.0
[Figure: S(t) with S(t0) marked at t0, and the re-scaled curve starting at t0]
S*(t) = survival curve conditional on surviving to ‘t0’
CI*(t) = failure/death/cumulative incidence at ‘t’ conditional on surviving to ‘t0’
Hazard at t0 is defined as: ‘the slope of CI*(t) at t0’
Also known as:
• Hazard (instantaneous)
• Force of mortality
• Incidence rate
• Incidence density
Range: 0 to ∞
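The ‘slope of CI*(t) at t0’ definition can be checked numerically. The sketch below assumes CI*(t) = 1 - S(t)/S(t0), i.e. S(t) re-scaled to equal 1.0 at t0, and uses an exponential S(t) with a hypothetical rate of 0.2 purely as an example curve; the function names are illustrative.

```python
import math

def S(t, lam=0.2):
    """Illustrative survival curve (exponential, hypothetical rate lam)."""
    return math.exp(-lam * t)

def conditional_ci(t, t0):
    """CI*(t) = 1 - S(t)/S(t0): cumulative incidence given survival to t0."""
    return 1.0 - S(t) / S(t0)

def hazard(t0, dt=1e-6):
    """Approximate the slope of CI*(t) at t0 with a forward difference."""
    return (conditional_ci(t0 + dt, t0) - conditional_ci(t0, t0)) / dt

h_at_1 = hazard(1.0)
h_at_5 = hazard(5.0)
print(h_at_1, h_at_5)
```

For this example curve the slope comes out close to 0.2 at both time points, which is what a constant hazard should give.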
Some relationships
• If the rate of disease is small: CI(t) ≈ H(t), where H(t) is the cumulative hazard
• If we assume h(t) is constant (= ID): CI(t) ≈ ID*t
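These approximations follow from the standard links between the hazard, the survival curve and cumulative incidence. The derivation below is a sketch using the usual textbook definitions, with H(t) denoting the cumulative hazard.

```latex
\begin{align*}
h(t)  &= \frac{f(t)}{S(t)} = -\frac{d}{dt}\,\ln S(t) \\
H(t)  &= \int_0^t h(x)\,dx \\
S(t)  &= \exp\{-H(t)\} \\
CI(t) &= 1 - S(t) = 1 - \exp\{-H(t)\} \approx H(t)
         \quad \text{when } H(t) \text{ is small} \\
CI(t) &\approx ID \cdot t
         \quad \text{when } h(t) = ID \text{ is constant, so that } H(t) = ID \cdot t
\end{align*}
```

The approximation step uses 1 - exp(-x) ≈ x for small x, so it degrades as the cumulative hazard grows.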
Some survival functions (1)
• Exponential
  – h(t) = λ
  – S(t) = exp(-λt)
• Underlies most of the ‘standard’ epidemiological formulae.
• Assumes that the hazard is constant over time
  – Big assumption which is not usually true
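A short Python sketch of the exponential model, comparing the exact cumulative incidence 1 - S(t) with the ID*t shortcut. The hazard value of 0.02 per year is hypothetical and the function names are illustrative.

```python
import math

def exp_survival(t, lam):
    """S(t) = exp(-lam * t) for a constant hazard lam."""
    return math.exp(-lam * t)

def exp_cumulative_incidence(t, lam):
    """CI(t) = 1 - S(t)."""
    return 1.0 - exp_survival(t, lam)

lam = 0.02  # hypothetical constant hazard (incidence density) per year
for t in (1, 5, 10, 50):
    exact = exp_cumulative_incidence(t, lam)
    approx = lam * t
    print(f"t={t:>2}  CI(t)={exact:.4f}  ID*t={approx:.4f}")
```

The two columns agree closely while ID*t is small and drift apart as follow-up lengthens; ID*t can exceed 1, whereas the exact cumulative incidence cannot.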
Some survival functions (2)
• Weibull
  – $h(t) = \lambda \gamma t^{\gamma - 1}$
  – $S(t) = \exp(-\lambda t^{\gamma})$
• Allows fitting a broader range of hazard functions
• Assumes hazard is monotonic
  – Always increasing (or decreasing)
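The monotonic behaviour can be seen by evaluating the Weibull hazard for a few shape values. This is an illustrative sketch with hypothetical parameter values, using the parameterisation h(t) = λγt^(γ-1) and S(t) = exp(-λt^γ).

```python
import math

def weibull_hazard(t, lam, gamma):
    """h(t) = lam * gamma * t**(gamma - 1)."""
    return lam * gamma * t ** (gamma - 1)

def weibull_survival(t, lam, gamma):
    """S(t) = exp(-lam * t**gamma)."""
    return math.exp(-lam * t ** gamma)

times = [0.5, 1.0, 2.0, 4.0]
increasing = [weibull_hazard(t, 0.1, 1.5) for t in times]  # gamma > 1
decreasing = [weibull_hazard(t, 0.1, 0.5) for t in times]  # gamma < 1
constant = [weibull_hazard(t, 0.1, 1.0) for t in times]    # gamma = 1

print("gamma=1.5:", [round(h, 4) for h in increasing])
print("gamma=0.5:", [round(h, 4) for h in decreasing])
print("gamma=1.0:", [round(h, 4) for h in constant])
```

With γ = 1 the hazard is constant and the model reduces to the exponential; γ > 1 gives a steadily rising hazard and γ < 1 a steadily falling one.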
[Figure: Hazard curves (2)]
[Figure: Hazard curves (3)]
Some survival functions (3)
• All these functions assume that everyone eventually gets the outcome event. Suppose this isn’t true:
  – Cures occur
  – Immunity
• Mixture models
  – $S(t) = (1 - \pi)\exp(-\lambda t) + \pi$
  – $S(t) \to \pi$ as $t \to \infty$
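A minimal sketch of the mixture (cure) model in Python, with π read as the fraction who never get the event. The parameter values are hypothetical and the function name is illustrative.

```python
import math

def cure_survival(t, lam, pi):
    """S(t) = (1 - pi) * exp(-lam * t) + pi, with pi the cured/immune fraction."""
    return (1 - pi) * math.exp(-lam * t) + pi

lam, pi = 0.5, 0.3  # hypothetical hazard among the susceptible, and cured fraction
for t in (0, 1, 5, 20, 100):
    print(t, round(cure_survival(t, lam, pi), 4))
```

The curve starts at 1 and levels off at π instead of falling to 0, which is the plateau seen when a share of the group is cured or immune.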
Some survival functions (4)
• Piece-wise exponential
  – Divide follow-up into intervals
  – The hazard is constant within interval but can differ across intervals (e.g. ‘0’ for cure)
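The piece-wise exponential idea can be sketched by adding up hazard × time within each interval and exponentiating the total. The cut points and hazards below are hypothetical, chosen so that the last interval has hazard 0 to mimic cure.

```python
import math

def piecewise_survival(t, cutpoints, hazards):
    """S(t) when the hazard equals hazards[i] on the i-th interval.

    cutpoints holds the interval start times, beginning with 0;
    the last interval is open-ended.
    """
    cumulative_hazard = 0.0
    for i, start in enumerate(cutpoints):
        end = cutpoints[i + 1] if i + 1 < len(cutpoints) else math.inf
        if t <= start:
            break
        # Time actually spent in this interval, up to t
        cumulative_hazard += hazards[i] * (min(t, end) - start)
    return math.exp(-cumulative_hazard)

cutpoints = [0.0, 2.0, 5.0]
hazards = [0.30, 0.10, 0.0]  # hypothetical; zero hazard after year 5 represents cure

for t in (1, 2, 5, 10):
    print(t, round(piecewise_survival(t, cutpoints, hazards), 4))
```

Because the hazard is 0 after year 5 in this example, S(10) equals S(5): the curve flattens once the 'cure' interval begins.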
Some survival functions (5)
• Gompertz Model
  – Uses a functional form for S(t) which can approach a fixed, non-zero value as t increases
Censoring (1)
• So much for theory
• In real world, we run into practical issues:
  – May only know that subject was disease-free up to time ‘t’ but then you lost track of them
  – May only know subject got disease before time ‘t’
  – May only know subject got disease between two exam dates.
  – May know subject must have been outcome-free for the first ‘x’ years of follow-up (immortal person-time)
  – Can’t measure time to infinite precision
    • Often only know year of event
  – Exact time of event might not even exist in theory
Censoring (2)
• Three main kinds of censoring
  – Right censoring
    • The time of the event is known to be later than some time
    • Subject moves to Australia after three years of follow-up
      – We only know that they died some time after 3 years.
  – Left censoring
    • The time of the event is known to be before some time
      – Looking at age of menarche, starting with a group of 12 year old girls.
      – Some girls are already menstruating
  – Interval censoring
    • The event is known to have occurred between two known times
      – Annual HIV test
      – Negative on Jan 1, 2012
      – Positive on Jan 1, 2013
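One common way to hold all three kinds of censoring in a single data structure is to store, for each subject, a lower and an upper bound on the event time. The sketch below is an illustrative convention, not something prescribed by the slides; the class and method names are invented, and the time values are simplified versions of the three examples.

```python
import math
from dataclasses import dataclass

@dataclass
class Observation:
    lower: float  # event known to occur after this time
    upper: float  # event known to occur at or before this time

    def kind(self):
        """Classify the observation from its two bounds."""
        if self.lower == self.upper:
            return "exact"
        if math.isinf(self.upper):
            return "right-censored"
        if self.lower == 0:
            return "left-censored"
        return "interval-censored"

# Moved to Australia after 3 years: death occurs some time after year 3
emigrant = Observation(lower=3.0, upper=math.inf)
# Already menstruating at age 12: menarche occurred some time before age 12
menarche = Observation(lower=0.0, upper=12.0)
# HIV negative at the start of 2012, positive at the start of 2013 (calendar-year scale)
hiv = Observation(lower=2012.0, upper=2013.0)

for obs in (emigrant, menarche, hiv):
    print(obs, "->", obs.kind())
```

An exactly observed event is simply the case where both bounds coincide.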
Censoring (3)
• Right censoring is most commonly considered
  – Type 1 censoring
    • The censoring time is ‘fixed’ (under control of investigator)
  – Singly censored
    • Everyone has the same censoring time
    • Commonly due to the study ending on a specific date
  – Type 2 censoring
    • Terminate study after a fixed number of events has happened
      – most common in lab studies
  – Random censoring
    • Observation terminated for reason not under investigator’s control
    • Varying reasons for drop-out
    • Varying entry times
Censoring (4)
• Right censoring is most commonly considered
  – Event of interest is death but, at the end of their follow-up, subject is still alive.
    • Administrative censoring
    • Loss-to-follow-up
      – A patient moves away or is lost without having experienced event of interest
    • Drop-out
      – Patient dropped from study due to protocol violation, etc.
    • Competing risks
      – Death occurs due to a competing event
• We know something about these patients.
  – Discarding them would ‘waste’ information
[Figure: Study course for patients in cohort; calendar axis marked 2001, 2003, 2013]
Censoring (5)
• Standard analysis ignores method used to generate censoring.
• Type 1/2 methods are fine
• ‘Random’ censoring can be a problem.
  – Informative vs. uninformative censoring
• Standard analyses require ‘uninformative’ censoring
  – The development of the outcome in subjects who are censored must be the same as in the subjects who remained in follow-up
Censoring (6)
• Informative vs. uninformative censoring
  – RCT of new therapy with serious side effects.
    • Patients on this Rx can tolerate side effects until near death. Then, they drop out.
    • Observed mortality rate in this group will be 0 (/100,000), because the deaths happen after drop-out
  – Control therapy has no side-effects
    • Patients do not drop out near death.
• Strong bias
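The size of this bias can be shown with simple arithmetic. All numbers below are hypothetical: both arms are given the same true number of deaths and the same observed person-time, and the only difference is whether dying patients drop out before the death is recorded.

```python
def observed_mortality_per_100k(deaths_observed, person_years):
    """Observed mortality rate per 100,000 person-years."""
    return 100_000 * deaths_observed / person_years

# Hypothetical arms: 100 true deaths and about 950 person-years observed in each
true_deaths = 100
person_years = 950.0

# New therapy: every patient near death drops out first, so no death is recorded
new_rx = observed_mortality_per_100k(0, person_years)
# Control: nobody drops out near death, so all deaths are recorded
control = observed_mortality_per_100k(true_deaths, person_years)

print(f"New therapy: {new_rx:.0f} per 100,000 person-years")
print(f"Control:     {control:.0f} per 100,000 person-years")
```

The true mortality is identical in the two arms, yet the new therapy appears to prevent every death, purely because censoring is tied to the outcome.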
Type of censoring | May violate assumption of independence of censoring/survival | If assumption is violated, likely direction of bias on CIR estimate
Deaths from other causes when there are common risk factors* | Yes | Underestimation
Failure to follow-up contacts | Yes | Underestimation
Migration | Yes | Variable
Administrative censoring | Unlikely§ | Variable
* In cause-specific incidence or mortality studies
§ More likely in studies with a prolonged accrual period in the presence of secular trends.