Lecture 5: Non-Parametric Estimation
Survival Function, Cumulative Hazard Function, Confidence Intervals
Switching Gears
• Now, abandon parametric assumptions
• Very common in survival analysis
• Why?
– No single "catch-all" distribution
– No central limit theorem for large samples
Censoring
• Assumption:
– Potential censoring time is unrelated to potential event time
– Reasonable?
• Estimation approaches are biased when this is violated
• Violation examples:
– Sick patients tend to miss clinical visits more often
– High school drop-out: kids who move may be more likely to drop out
Terminology
• D distinct event times
• t1 < t2 < t3 < … < tD
• Ties allowed
• At time ti, there are di events
• Yi is the number of individuals at risk at ti
– Yi counts everyone with an event or censoring time ≥ ti
– di/Yi is an estimate of the conditional probability of an event at ti, given survival to just prior to ti
Conditional Probabilities
• Recall: P(A ∩ B) = P(A | B) P(B)
• Which means: P(T > t2) = P(T > t2 | T > t1) P(T > t1)
• And if we have more than 2 times:
P(T > tk) = P(T > tk | T > tk−1) · P(T > tk−1 | T > tk−2) ⋯ P(T > t2 | T > t1) · P(T > t1)
How does this relate to S(t)? Since S(t) = P(T > t), the survival function at tk is a product of conditional survival probabilities, each of which can be estimated by 1 − di/Yi.
Kaplan-Meier Estimation
• AKA ‘product-limit’ estimator
• Step-function
• Size of the steps depends on:
– Number of events at time t
– Pattern of censoring before t
\hat{S}(t) = \begin{cases} 1 & \text{if } t < t_1 \\[4pt] \prod_{t_i \le t} \left[ 1 - \dfrac{d_i}{Y_i} \right] & \text{if } t \ge t_1 \end{cases}
Kaplan-Meier Estimation
• Greenwood's formula
– Most common variance estimator
– Point-wise

\hat{V}\left[\hat{S}(t)\right] = \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)}
Proof of Greenwood’s formula
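The equations on the proof slides did not survive transcription. A sketch of the standard delta-method argument (assuming, as usual, that di given Yi is approximately binomial; this is a reconstruction, not the slide's exact steps):

```latex
\log \hat{S}(t) = \sum_{t_i \le t} \log\!\left(1 - \frac{d_i}{Y_i}\right)
% Delta method on g(p) = \log(1 - p) with
% \widehat{\operatorname{Var}}(\hat{p}_i) = \hat{p}_i(1-\hat{p}_i)/Y_i,
% \hat{p}_i = d_i/Y_i:
\widehat{\operatorname{Var}}\!\left[\log\!\left(1 - \frac{d_i}{Y_i}\right)\right]
  \approx \frac{1}{(1-\hat{p}_i)^2}\cdot\frac{\hat{p}_i(1-\hat{p}_i)}{Y_i}
  = \frac{d_i}{Y_i\,(Y_i - d_i)}
% Summing the (approximately independent) terms, then a second delta step
% with g(x) = e^x returns to the S scale:
\widehat{V}\!\left[\hat{S}(t)\right]
  \approx \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{Y_i\,(Y_i - d_i)}
```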
Example
• Kim paper
• Event = time to relapse
• Data (+ denotes a censored time):
– 10, 20+, 35, 40+, 50+, 55, 70+, 80, 90+

[Slide shows a worked table with columns Time, di, Yi, and \hat{S}(t) = \prod_{t_i \le t} (1 - d_i / Y_i).]
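As a cross-check on the product-limit formula, here is a short sketch in Python (illustrative only; the course software is R, and the function name `kaplan_meier` is invented for this example). It assumes one subject per row and no tied event times, which holds for these data:

```python
# Product-limit (Kaplan-Meier) estimate for the Kim relapse data above;
# '+' in the slide's listing marks censored times (event = 0 here).
times  = [10, 20, 35, 40, 50, 55, 70, 80, 90]
events = [1, 0, 1, 0, 0, 1, 0, 1, 0]   # 1 = relapse, 0 = censored

def kaplan_meier(times, events):
    """Return (event time, S-hat) pairs; assumes one subject per row."""
    data = sorted(zip(times, events))
    n = len(data)
    s, steps = 1.0, []
    for i, (t, d) in enumerate(data):
        y = n - i              # Y_i: number at risk just before time t
        if d:                  # the curve steps only at event times
            s *= 1 - d / y     # multiply in the conditional survival
            steps.append((t, s))
    return steps

print(kaplan_meier(times, events))
# S(10) = 8/9 ≈ 0.889, S(35) ≈ 0.762, S(55) = 4/7 ≈ 0.571, S(80) = 2/7 ≈ 0.286
```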
Plot it…
Cumulative Hazard
• Use H(t) = -ln(S(t))
Nelson-Aalen Estimator
• Better small-sample properties than KM
• The estimator:

\tilde{H}(t) = \begin{cases} 0 & \text{if } t < t_1 \\[4pt] \sum_{t_i \le t} \dfrac{d_i}{Y_i} & \text{if } t \ge t_1 \end{cases}

• Variance of the NA estimator:

\hat{\sigma}_H^2(t) = \sum_{t_i \le t} \frac{d_i}{Y_i^2}
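A matching Nelson-Aalen sketch in Python (again illustrative rather than the course's R code; `nelson_aalen` is an invented name), accumulating both the estimator and its variance on the same nine Kim observations:

```python
# Nelson-Aalen estimate of H(t) and its variance for the Kim data;
# assumes one subject per row and no tied event times.
times  = [10, 20, 35, 40, 50, 55, 70, 80, 90]
events = [1, 0, 1, 0, 0, 1, 0, 1, 0]   # 1 = relapse, 0 = censored

def nelson_aalen(times, events):
    """Return (event time, H-tilde, variance) triples."""
    data = sorted(zip(times, events))
    n = len(data)
    h, v, steps = 0.0, 0.0, []
    for i, (t, d) in enumerate(data):
        y = n - i               # Y_i: number at risk just before t
        if d:
            h += d / y          # hazard increment d_i / Y_i
            v += d / y**2       # variance increment d_i / Y_i^2
            steps.append((t, h, v))
    return steps

print(nelson_aalen(times, events))
```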
Uses of NA
• Model identification
– Recall H(t) vs. t
– More later (chapter 12)
• Estimates of h(t)
– Slopes of H(t)
• Survival function
– S(t) = exp(−H(t))
– S(t) using NA for H(t) is called the Fleming-Harrington/Breslow method
Kim Example Using NA approach
Time  di  Yi   Ŝ(t)
 10    1  10   0.9
 20    0   9   0.9
 35    1   8   0.788
 40    0   7   0.788
 50    0   6   0.788
 55    1   5   0.63
 70    0   4   0.63
 71    0   3   0.63
 80    1   2   0.315
 90    0   1   0.315

\tilde{H}(t) = \sum_{t_i \le t} \frac{d_i}{Y_i}, \qquad \hat{S}(t) = e^{-\tilde{H}(t)}, \qquad \hat{S}_{KM}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{Y_i}\right)
Fleming-Harrington Estimate
• Almost equivalent to the NA estimate of H(t)
• Handles ties slightly differently
• If there were 3 deaths out of 10 at risk:
– Nelson-Aalen increments the hazard by 3/10
– Fleming-Harrington increments the hazard by 1/10 + 1/9 + 1/8
\text{NA/Breslow: } \hat{S}(t) = e^{-\tilde{H}(t)}, \qquad \tilde{H}(t) = \sum_{t_i \le t} \frac{d_i}{Y_i}

\text{FH: } \hat{S}(t) = e^{-\tilde{H}_{FH}(t)}, \qquad \tilde{H}_{FH}(t) = \sum_{t_i \le t} \sum_{j=0}^{d_i - 1} \frac{1}{Y_i - j}
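The difference in tie handling can be shown with the slide's own example (3 deaths out of 10 at risk) in a short Python sketch; the function names are invented for this illustration:

```python
# Cumulative-hazard increment contributed by d tied events among y at risk.
def na_increment(d, y):
    return d / y                                # Nelson-Aalen: d/Y

def fh_increment(d, y):
    return sum(1 / (y - j) for j in range(d))   # FH: 1/Y + 1/(Y-1) + ... + 1/(Y-d+1)

print(na_increment(3, 10))   # 0.3
print(fh_increment(3, 10))   # 1/10 + 1/9 + 1/8 ≈ 0.3361
```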
Kim Example KM: black, FH: red, NA: green
Ties

\text{KM: } \hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{Y_i}\right)

\text{NA: } \hat{S}(t) = \exp\left(-\sum_{t_i \le t} \frac{d_i}{Y_i}\right)

\text{FH: } \hat{S}(t) = \exp\left(-\sum_{t_i \le t} \sum_{j=0}^{d_i - 1} \frac{1}{Y_i - j}\right)
di = (1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0)
ti = (10, 20, 40, 40, 40, 50, 50, 70, 71, 80, 90)
R for KM, NA, and FH
library(survival)
t <- c(10, 20, 40, 40, 40, 50, 50, 70, 71, 80, 90)
d <- c(1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0)
st <- Surv(t, d)
help(survfit)
fit.km <- survfit(st ~ 1)
fit.na <- survfit(st ~ 1, type = "fleming-harrington")
fit.fh <- survfit(st ~ 1, type = "fh")
fit.km
fit.fh
summary(fit.km)
summary(fit.na)
summary(fit.fh)
plot(fit.km, conf.int = F, xlab = "Time to Relapse (months)",
     ylab = "Survival Function", lwd = 2)
lines(fit.na, type = "s", lwd = 2, col = 2)
lines(fit.fh, type = "s", lwd = 2, col = 3)
Cumulative Hazard
• fun: an arbitrary function defining a transformation of the survival curve
– fun = log is an alternative way to draw a log-survival curve (but with the axis labeled with log(S) values)
– fun = sqrt would generate a curve on the square-root scale
• Four often-used transformations can be specified with a character argument instead:
– "log" is the same as using the log=TRUE option
– "event" plots the cumulative events (f(y) = 1 − y)
– "cumhaz" plots the cumulative hazard function (f(y) = −log(y))
– "cloglog" creates a complementary log-log survival plot (f(y) = log(−log(y))) along with a log scale for the x-axis
Help for Generic “plot” Function in R
R Documentationplot {graphics}
Generic X-Y Plotting
Description
Generic function for plotting of R objects. For more details about the graphical parameter arguments, see par.
For simple scatter plots, plot.default will be used. However, there are plot methods for many R objects, including functions, data.frames, density objects, etc. Use methods(plot) and the documentation for these.
Usage
plot(x, y, ...)
Help File for “plot.survfit”R Documentation
plot.survfit {survival}Plot method for survfit objects
Description
A plot of survival curves is produced, one curve for each strata. The log=T option does extra work to avoid log(0), and to try to create a pleasing result. If there are zeros, they are plotted by default at 0.8 times the smallest non-zero value on the curve(s).
Usage
## S3 method for class 'survfit'
plot(x, conf.int=, mark.time=TRUE, mark=3, col=1, lty=1, lwd=1, cex=1,
     log=FALSE, xscale=1, yscale=1, firstx=0, firsty=1, xmax, ymin=0,
     fun, xlab="", ylab="", xaxs="S", ...)
Cumulative Hazard
> class(fit.km)
[1] "survfit"
> plot(fit.km, conf.int=F, fun="cumhaz", lwd=2,
       xlab="Time to Relapse (months)", ylab="H(t)")
> lines(fit.na, type="s", fun="cumhaz", lwd=2, col=2, conf.int=F)
> lines(fit.fh, type="s", fun="cumhaz", lwd=2, col=3, conf.int=F)
> legend(2, 1.4, c("Kaplan-Meier", "Nelson-Aalen", "Fleming-Harrington"),
         col=1:3, lwd=2)
Cumulative Hazard
Interpreting S(t) and H(t)
• General philosophy
– Bad to extrapolate
• In survival analysis
– Bad to put a lot of stock in estimates at late time points
– Have less data at later times
Observations?
• Convergence to H(t) = λt with increasing N
• Could apply parametric smoothing to get an estimate of h(t), which is just the slope of the line H(t) versus t
• More divergence at the upper end, where the denominator data (risk set) is smaller
• Textbook discusses bias in S(t) at tmax
• Can estimate S(t) by 0 beyond tmax (negatively biased)
• Can estimate S(t) = S(tmax) for t > tmax (positively biased)
• When there is no censoring, the product limit estimator reduces to the empirical survival function
Point-wise Confidence Intervals
• Constructed to ensure that the true value of S(t), at a particular time t, falls in the interval with (1 − α)100% confidence
• Notation:

\hat{\sigma}_S^2(t) = \frac{\hat{V}\left[\hat{S}(t)\right]}{\hat{S}(t)^2}

• Recall that this is the sum in Greenwood's formula:

\hat{V}\left[\hat{S}(t)\right] = \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)}, \qquad \text{so} \qquad \hat{\sigma}_S^2(t) = \sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)}
“Linear” CIs
• Most commonly used in stats packages
• It is a point-wise CI for a given t
• For simplicity of notation, assume 95% confidence

\hat{S}(t_0) \pm 1.96\, \hat{\sigma}_S(t_0)\, \hat{S}(t_0)
\quad \text{or} \quad
\hat{S}(t_0) \pm 1.96 \sqrt{\hat{V}\left[\hat{S}(t_0)\right]}
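Greenwood's formula and the linear CI can be sketched together in Python (illustrative; the lecture's actual numbers come from R's survfit with conf.type="plain", and `km_linear_ci` is an invented name). Evaluated at t0 = 55 on the Kim data:

```python
import math

# Kaplan-Meier estimate with Greenwood variance and a linear 95% CI at t0.
times  = [10, 20, 35, 40, 50, 55, 70, 80, 90]   # Kim relapse data
events = [1, 0, 1, 0, 0, 1, 0, 1, 0]            # 1 = relapse, 0 = censored

def km_linear_ci(times, events, t0, z=1.96):
    data = sorted(zip(times, events))
    n = len(data)
    s, gsum = 1.0, 0.0          # gsum accumulates d_i / (Y_i (Y_i - d_i))
    for i, (t, d) in enumerate(data):
        if t > t0:
            break
        y = n - i
        if d:
            s *= 1 - d / y
            gsum += d / (y * (y - d))
    se = math.sqrt(s**2 * gsum)             # Greenwood's formula
    return s, se, (s - z * se, s + z * se)  # note: can fall outside [0, 1]

s, se, (lo, hi) = km_linear_ci(times, events, 55)
print(s, se, lo, hi)   # S(55) = 4/7 ≈ 0.571, se ≈ 0.199
```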
There are Other Better Options
• Transformations have better properties• Two main approaches:
– Log transformations: based on cumulative hazard approach
– Arcsine square root
Log Transformation
• Define q:
• Then, the 95% CI is
\theta = \exp\left\{ \frac{1.96\, \hat{\sigma}_S(t_0)}{\ln \hat{S}(t_0)} \right\},
\qquad
\hat{\sigma}_S(t_0) = \left[ \sum_{t_i \le t_0} \frac{d_i}{Y_i (Y_i - d_i)} \right]^{1/2}

\left[ \hat{S}(t_0)^{1/\theta},\; \hat{S}(t_0)^{\theta} \right]
Derivation of the Log Transformation
Log-log transformation
• Since the survival function estimates a probability, it is bounded by 0 and 1
• Taking the log results in bounds: −∞ < log Ŝ(t) ≤ 0
• Taking the opposite (negating) results in bounds: 0 ≤ −log Ŝ(t) < ∞
• Taking the double log results in bounds: −∞ < log(−log Ŝ(t)) < ∞
• This is the complementary log-log transformation
Log-Log Transformation
• Can estimate a confidence interval for the double-log transformation
– Estimate the variance (delta method)
– Use the estimate to define the CI according to:

\text{upper: } \log\left(-\log \hat{S}(t)\right) + z_{1-\alpha/2} \sqrt{\widehat{\operatorname{Var}}\left[\log\left(-\log \hat{S}(t)\right)\right]}

\text{lower: } \log\left(-\log \hat{S}(t)\right) - z_{1-\alpha/2} \sqrt{\widehat{\operatorname{Var}}\left[\log\left(-\log \hat{S}(t)\right)\right]}
Log-Log Transformation
• To get the CI for the survival function at time t, must back-transform from the double log:

Let L(t) = \log\left(-\log \hat{S}(t)\right) and A = z_{1-\alpha/2} \sqrt{\widehat{\operatorname{Var}}\left[\log\left(-\log \hat{S}(t)\right)\right]}.

Back-transformation: \hat{S}(t) = \exp\left(-e^{L(t)}\right), so the CI endpoints are \exp\left(-e^{L(t)+A}\right) and \exp\left(-e^{L(t)-A}\right).

Substituting \exp\left(-e^{L(t) \pm A}\right) = \hat{S}(t)^{e^{\pm A}} yields the interval

\left[ \hat{S}(t)^{e^{A}},\; \hat{S}(t)^{e^{-A}} \right]
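The back-transformation can be sketched in Python (illustrative; `loglog_ci` is an invented name, and R's conf.type="log-log" is the real implementation). Unlike the linear CI, the endpoints are guaranteed to lie inside (0, 1):

```python
import math

# Log-log (complementary log-log) 95% CI for S(t0) on the Kim data.
times  = [10, 20, 35, 40, 50, 55, 70, 80, 90]
events = [1, 0, 1, 0, 0, 1, 0, 1, 0]   # 1 = relapse, 0 = censored

def loglog_ci(times, events, t0, z=1.96):
    data = sorted(zip(times, events))
    n = len(data)
    s, sig2 = 1.0, 0.0        # sig2 = sigma_S^2 = sum d_i / (Y_i (Y_i - d_i))
    for i, (t, d) in enumerate(data):
        if t > t0:
            break
        y = n - i
        if d:
            s *= 1 - d / y
            sig2 += d / (y * (y - d))
    # delta method: Var[log(-log S)] = sigma_S^2 / (log S)^2, so
    # A = z * sigma_S / |log S| and the CI is [S^(e^A), S^(e^-A)]
    a = z * math.sqrt(sig2) / abs(math.log(s))
    return s, (s ** math.exp(a), s ** math.exp(-a))

s, (lo, hi) = loglog_ci(times, events, 55)
print(s, lo, hi)
```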
Arcsine Square Root
• Very ugly:

LL = \sin^2\left\{ \max\left[ 0,\; \arcsin\left( \hat{S}(t_0)^{1/2} \right) - 0.5 \cdot 1.96\, \hat{\sigma}_S(t_0) \left( \frac{\hat{S}(t_0)}{1 - \hat{S}(t_0)} \right)^{1/2} \right] \right\}

UL = \sin^2\left\{ \min\left[ \frac{\pi}{2},\; \arcsin\left( \hat{S}(t_0)^{1/2} \right) + 0.5 \cdot 1.96\, \hat{\sigma}_S(t_0) \left( \frac{\hat{S}(t_0)}{1 - \hat{S}(t_0)} \right)^{1/2} \right] \right\}
Cumulative Hazard CIs
• Linear
• Log
• Arcsine square root
• See Klein & Moeschberger, page 107
Which to Use When?
• For N > 25 and < 50% censoring
– Log and log-log are good
– Arcsine square root is good
– Both give approximately nominal coverage for a 95% CI
– Exception: the extreme right tail, where there is little data
• The linear approach requires much larger N for good coverage
Which to Use When?
• Arcsine square root
– Slightly conservative
– A little wider than necessary
• Log
– Slightly anti-conservative
– A little too narrow
• Linear
– Overly anti-conservative
– Too narrow
• Large Samples: all about the same
Remember…
• Valid for point-wise intervals only
• Common incorrect interpretation:
– Plot a set of point-wise 95% CIs
– Interpret as a confidence "band"
– These "bands" are too narrow!
Example: Tongue Cancer data
[Plot: Kaplan-Meier survival curves for the tongue cancer data, aneuploid vs. diploid tumors; x-axis: Time to Death (months), 0 to 400; y-axis: Survival, 0.0 to 1.0.]
R Code
library(survival)
tongue <- read.csv("H:\\BMTRY_722_Summer2015\\Tongue.csv")
dat <- Surv(tongue$Time, tongue$Cens)
type <- tongue$Type
plot(survfit(dat ~ type), conf.int = T, col = c(1, 2), lty = c(2, 1),
     lwd = c(2, 2), xlab = "Time to Death (months)", ylab = "Survival",
     cex.axis = 0.9)
legend(300, .9, c("Aneuploid", "Diploid"), lty = c(2, 1), col = c(1, 2),
       lwd = c(2, 2), cex = 0.8)
Add CIs: “plain”
Just Diploid Tumors
R Code
fit.lin <- survfit(dat[type == 2] ~ 1, conf.type = "plain")
fit.log <- survfit(dat[type == 2] ~ 1, conf.type = "log")
fit.loglog <- survfit(dat[type == 2] ~ 1, conf.type = "log-log")
plot(fit.log, conf.int = T, col = 2, lwd = 2, lty = 4)
lines(fit.loglog, conf.int = T, col = 3, lwd = 2, lty = 2)
lines(fit.lin, conf.int = T, col = 1, lwd = 2, lty = 1)
legend(165, 1, c("Linear", "Log", "Log-Log"), col = c(1, 2, 3),
       lty = c(1, 4, 2), lwd = 2)
summary(fit.lin)
summary(fit.log)
summary(fit.loglog)
R Results: Linear
> summary(fit.lin)
Call: survfit(formula = dat[type == 2] ~ 1, conf.type = "plain")

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1     28       1   0.9643  0.0351      0.89555        1.000
    3     27       1   0.9286  0.0487      0.83318        1.000
    4     26       1   0.8929  0.0585      0.77829        1.000
    5     25       2   0.8214  0.0724      0.67957        0.963
    8     23       1   0.7857  0.0775      0.63373        0.938
   12     21       1   0.7483  0.0824      0.58683        0.910
   13     20       1   0.7109  0.0863      0.54165        0.880
   18     19       1   0.6735  0.0895      0.49797        0.849
  …
   62     12       1   0.4116  0.0948      0.22581        0.597
   69     10       1   0.3704  0.0938      0.18654        0.554
  104      8       2   0.2778  0.0904      0.10069        0.455
  112      5       1   0.2222  0.0877      0.05031        0.394
  129      4       1   0.1667  0.0815      0.00692        0.326
  181      2       1   0.0833  0.0717      0.00000        0.224
R Results: Log
> summary(fit.log)
Call: survfit(formula = dat[type == 2] ~ 1, conf.type = "log")

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1     28       1   0.9643  0.0351       0.8979        1.000
    3     27       1   0.9286  0.0487       0.8379        1.000
    4     26       1   0.8929  0.0585       0.7853        1.000
    5     25       2   0.8214  0.0724       0.6911        0.976
    8     23       1   0.7857  0.0775       0.6475        0.953
   12     21       1   0.7483  0.0824       0.6031        0.929
   13     20       1   0.7109  0.0863       0.5603        0.902
   18     19       1   0.6735  0.0895       0.5190        0.874
  …
   62     12       1   0.4116  0.0948       0.2621        0.646
   69     10       1   0.3704  0.0938       0.2255        0.608
  104      8       2   0.2778  0.0904       0.1468        0.526
  112      5       1   0.2222  0.0877       0.1025        0.482
  129      4       1   0.1667  0.0815       0.0639        0.435
  181      2       1   0.0833  0.0717       0.0155        0.449
R Results: Log-Log
> summary(fit.loglog)
Call: survfit(formula = dat[type == 2] ~ 1, conf.type = "log-log")

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1     28       1   0.9643  0.0351      0.77244        0.995
    3     27       1   0.9286  0.0487      0.74348        0.982
    4     26       1   0.8929  0.0585      0.70356        0.964
    5     25       2   0.8214  0.0724      0.62296        0.921
    8     23       1   0.7857  0.0775      0.58401        0.898
   12     21       1   0.7483  0.0824      0.54320        0.871
   13     20       1   0.7109  0.0863      0.50381        0.844
   18     19       1   0.6735  0.0895      0.46569        0.815
  …
   62     12       1   0.4116  0.0948      0.22854        0.586
   69     10       1   0.3704  0.0938      0.19454        0.547
  104      8       2   0.2778  0.0904      0.12160        0.459
  112      5       1   0.2222  0.0877      0.08081        0.407
  129      4       1   0.1667  0.0815      0.04693        0.350
  181      2       1   0.0833  0.0717      0.00748        0.283