Lecture 5: Non-Parametric Estimation
Survival Function, Cumulative Hazard Function, Confidence Intervals
Switching Gears
• Now, abandon parametric assumptions
• Very common in survival analysis
• Why?
– No single "catch-all" distribution
– No central limit theorem for large samples
Censoring
• Assumption:
– Potential censoring time is unrelated to potential event time
– Reasonable?
• Estimation approaches are biased when this is violated
• Violation examples:
– Sick patients tend to miss clinical visits more often
– High school drop-out: kids who move may be more likely to drop out
Terminology
• D distinct event times
• t1 < t2 < t3 < … < tD
• Ties allowed
• At time ti, there are di events
• Yi is the number of individuals at risk at ti
– Yi counts everyone with an event or censoring time ≥ ti
– di/Yi is an estimate of the conditional probability of an event at ti, given survival to just prior to ti
Conditional Probabilities
• Recall: P(A ∩ B) = P(A | B) P(B)
• Which means: P(T > t2) = P(T > t2 | T > t1) P(T > t1)
• And if we have more than 2 times:
P(T > tk) = P(T > tk | T > tk−1) · P(T > tk−1 | T > tk−2) ⋯ P(T > t2 | T > t1) · P(T > t1)
How does this relate to S(t)? Since S(t) = P(T > t), the survival function at tk is a product of conditional survival probabilities, each of which can be estimated by 1 − di/Yi.
Kaplan-Meier Estimation
• AKA ‘product-limit’ estimator
• Step-function
• Size of the steps depends on:
– Number of events at time t
– Pattern of censoring before t
\hat{S}(t) = \begin{cases} 1 & \text{if } t < t_1 \\[4pt] \prod_{t_i \le t} \left[ 1 - \dfrac{d_i}{Y_i} \right] & \text{if } t \ge t_1 \end{cases}
Kaplan-Meier Estimation
• Greenwood's formula
– Most common variance estimator
– Point-wise

\hat{V}\left[\hat{S}(t)\right] = \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)}
Proof of Greenwood’s formula
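The equations on the proof slides did not survive transcription. A sketch of the standard delta-method argument (assuming, as usual, that di given Yi is approximately binomial; this is a reconstruction, not the slide's exact steps):

```latex
\log \hat{S}(t) = \sum_{t_i \le t} \log\!\left(1 - \frac{d_i}{Y_i}\right)
% Delta method on g(p) = \log(1 - p) with
% \widehat{\operatorname{Var}}(\hat{p}_i) = \hat{p}_i(1-\hat{p}_i)/Y_i,
% \hat{p}_i = d_i/Y_i:
\widehat{\operatorname{Var}}\!\left[\log\!\left(1 - \frac{d_i}{Y_i}\right)\right]
  \approx \frac{1}{(1-\hat{p}_i)^2}\cdot\frac{\hat{p}_i(1-\hat{p}_i)}{Y_i}
  = \frac{d_i}{Y_i\,(Y_i - d_i)}
% Summing the (approximately independent) terms, then a second delta step
% with g(x) = e^x returns to the S scale:
\widehat{V}\!\left[\hat{S}(t)\right]
  \approx \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{Y_i\,(Y_i - d_i)}
```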
Example
• Kim paper
• Event = time to relapse
• Data (+ denotes a censored time):
– 10, 20+, 35, 40+, 50+, 55, 70+, 80, 90+

[Slide shows a worked table with columns Time, di, Yi, and \hat{S}(t) = \prod_{t_i \le t} (1 - d_i / Y_i).]
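As a cross-check on the product-limit formula, here is a short sketch in Python (illustrative only; the course software is R, and the function name `kaplan_meier` is invented for this example). It assumes one subject per row and no tied event times, which holds for these data:

```python
# Product-limit (Kaplan-Meier) estimate for the Kim relapse data above;
# '+' in the slide's listing marks censored times (event = 0 here).
times  = [10, 20, 35, 40, 50, 55, 70, 80, 90]
events = [1, 0, 1, 0, 0, 1, 0, 1, 0]   # 1 = relapse, 0 = censored

def kaplan_meier(times, events):
    """Return (event time, S-hat) pairs; assumes one subject per row."""
    data = sorted(zip(times, events))
    n = len(data)
    s, steps = 1.0, []
    for i, (t, d) in enumerate(data):
        y = n - i              # Y_i: number at risk just before time t
        if d:                  # the curve steps only at event times
            s *= 1 - d / y     # multiply in the conditional survival
            steps.append((t, s))
    return steps

print(kaplan_meier(times, events))
# S(10) = 8/9 ≈ 0.889, S(35) ≈ 0.762, S(55) = 4/7 ≈ 0.571, S(80) = 2/7 ≈ 0.286
```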
Plot it…
Cumulative Hazard
• Use H(t) = -ln(S(t))
Nelson-Aalen Estimator
• Better small-sample properties than KM
• The estimator:

\tilde{H}(t) = \begin{cases} 0 & \text{if } t < t_1 \\[4pt] \sum_{t_i \le t} \dfrac{d_i}{Y_i} & \text{if } t \ge t_1 \end{cases}

• Variance of the NA estimator:

\hat{\sigma}_H^2(t) = \sum_{t_i \le t} \frac{d_i}{Y_i^2}
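A matching Nelson-Aalen sketch in Python (again illustrative rather than the course's R code; `nelson_aalen` is an invented name), accumulating both the estimator and its variance on the same nine Kim observations:

```python
# Nelson-Aalen estimate of H(t) and its variance for the Kim data;
# assumes one subject per row and no tied event times.
times  = [10, 20, 35, 40, 50, 55, 70, 80, 90]
events = [1, 0, 1, 0, 0, 1, 0, 1, 0]   # 1 = relapse, 0 = censored

def nelson_aalen(times, events):
    """Return (event time, H-tilde, variance) triples."""
    data = sorted(zip(times, events))
    n = len(data)
    h, v, steps = 0.0, 0.0, []
    for i, (t, d) in enumerate(data):
        y = n - i               # Y_i: number at risk just before t
        if d:
            h += d / y          # hazard increment d_i / Y_i
            v += d / y**2       # variance increment d_i / Y_i^2
            steps.append((t, h, v))
    return steps

print(nelson_aalen(times, events))
```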
Uses of NA
• Model identification
– Recall H(t) vs. t
– More later (chapter 12)
• Estimates of h(t)
– Slopes of H(t)
• Survival function
– S(t) = exp(−H(t))
– S(t) using NA for H(t) is called the Fleming-Harrington/Breslow method
Kim Example Using NA approach
Time  di  Yi   Ŝ(t)
 10    1  10   0.9
 20    0   9   0.9
 35    1   8   0.788
 40    0   7   0.788
 50    0   6   0.788
 55    1   5   0.63
 70    0   4   0.63
 71    0   3   0.63
 80    1   2   0.315
 90    0   1   0.315

\tilde{H}(t) = \sum_{t_i \le t} \frac{d_i}{Y_i}, \qquad \hat{S}(t) = e^{-\tilde{H}(t)}, \qquad \hat{S}_{KM}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{Y_i}\right)
Fleming-Harrington Estimate
• Almost equivalent to the NA estimate of H(t)
• Handles ties slightly differently
• If there were 3 deaths out of 10 at risk:
– Nelson-Aalen increments the hazard by 3/10
– Fleming-Harrington increments the hazard by 1/10 + 1/9 + 1/8
\text{NA/Breslow: } \hat{S}(t) = e^{-\tilde{H}(t)}, \qquad \tilde{H}(t) = \sum_{t_i \le t} \frac{d_i}{Y_i}

\text{FH: } \hat{S}(t) = e^{-\tilde{H}_{FH}(t)}, \qquad \tilde{H}_{FH}(t) = \sum_{t_i \le t} \sum_{j=0}^{d_i - 1} \frac{1}{Y_i - j}
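The difference in tie handling can be shown with the slide's own example (3 deaths out of 10 at risk) in a short Python sketch; the function names are invented for this illustration:

```python
# Cumulative-hazard increment contributed by d tied events among y at risk.
def na_increment(d, y):
    return d / y                                # Nelson-Aalen: d/Y

def fh_increment(d, y):
    return sum(1 / (y - j) for j in range(d))   # FH: 1/Y + 1/(Y-1) + ... + 1/(Y-d+1)

print(na_increment(3, 10))   # 0.3
print(fh_increment(3, 10))   # 1/10 + 1/9 + 1/8 ≈ 0.3361
```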
Kim Example KM: black, FH: red, NA: green
Ties

\text{KM: } \hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{Y_i}\right)

\text{NA: } \hat{S}(t) = \exp\left(-\sum_{t_i \le t} \frac{d_i}{Y_i}\right)

\text{FH: } \hat{S}(t) = \exp\left(-\sum_{t_i \le t} \sum_{j=0}^{d_i - 1} \frac{1}{Y_i - j}\right)
di = (1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0)
ti = (10, 20, 40, 40, 40, 50, 50, 70, 71, 80, 90)
R for KM, NA, and FH
library(survival)
t <- c(10, 20, 40, 40, 40, 50, 50, 70, 71, 80, 90)
d <- c(1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0)
st <- Surv(t, d)
help(survfit)
fit.km <- survfit(st ~ 1)
fit.na <- survfit(st ~ 1, type = "fleming-harrington")
fit.fh <- survfit(st ~ 1, type = "fh")
fit.km
fit.fh
summary(fit.km)
summary(fit.na)
summary(fit.fh)
plot(fit.km, conf.int = F, xlab = "Time to Relapse (months)",
     ylab = "Survival Function", lwd = 2)
lines(fit.na, type = "s", lwd = 2, col = 2)
lines(fit.fh, type = "s", lwd = 2, col = 3)
Cumulative Hazard
• fun: an arbitrary function defining a transformation of the survival curve
– fun = log is an alternative way to draw a log-survival curve (but with the axis labeled with log(S) values)
– fun = sqrt would generate a curve on the square-root scale
• Four often-used transformations can be specified with a character argument instead:
– "log" is the same as using the log=TRUE option
– "event" plots the cumulative events (f(y) = 1 − y)
– "cumhaz" plots the cumulative hazard function (f(y) = −log(y))
– "cloglog" creates a complementary log-log survival plot (f(y) = log(−log(y))) along with a log scale for the x-axis
Help for Generic “plot” Function in R
R Documentationplot {graphics}
Generic X-Y Plotting
Description
Generic function for plotting of R objects. For more details about the graphical parameter arguments, see par.
For simple scatter plots, plot.default will be used. However, there are plot methods for many R objects, including functions, data.frames, density objects, etc. Use methods(plot) and the documentation for these.
Usage
plot(x, y, ...)
Help File for “plot.survfit”R Documentation
plot.survfit {survival}Plot method for survfit objects
Description
A plot of survival curves is produced, one curve for each strata. The log=T option does extra work to avoid log(0), and to try to create a pleasing result. If there are zeros, they are plotted by default at 0.8 times the smallest non-zero value on the curve(s).
Usage
## S3 method for class 'survfit'
plot(x, conf.int=, mark.time=TRUE, mark=3, col=1, lty=1, lwd=1, cex=1,
     log=FALSE, xscale=1, yscale=1, firstx=0, firsty=1, xmax, ymin=0,
     fun, xlab="", ylab="", xaxs="S", ...)
Cumulative Hazard
> class(fit.km)
[1] "survfit"
> plot(fit.km, conf.int=F, fun="cumhaz", lwd=2,
       xlab="Time to Relapse (months)", ylab="H(t)")
> lines(fit.na, type="s", fun="cumhaz", lwd=2, col=2, conf.int=F)
> lines(fit.fh, type="s", fun="cumhaz", lwd=2, col=3, conf.int=F)
> legend(2, 1.4, c("Kaplan-Meier", "Nelson-Aalen", "Fleming-Harrington"),
         col=1:3, lwd=2)
Cumulative Hazard
Interpreting S(t) and H(t)
• General philosophy
– Bad to extrapolate
• In survival analysis
– Bad to put a lot of stock in estimates at late time points
– Have less data at later times
Observations?
• Convergence to H(t) = λt with increasing N
• Could apply parametric smoothing to get an estimate of h(t), which is just the slope of the line H(t) versus t
• More divergence at the upper end, where the denominator data (risk set) is smaller
• Textbook discusses bias in S(t) at tmax
• Can estimate S(t) by 0 beyond tmax (negatively biased)
• Can estimate S(t) = S(tmax) for t > tmax (positively biased)
• When there is no censoring, the product limit estimator reduces to the empirical survival function
Point-wise Confidence Intervals
• Constructed to ensure that the true value of S(t), at a particular time t, falls in the interval with (1 − α)100% confidence
• Notation:

\hat{\sigma}_S^2(t) = \frac{\hat{V}\left[\hat{S}(t)\right]}{\hat{S}(t)^2}

• Recall that this is the sum in Greenwood's formula:

\hat{V}\left[\hat{S}(t)\right] = \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)}, \qquad \text{so} \qquad \hat{\sigma}_S^2(t) = \sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)}
“Linear” CIs
• Most commonly used in stats packages
• It is a point-wise CI for a given t
• For simplicity of notation, assume 95% confidence

\hat{S}(t_0) \pm 1.96\, \hat{\sigma}_S(t_0)\, \hat{S}(t_0)
\quad \text{or} \quad
\hat{S}(t_0) \pm 1.96 \sqrt{\hat{V}\left[\hat{S}(t_0)\right]}
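Greenwood's formula and the linear CI can be sketched together in Python (illustrative; the lecture's actual numbers come from R's survfit with conf.type="plain", and `km_linear_ci` is an invented name). Evaluated at t0 = 55 on the Kim data:

```python
import math

# Kaplan-Meier estimate with Greenwood variance and a linear 95% CI at t0.
times  = [10, 20, 35, 40, 50, 55, 70, 80, 90]   # Kim relapse data
events = [1, 0, 1, 0, 0, 1, 0, 1, 0]            # 1 = relapse, 0 = censored

def km_linear_ci(times, events, t0, z=1.96):
    data = sorted(zip(times, events))
    n = len(data)
    s, gsum = 1.0, 0.0          # gsum accumulates d_i / (Y_i (Y_i - d_i))
    for i, (t, d) in enumerate(data):
        if t > t0:
            break
        y = n - i
        if d:
            s *= 1 - d / y
            gsum += d / (y * (y - d))
    se = math.sqrt(s**2 * gsum)             # Greenwood's formula
    return s, se, (s - z * se, s + z * se)  # note: can fall outside [0, 1]

s, se, (lo, hi) = km_linear_ci(times, events, 55)
print(s, se, lo, hi)   # S(55) = 4/7 ≈ 0.571, se ≈ 0.199
```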
There are Other Better Options
• Transformations have better properties• Two main approaches:
– Log transformations: based on cumulative hazard approach
– Arcsine square root
Log Transformation
• Define q:
• Then, the 95% CI is
\theta = \exp\left\{ \frac{1.96\, \hat{\sigma}_S(t_0)}{\ln \hat{S}(t_0)} \right\},
\qquad
\hat{\sigma}_S(t_0) = \left[ \sum_{t_i \le t_0} \frac{d_i}{Y_i (Y_i - d_i)} \right]^{1/2}

\left[ \hat{S}(t_0)^{1/\theta},\; \hat{S}(t_0)^{\theta} \right]
Derivation of the Log Transformation
Log-log transformation
• Since the survival function estimates a probability, it is bounded by 0 and 1
• Taking the log results in bounds: −∞ < log Ŝ(t) ≤ 0
• Taking the opposite (negating) results in bounds: 0 ≤ −log Ŝ(t) < ∞
• Taking the double log results in bounds: −∞ < log(−log Ŝ(t)) < ∞
• This is the complementary log-log transformation
Log-Log Transformation
• Can estimate a confidence interval for the double-log transformation
– Estimate the variance (delta method)
– Use the estimate to define the CI according to:

\text{upper: } \log\left(-\log \hat{S}(t)\right) + z_{1-\alpha/2} \sqrt{\widehat{\operatorname{Var}}\left[\log\left(-\log \hat{S}(t)\right)\right]}

\text{lower: } \log\left(-\log \hat{S}(t)\right) - z_{1-\alpha/2} \sqrt{\widehat{\operatorname{Var}}\left[\log\left(-\log \hat{S}(t)\right)\right]}
Log-Log Transformation
• To get the CI for the survival function at time t, must back-transform from the double log:

Let L(t) = \log\left(-\log \hat{S}(t)\right) and A = z_{1-\alpha/2} \sqrt{\widehat{\operatorname{Var}}\left[\log\left(-\log \hat{S}(t)\right)\right]}.

Back-transformation: \hat{S}(t) = \exp\left(-e^{L(t)}\right), so the CI endpoints are \exp\left(-e^{L(t)+A}\right) and \exp\left(-e^{L(t)-A}\right).

Substituting \exp\left(-e^{L(t) \pm A}\right) = \hat{S}(t)^{e^{\pm A}} yields the interval

\left[ \hat{S}(t)^{e^{A}},\; \hat{S}(t)^{e^{-A}} \right]
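The back-transformation can be sketched in Python (illustrative; `loglog_ci` is an invented name, and R's conf.type="log-log" is the real implementation). Unlike the linear CI, the endpoints are guaranteed to lie inside (0, 1):

```python
import math

# Log-log (complementary log-log) 95% CI for S(t0) on the Kim data.
times  = [10, 20, 35, 40, 50, 55, 70, 80, 90]
events = [1, 0, 1, 0, 0, 1, 0, 1, 0]   # 1 = relapse, 0 = censored

def loglog_ci(times, events, t0, z=1.96):
    data = sorted(zip(times, events))
    n = len(data)
    s, sig2 = 1.0, 0.0        # sig2 = sigma_S^2 = sum d_i / (Y_i (Y_i - d_i))
    for i, (t, d) in enumerate(data):
        if t > t0:
            break
        y = n - i
        if d:
            s *= 1 - d / y
            sig2 += d / (y * (y - d))
    # delta method: Var[log(-log S)] = sigma_S^2 / (log S)^2, so
    # A = z * sigma_S / |log S| and the CI is [S^(e^A), S^(e^-A)]
    a = z * math.sqrt(sig2) / abs(math.log(s))
    return s, (s ** math.exp(a), s ** math.exp(-a))

s, (lo, hi) = loglog_ci(times, events, 55)
print(s, lo, hi)
```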
Arcsine Square Root
• Very ugly:

LL = \sin^2\left\{ \max\left[ 0,\; \arcsin\left( \hat{S}(t_0)^{1/2} \right) - 0.5 \cdot 1.96\, \hat{\sigma}_S(t_0) \left( \frac{\hat{S}(t_0)}{1 - \hat{S}(t_0)} \right)^{1/2} \right] \right\}

UL = \sin^2\left\{ \min\left[ \frac{\pi}{2},\; \arcsin\left( \hat{S}(t_0)^{1/2} \right) + 0.5 \cdot 1.96\, \hat{\sigma}_S(t_0) \left( \frac{\hat{S}(t_0)}{1 - \hat{S}(t_0)} \right)^{1/2} \right] \right\}
Cumulative Hazard CIs
• Linear
• Log
• Arcsine square root
• See Klein & Moeschberger, page 107
Which to Use When?
• For N > 25 and < 50% censoring
– Log and log-log are good
– Arcsine square root is good
– Both give approximately nominal coverage for a 95% CI
– Exception: the extreme right tail, where there is little data
• The linear approach requires much larger N for good coverage
Which to Use When?
• Arcsine square root
– Slightly conservative
– A little wider than necessary
• Log
– Slightly anti-conservative
– A little too narrow
• Linear
– Overly anti-conservative
– Too narrow
• Large Samples: all about the same
Remember…
• Valid for point-wise intervals only
• Common incorrect interpretation:
– Plot a set of point-wise 95% CIs
– Interpret as a confidence "band"
– These "bands" are too narrow!
Example: Tongue Cancer data
[Plot: Kaplan-Meier survival curves for the tongue cancer data, aneuploid vs. diploid tumors; x-axis: Time to Death (months), 0 to 400; y-axis: Survival, 0.0 to 1.0.]
R Code
library(survival)
tongue <- read.csv("H:\\BMTRY_722_Summer2015\\Tongue.csv")
dat <- Surv(tongue$Time, tongue$Cens)
type <- tongue$Type
plot(survfit(dat ~ type), conf.int = T, col = c(1, 2), lty = c(2, 1),
     lwd = c(2, 2), xlab = "Time to Death (months)", ylab = "Survival",
     cex.axis = 0.9)
legend(300, .9, c("Aneuploid", "Diploid"), lty = c(2, 1), col = c(1, 2),
       lwd = c(2, 2), cex = 0.8)
Add CIs: “plain”
Just Diploid Tumors
R Code
fit.lin <- survfit(dat[type == 2] ~ 1, conf.type = "plain")
fit.log <- survfit(dat[type == 2] ~ 1, conf.type = "log")
fit.loglog <- survfit(dat[type == 2] ~ 1, conf.type = "log-log")
plot(fit.log, conf.int = T, col = 2, lwd = 2, lty = 4)
lines(fit.loglog, conf.int = T, col = 3, lwd = 2, lty = 2)
lines(fit.lin, conf.int = T, col = 1, lwd = 2, lty = 1)
legend(165, 1, c("Linear", "Log", "Log-Log"), col = c(1, 2, 3),
       lty = c(1, 4, 2), lwd = 2)
summary(fit.lin)
summary(fit.log)
summary(fit.loglog)
R Results: Linear
> summary(fit.lin)
Call: survfit(formula = dat[type == 2] ~ 1, conf.type = "plain")

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1     28       1   0.9643  0.0351      0.89555        1.000
    3     27       1   0.9286  0.0487      0.83318        1.000
    4     26       1   0.8929  0.0585      0.77829        1.000
    5     25       2   0.8214  0.0724      0.67957        0.963
    8     23       1   0.7857  0.0775      0.63373        0.938
   12     21       1   0.7483  0.0824      0.58683        0.910
   13     20       1   0.7109  0.0863      0.54165        0.880
   18     19       1   0.6735  0.0895      0.49797        0.849
  …
   62     12       1   0.4116  0.0948      0.22581        0.597
   69     10       1   0.3704  0.0938      0.18654        0.554
  104      8       2   0.2778  0.0904      0.10069        0.455
  112      5       1   0.2222  0.0877      0.05031        0.394
  129      4       1   0.1667  0.0815      0.00692        0.326
  181      2       1   0.0833  0.0717      0.00000        0.224
R Results: Log
> summary(fit.log)
Call: survfit(formula = dat[type == 2] ~ 1, conf.type = "log")

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1     28       1   0.9643  0.0351       0.8979        1.000
    3     27       1   0.9286  0.0487       0.8379        1.000
    4     26       1   0.8929  0.0585       0.7853        1.000
    5     25       2   0.8214  0.0724       0.6911        0.976
    8     23       1   0.7857  0.0775       0.6475        0.953
   12     21       1   0.7483  0.0824       0.6031        0.929
   13     20       1   0.7109  0.0863       0.5603        0.902
   18     19       1   0.6735  0.0895       0.5190        0.874
  …
   62     12       1   0.4116  0.0948       0.2621        0.646
   69     10       1   0.3704  0.0938       0.2255        0.608
  104      8       2   0.2778  0.0904       0.1468        0.526
  112      5       1   0.2222  0.0877       0.1025        0.482
  129      4       1   0.1667  0.0815       0.0639        0.435
  181      2       1   0.0833  0.0717       0.0155        0.449
R Results: Log-Log
> summary(fit.loglog)
Call: survfit(formula = dat[type == 2] ~ 1, conf.type = "log-log")

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1     28       1   0.9643  0.0351      0.77244        0.995
    3     27       1   0.9286  0.0487      0.74348        0.982
    4     26       1   0.8929  0.0585      0.70356        0.964
    5     25       2   0.8214  0.0724      0.62296        0.921
    8     23       1   0.7857  0.0775      0.58401        0.898
   12     21       1   0.7483  0.0824      0.54320        0.871
   13     20       1   0.7109  0.0863      0.50381        0.844
   18     19       1   0.6735  0.0895      0.46569        0.815
  …
   62     12       1   0.4116  0.0948      0.22854        0.586
   69     10       1   0.3704  0.0938      0.19454        0.547
  104      8       2   0.2778  0.0904      0.12160        0.459
  112      5       1   0.2222  0.0877      0.08081        0.407
  129      4       1   0.1667  0.0815      0.04693        0.350
  181      2       1   0.0833  0.0717      0.00748        0.283