Jinheum Kim (jinhkim@suwon.ac.kr) Department of Applied Statistics University of Suwon

Post on 06-Jan-2016

46 views 0 download

description

Quick Overview to Survival Analysis. Jinheum Kim (jinhkim@suwon.ac.kr) Department of Applied Statistics University of Suwon 2007. 6. 2. Outline. Survival data Censoring & Truncation Survivor function & Hazard function Kaplan-Meier estimator Log-rank test - PowerPoint PPT Presentation

Transcript of Jinheum Kim (jinhkim@suwon.ac.kr) Department of Applied Statistics University of Suwon

Jinheum KimJinheum Kim

(*******@suwon.ac.kr)(*******@suwon.ac.kr)

Department of Applied StatisticsDepartment of Applied Statistics

University of SuwonUniversity of Suwon

2007. 6. 22007. 6. 2

Jinheum KimJinheum Kim

(*******@suwon.ac.kr)(*******@suwon.ac.kr)

Department of Applied StatisticsDepartment of Applied Statistics

University of SuwonUniversity of Suwon

2007. 6. 22007. 6. 2

2

Outline

Survival data

Censoring & Truncation

Survivor function & Hazard function

Kaplan-Meier estimator

Log-rank test

Cox proportional hazards model

Illustration with an example

3

What is survival analysis?

Outcome variable: Time until an event occurs

Time origin (eg, birth date, occurrence of entry into a study or diagnosis of a disease)

Time (eg, years, months, weeks or days)

Event (eg, death, disease incidence, relapse from remission…)

( )T

4

Censoring

Censoring: Don’t know survival time exactly

Why censoring may occur?

No event before the study ends

Lost to follow-up

Withdrawn from the study

5

A hypothetical example

Study end

Study end

3T

12T

3.5T

8T

6T

3.5T

Withdrawn

Lost

2 4 6 8 10 12Weeks

A

B

C

D

E

F

6

Another types of censoring

Left censoring: , observed to fail prior to

Eg, Time to first use of marijuana

Q: When did you first use marijuana?

A: exact age, “I never used it.” or “I have used it but can

not recall just when the first was.”

[0, )T c

[0, )T c c

Double censoring

Interval censoring:

Eg, Time to cosmetic deterioration of breast

cancer patients

( , )T a b

7

Censoring vs. Truncation

When occurs?

Only those individuals whose event time lies within a certain observational window are observed( , )L RY Y

In contrast to censoring where there is at least partial information on each subject

Left truncationWhen Eg, Life lengths of elderly residents of a retirement community

Right truncation When Eg, Waiting time from infection at transfusion to clinical onset of AIDS

(sampled on June 30, 1986)

RY

0LY

8

Illustration

Data on 137 bone marrow transplant patients

Risk factors: patient and donor age, sex, and CMV status, waiting time from diagnosis to transplantation, FAB, MTX

Three groups: AML low risk(54), AML high risk(45), ALL(38)

Survival times

: time(in days) to death or end of study

: disease-free survival time(time to relapse, death or end of study)

: time to acute GvHD

: time to chronic GvHD

: time to return of platelets to normal levels

1T

2T

AT

CT

PT

9

Simplified recovery process from BMT

TRANSPLANT

Relapse

acute GvHD

Death

plateletrecovery

acute GvHD

plateletrecovery

10

Survivor function

(definition)

Probability that a person survives longer than

( ) ( )S t P T t t

(properties)

§ Non-increasing

§

§ Eventually nobody would survive

(0) 1S

( ) 0S

11

Hazard function

(definition)

instantaneous potential per unit time for the event

to occur , given that the individual has survived up to

0

( | )( ) lim

t

P t T t t T th t

t

t

(properties)

§ Non-negative

§ No upper bound

12

vs.

Focus on not failing vs. failing

The higher is, the smaller is

( )S t ( )h t

( )S t ( )h t

Directly describe survival vs. insight about conditional

failure rates

(relational formula)

or 0( ) exp ( )

tS t h u du

( ) /( )

( )

dS t dth t

S t

13

Three goals of survival analysis

Estimate survivor and/or hazard functions

Compare survivor and/or hazard functions

Assess the relationship of explanatory variables to survival time

14

Kaplan-Meier estimator

(Distinct) observed survival times:

, conventionally 1 2 kt t t 0 10, kt t

: # of individuals fail at ( 0, , )jd j k jt

: # of individuals censored in jm 1[ , )j jt t

: # of individuals at risk just prior to

( ) ( )j j j k kn m d m d

jt

multiplying (1-observed proportion

of failures) at each survival times |

ˆ( ) 1j

j

j t t j

dS t

n

t

15

Remarks on

Never reduce to zero if Not defined for

>(largest time recorded)

ˆ( )S t

0km t

(estimated asymptotic variance)

: Greenwood’s formula 2

|

ˆ ˆvar( ( )) ( )( )

j

j

j t t j j j

dS t S t

n n d

Pointwise 95% confidence interval for :

linear and symmetrical, but possibly lies out of (0,1) and low coverage rate with very small samples

( )S t ˆ ˆ( ) 1.96 ( ( ))S t se S t

Life table estimator: used for the survival data grouped into convenient intervals

Nelsen-Aalen estimator for cumulative hazard function :

( )t

|

ˆ ( )j

j

j t t j

dt

n

16

Illustration (revisited)

Survival time=time to relapse, death or end of study,

i.e, disease-free time

Estimated disease-free survival curves:

AML low risk>ALL>AML high risk

Estimated cumulative hazard rates

17

Survival curves for three disease groups

18

CI of survival for ALL group

19

CI of survival for high-risk AML group

20

CI of survival for low-risk AML group

21

Comparison of survivor functions

Test whether or not the survivor functions for two groups are equivalent

: (distinct) survival times by pooling all the sample from two groups

1 2 kt t t

: (observed ) # of failures at in group,

ijd jt2

1j ijid d

: # of individuals at risk just prior to in group

ijnjt

2

1j ijin n

1,2; 1, ,i j k

i

22

Comparison of survivor functions

Idea: Based on 1 1

k kiji ij j ij ijj j

j

nZ d d O E

n

Log-rank statistic

§ under

§ If reject a test for equality of the survivor

functions at level

21 12 2

1 1 11 1( ) 1

1

k k j j j jj j jj j

j j j

n n n dX O E d

n n n

0H

2 21 ( ),X

23

Remarks for log-rank test

Choice of weight function: , specially for

log-rank test

( )jw t ( ) 1jw t

Extension to three or more groups

Stratification on a set of covariates

Trend test for ordered alternatives

: plugging in any set of scores

24

Illustration (revisited)

Test that the disease-free survival curves of three

groups are same over 2,204

( ) 1jw t Three types of test statistic

Log-rank: 13.8037 (p-value=0.0010) with

Gehan: 16.2407 (p-value=0.0003) with

Taron-Ware: 15.6529 (p-value=0.0004) with

highly significant!

( ) 1jw t

( )j jw t n

( )j jw t n

t

25

Cox proportional hazards model

Why regression models need? To predict covariates(or explanatory variables, risk factors) for time to event

Data: , , ; min( , ), ( ), 1,2, ,j j j j j j j j jt z t x c I t x j n

Cox model : hazard rate at for

an individual with risk vector

'0( | ) ( )exp( )h t z h t z tz

A sort of semiparametric model parametrically for

the covariate effect + nonparametrically for baseline hazard function

Why PH is called?

RR(or HR)= is constant against **

( | )exp ( )

( | ) k k k

h t zz z

h t z t

26

Illustration (revisited)

Background: to adjust the comparisons of the three risk groups because this was not a randomized clinical trial

Fixed risk factors =1 if AML low-risk, =1 if AML high-risk

=waiting time

=FAB

=MTX

=1 if donor: male; =1 if patient: male;

=1 if donor & patient: male

=1 if donor: CMV positive; =1 if patient: CMV positive;

=1 if donor & patient: CMV positive

=donor age-28; =patient age-28;

1z 2z

3z

4z

5z

6z 7z

8 6 7z z z

9z 10z

11 9 10z z z

12z 13z 14 12 13z z z

27

ANOVA table for final model (fixed only)

0.265 1.24 0.363 -0.404 1

0.002 9.48 0.354 -1.091 1

p-ValueWald

Chi SquareSE(b)b

Degrees ofFreedom

1Z

2Z

0.001 11.01 0.001 0.003 1

0.728 0.12 0.020 0.007 1

0.831 0.05 0.018 0.004 112Z

13Z

14Z

0.279 0.003 9.03 0.837 14Z

28

Other regression models

Additive hazards model:

Accelerated failure time model:

Focus on direct relationship between and time to event

Effect of covariates is multiplicative on rather then on

hazard function

Parametric, but providing a good fit if correctly chosen

0( | ) ( )h t z h t z

logT z

z

t

29

Refinements of Cox model

Stratification

§ When the PH assumption is violated for some covariate

§'

0( | ) ( ) exp( ), 1,2, ,j jh t z h t z j s

Time-dependent covariates

§ Eg, BP, cholesterol, size of the tumor …

§ '0( | ( )) ( ) exp( ( ))h t z t h t z t

30

Illustration (revisited)

Time-dependent covariates

Whether or not aGvHD occurs at time NS!

Whether or not cGvHD occurs at time NS!

Whether or not the platelets recovered at time Significant!

t t

t

Final risk factors

Fixed-time effects: Disease group, FAB, Age

Time-dependent effect: Platelet recovery

Time-dependent interactions: Disease group Platelet recovery,

Age Platelet recovery, FAB Platelet recovery

31

Three regressions with a time-dependent covariate

0.0006 11.8657 0.3280 -1.1297 1

0.1542 2.0306 0.2676 0.3813 1

0.0862 2.9435 0.2892 -0.4962 1

0.4982 0.4588 0.2876 -0.1948 1

0.1732 1.8548 0.2685 0.3657 1

0.0356 4.4163 0.2962 -0.6225 1

0.2642 1.2470 0.2851 0.3184 1

0.1110 2.5400 0.2722 0.4338 1

0.0554 3.6690 0.2880 -0.5516 1

p-ValueWald

Chi SquareSE(b)b

Degrees ofFreedom

1Z

2Z( )AZ t

1Z

2Z( )CZ t

1Z

2Z

( )PZ t

32

ANOVA table for final model (fixed+time-variant)

0.1814 1.786 0.0020 0.0026 1

0.0072 7.229 0.0434 0.1166 1 Donor age -28

0.0048 7.948 0.0545 -0.1538 1 Patient age -28

0.2676 1.229 1.1139 -1.2348 1

AML with FAB

Grade 4 or 5

0.3658 0.818 1.2242 1.1071 1 AML high risk

0.1103 2.550 0.8186 1.3073 1 AML low risk

p-ValueWald

Chi SquareSE(b)b

Degrees ofFreedom

1 :Z

2 :Z

3 :Z

4 :Z

5 :Z

6 4 5Z Z Z 0.6589 0.195 0.6936 -0.3062 1 Platelet Recovery( ) :pZ t

8 2 ( )pZ Z Z t 0.1479 2.093 1.2908 -1.8675 1

0.0010 10.765 0.9257 -3.0374 17 1 ( )pZ Z Z t

0.0346 4.467 1.1609 2.4535 19 3 ( )pZ Z Z t

0.9561 0.003 0.0023 0.0001 1

0.0022 9.383 0.0480 -0.1470 1

0.0010 10.821 0.0588 0.1933 110 4 ( )pZ Z Z t 11 5 ( )pZ Z Z t

12 6 ( )pZ Z Z t

33

Tests of PH assumption

FactorWald

Chi SquareDegrees ofFreedom

p-Value

Disease group 1.735 2 0.4200

Waiting time 0.005 1 0.9441

FAB 0.444 1 0.5051

MTX 4.322 1 0.0376

Sex 0.220 3 0.9743

CMV status 1.687 3 0.6398

Age 4.759 3 0.1903

34

What did we overview so far?

How to define survival data

Censoring vs. truncation

Survivor function vs. hazard function and their relation

How to estimate the survival function: KM estimator

How to compare survival functions: Log-rank test

How to estimate risk factors: Cox proportional hazards model with fixed and/or time-dependent covariates

Illustrations with BMT data

35

THANK YOU!