Quick Overview to Survival Analysis

Jinheum Kim (jinhkim@suwon.ac.kr), Department of Applied Statistics, University of Suwon

2007. 6. 2

Page 1: Jinheum Kim (jinhkim@suwon.ac.kr) Department of Applied Statistics University of Suwon

Jinheum Kim

(jinhkim@suwon.ac.kr)

Department of Applied Statistics

University of Suwon

2007. 6. 2

Page 2

Outline

Survival data

Censoring & Truncation

Survivor function & Hazard function

Kaplan-Meier estimator

Log-rank test

Cox proportional hazards model

Illustration with an example

Page 3

What is survival analysis?

Outcome variable ($T$): time until an event occurs

Time origin (e.g., birth date, entry into a study, or diagnosis of a disease)

Time scale (e.g., years, months, weeks, or days)

Event (e.g., death, disease incidence, relapse from remission, ...)

Page 4

Censoring

Censoring: the survival time is not known exactly

Why may censoring occur?

No event before the study ends

Lost to follow-up

Withdrawn from the study
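A minimal sketch of how a right-censored observation is usually recorded: the observed time is the smaller of the event time and the censoring time, together with an indicator of whether the event was seen. The numbers and the names `event_time`, `censor_time`, and `observe` are hypothetical and serve only as an illustration.

```python
# Hypothetical latent event times and censoring times, in weeks (illustration only).
event_time = [5.0, 14.0, 9.0, 3.5]
censor_time = [12.0, 12.0, 3.5, 8.0]


def observe(x, c):
    """Return (observed time, event indicator) under right censoring."""
    t = min(x, c)               # follow-up stops at whichever comes first
    delta = 1 if x <= c else 0  # 1 = event observed, 0 = censored
    return t, delta


observed = [observe(x, c) for x, c in zip(event_time, censor_time)]
for t, delta in observed:
    print(t, "event" if delta else "censored")
```

Only the pair (time, indicator) is available to the analyst; the latent event time of a censored subject is never seen.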

Page 5

A hypothetical example

[Figure: follow-up timelines for six subjects (A to F) on a weekly axis from 2 to 12 weeks. Labeled times: 3, 12, 3.5, 8, 6, and 3.5 weeks. Censoring reasons marked: study end (twice), withdrawn, and lost to follow-up.]

Page 6

Other types of censoring

Left censoring: $T \in [0, c)$, i.e., the event is known to have occurred prior to $c$

E.g., time to first use of marijuana

Q: When did you first use marijuana?

A: an exact age, “I never used it,” or “I have used it but cannot recall just when the first time was.”

Double censoring

Interval censoring: $T \in (a, b)$

E.g., time to cosmetic deterioration in breast cancer patients

Page 7

Censoring vs. Truncation

When does truncation occur?

Only those individuals whose event time lies within a certain observational window $(Y_L, Y_R)$ are observed

In contrast to censoring, where there is at least partial information on each subject

Left truncation: when $Y_R = \infty$. E.g., life lengths of elderly residents of a retirement community

Right truncation: when $Y_L = 0$. E.g., waiting time from infection at transfusion to clinical onset of AIDS (sampled on June 30, 1986)

Page 8

Illustration

Data on 137 bone marrow transplant patients

Risk factors: patient and donor age, sex, and CMV status; waiting time from diagnosis to transplantation; FAB; MTX

Three groups: AML low risk (54), AML high risk (45), ALL (38)

Survival times

$T_1$: time (in days) to death or end of study

$T_2$: disease-free survival time (time to relapse, death, or end of study)

$T_A$: time to acute GvHD

$T_C$: time to chronic GvHD

$T_P$: time to return of platelets to normal levels

Page 9

Simplified recovery process from BMT

[Figure: state diagram of the recovery process after transplant, with states TRANSPLANT, acute GvHD, platelet recovery, relapse, and death.]

Page 10

Survivor function

(definition)

Probability that a person survives longer than $t$:

$S(t) = P(T > t)$

(properties)

§ Non-increasing

§ $S(0) = 1$

§ $S(\infty) = 0$: eventually nobody would survive

Page 11

Hazard function

(definition)

Instantaneous potential per unit time for the event to occur, given that the individual has survived up to $t$:

$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$

(properties)

§ Non-negative

§ No upper bound

Page 12

$S(t)$ vs. $h(t)$

Focus on not failing vs. failing

The higher $S(t)$ is, the smaller $h(t)$ is

Directly describes survival vs. gives insight about conditional failure rates

(relational formula)

$S(t) = \exp\left(-\int_0^t h(u)\,du\right)$ or $h(t) = -\frac{dS(t)/dt}{S(t)}$
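The two relational formulas can be checked numerically. The sketch below assumes a Weibull-type hazard with hypothetical parameters `LAM` and `K` (not taken from the slides), for which the survivor function is known in closed form, and recovers each function from the other.

```python
import math

LAM, K = 0.1, 1.5  # hypothetical Weibull parameters, chosen only for illustration


def hazard(t):
    """h(t) = LAM * K * t^(K-1)."""
    return LAM * K * t ** (K - 1)


def survivor_exact(t):
    """Closed-form survivor function S(t) = exp(-LAM * t^K) for this hazard."""
    return math.exp(-LAM * t ** K)


def survivor_from_hazard(t, steps=20000):
    """S(t) = exp(-integral_0^t h(u) du), with the integral by the trapezoidal rule."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    total += sum(hazard(i * width) for i in range(1, steps))
    return math.exp(-total * width)


def hazard_from_survivor(t, eps=1e-6):
    """h(t) = -(dS/dt) / S(t), with the derivative by a central difference."""
    slope = (survivor_exact(t + eps) - survivor_exact(t - eps)) / (2 * eps)
    return -slope / survivor_exact(t)


print(survivor_exact(4.0), survivor_from_hazard(4.0))
print(hazard(4.0), hazard_from_survivor(4.0))
```

Each printed pair should agree to several decimal places, which is what the relational formula claims.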

Page 13

Three goals of survival analysis

Estimate survivor and/or hazard functions

Compare survivor and/or hazard functions

Assess the relationship of explanatory variables to survival time

Page 14

Kaplan-Meier estimator

(Distinct) observed survival times: $t_1 < t_2 < \cdots < t_k$, conventionally $t_0 = 0$, $t_{k+1} = \infty$

$d_j$: # of individuals who fail at $t_j$ $(j = 0, \ldots, k)$

$m_j$: # of individuals censored in $[t_j, t_{j+1})$

$n_j = (m_j + d_j) + \cdots + (m_k + d_k)$: # of individuals at risk just prior to $t_j$

$\hat{S}(t) = \prod_{j: t_j \le t} \left(1 - \frac{d_j}{n_j}\right)$: multiplying (1 - observed proportion of failures) at each survival time
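The product formula can be sketched in a few lines. This is a minimal illustration, not the software used for the talk; the function name `kaplan_meier` and the six observations are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t_j, n_j, d_j, S_hat(t_j))] at each distinct failure time.

    times:  observed times (failure or censoring)
    events: 1 = failure observed, 0 = censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    rows = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = m = 0
        # Count failures (d) and censorings (m) tied at time t.
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            m += 1 - data[i][1]
            i += 1
        if d > 0:
            surv *= 1.0 - d / n_at_risk  # multiply by (1 - d_j / n_j)
            rows.append((t, n_at_risk, d, surv))
        n_at_risk -= d + m  # everyone observed at t leaves the risk set
    return rows


# Hypothetical data: six subjects, two of them censored.
rows = kaplan_meier([3, 5, 5, 8, 10, 12], [1, 1, 0, 1, 0, 1])
for t, n, d, s in rows:
    print(t, n, d, round(s, 4))
```

A subject censored at the same time as a failure is counted as still at risk at that time, which matches the definition of $n_j$ as the number at risk just prior to $t_j$.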

Page 15

Remarks on $\hat{S}(t)$

Never reduces to zero if $m_k > 0$

Not defined for $t$ > (largest time recorded)

(estimated asymptotic variance)

$\widehat{\mathrm{var}}(\hat{S}(t)) = \hat{S}(t)^2 \sum_{j: t_j \le t} \frac{d_j}{n_j (n_j - d_j)}$: Greenwood’s formula

Pointwise 95% confidence interval for $S(t)$: $\hat{S}(t) \pm 1.96\, se(\hat{S}(t))$

Linear and symmetrical, but it may lie outside (0, 1) and has a low coverage rate with very small samples

Life table estimator: used for survival data grouped into convenient intervals

Nelson-Aalen estimator for the cumulative hazard function $\Lambda(t)$: $\hat{\Lambda}(t) = \sum_{j: t_j \le t} \frac{d_j}{n_j}$
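As a rough illustration of these remarks, the sketch below computes the Greenwood standard error, the linear 95% interval, and the Nelson-Aalen estimate from a small table of counts. The counts and the name `km_with_greenwood` are hypothetical.

```python
import math

# (t_j, n_j, d_j) at each distinct failure time; hypothetical counts.
# A final time at which every remaining subject fails (n_j == d_j) is left out,
# because the Greenwood term d_j / (n_j * (n_j - d_j)) is undefined there.
table = [(3, 6, 1), (5, 5, 1), (8, 3, 1)]


def km_with_greenwood(table, z=1.96):
    surv, gw_sum, cum_haz = 1.0, 0.0, 0.0
    out = []
    for t, n, d in table:
        surv *= 1.0 - d / n            # Kaplan-Meier product
        gw_sum += d / (n * (n - d))    # running Greenwood sum
        cum_haz += d / n               # Nelson-Aalen cumulative hazard
        se = surv * math.sqrt(gw_sum)  # se = S_hat * sqrt(Greenwood sum)
        out.append({"t": t, "S": surv, "se": se,
                    "lower": surv - z * se, "upper": surv + z * se,
                    "H": cum_haz})
    return out


out = km_with_greenwood(table)
for row in out:
    print({key: round(value, 4) for key, value in row.items()})
```

With so few subjects, the upper limit at the first failure time exceeds 1, which is the drawback of the linear interval noted above.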

Page 16

Illustration (revisited)

Survival time = time to relapse, death, or end of study, i.e., disease-free survival time

Estimated disease-free survival curves:

AML low risk>ALL>AML high risk

Estimated cumulative hazard rates

Page 17

Survival curves for three disease groups

Page 18

CI of survival for ALL group

Page 19

CI of survival for high-risk AML group

Page 20

CI of survival for low-risk AML group

Page 21

Comparison of survivor functions

Test whether or not the survivor functions for two groups are equivalent

$t_1 < t_2 < \cdots < t_k$: (distinct) survival times obtained by pooling the samples from the two groups

$d_{ij}$: (observed) # of failures at $t_j$ in group $i$, $d_j = \sum_{i=1}^{2} d_{ij}$

$n_{ij}$: # of individuals at risk just prior to $t_j$ in group $i$, $n_j = \sum_{i=1}^{2} n_{ij}$

$i = 1, 2$; $j = 1, \ldots, k$

Page 22

Comparison of survivor functions

Idea: based on $Z_i = \sum_{j=1}^{k} d_{ij} - \sum_{j=1}^{k} d_j \frac{n_{ij}}{n_j} = O_i - E_i$

Log-rank statistic

$X^2 = \frac{(O_1 - E_1)^2}{\sum_{j=1}^{k} \frac{n_{1j}}{n_j}\left(1 - \frac{n_{1j}}{n_j}\right)\left(\frac{n_j - d_j}{n_j - 1}\right) d_j}$

§ $X^2 \sim \chi^2_1$ under $H_0$

§ If $X^2 > \chi^2_1(\alpha)$, reject equality of the survivor functions at level $\alpha$
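A minimal sketch of the two-group log-rank computation, written directly from the observed-minus-expected idea. The function name `logrank_two_groups` and the eight observations are hypothetical; the p-value uses the identity that a chi-square variable with 1 degree of freedom is a squared standard normal.

```python
import math


def logrank_two_groups(times1, events1, times2, events2):
    """Return (O1, E1, X2, p) for the two-sample log-rank test."""
    pooled = [(t, e, 1) for t, e in zip(times1, events1)]
    pooled += [(t, e, 2) for t, e in zip(times2, events2)]
    fail_times = sorted({t for t, e, _ in pooled if e == 1})
    o1 = e1 = var = 0.0
    for tj in fail_times:
        n1 = sum(1 for t, _, g in pooled if t >= tj and g == 1)  # at risk, group 1
        n = sum(1 for t, _, _ in pooled if t >= tj)              # at risk, pooled
        d1 = sum(1 for t, e, g in pooled if t == tj and e == 1 and g == 1)
        d = sum(1 for t, e, _ in pooled if t == tj and e == 1)
        o1 += d1
        e1 += d * n1 / n
        if n > 1:  # the variance term is zero when only one subject is at risk
            var += (n1 / n) * (1 - n1 / n) * ((n - d) / (n - 1)) * d
    x2 = (o1 - e1) ** 2 / var
    p = math.erfc(math.sqrt(x2 / 2.0))  # upper tail of chi-square with 1 df
    return o1, e1, x2, p


# Hypothetical data: (time, event) for two small groups.
o1, e1, x2, p = logrank_two_groups([2, 3, 5, 7], [1, 1, 1, 0],
                                   [4, 6, 8, 9], [1, 0, 1, 1])
print(o1, round(e1, 4), round(x2, 4), round(p, 4))
```

The sketch recomputes the risk sets at every failure time for readability; a production implementation would sort once and update the counts incrementally.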

Page 23

Remarks for log-rank test

Choice of weight function $w(t_j)$: in particular, $w(t_j) = 1$ for the log-rank test

Extension to three or more groups

Stratification on a set of covariates

Trend test for ordered alternatives: plugging in any set of scores

Page 24

Illustration (revisited)

Test that the disease-free survival curves of the three groups are the same over $t \le 2{,}204$

Three types of test statistic

Log-rank: 13.8037 (p-value=0.0010) with $w(t_j) = 1$

Gehan: 16.2407 (p-value=0.0003) with $w(t_j) = n_j$

Tarone-Ware: 15.6529 (p-value=0.0004) with $w(t_j) = \sqrt{n_j}$

Highly significant!
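The reported p-values can be reproduced under the assumption that each statistic is referred to a chi-square distribution with 2 degrees of freedom (three groups minus one), whose upper tail has the closed form $\exp(-x/2)$. The helper name `chi2_sf_2df` is hypothetical.

```python
import math


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


# Test statistics as reported on the slide.
statistics = {"Log-rank": 13.8037, "Gehan": 16.2407, "Tarone-Ware": 15.6529}
p_values = {name: round(chi2_sf_2df(x), 4) for name, x in statistics.items()}
print(p_values)
```

Rounded to four decimals, the results agree with the p-values quoted for the three tests.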

Page 25

Cox proportional hazards model

Why are regression models needed? To identify covariates (explanatory variables, risk factors) that predict time to event

Data: $(t_j, \delta_j, z_j)$; $t_j = \min(x_j, c_j)$, $\delta_j = I(t_j = x_j)$, $j = 1, 2, \ldots, n$

Cox model: $h(t \mid z) = h_0(t) \exp(\beta' z)$, the hazard rate at $t$ for an individual with risk vector $z$

A sort of semiparametric model: parametric for the covariate effect + nonparametric for the baseline hazard function

Why is it called PH?

RR (or HR) $= \frac{h(t \mid z)}{h(t \mid z^*)} = \exp\left(\sum_k \beta_k (z_k - z_k^*)\right)$ is constant against $t$
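The proportional hazards property can be illustrated numerically. In the sketch below the coefficients `BETA`, the baseline hazard, and the two covariate vectors are all hypothetical; the point is only that the ratio of the two hazards does not change with time.

```python
import math

BETA = [0.7, -0.3]  # hypothetical regression coefficients


def baseline_hazard(t):
    """Hypothetical baseline hazard h_0(t); any positive function would do."""
    return 0.05 * t ** 0.5


def cox_hazard(t, z):
    """h(t | z) = h_0(t) * exp(beta' z)."""
    return baseline_hazard(t) * math.exp(sum(b * zk for b, zk in zip(BETA, z)))


def hazard_ratio(z, z_star):
    """exp(sum_k beta_k * (z_k - z*_k)), which does not involve t."""
    return math.exp(sum(b * (zk - zs) for b, zk, zs in zip(BETA, z, z_star)))


z, z_star = [1, 2.0], [0, 1.0]
ratios = [cox_hazard(t, z) / cox_hazard(t, z_star) for t in (0.5, 1.0, 5.0, 20.0)]
print(ratios, hazard_ratio(z, z_star))
```

The baseline hazard cancels in the ratio, so every entry of `ratios` equals the closed-form hazard ratio.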

Page 26

Illustration (revisited)

Background: the comparisons of the three risk groups need to be adjusted because this was not a randomized clinical trial

Fixed risk factors

$z_1$ = 1 if AML low-risk; $z_2$ = 1 if AML high-risk

$z_3$ = waiting time

$z_4$ = FAB

$z_5$ = MTX

$z_6$ = 1 if donor: male; $z_7$ = 1 if patient: male; $z_8 = z_6 \times z_7$ = 1 if donor & patient: male

$z_9$ = 1 if donor: CMV positive; $z_{10}$ = 1 if patient: CMV positive; $z_{11} = z_9 \times z_{10}$ = 1 if donor & patient: CMV positive

$z_{12}$ = donor age - 28; $z_{13}$ = patient age - 28; $z_{14} = z_{12} \times z_{13}$

Page 27

ANOVA table for final model (fixed only)

Variable    Degrees of Freedom    b         SE(b)    Wald Chi Square    p-Value
$Z_1$       1                     -1.091    0.354    9.48               0.002
$Z_2$       1                     -0.404    0.363    1.24               0.265
$Z_4$       1                     0.837     0.279    9.03               0.003
$Z_{12}$    1                     0.004     0.018    0.05               0.831
$Z_{13}$    1                     0.007     0.020    0.12               0.728
$Z_{14}$    1                     0.003     0.001    11.01              0.001
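The columns of this table are linked by simple arithmetic: the Wald chi-square is $(b / SE(b))^2$, its p-value comes from a chi-square distribution with 1 degree of freedom, and $\exp(b)$ is the estimated hazard ratio. A small sketch for the row with $b = -1.091$ and $SE(b) = 0.354$ follows; the helper name `wald_summary` is hypothetical.

```python
import math


def wald_summary(b, se):
    """Return (Wald chi-square, p-value with 1 df, hazard ratio exp(b))."""
    wald = (b / se) ** 2
    p = math.erfc(math.sqrt(wald / 2.0))  # upper tail of chi-square with 1 df
    return wald, p, math.exp(b)


wald, p_value, hr = wald_summary(-1.091, 0.354)
print(round(wald, 2), round(p_value, 3), round(hr, 3))
```

The recomputed statistic (about 9.50) differs slightly from the tabulated 9.48, presumably because the table rounds $b$ and $SE(b)$ to three decimals; the p-value still rounds to 0.002.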

Page 28

Other regression models

Additive hazards model: $h(t \mid z) = h_0(t) + \beta' z$

Accelerated failure time model: $\log T = \beta' z + \varepsilon$

Focus on the direct relationship between $z$ and time to event

Effect of covariates is multiplicative on $t$ rather than on the hazard function

Parametric, but provides a good fit if the distribution is correctly chosen

Page 29

Refinements of Cox model

Stratification

§ When the PH assumption is violated for some covariate

§ $h_j(t \mid z) = h_{0j}(t) \exp(\beta' z)$, $j = 1, 2, \ldots, s$

Time-dependent covariates

§ E.g., BP, cholesterol, size of the tumor, ...

§ $h(t \mid z(t)) = h_0(t) \exp(\beta' z(t))$

Page 30

Illustration (revisited)

Time-dependent covariates

Whether or not aGvHD occurs at time $t$: NS (not significant)!

Whether or not cGvHD occurs at time $t$: NS!

Whether or not the platelets recovered at time $t$: Significant!

Final risk factors

Fixed-time effects: Disease group, FAB, Age

Time-dependent effect: Platelet recovery

Time-dependent interactions: Disease group × Platelet recovery, Age × Platelet recovery, FAB × Platelet recovery
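One common way to feed a time-dependent indicator such as platelet recovery into Cox regression software is to split each subject's follow-up into (start, stop] intervals on which the covariate is constant. The sketch below assumes that layout; the function name `split_at_recovery` and the example values are hypothetical and are not taken from the BMT data.

```python
def split_at_recovery(follow_up, event, recovery_time=None):
    """Split one subject's follow-up into (start, stop, z_p, event) rows.

    z_p is 0 before platelet recovery and 1 afterwards; the event indicator
    is attached only to the last interval.
    """
    if recovery_time is None or recovery_time >= follow_up:
        # Platelets never recovered during follow-up: a single interval.
        return [(0.0, follow_up, 0, event)]
    return [
        (0.0, recovery_time, 0, 0),          # before recovery, no event yet
        (recovery_time, follow_up, 1, event),  # after recovery
    ]


# Hypothetical subject: platelets recover on day 20, relapse or death on day 100.
print(split_at_recovery(100.0, 1, 20.0))
# Hypothetical subject: censored on day 50 without platelet recovery.
print(split_at_recovery(50.0, 0))
```

Each subject contributes to a risk set with the covariate value in force at that time, which is what the model $h(t \mid z(t))$ requires.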

Page 31

Three regressions with a time-dependent covariate

Variable    Degrees of Freedom    b          SE(b)     Wald Chi Square    p-Value
$Z_1$       1                     -0.5516    0.2880    3.6690             0.0554
$Z_2$       1                     0.4338     0.2722    2.5400             0.1110
$Z_A(t)$    1                     0.3184     0.2851    1.2470             0.2642

$Z_1$       1                     -0.6225    0.2962    4.4163             0.0356
$Z_2$       1                     0.3657     0.2685    1.8548             0.1732
$Z_C(t)$    1                     -0.1948    0.2876    0.4588             0.4982

$Z_1$       1                     -0.4962    0.2892    2.9435             0.0862
$Z_2$       1                     0.3813     0.2676    2.0306             0.1542
$Z_P(t)$    1                     -1.1297    0.3280    11.8657            0.0006

Page 32

ANOVA table for final model (fixed+time-variant)

Variable                              Degrees of Freedom    b          SE(b)     Wald Chi Square    p-Value
$Z_1$: AML low risk                   1                     1.3073     0.8186    2.550              0.1103
$Z_2$: AML high risk                  1                     1.1071     1.2242    0.818              0.3658
$Z_3$: AML with FAB Grade 4 or 5      1                     -1.2348    1.1139    1.229              0.2676
$Z_4$: Patient age - 28               1                     -0.1538    0.0545    7.948              0.0048
$Z_5$: Donor age - 28                 1                     0.1166     0.0434    7.229              0.0072
$Z_6 = Z_4 \times Z_5$                1                     0.0026     0.0020    1.786              0.1814
$Z_P(t)$: Platelet recovery           1                     -0.3062    0.6936    0.195              0.6589
$Z_7 = Z_1 \times Z_P(t)$             1                     -3.0374    0.9257    10.765             0.0010
$Z_8 = Z_2 \times Z_P(t)$             1                     -1.8675    1.2908    2.093              0.1479
$Z_9 = Z_3 \times Z_P(t)$             1                     2.4535     1.1609    4.467              0.0346
$Z_{10} = Z_4 \times Z_P(t)$          1                     0.1933     0.0588    10.821             0.0010
$Z_{11} = Z_5 \times Z_P(t)$          1                     -0.1470    0.0480    9.383              0.0022
$Z_{12} = Z_6 \times Z_P(t)$          1                     0.0001     0.0023    0.003              0.9561

Page 33

Tests of PH assumption

Factor           Wald Chi Square    Degrees of Freedom    p-Value
Disease group    1.735              2                     0.4200
Waiting time     0.005              1                     0.9441
FAB              0.444              1                     0.5051
MTX              4.322              1                     0.0376
Sex              0.220              3                     0.9743
CMV status       1.687              3                     0.6398
Age              4.759              3                     0.1903

Page 34

What have we covered so far?

How to define survival data

Censoring vs. truncation

Survivor function vs. hazard function and their relation

How to estimate the survival function: KM estimator

How to compare survival functions: Log-rank test

How to assess the effects of risk factors: Cox proportional hazards model with fixed and/or time-dependent covariates

Illustrations with BMT data

Page 35

THANK YOU!