STT520-420: BIOSTATISTICS ANALYSIS Dr. Cuixian Chen Chapter 5: Censoring and Lifetables STT520-420...


STT520-420: BIOSTATISTICS ANALYSIS

Dr. Cuixian Chen

Chapter 5: Censoring and Lifetables


Censoring

An observation of a survival random variable (r.v.) is censored if we don’t know the survival time exactly. There are usually three possible reasons for censoring:

– the study ends before the event occurs;

– the subject is lost to follow-up during the study (e.g., they could have moved out of town);

– the subject withdraws from the study because of death (assuming death is not the event of interest, e.g., death in a car accident) or for some other reason.

These are all “right-censored” observations, since the event occurs to the right of (later than) the time at which we last observed the subject.


Right censored data


Examples of censored data


Right censored data


Right censored data representation

Representation #1: (6, 1), (6, 0), (8, 1), (10, 0), (14, 1);

Representation #2: 6, 6+, 8, 10+, 14.
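The two representations carry the same information, so converting between them is mechanical. A minimal base-R sketch, assuming the second coordinate in Representation #1 is coded 1 = exact and 0 = right-censored (the variable names `time`, `status`, and `plus_notation` are illustrative and not from the slides):

```r
# Representation #1: (time, status) pairs.
# Assumed coding: status 1 = exact, 0 = right-censored.
time   <- c(6, 6, 8, 10, 14)
status <- c(1, 0, 1, 0, 1)

# Representation #2: append "+" to the censored times.
plus_notation <- paste0(time, ifelse(status == 1, "", "+"))
print(plus_notation)

# Going back: a trailing "+" marks a censored observation.
status_back <- as.integer(!grepl("\\+$", plus_notation))
time_back   <- as.numeric(sub("\\+$", "", plus_notation))
```

The pair form is the one most software expects, while the "+" form is convenient for writing data by hand.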


Left censored data examples

Example 2: if you are studying menarche and you begin following girls at age 12, you may find that some of them have already begun menstruating. Unless you can obtain information about the start date for those girls, the age of menarche is left-censored at age 12. (From: Allison, Paul. Survival Analysis. SAS Institute, 1995.)


Double censoring

Def: If a dataset contains right-censored (RC) observations, left-censored (LC) observations, and exact/uncensored observations, but no strictly interval-censored observations, it is called double-censored data (DC data).

Case-I interval-censored data occur when each study subject is observed only once and the only observed info for the survival event of interest is whether the event has occurred no later than the observation time.


Interval censored data


Example 2: if you’re screening subjects for HIV infection yearly, you may not be able to determine the exact date of infection. (From: Allison, Paul. Survival Analysis. SAS Institute, 1995.)


Type I/II censoring

In type I censoring, the number of uncensored/exact observations is a random variable. E.g.: left-censored data, right-censored data, double-censored data, interval-censored data.

In type II censoring, on the other hand, the number of uncensored/exact observations is fixed in advance: only the first r < n lifetimes are observed (as in reliability engineering).
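The contrast between the two schemes can be sketched with simulated data. This is only an illustration: the exponential lifetimes, the stopping time `t_stop`, and the choice `r = 8` are all hypothetical, and type I censoring is shown in its common form of a fixed study end time.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 20
y <- rexp(n, rate = 0.1)         # hypothetical true lifetimes

# Type I: stop the study at a fixed time, so the number of
# exact observations is random.
t_stop <- 10
z1     <- pmin(y, t_stop)
delta1 <- as.integer(y <= t_stop)
n_exact_type1 <- sum(delta1)

# Type II: stop at the r-th failure, so exactly r lifetimes are observed.
r      <- 8
y_r    <- sort(y)[r]             # time of the r-th failure
z2     <- pmin(y, y_r)
delta2 <- as.integer(y <= y_r)
n_exact_type2 <- sum(delta2)     # equals r when there are no ties
```

Rerunning with a different seed changes `n_exact_type1` but never `n_exact_type2`.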


Definitions of Censoring and truncation


Left truncated data


Right truncated data


Truncated data example

With a given telescope, we can only detect a very distant stellar object which is brighter than some limiting flux:

– the object is left-truncated if it lies beyond detection by our telescope

– we can’t tell if the object is even there if we can’t see it.


Example: Types of censoring and truncation


About HIV and AIDS

HIV is a lot like other viruses, including those that cause the "flu" or the common cold. HIV can hide for long periods of time in the cells of your body.

Over time, HIV can destroy so many of your CD4 cells that your body can't fight infections and diseases anymore. When that happens, HIV infection can lead to AIDS.

AIDS is the final stage of HIV infection. People at this stage of HIV disease have badly damaged immune systems, which put them at risk for opportunistic infections (OIs).


Example: Types of censoring and truncation


Example: Types of censoring and truncation


Review--Right censored data representation

Representation #1: (6, 1), (6, 0), (8, 1), (10, 0), (14, 1);

Representation #2: 6, 6+, 8, 10+, 14.

Right Censored Model

We represent censored data as ordered pairs (Def. 5.2): Y_1, Y_2, …, Y_n are right-censored by t_1, t_2, …, t_n if the sample consists of the pairs (Z_i, δ_i), where Z_i = min(Y_i, t_i) and

$$\delta_i = \begin{cases} 1, & \text{if } Y_i \le t_i \quad \text{(uncensored/exact)} \\ 0, & \text{if } Y_i > t_i \quad \text{(censored)} \end{cases}$$

Note that t_i is the value of Z_i when the observation is censored, and Y_i is observed when it is uncensored.

Assume the Y’s are independent of the t’s.
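The right-censored model can be sketched in a few lines of R. The lifetime and censoring distributions below are hypothetical choices made only for illustration, and `tc` stands in for the censoring times t_i (renamed to avoid masking R's `t()` function):

```r
set.seed(520)                        # arbitrary seed
n  <- 8
Y  <- rexp(n, rate = 1 / 10)         # hypothetical lifetimes Y_i
tc <- runif(n, min = 5, max = 20)    # hypothetical censoring times t_i,
                                     # generated independently of Y

Z     <- pmin(Y, tc)                 # Z_i = min(Y_i, t_i)
delta <- as.integer(Y <= tc)         # 1 = uncensored/exact, 0 = censored

print(data.frame(Z = round(Z, 2), delta = delta))
```

In practice only `Z` and `delta` are available to the analyst; `Y` is hidden whenever `delta` is 0.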

Examples of censoring

Example 5.2 (Stanford Heart Transplant Data): note the form of the dataset, with Days = Z and Cens = delta; the other explanatory variables are Age and T5.

Note in Example 5.3 the notation of using a “+” sign to represent a right-censored observation; the survival variable is the survival time of astrocytoma patients until death resulting from the tumor.


Motivating example: leukemia


Go back to Exercise 4.5 on page 68. Note the censoring in the treatment group (with “+”) but not in the placebo group - what is the meaning of these censored observations?

Example 4.5, page 68

1. Now let’s write the data in Exercise 4.5 in a form that can be analyzed with various computer programs.

2. How many variables of interest are there? (We need a column for each variable.) How many observations are there? [I’ll propose ID, remission time, censor indicator, treatment group.]

3. Use Excel to organize the data, and then we’ll read it into R (or SAS later) for analysis.

4. Use read.csv(file=file.choose()) to get the data into R.
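The proposed layout (one row per subject, one column per variable) might look like the sketch below. The rows are made up purely to show the shape of the table; they are not the values from Exercise 4.5, and the column names and the 1/0 censor coding are assumptions.

```r
# Hypothetical rows only; these are NOT the values from Exercise 4.5.
leuk <- data.frame(
  id     = 1:4,
  time   = c(6, 6, 1, 2),        # remission time
  censor = c(1, 0, 1, 1),        # assumed coding: 1 = exact, 0 = right-censored
  group  = c("treatment", "treatment", "placebo", "placebo")
)

str(leuk)                         # one column per variable
table(leuk$group, leuk$censor)    # censoring pattern by group
```

A spreadsheet saved as .csv with these four column headers would read into R with the same structure.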


Example 4.5, page 68

## Example 4.5, page 68

## To read data from a *.csv file online:

data=read.csv("http://people.uncw.edu/chenc/STT520_420/dataset/EX4.5.csv", header = TRUE);

## Or read it from a local directory:

data=read.csv("E:/EX4.5.csv", header = TRUE);

## To write NEW_data into a *.csv file
## (write.csv has no header argument; column names are written by default):

write.csv(NEW_data, "Z:/Mydata.csv");

## Z: is your local timmy drive

Lifetable estimates: nonparametric estimation of the survival function with right-censored data

Section 5.4: Lifetable estimates

– Divide the lifetime axis into fixed disjoint intervals.

– Estimate the conditional probability of survival across each interval.

– Estimate S (the survival function) at the endpoints of the intervals.

The intervals of time are represented as

$$I_j = [a_{j-1}, a_j), \quad j = 1, 2, \ldots, k+1; \quad a_0 = 0; \quad a_{k+1} = \infty.$$

The choice of the endpoints is up to the data analyst.

In a lifetable, the number at risk in any interval is the number alive and under consideration (not censored) at the start of the interval. For any interval I_j, we write:

– N_j = number at risk in I_j;

– D_j = number of deaths (or observed failures) in I_j;

– W_j = number of observations censored in I_j.

Lifetable estimates

Note: N_1 = n; the total sample size is initially at risk.

N_j = N_{j-1} - D_{j-1} - W_{j-1}; this shows how those at risk in interval j-1 propagate to interval j.

Write: p_j = P(surviving through I_j | alive at start of I_j) = P(Y > a_j | Y > a_{j-1}) = S(a_j)/S(a_{j-1}).

Note that p_1 = S(a_1), since S(a_0) = S(0) = 1.

Then p_2 = S(a_2)/S(a_1) = S(a_2)/p_1, so S(a_2) = p_2*p_1.

Continuing, p_3 = S(a_3)/S(a_2) = S(a_3)/(p_2*p_1), so S(a_3) = p_3*p_2*p_1;

and so forth, until we get Theorem 5.1 (p. 82), which states that for every j, S(a_j) = p_j*…*p_3*p_2*p_1, where p_j is the conditional probability of surviving across I_j given alive at the start of I_j. Use this theorem to estimate the survival function at the endpoints of the intervals in the lifetable.
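Theorem 5.1 says the survival function at the interval endpoints is a running product of the conditional probabilities. A tiny sketch with made-up values of p_j (the numbers are hypothetical) shows the product and its inverse:

```r
# Hypothetical conditional survival probabilities p_1, ..., p_4.
p <- c(0.9, 0.8, 0.75, 0.5)

# Theorem 5.1: S(a_j) = p_1 * p_2 * ... * p_j.
S <- cumprod(p)
print(S)

# Recover p_j = S(a_j) / S(a_{j-1}), using S(a_0) = 1.
p_back <- S / c(1, S[-length(S)])
```

`cumprod()` does in one call what the step-by-step argument above does one interval at a time.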

Lifetable estimates

In order to get the S’s, we need to estimate the p’s. The usual estimate of a proportion works here (5.3).

Note that when estimating 1 - p_j, we’re estimating the conditional probability of dying in the interval, given alive at the start of the interval. So:

$$\hat p_j = 1 - \frac{\#\ \text{dying in } I_j}{\text{number with potential to die in } I_j}$$

We define the effective number at risk as

$$N_j' = N_j - 0.5\,W_j,$$

which essentially assumes the censoring occurs uniformly across the interval. So we apply this to our estimator above and get the actuarial estimate

$$\tilde p_j = 1 - \frac{D_j}{N_j'}, \quad j = 1, 2, \ldots, k+1.$$

The hazard at time t is estimated as the number of patients dying per unit time in the interval, per patient surviving at time t:

$$\hat h(t) = \frac{\#\ \text{of patients dying in } (t,\, t+\Delta t]}{(\#\ \text{of patients surviving at time } t) \cdot \Delta t}$$
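A small worked lifetable ties together the at-risk recursion, the effective number at risk, and the actuarial estimate. The counts below are hypothetical (not from the textbook), and the variable names are illustrative:

```r
# Hypothetical lifetable counts (not from the textbook).
D  <- c(10, 6, 4)        # deaths per interval
W  <- c(4, 2, 6)         # censored per interval
N1 <- 50                 # initial number at risk, N_1 = n

# N_j = N_{j-1} - D_{j-1} - W_{j-1}
N <- N1 - c(0, cumsum(D + W)[-length(D)])

# Effective number at risk: N'_j = N_j - 0.5 * W_j
N_eff <- N - 0.5 * W

# Actuarial estimate; set to 0 if the effective number at risk is 0.
p_tilde <- ifelse(N_eff > 0, 1 - D / N_eff, 0)
S_tilde <- cumprod(p_tilde)

print(cbind(N, N_eff, D, W, p_tilde, S_tilde))
```

Here N comes out as 50, 36, 28 and N_eff as 48, 35, 25, so the third interval has p_tilde = 1 - 4/25 = 0.84.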

Lifetable estimates

If for a given j, N_j’ = 0, then take the estimate to be 0.

So to estimate S(a_j), use

$$\tilde S(a_j) = \tilde p_1\, \tilde p_2 \cdots \tilde p_j.$$

Two basic assumptions for the construction of lifetables are:

– censor times are independent of lifetimes; this assures that p_j is the same for each individual;

– failure times and censor times in a given interval are uniformly distributed across the interval.

Think of a lifetable as a generalization of a frequency histogram that accounts for right censoring. See Example 5.6 on pages 83-84 on melanoma survival (defined as time from first treatment for melanoma to death, in years). Let’s go over this data carefully to understand the computations…try in Excel…and later in SAS!

Example 5.6 on page 83-84 of melanoma survival

The following data are on 913 male and female patients with malignant melanoma, treated in the M.D. Anderson Tumor Clinic between 1944 and 1960. Here the survival time is defined as the time from first treatment for melanoma to death, in years.

Use the lifetable method to find the survivor function, with

$$N_j' = N_j - 0.5\,W_j, \qquad \tilde p_j = 1 - \frac{D_j}{N_j'}, \quad j = 1, 2, \ldots, k+1.$$


Example 5.6 on page 83-84

## Third method to read in the dataset from online ##

data=read.csv("http://people.uncw.edu/chenc/STT520_420/dataset/Eg5_6.csv", header = TRUE, sep = ",", quote="\"", dec=".", fill = TRUE);

d=data[,1]; ## Deaths

w=data[,2]; ## Censored/Withdraws/Losses of followups

n=data[,3]; ## # of at risk

## Then we start to work on the lifetable estimates…


## Example 5.6, page 83-84, Recursive calculation in a lifetable

data=read.csv("http://people.uncw.edu/chenc/STT520_420/dataset/Eg5_6.csv", header = TRUE, sep = ",", quote="\"", dec=".", fill = TRUE);

(D=data[,1]); ## deaths

(W=data[,2]); ## censored/withdrawals/losses to follow-up

(N=data[,3]); ## number at risk

print(cbind(N,D,W));

## Since W[10] is NA, we reassign its value as 0.

W[10]=0;

## Effective number at risk in the j-th interval; the outer () prints the result directly.

(N.eff=N-0.5*W);

## Actuarial estimate of P_j ##

(P=1-D/N.eff); (Q=1-P); n=length(N); S=rep(0, n); ## S will hold the survival estimates

for (j in 1:n)

{

S[j]=prod(P[1:j]); ## S(a_j) = p_1 * ... * p_j (Theorem 5.1)

}

print(cbind(N, N.eff, D, W, Q, P, S));

## Plot the estimated survival function for each interval ##

x=1:10;

S=c(1, S); x=c(0, x); ## Add the starting point S(0)=1.

plot(x, S, type="s");

Greenwood’s formula for Lifetable method

Greenwood’s formula gives an error bound around the lifetable estimates $\tilde S(a_j)$. I won’t go through the derivation, but if you’re interested, see pages 85-86.

Theorem 5.3 (Greenwood’s Formula). The standard error of the lifetable estimate is given by

$$SE\big(\tilde S(a_j)\big) = \tilde S(a_j)\sqrt{\sum_{i=1}^{j} \frac{\tilde q_i}{\tilde p_i\, N_i'}}, \quad j = 1, 2, \ldots, k+1,$$

where $\tilde q_i = 1 - \tilde p_i$.

This formula is usable as long as the effective number at risk is not too small in the intervals.

See Example 5.7 on page 87 for a use of this formula. Go over Example 5.8 - use SAS
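Greenwood's formula vectorizes naturally, since the sum inside the square root is a running total. A minimal sketch, assuming the inputs are the deaths D_j and effective numbers at risk N_j' per interval (the helper name `greenwood_se` and the counts are hypothetical):

```r
# Greenwood standard errors for all lifetable estimates at once.
# D: deaths per interval; N_eff: effective number at risk per interval.
greenwood_se <- function(D, N_eff) {
  q <- D / N_eff                      # estimated conditional death probability
  p <- 1 - q                          # actuarial estimate of p_j
  S <- cumprod(p)                     # lifetable estimate of S(a_j)
  S * sqrt(cumsum(q / (p * N_eff)))   # Greenwood's formula
}

# Hypothetical counts, for illustration only.
D     <- c(10, 6, 4)
N_eff <- c(48, 35, 25)
se <- greenwood_se(D, N_eff)
print(se)
```

For j = 1 the formula reduces to sqrt(p_1 * q_1 / N_1'), the usual standard error of a proportion, which is a quick sanity check.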

Example 5.7, page 87

Example 5.7: Find the standard error of the 5-year survival prospects for melanoma patients using Greenwood’s formula,

$$SE\big(\tilde S(a_j)\big) = \tilde S(a_j)\sqrt{\sum_{i=1}^{j} \frac{\tilde q_i}{\tilde p_i\, N_i'}}, \quad j = 1, 2, \ldots, k+1.$$

SE=(0.356)*sqrt(0.047/((0.953)*(149))+0.136/((213)*(0.864))+0.148/((304)*(0.852))+0.205/((468)*(0.795))+0.361/((865)*(0.639)))= 0.01899.
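The arithmetic in Example 5.7 can be checked in R using the q values and effective numbers at risk that appear in the expression above. The vector names are illustrative, and 0.356 is taken as given from the slide:

```r
# Values read off the Example 5.7 expression, intervals 1 to 5.
q     <- c(0.361, 0.205, 0.148, 0.136, 0.047)   # estimated q_i
N_eff <- c(865, 468, 304, 213, 149)             # effective numbers at risk
p     <- 1 - q

S5  <- 0.356                                    # 5-year survival estimate, as given
SE5 <- S5 * sqrt(sum(q / (p * N_eff)))          # Greenwood's formula
print(round(SE5, 5))

# The product of the p's agrees with 0.356 up to rounding.
print(round(prod(p), 3))
```

The order of the terms inside the sum does not matter, so listing the intervals from first to last gives the same result as the slide's last-to-first order.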


Example 5.8 - use SAS


Ij, Dj, Wj, Nj’, qj, SE(qj), est of S(aj), 1- est of S(aj), SE(est of S(aj)) … pdf, SE(pdf), hazard, SE(hazard)


Example 5.8 - use SAS


Note: Let’s compare PPT #33 and #34. When we estimate the survival function by hand (in #33), the estimates start from 0.6393 instead of from 1.

That means, by HAND, we take S^~(0)=1 [we add this in], S^~(1)=0.6393 [taking the right endpoints of the intervals], and so on. Similar ideas apply to the Greenwood formula.

With the output from SAS, however, we need to be CAREFUL: it starts from 1, rather than from 0.6393. Therefore, to understand the SAS output you need to make an adjustment and read it as S^~(0)=1 [taking the left endpoints of the intervals], S^~(1)=0.6393, and so on.

More about lifetable

Background: (1) the number of observations is large; and (2) event times are measured crudely. Also called the actuarial method.

Advantages:

(1) lifetimes are grouped into intervals of time (which can be as long or as short as you like)

(2) can produce estimates and plots of the hazard function in SAS/R

Disadvantages:

(1) the choice of intervals is somewhat arbitrary (uncertainty about how to choose intervals)

(2) inevitable loss of information


Review: How to decide bandwidth for histogram

Rule of thumb: start with 5 to 10 bins. Look at the distribution and refine your bins. (There isn’t a unique or “perfect” solution.)
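The rule of thumb can be tried out directly in R. The sketch below uses simulated lifetimes (hypothetical data) and compares a coarse and a finer binning; Sturges' rule, which base R uses by default in `hist()`, is shown only as one common starting point, not as a recommendation from the slides.

```r
set.seed(37)                              # arbitrary seed
x <- rexp(100, rate = 0.2)                # hypothetical lifetimes

# Sturges' rule: ceiling(log2(n) + 1) bins.
k_sturges <- nclass.Sturges(x)

# 'breaks' given as a single number is a suggestion; R rounds to "pretty" cut points.
h_coarse <- hist(x, breaks = 5,  plot = FALSE)
h_fine   <- hist(x, breaks = 10, plot = FALSE)

print(k_sturges)
print(c(coarse = length(h_coarse$counts), fine = length(h_fine$counts)))
```

Setting `plot = TRUE` (the default) draws the histograms, so the two binnings can be compared visually before refining.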