
STT520-420: BIOSTATISTICS ANALYSIS

Dr. Cuixian Chen

Chapter 5: Censoring and Lifetables


Censoring

An observation of a survival random variable (r.v.) is censored if we do not know the survival time exactly. There are usually three possible reasons for censoring:

- the study ends before the event occurs;
- the subject is lost to follow-up during the study (e.g., they could have moved out of town);
- the subject withdraws from the study because of death (assuming death is not the event of interest, such as death in a car accident) or for some other reason.

These are all “right-censored” observations, since the event occurs to the right of (later than) the time we last observe.


Right censored data

Examples of censored data


Right censored data


Right censored data representation

Representation #1: (6, 1), (6, 0), (8, 1), (10, 0), (14, 1);

Representation #2: 6, 6+, 8, 10+, 14.
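The two representations carry the same information, and a short R sketch can make the mapping explicit. The variable names `time`, `status` and `plus` are assumptions for illustration; a status of 1 marks an exact time and 0 a right-censored one.

```r
## Representation #1: one (time, status) pair per subject
time   <- c(6, 6, 8, 10, 14)
status <- c(1, 0, 1, 0, 1)      # 1 = exact (event observed), 0 = right-censored
obs    <- data.frame(time, status)

## Representation #2: append "+" to the censored times
plus <- paste0(time, ifelse(status == 1, "", "+"))
print(plus)
```

The printed vector reads 6, 6+, 8, 10+, 14, which is Representation #2.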


Left censored data examples

Example 2: if you are studying menarche and you begin following girls at age 12, you may find that some of them have already begun menstruating. Unless you can obtain information about the start date for those girls, the age of menarche is left-censored at age 12. (From: Allison, Paul. Survival Analysis. SAS Institute. 1995.)

Double censoring. Def: If a dataset contains right-censored (RC) observations, left-censored (LC) observations and exact/uncensored observations, but no strictly interval-censored observations, it is called double-censored data (DC data).

Case-I interval-censored data occur when each study subject is observed only once, and the only information observed about the survival event of interest is whether the event has occurred no later than the observation time.


Interval censored data


Example 2: if you’re screening subjects for HIV infection yearly, you may not be able to determine the exact date of infection. (From: Allison, Paul. Survival Analysis. SAS Institute. 1995.)

Type I/II censoring

In type I censoring, the number of uncensored/exact observations is a r.v. E.g.: left-censored data, right-censored data, double-censored data, interval-censored data.

In type II censoring, on the other hand, the number of uncensored/exact observations is fixed in advance: only the first r < n lifetimes are observed (as in reliability studies in engineering).
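The difference can be seen in a small simulation. This is only a sketch under stated assumptions: the lifetimes are simulated from an exponential distribution, Type I censoring is taken to be the classic "study ends at a fixed time" design, and the names `t.end`, `r`, `z1`, `z2` are hypothetical.

```r
set.seed(520)                      # arbitrary seed, for reproducibility only
n <- 10
y <- rexp(n, rate = 0.1)           # hypothetical true lifetimes

## Type I: stop at a fixed time; the number of exact observations is random
t.end  <- 8
z1     <- pmin(y, t.end)
delta1 <- as.numeric(y <= t.end)

## Type II: stop at the r-th failure; the number of exact observations is fixed at r
r      <- 4
y.r    <- sort(y)[r]               # time of the r-th smallest lifetime
z2     <- pmin(y, y.r)
delta2 <- as.numeric(y <= y.r)

print(c(typeI.exact = sum(delta1), typeII.exact = sum(delta2)))
```

Rerunning with a different seed changes the Type I count, while the Type II count stays at r.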


Definitions of Censoring and truncation


Left truncated data


Right truncated data


Truncated data example

With a given telescope, we can only detect a very distant stellar object which is brighter than some limiting flux:

– the object is left-truncated if it lies beyond detection by our telescope

– we can’t tell if the object is even there if we can’t see it.


Example: Types of censoring and truncation


About HIV and AIDS

HIV is a lot like other viruses, including those that cause the "flu" or the common cold. HIV can hide for long periods of time in the cells of your body.

Over time, HIV can destroy so many of your CD4 cells that your body can't fight infections and diseases anymore. When that happens, HIV infection can lead to AIDS.

AIDS is the final stage of HIV infection. People at this stage of HIV disease have badly damaged immune systems, which put them at risk for opportunistic infections (OIs).


Review--Right censored data representation

Representation #1: (6, 1), (6, 0), (8, 1), (10, 0), (14, 1);

Representation #2: 6, 6+, 8, 10+, 14.

Right Censored Model

We represent censored data as ordered pairs (Def. 5.2): Y1, Y2, …, Yn are right-censored by t1, t2, …, tn if the sample consists of the pairs (Zi, δi), where Zi = min(Yi, ti) and

$$\delta_i = \begin{cases} 1, & \text{if } Y_i \le t_i \quad \text{(uncensored/exact)} \\ 0, & \text{if } Y_i > t_i \quad \text{(censored)} \end{cases}$$

Note that Zi takes the value ti when the observation is censored, and Yi is observed when it is uncensored.

Assume the Y’s are independent of the t’s.
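The right-censored model can be sketched in R as follows. The lifetimes `y` and censoring times `t.cen` are hypothetical values, chosen only so that the resulting pairs match Representation #1 above; in practice only `z` and `delta` are observed.

```r
y     <- c(6, 9, 8, 13, 14)      # hypothetical lifetimes Y_i
t.cen <- c(12, 6, 11, 10, 20)    # hypothetical censoring times t_i

z     <- pmin(y, t.cen)          # Z_i = min(Y_i, t_i)
delta <- as.numeric(y <= t.cen)  # 1 = exact/uncensored, 0 = censored

print(cbind(z, delta))
```

The rows are (6, 1), (6, 0), (8, 1), (10, 0), (14, 1).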

Example 5.2 (Stanford Heart Transplant Data): note the form of the dataset, with Days = Z and Cens = delta; the other explanatory variables are Age and T5.

Note in Example 5.3 the use of a “+” sign to mark a right-censored observation; the survival variable is the survival time of astrocytoma patients until death resulting from the tumor.


Examples of censoring

Motivating example: leukemia


Go back to Exercise 4.5 on page 68. Note the censoring in the treatment group (with “+”) but not in the placebo group - what is the meaning of these censored observations?

1. Now let’s write the data in Exercise 4.5 in a form that it can be analyzed with various computer programs…

2. How many variables are there of interest? (We need a column for each variable…). How many observations are there? [ I’ll propose ID, remission time, censor indicator, treatment group]

3. Use Excel to organize the data and then we’ll read it into R (or SAS later) for analysis

4. Use read.csv(file=file.choose()) to get the data into R…


Example 4.5, page 68


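## Example 4.5, page 68

## To read data from a *.csv file online:

data = read.csv("http://people.uncw.edu/chenc/STT520_420/dataset/EX4.5.csv", header = TRUE);

## Or read it from a local directory:

data = read.csv("E:/EX4.5.csv", header = TRUE);

## To write NEW_data (any data frame you have created) into a *.csv file,
## e.g. on your local timmy drive. write.csv() has no header argument:
## column names are always written. row.names = FALSE drops the row-number column.

write.csv(NEW_data, "Z:/Mydata.csv", row.names = FALSE);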


Section 5.4: Lifetable estimates

- Divide the lifetime axis into fixed disjoint intervals.
- Estimate the conditional probability of survival across each interval.
- Estimate S (the survival function) at the endpoints of the intervals.

The intervals of time are represented as

$$I_j = [a_{j-1}, a_j), \quad j = 1, 2, \ldots, k+1; \qquad a_0 = 0; \quad a_{k+1} = \infty$$

The choice of the endpoints is up to the data analyst.

In a lifetable, the number at risk in any interval is the number alive and under consideration (not censored) at the start of the interval. For any interval Ij, we write:

- Nj = number at risk in Ij;
- Dj = number of deaths (or observed failures) in Ij;
- Wj = number of observations censored in Ij.


Lifetable estimates: nonparametric estimation of the survival function with right-censored data

Note: N1=n, the total sample size is initially at risk

Nj = Nj-1 - Dj-1 - Wj-1 ; this shows the propagation of those at risk in the j-1 interval to the j interval.

Write: pj=P(surviving thru Ij | alive at start of Ij)

= P(Y > aj | Y > aj-1) = S(aj)/S(aj-1)

Note that p1=S(a1) since S(a0)=S(0)=1

Then p2=S(a2)/S(a1)=S(a2)/p1 ; so S(a2)=p2*p1 ;

Continuing p3=S(a3)/S(a2)=S(a3)/(p2*p1); so S(a3)=p3*p2*p1;

and so forth till we get Theorem 5.1 (p. 82)

which states that for every j, S(aj)=pj*…*p3*p2*p1 , where pj = the conditional probability of surviving across Ij given alive at the start of Ij . Use this theorem to estimate the survival at the endpoints of the intervals in the lifetable.
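As a quick numerical check of Theorem 5.1, the sketch below uses a hypothetical exponential survival function and hypothetical endpoints (neither comes from the textbook) and confirms that the running product of the conditional probabilities reproduces S at each endpoint.

```r
a <- c(0, 1, 2, 3, 4, 5)               # hypothetical endpoints a_0, ..., a_5
S <- function(t) exp(-0.3 * t)         # hypothetical survival function, S(0) = 1

p      <- S(a[-1]) / S(a[-length(a)])  # p_j = S(a_j) / S(a_{j-1})
S.prod <- cumprod(p)                   # p_1 * p_2 * ... * p_j for each j

print(all.equal(S.prod, S(a[-1])))
```

Any survival function with S(0) = 1 would do here; the exponential form is used only because it is easy to write down.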


Lifetable estimates

In order to get the S’s, we need to estimate the p’s. The usual estimate of a proportion works here (5.3).

Note that when estimating 1-pj, we’re estimating the conditional probability of dying in the interval, given that the subject was alive at the start of the interval. So:

$$\hat{p}_j = 1 - \frac{\#\text{ dying in } I_j}{\text{number with potential to die in } I_j}$$

We define the effective number at risk as

$$N_j' = N_j - 0.5\,W_j$$

which essentially assumes that the censoring occurs uniformly across the interval. Applying this to the estimator above gives the actuarial estimate

$$\tilde{p}_j = 1 - \frac{D_j}{N_j'}, \quad j = 1, 2, \ldots, k+1$$


Lifetable estimates

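The hazard at time t is estimated by the number of patients dying per unit time in the interval, divided by the number of patients surviving at time t:

$$\hat{h}(t) = \frac{\#\text{ of patients dying in } (t, t+\Delta t]}{(\#\text{ of patients surviving at time } t) \cdot (\Delta t)}$$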

If for a given j, Nj’=0, then take the estimate to be 0.

So to estimate S(aj), use

$$\tilde{S}(a_j) = \tilde{p}_1 \tilde{p}_2 \cdots \tilde{p}_j$$

Two basic assumptions for the construction of lifetables are:

- censor times are independent of lifetimes, which assures that pj is the same for each individual;
- failure times and censor times in a given interval are uniformly distributed across the interval.

Think of a lifetable as a generalization of a frequency histogram that accounts for right censoring. See Example 5.6 on pages 83-84 on melanoma survival (defined as time from first treatment for melanoma to death, in years). Let’s go over this data carefully to understand the computations: try it in Excel, and later in SAS!
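The whole lifetable computation can be sketched in a few lines of R. The counts below are hypothetical (they are not the melanoma data), and the sketch assumes every effective number at risk is positive.

```r
## Hypothetical counts for four intervals (not from the textbook)
D <- c(10, 8, 5, 3)     # deaths in each interval
W <- c(4, 6, 2, 0)      # censored in each interval
n <- 60                 # total sample size, so N_1 = n

## Propagate the number at risk: N_j = N_{j-1} - D_{j-1} - W_{j-1}
k <- length(D)
N <- numeric(k)
N[1] <- n
for (j in 2:k) N[j] <- N[j - 1] - D[j - 1] - W[j - 1]

N.eff <- N - 0.5 * W    # effective number at risk
P     <- 1 - D / N.eff  # actuarial estimate of p_j (assumes all N.eff > 0)
S     <- cumprod(P)     # estimated survival at the right endpoints a_j

print(cbind(N, N.eff, D, W, P, S))
```

Using `cumprod()` avoids an explicit loop over the products; the loop is kept only for the number at risk, where each value depends on the previous interval.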


Lifetable estimates

Example 5.6 on page 83-84 of melanoma survival

The following data are on 913 male and female patients with malignant melanoma, treated at the M.D. Anderson Tumor Clinic between 1944 and 1960. Here the survival time is defined as the time from first treatment for melanoma to death, in years.

Use Lifetable method to find the survivor function.


$$N_j' = N_j - 0.5\,W_j$$

$$\tilde{p}_j = 1 - \frac{D_j}{N_j'}, \quad j = 1, 2, \ldots, k+1$$

Example 5.6 on page 83-84

## Third method to read in the dataset from online ##

data=read.csv("http://people.uncw.edu/chenc/STT520_420/dataset/Eg5_6.csv", header = TRUE, sep = ",", quote="\"", dec=".", fill = TRUE);

d=data[,1]; ## Deaths

w=data[,2]; ## Censored/Withdraws/Losses of followups

n=data[,3]; ## # of at risk

## Then we start to work on the lifetable estimates…


## Example 5.6, page 83-84, Recursive calculation in a lifetable

data=read.csv("http://people.uncw.edu/chenc/STT520_420/dataset/Eg5_6.csv", header = TRUE, sep = ",", quote="\"", dec=".", fill = TRUE);

(D=data[,1]); ## deaths

(W=data[,2]); ## censored/withdraws/loss of follow-ups

(N=data[,3]); ## risk

print(cbind(N,D,W));

## since W10=NA, we reassign its value as 0.

W[10]=0;

## Effective number at risk in j-th interval; use of () here is to print output directly.

(N.eff=N-0.5*W);

## Actuary estimate of P_j ##

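(P=1-D/N.eff); (Q=1-P); n=length(N); S=rep(0, n); ## rep(0, n) makes n zeros; rep(n, 0) would return an empty vector

## Estimated survival at the right endpoint of interval j: S[j]=P[1]*...*P[j], the same as cumprod(P)

for (j in 1:n)

{

S[j]=prod(P[1:j]);

}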

print(cbind(N, N.eff, D, W, Q, P, S));


Example 5.6 on page 83-84

## Plot the estimated survival function for each interval ##

x=1:10;

S=c(1, S); x=c(0, x); ## Add the starting point of S(0)=1.

plot(x, S, type="s");

Greenwood’s formula gives an error bound around the lifetable estimates $\tilde{S}(a_j)$. I won’t go through the derivation, but if you’re interested, see pages 85-86.

Theorem 5.3 (Greenwood’s Formula). The standard error of the lifetable estimate is given by

$$SE(\tilde{S}(a_j)) = \tilde{S}(a_j) \sqrt{\sum_{i=1}^{j} \frac{\tilde{q}_i}{\tilde{p}_i N_i'}}, \quad j = 1, 2, \ldots, k+1$$

where $\tilde{q}_i = 1 - \tilde{p}_i$. This formula is usable as long as the effective number at risk is not too small in the intervals.

See Example 5.7 on page 87 for a use of this formula. Go over Example 5.8 - use SAS


Greenwood’s formula for Lifetable method

Example 5.7: Find the standard error: 5-year survival prospects for melanoma patients by Greenwood formula.

STT520-420

33


SE=(0.356)*sqrt(0.047/((0.953)*(149))+0.136/((213)*(0.864))+0.148/((304)*(0.852))+0.205/((468)*(0.795))+0.361/((865)*(0.639)))= 0.01899.
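The same computation can be written in vectorized R. The numbers are copied from the hand calculation above (reordered so that interval 1 comes first); the variable names are assumptions for illustration.

```r
q     <- c(0.361, 0.205, 0.148, 0.136, 0.047)  # q~_i for intervals 1 to 5
p     <- 1 - q                                 # p~_i = 0.639, 0.795, 0.852, 0.864, 0.953
N.eff <- c(865, 468, 304, 213, 149)            # effective numbers at risk N_i'
S5    <- 0.356                                 # S~(a_5), the 5-year survival estimate used above

SE <- S5 * sqrt(sum(q / (p * N.eff)))          # Greenwood's formula
print(round(SE, 5))                            # 0.01899, matching the hand computation
```

Note that `prod(p)` is about 0.3564, which rounds to the 0.356 used for the survival estimate.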

$$SE(\tilde{S}(a_j)) = \tilde{S}(a_j) \sqrt{\sum_{i=1}^{j} \frac{\tilde{q}_i}{\tilde{p}_i N_i'}}, \quad j = 1, 2, \ldots, k+1$$

Example 5.8 - use SAS

STT520-420

34

Ij, Dj, Wj, Nj’, qj, SE(qj), est of S(aj), 1- est of S(aj), SE(est of S(aj)) … pdf, SE(pdf), hazard, SE(hazard)


Note: compare PPT #33 and #34. When we estimate the survival function by hand (in #33), the tabulated estimates start from 0.6393 instead of from 1, because each estimate is attached to the right endpoint of its interval: $\tilde{S}(1) = 0.6393$, and so on, and we add $\tilde{S}(0) = 1$ ourselves. The same idea applies to Greenwood’s formula.

We need to be CAREFUL with the output from SAS: it starts from 1 rather than from 0.6393, because each estimate there is attached to the left endpoint of its interval. To read the SAS output, make the adjustment $\tilde{S}(0) = 1$, $\tilde{S}(1) = 0.6393$, and so on.

More about lifetable

Background: (1) the number of observations is large; and (2) event times are measured crudely. Also called the actuarial method.

Advantages:
(1) lifetimes are grouped into intervals of time (which can be as long or as short as you like);
(2) it can produce estimates and plots of the hazard function in SAS/R.

Disadvantages:
(1) the choice of intervals is somewhat arbitrary (uncertainty about how to choose intervals);
(2) inevitable loss of information.

Review: How to decide bandwidth for histogram


Rule of thumb: start with 5 to 10 bins. Look at the distribution and refine your bins. (There isn’t a unique or “perfect” solution.)
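The rule of thumb can be tried out in R as follows. The data are simulated (an assumption for illustration), and `plot = FALSE` is used so that only the bin counts are computed; drop it to draw the histograms.

```r
set.seed(420)                      # arbitrary seed, for reproducibility only
x <- rexp(200, rate = 0.2)         # hypothetical survival-type data

## Equal-width bins from 0 to max(x): 5 bins, then 10 bins
h5  <- hist(x, breaks = seq(0, max(x), length.out = 6),  plot = FALSE)
h10 <- hist(x, breaks = seq(0, max(x), length.out = 11), plot = FALSE)

print(h5$counts)
print(h10$counts)
print(nclass.Sturges(x))           # Sturges' rule, another common starting point
```

Comparing the two sets of counts shows how much detail is gained or lost as the bins are refined, which mirrors the choice of intervals in a lifetable.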
