Lecture 15: Time Varying Covariates Time-varying covariates.



Time-Dependent Covariates

• Thus far we’ve only considered “fixed” time covariates

• Examples of time-varying covariates
  – Cumulative exposure
  – Smoking status
  – Blood pressure

• Now, the data structure is
  – [T, δ, Z(t); 0 < t < T]


CPHM with Time Varying Covariates

• The model looks like what we’ve been working with.

• Now however, Z is a function of t:

$$h(t \mid Z(t)) = h_0(t)\,\exp\!\left(\sum_{k=1}^{p} \beta_k Z_k(t)\right)$$
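As a worked illustration (not taken from the slides), suppose there is a single binary covariate Z(t) that switches from 0 to 1 at a subject-specific time t_a, for example the onset of AGvHD. Under the model above, the hazard is piecewise, and β is the log hazard ratio comparing the two states at the same time t:

```latex
h(t \mid Z(t)) =
\begin{cases}
h_0(t), & t \le t_a \quad (Z(t) = 0),\\[4pt]
h_0(t)\, e^{\beta}, & t > t_a \quad (Z(t) = 1),
\end{cases}
\qquad\text{so}\qquad
\frac{h(t \mid Z(t) = 1)}{h(t \mid Z(t) = 0)} = e^{\beta}.
```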


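Likelihood with Time-Varying Covariates

• Again, we can use the partial likelihood approach for estimating β

• But Z is now a function of t (as in the model statement):

$$L(\beta) = \prod_{i=1}^{D} \frac{\exp\!\left(\sum_{k=1}^{p} \beta_k Z_{ik}(t_i)\right)}{\sum_{j \in R(t_i)} \exp\!\left(\sum_{k=1}^{p} \beta_k Z_{jk}(t_i)\right)}$$

  where t_1, …, t_D are the D event times, R(t_i) is the risk set at t_i, and Z_{ik}(t_i) is covariate k, evaluated at t_i, for the subject who fails at t_i

• Otherwise, testing and estimation are the same as for fixed covariates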
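To make the role of Z(t) in the risk set concrete, here is a minimal base-R sketch that evaluates the partial log-likelihood by hand for start/stop data with one covariate. The data values and the function name partial_loglik are made up for illustration, and the sketch assumes no tied event times; in practice coxph does this work.

```r
# Toy counting-process data (hypothetical values, not the BMT data).
# Each row is one interval (tstart, tstop] for one subject.
toy <- data.frame(
  id     = c(1, 1, 2, 3, 3),
  tstart = c(0, 5, 0, 0, 3),
  tstop  = c(5, 9, 7, 3, 8),
  event  = c(0, 1, 1, 0, 0),
  z      = c(0, 1, 0, 0, 1)   # time-varying covariate, constant within a row
)

# Partial log-likelihood for a single coefficient beta (assumes no ties).
partial_loglik <- function(beta, d) {
  ev <- d[d$event == 1, ]
  total <- 0
  for (i in seq_len(nrow(ev))) {
    t_i <- ev$tstop[i]
    # Risk set: the intervals that contain t_i, i.e. tstart < t_i <= tstop.
    # Each subject has at most one such row, carrying Z evaluated at t_i.
    at_risk <- d$tstart < t_i & d$tstop >= t_i
    total <- total + beta * ev$z[i] - log(sum(exp(beta * d$z[at_risk])))
  }
  total
}

partial_loglik(0, toy)
partial_loglik(1, toy)
```

The point of the design is that the covariate value used for each member of the risk set is the one in force at the event time, which is exactly what the start/stop layout encodes.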


Example: Bone Marrow Transplant

• Main covariate of interest is disease type:
  – ALL
  – low risk AML
  – high risk AML

• Interest is in determining factors associated with disease-free survival (the event is death or relapse)


BMT Fixed Time Covariates

• There are several fixed time covariates we’ve found to be important
  – Patient age
  – Donor age
  – FAB identification
  – Disease type
  – Hospital


BMT Time Varying Covariates

• There are also several time-varying covariates
  – Acute graft vs. host disease (AGvHD)
  – Chronic graft vs. host disease (CGvHD)
  – Platelet recovery (PR)

• These all occur after BMT or not at all

• They can also vary over the course of the study


R: Time-Varying Covariates

• Expand the data to describe all scenarios

• Need to consider the possible combinations of events

• Example: AGVHD and DFS
  – Possible scenarios at any point in time during the study for subject 1
    • No AGVHD: DFS?
    • AGVHD: DFS?
  – For all patients with TTAGVHD < DFS, need two rows in the dataset to describe the variation
  – For all patients with TTAGVHD ≥ DFS (no AGVHD before the end of follow-up), need only one row in the dataset


Timeline Examples: Observed Event

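[Figure: two example patient timelines, each ending in an observed event at te]

• Patient who develops AGVHD at ta (two rows)
  – t0 to ta: no AGVHD until ta, no event
  – ta to te: AGVHD, event

• Patient who never develops AGVHD (one row)
  – t0 to te: no AGVHD, event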


Timeline Examples: Censored Event

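[Figure: two example patient timelines, each ending in censoring at tc]

• Patient who develops AGVHD at ta (two rows)
  – t0 to ta: no AGVHD, no event
  – ta to tc: AGVHD, no event (censored)

• Patient who never develops AGVHD (one row)
  – t0 to tc: no AGVHD, no event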


Time-Varying Covariates

• First, look at each time-varying covariate

• Which (if any) are associated with DFS, adjusting for diagnosis?

• Estimation and inference are the same as with fixed time covariates

• Difference
  – Data structure


Data Set-up

> data[1:15,c(1,25,4:8)]
ID Disease  DFS Death Relapse Either TAGvH AGvH
 1       1 2081     0       0      0    67    1
 2       1 1602     0       0      0  1602    0
 3       1 1496     0       0      0  1496    0
 4       1 1462     0       0      0    70    1
 5       1 1433     0       0      0  1433    0
 6       1 1377     0       0      0  1377    0
 7       1 1330     0       0      0  1330    0
 8       1  996     0       0      0    72    1
 9       1  226     0       0      0   226    0
10       1 1199     0       0      0  1199    0
11       1 1111     0       0      0  1111    0
12       1  530     0       0      0    38    1
13       1 1182     0       0      0  1182    0
14       1 1167     0       0      0    39    1
15       1  418     1       0      1   418    0


Expansion

• Consider row 1
  – Now, two rows
  – Row 1: start time = 0, stop time = 67, agvhd = 0, …
  – Row 2: start time = 67, stop time = 2081, agvhd = 1, …

• Consider row 2
  – Still 1 row
  – Row 1: start time = 0, stop time = 1602, agvhd = 0, …
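The two cases above can be sketched in base R for a single time-varying covariate. The input values are rows 1 and 2 of the data set-up table; the helper name expand_one and the output column names (Tstart, Tstop, AGvHt, event) are assumptions chosen for illustration, not the lecture's own code.

```r
# Rows 1 and 2 of the data set-up table (values copied from the slides).
wide <- data.frame(ID = c(1, 2), DFS = c(2081, 1602), Either = c(0, 0),
                   TAGvH = c(67, 1602), AGvH = c(1, 0))

# expand_one() is a hypothetical helper: it returns one or two
# (Tstart, Tstop] rows for a single subject.
expand_one <- function(row) {
  if (row$AGvH == 1 && row$TAGvH < row$DFS) {
    # AGvHD occurred before the end of follow-up: split at TAGvH.
    data.frame(ID = row$ID,
               Tstart = c(0, row$TAGvH),
               Tstop  = c(row$TAGvH, row$DFS),
               AGvHt  = c(0, 1),
               event  = c(0, row$Either))  # the event can only end the last row
  } else {
    # No AGvHD during follow-up: a single row is enough.
    data.frame(ID = row$ID, Tstart = 0, Tstop = row$DFS,
               AGvHt = 0, event = row$Either)
  }
}

long <- do.call(rbind, lapply(seq_len(nrow(wide)),
                              function(i) expand_one(wide[i, ])))
long
```

Subject 1 yields two rows (0 to 67, then 67 to 2081) and subject 2 yields one row (0 to 1602), matching the bullets above.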


What About Dependence?

• You might ask whether we need to worry about correlated data, since each subject now contributes several rows

• In this case we do not need to worry about it: a subject’s rows cover non-overlapping intervals, so at most one of them is in the risk set at any event time

• There are two exceptions:
  – When subjects have multiple events
  – When a subject appears in overlapping intervals

• The 2nd case is almost always a data error

• A subject can be at risk in multiple strata at the same time
  – Corresponds to being simultaneously at risk for two distinct outcomes
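Because overlapping intervals are almost always a data error, a quick check on the expanded data is worthwhile. The sketch below assumes columns named ID, Tstart and Tstop (as in the expanded data used in these slides); has_overlap is a hypothetical helper and the two small data frames are made up for the demonstration.

```r
# has_overlap() returns, per subject, TRUE if any two of that subject's
# intervals overlap. Hypothetical helper for illustration.
has_overlap <- function(d) {
  sapply(split(d, d$ID), function(s) {
    s <- s[order(s$Tstart), ]
    # After sorting, an overlap exists if an interval starts
    # before the previous interval has stopped.
    nrow(s) > 1 && any(s$Tstart[-1] < s$Tstop[-nrow(s)])
  })
}

ok  <- data.frame(ID = c(1, 1, 2), Tstart = c(0, 67, 0),
                  Tstop = c(67, 2081, 1602))
bad <- data.frame(ID = c(1, 1), Tstart = c(0, 50),
                  Tstop = c(67, 2081))

has_overlap(ok)
has_overlap(bad)
```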


R Expansion

n<-nrow(bmt)
adata<-bmt[, c(1:2,14:23)] #fixed time columns
for (i in 1:n)
{
  #times and indicators for AGvHD, CGvHD, platelet recovery, and DFS
  times1<-c(bmt$TAGvH[i], bmt$TCGvH[i], bmt$TRP[i], bmt$DFS[i])
  events<-c(bmt$AGvH[i], bmt$CGvH[i], bmt$RP[i], bmt$Either[i])
  #unique change points up to and including the DFS time
  times2<-times1[which(times1<=times1[4])]
  utimes<-sort(unique(times2))
  for (j in 1:length(utimes))
  {
    #vec holds (AGvH, CGvH, PR, event) for the interval ending at utimes[j]
    if (length(utimes)==1) {vec<-events}
    if (length(utimes)>1 & j==1) {vec<-c(0,0,0,0)}
    if (j>1 & j<length(utimes)) {
      loc<-which(times1==utimes[j-1])
      vec<-replace(vec, loc, events[loc]) }
    if (j>1 & j==length(utimes)) {
      loc<-which(times1==utimes[j-1])
      vec<-replace(vec, c(loc,4), events[c(loc,4)]) }
    #append the row (Tstart, Tstop, fixed covariates, vec)
    if (j==1 & i==1) {bmt.long<-unlist(c(0, utimes[j], adata[i,], vec))}
    if (j==1 & i>1) {bmt.long<-rbind(bmt.long, c(0, utimes[j], adata[i,],vec))}
    if (j>1) {bmt.long<-rbind(bmt.long, c(utimes[j-1], utimes[j], adata[i,],vec))}
  }
}
#nrow=342 is the number of expanded rows for this dataset
bmt.long<-as.data.frame(matrix(as.vector(unlist(bmt.long)), nrow=342, ncol=18, byrow=F))
colnames(bmt.long)<-c("Tstart","Tstop",colnames(adata),"AGvH","CGvH","PR","event")
sum(bmt.long$event)


Expanded Data

> bmt[1:2,]
ID Disease  TTD  TTR Death Relapse Either TAGvH AGvH TCGvH CGvH TRP RP PtAge
 1       1 2081 2081     0       0      0    67    1   121    1  13  1    26
 2       1 1602 1602     0       0      0  1602    0   139    1  18  1    21
….

> bmt.long[1:8,]
Tstart Tstop ID Disease PtAge AGvH CGvH PR event
     0    13  1       1    26    0    0  0     0
    13    67  1       1    26    0    0  1     0
    67   121  1       1    26    1    0  1     0
   121  2081  1       1    26    1    0  1     0
     0    18  2       1    21    0    0  0     0
    18   139  2       1    21    0    0  1     0
   139  1602  2       1    21    0    1  1     0
     0    12  3       1    26    0    0  0     0
….


Alternatively Use: expand.breakpoints

• The previous code creates the dataset per time-dependent covariate

• expand.breakpoints was created by John Maindonald

• It expands the dataset into rows per person using either the observed number of times or a pre-specified number of times


expand.breakpoints Approach

> bps<-sort(unique(c(bmt$DFS, bmt$TAGvH, bmt$TCGvH, bmt$TRP)))
> bps
  [1]    1    2    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25   26
…
[215] 1850 1857 1870 2024 2081 2133 2140 2204 2218 2246 2252 2409 2430 2506 2569 2640

> bmt.long2<-expand.breakpoints(bmt, index="id", status="Either", tevent="DFS", breakpoints=bps)
> bmt.long2
ID Tstart Tstop Either epoch Disease  TTD  TTR Death Relapse TAGvH AGvH TCGvH CGvH TRP RP
 1      0     1      0     1       1 2081 2081     0       0    67    1   121    1  13  1
 1      1     2      0     2       1 2081 2081     0       0    67    1   121    1  13  1
 1      2     7      0     3       1 2081 2081     0       0    67    1   121    1  13  1
 1      7     8      0     4       1 2081 2081     0       0    67    1   121    1  13  1
…
 1   1870  2024      0   218       1 2081 2081     0       0    67    1   121    1  13  1
 1   2024  2081      0   219       1 2081 2081     0       0    67    1   121    1  13  1
 2      0     1      0     1       1 1602 1602     0       0  1602    0   139    1  18  1


Still Not Done

• That provides us with separate intervals per patient for all intervals of interest

• BUT, treats AGvHD, CGvHD, and PR as “fixed” time covariates

• We need to create time-dependent versions


R

#create time-dependent covariates (on the expand.breakpoints data, bmt.long2)
> bmt.long2$AGvHt<-ifelse(bmt.long2$TAGvH<=bmt.long2$Tstart & bmt.long2$AGvH==1, 1, 0)
> bmt.long2$CGvHt<-ifelse(bmt.long2$TCGvH<=bmt.long2$Tstart & bmt.long2$CGvH==1, 1, 0)
> bmt.long2$PRt<-ifelse(bmt.long2$TRP<=bmt.long2$Tstart & bmt.long2$RP==1, 1, 0)

#Look again at pts 1 and 2 to see time dependent variables
> bmt.long2$AGvH[which(bmt.long2$ID==1)]
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
…
[175] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

> bmt.long2$AGvHt[which(bmt.long2$ID==1)]
  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
…
[175] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1


Syntax in R

• To define the time-to-event variable, there are two options:
  – Surv(time, y)
  – Surv(start.time, stop.time, y)

• For time varying covariates (or left-truncated data), usually simpler to use the latter convention

• In most other cases, simpler to use the former
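A small sketch of the two conventions, using the survival package. The numbers are taken from patients 1 and 15 in the data set-up table; only AGvHD is used to split patient 1, and the object names s1 and s2 are illustrative.

```r
library(survival)  # a recommended package, included with standard R installs

# Former convention: one row per subject, right-censored time.
# Patient 1: DFS = 2081, no event; patient 15: DFS = 418, event.
s1 <- Surv(time = c(2081, 418), event = c(0, 1))

# Latter convention: one row per (start, stop] interval.
# Patient 1 is split at AGvHD onset (day 67); patient 15 has one interval.
s2 <- Surv(time = c(0, 67, 0), time2 = c(67, 2081, 418), event = c(0, 0, 1))

attr(s1, "type")  # type of censoring recorded for the first form
attr(s2, "type")  # type recorded for the start/stop form
s2
```

With the start/stop form, a row contributes to the risk set only for event times inside its own interval, which is what lets a covariate change value between rows.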


Testing Time-Varying Covariates Controlling for Diagnosis

#Acute graft vs. host disease
rega<-coxph(Surv(Tstart, Tstop, Either)~ AGvHt+factor(Disease), data=bmt.long2)

#Chronic graft vs. host disease
regc<-coxph(Surv(Tstart, Tstop, Either)~ CGvHt+factor(Disease), data=bmt.long2)

#Platelet recovery time
regp<-coxph(Surv(Tstart, Tstop, Either)~ PRt+factor(Disease), data=bmt.long2)


AGvHD

> rega
Call:
coxph(formula = Surv(Tstart, Tstop, Either) ~ AGvHt + factor(Disease), data = bmt.long2)

                   coef exp(coef) se(coef)     z     p
AGvH              0.323     1.381    0.285  1.31 0.264
factor(Disease)2 -0.551     0.576    0.288 -1.91 0.055
factor(Disease)3  0.435     1.546    0.272  1.60 0.110

Likelihood ratio test=14.7 on 3 df, p=0.00214
n= 19070, number of events= 83


CGvHD

> regc
Call:
coxph(formula = Surv(Tstart, Tstop, Either) ~ CGvHt + factor(Disease), data = bmt.long2)

                   coef exp(coef) se(coef)      z     p
CGvHt            -0.186     0.830    0.288 -0.646 0.520
factor(Disease)2 -0.620     0.538    0.296 -2.094 0.036
factor(Disease)3  0.367     1.444    0.268  1.368 0.170

Likelihood ratio test=13.9 on 3 df, p=0.00309
n= 19070, number of events= 83


Platelet Recovery

> regp
Call:
coxph(formula = Surv(Tstart, Tstop, Either) ~ PRt + factor(Disease), data = bmt.long2)

                   coef exp(coef) se(coef)     z       p
PRt              -1.120     0.326    0.329 -3.40 0.00067
factor(Disease)2 -0.497     0.608    0.289 -1.72 0.08600
factor(Disease)3  0.382     1.465    0.268  1.43 0.15000

Likelihood ratio test=22.9 on 3 df, p=4.32e-05
n= 19070, number of events= 83


Interpretation?

• Patients with low risk AML have a lower risk of an event compared with ALL patients

• Patients with high risk AML have a greater risk of an event relative to patients with ALL

• Patients who have experienced platelet recovery by a given time have a lower risk of an event than those who have not yet experienced platelet recovery (HR = 0.33, p = 0.00067)


Back to Our Original Models

• Only platelet recovery is significantly associated with disease free survival

• Now investigate a model that adjusts for the previously mentioned fixed time covariates
  – Disease type
  – FAB
  – Donor/patient age and interaction
  – Hospital


Models with and without PRt

#Model w/ donor/patient age, intx, FAB, dx, hosp, & PR
> st<-Surv(bmt.long2$Tstart, bmt.long2$Tstop, bmt.long2$Either)

> reg.fixed<-coxph(st~factor(Disease)+FAB+PtAge+DonAge+PtAge*DonAge, data=bmt.long2)

> reg.tv<-coxph(st~factor(Disease)+PRt, data=bmt.long2)

> reg.all<-coxph(st~factor(Disease)+FAB+PtAge+DonAge+PtAge*DonAge+PRt, data=bmt.long2)

#LRT of reg.all vs. reg.tv: 4 df for FAB, PtAge, DonAge, PtAge:DonAge
> LRT<-2*(reg.all$loglik[2]-reg.tv$loglik[2])
> pchisq(LRT, 4, lower.tail=F)
[1] 0.001878685
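As a rough cross-check, the same test can be approximated from the printed model summaries. Each "Likelihood ratio test" line in the coxph output compares that model with the null model, so for nested models the difference of the two statistics is the LRT between them. The values 39.9 (7 df) and 22.9 (3 df) below are the rounded statistics printed for reg.all and reg.tv, so the result only approximates the exact p-value.

```r
# Rounded statistics copied from the printed coxph output.
lrt_all <- 39.9   # reg.all vs. null model, 7 df
lrt_tv  <- 22.9   # reg.tv  vs. null model, 3 df

lrt <- lrt_all - lrt_tv      # LRT of reg.all vs. reg.tv
df  <- 7 - 3                 # FAB, PtAge, DonAge, PtAge:DonAge
p   <- pchisq(lrt, df, lower.tail = FALSE)

c(LRT = lrt, df = df, p = p)
```

The p-value comes out near 0.0019, in line with the 0.00188 obtained from the unrounded log-likelihoods.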


Recall Fixed Time Covariate Model

> reg.fixed
Call:
coxph(formula = st ~ factor(Disease) + FAB + PtAge + DonAge + PtAge * DonAge, data = bmt.long)

                     coef exp(coef) se(coef)     z       p
factor(Disease)2 -1.09065     0.336 0.354279 -3.08 0.00210
factor(Disease)3 -0.40391     0.668 0.362777 -1.11 0.27000
FAB               0.83742     2.310 0.278464  3.01 0.00260
PtAge            -0.08164     0.922 0.036107 -2.26 0.02400
DonAge           -0.08459     0.919 0.030097 -2.81 0.00490
PtAge:DonAge      0.00316     1.003 0.000951  3.32 0.00089

Likelihood ratio test=32.8 on 6 df, p=1.14e-05
n= 342, number of events= 83


Time-Varying Covariate + Disease Type

> reg.tv
Call:
coxph(formula = st ~ factor(Disease) + PR, data = bmt.long)

                   coef exp(coef) se(coef)     z       p
factor(Disease)2 -0.497     0.608    0.289 -1.72 0.08600
factor(Disease)3  0.382     1.465    0.268  1.43 0.15000
PR               -1.120     0.326    0.329 -3.40 0.00067

Likelihood ratio test=22.9 on 3 df, p=4.32e-05
n= 342, number of events= 83


Full Model

> reg.all
Call:
coxph(formula = st ~ factor(Disease) + FAB + PtAge + DonAge + PtAge * DonAge + PR, data = bmt.long)

                     coef exp(coef) se(coef)     z      p
factor(Disease)2 -1.03245     0.356 0.353200 -2.92 0.0035
factor(Disease)3 -0.41398     0.661 0.365222 -1.13 0.2600
FAB               0.81180     2.252 0.283236  2.87 0.0042
PtAge            -0.07102     0.931 0.035449 -2.00 0.0450
DonAge           -0.07607     0.927 0.030007 -2.54 0.0110
PR               -0.98307     0.374 0.338109 -2.91 0.0036
PtAge:DonAge      0.00287     1.003 0.000935  3.07 0.0021

Likelihood ratio test=39.9 on 7 df, p=1.3e-06
n= 342, number of events= 83


Interactions Coding by Hand

#Interaction coding
#Diagnosis 2 (low risk AML)*PRT
#Diagnosis 3 (hi risk AML)*PRT
#FAB*PRT
#PRT*donor age, PRT*patient age, PRT*Donor age*Patient age
bmt.long2$ageint<-(bmt.long2$PtAge-28)*(bmt.long2$DonAge-28)
bmt.long2$dx2.pr<-ifelse(bmt.long2$PRt==1 & bmt.long2$Disease==2, 1, 0)
bmt.long2$dx3.pr<-ifelse(bmt.long2$PRt==1 & bmt.long2$Disease==3, 1, 0)
bmt.long2$fab.pr<-bmt.long2$PRt*bmt.long2$FAB
bmt.long2$dnr.pr<-bmt.long2$PRt*(bmt.long2$DonAge-28)
bmt.long2$pt.pr<-bmt.long2$PRt*(bmt.long2$PtAge-28)
bmt.long2$pt.pr.dnr<-bmt.long2$PRt*(bmt.long2$ageint)


Interactions

1. Diag 2 x PRT
2. Diag 3 x PRT
3. PRT x donor age
4. PRT x patient age
5. PRT x donor age x patient age (confusing)

Interpretations:

1. “additional hazard of failure after platelet recovery in those with diagnosis of low risk AML vs. those with ALL”
2. “additional hazard of failure after platelet recovery in those with diagnosis of high risk AML vs. those with ALL”
3. “additional hazard of failure after platelet recovery with an increase in donor age”
4. “additional hazard of failure after platelet recovery with an increase in patient age”
5. “additional hazard of failure after platelet recovery with an increase in the interaction between the patient and donor age”


Series of Models

reg1<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt, data=bmt.long2)

reg2<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dx2.pr+dx3.pr, data=bmt.long2)

reg3<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+fab.pr, data=bmt.long2)

reg4<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dnr.pr+pt.pr+ pt.pr.dnr, data=bmt.long2)

reg5<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dx2.pr+ dx3.pr+fab.pr, data=bmt.long2)

reg6<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dx2.pr+ dx3.pr+dnr.pr+pt.pr+pt.pr.dnr, data=bmt.long2)

reg7<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dx2.pr+ dx3.pr+fab.pr+dnr.pr+pt.pr+pt.pr.dnr, data=bmt.long2)


Full Model with Interactions

> reg7
                   coef exp(coef) se(coef)      z      p
factor(Disease)2  1.325     3.765    0.819  1.618 0.1100
factor(Disease)3  1.134     3.108    1.225  0.926 0.3500
FAB              -1.250     0.286    1.112 -1.124 0.2600
DonAge            0.116     1.123    0.043  2.679 0.0074
PtAge            -0.154     0.857    0.054 -2.820 0.0048
ageint           0.0026     1.003    0.001  1.337 0.1800
PRt              -0.286     0.751    0.695 -0.412 0.6800
dx2.pr           -3.057     0.047    0.926 -3.299 0.0010
dx3.pr           -1.894     0.150    1.291 -1.467 0.1400
fab.pr            2.471    11.831    1.159  2.131 0.0330
dnr.pr           -0.147     0.863    0.048 -3.054 0.0023
pt.pr             0.193     1.213    0.058  3.289 0.0010
pt.pr.dnr         0.000     1.000    0.002  0.060 0.9500

Likelihood ratio test=63.6 on 13 df, p=1.19e-08
n= 19070, number of events= 83


Fitting Interactions Directly

> reg7b
Call:
coxph(formula = st ~ factor(Disease) + FAB + DonAge + PtAge + DonAge * PtAge + PR + PR * factor(Disease) + PR * FAB + PR * DonAge + PR * PtAge + DonAge * PtAge * PR, data = bmt.long)

                        coef exp(coef) se(coef)      z       p
factor(Disease)2      1.3257     3.765  0.81952  1.618 0.11000
factor(Disease)3      1.1341     3.108  1.22487  0.926 0.35000
FAB                  -1.2503     0.286  1.11245 -1.124 0.26000
DonAge                0.0436     1.045  0.05866  0.744 0.46000
PtAge                -0.2264     0.797  0.09118 -2.484 0.01300
PR                   -1.4817     0.227  2.11360 -0.701 0.48000
DonAge:PtAge          0.0026     1.003  0.00194  1.337 0.18000
factor(Disease)2:PR  -3.0568     0.047  0.92646 -3.299 0.00097
factor(Disease)3:PR  -1.8941     0.150  1.29132 -1.467 0.14000
FAB:PR                2.4707    11.831  1.15926  2.131 0.03300
DonAge:PR            -0.1506     0.860  0.06967 -2.162 0.03100
PtAge:PR              0.1894     1.209  0.10127  1.871 0.06100
DonAge:PtAge:PR     0.000138     1.000  0.00230  0.060 0.95000

Likelihood ratio test=63.6 on 13 df, p=1.19e-08
n= 342, number of events= 83


Low Risk AML vs. ALL

• Interaction between diagnosis and platelet recovery

• Low-risk AML vs. ALL, prior to platelet recovery
  – β = 1.326
  – HR (95% CI): 3.76 (0.76, 18.76)

• Low-risk AML vs. ALL, after platelet recovery
  – β = 1.326 + (-3.06) = -1.73
  – HR (95% CI): 0.18 (0.08, 0.41)


R Code for the HR and 95% CI

> betahr<-reg7$coef[1]+reg7$coef[8]
> betahr
factor(Disease)2
       -1.731125

> seintx<-sqrt(reg7$var[1,1]+reg7$var[8,8]+2*reg7$var[1,8])
> seintx
[1] 0.4263292

> exp(betahr - qnorm(0.975)*seintx)
factor(Disease)2
      0.07678741

> exp(betahr + qnorm(0.975)*seintx)
factor(Disease)2
        0.408389
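The calculation above is a special case of a linear combination of coefficients: for weights w, the estimate is w'β and its variance is w'Vw, which reduces to Var(b1) + Var(b8) + 2Cov(b1, b8) when two coefficients are added. The sketch below wraps this in a small function. The name lincom_hr and the numbers in beta and V are made up for illustration; with a fitted model one would pass the relevant entries of the coefficient vector and variance matrix instead.

```r
# lincom_hr() is a hypothetical helper: hazard ratio and Wald CI for w'beta.
lincom_hr <- function(beta, V, w, level = 0.95) {
  est <- sum(w * beta)                          # w' beta
  se  <- sqrt(as.numeric(t(w) %*% V %*% w))     # sqrt(w' V w)
  zq  <- qnorm(1 - (1 - level) / 2)
  c(HR = exp(est), lower = exp(est - zq * se),
    upper = exp(est + zq * se), se = se)
}

# Made-up main effect and interaction coefficients with their covariance.
beta <- c(main = 1.3, interaction = -3.0)
V <- matrix(c( 0.67, -0.67,
              -0.67,  0.86), nrow = 2, byrow = TRUE)

res <- lincom_hr(beta, V, w = c(1, 1))   # effect after platelet recovery
res
```

Other comparisons only need a different w; for example, a weight vector with 1 and -1 on two diagnosis coefficients compares those two diagnoses directly.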


Other Interactions?

• High risk AML vs. ALL? High risk AML vs. Low Risk AML?

• Age?

• …


What About Continuous Covariates

• Continuous variables can change over time as well

• Given the times measurements are taken, we can expand the data in the same way.

• We are assuming the value is unchanging during the interval between measurements
  – A little unrealistic BUT…
  – This is no different from treating a single measure (e.g. blood pressure) as a fixed time covariate
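For a continuous covariate, the same expansion can be sketched in base R by carrying each measurement forward until the next one. The visit times, blood pressure values, end time and column names below are made up for illustration.

```r
# Hypothetical blood pressure measurements for one subject.
visits   <- data.frame(time = c(0, 30, 90), sbp = c(140, 132, 128))
end_time <- 150   # time of event or censoring (assumed)
status   <- 1     # 1 = event observed at end_time, 0 = censored

# One row per interval between measurements; the value measured at the
# start of an interval is assumed to hold until the next measurement.
long <- data.frame(
  Tstart = visits$time,
  Tstop  = c(visits$time[-1], end_time),
  sbp    = visits$sbp,
  event  = c(rep(0, nrow(visits) - 1), status)  # event only on the last row
)
long
```

The resulting rows can be passed to coxph with Surv(Tstart, Tstop, event), just like the binary covariates earlier.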


Next Time

• Regression Diagnostics… checking the proportional hazards assumption.