© Willett & Singer, Harvard University Graduate School of Education S077/Week #6– Slide 1 S077:...


S077: Applied Longitudinal Data Analysis

Week #6: What Are The Topics Covered In Today’s Overview?

I. Describing Continuous-Time Event Occurrence Data:
1. Salient Features of Continuous-Time Event Data.
2. Redefining the Survivor And Hazard Functions And Strategies For Estimation.
3. The Cumulative Hazard Function.
4. Developing Your Intuition About Survivor, Cumulative Hazard and Kernel-Smoothed Hazard Functions.

II. Fitting Cox Regression Models:
1. Towards a Statistical Model for Continuous-Time Hazard.
2. Fitting the Continuous-Time Hazard Model to Data.
3. Evaluating the Results of Model Fitting.
4. Graphically Displaying the Results of Model Fitting.

III. Extending the Cox Regression Model:
1. Including Time-Varying Predictors in the Cox Regression Model.
2. Non-Proportional Hazards Cox Regression Models.


Slide 2: What Happens When We Record Event Occurrence In Continuous Time?

In Continuous Time … We Know The Precise Instant That the Events Occur …
e.g., Jane took her first drink at 6:19 after release from an alcohol treatment program.

This Implies that…

… There Exist An Infinite Number Of Such Instants …
• Any division of continuous time (weeks, days, hours, etc.) can always be made finer, in contrast to the finite, and usually small, number of values for TIME in discrete time.

… the Probability Of Observing Any Particular Event Time Is Infinitesimally Small …
• It approaches 0 as time’s divisions get finer.
• This has serious implications for the definition of hazard, the linchpin of any survival analysis.
• We must define continuous-time hazard differently, which makes it more difficult to estimate and display in data analysis.

… the Probability That Ties (Two Or More People With the Same Event Time) Will Occur Is Infinitesimally Small …
• Continuous-time survival methods were developed assuming that ties never occur.
• Unfortunately, ties are usually present in real “continuous-time” data. Why? Because, while the underlying true times to event may be truly continuous, the times recorded in the data are usually rounded to the nearest unit (year, month, week, etc.).
• This can lead to difficulties and ad-hoc fix-ups.

(ALDA, Section 13.1.1, pp 469-471)
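To see how rounding manufactures ties, here is a minimal Python sketch. It assumes event times are first measured on a fine scale and then recorded to a coarser unit; the function `count_tied_times` and the sample times are hypothetical illustrations, not data from ALDA.

```python
from collections import Counter


def count_tied_times(times, unit):
    """Record each time to the nearest multiple of `unit`, then count how
    many recorded values are shared by two or more people."""
    # round(t / unit) is the index of the recording grid point nearest t
    recorded = [round(t / unit) for t in times]
    counts = Counter(recorded)
    # a tie is any recorded value that occurs more than once
    return sum(1 for c in counts.values() if c > 1)


# Hypothetical event times in days, measured on a fine scale
times = [3.2, 3.4, 10.9, 11.1, 25.0]

print(count_tied_times(times, unit=0.01))  # 0: recorded to 1/100 day, no ties
print(count_tied_times(times, unit=1))     # 2: rounded to whole days, two tied values
```

The underlying times never change; only the coarseness of the recording unit does, which is exactly why "continuous" data sets still contain ties.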


Slide 3: What Do Continuous-Time Event History Data Look Like?

Data source: Diekmann & colleagues (1996), Journal of Social Psychology.

Sample: 57 motorists in Munich, Germany, (purposefully) blocked at a green light by a Volkswagen Jetta.

Research design: Tracked from light change until horn honk:
• n=43 (75.4%) honked their horns before the light turned red; the rest are censored.
• Event time recorded to the nearest 100th of a second!

[Data listing of event times, annotated “The only tie!” and “A few very patient people?”]

(ALDA, Section 13.1.1, pp 471-472)


Slide 4: Defining the Continuous-Time Survivor Function

(ALDA, Section 13.1.2 & 13.1.2, pp 472-475)

Notation for continuous-time event data:
• T_i is a continuous random variable representing the event time for individual i.
• t_j clocks the infinite number of instants (the “true” time) when the event could actually occur.
• CENSOR_i indicates whether T_i is censored.

Survival probability, and the survivor function, have the same definition in continuous time as they do in discrete time, because they refer to the probability that a person will experience the event after a particular instant … that is, in an interval!

$$S(t_{ij}) = \Pr[T_i > t_j]$$

How Do We Estimate Survival Probability and the Survivor Function in Continuous Time?
Several approaches (see ALDA):
• Discrete-Time Method.
• Actuarial Method.
• Kaplan-Meier (“Product-Limit”) Method.


Slide 5: Estimating The Continuous-Time Survivor Function: Kaplan-Meier Approach

(ALDA, Section 13.3, pp 483-491)

Key idea: Use the observed event times to construct time intervals of different lengths, such that each interval contains only one observed event time; then apply standard discrete-time methods:
• By convention, construct an initial interval [0, 1.41).
• Since the first 3 observed event times are 1.41, 1.51 and 1.67, construct two subsequent intervals: [1.41, 1.51) and [1.51, 1.67).
• Continue through all the observed event times.

Conditional probability of event occurrence in interval j:

$$\hat{p}(t_j) = \frac{n\ \text{events}_j}{n\ \text{at risk}_j}$$

Note how erratic these estimates are, especially as the risk set declines in later intervals.

Kaplan-Meier estimate of the survivor function:

$$\hat{S}(t_j) = [1-\hat{p}(t_1)]\,[1-\hat{p}(t_2)]\cdots[1-\hat{p}(t_j)]$$

Note how smooth these estimates are.

Estimated median lifetime = 3.5769 seconds.
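The product-limit calculation can be sketched in a few lines of Python. This is a minimal illustration, not the software used for the slides: `kaplan_meier` is a hypothetical helper, censored cases are assumed to remain in the risk set at their own censoring time, and only the first three event times (1.41, 1.51, 1.67) come from the honking data; the rest of the sample is made up.

```python
def kaplan_meier(times, censored):
    """Product-limit estimate of S(t) at each distinct observed event time.

    times    : observed times (event or censoring), one per person
    censored : True if that person's time is censored
    Returns a list of (t_j, n_at_risk, n_events, p_hat, S_hat) tuples.
    """
    data = list(zip(times, censored))
    event_times = sorted({t for t, c in data if not c})
    s_hat = 1.0
    rows = []
    for t in event_times:
        # risk set: everyone whose observed time is at or after t_j
        n_risk = sum(1 for u, _ in data if u >= t)
        n_events = sum(1 for u, c in data if u == t and not c)
        p_hat = n_events / n_risk      # conditional probability of the event
        s_hat *= 1.0 - p_hat           # product-limit update
        rows.append((t, n_risk, n_events, p_hat, s_hat))
    return rows


# First three times from the slide; the remaining two are hypothetical
times = [1.41, 1.51, 1.67, 2.00, 3.00]
censored = [False, False, False, True, False]

for row in kaplan_meier(times, censored):
    print(row)
```

Each interval's p-hat uses only the people still at risk, so a censored case shrinks later risk sets without ever counting as an event.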


Slide 6: Kaplan-Meier Estimates of the Sample Survivor Function: Pros And Cons

(ALDA, Section 13.3, pp 483-491)

[Figure: Kaplan-Meier estimate of S(t_j) (0.00 to 1.00) plotted against seconds after the light turns green (0 to 20).]

Advantages of the KM approach:
• Uses all the observed information on the continuous event times without grouping or “binning up.”
• If event occurrence is recorded in a truly continuous time metric, the estimated survivor function appears almost ‘continuous.’
• The estimated survivor function is as refined as the fineness of the data collection.

Drawbacks of the KM approach:
• When examining plots for subgroups, the “drops” will occur in different places, making visual comparison trickier.
• No corresponding (decent) estimate of hazard is available. You could compute

$$\hat{h}_{KM}(t_j) = \frac{\hat{p}_{KM}(t_j)}{\text{width}_j}$$

but these estimates tend to be too erratic to be of much direct use.

But How Do We Estimate the Continuous-Time Equivalent of the Hazard Probability? In Fact, Is There Such An Equivalent?
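A hedged sketch of the crude KM hazard estimate, assuming width_j is the length of the interval that begins at event time t_j (so the last interval gets no estimate). The helper name `km_hazard` and the p-hat values are hypothetical; the point is that unequal interval widths make the ratios jump around.

```python
def km_hazard(event_times, p_hats):
    """Crude hazard-rate estimates p_hat(t_j) / width_j, where width_j is the
    length of the interval [t_j, t_{j+1}) that starts at event time t_j.
    The final interval has no right endpoint, so it gets no estimate."""
    estimates = []
    for j in range(len(event_times) - 1):
        width = event_times[j + 1] - event_times[j]
        estimates.append(p_hats[j] / width)
    return estimates


# Hypothetical event times and conditional probabilities
event_times = [1.41, 1.51, 1.67, 3.00]
p_hats = [0.20, 0.25, 1 / 3, 1.00]

print(km_hazard(event_times, p_hats))  # roughly [2.0, 1.56, 0.25]: very erratic
```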


Slide 7: Defining the Continuous-Time “Hazard-Rate”

(ALDA, Section 13.1.2 & 13.1.2, pp 472-475)

Hazard again assesses the risk, at a particular moment, that an individual who has not yet done so will experience the event …

But it cannot be defined as a probability because, in continuous time, any probability will always tend to zero!

Instead, we define hazard as a “rate.” Divide time into an infinite number of vanishingly small intervals [t_j, t_j + Δt), each of which includes t_j and excludes t_j + Δt. Hazard is the limit, as Δt → 0, of the probability that T_i falls in the interval, divided by the width of the interval (Δt):

$$h(t_{ij}) = \lim_{\Delta t \to 0} \frac{\Pr[T_i \in [t_j,\, t_j + \Delta t) \mid T_i \ge t_j]}{\Delta t}$$

Tips on interpreting continuous-time hazard:
• It’s not a probability; it’s a “rate,” or “probability per unit of time.”
• You need to be explicit about the unit of time (compare 60 mph, 60K/yr).
• Unlike discrete-time hazard probabilities, continuous-time hazard rates can exceed 1 (this has implications for statistical modeling: we model log hazard).
• Intuition: it is similar to counting the number of events occurring in a finite period and then dividing by the length of the period.
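A small numerical check of the limit definition, assuming (hypothetically) an exponential survivor function with a constant hazard of 2 events per second. The helper `conditional_rate` divides the conditional probability of the event in [t, t + Δt) by Δt, using Pr[T ≥ t] = S(t) for a continuous T.

```python
import math

RATE = 2.0  # hypothetical constant hazard, events per second


def survivor(t):
    """Exponential survivor function S(t) = exp(-RATE * t)."""
    return math.exp(-RATE * t)


def conditional_rate(surv, t, dt):
    """Pr[T in [t, t + dt) | T >= t] divided by dt."""
    prob = (surv(t) - surv(t + dt)) / surv(t)   # conditional probability
    return prob / dt                            # probability per unit of time


for dt in (1.0, 0.1, 0.001):
    print(dt, conditional_rate(survivor, t=0.5, dt=dt))
```

As Δt shrinks, the ratio settles near 2.0 per second (equivalently 120 per minute), even though each conditional probability stays below 1. That is why a rate can exceed 1 and why its time unit must always be stated.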


Slide 8: Estimating the Hazard-Rate: The Key is the Cumulative Hazard-Rate Function

(ALDA, Section 13.4, pp 488-491)

Cumulative Hazard-Rate Function

$$H(t_{ij}) = \underset{\text{between } t_0 \text{ and } t_j}{\text{cumulation}}\,\big[h(t_{ij})\big]$$

• Assesses the total amount of accumulated risk individual i has faced from the beginning of time (t_0) to the present (t_j).
• By definition, begins at 0 and rises monotonically over time (never decreasing).
• Has no directly interpretable metric, and is not a probability.
• Cumulation prevents you from using it to assess unique risk directly but, by examining its changing shape, we can deduce the information we need.
• And the good news is that it can be estimated directly from the survivor function.

First, think conceptually and imagine the transition from h(t) to H(t). In this example, h(t) is constant, so the corresponding H(t) increases linearly with time, because the same fixed amount of risk (the constant value of hazard) is added to the prior cumulative level at each successive instant.

Now, think your way back from H(t) to h(t), because this is what you will actually need to do in practice:
• Guesstimate the rate of increase in H(t) at different points in time.
• Here the slopes are identical, so the rate of change in H(t) is constant over time, indicating that the level of h(t) is constant over time.

Conclusion: If you had a way of estimating H(t), you could deduce the shape of h(t) by studying how the gradient of H(t) changes over time; any change in the gradient reflects a corresponding change in h(t).
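The h(t) to H(t) round trip can be mimicked numerically. In this sketch the cumulation is approximated by a left Riemann sum on a fine grid, the constant hazard of 0.3 is a hypothetical value, and `cumulate` and `slopes` are illustrative helper names.

```python
def cumulate(hazard, t_max, steps):
    """Approximate H(t), the cumulation of h from 0 to t, with a left
    Riemann sum. Returns the time grid and H at each grid point."""
    dt = t_max / steps
    grid = [k * dt for k in range(steps + 1)]
    H = [0.0]
    for k in range(steps):
        H.append(H[-1] + hazard(grid[k]) * dt)   # add this instant's risk
    return grid, H


def slopes(grid, H):
    """Recover h(t) from H(t) by successive differences (the gradient of H)."""
    return [(H[k + 1] - H[k]) / (grid[k + 1] - grid[k]) for k in range(len(H) - 1)]


def constant_hazard(t):
    return 0.3   # hypothetical constant hazard-rate


grid, H = cumulate(constant_hazard, t_max=10.0, steps=100)
print(round(H[50], 6), round(H[100], 6))   # 1.5 3.0: H rises linearly
s = slopes(grid, H)
print(round(min(s), 6), round(max(s), 6))  # 0.3 0.3: a constant h(t)
```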


Slide 9: From Cumulative Hazard-Rate To Hazard-Rate: Develop Your Intuition

(ALDA, Section 13.4.1, pp 488-491)

• H(t) accelerates over time → h(t) must be increasing (a linear increase in h(t) is not guaranteed, but a steady increase is).
• H(t) decelerates over time → h(t) must be decreasing. Over time, smaller amounts of risk are added to H(t), suggesting an asymptote in h(t).
• H(t) accelerates, then decelerates → h(t) must be initially low, then increase, and then decrease. When the rate of increase in H(t) reverses itself, h(t) has hit a peak (or trough).
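To practise these three readings, the sketch below computes finite-difference slopes of three hypothetical H(t) curves (quadratic, square-root and logistic shapes, chosen only for illustration) and checks whether the slopes rise, fall, or peak. The helper `slope_trend` is an illustrative name.

```python
import math


def slope_trend(H_func, t_values):
    """Finite-difference slopes of H(t) on a grid. Rising slopes mean H
    accelerates (h(t) increasing); falling slopes mean H decelerates."""
    H = [H_func(t) for t in t_values]
    return [(H[k + 1] - H[k]) / (t_values[k + 1] - t_values[k])
            for k in range(len(H) - 1)]


grid = [0.5 * k for k in range(1, 21)]      # t = 0.5, 1.0, ..., 10.0

accelerating = slope_trend(lambda t: 0.02 * t ** 2, grid)
decelerating = slope_trend(lambda t: math.sqrt(t), grid)
s_shaped = slope_trend(lambda t: 3 / (1 + math.exp(-(t - 5))), grid)

print(all(b > a for a, b in zip(accelerating, accelerating[1:])))  # True: h(t) rising
print(all(b < a for a, b in zip(decelerating, decelerating[1:])))  # True: h(t) falling

# Where H's rate of increase reverses itself, h(t) peaks (near t = 5 here)
peak = max(range(len(s_shaped)), key=lambda k: s_shaped[k])
print(grid[peak])
```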


Slide 10: Cumulative Hazard-Rate Function In Practice: Estimation Methods & Data Analytic Practice

(ALDA, Section 13.4.2, pp 491-494)

The “-ln S(t)” Method
• It requires calculus to prove, but it can be established that H(t_j) = -ln S(t_j).
• So … you can estimate H(t) by taking the negative log of the KM-estimated survivor function.

[Figure: negative-log-survivor estimate of H(t_j) (0.00 to 3.50) against seconds after the light turns green (0 to 20), annotated “slowest rate of increase,” then “faster rate of increase,” then “slowing down.”]

Examining the changing slopes in Ĥ(t) to learn about the hazard-rate …

Conclusion: Hazard-rate is initially low, increases until around the 5th second, and then decreases again.

How Can We Systematically Quantify These Changing Rates Of Increase So As To Estimate Hazard-Rate?
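For readers who want the calculus step behind H(t) = -ln S(t), here is a short derivation. It assumes T has a density f(t) = -S'(t), that “cumulation” means integrating h over time, and that S(0) = 1 (nobody has experienced the event at time 0):

```latex
h(t) = \lim_{\Delta t \to 0} \frac{\Pr[t \le T < t + \Delta t]}{\Delta t \, \Pr[T \ge t]}
     = \frac{f(t)}{S(t)}
     = -\frac{S'(t)}{S(t)}
     = -\frac{d}{dt} \ln S(t)

H(t) = \int_0^t h(u)\, du = -\ln S(t) + \ln S(0) = -\ln S(t)
\qquad (\text{since } S(0) = 1)
```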


Slide 11: Kernel-Smoothed Estimates Of The Hazard-Rate Function

(ALDA, Section 13.5, pp 494-497)

Finally … A Computational Window On The Continuous-Time Hazard-Rate

Idea: Use the changing rates of change in cumulative hazard to generate (admittedly erratic) hazard-rate estimates, and smooth them out …
• h(t_j) = rate of change in {-ln S(t_j)}.
• So … successive differences in sample cumulative hazard yield “pseudo-slope” estimates of hazard.
• Slide a temporal window (a “bandwidth”) across the plot and aggregate these estimates together in a moving average.
• This yields “kernel-smoothed” approximate hazard-rate estimates.

[Figure: kernel-smoothed hazard-rate estimates with bandwidth = 1, 2, and 3.]

But, as the bandwidth widens:
• The link between the smoothed function and the actual hazard-rate is weakened, because we are estimating the hazard-rate’s average within a broader timeframe.
• The estimates cannot be computed near the beginning and the end of follow-up (and the excluded stretch grows with the bandwidth), because of the need to average; this is a big problem if hazard is highest initially.
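A minimal Python sketch of the pseudo-slope and moving-average idea. Assumptions: each pseudo-slope is placed at its interval midpoint, every pseudo-slope inside the window gets equal weight (a uniform kernel; statistical packages may weight observations differently), and the function names and data are hypothetical.

```python
import math


def pseudo_slopes(times, surv):
    """Successive differences in H = -ln S between adjacent event times.
    Each pseudo-slope is located at the midpoint of its interval."""
    H = [-math.log(s) for s in surv]   # requires every S value > 0
    out = []
    for k in range(len(times) - 1):
        midpoint = (times[k] + times[k + 1]) / 2
        out.append((midpoint, (H[k + 1] - H[k]) / (times[k + 1] - times[k])))
    return out


def kernel_smooth(slopes, t, bandwidth):
    """Uniform-kernel (moving-average) hazard estimate at time t, or None when
    the window [t - bandwidth, t + bandwidth] extends past the data."""
    first, last = slopes[0][0], slopes[-1][0]
    if t - bandwidth < first or t + bandwidth > last:
        return None
    inside = [s for m, s in slopes if abs(m - t) <= bandwidth]
    return sum(inside) / len(inside) if inside else None


# Hypothetical survivor values with a true constant hazard of 0.5 per unit
times = [float(k) for k in range(11)]
surv = [math.exp(-0.5 * t) for t in times]
slopes = pseudo_slopes(times, surv)

print(kernel_smooth(slopes, t=5.0, bandwidth=1.0))   # close to 0.5
print(kernel_smooth(slopes, t=1.0, bandwidth=1.0))   # None: too close to the start
```

The `None` near the start mirrors the warning above: a window of half-width b cannot be centred closer than b to the edge of the data, so wider bandwidths lose more of the early (and late) time axis.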


Slide 12: Introducing Cox Regression Analysis: Illustrative Data Example

Data source: Kristin Henning and colleagues, Criminal Justice and Behavior.

Sample: 194 inmates released from a minimum-security prison.

Research design: Each was followed for up to 3 years.
• Event: whether and, if so, when they were re-arrested.
• Arrest recorded to the nearest day.
• n=106 (54.6%) were reincarcerated.

Person-level dataset (note: we do not use a person-period data set):
• PERSONAL identifies the 61 former inmates (31.4%) who had a history of person-related crimes (e.g., assault, kidnapping).
• PROPERTY identifies the 158 (81.4%) who had a history of property crimes.
• AGE at release, centered on the sample mean of 30.7.
• Event occurrence information.


Slide 13: Towards The Cox Model For Continuous-Time Hazard-Rate: Sample Functions, By PERSONAL

(ALDA, Section 14.1.1, pp. 504-507)

Sample survivor functions: Recidivism is high in both groups, and those with a history of person-related crimes are at greater risk (median lifetime, ML, of 17.3 months for PERSONAL=0 vs. 13.1 months for PERSONAL=1).

Sample cumulative hazard functions: Approximately linear immediately after release; they soon accelerate (but at different times), and eventually both decelerate. This suggests that each underlying hazard function is initially steady, then rises, then falls.

Sample kernel-smoothed hazard functions: These can’t describe risk immediately after release, but by month 8 we can see that the hazard-rate for those with PERSONAL=1 is consistently higher than for those with PERSONAL=0.

Intuitively, a continuous-time hazard-rate model should look like a discrete-time (DT) hazard probability model, in which a sensible transformation of hazard is expressed as the sum of two components:
• A baseline function: the value of transformed hazard-rate when all predictors are 0.
• A weighted linear combination of predictors.

Realistically, because we lack a complete picture of hazard, we develop the model conceptually in terms of cumulative hazard. Then, we use algebra to deduce an equivalent specification of the model in terms of hazard-rate.


Slide 14: Towards The Cox Regression Model For Cumulative Hazard-Rate

(ALDA, Section 14.1.1, pp. 504-507)

What Kind Of Statistical Model Should We Use To Model Log H(t)?

Problem: Cumulative hazard-rate is bounded below at 0.

[Figure: sample H(t_j) (0.00 to 1.50) for PERSONAL = 0 and PERSONAL = 1, by months after release (0 to 36).]

Solution: Model the log of cumulative hazard-rate … Taking logs expands vertical separation at smaller values and compresses vertical separation at higher values.

[Figure: sample log H(t_j) (-6.00 to 1.00) for PERSONAL = 0 and PERSONAL = 1, by months after release (0 to 36).]

A dual partition still makes sense, with log H(t) expressed as the sum of two parts:
• A baseline function, now the value of log H(t) when all predictors are 0.
• A weighted linear combination of the predictors.

But, How Do We Specify This Baseline?
As in discrete-time survival analysis (DTSA), we use a completely general, unconstrained profile, which we’ll call log H_0(t_j), and we won’t even estimate it! You might think that this built-in vagueness creates problems for estimation, but the beauty of Cox’s approach is that it’s perfectly fine.


Slide 15: Specifying The Cox Regression Model In Terms Of Log Cumulative Hazard-Rate

(ALDA, Section 14.1.2, pp. 507-512)

$$\log H(t_{ij}) = \log H_0(t_j) + \beta_1 \text{PERSONAL}_i$$

When PERSONAL = 0:
$$\log H(t_{ij}) = \log H_0(t_j)$$

When PERSONAL = 1:
$$\log H(t_{ij}) = \log H_0(t_j) + \beta_1$$

When PERSONAL = 1, the baseline function shifts “vertically” by β1.

[Figure: sample log H(t_ij) for PERSONAL = 0 and PERSONAL = 1 (plotting symbols), by months after release (0 to 36), with the hypothesized curves log H_0(t_j) and log H_0(t_j) + β1.]

Mapping the model onto sample log cumulative hazard functions (plotting symbols denote the estimated subsample values):
• The curves are hypothesized population log cumulative hazard-rate functions (they should go through the sample data, but we don’t expect them to fit perfectly).
• The vertical distance between the functions, β1, captures the magnitude of the predictor’s effect. (We assume that the effect is constant regardless of how long the offender has been out of prison.)


Slide 16: Antilogging To Specify The Cox Regression Model In Terms Of Cumulative Hazard-Rate

(ALDA, Section 14.1.2, pp. 507-512)

$$\log H(t_{ij}) = \log H_0(t_j) + \beta_1 \text{PERSONAL}_i \quad\Longrightarrow\quad H(t_{ij}) = H_0(t_j)\,e^{\beta_1 \text{PERSONAL}_i}$$

When PERSONAL = 0:
$$H(t_{ij}) = H_0(t_j)$$

When PERSONAL = 1:
$$H(t_{ij}) = H_0(t_j)\,e^{\beta_1}$$

When PERSONAL = 1, the baseline function is no longer shifted vertically; instead, it is multiplied by exp(β1).

[Figure: sample H(t_ij) for PERSONAL = 0 and PERSONAL = 1 (plotting symbols), by months after release (0 to 36), with the hypothesized population curves H_0(t_j) and H_0(t_j) exp(β1).]

Ratio of cumulative hazard-rate functions:
$$\frac{H_0(t_j)\,e^{\beta_1}}{H_0(t_j)} = e^{\beta_1}$$

When the outcome is raw cumulative hazard-rate, the functions are magnifications and diminutions of each other: they are proportional. Yet we still say the effect is constant over time, because their ratio is constant.
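The equivalence between a constant vertical shift on the log scale and a constant ratio on the raw scale is easy to check numerically. The baseline values and β1 = 0.5 below are hypothetical.

```python
import math

BETA1 = 0.5                                   # hypothetical effect of PERSONAL
H0 = [0.05, 0.20, 0.45, 0.80, 1.10]           # hypothetical baseline H0(t_j)

H_personal0 = H0                                      # PERSONAL = 0
H_personal1 = [h0 * math.exp(BETA1) for h0 in H0]     # PERSONAL = 1

# log scale: constant vertical distance BETA1
log_gaps = [math.log(h1) - math.log(h0) for h0, h1 in zip(H_personal0, H_personal1)]
# raw scale: constant ratio exp(BETA1) ...
ratios = [h1 / h0 for h0, h1 in zip(H_personal0, H_personal1)]
# ... but raw differences grow with H0: proportional curves, not parallel ones
raw_gaps = [h1 - h0 for h0, h1 in zip(H_personal0, H_personal1)]

print([round(g, 6) for g in log_gaps])   # every entry 0.5
print([round(r, 6) for r in ratios])     # every entry 1.648721 (= exp(0.5))
print([round(d, 3) for d in raw_gaps])
```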


Slide 17: Hazard-Rate Representation Of The Cox Regression Model

(ALDA, Section 14.1.3, pp. 512-516)

Using calculus, we can show that the Cox regression models just specified in terms of cumulative hazard-rate are identical to those expressed in terms of raw hazard-rate.

Cumulative hazard-rate format:
$$\log H(t_{ij}) = \log H_0(t_j) + \beta_1 X_i \qquad H(t_{ij}) = H_0(t_j)\,e^{\beta_1 X_i}$$

Raw hazard-rate format:
$$\log h(t_{ij}) = \log h_0(t_j) + \beta_1 X_i \qquad h(t_{ij}) = h_0(t_j)\,e^{\beta_1 X_i}$$

[Figure: four panels over time (0 to 100). H(t_ij) with curves H_0(t_j) and H_0(t_j)exp(β1), ratio = exp(β1); log H(t_ij) with curves log H_0(t_j) and log H_0(t_j) + β1, difference = β1; and the corresponding h(t_ij) and log h(t_ij) panels.]

When expressed on a log scale, β1 represents a constant vertical distance; when expressed on a raw scale, exp(β1) represents a constant vertical ratio.

Practical Consequences
1. We can conduct exploratory data analysis using cumulative hazard-rate.
2. We can interpret parameter estimates in terms of predictors’ effects on hazard-rate.
3. Because raw hazard-rate profiles at different levels of the predictors are proportional, the Cox model is often called a “proportional hazards model.”
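The calculus step can be written out in a few lines, assuming h(t) is the time derivative of H(t) (the cumulation being an integral) and that X_i does not change over time:

```latex
H(t_{ij}) = H_0(t_j)\, e^{\beta_1 X_i}

h(t_{ij}) = \frac{d}{dt} H(t_{ij})
          = e^{\beta_1 X_i} \, \frac{d}{dt} H_0(t_j)
          = h_0(t_j)\, e^{\beta_1 X_i}

\log h(t_{ij}) = \log h_0(t_j) + \beta_1 X_i
```

Because the multiplier e^{β1 X_i} is constant in time, it passes through the derivative unchanged; that is why the same β1 appears in both formats.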


Slide 18: Fitting The Cox Regression Model To Data

(ALDA, Section 14.2, pp 516-523)

$$\log h(t_{ij}) = \log h_0(t_j) + \beta_1 X_{1ij} + \beta_2 X_{2ij} + \cdots + \beta_P X_{Pij}$$
$$h(t_{ij}) = h_0(t_j)\exp(\beta_1 X_{1ij} + \beta_2 X_{2ij} + \cdots + \beta_P X_{Pij})$$

Estimation
In addition to specifying a statistical model for hazard-rate, Cox developed an ingenious method for fitting his “Cox regression model” to data, called partial maximum likelihood estimation; it is available in all major statistical packages (see §14.2).

Three Practical Consequences Of Cox’s Method
• The Shape Of The Baseline Hazard-Rate Function Is Irrelevant. Unlike parametric methods, we need not make any assumptions about the shape of the baseline hazard-rate function.
• The Precise Event Times Turn Out To Be Irrelevant; Only Their Rank Order Matters. Cox regression analysis is semi-parametric. The very data that you took pains to collect so precisely are effectively converted into ranks during model fitting!
• Ties Can Create Analytic Difficulties. Even though the specific time values are irrelevant, their ranking does matter. In theory, there should be no ties; in reality, there always are. (In the recidivism data, there are 5 days on which 2 people were arrested: days 9, 77, 178, 207, and 528.) All packages have one or more ad-hoc ways of dealing with this (we use Efron’s method).
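To make the rank-order point concrete, here is a minimal sketch of Cox's log partial likelihood for a single predictor. It assumes no tied event times (Efron's and other tie-handling methods are not implemented), and the data and function name are hypothetical. Each factor compares the person who had the event with everyone still at risk, so h_0(t_j) cancels and only the ordering of the times enters the calculation.

```python
import math


def cox_log_partial_likelihood(beta, times, events, x):
    """Log partial likelihood for a one-predictor Cox model, assuming no tied
    event times. Censored people contribute only through the risk sets."""
    ll = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue
        # everyone still at risk at t_i (observed time at or after t_i)
        risk_set = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
        ll += beta * x_i - math.log(sum(math.exp(beta * x_j) for x_j in risk_set))
    return ll


times = [2.0, 3.5, 4.1, 7.3, 9.0]   # hypothetical observed times
events = [1, 1, 0, 1, 1]            # 1 = event observed, 0 = censored
x = [1, 0, 1, 1, 0]                 # hypothetical dichotomous predictor

a = cox_log_partial_likelihood(0.4, times, events, x)
b = cox_log_partial_likelihood(0.4, [t ** 2 for t in times], events, x)
print(math.isclose(a, b))   # True: squaring keeps the rank order, so nothing changes
```

Maximizing this function over beta gives the partial ML estimate; in practice you would use a statistical package rather than this sketch.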


Slide 19: Interpreting Parameter Estimates From a Fitted Cox Regression Model

(ALDA, Section 14.3.1, pp. 524-528)

[Table: parameter estimates from the simple uncontrolled models and from the overall model.]

Strategy for interpreting parameter estimates: Each summarizes the impact of a one-unit difference in the predictor on the log hazard-rate, controlling for the other predictors in the model.

For example, the log hazard-rate for someone with a history of personal offenses is 0.479 units higher than for someone without this history.

What Does This Look Like Graphically? Returning to the earlier sample log cumulative hazard-rate functions by PERSONAL, we estimate that, in the population, the average vertical distance between them is 0.479.

But, Is There A More Intuitive Way Of Explaining This?


Slide 20: Interpreting Parameter Estimates as Hazard-Rate Ratios In A Fitted Cox Regression Model

(ALDA, Section 14.3.1, pp. 524-528)

You can antilog the parameter estimates and interpret them as fitted hazard-rate ratios associated with a 1-unit difference in the predictor …

$$e^{1.1946} \approx 3.30$$

The estimated hazard-rate describing recidivism among offenders with a history of property offenses is more than three times that of those with no such history.

For continuous predictors: compute the percentage difference in hazard associated with a 1-unit difference in the predictor as 100 × (hazard ratio - 1):

$$100 \times (0.9342 - 1) = -6.58\%$$

The estimated hazard-rate for recidivism is 6.6% lower for each additional year of age upon release.

Careful: You Can Only Make Comparative Statements About Fitted Hazard-Rate From Cox Regression Output …
• You can say that the hazard-rate for one group is three times that of another, but you cannot say how high, or low, either function actually is.
• This is the critical compromise associated with the partial ML approach used to fit the Cox regression model.
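The two conversions on this slide can be reproduced directly. `hazard_ratio` and `percent_difference` are illustrative helper names; the inputs are the estimates quoted on the slide (the PROPERTY coefficient 1.1946 and the AGE hazard ratio 0.9342).

```python
import math


def hazard_ratio(coef):
    """Antilog a Cox coefficient: fitted hazard-rate ratio for a 1-unit difference."""
    return math.exp(coef)


def percent_difference(hr):
    """Percentage difference in hazard-rate associated with a 1-unit difference."""
    return 100 * (hr - 1)


print(round(hazard_ratio(1.1946), 2))         # 3.3: more than three times the hazard
print(round(percent_difference(0.9342), 2))   # -6.58: % lower per additional year of age
```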


Slide 21: Using Risk Scores: Summarizing the Impact Of Several Predictors Simultaneously

(ALDA, Section 14.3.4, pp. 532-535)

Q: How Do You Compare Each Person’s Unique Risk Of Event Occurrence To That Of A “Baseline Person” (e.g., someone with all predictor values equal to 0: here, a person of average AGE on release (30.7), with no history of PERSONAL or PROPERTY crime)?

A: You do it by taking ratios of their hazard functions:

$$\frac{h_0(t_j)\exp(\beta_1 X_{1ij} + \beta_2 X_{2ij} + \cdots + \beta_P X_{Pij})}{h_0(t_j)\exp(\beta_1 \cdot 0 + \beta_2 \cdot 0 + \cdots + \beta_P \cdot 0)} = \exp(\beta_1 X_{1ij} + \beta_2 X_{2ij} + \cdots + \beta_P X_{Pij}) = \text{risk score}$$

[Table: estimated risk scores for selected individuals, grouped as follows.]

Average comparative risk: but participants arrived by different routes:
• ID 22 was of average age on release, with no history of these crimes.
• ID 8 had a history of both crimes but was 22 years older than the average inmate upon release.

High comparative risk:
• All were younger than average on release.
• The risk that ID 5 will re-offend is over seven times that of a baseline person.

Low comparative risk:
• All were much older than average on release.
• None has a history of both crimes.

Risk scores are useful for demonstrating that there is more than one way to attain a given level of risk, but …
Careful: changing the baseline by centering the predictors changes the values of the risk scores …
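A minimal sketch of the risk-score calculation, with hypothetical coefficients (not the fitted estimates from the recidivism models) and AGE assumed to be centered so that 0 means average age. The function and person labels are illustrative.

```python
import math

# Hypothetical coefficients, NOT the fitted recidivism estimates
COEFS = {"PERSONAL": 0.5, "PROPERTY": 1.2, "AGE": -0.07}


def risk_score(person, coefs=COEFS):
    """exp(b1*X1 + ... + bP*XP): hazard-rate relative to a baseline person
    whose predictor values are all 0 (AGE is assumed centered)."""
    return math.exp(sum(coefs[name] * person[name] for name in coefs))


baseline = {"PERSONAL": 0, "PROPERTY": 0, "AGE": 0.0}     # average age, no history
young_both = {"PERSONAL": 1, "PROPERTY": 1, "AGE": -8.0}  # 8 years younger than average
older_both = {"PERSONAL": 1, "PROPERTY": 1, "AGE": 24.0}  # 24 years older than average

for label, person in [("baseline", baseline), ("young_both", young_both),
                      ("older_both", older_both)]:
    print(label, round(risk_score(person), 2))
```

With these made-up values the younger offender with both histories scores far above 1, while the older offender with both histories lands near 1, echoing the point that different routes can lead to similar risk. Re-centering AGE would change who counts as the baseline person, and hence every score.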


Slide 22: Recovering Survivor & Cumulative Hazard-Rate Functions From A Fitted Cox Regression Model

(ALDA, Section 14.4, pp. 535-542)

Even though we have repeatedly stated that Cox regression analysis provides no information about the baseline hazard-rate function, it is actually possible to recover estimated baseline functions from the fitted model …

This is useful for documenting the combined effects of predictors. Here, we use Model D to control for AGE and demonstrate the combined effect of the predictors PERSONAL and PROPERTY, documenting the large differences in survival associated with variation in them.
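One way to turn a recovered baseline into fitted curves follows from two relationships stated earlier in the deck, H = -ln S and H(t|X) = H_0(t) exp(βX), which together give S(t|X) = S_0(t) raised to the power of the risk score. The sketch below assumes the baseline survivor values have already been recovered by the software; the numbers and the helper name are hypothetical.

```python
def fitted_survivor(baseline_surv, risk):
    """S(t | X) = S0(t) ** risk, which follows from H = -ln S and
    H(t | X) = H0(t) * risk, with risk = exp(b1*X1 + ... + bP*XP)."""
    return [s0 ** risk for s0 in baseline_surv]


# Hypothetical recovered baseline survivor values S0(t_j)
baseline_surv = [1.0, 0.9, 0.75, 0.6, 0.5]

print(fitted_survivor(baseline_surv, risk=1.0))   # baseline person: unchanged
print(fitted_survivor(baseline_surv, risk=2.0))   # twice the hazard: lower survival
```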


Slide 23: Including Time-Varying Predictors In A Cox Regression Model

(ALDA, Section 15.1, pp 544-545)

Model specification is easy: just add the subscript j to the time-varying predictors:

$$h(t_{ij}) = h_0(t_j)\exp(\beta_1 X_{1ij} + \beta_2 X_{2ij})$$

But the data demands can be enormously high (sometimes insurmountable) …
• You need to know the value of any time-varying predictor, for everyone still at risk, at every moment when anyone experiences the event.
• This is the same requirement as in discrete time, but it was unproblematic there because:
  – the number of unique event times was relatively small, and
  – event occurrence and predictors are typically assessed on the same schedule.
• In continuous time, you typically can’t set the data collection schedule to coincide with event occurrence for everyone still at risk.

Practical Implications
• If you’re interested in time-varying predictors, research design is crucial: don’t wait until the data are collected.
• Time-varying predictors that are non-reversible dichotomies (which themselves represent event occurrence) are the easiest to collect data on (e.g., first marriage, HS graduation).
• Reversible dichotomies and continuous predictors usually require data imputation, and things can get very complex very quickly (discussed in Sections 15.1.2 and 15.1.3).
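The data demand can be pictured with a small sketch: for each observed event time, everyone still at risk needs a current value of the time-varying predictor. Here the predictor is a hypothetical non-reversible dichotomy defined by an onset time; the records and helper names are made up for illustration.

```python
def tv_indicator(onset_time, t):
    """Non-reversible dichotomy: 0 before onset, 1 from onset onward.
    onset_time is None if the person never experienced the onset event."""
    return int(onset_time is not None and onset_time <= t)


def risk_set_rows(people):
    """For every observed event time, list the predictor value of each
    person still at risk at that moment."""
    event_times = sorted(p["time"] for p in people if p["event"])
    rows = []
    for t in event_times:
        for p in people:
            if p["time"] >= t:
                rows.append((t, p["id"], tv_indicator(p["onset"], t)))
    return rows


people = [   # hypothetical person-level records
    {"id": 1, "time": 5.0, "event": True, "onset": 2.0},
    {"id": 2, "time": 8.0, "event": True, "onset": None},
    {"id": 3, "time": 9.0, "event": False, "onset": 6.5},
]
for row in risk_set_rows(people):
    print(row)   # (event time, id, predictor value at that moment)
```

Person 3's value switches from 0 to 1 between the two event times, so a single snapshot would not suffice; only because the predictor is a non-reversible onset can it be reconstructed from one recorded onset time.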


(ALDA, Section 15.1.1, pp 545-551)

Sample: 1,658 men interviewed twice (in 1974 and 1985) --

382 (23.0%) started using cocaine between ages 17 and 41.

Data source: Burton and colleagues (1996),Journal of Health and Social Behavior

Three Time-Invariant Predictors

• EARLYMJ and EARLYOD indicate whether the respondent had initiated marijuana (7.2%) or other drugs (3.7%) so early that he could be characterized as a previous user at t_0 (age 17).

• BIRTHYR (1961-1985), to account for societal changes (included as a control predictor in every model).

Four Time-Varying Predictors

– USEDMJ_j, SOLDMJ_j, USEDOD_j, and SOLDOD_j each identify, at each age t_j, whether the respondent had previously used or sold marijuana (MJ) or other drugs (OD).

– Conceptually, think about a person-period data set in which these variables switch from 0 to 1 in the relevant year and stay at 1 thereafter.

– In practice, we do not build a person-period data set; instead, programming statements compute each predictor's value at each event time from a person-level data set (Section 15.1, p. 547).

• Rather than using contemporaneous values of the TV predictors, we lag them by one year. This addresses issues of rate- and state-dependence (discussed in Section 12.3.3, p. 440).
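Coding a lagged, non-reversible dichotomy from person-level data only requires the age at onset. The sketch below is one plausible coding, assuming ages are recorded in whole years and that "lagged by one year" means the predictor at age t reflects status at age t - 1; `lagged_indicator` and `indicator_path` are hypothetical helpers, not the code referenced in ALDA.

```python
from typing import List, Optional

def lagged_indicator(age: int, onset_age: Optional[int], lag: int = 1) -> int:
    """Hypothetical helper: value at `age` of a non-reversible dichotomy
    (e.g. USEDMJ) computed from a person-level onset age.

    Returns 1 once onset occurred at or before age - lag, and stays 1
    thereafter; onset_age is None for someone who never initiated.
    """
    if onset_age is None:
        return 0
    return int(onset_age <= age - lag)

def indicator_path(onset_age: Optional[int], ages: List[int], lag: int = 1) -> List[int]:
    """The 0 -> 1 path a person-period data set would store, one value per age."""
    return [lagged_indicator(a, onset_age, lag) for a in ages]

# A respondent who first used marijuana at 19: with a one-year lag,
# USEDMJ is 0 at ages 18 and 19 and switches to 1 at age 20.
print(indicator_path(19, list(range(18, 22))))
```

In a Cox fit, logic like `lagged_indicator` would be evaluated at each event age for everyone at risk, so the full path never has to be stored.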

© Willett & Singer, Harvard University Graduate School of Education S077/Week #6– Slide 24

Including Time-Varying Non-Reversible Dichotomies As Predictors In A Cox Model: Example


(ALDA, Section 15.1.1, pp 545-551)

A: Only Time-Invariant Predictors Included

• All 3 stat sig.

B: Substitute Time-Varying Use Predictors

• Effects much larger (and still stat sig.).

• Fit much better.

C: Add Time-Varying Sales Predictors

• Creates ordinal variable when paired with use predictors.

• Both use and sales are stat sig.

• Effects add on the log-hazard scale: someone who both used and sold MJ and OD has a hazard ratio of exp(5.1606) ≈ 174.27!

• Best fitting model so far.

D: Add Back Time-Invariant Predictors

• Estimates are not stat sig.

• D fits no better than C.

• We prefer Model C.

Note: Diminishing BIRTHYR effects

• Uncontrolled estimate = 0.2026.

• Drops from 0.1551 to 0.0849 from A to C.

• Effects previously attributable to BIRTHYR get absorbed by TV drug use (known as substitution effects).
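Cox coefficients live on the log-hazard scale, so combined effects are sums of coefficients and the implied hazard ratio is the exponential of that sum (equivalently, the product of the individual hazard ratios). The minimal sketch below uses only the numbers quoted on this slide; `hazard_ratio` is a hypothetical helper name.

```python
import math

def hazard_ratio(*log_hazard_coefs: float) -> float:
    """Hazard ratio implied by one or more Cox coefficients.

    Coefficients add on the log-hazard scale, so the combined hazard
    ratio is exp(sum), which equals the product of the separate ratios.
    """
    return math.exp(sum(log_hazard_coefs))

# Combined use-and-sales effect quoted for Model C
combined = hazard_ratio(5.1606)
print(f"used and sold MJ and OD: {combined:.2f}")

# BIRTHYR: hazard ratio per additional birth year as controls are added
for label, coef in [("uncontrolled", 0.2026), ("Model A", 0.1551), ("Model C", 0.0849)]:
    print(f"BIRTHYR, {label}: {hazard_ratio(coef):.3f}")
```

Read this way, the per-year BIRTHYR hazard ratio shrinks from roughly 1.22 (uncontrolled) to roughly 1.09 in Model C, which is the substitution effect described above expressed on the hazard-ratio scale.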

© Willett & Singer, Harvard University Graduate School of Education S077/Week #6– Slide 25

Interpreting Results Of Fitting Cox Regression Models With Time-Varying Predictors


(ALDA, Section 15.3.2, pp 564-570)

Sample: 174 teens admitted to a psychiatric hospital

Research Design: True Experiment

• Half (n=88) had traditional psychiatric treatment and services (TREAT=0).

• The other half (n=86) were randomly selected to participate in an innovative program that provided coordinated mental health services regardless of setting, in- or out-patient (TREAT=1).

Everyone tracked for up to 3 months to determine whether and, if so, when they were released.

RQ: Does provision of comprehensive mental health services reduce the length of hospital stay?

Data source: Michael Foster and colleagues (1996), Evaluation and Program Planning

Fitted Cox model for TREAT

Does this statistically non-significant effect mean that the TREATment has no effect?

Perhaps, but not necessarily… It could be that the effect of TREATment varies over time.

© Willett & Singer, Harvard University Graduate School of Education S077/Week #6– Slide 26

Might The Effect Of A Predictor Differ Over TIME?


(ALDA, Section 15.3.2, pp 564-570)

$$h(t_{ij}) = h_0(t_j)\, e^{\beta_1 \mathrm{TREAT}_i}$$

$$\log h(t_{ij}) = \log h_0(t_j) + \beta_1 \mathrm{TREAT}_i$$

Proportional Hazards Assumption

Any predictor's effect may only produce a constant difference in the elevation of the log hazard-rate profile (equivalently, a constant magnification or diminution of the raw hazard profile).

In discrete time, we could plot the sample hazard probability functions to check for a violation, but in continuous time it's not easy to plot sample hazard-rate functions.

Solution: Inspect the sample cumulative hazard functions. Because of model equivalence (the log cumulative hazard differs across groups by the same constant as the log hazard), these plots tell us what we need to know about potential violations of the proportionality assumption: under proportional hazards, the log cumulative hazard curves should be parallel.
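The step from hazard to cumulative hazard can be written out. Because TREAT is time-invariant, the factor involving it comes out of the integral, so the model for the log cumulative hazard has the same constant shift as the model for the log hazard:

```latex
\begin{aligned}
H_i(t) &= \int_0^{t} h_0(u)\, e^{\beta_1 \mathrm{TREAT}_i}\, du
        = e^{\beta_1 \mathrm{TREAT}_i}\, H_0(t) \\
\log H_i(t) &= \log H_0(t) + \beta_1 \mathrm{TREAT}_i
\end{aligned}
```

Under proportional hazards the treatment and comparison curves of log H(t) are therefore separated by the same vertical distance, beta_1, at every t; curves that converge or diverge point to a violation.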

[Figure: fitted log H(t) by days in hospital (0 to 77), Treatment vs. Comparison groups]

TREATment effect appears large early

Little TREATment effect late

If The Proportionality Assumption Is Violated For A Predictor, Then There Is An Interaction Between The Predictor And TIME.
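A sample cumulative hazard can be computed with the Nelson-Aalen estimator; the negative log of the Kaplan-Meier estimate is a common alternative, and this sketch does not claim to reproduce the estimator behind the plot above. It is a minimal Python illustration with hypothetical helper names: estimate H(t) for each group, then look at the gap log H1(t) - log H0(t) over a grid of times.

```python
import math
from collections import Counter

def nelson_aalen(times, events):
    """Nelson-Aalen estimate of the cumulative hazard H(t).

    times  -- observed times (event or censoring)
    events -- 1 if the event occurred at that time, 0 if censored
    Returns (event_time, H_hat) pairs at each distinct event time.
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    estimates, cumulative = [], 0.0
    for t in sorted(event_counts):
        at_risk = sum(1 for s in times if s >= t)
        cumulative += event_counts[t] / at_risk
        estimates.append((t, cumulative))
    return estimates

def step_value(estimates, t):
    """H_hat(t) as a right-continuous step function (0 before the first event)."""
    value = 0.0
    for time, h in estimates:
        if time <= t:
            value = h
        else:
            break
    return value

def log_cumhaz_gap(group1, group0, grid):
    """log H1(t) - log H0(t) on a grid of times; roughly constant under PH.

    group1 and group0 are (times, events) pairs; grid points where either
    estimate is still 0 are skipped because log(0) is undefined.
    """
    est1, est0 = nelson_aalen(*group1), nelson_aalen(*group0)
    gaps = []
    for t in grid:
        h1, h0 = step_value(est1, t), step_value(est0, t)
        if h1 > 0 and h0 > 0:
            gaps.append((t, math.log(h1) - math.log(h0)))
    return gaps
```

With the hospital data, each group's input would be days in hospital plus a release indicator. A gap that stays roughly flat is consistent with proportional hazards; a gap that shrinks with time, as the TREAT plot suggests, points toward a TREAT by TIME interaction.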

© Willett & Singer, Harvard University Graduate School of Education S077/Week #6– Slide 27

How Might We Detect A Violation Of The “Proportional Hazards” Assumption?


(ALDA, Section 15.3.2, pp 564-570)

Three Common Specifications For Interactions With TIME

A. Linear interaction with TIME: the effect of TREAT declines smoothly (linearly) over time.

B. Step function: the effect of TREAT differs across epochs (here, we'll ask about weeks).

C. Logarithmic: similar to linear, but handles the long tails typical of TIME.
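Each specification amounts to a TREAT-by-TIME product that is re-evaluated at every event time, so it behaves like a time-varying predictor. The sketch below is one possible coding, assuming days are counted from 1 on admission, weeks run days 1-7, 8-14, and so on, and the step function uses one indicator per week after the first (alongside a TREAT main effect); `time_interactions` and its term names are hypothetical.

```python
import math

def time_interactions(treat: int, day: float) -> dict:
    """Hypothetical helper: TREAT-by-TIME terms evaluated on a given day.

    treat -- 0/1 treatment indicator (time-invariant)
    day   -- days since admission, with day 1 the first day
    """
    week = math.ceil(day / 7)  # days 1-7 -> week 1, days 8-14 -> week 2, ...
    terms = {
        # A. linear, TIME centered on day 1 so the TREAT main effect is the day-1 effect
        "treat_linear": treat * (day - 1),
        # C. logarithmic (natural log; another base only rescales the coefficient)
        "treat_log": treat * math.log(day),
    }
    # B. step function: a TREAT-by-week indicator for weeks 2 through 13
    for k in range(2, 14):
        terms[f"treat_week{k}"] = treat * int(week == k)
    return terms
```

Centering matters: on day 1 both the linear and the log terms equal 0, so the TREAT main effect is read as the effect on the first day of hospitalization.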

Statistically Significant Linear Interaction with TIME

By centering TIME on day 1, 0.7064 is the TREATment effect on the first day of hospitalization.

Effect of TREAT Differs by Week

Note the decline in estimates in the early weeks.

In the logarithmic specification, exp(2.5335) = 12.60 is the estimated hazard ratio on day 1. The estimated log hazard ratio for TREAT declines by 0.5301 each time length of stay doubles (1 to 2, 2 to 4, 4 to 8, etc.):

• Day 1 = 12.60

• Day 8 = 2.56

• Day 32 = 0.89
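The day-by-day hazard ratios follow from the two numbers quoted for the logarithmic specification: the log hazard ratio starts at 2.5335 on day 1 and falls by 0.5301 per doubling, that is, per unit of log2(day). This is a minimal sketch using only those rounded values, so the last digit can differ slightly from the slide's figures (day 8 comes out near 2.57); the constant and function names are hypothetical.

```python
import math

LOG_HR_DAY1 = 2.5335           # TREAT log hazard ratio on day 1 (from the slide)
DECLINE_PER_DOUBLING = 0.5301  # drop in the log hazard ratio per doubling of stay

def treat_hazard_ratio(day: float) -> float:
    """Estimated TREAT hazard ratio on a given day of hospitalization."""
    doublings = math.log2(day)  # 0 on day 1, 3 on day 8, 5 on day 32
    return math.exp(LOG_HR_DAY1 - DECLINE_PER_DOUBLING * doublings)

def crossover_day() -> float:
    """Day on which the estimated hazard ratio reaches 1 (log ratio = 0)."""
    return 2 ** (LOG_HR_DAY1 / DECLINE_PER_DOUBLING)

for day in (1, 8, 32):
    print(day, round(treat_hazard_ratio(day), 2))
print("hazard ratio reaches 1 around day", round(crossover_day(), 1))
```

Under these estimates, the treatment group's hazard of release is far higher early in the stay, the two groups are roughly equal somewhere in the fourth week, and later the ratio dips slightly below 1. This pattern is consistent with a null average TREAT effect that hides a strong early effect.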

© Willett & Singer, Harvard University Graduate School of Education S077/Week #6– Slide 28

Fitting Non-Proportional Cox Regression Models