Modelling Longitudinal Data
• Survival Analysis.
• Event History.
• Recurrent Events.
• A Final Point – and link to Multilevel Models (perhaps).
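$Y_{i1} = \beta' X_{i1} + \varepsilon_{i1}$
• $Y_{i1}$: outcome 1 for individual i.
• $\beta' X_{i1}$: vector of explanatory variables and estimates.
• $\varepsilon_{i1}$: independent and identically distributed error.
THE SAME AGAIN AT TIME 2
$Y_{i2} = \beta' X_{i2} + \varepsilon_{i2}$
• $Y_{i2}$: outcome 2 for individual i.
• $\beta' X_{i2}$: vector of explanatory variables and estimates.
• $\varepsilon_{i2}$: independent and identically distributed error.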
$Y_{i1} = \beta' X_{i1} + \varepsilon_{i1}$ and $Y_{i2} = \beta' X_{i2} + \varepsilon_{i2}$
Considered together, conventional regression analysis is NOT appropriate.
Change in Score
$Y_{i2} - Y_{i1} = \beta'(X_{i2} - X_{i1}) + (\varepsilon_{i2} - \varepsilon_{i1})$
Here $\beta'$ is simply a regression on the difference, or change, in scores.
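A minimal sketch of the change-score idea, assuming a single explanatory variable and no intercept (a constant term differences out). The data and the helper name `ols_slope_through_origin` are made up for illustration.

```python
# Sketch: regression on change scores with one covariate (illustrative data only).

def ols_slope_through_origin(x, y):
    """Least-squares slope for y = b * x with no intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical two-wave data for four individuals.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 3.5, 6.0]
y1 = [2.0, 4.0, 6.0, 8.0]
y2 = [4.0, 8.0, 7.0, 12.0]

dx = [b - a for a, b in zip(x1, x2)]  # X_i2 - X_i1
dy = [b - a for a, b in zip(y1, y2)]  # Y_i2 - Y_i1

# Estimate of beta from regressing the change in Y on the change in X.
beta_hat = ols_slope_through_origin(dx, dy)
```

The estimate uses only within-person change, so anything about an individual that is constant across the two waves drops out of the regression.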
As social scientists we are often substantively interested in whether a specific event has occurred.
Survival Data – Time to an event
In the medical area…
• Duration from treatment to death.
• Time to return of pain after taking a pain killer.
Survival Data – Time to an event
Social Sciences…
• Duration of unemployment.
• Duration of time on a training scheme.
• Duration of housing tenure.
• Duration of marriage.
• Time to conception.
Consider a binary outcome or two-state event
0 = Event has not occurred
1 = Event has occurred
[Figure: timeline from Start of Study to End of Study; individuals begin in state 0 and move to state 1 after durations t1, t2 and t3]
These durations are a continuous Y so why can’t we use standard regression techniques?
[Figure: the same timeline from Start of Study to End of Study; some individuals are still in state 0 at the End of Study and are CENSORED OBSERVATIONS]
[Figure: timeline from Start of Study to End of Study showing two CENSORED OBSERVATIONS, person A and person B]
These durations are a continuous Y so why can’t we use standard regression techniques?
What should be the value of Y for person A and person B at the end of our study (when we fit the model)?
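One way to see the problem is to store each duration with an indicator of whether the event was observed. The sketch below uses made-up values and assumes persons A and B are right-censored at the end of a hypothetical 12-month study; persons C and D are invented comparison cases.

```python
# Sketch: durations stored together with a censoring indicator.
# All values are invented for illustration.

STUDY_LENGTH = 12  # months of follow-up (hypothetical)

# (person, months observed, event occurred before the study ended?)
records = [
    ("C", 3, True),
    ("D", 7, True),
    ("A", STUDY_LENGTH, False),  # no event by end of study: censored
    ("B", STUDY_LENGTH, False),  # censored
]

observed = [t for _, t, event in records if event]
censored = [name for name, _, event in records if not event]

# Option 1: drop censored cases. Only people with short durations remain.
mean_if_dropped = sum(observed) / len(observed)

# Option 2: treat the censoring time as the true duration.
# This is only a lower bound on the durations of A and B.
mean_if_truncated = sum(t for _, t, _ in records) / len(records)
```

Neither option gives a defensible Y for A and B, which is why a method that uses the censoring indicator directly is needed.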
Cox Regression is a method for modelling time-to-event data in the presence of censored cases.
• Explanatory variables in your model (continuous and categorical).
• Estimated coefficients for each of the covariates.
• Handles the censored cases correctly.
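A rough sketch of how censored cases enter a Cox model, assuming a single covariate and no tied event times. It evaluates the standard Cox partial log-likelihood; the function name and data are hypothetical, and real analyses would use a statistics package rather than this hand-written version.

```python
import math

def cox_partial_log_likelihood(beta, durations, events, x):
    """Cox partial log-likelihood for one covariate, assuming no tied event times."""
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(durations, events)):
        if not d_i:
            continue  # a censored case contributes no event term of its own
        # Everyone still under observation at time t_i is at risk,
        # including cases that are censored later on.
        risk_set = [j for j, t_j in enumerate(durations) if t_j >= t_i]
        ll += beta * x[i] - math.log(sum(math.exp(beta * x[j]) for j in risk_set))
    return ll

# Hypothetical data: durations in months, 1 = event observed, 0 = censored.
durations = [2, 5, 7, 9]
events = [1, 0, 1, 0]
x = [1.0, 0.0, 1.0, 0.0]

ll_at_zero = cox_partial_log_likelihood(0.0, durations, events, x)
```

Censored individuals stay in the risk set for as long as they were observed, so their partial information is used rather than discarded.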
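UNEMPLOYMENT AND RETURNING TO WORK STUDY
0 = Unemployed; 1 = Returned to work
[Figure: timeline from Start of Study to End of Study; some individuals return to work (1), others are still unemployed (0) at the End of Study and are CENSORED OBSERVATIONS]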
A Statistical Model
[Figure: explanatory variables X1, X2 and X3 and the Y variable (duration with censored observations)]
A Statistical Model
[Figure: Previous Occupation, Educational Qualifications and Length of Work Experience (a continuous covariate) as explanatory variables for the Y variable (duration with censored observations)]
More complex event history analysis
UNEMPLOYMENT AND RETURNING TO WORK STUDY
0 = Unemployed; 1 = Returned to work
[Figure: one individual's timeline from Start of Study to End of Study, changing between states 0 and 1 at times t1, t2 and t3]
Spell or Episode
[Figure: timeline from Start of Study to End of Study with a period in state 0 ending at t1]
Transition = movement from one state to another
[Figure: timeline from Start of Study to End of Study with a change from state 0 to state 1 at t1]
Recurrent Events Analysis
The structure of many large-scale studies results in survey data being collected at a number of discrete occasions. In this situation time lends itself to being conceptualized as a sequence of discrete events rather than as a continuum. Furthermore, social scientists are often substantively interested in whether a specific event has occurred. Taken together, these two points favour the adoption of a discrete-time or event history approach.
Recurrent events are merely outcomes that can take place on a number of occasions. A simple example is unemployment measured month by month. In any given month an individual can either be employed or unemployed. If we had data for a calendar year we would have twelve discrete outcome measures (i.e. one for each month).
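The monthly example can be laid out as one row per person per month. This is a minimal sketch; the function name `to_person_period`, the identifier "p1", the coding 1 = employed, and the outcome values are all assumptions made for illustration.

```python
# Sketch: one calendar year of monthly outcomes as person-period rows.

def to_person_period(person_id, monthly_outcomes):
    """Return one row per month: who, which month, and the binary outcome."""
    return [
        {"id": person_id, "month": month, "y": y}
        for month, y in enumerate(monthly_outcomes, start=1)
    ]

# Hypothetical individual: 1 = employed, 0 = unemployed in each month.
year = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1]
rows = to_person_period("p1", year)
```

Each row is then a discrete outcome measure that can be fed to a model for binary data.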
Social scientists now routinely employ statistical models for the analysis of discrete data, most notably logistic and log-linear models, in a wide variety of substantive areas. I believe that the adoption of a recurrent events approach is appealing because it is a logical extension of these models.
Willett and Singer (1995) conclude that discrete-time methods are generally considered to be simpler and more comprehensible. Moreover, mastery of discrete-time methods facilitates a transition to continuous-time approaches should that be required.
Willett, J. and Singer, J. (1995) Investigating Onset, Cessation, Relapse, and Recovery: Using Discrete-Time Survival Analysis to Examine the Occurrence and Timing of Critical Events. In J. Gottman (ed) The Analysis of Change (Hove: Lawrence Erlbaum Associates).
STATISTICAL ANALYSIS FOR BINARY RECURRENT EVENTS (SABRE)
• Fits appropriate models for recurrent events.
• It is like GLIM.
• It can be downloaded free.
www.cas.lancs.ac.uk/software
Consider a binary outcome or two-state event
0 = Event has not occurred
1 = Event has occurred
In the cross-sectional situation we are used to modelling this with logistic regression.
UNEMPLOYMENT AND RETURNING TO WORK STUDY – A study for six months
0 = Unemployed; 1 = Returned to work
Months 1 2 3 4 5 6
obs    0 0 0 0 0 0
Constantly unemployed
Months 1 2 3 4 5 6
obs    1 1 1 1 1 1
Constantly employed
Months 1 2 3 4 5 6
obs    1 0 0 0 0 0
Employed in month 1 then unemployed
Months 1 2 3 4 5 6
obs    0 0 0 0 0 1
Unemployed but gets a job in month six
Months 1 2 3 4 5 6
obs    0 1 0 1 1 0
obs    0 0 1 0 1 1
obs    0 1 1 0 0 1
obs    1 0 0 0 1 0
Mixed employment patterns
Here we have a binary outcome – so could we simply use logistic regression to model it?
Months 1 2 3 4 5 6
obs    0 0 0 0 0 0
Yes and No!
SABRE fits two models that are appropriate to this analysis.
Model 1 = Pooled Cross-Sectional Logit Model
POOLED CROSS-SECTIONAL LOGIT MODEL
$$L_{it}(\beta) = \frac{[\exp(\beta' x_{it})]^{y_{it}}}{1 + \exp(\beta' x_{it})}$$
$x_{it}$ is a vector of explanatory variables and $\beta$ is a vector of parameter estimates.
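A small sketch of the pooled logit likelihood, in which the contribution of case i at occasion t is exp(β'x)^y / (1 + exp(β'x)) and every observation is treated as independent. The function names and the example values are illustrative assumptions, not SABRE's own code.

```python
import math

def pooled_logit_contribution(beta, x_it, y_it):
    """Likelihood contribution of one observation: exp(eta)^y / (1 + exp(eta))."""
    eta = sum(b * x for b, x in zip(beta, x_it))  # linear predictor beta'x_it
    return math.exp(eta) ** y_it / (1.0 + math.exp(eta))

def pooled_log_likelihood(beta, X, y):
    """Sum of log contributions, treating every row as independent."""
    return sum(
        math.log(pooled_logit_contribution(beta, x_it, y_it))
        for x_it, y_it in zip(X, y)
    )

# Hypothetical rows: a constant plus one covariate, with binary outcomes.
X = [[1.0, 0.2], [1.0, 1.5], [1.0, -0.7]]
y = [0, 1, 0]
ll = pooled_log_likelihood([0.0, 0.0], X, y)
```

With all coefficients at zero every contribution is 0.5, which gives a quick sanity check on the function.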
In conventional logistic regression models each observation is assumed to be independent and a logistic link function is used. The contribution to the likelihood of the ith case at the tth event is given by the equation above.
This approach can be regarded as a naïve solution to our data analysis problem.
We need to consider a number of issues….
Months Y1 Y2
obs    0  0
Pickles’ tip - In repeated measures analysis we would require something like a ‘paired’ t test rather than an ‘independent’ t test because we can assume that Y1 and Y2 are related.
SABRE fits two models that are appropriate to this analysis.
Model 2 = Random Effects Model
(or logistic mixture model)
Repeated measures data violate an important assumption of conventional regression models.
The responses of an individual at different points in time will not be independent of each other.
This problem has been overcome by the inclusion of an additional, individual-specific error term.
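$$L_i(\beta) = \int \prod_{t=1}^{T} \frac{[\exp(\beta' x_{it} + \varepsilon_i)]^{y_{it}}}{1 + \exp(\beta' x_{it} + \varepsilon_i)} \, f(\varepsilon_i) \, d\varepsilon_i$$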
The random effects model extends the pooled cross-sectional model to include a case-specific random error term to account for residual heterogeneity.
For a sequence of outcomes for the ith case, the basic random effects model has the integrated (or marginal) likelihood given by the equation above.
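A sketch of the integrated likelihood for one case: the product of logit contributions over the sequence, averaged over the case-specific error. It assumes the error is normal with mean zero and standard deviation `sigma`, and it uses a simple midpoint rule rather than whatever integration method SABRE itself uses. All names and values are illustrative.

```python
import math

def sequence_likelihood(beta, X_i, y_i, eps):
    """Likelihood of one case's whole sequence, given its random effect eps."""
    like = 1.0
    for x_it, y_it in zip(X_i, y_i):
        eta = sum(b * x for b, x in zip(beta, x_it)) + eps
        like *= math.exp(eta * y_it) / (1.0 + math.exp(eta))
    return like

def normal_pdf(e, sigma):
    """Density of N(0, sigma^2), the assumed distribution of the random effect."""
    return math.exp(-0.5 * (e / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def marginal_likelihood(beta, X_i, y_i, sigma, n_points=2001, width=8.0):
    """Integrate the sequence likelihood over eps with a midpoint rule."""
    lo, hi = -width * sigma, width * sigma
    step = (hi - lo) / n_points
    total = 0.0
    for k in range(n_points):
        eps = lo + (k + 0.5) * step
        total += sequence_likelihood(beta, X_i, y_i, eps) * normal_pdf(eps, sigma)
    return total * step

# Hypothetical case: constant-only model observed at two occasions.
X_i = [[1.0], [1.0]]
L_i = marginal_likelihood([0.3], X_i, [0, 1], sigma=1.5)
```

Because the same eps enters every occasion for a case, outcomes within a case are correlated after integration even though they are independent given eps.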
Davies and Pickles (1985) have demonstrated that the failure to explicitly model the effects of residual heterogeneity may cause severe bias in parameter estimates. Using longitudinal data the effects of omitted explanatory variables can be overtly accounted for within the statistical model. This greatly improves the accuracy of the estimated effects of the explanatory variables.
Movers and Stayers
When considering data on recurrent events there will be individuals for whom there will be zero (or very low) probabilities of change in outcome from one event to the next. These individuals are termed ‘stayers’.
Months 1 2 3 4 5 6
obs    0 0 0 0 0 0
This person is a stayer!
Months 1 2 3 4 5 6
obs    1 1 1 1 1 1
So is this person.
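A simple way to flag such sequences in data, as a sketch: a sequence that never changes is consistent with being a stayer. This is only an observed-data check under an assumed helper name `is_stayer`; a true stayer is defined by a zero (or very low) probability of change, which cannot be read off a short sequence with certainty.

```python
# Sketch: flag sequences whose outcome never changes across the months.

def is_stayer(outcomes):
    """True when every observed outcome in the sequence is the same."""
    return len(set(outcomes)) == 1

# Sequences taken from the six-month examples in the slides.
sequences = {
    "constantly unemployed": [0, 0, 0, 0, 0, 0],
    "constantly employed": [1, 1, 1, 1, 1, 1],
    "employed then unemployed": [1, 0, 0, 0, 0, 0],
    "job in month six": [0, 0, 0, 0, 0, 1],
}

flags = {label: is_stayer(seq) for label, seq in sequences.items()}
```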
An awareness of the issue of ‘stayers’ is important for technical reasons. A limitation of a parametric modelling approach is that the tail behaviour of the normal distribution is inconsistent with ‘stayers’, so the number of stayers will tend to be underestimated (see Spilerman 1972).
Spilerman, S. (1972) ‘Extensions of the Mover-Stayer Model’, American Journal of Sociology, 78, pp.599-626.
Recurrent events may be analysed using other software but SABRE is specifically designed to handle stayers and this feature increases SABRE’s flexibility in representing residual heterogeneity (Barry, Francis, Davies, and Stott 1998).
Barry, J., Francis, B., Davies, R.B. and Stott, D. (1998) SABRE Users Guide. http://www.cas.lancs.ac.uk/software/sabre3.1/sabreuse.html
STATE DEPENDENCE
Past Behaviour → Current Behaviour
Young People Aged 19
[Figure: state in APRIL (Unemployed or Employed) leading to Employed in MAY, with Different Probabilities of Employment]
This is called a MARKOV model
A Markov model helps to control for a previous outcome (or behaviour).
$\beta' x_{it} + \gamma y_{t-1}$
ACCOUNTS FOR PREVIOUS OUTCOME ($y_{t-1}$)
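A sketch of how the previous outcome enters the linear predictor β'x + γy(t-1), assuming the same logistic link as the pooled model. The coefficient values are invented purely to show the mechanics.

```python
import math

def markov_probability(beta, gamma, x_it, y_prev):
    """P(y_t = 1) under a logit link with linear predictor beta'x_it + gamma * y_prev."""
    eta = sum(b * x for b, x in zip(beta, x_it)) + gamma * y_prev
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients: a constant-only beta and a positive gamma.
beta = [-0.5]
gamma = 2.0

# Probability of being employed in May given the state in April.
p_after_unemployed = markov_probability(beta, gamma, [1.0], y_prev=0)
p_after_employed = markov_probability(beta, gamma, [1.0], y_prev=1)
```

A positive gamma means that someone employed last month has a higher probability of being employed this month than an otherwise identical person who was unemployed.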
The Model Provides TWO sets of estimates
[Figure: Explanatory Variables and the APRIL state (Unemployed or Employed) leading to Employed in MAY]
This is a ‘two-state’ MARKOV model
But we can make it more complicated.
Months Y1 Y2
obs    0  0
First Order Markov Model
Months Y1 Y2 Y3
obs    0  0  0
Second Order Markov Model
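The difference between the two orders can be sketched as a data-preparation step: each outcome is paired with the one or two outcomes that precede it. The helper name `lagged_rows` is hypothetical and the sequence is one of the mixed patterns from the slides.

```python
# Sketch: pair each outcome with its previous outcome(s) for a Markov model.

def lagged_rows(outcomes, order=1):
    """Return (current outcome, (lag 1, ..., lag `order`)) for each usable month."""
    rows = []
    for t in range(order, len(outcomes)):
        lags = tuple(outcomes[t - k] for k in range(1, order + 1))
        rows.append((outcomes[t], lags))
    return rows

sequence = [0, 1, 0, 1, 1, 0]

first_order = lagged_rows(sequence, order=1)   # conditions on y(t-1)
second_order = lagged_rows(sequence, order=2)  # conditions on y(t-1) and y(t-2)
```

Each extra order costs one more month at the start of the sequence, because the earliest outcomes have no complete history to condition on.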
FINAL POINT – A THOUGHT!
Months 1 2 3 4 5 6
obs    0 1 0 1 1 0
obs    0 0 1 0 1 1
obs    0 1 1 0 0 1
obs    1 0 0 0 1 0
Mixed employment patterns
Hierarchical or Multilevel Data Structure
Individuals:            a        b    c      d      e    f      g
Observations (Months):  1 2 3 4  1 2  1 2 3  1 2 3  1 2  1 2 3  1 2
Is the recurrent events model simply a multilevel model fitted at the single level?
A controversial point!
More later…..