
Post on 20-Jan-2016


More complex event history analysis

UNEMPLOYMENT AND RETURNING TO WORK STUDY

[Figure: timeline running from Start of Study to End of Study, with state 0 marked from time 0 up to t1. 0 = Unemployed; 1 = Working.]

Spell or Episode

[Figure: the same timeline with times t1, t2 and t3 marked and the state (0 or 1) shown for each spell.]

[Figure: the same timeline with state 0 before t1 and state 1 after t1.]

Transition = movement from one state to another

Recurrent events are merely outcomes that can take place on a number of occasions. A simple example is unemployment measured month by month. In any given month an individual can either be employed or unemployed. If we had data for a calendar year we would have twelve discrete outcome measures (i.e. one for each month).

Social scientists now routinely employ statistical models for the analysis of discrete data, most notably logistic and log-linear models, in a wide variety of substantive areas. I believe that the adoption of a recurrent events approach is appealing because it is a logical extension of these models.

Consider a binary outcome or two-state event

0 = Event has not occurred

1 = Event has occurred

In the cross-sectional situation we are used to modelling this with logistic regression.

UNEMPLOYMENT AND RETURNING TO WORK STUDY: a study for six months

0 = Unemployed; 1 = Working

Months  1 2 3 4 5 6
obs     0 0 0 0 0 0   Constantly unemployed
obs     1 1 1 1 1 1   Constantly employed
obs     1 0 0 0 0 0   Employed in month 1 then unemployed
obs     0 0 0 0 0 1   Unemployed but gets a job in month six
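The four six-month histories above can be held as simple sequences and stacked into one row per person-month, which is the layout a recurrent events model works from. The sketch below is only illustrative: the case labels and the helper names `to_person_period` and `count_transitions` are made up for this example and are not part of SABRE.

```python
# The four example histories from the slides (0 = unemployed, 1 = working).
# The dictionary keys are hypothetical labels, chosen for this sketch only.
histories = {
    "constantly_unemployed": [0, 0, 0, 0, 0, 0],
    "constantly_employed": [1, 1, 1, 1, 1, 1],
    "employed_then_unemployed": [1, 0, 0, 0, 0, 0],
    "job_in_month_six": [0, 0, 0, 0, 0, 1],
}


def to_person_period(histories):
    """Return one (case, month, y) row for every person-month."""
    rows = []
    for case, ys in histories.items():
        for month, y in enumerate(ys, start=1):
            rows.append((case, month, y))
    return rows


def count_transitions(ys):
    """Count movements from one state to another within a sequence."""
    return sum(1 for prev, cur in zip(ys, ys[1:]) if prev != cur)


rows = to_person_period(histories)
transitions = {case: count_transitions(ys) for case, ys in histories.items()}

for case, n in transitions.items():
    print(f"{case}: {n} transition(s)")
```

Each person contributes six rows, so the rows for one person are unlikely to be independent of each other.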

Here we have a binary outcome – so could we simply use logistic regression to model it?

Yes and No – We need to think about this issue.

Appropriate Software

STATISTICAL ANALYSIS FOR BINARY RECURRENT EVENTS (SABRE)

• Fits appropriate models for recurrent events.

• It is like GLIM.

• It can be downloaded free.

www.cas.lancs.ac.uk/software

SABRE fits two models that are appropriate to this analysis.

Model 1 = Pooled Cross-Sectional Logit Model

Think of this as being the same as a logistic regression in any software package.

POOLED CROSS-SECTIONAL LOGIT MODEL

$$L(\beta \mid x, y) = \prod_{i} \prod_{t} \frac{[\exp(\beta' x_{it})]^{y_{it}}}{1 + \exp(\beta' x_{it})}$$

x_it is a vector of explanatory variables and β is a vector of parameter estimates.
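As a rough illustration of the pooled cross-sectional logit likelihood, the sketch below evaluates its logarithm by summing over every person-month as if all observations were independent. The function name `pooled_logit_loglik` and the tiny data set are assumptions made for this example; they are not taken from SABRE or from the study.

```python
import math


def pooled_logit_loglik(beta, X, y):
    """Log of the pooled logit likelihood.

    Each person-month contributes y_it * (beta'x_it) - log(1 + exp(beta'x_it)),
    which is the log of [exp(beta'x_it)]^y_it / (1 + exp(beta'x_it)).
    """
    total = 0.0
    for x_it, y_it in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, x_it))
        total += y_it * eta - math.log1p(math.exp(eta))
    return total


# Hypothetical person-month rows: [intercept, one 0/1 explanatory variable].
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
y = [1, 1, 0, 1]

loglik = pooled_logit_loglik([0.5, -1.0], X, y)
deviance = -2.0 * loglik  # for ungrouped binary data the deviance is -2 log L
print(f"log-likelihood = {loglik:.4f}, deviance = {deviance:.4f}")
```

Nothing in this function records which rows belong to the same person, which is why the pooled model is described as a naïve solution.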

We could fit a pooled cross-sectional model to our recurrent events data.

This approach can be regarded as a naïve solution to our data analysis problem.

We need to consider a number of issues….

Months  Y1 Y2
obs     0  0

Pickles’ tip: in repeated measures analysis we would require something like a ‘paired’ t test rather than an ‘independent’ t test, because we can assume that Y1 and Y2 are related.
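The paired versus independent distinction can be seen with a small numerical sketch. The two sets of measurements below are invented purely for illustration (they are continuous scores, not the binary employment outcomes), and both t statistics are computed by hand with the standard textbook formulas.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical repeated measurements on the same four people.
y1 = [5.0, 6.0, 7.0, 8.0]
y2 = [5.5, 6.4, 7.6, 8.5]


def paired_t(a, b):
    """t statistic based on within-person differences."""
    d = [bj - aj for aj, bj in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


def independent_t(a, b):
    """Pooled-variance t statistic that ignores the pairing (equal group sizes)."""
    n = len(a)
    pooled_var = (variance(a) + variance(b)) / 2.0
    return (mean(b) - mean(a)) / math.sqrt(pooled_var * 2.0 / n)


print(f"paired t      = {paired_t(y1, y2):.2f}")
print(f"independent t = {independent_t(y1, y2):.2f}")
```

With these made-up numbers the paired statistic is much larger, because differences between people are removed before the comparison is made. Treating related observations as independent discards that information.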

SABRE fits two models that are appropriate to this analysis.

Model 2 = Random Effects Model

(or logistic mixture model)

Repeated measures data violate an important assumption of conventional regression models.

The responses of an individual at different points in time will not be independent of each other.

This problem has been overcome by the inclusion of an additional, individual-specific error term.

The random effects model extends the pooled cross-sectional model to include a case-specific random error term to account for residual heterogeneity.

For a sequence of outcomes for the ith case, the basic random effects model has the integrated (or marginal) likelihood given by the equation:

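$$L_i(\beta \mid x, y) = \int \prod_{t=1}^{T} \frac{[\exp(\beta' x_{it} + \varepsilon)]^{y_{it}}}{1 + \exp(\beta' x_{it} + \varepsilon)} \, dF(\varepsilon)$$

Here ε is the case-specific random error term and F(ε) is its distribution.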
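To make the integrated likelihood more concrete, the sketch below evaluates it for one case by simple numerical integration. It assumes, purely for illustration, that the case-specific error is normally distributed with standard deviation `scale`, and it uses a basic midpoint rule. The slides do not say how SABRE evaluates the integral, and the function names are invented for this example.

```python
import math


def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def sequence_likelihood(beta, xs, ys, eps):
    """Product over t of [exp(b'x + eps)]^y / (1 + exp(b'x + eps)) for one case."""
    like = 1.0
    for x_it, y_it in zip(xs, ys):
        p = logistic(sum(b * x for b, x in zip(beta, x_it)) + eps)
        like *= p if y_it == 1 else 1.0 - p
    return like


def marginal_likelihood(beta, xs, ys, scale, n_points=2001, width=8.0):
    """Integrate the conditional likelihood over eps ~ Normal(0, scale^2).

    Assumption for this sketch: a normal mixing distribution, integrated with
    a midpoint rule over [-width, width] standard deviations.
    """
    if scale == 0:
        return sequence_likelihood(beta, xs, ys, 0.0)
    step = 2.0 * width / n_points
    total = 0.0
    for k in range(n_points):
        z = -width + (k + 0.5) * step
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += sequence_likelihood(beta, xs, ys, scale * z) * density * step
    return total


# Hypothetical case: intercept-only model, employed in both of two months.
xs, ys = [[1.0], [1.0]], [1, 1]
print(marginal_likelihood([0.0], xs, ys, scale=0.0))  # no heterogeneity
print(marginal_likelihood([0.0], xs, ys, scale=2.0))  # with heterogeneity
```

With `scale` set to zero the function reduces to the pooled likelihood for that case. With a positive `scale`, repeated identical outcomes become more likely than independence would suggest.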

Davies and Pickles (1985) have demonstrated that failing to model the effects of residual heterogeneity explicitly may cause severe bias in parameter estimates. With longitudinal data, the effects of omitted explanatory variables can be accounted for explicitly within the statistical model. This greatly improves the accuracy of the estimated effects of the explanatory variables.

An example – see Davies, Elias & Penn (1992).

A study of wives’ employment status.

Y (femp): 0 = wife unemployed; 1 = wife employed

X1 (fmune): 0 = husband employed; 1 = husband unemployed

X2 (fund1): 0 = no child under 1 year; 1 = child under 1 year

Results of various models

Model           X Vars          Deviance  d.f.
Pooled          -               2054      1579
Pooled          fmune           1970      1578
Pooled          fmune + fund1   1877      1577
Random effects  fmune + fund1   1344      1576
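The models in this table are nested, so each step can be compared by the fall in deviance against a chi-squared distribution. The sketch below does that arithmetic with the rounded deviances reported above. The helper `chi2_sf_1df` is written for this example and relies on the standard identity for one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variate with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))


# (label, deviance, residual d.f.) as reported in the results table.
models = [
    ("Pooled, no X vars", 2054, 1579),
    ("Pooled, fmune", 1970, 1578),
    ("Pooled, fmune + fund1", 1877, 1577),
    ("Random effects, fmune + fund1", 1344, 1576),
]

drops = []
for (name0, dev0, df0), (name1, dev1, df1) in zip(models, models[1:]):
    drop, df_used = dev0 - dev1, df0 - df1
    drops.append((drop, df_used))
    print(f"{name0} -> {name1}: deviance falls by {drop} "
          f"on {df_used} d.f., p = {chi2_sf_1df(drop):.3g}")
```

The last comparison tests whether the scale of the random effect is zero, which lies on the boundary of its permitted range. The plain chi-squared p-value is therefore only a rough, conservative guide for that step.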

 

 

Deviance = 1344.2363 on 1576 residual degrees of freedom

dis e

Parameter     Estimate      S. Error
___________________________________________________
int            1.5054       0.23772
fmune ( 1)     0.00000E+00  ALIASED [I]
fmune ( 2)    -2.2871       0.38153
fund1 ( 1)     0.00000E+00  ALIASED [I]
fund1 ( 2)    -2.5752       0.34447
scale          2.2524       0.16565

Random effect

STATE DEPENDENCE

[Diagram: Past Behaviour linked to Current Behaviour.]

[Diagram: state in APRIL (Unemployed or Employed) linked to state in MAY (Employed).]

Months  Y1 Y2
obs     0  0

Lag Model

$$\beta' x_{it} + \gamma \, y_{t-1}$$

ACCOUNTS FOR PREVIOUS OUTCOME (y_{t-1})

This is called a lagged model. A lagged model helps to control for a previous outcome (or behaviour).
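In data terms, a lagged model needs each person-month to carry the outcome from the month before. The sketch below shows one way to build that column. The first observation of each case has no previous outcome, so it is dropped here. The helper names and the example history are hypothetical and are not SABRE syntax.

```python
def add_lag(histories):
    """Return (case, month, y, y_lag) rows.

    The first observation of each case is dropped because no previous
    outcome is available for it.
    """
    rows = []
    for case, ys in histories.items():
        for t in range(1, len(ys)):
            rows.append((case, t + 1, ys[t], ys[t - 1]))
    return rows


def linear_predictor(beta, x_it, gamma, y_lag):
    """beta'x_it + gamma * y_(t-1), the lag model's linear predictor."""
    return sum(b * x for b, x in zip(beta, x_it)) + gamma * y_lag


# Hypothetical four-month history for a single case.
lagged = add_lag({"case_a": [0, 0, 1, 1]})
for row in lagged:
    print(row)
```

A four-month history gives three usable rows, so fitting a lagged model always uses fewer observations than the models without a lag.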

Results of models with state dependence

Model           X Vars          Deviance  d.f.
Random effects  fmune + fund1   1344      1576
Drop y          fmune + fund1   1160      1421
Lag             fmune + fund1   823       1420
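The residual degrees of freedom in this table can be checked with a little bookkeeping. The sketch assumes that residual d.f. equals the number of observations minus the number of fitted parameters. It also assumes that the random effects and "Drop y" models each fit four parameters (int, fmune, fund1, scale) and that the lag model adds one more. These counts are a reading of the tables, not something the slides state.

```python
def n_observations(residual_df, n_parameters):
    """Observations implied by residual d.f. plus fitted parameters."""
    return residual_df + n_parameters


n_full = n_observations(1576, 4)  # random effects model, all person-months
n_drop = n_observations(1421, 4)  # "Drop y": same parameters, fewer rows
n_lag = n_observations(1420, 5)   # lag model: one extra parameter (lag)

print(f"observations in full data:        {n_full}")
print(f"observations after dropping rows: {n_drop}")
print(f"observations used by lag model:   {n_lag}")
print(f"rows dropped:                     {n_full - n_drop}")
```

Under these assumptions the "Drop y" and lag models use the same observations, which is what makes their deviances comparable. The number of dropped rows would equal the number of cases if each case loses exactly its first observation.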

Deviance = 823.21859 on 1420 residual degrees of freedom

Deviance decrease = 336.96811 on 1 residual degree of freedom

dis e

Parameter     Estimate      S. Error
___________________________________________________
int           -1.3695       0.17259
fmune ( 1)     0.00000E+00  ALIASED [I]
fmune ( 2)    -1.5287       0.39847
fund1 ( 1)     0.00000E+00  ALIASED [I]
fund1 ( 2)    -3.1227       0.35764
lag            4.3046       0.22885
scale          0.50379      0.28180

State dependence can be explored further by estimating a ‘two-state’ MARKOV model.

[Diagram: state in APRIL (Unemployed or Employed), each with its own Explanatory Variables, linked to Employed in MAY.]

The model provides TWO sets of estimates.

Results of models with state dependence

Model   X Vars          Deviance  d.f.
Drop y  fmune + fund1   1160      1421
Lag     fmune + fund1   823       1420
Markov  fmune + fund1   803       1417

Parameter     Estimate      S. Error
___________________________________________________

Unemployed Women at t-1
_______
int           -1.5549       0.23159
fmune ( 1)     0.00000E+00  ALIASED [I]
fmune ( 2)    -1.9071       0.74901
fund1 ( 1)     0.00000E+00  ALIASED [I]
fund1 ( 2)    -1.4606       0.71256
scale          1.2392       0.29000

Employed Women at t-1
_______
int            3.0647       0.17575
fmune ( 1)     0.00000E+00  ALIASED [I]
fmune ( 2)    -1.3717       0.50228
fund1 ( 1)     0.00000E+00  ALIASED [I]
fund1 ( 2)    -3.4226       0.35791
scale          0.10000E-02  0.28111
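One way to read the two sets of Markov estimates is to turn them into probabilities of being employed at time t, given the state at t-1. The sketch below does this with the estimates listed above. It assumes that level (2) of fmune and fund1 corresponds to the value 1 (husband unemployed, child under 1 year) and sets the random effect to zero, so the figures describe a case with no residual heterogeneity rather than a population average.

```python
import math


def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))


# Estimates copied from the Markov model listing.
unemployed_prev = {"int": -1.5549, "fmune": -1.9071, "fund1": -1.4606}
employed_prev = {"int": 3.0647, "fmune": -1.3717, "fund1": -3.4226}


def p_employed(est, fmune=0, fund1=0, eps=0.0):
    """P(employed at t) for a given previous state, conditional on eps."""
    eta = est["int"] + est["fmune"] * fmune + est["fund1"] * fund1 + eps
    return logistic(eta)


for label, est in [("unemployed at t-1", unemployed_prev),
                   ("employed at t-1", employed_prev)]:
    for fund1 in (0, 1):
        p = p_employed(est, fmune=0, fund1=fund1)
        print(f"{label}, fund1={fund1}: P(employed at t) = {p:.3f}")
```

Evaluated this way, the gap between the two baseline probabilities is the state dependence that a single lag coefficient summarises in the lagged model.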

 

SABRE – Good Points

• Fits appropriate models for recurrent events.

• It is like GLIM.

• It can be downloaded free.

• There is a users list.

• Uses the deviance to compare models (correct likelihood).

• Fits the Markov model.

• Fits a range of other models (e.g. loglinear + ordinal).

• Can do more advanced analysis (e.g. mover/stayer models).

SABRE – Bad Points

• It is like GLIM: you need to understand a programming syntax.

• Data management and handling are poor.

• There are few users.

Alternatives to SABRE

• STATA: does not fit the full range of models.

• Multilevel model software: okay up to a point, but check that the likelihood is correct (complicated).

• No software other than SABRE fits a continuation ratio model (ordinal), Markov model or the mover/stayer.