CIQLE Workshop: Longitudinal data analysis, Silke Aisenbrey
CIQLE Workshop:
Introduction to longitudinal data analysis with Stata: panel models and event history analysis
Silke Aisenbrey, Yale University
CIQLE Workshop: Longitudinal data analysis
Goals for the workshop:
-Intro to Stata
-Modeling change over time: panel regression models (fixed, between and random effects)
-Modeling whether and/or when events occur: event history analysis (data management for event history data, Kaplan-Meier, Cox, piecewise constant)
open stata:
COMMAND
RESULTS: results and syntax
REVIEW: of syntax (commands or menu)
VARIABLES: of open file
open data, with menu (stata data--> eventex.dta)
to see real data
to make changes directly in data
erase variables, cases, make single changes in cases
relational and logical operators in stata:
== is equal to
~= is not equal (also !=)
> greater than
< less than
>= greater than or equal
<= less than or equal
& and
| or
~ not (also !)
basic descriptive commands
sum var
tab var1 var2
tab var1 var2, col
combine with:
…… if var1==2 & var3>0
sort …………
by var1: ……………
exercise:
e.g.:
tab abitur sex, col
tab abitur sex if cohort==1930, col
sort cohort
by cohort: tab abitur sex
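A minimal sketch of these commands on a made-up data set, so it runs without eventex.dta. The variable names follow the slides; the six rows of values are invented for illustration only.

```stata
* Toy data: names as in the slides, values invented
clear
input sex abitur cohort
1 0 1930
1 1 1930
2 0 1930
2 1 1940
1 1 1940
2 0 1940
end

summarize abitur                          // sum var
tabulate abitur sex, col                  // column percentages
tabulate abitur sex if cohort==1930, col  // restrict with an if condition

* by requires the data to be sorted on the by-variable first
sort cohort
by cohort: tabulate abitur sex
```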
basic commands for data management
help “command”
gen var1 = var2
recode var1 (0=.) (1/8=2) (9=3)
rename var1 var100
exercise:
**use the following variables:
cohort (indicator of cohort membership)
sex (1=male, 2=female)
agemaryc (age @ first marriage)
e.g.: sum agemaryc
recode age @ married in groups
-generate a new variable
-recode new variable into groups
-recode if marcens==0
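One possible way to work through the exercise, sketched on invented data. The cut points (up to 20, 21 to 29, 30 and older) and the new variable names agegrp and marrgrp are assumptions made for illustration, as is the choice to set the group to missing for censored cases (marcens==0).

```stata
* Toy data: names as in the slides, values invented
clear
input caseid agemaryc marcens
1 19 1
2 24 1
3 31 1
4 45 0
end

* generate a new variable, then recode the copy into groups
generate agegrp = agemaryc
recode agegrp (min/20=1) (21/29=2) (30/max=3)

* assumption: censored cases (never married in the data) get no age group
replace agegrp = . if marcens==0

rename agegrp marrgrp
tabulate marrgrp, missing
```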
possible break
Intro to panel regression with stata:
-panel data
-fixed effects
-between effects
-random effects
-fixed or random?
panel data (panelex1.dta)
Panel data:
Panel data, also called cross-sectional time series data, are data where multiple cases (people, firms, countries etc.) were observed at two or more time periods.
Cross-sectional data: only information about variance between subjects
Panel data: two kinds of information, between and within subjects
--> two sources of variance
Janet: Basics of panel regression models
cross sectional vs. panel analyses
open panelex1.dta
ignore the fact that we have repeated measures:
regress income childrn
conclusion: more children --> higher income
Fixed effects model
Answers the question: What is the effect of x when x changes within persons over time?
E.g.: Person A has two children at the first point in time and three children at the second; what effect does this change have on income?
Information used: the time-series information in the data (within subjects)
Variance analyzed: within variance
Problems: only time-variant variables can be included
Fixed effects
exercise: separate regression for each unit, then average the coefficients:
regress income childrn if id==1
regress income childrn if id==2
(coefficient for id==1 + coefficient for id==2) / 2 = -2.5
exercise: generate dummy variable for person and regress with dummy variable
tab id, g(iddum)
reg income childrn iddum1 iddum2
conclusion: more children --> lower income
Fixed effects
-define data set as panel data
tsset id t
-regression with fixed effects command
xtreg income childrn, fe
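The three routes to the fixed effects estimate can be compared on a small invented panel. This is only a sketch: the nine rows below are made-up numbers (not panelex1.dta), chosen so that richer persons have more children while each person's income falls as children are added.

```stata
* Toy panel: 3 persons, 3 time points, values invented
clear
input id t childrn income
1 1 1 10
1 2 2 8
1 3 3 5
2 1 4 20
2 2 5 17
2 3 6 15
3 1 7 30
3 2 8 28
3 3 9 25
end

* pooled OLS ignores the repeated measures: slope is positive
regress income childrn
scalar b_pooled = _b[childrn]

* person dummies absorb differences between persons: slope turns negative
tabulate id, generate(iddum)
regress income childrn iddum1 iddum2
scalar b_lsdv = _b[childrn]

* same slope from the fixed effects command
tsset id t
xtreg income childrn, fe
scalar b_fe = _b[childrn]

display b_pooled, b_lsdv, b_fe
```

In this toy panel the dummy-variable regression and xtreg, fe give the same coefficient on childrn, which is the point of the exercise.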
Between effects model
Answers the question: What is the effect of x when x is different (changes) between persons?
Person A has “on average” three children and Person B has “on average” five children; what effect does this difference have on their income?
In the between effects model we model the mean response, where the means are calculated for each of the units.
Information used: cross-sectional information (between subjects)
Variance analyzed: between variance
Time variant and time invariant variables
Between effects
average income and childrn within each person, then:
regress income childrn
conclusion: more children --> more income
or: define data as panel data, then:
xtreg dependent independent, be
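A sketch of the between estimator on an invented balanced panel (made-up values, not panelex1.dta): the between effects command and an ordinary regression on the per-person averages give the same slope.

```stata
* Toy panel: 3 persons, 3 time points, values invented
clear
input id t childrn income
1 1 1 10
1 2 2 8
1 3 3 5
2 1 4 20
2 2 5 17
2 3 6 15
3 1 7 30
3 2 8 28
3 3 9 25
end

* between effects command
tsset id t
xtreg income childrn, be
scalar b_be = _b[childrn]

* by hand: one row of averages per person, then plain OLS
collapse (mean) income childrn, by(id)
regress income childrn
scalar b_means = _b[childrn]

display b_be, b_means
```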
Random effects model: Assumption: no difference between the two answers to the questions:
1) What is the effect of x when x changes within the person: Person A has two children at the first point in time and three children at the second; what effect does this change have on their income?
2) What is the effect of x when x is different (changes) between persons: Person A has two children and Person B has three children; what effect does this difference have on their income?
Information used: panel and cross-sectional (between and within subjects)
Variance analyzed: between variance and within variance
Time variant and time invariant variables
Random effects model:
-matrix-weighted average of the fixed and the between estimates.
-assumes b1 has the same effect in the cross section as in the time-series
-requires that the individual error terms are treated as random variables that follow the normal distribution.
use:
xtreg dependent independent if var==x, re
possible break
open data: panelex2.dta
varlist:
tell stata the structure of the data:
tsset X Y
X= caseid
Y= time/wave
summary statistics:
xtdes
xtsum
use the effects
xtreg dependent independent if sex==1, fe
xtreg dependent independent if sex==1, be
xtreg dependent independent if sex==1, re
exercise: compare/discuss models
e.g.: xtreg depvar indvar1 indvar2 … if sex==1, fe
try to include time invariant variables
try to make theoretical/empirical argument why you use which model
Problems/Tests/Solutions:
What’s the right model: fixed or random effects?
Test: Hausman Test
Null hypothesis: Coefficients estimated by the efficient random effects estimator are the same as those estimated by the consistent fixed effects estimator.
If same (insignificant p-value, Prob>chi2 larger than .05) --> safe to use random effects.
If significant p-value --> use fixed effects.
xtreg y x1 x2 x3 ... , fe
estimates store fixed
xtreg y x1 x2 x3 ... , re
estimates store random
hausman fixed random
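The sequence can be tried on simulated data where the answer is known in advance. In this sketch everything is an assumption made for illustration: the seed, the sample size, and a person effect u that is deliberately correlated with x, so the random effects estimator is inconsistent and the test should reject.

```stata
* Simulated panel: 200 persons, 5 waves
clear
set seed 12345
set obs 200
generate id = _n
generate u = rnormal()            // person-specific effect, constant over time
expand 5
bysort id: generate t = _n
generate x = 0.8*u + rnormal()    // x is correlated with the person effect
generate y = 1 + 2*x + u + rnormal()

tsset id t
xtreg y x, fe
estimates store fixed
xtreg y x, re
estimates store random

* consistent estimator first, efficient estimator second
hausman fixed random
display r(chi2), r(p)
```

Replacing the x line with generate x = rnormal() removes the correlation, in which case the two sets of coefficients should be close and random effects is the safe choice.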
Problems/Tests/Solutions:
Autocorrelation?
What is autocorrelation:
Last time period’s values affect current values
test: xtserial
Install user-written program, type
findit xtserial or net search xtserial
xtserial depvar indepvars
Significant test statistic indicates presence of serial correlation.
Solution: use model correcting for autocorrelation
xtregar instead of xtreg
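A sketch of xtregar on simulated data with serially correlated errors. The data-generating values (seed, 200 persons, 8 waves, autocorrelation of 0.6) are assumptions for illustration. The user-written xtserial test is left out here because it has to be installed first.

```stata
* Simulated panel with AR(1) errors
clear
set seed 2024
set obs 200
generate id = _n
expand 8
bysort id: generate t = _n
generate x = rnormal()

* last period's error carries over into the current error
bysort id (t): generate e = rnormal() if _n == 1
bysort id (t): replace e = 0.6*e[_n-1] + rnormal() if _n > 1
generate y = 1 + 2*x + e

tsset id t
xtregar y x, fe

* estimated autocorrelation coefficient
display e(rho_ar)
```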
possible break
panel
-waves
-number of children @ wave 1 / 2 / 3 / 4
-employed @ wave 1 / 2 / 3 / 4
-income @ wave 1 / 2 / 3 / 4
regression models: dependent variable continuous
event
-dates of events
-birth of first child @ 1963
-birth of second child @ 1966
…
-start of first employment @…
-start of unemployment @…
-start of second employment @…
time information in event data more precise: dependent variable event happens 0/1
different data structure
Different Faces of Event History Data
Time
continuous discrete
Types of censoring
• Subject does not experience event of interest
• Incomplete follow-up: lost to follow-up, withdraws from study
• Left or right censored
open data eventex.dta
tell stata that our data is “survival data”
stset
stset X, failure(Y) id(Z)
X= time at which event happens or right censored, this is always needed
Y= 0 or missing means censored, all other values are interpreted as representing an event taking place/ failure
Z= id
three examples:
• stset ageendsch
event: end of school; time: age @ end of school
• stset agemaryc, failure(marcens) id(caseid)
event: marriage
• stset agestjob, failure(stjob) id(caseid)
event: first job
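What stset does can be seen on a handful of invented cases. The variable names follow the marriage example; the five rows are made up. After stset, Stata adds the variables _t (analysis time), _d (event indicator), _t0 (start of risk period) and _st (case is used).

```stata
* Toy data: names as in the slides, values invented
clear
input caseid agemaryc marcens
1 22 1
2 25 1
3 25 0
4 30 1
5 40 0
end

* time = age at marriage or at censoring, event = marcens other than 0
stset agemaryc, failure(marcens) id(caseid)

* inspect the variables stset created
list caseid _t0 _t _d _st
stdescribe
```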
DATA MANAGEMENT HANNAH
Different Models of Event History
Time: continuous / discrete
continuous time, non-parametric:
-kaplan-meier
-nelson-aalen
-log-rank test for comparison b/w groups
only qualitative covariates
-compare survival experiences between groups (sex, cohorts)
continuous time, semi-parametric:
-cox
-piecewise constant
continuous time, parametric:
-exponential
-weibull
-log-logistic
-lognormal
-gompertz
-generalized gamma
discrete time:
-logistic
-log-log
inclusion of covariates in models
-univariate
-multivariate
Extended from Jenkins 2005
survivor function and hazard function
• Survivor function, S(t) defines the probability of surviving longer than time t
• Survivor and hazard functions can be converted into each other
• Hazard (instantaneous hazard, force of mortality) is the risk that an event will occur during a short time interval Δt starting at time t, given that the subject did not experience the event before that time
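The conversion between the two functions can be written out as follows. The notation is an assumption added here: T is the event time, f(t) its density, and H(t) the cumulative hazard (the quantity that the Nelson-Aalen estimator targets).

```latex
\begin{align*}
S(t) &= P(T > t) \\
h(t) &= \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
      = \frac{f(t)}{S(t)} = -\frac{d}{dt} \ln S(t) \\
H(t) &= \int_0^t h(u)\, du \\
S(t) &= \exp\bigl(-H(t)\bigr)
\end{align*}
```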
non-parametric: kaplan-meier
List the Kaplan-Meier survivor function
. sts list
. sts list, by(sex) compare
Graph the Kaplan-Meier survivor function
. sts graph
. sts graph, by(sex)
non-parametric: kaplan-meier
exercise:
stset your data for marriage, endschool or first job
e.g.:
1) sts list
2) sts graph
3) sts list, by (…) compare
4) sts graph, by (..)
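To see where the numbers in sts list come from, the Kaplan-Meier estimate can be checked by hand on five invented cases (made-up values, not eventex.dta). At each event time the survivor function is multiplied by (1 - events / number at risk).

```stata
* Toy data: names as in the slides, values invented
clear
input caseid agemaryc marcens
1 22 1
2 25 1
3 25 0
4 30 1
5 40 0
end
stset agemaryc, failure(marcens) id(caseid)

sts list

* store the estimate as a variable and compare with the hand calculation:
* age 22: 5 at risk, 1 event -> 4/5         = 0.8
* age 25: 4 at risk, 1 event -> 0.8 * 3/4   = 0.6
* age 30: 2 at risk, 1 event -> 0.6 * 1/2   = 0.3
* age 40: censored, no event -> stays at 0.3
sts generate km = s
list agemaryc marcens km
```

The case censored at age 25 still counts as at risk for the event at age 25 and drops out only afterwards.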
non-parametric: Nelson-Aalen
List the Nelson-Aalen cumulative hazard function
. sts list, na
. sts list, na by(sex) compare
Graph the Nelson-Aalen cumulative hazard function
. sts graph, na
. sts graph, na by(sex)
non-parametric: Nelson-Aalen
exercise:
stset your data for marriage, endschool or first job
1) sts list, na
2) sts graph, na
3) sts list, na by (…) compare
4) sts graph, na by (..)
Comparing Kaplan-Meier curves
Log-rank test can be used to compare survival curves
Hypothesis test (test of significance)
H0: the curves are statistically the same
H1: the curves are statistically different
Compares observed to expected cell counts
non-parametric: kaplan-meier
for age@marr:
Comparing Kaplan-Meier curves
non-parametric: kaplan-meier
exercise:
Test equality of survivor functions
e.g.: sts test abitur
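A sketch of the log-rank test on eight invented cases, so it runs without eventex.dta. The variable names follow the slides; the values are made up and far too few for a real analysis.

```stata
* Toy data: names as in the slides, values invented
clear
input caseid agemaryc marcens abitur
1 20 1 0
2 22 1 0
3 23 1 0
4 25 0 0
5 24 1 1
6 28 1 1
7 30 1 1
8 35 0 1
end
stset agemaryc, failure(marcens) id(caseid)

* log-rank test is the default; H0: survivor functions are equal across groups
sts test abitur
display r(chi2), r(df)
```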
Limit of Kaplan-Meier curves
• What happens when you have several covariates that you believe contribute to survival?
• Example: education, marital status, children, gender contribute to job change
• Can use K-M curves – for 2 or maybe 3 covariates
• Need another approach – multivariate Cox proportional hazards model is most common -- for many covariates
non-parametric: kaplan-meier
Cox proportional hazards model
• Can handle both continuous and categorical predictor variables
• Without knowing baseline hazard h0(t), can still calculate coefficients for each covariate, and therefore hazard ratio
• Assumes multiplicative risk --> proportional hazard assumption
semi-parametric models: cox
semi-parametric models: cox
example: age at first marriage
stcox sex
Interpretation:
because the Cox model does not estimate a baseline hazard, there is no intercept in the output.
sex (male=1) (female=2)
whatever the hazard rate at a particular time is for men, it is 1.5 times as high for women
what does this mean in our case?
women get married younger than men do.
Interpretation of the regression coefficients
• An estimated hazard rate ratio greater than 1 indicates the covariate is associated with an increased hazard of experiencing the event of interest
• An estimated hazard rate ratio less than 1 indicates the covariate is associated with a decreased hazard of experiencing the event of interest
• Estimated hazard rate ratio of 1 indicates no association between covariate and hazard.
semi-parametric models: cox
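The reading of the hazard ratio can be checked on simulated data where the true ratio is known. Everything in this sketch is an assumption for illustration: the seed, the sample size, exponential event times, no censoring, and a hazard for sex==2 set to 1.5 times the hazard for sex==1.

```stata
* Simulated data: true hazard ratio for sex is 1.5
clear
set seed 42
set obs 2000
generate caseid = _n
generate sex = 1 + (runiform() < 0.5)      // 1 or 2, as in the slides
generate rate = 0.05 * 1.5^(sex - 1)       // hazard: 0.05 vs 0.075
generate t = -ln(runiform()) / rate        // exponential event times
generate event = 1                         // every case experiences the event

stset t, failure(event) id(caseid)
stcox sex

* stcox reports hazard ratios; the same number from the coefficient
display exp(_b[sex])
```

The estimated ratio should land near 1.5, and the group with the higher hazard has the shorter times to event on average.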
Graphically: estimates for functions:
stcox sex, basehc(H0)
stcurve, hazard at1(sex=1) at2(sex=2)
stcox sex, basesurv(S0)
stcurve, survival at1(sex=1) at2(sex=2)
exercise:
make your own cox model
and estimate the hazard and survival
Assessing model adequacy
• Proportional hazards assumption: the effects of the covariates do not depend on time, so hazard ratios are constant over time
• Three general ways to examine model adequacy
Graphically: Do survival curves intersect?
Mathematically: Schoenfeld test
Computationally: Time-dependent variables (extended model)
compare with Kaplan-Meier:
stcoxkm, by(sex)
exercise: do this with one of your estimates
"log-log" plots
stphplot, by (sex)
exercise: do this with one of your estimates, stphplot can be adjusted--> look in stphplot help
Mathematically: Schoenfeld Test
tests if the log hazard-ratio function is constant over time, thus a rejection of the null hypothesis indicates a deviation from the proportional hazard assumption
stcox sex, schoenfeld(sch*) scaledsch(sca*)
estat phtest (with more than one variable: estat phtest, detail)
exercise: do this with your model, try to find a model which fits
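A sketch of the test on simulated data that satisfy the proportional hazards assumption by construction (seed, sample size and rates are invented). It assumes Stata 11 or later, where estat phtest computes the Schoenfeld residuals itself, so the schoenfeld() and scaledsch() options are not needed on the stcox line.

```stata
* Simulated data with a constant hazard ratio for sex
clear
set seed 42
set obs 2000
generate caseid = _n
generate sex = 1 + (runiform() < 0.5)
generate rate = 0.05 * 1.5^(sex - 1)
generate t = -ln(runiform()) / rate
generate event = 1
stset t, failure(event) id(caseid)

stcox sex

* H0: proportional hazards holds; detail adds one test per covariate
estat phtest, detail
display r(chi2), r(df), r(p)
```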
Summary
• Survival analysis quantifies time to a single, dichotomous event
• Handles censored data well
• Survival and hazard can be mathematically converted to each other
• Kaplan-Meier survival curves can be compared graphically
• Cox proportional hazards models help distinguish individual contributions of covariates to survival, provided certain assumptions are met.
It can get a lot more complicated than this
• The proportional hazards model as shown only works when the time to event data is relatively simple
• Complications: non-proportional hazard rates, time-dependent covariates, competing risks, multiple failures, non-absorbing events, etc.
Extensive literature for these situations and software is available to handle them.
Semi-parametric models: Piecewise constant
-transition rate assumed to be not constant over observed time
-splits data in user defined time pieces,
-transition rates constant in each “time piece”
-but: transition rates change between time pieces
Semi-parametric models: piecewise constant
in STATA a user written command, an “ado file” by J. Sorensen: stpiece
net search stpiece
install file
stpiece abitur, tp(20 30 40) tv(sex)
tp: time pieces, intervals
tv: covariates whose influence might vary over time pieces
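If stpiece is not installed, a similar piecewise constant model can be sketched with built-in commands. This is a swapped-in technique, not the stpiece command itself: stsplit cuts each case into the time pieces and streg fits an exponential model with one rate per piece. The simulated data, the seed and the variable name piece are assumptions for illustration, and the i. prefix assumes Stata 11 or later.

```stata
* Simulated data: event or censoring ages between 15 and 55
clear
set seed 7
set obs 1000
generate caseid = _n
generate sex = 1 + (runiform() < 0.5)
generate t = 15 + 40*runiform()
generate event = runiform() < 0.8
stset t, failure(event) id(caseid)

* split each case at ages 20, 30 and 40; piece takes the values 0, 20, 30, 40
stsplit piece, at(20 30 40)

* exponential model: constant rate within a piece, rates differ between pieces
streg i.piece sex, distribution(exponential)
```

This sketch keeps the effect of sex the same in all pieces; letting it vary over pieces, as the tv() option does, would need an interaction of sex with piece.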
the end