Propensity score matching for simple and
clustered data using SPSS and R
Felix Thoemmes & Wang Liao
Support provided to first author by
IES grant “Matching Strategies for Observational Studies with Multilevel Data in Educational Research”
Increasing use of propensity scores
[Figure: use of propensity scores by year, 1983 to 2011; vertical axis from 0 to 10000. Source: Web of Science]
Propensity scores
e(x) = p(z = 1 | x)
e(x) = propensity score
p = probability
z = treatment assignment (1 = treatment group, 0 = control group)
x = vector of covariates
| = conditional on
Propensity scores
A single number summary based on all available
covariates that expresses the probability that a
given subject is assigned to the treatment
condition, based on the values of the set of
observed covariates
e(x) = p(z = 1 | x)
[Figure: probability of receiving treatment (vertical axis) by actual assignment (horizontal axis: Control, Treatment)]
Why propensity scores?
• Is there anything that we can do with
propensity scores that we cannot do with
multiple regression?
Comparison
Propensity scores | Regression adjustment
Tool to strengthen causal conclusions | Tool to strengthen causal conclusions
Models relationship between confounders and treatment | Models relationship between confounders and outcome
Specification of functional form can be checked via balance measures | Specification of functional form can be checked via examination of residuals
Easy assessment of overlap – little potential for extrapolation | Overlap is assessed in multi-dimensional space – often extrapolated
No routine assumptions about linearity and interactions | Classic ANCOVA assumes linearity and absence of interaction, but this can be relaxed
Outcome variable unknown | Outcome variable part of the model
Sample size can be diminished through matching, loss of power | Sample size stays constant, power can increase due to covariates
Causal effect for treated, untreated, local comparison | Causal effect extrapolated to population
Selection → Estimation → Conditioning → Model Checks → Effect Estimation
Selection
Selection of covariates is the single most important aspect to
ensure unbiasedness of causal effect
Debate in literature (see Rubin, Pearl, 2009, Statistics in
Medicine) on how to select covariates
Include variables that are confounders
(based on your theoretical background knowledge)
Exclude variables that are affected by the treatment (potential mediators)
Exclude variables that are instrumental variables
Exclude variables that are collider variables and induce dependencies
Correlational evidence as basis for variable selection can mislead
Estimation
Traditionally, estimated using logistic regression
Might necessitate iterative model optimization
Data mining approaches offer some promise
Covariate-balancing propensity score (K. Imai)
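To make the logistic regression step concrete, here is a minimal sketch in plain Python (not R, and not what MatchIt or PSM for SPSS run internally). The toy data, the function names, the learning rate, and the iteration count are all assumptions made for illustration; in practice a statistics package would fit the model.

```python
import math

def sigmoid(t):
    # Numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def propensity(beta, xi):
    # e(x) = p(z = 1 | x) under a logistic model; beta[0] is the intercept
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))

def fit_logistic(X, z, lr=0.1, n_iter=5000):
    # Gradient ascent on the mean log-likelihood (illustrative, not optimized)
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, zi in zip(X, z):
            err = zi - propensity(beta, xi)
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Hypothetical toy data: one covariate, treatment more likely for larger x
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [1.5], [3.5]]
z = [0, 0, 0, 1, 1, 1, 1, 0]

beta = fit_logistic(X, z)
scores = [propensity(beta, xi) for xi in X]  # one estimated e(x) per unit
```

Each entry of `scores` is the single-number summary described earlier: the estimated probability that the unit is assigned to treatment given its covariates.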
Conditioning
Matching can be done in MANY different ways
1:1, 1:k nearest neighbor matching
1:1, 1:k optimal matching
k:k full matching
Kernel matching
Synthetic matching
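As an illustration of the simplest of these options, the sketch below does greedy 1:1 nearest neighbor matching on the propensity score, without replacement and with an optional caliper. It is a plain-Python sketch under stated assumptions: the function name, the processing order (treated units in input order), and the toy scores are hypothetical, and production tools such as MatchIt offer more options than shown here.

```python
def nearest_neighbor_match(ps_treated, ps_control, caliper=None):
    """Greedy 1:1 nearest neighbor matching without replacement.

    Returns a list of (treated_index, control_index) pairs.
    Treated units with no control within the caliper stay unmatched.
    """
    available = set(range(len(ps_control)))
    pairs = []
    for i, pt in enumerate(ps_treated):
        if not available:
            break
        # Closest remaining control; ties broken by the lower index
        j = min(available, key=lambda c: (abs(ps_control[c] - pt), c))
        if caliper is None or abs(ps_control[j] - pt) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs

# Hypothetical estimated propensity scores
ps_treated = [0.80, 0.55, 0.30]
ps_control = [0.25, 0.52, 0.60, 0.10]

pairs_no_caliper = nearest_neighbor_match(ps_treated, ps_control)
pairs_caliper = nearest_neighbor_match(ps_treated, ps_control, caliper=0.1)
```

With the caliper of 0.1, the treated unit with score 0.80 has no control close enough and is dropped, which is one way matching can reduce the sample size.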
Other approaches include
Stratification (form subclasses based on estimated propensity score)
Weighting (use propensity score to construct weights that balance groups)
Regression adjustment (use propensity score as a covariate)
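The first two alternatives can be sketched in a few lines of plain Python. The weight formulas are the commonly used inverse-probability weights for the average treatment effect (ATE) and for the effect on the treated (ATT); the function names, the rank-based way of forming subclasses, and the toy values are assumptions for illustration only.

```python
def ate_weights(z, e):
    # Treated: 1 / e(x); control: 1 / (1 - e(x))
    return [1.0 / ei if zi == 1 else 1.0 / (1.0 - ei) for zi, ei in zip(z, e)]

def att_weights(z, e):
    # Treated: 1; control: e(x) / (1 - e(x))
    return [1.0 if zi == 1 else ei / (1.0 - ei) for zi, ei in zip(z, e)]

def subclasses(e, n_strata=5):
    # Rank units by propensity score and cut the ranks into equal-sized strata
    order = sorted(range(len(e)), key=lambda i: (e[i], i))
    strata = [0] * len(e)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(e)
    return strata

# Hypothetical treatment indicators and estimated propensity scores
z = [1, 0, 1, 0]
e = [0.8, 0.5, 0.4, 0.2]

w_ate = ate_weights(z, e)
w_att = att_weights(z, e)
strata = subclasses(e, n_strata=2)
```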
Model Checks
• Check of covariate balance
– standardized difference of covariates (and squares, interactions)
– various diagnostic graphs
• Region of common support (distributional overlap)
– graphical assessment (e.g., histograms)
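Both checks can be sketched numerically. The standardized difference below uses the pooled standard deviation of the two groups, which is one common definition (software may standardize differently, for example by the treated-group standard deviation). The common support region is taken here as the overlap of the two score ranges. Function names and data are hypothetical.

```python
import math
import statistics

def standardized_difference(x_treated, x_control):
    # Mean difference divided by the pooled standard deviation
    mean_diff = statistics.mean(x_treated) - statistics.mean(x_control)
    pooled_sd = math.sqrt(
        (statistics.variance(x_treated) + statistics.variance(x_control)) / 2.0
    )
    return mean_diff / pooled_sd

def common_support(ps_treated, ps_control):
    # Range of propensity scores that is covered by both groups
    lower = max(min(ps_treated), min(ps_control))
    upper = min(max(ps_treated), max(ps_control))
    return lower, upper

# Hypothetical covariate values and propensity scores
x_treated = [2.0, 4.0, 6.0]
x_control = [1.0, 3.0, 5.0]
d = standardized_difference(x_treated, x_control)

# The same check applied to the squared covariate
d_squared = standardized_difference(
    [v ** 2 for v in x_treated], [v ** 2 for v in x_control]
)

lower, upper = common_support([0.30, 0.55, 0.80], [0.10, 0.25, 0.52, 0.60])
```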
Effect Estimation
• Estimate of treatment effect
– Mean difference
– Standard error dependent on conditioning scheme
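For 1:1 matched pairs, one simple choice is the mean of the within-pair outcome differences with a paired standard error. This is a sketch of that one case only; other conditioning schemes (weighting, stratification, 1:k matching) call for other standard errors. The function name and outcome values are hypothetical.

```python
import math
import statistics

def paired_effect(y_treated, y_control):
    # y_treated[i] and y_control[i] are the outcomes of the i-th matched pair
    diffs = [t - c for t, c in zip(y_treated, y_control)]
    estimate = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return estimate, se

# Hypothetical outcomes for four matched pairs
y_treated = [10.0, 12.0, 9.0, 11.0]
y_control = [8.0, 11.0, 9.0, 8.0]

estimate, se = paired_effect(y_treated, y_control)
```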
Propensity scores in R and SPSS
• “MatchIt” from Ho et al. is an R package that performs a wide
variety of these tasks
• “PSM for SPSS” is an SPSS implementation
of MatchIt and several other R packages
(e.g., “RItools”, “cem”, “optmatch”)
MatchIt in R
• Offers various ways to estimate the propensity score (including generalized additive models)
• Offers various ways to match (including full matching, nearest neighbor matching, exact matching)
• Offers various ways to fine-tune the matching (caliper, discarding of units outside the region of overlap)
• Provides output of
– Balance table (standardized difference)
– Diagnostic balance plots
PSM in SPSS
• Offers most (but not all) of the features of MatchIt
• In addition
– Reports Hansen & Bowers overall chi-square test of balance
– Reports King’s multivariate imbalance measure
– Supports multi-level data (fixed and random effects models)
Multi-level data
• Selection is on level 1, unit of analysis is on
level 1, but clustering is present
SPSS PSM
• Options to estimate fixed effects model,
random effects models (user defines which
slopes should be random)
• PS estimated based on model that allows
intercepts and slopes to be estimated as
random effects (allowed to vary across
clusters)
• Conditioning within clusters (CWC),
conditioning across clusters (CAC)
• Flexible PS MLM modeling choices
• Balance checks within clusters and globally
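The difference between conditioning within clusters and conditioning across clusters can be sketched as follows. This is an illustration in plain Python, not the PSM for SPSS implementation: the data layout, the function names, and the greedy 1:1 matching rule are all assumptions, and the propensity scores are taken as given rather than estimated from a multilevel model.

```python
from collections import defaultdict

def greedy_match(treated, controls):
    # treated / controls: lists of (unit_id, propensity_score)
    pool = dict(controls)
    pairs = []
    for uid, ps in treated:
        if not pool:
            break
        best = min(pool, key=lambda c: (abs(pool[c] - ps), c))
        pairs.append((uid, best))
        del pool[best]
    return pairs

def match_within_clusters(units):
    # CWC: a treated unit may only be paired with a control from its own cluster
    groups = defaultdict(lambda: ([], []))
    for u in units:
        groups[u["cluster"]][0 if u["z"] == 1 else 1].append((u["id"], u["ps"]))
    pairs = []
    for cluster in sorted(groups):
        treated, controls = groups[cluster]
        pairs.extend(greedy_match(treated, controls))
    return pairs

def match_across_clusters(units):
    # CAC: cluster membership is ignored when forming pairs
    treated = [(u["id"], u["ps"]) for u in units if u["z"] == 1]
    controls = [(u["id"], u["ps"]) for u in units if u["z"] == 0]
    return greedy_match(treated, controls)

# Hypothetical units in two clusters, A and B
units = [
    {"id": "a1", "cluster": "A", "z": 1, "ps": 0.60},
    {"id": "a2", "cluster": "A", "z": 0, "ps": 0.40},
    {"id": "a3", "cluster": "A", "z": 0, "ps": 0.90},
    {"id": "b1", "cluster": "B", "z": 1, "ps": 0.30},
    {"id": "b2", "cluster": "B", "z": 0, "ps": 0.58},
    {"id": "b3", "cluster": "B", "z": 0, "ps": 0.15},
]

cwc_pairs = match_within_clusters(units)
cac_pairs = match_across_clusters(units)
```

In this toy example the across-cluster pairs are closer on the propensity score, while the within-cluster pairs keep each comparison inside one cluster.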
Download: http://sourceforge.net/projects/psmspss/
Contact:
felix.thoemmes@cornell
formula
treat ~ x1 + x2 + x3 + x1:x2 + I(x1^2)
method
nearest, full, optimal, genetic, exact, subclass