Endogeneity & Exogeneity
EXOGENEITY & ENDOGENEITY
Shane Thompson
Summit Consulting, LLC
© 2015
EXOGENEITY, ENDOGENEITY, & YOU
Program evaluation, policy impacts, treatment effects
How does our outcome of interest (income, health, free-throw %) change after our independent variable of interest (education, medicine, practice) changes?
Often, we will have to concern ourselves with whether our independent variable of interest is exogenous or endogenous
EXOGENOUS VARIABLES
An independent variable is exogenous in a dataset if it is assigned/chosen without respect to how it might influence the outcome of interest
Education level assigned without regard to potential income benefit
Medicine assigned without respect to probability of success
Practice hours assigned without respect to how it might improve performance
Randomization of “treatment” assignment ensures that independent variables are exogenous
ENDOGENOUS VARIABLES
An independent variable is endogenous in a dataset if it is assigned/chosen according to how it might determine the outcome of interest
Income: Bill Gates drops out of college; Shane Thompson goes forever
Health: Patient with family history of heart attacks takes aspirin; patient without does not take it
FT%: Shaq practices free throws; Steve Nash does not
Simple comparisons of treatment-vs-control are biased
College REDUCES income by billions of dollars!
Aspirin INCREASES the probability of heart attacks!
Free-throw practice REDUCES free-throw percentage!
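The free-throw example can be sketched as a small simulation. All numbers here are assumptions made for illustration (a baseline FT% around 70, a true practice effect of +5 points, and weak shooters being the ones who choose to practice); none of them come from the slides.

```python
import random
from statistics import mean

TRUE_EFFECT = 5.0  # hypothetical: practice adds 5 points to free-throw %


def simulate_self_selected_practice(n=20000, seed=0):
    """Players decide to practice BECAUSE their baseline FT% is low."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        baseline = rng.gauss(70, 10)   # FT% without any practice (assumed)
        practices = baseline < 65      # endogenous choice: weak shooters practice
        outcome = baseline + (TRUE_EFFECT if practices else 0.0)
        rows.append((practices, outcome))
    return rows


def naive_difference(rows):
    """Simple treatment-vs-control comparison of average outcomes."""
    treated = [y for practiced, y in rows if practiced]
    control = [y for practiced, y in rows if not practiced]
    return mean(treated) - mean(control)


rows = simulate_self_selected_practice()
naive = naive_difference(rows)
print(f"true effect: {TRUE_EFFECT:+.1f}, naive estimate: {naive:+.1f}")
```

Under these assumptions the naive comparison comes out negative even though practice helps every player, which is the "practice REDUCES free-throw percentage" trap.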
COMMUTING TIME: SLUGGING
Weather…wait...where…who…wreck?
Exogenous or endogenous?
The weather: EXOGENOUS
The time of day I get in the slug line: ENDOGENOUS
My destination (14th, 18th, or L’Enfant Plaza): ENDOGENOUS
The gender of the driver: EXOGENOUS
A wreck on the HOV lanes: EXOGENOUS
CORRELATION VS. CAUSATION
Estimates of the effects of endogenous variables on outcome variables are correlational
A change in an outcome variable given a change in an endogenous variable may reflect changes in several other variables in the model
Estimates of the effects of exogenous variables on outcome variables are causal
A change in an outcome variable given a change in an exogenous variable is attributable to the exogenous variable
EX: BLUE JEANS AND REVENUE
AL and AC notice that whenever several Summiteers wear jeans into the office, big deliverables are generally due that day.
In leadership meetings they alert the directors to this amazing trend.
Hypothesis: Wearing jeans increases revenue
# Deliverables  Jeans    # Deliverables  Jeans
1               0        1               0
1               0        3               0
0               0        2               0
0               0        3               0
3               1        5               1
0               0        1               0
2               0        0               0
2               0        3               0
2               0        1               0
6               1        6               1
3               0        3               0
1               0        1               0
3               0        1               0
1               0        0               0
3               1        5               1
1               0        2               0
2               0        0               0
1               0        0               0
0               0        0               0
8               1        5               1
0               0        1               0
1               0        0               0
2               0        0               0
0               0        0               0
6               1        6               1
JEANS. JEANS! JEANS!
Over the last 50 work days:
Summit averages 5.3 deliverables when Summiteers wear jeans
Summit averages 1.1 deliverables when Summiteers do NOT wear jeans
JEANS INCREASE PRODUCTIVITY!!!
JEANS INCREASE REVENUE!!!
# Deliverables  Jeans  Day          # Deliverables  Jeans  Day
1               0      Monday       1               0      Monday
1               0      Tuesday      3               0      Tuesday
0               0      Wednesday    2               0      Wednesday
0               0      Thursday     3               0      Thursday
3               1      Friday       5               1      Friday
0               0      Monday       1               0      Monday
2               0      Tuesday      0               0      Tuesday
2               0      Wednesday    3               0      Wednesday
2               0      Thursday     1               0      Thursday
6               1      Friday       6               1      Friday
3               0      Monday       3               0      Monday
1               0      Tuesday      1               0      Tuesday
3               0      Wednesday    1               0      Wednesday
1               0      Thursday     0               0      Thursday
3               1      Friday       5               1      Friday
1               0      Monday       2               0      Monday
2               0      Tuesday      0               0      Tuesday
1               0      Wednesday    0               0      Wednesday
0               0      Thursday     0               0      Thursday
8               1      Friday       5               1      Friday
0               0      Monday       1               0      Monday
1               0      Tuesday      0               0      Tuesday
2               0      Wednesday    0               0      Wednesday
0               0      Thursday     0               0      Thursday
6               1      Friday       6               1      Friday
WELL, THAT’S EMBARRASSING…
Jeans DO NOT increase deliverables and DO NOT increase revenue.
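The 50-day table can be checked directly. The sketch below transcribes the deliverable counts from the table (one inner list per Monday-to-Friday week), reproduces the two averages, and then groups by day of week. The variable names are illustrative only.

```python
from collections import defaultdict
from statistics import mean

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

# Deliverables per work day, one inner list per week (Monday..Friday),
# transcribed from the left and right column groups of the table.
WEEKS = [
    [1, 1, 0, 0, 3], [0, 2, 2, 2, 6], [3, 1, 3, 1, 3], [1, 2, 1, 0, 8], [0, 1, 2, 0, 6],
    [1, 3, 2, 3, 5], [1, 0, 3, 1, 6], [3, 1, 1, 0, 5], [2, 0, 0, 0, 5], [1, 0, 0, 0, 6],
]

# In the table the jeans indicator is 1 exactly on Fridays.
rows = [(day, n, int(day == "Friday")) for week in WEEKS for day, n in zip(DAYS, week)]

avg_jeans = mean(n for _, n, jeans in rows if jeans == 1)
avg_no_jeans = mean(n for _, n, jeans in rows if jeans == 0)
print(f"jeans days: {avg_jeans:.1f}, non-jeans days: {avg_no_jeans:.1f}")

# Group by day of week: which jeans values ever occur on each day?
by_day = defaultdict(list)
for day, n, jeans in rows:
    by_day[day].append((n, jeans))

jeans_values_by_day = {day: {j for _, j in obs} for day, obs in by_day.items()}
avg_by_day = {day: mean(n for n, _ in obs) for day, obs in by_day.items()}
print(jeans_values_by_day)
print(avg_by_day)
```

Within any single day of the week the jeans indicator never varies, so the "jeans" comparison is the same comparison as Friday versus the rest of the week. The data cannot separate the two.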
Still, enterprising Summiteers seek to find the causal effect of jeans
How might we find the CAUSAL effect of jeans?
BEST OPTION: RANDOMIZED CONTROL TRIAL
Summit randomly assigns casual days
Jean-wearing is exogenous to deliverables, i.e. the assignment to wear jeans is made without regard to how it might influence business
Causal effect of jeans on deliverables: Avg Deliverables_{jeans} - Avg Deliverables_{no jeans}
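A minimal simulation of the randomized design, assuming (hypothetically) that deliverables still pile up on Fridays, that jeans themselves have no effect, and that casual days are assigned by coin flip. The Friday and non-Friday averages borrow the 5.3 and 1.1 figures from the earlier slide; the noise level and sample size are made up.

```python
import random
from statistics import mean


def simulate_randomized_casual_days(n_days=5000, seed=1):
    """Return Avg Deliverables (jeans) minus Avg Deliverables (no jeans)."""
    rng = random.Random(seed)
    jeans_days, other_days = [], []
    for i in range(n_days):
        is_friday = (i % 5 == 4)
        # Deliverables depend on the day of the week only (assumed: no jeans effect).
        deliverables = rng.gauss(5.3 if is_friday else 1.1, 1.0)
        # Exogenous assignment: a coin flip, made without regard to deadlines.
        wears_jeans = rng.random() < 0.5
        (jeans_days if wears_jeans else other_days).append(deliverables)
    return mean(jeans_days) - mean(other_days)


rct_estimate = simulate_randomized_casual_days()
print(f"difference in means under randomization: {rct_estimate:+.2f}")
```

Because the coin flip spreads Fridays evenly across both groups, the difference in means lands close to the assumed true effect of zero.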
NON-RANDOMIZED DATA
Jean-wearing is endogenous to deliverables, i.e. the decision to wear jeans may be made according to deliverable deadlines and client meetings
Causal effect of jeans on deliverables IS NOT: Avg Deliverables_{jeans} - Avg Deliverables_{no jeans}
Why??
BALANCE IN TREATMENT ASSIGNMENT
Exogenous Treatment
Treatment: Age, Race, Gender, Income
Control: Age, Race, Gender, Income
Endogenous Treatment
Treatment: Older, Black, Female, $$$
Control: Younger, White, Male, $
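One common way to check balance is the standardized mean difference of a covariate between treatment and control. The sketch below applies it to simulated ages under a coin-flip assignment and under a self-selected assignment in which older people are more likely to be treated. The age distribution and the selection rule are assumptions for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev


def standardized_mean_difference(treated, control):
    """Difference in group means, scaled by the pooled standard deviation."""
    pooled_sd = sqrt((stdev(treated) ** 2 + stdev(control) ** 2) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def split(values, flags):
    """Split values into (treated, control) lists according to boolean flags."""
    treated = [v for v, f in zip(values, flags) if f]
    control = [v for v, f in zip(values, flags) if not f]
    return treated, control


rng = random.Random(2)
ages = [rng.gauss(40, 10) for _ in range(2000)]  # hypothetical sample

# Exogenous treatment: assigned by coin flip.
random_flags = [rng.random() < 0.5 for _ in ages]
# Endogenous treatment: older people are more likely to end up treated.
selected_flags = [age + rng.gauss(0, 10) > 40 for age in ages]

smd_random = standardized_mean_difference(*split(ages, random_flags))
smd_selected = standardized_mean_difference(*split(ages, selected_flags))
print(f"age SMD, randomized: {smd_random:+.2f}; self-selected: {smd_selected:+.2f}")
```

A value near zero indicates the groups look alike on that covariate; a large value signals the kind of imbalance shown on the endogenous side of the slide.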
QUASI-EXPERIMENTAL METHODS: MAKING ENDOGENOUS VARIABLES EXOGENOUS
Difference-in-Differences
1. Identify a suitable control group (Firm X)
2. Verify parallel trend in deliverables before treatment (controlling for observable characteristics)
3. Verify no spillover effects
4. Jeans are exogenous, conditional on the parallel trend (which is conditional on observable characteristics)
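The arithmetic behind these steps can be sketched with made-up numbers. The deliverable averages for Summit and Firm X below are hypothetical, and the pre-trend check is only a crude version of step 2 (it ignores observable characteristics).

```python
def pre_trend_gaps(treated_series, control_series):
    """Period-to-period change in the treated firm minus the same change in the control."""
    return [
        (treated_series[t] - treated_series[t - 1])
        - (control_series[t] - control_series[t - 1])
        for t in range(1, len(treated_series))
    ]


def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """(Change in treated) minus (change in control)."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)


# Hypothetical average weekly deliverables before the jeans policy.
summit_pre = [9.0, 9.5, 10.0]
firm_x_pre = [7.0, 7.5, 8.0]
# Hypothetical averages after Summit (but not Firm X) adopts the policy.
summit_post, firm_x_post = 13.0, 10.0

gaps = pre_trend_gaps(summit_pre, firm_x_pre)
effect = diff_in_diff(summit_pre[-1], summit_post, firm_x_pre[-1], firm_x_post)
print("pre-trend gaps:", gaps)   # all zeros means the trends were parallel
print("DiD estimate:", effect)   # (13 - 10) - (10 - 8) = 1.0
```

Firm X's change stands in for what would have happened to Summit without jeans, which is only credible if the pre-period gaps are close to zero.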
QUASI-EXPERIMENTAL METHODS: MAKING ENDOGENOUS VARIABLES EXOGENOUS
Regression Discontinuity
1. Summit management allows staff above a fixed threshold of billable hours to wear jeans (threshold is unknown to staff)
2. Staff just above and just below the threshold are equally productive
3. Jeans are exogenous immediately above and below threshold
[Figure: deliverables in 2014 (vertical axis) plotted against billable hours in hundreds (horizontal axis)]
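A rough sketch of a regression discontinuity estimate on simulated staff. The threshold (15 hundred billable hours), the size of the jump, and the linear relationship between hours and deliverables are all assumptions. The estimator fits a separate straight line on each side of the threshold within a bandwidth and compares the two predictions at the threshold.

```python
import random
from statistics import mean

THRESHOLD = 15.0  # hypothetical cutoff, in hundreds of billable hours
TRUE_JUMP = 2.0   # hypothetical effect of jeans at the cutoff


def simulate_staff_hours(n=4000, seed=4):
    rng = random.Random(seed)
    staff = []
    for _ in range(n):
        hours = rng.uniform(5, 25)
        jeans = hours >= THRESHOLD  # policy: jeans allowed above the threshold
        deliverables = 2 + 0.5 * hours + (TRUE_JUMP if jeans else 0.0) + rng.gauss(0, 1)
        staff.append((hours, deliverables))
    return staff


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


def rd_estimate(staff, bandwidth=3.0):
    """Jump in predicted deliverables at the threshold, using nearby staff only."""
    below = [(h, y) for h, y in staff if THRESHOLD - bandwidth <= h < THRESHOLD]
    above = [(h, y) for h, y in staff if THRESHOLD <= h <= THRESHOLD + bandwidth]
    a0, b0 = fit_line(*zip(*below))
    a1, b1 = fit_line(*zip(*above))
    return (a1 + b1 * THRESHOLD) - (a0 + b0 * THRESHOLD)


staff = simulate_staff_hours()
naive_all = mean(y for h, y in staff if h >= THRESHOLD) - mean(y for h, y in staff if h < THRESHOLD)
rd = rd_estimate(staff)
print(f"naive difference: {naive_all:.2f}, RD estimate: {rd:.2f}, assumed jump: {TRUE_JUMP}")
```

The naive comparison mixes the jeans effect with the fact that high-hours staff produce more anyway; restricting attention to the neighborhood of the threshold removes most of that.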
QUASI-EXPERIMENTAL METHODS: MAKING ENDOGENOUS VARIABLES EXOGENOUS
Propensity Score Matching
1. Summit allows all staff to wear jeans if they want
2. Estimate the probability of jean-wearing given observable staff characteristics
3. Assume we observe ALL relevant predictors of jean-wearing
4. Jeans are exogenous at each probability level
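A simplified sketch of the idea, using stratification on the estimated propensity score rather than one-to-one matching. It assumes a single observable characteristic (a hypothetical seniority level 0, 1, or 2) that drives both jean-wearing and deliverables, and a made-up true effect of 0.5.

```python
import random
from collections import defaultdict
from statistics import mean


def simulate_staff(n=6000, true_effect=0.5, seed=3):
    """Junior staff (level 0) wear jeans more often and deliver less (assumed)."""
    rng = random.Random(seed)
    p_jeans = {0: 0.8, 1: 0.5, 2: 0.2}
    staff = []
    for _ in range(n):
        level = rng.choice([0, 1, 2])
        jeans = int(rng.random() < p_jeans[level])
        deliverables = 1 + 2 * level + true_effect * jeans + rng.gauss(0, 1)
        staff.append((level, jeans, deliverables))
    return staff


def naive_difference(staff):
    return mean(y for _, j, y in staff if j) - mean(y for _, j, y in staff if not j)


def propensity_stratified_effect(staff):
    # Step 2: estimate P(jeans | level) as the share of jeans wearers per level.
    flags_by_level = defaultdict(list)
    for level, jeans, _ in staff:
        flags_by_level[level].append(jeans)
    p_hat = {level: mean(flags) for level, flags in flags_by_level.items()}

    # Step 4: compare jeans vs. no jeans among staff with the same propensity.
    strata = defaultdict(lambda: ([], []))  # p_hat -> (no-jeans outcomes, jeans outcomes)
    for level, jeans, y in staff:
        strata[p_hat[level]][jeans].append(y)

    effect = 0.0
    for no_jeans_y, jeans_y in strata.values():
        weight = (len(no_jeans_y) + len(jeans_y)) / len(staff)
        effect += weight * (mean(jeans_y) - mean(no_jeans_y))
    return effect


staff = simulate_staff()
naive = naive_difference(staff)
adjusted = propensity_stratified_effect(staff)
print(f"naive: {naive:+.2f}, propensity-stratified: {adjusted:+.2f}, assumed effect: +0.50")
```

The adjustment works here only because the simulation satisfies step 3: seniority is the sole driver of jean-wearing and it is observed. An unobserved driver would leave the bias in place.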
QUASI-EXPERIMENTAL METHODS: MAKING ENDOGENOUS VARIABLES EXOGENOUS
Synthetic Control Method
1. Identify several potential control groups (firms)
2. Construct a synthetic Summit that is a weighted combination of other firms
3. Constrain the synthetic Summit to be approximately equal to Summit in observable characteristics and deliverables before the jeans policy
4. Jeans are exogenous, conditional on the pre-treatment equality between Summit and the synthetic control
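The weighting step can be sketched with a coarse grid search over non-negative weights that sum to one. Everything below is hypothetical: three donor firms labeled A, B, and C, four pre-policy periods, two post-policy periods, and deliverable counts chosen so that an exact pre-period fit exists. A real application would use a proper constrained optimizer and would match on observable characteristics as well.

```python
# Hypothetical deliverables per period.
SUMMIT_PRE = [6.0, 6.5, 7.5, 8.0]
SUMMIT_POST = [11.0, 11.5]
DONORS_PRE = {"A": [4.0, 5.0, 6.0, 7.0], "B": [8.0, 8.0, 9.0, 9.0], "C": [1.0, 3.0, 2.0, 4.0]}
DONORS_POST = {"A": [8.0, 9.0], "B": [10.0, 10.0], "C": [3.0, 5.0]}


def combine(weights, series_by_firm):
    """Weighted combination of the donor firms, period by period."""
    n_periods = len(next(iter(series_by_firm.values())))
    return [
        sum(weights[firm] * series[t] for firm, series in series_by_firm.items())
        for t in range(n_periods)
    ]


def fit_weights(target_pre, donors_pre, steps=20):
    """Grid search over weights (>= 0, summing to 1) for exactly three donors A, B, C."""
    best_weights, best_error = None, float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            weights = {"A": i / steps, "B": j / steps, "C": (steps - i - j) / steps}
            synthetic = combine(weights, donors_pre)
            error = sum((s - y) ** 2 for s, y in zip(synthetic, target_pre))
            if error < best_error:
                best_weights, best_error = weights, error
    return best_weights


weights = fit_weights(SUMMIT_PRE, DONORS_PRE)
synthetic_post = combine(weights, DONORS_POST)
post_gaps = [actual - synth for actual, synth in zip(SUMMIT_POST, synthetic_post)]
print("weights:", weights)
print("post-policy gap (Summit minus synthetic Summit):", post_gaps)
```

The post-policy gap is read as the effect of the jeans policy only to the extent that the synthetic Summit tracked the real one closely before the policy.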
CONCLUSIONS
Correlation ≠ Causation
Data are generally messy, poorly-tracked, and non-experimental (rife with endogeneity!)
Cutting-edge evaluators, statisticians, and econometricians must be able to:
1. identify endogenous and exogenous variables
2. implement statistical methods to mitigate endogeneity
The causal effect of jeans is 10,000 additional deliverables