TR Endogeneity
Transcript of TR Endogeneity
Outline | Endogeneity and Instrumental Variables Regression | Existence of Endogeneity and Evaluating IVs | Panel Data and Fixed Effects
Econometrics: Endogeneity
Toke Reichstein
Department of Innovation and Organizational Economics, Copenhagen Business School
Toke Reichstein Econometrics
Outline
1. Endogeneity and Instrumental Variables Regression
  - What is Endogeneity?
  - What To Do About It?
  - Instruments
2. Existence of Endogeneity and Evaluating IVs
  - Testing For Endogeneity
  - Testing the Instruments' Strength and Validity
3. Panel Data and Fixed Effects
  - Cross section versus Panel
  - What happens in a Fixed Effects Setting?
Two Types of Data – Two Sets of Models
- Cross Section Data
  - Instrumental Variables Regression: a two-stage regression approach to model the unobserved
- Panel Data
  - Exploiting the time dimension of the subjects to control for the unobserved
It is impossible to cover both in one short session, so we concentrate on the cross section setting
Endogeneity in Management
- Endogeneity is one of the major challenges in econometric analysis in management and in much of the social sciences
- The social sciences are about understanding the behaviour of people
- It is not possible to set up a laboratory as in the natural sciences and run experiments that uphold the ceteris paribus assumption
- As a consequence, much of the work done in the social sciences is biased because it suffers from endogeneity
What is Endogeneity?
- Endogeneity can be caused by three circumstances:
  1. Omitted Variables
  2. Measurement Error
  3. Simultaneity
- The effect of endogeneity is bias in the estimates and hence a risk of:
  - Rejecting a hypothesis that is in fact true (Type I Error)
  - Failing to reject a hypothesis that is in fact false (Type II Error)
Omitted Variables Case
- Example:
  - Consider a sub-sample of subjects for whom we wish to understand whether a college degree has an effect on wages
  - Unfortunately, to understand the effect of a college degree, we need a proxy for the subjects' intrinsic ability
  - Intrinsic ability may influence the likelihood of obtaining a college degree
  - Intrinsic ability may also influence the wages obtained
  - As a result, a positive estimate of the return on a college degree may be attributable to the intrinsic ability of the individual rather than to the degree
Self-selection Bias
- Omitted variables endogeneity is sometimes called self-selection bias
- The term is often used when we wish to understand the effect of a behaviour or of being enrolled in a program (such as college)
- Here the subjects have self-selected to behave in a particular manner or have chosen to be in the program
- That choice is not random → participation is not randomly assigned across subjects
- We need to understand the choice before we can understand the effect of that choice on the main variable of interest
- We need to account for the unobserved factors that lead individuals to self-select into a scenario or a behaviour if those factors also directly affect the main variable of interest
Think of a case in which you believe a standard regression would suffer from endogeneity
What to do about Unobserved Heterogeneity Endogeneity?
- Do nothing and accept the potential bias
- Collect panel data and correct with a model that solves the problem of endogeneity
- Find a suitable proxy for the unobserved, which then is no longer unobserved
- Apply Instrumental Variables Regression
The Example of Wages
- We wish to understand the impact of education on wages
- We are unable to measure individuals' non-education-based capabilities, which influence not only wages but also the choice and ability to complete a degree
- Here these capabilities are the unobserved heterogeneity causing bias in the estimated effect of education on wages
- In this case the bias is probably positive, inflating the estimated effect of education
OLS’s Problem
- The problem with using OLS in cases that suffer from endogeneity is that the error term and the explanatory variables are correlated
$$\operatorname{Cov}(x_i, \varepsilon_i) \neq 0 \tag{1}$$
- This is caused by the unobserved element (the omitted variable), which is hidden in the error term
The Problem at its Core
- Consider the following main equation of interest:
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \gamma q_i + v_i \tag{2}$$
- Here $q_i$ is the unobserved variable (such as intrinsic ability) and $v_i$ is the traditional error term
- If we omit $q_i$ the equation transforms into:
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + u_i \tag{3}$$
- where $u_i = \gamma q_i + v_i$
- If $\operatorname{Cov}(q_i, x_{ji}) \neq 0$ for some $j \in \{1, 2, \ldots, k\}$ and $\gamma \neq 0$, then $\operatorname{Cov}(u_i, x_{ji}) \neq 0$
- We violate one of the OLS assumptions → this is the endogeneity problem, which represents a potential bias
Bias and IV Regression
- We cannot consistently estimate any of the $\beta_j$'s when $u_i$ is correlated with any of the regressors
- Each estimated $\hat{\beta}_j$ consists of the 'true' $\beta_j$ plus an omitted variables bias
- $\gamma > 0$ and $\operatorname{Cov}(x_i, q_i) > 0$ lead to an upward (positive) bias in the estimates (i.e. the effect of $x_i$ is overestimated)
- Instrumental variables (IV) regression is designed to control for unobserved heterogeneity
- IV regression is designed to correct the estimates in the main equation for the effect of the unobserved
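The direction of the bias can be illustrated with a small simulation. This is a minimal sketch under assumed values (one regressor, $\beta_1 = 1$, $\gamma = 2$, and $x$ loading 0.5 on the unobserved $q$); none of these numbers come from the slides. With a single regressor the omitted variables bias is $\gamma \operatorname{Cov}(q, x) / \operatorname{Var}(x)$, which the code computes next to the OLS slope.

```python
import random


def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate_short_regression(n=20000, beta1=1.0, gamma=2.0, seed=0):
    """Return the OLS slope of y on x when q is omitted, and the implied bias term."""
    rng = random.Random(seed)
    q = [rng.gauss(0, 1) for _ in range(n)]        # unobserved factor (e.g. ability)
    x = [0.5 * qi + rng.gauss(0, 1) for qi in q]   # regressor correlated with q
    y = [beta1 * xi + gamma * qi + rng.gauss(0, 1) for xi, qi in zip(x, q)]
    slope = cov(x, y) / cov(x, x)                  # OLS slope of the regression without q
    implied_bias = gamma * cov(q, x) / cov(x, x)   # gamma * Cov(q, x) / Var(x)
    return slope, implied_bias


slope, bias = simulate_short_regression()
print(f"OLS slope without q: {slope:.3f} (true beta1 = 1.0)")
print(f"gamma * Cov(q, x) / Var(x): {bias:.3f}")
```

Because both $\gamma$ and $\operatorname{Cov}(x, q)$ are positive in this setup, the slope is pushed above the true value, as the slide states.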
Instruments - What are they?
- Instruments ($z_i$) are variables used to explain a variable we suspect of being endogenous and which are exogenous with respect to the main equation
$$\operatorname{Cov}(z_i, u_i) = 0 \tag{4}$$
$$\operatorname{Cov}(z_i, x_i) \neq 0 \tag{5}$$
- These are tough criteria
  - Challenge: to find variables correlated with the endogenous variable but uncorrelated with the part of the error term that is due to the unobserved heterogeneity
- Rule of thumb: a good instrument should correlate with the key independent variable, but should affect the dependent variable of the main equation only through that variable
Try to identify some instruments appropriate for use in the example of education's effect on wages
The 2SLS Model for IV Regression I
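- Assume that we want to study $y_i$ using $x_{1i}$ and a number of controls $x_{ji}$ where $j \in \{2, 3, \ldots, k\}$
- Assume $x_{1i}$ to be endogenous
- We have a set of instruments $z_{ni}$, indexed by $n$, that are useful for predicting $x_{1i}$
- The 2SLS model becomes
$$y_i = \beta_0 + \beta_1 x_{1i} + \sum_{j=2}^{k} \beta_j x_{ji} + \varepsilon_i \tag{6}$$
$$x_{1i} = \pi_0 + \sum_{n} \pi_n z_{ni} + \sum_{j=2}^{k} \pi_j x_{ji} + v_i \tag{7}$$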
The 2SLS Model for IV Regression II
- Since $\operatorname{Cov}(z_{ni}, \varepsilon_i) = 0$, it must be that $\operatorname{Cov}(\pi_0 + \sum_{n} \pi_n z_{ni}, \varepsilon_i) = 0$
- In your instrumentation (the regression against the endogenous variable), you also include the remaining explanatory variables
- The instrumental variables $z_{ni}$ are sometimes called the excluded instruments, because they do not appear in the main equation explaining $y_i$
The 2SLS Approach
- First run the regression against the endogenous variable (first stage) and calculate the predicted value of $x_{1i}$, denoted $\hat{x}_{1i}$
- Then use the predicted $\hat{x}_{1i}$ rather than the observed $x_{1i}$ in the main regression equation (second stage)
$$y_i = \beta_0 + \beta_1 \hat{x}_{1i} + \sum_{j=2}^{k} \beta_j x_{ji} + \varepsilon_i \tag{8}$$
- We have removed the unobserved element of the endogenous variable
- What is left is the effect of the predicted value of the endogenous variable
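The two stages can be sketched by hand on simulated data. This is an illustration only, assuming one endogenous regressor, one instrument, no further controls, and made-up coefficients (true $\beta_1 = 0.5$); the helper `simple_ols` is hypothetical and not part of any package.

```python
import random


def simple_ols(y, x):
    """Intercept and slope from regressing y on a constant and x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(1)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]   # instrument, independent of q
q = [rng.gauss(0, 1) for _ in range(n)]   # unobserved factor
x1 = [0.8 * zi + 0.6 * qi + rng.gauss(0, 1) for zi, qi in zip(z, q)]
y = [1.0 + 0.5 * xi + 1.0 * qi + rng.gauss(0, 1) for xi, qi in zip(x1, q)]

# First stage: regress the endogenous variable on the instrument, keep the prediction
pi0, pi1 = simple_ols(x1, z)
x1_hat = [pi0 + pi1 * zi for zi in z]

# Second stage: use the predicted values in place of the observed x1
b0_2sls, b1_2sls = simple_ols(y, x1_hat)

# Plain OLS on the observed x1 for comparison
_, b1_ols = simple_ols(y, x1)

print(f"OLS slope:  {b1_ols:.3f}")
print(f"2SLS slope: {b1_2sls:.3f} (true value 0.5)")
```

The point estimate from the manual second stage matches 2SLS, but the standard errors reported by an ordinary second-stage regression are not correct, because they ignore that $\hat{x}_{1i}$ is itself estimated. Dedicated IV routines adjust for this.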
Difference Between OLS and IV
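- The OLS estimate is:
$$\hat{\beta}_1^{OLS} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{9}$$
- The IV estimate is:
$$\hat{\beta}_1^{IV} = \frac{\sum_{i=1}^{n} (z_i - \bar{z})(y_i - \bar{y})}{\sum_{i=1}^{n} (z_i - \bar{z})(x_i - \bar{x})} \tag{10}$$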
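A short derivation shows why the IV ratio recovers $\beta_1$ where OLS does not. It is a sketch for the simple case $y_i = \beta_0 + \beta_1 x_i + u_i$ with one instrument (an assumption made here for brevity), using $y_i - \bar{y} = \beta_1 (x_i - \bar{x}) + (u_i - \bar{u})$:

```latex
\begin{aligned}
\hat{\beta}_1^{IV}
  &= \frac{\sum_{i=1}^{n} (z_i - \bar{z})(y_i - \bar{y})}{\sum_{i=1}^{n} (z_i - \bar{z})(x_i - \bar{x})}
   = \beta_1 + \frac{\sum_{i=1}^{n} (z_i - \bar{z})(u_i - \bar{u})}{\sum_{i=1}^{n} (z_i - \bar{z})(x_i - \bar{x})} \\
  &\xrightarrow{\;p\;} \beta_1 + \frac{\operatorname{Cov}(z_i, u_i)}{\operatorname{Cov}(z_i, x_i)}
   = \beta_1
   \quad \text{when } \operatorname{Cov}(z_i, u_i) = 0 \text{ and } \operatorname{Cov}(z_i, x_i) \neq 0, \\
\hat{\beta}_1^{OLS}
  &\xrightarrow{\;p\;} \beta_1 + \frac{\operatorname{Cov}(x_i, u_i)}{\operatorname{Var}(x_i)}
   \neq \beta_1
   \quad \text{when } \operatorname{Cov}(x_i, u_i) \neq 0 .
\end{aligned}
```

The two instrument conditions are exactly what make the second term vanish for IV, while the same term survives for OLS whenever $x_i$ is endogenous.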
Interpretation of Results
- Generally, IV regression follows the same conventions as OLS.
- The test statistics of the parameters can be shown to follow a standard normal distribution asymptotically; they are hence referred to as z scores and compared to the normal distribution rather than the t-distribution used in OLS
- The $R^2$ in IV is less useful since it can in fact be negative: the Residual Sum of Squares can be higher than the Total Sum of Squares in IV, so do not interpret it.
OLS Regression (output shown on the slide is not captured in the transcript)
2SLS Regression (output shown on the slide is not captured in the transcript)
Bad Instruments
- If the instruments are poor in the sense that they are not exogenous to the main equation, we obtain biased results (they are said not to be 'valid')
- Problems also arise if the correlation between the instruments and the endogenous variable is weak, because any remaining correlation between instrument and error term is then magnified (such instruments are said to be 'weak'; they are required to be 'strong')
- This can be seen from the following expression for the estimated parameter:
$$\operatorname{plim} \hat{\beta}_1 = \beta_1 + \frac{\operatorname{Corr}(z_i, u_i)}{\operatorname{Corr}(z_i, x_i)} \cdot \frac{\sigma_u}{\sigma_x} \tag{11}$$
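Plugging a few numbers into equation (11) shows how a weak instrument magnifies even a small validity problem. The correlations below are invented for illustration, and the OLS comparison uses the standard result $\operatorname{plim} \hat{\beta}_1^{OLS} = \beta_1 + \operatorname{Corr}(x_i, u_i) \, \sigma_u / \sigma_x$, which is not stated on the slide.

```python
def iv_asymptotic_bias(corr_zu, corr_zx, sigma_u, sigma_x):
    """Bias term of equation (11): (Corr(z,u) / Corr(z,x)) * (sigma_u / sigma_x)."""
    if corr_zx == 0:
        raise ValueError("Corr(z, x) = 0: the instrument is irrelevant")
    return (corr_zu / corr_zx) * (sigma_u / sigma_x)


def ols_asymptotic_bias(corr_xu, sigma_u, sigma_x):
    """Asymptotic OLS bias: Corr(x,u) * sigma_u / sigma_x."""
    return corr_xu * sigma_u / sigma_x


# Hypothetical values: a slightly invalid instrument, Corr(z, u) = 0.05
strong = iv_asymptotic_bias(corr_zu=0.05, corr_zx=0.50, sigma_u=1.0, sigma_x=1.0)
weak = iv_asymptotic_bias(corr_zu=0.05, corr_zx=0.05, sigma_u=1.0, sigma_x=1.0)
ols = ols_asymptotic_bias(corr_xu=0.30, sigma_u=1.0, sigma_x=1.0)

print(f"IV bias, strong instrument: {strong:.2f}")
print(f"IV bias, weak instrument:   {weak:.2f}")
print(f"OLS bias:                   {ols:.2f}")
```

In this made-up case the weak instrument leaves IV with a larger asymptotic bias than OLS, which is why both validity and strength have to be checked.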
Three Things to Check
- Make sure that you can in fact trust your IV regression estimates
- You need to check three things:
  - That the 'suspected' explanatory variable is indeed endogenous
  - That you do not have weak instruments
  - Overidentification: the validity of the instruments
Endogeneity or Not?
- If all the other criteria are met, we can assume that a problem with endogeneity would produce different estimates in the 2SLS case compared to standard OLS
- We have endogeneity effects if the 2SLS estimates differ significantly from the OLS estimates
- Use the Hausman test (visual comparison is too weak but may give a hint)
  - Run the two regressions (OLS and IV) and store them
  - Use the Hausman test to see if the coefficients are different (<hausman>)
2SLS Regression (output shown on the slide is not captured in the transcript)
Durbin-Wu-Hausman Test for Endogeneity
- We wish to understand if the OLS estimates are consistent
- Follow 4 steps:
  1. Run the reduced form regression against the endogenous variable
  2. Extract the residuals
  3. Run the main equation including these residuals as an additional explanatory variable
  4. Test if the coefficient on the residuals is significantly different from zero using an F test (<test var>)
- If the test shows significance → endogeneity issues
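The four steps can be sketched on simulated data. This is a minimal illustration under assumptions that are not in the slides: one endogenous regressor, one instrument, no controls, invented coefficients, and small hand-written helpers (`solve`, `ols`) in place of a statistics package. With a single restriction the F statistic equals the squared t statistic, so the code reports the t statistic on the residual.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for p in range(n):
        piv = max(range(p, n), key=lambda r: abs(M[r][p]))
        M[p], M[piv] = M[piv], M[p]
        for r in range(p + 1, n):
            f = M[r][p] / M[p][p]
            for c in range(p, n + 1):
                M[r][c] -= f * M[p][c]
    x = [0.0] * n
    for p in reversed(range(n)):
        x[p] = (M[p][n] - sum(M[p][c] * x[c] for c in range(p + 1, n))) / M[p][p]
    return x


def ols(y, columns):
    """OLS of y on a constant plus `columns`; returns coefficients, residuals, t statistics."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = solve(XtX, Xty)
    resid = [yi - sum(bj * xj for bj, xj in zip(beta, row)) for row, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    t_stats = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        inv_jj = solve(XtX, unit)[j]          # j-th diagonal element of (X'X)^-1
        t_stats.append(beta[j] / math.sqrt(s2 * inv_jj))
    return beta, resid, t_stats


def dwh_t_stat(y, x, z):
    _, v_hat, _ = ols(x, [z])                 # steps 1-2: reduced form and its residuals
    _, _, t = ols(y, [x, v_hat])              # step 3: main equation plus residuals
    return t[2]                               # step 4: t statistic on the residual term


def simulate(endogenous, n=5000, seed=2):
    rng = random.Random(seed)
    load = 0.6 if endogenous else 0.0         # how strongly x depends on the unobserved q
    z = [rng.gauss(0, 1) for _ in range(n)]
    q = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.8 * zi + load * qi + rng.gauss(0, 1) for zi, qi in zip(z, q)]
    y = [1.0 + 0.5 * xi + qi + rng.gauss(0, 1) for xi, qi in zip(x, q)]
    return y, x, z


t_endog = dwh_t_stat(*simulate(endogenous=True))
t_exog = dwh_t_stat(*simulate(endogenous=False))
print(f"t on residual, endogenous x: {t_endog:.2f}")
print(f"t on residual, exogenous x:  {t_exog:.2f}")
```

A large statistic in the first case signals endogeneity; in the second case the residual carries no extra information and the statistic behaves like an ordinary t draw.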
2SLS Regression (output shown on the slide is not captured in the transcript)
Strength of Instruments
- Many tests for weak instruments have been proposed
- There is no clear consensus on the best approach for evaluation
- Generally we consider the significance of the instruments in the first stage equation
- Significance suggests that the instruments are not weak, since the criterion is:
$$\operatorname{Cov}(z_i, x_i) \neq 0 \tag{12}$$
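Checking first-stage significance can be sketched as follows, assuming a single instrument and no controls so that the first-stage F statistic is simply the squared t statistic on the instrument. The data are simulated with invented coefficients. A commonly cited rule of thumb, which is not from the slides, treats a first-stage F below about 10 as a warning sign.

```python
import random


def first_stage_f(x, z):
    """F statistic for H0: slope = 0 in the regression of x on a constant and z."""
    n = len(x)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    slope = szx / szz
    intercept = mx - slope * mz
    rss = sum((xi - intercept - slope * zi) ** 2 for xi, zi in zip(x, z))
    se = (rss / (n - 2) / szz) ** 0.5          # standard error of the slope
    return (slope / se) ** 2                   # with one instrument, F = t squared


rng = random.Random(4)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
x_strong = [0.80 * zi + rng.gauss(0, 1) for zi in z]   # instrument clearly relevant
x_weak = [0.02 * zi + rng.gauss(0, 1) for zi in z]     # instrument barely relevant

f_strong = first_stage_f(x_strong, z)
f_weak = first_stage_f(x_weak, z)
print(f"First-stage F, strong instrument: {f_strong:.1f}")
print(f"First-stage F, weak instrument:   {f_weak:.1f}")
```

With controls in the first stage, the relevant statistic is the joint F test on the excluded instruments only, not on all regressors.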
Overidentification – Validity
- We say that we have potential overidentification if the number of instruments exceeds the number of endogenous variables
- We could potentially drop some of the instruments to make the estimation less restricted
- We have more exogenous variables than needed to estimate the parameter in the main equation
- We also increase the likelihood of including invalid instruments that do not uphold the rule:
$$\operatorname{Cov}(z_i, u_i) = 0 \tag{13}$$
Overidentification (Sargan) Test
- The test is used to test the overidentifying restrictions
- Follow these steps:
  1. Estimate the 2SLS IV regression and extract the residuals
  2. Regress these residuals on all exogenous variables and extract the $R^2$
  3. Calculate $nR^2$, which is $\chi^2$ distributed
  4. Compare the value with the critical value in the chi-square table, with degrees of freedom equal to the number of instruments less the number of endogenous variables
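The steps above can be sketched on simulated data. The setup is an assumption for illustration: one endogenous regressor, two instruments and no other controls, so "all exogenous variables" reduces to a constant and the two instruments, and the degrees of freedom are 2 - 1 = 1 (5% critical value 3.84). The helpers `solve` and `ols_fit` are hand-written stand-ins for a statistics package.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for p in range(n):
        piv = max(range(p, n), key=lambda r: abs(M[r][p]))
        M[p], M[piv] = M[piv], M[p]
        for r in range(p + 1, n):
            f = M[r][p] / M[p][p]
            for c in range(p, n + 1):
                M[r][c] -= f * M[p][c]
    x = [0.0] * n
    for p in reversed(range(n)):
        x[p] = (M[p][n] - sum(M[p][c] * x[c] for c in range(p + 1, n))) / M[p][p]
    return x


def ols_fit(y, columns):
    """OLS of y on a constant plus `columns`; returns (coefficients, fitted values)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    return beta, fitted


def sargan_statistic(y, x, instruments):
    n = len(y)
    _, x_hat = ols_fit(x, instruments)               # first stage
    (b0, b1), _ = ols_fit(y, [x_hat])                # second stage gives the 2SLS estimates
    resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]   # residuals use the observed x
    _, fitted = ols_fit(resid, instruments)          # step 2: auxiliary regression
    mean_r = sum(resid) / n
    rss = sum((r - f) ** 2 for r, f in zip(resid, fitted))
    tss = sum((r - mean_r) ** 2 for r in resid)
    return n * (1.0 - rss / tss)                     # step 3: n * R^2


def simulate(direct_effect, n=2000, seed=5):
    """direct_effect is the (illegitimate) direct impact of z2 on y; 0 means z2 is valid."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    q = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.7 * a + 0.7 * b + 0.6 * qi + rng.gauss(0, 1) for a, b, qi in zip(z1, z2, q)]
    y = [1.0 + 0.5 * xi + qi + direct_effect * b + rng.gauss(0, 1)
         for xi, qi, b in zip(x, q, z2)]
    return y, x, [z1, z2]


stat_valid = sargan_statistic(*simulate(direct_effect=0.0))
stat_invalid = sargan_statistic(*simulate(direct_effect=1.0))
print(f"n*R^2, both instruments valid: {stat_valid:.2f}")
print(f"n*R^2, z2 invalid:             {stat_invalid:.2f}")
print("5% critical value, chi-square with 1 df: 3.84")
```

When one instrument enters the main equation directly, the 2SLS residuals remain correlated with the instruments and the statistic far exceeds the critical value.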
Overidentification Test
- If the statistic ($nR^2$) exceeds the critical $\chi^2$ value, we conclude that at least one of the instruments is not exogenous and hence invalid.
- Such instruments are correlated with the error term and hence have some explanatory power in the main equation.
- Even when the overidentification test suggests that the instruments are valid, we should be very careful: the test assumes that at least one instrument is valid.
- If none of the instruments fulfils the criterion $\operatorname{Cov}(z_i, u_i) = 0$, the test might still suggest that the instruments are valid, even when they are not!
- Use your own reasoning and theory to argue why the instruments can be considered correlated with the endogenous variable but not with the error term of the equation explaining $y_i$
Overidentification Test in Stata
- Use <ivreg2> instead of <ivregress>
- This procedure automatically produces the key statistics for evaluating overidentification (validity)
Exercise - Endogeneity
- Assume you wish to understand the effect of political engagement on the number of minutes individuals spend reading their newspaper
- Why would there be endogeneity?
- What instruments do you think can be used for correcting this endogeneity?
Panel Data
- Panel data: cross sections of subjects re-sampled over time
- Typically we observe the characteristics of each subject at several different points in time
- Having several observations on the same subject allows us to control for unobserved characteristics
- Panel data also allow researchers to investigate causality
- Panel data allow the researcher to investigate the lag of an effect
- Some questions simply need panel data to be investigated
Divorce Rate Example
- Let's consider the divorce rate across American states
- We wish to understand the relationship between welfare payments (AFDC) and the divorce rate
- We consider the cross section of states, looking at the size of AFDC and the divorce rate in each state
- Prior to the study we would think that the relationship would be positive, since states with desirable economic climates enjoy both low divorce rates and low welfare payments
- Furthermore, welfare payment systems may be conducive to divorce since they help individuals to cope as single rather than married
Cross Section of Divorce in American States
- A regression line would suggest that there is a negative relationship (-0.37) between aid to families with dependent children and the divorce rate
Cross Section of Divorce in American States (continued)
- We would conclude that a higher AFDC is associated with a lower divorce rate
- The hasty researcher will conclude that wealthy states that can afford extensive AFDC programs also enjoy a different cultural and economic climate
- Furthermore, AFDC can potentially help families with dependent children to cope with their situation and avoid quarrels rooted in their economic position
Panel of Observations Across American States
- Now we see that the relationship within the states goes in the opposite direction, suggesting the positive relationship expected initially
Cross Section Versus Panel
- The example illustrates that a cross sectional analysis and a panel analysis may point in completely opposite directions
- The dynamic relationship between the variables differs dramatically from the cross sectional relationship between states
- This may point to cross sections suffering from problems in the shape of unobserved heterogeneity across the studied subjects, a feature panel data can be used to remedy
Panel Data
- Let's assume we did not settle for a cross section and therefore put some effort into collecting further data, giving us a panel dataset
- The equation to be tested changes only marginally, but it provides strong options for controlling for unobserved heterogeneity/endogeneity
- The original equation looked as depicted in equation 14; let's assume it is changed into equation 15
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \gamma q_i + v_i \tag{14}$$
$$y_{it} = \beta_0 + \beta_1 x_{1it} + \beta_2 x_{2it} + \cdots + \beta_k x_{kit} + \gamma q_i + v_{it} \tag{15}$$
Fixed Effects Modelling
- In the fixed effects model we transform every variable by subtracting its subject-specific mean over time
$$(y_{it} - \bar{y}_i) = (\beta_0 - \beta_0) + \beta_1 (x_{1it} - \bar{x}_{1i}) + \cdots + \beta_k (x_{kit} - \bar{x}_{ki}) + \gamma (q_i - \bar{q}_i) + (v_{it} - \bar{v}_i) \tag{16}$$
- Since all variables that do not change over time (those without a $t$ subscript) equal their own mean, they cancel to zero; this includes the intercept and $q_i$
- We get a new equation in the demeaned variables (marked with a tilde):
$$\tilde{y}_{it} = \beta \tilde{x}_{it} + \tilde{\varepsilon}_{it} \tag{17}$$
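The demeaning can be sketched on a simulated panel. All numbers are assumptions chosen for illustration (N = 500 subjects, T = 5 periods, one regressor, true $\beta = 1$, $\gamma = 2$, and $x_{it}$ correlated with the fixed unobserved $q_i$); the helper `slope` is hypothetical.

```python
import random


def slope(y, x):
    """OLS slope from regressing y on a constant and x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(3)
N, T, beta, gamma = 500, 5, 1.0, 2.0

pooled_x, pooled_y = [], []      # raw observations, stacked over subjects and time
within_x, within_y = [], []      # observations after subtracting subject means

for i in range(N):
    q_i = rng.gauss(0, 1)                               # fixed unobserved effect
    xs = [q_i + rng.gauss(0, 1) for _ in range(T)]      # x correlated with q_i
    ys = [beta * x + gamma * q_i + rng.gauss(0, 1) for x in xs]
    x_bar, y_bar = sum(xs) / T, sum(ys) / T
    pooled_x += xs
    pooled_y += ys
    within_x += [x - x_bar for x in xs]                 # q_i cancels in the demeaned data
    within_y += [y - y_bar for y in ys]

b_pooled = slope(pooled_y, pooled_x)
b_fe = slope(within_y, within_x)
df = N * T - N - 1                                      # N*T - N - k with k = 1

print(f"Pooled OLS slope:    {b_pooled:.3f}")
print(f"Fixed effects slope: {b_fe:.3f} (true value 1.0)")
print(f"Degrees of freedom:  {df}")
```

Pooled OLS absorbs the effect of $q_i$ into the slope, while the within estimator recovers a value close to the true $\beta$ because $q_i$ drops out after demeaning.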
Controlled for All Fixed Unobserved Effects
- Fixed effects estimation removes all effects that are fixed over time
- That also includes all time-invariant unobserved effects
- This limits the variables for which we can obtain parameters: time-invariant regressors drop out
- The degrees of freedom are not calculated in the standard way
- Degrees of freedom = N*T - N - k
- We lose N to the demeaning (one mean per subject) and k to the explanatory variables
Possibilities in Fixed Effects
- Even if we cannot estimate the effects of time-invariant variables directly, we can include them interacted with time-varying variables
- Interacting them with time dummies enables us to express how the effect of a constant variable changes across time
- We are able to determine whether the effect increases or decreases as time goes by, while the overall level of the effect stays absorbed in the fixed effect
- Example: how does education affect wages after completing final exams? We cannot put education in, but we can add education interacted with time dummies
Be Careful
- This was only a short introduction to Fixed Effects; much more is needed for sound panel data analysis
- There are other options that should be considered and may suit your data better:
  - Random Effects
  - First Differencing
  - Least Squares Dummy Variables Model (LSDV)