OLS Assumptions and Goodness of Fit
A little warm-up
• Assume I am a poor free-throw shooter. To win a contest I can choose to attempt one of the two following challenges:
• A. Make three out of four free throws
• B. Make six out of eight free throws
• Which should I choose? Why?
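Under a binomial model the warm-up has a definite answer. A quick sketch, assuming (my number, not the slide's) a 40 percent free-throw shooter:

```python
# Probability of making at least k of n free throws under a binomial model.
# The shooting percentage p = 0.4 is an assumed, illustrative number.
from math import comb

def p_at_least(k, n, p):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p = 0.4
a = p_at_least(3, 4, p)   # challenge A: three of four
b = p_at_least(6, 8, p)   # challenge B: six of eight
print(round(a, 4), round(b, 4))  # 0.1792 0.0498
```

Both challenges demand the same 75 percent success rate, but the larger sample concentrates around the shooter's true (low) ability. A poor shooter should pick A: fewer attempts mean more luck.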
Gauss-Markov Assumptions
• These are the full ideal conditions.
• If these are met, OLS is BLUE, i.e. efficient and unbiased.
• Your data will rarely meet these conditions.
– This class helps you understand what to do about this.
Pop Quiz
• Take out a sheet of paper and write down all the Gauss-Markov assumptions.
Assumptions of Classical Linear Regression
• A1: The regression model is linear in parameters
– It may not be linear in variables
Yi = β1 + β2Xi
Assumptions of Classical Linear Regression
• A1: The regression model is linear in parameters
– It may not be linear in variables
Yi = β1 + β2Xi + β3Xi²
Assumptions of Classical Linear Regression
• A2: X values are fixed in repeated sampling
• Think about an experiment with different dosages assigned to different groups
• We can also do this if X values vary in repeated sampling, as long as cov(Xi, ui) = 0
– See chapter 13 if you’re curious about the details
What if we violate linearity?
• If you have a non-linear relationship between X and Y and you don’t include an X-squared or X-cubed term, what is the problem?
• A true relationship may exist between X and Y that you fail to detect.
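A minimal sketch of that failure (toy data, not from the lecture): when the true relation is Y = X² over a symmetric range, a straight-line fit reports a slope of zero and detects nothing.

```python
# Toy data with a perfect, but non-linear, X-Y relationship.
xs = [k / 2 for k in range(-10, 11)]   # -5.0, -4.5, ..., 5.0
ys = [x * x for x in xs]               # true relation: Y = X^2

# Simple OLS slope: cov(X, Y) / var(X).
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
print(slope)  # 0.0: the linear term sees nothing unless we add an X-squared regressor
```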
• A2: X values are fixed in repeated sampling
• Think about an experiment with different dosages assigned to different groups
• We can also do this if X values vary in repeated sampling, as long as cov(Xi, ui) = 0
• Think about this as requiring random sampling
Expected Value of Errors is Zero
• A3: Mean value of ui = 0
– E[ui|Xi] = 0
– E[ui] = 0 if X is fixed (non-stochastic)
• It’s OK to have big errors, but we can’t be wrong systematically
– We call that bias
What if the expected value of the errors is not zero?
• This would indicate specification error
– Omitted variable bias, for example
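A small simulation (illustrative numbers of my own) of how an omitted confounder makes the error depend on X and biases the slope:

```python
import random
random.seed(0)

# True model: Y = 1 + 2X + 3Z + e, but we regress Y on X alone.
# Z is correlated with X, so it lands in the error term.
n = 5000
xs = [random.gauss(0, 1) for _ in range(n)]
zs = [x + random.gauss(0, 1) for x in xs]
ys = [1 + 2 * x + 3 * z + random.gauss(0, 1) for x, z in zip(xs, zs)]

mx, my = sum(xs) / n, sum(ys) / n
b2 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
print(round(b2, 2))  # near 5, not the true 2: classic omitted variable bias
```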
Assumptions of Classical Linear Regression
• A4: Homoskedasticity, or constant variance of ui
What happens if we violate homoskedasticity?
• This is called heteroskedasticity.
– Model uncertainty varies from observation to observation.
• Often true in cross-sectional data due to omitted variable bias.
– See chapter 13 if you’re curious about the details of heteroskedasticity
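A simulated sketch (my own toy setup, in which the error's standard deviation grows with X) of what heteroskedasticity looks like in the residuals:

```python
import random, statistics
random.seed(1)

n = 4000
xs = [random.uniform(1, 10) for _ in range(n)]
ys = [2 + 0.5 * x + random.gauss(0, 0.3 * x) for x in xs]  # error sd rises with x

# Fit by OLS, then compare residual spread at low vs high X.
mx, my = sum(xs) / n, sum(ys) / n
b2 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b1 = my - b2 * mx
resid = [y - (b1 + b2 * x) for x, y in zip(xs, ys)]

low = statistics.stdev(r for x, r in zip(xs, resid) if x < 5)
high = statistics.stdev(r for x, r in zip(xs, resid) if x >= 5)
print(round(low, 2), round(high, 2))  # the high-X residuals are far more spread out
```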
No Autocorrelation
• A5: No autocorrelation between disturbances
– cov(ui, uj | Xi, Xj) = 0 for i ≠ j
• The observations are sampled independently
What if we have autocorrelation?
• More or less always the case in panel data.
– So we have panel-corrected standard errors, etc.
• Also sometimes the case if we sample multiple children from the same family, or multiple regions from the same country, etc.
– Clustered standard errors
Degrees of Freedom
• A6: The number of observations n must be greater than the number of parameters to be estimated
– n > number of explanatory variables
– The difference between them is known as the degrees of freedom
Not Enough Degrees of Freedom
• If you don’t have enough degrees of freedom, you can’t estimate your parameters.
– The smaller your sample size, the less precise your estimates (i.e. the larger the standard errors).
– You may be unable to reject the null hypothesis of no difference even if the true effect is large.
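A simulated sketch of the precision point (data-generating process is mine): re-estimating the same model many times at two sample sizes shows how the slope's sampling spread shrinks as n grows.

```python
import random, statistics
random.seed(2)

def one_slope(n):
    """Draw one sample from Y = 1 + 2X + e and return the OLS slope."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [1 + 2 * x + random.gauss(0, 1) for x in xs]
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

sd_small = statistics.stdev(one_slope(10) for _ in range(500))
sd_large = statistics.stdev(one_slope(100) for _ in range(500))
print(round(sd_small, 3), round(sd_large, 3))  # the n = 10 estimates are far noisier
```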
Variation but no Outliers
• A7: X must vary, but there must not be any outliers
What if there are outliers?
• Our model works too hard to fit these values, effectively giving them too much weight.
• This is a consequence of minimizing squared errors: large residuals are penalized quadratically.
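A toy sketch of that leverage (numbers are mine): nine points on a perfect Y = X line, plus one wild observation.

```python
def slope(xs, ys):
    """Simple OLS slope: cov(X, Y) / var(X)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = xs[:]                               # a perfect Y = X line: slope 1

clean = slope(xs, ys)
dirty = slope(xs + [10.0], ys + [40.0])  # add a single outlier at (10, 40)
print(clean, round(dirty, 2))  # 1.0 2.64: one point in ten nearly triples the slope
```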
Correct specification
• A8: The regression model is correctly specified.
– The correct variables are included
– We have the correct functional form
– Correct assumptions about the probability distributions of Yi, Xi, and ui
No perfect multicollinearity
• A9: With multiple regression, we add the assumption of no perfect multicollinearity
– The correlation between any two X variables is < 1
No perfect multicollinearity
• With perfect collinearity, we have to drop one X variable to even estimate our betas.
• With near-perfect collinearity, variance is inflated
– But estimates are not biased
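That inflation can be quantified with the variance inflation factor, VIF = 1/(1 - r²); the slides don't name it, but it is the standard measure of how much collinearity multiplies a coefficient's sampling variance.

```python
# VIF = 1 / (1 - r^2), where r is the correlation between two regressors.
for r in (0.0, 0.5, 0.9, 0.99):
    vif = 1 / (1 - r ** 2)
    print(f"corr = {r:.2f}  VIF = {vif:.1f}")
# At corr = 0.99 the variance is inflated roughly 50-fold; at corr = 1
# (perfect collinearity) it is undefined, and the betas cannot be estimated.
```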
Gauss-Markov Theorem
• When all these assumptions hold, OLS is BLUE
– Best Linear Unbiased Estimator
– “Best” means least variance (most efficient)
– Unbiased means E[β̂2] = β2
How “good” is the fit?
• To measure “reduction in errors” we need a benchmark for comparison.
• The mean of the dependent variable is a relevant and tractable benchmark for comparing predictions.
• The mean of Y represents our “best guess” at the value of Yi absent other information.
Sums of Squares
• This gives us the following “sum-of-squares” measures: the Total Sum of Squares (TSS), the Explained Sum of Squares (ESS), and the Unexplained Sum of Squares (USS)
• Total Variation = Explained Variation + Unexplained Variation
– TSS = ESS + USS
How well does our model perform?
• The R-squared statistic
– R² = (TSS − USS)/TSS
– R² = ESS/TSS
• Bounded between 0 and 1
• Higher values indicate a better fit
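The decomposition and both forms of R² can be checked on toy numbers (data are illustrative, not from the lecture):

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 3.0, 5.0, 4.0, 6.0]

# Fit the simple regression and form the fitted values.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b2 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b1 = my - b2 * mx
fit = [b1 + b2 * x for x in xs]

tss = sum((y - my) ** 2 for y in ys)              # total variation
ess = sum((f - my) ** 2 for f in fit)             # explained variation
uss = sum((y - f) ** 2 for y, f in zip(ys, fit))  # unexplained variation
r2 = ess / tss
print(round(tss, 2), round(ess + uss, 2), round(r2, 2))  # TSS = ESS + USS; R^2 = 0.81 here
```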
Questions
• How do the fitted values of Y change if we multiply X by a constant?
• What if we add a constant to X?
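One way to check both questions empirically (toy numbers of my own):

```python
def fitted(xs, ys):
    """OLS fitted values of Y from a simple regression of Y on X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b2 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b1 = my - b2 * mx
    return [b1 + b2 * x for x in xs]

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 3.0, 5.0, 4.0, 6.0]

base = fitted(xs, ys)
scaled_same = all(abs(a - b) < 1e-9 for a, b in zip(base, fitted([10 * x for x in xs], ys)))
shifted_same = all(abs(a - b) < 1e-9 for a, b in zip(base, fitted([x + 7 for x in xs], ys)))
print(scaled_same, shifted_same)  # True True: the coefficients adjust, the fits do not
```

Multiplying X by c divides the slope by c; adding a constant is absorbed by the intercept. Either way the fitted values, residuals, and R² are unchanged.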
Why do we have an error term?
• The error term includes the effect of all X variables not in our model that still affect Y.
– Parsimony, intrinsic randomness of human behavior
– Vague theory, measurement error, wrong functional form
What does an error term imply?
• If we run our project multiple times, we will estimate a slightly different regression line every time.
How do we know if our test statistic is any good?
• OLS is an estimator
– It calculates the slope of the sample regression line (i.e. the SRF)
• It gives us a test statistic (i.e. a p-value)
– What does that mean?
• IF AND ONLY IF the assumptions of OLS are met, and the true slope of the population regression line is 0, there is an x percent chance we would estimate a slope this large in our sample regression.
Can we test that?
• YES!
• First, estimate our regression line and calculate the critical value (p = .05)
• Second, let’s make there be no relationship.
– “Shuffle” the data
• Third, re-estimate the regression line. Is the slope steeper than our critical value?
• Repeat steps 2 and 3 10,000 times. How often should the slope of the regression line be greater than the critical value I’ve calculated?
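A runnable sketch of those steps (all numbers are mine: n = 50, a true slope of 0.8, a normal-theory 1.96 cutoff, and 2,000 shuffles instead of 10,000 to keep it fast):

```python
import random, math
random.seed(3)

n = 50
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [1 + 0.8 * x + random.gauss(0, 1) for x in xs]

def slope(x_list, y_list):
    m = len(x_list)
    mx, my = sum(x_list) / m, sum(y_list) / m
    return sum((a - mx) * (b - my) for a, b in zip(x_list, y_list)) / sum((a - mx) ** 2 for a in x_list)

# Step 1: a normal-theory p = .05 critical value for the slope under "no relationship"
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
crit = 1.96 * sy / math.sqrt(sxx)

# Steps 2 and 3, repeated: shuffle Y to destroy the X-Y link, re-estimate the slope
reps = 2000
pool = ys[:]
hits = 0
for _ in range(reps):
    random.shuffle(pool)
    if abs(slope(xs, pool)) > crit:
        hits += 1
print(hits / reps)  # should land near 0.05, the nominal type 1 error rate
```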
What does that tell us?
• It tells us our type 1 error rate
– How often would we reject the null when we shouldn’t (i.e. when the null is true)?
• What about type 2 errors?
– How often would we fail to reject the null when the true value of beta is actually β1?
– To calculate that, we need the sample size, the variance of X and Y, and all the OLS assumptions
– Or we can simulate it
• Validity and Power
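Power is easy to simulate along the same lines (assumptions mine: true beta = 0.5, noise sd of 1, n = 30, a 1.96 cutoff):

```python
import random, math
random.seed(4)

def rejects(n=30, beta=0.5):
    """Draw one sample, run OLS, and test H0: beta = 0 at the 5 percent level."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [1 + beta * x + random.gauss(0, 1) for x in xs]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b2 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b1 = my - b2 * mx
    s2 = sum((y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)  # residual variance
    return abs(b2) / math.sqrt(s2 / sxx) > 1.96

power = sum(rejects() for _ in range(2000)) / 2000
print(round(power, 2))  # the share of samples in which we correctly reject the null
```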