Econometrics notes (Introduction, Simple Linear Regression, Multiple Linear Regression)
Week 5: Simple Linear Regression
Brandon Stewart1
Princeton
September 28-October 2, 2020
1These slides are heavily influenced by Matt Blackwell, Adam Glynn, Erin Hartman and Jens Hainmueller. Illustrations by Shay O’Brien.
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 1 / 127
Where We’ve Been and Where We’re Going...
Last Week
- hypothesis testing
- what is regression

This Week
- mechanics and properties of simple linear regression
- inference and measures of model fit
- confidence intervals for regression
- goodness of fit

Next Week
- mechanics with two regressors
- omitted variables, multicollinearity

Long Run
- probability → inference → regression → causal inference
Macrostructure—This Semester
The next few weeks,
Linear Regression with Two Regressors
Break Week and Multiple Linear Regression
Rethinking Regression
Regression in the Social Sciences
Causality with Measured Confounding
Unmeasured Confounding and Instrumental Variables
Repeated Observations and Panel Data
Review and Final Discussion
1 Mechanics of OLS
2 Classical Perspective (Part 1, Unbiasedness): Sampling Distributions; Classical Assumptions 1–4
3 Classical Perspective (Variance): Sampling Variance; Gauss-Markov; Large Samples; Small Samples; Agnostic Perspective
4 Inference: Hypothesis Tests; Confidence Intervals; Goodness of Fit; Interpretation
5 Non-linearities: Log Transformations; Fun With Logs; LOESS
Narrow Goal: Understand lm() Output
Call:
lm(formula = sr ~ pop15, data = LifeCycleSavings)
Residuals:
Min 1Q Median 3Q Max
-8.637 -2.374 0.349 2.022 11.155
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.49660 2.27972 7.675 6.85e-10 ***
pop15 -0.22302 0.06291 -3.545 0.000887 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.03 on 48 degrees of freedom
Multiple R-squared: 0.2075, Adjusted R-squared: 0.191
F-statistic: 12.57 on 1 and 48 DF, p-value: 0.0008866
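Every number in this summary can be computed by hand from the formulas developed in this deck. The sketch below does so in Python rather than R, on synthetic data (the names `x` and `y` are stand-ins for `pop15` and `sr`; none of the values are from LifeCycleSavings), just to show where each field of the output comes from.

```python
import math
import random

# Synthetic stand-in data (the real example uses R's LifeCycleSavings).
random.seed(0)
n = 50
x = [random.uniform(20, 50) for _ in range(n)]           # plays the role of pop15
y = [17.5 - 0.22 * xi + random.gauss(0, 4) for xi in x]  # plays the role of sr

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx                  # "Estimate" for the slope
b0 = ybar - b1 * xbar           # "Estimate" for the intercept

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
rss = sum(u ** 2 for u in resid)
df = n - 2
sigma = math.sqrt(rss / df)     # "Residual standard error" on n - 2 df

se_b1 = sigma / math.sqrt(sxx)  # "Std. Error" for the slope
t_b1 = b1 / se_b1               # "t value" column
tss = sum((yi - ybar) ** 2 for yi in y)
r2 = 1 - rss / tss              # "Multiple R-squared"
f_stat = t_b1 ** 2              # with a single regressor, F = t^2

print(b0, b1, se_b1, t_b1, r2, f_stat)
```

The min/1Q/median/3Q/max line of the summary is just the five-number summary of `resid`.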
Reminder: How do we fit the regression line Ŷ = β̂0 + β̂1X to the data?
Answer: we minimize the sum of squared residuals. The residual ûi = Yi − Ŷi is the "part" of Yi not predicted by the line:

(β̂0, β̂1) = argmin_{b0, b1} ∑_{i=1}^n (Yi − b0 − b1Xi)²
The Population Quantity
Broadly speaking, we are interested in the conditional expectation function (CEF), in part because it minimizes the mean squared error.
The CEF has a potentially arbitrary shape, but there is always a best linear predictor (BLP), or linear projection, which is the line given by:

g(X) = β0 + β1X

β1 = Cov[X, Y] / V[X]

β0 = E[Y] − (Cov[X, Y] / V[X]) · E[X]

This may not be a good approximation, depending on how non-linear the true CEF is. However, it provides us with a reasonable target that always exists.
Define deviations from the BLP as

u = Y − g(X)

Then the following properties hold: (1) E[u] = 0, (2) E[Xu] = 0, (3) Cov[X, u] = 0.
What is OLS?
The best linear predictor is the line that minimizes

(β0, β1) = argmin_{b0, b1} E[(Y − b0 − b1X)²]

Ordinary Least Squares (OLS) is a method for minimizing the sample analog of this quantity. It solves the optimization problem:

(β̂0, β̂1) = argmin_{b0, b1} ∑_{i=1}^n (Yi − b0 − b1Xi)²

In words, the OLS estimates are the intercept and slope that minimize the sum of the squared residuals.
There are many loss functions, but OLS uses squared error loss, which is connected to the conditional expectation function. If we chose a different loss, we would target a different feature of the conditional distribution.
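As a quick numerical check (not from the slides), the closed-form OLS solution should beat every nearby (b0, b1) pair on the least-squares objective. This sketch uses made-up data and perturbs the OLS solution on a small grid:

```python
# S(b0, b1) = sum_i (Y_i - b0 - b1*X_i)^2, the least-squares objective.
def S(b0, b1, x, y):
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Arbitrary illustrative data (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

best = S(b0, b1, x, y)
# Any perturbation of the OLS solution increases the objective,
# since S is a strictly convex quadratic in (b0, b1).
for db0 in (-0.5, -0.1, 0.1, 0.5):
    for db1 in (-0.5, -0.1, 0.1, 0.5):
        assert S(b0 + db0, b1 + db1, x, y) > best
print(b0, b1, best)
```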
![Page 9: Week 5: Simple Linear Regression · Week 5: Simple Linear Regression Brandon Stewart1 Princeton September 28-October 2, 2020 1These slides are heavily in uenced by Matt Blackwell,](https://reader034.fdocuments.in/reader034/viewer/2022052616/60a7c4ec65ae96344821055e/html5/thumbnails/9.jpg)
Deriving the OLS estimator
Let's think about n pairs of sample observations: (Y1, X1), (Y2, X2), ..., (Yn, Xn).
Let {b0, b1} be possible values for {β0, β1}.
Define the least squares objective function:

S(b0, b1) = ∑_{i=1}^n (Yi − b0 − b1Xi)²

How do we derive the LS estimators for β0 and β1? We want to minimize this function, which is a well-defined calculus problem:
1. Take the partial derivatives of S with respect to b0 and b1.
2. Set each of the partial derivatives to 0.
3. Solve for {b0, b1} and replace them with the solutions.
We are going to step through this process together.
Step 1: Take Partial Derivatives
S(b0, b1) = ∑_{i=1}^n (Yi − b0 − b1Xi)²
          = ∑_{i=1}^n (Yi² − 2Yib0 − 2Yib1Xi + b0² + 2b0b1Xi + b1²Xi²)

∂S(b0, b1)/∂b0 = ∑_{i=1}^n (−2Yi + 2b0 + 2b1Xi)
               = −2 ∑_{i=1}^n (Yi − b0 − b1Xi)

∂S(b0, b1)/∂b1 = ∑_{i=1}^n (−2YiXi + 2b0Xi + 2b1Xi²)
               = −2 ∑_{i=1}^n Xi(Yi − b0 − b1Xi)
Solving for the Intercept
Set ∂S(b0, b1)/∂b0 = −2 ∑_{i=1}^n (Yi − b0 − b1Xi) to zero and solve:

0 = −2 ∑_{i=1}^n (Yi − b̂0 − b̂1Xi)
0 = ∑_{i=1}^n (Yi − b̂0 − b̂1Xi)
0 = ∑_{i=1}^n Yi − ∑_{i=1}^n b̂0 − ∑_{i=1}^n b̂1Xi
b̂0 · n = (∑_{i=1}^n Yi) − b̂1 (∑_{i=1}^n Xi)
b̂0 = Ȳ − b̂1X̄
A Helpful Lemma on Deviations from Means
Lemmas are helper results that are often invoked repeatedly.

Lemma (Deviations from the Mean Sum to 0)

∑_{i=1}^n (Xi − X̄) = (∑_{i=1}^n Xi) − nX̄
                   = (∑_{i=1}^n Xi) − n · (∑_{i=1}^n Xi)/n
                   = (∑_{i=1}^n Xi) − ∑_{i=1}^n Xi
                   = 0
Solving for the Slope
0 = −2 ∑_{i=1}^n Xi(Yi − b̂0 − b̂1Xi)
0 = ∑_{i=1}^n Xi(Yi − b̂0 − b̂1Xi)
0 = ∑_{i=1}^n Xi(Yi − (Ȳ − b̂1X̄) − b̂1Xi)    (substitute b̂0)
0 = ∑_{i=1}^n Xi(Yi − Ȳ − b̂1(Xi − X̄))
0 = ∑_{i=1}^n Xi(Yi − Ȳ) − b̂1 ∑_{i=1}^n Xi(Xi − X̄)
b̂1 ∑_{i=1}^n Xi(Xi − X̄) = ∑_{i=1}^n Xi(Yi − Ȳ) − X̄ ∑_{i=1}^n (Yi − Ȳ)    (add 0: the subtracted term is zero by the lemma)
Solving for the Slope
b̂1 ∑_{i=1}^n Xi(Xi − X̄) = ∑_{i=1}^n Xi(Yi − Ȳ) − X̄ ∑_{i=1}^n (Yi − Ȳ)
b̂1 ∑_{i=1}^n Xi(Xi − X̄) = ∑_{i=1}^n Xi(Yi − Ȳ) − ∑_{i=1}^n X̄(Yi − Ȳ)
b̂1 (∑_{i=1}^n Xi(Xi − X̄) − ∑_{i=1}^n X̄(Xi − X̄)) = ∑_{i=1}^n (Xi − X̄)(Yi − Ȳ)    (add 0 again)
b̂1 ∑_{i=1}^n (Xi − X̄)(Xi − X̄) = ∑_{i=1}^n (Xi − X̄)(Yi − Ȳ)
b̂1 = ∑_{i=1}^n (Xi − X̄)(Yi − Ȳ) / ∑_{i=1}^n (Xi − X̄)²
The OLS estimator
Now we're done! Here are the OLS estimators:

β̂0 = Ȳ − β̂1X̄

β̂1 = ∑_{i=1}^n (Xi − X̄)(Yi − Ȳ) / ∑_{i=1}^n (Xi − X̄)²
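A small sanity check on these formulas (a Python sketch with made-up data, not from the slides): if Y is exactly linear in X, the estimators recover the intercept and slope exactly.

```python
# Exact-line data: true relationship is Y = 2 + 3X with no noise.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 + 3.0 * xi for xi in x]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)   # slope formula from the slide
b0 = ybar - b1 * xbar                    # intercept formula from the slide
print(b0, b1)  # 2.0 3.0
```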
Intuition of the OLS estimator
The intercept equation tells us that the regression line goes through the point (X̄, Ȳ):

Ȳ = β̂0 + β̂1X̄

The slope of the regression line can be written as:

β̂1 = ∑_{i=1}^n (Xi − X̄)(Yi − Ȳ) / ∑_{i=1}^n (Xi − X̄)² = (sample covariance of X and Y) / (sample variance of X)

The higher the covariance between X and Y, the steeper the slope.
Negative covariances → negative slopes; positive covariances → positive slopes.
If Xi doesn't vary, the denominator is zero and the slope is undefined.
If Yi doesn't vary, you get a flat line.
Mechanical properties of OLS
Later we'll see that under certain assumptions, OLS will have nice statistical properties. But some properties are mechanical, since they can be derived from the first-order conditions of OLS.

1. The sample mean of the residuals is zero:
   (1/n) ∑_{i=1}^n ûi = 0

2. The residuals are uncorrelated with the predictor (Cov is the sample covariance):
   ∑_{i=1}^n Xi ûi = 0  ⟹  Cov(Xi, ûi) = 0

3. The residuals are uncorrelated with the fitted values:
   ∑_{i=1}^n Ŷi ûi = 0  ⟹  Cov(Ŷi, ûi) = 0
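These three properties hold for any sample, by construction of the fit. A Python sketch checking them on arbitrary noisy data (all values synthetic):

```python
import random

# Arbitrary noisy sample; the properties below do not depend on it.
random.seed(1)
n = 40
x = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]

xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

assert abs(sum(resid)) < 1e-9                                     # property 1
assert abs(sum(xi * ui for xi, ui in zip(x, resid))) < 1e-9       # property 2
assert abs(sum(fi * ui for fi, ui in zip(fitted, resid))) < 1e-9  # property 3
```

Property 3 follows from the first two, since ∑ Ŷi ûi = b0 ∑ ûi + b1 ∑ Xi ûi.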
OLS slope as a weighted sum of the outcomes
One useful derivation is to write the OLS estimator for the slope as a weighted sum of the outcomes:

β̂1 = ∑_{i=1}^n Wi Yi

where the weights Wi are:

Wi = (Xi − X̄) / ∑_{i=1}^n (Xi − X̄)²

This is important for two reasons. First, it will make later derivations much easier. Second, it shows that β̂1 is a sum of random variables, and is therefore itself a random variable.
Lemma 2: OLS as a Weighted Sum of Outcomes
Lemma (OLS as Weighted Sum of Outcomes)

β̂1 = ∑_{i=1}^n (Xi − X̄)(Yi − Ȳ) / ∑_{i=1}^n (Xi − X̄)²
   = ∑_{i=1}^n (Xi − X̄)Yi / ∑_{i=1}^n (Xi − X̄)² − ∑_{i=1}^n (Xi − X̄)Ȳ / ∑_{i=1}^n (Xi − X̄)²
   = ∑_{i=1}^n (Xi − X̄)Yi / ∑_{i=1}^n (Xi − X̄)²    (deviations from the mean sum to 0)
   = ∑_{i=1}^n Wi Yi

where the weights Wi are:

Wi = (Xi − X̄) / ∑_{i=1}^n (Xi − X̄)²
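The lemma is easy to verify numerically. A Python sketch on made-up data, checking that the weights sum to zero and that the weighted sum of outcomes reproduces the slope estimate:

```python
# Arbitrary illustrative data (not from the lecture).
x = [1.0, 3.0, 4.0, 7.0, 9.0]
y = [2.0, 5.0, 4.0, 10.0, 11.0]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx

# W_i = (X_i - xbar) / sum_j (X_j - xbar)^2
w = [(xi - xbar) / sxx for xi in x]

assert abs(sum(w)) < 1e-12                                    # weights sum to 0
assert abs(sum(wi * yi for wi, yi in zip(w, y)) - b1) < 1e-12  # sum W_i Y_i = slope
```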
We Covered
A brief review of regression
Derivation of the OLS estimator
OLS as a weighted sum of outcomes
Next Time: The Classical Perspective
Where We’ve Been and Where We’re Going...
Last Week
- hypothesis testing
- what is regression

This Week
- mechanics and properties of simple linear regression
- inference and measures of model fit
- confidence intervals for regression
- goodness of fit

Next Week
- mechanics with two regressors
- omitted variables, multicollinearity

Long Run
- probability → inference → regression → causal inference
1 Mechanics of OLS
2 Classical Perspective (Part 1, Unbiasedness): Sampling Distributions; Classical Assumptions 1–4
3 Classical Perspective (Variance): Sampling Variance; Gauss-Markov; Large Samples; Small Samples; Agnostic Perspective
4 Inference: Hypothesis Tests; Confidence Intervals; Goodness of Fit; Interpretation
5 Non-linearities: Log Transformations; Fun With Logs; LOESS
Sampling distribution of the OLS estimator
Remember: OLS is an estimator, a machine that we plug samples into and get estimates out of.

Sample 1: {(Y1, X1), ..., (Yn, Xn)} → OLS → (β̂0, β̂1)_1
Sample 2: {(Y1, X1), ..., (Yn, Xn)} → OLS → (β̂0, β̂1)_2
...
Sample k − 1: {(Y1, X1), ..., (Yn, Xn)} → OLS → (β̂0, β̂1)_{k−1}
Sample k: {(Y1, X1), ..., (Yn, Xn)} → OLS → (β̂0, β̂1)_k

Just like the sample mean, the sample difference in means, or the sample variance, it has a sampling distribution, with a sampling variance/standard error, etc.
Let's take a simulation approach to demonstrate:
- We use data from Acemoglu, Daron, Simon Johnson, and James A. Robinson, "The Colonial Origins of Comparative Development: An Empirical Investigation" (2001).
- See how the line varies from sample to sample.
Simulation procedure
1. Draw a random sample of size n = 30 with replacement using sample().
2. Use lm() to calculate the OLS estimates of the slope and intercept.
3. Plot the estimated regression line.
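The procedure above can be sketched as follows. The slides use R's `sample()` and `lm()`; this is a Python stand-in, and the "population" here is synthetic (the names `pop_x`, `pop_y`, and all coefficients are ours, not the AJR data).

```python
import random

# Synthetic "population" standing in for the AJR data.
random.seed(42)
pop_x = [random.uniform(1, 8) for _ in range(500)]              # like log settler mortality
pop_y = [10.0 - 0.6 * xi + random.gauss(0, 1) for xi in pop_x]  # like log GDP per capita

def ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    return ybar - b1 * xbar, b1

slopes = []
for _ in range(1000):
    # Step 1: random sample of size n = 30, with replacement.
    idx = [random.randrange(500) for _ in range(30)]
    x = [pop_x[i] for i in idx]
    y = [pop_y[i] for i in idx]
    # Step 2: OLS estimates for this sample.
    b0, b1 = ols(x, y)
    slopes.append(b1)

# The slope varies from sample to sample but centers near the population slope.
print(sum(slopes) / len(slopes))
```

Step 3 (plotting each estimated line) is omitted here; the point is that `slopes` is itself a sample from the estimator's sampling distribution.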
Population Regression
[Scatterplot: Log Settler Mortality (x-axis) vs. Log GDP per capita growth (y-axis), with the population regression line.]
Randomly sample from AJR
[Scatterplot: the same axes (Log Settler Mortality vs. Log GDP per capita growth), showing a random sample drawn from the AJR data with its estimated regression line.]
Sampling distribution of OLS
You can see that the estimated slopes and intercepts vary from sampleto sample, but that the “average” of the lines looks about right.
[Histograms: the sampling distribution of the intercepts (β̂0) and the sampling distribution of the slopes (β̂1) across repeated samples.]
The Sampling Distribution is a Joint Distribution!
While both the intercept and the slope vary, they vary together.
[Scatterplot of (β̂0, β̂1) pairs across repeated samples, with β̂0 on the x-axis and β̂1 on the y-axis.]
Sample Mean Properties Review
In the last few weeks we derived the properties of the sampling distribution of the sample mean, X̄n.
Under essentially only the iid assumption (plus finite mean and variance), we derived the large-sample distribution

X̄n ∼ N(µ, σ²/n)

This means:
- the estimator is unbiased for the population mean: E[X̄n] = µ
- it has sampling variance σ²/n
- and standard error σ/√n

This in turn gave us confidence intervals and hypothesis tests.
We will use the same strategy here!
Our goal
What is the sampling distribution of the OLS slope?

β̂1 ∼ ?(?, ?)

We need to fill in those question marks.
We'll start with the mean of the sampling distribution: is the estimator centered at the true value, β1?
Classical Model: OLS Assumptions Preview
1. Linearity in Parameters: the population model is linear in its parameters and correctly specified.
2. Random Sampling: the observed data represent a random sample from the population described by the model.
3. Variation in X: there is variation in the explanatory variable.
4. Zero Conditional Mean: the expected value of the error term is zero conditional on all values of the explanatory variable.
5. Homoskedasticity: the error term has the same variance conditional on all values of the explanatory variable.
6. Normality: the error term is independent of the explanatory variables and normally distributed.
Hierarchy of OLS Assumptions
Each target in the hierarchy requires a progressively stronger set of assumptions:
- Identification / Data Description: Variation in X
- Unbiasedness, Consistency: Variation in X; Random Sampling; Linearity in Parameters; Zero Conditional Mean
- Gauss-Markov (BLUE), Asymptotic Inference: Variation in X; Random Sampling; Linearity in Parameters; Zero Conditional Mean; Homoskedasticity
- Classical LM (BUE), Small-Sample Inference (t and F): Variation in X; Random Sampling; Linearity in Parameters; Zero Conditional Mean; Homoskedasticity; Normality of Errors
OLS Assumption I
Assumption (I. Linearity in Parameters)
The population regression model is linear in its parameters and correctly specified as:

Yi = β0 + β1Xi + ui

Note that it can be nonlinear in the variables:
- OK: Yi = β0 + β1Xi + ui, or Yi = β0 + β1Xi² + ui, or Yi = β0 + β1 log(Xi) + ui
- Not OK: Yi = β0 + β1²Xi + ui, or Yi = β0 + exp(β1)Xi + ui

β0, β1: population parameters, fixed and unknown.
ui: unobserved random variable with E[ui] = 0; captures all other factors influencing Yi other than Xi.
We assume this to be the structural model, i.e., the model describing the true process generating Yi.
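"Linear in parameters, nonlinear in variables" means the nonlinear cases are fit by ordinary OLS after transforming the regressor. A Python sketch on synthetic data (all values and names ours) fitting Y = β0 + β1 log(X) + u:

```python
import math
import random

# Synthetic data from a model that is nonlinear in X but linear in parameters.
random.seed(5)
n = 200
x = [random.uniform(1, 100) for _ in range(n)]
y = [3.0 + 1.5 * math.log(xi) + random.gauss(0, 0.5) for xi in x]

# Transform the regressor first, then run plain OLS of Y on log(X).
z = [math.log(xi) for xi in x]
zbar, ybar = sum(z) / n, sum(y) / n
b1 = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y)) / \
     sum((zi - zbar) ** 2 for zi in z)
b0 = ybar - b1 * zbar
print(b0, b1)  # near 3.0 and 1.5
```

The "Not OK" cases (e.g. exp(β1) multiplying X) cannot be handled this way, because no transformation of the variables makes the model linear in the parameters.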
OLS Assumption II
Assumption (II. Random Sampling)
The observed data, (yi, xi) for i = 1, ..., n, represent an i.i.d. random sample of size n following the population model.

Data examples consistent with this assumption:
- A cross-sectional survey where the units are sampled randomly

Potential violations:
- Time-series data (regressor values may exhibit persistence)
- Sample selection problems (sample not representative of the population)
OLS Assumption III
Assumption (III. Variation in X ; a.k.a. No Perfect Collinearity)
The observed data, xi for i = 1, ..., n, are not all the same value.
Satisfied as long as there is some variation in the regressor X in the sample.
Why do we need this? Look at the denominator of the slope estimator:

β̂1 = ∑_{i=1}^n (xi − x̄)(yi − ȳ) / ∑_{i=1}^n (xi − x̄)²

This assumption is needed just to calculate β̂1: with no variation in x, the denominator is zero.
It is the only assumption needed for using OLS as a pure data summary.
Stuck in a moment
Why does this matter? How would you draw the line of best fit through this scatterplot, which violates the assumption?

[Scatterplot in which every observation has the same value of X, so no unique line of best fit exists.]
OLS Assumption IV
Assumption (IV. Zero Conditional Mean)
The expected value of the error term is zero conditional on any value of the explanatory variable:

E[ui | Xi = x] = 0

E[ui | X] = 0 implies the slightly weaker condition Cov(X, u) = 0.
Given random sampling, E[u | X] = 0 also implies E[ui | xi] = 0 for all i.
Violating the zero conditional mean assumption
[Two scatterplots: left panel, "Assumption 4 violated" (the conditional mean of the errors varies with X); right panel, "Assumption 4 not violated".]
Violating the zero conditional mean assumption
[A second pair of scatterplots: left panel, "Assumption 4 Violated"; right panel, "Assumption 4 Not Violated".]
Unbiasedness
With Assumptions 1–4, we can show that the OLS estimator for the slope is unbiased, that is, E[β̂1] = β1.
Let’s prove it!
Lemma 3: Weighted Combinations of Xi
Lemma (∑_i Wi Xi = 1)

∑_{i=1}^n Wi Xi = ∑_{i=1}^n Xi(Xi − X̄) / ∑_{j=1}^n (Xj − X̄)²
= (1 / ∑_{j=1}^n (Xj − X̄)²) ∑_{i=1}^n Xi(Xi − X̄)
= (1 / ∑_{j=1}^n (Xj − X̄)²) [∑_{i=1}^n Xi(Xi − X̄) − ∑_{i=1}^n X̄(Xi − X̄)]    (add 0: deviations from the mean sum to 0)
= (1 / ∑_{j=1}^n (Xj − X̄)²) ∑_{i=1}^n (Xi − X̄)(Xi − X̄)
= 1
Lemma 4: Estimation Error
Lemma (Estimation Error)

β̂1 = ∑_{i=1}^n Wi Yi
   = ∑_{i=1}^n Wi (β0 + β1Xi + ui)
   = β0 (∑_{i=1}^n Wi) + β1 (∑_{i=1}^n Wi Xi) + ∑_{i=1}^n Wi ui
   = β1 + ∑_{i=1}^n Wi ui    (since ∑ Wi = 0 and ∑ Wi Xi = 1)

β̂1 − β1 = ∑_{i=1}^n Wi ui
Unbiasedness Proof
E[β̂1 − β1 | X] = E[∑_{i=1}^n Wi ui | X]
              = ∑_{i=1}^n E[Wi ui | X]
              = ∑_{i=1}^n Wi E[ui | X]    (Wi is a function of X alone)
              = ∑_{i=1}^n Wi · 0
              = 0

Using iterated expectations, we can show that the estimator is also unconditionally unbiased: E[β̂1] = E[E[β̂1 | X]] = β1.
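A Monte Carlo sketch of this result (not from the slides; all data synthetic): simulate many samples from a model satisfying Assumptions 1–4 and average the slope estimates. The average should sit near the true β1.

```python
import random

# True parameters of the simulated model (our choices, for illustration).
random.seed(7)
beta0, beta1, n = 1.0, 2.0, 25

def draw_slope():
    x = [random.gauss(0, 1) for _ in range(n)]
    u = [random.gauss(0, 1) for _ in range(n)]  # error independent of X, so E[u|X] = 0
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
           sum((xi - xbar) ** 2 for xi in x)

estimates = [draw_slope() for _ in range(5000)]
print(sum(estimates) / len(estimates))  # close to 2.0
```

Individual estimates scatter widely around 2, but their mean across repeated samples does not drift: that is what unbiasedness claims.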
Consistency
Recall the estimation error,

$$
\hat{\beta}_1 = \beta_1 + \sum_{i=1}^{n} W_i u_i
$$

Under iid sampling we have

$$
\sum_{i=1}^{n} W_i u_i = \frac{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})\, u_i}{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2} \overset{p}{\to} \frac{\operatorname{Cov}[X_i, u_i]}{V[X_i]}
$$

Under A4 (zero conditional mean error) we have the slightly weaker property $\operatorname{Cov}[X_i, u_i] = 0$, so as long as $V[X] > 0$ we have

$$
\hat{\beta}_1 \overset{p}{\to} \beta_1
$$
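The unbiasedness and consistency results above are easy to check by simulation. A minimal pure-Python sketch (not from the slides; the data-generating values β0 = 1, β1 = 2 and the uniform design are illustrative assumptions):

```python
import random

def ols_slope(x, y):
    # beta1_hat = sum_i (Xi - Xbar) * Yi / sum_j (Xj - Xbar)^2, the weighted sum of the Yi
    n = len(x)
    xbar = sum(x) / n
    num = sum((xi - xbar) * yi for xi, yi in zip(x, y))
    den = sum((xi - xbar) ** 2 for xi in x)
    return num / den

random.seed(42)
beta0, beta1 = 1.0, 2.0  # assumed "true" parameters for the simulation

def simulate(n):
    x = [random.uniform(0, 10) for _ in range(n)]
    y = [beta0 + beta1 * xi + random.gauss(0, 1) for xi in x]
    return ols_slope(x, y)

# Unbiasedness: the average slope over many small samples is close to beta1
draws = [simulate(n=25) for _ in range(2000)]
print(sum(draws) / len(draws))   # close to 2.0

# Consistency: a single estimate from one large sample is close to beta1
print(simulate(n=100_000))       # close to 2.0
```

The two prints illustrate the two distinct properties: averaging over repeated small samples (unbiasedness) versus one ever-larger sample (consistency).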
We Covered
The first four assumptions of the classical model
We showed that these four were sufficient to establish unbiasedness and consistency.
We even proved it to ourselves!
Next Time: The Classical Perspective Part 2: Variance.
Where We’ve Been and Where We’re Going...
Last Week
- hypothesis testing
- what is regression

This Week
- mechanics and properties of simple linear regression
- inference and measures of model fit
- confidence intervals for regression
- goodness of fit

Next Week
- mechanics with two regressors
- omitted variables, multicollinearity

Long Run
- probability → inference → regression → causal inference
1 Mechanics of OLS
2 Classical Perspective (Part 1, Unbiasedness)
- Sampling Distributions
- Classical Assumptions 1–4

3 Classical Perspective: Variance
- Sampling Variance
- Gauss-Markov
- Large Samples
- Small Samples
- Agnostic Perspective

4 Inference
- Hypothesis Tests
- Confidence Intervals
- Goodness of fit
- Interpretation

5 Non-linearities
- Log Transformations
- Fun With Logs
- LOESS
Where are we?
Under Assumptions 1–4, we know that

$$
\hat{\beta}_1 \sim \;?\,(\beta_1,\; ?)
$$

That is, we know that the sampling distribution is centered on the true population slope, but we don’t yet know its variance.
Sampling variance of estimated slope
In order to derive the sampling variance of the OLS estimator, we will use:

1. Linearity
2. Random (iid) sample
3. Variation in $X_i$
4. Zero conditional mean of the errors
5. Homoskedasticity
Variance of OLS Estimators

How can we derive $\operatorname{Var}[\hat{\beta}_0]$ and $\operatorname{Var}[\hat{\beta}_1]$? Let’s make the following additional assumption:

Assumption (V. Homoskedasticity)

The conditional variance of the error term is constant and does not vary as a function of the explanatory variable:

$$
\operatorname{Var}[u \mid X] = \sigma_u^2
$$

This implies $\operatorname{Var}[u] = \sigma_u^2$ → all errors have an identical error variance ($\sigma_{u_i}^2 = \sigma_u^2$ for all $i$).

Taken together, Assumptions I–V imply:

$$
E[Y \mid X] = \beta_0 + \beta_1 X, \qquad \operatorname{Var}[Y \mid X] = \sigma_u^2
$$

A violation, $\operatorname{Var}[u \mid X = x_1] \ne \operatorname{Var}[u \mid X = x_2]$, is called heteroskedasticity.

Assumptions I–V are collectively known as the Gauss-Markov assumptions.
Heteroskedasticity
[Figure: scatterplot of y against x illustrating heteroskedasticity.]
Deriving the sampling variance
$$
V[\hat{\beta}_1 \mid X] = \;??
$$

$$
\begin{aligned}
V[\hat{\beta}_1 \mid X] &= V\left[ \sum_{i=1}^{n} W_i u_i \,\middle|\, X \right] \\
&= \sum_{i=1}^{n} W_i^2 \, V[u_i \mid X] && \text{(A2: iid)} \\
&= \sum_{i=1}^{n} W_i^2 \, \sigma_u^2 && \text{(A5: homoskedastic)} \\
&= \sigma_u^2 \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{\sum_{i'=1}^{n} (X_{i'} - \bar{X})^2} \right)^2 \\
&= \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
\end{aligned}
$$
Variance of OLS Estimators
Theorem (Variance of OLS Estimators)

Given OLS Assumptions I–V (the Gauss-Markov assumptions):

$$
V[\hat{\beta}_1 \mid X] = \frac{\sigma_u^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
$$

$$
V[\hat{\beta}_0 \mid X] = \sigma_u^2 \left\{ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right\}
$$

where $V[u \mid X] = \sigma_u^2$ (the error variance).
Understanding the sampling variance
$$
V[\hat{\beta}_1 \mid X_1, \ldots, X_n] = \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
$$

What drives the sampling variability of the OLS estimator?
- The higher the variance of $Y_i \mid X_i$, the higher the sampling variance.
- The lower the variance of $X_i$, the higher the sampling variance.
- As we increase n, the denominator grows while the numerator is fixed, so the sampling variance shrinks to 0.
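A quick way to convince yourself of this formula is to hold the design X fixed (condition on X) and simulate: the variance of the simulated slopes should match $\sigma_u^2 / \sum_i (X_i - \bar{X})^2$. A pure-Python sketch (the DGP values are illustrative, not from the slides):

```python
import random
import statistics

random.seed(0)
beta0, beta1, sigma_u = 1.0, 2.0, 1.5  # assumed values for the simulation
x = [random.uniform(0, 10) for _ in range(50)]  # fixed design: we condition on X
xbar = sum(x) / len(x)
ssx = sum((xi - xbar) ** 2 for xi in x)

# Analytic sampling variance: V[beta1_hat | X] = sigma_u^2 / sum (Xi - Xbar)^2
analytic = sigma_u ** 2 / ssx

def slope_once():
    # Redraw only the errors, keeping X fixed
    y = [beta0 + beta1 * xi + random.gauss(0, sigma_u) for xi in x]
    return sum((xi - xbar) * yi for xi, yi in zip(x, y)) / ssx

slopes = [slope_once() for _ in range(20000)]
simulated = statistics.pvariance(slopes)
print(analytic, simulated)  # the two numbers should be close
```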
Variance in X Reduces Standard Errors
[Figure: two scatterplots of y against x over the same ranges, differing only in the spread of the x values; the slope is more precisely estimated when X has more variance.]
Estimating the Variance of OLS Estimators
How can we estimate the unobserved error variance $\operatorname{Var}[u] = \sigma_u^2$?

We can derive an estimator based on the residuals:

$$
\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i
$$

Recall: the errors $u_i$ are NOT the same as the residuals $\hat{u}_i$.

Intuitively, the scatter of the residuals around the fitted regression line should reflect the unseen scatter about the true population regression line.

We can measure scatter with the mean squared deviation:

$$
\operatorname{MSD}(\hat{u}) \equiv \frac{1}{n} \sum_{i=1}^{n} (\hat{u}_i - \bar{\hat{u}})^2 = \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i^2
$$
Estimating the Variance of OLS Estimators
By construction, the regression line is closer since it is drawn to fit the sample we observe.

Specifically, the regression line is drawn so as to minimize the sum of the squares of the distances between it and the observations.

So the spread of the residuals $\operatorname{MSD}(\hat{u})$ will slightly underestimate the error variance $\operatorname{Var}[u] = \sigma_u^2$ on average.

In fact, we can show that with a single regressor X we have:

$$
E[\operatorname{MSD}(\hat{u})] = \frac{n-2}{n}\,\sigma_u^2 \quad \text{(degrees of freedom adjustment)}
$$

Thus, an unbiased estimator for the error variance is:

$$
\hat{\sigma}_u^2 = \frac{n}{n-2}\,\operatorname{MSD}(\hat{u}) = \frac{n}{n-2} \cdot \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2
$$

We plug this estimate into the variance estimators for $\hat{\beta}_0$ and $\hat{\beta}_1$.
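The degrees-of-freedom adjustment can itself be checked by simulation: dividing the sum of squared residuals by n − 2 rather than n makes the estimator unbiased for $\sigma_u^2$. A pure-Python sketch (DGP values are illustrative assumptions):

```python
import random

random.seed(1)
beta0, beta1, sigma_u = 1.0, 2.0, 1.5  # assumed values; sigma_u^2 = 2.25
n = 30
x = [random.uniform(0, 10) for _ in range(n)]
xbar = sum(x) / n
ssx = sum((xi - xbar) ** 2 for xi in x)

def sigma2_hat():
    y = [beta0 + beta1 * xi + random.gauss(0, sigma_u) for xi in x]
    ybar = sum(y) / n
    b1 = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / ssx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    return sum(u ** 2 for u in resid) / (n - 2)  # divide by n - 2, not n

estimates = [sigma2_hat() for _ in range(20000)]
print(sum(estimates) / len(estimates))  # close to sigma_u^2 = 2.25
```

Replacing `n - 2` with `n` in the return line reproduces the downward bias described above.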
Where are we?
Under Assumptions 1–5, we know that

$$
\hat{\beta}_1 \sim \;?\left(\beta_1,\; \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}\right)
$$

Now we know the mean and sampling variance of the sampling distribution.

How does this compare to other estimators for the population slope?
Where are we?
[Figure: hierarchy of OLS assumptions. Each additional assumption buys a stronger property:
- Variation in X → identification / data description
- + random sampling, linearity in parameters, zero conditional mean → unbiasedness, consistency
- + homoskedasticity → Gauss-Markov (BLUE), asymptotic inference (z and χ²)
- + normality of errors → classical LM (BUE), small-sample inference (t and F)]
OLS is BLUE :(
Theorem (Gauss-Markov)

Given OLS Assumptions I–V, the OLS estimator is BLUE, i.e. the

1. Best: lowest variance in class
2. Linear: among linear estimators
3. Unbiased: among linear unbiased estimators
4. Estimator.

A linear estimator is one that can be written as $\hat{\beta} = Wy$.

Assumptions 1–5 are called the “Gauss-Markov assumptions”.

The result fails to hold when the assumptions are violated!
Gauss-Markov Theorem
[Figure: diagram of the set of all estimators, with the linear and the unbiased estimators as subsets; OLS sits where they overlap. Source: Regression Analysis Tutorial, Econometrics Laboratory, University of California at Berkeley, 22–26 March 1999.]

OLS is efficient in the class of unbiased, linear estimators.

OLS is BLUE: best linear unbiased estimator.
Where are we?
Under Assumptions 1–5, we know that

$$
\hat{\beta}_1 \sim \;?\left(\beta_1,\; \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}\right)
$$

And we know that $\sigma_u^2 / \sum_{i=1}^{n}(X_i - \bar{X})^2$ is the lowest variance of any linear unbiased estimator of $\beta_1$.

What about the last question mark? What’s the form of the distribution?
Large-sample distribution of OLS estimators

Remember that, conditional on X, the OLS estimator is a sum of independent random variables:

$$
\hat{\beta}_1 = \sum_{i=1}^{n} W_i Y_i
$$

Mantra of the Central Limit Theorem: “the sums and means of random variables tend to be Normally distributed in large samples.”

True here as well, so we know that in large samples:

$$
\frac{\hat{\beta}_1 - \beta_1}{SE[\hat{\beta}_1]} \sim N(0, 1)
$$

We can also replace the SE with an estimate:

$$
\frac{\hat{\beta}_1 - \beta_1}{\widehat{SE}[\hat{\beta}_1]} \sim N(0, 1)
$$
Where are we?
Under Assumptions 1–5 and in large samples, we know that

$$
\hat{\beta}_1 \sim N\left(\beta_1,\; \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}\right)
$$
Sampling distribution in small samples
What if we have a small sample? What can we do then?
Can’t get something for nothing, but we can make progress if we make another assumption:
1 Linearity
2 Random (iid) sample
3 Variation in Xi
4 Zero conditional mean of the errors
5 Homoskedasticity
6 Errors are conditionally Normal
OLS Assumptions VI
Assumption (VI. Normality)

The population error term is independent of the explanatory variable, $u \perp\!\!\!\perp X$, and is normally distributed with mean zero and variance $\sigma_u^2$:

$$
u \sim N(0, \sigma_u^2), \quad \text{which implies} \quad Y \mid X \sim N(\beta_0 + \beta_1 X,\; \sigma_u^2)
$$

Note: this also implies homoskedasticity and zero conditional mean.

- Together, Assumptions I–VI are the classical linear model (CLM) assumptions.
- The CLM assumptions imply that OLS is BUE (i.e. minimum variance among all linear or non-linear unbiased estimators).
- Non-normality of the errors is a serious concern in small samples. We can partially check this assumption by looking at the residuals (more in coming weeks).
- Variable transformations can help to come closer to normality.
- Reminder: we don’t need the normality assumption in large samples.
Sampling distribution of OLS slope
If $Y_i$ given $X_i$ is distributed $N(\beta_0 + \beta_1 X_i, \sigma_u^2)$, then we have the following at any sample size:

$$
\frac{\hat{\beta}_1 - \beta_1}{SE[\hat{\beta}_1]} \sim N(0, 1)
$$

Furthermore, if we replace the true standard error with the estimated standard error, then we get the following:

$$
\frac{\hat{\beta}_1 - \beta_1}{\widehat{SE}[\hat{\beta}_1]} \sim t_{n-2}
$$

The standardized coefficient follows a t distribution with n − 2 degrees of freedom. We take off an extra degree of freedom because we had to estimate one more parameter than just the sample mean.

All of this depends on Normal errors!
Where are we?
Under Assumptions 1–5 and in large samples, we know that

$$
\hat{\beta}_1 \sim N\left(\beta_1,\; \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}\right)
$$

Under Assumptions 1–6 and in any sample, we know that

$$
\frac{\hat{\beta}_1 - \beta_1}{\widehat{SE}[\hat{\beta}_1]} \sim t_{n-2}
$$
Hierarchy of OLS Assumptions
[Figure: hierarchy of OLS assumptions. Each additional assumption buys a stronger property:
- Variation in X → identification / data description
- + random sampling, linearity in parameters, zero conditional mean → unbiasedness, consistency
- + homoskedasticity → Gauss-Markov (BLUE), asymptotic inference (z and χ²)
- + normality of errors → classical LM (BUE), small-sample inference (t and F)]
Regression as parametric modeling

Let’s summarize the parametric view we have taken thus far.

Gauss-Markov assumptions:
- (A1) linearity, (A2) i.i.d. sample, (A3) variation in X, (A4) zero conditional mean error, (A5) homoskedasticity.
- Basically, assume the model is right.

OLS is BLUE; add (A6) normality of the errors and we get small-sample SEs and BUE.

What is the basic approach here?
- A1 defines a linear model for the outcome: $Y_i = \beta_0 + \beta_1 X_i + u_i$
- A2 and A4 let us write the CEF as a function of $X_i$ alone: $E[Y_i \mid X_i] = \mu_i = \beta_0 + \beta_1 X_i$
- A5 and A6 define a probabilistic model for the conditional distribution: $Y_i \mid X_i \sim N(\mu_i, \sigma^2)$
- A3 covers the edge case in which the βs are indistinguishable.
Agnostic views on regression
These assumptions assume we know a lot about how $Y_i$ is ‘generated’.

Justifications for using OLS (like BLUE/BUE) often invoke these assumptions, which are unlikely to hold exactly.

Alternative: take an agnostic view on regression.
- Use OLS without believing these assumptions.
- Lean on two things: the A2 i.i.d. sampling assumption and asymptotics (large-sample properties).

Lose the distributional assumptions and focus on approximating the best linear predictor.

If the true CEF happens to be linear, the best linear predictor is it.
Unbiasedness Result
One of the results most people know is that OLS is unbiased, butunbiased for what?
It is unbiased for the CEF under the assumption that the model iscorrectly specified.
However, this could be a quite poor approximation to the true CEF ifthere is a great deal of non-linearity.
We will often use OLS as a means to approximate the CEF, but don’t forget that it is just an approximation!
We will return in a few weeks to how you diagnose this approximation.
Pedagogical Note
For now we are going to move forward with the classical worldview, and we will return to some alternative approaches later in the semester once we are comfortable with the matrix representation of regression.

This will lead to techniques like robust standard errors, which don’t rely on the assumption of homoskedasticity (but have other tradeoffs!).
For now, just remember that regression is a linear approximation tothe CEF!
We Covered
Sampling Variance
Gauss Markov
Large Sample and Small Sample Properties
Next Time: Inference
Where We’ve Been and Where We’re Going...
Last Week
- hypothesis testing
- what is regression

This Week
- mechanics and properties of simple linear regression
- inference and measures of model fit
- confidence intervals for regression
- goodness of fit

Next Week
- mechanics with two regressors
- omitted variables, multicollinearity

Long Run
- probability → inference → regression → causal inference
1 Mechanics of OLS
2 Classical Perspective (Part 1, Unbiasedness)
- Sampling Distributions
- Classical Assumptions 1–4

3 Classical Perspective: Variance
- Sampling Variance
- Gauss-Markov
- Large Samples
- Small Samples
- Agnostic Perspective

4 Inference
- Hypothesis Tests
- Confidence Intervals
- Goodness of fit
- Interpretation

5 Non-linearities
- Log Transformations
- Fun With Logs
- LOESS
Where are we?
Under Assumptions 1–5 and in large samples, we know that

$$
\hat{\beta}_1 \sim N\left(\beta_1,\; \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}\right)
$$

Under Assumptions 1–6 and in any sample, we know that

$$
\frac{\hat{\beta}_1 - \beta_1}{\widehat{SE}[\hat{\beta}_1]} \sim t_{n-2}
$$
Null and alternative hypotheses review
- Null: $H_0: \beta_1 = 0$
  - The null is the straw man we want to knock down.
  - With regression, this is almost always a null of no relationship.
- Alternative: $H_a: \beta_1 \ne 0$
  - The claim we want to test.
  - We could do a one-sided test, but you shouldn’t.

Notice these are statements about the population parameters, not the OLS estimates.
Test statistic
Under the null of $H_0: \beta_1 = c$, we can use the following familiar test statistic:

$$
T = \frac{\hat{\beta}_1 - c}{\widehat{SE}[\hat{\beta}_1]}
$$

Under the null hypothesis:
- in large samples: $T \sim N(0, 1)$
- in any size sample with normal errors: $T \sim t_{n-2}$
- it is conservative to use $t_{n-2}$ anyway, since $t_{n-2}$ is approximately normal in large samples.

Thus, under the null, we know the distribution of T and can use that to formulate a rejection region and calculate p-values.

By default, R shows you the test statistic for $\beta_1 = 0$ and uses the t distribution.
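These quantities are easy to compute by hand. A pure-Python sketch of the slope, its standard error, the test statistic for $H_0: \beta_1 = c$, and a two-sided p-value, using the large-sample N(0, 1) approximation since the standard library has no t distribution (all data-generating values here are illustrative):

```python
import random
from statistics import NormalDist

random.seed(2)
beta0, beta1, sigma_u = 1.0, 0.3, 2.0  # assumed values for the simulated data
n = 200
x = [random.uniform(0, 10) for _ in range(n)]
y = [beta0 + beta1 * xi + random.gauss(0, sigma_u) for xi in x]

xbar, ybar = sum(x) / n, sum(y) / n
ssx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / ssx
b0 = ybar - b1 * xbar
sigma2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
se_b1 = (sigma2 / ssx) ** 0.5

c = 0                                   # H0: beta1 = 0
T = (b1 - c) / se_b1                    # test statistic
p = 2 * (1 - NormalDist().cdf(abs(T)))  # two-sided p-value, N(0,1) approximation
print(T, p)
```

With n = 200 the normal approximation and the exact $t_{n-2}$ reference distribution give essentially the same p-value.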
Rejection region

Choose a level of the test, α, and find rejection regions that correspond to that value under the null distribution:

$$
P(-t_{\alpha/2,\,n-2} < T < t_{\alpha/2,\,n-2}) = 1 - \alpha
$$

This is exactly the same as with sample means and sample differences in means, except that the degrees of freedom on the t distribution have changed.

[Figure: standard normal density with rejection regions in the two tails beyond t = ±1.96, each of area 0.025, and the retention region between them.]
p-value
The interpretation of the p-value is the same: the probability of seeing a test statistic at least this extreme if the null hypothesis were true.

Mathematically:

$$
P\left( \left| \frac{\hat{\beta}_1 - c}{\widehat{SE}[\hat{\beta}_1]} \right| \ge |T_{obs}| \right)
$$

If the p-value is less than α, we would reject the null at the α level.
Confidence intervals

Very similar to the approach with sample means. By the sampling distribution of the OLS estimator, we know that we can find t-values such that:

$$
P\left( -t_{\alpha/2,\,n-2} \le \frac{\hat{\beta}_1 - \beta_1}{\widehat{SE}[\hat{\beta}_1]} \le t_{\alpha/2,\,n-2} \right) = 1 - \alpha
$$

If we rearrange this as before, we can get an expression for confidence intervals:

$$
P\left( \hat{\beta}_1 - t_{\alpha/2,\,n-2}\,\widehat{SE}[\hat{\beta}_1] \le \beta_1 \le \hat{\beta}_1 + t_{\alpha/2,\,n-2}\,\widehat{SE}[\hat{\beta}_1] \right) = 1 - \alpha
$$

Thus, we can write the confidence intervals as:

$$
\hat{\beta}_1 \pm t_{\alpha/2,\,n-2}\,\widehat{SE}[\hat{\beta}_1]
$$

We can derive these for the intercept as well:

$$
\hat{\beta}_0 \pm t_{\alpha/2,\,n-2}\,\widehat{SE}[\hat{\beta}_0]
$$
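The coverage property of these intervals can be checked by simulation: across repeated samples, roughly 95% of the intervals should contain the true β1. A pure-Python sketch using the normal approximation to $t_{n-2}$ (adequate at n = 100; the DGP values are illustrative assumptions):

```python
import random
from statistics import NormalDist

random.seed(3)
beta0, beta1, sigma_u = 1.0, 2.0, 1.5  # assumed "true" values for the simulation
n = 100
z = NormalDist().inv_cdf(0.975)        # about 1.96; normal approximation to t_{n-2}

def ci_covers():
    # Draw a fresh sample, fit OLS, and check whether the 95% CI contains beta1
    x = [random.uniform(0, 10) for _ in range(n)]
    y = [beta0 + beta1 * xi + random.gauss(0, sigma_u) for xi in x]
    xbar, ybar = sum(x) / n, sum(y) / n
    ssx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / ssx
    b0 = ybar - b1 * xbar
    sigma2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_b1 = (sigma2 / ssx) ** 0.5
    return b1 - z * se_b1 <= beta1 <= b1 + z * se_b1

coverage = sum(ci_covers() for _ in range(5000)) / 5000
print(coverage)  # should be close to 0.95
```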
CIs Simulation Example
Returning to our simulation example, we can simulate the sampling distributions of the 95% confidence interval estimates for $\hat{\beta}_1$ and $\hat{\beta}_0$.

[Figure: a simulated scatterplot with its fitted line, alongside the sampling distributions of the 95% interval estimates for $\hat{\beta}_0$ and $\hat{\beta}_1$ across repeated samples. Source: Gov2000: Quantitative Methodology for Political Science I.]
CIs Simulation Example
When we repeat the process over and over, we expect 95% of the confidence intervals to contain the true parameters. Note that, in a given sample, one CI may cover its true value and the other may not.

[Figure: stacked 95% interval estimates for $\hat{\beta}_0$ and $\hat{\beta}_1$ across repeated samples. Source: Gov2000: Quantitative Methodology for Political Science I.]
Prediction error
How do we judge how well a line fits the data?

One way is to find out how much better we do at predicting Y once we include X in the regression model.

Prediction errors without X: the best prediction is the mean, so our squared errors, or the total sum of squares ($SS_{tot}$), would be:

$$
SS_{tot} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2
$$

Once we have estimated our model, we have new prediction errors, which are just the sum of the squared residuals, or $SS_{res}$:

$$
SS_{res} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
$$
Sum of Squares
1 2 3 4 5 6 7 8
67
89
1011
12
Total Prediction Errors
Log Settler Mortality
Log
GDP
per c
apita
gro
wth
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 85 / 127
Sum of Squares

[Scatterplot: Residuals. Log Settler Mortality (1 to 8) against Log GDP per capita growth (6 to 12), showing deviations from the fitted line]
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 85 / 127
R-square

By definition, the residuals have to be smaller than the deviations from the mean, so we might ask the following: how much lower is the SS_res compared to the SS_tot?

We quantify this question with the coefficient of determination, or R². This is the following:

R² = (SS_tot − SS_res) / SS_tot = 1 − SS_res / SS_tot

This is the fraction of the total prediction error eliminated by providing information on X.

Alternatively, this is the fraction of the variation in Y that is "explained by" X.

R² = 0 means no relationship

R² = 1 implies perfect linear fit
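These quantities are easy to compute by hand. A minimal sketch on simulated data (the data-generating values are made up for illustration):

```python
import numpy as np

# Simulated data; the intercept, slope, and noise scale are made up.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 200)

# Least squares fit
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

ss_tot = np.sum((y - y.mean()) ** 2)  # prediction errors using only the mean
ss_res = np.sum((y - yhat) ** 2)      # prediction errors using the fitted line
r2 = 1 - ss_res / ss_tot
```

Since the fitted line can never do worse than the mean, ss_res ≤ ss_tot and r2 falls between 0 and 1.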
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 86 / 127
Is R-squared useful?

[Scatterplot of simulated data: x from 0 to 10, y from 0 to 15, with a fitted line; R-squared = 0.66]
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 87 / 127
Is R-squared useful?

[Scatterplot of simulated data: x from 0 to 10, y from 0 to 10, with a fitted line; R-squared = 0.96]
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 87 / 127
Is R-squared useful?

[Four scatterplots of Y (4 to 12) against X (5 to 15) with very different patterns of dependence, illustrating that qualitatively different data can share the same fit statistics]
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 87 / 127
Interpreting a Regression

Let's have a quick chat about interpretation.
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 88 / 127
State Legislators and African American Population

Interpretations of increasing quality:
> summary(lm(beo ~ bpop, data = D))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.31489 0.32775 -4.012 0.000264 ***
bpop 0.35848 0.02519 14.232 < 2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Residual standard error: 1.317 on 39 degrees of freedom
Multiple R-squared: 0.8385, Adjusted R-squared: 0.8344
F-statistic: 202.6 on 1 and 39 DF, p-value: < 2.2e-16
"In states where an additional .01 proportion of the population is African American, we observe on average .035 proportion more African American state legislators (between .03 and .04 with 95% confidence)."

(Still not perfect; the best interpretation will be subject-matter specific. But this one is fairly clear that it is non-causal, and it gives uncertainty.)
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 89 / 127
Ground Rules: Interpretation of the Slope

I almost didn't include the last example in the slides. It is hard to give ground rules that cover all cases. Regressions are a part of marshaling evidence in an argument, which makes them naturally specific to context.

1. Give a short but precise interpretation of the association, using interpretable language and units.

2. If the association has a causal interpretation, explain why; otherwise do not imply a causal interpretation.

3. Provide a meaningful sense of uncertainty.

4. Indicate the practical significance of the finding for your argument.
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 90 / 127
Goal Check: Understand lm() Output
Call:
lm(formula = sr ~ pop15, data = LifeCycleSavings)
Residuals:
Min 1Q Median 3Q Max
-8.637 -2.374 0.349 2.022 11.155
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.49660 2.27972 7.675 6.85e-10 ***
pop15 -0.22302 0.06291 -3.545 0.000887 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.03 on 48 degrees of freedom
Multiple R-squared: 0.2075, Adjusted R-squared: 0.191

F-statistic: 12.57 on 1 and 48 DF, p-value: 0.0008866

Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 91 / 127
We Covered
Hypothesis tests
Confidence intervals
Goodness of fit measures
Next Time: Non-linearities
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 92 / 127
Where We’ve Been and Where We’re Going...
Last Week
- hypothesis testing
- what is regression

This Week
- mechanics and properties of simple linear regression
- inference and measures of model fit
- confidence intervals for regression
- goodness of fit

Next Week
- mechanics with two regressors
- omitted variables, multicollinearity

Long Run
- probability → inference → regression → causal inference
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 93 / 127
1 Mechanics of OLS

2 Classical Perspective (Part 1, Unbiasedness)
- Sampling Distributions
- Classical Assumptions 1–4

3 Classical Perspective: Variance
- Sampling Variance
- Gauss-Markov
- Large Samples
- Small Samples
- Agnostic Perspective

4 Inference
- Hypothesis Tests
- Confidence Intervals
- Goodness of fit
- Interpretation

5 Non-linearities
- Log Transformations
- Fun With Logs
- LOESS
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 94 / 127
Non-linear CEFs

When we say that CEFs are linear with regression, we mean linear in the parameters; by including transformations of our variables we can make non-linear shapes of pre-specified functional forms.

Many of these non-linear transformations are made by creating multiple variables out of a single X and so will have to wait for future weeks.

The function log(·) is one common transformation that has only one parameter.

This is particularly useful for positive and right-skewed variables.
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 95 / 127
Why does everyone keep logging stuff??

Logs linearize exponential growth.

Exponential: grows by a fixed percent.
Linear: grows by a fixed amount.

Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 96 / 127
How? Let's look.

First, here's a graph showing exponential growth. We're going to use y = 2^x, but any other exponent will work.

[Plot: x from 0 to 10 against y = 2^x, which takes the values 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 and curves sharply upward]
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 97 / 127
What happens when we take the log of y?

log y = z means e^z = y. Taking natural logs of the y = 2^x values: log 1 = 0, log 2 = 0.69, log 4 = 1.39, log 8 = 2.08, and so on up to log 1024 = 6.93.

[Plot: the same x from 0 to 10 against z = log(y), which now rises in a straight line through 0, 0.69, 1.39, 2.08, 2.77, 3.47, 4.16, 4.85, 5.55, 6.24, 6.93]
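The straight line is no accident: log(2^x) = x · log(2), which is exactly linear in x. A minimal check:

```python
import numpy as np

x = np.arange(11)
y = 2.0 ** x          # 1, 2, 4, ..., 1024
z = np.log(y)         # natural log of each value

# z = log(2^x) = x * log(2), so z is exactly linear in x
assert np.allclose(z, x * np.log(2))
```

The values of z match the slide: z[1] ≈ 0.69, z[2] ≈ 1.39, up to z[10] ≈ 6.93.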
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 98 / 127
Interpretation

The log transformation changes the interpretation of β1:

Regress log(Y) on X → β1 approximates the percent increase in our prediction of Y associated with a one-unit increase in X.

Regress Y on log(X) → β1 approximates the increase in Y associated with a percent increase in X.

Note that these approximations work only for small increments.

In particular, they do not work when X is a discrete random variable.
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 99 / 127
Example from the American War Library

[Scatterplot: X = numbers of American soldiers killed in action against Y = numbers of American soldiers wounded in action, both on level scales up to roughly 500,000–600,000; labeled points range from small engagements (e.g., Barbary Wars, Grenada) up to the Civil War and World War II]

β1 = 1.23 → One additional soldier killed predicts 1.23 additional soldiers wounded
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 100 / 127
Wounded (Scale in Levels)

[Dot plot of the number of wounded for each conflict on a level scale from 0 to 600,000; the largest wars dominate the axis and the many smaller conflicts are crowded near zero]
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 101 / 127
Wounded (Logarithmic Scale)

[The same dot plot on a logarithmic scale: log(number of wounded) from 2 to 12, with a secondary axis showing 10 to 1,000,000 wounded]
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 102 / 127
Regression: Log-Level

[Scatterplot: X = numbers of American soldiers killed in action (level scale) against Y = log(numbers of American soldiers wounded in action), from 2 to 12, with labeled conflicts]

β1 = 0.0000237 → One additional soldier killed predicts a 0.0023 percent increase in the number of soldiers wounded
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 103 / 127
Regression: Log-Log

[Scatterplot: X = log(numbers of American soldiers killed in action) against Y = log(numbers of American soldiers wounded in action), both from 2 to 12, with labeled conflicts]

β1 = 0.797 → A one percent increase in deaths predicts a 0.797 percent increase in the wounded
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 104 / 127
Four Most Commonly Used Models

| Model | Equation | β1 Interpretation |
|---|---|---|
| Level-Level | Y = β0 + β1·X | ∆Y = β1·∆X |
| Log-Level | log(Y) = β0 + β1·X | %∆Y = 100·β1·∆X |
| Level-Log | Y = β0 + β1·log(X) | ∆Y = (β1/100)·%∆X |
| Log-Log | log(Y) = β0 + β1·log(X) | %∆Y = β1·%∆X |
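The log-level row can be checked numerically. In that model, a one-unit step in X changes Y by exactly 100·(exp(β1) − 1) percent, which the 100·β1 rule approximates when β1 is small. A sketch with made-up coefficients:

```python
import math

# Log-level model log(Y) = b0 + b1*X with made-up coefficients.
b0, b1 = 1.0, 0.05
y_at_3 = math.exp(b0 + b1 * 3)
y_at_4 = math.exp(b0 + b1 * 4)

exact_pct = 100 * (y_at_4 - y_at_3) / y_at_3  # exactly 100*(exp(b1) - 1)
approx_pct = 100 * b1                         # the rule of thumb from the table
```

With b1 = 0.05 the two differ by only about a tenth of a percentage point; the gap grows quickly as b1 grows.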
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 105 / 127
Why Does This Approximation Work?

A useful thing to know is that for small x,

log(1 + x) ≈ x
exp(x) ≈ 1 + x

This can be derived from a series expansion of the log function. Numerically, when |x| ≤ .1, the approximation is within 0.005.
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 106 / 127
Why Does This Approximation Work?

Take two numbers a > b > 0. The percentage difference between a and b is

p = 100 · (a − b) / b

We can rewrite this as

a/b = 1 + p/100

Taking natural logs,

log(a) − log(b) = log(1 + p/100)

Applying our approximation and multiplying by 100 we find

p ≈ 100 (log(a) − log(b))
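The derivation above can be verified on concrete numbers (a and b here are made up):

```python
import math

# Percent difference between a = 103 and b = 100
a, b = 103.0, 100.0
p_exact = 100 * (a - b) / b                 # 3.0 percent
p_log = 100 * (math.log(a) - math.log(b))   # log-difference approximation
```

The log difference gives roughly 2.96 percent against the exact 3.0 percent, so for small gaps the two are nearly interchangeable.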
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 107 / 127
Be Careful: Log-Level with binary X

Assume we have log(Y) = β0 + β1·X, where X is binary with values 1 or 0, and assume β1 > .2. What is the problem with saying that a one-unit increase in X is associated with a β1 · 100 percent change in Y?

The log approximation is inaccurate for large changes like going from X = 0 to X = 1. Instead, the percent change in Y when X goes from 0 to 1 needs to be computed using:

100 (Y_{X=1} − Y_{X=0}) / Y_{X=0} = 100 ((Y_{X=1}/Y_{X=0}) − 1) = 100 (exp(β1) − 1)

Recall: log(Y_{X=1}) − log(Y_{X=0}) = log(Y_{X=1}/Y_{X=0}) = β1.

A one-unit change in X (i.e., going from 0 to 1) is associated with a 100(exp(β1) − 1) percent increase in Y.
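A quick numerical illustration of how far the naive reading drifts (β1 = 0.3 is a made-up value above the slide's .2 threshold):

```python
import math

b1 = 0.3  # made-up coefficient above the 0.2 threshold from the slide
naive_pct = 100 * b1                   # the small-change approximation: 30
exact_pct = 100 * (math.exp(b1) - 1)   # correct percent change, about 35
```

At β1 = 0.3 the naive reading says 30 percent while the exact change is about 35 percent, a meaningful gap in any substantive discussion.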
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 108 / 127
Interpreting a Logged Outcome

On the last few slides, there was a bit that was a little dodgy.

When we log the outcome, we are no longer approximating E[Y|X]; we are approximating E[log(Y)|X].

Jensen's inequality gives us information on this relation: f(E[X]) ≤ E[f(X)] for any convex function f(·).

In practice, this means we are no longer characterizing the expectation of Y, and it is technically inaccurate to talk about Y "on average" changing in a certain way.

What are we characterizing? The geometric mean.
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 109 / 127
Geometric Mean

exp(E[log(Y)]) = exp( (1/N) Σ_{i=1}^{N} log(Y_i) )
             = exp( (1/N) log( ∏_{i=1}^{N} Y_i ) )
             = exp( log( ( ∏_{i=1}^{N} Y_i )^{1/N} ) )
             = ( ∏_{i=1}^{N} Y_i )^{1/N}
             = Geometric Mean(Y)
The geometric mean is a robust measure of central tendency.
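The identity exp(mean of logs) = (product)^(1/N) is easy to confirm on a small example (the values of y are made up):

```python
import numpy as np

y = np.array([1.0, 10.0, 100.0])        # made-up positive outcomes
via_logs = np.exp(np.mean(np.log(y)))   # exp of the mean of logs
direct = np.prod(y) ** (1 / len(y))     # (product of Y_i)^(1/N)
```

Both routes give 10, even though the arithmetic mean of these values is 37: the geometric mean is pulled far less by the large outcome.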
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 110 / 127
Application
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 111 / 127
Core Idea

Classic approach: E(log(Y) | X) [the mean of log offspring income Y given parent income X] = β0 + β1 · log(X), where β1 is the intergenerational elasticity (IGE) and log(X) is log parent income.

MG proposal: log(E(Y | X)) [the log of mean offspring income Y given parent income X] = α0 + α1 · log(X), where α1 is the intergenerational elasticity of the expectation (IGEE).
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 112 / 127
Geometric Mean is Closer to the Median Than the Mean

[Four density plots of offspring family income (thousands of 2016 dollars, 0 to 250), one for each quartile of childhood income. In every subgroup the exponentiated mean of log sits near the median, while the mean sits noticeably higher; a few families fall beyond the axis (1, 2, 8, and 11 families in quartiles 1 through 4, respectively).]
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 113 / 127
Our Response
Images from this section are from this paper or earlier drafts of it.
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 114 / 127
Two Implicit Choices

(1) Summary Statistics for the Conditional Distribution (gets you down to one number per value of x)

and

(2) Assume or Learn a Functional Form (potentially simplifies the set of summary statistics to a single number)
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 115 / 127
Visualizing the MG Proposal

[Scatterplot: parent income ($0k to $200k) against offspring income ($0k to $300k), comparing the classic intergenerational elasticity fit (through geometric means) with the Mitnik and Grusky proposal fit (through arithmetic means); the parent income density and its median are shown along the x-axis]
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 116 / 127
Single Summary Statistics Necessarily Mask Information

[Three income density plots (thousands of dollars, 0 to 150): 1. near equality; 2. distributed; 3. a few high earners. Each collapses to a single summary statistic.]
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 117 / 127
The Mean is a Normative Choice

[Four utility-of-income curves over income 0 to 100 (thousands of dollars): A. Log (very sensitive to low incomes; goes to negative infinity when income = 0); B. Linear (more income equally valuable at top and bottom); C. Inverse hyperbolic sine (like log, but well-behaved when income = 0); D. Log(Income + $5,000) (concavity between linear and log).]
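Panel C's inverse hyperbolic sine, arcsinh(x) = log(x + sqrt(x² + 1)), can be seen behaving like log while staying defined at zero (a minimal sketch with made-up income values):

```python
import numpy as np

income = np.array([0.0, 1.0, 10.0, 100.0])  # thousands of dollars, made up
ihs = np.arcsinh(income)                    # log(x + sqrt(x^2 + 1))

# Finite at zero income, unlike log; approximately log(2x) for large x
```

At income = 0 the transform is exactly 0, and at income = 100 it is within rounding of log(200), matching the "like log, but well-behaved at zero" description.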
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 118 / 127
A New Proposal

[Scatterplot: parent income ($0k to $200k) against offspring income ($0k to $300k), with fitted curves for the 10th, 25th, 50th, 75th, and 90th percentiles of offspring income; the parent income density and its median are shown along the x-axis]
Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 119 / 127
![Page 124: Week 5: Simple Linear Regression · Week 5: Simple Linear Regression Brandon Stewart1 Princeton September 28-October 2, 2020 1These slides are heavily in uenced by Matt Blackwell,](https://reader034.fdocuments.in/reader034/viewer/2022052616/60a7c4ec65ae96344821055e/html5/thumbnails/124.jpg)
Single Number Summaries
A key selling point of the conventional IGE, the MG proposal, and regression more broadly is the single-number summary.

Any such summary necessitates a loss of information.

Even with more complex functional forms, we can always calculate such a summary. For instance, here the median is (on average) $4k higher when parent income is $10k higher.

We obtain this by predicting the 50th percentile of offspring income at each parent income, then again after adding $10k to each parent income, and averaging the difference.

If you are willing to commit to a quantity of interest, you can usually estimate it directly.

At their best, single-number summaries let the reader approximate a variety of quantities they are interested in. At their worst, they are a way for authors to abdicate responsibility for choosing a clear quantity of interest.
Broader Implications (Lee, Lundberg and Stewart)
[Figure, top panel: "Traditional Approach to Visualize Covid-19 Death Rates in US Counties" — scatterplot of Covid deaths per 100,000 residents (0–400) against proportion of non-Hispanic Blacks in the county (0.00–0.75).]
[Figure, bottom panel: "Covid-19 Death Rates in US Counties" — same axes (0–300; 0.0–0.8), with fitted conditional percentile curves: 5th, 10th, 25th, 50th, 75th, 90th, 95th.]
Covid data from NYTimes GitHub as of 2020/09/07; demographic data from American Community Survey 2014–2018 5-year estimates.
1 Mechanics of OLS
2 Classical Perspective (Part 1, Unbiasedness)
    Sampling Distributions
    Classical Assumptions 1–4
3 Classical Perspective: Variance
    Sampling Variance
    Gauss-Markov
    Large Samples
    Small Samples
    Agnostic Perspective
4 Inference
    Hypothesis Tests
    Confidence Intervals
    Goodness of fit
    Interpretation
5 Non-linearities
    Log Transformations
    Fun With Logs
    LOESS
So what is ggplot2 doing?
[Figure: ggplot2 scatterplot of Log GDP per capita growth (y-axis, 6–10) against Log Settler Mortality (x-axis, 2–8).]
LOESS
We can combine the nonparametric kernel idea of using only local data with a parametric model.

Idea: fit a linear regression within each band.

Locally weighted scatterplot smoothing (LOWESS or LOESS):
1 Pick the subset of the data that falls in the interval [x0 − h, x0 + h]
2 Fit a line to this subset of the data (= local linear regression), weighting the points by their distance to x0 using a kernel function
3 Use the fitted regression line to predict E[Y | X = x0]
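The three steps above can be sketched directly in NumPy. This is a minimal illustration, not the exact smoother R's loess or ggplot2 uses; the tricube kernel and the bandwidth h are conventional but assumed choices:

```python
import numpy as np

def loess_at(x0, x, y, h):
    """Predict E[Y | X = x0] by a locally weighted linear fit:
    keep points with |x - x0| <= h, weight them by a tricube kernel."""
    d = np.abs(x - x0)
    w = np.clip(1 - (d / h) ** 3, 0, None) ** 3  # tricube; zero outside band
    X = np.column_stack([np.ones_like(x), x])    # intercept + slope
    Xw = X * w[:, None]
    beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)   # weighted least squares
    return beta[0] + beta[1] * x0                # fitted line evaluated at x0

# usage on simulated data: recover sin(x) at x0 = 5 from noisy points
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + 0.2 * rng.standard_normal(200)
fit = loess_at(5.0, x, y, h=1.0)
```

Sweeping x0 across a grid and connecting the predictions traces out the smooth LOESS curve.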
LOESS Example
We Covered
Interpretation with logged independent and dependent variables
The geometric mean!
This Week in Review
OLS!
Classical regression assumptions!
Inference!
Logs!
Going Deeper:
Aronow and Miller (2019) Foundations of Agnostic Statistics. Cambridge University Press. Chapter 4.
Next week: Linear Regression with Two Variables!