Multivariate Models: Analysis of Variance and Regression Using Dummy Variables
Regression With Dummy Variables, Econ 420 (7/29/2019)
Econometrics
OLS Regression with Dummy Variables
[Textbook figure omitted. Copyright 2009 South-Western/Cengage Learning]
Interpretation of Regression Coefficients with a
Binary Regressor
Female is a dummy variable, which indicates whether a person is female:
Female_i = 1 if female, 0 if male
Consider a regression of wage on a constant and Female:
Wage_i = β₀ + δ₀ Female_i + u_i
For a male, the regression model is:
Wage_i = β₀ + u_i
β₀ is the average wage for male workers.
For a female, the regression model is:
Wage_i = β₀ + δ₀ + u_i
β₀ + δ₀ is the average wage for female workers.
δ₀ is the difference in average wages between female and male workers (how much more females earn relative to males).
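A quick numerical check of this interpretation: in a regression on a constant and a dummy, the OLS intercept equals the base-group mean and the slope equals the difference in group means. A minimal pure-Python sketch with made-up wage numbers (not data from the course):

```python
# Regression of wage on a constant and a Female dummy (hypothetical numbers).
# The OLS slope equals mean(female) - mean(male); the intercept equals mean(male).
wages  = [20.0, 22.0, 18.0, 25.0, 17.0, 19.0, 16.0, 21.0]
female = [0,    0,    0,    0,    1,    1,    1,    1]

n = len(wages)
xbar = sum(female) / n
ybar = sum(wages) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(female, wages)) / \
     sum((x - xbar) ** 2 for x in female)
b0 = ybar - b1 * xbar

mean_male   = sum(y for y, x in zip(wages, female) if x == 0) / female.count(0)
mean_female = sum(y for y, x in zip(wages, female) if x == 1) / female.count(1)

print(b0, b1)  # 21.25 -3.0
assert abs(b0 - mean_male) < 1e-9
assert abs(b1 - (mean_female - mean_male)) < 1e-9
```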
Regression for Females
-> sex = 0
Linear regression                     Number of obs =     12
                                      F(  1,    10) =   2.21
                                      Prob > F      = 0.1683
                                      R-squared     = 0.2868
                                      Root MSE      = .38277

        |             Robust
    gpa |   Coef.   Std. Err.     t    P>|t|   [95% Conf. Interval]
--------+----------------------------------------------------------
  hsgpa |   .869      .585      1.49   0.168     -.435      2.173
  _cons |   .244      2.26      0.11   0.916    -4.785      5.273
Regression for Males
-> sex = 1
Linear regression                     Number of obs =     22
                                      F(  1,    20) =  15.29
                                      Prob > F      = 0.0009
                                      R-squared     = 0.2236
                                      Root MSE      = .35021

        |             Robust
    gpa |   Coef.   Std. Err.     t    P>|t|   [95% Conf. Interval]
--------+----------------------------------------------------------
  hsgpa |   .710      .182      3.91   0.001      .331      1.089
  _cons |   .691      .711      0.97   0.342     -.791      2.173
GPA Example: Regression with Dummy Variables
regress gpa sex, robust
Linear regression                     Number of obs =     34
                                      F(  1,    32) =   1.41
                                      Prob > F      = 0.2440
                                      R-squared     = 0.0443
                                      Root MSE      = .40364

        |             Robust
    gpa |   Coef.   Std. Err.     t    P>|t|   [95% Conf. Interval]
--------+----------------------------------------------------------
    sex |  -.1764     .1486    -1.19   0.244    -.4792      .1264
  _cons |   3.540     .1231    28.75   0.000    3.2889     3.7904

GPA-hat = 3.540 − 0.1764 SEX,  R2 = 0.04
         (0.12)  (0.15)
Interpretation of Regression Coefficients with a
Binary Regressor
Consider a regression of wage on a constant, Female, and education:
Wage_i = β₀ + δ₀ Female_i + β₁ Educ_i + u_i
For a male, the regression model is:
Wage_i = β₀ + β₁ Educ_i + u_i
β₀ is the intercept for male workers.
For a female, the regression model is:
Wage_i = (β₀ + δ₀) + β₁ Educ_i + u_i
β₀ + δ₀ is the intercept for female workers.
δ₀ is the shift in the intercept.
[Textbook figure omitted. Copyright 2009 South-Western/Cengage Learning]
Example: Wage Discrimination
Consider a regression model:
Wage_i = β₀ + δ₀ Female_i + β₁ Educ_i + β₂ Exper_i + u_i
Estimated model:
Wage-hat = −1.57 − 1.81 Female + 0.572 Educ + 0.025 Exper
          (0.72)  (0.26)        (0.049)      (0.012)
δ₀-hat = −1.81
In a simple regression:
Wage-hat = 7.10 − 2.51 Female
          (0.21)  (0.30)
Using Dummy Variables for Multiple Categories
4 groups: married men (MM), married women (MF), single men (SM), and single women (SF).
Regression model:
Log(Wage)-hat = 0.321 + 0.213 MM − 0.198 MF − 0.110 SF + 0.079 Educ + 0.027 Exper
               (0.100) (0.055)    (0.058)    (0.056)    (0.007)      (0.005)
Which one is the base category?
Example: Effects of Physical Attractiveness on Wage
3 groups: below average (BA), above average (AA), and average (A).
Regression model for men:
Log(Wage)-hat = 0.321 − 0.164 BA + 0.016 AA + other factors
               (0.100) (0.046)    (0.033)
Which one is the base category?
Regression model for women:
Log(Wage)-hat = 0.200 − 0.124 BA + 0.035 AA + other factors
               (0.100) (0.066)    (0.049)
Outline
Last Time:
- What is a dummy variable?
- How to interpret coefficients in a regression with a dummy variable(s)?
- Can we show the coefficient on a dummy variable on a graph?
Today: Interaction terms and heteroskedasticity
- Why do we need interaction terms? 3 types of interaction terms
- What are the consequences of and the solution for heteroskedasticity?
Interaction Terms Involving Dummy Variable
Consider a regression model:
Log(Wage)-hat = 0.321 − 0.110 Female − 0.213 Married... wait, from the marital-status groups above: 0.321 − 0.110 Female + 0.213 Married − 0.301 (Female × Married) + other factors
               (0.100) (0.056)        (0.055)         (0.072)
Interactions Between Independent
Variables: Test Score Example
Perhaps the effect of class size reduction is bigger in districts where many students are still learning English,
i.e. smaller classes help more if there are many English learners, who need individual attention.
That is, ∂TestScore/∂STR might depend on PctEL.
More generally, ∂Y/∂X₁ might depend on X₂.
How to model such interactions between X₁ and X₂?
We first consider binary X's, then continuous X's.
(a) Interactions Between 2 Binary Variables
Yi = β₀ + β₁D1i + β₂D2i + ui, where D1i, D2i are binary.
β₁ is the effect of changing D1 = 0 to D1 = 1. In this specification, this effect doesn't depend on the value of D2.
To allow the effect of changing D1 to depend on D2, include the interaction term D1i × D2i as a regressor:
Yi = β₀ + β₁D1i + β₂D2i + β₃(D1i × D2i) + ui
Interpreting the Coefficients
Yi = β₀ + β₁D1i + β₂D2i + β₃(D1i × D2i) + ui
General rule: compare the various cases:
E(Yi | D1i = 0) = β₀ + β₂D2                (1)
E(Yi | D1i = 1) = β₀ + β₁ + β₂D2 + β₃D2    (2)
Subtract (1) from (2):
E(Yi | D1i = 1) − E(Yi | D1i = 0) = β₁ + β₃D2
The effect of a change in D1 depends on D2 (what we wanted).
β₃ is the difference in the effect of a change in D1 on Y between those who have D2 = 1 and those who have D2 = 0.
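The general rule above can be sketched numerically; the coefficients below are made up for illustration, not estimates from the lecture:

```python
# Interaction of two binary regressors: the effect of switching D1 from 0 to 1
# is b1 when D2 = 0, and b1 + b3 when D2 = 1.
b0, b1, b2, b3 = 10.0, 2.0, 1.5, 0.5   # hypothetical coefficients

def fitted(d1, d2):
    return b0 + b1 * d1 + b2 * d2 + b3 * d1 * d2

effect_d2_0 = fitted(1, 0) - fitted(0, 0)   # = b1
effect_d2_1 = fitted(1, 1) - fitted(0, 1)   # = b1 + b3
print(effect_d2_0, effect_d2_1)  # 2.0 2.5
```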
Example: ln(wage) vs. gender and
completion of a college degree
Yi = β₀ + βF DFi + βC DCi + ui
βF is the effect of being female on wages; βC is the effect of a college education on wages.
This regression does not allow the effect of obtaining a college degree to depend on gender.
Adding the interaction:
Yi = β₀ + βF DFi + βC DCi + βFC (DFi × DCi) + ui
If βFC is statistically different from zero, then the effect of education on earnings is gender specific.
βFC shows by how much the wage differential between those with a college degree and those without is larger for females relative to males.
Example: TestScore, STR, English learners
Let HiSTR = 1 if STR ≥ 20 (0 if STR < 20), and HiEL = 1 if PctEL ≥ 10 (0 if PctEL < 10).
TestScore-hat = 664.1 − 18.2 HiEL − 1.9 HiSTR − 3.5 (HiSTR × HiEL)
               (1.4)   (2.3)       (1.9)       (3.1)
Effect of HiSTR when HiEL = 0 is −1.9.
Effect of HiSTR when HiEL = 1 is −1.9 − 3.5 = −5.4.
Class size reduction is estimated to have a bigger effect when the percent of English learners is large.
This interaction isn't statistically significant: t = −3.5/3.1 ≈ −1.13.
(b) Interaction between Continuous andBinary Variables
Yi = β₀ + β₁Di + β₂Xi + ui
Di is binary, Xi is continuous.
The effect of X on Y (holding D constant) = β₂, which does not depend on D.
To allow the effect of X to depend on D, include the interaction term Di × Xi as a regressor:
Yi = β₀ + β₁Di + β₂Xi + β₃(Di × Xi) + ui
(b) Interaction between Continuous and Binary Variables: 2 Regression Lines
Yi = β₀ + β₁Di + β₂Xi + β₃(Di × Xi) + ui
For observations with Di = 0 (the D = 0 group):
Yi = β₀ + β₂Xi + ui   (the D = 0 regression line)
For observations with Di = 1 (the D = 1 group):
Yi = β₀ + β₁ + β₂Xi + β₃Xi + ui = (β₀ + β₁) + (β₂ + β₃)Xi + ui   (the D = 1 regression line)
The two regression lines have both different intercepts and different slopes.
Interaction between Continuous and Binary Variables: 2 Regression Lines
[Figure: the D = 0 and D = 1 regression lines, with different intercepts and different slopes]
Interpreting the Coefficients
Yi = β₀ + β₁Di + β₂Xi + β₃(Di × Xi) + ui
General rule: compare the various cases:
Y = β₀ + β₁D + β₂X + β₃(D × X)                        (1)
Now change X:
Y + ΔY = β₀ + β₁D + β₂(X + ΔX) + β₃[D × (X + ΔX)]     (2)
Subtract (1) from (2):
ΔY = β₂ΔX + β₃DΔX,  so  ΔY/ΔX = β₂ + β₃D
The effect of X depends on D (what we wanted).
β₃ is the increment to the effect of X when D = 1.
Example: TestScore, STR, HiEL
(HiEL = 1 if PctEL ≥ 10)
TestScore-hat = 682.2 − 0.97 STR + 5.6 HiEL − 1.28 (STR × HiEL)
               (11.9)  (0.59)      (19.5)     (0.97)
When HiEL = 0: TestScore-hat = 682.2 − 0.97 STR
When HiEL = 1: TestScore-hat = 682.2 − 0.97 STR + 5.6 − 1.28 STR = 687.8 − 2.25 STR
Two regression lines: one for each HiEL group.
Class size reduction is estimated to have a larger effect when the percent of English learners is large.
Example: Testing hypotheses
TestScore-hat = 682.2 − 0.97 STR + 5.6 HiEL − 1.28 (STR × HiEL)
               (11.9)  (0.59)      (19.5)     (0.97)
The two regression lines have the same slope ⇔ the coefficient on STR × HiEL is zero: t = −1.28/0.97 = −1.32
The two regression lines have the same intercept ⇔ the coefficient on HiEL is zero: t = 5.6/19.5 = 0.29
The two regression lines are the same ⇔ the population coefficient on HiEL = 0 and the population coefficient on STR × HiEL = 0: F = 89.94 (p-value < .001)!
We reject the joint hypothesis but neither individual hypothesis (how can this be?)
The regressors are highly correlated ⇒ large standard errors on individual coefficients.
(c) Interaction between 2 Continuous Variables
Yi = β₀ + β₁X1i + β₂X2i + ui
X1, X2 are continuous.
As specified, the effect of X1 doesn't depend on X2, and the effect of X2 doesn't depend on X1.
To allow the effect of X1 to depend on X2, include the interaction term X1i × X2i as a regressor:
Yi = β₀ + β₁X1i + β₂X2i + β₃(X1i × X2i) + ui
Interpreting the Coefficients
Yi = β₀ + β₁X1i + β₂X2i + β₃(X1i × X2i) + ui
General rule: compare the various cases:
Y = β₀ + β₁X1 + β₂X2 + β₃(X1 × X2)                         (1)
Now change X1:
Y + ΔY = β₀ + β₁(X1 + ΔX1) + β₂X2 + β₃[(X1 + ΔX1) × X2]    (2)
Subtract (1) from (2):
ΔY = β₁ΔX1 + β₃X2ΔX1,  so  ΔY/ΔX1 = β₁ + β₃X2
The effect of X1 depends on X2 (what we wanted).
β₃ is the increment to the effect of X1 from a unit change in X2.
Example: TestScore, STR, PctEL
TestScore-hat = 686.3 − 1.12 STR − 0.67 PctEL + 0.0012 (STR × PctEL)
               (11.8)  (0.59)     (0.37)       (0.019)
The estimated effect of class size reduction is nonlinear because the size of the effect itself depends on PctEL:
∂TestScore/∂STR = −1.12 + 0.0012 PctEL

PctEL   ∂TestScore/∂STR
  0     −1.12
 20     −1.12 + 0.0012 × 20 = −1.10
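The marginal-effect column above is just the derivative evaluated at each PctEL value; a quick Python check:

```python
# Marginal effect of STR at different PctEL values, from the estimates above:
# dTestScore/dSTR = -1.12 + 0.0012 * PctEL
def str_effect(pct_el):
    return -1.12 + 0.0012 * pct_el

print(str_effect(0))             # -1.12
print(round(str_effect(20), 2))  # -1.1
```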
Example: Hypothesis Tests
TestScore-hat = 686.3 − 1.12 STR − 0.67 PctEL + 0.0012 (STR × PctEL)
               (11.8)  (0.59)     (0.37)       (0.019)
Does the population coefficient on STR × PctEL = 0? t = .0012/.019 = .06 ⇒ can't reject the null at the 5% level.
Does the population coefficient on STR = 0? t = −1.12/0.59 = −1.90 ⇒ can't reject the null at the 5% level.
Do the coefficients on both STR and STR × PctEL = 0? F = 3.89 (p-value = .021) ⇒ reject the null at the 5% level(!)
(Why? High but imperfect multicollinearity.)
[Textbook figure omitted. Copyright 2009 South-Western/Cengage Learning]
Heteroskedasticity and Homoskedasticity
What? Consequences of homoskedasticity; implications for computing standard errors.
Homoskedasticity: if var(u|Xi) is constant (that is, if the variance of the conditional distribution of u given X does not depend on X), then u is said to be homoskedastic. Otherwise, u is heteroskedastic.
Example: Earnings of male and female college graduates
Earnings_i = β₀ + β₁ Male_i + u_i
Homoskedasticity: Var(u_i) does not depend on Male_i.
For women: Earnings_i = β₀ + u_i
For men: Earnings_i = β₀ + β₁ + u_i
Homoskedasticity: the variance of earnings is the same for men and for women.
Equal group variances = homoskedasticity; unequal group variances = heteroskedasticity.
Homoskedasticity in a picture:
E(u|X = x) = 0 (u satisfies Least Squares Assumption #1). The variance of u does not depend on x: u is homoskedastic.
Heteroskedasticity in a picture:
E(u|X = x) = 0 (u satisfies Least Squares Assumption #1). The variance of u does depend on x: u is heteroskedastic.
A real-data example from labor economics: average
hourly earnings vs. years of education (Data source:
Current Population Survey)
Heteroskedastic or homoskedastic?
So far we have (without saying so) assumed that
u might be heteroskedastic.
Heteroskedasticity and homoskedasticity concern var(u|X=x).
Because we have not explicitly assumed homoskedastic errors,
we have implicitly allowed for heteroskedasticity.
The OLS estimators remain unbiased, consistent and
asymptotically Normal even when the errors are heteroskedastic.
What if the errors are in fact homoskedastic?
If Assumptions 1-4 hold and the errors are homoskedastic, the OLS estimators are efficient (have the lowest variance) among all linear estimators (Gauss-Markov theorem).
The formula for the variance of β̂₁ and the OLS standard error simplifies: if var(u_i | X_i = x) = σ²_u, then
Var(β̂₁) = σ²_u / Σᵢ (Xᵢ − X̄)²
Note: Var(β̂₁) is inversely proportional to the variation in X: more spread in X means more information about β₁.
SE(β̂₁) = square root of the estimated Var(β̂₁)
We now have two formulas for standard errors for β̂₁:
Homoskedasticity-only standard errors are valid only if the errors are homoskedastic.
Heteroskedasticity-robust standard errors are valid whether or not the errors are heteroskedastic.
The main advantage of the homoskedasticity-only standard errors is that the formula is simpler. But the disadvantage is that the formula is only correct if the errors are homoskedastic.
Practical implications
The homoskedasticity-only formula for the standard error of β̂₁ and the heteroskedasticity-robust formula differ, so in general you get different standard errors using the different formulas.
Homoskedasticity-only standard errors are the default setting in regression software, and sometimes the only setting (e.g. Excel). To get the general heteroskedasticity-robust standard errors you must override the default.
If you don't override the default and there is in fact heteroskedasticity, your standard errors (and therefore your t-statistics and confidence intervals) will be wrong. Typically, homoskedasticity-only SEs are too small.
Consequences of Heteroskedasticity
Homoskedasticity-only standard errors (what Stata reports by default) are valid only if the errors are homoskedastic.
Heteroskedasticity-robust standard errors (what Stata reports when we add the robust option) are valid whether or not the errors are heteroskedastic.
Heteroskedasticity-robust standard errors in Stata:
regress testscr str, robust

Regression with robust standard errors     Number of obs =    420
                                           F(  1,   418) =  19.26
                                           Prob > F      = 0.0000
                                           R-squared     = 0.0512
                                           Root MSE      = 18.581

         |             Robust
 testscr |   Coef.    Std. Err.     t    P>|t|   [95% Conf. Interval]
---------+-----------------------------------------------------------
     str | -2.279808  .5194892   -4.39   0.000   -3.300945  -1.258671
   _cons |  698.933   10.36436   67.44   0.000    678.5602   719.3057

If you use the robust option, Stata computes heteroskedasticity-robust standard errors.
Otherwise, Stata computes homoskedasticity-only standard errors.
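For a simple regression, both standard-error formulas have closed forms, so the difference can be seen in a few lines of pure Python. This is an illustrative sketch with made-up data, not the test-score dataset; the robust formula is the HC1 variant (the small-sample-adjusted estimator behind Stata's robust option):

```python
import math

# Illustrative data (hypothetical, not from the course)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 8.1, 9.7, 12.3, 13.8, 16.4]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Homoskedasticity-only: s^2 / Sxx, with s^2 = SSR / (n - 2)
s2 = sum(u ** 2 for u in resid) / (n - 2)
se_homo = math.sqrt(s2 / sxx)

# Heteroskedasticity-robust (HC1): n/(n-2) * sum((xi - xbar)^2 u_i^2) / Sxx^2
se_robust = math.sqrt((n / (n - 2)) *
                      sum(((xi - xbar) ** 2) * (u ** 2)
                          for xi, u in zip(x, resid)) / sxx ** 2)

print(se_homo, se_robust)  # the two formulas generally give different numbers
```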
The bottom line:
Heteroskedasticity-robust standard errors are correct whether the errors are heteroskedastic or homoskedastic.
If the errors are heteroskedastic and you use the homoskedasticity-only standard errors, your standard errors will be wrong.
So, you should always use heteroskedasticity-robust standard errors.
Evaluating the Results of Regression Analysis
Testing for Heteroskedasticity
1. Visual evidence: does u_hat exhibit any systematic pattern?
- regress y x1 x2 x3
- predict uhat, residuals   (Stata will record the residuals for the estimated model in a variable uhat)
- gen uhatsq = uhat*uhat
- scatter uhatsq x1
- scatter uhatsq x2
- scatter uhatsq x3
Evaluating the Results of Regression Analysis
2. White Test for Heteroskedasticity
Regress the squared residuals from the OLS regression on the independent variables in the regression, their squares, and their interaction terms. Calculate the R-squared from this auxiliary regression.
Under the null of homoskedasticity, n × R-squared ~ χ²_q, where q is the number of regressors in the auxiliary regression (excluding the constant).
- regress y x1 x2 x3
- imtest, white
Stata provides the p-value for H0 of homoskedasticity (a low p-value provides evidence for rejecting the null hypothesis).
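The decision rule can be sketched numerically; the sample size, auxiliary R-squared, and q below are hypothetical, and the critical values are the standard 5% chi-squared cutoffs:

```python
# White test decision rule, assuming the auxiliary regression has already
# been run (hypothetical numbers for illustration).
n, aux_r2, q = 100, 0.09, 3        # q regressors in the auxiliary regression
white_stat = round(n * aux_r2, 6)  # ~ chi-squared with q df under H0
CHI2_CRIT_5PCT = {1: 3.84, 2: 5.99, 3: 7.81}  # standard 5% critical values
reject = white_stat > CHI2_CRIT_5PCT[q]
print(white_stat, reject)  # 9.0 True -> evidence of heteroskedasticity
```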
Evaluating the Results of Regression Analysis
Testing for Normality of the Error Terms
1. Visual evidence:
- regress y x1 x2 x3
- predict uhat, residuals
- histogram uhat, normal   (Stata will build the histogram of the residuals and plot it on the same graph with a normal density function)
Evaluating the Results of Regression Analysis
[Figure: histogram of the residuals (x-axis: Residuals, from about -1000 to 2000) with a normal density overlaid]
Evaluating the Results of Regression Analysis
2. Jarque-Bera Test for Normality
H0: error terms are Normal.
Test statistic: JB = (n/6) × [skewness² + (kurtosis − 3)²/4]
The 5% critical value is 5.99; if JB > 5.99, reject the null of normality.
In Stata, there is no command to calculate this test statistic directly:
- summarize uhat, detail
- calculate the JB test statistic manually
summarize uhat, detail

                          Residuals
      Percentiles      Smallest
 1%    -661.5541      -835.7715
 5%    -507.4577      -799.6147
10%    -416.4949      -723.7985      Obs            935
25%    -249.1033      -721.2892      Sum of Wgt.    935
50%    -42.96934                     Mean           1.09e-07
                       Largest       Std. Dev.      372.7708
75%     197.4006       1544.31
90%     459.8176       1788.275      Variance       138958.1
95%     625.2347       2005.186      Skewness       1.210522
99%     1168.44        2225.102      Kurtosis       6.533908

JB = (n/6) × (skewness² + ((kurtosis − 3)²/4)) = (935/6) × (1.210522² + (3.533908²/4)) ≈ 715
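The manual JB calculation can be reproduced in a few lines of Python from the moments reported in the output above:

```python
# Jarque-Bera statistic from the `summarize uhat, detail` output above.
n, skewness, kurtosis = 935, 1.210522, 6.533908
jb = (n / 6) * (skewness ** 2 + ((kurtosis - 3) ** 2) / 4)
print(round(jb))  # 715, far above the 5% critical value of 5.99
```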