Multivariate Models: Analysis of Variance and Regression Using Dummy Variables
Regression With Dummy Variables, Econ 420 (7/29/2019)
Econometrics
OLS Regression with Dummy Variables
[Textbook figure omitted. Copyright 2009 South-Western/Cengage Learning]
Interpretation of Regression Coefficients with a
Binary Regressor
Female is a dummy variable, which indicates whether a person is female:
Female_i = 1 if female, 0 if male
Consider a regression of wage on a constant and Female:
Wage_i = β₀ + δ₀ Female_i + u_i
For a male, the regression model is:
Wage_i = β₀ + u_i
β₀ is the average wage for male workers.
For a female, the regression model is:
Wage_i = β₀ + δ₀ + u_i
β₀ + δ₀ is the average wage for female workers.
δ₀ is the difference in average wages between female and male workers (how much more females earn relative to males).
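A quick numerical check of this interpretation: in a regression on a constant and a dummy, the OLS intercept equals the base-group mean and the slope equals the difference in group means. A minimal pure-Python sketch with made-up wage numbers (not data from the course):

```python
# Regression of wage on a constant and a Female dummy (hypothetical numbers).
# The OLS slope equals mean(female) - mean(male); the intercept equals mean(male).
wages  = [20.0, 22.0, 18.0, 25.0, 17.0, 19.0, 16.0, 21.0]
female = [0,    0,    0,    0,    1,    1,    1,    1]

n = len(wages)
xbar = sum(female) / n
ybar = sum(wages) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(female, wages)) / \
     sum((x - xbar) ** 2 for x in female)
b0 = ybar - b1 * xbar

mean_male   = sum(y for y, x in zip(wages, female) if x == 0) / female.count(0)
mean_female = sum(y for y, x in zip(wages, female) if x == 1) / female.count(1)

print(b0, b1)  # 21.25 -3.0
assert abs(b0 - mean_male) < 1e-9
assert abs(b1 - (mean_female - mean_male)) < 1e-9
```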
Regression for Females
-> sex = 0
Linear regression                     Number of obs =     12
                                      F(  1,    10) =   2.21
                                      Prob > F      = 0.1683
                                      R-squared     = 0.2868
                                      Root MSE      = .38277

        |             Robust
    gpa |   Coef.   Std. Err.     t    P>|t|   [95% Conf. Interval]
--------+----------------------------------------------------------
  hsgpa |   .869      .585      1.49   0.168     -.435      2.173
  _cons |   .244      2.26      0.11   0.916    -4.785      5.273
Regression for Males
-> sex = 1
Linear regression                     Number of obs =     22
                                      F(  1,    20) =  15.29
                                      Prob > F      = 0.0009
                                      R-squared     = 0.2236
                                      Root MSE      = .35021

        |             Robust
    gpa |   Coef.   Std. Err.     t    P>|t|   [95% Conf. Interval]
--------+----------------------------------------------------------
  hsgpa |   .710      .182      3.91   0.001      .331      1.089
  _cons |   .691      .711      0.97   0.342     -.791      2.173
GPA Example: Regression with Dummy Variables
regress gpa sex, robust
Linear regression                     Number of obs =     34
                                      F(  1,    32) =   1.41
                                      Prob > F      = 0.2440
                                      R-squared     = 0.0443
                                      Root MSE      = .40364

        |             Robust
    gpa |   Coef.   Std. Err.     t    P>|t|   [95% Conf. Interval]
--------+----------------------------------------------------------
    sex |  -.1764     .1486    -1.19   0.244    -.4792      .1264
  _cons |   3.540     .1231    28.75   0.000    3.2889     3.7904

GPA-hat = 3.540 − 0.1764 SEX,  R2 = 0.04
         (0.12)  (0.15)
Interpretation of Regression Coefficients with a
Binary Regressor
Consider a regression of wage on a constant, Female, and education:
Wage_i = β₀ + δ₀ Female_i + β₁ Educ_i + u_i
For a male, the regression model is:
Wage_i = β₀ + β₁ Educ_i + u_i
β₀ is the intercept for male workers.
For a female, the regression model is:
Wage_i = (β₀ + δ₀) + β₁ Educ_i + u_i
β₀ + δ₀ is the intercept for female workers.
δ₀ is the shift in the intercept.
[Textbook figure omitted. Copyright 2009 South-Western/Cengage Learning]
Example: Wage Discrimination
Consider a regression model:
Wage_i = β₀ + δ₀ Female_i + β₁ Educ_i + β₂ Exper_i + u_i
Estimated model:
Wage-hat = −1.57 − 1.81 Female + 0.572 Educ + 0.025 Exper
          (0.72)  (0.26)        (0.049)      (0.012)
δ₀-hat = −1.81
In a simple regression:
Wage-hat = 7.10 − 2.51 Female
          (0.21)  (0.30)
Using Dummy Variables for Multiple Categories
4 groups: married men (MM), married women (MF), single men (SM), and single women (SF).
Regression model:
Log(Wage)-hat = 0.321 + 0.213 MM − 0.198 MF − 0.110 SF + 0.079 Educ + 0.027 Exper
               (0.100) (0.055)    (0.058)    (0.056)    (0.007)      (0.005)
Which one is the base category?
Example: Effects of Physical Attractiveness on Wage
3 groups: below average (BA), above average (AA), and average (A).
Regression model for men:
Log(Wage)-hat = 0.321 − 0.164 BA + 0.016 AA + other factors
               (0.100) (0.046)    (0.033)
Which one is the base category?
Regression model for women:
Log(Wage)-hat = 0.200 − 0.124 BA + 0.035 AA + other factors
               (0.100) (0.066)    (0.049)
Outline
Last Time:
- What is a dummy variable?
- How to interpret coefficients in a regression with a dummy variable(s)?
- Can we show the coefficient on a dummy variable on a graph?
Today: Interaction terms and heteroskedasticity
- Why do we need interaction terms? 3 types of interaction terms
- What are the consequences of and the solution for heteroskedasticity?
Interaction Terms Involving Dummy Variable
Consider a regression model:
Log(Wage)-hat = 0.321 − 0.110 Female − 0.213 Married... wait, from the marital-status groups above: 0.321 − 0.110 Female + 0.213 Married − 0.301 (Female × Married) + other factors
               (0.100) (0.056)        (0.055)         (0.072)
Interactions Between Independent
Variables: Test Score Example
Perhaps the effect of class size reduction is bigger in districts where many students are still learning English,
i.e. smaller classes help more if there are many English learners, who need individual attention.
That is, ∂TestScore/∂STR might depend on PctEL.
More generally, ∂Y/∂X₁ might depend on X₂.
How to model such interactions between X₁ and X₂?
We first consider binary X's, then continuous X's.
(a) Interactions Between 2 Binary Variables
Yi = β₀ + β₁D1i + β₂D2i + ui, where D1i, D2i are binary.
β₁ is the effect of changing D1 = 0 to D1 = 1. In this specification, this effect doesn't depend on the value of D2.
To allow the effect of changing D1 to depend on D2, include the interaction term D1i × D2i as a regressor:
Yi = β₀ + β₁D1i + β₂D2i + β₃(D1i × D2i) + ui
Interpreting the Coefficients
Yi = β₀ + β₁D1i + β₂D2i + β₃(D1i × D2i) + ui
General rule: compare the various cases:
E(Yi | D1i = 0) = β₀ + β₂D2                (1)
E(Yi | D1i = 1) = β₀ + β₁ + β₂D2 + β₃D2    (2)
Subtract (1) from (2):
E(Yi | D1i = 1) − E(Yi | D1i = 0) = β₁ + β₃D2
The effect of a change in D1 depends on D2 (what we wanted).
β₃ is the difference in the effect of a change in D1 on Y between those who have D2 = 1 and those who have D2 = 0.
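The general rule above can be sketched numerically; the coefficients below are made up for illustration, not estimates from the lecture:

```python
# Interaction of two binary regressors: the effect of switching D1 from 0 to 1
# is b1 when D2 = 0, and b1 + b3 when D2 = 1.
b0, b1, b2, b3 = 10.0, 2.0, 1.5, 0.5   # hypothetical coefficients

def fitted(d1, d2):
    return b0 + b1 * d1 + b2 * d2 + b3 * d1 * d2

effect_d2_0 = fitted(1, 0) - fitted(0, 0)   # = b1
effect_d2_1 = fitted(1, 1) - fitted(0, 1)   # = b1 + b3
print(effect_d2_0, effect_d2_1)  # 2.0 2.5
```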
Example: ln(wage) vs. gender and
completion of a college degree
Yi = β₀ + βF DFi + βC DCi + ui
βF is the effect of being female on wages; βC is the effect of a college education on wages.
This regression does not allow the effect of obtaining a college degree to depend on gender.
Adding the interaction:
Yi = β₀ + βF DFi + βC DCi + βFC (DFi × DCi) + ui
If βFC is statistically different from zero, then the effect of education on earnings is gender specific.
βFC shows by how much the wage differential between those with a college degree and those without is larger for females relative to males.
Example: TestScore, STR, English learners
Let HiSTR = 1 if STR ≥ 20 (0 if STR < 20), and HiEL = 1 if PctEL ≥ 10 (0 if PctEL < 10).
TestScore-hat = 664.1 − 18.2 HiEL − 1.9 HiSTR − 3.5 (HiSTR × HiEL)
               (1.4)   (2.3)       (1.9)       (3.1)
Effect of HiSTR when HiEL = 0 is −1.9.
Effect of HiSTR when HiEL = 1 is −1.9 − 3.5 = −5.4.
Class size reduction is estimated to have a bigger effect when the percent of English learners is large.
This interaction isn't statistically significant: t = −3.5/3.1 ≈ −1.13.
(b) Interaction between Continuous andBinary Variables
Yi = β₀ + β₁Di + β₂Xi + ui
Di is binary, Xi is continuous.
The effect of X on Y (holding D constant) = β₂, which does not depend on D.
To allow the effect of X to depend on D, include the interaction term Di × Xi as a regressor:
Yi = β₀ + β₁Di + β₂Xi + β₃(Di × Xi) + ui
(b) Interaction between Continuous and Binary Variables: 2 Regression Lines
Yi = β₀ + β₁Di + β₂Xi + β₃(Di × Xi) + ui
For observations with Di = 0 (the D = 0 group):
Yi = β₀ + β₂Xi + ui   (the D = 0 regression line)
For observations with Di = 1 (the D = 1 group):
Yi = β₀ + β₁ + β₂Xi + β₃Xi + ui = (β₀ + β₁) + (β₂ + β₃)Xi + ui   (the D = 1 regression line)
The two regression lines have both different intercepts and different slopes.
Interaction between Continuous and Binary Variables: 2 Regression Lines
[Figure: the D = 0 and D = 1 regression lines, with different intercepts and different slopes]
Interpreting the Coefficients
Yi = β₀ + β₁Di + β₂Xi + β₃(Di × Xi) + ui
General rule: compare the various cases:
Y = β₀ + β₁D + β₂X + β₃(D × X)                        (1)
Now change X:
Y + ΔY = β₀ + β₁D + β₂(X + ΔX) + β₃[D × (X + ΔX)]     (2)
Subtract (1) from (2):
ΔY = β₂ΔX + β₃DΔX,  so  ΔY/ΔX = β₂ + β₃D
The effect of X depends on D (what we wanted).
β₃ is the increment to the effect of X when D = 1.
Example: TestScore, STR, HiEL
(HiEL = 1 if PctEL ≥ 10)
TestScore-hat = 682.2 − 0.97 STR + 5.6 HiEL − 1.28 (STR × HiEL)
               (11.9)  (0.59)      (19.5)     (0.97)
When HiEL = 0: TestScore-hat = 682.2 − 0.97 STR
When HiEL = 1: TestScore-hat = 682.2 − 0.97 STR + 5.6 − 1.28 STR = 687.8 − 2.25 STR
Two regression lines: one for each HiEL group.
Class size reduction is estimated to have a larger effect when the percent of English learners is large.
Example: Testing hypotheses
TestScore-hat = 682.2 − 0.97 STR + 5.6 HiEL − 1.28 (STR × HiEL)
               (11.9)  (0.59)      (19.5)     (0.97)
The two regression lines have the same slope ⇔ the coefficient on STR × HiEL is zero: t = −1.28/0.97 = −1.32
The two regression lines have the same intercept ⇔ the coefficient on HiEL is zero: t = 5.6/19.5 = 0.29
The two regression lines are the same ⇔ the population coefficient on HiEL = 0 and the population coefficient on STR × HiEL = 0: F = 89.94 (p-value < .001)!
We reject the joint hypothesis but neither individual hypothesis (how can this be?)
The regressors are highly correlated ⇒ large standard errors on individual coefficients.
(c) Interaction between 2 Continuous Variables
Yi = β₀ + β₁X1i + β₂X2i + ui
X1, X2 are continuous.
As specified, the effect of X1 doesn't depend on X2, and the effect of X2 doesn't depend on X1.
To allow the effect of X1 to depend on X2, include the interaction term X1i × X2i as a regressor:
Yi = β₀ + β₁X1i + β₂X2i + β₃(X1i × X2i) + ui
Interpreting the Coefficients
Yi = β₀ + β₁X1i + β₂X2i + β₃(X1i × X2i) + ui
General rule: compare the various cases:
Y = β₀ + β₁X1 + β₂X2 + β₃(X1 × X2)                         (1)
Now change X1:
Y + ΔY = β₀ + β₁(X1 + ΔX1) + β₂X2 + β₃[(X1 + ΔX1) × X2]    (2)
Subtract (1) from (2):
ΔY = β₁ΔX1 + β₃X2ΔX1,  so  ΔY/ΔX1 = β₁ + β₃X2
The effect of X1 depends on X2 (what we wanted).
β₃ is the increment to the effect of X1 from a unit change in X2.
Example: TestScore, STR, PctEL
TestScore-hat = 686.3 − 1.12 STR − 0.67 PctEL + 0.0012 (STR × PctEL)
               (11.8)  (0.59)     (0.37)       (0.019)
The estimated effect of class size reduction is nonlinear because the size of the effect itself depends on PctEL:
∂TestScore/∂STR = −1.12 + 0.0012 PctEL

PctEL   ∂TestScore/∂STR
  0     −1.12
 20     −1.12 + 0.0012 × 20 = −1.10
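The marginal-effect column above is just the derivative evaluated at each PctEL value; a quick Python check:

```python
# Marginal effect of STR at different PctEL values, from the estimates above:
# dTestScore/dSTR = -1.12 + 0.0012 * PctEL
def str_effect(pct_el):
    return -1.12 + 0.0012 * pct_el

print(str_effect(0))             # -1.12
print(round(str_effect(20), 2))  # -1.1
```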
Example: Hypothesis Tests
TestScore-hat = 686.3 − 1.12 STR − 0.67 PctEL + 0.0012 (STR × PctEL)
               (11.8)  (0.59)     (0.37)       (0.019)
Does the population coefficient on STR × PctEL = 0? t = .0012/.019 = .06 ⇒ can't reject the null at the 5% level.
Does the population coefficient on STR = 0? t = −1.12/0.59 = −1.90 ⇒ can't reject the null at the 5% level.
Do the coefficients on both STR and STR × PctEL = 0? F = 3.89 (p-value = .021) ⇒ reject the null at the 5% level(!)
(Why? High but imperfect multicollinearity.)
[Textbook figure omitted. Copyright 2009 South-Western/Cengage Learning]
Heteroskedasticity and Homoskedasticity
What? Consequences of homoskedasticity; implications for computing standard errors.
Homoskedasticity: if var(u|Xi) is constant (that is, if the variance of the conditional distribution of u given X does not depend on X), then u is said to be homoskedastic. Otherwise, u is heteroskedastic.
Example: Earnings of male and female college graduates
Earnings_i = β₀ + β₁ Male_i + u_i
Homoskedasticity: Var(u_i) does not depend on Male_i.
For women: Earnings_i = β₀ + u_i
For men: Earnings_i = β₀ + β₁ + u_i
Homoskedasticity: the variance of earnings is the same for men and for women.
Equal group variances = homoskedasticity; unequal group variances = heteroskedasticity.
Homoskedasticity in a picture:
E(u|X = x) = 0 (u satisfies Least Squares Assumption #1). The variance of u does not depend on x: u is homoskedastic.
Heteroskedasticity in a picture:
E(u|X = x) = 0 (u satisfies Least Squares Assumption #1). The variance of u does depend on x: u is heteroskedastic.
A real-data example from labor economics: average
hourly earnings vs. years of education (Data source:
Current Population Survey)
Heteroskedastic or homoskedastic?
So far we have (without saying so) assumed that
u might be heteroskedastic.
Heteroskedasticity and homoskedasticity concern var(u|X=x).
Because we have not explicitly assumed homoskedastic errors,
we have implicitly allowed for heteroskedasticity.
The OLS estimators remain unbiased, consistent and
asymptotically Normal even when the errors are heteroskedastic.
What if the errors are in fact homoskedastic?
If Assumptions 1-4 hold and the errors are homoskedastic, the OLS estimators are efficient (have the lowest variance) among all linear estimators (Gauss-Markov theorem).
The formula for the variance of β̂₁ and the OLS standard error simplifies: if var(u_i | X_i = x) = σ²_u, then
Var(β̂₁) = σ²_u / Σᵢ (Xᵢ − X̄)²
Note: Var(β̂₁) is inversely proportional to the variation in X: more spread in X means more information about β₁.
SE(β̂₁) = square root of the estimated Var(β̂₁)
We now have two formulas for standard errors for β̂₁:
Homoskedasticity-only standard errors are valid only if the errors are homoskedastic.
Heteroskedasticity-robust standard errors are valid whether or not the errors are heteroskedastic.
The main advantage of the homoskedasticity-only standard errors is that the formula is simpler. But the disadvantage is that the formula is only correct if the errors are homoskedastic.
Practical implications
The homoskedasticity-only formula for the standard error of β̂₁ and the heteroskedasticity-robust formula differ, so in general you get different standard errors using the different formulas.
Homoskedasticity-only standard errors are the default setting in regression software, and sometimes the only setting (e.g. Excel). To get the general heteroskedasticity-robust standard errors you must override the default.
If you don't override the default and there is in fact heteroskedasticity, your standard errors (and therefore your t-statistics and confidence intervals) will be wrong. Typically, homoskedasticity-only SEs are too small.
Consequences of Heteroskedasticity
Homoskedasticity-only standard errors (what Stata reports by default) are valid only if the errors are homoskedastic.
Heteroskedasticity-robust standard errors (what Stata reports when we add the robust option) are valid whether or not the errors are heteroskedastic.
Heteroskedasticity-robust standard errors in Stata:
regress testscr str, robust

Regression with robust standard errors     Number of obs =    420
                                           F(  1,   418) =  19.26
                                           Prob > F      = 0.0000
                                           R-squared     = 0.0512
                                           Root MSE      = 18.581

         |             Robust
 testscr |   Coef.    Std. Err.     t    P>|t|   [95% Conf. Interval]
---------+-----------------------------------------------------------
     str | -2.279808  .5194892   -4.39   0.000   -3.300945  -1.258671
   _cons |  698.933   10.36436   67.44   0.000    678.5602   719.3057

If you use the robust option, Stata computes heteroskedasticity-robust standard errors.
Otherwise, Stata computes homoskedasticity-only standard errors.
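For a simple regression, both standard-error formulas have closed forms, so the difference can be seen in a few lines of pure Python. This is an illustrative sketch with made-up data, not the test-score dataset; the robust formula is the HC1 variant (the small-sample-adjusted estimator behind Stata's robust option):

```python
import math

# Illustrative data (hypothetical, not from the course)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 8.1, 9.7, 12.3, 13.8, 16.4]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Homoskedasticity-only: s^2 / Sxx, with s^2 = SSR / (n - 2)
s2 = sum(u ** 2 for u in resid) / (n - 2)
se_homo = math.sqrt(s2 / sxx)

# Heteroskedasticity-robust (HC1): n/(n-2) * sum((xi - xbar)^2 u_i^2) / Sxx^2
se_robust = math.sqrt((n / (n - 2)) *
                      sum(((xi - xbar) ** 2) * (u ** 2)
                          for xi, u in zip(x, resid)) / sxx ** 2)

print(se_homo, se_robust)  # the two formulas generally give different numbers
```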
The bottom line:
Heteroskedasticity-robust standard errors are correct whether the errors are heteroskedastic or homoskedastic.
If the errors are heteroskedastic and you use the homoskedasticity-only standard errors, your standard errors will be wrong.
So, you should always use heteroskedasticity-robust standard errors.
Evaluating the Results of Regression Analysis
Testing for Heteroskedasticity
1. Visual evidence: does u_hat exhibit any systematic pattern?
- regress y x1 x2 x3
- predict uhat, residuals   (Stata will record the residuals for the estimated model in a variable uhat)
- gen uhatsq = uhat*uhat
- scatter uhatsq x1
- scatter uhatsq x2
- scatter uhatsq x3
Evaluating the Results of Regression Analysis
2. White Test for Heteroskedasticity
Regress the squared residuals from the OLS regression on the independent variables in the regression, their squares, and their interaction terms. Calculate the R-squared from this auxiliary regression.
Under the null of homoskedasticity, n × R-squared ~ χ²_q, where q is the number of regressors in the auxiliary regression (excluding the constant).
- regress y x1 x2 x3
- imtest, white
Stata provides the p-value for H0 of homoskedasticity (a low p-value provides evidence for rejecting the null hypothesis).
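The decision rule can be sketched numerically; the sample size, auxiliary R-squared, and q below are hypothetical, and the critical values are the standard 5% chi-squared cutoffs:

```python
# White test decision rule, assuming the auxiliary regression has already
# been run (hypothetical numbers for illustration).
n, aux_r2, q = 100, 0.09, 3        # q regressors in the auxiliary regression
white_stat = round(n * aux_r2, 6)  # ~ chi-squared with q df under H0
CHI2_CRIT_5PCT = {1: 3.84, 2: 5.99, 3: 7.81}  # standard 5% critical values
reject = white_stat > CHI2_CRIT_5PCT[q]
print(white_stat, reject)  # 9.0 True -> evidence of heteroskedasticity
```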
Evaluating the Results of Regression Analysis
Testing for Normality of the Error Terms
1. Visual evidence:
- regress y x1 x2 x3
- predict uhat, residuals
- histogram uhat, normal   (Stata will build the histogram of the residuals and plot it on the same graph with a normal density function)
Evaluating the Results of Regression Analysis
[Figure: histogram of the residuals (x-axis: Residuals, from about -1000 to 2000) with a normal density overlaid]
Evaluating the Results of Regression Analysis
2. Jarque-Bera Test for Normality
H0: error terms are Normal.
Test statistic: JB = (n/6) × [skewness² + (kurtosis − 3)²/4]
The 5% critical value is 5.99; if JB > 5.99, reject the null of normality.
In Stata, there is no command to calculate this test statistic directly:
- summarize uhat, detail
- calculate the JB test statistic manually
summarize uhat, detail

                          Residuals
      Percentiles      Smallest
 1%    -661.5541      -835.7715
 5%    -507.4577      -799.6147
10%    -416.4949      -723.7985      Obs            935
25%    -249.1033      -721.2892      Sum of Wgt.    935
50%    -42.96934                     Mean           1.09e-07
                       Largest       Std. Dev.      372.7708
75%     197.4006       1544.31
90%     459.8176       1788.275      Variance       138958.1
95%     625.2347       2005.186      Skewness       1.210522
99%     1168.44        2225.102      Kurtosis       6.533908

JB = (n/6) × (skewness² + ((kurtosis − 3)²/4)) = (935/6) × (1.210522² + (3.533908²/4)) ≈ 715
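The manual JB calculation can be reproduced in a few lines of Python from the moments reported in the output above:

```python
# Jarque-Bera statistic from the `summarize uhat, detail` output above.
n, skewness, kurtosis = 935, 1.210522, 6.533908
jb = (n / 6) * (skewness ** 2 + ((kurtosis - 3) ** 2) / 4)
print(round(jb))  # 715, far above the 5% critical value of 5.99
```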