
ECON 7710, 2010

10.1

Heteroskedasticity

• What is heteroskedasticity?

• What are the consequences?

• How is heteroskedasticity identified?

• How is heteroskedasticity corrected?

Objectives


Main empirical model for Unit 10:

$$\text{foodexp}_i = \beta_0 + \beta_1\,\text{income}_i + \varepsilon_i$$

foodexp: family food expenditure; income: family income

Least squares estimates, US data (UE_Tab0301):

$$\widehat{\text{foodexp}}_i = 40.77 + 0.128^{***}\,\text{income}_i$$

se: (22.14) (0.031); $R^2 = 0.3171$, $N = 40$.

Is this the best estimated equation?
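The estimates above come from ordinary least squares with conventional standard errors. A minimal NumPy sketch of that computation for one regressor follows; `simple_ols` is a hypothetical helper name, and the toy numbers stand in for the UE_Tab0301 data, which are not reproduced here. The standard errors it returns assume a constant error variance, which is exactly the assumption the rest of this unit questions.

```python
import numpy as np


def simple_ols(x, y):
    """OLS for y_i = b0 + b1 * x_i + e_i with conventional (homoskedastic) standard errors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    xd = x - x.mean()                           # deviations X_i - X_bar
    sxx = np.sum(xd ** 2)
    b1 = np.sum(xd * (y - y.mean())) / sxx      # slope
    b0 = y.mean() - b1 * x.mean()               # intercept
    e = y - b0 - b1 * x                         # residuals
    s2 = np.sum(e ** 2) / (n - 2)               # estimate of the common error variance sigma^2
    return {
        "b0": b0,
        "b1": b1,
        "se_b0": np.sqrt(s2 * (1.0 / n + x.mean() ** 2 / sxx)),
        "se_b1": np.sqrt(s2 / sxx),
        "r2": 1.0 - np.sum(e ** 2) / np.sum((y - y.mean()) ** 2),
        "resid": e,
    }


# Toy numbers (not the UE_Tab0301 data)
res = simple_ols([1, 2, 3, 4], [2, 4, 5, 8])
print(res["b0"], res["b1"], res["r2"])   # approximately 0.0, 1.9, 0.9627
```

With real data, `x` would be income and `y` food expenditure.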


1. The Nature of Heteroskedasticity

In a regression about firms, a mistake of the same relative size produces a much larger error for a billion-dollar firm than for a million-dollar firm.


Heteroskedasticity is a problem that occurs when the error term does not have a constant variance.

CLRM: Each error term comes from the same probability distribution.

Assumption CLRM.5 is violated!


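Regression model:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

zero mean: $E(\varepsilon_i \mid X_{1i}, X_{2i}) = 0$

homoskedasticity: $\operatorname{var}(\varepsilon_i \mid X_{1i}, X_{2i}) = \sigma^2$

no autocorrelation: $\operatorname{cov}(\varepsilon_i, \varepsilon_j \mid X_{1i}, X_{2i}, X_{1j}, X_{2j}) = 0, \quad i \neq j$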


[Figure: Identical distributions for observations i and j: the distribution for $\varepsilon_i$ and the distribution for $\varepsilon_j$]


[Figure: Conditional distribution f(Y) at X1, X2, X3, X4]

Homoskedasticity: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $\operatorname{var}(\varepsilon_i \mid X_i) = \sigma^2$ for all i


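Heteroskedasticity: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $\operatorname{var}(\varepsilon_i \mid X_i) = \sigma_i^2$, which can differ across observations i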
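To see what $\operatorname{var}(\varepsilon_i \mid X_i) = \sigma_i^2$ means in data, the sketch below draws errors whose standard deviation is either constant or proportional to $X_i$. The proportional form ($\sigma_i = \sigma X_i$) and the function name `simulate_errors` are illustrative assumptions, not part of the lecture.

```python
import numpy as np


def simulate_errors(x, sigma, seed, heteroskedastic=True):
    """Draw one error per observation.

    heteroskedastic=True:  var(e_i | X_i) = (sigma * X_i)^2, so the spread grows with X_i
    heteroskedastic=False: var(e_i | X_i) = sigma^2 for every i
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    sd = sigma * x if heteroskedastic else np.full_like(x, sigma)
    return rng.normal(loc=0.0, scale=sd)


# 20,000 observations at X = 1 followed by 20,000 at X = 10
x = np.repeat([1.0, 10.0], 20_000)
e_het = simulate_errors(x, sigma=1.0, seed=0)
e_hom = simulate_errors(x, sigma=1.0, seed=0, heteroskedastic=False)
print(e_het[:20_000].var(), e_het[20_000:].var())   # roughly 1 and 100
print(e_hom[:20_000].var(), e_hom[20_000:].var())   # both roughly 1
```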



Pure heteroskedasticity

Different variances of the error term.

Correctly specified PRF.

Impure heteroskedasticity

Different variances of the error term.

Specification error.


2. Detecting Heteroskedasticity

2.1 Graphical Method

Plotting foodexp against income (for one regressor)

Example 1: Food expenditure, US data (UE_Tab0301)

[Figure: Scatter diagram of foodexp against income]


Example 1: Food expenditure, US data (UE_Tab0301)

[Figure: residuals e plotted against income]

[Figure: squared residuals e² plotted against income]
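The data behind such residual plots can be produced with a short NumPy sketch; `residuals_for_plot` and the toy numbers are hypothetical. It fits the regression, then returns the regressor with the matching residuals and squared residuals, sorted by the regressor.

```python
import numpy as np


def residuals_for_plot(x, y):
    """Fit y = b0 + b1*x by OLS; return x, e and e^2, all sorted by x for plotting."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])        # intercept column + regressor
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                                  # OLS residuals
    order = np.argsort(x)
    return x[order], e[order], e[order] ** 2


xs, e, e2 = residuals_for_plot([3, 1, 4, 2], [5, 2, 8, 4])
print(xs)   # [1. 2. 3. 4.]
print(e)    # approximately [ 0.1  0.2 -0.7  0.4]
```

Passing `xs` with `e` (or `e2`) to any plotting library gives the two panels; a spread that widens as income grows is the visual sign of heteroskedasticity.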


Example 2: textbook data (Woody3)

$$\hat{Y}_i = 102{,}192^{***} - 9{,}075^{***} N_i + 0.35^{***} P_i + 1.29^{**} I_i$$

$R^2 = 0.6182$, $N = 33$.

[Figure: residuals plotted against Population]


2.2 Park Test

Model

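$$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + \varepsilon_i, \quad i = 1, \dots, N \qquad (*)$$

Suppose it is suspected that $\operatorname{var}(\varepsilon_i)$ depends on $Z_i$ in the form of

$$\operatorname{var}(\varepsilon_i) = \sigma_i^2 = \sigma^2 Z_i^{\alpha_1} e^{v_i}$$

Taking logs:

$$\ln \sigma_i^2 = \ln \sigma^2 + \alpha_1 \ln Z_i + v_i$$

$H_0$: $\alpha_1 = 0$ (homoskedastic errors); $H_A$: $\alpha_1 \neq 0$ (heteroskedastic errors).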


Step 1: Estimate the equation (*) with OLS and obtain the residuals:

$$e_i = Y_i - \hat{Y}_i = Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \dots - \hat\beta_K X_{Ki}$$

Step 2: Regress the natural log of the squared residuals on the natural log of a possible proportionality factor $Z_i$:

$$\ln(e_i^2) = \alpha_0 + \alpha_1 \ln Z_i + v_i$$

where vi is an error term satisfying all classical assumptions.


Step 3

If the coefficient of $\ln Z$ is significantly different from zero, there is a heteroskedastic pattern in the residuals with respect to Z. Otherwise, the null hypothesis of homoskedastic errors cannot be rejected.

Example 3: Park test, US data (UE_Tab0301)

$$\widehat{\ln(e_i^2)} = -7.46 + 2.07^{**} \ln(\text{income}_i)$$

t = (2.28), p-value = (0.0284)
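The three Park-test steps can be sketched in NumPy as follows. The function name `park_test`, the simulated data and the choice $Z_i = X_i$ are illustrative assumptions; with real data you would supply the suspected proportionality factor and compare the t statistic with the usual critical value.

```python
import numpy as np


def park_test(X, y, z):
    """Park test. Returns (alpha1_hat, t_stat) from ln(e_i^2) = a0 + a1*ln(z_i) + v_i.

    X: regressors WITHOUT an intercept column, shape (N,) or (N, K).
    z: the suspected proportionality factor Z_i (must be positive).
    """
    y = np.asarray(y, dtype=float)
    Xc = np.column_stack([np.ones(y.size), np.asarray(X, dtype=float)])
    # Step 1: OLS on the original model (*) and its residuals
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    e = y - Xc @ beta
    # Step 2: regress ln(e^2) on ln(Z); an exactly zero residual would make ln undefined
    ly = np.log(e ** 2)
    lz = np.log(np.asarray(z, dtype=float))
    A = np.column_stack([np.ones_like(lz), lz])
    a, *_ = np.linalg.lstsq(A, ly, rcond=None)
    v = ly - A @ a
    s2 = v @ v / (ly.size - 2)
    se_a1 = np.sqrt(s2 / np.sum((lz - lz.mean()) ** 2))
    # Step 3: t statistic for H0: a1 = 0
    return a[1], a[1] / se_a1


rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, 5_000)
y_het = 1.0 + 2.0 * x + x * rng.normal(size=x.size)   # error sd proportional to x, so var ~ x^2
y_hom = 1.0 + 2.0 * x + rng.normal(size=x.size)       # constant error variance
print(park_test(x, y_het, z=x))   # slope near 2 and a large t statistic
print(park_test(x, y_hom, z=x))   # slope near 0
```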


Advantages of the Park test:

a. The test is simple.

b. It provides information about the variance structure.

Limitations of the Park test:

a. The distribution of the dependent variable is problematic.

b. It assumes a specific functional form.

c. It does not work when the variance depends on two or more variables.

d. The correct variable with which to order the observations must be identified first.

e. It cannot handle partitioned data.


2.3 White’s Test

Model

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i, \quad i = 1, \dots, N \qquad (*)$$

Suppose it is suspected there may be heteroskedasticity but we are not sure of its functional form.

$H_0$: The conditional variance of $\varepsilon_i$ is constant.

$H_A$: The conditional variance of $\varepsilon_i$ is not constant.


Step 1: Estimate the equation (*) with OLS and obtain the residuals:

$$e_i = Y_i - \hat{Y}_i = Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}$$

Step 2: Regress the squared residuals on all explanatory variables, all cross-product terms, and the square of each explanatory variable:

$$e_i^2 = \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \alpha_3 X_{1i}^2 + \alpha_4 X_{2i}^2 + \alpha_5 X_{1i} X_{2i} + v_i$$


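Step 3: Test the overall significance of the equation in Step 2 (df = number of regressors in the Step 2 equation, excluding the intercept).

Statistic: $NR^2_{\text{White}} \sim \chi^2_{df}$ under $H_0$

Critical value: $cv = \chi^2_{df,\,\alpha}$

Reject the hypothesis of homoskedasticity if $NR^2_{\text{White}} > cv$.

Example 4: White test, US data (UE_Tab0301)

$$\widehat{e_i^2} = 1924 - 7.4\,\text{income}_i + 0.0088^{*}\,\text{income}_i^2$$

$R^2 = 0.3646$, $N = 40$, $NR^2 = 14.58$, $cv = \chi^2_{(2,\,0.01)} = 9.21$. Since 14.58 > 9.21, homoskedasticity is rejected at the 1% level.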
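For two regressors, the White-test steps can be sketched as below. `white_test`, the simulated data and the hard-coded critical value (taken from a standard chi-square table) are assumptions for illustration. The degrees of freedom count the slope terms of the Step 2 regression, which is five here; with the single regressor of Example 4, the auxiliary regression has only income and income², so df = 2.

```python
import numpy as np


def white_test(x1, x2, y):
    """White test for Y = b0 + b1*X1 + b2*X2 + e. Returns (N * R^2, df)."""
    x1, x2, y = (np.asarray(a, dtype=float) for a in (x1, x2, y))
    n = y.size
    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2                        # Step 1: squared OLS residuals
    # Step 2: regress e^2 on the regressors, their squares and their cross product
    Z = np.column_stack([np.ones(n), x1, x2, x1 ** 2, x2 ** 2, x1 * x2])
    a, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    u = e2 - Z @ a
    r2 = 1.0 - (u @ u) / np.sum((e2 - e2.mean()) ** 2)
    df = Z.shape[1] - 1                             # Step 3: count the slope terms only
    return n * r2, df


rng = np.random.default_rng(2)
x1 = rng.uniform(1.0, 5.0, 2_000)
x2 = rng.uniform(1.0, 5.0, 2_000)
y_het = 1.0 + x1 + x2 + x1 * rng.normal(size=x1.size)   # error sd grows with x1
y_hom = 1.0 + x1 + x2 + rng.normal(size=x1.size)        # constant error variance
CV_DF5_1PCT = 15.086   # chi-square critical value, df = 5, alpha = 0.01 (standard table)
for y in (y_het, y_hom):
    stat, df = white_test(x1, x2, y)
    print(round(stat, 2), df, "reject H0" if stat > CV_DF5_1PCT else "do not reject H0")
```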


Advantages of the White test:

a. It does not assume a specific functional form.

b. It is applicable when the variance depends on two or more variables.

Limitations of the White test:

a. It is a large-sample test.

b. It provides no information about the variance structure.

c. It loses many degrees of freedom when there are many regressors.

d. It cannot handle partitioned data.

e. It also captures specification errors.


3. Consequences of Heteroskedasticity

If heteroskedasticity appears but OLS is used for estimation, how are the OLS estimates affected?

Unaffected: OLS estimators are still linear and unbiased because, on average, overestimates are as likely as underestimates:

$$E(\hat\beta_k) = \beta_k, \quad k = 0, 1, \dots, K$$
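A small Monte Carlo experiment illustrates the unbiasedness claim: the errors below are heteroskedastic by construction, yet the OLS slope averages out to the true value. The design ($\sigma_i = X_i$, 2,000 replications, function name `mc_ols_slopes`) is an assumption chosen for illustration.

```python
import numpy as np


def mc_ols_slopes(n_reps=2_000, n=50, beta0=1.0, beta1=2.0, seed=0):
    """Monte Carlo OLS slopes when var(e_i | X_i) = X_i^2 (heteroskedastic errors)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(1.0, 10.0, n)           # regressor kept fixed across replications
    xd = x - x.mean()
    slopes = np.empty(n_reps)
    for r in range(n_reps):
        y = beta0 + beta1 * x + x * rng.normal(size=n)   # error sd grows with X
        slopes[r] = np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)
    return slopes


slopes = mc_ols_slopes()
print(slopes.mean())   # close to the true beta1 = 2.0
```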


3.1 OLS estimators are inefficient.

Because OLS weights every observation equally, some fluctuations of the error term are attributed to the variation in the independent variables.

There are other linear and unbiased estimators that have smaller variances than the OLS estimator.


3.2 Unreliable Hypothesis Testing

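$$\operatorname{var}_{\text{OLS}}(\hat\beta_k) \neq \operatorname{var}_{\text{hetero}}(\hat\beta_k)$$

⇒ $\operatorname{se}(\hat\beta_k)$ is biased

⇒ unreliable testing conclusions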
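A simulation shows why testing becomes unreliable: under heteroskedasticity the usual OLS standard-error formula no longer matches the actual sampling spread of $\hat\beta_1$. The design below ($\sigma_i = X_i^2$, function name `se_comparison`) is an illustrative assumption; in other designs the conventional standard error can instead be too large, so the direction of the bias is not general.

```python
import numpy as np


def se_comparison(n_reps=2_000, n=200, beta1=2.0, seed=3):
    """Compare the actual spread of the OLS slope with the average conventional OLS se.

    Errors are heteroskedastic with sd(e_i | X_i) = X_i^2.
    Returns (Monte Carlo sd of b1_hat, mean of the conventional se(b1_hat)).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(1.0, 10.0, n)
    xd = x - x.mean()
    sxx = np.sum(xd ** 2)
    slopes, ses = np.empty(n_reps), np.empty(n_reps)
    for r in range(n_reps):
        y = 1.0 + beta1 * x + x ** 2 * rng.normal(size=n)
        b1 = np.sum(xd * (y - y.mean())) / sxx
        b0 = y.mean() - b1 * x.mean()
        e = y - b0 - b1 * x
        slopes[r] = b1
        ses[r] = np.sqrt(np.sum(e ** 2) / (n - 2) / sxx)   # formula that assumes a constant sigma^2
    return slopes.std(ddof=1), ses.mean()


actual_sd, avg_ols_se = se_comparison()
print(actual_sd, avg_ols_se)   # in this design the conventional se is noticeably too small
```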


4. Remedies

4.1 Heteroskedasticity-Corrected Standard Errors

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

heteroskedasticity: $\operatorname{var}(\varepsilon_i) = \sigma_i^2$

OLS estimators are unbiased.

The standard errors of OLS are biased.


A heteroskedasticity-consistent (HC) standard error is the standard error of an estimated coefficient adjusted for heteroskedasticity.

a. HC standard errors are consistent for any type of heteroskedasticity.

b. Hypothesis tests are valid with HC standard errors in large samples.

c. Typically, HC se > OLS se


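Example 5:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad \operatorname{var}(\varepsilon_i \mid X_i) = \sigma_i^2$$

incorrect variance formula:

$$\operatorname{var}(\hat\beta_1) = \frac{\sigma^2}{\sum (X_i - \bar X)^2}$$

correct variance formula:

$$\operatorname{var}(\hat\beta_1) = \frac{\sum (X_i - \bar X)^2 \sigma_i^2}{\left[\sum (X_i - \bar X)^2\right]^2}$$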
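The two formulas in Example 5 can be compared numerically with a short NumPy sketch; the function names and numbers are hypothetical. When every $\sigma_i^2$ equals the same $\sigma^2$, the correct formula collapses to the incorrect one; otherwise they differ.

```python
import numpy as np


def slope_variance(x, sigma2_i):
    """Correct var(b1_hat) for Y = b0 + b1*X + e with var(e_i | X_i) = sigma2_i."""
    x = np.asarray(x, dtype=float)
    s2 = np.asarray(sigma2_i, dtype=float)
    xd2 = (x - x.mean()) ** 2
    return np.sum(xd2 * s2) / np.sum(xd2) ** 2


def slope_variance_homoskedastic(x, sigma2):
    """sigma^2 / sum (X_i - X_bar)^2: valid only when var(e_i) is the same for all i."""
    x = np.asarray(x, dtype=float)
    return sigma2 / np.sum((x - x.mean()) ** 2)


x = [1.0, 2.0, 3.0, 4.0]
print(slope_variance(x, [1.0, 1.0, 1.0, 1.0]), slope_variance_homoskedastic(x, 1.0))  # 0.2 0.2
# Variances 4, 1, 1, 4: the correct value is 0.74, while plugging the average
# variance 2.5 into the constant-variance formula gives 0.5
print(slope_variance(x, [4.0, 1.0, 1.0, 4.0]), slope_variance_homoskedastic(x, 2.5))
```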


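HC estimator of the variance of the slope coefficient in a simple regression model:

$$\widehat{\operatorname{var}}(\hat\beta_1) = \frac{\sum (X_i - \bar X)^2 e_i^2}{\left[\sum (X_i - \bar X)^2\right]^2}$$

Example 6: HC standard errors, US data (UE_Tab0301)

$$\widehat{\text{foodexp}}_i = 40.77 + 0.13^{***}\,\text{income}_i$$

OLS se: (22.14) (0.031)

HC se: (24.32) (0.039)

$R^2 = 0.3171$, $N = 40$.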
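The HC formula above can be sketched in NumPy, both directly and in the general matrix "sandwich" form, which gives the same slope value in a simple regression. This is the basic White (HC0) version; statistical packages often apply small-sample corrections (HC1 and others), so reported HC standard errors such as those in Example 6 may come from a slightly different variant. Function names and data are hypothetical.

```python
import numpy as np


def hc_se_slope(x, y):
    """HC (White, HC0) standard error of the OLS slope in a simple regression."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xd = x - x.mean()
    sxx = np.sum(xd ** 2)
    b1 = np.sum(xd * (y - y.mean())) / sxx
    e = y - (y.mean() - b1 * x.mean()) - b1 * x        # OLS residuals
    return np.sqrt(np.sum(xd ** 2 * e ** 2) / sxx ** 2)


def hc_se_matrix(X, y):
    """Matrix form: sqrt of diag[(X'X)^-1 X' diag(e^2) X (X'X)^-1]; X includes the intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    e = y - X @ (XtX_inv @ X.T @ y)                     # OLS residuals
    meat = X.T @ (X * (e ** 2)[:, None])
    return np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))


x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 8.0])
print(hc_se_slope(x, y))                                  # about 0.1435
print(hc_se_matrix(np.column_stack([np.ones(4), x]), y))  # second entry matches
```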


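4.2 Weighted Least Squares

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

$E(\varepsilon_i) = 0$, $\operatorname{cov}(\varepsilon_t, \varepsilon_s) = 0$ for $t \neq s$, $\operatorname{var}(\varepsilon_i) = \sigma_i^2$, with

$$\sigma_i^2 = c Z_i^2$$

The variance is assumed to be proportional to the square of $Z_i$.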


Step 1: Decide which variable is proportional to the heteroskedasticity.

Step 2: Divide all terms in the original model by that variable (divide by $Z_i$).


Step 3: Run least squares on the transformed model, which has new variables. Note that the transformed model has an intercept only if Z is one of the explanatory variables.

For example, if Zi = X2i, then


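Example 7: WLS, US data (UE_Tab0301), with $Z_i = \text{income}_i$

$$\widehat{\left(\frac{\text{foodexp}}{\text{income}}\right)}_i = 0.1577^{***} + 21.2858\,\frac{1}{\text{income}_i}$$

se: (0.02342) (14.0380)

$R^2 = 0.0570$, $N = 40$.

What are the values of the estimated coefficients of the original model?

Has the problem of heteroskedasticity been solved?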
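Dividing the food-expenditure model by income turns the coefficient on $1/\text{income}$ into the estimate of $\beta_0$ and the constant into the estimate of $\beta_1$. A NumPy sketch of this transformation follows; `wls_by_division` and the noise-free numbers are hypothetical, and the sketch assumes $\operatorname{var}(\varepsilon_i) = cZ_i^2$ as in Section 4.2.

```python
import numpy as np


def wls_by_division(X, y, z):
    """WLS assuming var(e_i) = c * Z_i^2: divide every term by Z_i and run OLS.

    X: regressors without an intercept, shape (N,) or (N, K).
    Returns estimates (b0, b1, ..., bK) of the ORIGINAL model's coefficients.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    Xs = np.column_stack([1.0 / z, X / z[:, None]])   # X0* = 1/Z, Xk* = Xk/Z
    ys = y / z                                          # Y* = Y/Z
    # No extra constant: b0 is the coefficient on 1/Z. If Z is one of the X's,
    # that X/Z column is all ones and plays the role of the intercept.
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta


income = np.array([200.0, 400.0, 600.0, 800.0, 1000.0])
foodexp = 21.0 + 0.16 * income                       # hypothetical, noise-free numbers
print(wls_by_division(income, foodexp, z=income))    # approximately [21.   0.16]
```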


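Comparing different estimates: US data (UE_Tab0301)

                 Intercept   Slope (income)
OLS estimate     40.77       0.128***
OLS se           22.14       0.031
HC se            24.32       0.039
WLS estimate     21.28       0.158***
WLS se           14.03       0.023

The WLS standard errors (14.03 and 0.023) are smaller than both the OLS and the HC standard errors, so the WLS estimates have improved upon those of OLS.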


Other possibilities

• var(i) = cZi

• var(i) = cZi

• var(i) = c(a1X1i + a2X2i)


In large samples, HC standard errors are consistent for any type of heteroskedasticity, so confidence intervals and t-tests based on them are valid.


4.3 Re-specifying the Regression Model

4.3.1 Use another functional form, e.g., double-log (less variation)

The heteroskedasticity may be impure.

Example 8: US data (UE_Tab0301)

$$\widehat{\ln(\text{foodexp}_i)} = 0.30 + 0.69^{***} \ln(\text{income}_i)$$

se: (0.90) (0.14)

$R^2 = 0.4014$, $N = 40$.

The hypothesis of constant variance can be rejected.
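The double-log model is still estimated by OLS, only on logged data. A minimal sketch follows (hypothetical function name, noise-free data). A multiplicative error, $y = A x^{\beta_1} e^{u}$, becomes additive after taking logs, which is one reason the log form often shows less variation in the residuals.

```python
import numpy as np


def double_log_fit(x, y):
    """OLS of ln(y) on ln(x). Returns (b0, b1); b1 is the elasticity of y with respect to x."""
    lx = np.log(np.asarray(x, dtype=float))
    ly = np.log(np.asarray(y, dtype=float))
    lxd = lx - lx.mean()
    b1 = np.sum(lxd * (ly - ly.mean())) / np.sum(lxd ** 2)
    b0 = ly.mean() - b1 * lx.mean()
    return b0, b1


# Hypothetical check: data generated exactly by ln(y) = 0.30 + 0.69 ln(x)
x = np.array([200.0, 400.0, 800.0, 1200.0])
y = np.exp(0.30) * x ** 0.69
print(double_log_fit(x, y))   # approximately (0.30, 0.69)
```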


Example 9: India data (Food_India55)

Empirical model: $\text{foodexp}_i = \beta_0 + \beta_1\,\text{totexp}_i + \varepsilon_i$

$$\widehat{\text{foodexp}}_i = 94.21^{**} + 0.44^{***}\,\text{totexp}_i$$

se: (50.86) (0.078)

$R^2 = 0.3698$, $N = 55$.

The hypothesis of homoskedasticity can be rejected by the Park and White tests.


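HC

$$\widehat{\text{foodexp}}_i = 94.21^{**} + 0.44^{***}\,\text{totexp}_i$$

OLS se: (50.86) (0.078)

HC se: (43.26) (0.074)

WLS

$$\widehat{\left(\frac{\text{foodexp}}{\text{totexp}}\right)}_i = 176.5439^{**}\,\frac{1}{\text{totexp}_i} + 0.4650^{***}$$

se: (37.9435) (0.0632)

Double-log

$$\widehat{\ln(\text{foodexp}_i)} = 1.15 + 0.74^{***} \ln(\text{totexp}_i)$$

se: (0.78) (0.12)

$R^2 = 0.4125$, $N = 55$.

Which model is the best?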


4.3.2 Other reformulations

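E.g., take averages of variables related to the size of the observed units (divide them by a size measure), or add more variables.

Example 10: Data set “Concert”: the concert tour of a singer in the US

$$\text{revenue} = \beta_0 + \beta_1\,\text{adv} + \beta_2\,\text{stad} + \beta_3\,\text{cd} + \beta_4\,\text{radio} + \beta_5\,\text{weekend} + \varepsilon$$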


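(1) Dependent variable revenue: estimated coefficients 73 (constant), 3.15 (adv), 34.66 (stad), 8.30 (cd), 300 (radio), 356 (weekend).

(2) All terms divided by stad, dependent variable revenue/stad: estimated coefficients 81 (1/stad), 2.10 (adv/stad), 50.20 (constant, i.e., the coefficient of stad), 7.53 (cd/stad), 176 (radio/stad), 293 (weekend/stad).

(3) Dependent variable revenue/pop: estimated coefficients 22 (constant), 2.21 (adv/pop), 109 (stad/pop), 7.93 (cd/pop), 2.53 (radio), 4.28 (weekend).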


Remarks:

• The variable Z is difficult to identify, and the functional relationship between the error variance and Z is not known. Use WLS as a last resort.

• With a correctly specified WLS, we expect the standard errors of the regression coefficients to be smaller than their OLS counterparts.

•A log transformation usually reduces the degree of heteroskedasticity.

•The hypothesis of homoskedasticity should not be rejected in the new model.


5. A Complete Example

Sources: Section 8.2.2 (pp. 255–256); Section 10.5 (pp. 369–376)

Empirical regression model:

$$\text{pcon}_i = \beta_0 + \beta_1\,\text{reg}_i + \beta_2\,\text{tax}_i + \beta_3\,\text{uhm}_i + \varepsilon_i$$

$\text{pcon}_i$: petroleum consumption in the ith state

$\text{reg}_i$: motor vehicle registrations in the ith state (‘000)

$\text{tax}_i$: the gasoline tax rate in the ith state (cents per gallon)

$\text{uhm}_i$: urban highway miles within the ith state


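Equation 1

$$\widehat{\text{pcon}}_i = 389.57^{***} - 0.061\,\text{reg}_i - 36.47^{***}\,\text{tax}_i + 60.76^{***}\,\text{uhm}_i$$

se, VIF: (0.04, 24.3) (13.15, 1.1) (10.26, 24.9)

Adj. $R^2 = 0.9192$, $N = 50$.

Equation 2

$$\widehat{\text{pcon}}_i = 551.69^{***} + 0.19^{***}\,\text{reg}_i - 53.59^{***}\,\text{tax}_i$$

se: (0.012) (16.86)

Adj. $R^2 = 0.8607$, $N = 50$.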


Graphical investigation

[Figure: residuals plotted against REG]


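Park test

$$\widehat{\ln(e_i^2)} = 1.65 + 0.95^{***}\ln(\text{REG}_i)$$

se: (0.3083); $R^2 = 0.1657$, $N = 50$.

White test

$$\widehat{e_i^2} = 11{,}098{,}291 + 140\,\text{REG}_i - 0.0005\,\text{REG}_i^2 - 12.84\,\text{REG}_i\text{TAX}_i - 237{,}873\,\text{TAX}_i + 12{,}347\,\text{TAX}_i^2$$

$R^2 = 0.6645$, $N = 50$, $NR^2 = 33.22$.

Checking for other specifications: double-log, quadratic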
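The test statistics above can be checked with a few lines of arithmetic. The 1% critical value 15.086 for a chi-square with 5 degrees of freedom is taken from a standard table and is not stated in the slides; df = 5 counts the slope terms REG, REG², REG·TAX, TAX and TAX² of the White regression.

```python
# Park test on REG: t statistic = coefficient / standard error
park_t = 0.95 / 0.3083
# White test: N * R^2, with df = 5 slope terms (REG, REG^2, REG*TAX, TAX, TAX^2)
white_stat = 50 * 0.6645
CV_DF5_1PCT = 15.086          # chi-square critical value, df = 5, alpha = 0.01 (standard table)

print(round(park_t, 2))            # 3.08
print(f"{white_stat:.3f}")         # 33.225, reported above as 33.22
print(white_stat > CV_DF5_1PCT)    # True: homoskedasticity is rejected at the 1% level
```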


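(4) Per-capita specification

$$\widehat{\left(\frac{\text{pcon}}{\text{pop}}\right)}_i = 0.1684^{**} + 0.1082\,\frac{\text{reg}_i}{\text{pop}_i} - 0.0103^{***}\,\text{tax}_i$$

se: (0.07159) (0.00349); $R^2 = 0.1989$, $N = 50$.

(5) Equation 2 with HC standard errors

$$\widehat{\text{pcon}}_i = 551.69^{***} + 0.19^{***}\,\text{reg}_i - 53.59^{***}\,\text{tax}_i$$

HC se: (0.022) (23.90); $R^2 = 0.8664$, $N = 50$.

(6) WLS with $Z_i = \text{reg}_i$

$$\widehat{\left(\frac{\text{pcon}}{\text{reg}}\right)}_i = 218.1539^{***}\,\frac{1}{\text{reg}_i} + 0.1678^{***} - 17.3890^{***}\,\frac{\text{tax}_i}{\text{reg}_i}$$

se: (48.1033) (0.01367) (4.6822); $R^2 = 0.3600$, $N = 50$.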


Selected Exercises

Ch. 10: Q. 1, 3, 4, 5, 8, 10, 12, 14


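Regression model:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

zero mean: $E(\varepsilon_i \mid X_{1i}, X_{2i}) = 0$

homoskedasticity: $\operatorname{var}(\varepsilon_i \mid X_{1i}, X_{2i}) = \sigma^2$

no autocorrelation: $\operatorname{cov}(\varepsilon_i, \varepsilon_j \mid X_{1i}, X_{2i}, X_{1j}, X_{2j}) = 0, \quad i \neq j$

heteroskedasticity: $\operatorname{var}(\varepsilon_i \mid X_{1i}, X_{2i}) = \sigma_i^2$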


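[Figure: Conditional distribution f(Y) at X1, X2, X3 under heteroskedasticity]

Heteroskedasticity: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $\operatorname{var}(\varepsilon_i \mid X_i) = \sigma_i^2$, which can differ across observations i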




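Step 1: Decide which variable is proportional to the heteroskedasticity.

Step 2: Divide all terms in the original model by that variable (divide by $Z_i$):

$$\frac{Y_i}{Z_i} = \beta_0\frac{1}{Z_i} + \beta_1\frac{X_{1i}}{Z_i} + \beta_2\frac{X_{2i}}{Z_i} + \frac{\varepsilon_i}{Z_i}$$

$$Y_i^* = \beta_0 X_{0i}^* + \beta_1 X_{1i}^* + \beta_2 X_{2i}^* + \varepsilon_i^*$$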


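Step 3: Run least squares on the transformed model, which has new variables. Note that the transformed model has an intercept only if Z is one of the explanatory variables.

For example, if $Z_i = X_{2i}$, then

$$\frac{Y_i}{Z_i} = \beta_0\frac{1}{Z_i} + \beta_1\frac{X_{1i}}{Z_i} + \beta_2 + \frac{\varepsilon_i}{Z_i}$$

$$Y_i^* = \beta_0 X_{0i}^* + \beta_1 X_{1i}^* + \beta_2 + \varepsilon_i^*$$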


In large samples, HC standard errors are consistent for any type of heteroskedasticity, so confidence intervals and t-tests based on them are valid.

                                     WLS    HC se's
Improves the OLS estimator           Yes    No
Improves the standard errors         Yes    Yes
Requires a specific variance form    Yes    No
Requires a large sample              No     Yes
