Heteroskedasticity Instructor: G. William Schwertschwert.ssb.rochester.edu/a425/a425_het.pdf ·...

Heteroskedasticity APS 425 - Advanced Managerial Data Analysis

(c) Prof. G. William Schwert, 2001-2015

APS 425 Fall 2015

Heteroskedasticity

Instructor: G. William Schwert, 585-275-2470

[email protected]

Heteroskedasticity (Nonconstant Variance of the Errors)

• Recall assumption 5:

– Homoskedasticity: var($e_i$) = constant

– That means the variance of $e_i$ is the same for all observations in the sample, and thus the variance of $Y_i$ is the same for all observations in the sample

– The uncertainty in $Y_i$ is the same when $X_i$ is small as when $X_i$ is large

• When you have heteroskedasticity, the spread of the dependent variable Y could depend on the value of X, for example

• Some observations are inherently less influenced by unmeasured factors



Heteroskedasticity

• Graphical example: [Figure: scatter plot of Y against X; one axis runs from 0 to 25, the other from 0 to 15]

• Appears that there is more dispersion among the Y-values when X is larger

Heteroskedasticity

• Example: database with 249 small to medium sized companies, containing both employee and sales information for the year 2000

• SALES = total company sales in $1000

• EMPLOYEES = number of FTEs employed by the company

• Model: $\mathrm{SALES}_i = \beta_0 + \beta_1\,\mathrm{EMPLOYEES}_i + e_i$



Heteroskedasticity & Eviews





• Look only at this part:

• Consider the p-value for the F-statistic

• The null hypothesis for the White test is Homoskedasticity

• If we fail to reject the null hypothesis, we proceed as if the errors are homoskedastic

• If we reject the null hypothesis, we conclude that there is heteroskedasticity

• Significance level of 5% is commonly used for this test

• Conclusion: REJECT, so assume heteroskedasticity
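The White test steps above can be sketched outside Eviews. The block below is a minimal illustration in plain Python, under stated assumptions: the data are simulated (they are not the course's sales data), the helper names `solve`, `ols_fit`, and `white_test_lm` are hypothetical, and the statistic computed is the n·R² (chi-square) version of the test rather than the F-statistic discussed on the slide. With one regressor, the auxiliary regression of the squared residuals uses a constant, x, and x², so the statistic has 2 degrees of freedom.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols_fit(rows, y):
    """Return (coefficients, residuals, R-squared) for OLS of y on the columns in rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1.0 - sum(e * e for e in resid) / sst
    return beta, resid, r2

def white_test_lm(x, y):
    """White test for y = b0 + b1*x + e: regress e^2 on (1, x, x^2); LM = n * R^2."""
    _, resid, _ = ols_fit([[1.0, xi] for xi in x], y)
    e2 = [e * e for e in resid]
    _, _, r2_aux = ols_fit([[1.0, xi, xi * xi] for xi in x], e2)
    lm = len(x) * r2_aux
    p_value = math.exp(-lm / 2.0)  # chi-square survival function with 2 df
    return lm, p_value

random.seed(0)
x = [random.uniform(1.0, 15.0) for _ in range(249)]
# Simulated errors whose standard deviation grows with x (heteroskedastic by construction)
y = [2.0 + 1.5 * xi + random.gauss(0.0, 0.5 * xi) for xi in x]
lm, p = white_test_lm(x, y)
print(f"LM = {lm:.2f}, p-value = {p:.4f}")
```

A p-value below the chosen significance level (5% on the slide) leads to rejecting homoskedasticity, which is what should happen here because the simulated error spread grows with x.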


• How to tell Eviews to assume heteroskedasticity:

– Click on the Estimate button at the top of the Equation window

– Click on the Options button in the Equation Specification window

– Check the Heteroskedasticity checkbox




Indicates heteroskedasticity was assumed by Eviews in these results


• Since there is heteroskedasticity,

– Estimators ($b_0$ and $b_1$) for both sets of results are unbiased and consistent

– Standard errors in the standard results are WRONG (i.e., incorrect)

– Standard errors in the White results are correct

– Estimators ($b_0$ and $b_1$) are not efficient (i.e., they don’t have minimum standard errors), but this is the best we can do unless we know the precise nature of the heteroskedasticity
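To make the difference between the two sets of standard errors concrete, here is a minimal sketch in plain Python. It assumes the simplest White estimator (often called HC0), which may differ from what Eviews reports by a degrees-of-freedom adjustment. The function name `ols_with_standard_errors` and the simulated data are hypothetical. The coefficient estimate is the same in both cases; only the standard error attached to it changes.

```python
import math
import random

def ols_with_standard_errors(x, y):
    """OLS of y on a constant and x; returns slope, conventional SE, White (HC0) SE."""
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    # Inverse of X'X for X = [1, x]
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Conventional variance: s^2 (X'X)^-1, valid only under homoskedasticity
    s2 = sum(e * e for e in resid) / (n - 2)
    se_conventional = math.sqrt(s2 * inv[1][1])
    # White (HC0) variance: (X'X)^-1 [sum of e_i^2 x_i x_i'] (X'X)^-1
    m00 = sum(e * e for e in resid)
    m01 = sum(e * e * xi for e, xi in zip(resid, x))
    m11 = sum(e * e * xi * xi for e, xi in zip(resid, x))
    row = inv[1]  # second row of (X'X)^-1 picks out the slope
    var_white = (row[0] * (row[0] * m00 + row[1] * m01)
                 + row[1] * (row[0] * m01 + row[1] * m11))
    return b1, se_conventional, math.sqrt(var_white)

random.seed(0)
# Simulated right-skewed "firm sizes" and errors whose SD is proportional to x
x = [random.lognormvariate(3.0, 1.0) for _ in range(249)]
y = [5.0 + 1.5 * xi + random.gauss(0.0, 0.5 * xi) for xi in x]
b1, se_ols, se_white = ols_with_standard_errors(x, y)
print(f"slope = {b1:.3f}, conventional SE = {se_ols:.4f}, White SE = {se_white:.4f}")
```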



Weighted Least Squares

• Up to this point we have merely been correcting OLS estimates for the bias in the estimated standard errors (and t-statistics)

• We can also get better estimators of the coefficients if we can correct for the heteroskedasticity

=> Weighted Least Squares


Example: Suppose that you have a regression of sales on employees (with no constant):

$\mathrm{SALES}_i = \beta_1\,\mathrm{EMPLOYEES}_i + e_i$

It looks like the variance of the errors is going to be positively related to the level of employees



White Test Confirms Heteroskedasticity

It looks like there is significant heteroskedasticity in the residuals from this regression model

Heteroskedasticity-consistent t-stats are about 2/3 the size of those from the “raw model”


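Consider three possible hypotheses:

(1) $\mathrm{Var}(e_i) = \sigma^2$ => $\mathrm{SD}(e_i) = \sigma$

(2) $\mathrm{Var}(e_i) = \sigma^2\,\mathrm{EMPLOYEES}_i$ => $\mathrm{SD}(e_i) = \sigma\,\mathrm{EMPLOYEES}_i^{1/2}$

(3) $\mathrm{Var}(e_i) = \sigma^2\,\mathrm{EMPLOYEES}_i^2$ => $\mathrm{SD}(e_i) = \sigma\,\mathrm{EMPLOYEES}_i$

WLS would imply dividing the equation by the appropriate variable so that the transformed residual has constant standard deviation and variance:

(1) $\mathrm{SALES}_i = \beta_1\,\mathrm{EMPLOYEES}_i + e_i$

(2) $(\mathrm{SALES}_i / \mathrm{EMPLOYEES}_i^{1/2}) = \beta_1\,\mathrm{EMPLOYEES}_i^{1/2} + e_i$

(3) $(\mathrm{SALES}_i / \mathrm{EMPLOYEES}_i) = \beta_1 + e_i$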
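A short worked check of why these divisions help, treating EMPLOYEES as fixed for each observation. The error term written as e in the transformed equations (2) and (3) is really the original error divided by the same variable as the rest of the equation, and under the matching hypothesis its variance no longer depends on EMPLOYEES:

```latex
\begin{align*}
\text{Hypothesis (2):}\quad
\mathrm{Var}\!\left(\frac{e_i}{\mathrm{EMPLOYEES}_i^{1/2}}\right)
  &= \frac{\mathrm{Var}(e_i)}{\mathrm{EMPLOYEES}_i}
   = \frac{\sigma^2\,\mathrm{EMPLOYEES}_i}{\mathrm{EMPLOYEES}_i}
   = \sigma^2 \\[4pt]
\text{Hypothesis (3):}\quad
\mathrm{Var}\!\left(\frac{e_i}{\mathrm{EMPLOYEES}_i}\right)
  &= \frac{\mathrm{Var}(e_i)}{\mathrm{EMPLOYEES}_i^{2}}
   = \frac{\sigma^2\,\mathrm{EMPLOYEES}_i^{2}}{\mathrm{EMPLOYEES}_i^{2}}
   = \sigma^2
\end{align*}
```

If the wrong hypothesis is used, the transformed error still has a variance that depends on EMPLOYEES, which is why the diagnostics for each transformed model are worth checking.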



WLS: (2) $(\mathrm{SALES}_i / \mathrm{EMPLOYEES}_i^{1/2}) = \beta_1\,\mathrm{EMPLOYEES}_i^{1/2} + e_i$

It looks like there may still be a little bit of heteroskedasticity, but this specification is better than the “raw model”: the F-statistic is half as big as for the levels regression

Note that the t-stat for the slope is about 25% bigger, because WLS is more efficient

WLS in Eviews

In the estimation I tell Eviews to use 1/SQR(EMPLOYEES) as the weights and you get exactly the same results as when we did the regression manually, above

Note that Eviews also gives you summary statistics in terms of the unweighted/raw data
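The equivalence between the weighted and the manual regression can be sketched in plain Python. This is an illustration under assumptions: the data are simulated, the function names are hypothetical, and the weights are taken to multiply each observation before least squares is run, which is what the matching results described above imply.

```python
import random

def slope_no_constant(x, y):
    """OLS slope for y = b1 * x + e with no constant term."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def weighted_slope_no_constant(x, y, w):
    """WLS slope: multiply each observation by its weight w_i, then run OLS."""
    xw = [wi * xi for wi, xi in zip(w, x)]
    yw = [wi * yi for wi, yi in zip(w, y)]
    return slope_no_constant(xw, yw)

random.seed(1)
employees = [random.uniform(5.0, 500.0) for _ in range(249)]
# Simulated sales under hypothesis (2): error SD proportional to sqrt(employees)
sales = [100.0 * n + random.gauss(0.0, 200.0 * n ** 0.5) for n in employees]

# Manual approach: divide both sides by sqrt(employees), then run OLS
y_star = [s / n ** 0.5 for s, n in zip(sales, employees)]
x_star = [n ** 0.5 for n in employees]
b_manual = slope_no_constant(x_star, y_star)

# Weighted approach: raw data plus weights 1/sqrt(employees)
weights = [1.0 / n ** 0.5 for n in employees]
b_weighted = weighted_slope_no_constant(employees, sales, weights)

print(f"manual = {b_manual:.6f}, weighted = {b_weighted:.6f}")
```

The two slopes agree up to floating-point rounding, because multiplying both sides of the equation by the weight is the same operation as the manual division.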



WLS: (3) $(\mathrm{SALES}_i / \mathrm{EMPLOYEES}_i) = \beta_1 + e_i$

Note that you can’t do a White test here because there are no regressors

Also, you can’t compare the $R^2$ statistics across these models because the dependent variables are different

WLS in Eviews

In this case, I tell Eviews to use 1/EMPLOYEES as the weights and you get exactly the same results as when we did the regression manually, above



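WLS: Different Estimators of $\beta_1$

OLS: $b_1 = \sum x_i y_i / \sum x_i^2$ => [the usual slope of the regression line]

WLS: $b_1 = \sum x_i^{1/2} (y_i / x_i^{1/2}) / \sum [x_i^{1/2}]^2$ [if $\mathrm{SD}(e_i) = \sigma\,\mathrm{EMPLOYEES}_i^{1/2}$]

$= \sum y_i / \sum x_i = \bar{y} / \bar{x}$ => [the ratio of the sample means]

WLS: $b_1 = \sum 1 \cdot (y_i / x_i) / \sum [1]^2$ [if $\mathrm{SD}(e_i) = \sigma\,\mathrm{EMPLOYEES}_i$]

$= \sum (y_i / x_i) / N = \overline{(y / x)}$ => [the sample mean of the ratio]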
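The three estimators can be checked numerically. The sketch below uses a tiny made-up data set (four hypothetical firms, with x standing for EMPLOYEES and y for SALES) and a hypothetical helper `slope_no_constant`. It runs the transformed regressions and compares them with the closed forms: the ratio of the sample means and the sample mean of the ratios.

```python
def slope_no_constant(x, y):
    """OLS slope for y = b1 * x + e with no constant term."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Made-up illustrative data: x = employees, y = sales
x = [10.0, 20.0, 40.0, 80.0]
y = [1200.0, 1900.0, 4500.0, 7600.0]

# Model (1): OLS on the raw data
b_ols = slope_no_constant(x, y)

# Model (2): divide both sides by sqrt(x), then OLS
b_wls_sqrt = slope_no_constant([xi ** 0.5 for xi in x],
                               [yi / xi ** 0.5 for xi, yi in zip(x, y)])

# Model (3): divide both sides by x, so the regressor is a column of ones
b_wls_level = slope_no_constant([1.0 for _ in x],
                                [yi / xi for xi, yi in zip(x, y)])

ratio_of_means = sum(y) / sum(x)
mean_of_ratios = sum(yi / xi for xi, yi in zip(x, y)) / len(x)

print(b_ols, b_wls_sqrt, ratio_of_means, b_wls_level, mean_of_ratios)
```

The three estimates differ because each scheme weights the large firms differently: OLS leans most heavily on them, while the mean of the ratios treats every firm equally.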

WLS: From Eviews Manual



WLS: Some Diagnostics

• One of the assumptions we make when estimating a regression using least squares is that the errors are Normally distributed (with constant mean and variance)

• If the variance is not constant, one thing you will generally see is a “fat-tailed” histogram; i.e., kurtosis > 3, and outliers

• Thus, we can use the histogram of the residuals as a further diagnostic for whether we have “fixed” the heteroskedasticity problem
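The histogram diagnostics mentioned above (skewness, kurtosis, and the Jarque-Bera statistic) can be computed directly. This is a minimal sketch in plain Python with simulated errors; the function name `jarque_bera` is hypothetical, and the p-value uses the fact that a chi-square variable with 2 degrees of freedom has survival function exp(-x/2).

```python
import math
import random

def jarque_bera(resid):
    """Return (skewness, kurtosis, JB statistic, p-value) for a list of residuals."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((e - mean) ** 2 for e in resid) / n
    m3 = sum((e - mean) ** 3 for e in resid) / n
    m4 = sum((e - mean) ** 4 for e in resid) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2          # equals 3 for a Normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-square survival function with 2 df
    return skew, kurt, jb, p_value

random.seed(3)
x = [random.uniform(1.0, 15.0) for _ in range(5000)]
# Normal errors with SD proportional to x: pooled together they look fat-tailed
raw = [random.gauss(0.0, xi) for xi in x]
# Dividing by the variable that drives the SD restores constant variance
scaled = [e / xi for e, xi in zip(raw, x)]

skew_raw, kurt_raw, jb_raw, p_raw = jarque_bera(raw)
skew_s, kurt_s, jb_s, p_s = jarque_bera(scaled)
print(f"raw:    kurtosis = {kurt_raw:.2f}, JB p-value = {p_raw:.4f}")
print(f"scaled: kurtosis = {kurt_s:.2f}, JB p-value = {p_s:.4f}")
```

Even though every simulated error is Normal, mixing different variances pushes the pooled kurtosis above 3, which is the pattern the slide describes.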

Histogram from OLS Model (1)

• Note that the kurtosis is large, the Jarque-Bera statistic (testing Normality) has a p-value of 0, and there are outliers in the histogram

• Also note that the plot of the residuals shows erratic spikes (mostly associated with larger firms)



Histogram from WLS Model (2)

• Note that the kurtosis is smaller than in the OLS model, but the Jarque-Bera statistic still has a p-value of 0, and there are outliers in the histogram

• Also note that the plot of the standardized residuals still shows erratic spikes (mostly associated with larger firms)

Histogram from WLS Model (3)

• Note that the kurtosis is close to 3, and the Jarque-Bera statistic has a p-value of 0.11, implying that the data are consistent with a Normal distribution (and constant variance)

• Also note that the plot of the standardized residuals looks much more regular in its spread

• Thus, this model seems best



More General Approach to WLS

• Sometimes it will not be obvious how to use a single independent variable to create appropriate weights

• This is a more data-driven approach

• Start with OLS model (I am going to put the constant term back in), then create a new variable for the absolute value of the residuals, ABSRES

Forecast Residual Standard Deviation Using $\mathrm{EMPLOYEES}_i^{1/2}$

• Next, I am going to regress the absolute value of the residuals from the OLS regression against several functions of the independent variable

– but I could use any variable or combination of variables here if I thought they could explain the heteroskedasticity in the original OLS regression

– Start with $\mathrm{EMPLOYEES}_i^{1/2}$ and then create forecasts of residual standard deviation from this model, ABSRESF1
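The data-driven procedure can be sketched end to end in plain Python. Several details here are assumptions rather than the slides' exact setup: the data are simulated, the auxiliary regression of the absolute residuals includes a constant, the weights are the reciprocals of the forecasts, and `fit_line` is a hypothetical helper. The variable names `absres` and `absresf1` mirror ABSRES and ABSRESF1.

```python
import random

def fit_line(x, y, w=None):
    """Least squares of y on a constant and x; each observation is multiplied by w_i."""
    w = w or [1.0] * len(x)
    w2 = [wi * wi for wi in w]
    s = sum(w2)
    sx = sum(q * xi for q, xi in zip(w2, x))
    sy = sum(q * yi for q, yi in zip(w2, y))
    sxx = sum(q * xi * xi for q, xi in zip(w2, x))
    sxy = sum(q * xi * yi for q, xi, yi in zip(w2, x, y))
    b1 = (s * sxy - sx * sy) / (s * sxx - sx * sx)
    b0 = (sy - b1 * sx) / s
    return b0, b1

random.seed(2)
employees = [random.uniform(50.0, 500.0) for _ in range(1000)]
# Simulated sales with error SD proportional to sqrt(employees)
sales = [50.0 + 100.0 * n + random.gauss(0.0, 20.0 * n ** 0.5) for n in employees]

# Step 1: OLS with a constant, then absolute residuals (ABSRES)
b0, b1 = fit_line(employees, sales)
absres = [abs(s - b0 - b1 * n) for s, n in zip(sales, employees)]

# Step 2: regress ABSRES on sqrt(EMPLOYEES) and forecast the residual spread (ABSRESF1)
root_n = [n ** 0.5 for n in employees]
a0, a1 = fit_line(root_n, absres)
absresf1 = [a0 + a1 * r for r in root_n]

# Step 3: WLS with weights 1/ABSRESF1 (forecasts must be positive for this to make sense)
weights = [1.0 / f for f in absresf1]
b0_wls, b1_wls = fit_line(employees, sales, weights)
print(f"OLS slope = {b1:.3f}, WLS slope = {b1_wls:.3f}")
```

In practice it is worth checking that every forecast is positive before inverting it, since a linear forecast of the absolute residual can dip below zero for small firms.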





• This looks a lot like the earlier WLS results, but note that the constant term now looks significant

Histogram from WLS Forecast of Residual Standard Deviation Using $\mathrm{EMPLOYEES}_i^{1/2}$

• The kurtosis is about 4.8, the Jarque-Bera statistic has a p-value of 0, and there are outliers in the histogram

• Similar to what we saw in the residuals from Model (2) – which also relied on a relationship with the square root of EMPLOYEES




• Next, I use $\mathrm{EMPLOYEES}_i$ and then create forecasts of residual standard deviation from this model, ABSRESF2


• In this case, the constant term is not significantly different from 0.



Histogram from WLS Forecast of Residual Standard Deviation Using $\mathrm{EMPLOYEES}_i$

• The kurtosis is about 3.4, and the Jarque-Bera statistic has a p-value of 0.094, which suggests that Normality (with constant variance) is a reasonable assumption

• Similar to what we saw in the residuals from Model (3) – which also relied on a relationship with the level of EMPLOYEES

Try Using Logs

• Often people find that using log transformations helps to solve heteroskedasticity problems

• Here is the log-log scatter diagram

• Looks like it might work, let’s see . . .



Try Using Logs

• A 1% increase in employees is associated with a 1% increase in sales

• Big negative outlier (obs # 40)
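The percentage interpretation comes from the slope of a regression of log(SALES) on log(EMPLOYEES), which is an elasticity. The following is a minimal sketch in plain Python with simulated data. The multiplicative error and the true elasticity of 1 are assumptions built into the simulation, not estimates from the course data set, and `fit_line` is a hypothetical helper.

```python
import math
import random

def fit_line(x, y):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

random.seed(4)
# Simulated right-skewed firm sizes
employees = [random.lognormvariate(4.0, 1.0) for _ in range(249)]
# Multiplicative errors: in levels the spread grows with firm size,
# but in logs the error has constant variance
sales = [100.0 * n * math.exp(random.gauss(0.0, 0.3)) for n in employees]

log_emp = [math.log(n) for n in employees]
log_sales = [math.log(s) for s in sales]
b0, b1 = fit_line(log_emp, log_sales)
print(f"elasticity estimate = {b1:.3f}")
```

A slope near 1 means sales rise roughly in proportion to employees. Taking logs works here because the simulated error is proportional to firm size, which is the same situation as hypothesis (3).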

Try Using Logs

• Omitting obs # 40 reduces skewness & kurtosis

• Seems inferior to WLS model 3



Links

Sales Data: http://schwert.ssb.rochester.edu/a425/a425_sales.wf1

Return to APS 425 Home Page: http://schwert.ssb.rochester.edu/a425/a425main.htm
