Heteroskedasticity. Instructor: G. William Schwert. Source: schwert.ssb.rochester.edu/a425/a425_het.pdf
Heteroskedasticity APS 425 - Advanced Managerial Data Analysis
(c) Prof. G. William Schwert, 2001-2015 1
APS 425 Fall 2015
Heteroskedasticity
Instructor: G. William Schwert, 585-275-2470
Heteroskedasticity (Nonconstant Variance of the Errors)
• Recall assumption 5:
– Homoskedasticity: $\mathrm{var}(e_i)$ = constant
– That means the variance of $e_i$ is the same for all observations in the sample, and thus the variance of $Y_i$ is the same for all observations in the sample
– The uncertainty in $Y_i$ is the same when $X_i$ is small as when $X_i$ is large
• When you have heteroskedasticity, the spread of the dependent variable Y could depend, for example, on the value of X
• Some observations are inherently less influenced by unmeasured factors
Heteroskedasticity
• Graphical example:
• Appears that there is more dispersion among the Y-values when X is larger
[Figure: scatter plot of Y (vertical axis, 0 to 25) against X (horizontal axis, 0 to 15)]
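The pattern in the scatter plot can be reproduced with simulated data. The sketch below is only an illustration under assumed values: the intercept, slope, error scale, sample size, and the helper names `simulate` and `error_spread` are hypothetical and do not come from the course data. It draws errors whose standard deviation is proportional to X and compares the spread of the errors for small and large X.

```python
import random
import statistics

def simulate(n=2000, seed=0):
    """Draw (x, y) pairs where the error SD grows in proportion to x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(1.0, 15.0)
        e = rng.gauss(0.0, 0.5 * x)          # heteroskedastic: SD(e_i) = 0.5 * x_i
        data.append((x, 1.0 + 1.5 * x + e))  # assumed true line: y = 1 + 1.5 x
    return data

def error_spread(data, lo, hi):
    """Sample SD of y around the assumed true line for lo <= x < hi."""
    errors = [y - (1.0 + 1.5 * x) for x, y in data if lo <= x < hi]
    return statistics.stdev(errors)

data = simulate()
sd_small_x = error_spread(data, 1.0, 5.0)
sd_large_x = error_spread(data, 11.0, 15.0)
print(f"SD of errors for small X: {sd_small_x:.2f}")
print(f"SD of errors for large X: {sd_large_x:.2f}")
```

Under homoskedasticity the two printed values would be roughly equal; here the second is several times the first.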
Heteroskedasticity
• Example: database with 249 small- to medium-sized companies, containing both employee and sales information for the year 2000
• SALES = total company sales in $1000
• EMPLOYEES = number of FTEs employed by the company
• Model: $\mathrm{SALES}_i = \beta_0 + \beta_1\,\mathrm{EMPLOYEES}_i + e_i$
Heteroskedasticity & Eviews
Heteroskedasticity & Eviews
Heteroskedasticity & Eviews
• Look only at this part:
• Consider the p-value for the F-statistic
• The null hypothesis for the White test is Homoskedasticity
• If we fail to reject the null hypothesis, we proceed as if the errors are homoskedastic
• If we reject the null hypothesis, we conclude that there is heteroskedasticity
• Significance level of 5% is commonly used for this test
• Conclusion: REJECT, so assume heteroskedasticity
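For readers without Eviews, the logic of the White test can be sketched in plain Python. This is a minimal illustration, not the Eviews implementation: it uses the n·R² (LM) form of the test rather than the F-statistic discussed above, the data are simulated with hypothetical parameter values, and the helper names `solve`, `ols`, and `white_test` are made up for this example. The squared OLS residuals are regressed on a constant, x, and x²; with two slope terms, n·R² is compared with a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exactly exp(-stat/2).

```python
import math
import random

def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def ols(rows, y):
    """Least-squares fit; returns (coefficients, residuals, R-squared)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    ybar = sum(y) / len(y)
    r2 = 1.0 - sum(e * e for e in resid) / sum((yi - ybar) ** 2 for yi in y)
    return beta, resid, r2

def white_test(x, y):
    """White test for y = b0 + b1*x + e in LM form: n * R^2 ~ chi-square(2)."""
    _, resid, _ = ols([[1.0, xi] for xi in x], y)
    aux_rows = [[1.0, xi, xi * xi] for xi in x]      # constant, x, x squared
    _, _, r2 = ols(aux_rows, [e * e for e in resid])
    lm = len(x) * r2
    p_value = math.exp(-lm / 2.0)   # exact chi-square(2) upper-tail probability
    return lm, p_value

# Hypothetical simulated data: the error SD grows with x
rng = random.Random(425)
x = [rng.uniform(1.0, 15.0) for _ in range(500)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]
lm, p_value = white_test(x, y)
print(f"n*R^2 = {lm:.1f}, p-value = {p_value:.4g}")
print("REJECT homoskedasticity" if p_value < 0.05 else "fail to reject")
```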
Heteroskedasticity & Eviews
• How to tell Eviews to assume Heteroskedasticity:
– Click on the Estimate button at the top of the Equation window
– Click on the Options button in the Equation Specification window
– Check the Heteroskedasticity checkbox
Heteroskedasticity & Eviews
Indicates heteroskedasticity was assumed by Eviews in these results
Heteroskedasticity & Eviews
• Since there is heteroskedasticity,
– Estimators ($b_0$ and $b_1$) for both sets of results are unbiased and consistent
– Standard errors in standard results are WRONG (i.e., incorrect)
– Standard errors in White results are correct (valid in large samples)
– Estimators ($b_0$ and $b_1$) are not efficient (i.e., they don’t have minimum standard errors), but this is the best we can do unless we know the precise nature of the heteroskedasticity
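The difference between the two sets of standard errors can be sketched for a one-regressor model. The code below is an illustration under assumptions, not a reproduction of the Eviews output: it uses the basic White (HC0) formula without any small-sample adjustment, the data are simulated with hypothetical parameters, and the function name `slope_standard_errors` is made up. The coefficient estimate is identical in both cases; only the standard error changes.

```python
import math
import random

def slope_standard_errors(x, y):
    """Fit y = b0 + b1*x by OLS; return b1 with conventional and White (HC0) SEs."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    b1 = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Conventional formula: assumes one common error variance s^2
    s2 = sum(e * e for e in resid) / (n - 2)
    se_conventional = math.sqrt(s2 / sxx)
    # White (HC0): each squared residual stands in for its own error variance
    se_white = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    return b1, se_conventional, se_white

# Hypothetical simulated data: the error SD rises steeply with x
rng = random.Random(425)
x = [rng.uniform(1.0, 15.0) for _ in range(5000)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 0.1 * xi * xi) for xi in x]
b1, se_conventional, se_white = slope_standard_errors(x, y)
print(f"b1 = {b1:.3f}")
print(f"conventional SE = {se_conventional:.4f}, White SE = {se_white:.4f}")
```

In this simulation the error variance is largest where x is far from its mean on the high side, so the conventional formula understates the uncertainty in the slope and the White standard error comes out larger.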
Weighted Least Squares
• Up to this point we have merely been correcting OLS estimates for the bias in the estimated standard errors (and t-statistics)
• We can also get better estimators of the coefficients if we can correct for the heteroskedasticity
=> Weighted Least Squares
Weighted Least Squares
Example: Suppose that you have a regression of sales on employees (with no constant):
$\mathrm{SALES}_i = \beta_1\,\mathrm{EMPLOYEES}_i + e_i$
It looks like the variance of the errors is going to be positively related to the level of employees
White Test Confirms Heteroskedasticity
It looks like there is significant heteroskedasticity in the residuals from this regression model
Heteroskedasticity-consistent t-stats are about 2/3 the size of those from the “raw model”
Weighted Least Squares
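Consider three possible hypotheses:
(1) $\mathrm{Var}(e_i) = \sigma^2$ => $\mathrm{SD}(e_i) = \sigma$
(2) $\mathrm{Var}(e_i) = \sigma^2\,\mathrm{EMPLOYEES}_i$ => $\mathrm{SD}(e_i) = \sigma\,\mathrm{EMPLOYEES}_i^{1/2}$
(3) $\mathrm{Var}(e_i) = \sigma^2\,\mathrm{EMPLOYEES}_i^2$ => $\mathrm{SD}(e_i) = \sigma\,\mathrm{EMPLOYEES}_i$
WLS would imply dividing the equation by the appropriate variable so that the transformed residual has constant standard deviation and variance (in (2) and (3), $e_i$ denotes the transformed residual):
(1) $\mathrm{SALES}_i = \beta_1\,\mathrm{EMPLOYEES}_i + e_i$
(2) $\mathrm{SALES}_i/\mathrm{EMPLOYEES}_i^{1/2} = \beta_1\,\mathrm{EMPLOYEES}_i^{1/2} + e_i$
(3) $\mathrm{SALES}_i/\mathrm{EMPLOYEES}_i = \beta_1 + e_i$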
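A short worked step shows why the division gives a constant variance. It treats $\mathrm{EMPLOYEES}_i$ as fixed (that is, it conditions on the regressor), which is an assumption made here for the derivation:

```latex
\begin{align*}
\text{(2)}\quad
\mathrm{Var}\!\left(\frac{e_i}{\mathrm{EMPLOYEES}_i^{1/2}}\right)
  &= \frac{\mathrm{Var}(e_i)}{\mathrm{EMPLOYEES}_i}
   = \frac{\sigma^2\,\mathrm{EMPLOYEES}_i}{\mathrm{EMPLOYEES}_i}
   = \sigma^2 \\[4pt]
\text{(3)}\quad
\mathrm{Var}\!\left(\frac{e_i}{\mathrm{EMPLOYEES}_i}\right)
  &= \frac{\mathrm{Var}(e_i)}{\mathrm{EMPLOYEES}_i^{2}}
   = \frac{\sigma^2\,\mathrm{EMPLOYEES}_i^{2}}{\mathrm{EMPLOYEES}_i^{2}}
   = \sigma^2
\end{align*}
```

In both cases the transformed error has variance $\sigma^2$ for every observation, so OLS on the transformed equation satisfies the homoskedasticity assumption if the hypothesized variance form is right.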
WLS: (2) $\mathrm{SALES}_i/\mathrm{EMPLOYEES}_i^{1/2} = \beta_1\,\mathrm{EMPLOYEES}_i^{1/2} + e_i$
It looks like there may still be a little bit of heteroskedasticity, but this specification is better than the “raw model”: the F-statistic is half as big as for the levels regression
Note that the t-stat for the slope is about 25% bigger, because WLS is more efficient
WLS in Eviews
In the estimation I tell Eviews to use 1/SQR(EMPLOYEES) as the weights and you get exactly the same results as when we did the regression manually, above
Note that Eviews also gives you summary statistics in terms of the unweighted/raw data
WLS: (3) $\mathrm{SALES}_i/\mathrm{EMPLOYEES}_i = \beta_1 + e_i$
Note that you can’t do a White test here because there are no regressors
Also, you can’t compare the $R^2$ statistics across these models because the dependent variables are different
WLS in Eviews
In this case, I tell Eviews to use 1/EMPLOYEES as the weights and you get exactly the same results as when we did the regression manually, above
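WLS: Different Estimators of $\beta_1$
OLS: $b_1 = \sum x_i y_i / \sum x_i^2$ => [the usual slope of the regression line]
WLS: $b_1 = \sum x_i^{1/2} (y_i / x_i^{1/2}) / \sum [x_i^{1/2}]^2$ [if $\mathrm{SD}(e_i) = \sigma\,\mathrm{EMPLOYEES}_i^{1/2}$]
$= \sum y_i / \sum x_i = \bar{y} / \bar{x}$ => [the ratio of the sample means]
WLS: $b_1 = \sum 1 \cdot (y_i / x_i) / \sum [1]^2$ [if $\mathrm{SD}(e_i) = \sigma\,\mathrm{EMPLOYEES}_i$]
$= \sum (y_i / x_i) / N = \overline{(y / x)}$ => [the sample mean of the ratio]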
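These closed forms can be checked numerically. The sketch below uses simulated data, not the course workfile: the slope of 120, the error scale of 15, and the names `slope_through_origin` and `wls_slope` are hypothetical choices for illustration (only the sample size of 249 echoes the example). It runs the transformed regressions through the origin and compares them with the ratio of means and the mean of ratios.

```python
import math
import random

def slope_through_origin(x, y):
    """OLS slope for y = b1*x + e with no constant: sum(x*y) / sum(x^2)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def wls_slope(x, y, sd):
    """Divide both sides by the assumed error SD, then run OLS through the origin."""
    xt = [a / s for a, s in zip(x, sd)]
    yt = [b / s for b, s in zip(y, sd)]
    return slope_through_origin(xt, yt)

# Hypothetical data: SD(e_i) proportional to EMPLOYEES_i, as in hypothesis (3)
rng = random.Random(425)
employees = [rng.uniform(5.0, 500.0) for _ in range(249)]
sales = [120.0 * emp + rng.gauss(0.0, 15.0 * emp) for emp in employees]

b_ols = slope_through_origin(employees, sales)
b_wls2 = wls_slope(employees, sales, [math.sqrt(emp) for emp in employees])
b_wls3 = wls_slope(employees, sales, employees)

ratio_of_means = sum(sales) / sum(employees)
mean_of_ratios = sum(s / emp for s, emp in zip(sales, employees)) / len(sales)

print(f"OLS slope:      {b_ols:.3f}")
print(f"WLS (2) slope:  {b_wls2:.3f}  ratio of means: {ratio_of_means:.3f}")
print(f"WLS (3) slope:  {b_wls3:.3f}  mean of ratios: {mean_of_ratios:.3f}")
```

The two WLS slopes agree with their closed forms up to floating-point rounding, which is the algebraic identity stated above rather than a property of this particular sample.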
WLS: From Eviews Manual
WLS: Some Diagnostics
• One of the assumptions we make when estimating a regression using least squares is that the errors are Normally distributed (with constant mean and variance)
• If the variance is not constant, one thing you will generally see is a “fat-tailed” histogram; i.e., kurtosis > 3, and outliers
• Thus, we can use the histogram of the residuals as a further diagnostic for whether we have “fixed” the heteroskedasticity problem
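The link between nonconstant variance and fat tails can be illustrated with a small simulation. The sketch below uses the textbook Jarque-Bera formula, JB = n/6 · (S² + (K − 3)²/4), and the fact that under Normality it is approximately chi-square with 2 degrees of freedom, so the p-value is exp(-JB/2). Whether Eviews applies any degrees-of-freedom adjustment is not assumed here; the data and the function name `jarque_bera` are hypothetical.

```python
import math
import random

def jarque_bera(resid):
    """Return (skewness, kurtosis, JB statistic, p-value) for a list of residuals."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((e - mean) ** 2 for e in resid) / n
    m3 = sum((e - mean) ** 3 for e in resid) / n
    m4 = sum((e - mean) ** 4 for e in resid) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2            # equals 3 for a Normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # exact chi-square(2) upper-tail probability
    return skew, kurt, jb, p_value

rng = random.Random(425)
n_obs = 20000
# Normal errors with one common SD
errors_hom = [rng.gauss(0.0, 8.0) for _ in range(n_obs)]
# Normal errors whose SD differs across observations (hypothetical SD range)
errors_het = [rng.gauss(0.0, rng.uniform(1.0, 15.0)) for _ in range(n_obs)]

_, kurt_hom, jb_hom, p_hom = jarque_bera(errors_hom)
_, kurt_het, jb_het, p_het = jarque_bera(errors_het)
print(f"constant variance:    kurtosis = {kurt_hom:.2f}, JB p-value = {p_hom:.3f}")
print(f"nonconstant variance: kurtosis = {kurt_het:.2f}, JB p-value = {p_het:.3g}")
```

Each error in the second series is Normal given its own standard deviation, yet pooling observations with different standard deviations produces kurtosis well above 3.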
Histogram from OLS Model (1)
• Note that the kurtosis is large, the Jarque-Bera statistic (testing Normality) has a p-value of 0, and there are outliers in the histogram
• Also note that the plot of the residuals shows erratic spikes (mostly associated with larger firms)
Histogram from WLS Model (2)
• Note that the kurtosis is smaller than in the OLS model, but the Jarque-Bera statistic still has a p-value of 0, and there are outliers in the histogram
• Also note that the plot of the standardized residuals still shows erratic spikes (mostly associated with larger firms)
Histogram from WLS Model (3)
• Note that the kurtosis is close to 3, and the Jarque-Bera statistic has a p-value of 0.11, implying that the data are consistent with a Normal distribution (and constant variance)
• Also note that the plot of the standardized residuals looks much more regular in its spread
• Thus, this model seems best
More General Approach to WLS
• Sometimes it will not be obvious how to use a single independent variable to create appropriate weights
• This is a more data-driven approach
• Start with OLS model (I am going to put the constant term back in), then create a new variable for the absolute value of the residuals, ABSRES
Forecast Residual Standard Deviation Using $\mathrm{EMPLOYEES}_i^{1/2}$
• Next, I am going to regress the absolute value of the residuals from the OLS regression against several functions of the independent variable
– but I could use any variable or combination of variables here if I thought they could explain the heteroskedasticity in the original OLS regression
– Start with $\mathrm{EMPLOYEES}_i^{1/2}$ and then create forecasts of residual standard deviation from this model, ABSRESF1
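The three steps (OLS, regression of ABSRES on a function of EMPLOYEES, WLS using the forecasts) can be sketched as follows. This is an illustration under assumptions rather than the Eviews procedure: the data are simulated with hypothetical parameters, and the helpers `ols2` and `feasible_wls` are made-up names. For Normal errors the expected absolute residual is proportional to the standard deviation, and a proportionality constant does not change the WLS coefficients, so the forecasts can serve directly as the divisor.

```python
import math
import random

def ols2(z1, z2, y):
    """Least squares for y = a*z1 + b*z2 (pass z1 = all ones for a constant term)."""
    s11 = sum(u * u for u in z1)
    s12 = sum(u * v for u, v in zip(z1, z2))
    s22 = sum(v * v for v in z2)
    t1 = sum(u * w for u, w in zip(z1, y))
    t2 = sum(v * w for v, w in zip(z2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det

def feasible_wls(x, y, sd_driver):
    """Two-step WLS: model |OLS residual| with sd_driver(x), then reweight."""
    ones = [1.0] * len(x)
    b0, b1 = ols2(ones, x, y)                                  # step 1: OLS
    absres = [abs(yi - b0 - b1 * xi) for xi, yi in zip(x, y)]  # ABSRES
    d = [sd_driver(xi) for xi in x]
    g0, g1 = ols2(ones, d, absres)                             # step 2: ABSRES regression
    absresf = [g0 + g1 * di for di in d]                       # forecast SD
    if min(absresf) <= 0:
        raise ValueError("forecast standard deviation must be positive")
    # step 3: divide every term, including the constant, by the forecast SD
    w0, w1 = ols2([1.0 / s for s in absresf],
                  [xi / s for xi, s in zip(x, absresf)],
                  [yi / s for yi, s in zip(y, absresf)])
    return (b0, b1), (w0, w1)

# Hypothetical data: the error SD is proportional to the square root of EMPLOYEES
rng = random.Random(425)
employees = [rng.uniform(50.0, 500.0) for _ in range(1000)]
sales = [200.0 + 120.0 * emp + rng.gauss(0.0, 15.0 * math.sqrt(emp))
         for emp in employees]

ols_fit, wls_fit = feasible_wls(employees, sales, math.sqrt)
print(f"OLS: constant = {ols_fit[0]:.1f}, slope = {ols_fit[1]:.3f}")
print(f"WLS: constant = {wls_fit[0]:.1f}, slope = {wls_fit[1]:.3f}")
```

Passing a different `sd_driver` (for example `lambda emp: emp`) gives the version that forecasts the residual standard deviation from the level of EMPLOYEES. The explicit check on the forecasts matters because a fitted line for ABSRES can dip to zero or below for some observations, which would make the weights meaningless.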
Forecast Residual Standard Deviation Using $\mathrm{EMPLOYEES}_i^{1/2}$
• This looks a lot like the earlier WLS results, but note that the constant term now looks significant
Histogram from WLS Forecast of Residual Standard Deviation Using $\mathrm{EMPLOYEES}_i^{1/2}$
• The kurtosis is about 4.8, the Jarque-Bera statistic has a p-value of 0, and there are outliers in the histogram
• Similar to what we saw in the residuals from Model (2) – which also relied on a relationship with the square root of EMPLOYEES
Forecast Residual Standard Deviation Using EMPLOYEESi
• Next, I use EMPLOYEESi and then create forecasts of residual standard deviation from this model, ABSRESF2
Forecast Residual Standard Deviation Using EMPLOYEESi
• In this case, the constant term is not significantly different from 0.
Histogram from WLS Forecast of Residual Standard Deviation Using EMPLOYEESi
• The kurtosis is about 3.4, and the Jarque-Bera statistic has a p-value of 0.094, which suggests that Normality (with constant variance) is a reasonable assumption
• Similar to what we saw in the residuals from Model (3) – which also relied on a relationship with the level of EMPLOYEES
Try Using Logs
• Often people find that using log transformations helps to solve heteroskedasticity problems
• Here is the log-log scatter diagram
• Looks like it might work, let’s see . . .
Try Using Logs
• A 1% increase in employees is associated with a 1% increase in sales (in a log-log model the slope is an elasticity)
• Big negative outlier (obs # 40)
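The elasticity reading of the log-log slope can be sketched with simulated data. The example below assumes a multiplicative error, so that the regression is homoskedastic in logs; the parameter values and the function name `loglog_fit` are hypothetical and not taken from the course workfile.

```python
import math
import random

def loglog_fit(x, y):
    """OLS of log(y) on a constant and log(x); returns (intercept, elasticity)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    slope = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sxx
    return my - slope * mx, slope

# Hypothetical data: sales proportional to employees, with a multiplicative error
rng = random.Random(425)
employees = [math.exp(rng.uniform(math.log(5.0), math.log(500.0)))
             for _ in range(249)]
sales = [120.0 * emp * math.exp(rng.gauss(0.0, 0.4)) for emp in employees]

intercept, elasticity = loglog_fit(employees, sales)
pct_change = 100.0 * (1.01 ** elasticity - 1.0)   # effect of 1% more employees
print(f"estimated elasticity: {elasticity:.3f}")
print(f"a 1% increase in employees goes with a {pct_change:.2f}% change in sales")
```

In levels these simulated errors have a standard deviation proportional to EMPLOYEES, while in logs the error variance is constant, which is why the transformation can help when the data behave this way.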
Try Using Logs
• Omitting obs # 40 reduces skewness & kurtosis
• Seems inferior to WLS model 3
Links
Sales Data: http://schwert.ssb.rochester.edu/a425/a425_sales.wf1
Return to APS 425 Home Page: http://schwert.ssb.rochester.edu/a425/a425main.htm