Uploaded by alannah-kelley
Residual Analysis
• Purposes
– Examine functional form (linear vs. non-linear model)
– Evaluate violations of assumptions
• Graphical analysis of residuals
Residual Analysis
[Figure: a population regression line in the (X, Y) plane, with the point (X1, μY|X1) marked]
For one value X1, the population contains many Y values. Their mean is μY|X1.
A population regression line: μY|X = α + βX
A sample regression line: ŷ = a + bx
The sample line approximates the population regression line.
[Figure: Histogram of Y Values at X = X1, centered on the population regression line]
[Figure: Normal Distribution of Y Values when X = X1, with density f(e) centered at μY|X1 = α + βX1]
The standard deviation of this normal distribution is estimated by the standard error of estimate.
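As a sketch of how the standard error of estimate is obtained in practice: for a simple regression it is s_y.x = √(Σe² / (n − 2)), the residual sum of squares divided by n minus the two estimated coefficients, then square-rooted. The Python below assumes that formula and uses the RES_1 residuals from the Electronic Firms listing later in this deck; the function name is illustrative.

```python
import math


def standard_error_of_estimate(residuals, n_coefficients=2):
    """s_y.x = sqrt(SSE / (n - k)); k = 2 (intercept and slope) in simple regression."""
    sse = sum(e * e for e in residuals)  # residual (error) sum of squares
    return math.sqrt(sse / (len(residuals) - n_coefficients))


# RES_1 column of the Electronic Firms listing (15 firms)
res_1 = [54.09925, -14.54224, 12.11087, -7.20116, -13.86008,
         -2.18373, 22.82208, 17.15154, -31.23021, -5.87751,
         -33.87170, -0.17792, 0.48100, 9.14573, -6.86589]

s_yx = standard_error_of_estimate(res_1)
print(f"s_y.x = {s_yx:.2f}")
```

The result, about 22.6, is the same scale factor that turns RES_1 into ZRE_1 in that listing (54.09925 / 2.39432 ≈ 22.6).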
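Normality & Constant Variance Assumptions
[Figure: normal distributions f(e) of Y at X1 and X2 along the regression line]
A Normal Regression Surface
[Figure: regression surface over X, with normal curves f(e) at X1 and X2]
Every cross-sectional slice of the surface is a normal curve.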
Analysis of Residuals
A residual is the difference between the actual value of Y and the predicted value Ŷ: e = Y − Ŷ.
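A minimal sketch of the definition e = Y − Ŷ. The values are the first three firms of the Electronic Firms listing later in this deck (EARNINGS as Y, PRE_1 as Ŷ); the function name is illustrative.

```python
def residuals(actual, predicted):
    """e_i = Y_i - Y_hat_i for each observation (actual minus predicted)."""
    return [y - y_hat for y, y_hat in zip(actual, predicted)]


# First three firms from the Electronic Firms listing: EARNINGS and PRE_1
earnings = [221.00, 83.00, 147.00]
predicted = [166.90075, 97.54224, 134.88913]

for e in residuals(earnings, predicted):
    print(f"{e:.5f}")
```

A positive residual means the point lies above the fitted line, a negative one below it; these three values match the RES_1 column (54.09925, −14.54224, 12.11087).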
Linear Regression and Correlation Assumptions
• The independent variables and the dependent variable have a linear relationship.
• The dependent variable must be continuous and at least interval-scale.
Linear Regression Assumptions
• Normality
– Y values are normally distributed for each X.
– The residuals (e) are normally distributed with a mean of zero.
• Homoscedasticity (constant variance)
– The variation in the residuals is the same for all values of X.
– The standard deviation of the residuals is the same regardless of the given value of X.
• Independence of errors
– The residuals (e) are independent of each other.
– The size of the error for a particular value of X is not related to the size of the error for any other value of X.
Evaluating the Aptness of the Fitted Regression Model
Does the model appear linear?
Residual Plot for Linearity (Functional Form)
[Figure: Aptness of the Fitted Model. Residuals e vs. X in two panels, "Correct Specification" and "Add X² Term" (a curved residual pattern suggests adding an X² term)]
Residual Plots for Linearity of the Fitted Model
• Scatter plot of Y vs. X values
• Scatter plot of residuals vs. X values
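One way to back up the visual check numerically, offered as an illustrative heuristic rather than a procedure from these slides: fit the straight line by least squares, then correlate its residuals with (X − X̄)². A correlation well away from zero means the residuals rise or fall together at both ends of the X range, the U-shaped pattern that suggests adding an X² term. The helper names and example data are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def pearson_r(u, v):
    """Pearson correlation; returns 0.0 if either series has no variation."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    suv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    suu = sum((ui - u_bar) ** 2 for ui in u)
    svv = sum((vi - v_bar) ** 2 for vi in v)
    if suu == 0 or svv == 0:
        return 0.0
    return suv / math.sqrt(suu * svv)


def curvature_check(x, y):
    """Correlate straight-line residuals with (x - x_bar)^2 (hypothetical helper)."""
    a, b = fit_line(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    x_bar = sum(x) / len(x)
    return pearson_r(resid, [(xi - x_bar) ** 2 for xi in x])


# Hypothetical data with a U-shaped relationship: y = x^2
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]
print(round(curvature_check(x, y), 3))
```

This only picks up U-shaped (quadratic) curvature; an S-shaped residual pattern can give a correlation near zero, so it supplements the residual plot rather than replacing it.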
Using SPSS to Test for Linearity of the Regression Model
• Analyze/Regression/Linear
– Dependent: Sales
– Independent: Customers
– Save: Predicted Value (Unstandardized or Standardized) and Residual (Unstandardized or Standardized)
• Graphs/Scatter/Simple
– Y-Axis: residual [res_1 or zre_1]
– X-Axis: Customer (independent variable)
Sales and Customers Problem
Case  CUSTOMER  SALES  Unstandardized Predicted Value  Unstandardized Residual
1     907       11     10.34055                          .85945
2     926       11     10.50641                          .54359
3     506        7      6.84009                         -.00009
4     741        9      8.89148                          .31852
5     789        9      9.31049                          .10951
6     889       10     10.18343                         -.10343
7     874        9     10.05249                         -.60249
8     510        7      6.87501                         -.14501
9     529        7      7.04086                          .19914
10    420        6      6.08937                          .03063
11    679        8      8.35027                         -.72027
12    872        9     10.03503                         -.60503
13    924        9     10.48895                        -1.02895
14    607        8      7.72175                         -.08175
15    452        7      6.36871                          .55129
16    729        9      8.78673                          .16327
17    794        9      9.35414                         -.02414
18    844       10      9.79061                          .43939
19    1010      12     11.23968                          .53032
(SALES is displayed rounded to whole numbers; the residuals are computed from the unrounded values, e.g. 10.34055 + .85945 = 11.20 for case 1.)
[Figure: Scatter Plot of Customer by Sales (X-axis: CUSTOMER; Y-axis: SALES)]
[Figure: Scatter Plot of Customer by Residuals (X-axis: CUSTOMER; Y-axis: Unstandardized Residual)]
[Figure: ELECTRONIC FIRMS. Plot of Residuals vs R&D Expenditures (X-axis: RDEXPEND; Y-axis: Residual)]
The Linear Regression Assumptions
1. Normality of residuals (errors)
2. Homoscedasticity (constant variance)
3. Independence of residuals (errors)
These assumptions need to be verified using residual analysis.
Residual Plots for Normality
• Construct a histogram of residuals
– Stem-and-leaf plot
– Box-and-whisker plot
– Normal probability plot
• Scatter plot of residuals vs. X values
– Simple regression
• Scatter plot of residuals vs. predicted Y values
– Multiple regression
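A normal probability plot pairs the sorted residuals with the standard-normal quantiles expected at their ranks; points close to a straight line support normality. A minimal sketch using Python's standard `statistics.NormalDist`; the plotting position (i − 0.375)/(n + 0.25) (Blom's formula) is an assumed choice, since software packages differ. Instead of drawing the plot, it returns the pairs and their correlation, and the data are the ZRE_1 values from the Electronic Firms listing later in this deck.

```python
import math
from statistics import NormalDist


def normal_scores(n):
    """Expected standard-normal quantiles for ranks 1..n, using Blom's
    plotting position (i - 0.375) / (n + 0.25) (an assumed choice)."""
    nd = NormalDist()  # mean 0, standard deviation 1
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def probability_plot_r(residuals):
    """Return the (normal score, sorted residual) pairs of a normal probability
    plot and the correlation between them; near 1 suggests normality."""
    e = sorted(residuals)
    z = normal_scores(len(e))
    e_bar, z_bar = sum(e) / len(e), sum(z) / len(z)
    s_ez = sum((ei - e_bar) * (zi - z_bar) for ei, zi in zip(e, z))
    s_ee = sum((ei - e_bar) ** 2 for ei in e)
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    return list(zip(z, e)), s_ez / math.sqrt(s_ee * s_zz)


# ZRE_1 (standardized residuals) from the Electronic Firms listing
zre_1 = [2.39432, -0.64361, 0.53600, -0.31871, -0.61342, -0.09665, 1.01006,
         0.75909, -1.38218, -0.26013, -1.49909, -0.00787, 0.02129, 0.40477, -0.30387]

pairs, r = probability_plot_r(zre_1)
for z, e in pairs:
    print(f"{z:6.2f}  {e:8.3f}")
print(f"r = {r:.3f}")
```

A bend or S-shape in the printed pairs, or a correlation noticeably below 1, points away from normality; the slides give no formal cut-off for this correlation and none is assumed here.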
Residual Plot 1 for Normality: Construct a Histogram of Residuals
• Nearly symmetric
• Centered near or at zero
• Shape is approximately normal
[Figure: histogram of RESIDUAL, from −3.0 to 3.0; Std. Dev = 1.61, Mean = 0.0, N = 31]
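SPSS annotates each residual histogram with Std. Dev, Mean and N. The sketch below reproduces that summary, plus a text histogram, for the ZRE_1 column of the Electronic Firms listing later in this deck. The 0.5-wide bins are an assumption chosen to resemble the SPSS chart, and the standard deviation uses the n − 1 divisor (`statistics.stdev`).

```python
import math
import statistics

BIN_WIDTH = 0.5  # assumed bin width, chosen to resemble the SPSS chart


def residual_summary(residuals, bin_width=BIN_WIDTH):
    """Return mean, sample std. dev. (n - 1 divisor), N, and histogram bin counts."""
    counts = {}
    for e in residuals:
        k = math.floor(e / bin_width)  # bin k covers [k * w, (k + 1) * w)
        counts[k] = counts.get(k, 0) + 1
    return (statistics.mean(residuals), statistics.stdev(residuals),
            len(residuals), dict(sorted(counts.items())))


# ZRE_1 (standardized residuals) from the Electronic Firms listing
zre_1 = [2.39432, -0.64361, 0.53600, -0.31871, -0.61342, -0.09665, 1.01006,
         0.75909, -1.38218, -0.26013, -1.49909, -0.00787, 0.02129, 0.40477, -0.30387]

mean, sd, n, bins = residual_summary(zre_1)
print(f"Std. Dev = {sd:.2f}  Mean = {mean:.2f}  N = {n}")
for k, count in bins.items():
    low = k * BIN_WIDTH
    print(f"[{low:5.2f}, {low + BIN_WIDTH:5.2f})  {'*' * count}")
```

The standard deviation comes out near .96, matching the Electronic Firms histogram. It is below 1 because each standardized residual divides by s_y.x (n − 2 divisor) while the histogram's Std. Dev uses n − 1, giving √(13/14) ≈ .96.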
Using SPSS to Test for Normality: Histogram of Residuals
• Analyze/Regression/Linear
– Dependent: Sales
– Independent: Customers
– Plot/Standardized Residual Plot: Histogram
– Save: Predicted Value (Unstandardized or Standardized) and Residual (Unstandardized or Standardized)
• Graphs/Histogram
– Variable: residual (Unstandardized or Standardized)
[Figure: Histogram of Residuals of Sales and Customer Problem from regression output. Histogram, Dependent Variable: SALES; X-axis: Regression Standardized Residual; Y-axis: Frequency; Std. Dev = .97, Mean = 0.00, N = 20]
[Figure: Histogram of Residuals of Sales and Customer Problem from graph output. X-axis: Unstandardized Residual; Std. Dev = .49, Mean = 0.00, N = 20]
Residual Plot 2 for Normality: Plot Residuals vs. X Values
• Points should be distributed about the horizontal line at 0
• Otherwise, normality is violated
[Figure: residuals vs. X, scattered about the horizontal line at 0]
Using SPSS to Test for Normality: Scatter Plot
• Simple regression: Graphs/Scatter/Simple
– Y-Axis: residual [res_1 or zre_1]
– X-Axis: Customers [independent variable]
• Multiple regression: Graphs/Scatter/Simple
– Y-Axis: residual [res_1 or zre_1]
– X-Axis: predicted Y values
[Figure: Scatter Plot of Customer by Residuals (X-axis: CUSTOMER; Y-axis: Unstandardized Residual)]
An accounting standards board investigating the treatment of research and development expenses by the nation’s major electronic firms was interested in the relationship between a firm’s research and development expenditures and its earnings.
The Electronic Firms
Earnings = 6.840 + 10.671(rdexpend)
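The fitted equation can be reproduced from the RDEXPEND and EARNINGS columns of the listing below with the standard least-squares formulas b = Sxy / Sxx and a = ȳ − b·x̄. A minimal Python sketch; the function name is illustrative, and this is a from-scratch calculation, not SPSS's own routine.

```python
def least_squares(x, y):
    """Intercept a and slope b minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


# RDEXPEND and EARNINGS columns of the Electronic Firms listing
rdexpend = [15.0, 8.5, 12.0, 6.5, 4.5, 2.0, 0.5, 1.5, 14.0, 9.0, 7.5, 0.5, 2.5, 3.0, 6.0]
earnings = [221, 83, 147, 69, 41, 26, 35, 40, 125, 97, 53, 12, 34, 48, 64]

a, b = least_squares(rdexpend, earnings)
print(f"Earnings = {a:.3f} + {b:.3f}(rdexpend)")
```

The slope prints as 10.671. The intercept prints as 6.843, which agrees with the slide's 6.840 to two decimals and reproduces the PRE_1 column (6.843 + 10.671 × 0.5 ≈ 12.178 for the firms with RDEXPEND = .50).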
ELECTRONIC FIRMS: List of Data, Predicted Values and Residuals
Data: RDEXPEND, EARNINGS. PRE_1 = predicted value; RES_1 = residual; ZPR_1 = standardized predicted value; ZRE_1 = standardized residual.
RDEXPEND  EARNINGS  PRE_1       RES_1       ZPR_1      ZRE_1
15.00     221.00    166.90075    54.09925    1.84527    2.39432
8.50       83.00     97.54224   -14.54224     .48229    -.64361
12.00     147.00    134.88913    12.11087    1.21620     .53600
6.50       69.00     76.20116    -7.20116     .06291    -.31871
4.50       41.00     54.86008   -13.86008    -.35647    -.61342
2.00       26.00     28.18373    -2.18373    -.88070    -.09665
.50        35.00     12.17792    22.82208   -1.19523    1.01006
1.50       40.00     22.84846    17.15154    -.98554     .75909
14.00     125.00    156.23021   -31.23021    1.63558   -1.38218
9.00       97.00    102.87751    -5.87751     .58713    -.26013
7.50       53.00     86.87170   -33.87170     .27260   -1.49909
.50        12.00     12.17792     -.17792   -1.19523    -.00787
2.50       34.00     33.51900      .48100    -.77585     .02129
3.00       48.00     38.85427     9.14573    -.67101     .40477
6.00       64.00     70.86589    -6.86589    -.04194    -.30387
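The two standardized columns can be recomputed from PRE_1 and RES_1. The sketch assumes two definitions that agree with the printed values to rounding: ZRE_1 is RES_1 divided by the standard error of estimate √(SSE/(n − 2)), and ZPR_1 is the predicted value minus the mean prediction, divided by the standard deviation of the predictions (n − 1 divisor).

```python
import math
import statistics

# PRE_1 (predicted) and RES_1 (residual) columns of the Electronic Firms listing
pre_1 = [166.90075, 97.54224, 134.88913, 76.20116, 54.86008,
         28.18373, 12.17792, 22.84846, 156.23021, 102.87751,
         86.87170, 12.17792, 33.51900, 38.85427, 70.86589]
res_1 = [54.09925, -14.54224, 12.11087, -7.20116, -13.86008,
         -2.18373, 22.82208, 17.15154, -31.23021, -5.87751,
         -33.87170, -0.17792, 0.48100, 9.14573, -6.86589]

n = len(res_1)
s_yx = math.sqrt(sum(e * e for e in res_1) / (n - 2))  # standard error of estimate
zre = [e / s_yx for e in res_1]                          # standardized residuals

mean_pre = statistics.mean(pre_1)
sd_pre = statistics.stdev(pre_1)                         # n - 1 divisor
zpr = [(p - mean_pre) / sd_pre for p in pre_1]           # standardized predicted values

for p, z_p, z_e in zip(pre_1, zpr, zre):
    print(f"{p:10.5f}  {z_p:8.5f}  {z_e:8.5f}")
```

Because the residuals are scaled by s_y.x, a standardized residual beyond about ±2 (the first firm, at 2.39) flags a point that sits unusually far from the line.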
[Figure: ELECTRONIC FIRMS. Histogram, Dependent Variable: EARNINGS; X-axis: Regression Standardized Residual; Y-axis: Frequency; Std. Dev = .96, Mean = 0.00, N = 15]
[Figure: ELECTRONIC FIRMS. Plot of Standardized Residuals vs X Value (X-axis: RDEXPEND; Y-axis: Standardized Residual)]
Residual Plot for Homoscedasticity (Constant Variance)
[Figure: standardized residuals (SR) vs. X in two panels, "Correct Specification" and "Heteroscedasticity" (fan-shaped)]
Standardized residuals are used.
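A fan shape means the spread of the residuals changes with X. The slides rely on the plot; as a rough numerical companion (a swapped-in heuristic in the spirit of the Goldfeld-Quandt test, without its F critical values), one can sort the residuals by X and compare the variance of the high-X half with that of the low-X half. The function name and example residuals are hypothetical.

```python
import statistics


def spread_ratio(x, residuals):
    """Variance of the residuals in the high-X half divided by the low-X half.
    Values far from 1 suggest non-constant variance, e.g. a fan shape.
    Needs at least 4 observations so each half has 2 or more values."""
    ordered = [e for _, e in sorted(zip(x, residuals))]  # residuals sorted by X
    half = len(ordered) // 2
    low, high = ordered[:half], ordered[-half:]  # middle value dropped when n is odd
    return statistics.variance(high) / statistics.variance(low)


# Hypothetical residuals that fan out as X increases
x = [1, 2, 3, 4, 5, 6, 7, 8]
e = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]
print(round(spread_ratio(x, e), 1))
```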
Using SPSS to Test for Homoscedasticity of Residuals
• Simple regression: Graphs/Scatter/Simple
– Y-Axis: residual [res_1 or zre_1]
– X-Axis: rdexpend [independent variable]
• Multiple regression: Graphs/Scatter/Simple
– Y-Axis: residual [res_1 or zre_1]
– X-Axis: predicted Y values
Test for Homoscedasticity
[Figure: DUNTON’S WORLD OF SOUND. Plot of Residuals vs Number (X-axis: NUMBER; Y-axis: Residual)]
[Figure: ELECTRONIC FIRMS. Plot of Residuals vs R&D Expenditures (X-axis: RDEXPEND; Y-axis: Residual)]
[Figure: Scatter Plot of Customer by Residuals (X-axis: CUSTOMER; Y-axis: Unstandardized Residual)]
Residual Plot for Independence
[Figure: standardized residuals (SR) vs. X in two panels, "Correct Specification" and "Not Independent"]
The plots reflect the sequence in which the data were collected.
Two Types of Autocorrelation
• Positive autocorrelation: successive terms in a time series are directly related
• Negative autocorrelation: successive terms are inversely related
[Figure: residual (y − ŷ) vs. time period t, for t = 0 to 20]
Positive autocorrelation: residuals tend to be followed by residuals with the same sign.
[Figure: residual (y − ŷ) vs. time period t, for t = 0 to 20]
Negative autocorrelation: residuals tend to change signs from one period to the next.
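The sign of the lag-1 autocorrelation of the residuals separates the two patterns: positive when neighbors share a sign, negative when signs alternate. A minimal sketch, assuming the common estimator r1 = Σ e_t·e_{t−1} / Σ e_t² applied to residuals listed in time order; the residual sequences are invented for illustration.

```python
def lag1_autocorrelation(residuals):
    """r1 = sum(e_t * e_(t-1)) / sum(e_t^2), residuals in time order."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


same_sign = [2, 3, 1, 2, -1, -2, -3, -2]  # runs of one sign: positive autocorrelation
alternating = [1, -1, 1, -1]              # sign flips every period: negative
print(lag1_autocorrelation(same_sign))    # about 0.64
print(lag1_autocorrelation(alternating))  # -0.75
```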
Problems with autocorrelated time-series data
• s_y.x and s_b are biased downward
• Probability statements about the regression equation and slopes are invalid
• F and t tests won’t be valid
• May imply that cycles exist
• May induce a falsely high or low agreement between two variables
Using SPSS to Test for Independence of Errors
• Graphs/Sequence
– Variables: residual (res_1)
• Durbin-Watson statistic
[Figure: DUNTON’S WORLD OF SOUND. Time Sequence of Residuals (X-axis: Sequence number; Y-axis: Residual)]
[Figure: ELECTRONIC FIRMS. Time Sequence Plot of Residuals (X-axis: Sequence number; Y-axis: Residual)]
Customers and sales for a period of 15 consecutive weeks.
Customers  Sales ($000)
794        9
799        8
837        7
855        9
845        10
844        10
863        11
875        11
880        12
905        13
886        12
843        10
904        12
950        12
841        10
[Figure: Residuals over Time (X-axis: Time, 1 to 15; Y-axis: Unstandardized Residual)]
Durbin-Watson Procedure
• Used to detect autocorrelation
– Residuals in one time period are related to residuals in another period
– Violation of the independence assumption
• Durbin-Watson test statistic:
$$D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$$
H0: No positive autocorrelation exists (residuals are random)
H1: Positive autocorrelation exists
Do not reject H0 if D > dU
Reject H0 if D < dL
Inconclusive if dL < D < dU
Testing for Positive Autocorrelation
[Figure: scale of D from 0 to 4 with dL, dU and 2 marked. Below dL: there is positive autocorrelation. Between dL and dU: the test is inconclusive. Above dU: there is no evidence of autocorrelation.]
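The decision rule for positive autocorrelation can be written as a small function. The bounds dL and dU must be looked up in a Durbin-Watson table for the sample size and number of independent variables; the bounds in the example call are hypothetical placeholders, not table values, and ties at exactly dL or dU are treated as inconclusive here (an assumption).

```python
def dw_positive_autocorrelation_test(d, d_lower, d_upper):
    """Durbin-Watson test of H0: no positive autocorrelation.
    d_lower (dL) and d_upper (dU) come from a Durbin-Watson table."""
    if d < d_lower:
        return "reject H0: positive autocorrelation"
    if d > d_upper:
        return "do not reject H0: no evidence of positive autocorrelation"
    return "inconclusive"  # dL <= d <= dU (ties treated as inconclusive)


# Hypothetical bounds for illustration only; look up real dL and dU in a table.
print(dw_positive_autocorrelation_test(1.9, d_lower=1.0, d_upper=1.5))
```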
Rule of Thumb
• Positive autocorrelation: D will approach 0
• No autocorrelation: D will be close to 2
• Negative autocorrelation: D is greater than 2 and may approach a maximum of 4
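The statistic D can be computed directly from the residuals in time order. A minimal sketch; the two residual sequences are invented to show the two ends of the rule-of-thumb scale. The third-party statsmodels package provides `statsmodels.stats.stattools.durbin_watson` for the same quantity; it is not used here.

```python
def durbin_watson(residuals):
    """D = sum_{i=2..n} (e_i - e_(i-1))^2 / sum_{i=1..n} e_i^2, residuals in time order."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


print(durbin_watson([1, 1, 1, 1, -1, -1, -1, -1]))  # long same-sign runs: 0.5
print(durbin_watson([1, -1, 1, -1, 1, -1, 1, -1]))  # alternating signs: 3.5
```

For long series, D ≈ 2(1 − r1), where r1 is the lag-1 autocorrelation of the residuals; that is why D near 2 indicates no autocorrelation, D near 0 positive, and D near 4 negative autocorrelation.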
Using SPSS with Autocorrelation
• Analyze/Regression/Linear
• Dependent; Independent
• Statistics/Durbin-Watson (use only with time-series data)
Model Summary(b)
Model  R        R Square  Adjusted R Square  Std. Error of the Estimate  Durbin-Watson
1      .811(a)  .657      .631               .94                         .883
a. Predictors: (Constant), CUSTOMER
b. Dependent Variable: SALES
Using SPSS with Autocorrelation
• Analyze/Regression/Linear
• Dependent; Independent
• Statistics/Durbin-Watson (use only with time-series data)
• If DW indicates autocorrelation, then …
– Analyze/Time Series/Autoregression
– Cochrane-Orcutt
– OK
Solutions for autocorrelation
• Use the final parameters under Cochrane-Orcutt
• Use changes in the dependent and independent variables (first differences)
• Transform the variables
• Include an independent variable that measures the time of the observation
• Use lagged variables (once a lagged value of the dependent variable is introduced as an independent variable, the Durbin-Watson test is not valid)
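Two of these remedies transform the data before re-fitting. A minimal sketch: first differences y_t − y_{t−1}, and the quasi-differences y_t − ρ·y_{t−1} used in a Cochrane-Orcutt step, with ρ estimated by regressing e_t on e_{t−1} through the origin. It shows only the transformation, not SPSS's iterative Autoregression procedure; the function names are illustrative, and the sales series is the 15-week listing above as displayed.

```python
def first_differences(series):
    """y_t - y_(t-1) for t = 2..n; one observation is lost."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


def quasi_differences(series, rho):
    """y*_t = y_t - rho * y_(t-1); rho = 1 reduces to first differences."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


def estimate_rho(residuals):
    """Slope of e_t regressed on e_(t-1) without an intercept."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals[:-1])
    return num / den


# Weekly sales ($000) for the 15 consecutive weeks, as listed
sales = [9, 8, 7, 9, 10, 10, 11, 11, 12, 13, 12, 10, 12, 12, 10]
print(first_differences(sales))
```

In a full Cochrane-Orcutt run, ρ would be estimated from the OLS residuals, both Y and X quasi-differenced with that ρ, the regression refitted, and the steps repeated until ρ settles.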