Regression Analysis and Multiple Regression
Session 7
• Using Statistics
• The Simple Linear Regression Model
• Estimation: The Method of Least Squares
• Error Variance and the Standard Errors of Regression Estimators
• Correlation
• Hypothesis Tests about the Regression Relationship
• How Good is the Regression?
• Analysis of Variance Table and an F Test of the Regression Model
• Residual Analysis and Checking for Model Inadequacies
• Use of the Regression Model for Prediction
• Using the Computer
• Summary and Review of Terms
Simple Linear Regression Model
This scatterplot locates pairs of observations of advertising expenditures on the x-axis and sales on the y-axis. We notice that:
Larger (smaller) values of sales tend to be associated with larger (smaller) values of advertising.
Scatterplot of Advertising Expenditures (X) and Sales (Y)
[Figure: scatterplot with Advertising (0 to 50) on the x-axis and Sales (0 to 140) on the y-axis.]
The scatter of points tends to be distributed around a positively sloped straight line.
The pairs of values of advertising expenditures and sales are not located exactly on a straight line. The scatter plot reveals a more or less strong tendency rather than a precise linear relationship. The line represents the nature of the relationship on average.
7-1 Using Statistics
[Figure: six scatterplot panels showing different possible X-Y patterns.]
Examples of Other Scatterplots
The inexact nature of the relationship between advertising and sales suggests that a statistical model might be useful in analyzing the relationship.
A statistical model separates the systematic component of a relationship from the random component.
Statistical model:
Data = Systematic component + Random errors
In ANOVA, the systematic component is the variation of means between samples or treatments (SSTR) and the random component is the unexplained variation (SSE).
In regression, the systematic component is the overall linear relationship, and the random component is the variation around the line.
Model Building
The population simple linear regression model:
Y = β₀ + β₁X + ε
(nonrandom, systematic component: β₀ + β₁X; random component: ε)
where Y is the dependent variable, the variable we wish to explain or predict; X is the independent variable, also called the predictor variable; and ε is the error term, the only random component in the model, and thus the only source of randomness in Y.
β₀ is the intercept of the systematic component of the regression relationship.
β₁ is the slope of the systematic component.
The conditional mean of Y: E[Y|X] = β₀ + β₁X
7-2 The Simple Linear Regression Model
The simple linear regression model posits an exact linear relationship between the expected or average value of Y, the dependent variable, and X, the independent or predictor variable:
E[Yᵢ] = β₀ + β₁Xᵢ
Actual observed values of Y differ from the expected value by an unexplained or random error:
Yᵢ = E[Yᵢ] + εᵢ = β₀ + β₁Xᵢ + εᵢ
[Figure: regression plot of the line E[Y] = β₀ + β₁X, with intercept β₀, slope β₁, and the error εᵢ shown as the vertical distance from the observed Yᵢ to the line at Xᵢ.]
Regression Plot
Picturing the Simple Linear Regression Model
• The relationship between X and Y is a straight-line relationship.
• The values of the independent variable X are assumed fixed (not random); the only randomness in the values of Y comes from the error term εᵢ.
• The errors εᵢ are normally distributed with mean 0 and variance σ². The errors are uncorrelated (not related) across successive observations. That is: ε ~ N(0, σ²).
[Figure: the line E[Y] = β₀ + β₁X with identical normal distributions of errors, all centered on the regression line.]
Assumptions of the Simple Linear Regression Model
Estimation of a simple linear regression relationship involves finding estimated or predicted values of the intercept and slope of the linear regression line.
The estimated regression equation:
Y = b₀ + b₁X + e
where b₀ estimates the intercept of the population regression line, β₀; b₁ estimates the slope of the population regression line, β₁; and e stands for the observed errors, the residuals from fitting the estimated regression line b₀ + b₁X to a set of n points.
The estimated regression line:
Ŷ = b₀ + b₁X
where Ŷ (Y-hat) is the value of Y lying on the fitted regression line for a given value of X.
7-3 Estimation: The Method of Least Squares
Fitting a Regression Line
[Figure: four panels: the raw data; three errors from an arbitrary fitted line; three errors from the least squares regression line; errors from the least squares regression line are minimized.]
Error: eᵢ = Yᵢ − Ŷᵢ, where Ŷᵢ is the predicted value of Y for Xᵢ and Ŷ = b₀ + b₁X is the fitted regression line.
Errors in Regression
Least Squares Regression
The sum of squared errors in regression is:
SSE = Σᵢ₌₁ⁿ eᵢ² = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
The least squares regression line is the line that minimizes the SSE with respect to the estimates b₀ and b₁.
The normal equations:
Σy = nb₀ + b₁Σx   (yields least squares b₀)
Σxy = b₀Σx + b₁Σx²   (yields least squares b₁)
Sums of Squares and Cross Products:
SS_X = Σ(x − x̄)² = Σx² − (Σx)²/n
SS_Y = Σ(y − ȳ)² = Σy² − (Σy)²/n
SS_XY = Σ(x − x̄)(y − ȳ) = Σxy − (Σx)(Σy)/n
Least squares regression estimators:
b₁ = SS_XY / SS_X
b₀ = ȳ − b₁x̄
Sums of Squares, Cross Products, and Least Squares Estimators
Miles   Dollars   Miles²      Miles×Dollars
 1211     1802     1466521      2182222
 1345     2405     1809025      3234725
 1422     2005     2022084      2851110
 1687     2511     2845969      4236057
 1849     2332     3418801      4311868
 2026     2305     4104676      4669930
 2133     3016     4549689      6433128
 2253     3385     5076009      7626405
 2400     3090     5760000      7416000
 2468     3694     6091024      9116792
 2699     3371     7284601      9098329
 2806     3998     7873636     11218388
 3082     3555     9498724     10956510
 3209     4692    10297681     15056628
 3466     4244    12013156     14709704
 3643     5298    13271449     19300614
 3852     4801    14837904     18493452
 4033     5147    16265089     20757852
 4267     5738    18207288     24484046
 4498     6420    20232004     28877160
 4533     6059    20548088     27465448
 4804     6426    23078416     30870504
 5090     6321    25908100     32173890
 5233     7026    27384288     36767056
 5439     6964    29582720     37877196
-----    ------   ---------    ---------
79448   106605   293426944    390185024
SS_X = Σx² − (Σx)²/n = 293426944 − 79448²/25 = 40,947,557.84
SS_XY = Σxy − (Σx)(Σy)/n = 390185024 − (79448)(106605)/25 = 51,402,852.4
b₁ = SS_XY / SS_X = 51402852.4 / 40947557.84 = 1.255333776 ≈ 1.26
b₀ = ȳ − b₁x̄ = 106605/25 − (1.255333776)(79448/25) = 274.85
Example 7-1
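These estimates are easy to reproduce numerically. A minimal sketch in Python, assuming the 25 (miles, dollars) pairs from the table above are loaded into two arrays (only the first five pairs are shown here for brevity):

import numpy as np

# First five of the 25 (miles, dollars) pairs from the Example 7-1 table;
# extend with the remaining rows to reproduce the slide's numbers exactly.
miles = np.array([1211, 1345, 1422, 1687, 1849], dtype=float)
dollars = np.array([1802, 2405, 2005, 2511, 2332], dtype=float)

n = len(miles)
ss_x = np.sum(miles**2) - np.sum(miles)**2 / n                          # SS_X
ss_xy = np.sum(miles * dollars) - np.sum(miles) * np.sum(dollars) / n  # SS_XY

b1 = ss_xy / ss_x                          # slope estimate
b0 = dollars.mean() - b1 * miles.mean()    # intercept estimate
print(f"b0 = {b0:.2f}, b1 = {b1:.5f}")     # full data give b0 = 274.85, b1 = 1.25533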
[Figure: fitted-line plot of Dollars (1000 to 8000) against Miles (1000 to 5500).]
Y = 274.850 + 1.25533X   R-Squared = 0.965
Regression of Dollars Charged against Miles
MTB > Regress 'Dollars' 1 'Miles';
SUBC> Constant.

Regression Analysis

The regression equation is
Dollars = 275 + 1.26 Miles

Predictor     Coef     Stdev   t-ratio      p
Constant     274.8     170.3      1.61  0.120
Miles      1.25533   0.04972     25.25  0.000

s = 318.2   R-sq = 96.5%   R-sq(adj) = 96.4%

Analysis of Variance

SOURCE      DF        SS        MS       F      p
Regression   1  64527736  64527736  637.47  0.000
Error       23   2328161    101224
Total       24  66855896

Example 7-1: Using the Computer
The following results are the output created by selecting the REGRESSION option from Excel's DATA ANALYSIS toolkit.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.98243393
R Square            0.965176428
Adjusted R Square   0.963662359
Standard Error      318.1578225
Observations        25

ANOVA
            df          SS            MS            F        Significance F
Regression   1   64527736.8    64527736.8   637.4721586   2.85084E-18
Residual    23   2328161.201     101224.4
Total       24   66855898

           Coefficients  Standard Error      t Stat      P-value       Lower 95%    Upper 95%
Intercept   274.8496867     170.3368437  1.61356569   0.120259309  -77.51844165   627.217815
MILES       1.255333776     0.049719712   25.248211   2.85084E-18   1.152480856  1.358186696

Example 7-1: Using Computer-Excel
Residual Analysis. The plot shows the absence of a relationship between the residuals and the X-values (miles).
[Figure: Residuals vs. Miles; residuals between -800 and 600 scatter randomly around zero across Miles 0 to 6000.]
Example 7-1: Using Computer-Excel
[Figure: two views of the same scatter: the total variation of Y, seen when looking at Y alone, and the smaller error variance of Y, seen when looking along the regression line.]
Total Variance and Error Variance
Degrees of Freedom in Regression:
df = (n − 2)   (n total observations, less one degree of freedom for each parameter estimated, b₀ and b₁)

An unbiased estimator of σ², denoted by s²:
MSE = SSE / (n − 2)

SSE = Σ(Y − Ŷ)² = SS_Y − (SS_XY)²/SS_X = SS_Y − b₁SS_XY
(Square and sum all regression errors to find SSE.)

Example 7-1:
SSE = SS_Y − b₁SS_XY = 66855898 − (1.255333776)(51402852.4) = 2328161.2
MSE = SSE/(n − 2) = 2328161.2/23 = 101224.4
s = √MSE = √101224.4 = 318.158
7-4 Error Variance and the Standard Errors of Regression Estimators
The standard error of b₀ (intercept):
s(b₀) = s·√( Σx² / (n·SS_X) ),   where s = √MSE
The standard error of b₁ (slope):
s(b₁) = s / √SS_X

Example 7-1:
s(b₀) = 318.158·√( 293426944 / ((25)(40947557.84)) ) = 170.338
s(b₁) = 318.158 / √40947557.84 = 0.04972

Standard Errors of Estimates in Regression
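A quick numeric check of these standard errors, as a sketch using the quantities already computed in Example 7-1:

import math

n = 25
ss_x = 40947557.84        # SS_X from Example 7-1
sum_x2 = 293426944.0      # sum of squared miles
mse = 101224.4
s = math.sqrt(mse)        # 318.158

se_b0 = s * math.sqrt(sum_x2 / (n * ss_x))   # ~170.338
se_b1 = s / math.sqrt(ss_x)                  # ~0.04972
print(round(se_b0, 3), round(se_b1, 5))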
A (1 − α)100% confidence interval for β₀:
b₀ ± t_(α/2, n−2)·s(b₀)
A (1 − α)100% confidence interval for β₁:
b₁ ± t_(α/2, n−2)·s(b₁)

Example 7-1, 95% confidence intervals:
b₀ ± t_(0.025, 25−2)·s(b₀) = 274.85 ± (2.069)(170.338) = 274.85 ± 352.43 = [−77.58, 627.28]
b₁ ± t_(0.025, 25−2)·s(b₁) = 1.25533 ± (2.069)(0.04972) = 1.25533 ± 0.10287 = [1.15246, 1.35820]

[Figure: the least-squares point estimate of the slope, b₁ = 1.25533, with the lower 95% bound 1.15246 and the upper 95% bound 1.35820; a slope of 0 lies outside these bounds and so is not a possible value of the regression slope at 95% confidence.]
Confidence Intervals for the Regression Parameters
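The intervals above can be verified with scipy's t distribution; a sketch using the estimates and standard errors from Example 7-1:

from scipy import stats

t_crit = stats.t.ppf(1 - 0.05 / 2, df=23)           # 2.069
for est, se in [(274.85, 170.338), (1.25533, 0.04972)]:
    half = t_crit * se
    print(f"[{est - half:.5g}, {est + half:.5g}]")  # [-77.58, 627.28] and [1.15246, 1.35820]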
The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables. The population correlation, denoted by ρ, can take on any value from −1 to 1.
ρ = −1 indicates a perfect negative linear relationship
−1 < ρ < 0 indicates a negative linear relationship
ρ = 0 indicates no linear relationship
0 < ρ < 1 indicates a positive linear relationship
ρ = 1 indicates a perfect positive linear relationship
The absolute value of ρ indicates the strength or exactness of the relationship.
7-5 Correlation
[Figure: six scatterplot panels illustrating ρ = 0, ρ = −.8, ρ = .8, ρ = 0, ρ = −1, and ρ = 1.]
Illustrations of Correlation
The sample correlation coefficient*:
r = SS_XY / √(SS_X·SS_Y)
The population correlation coefficient:
ρ = Cov(X, Y) / (σ_X σ_Y)
The covariance of two random variables X and Y:
Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]
where μ_X and μ_Y are the population means of X and Y respectively.

Example 7-1:
r = SS_XY / √(SS_X·SS_Y) = 51402852.4 / √((40947557.84)(66855898)) = 51402852.4 / 52321943.29 = 0.9824

*Note: If ρ < 0, b₁ < 0; if ρ = 0, b₁ = 0; if ρ > 0, b₁ > 0.
Covariance and Correlation
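A sketch computing r directly from the sums of squares above:

import math

ss_xy = 51402852.4
ss_x = 40947557.84
ss_y = 66855898.0

r = ss_xy / math.sqrt(ss_x * ss_y)   # sample correlation coefficient
print(round(r, 4))                   # 0.9824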
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.992265946
R Square            0.984591707
Adjusted R Square   0.98266567
Standard Error      0.279761372
Observations        10

ANOVA
            df          SS            MS            F        Significance F
Regression   1   40.0098686    40.0098686   511.2009204   1.55085E-08
Residual     8   0.626131402   0.078266425
Total        9   40.636

           Coefficients  Standard Error       t Stat      P-value      Lower 95%     Upper 95%
Intercept  -8.762524695     0.594092798  -14.74942084  4.39075E-07  -10.13250603  -7.39254336
US          1.423636087     0.062965575   22.60975277  1.55085E-08   1.278437117   1.568835058

RESIDUAL OUTPUT

Observation   Predicted Y     Residuals
 1            2.057109569    0.242890431
 2            2.484200395    0.115799605
 3            3.05365483    -0.15365483
 4            3.480745656   -0.280745656
 5            3.765472874   -0.065472874
 6            4.050200091    0.049799909
 7            4.619654526    0.180345474
 8            5.758563396   -0.058563396
 9            7.466926701   -0.466926701
10            8.463471962    0.436528038

[Figure: X Variable 1 Line Fit Plot showing Y and Predicted Y against X Variable 1 (0 to 15).]
Example 7-2: Using Computer-Excel
[Figure: Regression Plot of International (2 to 9) against United States (8 to 12).]
Y = -8.76252 + 1.42364X   R-Sq = 0.9846
Example 7-2: Regression Plot
H₀: ρ = 0 (No linear relationship)
H₁: ρ ≠ 0 (Some linear relationship)
Test statistic:
t(n−2) = r / √( (1 − r²) / (n − 2) )

Example 7-1:
t = 0.9824 / √( (1 − 0.9651) / (25 − 2) ) = 0.9824 / 0.0389 = 25.25
t_(0.005, 23) = 2.807 < 25.25, so H₀ is rejected at the 1% level.
Hypothesis Tests for the Correlation Coefficient
[Figure: three panels in which β₁ = 0: constant Y, purely unsystematic variation, and a nonlinear relationship.]
A hypothesis test for the existence of a linear relationship between X and Y:
H₀: β₁ = 0
H₁: β₁ ≠ 0
Test statistic for the existence of a linear relationship between X and Y:
t(n−2) = (b₁ − 0) / s(b₁)
where b₁ is the least-squares estimate of the regression slope and s(b₁) is the standard error of b₁. When the null hypothesis is true, the statistic has a t distribution with n − 2 degrees of freedom.
Hypothesis Tests about the Regression Relationship
Example 7-1:
H₀: β₁ = 0
H₁: β₁ ≠ 0
t = (b₁ − 0)/s(b₁) = 1.25533/0.04972 = 25.25
t_(0.005, 23) = 2.807 < 25.25
H₀ is rejected at the 1% level and we may conclude that there is a relationship between charges and miles traveled.

Example 10-3:
H₀: β₁ = 1
H₁: β₁ ≠ 1
t = (b₁ − 1)/s(b₁) = (1.24 − 1)/0.21 = 1.14
t_(0.05, 58) = 1.671 > 1.14
H₀ is not rejected at the 10% level. We may not conclude that the beta coefficient is different from 1.
Hypothesis Tests for the Regression Slope
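Both slope tests follow the same recipe: compute t = (b₁ − hypothesized value)/s(b₁) and compare against a t critical point. A sketch with scipy, using the numbers from the two examples above:

from scipy import stats

def slope_test(b1, se_b1, beta_null, df):
    """t statistic and two-sided p-value for H0: beta1 = beta_null."""
    t = (b1 - beta_null) / se_b1
    p = 2 * stats.t.sf(abs(t), df)
    return t, p

print(slope_test(1.25533, 0.04972, 0.0, df=23))  # Example 7-1: t = 25.25, p ~ 0
print(slope_test(1.24, 0.21, 1.0, df=58))        # Example 10-3: t = 1.14, p ~ 0.26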
The coefficient of determination, r2, is a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data.
[Figure: an observed point Y, its fitted value Ŷ, and the mean Ȳ, showing the total deviation split into explained and unexplained parts.]
Total Deviation = Unexplained Deviation (Error) + Explained Deviation (Regression)
(y − ȳ) = (y − ŷ) + (ŷ − ȳ)
SST = SSE + SSR
Σ(y − ȳ)² = Σ(y − ŷ)² + Σ(ŷ − ȳ)²
r² = SSR/SST = 1 − SSE/SST
Percentage of total variation explained by the regression.
7-7 How Good is the Regression?
[Figure: three panels illustrating r² = 0 (SSE = SST), r² = 0.90 (small SSE, large SSR), and r² = 0.50 (SSE and SSR comparable).]

Example 7-1:
r² = SSR/SST = 64527736.8/66855898 = 0.96518

[Figure: fitted-line plot of Dollars (2000 to 7000) against Miles (1000 to 5500).]
The Coefficient of Determination
7-8 Analysis of Variance and an F Test of the Regression Model
Example 7-1

Source of    Sum of       Degrees of   Mean Square   F Ratio   p Value
Variation    Squares      Freedom
Regression   64527736.8    1           64527736.8    637.47    0.000
Error         2328161.2   23             101224.4
Total        66855898.0   24

Source of    Sum of    Degrees of   Mean Square   F Ratio
Variation    Squares   Freedom
Regression   SSR       (1)          MSR           MSR/MSE
Error        SSE       (n−2)        MSE
Total        SST       (n−1)        MST
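The ANOVA entries follow directly from the sums of squares; a sketch reproducing the Example 7-1 table and its p-value:

from scipy import stats

n = 25
sst = 66855898.0
sse = 2328161.2
ssr = sst - sse                # 64527736.8

msr = ssr / 1                  # regression df = 1 in simple regression
mse = sse / (n - 2)            # error df = n - 2
f = msr / mse                  # 637.47
p = stats.f.sf(f, 1, n - 2)    # upper-tail F probability, ~0.000
print(round(f, 2), p)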
[Residual plots against x or ŷ:]
• Homoscedasticity: Residuals appear completely random. No indication of model inadequacy.
• Curved pattern in residuals, resulting from an underlying nonlinear relationship.
• Residuals exhibit a linear trend with time.
• Heteroscedasticity: Variance of residuals changes when x changes.
7-9 Residual Analysis and Checking for Model Inadequacies
• Point Prediction: A single-valued estimate of Y for a given value of X, obtained by inserting the value of X in the estimated regression equation.
• Prediction Interval:
For a value of Y given a value of X, accounting for variation in the regression line estimate and variation of points around the regression line.
For an average value of Y given a value of X, accounting for variation in the regression line estimate only.
7-10 Use of the Regression Model for Prediction
1) Uncertainty about the slope of the regression line
[Figure: regression line with upper and lower limits on the slope.]
2) Uncertainty about the intercept of the regression line
[Figure: regression line with upper and lower limits on the intercept.]
Errors in Predicting E[Y|X]
[Figure: prediction band for E[Y|X] around the regression line.]
• The prediction band for E[Y|X] is narrowest at the mean value of X.
• The prediction band widens as the distance from the mean of X increases.
• Predictions become very unreliable when we extrapolate beyond the range of the sample itself.
Prediction Interval for E[Y|X]
3) Variation around the regression line
[Figure: the prediction band for an individual value of Y is wider than the prediction band for E[Y|X].]
Additional Error in Predicting Individual Value of Y
A (1 − α)100% prediction interval for Y:
ŷ ± t_(α/2, n−2)·s·√( 1 + 1/n + (x − x̄)²/SS_X )

Example 7-1 (X = 4000):
ŷ = 274.85 + (1.2553)(4000) = 5296.05
5296.05 ± (2.069)(318.16)·√( 1 + 1/25 + (4000 − 3177.92)²/40947557.84 )
= 5296.05 ± 676.62 = [4619.43, 5972.67]
Prediction Interval for a Value of Y
A (1 − α)100% prediction interval for E[Y|X]:
ŷ ± t_(α/2, n−2)·s·√( 1/n + (x − x̄)²/SS_X )

Example 7-1 (X = 4000):
ŷ = 274.85 + (1.2553)(4000) = 5296.05
5296.05 ± (2.069)(318.16)·√( 1/25 + (4000 − 3177.92)²/40947557.84 )
= 5296.05 ± 156.48 = [5139.57, 5452.53]
Prediction Interval for the Average Value of Y
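Both intervals share one leverage term; a sketch reproducing the X = 4000 intervals, with values taken from Example 7-1:

import math
from scipy import stats

n, x0, x_bar = 25, 4000.0, 3177.92
ss_x, s = 40947557.84, 318.158
y_hat = 274.85 + 1.2553 * x0                    # 5296.05

t_crit = stats.t.ppf(0.975, df=n - 2)           # 2.069
leverage = 1 / n + (x0 - x_bar) ** 2 / ss_x

pi = t_crit * s * math.sqrt(1 + leverage)       # individual Y: ~676.62
ci = t_crit * s * math.sqrt(leverage)           # mean E[Y|X]: ~156.48
print((y_hat - pi, y_hat + pi), (y_hat - ci, y_hat + ci))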
MTB > regress 'Dollars' 1 'Miles' tres in C3 fits in C4;
SUBC> predict 4000;
SUBC> residuals in C5.

Regression Analysis

The regression equation is
Dollars = 275 + 1.26 Miles

Predictor     Coef     Stdev   t-ratio      p
Constant     274.8     170.3      1.61  0.120
Miles      1.25533   0.04972     25.25  0.000

s = 318.2   R-sq = 96.5%   R-sq(adj) = 96.4%

Analysis of Variance

SOURCE      DF        SS        MS       F      p
Regression   1  64527736  64527736  637.47  0.000
Error       23   2328161    101224
Total       24  66855896

   Fit  Stdev.Fit       95.0% C.I.          95.0% P.I.
5296.2       75.6   ( 5139.7, 5452.7)   ( 4619.5, 5972.8)
Using the Computer
MTB > PLOT 'Resids' * 'Fits'
MTB > PLOT 'Resids' * 'Miles'
[Figures: residuals (-500 to 500) plotted against Miles (1000 to 5500) and against Fits (2000 to 7000).]
Plotting on the Computer (1)
Plotting on the Computer (2)
MTB > HISTOGRAM 'StRes'
[Figure: histogram of standardized residuals 'StRes' from -2 to 2, with frequencies 0 to 8.]
MTB > PLOT 'Dollars' * 'Miles'
[Figure: scatterplot of Dollars (2000 to 7000) against Miles (1000 to 5500).]
• Using Statistics.
• The k-Variable Multiple Regression Model.
• The F Test of a Multiple Regression Model.
• How Good is the Regression.
• Tests of the Significance of Individual Regression Parameters.
• Testing the Validity of the Regression Model.
• Using the Multiple Regression Model for Prediction.
Multiple Regression (1)
• Qualitative Independent Variables.
• Polynomial Regression.
• Nonlinear Models and Transformations.
• Multicollinearity.
• Residual Autocorrelation and the Durbin-Watson Test.
• Partial F Tests and Variable Selection Methods.
• Using the Computer.
• The Matrix Approach to Multiple Regression Analysis.
• Summary and Review of Terms.
Multiple Regression (2)
Any two points (A and B), or an intercept and slope (β₀ and β₁), define a line on a two-dimensional surface.
Any three points (A, B, and C), or an intercept and coefficients of x₁ and x₂ (β₀, β₁, and β₂), define a plane y = β₀ + β₁x₁ + β₂x₂ in a three-dimensional surface.
[Figure: two panels, Lines and Planes, illustrating these cases.]
7-11 Using Statistics
The population regression model of a dependent variable, Y, on a set of k independent variables, X₁, X₂, ..., Xₖ is given by:
Y = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ + ε
where β₀ is the Y-intercept of the regression surface and each βᵢ, i = 1, 2, ..., k, is the slope of the regression surface (sometimes called the response surface) with respect to Xᵢ.
Model assumptions:
1. ε ~ N(0, σ²), independent of other errors.
2. The variables Xᵢ are uncorrelated with the error term.
7-12 The k-Variable Multiple Regression Model
In a simple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression line.
In a multiple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression plane.
[Figure: a fitted line ŷ = b₀ + b₁x in (x, y) space, and a fitted plane ŷ = b₀ + b₁x₁ + b₂x₂ in (x₁, x₂, y) space.]
Simple and Multiple Least-Squares Regression
The estimated regression relationship:
Ŷ = b₀ + b₁X₁ + b₂X₂ + ... + bₖXₖ
where Ŷ is the predicted value of Y, the value lying on the estimated regression surface. The terms b₀, ..., bₖ are the least-squares estimates of the population regression parameters βᵢ.
The actual, observed value of Y is the predicted value plus an error:
y = b₀ + b₁x₁ + b₂x₂ + ... + bₖxₖ + e
The Estimated Regression Relationship
Minimizing the sum of squared errors with respect to the estimated coefficients b₀, b₁, and b₂ yields the following normal equations:
Σy = nb₀ + b₁Σx₁ + b₂Σx₂
Σx₁y = b₀Σx₁ + b₁Σx₁² + b₂Σx₁x₂
Σx₂y = b₀Σx₂ + b₁Σx₁x₂ + b₂Σx₂²
Least-Squares Estimation: The 2-Variable Normal Equations
  Y   X₁   X₂   X₁X₂   X₁²   X₂²   X₁Y   X₂Y
 72   12    5     60   144    25    864   360
 76   11    8     88   121    64    836   608
 78   15    6     90   225    36   1170   468
 70   10    5     50   100    25    700   350
 68   11    3     33   121     9    748   204
 80   16    9    144   256    81   1280   720
 82   14   12    168   196   144   1148   984
 65    8    4     32    64    16    520   260
 62    8    3     24    64     9    496   186
 90   18   10    180   324   100   1620   900
---  ---  ---   ----  ----   ---   ----  ----
743  123   65    869  1615   509   9382  5040

Normal Equations:
743 = 10b₀ + 123b₁ + 65b₂
9382 = 123b₀ + 1615b₁ + 869b₂
5040 = 65b₀ + 869b₁ + 509b₂

b₀ = 47.164942
b₁ = 1.5990404
b₂ = 1.1487479

Estimated regression equation:
Ŷ = 47.164942 + 1.5990404X₁ + 1.1487479X₂
Example 7-3
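The three normal equations form a linear system that can be solved directly; a sketch with numpy:

import numpy as np

# Coefficient matrix and right-hand side of the Example 7-3 normal equations.
A = np.array([[10.0, 123.0, 65.0],
              [123.0, 1615.0, 869.0],
              [65.0, 869.0, 509.0]])
rhs = np.array([743.0, 9382.0, 5040.0])

b = np.linalg.solve(A, rhs)
print(b)   # [47.164942, 1.5990404, 1.1487479]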
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.980326323
R Square            0.961039699
Adjusted R Square   0.949908185
Standard Error      1.910940432
Observations        10

ANOVA
            df          SS            MS            F        Significance F
Regression   2   630.5381466   315.2690733   86.33503537   1.16729E-05
Residual     7   25.56185335   3.651693336
Total        9   656.1

           Coefficients  Standard Error       t Stat      P-value     Lower 95%    Upper 95%
Intercept   47.16494227     2.470414433  19.09191496  2.69229E-07  41.32334457  53.00653997
X1          1.599040336     0.280963057  5.691283238   0.00074201  0.934668753  2.263411919
X2          1.148747938      0.30524885  3.763316185  0.007044246  0.426949621  1.870546256
Excel Output
Example 7-3: Using the Computer
Total Deviation = Regression Deviation + Error Deviation
SST = SSR + SSE
[Figure: a point y above the fitted regression plane, showing:
Y − Ŷ : Error Deviation
Ŷ − Ȳ : Regression Deviation
Y − Ȳ : Total Deviation]
Decomposition of the Total Deviation in a Multiple Regression Model
A statistical test for the existence of a linear relationship between Y and any or all of the independent variables X₁, X₂, ..., Xₖ:
H₀: β₁ = β₂ = ... = βₖ = 0
H₁: Not all the βᵢ (i = 1, 2, ..., k) are 0

Source of    Sum of    Degrees of            Mean Square             F Ratio
Variation    Squares   Freedom
Regression   SSR       k                     MSR = SSR/k             MSR/MSE
Error        SSE       n − (k+1) = n−k−1     MSE = SSE/(n−(k+1))
Total        SST       n − 1                 MST = SST/(n−1)
7-13 The F Test of a Multiple Regression Model
Analysis of Variance

SOURCE      DF      SS      MS      F      p
Regression   2  630.54  315.27  86.34  0.000
Error        7   25.56    3.65
Total        9  656.10

The test statistic, F = 86.34, is greater than the critical point of F(2, 7) for any common level of significance (p-value ≈ 0), so the null hypothesis is rejected, and we may conclude that the dependent variable is related to one or more of the independent variables.

[Figure: F distribution with 2 and 7 degrees of freedom; the α = 0.01 critical point is F₀.₀₁ = 9.55, and the test statistic 86.34 lies far into the rejection region.]
Using the Computer: Analysis of Variance Table (Example 7-3)
The multiple coefficient of determination, R², measures the proportion of the variation in the dependent variable that is explained by the combination of the independent variables in the multiple regression model:
R² = SSR/SST = 1 − SSE/SST
The mean square error is an unbiased estimator of the variance of the population errors, denoted by σ²:
MSE = SSE/(n − (k+1)) = Σ(y − ŷ)²/(n − (k+1))
Standard error of estimate:
s = √MSE
[Figure: errors y − ŷ around the fitted regression plane.]
7-14 How Good is the Regression
The adjusted multiple coefficient of determination is the coefficient of determination with the SSE and SST divided by their respective degrees of freedom:
adjusted R² = 1 − [ SSE/(n − (k+1)) ] / [ SST/(n − 1) ]
SST = SSR + SSE
R² = SSR/SST = 1 − SSE/SST
Example 7-3: s = 1.911   R-sq = 96.1%   R-sq(adj) = 95.0%
Decomposition of the Sum of Squares and the Adjusted Coefficient of Determination
Source of    Sum of    Degrees of            Mean Square             F Ratio
Variation    Squares   Freedom
Regression   SSR       k                     MSR = SSR/k             F = MSR/MSE
Error        SSE       n − (k+1) = n−k−1     MSE = SSE/(n−(k+1))
Total        SST       n − 1                 MST = SST/(n−1)

R² = SSR/SST = 1 − SSE/SST
adjusted R² = 1 − [ SSE/(n − (k+1)) ] / [ SST/(n − 1) ] = 1 − MSE/MST
F = MSR/MSE = [ R²/(1 − R²) ] · [ (n − (k+1))/k ]
Measures of Performance in Multiple Regression and the ANOVA Table
Hypothesis tests about individual regression slope parameters:
(1) H₀: β₁ = 0   H₁: β₁ ≠ 0
(2) H₀: β₂ = 0   H₁: β₂ ≠ 0
. . .
(k) H₀: βₖ = 0   H₁: βₖ ≠ 0
Test statistic for test i:
t(n − (k+1)) = (bᵢ − 0) / s(bᵢ)
7-15 Tests of the Significance of Individual Regression Parameters
Variable   Coefficient Estimate   Standard Error   t-Statistic
Constant   53.12                  5.43               9.783 *
X1          2.03                  0.22               9.227 *
X2          5.60                  1.30               4.308 *
X3         10.35                  6.88               1.504
X4          3.45                  2.70               1.259
X5         -4.25                  0.38             -11.184 *
n = 150    t₀.₀₂₅ ≈ 1.96
Regression Results for Individual Parameters
MTB > regress 'Y' on 2 predictors 'X1' 'X2'

Regression Analysis

The regression equation is
Y = 47.2 + 1.60 X1 + 1.15 X2

Predictor    Coef    Stdev   t-ratio      p
Constant   47.165    2.470     19.09  0.000
X1         1.5990   0.2810      5.69  0.000
X2         1.1487   0.3052      3.76  0.007

s = 1.911   R-sq = 96.1%   R-sq(adj) = 95.0%

Analysis of Variance

SOURCE      DF      SS      MS      F      p
Regression   2  630.54  315.27  86.34  0.000
Error        7   25.56    3.65
Total        9  656.10

SOURCE  DF  SEQ SS
X1       1  578.82
X2       1   51.72
Example 7-3: Using the Computer
MTB > READ 'a:\data\c11_t6.dat' C1-C5
MTB > NAME c1 'EXPORTS' c2 'M1' c3 'LEND' c4 'PRICE' c5 'EXCHANGE'
MTB > REGRESS 'EXPORTS' on 4 predictors 'M1' 'LEND' 'PRICE' 'EXCHANGE'

Regression Analysis

The regression equation is
EXPORTS = - 4.02 + 0.368 M1 + 0.0047 LEND + 0.0365 PRICE + 0.27 EXCHANGE

Predictor       Coef      Stdev   t-ratio      p
Constant      -4.015      2.766     -1.45  0.152
M1           0.36846    0.06385      5.77  0.000
LEND         0.00470    0.04922      0.10  0.924
PRICE       0.036511   0.009326      3.91  0.000
EXCHANGE       0.268      1.175      0.23  0.820

s = 0.3358   R-sq = 82.5%   R-sq(adj) = 81.4%

Analysis of Variance

SOURCE      DF       SS      MS      F      p
Regression   4  32.9463  8.2366  73.06  0.000
Error       62   6.9898  0.1127
Total       66  39.9361
Using the Computer: Example 7-4
MTB > REGRESS 'EXPORTS' on 3 predictors 'LEND' 'PRICE' 'EXCHANGE'

Regression Analysis

The regression equation is
EXPORTS = - 0.29 - 0.211 LEND + 0.0781 PRICE - 2.10 EXCHANGE

Predictor       Coef      Stdev   t-ratio      p
Constant      -0.289      3.308     -0.09  0.931
LEND        -0.21140    0.03929     -5.38  0.000
PRICE       0.078148   0.007268     10.75  0.000
EXCHANGE      -2.095      1.355     -1.55  0.127

s = 0.4130   R-sq = 73.1%   R-sq(adj) = 71.8%

Analysis of Variance

SOURCE      DF       SS      MS      F      p
Regression   3  29.1919  9.7306  57.06  0.000
Error       63  10.7442  0.1705
Total       66  39.9361
Example 7-5: Three Predictors
MTB > REGRESS 'EXPORTS' on 2 predictors 'M1' 'PRICE'

Regression Analysis

The regression equation is
EXPORTS = - 3.42 + 0.361 M1 + 0.0370 PRICE

Predictor       Coef      Stdev   t-ratio      p
Constant     -3.4230     0.5409     -6.33  0.000
M1           0.36142    0.03925      9.21  0.000
PRICE       0.037033   0.004094      9.05  0.000

s = 0.3306   R-sq = 82.5%   R-sq(adj) = 81.9%

Analysis of Variance

SOURCE      DF      SS      MS       F      p
Regression   2  32.940  16.470  150.67  0.000
Error       64   6.996   0.109
Total       66  39.936
Example 7-5: Two Predictors
[Figure: Residuals Plotted Against M1 (Apparently Random); residuals between -1 and 1 across M1 values 5 to 9.]
[Figure: Residuals Plotted Against Price (Apparent Heteroscedasticity); residual spread changes across PRICE values 110 to 160.]
7-16 Investigating the Validity of the Regression Model: Residual Plots
Investigating the Validity of the Regression: Residual Plots (2)
Residuals Plotted Against Time (Apparently Random)
Residuals Plotted Against Fitted Values (Apparent Heteroscedasticity)
[Figures: residuals between -1 and 1 plotted against TIME (0 to 70) and against the fitted values Y-HAT (3 to 5).]
MTB > Histogram 'SRES1'.
Histogram of SRES1   N = 67

Midpoint   Count
  -3.0        1  *
  -2.5        1  *
  -2.0        3  ***
  -1.5        1  *
  -1.0        5  *****
  -0.5       13  *************
   0.0       19  *******************
   0.5       12  ************
   1.0        6  ******
   1.5        3  ***
   2.0        2  **
   2.5        0
   3.0        1  *

Standardized residuals are approximately distributed as N(0, 1).
Histogram of Standardized Residuals: Example 7-6
[Figure: a scatter of points with one outlier (*) far from the pattern; the regression line with the outlier is pulled away from the regression line without the outlier.]
Outliers
[Figure: a cluster of points with no relationship among themselves, plus one point (*) with a large value of x; the regression line when all data are included is determined largely by this influential observation.]
Investigating the Validity of the Regression: Outliers and Influential Observations
Unusual Observations
Obs.    M1  EXPORTS     Fit  Stdev.Fit  Residual  St.Resid
  1   5.10   2.6000  2.6420     0.1288   -0.0420    -0.14 X
  2   4.90   2.6000  2.6438     0.1234   -0.0438    -0.14 X
 25   6.20   5.5000  4.5949     0.0676    0.9051     2.80R
 26   6.30   3.7000  4.6311     0.0651   -0.9311    -2.87R
 50   8.30   4.3000  5.1317     0.0648   -0.8317    -2.57R
 67   8.20   5.6000  4.9474     0.0668    0.6526     2.02R

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.
Outliers and Influential Observations: Example 7-6
[Figure: estimated regression plane for Example 7-3, with Sales plotted over Advertising (8.00 to 18.00) and Promotions.]
7-17 Using the Multiple Regression Model for Prediction
MTB > regress 'EXPORTS' 2 'M1' 'PRICE';
SUBC> predict 6 160;
SUBC> predict 5 150;
SUBC> predict 4 130.

   Fit  Stdev.Fit       95.0% C.I.          95.0% P.I.
4.6708     0.0853   ( 4.5003, 4.8412)   ( 3.9885, 5.3530)
3.9390     0.0901   ( 3.7590, 4.1190)   ( 3.2543, 4.6237)
2.8370     0.1116   ( 2.6140, 3.0599)   ( 2.1397, 3.5342)
A (1 − α)100% prediction interval for a value of Y given values of Xᵢ:
ŷ ± t_(α/2, n−(k+1)) · √( s²(ŷ) + MSE )
A (1 − α)100% prediction interval for the conditional mean of Y given values of Xᵢ:
ŷ ± t_(α/2, n−(k+1)) · s[Ê(Y)]
Prediction in Multiple Regression
MOVIE  EARN  COST  PROM  BOOK
    1    28   4.2   1.0     0
    2    35   6.0   3.0     1
    3    50   5.5   6.0     1
    4    20   3.3   1.0     0
    5    75  12.5  11.0     1
    6    60   9.6   8.0     1
    7    15   2.5   0.5     0
    8    45  10.8   5.0     0
    9    50   8.4   3.0     1
   10    34   6.6   2.0     0
   11    48  10.7   1.0     1
   12    82  11.0  15.0     1
   13    24   3.5   4.0     0
   14    50   6.9  10.0     0
   15    58   7.8   9.0     1
   16    63  10.1  10.0     0
   17    30   5.0   1.0     1
   18    37   7.5   5.0     0
   19    45   6.4   8.0     1
   20    72  10.0  12.0     1
MTB > regress 'EARN' 3 'COST' 'PROM' 'BOOK'

Regression Analysis

The regression equation is
EARN = 7.84 + 2.85 COST + 2.28 PROM + 7.17 BOOK

Predictor    Coef   Stdev  t-ratio      p
Constant    7.836   2.333     3.36  0.004
COST       2.8477  0.3923     7.26  0.000
PROM       2.2782  0.2534     8.99  0.000
BOOK        7.166   1.818     3.94  0.001

s = 3.690   R-sq = 96.7%   R-sq(adj) = 96.0%

Analysis of Variance

SOURCE      DF      SS      MS       F      p
Regression   3  6325.2  2108.4  154.89  0.000
Error       16   217.8    13.6
Total       19  6543.0
An indicator (dummy, binary) variable of qualitative level A:
X_h = 1 if level A is obtained
X_h = 0 if level A is not obtained
7-18 Qualitative (or Categorical) Independent Variables (in Regression)
A regression with one quantitative variable (X₁) and one qualitative variable (X₂):
ŷ = b₀ + b₁x₁ + b₂x₂
[Figure: two parallel lines, with intercept b₀ for X₂ = 0 and intercept b₀ + b₂ for X₂ = 1.]
A multiple regression with two quantitative variables (X₁ and X₂) and one qualitative variable (X₃):
ŷ = b₀ + b₁x₁ + b₂x₂ + b₃x₃
[Figure: two parallel regression planes separated by a vertical distance b₃.]
Picturing Qualitative Variables in Regression
A regression with one quantitative variable (X₁) and two qualitative variables (X₂ and X₃):
ŷ = b₀ + b₁x₁ + b₂x₂ + b₃x₃
A qualitative variable with r levels or categories is represented with (r − 1) 0/1 (dummy) variables.
Category    X₂   X₃
Adventure    0    0
Drama        0    1
Romance      1    0
[Figure: three parallel lines with intercepts b₀ (X₂ = 0, X₃ = 0), b₀ + b₂ (X₂ = 1, X₃ = 0), and b₀ + b₃ (X₂ = 0, X₃ = 1).]
Picturing Qualitative Variables in Regression: Three Categories and Two Dummy Variables
Salary = 8547 + 949 Education + 1258 Experience - 3256 Gender
 (SE)    (32.6)  (45.1)         (78.5)            (212.4)
 (t)     (262.2) (21.0)         (16.0)            (-15.3)

Gender = 1 if Female, 0 if Male
On average, female salaries are $3256 below male salaries.
Using Qualitative Variables in Regression: Example 7-6
A regression with interaction between a quantitative variable (X₁) and a qualitative variable (X₂):
ŷ = b₀ + b₁x₁ + b₂x₂ + b₃x₁x₂
[Figure: for X₂ = 0, a line with intercept b₀ and slope b₁; for X₂ = 1, a line with intercept b₀ + b₂ and slope b₁ + b₃.]
Interactions between Quantitative and Qualitative Variables: Shifting Slopes
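A sketch of fitting such an interaction model by building the design matrix explicitly; the data here are simulated for illustration, not from the slides:

import numpy as np

rng = np.random.default_rng(1)
n = 40
x1 = rng.uniform(0, 10, n)                     # quantitative variable
x2 = rng.integers(0, 2, n).astype(float)       # 0/1 qualitative variable
y = 2 + 1.5 * x1 + 3 * x2 + 0.8 * x1 * x2 + rng.normal(0, 1, n)

# Columns: intercept, x1, x2, and the x1*x2 interaction term.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)   # b0, b1, b2 (intercept shift), b3 (slope shift)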
One-variable polynomial regression model:
Y = β₀ + β₁X + β₂X² + β₃X³ + ... + βₘX^m + ε
where m is the degree of the polynomial, the highest power of X appearing in the equation. The degree of the polynomial is the order of the model.
[Figure: fitted curves of increasing order: a straight line ŷ = b₀ + b₁X; a quadratic ŷ = b₀ + b₁X + b₂X²; a cubic ŷ = b₀ + b₁X + b₂X² + b₃X³.]
7-19 Polynomial Regression
MTB > regress 'sales' 2 'advert' 'advsqr'

Regression Analysis

The regression equation is
SALES = 3.52 + 2.51 ADVERT - 0.0875 ADVSQR

Predictor       Coef     Stdev   t-ratio      p
Constant      3.5150    0.7385      4.76  0.000
ADVERT        2.5148    0.2580      9.75  0.000
ADVSQR      -0.08745   0.01658     -5.28  0.000

s = 1.228   R-sq = 95.9%   R-sq(adj) = 95.4%

Analysis of Variance

SOURCE      DF      SS      MS       F      p
Regression   2  630.26  315.13  208.99  0.000
Error       18   27.14    1.51
Total       20  657.40

[Figure: SALES (5 to 25) against ADVERT (0 to 15) with the fitted quadratic curve.]
Polynomial Regression: Example 7-7
Variable   Estimate   Standard Error   T-statistic
X1             2.34             0.92          2.54
X2             3.11             1.05          2.96
X1²            4.22             1.00          4.22
X2²            3.57             2.12          1.68
X1X2           2.77             2.30          1.20
Polynomial Regression: Other Variables and Cross-Product Terms
The multiplicative model:
Y = β₀ X₁^β₁ X₂^β₂ X₃^β₃ ε
The logarithmic transformation:
log Y = log β₀ + β₁ log X₁ + β₂ log X₂ + β₃ log X₃ + log ε

MTB > loge c1 c3
MTB > loge c2 c4
MTB > name c3 'LOGSALE' c4 'LOGADV'
MTB > regress 'logsale' 1 'logadv'

Regression Analysis

The regression equation is
LOGSALE = 1.70 + 0.553 LOGADV

Predictor     Coef    Stdev  t-ratio      p
Constant   1.70082  0.05123    33.20  0.000
LOGADV     0.55314  0.03011    18.37  0.000

s = 0.1125   R-sq = 94.7%   R-sq(adj) = 94.4%

Analysis of Variance

SOURCE      DF      SS      MS       F      p
Regression   1  4.2722  4.2722  337.56  0.000
Error       19  0.2405  0.0127
Total       20  4.5126
7-20 Nonlinear Models and Transformations: Multiplicative Model
MTB > regress 'sales' 1 'logadv'

Regression Analysis

The regression equation is
SALES = 3.67 + 6.78 LOGADV

Predictor    Coef   Stdev  t-ratio      p
Constant   3.6683  0.4016     9.13  0.000
LOGADV     6.7840  0.2360    28.74  0.000

s = 0.8819   R-sq = 97.8%   R-sq(adj) = 97.6%

Analysis of Variance

SOURCE      DF      SS      MS       F      p
Regression   1  642.62  642.62  826.24  0.000
Error       19   14.78    0.78
Total       20  657.40

The exponential model:
Y = β₀ e^(β₁X) ε
The logarithmic transformation:
log Y = log β₀ + β₁X + log ε
Transformations: Exponential Model
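Both transformations reduce to ordinary least squares on transformed data. A sketch using simulated data (not the slides' sales series), fitting the multiplicative model by regressing log(y) on log(x):

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(1, 15, 30)
y = 5.5 * x**0.55 * np.exp(rng.normal(0, 0.1, 30))   # multiplicative errors

# Multiplicative model: regress log(y) on log(x).
X = np.column_stack([np.ones_like(x), np.log(x)])
b, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
print(np.exp(b[0]), b[1])   # estimates of beta0 and beta1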
[Figure: Simple Regression of Sales on Advertising. Y = 6.59271 + 1.19176X, R-Squared = 0.895.]
[Figure: Regression of Log(Sales) on Log(Advertising). Y = 1.70082 + 0.553136X, R-Squared = 0.947.]
[Figure: Regression of Sales on Log(Advertising). Y = 3.66825 + 6.784X, R-Squared = 0.978.]
[Figure: Residual Plots: Sales vs Log(Advertising); residuals plotted against the fitted values Y-HAT.]
Plots of Transformed Variables
• Square root transformation, Ỹ = √Y: Useful when the variance of the regression errors is approximately proportional to the conditional mean of Y.
• Logarithmic transformation, Ỹ = log(Y): Useful when the variance of regression errors is approximately proportional to the square of the conditional mean of Y.
• Reciprocal transformation, Ỹ = 1/Y: Useful when the variance of the regression errors is approximately proportional to the fourth power of the conditional mean of Y.
Variance Stabilizing Transformations
The logistic function:
E[Y|X] = p = e^(β₀+β₁X) / (1 + e^(β₀+β₁X))
Transformation to linearize the logistic function:
log[ p / (1 − p) ] = β₀ + β₁X
[Figure: the S-shaped logistic curve rising from 0 to 1 as x increases.]
Regression with Dependent Indicator Variables
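A sketch of the linearizing transformation, assuming the observed responses are proportions p strictly between 0 and 1 (the data below are illustrative, not from the slides):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
p = np.array([0.08, 0.18, 0.35, 0.60, 0.80, 0.92])  # observed proportions

logit = np.log(p / (1 - p))                  # log[p/(1-p)]
X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, logit, rcond=None)
print(b)   # estimates of beta0 and beta1 on the logit scale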
[Figures: pairs of x₁, x₂ axes illustrating the cases below.]
• Orthogonal X variables provide information from independent sources. No multicollinearity.
• Perfectly collinear X variables provide identical information content. No regression.
• Some degree of collinearity. Problems with regression depend on the degree of collinearity.
• A high degree of negative collinearity also causes problems with regression.
7-21 Multicollinearity
• Variances of regression coefficients are inflated.
• Magnitudes of regression coefficients may be different from those expected.
• Signs of regression coefficients may not be as expected.
• Adding or removing variables produces large changes in coefficients.
• Removing a data point may cause large changes in coefficient estimates or signs.
• In some cases, the F ratio may be significant while the t ratios are not.
Effects of Multicollinearity
MTB > CORRELATION 'm1' 'lend' 'price' 'exchange'

Correlations (Pearson)

              M1    LEND   PRICE
LEND      -0.112
PRICE      0.447   0.745
EXCHANGE  -0.410  -0.279  -0.420

MTB > regress 'exports' on 4 predictors 'm1' 'lend' 'price' 'exchange';
SUBC> vif.

Regression Analysis

The regression equation is
EXPORTS = - 4.02 + 0.368 M1 + 0.0047 LEND + 0.0365 PRICE + 0.27 EXCHANGE

Predictor       Coef      Stdev   t-ratio      p  VIF
Constant      -4.015      2.766     -1.45  0.152
M1           0.36846    0.06385      5.77  0.000  3.2
LEND         0.00470    0.04922      0.10  0.924  5.4
PRICE       0.036511   0.009326      3.91  0.000  6.3
EXCHANGE       0.268      1.175      0.23  0.820  1.4

s = 0.3358   R-sq = 82.5%   R-sq(adj) = 81.4%
Detecting the Existence of Multicollinearity: Correlation Matrix of Independent Variables and Variance Inflation Factors
[Figure: VIF (0 to 100) plotted against R_h² (0 to 1); VIF rises slowly at first, then climbs steeply as R_h² approaches 1.]
Relationship between VIF and R_h²
The variance inflation factor associated with X_h:
VIF(X_h) = 1 / (1 − R_h²)
where R_h² is the R² value obtained for the regression of X_h on the other independent variables.
Variance Inflation Factor
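VIFs can be computed by regressing each predictor on the others; a self-contained sketch (the nearly collinear data are simulated for illustration):

import numpy as np

def vif(X):
    """VIF for each column of X (predictor columns only, no intercept column)."""
    n, k = X.shape
    out = []
    for j in range(k):
        # Regress column j on an intercept plus all other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r2 = 1 - resid.var() / X[:, j].var()   # R_h^2
        out.append(1 / (1 - r2))
    return out

rng = np.random.default_rng(3)
x1 = rng.normal(size=50)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=50)    # nearly collinear with x1
print(vif(np.column_stack([x1, x2])))        # both VIFs come out large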
• Drop a collinear variable from the regression.
• Change the sampling plan to include elements outside the multicollinearity range.
• Transformations of variables.
• Ridge regression.
Solutions to the Multicollinearity Problem
An autocorrelation is a correlation of the values of a variable with values of the same variable lagged one or more periods back. Consequences of autocorrelation include inaccurate estimates of variances and inaccurate predictions.
Lagged Residuals

 i     εᵢ    εᵢ₋₁   εᵢ₋₂   εᵢ₋₃   εᵢ₋₄
 1    1.0     *      *      *      *
 2    0.0    1.0     *      *      *
 3   -1.0    0.0    1.0     *      *
 4    2.0   -1.0    0.0    1.0     *
 5    3.0    2.0   -1.0    0.0    1.0
 6   -2.0    3.0    2.0   -1.0    0.0
 7    1.0   -2.0    3.0    2.0   -1.0
 8    1.5    1.0   -2.0    3.0    2.0
 9    1.0    1.5    1.0   -2.0    3.0
10   -2.5    1.0    1.5    1.0   -2.0

The Durbin-Watson test (first-order autocorrelation):
H₀: ρ₁ = 0   H₁: ρ₁ ≠ 0
The Durbin-Watson test statistic:
d = Σᵢ₌₂ⁿ (eᵢ − eᵢ₋₁)² / Σᵢ₌₁ⁿ eᵢ²
7-22 Residual Autocorrelation and the Durbin-Watson Test
Critical Points of the Durbin-Watson Statistic: α = 0.05, n = Sample Size, k = Number of Independent Variables

        k = 1        k = 2        k = 3        k = 4        k = 5
  n   dL    dU     dL    dU     dL    dU     dL    dU     dL    dU
 15  1.08  1.36   0.95  1.54   0.82  1.75   0.69  1.97   0.56  2.21
 16  1.10  1.37   0.98  1.54   0.86  1.73   0.74  1.93   0.62  2.15
 17  1.13  1.38   1.02  1.54   0.90  1.71   0.78  1.90   0.67  2.10
 18  1.16  1.39   1.05  1.53   0.93  1.69   0.82  1.87   0.71  2.06
  .    .     .      .     .      .     .      .     .      .     .
 65  1.57  1.63   1.54  1.66   1.50  1.70   1.47  1.73   1.44  1.77
 70  1.58  1.64   1.55  1.67   1.52  1.70   1.49  1.74   1.46  1.77
 75  1.60  1.65   1.57  1.68   1.54  1.71   1.51  1.74   1.49  1.77
 80  1.61  1.66   1.59  1.69   1.56  1.72   1.53  1.74   1.51  1.77
 85  1.62  1.67   1.60  1.70   1.57  1.72   1.55  1.75   1.52  1.77
 90  1.63  1.68   1.61  1.70   1.59  1.73   1.57  1.75   1.54  1.78
 95  1.64  1.69   1.62  1.71   1.60  1.73   1.58  1.75   1.56  1.78
100  1.65  1.69   1.63  1.72   1.61  1.74   1.59  1.76   1.57  1.78
MTB > regress 'EXPORTS' 4 'M1' 'LEND' 'PRICE' 'EXCHANGE';
SUBC> dw.

Durbin-Watson statistic = 2.58

[Diagram of the DW scale from 0 to 4: positive autocorrelation below dL; test inconclusive between dL and dU; no autocorrelation between dU and 4−dU; test inconclusive between 4−dU and 4−dL; negative autocorrelation above 4−dL.]

For n = 67, k = 4: dL ≈ 1.47, dU ≈ 1.73, 4 − dU ≈ 2.27, 4 − dL ≈ 2.53.
Since 2.58 > 4 − dL = 2.53, H₀ is rejected, and we conclude there is negative first-order autocorrelation.
Using the Durbin-Watson Statistic
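The statistic itself is simple to compute from a residual series; a sketch using the residuals from the lagged-residuals table above:

import numpy as np

def durbin_watson(e):
    """d = sum of squared successive differences over sum of squares."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

e = np.array([1.0, 0.0, -1.0, 2.0, 3.0, -2.0, 1.0, 1.5, 1.0, -2.5])
print(durbin_watson(e))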
Full model:
Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + β₄X₄ + ε
Reduced model:
Y = β₀ + β₁X₁ + β₂X₂ + ε
Partial F test:
H₀: β₃ = β₄ = 0
H₁: β₃ and β₄ not both 0
Partial F statistic:
F(r, n − (k+1)) = [ (SSE_R − SSE_F) / r ] / MSE_F
where SSE_R is the sum of squared errors of the reduced model, SSE_F is the sum of squared errors of the full model, MSE_F is the mean square error of the full model [MSE_F = SSE_F/(n − (k+1))], and r is the number of variables dropped from the full model.
7-23 Partial F Tests and Variable Selection Methods
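A sketch of the partial F computation from the two models' error sums of squares, using the exports example (full k = 4 model vs. the M1 + PRICE reduced model from Examples 7-4 and 7-5):

from scipy import stats

def partial_f(sse_reduced, sse_full, r, n, k):
    """Partial F statistic for dropping r variables from a k-variable full model."""
    mse_full = sse_full / (n - (k + 1))
    f = (sse_reduced - sse_full) / r / mse_full
    p = stats.f.sf(f, r, n - (k + 1))
    return f, p

# Dropping LEND and EXCHANGE: F is tiny and p is near 1,
# consistent with those two variables being insignificant.
print(partial_f(sse_reduced=6.996, sse_full=6.9898, r=2, n=67, k=4))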
• All possible regressions: Run regressions with all possible combinations of independent variables and select the best model.
• Stepwise procedures:
Forward selection: Add one variable at a time to the model, on the basis of its F statistic.
Backward elimination: Remove one variable at a time, on the basis of its F statistic.
Stepwise regression: Adds variables to the model and subtracts variables from the model, on the basis of the F statistic.
Variable Selection Methods
1. Compute the F statistic for each variable not in the model.
2. Is there at least one variable with p-value < P_in? If not, stop.
3. Enter the most significant (smallest p-value) variable into the model.
4. Calculate the partial F statistic for all variables in the model.
5. Is there a variable with p-value > P_out? If so, remove that variable.
6. Return to step 1.
Stepwise Regression
MTB > STEPWISE 'EXPORTS' PREDICTORS 'M1' 'LEND' 'PRICE' 'EXCHANGE'

Stepwise Regression

F-to-Enter: 4.00   F-to-Remove: 4.00

Response is EXPORTS on 4 predictors, with N = 67

Step           1        2
Constant  0.9348  -3.4230

M1         0.520    0.361
T-Ratio     9.89     9.21

PRICE             0.0370
T-Ratio             9.05

S          0.495    0.331
R-Sq       60.08    82.48
Stepwise Regression: Using the Computer
MTB > REGRESS 'EXPORTS' 4 'M1' 'LEND' 'PRICE' 'EXCHANGE';
SUBC> vif;
SUBC> dw.

Regression Analysis

The regression equation is
EXPORTS = - 4.02 + 0.368 M1 + 0.0047 LEND + 0.0365 PRICE + 0.27 EXCHANGE

Predictor       Coef      Stdev   t-ratio      p  VIF
Constant      -4.015      2.766     -1.45  0.152
M1           0.36846    0.06385      5.77  0.000  3.2
LEND         0.00470    0.04922      0.10  0.924  5.4
PRICE       0.036511   0.009326      3.91  0.000  6.3
EXCHANGE       0.268      1.175      0.23  0.820  1.4

s = 0.3358   R-sq = 82.5%   R-sq(adj) = 81.4%

Analysis of Variance

SOURCE      DF       SS      MS      F      p
Regression   4  32.9463  8.2366  73.06  0.000
Error       62   6.9898  0.1127
Total       66  39.9361

Durbin-Watson statistic = 2.58
Using the Computer: MINITAB
data exports;
  infile 'c:\aczel\data\c11_t6.dat';
  input exports m1 lend price exchange;
proc reg data = exports;
  model exports = m1 lend price exchange / dw vif;
run;

Model: MODEL1
Dependent Variable: EXPORTS

Analysis of Variance

                    Sum of       Mean
Source      DF     Squares     Square    F Value   Prob>F
Model        4    32.94634    8.23658     73.059   0.0001
Error       62     6.98978    0.11274
C Total     66    39.93612

Root MSE   0.33577   R-square   0.8250
Dep Mean   4.52836   Adj R-sq   0.8137
C.V.       7.41473
Using the Computer: SAS
Parameter Estimates

                    Parameter     Standard    T for H0:
Variable    DF       Estimate        Error    Parameter=0   Prob > |T|
INTERCEP     1      -4.015461   2.76640057       -1.452       0.1517
M1           1       0.368456   0.06384841        5.771       0.0001
LEND         1       0.004702   0.04922186        0.096       0.9242
PRICE        1       0.036511   0.00932601        3.915       0.0002
EXCHANGE     1       0.267896   1.17544016        0.228       0.8205

                  Variance
Variable    DF   Inflation
INTERCEP     1  0.00000000
M1           1  3.20719533
LEND         1  5.35391367
PRICE        1  6.28873181
EXCHANGE     1  1.38570639

Durbin-Watson D             2.583
(For Number of Obs.)           67
1st Order Autocorrelation  -0.321
Using the Computer: SAS (continued)
The population regression model, in matrix form:
y = Xβ + ε
where y = (y₁, y₂, ..., yₙ)′ is the n×1 vector of observations on the dependent variable; X is the n×(k+1) matrix whose first column is a column of 1s and whose remaining columns hold the observations x_i1, x_i2, ..., x_ik on the k independent variables; β = (β₀, β₁, ..., βₖ)′ is the (k+1)×1 vector of parameters; and ε = (ε₁, ε₂, ..., εₙ)′ is the n×1 vector of errors.
The estimated regression model:
Y = Xb + e
The Matrix Approach to Regression Analysis (1)
The normal equations:
X′Xb = X′Y
Estimators:
b = (X′X)⁻¹X′Y
Predicted values:
Ŷ = Xb = X(X′X)⁻¹X′Y = HY
V(b) = σ²(X′X)⁻¹
s²(b) = MSE·(X′X)⁻¹
The Matrix Approach to Regression Analysis (2)
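These matrix formulas translate line-for-line into numpy; a sketch using the Example 7-3 data (Y, X₁, X₂ columns as in the table):

import numpy as np

y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
x1 = np.array([12, 11, 15, 10, 11, 16, 14, 8, 8, 18], dtype=float)
x2 = np.array([5, 8, 6, 5, 3, 9, 12, 4, 3, 10], dtype=float)

X = np.column_stack([np.ones(10), x1, x2])   # first column of 1s

b = np.linalg.solve(X.T @ X, X.T @ y)        # b = (X'X)^(-1) X'Y
y_hat = X @ b                                # fitted values Xb
mse = np.sum((y - y_hat) ** 2) / (10 - 3)    # SSE / (n - (k+1))
cov_b = mse * np.linalg.inv(X.T @ X)         # s^2(b) = MSE (X'X)^(-1)
print(b)                                     # [47.1649, 1.5990, 1.1487]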