8/14/2019 Simple Regression With SPSS
Example of Using SPSS to Generate a Simple Regression Analysis
Given the desire of a retail chain's management team to develop a strategy for forecasting annual sales, the following data from a random sample of existing stores have been gathered:
STORE   SQUARE FOOTAGE   ANNUAL SALES ($)
  1         1726.00          3681.00
  2         1642.00          3895.00
  3         2816.00          6653.00
  4         5555.00          9543.00
  5         1292.00          3418.00
  6         2208.00          5563.00
  7         1313.00          3660.00
  8         1102.00          2694.00
  9         3151.00          5468.00
 10         1516.00          2898.00
 11         5161.00         10674.00
 12         4567.00          7585.00
 13         5841.00         11760.00
 14         3008.00          4085.00
We can enter the data into SPSS by typing it directly into the Data Editor, or by cutting and pasting:
Next, by clicking on Variable View, we can apply variable and value labels where appropriate:
Assuming, for now, that if a relationship exists between the two variables it is linear in nature, we can generate a simple Scatterplot (or Scatter Diagram) for the data. This is accomplished with the command sequence:
Which yields the following (editable) scatterplot:

[Scatterplot - "Regression Analysis for Site Selection: Simple Scatterplot of Data"; X axis: Square Footage of Store; Y axis: Sales Revenue of Store]
We can generate a simple straight-line equation from the output resulting when using the Enter method in regression:
Variables Entered/Removed(b)
Model   Variables Entered            Variables Removed   Method
1       Square Footage of Store(a)   .                   Enter
a. All requested variables entered.
b. Dependent Variable: Sales Revenue of Store
Which yields:
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .954(a)   .910       .902                936.8500
a. Predictors: (Constant), Square Footage of Store
ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   1.06E+08          1   106208119.7   121.009   .000(a)
Residual     10532255         12   877687.937
Total        1.17E+08         13
a. Predictors: (Constant), Square Footage of Store
b. Dependent Variable: Sales Revenue of Store
So then Y-hat(i) = b0 + b1*X(i) = 901.247 + 1.686*X(i)

(noting that no direct interpretation of the Y intercept at 0 square footage is possible, so the intercept represents the portion of annual sales varying due to factors other than store size), where b0 and b1 are taken from the Coefficients table:
Coefficients(a)
Model 1                        B         Std. Error   Beta   t        Sig.   95% CI Lower   95% CI Upper
(Constant)          [b0]       901.247   513.023             1.757    .104   -216.534       2019.027
Square Footage of Store [b1]   1.686     .153         .954   11.000   .000   1.352          2.020
(B = unstandardized coefficient; Beta = standardized coefficient; CI = Confidence Interval for B)
a. Dependent Variable: Sales Revenue of Store
SST (total sum of squares) = SSR (regression sum of squares) + SSE (error sum of squares)

SST = sum of the squared differences between each observed value of Y and Y-bar
SSR = sum of the squared differences between each predicted value of Y and Y-bar
SSE = sum of the squared differences between each observed value of Y and its predicted value

Coefficient of Determination = r² = SSR/SST = 0.91 (sample)

Standard Error of the Estimate = SYX = SQRT{ SSE / (n - 2) } = 936.85
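These quantities are easy to verify outside SPSS. A minimal sketch in Python (not part of the SPSS workflow; shown only as a cross-check of the output above):

```python
import math

# Store data from the table above: (square footage, annual sales $)
data = [(1726, 3681), (1642, 3895), (2816, 6653), (5555, 9543),
        (1292, 3418), (2208, 5563), (1313, 3660), (1102, 2694),
        (3151, 5468), (1516, 2898), (5161, 10674), (4567, 7585),
        (5841, 11760), (3008, 4085)]
n = len(data)
x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n

# Least-squares slope (b1) and intercept (b0)
s_xx = sum((x - x_bar) ** 2 for x, _ in data)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in data)
b1 = s_xy / s_xx               # SPSS reports 1.686
b0 = y_bar - b1 * x_bar        # SPSS reports 901.247

# Sums of squares and the statistics reported by SPSS
sst = sum((y - y_bar) ** 2 for _, y in data)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in data)
ssr = sst - sse
r_sq = ssr / sst                     # coefficient of determination, 0.910
s_yx = math.sqrt(sse / (n - 2))      # standard error of the estimate, 936.85
```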
Testing the General Assumptions of Regression and Residual Analysis
1. Normality of Error - similar to the t-test and ANOVA, regression is robust to departure from normality of the errors around the regression line. This assumption is often tested by simply plotting the Standardized Residuals (each residual divided by its standard error) on a histogram with a superimposed normal distribution, or on a normal probability plot. SPSS allows us to perform both functions automatically (while, incidentally, saving the residual values in the original data file if this option is toggled):
Of course, the assessment of normality by visually scanning the data leaves some statisticians unsettled, so I usually add an appropriate test of normality conducted on the data:

Variable        n    A-D     p-value
Stand._Resid.   14   0.348   0.503
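The Anderson-Darling result above came from a separate package. As a sketch of the same idea, a Shapiro-Wilk test (an assumed substitute, not the SPSS or A-D routine, and assuming SciPy is available) can be run on the regression residuals:

```python
from scipy import stats

# Store data from the handout: (square footage, annual sales $)
data = [(1726, 3681), (1642, 3895), (2816, 6653), (5555, 9543),
        (1292, 3418), (2208, 5563), (1313, 3660), (1102, 2694),
        (3151, 5468), (1516, 2898), (5161, 10674), (4567, 7585),
        (5841, 11760), (3008, 4085)]
n = len(data)
x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in data)
      / sum((x - x_bar) ** 2 for x, _ in data))
b0 = y_bar - b1 * x_bar
residuals = [y - (b0 + b1 * x) for x, y in data]

# Shapiro-Wilk test of H0: the errors are normally distributed.
# A large p-value means we fail to reject normality, agreeing with
# the A-D result above (p = 0.503).
w_stat, p_value = stats.shapiro(residuals)
```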
[Histogram of the Regression Standardized Residuals with superimposed normal curve. Dependent Variable: Sales Revenue of Store. Std. Dev = .96, Mean = 0.00, N = 14.00]

[Normal P-P Plot of Regression Standardized Residual. Dependent Variable: Sales Revenue of Store. Expected Cum Prob plotted against Observed Cum Prob.]
2. Homoscedasticity - the assumption that the variability of the data around the regression line is constant for all values of X. In other words, the error must be independent of X. Generally, this assumption may be tested by plotting the X values against the raw residuals for Y. In SPSS, this must be done by producing a Scatterplot from the saved variables:
This results in the saved values being automatically added to the data file:
Then, simply produce the requisite scatterplot as before:
Notice how there is no 'fanning' pattern to the data, implying homoscedasticity.
[Scatterplot: Unstandardized Residual (Y axis) vs. Square Footage of Store (X axis)]
Other authors, including those who wrote the SPSS routine, choose to plot the X values against the Studentized Residuals (Standardized Residuals adjusted for their distance from the average X value) rather than the Unstandardized (raw) Residuals. SPSS will generate this plot automatically (select this under the Plots panel):
[Scatterplot of Studentized Residuals and Square Footage (X): Studentized Residual (Y axis) vs. Square Footage of Store (X axis)]
Note the equivalence of results between the two plots. Statistically speaking, the correlation between the X values and the Residuals may be inferred to be 0.00. We can confirm this using the correlation utility in SPSS, which tests the null hypothesis that the Pearson rho for the population is equal to 0.00:
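That the least-squares residuals are uncorrelated with X is in fact guaranteed by the normal equations; a quick Python cross-check (not part of the SPSS session):

```python
import math

# Store data from the handout: (square footage, annual sales $)
data = [(1726, 3681), (1642, 3895), (2816, 6653), (5555, 9543),
        (1292, 3418), (2208, 5563), (1313, 3660), (1102, 2694),
        (3151, 5468), (1516, 2898), (5161, 10674), (4567, 7585),
        (5841, 11760), (3008, 4085)]
n = len(data)
x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in data)
      / sum((x - x_bar) ** 2 for x, _ in data))
b0 = y_bar - b1 * x_bar
resid = [y - (b0 + b1 * x) for x, y in data]

# Pearson correlation between X and the least-squares residuals.
# Least squares forces this to be exactly zero (up to rounding),
# matching the .000 entry in the SPSS Correlations table.
e_bar = sum(resid) / n
num = sum((x - x_bar) * (e - e_bar) for (x, _), e in zip(data, resid))
den = math.sqrt(sum((x - x_bar) ** 2 for x, _ in data)
                * sum((e - e_bar) ** 2 for e in resid))
r_xe = num / den
```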
It should be noted that the distribution of the data also suggests that an assumption of linearity is reasonable at this point.
3. Independence of the Errors - assumes that no autocorrelation is present. Generally, this is evaluated by plotting the residuals in the order or sequence in which the original data were collected. This approach, when meaningful, uses the Durbin-Watson statistic and associated tables of critical values. SPSS can generate this value when requested as part of the Model Summary:
A number of other statistics are also available in SPSS regarding Residual Analysis:
Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .954(a)   .910       .902                936.8500
Change Statistics: R Square Change = .910, F Change = 121.009, df1 = 1, df2 = 12, Sig. F Change = .000
Durbin-Watson = 2.446
a. Predictors: (Constant), Square Footage of Store
b. Dependent Variable: Sales Revenue of Store
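The Durbin-Watson statistic is simple to reproduce by hand: the sum of squared successive residual differences divided by the sum of squared residuals. A Python sketch (a cross-check, not the SPSS computation):

```python
# Store data in collection order: (square footage, annual sales $)
data = [(1726, 3681), (1642, 3895), (2816, 6653), (5555, 9543),
        (1292, 3418), (2208, 5563), (1313, 3660), (1102, 2694),
        (3151, 5468), (1516, 2898), (5161, 10674), (4567, 7585),
        (5841, 11760), (3008, 4085)]
n = len(data)
x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in data)
      / sum((x - x_bar) ** 2 for x, _ in data))
b0 = y_bar - b1 * x_bar
e = [y - (b0 + b1 * x) for x, y in data]

# Durbin-Watson statistic; values near 2 suggest no autocorrelation.
# SPSS reports 2.446 for these data.
dw = (sum((e[i] - e[i - 1]) ** 2 for i in range(1, n))
      / sum(r ** 2 for r in e))
```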
Correlations
                               Square Footage   Unstandardized   Studentized
                               of Store         Residual         Residual
Square Footage of Store
  Pearson Correlation          1.000            .000             .015
  Sig. (2-tailed)              .                1.000            .959
  N                            14               14               14
Unstandardized Residual
  Pearson Correlation          .000             1.000            .999**
  Sig. (2-tailed)              1.000            .                .000
  N                            14               14               14
Studentized Residual
  Pearson Correlation          .015             .999**           1.000
  Sig. (2-tailed)              .959             .000             .
  N                            14               14               14
**. Correlation is significant at the 0.01 level (2-tailed).
Residuals Statistics(a)
                                  Minimum     Maximum     Mean        Std. Deviation   N
Predicted Value                   2759.3672   10749.96    5826.9286   2858.2959        14
Std. Predicted Value              -1.073      1.722       .000        1.000            14
Standard Error of Pred. Value     250.7362    512.8126    345.3026    81.3831          14
Adjusted Predicted Value          2771.8208   10518.55    5804.4373   2830.7178        14
Residual                          -1888.14    1070.6108   -3.25E-13   900.0964         14
Std. Residual                     -2.015      1.143       .000        .961             14
Stud. Residual                    -2.092      1.288       .011        1.035            14
Deleted Residual                  -2033.82    1442.1392   22.4913     1049.3911        14
Stud. Deleted Residual            -2.512      1.329       -.014       1.111            14
Mahal. Distance                   .003        2.967       .929        .901             14
Cook's Distance                   .001        .355        .086        .103             14
Centered Leverage Value           .000        .228        .071        .069             14
a. Dependent Variable: Sales Revenue of Store
Inferences About the Model and Interval Estimates
We can determine the presence of a significant relationship between X and Y by testing whether the observed slope differs significantly from 0, the hypothesized slope of the regression line if no relationship existed. This can be done with a t-test, which divides the observed slope by the standard error of the slope (supplied by SPSS):
or with an ANOVA model, which provides identical results:

noting that t², as expected, equals F, and the p-values are therefore equal. Note that SPSS also provides the confidence interval associated with the slope.
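The equivalence of the two tests can be checked directly; a Python sketch (again a cross-check of the SPSS output, not part of the SPSS session):

```python
import math

# Store data from the handout: (square footage, annual sales $)
data = [(1726, 3681), (1642, 3895), (2816, 6653), (5555, 9543),
        (1292, 3418), (2208, 5563), (1313, 3660), (1102, 2694),
        (3151, 5468), (1516, 2898), (5161, 10674), (4567, 7585),
        (5841, 11760), (3008, 4085)]
n = len(data)
x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n
s_xx = sum((x - x_bar) ** 2 for x, _ in data)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in data) / s_xx
b0 = y_bar - b1 * x_bar

sst = sum((y - y_bar) ** 2 for _, y in data)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in data)
s_yx = math.sqrt(sse / (n - 2))

# t-test on the slope: b1 divided by its standard error (SPSS: t = 11.000)
se_b1 = s_yx / math.sqrt(s_xx)
t = b1 / se_b1

# ANOVA: F = MSR / MSE (SPSS: F = 121.009); t**2 equals F
f = ((sst - sse) / 1) / (sse / (n - 2))
```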
Finally, SPSS allows you to calculate and store both Confidence and Prediction Limits for the observed data. After you generate the scatterplot, left double-click on the chart; this will take you to the chart editor:
Coefficients(a)
Model 1                   B         Std. Error   Beta   t        Sig.   95% CI Lower   95% CI Upper
(Constant)                901.247   513.023             1.757    .104   -216.534       2019.027
Square Footage of Store   1.686     .153         .954   11.000   .000   1.352          2.020
(B = unstandardized coefficient; Beta = standardized coefficient; CI = Confidence Interval for B)
a. Dependent Variable: Sales Revenue of Store
ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   1.06E+08          1   106208119.7   121.009   .000(a)
Residual     10532255         12   877687.937
Total        1.17E+08         13
a. Predictors: (Constant), Square Footage of Store
b. Dependent Variable: Sales Revenue of Store
Next:
Then:
Click on Fit Options
LCL           UCL            LPL           UPL
3135.52558    4487.50548     1661.27256    5961.75850
2976.95430    4362.80609     1514.25297    5825.50741
5102.73145    6196.07384     3536.24581    7762.55948
9232.70820    11302.74446    7979.09247    12556.36019
2309.22155    3850.24435     897.92860     5261.53731
4028.95209    5219.51308     2497.98206    6750.48311
2349.56701    3880.71656     935.07592     5295.20765
1942.80866    3575.92595     560.87909     4957.85553
5663.35086    6765.16486     4100.00127    8328.51446
2737.79303    4177.06134     1293.06683    5621.78754
8677.59067    10529.18763    7362.03125    11844.74705
7827.42925    9376.22071     6418.64584    10785.00412
9632.63839    11867.28348    8422.94738    13076.97449
5426.83323    6519.44789     3860.07783    8086.20329
(LCL/UCL = lower/upper confidence limits for the mean response; LPL/UPL = lower/upper prediction limits for an individual store)
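These limits can be reproduced from the fitted model. A Python sketch checking the first store's row (the critical value t(.975, df = 12) ≈ 2.1788 is looked up from a t table here, not computed):

```python
import math

# Store data from the handout: (square footage, annual sales $)
data = [(1726, 3681), (1642, 3895), (2816, 6653), (5555, 9543),
        (1292, 3418), (2208, 5563), (1313, 3660), (1102, 2694),
        (3151, 5468), (1516, 2898), (5161, 10674), (4567, 7585),
        (5841, 11760), (3008, 4085)]
n = len(data)
x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n
s_xx = sum((x - x_bar) ** 2 for x, _ in data)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in data) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in data)
s_yx = math.sqrt(sse / (n - 2))

T_CRIT = 2.1788  # t(.975, df = 12), taken from a t table

def limits(x):
    """Confidence (mean response) and prediction (individual) limits at x."""
    y_hat = b0 + b1 * x
    h = 1 / n + (x - x_bar) ** 2 / s_xx        # leverage term
    half_cl = T_CRIT * s_yx * math.sqrt(h)
    half_pl = T_CRIT * s_yx * math.sqrt(1 + h)
    return (y_hat - half_cl, y_hat + half_cl,
            y_hat - half_pl, y_hat + half_pl)

# First store (1726 sq ft) -> should match the first row of the table
lcl, ucl, lpl, upl = limits(1726)
```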
Regression Analysis for Site Selection
Scatterplot of Data Including Confidence & Prediction Limits

[Scatterplot: Sales Revenue of Store (Y axis) vs. Square Footage of Store (X axis), with fitted line and confidence & prediction bands; Rsq = 0.9098]