Regr Hand
Transcript of Regr Hand
7/29/2019
SPSS Regression Output
Regression
The box below is the first thing you'll see in the standard SPSS regression output. Standard output
means that I did not click ANY boxes or options to get this printout. For bivariate regression this
first box is not of much interest, because it simply lists the single variable we are using as a
predictor. Later, for multiple regression, this box will show all the variables in our models, and it
can also show the sets of variables for several models at once.
Variables Entered/Removed(b)

Model 1
  Variables Entered:  SOCIO-ECONOMIC STATUS COMPOSITE(a)
  Variables Removed:  (none)
  Method:             Enter

a. All requested variables entered.
b. Dependent Variable: MATH STANDARDIZED SCORE
The model summary box comes next. In it you will find R, R2, and the standard error of estimate
(Sy.x), which is the square root of the mean squared error, or MSE. Here we interpret R as the
correlation of the Y scores with the predicted values Y-hat.

The adjusted R2 is used in multiple regression. It is adjusted to account for the use of more
predictors: simply adding more Xs can raise your R2, so this value is adjusted downwards a little to
penalize ourselves for just hunting around for significant predictors.
Model Summary

Model 1:  R = .531(a)   R Square = .282   Adjusted R Square = .279   Std. Error of the Estimate = 8.2148

a. Predictors: (Constant), SOCIO-ECONOMIC STATUS COMPOSITE
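The numbers in the model summary can be reproduced from the sums of squares in the ANOVA table further below. Here is a minimal Python sketch of that arithmetic, using the values SPSS printed (n = 250 cases and k = 1 predictor for our data):

```python
import math

# Quantities taken from the SPSS output (substitute your own values as needed)
ss_regression = 6574.387
ss_total = 23310.215
mse = 67.483            # mean square error from the ANOVA table
n, k = 250, 1           # sample size and number of predictors

r_squared = ss_regression / ss_total                          # R Square
r = math.sqrt(r_squared)                                      # R (positive root, since the slope is positive)
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)   # Adjusted R Square
se_estimate = math.sqrt(mse)                                  # Std. Error of the Estimate

print(round(r, 3), round(r_squared, 3), round(adj_r_squared, 3), round(se_estimate, 4))
# → 0.531 0.282 0.279 8.2148
```

Note how the adjusted R2 (.279) comes out a little below R2 (.282): that is the penalty for the predictor we used.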
Note that in the footnote to the model summary SPSS tells us what predictors are relevant for the R
and R2, even though in this case we have only one predictor. If we were to run several bivariate
regressions or several multiple regressions, we would get a list of several models.

The word constant in parentheses refers to the intercept. This is printed because it is possible to
force SPSS not to estimate an intercept. That is done only in unusual situations; for most
regressions, and for all of the regressions we will run, we will allow SPSS to estimate the intercept
term.
The box below shows the ANOVA table for the regression. ANOVA stands for Analysis Of
Variance, specifically the analysis of variation in the Y scores. Here we see the two sums of
squares introduced in class: the regression and residual (or error) sums of squares. The variance of
the residuals (or errors) is the value of the mean square error, or MSE; here it is 67.483.

Recall that we compare the value of the MSE to the value of the variance of Y. The standard
output does not give us the variance of Y; you need to click the Statistics button (in the regression
menu) to get it, OR run descriptive statistics on Y.
Also in this table we find the F test. This tests the hypothesis that the predictor (here our only
predictor) shows no relationship to Y. We can write this hypothesis in several ways, as mentioned
in class.
The F test has two numbers for its degrees of freedom (recall that our t test has one df). These are
called the numerator and denominator degrees of freedom, or df1 and df2. Here the numerator df
(df1) tells us how many predictors we have (this time it is 1), and the denominator degrees of
freedom are n - 1 - df1, or n - 2 for bivariate regression.
The value of the test for our data is F(1, 248) = 97.42. The table shows us this is significant
(p < .001). As the F is large, we determine that our predictor of math outcome (here, SES) is related
to math score in our population.
ANOVA(b)

Model 1        Sum of Squares    df    Mean Square    F        Sig.
  Regression       6574.387        1      6574.387    97.423   .000(a)
  Residual        16735.828      248        67.483
  Total           23310.215      249

a. Predictors: (Constant), SOCIO-ECONOMIC STATUS COMPOSITE
b. Dependent Variable: MATH STANDARDIZED SCORE
Again the footnote tells us what predictor is being used and what outcome is being predicted.
Last, the table provides us with the data we need to compute R2. If we compute SS-regression
divided by SS-Total, we should get R2.

SS-regression / SS-Total = 6574.39 / 23310.21 = .282
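The same arithmetic, along with the degrees of freedom and the F statistic, can be checked with a few lines of Python (the sums of squares are the ones from the ANOVA table; n = 250):

```python
# Sums of squares and counts from the ANOVA table above
ss_regression, ss_residual = 6574.387, 16735.828
n, k = 250, 1

df1 = k                      # numerator df = number of predictors
df2 = n - k - 1              # denominator df = n - 2 for bivariate regression
ms_regression = ss_regression / df1
mse = ss_residual / df2      # mean square error
f_stat = ms_regression / mse

print(df1, df2, round(mse, 3), round(f_stat, 2))
# → 1 248 67.483 97.42
```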
The last table is full of information about the model. In it we find the slope (or slopes, in multiple
regression). Our values of b0 and b1 are listed as unstandardized values, and their standard errors
SE(b0) and SE(b1) are in the second column. The standardized coefficient for the predictor in a
bivariate regression is simply the correlation. Check back to the value of R in the first table, and see
that it is the same as Beta here: .531. In our notation from class this is b*1.
We can write the sample regression model from these slopes and also the sample standardized
regression model, if we like. Those models are
Sample regression model:                 Y_i = b0 + b1(ses_i) + e_i
                                         Y_i = 51.28 + 6.54(ses_i) + e_i

Sample standardized regression model:    Z(y_i) = b*1 Z(ses_i) + e_i*
                                         Z(y_i) = .531 Z(ses_i) + e_i*
Note, you could also write these models using Y-hat and omitting the error terms.
Recall again the interpretation of the slopes. The unstandardized slope of 6.54 tells us that a
student's math score increases by about 6.5 points for every additional point on the SES scale.
Higher SES scores are associated with higher math scores.

The standardized slope tells us that for each standard-deviation unit of increase in SES, we predict
slightly more than half of a standard deviation of increase in math score.
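As a small sketch of how the unstandardized model is used for prediction (b0 and b1 are the rounded values from the sample model above; predict_math is just an illustrative name):

```python
b0, b1 = 51.28, 6.54    # rounded intercept and slope from the coefficients table

def predict_math(ses):
    """Predicted math score for a given SES composite score (unstandardized model)."""
    return b0 + b1 * ses

# A one-point increase in SES predicts about 6.5 more math points:
print(round(predict_math(1.0) - predict_math(0.0), 2))   # → 6.54
# A student at SES = 0 is predicted to score the intercept value:
print(round(predict_math(0.0), 2))                       # → 51.28
```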
Coefficients(a)

Model 1                               Unstandardized B    Std. Error    Standardized Beta    t         Sig.
  (Constant)                               51.277             .520                           98.552    .000
  SOCIO-ECONOMIC STATUS COMPOSITE           6.537             .662            .531            9.870    .000

a. Dependent Variable: MATH STANDARDIZED SCORE
Last, the table gives us the t tests for the slope and intercept. In multiple regression we will get
individual tests for each predictor. The table does not tell us the df for the t. We need to know that
the df for each t is the same as the df for residuals in the F table above. Here the df is n - 2.

Each t test examines the hypothesis H0: beta = 0 for the predictor used.

As we learned in class, the F is the square of the t test when we have only one predictor. Here that
is 9.87 * 9.87 = 97.42. Also, our results must agree (if only one X is used). Here again we reject the
null model, and decide that SES is a good predictor of math score, and has a slope that is not zero
in the population.
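We can verify that relationship between t and F directly from the printed statistics; any tiny discrepancy is just rounding in the SPSS output:

```python
t = 9.870     # t for the SES slope, from the coefficients table
f = 97.423    # F from the ANOVA table

print(round(t ** 2, 2))          # → 97.42
assert abs(t ** 2 - f) < 0.01    # t squared equals F, up to rounding of the printed t
```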
Last we check our assumptions. In regression we make assumptions about the structural part of the
model, that is, about the predictor(s). We assume that all of the important predictors are in our
model, and no unimportant ones are included. This is usually NOT a good assumption for a
bivariate regression!!
We also make assumptions about our errors. Specifically, we assume that the residuals are
independent and normally distributed, and that they have equal variances for any X value.
Therefore we make a normal plot (to get this we need to click on the [Plots] button in SPSS).
These residuals don't look very normal; it is likely that other predictors could explain more
variation in the data.
[Histogram of the Regression Standardized Residuals. Dependent Variable: MATH STANDARDIZED SCORE. Std. Dev = 1.00, Mean = 0.00, N = 250.00]
To check the assumption about variances we make a scatterplot. We plot the residuals on the Y axis
and the predictor variable (or, equivalently, the predicted values) on the X axis. Using the [Plots]
button, we select zpred and put it on the X axis and use zresid on the Y axis. We hope to find equal
scatter in the points all along the horizontal axis. This plot looks pretty good!!
[Scatterplot of the Regression Standardized Residuals (Y axis) against the Regression Standardized Predicted Values (X axis). Dependent Variable: MATH STANDARDIZED SCORE]