Transcript of SPSS Training. Thomas V. Joshua, MS. July 2012. College of Nursing.
- Slide 1
- SPSS Training. Thomas V. Joshua, MS. July 2012. College of Nursing.
- Slide 2
- Regression Diagnostics. SPSS provides a range of regression diagnostics we can use to judge the quality of a regression model. Ideally, we would check that our general assumptions are valid before fitting the model, but this is usually impossible, so we are forced to conduct such checks after model fitting. Such checks are often referred to as diagnostics.
- Slide 3
- Regression Diagnostics. Diagnostics in standard multiple regression models:
  Distribution: normality, equal variance (homogeneity), independence.
  Linearity: of the explanatory variables.
  Transformations: of the response variable and/or the explanatory variables.
- Slide 4
- Regression Diagnostics (cont). Diagnostics in standard multiple regression models:
  Outliers: dubious measurements, mistakes in recording, atypical individuals.
  Influence: individual observations that exert undue influence on the coefficients.
  Collinearity: predictors that are highly collinear.
- Slide 5
- Residuals. The user can request any of these to be saved in variables with the names (a syntax sketch follows):
  res - unstandardized residual
  zre - standardized residual
  sre - studentized residual
  pre - predicted value
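A sketch of how this might be done in syntax, assuming the rincom91 model used in the later example (SPSS appends suffixes such as _1 to the names of saved variables):
  REGRESSION
    /DEPENDENT rincom91
    /METHOD=ENTER age agewed degree educ
    /SAVE RESID ZRESID SRESID PRED.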
- Slide 6
- Residual Plots. Residuals can be used in plots for the following diagnostics (a syntax sketch follows the list):
  1. Normality tests (distribution): histogram of sre (studentized residual); Q-Q plot of sre.
  2. Homogeneity and linearity: scatterplot of res (unstandardized residual) vs. pre.
  3. Outliers: scatterplot of sre vs. pre.
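A minimal sketch of two of these plots in syntax, assuming the residuals were saved under the default names SRE_1, RES_1, and PRE_1:
  GRAPH
    /HISTOGRAM=SRE_1.
  GRAPH
    /SCATTERPLOT(BIVAR)=PRE_1 WITH RES_1.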
- Slide 7
- Influence Statistics. Leverage measures this sort of influence for the independent variables: a case with large leverage and an unusual value on the dependent variable will have the largest effect on the parameter estimates. Cook's distance is a combined measure of the influence of both the dependent and independent variables. The values of these can be saved in variables (a syntax sketch follows):
  coo - Cook's distance
  lev - leverage
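These can also be saved from syntax; a sketch, again assuming the rincom91 model:
  REGRESSION
    /DEPENDENT rincom91
    /METHOD=ENTER age agewed degree educ
    /SAVE COOK LEVER.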
- Slide 8
- Influence Plots.
  Outliers in independent variables: scatterplot of lev vs. pre.
  Anomalous cases: scatterplot of coo vs. pre (a syntax sketch follows).
  Formal tests:
  Linearity - can add extra terms to the model.
  Outliers - can test the change in S.S. when an observation is excluded.
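A sketch of the two influence plots, assuming the statistics were saved under the default names LEV_1, COO_1, and PRE_1:
  GRAPH
    /SCATTERPLOT(BIVAR)=PRE_1 WITH LEV_1.
  GRAPH
    /SCATTERPLOT(BIVAR)=PRE_1 WITH COO_1.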
- Slide 9
- Diagnostic Facilities in SPSS.
  Regression > Linear:
    Statistics: Durbin-Watson, casewise diagnostics.
    Plots: histogram, normal probability plots (residuals).
    Save: allows residuals, influence statistics, and predicted values to be saved.
  Compare Means > One-way ANOVA:
    Options: homogeneity of variance (plus descriptives and means plot); see the syntax sketch below.
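The One-way ANOVA facility has a syntax equivalent as well; a sketch using the example variables (grouping income by degree here is an illustrative choice, not taken from the slides):
  ONEWAY rincom91 BY degree
    /STATISTICS DESCRIPTIVES HOMOGENEITY
    /PLOT MEANS.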
- Slide 10
- Diagnostic Facilities in SPSS (cont). User-defined diagnostics: having saved the relevant diagnostic statistics, these can be displayed using the graphical facilities, e.g. scatterplots, histograms, P-P and Q-Q plots, and sequence plots (the scatterplot is often more flexible).
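For example, a Q-Q plot of a saved studentized residual (assuming the default name SRE_1) can be requested with the PPLOT command:
  PPLOT
    /VARIABLES=SRE_1
    /TYPE=Q-Q
    /DIST=NORMAL.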
- Slide 11
- Example. Data set: SPSS v16 data set GSS93 subset.sav. Steps: Analyze -> Regression -> Linear. Dependent variable: respondent's income (rincom91). Independent variables: age of respondent (age), age when first married (agewed), respondent's highest degree (degree), and highest year of school completed (educ).
- Slide 12
- Slide 13
- Slide 14
- Syntax:
  REGRESSION
    /DESCRIPTIVES MEAN STDDEV CORR SIG N
    /MISSING LISTWISE
    /STATISTICS COEFF OUTS CI BCOV R ANOVA COLLIN TOL CHANGE ZPP
    /CRITERIA=PIN(.05) POUT(.10) CIN(95)
    /NOORIGIN
    /DEPENDENT rincom91
    /METHOD=ENTER agewed age educ degree
    /PARTIALPLOT ALL
    /SCATTERPLOT=(*SDRESID,*ZPRED) (*ZPRED,rincom91)
    /RESIDUALS DURBIN HIST(ZRESID) NORM(ZRESID)
    /CASEWISE PLOT(ZRESID) OUTLIERS(3)
    /SAVE PRED ZPRED MAHAL COOK LEVER MCIN RESID ZRESID SRESID DRESID SDRESID DFBETA DFFIT.
- Slide 15
- Output. The correlation matrix below shows the Pearson r's, the significance of each r, and the sample size (n) for each r. All correlations are significant at the .05 level or better, except those of age with age when first married and with respondent's highest degree. At r = .884, there may be multicollinearity between highest year of education completed and highest degree.
- Slide 16
- R-squared is the percent of the variance in the dependent explained by the independents. In this case, they explain only 16.9% of the variance, which suggests the model is misspecified: other variables, not included by the researcher, explain the bulk of the variance. Moreover, the independent variables here may share common variance with these unmeasured variables, and in fact part or even all of the observed 16.9% might disappear if the unmeasured variables were entered first in the equation.
- Slide 17
- Adjusted R-squared is a standard, arbitrary downward adjustment
to penalize for the possibility that, with many independents, some
of the variance may be due to chance. The more independents, the
more the adjustment penalty. Since there are only four independents
here, the penalty is minor. The F value for the "Change Statistics"
shows the significance level associated with adding the variable or
set of variables for that step in stepwise or hierarchical
regression, not pertinent here. The Durbin-Watson statistic tests
for serial correlation of error terms for adjacent cases. This test
is mainly used in relation to time series data, where adjacent
cases are sequential years.
- Slide 18
- Output (cont) The ANOVA table below tests the overall
significance of the model (that is, of the regression equation). If
we had been doing stepwise regression, significance for each step
would be computed. Here the significance of the F value is
below .05, so the model is significant.
- Slide 19
- The t-test tests the significance of each b coefficient. It is possible to have a regression model that is significant overall by the F test, but in which a particular coefficient is not significant. Here only age of respondent and highest year of school completed are significant. This would lead the researcher to re-run the model, dropping one variable at a time, starting with the least significant (here, age when first married); a syntax sketch of that re-run follows.
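A sketch of that re-run with agewed dropped (the diagnostic subcommands from the original syntax would be carried over unchanged):
  REGRESSION
    /DEPENDENT rincom91
    /METHOD=ENTER age educ degree.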
- Slide 20
- The zero-order correlation is simply the raw correlation from
the correlation matrix given at the top of this output. The partial
correlation is the correlation of the given variable with
respondent's income, controlling for other independent variables in
the equation. Part correlation, in contrast, "removes" the effect
of the control variable(s) on the independent variable alone. Part
correlation is used when one hypothesizes that the control variable
affects the independent variable but not the dependent variable,
and when one wants to assess the unique effect of the independent
variable on the dependent variable.
- Slide 21
- The tolerance for a variable is 1 - R-squared for the
regression of that variable on all the other independents, ignoring
the dependent. When tolerance is close to 0 there is high
multicollinearity of that variable with other independents and the
b and beta coefficients will be unstable. VIF is the variance
inflation factor, which is simply the reciprocal of tolerance.
Therefore, when VIF is high there is high multicollinearity and
instability of the b and beta coefficients.
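As a hypothetical illustration (the numbers are invented, not read from this output): if regressing educ on the other three independents gave R-squared = .80, its tolerance would be 1 - .80 = .20 and its VIF would be 1/.20 = 5, flagging it as a multicollinearity concern.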
- Slide 22
- The table below is another way of assessing if there is too
much multicollinearity in the model. To simplify, crossproducts of
the independent variables are factored. High eigenvalues indicate
dimensions (factors) which account for a lot of the variance in the
crossproduct matrix. Eigenvalues close to 0 indicate dimensions
which explain little variance.
- Slide 23
- The condition index summarizes the findings, and a common rule
of thumb is that a condition index over 15 indicates a possible
multicollinearity problem and a condition index over 30 suggests a
serious multicollinearity problem. The researcher will wish to drop
a variable, combine variables, or pursue some other
multicollinearity strategy.
- Slide 24
- Output (cont) The table below is a listing of outliers: cases
where the prediction is 3 standard deviations or more from the mean
value of the dependent. The researcher looks at these cases to
consider if they merit a separate model, or if they reflect
measurement errors. Either way, the researcher may decide to drop
these cases from analysis.
- Slide 25
- The bottom three rows are measures of the influence of the minimum, maximum, and mean case on the model. Mahalanobis distance is (n-1) times leverage (the bottom row), which is a measure of case influence. Cases with leverage values less than .2 are not a problem, but cases with leverage values of .5 or higher may be unduly influential in the model and should be examined. Cook's distance measures how much the b coefficients change when a case is dropped. In this example, it does not appear there are problem cases since the maximum leverage is only .098.
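To make the relation concrete with a hypothetical sample size (not taken from this output): with n = 1,000 valid cases, a leverage of .098 would correspond to a Mahalanobis distance of (1000 - 1) x .098, or about 97.9.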
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Output (cont). Partial regression plots simply show the plot of one independent (such as agewed) against the dependent (respondent's income). A partial residual plot, also called an "added variable plot," is plotted when there are two or more predictors; you get one partial residual plot for each predictor. Thus the partial residual plot is a visual form of the t-test of the b coefficient for the given variable.
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35