SPSS Statistics Training a
Session 1
Sample Files:
a. Welcome Dialog
1. From the welcome window,
2. Click on Sample Files > demo.sav > Open
b. File Structure
1. From the top menu bar,
2. Click on File > Open > Data
3. Navigate to Applications > IBM > SPSS > Statistics > 26 > Samples > English
File Types:
1. .sav files -> SPSS data files
2. .sps files -> Syntax (text command) files
3. .spv files -> Output files
4. In addition, you may encounter .csv, .xlsx, .txt, …
Data Types, Measures, and Roles
In the Data View window, the icon in each column header shows the variable’s measurement level:
Ruler = Quantitative/Interval/Ratio variable (Scale variable)
Three stepped bars = Ordinal variable
Three circles = Nominal/Categorical/Discrete variable
Switch to Variable View using the tab at the bottom of the window:
1. Click on the “…” button next to one of the values in the “Type” column
2. The most common variable types in SPSS are Numeric and String
3. Other variable types: Comma, Dot, Scientific notation, Date, Dollar, Currency, …
Variable Role
1. Input
2. Target
3. Both
4. None -> Example: ID numbers, or anything else that will not be used in modeling
5. Partition
6. Split
Partition and Split are both used in automatic modeling.
Data Visualization Graphs
• Bar Chart
1. From the menu bar at the top, click on Graphs > Chart Builder > OK
2. From the Gallery, drag the simple bar chart to the canvas
3. Drag “Level of education” to the X-axis (this data set defines it as a scale, or quantitative, variable)
4. Click OK
There is one problem here: because SPSS treats “Level of education” as a continuous (scale) variable, the chart also reports the mean and standard deviation. That would be appropriate if we were measuring years of education (a ratio-level variable), but not for education categories. Let’s look into the data set:
1. Go to Data View > double-click on “ed” (“Level of education”). This takes you to Variable View
2. Open Values for “ed”. It shows ordinal categories for this variable, not scale values
3. Go to Measure > change it to “Ordinal”
4. Go back to Data View and check that the icon for the “ed” variable has changed to “Ordinal”
5. Now click the recently used dialogs button at the top of the window and select “Chart Builder”
6. Click on Reset
7. Drag a bar chart to the canvas
8. Select and drag “Level of education” to the X-axis > Click OK
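To see why the mean and standard deviation are the wrong summary once “Level of education” is treated as ordinal, here is a minimal Python sketch. It is an illustration outside SPSS and uses made-up codes, not values from demo.sav. It summarizes an ordinal variable by category counts and the median category instead.

```python
from collections import Counter
from statistics import median_low

# Hypothetical ordinal codes (1 = lowest level ... 5 = highest level);
# made up for illustration, not taken from demo.sav.
education_codes = [1, 2, 2, 3, 3, 3, 4, 5, 2, 3]

def summarize_ordinal(codes):
    """Return frequency counts per category and the median category.

    Counts and the median are meaningful for ordinal data. A mean of the
    codes assumes equal spacing between categories, which ordinal data
    does not guarantee.
    """
    counts = dict(sorted(Counter(codes).items()))
    # median_low always returns an actual category code, never a midpoint
    return counts, median_low(codes)

counts, middle = summarize_ordinal(education_codes)
print(counts)  # {1: 1, 2: 3, 3: 4, 4: 1, 5: 1}
print(middle)  # 3
```

The bar chart for an ordinal field shows the same information as `counts`: one bar per category.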
• Boxplots
1. From the top menu bar, click on Graphs > Chart Builder > OK
2. From the Gallery, drag the boxplot chart to the canvas
3. Select “Years at current address” and drag it to the Y-axis, because that is our outcome variable (the variable we want the boxplot for)
4. Boxplots are very good at showing outliers
5. Now select “Level of education” and drag it to the X-axis
6. Make sure that “Level of education” is an ordinal (categorical) field rather than scale, which is the default for this data set
7. Click OK
8. By default, the outliers are labeled with their ID numbers, and you can change this
9. To get rid of all those IDs, double-click on the chart
10. To focus better on the points in the box area, we need to keep the outliers from dominating the visualization, as they do in this example. To do so, we can cut the income axis down to show values only up to 300 (that is, $300,000 per year, since income is recorded in thousands)
11. Double-click on the chart, then double-click on the Y-axis to open the Properties window
12. Click on Scale and change the Maximum from 1200 to 300 and the Major Increment from 200 to 50
13. Click Apply
14. You can also reformat the chart or change the axis titles in the Properties window
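The outlier points in an SPSS boxplot follow the common convention for classifying values:
* A value more than 1.5 box lengths (interquartile ranges) beyond the edge of the box is an outlier.
* A value more than 3 box lengths beyond the edge is an extreme value.

Below is a minimal Python sketch of that rule. It assumes Python's "inclusive" quartile method, and SPSS's own hinge calculation may give slightly different quartiles. The income values (in thousands) are hypothetical.

```python
from statistics import quantiles

def classify_boxplot_points(values):
    """Label each value as 'inside', 'outlier', or 'extreme'.

    Points more than 1.5 IQRs beyond the box edge are outliers; more than
    3 IQRs are extremes. Quartiles use Python's 'inclusive' method, an
    assumption that may differ slightly from SPSS's hinges.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    labels = []
    for v in values:
        distance = max(q1 - v, v - q3, 0)  # how far outside the box
        if distance > 3 * iqr:
            labels.append("extreme")
        elif distance > 1.5 * iqr:
            labels.append("outlier")
        else:
            labels.append("inside")
    return labels

# Hypothetical household incomes in thousands
incomes = [20, 25, 30, 35, 40, 45, 50, 85, 400]
labels = classify_boxplot_points(incomes)
print(list(zip(incomes, labels)))
```

Rescaling the Y-axis, as in steps 11 to 13, does not change these labels. It only hides the points above the new maximum.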
Session 2
Graphboard Template Chooser
If you are looking for a fast visualization, and possibly some recommendations from the tool, this is a great option. When you click on one variable or a collection of variables, the program recommends appropriate visualizations.
Example 1: A scale variable, “Household income in thousands”
You can modify the graph to use a logarithmic scale. This method is useful whenever we have a strongly skewed distribution, such as when working with money data:
1. Click on Scale
2. Click on Type > select Log
3. Notice that we lost people who make less than $10,000 a year, but there are not many of them (they are outliers)
4. As a rule of thumb, try to make your graph look close to a normal distribution
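A log scale plots the base-10 logarithm of each value. As a result, 10, 100, and 1,000 sit at equal distances, and a long right tail gets pulled in.

The Python sketch below shows the skew shrinking after the transform. The income values and the hand-written moment skewness function are assumptions for illustration.

A logarithm is only defined for positive values, so zero or negative incomes cannot appear on a log axis.

```python
import math

def moment_skewness(values):
    """Fisher-Pearson moment coefficient of skewness (0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical household incomes in thousands (right-skewed, like money data)
incomes = [9, 15, 22, 31, 45, 60, 90, 150, 400, 1000]

# A log axis positions each point at log10(value)
logged = [math.log10(v) for v in incomes]

print(round(moment_skewness(incomes), 2))  # strongly right-skewed
print(round(moment_skewness(logged), 2))   # much closer to symmetric
```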
Example 2: Multi-Variable Graph: “Household income in thousands” and “Years with current employer”
You can see that these two variables have a strong association with each other.
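If you want a number to go with the association seen in the graph, Pearson's correlation coefficient is one common choice. The Graphboard itself only draws the plot, so this statistic is an addition.

Below is a minimal Python sketch with hypothetical values (not taken from demo.sav).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical values, not taken from demo.sav
years_with_employer = [1, 3, 5, 8, 12, 20]
income_thousands = [25, 32, 41, 55, 70, 110]

r = pearson_r(years_with_employer, income_thousands)
print(round(r, 3))  # close to +1: a strong positive association
```

Values near +1 or -1 indicate a strong linear association. Values near 0 indicate little linear association.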
Legacy Dialogs
Example 1: Boxplots for Several Variables at Once
1. If you have multiple variables on the same scale, you can visualize all of them simultaneously
2. This is a good way to compare related variables and see how their distributions differ
3. Click on Graphs > Legacy Dialogs > Boxplot > select Simple
4. Select Summaries of separate variables > click on Define
5. In the new window, select three variables that are on the same scale, such as “Years at current address”, “Years with current employer”, and “Age in years”
6. Click OK
Creating Regression Variable Plots
Example 1: The regression variable plot is a way of looking at the associations among variables that you might use in a regression model.
1. From the Graphs menu, select Regression Variable Plots
2. Vertical-Axis Variables are the outcomes of the regression (the things we want to predict). This list only accepts scale variables
3. Horizontal-Axis Variables are the input (explanatory) variables. SPSS plots every possible combination of vertical and horizontal variables and adapts each plot to the level of measurement of the variables involved, so you can use any kind of variable here
4. Select “Household income in thousands” and “Job satisfaction” as your outcome (vertical-axis) variables
5. For the explanatory variables, select “Age in years”, “Level of education”, and “Gender” (an interesting categorical variable, since it is stored as text rather than a number)
6. Click OK
This produces several plots as output.
Although income goes up with level of education, average job satisfaction seems to decrease as the level of education increases.
The idea here is that, especially when your model gets fairly sophisticated, you want to see what is happening in the data before you get carried away with modeling. Regression Variable Plots is a great option for looking at all the combinations and producing them quickly and effectively.
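The "every possible combination" idea amounts to pairing each outcome with each explanatory variable. The Python sketch below uses this example's variable names, and the pairing logic is an illustration rather than SPSS's internal implementation. With 2 outcomes and 3 explanatory variables, it yields 6 outcome-by-predictor plots.

```python
from itertools import product

# Variable names from this example; the pairing is an illustration,
# not SPSS's internal implementation
outcomes = ["Household income in thousands", "Job satisfaction"]
explanatory = ["Age in years", "Level of education", "Gender"]

# Every (outcome, explanatory) pair gets its own plot
pairs = list(product(outcomes, explanatory))
for outcome, predictor in pairs:
    print(f"{outcome} by {predictor}")
```

The number of plots grows multiplicatively. Adding one more explanatory variable here would add two more plots, one per outcome.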
Comparing Subgroups
1. From the Graphs menu, select Compare Subgroups
2. Pick multiple variables that you want to look at, split by a grouping variable
3. For “Subgroups Defined by”, you must pick a categorical variable
4. Select “Household income in thousands” for “Subgroups Defined by”
5. Select “Age in years”, “Price of primary vehicle”, “Years with current employer”, “Number of people in household”, and “Job satisfaction” for “Variables to Plot”
6. Click OK
By default, this also sorts the data set. The output is not very precise; this is an impressionistic device. Its point is to give you a general feel for what the differences are, after which you can compare each variable separately. Still, Compare Subgroups is a neat option for looking at overall patterns across several variables, or potentially several groups, simultaneously.
Splitting Files: “Comparing Subgroups and Selecting Subgroups”
• Comparing Groups: In order to show the differences, let’s try an example before and after splitting the data file.
1. Go to Analyze
2. Select Descriptive Statistics
3. Select Descriptives…
4. Select “Wireless service”, “Multiple lines”, “Voice mail”, “Internet”, “Caller ID”, and “Call waiting”
5. Click OK
6. Go to Data
7. Select “Split File”
8. Now select Compare groups
9. Select “Gender” as the variable by which you want to group your analysis
10. One important point here: SPSS needs the data file to be sorted by the field you are splitting on. By default, the Split File dialog sorts the file by the grouping variable (for a text field like “Gender”, that means alphabetical order), which actually changes the order of cases in the data set
11. Now run the same descriptive analysis again on the same fields as before
12. This time, let’s try “Separated by” output instead of “Layered by” output
13. Go to “Split File” > this time select “Organize output by groups”
14. Click OK
15. This forces SPSS to treat each gender category as a separate file and run the analysis on one after the other
16. Now run the Descriptives example again
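The before-and-after comparison can be mimicked in a minimal Python sketch. By default, the Descriptives table reports N, Minimum, Maximum, Mean, and Std. Deviation. The sketch computes those columns once for all cases, then once per gender.

The need to sort first has a close analogue here: `itertools.groupby` only collects rows into correct groups when they are already sorted by the grouping key, much as Split File expects a sorted file. The records are hypothetical, and the 0 = No, 1 = Yes coding is an assumption; check the Values column in Variable View. With that coding, the mean is simply the proportion of customers who have the service.

```python
from itertools import groupby
from statistics import mean, stdev

def descriptives(values):
    """N, minimum, maximum, mean, and sample standard deviation:
    the columns SPSS's Descriptives table shows by default."""
    return {
        "N": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": mean(values),
        "Std. Deviation": stdev(values),  # sample SD, divides by n - 1
    }

def split_file_descriptives(rows):
    """One descriptives table per value of the grouping field.

    Each row is (group, value). groupby only forms correct groups on
    sorted input, so the rows are sorted by the grouping field first.
    """
    rows = sorted(rows, key=lambda r: r[0])  # alphabetical: "f" before "m"
    return {
        key: descriptives([value for _, value in group])
        for key, group in groupby(rows, key=lambda r: r[0])
    }

# Hypothetical (gender, Internet) records; 0 = No, 1 = Yes is an assumed coding
records = [("m", 1), ("f", 0), ("m", 0), ("f", 1),
           ("f", 0), ("m", 1), ("f", 0), ("m", 1)]

print(descriptives([value for _, value in records]))  # before splitting
for gender, table in split_file_descriptives(records).items():
    print(gender, table)                               # after splitting
```

Skipping the `sorted` call would make `groupby` start a new group every time the gender value changes. You would then get several fragments per gender instead of one table each.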