SPSS Statistics Training a

Session 1

Sample Files:

a. Welcome Dialog

1. Start from the welcome window
2. Click on Sample Files > demo.sav > Open

b. File Structure

1. From the top menu bar:
2. Click on File > Open > Data
3. Browse to Applications > IBM > SPSS > Statistics > 26 > Samples > English

File Types:

1. .sav files -> SPSS data files
2. .sps files -> Syntax (text commands)
3. .spv files -> Output files
4. In addition, there might be .csv, .xlsx, .txt, …
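As a quick illustration outside SPSS, the extension list above can be written as a small lookup table. This is a minimal Python sketch; the function name `describe_file` and the fallback wording are assumptions made for this example, not part of SPSS.

```python
from pathlib import Path

# Extension-to-description mapping taken from the list above.
SPSS_FILE_KINDS = {
    ".sav": "SPSS data file",
    ".sps": "syntax (text commands)",
    ".spv": "output file",
}


def describe_file(name: str) -> str:
    """Return a short description of a file based on its extension."""
    suffix = Path(name).suffix.lower()
    # Anything not in the table falls into the "other" bucket.
    return SPSS_FILE_KINDS.get(suffix, "other (for example .csv, .xlsx, .txt)")


print(describe_file("demo.sav"))
```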

Data Types, Measures, and Roles

In Data View Window: Variable Types (Column Header Type):

Ruler = Quantitative/Interval/Ratio variable (Scale Variable)
3 step bars = Ordinal Variable
3 circles = Nominal/Categorical/Discrete Variable

Switch to Variable View from the bottom of the window:

1. Click on the “…” next to one of the values under the “Type” column
2. The most common variable types in SPSS are Numeric and String
3. Other variable types: Comma, Dot, Scientific Notation, Date, Dollar, Currency, …

Variable Role

1. Input
2. Target
3. Both
4. None -> Example: ID numbers, or anything that is not going to be used in modeling
5. Partition
6. Split

Partition and Split are both used in automatic modeling.

Data Visualization

Graphs

• Bar Chart
1. From the menu bar at the top, click on Graphs > Chart Builder > OK
2. From Gallery > drag the simple bar chart to the canvas
3. Drag a scale (quantitative) variable like “Level of education” to the X-Axis
4. Click OK

There is one problem here: because SPSS thinks “Level of education” is a continuous (scale) variable, it also gives the mean and standard deviation. If we were talking about years of education, this would be appropriate (that is a ratio-level variable). Let’s look into the data set:

1. Go to the Data window > double-click on “ed” (“Level of education”) > it takes you to the Variable window
2. Open Values for “ed” > it shows ordinal values for this variable, not scale values
3. Go to Measure > change it to “Ordinal”
4. Go back to the Data window and check the icon for the “ed” variable, which has now changed to “Ordinal”
5. Now click on the recent command button at the top of the window and select “Chart Builder”
6. Click on Reset
7. Drag a bar chart to the canvas
8. Select and drag “Level of education” to the X-Axis > click OK
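To see why the measurement level matters, here is a minimal Python sketch run outside SPSS. The category codes are made up for illustration and are not taken from demo.sav. It contrasts the mean of the codes (what you get when the variable is treated as scale) with the counts per category (what a bar chart of an ordinal variable shows).

```python
from collections import Counter
from statistics import mean, median

# Hypothetical category codes for "Level of education" (1 = lowest, 5 = highest).
# These values are invented for this example.
ed_codes = [1, 2, 2, 3, 1, 4, 2, 5, 3, 2]

# Treated as a scale variable: a mean of category codes, which is hard to
# interpret because the distance between categories is not defined.
print("mean of codes:", mean(ed_codes))

# Treated as an ordinal variable: one count per category, in category order.
counts = Counter(ed_codes)
for code in sorted(counts):
    print("category", code, "count", counts[code])

# The median only relies on the ordering, so it is still meaningful here.
print("median category:", median(ed_codes))
```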

• Boxplots
1. From the top menu bar, click on Graphs > Chart Builder > OK
2. From Gallery > drag the Boxplot chart to the canvas
3. Select “Years at current address” and drag it to the Y-Axis, because that is our outcome variable (the thing that we are trying to get the boxplot for)
4. Boxplots are really good for showing outliers
5. Now select “Level of education” and drag it to the X-Axis
6. Make sure that “Level of education” is an ordinal (categorical) field rather than scale, which is the default for this data set
7. Click OK
8. By default, it shows the ID number for each outlier, and you can change this
9. To get rid of all those IDs: double-click on the chart
10. To focus better on the points in the box area, we need to eliminate the outliers, which dominate the visualization in this example. To do so, we can cut down the income scale so that it shows only up to about 300000 per year
11. Double-click on the chart and then double-click again on the Y-Axis to open the Properties window
12. Then click on Scale and adjust the Maximum from 1200 to 300 and the Major Increment from 200 to 50
13. Then click Apply
14. You can also reformat the chart or change the title of an axis using the Properties window
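For intuition about which points a boxplot flags, the following Python sketch applies a common convention: a value is an outlier if it lies more than 1.5 times the interquartile range beyond the quartiles. This is an assumption for illustration; SPSS may compute its hinges slightly differently, and the income values are invented.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # first quartile, median, third quartile
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical household incomes in thousands, with one extreme value.
incomes = [22, 25, 31, 38, 40, 45, 52, 60, 75, 1100]

print("outliers:", iqr_outliers(incomes))

# Rough equivalent of lowering the axis maximum to 300: keep only the values
# that would still be visible on the rescaled chart.
visible = [v for v in incomes if v <= 300]
print("visible on a 0-300 axis:", visible)
```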

Session 2

Graphboard Template Chooser

If you are looking for some fast visualization, and possibly some recommendations from the tool, this is a great option. When you click on one variable or a collection of variables, the program recommends appropriate visualizations.

Example 1: A Scale Variable “Household income in thousands”

You can modify the graph to look like this:

This method can be used whenever we have a strongly skewed distribution, as is common with money data.

1. Click on Scale
2. Click on Type > select Log
3. Notice that we lost people who make less than $10000 a year, but there are not many of them (outliers)
4. Always try to make your graph look close to a normal distribution
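The effect of the log scale can be sketched numerically. In the Python example below, the income values are invented for illustration. On the raw scale the mean sits far above the median, which is typical of a right-skewed distribution; after a base-10 log transform the two are close together.

```python
import math
from statistics import mean, median

# Hypothetical household incomes in thousands (strongly right-skewed).
incomes_thousands = [12, 25, 40, 60, 90, 150, 400, 1100]

# Base-10 logarithm of each value, which is what a log axis effectively plots.
log_incomes = [math.log10(x) for x in incomes_thousands]

print("raw mean and median:", mean(incomes_thousands), median(incomes_thousands))
print("log mean and median:", round(mean(log_incomes), 3), round(median(log_incomes), 3))
```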

Example 2: Multi-Variable Graph: “Household income in thousands” and “Years with current employer”

You can see these two variables have a strong association with each other.
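One way to put a number on such an association is the Pearson correlation coefficient. The Python sketch below computes it by hand on invented values; the variable names and data are assumptions for illustration, not figures from demo.sav.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations.
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical values: years with current employer and income in thousands.
years_employed = [1, 3, 5, 8, 12, 20]
income = [20, 28, 35, 50, 70, 120]

r = pearson_r(years_employed, income)
print("correlation:", round(r, 3))
```

A value close to 1 indicates a strong positive linear association; a value close to 0 indicates little linear association.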

Legacy Dialogs

Example 1. Boxplot for Several Variables at Once

1. If you have multiple variables that are on the same scale, you can visualize all of them simultaneously
2. This is a good way to compare variables that are related to each other, to see their correlations
3. Click on Legacy Dialogs > select Basic Boxplot
4. Select Summaries of Separate Variables > click on Define
5. In the new window that opens > select 3 variables that are on the same scale, such as “Years at current address”, “Years with current employer”, and “Age in years”
6. Click OK

Creating Regression Variable Plots

Example 1. The regression variable plot is a way of looking at the association between variables that you might use in a regression model.

1. From Graphs, select Regression Variable Plots
2. Vertical-Axis Variables > these are the outcomes of the regression (the things that we want to predict). This box only accepts “Scale” variables
3. Horizontal-Axis Variables > these are the input (explanatory) variables. SPSS plots every combination of input and outcome variables and adapts each plot to the level of measurement of the variables involved. You can use any kind of variable here
4. Select “Household income in thousands” and “Job satisfaction” for your output variables
5. For explanatory variables, select “Age in years”, “Level of education”, and “Gender” (an interesting categorical variable, since it is stored as text and not as a number)
6. Click OK

This produces several plots as output:

Although income goes up with level of education, job satisfaction on average seems to decrease as the level of education increases.

The idea here is that, especially when your model gets pretty sophisticated, you want to see what is happening in the data before you get carried away with modeling. Regression Variable Plots is a great option for looking at all the combinations and producing them quickly and effectively.

Comparing Subgroups

1. From the Graphs menu, select Compare Subgroups
2. We want to pick multiple variables to look at, split by a different group
3. For “Subgroups Defined by”, you must pick a categorical variable
4. Select “Household income in thousands” for “Subgroups Defined by”
5. Select “Age in years”, “Price of primary vehicle”, “Years with current employer”, “Number of people in household”, and “Job satisfaction” for “Variables to Plot”
6. Click OK

By default, this will sort the data set as well. The output is not really precise; this is an impressionistic device. Its point is to give you a general feel for what the differences are. Then you can go and compare each of the variables separately. But the ability to compare subgroups is a neat option for looking at overall patterns in several variables, or potentially several groups, simultaneously.

Splitting Files: “Comparing Subgroups and Selecting Subgroups”

• Comparing Groups: To show the differences, let’s try an example before and after splitting the data file.

1. Go to Analyze
2. Select Descriptive Statistics
3. Select Descriptives …
4. Select: “Wireless service”, “Multiple lines”, “Voice mail”, “Internet”, “Caller ID”, and “Call waiting”
5. Click OK
6. Go to Data
7. Select “Split File”
8. Now select Compare Groups
9. Select “Gender” as the variable by which you want to group your statistical analysis
10. One important thing here is that SPSS needs the data file to be sorted by the field that you want to split on. This actually changes the data set. By default, the data set is sorted in alphabetical order
11. Now we will run our descriptive analysis again with the same fields that we compared earlier
12. This time let’s try “Separated by” instead of “Layered by”
13. Go to “Split File” > this time select “Organize output by groups”
14. Click OK
15. This will force SPSS to treat the two categories of gender as 2 separate files and do the analysis on one after the other
16. Now try the Descriptive Statistics example again
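The logic behind splitting a file (sort by the grouping field, then run the same analysis once per group) can be sketched in Python. The records and the helper name `split_descriptives` are hypothetical and only stand in for the SPSS data set and dialogs. Note that `itertools.groupby` also requires the rows to be sorted by the grouping key first, which mirrors the sorting requirement described in step 10.

```python
from itertools import groupby
from statistics import mean

# Hypothetical records: gender plus a 0/1 flag for "Wireless service".
records = [
    {"gender": "m", "wireless": 1},
    {"gender": "f", "wireless": 0},
    {"gender": "f", "wireless": 1},
    {"gender": "m", "wireless": 0},
    {"gender": "f", "wireless": 1},
]


def split_descriptives(rows, split_by, field):
    """Compute n and mean of `field` separately for each value of `split_by`."""
    # Sort first: groupby only groups adjacent rows with the same key.
    ordered = sorted(rows, key=lambda r: r[split_by])
    result = {}
    for key, group in groupby(ordered, key=lambda r: r[split_by]):
        values = [r[field] for r in group]
        result[key] = {"n": len(values), "mean": mean(values)}
    return result


summary = split_descriptives(records, "gender", "wireless")
for gender, stats in summary.items():
    print(gender, stats)
```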
