Spss comd interpret

SPSS: SPSS Commands and Interpreting Statistics

Frequency Distributions

We use frequency distributions to determine the frequency, or number, of people who fall into each category. For example, if we classified those running for Senator or Governor as Democrats or Republicans, a frequency distribution would allow us to determine the percentage who were Democrats and the percentage who were Republicans.

In our data file, the variable we used to classify candidates as Republican or Democrat was “party.”

1. Go to Analyze > Descriptive Statistics > Frequencies.

2. Double-click “party” to move it into the Variable(s) box, and then click OK.

 

Interpreting Frequency Distributions

1. As you can see in the table below, the two parties are listed: Democrat and Republican. The “Missing” category simply reflects the candidates whose party affiliation we could not determine.

2. Under the “Frequency” column, we have the number of candidates who were Democrats (186), Republicans (280), or unclassified/missing (5).

3. Finally, we typically use the “Valid Percent” column in determining the frequency distribution of Democrats and Republicans, because it leaves out the cases we could not assign to a category. (The “Percent” column divides by all 471 cases, including the 5 missing ones; “Valid Percent” divides only by the 466 valid cases.) In this case, 39.9 percent were Democrats and 60.1 percent were Republicans. Clearly, there are more Republican than Democratic candidates.

Political Party

                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid    Democrat   186         39.5      39.9            39.9
         Republican 280         59.4      60.1            100.0
         Total      466         98.9      100.0
Missing  9.00       5           1.1
Total               471         100.0
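The “Percent” and “Valid Percent” columns differ only in what they divide by. As a minimal sketch (plain Python rather than SPSS; the function name `frequency_table` is our own), the columns above can be reproduced from the counts alone:

```python
def frequency_table(counts, missing=0):
    """Return (label, frequency, percent, valid percent, cumulative percent) rows.

    counts:  dict mapping each valid category to its frequency (order is kept).
    missing: number of cases with no valid category.
    """
    n_valid = sum(counts.values())
    n_total = n_valid + missing
    rows = []
    cumulative = 0.0
    for label, freq in counts.items():
        percent = 100 * freq / n_total        # share of ALL cases, missing included
        valid_percent = 100 * freq / n_valid  # share of cases with a valid category
        cumulative += valid_percent           # running total of Valid Percent
        rows.append((label, freq, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows


for row in frequency_table({"Democrat": 186, "Republican": 280}, missing=5):
    print(row)
```

Cumulative Percent simply adds up Valid Percent row by row, which is why it reaches 100.0 at the last valid category.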

Chi-Square Test

Often we have two nominal level variables (gender, party affiliation, or ethnicity, for example) and we need to determine whether a relationship exists between them. For example, we may want to know whether ethnicity is related to party affiliation. We suspect that it is, and we hypothesize that minorities are associated with the Democratic Party and whites with the Republican Party.

Using a crosstab table and a Chi-Square test, we can determine whether there is a relationship between two variables that IS NOT LIKELY TO BE DUE TO CHANCE.

1. To do this, go to Analyze > Descriptive Statistics > Crosstabs.

We put “Political Party” in the Row(s) box because the dependent variable ALWAYS goes in the Row box. We put “Ethnicity” in the Column(s) box because the independent (explanatory) variable ALWAYS goes in the Column box.

 

 

 

 

2. Next, click the “Statistics” button and check “Chi-square”, “Phi and Cramer’s V”, and “Lambda.” Click the “Continue” button.

 

3. Next, click the “Cells” button. Under “Counts”, check “Observed”, and under “Percentages”, check “Row”, “Column”, and “Total”. Then click the “Continue” button.

 

4. Click the “OK” Button to run your crosstab.

Interpreting Your Crosstab

1. Reading a crosstabulation can be confusing. Over the years, I have found the following approach helpful. First, we always begin with the independent variable, which is listed in the columns. In this case it is ethnicity, so we read the cells labeled “% within Ethnicity (2)”. Here is how we read this table. If we are interested in which party non-whites support, we say:

“Of those who are non-white, 72.1% are Democrats.” And “Of those who are non-white, 27.9% are Republicans.”

If we are interested in the white respondents, we say:

“Of those who are white, 35.5% are Democrat AND 64.5% are Republican.”

If you use this phrase and fill in the blanks, you can interpret this table properly every time!

“Of those who are _____, ____% are _______ AND _____% are _______.”

Political Party * Ethnicity (2) Crosstabulation

                                                 Ethnicity (2)
                                                 White    Non-White   Total
Political  Democrat    Count                     146      31          177
Party                  % within Political Party  82.5%    17.5%       100.0%
                       % within Ethnicity (2)    35.5%    72.1%       39.0%
                       % of Total                32.2%    6.8%        39.0%
           Republican  Count                     265      12          277
                       % within Political Party  95.7%    4.3%        100.0%
                       % within Ethnicity (2)    64.5%    27.9%       61.0%
                       % of Total                58.4%    2.6%        61.0%
Total                  Count                     411      43          454
                       % within Political Party  90.5%    9.5%        100.0%
                       % within Ethnicity (2)    100.0%   100.0%      100.0%
                       % of Total                90.5%    9.5%        100.0%
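The fill-in-the-blank sentence only needs the “% within Ethnicity (2)” rows, which are column percentages: each count divided by its column total. A minimal sketch in Python (the helpers `column_percents` and `interpret` are hypothetical names of our own, not SPSS features) that recomputes those percentages from the observed counts and fills in the template:

```python
# Observed counts from the crosstab: rows are Political Party (dependent),
# columns are Ethnicity (2) (independent).
counts = {
    "Democrat": {"White": 146, "Non-White": 31},
    "Republican": {"White": 265, "Non-White": 12},
}


def column_percents(table):
    """Percent of each column category that falls in each row category
    (SPSS's "% within Ethnicity (2)"), rounded to one decimal place."""
    columns = list(next(iter(table.values())))
    result = {}
    for col in columns:
        col_total = sum(row[col] for row in table.values())
        result[col] = {party: round(100 * row[col] / col_total, 1)
                       for party, row in table.items()}
    return result


def interpret(group, by_party):
    """Fill in: Of those who are ___, __% are ___ AND __% are ___.
    Adding "s" to pluralize assumes simple party names like these."""
    (first, p1), (second, p2) = by_party.items()
    return f"Of those who are {group.lower()}, {p1}% are {first}s AND {p2}% are {second}s."


pct = column_percents(counts)
for group, by_party in pct.items():
    print(interpret(group, by_party))
```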

 

 

2. We hypothesized that ethnicity was related to party affiliation: non-whites were more likely to be Democrats and whites more likely to be Republicans. As you can see from the table above, this is true: 72% of non-whites called themselves Democrats and 65% of whites called themselves Republicans. So our statistics bear out our hypothesis.

3. However, is it possible that the relationship between ethnicity and party affiliation is due to chance? That is to say, is it possible that there really is no statistically significant reason to believe these variables are related to one another?

To answer this question, we use the Pearson Chi-Square test. Look at the table below. It has three “Sig.” columns, but in the Pearson Chi-Square row only the “Asymp. Sig. (2-sided)” column has a number; disregard the two “Exact Sig.” columns for the time being. If the number is between .000 and .050, we can say that the independent variable (ethnicity in this case) is significantly related to the dependent variable (party affiliation). This is another way of saying that the relationship is unlikely to be due to chance and probably really exists! As you can see below, the Pearson Chi-Square value is 21.886, and its significance under the “Asymp. Sig. (2-sided)” column is .000. Therefore, ethnicity is significantly related to party affiliation.

If the number is greater than .050 (for example, .051 or above), the relationship may be due to chance, and we say that we are not confident that ethnicity and party affiliation are related. Our hypothesis that ethnicity is related to party affiliation is then not supported.

Chi-Square Tests

                              Value     df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square            21.886 (a) 1   .000
Continuity Correction (b)     20.375    1    .000
Likelihood Ratio              21.438    1    .000
Fisher's Exact Test                                                  .000                   .000
Linear-by-Linear Association  21.838    1    .000
N of Valid Cases              454

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 16.76.
b. Computed only for a 2x2 table
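To see where the Pearson Chi-Square value comes from, here is a minimal sketch in Python (not SPSS's own code). It compares each observed count with the count we would expect if ethnicity and party were unrelated (row total × column total ÷ N). The p-value helper `p_value_df1` is our own; its shortcut `erfc(sqrt(chi2 / 2))` holds only for 1 degree of freedom, as in this 2x2 table. Keep in mind that SPSS rounds to three decimals, so a displayed .000 means p < .0005, not zero.

```python
import math


def pearson_chi_square(observed):
    """Pearson chi-square, degrees of freedom, and the smallest expected count
    for a table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count if no relationship
            min_expected = min(min_expected, expected)
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, min_expected


def p_value_df1(chi2):
    """Upper-tail p-value for a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))


observed = [[146, 31],   # Democrat: White, Non-White
            [265, 12]]   # Republican: White, Non-White
chi2, df, min_expected = pearson_chi_square(observed)
p = p_value_df1(chi2)
significant = p <= 0.05  # the decision rule used in this handout
print(f"chi-square = {chi2:.3f}, df = {df}, min expected = {min_expected:.2f}, p = {p:.6f}")
```

This computes the uncorrected Pearson statistic only; the “Continuity Correction” row in the SPSS table is a separate, adjusted version of the test.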

 

 

4. How strong is the relationship between the independent variable (ethnicity) and the dependent variable (party affiliation)? The “Symmetric Measures” table reports two measures of association, Phi and Cramer’s V; for our purposes, use Cramer’s V unless SPSS reports only a Phi statistic. Under the “Value” column, a number is listed. The higher the number (ignoring any minus sign in front of Phi), the greater the strength of association. Let’s use the following scale:

0 to .30 = no relationship (0) to weak relationship

.31 to .70 = moderate relationship

.71 to 1.0 = strong relationship

A strong relationship means that knowing a person’s ethnicity gives us very good reason to guess the political party with which they are affiliated. A weak relationship means that knowing a person’s ethnicity does not give us much confidence in guessing that person’s party affiliation. In this case, the association is weak (.220). If I guessed a person’s political affiliation based on the person’s apparent race, I would often be wrong!

Symmetric Measures

                                Value   Approx. Sig.
Nominal by Nominal   Phi        -.220   .000
                     Cramer's V .220    .000
N of Valid Cases                454
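Both measures can be computed from the same counts. A minimal sketch in Python (function names are our own): signed Phi for a 2x2 table is (ad − bc) divided by the square root of the product of the four marginal totals, and Cramer’s V is the square root of chi-square ÷ (N × (k − 1)), where k is the smaller of the number of rows and columns. For a 2x2 table, V equals Phi without its sign. The `strength` helper applies the scale above; assigning values that fall in the scale’s gaps (such as .305) to the higher band is our own assumption.

```python
import math


def phi_2x2(table):
    """Signed phi coefficient for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def cramers_v(table):
    """Cramer's V = sqrt(chi-square / (n * (k - 1))); always between 0 and 1."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    chi2 = sum((table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
               / (row_totals[i] * col_totals[j] / n)
               for i in range(len(table)) for j in range(len(table[0])))
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


def strength(value):
    """Label a coefficient with the handout's scale, ignoring any minus sign."""
    size = abs(value)
    if size <= 0.30:
        return "no to weak relationship"
    if size <= 0.70:
        return "moderate relationship"
    return "strong relationship"


observed = [[146, 31], [265, 12]]  # rows: Democrat, Republican; columns: White, Non-White
phi = phi_2x2(observed)
v = cramers_v(observed)
print(f"Phi = {phi:.3f}, Cramer's V = {v:.3f} ({strength(v)})")
```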

Pearson Correlation

A correlation is a powerful way to determine the association between two interval level variables. An interval level variable is one whose values are an equal distance apart. Examples include income (dollars), age (years), political experience (years), and percent of the vote (percentage points). Gender (male or female) is not an interval level variable, because its values are not numbers an equal distance apart; it is a categorical variable. For example, we may be interested in determining whether political experience, as measured by the number of years a person has served in office, is related to campaign funds raised. We suspect that the longer an incumbent is in office, the more campaign funds s/he will raise. After all, an incumbent has political power and is likely to be reelected, so contributors would want to give to the incumbent.

1. To do a correlation analysis, go to Analyze > Correlate > Bivariate.

2. Find and double-click the variables “Political Experience” and “Money Raised”. This moves both variables into the Variables box.

3. Click the “OK” button to run your correlation.

 

 

 

 

 

 

Interpreting Your Pearson Correlation

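1. A correlation coefficient (number) represents the strength of an association between two variables. The higher the number, ignoring any minus sign, the greater the strength of association (a minus sign only means that as one variable goes up, the other goes down). Let’s use the following scale: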

0 to .30 = no relationship (0) to weak relationship

.31 to .70 = moderate relationship

.71 to 1.0 = strong relationship

2. In this case, the correlation between “Political Experience” and “Money Raised” is .331** (the ** means it is significant at the 0.01 level, as the note under the table explains). This is a moderate relationship.

3. The “Sig. (2-tailed)” value is important. It tells us whether the relationship could be due to chance. If the Sig. value is between .000 and .050, we can say that political experience and money raised are significantly related: candidates with more political experience tend to raise more campaign money. If the Sig. value is .051 or more, we say that we cannot be confident that political experience and money raised are related or associated.

In this case (Sig. = .000), we can say that there is a “moderate, significant relationship” between political experience and money raised.

Correlations

                                                    Political Experience (Years)   Money Raised
Political Experience (Years)   Pearson Correlation  1                              .331**
                               Sig. (2-tailed)                                     .000
                               N                    462                            414
Money Raised                   Pearson Correlation  .331**                         1
                               Sig. (2-tailed)      .000
                               N                    414                            421

**. Correlation is significant at the 0.01 level (2-tailed).
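The correlation and its significance can be sketched in a few lines of Python (function names are our own). The Pearson r divides the co-variation of the two variables by the product of their spreads, and the significance test uses t = r × √(n − 2) ÷ √(1 − r²) with n − 2 degrees of freedom. Two assumptions to flag: the p-value here uses a normal approximation (`approx_two_tailed_p`), which is reasonable only for large samples like this one (N = 414) and is not the exact t-distribution value SPSS reports; and the toy years/dollars lists are hypothetical, not from the data file.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for testing r against zero (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def approx_two_tailed_p(t):
    """Normal approximation to the two-tailed p-value (large n only)."""
    return math.erfc(abs(t) / math.sqrt(2))


# Reported output: r = .331 with N = 414 cases having both variables.
t = t_for_r(0.331, 414)
print(f"t = {t:.2f}, approximate p = {approx_two_tailed_p(t):.2e}")

# Hypothetical toy data: years in office vs. dollars raised.
years = [1, 2, 3, 4, 5]
dollars = [100_000, 180_000, 320_000, 390_000, 510_000]
print(f"toy r = {pearson_r(years, dollars):.3f}")
```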

Multiple Regression

A very powerful way to analyze data is “multiple regression.” For our purposes, multiple regression allows us to look at several factors that affect a dependent variable and determine which factors exert the greater influence on it. For example, we may suspect that a candidate’s share of the vote is determined by the quality of the candidate AND the amount of money raised. After all, better Senate candidates should win a greater percentage of the vote than weaker Senate candidates, and candidates with more money will be able to spend more to get elected. With more money to spend, they should get a greater percentage of the vote. But which factor is more important: candidate quality or money raised? To answer this question, we run a multiple regression.

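1. Go to Analyze > Regression > Linear.

2. Since the dependent variable is the percentage of the vote a candidate received, we put “Vote: Primary or Convention” in the “Dependent” box. The two independent variables we expect to influence the dependent variable, “Political Experience (Years)” (our measure of candidate quality) and “Money Raised”, go in the “Independent(s)” box. Because this analysis covers Senate candidates only, “Office” also goes in the “Selection Variable” box, with its rule set to select Senate races (the notes under the output tables confirm that only cases for which Office = Senate were used).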

 

3. Click the “OK” button.
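Behind the “Linear” command, SPSS fits an ordinary least squares equation, vote = constant + B1 × experience + B2 × money. As a minimal sketch of what that fit does, here is a closed-form solution for exactly two predictors in Python. This is our own illustration, not SPSS’s algorithm (SPSS uses a general matrix method), and the toy data are hypothetical, built so the true coefficients are known. It also shows how the standardized Beta is obtained from B: Beta = B × (SD of the predictor ÷ SD of the dependent variable). `statistics.fmean` needs Python 3.8 or later.

```python
import statistics


def fit_two_predictors(x1, x2, y):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2, solved from the
    centered normal equations (a closed form valid for exactly two predictors)."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2  # zero if the predictors are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


def standardized_betas(x1, x2, y, b1, b2):
    """Beta = B * (SD of the predictor / SD of the dependent variable)."""
    sy = statistics.stdev(y)
    return b1 * statistics.stdev(x1) / sy, b2 * statistics.stdev(x2) / sy


# Hypothetical toy data constructed so that y = 2 + 3*x1 + 0.5*x2 exactly.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [2 + 3 * a + 0.5 * b for a, b in zip(x1, x2)]
b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"B: constant = {b0:.3f}, x1 = {b1:.3f}, x2 = {b2:.3f}")
print("Beta:", [round(beta, 3) for beta in standardized_betas(x1, x2, y, b1, b2)])
```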

Interpreting Your Multiple Regression

Your output produces a number of tables. Let’s look at the most important ones.

 

1. The first table, “Variables Entered/Removed”, tells you which variables were used in the analysis. As you can see, “Money Raised” and “Political Experience” were used. The notes under the table show that the dependent variable was “Vote: Primary or Convention.”

Variables Entered/Removed (b,c)

Model   Variables Entered                                Variables Removed   Method
1       Money Raised, Political Experience (Years) (a)   .                   Enter

a. All requested variables entered.
b. Dependent Variable: Vote: Primary or Convention
c. Models are based only on cases for which Office = Senate

2. The “Model Summary” table has two coefficients (numbers) that are important: “R” and “R Square.” “R” measures the combined effect of all the independent variables on the dependent variable. In this case, there is a moderate, positive association (.662) between money raised and political experience, taken together, and the vote. “R Square” means that these two variables together explain 43.8 percent of the variance in the dependent variable: the vote. This is a technical way of saying that other factors (variables) explain the remaining 56.2 percent of the variance. What might they be? How about incumbency, or measures of candidate quality other than political experience?

Model Summary

Model   R (Office = Senate, Selected)   R Square   Adjusted R Square   Std. Error of the Estimate
1       .662 (a)                        .438       .432                20.33142

a. Predictors: (Constant), Money Raised, Political Experience (Years)

 

 

3. In the ANOVA table, look only at the “Sig.” column. If the number is between .000 and .050 inclusive, we can say that the relationship between the independent variables (money raised and candidate quality in this case) and the dependent variable (share of the vote) is unlikely to be due to chance, which is the case here (.000). This means we are confident that money raised and candidate quality, taken together, are related to the vote. If the number is greater than .050 (for example .051, .154, or .60), then the relationship MIGHT BE DUE TO CHANCE, and we should say we are not confident that money raised and candidate quality are linked to the percentage of the vote.

ANOVA (b,c)

Model            Sum of Squares   df    Mean Square   F        Sig.
1   Regression   62539.414        2     31269.707     75.646   .000 (a)
    Residual     80193.138        194   413.367
    Total        142732.552       196

a. Predictors: (Constant), Money Raised, Political Experience (Years)
b. Dependent Variable: Vote: Primary or Convention
c. Selecting only cases for which Office = Senate
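The Model Summary and ANOVA tables are two views of the same sums of squares. A minimal sketch in Python, using only numbers copied from the ANOVA table, recomputes R, R Square, Adjusted R Square, F, and the Std. Error of the Estimate with the standard formulas: R Square = regression SS ÷ total SS; Adjusted R Square = 1 − (1 − R Square)(n − 1) ÷ (n − k − 1); F = regression mean square ÷ residual mean square; and the standard error is the square root of the residual mean square.

```python
import math

# Values copied from the ANOVA table.
ss_regression = 62539.414
ss_residual = 80193.138
df_regression = 2   # number of independent variables (k)
df_residual = 194   # n - k - 1

ss_total = ss_regression + ss_residual
n = df_regression + df_residual + 1  # number of Senate cases used

r_square = ss_regression / ss_total
r = math.sqrt(r_square)
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
mean_sq_reg = ss_regression / df_regression
mean_sq_res = ss_residual / df_residual
f_stat = mean_sq_reg / mean_sq_res
std_error_estimate = math.sqrt(mean_sq_res)

print(f"n = {n}")
print(f"R = {r:.3f}, R Square = {r_square:.3f}, Adjusted R Square = {adj_r_square:.3f}")
print(f"F = {f_stat:.3f}, Std. Error of the Estimate = {std_error_estimate:.5f}")
```

The small gap between R Square (.438) and Adjusted R Square (.432) reflects the adjustment for the number of predictors relative to the 197 cases.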

4. A very important table is the “Coefficients” table. This table tells us, among other things, how much influence each independent variable exerts on the dependent variable. Note the following columns.

a. Under “Model” are listed the constant and the two independent variables, “Political Experience” and “Money Raised.”

b. Especially important are the coefficients (numbers) in the “Standardized Coefficients, Beta” column. The higher the number, the more this variable influences the dependent variable, the percentage of the vote. We compare Beta rather than the unstandardized “B” because B is measured in each variable’s own units (years of experience versus money), so the two B values cannot be compared directly; Beta puts both variables on the same scale. In this case, you can see that “Political Experience” (.398) is more important than “Money Raised” (.370), but not by much. Thus, we can say that political experience is more important than money in explaining the vote for Senate candidates, but not by much!

In some cases, the Beta coefficient will have a negative sign in front of it. Disregard this sign when judging which variable exerts the most influence over the dependent variable: the variable with the larger number, regardless of sign, exerts more influence. (The sign only tells you the direction: a negative Beta means the vote goes down as that variable goes up.)

c. The “Sig.” column simply states whether each independent variable (political experience and money raised) is significantly related to the dependent variable (percent of the vote). If the number is between .000 and .050, we can say that the relationship is NOT likely due to chance: there is a significant relationship between this variable and the dependent variable. As you can see, both relationships are significant (.000), and we can say that “political experience and money raised are significantly related to the vote.”

Coefficients (a,b)

                                   Unstandardized Coefficients   Standardized Coefficients
Model                              B          Std. Error         Beta                        t       Sig.
1   (Constant)                     16.224     1.698                                          9.556   .000
    Political Experience (Years)   1.077      .166               .398                        6.485   .000
    Money Raised                   2.470E-6   .000               .370                        6.024   .000

a. Dependent Variable: Vote: Primary or Convention
b. Selecting only cases for which Office = Senate
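The “B” column can be read as a prediction equation: predicted vote = 16.224 + 1.077 × years of experience + 0.00000247 × money raised. A minimal sketch in Python, with two loudly stated assumptions: Money Raised is recorded in dollars and the vote in percentage points (the handout does not state the units), and the candidate below is hypothetical. The coefficients are the rounded values displayed by SPSS, so predictions are approximate. The second helper ranks variables by the size of Beta, ignoring sign, exactly as described in point b.

```python
# Unstandardized coefficients (B) from the Coefficients table.
CONSTANT = 16.224
B_EXPERIENCE = 1.077  # vote points per year of political experience
B_MONEY = 2.470e-6    # vote points per unit of Money Raised (assumed: dollars)

# Standardized coefficients (Beta), used to compare influence across variables.
BETAS = {"Political Experience (Years)": 0.398, "Money Raised": 0.370}


def predicted_vote(years_experience, money_raised):
    """Predicted vote share: constant + B1 * experience + B2 * money."""
    return CONSTANT + B_EXPERIENCE * years_experience + B_MONEY * money_raised


def rank_by_influence(betas):
    """Order variables by the size of Beta, ignoring any minus sign."""
    return sorted(betas, key=lambda name: abs(betas[name]), reverse=True)


# Hypothetical Senate candidate: 10 years of experience, $1,000,000 raised.
print(round(predicted_vote(10, 1_000_000), 3))
print(rank_by_influence(BETAS))
```

Under the dollars assumption, each additional $1,000,000 raised adds about 2.47 points to the predicted vote, while each additional year of experience adds about 1.08 points.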