Introduction to Statistics: Political Science (Class 3)
Calculating R-Squared, Dichotomous and Nominal
Variables, F-tests
R-Squared
R-Squared Example
• Measure of proportion of variance in Y explained by the IVs
Full sample:
                        Coef.    St.Err.   T        P
Bush FT                 -.165    .019      -8.72    0.000
Party Identification    7.354    .278      26.44    0.000
Constant                65.28    .962      67.89    0.000

10 random cases:
                        Coef.    St.Err.   T        P
Bush FT                 -.090    .489      -0.18    0.860
Party Identification    12.31    7.47      1.65     0.143
Constant                50.16    18.23     2.75     0.028
R² = .5336
Obama FT = 50.16 + (-.090)(Bush FT) + 12.31(Party Identification)
• First, we need the total variation in Y: the sum of squared deviations of Obama FT from its mean (the total sum of squares, SST)
• Mean = 66, so:
Observed    Observed - Mean    (Observed - Mean)²
50          -16                256
30          -36                1296
100          34                1156
50          -16                256
70            4                16
30          -36                1296
85           19                361
100          34                1156
85           19                361
60           -6                36
SST = sum of (Observed - Mean)² = 6190
Bush FT   PID     Predicted   Observed   Observed - Predicted   (Observed - Predicted)²
25.00     0.00    47.92       50.00        2.08                    4.32
0.00      2.00    74.78       30.00      -44.78                 2005.64
0.00      3.00    87.10      100.00       12.90                  166.53
40.00     0.00    46.58       50.00        3.42                   11.71
30.00     2.00    72.10       70.00       -2.10                    4.39
60.00    -1.00    32.48       30.00       -2.48                    6.13
0.00      3.00    87.10       85.00       -2.10                    4.39
0.00      3.00    87.10      100.00       12.90                  166.53
1.00      1.00    62.38       85.00       22.62                  511.49
0.00      1.00    62.47       60.00       -2.47                    6.12
SSR (Sum of Squared Residuals) = 2887.25
SST (Total Sum of Squares) = 6190
$$R^2 = \frac{6190 - 2887.25}{6190} = .5336$$
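The same arithmetic in Python, as a sketch: it takes the observed Obama FT scores and the predicted values exactly as printed in the table (rounded to two decimals). Because of that rounding, SSR comes out at about 2886.83 rather than the slide's 2887.25 (presumably computed from unrounded predictions), but R² still rounds to .5336.

```python
# Observed Obama feeling-thermometer scores for the 10 random cases
observed = [50, 30, 100, 50, 70, 30, 85, 100, 85, 60]
# Predicted values from the 10-case regression, as printed (rounded to 2 decimals)
predicted = [47.92, 74.78, 87.10, 46.58, 72.10, 32.48, 87.10, 87.10, 62.38, 62.47]

mean_y = sum(observed) / len(observed)  # 66.0

# Total variation in Y: sum of squared deviations from the mean (SST)
sst = sum((y - mean_y) ** 2 for y in observed)

# Unexplained variation: sum of squared residuals (SSR)
ssr = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))

# Share of the total variation explained by the IVs
r_squared = (sst - ssr) / sst

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, R^2 = {r_squared:.4f}")
# prints: SST = 6190.00, SSR = 2886.83, R^2 = 0.5336
```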
What is a “good” R²?
• Predict feelings about Obama with:
  – Party ID and feelings about Bush
  – Education
  – Zodiac sign
Non-continuous IVs
Dealing with Dichotomous and Nominal Variables
Democratic Peace
• Is sum of democracy scores the right measure?
• Alternative: Are the pair of countries both democracies?
• Indicator/dummy/dichotomous variable:
  – 1 if both countries have democracy scores >5
  – 0 otherwise
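A minimal sketch of building that indicator, assuming each country pair's two democracy scores are available as numbers. The function name and the example scores are hypothetical, chosen only to illustrate the coding rule.

```python
def democratic_pair(score_a: float, score_b: float, threshold: float = 5) -> int:
    """Return 1 if both countries' democracy scores exceed the threshold, else 0.

    Hypothetical helper: threshold=5 follows the ">5" rule on the slide.
    """
    return int(score_a > threshold and score_b > threshold)

# Hypothetical country pairs (scores invented for illustration)
print(democratic_pair(8, 9))   # both above 5 -> 1
print(democratic_pair(8, 3))   # only one above 5 -> 0
print(democratic_pair(5, 10))  # a score of exactly 5 is not ">5" -> 0
```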
Dichotomous IV
DV: Years at peace

Model 1:
                              Coef     SE Coef    T         P
Democratic Pair (1=yes)       5.18     0.362      14.31     0.000
Constant                      24.35    0.171      142.45    0.000
R-squared = 0.0057

Model 2:
                              Coef     SE Coef    T         P
Democratic Pair (1=yes)       4.74     0.369      12.84     0.000
Military Spending ($mil)      0.053    0.002      25.59     0.000
Constant                      22.21    0.204      108.98    0.000
R-squared = 0.0242
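Reading a dichotomous coefficient: the constant is the predicted value when the dummy is 0, and the dummy's coefficient is the shift when it is 1. A small sketch using the bivariate model's estimates from the table above (the function name is illustrative):

```python
# Estimates from the bivariate model (DV: years at peace)
constant = 24.35
beta_democratic_pair = 5.18

def predicted_years_at_peace(democratic_pair: int) -> float:
    """Predicted years at peace from the bivariate model; democratic_pair is 0 or 1."""
    return constant + beta_democratic_pair * democratic_pair

non_dem = predicted_years_at_peace(0)  # 24.35
dem = predicted_years_at_peace(1)      # 24.35 + 5.18 = 29.53
print(f"Not both democracies: {non_dem:.2f} years; both democracies: {dem:.2f} years")
```

In the model that adds military spending, the gap between the two groups is 4.74 years, holding military spending constant.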
Nominal variables
• Speed dating survey: You have 100 points to distribute among the following attributes -- give more points to those attributes that are more important in a potential date, and fewer points to those attributes that are less important in a potential date.
• Attractive
• Fun
• Intelligent
• Sincere
• Ambitious
• Shared Interests
How do people’s perspectives/goals affect what’s important to them?
• What is your primary goal in participating in this event?
  – Seemed like a fun night out=1
  – To meet new people=2
  – To get a date=3
  – Looking for a serious relationship=4
  – To say I did it=5
• Does this make sense as a linear scale?
Who is likely to say each of the following is important?
• Attractiveness? Fun?
  – Seemed like a fun night out=1
  – To meet new people=2
  – To get a date=3
  – Looking for a serious relationship=4
  – To say I did it=5
• Does this make sense as a linear scale?
Effects of Nominal Variable
One variable:
  – Seemed like a fun night out=1
  – To meet new people=2
  – To get a date=3
  – Looking for a serious relationship=4
  – To say I did it=5
Five variables:
  – Seemed like a fun night out (1=yes)
  – To meet new people (1=yes)
  – To get a date (1=yes)
  – Looking for a serious relationship (1=yes)
  – To say I did it (1=yes)
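A sketch of turning the single 1-5 code into five 0/1 indicators. The labels are shortened versions of the survey responses and the function name is hypothetical. Each respondent has exactly one indicator equal to 1, so the five indicators always sum to 1, which is the same as the constant's column; that is why a model with all five indicators plus a constant cannot estimate β0 separately.

```python
# Response codes for "What is your primary goal in participating in this event?"
GOALS = {
    1: "Seemed Fun",
    2: "Meet People",
    3: "Date",
    4: "Serious Relationship",
    5: "Say Did",
}

def goal_dummies(code: int) -> dict[str, int]:
    """Convert one 1-5 goal code into five 0/1 indicator variables (hypothetical helper)."""
    if code not in GOALS:
        raise ValueError(f"unknown goal code: {code}")
    return {label: int(code == value) for value, label in GOALS.items()}

print(goal_dummies(3))
# {'Seemed Fun': 0, 'Meet People': 0, 'Date': 1, 'Serious Relationship': 0, 'Say Did': 0}

# The five indicators sum to 1 for every possible response
print(all(sum(goal_dummies(c).values()) == 1 for c in GOALS))  # True
```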
Importance of Attribute = β0 + β1(Seemed Fun) + β2(Meet People) + β3(Date) + β4(Serious Relationship) + β5(Say Did) + u
What would β0 correspond to in this model?
“Reference Group”
• Leave one indicator out
Importance of Attribute = β0 + β1(Seemed Fun) + β2(Meet People) + β3(Date) + β4(Serious Relationship) + u
(Remember: reference group is “to say I did it”)
Attractiveness           Coef.     SE Coef.   T        p
Seemed Fun               -4.011    0.883      -4.54    0.000
Meet People              -3.843    0.891      -4.31    0.000
Date                     -3.186    1.033      -3.09    0.002
Serious Relationship     -6.320    1.084      -5.83    0.000
Constant                 22.566    0.846      26.68    0.000
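Because “to say I did it” is the omitted group, the constant is that group's predicted score and each coefficient is a difference from it. A sketch that recovers every group's predicted attractiveness points from the table above:

```python
# Estimates from the attractiveness model; reference group = "to say I did it"
constant = 22.566
coefs = {
    "Seemed Fun": -4.011,
    "Meet People": -3.843,
    "Date": -3.186,
    "Serious Relationship": -6.320,
}

# Predicted points for each group = constant + that group's coefficient
predicted = {"Say Did (reference)": constant}
predicted.update({group: constant + b for group, b in coefs.items()})

for group, points in predicted.items():
    print(f"{group:22s} {points:6.3f}")
```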
What if we want to know whether people who want a date and those who want a serious relationship differ in how important they think attractiveness is?
Easiest way: change reference category
Importance of Attribute = β0 + β1(Seemed Fun) + β2(Meet People) + β3(Date) + β5(Say Did) + u
(Reference group is now “looking for a serious relationship”)
Attractiveness           Coef.     SE Coef.   T        p
Seemed Fun               2.309     0.723      3.19     0.001
Meet People              2.477     0.733      3.38     0.001
Date                     3.134     0.900      3.48     0.001
Say I Did                6.320     1.084      5.83     0.000
Constant                 16.246    0.678      23.95    0.000
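The point estimates in this second table follow from the first one without a new regression: adding the new reference group's old coefficient to the constant, and subtracting it from every other coefficient, re-expresses the same model. A sketch of that bookkeeping (the helper name is illustrative). The standard errors for the new comparisons cannot be recovered from the first table alone, which is why re-running the model with a different reference category is the easy route.

```python
# First table: reference group = "to say I did it"
constant_old = 22.566
coefs_old = {
    "Seemed Fun": -4.011,
    "Meet People": -3.843,
    "Date": -3.186,
    "Serious Relationship": -6.320,
    "Say Did": 0.0,  # reference group, coefficient fixed at 0
}

def change_reference(constant, coefs, new_ref):
    """Re-express dummy coefficients relative to a different reference group."""
    shift = coefs[new_ref]
    new_constant = constant + shift
    new_coefs = {g: b - shift for g, b in coefs.items() if g != new_ref}
    return new_constant, new_coefs

constant_new, coefs_new = change_reference(constant_old, coefs_old, "Serious Relationship")
print(f"Constant     {constant_new:6.3f}")
for group, b in coefs_new.items():
    print(f"{group:12s} {b:6.3f}")
```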
Do people who want a date and those who want a serious relationship differ in how important they think attractiveness is?
Nominal and Dichotomous IVs
Attractiveness           Coef.     SE Coef.   T        p
Seemed Fun               1.852     0.696      2.66     0.008
Meet People              2.516     0.705      3.57     0.000
Date                     2.998     0.865      3.46     0.001
Say I Did                6.303     1.042      6.05     0.000
Gender (1=male)          4.689     0.326      14.38    0.000
Constant                 14.084    0.669      21.06    0.000
Estimated points allocated to attractiveness for men who attended because it seemed fun?
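Working the question through, as a sketch: in this model the omitted goal is “looking for a serious relationship” and the omitted gender category is gender = 0, so a man who attended because it seemed fun gets the constant plus the Seemed Fun and Gender coefficients.

```python
# Estimates from the model with goal dummies and gender
# (reference: "looking for a serious relationship", gender = 0)
constant = 14.084
b_seemed_fun = 1.852
b_male = 4.689

# Man (gender = 1) who attended because it seemed fun (Seemed Fun = 1, other goal dummies = 0)
points = constant + b_seemed_fun * 1 + b_male * 1
print(f"{points:.3f}")  # 20.625
```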
F-Tests
Testing the joint significance of variables
F-test
• Way of testing the joint significance of variables, i.e., whether a set of variables significantly improves explanatory power
• When to use:
  – Nominal variables
  – Variables likely to be highly correlated, but important predictors
Terminology
• Unrestricted model – includes the IVs whose joint significance you want to test
• Restricted model – same model, excluding the IVs being tested
• SSR – Sum of Squared Residuals
Formula
• q = # of variables being tested
• n = number of cases
• k = number of IVs in the unrestricted model
$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-(k+1))}$$
Who values fun people?
Fun                      Coef.     SE Coef.   T        p
Seemed Fun               0.537     0.349      1.54     0.124
Meet People              -0.058    0.354      -0.17    0.869
Date                     -1.235    0.434      -2.84    0.004
Say I Did                -0.271    0.523      -0.52    0.605
Gender (1=male)          0.254     0.164      1.55     0.121
Constant                 17.139    0.336      51.06    0.000
What if we want to know whether the reason-for-attending variables, as a group, improve the explanatory power of the model?
UNRESTRICTED
              Sum of Squares    df      MS
Model         672.078           5       134.4156
Residual      40819.896         2478    16.47292
Total         41491.974         2483    16.71042

RESTRICTED
              Sum of Squares    df      MS
Model         62.841            1       62.84063
Residual      41429.133         2482    16.69183
Total         41491.974         2483    16.71042
$$F = \frac{(41429.133 - 40819.896)/4}{40819.896/(2484-(5+1))} = 9.25$$
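The calculation above as a small Python function: a sketch that codes the slide's formula directly (the function and parameter names are illustrative, not from any library). Plugging in the residual sums of squares from the two tables reproduces F = 9.25.

```python
def f_statistic(ssr_restricted: float, ssr_unrestricted: float,
                q: int, n: int, k: int) -> float:
    """F-test for the joint significance of q variables.

    ssr_restricted   -- SSR of the model without the tested variables
    ssr_unrestricted -- SSR of the model with them
    q -- number of variables being tested
    n -- number of cases
    k -- number of IVs in the unrestricted model
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - (k + 1))
    return numerator / denominator

# Residual sums of squares from the restricted and unrestricted ANOVA tables
F = f_statistic(ssr_restricted=41429.133, ssr_unrestricted=40819.896, q=4, n=2484, k=5)
print(f"F = {F:.2f}")  # F = 9.25
```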
Statistical significance of F-test
• What does an F value of 9.25 mean?
• Similar idea to a t-test, but the shape of the F-distribution depends (heavily) on its degrees of freedom:
  – Numerator = number of IVs being tested
  – Denominator = N - (number of IVs in the unrestricted model) - 1
  – Here: 4 and 2478 (2484 - 5 - 1)
Look up critical value in a table or use Minitab
• Calc > Probability Distributions > F
Note: this gives the area under the curve up to your F statistic (the cumulative probability), so the p-value is 1 minus that value
Cumulative Distribution Function
F distribution with 4 DF in numerator and 2478 DF in denominator
x       P( X <= x )
9.25    1.00000
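Outside Minitab, the usual route is a statistics package (for example, SciPy's `scipy.stats.f.sf(f, df1, df2)` returns the upper-tail area directly). The sketch below instead uses only Python's standard library, under one stated assumption: both degrees of freedom must be even, as 4 and 2478 are here. It relies on the fact that the F CDF is a regularized incomplete beta function, which for integer parameters equals a binomial tail probability. The function name is hypothetical. The p-value comes out at roughly 2 × 10⁻⁷, consistent with Minitab's cumulative probability of 1.00000.

```python
import math

def f_upper_tail_even_df(f_stat: float, df1: int, df2: int) -> float:
    """P(F > f_stat) for F ~ F(df1, df2); valid only when df1 and df2 are both even.

    P(F <= f) = I_x(df1/2, df2/2) with x = df1*f / (df1*f + df2).
    For integer a and b, I_x(a, b) = P(Y >= a) for Y ~ Binomial(a + b - 1, x),
    so P(F > f) = P(Y <= a - 1).
    """
    if df1 % 2 or df2 % 2:
        raise ValueError("this shortcut needs even numerator and denominator df")
    if f_stat <= 0:
        return 1.0
    a, b = df1 // 2, df2 // 2
    n = a + b - 1
    x = df1 * f_stat / (df1 * f_stat + df2)
    tail = 0.0
    for j in range(a):  # binomial probabilities P(Y = j), j = 0 .. a-1, computed via logs
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * math.log(x) + (n - j) * math.log1p(-x))
        tail += math.exp(log_pmf)
    return tail

p_value = f_upper_tail_even_df(9.25, 4, 2478)
cdf = 1 - p_value
print(f"P(X <= 9.25) = {cdf:.5f}, p-value = {p_value:.1e}")
```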
Notes and Next Time
• Graded homework will be handed back next time and model answers will be posted online early next week
• New homework will be handed out next time (and due next Thursday)
• Next time:
  – Functional form in multivariate regression