Sociology 5811: Lecture 16: Crosstabs 2 Measures of Association Plus Differences in Proportions...

Sociology 5811: Lecture 16: Crosstabs 2 Measures of Association Plus Differences in Proportions Copyright © 2005 by Evan Schofer Do not copy or distribute without permission


Sociology 5811: Lecture 16: Crosstabs 2

Measures of Association

Plus Differences in Proportions

Copyright © 2005 by Evan Schofer

Do not copy or distribute without permission


Announcements

• Final project proposals due Nov 15

• Get started now!!!

• Find a dataset

• Figure out what hypotheses you might test

• Today: Wrap up Crosstabs

• If time remains, we’ll discuss project ideas…


Review: Chi-square Test

• Chi-Square test is a test of independence

• Null hypothesis: the two categorical variables are statistically independent

• There is no relationship between them

• H0: Gender and political party are independent

• Alternate hypothesis: the variables are related, not independent of each other

• H1: Gender and political party are not independent

• Test is based on comparing the observed cell values with the values you’d expect if there were no relationship between variables.


Review: Expected Cell Values

• If two variables are independent, cell values will depend only on row & column marginals

– Marginals reflect frequencies… And, if frequency is high, all cells in that row (or column) should be high

• The formula for the expected value in a cell is:

$$\hat{f}_{ij} = \frac{(f_i)(f_j)}{N}$$

• $f_i$ and $f_j$ are the row and column marginals

• N is the total sample size
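
As a rough illustration of the expected-value formula, the sketch below computes expected counts from the row and column marginals. The function name `expected_counts` is made up for this example, and the counts are the gender and political party table used in the example later in this lecture.

```python
def expected_counts(observed):
    """Expected cell counts under independence: (row total)(column total) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Democrat, Republican. Columns: Women, Men.
observed = [[27, 10],
            [16, 15]]

expected = expected_counts(observed)
for row in expected:
    print([round(value, 1) for value in row])
```

Every expected count depends only on the marginals, so a row or column with a large total gets large expected counts in all of its cells.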


Review: Chi-square Test

• The Chi-square formula:

$$\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(E_{ij} - O_{ij})^2}{E_{ij}}$$

• Where:

• R = total number of rows in the table

• C = total number of columns in the table

• $E_{ij}$ = the expected frequency in row i, column j

• $O_{ij}$ = the observed frequency in row i, column j

– Assumption for test: Large N (>100)

– Degrees of freedom for the critical value: (R-1)(C-1).


Chi-square Test of Independence

• Example: Gender and Political Views

– Let’s pretend that N of 68 is sufficient

              Women         Men
Democrat      O11: 27       O12: 10
              E11: 23.4     E12: 13.6
Republican    O21: 16       O22: 15
              E21: 19.6     E22: 11.4


Chi-square Test of Independence

• Compute (E – O)² / E for each cell

              Women                          Men
Democrat      (23.4 – 27)² / 23.4 = .55      (13.6 – 10)² / 13.6 = .95
Republican    (19.6 – 16)² / 19.6 = .66      (11.4 – 15)² / 11.4 = 1.14

Chi-Square Test of Independence

• Finally, sum up to compute the Chi-square

• χ² = .55 + .95 + .66 + 1.14 = 3.30

• What is the critical value for α = .05?

• Degrees of freedom: (R-1)(C-1) = (2-1)(2-1) = 1

• According to Knoke, p. 509: Critical value is 3.84

• Question: Can we reject H0?

• No. χ² of 3.30 is less than the critical value

• We cannot conclude that there is a relationship between gender and political party affiliation.
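
The whole test can be checked with a short script. This is a minimal sketch, not a replacement for a statistics package: `chi_square` is an illustrative name, and the critical value is simply the 3.84 quoted above rather than something the code looks up. Because the script uses unrounded expected counts, its result can differ from a hand calculation in the second decimal place.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a crosstab of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            chi2 += (e - o) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Rows: Democrat, Republican. Columns: Women, Men.
observed = [[27, 10],
            [16, 15]]

chi2, df = chi_square(observed)
CRITICAL_VALUE = 3.84  # alpha = .05, df = 1, as quoted in the lecture
reject_h0 = chi2 > CRITICAL_VALUE
print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {reject_h0}")
```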


Chi-square Test of Independence

• Weaknesses of chi-square tests:

• 1. If the sample is very large, we almost always reject H0.

• Even tiny covariations are statistically significant

• But, they may not be socially meaningful differences

• 2. It doesn’t tell us how strong the relationship is

• It doesn’t tell us if it is a large, meaningful difference or a very small one

• It is only a test of “independence” vs. “dependence”

• Measures of Association address this shortcoming.


Measures of Association

• Separate from the issue of independence, statisticians have created measures of association

– They are measures that tell us how strong the relationship is between two variables

• Weak Association:

        Women   Men
Dem.    51      49
Rep.    49      51

• Strong Association:

        Women   Men
Dem.    100     0
Rep.    0       100


Crosstab Association: Yule’s Q

• #1: Yule’s Q

– Appropriate only for 2x2 tables (2 rows, 2 columns)

• Label cell frequencies a through d:

a b
c d

• Recall that extreme values along the “diagonal” (cells a & d) or the “off-diagonal” (b & c) indicate a strong relationship.

• Yule’s Q captures that in a measure

• 0 = no association. -1, +1 = strong association

• Formula:

$$Q = \frac{bc - ad}{bc + ad}$$


Crosstab Association: Yule’s Q

• Rule of Thumb for interpreting Yule’s Q:

• Bohrnstedt & Knoke, p. 150

Absolute value of Q    Strength of Association
0 to .24               “virtually no relationship”
.25 to .49             “weak relationship”
.50 to .74             “moderate relationship”
.75 to 1.0             “strong relationship”


Crosstab Association: Yule’s Q

• Example: Gender and Political Party Affiliation

        Women     Men
Dem     27 (a)    10 (b)
Rep     16 (c)    15 (d)

Calculate “bc”: bc = (10)(16) = 160

Calculate “ad”: ad = (27)(15) = 405

$$Q = \frac{bc - ad}{bc + ad} = \frac{160 - 405}{160 + 405} = \frac{-245}{565} = -.43$$

• -.43 = “weak association”
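
The calculation and the rule of thumb can be sketched in a few lines. The function names `yules_q` and `strength` are illustrative only. The sign convention (bc minus ad) follows the formula used in this lecture, and the cutoffs are the Bohrnstedt & Knoke rule of thumb quoted above.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table laid out as [[a, b], [c, d]]."""
    return (b * c - a * d) / (b * c + a * d)


def strength(q):
    """Rule-of-thumb label based on the absolute value of Q."""
    size = abs(q)
    if size < 0.25:
        return "virtually no relationship"
    if size < 0.50:
        return "weak relationship"
    if size < 0.75:
        return "moderate relationship"
    return "strong relationship"


# Gender and party example: a = 27, b = 10, c = 16, d = 15.
q = yules_q(27, 10, 16, 15)
label = strength(q)
print(f"Q = {q:.2f} ({label})")
```

Note that Q is undefined when bc + ad = 0, so a real script would need to guard against that case.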


Association: Other Measures

• Phi (φ)

• Very similar to Yule’s Q

• Only for 2x2 tables, ranges from –1 to 1, 0 = no assoc.

• Gamma (G)

• Based on a very different method of calculation

• Not limited to 2x2 tables

• Requires ordered variables

• Tau c (τc) and Somers’ d (dyx)

• Same basic principle as Gamma

• Several others discussed in Knoke, Norusis.


Crosstab Association: Gamma

• Gamma, like Q, is based on comparing “diagonal” to “off-diagonal” cases.

– But, it does so differently

• Jargon:

• Concordant pairs: Pairs of cases where one case is higher on both variables than another case

• Discordant pairs: Pairs of cases for which the first case (when compared to a second) is higher on one variable but lower on another


Crosstab Association: Gamma

• Example: Approval of candidates

– Cases in “Love Trees/Love Guns” cell make concordant pairs with cases lower on both

             Hate Trees   Trees = OK   Love Trees
Love Guns          1205          603           71
Guns = OK           659         1498          452
Hate Guns           431          467         1120

All 71 individuals can be a pair with everyone in the lower cells. Just multiply!

(71)(659 + 1498 + 431 + 467) = 216,905 conc. pairs


Crosstab Association: Gamma

• More possible concordant pairs

– The “Love Guns/Trees = OK” cell and the “Guns = OK/Love Trees” cell can also have concordant pairs

             Hate Trees   Trees = OK   Love Trees
Love Guns          1205          603           71
Guns = OK           659         1498          452
Hate Guns           431          467         1120

The 603 in “Love Guns/Trees = OK” can pair with all those that score lower on approval for both Guns & Trees:

(603)(659 + 431) = 657,270 conc. pairs

The 452 in “Guns = OK/Love Trees” can pair lower too!

(452)(431 + 467) = 405,896 conc. pairs


Crosstab Association: Gamma

• Discordant pairs: Pairs where a first person ranks higher on one dimension (e.g. approval of Trees) but lower on the other (e.g., app. of Guns)

             Hate Trees   Trees = OK   Love Trees
Love Guns          1205          603           71
Guns = OK           659         1498          452
Hate Guns           431          467         1120

The top-left cell is higher on Guns but lower on Trees than those in the lower right. They make pairs:

(1205)(1498 + 452 + 467 + 1120) = 4,262,085 discordant pairs


Crosstab Association: Gamma

• If all pairs are concordant or all pairs are discordant, the variables are strongly related

• If there are an equal number of discordant and concordant pairs, the variables are weakly associated.

• Formula for Gamma:

$$G = \frac{n_s - n_d}{n_s + n_d}$$

• $n_s$ = number of concordant pairs

• $n_d$ = number of discordant pairs
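
Counting every pair by hand is tedious, so here is a minimal sketch of the full count for the Guns/Trees table. It assumes the table is entered with both variables running from low to high, so the rows are listed in the reverse of the order shown on the slides (Hate Guns first). The names `pair_counts` and `gamma` are illustrative.

```python
def pair_counts(table):
    """Count concordant and discordant pairs in an ordered crosstab.

    Assumes rows and columns both run from low to high.
    """
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare cell (i, j) with every cell in a higher row.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:    # higher on both variables
                        concordant += table[i][j] * table[k][m]
                    elif m < j:  # higher on one, lower on the other
                        discordant += table[i][j] * table[k][m]
    return concordant, discordant


def gamma(table):
    """Gamma = (ns - nd) / (ns + nd)."""
    ns, nd = pair_counts(table)
    return (ns - nd) / (ns + nd)


# Rows: Hate Guns, Guns = OK, Love Guns. Columns: Hate Trees, Trees = OK, Love Trees.
approval = [[431, 467, 1120],
            [659, 1498, 452],
            [1205, 603, 71]]

ns, nd = pair_counts(approval)
g = gamma(approval)
print(f"concordant = {ns:,}, discordant = {nd:,}, gamma = {g:.3f}")
```

Pairs tied on either variable (same row or same column) are skipped, which matches the definition of Gamma: it uses only concordant and discordant pairs.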


Crosstab Association: Gamma

• Calculation of Gamma is typically done by computer

• Zero indicates no association

• +1 = strong positive association

• -1 = strong negative association

• It is possible to do hypothesis tests on Gamma

• To determine if population gamma differs from zero

• Requirements: random sample, N > 50

• See Knoke, p. 155-6.


Crosstab Association

• Final remarks:

• You have a variety of possible measures to assess association among variables. Which one should you use?

• Yule’s Q and Phi require a 2x2 table

• Larger ordered tables: use Gamma, Tau-c, Somers’ d

• Ideally, report more than one to show that your findings are robust.


Odds Ratios

• Odds ratios are a powerful way of analyzing relationships in crosstabs

• Many advanced categorical data analysis techniques are based on odds ratios

• Review: What is a probability?

• p(A) = # of outcomes that are “A” divided by total number of outcomes

• To convert a frequency distribution to a probability distribution, simply divide frequency by N

• The same can be done with crosstabs: Cell frequency over N is probability.


Odds Ratios

• If total N = 68, probability of drawing cases is:

       Women     Men
Dem    27 / 68   10 / 68
Rep    16 / 68   15 / 68

       Women   Men
Dem    .397    .147
Rep    .235    .221


Odds Ratios

• Odds are similar to probability… but not quite

• Odds of A = Number of outcomes that are A, divided by number of outcomes that are not A

– Note: Denominator is different than for a probability

• Ex: Probability of rolling 1 on a 6-sided die = 1/6

• Odds of rolling a 1 on a six-sided die = 1/5

• Odds can also be calculated from probabilities:

$$\text{odds}_i = \frac{p_i}{1 - p_i}$$
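
A quick check of the die example, using exact fractions so that no rounding gets in the way. The helper name `odds` is just for illustration.

```python
from fractions import Fraction


def odds(p):
    """Convert a probability p into odds: p / (1 - p)."""
    return p / (1 - p)


p_one = Fraction(1, 6)   # probability of rolling a 1 on a six-sided die
odds_one = odds(p_one)   # odds of rolling a 1
print(f"probability = {p_one}, odds = {odds_one}")
```

The conversion breaks down at p = 1, where the denominator is zero and the odds are undefined.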


Odds Ratios

• Conditional odds = odds of being in one category of a variable within a specific category of another variable

– Example: For women, what are the odds of being Democrat?

– Instead of overall odds of being Democrat, conditional odds are about a particular subgroup in a table

       Women   Men
Dem    27      10
Rep    16      15

Conditional odds of being Democrat for women are:

27 / 16 = 1.69

Note: Odds for women are different from odds for men (10 / 15 = .67)


Odds Ratios

• If variables in a crosstab are independent, their conditional odds are equal

• Odds of falling into one category or another are same for all values of other variable

• If variables in a crosstab are associated, conditional odds differ

• Odds can be compared by making a ratio

• Ratio is equal to 1 if odds are the same for two groups

• Ratios much greater or less than 1 indicate very different odds.


Odds Ratios

• Formula for Odds Ratio in 2x2 table:

$$OR^{XY} = \frac{b/d}{a/c} = \frac{bc}{ad}$$

a b
c d

       Women   Men
Dem    27      10
Rep    16      15

• Ex: OR = (10)(16) / (27)(15) = 160 / 405 = .395

• Interpretation: men have .395 times the odds of being a Democrat compared to women

• Inverted value (1 / .395 = 2.5) indicates that women’s odds of being a Democrat are 2.5 times men’s odds
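
The steps above can be sketched as follows: compute the conditional odds for each column, then take their ratio. The function names are illustrative, and the cell labels a, b, c, d follow the layout used in this lecture.

```python
def conditional_odds(top, bottom):
    """Odds of being in the top row rather than the bottom row, within one column."""
    return top / bottom


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table [[a, b], [c, d]]: (b/d) / (a/c) = bc / ad."""
    return (b * c) / (a * d)


a, b, c, d = 27, 10, 16, 15  # Dem/Women, Dem/Men, Rep/Women, Rep/Men

odds_women = conditional_odds(a, c)  # odds of being a Democrat among women
odds_men = conditional_odds(b, d)    # odds of being a Democrat among men
ratio = odds_ratio(a, b, c, d)       # men's odds relative to women's odds

print(f"women: {odds_women:.2f}, men: {odds_men:.2f}")
print(f"OR (men vs. women) = {ratio:.3f}, inverted = {1 / ratio:.2f}")
```

If any cell is zero, one of these divisions either fails or gives an odds ratio of zero.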


Odds Ratios: Final Remarks

• 1. Cells with zeros cause problems for odds ratios

• Ratios with zero in denominator are undefined.

• Thus, you need to have full cells

• 2. Odds ratios can be used to measure association

• Indeed, Yule’s Q is based on them

• 3. Odds ratios form the basis for most advanced categorical data analysis techniques

• For now it may be easier to use Yule’s Q, etc. But, if you need to do advanced techniques, you will use odds ratios.


Tests for Difference in Proportions

• Another approach to small (2x2) tables:

• Instead of making a crosstab, you can just think about the proportion of people in a given category

• More similar to T-test than a Chi-square test

• Ex: Do you approve of Pres. Bush? (Yes/No)

• Sample: N = 86 women, 80 men

• Proportion of women that approve: PW = .70

• Proportion of men that approve: PM = .78

• Issue: Do the populations of men/women differ?

• Or are the differences just due to sampling variability?


Tests for Difference in Proportions

• Hypotheses:

• Again, the typical null hypothesis is that there are no differences between groups

• Which is equivalent to statistical independence

• H0: Proportion women = proportion men

• H1: Proportion women not = proportion men

• Note: One-tailed directional hypotheses can also be used.


Tests for Difference in Proportions

• Strategy: Figure out the sampling distribution for differences in proportions

• Statisticians have determined relevant info:

• 1. If samples are “large”, the sampling distribution of difference in proportions is normal

– The Z-distribution can be used for hypothesis tests

• 2. A Z-value can be calculated using the formula:

$$Z = \frac{P_1 - P_2}{\hat{\sigma}_{(P_1 - P_2)}}$$


Tests for Difference in Proportions

• Standard error can be estimated as:

$$\hat{\sigma}_{(P_1 - P_2)} = \sqrt{P_{both}(1 - P_{both})} \sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$

• Where:

$$P_{both} = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2}$$


Difference in Proportions: Example

• Q: Do you approve of Pres. Bush? (Yes/No)

• Sample: N = 86 women, 80 men

• Women: N = 86, PW = .70

• Men: N = 80, PM = .78

• Total N is “Large”: 166 people

– So, we can use a Z-test

• Use α = .05, two-tailed Z = 1.96


Difference in Proportions: Example

• Use formula to calculate Z-value

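$$Z = \frac{P_1 - P_2}{\hat{\sigma}_{(P_1 - P_2)}} = \frac{.70 - .78}{\hat{\sigma}_{(P_1 - P_2)}} = \frac{-.08}{\hat{\sigma}_{(P_1 - P_2)}}$$

• And, estimate the Standard Error as:

$$\hat{\sigma}_{(P_1 - P_2)} = \sqrt{P_{both}(1 - P_{both})} \sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$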


Difference in Proportions: Example

• First: Calculate Pboth:

$$P_{both} = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2} = \frac{86(.70) + 80(.78)}{86 + 80} = \frac{60.2 + 62.4}{166} = .739$$


Difference in Proportions: Example

• Plug in Pboth = .739:

$$\hat{\sigma}_{(P_1 - P_2)} = \sqrt{.739(1 - .739)} \sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$

$$\hat{\sigma}_{(P_1 - P_2)} = \sqrt{.193} \sqrt{\frac{86 + 80}{(86)(80)}}$$

$$\hat{\sigma}_{(P_1 - P_2)} = .4392 \sqrt{\frac{166}{6880}} = (.4392)(.1553) = .0682$$

Difference in Proportions: Example

• Finally, plug in S.E. and calculate Z:

$$Z = \frac{P_1 - P_2}{\hat{\sigma}_{(P_1 - P_2)}} = \frac{.70 - .78}{\hat{\sigma}_{(P_1 - P_2)}} = \frac{-.08}{.0682} = -1.17$$

Difference in Proportions: Example

• Results:

• Critical Z = 1.96

• Observed Z = -1.17

• Conclusion: We can’t reject null hypothesis (|Z| = 1.17 is less than 1.96)

– Women and Men do not clearly differ in approval of Bush
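
To check the arithmetic for this example, the three steps (pooled proportion, standard error, Z) can be sketched as one small function. The name `two_proportion_z` is illustrative, and the critical value is the two-tailed 1.96 quoted above rather than something the code looks up.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Return (pooled proportion, standard error, Z) for a difference in proportions."""
    p_both = (n1 * p1 + n2 * p2) / (n1 + n2)
    se = math.sqrt(p_both * (1 - p_both)) * math.sqrt((n1 + n2) / (n1 * n2))
    z = (p1 - p2) / se
    return p_both, se, z


# Women: N = 86, P = .70. Men: N = 80, P = .78.
p_both, se, z = two_proportion_z(0.70, 86, 0.78, 80)

CRITICAL_Z = 1.96  # alpha = .05, two-tailed
reject_h0 = abs(z) > CRITICAL_Z
print(f"P_both = {p_both:.3f}, SE = {se:.4f}, Z = {z:.2f}, reject H0: {reject_h0}")
```

The test compares the absolute value of Z with the critical value because the hypothesis is two-tailed.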