Crosstabulation and Measures of Association

Investigating the relationship between two variables

Generally, a statistical relationship exists if the values of the observations for one variable are associated with the values of the observations for another variable.

Knowing that two variables are related allows us to make predictions.

If we know the value of one, we can predict the value of the other.

Determining how the values of one variable are related to the values of another is one of the foundations of empirical science.

In making such determinations we must consider the following features of the relationship.

1.) The level of measurement of the variables. Different types of variables necessitate different procedures.

2.) The form of the relationship. We can ask if changes in X move in lockstep with changes in Y or if a more sophisticated relationship exists.

3.) The strength of the relationship. Is it possible that some levels of X will always be associated with certain levels of Y?

4.) Numerical summaries of the relationship. Social scientists strive to boil down the different aspects of a relationship to a single number that reveals the type and strength of the association.

5.) Conditional relationships. The variables X and Y may seem to be related in some fashion, but appearances can be deceiving; the relationship may be spurious, for example. So we need to know if the introduction of any other variables into the analysis changes the relationship.

Types of Association

1.) General Association – the variables are simply associated in some way.

2.) Positive Monotonic Correlation – when the variables have order (ordinal or continuous), high values of one variable are associated with high values of the other, and low values with low values.

3.) Negative Monotonic Correlation – low values of one variable are associated with high values of the other.

4.) Positive Linear Association – a particular type of positive monotonic relationship where the plotted values of X-Y fall on a straight line that slopes upward.

5.) Negative Linear Relationship – the plotted values of X-Y fall on a straight line that slopes downward.

Strength of Relationships

Virtually no relationships between variables in social science (and largely in natural science as well) have a perfect form.

As a result it makes sense to talk about the strength of relationships.

The strength of a relationship between variables can often be judged by simply looking at a graph of the data.

If the values of X and Y are tied together tightly, then the relationship is strong.

If the X-Y points are spread out, then the relationship is weak.

Direction of Relationship

We can also infer direction from a graph by simply observing how the values for our variables move across the graph.

This is only true, however, when our variables are ordinal or continuous.

Types of Bivariate Relationships and Associated Statistics

Nominal/Ordinal (including dichotomous): Crosstabulation (Lambda, Chi-Square, Gamma, etc.)

Interval and Dichotomous: Difference of means test

Interval and Nominal/Ordinal: Analysis of Variance

Interval and Ratio: Regression and correlation

Assessing Relationships between Variables

1. Calculate the appropriate statistic to measure the magnitude of the relationship in the sample.

2. Calculate additional statistics to determine if the relationship holds for the population of interest (statistical significance).

Substantive significance vs. statistical significance

What is a Crosstabulation?

Crosstabulations are appropriate for examining relationships between variables that are nominal, ordinal, or dichotomous.

Crosstabs show values for variables categorized by another variable.

They display the joint distribution of values of the variables by listing the categories of one variable along the x-axis (the columns) and the categories of the other along the y-axis (the rows).

Each case is then placed in the cell of the table that represents the combination of values that corresponds to its scores on the variables.

Example: We would like to know if presidential vote choice in 2000 was related to race.

Vote choice = Gore or Bush
Race = White, Hispanic, Black

Are Race and Vote Choice Related? Why?

          Black    Hispanic    White    TOTAL
Gore        106          23      427      556
Bush          8          15      484      507
TOTAL       114          38      911     1063

Are Race and Vote Choice Related? Why? (column percentages)

          Black         Hispanic       White          TOTAL
Gore      106 (93%)     23 (60.5%)     427 (46.9%)    556 (52.3%)
Bush      8 (7%)        15 (39.5%)     484 (53.1%)    507 (47.7%)
TOTAL     114 (100%)    38 (100%)      911 (100%)     1063 (100%)
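The column percentages in the race and vote choice table can be reproduced with a short sketch like the one below. The dictionary layout and the variable names (`counts`, `column_totals`, `column_pct`) are illustrative assumptions, not part of the original slides; only the cell counts come from the table.

```python
# Observed counts from the race-by-vote table: rows are vote choice, columns are race.
counts = {
    "Gore": {"Black": 106, "Hispanic": 23, "White": 427},
    "Bush": {"Black": 8, "Hispanic": 15, "White": 484},
}
races = ["Black", "Hispanic", "White"]

# Column totals: number of respondents in each racial group.
column_totals = {race: sum(row[race] for row in counts.values()) for race in races}

# Column percentages: share of each racial group that chose each candidate.
column_pct = {
    candidate: {race: 100 * row[race] / column_totals[race] for race in races}
    for candidate, row in counts.items()
}

for candidate in counts:
    cells = "  ".join(f"{race}: {column_pct[candidate][race]:.1f}%" for race in races)
    print(f"{candidate:5s} {cells}")
```

Percentaging within the categories of the independent variable (race) is what lets the groups be compared even though they differ greatly in size.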

Measures of Association for Crosstabulations

Purpose – to determine if nominal/ordinal variables are related in a crosstabulation

At least one nominal variable: Lambda, Chi-Square, Cramer's V

Two ordinal variables: Tau, Gamma

These measures of association provide us with correlation coefficients that summarize data from a table into one number.

This is extremely useful when dealing with several tables or very complex tables.

These coefficients measure the strength of an association and, when both variables are ordinal, its direction as well.

Coefficients for Nominal Data

When one or both of the variables are nominal, ordinal coefficients cannot be used because there is no underlying ordering.

Instead we use PRE measures.

Lambda (PRE coefficient)

PRE – Proportional Reduction in Error

Two Rules

1.) Make a prediction on the value of an observation in the absence of any prior information.

2.) Given information on a second variable, take it into account in making the prediction.

Lambda PRE

If the two variables are associated, then the use of rule two should lead to fewer errors in your predictions than rule one.

How many fewer errors depends upon how closely the variables are associated.

PRE = (E1 – E2) / E1, where E1 is the number of errors made under rule one and E2 is the number made under rule two.

The scale goes from 0 to 1.

Lambda

Lambda is a PRE coefficient and it relies on rules 1 & 2 above.

When applying rule one, all we have to go on is what proportion of the population fits into one category as opposed to another.

So, without any other information, guessing that every observation is in the modal category would give you the best chance of getting the most correct.

Why?

Think of it like this. If you knew that I tended to make exams where the most often used answer was B, then, without any other information, you would be best served to pick B every time.

But if you know each case's value on another variable, rule two directs you to look only at the cases in that category of the other variable and to predict the modal category within that group.

Example

Suppose we have a sample of 100 voters and you need to predict how they voted in the general election.

Assume we know that overall 30% voted Democrat, 30% voted Republican, and 40% were independent.

Now suppose we take one person out of the group (John Smith); our best guess would be that he voted independent.

Now suppose we take another person (Larry Mendez); again we would assume he voted independent.

As a result our best guess is to predict that all of the voters (all 100) were independent.

We are sure to get some wrong, but it's the best we can do over the long run.

How many do we get wrong? 60.

Suppose now that we know something about the voters' regions (where they are from) and we know how each region voted in the election.

NE – 30, MW – 20, SO – 30, WE – 20

Lambda

         NE    MW    SO    WE    TOTAL
REPUB     4    10     6    10       30
IND      12     8    16     4       40
DEM      14     2     8     6       30
TOTAL    30    20    30    20      100

Lambda – Rule 1 (prediction based solely on knowledge of marginal distribution of dependent variable – partisanship)

Each cell shows the observed count, followed in parentheses by the Rule 1 prediction.

         NE         MW         SO         WE         TOTAL
REPUB    4 (0)      10 (0)     6 (0)      10 (0)     30
IND      12 (30)    8 (20)     16 (30)    4 (20)     40
DEM      14 (0)     2 (0)      8 (0)      6 (0)      30
TOTAL    30         20         30         20         100

Lambda – Rule 2 (prediction based on knowledge provided by independent variable)

Each cell shows the observed count, followed in parentheses by the Rule 1 prediction and the Rule 2 prediction.

         NE            MW            SO             WE            TOTAL
REPUB    4 (0, 0)      10 (0, 20)    6 (0, 0)       10 (0, 20)    30
IND      12 (30, 0)    8 (20, 0)     16 (30, 30)    4 (20, 0)     40
DEM      14 (0, 30)    2 (0, 0)      8 (0, 0)       6 (0, 0)      30
TOTAL    30            20            30             20            100

Lambda – Calculation of Errors

Errors w/Rule 1: 18 + 12 + 14 + 16 = 60
Errors w/Rule 2: 16 + 10 + 14 + 10 = 50
Lambda = (Errors R1 – Errors R2) / Errors R1
Lambda = (60 – 50)/60 = 10/60 = .17

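The two prediction rules and the resulting Lambda can be checked with a minimal sketch such as the following. The function name `lambda_pre` and the nested-dictionary layout are assumptions made for illustration; the counts are the region and partisanship figures from the example.

```python
# Observed counts from the region-by-partisanship example (party -> region -> count).
observed = {
    "REPUB": {"NE": 4, "MW": 10, "SO": 6, "WE": 10},
    "IND": {"NE": 12, "MW": 8, "SO": 16, "WE": 4},
    "DEM": {"NE": 14, "MW": 2, "SO": 8, "WE": 6},
}
regions = ["NE", "MW", "SO", "WE"]


def lambda_pre(table, columns):
    """Return (errors under rule 1, errors under rule 2, lambda).

    Rows of `table` are categories of the dependent variable; `columns`
    lists the categories of the independent variable.
    """
    n = sum(sum(row.values()) for row in table.values())
    # Rule 1: always guess the overall modal category; all other cases are errors.
    errors_rule1 = n - max(sum(row.values()) for row in table.values())
    # Rule 2: within each column, guess that column's modal category.
    errors_rule2 = 0
    for col in columns:
        column_counts = [row[col] for row in table.values()]
        errors_rule2 += sum(column_counts) - max(column_counts)
    return errors_rule1, errors_rule2, (errors_rule1 - errors_rule2) / errors_rule1


e1, e2, lam = lambda_pre(observed, regions)
print(e1, e2, round(lam, 2))  # 60 50 0.17
```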

Lambda

PRE measure
Ranges from 0 to 1
Potential problems with Lambda:

• Underestimates the relationship when variables (one or both) are highly skewed

• Always 0 when the modal category of Y is the same across all categories of X

Chi-Square (χ²)

Also appropriate for any crosstabulation with at least one nominal variable (and another nominal/ordinal variable)

Based on the difference between the empirically observed crosstab and what we would expect to observe if the two variables are statistically independent

Background for χ²

Statistical Independence – A property of two variables in which the probability that an observation is in a particular category of one variable and also in a particular category of the other variable equals the product of the marginal probabilities of being in those categories.

Plays a large role in data analysis

Is another way to view the strength of a relationship

Example

Suppose we have two nominal or categorical variables, X and Y. We label the categories of the first variable (a,b,c) and those of the second (r,s,t).

Let P(X = a) stand for the probability that a randomly selected case has property a on variable X and P(Y = r) stand for the probability that a randomly selected case has property r on variable Y.

These two probabilities are called marginal probabilities and simply refer to the chance that an observation has a particular value on a particular variable irrespective of its value on another variable.

Finally, let us assume that P(X = a, Y = r) stands for the joint probability that a randomly selected observation has both property a and property r simultaneously.

Statistical Independence – The two variables are therefore statistically independent only if the chance of observing a combination of categories is equal to the marginal probability of one category times the marginal probability of the other.

P(X = a, Y = r) = [P(X = a)] [P(Y = r)]

For example, if men are as likely to vote as women, then the two variables (gender and voter turnout) are statistically independent because the probability of observing a male nonvoter in the sample is equal to the probability of observing a male times the probability of observing a nonvoter.

Example

If 100/300 are men & 210/300 voted, then the marginal probabilities are:

P(X=m) = 100/300 = .33 and P(Y=v) = 210/300 = .7

.33 x .7 = .23 is the joint probability we would expect if the variables were independent.

If we know that 70 of the voters are male and divide that count by the total sample size (70/300), we also get .23.

We can therefore say that the two variables are independent.
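The independence check in this example can be sketched in a few lines. The variable names are illustrative assumptions; exact fractions are used only so that the comparison is not affected by rounding .33 and .23.

```python
from fractions import Fraction

n = 300            # sample size from the example
men = 100          # number of men in the sample
voters = 210       # number of people who voted
male_voters = 70   # number of men who voted

p_male = Fraction(men, n)                  # marginal probability P(X = m)
p_voted = Fraction(voters, n)              # marginal probability P(Y = v)
expected_joint = p_male * p_voted          # joint probability implied by independence
observed_joint = Fraction(male_voters, n)  # joint probability actually observed

independent = expected_joint == observed_joint
print(float(expected_joint), float(observed_joint), independent)
```

In real samples the two quantities will rarely match exactly, which is why a test statistic is needed to judge how large a discrepancy is too large.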

The chi-squared statistic essentially compares an observed result (the table produced by the sample) with a hypothetical table that would occur if (in the population) the variables were statistically independent.

A value of 0 implies statistical independence, which means no association.

Chi-squared increases as the departures of the observed values from the expected values grow. There is no upper limit to how big the difference can become, but if it is past a critical value then there is reason to reject the null hypothesis that the two variables are independent.

How do we Calculate Chi-Square?

The observed frequencies are already in the crosstab.

The expected frequency in each table cell is found by multiplying that cell's row and column marginal totals and dividing by the sample size.

Chi-Square (χ²): observed frequencies

         NE    MW    SO    WE    TOTAL
REPUB     4    10     6    10       30
IND      12     8    16     4       40
DEM      14     2     8     6       30
TOTAL    30    20    30    20      100

Calculating Expected Frequencies

To calculate the expected cell frequency for NE Republicans:

• E/30 = 30/100, therefore E = (30*30)/100 = 9

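As a rough sketch of the rule "row total times column total, divided by sample size", the expected frequencies for every cell of the region and partisanship table can be computed as follows. The list-of-lists layout and the function name `expected_frequencies` are assumptions for illustration.

```python
# Observed counts; columns are NE, MW, SO, WE.
observed = [
    [4, 10, 6, 10],   # REPUB
    [12, 8, 16, 4],   # IND
    [14, 2, 8, 6],    # DEM
]


def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for label, row in zip(["REPUB", "IND", "DEM"], expected):
    print(label, row)
```

The first value printed for REPUB is the 9 derived above for NE Republicans.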

Calculating the Chi-Square Statistic

The chi-square statistic is calculated as:

$$\chi^2 = \sum_{i}\sum_{k} \frac{(O_{ik} - E_{ik})^2}{E_{ik}}$$

where O_ik is the observed frequency and E_ik is the expected frequency in row i, column k.

(25/9)+(16/6)+(9/9)+(16/6)+(0)+(0)+(16/12)+(16/8)+(25/9)+(16/6)+(1/9)+(0) = 18

Each cell shows the observed frequency (O), followed in parentheses by the expected frequency (E).

         NE         MW        SO         WE        TOTAL
REPUB    4 (9)      10 (6)    6 (9)      10 (6)    30
IND      12 (12)    8 (8)     16 (12)    4 (8)     40
DEM      14 (9)     2 (6)     8 (9)      6 (6)     30
TOTAL    30         20        30         20        100

The value 9 is the expected frequency in the first cell of the table and is what we would expect in a sample of 100 (with 30 Republicans and 30 northeasterners) if there is statistical independence in the population.

This is more than the 4 we observe in our sample, so there is a difference.
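The cell-by-cell sum can be verified with a short sketch. The function name `chi_square` and the data layout are illustrative assumptions; the observed counts are those of the region and partisanship table.

```python
# Observed counts; columns are NE, MW, SO, WE.
observed = [
    [4, 10, 6, 10],   # REPUB
    [12, 8, 16, 4],   # IND
    [14, 2, 8, 6],    # DEM
]


def chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for k, obs in enumerate(row):
            exp = row_totals[i] * col_totals[k] / n  # expected count under independence
            total += (obs - exp) ** 2 / exp
    return total


stat = chi_square(observed)
print(round(stat, 2))  # 18.0
```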

Just Like the Hypothesis Test

Null: Statistical independence between X and Y

Alt: X and Y are not independent.

Interpreting the Chi-Square Statistic

The Chi-Square statistic ranges from 0 to infinity
0 = perfect statistical independence
Even though two variables may be statistically independent in the population, in a sample the Chi-Square statistic may be > 0

Therefore it is necessary to determine statistical significance for a Chi-Square statistic (given a certain level of confidence)
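To illustrate how significance might be assessed for the region and partisanship example, the sketch below computes an upper-tail probability for the chi-square value of 18. It relies on two things not stated in the slides: the standard degrees-of-freedom rule (R - 1)(C - 1) for a crosstab, and a closed-form tail formula that holds only when the degrees of freedom are even. The helper name `chi2_sf_even_df` is hypothetical; in practice a statistics package would normally supply this probability.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi-square >= x); closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2
    # For df = 2m, the tail equals exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
    terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * terms


chi_square_stat = 18.0        # value computed for the region-by-partisanship table
n_rows, n_cols = 3, 4         # three parties, four regions
df = (n_rows - 1) * (n_cols - 1)

p_value = chi2_sf_even_df(chi_square_stat, df)
print(df, round(p_value, 4))
```

With 6 degrees of freedom the probability is well below .05, so at that conventional level the null hypothesis of independence would be rejected for this table.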

Cramer's V

Problem with Chi-Square: not comparable across different sample sizes (and their associated crosstabs)

Cramer's V is a standardization of the Chi-Square statistic

Calculating Cramer's V

$$V = \sqrt{\frac{\chi^2}{N \cdot \min(R-1,\; C-1)}}$$

Where R = #rows and C = #columns

• V ranges from 0 to 1

Example (region and partisanship)

$$V = \sqrt{\frac{18}{100\,(3-1)}} = \sqrt{.09} = .30$$
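The standardization can be sketched as a one-line function. The name `cramers_v` and its parameter names are illustrative assumptions; the inputs are the chi-square value, sample size, and table dimensions from the region and partisanship example.

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V: chi-square scaled by N times the smaller of (rows - 1, cols - 1)."""
    return math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))


v = cramers_v(chi_square=18.0, n=100, n_rows=3, n_cols=4)
print(round(v, 2))  # 0.3
```

Because the denominator grows with the sample size, V stays comparable across tables where the raw chi-square value would not.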

Relationships between Ordinal Variables

There are several measures of association appropriate for relationships between ordinal variables

Gamma, Tau-b, Tau-c, Somers' d

All are based on identifying concordant, discordant, and tied pairs of observations

Concordant Pairs: Ideology and Voting

Ideology - conserv (1), moderate (2), liberal (3)
Voting - never (1), sometimes (2), often (3)

Consider two hypothetical individuals in the sample with scores

• Individual A: Ideology=1, Voting=1
• Individual B: Ideology=2, Voting=2

• Pair A&B is considered a concordant pair because B's ideology score is greater than A's score, and B's voting score is greater than A's score

Concordant Pairs (cont'd)

All of the following are concordant pairs

A(1,1) B(2,2)
A(1,1) B(2,3)
A(1,1) B(3,2)
A(1,2) B(2,3)
A(2,2) B(3,3)

Concordant pairs are consistent with a positive relationship between the IV and the DV (ideology and voting)

Discordant Pairs

All of the following are discordant pairs

A(1,2) B(2,1)
A(1,3) B(2,2)
A(2,2) B(3,1)
A(1,2) B(3,1)
A(3,1) B(1,2)

Discordant pairs are consistent with a negative relationship between the IV and the DV (ideology and voting)
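The definitions of concordant, discordant, and tied pairs can be expressed as a small sketch. The function name `classify_pair` is an assumption for illustration; each individual is represented as an (ideology, voting) tuple, matching the A(·,·) B(·,·) notation used in the lists.

```python
def classify_pair(a, b):
    """Classify two (ideology, voting) score tuples as concordant, discordant, or tied."""
    ideology_diff = b[0] - a[0]
    voting_diff = b[1] - a[1]
    product = ideology_diff * voting_diff
    if product > 0:
        return "concordant"   # both scores move in the same direction
    if product < 0:
        return "discordant"   # the scores move in opposite directions
    return "tied"             # the pair shares a value on at least one variable


print(classify_pair((1, 1), (2, 2)))  # concordant
print(classify_pair((3, 1), (1, 2)))  # discordant
print(classify_pair((1, 2), (1, 3)))  # tied
```

Using the sign of the product means the order in which the two individuals are listed does not matter.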

Identifying Concordant Pairs

Concordant Pairs for Never - Conserv (1,1)
#Concordant = 80*70 + 80*10 + 80*20 + 80*80 = 14,400

                 Conservative (1)    Moderate (2)    Liberal (3)
Never (1)                      80              10             10
Sometimes (2)                  20              70             10
Often (3)                       0              20             80

Identifying Concordant Pairs

Concordant Pairs for Never - Moderate (2,1)
#Concordant = 10*10 + 10*80 = 900


Identifying Discordant Pairs

Discordant Pairs for Often - Conserv (1,3)
#Discordant = 0*10 + 0*10 + 0*70 + 0*10 = 0


Identifying Discordant Pairs

Discordant Pairs for Often - Moderate (2,3)
#Discordant = 20*10 + 20*10 = 400


Gamma

Gamma is calculated by identifying all possible pairs of individuals in the sample and determining if they are concordant or discordant

Gamma = (#C - #D) / (#C + #D)

Interpreting Gamma

Gamma = 21400/24400 = .88
Gamma ranges from -1 to +1
Gamma does not account for tied pairs

Tau (b and c) and Somers' d account for tied pairs in different ways

Square tables: Tau-b

Non-square tables: Tau-c
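The Gamma of .88 can be reproduced from the ideology and voting table with a sketch along these lines. The function name `count_pairs` and the list-of-lists layout are assumptions for illustration. To count each pair only once, every cell is paired with cells in lower rows: those to the right contribute concordant pairs and those to the left contribute discordant pairs.

```python
# Rows: Never, Sometimes, Often. Columns: Conservative, Moderate, Liberal.
table = [
    [80, 10, 10],
    [20, 70, 10],
    [0, 20, 80],
]


def count_pairs(t):
    """Return (concordant, discordant) pair counts for an ordinal-by-ordinal table."""
    n_rows, n_cols = len(t), len(t[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):        # only rows below the current cell
                for m in range(n_cols):
                    if m > j:                     # below and to the right
                        concordant += t[i][j] * t[k][m]
                    elif m < j:                   # below and to the left
                        discordant += t[i][j] * t[k][m]
    return concordant, discordant


c, d = count_pairs(table)
gamma = (c - d) / (c + d)
print(c, d, round(gamma, 2))  # 22900 1500 0.88
```

Pairs in the same row or the same column are tied and are skipped, which is exactly the sense in which Gamma ignores ties.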

Example

NES 2004 – What explains variation in one's political ideology?

Income? Education? Religion? Race?

Bivariate Relationships and Hypothesis Testing (Significance Testing)

1. Determine the null and alternative hypotheses

• Null: There is no relationship between X and Y (X and Y are statistically independent and test statistic = 0).

• Alternative: There IS a relationship between X and Y (test statistic does not equal 0).

Bivariate Relationships and Bivariate Relationships and Hypothesis TestingHypothesis Testing

2. Determine Appropriate Test Statistic 2. Determine Appropriate Test Statistic (based on measurement levels of X and Y)(based on measurement levels of X and Y)

3. Identify the type of sampling distribution 3. Identify the type of sampling distribution for test statistic, for test statistic, and what it would look like and what it would look like if the null hypothesis were trueif the null hypothesis were true..


4. Calculate the test statistic from the sample 4. Calculate the test statistic from the sample data and determine the probability of observing data and determine the probability of observing a test statistic this large (in absolute terms) a test statistic this large (in absolute terms) if the if the null hypothesis is truenull hypothesis is true. .

P-value (significance level)P-value (significance level) – probability of – probability of observing a test statistic at least as large as our observing a test statistic at least as large as our observed test statistic, if in fact the null observed test statistic, if in fact the null hypothesis is truehypothesis is true


5. Choose an “alpha level”: a decision rule to guide us in determining which values of the p-value lead us to reject/not reject the null hypothesis

• When the p-value is extremely small, we reject the null hypothesis (why?). The relationship is deemed “statistically significant.”

• When the p-value is not small, we do not reject the null hypothesis (why?). The relationship is deemed “statistically insignificant.”

• Most common alpha level: .05

Bottom Line

Assuming we will always use an alpha level of .05:

• Reject the null hypothesis if P-value < .05

• Do not reject the null hypothesis if P-value > .05
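
The decision rule above can be sketched in a few lines of Python. This is only an illustration: the function name `decide` is hypothetical, and because the slide does not say what to do when the p-value equals alpha exactly, the sketch assumes that case counts as "do not reject."

```python
def decide(p_value, alpha=0.05):
    """Apply the alpha-level decision rule to a p-value.

    Assumption (not stated in the slides): a p-value exactly equal to
    alpha is treated as "do not reject".
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"


print(decide(0.003))  # small p-value
print(decide(0.27))   # p-value that is not small
```

Fixing alpha before looking at the data keeps the rule from being adjusted after the fact to suit the result.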

An Example

Dependent variable: Vote Choice in 2000 (Gore, Bush, Nader)

Independent variable: Ideology (liberal, moderate, conservative)


1. Determine the null and alternative hypotheses.

Null Hypothesis: There is no relationship between ideology and vote choice in 2000.

Alternative (Research) Hypothesis: There is a relationship between ideology and vote choice (liberals were more likely to vote for Gore, while conservatives were more likely to vote for Bush).


2. Determine the appropriate test statistic (based on the measurement levels of X and Y)

3. Identify the type of sampling distribution for the test statistic, and what it would look like if the null hypothesis were true.

Sampling Distributions for the Chi-Squared Statistic (under the assumption of perfect independence)

df = (rows - 1)(columns - 1)
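
As a rough illustration of the degrees-of-freedom formula and of what the chi-squared sampling distribution implies, the sketch below computes df for a table and the upper-tail probability of a chi-squared value. The function names are hypothetical, and the tail formula is a closed form that holds only for even df (which covers the 3 x 3 ideology by vote table, where df = 4); a statistics library would normally be used for the general case.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df = (rows - 1)(columns - 1) for a crosstab."""
    return (n_rows - 1) * (n_cols - 1)


def chi2_upper_tail_even_df(x, df):
    """P(chi-squared with `df` degrees of freedom >= x).

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!,
    which is valid only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    if x < 0:
        raise ValueError("a chi-squared statistic cannot be negative")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k      # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


df = degrees_of_freedom(3, 3)              # 3 ideology rows, 3 vote columns
print(df)
print(chi2_upper_tail_even_df(9.488, df))  # close to .05 for df = 4
```

The value 9.488 is the familiar .05 critical value for df = 4, so a statistic larger than that would have a p-value below .05.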


4. Calculate the test statistic from the sample data and determine the probability of observing a test statistic this large (in absolute terms) if the null hypothesis is true.

P-value (observed significance level): the probability of observing a test statistic at least as large as our observed test statistic, if in fact the null hypothesis is true
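
A minimal sketch of step 4 for a crosstab: the chi-squared statistic compares each observed cell count with the count expected under independence (row total x column total / N). The function name is hypothetical and the counts below are invented purely for illustration; they are NOT the actual 2000 survey results. The sketch also assumes every row and column total is positive.

```python
def chi_squared_statistic(observed):
    """Pearson chi-squared statistic for a table of observed counts.

    `observed` is a list of rows; every row and column total is
    assumed to be positive so that expected counts are nonzero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if ideology and vote choice were independent
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (count - expected) ** 2 / expected
    return chi2


# HYPOTHETICAL counts, made up for illustration only.
# Rows: liberal, moderate, conservative. Columns: Gore, Bush, Nader.
hypothetical_table = [
    [80, 15, 5],
    [50, 45, 5],
    [15, 80, 5],
]

print(chi_squared_statistic(hypothetical_table))
```

When the rows of a table have identical proportions, every observed count equals its expected count and the statistic is 0, which is what the null hypothesis of independence predicts.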


5. Choose an “alpha level”: a decision rule to guide us in determining which values of the p-value lead us to reject/not reject the null hypothesis

• When the p-value is extremely small, we reject the null hypothesis (why?). The relationship is deemed “statistically significant.”

• When the p-value is not small, we do not reject the null hypothesis (why?). The relationship is deemed “statistically insignificant.”

• Most common alpha level: .05

In-Class Exercise

For some years now, political commentators have cited the importance of a “gender gap” in explaining election outcomes. What is the source of the gender gap?

Develop a simple theory and corresponding hypothesis (where gender is the independent variable) which seeks to explain the source of the gender gap.

Specifically, determine:

• Theory

• Null and research hypothesis

• Test statistic for a cross-tabulation to test your hypothesis
