Crosstabulation and Measures of Association
derek-levine
Investigating the relationship between two variables
Generally, a statistical relationship exists if the values of the observations for one variable are associated with the values of the observations for another variable.
Knowing that two variables are related allows us to make predictions.
If we know the value of one, we can better predict the value of the other.
Determining how the values of one variable are related to the values of another is one of the foundations of empirical science.
In making such determinations we must consider the following features of the relationship.
1.) The level of measurement of the variables. Different levels of measurement necessitate different procedures.
2.) The form of the relationship. We can ask if changes in X move in lockstep with changes in Y or if a more sophisticated relationship exists.
3.) The strength of the relationship. Is it possible that some levels of X will always be associated with certain levels of Y?
4.) Numerical summaries of the relationship. Social scientists strive to boil down the different aspects of a relationship to a single number that reveals the type and strength of the association.
5.) Conditional relationships. The variables X and Y may seem to be related in some fashion, but appearances can be deceiving; the relationship may be spurious, for example. So we need to know if the introduction of any other variables into the analysis changes the relationship.
Types of Association
1.) General Association – the variables are simply associated in some way.
2.) Positive Monotonic Correlation – when the variables have order (ordinal or continuous), high values of one variable are associated with high values of the other, and low values with low values.
3.) Negative Monotonic Correlation – low values of one variable are associated with high values of the other.
Types of Association Cont.
4.) Positive Linear Association – a particular type of positive monotonic relationship where the plotted values of X-Y fall on a straight line that slopes upward.
5.) Negative Linear Relationship – the plotted values of X-Y fall on a straight line that slopes downward.
Strength of Relationships
Virtually no relationships between variables in social science (and largely in natural science as well) have a perfect form.
As a result it makes sense to talk about the strength of relationships.
Strength Cont.
The strength of a relationship between variables can be found by simply looking at a graph of the data.
If the values of X and Y are tied together tightly, then the relationship is strong.
If the X-Y points are spread out, then the relationship is weak.
Direction of Relationship
We can also infer direction from a graph by simply observing how the values for our variables move across the graph.
This is only true, however, when our variables are ordinal or continuous.
Types of Bivariate Relationships and Associated Statistics
Nominal/Ordinal (including dichotomous): Crosstabulation (Lambda, Chi-Square, Gamma, etc.)
Interval and Dichotomous: Difference of means test
Interval and Nominal/Ordinal: Analysis of Variance
Interval and Ratio: Regression and correlation
Assessing Relationships between Variables
1. Calculate the appropriate statistic to measure the magnitude of the relationship in the sample.
2. Calculate additional statistics to determine if the relationship holds for the population of interest (statistical significance).
Substantive significance vs. statistical significance
What is a Crosstabulation?
Crosstabulations are appropriate for examining relationships between variables that are nominal, ordinal, or dichotomous.
Crosstabs show values for one variable categorized by another variable.
They display the joint distribution of values of the variables by listing the categories for one across the columns and the categories for the other down the rows.
Each case is then placed in the cell of the table that represents the combination of values that corresponds to its scores on the variables.
Example: We would like to know if presidential vote choice in 2000 was related to race.
Vote choice = Gore or Bush
Race = White, Hispanic, Black
Are Race and Vote Choice Related? Why?
         Black   Hispanic   White   TOTAL
Gore       106         23     427     556
Bush         8         15     484     507
TOTAL      114         38     911    1063
Are Race and Vote Choice Related? Why?
Cell entries are counts, with column percentages in parentheses.
         Black        Hispanic     White         TOTAL
Gore     106 (93%)    23 (60.5%)   427 (46.9%)   556 (52.3%)
Bush       8 (7%)     15 (39.5%)   484 (53.1%)   507 (47.7%)
TOTAL    114 (100%)   38 (100%)    911 (100%)    1063 (100%)
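The percentages in the table are column percentages: each count divided by the total for its racial group. A minimal sketch of that calculation follows, using the counts from the table; the dictionary layout and the function name `column_percentages` are illustrative choices, not part of the original slides.

```python
# Observed counts from the table: rows are vote choice, columns are race.
counts = {
    "Gore": {"Black": 106, "Hispanic": 23, "White": 427},
    "Bush": {"Black": 8, "Hispanic": 15, "White": 484},
}

def column_percentages(table):
    """Return every cell as a percentage of its column total."""
    columns = list(next(iter(table.values())))
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        r: {c: 100.0 * row[c] / col_totals[c] for c in columns}
        for r, row in table.items()
    }

pct = column_percentages(counts)
for candidate, row in pct.items():
    print(candidate, {race: round(p, 1) for race, p in row.items()})
```

Comparing percentages within columns (rather than raw counts) is what makes the groups comparable despite their very different sizes.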
Measures of Association for Crosstabulations
Purpose – to determine if nominal/ordinal variables are related in a crosstabulation
At least one nominal variable: Lambda, Chi-Square, Cramer’s V
Two ordinal variables: Tau, Gamma
These measures of association provide us with coefficients that summarize data from a table into one number.
This is extremely useful when dealing with several tables or very complex tables.
These coefficients measure the strength of an association; the ordinal coefficients also measure its direction.
Coefficients for Nominal Data
When one or both of the variables are nominal, ordinal coefficients cannot be used because there is no underlying ordering.
Instead we use PRE measures.
Lambda (PRE coefficient)
PRE – Proportional Reduction in Error
Two Rules
1.) Make a prediction of the value of an observation in the absence of any prior information.
2.) Given information on a second variable, take it into account in making the prediction.
Lambda PRE
If the two variables are associated, then the use of rule two should lead to fewer errors in your predictions than rule one.
How many fewer errors depends upon how closely the variables are associated.
PRE = (E1 – E2) / E1, where E1 is the number of errors made under rule one and E2 is the number made under rule two.
Scale goes from 0 to 1.
Lambda
Lambda is a PRE coefficient and it relies on rules 1 & 2 above.
When applying rule one, all we have to go on is what proportion of the population fits into one category as opposed to another.
So, without any other information, guessing that every observation is in the modal category would give you the best chance of getting the most correct.
Why?
Think of it like this. If you knew that I tended to make exams where the most often used answer was B, then, without any other information, you would be best served to pick B every time.
But if you know each case’s value on another variable, rule two directs you to look only at the cases in that category of the other variable and to guess the modal category of the dependent variable within that group.
Example
Suppose we have a sample of 100 voters and need to predict how they voted in the general election.
Assume we know that overall 30% voted Democrat, 30% voted Republican, and 40% voted independent.
Now suppose we take one person out of the group (John Smith); our best guess would be that he voted independent.
Now suppose we take another person (Larry Mendez); again we would guess that he voted independent.
As a result our best guess is to predict that all of the voters (all 100) were independent.
We are sure to get some wrong but it’s the best we can do over the long run.
How many do we get wrong? 60.
Suppose now that we know something about the voters’ regions (where they are from) and we know how each region voted in the election.
Number of voters by region: NE – 30, MW – 20, SO – 30, WE – 20
Lambda
         NE   MW   SO   WE   TOTAL
REPUB     4   10    6   10      30
IND      12    8   16    4      40
DEM      14    2    8    6      30
TOTAL    30   20   30   20     100
Lambda – Rule 1 (prediction based solely on knowledge of marginal distribution of dependent variable – partisanship)
Cell entries are observed counts, with the Rule 1 prediction in brackets.
         NE        MW        SO        WE        TOTAL
REPUB     4 [0]    10 [0]     6 [0]    10 [0]      30
IND      12 [30]    8 [20]   16 [30]    4 [20]     40
DEM      14 [0]     2 [0]     8 [0]     6 [0]      30
TOTAL    30        20        30        20         100
Lambda – Rule 2 (prediction based on knowledge provided by independent variable)
Cell entries are observed counts, with the Rule 1 and Rule 2 predictions in brackets.
         NE           MW           SO            WE           TOTAL
REPUB     4 [0, 0]    10 [0, 20]    6 [0, 0]     10 [0, 20]     30
IND      12 [30, 0]    8 [20, 0]   16 [30, 30]    4 [20, 0]     40
DEM      14 [0, 30]    2 [0, 0]     8 [0, 0]      6 [0, 0]      30
TOTAL    30           20           30            20            100
Lambda – Calculation of Errors
Errors w/Rule 1: 18 + 12 + 14 + 16 = 60
Errors w/Rule 2: 16 + 10 + 14 + 10 = 50
Lambda = (Errors R1 – Errors R2) / Errors R1
Lambda = (60 – 50) / 60 = 10/60 = .17
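The error counts for the two prediction rules can be reproduced in a few lines. This is a minimal sketch, assuming the region-by-partisanship counts from the example; the function name `lambda_pre` and the nested-dictionary layout are illustrative and not from the slides.

```python
# Observed counts: outer keys are categories of X (region),
# inner keys are categories of Y (partisanship).
table = {
    "NE": {"REPUB": 4, "IND": 12, "DEM": 14},
    "MW": {"REPUB": 10, "IND": 8, "DEM": 2},
    "SO": {"REPUB": 6, "IND": 16, "DEM": 8},
    "WE": {"REPUB": 10, "IND": 4, "DEM": 6},
}

def lambda_pre(table):
    """Return (errors under rule 1, errors under rule 2, lambda)."""
    n = sum(sum(col.values()) for col in table.values())
    # Rule 1: always guess the overall modal category of Y.
    y_totals = {}
    for col in table.values():
        for y, count in col.items():
            y_totals[y] = y_totals.get(y, 0) + count
    errors_rule1 = n - max(y_totals.values())
    # Rule 2: within each category of X, guess that category's modal Y.
    errors_rule2 = sum(sum(col.values()) - max(col.values())
                       for col in table.values())
    return errors_rule1, errors_rule2, (errors_rule1 - errors_rule2) / errors_rule1

e1, e2, lam = lambda_pre(table)
print(e1, e2, round(lam, 2))  # 60 50 0.17
```

The sketch does not guard against the degenerate case where rule 1 makes no errors (every case in one category of Y), in which the ratio is undefined.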
Lambda
PRE measure
Ranges from 0 to 1
Potential problems with Lambda
Underestimates the relationship when variables (one or both) are highly skewed
Always 0 when the modal category of Y is the same across all categories of X
Chi-Square (χ²)
Also appropriate for any crosstabulation with at least one nominal variable (and another nominal/ordinal variable)
Based on the difference between the empirically observed crosstab and what we would expect to observe if the two variables were statistically independent
Background for χ²
Statistical Independence – A property of two variables in which the probability that an observation is in a particular category of one variable and also in a particular category of the other variable equals the product of the simple (marginal) probabilities of being in those categories.
Plays a large role in data analysis
Is another way to view the strength of a relationship
Example
Suppose we have two nominal or categorical variables, X and Y. We label the categories of the first variable (a, b, c) and those of the second (r, s, t).
Let P(X = a) stand for the probability that a randomly selected case has property a on variable X and P(Y = r) stand for the probability that a randomly selected case has property r on variable Y.
These two probabilities are called marginal probabilities and simply refer to the chance that an observation has a particular value on a particular variable irrespective of its value on another variable.
Finally, let P(X = a, Y = r) stand for the joint probability that a randomly selected observation has both property a and property r simultaneously.
Statistical Independence – The two variables are statistically independent only if the chance of observing each combination of categories is equal to the marginal probability of one category times the marginal probability of the other.
P(X = a, Y = r) = [P(X = a)] [P(Y = r)]
For example, if men are as likely to vote as women, then the two variables (gender and voter turnout) are statistically independent because the probability of observing a male nonvoter in the sample is equal to the probability of observing a male times the probability of observing a nonvoter.
Example
If 100/300 are men & 210/300 voted, then the marginal probabilities are:
P(X=m) = 100/300 = .33 and P(Y=v) = 210/300 = .7
.33 x .7 = .23, which is the joint probability we would expect if the variables were independent.
If we know that 70 of the 300 people are men who voted, the observed joint probability (70/300) is also .23.
We can therefore say that the two variables are independent.
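The comparison in this example can be checked with exact fractions, which avoids the rounding in .33 and .23. This is a small sketch using the numbers given above; the variable names are illustrative.

```python
from fractions import Fraction

n = 300            # sample size
men = 100          # number of men
voters = 210       # number of voters
male_voters = 70   # number of men who voted

p_male = Fraction(men, n)                # marginal probability P(X = m)
p_voter = Fraction(voters, n)            # marginal probability P(Y = v)
p_joint_expected = p_male * p_voter      # joint probability under independence
p_joint_observed = Fraction(male_voters, n)

independent = p_joint_observed == p_joint_expected
print(p_joint_expected, p_joint_observed, independent)  # 7/30 7/30 True
```

In real samples the two quantities rarely match exactly even when the variables are independent in the population, which is why a significance test is needed.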
The chi-squared statistic essentially compares an observed result (the table produced by the sample) with a hypothetical table that would occur if (in the population) the variables were statistically independent.
A value of 0 implies statistical independence, which means no association.
Chi-squared increases as the departures of observed from expected values grow. There is no upper limit to how big the difference can become, but if it is past a critical value then there is reason to reject the null hypothesis that the two variables are independent.
How do we Calculate χ²?
The observed frequencies are already in the crosstab.
The expected frequency in each table cell is found by multiplying the row and column marginal totals for that cell and dividing by the sample size.
Chi-Square (χ²)
Observed frequencies (O); the expected frequencies (E) are still to be calculated.
         NE   MW   SO   WE   TOTAL
REPUB     4   10    6   10      30
IND      12    8   16    4      40
DEM      14    2    8    6      30
TOTAL    30   20   30   20     100
Calculating Expected Frequencies
To calculate the expected cell frequency for NE Republicans:
E/30 = 30/100, therefore E = (30*30)/100 = 9
Calculating the Chi-Square Statistic
The chi-square statistic is calculated as:
χ² = Σ (Obs. Frequency_ik – Exp. Frequency_ik)² / Exp. Frequency_ik, summed over all cells (row i, column k)
(25/9)+(16/6)+(9/9)+(16/6)+(0)+(0)+(16/12)+(16/8)+(25/9)+(16/6)+(1/9)+(0) = 18
Cell entries are observed / expected frequencies (O / E).
         NE        MW       SO        WE       TOTAL
REPUB     4 / 9    10 / 6    6 / 9    10 / 6     30
IND      12 / 12    8 / 8   16 / 12    4 / 8     40
DEM      14 / 9     2 / 6    8 / 9     6 / 6     30
TOTAL    30        20       30        20        100
The value 9 is the expected frequency in the first cell of the table and is what we would expect in a sample of 100 (with 30 Republicans and 30 Northeasterners) if there is statistical independence in the population.
This is more than the 4 we observe in our sample, so there is a difference.
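The expected frequencies and the chi-square statistic for the region and partisanship table can be computed directly from the observed counts. The following is a minimal sketch in plain Python; the function name `chi_square` and the list-of-rows layout are illustrative assumptions.

```python
# Observed counts: rows are REPUB, IND, DEM; columns are NE, MW, SO, WE.
observed = [
    [4, 10, 6, 10],
    [12, 8, 16, 4],
    [14, 2, 8, 6],
]

def chi_square(observed):
    """Return (expected frequencies, chi-square statistic) for a crosstab."""
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    # Expected count = row total * column total / sample size.
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    return expected, statistic

expected, chi2 = chi_square(observed)
print(expected[0])      # [9.0, 6.0, 9.0, 6.0]
print(round(chi2, 2))   # 18.0
```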
Just Like the Hyp. Test
Null: Statistical independence between X and Y
Alt: X and Y are not independent.
Interpreting the Chi-Square Statistic
The Chi-Square statistic ranges from 0 to infinity
0 = perfect statistical independence
Even though two variables may be statistically independent in the population, in a sample the Chi-Square statistic may be > 0
Therefore it is necessary to determine statistical significance for a Chi-Square statistic (given a certain level of confidence)
Cramer’s V
Problem with Chi-Square: not comparable across different sample sizes (and their associated crosstabs)
Cramer’s V is a standardization of the Chi-Square statistic
Calculating Cramer’s V
V = √( Chi-Squared / (N × Min(R – 1, C – 1)) )
Where N = sample size, R = #rows and C = #columns
V ranges from 0 to 1
Example (region and partisanship)
V = √( 18 / (100 × (3 – 1)) ) = √.09 = .30
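The standardization is a one-line calculation once the chi-square statistic is known. This sketch plugs in the values from the region and partisanship example (chi-square of 18, N of 100, 3 rows, 4 columns); the function name `cramers_v` is an illustrative choice.

```python
import math

def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V: chi-square rescaled by sample size and table dimensions."""
    return math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))

v = cramers_v(chi_square=18, n=100, n_rows=3, n_cols=4)
print(round(v, 2))  # 0.3
```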
Relationships between Ordinal Variables
There are several measures of association appropriate for relationships between ordinal variables
Gamma, Tau-b, Tau-c, Somers’ d
All are based on identifying concordant, discordant, and tied pairs of observations
Concordant Pairs: Ideology and Voting
Ideology - conserv (1), moderate (2), liberal (3)
Voting - never (1), sometimes (2), often (3)
Consider two hypothetical individuals in the sample with scores
Individual A: Ideology=1, Voting=1
Individual B: Ideology=2, Voting=2
Pair A&B are considered a concordant pair because B’s ideology score is greater than A’s score, and B’s voting score is greater than A’s score
Concordant Pairs (cont’d)
All of the following are concordant pairs
A(1,1) B(2,2)
A(1,1) B(2,3)
A(1,1) B(3,2)
A(1,2) B(2,3)
A(2,2) B(3,3)
Concordant pairs are consistent with a positive relationship between the IV and the DV (ideology and voting)
Discordant Pairs
All of the following are discordant pairs
A(1,2) B(2,1)
A(1,3) B(2,2)
A(2,2) B(3,1)
A(1,2) B(3,1)
A(3,1) B(1,2)
Discordant pairs are consistent with a negative relationship between the IV and the DV (ideology and voting)
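The rule for labeling a pair can be written as a short function: compare the two individuals on each variable and check whether the differences have the same sign. This is a sketch only; the function name `classify_pair` and the (ideology, voting) tuple layout are illustrative assumptions, and a pair with equal scores on either variable is treated as tied.

```python
def classify_pair(a, b):
    """Classify two (ideology, voting) score pairs."""
    dx = b[0] - a[0]   # difference on the first variable (ideology)
    dy = b[1] - a[1]   # difference on the second variable (voting)
    if dx == 0 or dy == 0:
        return "tied"
    # Same sign on both variables means the pair is ordered the same way.
    return "concordant" if dx * dy > 0 else "discordant"

print(classify_pair((1, 1), (2, 2)))  # concordant
print(classify_pair((1, 2), (2, 1)))  # discordant
print(classify_pair((1, 1), (1, 2)))  # tied
```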
Identifying Concordant Pairs
Concordant Pairs for Never - Conserv (1,1)
#Concordant = 80*70 + 80*10 + 80*20 + 80*80 = 14,400
                Conservative (1)   Moderate (2)   Liberal (3)
Never (1)                     80             10            10
Sometimes (2)                 20             70            10
Often (3)                      0             20            80
Identifying Concordant Pairs
Concordant Pairs for Never - Moderate (2,1)
#Concordant = 10*10 + 10*80 = 900
Identifying Discordant Pairs
Discordant Pairs for Often - Conserv (1,3)
#Discordant = 0*10 + 0*10 + 0*70 + 0*10 = 0
Identifying Discordant Pairs
Discordant Pairs for Often - Moderate (2,3)
#Discordant = 20*10 + 20*10 = 400
Gamma
Gamma is calculated by identifying all possible pairs of individuals in the sample and determining if they are concordant or discordant
Gamma = (#C - #D) / (#C + #D)
Interpreting Gamma
Gamma = (22,900 - 1,500) / (22,900 + 1,500) = 21400/24400 = .88
Gamma ranges from -1 to +1
Gamma does not account for tied pairs
Tau (b and c) and Somers’ d account for tied pairs in different ways
Square tables: Tau-b
Non-square tables: Tau-c
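The pair counts behind the Gamma value can be tallied cell by cell from the ideology and voting table. In this sketch each cell is multiplied by every cell in a lower row: cells further right form concordant pairs and cells further left form discordant pairs. The function name `count_pairs` and the list layout are illustrative, not from the slides.

```python
# Rows are voting (never, sometimes, often);
# columns are ideology (conservative, moderate, liberal).
table = [
    [80, 10, 10],
    [20, 70, 10],
    [0, 20, 80],
]

def count_pairs(table):
    """Return (number of concordant pairs, number of discordant pairs)."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):   # only rows below cell (i, j)
                for m in range(n_cols):
                    if m > j:                # below and to the right
                        concordant += table[i][j] * table[k][m]
                    elif m < j:              # below and to the left
                        discordant += table[i][j] * table[k][m]
    return concordant, discordant

c, d = count_pairs(table)
gamma = (c - d) / (c + d)
print(c, d, round(gamma, 2))  # 22900 1500 0.88
```

Pairs that share a row or a column are tied on one variable and are simply skipped, which matches Gamma ignoring tied pairs.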
Example
NES 2004 – What explains variation in one’s political ideology?
Income?
Education?
Religion?
Race?
Bivariate Relationships and Bivariate Relationships and Hypothesis Testing Hypothesis Testing
(Significance Testing)(Significance Testing) 1. Determine the null and alternative 1. Determine the null and alternative
hypotheseshypotheses
• Null: There is no relationship between X Null: There is no relationship between X and Y (X and Y are statistically and Y (X and Y are statistically independent and independent and test statistictest statistic = 0). = 0).
• Alternative: There IS a relationship Alternative: There IS a relationship between X and Y (between X and Y (test statistictest statistic does not does not equal 0).equal 0).
Bivariate Relationships and Bivariate Relationships and Hypothesis TestingHypothesis Testing
2. Determine Appropriate Test Statistic 2. Determine Appropriate Test Statistic (based on measurement levels of X and Y)(based on measurement levels of X and Y)
3. Identify the type of sampling distribution 3. Identify the type of sampling distribution for test statistic, for test statistic, and what it would look like and what it would look like if the null hypothesis were trueif the null hypothesis were true..
Bivariate Relationships and Bivariate Relationships and Hypothesis TestingHypothesis Testing
4. Calculate the test statistic from the sample 4. Calculate the test statistic from the sample data and determine the probability of observing data and determine the probability of observing a test statistic this large (in absolute terms) a test statistic this large (in absolute terms) if the if the null hypothesis is truenull hypothesis is true. .
P-value (significance level)P-value (significance level) – probability of – probability of observing a test statistic at least as large as our observing a test statistic at least as large as our observed test statistic, if in fact the null observed test statistic, if in fact the null hypothesis is truehypothesis is true
Bivariate Relationships and Hypothesis Testing
5. Choose an “alpha level”: a decision rule to guide us in determining which values of the p-value lead us to reject/not reject the null hypothesis.
• When the p-value is extremely small, we reject the null hypothesis (why?). The relationship is deemed “statistically significant.”
• When the p-value is not small, we do not reject the null hypothesis (why?). The relationship is deemed “statistically insignificant.”
• Most common alpha level: .05
Bottom Line
Assuming we will always use an alpha level of .05:
• Reject the null hypothesis if P-value < .05
• Do not reject the null hypothesis if P-value > .05
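The decision rule can be written as a short function. This is a minimal sketch: the function name `decide` is hypothetical, and because the rule as stated does not cover a p-value of exactly .05, the sketch assumes that case falls on the "do not reject" side.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject the null only when p_value < alpha.

    Assumption: a p-value exactly equal to alpha is treated as
    "do not reject", since the rule as stated leaves that case open.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"

print(decide(0.003))  # p-value below .05
print(decide(0.20))   # p-value above .05
```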
An Example
Dependent variable: Vote Choice in 2000 (Gore, Bush, Nader)
Independent variable: Ideology (liberal, moderate, conservative)
An Example
1. Determine the null and alternative hypotheses.
An Example
Null Hypothesis: There is no relationship between ideology and vote choice in 2000.
Alternative (Research) Hypothesis: There is a relationship between ideology and vote choice (liberals were more likely to vote for Gore, while conservatives were more likely to vote for Bush).
An Example
2. Determine the appropriate test statistic (based on the measurement levels of X and Y).
3. Identify the type of sampling distribution for the test statistic, and what it would look like if the null hypothesis were true.
Sampling Distributions for the Chi-Squared Statistic (under the assumption of perfect independence)
df = (rows - 1)(columns - 1)
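As a rough illustration of the chi-squared statistic and its degrees of freedom, the sketch below computes both for a 3 x 3 crosstab of vote choice by ideology. The counts are hypothetical and are not NES data, and the function name `chi_squared` is an assumption made for this example. It uses the standard Pearson form, summing (observed - expected)^2 / expected over all cells, where each expected count is what independence would predict from the row and column totals.

```python
def chi_squared(observed):
    """Return (statistic, df, expected) for a crosstab given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: (row total * column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df, expected

# HYPOTHETICAL counts for illustration only; these are not NES data.
# Rows: vote choice (Gore, Bush, Nader).
# Columns: ideology (liberal, moderate, conservative).
table = [
    [80, 60, 20],
    [15, 50, 90],
    [10, 6, 4],
]

statistic, df, expected = chi_squared(table)
print(f"chi-squared = {statistic:.2f}, df = {df}")
```

With three rows and three columns, df = (3 - 1)(3 - 1) = 4. A table whose cells match the independence pattern exactly would give a statistic of 0.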
Bivariate Relationships and Hypothesis Testing
4. Calculate the test statistic from the sample data and determine the probability of observing a test statistic this large (in absolute terms) if the null hypothesis is true.
P-value (observed significance level): the probability of observing a test statistic at least as large as our observed test statistic, if in fact the null hypothesis is true.
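For a chi-squared statistic, the p-value is the upper-tail area of the chi-squared distribution with the appropriate degrees of freedom. In practice a statistics package would usually report it. The sketch below instead uses a closed-form expression that holds only when df is even, which covers the 3 x 3 example (df = 4). The function name is hypothetical.

```python
import math

def chi2_upper_tail_even_df(statistic, df):
    """P(chi-squared with df degrees of freedom >= statistic), for EVEN df only.

    For df = 2m the upper tail equals
        exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    This shortcut does not apply when df is odd.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    if statistic < 0:
        raise ValueError("a chi-squared statistic cannot be negative")
    half = statistic / 2.0
    term = 1.0   # the k = 0 term of the sum
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k   # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total

# 9.488 is the conventional .05 critical value for df = 4,
# so the result here is very close to 0.05.
print(chi2_upper_tail_even_df(9.488, 4))
```

Statistics larger than 9.488 give p-values below .05 when df = 4, and smaller statistics give p-values above .05.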
Bivariate Relationships and Hypothesis Testing
5. Choose an “alpha level”: a decision rule to guide us in determining which values of the p-value lead us to reject/not reject the null hypothesis.
• When the p-value is extremely small, we reject the null hypothesis (why?). The relationship is deemed “statistically significant.”
• When the p-value is not small, we do not reject the null hypothesis (why?). The relationship is deemed “statistically insignificant.”
• Most common alpha level: .05
In-Class Exercise
For some years now, political commentators have cited the importance of a “gender gap” in explaining election outcomes. What is the source of the gender gap?
Develop a simple theory and corresponding hypothesis (where gender is the independent variable) which seeks to explain the source of the gender gap.
Specifically, determine:
• Theory
• Null and research hypothesis
• Test statistic for a cross-tabulation to test your hypothesis