Bivariate Relationships
Analyzing two variables at a time, usually the Independent & Dependent Variables
Like one variable at a time, this can be done visually with charts and graphs (such as a scatterplot) and with frequency tables. To see two univariate frequency tables together at the same time, you cross-tabulate them; that is, you create a cross-tabulation (shorthand: crosstab).
Guidelines for creating crosstabs:
(1) Put the Dependent variable in ROWS
(2) Put the Independent variable in COLUMNS
(3) Calculate percentages in the direction of the independent variable (Columns in this case).
You are comparing the distributions of each category (value) of the independent variable with one another in terms of the categories of the dependent variable. For example, if you want to see whether there is a relationship between gender and religion, you compare the values of gender (that is, male and female) across the various religions. When the number of men and the number of women are not exactly the same, you must standardize by presenting the results as percentages: the percentages of men who are Catholic, Jewish, etc. against the percentages of women who are Catholic, Jewish, etc. For the comparison to work, the percentages for men must add up to 100%, as must the percentages for women.
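A minimal sketch of these guidelines, assuming pandas is available; the gender/religion counts below are made up purely for illustration:

```python
import pandas as pd

# Hypothetical data: 4 women (3 Catholic, 1 Jewish), 4 men (2 Catholic, 2 Jewish)
df = pd.DataFrame({
    "gender":   ["F", "F", "F", "F", "M", "M", "M", "M"],
    "religion": ["Catholic", "Catholic", "Catholic", "Jewish",
                 "Catholic", "Catholic", "Jewish", "Jewish"],
})

# Guidelines: dependent variable (religion) in rows, independent (gender) in
# columns, and percentages computed within each column of the independent variable
table = pd.crosstab(df["religion"], df["gender"], normalize="columns") * 100
print(table.round(1))   # each column sums to 100%
```

Because `normalize="columns"` percentages within each category of the independent variable, the F column and the M column each total 100%, which is exactly the comparison the guidelines call for.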
College Status * SEX Crosstabulation (percentages are % within SEX)

                               SEX
College Status          F             M             Total
Graduated           342 (74.9%)   212 (68.1%)   554 (71.8%)
Withdrew/Leave      119 (25.1%)   103 (32.0%)   222 (27.9%)
Total               474 (100.0%)  322 (100.0%)  796 (100.0%)
Hypothesis: There is no relationship between Sex and College Status (Graduated or Left the College)
Which are the independent and dependent variables? What are the levels of measurement?
Put into words: 74.9% of __________ have _____________.
This is not the same as saying 74.9% of those who graduated are Female.
If 71.8% of all students over the entire four-year period graduated, then compare the percentages for women and for men relative to that 71.8%. Who tends to graduate disproportionately higher or lower than the overall rate?
College Status * SEX Crosstabulation (percentages are % within College Status)

College Status          F            M            Total
Graduated           342 (61.7%)  212 (38.3%)  554 (100.0%)
Withdrew/Leave      119 (53.6%)  103 (46.4%)  222 (100.0%)
Total               474 (59.5%)  322 (40.5%)  796 (100.0%)
This table, however, says: 61.7% of ___ are _______.
To say that 80% of sociology majors are women is not the same as saying that 80% of women are sociology majors.
You must always compare the categories (or values) of the independent variable by calculating percentages within each category separately. Each must add up to 100%.
And if 59.5% of all respondents are female and 40.5% are male, then who graduates disproportionately higher or lower relative to their share of the sample?
College Status * Race/Ethnicity Crosstabulation
(each cell: Count / % within College Status (CS) / % within Race/Ethnicity (R/E))

College Status                  White   Asian/Pacific Isl.  African Amer.  Latino(a)  Other    Total
Current         Count               8        3        1        6        2       20
                % within CS     40.0%    15.0%     5.0%    30.0%    10.0%   100.0%
                % within R/E     1.8%     3.3%     2.6%     5.5%     1.8%     2.5%
Graduated       Count             292       68       28       81       82      554
                % within CS     52.7%    12.3%     5.1%    14.6%    14.8%   100.0%
                % within R/E    66.5%    73.9%    73.7%    74.3%    72.6%    69.6%
Withdrew/Leave  Count             139       21        9       22       29      222
                % within CS     62.6%     9.5%     4.1%     9.9%    13.1%   100.0%
                % within R/E    31.7%    22.8%    23.7%    20.2%    25.7%    27.9%
Total           Count             439       92       38      109      113      796
                % within CS     55.2%    11.6%     4.8%    13.7%    14.2%   100.0%
                % within R/E   100.0%   100.0%   100.0%   100.0%   100.0%   100.0%
Put into words what this table is telling us.
But how do we know if the differences between categories are big enough? What if we find that 75% of men own Toyotas and 79% of women do? Is a 4-point difference large enough, or is it just sampling error?
To decide whether a difference is significant enough to hold a press conference, we use statistical tests that tell us the odds – the probability – that our findings occurred by chance alone, that is, by accident rather than as a real finding. If those odds are small, we have a significant finding: the probability of the result happening by accident is so small that it must reflect a real impact of the independent variable on the dependent variable, not an accidental one.
To tell this, you look for two things:
(1) The value of the statistic
(2) The probability of that statistic occurring by chance
If the probability of a statistic occurring by chance is less than 5% (p < .05), then you reject the null (or accept the positive) and declare that there is a relationship between the independent and dependent variables.
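This decision rule can be sketched in a few lines; the function name below is our own, not from any statistics library:

```python
# Minimal sketch of the .05 decision rule described above
def significant(p_value, alpha=0.05):
    """True when the probability of the finding occurring by chance is below alpha."""
    return p_value < alpha

print(significant(0.03))    # True: reject the null, declare a relationship
print(significant(0.075))   # False: the difference could be sampling error
```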
Chi-Square: a measure of association between the independent and dependent variables (usually nominal or ordinal measures). If the probability of obtaining a particular Chi-Square value by chance alone is less than .05, then we declare we have supported our hypothesis (or rejected our null). We hold a press conference and declare that indeed there is a relationship between the independent and dependent variables. Then we state in words what the relationship is (such as, women are more likely than men to vote Independent).
For the following data,
(a) state the null hypothesis being tested
(b) What are the independent & dependent variables?
(c) What levels of measurement are they?
Use Quantitative Tools * Sex Crosstabulation (percentages are % within Sex)

Use Quantitative Tools    Male           Female         Total
Not at All              11 (21.2%)     12 (12.8%)     23 (15.8%)
A Little                14 (26.9%)     29 (30.9%)     43 (29.5%)
Moderately               9 (17.3%)     32 (34.0%)     41 (28.1%)
Greatly                 18 (34.6%)     21 (22.3%)     39 (26.7%)
Total                   52 (100.0%)    94 (100.0%)   146 (100.0%)
Chi-Square Tests

                               Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square             6.898(a)   3   .075
Likelihood Ratio               7.046      3   .070
Linear-by-Linear Association    .001      1   .975
N of Valid Cases                146

a. 0 cells (.0%) have an expected count less than 5. The minimum expected count is 8.19.
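Assuming scipy is available, the Pearson chi-square above can be reproduced directly from the observed counts in the crosstab:

```python
from scipy.stats import chi2_contingency

# Observed counts from the Use Quantitative Tools * Sex table (Male, Female)
observed = [
    [11, 12],   # Not at All
    [14, 29],   # A Little
    [ 9, 32],   # Moderately
    [18, 21],   # Greatly
]
chi2, p, dof, expected = chi2_contingency(observed)
# chi2 ~ 6.898 with dof = 3 and p ~ .075; the smallest expected
# count is ~ 8.19, matching footnote (a) in the table above
print(round(chi2, 2), dof, round(p, 3))
```

Since p = .075 is greater than .05, we fail to reject the null: the apparent differences between men and women in using quantitative tools could plausibly be sampling error.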
The value of Chi-square does not tell you much in and of itself. You must depend on the probability level to tell you if it is significant and then all it tells you is that there is an association between the variables.
However, there are statistics that can tell you how strong a relationship is between your variables, not just whether there is one. These are called correlations. They tell you how much of the variability of the dependent variable is explained by knowing the variability of the independent variable.
Nominal variables: Lambda
Ordinal variables: Gamma, Spearman’s rho
Interval/Ratio variables: Pearson r
All correlations have two components:
(1) The value, which ranges from 0 to 1.0, where 1.0 is a perfect correlation and 0 is no correlation at all.
(2) For variables that have a direction (an order: ordinal and interval/ratio measures), a plus or minus sign to indicate a positive or an inverse relationship.
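A quick sketch with scipy, using made-up interval-level data (the hours-studied and exam-score values below are hypothetical):

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 64, 70, 75]

r, p_r = pearsonr(hours, score)        # interval/ratio: Pearson r
rho, p_rho = spearmanr(hours, score)   # ordinal: Spearman's rho

# Both coefficients are strongly positive here: higher values on one
# variable go with higher values on the other
print(round(r, 2), round(rho, 2))
```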
A Lambda correlation of .75 between race and religion tells us that this is a strong relationship (it is close to 1.0), and therefore much of the variation in religion in our sample can be explained by the variation in race. You would then look to see which religions depend on which races and report that information (such as: Whites tend to be Protestant, Latinos tend to be Catholic, and so on).
A guideline:
Correlations between 0 and .30 tend to be weak
Correlations between .30 and .70 tend to be moderate
Correlations between .70 and 1.0 tend to be strong
A Pearson r correlation of -.60 is just as strong as one that is .60, and stronger than a correlation of .50, for example. The minus sign just tells us that it is inverse: those who score low on one variable, score high on the other. It does not mean it is weak or less than any positive correlation.
Visual Version of Correlation: Scatterplots
[Scatterplot: SAT_TOT (y-axis, 400 to 1800) plotted against ACT (x-axis, 10 to 40)]

Pearson r = .84
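A scatterplot like the one described can be sketched with matplotlib; the ACT/SAT values below are simulated for illustration, not the original data:

```python
import matplotlib
matplotlib.use("Agg")                    # draw off-screen (no display needed)
import matplotlib.pyplot as plt
import numpy as np

# Simulated ACT and SAT scores with a strong positive relationship
rng = np.random.default_rng(0)
act = rng.uniform(10, 36, size=200)
sat = act * 40 + rng.normal(0, 80, size=200)

plt.scatter(act, sat)
plt.xlabel("ACT")
plt.ylabel("SAT_TOT")
plt.savefig("act_vs_sat.png")

r = np.corrcoef(act, sat)[0, 1]          # strong positive Pearson r
```

The tighter the cloud of points hugs a straight line, the closer r is to 1.0 (or -1.0 for a downward-sloping cloud).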
Certain correlations also tell us the proportional reduction in error, or PRE: the proportion (or percentage) of errors made in predicting the values of the dependent variable that is reduced by knowing the values of the independent variable.
For example:
A Lambda of .45 between race and religion would indicate that 45% of the errors in explaining the variability of religion among the respondents in our sample are reduced by knowing the variability of race in the sample. For Lambda and Gamma, the PRE is simply the correlation coefficient (multiply by 100 to get a percentage instead of a proportion).
For Pearson r and Spearman’s rho, you must square the correlation value to determine the proportion of error reduction (r2 or rho2). So a Pearson r correlation of -.50 between high school GPA and SAT scores would suggest that .25 or 25% of the errors in predicting SAT scores would be reduced once we know the respondents’ high school GPAs.
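The arithmetic for the squared-coefficient version of PRE, using the -.50 example from the text:

```python
# PRE sketch for Pearson r / Spearman's rho: square the coefficient
r = -0.50             # the text's example: high school GPA vs. SAT scores
pre = r ** 2          # proportion of prediction errors reduced
print(f"{pre:.0%}")   # 25%
```

Note that squaring discards the sign: an inverse correlation of -.50 reduces prediction error just as much as a positive one of .50.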
Directional Measures

                                                            Value  Asymp. Std.  Approx.  Approx.
                                                                   Error(a)     T(b)     Sig.
Nominal by  Lambda        Symmetric                          .045    .061         .727    .467
Nominal                   Use Quantitative Tools Dependent   .068    .090         .727    .467
                          Sex Dependent                      .000    .000         (c)     (c)
            Goodman and   Use Quantitative Tools Dependent   .016    .012                 .073(d)
            Kruskal tau   Sex Dependent                      .047    .034                 .077(d)

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Cannot be computed because the asymptotic standard error equals zero.
d. Based on chi-square approximation.
Symmetric Measures

                              Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Ordinal by Ordinal  Gamma     -.005    .136                   -.036          .971
N of Valid Cases      146

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
Put these findings into words
Review
(1) Determine the independent and dependent variables in the hypothesis.
(2) Label the levels of measurement for each variable.
(3) Decide the appropriate statistics to use.
(4) Evaluate the value of the statistic and the probability (or significance) level.
(5) If the p-value is less than .05, then reject the null and accept the positive hypothesis.
(6) If the statistic is a correlation (lambda, gamma, Pearson r, Spearman rho), then determine the PRE.
(7) Put the findings into words for (a) fellow statistics experts and (b) for the general public on your Facebook page or Twitter feed!
Example
There is no relationship between High School GPA and SAT scores.
There is a relationship between High School GPA and College GPA.
Correlations

                                 GPA_HS   GPA_CUM   SATTOTAL
GPA_HS     Pearson Correlation    1        .379**    .180**
           Sig. (2-tailed)        .        .000      .000
           N                     1460     1440      1356
GPA_CUM    Pearson Correlation    .379**   1         .192**
           Sig. (2-tailed)        .000     .         .000
           N                     1440     1508      1397
SATTOTAL   Pearson Correlation    .180**   .192**    1
           Sig. (2-tailed)        .000     .000      .
           N                     1356     1397      1417

**. Correlation is significant at the 0.01 level (2-tailed).
(1) What are the independent & dependent variables?
(2) Levels of measurement?
(3) Which statistic do you use?
(4) What do the values of the Pearson r mean?
(5) What are the Significance levels?
(6) What are the PRE interpretations?
(7) Put into words for a statistical audience and for the general public.