Correlational Problems and Fallacies
James H. Steiger
Introduction
In this module, we discuss some common problems and fallacies regarding correlation coefficients and their interpretation:
Interpreting a correlation
Correlation and causality
Perfect correlation and equivalence
No correlation vs. no relation
Combining populations, and ignoring explanatory variables
Restriction of range
Interpreting a Correlation
If scores are on roughly similar scales, the shape of the scatterplot can reveal a substantial amount about the correlation.
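For reference, here is a minimal sketch of the sample Pearson correlation computed directly from its definition, so the r values quoted on the scatterplots have a concrete meaning. The function name `pearson_r` and the toy data are illustrative and do not come from the slides. The n - 1 factors in the covariance and in the two standard deviations cancel, so plain sums of squares suffice.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation: cross-products over the root of the sums of squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products of deviations
    sxx = sum((a - mx) ** 2 for a in x)                    # sum of squares for x
    syy = sum((b - my) ** 2 for b in y)                    # sum of squares for y
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data (not from the slides): r = 6 / sqrt(60), about 0.775
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))
```

Because r is built from deviations divided by spreads, it does not change when either variable is shifted or rescaled by a positive constant; only the shape of the point cloud matters.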
[Scatterplot (Cigarettes vs. Cardiac Reserve): Cigarettes Smoked on the x-axis, Cardiac Reserve on the y-axis; the r value did not survive extraction]
[Scatterplot (Shoe Size vs. IQ): Shoe Size on the x-axis, IQ on the y-axis; r = .01]
[Scatterplot (GPA vs. IQ): GPA on the x-axis, IQ on the y-axis; r = .72]
Anscombe’s Quartet
X1    Y1    X2    Y2    X3    Y3    X4    Y4
10  8.04    10  9.14    10  7.46     8  6.58
 8  6.95     8  8.14     8  6.77     8  5.76
13  7.58    13  8.74    13 12.74     8  7.71
 9  8.81     9  8.77     9  7.11     8  8.84
11  8.33    11  9.26    11  7.81     8  8.47
14  9.96    14  8.10    14  8.84     8  7.04
 6  7.24     6  6.13     6  6.08     8  5.25
 4  4.26     4  3.10     4  5.39    19 12.50
12 10.84    12  9.13    12  8.15     8  5.56
 7  4.82     7  7.26     7  6.42     8  7.91
 5  5.68     5  4.74     5  5.73     8  6.89
Each of the above four data sets has the following summary statistics:
$M_X = 9$, $s_X^2 = 11$
$M_Y = 7.5$, $s_Y^2 = 4.13$
$r_{XY} = .82$
Each has a best-fitting linear regression line of $\hat{Y} = .5X + 3$.
[Figure: scatterplots of the four Anscombe data sets (Y1 vs. X1, Y2 vs. X2, Y3 vs. X3, Y4 vs. X4), each shown with the fitted line y = 3 + 0.5x]
Correlation and Causality
Correlation is not causality. This is a standard adage in textbooks on statistics and experimental design, but it is still forgotten on occasion.
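Example: The correlation between the number of fire trucks sent to a fire and the dollar damage done by the fire. Both tend to rise with the size of the fire, yet sending more trucks does not cause more damage.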
Perfect Correlation and Equivalence
Two variables may correlate highly (or even perfectly) without measuring the same construct.
Example: Height and weight on the planet Zorg.
Zero Correlation vs. No Relation
The Pearson correlation coefficient is a measure of linear relation. Many strong relationships are nonlinear. Always examine the scatterplot!
Combining Populations
If two groups with different means and/or covariances are combined, the resulting mixture can exhibit spurious correlations.
Example (C.P.): Suppose the correlation between strength and mathematics performance is zero for 6th grade boys, and zero for 8th grade boys. Does this mean it will be zero in a combined group of 6th and 8th graders?
Restriction of Range
Often, when linear regression is used to predict performance, the population available for study is restricted. For example, the GRE is used to predict performance in graduate school, but people with low GRE scores are often refused admission to graduate school. Consequently, the “available data” are a truncated version of the full data set.
[Scatterplot (Restriction of Range.STA, full data): VAR2 vs. VAR1, fitted line y = 33.905 + 0.514x; N = 1000, r = .73]
[Scatterplot (Restriction of Range.STA, high VAR1 scores only): VAR2 vs. VAR1, fitted line y = 35.09 + 0.497x; N = 153, r = .40]
The “Third Variable Fallacy”
Often people assume, sometimes almost subconsciously, that when two variables correlate highly with a third variable, they correlate highly with each other.
Actually, if $r_{XW}$ and $r_{YW}$ are both .7071, $r_{XY}$ can vary anywhere from 0 to 1.
Only when $r_{XW}$ and/or $r_{YW}$ become very high does the correlation between X and Y become highly restricted.
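The range follows from the requirement that the 3×3 correlation matrix of X, Y, and W be positive semidefinite, which gives $r_{XW} r_{YW} - \sqrt{(1 - r_{XW}^2)(1 - r_{YW}^2)} \le r_{XY} \le r_{XW} r_{YW} + \sqrt{(1 - r_{XW}^2)(1 - r_{YW}^2)}$. The sketch below evaluates those bounds; the function name `r_xy_bounds` is illustrative, and $1/\sqrt{2} \approx .7071$ is used so the lower bound comes out as 0 rather than a tiny negative rounding artifact.

```python
import math

def r_xy_bounds(r_xw, r_yw):
    """Admissible range of r_XY given r_XW and r_YW (correlation matrix must be PSD)."""
    center = r_xw * r_yw
    half_width = math.sqrt((1 - r_xw**2) * (1 - r_yw**2))
    return center - half_width, center + half_width

for r in (1 / math.sqrt(2), 0.9, 0.99):
    lo, hi = r_xy_bounds(r, r)
    print(f"r_XW = r_YW = {r:.4f}: {lo:.3f} <= r_XY <= {hi:.3f}")
```

When both correlations equal r, the interval reduces to $[2r^2 - 1, 1]$: it spans 0 to 1 at .7071, narrows to .62 to 1 at .9, and to .96 to 1 at .99.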