Correlation
A bit about Pearson’s r
Questions
• Why does the maximum value of r equal 1.0?
• What does it mean when a correlation is positive? Negative?
• What is the purpose of the Fisher r to z transformation?
• What is range restriction? Range enhancement? What do they do to r?
• Give an example in which data properly analyzed by ANOVA cannot be used to infer causality.
• Why do we care about the sampling distribution of the correlation coefficient?
• What is the effect of reliability on r?
Basic Ideas
• Nominal vs. continuous IV
• Degree (direction) & closeness (magnitude) of linear relations
  – Sign (+ or -) for direction
  – Absolute value for magnitude
• Pearson product-moment correlation coefficient
$$ r = \frac{\sum z_X z_Y}{N} $$
Illustrations
[Figure: Plot of Weight by Height (x-axis: Height, 60 to 75; y-axis: Weight, 90 to 210)]
[Figure: Plot of Errors by Study Time (x-axis: Study Time, 0 to 400; y-axis: Errors, 0 to 30)]
[Figure: Plot of SAT-V by Toe Size (x-axis: Toe Size, 1.5 to 1.9; y-axis: SAT-V, 400 to 700)]
Positive, negative, zero
Simple Formulas
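$$ r_{XY} = \frac{\sum xy}{N S_X S_Y}, \quad \text{where } x = X - \bar{X} \text{ and } y = Y - \bar{Y} $$
$$ S_X = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} $$
$$ r = \frac{\sum z_X z_Y}{N}, \quad \text{where } z_X = \frac{X - \bar{X}}{S_X} $$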
Use either N throughout or else use N-1 throughout (SD and denominator); result is the same as long as you are consistent.
$$ \mathrm{Cov}(X, Y) = \frac{\sum xy}{N} $$
Pearson’s r is the average cross product of z scores. Product of (standardized) moments from the means.
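A minimal Python sketch of the "average cross product of z scores" definition. The function name `pearson_r` and the height/weight values are made up for illustration; N is used throughout (SD and denominator), as the slide recommends.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r as the average cross product of z scores (N throughout)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # SDs computed with N in the denominator, matching the N used for the average.
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]
    return sum(a * b for a, b in zip(z_x, z_y)) / n

# Hypothetical heights (inches) and weights (lbs); not the data behind the slides.
heights = [60, 63, 66, 69, 72, 75]
weights = [105, 120, 150, 145, 180, 200]
r = pearson_r(heights, weights)
print(round(r, 3))  # about .975 for these made-up values
```

Swapping N for N-1 in both the SDs and the final average gives the same r, which is the consistency point made above.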
Graphic Representation
[Figure: Plot of Weight by Height, with reference lines at the means. Mean = 66.8 inches; Mean = 150.7 lbs.]
[Figure: Plot of Weight by Height in Z-scores (x-axis: Z-height; y-axis: Z-weight; both from -2 to 2), with each quadrant marked + or - for the sign of the z-score cross product]
1. Conversion from raw to z.
2. Points & quadrants. Positive & negative products.
3. Correlation is average of cross products. Sign & magnitude of r depend on where the points fall.
4. Product at maximum (average =1) when points on line where zX=zY.
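Point 4 can be backed by a short derivation. This is a sketch, assuming z scores are computed with N in the denominator so that the mean of the squared z scores is 1 for each variable.

```latex
\frac{1}{N}\sum (z_X - z_Y)^2
  = \frac{1}{N}\sum z_X^2 + \frac{1}{N}\sum z_Y^2 - \frac{2}{N}\sum z_X z_Y
  = 1 + 1 - 2r = 2(1 - r) \ge 0
  \;\Rightarrow\; r \le 1

\frac{1}{N}\sum (z_X + z_Y)^2 = 2(1 + r) \ge 0
  \;\Rightarrow\; r \ge -1
```

Equality r = 1 holds only when every squared difference is zero, that is, when z_X = z_Y for every case.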
Descriptive Statistics
                     N    Minimum   Maximum   Mean       Std. Deviation
Ht                   10   60.00     78.00     69.0000    6.05530
Wt                   10   110.00    200.00    155.0000   30.27650
Valid N (listwise)   10
r = 1.0
Leave X, add error to Y: r = .99
Add more error: r = .91
With 2 variables, the correlation is the z-score slope.
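The "add error to Y" demonstration and the z-score slope claim can be sketched in Python. The X and Y values are an assumption chosen to be consistent with the descriptive statistics shown earlier (the original data are not given), and the noise SD of 10 and the seed are arbitrary.

```python
import math
import random

def z_scores(values):
    """Standardize using the N-denominator SD."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def r_and_z_slope(xs, ys):
    """Return (r, least-squares slope of z_Y regressed on z_X)."""
    zx, zy = z_scores(xs), z_scores(ys)
    cross = sum(a * b for a, b in zip(zx, zy))
    r = cross / len(zx)                    # average cross product
    slope = cross / sum(a * a for a in zx) # regression slope in z-score units
    return r, slope

rng = random.Random(0)                     # fixed seed so the run is repeatable
x = [60, 62, 64, 66, 68, 70, 72, 74, 76, 78]        # assumed heights
y_exact = [5 * v - 190 for v in x]                  # perfectly linear in X
y_noisy = [v + rng.gauss(0, 10) for v in y_exact]   # leave X, add error to Y

r_exact, slope_exact = r_and_z_slope(x, y_exact)
r_noisy, slope_noisy = r_and_z_slope(x, y_noisy)
```

Because the sum of squared z scores equals N, the slope and r are the same number; adding error to Y pulls both below 1.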
Review
• Why does the maximum value of r equal 1.0?
• What does it mean when a correlation is positive? Negative?
Sampling Distribution of r
Statistic is r, parameter is ρ (rho). In general, r is slightly biased.
[Figure: Sampling Distributions of r for rho = -.5, rho = 0, and rho = .5 (x-axis: Observed r, -1.2 to 1.2; y-axis: Relative Frequency, 0.00 to 0.08)]
The sampling variance is approximately:
$$ \sigma_r^2 \approx \frac{(1 - \rho^2)^2}{N} $$
Sampling variance depends both on N and on ρ.
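A small Python sketch of the approximate sampling variance, (1 - ρ²)²/N, evaluated for a few combinations of ρ and N. The function name `approx_sampling_variance` is illustrative, and the formula is the large-sample approximation only.

```python
def approx_sampling_variance(rho, n):
    """Large-sample approximation to the sampling variance of r."""
    return (1 - rho ** 2) ** 2 / n

# The variance shrinks as N grows and as |rho| moves toward 1.
for rho in (0.5, 0.7):
    for n in (50, 100):
        print(rho, n, round(approx_sampling_variance(rho, n), 5))
```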
Empirical Sampling Distributions of the Correlation Coefficient
[Figure: side-by-side boxplots of observed r for four conditions: ρ = .5, N = 100; ρ = .5, N = 50; ρ = .7, N = 100; ρ = .7, N = 50. Vertical axis runs from -0.1 to 0.9; the boxes are centered near the parameter values .5 and .7.]
Fisher’s r to z Transformation
r   .10   .20   .30   .40   .50   .60   .70   .80    .90
z   .10   .20   .31   .42   .55   .69   .87   1.10   1.47
[Figure: Fisher r to z Transformation (x-axis: r, sample value input, 0.0 to 1.0; y-axis: z, output, 0.0 to 1.5)]
Sampling distribution of z becomes approximately normal as N increases.
Pulls out short tail to make better (normal) distribution.
Sampling variance of z = 1/(N-3) does not depend on ρ.
$$ z = .5 \ln\left(\frac{1 + r}{1 - r}\right) $$
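A short Python sketch of the transformation, reproducing the r-to-z table above. The function name `fisher_z` is illustrative; the formula is the same as the inverse hyperbolic tangent, which the code uses as a cross-check.

```python
import math

def fisher_z(r):
    """Fisher's r to z transformation: .5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

r_values = [k / 10 for k in range(1, 10)]          # .10, .20, ..., .90
z_table = [round(fisher_z(r), 2) for r in r_values]
print(z_table)

# Cross-check against the library's inverse hyperbolic tangent.
max_gap = max(abs(fisher_z(r) - math.atanh(r)) for r in r_values)
```

Note how z barely differs from r for small correlations and stretches out as r approaches 1.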
Hypothesis test: H0: ρ = 0
$$ t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}} $$
Result is compared to t with (N-2) df for significance.
Say r=.25, N=100
$$ t = \frac{.25\sqrt{98}}{\sqrt{1 - .25^2}} = \frac{(.25)(9.899)}{.968} = 2.56 $$
t(.05, 98) = 1.984.
p < .05
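The worked example can be checked with a few lines of Python. This is a sketch: the function name `t_for_zero_rho` is made up, and the critical value 1.984 is taken from the slide rather than computed.

```python
import math

def t_for_zero_rho(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t = t_for_zero_rho(0.25, 100)
critical = 1.984                 # t(.05, 98), two-tailed, from the slide
significant = abs(t) > critical
print(round(t, 2), significant)  # t is about 2.56
```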
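Hypothesis test 2: H0: ρ = value
$$ z = \frac{.5\ln\left(\frac{1 + r}{1 - r}\right) - .5\ln\left(\frac{1 + \rho}{1 - \rho}\right)}{\sqrt{1/(N - 3)}} $$
One sample z test where r is sample value and ρ is hypothesized population value.
Say N=200, r = .54, and ρ is .30.
$$ z = \frac{.5\ln\left(\frac{1 + .54}{1 - .54}\right) - .5\ln\left(\frac{1 + .30}{1 - .30}\right)}{\sqrt{1/(200 - 3)}} = \frac{.60 - .31}{.07} = 4.13 $$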
Compare to unit normal, e.g., 4.13 > 1.96 so it is significant. Our sample was not drawn from a population in which rho is .30.
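A Python sketch of this one-sample z test, assuming the same inputs as the example (N = 200, r = .54, hypothesized ρ = .30). The function names are illustrative. Carried at full precision, the statistic comes out a little above 4.13; the small gap from a hand calculation comes from rounding the intermediate values.

```python
import math

def fisher_z(r):
    """Fisher's r to z transformation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_test_rho(r, rho0, n):
    """One-sample z test of H0: rho = rho0 using Fisher's transformation."""
    standard_error = math.sqrt(1 / (n - 3))
    return (fisher_z(r) - fisher_z(rho0)) / standard_error

z = z_test_rho(0.54, 0.30, 200)
significant = abs(z) > 1.96      # two-tailed, alpha = .05
```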
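Hypothesis test 3: H0: ρ1 = ρ2
Testing equality of correlations from 2 INDEPENDENT samples.
$$ z = \frac{.5\ln\left(\frac{1 + r_1}{1 - r_1}\right) - .5\ln\left(\frac{1 + r_2}{1 - r_2}\right)}{\sqrt{1/(N_1 - 3) + 1/(N_2 - 3)}} $$
Say N1=150, r1=.63, N2=175, r2=.70.
$$ z = \frac{.5\ln\left(\frac{1 + .63}{1 - .63}\right) - .5\ln\left(\frac{1 + .70}{1 - .70}\right)}{\sqrt{1/(150 - 3) + 1/(175 - 3)}} = \frac{.74 - .87}{.11} = -1.18, \text{ n.s.} $$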
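A Python sketch of the two-sample test with the example's inputs. The function names are illustrative. Without rounding the intermediate values, the statistic is about -1.12 rather than -1.18; the conclusion (not significant) is the same.

```python
import math

def fisher_z(r):
    """Fisher's r to z transformation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_test_two_independent(r1, n1, r2, n2):
    """z test of H0: rho1 = rho2 for correlations from independent samples."""
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / standard_error

z = z_test_two_independent(0.63, 150, 0.70, 175)
significant = abs(z) > 1.96      # two-tailed, alpha = .05
```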
Hypothesis test 4: H0: ρ1 = ρ2 = ... = ρk
Testing equality of any number of independent correlations.
$$ \bar{z} = \frac{\sum_{i=1}^{k} (n_i - 3) z_i}{\sum_{i=1}^{k} (n_i - 3)} $$
$$ Q = \sum_{i=1}^{k} (n_i - 3)(z_i - \bar{z})^2 $$
Compare Q to chi-square with k-1 df.
Study   r    n     z     (n-3)z   zbar   (z-zbar)^2   (n-3)(z-zbar)^2
1       .2   200   .2    39.94    .41    .0441        8.69
2       .5   150   .55   80.75    .41    .0196        2.88
3       .6   75    .69   49.91    .41    .0784        5.64
sum          425         170.6                        17.21 = Q
Chi-square at .05 with 2 df = 5.99. Not all rho are equal.
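The Q computation can be sketched in Python using the three studies from the table. The function name `q_homogeneity` is illustrative. At full precision Q is about 17.09 rather than 17.21 (the table rounds z and zbar to two places); either way it exceeds the critical value of 5.99.

```python
import math

def q_homogeneity(rs, ns):
    """Weighted mean of Fisher z values and the Q statistic (df = k - 1)."""
    zs = [math.atanh(r) for r in rs]      # Fisher r to z
    weights = [n - 3 for n in ns]         # inverse sampling variances of z
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(weights, zs))
    return z_bar, q, len(rs) - 1

z_bar, q, df = q_homogeneity([0.2, 0.5, 0.6], [200, 150, 75])
reject = q > 5.99                         # chi-square(.05, 2 df), from the slide
```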
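Hypothesis test 5: dependent r
H0: ρ12 = ρ13
H0: ρ12 = ρ34
Hotelling-Williams test
$$ t_{(N-3)} = (r_{12} - r_{13}) \sqrt{\frac{(N - 1)(1 + r_{23})}{2\left(\frac{N - 1}{N - 3}\right)|R| + \bar{r}^2 (1 - r_{23})^3}} $$
$$ \bar{r} = (r_{12} + r_{13})/2 $$
$$ |R| = 1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2 r_{12} r_{13} r_{23} $$
Say N=101, r12=.4, r13=.6, r23=.3
$$ |R| = 1 - .4^2 - .6^2 - .3^2 + 2(.4)(.6)(.3) = .534 $$
$$ \bar{r} = (.4 + .6)/2 = .5 $$
$$ t = (.4 - .6) \sqrt{\frac{(100)(1 + .3)}{2(100/98)(.534) + .5^2 (1 - .3)^3}} = -2.1 $$
t(.05, 98) = 1.98
See my notes.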
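A Python sketch of the Hotelling-Williams statistic for the case where the two correlations share variable 1, using the example's inputs (N = 101, r12 = .4, r13 = .6, r23 = .3). The function name and return layout are illustrative choices, not a standard API.

```python
import math

def hotelling_williams_t(r12, r13, r23, n):
    """t (df = n - 3) for H0: rho12 = rho13 when both share variable 1."""
    # Determinant of the 3 x 3 correlation matrix.
    det_r = 1 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2
    numerator = (n - 1) * (1 + r23)
    denominator = 2 * ((n - 1) / (n - 3)) * det_r + r_bar ** 2 * (1 - r23) ** 3
    t = (r12 - r13) * math.sqrt(numerator / denominator)
    return t, det_r

t, det_r = hotelling_williams_t(0.4, 0.6, 0.3, 101)
significant = abs(t) > 1.98      # t(.05, 98), from the slide
```

The sign of t simply reflects which correlation is listed first; the two-tailed comparison uses its absolute value.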
Review
• What is the purpose of the Fisher r to z transformation?
• Test the hypothesis that ρ1 = ρ2
  – Given that r1 = .50, N1 = 103
  – r2 = .60, N2 = 128 and the samples are independent.
• Why do we care about the sampling distribution of the correlation coefficient?
Range Restriction/Enhancement
Reliability
Reliability sets the ceiling for validity. Measurement error attenuates correlations.
$$ \rho_{XY} = \rho_{T_X T_Y} \sqrt{\rho_{XX'} \rho_{YY'}} $$
If the correlation between true scores is .7 and the reliabilities of X and Y are both .8, the observed correlation is .7 × sqrt(.8 × .8) = .7 × .8 = .56.
Disattenuated correlation
$$ \rho_{T_X T_Y} = \rho_{XY} / \sqrt{\rho_{XX'} \rho_{YY'}} $$
If our observed correlation is .56 and the reliabilities of both X and Y are .8, our estimate of the correlation between true scores is .56/.8 = .70.
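Both directions of the reliability correction can be sketched in Python with the numbers from the examples. The function names `attenuate` and `disattenuate` are illustrative.

```python
import math

def attenuate(true_r, rel_x, rel_y):
    """Expected observed correlation given true-score correlation and reliabilities."""
    return true_r * math.sqrt(rel_x * rel_y)

def disattenuate(observed_r, rel_x, rel_y):
    """Estimated true-score correlation given observed correlation and reliabilities."""
    return observed_r / math.sqrt(rel_x * rel_y)

observed = attenuate(0.7, 0.8, 0.8)        # about .56
estimated_true = disattenuate(0.56, 0.8, 0.8)  # about .70
```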
Review
• What is range restriction? Range enhancement? What do they do to r?
• What is the effect of reliability on r?
SAS Power Estimation
proc power;
  onecorr dist=fisherz corr = 0.35 nullcorr = 0.2
          sides = 1 ntotal = 100 power = .;   /* solve for power */
run;
proc power;
  onecorr corr = 0.35 nullcorr = 0
          sides = 2 ntotal = . power = .8;    /* solve for N */
run;
Computed Power: Actual alpha = .05, Power = .486
Computed N Total: Alpha = .05, Actual Power = .801, Ntotal = 61
Power for Correlations
Rho    N required against null: rho = 0
.10 782
.15 346
.20 193
.25 123
.30 84
.35 61
Sample sizes required for powerful conventional significance tests for typical values of the correlation coefficient in psychology. Power = .8, two tails, alpha is .05.
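For a rough check of the table, a common Fisher z approximation to the required sample size can be sketched in Python. This is an assumption, not the method used to produce the table: N is approximated as ((z for alpha/2 + z for power) / Fisher z of rho) squared, plus 3. It lands within a case or two of the tabled values. The function name `approx_n_for_power` is illustrative.

```python
import math
from statistics import NormalDist

def approx_n_for_power(rho, alpha=0.05, power=0.80):
    """Approximate N for a two-tailed test of H0: rho = 0 (Fisher z approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for power = .80
    return math.ceil(((z_alpha + z_beta) / math.atanh(rho)) ** 2 + 3)

tabled = {0.10: 782, 0.15: 346, 0.20: 193, 0.25: 123, 0.30: 84, 0.35: 61}
approx = {rho: approx_n_for_power(rho) for rho in tabled}
for rho in tabled:
    print(rho, tabled[rho], approx[rho])
```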