Correlation


A bit about Pearson’s r

Questions

• Why does the maximum value of r equal 1.0?

• What does it mean when a correlation is positive? Negative?

• What is the purpose of the Fisher r to z transformation?

• What is range restriction? Range enhancement? What do they do to r?

• Give an example in which data properly analyzed by ANOVA cannot be used to infer causality.

• Why do we care about the sampling distribution of the correlation coefficient?

• What is the effect of reliability on r?

Basic Ideas

• Nominal vs. continuous IV

• Degree (direction) & closeness (magnitude) of linear relations

  – Sign (+ or -) for direction

  – Absolute value for magnitude

• Pearson product-moment correlation coefficient

$$r = \frac{\sum z_X z_Y}{N}$$

Illustrations

[Figure: Plot of Weight by Height. X axis: Height (60 to 75); Y axis: Weight (90 to 210).]

[Figure: Plot of Errors by Study Time. X axis: Study Time (0 to 400); Y axis: Errors (0 to 30).]

[Figure: Plot of SAT-V by Toe Size. X axis: Toe Size (1.5 to 1.9); Y axis: SAT-V (400 to 700).]

Positive, negative, zero

Simple Formulas

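$$r_{XY} = \frac{\sum xy}{N S_X S_Y}$$

where $x = X - \bar{X}$ and $y = Y - \bar{Y}$

$$S_X = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$

$$r = \frac{\sum z_X z_Y}{N}$$

$$z_X = \frac{X - \bar{X}}{S_X}$$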

Use either N throughout or else use N-1 throughout (SD and denominator); result is the same as long as you are consistent.

$$Cov(X, Y) = \frac{\sum xy}{N}$$

Pearson’s r is the average cross product of z scores. Product of (standardized) moments from the means.
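The "average cross product of z scores" idea can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `pearson_r`, the `ddof` argument, and the height/weight values are all hypothetical. It also checks the note about using N or N-1 consistently.

```python
import math

def pearson_r(xs, ys, ddof=0):
    """Pearson's r as the average cross product of z scores.

    ddof=0 divides by N throughout; ddof=1 divides by N-1 throughout.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Standard deviations, using the chosen divisor
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - ddof))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - ddof))
    # Convert raw scores to z scores
    zx = [(x - mean_x) / sd_x for x in xs]
    zy = [(y - mean_y) / sd_y for y in ys]
    # Average cross product, using the same divisor
    return sum(a * b for a, b in zip(zx, zy)) / (n - ddof)

# Hypothetical data, for illustration only
heights = [60, 62, 64, 66, 68, 70, 72, 74]
weights = [105, 120, 118, 140, 150, 149, 170, 185]

r_n = pearson_r(heights, weights, ddof=0)   # N throughout
r_n1 = pearson_r(heights, weights, ddof=1)  # N-1 throughout
print(round(r_n, 4), round(r_n1, 4))
```

The two printed values agree because the divisor cancels as long as it is the same in the standard deviations and in the average.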

Graphic Representation

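[Figure: Plot of Weight by Height, annotated with the means. Mean = 66.8 inches (Height); Mean = 150.7 lbs. (Weight).]

[Figure: Plot of Weight by Height in Z-scores. X axis: Z-height (-2 to 2); Y axis: Z-weight (-2 to 2). Quadrants are marked with the sign of the cross product: + in the upper right and lower left, - in the upper left and lower right.]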

1. Conversion from raw to z.

2. Points & quadrants. Positive & negative products.

3. Correlation is average of cross products. Sign & magnitude of r depend on where the points fall.

4. Product at maximum (average =1) when points on line where zX=zY.

Descriptive Statistics

                      N    Minimum   Maximum   Mean       Std. Deviation
Ht                    10   60.00     78.00     69.0000    6.05530
Wt                    10   110.00    200.00    155.0000   30.27650
Valid N (listwise)    10

[Figure: scatterplot with all points on a line; r = 1.0.]

[Figure: Leave X, add error to Y. r = .99.]

[Figure: Add more error. r = .91.]

With 2 variables, the correlation is the z-score slope.
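A small check of the z-score slope claim, as a sketch only: the helper names `z_scores` and `ls_slope` and the data are hypothetical, and the slope is the ordinary least-squares slope of zY on zX.

```python
import math

def z_scores(values):
    """Standardize values using N in the standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def ls_slope(xs, ys):
    """Least-squares slope for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data, for illustration only
x = [2, 4, 5, 7, 8, 10, 11, 13]
y = [1, 3, 2, 6, 5, 9, 8, 12]

zx, zy = z_scores(x), z_scores(y)
r = sum(a * b for a, b in zip(zx, zy)) / len(x)  # average cross product
slope_z = ls_slope(zx, zy)                       # slope in z-score units
print(round(r, 4), round(slope_z, 4))
```

In raw-score units the slope would instead be r multiplied by the ratio of the standard deviations, which is why the equality holds only after standardizing.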

Review

• Why does the maximum value of r equal 1.0?

• What does it mean when a correlation is positive? Negative?

Sampling Distribution of r

Statistic is r, parameter is ρ (rho). In general, r is slightly biased.

[Figure: Sampling Distributions of r. X axis: Observed r (-1.2 to 1.2); Y axis: Relative Frequency (0.00 to 0.08). Curves shown for rho = -.5, rho = 0, and rho = .5.]

The sampling variance is approximately:

$$\sigma_r^2 \approx \frac{(1 - \rho^2)^2}{N}$$

Sampling variance depends both on N and on ρ.
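To see the dependence on both N and ρ, here is a short sketch that assumes the large-sample approximation (1 - ρ²)²/N for the sampling variance of r. The function name `approx_var_r` is hypothetical; the ρ and N values are the four conditions used in the empirical distributions that follow.

```python
import math

def approx_var_r(rho, n):
    """Approximate sampling variance of r: (1 - rho^2)^2 / N."""
    return (1 - rho ** 2) ** 2 / n

for rho in (0.5, 0.7):
    for n in (50, 100):
        var_r = approx_var_r(rho, n)
        # Report the variance and the corresponding standard error
        print(f"rho={rho}, N={n}: variance={var_r:.5f}, SE={math.sqrt(var_r):.3f}")
```

The variance shrinks as N grows and also as ρ moves away from zero.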

Empirical Sampling Distributions of the Correlation Coefficient

[Figure: Box plots of empirical sampling distributions of r for four conditions: ρ = .5, N = 100; ρ = .5, N = 50; ρ = .7, N = 100; ρ = .7, N = 50. Vertical axis: observed r (-0.1 to 0.9).]

Fisher’s r to z Transformation

r    .10   .20   .30   .40   .50   .60   .70   .80    .90
z    .10   .20   .31   .42   .55   .69   .87   1.10   1.47

[Figure: Fisher r to z Transformation. X axis: r (sample value input), 0.0 to 1.0; Y axis: z (output), 0.0 to 1.5.]

The sampling distribution of z approaches normal as N increases. The transformation pulls out the short tail to make a better (more nearly normal) distribution. The sampling variance of z, 1/(N-3), does not depend on ρ.

$$z = .5 \ln\left(\frac{1 + r}{1 - r}\right)$$
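The r to z table can be reproduced with a few lines of Python. This is a sketch: the function name `fisher_z` is our own, and the transformation is written as .5 ln((1 + r)/(1 - r)), which is the same as the inverse hyperbolic tangent.

```python
import math

def fisher_z(r):
    """Fisher r to z transformation: .5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

# Print r and its transformed value for r = .10 through .90
for r in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"r = {r:.2f}  z = {fisher_z(r):.2f}")
```

Small values of r are left nearly unchanged, while values near 1 are stretched, which is how the short tail gets pulled out.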

Hypothesis test: $H_0: \rho = 0$

$$t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$$

Result is compared to t with (N-2) df for significance.

Say r=.25, N=100

$$t = \frac{.25\sqrt{98}}{\sqrt{1 - .25^2}} = \frac{.25(9.899)}{.968} = 2.56$$

t(.05, 98) = 1.984.

p< .05
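The same test as a sketch in Python. The function name `t_for_r` is hypothetical, and the critical value is simply copied from the slide rather than computed.

```python
import math

def t_for_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t = t_for_r(0.25, 100)
critical_t = 1.984  # t(.05, 98), taken from the slide
print(round(t, 2), abs(t) > critical_t)
```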

Hypothesis test 2: $H_0: \rho = \text{value}$

$$z = \frac{.5\ln\left(\frac{1 + r}{1 - r}\right) - .5\ln\left(\frac{1 + \rho}{1 - \rho}\right)}{\sqrt{1/(N - 3)}}$$

One sample z test where r is sample value and ρ is hypothesized population value.

Say N=200, r = .54, and ρ is .30.

$$z = \frac{.5\ln\left(\frac{1 + .54}{1 - .54}\right) - .5\ln\left(\frac{1 + .30}{1 - .30}\right)}{\sqrt{1/(200 - 3)}} = \frac{.60 - .31}{.07} = 4.13$$

Compare to unit normal, e.g., 4.13 > 1.96 so it is significant. Our sample was not drawn from a population in which rho is .30.
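A sketch of the one-sample test in Python, with hypothetical function names. Because it carries full precision rather than the two-decimal values used in the hand computation, the result is close to, but not digit-for-digit the same as, the slide's figure.

```python
import math

def fisher_z(r):
    """Fisher r to z transformation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_one_sample(r, rho0, n):
    """z test of H0: rho = rho0, using standard error sqrt(1 / (n - 3))."""
    standard_error = math.sqrt(1 / (n - 3))
    return (fisher_z(r) - fisher_z(rho0)) / standard_error

z = z_one_sample(r=0.54, rho0=0.30, n=200)
print(round(z, 2), abs(z) > 1.96)
```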

Hypothesis test 3: $H_0: \rho_1 = \rho_2$

Testing equality of correlations from 2 INDEPENDENT samples.

$$z = \frac{.5\ln\left(\frac{1 + r_1}{1 - r_1}\right) - .5\ln\left(\frac{1 + r_2}{1 - r_2}\right)}{\sqrt{1/(N_1 - 3) + 1/(N_2 - 3)}}$$

Say N1=150, r1=.63, N2=175, r2=.70.

$$z = \frac{.5\ln\left(\frac{1 + .63}{1 - .63}\right) - .5\ln\left(\frac{1 + .70}{1 - .70}\right)}{\sqrt{1/(150 - 3) + 1/(175 - 3)}} = \frac{.74 - .87}{.11} = -1.18, \text{ n.s.}$$
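The two-sample comparison can be sketched the same way. Function names are hypothetical. With unrounded intermediate values the statistic comes out near -1.12 rather than the hand-computed -1.18; the conclusion (not significant) is the same.

```python
import math

def fisher_z(r):
    """Fisher r to z transformation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_two_independent(r1, n1, r2, n2):
    """z test of H0: rho1 = rho2 for correlations from two independent samples."""
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / standard_error

z = z_two_independent(r1=0.63, n1=150, r2=0.70, n2=175)
print(round(z, 2), abs(z) > 1.96)
```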

Hypothesis test 4: $H_0: \rho_1 = \rho_2 = \ldots = \rho_k$

Testing equality of any number of independent correlations.

$$\bar{z} = \frac{\sum_{i=1}^{k} (n_i - 3) z_i}{\sum_{i=1}^{k} (n_i - 3)}$$

$$Q = \sum_{i=1}^{k} (n_i - 3)(z_i - \bar{z})^2$$

Compare Q to chi-square with k-1 df.

Study   r    n     z     (n-3)z   zbar   (z-zbar)^2   (n-3)(z-zbar)^2

1       .2   200   .20   39.94    .41    .0441        8.69

2       .5   150   .55   80.75    .41    .0196        2.88

3       .6   75    .69   49.91    .41    .0784        5.64

sum          425         170.6                        17.21 = Q

Chi-square at .05 with 2 df = 5.99. Not all rho are equal.
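The Q computation for the three studies, as a sketch with hypothetical function names. It weights each z by n - 3, as in the table. Carrying full precision gives Q of roughly 17.1 instead of 17.21, because the table squares deviations of two-decimal z values. The chi-square critical value is copied from the slide.

```python
import math

def fisher_z(r):
    """Fisher r to z transformation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def q_homogeneity(rs, ns):
    """Weighted mean z and Q statistic for k independent correlations."""
    zs = [fisher_z(r) for r in rs]
    weights = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(weights, zs))
    return z_bar, q

z_bar, q = q_homogeneity(rs=[0.2, 0.5, 0.6], ns=[200, 150, 75])
critical_chi_square = 5.99  # chi-square at .05 with k - 1 = 2 df, from the slide
print(round(z_bar, 2), round(q, 2), q > critical_chi_square)
```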

Hypothesis test 5: dependent r

$H_0: \rho_{12} = \rho_{13}$

$H_0: \rho_{12} = \rho_{34}$

Hotelling-Williams test

$$t_{(N-3)} = (r_{12} - r_{13})\sqrt{\frac{(N - 1)(1 + r_{23})}{2\left(\frac{N - 1}{N - 3}\right)|R| + \bar{r}^2 (1 - r_{23})^3}}$$

$$\bar{r} = (r_{12} + r_{13})/2$$

$$|R| = 1 - .4^2 - .6^2 - .3^2 + 2(.4)(.6)(.3) = .534$$

Say N=101, r12=.4, r13=.6, r23=.3

$$\bar{r} = (.4 + .6)/2 = .5$$

$$t_{(98)} = (.4 - .6)\sqrt{\frac{(100)(1 + .3)}{2(100/98)(.534) + .5^2 (1 - .3)^3}} = -2.1$$

t(.05, 98) = 1.98. See my notes.

$$|R| = 1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2(r_{12})(r_{13})(r_{23})$$
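A sketch of the dependent-correlations test in Python, assuming the usual Williams form of the statistic, in which |R| is the determinant of the 3 x 3 correlation matrix and the test has N - 3 df. Function names are hypothetical; the inputs are the worked example's values.

```python
import math

def det_r(r12, r13, r23):
    """Determinant of the 3 x 3 correlation matrix."""
    return 1 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2 * r12 * r13 * r23

def hotelling_williams_t(r12, r13, r23, n):
    """t statistic (n - 3 df) for H0: rho12 = rho13, same sample."""
    r_bar = (r12 + r13) / 2
    numerator = (n - 1) * (1 + r23)
    denominator = (2 * ((n - 1) / (n - 3)) * det_r(r12, r13, r23)
                   + r_bar ** 2 * (1 - r23) ** 3)
    return (r12 - r13) * math.sqrt(numerator / denominator)

det = det_r(0.4, 0.6, 0.3)
t = hotelling_williams_t(r12=0.4, r13=0.6, r23=0.3, n=101)
print(round(det, 3), round(t, 2))
```

The sign of t only reflects which correlation is subtracted from which; its absolute value is what gets compared to the critical value.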

Review

• What is the purpose of the Fisher r to z transformation?

• Test the hypothesis that ρ1 = ρ2

  – Given that r1 = .50, N1 = 103

  – r2 = .60, N2 = 128, and the samples are independent.

• Why do we care about the sampling distribution of the correlation coefficient?

Range Restriction/Enhancement

Reliability

Reliability sets the ceiling for validity. Measurement error attenuates correlations.

$$\rho_{XY} = \rho_{T_X T_Y}\sqrt{\rho_{XX'}\,\rho_{YY'}}$$

If the correlation between true scores is .7 and the reliabilities of X and Y are both .8, the observed correlation is .7*sqrt(.8*.8) = .7*.8 = .56.

Disattenuated correlation

$$\rho_{T_X T_Y} = \rho_{XY} / \sqrt{\rho_{XX'}\,\rho_{YY'}}$$

If our observed correlation is .56 and the reliabilities of both X and Y are .8, our estimate of the correlation between true scores is .56/.8 = .70.
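Both directions of the reliability correction fit in a short sketch. The function names `attenuate` and `disattenuate` are hypothetical; the numbers are the ones from the two examples.

```python
import math

def attenuate(rho_true, rel_x, rel_y):
    """Expected observed correlation given the true-score correlation."""
    return rho_true * math.sqrt(rel_x * rel_y)

def disattenuate(r_observed, rel_x, rel_y):
    """Estimated true-score correlation given the observed correlation."""
    return r_observed / math.sqrt(rel_x * rel_y)

observed = attenuate(0.7, rel_x=0.8, rel_y=0.8)
estimated_true = disattenuate(0.56, rel_x=0.8, rel_y=0.8)
print(round(observed, 2), round(estimated_true, 2))
```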

Review

• What is range restriction? Range enhancement? What do they do to r?

• What is the effect of reliability on r?

SAS Power Estimation

    proc power;
      onecorr dist=fisherz
        corr = 0.35
        nullcorr = 0.2
        sides = 1
        ntotal = 100
        power = .;
    run;

    proc power;
      onecorr
        corr = 0.35
        nullcorr = 0
        sides = 2
        ntotal = .
        power = .8;
    run;

Computed Power

Actual alpha = .05

Power = .486

Computed N Total

Alpha = .05

Actual Power = .801

N total = 61

Power for Correlations

Rho    N required against Null: rho = 0

.10    782

.15    346

.20    193

.25    123

.30    84

.35    61

Sample sizes required for powerful conventional significance tests for typical values of the correlation coefficient in psychology. Power = .8, two tails, alpha is .05.
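The tabled sample sizes can be roughly checked with the common Fisher z approximation, N ≈ ((z_alpha/2 + z_power) / z_rho)² + 3. This is a sketch, not the method used to build the table: the approximation, the function name `approx_n_for_power`, and the rounding up are our assumptions, and the results run about one higher than the tabled values.

```python
import math
from statistics import NormalDist

def approx_n_for_power(rho, power=0.8, alpha=0.05):
    """Approximate N to detect rho against rho = 0 (two tails), via Fisher z."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    z_rho = math.atanh(rho)                        # Fisher r to z of rho
    return math.ceil(((z_alpha + z_power) / z_rho) ** 2 + 3)

# Values from the table, for comparison
table = {0.10: 782, 0.15: 346, 0.20: 193, 0.25: 123, 0.30: 84, 0.35: 61}
approx = {rho: approx_n_for_power(rho) for rho in table}

for rho, n_table in table.items():
    print(f"rho = {rho:.2f}  table N = {n_table}  approximate N = {approx[rho]}")
```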
