Correlation


Page 1: Correlation

Correlation

Page 2: Correlation

(Irises, Vincent van Gogh, 1889)

Page 3: Correlation

setosa versicolor virginica

Iris data

Page 4: Correlation

• Fisher’s iris data

       S.Length  S.Width  P.Length  P.Width  Species
  1    5.1       3.5      1.4       0.2      setosa
  2    4.9       3.0      1.4       0.2      setosa
  ...
  49   5.3       3.7      1.5       0.2      setosa
  50   5.0       3.3      1.4       0.2      setosa
  51   7.0       3.2      4.7       1.4      versicolor
  52   6.4       3.2      4.5       1.5      versicolor
  ...
  99   6.2       2.9      4.3       1.3      versicolor
  100  5.7       2.8      4.1       1.3      versicolor
  101  6.3       3.3      6.0       2.5      virginica
  ...
  150  5.9       3.0      5.1       1.8      virginica

Page 5: Correlation

Scatter-plot matrix

[Figure: scatter-plot matrix of Sepal.Length, Sepal.Width, Petal.Length, Petal.Width and Species]

Page 6: Correlation

Scatter plot (by group) and trendlines

[Figure: Sepal.Width against Sepal.Length, grouped by species (setosa, versicolor, virginica)]

Page 7: Correlation

Scatterplot for setosa of iris data

[Figure: Sepal.Width against Sepal.Length for setosa]

Page 8: Correlation

[Figure: three scatter plots, both axes from 30 to 90, labeled "negative relationship", "positive relationship" and "no apparent relationship"]

How to quantify the relationship?

Page 9: Correlation

[Figure: three scatter plots, both axes from 30 to 90]

Count pairs

Page 10: Correlation

Count positive pairs

  $X$              20   30   50   70   80
  $Y$              30   10   40   70   50
  $X - \bar{X}$   -30  -20    0   20   30
  $Y - \bar{Y}$   -10  -30    0   30   10

Page 11: Correlation

Count negative pairs

  $X$              20   30   50   70   80
  $Y$              50   70   40   10   30
  $X - \bar{X}$   -30  -20    0   20   30
  $Y - \bar{Y}$    10   30    0  -30  -10
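The pair counting on these two slides can be sketched in a few lines of Python. This is only an illustration: the helper name `count_sign_pairs` is not from the slides, and it assumes a "positive pair" means that both deviations from the means have the same sign.

```python
def count_sign_pairs(xs, ys):
    """Count pairs whose deviations from the means agree or disagree in sign."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    positive = negative = 0
    for x, y in zip(xs, ys):
        product = (x - x_bar) * (y - y_bar)
        if product > 0:
            positive += 1
        elif product < 0:
            negative += 1
        # a product of exactly zero counts for neither side
    return positive, negative


x = [20, 30, 50, 70, 80]
y_up = [30, 10, 40, 70, 50]    # Y values from the "count positive pairs" slide
y_down = [50, 70, 40, 10, 30]  # Y values from the "count negative pairs" slide

print(count_sign_pairs(x, y_up))    # (4, 0)
print(count_sign_pairs(x, y_down))  # (0, 4)
```

Counting signs ignores how far each point lies from the means, which is one reason to move on to products of deviations.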

Page 12: Correlation

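[Figure: two scatter plots, one with both axes from 30 to 90, the other with the x-axis from 300 to 900 and the y-axis from 30 to 90]

The scale of the variables needs to be considered.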

Page 13: Correlation

-10, -2, 3, 5, 7, 9

5, -7, 10, -3, 8, 5

Maximize the sum of products of each pair.

Page 14: Correlation

Positively matched:

-10, -2, 3, 5, 7, 9

-7, -3, 5, 5, 8, 10

Negatively matched:

-10, -2, 3, 5, 7, 9

10, 8, 5, 5, -3, -7
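As a rough check of the matching idea, the sketch below compares the sum of products for the pairing as given, the positively matched pairing and the negatively matched pairing, and confirms by brute force over all orderings that the two sorted pairings are the extremes. The function name `sum_of_products` is an illustrative choice, not something from the slides.

```python
from itertools import permutations


def sum_of_products(xs, ys):
    """Sum of x*y over the pairs formed by position."""
    return sum(x * y for x, y in zip(xs, ys))


x = [-10, -2, 3, 5, 7, 9]
y = [5, -7, 10, -3, 8, 5]

as_given = sum_of_products(x, y)
positively_matched = sum_of_products(sorted(x), sorted(y))
negatively_matched = sum_of_products(sorted(x), sorted(y, reverse=True))

# Brute force over all 720 orderings of y (feasible only for short lists).
all_sums = [sum_of_products(x, p) for p in permutations(y)]

print(as_given, positively_matched, negatively_matched)  # 80 262 -160
print(max(all_sums), min(all_sums))                      # 262 -160
```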

Page 15: Correlation

  $X$              20   30   50   70   80
  $Y$              30   10   40   70   50
  $X - \bar{X}$   -30  -20    0   20   30
  $Y - \bar{Y}$   -10  -30    0   30   10

$$Cov^*(X,Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

Page 16: Correlation

[Figure: three scatter plots, both axes from 30 to 90, labeled "-", "0" and "+"]

Page 17: Correlation

  $X - \bar{X}$   -300  -200    0   200   300
  $Y - \bar{Y}$    -20   -60    0    60    20

  $X - \bar{X}$    -30   -20    0    20    30
  $Y - \bar{Y}$    -10   -30    0    30    10

Need to consider scale

$$Cov^*(X,Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

Page 18: Correlation

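  $X - \bar{X}$   -300  -200    0   200   300
  $Y - \bar{Y}$    -20   -60    0    60    20

$$Cov^*(X,Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

$$Var^*(X) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})$$

$$Corr^*(X,Y) = \frac{Cov^*(X,Y)}{\sqrt{Var^*(X)\,Var^*(Y)}}$$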

Page 19: Correlation

  $X_2 - \bar{X}_2$   -300  -200    0   200   300
  $Y - \bar{Y}$        -20   -60    0    60    20

$$Cov^*(X_2,Y) = Cov^*(10X_1,Y) = 10\,Cov^*(X_1,Y)$$

$$Var^*(X_2) = Var^*(10X_1) = 10^2\,Var^*(X_1)$$

$$Corr^*(X_2,Y) = Corr^*(X_1,Y)$$

  $X_1 - \bar{X}_1$    -30   -20    0    20    30
  $Y - \bar{Y}$        -20   -60    0    60    20
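The scaling rules on this slide can be checked numerically. The sketch below uses the deviations shown on the slide directly as data (their means are zero) and compares the results for X1 and X2 = 10 X1. The helper names `cov` and `corr` are illustrative.

```python
from math import isclose, sqrt


def cov(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / (n - 1)


def corr(xs, ys):
    """Sample correlation: covariance over the root of the product of variances."""
    return cov(xs, ys) / sqrt(cov(xs, xs) * cov(ys, ys))


x1 = [-30, -20, 0, 20, 30]
y = [-20, -60, 0, 60, 20]
x2 = [10 * v for v in x1]  # the same variable on a scale ten times larger

print(cov(x1, y), cov(x2, y))    # covariance is multiplied by 10
print(cov(x1, x1), cov(x2, x2))  # variance is multiplied by 10**2
print(corr(x1, y), corr(x2, y))  # correlation is unchanged
```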

Page 20: Correlation

[Figure: two scatter plots, both axes from 30 to 90, labeled "-1" and "+1"]

Cauchy-Schwarz inequality

$$-1 \le Corr^*(X,Y) \le 1$$

-1: (very strong) negative linear relationship

+1: (very strong) positive linear relationship

Page 21: Correlation

Exercise

  $X$   20   30   50   70   80
  $Y$   50   70   40   10   30

$$Cov^*(X,Y) = ?$$

Page 22: Correlation
Page 23: Correlation

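Sample version

$$Cov^*(X,Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

Population version

$$Cov(X,Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - \mu_X \mu_Y$$

$$\mu_X = E(X), \quad \mu_Y = E(Y)$$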

Page 24: Correlation

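Population covariance

Joint distribution of X (rows) and Y (columns):

  X \ Y    1      2      3      Total
  1        0      0.25   0      0.25
  2        0.25   0      0.25   0.5
  3        0      0.25   0      0.25
  Total    0.25   0.5    0.25   1

  X    Y    XY   f
  1    2    2    0.25
  2    1    2    0.25
  2    3    6    0.25
  3    2    6    0.25
  3    3    9    0
  …    …    …    …

$$\mu_X = \mu_Y = 2$$

$$E(XY) = 2 \times 0.25 + 2 \times 0.25 + 6 \times 0.25 + 6 \times 0.25 = 4$$

$$Cov(X,Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - \mu_X \mu_Y = 4 - 2 \times 2 = 0$$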
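The same calculation can be sketched from the joint table, stored here as a dictionary that maps (x, y) to its probability, with zero cells omitted. The names `joint` and `population_covariance` are assumptions made for this example.

```python
def population_covariance(pmf):
    """Return (mu_x, mu_y, E[XY], Cov) for a joint pmf given as {(x, y): prob}."""
    mu_x = sum(x * p for (x, _), p in pmf.items())
    mu_y = sum(y * p for (_, y), p in pmf.items())
    e_xy = sum(x * y * p for (x, y), p in pmf.items())
    return mu_x, mu_y, e_xy, e_xy - mu_x * mu_y


# Non-zero cells of the joint table on this slide.
joint = {(1, 2): 0.25, (2, 1): 0.25, (2, 3): 0.25, (3, 2): 0.25}

mu_x, mu_y, e_xy, cov_xy = population_covariance(joint)
print(mu_x, mu_y, e_xy, cov_xy)  # 2.0 2.0 4.0 0.0
```

Passing a different dictionary to the same function handles other joint tables of this form.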

Page 25: Correlation

Population covariance

Exercise

Joint distribution of X (rows) and Y (columns):

  X \ Y    1      2      3      Total
  1        1/3    0      0      1/3
  2        0      1/3    0      1/3
  3        0      0      1/3    1/3
  Total    1/3    1/3    1/3    1

  X \ Y    1      2      3      Total
  1        0      0      1/3    1/3
  2        0      1/3    0      1/3
  3        1/3    0      0      1/3
  Total    1/3    1/3    1/3    1

$$Cov(X,Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - \mu_X \mu_Y = ?$$

Page 26: Correlation

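$$Cov(X,Y) = E[(X - \mu_X)(Y - \mu_Y)]$$

$$= \int\!\!\int (x - \mu_X)(y - \mu_Y)\, f(x,y)\, dy\, dx$$

$$= \sum_{i} \sum_{j} (x_i - \mu_X)(y_j - \mu_Y)\, f(x_i, y_j)$$

$$= \sum_{i} (x_i - \mu_X)(y_i - \mu_Y)\, f(x_i, y_i)$$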

Page 27: Correlation

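Sample (size n):

$$s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

$$r_{XY} = \frac{s_{XY}}{\sqrt{s_X^2\, s_Y^2}}$$

Population (size N):

$$\sigma_{XY} = \sum_{i=1}^{N} (x_i - \mu_X)(y_i - \mu_Y)\, f(x_i, y_i)$$

$$\rho_{XY} = \frac{\sigma_{XY}}{\sqrt{\sigma_X^2\, \sigma_Y^2}}$$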

Page 28: Correlation

Covariance is a measure of linear association between two variables. Covariance is not a measure of curved association.

[Figure: scatter plot of y against x, with x from 20 to 100 and y from -100 to 150]
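A small illustration of why covariance misses curved association: in the sketch below y is determined exactly by x through y = x², yet the sample covariance is zero. The data are hypothetical and are not the points plotted on the slide.

```python
def cov(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / (n - 1)


# Hypothetical data: a perfect but purely curved relationship.
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]  # [4, 1, 0, 1, 4]

print(cov(x, y))  # 0.0
```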

Page 29: Correlation

Covariance is affected by the scales of the variables, but correlation is not, except for the sign of the scale factor.

Covariance may be any real value, but correlation takes values only in [-1, 1].

Page 30: Correlation

$$Z = (X - \mu)/c$$

$$Cov(Z,Y) = Cov(X,Y)/c$$

$$Corr(Z,Y) = \begin{cases} Corr(X,Y), & c > 0 \\ -Corr(X,Y), & c < 0 \end{cases}$$
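A quick numerical check of this rule with a negative scale factor. The values `shift = 50` and `c = -2` are arbitrary choices for illustration, and the sample formulas (n - 1 divisor) stand in for the population ones.

```python
from math import isclose, sqrt


def cov(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / (n - 1)


def corr(xs, ys):
    """Sample correlation."""
    return cov(xs, ys) / sqrt(cov(xs, xs) * cov(ys, ys))


x = [20, 30, 50, 70, 80]
y = [30, 10, 40, 70, 50]

shift, c = 50, -2                   # hypothetical constants; c is negative
z = [(v - shift) / c for v in x]    # Z = (X - shift) / c

print(cov(x, y), cov(z, y))    # the covariance is divided by c
print(corr(x, y), corr(z, y))  # same magnitude, opposite sign
```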

Page 31: Correlation

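Correlation is the covariance for standardized variables

$$Z_X = (X - \mu_X)/\sigma_X, \quad Z_Y = (Y - \mu_Y)/\sigma_Y$$

$$Cov(Z_X, Z_Y) = Cov\!\left(\frac{X - \mu_X}{\sigma_X}, \frac{Y - \mu_Y}{\sigma_Y}\right) = \frac{Cov(X,Y)}{\sigma_X \sigma_Y} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = Corr(X,Y)$$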

Page 32: Correlation

[Figure: scatter plot with both axes from 30 to 90]

Covariance = 189

Correlation = 0.78

[Figure: scatter plot with the x-axis from 300 to 900 and the y-axis from 30 to 90]

Covariance = ?

Correlation = ?

Page 33: Correlation

[Figure: scatter plot with both axes from 30 to 90]

Covariance = 189

Correlation = 0.78

[Figure: scatter plot with the x-axis from -90 to -60 and the y-axis from 60 to 120]

Covariance = ?

Correlation = ?

Page 34: Correlation

Grouped by Zip Code

Pooling groups that each have a positive correlation does not necessarily give a positive correlation.
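A toy illustration of this point: two hypothetical groups (think of two zip codes), each with a perfectly positive correlation, give a negative correlation once pooled. The numbers are invented for the example and are not the data behind the slide.

```python
from math import sqrt


def corr(xs, ys):
    """Sample correlation (the n - 1 divisors cancel, so plain sums suffice)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    syy = sum((b - y_bar) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical groups: within each group y rises with x.
group_a_x, group_a_y = [1, 2, 3], [11, 12, 13]
group_b_x, group_b_y = [11, 12, 13], [1, 2, 3]

pooled_x = group_a_x + group_b_x
pooled_y = group_a_y + group_b_y

print(corr(group_a_x, group_a_y))  # positive within group A
print(corr(group_b_x, group_b_y))  # positive within group B
print(corr(pooled_x, pooled_y))    # negative after pooling
```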

Page 35: Correlation

High correlation does not mean that one variable is the cause of the other.

Correlation is a measure of linear association, not of causation.

Page 36: Correlation

Correlation and causality

APT prices in Seoul

The more Starbucks (STBK) stores, the more the apartment (APT) price rises?

The more Starbucks, the higher the APT price!

Page 37: Correlation

  District                STBK   APT price
  Gangnam-gu (강남구)      45     1030
  Gangdong-gu (강동구)     2      530
  Jung-gu (중구)           24     520
  Jungnang-gu (중랑구)     0      330

STBK: number of Starbucks stores

APT price: average apartment price per 1 m²

The more Starbucks, the deeper the financial crisis!
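For reference, the sample correlation for the four districts listed on this slide can be computed as below. This is only a sketch on the four rows shown; the variable names are illustrative, and a high value here describes association in these few rows, not a causal effect of stores on prices.

```python
from math import sqrt


def corr(xs, ys):
    """Sample correlation (the n - 1 divisors cancel, so plain sums suffice)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    syy = sum((b - y_bar) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


# The four rows shown on the slide: Starbucks store counts and APT prices.
stbk = [45, 2, 24, 0]
apt_price = [1030, 530, 520, 330]

r = corr(stbk, apt_price)
print(round(r, 2))
```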

Page 38: Correlation

Thank you !!