Chris Piech CS109, Stanford University

Covariance and CorrelationChris Piech

CS109, Stanford University

Your random variables are correlated

Four Prototypical Trajectories

Revisión

E[g(X,Y )] =X

g(x, y)p(x, y)

E[g(X)] =X

g(x)p(x)

Expected Values of Functions

Forexample:

Let g(X,Y ) = X · Y

X, Y areindependent randomvariables:

E[X · Y ] = E[g(X,Y )]

g(x, y) · p(x, y)

xy · p(x)p(y)

xp(x) ·X

yp(y) = E[X] · E[Y ]

Fin de la revisión

Tell your friends!

Recall our Ebola Bats

Gene1 Gene2 Gene3 Gene4 Gene5 TraitTRUE FALSE TRUE TRUE FALSE FALSEFALSE FALSE TRUE TRUE TRUE TRUETRUE FALSE TRUE FALSE FALSE FALSETRUE FALSE TRUE TRUE TRUE FALSEFALSE TRUE TRUE TRUE TRUE TRUEFALSE FALSE FALSE TRUE FALSE FALSETRUE FALSE FALSE TRUE FALSE FALSETRUE FALSE FALSE TRUE FALSE FALSETRUE FALSE TRUE FALSE FALSE FALSEFALSE TRUE FALSE TRUE FALSE FALSETRUE TRUE FALSE TRUE FALSE FALSETRUE FALSE FALSE TRUE FALSE FALSETRUE FALSE TRUE TRUE TRUE FALSEFALSE FALSE TRUE TRUE FALSE FALSETRUE FALSE FALSE TRUE FALSE FALSETRUE FALSE FALSE TRUE FALSE FALSE

…TRUE FALSE FALSE TRUE FALSE FALSE

Bat Data

Gene1 Gene2 Gene3 Gene4 Gene5 Trait0.71 0.29 0.89 0.82 0.76 0.830.17 0.02 0.89 0.02 0.94 0.850.01 0.63 0.76 0.38 0.82 0.030.19 0.95 0.63 0.89 0.94 0.320.46 0.96 0.36 0.12 0.50 0.100.48 0.51 0.45 0.16 0.40 0.530.20 0.77 0.27 0.23 0.90 0.670.49 0.24 0.77 0.37 0.29 0.710.59 0.95 0.38 0.42 0.72 0.250.43 0.66 0.57 0.03 0.15 0.240.32 0.42 0.25 0.12 0.79 0.980.77 0.31 0.66 0.78 0.68 0.770.46 0.59 0.38 0.99 0.71 0.370.97 0.66 0.05 0.99 0.36 0.180.50 0.66 0.35 0.41 0.62 0.080.70 0.85 0.98 0.29 0.59 0.38

...0.78 0.09 0.69 0.41 0.82 0.76

Expression Amount

La baila de la Covariance

-4 0 4

Spot The Difference

Understanding Covariance

x� E[x] = 3

y � E[y] = 2.6

(x� E[x])(y � E[y]) = 7.8

Vary Together

(x� E[x])(y � E[y]) = 0

x� E[x] ⇡ 0

y � E[y] ⇡ 0

Vary Together

y � E[y] = �2.8

x� E[x] = �1.1

(x� E[x])(y � E[y]) ⇡ 3.1

• Say X and Y are arbitrary random variables• Covariance of X and Y:

])][])([[(),(Cov YEYXEXEYX --=

The Dance of the Covariance

x y (x – E[X])(y – E[Y])p(x,y)

Above mean

Positive

Bellow mean

Positive

Bellow mean

Above mean

Negative

Above mean

Bellow mean

Negative

• Say X and Y are arbitrary random variables• Covariance of X and Y:

• Equivalently:

§ X and Y independent, E[XY] = E[X]E[Y] à Cov(X,Y) = 0§ But Cov(X,Y) = 0 does not imply X and Y independent!

])][])([[(),(Cov YEYXEXEYX --=

]][][][][[),(Cov XEYEYXEYXEXYEYX +--=

][][][][][][][ YEXEYEXEYEXEXYE +--=

][][][ YEXEXYE -=

The Dance of the Covariance

• Consider the following data:Weight Height Weight * Height

64 57 364871 59 418953 49 259767 62 415455 51 280558 50 290077 55 423557 48 273656 42 235251 42 214276 61 463668 57 3876

E[W] = 62.75

E[H] = 52.75

E[W*H] = 3355.83

3035404550556065

40 45 50 55 60 65 70 75 80

Height

Weight

Cov(W, H) = E[W*H] – E[W]E[H]= 3355.83 – (62.75)(52.75)= 45.77

Covariance and Data

CovarianceSocrative: (a) positive, (b) negative, (c) zero

Positive

• X and Y are random variables with PMF:

§ E[X] = -1(1/3) + 0(1/3) + 1(1/3) = 0§ E[Y] = 0(2/3) + 1(1/3) = 1/3§ Since XY = 0, E[XY] = 0§ Cov(X, Y) = E[XY] – E[X]E[Y] = 0 – 0 = 0

• But, X and Y are clearly dependent!

XY -1 0 1 pY(y)

0 1/3 0 1/3 2/3

1 0 1/3 0 1/3

pX(x) 1/3 1/3 1/3 1

îíì ¹

= otherwise 1

0 if 0 XY

Independence and Covariance

• Say X and Y are arbitrary random variables§

• Covariance of sums of random variables§ X1, X2, …, Xn and Y1, Y2, …, Ym are random variables

),(Cov),(Cov XYYX =

)(Var][][][),(Cov 2 XXEXEXEXX =-=

),(Cov),(Cov YXaYbaX =+

åååå= ===

=÷÷ø

öççè

ii YXY,X

1 111),(Cov Cov

Properties of Covariance

• Let IA and IB be indicators for events A and B

§ E[IA] = P(A), E[IB] = P(B), E[IAIB] = P(AB)§ Cov(IA, IB) = E[IAIB] – E[IA] E[IB]

= P(AB) – P(A)P(B)= P(A | B)P(B) – P(A)P(B)= P(B)[P(A | B) – P(A)]

§ Cov(IA, IB) determined by P(A | B) – P(A)§ P(A | B) > P(A) Þ r(IA, IB) > 0§ P(A | B) = P(A) Þ r(IA, IB) = 0 (and Cov(IA, IB) = 0)§ P(A | B) < P(A) Þ r(IA, IB) < 0

îíì

=otherwise 0

occurs if 1 AIA

îíì

=otherwise 0

occurs if 1 BIB

Do Indicators Covary?

• Consider rolling a 6-sided die§ Let indicator variable X = 1 if roll is 1, 2, 3, or 4§ Let indicator variable Y = 1 if roll is 3, 4, 5, or 6

• What is Cov(X, Y)?§ E[X] = 2/3 and E[Y] = 2/3

§ E[XY] =

= (0 * 0) + (0 * 1/3) + (0 * 1/3) + (1 * 1/3) = 1/3

§ Cov(X, Y) = E[XY] – E[X]E[Y] = 1/3 – 4/9 = -1/9§ Consider: P(X = 1) = 2/3 and P(X = 1 | Y = 1) = 1/2

o Observing Y = 1 makes X = 1 less likely

ååx y

yxpxy ),(

Example of Covariance

Correlation

• Consider the following data:Weight Height Weight * Height

64 57 364871 59 418953 49 259767 62 415455 51 280558 50 290077 55 423557 48 273656 42 235251 42 214276 61 463668 57 3876

E[W] = 62.75

E[H] = 52.75

E[W*H] = 3355.83

3035404550556065

40 45 50 55 60 65 70 75 80

Height

Weight

Cov(W, H) = E[W*H] – E[W]E[H]= 3355.83 – (62.75)(52.75)= 45.77

What is Wrong With This?

• Say X and Y are arbitrary random variables§ Correlation of X and Y, denoted r(X, Y):

§ Note: -1 £ r(X, Y) £ 1 § Correlation measures linearity between X and Y§ r(X, Y) = 1 Þ Y = aX + b where a = sy/sx

§ r(X, Y) = -1 Þ Y = aX + b where a = -sy/sx

§ r(X, Y) = 0 Þ absence of linear relationshipo But, X and Y can still be related in some other way!

§ If r(X, Y) = 0, we say X and Y are “uncorrelated”o Note: Independence implies uncorrelated, but not vice versa!

Y)Var(X)Var(),(Cov),( YXYX =r

Viva La Correlatión

Viva La Correlatión1010 students in the UK gave their preferences on everything

from music genres to interests

http://www.aei.org/publication/blog/

Rock Music Vs Oil?

Hubbert Peak Theory

http://www.bbc.com/news/magazine-27537142

Divorce Vs Butter?

If we have time

• Computing Cov(Xi, Xj)§ Indicator Ii(k) = 1 if trial k has outcome i, 0 otherwise

§ When a ¹ b, trial a and b independent:§ When a = b:§ Since trial a cannot have outcome i and j:

Þ Xi and Xj negatively correlated

kii kIX

)( å=

kjj kIX

åå= =

bjiji aIbIXX

))(),((Cov),(Cov

0))(),((Cov =aIbI ji

)]([)]([)]()([))(),((Cov aIEaIEaIaIEaIbI jijiji -=

ii pkIE =)]([

0)]()([ =aIaIE ji

åå===

bajiji aIEaIEaIbIXX

)])([)]([())(),((Cov),(Cov

aji pnppp -=-=å

Covariance and the Multinomial

• Multinomial distributions:§ Count of strings hashed into buckets in hash table§ Number of server requests across machines in cluster§ Distribution of words/tokens in an email§ Etc.

• When m (# outcomes) is large, pi is small§ For equally likely outcomes: pi = 1/m

§ Large m Þ Xi and Xj very mildly negatively correlated§ Poisson paradigm applicable

2),(Cov mn

jiji pnpXX -=-=

Multinomials All Around

Que te vayas bien

Chris Piech CS109, Stanford University

Documents

Transcript of Chris Piech CS109, Stanford University

10: The Normal (Gaussian) Distributionweb.stanford.edu/class/cs109/lectures/10_normal_gaussian...Did not invent Normal distribution but rather popularized it 6 Lisa Yan, CS109, 2020

07: Random Variables II - Stanford Universityweb.stanford.edu/class/archive/cs/cs109/cs109.1208/lectures/07_rvs_ii.pdfLisa Yan, CS109, 2020 Consider an experiment: 𝑛independent

Nested 2 - web.stanford.eduweb.stanford.edu/class/cs106a/lectures/18-Nested... · Piech, CS106A, Stanford University Nested 2 Chris Piech + Mehran Sahami CS106A, Stanford University

Stanford Universitystanford.edu/~cpiech/cs221/handouts/docs/Practice... · 2013. 8. 20. · Author: Chris Piech Created Date: 7/22/2013 12:02:17 PM

Will Monroe July 14, 2017 Mehran Sahami and Chris Piech€¦ · Mehran Sahami and Chris Piech. Announcements: Problem Set 3 (election prediction) Posted yesterday on the course website.

16: Continuous Joint Distributions - Stanford Universityweb.stanford.edu/class/archive/cs/cs109/cs109.1208/...hits at (456.2344132343, 532.1865739012)? The CS109 logo was created by

Animation - web.stanford.eduweb.stanford.edu/.../cs106a.1186/lectures/9-Animation/9-Animation.… · Animation Chris Piech CS106A, Stanford University This is Method Man. He is part

04: Conditional Probability and Bayes › class › archive › cs › cs109 › cs...Lisa Yan, CS109, 2020 Quick slide reference 2 3 Conditional Probability + Chain Rule 04a_conditional

Animation...If you don’t pause, humans won’t be able to see it Piech + Sahami, CS106A, Stanford University Move To Center def main(): # setup canvas = Canvas(CANVAS_WIDTH, CANVAS_HEIGHT)

Lecture 18: Central Limit Theorem - Stanford Universityweb.stanford.edu/class/archive/cs/cs109/cs109.1188/lectures/18_clt.pdfLecture 18: Central Limit Theorem Lisa Yan August 6, 2018.

Expressions - Stanford University · 2020. 6. 29. · Expressions Chris Gregg Based on slides by Chris Piech and Mehran Sahami CS106A, Stanford University

Informatics Education Using Nothing But a Browsercpiech/bio/papers/nothingButA...Informatics Education Using Nothing But a Browser Chris Piech, piech@cs.stanford.edu Department of

Networks in Math & Computer Sciencesnyder/cs109/CS109.MathLect03.GraphIntroduction.pdf · 9/14/15 1 Computer Science Computer Science Networks in Math & Computer Science CS/MA 109

The Poisson distribution - Stanford University · The Poisson distribution Will Monroe July 12, 2017 with materials by Mehran Sahami and Chris Piech

02: Combinatorics - Stanford University€¦ · 02: Combinatorics (live) Lisa Yan April 8, 2020 37. Lisa Yan, CS109, 2020 Reminders: Lecture with •Turn on your camera if you are

Search Engines - web.stanford.edu · Search Engines Chris Piech and Mehran Sahami CS106A, Stanford University. Piech + Sahami, CS106A, Stanford University Housekeeping •Assignment

More Lists - Stanford Universityweb.stanford.edu/class/cs106a/lectures/13-More... · Piech + Sahami, CS106A, Stanford University Diagnostic Assessment Median = 47 Mean = 44.6 Std.

Primitives, Objects, and Heap/Stackweb.stanford.edu/class/archive/cs/cs106a/cs106a.1184/lectures/25... · Primitives, Objects, and Heap/Stack 1 Review 1 Chris Piech CS106A, Stanford

18: Central Limit Theoremweb.stanford.edu/class/archive/cs/cs109/cs109.1202/... · 2019. 11. 13. · Lisa Yan, CS109, 2019 Medicinal Beta •Before being tested, a medicine is believed

07: Variance, Bernoulli, Binomial - Stanford Universityweb.stanford.edu/class/cs109/lectures/07_bernoulli... · 2020. 10. 14. · Lisa Yan and Jerry Cain, CS109, 2020 Consider an