Lecture Week 9 Joint / Multivariate Probability Dists


Transcript of Lecture Week 9 Joint / Multivariate Probability Dists

Page 1: Lecture Week 9 Joint / Multivariate Probability Dists

Lecture Week 9 Joint / Multivariate Probability Dists

Multivariate possible values and associated probabilities

27/11/2012 ST2004 Wk 9 2012 1

Page 2: Lecture Week 9 Joint / Multivariate Probability Dists

Joint Probability Distributions

Random Variable Y: a name

Two lists: (i) possible values y; (ii) associated probabilities

Tabulated, or defined by a formula

Discrete or continuous

Expected Value of Y (or of a function g(●) of Y)

Weighted average of the possible values y (or of g(y)), weighted by the associated probabilities: E[g(Y)] = ∑ g(y) Pr(Y = y)

Summation ∑ (discrete) or integral ∫ (continuous)

Week 8: Y scalar. Week 9: Y multivariate. Illustration: mini-league (Excel)

In this course we only consider discrete, tabulated, multivariate prob dists. The most important multivariate dist is the multivariate Normal; see Tijms, 3rd Ed., Ch. 12.
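As a minimal sketch (in Python, our own choice; the lecture works in Excel), the expected value of a tabulated discrete random variable is just a probability-weighted sum. The values are the prob dist of NA (games won by team A) from the mini-league tables; the helper name `expect` is hypothetical.

```python
def expect(g, values, probs):
    """E[g(Y)] for a tabulated discrete Y: sum of g(y) * Pr(Y = y)."""
    return sum(g(y) * p for y, p in zip(values, probs))


# Tabulated pmf of NA = games won by team A in the mini-league.
values = [0, 1, 2]
probs = [0.24, 0.62, 0.14]

mean = expect(lambda y: y, values, probs)                # E[NA]
var = expect(lambda y: (y - mean) ** 2, values, probs)   # Var[NA] = E[(NA - E[NA])^2]
print(round(mean, 3), round(var, 3))  # 0.9 0.37
```

The same `expect` works for any g, so variance is simply the expectation of g(y) = (y - E[Y])².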

Page 3: Lecture Week 9 Joint / Multivariate Probability Dists

Multivariate Prob Dists

• Marginal, Joint, Conditional Prob Dists

– Discrete, pmf

– Illustration Mini League

• Probs of Composite Events – Probability rules

• (Conditional) Exp Val and Vars – Covariance, Correlation

• Application – Optimal Prediction


Page 4: Lecture Week 9 Joint / Multivariate Probability Dists

Mini-League Multivariate Dists: 3 teams play each other once

Game        Prob of winning
(i)   AB    Pr(A winner) 0.7    Pr(B winner) 0.3
(ii)  BC    Pr(B winner) 0.4    Pr(C winner) 0.6
(iii) AC    Pr(A winner) 0.2    Pr(C winner) 0.8

Games won   Exp Val   Var
A           0.9       0.37
B           0.7       0.45
C           1.4       0.40

Simulation: 1000 reps

Winners of (i)(ii)(iii)   Rel Freq   Prob
ABA                       0.060      0.056
ABC                       0.230      0.224
ACA                       0.061      0.084
ACC                       0.367      0.336
BBA                       0.027      0.024
BBC                       0.091      0.096
BCA                       0.027      0.036
BCC                       0.137      0.144
Total                     1          1

Points for ABC   Rel Freq   Prob
210              0.060      0.056
201              0.061      0.084
120              0.027      0.024
111              0.257      0.260
102              0.367      0.336
021              0.091      0.096
012              0.137      0.144
Total            1          1

Outright winner               Rel Freq   Prob
A                             0.121      0.140
B                             0.118      0.120
C                             0.504      0.480
None (three-way tie, 111)     0.257      0.260
Total                         1          1

Prob dist of games won
Games won by   0       1       2       Total
A              0.240   0.620   0.140   1
B              0.420   0.460   0.120   1
C              0.080   0.440   0.480   1


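Rel Freq columns come from simulation; Prob columns come from theory, using, e.g., the independence of the three games:

Pr(Winners of (i), (ii), (iii) are A, B, A, resp.)

= Pr( A wins (i) AND B wins (ii) AND A wins (iii) )

= Pr(A wins (i)) × Pr(B wins (ii)) × Pr(A wins (iii))

= 0.7 × 0.4 × 0.2 = 0.056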
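A minimal Python sketch of the theory columns (Python and the names `GAMES`, `winners_dist`, `points_dist` are illustrative choices, not from the lecture): enumerate all 2 × 2 × 2 = 8 combinations of winners, multiply the per-game probs (assuming independent games, as above), then collapse each winner string to points for A, B, C.

```python
from collections import defaultdict
from itertools import product

# (first team, second team, Pr(first team wins)) for games (i), (ii), (iii).
GAMES = [("A", "B", 0.7), ("B", "C", 0.4), ("A", "C", 0.2)]


def winners_dist(games):
    """Pr of each string of winners, assuming the games are independent."""
    dist = {}
    options = [[(a, p), (b, 1 - p)] for a, b, p in games]
    for combo in product(*options):
        winners = "".join(team for team, _ in combo)
        prob = 1.0
        for _, q in combo:
            prob *= q                      # independence: multiply per-game probs
        dist[winners] = prob
    return dist


def points_dist(wdist, teams="ABC"):
    """Collapse winner strings to points for A, B, C (e.g. 'ABA' -> '210')."""
    pts = defaultdict(float)
    for winners, prob in wdist.items():
        pts["".join(str(winners.count(t)) for t in teams)] += prob
    return dict(pts)


W = winners_dist(GAMES)
P = points_dist(W)
for w, p in W.items():
    print(w, round(p, 3))
for k in sorted(P, reverse=True):
    print(k, round(P[k], 3))
```

The printed probs should match the Prob columns of the Winners and Points tables; note that 111 collects two winner strings (ABC and BCA).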

Page 5: Lecture Week 9 Joint / Multivariate Probability Dists


Challenge: obtain these directly. E.g., what must be true if and only if NA = 1?


Answers
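For comparison with the Rel Freq columns, a small Monte Carlo sketch in Python (the function names and the seeds are arbitrary assumptions; any particular run will not reproduce the lecture's exact frequencies). Each game is simulated independently, the first-listed team winning with its stated prob.

```python
import random
from collections import Counter

GAMES = [("A", "B", 0.7), ("B", "C", 0.4), ("A", "C", 0.2)]


def simulate_winners(games, rng):
    """One simulated league: the first team of each game wins with prob p."""
    return "".join(a if rng.random() < p else b for a, b, p in games)


def relative_freqs(n_reps, seed=None):
    """Relative frequency of each winner string over n_reps simulated leagues."""
    rng = random.Random(seed)
    counts = Counter(simulate_winners(GAMES, rng) for _ in range(n_reps))
    return {w: counts[w] / n_reps for w in sorted(counts)}


freqs = relative_freqs(1000, seed=1)   # seed chosen arbitrarily
for w, f in freqs.items():
    print(w, f)
```

With 1000 reps the frequencies typically differ from the theoretical probs in the second decimal place, as in the tables; more reps shrink the gap.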

Page 6: Lecture Week 9 Joint / Multivariate Probability Dists

Probabilities and Events

Events

Elementary: A wins match (i); T/F

Composite: A wins league; T/F

NANBNC = 021; T/F. Shorthand for ( (A wins 0) AND (B wins 2) AND (C wins 1) )

Probabilities: a value in [0, 1] for the status (T/F) of an event

Represents uncertainty given information, e.g.

Pr(A wins league, given probs for each match, and independence)

Pr(A wins league, given that somebody wins, and probs, and independence)

Pr(A wins league, given no info other than probs and independence)

Page 7: Lecture Week 9 Joint / Multivariate Probability Dists

Composite Events

• Defining Events

X = (NA=0) where NA= games won by A

• Decomposing Composite Events

X = (NA=0) ≡ (A loses (i)) AND (A loses (iii))

X = (Team A tops the table)

Recursion for K teams

Score from k dice

Can’t simulate? Events not defined

Event Identity. Boolean operator
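One way to make "events defined well enough to simulate" concrete: represent each event as the set of elementary outcomes (winner strings) on which it is True, and check an event identity by comparing sets. A Python sketch with hypothetical names `OUTCOMES` and `event`:

```python
from itertools import product

# All 8 elementary outcomes: winners of (i) A v B, (ii) B v C, (iii) A v C.
OUTCOMES = ["".join(w) for w in product("AB", "BC", "AC")]


def event(predicate):
    """An event, represented as the set of outcomes on which it is True."""
    return {w for w in OUTCOMES if predicate(w)}


na_is_0 = event(lambda w: w.count("A") == 0)
a_loses_i_and_iii = event(lambda w: w[0] != "A" and w[2] != "A")
a_tops = event(lambda w: all(w.count("A") >= w.count(t) for t in "BC"))

print(na_is_0 == a_loses_i_and_iii)  # True: the event identity holds
print(sorted(a_tops))  # ['ABA', 'ABC', 'ACA', 'BCA']
```

Here "A tops the table" counts ties at the top; a different convention would define a different event.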


Page 8: Lecture Week 9 Joint / Multivariate Probability Dists

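Dice Recursion: Explanation via Event Identity

$S_k$ = total score from k dice; $\mathrm{Die}_k$ = score on the k-th die.

$$(S_k = r) \equiv \big[(S_{k-1} = r-1) \text{ AND } (\mathrm{Die}_k = 1)\big] \text{ OR } \dots \text{ OR } \big[(S_{k-1} = r-6) \text{ AND } (\mathrm{Die}_k = 6)\big]$$

The six alternatives are mutually exclusive, so their probs add; $S_{k-1}$ depends only on the first k-1 dice, so it is independent of $\mathrm{Die}_k$:

$$\begin{aligned}
\Pr(S_k = r) &= \Pr\big((S_{k-1} = r-1) \text{ AND } (\mathrm{Die}_k = 1)\big) + \dots + \Pr\big((S_{k-1} = r-6) \text{ AND } (\mathrm{Die}_k = 6)\big) \\
&= \Pr(S_{k-1} = r-1)\Pr(\mathrm{Die}_k = 1) + \dots + \Pr(S_{k-1} = r-6)\Pr(\mathrm{Die}_k = 6) \\
&= \big(\Pr(S_{k-1} = r-1) + \dots + \Pr(S_{k-1} = r-6)\big)/6
\end{aligned}$$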
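The recursion translates almost line for line into code. A Python sketch (the function name `dice_sum_dist` is ours) using exact fractions, starting from $S_0 = 0$ with certainty:

```python
from fractions import Fraction


def dice_sum_dist(k):
    """Pr(S_k = r) for the total S_k of k fair dice, using
    Pr(S_k = r) = (Pr(S_{k-1} = r-1) + ... + Pr(S_{k-1} = r-6)) / 6."""
    dist = {0: Fraction(1)}                 # S_0 = 0 with certainty
    for _ in range(k):
        lo, hi = min(dist) + 1, max(dist) + 6
        # The comprehension reads the old dist before it is replaced.
        dist = {r: sum(dist.get(r - j, Fraction(0)) for j in range(1, 7)) / 6
                for r in range(lo, hi + 1)}
    return dist


two = dice_sum_dist(2)
print(two[7], two[12])  # 1/6 1/36
```

Using `Fraction` avoids rounding, so the probabilities sum to exactly 1 at every step.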

Page 9: Lecture Week 9 Joint / Multivariate Probability Dists

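Probability Rules; more than one event

For generic events X, Y: $0 \le \Pr(X), \Pr(Y) \le 1$

$$\Pr(X \text{ OR } Y) = \Pr(X) + \Pr(Y) - \Pr(X \text{ AND } Y)$$

$$\Pr(X \text{ AND } Y) = \Pr(X \mid Y)\Pr(Y) = \Pr(Y \mid X)\Pr(X)$$

where $\Pr(X \mid Y)$ is the prob of X given Y True; equivalently $\Pr(X \mid Y) = \Pr(X \text{ AND } Y)/\Pr(Y)$ for $\Pr(Y) > 0$.

Special cases

• T (certain) and F (impossible): $\Pr(T) = 1$; $\Pr(F) = 0$

• X AND Y ≡ F (X, Y mutually exclusive, disjoint): $\Pr(X \text{ OR } Y) = \Pr(X) + \Pr(Y)$

• If $X_1, \dots, X_k$ are exhaustive, i.e. $X_1 \text{ OR } X_2 \text{ OR } \dots \text{ OR } X_k \equiv T$, and mutually exclusive, then $\Pr(X_1) + \Pr(X_2) + \dots + \Pr(X_k) = 1$

• $\Pr(X \mid Y) = \Pr(X)$: X, Y independent; then $\Pr(X \text{ AND } Y) = \Pr(X)\Pr(Y)$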
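A quick numerical check of these rules on the mini-league, written as a Python sketch (the helpers `pr` and `pr_given` are hypothetical names). Events are predicates on the winner string; X = "A wins (i)" and Y = "A wins (iii)" are independent because the games are.

```python
from itertools import product

GAMES = [("A", "B", 0.7), ("B", "C", 0.4), ("A", "C", 0.2)]

# Pr of each elementary outcome (string of winners); games independent.
DIST = {}
for combo in product(*[[(a, p), (b, 1 - p)] for a, b, p in GAMES]):
    DIST["".join(t for t, _ in combo)] = combo[0][1] * combo[1][1] * combo[2][1]


def pr(x):
    """Pr(X) for an event given as a predicate on the outcome."""
    return sum(p for w, p in DIST.items() if x(w))


def pr_given(x, y):
    """Pr(X | Y) = Pr(X AND Y) / Pr(Y); requires Pr(Y) > 0."""
    return pr(lambda w: x(w) and y(w)) / pr(y)


def a_wins_i(w):
    return w[0] == "A"


def a_wins_iii(w):
    return w[2] == "A"


p_or = pr(lambda w: a_wins_i(w) or a_wins_iii(w))
p_and = pr(lambda w: a_wins_i(w) and a_wins_iii(w))
print(round(p_or, 3), round(pr(a_wins_i) + pr(a_wins_iii) - p_and, 3))  # 0.76 0.76
print(round(pr_given(a_wins_i, a_wins_iii), 3), round(pr(a_wins_i), 3))  # 0.7 0.7
```

The second line illustrates independence: conditioning on Y leaves the prob of X unchanged.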

Page 10: Lecture Week 9 Joint / Multivariate Probability Dists

Mini-League Event Identities

Events:

(A OR B tops table) leads to Pr(A OR B tops table)

(A OR B wins league) leads to …

(C does not top table)

Events for the (marg) prob dist of NA

Events for the (joint) prob dist of NA, NC

Page 11: Lecture Week 9 Joint / Multivariate Probability Dists

Mini-League Prob Rules using Event Identities

Probs from explicit event identities:

Pr(A OR B tops table)

Pr(A OR B wins league)

Pr(C does not top table)

(Marg) prob dist of NA

(Joint) prob dist of NA, NC

Prob dist of games won
Games won by   0       1       2       Total
A              0.240   0.620   0.140   1
B              0.420   0.460   0.120   1
C              0.080   0.440   0.480   1

Points for ABC   Prob
210              0.056
201              0.084
120              0.024
111              0.260
102              0.336
021              0.096
012              0.144

Page 12: Lecture Week 9 Joint / Multivariate Probability Dists

Probs from explicit event identities for more elementary events (points for A, B, C):

Pr(A OR B tops table) = Pr(210) + Pr(201) + Pr(111) + Pr(120) + Pr(021) = 0.520
= Pr(A tops) + Pr(B tops) - Pr(both top) = 0.400 + 0.380 - 0.260

Pr(A OR B wins league outright) = Pr(210) + Pr(201) + Pr(120) + Pr(021) = 0.260

Pr(C does not top table) = Pr(A OR B wins league outright) = 1 - Pr(C tops table) = 0.260

(Marg) prob dist of NA:
Pr(NA = 0) = Pr(021) + Pr(012) = 0.240
Pr(NA = 1) = Pr(120) + Pr(111) + Pr(102) = 0.620
Pr(NA = 2) = Pr(210) + Pr(201) = 0.140

(Joint) prob dist of NA, NC: e.g. Pr(NA = 1, NC = 1) = Pr(111) = 0.260

Answers. In all cases several alternative constructions are possible.

Mini-League Prob Rules using Event Identities
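The same answers can be checked by summing the points distribution over the outcomes in each event. A Python sketch (helper names `pr`, `tops`, `wins_outright` are hypothetical; ties at the top count as "tops", as in the answers above):

```python
# Points distribution for (A, B, C) from the mini-league table.
POINTS = {"210": 0.056, "201": 0.084, "120": 0.024, "111": 0.260,
          "102": 0.336, "021": 0.096, "012": 0.144}
TEAMS = "ABC"


def pr(event):
    """Pr(event), where event is a predicate on the tuple (NA, NB, NC)."""
    return sum(p for k, p in POINTS.items() if event(tuple(int(c) for c in k)))


def tops(team):
    i = TEAMS.index(team)
    return lambda n: n[i] == max(n)                  # ties at the top count


def wins_outright(team):
    i = TEAMS.index(team)
    return lambda n: n[i] == max(n) and n.count(max(n)) == 1


p_ab_tops = pr(lambda n: tops("A")(n) or tops("B")(n))
p_ab_outright = pr(lambda n: wins_outright("A")(n) or wins_outright("B")(n))
p_c_not_top = 1 - pr(tops("C"))
marg_na = [pr(lambda n, k=k: n[0] == k) for k in range(3)]
p_na1_nc1 = pr(lambda n: n[0] == 1 and n[2] == 1)
print(round(p_ab_tops, 3), round(p_ab_outright, 3), round(p_c_not_top, 3))
print([round(q, 3) for q in marg_na], round(p_na1_nc1, 3))
```

Comparing `p_c_not_top` with `p_ab_outright` confirms the identity (C does not top table) ≡ (A OR B wins league outright), since points always total 3.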

Page 13: Lecture Week 9 Joint / Multivariate Probability Dists

Mini-League Joint Probabilities

Knowing that (conditional on) C wins 1 game: (marg) prob dist of NA

(Joint) prob dist of NA, NC

[Figure: scatter plot of the joint dist of AB points; 'jitter' is added as a way of showing that the 9 red dots do not carry equal probabilities.]

Answers

Joint dist of AB points (rows NA, columns NB); Cov = -0.21
          NB = 0   NB = 1   NB = 2   Total
NA = 0    0.000    0.144    0.096    0.24
NA = 1    0.336    0.260    0.024    0.62
NA = 2    0.084    0.056    0.000    0.14
Total     0.42     0.46     0.12     1

Page 14: Lecture Week 9 Joint / Multivariate Probability Dists

Knowing that (conditional on) C wins 0 games: (marg) prob dist of NA

(Joint) prob dist of NA, NB

Simulated counts of AB points given NC = 0 (rows NA, columns NB)
          NB = 0   NB = 1   NB = 2   Total
NA = 0    0        0        0        0
NA = 1    0        0        27       27
NA = 2    0        60       0        60
Total     0        60       27       87

Frequencies: in the sample of 1000 sims, 87 have C winning 0 games. Of these, 27 have NA = 1, NB = 2.
Rel freq of (NA = 1, NB = 2, NC = 0) = 27/1000
Among the cases where NC = 0: rel freq of (NA = 1, NB = 2) = 27/87

Mini-League Conditional Probability Dists
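The theoretical counterpart of 27/87 is a conditional probability: keep only the outcomes in the conditioning event and divide by its prob. A Python sketch (the helper `condition_on` is a hypothetical name) applied to NC = 0:

```python
POINTS = {"210": 0.056, "201": 0.084, "120": 0.024, "111": 0.260,
          "102": 0.336, "021": 0.096, "012": 0.144}


def condition_on(dist, given):
    """Return Pr(given) and the conditional pmf Pr(outcome | given)."""
    p_given = sum(p for k, p in dist.items() if given(k))
    return p_given, {k: p / p_given for k, p in dist.items() if given(k)}


p_c0, given_c0 = condition_on(POINTS, lambda k: k[2] == "0")   # NC = 0
print(round(p_c0, 3), {k: round(v, 3) for k, v in given_c0.items()})
# Compare with the simulated rel freqs given NC = 0: 60/87 and 27/87.
print(round(60 / 87, 3), round(27 / 87, 3))
```

Theory gives Pr(NA = 2, NB = 1 | NC = 0) = 0.056/0.080 = 0.7 and Pr(NA = 1, NB = 2 | NC = 0) = 0.3, close to the simulated 0.69 and 0.31.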

Page 15: Lecture Week 9 Joint / Multivariate Probability Dists

Knowing that (conditional on) C wins at least 1 game: (marg) prob dist of NA

(Joint) prob dist of NA, NB

Winners of (i)(ii)(iii)   Rel Freq   Prob
ABA                       0.060      0.056
ABC                       0.230      0.224
ACA                       0.061      0.084
ACC                       0.367      0.336
BBA                       0.027      0.024
BBC                       0.091      0.096
BCA                       0.027      0.036
BCC                       0.137      0.144

Mini-League Conditional Probability Dists

Page 16: Lecture Week 9 Joint / Multivariate Probability Dists

Joint dist of AB points given NC > 0 (rows NA, columns NB); Pr(NC > 0) = 0.92
          NB = 0   NB = 1   NB = 2   Total
NA = 0    0.00     0.16     0.10     0.26
NA = 1    0.37     0.28     0.00     0.65
NA = 2    0.09     0.00     0.00     0.09
Total     0.46     0.44     0.10     1

E[NA | NC > 0] = 0.83    Var[NA | NC > 0] = 0.32
E[NB | NC > 0] = 0.65    Var[NB | NC > 0] = 0.44
E[NA NB | NC > 0] = 0.28    Cov[NA, NB | NC > 0] = -0.26

Knowing only the elementary probs (prob dist of NA; prob dist of NA, NB):
E[NA] = 0.9, Var[NA] = 0.37, Cov[NA, NB] = -0.21

Knowing additionally that C wins at least one game:
E[NA | NC > 0] = 0.83, Var[NA | NC > 0] = 0.32, Cov[NA, NB | NC > 0] = -0.26

Joint dist of AB points, unconditional (rows NA, columns NB); Cov = -0.21
          NB = 0   NB = 1   NB = 2   Total
NA = 0    0.000    0.144    0.096    0.24
NA = 1    0.336    0.260    0.024    0.62
NA = 2    0.084    0.056    0.000    0.14
Total     0.42     0.46     0.12     1

Mini-League Expected Values, Variance, Covariance
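A Python sketch of how both sets of numbers can be computed from the same points table (the helper `moments_ab` is a hypothetical name): the conditional moments are ordinary moments of the conditional pmf, obtained by dropping outcomes with NC = 0 and rescaling by Pr(NC > 0).

```python
POINTS = {"210": 0.056, "201": 0.084, "120": 0.024, "111": 0.260,
          "102": 0.336, "021": 0.096, "012": 0.144}


def moments_ab(dist):
    """E, Var of NA and NB, and Cov[NA, NB], for a pmf keyed by 'abc' points."""
    def e(g):
        return sum(g(*map(int, k)) * p for k, p in dist.items())
    ea, eb = e(lambda a, b, c: a), e(lambda a, b, c: b)
    return {"E[NA]": ea, "Var[NA]": e(lambda a, b, c: a * a) - ea ** 2,
            "E[NB]": eb, "Var[NB]": e(lambda a, b, c: b * b) - eb ** 2,
            "Cov[NA,NB]": e(lambda a, b, c: a * b) - ea * eb}


p_c_pos = sum(p for k, p in POINTS.items() if k[2] != "0")          # Pr(NC > 0)
given_c_pos = {k: p / p_c_pos for k, p in POINTS.items() if k[2] != "0"}

uncond = moments_ab(POINTS)
cond = moments_ab(given_c_pos)
print(round(p_c_pos, 3))
for name in uncond:
    print(name, round(uncond[name], 2), round(cond[name], 2))
```

The variances and covariance use the shortcut Var = E[N²] - (E[N])² and Cov = E[NA NB] - E[NA]E[NB].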

Page 17: Lecture Week 9 Joint / Multivariate Probability Dists

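Exp Val, Variance for Sums

$$E[X+Y] = \sum_{\text{all } x,\,y} (x+y)\Pr(X=x, Y=y) = E[X] + E[Y]$$

$$\mathrm{Var}[X+Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}[X,Y]$$

E.g. $E[N_A + N_B]$ and $\mathrm{Var}[N_A + N_B]$.

More general:

$$E[aX + bY + cZ + \dots] = aE[X] + bE[Y] + cE[Z] + \dots$$

$$\mathrm{Var}[aX + bY + cZ + \dots] = a^2\mathrm{Var}[X] + b^2\mathrm{Var}[Y] + c^2\mathrm{Var}[Z] + \dots + 2ab\,\mathrm{Cov}[X,Y] + 2ac\,\mathrm{Cov}[X,Z] + 2bc\,\mathrm{Cov}[Y,Z] + \dots$$

Theory: Tijms, Ch. 9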
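A Python sketch checking the sum formulas on NA + NB, computing the answer two ways: directly from the distribution of the sum, and from the formulas using E, Var and Cov of the parts. The helper `e` is a hypothetical name.

```python
POINTS = {"210": 0.056, "201": 0.084, "120": 0.024, "111": 0.260,
          "102": 0.336, "021": 0.096, "012": 0.144}


def e(g):
    """E[g(NA, NB, NC)] over the points distribution."""
    return sum(g(*map(int, k)) * p for k, p in POINTS.items())


mean_a, mean_b = e(lambda a, b, c: a), e(lambda a, b, c: b)
var_a = e(lambda a, b, c: (a - mean_a) ** 2)
var_b = e(lambda a, b, c: (b - mean_b) ** 2)
cov_ab = e(lambda a, b, c: (a - mean_a) * (b - mean_b))

# Directly from the distribution of the sum NA + NB ...
mean_sum = e(lambda a, b, c: a + b)
var_sum = e(lambda a, b, c: (a + b - mean_sum) ** 2)
# ... and from E[X+Y] = E[X] + E[Y], Var[X+Y] = Var X + Var Y + 2 Cov[X, Y].
print(round(mean_sum, 3), round(mean_a + mean_b, 3))
print(round(var_sum, 3), round(var_a + var_b + 2 * cov_ab, 3))
```

A useful cross-check: the three games give 3 points in total, so NA + NB = 3 - NC and Var[NA + NB] = Var[NC] = 0.40, even though Var[NA] + Var[NB] = 0.82; the covariance term 2(-0.21) makes up the difference.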

Page 18: Lecture Week 9 Joint / Multivariate Probability Dists

Expected Values Directly via Sums

From the prob dist of NA: $E[N_A] = 0.9$, $\mathrm{Var}[N_A] = 0.37$

Alternative Simpler Calculation

$N_A$ = (# wins against B) + (# wins against C) $= I_{A\text{ plays }B} + I_{A\text{ plays }C}$, where each I is a binary var: 1 if A wins that game, 0 otherwise.

Dist of $I_{A\text{ plays }B}$: poss values 0 / 1, probs 0.3 / 0.7

$E[I_{A\text{ plays }B}] = 0.7$; $\mathrm{Var}[I_{A\text{ plays }B}] = 0.3 \times 0.7 = 0.21$

$E[I_{A\text{ plays }C}] = 0.2$; $\mathrm{Var}[I_{A\text{ plays }C}] = 0.2 \times 0.8 = 0.16$

$$E[N_A] = E[I_{A\text{ plays }B}] + E[I_{A\text{ plays }C}] = 0.7 + 0.2 = 0.9$$

$$\mathrm{Var}[N_A] = \mathrm{Var}[I_{A\text{ plays }B}] + \mathrm{Var}[I_{A\text{ plays }C}] = 0.21 + 0.16 = 0.37$$

(the games are independent, so the covariance term is 0)

Similarly:

$E[N_B] = 0.3 + 0.4 = 0.7$; $\mathrm{Var}[N_B] = 0.3 \times 0.7 + 0.4 \times 0.6 = 0.45$

$E[N_C] = 0.6 + 0.8 = 1.4$; $\mathrm{Var}[N_C] = 0.6 \times 0.4 + 0.8 \times 0.2 = 0.40$

Theory follows
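The indicator calculation in a few lines of Python (a sketch; `WIN_PROBS` and `indicator_moments` are hypothetical names). Each team's games won is a sum of two independent binary vars, so means add and variances add.

```python
# Pr(team wins) in each of its two games; games assumed independent.
WIN_PROBS = {"A": [0.7, 0.2],   # A v B, A v C
             "B": [0.3, 0.4],   # B v A, B v C
             "C": [0.6, 0.8]}   # C v B, C v A


def indicator_moments(p):
    """Binary I with Pr(I = 1) = p: E[I] = p, Var[I] = p(1 - p)."""
    return p, p * (1 - p)


results = {}
for team, ps in WIN_PROBS.items():
    moms = [indicator_moments(p) for p in ps]
    mean = sum(m for m, _ in moms)       # E of a sum = sum of E's
    var = sum(v for _, v in moms)        # independent: Cov = 0, so Vars add
    results[team] = (mean, var)
    print(team, round(mean, 2), round(var, 2))
```

This reproduces the Exp Val and Var table for A, B, C without building any joint distribution.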

Page 19: Lecture Week 9 Joint / Multivariate Probability Dists

Knowing additionally that C wins at least one game: cond prob dist of NA; cond prob dist of NA, NB

E[NA | NC > 0] = 0.83, Var[NA | NC > 0] = 0.32, Cov[NA, NB | NC > 0] = -0.26

Mini-League Conditional Expectations

Answers

Joint Bivariate Conditional Prob Dists

Joint dist of AB points given NC > 0 (rows NA, columns NB); Pr(NC > 0) = 0.920
          NB = 0   NB = 1   NB = 2   Total
NA = 0    0.000    0.157    0.104    0.261
NA = 1    0.365    0.283    0.000    0.648
NA = 2    0.091    0.000    0.000    0.091
Total     0.457    0.439    0.104    1

E[NA | NC > 0] = 0.83    Var[NA | NC > 0] = 0.32
E[NB | NC > 0] = 0.65    Var[NB | NC > 0] = 0.44
Cov[NA, NB | NC > 0] = -0.26

Page 20: Lecture Week 9 Joint / Multivariate Probability Dists

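Conditionally Decomposing Expectation

Roll a die:

• Odd (1, 3, 5): roll another die; Y = score

• 2 or 4: roll 2 dice; Y = max

• 6: toss a coin; Head: Y = 2; Tail: Y = 6

$$E[Y] = \tfrac{1}{2}(3.5) + \tfrac{1}{3}(4.47) + \tfrac{1}{6}\Big(\tfrac{1}{2}\cdot 2 + \tfrac{1}{2}\cdot 6\Big) = 3.907$$

where 3.5 = E[score of one die] and 4.47 = E[max of 2 dice].

$\mathrm{Var}[Y]$ follows similarly, via $\mathrm{Var}[Y] = E[Y^2] - (E[Y])^2$, decomposing $E[Y^2]$ over the same three cases.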
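An exact version of this decomposition as a Python sketch (names are ours). Each branch has its own dist of Y; E[Y] and E[Y²] are prob-weighted averages of the branch moments, and Var[Y] follows. Pr(max of 2 dice = m) = (2m - 1)/36.

```python
from fractions import Fraction as F

# Distributions of Y under each branch of the game.
one_die = {y: F(1, 6) for y in range(1, 7)}
max_two = {m: F(2 * m - 1, 36) for m in range(1, 7)}   # Pr(max of 2 dice = m)
coin = {2: F(1, 2), 6: F(1, 2)}

# (Pr(branch), dist of Y given branch): odd roll, roll 2 or 4, roll 6.
BRANCHES = [(F(1, 2), one_die), (F(1, 3), max_two), (F(1, 6), coin)]


def moment(dist, k):
    """E[Y^k] for a dist given as {value: prob}."""
    return sum(y ** k * p for y, p in dist.items())


# E[Y] = sum over branches of Pr(branch) * E[Y | branch]; same for E[Y^2].
ey = sum(w * moment(d, 1) for w, d in BRANCHES)
ey2 = sum(w * moment(d, 2) for w, d in BRANCHES)
var_y = ey2 - ey ** 2
print(round(float(moment(max_two, 1)), 3), round(float(ey), 3), round(float(var_y), 3))  # 4.472 3.907 2.973
```

Note that Var[Y] is not the weighted average of the branch variances; the decomposition has to be done on E[Y²] first.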

Page 21: Lecture Week 9 Joint / Multivariate Probability Dists

Conditionally Decomposing Expectation: Simulation (10,000 reps)

Option chosen by a die roll: roll (1, 3, 5) ⇒ option 1, score of one die; roll (2, 4) ⇒ option 2, max of two dice; roll (6) ⇒ option 3, 2 or 6 with equal prob. Hence Y = value of the chosen option.

Rep     1: One die   2: Max two dice   3: (2,6) eq prob   Roll   Option   Y
1       1            3                 6                  6      3        6
2       6            2                 6                  1      1        6
3       4            6                 2                  4      2        6
4       3            2                 2                  4      2        2
5       3            2                 2                  2      2        2
6       5            5                 2                  3      1        5
7       5            3                 2                  1      1        5
8       5            5                 6                  2      2        5
9       3            5                 6                  1      1        3
10      4            2                 2                  6      3        2
…
9999    4            5                 6                  4      2        5
10000   1            6                 6                  6      3        6

Option        1       2       3
Probs         0.5     0.333   0.167
Avg Y         3.48    4.48    3.98
Avg of Sq     15.01   22.01   19.87

Avg Y: 3.89; weighted avg over options: 3.90
Avg Y²: 18.14; weighted avg (sq) over options: 18.15
Var Y: 3.00; hence Var: 2.97
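A Python sketch of the same simulation (the seed and the function name `play` are arbitrary choices, so the numbers will differ slightly from the spreadsheet). It also shows that weighting each option's average by that option's observed frequency reproduces the overall average exactly.

```python
import random


def play(rng):
    """One play of the game: returns (option, Y)."""
    roll = rng.randint(1, 6)
    if roll % 2 == 1:                                   # 1, 3, 5
        return 1, rng.randint(1, 6)
    if roll in (2, 4):
        return 2, max(rng.randint(1, 6), rng.randint(1, 6))
    return 3, rng.choice([2, 6])                        # 6: coin toss


rng = random.Random(2012)   # arbitrary seed
reps = [play(rng) for _ in range(10_000)]
ys = [y for _, y in reps]
avg = sum(ys) / len(ys)
var = sum(y * y for y in ys) / len(ys) - avg ** 2

# Decompose: weighted average (by option frequency) of per-option averages.
groups = {o: [y for opt, y in reps if opt == o] for o in (1, 2, 3)}
weighted = sum(len(g) / len(ys) * (sum(g) / len(g)) for g in groups.values())
print(round(avg, 3), round(weighted, 3), round(var, 2))
```

If the theoretical option probs (0.5, 1/3, 1/6) are used as weights instead of the observed frequencies, the weighted average differs slightly from the direct average, which is one plausible reason for small gaps such as 3.89 vs 3.90.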

Page 22: Lecture Week 9 Joint / Multivariate Probability Dists

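Conditionally Decomposing Expectation: Theory

$$E[X] = \sum_x x\Pr(X = x) = \sum_x x\sum_y \Pr(X = x, Y = y) = \sum_y \Big[\sum_x x\Pr(X = x \mid Y = y)\Big]\Pr(Y = y) = \sum_y E[X \mid Y = y]\Pr(Y = y)$$

using $\Pr(X = x, Y = y) = \Pr(X = x \mid Y = y)\Pr(Y = y)$.

Illustration with X = NB, Y = NA:

Jt dist AB points
              NB = 0   NB = 1   NB = 2   Total
NA = 0        0.000    0.144    0.096    0.24
NA = 1        0.336    0.260    0.024    0.62
NA = 2        0.084    0.056    0.000    0.14
Marg dist B   0.42     0.46     0.12     1

Cond dists of NB given NA; ExpVal of NB given NA; mult by Pr(NA)
          NB = 0   NB = 1   NB = 2   Total   E[NB | NA]   × Pr(NA)
NA = 0    0        0.6      0.4      1       1.40         0.336
NA = 1    0.542    0.419    0.039    1       0.50         0.308
NA = 2    0.6      0.4      0        1       0.40         0.056

E[NB] = 0.7 from the marg dist of B; = 0.336 + 0.308 + 0.056 = 0.7 by the decomposition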
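The decomposition table in Python (a sketch; `JOINT_AB` is our name for the joint table): for each row NA = a, form the cond dist of NB, take its mean, weight by Pr(NA = a), and add.

```python
# Joint pmf of (NA, NB): rows NA = 0, 1, 2; columns NB = 0, 1, 2.
JOINT_AB = [[0.000, 0.144, 0.096],
            [0.336, 0.260, 0.024],
            [0.084, 0.056, 0.000]]

total = 0.0
for a, row in enumerate(JOINT_AB):
    p_a = sum(row)                                   # Pr(NA = a)
    cond = [p / p_a for p in row]                    # Pr(NB = b | NA = a)
    e_b_given_a = sum(b * q for b, q in enumerate(cond))
    total += e_b_given_a * p_a                       # E[NB | NA = a] Pr(NA = a)
    print(a, [round(q, 3) for q in cond], round(e_b_given_a, 2), round(e_b_given_a * p_a, 3))

marg_b = [sum(row[b] for row in JOINT_AB) for b in range(3)]
direct = sum(b * q for b, q in enumerate(marg_b))
print(round(total, 3), round(direct, 3))
```

Both routes give E[NB] = 0.7, as in the table.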

Page 23: Lecture Week 9 Joint / Multivariate Probability Dists

Conditionally Decomposing Expectation

Roll a die:

• Odd (1, 3, 5): roll another die; Y = score

• 2 or 4: roll 2 dice; Y = max

• 6: toss a coin; Head: Y = 2; Tail: Y = 6

$$E[Y] = \tfrac{1}{2}(3.5) + \tfrac{1}{3}(4.47) + \tfrac{1}{6}(4) = 3.907$$

Choose from A, B, C with equal prob; N = number of games won by the randomly chosen team:

N = NA with prob 1/3; NB with prob 1/3; NC with prob 1/3

$$E[N] = \tfrac{1}{3}(?) + \tfrac{1}{3}(?) + \tfrac{1}{3}(?)$$

Var[N] = ?

Challenge: project

Answers
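A Python sketch of one route to the random-team answer, using the Exp Val and Var table for A, B, C. E[N] and E[N²] are decomposed over the choice of team; as a cross-check we also use the standard identity Var[N] = E[Var(N | team)] + Var(E[N | team]) (the "law of total variance", which the slides do not state explicitly).

```python
# (E[N_team], Var[N_team]) from the mini-league table; each team picked w.p. 1/3.
MOMENTS = {"A": (0.9, 0.37), "B": (0.7, 0.45), "C": (1.4, 0.40)}
w = 1 / 3

e_n = sum(w * m for m, _ in MOMENTS.values())               # E[N] = sum w E[N | team]
e_n2 = sum(w * (v + m ** 2) for m, v in MOMENTS.values())   # E[N^2 | team] = Var + mean^2
var_n = e_n2 - e_n ** 2

# Cross-check: Var[N] = E[Var(N | team)] + Var(E[N | team]).
within = sum(w * v for _, v in MOMENTS.values())
between = sum(w * (m - e_n) ** 2 for m, _ in MOMENTS.values())
print(round(e_n, 3), round(var_n, 3), round(within + between, 3))
```

E[N] = 1 is expected on reflection: three games give three wins spread over three teams.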

Page 24: Lecture Week 9 Joint / Multivariate Probability Dists

Prediction : Theory


Know C wins 1 game; what to predict for NA?

Page 25: Lecture Week 9 Joint / Multivariate Probability Dists

Optimal prediction

Know C wins 1 game; what to predict for NA?

• With Conditional Dist for NA given NC =1

– Most (cond) probable NA: cond mode

– Conditional Exp Val of NA: min pred var (min sq error)

• With Exp Vals, Vars, Cov

– Best Linear Predictor: least sq regression

Pred NA = 1.46 - 0.40 NC

Page 26: Lecture Week 9 Joint / Multivariate Probability Dists

Optimal pred: Cond Dist of X | Y = y

Pred by most probable value: given NC = 1, predict NA as 1.0

Pred by cond exp val: given NC = 1, predict NA as 0.97

Theory: (cond) exp val is the 'min sq error predictor'

All available info about X, given Y, is captured by the cond prob dist of X given Y

The 'min sq error predictor' is the predicted value that minimises E[sq error of prediction]

Game        Prob of winning
(i)   AB    Pr(A winner) 0.7
(ii)  BC    Pr(B winner) 0.4
(iii) AC    Pr(A winner) 0.2

Joint dist of AB points given NC = 1 (rows NA, columns NB); Pr(NC = 1) = 0.440
          NB = 0   NB = 1   NB = 2   Total
NA = 0    0.000    0.000    0.218    0.218
NA = 1    0.000    0.591    0.000    0.591
NA = 2    0.191    0.000    0.000    0.191
Total     0.191    0.591    0.218    1

Dist of NA given NC = 1
NA       0      1      2      Total
Prob     0.22   0.59   0.19   1
Exp Val 0.97; Var 0.41
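Both predictors given NC = 1 come from the cond dist of NA, which is one column of the joint (NA, NC) table rescaled by Pr(NC = 1). A Python sketch (`JOINT_AC` and `cond_dist_na` are hypothetical names):

```python
# Joint pmf of (NA, NC) from the mini-league; cells with prob 0 omitted.
JOINT_AC = {(0, 1): 0.096, (0, 2): 0.144,
            (1, 0): 0.024, (1, 1): 0.260, (1, 2): 0.336,
            (2, 0): 0.056, (2, 1): 0.084}


def cond_dist_na(nc):
    """Pr(NA = a | NC = nc) = Pr(NA = a, NC = nc) / Pr(NC = nc)."""
    cells = {a: p for (a, c), p in JOINT_AC.items() if c == nc}
    p_nc = sum(cells.values())
    return {a: p / p_nc for a, p in sorted(cells.items())}


cond = cond_dist_na(1)
mode = max(cond, key=cond.get)                              # most probable NA
mean = sum(a * q for a, q in cond.items())                  # E[NA | NC = 1]
var = sum((a - mean) ** 2 * q for a, q in cond.items())     # Var[NA | NC = 1]
print({a: round(q, 3) for a, q in cond.items()}, mode, round(mean, 2), round(var, 2))
```

The mode (1) and the cond mean (0.97) are different predictors; which is "optimal" depends on the error criterion, and the cond mean is the one that minimises expected squared error.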

Page 27: Lecture Week 9 Joint / Multivariate Probability Dists

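Min Sq Error Predictor

Predict random variable X by some (specific) value $\hat{x}$, to be chosen.

Squared error: $(X - \hat{x})^2$. Seek the $\hat{x}$ that minimises $E[(X - \hat{x})^2]$. With $\mu_X = E[X]$:

$$E[(X - \hat{x})^2] = E\big[\big((X - \mu_X) - (\hat{x} - \mu_X)\big)^2\big] = E[(X - \mu_X)^2] - 2(\hat{x} - \mu_X)E[X - \mu_X] + (\hat{x} - \mu_X)^2 = \mathrm{Var}[X] + (\hat{x} - \mu_X)^2$$

since $E[X - \mu_X] = 0$. So $\hat{x} = \mu_X = E[X]$ is the value of $\hat{x}$ with min $E[(X - \hat{x})^2]$, and that minimum is $\mathrm{Var}[X]$.

Theory general; not limited to discrete dists

Prob dist of X: $E[X]$ best

Given Info: $E[X \mid \text{Info}]$ best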
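A numerical illustration of the derivation, as a Python sketch on the prob dist of NA (the grid of candidate predictions is an arbitrary choice): the expected squared error is smallest at E[NA] = 0.9, and there it equals Var[NA] = 0.37.

```python
PMF = {0: 0.24, 1: 0.62, 2: 0.14}   # prob dist of NA

mu = sum(x * p for x, p in PMF.items())
var = sum((x - mu) ** 2 * p for x, p in PMF.items())


def expected_sq_error(c):
    """E[(X - c)^2] for a fixed prediction c."""
    return sum((x - c) ** 2 * p for x, p in PMF.items())


grid = [i / 100 for i in range(201)]          # candidate predictions 0.00, ..., 2.00
best = min(grid, key=expected_sq_error)
print(best, round(mu, 3), round(var, 3), round(expected_sq_error(best), 3))
```

Replacing `PMF` with a cond dist (e.g. NA given NC = 1) gives the cond exp val as the best prediction, matching "Given Info: E[X | Info] best".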

Page 28: Lecture Week 9 Joint / Multivariate Probability Dists

Best Lin Pred from Corr(X, Y), given that Y = y

Often prediction is based on corr, not on the cond dist.

Best Linear Pred: a linear fn of the observed value y of Y, chosen to min E[sq error of pred]

Games won   Exp Val   Var
A           0.9       0.37
B           0.7       0.45
C           1.4       0.40

Th Corr   A        B        C
A         1        -0.515   -0.416
B         -0.515   1        -0.566
C         -0.416   -0.566   1

Joint dist of AB points (rows NA, columns NB); Cov = -0.21
          NB = 0   NB = 1   NB = 2   Total
NA = 0    0.000    0.144    0.096    0.24
NA = 1    0.336    0.260    0.024    0.62
NA = 2    0.084    0.056    0.000    0.14
Total     0.42     0.46     0.12     1

Confirm: Corr[NA, NB] = -0.21 / sqrt(0.37 × 0.45) = -0.515

Jt prob dist leads to cond dist & corr

Corr does not lead to cond dist
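The whole correlation matrix can be confirmed from the joint points distribution. A Python sketch (names are ours); Corr = Cov / sqrt(Var × Var), with the variances on the diagonal of the covariance matrix.

```python
POINTS = {"210": 0.056, "201": 0.084, "120": 0.024, "111": 0.260,
          "102": 0.336, "021": 0.096, "012": 0.144}


def e(g):
    """E[g(n)] where n = (NA, NB, NC)."""
    return sum(g(tuple(map(int, k))) * p for k, p in POINTS.items())


means = [e(lambda n, i=i: n[i]) for i in range(3)]
cov = [[e(lambda n, i=i, j=j: (n[i] - means[i]) * (n[j] - means[j]))
        for j in range(3)] for i in range(3)]
corr = [[cov[i][j] / (cov[i][i] * cov[j][j]) ** 0.5 for j in range(3)]
        for i in range(3)]
for name, row in zip("ABC", corr):
    print(name, [round(r, 3) for r in row])
```

The reverse direction does not work: these three correlations (with means and variances) are not enough to rebuild the joint or cond dists.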

Page 29: Lecture Week 9 Joint / Multivariate Probability Dists

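Best Linear Predictor

Notation: $\mu_X = E[X]$, $\sigma_X^2 = \mathrm{Var}[X]$, $\mu_Y = E[Y]$, $\sigma_Y^2 = \mathrm{Var}[Y]$, $\rho = \mathrm{Corr}[X, Y]$, so that $\mathrm{Cov}[X, Y] = \rho\,\sigma_X\sigma_Y$.

Predict X by some $\hat{X} = a + bY$, a linear function of a related random variable Y, with a and b to be chosen. Seek a, b that minimise $E[(X - \hat{X})^2]$.

For any b, $E[(X - a - bY)^2] = \mathrm{Var}[X - bY] + (\mu_X - a - b\mu_Y)^2$, so the best a is $a = \mu_X - b\mu_Y$; equivalently $\hat{X} - \mu_X = b(Y - \mu_Y)$. Then

$$E[(X - \hat{X})^2] = E\big[\big((X - \mu_X) - b(Y - \mu_Y)\big)^2\big] = \sigma_X^2 - 2b\rho\sigma_X\sigma_Y + b^2\sigma_Y^2 = \sigma_Y^2\Big(b - \rho\frac{\sigma_X}{\sigma_Y}\Big)^2 + \sigma_X^2(1 - \rho^2)$$

So $b = \rho\,\sigma_X/\sigma_Y = \mathrm{Cov}[X, Y]/\mathrm{Var}[Y]$ is the value of b with min $E[(X - \hat{X})^2] = \sigma_X^2(1 - \rho^2)$, giving

$$\hat{X} = \mu_X + \rho\frac{\sigma_X}{\sigma_Y}(Y - \mu_Y)$$

Cf generic least-squares regression

Theory general; not limited to discrete dists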
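A Python sketch checking the result on the mini-league with X = NA, Y = NC (the helper names are ours): compute a and b from the moments, confirm that the directly computed mean squared error equals $\sigma_X^2(1 - \rho^2)$, and confirm that small changes to a or b only make it worse.

```python
# Joint pmf of (NA, NC) from the mini-league.
JOINT_AC = {(0, 1): 0.096, (0, 2): 0.144,
            (1, 0): 0.024, (1, 1): 0.260, (1, 2): 0.336,
            (2, 0): 0.056, (2, 1): 0.084}


def E(g):
    """E[g(X, Y)] with X = NA, Y = NC."""
    return sum(g(x, y) * p for (x, y), p in JOINT_AC.items())


mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - mu_x) ** 2)
var_y = E(lambda x, y: (y - mu_y) ** 2)
cov = E(lambda x, y: (x - mu_x) * (y - mu_y))
rho = cov / (var_x * var_y) ** 0.5


def mse(a, b):
    """E[(X - (a + bY))^2] computed directly from the joint pmf."""
    return E(lambda x, y: (x - a - b * y) ** 2)


b_star = cov / var_y              # = rho * sigma_X / sigma_Y
a_star = mu_x - b_star * mu_y
min_mse = var_x * (1 - rho ** 2)
print(round(a_star, 3), round(b_star, 3), round(mse(a_star, b_star), 4), round(min_mse, 4))

# Perturbing a or b never does better than the formula.
for da, db in [(0.05, 0), (-0.05, 0), (0, 0.05), (0, -0.05), (0.05, -0.05)]:
    assert mse(a_star + da, b_star + db) > mse(a_star, b_star)
```

This is the population version of least-squares regression: the "data" are the cells of the joint table, weighted by their probs.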

Page 30: Lecture Week 9 Joint / Multivariate Probability Dists

Best Lin Pred from Corr(NA, NC), given that NC = 1

Games won   Exp Val   Var
A           0.9       0.37
B           0.7       0.45
C           1.4       0.40

Th Corr   A        B        C
A         1        -0.515   -0.416
B         -0.515   1        -0.566
C         -0.416   -0.566   1

Pred NA = a + b(1)

b = (0.37 / 0.40)^0.5 × (-0.416) = -0.40

a = 0.9 - (-0.40)(1.4) = 1.46

Pred NA = 1.46 - 0.40(1) = 1.06 (cf 1.00 by most probable value, 0.97 by cond exp val)

Joint dist of AC points (rows NA, columns NC); Cov = -0.16
          NC = 0   NC = 1   NC = 2   Total
NA = 0    0.000    0.096    0.144    0.24
NA = 1    0.024    0.260    0.336    0.62
NA = 2    0.056    0.084    0.000    0.14
Total     0.08     0.44     0.48     1
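The slide arithmetic as a small Python function (the name `best_linear_predictor` is hypothetical). It uses only the rounded summary values shown on the slide (means, variances, corr), which is exactly the point: no cond dist is needed.

```python
def best_linear_predictor(mu_x, var_x, mu_y, var_y, corr):
    """(a, b) for Pred X = a + b y: b = corr * sigma_X / sigma_Y, a = mu_X - b * mu_Y."""
    b = corr * (var_x / var_y) ** 0.5
    a = mu_x - b * mu_y
    return a, b


# X = NA, Y = NC, using the rounded table values from the slide.
a, b = best_linear_predictor(mu_x=0.9, var_x=0.37, mu_y=1.4, var_y=0.40, corr=-0.416)
for nc in (0, 1, 2):
    print(nc, round(a + b * nc, 2))
```

Because the inputs are rounded, a and b come out as about 1.460 and -0.400 rather than exactly 1.46 and -0.40.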

Page 31: Lecture Week 9 Joint / Multivariate Probability Dists

Homework: Mini League

Game        Prob of winning
(i)   AB    Pr(A winner) 0.7
(ii)  BC    Pr(B winner) 0.5
(iii) AC    Pr(A winner) 0.7

• Compute: joint prob dist for NA, NC; Corr[NA, NC]; prob dist of (NA - NC); marginal dists, exp val, var; E[NA - NC] and Var[NA - NC], both from the dist of NA - NC and from the formulae

• Law of cond expectations: Team A prize = (NA)² if NB > NC, = NA otherwise. What is E[prize]?

• Prediction for NA when NC = 0, 1, 2: min sq error pred for NA for each NC = 0, 1, 2; best lin pred of NA for each NC = 0, 1, 2