12 S241 Expectation for Multivariate Distributions


Expectation for multivariate distributions

Definition

Let $X_1, X_2, \dots, X_n$ denote $n$ jointly distributed random variables with joint density function $f(x_1, x_2, \dots, x_n)$. Then

$$E\big[g(X_1,\dots,X_n)\big] = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} g(x_1,\dots,x_n)\, f(x_1,\dots,x_n)\, dx_1 \cdots dx_n$$

Example

Let $X$, $Y$, $Z$ denote three jointly distributed random variables with joint density function

$$f(x,y,z) = \begin{cases} \tfrac{12}{7}\left(x^2 + yz\right) & 0 \le x \le 1,\ 0 \le y \le 1,\ 0 \le z \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Determine $E[XYZ]$.

Solution:

$$E[XYZ] = \int_0^1\!\!\int_0^1\!\!\int_0^1 xyz\cdot\tfrac{12}{7}\left(x^2+yz\right)\,dx\,dy\,dz = \tfrac{12}{7}\int_0^1\!\!\int_0^1\!\!\int_0^1 \left(x^3yz + xy^2z^2\right)dx\,dy\,dz$$

$$= \tfrac{12}{7}\int_0^1\!\!\int_0^1 \left[\tfrac{x^4}{4}yz + \tfrac{x^2}{2}y^2z^2\right]_{x=0}^{x=1} dy\,dz = \tfrac{12}{7}\int_0^1\!\!\int_0^1 \left(\tfrac{yz}{4} + \tfrac{y^2z^2}{2}\right)dy\,dz = \tfrac{3}{7}\int_0^1\!\!\int_0^1 \left(yz + 2y^2z^2\right)dy\,dz$$

$$= \tfrac{3}{7}\int_0^1 \left[\tfrac{y^2}{2}z + \tfrac{2y^3}{3}z^2\right]_{y=0}^{y=1} dz = \tfrac{3}{7}\int_0^1 \left(\tfrac{z}{2} + \tfrac{2z^2}{3}\right)dz = \tfrac{3}{7}\left[\tfrac{z^2}{4} + \tfrac{2z^3}{9}\right]_0^1 = \tfrac{3}{7}\cdot\tfrac{17}{36} = \tfrac{17}{84}$$
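As a quick numerical check (not part of the original notes), here is a short sympy sketch that verifies the density integrates to 1 and reproduces the value 17/84:

```python
# Numerical check of E[XYZ] for f(x,y,z) = (12/7)(x^2 + y z) on the unit cube.
# A sketch using sympy; not part of the original slides.
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.Rational(12, 7) * (x**2 + y*z)          # joint density on [0,1]^3

# Sanity check: the density integrates to 1.
total = sp.integrate(f, (x, 0, 1), (y, 0, 1), (z, 0, 1))

# E[XYZ] = triple integral of x*y*z*f(x,y,z)
E_xyz = sp.integrate(x*y*z*f, (x, 0, 1), (y, 0, 1), (z, 0, 1))

print(total)   # 1
print(E_xyz)   # 17/84
```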

Some Rules for Expectation

$$1.\quad E[X_i] = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} x_i\, f(x_1,\dots,x_n)\, dx_1\cdots dx_n = \int_{-\infty}^{\infty} x_i\, f_i(x_i)\, dx_i$$

Thus you can calculate $E[X_i]$ either from the joint distribution of $X_1,\dots,X_n$ or from the marginal distribution of $X_i$.

Proof:

$$\int\cdots\int x_i\, f(x_1,\dots,x_n)\, dx_1\cdots dx_n = \int x_i \left[\int\cdots\int f(x_1,\dots,x_n)\, dx_1\cdots dx_{i-1}\, dx_{i+1}\cdots dx_n\right] dx_i = \int x_i\, f_i(x_i)\, dx_i$$

$$2.\quad E[a_1X_1 + \cdots + a_nX_n] = a_1E[X_1] + \cdots + a_nE[X_n]$$

The linearity property.

Proof:

$$\int\cdots\int (a_1x_1 + \cdots + a_nx_n)\, f(x_1,\dots,x_n)\, dx_1\cdots dx_n$$

$$= a_1\int\cdots\int x_1\, f(x_1,\dots,x_n)\, dx_1\cdots dx_n + \cdots + a_n\int\cdots\int x_n\, f(x_1,\dots,x_n)\, dx_1\cdots dx_n = a_1E[X_1] + \cdots + a_nE[X_n]$$

3. (The multiplicative property) Suppose $X_1,\dots,X_q$ are independent of $X_{q+1},\dots,X_k$. Then

$$E\big[g(X_1,\dots,X_q)\,h(X_{q+1},\dots,X_k)\big] = E\big[g(X_1,\dots,X_q)\big]\,E\big[h(X_{q+1},\dots,X_k)\big]$$

In the simple case when $k = 2$,

$$E[XY] = E[X]\,E[Y] \qquad\text{if } X \text{ and } Y \text{ are independent.}$$

Proof: By independence the joint density factors as $f(x_1,\dots,x_k) = f_1(x_1,\dots,x_q)\,f_2(x_{q+1},\dots,x_k)$, so

$$E\big[g(X_1,\dots,X_q)\,h(X_{q+1},\dots,X_k)\big] = \int\cdots\int g(x_1,\dots,x_q)\,h(x_{q+1},\dots,x_k)\, f(x_1,\dots,x_k)\, dx_1\cdots dx_k$$

$$= \int\cdots\int h(x_{q+1},\dots,x_k)\, f_2(x_{q+1},\dots,x_k)\left[\int\cdots\int g(x_1,\dots,x_q)\, f_1(x_1,\dots,x_q)\, dx_1\cdots dx_q\right] dx_{q+1}\cdots dx_k$$

$$= E\big[g(X_1,\dots,X_q)\big]\int\cdots\int h(x_{q+1},\dots,x_k)\, f_2(x_{q+1},\dots,x_k)\, dx_{q+1}\cdots dx_k = E\big[g(X_1,\dots,X_q)\big]\,E\big[h(X_{q+1},\dots,X_k)\big]$$
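As an illustration (not from the original notes), a Monte Carlo sketch of the multiplicative property for independent $X$ and $Y$; the distributions and the functions $g$, $h$ are arbitrary choices:

```python
# Monte Carlo illustration of E[g(X) h(Y)] = E[g(X)] E[h(Y)] for independent X and Y.
# The distributions and the functions g, h below are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
X = rng.exponential(scale=2.0, size=n)   # X independent of Y
Y = rng.uniform(0.0, 1.0, size=n)

g = lambda x: x**2
h = lambda y: np.cos(y)

lhs = np.mean(g(X) * h(Y))               # E[g(X) h(Y)]
rhs = np.mean(g(X)) * np.mean(h(Y))      # E[g(X)] E[h(Y)]
print(lhs, rhs)                          # nearly equal for large n
```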

Some Rules for Variance

$$\operatorname{Var}(X) = \sigma_X^2 = E\big[(X-\mu_X)^2\big] = E[X^2] - \mu_X^2$$

$$1.\quad \operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X,Y)$$

where $\operatorname{Cov}(X,Y) = E\big[(X-\mu_X)(Y-\mu_Y)\big]$.

Proof: $\operatorname{Var}(X+Y) = E\big[(X+Y-\mu_{X+Y})^2\big]$, where $\mu_{X+Y} = \mu_X + \mu_Y$. Thus

$$\operatorname{Var}(X+Y) = E\big[\big((X-\mu_X) + (Y-\mu_Y)\big)^2\big] = E\big[(X-\mu_X)^2 + 2(X-\mu_X)(Y-\mu_Y) + (Y-\mu_Y)^2\big]$$

$$= \operatorname{Var}(X) + 2\operatorname{Cov}(X,Y) + \operatorname{Var}(Y)$$

Note: If $X$ and $Y$ are independent, then

$$\operatorname{Cov}(X,Y) = E\big[(X-\mu_X)(Y-\mu_Y)\big] = E[X-\mu_X]\,E[Y-\mu_Y] = 0\cdot 0 = 0,$$

and hence $\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$.

Definition: For any two random variables $X$ and $Y$, the correlation coefficient $\rho_{XY}$ is defined to be

$$\rho_{XY} = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X\sigma_Y}$$

Thus $\operatorname{Cov}(X,Y) = \rho_{XY}\sigma_X\sigma_Y$, so that

$$\operatorname{Var}(X+Y) = \sigma_X^2 + \sigma_Y^2 + 2\rho_{XY}\sigma_X\sigma_Y,$$

which reduces to $\sigma_X^2 + \sigma_Y^2$ if $X$ and $Y$ are independent.

Properties of the correlation coefficient $\rho_{XY}$

$$\rho_{XY} = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X\sigma_Y}$$

1. If $X$ and $Y$ are independent, then $\rho_{XY} = 0$. Reason: $\operatorname{Cov}(X,Y) = 0$. The converse is not necessarily true; i.e. $\rho_{XY} = 0$ does not imply that $X$ and $Y$ are independent.

2. $-1 \le \rho_{XY} \le 1$, and $\rho_{XY} = \pm 1$ if and only if there exist $a$ and $b$ such that

$$P[Y = bX + a] = 1,$$

where $\rho_{XY} = +1$ if $b > 0$ and $\rho_{XY} = -1$ if $b < 0$.

Proof: Let $U = X - \mu_X$ and $V = Y - \mu_Y$, and consider choosing $b$ to minimize

$$g(b) = E\big[(V - bU)^2\big] \ge 0 \quad\text{for all } b.$$

Expanding,

$$g(b) = E\big[V^2 - 2bVU + b^2U^2\big] = E[V^2] - 2bE[VU] + b^2E[U^2],$$

so

$$g'(b) = -2E[VU] + 2bE[U^2] = 0 \quad\Rightarrow\quad b_{\min} = \frac{E[VU]}{E[U^2]}.$$

Since $g(b) \ge 0$ for all $b$, in particular $g(b_{\min}) \ge 0$:

$$g(b_{\min}) = E[V^2] - 2b_{\min}E[VU] + b_{\min}^2E[U^2] = E[V^2] - 2\frac{\big(E[VU]\big)^2}{E[U^2]} + \frac{\big(E[VU]\big)^2}{E[U^2]} = E[V^2] - \frac{\big(E[VU]\big)^2}{E[U^2]} \ge 0.$$

Hence

$$\frac{\big(E[VU]\big)^2}{E[U^2]\,E[V^2]} \le 1, \qquad\text{or}\qquad \rho_{XY}^2 = \frac{\big(E[(X-\mu_X)(Y-\mu_Y)]\big)^2}{\sigma_X^2\,\sigma_Y^2} \le 1,$$

so $-1 \le \rho_{XY} \le 1$.

Note: $\rho_{XY}^2 = 1$ if and only if $g(b_{\min}) = E\big[(V - b_{\min}U)^2\big] = 0$. This will be true if and only if

$$P[V - b_{\min}U = 0] = 1, \quad\text{i.e.}\quad P\big[Y - \mu_Y = b_{\min}(X - \mu_X)\big] = 1, \quad\text{i.e.}\quad P[Y = b_{\min}X + a] = 1 \text{ where } a = \mu_Y - b_{\min}\mu_X.$$

Summary: $-1 \le \rho_{XY} \le 1$, and $\rho_{XY} = \pm 1$ if and only if there exist $a$ and $b$ such that $P[Y = bX + a] = 1$, where

$$b = b_{\min} = \frac{E\big[(X-\mu_X)(Y-\mu_Y)\big]}{E\big[(X-\mu_X)^2\big]} = \frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)} = \frac{\sigma_{XY}}{\sigma_X^2} = \rho_{XY}\frac{\sigma_Y}{\sigma_X} \qquad\text{and}\qquad a = \mu_Y - b_{\min}\mu_X.$$
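The two properties can be seen numerically. Below is a simulation sketch (an illustrative addition, with arbitrarily chosen distributions): an exact linear relation $Y = bX + a$ with $b < 0$ gives $\rho_{XY} = -1$, while independent variables give a correlation near 0.

```python
# Empirical correlation: exactly -1 for Y = -3X + 5, near 0 for independent X and Y.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=200_000)

Y_linear = -3.0 * X + 5.0                    # exact linear relation, b < 0
Y_indep = rng.normal(size=X.size)            # independent of X

def rho(u, v):
    return np.cov(u, v)[0, 1] / (u.std(ddof=1) * v.std(ddof=1))

print(rho(X, Y_linear))   # -1.0 (up to rounding), since b = -3 < 0
print(rho(X, Y_indep))    # close to 0
```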

$$2.\quad \operatorname{Var}(aX + bY) = a^2\operatorname{Var}(X) + b^2\operatorname{Var}(Y) + 2ab\operatorname{Cov}(X,Y)$$

Proof: $\operatorname{Var}(aX+bY) = E\big[(aX+bY-\mu_{aX+bY})^2\big]$ with $\mu_{aX+bY} = E[aX+bY] = a\mu_X + b\mu_Y$. Thus

$$\operatorname{Var}(aX+bY) = E\big[\big(a(X-\mu_X) + b(Y-\mu_Y)\big)^2\big] = E\big[a^2(X-\mu_X)^2 + 2ab(X-\mu_X)(Y-\mu_Y) + b^2(Y-\mu_Y)^2\big]$$

$$= a^2\operatorname{Var}(X) + 2ab\operatorname{Cov}(X,Y) + b^2\operatorname{Var}(Y)$$

$$3.\quad \operatorname{Var}(a_1X_1 + \cdots + a_nX_n) = a_1^2\operatorname{Var}(X_1) + \cdots + a_n^2\operatorname{Var}(X_n)$$
$$\qquad + 2a_1a_2\operatorname{Cov}(X_1,X_2) + \cdots + 2a_1a_n\operatorname{Cov}(X_1,X_n)$$
$$\qquad + 2a_2a_3\operatorname{Cov}(X_2,X_3) + \cdots + 2a_2a_n\operatorname{Cov}(X_2,X_n)$$
$$\qquad \ \vdots$$
$$\qquad + 2a_{n-1}a_n\operatorname{Cov}(X_{n-1},X_n)$$

$$= \sum_{i=1}^{n} a_i^2\operatorname{Var}(X_i) + 2\sum_{i<j} a_ia_j\operatorname{Cov}(X_i,X_j)$$

$$= \sum_{i=1}^{n} a_i^2\operatorname{Var}(X_i) \qquad\text{if } X_1,\dots,X_n \text{ are mutually independent.}$$
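In matrix form this rule says $\operatorname{Var}(a_1X_1+\cdots+a_nX_n) = a'\Sigma a$, where $\Sigma$ is the covariance matrix of $(X_1,\dots,X_n)$. A simulation sketch (the covariance matrix and coefficients below are made-up illustrative values):

```python
# Rule 3 in matrix form: Var(a1 X1 + ... + an Xn) = a' Sigma a.
# Sigma and a below are assumed illustrative values.
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[2.0, 0.6, -0.3],
                  [0.6, 1.0,  0.4],
                  [-0.3, 0.4, 1.5]])       # assumed covariance matrix
a = np.array([1.0, -2.0, 0.5])

# Formula: sum_i a_i^2 Var(X_i) + 2 sum_{i<j} a_i a_j Cov(X_i, X_j) = a' Sigma a
var_formula = a @ Sigma @ a

# Simulation check with correlated normal (X1, X2, X3)
Xs = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=500_000)
var_sim = (Xs @ a).var(ddof=1)

print(var_formula, var_sim)                # approximately equal
```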

Some Applications (Rules of Expectation & Variance)

Let $X_1,\dots,X_n$ be $n$ mutually independent random variables, each having mean $\mu$ and standard deviation $\sigma$ (variance $\sigma^2$). Let

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{n}X_1 + \cdots + \frac{1}{n}X_n,$$

i.e. the linear combination $a_1X_1 + \cdots + a_nX_n$ with $a_1 = \cdots = a_n = \tfrac{1}{n}$. Then

$$\mu_{\bar{X}} = E[\bar{X}] = \frac{1}{n}E[X_1] + \cdots + \frac{1}{n}E[X_n] = \frac{1}{n}\mu + \cdots + \frac{1}{n}\mu = \mu.$$

Also

$$\sigma_{\bar{X}}^2 = \operatorname{Var}(\bar{X}) = \frac{1}{n^2}\operatorname{Var}(X_1) + \cdots + \frac{1}{n^2}\operatorname{Var}(X_n) = \frac{1}{n^2}\sigma^2 + \cdots + \frac{1}{n^2}\sigma^2 = n\cdot\frac{\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$

Thus $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$.

Hence the distribution of $\bar{X}$ is centered at $\mu$ and becomes more and more compact about $\mu$ as $n$ increases.
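This concentration can be seen directly by simulation. A sketch (the exponential population below is an arbitrary illustrative choice with $\mu = \sigma = 1$):

```python
# The distribution of the sample mean concentrates about mu at rate sigma / sqrt(n).
# Simulation sketch with an exponential population (mu = sigma = 1).
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 1.0, 1.0
for n in (10, 100, 1000, 10000):
    xbars = rng.exponential(scale=mu, size=(20_000, n)).mean(axis=1)
    print(n, xbars.mean(), xbars.std(ddof=1), sigma / np.sqrt(n))
    # the empirical mean of xbar stays near mu; its standard deviation tracks sigma/sqrt(n)
```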

Tchebychev's Inequality

Let $X$ denote a random variable with mean $\mu = E(X)$ and variance $\operatorname{Var}(X) = E\big[(X-\mu)^2\big] = \sigma^2$. Then

$$P\big[|X - \mu| \ge k\sigma\big] \le \frac{1}{k^2} \qquad\text{and}\qquad P\big[\mu - k\sigma < X < \mu + k\sigma\big] \ge 1 - \frac{1}{k^2}.$$

Note: $\sigma = \sqrt{\operatorname{Var}(X)} = \sqrt{E\big[(X-\mu)^2\big]}$ is called the standard deviation of $X$.

Proof (continuous case):

$$\operatorname{Var}(X) = \sigma^2 = \int_{-\infty}^{\infty}(x-\mu)^2 f(x)\,dx$$

$$= \int_{-\infty}^{\mu-k\sigma}(x-\mu)^2 f(x)\,dx + \int_{\mu-k\sigma}^{\mu+k\sigma}(x-\mu)^2 f(x)\,dx + \int_{\mu+k\sigma}^{\infty}(x-\mu)^2 f(x)\,dx$$

$$\ge \int_{-\infty}^{\mu-k\sigma}(x-\mu)^2 f(x)\,dx + \int_{\mu+k\sigma}^{\infty}(x-\mu)^2 f(x)\,dx$$

$$\ge \int_{-\infty}^{\mu-k\sigma}k^2\sigma^2 f(x)\,dx + \int_{\mu+k\sigma}^{\infty}k^2\sigma^2 f(x)\,dx \qquad\text{(since } (x-\mu)^2 \ge k^2\sigma^2 \text{ on these regions)}$$

$$= k^2\sigma^2\big(P[X \le \mu - k\sigma] + P[X \ge \mu + k\sigma]\big) = k^2\sigma^2\,P\big[|X-\mu| \ge k\sigma\big].$$

Thus $\sigma^2 \ge k^2\sigma^2\,P\big[|X-\mu| \ge k\sigma\big]$, or

$$P\big[|X-\mu| \ge k\sigma\big] \le \frac{1}{k^2} \qquad\text{and}\qquad P\big[|X-\mu| < k\sigma\big] \ge 1 - \frac{1}{k^2}.$$

Tchebychev's inequality is very conservative:

$$P\big[|X-\mu| < k\sigma\big] = P\big[\mu - k\sigma < X < \mu + k\sigma\big] \ge 1 - \frac{1}{k^2}$$

• $k = 1$: $P[\mu - \sigma < X < \mu + \sigma] \ge 1 - \tfrac{1}{1^2} = 0$

• $k = 2$: $P[\mu - 2\sigma < X < \mu + 2\sigma] \ge 1 - \tfrac{1}{2^2} = \tfrac{3}{4}$

• $k = 3$: $P[\mu - 3\sigma < X < \mu + 3\sigma] \ge 1 - \tfrac{1}{3^2} = \tfrac{8}{9}$
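A short sketch comparing the bound with exact probabilities for a normal random variable (an illustrative choice; the bound itself holds for any $X$ with finite variance):

```python
# Tchebychev's bound vs the exact probabilities for Z ~ N(0, 1).
import math

for k in (1, 2, 3):
    bound = 1 - 1 / k**2                         # Tchebychev: P[|X - mu| < k sigma] >= bound
    normal = math.erf(k / math.sqrt(2))          # exact P[|Z| < k] for Z ~ N(0, 1)
    print(k, bound, round(normal, 4))
# k=1: bound 0.000 vs 0.6827
# k=2: bound 0.750 vs 0.9545
# k=3: bound 0.889 vs 0.9973
```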

The Law of Large Numbers

Let $X_1,\dots,X_n$ be $n$ mutually independent random variables, each having mean $\mu$ and standard deviation $\sigma$, and let

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$

Then for any $\varepsilon > 0$ (no matter how small),

$$P\big[\mu - \varepsilon < \bar{X} < \mu + \varepsilon\big] = P\big[|\bar{X} - \mu| < \varepsilon\big] \to 1 \qquad\text{as } n \to \infty.$$

Proof: We use Tchebychev's inequality, which applied to $\bar{X}$ states

$$P\big[\mu_{\bar{X}} - k\sigma_{\bar{X}} < \bar{X} < \mu_{\bar{X}} + k\sigma_{\bar{X}}\big] \ge 1 - \frac{1}{k^2}.$$

Now $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$. Choose $k$ so that $k\sigma_{\bar{X}} = \varepsilon$, i.e.

$$k = \frac{\varepsilon}{\sigma_{\bar{X}}} = \frac{\varepsilon\sqrt{n}}{\sigma} \to \infty \qquad\text{as } n \to \infty.$$

Thus

$$P\big[|\bar{X} - \mu| < \varepsilon\big] \ge 1 - \frac{1}{k^2} = 1 - \frac{\sigma^2}{n\varepsilon^2} \to 1 \qquad\text{as } n \to \infty.$$

A Special Case

Let $X_1,\dots,X_n$ be $n$ mutually independent random variables, each having a Bernoulli distribution with parameter $p$:

$$X_i = \begin{cases} 1 & \text{if the repetition is a success } (S),\ \text{prob } p \\ 0 & \text{if the repetition is a failure } (F),\ \text{prob } q = 1-p \end{cases}$$

Then $E[X_i] = p$ and

$$\bar{X} = \frac{X_1 + \cdots + X_n}{n} = \hat{p} = \text{the proportion of successes.}$$

Thus the Law of Large Numbers states that

$$P\big[|\hat{p} - p| < \varepsilon\big] \to 1 \qquad\text{as } n \to \infty,$$

i.e. $\hat{p}$, the proportion of successes, converges to the probability of success $p$.

Some people misinterpret this to mean that if the proportion of successes is currently lower than $p$, then the proportion of successes in the future will have to be larger than $p$ to counter this and ensure that the Law of Large Numbers holds true. Of course, if in the infinite future the proportion of successes is $p$, then this is enough to ensure that the Law of Large Numbers holds true.
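A simulation sketch of this special case (with an arbitrarily chosen $p = 0.3$): the running proportion of successes drifts toward $p$ without "compensating" for early deficits.

```python
# Law of Large Numbers for Bernoulli trials: the running proportion p_hat approaches p.
import numpy as np

rng = np.random.default_rng(4)
p = 0.3
trials = rng.random(1_000_000) < p                 # Bernoulli(p) outcomes
p_hat = np.cumsum(trials) / np.arange(1, trials.size + 1)

for n in (10, 100, 10_000, 1_000_000):
    print(n, p_hat[n - 1])                         # proportion after n trials, approaching p
```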

Some more applications of the rules of expectation and the rules of variance.

The mean and variance of a binomial random variable

We have already computed these by other methods:
1. Using the probability function p(x).
2. Using the moment generating function $m_X(t)$.

Suppose that we have observed $n$ independent repetitions of a Bernoulli trial. Let $X_1,\dots,X_n$ be $n$ mutually independent random variables, each having a Bernoulli distribution with parameter $p$, defined by

$$X_i = \begin{cases} 1 & \text{if the } i^{\text{th}} \text{ repetition is a success } (S),\ \text{prob } p \\ 0 & \text{if the } i^{\text{th}} \text{ repetition is a failure } (F),\ \text{prob } q \end{cases}$$

Now $X = X_1 + \cdots + X_n$ has a binomial distribution with parameters $n$ and $p$; $X$ is the total number of successes in the $n$ repetitions.

$$E[X_i] = 1\cdot p + 0\cdot q = p, \qquad \mu_X = E[X] = E[X_1] + \cdots + E[X_n] = p + \cdots + p = np$$

$$\operatorname{Var}(X_i) = E[X_i^2] - p^2 = \big(1^2\cdot p + 0^2\cdot q\big) - p^2 = p - p^2 = pq$$

$$\sigma_X^2 = \operatorname{Var}(X) = \operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n) = pq + \cdots + pq = npq$$
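A simulation sketch of this decomposition (the values $n = 40$, $p = 0.35$ are arbitrary illustrative choices):

```python
# Binomial X = X1 + ... + Xn with Xi ~ Bernoulli(p): checking E[X] = np and Var(X) = npq.
import numpy as np

rng = np.random.default_rng(5)
n, p = 40, 0.35
q = 1 - p

bern = (rng.random((200_000, n)) < p).astype(float)   # 200,000 rows of n Bernoulli trials
X = bern.sum(axis=1)                                   # each row sum is one Binomial(n, p) draw

print(X.mean(), n * p)            # ~ 14.0
print(X.var(ddof=1), n * p * q)   # ~ 9.1
```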

The mean and variance of a hypergeometric distribution

The hypergeometric distribution arises when we sample without replacement $n$ objects from a population of $N = a + b$ objects. The population is divided into two groups (group A and group B). Group A contains $a$ objects while group B contains $b$ objects.

Let $X$ denote the number of objects in the sample of $n$ that come from group A. The probability function of $X$ is

$$p(x) = \frac{\dbinom{a}{x}\dbinom{b}{n-x}}{\dbinom{a+b}{n}}.$$

Let $X_1,\dots,X_n$ be the $n$ random variables defined by

$$X_i = \begin{cases} 1 & \text{if the } i^{\text{th}} \text{ object selected comes from group A} \\ 0 & \text{if the } i^{\text{th}} \text{ object selected comes from group B} \end{cases}$$

Then $X = X_1 + \cdots + X_n$, and

$$P[X_i = 1] = \frac{a}{a+b} \qquad\text{and}\qquad P[X_i = 0] = \frac{b}{a+b}.$$

Proof (for $P[X_i = 1]$, counting ordered samples of size $n$ from the $a+b$ objects):

$$P[X_i = 1] = \frac{a \cdot \dfrac{(a+b-1)!}{(a+b-n)!}}{\dfrac{(a+b)!}{(a+b-n)!}} = \frac{a}{a+b}.$$

Therefore

$$E[X_i] = 1\cdot P[X_i=1] + 0\cdot P[X_i=0] = \frac{a}{a+b} \qquad\text{and}\qquad E[X_i^2] = 1^2\cdot P[X_i=1] + 0^2\cdot P[X_i=0] = \frac{a}{a+b},$$

so

$$\operatorname{Var}(X_i) = E[X_i^2] - \big(E[X_i]\big)^2 = \frac{a}{a+b} - \left(\frac{a}{a+b}\right)^2 = \frac{a}{a+b}\left(1 - \frac{a}{a+b}\right) = \frac{a}{a+b}\cdot\frac{b}{a+b} = \frac{ab}{(a+b)^2}.$$

Thus

$$E[X] = E[X_1 + \cdots + X_n] = \sum_{i=1}^{n} E[X_i] = \frac{na}{a+b}.$$

Also

$$\operatorname{Var}(X) = \operatorname{Var}(X_1 + \cdots + X_n) = \sum_{i=1}^{n}\operatorname{Var}(X_i) + 2\sum_{i<j}\operatorname{Cov}(X_i, X_j),$$

so we also need to calculate $\operatorname{Cov}(X_i, X_j)$.

Note:

$$\operatorname{Cov}(U,V) = E\big[(U-\mu_U)(V-\mu_V)\big] = E\big[UV - \mu_U V - \mu_V U + \mu_U\mu_V\big] = E[UV] - \mu_U\mu_V = E[UV] - E[U]E[V].$$

Thus

$$\operatorname{Cov}(X_i, X_j) = E[X_iX_j] - E[X_i]E[X_j], \qquad\text{where } E[X_i] = E[X_j] = \frac{a}{a+b}.$$

Note:

$$E[X_iX_j] = 1\cdot P[X_iX_j = 1] + 0\cdot P[X_iX_j = 0] = P[X_i = 1,\ X_j = 1],$$

and, again counting ordered samples,

$$P[X_i = 1,\ X_j = 1] = \frac{a(a-1)\,\dfrac{(a+b-2)!}{(a+b-n)!}}{\dfrac{(a+b)!}{(a+b-n)!}} = \frac{a(a-1)}{(a+b)(a+b-1)}.$$

Thus

$$\operatorname{Cov}(X_i, X_j) = E[X_iX_j] - E[X_i]E[X_j] = \frac{a(a-1)}{(a+b)(a+b-1)} - \left(\frac{a}{a+b}\right)^2$$

$$= \frac{a}{a+b}\left[\frac{a-1}{a+b-1} - \frac{a}{a+b}\right] = \frac{a}{a+b}\cdot\frac{(a-1)(a+b) - a(a+b-1)}{(a+b-1)(a+b)} = \frac{-ab}{(a+b)^2(a+b-1)}.$$

Thus, with

$$\operatorname{Var}(X_i) = \frac{ab}{(a+b)^2} \qquad\text{and}\qquad \operatorname{Cov}(X_i,X_j) = \frac{-ab}{(a+b)^2(a+b-1)},$$

$$\operatorname{Var}(X) = \operatorname{Var}(X_1 + \cdots + X_n) = \sum_{i=1}^{n}\operatorname{Var}(X_i) + 2\sum_{i<j}\operatorname{Cov}(X_i,X_j)$$

$$= n\,\frac{ab}{(a+b)^2} - 2\binom{n}{2}\frac{ab}{(a+b)^2(a+b-1)} = n\,\frac{ab}{(a+b)^2} - \frac{n(n-1)\,ab}{(a+b)^2(a+b-1)}$$

$$= n\,\frac{ab}{(a+b)^2}\left[1 - \frac{n-1}{a+b-1}\right] = n\,p_A\,p_B\,(1-f),$$

where

$$p_A = \frac{a}{a+b},\qquad p_B = \frac{b}{a+b},\qquad f = \frac{n-1}{a+b-1} = \frac{n-1}{N-1}.$$

Thus if $X$ has a hypergeometric distribution with parameters $a$, $b$ and $n$, then

$$E[X] = n\,\frac{a}{a+b} = n\,p_A \qquad\text{and}\qquad \operatorname{Var}(X) = n\,p_A\,p_B\,(1-f).$$
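A simulation sketch of these formulas (the values $a = 30$, $b = 70$, $n = 20$ are arbitrary illustrative choices):

```python
# Hypergeometric mean and variance: E[X] = n*pA and Var(X) = n*pA*pB*(1 - f).
import numpy as np

rng = np.random.default_rng(6)
a, b, n = 30, 70, 20                       # 30 group-A objects, 70 group-B objects, sample 20
pA, pB = a / (a + b), b / (a + b)
f = (n - 1) / (a + b - 1)

X = rng.hypergeometric(ngood=a, nbad=b, nsample=n, size=500_000)

print(X.mean(), n * pA)                      # ~ 6.0
print(X.var(ddof=1), n * pA * pB * (1 - f))  # ~ 3.4
```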

The mean and variance of a negative binomial distribution

The negative binomial distribution arises when we repeat a Bernoulli trial until $k$ successes (S) occur. Then $X$ = the trial on which the $k^{\text{th}}$ success occurs. The probability function of $X$ is

$$p(x) = \binom{x-1}{k-1} p^k q^{x-k}, \qquad x = k,\ k+1,\ k+2,\ \dots$$

Let $X_1$ = the number of the trial on which the $1^{\text{st}}$ success occurred, and let $X_i$ = the number of trials after the $(i-1)^{\text{st}}$ success up to and including the $i^{\text{th}}$ success $(i \ge 2)$.

Each $X_i$ has a geometric distribution with parameter $p$; $X = X_1 + \cdots + X_k$; and $X_1,\dots,X_k$ are mutually independent. Thus

$$E[X_i] = \frac{1}{p} \qquad\text{and}\qquad \operatorname{Var}(X_i) = \frac{q}{p^2},$$

hence

$$E[X] = \sum_{i=1}^{k} E[X_i] = \frac{k}{p} \qquad\text{and}\qquad \operatorname{Var}(X) = \sum_{i=1}^{k}\operatorname{Var}(X_i) = \frac{kq}{p^2}.$$

Thus if $X$ has a negative binomial distribution with parameters $k$ and $p$, then

$$E[X] = \frac{k}{p} \qquad\text{and}\qquad \operatorname{Var}(X) = \frac{kq}{p^2}.$$
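A simulation sketch of the "sum of $k$ geometric waiting times" decomposition (the values $k = 5$, $p = 0.25$ are arbitrary illustrative choices):

```python
# Negative binomial X as a sum of k independent geometric waiting times:
# E[X] = k/p and Var(X) = k*q/p^2.
import numpy as np

rng = np.random.default_rng(7)
k, p = 5, 0.25
q = 1 - p

# Each row: k geometric waiting times (trial counts between successes); their sum is
# the trial on which the k-th success occurs.
waits = rng.geometric(p, size=(300_000, k))
X = waits.sum(axis=1)

print(X.mean(), k / p)              # ~ 20.0
print(X.var(ddof=1), k * q / p**2)  # ~ 60.0
```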

Multivariate Moments: Non-central and Central

Definition: Let $X_1$ and $X_2$ be jointly distributed random variables (discrete or continuous). Then for any pair of positive integers $(k_1, k_2)$ the joint moment of $(X_1, X_2)$ of order $(k_1, k_2)$ is defined to be

$$\mu_{k_1,k_2} = E\big[X_1^{k_1}X_2^{k_2}\big] = \begin{cases} \displaystyle\sum_{x_1}\sum_{x_2} x_1^{k_1}x_2^{k_2}\, p(x_1,x_2) & \text{if } X_1, X_2 \text{ are discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1^{k_1}x_2^{k_2}\, f(x_1,x_2)\, dx_1\,dx_2 & \text{if } X_1, X_2 \text{ are continuous} \end{cases}$$

Definition: Let $X_1$ and $X_2$ be jointly distributed random variables (discrete or continuous). Then for any pair of positive integers $(k_1, k_2)$ the joint central moment of $(X_1, X_2)$ of order $(k_1, k_2)$ is defined to be

$$\mu^{0}_{k_1,k_2} = E\big[(X_1-\mu_1)^{k_1}(X_2-\mu_2)^{k_2}\big] = \begin{cases} \displaystyle\sum_{x_1}\sum_{x_2} (x_1-\mu_1)^{k_1}(x_2-\mu_2)^{k_2}\, p(x_1,x_2) & \text{if } X_1, X_2 \text{ are discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_1-\mu_1)^{k_1}(x_2-\mu_2)^{k_2}\, f(x_1,x_2)\, dx_1\,dx_2 & \text{if } X_1, X_2 \text{ are continuous} \end{cases}$$

where $\mu_1 = E[X_1]$ and $\mu_2 = E[X_2]$.

Note:

$$\mu^{0}_{1,1} = E\big[(X_1-\mu_1)(X_2-\mu_2)\big] = \operatorname{Cov}(X_1, X_2) = \text{the covariance of } X_1 \text{ and } X_2.$$
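A short computational sketch of these definitions, using an assumed example density $f(x_1,x_2) = x_1 + x_2$ on the unit square (not a density from the original notes, but it integrates to 1):

```python
# Joint moments and central moments by direct integration for f(x1, x2) = x1 + x2 on [0,1]^2.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1 + x2

mu1 = sp.integrate(x1 * f, (x1, 0, 1), (x2, 0, 1))       # E[X1] = 7/12
mu2 = sp.integrate(x2 * f, (x1, 0, 1), (x2, 0, 1))       # E[X2] = 7/12
m11 = sp.integrate(x1 * x2 * f, (x1, 0, 1), (x2, 0, 1))  # joint moment mu_{1,1} = E[X1 X2] = 1/3

# central moment mu0_{1,1} = E[(X1 - mu1)(X2 - mu2)] = Cov(X1, X2)
cov = sp.integrate((x1 - mu1) * (x2 - mu2) * f, (x1, 0, 1), (x2, 0, 1))

print(mu1, mu2, m11, cov)   # 7/12 7/12 1/3 -1/144
```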


Distribution Functions, Moments, and Moment Generating Functions in the Multivariate Case

The distribution function F(x)

This is defined for any random variable $X$:

$$F(x) = P[X \le x]$$

Properties

1. $F(-\infty) = 0$ and $F(\infty) = 1$.

2. $F(x)$ is non-decreasing (i.e. if $x_1 < x_2$ then $F(x_1) \le F(x_2)$).

3. $F(b) - F(a) = P[a < X \le b]$.

4. Discrete random variables: $F(x)$ is a non-decreasing step function with

$$F(x) = P[X \le x] = \sum_{u \le x} p(u), \qquad p(x) = F(x) - F(x^-) = \text{the jump in } F(x) \text{ at } x.$$

[Figure: step function $F(x)$ with jumps of height $p(x)$ at the support points.]

5. Continuous random variables: $F(x)$ is a non-decreasing continuous function with

$$F(x) = P[X \le x] = \int_{-\infty}^{x} f(u)\,du, \qquad f(x) = F'(x).$$

[Figure: continuous $F(x)$; the slope of $F$ at $x$ equals $f(x)$.]

To find the probability density function $f(x)$, one first finds $F(x)$ and then $f(x) = F'(x)$.

The joint distribution function $F(x_1, x_2, \dots, x_k)$ is defined for $k$ random variables $X_1, X_2, \dots, X_k$:

$$F(x_1, x_2, \dots, x_k) = P[X_1 \le x_1,\ X_2 \le x_2,\ \dots,\ X_k \le x_k]$$

For $k = 2$:

$$F(x_1, x_2) = P[X_1 \le x_1,\ X_2 \le x_2]$$

[Figure: the event $\{X_1 \le x_1, X_2 \le x_2\}$ is the "south-west" quadrant of the point $(x_1, x_2)$.]

Properties

1. $F(x_1, -\infty) = F(-\infty, x_2) = F(-\infty, -\infty) = 0$.

2. $F(x_1, \infty) = P[X_1 \le x_1, X_2 \le \infty] = P[X_1 \le x_1] = F_1(x_1)$ = the marginal cumulative distribution function of $X_1$. Similarly $F(\infty, x_2) = P[X_2 \le x_2] = F_2(x_2)$ = the marginal cumulative distribution function of $X_2$, and $F(\infty, \infty) = 1$.

3. $F(x_1, x_2)$ is non-decreasing in both the $x_1$ direction and the $x_2$ direction, i.e. if $a_1 < b_1$ and $a_2 < b_2$ then
   i. $F(a_1, x_2) \le F(b_1, x_2)$
   ii. $F(x_1, a_2) \le F(x_1, b_2)$
   iii. $F(a_1, a_2) \le F(b_1, b_2)$

4. $P[a < X_1 \le b,\ c < X_2 \le d] = F(b,d) - F(a,d) - F(b,c) + F(a,c)$.

[Figure: the rectangle with corners $(a,c)$, $(b,c)$, $(a,d)$, $(b,d)$.]

5. Discrete random variables: $F(x_1, x_2)$ is a step surface with

$$F(x_1, x_2) = P[X_1 \le x_1,\ X_2 \le x_2] = \sum_{u_1 \le x_1}\sum_{u_2 \le x_2} p(u_1, u_2),$$

and $p(x_1, x_2)$ = the jump in $F(x_1, x_2)$ at $(x_1, x_2)$.

6. Continuous random variables: $F(x_1, x_2)$ is a continuous surface with

$$F(x_1, x_2) = P[X_1 \le x_1,\ X_2 \le x_2] = \int_{-\infty}^{x_2}\int_{-\infty}^{x_1} f(u_1, u_2)\,du_1\,du_2$$

and

$$f(x_1, x_2) = \frac{\partial^2 F(x_1, x_2)}{\partial x_1\,\partial x_2} = \frac{\partial^2 F(x_1, x_2)}{\partial x_2\,\partial x_1}.$$
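A small sketch of recovering a joint density from a joint distribution function by the mixed partial derivative, using an assumed example CDF $F(x_1,x_2) = x_1^2x_2^2$ on $[0,1]^2$ (an illustrative choice, not from the original notes):

```python
# Recovering f(x1, x2) = d^2 F / dx1 dx2 from an assumed joint CDF on [0,1]^2.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
F = x1**2 * x2**2           # assumed joint CDF for 0 <= x1, x2 <= 1

f = sp.diff(F, x1, x2)      # mixed partial derivative
print(f)                    # 4*x1*x2, a valid density on the unit square

# check: it integrates back to F(1, 1) = 1
print(sp.integrate(f, (x1, 0, 1), (x2, 0, 1)))   # 1
```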


Multivariate Moment Generating Functions

Recall the (univariate) moment generating function:

$$m_X(t) = E\big[e^{tX}\big] = \begin{cases} \displaystyle\sum_{x} e^{tx}\, p(x) & \text{if } X \text{ is discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} e^{tx}\, f(x)\, dx & \text{if } X \text{ is continuous} \end{cases}$$

Definition: Let $X_1, X_2, \dots, X_k$ be jointly distributed random variables (discrete or continuous). Then the joint moment generating function is defined to be

$$m_{X_1,\dots,X_k}(t_1, \dots, t_k) = E\big[e^{t_1X_1 + \cdots + t_kX_k}\big] = \begin{cases} \displaystyle\sum_{x_1}\cdots\sum_{x_k} e^{t_1x_1 + \cdots + t_kx_k}\, p(x_1,\dots,x_k) & \text{if } X_1,\dots,X_k \text{ are discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{t_1x_1 + \cdots + t_kx_k}\, f(x_1,\dots,x_k)\, dx_1\cdots dx_k & \text{if } X_1,\dots,X_k \text{ are continuous} \end{cases}$$

Note:

$$m_{X_1,\dots,X_k}(0, \dots, 0) = 1 \qquad\text{and}\qquad m_{X_1,\dots,X_k}(0, \dots, 0, t_i, 0, \dots, 0) = m_{X_i}(t_i).$$

Power series expansion of the joint moment generating function ($k = 2$):

$$m_{X,Y}(t,s) = E\big[e^{tX + sY}\big] = E\big[e^{tX}e^{sY}\big]$$

Using $e^u = 1 + u + \dfrac{u^2}{2!} + \dfrac{u^3}{3!} + \dfrac{u^4}{4!} + \cdots$,

$$m_{X,Y}(t,s) = E\left[1 + (tX+sY) + \frac{(tX+sY)^2}{2!} + \cdots\right]$$

$$= 1 + E[X]\,t + E[Y]\,s + E[X^2]\frac{t^2}{2!} + E[XY]\,ts + E[Y^2]\frac{s^2}{2!} + \cdots + E\big[X^kY^m\big]\frac{t^k}{k!}\frac{s^m}{m!} + \cdots$$

$$= 1 + \mu_{1,0}\,t + \mu_{0,1}\,s + \mu_{2,0}\frac{t^2}{2!} + \mu_{1,1}\,ts + \mu_{0,2}\frac{s^2}{2!} + \cdots + \mu_{k,m}\frac{t^k}{k!}\frac{s^m}{m!} + \cdots$$
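Equivalently, the joint moments can be read off as mixed partial derivatives of $m_{X,Y}(t,s)$ at $t = s = 0$. A sketch using the standard bivariate normal MGF $m(t,s) = \exp\!\big((t^2 + 2\rho ts + s^2)/2\big)$ as an assumed example (not taken from the original notes):

```python
# mu_{k,m} = d^{k+m} m(t,s) / dt^k ds^m evaluated at t = s = 0,
# illustrated with the standard bivariate normal MGF.
import sympy as sp

t, s, rho = sp.symbols('t s rho')
m = sp.exp((t**2 + 2*rho*t*s + s**2) / 2)

mu_10 = sp.diff(m, t).subs({t: 0, s: 0})       # E[X]   = 0
mu_20 = sp.diff(m, t, 2).subs({t: 0, s: 0})    # E[X^2] = 1
mu_11 = sp.diff(m, t, s).subs({t: 0, s: 0})    # E[XY]  = rho

print(mu_10, mu_20, mu_11)   # 0 1 rho
```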