12 S241 Expectation for Multivariate Distributions


Expectation for multivariate distributions

Definition

Let $X_1, X_2, \dots, X_n$ denote $n$ jointly distributed random variables with joint density function $f(x_1, x_2, \dots, x_n)$. Then

$$E\big[g(X_1,\dots,X_n)\big] = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} g(x_1,\dots,x_n)\, f(x_1,\dots,x_n)\, dx_1 \cdots dx_n$$

Example

Let $X$, $Y$, $Z$ denote three jointly distributed random variables with joint density function

$$f(x,y,z) = \begin{cases} \tfrac{12}{7}\left(x^2 + yz\right) & 0 \le x \le 1,\ 0 \le y \le 1,\ 0 \le z \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Determine $E[XYZ]$.

Solution:

$$E[XYZ] = \int_0^1\!\!\int_0^1\!\!\int_0^1 xyz\cdot\tfrac{12}{7}\left(x^2+yz\right)\,dx\,dy\,dz = \tfrac{12}{7}\int_0^1\!\!\int_0^1\!\!\int_0^1 \left(x^3yz + xy^2z^2\right)dx\,dy\,dz$$

$$= \tfrac{12}{7}\int_0^1\!\!\int_0^1 \left[\tfrac{x^4}{4}yz + \tfrac{x^2}{2}y^2z^2\right]_{x=0}^{x=1} dy\,dz = \tfrac{12}{7}\int_0^1\!\!\int_0^1 \left(\tfrac{yz}{4} + \tfrac{y^2z^2}{2}\right)dy\,dz = \tfrac{3}{7}\int_0^1\!\!\int_0^1 \left(yz + 2y^2z^2\right)dy\,dz$$

$$= \tfrac{3}{7}\int_0^1 \left[\tfrac{y^2}{2}z + \tfrac{2y^3}{3}z^2\right]_{y=0}^{y=1} dz = \tfrac{3}{7}\int_0^1 \left(\tfrac{z}{2} + \tfrac{2z^2}{3}\right)dz = \tfrac{3}{7}\left[\tfrac{z^2}{4} + \tfrac{2z^3}{9}\right]_0^1 = \tfrac{3}{7}\cdot\tfrac{17}{36} = \tfrac{17}{84}$$
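As a quick numerical check (not part of the original notes), here is a short sympy sketch that verifies the density integrates to 1 and reproduces the value 17/84:

```python
# Numerical check of E[XYZ] for f(x,y,z) = (12/7)(x^2 + y z) on the unit cube.
# A sketch using sympy; not part of the original slides.
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.Rational(12, 7) * (x**2 + y*z)          # joint density on [0,1]^3

# Sanity check: the density integrates to 1.
total = sp.integrate(f, (x, 0, 1), (y, 0, 1), (z, 0, 1))

# E[XYZ] = triple integral of x*y*z*f(x,y,z)
E_xyz = sp.integrate(x*y*z*f, (x, 0, 1), (y, 0, 1), (z, 0, 1))

print(total)   # 1
print(E_xyz)   # 17/84
```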

Some Rules for Expectation

$$1.\quad E[X_i] = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} x_i\, f(x_1,\dots,x_n)\, dx_1\cdots dx_n = \int_{-\infty}^{\infty} x_i\, f_i(x_i)\, dx_i$$

Thus you can calculate $E[X_i]$ either from the joint distribution of $X_1,\dots,X_n$ or from the marginal distribution of $X_i$.

Proof:

$$\int\cdots\int x_i\, f(x_1,\dots,x_n)\, dx_1\cdots dx_n = \int x_i \left[\int\cdots\int f(x_1,\dots,x_n)\, dx_1\cdots dx_{i-1}\, dx_{i+1}\cdots dx_n\right] dx_i = \int x_i\, f_i(x_i)\, dx_i$$

$$2.\quad E[a_1X_1 + \cdots + a_nX_n] = a_1E[X_1] + \cdots + a_nE[X_n]$$

The linearity property.

Proof:

$$\int\cdots\int (a_1x_1 + \cdots + a_nx_n)\, f(x_1,\dots,x_n)\, dx_1\cdots dx_n$$

$$= a_1\int\cdots\int x_1\, f(x_1,\dots,x_n)\, dx_1\cdots dx_n + \cdots + a_n\int\cdots\int x_n\, f(x_1,\dots,x_n)\, dx_1\cdots dx_n = a_1E[X_1] + \cdots + a_nE[X_n]$$

3. (The multiplicative property) Suppose $X_1,\dots,X_q$ are independent of $X_{q+1},\dots,X_k$. Then

$$E\big[g(X_1,\dots,X_q)\,h(X_{q+1},\dots,X_k)\big] = E\big[g(X_1,\dots,X_q)\big]\,E\big[h(X_{q+1},\dots,X_k)\big]$$

In the simple case when $k = 2$,

$$E[XY] = E[X]\,E[Y] \qquad\text{if } X \text{ and } Y \text{ are independent.}$$

Proof: By independence the joint density factors as $f(x_1,\dots,x_k) = f_1(x_1,\dots,x_q)\,f_2(x_{q+1},\dots,x_k)$, so

$$E\big[g(X_1,\dots,X_q)\,h(X_{q+1},\dots,X_k)\big] = \int\cdots\int g(x_1,\dots,x_q)\,h(x_{q+1},\dots,x_k)\, f(x_1,\dots,x_k)\, dx_1\cdots dx_k$$

$$= \int\cdots\int h(x_{q+1},\dots,x_k)\, f_2(x_{q+1},\dots,x_k)\left[\int\cdots\int g(x_1,\dots,x_q)\, f_1(x_1,\dots,x_q)\, dx_1\cdots dx_q\right] dx_{q+1}\cdots dx_k$$

$$= E\big[g(X_1,\dots,X_q)\big]\int\cdots\int h(x_{q+1},\dots,x_k)\, f_2(x_{q+1},\dots,x_k)\, dx_{q+1}\cdots dx_k = E\big[g(X_1,\dots,X_q)\big]\,E\big[h(X_{q+1},\dots,X_k)\big]$$
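As an illustration (not from the original notes), a Monte Carlo sketch of the multiplicative property for independent $X$ and $Y$; the distributions and the functions $g$, $h$ are arbitrary choices:

```python
# Monte Carlo illustration of E[g(X) h(Y)] = E[g(X)] E[h(Y)] for independent X and Y.
# The distributions and the functions g, h below are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
X = rng.exponential(scale=2.0, size=n)   # X independent of Y
Y = rng.uniform(0.0, 1.0, size=n)

g = lambda x: x**2
h = lambda y: np.cos(y)

lhs = np.mean(g(X) * h(Y))               # E[g(X) h(Y)]
rhs = np.mean(g(X)) * np.mean(h(Y))      # E[g(X)] E[h(Y)]
print(lhs, rhs)                          # nearly equal for large n
```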

Some Rules for Variance

$$\operatorname{Var}(X) = \sigma_X^2 = E\big[(X-\mu_X)^2\big] = E[X^2] - \mu_X^2$$

$$1.\quad \operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X,Y)$$

where $\operatorname{Cov}(X,Y) = E\big[(X-\mu_X)(Y-\mu_Y)\big]$.

Proof: $\operatorname{Var}(X+Y) = E\big[(X+Y-\mu_{X+Y})^2\big]$, where $\mu_{X+Y} = \mu_X + \mu_Y$. Thus

$$\operatorname{Var}(X+Y) = E\big[\big((X-\mu_X) + (Y-\mu_Y)\big)^2\big] = E\big[(X-\mu_X)^2 + 2(X-\mu_X)(Y-\mu_Y) + (Y-\mu_Y)^2\big]$$

$$= \operatorname{Var}(X) + 2\operatorname{Cov}(X,Y) + \operatorname{Var}(Y)$$

Note: If $X$ and $Y$ are independent, then

$$\operatorname{Cov}(X,Y) = E\big[(X-\mu_X)(Y-\mu_Y)\big] = E[X-\mu_X]\,E[Y-\mu_Y] = 0\cdot 0 = 0,$$

and hence $\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$.

Definition: For any two random variables $X$ and $Y$, the correlation coefficient $\rho_{XY}$ is defined to be

$$\rho_{XY} = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X\sigma_Y}$$

Thus $\operatorname{Cov}(X,Y) = \rho_{XY}\sigma_X\sigma_Y$, so that

$$\operatorname{Var}(X+Y) = \sigma_X^2 + \sigma_Y^2 + 2\rho_{XY}\sigma_X\sigma_Y,$$

which reduces to $\sigma_X^2 + \sigma_Y^2$ if $X$ and $Y$ are independent.

Properties of the correlation coefficient $\rho_{XY}$

$$\rho_{XY} = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X\sigma_Y}$$

1. If $X$ and $Y$ are independent, then $\rho_{XY} = 0$. Reason: $\operatorname{Cov}(X,Y) = 0$. The converse is not necessarily true; i.e. $\rho_{XY} = 0$ does not imply that $X$ and $Y$ are independent.

2. $-1 \le \rho_{XY} \le 1$, and $\rho_{XY} = \pm 1$ if and only if there exist $a$ and $b$ such that

$$P[Y = bX + a] = 1,$$

where $\rho_{XY} = +1$ if $b > 0$ and $\rho_{XY} = -1$ if $b < 0$.

Proof: Let $U = X - \mu_X$ and $V = Y - \mu_Y$, and consider choosing $b$ to minimize

$$g(b) = E\big[(V - bU)^2\big] \ge 0 \quad\text{for all } b.$$

Expanding,

$$g(b) = E\big[V^2 - 2bVU + b^2U^2\big] = E[V^2] - 2bE[VU] + b^2E[U^2],$$

so

$$g'(b) = -2E[VU] + 2bE[U^2] = 0 \quad\Rightarrow\quad b_{\min} = \frac{E[VU]}{E[U^2]}.$$

Since $g(b) \ge 0$ for all $b$, in particular $g(b_{\min}) \ge 0$:

$$g(b_{\min}) = E[V^2] - 2b_{\min}E[VU] + b_{\min}^2E[U^2] = E[V^2] - 2\frac{\big(E[VU]\big)^2}{E[U^2]} + \frac{\big(E[VU]\big)^2}{E[U^2]} = E[V^2] - \frac{\big(E[VU]\big)^2}{E[U^2]} \ge 0.$$

Hence

$$\frac{\big(E[VU]\big)^2}{E[U^2]\,E[V^2]} \le 1, \qquad\text{or}\qquad \rho_{XY}^2 = \frac{\big(E[(X-\mu_X)(Y-\mu_Y)]\big)^2}{\sigma_X^2\,\sigma_Y^2} \le 1,$$

so $-1 \le \rho_{XY} \le 1$.

Note: $\rho_{XY}^2 = 1$ if and only if $g(b_{\min}) = E\big[(V - b_{\min}U)^2\big] = 0$. This will be true if and only if

$$P[V - b_{\min}U = 0] = 1, \quad\text{i.e.}\quad P\big[Y - \mu_Y = b_{\min}(X - \mu_X)\big] = 1, \quad\text{i.e.}\quad P[Y = b_{\min}X + a] = 1 \text{ where } a = \mu_Y - b_{\min}\mu_X.$$

Summary: $-1 \le \rho_{XY} \le 1$, and $\rho_{XY} = \pm 1$ if and only if there exist $a$ and $b$ such that $P[Y = bX + a] = 1$, where

$$b = b_{\min} = \frac{E\big[(X-\mu_X)(Y-\mu_Y)\big]}{E\big[(X-\mu_X)^2\big]} = \frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)} = \frac{\sigma_{XY}}{\sigma_X^2} = \rho_{XY}\frac{\sigma_Y}{\sigma_X} \qquad\text{and}\qquad a = \mu_Y - b_{\min}\mu_X.$$
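The two properties can be seen numerically. Below is a simulation sketch (an illustrative addition, with arbitrarily chosen distributions): an exact linear relation $Y = bX + a$ with $b < 0$ gives $\rho_{XY} = -1$, while independent variables give a correlation near 0.

```python
# Empirical correlation: exactly -1 for Y = -3X + 5, near 0 for independent X and Y.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=200_000)

Y_linear = -3.0 * X + 5.0                    # exact linear relation, b < 0
Y_indep = rng.normal(size=X.size)            # independent of X

def rho(u, v):
    return np.cov(u, v)[0, 1] / (u.std(ddof=1) * v.std(ddof=1))

print(rho(X, Y_linear))   # -1.0 (up to rounding), since b = -3 < 0
print(rho(X, Y_indep))    # close to 0
```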

$$2.\quad \operatorname{Var}(aX + bY) = a^2\operatorname{Var}(X) + b^2\operatorname{Var}(Y) + 2ab\operatorname{Cov}(X,Y)$$

Proof: $\operatorname{Var}(aX+bY) = E\big[(aX+bY-\mu_{aX+bY})^2\big]$ with $\mu_{aX+bY} = E[aX+bY] = a\mu_X + b\mu_Y$. Thus

$$\operatorname{Var}(aX+bY) = E\big[\big(a(X-\mu_X) + b(Y-\mu_Y)\big)^2\big] = E\big[a^2(X-\mu_X)^2 + 2ab(X-\mu_X)(Y-\mu_Y) + b^2(Y-\mu_Y)^2\big]$$

$$= a^2\operatorname{Var}(X) + 2ab\operatorname{Cov}(X,Y) + b^2\operatorname{Var}(Y)$$

$$3.\quad \operatorname{Var}(a_1X_1 + \cdots + a_nX_n) = a_1^2\operatorname{Var}(X_1) + \cdots + a_n^2\operatorname{Var}(X_n)$$
$$\qquad + 2a_1a_2\operatorname{Cov}(X_1,X_2) + \cdots + 2a_1a_n\operatorname{Cov}(X_1,X_n)$$
$$\qquad + 2a_2a_3\operatorname{Cov}(X_2,X_3) + \cdots + 2a_2a_n\operatorname{Cov}(X_2,X_n)$$
$$\qquad \ \vdots$$
$$\qquad + 2a_{n-1}a_n\operatorname{Cov}(X_{n-1},X_n)$$

$$= \sum_{i=1}^{n} a_i^2\operatorname{Var}(X_i) + 2\sum_{i<j} a_ia_j\operatorname{Cov}(X_i,X_j)$$

$$= \sum_{i=1}^{n} a_i^2\operatorname{Var}(X_i) \qquad\text{if } X_1,\dots,X_n \text{ are mutually independent.}$$
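In matrix form this rule says $\operatorname{Var}(a_1X_1+\cdots+a_nX_n) = a'\Sigma a$, where $\Sigma$ is the covariance matrix of $(X_1,\dots,X_n)$. A simulation sketch (the covariance matrix and coefficients below are made-up illustrative values):

```python
# Rule 3 in matrix form: Var(a1 X1 + ... + an Xn) = a' Sigma a.
# Sigma and a below are assumed illustrative values.
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[2.0, 0.6, -0.3],
                  [0.6, 1.0,  0.4],
                  [-0.3, 0.4, 1.5]])       # assumed covariance matrix
a = np.array([1.0, -2.0, 0.5])

# Formula: sum_i a_i^2 Var(X_i) + 2 sum_{i<j} a_i a_j Cov(X_i, X_j) = a' Sigma a
var_formula = a @ Sigma @ a

# Simulation check with correlated normal (X1, X2, X3)
Xs = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=500_000)
var_sim = (Xs @ a).var(ddof=1)

print(var_formula, var_sim)                # approximately equal
```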

Some Applications (Rules of Expectation & Variance)

Let $X_1,\dots,X_n$ be $n$ mutually independent random variables, each having mean $\mu$ and standard deviation $\sigma$ (variance $\sigma^2$). Let

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{n}X_1 + \cdots + \frac{1}{n}X_n,$$

i.e. the linear combination $a_1X_1 + \cdots + a_nX_n$ with $a_1 = \cdots = a_n = \tfrac{1}{n}$. Then

$$\mu_{\bar{X}} = E[\bar{X}] = \frac{1}{n}E[X_1] + \cdots + \frac{1}{n}E[X_n] = \frac{1}{n}\mu + \cdots + \frac{1}{n}\mu = \mu.$$

Also

$$\sigma_{\bar{X}}^2 = \operatorname{Var}(\bar{X}) = \frac{1}{n^2}\operatorname{Var}(X_1) + \cdots + \frac{1}{n^2}\operatorname{Var}(X_n) = \frac{1}{n^2}\sigma^2 + \cdots + \frac{1}{n^2}\sigma^2 = n\cdot\frac{\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$

Thus $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$.

Hence the distribution of $\bar{X}$ is centered at $\mu$ and becomes more and more compact about $\mu$ as $n$ increases.
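This concentration can be seen directly by simulation. A sketch (the exponential population below is an arbitrary illustrative choice with $\mu = \sigma = 1$):

```python
# The distribution of the sample mean concentrates about mu at rate sigma / sqrt(n).
# Simulation sketch with an exponential population (mu = sigma = 1).
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 1.0, 1.0
for n in (10, 100, 1000, 10000):
    xbars = rng.exponential(scale=mu, size=(20_000, n)).mean(axis=1)
    print(n, xbars.mean(), xbars.std(ddof=1), sigma / np.sqrt(n))
    # the empirical mean of xbar stays near mu; its standard deviation tracks sigma/sqrt(n)
```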

Tchebychev's Inequality

Let $X$ denote a random variable with mean $\mu = E(X)$ and variance $\operatorname{Var}(X) = E\big[(X-\mu)^2\big] = \sigma^2$. Then

$$P\big[|X - \mu| \ge k\sigma\big] \le \frac{1}{k^2} \qquad\text{and}\qquad P\big[\mu - k\sigma < X < \mu + k\sigma\big] \ge 1 - \frac{1}{k^2}.$$

Note: $\sigma = \sqrt{\operatorname{Var}(X)} = \sqrt{E\big[(X-\mu)^2\big]}$ is called the standard deviation of $X$.

Proof (continuous case):

$$\operatorname{Var}(X) = \sigma^2 = \int_{-\infty}^{\infty}(x-\mu)^2 f(x)\,dx$$

$$= \int_{-\infty}^{\mu-k\sigma}(x-\mu)^2 f(x)\,dx + \int_{\mu-k\sigma}^{\mu+k\sigma}(x-\mu)^2 f(x)\,dx + \int_{\mu+k\sigma}^{\infty}(x-\mu)^2 f(x)\,dx$$

$$\ge \int_{-\infty}^{\mu-k\sigma}(x-\mu)^2 f(x)\,dx + \int_{\mu+k\sigma}^{\infty}(x-\mu)^2 f(x)\,dx$$

$$\ge \int_{-\infty}^{\mu-k\sigma}k^2\sigma^2 f(x)\,dx + \int_{\mu+k\sigma}^{\infty}k^2\sigma^2 f(x)\,dx \qquad\text{(since } (x-\mu)^2 \ge k^2\sigma^2 \text{ on these regions)}$$

$$= k^2\sigma^2\big(P[X \le \mu - k\sigma] + P[X \ge \mu + k\sigma]\big) = k^2\sigma^2\,P\big[|X-\mu| \ge k\sigma\big].$$

Thus $\sigma^2 \ge k^2\sigma^2\,P\big[|X-\mu| \ge k\sigma\big]$, or

$$P\big[|X-\mu| \ge k\sigma\big] \le \frac{1}{k^2} \qquad\text{and}\qquad P\big[|X-\mu| < k\sigma\big] \ge 1 - \frac{1}{k^2}.$$

Tchebychev's inequality is very conservative:

$$P\big[|X-\mu| < k\sigma\big] = P\big[\mu - k\sigma < X < \mu + k\sigma\big] \ge 1 - \frac{1}{k^2}$$

• $k = 1$: $P[\mu - \sigma < X < \mu + \sigma] \ge 1 - \tfrac{1}{1^2} = 0$

• $k = 2$: $P[\mu - 2\sigma < X < \mu + 2\sigma] \ge 1 - \tfrac{1}{2^2} = \tfrac{3}{4}$

• $k = 3$: $P[\mu - 3\sigma < X < \mu + 3\sigma] \ge 1 - \tfrac{1}{3^2} = \tfrac{8}{9}$
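A short sketch comparing the bound with exact probabilities for a normal random variable (an illustrative choice; the bound itself holds for any $X$ with finite variance):

```python
# Tchebychev's bound vs the exact probabilities for Z ~ N(0, 1).
import math

for k in (1, 2, 3):
    bound = 1 - 1 / k**2                         # Tchebychev: P[|X - mu| < k sigma] >= bound
    normal = math.erf(k / math.sqrt(2))          # exact P[|Z| < k] for Z ~ N(0, 1)
    print(k, bound, round(normal, 4))
# k=1: bound 0.000 vs 0.6827
# k=2: bound 0.750 vs 0.9545
# k=3: bound 0.889 vs 0.9973
```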

The Law of Large Numbers

Let $X_1,\dots,X_n$ be $n$ mutually independent random variables, each having mean $\mu$ and standard deviation $\sigma$, and let

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$

Then for any $\varepsilon > 0$ (no matter how small),

$$P\big[\mu - \varepsilon < \bar{X} < \mu + \varepsilon\big] = P\big[|\bar{X} - \mu| < \varepsilon\big] \to 1 \qquad\text{as } n \to \infty.$$

Proof: We use Tchebychev's inequality, which applied to $\bar{X}$ states

$$P\big[\mu_{\bar{X}} - k\sigma_{\bar{X}} < \bar{X} < \mu_{\bar{X}} + k\sigma_{\bar{X}}\big] \ge 1 - \frac{1}{k^2}.$$

Now $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$. Choose $k$ so that $k\sigma_{\bar{X}} = \varepsilon$, i.e.

$$k = \frac{\varepsilon}{\sigma_{\bar{X}}} = \frac{\varepsilon\sqrt{n}}{\sigma} \to \infty \qquad\text{as } n \to \infty.$$

Thus

$$P\big[|\bar{X} - \mu| < \varepsilon\big] \ge 1 - \frac{1}{k^2} = 1 - \frac{\sigma^2}{n\varepsilon^2} \to 1 \qquad\text{as } n \to \infty.$$

A Special Case

Let $X_1,\dots,X_n$ be $n$ mutually independent random variables, each having a Bernoulli distribution with parameter $p$:

$$X_i = \begin{cases} 1 & \text{if the repetition is a success } (S),\ \text{prob } p \\ 0 & \text{if the repetition is a failure } (F),\ \text{prob } q = 1-p \end{cases}$$

Then $E[X_i] = p$ and

$$\bar{X} = \frac{X_1 + \cdots + X_n}{n} = \hat{p} = \text{the proportion of successes.}$$

Thus the Law of Large Numbers states that

$$P\big[|\hat{p} - p| < \varepsilon\big] \to 1 \qquad\text{as } n \to \infty,$$

i.e. $\hat{p}$, the proportion of successes, converges to the probability of success $p$.

Some people misinterpret this to mean that if the proportion of successes is currently lower than $p$, then the proportion of successes in the future will have to be larger than $p$ to counter this and ensure that the Law of Large Numbers holds true. Of course, if in the infinite future the proportion of successes is $p$, then this is enough to ensure that the Law of Large Numbers holds true.
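A simulation sketch of this special case (with an arbitrarily chosen $p = 0.3$): the running proportion of successes drifts toward $p$ without "compensating" for early deficits.

```python
# Law of Large Numbers for Bernoulli trials: the running proportion p_hat approaches p.
import numpy as np

rng = np.random.default_rng(4)
p = 0.3
trials = rng.random(1_000_000) < p                 # Bernoulli(p) outcomes
p_hat = np.cumsum(trials) / np.arange(1, trials.size + 1)

for n in (10, 100, 10_000, 1_000_000):
    print(n, p_hat[n - 1])                         # proportion after n trials, approaching p
```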

Some more applications of the rules of expectation and the rules of variance.

The mean and variance of a binomial random variable

We have already computed these by other methods:
1. Using the probability function p(x).
2. Using the moment generating function $m_X(t)$.

Suppose that we have observed $n$ independent repetitions of a Bernoulli trial. Let $X_1,\dots,X_n$ be $n$ mutually independent random variables, each having a Bernoulli distribution with parameter $p$, defined by

$$X_i = \begin{cases} 1 & \text{if the } i^{\text{th}} \text{ repetition is a success } (S),\ \text{prob } p \\ 0 & \text{if the } i^{\text{th}} \text{ repetition is a failure } (F),\ \text{prob } q \end{cases}$$

Now $X = X_1 + \cdots + X_n$ has a binomial distribution with parameters $n$ and $p$; $X$ is the total number of successes in the $n$ repetitions.

$$E[X_i] = 1\cdot p + 0\cdot q = p, \qquad \mu_X = E[X] = E[X_1] + \cdots + E[X_n] = p + \cdots + p = np$$

$$\operatorname{Var}(X_i) = E[X_i^2] - p^2 = \big(1^2\cdot p + 0^2\cdot q\big) - p^2 = p - p^2 = pq$$

$$\sigma_X^2 = \operatorname{Var}(X) = \operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n) = pq + \cdots + pq = npq$$
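A simulation sketch of this decomposition (the values $n = 40$, $p = 0.35$ are arbitrary illustrative choices):

```python
# Binomial X = X1 + ... + Xn with Xi ~ Bernoulli(p): checking E[X] = np and Var(X) = npq.
import numpy as np

rng = np.random.default_rng(5)
n, p = 40, 0.35
q = 1 - p

bern = (rng.random((200_000, n)) < p).astype(float)   # 200,000 rows of n Bernoulli trials
X = bern.sum(axis=1)                                   # each row sum is one Binomial(n, p) draw

print(X.mean(), n * p)            # ~ 14.0
print(X.var(ddof=1), n * p * q)   # ~ 9.1
```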

The mean and variance of a hypergeometric distribution

The hypergeometric distribution arises when we sample without replacement $n$ objects from a population of $N = a + b$ objects. The population is divided into two groups (group A and group B). Group A contains $a$ objects while group B contains $b$ objects.

Let $X$ denote the number of objects in the sample of $n$ that come from group A. The probability function of $X$ is

$$p(x) = \frac{\dbinom{a}{x}\dbinom{b}{n-x}}{\dbinom{a+b}{n}}.$$

Let $X_1,\dots,X_n$ be the $n$ random variables defined by

$$X_i = \begin{cases} 1 & \text{if the } i^{\text{th}} \text{ object selected comes from group A} \\ 0 & \text{if the } i^{\text{th}} \text{ object selected comes from group B} \end{cases}$$

Then $X = X_1 + \cdots + X_n$, and

$$P[X_i = 1] = \frac{a}{a+b} \qquad\text{and}\qquad P[X_i = 0] = \frac{b}{a+b}.$$

Proof (for $P[X_i = 1]$, counting ordered samples of size $n$ from the $a+b$ objects):

$$P[X_i = 1] = \frac{a \cdot \dfrac{(a+b-1)!}{(a+b-n)!}}{\dfrac{(a+b)!}{(a+b-n)!}} = \frac{a}{a+b}.$$

Therefore

$$E[X_i] = 1\cdot P[X_i=1] + 0\cdot P[X_i=0] = \frac{a}{a+b} \qquad\text{and}\qquad E[X_i^2] = 1^2\cdot P[X_i=1] + 0^2\cdot P[X_i=0] = \frac{a}{a+b},$$

so

$$\operatorname{Var}(X_i) = E[X_i^2] - \big(E[X_i]\big)^2 = \frac{a}{a+b} - \left(\frac{a}{a+b}\right)^2 = \frac{a}{a+b}\left(1 - \frac{a}{a+b}\right) = \frac{a}{a+b}\cdot\frac{b}{a+b} = \frac{ab}{(a+b)^2}.$$

Thus

$$E[X] = E[X_1 + \cdots + X_n] = \sum_{i=1}^{n} E[X_i] = \frac{na}{a+b}.$$

Also

$$\operatorname{Var}(X) = \operatorname{Var}(X_1 + \cdots + X_n) = \sum_{i=1}^{n}\operatorname{Var}(X_i) + 2\sum_{i<j}\operatorname{Cov}(X_i, X_j),$$

so we also need to calculate $\operatorname{Cov}(X_i, X_j)$.

Note:

$$\operatorname{Cov}(U,V) = E\big[(U-\mu_U)(V-\mu_V)\big] = E\big[UV - \mu_U V - \mu_V U + \mu_U\mu_V\big] = E[UV] - \mu_U\mu_V = E[UV] - E[U]E[V].$$

Thus

$$\operatorname{Cov}(X_i, X_j) = E[X_iX_j] - E[X_i]E[X_j], \qquad\text{where } E[X_i] = E[X_j] = \frac{a}{a+b}.$$

Note:

$$E[X_iX_j] = 1\cdot P[X_iX_j = 1] + 0\cdot P[X_iX_j = 0] = P[X_i = 1,\ X_j = 1],$$

and, again counting ordered samples,

$$P[X_i = 1,\ X_j = 1] = \frac{a(a-1)\,\dfrac{(a+b-2)!}{(a+b-n)!}}{\dfrac{(a+b)!}{(a+b-n)!}} = \frac{a(a-1)}{(a+b)(a+b-1)}.$$

Thus

$$\operatorname{Cov}(X_i, X_j) = E[X_iX_j] - E[X_i]E[X_j] = \frac{a(a-1)}{(a+b)(a+b-1)} - \left(\frac{a}{a+b}\right)^2$$

$$= \frac{a}{a+b}\left[\frac{a-1}{a+b-1} - \frac{a}{a+b}\right] = \frac{a}{a+b}\cdot\frac{(a-1)(a+b) - a(a+b-1)}{(a+b-1)(a+b)} = \frac{-ab}{(a+b)^2(a+b-1)}.$$

Thus, with

$$\operatorname{Var}(X_i) = \frac{ab}{(a+b)^2} \qquad\text{and}\qquad \operatorname{Cov}(X_i,X_j) = \frac{-ab}{(a+b)^2(a+b-1)},$$

$$\operatorname{Var}(X) = \operatorname{Var}(X_1 + \cdots + X_n) = \sum_{i=1}^{n}\operatorname{Var}(X_i) + 2\sum_{i<j}\operatorname{Cov}(X_i,X_j)$$

$$= n\,\frac{ab}{(a+b)^2} - 2\binom{n}{2}\frac{ab}{(a+b)^2(a+b-1)} = n\,\frac{ab}{(a+b)^2} - \frac{n(n-1)\,ab}{(a+b)^2(a+b-1)}$$

$$= n\,\frac{ab}{(a+b)^2}\left[1 - \frac{n-1}{a+b-1}\right] = n\,p_A\,p_B\,(1-f),$$

where

$$p_A = \frac{a}{a+b},\qquad p_B = \frac{b}{a+b},\qquad f = \frac{n-1}{a+b-1} = \frac{n-1}{N-1}.$$

Thus if $X$ has a hypergeometric distribution with parameters $a$, $b$ and $n$, then

$$E[X] = n\,\frac{a}{a+b} = n\,p_A \qquad\text{and}\qquad \operatorname{Var}(X) = n\,p_A\,p_B\,(1-f).$$
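A simulation sketch of these formulas (the values $a = 30$, $b = 70$, $n = 20$ are arbitrary illustrative choices):

```python
# Hypergeometric mean and variance: E[X] = n*pA and Var(X) = n*pA*pB*(1 - f).
import numpy as np

rng = np.random.default_rng(6)
a, b, n = 30, 70, 20                       # 30 group-A objects, 70 group-B objects, sample 20
pA, pB = a / (a + b), b / (a + b)
f = (n - 1) / (a + b - 1)

X = rng.hypergeometric(ngood=a, nbad=b, nsample=n, size=500_000)

print(X.mean(), n * pA)                      # ~ 6.0
print(X.var(ddof=1), n * pA * pB * (1 - f))  # ~ 3.4
```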

The mean and variance of a negative binomial distribution

The negative binomial distribution arises when we repeat a Bernoulli trial until $k$ successes (S) occur. Then $X$ = the trial on which the $k^{\text{th}}$ success occurs. The probability function of $X$ is

$$p(x) = \binom{x-1}{k-1} p^k q^{x-k}, \qquad x = k,\ k+1,\ k+2,\ \dots$$

Let $X_1$ = the number of the trial on which the $1^{\text{st}}$ success occurred, and let $X_i$ = the number of trials after the $(i-1)^{\text{st}}$ success up to and including the $i^{\text{th}}$ success $(i \ge 2)$.

Each $X_i$ has a geometric distribution with parameter $p$; $X = X_1 + \cdots + X_k$; and $X_1,\dots,X_k$ are mutually independent. Thus

$$E[X_i] = \frac{1}{p} \qquad\text{and}\qquad \operatorname{Var}(X_i) = \frac{q}{p^2},$$

hence

$$E[X] = \sum_{i=1}^{k} E[X_i] = \frac{k}{p} \qquad\text{and}\qquad \operatorname{Var}(X) = \sum_{i=1}^{k}\operatorname{Var}(X_i) = \frac{kq}{p^2}.$$

Thus if $X$ has a negative binomial distribution with parameters $k$ and $p$, then

$$E[X] = \frac{k}{p} \qquad\text{and}\qquad \operatorname{Var}(X) = \frac{kq}{p^2}.$$
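A simulation sketch of the "sum of $k$ geometric waiting times" decomposition (the values $k = 5$, $p = 0.25$ are arbitrary illustrative choices):

```python
# Negative binomial X as a sum of k independent geometric waiting times:
# E[X] = k/p and Var(X) = k*q/p^2.
import numpy as np

rng = np.random.default_rng(7)
k, p = 5, 0.25
q = 1 - p

# Each row: k geometric waiting times (trial counts between successes); their sum is
# the trial on which the k-th success occurs.
waits = rng.geometric(p, size=(300_000, k))
X = waits.sum(axis=1)

print(X.mean(), k / p)              # ~ 20.0
print(X.var(ddof=1), k * q / p**2)  # ~ 60.0
```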

Multivariate Moments: Non-central and Central

Definition: Let $X_1$ and $X_2$ be jointly distributed random variables (discrete or continuous). Then for any pair of positive integers $(k_1, k_2)$ the joint moment of $(X_1, X_2)$ of order $(k_1, k_2)$ is defined to be

$$\mu_{k_1,k_2} = E\big[X_1^{k_1}X_2^{k_2}\big] = \begin{cases} \displaystyle\sum_{x_1}\sum_{x_2} x_1^{k_1}x_2^{k_2}\, p(x_1,x_2) & \text{if } X_1, X_2 \text{ are discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1^{k_1}x_2^{k_2}\, f(x_1,x_2)\, dx_1\,dx_2 & \text{if } X_1, X_2 \text{ are continuous} \end{cases}$$

Definition: Let $X_1$ and $X_2$ be jointly distributed random variables (discrete or continuous). Then for any pair of positive integers $(k_1, k_2)$ the joint central moment of $(X_1, X_2)$ of order $(k_1, k_2)$ is defined to be

$$\mu^{0}_{k_1,k_2} = E\big[(X_1-\mu_1)^{k_1}(X_2-\mu_2)^{k_2}\big] = \begin{cases} \displaystyle\sum_{x_1}\sum_{x_2} (x_1-\mu_1)^{k_1}(x_2-\mu_2)^{k_2}\, p(x_1,x_2) & \text{if } X_1, X_2 \text{ are discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_1-\mu_1)^{k_1}(x_2-\mu_2)^{k_2}\, f(x_1,x_2)\, dx_1\,dx_2 & \text{if } X_1, X_2 \text{ are continuous} \end{cases}$$

where $\mu_1 = E[X_1]$ and $\mu_2 = E[X_2]$.

Note:

$$\mu^{0}_{1,1} = E\big[(X_1-\mu_1)(X_2-\mu_2)\big] = \operatorname{Cov}(X_1, X_2) = \text{the covariance of } X_1 \text{ and } X_2.$$
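A short computational sketch of these definitions, using an assumed example density $f(x_1,x_2) = x_1 + x_2$ on the unit square (not a density from the original notes, but it integrates to 1):

```python
# Joint moments and central moments by direct integration for f(x1, x2) = x1 + x2 on [0,1]^2.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1 + x2

mu1 = sp.integrate(x1 * f, (x1, 0, 1), (x2, 0, 1))       # E[X1] = 7/12
mu2 = sp.integrate(x2 * f, (x1, 0, 1), (x2, 0, 1))       # E[X2] = 7/12
m11 = sp.integrate(x1 * x2 * f, (x1, 0, 1), (x2, 0, 1))  # joint moment mu_{1,1} = E[X1 X2] = 1/3

# central moment mu0_{1,1} = E[(X1 - mu1)(X2 - mu2)] = Cov(X1, X2)
cov = sp.integrate((x1 - mu1) * (x2 - mu2) * f, (x1, 0, 1), (x2, 0, 1))

print(mu1, mu2, m11, cov)   # 7/12 7/12 1/3 -1/144
```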


Distribution Functions, Moments, and Moment Generating Functions in the Multivariate Case

The distribution function F(x)

This is defined for any random variable $X$:

$$F(x) = P[X \le x]$$

Properties

1. $F(-\infty) = 0$ and $F(\infty) = 1$.

2. $F(x)$ is non-decreasing (i.e. if $x_1 < x_2$ then $F(x_1) \le F(x_2)$).

3. $F(b) - F(a) = P[a < X \le b]$.

4. Discrete random variables: $F(x)$ is a non-decreasing step function with

$$F(x) = P[X \le x] = \sum_{u \le x} p(u), \qquad p(x) = F(x) - F(x^-) = \text{the jump in } F(x) \text{ at } x.$$

[Figure: step function $F(x)$ with jumps of height $p(x)$ at the support points.]

5. Continuous random variables: $F(x)$ is a non-decreasing continuous function with

$$F(x) = P[X \le x] = \int_{-\infty}^{x} f(u)\,du, \qquad f(x) = F'(x).$$

[Figure: continuous $F(x)$; the slope of $F$ at $x$ equals $f(x)$.]

To find the probability density function $f(x)$, one first finds $F(x)$ and then $f(x) = F'(x)$.

The joint distribution function $F(x_1, x_2, \dots, x_k)$ is defined for $k$ random variables $X_1, X_2, \dots, X_k$:

$$F(x_1, x_2, \dots, x_k) = P[X_1 \le x_1,\ X_2 \le x_2,\ \dots,\ X_k \le x_k]$$

For $k = 2$:

$$F(x_1, x_2) = P[X_1 \le x_1,\ X_2 \le x_2]$$

[Figure: the event $\{X_1 \le x_1, X_2 \le x_2\}$ is the "south-west" quadrant of the point $(x_1, x_2)$.]

Properties

1. $F(x_1, -\infty) = F(-\infty, x_2) = F(-\infty, -\infty) = 0$.

2. $F(x_1, \infty) = P[X_1 \le x_1, X_2 \le \infty] = P[X_1 \le x_1] = F_1(x_1)$ = the marginal cumulative distribution function of $X_1$. Similarly $F(\infty, x_2) = P[X_2 \le x_2] = F_2(x_2)$ = the marginal cumulative distribution function of $X_2$, and $F(\infty, \infty) = 1$.

3. $F(x_1, x_2)$ is non-decreasing in both the $x_1$ direction and the $x_2$ direction, i.e. if $a_1 < b_1$ and $a_2 < b_2$ then
   i. $F(a_1, x_2) \le F(b_1, x_2)$
   ii. $F(x_1, a_2) \le F(x_1, b_2)$
   iii. $F(a_1, a_2) \le F(b_1, b_2)$

4. $P[a < X_1 \le b,\ c < X_2 \le d] = F(b,d) - F(a,d) - F(b,c) + F(a,c)$.

[Figure: the rectangle with corners $(a,c)$, $(b,c)$, $(a,d)$, $(b,d)$.]

5. Discrete random variables: $F(x_1, x_2)$ is a step surface with

$$F(x_1, x_2) = P[X_1 \le x_1,\ X_2 \le x_2] = \sum_{u_1 \le x_1}\sum_{u_2 \le x_2} p(u_1, u_2),$$

and $p(x_1, x_2)$ = the jump in $F(x_1, x_2)$ at $(x_1, x_2)$.

6. Continuous random variables: $F(x_1, x_2)$ is a continuous surface with

$$F(x_1, x_2) = P[X_1 \le x_1,\ X_2 \le x_2] = \int_{-\infty}^{x_2}\int_{-\infty}^{x_1} f(u_1, u_2)\,du_1\,du_2$$

and

$$f(x_1, x_2) = \frac{\partial^2 F(x_1, x_2)}{\partial x_1\,\partial x_2} = \frac{\partial^2 F(x_1, x_2)}{\partial x_2\,\partial x_1}.$$
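A small sketch of recovering a joint density from a joint distribution function by the mixed partial derivative, using an assumed example CDF $F(x_1,x_2) = x_1^2x_2^2$ on $[0,1]^2$ (an illustrative choice, not from the original notes):

```python
# Recovering f(x1, x2) = d^2 F / dx1 dx2 from an assumed joint CDF on [0,1]^2.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
F = x1**2 * x2**2           # assumed joint CDF for 0 <= x1, x2 <= 1

f = sp.diff(F, x1, x2)      # mixed partial derivative
print(f)                    # 4*x1*x2, a valid density on the unit square

# check: it integrates back to F(1, 1) = 1
print(sp.integrate(f, (x1, 0, 1), (x2, 0, 1)))   # 1
```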


Multivariate Moment Generating Functions

Recall the (univariate) moment generating function:

$$m_X(t) = E\big[e^{tX}\big] = \begin{cases} \displaystyle\sum_{x} e^{tx}\, p(x) & \text{if } X \text{ is discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} e^{tx}\, f(x)\, dx & \text{if } X \text{ is continuous} \end{cases}$$

Definition: Let $X_1, X_2, \dots, X_k$ be jointly distributed random variables (discrete or continuous). Then the joint moment generating function is defined to be

$$m_{X_1,\dots,X_k}(t_1, \dots, t_k) = E\big[e^{t_1X_1 + \cdots + t_kX_k}\big] = \begin{cases} \displaystyle\sum_{x_1}\cdots\sum_{x_k} e^{t_1x_1 + \cdots + t_kx_k}\, p(x_1,\dots,x_k) & \text{if } X_1,\dots,X_k \text{ are discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{t_1x_1 + \cdots + t_kx_k}\, f(x_1,\dots,x_k)\, dx_1\cdots dx_k & \text{if } X_1,\dots,X_k \text{ are continuous} \end{cases}$$

Note:

$$m_{X_1,\dots,X_k}(0, \dots, 0) = 1 \qquad\text{and}\qquad m_{X_1,\dots,X_k}(0, \dots, 0, t_i, 0, \dots, 0) = m_{X_i}(t_i).$$

Power series expansion of the joint moment generating function ($k = 2$):

$$m_{X,Y}(t,s) = E\big[e^{tX + sY}\big] = E\big[e^{tX}e^{sY}\big]$$

Using $e^u = 1 + u + \dfrac{u^2}{2!} + \dfrac{u^3}{3!} + \dfrac{u^4}{4!} + \cdots$,

$$m_{X,Y}(t,s) = E\left[1 + (tX+sY) + \frac{(tX+sY)^2}{2!} + \cdots\right]$$

$$= 1 + E[X]\,t + E[Y]\,s + E[X^2]\frac{t^2}{2!} + E[XY]\,ts + E[Y^2]\frac{s^2}{2!} + \cdots + E\big[X^kY^m\big]\frac{t^k}{k!}\frac{s^m}{m!} + \cdots$$

$$= 1 + \mu_{1,0}\,t + \mu_{0,1}\,s + \mu_{2,0}\frac{t^2}{2!} + \mu_{1,1}\,ts + \mu_{0,2}\frac{s^2}{2!} + \cdots + \mu_{k,m}\frac{t^k}{k!}\frac{s^m}{m!} + \cdots$$
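Equivalently, the joint moments can be read off as mixed partial derivatives of $m_{X,Y}(t,s)$ at $t = s = 0$. A sketch using the standard bivariate normal MGF $m(t,s) = \exp\!\big((t^2 + 2\rho ts + s^2)/2\big)$ as an assumed example (not taken from the original notes):

```python
# mu_{k,m} = d^{k+m} m(t,s) / dt^k ds^m evaluated at t = s = 0,
# illustrated with the standard bivariate normal MGF.
import sympy as sp

t, s, rho = sp.symbols('t s rho')
m = sp.exp((t**2 + 2*rho*t*s + s**2) / 2)

mu_10 = sp.diff(m, t).subs({t: 0, s: 0})       # E[X]   = 0
mu_20 = sp.diff(m, t, 2).subs({t: 0, s: 0})    # E[X^2] = 1
mu_11 = sp.diff(m, t, s).subs({t: 0, s: 0})    # E[XY]  = rho

print(mu_10, mu_20, mu_11)   # 0 1 rho
```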