ECON509_-_Slides_3_-_2014_11_07



Probability and statistics notes

Transcript of ECON509_-_Slides_3_-_2014_11_07

Page 1: ECON509_-_Slides_3_-_2014_11_07

ECON509 Probability and Statistics, Slides 3

Bilkent

This Version: 7 Nov 2014

(Bilkent) ECON509 This Version: 7 Nov 2014 1 / 125

Page 2: ECON509_-_Slides_3_-_2014_11_07

Preliminaries

In this part of the lectures, we will discuss multiple random variables.

This mostly involves extending the concepts we have learned so far to the case of more than one variable. This might sound straightforward but, as always, we have to be careful.

This part is heavily based on Casella & Berger, Chapter 4.

(Bilkent) ECON509 This Version: 7 Nov 2014 2 / 125

Page 3: ECON509_-_Slides_3_-_2014_11_07

Joint and Marginal Distribution

So far, our interest has been on events involving a single random variable only. In other words, we have only considered univariate models.

Multivariate models, on the other hand, involve more than one variable.

Consider an experiment about health characteristics of the population. Would we be interested in one characteristic only, say weight? Not really. There are many important characteristics.

Definition (4.1.1): An n-dimensional random vector is a function from a sample space Ω into Rⁿ, n-dimensional Euclidean space.

Suppose, for example, that with each point in a sample space we associate an ordered pair of numbers, that is, a point (x, y) ∈ R², where R² denotes the plane. Then, we have defined a two-dimensional (or bivariate) random vector (X, Y).

(Bilkent) ECON509 This Version: 7 Nov 2014 3 / 125

Page 4: ECON509_-_Slides_3_-_2014_11_07

Joint and Marginal Distribution

Example (4.1.2): Consider the experiment of tossing two fair dice. The sample space has 36 equally likely points. For example:

(3, 3) : both dice show a 3,

(4, 1) : first die shows a 4 and the second die a 1,

etc.

Now, let

X = sum of the two dice  and  Y = |difference of the two dice|.

Then,

(3, 3) : X = 6 and Y = 0,

(4, 1) : X = 5 and Y = 3,

and so we can define the bivariate random vector (X, Y) thus.

(Bilkent) ECON509 This Version: 7 Nov 2014 4 / 125

Page 5: ECON509_-_Slides_3_-_2014_11_07

Joint and Marginal Distribution

What is, say, P(X = 5 and Y = 3)? One can verify that the two relevant sample points in Ω are (4, 1) and (1, 4). Therefore, the event {X = 5 and Y = 3} will only occur if the event {(4, 1), (1, 4)} occurs. Since each of these sample points in Ω is equally likely,

P({(4, 1), (1, 4)}) = 2/36 = 1/18.

Thus,

P(X = 5 and Y = 3) = 1/18.

For example, can you see why P(X = 7, Y ≤ 4) = 1/9? This is because the only sample points that yield this event are (4, 3), (3, 4), (5, 2) and (2, 5).

Note that from now on we will use P(event a, event b) rather than P(event a and event b).
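
As a quick sanity check (my own addition, not part of the original slides), the probabilities above can be verified by brute-force enumeration of the 36 equally likely sample points in Python:

    from fractions import Fraction

    # Enumerate the 36 equally likely outcomes of two fair dice.
    outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
    prob = Fraction(1, 36)  # probability of each sample point

    # X = sum of the two dice, Y = |difference of the two dice|.
    p_x5_y3 = sum(prob for (d1, d2) in outcomes
                  if d1 + d2 == 5 and abs(d1 - d2) == 3)
    p_x7_yle4 = sum(prob for (d1, d2) in outcomes
                    if d1 + d2 == 7 and abs(d1 - d2) <= 4)

    print(p_x5_y3)    # 1/18
    print(p_x7_yle4)  # 1/9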

(Bilkent) ECON509 This Version: 7 Nov 2014 5 / 125

Page 6: ECON509_-_Slides_3_-_2014_11_07

Joint and Marginal Distribution

Definition (4.1.3): Let (X, Y) be a discrete bivariate random vector. Then the function f(x, y) from R² into R defined by f(x, y) = P(X = x, Y = y) is called the joint probability mass function or joint pmf of (X, Y). If it is necessary to stress the fact that f is the joint pmf of the vector (X, Y) rather than some other vector, the notation f_{X,Y}(x, y) will be used.

(Bilkent) ECON509 This Version: 7 Nov 2014 6 / 125

Page 7: ECON509_-_Slides_3_-_2014_11_07

Joint and Marginal Distribution

The joint pmf of (X, Y) completely defines the probability distribution of the random vector (X, Y), just as in the univariate discrete case.

The value of f(x, y) for each of the 21 possible values of (x, y) is given in the following table.

            x
       2     3     4     5     6     7     8     9     10    11    12
  y 0  1/36        1/36        1/36        1/36        1/36        1/36
    1        1/18        1/18        1/18        1/18        1/18
    2              1/18        1/18        1/18        1/18
    3                    1/18        1/18        1/18
    4                          1/18        1/18
    5                                1/18

Table: Probability table for Example (4.1.2)

The joint pmf is defined for all (x, y) ∈ R², not just the 21 pairs in the above table. For any other (x, y), f(x, y) = P(X = x, Y = y) = 0.

(Bilkent) ECON509 This Version: 7 Nov 2014 7 / 125

Page 8: ECON509_-_Slides_3_-_2014_11_07

Joint and Marginal Distribution

As before, we can use the joint pmf to calculate the probability of any event defined in terms of (X, Y). For A ⊂ R²,

P((X, Y) ∈ A) = ∑_{(x,y) ∈ A} f(x, y).

We could, for example, have A = {(x, y) : x = 7 and y ≤ 4}. Then,

P((X, Y) ∈ A) = P(X = 7, Y ≤ 4) = f(7, 1) + f(7, 3) = 1/18 + 1/18 = 1/9.

Expectations are also dealt with in the same way as before. Let g(x, y) be a real-valued function defined for all possible values (x, y) of the discrete random vector (X, Y). Then, g(X, Y) is itself a random variable and its expected value is

E[g(X, Y)] = ∑_{(x,y) ∈ R²} g(x, y) f(x, y).

(Bilkent) ECON509 This Version: 7 Nov 2014 8 / 125

Page 9: ECON509_-_Slides_3_-_2014_11_07

Joint and Marginal Distribution

Example (4.1.4): For the (X, Y) whose joint pmf is given in the above table, what is the expected value of XY? Letting g(x, y) = xy, we have

E[XY] = 2·0·(1/36) + 4·0·(1/36) + ... + 8·4·(1/18) + 7·5·(1/18) = 13 11/18 (= 245/18).

As before,

E[a g1(X, Y) + b g2(X, Y) + c] = a E[g1(X, Y)] + b E[g2(X, Y)] + c.

One very useful result is that any nonnegative function from R² into R that is nonzero for at most a countable number of (x, y) pairs and sums to 1 is the joint pmf for some bivariate discrete random vector (X, Y).

(Bilkent) ECON509 This Version: 7 Nov 2014 9 / 125

Page 10: ECON509_-_Slides_3_-_2014_11_07

Joint and Marginal Distribution

Example (4.1.5): Define f(x, y) by

f (0, 0) = f (0, 1) = 1/6,

f (1, 0) = f (1, 1) = 1/3,

f (x , y ) = 0 for any other (x , y ) .

Then, f(x, y) is nonnegative and sums up to one, so f(x, y) is the joint pmf for some bivariate random vector (X, Y).

(Bilkent) ECON509 This Version: 7 Nov 2014 10 / 125

Page 11: ECON509_-_Slides_3_-_2014_11_07

Joint and Marginal Distribution

Suppose we have a multivariate random variable (X, Y) but are concerned with, say, P(X = 2) only.

We know the joint pmf f_{X,Y}(x, y), but we need f_X(x) in this case.

Theorem (4.1.6): Let (X, Y) be a discrete bivariate random vector with joint pmf f_{X,Y}(x, y). Then the marginal pmfs of X and Y, f_X(x) = P(X = x) and f_Y(y) = P(Y = y), are given by

f_X(x) = ∑_{y ∈ R} f_{X,Y}(x, y)   and   f_Y(y) = ∑_{x ∈ R} f_{X,Y}(x, y).

Proof: For any x ∈ R, let A_x = {(x, y) : -∞ < y < ∞}. That is, A_x is the line in the plane with first coordinate equal to x. Then, for any x ∈ R,

f_X(x) = P(X = x) = P(X = x, -∞ < Y < ∞)
       = P((X, Y) ∈ A_x) = ∑_{(x,y) ∈ A_x} f_{X,Y}(x, y)
       = ∑_{y ∈ R} f_{X,Y}(x, y).

(Bilkent) ECON509 This Version: 7 Nov 2014 11 / 125

Page 12: ECON509_-_Slides_3_-_2014_11_07

Joint and Marginal Distribution

Example (4.1.7): Now we can compute the marginal distributions of X and Y from the joint distribution given in the above table. Then,

f_Y(0) = f_{X,Y}(2, 0) + f_{X,Y}(4, 0) + f_{X,Y}(6, 0) + f_{X,Y}(8, 0) + f_{X,Y}(10, 0) + f_{X,Y}(12, 0) = 1/6.

As an exercise, you can check that,

fY (1) = 5/18, fY (2) = 2/9, fY (3) = 1/6, fY (4) = 1/9, fY (5) = 1/18.

Notice that ∑_{y=0}^{5} f_Y(y) = 1, as expected, since these are the only six possible values of Y.
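
The marginalisation in Theorem (4.1.6) is purely mechanical, so it is easy to reproduce for the dice example; the sketch below is my own illustration (not from the slides) that builds the joint pmf and sums out x.

    from fractions import Fraction
    from collections import defaultdict

    # Joint pmf of (X, Y) = (sum, |difference|) for two fair dice.
    f_xy = defaultdict(Fraction)
    for d1 in range(1, 7):
        for d2 in range(1, 7):
            f_xy[(d1 + d2, abs(d1 - d2))] += Fraction(1, 36)

    # Marginal pmf of Y: sum the joint pmf over x (Theorem 4.1.6).
    f_y = defaultdict(Fraction)
    for (x, y), p in f_xy.items():
        f_y[y] += p

    print(dict(f_y))          # {0: 1/6, 1: 5/18, 2: 2/9, 3: 1/6, 4: 1/9, 5: 1/18}
    print(sum(f_y.values()))  # 1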

(Bilkent) ECON509 This Version: 7 Nov 2014 12 / 125

Page 13: ECON509_-_Slides_3_-_2014_11_07

Joint and Marginal Distribution

Now, it is crucial to understand that the marginal distributions of X and Y, described by the marginal pmfs f_X(x) and f_Y(y), do not completely describe the joint distribution of X and Y.

There are, in fact, many different joint distributions that have the same marginal distributions.

Knowledge of the marginal distributions alone does not allow us to determine the joint distribution (except under certain assumptions).

(Bilkent) ECON509 This Version: 7 Nov 2014 13 / 125

Page 14: ECON509_-_Slides_3_-_2014_11_07

Joint and Marginal Distribution

Example (4.1.9): Define a joint pmf by

f (0, 0) = 1/12, f (1, 0) = 5/12, f (0, 1) = f (1, 1) = 3/12,

f (x , y ) = 0 for all other values.

Then,

fY (0) = f (0, 0) + f (1, 0) = 1/2,

fY (1) = f (0, 1) + f (1, 1) = 1/2,

fX (0) = f (0, 0) + f (0, 1) = 1/3,

and fX (1) = f (1, 0) + f (1, 1) = 2/3.

Now consider the marginal pmfs for the distribution considered in Example (4.1.5).

fY (0) = f (0, 0) + f (1, 0) = 1/6+ 1/3 = 1/2,

fY (1) = f (0, 1) + f (1, 1) = 1/6+ 1/3 = 1/2,

fX (0) = f (0, 0) + f (0, 1) = 1/6+ 1/6 = 1/3,

and fX (1) = f (1, 0) + f (1, 1) = 1/3+ 1/3 = 2/3.

We have the same marginal pmfs but the joint distributions are different!

(Bilkent) ECON509 This Version: 7 Nov 2014 14 / 125

Page 15: ECON509_-_Slides_3_-_2014_11_07

Joint and Marginal Distribution

Consider now the corresponding definition for continuous random variables.

Definition (4.1.10): A function f(x, y) from R² to R is called a joint probability density function or joint pdf of the continuous bivariate random vector (X, Y) if, for every A ⊂ R²,

P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy.

The notation ∫∫_A means that the integral is evaluated over all (x, y) ∈ A. Naturally, for real-valued functions g(x, y),

E[g(X, Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x, y) f(x, y) dx dy.

It is important to realise that the joint pdf is defined for all (x, y) ∈ R². The pdf may equal 0 on a large set A if P((X, Y) ∈ A) = 0, but the pdf is still defined for the points in A.

(Bilkent) ECON509 This Version: 7 Nov 2014 15 / 125

Page 16: ECON509_-_Slides_3_-_2014_11_07

Joint and Marginal Distribution

Again, naturally,

f_X(x) = ∫_{-∞}^{∞} f(x, y) dy,  -∞ < x < ∞,

f_Y(y) = ∫_{-∞}^{∞} f(x, y) dx,  -∞ < y < ∞.

As before, a useful result is that any function f(x, y) satisfying f(x, y) ≥ 0 for all (x, y) ∈ R² and

1 = ∫_{-∞}^{∞} ∫_{-∞}^{∞} f(x, y) dx dy

is the joint pdf of some continuous bivariate random vector (X, Y).

(Bilkent) ECON509 This Version: 7 Nov 2014 16 / 125

Page 17: ECON509_-_Slides_3_-_2014_11_07

Joint and Marginal Distribution

Example (4.1.11): Let

f(x, y) = 6xy²  if 0 < x < 1 and 0 < y < 1,  and  f(x, y) = 0 otherwise.

Is f(x, y) a joint pdf? Clearly, f(x, y) ≥ 0 for all (x, y) in the defined range. Moreover,

∫_{-∞}^{∞} ∫_{-∞}^{∞} 6xy² dx dy = ∫_0^1 ∫_0^1 6xy² dx dy = ∫_0^1 3x²y² |_{x=0}^{1} dy
                                 = ∫_0^1 3y² dy = y³ |_{y=0}^{1} = 1.

(Bilkent) ECON509 This Version: 7 Nov 2014 17 / 125

Page 18: ECON509_-_Slides_3_-_2014_11_07

Joint and Marginal Distribution

Now, consider calculating P(X + Y ≥ 1). Let A = {(x, y) : x + y ≥ 1}, in which case we are interested in calculating P((X, Y) ∈ A). Now, f(x, y) = 0 everywhere except where 0 < x < 1 and 0 < y < 1. Then, the region we have in mind is bounded by the lines

x = 1,  y = 1  and  x + y = 1.

Therefore,

A = {(x, y) : x + y ≥ 1, 0 < x < 1, 0 < y < 1}
  = {(x, y) : x ≥ 1 - y, 0 < x < 1, 0 < y < 1}
  = {(x, y) : 1 - y ≤ x < 1, 0 < y < 1}.

(Bilkent) ECON509 This Version: 7 Nov 2014 18 / 125

Page 19: ECON509_-_Slides_3_-_2014_11_07

Joint and Marginal Distribution

Hence,

P(X + Y ≥ 1) = ∫∫_A f(x, y) dx dy = ∫_0^1 ∫_{1-y}^{1} 6xy² dx dy
             = ∫_0^1 3x²y² |_{x=1-y}^{1} dy = ∫_0^1 [3y² - 3(1 - y)²y²] dy
             = ∫_0^1 (6y³ - 3y⁴) dy = (6/4)y⁴ - (3/5)y⁵ |_{y=0}^{1}
             = 3/2 - 3/5 = 9/10.

Moreover,

f_X(x) = ∫_{-∞}^{∞} f(x, y) dy = ∫_0^1 6xy² dy = 2xy³ |_{y=0}^{1} = 2x.

Then, for example,

P(1/2 < X < 3/4) = ∫_{1/2}^{3/4} 2x dx = 5/16.
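
These integrals are easy to verify numerically; the snippet below is an illustration I added (assuming SciPy is available), checking that the density integrates to one, that P(X + Y ≥ 1) = 9/10, and that P(1/2 < X < 3/4) = 5/16 under the marginal f_X(x) = 2x.

    from scipy import integrate

    # Joint pdf f(x, y) = 6 x y^2 on 0 < x < 1, 0 < y < 1.
    # dblquad integrates over the *first* argument (here x) on the inner limits.
    f = lambda x, y: 6 * x * y**2

    # Total mass over the unit square (should equal 1).
    total, _ = integrate.dblquad(f, 0, 1, lambda y: 0, lambda y: 1)

    # P(X + Y >= 1): outer variable y in (0, 1), inner variable x in (1 - y, 1).
    p_region, _ = integrate.dblquad(f, 0, 1, lambda y: 1 - y, lambda y: 1)

    # P(1/2 < X < 3/4) from the marginal pdf f_X(x) = 2x.
    p_marg, _ = integrate.quad(lambda x: 2 * x, 0.5, 0.75)

    print(round(total, 6), round(p_region, 6), round(p_marg, 6))  # 1.0 0.9 0.3125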

(Bilkent) ECON509 This Version: 7 Nov 2014 19 / 125

Page 20: ECON509_-_Slides_3_-_2014_11_07

Joint and Marginal Distribution

The joint probability distribution of (X, Y) can be completely described with the joint cdf (cumulative distribution function) rather than with the joint pmf or joint pdf.

The joint cdf is the function F(x, y) defined by

F(x, y) = P(X ≤ x, Y ≤ y)  for all (x, y) ∈ R².

Although for discrete random vectors it might not be convenient to use the joint cdf, for continuous random variables the following relationship makes the joint cdf very useful:

F(x, y) = ∫_{-∞}^{x} ∫_{-∞}^{y} f(s, t) dt ds.

From the bivariate Fundamental Theorem of Calculus,

∂²F(x, y) / ∂x∂y = f(x, y)

at continuity points of f(x, y). This relationship is very important.
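
For the density of Example (4.1.11), this relationship is easy to check symbolically; the sketch below is my own addition (assuming SymPy) that integrates the pdf to obtain F(x, y) on the unit square and differentiates it back.

    import sympy as sp

    x, y, s, t = sp.symbols('x y s t', positive=True)

    # Joint cdf on 0 < x < 1, 0 < y < 1 for the pdf f(s, t) = 6 s t^2.
    F = sp.integrate(sp.integrate(6 * s * t**2, (t, 0, y)), (s, 0, x))
    print(F)                 # x**2*y**3

    # The mixed partial derivative recovers the joint pdf.
    print(sp.diff(F, x, y))  # 6*x*y**2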

(Bilkent) ECON509 This Version: 7 Nov 2014 20 / 125

Page 21: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

We have talked a little bit about conditional probabilities before. Now we will consider conditional distributions.

The idea is the same. If we have some extra information about the sample, we can use that information to make better inference.

Suppose we are sampling from a population where X is the weight (in kg) and Y is the height (in cm). What is P(X > 95)? Would we have a better/more relevant answer if we knew that the person in question has Y = 202 cm? Usually, P(X > 95 | Y = 202) should be much larger than P(X > 95 | Y = 165). Once we have the joint distribution for (X, Y), we can calculate the conditional distributions, as well.

Notice that now we have three distribution concepts: marginal distribution, conditional distribution and joint distribution.

(Bilkent) ECON509 This Version: 7 Nov 2014 21 / 125

Page 22: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

Definition (4.2.1): Let (X, Y) be a discrete bivariate random vector with joint pmf f(x, y) and marginal pmfs f_X(x) and f_Y(y). For any x such that P(X = x) = f_X(x) > 0, the conditional pmf of Y given that X = x is the function of y denoted by f(y|x) and defined by

f(y|x) = P(Y = y | X = x) = f(x, y) / f_X(x).

For any y such that P(Y = y) = f_Y(y) > 0, the conditional pmf of X given that Y = y is the function of x denoted by f(x|y) and defined by

f(x|y) = P(X = x | Y = y) = f(x, y) / f_Y(y).

Can we verify that, say, f(y|x) is a pmf? First, since f(x, y) ≥ 0 and f_X(x) > 0, f(y|x) ≥ 0 for every y. Then,

∑_y f(y|x) = ∑_y f(x, y) / f_X(x) = f_X(x) / f_X(x) = 1.

(Bilkent) ECON509 This Version: 7 Nov 2014 22 / 125

Page 23: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

Example (4.2.2): Define the joint pmf of (X, Y) by

f(0, 10) = f(0, 20) = 2/18,  f(1, 10) = f(1, 30) = 3/18,

f(1, 20) = 4/18  and  f(2, 30) = 4/18,

while f(x, y) = 0 for all other combinations of (x, y). Then,

f_X(0) = f(0, 10) + f(0, 20) = 4/18,

f_X(1) = f(1, 10) + f(1, 20) + f(1, 30) = 10/18,

f_X(2) = f(2, 30) = 4/18.

Moreover,

f(10|0) = f(0, 10) / f_X(0) = (2/18) / (4/18) = 1/2,

f(20|0) = f(0, 20) / f_X(0) = 1/2.

Therefore, given the knowledge that X = 0, Y is equal to either 10 or 20, with equal probability.

(Bilkent) ECON509 This Version: 7 Nov 2014 23 / 125

Page 24: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

In addition,

f(10|1) = f(30|1) = (3/18) / (10/18) = 3/10,

f(20|1) = (4/18) / (10/18) = 4/10,

f(30|2) = (4/18) / (4/18) = 1.

Interestingly, when X = 2, we know for sure that Y will be equal to 30.

Finally,

P(Y > 10 | X = 1) = f(20|1) + f(30|1) = 7/10,

P(Y > 10 | X = 0) = f(20|0) = 1/2,

etc.
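
The conditional pmfs follow mechanically from Definition (4.2.1); the short sketch below (my own illustration) computes every f(y|x) from the joint pmf of Example (4.2.2).

    from fractions import Fraction

    # Joint pmf of Example (4.2.2).
    f = {(0, 10): Fraction(2, 18), (0, 20): Fraction(2, 18),
         (1, 10): Fraction(3, 18), (1, 30): Fraction(3, 18),
         (1, 20): Fraction(4, 18), (2, 30): Fraction(4, 18)}

    # Marginal pmf of X.
    f_x = {}
    for (x, y), p in f.items():
        f_x[x] = f_x.get(x, Fraction(0)) + p

    # Conditional pmf f(y|x) = f(x, y) / f_X(x).
    f_cond = {(y, x): p / f_x[x] for (x, y), p in f.items()}
    print(f_cond[(10, 0)], f_cond[(30, 2)])   # 1/2 1
    print(f_cond[(20, 1)] + f_cond[(30, 1)])  # 7/10 = P(Y > 10 | X = 1)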

(Bilkent) ECON509 This Version: 7 Nov 2014 24 / 125

Page 25: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

The analogous definition for continuous random variables is given next.

Definition (4.2.3): Let (X, Y) be a continuous bivariate random vector with joint pdf f(x, y) and marginal pdfs f_X(x) and f_Y(y). For any x such that f_X(x) > 0, the conditional pdf of Y given that X = x is the function of y denoted by f(y|x) and defined by

f(y|x) = f(x, y) / f_X(x).

For any y such that f_Y(y) > 0, the conditional pdf of X given that Y = y is the function of x denoted by f(x|y) and defined by

f(x|y) = f(x, y) / f_Y(y).

Note that for discrete random variables, P(X = x) = f_X(x) and P(X = x, Y = y) = f(x, y). Then Definition (4.2.1) is actually parallel to the definition of P(Y = y | X = x) in Definition (1.3.2). The same interpretation is not valid for continuous random variables since P(X = x) = 0 for every x. However, replacing pmfs with pdfs leads to Definition (4.2.3).

(Bilkent) ECON509 This Version: 7 Nov 2014 25 / 125

Page 26: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

The conditional expected value of g(Y) given X = x is given by

E[g(Y)|x] = ∑_y g(y) f(y|x)   and   E[g(Y)|x] = ∫_{-∞}^{∞} g(y) f(y|x) dy,

in the discrete and continuous cases, respectively.

The conditional expected value has all of the properties of the usual expected value listed in Theorem 2.2.5.

(Bilkent) ECON509 This Version: 7 Nov 2014 26 / 125

Page 27: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

Example (2.2.6): Suppose we measure the distance between a random variable X and a constant b by (X - b)². What is

β = arg min_b E[(X - b)²],

i.e. the best predictor of X in the mean-squared sense?

Now,

E[(X - b)²] = E[{X - E[X] + E[X] - b}²]
            = E[({X - E[X]} + {E[X] - b})²]
            = E[{X - E[X]}²] + {E[X] - b}² + 2 E({X - E[X]}{E[X] - b}),

where

E({X - E[X]}{E[X] - b}) = {E[X] - b} E{X - E[X]} = 0.

Then,

E[(X - b)²] = E[{X - E[X]}²] + {E[X] - b}².

(Bilkent) ECON509 This Version: 7 Nov 2014 27 / 125

Page 28: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

We have no control over the first term, but the second term is always nonnegative and is minimised by setting b = E[X]. Therefore, b = E[X] is the value that minimises the prediction error and is the best predictor of X.

In other words, the expectation of a random variable is its best predictor.

Can you show this for the conditional expectation, as well? See Exercise 4.13, which you will be asked to solve as homework.

(Bilkent) ECON509 This Version: 7 Nov 2014 28 / 125

Page 29: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

Example (4.2.4): Let (X, Y) have joint pdf f(x, y) = e^{-y}, 0 < x < y < ∞. Suppose we wish to compute the conditional pdf of Y given X = x.

Now, if x ≤ 0, f(x, y) = 0 for all y, so f_X(x) = 0. If x > 0, f(x, y) > 0 only if y > x. Thus

f_X(x) = ∫_{-∞}^{∞} f(x, y) dy = ∫_x^{∞} e^{-y} dy = e^{-x}.

Then, the marginal distribution of X is the exponential distribution. Now, likewise, for any x > 0, f(y|x) can be computed as follows:

f(y|x) = f(x, y) / f_X(x) = e^{-y} / e^{-x} = e^{-(y-x)},  if y > x,

and

f(y|x) = f(x, y) / f_X(x) = 0 / e^{-x} = 0,  if y ≤ x.

Thus, given X = x, Y has an exponential distribution, where x is the location parameter in the distribution of Y and β = 1 is the scale parameter. Notice that the conditional distribution of Y is different for every value of x. Hence,

E[Y | X = x] = ∫_x^{∞} y e^{-(y-x)} dy = 1 + x.

(Bilkent) ECON509 This Version: 7 Nov 2014 29 / 125

Page 30: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

This is obtained by integration by parts:

∫_x^{∞} y e^{-(y-x)} dy = -y e^{-(y-x)} |_{y=x}^{∞} + ∫_x^{∞} e^{-(y-x)} dy
                        = [0 + x e^{-(x-x)}] + [-e^{-(y-x)}]_{y=x}^{∞}
                        = x + [0 + e^{-(x-x)}] = x + 1.

What about the conditional variance? Using the standard definition,

Var(Y|x) = E[Y²|x] - {E[Y|x]}².

Hence,

Var(Y|x) = ∫_x^{∞} y² e^{-(y-x)} dy - (∫_x^{∞} y e^{-(y-x)} dy)² = 1.

Again, you can obtain the first part of this result by integration by parts.

What is the implication of this? One can show that the marginal distribution of Y is gamma(2, 1) and so Var(Y) = 2. Hence, the knowledge that X = x has reduced the variability of Y by 50%!!!
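
Both integration-by-parts results can be verified symbolically; the following sketch is my own illustration (assuming SymPy), reproducing E[Y|x] = 1 + x and Var(Y|x) = 1.

    import sympy as sp

    x, y = sp.symbols('x y', positive=True)

    # Conditional pdf of Y given X = x for Example (4.2.4).
    f_cond = sp.exp(-(y - x))

    e_y = sp.integrate(y * f_cond, (y, x, sp.oo))       # E[Y | x]
    e_y2 = sp.integrate(y**2 * f_cond, (y, x, sp.oo))   # E[Y^2 | x]
    print(sp.simplify(e_y))                             # x + 1
    print(sp.simplify(e_y2 - e_y**2))                   # 1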

(Bilkent) ECON509 This Version: 7 Nov 2014 30 / 125

Page 31: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

The conditional distribution of Y given X = x is possibly a different probability distribution for each value of x.

Thus, we really have a family of probability distributions for Y, one for each x.

When we wish to describe this entire family, we will use the phrase "the distribution of Y|X". For example, if X ≥ 0 is an integer-valued random variable and the conditional distribution of Y given X = x is binomial(x, p), then we might say the distribution of Y|X is binomial(X, p), or write Y|X ~ binomial(X, p).

In addition, E[g(Y)|x] is a function of x, as well. Hence, the conditional expectation is a random variable whose value depends on the value of X. When we think of Example (4.2.4), we can then say that

E[Y|X] = 1 + X.

(Bilkent) ECON509 This Version: 7 Nov 2014 31 / 125

Page 32: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

Yet, in some other cases, the conditional distribution might not depend on the conditioning variable.

Say, the conditional distribution of Y given X = x is not different for different values of x.

In other words, knowledge of the value of X does not provide any more information.

This situation is defined as independence.

Definition (4.2.5): Let (X, Y) be a bivariate random vector with joint pdf or pmf f(x, y) and marginal pdfs or pmfs f_X(x) and f_Y(y). Then X and Y are called independent random variables if, for every x ∈ R and y ∈ R,

f(x, y) = f_X(x) f_Y(y).     (1)

(Bilkent) ECON509 This Version: 7 Nov 2014 32 / 125

Page 33: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

Now, in the case of independence, clearly,

f(y|x) = f(x, y) / f_X(x) = f_X(x) f_Y(y) / f_X(x) = f_Y(y).

We can either start with the joint distribution and check independence for each possible value of x and y, or start with the assumption that X and Y are independent and model the joint distribution accordingly. In this latter direction, our economic intuition might have to play an important role.

Would information on the value of X really increase our information about the likely value of Y?

(Bilkent) ECON509 This Version: 7 Nov 2014 33 / 125

Page 34: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

Example (4.2.6): Consider the discrete bivariate random vector (X, Y), with joint pmf given by

f(10, 1) = f(20, 1) = f(20, 2) = 1/10,

f(10, 2) = f(10, 3) = 1/5  and  f(20, 3) = 3/10.

The marginal pmfs are then given by

f_X(10) = f_X(20) = .5  and  f_Y(1) = .2, f_Y(2) = .3, f_Y(3) = .5.

Now, for example,

f(10, 3) = 1/5 ≠ (1/2)(1/2) = f_X(10) f_Y(3),

although f(10, 1) = 1/10 = (1/2)(1/5) = f_X(10) f_Y(1).

Do we always have to check all possible pairs, one by one???

(Bilkent) ECON509 This Version: 7 Nov 2014 34 / 125

Page 35: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

Lemma (4.2.7): Let (X, Y) be a bivariate random vector with joint pdf or pmf f(x, y). Then, X and Y are independent random variables if and only if there exist functions g(x) and h(y) such that, for every x ∈ R and y ∈ R,

f(x, y) = g(x)h(y).

Proof: First, we deal with the "only if" part. Define

g(x) = f_X(x)  and  h(y) = f_Y(y).

Then, by (1),

f(x, y) = f_X(x) f_Y(y) = g(x)h(y).

For the "if" part, suppose f(x, y) = g(x)h(y). Define

∫_{-∞}^{∞} g(x) dx = c  and  ∫_{-∞}^{∞} h(y) dy = d,

where

cd = (∫_{-∞}^{∞} g(x) dx)(∫_{-∞}^{∞} h(y) dy) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x)h(y) dy dx = ∫_{-∞}^{∞} ∫_{-∞}^{∞} f(x, y) dy dx = 1.

(Bilkent) ECON509 This Version: 7 Nov 2014 35 / 125

Page 36: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

Moreover,

f_X(x) = ∫_{-∞}^{∞} g(x)h(y) dy = g(x)d   and   f_Y(y) = ∫_{-∞}^{∞} g(x)h(y) dx = h(y)c.

Then,

f(x, y) = g(x)h(y) = g(x)h(y)·cd = [g(x)d][h(y)c] = f_X(x) f_Y(y),

using cd = 1, proving that X and Y are independent.

To prove the Lemma for discrete random vectors, replace integrals with summations.

(Bilkent) ECON509 This Version: 7 Nov 2014 36 / 125

Page 37: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

It might not be clear enough at first sight, but this is a powerful result. It implies that we do not have to calculate the marginal distributions first and then check whether their product gives the joint distribution.

Instead, it is enough to check whether the joint distribution is equal to the product of some function of x and some function of y, for all values of (x, y) ∈ R².

Example (4.2.8): Consider the joint pdf

f(x, y) = (1/384) x² y⁴ e^{-y-(x/2)},  x > 0 and y > 0.

If we define

g(x) = x² e^{-x/2}  and  h(y) = y⁴ e^{-y} / 384,

for x > 0 and y > 0, and g(x) = h(y) = 0 otherwise, then, clearly,

f(x, y) = g(x)h(y)

for all (x, y) ∈ R². By Lemma (4.2.7), X and Y are independently distributed!

(Bilkent) ECON509 This Version: 7 Nov 2014 37 / 125

Page 38: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

Example (4.2.9): Let X be the number of living parents of a student randomly selected from an elementary school in Kansas City and Y be the number of living parents of a retiree randomly selected from Sun City. Suppose, furthermore, that we have

f_X(0) = .01, f_X(1) = .09, f_X(2) = .90,

f_Y(0) = .70, f_Y(1) = .25, f_Y(2) = .05.

It seems reasonable that X and Y will be independent: knowledge of the number of parents of the student does not give us any information on the number of parents of the retiree and vice versa. Therefore, we should have

f_{X,Y}(x, y) = f_X(x) f_Y(y).

Then, for example,

f_{X,Y}(0, 0) = .0070,  f_{X,Y}(0, 1) = .0025,

etc. We can thus calculate quantities such as

P(X = Y) = f(0, 0) + f(1, 1) + f(2, 2) = .01×.70 + .09×.25 + .90×.05 = .0745.

(Bilkent) ECON509 This Version: 7 Nov 2014 38 / 125

Page 39: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

Theorem (4.2.10): Let X and Y be independent random variables.

1. For any A ⊂ R and B ⊂ R, P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B); that is, the events {X ∈ A} and {Y ∈ B} are independent events.

2. Let g(x) be a function only of x and h(y) be a function only of y. Then

E[g(X)h(Y)] = E[g(X)] E[h(Y)].

Proof: Start with (2) and consider the continuous case. Now,

E[g(X)h(Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x)h(y) f(x, y) dx dy
            = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x)h(y) f_X(x) f_Y(y) dx dy
            = (∫_{-∞}^{∞} g(x) f_X(x) dx)(∫_{-∞}^{∞} h(y) f_Y(y) dy)
            = E[g(X)] E[h(Y)].

As before, the discrete case is proved by replacing integrals with sums.

(Bilkent) ECON509 This Version: 7 Nov 2014 39 / 125

Page 40: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

Now, consider (1). Remember that for some set A ⊂ R, the indicator function I_A(x) is given by

I_A(x) = 1 if x ∈ A, and I_A(x) = 0 otherwise.

Let g(x) and h(y) be the indicator functions of the sets A and B, respectively. Now, g(x)h(y) is the indicator function of the set C ⊂ R², where C = {(x, y) : x ∈ A, y ∈ B}.

Note that for an indicator function such as g(x), E[g(X)] = P(X ∈ A), since

P(X ∈ A) = ∫_{x ∈ A} f(x) dx = ∫ I_A(x) f(x) dx = E[I_A(X)].

Then, using the result we proved on the previous slide,

P(X ∈ A, Y ∈ B) = P((X, Y) ∈ C) = E[g(X)h(Y)] = E[g(X)] E[h(Y)] = P(X ∈ A) P(Y ∈ B).

These results make life a lot easier when calculating expectations of certain random variables.

(Bilkent) ECON509 This Version: 7 Nov 2014 40 / 125

Page 41: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

Example (4.2.11): Let X and Y be independent exponential(1) random variables. Then, from the previous theorem we have that

P(X ≥ 4, Y < 3) = P(X ≥ 4) P(Y < 3) = e^{-4}(1 - e^{-3}).

Letting g(x) = x² and h(y) = y, we see that

E[X²Y] = E[X²] E[Y] = (Var(X) + {E[X]}²) E[Y] = (1 + 1²)·1 = 2.

The moment generating function is back!

Theorem (4.2.12): Let X and Y be independent random variables with moment generating functions M_X(t) and M_Y(t). Then the moment generating function of the random variable Z = X + Y is given by

M_Z(t) = M_X(t) M_Y(t).

Proof: Now, we know from before that M_Z(t) = E[e^{tZ}]. Then,

E[e^{tZ}] = E[e^{t(X+Y)}] = E[e^{tX} e^{tY}] = E[e^{tX}] E[e^{tY}] = M_X(t) M_Y(t),

where we have used the result that for two independent random variables X and Y, E[g(X)h(Y)] = E[g(X)] E[h(Y)].

Of course, if independence does not hold, life gets pretty tough! But we will not deal with that here.

(Bilkent) ECON509 This Version: 7 Nov 2014 41 / 125

Page 42: ECON509_-_Slides_3_-_2014_11_07

Conditional Distributions and Independence

We will now introduce a special case of Theorem (4.2.12).

Theorem (4.2.14): Let X ~ N(µ_X, σ²_X) and Y ~ N(µ_Y, σ²_Y) be independent normal random variables. Then the random variable Z = X + Y has a N(µ_X + µ_Y, σ²_X + σ²_Y) distribution.

Proof: The mgfs of X and Y are

M_X(t) = exp(µ_X t + σ²_X t²/2)  and  M_Y(t) = exp(µ_Y t + σ²_Y t²/2).

Then, from Theorem (4.2.12),

M_Z(t) = M_X(t) M_Y(t) = exp[(µ_X + µ_Y)t + (σ²_X + σ²_Y)t²/2],

which is the mgf of a normal random variable with mean µ_X + µ_Y and variance σ²_X + σ²_Y.
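
A quick Monte Carlo check of this theorem (my own illustration with NumPy; the parameter values are arbitrary): simulate independent normals, add them, and compare the sample mean and variance of the sum with µ_X + µ_Y and σ²_X + σ²_Y.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000
    mu_x, sig_x, mu_y, sig_y = 1.0, 2.0, -0.5, 1.5  # arbitrary illustrative values

    x = rng.normal(mu_x, sig_x, n)
    y = rng.normal(mu_y, sig_y, n)
    z = x + y

    print(z.mean(), mu_x + mu_y)         # both close to 0.5
    print(z.var(), sig_x**2 + sig_y**2)  # both close to 6.25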

(Bilkent) ECON509 This Version: 7 Nov 2014 42 / 125

Page 43: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

We now consider transformations involving two random variables rather than only one.

Let (X, Y) be a random vector and consider (U, V), where

U = g1(X, Y)  and  V = g2(X, Y),

for some pre-specified functions g1(·) and g2(·). For any B ⊂ R², notice that

(U, V) ∈ B  if and only if  (X, Y) ∈ A,  where A = {(x, y) : (g1(x, y), g2(x, y)) ∈ B}.

Hence,

P((U, V) ∈ B) = P((X, Y) ∈ A).

This implies that the probability distribution of (U, V) is completely determined by the probability distribution of (X, Y).

(Bilkent) ECON509 This Version: 7 Nov 2014 43 / 125

Page 44: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

Consider the discrete case first.

If (X, Y) is a discrete random vector, then there is only a countable set of values for which the joint pmf of (X, Y) is positive. Define this set as A. Moreover, let

B = {(u, v) : u = g1(x, y) and v = g2(x, y) for some (x, y) ∈ A}.

Therefore, B gives the countable set of all possible values for (U, V). Finally, define

A_{uv} = {(x, y) ∈ A : g1(x, y) = u and g2(x, y) = v},  for any (u, v) ∈ B.

Then, the joint pmf of (U, V) is given by

f_{U,V}(u, v) = P(U = u, V = v) = P((X, Y) ∈ A_{uv}) = ∑_{(x,y) ∈ A_{uv}} f_{X,Y}(x, y).

(Bilkent) ECON509 This Version: 7 Nov 2014 44 / 125

Page 45: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

Example (4.3.1): Let X ~ Poisson(θ) and Y ~ Poisson(λ) be independent. Due to independence, the joint pmf is given by

f_{X,Y}(x, y) = [θ^x e^{-θ} / x!][λ^y e^{-λ} / y!],  x = 0, 1, 2, ... and y = 0, 1, 2, ... .

Obviously,

A = {(x, y) : x = 0, 1, 2, ... and y = 0, 1, 2, ...}.

Define

U = X + Y  and  V = Y,

implying that

g1(x, y) = x + y  and  g2(x, y) = y.

(Bilkent) ECON509 This Version: 7 Nov 2014 45 / 125

Page 46: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

What is B? Since y = v, for any given v,

u = x + y = x + v.

Hence, u = v, v + 1, v + 2, v + 3, ... . Therefore,

B = {(u, v) : v = 0, 1, 2, ... and u = v, v + 1, v + 2, v + 3, ...}.

Moreover, for any (u, v) the only (x, y) satisfying u = x + y and v = y are given by x = u - v and y = v. Therefore, we always have

A_{uv} = {(u - v, v)}.

As such,

f_{U,V}(u, v) = ∑_{(x,y) ∈ A_{uv}} f_{X,Y}(x, y) = f_{X,Y}(u - v, v)
             = [θ^{u-v} e^{-θ} / (u - v)!][λ^v e^{-λ} / v!],  v = 0, 1, 2, ... and u = v, v + 1, v + 2, ... .

(Bilkent) ECON509 This Version: 7 Nov 2014 46 / 125

Page 47: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

What is the marginal pmf of U?

For any fixed non-negative integer u, f_{U,V}(u, v) > 0 only for v = 0, 1, ..., u. Then,

f_U(u) = ∑_{v=0}^{u} [θ^{u-v} e^{-θ} / (u - v)!][λ^v e^{-λ} / v!] = e^{-(θ+λ)} ∑_{v=0}^{u} θ^{u-v} λ^v / [(u - v)! v!]
       = [e^{-(θ+λ)} / u!] ∑_{v=0}^{u} [u! / ((u - v)! v!)] θ^{u-v} λ^v
       = [e^{-(θ+λ)} / u!] ∑_{v=0}^{u} C(u, v) θ^{u-v} λ^v = [e^{-(θ+λ)} / u!] (θ + λ)^u,  u = 0, 1, 2, ... ,

which follows from the binomial formula, (a + b)^n = ∑_{x=0}^{n} C(n, x) a^x b^{n-x}.

This is the pmf of a Poisson random variable with parameter θ + λ. A theorem follows.

Theorem (4.3.2): If X ~ Poisson(θ), Y ~ Poisson(λ) and X and Y are independent, then X + Y ~ Poisson(θ + λ).
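
Theorem (4.3.2) can be illustrated numerically; the sketch below (my addition, with arbitrary θ and λ) compares the exact pmf of X + Y obtained from the convolution sum derived above with the Poisson(θ + λ) pmf.

    from math import exp, factorial

    def poisson_pmf(k, rate):
        return rate**k * exp(-rate) / factorial(k)

    theta, lam = 2.0, 3.0
    for u in range(6):
        # Convolution: f_U(u) = sum over v of f_X(u - v) f_Y(v).
        conv = sum(poisson_pmf(u - v, theta) * poisson_pmf(v, lam) for v in range(u + 1))
        print(u, round(conv, 10), round(poisson_pmf(u, theta + lam), 10))  # identical columns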

(Bilkent) ECON509 This Version: 7 Nov 2014 47 / 125

Page 48: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

Consider the continuous case now.

Let (X, Y) ~ f_{X,Y}(x, y) be a continuous random vector and

A = {(x, y) : f_{X,Y}(x, y) > 0},
B = {(u, v) : u = g1(x, y) and v = g2(x, y) for some (x, y) ∈ A}.

The joint pdf f_{U,V}(u, v) will be positive on the set B.

Assume now that the transformation u = g1(x, y) and v = g2(x, y) is a one-to-one transformation from A onto B.

Why onto? Because we have defined B in such a way that any (u, v) ∈ B corresponds to some (x, y) ∈ A.

We assume that the one-to-one assumption is maintained. That is, we are assuming that for each (u, v) ∈ B there is only one (x, y) ∈ A such that (u, v) = (g1(x, y), g2(x, y)).

When the transformation is one-to-one and onto, we can solve the equations

u = g1(x, y)  and  v = g2(x, y),

and obtain

x = h1(u, v)  and  y = h2(u, v).

(Bilkent) ECON509 This Version: 7 Nov 2014 48 / 125

Page 49: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

The last remaining ingredient is the Jacobian of the transformation. This is the determinant of a matrix of partial derivatives:

J = det | ∂x/∂u   ∂x/∂v |  =  (∂x/∂u)(∂y/∂v) - (∂x/∂v)(∂y/∂u).
        | ∂y/∂u   ∂y/∂v |

These partial derivatives are given by

∂x/∂u = ∂h1(u, v)/∂u,   ∂x/∂v = ∂h1(u, v)/∂v,   ∂y/∂u = ∂h2(u, v)/∂u,   ∂y/∂v = ∂h2(u, v)/∂v.

Then, the transformation formula is given by

f_{U,V}(u, v) = f_{X,Y}[h1(u, v), h2(u, v)] |J|,

while f_{U,V}(u, v) = 0 for (u, v) ∉ B.

Clearly, we need J ≠ 0 on B.

(Bilkent) ECON509 This Version: 7 Nov 2014 49 / 125

Page 50: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

The next example is based on the beta distribution, which is related to the gamma distribution.

The beta(α, β) pdf is given by

f_X(x | α, β) = [Γ(α + β) / (Γ(α)Γ(β))] x^{α-1} (1 - x)^{β-1},

where 0 < x < 1, α > 0 and β > 0.

(Bilkent) ECON509 This Version: 7 Nov 2014 50 / 125

Page 51: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

Example (4.3.3): Let X ~ beta(α, β), Y ~ beta(α + β, γ) and X ⊥⊥ Y. The symbol ⊥⊥ is used to denote independence. Due to independence, the joint pdf is given by

f_{X,Y}(x, y) = [Γ(α + β) / (Γ(α)Γ(β))] x^{α-1} (1 - x)^{β-1} · [Γ(α + β + γ) / (Γ(α + β)Γ(γ))] y^{α+β-1} (1 - y)^{γ-1},

where 0 < x < 1 and 0 < y < 1.

We consider

U = XY  and  V = X.

Let's define A and B first. As usual,

A = {(x, y) : f_{X,Y}(x, y) > 0}.

(Bilkent) ECON509 This Version: 7 Nov 2014 51 / 125

Page 52: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

Now we know that V = X and the set of possible values for X is 0 < x < 1. Hence, the set of possible values for V is given by 0 < v < 1.

Since U = XY = VY, for any given value of V = v, U will vary between 0 and v, as the set of possible values for Y is 0 < y < 1. Hence

0 < u < v.

Therefore, the given transformation maps A onto

B = {(u, v) : 0 < u < v < 1}.

For any (u, v), we can uniquely solve for (x, y). That is,

x = h1(u, v) = v  and  y = h2(u, v) = u/v.

An aside: Of course, we assume that this transformation is one-to-one on A only. This is not the case on R² because, for any value of y, (0, y) is mapped into the point (0, 0).

(Bilkent) ECON509 This Version: 7 Nov 2014 52 / 125

Page 53: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

For

x = h1(u, v) = v  and  y = h2(u, v) = u/v,

we have

J = det | ∂x/∂u   ∂x/∂v |  =  det | 0      1      |  =  -1/v.
        | ∂y/∂u   ∂y/∂v |        | 1/v   -u/v²   |

Then, the transformation formula gives

f_{U,V}(u, v) = [Γ(α + β + γ) / (Γ(α)Γ(β)Γ(γ))] v^{α-1} (1 - v)^{β-1} (u/v)^{α+β-1} (1 - u/v)^{γ-1} (1/v),

where 0 < u < v < 1.

Obviously, since V = X, V ~ beta(α, β). What about U?

(Bilkent) ECON509 This Version: 7 Nov 2014 53 / 125

Page 54: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

Define K = Γ(α + β + γ) / [Γ(α)Γ(β)Γ(γ)]. Then,

f_U(u) = ∫_u^1 f_{U,V}(u, v) dv = K ∫_u^1 v^{α-1} (1 - v)^{β-1} (u/v)^{α+β-1} (1 - u/v)^{γ-1} (1/v) dv.

Collecting the powers of u and v, the integrand can be rewritten using the identity

v^{α-1} (1 - v)^{β-1} (u/v)^{α+β-1} (1/v) = u^{α-1} (u/v - u)^{β-1} (u/v²),

so that

f_U(u) = K u^{α-1} ∫_u^1 (u/v - u)^{β-1} (1 - u/v)^{γ-1} (u/v²) dv.

(Bilkent) ECON509 This Version: 7 Nov 2014 54 / 125

Page 55: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

Let

y = (u/v - u) / (1 - u),

which implies that

dy = -(u/v²) [1/(1 - u)] dv,

or, equivalently,

dv = -dy v² (1 - u)/u.

Now, observe that

y^{β-1} = (u/v - u)^{β-1} [1/(1 - u)]^{β-1},

(1 - y)^{γ-1} = [(1 - u - u/v + u)/(1 - u)]^{γ-1} = [(1 - u/v)/(1 - u)]^{γ-1} = (1 - u/v)^{γ-1} [1/(1 - u)]^{γ-1}.

(Bilkent) ECON509 This Version: 7 Nov 2014 55 / 125

Page 56: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

Moreover, for v = u, y = 1 and for v = 1, y = 0. Therefore, substituting the expressions above,

f_U(u) = K u^{α-1} ∫_u^1 (u/v - u)^{β-1} (1 - u/v)^{γ-1} (u/v²) dv
       = K u^{α-1} (1 - u)^{β+γ-1} ∫_1^0 y^{β-1} (1 - y)^{γ-1} (-dy)
       = K u^{α-1} (1 - u)^{β+γ-1} ∫_0^1 y^{β-1} (1 - y)^{γ-1} dy.

Now, y^{β-1} (1 - y)^{γ-1} is the kernel of the pdf of Y ~ beta(β, γ),

f_Y(y | β, γ) = [Γ(β + γ) / (Γ(β)Γ(γ))] y^{β-1} (1 - y)^{γ-1},  0 < y < 1, β > 0, γ > 0,

and it is straightforward to show that

∫_0^1 y^{β-1} (1 - y)^{γ-1} dy = Γ(β)Γ(γ) / Γ(β + γ).

(Bilkent) ECON509 This Version: 7 Nov 2014 56 / 125

Page 57: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

Hence,

f_U(u) = K u^{α-1} (1 - u)^{β+γ-1} ∫_0^1 y^{β-1} (1 - y)^{γ-1} dy
       = [Γ(α + β + γ) / (Γ(α)Γ(β)Γ(γ))] [Γ(β)Γ(γ) / Γ(β + γ)] u^{α-1} (1 - u)^{β+γ-1}
       = [Γ(α + β + γ) / (Γ(α)Γ(β + γ))] u^{α-1} (1 - u)^{β+γ-1},

where 0 < u < 1.

This shows that the marginal distribution of U is beta(α, β + γ).

(Bilkent) ECON509 This Version: 7 Nov 2014 57 / 125

Page 58: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

Example (4.3.4): Let X ~ N(0, 1), Y ~ N(0, 1) and X ⊥⊥ Y. The transformation is given by

U = g1(X, Y) = X + Y  and  V = g2(X, Y) = X - Y.

Now, the pdf of (X, Y) is given by

f_{X,Y}(x, y) = [1/√(2π)] exp(-x²/2) · [1/√(2π)] exp(-y²/2) = [1/(2π)] exp[-(x² + y²)/2],

where -∞ < x < ∞ and -∞ < y < ∞.

Therefore,

A = {(x, y) : f_{X,Y}(x, y) > 0} = R².

(Bilkent) ECON509 This Version: 7 Nov 2014 58 / 125

Page 59: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

We have

u = x + y  and  v = x - y,

and to determine B we need to find out all the values taken on by (u, v) as we choose different (x, y) ∈ A. Thankfully, when these equations are solved for (x, y) they yield unique solutions:

x = h1(u, v) = (u + v)/2  and  y = h2(u, v) = (u - v)/2.

Now, the reasoning is as follows: A = R². Moreover, for every (u, v) ∈ B, there is a unique (x, y). Therefore, B = R², as well! The mapping is one-to-one and, by definition, onto.

(Bilkent) ECON509 This Version: 7 Nov 2014 59 / 125

Page 60: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

The Jacobian is given by

J = det | ∂x/∂u   ∂x/∂v |  =  det | 1/2    1/2  |  =  -1/4 - 1/4 = -1/2.
        | ∂y/∂u   ∂y/∂v |        | 1/2   -1/2  |

Therefore,

f_{U,V}(u, v) = f_{X,Y}[h1(u, v), h2(u, v)] |J|
             = [1/(2π)] exp[-((u + v)/2)²/2] exp[-((u - v)/2)²/2] · (1/2),

where -∞ < u < ∞ and -∞ < v < ∞.

Rearranging gives

f_{U,V}(u, v) = [1/(√(2π)√2)] exp(-u²/4) · [1/(√(2π)√2)] exp(-v²/4).

Since the joint density factors into a function of u and a function of v, by Lemma (4.2.7), U and V are independent.
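
A small simulation (my own illustration with NumPy) makes this concrete: with X and Y independent standard normals, the sample correlation between U = X + Y and V = X - Y is essentially zero and each has variance close to 2.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.standard_normal(1_000_000)
    y = rng.standard_normal(1_000_000)
    u, v = x + y, x - y

    print(np.corrcoef(u, v)[0, 1])  # close to 0
    print(u.var(), v.var())         # both close to 2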

(Bilkent) ECON509 This Version: 7 Nov 2014 60 / 125

Page 61: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

Remember Theorem (4.2.14).

Theorem (4.2.14): Let X ~ N(µ_X, σ²_X) and Y ~ N(µ_Y, σ²_Y) be independent normal random variables. Then the random variable Z = X + Y has a N(µ_X + µ_Y, σ²_X + σ²_Y) distribution.

Then,

U = X + Y ~ N(0, 2).

What about V? Define Z = -Y and notice that

Z = -Y ~ N(0, 1).

Then, by Theorem (4.2.14),

V = X - Y = X + Z ~ N(0, 2),

as well.

That the sums and differences of independent normal random variables are independent normal random variables is true regardless of the means of X and Y, as long as Var(X) = Var(Y). See Exercise 4.27.

Note that the more difficult bit here is to prove that U and V are indeed independent.

(Bilkent) ECON509 This Version: 7 Nov 2014 61 / 125

Page 62: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

Theorem (4.3.5): Let X ⊥⊥ Y be two random variables. Define U = g(X) and V = h(Y), where g(x) is a function only of x and h(y) is a function only of y. Then U ⊥⊥ V.

Proof: Consider the proof for continuous random variables U and V. Define, for any u ∈ R and v ∈ R,

A_u = {x : g(x) ≤ u}  and  B_v = {y : h(y) ≤ v}.

Then,

F_{U,V}(u, v) = P(U ≤ u, V ≤ v) = P(X ∈ A_u, Y ∈ B_v) = P(X ∈ A_u) P(Y ∈ B_v),

where the last result follows by Theorem (4.2.10). Then,

f_{U,V}(u, v) = ∂²F_{U,V}(u, v)/∂u∂v = [d P(X ∈ A_u)/du] [d P(Y ∈ B_v)/dv],

and clearly the joint pdf factorises into a function only of u and a function only of v. Therefore, by Lemma (4.2.7), U and V are independent.

(Bilkent) ECON509 This Version: 7 Nov 2014 62 / 125

Page 63: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

What if we start with a bivariate random variable (X, Y) but are only interested in U = g(X, Y) and not (U, V)?

We can then choose a convenient V = h(X, Y), such that the resulting transformation from (X, Y) to (U, V) is one-to-one on A. Then, the joint pdf of (U, V) can be calculated as usual and we can obtain the marginal pdf of U from the joint pdf of (U, V).

Which V is convenient would generally depend on what U is.

(Bilkent) ECON509 This Version: 7 Nov 2014 63 / 125

Page 64: ECON509_-_Slides_3_-_2014_11_07

Bivariate Transformations

What if the transformation of interest is not one-to-one? In a similar fashion to Theorem 2.1.8, we can define

A = {(x, y) : f_{X,Y}(x, y) > 0},

and then divide this into a partition where the transformation is one-to-one on each block. Specifically, let A_0, A_1, ..., A_k be a partition of A which satisfies the following properties.

1. P((X, Y) ∈ A_0) = 0;
2. The transformation U = g1(X, Y) and V = g2(X, Y) is a one-to-one transformation from A_i onto B for each i = 1, ..., k.

Then we can calculate the inverse functions from B to A_i for each i. Let x = h_{1i}(u, v) and y = h_{2i}(u, v) for the ith partition.

For (u, v) ∈ B, this inverse gives the unique (x, y) ∈ A_i such that (u, v) = (g1(x, y), g2(x, y)).

Let J_i be the Jacobian for the ith inverse. Then,

f_{U,V}(u, v) = ∑_{i=1}^{k} f_{X,Y}[h_{1i}(u, v), h_{2i}(u, v)] |J_i|.

(Bilkent) ECON509 This Version: 7 Nov 2014 64 / 125

Page 65: ECON509_-_Slides_3_-_2014_11_07

Hierarchical Models and Mixture Distributions

As an introduction to the idea of hierarchical models, consider the following example.

Example (4.4.1): An insect lays a large number of eggs, each surviving with probability p. On average, how many eggs will survive?

Often, the large number of eggs is taken to be Poisson(λ). Now, assume that survival of each egg is an independent event. Therefore, survival of the eggs constitutes Bernoulli trials.

Let's put this into a statistical framework:

X | Y ~ binomial(Y, p),

Y ~ Poisson(λ).

Therefore, survival is a random variable, which in turn is a function of another random variable.

This framework is convenient in the sense that complex models can be analysed as a collection of simpler models.

(Bilkent) ECON509 This Version: 7 Nov 2014 65 / 125

Page 66: ECON509_-_Slides_3_-_2014_11_07

Hierarchical Models and Mixture Distributions

Let's get back to the original question.

Example (4.4.2): Now,

P(X = x) = ∑_{y=0}^{∞} P(X = x, Y = y)
         = ∑_{y=0}^{∞} P(X = x | Y = y) P(Y = y)
         = ∑_{y=x}^{∞} [C(y, x) p^x (1 - p)^{y-x}] [e^{-λ} λ^y / y!],

where the summation in the third line starts from y = x rather than y = 0 since, if y < x, the conditional probability should be equal to zero: clearly, the number of surviving eggs cannot be larger than the number of those laid.

(Bilkent) ECON509 This Version: 7 Nov 2014 66 / 125

Page 67: ECON509_-_Slides_3_-_2014_11_07

Hierarchical Models and Mixture Distributions

Let t = y - x. This gives

P(X = x) = ∑_{y=x}^{∞} [y! / ((y - x)! x!)] p^x (1 - p)^{y-x} [e^{-λ} λ^x λ^{y-x} / y!]
         = [(λp)^x e^{-λ} / x!] ∑_{y=x}^{∞} [(1 - p)λ]^{y-x} / (y - x)!
         = [(λp)^x e^{-λ} / x!] ∑_{t=0}^{∞} [(1 - p)λ]^t / t!.

But the summation in the final term is the kernel for Poisson((1 - p)λ). Remember that

∑_{t=0}^{∞} [(1 - p)λ]^t / t! = e^{(1-p)λ}.

Therefore,

P(X = x) = [(λp)^x e^{-λ} / x!] e^{(1-p)λ} = [(λp)^x / x!] e^{-λp},

which implies that X ~ Poisson(λp).

The answer to the original question then is E[X] = λp: on average, λp eggs survive.
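
The hierarchy is easy to simulate; the sketch below (my own illustration, with arbitrary λ and p) draws Y ~ Poisson(λ) and then X|Y ~ binomial(Y, p), and compares the simulated distribution of X with Poisson(λp).

    import numpy as np

    rng = np.random.default_rng(2)
    lam, p, n = 10.0, 0.3, 1_000_000

    y = rng.poisson(lam, n)   # number of eggs laid
    x = rng.binomial(y, p)    # number of surviving eggs, X | Y ~ binomial(Y, p)

    print(x.mean(), lam * p)                      # both close to 3
    print(x.var(), lam * p)                       # Poisson: variance equals the mean
    direct = rng.poisson(lam * p, n)
    print(np.mean(x == 0), np.mean(direct == 0))  # P(X = 0) matches a direct Poisson(lam*p) draw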

(Bilkent) ECON509 This Version: 7 Nov 2014 67 / 125

Page 68: ECON509_-_Slides_3_-_2014_11_07

Hierarchical Models and Mixture Distributions

Now comes a very useful theorem which you will, most likely, use frequently in the future.

Remember that E[X|y] is a function of y and E[X|Y] is a random variable whose value depends on the value of Y.

Theorem (4.4.3): If X and Y are two random variables, then

E_X[X] = E_Y{E_{X|Y}[X|Y]},

provided that the expectations exist.

It is important to notice that the two expectations are with respect to two different probability densities, f_X(·) and f_{X|Y}(·|Y = y). This result is widely known as the Law of Iterated Expectations.

(Bilkent) ECON509 This Version: 7 Nov 2014 68 / 125

Page 69: ECON509_-_Slides_3_-_2014_11_07

Hierarchical Models and Mixture Distributions

Proof: Let f_{X,Y}(x, y) denote the joint pdf of (X, Y). Moreover, let f_Y(y) and f_{X|Y}(x|y) denote the marginal distribution of Y and the conditional distribution of X (conditional on Y = y), respectively. By definition,

E_X[X] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} x f(x, y) dx dy = ∫_{-∞}^{∞} [∫_{-∞}^{∞} x f_{X|Y}(x|y) dx] f_Y(y) dy
       = ∫_{-∞}^{∞} E_{X|Y}[X | Y = y] f_Y(y) dy
       = E_Y{E_{X|Y}[X|Y]}.

The corresponding proof for the discrete case can be obtained by replacing integrals with sums.

How does this help us? Consider calculating the expected number of survivors again:

E_X[X] = E_Y{E_{X|Y}[X|Y]} = E_Y[pY] = p E_Y[Y] = pλ.

(Bilkent) ECON509 This Version: 7 Nov 2014 69 / 125

Page 70: ECON509_-_Slides_3_-_2014_11_07

Hierarchical Models and Mixture Distributions

The textbook gives the following definition for mixture distributions.

Definition (4.4.4): A random variable X is said to have a mixture distribution if the distribution of X depends on a quantity that also has a distribution.

Therefore, a mixture distribution is a distribution that is generated through a hierarchical mechanism.

Hence, as far as Example (4.4.1) is concerned, the Poisson(λp) distribution, which is the result of combining a binomial(Y, p) and a Poisson(λ) distribution, is a mixture distribution.

(Bilkent) ECON509 This Version: 7 Nov 2014 70 / 125

Page 71: ECON509_-_Slides_3_-_2014_11_07

Hierarchical Models and Mixture Distributions

Example (4.4.5): Now, consider the following hierarchical model:

X | Y ~ binomial(Y, p),

Y | Λ ~ Poisson(Λ),

Λ ~ exponential(β).

Then,

E_X[X] = E_Y{E_{X|Y}[X|Y]} = E_Y[pY]
       = E_Λ{E_{Y|Λ}[pY|Λ]} = p E_Λ{E_{Y|Λ}[Y|Λ]} = p E_Λ[Λ] = pβ,

which is obtained by successive application of the Law of Iterated Expectations.

Note that in this example we considered both discrete and continuous random variables. This is fine.

(Bilkent) ECON509 This Version: 7 Nov 2014 71 / 125

Page 72: ECON509_-_Slides_3_-_2014_11_07

Hierarchical Models and Mixture Distributions

The three-stage hierarchy of the previous model can also be considered as a two-stage hierarchy by combining the last two stages.

Now,

P(Y = y) = P(Y = y, 0 < Λ < ∞) = ∫_0^∞ f(y, λ) dλ
         = ∫_0^∞ f(y|λ) f(λ) dλ = ∫_0^∞ [e^{-λ} λ^y / y!] (1/β) e^{-λ/β} dλ
         = [1/(β y!)] ∫_0^∞ λ^y e^{-λ(1 + β^{-1})} dλ = [1/(β y!)] Γ(y + 1) [1/(1 + β^{-1})]^{y+1}
         = [1/(1 + β)] [1/(1 + β^{-1})]^y.

In order to see that

∫_0^∞ λ^y e^{-λ(1 + β^{-1})} dλ = Γ(y + 1) [1/(1 + β^{-1})]^{y+1},

just use the pdf of λ ~ gamma(y + 1, (1 + 1/β)^{-1}).

(Bilkent) ECON509 This Version: 7 Nov 2014 72 / 125

Page 73: ECON509_-_Slides_3_-_2014_11_07

Hierarchical Models and Mixture Distributions

The final expression is that of a negative binomial pmf. Therefore, the two-stage hierarchy is given by

X | Y ~ binomial(Y, p),

Y ~ negative binomial(r = 1, p = 1/(1 + β)).

(Bilkent) ECON509 This Version: 7 Nov 2014 73 / 125

Page 74: ECON509_-_Slides_3_-_2014_11_07

Hierarchical Models and Mixture Distributions

Another useful result is given next.

Theorem (4.4.7): For any two random variables X and Y,

Var_X(X) = E_Y[Var_{X|Y}(X|Y)] + Var_Y{E_{X|Y}[X|Y]}.

Proof: Remember that

Var_X(X) = E_X[X²] - {E_X[X]}²
         = E_Y{E_{X|Y}[X²|Y]} - (E_Y{E_{X|Y}[X|Y]})²
         = [E_Y{E_{X|Y}[X²|Y]} - E_Y({E_{X|Y}[X|Y]}²)] + [E_Y({E_{X|Y}[X|Y]}²) - (E_Y{E_{X|Y}[X|Y]})²]
         = E_Y[{E_{X|Y}[X²|Y]} - {E_{X|Y}[X|Y]}²] + [E_Y({E_{X|Y}[X|Y]}²) - (E_Y{E_{X|Y}[X|Y]})²]
         = E_Y[Var_{X|Y}(X|Y)] + Var_Y{E_{X|Y}[X|Y]},

which yields the desired result.

(Bilkent) ECON509 This Version: 7 Nov 2014 74 / 125

Page 75: ECON509_-_Slides_3_-_2014_11_07

Hierarchical Models and Mixture Distributions

Example (4.4.6): Consider the following generalisation of the binomial distribution, where the probability of success varies according to a distribution. Specifically,

X | P ~ binomial(n, P),

P ~ beta(α, β).

Then

E_X[X] = E_P{E_{X|P}[X|P]} = E_P[nP] = n α/(α + β),

where the last result follows from the fact that for P ~ beta(α, β), E[P] = α/(α + β).

Example (4.4.8): Now, let's calculate the variance of X. By Theorem (4.4.7),

Var_X(X) = Var_P{E_{X|P}[X|P]} + E_P[Var_{X|P}(X|P)].

Now, E_{X|P}[X|P] = nP and, since P ~ beta(α, β),

Var_P(E_{X|P}[X|P]) = Var_P(nP) = n² αβ / [(α + β)²(α + β + 1)].

Moreover, Var_{X|P}(X|P) = nP(1 - P), due to X|P being a binomial random variable.

(Bilkent) ECON509 This Version: 7 Nov 2014 75 / 125

Page 76: ECON509_-_Slides_3_-_2014_11_07

Hierarchical Models and Mixture Distributions

What about E_P[Var_{X|P}(X|P)]? Remember that the beta(α, β) pdf is given by

f_X(x | α, β) = [Γ(α + β) / (Γ(α)Γ(β))] x^{α-1} (1 - x)^{β-1},

where 0 < x < 1, α > 0 and β > 0.

Then,

E_P[Var_{X|P}(X|P)] = E_P[nP(1 - P)]
                    = n [Γ(α + β) / (Γ(α)Γ(β))] ∫_0^1 p(1 - p) p^{α-1} (1 - p)^{β-1} dp.

The integrand is the kernel of another beta pdf, with parameters α + 1 and β + 1, since the pdf of P ~ beta(α + 1, β + 1) is given by

[Γ(α + β + 2) / (Γ(α + 1)Γ(β + 1))] p^α (1 - p)^β.

(Bilkent) ECON509 This Version: 7 Nov 2014 76 / 125

Page 77: ECON509_-_Slides_3_-_2014_11_07

Hierarchical Models and Mixture Distributions

Therefore,

E_P[Var_{X|P}(X|P)] = n [Γ(α + β) / (Γ(α)Γ(β))] [Γ(α + 1)Γ(β + 1) / Γ(α + β + 2)]
                    = n [Γ(α + β) / (Γ(α)Γ(β))] [αΓ(α) βΓ(β) / ((α + β + 1)(α + β)Γ(α + β))]
                    = n αβ / [(α + β + 1)(α + β)].

Thus,

Var_X(X) = n αβ / [(α + β + 1)(α + β)] + n² αβ / [(α + β)²(α + β + 1)]
         = n αβ(α + β + n) / [(α + β)²(α + β + 1)].
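
The mean and variance formulas for this beta-binomial model can be checked by simulation; the following sketch is my own illustration (arbitrary parameter values) that draws P ~ beta(α, β) and then X|P ~ binomial(n, P).

    import numpy as np

    rng = np.random.default_rng(3)
    alpha, beta, n, reps = 2.0, 3.0, 10, 1_000_000

    p = rng.beta(alpha, beta, reps)
    x = rng.binomial(n, p)

    mean_theory = n * alpha / (alpha + beta)
    var_theory = n * alpha * beta * (alpha + beta + n) / ((alpha + beta)**2 * (alpha + beta + 1))
    print(x.mean(), mean_theory)  # both close to 4
    print(x.var(), var_theory)    # both close to 6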

(Bilkent) ECON509 This Version: 7 Nov 2014 77 / 125

Page 78: ECON509_-_Slides_3_-_2014_11_07

Covariance and Correlation

Let X and Y be two random variables. To keep things concise, we will use the following notation:

E[X] = µ_X,  E[Y] = µ_Y,  Var(X) = σ²_X  and  Var(Y) = σ²_Y.

Definition (4.5.1): The covariance of X and Y is the number defined by

Cov(X, Y) = E[(X - µ_X)(Y - µ_Y)].

Definition (4.5.2): The correlation of X and Y is the number defined by

ρ_XY = Cov(X, Y) / (σ_X σ_Y),

which is also called the correlation coefficient.

If large (small) values of X tend to be observed with large (small) values of Y, then Cov(X, Y) will be positive.

Why so? Within the above setting, when X > µ_X then Y > µ_Y is likely to be true, whereas when X < µ_X then Y < µ_Y is likely to be true. Hence E[(X - µ_X)(Y - µ_Y)] > 0.

Similarly, if large (small) values of X tend to be observed with small (large) values of Y, then Cov(X, Y) will be negative.

(Bilkent) ECON509 This Version: 7 Nov 2014 78 / 125

Page 79: ECON509_-_Slides_3_-_2014_11_07

Covariance and Correlation

Correlation normalises covariance by the standard deviations and is, therefore, a more informative measure.

If Cov(X, Y) = 50 while Cov(W, Z) = 0.9, this does not necessarily mean that there is a much stronger relationship between X and Y. For example, if Var(X) = Var(Y) = 100 while Var(W) = Var(Z) = 1, then

ρ_XY = 0.5  and  ρ_WZ = 0.9.

Theorem (4.5.3): For any random variables X and Y,

Cov(X, Y) = E[XY] - µ_X µ_Y.

Proof:

Cov(X, Y) = E[(X - µ_X)(Y - µ_Y)]
          = E[XY - µ_X Y - µ_Y X + µ_X µ_Y]
          = E[XY] - µ_X E[Y] - µ_Y E[X] + µ_X µ_Y
          = E[XY] - µ_X µ_Y.

(Bilkent) ECON509 This Version: 7 Nov 2014 79 / 125

Page 80: ECON509_-_Slides_3_-_2014_11_07

Covariance and Correlation

Example (4.5.4): Let (X, Y) be a bivariate random variable with

f_{X,Y}(x, y) = 1,  0 < x < 1,  x < y < x + 1.

Now,

f_X(x) = ∫_x^{x+1} f_{X,Y}(x, y) dy = ∫_x^{x+1} 1 dy = y |_{y=x}^{x+1} = 1,

implying that X ~ Uniform(0, 1). Therefore, E[X] = 1/2 and Var(X) = 1/12.

Now, f_Y(y) is a bit more complicated to calculate. Considering the region where f_{X,Y}(x, y) > 0, we observe that 0 < x < y when 0 < y < 1 and y - 1 < x < 1 when 1 ≤ y < 2. Therefore,

f_Y(y) = ∫_0^y f_{X,Y}(x, y) dx = ∫_0^y 1 dx = y,  when 0 < y < 1,

f_Y(y) = ∫_{y-1}^1 f_{X,Y}(x, y) dx = ∫_{y-1}^1 1 dx = 2 - y,  when 1 ≤ y < 2.

(Bilkent) ECON509 This Version: 7 Nov 2014 80 / 125

Page 81: ECON509_-_Slides_3_-_2014_11_07

Covariance and Correlation

As an exercise, you can show that

µ_Y = 1  and  σ²_Y = 1/6.

Moreover,

E_{X,Y}[XY] = ∫_0^1 ∫_x^{x+1} xy dy dx = ∫_0^1 [x y²/2]_{y=x}^{x+1} dx
            = ∫_0^1 (x² + x/2) dx = 7/12.

Therefore,

Cov(X, Y) = 1/12  and  ρ_XY = 1/√2.
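
The moments in this example can be verified symbolically; the sketch below is my own illustration (assuming SymPy) that recomputes E[XY], Cov(X, Y) and ρ_XY for the density f(x, y) = 1 on 0 < x < 1, x < y < x + 1.

    import sympy as sp

    x, y = sp.symbols('x y')

    # f(x, y) = 1 on 0 < x < 1, x < y < x + 1.
    e_xy = sp.integrate(sp.integrate(x * y, (y, x, x + 1)), (x, 0, 1))
    mu_x, var_x = sp.Rational(1, 2), sp.Rational(1, 12)
    mu_y, var_y = sp.Integer(1), sp.Rational(1, 6)

    cov = e_xy - mu_x * mu_y
    rho = cov / sp.sqrt(var_x * var_y)
    print(e_xy, cov, sp.simplify(rho))  # 7/12, 1/12, sqrt(2)/2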

(Bilkent) ECON509 This Version: 7 Nov 2014 81 / 125

Page 82: ECON509_-_Slides_3_-_2014_11_07

Covariance and Correlation

Theorem (4.5.5): If X ⊥⊥ Y, then Cov(X, Y) = ρ_XY = 0.

Proof: Since X ⊥⊥ Y, by Theorem (4.2.10),

E[XY] = E[X] E[Y].

Then

Cov(X, Y) = E[XY] - µ_X µ_Y = µ_X µ_Y - µ_X µ_Y = 0,

and consequently,

ρ_XY = Cov(X, Y) / (σ_X σ_Y) = 0.

It is crucial to note that although X ⊥⊥ Y implies that Cov(X, Y) = ρ_XY = 0, the relationship does not necessarily hold in the reverse direction.

(Bilkent) ECON509 This Version: 7 Nov 2014 82 / 125

Page 83: ECON509_-_Slides_3_-_2014_11_07

Covariance and Correlation

Theorem (4.5.6): If X and Y are any two random variables and a and b are any two constants, then

Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y).

If X and Y are independent random variables, then

Var(aX + bY) = a²Var(X) + b²Var(Y).

You will be asked to prove this as a homework assignment.

Note that if two random variables, X and Y, are positively correlated, then

Var(X + Y) > Var(X) + Var(Y),

whereas if X and Y are negatively correlated, then

Var(X + Y) < Var(X) + Var(Y).

For positively correlated random variables, large values in one tend to be accompanied by large values in the other. Therefore, the total variance is magnified.

Similarly, for negatively correlated random variables, large values in one tend to be accompanied by small values in the other. Hence, the variance of the sum is dampened.

(Bilkent) ECON509 This Version: 7 Nov 2014 83 / 125

Page 84: ECON509_-_Slides_3_-_2014_11_07

Covariance and Correlation

Example (4.5.8): Let X ~ Uniform(0, 1), Z ~ Uniform(0, 1/10) and Z ⊥⊥ X. Let, moreover,

Y = X + Z,

and consider (X, Y).

Consider the following intuitive derivation of the distribution of (X, Y). We are given X = x and Y = x + Z for a particular value of X. Now, the conditional distribution of Z given X is

Z | X ~ Uniform(0, 1/10),

since Z and X are independent! Therefore, one can interpret x as a location parameter in the conditional distribution of Y given X = x.

This conditional distribution is Uniform(x, x + 1/10): we simply shift the location of the distribution of Z by x units.

Now,

f_{X,Y}(x, y) = f_{Y|X}(y|x) f_X(x) = [1/(x + 1/10 - x)] [1/(1 - 0)] = [1/(1/10)]·1 = 10,

where 0 < x < 1 and x < y < x + 1/10.

(Bilkent) ECON509 This Version: 7 Nov 2014 84 / 125

Page 85: ECON509_-_Slides_3_-_2014_11_07

Covariance and Correlation

Based on these findings, it can be shown that

E[X] = 1/2  and  E[X + Z] = 1/2 + 1/20 = 11/20.

Then,

Cov(X, Y) = E[XY] - E[X] E[Y]
          = E[X(X + Z)] - E[X] E[X + Z]
          = E[X²] + E[XZ] - {E[X]}² - E[X] E[Z]
          = E[X²] + E[X] E[Z] - {E[X]}² - E[X] E[Z]
          = E[X²] - {E[X]}² = Var(X) = 1/12.

By Theorem (4.5.6),

Var(Y) = Var(X) + Var(Z) = 1/12 + 1/1200.

Then,

ρ_XY = (1/12) / [√(1/12) √(1/12 + 1/1200)] = √(100/101).

(Bilkent) ECON509 This Version: 7 Nov 2014 85 / 125

Page 86: ECON509_-_Slides_3_-_2014_11_07

Covariance and Correlation

A quick comparison of Examples (4.5.4) and (4.5.8) reveals an interesting result.

In the first example, one can show that Y|X ~ Uniform(x, x + 1), while in the latter example we have Y|X ~ Uniform(x, x + 1/10).

Therefore, in the latter case, knowledge about the value of X gives us a more precise idea about the possible values that Y can take on. Seen from a different perspective, in the second example Y is conditionally more tightly distributed around the particular value of X.

In relation to this observation, ρ_XY for the first example is equal to √(1/2), which is much smaller compared to ρ_XY = √(100/101) from the second example.

Another important result, provided without proof at this point, is that

-1 ≤ ρ_XY ≤ 1;

see part (1) of Theorem (4.5.7) in Casella & Berger (p. 172). We will provide an alternative proof of this result when we deal with inequalities.

(Bilkent) ECON509 This Version: 7 Nov 2014 86 / 125

Page 87: ECON509_-_Slides_3_-_2014_11_07

Covariance and Correlation
Bivariate Normal Distribution

We now introduce the bivariate normal distribution.

Definition (4.5.10): Let -∞ < µ_X < ∞, -∞ < µ_Y < ∞, σ_X > 0, σ_Y > 0 and -1 < ρ < 1. The bivariate normal pdf with means µ_X and µ_Y, variances σ²_X and σ²_Y, and correlation ρ is the bivariate pdf given by

f_{X,Y}(x, y) = [1/(2π σ_X σ_Y √(1 - ρ²))] exp{ -[u² - 2ρuv + v²] / [2(1 - ρ²)] },

where u = (y - µ_Y)/σ_Y and v = (x - µ_X)/σ_X, while -∞ < x < ∞ and -∞ < y < ∞.

(Bilkent) ECON509 This Version: 7 Nov 2014 87 / 125

Page 88: ECON509_-_Slides_3_-_2014_11_07

Covariance and Correlation
Bivariate Normal Distribution

More concisely, this would be written as

(X, Y)' ~ N( (µ_X, µ_Y)',  | σ²_X       ρσ_Xσ_Y |
                           | ρσ_Xσ_Y    σ²_Y    | ).

In addition, starting from the bivariate distribution, one can show that

Y | X = x ~ N( µ_Y + ρ(σ_Y/σ_X)(x - µ_X),  σ²_Y(1 - ρ²) ),

and, likewise,

X | Y = y ~ N( µ_X + ρ(σ_X/σ_Y)(y - µ_Y),  σ²_X(1 - ρ²) ).

Finally, again, starting from the bivariate distribution, it can be shown that

X ~ N(µ_X, σ²_X)  and  Y ~ N(µ_Y, σ²_Y).

Therefore, joint normality implies conditional and marginal normality. However, this does not go in the opposite direction; marginal or conditional normality does not necessarily imply joint normality.
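
These conditional-distribution facts can be checked by simulation; the sketch below is my own illustration (NumPy, arbitrary parameter values) that samples from a bivariate normal and compares the conditional mean and variance of Y given X ≈ x₀ with the formulas above.

    import numpy as np

    rng = np.random.default_rng(4)
    mu_x, mu_y, sig_x, sig_y, rho = 1.0, 2.0, 1.5, 0.5, 0.7
    cov = [[sig_x**2, rho * sig_x * sig_y], [rho * sig_x * sig_y, sig_y**2]]

    xy = rng.multivariate_normal([mu_x, mu_y], cov, size=2_000_000)
    x, y = xy[:, 0], xy[:, 1]

    x0 = 2.0                        # condition on X being close to x0
    sel = np.abs(x - x0) < 0.01
    print(y[sel].mean(), mu_y + rho * sig_y / sig_x * (x0 - mu_x))  # conditional mean
    print(y[sel].var(), sig_y**2 * (1 - rho**2))                    # conditional variance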

(Bilkent) ECON509 This Version: 7 Nov 2014 88 / 125

Page 89: ECON509_-_Slides_3_-_2014_11_07

Covariance and Correlation: Bivariate Normal Distribution

The normal distribution has another interesting property.

Remember that although independence implies zero covariance, the reverse is not necessarily true.

The normal distribution is an exception to this: if two jointly normally distributed random variables have zero correlation (or, equivalently, zero covariance), then they are independent.

Why? Remember that independence is a property that governs all moments, not just the second order ones (such as variance or covariance).

However, as the preceding discussion reveals, the distribution of a bivariate normal random variable is entirely determined by its mean and covariance matrix. In other words, the first and second order moments are sufficient to characterise the distribution.

Therefore, we do not have to worry about any higher order moments. Hence, zero covariance implies independence in this particular case.

(Bilkent) ECON509 This Version: 7 Nov 2014 89 / 125

Page 90: ECON509_-_Slides_3_-_2014_11_07

Covariance and Correlation: Bivariate Normal Distribution

The following is based on Section 3.6 of Gallant (1997). Start with

fX,Y(x, y) = [1/(2π σX σY √(1 − ρ²))] exp{−[1/(2(1 − ρ²))] [u² − 2ρuv + v²]}.

Now,

u² − 2ρuv + v² = u² − 2ρuv + v² + ρ²v² − ρ²v² = (u − ρv)² + (1 − ρ²)v².

Then,

fX,Y(x, y) = [1/(2π σX σY √(1 − ρ²))] exp{−[1/(2(1 − ρ²))] [(u − ρv)² + (1 − ρ²)v²]}
= {1/[σY √(2π(1 − ρ²))]} exp{−(u − ρv)²/[2(1 − ρ²)]} × [1/(σX √(2π))] exp{−(1 − ρ²)v²/[2(1 − ρ²)]}
= {1/[σY √(2π(1 − ρ²))]} exp{−(u − ρv)²/[2(1 − ρ²)]} × [1/(σX √(2π))] exp(−v²/2).

(Bilkent) ECON509 This Version: 7 Nov 2014 90 / 125

Page 91: ECON509_-_Slides_3_-_2014_11_07

Covariance and Correlation: Bivariate Normal Distribution

Then, the bivariate pdf factors into

{1/[σY √(2π(1 − ρ²))]} exp{−(u − ρv)²/[2(1 − ρ²)]}
= {1/[σY √(2π(1 − ρ²))]} exp{−[1/(2(1 − ρ²))] (σ²Y/σ²Y) [(y − µY)/σY − ρ(x − µX)/σX]²}
= {1/[σY √(2π(1 − ρ²))]} exp{−[1/(2(1 − ρ²)σ²Y)] [y − µY − ρ(σY/σX)(x − µX)]²}

and

[1/(σX √(2π))] exp(−v²/2) = [1/(σX √(2π))] exp{−(1/2) [(x − µX)/σX]²}.

The first of these can be considered as the conditional density fY|X(y|x), which is normal with mean and variance

µY|X = µY + ρ(σY/σX)(x − µX) and σ²Y|X = (1 − ρ²)σ²Y.

The second, on the other hand, is the unconditional density fX(x), which is normal with mean µX and variance σ²X.

(Bilkent) ECON509 This Version: 7 Nov 2014 91 / 125

Page 92: ECON509_-_Slides_3_-_2014_11_07

Covariance and Correlation: Bivariate Normal Distribution

Now, the covariance is given by

σYX = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (y − µY)(x − µX) fX,Y(x, y) dx dy
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} (y − µY)(x − µX) fY|X(y|x) fX(x) dx dy
= ∫_{−∞}^{∞} [∫_{−∞}^{∞} (y − µY) fY|X(y|x) dy] fX(x)(x − µX) dx, the inner integral being E[Y − µY | X = x],
= ∫_{−∞}^{∞} [µY + ρ(σY/σX)(x − µX) − µY] fX(x)(x − µX) dx
= ρ(σY/σX) ∫_{−∞}^{∞} (x − µX)² fX(x) dx
= ρ(σY/σX) Var(X) = ρ(σY/σX) σ²X = ρ σY σX.

This shows that the correlation is given by ρ since

ρXY = σXY/(σX σY) = ρσXσY/(σXσY) = ρ.
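
Both the factorisation fX,Y(x, y) = fY|X(y|x) fX(x) and the covariance ρσXσY can be verified numerically. A small sketch using NumPy and SciPy follows; the parameter values and evaluation point are arbitrary illustrations.

    # Check the factorisation of the bivariate normal pdf and the covariance rho*sig_x*sig_y.
    import numpy as np
    from scipy.stats import multivariate_normal, norm

    mu_x, mu_y, sig_x, sig_y, rho = 0.5, -1.0, 2.0, 1.0, -0.4
    cov = [[sig_x**2, rho * sig_x * sig_y],
           [rho * sig_x * sig_y, sig_y**2]]

    x, y = 1.2, -0.3
    joint = multivariate_normal.pdf([x, y], mean=[mu_x, mu_y], cov=cov)
    cond = norm.pdf(y, loc=mu_y + rho * sig_y * (x - mu_x) / sig_x,
                    scale=sig_y * np.sqrt(1 - rho**2))
    marg = norm.pdf(x, loc=mu_x, scale=sig_x)
    print(joint, cond * marg)                          # the two numbers agree

    rng = np.random.default_rng(2)
    draws = rng.multivariate_normal([mu_x, mu_y], cov, size=1_000_000)
    print(np.cov(draws.T)[0, 1], rho * sig_x * sig_y)  # sample covariance vs rho*sig_x*sig_y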

(Bilkent) ECON509 This Version: 7 Nov 2014 92 / 125

Page 93: ECON509_-_Slides_3_-_2014_11_07

Multivariate Distributions

So far, we have discussed multivariate random variables which consist of two random variables only. Now, we extend these ideas to general multivariate random variables.

For example, we might have the random vector (X1,X2,X3,X4) where X1 istemperature, X2 is weight, X3 is height and X4 is blood pressure.

The ideas and concepts we have discussed so far extend to such random vectors directly.

(Bilkent) ECON509 This Version: 7 Nov 2014 93 / 125

Page 94: ECON509_-_Slides_3_-_2014_11_07

Multivariate Distributions

Let X = (X1, ..., Xn). Then the sample space for X is a subset of R^n, the n-dimensional Euclidean space.

If this sample space is countable, then X is a discrete random vector and its joint pmf is given by

f(x) = f(x1, ..., xn) = P(X1 = x1, X2 = x2, ..., Xn = xn) for each (x1, ..., xn) ∈ R^n.

For any A ⊂ R^n, P(X ∈ A) = ∑_{x∈A} f(x).

Similarly, for the continuous random vector, we have the joint pdf given by f(x) = f(x1, ..., xn), which satisfies

P(X ∈ A) = ∫···∫_A f(x) dx = ∫···∫_A f(x1, ..., xn) dx1···dxn.

Note that ∫···∫_A denotes an n-fold integration, where the limits of integration are such that the integral is calculated over all points x ∈ A.

(Bilkent) ECON509 This Version: 7 Nov 2014 94 / 125

Page 95: ECON509_-_Slides_3_-_2014_11_07

Multivariate Distributions

Let g(x) = g(x1, ..., xn) be a real-valued function defined on the sample space of X. Then, for the random variable g(X),

(discrete): E[g(X)] = ∑_{x∈R^n} g(x) f(x),
(continuous): E[g(X)] = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} g(x) f(x) dx.

The marginal pdf or pmf of (X1, ..., Xk), the first k coordinates of (X1, ..., Xn), is given by

(discrete): f(x1, ..., xk) = ∑_{(xk+1, ..., xn) ∈ R^(n−k)} f(x1, ..., xn),
(continuous): f(x1, ..., xk) = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} f(x1, ..., xn) dxk+1 ··· dxn,

for every (x1, ..., xk) ∈ R^k.

(Bilkent) ECON509 This Version: 7 Nov 2014 95 / 125

Page 96: ECON509_-_Slides_3_-_2014_11_07

Multivariate Distributions

As you would expect, if f (x1, ..., xk ) > 0 then,

f(xk+1, ..., xn | x1, ..., xk) = f(x1, ..., xn) / f(x1, ..., xk),

gives the conditional pmf or pdf of (Xk+1, ..., Xn) given (X1 = x1, X2 = x2, ..., Xk = xk), which is a function of (x1, ..., xk).

Note that you can pick your k coordinates as you like. For example, if you have (X1, X2, X3, X4, X5) and want to condition on X2 and X5, then simply consider the reordered random vector (X2, X5, X1, X3, X4), where one would condition on the first two coordinates.

Let's illustrate these concepts using the following example.

(Bilkent) ECON509 This Version: 7 Nov 2014 96 / 125

Page 97: ECON509_-_Slides_3_-_2014_11_07

Multivariate Distributions

Example (4.6.1): Let n = 4 and

f(x1, x2, x3, x4) = (3/4)(x1² + x2² + x3² + x4²) if 0 < xi < 1, i = 1, 2, 3, 4, and 0 otherwise.

Clearly this is a nonnegative function, and it can be shown that

∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x1, x2, x3, x4) dx1 dx2 dx3 dx4 = 1.

For example, you can check that P(X1 < 1/2, X2 < 3/4, X4 > 1/2) is given by

∫_{1/2}^{1} ∫_{0}^{1} ∫_{0}^{3/4} ∫_{0}^{1/2} (3/4)(x1² + x2² + x3² + x4²) dx1 dx2 dx3 dx4 = 171/1024.
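
A quick numerical cross-check of this value (not part of the slides): since the unit hypercube has volume one, the integral equals E[f(U) 1{U ∈ A}] for U uniform on (0, 1)^4, which is easy to estimate by Monte Carlo. A minimal Python sketch, assuming NumPy:

    # Monte Carlo check of P(X1 < 1/2, X2 < 3/4, X4 > 1/2) in Example (4.6.1).
    # The estimate is the sample mean of f(U) * 1{U in A} with U uniform on (0,1)^4.
    import numpy as np

    rng = np.random.default_rng(3)
    u = rng.uniform(size=(2_000_000, 4))
    f = 0.75 * (u**2).sum(axis=1)                     # joint pdf evaluated at the draws
    in_A = (u[:, 0] < 0.5) & (u[:, 1] < 0.75) & (u[:, 3] > 0.5)
    print((f * in_A).mean(), 171 / 1024)              # estimate vs exact value ~ 0.167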

Now, consider

f(x1, x2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x1, x2, x3, x4) dx3 dx4
= ∫_{0}^{1} ∫_{0}^{1} (3/4)(x1² + x2² + x3² + x4²) dx3 dx4 = (3/4)(x1² + x2²) + 1/2,

where 0 < x1 < 1 and 0 < x2 < 1.

(Bilkent) ECON509 This Version: 7 Nov 2014 97 / 125

Page 98: ECON509_-_Slides_3_-_2014_11_07

Multivariate Distributions

This marginal pdf can be used to compute any probability or expectation involving X1 and X2. For instance,

E[X1X2] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 x2 f(x1, x2) dx1 dx2
= ∫_{0}^{1} ∫_{0}^{1} x1 x2 [(3/4)(x1² + x2²) + 1/2] dx1 dx2 = 5/16.

Now let's find the conditional distribution of X3 and X4 given X1 = x1 and X2 = x2. For any (x1, x2) where 0 < x1 < 1 and 0 < x2 < 1,

f(x3, x4 | x1, x2) = f(x1, x2, x3, x4) / f(x1, x2)
= [(3/4)(x1² + x2² + x3² + x4²)] / [(3/4)(x1² + x2²) + 1/2]
= (x1² + x2² + x3² + x4²) / (x1² + x2² + 2/3).

(Bilkent) ECON509 This Version: 7 Nov 2014 98 / 125

Page 99: ECON509_-_Slides_3_-_2014_11_07

Multivariate Distributions

Then, for example,

P(X3 > 3/4, X4 < 1/2 | X1 = 1/3, X2 = 2/3)
= ∫_{0}^{1/2} ∫_{3/4}^{1} [(1/3)² + (2/3)² + x3² + x4²] / [(1/3)² + (2/3)² + 2/3] dx3 dx4
= ∫_{0}^{1/2} ∫_{3/4}^{1} [5/11 + (9/11)x3² + (9/11)x4²] dx3 dx4
= ∫_{0}^{1/2} [5/44 + 111/704 + (9/44)x4²] dx4
= 5/88 + 111/1408 + 3/352 = 203/1408.

(Bilkent) ECON509 This Version: 7 Nov 2014 99 / 125

Page 100: ECON509_-_Slides_3_-_2014_11_07

Multivariate Distributions

Let's introduce some other useful extensions to the concepts we covered for the univariate and bivariate random variables.

Definition (4.6.5): Let X1, ..., Xn be random vectors with joint pdf or pmf f(x1, ..., xn). Let fXi(xi) denote the marginal pdf or pmf of Xi. Then, X1, ..., Xn are called mutually independent random vectors if, for every (x1, ..., xn),

f(x1, ..., xn) = fX1(x1) · ... · fXn(xn) = ∏_{i=1}^{n} fXi(xi).

If the Xi's are all one-dimensional, then X1, ..., Xn are called mutually independent random variables.

Clearly, if X1, ..., Xn are mutually independent, then knowledge about the values of some coordinates gives us no information about the values of the other coordinates.

Remember that mutual independence implies pairwise independence, but not the other way around. For example, it is possible to specify a probability distribution for (X1, ..., Xn) with the property that each pair (Xi, Xj) is pairwise independent but X1, ..., Xn are not mutually independent.

Theorem (4.6.6) (Generalisation of Theorem 4.2.10): Let X1, ..., Xn be mutually independent random variables. Let g1, ..., gn be real-valued functions such that gi(xi) is a function only of xi, i = 1, ..., n. Then

E[g1(X1) · ... · gn(Xn)] = E[g1(X1)] · E[g2(X2)] · ... · E[gn(Xn)].

(Bilkent) ECON509 This Version: 7 Nov 2014 100 / 125

Page 101: ECON509_-_Slides_3_-_2014_11_07

Multivariate Distributions

Theorem (4.6.7) (Generalisation of Theorem 4.2.12): Let X1, ..., Xn be mutually independent random variables with mgfs MX1(t), ..., MXn(t). Let

Z = X1 + ...+ Xn .

Then, the mgf of Z is MZ(t) = MX1(t) · ... · MXn(t).

In particular, if X1, ...,Xn all have the same distribution with mgf MX (t), then

MZ(t) = [MX(t)]^n.

Example (4.6.8): Suppose X1, ..., Xn are mutually independent random variables and the distribution of Xi is gamma(αi, β). The mgf of a gamma(α, β) distribution is M(t) = (1 − βt)^(−α). Thus, if Z = X1 + ... + Xn, the mgf of Z is

MZ(t) = MX1(t) · ... · MXn(t) = (1 − βt)^(−α1) · ... · (1 − βt)^(−αn) = (1 − βt)^(−(α1+...+αn)).

This is the mgf of a gamma(α1 + ... + αn, β) distribution. Thus, the sum of independent gamma random variables that have a common scale parameter β also has a gamma distribution.
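
This closure property is easy to illustrate by simulation. The sketch below is an illustration only, with arbitrary shape parameters, and uses the scale parametrisation of the gamma distribution (matching the mgf above); NumPy and SciPy are assumed.

    # If X_i ~ gamma(alpha_i, beta) independently with a common scale beta, then
    # Z = X_1 + ... + X_n ~ gamma(alpha_1 + ... + alpha_n, beta).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    alphas = [0.5, 1.0, 2.5]
    beta = 2.0
    n = 200_000

    z = sum(rng.gamma(a, beta, size=n) for a in alphas)   # sum of independent gammas
    print(z.mean(), sum(alphas) * beta)                   # gamma mean is alpha * beta
    print(stats.kstest(z, stats.gamma(sum(alphas), scale=beta).cdf).pvalue)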

(Bilkent) ECON509 This Version: 7 Nov 2014 101 / 125

Page 102: ECON509_-_Slides_3_-_2014_11_07

Multivariate Distributions

Corollary (4.6.9): Let X1, ..., Xn be mutually independent random variables with mgfs MX1(t), ..., MXn(t). Let a1, ..., an and b1, ..., bn be fixed constants. Let

Z = (a1X1 + b1) + ...+ (anXn + bn) .

Then, the mgf of Z is

MZ(t) = exp(t ∑_{i=1}^{n} bi) MX1(a1t) · ... · MXn(ant).

Proof: From the definition, the mgf of Z is

MZ(t) = E[exp(tZ)] = E[exp(t ∑_{i=1}^{n} (aiXi + bi))]
= exp(t ∑_{i=1}^{n} bi) E[exp(ta1X1) · ... · exp(tanXn)]
= exp(t ∑_{i=1}^{n} bi) MX1(a1t) · ... · MXn(ant),

which yields the desired result.

(Bilkent) ECON509 This Version: 7 Nov 2014 102 / 125

Page 103: ECON509_-_Slides_3_-_2014_11_07

Multivariate Distributions

Corollary (4.6.10): Let X1, ..., Xn be mutually independent random variables with Xi ∼ N(µi, σ²i). Let a1, ..., an and b1, ..., bn be fixed constants. Then,

Z = ∑_{i=1}^{n} (aiXi + bi) ∼ N( ∑_{i=1}^{n} (aiµi + bi), ∑_{i=1}^{n} a²iσ²i ).

Proof: Recall that the mgf of a N(µ, σ²) random variable is M(t) = exp(µt + σ²t²/2). Substituting into the expression in Corollary (4.6.9) yields

MZ(t) = exp(t ∑_{i=1}^{n} bi) exp(µ1a1t + σ²1a²1t²/2) · ... · exp(µnant + σ²na²nt²/2)
= exp[ ∑_{i=1}^{n} (aiµi + bi) t + (∑_{i=1}^{n} a²iσ²i) t²/2 ],

which is the mgf of the normal distribution given in Corollary (4.6.10).

This is an important result. A linear combination of independent normal random variables is normally distributed.
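
A short simulation illustrates the corollary; the coefficients, means and variances below are arbitrary choices, and NumPy and SciPy are assumed.

    # Z = sum_i (a_i X_i + b_i) with independent X_i ~ N(mu_i, sig_i^2) is normal
    # with mean sum_i (a_i mu_i + b_i) and variance sum_i a_i^2 sig_i^2.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    mu = np.array([0.0, 1.0, -2.0])
    sig = np.array([1.0, 0.5, 2.0])
    a = np.array([2.0, -1.0, 0.5])
    b = np.array([1.0, 0.0, 3.0])

    x = rng.normal(mu, sig, size=(200_000, 3))   # column i holds draws of X_i
    z = (a * x + b).sum(axis=1)

    m_theory = np.sum(a * mu + b)
    v_theory = np.sum(a**2 * sig**2)
    print(z.mean(), m_theory)                    # sample vs theoretical mean
    print(z.var(), v_theory)                     # sample vs theoretical variance
    print(stats.kstest(z, stats.norm(m_theory, np.sqrt(v_theory)).cdf).pvalue)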

(Bilkent) ECON509 This Version: 7 Nov 2014 103 / 125

Page 104: ECON509_-_Slides_3_-_2014_11_07

Multivariate Distributions

Theorem (4.6.11) (Generalisation of Lemma 4.2.7): Let X1, ..., Xn be random vectors. Then X1, ..., Xn are mutually independent random vectors if and only if there exist functions gi(xi), i = 1, ..., n, such that the joint pdf or pmf of (X1, ..., Xn) can be written as

f(x1, ..., xn) = g1(x1) · ... · gn(xn).

Theorem (4.6.12) (Generalisation of Theorem 4.3.5): Let X1, ..., Xn be independent random vectors. Let gi(xi) be a function only of xi, i = 1, ..., n. Then, the random variables Ui = gi(Xi), i = 1, ..., n, are mutually independent.

(Bilkent) ECON509 This Version: 7 Nov 2014 104 / 125

Page 105: ECON509_-_Slides_3_-_2014_11_07

Multivariate Distributions

We can also calculate the distribution of a transformation of a random vector, in the same fashion as before.

Let (X1, ..., Xn) be a random vector with pdf fX(x1, ..., xn). Let A = {x : fX(x) > 0}. Consider a new random vector (U1, ..., Un), defined by

U1 = g1(X1, ..., Xn), U2 = g2(X1, ..., Xn), ..., Un = gn(X1, ..., Xn).

Suppose that A0, A1, ..., Ak form a partition of A with the following properties. The set A0, which may be empty, satisfies

P((X1, ..., Xn) ∈ A0) = 0.

The transformation

(U1, ...,Un) = (g1(X), g2(X), ..., gn(X))

is a one-to-one transformation from Ai onto B for each i = 1, 2, ..., k. Then for each i, the inverse functions from B to Ai can be found. Denote the ith inverse by

x1 = h1i(u1, ..., un), x2 = h2i(u1, ..., un), ..., xn = hni(u1, ..., un).

This ith inverse gives, for (u1, ..., un) ∈ B, the unique (x1, ..., xn) ∈ Ai such that

(u1, ..., un) = (g1(x1, ..., xn), g2(x1, ..., xn), ..., gn(x1, ..., xn)) .

(Bilkent) ECON509 This Version: 7 Nov 2014 105 / 125

Page 106: ECON509_-_Slides_3_-_2014_11_07

Multivariate Distributions

Let Ji denote the Jacobian computed from the ith inverse,

Ji = | ∂h1i(u)/∂u1   ∂h1i(u)/∂u2   ···   ∂h1i(u)/∂un |
     | ∂h2i(u)/∂u1   ∂h2i(u)/∂u2   ···   ∂h2i(u)/∂un |
     |      ⋮              ⋮         ⋱         ⋮      |
     | ∂hni(u)/∂u1   ∂hni(u)/∂u2   ···   ∂hni(u)/∂un |,

which is the determinant of an n × n matrix. Then, assuming that these Jacobians are non-zero on B,

fU(u1, ..., un) = ∑_{i=1}^{k} fX(h1i(u1, ..., un), ..., hni(u1, ..., un)) |Ji|.

(Bilkent) ECON509 This Version: 7 Nov 2014 106 / 125

Page 107: ECON509_-_Slides_3_-_2014_11_07

Inequalities

We will now cover some basic inequalities used in statistics and econometrics.

Most of the time, more complicated expressions can be written in terms of simpler expressions. Inequalities on these simpler expressions can then be used to obtain an inequality, or often a bound, on the original complicated term.

This part is based on Sections 3.6 and 4.7 in Casella & Berger.

(Bilkent) ECON509 This Version: 7 Nov 2014 107 / 125

Page 108: ECON509_-_Slides_3_-_2014_11_07

Inequalities

We start with one of the most famous probability inequalities.

Theorem (3.6.1) (Chebychev's Inequality): Let X be a random variable and let g(x) be a nonnegative function. Then, for any r > 0,

P(g(X) ≥ r) ≤ E[g(X)]/r.

Proof: Using the definition of the expected value,

E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx
≥ ∫_{{x: g(x) ≥ r}} g(x) fX(x) dx
≥ r ∫_{{x: g(x) ≥ r}} fX(x) dx
= r P(g(X) ≥ r).

Hence, E[g(X)] ≥ r P(g(X) ≥ r),

implying

P(g(X) ≥ r) ≤ E[g(X)]/r.

(Bilkent) ECON509 This Version: 7 Nov 2014 108 / 125

Page 109: ECON509_-_Slides_3_-_2014_11_07

Inequalities

This result comes in very handy when we want to turn a probability statement into an expectation statement. For example, this would be useful if we already have some moment existence assumptions and we want to prove a result involving a probability statement.

Example (3.6.2): Let g(x) = (x − µ)²/σ², where µ = E[X] and σ² = Var(X). Let, for convenience, r = t². Then,

P[(X − µ)²/σ² ≥ t²] ≤ (1/t²) E[(X − µ)²/σ²] = 1/t².

This implies that

P[|X − µ| ≥ tσ] ≤ 1/t²,

and, consequently,

P[|X − µ| < tσ] ≥ 1 − 1/t².

Therefore, for instance for t = 2, we get

P[|X − µ| ≥ 2σ] ≤ 0.25 or P[|X − µ| < 2σ] ≥ 0.75.

This says that there is at least a 75% chance that a random variable will be within 2σ of its mean, regardless of the distribution of X.

(Bilkent) ECON509 This Version: 7 Nov 2014 109 / 125

Page 110: ECON509_-_Slides_3_-_2014_11_07

Inequalities

This information is useful. However, many times, it might be possible to obtain an even tighter bound in the following sense.

Example (3.6.3): Let Z ∼ N(0, 1). Then,

P(Z ≥ t) = (1/√(2π)) ∫_{t}^{∞} e^(−x²/2) dx
≤ (1/√(2π)) ∫_{t}^{∞} (x/t) e^(−x²/2) dx
= (1/√(2π)) e^(−t²/2)/t.

The inequality follows from the fact that, for x > t, x/t > 1. Now, since Z has a symmetric distribution, P(|Z| ≥ t) = 2P(Z ≥ t). Hence,

P(|Z| ≥ t) ≤ √(2/π) e^(−t²/2)/t.

Set t = 2 and observe that

P(|Z| ≥ 2) ≤ √(2/π) e^(−2)/2 = .054.

This is a much tighter bound compared to the one given by Chebychev's Inequality. Generally, Chebychev's Inequality provides a more conservative bound.
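
The comparison for t = 2 can be reproduced in a few lines; a minimal sketch assuming NumPy and SciPy:

    # For Z ~ N(0,1) and t = 2: Chebychev bound, the tail bound derived above,
    # and the exact tail probability.
    import numpy as np
    from scipy.stats import norm

    t = 2.0
    chebychev = 1 / t**2                                      # 0.25
    tail_bound = np.sqrt(2 / np.pi) * np.exp(-t**2 / 2) / t   # ~ 0.054
    exact = 2 * (1 - norm.cdf(t))                             # ~ 0.0455
    print(chebychev, tail_bound, exact)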

(Bilkent) ECON509 This Version: 7 Nov 2014 110 / 125

Page 111: ECON509_-_Slides_3_-_2014_11_07

Inequalities

A related inequality is Markov's Inequality.

Lemma (3.8.3): If P(Y ≥ 0) = 1 and P(Y = 0) < 1, then, for any r > 0,

P(Y ≥ r) ≤ E[Y]/r,

and the relationship holds with equality if and only if P(Y = r) = p = 1 − P(Y = 0), where 0 < p ≤ 1.

The more general form of Chebychev's Inequality, provided in White (2001), is as follows.

Proposition 2.41 (White 2001): Let X be a random variable such that E[|X|^r] < ∞, r > 0. Then, for every t > 0,

P(|X| ≥ t) ≤ E[|X|^r]/t^r.

Setting r = 2, and some re-arranging, gives the usual Chebychev's Inequality. If we let r = 1, then we obtain Markov's Inequality. See White (2001, pp. 29-30).

(Bilkent) ECON509 This Version: 7 Nov 2014 111 / 125

Page 112: ECON509_-_Slides_3_-_2014_11_07

Inequalities

We continue with inequalities involving expectations. However, rst we present auseful Lemma.

Lemma (4.7.1): Let a and b be any positive numbers, and let p and q be any positive numbers (necessarily greater than 1) satisfying

1/p + 1/q = 1.

Then,

(1/p)a^p + (1/q)b^q ≥ ab, (2)

with equality if and only if a^p = b^q.

(Bilkent) ECON509 This Version: 7 Nov 2014 112 / 125

Page 113: ECON509_-_Slides_3_-_2014_11_07

Inequalities

Proof: Fix b, and consider the function

g(a) = (1/p)a^p + (1/q)b^q − ab.

To minimise g(a), differentiate and set equal to 0:

(d/da) g(a) = 0 ⟹ a^(p−1) − b = 0 ⟹ a = b^(1/(p−1)).

A check of the second derivative will establish that this is indeed a minimum. The value of the function at the minimum is

(1/p) b^(p/(p−1)) + (1/q) b^q − b^(1/(p−1)) b = (1/p) b^q + (1/q) b^q − b^(p/(p−1)) = [(1/p) + (1/q)] b^q − b^(p/(p−1)) = 0.

Note that p/(p − 1) = q since 1/p + 1/q = 1. Hence, the unique minimum is zero and (2) is established. Since the minimum is unique, equality holds only if a = b^(1/(p−1)), which is equivalent to a^p = b^q, again from 1/p + 1/q = 1.

(Bilkent) ECON509 This Version: 7 Nov 2014 113 / 125

Page 114: ECON509_-_Slides_3_-_2014_11_07

Inequalities

Theorem (4.7.2) (Hölder's Inequality): Let X and Y be any two random variables, and let p and q be such that

1/p + 1/q = 1.

Then,

E[|XY|] ≤ {E[|X|^p]}^(1/p) {E[|Y|^q]}^(1/q).

Proof: Define

a = |X| / {E[|X|^p]}^(1/p) and b = |Y| / {E[|Y|^q]}^(1/q).

Applying Lemma (4.7.1), we get

(1/p) |X|^p / E[|X|^p] + (1/q) |Y|^q / E[|Y|^q] ≥ |XY| / ({E[|X|^p]}^(1/p) {E[|Y|^q]}^(1/q)). (3)

Observe that,

E[ (1/p) |X|^p / E[|X|^p] + (1/q) |Y|^q / E[|Y|^q] ] = 1/p + 1/q = 1.

Using this result with (3) yields

E[|XY|] ≤ {E[|X|^p]}^(1/p) {E[|Y|^q]}^(1/q).

(Bilkent) ECON509 This Version: 7 Nov 2014 114 / 125

Page 115: ECON509_-_Slides_3_-_2014_11_07

Inequalities

Now consider a common special case of Hölder's Inequality.

Theorem (4.7.3) (Cauchy-Schwarz Inequality): For any two random variables X and Y,

E[|XY|] ≤ {E[|X|²]}^(1/2) {E[|Y|²]}^(1/2).

Proof: Set p = q = 2.

Example (4.7.4): If X and Y have means µX and µY and variances σ²X and σ²Y, respectively, we can apply the Cauchy-Schwarz Inequality to get

E[|(X − µX)(Y − µY)|] ≤ {E[(X − µX)²]}^(1/2) {E[(Y − µY)²]}^(1/2). (4)

Now, observe that for any two random variables W and Z ,

−|WZ| ≤ WZ ≤ |WZ|,

which implies that

−E[|WZ|] ≤ E[WZ] ≤ E[|WZ|],

which, in turn, implies that

{E[WZ]}² ≤ {E[|WZ|]}².

(Bilkent) ECON509 This Version: 7 Nov 2014 115 / 125

Page 116: ECON509_-_Slides_3_-_2014_11_07

Inequalities

Therefore,

{E[(X − µX)(Y − µY)]}² = [Cov(X, Y)]² ≤ {E[|(X − µX)(Y − µY)|]}²,

and (4) implies that

[Cov(X, Y)]² ≤ σ²X σ²Y.

A particular implication of the previous result is that

−1 ≤ ρXY ≤ 1,

where ρXY is the correlation between X and Y. One can show this without using the Cauchy-Schwarz Inequality, as well. However, the corresponding calculations would be a lot more tedious. See the proof of Theorem 4.5.7 in Casella & Berger (2001, pp. 172-173).
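
The bound is easy to see numerically on simulated data; the sketch below uses an arbitrary dependent, non-normal pair (X, Y) and assumes NumPy.

    # Numerical illustration of [Cov(X,Y)]^2 <= sig_X^2 sig_Y^2, i.e. -1 <= rho_XY <= 1.
    import numpy as np

    rng = np.random.default_rng(6)
    x = rng.exponential(2.0, size=100_000)
    y = 0.3 * x**2 + rng.normal(size=100_000)   # some dependent, non-normal Y

    cov_xy = np.cov(x, y)[0, 1]
    bound = np.sqrt(x.var(ddof=1) * y.var(ddof=1))
    print(abs(cov_xy), bound)                   # |Cov(X,Y)| never exceeds sig_X * sig_Y
    print(np.corrcoef(x, y)[0, 1])              # always lies in [-1, 1]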

(Bilkent) ECON509 This Version: 7 Nov 2014 116 / 125

Page 117: ECON509_-_Slides_3_-_2014_11_07

Inequalities

Theorem (4.7.5) (Minkowski's Inequality): Let X and Y be any two random variables. Then, for 1 ≤ p < ∞,

{E[|X + Y|^p]}^(1/p) ≤ {E[|X|^p]}^(1/p) + {E[|Y|^p]}^(1/p).

Proof: We will use the triangle inequality which says that

|X + Y| ≤ |X| + |Y|.

Now,

E[|X + Y|^p] = E[|X + Y| |X + Y|^(p−1)]
≤ E[|X| |X + Y|^(p−1)] + E[|Y| |X + Y|^(p−1)], (5)

where we have used the triangle inequality.

(Bilkent) ECON509 This Version: 7 Nov 2014 117 / 125

Page 118: ECON509_-_Slides_3_-_2014_11_07

Inequalities

Next, we apply Hölder's Inequality to each expectation on the right-hand side of (5) and obtain

E[|X + Y|^p] ≤ {E[|X|^p]}^(1/p) {E[|X + Y|^(q(p−1))]}^(1/q) + {E[|Y|^p]}^(1/p) {E[|X + Y|^(q(p−1))]}^(1/q),

where q is such that 1/p + 1/q = 1. Notice that q(p − 1) = p and 1 − 1/q = 1/p. Then,

E[|X + Y|^p] / {E[|X + Y|^(q(p−1))]}^(1/q) = E[|X + Y|^p] / {E[|X + Y|^p]}^(1/q)
= {E[|X + Y|^p]}^(1 − 1/q) = {E[|X + Y|^p]}^(1/p),

and

{E[|X + Y|^p]}^(1/p) ≤ {E[|X|^p]}^(1/p) + {E[|Y|^p]}^(1/p).

(Bilkent) ECON509 This Version: 7 Nov 2014 118 / 125

Page 119: ECON509_-_Slides_3_-_2014_11_07

Inequalities

The previous results can be used for the case of numerical sums, as well.

For example, let a1, ..., an and b1, ..., bn be positive nonrandom values. Let X be a random variable with range a1, ..., an and P(X = ai) = 1/n, i = 1, ..., n. Similarly, let Y be a random variable taking on values b1, ..., bn with probability P(Y = bi) = 1/n. Moreover, let p and q be such that 1/p + 1/q = 1. Suppose also that P(X = ai, Y = bj) is equal to 0 whenever i ≠ j and is equal to 1/n whenever i = j.

Then,

E[|XY|] = (1/n) ∑_{i=1}^{n} |ai bi|,

{E[|X|^p]}^(1/p) = [ (1/n) ∑_{i=1}^{n} |ai|^p ]^(1/p) and {E[|Y|^q]}^(1/q) = [ (1/n) ∑_{i=1}^{n} |bi|^q ]^(1/q).

(Bilkent) ECON509 This Version: 7 Nov 2014 119 / 125

Page 120: ECON509_-_Slides_3_-_2014_11_07

Inequalities

Hence, by Hölder's Inequality,

(1/n) ∑_{i=1}^{n} |ai bi| ≤ [ (1/n) ∑_{i=1}^{n} |ai|^p ]^(1/p) [ (1/n) ∑_{i=1}^{n} |bi|^q ]^(1/q)
= (1/n)^(1/p + 1/q) [ ∑_{i=1}^{n} |ai|^p ]^(1/p) [ ∑_{i=1}^{n} |bi|^q ]^(1/q),

and so,

∑_{i=1}^{n} |ai bi| ≤ [ ∑_{i=1}^{n} |ai|^p ]^(1/p) [ ∑_{i=1}^{n} |bi|^q ]^(1/q).

A special case of this result is obtained by letting bi = 1 for all i and setting p = q = 2. Then,

∑_{i=1}^{n} |ai| ≤ [ ∑_{i=1}^{n} |ai|² ]^(1/2) [ ∑_{i=1}^{n} 1 ]^(1/2) = [ ∑_{i=1}^{n} |ai|² ]^(1/2) n^(1/2),

and so,

(1/n) ( ∑_{i=1}^{n} |ai| )² ≤ ∑_{i=1}^{n} a²i.
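
Both the numerical-sums version of Hölder's Inequality and this special case can be checked directly on arbitrary vectors; a minimal sketch assuming NumPy (the vectors and the exponent p are arbitrary):

    # Check: sum_i |a_i b_i| <= (sum_i |a_i|^p)^(1/p) * (sum_i |b_i|^q)^(1/q), 1/p + 1/q = 1.
    import numpy as np

    rng = np.random.default_rng(7)
    a = rng.normal(size=50)
    b = rng.normal(size=50)
    p = 3.0
    q = p / (p - 1)                             # conjugate exponent

    lhs = np.sum(np.abs(a * b))
    rhs = np.sum(np.abs(a)**p)**(1 / p) * np.sum(np.abs(b)**q)**(1 / q)
    print(lhs, rhs)                             # lhs <= rhs

    # Special case b_i = 1, p = q = 2: (1/n) * (sum |a_i|)^2 <= sum a_i^2.
    print(np.sum(np.abs(a))**2 / a.size, np.sum(a**2))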

(Bilkent) ECON509 This Version: 7 Nov 2014 120 / 125

Page 121: ECON509_-_Slides_3_-_2014_11_07

Inequalities

Next, we turn to functional inequalities. These are inequalities that rely on properties of real-valued functions rather than on any statistical properties.

Let's start by defining a convex function.

Definition (4.7.6): A function g(x) is convex if

g(λx + (1 − λ)y) ≤ λg(x) + (1 − λ)g(y), for all x and y, and 0 < λ < 1.

The function g(x) is concave if −g(x) is convex. Now we can introduce Jensen's Inequality.

Theorem (4.7.7): For any random variable X, if g(x) is a convex function, then

E[g(X)] ≥ g(E[X]).

(Bilkent) ECON509 This Version: 7 Nov 2014 121 / 125

Page 122: ECON509_-_Slides_3_-_2014_11_07

Inequalities

Proof: Let ℓ(x) be a tangent line to g(x) at the point g(E[X]). Write ℓ(x) = a + bx for some a and b.

By the convexity of g, we have g(x) ≥ a + bx. Since expectations preserve inequalities,

E[g(X)] ≥ E[a + bX] = a + bE[X] = ℓ(E[X]) = g(E[X]).

Note, for example, that g(x) = x² is a convex function. Then,

E[X²] ≥ {E[X]}².

Moreover, if x is positive, then 1/x is convex and

E[1/X] ≥ 1/E[X].

Note that, for concave functions, we have, accordingly,

E[g(X)] ≤ g(E[X]).
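
Both convex examples above are easy to verify on simulated data; a minimal sketch assuming NumPy, with an arbitrary positive random variable X:

    # Jensen's Inequality with g(x) = x^2 and, for positive X, g(x) = 1/x (both convex).
    import numpy as np

    rng = np.random.default_rng(8)
    x = rng.gamma(2.0, 1.5, size=500_000)       # a positive random variable

    print(np.mean(x**2), np.mean(x)**2)         # E[X^2] >= (E[X])^2
    print(np.mean(1 / x), 1 / np.mean(x))       # E[1/X] >= 1/E[X]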

(Bilkent) ECON509 This Version: 7 Nov 2014 122 / 125

Page 123: ECON509_-_Slides_3_-_2014_11_07

Inequalities

We nish by providing the following theorem.

Theorem (4.7.9): Let X be any random variable and g(x) and h(x) any functions such that E[g(X)], E[h(X)] and E[g(X)h(X)] exist.

1. If g(x) is a nondecreasing function and h(x) is a nonincreasing function, then

E[g(X)h(X)] ≤ E[g(X)]E[h(X)].

2. If g(x) and h(x) are either both nondecreasing or both nonincreasing, then

E[g(X)h(X)] ≥ E[g(X)]E[h(X)].

Unlike, say, Hölder's Inequality or the Cauchy-Schwarz Inequality, this inequality allows us to bound an expectation without using higher-order moments.
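
As a numerical illustration of part (1) (not from the slides; the functions and distribution are arbitrary choices), take X standard normal, g(x) = e^x nondecreasing and h(x) = −x³ nonincreasing:

    # Part (1) of Theorem (4.7.9): E[g(X)h(X)] <= E[g(X)] E[h(X)]
    # when g is nondecreasing and h is nonincreasing.
    import numpy as np

    rng = np.random.default_rng(9)
    x = rng.normal(size=500_000)

    g = np.exp(x)      # nondecreasing in x
    h = -x**3          # nonincreasing in x
    print(np.mean(g * h), np.mean(g) * np.mean(h))   # left-hand side is the smaller one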

(Bilkent) ECON509 This Version: 7 Nov 2014 123 / 125

Page 124: ECON509_-_Slides_3_-_2014_11_07

Inequalities

Let us prove the first part of this Theorem for the easier case where E[g(X)] = E[h(X)] = 0.

Now,

E[g(X)h(X)] = ∫_{−∞}^{∞} g(x)h(x) fX(x) dx
= ∫_{{x: h(x) ≤ 0}} g(x)h(x) fX(x) dx + ∫_{{x: h(x) ≥ 0}} g(x)h(x) fX(x) dx.

Let x0 be such that h(x0) = 0. Then, since h(x) is nonincreasing, {x : h(x) ≤ 0} = {x : x0 ≤ x < ∞} and {x : h(x) ≥ 0} = {x : −∞ < x ≤ x0}. Moreover, since g(x) is nondecreasing,

g(x0) = min_{{x: h(x) ≤ 0}} g(x) and g(x0) = max_{{x: h(x) ≥ 0}} g(x).

(Bilkent) ECON509 This Version: 7 Nov 2014 124 / 125

Page 125: ECON509_-_Slides_3_-_2014_11_07

Inequalities

Hence,

∫_{{x: h(x) ≤ 0}} g(x)h(x) fX(x) dx ≤ g(x0) ∫_{{x: h(x) ≤ 0}} h(x) fX(x) dx,

and

∫_{{x: h(x) ≥ 0}} g(x)h(x) fX(x) dx ≤ g(x0) ∫_{{x: h(x) ≥ 0}} h(x) fX(x) dx.

Thus,

E[g(X)h(X)] = ∫_{{x: h(x) ≤ 0}} g(x)h(x) fX(x) dx + ∫_{{x: h(x) ≥ 0}} g(x)h(x) fX(x) dx
≤ g(x0) ∫_{{x: h(x) ≤ 0}} h(x) fX(x) dx + g(x0) ∫_{{x: h(x) ≥ 0}} h(x) fX(x) dx
= g(x0) ∫_{−∞}^{∞} h(x) fX(x) dx = g(x0) E[h(X)] = 0.

You can try to do the proof for the second part, following the same method.

(Bilkent) ECON509 This Version: 7 Nov 2014 125 / 125