EE385 Class Notes 7/6/2015 John Stensby
Updates at http://www.ece.uah.edu/courses/ee385/
Chapter 4 - Function of Random Variables
Let X denote a random variable with known density fX(x) and distribution FX(x). Let y =
g(x) denote a real-valued function of the real variable x. Consider the transformation
Y = g(X). (4-1)
This is a transformation of the random variable X into the random variable Y. Random variable
X() is a mapping from the sample space into the real line. But so is g(X()). We are interested
in methods for finding the density fY(y) and the distribution FY(y).
When dealing with Y = g(X()), there are a few technicalities that should be considered.
1. The domain of g should include the range of X.
2. For every y, the set {Y = g(X) ≤ y} must be an event. That is, the set {ω ∈ S : Y(ω) = g(X(ω)) ≤ y} must be in F (i.e., it must be an event).
3. The events {Y = g(X) = ±∞} must be assigned a probability of zero.
In practice, these technicalities are assumed to hold, and they do not cause any problems.
Define the indexed set
Iy = {x : g(x) ≤ y}, (4-2)
the composition of which changes with y. The distribution of Y can be expressed as
FY(y) = P[Y ≤ y] = P[g(X) ≤ y] = P[X ∈ Iy]. (4-3)
This provides a practical method for computing the distribution function.
Example 4-1: Consider the function y = g(x) = ax + b, where a > 0 and b are constants.
Iy = {x : g(x) = ax + b ≤ y} = {x : x ≤ (y − b)/a},
EE385 Class Notes 7/6/2015 John Stensby
Updates at http://www.ece.uah.edu/courses/ee385/ 4-2
so that
FY(y) = P[X ∈ Iy] = P[X ≤ (y − b)/a] = FX((y − b)/a).
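Example 4-1 can be checked numerically. The sketch below is hypothetical: it assumes X is standard normal and picks a = 2, b = 3 (any a > 0 works), then compares the empirical distribution of Y = aX + b against FX((y − b)/a):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)

# Hypothetical example: X ~ N(0,1), a = 2, b = 3 (any a > 0 works).
a, b = 2.0, 3.0
x = rng.standard_normal(200_000)
y = a * x + b

def F_X(v):
    # standard normal distribution function
    return 0.5 * (1.0 + erf(v / sqrt(2.0)))

# Compare the empirical CDF of Y with F_X((y - b)/a) at a few points.
max_err = max(abs(np.mean(y <= yv) - F_X((yv - b) / a))
              for yv in (0.0, 2.0, 3.0, 4.0, 6.0))
print("max CDF error:", max_err)
```

With 200,000 samples, the empirical and predicted distribution functions should agree to within about 0.01.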
Example 4-2: Given random variable X and function y = g(x) = x² as shown by Figure 4-1.
Define Y = g(X) = X², and find FY(y). If y < 0, then there are no values of x such that x² ≤ y.
Hence
FY(y) = 0, y < 0.
If y ≥ 0, then x² ≤ y for −√y ≤ x ≤ √y. Hence, Iy = {x : g(x) ≤ y} = {−√y ≤ x ≤ √y}, and
FY(y) = P[−√y ≤ X ≤ √y] = FX(√y) − FX(−√y), y ≥ 0.
Special Considerations
Special consideration is due for functions g(x) that have “flat spots” and/or jump
discontinuities. These cases are considered next.
Watch for places where g(x) is constant (“flat spots”). Suppose g(x) is constant on the
interval (x0, x1]. That is, g(x) = y1, x0 < x ≤ x1, where y1 is a constant, and g(x) ≠ y1 off (x0, x1]. Hence, all of the probability that X has in the interval x0 < x ≤ x1 is assigned to the single
value Y = y1 so that
Figure 4-1: Quadratic transformation y = g(x) = x² used in Example 4-2.
EE385 Class Notes 7/6/2015 John Stensby
Updates at http://www.ece.uah.edu/courses/ee385/ 4-3
P[Y = y1] = P[x0 < X ≤ x1] = FX(x1) − FX(x0). (4-4)
That is, FY(y) has a jump discontinuity at y = y1. The amount of jump is FX(x1) − FX(x0). As an
example, consider the case of a saturating amplifier/limiter transformation.
Example 4-3 (Saturating Amplifier/Limiter): In terms of FX(x), find the distribution FY(y) for
Y = g(X) where
g(x) = b, x > b
     = x, −b < x ≤ b
     = −b, x ≤ −b.
Both FX and y = g(x) are illustrated by Figure 4-2.
Case: y ≥ b
For this case we have g(x) ≤ y for all x. Therefore FY(y) = 1 for y ≥ b.
Case: −b ≤ y < b
For −b ≤ y < b, we have g(x) ≤ y for x ≤ y. Hence, FY(y) = P[Y = g(X) ≤ y] = FX(y), −b ≤ y < b.
Case: y < −b
For y < −b, we have g(x) ≤ y for NO x. Hence, FY(y) = 0, y < −b.
The result of these cases is shown by Figure 4-3.
Figure 4-2: Transformation y = g(x) and distribution FX(x) used in Ex. 4-3.
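The three cases of Example 4-3 can be checked by simulation. The sketch below assumes a hypothetical choice of X standard normal and limit b = 1; FX is then the standard normal distribution function:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

b = 1.0                               # hypothetical limiter level
x = rng.standard_normal(500_000)
y = np.clip(x, -b, b)                 # Y = g(X), the saturating amplifier/limiter

F_X = lambda v: 0.5 * (1.0 + erf(v / sqrt(2.0)))

# -b <= y < b: F_Y(y) = F_X(y); jumps of F_X(-b) at y = -b and 1 - F_X(b-) at y = b.
err_mid  = abs(np.mean(y <= 0.5) - F_X(0.5))
err_low  = abs(np.mean(y == -b) - F_X(-b))          # P[Y = -b] = F_X(-b)
err_high = abs(np.mean(y == b) - (1.0 - F_X(b)))    # P[Y = b] = 1 - F_X(b-)
```

Since X is continuous here, FX(b⁻) = FX(b), so the two jump probabilities can be compared directly with FX(±b).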
Watch for points where g(x) has a jump discontinuity
As shown by the following two examples, special
care may be required when dealing with functions that
have jump discontinuities.
Example 4-4: In terms of FX(x), find the distribution FY(y)
for Y = g(X), where
g(x) = x + c, x ≥ 0
     = x − c, x < 0
as depicted by Figure 4-4 to the right.
Case y ≥ c: If y ≥ c then g(x) ≤ y for x ≤ y − c. Hence,
FY(y) = FX(y − c) for y ≥ c.
Case −c ≤ y < c: If −c ≤ y < c then g(x) ≤ y for x < 0. Hence,
FY(y) = P[X < 0] = FX(0⁻) for −c ≤ y < c.
Case y < −c: If y < −c then g(x) ≤ y for x ≤ y + c. Hence,
FY(y) = FX(y + c) for y < −c.
Example 4-5: In terms of FX(x), find the distribution FY(y)
for Y = g(X) where
g(x) = x + c, x > 0
     = x − c, x ≤ 0
as depicted by Figure 4-5.
Case y ≥ c: If y ≥ c then g(x) ≤ y for x ≤ y − c. Hence,
Figure 4-3: Result for Ex 4-3 (FY(y) has jumps of FX(−b) at y = −b and 1 − FX(b⁻) at y = b).
Figure 4-4: Transformation for Example 4-4.
FY(y) = FX(y − c) for y ≥ c.
Case −c ≤ y < c: If −c ≤ y < c then g(x) ≤ y for x ≤ 0. Hence,
FY(y) = P[X ≤ 0] = FX(0) for −c ≤ y < c.
Case y < −c: If y < −c then g(x) ≤ y for x ≤ y + c. Hence,
FY(y) = FX(y + c) for y < −c.
Notice that there is only a subtle difference between the
previous two examples. In fact, if FX(x) is continuous at x = 0, then FY(y) is the same for the
previous two examples.
Determination of fY in terms of fX
Determine the density fY(y) of Y = g(X) in terms of the density fX(x) of X. To
accomplish this, we solve the equation y = g(x) for x in terms of y . If g has an inverse, then we
can solve for a unique x in terms of y (x = g-1(y)). Otherwise, we will have to do it in segments.
That is, x1(y), x2(y), ... , xn(y) can be found (as solutions, or roots, of y = g(x) ) such that
y = g(x1(y)) = g(x2(y)) = g(x3(y)) = ... = g(xn(y)). (4-5)
Note that x1 through xn are functions of y. The range of each xi(y) covers part of the domain of
g(x). The union of the ranges of xi(y), 1 ≤ i ≤ n, covers all, or part of, the domain of g(x). The
desired fY(y) is
X 1 X 2 X nY
1 2 n
f (x ) f (x ) f (x )f (y) + +
g (x ) g (x ) g (x )
, (4-6)
Figure 4-5: Transformation for Example 4-5.
where g′(x) denotes the derivative of g(x).
We establish this result for the function y = g(x) that is depicted by Figure 4-6, a simple
example where n = 2. The extension to the general case is obvious.
P[y < Y ≤ y + Δy] = ∫ from y to y+Δy of fY(ξ) dξ ≈ fY(y) Δy
for small Δy (increments Δx1, Δx2 and Δy are defined to be positive). Similarly,
P[x1 − Δx1 < X ≤ x1] ≈ fX(x1) Δx1
P[x2 < X ≤ x2 + Δx2] ≈ fX(x2) Δx2. (4-7)
This leads to the requirement
P[y < Y ≤ y + Δy] = P[x1 − Δx1 < X ≤ x1] + P[x2 < X ≤ x2 + Δx2]
fY(y) Δy ≈ fX(x1) Δx1 + fX(x2) Δx2
fY(y) ≈ fX(x1)/(Δy/Δx1) + fX(x2)/(Δy/Δx2). (4-8)
(4-8)
Now, let the increments approach zero. The positive quantities Δy/Δx1 and Δy/Δx2 approach
Δy/Δx1 → |dg(x1)/dx| and Δy/Δx2 → |dg(x2)/dx|. (4-9)
Figure 4-6: Transformation y = g(x) = x².
This leads to the desired result
fY(y) = fX(x1)/|dg(x1)/dx| + fX(x2)/|dg(x2)/dx|. (4-10)
Example 4-6: Consider Y = aX2 where a > 0. If y < 0, then y = ax2 has no real solutions and
fY(y) = 0, y < 0. (4-11)
If y > 0, then y = ax² has solutions x1 = −√(y/a) and x2 = √(y/a). Also, note that g′(x) = 2ax.
Hence,
fY(y) = fX(x1)/|dg(x1)/dx| + fX(x2)/|dg(x2)/dx|
      = [fX(−√(y/a)) + fX(√(y/a))]/(2a√(y/a)), y > 0
      = 0, y < 0. (4-12)
To see a specific example, assume that X is Rayleigh distributed with parameter σ. The density
for X is given by (2-24); substitute this density into (4-12) to obtain
fY(y) = [1/(2a√(y/a))] (√(y/a)/σ²) exp[−(y/a)/2σ²] U(y)
      = (1/(2σ²a)) exp[−y/(2σ²a)] U(y), (4-13)
which is the density for an exponential random variable with parameter λ = 1/(2σ²a), as can be
seen from inspection of (2-27). Hence, the square of a Rayleigh random variable produces an
exponential random variable.
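This Rayleigh-squared result can be verified by simulation. A sketch, with hypothetical parameter choices σ = 1.5 and a = 2, comparing the empirical distribution of Y = aX² with the exponential distribution function 1 − exp[−y/(2σ²a)]:

```python
import numpy as np

rng = np.random.default_rng(3)

sigma, a = 1.5, 2.0                     # hypothetical Rayleigh parameter and gain
x = rng.rayleigh(scale=sigma, size=300_000)
y = a * x**2

# (4-13): Y should be exponential with lambda = 1/(2*sigma^2*a),
# i.e. F_Y(y) = 1 - exp(-lambda*y).
lam = 1.0 / (2.0 * sigma**2 * a)
pts = np.array([1.0, 5.0, 10.0, 20.0])
max_err = np.max(np.abs(np.mean(y[None, :] <= pts[:, None], axis=1)
                        - (1.0 - np.exp(-lam * pts))))
```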
Expected Value of Transformed Random Variable
Given random variable X, with density fX(x), and a function g(x), we form the random
variable Y = g(X). We know that
E[Y] = ∫ from −∞ to ∞ of y fY(y) dy. (4-14)
This requires knowledge of fY(y). We can express E[Y] directly in terms of g(x) and fX(x).
Theorem 4-1: Let X be a random variable and y = g(x) a function. The expected value of Y =
g(X) can be expressed as
E[Y] = E[g(X)] = ∫ from −∞ to ∞ of g(x) fX(x) dx. (4-15)
Figure 4-7: Transformation y = g(x) = x² used in discussion of Theorem 4-1.
To see this, consider the following example that is illustrated by Figure 4-7. Recall that
fY(y)Δy ≈ fX(x1)Δx1 + fX(x2)Δx2. Multiply this expression by y = g(x1) = g(x2) to obtain
y fY(y) Δy ≈ g(x1) fX(x1) Δx1 + g(x2) fX(x2) Δx2. (4-16)
Now, partition the y-axis as 0 = y0 < y1 < y2 < ..., where Δy = yk+1 − yk, k = 0, 1, 2, ... . By the
mappings x1 = −√y and x2 = √y, this leads to a partition x1k, k = 0, 1, 2, ..., of the negative
x-axis and a partition x2k, k = 0, 1, 2, ..., of the positive x-axis. Sum both sides over their
partitions and obtain
Σ over k ≥ 0 of yk fY(yk) Δy ≈ Σ over k ≥ 0 of g(x1k) fX(x1k) Δx1k + Σ over k ≥ 0 of g(x2k) fX(x2k) Δx2k. (4-17)
Let Δy → 0, Δx1k → 0 and Δx2k → 0 to obtain
∫ from 0 to ∞ of y fY(y) dy = ∫ from −∞ to 0 of g(x) fX(x) dx + ∫ from 0 to ∞ of g(x) fX(x) dx
= ∫ from −∞ to ∞ of g(x) fX(x) dx, (4-18)
the desired result. Observe that this argument can be applied to practically any function y = g(x).
Example 4-7: Let X be N(0,σ) and let Y = Xⁿ. Find E[Y]. For n even (i.e., n = 2k) we
know, from Example 2-10, that E[Xⁿ] = E[|X|ⁿ] = 1·3·5···(n − 1)σⁿ. For odd n (i.e., n = 2k + 1)
write
E[|X|^(2k+1)] = ∫ from −∞ to ∞ of |x|^(2k+1) fX(x) dx = (2/(√(2π) σ)) ∫ from 0 to ∞ of x^(2k+1) exp[−x²/2σ²] dx. (4-19)
Change variables: let y = x²/2σ², dy = (x/σ²)dx, and obtain
E[|X|^(2k+1)] = (2/(√(2π) σ)) ∫ from 0 to ∞ of (2σ²y)^k exp[−y] σ² dy = √(2/π) 2^k σ^(2k+1) ∫ from 0 to ∞ of y^k e^(−y) dy. (4-20)
However, from known results on the Gamma function, we have
Γ(k + 1) = ∫ from 0 to ∞ of y^k e^(−y) dy = k!. (4-21)
Now, use (4-21) in (4-20) to obtain
E[|X|ⁿ] = (1/(√(2π) σ)) ∫ from −∞ to ∞ of |x|ⁿ exp[−x²/2σ²] dx
        = 1·3·5···(n − 1) σⁿ, n = 2k (n even)
        = √(2/π) 2^k k! σⁿ, n = 2k + 1 (n odd), (4-22)
for a zero-mean Gaussian random variable X.
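Formula (4-22) can be checked by numerical integration of |x|ⁿ against the Gaussian density. The value σ = 1.3 below is a hypothetical choice:

```python
import numpy as np
from math import factorial, pi, sqrt

# Numerical check of (4-22): integrate |x|^n against the N(0, sigma^2)
# density on a fine, wide grid (sigma = 1.3 is a hypothetical choice).
sigma = 1.3
x = np.linspace(-12 * sigma, 12 * sigma, 400_001)
dx = x[1] - x[0]
pdf = np.exp(-x**2 / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

def predicted(n):
    if n % 2 == 0:                       # 1*3*5...(n-1) * sigma^n, n = 2k
        k = n // 2
        return factorial(n) / (2**k * factorial(k)) * sigma**n
    k = (n - 1) // 2                     # sqrt(2/pi) * 2^k * k! * sigma^n, n = 2k+1
    return sqrt(2 / pi) * 2**k * factorial(k) * sigma**n

max_rel_err = max(abs(np.sum(np.abs(x)**n * pdf) * dx / predicted(n) - 1.0)
                  for n in range(1, 7))
```

The identity 1·3·5···(n−1) = n!/(2^k k!) for n = 2k is used to evaluate the double factorial.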
Approximate Mean of g(X)
Let X be a random variable and y = g(x) a function. The expected value of g(X) can be
expressed as
E[g(X)] = ∫ from −∞ to ∞ of g(x) fX(x) dx. (4-23)
To approximate this, expand g(x) in a Taylor series around the mean η to obtain
g(x) = g(η) + g′(η)(x − η) + ... + g⁽ⁿ⁾(η)(x − η)ⁿ/n! + ... . (4-24)
Use this expansion in the expected-value calculation to obtain
E[g(X)] = ∫ g(x) fX(x) dx
        = ∫ [g(η) + g′(η)(x − η) + ... + g⁽ⁿ⁾(η)(x − η)ⁿ/n! + ...] fX(x) dx
        = g(η) + g″(η) μ2/2! + g⁽³⁾(η) μ3/3! + ... + g⁽ⁿ⁾(η) μn/n! + ..., (4-25)
where μk = E[(X − η)ᵏ] denotes the kth central moment of X (the first-order term vanishes since E[X − η] = 0).
(4-25)
An approximation to E[g(X)] can be based on this formula; just compute a finite number of
terms in the expansion.
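The truncated expansion (4-25) can be tested on a case where E[g(X)] is known exactly. A sketch under hypothetical assumptions: g(x) = eˣ and X ~ N(η, σ²), for which E[g(X)] = exp[η + σ²/2], every derivative of g at η equals exp(η), and the Gaussian central moments are μk = 0 for odd k and μk = σᵏ·1·3···(k−1) for even k:

```python
from math import exp, factorial

# Hypothetical check of (4-25): g(x) = e^x, X ~ N(eta, sigma^2),
# exact answer E[g(X)] = exp(eta + sigma^2/2).
eta, sigma = 0.5, 0.3
exact = exp(eta + sigma**2 / 2)

def central_moment(k):                  # Gaussian central moments mu_k
    if k % 2:
        return 0.0
    m = k // 2
    return sigma**k * factorial(k) / (2**m * factorial(m))

# Truncate (4-25) after the 6th-order term; every g^(k)(eta) = exp(eta).
approx = exp(eta) * sum(central_moment(k) / factorial(k) for k in range(0, 7))
err = abs(approx - exact)
```

For σ = 0.3 the first neglected term is of order σ⁸, so the truncated series is accurate to roughly 1e-7.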
Characteristic Functions
The characteristic function of a random variable is
ΦX(ω) = ∫ from −∞ to ∞ of fX(x) e^(jωx) dx = E[e^(jωX)]. (4-26)
The characteristic function is complex-valued, with
|ΦX(ω)| = |∫ fX(x) e^(jωx) dx| ≤ ∫ fX(x) dx = 1. (4-27)
Note that Φ(−ω) is the Fourier transform of fX(x), so we can write
Φ(ω) = F[fX(x)] with ω replaced by −ω, (4-28)
and
fX(x) = (1/2π) ∫ from −∞ to ∞ of Φ(ω) e^(−jωx) dω. (4-29)
Definition (4-26) takes the form of a sum when X is a discrete random variable. Suppose
that X takes on the values xi with probabilities pi = P[X = xi] for index i in some index set I (i ∈ I). Then the characteristic function of X is
ΦX(ω) = ∫ fX(x) e^(jωx) dx = Σ over i ∈ I of pi exp[jωxi]. (4-30)
Due to the delta functions in density fX(x), the integral in (4-30) becomes a sum.
Example 4-8: Consider the Gaussian density function
fX(x) = (1/(√(2π) σ)) e^(−x²/2σ²). (4-31)
The Fourier transform of fX is F[fX(x)] = exp[−σ²ω²/2], as given in common tables. Hence,
Φ(ω) = F[fX(x)] with ω replaced by −ω = e^(−σ²ω²/2). (4-32)
If fX(x) = (1/(√(2π) σ)) e^(−(x−η)²/2σ²), then
Φ(ω) = e^(jηω) F[(1/(√(2π) σ)) e^(−x²/2σ²)] = e^(jηω) e^(−σ²ω²/2). (4-33)
Example 4-9: Let random variable N be Poisson with parameter λ. That is,
P[N = n] = e^(−λ) λⁿ/n!, n = 0, 1, 2, ... (4-34)
From (4-30), we can write
Φ(ω) = exp[−λ] Σ over n ≥ 0 of (λⁿ/n!) exp[jωn] = exp[−λ] Σ over n ≥ 0 of (λe^(jω))ⁿ/n!
     = exp[−λ] exp[λe^(jω)]
     = exp[λ(e^(jω) − 1)] (4-35)
as the characteristic function of a Poisson random variable.
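The closed form (4-35) can be compared against a direct evaluation of the defining sum (4-30). The parameter λ = 2.5 and frequency ω = 0.7 below are hypothetical choices:

```python
import cmath
from math import exp

# Check (4-35): compare the series sum of p_n * e^{j*omega*n} with the
# closed form exp[lambda*(e^{j*omega} - 1)].
lam, omega = 2.5, 0.7

p = exp(-lam)                      # p_0 = e^{-lambda}
direct = 0j
for n in range(120):               # 120 terms is far past the tail for lam = 2.5
    direct += p * cmath.exp(1j * omega * n)
    p *= lam / (n + 1)             # recursion p_{n+1} = p_n * lam/(n+1)

closed = cmath.exp(lam * (cmath.exp(1j * omega) - 1.0))
err = abs(direct - closed)
```

The running-product recursion for pₙ avoids evaluating λⁿ/n! directly, which would overflow for large n.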
Multiple Dimension Case
The joint characteristic function XY(1,2) of random variables X and Y is defined as
ΦXY(ω1, ω2) = E[exp{j(ω1X + ω2Y)}] = Σ over i Σ over k of e^(j(ω1 xi + ω2 yk)) P[X = xi, Y = yk] (4-36)
for the discrete case and
ΦXY(ω1, ω2) = E[exp{j(ω1X + ω2Y)}] = ∫∫ e^(j(ω1 x + ω2 y)) fXY(x, y) dxdy (4-37)
for the continuous case. Equation (4-37) is recognized as the two dimensional Fourier transform
(with the sign of j reversed) of fXY(x,y). Generalizing these definitions, we can define the joint
characteristic function of n random variables X1, X2, ... , Xn as
ΦX1...Xn(ω1, ..., ωn) = E[exp{jω1X1 + ... + jωnXn}]. (4-38)
Equation (4-38) can be simplified using vector notation. Define the two vectors
ω̄ = [ω1 ω2 ... ωn]ᵀ,   X̄ = [X1 X2 ... Xn]ᵀ. (4-39)
Then, we can write the n-dimensional characteristic function in the compact form
Φ(ω̄) = E[e^(jω̄ᵀX̄)]. (4-40)
Equations (4-38) and (4-40) convey the same information; however, (4-40) is much easier to
write and work with.
Characteristic Function for Multi-dimensional Gaussian Case
Let X̄ = [X1 X2 ... Xn]ᵀ be a Gaussian random vector with mean η̄ = E[X̄]. Let
ω̄ = [ω1 ω2 ... ωn]ᵀ be a vector of n algebraic variables. Note that
ω̄ᵀX̄ = Σ from k = 1 to n of ωk Xk (4-41)
is a scalar. The characteristic function of X̄ is given as
Φ(ω̄) = E[exp(jω̄ᵀX̄)] = exp[jω̄ᵀη̄ − ½ ω̄ᵀΛω̄], (4-42)
where Λ is the covariance matrix of X̄.
Application: Transformation of Random Variables
Sometimes, the characteristic function can be used to determine the density of random
variable Y = g(X) in terms of the density of X. To see this, consider
ΦY(ω) = E[e^(jωY)] = E[e^(jωg(X))] = ∫ e^(jωg(x)) fX(x) dx. (4-43)
If a change of variable y = g(x) can be made (usually, this requires g to have an inverse), this last
integral will have the form
ΦY(ω) = ∫ e^(jωy) h(y) dy. (4-44)
The desired result fY(y) = h(y) follows (by uniqueness of the Fourier transform).
Example 4-10: Suppose X is N(0;) and Y = aX2. Then
ΦY(ω) = E[e^(jωY)] = E[e^(jωaX²)] = ∫ e^(jωax²) fX(x) dx = (2/(√(2π) σ)) ∫ from 0 to ∞ of e^(jωax²) e^(−x²/2σ²) dx.
For 0 ≤ x < ∞, note that the transformation y = ax² is one-to-one. Hence, make the change of
variable y = ax², dy = (2ax)dx = 2√(ay) dx, to obtain
ΦY(ω) = (2/(√(2π) σ)) ∫ from 0 to ∞ of e^(jωy) e^(−y/2σ²a) dy/(2√(ay)) = ∫ from 0 to ∞ of e^(jωy) [e^(−y/2σ²a)/(σ√(2πay))] dy.
Hence, we have
fY(y) = [e^(−y/2σ²a)/(σ√(2πay))] U(y).
Other Applications
Sometimes, a characteristic function is used to obtain qualitative results about a random
phenomenon of interest. For example, suppose we want to show that some random
phenomenon is Gaussian distributed. We may be able to do this by deriving the characteristic
function that describes the random phenomenon (and showing that the characteristic function has
the form given by (4-33)). In Chapter 9 of these notes, we do this for shot noise. We use
characteristic function theory to show that classical shot noise becomes Gaussian distributed as
its intensity parameter becomes large.
Moment Generating Function
The moment generating function is
ΦX(s) = ∫ fX(x) e^(sx) dx = E[e^(sX)]. (4-45)
The nth derivative of Φ is
dⁿΦ/dsⁿ (s) = ∫ xⁿ fX(x) e^(sx) dx = E[Xⁿ e^(sX)], (4-46)
so that
dⁿΦ/dsⁿ (s) evaluated at s = 0 gives E[Xⁿ] = mₙ. (4-47)
Example 4-11: Suppose X has an exponential density fX(x) = λe^(−λx)U(x). Then the moment
generating function is
Φ(s) = ∫ from 0 to ∞ of λe^(−λx) e^(sx) dx = λ/(λ − s), s < λ.
This can be differentiated to obtain
dΦ/ds evaluated at s = 0: 1/λ = E[X]
d²Φ/ds² evaluated at s = 0: 2/λ² = E[X²].
From this, we can compute the variance as
σ² = E[X²] − (E[X])² = 2/λ² − 1/λ² = 1/λ².
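The moments in Example 4-11 can be checked by integrating the exponential density numerically. The value λ = 0.8 below is a hypothetical choice:

```python
import numpy as np

# Numerical check of Example 4-11: E[X] = 1/lambda, E[X^2] = 2/lambda^2,
# variance = 1/lambda^2 (lambda = 0.8 is a hypothetical choice).
lam = 0.8
x = np.linspace(0.0, 60.0 / lam, 2_000_001)
dx = x[1] - x[0]
pdf = lam * np.exp(-lam * x)

m1 = np.sum(x * pdf) * dx           # first moment
m2 = np.sum(x**2 * pdf) * dx        # second moment
var = m2 - m1**2

err = max(abs(m1 - 1 / lam), abs(m2 - 2 / lam**2), abs(var - 1 / lam**2))
```

The grid extends to 60/λ, where the exponential tail is negligible, so the truncation error is far below the grid error.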
Theorem 4-2
Let X and Y be independent random variables. Let g(x) and h(y) be arbitrary functions. Define
the transformed random variables
Z = g(X)
W = h(Y). (4-48)
Random variables Z and W are independent.
Proof: Define
Az = {x : g(x) ≤ z}
Bw = {y : h(y) ≤ w}.
Then the joint distribution of Z and W is
FZW(z, w) = P[Z ≤ z, W ≤ w] = P[g(X) ≤ z, h(Y) ≤ w] = P[X ∈ Az, Y ∈ Bw].
However, due to independence of X and Y,
FZW(z, w) = P[X ∈ Az, Y ∈ Bw] = P[X ∈ Az] P[Y ∈ Bw]
          = P[g(X) ≤ z] P[h(Y) ≤ w] = P[Z ≤ z] P[W ≤ w]
          = FZ(z) FW(w),
so that Z and W are independent.
One Function of Two Random Variables
Given random variables X and Y and a function z = g(x,y), we form the new random
variable
Z = g(X,Y). (4-49)
We want to find the density and distribution of Z in terms of like quantities for X and Y. For
real z, denote Dz as
Dz = {(x,y) : g(x,y) ≤ z}. (4-50)
Now, note that Dz satisfies
{Z ≤ z} = {g(X,Y) ≤ z} = {(X,Y) ∈ Dz}, (4-51)
so that
FZ(z) = P[Z ≤ z] = P[(X,Y) ∈ Dz] = ∫∫ over Dz of fXY(x, y) dxdy. (4-52)
Thus, to find FZ it suffices to find the region Dz for every z
and then evaluate the above integral.
Example 4-12: Consider the function Z = X + Y. The
distribution FZ can be represented as
FZ(z) = ∫∫ over {x + y ≤ z} of fXY(x, y) dxdy.
In this integral, the region of integration is depicted by the shaded area shown on Figure 4-8.
Now, we can write
FZ(z) = ∫ from −∞ to ∞ of [∫ from −∞ to z−y of fXY(x, y) dx] dy.
By using Leibnitz’s rule (see below) for differentiating an integral, we get the density
fZ(z) = d/dz FZ(z) = d/dz ∫ from −∞ to ∞ of ∫ from −∞ to z−y of fXY(x, y) dxdy
      = ∫ from −∞ to ∞ of fXY(z − y, y) dy.
Leibnitz’s Rule: Consider the function of t defined by
Leibnitz’s Rule: Consider the function of t defined by
F(t) = ∫ from a(t) to b(t) of φ(x, t) dx.
Figure 4-8: Integrate over the shaded region x + y ≤ z (boundary x + y = z) to obtain FZ.
Note that the t variable appears in the integrand and limits. Leibnitz’s rule states that
d/dt F(t) = d/dt ∫ from a(t) to b(t) of φ(x, t) dx = ∫ from a(t) to b(t) of ∂φ(x, t)/∂t dx + φ(b(t), t) db(t)/dt − φ(a(t), t) da(t)/dt.
Special Case: X and Y Independent.
Assume that X and Y are independent. Then fXY(z-y,y) = fX(z-y)fY(y), and the previous
result becomes
fZ(z) = ∫ from −∞ to ∞ of fX(z − y) fY(y) dy, (4-53)
the convolution of fX and fY.
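The convolution formula (4-53) can be checked numerically. A sketch, assuming hypothetical independent densities fX standard exponential and fY standard normal, comparing a discrete convolution of the marginals with the known density of their sum at a smooth point:

```python
import numpy as np
from math import pi, sqrt

# Discrete-grid check of (4-53) with hypothetical choices:
# fX(x) = e^{-x}U(x), fY(y) = N(0,1) density.
dx = 1e-3
x = np.arange(0.0, 25.0, dx)
y = np.arange(-8.0, 8.0, dx)
fX = np.exp(-x)
fY = np.exp(-y**2 / 2) / sqrt(2 * pi)

fZ = np.convolve(fX, fY) * dx                 # Riemann-sum convolution
z = x[0] + y[0] + dx * np.arange(fZ.size)     # support of the full convolution

# Sanity checks: fZ integrates to 1, is nonnegative, and peaks near z ~ 0.5-1.
total = np.sum(fZ) * dx
peak_z = z[np.argmax(fZ)]
```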
Example 4-13: Consider independent random variables
X and Y with densities shown by Figure 4-9. Find
density fZ that describes the random variable Z = X + Y.
CASE I: z < - 1/2 (see Fig 4-10)
There is no overlap, so fZ(z) = 0 for z < - 1/2.
CASE II: - 1/2 < z < 1/2 (see Fig 4-11)
fZ(z) = ∫ from −1/2 to z of e^(−(z−y)) dy = 1 − e^(−(z+1/2)), −1/2 ≤ z ≤ 1/2.
Figure 4-9: Density functions used in Example 4-13: fX(x) = e^(−x)U(x) and fY(y) = 1 for −1/2 ≤ y ≤ 1/2.
Figure 4-10: Case I: z < −1/2.
Figure 4-11: Case II: −1/2 < z < 1/2.
CASE III: 1/2 < z (see Fig 4-12)
fZ(z) = ∫ from −1/2 to 1/2 of e^(−(z−y)) dy = e^(−z)[e^(1/2) − e^(−1/2)], 1/2 < z.
As shown by Figure 4-13, the final result is
fZ(z) = 0, z < −1/2
      = 1 − e^(−(z+1/2)), −1/2 ≤ z ≤ 1/2
      = [e^(1/2) − e^(−1/2)] e^(−z), 1/2 < z.
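The closed-form answer of Example 4-13 can be verified against a numerical convolution of the two densities:

```python
import numpy as np

# Check the closed-form f_Z of Example 4-13 against a numerical convolution
# of f_X(x) = e^{-x}U(x) with the uniform density on [-1/2, 1/2].
dx = 1e-3
x = np.arange(0.0, 20.0, dx)
fX = np.exp(-x)
y = np.arange(-0.5, 0.5, dx)
fY = np.ones_like(y)

fZ_num = np.convolve(fX, fY) * dx
z = x[0] + y[0] + dx * np.arange(fZ_num.size)   # support of the convolution

def fZ_closed(v):
    if v < -0.5:
        return 0.0
    if v <= 0.5:
        return 1.0 - np.exp(-(v + 0.5))
    return (np.exp(0.5) - np.exp(-0.5)) * np.exp(-v)

pts = (-1.0, 0.0, 0.3, 1.0, 3.0)
max_err = max(abs(fZ_num[np.argmin(np.abs(z - p))] - fZ_closed(p)) for p in pts)
```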
Example 4-14: Let X and Y be random variables.
Consider the transformation
Z X / Y .
For this transformation, we have
Dz = {(x,y) : x/y ≤ z},
the shaded region on the plot depicted by Figure 4-14. Now, compute the distribution
FZ(z) = ∫ from 0 to ∞ of ∫ from −∞ to yz of fXY(x, y) dxdy + ∫ from −∞ to 0 of ∫ from yz to ∞ of fXY(x, y) dxdy.
The density fZ(z) is found by differentiating FZ to obtain
Figure 4-13: Final result for Example 4-13.
Figure 4-12: Case III: 1/2 < z.
fZ(z) = d/dz FZ(z) = ∫ from 0 to ∞ of y fXY(yz, y) dy − ∫ from −∞ to 0 of y fXY(yz, y) dy = ∫ from −∞ to ∞ of |y| fXY(yz, y) dy.
Example 4-15: Consider the transformation Z = √(X² + Y²). For this transformation, the region
Dz is given by
Dz = {(x,y) : √(x² + y²) ≤ z} = {(x,y) : x² + y² ≤ z²}, (4-54)
the interior of a circle of radius z > 0. Hence, we can write
FZ(z) = P[Z ≤ z] = P[(X,Y) ∈ Dz] = ∫∫ over Dz of fXY(x, y) dxdy. (4-55)
Now, suppose X and Y are independent, jointly Gaussian random variables with
fXY(x, y) = (1/(2πσ²)) exp[−(x² + y²)/2σ²]. (4-56)
Figure 4-14: Integrate over the shaded region Dz (boundary x = yz, drawn for the case z > 0) to obtain FZ for Example 4-14.
Substitute (4-56) into (4-55) to obtain
FZ(z) = ∫∫ over Dz of (1/(2πσ²)) exp[−(x² + y²)/2σ²] dxdy.
To integrate this, use Figure 4-15, and
transform from rectangular to polar
coordinates
r = √(x² + y²), r ≥ 0
θ = tan⁻¹(y/x), −π < θ ≤ π
dA = r dr dθ.
The change to polar coordinates yields
FZ(z) = ∫ from 0 to 2π of ∫ from 0 to z of (1/(2πσ²)) exp[−r²/2σ²] r dr dθ.
The integrand does not depend on θ, so the
integral over θ is elementary. For the
integral over r, let u = r²/2σ² and du = (r/σ²)dr to obtain
FZ(z) = ∫ from 0 to z of (1/σ²) exp[−r²/2σ²] r dr = ∫ from 0 to z²/2σ² of e^(−u) du = 1 − e^(−z²/2σ²), z ≥ 0,
Figure 4-15: Rectangular-to-polar transformation (x = r cos θ, y = r sin θ) and differential area dA = r dr dθ that support Example 4-15.
so that
fZ(z) = d/dz FZ(z) = (z/σ²) e^(−z²/2σ²), z ≥ 0,
a Rayleigh density with parameter σ. Hence, if X and Y are identically distributed, independent,
zero-mean Gaussian random variables, then Z = √(X² + Y²) is Rayleigh distributed.
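A quick Monte Carlo check of this Rayleigh result (σ = 2 is a hypothetical choice):

```python
import numpy as np

rng = np.random.default_rng(9)

sigma = 2.0                             # hypothetical common standard deviation
xy = rng.normal(0.0, sigma, size=(2, 400_000))
z = np.hypot(xy[0], xy[1])              # Z = sqrt(X^2 + Y^2)

# F_Z(z) = 1 - exp(-z^2/(2 sigma^2)) for z >= 0 (Rayleigh).
pts = np.array([1.0, 2.0, 4.0, 6.0])
emp = np.mean(z[None, :] <= pts[:, None], axis=1)
thy = 1.0 - np.exp(-pts**2 / (2 * sigma**2))
max_err = np.max(np.abs(emp - thy))
```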
Two Functions of Two Random Variables
Given random variables X and Y and functions z = g(x,y), w = h(x,y), we form the new random
variables
Z = g(X,Y) (4-57)
W = h(X,Y).
Express the joint statistics of Z, W in terms of functions g, h and fXY. To accomplish this, define
Dzw = {(x,y) : g(x,y) ≤ z, h(x,y) ≤ w}. (4-58)
Then, the joint distribution of Z and W can be expressed as
FZW(z, w) = P[(X,Y) ∈ Dzw] = ∫∫ over Dzw of fXY(x, y) dxdy. (4-59)
Example 4-16: Consider independent Gaussian X and Y with the joint density function
fXY(x, y) = (1/(2πσ²)) exp[−(x² + y²)/2σ²].
Define random variables Z and W in terms of X and Y by
the transformations
Z = √(X² + Y²)
W = Y/X.
Find FZW, FZ and FW. First, define region Dzw as
Dzw = {(x,y) : x² + y² ≤ z², y/x ≤ w}.
Dzw is the shaded region on Figure 4-16. The figure is drawn for the case w > 0 (the case w < 0
gives results that are identical to those given below). Now, integrate over Dzw to obtain
FZW(z, w) = ∫∫ over Dzw of fXY(x, y) dxdy = 2 ∫ from −π/2 to Tan⁻¹(w) of ∫ from 0 to z of (1/(2πσ²)) e^(−r²/2σ²) r dr dθ
          = (1/π)[Tan⁻¹(w) + π/2] ∫ from 0 to z of (1/σ²) e^(−r²/2σ²) r dr,
which leads to
FZW(z, w) = [½ + (1/π) Tan⁻¹(w)][1 − e^(−z²/2σ²)], z ≥ 0, −∞ < w < ∞
          = 0, z < 0. (4-60)
Note that FZW factors into the product FZ FW, where
Figure 4-16: Integrate over the shaded region Dzw (boundary y = xw, drawn for the case w > 0) to obtain FZW for Example 4-16.
FZ(z) = {1 − e^(−z²/2σ²)} U(z)
FW(w) = ½ + (1/π) Tan⁻¹(w), −∞ < w < ∞. (4-61)
Note that Z and W are independent, Z is Rayleigh distributed and W is Cauchy distributed.
Joint Density Transformations: Determine fZW Directly in Terms of fXY.
Let X and Y be random variables with joint density fXY(x,y). Let
z = g(x,y) (4-62)
w = h(x,y)
be (generally nonlinear) functions that relate algebraic variables x, y to the algebraic variables z,
w. Also, we assume that g and h have continuous first-partial derivatives at the point (x,y) used
below. Now, define the new random variables
Z = g(X,Y)
W = h(X,Y).
(4-63)
In this section, we provide a method for determining the joint density fZW(z,w) directly in terms
of the known joint density fXY(x,y).
First, consider the relatively simple case where (4-62) can be inverted. That is, it is
possible to solve (4-62) for unique functions
x = φ(z, w)
y = ψ(z, w) (4-64)
that give x, y in terms of z, w. Note that z = g(φ(z,w), ψ(z,w)) and w = h(φ(z,w), ψ(z,w)) since
(4-64) is the inverse of (4-62). Later, we will consider the general case where the
transformation cannot be inverted.
The quantity P[z < Z ≤ z + dz, w < W ≤ w + dw] is the probability that random variables
Z and W lie in the infinitesimal rectangle R1 illustrated on Figure 4-17. The area of this
infinitesimal rectangle is AREA(R1) = dzdw. The vertices of the z-w plane rectangle R1 are the
points
P1 = (z, w)
P2 = (z, w + dw)
P3 = (z + dz, w + dw)
P4 = (z + dz, w). (4-65)
Figure 4-17: (z,w) and (x,y) planes used in the transformation of two random variables. Functions φ, ψ transform from the z-w plane to the x-y plane; functions g, h transform from the x-y plane to the z-w plane.
The z-w plane infinitesimal rectangle R1 gets mapped into the x-y plane, where it shows up as
parallelogram R2. As shown on the x-y plane of Figure 4-17, to first order in dz and dw,
parallelogram R2 has the vertices
P1′ = (x, y)                    P3′ = (x + φz dz + φw dw, y + ψz dz + ψw dw)
P2′ = (x + φw dw, y + ψw dw)    P4′ = (x + φz dz, y + ψz dz), (4-66)
where φz, φw, ψz, ψw denote the partial derivatives ∂φ/∂z, ∂φ/∂w, ∂ψ/∂z, ∂ψ/∂w.
The requirement that (4-64) have continuous first-partial derivatives was used to write (4-66).
Note that P1 maps to P1′, P2 maps to P2′, etc. (It is easy to show that P2′ − P1′ = P3′ − P4′ and
P4′ − P1′ = P3′ − P2′, so that we have a parallelogram in the x-y plane.) Denote the area of the
x-y plane parallelogram R2 as AREA(R2).
If random variables Z, W fall in the z-w plane infinitesimal rectangle R1, then the random
variables X, Y must lie in the x-y plane parallelogram R2, and vice-versa. In fact, we can claim
P[z < Z ≤ z + dz, w < W ≤ w + dw] = P[(X, Y) ∈ R2]
fZW(z, w) dz dw ≈ fXY(x, y) AREA(R2)
fZW(z, w) AREA(R1) ≈ fXY(x, y) AREA(R2), (4-67)
where the approximation becomes exact as dz and dw approach zero. Since AREA(R1) = dzdw,
Equation (4-67) yields the desired fZW once an expression for AREA(R2) is obtained.
Figure 4-18 depicts the x-y plane parallelogram R2 for which area AREA(R2) must be
obtained. This parallelogram has sides P1′P2′ and P1′P4′ (shown as vectors with arrowheads on
Fig. 4-18) that can be represented as
P1′P4′ = φz dz î + ψz dz ĵ
P1′P2′ = φw dw î + ψw dw ĵ, (4-68)
where î and ĵ are unit vectors in the x and y directions, respectively. Now, the vector cross
product of sides P1′P4′ and P1′P2′ is denoted as P1′P4′ × P1′P2′. And, the area of parallelogram R2
is the magnitude |P1′P4′| |P1′P2′| sin(θ) = |P1′P4′ × P1′P2′|, where θ is the positive angle between
the vectors. Since î × ĵ = k̂, ĵ × î = −k̂, and î × î = ĵ × ĵ = k̂ × k̂ = 0, we write
AREA(R2) = |P1′P4′ × P1′P2′| = |det[ î ĵ k̂ ; φz dz  ψz dz  0 ; φw dw  ψw dw  0 ]| = |det[ φz ψz ; φw ψw ]| dzdw. (4-69)
In the literature, the last determinant on the right-hand side of (4-69) is called the Jacobian of
the transformation (4-64); symbolically, it is denoted as J(x,y); alternatively, the notation
∂(x,y)/∂(z,w) may be used. We write
J(x,y) = ∂(x,y)/∂(z,w) = det[ ∂x/∂z  ∂x/∂w ; ∂y/∂z  ∂y/∂w ] = det[ φz φw ; ψz ψw ]. (4-70)
Figure 4-18: Parallelogram in x-y plane.
Finally, substitute (4-69) into (4-67), cancel out the dzdw term that is common to both sides, and
obtain the desired result
fZW(z, w) = fXY(x, y) |∂(x,y)/∂(z,w)| evaluated at x = φ(z,w), y = ψ(z,w), (4-71)
a formula for the density fZW in terms of the density fXY. It is possible to obtain (4-71) directly
from the change of variable formula in multi-dimensional integrals; this fact is discussed briefly
in Appendix 4A.
It is useful to think of (4-69) as
AREA(R2) = |∂(x,y)/∂(z,w)| AREA(R1), (4-72)
a relationship between AREA(R2) and AREA(R1). So, the Jacobian can be thought of as the
“area gain” imposed by the transformation (the Jacobian shows how area is scaled by the
transformation).
By considering the mapping of a rectangle on the x-y plane to a parallelogram on the z-w plane
(i.e., in the argument just given, switch planes so that the rectangle is in the x-y plane
and the parallelogram is in the z-w plane), it is not difficult to show
fXY(x, y) = fZW(z, w) |∂(z,w)/∂(x,y)|, (4-73)
where (x,y) and (z,w) are related by (4-62) and (4-64). Now, substitute (4-73) into (4-71) to
obtain
fZW(z, w) = fZW(z, w) |∂(z,w)/∂(x,y)| |∂(x,y)/∂(z,w)|, (4-74)
where (x,y) and (z,w) are related by (4-62) and (4-64).
Equation (4-74) leads to the conclusion
|∂(z,w)/∂(x,y)| |∂(x,y)/∂(z,w)| = 1, (4-75)
where (x,y) and (z,w) are related by (4-62) and (4-64).
Sometimes, the Jacobian (z,w)/(x,y) is easier to compute than the Jacobian
(x,y)/(z,w); Equation (4-75) tells us that the former is the numerical inverse of the latter. In
terms of the Jacobian (z,w)/(x,y), Equation (4-71) becomes
fZW(z, w) = fXY(x, y) / |∂(z,w)/∂(x,y)| evaluated at x = φ(z,w), y = ψ(z,w), (4-76)
which may be easier to evaluate than (4-71).
Often, the original transformation (4-62) does not have an inverse. That is, it may not be
possible to find unique functions φ and ψ as described by (4-64). In this case, we must solve
(4-62) for its real-valued roots xk(z,w), yk(z,w), 1 ≤ k ≤ n, where n > 1. These n roots depend on
z and w; each pair (xk, yk) “covers” a different part of the x-y plane. Note that
z = g(xk,yk), w = h(xk,yk) (4-77)
for each root, 1 k n. For this case, a simple extension of (4-71) leads to
fZW(z, w) = Σ from k = 1 to n of fXY(x, y) |∂(x,y)/∂(z,w)| evaluated at (x, y) = (xk, yk), (4-78)
and the generalization of (4-76) is
fZW(z, w) = Σ from k = 1 to n of fXY(x, y) / |∂(z,w)/∂(x,y)| evaluated at (x, y) = (xk, yk). (4-79)
That is, to obtain fZW(z,w), we evaluate the right-hand side of (4-71) (or (4-76)) at each of
the n roots xk(z,w), yk(z,w), 1 ≤ k ≤ n, and sum the results.
Example 4-17: Consider the linear transformation
z = ax + by,  i.e.,  [z w]ᵀ = [a b ; c d][x y]ᵀ,
w = cx + dy
where ad − bc ≠ 0. This transformation has an inverse. It is possible to express
[x y]ᵀ = [a b ; c d]⁻¹ [z w]ᵀ,  i.e.,  x = Az + Bw, y = Cz + Dw,
where A, B, C and D are appropriate constants (can you find A, B, C and D??). Now, compute
∂(z,w)/∂(x,y) = det[ a b ; c d ] = ad − bc.
If X and Y are random variables described by fXY(x,y), the density function for random variables
Z = aX + bY, W = cX + dY is
fZW(z, w) = fXY(Az + Bw, Cz + Dw) / |ad − bc|.
Example 4-18: Consider X̄, an n×1, zero-mean Gaussian random vector with positive definite
covariance matrix Λx. Define Ȳ = AX̄, where A is an n×n nonsingular matrix. Note that
yi = Σ from k = 1 to n of aik xk, 1 ≤ i ≤ n.
As discussed previously, the density for X̄ is
fX(X̄) = (1/((2π)^(n/2) |Λx|^(1/2))) exp[−½ X̄ᵀ Λx⁻¹ X̄].
Since A is invertible, we can write fY(Ȳ) as
fY(Ȳ) = fX(X̄) / |∂(Ȳ)/∂(X̄)| evaluated at X̄ = A⁻¹Ȳ,
where
∂(Ȳ)/∂(X̄) = det[ ∂yi/∂xj ] = det[ aij ] = det[A],
so that |∂(Ȳ)/∂(X̄)| = |det[A]| is the absolute value of the determinant of the matrix A. Note that
ΛY = E[ȲȲᵀ] = E[(AX̄)(AX̄)ᵀ] = A E[X̄X̄ᵀ] Aᵀ = A Λx Aᵀ, so that
Λx⁻¹ = Aᵀ ΛY⁻¹ A.
This leads to the result
fY(Ȳ) = (1/((2π)^(n/2) |Λx|^(1/2) |det A|)) exp[−½ (A⁻¹Ȳ)ᵀ Λx⁻¹ (A⁻¹Ȳ)]
      = (1/((2π)^(n/2) |Λx|^(1/2) |det A|)) exp[−½ Ȳᵀ (A⁻¹)ᵀ Λx⁻¹ A⁻¹ Ȳ],
a result rewritten as
fY(Ȳ) = (1/((2π)^(n/2) |ΛY|^(1/2))) exp[−½ Ȳᵀ ΛY⁻¹ Ȳ],
where ΛY = A Λx Aᵀ is the covariance of Gaussian random vector Ȳ. This example leads to the
general, very important result that linear transformations of Gaussian random vectors
produce Gaussian random vectors (remember this!!).
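The covariance relationship ΛY = AΛxAᵀ from Example 4-18 is easy to check by simulation. The matrices below are hypothetical choices (A nonsingular, Λx positive definite):

```python
import numpy as np

rng = np.random.default_rng(10)

# Hypothetical 3-dimensional case: check Lambda_Y = A Lambda_x A^T.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0],
              [3.0, 0.0, 1.0]])        # nonsingular
Lx = np.array([[2.0, 0.5, 0.0],
               [0.5, 1.0, 0.2],
               [0.0, 0.2, 1.5]])       # positive definite covariance

X = rng.multivariate_normal(np.zeros(3), Lx, size=500_000).T   # 3 x N samples
Y = A @ X                               # linear transformation Y = A X

Ly_sample = (Y @ Y.T) / Y.shape[1]      # sample covariance (zero mean)
Ly_theory = A @ Lx @ A.T
max_err = np.max(np.abs(Ly_sample - Ly_theory))
```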
Example 4-19 (Polar Coordinates): Consider the
transformation
r = √(x² + y²), 0 ≤ r < ∞
θ = tan⁻¹(y/x), −π < θ ≤ π,
that is illustrated by Figure 4-19. With θ limited to the range (−π, π], the
transformation has the inverse
x = r cos(θ)
y = r sin(θ),
and the Jacobian is
∂(x,y)/∂(r,θ) = det[ cos θ  −r sin θ ; sin θ  r cos θ ] = r,
so that
frθ(r, θ) = fXY(x, y) |∂(x,y)/∂(r,θ)| evaluated at x = r cos θ, y = r sin θ
          = r fXY(r cos θ, r sin θ)
for r > 0 and −π < θ ≤ π. Suppose that X and Y are independent, jointly Gaussian, zero mean
x-axis
y-axis
r
Figure 4-19: Polar coordinate transfor-mations used in Example 4-19.
EE385 Class Notes 7/6/2015 John Stensby
Updates at http://www.ece.uah.edu/courses/ee385/ 4-36
with a common variance 2. For this case, the above result yields
XY XY
r
2 2
r 2 2x r cosy r sin
2
2 2
f ( )f (r)
(x, y) r {r cos } {r sin }f (r, ) f (x, y) r f (r cos , r sin ) exp
(r, ) 2 2
1 r rexp .
2 2
Note that r and are independent, r is Rayleigh and is uniform over (-].
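This factorization underlies the familiar Box–Muller-style sampling trick: draw r from the Rayleigh density and θ uniform on (−π, π], and x = r cos θ, y = r sin θ come out jointly Gaussian. A minimal seeded sketch (pure Python; the function name `polar_gaussian_pair` is illustrative):

```python
import math
import random

def polar_gaussian_pair(sigma, rng):
    """Draw (x, y) by sampling r ~ Rayleigh(sigma) and theta ~ Uniform(-pi, pi]."""
    # Inverse-CDF sample of the Rayleigh density (r/sigma^2) exp(-r^2 / 2 sigma^2);
    # 1 - rng.random() lies in (0, 1], so the log is always finite.
    r = sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
    theta = rng.uniform(-math.pi, math.pi)
    return r * math.cos(theta), r * math.sin(theta)

rng = random.Random(0)
samples = [polar_gaussian_pair(1.0, rng) for _ in range(200_000)]
xs = [s[0] for s in samples]
mean_x = sum(xs) / len(xs)
var_x = sum(x * x for x in xs) / len(xs)
```

With σ = 1 the x-component should have mean near 0 and variance near 1, which the empirical estimates confirm to within sampling error.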
Example 4-20: Consider the random variables Z = g(X,Y) and W = h(X,Y) where

$$ z = g(x,y) = \sqrt{x^2 + y^2}, \qquad w = h(x,y) = y/x. $$   (4-80)

Transformation (4-80) has roots (x1, y1) and (x2, y2) given by

$$ x_1 = \frac{z}{(1+w^2)^{1/2}}, \quad y_1 = w x_1 = \frac{wz}{(1+w^2)^{1/2}}, \qquad x_2 = -\frac{z}{(1+w^2)^{1/2}}, \quad y_2 = w x_2 = -\frac{wz}{(1+w^2)^{1/2}} $$   (4-81)

for −∞ < w < ∞ and z > 0; the transformation has no real roots for z < 0. A direct evaluation of the Jacobian leads to

$$ \frac{\partial(z,w)}{\partial(x,y)} = \det \begin{bmatrix} \partial z/\partial x & \partial z/\partial y \\ \partial w/\partial x & \partial w/\partial y \end{bmatrix} = \det \begin{bmatrix} x(x^2+y^2)^{-1/2} & y(x^2+y^2)^{-1/2} \\ -y/x^2 & 1/x \end{bmatrix}, $$

which can be expressed as

$$ \frac{\partial(z,w)}{\partial(x,y)} = (x^2+y^2)^{-1/2} \left( 1 + \frac{y^2}{x^2} \right). $$   (4-82)

When evaluated at both (x1, y1) and (x2, y2), the Jacobian yields

$$ \left. \frac{\partial(z,w)}{\partial(x,y)} \right|_{(x_1,\, y_1)} = \left. \frac{\partial(z,w)}{\partial(x,y)} \right|_{(x_2,\, y_2)} = \frac{1 + w^2}{z}. $$   (4-83)

Finally, application of (4-78) leads to the desired result

$$ f_{ZW}(z,w) = \frac{z}{1+w^2} \left[ f_{XY}(x_1, y_1) + f_{XY}(x_2, y_2) \right], \qquad z \ge 0, \quad -\infty < w < \infty, $$   (4-84)

where (x1,y1) and (x2,y2) are given by (4-81). If, for example, X and Y are independent, zero-mean Gaussian random variables with the joint density

$$ f_{XY}(x,y) = \frac{1}{2\pi\sigma^2} \exp\!\left[ -(x^2 + y^2)/2\sigma^2 \right], $$   (4-85)

then we obtain the transformed density

$$ f_{ZW}(z,w) = \left[ \frac{z}{\sigma^2} \exp\!\left( -z^2/2\sigma^2 \right) U(z) \right] \left[ \frac{1/\pi}{1+w^2} \right] = f_Z(z)\, f_W(w), $$   (4-86)

where

$$ f_Z(z) = \frac{z}{\sigma^2} \exp\!\left( -z^2/2\sigma^2 \right) U(z), \qquad f_W(w) = \frac{1/\pi}{1+w^2}. $$   (4-87)
Thus, random variables Z and W are independent, Z is Rayleigh, and W is Cauchy.
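This conclusion can be spot-checked by simulation: for independent zero-mean Gaussian X and Y, the empirical distribution of W = Y/X should match the Cauchy CDF, F_W(w) = 1/2 + tan⁻¹(w)/π, so about 3/4 of the samples should fall below w = 1. A seeded pure-Python sketch:

```python
import random

rng = random.Random(1)
N = 200_000
ws = []
for _ in range(N):
    x = rng.gauss(0.0, 1.0)
    y = rng.gauss(0.0, 1.0)
    ws.append(y / x)  # W = Y/X should be standard Cauchy

# Empirical CDF of W at w = 1; the Cauchy CDF predicts 1/2 + atan(1)/pi = 3/4.
frac_below_1 = sum(1 for w in ws if w <= 1.0) / N
# Empirical CDF at w = 0; symmetry predicts 1/2.
frac_below_0 = sum(1 for w in ws if w <= 0.0) / N
```

Both empirical fractions land within sampling error of the Cauchy predictions, even though W itself has no finite mean or variance.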
Linear Transformations of Gaussian Random Variables

Let yi, 1 ≤ i ≤ n, be zero mean, unit variance, independent (which is equivalent to being uncorrelated in the Gaussian case) Gaussian random variables. Define the Gaussian random vector Y = [y1 y2 ... yn]^T. Note that E[Y] = 0 and the covariance matrix is Λy = E[YY^T] = I, an n×n identity matrix. Hence, we have

$$ f_y(Y) = \frac{1}{(2\pi)^{n/2}} \exp\!\left( -\tfrac{1}{2}\, Y^T Y \right). $$   (4-88)
Now, let A be an n×n nonsingular, real-valued matrix, and consider the linear transformation

X = AY.   (4-89)

The transformation is one-to-one. For every Y there is but one X, and for every X there is but one Y = A⁻¹X. We can express the density of X in terms of the density of Y as

$$ f_x(X) = \left. \frac{f_y(Y)}{\mathrm{abs}[J]} \,\right|_{Y = A^{-1}X}, $$   (4-90)
where

$$ J = \det \begin{bmatrix} \partial x_1/\partial y_1 & \partial x_1/\partial y_2 & \cdots & \partial x_1/\partial y_n \\ \partial x_2/\partial y_1 & \partial x_2/\partial y_2 & \cdots & \partial x_2/\partial y_n \\ \vdots & \vdots & & \vdots \\ \partial x_n/\partial y_1 & \partial x_n/\partial y_2 & \cdots & \partial x_n/\partial y_n \end{bmatrix} = \det[A] \ne 0. $$   (4-91)
Hence, we have

$$ f_x(X) = \frac{1}{\lvert \det A \rvert}\, f_y(A^{-1}X) = \frac{1}{(2\pi)^{n/2}\, \lvert \det A \rvert} \exp\!\left( -\tfrac{1}{2}\, (A^{-1}X)^T (A^{-1}X) \right) = \frac{1}{(2\pi)^{n/2}\, \lvert \det A \rvert} \exp\!\left( -\tfrac{1}{2}\, X^T (A^{-1})^T A^{-1} X \right), $$   (4-92)

which can be written as

$$ f_x(X) = \frac{1}{(2\pi)^{n/2}\, \lvert \Lambda_x \rvert^{1/2}} \exp\!\left( -\tfrac{1}{2}\, X^T \Lambda_x^{-1} X \right), $$   (4-93)

where Λx⁻¹ = (A⁻¹)^T A⁻¹, which leads to the requirement that

Λx = AA^T.   (4-94)

Since A is nonsingular (a requirement on the selection of A), Λx is positive definite. In this development, we used |Λx| = |AA^T| = |A||A^T| = |A|² so that |det A| = |Λx|^1/2.
It is important to note that X = AY is zero mean Gaussian with a covariance matrix given by Λx = AA^T. Note that a linear transformation of Gaussian random variables produces Gaussian random variables.
Consider the converse problem: given a zero mean Gaussian vector X with positive definite covariance matrix Λx, find a nonsingular transformation matrix A so that X = AY, where Y is zero mean Gaussian with covariance matrix Λy = I (identity matrix). The implication is profound: Y = A⁻¹X says that it is possible to transform a Gaussian vector with correlated entries into a Gaussian vector made of uncorrelated (and independent) random variables. We can remove correlation by properly transforming the original vector. Clearly, we must find a matrix A that satisfies

AA^T = Λx.   (4-95)

The solution to this problem comes from linear algebra. Given any positive definite symmetric matrix Λx, there exists a nonsingular matrix P such that

P^T Λx P = I,   (4-96)

which means that Λx = (P^T)⁻¹P⁻¹ = (P⁻¹)^T P⁻¹ (we say that Λx is congruent to I). Compare this to the result given above to see that matrix A can be found by using

A = (P⁻¹)^T = (P^T)⁻¹.   (4-97)
The procedure for finding P is simple:
1) Use the given Λx to write the augmented matrix [Λx | I].
2) Do elementary row and column operations until the augmented matrix becomes [I | P^T]. Each row operation is applied across the entire augmented matrix, while the matching column operation is applied to the left-hand block only. The elementary operations are
i) interchange two rows (columns)
ii) multiply a row (column) by a scalar
iii) add a multiple of one row (column) to another row (column).
3) Write the desired A as A = (P^T)⁻¹.
Example 4-21: Suppose we are given the covariance matrix

$$ \Lambda_x = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix}. $$

First, write the augmented matrix

$$ [\Lambda_x \,|\, I] = \left[ \begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 2 & 5 & 0 & 1 \end{array} \right]. $$

1) Add to the 2nd row −2×(1st row) to obtain

$$ \left[ \begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 0 & 1 & -2 & 1 \end{array} \right]. $$

2) Add to the 2nd column −2×(1st column) to obtain

$$ \left[ \begin{array}{cc|cc} 1 & 0 & 1 & 0 \\ 0 & 1 & -2 & 1 \end{array} \right] = [\, I \,|\, P^T \,]. $$

3) $$ P^T = \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix}. $$

4) $$ A = (P^T)^{-1} = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}. $$

Check Results: is P^TΛxP = I? (Yes!) Check Results: is AA^T = Λx? (Yes!)
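The reduction above can be automated. One systematic way to produce a matrix A with AA^T = Λx (not the row/column procedure itself, but an equivalent construction) is the Cholesky factorization; the sketch below (pure Python, assuming a positive definite symmetric input) reproduces the A found in this example:

```python
import math

def cholesky(S):
    """Lower-triangular L with L L^T = S, for positive definite symmetric S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)   # diagonal entry
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]  # below-diagonal entry
    return L

Lx = [[1.0, 2.0], [2.0, 5.0]]
A = cholesky(Lx)   # lower-triangular factor of Example 4-21's covariance
```

For this Λx the factor comes out to [[1, 0], [2, 1]], exactly the A obtained from the augmented-matrix reduction.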
Example 4-22: Consider the covariance matrix

$$ \Lambda_x = \begin{bmatrix} 2 & 0 & 3 \\ 0 & 1 & 0 \\ 3 & 0 & 10 \end{bmatrix}. $$

Now, write the augmented matrix

$$ [\Lambda_x \,|\, I] = \left[ \begin{array}{ccc|ccc} 2 & 0 & 3 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 3 & 0 & 10 & 0 & 0 & 1 \end{array} \right]. $$

Add to the 3rd row −3/2 times the 1st row; add to the 3rd column −3/2 times the 1st column:

$$ \left[ \begin{array}{ccc|ccc} 2 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 11/2 & -3/2 & 0 & 1 \end{array} \right]. $$

Multiply the 1st row by 1/√2; multiply the 1st column by 1/√2:

$$ \left[ \begin{array}{ccc|ccc} 1 & 0 & 0 & 1/\sqrt{2} & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 11/2 & -3/2 & 0 & 1 \end{array} \right]. $$

Multiply the 3rd row by √(2/11); multiply the 3rd column by √(2/11):

$$ \left[ \begin{array}{ccc|ccc} 1 & 0 & 0 & 1/\sqrt{2} & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & -3/\sqrt{22} & 0 & \sqrt{2/11} \end{array} \right] = [\, I \,|\, P^T \,]. $$

Finally, compute

$$ A = (P^T)^{-1} = \begin{bmatrix} \sqrt{2} & 0 & 0 \\ 0 & 1 & 0 \\ 3/\sqrt{2} & 0 & \sqrt{11/2} \end{bmatrix}. $$

Check Results: Λx = AA^T? (YES!)
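The final check is mechanical; a short pure-Python verification that AA^T = Λx for this 3×3 result (matrix entries taken from the example):

```python
import math

# A = (P^T)^{-1} from Example 4-22.
A = [[math.sqrt(2.0), 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [3.0 / math.sqrt(2.0), 0.0, math.sqrt(11.0 / 2.0)]]

# Compute AA^T entry by entry.
n = 3
AAT = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(n)]
       for i in range(n)]

# The covariance matrix the example started from.
Lx = [[2.0, 0.0, 3.0],
      [0.0, 1.0, 0.0],
      [3.0, 0.0, 10.0]]
```

Up to floating-point roundoff, AA^T reproduces Λx entry for entry.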