EE385 Class Notes 7/6/2015 John Stensby
Updates at http://www.ece.uah.edu/courses/ee385/
Chapter 4 - Function of Random Variables
Let X denote a random variable with known density fX(x) and distribution FX(x). Let y =
g(x) denote a real-valued function of the real variable x. Consider the transformation
Y = g(X). (4-1)
This is a transformation of the random variable X into the random variable Y. Random variable
X() is a mapping from the sample space into the real line. But so is g(X()). We are interested
in methods for finding the density fY(y) and the distribution FY(y).
When dealing with Y = g(X()), there are a few technicalities that should be considered.
1. The domain of g should include the range of X.
2. For every y, the set {Y = g(X) ≤ y} must be an event. That is, the set {ω ∈ S : Y(ω) = g(X(ω)) ≤ y} must be in F (i.e., it must be an event).
3. The events {Y = g(X) = ±∞} must be assigned a probability of zero.
In practice, these technicalities are assumed to hold, and they do not cause any problems.
Define the indexed set
Iy = {x : g(x) ≤ y}, (4-2)
the composition of which changes with y. The distribution of Y can be expressed as
FY(y) = P[Y ≤ y] = P[g(X) ≤ y] = P[X ∈ Iy]. (4-3)
This provides a practical method for computing the distribution function.
Example 4-1: Consider the function y = g(x) = ax + b, where a > 0 and b are constants.
Iy = {x : g(x) = ax + b ≤ y} = {x : x ≤ (y − b)/a},
EE385 Class Notes 7/6/2015 John Stensby
Updates at http://www.ece.uah.edu/courses/ee385/ 4-2
so that
FY(y) = P[X ∈ Iy] = P[X ≤ (y − b)/a] = FX((y − b)/a).
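Example 4-1 can be checked numerically. The sketch below is hypothetical: it assumes X is standard normal and picks a = 2, b = 3 (any a > 0 works), then compares the empirical distribution of Y = aX + b against FX((y − b)/a):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)

# Hypothetical example: X ~ N(0,1), a = 2, b = 3 (any a > 0 works).
a, b = 2.0, 3.0
x = rng.standard_normal(200_000)
y = a * x + b

def F_X(v):
    # standard normal distribution function
    return 0.5 * (1.0 + erf(v / sqrt(2.0)))

# Compare the empirical CDF of Y with F_X((y - b)/a) at a few points.
max_err = max(abs(np.mean(y <= yv) - F_X((yv - b) / a))
              for yv in (0.0, 2.0, 3.0, 4.0, 6.0))
print("max CDF error:", max_err)
```

With 200,000 samples, the empirical and predicted distribution functions should agree to within about 0.01.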
Example 4-2: Given random variable X and function y = g(x) = x² as shown by Figure 4-1.
Define Y = g(X) = X², and find FY(y). If y < 0, then there are no values of x such that x² ≤ y.
Hence
FY(y) = 0, y < 0.
If y ≥ 0, then x² ≤ y for −√y ≤ x ≤ √y. Hence, Iy = {x : g(x) ≤ y} = {−√y ≤ x ≤ √y}, and
FY(y) = P[−√y ≤ X ≤ √y] = FX(√y) − FX(−√y), y ≥ 0.
Special Considerations
Special consideration is due for functions g(x) that have “flat spots” and/or jump
discontinuities. These cases are considered next.
Watch for places where g(x) is constant (“flat spots”). Suppose g(x) is constant on the
interval (x0, x1]. That is, g(x) = y1, x0 < x ≤ x1, where y1 is a constant, and g(x) ≠ y1 off (x0, x1]. Hence, all of the probability that X has in the interval x0 < x ≤ x1 is assigned to the single
value Y = y1 so that
Figure 4-1: Quadratic transformation y = g(x) = x² used in Example 4-2.
EE385 Class Notes 7/6/2015 John Stensby
Updates at http://www.ece.uah.edu/courses/ee385/ 4-3
P[Y = y1] = P[x0 < X ≤ x1] = FX(x1) − FX(x0). (4-4)
That is, FY(y) has a jump discontinuity at y = y1. The amount of jump is FX(x1) − FX(x0). As an
example, consider the case of a saturating amplifier/limiter transformation.
Example 4-3 (Saturating Amplifier/Limiter): In terms of FX(x), find the distribution FY(y) for
Y = g(X) where
g(x) = b, x > b
     = x, −b < x ≤ b
     = −b, x ≤ −b.
Both FX and y = g(x) are illustrated by Figure 4-2.
Case: y ≥ b
For this case we have g(x) ≤ y for all x. Therefore FY(y) = 1 for y ≥ b.
Case: −b ≤ y < b
For −b ≤ y < b, we have g(x) ≤ y for x ≤ y. Hence, FY(y) = P[Y = g(X) ≤ y] = FX(y), −b ≤ y < b.
Case: y < −b
For y < −b, we have g(x) ≤ y for NO x. Hence, FY(y) = 0, y < −b.
The result of these cases is shown by Figure 4-3.
Figure 4-2: Transformation y = g(x) and distribution FX(x) used in Ex. 4-3.
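The three cases of Example 4-3 can be checked by simulation. The sketch below assumes a hypothetical choice of X standard normal and limit b = 1; FX is then the standard normal distribution function:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

b = 1.0                               # hypothetical limiter level
x = rng.standard_normal(500_000)
y = np.clip(x, -b, b)                 # Y = g(X), the saturating amplifier/limiter

F_X = lambda v: 0.5 * (1.0 + erf(v / sqrt(2.0)))

# -b <= y < b: F_Y(y) = F_X(y); jumps of F_X(-b) at y = -b and 1 - F_X(b-) at y = b.
err_mid  = abs(np.mean(y <= 0.5) - F_X(0.5))
err_low  = abs(np.mean(y == -b) - F_X(-b))          # P[Y = -b] = F_X(-b)
err_high = abs(np.mean(y == b) - (1.0 - F_X(b)))    # P[Y = b] = 1 - F_X(b-)
```

Since X is continuous here, FX(b⁻) = FX(b), so the two jump probabilities can be compared directly with FX(±b).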
Watch for points where g(x) has a jump discontinuity
As shown by the following two examples, special
care may be required when dealing with functions that
have jump discontinuities.
Example 4-4: In terms of FX(x), find the distribution FY(y)
for Y = g(X), where
g(x) = x + c, x ≥ 0
     = x − c, x < 0
as depicted by Figure 4-4 to the right.
Case y ≥ c: If y ≥ c then g(x) ≤ y for x ≤ y − c. Hence,
FY(y) = FX(y − c) for y ≥ c.
Case −c ≤ y < c: If −c ≤ y < c then g(x) ≤ y for x < 0. Hence,
FY(y) = P[X < 0] = FX(0⁻) for −c ≤ y < c.
Case y < −c: If y < −c then g(x) ≤ y for x ≤ y + c. Hence,
FY(y) = FX(y + c) for y < −c.
Example 4-5: In terms of FX(x), find the distribution FY(y)
for Y = g(X) where
g(x) = x + c, x > 0
     = x − c, x ≤ 0
as depicted by Figure 4-5.
Case y ≥ c: If y ≥ c then g(x) ≤ y for x ≤ y − c. Hence,
Figure 4-3: Result for Ex 4-3 (FY(y) has jumps of FX(−b) at y = −b and 1 − FX(b⁻) at y = b).
Figure 4-4: Transformation for Example 4-4.
FY(y) = FX(y − c) for y ≥ c.
Case −c ≤ y < c: If −c ≤ y < c then g(x) ≤ y for x ≤ 0. Hence,
FY(y) = P[X ≤ 0] = FX(0) for −c ≤ y < c.
Case y < −c: If y < −c then g(x) ≤ y for x ≤ y + c. Hence,
FY(y) = FX(y + c) for y < −c.
Notice that there is only a subtle difference between the
previous two examples. In fact, if FX(x) is continuous at x = 0, then FY(y) is the same for the
previous two examples.
Determination of fY in terms of fX
Determine the density fY(y) of Y = g(X) in terms of the density fX(x) of X. To
accomplish this, we solve the equation y = g(x) for x in terms of y . If g has an inverse, then we
can solve for a unique x in terms of y (x = g-1(y)). Otherwise, we will have to do it in segments.
That is, x1(y), x2(y), ... , xn(y) can be found (as solutions, or roots, of y = g(x) ) such that
y = g(x1(y)) = g(x2(y)) = g(x3(y)) = ... = g(xn(y)). (4-5)
Note that x1 through xn are functions of y. The range of each xi(y) covers part of the domain of
g(x). The union of the ranges of xi(y), 1 ≤ i ≤ n, covers all, or part of, the domain of g(x). The
desired fY(y) is
X 1 X 2 X nY
1 2 n
f (x ) f (x ) f (x )f (y) + +
g (x ) g (x ) g (x )
, (4-6)
Figure 4-5: Transformation for Example 4-5.
where g′(x) denotes the derivative of g(x).
We establish this result for the function y = g(x) that is depicted by Figure 4-6, a simple
example where n = 2. The extension to the general case is obvious.
P[y < Y ≤ y + Δy] = ∫ from y to y+Δy of fY(ξ) dξ ≈ fY(y) Δy
for small Δy (increments Δx1, Δx2 and Δy are defined to be positive). Similarly,
P[x1 − Δx1 < X ≤ x1] ≈ fX(x1) Δx1
P[x2 < X ≤ x2 + Δx2] ≈ fX(x2) Δx2. (4-7)
This leads to the requirement
P[y < Y ≤ y + Δy] = P[x1 − Δx1 < X ≤ x1] + P[x2 < X ≤ x2 + Δx2]
fY(y) Δy ≈ fX(x1) Δx1 + fX(x2) Δx2
fY(y) ≈ fX(x1)/(Δy/Δx1) + fX(x2)/(Δy/Δx2). (4-8)
(4-8)
Now, let the increments approach zero. The positive quantities Δy/Δx1 and Δy/Δx2 approach
Δy/Δx1 → |dg(x1)/dx| and Δy/Δx2 → |dg(x2)/dx|. (4-9)
Figure 4-6: Transformation y = g(x) = x².
This leads to the desired result
fY(y) = fX(x1)/|dg(x1)/dx| + fX(x2)/|dg(x2)/dx|. (4-10)
Example 4-6: Consider Y = aX2 where a > 0. If y < 0, then y = ax2 has no real solutions and
fY(y) = 0, y < 0. (4-11)
If y > 0, then y = ax² has solutions x1 = −√(y/a) and x2 = √(y/a). Also, note that g′(x) = 2ax.
Hence,
fY(y) = fX(x1)/|dg(x1)/dx| + fX(x2)/|dg(x2)/dx|
      = [fX(−√(y/a)) + fX(√(y/a))]/(2a√(y/a)), y > 0
      = 0, y < 0. (4-12)
To see a specific example, assume that X is Rayleigh distributed with parameter σ. The density
for X is given by (2-24); substitute this density into (4-12) to obtain
fY(y) = [1/(2a√(y/a))] (√(y/a)/σ²) exp[−(y/a)/2σ²] U(y)
      = (1/(2σ²a)) exp[−y/(2σ²a)] U(y), (4-13)
which is the density for an exponential random variable with parameter λ = 1/(2σ²a), as can be
seen from inspection of (2-27). Hence, the square of a Rayleigh random variable produces an
exponential random variable.
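This Rayleigh-squared result can be verified by simulation. A sketch, with hypothetical parameter choices σ = 1.5 and a = 2, comparing the empirical distribution of Y = aX² with the exponential distribution function 1 − exp[−y/(2σ²a)]:

```python
import numpy as np

rng = np.random.default_rng(3)

sigma, a = 1.5, 2.0                     # hypothetical Rayleigh parameter and gain
x = rng.rayleigh(scale=sigma, size=300_000)
y = a * x**2

# (4-13): Y should be exponential with lambda = 1/(2*sigma^2*a),
# i.e. F_Y(y) = 1 - exp(-lambda*y).
lam = 1.0 / (2.0 * sigma**2 * a)
pts = np.array([1.0, 5.0, 10.0, 20.0])
max_err = np.max(np.abs(np.mean(y[None, :] <= pts[:, None], axis=1)
                        - (1.0 - np.exp(-lam * pts))))
```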
Expected Value of Transformed Random Variable
Given random variable X, with density fX(x), and a function g(x), we form the random
variable Y = g(X). We know that
E[Y] = ∫ from −∞ to ∞ of y fY(y) dy. (4-14)
This requires knowledge of fY(y). We can express E[Y] directly in terms of g(x) and fX(x).
Theorem 4-1: Let X be a random variable and y = g(x) a function. The expected value of Y =
g(X) can be expressed as
E[Y] = E[g(X)] = ∫ from −∞ to ∞ of g(x) fX(x) dx. (4-15)
Figure 4-7: Transformation y = g(x) = x² used in discussion of Theorem 4-1.
To see this, consider the following example that is illustrated by Figure 4-7. Recall that
fY(y)Δy ≈ fX(x1)Δx1 + fX(x2)Δx2. Multiply this expression by y = g(x1) = g(x2) to obtain
y fY(y) Δy ≈ g(x1) fX(x1) Δx1 + g(x2) fX(x2) Δx2. (4-16)
Now, partition the y-axis as 0 = y0 < y1 < y2 < ..., where Δy = yk+1 − yk, k = 0, 1, 2, ... . By the
mappings x1 = −√y and x2 = √y, this leads to a partition x1k, k = 0, 1, 2, ..., of the negative
x-axis and a partition x2k, k = 0, 1, 2, ..., of the positive x-axis. Sum both sides over their
partitions and obtain
Σ over k ≥ 0 of yk fY(yk) Δy ≈ Σ over k ≥ 0 of g(x1k) fX(x1k) Δx1k + Σ over k ≥ 0 of g(x2k) fX(x2k) Δx2k. (4-17)
Let Δy → 0, Δx1k → 0 and Δx2k → 0 to obtain
∫ from 0 to ∞ of y fY(y) dy = ∫ from −∞ to 0 of g(x) fX(x) dx + ∫ from 0 to ∞ of g(x) fX(x) dx
= ∫ from −∞ to ∞ of g(x) fX(x) dx, (4-18)
the desired result. Observe that this argument can be applied to practically any function y = g(x).
Example 4-7: Let X be N(0,σ) and let Y = Xⁿ. Find E[Y]. For n even (i.e., n = 2k) we
know, from Example 2-10, that E[Xⁿ] = E[|X|ⁿ] = 1·3·5···(n − 1)σⁿ. For odd n (i.e., n = 2k + 1)
write
E[|X|^(2k+1)] = ∫ from −∞ to ∞ of |x|^(2k+1) fX(x) dx = (2/(√(2π) σ)) ∫ from 0 to ∞ of x^(2k+1) exp[−x²/2σ²] dx. (4-19)
Change variables: let y = x²/2σ², dy = (x/σ²)dx, and obtain
E[|X|^(2k+1)] = (2/(√(2π) σ)) ∫ from 0 to ∞ of (2σ²y)^k exp[−y] σ² dy = √(2/π) 2^k σ^(2k+1) ∫ from 0 to ∞ of y^k e^(−y) dy. (4-20)
However, from known results on the Gamma function, we have
Γ(k + 1) = ∫ from 0 to ∞ of y^k e^(−y) dy = k!. (4-21)
Now, use (4-21) in (4-20) to obtain
E[|X|ⁿ] = (1/(√(2π) σ)) ∫ from −∞ to ∞ of |x|ⁿ exp[−x²/2σ²] dx
        = 1·3·5···(n − 1) σⁿ, n = 2k (n even)
        = √(2/π) 2^k k! σⁿ, n = 2k + 1 (n odd), (4-22)
for a zero-mean Gaussian random variable X.
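Formula (4-22) can be checked by numerical integration of |x|ⁿ against the Gaussian density. The value σ = 1.3 below is a hypothetical choice:

```python
import numpy as np
from math import factorial, pi, sqrt

# Numerical check of (4-22): integrate |x|^n against the N(0, sigma^2)
# density on a fine, wide grid (sigma = 1.3 is a hypothetical choice).
sigma = 1.3
x = np.linspace(-12 * sigma, 12 * sigma, 400_001)
dx = x[1] - x[0]
pdf = np.exp(-x**2 / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

def predicted(n):
    if n % 2 == 0:                       # 1*3*5...(n-1) * sigma^n, n = 2k
        k = n // 2
        return factorial(n) / (2**k * factorial(k)) * sigma**n
    k = (n - 1) // 2                     # sqrt(2/pi) * 2^k * k! * sigma^n, n = 2k+1
    return sqrt(2 / pi) * 2**k * factorial(k) * sigma**n

max_rel_err = max(abs(np.sum(np.abs(x)**n * pdf) * dx / predicted(n) - 1.0)
                  for n in range(1, 7))
```

The identity 1·3·5···(n−1) = n!/(2^k k!) for n = 2k is used to evaluate the double factorial.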
Approximate Mean of g(X)
Let X be a random variable and y = g(x) a function. The expected value of g(X) can be
expressed as
E[g(X)] = ∫ from −∞ to ∞ of g(x) fX(x) dx. (4-23)
To approximate this, expand g(x) in a Taylor series around the mean η to obtain
g(x) = g(η) + g′(η)(x − η) + ... + g⁽ⁿ⁾(η)(x − η)ⁿ/n! + ... . (4-24)
Use this expansion in the expected-value calculation to obtain
E[g(X)] = ∫ g(x) fX(x) dx
        = ∫ [g(η) + g′(η)(x − η) + ... + g⁽ⁿ⁾(η)(x − η)ⁿ/n! + ...] fX(x) dx
        = g(η) + g″(η) μ2/2! + g⁽³⁾(η) μ3/3! + ... + g⁽ⁿ⁾(η) μn/n! + ..., (4-25)
where μk = E[(X − η)ᵏ] denotes the kth central moment of X (the first-order term vanishes since E[X − η] = 0).
(4-25)
An approximation to E[g(X)] can be based on this formula; just compute a finite number of
terms in the expansion.
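The truncated expansion (4-25) can be tested on a case where E[g(X)] is known exactly. A sketch under hypothetical assumptions: g(x) = eˣ and X ~ N(η, σ²), for which E[g(X)] = exp[η + σ²/2], every derivative of g at η equals exp(η), and the Gaussian central moments are μk = 0 for odd k and μk = σᵏ·1·3···(k−1) for even k:

```python
from math import exp, factorial

# Hypothetical check of (4-25): g(x) = e^x, X ~ N(eta, sigma^2),
# exact answer E[g(X)] = exp(eta + sigma^2/2).
eta, sigma = 0.5, 0.3
exact = exp(eta + sigma**2 / 2)

def central_moment(k):                  # Gaussian central moments mu_k
    if k % 2:
        return 0.0
    m = k // 2
    return sigma**k * factorial(k) / (2**m * factorial(m))

# Truncate (4-25) after the 6th-order term; every g^(k)(eta) = exp(eta).
approx = exp(eta) * sum(central_moment(k) / factorial(k) for k in range(0, 7))
err = abs(approx - exact)
```

For σ = 0.3 the first neglected term is of order σ⁸, so the truncated series is accurate to roughly 1e-7.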
Characteristic Functions
The characteristic function of a random variable is
ΦX(ω) = ∫ from −∞ to ∞ of fX(x) e^(jωx) dx = E[e^(jωX)]. (4-26)
The characteristic function is complex-valued, with
|ΦX(ω)| = |∫ fX(x) e^(jωx) dx| ≤ ∫ fX(x) dx = 1. (4-27)
Note that Φ(−ω) is the Fourier transform of fX(x), so we can write
Φ(ω) = F[fX(x)] with ω replaced by −ω, (4-28)
and
fX(x) = (1/2π) ∫ from −∞ to ∞ of Φ(ω) e^(−jωx) dω. (4-29)
Definition (4-26) takes the form of a sum when X is a discrete random variable. Suppose
that X takes on the values xi with probabilities pi = P[X = xi] for index i in some index set I (i ∈ I). Then the characteristic function of X is
ΦX(ω) = ∫ fX(x) e^(jωx) dx = Σ over i ∈ I of pi exp[jωxi]. (4-30)
Due to the delta functions in density fX(x), the integral in (4-30) becomes a sum.
Example 4-8: Consider the Gaussian density function
fX(x) = (1/(√(2π) σ)) e^(−x²/2σ²). (4-31)
The Fourier transform of fX is F[fX(x)] = exp[−σ²ω²/2], as given in common tables. Hence,
Φ(ω) = F[fX(x)] with ω replaced by −ω = e^(−σ²ω²/2). (4-32)
If fX(x) = (1/(√(2π) σ)) e^(−(x−η)²/2σ²), then
Φ(ω) = e^(jηω) F[(1/(√(2π) σ)) e^(−x²/2σ²)] = e^(jηω) e^(−σ²ω²/2). (4-33)
Example 4-9: Let random variable N be Poisson with parameter λ. That is,
P[N = n] = e^(−λ) λⁿ/n!, n = 0, 1, 2, ... (4-34)
From (4-30), we can write
Φ(ω) = exp[−λ] Σ over n ≥ 0 of (λⁿ/n!) exp[jωn] = exp[−λ] Σ over n ≥ 0 of (λe^(jω))ⁿ/n!
     = exp[−λ] exp[λe^(jω)]
     = exp[λ(e^(jω) − 1)] (4-35)
as the characteristic function of a Poisson random variable.
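The closed form (4-35) can be compared against a direct evaluation of the defining sum (4-30). The parameter λ = 2.5 and frequency ω = 0.7 below are hypothetical choices:

```python
import cmath
from math import exp

# Check (4-35): compare the series sum of p_n * e^{j*omega*n} with the
# closed form exp[lambda*(e^{j*omega} - 1)].
lam, omega = 2.5, 0.7

p = exp(-lam)                      # p_0 = e^{-lambda}
direct = 0j
for n in range(120):               # 120 terms is far past the tail for lam = 2.5
    direct += p * cmath.exp(1j * omega * n)
    p *= lam / (n + 1)             # recursion p_{n+1} = p_n * lam/(n+1)

closed = cmath.exp(lam * (cmath.exp(1j * omega) - 1.0))
err = abs(direct - closed)
```

The running-product recursion for pₙ avoids evaluating λⁿ/n! directly, which would overflow for large n.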
Multiple Dimension Case
The joint characteristic function XY(1,2) of random variables X and Y is defined as
ΦXY(ω1, ω2) = E[exp{j(ω1X + ω2Y)}] = Σ over i Σ over k of e^(j(ω1 xi + ω2 yk)) P[X = xi, Y = yk] (4-36)
for the discrete case and
ΦXY(ω1, ω2) = E[exp{j(ω1X + ω2Y)}] = ∫∫ e^(j(ω1 x + ω2 y)) fXY(x, y) dxdy (4-37)
for the continuous case. Equation (4-37) is recognized as the two dimensional Fourier transform
(with the sign of j reversed) of fXY(x,y). Generalizing these definitions, we can define the joint
characteristic function of n random variables X1, X2, ... , Xn as
ΦX1...Xn(ω1, ..., ωn) = E[exp{jω1X1 + ... + jωnXn}]. (4-38)
Equation (4-38) can be simplified using vector notation. Define the two vectors
ω̄ = [ω1 ω2 ... ωn]ᵀ,   X̄ = [X1 X2 ... Xn]ᵀ. (4-39)
Then, we can write the n-dimensional characteristic function in the compact form
Φ(ω̄) = E[e^(jω̄ᵀX̄)]. (4-40)
Equations (4-38) and (4-40) convey the same information; however, (4-40) is much easier to
write and work with.
Characteristic Function for Multi-dimensional Gaussian Case
Let X̄ = [X1 X2 ... Xn]ᵀ be a Gaussian random vector with mean η̄ = E[X̄]. Let
ω̄ = [ω1 ω2 ... ωn]ᵀ be a vector of n algebraic variables. Note that
ω̄ᵀX̄ = Σ from k = 1 to n of ωk Xk (4-41)
is a scalar. The characteristic function of X̄ is given as
Φ(ω̄) = E[exp(jω̄ᵀX̄)] = exp[jω̄ᵀη̄ − ½ ω̄ᵀΛω̄], (4-42)
where Λ is the covariance matrix of X̄.
Application: Transformation of Random Variables
Sometimes, the characteristic function can be used to determine the density of random
variable Y = g(X) in terms of the density of X. To see this, consider
ΦY(ω) = E[e^(jωY)] = E[e^(jωg(X))] = ∫ e^(jωg(x)) fX(x) dx. (4-43)
If a change of variable y = g(x) can be made (usually, this requires g to have an inverse), this last
integral will have the form
ΦY(ω) = ∫ e^(jωy) h(y) dy. (4-44)
The desired result fY(y) = h(y) follows (by uniqueness of the Fourier transform).
Example 4-10: Suppose X is N(0;) and Y = aX2. Then
ΦY(ω) = E[e^(jωY)] = E[e^(jωaX²)] = ∫ e^(jωax²) fX(x) dx = (2/(√(2π) σ)) ∫ from 0 to ∞ of e^(jωax²) e^(−x²/2σ²) dx.
For 0 ≤ x < ∞, note that the transformation y = ax² is one-to-one. Hence, make the change of
variable y = ax², dy = (2ax)dx = 2√(ay) dx, to obtain
ΦY(ω) = (2/(√(2π) σ)) ∫ from 0 to ∞ of e^(jωy) e^(−y/2σ²a) dy/(2√(ay)) = ∫ from 0 to ∞ of e^(jωy) [e^(−y/2σ²a)/(σ√(2πay))] dy.
Hence, we have
fY(y) = [e^(−y/2σ²a)/(σ√(2πay))] U(y).
Other Applications
Sometimes, a characteristic function is used to obtain qualitative results about a random
phenomenon of interest. For example, suppose we want to show that some random
phenomenon is Gaussian distributed. We may be able to do this by deriving the characteristic
function that describes the random phenomenon (and showing that the characteristic function has
the form given by (4-33)). In Chapter 9 of these notes, we do this for shot noise. We use
characteristic function theory to show that classical shot noise becomes Gaussian distributed as
its intensity parameter becomes large.
Moment Generating Function
The moment generating function is
ΦX(s) = ∫ fX(x) e^(sx) dx = E[e^(sX)]. (4-45)
The nth derivative of Φ is
dⁿΦ/dsⁿ (s) = ∫ xⁿ fX(x) e^(sx) dx = E[Xⁿ e^(sX)], (4-46)
so that
dⁿΦ/dsⁿ (s) evaluated at s = 0 gives E[Xⁿ] = mₙ. (4-47)
Example 4-11: Suppose X has an exponential density fX(x) = λe^(−λx)U(x). Then the moment
generating function is
Φ(s) = ∫ from 0 to ∞ of λe^(−λx) e^(sx) dx = λ/(λ − s), s < λ.
This can be differentiated to obtain
dΦ/ds evaluated at s = 0: 1/λ = E[X]
d²Φ/ds² evaluated at s = 0: 2/λ² = E[X²].
From this, we can compute the variance as
σ² = E[X²] − (E[X])² = 2/λ² − 1/λ² = 1/λ².
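The moments in Example 4-11 can be checked by integrating the exponential density numerically. The value λ = 0.8 below is a hypothetical choice:

```python
import numpy as np

# Numerical check of Example 4-11: E[X] = 1/lambda, E[X^2] = 2/lambda^2,
# variance = 1/lambda^2 (lambda = 0.8 is a hypothetical choice).
lam = 0.8
x = np.linspace(0.0, 60.0 / lam, 2_000_001)
dx = x[1] - x[0]
pdf = lam * np.exp(-lam * x)

m1 = np.sum(x * pdf) * dx           # first moment
m2 = np.sum(x**2 * pdf) * dx        # second moment
var = m2 - m1**2

err = max(abs(m1 - 1 / lam), abs(m2 - 2 / lam**2), abs(var - 1 / lam**2))
```

The grid extends to 60/λ, where the exponential tail is negligible, so the truncation error is far below the grid error.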
Theorem 4-2
Let X and Y be independent random variables. Let g(x) and h(y) be arbitrary functions. Define
the transformed random variables
Z = g(X)
W = h(Y). (4-48)
Random variables Z and W are independent.
Proof: Define
Az = {x : g(x) ≤ z}
Bw = {y : h(y) ≤ w}.
Then the joint distribution of Z and W is
FZW(z, w) = P[Z ≤ z, W ≤ w] = P[g(X) ≤ z, h(Y) ≤ w] = P[X ∈ Az, Y ∈ Bw].
However, due to independence of X and Y,
FZW(z, w) = P[X ∈ Az, Y ∈ Bw] = P[X ∈ Az] P[Y ∈ Bw]
          = P[g(X) ≤ z] P[h(Y) ≤ w] = P[Z ≤ z] P[W ≤ w]
          = FZ(z) FW(w),
so that Z and W are independent.
One Function of Two Random Variables
Given random variables X and Y and a function z = g(x,y), we form the new random
variable
Z = g(X,Y). (4-49)
We want to find the density and distribution of Z in terms of like quantities for X and Y. For
real z, denote Dz as
Dz = {(x,y) : g(x,y) ≤ z}. (4-50)
Now, note that Dz satisfies
{Z ≤ z} = {g(X,Y) ≤ z} = {(X,Y) ∈ Dz}, (4-51)
so that
FZ(z) = P[Z ≤ z] = P[(X,Y) ∈ Dz] = ∫∫ over Dz of fXY(x, y) dxdy. (4-52)
Thus, to find FZ it suffices to find the region Dz for every z
and then evaluate the above integral.
Example 4-12: Consider the function Z = X + Y. The
distribution FZ can be represented as
FZ(z) = ∫∫ over {x + y ≤ z} of fXY(x, y) dxdy.
In this integral, the region of integration is depicted by the shaded area shown on Figure 4-8.
Now, we can write
FZ(z) = ∫ from −∞ to ∞ of [∫ from −∞ to z−y of fXY(x, y) dx] dy.
By using Leibnitz’s rule (see below) for differentiating an integral, we get the density
fZ(z) = d/dz FZ(z) = d/dz ∫ from −∞ to ∞ of ∫ from −∞ to z−y of fXY(x, y) dxdy
      = ∫ from −∞ to ∞ of fXY(z − y, y) dy.
Leibnitz’s Rule: Consider the function of t defined by
Leibnitz’s Rule: Consider the function of t defined by
F(t) = ∫ from a(t) to b(t) of φ(x, t) dx.
Figure 4-8: Integrate over the shaded region x + y ≤ z (boundary x + y = z) to obtain FZ.
Note that the t variable appears in the integrand and limits. Leibnitz’s rule states that
d/dt F(t) = d/dt ∫ from a(t) to b(t) of φ(x, t) dx = ∫ from a(t) to b(t) of ∂φ(x, t)/∂t dx + φ(b(t), t) db(t)/dt − φ(a(t), t) da(t)/dt.
Special Case: X and Y Independent.
Assume that X and Y are independent. Then fXY(z-y,y) = fX(z-y)fY(y), and the previous
result becomes
fZ(z) = ∫ from −∞ to ∞ of fX(z − y) fY(y) dy, (4-53)
the convolution of fX and fY.
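The convolution formula (4-53) can be checked numerically. A sketch, assuming hypothetical independent densities fX standard exponential and fY standard normal, comparing a discrete convolution of the marginals with the known density of their sum at a smooth point:

```python
import numpy as np
from math import pi, sqrt

# Discrete-grid check of (4-53) with hypothetical choices:
# fX(x) = e^{-x}U(x), fY(y) = N(0,1) density.
dx = 1e-3
x = np.arange(0.0, 25.0, dx)
y = np.arange(-8.0, 8.0, dx)
fX = np.exp(-x)
fY = np.exp(-y**2 / 2) / sqrt(2 * pi)

fZ = np.convolve(fX, fY) * dx                 # Riemann-sum convolution
z = x[0] + y[0] + dx * np.arange(fZ.size)     # support of the full convolution

# Sanity checks: fZ integrates to 1, is nonnegative, and peaks near z ~ 0.5-1.
total = np.sum(fZ) * dx
peak_z = z[np.argmax(fZ)]
```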
Example 4-13: Consider independent random variables
X and Y with densities shown by Figure 4-9. Find
density fZ that describes the random variable Z = X + Y.
CASE I: z < - 1/2 (see Fig 4-10)
There is no overlap, so fZ(z) = 0 for z < - 1/2.
CASE II: - 1/2 < z < 1/2 (see Fig 4-11)
fZ(z) = ∫ from −1/2 to z of e^(−(z−y)) dy = 1 − e^(−(z+1/2)), −1/2 ≤ z ≤ 1/2.
Figure 4-9: Density functions used in Example 4-13: fX(x) = e^(−x)U(x) and fY(y) = 1 for −1/2 ≤ y ≤ 1/2.
Figure 4-10: Case I: z < −1/2.
Figure 4-11: Case II: −1/2 < z < 1/2.
CASE III: 1/2 < z (see Fig 4-12)
fZ(z) = ∫ from −1/2 to 1/2 of e^(−(z−y)) dy = e^(−z)[e^(1/2) − e^(−1/2)], 1/2 < z.
As shown by Figure 4-13, the final result is
fZ(z) = 0, z < −1/2
      = 1 − e^(−(z+1/2)), −1/2 ≤ z ≤ 1/2
      = [e^(1/2) − e^(−1/2)] e^(−z), 1/2 < z.
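The closed-form answer of Example 4-13 can be verified against a numerical convolution of the two densities:

```python
import numpy as np

# Check the closed-form f_Z of Example 4-13 against a numerical convolution
# of f_X(x) = e^{-x}U(x) with the uniform density on [-1/2, 1/2].
dx = 1e-3
x = np.arange(0.0, 20.0, dx)
fX = np.exp(-x)
y = np.arange(-0.5, 0.5, dx)
fY = np.ones_like(y)

fZ_num = np.convolve(fX, fY) * dx
z = x[0] + y[0] + dx * np.arange(fZ_num.size)   # support of the convolution

def fZ_closed(v):
    if v < -0.5:
        return 0.0
    if v <= 0.5:
        return 1.0 - np.exp(-(v + 0.5))
    return (np.exp(0.5) - np.exp(-0.5)) * np.exp(-v)

pts = (-1.0, 0.0, 0.3, 1.0, 3.0)
max_err = max(abs(fZ_num[np.argmin(np.abs(z - p))] - fZ_closed(p)) for p in pts)
```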
Example 4-14: Let X and Y be random variables.
Consider the transformation
Z X / Y .
For this transformation, we have
Dz = {(x,y) : x/y ≤ z},
the shaded region on the plot depicted by Figure 4-14. Now, compute the distribution
FZ(z) = ∫ from 0 to ∞ of ∫ from −∞ to yz of fXY(x, y) dxdy + ∫ from −∞ to 0 of ∫ from yz to ∞ of fXY(x, y) dxdy.
The density fZ(z) is found by differentiating FZ to obtain
Figure 4-13: Final result for Example 4-13.
Figure 4-12: Case III: 1/2 < z.
fZ(z) = d/dz FZ(z) = ∫ from 0 to ∞ of y fXY(yz, y) dy − ∫ from −∞ to 0 of y fXY(yz, y) dy = ∫ from −∞ to ∞ of |y| fXY(yz, y) dy.
Example 4-15: Consider the transformation Z = √(X² + Y²). For this transformation, the region
Dz is given by
Dz = {(x,y) : √(x² + y²) ≤ z} = {(x,y) : x² + y² ≤ z²}, (4-54)
the interior of a circle of radius z > 0. Hence, we can write
FZ(z) = P[Z ≤ z] = P[(X,Y) ∈ Dz] = ∫∫ over Dz of fXY(x, y) dxdy. (4-55)
Now, suppose X and Y are independent, jointly Gaussian random variables with
fXY(x, y) = (1/(2πσ²)) exp[−(x² + y²)/2σ²]. (4-56)
Figure 4-14: Integrate over the shaded region Dz (boundary x = yz, drawn for the case z > 0) to obtain FZ for Example 4-14.
Substitute (4-56) into (4-55) to obtain
FZ(z) = ∫∫ over Dz of (1/(2πσ²)) exp[−(x² + y²)/2σ²] dxdy.
To integrate this, use Figure 4-15, and
transform from rectangular to polar
coordinates
r = √(x² + y²), r ≥ 0
θ = tan⁻¹(y/x), −π < θ ≤ π
dA = r dr dθ.
The change to polar coordinates yields
FZ(z) = ∫ from 0 to 2π of ∫ from 0 to z of (1/(2πσ²)) exp[−r²/2σ²] r dr dθ.
The integrand does not depend on θ, so the
integral over θ is elementary. For the
integral over r, let u = r²/2σ² and du = (r/σ²)dr to obtain
FZ(z) = ∫ from 0 to z of (1/σ²) exp[−r²/2σ²] r dr = ∫ from 0 to z²/2σ² of e^(−u) du = 1 − e^(−z²/2σ²), z ≥ 0,
Figure 4-15: Rectangular-to-polar transformation (x = r cos θ, y = r sin θ) and differential area dA = r dr dθ that support Example 4-15.
so that
fZ(z) = d/dz FZ(z) = (z/σ²) e^(−z²/2σ²), z ≥ 0,
a Rayleigh density with parameter σ. Hence, if X and Y are identically distributed, independent,
zero-mean Gaussian random variables, then Z = √(X² + Y²) is Rayleigh distributed.
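A quick Monte Carlo check of this Rayleigh result (σ = 2 is a hypothetical choice):

```python
import numpy as np

rng = np.random.default_rng(9)

sigma = 2.0                             # hypothetical common standard deviation
xy = rng.normal(0.0, sigma, size=(2, 400_000))
z = np.hypot(xy[0], xy[1])              # Z = sqrt(X^2 + Y^2)

# F_Z(z) = 1 - exp(-z^2/(2 sigma^2)) for z >= 0 (Rayleigh).
pts = np.array([1.0, 2.0, 4.0, 6.0])
emp = np.mean(z[None, :] <= pts[:, None], axis=1)
thy = 1.0 - np.exp(-pts**2 / (2 * sigma**2))
max_err = np.max(np.abs(emp - thy))
```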
Two Functions of Two Random Variables
Given random variables X and Y and functions z = g(x,y), w = h(x,y), we form the new random
variables
Z = g(X,Y) (4-57)
W = h(X,Y).
Express the joint statistics of Z, W in terms of functions g, h and fXY. To accomplish this, define
Dzw = {(x,y) : g(x,y) ≤ z, h(x,y) ≤ w}. (4-58)
Then, the joint distribution of Z and W can be expressed as
FZW(z, w) = P[(X,Y) ∈ Dzw] = ∫∫ over Dzw of fXY(x, y) dxdy. (4-59)
Example 4-16: Consider independent Gaussian X and Y with the joint density function
fXY(x, y) = (1/(2πσ²)) exp[−(x² + y²)/2σ²].
Define random variables Z and W in terms of X and Y by
the transformations
Z = √(X² + Y²)
W = Y/X.
Find FZW, FZ and FW. First, define region Dzw as
Dzw = {(x,y) : x² + y² ≤ z², y/x ≤ w}.
Dzw is the shaded region on Figure 4-16. The figure is drawn for the case w > 0 (the case w < 0
gives results that are identical to those given below). Now, integrate over Dzw to obtain
FZW(z, w) = ∫∫ over Dzw of fXY(x, y) dxdy = 2 ∫ from −π/2 to Tan⁻¹(w) of ∫ from 0 to z of (1/(2πσ²)) e^(−r²/2σ²) r dr dθ
          = (1/π)[Tan⁻¹(w) + π/2] ∫ from 0 to z of (1/σ²) e^(−r²/2σ²) r dr,
which leads to
FZW(z, w) = [½ + (1/π) Tan⁻¹(w)][1 − e^(−z²/2σ²)], z ≥ 0, −∞ < w < ∞
          = 0, z < 0. (4-60)
Note that FZW factors into the product FZ FW, where
Figure 4-16: Integrate over the shaded region Dzw (boundary y = xw, drawn for the case w > 0) to obtain FZW for Example 4-16.
FZ(z) = {1 − e^(−z²/2σ²)} U(z)
FW(w) = ½ + (1/π) Tan⁻¹(w), −∞ < w < ∞. (4-61)
Note that Z and W are independent, Z is Rayleigh distributed and W is Cauchy distributed.
Joint Density Transformations: Determine fZW Directly in Terms of fXY.
Let X and Y be random variables with joint density fXY(x,y). Let
z = g(x,y) (4-62)
w = h(x,y)
be (generally nonlinear) functions that relate algebraic variables x, y to the algebraic variables z,
w. Also, we assume that g and h have continuous first-partial derivatives at the point (x,y) used
below. Now, define the new random variables
Z = g(X,Y)
W = h(X,Y).
(4-63)
In this section, we provide a method for determining the joint density fZW(z,w) directly in terms
of the known joint density fXY(x,y).
First, consider the relatively simple case where (4-62) can be inverted. That is, it is
possible to solve (4-62) for unique functions
x = φ(z, w)
y = ψ(z, w) (4-64)
that give x, y in terms of z, w. Note that z = g(φ(z,w), ψ(z,w)) and w = h(φ(z,w), ψ(z,w)) since
(4-64) is the inverse of (4-62). Later, we will consider the general case where the
transformation cannot be inverted.
The quantity P[z < Z ≤ z + dz, w < W ≤ w + dw] is the probability that random variables
Z and W lie in the infinitesimal rectangle R1 illustrated on Figure 4-17. The area of this
infinitesimal rectangle is AREA(R1) = dzdw. The vertices of the z-w plane rectangle R1 are the
points
P1 = (z, w)
P2 = (z, w + dw)
P3 = (z + dz, w + dw)
P4 = (z + dz, w). (4-65)
Figure 4-17: (z,w) and (x,y) planes used in the transformation of two random variables. Functions φ, ψ transform from the z-w plane to the x-y plane; functions g, h transform from the x-y plane to the z-w plane.
The z-w plane infinitesimal rectangle R1 gets mapped into the x-y plane, where it shows up as
parallelogram R2. As shown on the x-y plane of Figure 4-17, to first order in dz and dw,
parallelogram R2 has the vertices
P1′ = (x, y)                    P3′ = (x + φz dz + φw dw, y + ψz dz + ψw dw)
P2′ = (x + φw dw, y + ψw dw)    P4′ = (x + φz dz, y + ψz dz), (4-66)
where φz, φw, ψz, ψw denote the partial derivatives ∂φ/∂z, ∂φ/∂w, ∂ψ/∂z, ∂ψ/∂w.
The requirement that (4-64) have continuous first-partial derivatives was used to write (4-66).
Note that P1 maps to P1′, P2 maps to P2′, etc. (It is easy to show that P2′ − P1′ = P3′ − P4′ and
P4′ − P1′ = P3′ − P2′, so that we have a parallelogram in the x-y plane.) Denote the area of the
x-y plane parallelogram R2 as AREA(R2).
If random variables Z, W fall in the z-w plane infinitesimal rectangle R1, then the random
variables X, Y must lie in the x-y plane parallelogram R2, and vice-versa. In fact, we can claim
P[z < Z ≤ z + dz, w < W ≤ w + dw] = P[(X, Y) ∈ R2]
fZW(z, w) dz dw ≈ fXY(x, y) AREA(R2)
fZW(z, w) AREA(R1) ≈ fXY(x, y) AREA(R2), (4-67)
where the approximation becomes exact as dz and dw approach zero. Since AREA(R1) = dzdw,
Equation (4-67) yields the desired fZW once an expression for AREA(R2) is obtained.
Figure 4-18 depicts the x-y plane parallelogram R2 for which area AREA(R2) must be
obtained. This parallelogram has sides P1′P2′ and P1′P4′ (shown as vectors with arrowheads on
Fig. 4-18) that can be represented as
P1′P4′ = φz dz î + ψz dz ĵ
P1′P2′ = φw dw î + ψw dw ĵ, (4-68)
where î and ĵ are unit vectors in the x and y directions, respectively. Now, the vector cross
product of sides P1′P4′ and P1′P2′ is denoted as P1′P4′ × P1′P2′. And, the area of parallelogram R2
is the magnitude |P1′P4′| |P1′P2′| sin(θ) = |P1′P4′ × P1′P2′|, where θ is the positive angle between
the vectors. Since î × ĵ = k̂, ĵ × î = −k̂, and î × î = ĵ × ĵ = k̂ × k̂ = 0, we write
AREA(R2) = |P1′P4′ × P1′P2′| = |det[ î ĵ k̂ ; φz dz  ψz dz  0 ; φw dw  ψw dw  0 ]| = |det[ φz ψz ; φw ψw ]| dzdw. (4-69)
In the literature, the last determinant on the right-hand side of (4-69) is called the Jacobian of
the transformation (4-64); symbolically, it is denoted as J(x,y); alternatively, the notation
∂(x,y)/∂(z,w) may be used. We write
J(x,y) = ∂(x,y)/∂(z,w) = det[ ∂x/∂z  ∂x/∂w ; ∂y/∂z  ∂y/∂w ] = det[ φz φw ; ψz ψw ]. (4-70)
Figure 4-18: Parallelogram in x-y plane.
Finally, substitute (4-69) into (4-67), cancel out the dzdw term that is common to both sides, and
obtain the desired result
fZW(z, w) = fXY(x, y) |∂(x,y)/∂(z,w)| evaluated at x = φ(z,w), y = ψ(z,w), (4-71)
a formula for the density fZW in terms of the density fXY. It is possible to obtain (4-71) directly
from the change of variable formula in multi-dimensional integrals; this fact is discussed briefly
in Appendix 4A.
It is useful to think of (4-69) as
AREA(R2) = |∂(x,y)/∂(z,w)| AREA(R1), (4-72)
a relationship between AREA(R2) and AREA(R1). So, the Jacobian can be thought of as the
“area gain” imposed by the transformation (the Jacobian shows how area is scaled by the
transformation).
By considering the mapping of a rectangle on the x-y plane to a parallelogram on the z-w plane
(i.e., in the argument just given, switch planes so that the rectangle is in the x-y plane
and the parallelogram is in the z-w plane), it is not difficult to show
fXY(x, y) = fZW(z, w) |∂(z,w)/∂(x,y)|, (4-73)
where (x,y) and (z,w) are related by (4-62) and (4-64). Now, substitute (4-73) into (4-71) to
obtain
fZW(z, w) = fZW(z, w) |∂(z,w)/∂(x,y)| |∂(x,y)/∂(z,w)|, (4-74)
where (x,y) and (z,w) are related by (4-62) and (4-64).
Equation (4-74) leads to the conclusion
|∂(z,w)/∂(x,y)| |∂(x,y)/∂(z,w)| = 1, (4-75)
where (x,y) and (z,w) are related by (4-62) and (4-64).
Sometimes, the Jacobian (z,w)/(x,y) is easier to compute than the Jacobian
(x,y)/(z,w); Equation (4-75) tells us that the former is the numerical inverse of the latter. In
terms of the Jacobian (z,w)/(x,y), Equation (4-71) becomes
fZW(z, w) = fXY(x, y) / |∂(z,w)/∂(x,y)| evaluated at x = φ(z,w), y = ψ(z,w), (4-76)
which may be easier to evaluate than (4-71).
Often, the original transformation (4-62) does not have an inverse. That is, it may not be
possible to find unique functions φ and ψ as described by (4-64). In this case, we must solve
(4-62) for its real-valued roots xk(z,w), yk(z,w), 1 ≤ k ≤ n, where n > 1. These n roots depend on
z and w; each pair (xk, yk) “covers” a different part of the x-y plane. Note that
z = g(xk,yk), w = h(xk,yk) (4-77)
for each root, 1 k n. For this case, a simple extension of (4-71) leads to
fZW(z, w) = Σ from k = 1 to n of fXY(x, y) |∂(x,y)/∂(z,w)| evaluated at (x, y) = (xk, yk), (4-78)
and the generalization of (4-76) is
fZW(z, w) = Σ from k = 1 to n of fXY(x, y) / |∂(z,w)/∂(x,y)| evaluated at (x, y) = (xk, yk). (4-79)
That is, to obtain fZW(z,w), we evaluate the right-hand side of (4-71) (or (4-76)) at each of
the n roots xk(z,w), yk(z,w), 1 ≤ k ≤ n, and sum the results.
Example 4-17: Consider the linear transformation
z = ax + by,  i.e.,  [z w]ᵀ = [a b ; c d][x y]ᵀ,
w = cx + dy
where ad − bc ≠ 0. This transformation has an inverse. It is possible to express
[x y]ᵀ = [a b ; c d]⁻¹ [z w]ᵀ,  i.e.,  x = Az + Bw, y = Cz + Dw,
where A, B, C and D are appropriate constants (can you find A, B, C and D??). Now, compute
∂(z,w)/∂(x,y) = det[ a b ; c d ] = ad − bc.
If X and Y are random variables described by fXY(x,y), the density function for random variables
Z = aX + bY, W = cX + dY is
fZW(z, w) = fXY(Az + Bw, Cz + Dw) / |ad − bc|.
Example 4-18: Consider X̄, an n×1, zero-mean Gaussian random vector with positive definite
covariance matrix Λx. Define Ȳ = AX̄, where A is an n×n nonsingular matrix. Note that
yi = Σ from k = 1 to n of aik xk, 1 ≤ i ≤ n.
As discussed previously, the density for X̄ is
fX(X̄) = (1/((2π)^(n/2) |Λx|^(1/2))) exp[−½ X̄ᵀ Λx⁻¹ X̄].
Since A is invertible, we can write fY(Ȳ) as
fY(Ȳ) = fX(X̄) / |∂(Ȳ)/∂(X̄)| evaluated at X̄ = A⁻¹Ȳ,
where
∂(Ȳ)/∂(X̄) = det[ ∂yi/∂xj ] = det[ aij ] = det[A],
so that |∂(Ȳ)/∂(X̄)| = |det[A]| is the absolute value of the determinant of the matrix A. Note that
ΛY = E[ȲȲᵀ] = E[(AX̄)(AX̄)ᵀ] = A E[X̄X̄ᵀ] Aᵀ = A Λx Aᵀ, so that
Λx⁻¹ = Aᵀ ΛY⁻¹ A.
This leads to the result
fY(Ȳ) = (1/((2π)^(n/2) |Λx|^(1/2) |det A|)) exp[−½ (A⁻¹Ȳ)ᵀ Λx⁻¹ (A⁻¹Ȳ)]
      = (1/((2π)^(n/2) |Λx|^(1/2) |det A|)) exp[−½ Ȳᵀ (A⁻¹)ᵀ Λx⁻¹ A⁻¹ Ȳ],
a result rewritten as
fY(Ȳ) = (1/((2π)^(n/2) |ΛY|^(1/2))) exp[−½ Ȳᵀ ΛY⁻¹ Ȳ],
where ΛY = A Λx Aᵀ is the covariance of Gaussian random vector Ȳ. This example leads to the
general, very important result that linear transformations of Gaussian random vectors
produce Gaussian random vectors (remember this!!).
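The covariance relationship ΛY = AΛxAᵀ from Example 4-18 is easy to check by simulation. The matrices below are hypothetical choices (A nonsingular, Λx positive definite):

```python
import numpy as np

rng = np.random.default_rng(10)

# Hypothetical 3-dimensional case: check Lambda_Y = A Lambda_x A^T.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0],
              [3.0, 0.0, 1.0]])        # nonsingular
Lx = np.array([[2.0, 0.5, 0.0],
               [0.5, 1.0, 0.2],
               [0.0, 0.2, 1.5]])       # positive definite covariance

X = rng.multivariate_normal(np.zeros(3), Lx, size=500_000).T   # 3 x N samples
Y = A @ X                               # linear transformation Y = A X

Ly_sample = (Y @ Y.T) / Y.shape[1]      # sample covariance (zero mean)
Ly_theory = A @ Lx @ A.T
max_err = np.max(np.abs(Ly_sample - Ly_theory))
```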
Example 4-19 (Polar Coordinates): Consider the
transformation
r = √(x² + y²), 0 ≤ r < ∞
θ = tan⁻¹(y/x), −π < θ ≤ π,
that is illustrated by Figure 4-19. With θ limited to the range (−π, π], the
transformation has the inverse
x = r cos(θ)
y = r sin(θ),
and the Jacobian is
∂(x,y)/∂(r,θ) = det[ cos θ  −r sin θ ; sin θ  r cos θ ] = r,
so that
frθ(r, θ) = fXY(x, y) |∂(x,y)/∂(r,θ)| evaluated at x = r cos θ, y = r sin θ
          = r fXY(r cos θ, r sin θ)
for r > 0 and −π < θ ≤ π. Suppose that X and Y are independent, jointly Gaussian, zero mean
x-axis
y-axis
r
Figure 4-19: Polar coordinate transfor-mations used in Example 4-19.
EE385 Class Notes 7/6/2015 John Stensby
Updates at http://www.ece.uah.edu/courses/ee385/ 4-36
with a common variance 2. For this case, the above result yields
XY XY
r
2 2
r 2 2x r cosy r sin
2
2 2
f ( )f (r)
(x, y) r {r cos } {r sin }f (r, ) f (x, y) r f (r cos , r sin ) exp
(r, ) 2 2
1 r rexp .
2 2
Note that r and are independent, r is Rayleigh and is uniform over (-].
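This factorization underlies the familiar Box–Muller-style sampling trick: draw r from the Rayleigh density and θ uniform on (−π, π], and x = r cos θ, y = r sin θ come out jointly Gaussian. A minimal seeded sketch (pure Python; the function name `polar_gaussian_pair` is illustrative):

```python
import math
import random

def polar_gaussian_pair(sigma, rng):
    """Draw (x, y) by sampling r ~ Rayleigh(sigma) and theta ~ Uniform(-pi, pi]."""
    # Inverse-CDF sample of the Rayleigh density (r/sigma^2) exp(-r^2 / 2 sigma^2);
    # 1 - rng.random() lies in (0, 1], so the log is always finite.
    r = sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
    theta = rng.uniform(-math.pi, math.pi)
    return r * math.cos(theta), r * math.sin(theta)

rng = random.Random(0)
samples = [polar_gaussian_pair(1.0, rng) for _ in range(200_000)]
xs = [s[0] for s in samples]
mean_x = sum(xs) / len(xs)
var_x = sum(x * x for x in xs) / len(xs)
```

With σ = 1 the x-component should have mean near 0 and variance near 1, which the empirical estimates confirm to within sampling error.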
Example 4-20: Consider the random variables Z = g(X,Y) and W = h(X,Y) where

$$ z = g(x,y) = \sqrt{x^2 + y^2}, \qquad w = h(x,y) = y/x. $$   (4-80)

Transformation (4-80) has roots (x1, y1) and (x2, y2) given by

$$ x_1 = \frac{z}{(1+w^2)^{1/2}}, \quad y_1 = w x_1 = \frac{wz}{(1+w^2)^{1/2}}, \qquad x_2 = -\frac{z}{(1+w^2)^{1/2}}, \quad y_2 = w x_2 = -\frac{wz}{(1+w^2)^{1/2}} $$   (4-81)

for −∞ < w < ∞ and z > 0; the transformation has no real roots for z < 0. A direct evaluation of the Jacobian leads to

$$ \frac{\partial(z,w)}{\partial(x,y)} = \det \begin{bmatrix} \partial z/\partial x & \partial z/\partial y \\ \partial w/\partial x & \partial w/\partial y \end{bmatrix} = \det \begin{bmatrix} x(x^2+y^2)^{-1/2} & y(x^2+y^2)^{-1/2} \\ -y/x^2 & 1/x \end{bmatrix}, $$

which can be expressed as

$$ \frac{\partial(z,w)}{\partial(x,y)} = (x^2+y^2)^{-1/2} \left( 1 + \frac{y^2}{x^2} \right). $$   (4-82)

When evaluated at both (x1, y1) and (x2, y2), the Jacobian yields

$$ \left. \frac{\partial(z,w)}{\partial(x,y)} \right|_{(x_1,\, y_1)} = \left. \frac{\partial(z,w)}{\partial(x,y)} \right|_{(x_2,\, y_2)} = \frac{1 + w^2}{z}. $$   (4-83)

Finally, application of (4-78) leads to the desired result

$$ f_{ZW}(z,w) = \frac{z}{1+w^2} \left[ f_{XY}(x_1, y_1) + f_{XY}(x_2, y_2) \right], \qquad z \ge 0, \quad -\infty < w < \infty, $$   (4-84)

where (x1,y1) and (x2,y2) are given by (4-81). If, for example, X and Y are independent, zero-mean Gaussian random variables with the joint density

$$ f_{XY}(x,y) = \frac{1}{2\pi\sigma^2} \exp\!\left[ -(x^2 + y^2)/2\sigma^2 \right], $$   (4-85)

then we obtain the transformed density

$$ f_{ZW}(z,w) = \left[ \frac{z}{\sigma^2} \exp\!\left( -z^2/2\sigma^2 \right) U(z) \right] \left[ \frac{1/\pi}{1+w^2} \right] = f_Z(z)\, f_W(w), $$   (4-86)

where

$$ f_Z(z) = \frac{z}{\sigma^2} \exp\!\left( -z^2/2\sigma^2 \right) U(z), \qquad f_W(w) = \frac{1/\pi}{1+w^2}. $$   (4-87)
Thus, random variables Z and W are independent, Z is Rayleigh, and W is Cauchy.
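This conclusion can be spot-checked by simulation: for independent zero-mean Gaussian X and Y, the empirical distribution of W = Y/X should match the Cauchy CDF, F_W(w) = 1/2 + tan⁻¹(w)/π, so about 3/4 of the samples should fall below w = 1. A seeded pure-Python sketch:

```python
import random

rng = random.Random(1)
N = 200_000
ws = []
for _ in range(N):
    x = rng.gauss(0.0, 1.0)
    y = rng.gauss(0.0, 1.0)
    ws.append(y / x)  # W = Y/X should be standard Cauchy

# Empirical CDF of W at w = 1; the Cauchy CDF predicts 1/2 + atan(1)/pi = 3/4.
frac_below_1 = sum(1 for w in ws if w <= 1.0) / N
# Empirical CDF at w = 0; symmetry predicts 1/2.
frac_below_0 = sum(1 for w in ws if w <= 0.0) / N
```

Both empirical fractions land within sampling error of the Cauchy predictions, even though W itself has no finite mean or variance.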
Linear Transformations of Gaussian Random Variables

Let yi, 1 ≤ i ≤ n, be zero mean, unit variance, independent (which is equivalent to being uncorrelated in the Gaussian case) Gaussian random variables. Define the Gaussian random vector Y = [y1 y2 ... yn]^T. Note that E[Y] = 0 and the covariance matrix is Λy = E[YY^T] = I, an n×n identity matrix. Hence, we have

$$ f_y(Y) = \frac{1}{(2\pi)^{n/2}} \exp\!\left( -\tfrac{1}{2}\, Y^T Y \right). $$   (4-88)
Now, let A be an n×n nonsingular, real-valued matrix, and consider the linear transformation

X = AY.   (4-89)

The transformation is one-to-one. For every Y there is but one X, and for every X there is but one Y = A⁻¹X. We can express the density of X in terms of the density of Y as

$$ f_x(X) = \left. \frac{f_y(Y)}{\mathrm{abs}[J]} \,\right|_{Y = A^{-1}X}, $$   (4-90)
where

$$ J = \det \begin{bmatrix} \partial x_1/\partial y_1 & \partial x_1/\partial y_2 & \cdots & \partial x_1/\partial y_n \\ \partial x_2/\partial y_1 & \partial x_2/\partial y_2 & \cdots & \partial x_2/\partial y_n \\ \vdots & \vdots & & \vdots \\ \partial x_n/\partial y_1 & \partial x_n/\partial y_2 & \cdots & \partial x_n/\partial y_n \end{bmatrix} = \det[A] \ne 0. $$   (4-91)
Hence, we have

$$ f_x(X) = \frac{1}{\lvert \det A \rvert}\, f_y(A^{-1}X) = \frac{1}{(2\pi)^{n/2}\, \lvert \det A \rvert} \exp\!\left( -\tfrac{1}{2}\, (A^{-1}X)^T (A^{-1}X) \right) = \frac{1}{(2\pi)^{n/2}\, \lvert \det A \rvert} \exp\!\left( -\tfrac{1}{2}\, X^T (A^{-1})^T A^{-1} X \right), $$   (4-92)

which can be written as

$$ f_x(X) = \frac{1}{(2\pi)^{n/2}\, \lvert \Lambda_x \rvert^{1/2}} \exp\!\left( -\tfrac{1}{2}\, X^T \Lambda_x^{-1} X \right), $$   (4-93)

where Λx⁻¹ = (A⁻¹)^T A⁻¹, which leads to the requirement that

Λx = AA^T.   (4-94)

Since A is nonsingular (a requirement on the selection of A), Λx is positive definite. In this development, we used |Λx| = |AA^T| = |A||A^T| = |A|² so that |det A| = |Λx|^1/2.
It is important to note that X = AY is zero mean Gaussian with a covariance matrix given by Λx = AA^T. Note that a linear transformation of Gaussian random variables produces Gaussian random variables.
Consider the converse problem: given a zero mean Gaussian vector X with positive definite covariance matrix Λx, find a nonsingular transformation matrix A so that X = AY, where Y is zero mean Gaussian with covariance matrix Λy = I (identity matrix). The implication is profound: Y = A⁻¹X says that it is possible to transform a Gaussian vector with correlated entries into a Gaussian vector made of uncorrelated (and independent) random variables. We can remove correlation by properly transforming the original vector. Clearly, we must find a matrix A that satisfies

AA^T = Λx.   (4-95)

The solution to this problem comes from linear algebra. Given any positive definite symmetric matrix Λx, there exists a nonsingular matrix P such that

P^T Λx P = I,   (4-96)

which means that Λx = (P^T)⁻¹P⁻¹ = (P⁻¹)^T P⁻¹ (we say that Λx is congruent to I). Compare this to the result given above to see that matrix A can be found by using

A = (P⁻¹)^T = (P^T)⁻¹.   (4-97)
The procedure for finding P is simple:
1) Use the given Λx to write the augmented matrix [Λx | I].
2) Do elementary row and column operations until the augmented matrix becomes [I | P^T]. Each row operation is applied across the entire augmented matrix, while the matching column operation is applied to the left-hand block only. The elementary operations are
i) interchange two rows (columns)
ii) multiply a row (column) by a scalar
iii) add a multiple of one row (column) to another row (column).
3) Write the desired A as A = (P^T)⁻¹.
Example 4-21: Suppose we are given the covariance matrix

$$ \Lambda_x = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix}. $$

First, write the augmented matrix

$$ [\Lambda_x \,|\, I] = \left[ \begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 2 & 5 & 0 & 1 \end{array} \right]. $$

1) Add to the 2nd row −2×(1st row) to obtain

$$ \left[ \begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 0 & 1 & -2 & 1 \end{array} \right]. $$

2) Add to the 2nd column −2×(1st column) to obtain

$$ \left[ \begin{array}{cc|cc} 1 & 0 & 1 & 0 \\ 0 & 1 & -2 & 1 \end{array} \right] = [\, I \,|\, P^T \,]. $$

3) $$ P^T = \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix}. $$

4) $$ A = (P^T)^{-1} = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}. $$

Check Results: is P^TΛxP = I? (Yes!) Check Results: is AA^T = Λx? (Yes!)
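The reduction above can be automated. One systematic way to produce a matrix A with AA^T = Λx (not the row/column procedure itself, but an equivalent construction) is the Cholesky factorization; the sketch below (pure Python, assuming a positive definite symmetric input) reproduces the A found in this example:

```python
import math

def cholesky(S):
    """Lower-triangular L with L L^T = S, for positive definite symmetric S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)   # diagonal entry
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]  # below-diagonal entry
    return L

Lx = [[1.0, 2.0], [2.0, 5.0]]
A = cholesky(Lx)   # lower-triangular factor of Example 4-21's covariance
```

For this Λx the factor comes out to [[1, 0], [2, 1]], exactly the A obtained from the augmented-matrix reduction.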
Example 4-22: Consider the covariance matrix

$$ \Lambda_x = \begin{bmatrix} 2 & 0 & 3 \\ 0 & 1 & 0 \\ 3 & 0 & 10 \end{bmatrix}. $$

Now, write the augmented matrix

$$ [\Lambda_x \,|\, I] = \left[ \begin{array}{ccc|ccc} 2 & 0 & 3 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 3 & 0 & 10 & 0 & 0 & 1 \end{array} \right]. $$

Add to the 3rd row −3/2 times the 1st row; add to the 3rd column −3/2 times the 1st column:

$$ \left[ \begin{array}{ccc|ccc} 2 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 11/2 & -3/2 & 0 & 1 \end{array} \right]. $$

Multiply the 1st row by 1/√2; multiply the 1st column by 1/√2:

$$ \left[ \begin{array}{ccc|ccc} 1 & 0 & 0 & 1/\sqrt{2} & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 11/2 & -3/2 & 0 & 1 \end{array} \right]. $$

Multiply the 3rd row by √(2/11); multiply the 3rd column by √(2/11):

$$ \left[ \begin{array}{ccc|ccc} 1 & 0 & 0 & 1/\sqrt{2} & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & -3/\sqrt{22} & 0 & \sqrt{2/11} \end{array} \right] = [\, I \,|\, P^T \,]. $$

Finally, compute

$$ A = (P^T)^{-1} = \begin{bmatrix} \sqrt{2} & 0 & 0 \\ 0 & 1 & 0 \\ 3/\sqrt{2} & 0 & \sqrt{11/2} \end{bmatrix}. $$

Check Results: Λx = AA^T? (YES!)
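The final check is mechanical; a short pure-Python verification that AA^T = Λx for this 3×3 result (matrix entries taken from the example):

```python
import math

# A = (P^T)^{-1} from Example 4-22.
A = [[math.sqrt(2.0), 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [3.0 / math.sqrt(2.0), 0.0, math.sqrt(11.0 / 2.0)]]

# Compute AA^T entry by entry.
n = 3
AAT = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(n)]
       for i in range(n)]

# The covariance matrix the example started from.
Lx = [[2.0, 0.0, 3.0],
      [0.0, 1.0, 0.0],
      [3.0, 0.0, 10.0]]
```

Up to floating-point roundoff, AA^T reproduces Λx entry for entry.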