Chapter 4 (DeGroot & Schervish)

Variance
Although the mean of a distribution is a useful summary, it does not convey very much information about the distribution. A random variable X with mean 2 has the same mean as the constant random variable Y such that Pr(Y = 2) = 1, even if X is not constant! To distinguish the distribution of X from the distribution of Y in this case, it is useful to have some measure of how spread out the distribution of X is. The variance of X is one such measure. The standard deviation of X is the square root of the variance.
Stock Price Changes
Consider the prices A and B of two stocks at a time one month in the future. Assume that A has the uniform distribution on the interval [25, 35] and B has the uniform distribution on the interval [15, 45]. Both stocks have a mean price of 30, but the distributions are very different.
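A quick sanity check of this example, using only the standard fact that a uniform distribution on [a, b] has mean (a + b)/2 and variance (b − a)^2/12 (the helper names below are illustrative, not from the text):

```python
# Mean and variance of a Uniform[a, b] distribution.
def uniform_mean(a, b):
    return (a + b) / 2

def uniform_var(a, b):
    return (b - a) ** 2 / 12

# Stock A ~ Uniform[25, 35], Stock B ~ Uniform[15, 45]:
# same mean (30), very different spread.
print(uniform_mean(25, 35), uniform_var(25, 35))  # variance ≈ 8.33
print(uniform_mean(15, 45), uniform_var(15, 45))  # variance = 75
```

The two distributions share the mean 30, but B's variance is nine times larger, which is exactly the kind of difference the mean alone cannot detect.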
Variance / Standard Deviation
Let X be a random variable with finite mean μ = E(X). The variance of X, denoted by Var(X), is defined as
Var(X) = E[(X − μ)^2].
The standard deviation of X is the nonnegative square root of Var(X), provided the variance exists. When only one random variable is being discussed, it is common to denote its standard deviation by the symbol σ and its variance by σ^2.
Stock Price Changes
Return to the two random variables A and B in the example above. Since a uniform distribution on [a, b] has variance (b − a)^2/12, Var(A) = (35 − 25)^2/12 ≈ 8.33, while Var(B) = (45 − 15)^2/12 = 75, so the distribution of B is far more spread out.
Variance and Standard Deviation of a Discrete Distribution
Suppose that a random variable X can take each of the five values −2, 0, 1, 3, and 4 with equal probability. Then
E(X) = (1/5)(−2 + 0 + 1 + 3 + 4) = 1.2.
Let W = (X − μ)^2, so that Var(X) = E(W):
Var(X) = (1/5)[(−3.2)^2 + (−1.2)^2 + (−0.2)^2 + (1.8)^2 + (2.8)^2] = 4.56.
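The computation above can be done directly from the definition; a minimal sketch (helper names are illustrative):

```python
# Mean and variance of a discrete distribution whose values are equally likely.
def mean(values):
    return sum(values) / len(values)

def variance(values):
    # Var(X) = E[(X - mu)^2]
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

xs = [-2, 0, 1, 3, 4]
print(mean(xs))      # 1.2
print(variance(xs))  # ≈ 4.56
```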
Properties of the Variance
Theorem: Var(X) = 0 if and only if there exists a constant c such that Pr(X = c) = 1.
Theorem: For constants a and b, if Y = aX + b, then Var(Y) = a^2 Var(X), and σY = |a| σX.
Calculating the Variance and Standard Deviation of a Linear Function
Suppose that a random variable X can take each of the five values −2, 0, 1, 3, and 4 with equal probability. Determine the variance and standard deviation of Y = 4X − 7.
The mean of X is μ = 1.2 and the variance is Var(X) = 4.56, so
Var(Y) = 4^2 Var(X) = 16(4.56) = 72.96.
Also, the standard deviation of Y is
σY = 4σX = 4(4.56)^(1/2) ≈ 8.54.
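The identity Var(aX + b) = a^2 Var(X) can be checked numerically on this distribution (a small self-contained sketch):

```python
# Check Var(aX + b) = a^2 Var(X) for Y = 4X - 7 on the five-point distribution.
xs = [-2, 0, 1, 3, 4]  # equally likely values

def variance(values):
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

ys = [4 * x - 7 for x in xs]     # Y = 4X - 7
print(variance(xs))              # ≈ 4.56
print(variance(ys))              # ≈ 72.96
print(16 * variance(xs))         # same as variance(ys); the shift b = -7 drops out
```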
Theorem: For every random variable X, Var(X) = E(X^2) − [E(X)]^2.
Theorem: If X1, . . . , Xn are independent random variables with finite means, then
Var(X1 + . . . + Xn) = Var(X1) + . . . + Var(Xn).
The Variance of a Binomial Distribution
Suppose that a box contains red balls and blue balls, and that the proportion of red balls is p (0 ≤ p ≤ 1). Suppose that n balls are selected from the box with replacement. For i = 1, . . . , n, let Xi = 1 if the ith ball selected is red, and let Xi = 0 otherwise. If X denotes the total number of red balls in the sample, then X = X1 + . . . + Xn, and X has the binomial distribution with parameters n and p.
Since X1, . . . , Xn are independent, it follows from the preceding theorem that Var(X) = Var(X1) + . . . + Var(Xn).
Now E(Xi) = p for i = 1, . . . , n, and since Xi^2 = Xi for each i, E(Xi^2) = E(Xi) = p. Therefore
Var(Xi) = E(Xi^2) − [E(Xi)]^2 = p − p^2 = p(1 − p),
and hence
Var(X) = np(1 − p).
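The closed form np(1 − p) can be verified against a variance computed directly from the binomial p.f. (a sketch using the standard library; the parameter values below are arbitrary):

```python
# Compare np(1-p) with the variance computed term by term from the binomial pmf.
from math import comb

def binom_pmf(n, p, k):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_var(n, p):
    # Var(X) = E[(X - mu)^2], summing over the support {0, 1, ..., n}.
    mu = sum(k * binom_pmf(n, p, k) for k in range(n + 1))
    return sum((k - mu) ** 2 * binom_pmf(n, p, k) for k in range(n + 1))

n, p = 10, 0.3
print(binom_var(n, p))   # ≈ 2.1
print(n * p * (1 - p))   # 2.1
```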
Moments
For a random variable X, the means of the powers X^k (called moments) for k ≥ 2 have useful theoretical properties, and some of them are used as additional summaries of a distribution. The moment generating function is a related tool.
Existence of Moments
For each random variable X and every positive integer k, the expectation E(X^k) is called the kth moment of X. In particular, in accordance with this terminology, the mean of X is the first moment of X.
Suppose that X is a random variable for which E(X) = μ. For every positive integer k, the expectation E[(X − μ)^k] is called the kth central moment of X, or the kth moment of X about the mean. In particular, the variance of X is the second central moment of X.
Moment Generating Functions
Let X be a random variable. For each real number t, define
ψ(t) = E(e^(tX)).
The function ψ(t) is called the moment generating function (abbreviated m.g.f.) of X.
The m.g.f. of X depends only on the distribution of X: since the m.g.f. is the expected value of a function of X, it must depend only on the distribution of X. If X and Y have the same distribution, they must have the same m.g.f.
Theorem: Let X be a random variable whose m.g.f. ψ(t) is finite for all values of t in some open interval around the point t = 0. Then, for each integer n ≥ 1, the nth moment of X, E(X^n), is finite and equals the nth derivative ψ^(n)(t) at t = 0. That is, E(X^n) = ψ^(n)(0) for n = 1, 2, . . . .
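This relationship can be checked numerically: approximate the first two derivatives of the m.g.f. of the earlier five-point distribution by finite differences and compare with E(X) and E(X^2) (a sketch; the step size h is an arbitrary choice):

```python
# Check E(X^n) = nth derivative of the m.g.f. at t = 0, numerically,
# for the distribution taking -2, 0, 1, 3, 4 with equal probability.
from math import exp

xs = [-2, 0, 1, 3, 4]

def mgf(t):
    # psi(t) = E[e^{tX}]
    return sum(exp(t * x) for x in xs) / len(xs)

h = 1e-4
first = (mgf(h) - mgf(-h)) / (2 * h)              # ≈ psi'(0) = E(X)
second = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h**2   # ≈ psi''(0) = E(X^2)
print(first)   # ≈ 1.2
print(second)  # ≈ 6.0, consistent with Var(X) = 6.0 - 1.2^2 = 4.56
```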
Properties of Moment Generating Functions
Theorem: Let X be a random variable for which the m.g.f. is ψ1; let Y = aX + b, where a and b are given constants; and let ψ2 denote the m.g.f. of Y. Then for every value of t such that ψ1(at) is finite,
ψ2(t) = e^(bt) ψ1(at).
Theorem: Suppose that X1, . . . , Xn are n independent random variables, and for i = 1, . . . , n, let ψi denote the m.g.f. of Xi. Let Y = X1 + . . . + Xn, and let the m.g.f. of Y be denoted by ψ. Then for every value of t such that ψi(t) is finite for i = 1, . . . , n,
ψ(t) = ψ1(t) · · · ψn(t).
Proof: Since X1, . . . , Xn are independent, so are e^(tX1), . . . , e^(tXn). Hence
ψ(t) = E(e^(tY)) = E(e^(tX1) · · · e^(tXn)) = E(e^(tX1)) · · · E(e^(tXn)) = ψ1(t) · · · ψn(t).
The Moment Generating Function for the Binomial Distribution
Suppose that a random variable X has the binomial distribution with parameters n and p. The mean and the variance of X were determined by representing X as the sum of n independent random variables X1, . . . , Xn, where the distribution of each Xi is
Pr(Xi = 1) = p and Pr(Xi = 0) = 1 − p.
Now use this representation to determine the m.g.f. of X = X1 + . . . + Xn. The m.g.f. of each Xi is
ψi(t) = E(e^(tXi)) = p e^t + (1 − p),
so by the product theorem above, the m.g.f. of X is
ψ(t) = (p e^t + 1 − p)^n.
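The closed form (p e^t + 1 − p)^n can be checked against E[e^(tX)] computed directly from the binomial p.f. (a sketch; the values of n, p, t below are arbitrary):

```python
# Compare the closed-form binomial m.g.f. with a direct expectation over the pmf.
from math import comb, exp

def mgf_direct(n, p, t):
    # E[e^{tX}] summed over the support {0, 1, ..., n}
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) * exp(t * k)
               for k in range(n + 1))

def mgf_closed(n, p, t):
    return (p * exp(t) + 1 - p) ** n

n, p, t = 8, 0.4, 0.7
print(mgf_direct(n, p, t))
print(mgf_closed(n, p, t))  # agrees with the direct computation
```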
Uniqueness of Moment Generating Functions
Theorem: If the m.g.f.'s of two random variables X1 and X2 are finite and identical for all values of t in an open interval around the point t = 0, then the probability distributions of X1 and X2 must be identical.
The Additive Property of the Binomial Distribution
If X1 and X2 are independent random variables, and if Xi has the binomial distribution with parameters ni and p (i = 1, 2), then X1 + X2 has the binomial distribution with parameters n1 + n2 and p.
The Mean and the Median
Although the mean of a distribution is a measure of central location, the median is another such measure. Let X be a random variable. Every number m with the following property is called a median of the distribution of X:
Pr(X ≤ m) ≥ 1/2 and Pr(X ≥ m) ≥ 1/2.
Indeed, the 1/2 quantile is a median.
Example (The Median of a Discrete Distribution): Suppose that X has the following discrete distribution:
Pr(X = 1) = 0.1, Pr(X = 2) = 0.2, Pr(X = 3) = 0.3, Pr(X = 4) = 0.4.
The value 3 is a median of this distribution because Pr(X ≤ 3) = 0.6 and Pr(X ≥ 3) = 0.7, both of which are at least 1/2. Furthermore, 3 is the unique median of this distribution.
Example (A Discrete Distribution for Which the Median Is Not Unique): Suppose that X has the following discrete distribution:
Pr(X = 1) = 0.1, Pr(X = 2) = 0.4, Pr(X = 3) = 0.3, Pr(X = 4) = 0.2.
Here Pr(X ≤ 2) = 1/2 and Pr(X ≥ 3) = 1/2. Therefore, every value of m in the closed interval 2 ≤ m ≤ 3 is a median of this distribution. The most popular choice of median for this distribution would be the midpoint, 2.5.
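The defining conditions can be applied mechanically to both examples (a small sketch; `medians` is an illustrative helper that checks only values in the support):

```python
# Return the support points m satisfying Pr(X <= m) >= 1/2 and Pr(X >= m) >= 1/2.
def medians(dist):
    return [x for x in dist
            if sum(p for v, p in dist.items() if v <= x) >= 0.5
            and sum(p for v, p in dist.items() if v >= x) >= 0.5]

print(medians({1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}))  # unique median: [3]
print(medians({1: 0.1, 2: 0.4, 3: 0.3, 4: 0.2}))  # [2, 3]; all of [2, 3] qualify
```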
Example (The Median of a Continuous Distribution): Suppose that X has a continuous distribution for which the p.d.f. is as follows:
Mean Squared Error (M.S.E.)
Suppose that X is a random variable with mean μ and variance σ^2. Suppose also that the value of X is to be observed in some experiment, but this value must be predicted before the observation can be made. One basis for making the prediction is to select some number d for which the expected value of the square of the error X − d is a minimum. The number E[(X − d)^2] is called the mean squared error (M.S.E.) of the prediction d. Since E[(X − d)^2] = σ^2 + (μ − d)^2, the number d for which the M.S.E. is minimized is d = E(X).
Mean Absolute Error (M.A.E.)
Another possible basis for predicting the value of a random variable X is to choose some number d for which E(|X − d|) is a minimum. The M.A.E. is minimized when the chosen value of d is a median of the distribution of X.
Example (Predicting a Discrete Uniform Random Variable): Suppose that the probability is 1/6 that a random variable X will take each of the six values 1, 2, 3, 4, 5, 6. Determine the prediction for which the M.S.E. is minimum and the prediction for which the M.A.E. is minimum.
In this example, E(X) = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 3.5. Therefore, the M.S.E. is minimized by the unique value d = 3.5. Also, every number m in the closed interval 3 ≤ m ≤ 4 is a median of the given distribution, so the M.A.E. is minimized by every value of d such that 3 ≤ d ≤ 4. Because the distribution of X is symmetric, the mean of X is also a median of X.
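A brute-force check of both claims for the fair die (a sketch; the grid of candidate predictions is an arbitrary choice):

```python
# For a fair die: M.S.E. has a unique minimizer at d = 3.5,
# while M.A.E. is constant (and minimal) for every d in [3, 4].
xs = [1, 2, 3, 4, 5, 6]

def mse(d):
    return sum((x - d) ** 2 for x in xs) / 6

def mae(d):
    return sum(abs(x - d) for x in xs) / 6

ds = [i / 100 for i in range(100, 601)]  # candidate predictions on [1, 6]
best_mse = min(ds, key=mse)
print(best_mse)                          # 3.5
print(mae(3.0), mae(3.5), mae(4.0))      # all equal: 1.5
```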
Covariance and Correlation
When we are interested in the joint distribution of two random variables, it is useful to have a summary of how much the two random variables depend on each other. The covariance and correlation attempt to measure that dependence, but they capture only a particular type of dependence, namely linear dependence.
Covariance
Let X and Y be random variables having finite means, with E(X) = μX and E(Y) = μY. The covariance of X and Y, denoted by Cov(X, Y), is defined as
Cov(X, Y) = E[(X − μX)(Y − μY)].
Theorem: For all random variables X and Y with finite variances,
Cov(X, Y) = E(XY) − E(X)E(Y).
Proof:
Cov(X, Y) = E(XY − μX Y − μY X + μX μY)
= E(XY) − μX E(Y) − μY E(X) + μX μY
= E(XY) − μX μY = E(XY) − E(X)E(Y).
Correlation
Let X and Y be random variables with finite variances σX^2 and σY^2, respectively. Then the correlation of X and Y, denoted by ρ(X, Y), is defined as
ρ(X, Y) = Cov(X, Y) / (σX σY).
Theorem: For all random variables X and Y with finite positive variances, −1 ≤ ρ(X, Y) ≤ 1.
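Both quantities can be computed straight from the definitions for a small joint distribution (the joint p.f. below is illustrative, not from the textbook):

```python
# Covariance and correlation from the definitions, for a small joint pmf.
from math import sqrt

pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}  # {(x, y): prob}

ex = sum(x * p for (x, y), p in pmf.items())
ey = sum(y * p for (x, y), p in pmf.items())
exy = sum(x * y * p for (x, y), p in pmf.items())
varx = sum((x - ex) ** 2 * p for (x, y), p in pmf.items())
vary = sum((y - ey) ** 2 * p for (x, y), p in pmf.items())

cov = exy - ex * ey            # Cov(X, Y) = E(XY) - E(X)E(Y)
rho = cov / sqrt(varx * vary)  # correlation, guaranteed to lie in [-1, 1]
print(cov, rho)
```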
Properties of Covariance and Correlation
Theorem: If X and Y are independent random variables with finite variances, then
Cov(X, Y) = ρ(X, Y) = 0.
Proof: If X and Y are independent, then E(XY) = E(X)E(Y), so Cov(X, Y) = E(XY) − E(X)E(Y) = 0. It follows that ρ(X, Y) = 0 as well.
Theorem: Suppose that X is a random variable with finite positive variance and Y = aX + b. If a > 0, then ρ(X, Y) = 1. If a < 0, then ρ(X, Y) = −1.
Proof sketch: Cov(X, Y) = a Var(X), and since σY = |a| σX, the result follows from the definition of correlation: ρ(X, Y) = a Var(X) / (σX · |a| σX) = a/|a|.
Theorem: If X and Y are random variables with finite variances, then
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).
More generally, if X1, . . . , Xn have finite variances, then
Var(X1 + . . . + Xn) = Σi Var(Xi) + 2 Σ(i<j) Cov(Xi, Xj).
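The two-variable identity can be verified on a small joint distribution by computing Var(X + Y) directly and via the right-hand side (the joint p.f. below is illustrative, not from the textbook):

```python
# Check Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) on a small joint pmf.
pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}  # {(x, y): prob}

def e(f):
    # Expectation of f(X, Y) under the joint pmf.
    return sum(f(x, y) * p for (x, y), p in pmf.items())

ex, ey = e(lambda x, y: x), e(lambda x, y: y)
varx = e(lambda x, y: (x - ex) ** 2)
vary = e(lambda x, y: (y - ey) ** 2)
cov = e(lambda x, y: (x - ex) * (y - ey))
var_sum = e(lambda x, y: (x + y - ex - ey) ** 2)  # Var(X + Y), computed directly

print(var_sum)
print(varx + vary + 2 * cov)  # equal, as the theorem asserts
```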