Chapter 4 DeGroot & Schervish

Variance

Although the mean of a distribution is a useful summary, it does not convey very much information about the distribution. A random variable X with mean 2 has the same mean as the constant random variable Y such that Pr(Y = 2) = 1, even if X is not constant!

To distinguish the distribution of X from the distribution of Y in this case, it is useful to give some measure of how spread out the distribution of X is. The variance of X is one such measure; the standard deviation of X is the square root of the variance.


Stock Price Changes

Consider the prices A and B of two stocks at a time one month in the future. Assume that A has the uniform distribution on the interval [25, 35] and B has the uniform distribution on the interval [15, 45]. Both stocks have a mean price of 30, but the distributions are very different.



Variance/Standard Deviation

Let X be a random variable with finite mean μ = E(X). The variance of X, denoted by Var(X), is defined as follows:

Var(X) = E[(X − μ)²].

The standard deviation of X is the nonnegative square root of Var(X), if the variance exists. When only one random variable is being discussed, it is common to denote its standard deviation by the symbol σ, and the variance by σ².


Stock Price Changes

Return to the two random variables A and B in the example above. Since each has a uniform distribution, Var(A) = (35 − 25)²/12 = 100/12 ≈ 8.33 and Var(B) = (45 − 15)²/12 = 75, so the distribution of B is far more spread out than that of A.
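A Uniform[a, b] random variable has variance (b − a)²/12, a standard fact. The sketch below (illustrative only; the sample size and seed are arbitrary choices) computes both variances from the formula and confirms Var(B) with a Monte Carlo estimate:

```python
import random

# Variance of a Uniform[a, b] random variable: (b - a)^2 / 12 (standard fact).
def uniform_variance(a, b):
    return (b - a) ** 2 / 12

var_A = uniform_variance(25, 35)   # 100/12, about 8.33
var_B = uniform_variance(15, 45)   # 900/12 = 75.0

# Monte Carlo estimate of Var(B): the sample variance should be close to 75.
random.seed(0)
n = 200_000
samples = [random.uniform(15, 45) for _ in range(n)]
mean_B = sum(samples) / n
mc_var_B = sum((x - mean_B) ** 2 for x in samples) / n

print(var_A, var_B, mc_var_B)
```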


Variance and Standard Deviation of a Discrete Distribution

Suppose that a random variable X can take each of the five values −2, 0, 1, 3, and 4 with equal probability. Then E(X) = (1/5)(−2 + 0 + 1 + 3 + 4) = 1.2.

Let W = (X − μ)², so that Var(X) = E(W). Then Var(X) = (1/5)[(−3.2)² + (−1.2)² + (−0.2)² + (1.8)² + (2.8)²] = 4.56.
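This computation can be reproduced directly from the definition in a few lines of Python (a minimal sketch of the calculation above):

```python
# Variance of the discrete uniform distribution on {-2, 0, 1, 3, 4},
# computed directly from the definition Var(X) = E[(X - mu)^2].
values = [-2, 0, 1, 3, 4]
p = 1 / len(values)                            # each value has probability 1/5

mu = sum(p * x for x in values)                # E(X) = 1.2
var = sum(p * (x - mu) ** 2 for x in values)   # E[(X - mu)^2] = 4.56

print(mu, var)
```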


Properties of the Variance

Theorem: Var(X) = 0 if and only if there exists a constant c such that Pr(X = c) = 1.


Properties of the Variance

Theorem: For constants a and b, let Y = aX + b. Then Var(Y) = a² Var(X), and σY = |a|σX.


Calculating the Variance and Standard Deviation of a Linear Function

Suppose that a random variable X can take each of the five values −2, 0, 1, 3, and 4 with equal probability. Determine the variance and standard deviation of Y = 4X − 7.

The mean of X is μ = 1.2 and the variance is Var(X) = 4.56, so Var(Y) = 16 Var(X) = 72.96. Also, the standard deviation of Y is σY = 4σX = 4(4.56)^(1/2) ≈ 8.54.
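A short Python sketch (illustrative only) verifies the result two ways: through the theorem Var(aX + b) = a² Var(X), and directly from the distribution of Y itself:

```python
import math

values = [-2, 0, 1, 3, 4]
p = 1 / len(values)

mu_X = sum(p * x for x in values)                   # 1.2
var_X = sum(p * (x - mu_X) ** 2 for x in values)    # 4.56

# Y = 4X - 7; by the theorem, Var(Y) = a^2 Var(X) with a = 4.
var_Y_theorem = 16 * var_X                          # 72.96
sigma_Y = 4 * math.sqrt(var_X)                      # about 8.54

# Direct check: compute Var(Y) from the distribution of Y.
y_values = [4 * x - 7 for x in values]
mu_Y = sum(p * y for y in y_values)
var_Y_direct = sum(p * (y - mu_Y) ** 2 for y in y_values)

print(var_Y_theorem, var_Y_direct, sigma_Y)
```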


Theorem: For every random variable X, Var(X) = E(X²) − [E(X)]².


Theorem: If X1, . . . , Xn are independent random variables with finite means, then

Var(X1 + . . . + Xn) = Var(X1) + . . . + Var(Xn).
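A Monte Carlo sketch illustrates the additivity; the three independent distributions below (uniform, normal, exponential) are arbitrary choices, not from the text:

```python
import random

# For independent X1, X2, X3, the sample variance of the sum should be close
# to the sum of the individual sample variances.
random.seed(1)
n = 200_000

x1 = [random.uniform(0, 1) for _ in range(n)]      # Var = 1/12
x2 = [random.gauss(0, 2) for _ in range(n)]        # Var = 4
x3 = [random.expovariate(1.0) for _ in range(n)]   # Var = 1

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

sums = [a + b + c for a, b, c in zip(x1, x2, x3)]
lhs = sample_var(sums)
rhs = sample_var(x1) + sample_var(x2) + sample_var(x3)
print(lhs, rhs)   # both should be near 1/12 + 4 + 1, about 5.08
```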


The Variance of a Binomial Distribution

Suppose that a box contains red balls and blue balls, and that the proportion of red balls is p (0 ≤ p ≤ 1). Suppose that n balls are selected from the box with replacement.

For i = 1, . . . , n, let Xi = 1 if the ith ball that is selected is red, and let Xi = 0 otherwise. If X denotes the total number of red balls in the sample, then X = X1 + . . . + Xn, and X has the binomial distribution with parameters n and p.


Since X1, . . . , Xn are independent, it follows from the preceding theorem that Var(X) = Var(X1) + . . . + Var(Xn).

E(Xi) = p for i = 1, . . . , n. Since Xi² = Xi for each i, E(Xi²) = E(Xi) = p. Therefore

Var(Xi) = E(Xi²) − [E(Xi)]² = p − p² = p(1 − p),

and hence Var(X) = np(1 − p).
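The derivation can be mirrored in a simulation (a sketch; n = 10, p = 0.3, and the seed are arbitrary choices) by building X as a sum of indicator variables:

```python
import random

# Check Var(X) = n p (1 - p) by building X as a sum of n indicators,
# exactly as in the derivation above.
random.seed(2)
n, p = 10, 0.3
trials = 100_000

def draw_X():
    # X = X1 + ... + Xn, where each Xi = 1 with probability p.
    return sum(1 if random.random() < p else 0 for _ in range(n))

xs = [draw_X() for _ in range(trials)]
mean = sum(xs) / trials
var = sum((x - mean) ** 2 for x in xs) / trials

print(mean, var)   # should be near n*p = 3.0 and n*p*(1-p) = 2.1
```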


Moments

For a random variable X, the means of powers X^k (called moments) for k > 2 have useful theoretical properties, and some of them are used for additional summaries of a distribution. The moment generating function is a related tool.


Existence of Moments

For each random variable X and every positive integer k, the expectation E(X^k) is called the kth moment of X. In particular, in accordance with this terminology, the mean of X is the first moment of X.


Existence of Moments

Suppose that X is a random variable for which E(X) = μ. For every positive integer k, the expectation E[(X − μ)^k] is called the kth central moment of X, or the kth moment of X about the mean. In particular, in accordance with this terminology, the variance of X is the second central moment of X.


Moment Generating Functions

Let X be a random variable. For each real number t, let ψ(t) = E(e^(tX)). The function ψ(t) is called the moment generating function (abbreviated m.g.f.) of X.

The m.g.f. of X depends only on the distribution of X: since the m.g.f. is the expected value of a function of X, it must depend only on the distribution of X. If X and Y have the same distribution, they must have the same m.g.f.


Theorem: Let X be a random variable whose m.g.f. ψ(t) is finite for all values of t in some open interval around the point t = 0. Then, for each integer n > 0, the nth moment of X, E(X^n), is finite and equals the nth derivative ψ^(n)(t) at t = 0. That is, E(X^n) = ψ^(n)(0) for n = 1, 2, . . . .
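As an illustrative check of this theorem (the exponential distribution and rate λ = 2 are assumed here, not taken from the text): the m.g.f. of an exponential(λ) random variable is ψ(t) = λ/(λ − t) for t < λ, and central finite differences at t = 0 should recover E(X) = 1/λ and E(X²) = 2/λ²:

```python
# m.g.f. of the exponential(lam) distribution: psi(t) = lam / (lam - t), t < lam.
# (The exponential distribution and lam = 2.0 are assumed for illustration.)
lam = 2.0

def psi(t):
    return lam / (lam - t)

# Central finite differences approximate the derivatives at t = 0.
h = 1e-5
first = (psi(h) - psi(-h)) / (2 * h)              # approximates E(X) = 1/lam
second = (psi(h) - 2 * psi(0) + psi(-h)) / h**2   # approximates E(X^2) = 2/lam^2

print(first, second)   # both close to 0.5 for lam = 2
```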


Properties of Moment Generating Functions

Theorem: Let X be a random variable for which the m.g.f. is ψ1; let Y = aX + b, where a and b are given constants; and let ψ2 denote the m.g.f. of Y. Then for every value of t such that ψ1(at) is finite, ψ2(t) = e^(bt) ψ1(at).
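A numerical sketch of this property (the standard normal sample and the constants a, b, t are arbitrary assumptions): since e^(t(aX+b)) = e^(bt) e^((at)X) pointwise, Monte Carlo estimates of ψ2(t) and e^(bt) ψ1(at) computed from the same sample agree up to rounding:

```python
import math, random

# Estimate both sides of psi_2(t) = e^{bt} psi_1(at) by averages over the
# same sample; they agree up to floating-point error because the identity
# holds pointwise for every sampled x.
random.seed(3)
a, b, t = 2.0, -1.0, 0.3
xs = [random.gauss(0, 1) for _ in range(10_000)]

psi1_at = sum(math.exp(a * t * x) for x in xs) / len(xs)       # estimates psi_1(at)
psi2_t = sum(math.exp(t * (a * x + b)) for x in xs) / len(xs)  # estimates psi_2(t)

print(psi2_t, math.exp(b * t) * psi1_at)   # essentially equal
```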


Theorem: Suppose that X1, . . . , Xn are n independent random variables, and for i = 1, . . . , n, let ψi denote the m.g.f. of Xi. Let Y = X1 + . . . + Xn, and let the m.g.f. of Y be denoted by ψ. Then for every value of t such that ψi(t) is finite for i = 1, . . . , n,

ψ(t) = ψ1(t) ψ2(t) · · · ψn(t).

Proof: ψ(t) = E(e^(tY)) = E(e^(t(X1 + . . . + Xn))) = E(e^(tX1) · · · e^(tXn)) = E(e^(tX1)) · · · E(e^(tXn)) = ψ1(t) · · · ψn(t), where the fourth equality uses the independence of X1, . . . , Xn.


The Moment Generating Function for the Binomial Distribution

Suppose that a random variable X has the binomial distribution with parameters n and p. The mean and the variance of X were determined by representing X as the sum of n independent random variables X1, . . . , Xn, where the distribution of each Xi is given by Pr(Xi = 1) = p and Pr(Xi = 0) = 1 − p.

Now use this representation to determine the m.g.f. of X = X1 + . . . + Xn.

The m.g.f. of each Xi is ψi(t) = E(e^(tXi)) = p e^t + (1 − p). Since X1, . . . , Xn are independent, the preceding theorem gives

ψ(t) = (p e^t + 1 − p)^n.
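Since the binomial m.g.f. works out to ψ(t) = (p e^t + 1 − p)^n, a numerical sketch (with n = 10 and p = 0.3 chosen arbitrarily) can differentiate it at t = 0 to recover the mean np and, via E(X²) − [E(X)]², the variance np(1 − p):

```python
import math

# Differentiate the binomial m.g.f. psi(t) = (p e^t + 1 - p)^n at t = 0
# to recover the mean n*p and the variance n*p*(1-p).
n, p = 10, 0.3

def psi(t):
    return (p * math.exp(t) + 1 - p) ** n

h = 1e-5
m1 = (psi(h) - psi(-h)) / (2 * h)             # approximates E(X) = n*p = 3.0
m2 = (psi(h) - 2 * psi(0) + psi(-h)) / h**2   # approximates E(X^2)
var = m2 - m1 ** 2                            # approximates n*p*(1-p) = 2.1

print(m1, var)
```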


Uniqueness of Moment Generating Functions

Theorem: If the m.g.f.'s of two random variables X1 and X2 are finite and identical for all values of t in an open interval around the point t = 0, then the probability distributions of X1 and X2 must be identical.


The Additive Property of the Binomial Distribution

If X1 and X2 are independent random variables, and if Xi has the binomial distribution with parameters ni and p (i = 1, 2), then X1 + X2 has the binomial distribution with parameters n1 + n2 and p. Indeed, the m.g.f. of X1 + X2 is the product (p e^t + 1 − p)^(n1) (p e^t + 1 − p)^(n2) = (p e^t + 1 − p)^(n1 + n2), which by the uniqueness theorem identifies the distribution.


The Mean and the Median

The mean of a distribution is one measure of central location; the median is another. Let X be a random variable. Every number m with the following property is called a median of the distribution of X:

Pr(X ≤ m) ≥ 1/2 and Pr(X ≥ m) ≥ 1/2.

Indeed, the 1/2 quantile is a median.


Example (The Median of a Discrete Distribution): Suppose that X has the following discrete distribution:

Pr(X = 1) = 0.1, Pr(X = 2) = 0.2, Pr(X = 3) = 0.3, Pr(X = 4) = 0.4.

The value 3 is a median of this distribution because Pr(X ≤ 3) = 0.6, which is greater than 1/2, and Pr(X ≥ 3) = 0.7, which is also greater than 1/2. Furthermore, 3 is the unique median of this distribution.


Example (A Discrete Distribution for Which the Median Is Not Unique): Suppose that X has the following discrete distribution:

Pr(X = 1) = 0.1, Pr(X = 2) = 0.4, Pr(X = 3) = 0.3, Pr(X = 4) = 0.2.

Here Pr(X ≤ 2) = 1/2 and Pr(X ≥ 3) = 1/2. Therefore, every value of m in the closed interval 2 ≤ m ≤ 3 is a median of this distribution. The most popular choice of median for this distribution would be the midpoint, 2.5.
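The defining condition can be checked mechanically. Below is a small Python helper (an illustrative sketch, not from the text) that returns the support points satisfying the median condition; it reproduces both examples above:

```python
def medians(dist, tol=1e-12):
    """Return the support points m of a discrete distribution (a dict mapping
    value -> probability) with Pr(X <= m) >= 1/2 and Pr(X >= m) >= 1/2.
    A tiny tolerance guards against floating-point rounding in the sums."""
    out = []
    for m in sorted(dist):
        le = sum(p for x, p in dist.items() if x <= m)
        ge = sum(p for x, p in dist.items() if x >= m)
        if le >= 0.5 - tol and ge >= 0.5 - tol:
            out.append(m)
    return out

# The unique-median example from the text:
print(medians({1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}))   # [3]
# The non-unique example: the condition holds at 2 and 3 (and everywhere between).
print(medians({1: 0.1, 2: 0.4, 3: 0.3, 4: 0.2}))   # [2, 3]
```

Note that the helper only reports support points; for the second distribution every real m with 2 ≤ m ≤ 3 is also a median.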


Example (The Median of a Continuous Distribution): Suppose that X has a continuous distribution for which the p.d.f. is as follows:


Mean Squared Error (M.S.E.)

Suppose that X is a random variable with mean μ and variance σ². Suppose also that the value of X is to be observed in some experiment, but that this value must be predicted before the observation can be made.

One basis for making the prediction is to select some number d for which the expected value of the square of the error X − d will be a minimum. The number E[(X − d)²] is called the mean squared error (M.S.E.) of the prediction d. The number d for which the M.S.E. is minimized is E(X).


Mean Absolute Error (M.A.E.)

Another possible basis for predicting the value of a random variable X is to choose some number d for which E(|X − d|) will be a minimum. The M.A.E. is minimized when the chosen value of d is a median of the distribution of X.


Example (Predicting a Discrete Uniform Random Variable): Suppose that the probability is 1/6 that a random variable X will take each of the six values 1, 2, 3, 4, 5, 6. Determine the prediction for which the M.S.E. is minimum and the prediction for which the M.A.E. is minimum.

In this example, E(X) = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 3.5, so the M.S.E. is minimized by the unique value d = 3.5. Also, every number m in the closed interval 3 ≤ m ≤ 4 is a median of the given distribution, so the M.A.E. is minimized by every value of d such that 3 ≤ d ≤ 4. Because the distribution of X is symmetric, the mean of X is also a median of X.
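A numerical sketch (grid search over candidate predictions d; the grid resolution is an arbitrary choice) confirms both minimizers:

```python
# Evaluate the M.S.E. and M.A.E. of predictions d for the discrete uniform
# distribution on {1, ..., 6}, and locate the minimizers on a grid.
values = [1, 2, 3, 4, 5, 6]
p = 1 / 6

def mse(d):
    return sum(p * (x - d) ** 2 for x in values)

def mae(d):
    return sum(p * abs(x - d) for x in values)

grid = [i / 100 for i in range(100, 601)]   # d from 1.00 to 6.00
best_mse = min(grid, key=mse)               # unique minimizer of the M.S.E.
mae_min = min(mae(d) for d in grid)
mae_minimizers = [d for d in grid if abs(mae(d) - mae_min) < 1e-12]

print(best_mse)                               # 3.5
print(mae_minimizers[0], mae_minimizers[-1])  # 3.0 and 4.0
```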


Covariance and Correlation

When we are interested in the joint distribution of two random variables, it is useful to have a summary of how much the two random variables depend on each other. The covariance and correlation are attempts to measure that dependence, but they capture only a particular type of dependence, namely linear dependence.


Covariance

Let X and Y be random variables having finite means, and let E(X) = μX and E(Y) = μY. The covariance of X and Y, denoted by Cov(X, Y), is defined as

Cov(X, Y) = E[(X − μX)(Y − μY)].


Example Let X and Y have the joint p.d.f. f:


Theorem: For all random variables X and Y, Cov(X, Y) = E(XY) − E(X)E(Y).

Proof:
Cov(X, Y) = E(XY − μX Y − μY X + μX μY)
= E(XY) − μX E(Y) − μY E(X) + μX μY
= E(XY) − μX μY = E(XY) − E(X)E(Y).
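The shortcut formula can be sanity-checked on simulated data (a sketch; the linear dependence Y = 0.5X + noise is an arbitrary choice). Both sides, computed from the same sample moments, are algebraically equal, so they agree up to rounding:

```python
import random

# Compare Cov(X, Y) computed from the definition with the shortcut
# E(XY) - E(X)E(Y), using the same sample for both.
random.seed(4)
n = 50_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [0.5 * x + random.gauss(0, 1) for x in xs]   # correlated with X

mean_x = sum(xs) / n
mean_y = sum(ys) / n

cov_def = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
cov_short = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y

print(cov_def, cov_short)   # both near the true covariance, 0.5
```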


Correlation

Let X and Y be random variables with finite variances σX² and σY², respectively. Then the correlation of X and Y, denoted by ρ(X, Y), is defined as follows:

ρ(X, Y) = Cov(X, Y)/(σX σY).



Properties of Covariance and Correlation

Theorem: If X and Y are independent random variables, then Cov(X, Y) = ρ(X, Y) = 0.

Proof: If X and Y are independent, then E(XY) = E(X)E(Y), so Cov(X, Y) = E(XY) − E(X)E(Y) = 0. It follows that ρ(X, Y) = 0 as well.


Theorem: Suppose that X is a random variable and Y = aX + b. If a > 0, then ρ(X, Y) = 1. If a < 0, then ρ(X, Y) = −1.

Since Cov(X, Y) = a Var(X) and σY = |a|σX, the theorem follows from the equation defining the correlation.


Theorem: If X and Y are random variables, then

Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).
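The identity also holds exactly for sample moments, so it can be checked on any simulated sample (a sketch; the dependent pair below is an arbitrary choice):

```python
import random

# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) holds exactly for sample
# moments as well, so both sides agree on any sample (up to rounding).
random.seed(5)
n = 20_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [x + random.gauss(0, 1) for x in xs]   # deliberately dependent on X

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((t - m) ** 2 for t in v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

sums = [x + y for x, y in zip(xs, ys)]
lhs = var(sums)
rhs = var(xs) + var(ys) + 2 * cov(xs, ys)
print(lhs, rhs)   # equal up to floating-point rounding, near the true value 5
```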
