Nathaniel E. Helwig - School of Statistics : University of...
Transcript of Nathaniel E. Helwig - School of Statistics : University of...
Introduction to Normal Distribution
Nathaniel E. Helwig
Assistant Professor of Psychology and StatisticsUniversity of Minnesota (Twin Cities)
Updated 17-Jan-2017
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 1
Copyright
Copyright c© 2017 by Nathaniel E. Helwig
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 2
Outline of Notes
1) Univariate Normal:Distribution formStandard normalProbability calculationsAffine transformationsParameter estimation
2) Bivariate Normal:Distribution formProbability calculationsAffine transformationsConditional distributions
3) Multivariate Normal:Distribution formProbability calculationsAffine transformationsConditional distributionsParameter estimation
4) Sampling Distributions:Univariate caseMultivariate case
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 3
Univariate Normal
Univariate Normal
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 4
Univariate Normal Distribution Form
Normal Density Function (Univariate)
Given a variable x ∈ R, the normal probability density function (pdf) is
f (x) =1
σ√
2πe−
(x−µ)2
2σ2
=1
σ√
2πexp
{−(x − µ)2
2σ2
} (1)
whereµ ∈ R is the meanσ > 0 is the standard deviation (σ2 is the variance)e ≈ 2.71828 is base of the natural logarithm
Write X ∼ N(µ, σ2) to denote that X follows a normal distribution.
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 5
Univariate Normal Standard Normal
Standard Normal Distribution
If X ∼ N(0,1), then X follows a standard normal distribution:
f (x) =1√2π
e−x2/2 (2)
−4 −2 0 2 4
0.0
0.2
0.4
x
f(x)
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 6
Univariate Normal Probability Calculations
Probabilities and Distribution Functions
Probabilities relate to the area under the pdf:
P(a ≤ X ≤ b) =
∫ b
af (x)dx
= F (b)− F (a)
(3)
where
F (x) =
∫ x
−∞f (u)du (4)
is the cumulative distribution function (cdf).
Note: F (x) = P(X ≤ x) =⇒ 0 ≤ F (x) ≤ 1
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 7
Univariate Normal Probability Calculations
Normal Probabilities
Helpful figure of normal probabilities:0.0
0.1
0.2
0.3
0.4
−2σ −1σ 1σ−3σ 3σµ 2σ
34.1% 34.1%
13.6%2.1%
13.6% 0.1%0.1%2.1%
From http://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 8
Univariate Normal Probability Calculations
Normal Distribution Functions (Univariate)
Helpful figures of normal pdfs and cdfs:
- 3 - 2 - 1φ μ,σ
2(
0.8
0.6
0.4
0.2
0.0
−5 −3 1 3 5
x
1.0
−1 0 2 4−2−4
x)
0,μ=0,μ=0,μ=−2,μ=
2 0.2,σ =2 1.0,σ =2 5.0,σ =2 0.5,σ =
- 3 - 2 - 1
x
0.8
0.6
0.4
0.2
0.0
1.0
−5 −3 1 3 5−1 0 2 4−2−4
Φμ,σ2(x)
0,μ=0,μ=0,μ=−2,μ=
2 0.2,σ =2 1.0,σ =2 5.0,σ =2 0.5,σ =
http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg http://en.wikipedia.org/wiki/File:Normal_Distribution_CDF.svg
Note that the cdf has an elongated “S” shape, referred to as an ogive.
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 9
Univariate Normal Affine Transformations
Affine Transformations of Normal (Univariate)
Suppose that X ∼ N(µ, σ2) and a,b ∈ R with a 6= 0.
If we define Y = aX + b, then Y ∼ N(aµ+ b,a2σ2).
Suppose that X ∼ N(1,2). Determine the distributions of. . .Y = X + 3Y = 2X + 3Y = 3X + 2
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 10
Univariate Normal Affine Transformations
Affine Transformations of Normal (Univariate)
Suppose that X ∼ N(µ, σ2) and a,b ∈ R with a 6= 0.
If we define Y = aX + b, then Y ∼ N(aµ+ b,a2σ2).
Suppose that X ∼ N(1,2). Determine the distributions of. . .Y = X + 3 =⇒ Y ∼ N(1(1) + 3,12(2)) ≡ N(4,2)
Y = 2X + 3 =⇒ Y ∼ N(2(1) + 3,22(2)) ≡ N(5,8)
Y = 3X + 2 =⇒ Y ∼ N(3(1) + 2,32(2)) ≡ N(5,18)
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 10
Univariate Normal Parameter Estimation
Likelihood Function
Suppose that x = (x1, . . . , xn) is an iid sample of data from a normaldistribution with mean µ and variance σ2, i.e., xi
iid∼ N(µ, σ2).
The likelihood function for the parameters (given the data) has the form
L(µ, σ2|x) =n∏
i=1
f (xi) =n∏
i=1
1√2πσ2
exp{−(xi − µ)2
2σ2
}and the log-likelihood function is given by
LL(µ, σ2|x) =n∑
i=1
log(f (xi)) = −n2
log(2π)− n2
log(σ2)− 12σ2
n∑i=1
(xi−µ)2
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 11
Univariate Normal Parameter Estimation
Maximum Likelihood Estimate of the Mean
The MLE of the mean is the value of µ that minimizes
n∑i=1
(xi − µ)2 =n∑
i=1
x2i − 2nx̄µ+ nµ2
where x̄ = (1/n)∑n
i=1 xi is the sample mean.
Taking the derivative with respect to µ we find that
∂∑n
i=1(xi − µ)2
∂µ= −2nx̄ + 2nµ ←→ x̄ = µ̂
i.e., the sample mean x̄ is the MLE of the population mean µ.
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 12
Univariate Normal Parameter Estimation
Maximum Likelihood Estimate of the Variance
The MLE of the variance is the value of σ2 that minimizes
n2
log(σ2) +1
2σ2
n∑i=1
(xi − µ̂)2 =n2
log(σ2) +
∑ni=1 x2
i2σ2 − nx̄2
2σ2
where x̄ = (1/n)∑n
i=1 xi is the sample mean.
Taking the derivative with respect to σ2 we find that
∂ n2 log(σ2) + 1
2σ2
∑ni=1(xi − µ̂)2
∂σ2 =n
2σ2 −1
2σ4
n∑i=1
(xi − µ̂)2
which implies that the sample variance σ̂2 = (1/n)∑n
i=1(xi − x̄)2 is theMLE of the population variance σ2.
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 13
Bivariate Normal
Bivariate Normal
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 14
Bivariate Normal Distribution Form
Normal Density Function (Bivariate)
Given two variables x , y ∈ R, the bivariate normal pdf is
f (x , y) =exp
{− 1
2(1−ρ2)
[(x−µx )2
σ2x
+(y−µy )2
σ2y− 2ρ(x−µx )(y−µy )
σxσy
]}2πσxσy
√1− ρ2
(5)
whereµx ∈ R and µy ∈ R are the marginal meansσx ∈ R+ and σy ∈ R+ are the marginal standard deviations0 ≤ |ρ| < 1 is the correlation coefficient
X and Y are marginally normal: X ∼ N(µx , σ2x ) and Y ∼ N(µy , σ
2y )
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 15
Bivariate Normal Distribution Form
Example: µx = µy = 0, σ2x = 1, σ2
y = 2, ρ = 0.6/√
2
−4−2
02
4−4
−2
0
2
40.02
0.04
0.06
0.08
0.10
0.12
x
y
f(x, y
)
0.00
0.02
0.04
0.06
0.08
0.10
0.12
f(x, y
)
−4 −2 0 2 4
−4
−2
02
4
x
y
http://en.wikipedia.org/wiki/File:MultivariateNormal.png
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 16
Bivariate Normal Distribution Form
Example: Different Means
0.00
0.02
0.04
0.06
0.08
0.10
0.12
f(x, y
)
−4 −2 0 2 4
−4
−2
02
4
µx = 0, µy = 0
x
y
0.00
0.02
0.04
0.06
0.08
0.10
0.12
f(x, y
)
−4 −2 0 2 4
−4
−2
02
4
µx = 1, µy = 2
x
y
0.00
0.02
0.04
0.06
0.08
0.10
0.12
f(x, y
)
−4 −2 0 2 4
−4
−2
02
4
µx = −1, µy = −1
x
y
Note: for all three plots σ2x = 1, σ2
y = 2, and ρ = 0.6/√
2.
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 17
Bivariate Normal Distribution Form
Example: Different Correlations
0.00
0.02
0.04
0.06
0.08
0.10
0.12
f(x, y
)
−4 −2 0 2 4
−4
−2
02
4
ρ = −0.6/ 2
x
y
0.00
0.02
0.04
0.06
0.08
0.10
f(x, y
)
−4 −2 0 2 4
−4
−2
02
4
ρ = 0
x
y
0.00
0.05
0.10
0.15
0.20
f(x, y
)
−4 −2 0 2 4
−4
−2
02
4
ρ = 1.2/ 2
x
y
Note: for all three plots µx = µy = 0, σ2x = 1, and σ2
y = 2.
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 18
Bivariate Normal Distribution Form
Example: Different Variances
0.00
0.05
0.10
0.15
f(x, y
)
−4 −2 0 2 4
−4
−2
02
4
σy = 1
x
y
0.00
0.02
0.04
0.06
0.08
0.10
0.12
f(x, y
)
−4 −2 0 2 4
−4
−2
02
4
σy = 2
x
y
0.00
0.02
0.04
0.06
0.08
f(x, y
)
−4 −2 0 2 4
−4
−2
02
4
σy = 2
x
y
Note: for all three plots µx = µy = 0, σ2x = 1, and ρ = 0.6/(σxσy ).
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 19
Bivariate Normal Probability Calculations
Probabilities and Multiple Integration
Probabilities still relate to the area under the pdf:
P([ax ≤ X ≤ bx ] and [ay ≤ Y ≤ by ]) =
∫ bx
ax
∫ by
ay
f (x , y)dydx (6)
where∫ ∫
f (x , y)dydx denotes the multiple integral of the pdf f (x , y).
Defining z = (x , y), we can still define the cdf:
F (z) = P(X ≤ x and Y ≤ y)
=
∫ x
−∞
∫ y
−∞f (u, v)dvdu
(7)
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 20
Bivariate Normal Probability Calculations
Normal Distribution Functions (Bivariate)
Helpful figures of bivariate normal pdf and cdf:
0.00
0.02
0.04
0.06
0.08
0.10
0.12
f(x, y
)
−4 −2 0 2 4
−4
−2
02
4
x
y
0.0
0.2
0.4
0.6
0.8
1.0
F(x
, y)
−4 −2 0 2 4
−4
−2
02
4
x
yNote: µx = µy = 0, σ2
x = 1, σ2y = 2, and ρ = 0.6/
√2
Note that the cdf still has an ogive shape (now in two-dimensions).Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 21
Bivariate Normal Affine Transformations
Affine Transformations of Normal (Bivariate)
Given z = (x , y)′, suppose that z ∼ N(µ,Σ) whereµ = (µx , µy )′ is the 2× 1 mean vector
Σ =
(σ2
x ρσxσyρσxσy σ2
y
)is the 2× 2 covariance matrix
Let A =
(a11 a12a21 a22
)and b =
(b1b2
)with A 6= 02×2 =
(0 00 0
).
If we define w = Az + b, then w ∼ N(Aµ + b,AΣA′).
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 22
Bivariate Normal Conditional Distributions
Conditional Normal (Bivariate)
The conditional distribution of a variable Y given X = x is
fY |X (y |X = x) =fXY (x , y)
fX (x)(8)
wherefXY (x , y) is the joint pdf of X and YfX (x) is the marginal pdf of X
In the bivariate normal case, we have that
Y |X ∼ N(µ∗, σ2∗) (9)
where µ∗ = µy + ρσyσx
(x − µx ) and σ2∗ = σ2
y (1− ρ2)
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 23
Bivariate Normal Conditional Distributions
Derivation of Conditional Normal
To prove Equation (9), simply write out the definition and simplify:
fY |X (y|X = x) =fXY (x, y)
fX (x)
=
exp
{− 1
2(1−ρ2)
[(x−µx )2
σ2x
+(y−µy )2
σ2y− 2ρ(x−µx )(y−µy )
σx σy
]}/(
2πσxσy√
1− ρ2)
exp{− (x−µx )2
2σ2x
}/(σx√
2π)
=
exp
{− 1
2(1−ρ2)
[(x−µx )2
σ2x
+(y−µy )2
σ2y− 2ρ(x−µx )(y−µy )
σx σy
]+
(x−µx )2
2σ2x
}√
2πσy√
1− ρ2
=
exp
{− 1
2σ2y (1−ρ2)
[ρ2 σ2
yσ2
x(x − µx )
2 + (y − µy )2 − 2ρ
σyσx
(x − µx )(y − µy )
]}√
2πσy√
1− ρ2
=
exp
{− 1
2σ2y (1−ρ2)
[y − µy − ρ
σyσx
(x − µx )]2}
√2πσy
√1− ρ2
which completes the proof.
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 24
Bivariate Normal Conditional Distributions
Statistical Independence for Bivariate Normal
Two variables X and Y are statistically independent if
fXY (x , y) = fX (x)fY (y) (10)
where fXY (x , y) is joint pdf, and fX (x) and fY (y) are marginals pdfs.
Note that if X and Y are independent, then
fY |X (y |X = x) =fXY (x , y)
fX (x)=
fX (x)fY (y)
fX (x)= fY (y) (11)
so conditioning on X = x does not change the distribution of Y .
If X and Y are bivariate normal, what is the necessary and sufficientcondition for X and Y to be independent? Hint: see Equation (9)Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 25
Bivariate Normal Conditional Distributions
Example #1
A statistics class takes two exams X (Exam 1) and Y (Exam 2) wherethe scores follow a bivariate normal distribution with parameters:
µx = 70 and µy = 60 are the marginal meansσx = 10 and σy = 15 are the marginal standard deviationsρ = 0.6 is the correlation coefficient
Suppose we select a student at random. What is the probability that. . .(a) the student scores over 75 on Exam 2?(b) the student scores over 75 on Exam 2, given that the student
scored X = 80 on Exam 1?(c) the sum of his/her Exam 1 and Exam 2 scores is over 150?(d) the student did better on Exam 1 than Exam 2?(e) P(5X − 4Y > 150)?
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 26
Bivariate Normal Conditional Distributions
Example #1: Part (a)
Answer for 1(a):Note that Y ∼ N(60,152), so the probability that the student scoresover 75 on Exam 2 is
P(Y > 75) = P(
Z >75− 60
15
)= P(Z > 1)
= 1− P(Z < 1)
= 1− Φ(1)
= 1− 0.8413447= 0.1586553
where Φ(x) =∫ x−∞ f (z)dz with f (x) = 1√
2πe−x2/2 denoting the standard
normal pdf (see R code for use of pnorm to calculate this quantity).
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 27
Bivariate Normal Conditional Distributions
Example #1: Part (b)
Answer for 1(b):Note that (Y |X = 80) ∼ N(µ∗, σ
2∗) where
µ∗ = µY + ρσYσX
(x − µX ) = 60 + (0.6)(15/10)(80− 70) = 69σ2∗ = σ2
Y (1− ρ2) = 152(1− 0.62) = 144
If a student scored X = 80 on Exam 1, the probability that the studentscores over 75 on Exam 2 is
P(Y > 75|X = 80) = P(
Z >75− 69
12
)= P (Z > 0.5)
= 1− Φ(0.5)
= 1− 0.6914625= 0.3085375
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 28
Bivariate Normal Conditional Distributions
Example #1: Part (c)
Answer for 1(c):Note that (X + Y ) ∼ N(µ∗, σ
2∗) where
µ∗ = µX + µY = 70 + 60 = 130σ2∗ = σ2
X + σ2Y + 2ρσXσY = 102 + 152 + 2(0.6)(10)(15) = 505
The probability that the sum of Exam 1 and Exam 2 is above 150 is
P(X + Y > 150) = P(
Z >150− 130√
505
)= P (Z > 0.8899883)
= 1− Φ(0.8899883)
= 1− 0.8132639= 0.1867361
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 29
Bivariate Normal Conditional Distributions
Example #1: Part (d)
Answer for 1(d):Note that (X − Y ) ∼ N(µ∗, σ
2∗) where
µ∗ = µX − µY = 70− 60 = 10σ2∗ = σ2
X + σ2Y − 2ρσXσY = 102 + 152 − 2(0.6)(10)(15) = 145
The probability that the student did better on Exam 1 than Exam 2 is
P(X > Y ) = P(X − Y > 0)
= P(
Z >0− 10√
145
)= P (Z > −0.8304548)
= 1− Φ(−0.8304548)
= 1− 0.2031408= 0.7968592
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 30
Bivariate Normal Conditional Distributions
Example #1: Part (e)
Answer for 1(e):Note that (5X − 4Y ) ∼ N(µ∗, σ
2∗) where
µ∗ = 5µX − 4µY = 5(70)− 4(60) = 110σ2∗ = 52σ2
X + (−4)2σ2Y + 2(5)(−4)ρσXσY =
25(102) + 16(152)− 2(20)(0.6)(10)(15) = 2500
Thus, the needed probability can be obtained using
P(5X − 4Y > 150) = P(
Z >150− 110√
2500
)= P (Z > 0.8)
= 1− Φ(0.8)
= 1− 0.7881446= 0.2118554
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 31
Bivariate Normal Conditional Distributions
Example #1: R Code
# Example 1a> pnorm(1,lower=F)[1] 0.1586553> pnorm(75,mean=60,sd=15,lower=F)[1] 0.1586553
# Example 1b> pnorm(0.5,lower=F)[1] 0.3085375> pnorm(75,mean=69,sd=12,lower=F)[1] 0.3085375
# Example 1c> pnorm(20/sqrt(505),lower=F)[1] 0.1867361> pnorm(150,mean=130,sd=sqrt(505),lower=F)[1] 0.1867361
# Example 1d> pnorm(-10/sqrt(145),lower=F)[1] 0.7968592> pnorm(0,mean=10,sd=sqrt(145),lower=F)[1] 0.7968592
# Example 1e> pnorm(0.8,lower=F)[1] 0.2118554> pnorm(150,mean=110,sd=50,lower=F)[1] 0.2118554
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 32
Multivariate Normal
Multivariate Normal
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 33
Multivariate Normal Distribution Form
Normal Density Function (Multivariate)
Given x = (x1, . . . , xp)′ with xj ∈ R ∀j , the multivariate normal pdf is
f (x) =1
(2π)p/2|Σ|1/2 exp{−1
2(x− µ)′Σ−1(x− µ)
}(12)
whereµ = (µ1, . . . , µp)′ is the p × 1 mean vector
Σ =
σ11 σ12 · · · σ1pσ21 σ22 · · · σ2p
......
. . ....
σp1 σp2 · · · σpp
is the p × p covariance matrix
Write x ∼ N(µ,Σ) or x ∼ Np(µ,Σ) to denote x is multivariate normal.
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 34
Multivariate Normal Distribution Form
Some Multivariate Normal Properties
The mean and covariance parameters have the following restrictions:µj ∈ R for all jσjj > 0 for all jσij = ρij
√σiiσjj where ρij is correlation between Xi and Xj
σ2ij ≤ σiiσjj for any i , j ∈ {1, . . . ,p} (Cauchy-Schwarz)
Σ is assumed to be positive definite so that Σ−1 exists.
Marginals are normal: Xj ∼ N(µj , σjj) for all j ∈ {1, . . . ,p}.
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 35
Multivariate Normal Probability Calculations
Multivariate Normal Probabilities
Probabilities still relate to the area under the pdf:
P(aj ≤ Xj ≤ bj ∀j) =
∫ b1
a1
· · ·∫ bp
ap
f (x)dxp · · · dx1 (13)
where∫· · ·∫
f (x)dxp · · · dx1 denotes the multiple integral f (x).
We can still define the cdf of x = (x1, . . . , xp)′
F (x) = P(Xj ≤ xj ∀j)
=
∫ x1
−∞· · ·∫ xp
−∞f (u)dup · · · du1
(14)
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 36
Multivariate Normal Affine Transformations
Affine Transformations of Normal (Multivariate)
Suppose that x = (x1, . . . , xp)′ and that x ∼ N(µ,Σ) whereµ = {µj}p×1 is the mean vectorΣ = {σij}p×p is the covariance matrix
Let A = {aij}n×p and b = {bi}n×1 with A 6= 0n×p.
If we define w = Ax + b, then w ∼ N(Aµ + b,AΣA′).
Note: linear combinations of normal variables are normally distributed.
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 37
Multivariate Normal Conditional Distributions
Multivariate Conditional Distributions
Given variables x = (x1, . . . , xp)′ and y = (y1, . . . , yq)′, we have
fY |X (y|X = x) =fXY (x,y)
fX (x)(15)
wherefY |X (y|X = x) is the conditional distribution of y given xfXY (x,y) is the joint pdf of x and yfX (x) is the marginal pdf of x
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 38
Multivariate Normal Conditional Distributions
Conditional Normal (Multivariate)
Suppose that z ∼ N(µ,Σ) wherez = (x′,y′)′ = (x1, . . . , xp, y1, . . . , yq)′
µ = (µ′x ,µ′y )′ = (µ1x , . . . , µpx , µ1y , . . . , µqy )′
Note: µx is mean vector of x, and µy is mean vector of y
Σ =
(Σxx ΣxyΣ′xy Σyy
)where (Σxx )p×p, (Σyy )q×q, and (Σxy )p×q,
Note: Σxx is covariance matrix of x, Σyy is covariance matrix of y,and Σxy is covariance matrix of x and y
In the multivariate normal case, we have that
y|x ∼ N(µ∗,Σ∗) (16)
where µ∗ = µy + Σ′xyΣ−1xx (x− µx ) and Σ∗ = Σyy −Σ′xyΣ
−1xx Σxy
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 39
Multivariate Normal Conditional Distributions
Statistical Independence for Multivariate Normal
Using Equation (16), we have that
y|x ∼ N(µ∗,Σ∗) ≡ N(µy ,Σyy ) (17)
if and only if Σxy = 0p×q (a matrix of zeros).
Note that Σxy = 0p×q implies that the p elements of x are uncorrelatedwith the q elements of y.
For multivariate normal variables: uncorrelated→ independentFor non-normal variables: uncorrelated 6→ independent
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 40
Multivariate Normal Conditional Distributions
Example #2
Each Delicious Candy Company store makes 3 size candy bars:regular (X1), fun size (X2), and big size (X3).
Assume the weight (in ounces) of the candy bars (X1,X2,X3) follow amultivariate normal distribution with parameters:
µ =
537
and Σ =
4 −1 0−1 4 20 2 9
Suppose we select a store at random. What is the probability that. . .(a) the weight of a regular candy bar is greater than 8 oz?(b) the weight of a regular candy bar is greater than 8 oz, given that
the fun size bar weighs 1 oz and the big size bar weighs 10 oz?(c) P(4X1 − 3X2 + 5X3 < 63)?
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 41
Multivariate Normal Conditional Distributions
Example #2: Part (a)
Answer for 2(a):Note that X1 ∼ N(5,4)
So, the probability that the regular bar is more than 8 oz is
P(X1 > 8) = P(
Z >8− 5
2
)= P(Z > 1.5)
= 1− Φ(1.5)
= 1− 0.9331928= 0.0668072
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 42
Multivariate Normal Conditional Distributions
Example #2: Part (b)
Answer for 2(b):(X1|X2 = 1,X3 = 10) is normally distributed, see Equation (16).
The conditional mean of (X1|X2 = 1,X3 = 10) is given by
µ∗ = µX1 + Σ′12Σ−122 (x̃− µ̃)
= 5 +(−1 0
)(4 22 9
)−1( 1− 310− 7
)= 5 +
(−1 0
) 132
(9 −2−2 4
)(−23
)= 5 + 24/32= 5.75
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 43
Multivariate Normal Conditional Distributions
Example #2: Part (b) continued
Answer for 2(b) continued:The conditional variance of (X1|X2 = 1,X3 = 10) is given by
σ2∗ = σ2
X1−Σ′12Σ
−122 Σ12
= 4−(−1 0
)(4 22 9
)−1(−10
)= 4−
(−1 0
) 132
(9 −2−2 4
)(−10
)= 4− 9/32= 3.71875
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 44
Multivariate Normal Conditional Distributions
Example #2: Part (b) continued
Answer for 2(b) continued:So, if the fun size bar weighs 1 oz and the big size bar weighs 10 oz,the probability that the regular bar is more than 8 oz is
P(X1 > 8|X2 = 1,X3 = 10) = P(
Z >8− 5.75√3.71875
)= P(Z > 1.166767)
= 1− Φ(1.166767)
= 1− 0.8783477= 0.1216523
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 45
Multivariate Normal Conditional Distributions
Example #2: Part (c)
Answer for 2(c):(4X1 − 3X2 + 5X3) is normally distributed.
The expectation of (4X1 − 3X2 + 5X3) is given by
µ∗ = 4µX1 − 3µX2 + 5µX3
= 4(5)− 3(3) + 5(7)
= 46
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 46
Multivariate Normal Conditional Distributions
Example #2: Part (c) continued
Answer for 2(c) continued:The variance of (4X1 − 3X2 + 5X3) is given by
σ2∗ =
(4 −3 5
)Σ
4−35
=(4 −3 5
) 4 −1 0−1 4 20 2 9
4−35
=(4 −3 5
)19−639
= 289
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 47
Multivariate Normal Conditional Distributions
Example #2: Part (c) continued
Answer for 2(c) continued:So, the needed probability can be obtained as
P(4X1 − 3X2 + 5X3 < 63) = P(
Z <63− 46√
289
)= P(Z < 1)
= Φ(1)
= 0.8413447
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 48
Multivariate Normal Conditional Distributions
Example #2: R Code
# Example 2a> pnorm(1.5,lower=F)[1] 0.0668072> pnorm(8,mean=5,sd=2,lower=F)[1] 0.0668072
# Example 2b> pnorm(2.25/sqrt(119/32),lower=F)[1] 0.1216523> pnorm(8,mean=5.75,sd=sqrt(119/32),lower=F)[1] 0.1216523
# Example 2c> pnorm(1)[1] 0.8413447> pnorm(63,mean=46,sd=17)[1] 0.8413447
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 49
Multivariate Normal Parameter Estimation
Likelihood Function
Suppose that xi = (xi1, . . . , xip) is a sample from a normal distribution
with mean vector µ and covariance matrix Σ, i.e., xiiid∼ N(µ,Σ).
The likelihood function for the parameters (given the data) has the form
L(µ,Σ|X) =n∏
i=1
f (xi) =n∏
i=1
1(2π)p/2|Σ|1/2 exp
{−1
2(xi − µ)′Σ−1(xi − µ)
}and the log-likelihood function is given by
LL(µ,Σ|X) = −np2
log(2π)− n2
log(|Σ|)− 12
n∑i=1
(xi − µ)′Σ−1(xi − µ)
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 50
Multivariate Normal Parameter Estimation
Maximum Likelihood Estimate of Mean Vector
The MLE of the mean vector is the value of µ that minimizes
n∑i=1
(xi − µ)′Σ−1(xi − µ) =n∑
i=1
x′iΣ−1xi − 2nx̄′Σ−1µ + nµ′Σ−1µ
where x̄ = (1/n)∑n
i=1 xi is the sample mean vector.
Taking the derivative with respect to µ we find that
∂LL(µ,Σ|X)
∂µ= −2nΣ−1x̄ + 2nΣ−1µ ←→ x̄ = µ̂
The sample mean vector x̄ is the MLE of the population mean µ vector.
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 51
Multivariate Normal Parameter Estimation
Maximum Likelihood Estimate of Covariance Matrix
The MLE of the covariance matrix is the value of Σ that minimizes
−n log(|Σ−1|) +n∑
i=1
tr{Σ−1(xi − µ̂)(xi − µ̂)′}
where µ̂ = x̄ = (1/n)∑n
i=1 xi is the sample mean.
Taking the derivative with respect to Σ−1 we find that
∂LL(µ,Σ|X)
∂Σ−1 = −nΣ +n∑
i=1
(xi − µ̂)(xi − µ̂)′
i.e., the sample covariance matrix Σ̂ = (1/n)∑n
i=1(xi − x̄)(xi − x̄)′ isthe MLE of the population covariance matrix Σ.
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 52
Sampling Distributions
Sampling Distributions
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 53
Sampling Distributions Univariate Case
Univariate Sampling Distributions: x̄ and s2
In the univariate normal case, we have thatx̄ = (1/n)
∑ni=1 xi ∼ N(µ, σ2/n)
(n − 1)s2 =∑n
i=1(xi − x̄)2 ∼ σ2χ2n−1
χ2k denotes a chi-square variable with k degrees of freedom.
σ2χ2k =
∑ki=1 z2
i where ziiid∼ N(0, σ2)
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 54
Sampling Distributions Multivariate Case
Multivariate Sampling Distributions: x̄ and S
In the multivariate normal case, we have thatx̄ = (1/n)
∑ni=1 xi ∼ N(µ,Σ/n)
(n − 1)S =∑n
i=1(xi − x̄)(xi − x̄)′ ∼Wn−1(Σ)
Wk (Σ) denotes a Wishart variable with k degrees of freedom.
Wk (Σ) =∑k
i=1 ziz′i where ziiid∼ N(0p,Σ)
Nathaniel E. Helwig (U of Minnesota) Introduction to Normal Distribution Updated 17-Jan-2017 : Slide 55