Transcript of ams571/normals_quadratics_regressions.pdf
Lecture Notes
Univariate, Bivariate, Multivariate Normal, Quadratic Forms
& Related Estimations
(1) (Univariate) Normal Distribution
Probability density function:

X ~ N(μ, σ²): X follows the normal distribution with mean μ and variance σ².

f(x) = (1/(σ√(2π))) exp{−(x − μ)²/(2σ²)},  −∞ < x < ∞

Moment generating function:

M_X(t) = E[e^(tX)] = ∫ e^(tx) f(x) dx = exp{μt + σ²t²/2}
Theorem 1 Sampling from the normal population

Let X₁, X₂, …, Xₙ ~ N(μ, σ²) be i.i.d., where i.i.d. stands for independent and identically distributed.

1. X̄ ~ N(μ, σ²/n)

2. A function of the sample variance as shown below follows the Chi-square distribution with (n − 1) degrees of freedom:

(n − 1)S²/σ² = Σᵢ₌₁ⁿ (Xᵢ − X̄)²/σ² ~ χ²ₙ₋₁

*Reminder: The sample variance S² is defined as:

S² = Σᵢ₌₁ⁿ (Xᵢ − X̄)²/(n − 1)

*Def 1: The Chi-square distribution is a special gamma distribution (*** Please find out which one it is.)

*Def 2: Let Z₁, Z₂, …, Zₙ ~ i.i.d. N(0, 1); then W = Σᵢ₌₁ⁿ Zᵢ² ~ χ²ₙ

3. X̄ and S² are independent.

4. The Z-score transformation of the sample mean follows the standard normal distribution:

Z = (X̄ − μ)/(σ/√n) ~ N(0, 1)

5. A T random variable with n − 1 degrees of freedom can be obtained by replacing the population standard deviation with its sample counterpart in the Z-score above:

T = (X̄ − μ)/(S/√n) ~ tₙ₋₁
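A quick numerical check of items 1 and 2 (my own simulation sketch, not part of the notes; the parameter values are arbitrary and NumPy is assumed available):

```python
import numpy as np

rng = np.random.default_rng(571)  # arbitrary seed
mu, sigma, n, m = 5.0, 2.0, 10, 200_000

# m replications of a sample of size n from N(mu, sigma^2)
X = rng.normal(mu, sigma, size=(m, n))
xbar = X.mean(axis=1)
S2 = X.var(axis=1, ddof=1)          # sample variance with divisor n-1

# Item 1: Xbar ~ N(mu, sigma^2/n); here sigma^2/n = 4/10 = 0.4
print(round(xbar.mean(), 2), round(xbar.var(), 2))

# Item 2: (n-1)S^2/sigma^2 ~ chi-square(n-1): mean n-1 = 9, variance 2(n-1) = 18
W = (n - 1) * S2 / sigma**2
print(round(W.mean(), 2), round(W.var(), 1))
```

The sample moments should land close to the theoretical values 5, 0.4, 9, and 18.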
*Def. of t-distribution (Student's t-distribution, first introduced by William Sealy Gosset)

Let Z ~ N(0, 1) and W ~ χ²ₖ, where Z and W are independent. Then,

T = Z/√(W/k) ~ tₖ

*Def. of F-distribution

Let V ~ χ²ₘ and W ~ χ²ₖ, where V and W are independent. Then,

F = (V/m)/(W/k) ~ F(m, k)

It is apparent that

T² = Z²/(W/k) ~ F(1, k)
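A simulation sketch of the t/F relationship above (my own illustration, with arbitrary parameters; the chi-square draw is built from its definition as a sum of squared standard normals):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
k, n = 5, 200_000

Z = rng.standard_normal(n)
W = (rng.standard_normal((n, k)) ** 2).sum(axis=1)  # chi-square(k) as sum of k squared N(0,1)'s

T = Z / np.sqrt(W / k)  # ~ t_k by the definition above

# Sanity checks against known moments: Var(t_k) = k/(k-2) = 5/3,
# and E[F(1,k)] = k/(k-2) as well, so T^2 should have mean 5/3
print(round(T.var(), 2))
print(round((T ** 2).mean(), 2))
```

Both printed values should be close to 5/3 ≈ 1.67.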
(2) Bivariate Normal Distribution
(X, Y) ~ BN(μ₁, σ₁²; μ₂, σ₂²; ρ), where ρ is the correlation between X and Y.

The joint p.d.f. of (X, Y) is

f(x, y) = 1/(2πσ₁σ₂√(1 − ρ²)) exp{−1/(2(1 − ρ²)) [((x − μ₁)/σ₁)² − 2ρ((x − μ₁)/σ₁)((y − μ₂)/σ₂) + ((y − μ₂)/σ₂)²]}

where −∞ < x < ∞, −∞ < y < ∞. Then X and Y are said to have the bivariate normal distribution.

The joint moment generating function for X and Y is

M(t₁, t₂) = exp{t₁μ₁ + t₂μ₂ + ½(t₁²σ₁² + 2ρt₁t₂σ₁σ₂ + t₂²σ₂²)}
Questions:
(a) Find the marginal pdf’s of X and Y;
(b) Prove that X and Y are independent if and only if ρ = 0.
(Here ρ is the population correlation coefficient between X and Y.)
(c) Find the distribution of (X + Y).
(d) Find the conditional pdfs f(x|y) and f(y|x).
Solution:
(a)
The moment generating function of X can be given by

M_X(t) = M(t, 0) = exp{μ₁t + ½σ₁²t²}.

Similarly, the moment generating function of Y can be given by

M_Y(t) = M(0, t) = exp{μ₂t + ½σ₂²t²}.

Thus, X and Y are both marginally normally distributed, i.e.,

X ~ N(μ₁, σ₁²), and Y ~ N(μ₂, σ₂²).

The pdf of X is

f_X(x) = (1/(σ₁√(2π))) exp{−(x − μ₁)²/(2σ₁²)}.

The pdf of Y is

f_Y(y) = (1/(σ₂√(2π))) exp{−(y − μ₂)²/(2σ₂²)}.
(b)
If ρ = 0, then

M(t₁, t₂) = exp{μ₁t₁ + μ₂t₂ + ½(σ₁²t₁² + σ₂²t₂²)} = M(t₁, 0) · M(0, t₂)

Therefore, X and Y are independent.

Conversely, if X and Y are independent, then

M(t₁, t₂) = M(t₁, 0) · M(0, t₂) = exp{μ₁t₁ + μ₂t₂ + ½(σ₁²t₁² + σ₂²t₂²)}

On the other hand, the general form of the joint mgf is

M(t₁, t₂) = exp{μ₁t₁ + μ₂t₂ + ½(σ₁²t₁² + 2ρσ₁σ₂t₁t₂ + σ₂²t₂²)}

Equating the two expressions for all t₁, t₂, we must have ρ = 0.
(c)
M_{X+Y}(t) = E[e^(t(X+Y))] = E[e^(tX+tY)]

Recall that M(t₁, t₂) = E[e^(t₁X+t₂Y)]; therefore we can obtain M_{X+Y}(t) by setting t₁ = t₂ = t in M(t₁, t₂). That is,

M_{X+Y}(t) = M(t, t) = exp{μ₁t + μ₂t + ½(σ₁²t² + 2ρσ₁σ₂t² + σ₂²t²)}
= exp{(μ₁ + μ₂)t + ½(σ₁² + 2ρσ₁σ₂ + σ₂²)t²}

∴ X + Y ~ N(μ = μ₁ + μ₂, σ² = σ₁² + 2ρσ₁σ₂ + σ₂²)
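A numerical check of the result in (c) (my own sketch; the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 3.0, 0.5
n = 300_000

# Draw from the bivariate normal via its mean vector and covariance matrix
cov = np.array([[s1**2, rho*s1*s2],
                [rho*s1*s2, s2**2]])
X, Y = rng.multivariate_normal([mu1, mu2], cov, size=n).T

S = X + Y
# Theory: X+Y ~ N(mu1+mu2, s1^2 + 2*rho*s1*s2 + s2^2) = N(-1, 19)
print(round(S.mean(), 1), round(S.var(), 1))
```

The sample mean and variance of X + Y should be close to −1 and 4 + 6 + 9 = 19.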
(d)
The conditional distribution of X given Y = y is given by

f(x|y) = f(x, y)/f_Y(y) = 1/(σ₁√(2π(1 − ρ²))) exp{−[x − μ₁ − ρ(σ₁/σ₂)(y − μ₂)]²/(2(1 − ρ²)σ₁²)}.

Similarly, the conditional distribution of Y given X = x is

f(y|x) = f(x, y)/f_X(x) = 1/(σ₂√(2π(1 − ρ²))) exp{−[y − μ₂ − ρ(σ₂/σ₁)(x − μ₁)]²/(2(1 − ρ²)σ₂²)}.

Therefore:

X|Y = y ~ N(μ₁ + ρ(σ₁/σ₂)(y − μ₂), (1 − ρ²)σ₁²)

Y|X = x ~ N(μ₂ + ρ(σ₂/σ₁)(x − μ₁), (1 − ρ²)σ₂²)
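A simulation sketch of the conditional result (my own check, with arbitrary parameters; conditioning is approximated by keeping draws in a thin slice around the conditioning value):

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
mu1, mu2, s1, s2, rho = 0.0, 0.0, 1.0, 2.0, 0.8
cov = np.array([[s1**2, rho*s1*s2],
                [rho*s1*s2, s2**2]])
X, Y = rng.multivariate_normal([mu1, mu2], cov, size=500_000).T

# Keep draws where Y falls in a thin slice around y = 1, then compare the
# conditional sample moments with the closed-form conditional distribution
y = 1.0
mask = np.abs(Y - y) < 0.05
cond_mean = mu1 + rho * (s1 / s2) * (y - mu2)   # theory: 0.4
cond_var = (1 - rho**2) * s1**2                 # theory: 0.36

print(round(X[mask].mean(), 2), round(X[mask].var(), 2))
```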
(3) Multivariate Normal Distribution

Let 𝐘 = (Y₁, …, Yₖ)ᵗ ~ N(μ, Σ). The general formula for the k-dimensional normal density is

f(y₁, …, yₖ) = 1/((2π)^(k/2) |Σ|^(1/2)) exp{−½ (y − μ)ᵗ Σ⁻¹ (y − μ)}

where E(𝐘) = μ and Σ is the covariance matrix of 𝐘. Now recall the bivariate normal case, where n = 2; we have (matrices written row by row, rows separated by ";"):

μ = (μ₁, μ₂)ᵗ

Σ = [σ₁₁, σ₁₂; σ₂₁, σ₂₂] = [σ₁², ρσ₁σ₂; ρσ₁σ₂, σ₂²],  σ₁₂ = Cov(Y₁, Y₂)

Σ⁻¹ = 1/(σ₁²σ₂²(1 − ρ²)) [σ₂², −ρσ₁σ₂; −ρσ₁σ₂, σ₁²] = 1/(1 − ρ²) [1/σ₁², −ρ/(σ₁σ₂); −ρ/(σ₁σ₂), 1/σ₂²]

Thus the joint density of Y₁ and Y₂ is

f(y₁, y₂) = 1/(2πσ₁σ₂√(1 − ρ²)) exp{−1/(2(1 − ρ²)) [(y₁ − μ₁)²/σ₁² + (y₂ − μ₂)²/σ₂² − 2ρ(y₁ − μ₁)(y₂ − μ₂)/(σ₁σ₂)]}

Example: plot of the bivariate pdf of Y = (Y₁, Y₂)ᵗ with mean μ = (0, 0)ᵗ and covariance Σ = I₂ (the 2 × 2 identity matrix). [Plot omitted in this transcript.]
*Marginal distribution

Let Y be partitioned as Y = (Y_aᵗ, Y_bᵗ)ᵗ, where Y_a = (Y₁, …, Yₖ)ᵗ and Y_b = (Yₖ₊₁, …, Yₙ)ᵗ. If Y is multivariate normal with mean μ = (μ_aᵗ, μ_bᵗ)ᵗ and covariance matrix

Σ = [Σ_aa, Σ_ab; Σ_ba, Σ_bb],

then the marginal distribution of Y_a is also multivariate Gaussian, with mean μ_a = (μ₁, …, μₖ)ᵗ and covariance matrix Σ_aa.

*Conditional distribution

With the same partition of Y, μ, and Σ, the conditional distribution of Y_a given Y_b = y_b is also multivariate Gaussian, with mean

μ_{a|b} = μ_a + Σ_ab Σ_bb⁻¹ (y_b − μ_b)

and covariance

Σ_{aa|b} = Σ_aa − Σ_ab Σ_bb⁻¹ Σ_ba
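The partitioned-Gaussian conditioning formulas above can be computed directly with linear algebra. A small worked sketch (my own example; the mean vector and covariance matrix are arbitrary):

```python
import numpy as np

# Illustrative 3-dimensional example: condition Y_a = (Y1,) on Y_b = (Y2, Y3)
mu = np.array([1.0, 0.0, 2.0])
Sigma = np.array([[4.0, 1.2, 0.5],
                  [1.2, 3.0, 0.8],
                  [0.5, 0.8, 2.0]])

a, b = [0], [1, 2]                      # index sets for the partition
S_aa = Sigma[np.ix_(a, a)]
S_ab = Sigma[np.ix_(a, b)]
S_bb = Sigma[np.ix_(b, b)]

y_b = np.array([1.0, 1.0])              # observed value of Y_b
# mu_{a|b} = mu_a + S_ab S_bb^{-1} (y_b - mu_b)
mu_cond = mu[a] + S_ab @ np.linalg.solve(S_bb, y_b - mu[b])
# S_{aa|b} = S_aa - S_ab S_bb^{-1} S_ba
S_cond = S_aa - S_ab @ np.linalg.solve(S_bb, S_ab.T)
print(mu_cond, S_cond)
```

Note the conditional variance (here about 3.50, down from the marginal 4.0) never exceeds the marginal variance: observing Y_b can only reduce uncertainty about Y_a.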
Theorem: Q = (Y − μ)ᵗ Σ⁻¹ (Y − μ) ~ χ²ₙ

[proof:]

Since Σ is positive definite, Σ = TΛTᵗ, where T is a real orthogonal matrix (TᵗT = TTᵗ = I) and

Λ = diag(λ₁, λ₂, …, λₙ), with all λᵢ > 0.

Then Σ⁻¹ = (TΛTᵗ)⁻¹ = TΛ⁻¹Tᵗ. Thus,

Q = (Y − μ)ᵗ Σ⁻¹ (Y − μ) = (Y − μ)ᵗ TΛ⁻¹Tᵗ (Y − μ) = XᵗΛ⁻¹X

where X = Tᵗ(Y − μ). Further,

Q = XᵗΛ⁻¹X = (X₁, X₂, …, Xₙ) diag(1/λ₁, 1/λ₂, …, 1/λₙ) (X₁, X₂, …, Xₙ)ᵗ = Σᵢ₌₁ⁿ Xᵢ²/λᵢ

Therefore, if we can prove Xᵢ ~ N(0, λᵢ) and that the Xᵢ are mutually independent, then

Xᵢ/√λᵢ ~ N(0, 1), and Q = Σᵢ₌₁ⁿ (Xᵢ/√λᵢ)² ~ χ²ₙ.
The joint density function of X₁, X₂, …, Xₙ is

g(x₁, x₂, …, xₙ) = f(y)|J|,

where, since Y = TX + μ (equivalently X = Tᵗ(Y − μ)), the Jacobian is

J = det(∂y/∂x) = det(T), and det(T)² = det(Tᵗ)det(T) = det(TᵗT) = det(I) = 1, so |J| = |det(T)| = 1.

Also, |Σ| = |TΛTᵗ| = |T||Λ||Tᵗ| = |Λ| = λ₁λ₂⋯λₙ, and

(y − μ)ᵗΣ⁻¹(y − μ) = xᵗΛ⁻¹x = Σᵢ₌₁ⁿ xᵢ²/λᵢ.

Therefore the density function of X₁, X₂, …, Xₙ is

g(x) = f(y) = 1/((2π)^(n/2)|Σ|^(1/2)) exp{−½(y − μ)ᵗΣ⁻¹(y − μ)}
= 1/((2π)^(n/2)(λ₁λ₂⋯λₙ)^(1/2)) exp{−½ Σᵢ₌₁ⁿ xᵢ²/λᵢ}
= ∏ᵢ₌₁ⁿ (1/√(2πλᵢ)) exp{−xᵢ²/(2λᵢ)}

Therefore, Xᵢ ~ N(0, λᵢ), and X₁, ⋯, Xₙ are mutually independent.
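A simulation sketch of the theorem Q = (Y − μ)ᵗΣ⁻¹(Y − μ) ~ χ²ₙ (my own check; the mean and covariance values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
n, m = 3, 200_000
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.5, 0.2],
                  [0.3, 0.2, 1.0]])

Y = rng.multivariate_normal(mu, Sigma, size=m)
D = Y - mu
# Q = (Y-mu)^t Sigma^{-1} (Y-mu), computed row by row
Q = np.einsum('ij,ij->i', D @ np.linalg.inv(Sigma), D)

# Theory: Q ~ chi-square(n), so E[Q] = n = 3 and Var(Q) = 2n = 6
print(round(Q.mean(), 1), round(Q.var(), 1))
```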
Moment Generating Function of a Multivariate Normal Random Vector:

Let Y = (Y₁, Y₂, …, Yₙ)ᵗ ~ N(μ, Σ) and t = (t₁, t₂, …, tₙ)ᵗ. Then the moment generating function of Y is

M_Y(t) = M(t₁, t₂, …, tₙ) = E[exp(tᵗY)] = E[exp(t₁Y₁ + t₂Y₂ + ⋯ + tₙYₙ)] = exp{tᵗμ + ½tᵗΣt}
Theorem: If Y ~ N(μ, Σ) and C is a p × n matrix of rank p, then CY ~ N(Cμ, CΣCᵗ).

[proof:] Let X = CY. Then,

M_X(t) = E[exp(tᵗX)] = E[exp(tᵗCY)] = E[exp((Cᵗt)ᵗY)] = M_Y(Cᵗt)
= exp{(Cᵗt)ᵗμ + ½(Cᵗt)ᵗΣ(Cᵗt)} = exp{tᵗCμ + ½tᵗCΣCᵗt}

Since M_X(t) is the moment generating function of N(Cμ, CΣCᵗ), CY ~ N(Cμ, CΣCᵗ). ◆
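A numerical check of the linear-transformation theorem (my own sketch; C, μ, and Σ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[2.0, 0.4, 0.0],
                  [0.4, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
C = np.array([[1.0, -1.0, 0.0],
              [0.5, 0.5, 1.0]])      # a 2x3 matrix of rank 2

Y = rng.multivariate_normal(mu, Sigma, size=200_000)
X = Y @ C.T                          # each row is C y

# Theory: X ~ N(C mu, C Sigma C^t)
print(np.round(X.mean(axis=0), 1))   # compare with C @ mu
print(np.round(np.cov(X.T), 1))      # compare with C @ Sigma @ C.T
```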
Corollary: If Y ~ N(μ, σ²I) and T is an orthogonal matrix, then

TᵗY ~ N(Tᵗμ, σ²TᵗIT) = N(Tᵗμ, σ²I).
Theorem: If Y ~ N(μ, Σ), then the marginal distribution of any subset of the elements of Y is also multivariate normal. That is, if

Y = (Y₁, Y₂, …, Yₙ)ᵗ ~ N(μ, Σ),

then, for any indices 1 ≤ i₁ < i₂ < ⋯ < iₘ ≤ n with m ≤ n,

(Y_{i₁}, Y_{i₂}, …, Y_{iₘ})ᵗ ~ N(μ*, Σ*),

where μ* = (μ_{i₁}, μ_{i₂}, …, μ_{iₘ})ᵗ and Σ* is the m × m submatrix of Σ obtained by keeping the rows and columns indexed by i₁, i₂, …, iₘ, i.e., the (j, k) element of Σ* is σ_{iⱼiₖ}.
Theorem: Y has a multivariate normal distribution if and only if aᵗY is univariate normal for all real vectors a.

[proof:]

(⇐) Suppose E(Y) = μ, V(Y) = Σ, and aᵗY is univariate normal for every real vector a. Also,

E(aᵗY) = aᵗμ, V(aᵗY) = aᵗΣa.

Then aᵗY ~ N(aᵗμ, aᵗΣa). Since for Z ~ N(m, v) the moment generating function gives M_Z(1) = E[e^Z] = exp{m + ½v}, applying this with Z = aᵗY yields

M_Y(a) = E[exp(aᵗY)] = M_{aᵗY}(1) = exp{aᵗμ + ½aᵗΣa}.

Since exp{aᵗμ + ½aᵗΣa} is the moment generating function of N(μ, Σ), Y has the multivariate normal distribution N(μ, Σ).

(⇒) By the previous theorem (take C = aᵗ, a 1 × n matrix of rank 1). ◆
(4) Quadratic forms in normal variables
Theorem: If Y ~ N(μ, σ²I) and P is an n × n symmetric matrix of rank r, then

Q = (Y − μ)ᵗP(Y − μ)/σ²

is distributed as χ²ᵣ if and only if P² = P (i.e., P is idempotent).

[proof:]

(⇐) Suppose P² = P and rank(P) = r. Then P has r eigenvalues equal to 1 and n − r eigenvalues equal to 0. Thus, without loss of generality,

P = T diag(1, …, 1, 0, …, 0) Tᵗ (with r ones on the diagonal),

where T is an orthogonal matrix. Then, with Z = (Z₁, Z₂, …, Zₙ)ᵗ = Tᵗ(Y − μ),

Q = (Y − μ)ᵗ T diag(1, …, 1, 0, …, 0) Tᵗ (Y − μ)/σ² = Zᵗ diag(1, …, 1, 0, …, 0) Z/σ² = (Z₁² + Z₂² + ⋯ + Zᵣ²)/σ².

Since Y − μ ~ N(0, σ²I),

Z = Tᵗ(Y − μ) ~ N(0, σ²TᵗT) = N(0, σ²I),

so Z₁, Z₂, …, Zₙ are i.i.d. normal random variables with common variance σ². Therefore,

Q = (Z₁/σ)² + (Z₂/σ)² + ⋯ + (Zᵣ/σ)² ~ χ²ᵣ.
(⇒) Since P is symmetric, P = TΛTᵗ, where T is an orthogonal matrix and Λ is a diagonal matrix whose nonzero elements are the eigenvalues λ₁, λ₂, …, λᵣ. Let Z = Tᵗ(Y − μ). Since Y − μ ~ N(0, σ²I),

Z = Tᵗ(Y − μ) ~ N(0, σ²TᵗT) = N(0, σ²I).

That is, Z₁, Z₂, …, Zᵣ are independent normal random variables with variance σ². Then,

Q = (Y − μ)ᵗTΛTᵗ(Y − μ)/σ² = ZᵗΛZ/σ² = Σᵢ₌₁ʳ λᵢZᵢ²/σ².

The moment generating function of Q = Σᵢ₌₁ʳ λᵢZᵢ²/σ² is

E[exp(tQ)] = ∏ᵢ₌₁ʳ E[exp(tλᵢZᵢ²/σ²)] = ∏ᵢ₌₁ʳ ∫ (1/√(2πσ²)) exp{−zᵢ²(1 − 2λᵢt)/(2σ²)} dzᵢ = ∏ᵢ₌₁ʳ (1 − 2λᵢt)^(−1/2),

for t small enough that 1 − 2λᵢt > 0 for every i (each factor is a Gaussian integral).

Also, since Q is distributed as χ²ᵣ, its moment generating function is also equal to (1 − 2t)^(−r/2). Thus, for every such t,

E[e^(tQ)] = ∏ᵢ₌₁ʳ (1 − 2λᵢt)^(−1/2) = (1 − 2t)^(−r/2).

Further,

∏ᵢ₌₁ʳ (1 − 2λᵢt) = (1 − 2t)ʳ.

By the uniqueness of polynomial roots, we must have λᵢ = 1 for every i. Then P² = P by the following result: a symmetric matrix P is idempotent of rank r if and only if it has r eigenvalues equal to 1 and n − r eigenvalues equal to 0. ◆
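A simulation sketch of the theorem (my own check, with arbitrary parameters; the centering matrix is a standard example of a symmetric idempotent matrix of rank n − 1):

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed
n, sigma, m = 4, 2.0, 200_000

# The centering matrix P = I - (1/n) 1 1^t is symmetric idempotent of rank n-1
P = np.eye(n) - np.ones((n, n)) / n
assert np.allclose(P @ P, P)            # idempotency

Y = sigma * rng.standard_normal((m, n))          # Y - mu ~ N(0, sigma^2 I), taking mu = 0
Q = np.einsum('ij,ij->i', Y @ P, Y) / sigma**2   # Q = (Y-mu)^t P (Y-mu) / sigma^2

# Theory: Q ~ chi-square(n-1), so E[Q] = 3 and Var(Q) = 6
print(round(Q.mean(), 1), round(Q.var(), 1))
```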
Important Result: Let Y ~ N(0, I), and let Q₁ = YᵗP₁Y and Q₂ = YᵗP₂Y both be distributed as chi-square. Then Q₁ and Q₂ are independent if and only if P₁P₂ = 0.

Useful Lemma: If P₁² = P₁, P₂² = P₂, and P₁ − P₂ is positive semidefinite, then

P₁P₂ = P₂P₁ = P₂, and P₁ − P₂ is idempotent.
Theorem:
If Y ~ N(μ, σ²I) and

Q₁ = (Y − μ)ᵗP₁(Y − μ)/σ², Q₂ = (Y − μ)ᵗP₂(Y − μ)/σ²,

with Q₁ ~ χ²ᵣ₁, Q₂ ~ χ²ᵣ₂, and Q₁ − Q₂ ≥ 0, then Q₁ − Q₂ and Q₂ are independent, and Q₁ − Q₂ ~ χ² with r₁ − r₂ degrees of freedom.

[proof:]

We first prove Q₁ − Q₂ ~ χ² with r₁ − r₂ degrees of freedom. Since Q₁ − Q₂ ≥ 0,

(Y − μ)ᵗ(P₁ − P₂)(Y − μ)/σ² = Q₁ − Q₂ ≥ 0.

Since Y ~ N(μ, σ²I), Y − μ can be any vector in Rⁿ. Therefore P₁ − P₂ is positive semidefinite. By the above useful lemma, P₁ − P₂ is idempotent. Further, by the previous theorem,

Q₁ − Q₂ = (Y − μ)ᵗ(P₁ − P₂)(Y − μ)/σ² ~ χ² with r₁ − r₂ degrees of freedom,

since

rank(P₁ − P₂) = tr(P₁ − P₂) = tr(P₁) − tr(P₂) = rank(P₁) − rank(P₂) = r₁ − r₂.

We now prove Q₁ − Q₂ and Q₂ are independent. Since

(P₁ − P₂)P₂ = P₁P₂ − P₂² = P₂ − P₂ = 0,

by the previous important result, the proof is complete. ◆
(5) Regression and Simple Time Series Analysis
1. Let (x₁, y₁), ⋯, (xₙ, yₙ) be n points. Please derive the least squares regression line for the usual simple linear regression: yᵢ = β₀ + β₁xᵢ + εᵢ
Solution:
To minimize the sum (over all n points) of squared vertical distances from each point to the line,

SS = Σᵢ₌₁ⁿ (yᵢ − β₀ − β₁xᵢ)²,

we set both partial derivatives to zero:

∂SS/∂β₀ = 0 and ∂SS/∂β₁ = 0

⟹ Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ) = 0 and Σᵢ₌₁ⁿ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0

⟹ β̂₀ = ȳ − β̂₁x̄

⟹ Σᵢ₌₁ⁿ xᵢ(yᵢ − ȳ + β̂₁x̄ − β̂₁xᵢ) = 0

⟹ β̂₁ = Σᵢ₌₁ⁿ xᵢ(yᵢ − ȳ) / Σᵢ₌₁ⁿ xᵢ(xᵢ − x̄) = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)² = S_xy/S_xx

ŷ = β̂₀ + β̂₁x is the least squares regression line.
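The closed-form solution above in code (my own sketch; the small data set is made up for illustration, and `np.polyfit` is used only as a cross-check):

```python
import numpy as np

# Closed-form OLS for y = b0 + b1*x, checked against np.polyfit
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

Sxy = np.sum((x - x.mean()) * (y - y.mean()))
Sxx = np.sum((x - x.mean()) ** 2)
b1 = Sxy / Sxx                      # slope: S_xy / S_xx
b0 = y.mean() - b1 * x.mean()       # intercept: ybar - b1 * xbar

b1_np, b0_np = np.polyfit(x, y, deg=1)   # same fit from NumPy
print(round(b0, 3), round(b1, 3))
```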
2. Derive the least squares regression line for simple linear regression through the origin: yᵢ = β₁xᵢ + εᵢ

Solution:

To minimize

SS = Σᵢ₌₁ⁿ (yᵢ − β₁xᵢ)²:

∂SS/∂β₁ = 0 ⟹ Σᵢ₌₁ⁿ xᵢ(yᵢ − β̂₁xᵢ) = 0 ⟹ β̂₁ = Σᵢ₌₁ⁿ xᵢyᵢ / Σᵢ₌₁ⁿ xᵢ²

ŷ = β̂₁x is the least squares regression line through the origin.
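The through-the-origin formula in code (my own sketch with made-up data; `np.linalg.lstsq` with a single-column design matrix serves as a cross-check):

```python
import numpy as np

# Through-the-origin least squares: b1 = sum(x*y) / sum(x^2)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.9, 4.1, 6.0, 8.2])

b1 = np.sum(x * y) / np.sum(x * x)
# Same answer from least squares with a single-column design matrix (no intercept)
b1_ls = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]
print(round(b1, 3))
```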
3. Let (x₁, y₁), ⋯, (xₙ, yₙ) be n observed points. Our goal is to fit a simple linear regression model using the maximum likelihood estimator. The model and assumptions are as follows:

yᵢ = β₀ + β₁xᵢ + εᵢ, i = 1, 2, …, n.

We assume that the xᵢ's are given (we condition on the x's, so that the x's are not considered as random variables here), and εᵢ ~ i.i.d. N(0, σ²).

Solution:

Based on the assumptions above, we can derive the p.d.f. of yᵢ:

yᵢ ~ N(β₀ + β₁xᵢ, σ²) (independent but not i.i.d.) ⇔ f(yᵢ) = (1/(σ√(2π))) exp{−[yᵢ − (β₀ + β₁xᵢ)]²/(2σ²)}.
Thus, the likelihood and log-likelihood functions can be derived as follows:

likelihood: ℒ = f(y₁, y₂, ⋯, yₙ) = ∏ᵢ₌₁ⁿ f(yᵢ) = ∏ᵢ₌₁ⁿ (1/(σ√(2π))) exp{−[yᵢ − (β₀ + β₁xᵢ)]²/(2σ²)} = (2πσ²)^(−n/2) exp{−Σᵢ₌₁ⁿ [yᵢ − (β₀ + β₁xᵢ)]²/(2σ²)},

log-likelihood: l = log ℒ = −(n/2)log(2π) − (n/2)log(σ²) − (1/(2σ²)) Σᵢ₌₁ⁿ [yᵢ − (β₀ + β₁xᵢ)]².

In order to maximize the likelihood function, which is in turn maximizing the log-likelihood function, we take the first derivatives of the log-likelihood with respect to each parameter and set them to 0:

∂l/∂β₀ = (1/σ²) Σᵢ₌₁ⁿ [yᵢ − (β₀ + β₁xᵢ)] = 0

∂l/∂β₁ = (1/σ²) Σᵢ₌₁ⁿ xᵢ[yᵢ − (β₀ + β₁xᵢ)] = 0

∂l/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σᵢ₌₁ⁿ [yᵢ − (β₀ + β₁xᵢ)]² = 0

By solving the three equations above, we can derive the MLEs as shown below:
β̂₀ = ȳ − β̂₁x̄

β̂₁ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)² = S_xy/S_xx

σ̂² = (1/n) Σᵢ₌₁ⁿ [yᵢ − (β̂₀ + β̂₁xᵢ)]²

Now we have the fitted regression model by plugging in the above MLEs of the model parameters:

ŷᵢ = β̂₀ + β̂₁xᵢ, i = 1, 2, …, n.
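A simulation sketch of the MLEs (my own check; the true parameter values are arbitrary; note the MLE of σ² divides by n, not n − 2):

```python
import numpy as np

rng = np.random.default_rng(6)  # arbitrary seed
n, b0, b1, sigma = 500, 1.0, 2.0, 0.5
x = rng.uniform(0, 10, n)
y = b0 + b1 * x + sigma * rng.standard_normal(n)

# MLEs from the closed-form solutions above
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()
resid = y - (b0_hat + b1_hat * x)
sigma2_hat = np.mean(resid ** 2)    # divisor n, not n-2: the MLE is slightly biased

print(round(b0_hat, 1), round(b1_hat, 1), round(sigma2_hat, 2))
```

The estimates should land near the true values 1.0, 2.0, and σ² = 0.25.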
Another way to write down the exact likelihood is to use the multivariate normal distribution directly because the likelihood is simply the joint distribution of the entire sample.
Let Y = (Y₁, …, Yₙ)′. The general formula for the n-dimensional multivariate normal density function is

f(y₁, …, yₙ) = 1/((2π)^(n/2)|Σ|^(1/2)) exp{−½(y − μ)ᵗΣ⁻¹(y − μ)}

where E(Y) = μ and Σ is the covariance matrix of Y.

Given that we have i.i.d. Gaussian white noise, we can write down the exact likelihood of our regression dependent variables Y = (Y₁, …, Yₙ)′, conditioning on {x₁, …, xₙ}, as a multivariate normal density function directly as follows:

L(y; θ) = f(y₁, ⋯, yₙ; θ),

where θ = (β₀, β₁, σ²)′. We have:

E(Y) = μ = Xβ, where X is the n × 2 design matrix with rows (1, xᵢ) and β = (β₀, β₁)ᵗ

Var(Y) = Σ = σ²I, the n × n identity matrix scaled by σ²
4. Consider the stationary autoregressive model of order 1, that is, an AR(1) process: Xₜ = β₀ + β₁Xₜ₋₁ + εₜ, where {εₜ} is a sequence of Gaussian white noise.

*Definition: A process {Xₜ} is called (second-order, or weakly) stationary if its mean is constant and its autocovariance function depends only on the lag, that is,

E[X(t)] = μ

Cov[X(t), X(t + τ)] = γ(τ)

a) Please derive the conditional MLEs for β₀, β₁, and show that they are the same as the OLS estimators. (Assume that we have observed {x₁, x₂, ⋯, x_N}.)
b) Please write down the exact (full) likelihood.
c) Please derive the method of moments estimators (MOME) of the model parameters β₀, β₁. Are they the same as the OLS?
Solution:

a) By the Markov property of the AR(1) process, the conditional likelihood (conditioning on x₁) is:

L = f(x₂|x₁)f(x₃|x₁, x₂) ⋯ f(x_N|x₁, ⋯, x_{N−1}) = f(x₂|x₁)f(x₃|x₂) ⋯ f(x_N|x_{N−1})

= (1/(σ√(2π))) exp{−(x₂ − β₀ − β₁x₁)²/(2σ²)} · (1/(σ√(2π))) exp{−(x₃ − β₀ − β₁x₂)²/(2σ²)} ⋯ (1/(σ√(2π))) exp{−(x_N − β₀ − β₁x_{N−1})²/(2σ²)}

= (2πσ²)^(−(N−1)/2) exp{−Σₜ₌₂^N (xₜ − β₀ − β₁xₜ₋₁)²/(2σ²)}

l = log L = −((N − 1)/2) log(2πσ²) − (1/(2σ²)) Σₜ₌₂^N (xₜ − β₀ − β₁xₜ₋₁)²

∂l/∂β₀ = 0 ⟹ β̂₀ = (Σₜ₌₂^N xₜ − β̂₁ Σₜ₌₂^N xₜ₋₁)/(N − 1)

∂l/∂β₁ = 0 ⟹ Σₜ₌₂^N (xₜ − β̂₀ − β̂₁xₜ₋₁)xₜ₋₁ = 0

⟹ Σₜ₌₂^N [xₜ − (Σₜ₌₂^N xₜ − β̂₁ Σₜ₌₂^N xₜ₋₁)/(N − 1) − β̂₁xₜ₋₁] xₜ₋₁ = 0

⟹ β̂₁ = [Σₜ₌₂^N xₜxₜ₋₁ − (Σₜ₌₂^N xₜ)(Σₜ₌₂^N xₜ₋₁)/(N − 1)] / [Σₜ₌₂^N xₜ₋₁² − (Σₜ₌₂^N xₜ₋₁)²/(N − 1)]

Therefore the conditional MLEs are:

β̂₀ = (Σₜ₌₂^N xₜ − β̂₁ Σₜ₌₂^N xₜ₋₁)/(N − 1)

β̂₁ = [Σₜ₌₂^N xₜxₜ₋₁ − (Σₜ₌₂^N xₜ)(Σₜ₌₂^N xₜ₋₁)/(N − 1)] / [Σₜ₌₂^N xₜ₋₁² − (Σₜ₌₂^N xₜ₋₁)²/(N − 1)]
Next we show that the conditional MLEs are the same as the OLS estimators. Treat the N − 1 pairs

(x₁, x₂), (x₂, x₃), ⋮, (x_{N−1}, x_N)

as (predictor, response) data (xₜ₋₁, xₜ), t = 2, ⋯, N, and apply the simple linear regression formulas:

⟹ β̂₀^OLS = ȳ − β̂₁x̄ = (Σₜ₌₂^N xₜ − β̂₁ Σₜ₌₂^N xₜ₋₁)/(N − 1) = β̂₀

⟹ β̂₁^OLS = S_xy/S_xx = [Σₜ₌₂^N xₜxₜ₋₁ − (Σₜ₌₂^N xₜ)(Σₜ₌₂^N xₜ₋₁)/(N − 1)] / [Σₜ₌₂^N xₜ₋₁² − (Σₜ₌₂^N xₜ₋₁)²/(N − 1)] = β̂₁
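A simulation sketch of the conditional MLE = OLS result (my own check; the true parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
b0, b1, sigma, N = 1.0, 0.6, 1.0, 5000

# Simulate a stationary AR(1), starting from its stationary distribution
x = np.empty(N)
x[0] = b0 / (1 - b1) + sigma / np.sqrt(1 - b1**2) * rng.standard_normal()
for t in range(1, N):
    x[t] = b0 + b1 * x[t - 1] + sigma * rng.standard_normal()

# Conditional MLE = OLS of x_t on x_{t-1}
xp, xc = x[:-1], x[1:]              # predictor x_{t-1}, response x_t
b1_hat = np.sum((xp - xp.mean()) * (xc - xc.mean())) / np.sum((xp - xp.mean()) ** 2)
b0_hat = xc.mean() - b1_hat * xp.mean()
print(round(b0_hat, 1), round(b1_hat, 1))
```

The estimates should land near the true values 1.0 and 0.6.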
b) Write down the exact (full) likelihood.

By stationarity, Xₜ = β₀ + β₁Xₜ₋₁ + εₜ ⟹ E(Xₜ) = β₀/(1 − β₁), Var(Xₜ) = σ²/(1 − β₁²)

L = f(x₂|x₁)f(x₃|x₁, x₂) ⋯ f(x_N|x₁, ⋯, x_{N−1}) · f(x₁)

= (2πσ²)^(−(N−1)/2) exp{−Σₜ₌₂^N (xₜ − β₀ − β₁xₜ₋₁)²/(2σ²)} · (1/√(2πσ²/(1 − β₁²))) exp{−[x₁ − β₀/(1 − β₁)]²/(2σ²/(1 − β₁²))}
The exact MLEs can thus be derived in a similar fashion as the conditional MLEs.
Another way to write down the exact likelihood is to use the multivariate normal distribution directly because the likelihood is simply the joint distribution of the entire sample.
Let X = (X₁, …, X_N)′. The general formula for the N-dimensional multivariate normal density function is

f(x₁, …, x_N) = 1/((2π)^(N/2)|Σ|^(1/2)) exp{−½(x − μ)ᵗΣ⁻¹(x − μ)}

where E(X) = μ and Σ is the covariance matrix of X.

Given that we have i.i.d. Gaussian white noise, we can write down the exact likelihood of our time series {X₁, …, X_N} as a multivariate normal density function directly as follows:

L(x; θ) = f(x₁, ⋯, x_N; θ)

For our stationary AR(1),

Xₜ = β₀ + β₁Xₜ₋₁ + εₜ

where {εₜ} is a series of Gaussian white noise (that is, i.i.d. N(0, σ²), t = 1, ⋯, N), θ = (β₀, β₁, σ²)′, and |β₁| < 1:

E(X) = μ = (β₀/(1 − β₁)) 𝟏, where 𝟏 is the N-vector of ones

Var(X) = Σ = σ²/(1 − β₁²) · [β₁^|i−j|], the N × N Toeplitz matrix whose (i, j) element is β₁^|i−j| (ones on the diagonal, β₁ on the first off-diagonals, up to β₁^(N−1) in the corners)
c) Derive the MOME of the model parameters β₀, β₁. Are they the same as the OLS?

E(Xₜ) = β₀/(1 − β₁) ⟹ x̄ = β̂₀/(1 − β̂₁)

γ(1) = Cov(Xₜ, Xₜ₋₁) = β₁γ(0) ⟹ ρ(1) = β₁ ⟹ β̂₁ = Σₜ₌₂^N (xₜ − x̄)(xₜ₋₁ − x̄) / Σₜ₌₁^N (xₜ − x̄)²

Therefore the MOMEs of the model parameters are:

β̂₀ = (1 − β̂₁)x̄,  β̂₁ = Σₜ₌₂^N (xₜ − x̄)(xₜ₋₁ − x̄) / Σₜ₌₁^N (xₜ − x̄)²

So we conclude that they are not the same as the OLS.
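The MOME formulas in code (my own sketch; the true parameter values are arbitrary; for large N the MOME and OLS estimates are numerically close even though the formulas differ):

```python
import numpy as np

rng = np.random.default_rng(8)  # arbitrary seed
b0, b1, sigma, N = 1.0, 0.6, 1.0, 5000

# Simulate the AR(1), starting at the stationary mean
x = np.empty(N)
x[0] = b0 / (1 - b1)
for t in range(1, N):
    x[t] = b0 + b1 * x[t - 1] + sigma * rng.standard_normal()

xbar = x.mean()
# rho(1): lag-1 sample autocorrelation (lag-1 autocovariance over the variance)
b1_mome = np.sum((x[1:] - xbar) * (x[:-1] - xbar)) / np.sum((x - xbar) ** 2)
b0_mome = (1 - b1_mome) * xbar
print(round(b0_mome, 1), round(b1_mome, 1))
```

The estimates should land near the true values 1.0 and 0.6.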
*Please see the supplementary notes on Cochran's Theorem to understand how to test the significance of regression, for a multiple regression, using the quadratic forms.