
Lecture Notes

Univariate, Bivariate, Multivariate Normal, Quadratic Forms

& Related Estimations

(1) (Univariate) Normal Distribution

Probability density function:

$X \sim N(\mu, \sigma^2)$: $X$ follows the normal distribution with mean $\mu$ and variance $\sigma^2$,

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad x \in \mathbb{R}.$$

Moment generating function:

$$M_X(t) = E\!\left[e^{tX}\right] = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx = e^{\mu t + \frac{\sigma^2 t^2}{2}}$$

Theorem 1 Sampling from the normal population

Let $X_1, X_2, \dots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$, where i.i.d. stands for independent and identically distributed.

1. The sample mean follows a normal distribution: $\bar{X} \sim N(\mu, \sigma^2/n)$.

2. A function of the sample variance, as shown below, follows the Chi-square distribution with (n-1) degrees of freedom:

$$\frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{\sigma^2} \sim \chi^2_{n-1}$$

*Reminder: The sample variance $S^2$ is defined as:

$$S^2 = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{n-1}$$

*Def 1: The Chi-square distribution is a special gamma distribution (*** Please find out which one it is.)

*Def 2: Let $Z_1, Z_2, \dots, Z_n \overset{\text{i.i.d.}}{\sim} N(0,1)$; then $W = \sum_{i=1}^{n} Z_i^2 \sim \chi^2_n$.
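*Numerical illustration: a minimal Python sketch (assuming numpy and scipy are available; it is an added check, not part of the derivation) verifying Def 2 by simulation, and hinting at Def 1 via the Gamma(n/2, scale 2) parameterization.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 5, 100_000

# W = sum of n squared independent standard normals
Z = rng.standard_normal((reps, n))
W = (Z**2).sum(axis=1)

# Compare empirical moments with the chi-square(n) distribution
print(W.mean(), stats.chi2(df=n).mean())   # both close to n = 5
print(W.var(),  stats.chi2(df=n).var())    # both close to 2n = 10

# Def 1 hint: chi-square(n) coincides with Gamma(shape = n/2, scale = 2)
x = np.linspace(0.1, 20, 5)
print(np.allclose(stats.chi2.pdf(x, df=n),
                  stats.gamma.pdf(x, a=n/2, scale=2)))  # True
```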

3. X & S are independent. 4. The Z-score transformation of the sample mean follows the standard normal distribution:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$

5. A T-random variable with n-1 degrees of freedom can be obtained by replacing the population standard deviation with its sample counterpart in the Z-score above:

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$
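*Numerical illustration: a short Python sketch (assuming numpy and scipy; an added check under simulated data) that draws repeated samples from $N(\mu,\sigma^2)$ and compares the Z, T, and chi-square pivots of Theorem 1 with their claimed distributions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n, reps = 10.0, 2.0, 8, 50_000

X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)
S = X.std(axis=1, ddof=1)           # sample standard deviation

Z = (xbar - mu) / (sigma / np.sqrt(n))
T = (xbar - mu) / (S / np.sqrt(n))
Q = (n - 1) * S**2 / sigma**2       # should be chi-square(n-1)

print(stats.kstest(Z, 'norm').pvalue)              # large p-value: Z ~ N(0,1)
print(stats.kstest(T, 't', args=(n - 1,)).pvalue)  # large p-value: T ~ t_{n-1}
print(Q.mean(), n - 1)                             # mean of chi2(n-1) is n-1
```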

*Def. of t-distribution (Student's t-distribution, first introduced by William Sealy Gosset)

Let $Z \sim N(0,1)$ and $W \sim \chi^2_k$, where $Z$ and $W$ are independent. Then,

$$T = \frac{Z}{\sqrt{W/k}} \sim t_k$$

*Def. of F-distribution

Let 𝑉~𝜒 & 𝑊~𝜒 , where 𝑉 & 𝑊 are independent.

Then,

F =V/m

W/k ~ 𝐹 ,

It is apparent that

T =Z

W/k ~ 𝐹 ,


(2) Bivariate Normal Distribution

$(X, Y) \sim BN(\mu_X, \sigma_X^2;\ \mu_Y, \sigma_Y^2;\ \rho)$, where $\rho$ is the correlation between $X$ and $Y$.

The joint p.d.f. of (𝑋, 𝑌) is

$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right\}$$

where $-\infty < x < \infty$, $-\infty < y < \infty$. Then X and Y are said to have the bivariate normal distribution.

The joint moment generating function for X and Y is

$$M(t_1, t_2) = \exp\left[t_1\mu_X + t_2\mu_Y + \frac{1}{2}\left(t_1^2\sigma_X^2 + 2\rho t_1 t_2 \sigma_X\sigma_Y + t_2^2\sigma_Y^2\right)\right]$$

Questions:

(a) Find the marginal pdf’s of X and Y;

(b) Prove that X and Y are independent if and only if ρ = 0.

(Here ρ is the population correlation coefficient between X and Y.)

(c) Find the distribution of $(X + Y)$.

(d) Find the conditional pdfs $f(x|y)$ and $f(y|x)$.

Solution:

(a)

The moment generating function of X can be given by

$$M_X(t) = M(t, 0) = \exp\left(\mu_X t + \frac{1}{2}\sigma_X^2 t^2\right).$$

Similarly, the moment generating function of Y can be given by

$$M_Y(t) = M(0, t) = \exp\left(\mu_Y t + \frac{1}{2}\sigma_Y^2 t^2\right).$$

Thus, X and Y are both marginally normally distributed, i.e.,

$$X \sim N(\mu_X, \sigma_X^2), \quad \text{and} \quad Y \sim N(\mu_Y, \sigma_Y^2).$$

The pdf of X is


$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X} \exp\left[-\frac{(x-\mu_X)^2}{2\sigma_X^2}\right].$$

The pdf of Y is

$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_Y} \exp\left[-\frac{(y-\mu_Y)^2}{2\sigma_Y^2}\right].$$

(b)

If $\rho = 0$, then

$$M(t_1, t_2) = \exp\left[\mu_X t_1 + \mu_Y t_2 + \frac{1}{2}\left(\sigma_X^2 t_1^2 + \sigma_Y^2 t_2^2\right)\right] = M(t_1, 0)\cdot M(0, t_2)$$

Therefore, X and Y are independent.

If X and Y are independent, then

$$M(t_1, t_2) = M(t_1, 0)\cdot M(0, t_2) = \exp\left[\mu_X t_1 + \mu_Y t_2 + \frac{1}{2}\left(\sigma_X^2 t_1^2 + \sigma_Y^2 t_2^2\right)\right]$$

$$= \exp\left[\mu_X t_1 + \mu_Y t_2 + \frac{1}{2}\left(\sigma_X^2 t_1^2 + 2\rho\sigma_X\sigma_Y t_1 t_2 + \sigma_Y^2 t_2^2\right)\right]$$

Therefore, $\rho = 0$.

(c)

$$M_{X+Y}(t) = E\left[e^{t(X+Y)}\right] = E\left[e^{tX + tY}\right]$$

Recall that $M(t_1, t_2) = E\left[e^{t_1 X + t_2 Y}\right]$; therefore we can obtain $M_{X+Y}(t)$ by setting $t_1 = t_2 = t$ in $M(t_1, t_2)$. That is,

$$M_{X+Y}(t) = M(t, t) = \exp\left[\mu_X t + \mu_Y t + \frac{1}{2}\left(\sigma_X^2 t^2 + 2\rho\sigma_X\sigma_Y t^2 + \sigma_Y^2 t^2\right)\right]$$

$$= \exp\left[(\mu_X + \mu_Y)t + \frac{1}{2}\left(\sigma_X^2 + 2\rho\sigma_X\sigma_Y + \sigma_Y^2\right)t^2\right]$$

$$\therefore \quad X + Y \sim N\left(\mu = \mu_X + \mu_Y,\ \sigma^2 = \sigma_X^2 + 2\rho\sigma_X\sigma_Y + \sigma_Y^2\right)$$


(d)

The conditional distribution of X given Y=y is given by

$$f(x|y) = \frac{f(x, y)}{f(y)} = \frac{1}{\sqrt{2\pi}\,\sigma_X\sqrt{1-\rho^2}} \exp\left\{-\frac{\left[x - \mu_X - \frac{\sigma_X}{\sigma_Y}\rho(y - \mu_Y)\right]^2}{2(1-\rho^2)\sigma_X^2}\right\}.$$

Similarly, the conditional distribution of Y given X=x is

$$f(y|x) = \frac{f(x, y)}{f(x)} = \frac{1}{\sqrt{2\pi}\,\sigma_Y\sqrt{1-\rho^2}} \exp\left\{-\frac{\left[y - \mu_Y - \frac{\sigma_Y}{\sigma_X}\rho(x - \mu_X)\right]^2}{2(1-\rho^2)\sigma_Y^2}\right\}.$$

Therefore:

$$X|Y = y \ \sim\ N\left(\mu_X + \rho\frac{\sigma_X}{\sigma_Y}(y - \mu_Y),\ (1-\rho^2)\sigma_X^2\right)$$

$$Y|X = x \ \sim\ N\left(\mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X),\ (1-\rho^2)\sigma_Y^2\right)$$
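*Numerical illustration: a minimal Python sketch (assuming numpy; parameter values are made up for illustration) checking parts (c) and (d) above by simulating from the bivariate normal and inspecting the sum and an approximate conditional slice.

```python
import numpy as np

rng = np.random.default_rng(2)
mu_x, mu_y = 1.0, -2.0
sig_x, sig_y, rho = 2.0, 3.0, 0.6
reps = 200_000

# Sample (X, Y) from the bivariate normal via its covariance matrix
mean = [mu_x, mu_y]
cov = [[sig_x**2, rho * sig_x * sig_y],
       [rho * sig_x * sig_y, sig_y**2]]
X, Y = rng.multivariate_normal(mean, cov, size=reps).T

# (c) X + Y ~ N(mu_x + mu_y, sig_x^2 + 2*rho*sig_x*sig_y + sig_y^2)
S = X + Y
print(S.mean(), mu_x + mu_y)
print(S.var(), sig_x**2 + 2 * rho * sig_x * sig_y + sig_y**2)

# (d) restrict to Y near y0 and compare with the conditional mean/variance
y0 = 0.0
near = np.abs(Y - y0) < 0.05
print(X[near].mean(), mu_x + rho * sig_x / sig_y * (y0 - mu_y))
print(X[near].var(), (1 - rho**2) * sig_x**2)
```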


(3) Multivariate Normal Distribution

Let $\mathbf{Y} = (Y_1, \dots, Y_k)^t \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. The general formula for the $k$-dimensional normal density is

$$f_{\mathbf{Y}}(y_1, \dots, y_k) = \frac{1}{(2\pi)^{k/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2}\left(\mathbf{y} - \boldsymbol{\mu}\right)^t \boldsymbol{\Sigma}^{-1}\left(\mathbf{y} - \boldsymbol{\mu}\right)\right]$$

where $E(\mathbf{Y}) = \boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ is the covariance matrix of $\mathbf{Y}$. Now recall the bivariate normal case, where $n = 2$; we have:

$$\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}, \qquad \sigma_{ij} = \mathrm{Cov}(Y_i, Y_j)$$

$$\boldsymbol{\Sigma}^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)} \begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix} = \frac{1}{1-\rho^2} \begin{pmatrix} 1/\sigma_1^2 & -\rho/\sigma_1\sigma_2 \\ -\rho/\sigma_1\sigma_2 & 1/\sigma_2^2 \end{pmatrix}$$

Thus the joint density of $Y_1$ and $Y_2$ is

$$f_{Y_1, Y_2}(y_1, y_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(y_1-\mu_1)^2}{\sigma_1^2} + \frac{(y_2-\mu_2)^2}{\sigma_2^2} - \frac{2\rho(y_1-\mu_1)(y_2-\mu_2)}{\sigma_1\sigma_2}\right]\right\}$$

Example: Plot of the bivariate pdf of $\mathbf{Y} = (Y_1, Y_2)^t$ with mean $\boldsymbol{\mu} = (0, 0)^t$ and covariance $\boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. (Figure omitted.)
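*A short Python/matplotlib sketch (assuming numpy, scipy, and matplotlib are available) that reproduces such a surface plot of the standard bivariate normal density:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

# Grid over which to evaluate the standard bivariate normal density
y1, y2 = np.meshgrid(np.linspace(-3, 3, 100), np.linspace(-3, 3, 100))
grid = np.dstack((y1, y2))

# Mean (0, 0) and identity covariance, as in the example
density = multivariate_normal(mean=[0, 0], cov=np.eye(2)).pdf(grid)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.plot_surface(y1, y2, density, cmap='viridis')
ax.set_xlabel('$y_1$'); ax.set_ylabel('$y_2$'); ax.set_zlabel('density')
plt.show()
```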


*Marginal distribution.

Let $\mathbf{Y}$ be partitioned so that $\mathbf{Y} = \begin{pmatrix} \mathbf{Y}_a \\ \mathbf{Y}_b \end{pmatrix}$, where $\mathbf{Y}_a = [Y_1, \dots, Y_k]^t$ and $\mathbf{Y}_b = [Y_{k+1}, \dots, Y_n]^t$. If $\mathbf{Y}$ is multivariate normal with mean $\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_a \\ \boldsymbol{\mu}_b \end{pmatrix}$ and covariance matrix $\boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{aa} & \boldsymbol{\Sigma}_{ab} \\ \boldsymbol{\Sigma}_{ba} & \boldsymbol{\Sigma}_{bb} \end{pmatrix}$, then the marginal distribution of $\mathbf{Y}_a$ is also multivariate Gaussian, with mean $\boldsymbol{\mu}_a = [\mu_1, \dots, \mu_k]^t$ and covariance matrix $\boldsymbol{\Sigma}_{aa}$.

* Conditional distribution

Let $\mathbf{Y}$ be partitioned so that $\mathbf{Y} = \begin{pmatrix} \mathbf{Y}_a \\ \mathbf{Y}_b \end{pmatrix}$. If $\mathbf{Y}$ is multivariate Gaussian with mean $\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_a \\ \boldsymbol{\mu}_b \end{pmatrix}$ and covariance matrix $\boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{aa} & \boldsymbol{\Sigma}_{ab} \\ \boldsymbol{\Sigma}_{ba} & \boldsymbol{\Sigma}_{bb} \end{pmatrix}$, then the conditional distribution of $\mathbf{Y}_a$ given that $\mathbf{Y}_b = \mathbf{y}_b$ is also multivariate Gaussian, with mean

$$\boldsymbol{\mu}_{a|b} = \boldsymbol{\mu}_a + \boldsymbol{\Sigma}_{ab}\boldsymbol{\Sigma}_{bb}^{-1}\left(\mathbf{y}_b - \boldsymbol{\mu}_b\right)$$

and covariance

$$\boldsymbol{\Sigma}_{aa|b} = \boldsymbol{\Sigma}_{aa} - \boldsymbol{\Sigma}_{ab}\boldsymbol{\Sigma}_{bb}^{-1}\boldsymbol{\Sigma}_{ba}$$
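*Numerical illustration: a minimal Python sketch (assuming numpy; the 3-dimensional example and its covariance matrix are made up) that computes the conditional mean and covariance from the partitioned formulas and checks them by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(3)

# A 3-dimensional normal, partitioned as Ya = (Y1,) and Yb = (Y2, Y3)
mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.5],
                  [0.3, 0.5, 1.0]])
a, b = [0], [1, 2]

def conditional_gaussian(mu, Sigma, a, b, yb):
    """Mean and covariance of Ya | Yb = yb for a multivariate normal."""
    Sbb_inv = np.linalg.inv(Sigma[np.ix_(b, b)])
    mean = mu[a] + Sigma[np.ix_(a, b)] @ Sbb_inv @ (yb - mu[b])
    cov = Sigma[np.ix_(a, a)] - Sigma[np.ix_(a, b)] @ Sbb_inv @ Sigma[np.ix_(b, a)]
    return mean, cov

yb = np.array([0.5, -0.5])
print(conditional_gaussian(mu, Sigma, a, b, yb))

# Monte Carlo check: restrict simulated draws to Yb near yb
Y = rng.multivariate_normal(mu, Sigma, size=500_000)
near = np.all(np.abs(Y[:, b] - yb) < 0.1, axis=1)
print(Y[near][:, a].mean(axis=0), Y[near][:, a].var(axis=0))
```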

Theorem:

Let $\mathbf{Y} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ be an $n$-dimensional multivariate normal random vector. Then

$$Q = \left(\mathbf{Y} - \boldsymbol{\mu}\right)^t \boldsymbol{\Sigma}^{-1} \left(\mathbf{Y} - \boldsymbol{\mu}\right) \sim \chi^2_n$$

[proof:]

Since $\boldsymbol{\Sigma}$ is positive definite, $\boldsymbol{\Sigma} = T\Lambda T^t$, where $T$ is a real orthogonal matrix ($TT^t = T^tT = I$) and

$$\Lambda = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$

Then $\boldsymbol{\Sigma}^{-1} = \left(T\Lambda T^t\right)^{-1} = T\Lambda^{-1}T^t$. Thus,

$$Q = \left(\mathbf{Y} - \boldsymbol{\mu}\right)^t \boldsymbol{\Sigma}^{-1} \left(\mathbf{Y} - \boldsymbol{\mu}\right) = \left(\mathbf{Y} - \boldsymbol{\mu}\right)^t T\Lambda^{-1}T^t \left(\mathbf{Y} - \boldsymbol{\mu}\right) = \mathbf{X}^t\Lambda^{-1}\mathbf{X}$$

where $\mathbf{X} = T^t\left(\mathbf{Y} - \boldsymbol{\mu}\right)$. Further,


$$Q = \mathbf{X}^t\Lambda^{-1}\mathbf{X} = (X_1, X_2, \dots, X_n)\begin{pmatrix} 1/\lambda_1 & 0 & \cdots & 0 \\ 0 & 1/\lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\lambda_n \end{pmatrix}\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = \sum_{i=1}^{n}\frac{X_i^2}{\lambda_i}$$

Therefore, if we can prove that $X_i \sim N(0, \lambda_i)$ and that the $X_i$ are mutually independent, then

$$\frac{X_i}{\sqrt{\lambda_i}} \sim N(0,1), \qquad Q = \sum_{i=1}^{n}\frac{X_i^2}{\lambda_i} \sim \chi^2_n.$$

The joint density function of $X_1, X_2, \dots, X_n$ is

$$g(\mathbf{x}) = g(x_1, x_2, \dots, x_n) = f(\mathbf{y})\,|J|,$$

where $\mathbf{Y} = T\mathbf{X} + \boldsymbol{\mu}$ (since $\mathbf{X} = T^t\left(\mathbf{Y} - \boldsymbol{\mu}\right)$) and the Jacobian is

$$J = \det\left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}}\right) = \det\begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_n}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_n} \end{pmatrix} = \det(T).$$

Since $T$ is orthogonal, $\det(TT^t) = \det(I) = 1$, so $\left[\det(T)\right]^2 = 1$ and $|J| = |\det(T)| = 1$.


Therefore, the density function of $X_1, X_2, \dots, X_n$ is

$$g(\mathbf{x}) = f(\mathbf{y})\,|J| = f(\mathbf{y}) = \frac{1}{(2\pi)^{n/2}\left[\det(\boldsymbol{\Sigma})\right]^{1/2}} \exp\left[-\frac{1}{2}\left(\mathbf{y} - \boldsymbol{\mu}\right)^t \boldsymbol{\Sigma}^{-1} \left(\mathbf{y} - \boldsymbol{\mu}\right)\right] = \frac{1}{(2\pi)^{n/2}\left[\det(\boldsymbol{\Sigma})\right]^{1/2}} \exp\left(-\frac{1}{2}\mathbf{x}^t \Lambda^{-1} \mathbf{x}\right)$$

Since $\det(\boldsymbol{\Sigma}) = \det(T\Lambda T^t) = \det(T)\det(\Lambda)\det(T^t) = \det(\Lambda)\det(TT^t) = \det(\Lambda) = \prod_{i=1}^{n}\lambda_i$, we have

$$g(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\left(\prod_{i=1}^{n}\lambda_i\right)^{1/2}} \exp\left(-\sum_{i=1}^{n}\frac{x_i^2}{2\lambda_i}\right) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\lambda_i}}\exp\left(-\frac{x_i^2}{2\lambda_i}\right)$$

Therefore, $X_i \sim N(0, \lambda_i)$ and $X_1, \dots, X_n$ are mutually independent.
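*Numerical illustration: a minimal Python sketch (assuming numpy and scipy; the covariance matrix is randomly generated for illustration) checking that $(\mathbf{Y}-\boldsymbol{\mu})^t\boldsymbol{\Sigma}^{-1}(\mathbf{Y}-\boldsymbol{\mu}) \sim \chi^2_n$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, reps = 4, 100_000

mu = np.array([1.0, 2.0, 0.0, -1.0])
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)        # a positive definite covariance matrix
Sigma_inv = np.linalg.inv(Sigma)

Y = rng.multivariate_normal(mu, Sigma, size=reps)
D = Y - mu
Q = np.einsum('ij,jk,ik->i', D, Sigma_inv, D)   # (Y-mu)' Sigma^{-1} (Y-mu)

# Q should follow the chi-square distribution with n degrees of freedom
print(Q.mean(), n)                               # mean of chi2(n) is n
print(stats.kstest(Q, 'chi2', args=(n,)).pvalue)
```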

Moment Generating Function of Multivariate Normal Random Variable:

Let

$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \qquad \mathbf{t} = \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_n \end{pmatrix}.$$

Then, the moment generating function for $\mathbf{Y}$ is

$$M_{\mathbf{Y}}(\mathbf{t}) = M(t_1, t_2, \dots, t_n) = E\left[\exp\left(\mathbf{t}^t\mathbf{Y}\right)\right] = E\left[\exp\left(t_1 Y_1 + t_2 Y_2 + \cdots + t_n Y_n\right)\right] = \exp\left(\mathbf{t}^t\boldsymbol{\mu} + \frac{1}{2}\mathbf{t}^t\boldsymbol{\Sigma}\mathbf{t}\right)$$


Theorem: If $\mathbf{Y} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and $C$ is a $p \times n$ matrix of rank $p$, then

$$C\mathbf{Y} \sim N\left(C\boldsymbol{\mu},\ C\boldsymbol{\Sigma}C^t\right).$$

[proof:] Let $\mathbf{X} = C\mathbf{Y}$. Then,

$$M_{\mathbf{X}}(\mathbf{t}) = E\left[\exp\left(\mathbf{t}^t\mathbf{X}\right)\right] = E\left[\exp\left(\mathbf{t}^tC\mathbf{Y}\right)\right] = E\left[\exp\left(\mathbf{s}^t\mathbf{Y}\right)\right], \qquad \mathbf{s} = C^t\mathbf{t}$$

$$= \exp\left(\mathbf{s}^t\boldsymbol{\mu} + \frac{1}{2}\mathbf{s}^t\boldsymbol{\Sigma}\mathbf{s}\right) = \exp\left(\mathbf{t}^tC\boldsymbol{\mu} + \frac{1}{2}\mathbf{t}^tC\boldsymbol{\Sigma}C^t\mathbf{t}\right)$$

Since $M_{\mathbf{X}}(\mathbf{t})$ is the moment generating function of $N\left(C\boldsymbol{\mu},\ C\boldsymbol{\Sigma}C^t\right)$,

$$C\mathbf{Y} \sim N\left(C\boldsymbol{\mu},\ C\boldsymbol{\Sigma}C^t\right). \quad ◆$$
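*Numerical illustration: a minimal Python sketch (assuming numpy; the matrices are made up for illustration) verifying that the empirical mean and covariance of $C\mathbf{Y}$ match $C\boldsymbol{\mu}$ and $C\boldsymbol{\Sigma}C^t$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 3, 200_000

mu = np.array([1.0, -1.0, 0.5])
A = rng.normal(size=(n, n))
Sigma = A @ A.T + np.eye(n)          # positive definite covariance
C = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])     # a p x n matrix of rank p (p = 2)

Y = rng.multivariate_normal(mu, Sigma, size=reps)
X = Y @ C.T                          # each row is C @ y

# Empirical mean and covariance of CY versus the theorem's C mu and C Sigma C'
print(X.mean(axis=0), C @ mu)
print(np.cov(X.T))
print(C @ Sigma @ C.T)
```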

Corollary: If $\mathbf{Y} \sim N(\boldsymbol{\mu}, \sigma^2 I)$, then

$$T\mathbf{Y} \sim N\left(T\boldsymbol{\mu},\ \sigma^2 I\right),$$

where T is an orthogonal matrix.

Theorem: If $\mathbf{Y} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then the marginal distribution of any subset of the elements of $\mathbf{Y}$ is also multivariate normal. That is, if

$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}),$$

then for any indices $1 \le i_1 < i_2 < \cdots < i_m \le n$ with $m \le n$,

$$\begin{pmatrix} Y_{i_1} \\ Y_{i_2} \\ \vdots \\ Y_{i_m} \end{pmatrix} \sim N\left(\boldsymbol{\mu}^*, \boldsymbol{\Sigma}^*\right), \qquad \boldsymbol{\mu}^* = \begin{pmatrix} \mu_{i_1} \\ \mu_{i_2} \\ \vdots \\ \mu_{i_m} \end{pmatrix}, \qquad \boldsymbol{\Sigma}^* = \begin{pmatrix} \sigma_{i_1 i_1} & \sigma_{i_1 i_2} & \cdots & \sigma_{i_1 i_m} \\ \sigma_{i_2 i_1} & \sigma_{i_2 i_2} & \cdots & \sigma_{i_2 i_m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{i_m i_1} & \sigma_{i_m i_2} & \cdots & \sigma_{i_m i_m} \end{pmatrix}.$$

Theorem: $\mathbf{Y}$ has a multivariate normal distribution if and only if $\mathbf{a}^t\mathbf{Y}$ is univariate normal for all real vectors $\mathbf{a}$.

[proof:]

($\Leftarrow$) Suppose $E(\mathbf{Y}) = \boldsymbol{\mu}$, $V(\mathbf{Y}) = \boldsymbol{\Sigma}$, and $\mathbf{a}^t\mathbf{Y}$ is univariate normal. Also,

$$E\left(\mathbf{a}^t\mathbf{Y}\right) = \mathbf{a}^t\boldsymbol{\mu}, \qquad V\left(\mathbf{a}^t\mathbf{Y}\right) = \mathbf{a}^t\boldsymbol{\Sigma}\mathbf{a}.$$

Then $\mathbf{a}^t\mathbf{Y} \sim N\left(\mathbf{a}^t\boldsymbol{\mu},\ \mathbf{a}^t\boldsymbol{\Sigma}\mathbf{a}\right)$. Since for $Z \sim N(m, v)$ we have $M_Z(1) = E\left(e^Z\right) = \exp\left(m + \frac{1}{2}v\right)$, setting $X = \mathbf{a}^t\mathbf{Y}$ gives

$$M_X(1) = E\left[\exp\left(\mathbf{a}^t\mathbf{Y}\right)\right] = \exp\left(\mathbf{a}^t\boldsymbol{\mu} + \frac{1}{2}\mathbf{a}^t\boldsymbol{\Sigma}\mathbf{a}\right) = M_{\mathbf{Y}}(\mathbf{a}).$$

Since $\exp\left(\mathbf{a}^t\boldsymbol{\mu} + \frac{1}{2}\mathbf{a}^t\boldsymbol{\Sigma}\mathbf{a}\right)$, viewed as a function of $\mathbf{a}$, is the moment generating function of $N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, $\mathbf{Y}$ has the multivariate normal distribution $N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$.

($\Rightarrow$) By the previous theorem. ◆


(4) Quadratic forms in normal variables

Theorem: If $\mathbf{Y} \sim N(\boldsymbol{\mu}, \sigma^2 I)$ and $P$ is an $n \times n$ symmetric matrix of rank $r$, then

$$Q = \frac{\left(\mathbf{Y} - \boldsymbol{\mu}\right)^t P \left(\mathbf{Y} - \boldsymbol{\mu}\right)}{\sigma^2}$$

is distributed as $\chi^2_r$ if and only if $P^2 = P$ (i.e., $P$ is idempotent).

[proof:]

($\Leftarrow$) Suppose $P^2 = P$ and $\mathrm{rank}(P) = r$. Then $P$ has $r$ eigenvalues equal to 1 and $n - r$ eigenvalues equal to 0. Thus, without loss of generality,

$$P = T\begin{pmatrix} 1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 \end{pmatrix}T^t = T\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}T^t$$

where $T$ is an orthogonal matrix. Then,

$$Q = \frac{\left(\mathbf{Y} - \boldsymbol{\mu}\right)^t P \left(\mathbf{Y} - \boldsymbol{\mu}\right)}{\sigma^2} = \frac{\left(\mathbf{Y} - \boldsymbol{\mu}\right)^t T\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}T^t \left(\mathbf{Y} - \boldsymbol{\mu}\right)}{\sigma^2} = \frac{\mathbf{Z}^t\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}\mathbf{Z}}{\sigma^2} = \frac{Z_1^2 + Z_2^2 + \cdots + Z_r^2}{\sigma^2}$$

Since $\mathbf{Z} = T^t\left(\mathbf{Y} - \boldsymbol{\mu}\right)$ and $\mathbf{Y} - \boldsymbol{\mu} \sim N(\mathbf{0}, \sigma^2 I)$, we have

$$\mathbf{Z} = T^t\left(\mathbf{Y} - \boldsymbol{\mu}\right) \sim N\left(\mathbf{0},\ \sigma^2 T^tT\right) = N\left(\mathbf{0},\ \sigma^2 I\right).$$


$Z_1, Z_2, \dots, Z_n$ are i.i.d. normal random variables with mean 0 and common variance $\sigma^2$. Therefore,

$$Q = \frac{Z_1^2 + Z_2^2 + \cdots + Z_r^2}{\sigma^2} = \left(\frac{Z_1}{\sigma}\right)^2 + \left(\frac{Z_2}{\sigma}\right)^2 + \cdots + \left(\frac{Z_r}{\sigma}\right)^2 \sim \chi^2_r$$

($\Rightarrow$) Since $P$ is symmetric, $P = T\Lambda T^t$, where $T$ is an orthogonal matrix and $\Lambda$ is a diagonal matrix with nonzero diagonal elements $\lambda_1, \lambda_2, \dots, \lambda_r$ (as $\mathrm{rank}(P) = r$). Thus, let $\mathbf{Z} = T^t\left(\mathbf{Y} - \boldsymbol{\mu}\right)$. Since $\mathbf{Y} - \boldsymbol{\mu} \sim N(\mathbf{0}, \sigma^2 I)$,

$$\mathbf{Z} = T^t\left(\mathbf{Y} - \boldsymbol{\mu}\right) \sim N\left(\mathbf{0},\ \sigma^2 T^tT\right) = N\left(\mathbf{0},\ \sigma^2 I\right).$$

That is, $Z_1, Z_2, \dots, Z_r$ are independent normal random variables with variance $\sigma^2$. Then,

$$Q = \frac{\left(\mathbf{Y} - \boldsymbol{\mu}\right)^t P \left(\mathbf{Y} - \boldsymbol{\mu}\right)}{\sigma^2} = \frac{\left(\mathbf{Y} - \boldsymbol{\mu}\right)^t T\Lambda T^t \left(\mathbf{Y} - \boldsymbol{\mu}\right)}{\sigma^2} = \frac{\mathbf{Z}^t\Lambda\mathbf{Z}}{\sigma^2} = \frac{\sum_{i=1}^{r}\lambda_i Z_i^2}{\sigma^2}$$

The moment generating function of $Q = \sum_{i=1}^{r}\lambda_i Z_i^2/\sigma^2$ is

$$E\left[\exp\left(t\sum_{i=1}^{r}\frac{\lambda_i Z_i^2}{\sigma^2}\right)\right] = \prod_{i=1}^{r}E\left[\exp\left(\frac{t\lambda_i Z_i^2}{\sigma^2}\right)\right]$$

For each $i$,

$$E\left[\exp\left(\frac{t\lambda_i Z_i^2}{\sigma^2}\right)\right] = \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(\frac{t\lambda_i z_i^2}{\sigma^2} - \frac{z_i^2}{2\sigma^2}\right)dz_i = \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{\left(1 - 2\lambda_i t\right)z_i^2}{2\sigma^2}\right]dz_i = \left(1 - 2\lambda_i t\right)^{-1/2}$$

(for $t$ small enough that $1 - 2\lambda_i t > 0$). Therefore,

$$E\left[e^{tQ}\right] = \prod_{i=1}^{r}\left(1 - 2\lambda_i t\right)^{-1/2}$$


Also, since $Q$ is distributed as $\chi^2_r$, its moment generating function is also equal to $(1 - 2t)^{-r/2}$. Thus, for every $t$ in a neighborhood of 0,

$$E\left[e^{tQ}\right] = \prod_{i=1}^{r}\left(1 - 2\lambda_i t\right)^{-1/2} = \left(1 - 2t\right)^{-r/2}$$

Further,

$$\prod_{i=1}^{r}\left(1 - 2\lambda_i t\right) = \left(1 - 2t\right)^{r}.$$

By the uniqueness of polynomial roots, we must have $\lambda_i = 1$ for all $i$. Then $P^2 = P$ by the following result: if a matrix $P$ is symmetric, then $P$ is idempotent of rank $r$ if and only if it has $r$ eigenvalues equal to 1 and $n - r$ eigenvalues equal to 0. ◆
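*Numerical illustration: a minimal Python sketch (assuming numpy and scipy) using the centering matrix $P = I - \frac{1}{n}J$, which is symmetric and idempotent of rank $n-1$, so $Q = (\mathbf{Y}-\boldsymbol{\mu})^tP(\mathbf{Y}-\boldsymbol{\mu})/\sigma^2$ should follow $\chi^2_{n-1}$. This particular choice of $P$ recovers $(n-1)S^2/\sigma^2$ from Theorem 1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, sigma, reps = 6, 2.0, 100_000
mu = np.full(n, 3.0)

# P = I - (1/n) J is symmetric and idempotent with rank r = n - 1
P = np.eye(n) - np.ones((n, n)) / n
r = np.linalg.matrix_rank(P)
print(np.allclose(P @ P, P), r)       # True, 5

Y = rng.normal(mu, sigma, size=(reps, n))
D = Y - mu
Q = np.einsum('ij,jk,ik->i', D, P, D) / sigma**2

print(Q.mean(), r)                    # mean of chi2(r) is r
print(stats.kstest(Q, 'chi2', args=(r,)).pvalue)
```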

Important Result: Let $\mathbf{Y} \sim N(\mathbf{0}, I)$ and let $Q_1 = \mathbf{Y}^tP_1\mathbf{Y}$ and $Q_2 = \mathbf{Y}^tP_2\mathbf{Y}$ both be distributed as chi-square. Then $Q_1$ and $Q_2$ are independent if and only if $P_1P_2 = 0$.

Useful Lemma: If $P_1^2 = P_1$, $P_2^2 = P_2$, and $P_1 - P_2$ is positive semidefinite, then

$$P_1P_2 = P_2P_1 = P_2$$

and $P_1 - P_2$ is idempotent.

Theorem: If $\mathbf{Y} \sim N(\boldsymbol{\mu}, \sigma^2 I)$ and

$$Q_1 = \frac{\left(\mathbf{Y} - \boldsymbol{\mu}\right)^t P_1 \left(\mathbf{Y} - \boldsymbol{\mu}\right)}{\sigma^2}, \qquad Q_2 = \frac{\left(\mathbf{Y} - \boldsymbol{\mu}\right)^t P_2 \left(\mathbf{Y} - \boldsymbol{\mu}\right)}{\sigma^2},$$

with $Q_1 \sim \chi^2_{r_1}$, $Q_2 \sim \chi^2_{r_2}$, and $Q_1 - Q_2 \geq 0$, then $Q_1 - Q_2$ and $Q_2$ are independent and $Q_1 - Q_2 \sim \chi^2_{r_1 - r_2}$.

[proof:]


We first prove $Q_1 - Q_2 \sim \chi^2_{r_1 - r_2}$. Since $Q_1 - Q_2 \geq 0$,

$$Q_1 - Q_2 = \frac{\left(\mathbf{Y} - \boldsymbol{\mu}\right)^t \left(P_1 - P_2\right) \left(\mathbf{Y} - \boldsymbol{\mu}\right)}{\sigma^2} \geq 0$$

Since $\mathbf{Y} - \boldsymbol{\mu} \sim N(\mathbf{0}, \sigma^2 I)$, $\mathbf{Y} - \boldsymbol{\mu}$ can be any vector in $R^n$. Therefore, $P_1 - P_2$ is positive semidefinite. By the above useful lemma, $P_1 - P_2$ is idempotent. Further, by the previous theorem,

$$Q_1 - Q_2 = \frac{\left(\mathbf{Y} - \boldsymbol{\mu}\right)^t \left(P_1 - P_2\right) \left(\mathbf{Y} - \boldsymbol{\mu}\right)}{\sigma^2} \sim \chi^2_{r_1 - r_2}$$

since

$$\mathrm{rank}(P_1 - P_2) = \mathrm{tr}(P_1 - P_2) = \mathrm{tr}(P_1) - \mathrm{tr}(P_2) = \mathrm{rank}(P_1) - \mathrm{rank}(P_2) = r_1 - r_2.$$

We now prove that $Q_1 - Q_2$ and $Q_2$ are independent. Since

$$\left(P_1 - P_2\right)P_2 = P_1P_2 - P_2P_2 = P_2 - P_2 = 0,$$

the proof is complete by the previous important result. ◆


(5) Regression and Simple Time Series Analysis

1. Let $(x_1, y_1), \cdots, (x_n, y_n)$ be $n$ points. Please derive the least squares regression line for the usual simple linear regression: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$.

Solution:

To minimize the sum (over all n points) of squared vertical distances from each point to the line:

$$SS = \sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1 x_i\right)^2$$

Setting the partial derivatives to zero:

$$\begin{cases} \dfrac{\partial SS}{\partial \beta_0} = 0 \\[2ex] \dfrac{\partial SS}{\partial \beta_1} = 0 \end{cases} \Longrightarrow \begin{cases} \sum_{i=1}^{n}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) = 0 \\[1ex] \sum_{i=1}^{n} x_i\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) = 0 \end{cases}$$

$$\Longrightarrow \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$$

$$\Longrightarrow \sum_{i=1}^{n} x_i\left(y_i - \bar{y} + \hat{\beta}_1\bar{x} - \hat{\beta}_1 x_i\right) = 0$$

$$\Longrightarrow \hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i\left(y_i - \bar{y}\right)}{\sum_{i=1}^{n} x_i\left(x_i - \bar{x}\right)} = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} = \frac{S_{xy}}{S_{xx}}$$

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ is the least squares regression line.
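*Numerical illustration: a minimal Python sketch (assuming numpy; the data values are made up) implementing the closed-form least squares estimates and cross-checking them against numpy's polynomial fit.

```python
import numpy as np

def simple_ols(x, y):
    """Least squares estimates for y = b0 + b1*x + error."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    Sxy = np.sum((x - x.mean()) * (y - y.mean()))
    Sxx = np.sum((x - x.mean()) ** 2)
    b1 = Sxy / Sxx
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Tiny usage example with made-up data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b0, b1 = simple_ols(x, y)
print(b0, b1)

# Cross-check against numpy's polynomial least squares fit
print(np.polyfit(x, y, deg=1))   # returns [b1, b0]
```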


2. Derive the least squares regression line for simple linear regression through the origin: $y_i = \beta_1 x_i + \varepsilon_i$.

Solution:

To minimize

$$SS = \sum_{i=1}^{n}\left(y_i - \beta_1 x_i\right)^2:$$

$$\frac{\partial SS}{\partial \beta_1} = 0 \Longrightarrow \sum_{i=1}^{n} x_i\left(y_i - \hat{\beta}_1 x_i\right) = 0 \Longrightarrow \hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$$

$\hat{y} = \hat{\beta}_1 x$ is the least squares regression line through the origin.
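*Numerical illustration: a minimal Python sketch (assuming numpy; the data values are made up) for the through-the-origin slope, with an equivalent least squares solve as a cross-check.

```python
import numpy as np

def ols_through_origin(x, y):
    """Least squares slope for y = b1*x + error (no intercept)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sum(x * y) / np.sum(x ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.2, 3.9, 6.1, 8.0])
print(ols_through_origin(x, y))
# Equivalent via lstsq on a single-column design matrix
print(np.linalg.lstsq(x[:, None], y, rcond=None)[0])
```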


3. Let $(x_1, y_1), \cdots, (x_n, y_n)$ be $n$ observed points. Our goal is to fit a simple linear regression model using the maximum likelihood estimator. The model and assumptions are as follows:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \dots, n.$$

We assume that the $x_i$'s are given (we condition on the $x$'s, so that the $x$'s are not considered as random variables here), and $\varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$.

Solution:

Based on the assumptions that

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \dots, n, \quad x_i \text{ given, and } \varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2),$$

we can derive the p.d.f. of $y_i$:

$$y_i \overset{\text{independent but not i.i.d.}}{\sim} N\left(\beta_0 + \beta_1 x_i,\ \sigma^2\right) \iff f(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{\left[y_i - (\beta_0 + \beta_1 x_i)\right]^2}{2\sigma^2}}.$$

Thus, the likelihood and log-likelihood function can be derived as follows:

$$\text{likelihood:}\quad \mathcal{L} = f(y_1, y_2, \cdots, y_n) = \prod_{i=1}^{n} f(y_i) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{\left[y_i - (\beta_0 + \beta_1 x_i)\right]^2}{2\sigma^2}} = \left(2\pi\sigma^2\right)^{-n/2}e^{-\frac{\sum_{i=1}^{n}\left[y_i - (\beta_0 + \beta_1 x_i)\right]^2}{2\sigma^2}},$$

$$\text{log-likelihood:}\quad l = \log\mathcal{L} = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left[y_i - (\beta_0 + \beta_1 x_i)\right]^2.$$

To maximize the likelihood function, which is equivalent to maximizing the log-likelihood function, we take the first derivatives of the log-likelihood with respect to each parameter and set them to 0.

$$\begin{cases} \dfrac{\partial l}{\partial \beta_0} = \dfrac{1}{\sigma^2}\displaystyle\sum_{i=1}^{n}\left[y_i - (\beta_0 + \beta_1 x_i)\right] = 0 \\[2ex] \dfrac{\partial l}{\partial \beta_1} = \dfrac{1}{\sigma^2}\displaystyle\sum_{i=1}^{n}x_i\left[y_i - (\beta_0 + \beta_1 x_i)\right] = 0 \\[2ex] \dfrac{\partial l}{\partial \sigma^2} = -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4}\displaystyle\sum_{i=1}^{n}\left[y_i - (\beta_0 + \beta_1 x_i)\right]^2 = 0 \end{cases}$$

By solving the three equations above, we can derive the MLEs as shown below:


$$\begin{cases} \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} \\[1ex] \hat{\beta}_1 = \dfrac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} = \dfrac{S_{xy}}{S_{xx}} \\[2ex] \hat{\sigma}^2 = \dfrac{1}{n}\displaystyle\sum_{i=1}^{n}\left[y_i - \left(\hat{\beta}_0 + \hat{\beta}_1 x_i\right)\right]^2 \end{cases}$$

Now we have the fitted regression model by plugging in the above MLEs of the model parameters:

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i, \quad i = 1, 2, \dots, n.$$

Another way to write down the exact likelihood is to use the multivariate normal distribution directly because the likelihood is simply the joint distribution of the entire sample.

Let $\mathbf{Y} = (Y_1, \dots, Y_n)'$. The general formula for the $n$-dimensional multivariate normal density function is

$$f(y_1, \dots, y_n) = \frac{1}{(2\pi)^{n/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2}\left(\mathbf{y} - \boldsymbol{\mu}\right)'\boldsymbol{\Sigma}^{-1}\left(\mathbf{y} - \boldsymbol{\mu}\right)\right]$$

where $E(\mathbf{Y}) = \boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ is the covariance matrix of $\mathbf{Y}$.

Given that we have i.i.d. Gaussian white noise, we can write down the exact likelihood of our regression dependent variables $\mathbf{Y} = (Y_1, \dots, Y_n)'$, conditioning on $\{X_1, \dots, X_n\}$, as a multivariate normal density function directly as follows:

$$L(\mathbf{y}; \boldsymbol{\theta}) = f(y_1, \cdots, y_n; \boldsymbol{\theta})$$

where $\boldsymbol{\theta} = (\beta_0, \beta_1, \sigma^2)'$. We have:

$$E(\mathbf{Y}) = \boldsymbol{\mu} = X\boldsymbol{\beta} = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \qquad \mathrm{Var}(\mathbf{Y}) = \boldsymbol{\Sigma} = \sigma^2\begin{pmatrix} 1 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} = \sigma^2\mathbf{I}$$
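*Numerical illustration: a minimal Python sketch (assuming numpy; the true parameter values are made up for the simulation) computing the regression MLEs from the closed-form solutions above, including $\hat{\sigma}^2$ with its $1/n$ divisor, and cross-checking the slope and intercept with a matrix least squares solve.

```python
import numpy as np

rng = np.random.default_rng(7)
n, b0, b1, sigma = 200, 1.0, 2.5, 0.7

x = rng.uniform(0, 10, size=n)
y = b0 + b1 * x + rng.normal(0, sigma, size=n)

# MLEs from the closed-form solutions above
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
Sxx = np.sum((x - x.mean()) ** 2)
b1_hat = Sxy / Sxx
b0_hat = y.mean() - b1_hat * x.mean()
resid = y - (b0_hat + b1_hat * x)
sigma2_hat = np.mean(resid ** 2)      # note the 1/n divisor (MLE, not unbiased)

print(b0_hat, b1_hat, sigma2_hat)

# Cross-check the intercept/slope against a matrix least squares solve
X = np.column_stack([np.ones(n), x])
print(np.linalg.lstsq(X, y, rcond=None)[0])
```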


4. For the stationary autoregressive model of order 1, that is, an AR(1) process: $X_t = \beta_0 + \beta_1 X_{t-1} + \varepsilon_t$, where $\{\varepsilon_t\}$ is a sequence of Gaussian white noise.

*Definition: A process $\{X_t\}$ is called (second-order or weakly) stationary if its mean is constant and its autocovariance function depends only on the lag, that is,

$$E[X(t)] = \mu, \qquad \mathrm{Cov}[X(t), X(t+\tau)] = \gamma(\tau)$$

a) Please derive the conditional MLE for $\beta_0$, $\beta_1$. Please show that they are the same as the OLS estimators. (Assume that we have observed $\{x_1, x_2, \cdots, x_N\}$.)

b) Please write down the exact (full) likelihood.

c) Please derive the method of moments estimator (MOME) of the model parameters $\beta_0$, $\beta_1$. Are they the same as the OLS estimators?

Solution:

a) The conditional likelihood:

$$L = f(x_2|x_1)f(x_3|x_2, x_1)\cdots f(x_N|x_{N-1}, \cdots, x_1) = f(x_2|x_1)f(x_3|x_2)\cdots f(x_N|x_{N-1})$$

$$= \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x_2 - \beta_0 - \beta_1 x_1)^2}{2\sigma^2}\right]\cdot\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x_3 - \beta_0 - \beta_1 x_2)^2}{2\sigma^2}\right]\cdots\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x_N - \beta_0 - \beta_1 x_{N-1})^2}{2\sigma^2}\right]$$

$$= \prod_{t=2}^{N}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x_t - \beta_0 - \beta_1 x_{t-1})^2}{2\sigma^2}\right]$$

$$l = \log L = -\frac{N-1}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{t=2}^{N}(x_t - \beta_0 - \beta_1 x_{t-1})^2$$

$$\frac{\partial l}{\partial \beta_0} = 0 \Longrightarrow \hat{\beta}_0 = \frac{\sum_{t=2}^{N}x_t - \hat{\beta}_1\sum_{t=2}^{N}x_{t-1}}{N-1}$$

$$\frac{\partial l}{\partial \beta_1} = 0 \Longrightarrow \sum_{t=2}^{N}\left(x_t - \hat{\beta}_0 - \hat{\beta}_1 x_{t-1}\right)x_{t-1} = 0 \Longrightarrow \sum_{t=2}^{N}\left(x_t - \frac{\sum_{t=2}^{N}x_t - \hat{\beta}_1\sum_{t=2}^{N}x_{t-1}}{N-1} - \hat{\beta}_1 x_{t-1}\right)x_{t-1} = 0$$

$$\Longrightarrow \hat{\beta}_1 = \frac{\sum_{t=2}^{N}x_t x_{t-1} - \frac{\left(\sum_{t=2}^{N}x_t\right)\left(\sum_{t=2}^{N}x_{t-1}\right)}{N-1}}{\sum_{t=2}^{N}x_{t-1}^2 - \frac{\left(\sum_{t=2}^{N}x_{t-1}\right)^2}{N-1}}$$

Therefore the conditional MLEs are:

$$\hat{\beta}_0 = \frac{\sum_{t=2}^{N}x_t - \hat{\beta}_1\sum_{t=2}^{N}x_{t-1}}{N-1}, \qquad \hat{\beta}_1 = \frac{\sum_{t=2}^{N}x_t x_{t-1} - \frac{\left(\sum_{t=2}^{N}x_t\right)\left(\sum_{t=2}^{N}x_{t-1}\right)}{N-1}}{\sum_{t=2}^{N}x_{t-1}^2 - \frac{\left(\sum_{t=2}^{N}x_{t-1}\right)^2}{N-1}}$$

Next we show that the conditional MLEs are the same as the OLS estimators. Treating the lagged value as the predictor and the current value as the response, the regression data pairs are $(x_1, x_2), (x_2, x_3), \dots, (x_{N-1}, x_N)$, i.e., $N-1$ pairs in total.

$$\Longrightarrow \hat{\beta}_0^{OLS} = \bar{y} - \hat{\beta}_1^{OLS}\bar{x} = \frac{\sum_{t=2}^{N}x_t - \hat{\beta}_1^{OLS}\sum_{t=2}^{N}x_{t-1}}{N-1} = \hat{\beta}_0$$

$$\Longrightarrow \hat{\beta}_1^{OLS} = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{t=2}^{N}x_t x_{t-1} - \frac{\left(\sum_{t=2}^{N}x_t\right)\left(\sum_{t=2}^{N}x_{t-1}\right)}{N-1}}{\sum_{t=2}^{N}x_{t-1}^2 - \frac{\left(\sum_{t=2}^{N}x_{t-1}\right)^2}{N-1}} = \hat{\beta}_1$$
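*Numerical illustration: a minimal Python sketch (assuming numpy; the AR(1) parameter values are made up) simulating a stationary AR(1) path and recovering $\beta_0$, $\beta_1$ by the conditional MLE / OLS regression of $x_t$ on $x_{t-1}$.

```python
import numpy as np

rng = np.random.default_rng(8)
N, b0, b1, sigma = 2_000, 0.5, 0.6, 1.0

# Simulate a stationary AR(1) path: X_t = b0 + b1*X_{t-1} + eps_t
x = np.empty(N)
x[0] = b0 / (1 - b1) + rng.normal(0, sigma / np.sqrt(1 - b1**2))
for t in range(1, N):
    x[t] = b0 + b1 * x[t - 1] + rng.normal(0, sigma)

# Conditional MLE = OLS of x_t on x_{t-1}
y_resp, x_pred = x[1:], x[:-1]
Sxy = np.sum((x_pred - x_pred.mean()) * (y_resp - y_resp.mean()))
Sxx = np.sum((x_pred - x_pred.mean()) ** 2)
b1_hat = Sxy / Sxx
b0_hat = y_resp.mean() - b1_hat * x_pred.mean()
print(b0_hat, b1_hat)   # should be close to (0.5, 0.6)
```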

b) Write down the exact (full) likelihood.

$$X_t = \beta_0 + \beta_1 X_{t-1} + \varepsilon_t \Longrightarrow E(X_t) = \frac{\beta_0}{1 - \beta_1}, \qquad \mathrm{Var}(X_t) = \frac{\sigma^2}{1 - \beta_1^2}$$

$$L = f(x_2|x_1)f(x_3|x_2, x_1)\cdots f(x_N|x_{N-1}, \cdots, x_1)\cdot f(x_1)$$

$$= \prod_{t=2}^{N}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x_t - \beta_0 - \beta_1 x_{t-1})^2}{2\sigma^2}\right]\cdot\frac{1}{\sqrt{2\pi\frac{\sigma^2}{1-\beta_1^2}}}\exp\left[-\frac{\left(x_1 - \frac{\beta_0}{1-\beta_1}\right)^2}{2\frac{\sigma^2}{1-\beta_1^2}}\right]$$

The exact MLEs can thus be derived in a similar fashion as the conditional MLEs.

Another way to write down the exact likelihood is to use the multivariate normal distribution directly because the likelihood is simply the joint distribution of the entire sample.

Let $\mathbf{X} = (X_1, \dots, X_N)'$. The general formula for the $N$-dimensional multivariate normal density function is

$$f(x_1, \dots, x_N) = \frac{1}{(2\pi)^{N/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2}\left(\mathbf{x} - \boldsymbol{\mu}\right)'\boldsymbol{\Sigma}^{-1}\left(\mathbf{x} - \boldsymbol{\mu}\right)\right]$$


where $E(\mathbf{X}) = \boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ is the covariance matrix of $\mathbf{X}$.

Given that we have i.i.d. Gaussian white noise, we can write down the exact likelihood of our time series $\{X_1, \dots, X_N\}$ as a multivariate normal density function directly as follows:

$$L(\mathbf{x}; \boldsymbol{\theta}) = f(x_1, \cdots, x_N; \boldsymbol{\theta})$$

For our stationary AR(1)

$$X_t = \beta_0 + \beta_1 X_{t-1} + \varepsilon_t$$

where $\{\varepsilon_t\}$ is a series of Gaussian white noise (that is, i.i.d. $N(0, \sigma^2)$, $t = 1, \cdots, N$),

$$\boldsymbol{\theta} = (\beta_0, \beta_1, \sigma^2)', \qquad |\beta_1| < 1.$$

$$E(\mathbf{X}) = \boldsymbol{\mu} = \frac{\beta_0}{1-\beta_1}\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}, \qquad \mathrm{Var}(\mathbf{X}) = \boldsymbol{\Sigma} = \frac{\sigma^2}{1-\beta_1^2}\begin{pmatrix} 1 & \beta_1 & \beta_1^2 & \cdots & \beta_1^{N-1} \\ \beta_1 & 1 & \beta_1 & \cdots & \beta_1^{N-2} \\ \beta_1^2 & \beta_1 & 1 & \cdots & \beta_1^{N-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \beta_1^{N-1} & \beta_1^{N-2} & \beta_1^{N-3} & \cdots & 1 \end{pmatrix}$$

c) Derive the MOME of the model parameters $\beta_0$, $\beta_1$. Are they the same as the OLS estimators?

$$E(X_t) = \frac{\beta_0}{1-\beta_1} \Longrightarrow \bar{x} = \frac{\hat{\beta}_0}{1-\hat{\beta}_1}$$

$$\gamma(1) = \mathrm{Cov}(X_t, X_{t-1}) = \beta_1\gamma(0) \Longrightarrow \rho(1) = \beta_1 \Longrightarrow \frac{\sum_{t=2}^{N}(x_t - \bar{x})(x_{t-1} - \bar{x})}{\sum_{t=1}^{N}(x_t - \bar{x})^2} = \hat{\beta}_1$$

Therefore the MOME of the model parameters are:

$$\hat{\beta}_0 = \left(1 - \hat{\beta}_1\right)\bar{x}, \qquad \hat{\beta}_1 = \frac{\sum_{t=2}^{N}(x_t - \bar{x})(x_{t-1} - \bar{x})}{\sum_{t=1}^{N}(x_t - \bar{x})^2}$$

So we conclude that they are not the same as the OLS.
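*Numerical illustration: a minimal Python sketch (assuming numpy; the parameter values are made up) comparing the MOME with the OLS/conditional-MLE estimates on a simulated AR(1) path; the two sets of estimates differ slightly because the denominators and centering are not identical.

```python
import numpy as np

rng = np.random.default_rng(9)
N, b0, b1, sigma = 5_000, 0.5, 0.6, 1.0

x = np.empty(N)
x[0] = b0 / (1 - b1) + rng.normal(0, sigma / np.sqrt(1 - b1**2))
for t in range(1, N):
    x[t] = b0 + b1 * x[t - 1] + rng.normal(0, sigma)

xbar = x.mean()

# MOME: slope from the lag-1 sample autocorrelation, intercept from the mean
b1_mome = np.sum((x[1:] - xbar) * (x[:-1] - xbar)) / np.sum((x - xbar) ** 2)
b0_mome = (1 - b1_mome) * xbar

# OLS / conditional MLE: regress x_t on x_{t-1}
xp, yr = x[:-1], x[1:]
b1_ols = np.sum((xp - xp.mean()) * (yr - yr.mean())) / np.sum((xp - xp.mean()) ** 2)
b0_ols = yr.mean() - b1_ols * xp.mean()

print(b0_mome, b1_mome)
print(b0_ols, b1_ols)   # close to the MOME values, but not identical
```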

*Please see the supplementary notes on Cochran's Theorem to understand how to test the significance of regression, for a multiple regression, using quadratic forms.