Page 1: Slide Mancini Eco No Metrics EPFL2011

Master in Financial Engineering

Econometrics

Prof. Loriano Mancini

Swiss Finance Institute at EPFL

First semester

Slides version: September 2011

Page 2: Slide Mancini Eco No Metrics EPFL2011

Information about the course

• Material: slides, exercises, data, etc., at

http://sfi.epfl.ch/mfe → “Courses online” → “Mancini E”

Username: StudMFE

Password: Fall2011

Also, register at http://is-academia.epfl.ch

• Book: “Econometric Analysis”, sixth edition, W. Greene, Prentice Hall, 2008

• Assignments: each week, due the next Monday, groups of at most 3 persons

• Exams: written, closed book, closed notes, one A4 page of hand-written notes

• Grade: 30% homework, 30% midterm, 40% final exam

• Assistants: Benjamin Junge (E-mail: [email protected])

Emmanuel Leclercq (E-mail: [email protected])

1

Page 3: Slide Mancini Eco No Metrics EPFL2011

Information about the course

• Exercise sessions start on October 3rd

i.e. no exercise session on September 26th

• Prerequisites: W. Greene “Econometric Analysis” book

Appendix A on matrix algebra

Appendix B on probability and distributions

Appendix D on Laws of Large Numbers and Central Limit Theorems

2

Page 4: Slide Mancini Eco No Metrics EPFL2011

Agenda of the course

• Linear regression model

• Generalized regression model

• Panel data model

• Instrumental variables

• Generalized method of moments

• Maximum likelihood estimation

• Hypothesis testing

3

Page 5: Slide Mancini Eco No Metrics EPFL2011

Chapter 2: Econometric model

Econometrics: intersection of Economics and Statistics

Econometric model = association between yi and xi

E.g.: stock return yi (IBM) and market return xi (S&P 500 index)

Econometric model provides “approximate” description of the association

The relation will be stochastic and not deterministic

Econometric model provides probabilistic description of the association

Model: yi = f(xi) + εi

4

Page 6: Slide Mancini Eco No Metrics EPFL2011

Linear regression model

yi = f(xi1, . . . , xiK) + εi = xi1β1 + · · · + xiKβK + εi

yi: dependent or explained variable

xi: regressors or covariates or explanatory variables

εi: error term or random disturbance

Each observation in a sample yi, xi1, . . . , xiK , i = 1, . . . , n, comes from

yi = xi1β1 + · · · + xiKβK + εi

where xi1β1 + · · · + xiKβK is the “deterministic” part and εi is the random part

Goal: estimate β1, . . . , βK

5

Page 7: Slide Mancini Eco No Metrics EPFL2011

Assumptions of the linear regression model

Assumptions on the data generating process

1. Linearity: linear relationship between yi and xi1, . . . , xiK

2. Full rank: X = [x1, . . . , xK] is an n×K matrix with rank K

3. Exogeneity of the independent variables: E[εi|xj1, . . . , xjK] = 0, ∀i, j

4. Homoscedasticity and nonautocorrelation: Var[εi|X] = σ2, i = 1, . . . , n, and

Cov[εi, εj|X] = 0, ∀i ≠ j

5. Data generation: X can include constants and random variables

6. Normal distribution: ε|X ∼ N(0, σ2I)

Assumptions 4 and 6 simplify life but are too restrictive and will be relaxed

6

Page 8: Slide Mancini Eco No Metrics EPFL2011

Linearity of the regression model

The same linear model holds for all n observations {yi, xi1, . . . , xiK}, i = 1, . . . , n

y = x1β1 + · · ·+ xKβK + ε = Xβ + ε

Notation: y is an n× 1 vector; X = [x1, . . . , xK] is an n×K matrix;

ε is an n× 1 vector; β is a K × 1 vector

In the design matrix X: columns are variables, rows are observations

E.g. for the i-th observation: yi = x′i β + εi

Remark: we are modeling E[y|X] = Xβ, as E[ε|X] = 0 by assumption

Linearity refers to β and ε, not X

E.g. g(yi) = β h(xi) + εi is a linear model for any function g and h

7

Page 9: Slide Mancini Eco No Metrics EPFL2011

Error term ε

By assumption E[ε|X] = 0 =⇒ E[ε] = 0

Note: εi does not depend on any xj, neither past nor future xs

Let X̄ = E[X]. By the “tower property” or “law of iterated expectations”

Cov[ε, X] = E[ε(X − X̄)] = EX[E[ε(X − X̄)|X]] = EX[E[ε|X] (X − X̄)] = 0, as E[ε|X] = 0

E[ε|X] = 0 implies E[y|X] = Xβ, i.e. Xβ is the conditional mean of y|X

Our analysis is conditional on design matrix X which can be stochastic

8

Page 10: Slide Mancini Eco No Metrics EPFL2011

Spherical error term ε

Assumptions:

Homoscedasticity Var[εi|X] = σ2, i = 1, . . . , n

Nonautocorrelation: Cov[εi, εj|X] = 0, ∀i ≠ j

In short: E[ε ε′|X] = σ2I

9

Page 11: Slide Mancini Eco No Metrics EPFL2011

Data generating process for the regressors

X may include constants and random variables

“Golden rule”: include a column of 1s in X

Crucial assumption: ε ⊥ X

10

Page 12: Slide Mancini Eco No Metrics EPFL2011

Chapter 3: Least squares

Regression model: yi = x′i β + εi

Goal: statistical inference on β, e.g. estimate β

Population quantities, not observed: E[yi|xi] = x′i β, β, εi

Sample quantities, estimated from sample data: yi = x′i b, b, ei

11

Page 13: Slide Mancini Eco No Metrics EPFL2011

Least squares estimator

The least squares estimator b minimizes the sum of squared residuals:

b = arg min (over b0) ∑ᵢ₌₁ⁿ (yi − x′i b0)²

S(b0) := ∑ᵢ₌₁ⁿ (yi − x′i b0)² = ∑ᵢ₌₁ⁿ e²i0

= e′0 e0 = (y − Xb0)′(y − Xb0)

= y′y − 2y′Xb0 + b′0 X′Xb0

12

Page 14: Slide Mancini Eco No Metrics EPFL2011

Least squares estimator: normal equations

Necessary condition for a minimum:

∂S(b0)/∂b0 = ∂(y′y − 2y′Xb0 + b′0 X′Xb0)/∂b0 = −2X′y + 2X′Xb0 = 0

Let b be the solution, normal equations:

X ′Xb = X ′y

By assumption X has full column rank,

b = (X ′X)−1X ′y

Since X has full column rank, the following matrix is positive definite

∂²S(b0)/(∂b0 ∂b′0) = 2X′X

13
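The normal equations can be checked numerically. A minimal sketch in Python/NumPy (not part of the course material; the data are simulated for illustration):

```python
import numpy as np

# Simulated data (hypothetical numbers, for illustration only)
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])  # column of 1s + one regressor
beta = np.array([1.0, 2.0])
y = X @ beta + rng.standard_normal(n)

# Solve the normal equations X'X b = X'y (numerically preferred over forming the inverse)
b = np.linalg.solve(X.T @ X, X.T @ y)

# At the solution, X'(y - Xb) = 0 holds up to floating-point error
print(np.allclose(X.T @ (y - X @ b), 0.0))
```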

Page 15: Slide Mancini Eco No Metrics EPFL2011

Example: regression with simulated data

DGP: yi = x′i β + εi, with x′i = [1 x2i], x2i ∼ U[0, 1], εi ∼ N(0, 1)

i = 1, . . . , 100, β = [1 2]′, in this sample b = [1.01 2.07]′

[Figure: scatter of yi = x′i β + εi against x2i ∼ Uniform[0, 1], with true and estimated regression lines]

14

Page 16: Slide Mancini Eco No Metrics EPFL2011

Algebraic aspects of the least squares solution

“Golden rule”: include a column of 1s in X

Normal equations: 0 = X ′y −X ′Xb = X ′(y −Xb) = X ′e

First column of X, x1 = 1s, then the first normal equation is

0 = x′1 e = [1 · · · 1] e = ∑ᵢ₌₁ⁿ ei = ∑ᵢ₌₁ⁿ (yi − x′i b)

Implications:

1. Least squares residuals have zero mean, ē = 0

2. The estimated regression line passes through the means of the data, ȳ = x̄′b

3. The mean of the fitted values equals the mean of the actual data, i.e. the average of the ŷi equals ȳ

None of these implications holds if X does not include a column of 1s

15

Page 17: Slide Mancini Eco No Metrics EPFL2011

Projection

Estimated residuals e = y − Xb, LS estimator b = (X′X)⁻¹X′y

e = y − Xb = y − X(X′X)⁻¹X′y = (I − X(X′X)⁻¹X′)y = My

M is called the residual maker matrix, as My = e

As MX = 0, e′ŷ = y′M′Xb = 0: LS partitions y into two orthogonal parts

y = Xb + e = ŷ + e

P is called the projection matrix

ŷ = y − e = (I − M)y = X(X′X)⁻¹X′y = Py

16

Page 18: Slide Mancini Eco No Metrics EPFL2011

Properties of M and P matrices

M and P are symmetric, idempotent and orthogonal (PM = MP = 0)

Orthogonal decomposition of y

y = Xb + e = Py + My = projection + residual

Pythagorean theorem:

y′y = y′P′Py + y′M′My = ŷ′ŷ + e′e

17
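These matrix properties can be verified on a small example. A sketch in Python/NumPy (made-up data, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])  # hypothetical design
y = rng.standard_normal(n)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection matrix
M = np.eye(n) - P                      # residual maker matrix

# Symmetric, idempotent, mutually orthogonal
print(np.allclose(P @ P, P), np.allclose(M @ M, M), np.allclose(P @ M, 0))

# Orthogonal decomposition y = Py + My and the Pythagorean theorem
y_hat, e = P @ y, M @ y
print(np.allclose(y, y_hat + e), np.isclose(y @ y, y_hat @ y_hat + e @ e))
```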

Page 19: Slide Mancini Eco No Metrics EPFL2011

Partitioned regression

E.g. regression model: income = β0 + β1 age + β2 education + error

Goal: study income–education association; age is a control variable

Model: y = Xβ + ε = X1β1 + X2β2 + ε, solve normal equations for b2

b2 = (X′2 M1 X2)⁻¹ X′2 M1 y

= (X′2 M′1 M1 X2)⁻¹ X′2 M′1 M1 y

= (X*′2 X*2)⁻¹ X*′2 y*

M1 is the residual maker matrix based on the columns of X1

b2 is obtained by regressing y* on X*2

y* (resp. X*2) are the residuals from a regression of y (resp. X2) on X1

18
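The partitioned-regression (Frisch–Waugh–Lovell) result above can be demonstrated numerically. A sketch in Python/NumPy with simulated data (variable names such as `X1` standing for, e.g., constant and age are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X1 = np.column_stack([np.ones(n), rng.standard_normal(n)])  # e.g. constant and age
X2 = rng.standard_normal((n, 1))                            # e.g. education
y = X1 @ np.array([1.0, 0.5]) + 2.0 * X2[:, 0] + rng.standard_normal(n)

# Full regression of y on [X1 X2]; the coefficient on X2 is the last entry
X = np.hstack([X1, X2])
b_full = np.linalg.solve(X.T @ X, X.T @ y)

# Residualize y and X2 on X1 (apply M1), then regress y* on X2*
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
y_star, X2_star = M1 @ y, M1 @ X2
b2 = np.linalg.solve(X2_star.T @ X2_star, X2_star.T @ y_star)

print(np.allclose(b2[0], b_full[-1]))  # both routes give the same coefficient
```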

Page 20: Slide Mancini Eco No Metrics EPFL2011

Partial correlation

When education ↑ ⇒ income ↑, but education and age both ↑ in time

What is the net effect of education on income?

Partial correlation, r∗yz:

y∗ = residuals in a regression of income on a constant and age

z∗ = residuals in a regression of education on a constant and age

r∗yz = simple correlation between y∗ and z∗

19

Page 21: Slide Mancini Eco No Metrics EPFL2011

Goodness of fit

Goal: measure how much variation in y is explained by variation in x

Suppose ȳ = x̄ = 0. Recall ŷi ⊥ ei, for each observation i

yi = ŷi + ei

∑ᵢ₌₁ⁿ yi² = ∑ᵢ₌₁ⁿ ŷi² + ∑ᵢ₌₁ⁿ ei²

SST = SSR + SSE

Good regression model: SST ≈ SSR, hence SSE ≈ 0

20

Page 22: Slide Mancini Eco No Metrics EPFL2011

Goodness of fit (when means ≠ 0)

When ȳ, x̄ ≠ 0, consider deviations from the means

yi − ȳ = ŷi − ȳ + ei = (x′i − x̄′)b + ei

∑ᵢ₌₁ⁿ (yi − ȳ)² = ∑ᵢ₌₁ⁿ ((x′i − x̄′)b)² + ∑ᵢ₌₁ⁿ ei²

Define M0 = [I − ii′/n], n × n, symmetric, idempotent; i′ = [1 · · · 1]

M0 transforms observations into deviations from sample means, M0y = y − i ȳ

y′M0′M0y = b′X′M0′M0Xb + e′e

y′M0y = b′X′M0Xb + e′e

SST = SSR + SSE

21

Page 23: Slide Mancini Eco No Metrics EPFL2011

Coefficient of determination, R2

R² = SSR/SST = (b′X′M0Xb)/(y′M0y) = 1 − e′e/(y′M0y)

Properties:

R2 measures the linear association between X and y

0 ≤ R2 ≤ 1, as 0 ≤ SSR ≤ SST

R2 ↑ when a regressor is added, from X = [x1 · · ·xK] to X = [x1 · · ·xK+1]

Adjusted R² = 1 − [e′e/(n − K)] / [y′M0y/(n − 1)]

Remark: X should include a column of 1s ⇒ M0e = e and e ⊥ X

22
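Both coefficients of determination are one-liners given the residuals. A sketch in Python/NumPy (simulated data, illustrative numbers):

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 100, 2
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
sst = np.sum((y - y.mean()) ** 2)          # y'M0 y

r2 = 1.0 - e @ e / sst
r2_adj = 1.0 - (e @ e / (n - K)) / (sst / (n - 1))

print(0.0 <= r2 <= 1.0, r2_adj < r2)  # adjusted R^2 penalizes extra regressors
```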

Page 24: Slide Mancini Eco No Metrics EPFL2011

Chapter 4: Statistical properties of LS estimators

LS estimator enjoys various good statistical properties:

1. Easy to compute

2. Explicit use of model assumptions

3. Optimal linear predictor

4. Most efficient, under certain conditions

23

Page 25: Slide Mancini Eco No Metrics EPFL2011

Orthogonality conditions

Assumptions: X stochastic or not, linear model, E[εi|X] = 0 ⟹

E[xi εi] = EX[E[xi εi|X]] = EX[xi E[εi|X]] = 0 = EX[xi E[(yi − x′i β)|X]]

which implies the population orthogonality conditions:

EX[E[xi yi|X]] = EX[E[xi x′i β|X]]

E[xi yi] = E[xi x′i] β

The LS normal equations are the sample counterpart of the orthogonality conditions:

X′y = X′X b

(1/n) ∑ᵢ₌₁ⁿ xi yi = (1/n) ∑ᵢ₌₁ⁿ xi x′i b

24

Page 26: Slide Mancini Eco No Metrics EPFL2011

Optimal linear predictor

Goal: find linear function of xi, x′iγ, that minimizes MSE

MSE = E[(yi − x′iγ)2]

= E[(yi − E[yi|X] + E[yi|X]− x′iγ)2]

= E[(yi − E[yi|X])2] + E[(E[yi|X]− x′iγ)2]

minγ

MSE = minγ

E[(E[yi|X]− x′iγ)2]

0 = −2E[xi(E[yi|X]− x′iγ)]

E[xi yi] = E[xi x′i]γ

which are the LS normal equations

Implicit assumption: all these expectations exist, i.e. E[·] <∞

25

Page 27: Slide Mancini Eco No Metrics EPFL2011

Unbiased estimation

LS estimator is unbiased in every sample:

b = (X ′X)−1X ′y = (X ′X)−1X ′(Xβ + ε) = β + (X ′X)−1X ′ε

Using law of iterated expectations, and assumption E[ε|X] = 0

E[b] = EX[E[β + (X ′X)−1X ′ε|X]]

= β + EX[(X ′X)−1X ′E[ε|X]]

= β

26

Page 28: Slide Mancini Eco No Metrics EPFL2011

Monte Carlo simulation: b2 slope estimates

DGP: yi = x′i β + εi, with x′i = [1 x2i], x2i ∼ U[0, 1], εi ∼ N(0, 1)

i = 1, . . . , 100, β = [1 2]′, repeat simulation and estimation 1,000 times

[Figure: histogram (frequency) of the 1,000 simulated b2 slope estimates]

27
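The Monte Carlo experiment described above can be reproduced in a few lines. A sketch in Python/NumPy (not the original course code):

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 100, 1000
beta = np.array([1.0, 2.0])
b2_draws = np.empty(reps)

# Simulate the DGP and re-estimate b by LS 1,000 times
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.uniform(size=n)])
    y = X @ beta + rng.standard_normal(n)
    b2_draws[r] = np.linalg.solve(X.T @ X, X.T @ y)[1]

# Unbiasedness: the average slope estimate is close to the true slope 2
print(abs(b2_draws.mean() - 2.0) < 0.1)
```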

Page 29: Slide Mancini Eco No Metrics EPFL2011

Variance of LS estimator

LS estimator is linear in ε: b = β + (X ′X)−1X ′ε

Easy to derive variance of linear estimator:

Var[b|X] = E[(b− β)(b− β)′|X]

= E[(X ′X)−1X ′ε ε′X(X ′X)−1|X]

= (X ′X)−1X ′E[ε ε′|X]X(X ′X)−1

= (X ′X)−1X ′(σ2I)X(X ′X)−1

= σ2(X ′X)−1

Note: assumption of spherical errors, Var[ε|X] = σ2I, is crucial

28

Page 30: Slide Mancini Eco No Metrics EPFL2011

Gauss–Markov theorem

Any linear unbiased estimator b0 = Cy, where C is a K × n matrix

Unbiasedness: E[Cy|X] = E[CXβ + Cε|X] = β ⇒ CX = I

Define C = D + (X′X)⁻¹X′ ⇒ CX = I = DX + (X′X)⁻¹X′X = DX + I, hence DX = 0

Var[b0|X] = C Var[y|X] C′ = C Var[ε|X] C′ = σ²CC′

= σ²(D + (X′X)⁻¹X′)(D + (X′X)⁻¹X′)′

= σ²(D + (X′X)⁻¹X′)(D′ + X(X′X)⁻¹)

= σ²DD′ + σ²(X′X)⁻¹ (the cross terms vanish as DX = 0) = σ²DD′ + Var[b|X]

= Var[b|X] + nonnegative definite matrix

LS estimator is BLUE (when X is constant and/or stochastic)

29

Page 31: Slide Mancini Eco No Metrics EPFL2011

Estimating the variance of LS estimator

Estimate σ² in Var[b|X] = σ²(X′X)⁻¹ ⇒ use ei, the sample analog of εi

But ei = yi − x′i b = εi − x′i (b − β) is an “imperfect estimate” of εi

Sample residual: e = My = M(Xβ + ε) = Mε, as MX = 0

E[e′e|X] = E[ε′Mε|X] = E[tr(ε′Mε)|X] = E[tr(Mεε′)|X]

= tr(M E[εε′|X]) = tr(M) σ²

= tr(I − X(X′X)⁻¹X′) σ² = (tr(In) − tr(X′X(X′X)⁻¹)) σ²

= (tr(In) − tr(IK)) σ² = (n − K) σ²

Unbiased estimator of σ² (conditionally on X and unconditionally):

s² = e′e/(n − K) = ∑ᵢ₌₁ⁿ ei² / (n − K)

30
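Putting the pieces together, s² and the estimated Var[b|X] follow directly. A sketch in Python/NumPy (simulated data with true σ² = 1, for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 500, 2
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)   # true sigma^2 = 1

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / (n - K)                       # unbiased estimator of sigma^2
var_b = s2 * np.linalg.inv(X.T @ X)        # estimated Var[b|X]
se_b = np.sqrt(np.diag(var_b))             # standard errors of b

print(s2 > 0, se_b.shape == (2,))
```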

Page 32: Slide Mancini Eco No Metrics EPFL2011

Normality of LS estimator

Assumption ε|X ∼ N(0, σ2I), and linearity b = β + (X ′X)−1X ′ε⇒

joint normality of b (multivariate normal distribution)

b|X ∼ N(β, σ2(X ′X)−1)

and each slope, bk, is normally distributed

bk|X ∼ N(βk, σ²[(X′X)⁻¹]kk)

Note: exact distribution in finite samples

31

Page 33: Slide Mancini Eco No Metrics EPFL2011

Distribution of b2 slope estimates: simulation

DGP: yi = [1 x2i] [1 2]′ + εi, with εi ∼ N(0, 1), 1,000 estimations

Comparison between simulated and true normal density of b2

[Figure: simulated density of the b2 estimates compared with the true normal density]

32

Page 34: Slide Mancini Eco No Metrics EPFL2011

Hypothesis testing on a coefficient

As b|X ∼ N(β, σ2(X ′X)−1)

(bk − βk)/√(σ²[(X′X)⁻¹]kk) ∼ N(0, 1)

Unfortunately σ² is not known but estimated via s². Useful statistic:

[(bk − βk)/√(σ²[(X′X)⁻¹]kk)] / √([e′e/σ²]/(n − K)) ∼ N(0, 1)/√(χ²(n−K)/(n − K)) ∼ t-Student(n − K)

Note: σ² is unknown but cancels in the ratio above

Need to show: e′e/σ² ∼ χ²(n−K) and e′e independent of bk

33

Page 35: Slide Mancini Eco No Metrics EPFL2011

χ2 distribution of e′e

Recall: M is residual maker matrix, e = My = Mε as MX = 0

As ε|X ∼ N(0, σ2I)⇒ ε/σ|X ∼ N(0, I)

e′e/σ² = (ε/σ)′ M (ε/σ)

which is an idempotent quadratic form in ε/σ, and by Appendix B.11.4

(ε/σ)′ M (ε/σ) ∼ χ²(rank(M))

where rank(M) = tr(M) = n − K

34

Page 36: Slide Mancini Eco No Metrics EPFL2011

Independence of b and e′e

To show independence between

(b − β)/σ = (X′X)⁻¹X′ (ε/σ) = L (ε/σ) ∼ N(0, LL′)

and

e′e/σ² = (ε/σ)′ M′M (ε/σ)

it suffices to show that LM = 0, because this implies, conditional on X,

Cov(L(ε/σ), M(ε/σ)) = E[L (ε/σ) (M(ε/σ))′] = E[L (ε/σ)(ε′/σ) M′] = L (σ²I/σ²) M′ = LM (as M′ = M)

= (X′X)⁻¹X′ (I − X(X′X)⁻¹X′) = 0

which implies independence as ε|X ∼ N

35

Page 37: Slide Mancini Eco No Metrics EPFL2011

Significance of a coefficient: t-statistic

Common test H0 : βk = 0

tk = t-statistic = [(bk − 0)/√([(X′X)⁻¹]kk)] / √([e′e]/(n − K)) = bk/√(s²[(X′X)⁻¹]kk) ∼ t-Student(n − K)

36

Page 38: Slide Mancini Eco No Metrics EPFL2011

Example: Significance of a coefficient

True β = [1 2]′, estimate b = [1.01 2.07]′, n = 100, K = 2

Is b2 statistically different from zero?

[Figure, left: scatter of yi against x2i ∼ Uniform[0, 1], with true and estimated regression lines]

[Figure, right: density of b/√(s²(X′X)⁻¹): t-Student distribution vs. normal distribution]

37

Page 39: Slide Mancini Eco No Metrics EPFL2011

Confidence intervals for parameters

Point estimates are useless without confidence intervals or standard errors

Use the t-Student distribution

[(bk − βk)/√(σ²[(X′X)⁻¹]kk)] / √([e′e/σ²]/(n − K)) ∼ t-Student(n − K)

to set confidence intervals:

Pr(bk − tα/2 sbk ≤ βk ≤ bk + tα/2 sbk) = 1 − α

where tα/2 is the t-Student quantile, e.g. α = 0.05, and sbk = √(s²[(X′X)⁻¹]kk)

38
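A confidence interval for each coefficient is then a two-liner. A sketch in Python/NumPy, using `scipy.stats.t.ppf` for the t-Student quantile (simulated data, illustrative):

```python
import numpy as np
from scipy import stats  # used only for the t-Student quantile

rng = np.random.default_rng(6)
n, K = 100, 2
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / (n - K)
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))   # s_{b_k}

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - K)        # t_{alpha/2} quantile
lower, upper = b - t_crit * se, b + t_crit * se      # 95% confidence intervals

print((lower < b).all() and (b < upper).all())
```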

Page 40: Slide Mancini Eco No Metrics EPFL2011

Significance of the regression

Common test H0 : β2 = · · · = βK = 0 (except intercept)

or equivalently H0 : R2 = 0

F -test statistic:

F[K − 1, n − K] = [R²/(K − 1)] / [(1 − R²)/(n − K)] ∼ [χ²(K−1)/(K − 1)] / [χ²(n−K)/(n − K)]

R2 ≈ 1 ⇒ large F ⇒ reject H0

39
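The F-statistic for overall significance only needs R². A sketch in Python/NumPy (simulated data with a strong signal, for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n, K = 100, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, K - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
r2 = 1.0 - e @ e / np.sum((y - y.mean()) ** 2)

# F[K-1, n-K] = [R^2/(K-1)] / [(1-R^2)/(n-K)]
F = (r2 / (K - 1)) / ((1 - r2) / (n - K))
print(F > 10)  # strong signal here, so F is far above usual critical values
```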

Page 41: Slide Mancini Eco No Metrics EPFL2011

Marginal distribution of test statistics

Under H0 : βk = β⁰k, and conditionally on X, bk|X ∼ N(β⁰k, σ²[(X′X)⁻¹]kk)

Unconditionally, bk ∼ ?, hard to find, depends on the distribution of X

Key property: the t-statistic

tk = t|X = (bk − β⁰k)/√(s²[(X′X)⁻¹]kk) ∼ t-Student(n − K)

but t-Student(n − K) does not depend on X ⇒ unconditionally

tk ∼ t-Student(n − K)

40

Page 42: Slide Mancini Eco No Metrics EPFL2011

Multicollinearity

Multicollinearity = variables in X are linearly dependent ⇒ X ′X is singular

In practice, variables in X are often close to being linearly dependent

“Symptoms” of multicollinearity:

• Small changes in data produce large changes in b

• Var[b|X] very large (⇒ t-statistic close to zero) but R2 is high

• Coefficient estimates “wrong” sign or implausible

41

Page 43: Slide Mancini Eco No Metrics EPFL2011

Multicollinearity: analysis

Demeaned variables, X = [X(k) xk], where xk (n × 1) is the k-th variable

Use Appendix A.5.3 on the inverse of a partitioned matrix:

Var[bk|X] = σ²[(X′X)⁻¹]kk

= σ²(x′k xk − x′k X(k)(X′(k)X(k))⁻¹X′(k) xk)⁻¹

= σ²(x′k xk [1 − x′k X(k)(X′(k)X(k))⁻¹X′(k) xk / (x′k xk)])⁻¹

= σ²(x′k xk [1 − x′k P(k) xk / (x′k xk)])⁻¹

= σ²(x′k xk [1 − x̂′k x̂k / (x′k xk)])⁻¹

= σ²(x′k xk [1 − R²k.])⁻¹

= σ² / (x′k xk [1 − R²k.])

42

Page 44: Slide Mancini Eco No Metrics EPFL2011

Multicollinearity: interpretation

Hence, as the column variables in X are demeaned,

Var[bk|X] = σ² / (x′k xk [1 − R²k.]) = σ² / (∑ᵢ₌₁ⁿ (xik − x̄k)² [1 − R²k.])

where R²k. is the R² from a regression of xk on X(k) (i.e. X \ xk)

Var[bk|X] ↑ when

• R²k. → 1, i.e. multicollinearity

• ∑ᵢ₌₁ⁿ (xik − x̄k)² → 0

• σ² ↑, i.e. ↑ dispersion of yi around the regression line

43
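The factor 1/(1 − R²k.) is the familiar variance inflation factor (VIF). A sketch in Python/NumPy with two nearly collinear regressors (made-up data, for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500
x1 = rng.standard_normal(n)
x2 = 0.95 * x1 + 0.1 * rng.standard_normal(n)   # x2 nearly collinear with x1

def r2_of_regression(y, x_cols):
    """R^2 from regressing y on a constant plus x_cols."""
    X = np.column_stack([np.ones(len(y)), x_cols])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return 1.0 - e @ e / np.sum((y - y.mean()) ** 2)

r2_k = r2_of_regression(x2, x1)     # R^2_{k.}: x_k regressed on the other columns
vif = 1.0 / (1.0 - r2_k)            # variance inflation factor for b_k

print(r2_k > 0.9, vif > 10)         # severe multicollinearity inflates Var[b_k|X]
```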

Page 45: Slide Mancini Eco No Metrics EPFL2011

Large sample properties of LS estimator

ε|X ∼ N is a strong assumption and can be relaxed, but now

Assumption 5a (DGP of X):

• (xi, εi), i = 1, . . . , n, is a sequence of independent observations

• plim X′X/n = Q, a positive definite matrix

Notation: plim = probability limit, i.e. convergence in probability

plim Zn = Z stands for

lim (n→∞) Pr(|Zn − Z| > ε) = 0, ∀ε > 0

where Z can be either random or constant

44

Page 46: Slide Mancini Eco No Metrics EPFL2011

Consistency of LS estimator

Consistency means plim b = β

Highly desirable property of any estimator

Recall: distribution of ε|X is unknown

b = β + (X′X/n)⁻¹ (X′ε/n)

plim b = β + plim (X′X/n)⁻¹ · plim (X′ε/n) = β + Q⁻¹ plim (X′ε/n)

If plim X′ε/n = 0, then b is consistent

45

Page 47: Slide Mancini Eco No Metrics EPFL2011

Random term X ′ε/n

E[X′ε/n] = EX[E[X′ε/n | X]] = (1/n) ∑ᵢ₌₁ⁿ EX[xi E[εi|X]] = 0, as E[εi|X] = 0

Var[X′ε/n] = E[Var[X′ε/n | X]] + Var[E[X′ε/n | X]]   (the second term = 0)

= E[(1/n²) X′ E[εε′|X] X] = (σ²/n) E[X′X/n] = (σ²/n) Q

As E[X′ε/n] = 0 and lim (n→∞) Var[X′ε/n] = 0,

X′ε/n →(m.s.) 0 ⟹ plim X′ε/n = 0

Remark: Var[X′ε/n] decays as 1/n

46

Page 48: Slide Mancini Eco No Metrics EPFL2011

Example: convergence of X ′ε/n

xi ∼ U[−0.5, 0.5], σ² = 2, hence Var[X′ε/n] = (σ²/n) E[∑ᵢ₌₁ⁿ x²i / n]

[Figure: density of X′ε/n for n = 10, 50, 100, concentrating around 0 as n grows]

47

Page 49: Slide Mancini Eco No Metrics EPFL2011

Asymptotic distribution of LS estimator

Key idea: stabilize the distribution of X ′ε/n

Recall: Var[X ′ε/n] decays as 1/n

Var[√n X′ε/n] = n Var[X′ε/n] ∈ O(1)

√n (b − β) = (X′X/n)⁻¹ √n X′ε/n → Q⁻¹ × asymptotic distribution of √n X′ε/n

48

Page 50: Slide Mancini Eco No Metrics EPFL2011

Random term √n X′ε/n

Recall: E[√n X′ε/n] = 0; (xi, εi) independent; regressors well behaved

Var[√n X′ε/n] = (1/n) Var[X′ε] = (1/n) Var[∑ᵢ₌₁ⁿ xi εi]

= (1/n) ∑ᵢ₌₁ⁿ Var[xi εi] = (1/n) ∑ᵢ₌₁ⁿ σ² E[xi x′i] = σ²Q

By the Central Limit Theorem: √n X′ε/n →d N(0, σ²Q)

√n (b − β) →d Q⁻¹ × N(0, σ²Q) =d N(0, σ²Q⁻¹)

b ∼a N(β, (σ²/n) Q⁻¹)

49

Page 51: Slide Mancini Eco No Metrics EPFL2011

Asymptotic normality of LS estimator

If regressors well behaved and observations independent, then

asymptotic normality of LS estimator follows from CLT, not ε|X ∼ N

In practice, in

b ∼a N(β, (σ²/n) Q⁻¹)

Q is estimated by X′X/n

σ² is estimated by s² = e′e/(n − K) (as plim s² = σ²)

If ε|X ∼ N(0, σ²I), then b ∼ N(β, σ²(X′X)⁻¹) for every sample size n

50

Page 52: Slide Mancini Eco No Metrics EPFL2011

Asymptotic dist. of nonlinear function: Delta method

f(b): J possibly nonlinear C¹ functions

∂f(b)/∂b′ =: C(b)   (J × K)

Goal: find the asymptotic distribution of f(b)

Slutsky theorem: plim f(b) = f(plim b) = f(β), and plim C(b) = C(β)

First order Taylor expansion (remainder negligible if plim b = β):

f(b) = f(β) + C(β)(b − β) + remainder

f(b) ∼a N(f(β), C(β) (σ²/n) Q⁻¹ C(β)′)

51
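The delta-method variance C(β) Var[b] C(β)′ is a single quadratic form. A sketch in Python/NumPy for a scalar function f(b) = b1·b2, with a hypothetical estimate b and covariance matrix V (the numbers are made up):

```python
import numpy as np

# Hypothetical estimates: b and an estimated covariance matrix V = (s^2/n) Q^{-1}
b = np.array([1.5, 2.0])
V = np.array([[0.04, 0.01],
              [0.01, 0.09]])

# Nonlinear function f(b) = b1 * b2, with gradient C(b) = [b2, b1]
f_hat = b[0] * b[1]
C = np.array([b[1], b[0]])
var_f = C @ V @ C          # C(b) V C(b)': delta-method variance

print(f_hat, var_f)        # f_hat = 3.0, var_f ≈ 0.4225
```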

Page 53: Slide Mancini Eco No Metrics EPFL2011

t-Statistic: remark

To test H0 : βk = 0, t-statistic tk = bk/√(s²[(X′X)⁻¹]kk)

If in finite samples ε|X ∼ N, then tk ∼ t-Student(n − K)

If ε|X ∼ N holds only asymptotically (not in finite samples), then tk ∼a N(0, 1)

[Figure: density of the t-statistic (n = 10): t-Student vs. normal distribution]

52

Page 54: Slide Mancini Eco No Metrics EPFL2011

Missing observations

Common issue in applied work

• Missing at random: least serious case, just discard those observations, sample

size reduced

• Not missing at random: most difficult case, selection bias, mechanism should

be studied

Read Chapter 4.8.2

53

Page 55: Slide Mancini Eco No Metrics EPFL2011

Chapter 5: Inference

Goal: test implications of economic theory

Example: unrestricted model of investment, It,

ln It = β1 + β2 it + β3∆pt + β4 lnYt + β5 t + εt

where it is the nominal interest rate, ∆pt the inflation rate, Yt real output

H0 : “investors care only about real interest rate, (it −∆pt)”

⇒ restricted (or nested) model of investment:

ln It = β1 + β2(it −∆pt) + β4 lnYt + β5 t + εt

⇒ β3 = −β2 ⇒ β2 + β3 = 0, in the unrestricted model

54

Page 56: Slide Mancini Eco No Metrics EPFL2011

Linear restrictions

In the linear regression model, y = Xβ + ε, consider J linear restrictions

Rβ = q

R is J ×K and usually J ≪ K

Example: β = (β1 β2 β3 β4)′

1. H0 : β2 = 0 tested with R = (0 1 0 0) and q = 0

2. H0 : β2 = β3 = β4 = 0 tested with

R = [ 0 1 0 0
      0 0 1 0
      0 0 0 1 ]  and q = (0 0 0)′

55

Page 57: Slide Mancini Eco No Metrics EPFL2011

Two approaches to testing hypothesis

1. Fit unrestricted model and check whether estimates satisfy restrictions

2. Fit restricted model and check loss of fit (in terms of R2)

The two approaches are equivalent in the linear regression model

Working assumption: ε|X ∼ N(0, σ2I) (to be relaxed)

56

Page 58: Slide Mancini Eco No Metrics EPFL2011

Approach 1: discrepancy vector

Null hypothesis: J linear restrictions, R is J ×K

H0 : Rβ − q = 0

Alternative hypothesis:

H1 : Rβ − q ≠ 0

Discrepancy vector, m = Rb− q, will not be exactly zero (most likely)

Decide whether m is not exactly zero because of

(a) sampling variability (do not reject H0)

(b) or restrictions are not satisfied by the data (reject H0)

57

Page 59: Slide Mancini Eco No Metrics EPFL2011

Wald criterion

Under H0 : Rβ − q = 0, discrepancy vector m = Rb− q

E[m|X] = RE[b|X]− q = Rβ − q = 0

Var[m|X] = Var[Rb− q|X] = RVar[b|X]R′ = σ2R(X ′X)−1R′

Recall, as ε|X ∼ N(0, σ2I) by assumption, b|X ∼ N(β, σ2(X ′X)−1)

=⇒ m|X ∼ N(0, σ2R(X ′X)−1R′)

Wald statistic:

W = m′ (Var[m|X])−1 m

= (Rb− q)′ (σ2R(X ′X)−1R′)−1 (Rb− q)

∼ χ2(J)

χ2 distribution ⇐ Full Rank Gaussian Quadratic form, Appendix B.11.6

58

Page 60: Slide Mancini Eco No Metrics EPFL2011

Wald statistic feasible and F -statistic

In the Wald statistic, need to get rid of unknown σ2

F = [(Rb − q)′ (σ²R(X′X)⁻¹R′)⁻¹ (Rb − q)/J] / ([e′e/σ²]/(n − K)) ∼ [χ²(J)/J] / [χ²(n−K)/(n − K)] ∼ F(J, n − K)

• Numerator: under H0, (Rb− q)/σ = R(b− β)/σ = R(X ′X)−1X ′ε/σ

i.e. standardized Gaussian quadratic form in R(X ′X)−1X ′ε/σ ⇒ χ2(J)

• Denominator: standardized Gaussian quadratic form in Mε/σ ⇒ χ2(n−K)

As MX = 0, Cov(R(X ′X)−1X ′ε/σ,Mε/σ) = 0⇒ Num. Den. independent

59

Page 61: Slide Mancini Eco No Metrics EPFL2011

Hypothesis testing on a single coefficient

H0 : βk = β0 can be tested with “t-statistic”

t := [(bk − β0)/√(σ²[(X′X)⁻¹]kk)] / √([e′e/σ²]/(n − K)) ∼ N(0, 1)/√(χ²(n−K)/(n − K)) ∼ t-Student(n − K)

or with linear restriction R = (0 · · · 0 1 0 · · · 0) and q = β0

F = [(bk − β0) (σ²[(X′X)⁻¹]kk)⁻¹ (bk − β0)] / ([e′e/σ²]/(n − K)) ∼ χ²(1) / [χ²(n−K)/(n − K)] ∼ F(1, n − K)

As t² = F, the two tests are equivalent

60

Page 62: Slide Mancini Eco No Metrics EPFL2011

Approach 2: restricted least squares

Fit of restricted model cannot be better than unrestricted model

Restricted LS:

b* = arg min (over b0) (y − Xb0)′(y − Xb0) subject to Rb0 = q

= b − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(Rb − q)

(b* − b) = −(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(Rb − q)

e* = residuals from the restricted model. Loss of fit due to the constraints:

e* = y − Xb* = y − Xb − X(b* − b) = e − X(b* − b)

e*′e* = e′e + (b* − b)′X′X(b* − b) ≥ e′e

e*′e* − e′e = (b* − b)′X′X(b* − b) = (Rb − q)′[R(X′X)⁻¹R′]⁻¹(Rb − q)

61

Page 63: Slide Mancini Eco No Metrics EPFL2011

Loss of fit and F -statistic

F -statistic for H0 : Rβ = q

F = [(Rb − q)′ [R(X′X)⁻¹R′]⁻¹ (Rb − q)/J] / [e′e/(n − K)]

= [(e*′e* − e′e)/J] / [e′e/(n − K)]

= [(R² − R²*)/J] / [(1 − R²)/(n − K)] ∼ F(J, n − K)

Recall R² = 1 − e′e/(y′M0y) and y′M0y does not depend on b. Similarly for R²*.

Special case: overall significance of the regression

β2 = . . . = βK = 0 (except intercept) ⇒ R²* = 0 with J = K − 1

62

Page 64: Slide Mancini Eco No Metrics EPFL2011

Nonnormal disturbances and large sample tests

Drop assumption ε|X ∼ N which implies b|X ∼ N(β, σ2(X ′X)−1)

All previous tests hold asymptotically, when n→∞

Key ingredient: asymptotic distribution of b

b ∼a N(β, (σ²/n) Q⁻¹), where Q = plim (X′X/n)

Recall √n (b − β) →d N(0, σ²Q⁻¹), from the CLT

plim s² = σ², where s² = e′e/(n − K)

63

Page 65: Slide Mancini Eco No Metrics EPFL2011

Example: limiting distribution of Wald statistic

If √n (b − β) →d N(0, σ²Q⁻¹) and H0 : Rβ − q = 0, then

√n (Rb − q) = √n R(b − β) →d N(0, σ²RQ⁻¹R′)

which implies

√n (Rb − q)′ (σ²RQ⁻¹R′)⁻¹ √n (Rb − q) →d χ²(J)

which has the same limiting distribution as W

W = (Rb − q)′ (s²R(X′X)⁻¹R′)⁻¹ (Rb − q) →d χ²(J)

when plim s²(X′X/n)⁻¹ = σ²Q⁻¹. Note: in W all n’s cancel

Remark: W is only approximately distributed as χ²(J) in finite samples;
in practice n does not go to ∞

64

Page 66: Slide Mancini Eco No Metrics EPFL2011

Testing nonlinear restrictions

Test H0 : c(β) = q, where c is a J × 1 vector of nonlinear functions

Apply delta method: first order Taylor expansion of c

c(β̂) ≈ c(β) + [∂c(β)/∂β′] (β̂ − β)

Var[c(β̂)] ≈ [∂c(β)/∂β′] Var[β̂] [∂c(β)′/∂β]

In ∂c(β)/∂β′ replace β by β̂

Wald statistic

W = (c(β̂) − q)′ (Var̂[c(β̂)])⁻¹ (c(β̂) − q) →d χ²(J)

65

Page 67: Slide Mancini Eco No Metrics EPFL2011

Prediction

Prominent use of regression model

y0, x0 not in our sample, not observed. Predict y0 using

ŷ0 = E[y0|x0, X] = x0′b

as y0 = x0′β + ε0, and assuming that x0 is known

Forecast error: e0 = y0 − ŷ0 = (β − b)′x0 + ε0

Prediction variance: Var[e0|x0, X] = σ² + x0′[σ²(X′X)⁻¹]x0 > σ²

Prediction interval at the (1 − λ) confidence level:

ŷ0 ± zλ/2 √(Var[e0|x0, X])

where zλ/2 is the λ/2-quantile of N(0, 1), e.g. λ = 0.05

66

Page 68: Slide Mancini Eco No Metrics EPFL2011

Prediction of y0 and x0

If x0 is known, prediction of y0:

ŷ0 = E[y0|x0, X] = x0′b, with Var[e0|x0, X]

If x0 is not known and needs to be predicted too, prediction of y0:

Ex0[E[y0|x0, X]] = Ex0[x0′b|X]

depends on the distribution of x0, usually unknown and computed by simulation,
with Var[e0|X] > Var[e0|x0, X]

67

Page 69: Slide Mancini Eco No Metrics EPFL2011

Measure of predictive accuracy

Notation: yi realized values, ŷi predicted values, n0 number of predictions

• Not scale invariant:

– Root mean square error (RMSE) = √(∑i (yi − ŷi)²/n0)

– Mean absolute error (MAE) = ∑i |yi − ŷi|/n0

• Scale invariant:

– Theil U statistic

U = √(∑i (yi − ŷi)²/n0) / √(∑i yi²/n0)

68
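All three measures are direct to compute. A sketch in Python/NumPy with made-up realized and predicted values, also checking that Theil's U is unchanged under rescaling:

```python
import numpy as np

# Made-up realized and predicted values, n0 = 4 predictions
y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.8, 3.2, 3.9])
n0 = len(y)

rmse = np.sqrt(np.sum((y - y_hat) ** 2) / n0)
mae = np.sum(np.abs(y - y_hat)) / n0
theil_u = rmse / np.sqrt(np.sum(y ** 2) / n0)

# Scale invariance: rescaling y and y_hat by 10 leaves U unchanged
rmse10 = np.sqrt(np.sum((10 * y - 10 * y_hat) ** 2) / n0)
u10 = rmse10 / np.sqrt(np.sum((10 * y) ** 2) / n0)

print(np.isclose(mae, 0.15), np.isclose(theil_u, u10))
```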

Page 70: Slide Mancini Eco No Metrics EPFL2011

Chapter 6: Functional form

Very general functional form of regression model:

L independent variables: zi = [z1i · · · zLi]

K linearly independent functions of zi: f1i(zi) · · · fKi(zi)

g(yi) observable function of yi

usual assumptions on εi

The following model is still linear and can be estimated by LS:

g(yi) = β1 f1i(zi) + . . . + βK fKi(zi) + εi

= β1 x1i + . . . + βK xKi + εi

yi = x′i β + εi

69

Page 71: Slide Mancini Eco No Metrics EPFL2011

Nonlinearity in variables

A linear model, e.g., yi = β1 + xi β2 + wi β3 + εi is typically enriched with

• dummy variables

• nonlinear functions of regressors (e.g. quadratic function)

• interaction terms (i.e. cross products)

yi = β1 + xi β2 + wi β3 + β4 di + β5 xi² + β6 xi wi + εi = x′i β + εi

where x′i = [1 xi wi di xi² xi wi] and the dummy variable di = 1 if i ∈ D, and 0 otherwise

70

Page 72: Slide Mancini Eco No Metrics EPFL2011

Dummy variable

Easy to use: one dummy variable is one more column in X

To study various effects (treatment, grouping, seasonality, thresholds, etc.)

yi = β1 + x2i β2 + di β3 + εi

= (β1 + di β3) + x2i β2 + εi

= x′i β + εi

where di = 1 if i ∈ D, and 0 otherwise

In this model the dummy variable “shifts” the intercept: β1 ←→ (β1 + β3)

71

Page 73: Slide Mancini Eco No Metrics EPFL2011

Example: regression with dummy variable

yi = x′i β + εi, x′i = [1 x2i di], x2i ∼ U[0, 1], di = 1{x2i > 0.5}, εi ∼ N(0, 1)

i = 1, . . . , 100, β = [1 2 2]′, in this sample b = [0.99 2.13 1.96]′

[Figure: scatter of yi = β1 + x2i β2 + β3 di + εi against x2i ∼ Uniform[0, 1], with true and estimated regression lines]

72

Page 74: Slide Mancini Eco No Metrics EPFL2011

Structural break

Previous graph shows a structural break in the model

yi = β1 + x2i β2 + εi for x2i ≤ 0.5, and yi = (β1 + β3) + x2i β2 + εi for x2i > 0.5

Structural change can be tested with F -test

Note: the break point is supposed to be known a priori

73

Page 75: Slide Mancini Eco No Metrics EPFL2011

Testing for a structural break

Split the sample in two parts, according to potential structural break

nb observations on yb and Xb (nb × k) before potential structural break

na observations on ya and Xa (na × k) after potential structural break

• Unrestricted model allows for a potential structural break, βb ≠ βa:

[ yb ]   [ Xb  0  ] [ βb ]   [ εb ]
[ ya ] = [ 0   Xa ] [ βa ] + [ εa ]

• Restricted model, no structural break, β′ = [β′b β′a]:

βb = βa  ⇔  βb − βa = 0  ⇔  [Ik : −Ik] β = Rβ = 0

74

Page 76: Slide Mancini Eco No Metrics EPFL2011

F -test for a structural break

H0 : Rβ = q, with q = 0, R = [Ik : −Ik], dim(R) = k × 2k, dim(β) = 2k × 1

F = [(Rb − q)′ [R(X′X)⁻¹R′]⁻¹ (Rb − q)/J] / [e′e/(n − K)] ∼ F(J, n − K)

where

J = k = number of restrictions = number of rows in R

n − K = (nb + na) − 2k = total number of observations minus dim(β)

Alternative ways exist to test for structural break (e.g., Wald statistic)

Typical issue: limited sample sizes before, nb, and/or after, na, the break

75
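The same F-statistic can be computed from the restricted (pooled) and unrestricted (split-sample) sums of squared residuals, since e*′e* − e′e equals the quadratic form in the numerator. A sketch in Python/NumPy with a simulated intercept shift (illustrative numbers):

```python
import numpy as np

rng = np.random.default_rng(9)
nb = na = 100
k = 2
Xb = np.column_stack([np.ones(nb), rng.uniform(size=nb)])
Xa = np.column_stack([np.ones(na), rng.uniform(size=na)])
yb = Xb @ np.array([1.0, 2.0]) + rng.standard_normal(nb)   # before the break
ya = Xa @ np.array([3.0, 2.0]) + rng.standard_normal(na)   # intercept shifts after

def ssr(X, y):
    """Sum of squared LS residuals."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return e @ e

# Unrestricted: separate regressions before/after; restricted: one pooled regression
e_unr = ssr(Xb, yb) + ssr(Xa, ya)
e_res = ssr(np.vstack([Xb, Xa]), np.concatenate([yb, ya]))

J = k
n_minus_K = nb + na - 2 * k
F = ((e_res - e_unr) / J) / (e_unr / n_minus_K)
print(F > 10)  # large F: reject "no structural break"
```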

Page 77: Slide Mancini Eco No Metrics EPFL2011

Chapter 7: Specification analysis

Implicit assumption: the model y = Xβ + ε is correct

Common model misspecification:

• Omission of relevant variables

• Inclusion of superfluous variables

76

Page 78: Slide Mancini Eco No Metrics EPFL2011

Omitted relevant variables

True regression model: y = X1 β1 + X2 β2 + ε

Use wrong regression model: “y = X1 β1 + ε”

Regress y on X1 only:

b1 = (X′1X1)⁻¹X′1 y = (X′1X1)⁻¹X′1 (X1β1 + X2β2 + ε)

= β1 + (X′1X1)⁻¹X′1X2 β2 + (X′1X1)⁻¹X′1 ε

E[b1|X] = β1 + (X′1X1)⁻¹X′1X2 β2

Unless X′1X2 = 0 or β2 = 0,

E[b1|X] ≠ β1, i.e. b1 is biased

plim (b1) ≠ β1, i.e. b1 is inconsistent

Inference procedures (t-test, F-test, etc.) are invalid

77

Page 79: Slide Mancini Eco No Metrics EPFL2011

Inclusion of superfluous variables

True regression model: y = X1 β1 + ε

Use “wrong” regression model: y = X1 β1 + X2 β2 + ε

Rewrite y = X1 β1 + X2 β2 + ε = Xβ + ε

where X = [X1 X2] and β′ = [β′1 β′

2] = [β′1 0′]

Model used, per se, it is not wrong, simply β2 = 0

Regress y on X: LS estimator is unbiased estimator of β

E[b|X] = β = [ β1 ]   [ β1 ]
             [ β2 ] = [ 0  ]

Price to pay for not using information β2 = 0: reduced precision of estimates

“Var[b|X] ≥ Var[b1|X]”

78

Page 80: Slide Mancini Eco No Metrics EPFL2011

Model building

• Simple-to-general:

not a good strategy, omitted variables induce biased and inconsistent

estimates

• General-to-simple:

better strategy, computing power is cheap, but variable selection is a difficult

task

79

Page 81: Slide Mancini Eco No Metrics EPFL2011

Choosing between nonnested models

F -test of H0 : Rβ = q is only for nested models

R represents (linear) restrictions on the model y = Xβ + ε

Various nonnested hypothesis can be of interest

e.g., choosing between linear or loglinear functional forms:

yi = β1 + xi β2 + εi or log(yi) = β1 + log(xi)β2 + εi

Typically, these tests are based on likelihood function

80

Page 82: Slide Mancini Eco No Metrics EPFL2011

Likelihood function: digression

Probability theory : given population model, what is the probability of

observing that sample?

Inference procedure : given that sample, what is the population model?

Likelihood function = probability of observing that sample as a function of

model parameters

81

Page 83: Slide Mancini Eco No Metrics EPFL2011

Likelihood function: simple example

Fair coin, H, T, Pr(toss = T ) = p0 = 0.5

Goal: estimate p0 (unknown to us)

Observed sample: n = 60 tossing, total T = k = 28

L(p) = (n choose k) pᵏ (1 − p)ⁿ⁻ᵏ = (60 choose 28) p²⁸ (1 − p)³²

[Figure: likelihood L(p) plotted against p ∈ [0, 1]; the curve peaks near p = k/n ≈ 0.47]
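The likelihood above can be maximized numerically over a grid of p; a small sketch:

```python
import numpy as np
from math import comb

n, k = 60, 28
p = np.linspace(0.01, 0.99, 981)          # grid of candidate values of p
L = comb(n, k) * p**k * (1 - p)**(n - k)  # binomial likelihood from the slide
p_hat = p[np.argmax(L)]
print(p_hat)  # maximizer is close to k/n = 28/60
```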

82

Page 84: Slide Mancini Eco No Metrics EPFL2011

Choosing between nonnested models: Vuong’s test

Goal: choose between two nonnested models

No model is favored, as in classical hypothesis testing

Models can be both wrong: choose the least misspecified

Assumption: observations are independent (conditionally on regressors)

True model: yi ∼ h(yi), density with parameter α

Model 0: yi ∼ f(yi), density with parameter θ

Model 1: yi ∼ g(yi), density with parameter γ

KLIC0 = E [(lnh(yi)− ln f(yi))| h is true ] ≥ 0

KLIC0 = distance between model h (true) and f in terms of log-likelihood

83

Page 85: Slide Mancini Eco No Metrics EPFL2011

Vuong’s statistic

Decision criteria: model 1 is better than model 0 if KLIC1 < KLIC0

KLIC1 − KLIC0 = E[ ln f(yi) − ln g(yi) | h is true ]

             ≈ (1/n) Σi (ln f(yi) − ln g(yi)) = Σi mi/n

Vuong's statistic:

V = √n (Σi mi/n) / √( Σi (mi − m̄)²/n )

• V →d N(0, 1) when models 0 and 1 are "equivalent"

• V →a.s. +∞ when model 0, f(yi), is "better"

• V →a.s. −∞ when model 1, g(yi), is "better"

84

Page 86: Slide Mancini Eco No Metrics EPFL2011

Vuong’s test: application to linear models

Assume ε ∼ N(0, σ2)

Model 0: yi ∼ f(yi), with yi = x′iθ + ε0i

Model 1: yi ∼ g(yi), with yi = x′iγ + ε1i

f(yi) = (2πσ²)−1/2 e−0.5(yi − x′iθ)²/σ²

ln f(yi) = −(1/2) ln(2πσ²) − (1/2)(yi − x′iθ)²/σ²

         = −(1/2)[ln 2π + ln(σ²) + ε²0i/σ²]

ln f(yi) − ln g(yi) = [ −(1/2)[ln(e′0e0/n) + e²0i/(e′0e0/n)] ] − [ −(1/2)[ln(e′1e1/n) + e²1i/(e′1e1/n)] ]
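Putting the pieces together, a minimal sketch of Vuong's statistic for two competing linear models on simulated data (regressor names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # x is the relevant regressor

def loglik_terms(y, X):
    """Per-observation Gaussian log-likelihood at the LS estimates."""
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    s2 = e @ e / len(y)                   # sigma^2 hat = e'e/n, as on the slide
    return -0.5 * (np.log(2 * np.pi) + np.log(s2) + e**2 / s2)

X0 = np.column_stack([np.ones(n), x])     # model 0: y on x
X1 = np.column_stack([np.ones(n), z])     # model 1: y on z
m = loglik_terms(y, X0) - loglik_terms(y, X1)
V = np.sqrt(n) * m.mean() / m.std()
print(V)  # large positive => model 0 preferred
```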

85

Page 87: Slide Mancini Eco No Metrics EPFL2011

Model selection criteria

Various criteria have been proposed

Adjusted R²:

R̄² = 1 − [ e′e/(n − K) ] / [ Σi (yi − ȳ)²/(n − 1) ]

Akaike Information Criterion:

ln AIC(K) = ln(e′e/n) + 2K/n

Bayesian Information Criterion:

ln BIC(K) = ln(e′e/n) + K ln n/n

86

Page 88: Slide Mancini Eco No Metrics EPFL2011

Chapter 8: Generalized regression model

Spherical error E[ε ε′|X] = σ2I is a restrictive assumption

Allow for heteroscedasticity, σ²i ≠ σ²j, and autocorrelation, σij ≠ 0, i ≠ j:

                      [ σ²1  σ12  · · ·  σ1n ]
E[ε ε′|X] = σ²Ω = Σ = [ σ12  σ²2  · · ·  σ2n ]
                      [  ⋮          ⋱    ⋮  ]
                      [ σ1n  σ2n  · · ·  σ²n ]

Total number of parameters in Σ: n + (n² − n)/2 = n(n + 1)/2 ≫ n

E.g., n = 100 ⇒ n(n + 1)/2 = 5,050: too many!

Need to impose structure on Σ

87

Page 89: Slide Mancini Eco No Metrics EPFL2011

Heteroscedasticity: asset returns and stochastic volatility

S&P 500 daily returns, 1999–2003, and asymmetric GARCH volatility

[Figure: top panel, S&P 500 daily returns (%); bottom panel, fitted asymmetric GARCH volatility]

88

Page 90: Slide Mancini Eco No Metrics EPFL2011

Least square estimator

When Var[ε|X] = σ2Ω

LS estimator, b = β + (X ′X)−1X ′ε, has still good properties:

unbiased, consistent, and asymptotically normal

E[b|X] = β

Var[b|X] = (X′X)−1X′ Var[ε|X] X(X′X)−1

         = (σ²/n) (X′X/n)−1 (X′ΩX/n) (X′X/n)−1

If plim(X′X/n) and plim(X′ΩX/n) are positive definite, plim b = β

√n(b − β) = (X′X/n)−1 √n X′ε/n →d Q−1 × N(0, σ² plim X′ΩX/n)

89

Page 91: Slide Mancini Eco No Metrics EPFL2011

Generalized least square estimator

Var[ε|X] = σ2Ω, assume Ω is known; decompose Ω = CΛC ′

Ω−1 = CΛ−1/2 Λ−1/2C ′ = P ′ P , where Λ = diag(λ1, . . . , λn), C ′C = I

Transformed model : Py = PXβ + Pε ⇒ Var[Pε|X] = σ2PΩP ′ = σ2I

β̂ = (X′P′PX)−1X′P′Py

  = (X′Ω−1X)−1X′Ω−1y

  = arg minβ0 (y − Xβ0)′ Ω−1 (y − Xβ0)

Heteroscedasticity case: Ω = diag(w1, . . . , wn)

β̂ = arg minβ0 Σi (yi − x′i β0)²/wi

Recall: OLS case Ω = I
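In the pure heteroscedasticity case, GLS is just OLS on the transformed (weighted) data; a sketch with a known weight function (illustrative values):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])
w = x**2                                  # known heteroscedasticity weights w_i
eps = rng.normal(size=n) * np.sqrt(w)     # Var[eps_i] proportional to w_i
y = X @ np.array([1.0, 2.0]) + eps

# GLS = OLS on the transformed model Py = PX beta + P eps, P = diag(1/sqrt(w))
P = 1 / np.sqrt(w)
b_gls = np.linalg.lstsq(X * P[:, None], y * P, rcond=None)[0]
print(b_gls)  # close to [1, 2]
```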

90

Page 92: Slide Mancini Eco No Metrics EPFL2011

GLS efficient estimator

In the classical model, y = Xβ + ε, where Var[ε|X] = σ2I:

OLS is minimum variance, BLUE, estimator

In the transformed model, Py = PXβ + Pε, where Var[Pε|X] = σ2I:

GLS estimator = OLS in the transformed model

⇒ GLS estimator is efficient (not OLS)

91

Page 93: Slide Mancini Eco No Metrics EPFL2011

Feasible generalized least square estimator (FGLS)

Var[ε|X] contains n(n + 1)/2 parameters: impossible to estimate all

Var[ε|X] = σ2Ω parameterized with few unknown parameters θ

E.g. Time series: Ωij = θ|i−j|, where |θ| < 1

E.g. Heteroscedasticity: Ωii = exp(z′i θ)

FGLS estimator relies on Ω̂ = Ω(θ̂):

β̂(Ω̂) = (X′Ω̂−1X)−1X′Ω̂−1y

Key result: when n → ∞, β̂(Ω̂) behaves like β̂(Ω),

using any consistent (not necessarily efficient) estimator of Ω(θ)

92

Page 94: Slide Mancini Eco No Metrics EPFL2011

Heteroscedasticity

Var[ε|X] = σ2Ω = σ2diag(w1, . . . , wn)

Scaling: tr(σ²Ω) = Σi σ²i = σ² Σi wi = σ²n ⇒ σ² = Σi σ²i /n

Interpretation: wi positive weight

When form of heteroscedasticity is

• known: parameterize and estimate Ω, then FGLS

• unknown: OLS can still be applied, but Var[b|X]?

93

Page 95: Slide Mancini Eco No Metrics EPFL2011

Estimating Var[b|X] under unknown heteroscedasticity

White’s heteroscedasticity consistent estimator:

Var[b|X] = (σ²/n) (X′X/n)−1 (X′ΩX/n) (X′X/n)−1

         = (1/n) (X′X/n)−1 [ (1/n) Σi σ²i xi x′i ] (X′X/n)−1

         ≈ (1/n) (X′X/n)−1 [ (1/n) Σi e²i xi x′i ] (X′X/n)−1

Proof sketch: As σ²i xi x′i = E[ε²i xi x′i | xi],

plim (1/n) Σi σ²i xi x′i = plim (1/n) Σi ε²i xi x′i = plim (1/n) Σi e²i xi x′i

Remark: equalities above are in plim , X ′ΩX/n never estimated
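A minimal sketch of White's sandwich estimator on simulated heteroscedastic data (illustrative design):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([0.5, 1.0]) + rng.normal(size=n) * np.abs(x)  # heteroscedastic

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
XtX_inv = np.linalg.inv(X.T @ X)
meat = (X * e[:, None]**2).T @ X          # sum_i e_i^2 x_i x_i'
V_white = XtX_inv @ meat @ XtX_inv        # sandwich estimator
print(np.sqrt(np.diag(V_white)))          # heteroscedasticity-robust SEs
```

Under this design the robust slope standard error is noticeably larger than the naive OLS one, since the error variance rises with |x|.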

94

Page 96: Slide Mancini Eco No Metrics EPFL2011

Test for heteroscedasticity: Breusch–Pagan test

Form of heteroscedasticity: σ2i = σ2f(α0 + α′zi)

Note: functional form f does not need to be specified

H0 : α = 0, i.e. homoscedasticity

Under H0, E[ε²i/(σ²f(α0)) − 1] = 0 and does not depend on zi

Regress gi := e²i/(e′e/n) − 1 on Z′i := [1 z′i] (1 × k), i = 1, . . . , n

calculate b = (Z′Z)−1Z′g and ĝ = Zb

Under H0, test statistic:

(1/2) ĝ′ĝ = (1/2) g′Z(Z′Z)−1Z′g →d χ²(k − 1)
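A sketch of the Breusch–Pagan statistic on data simulated with variance depending on zi (functional form illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3000
z = rng.normal(size=n)
X = np.column_stack([np.ones(n), z])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n) * np.exp(0.5 * z)

e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
g = e**2 / (e @ e / n) - 1                # g_i from the slide
Z = np.column_stack([np.ones(n), z])      # [1 z_i']
g_hat = Z @ np.linalg.lstsq(Z, g, rcond=None)[0]
LM = 0.5 * g_hat @ g_hat                  # ~ chi2(k-1) under H0
print(LM)  # large => reject homoscedasticity
```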

95

Page 97: Slide Mancini Eco No Metrics EPFL2011

Multiplicative heteroscedasticity: example

Goal: explain firms' profits, yi, i = 1, . . . , n

Model: yi = x′i β + εi, where

Var[εi|X] = σ2 exp(z′i α) (Harvey’s model)

Step 1: regress yi on xi using OLS and compute ei

Step 2: regress log(e2i ) on [1 z′i] using OLS to estimate σ2 (biased) and α

Step 3: regress yi on xi using FGLS with Ωii = exp(z′i α) to estimate β

LS applied twice to model yi = x′i β + εi: two-stage least squares

Remark: LS estimate of σ2 biased (but not important for FGLS) because

E log ε²i < log E ε²i = log σ²i = log σ² + z′iα

E log ε²i = −c + log σ²i, where c > 0

log e²i = −c + log σ² + z′iα + νi, where νi is an error term
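The three steps above can be sketched as follows (simulated data, illustrative parameter values):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4000
x = rng.normal(size=n)
z = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
sigma2_i = np.exp(0.2 + 1.0 * z)                 # Harvey: Var[eps_i|X] = exp(z_i' alpha)
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * np.sqrt(sigma2_i)

# Step 1: OLS residuals
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
# Step 2: regress log(e^2) on [1 z]; slope consistent for alpha,
# intercept biased by the constant -c (harmless for FGLS)
Z = np.column_stack([np.ones(n), z])
a = np.linalg.lstsq(Z, np.log(e**2), rcond=None)[0]
# Step 3: FGLS with Omega_ii = exp(z_i' alpha_hat)
w = np.exp(Z @ a)
P = 1 / np.sqrt(w)
b_fgls = np.linalg.lstsq(X * P[:, None], y * P, rcond=None)[0]
print(b_fgls)  # close to [1, 2]
```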

96

Page 98: Slide Mancini Eco No Metrics EPFL2011

Chapter 9: Panel data models

Time series: yit, t = 1, . . . , T

Cross sectional: yit, i = 1, . . . , n

Panel or longitudinal: yit, i = 1, . . . , n, t = 1, . . . , T , with n≫ T

y11  y21  y31  · · ·  yn1
 ⋮                    ⋮
y1T  y2T  y3T  · · ·  ynT

97

Page 99: Slide Mancini Eco No Metrics EPFL2011

Why panel data model

Rich panel databases are available, e.g. labor market, industrial sectors

Certain phenomena can be studied only in panel data models

E.g. Analysis of production function:

technological change (over time) and

economies of scale (across firms of different sizes)

98

Page 100: Slide Mancini Eco No Metrics EPFL2011

General framework for panel data model

Typically n≫ T

yit = x′it β + z′i α + εit

= x′it β + ci + εit

xit: K × 1, without constant term

zi: individual specific variables, observed or unobserved, with constant term

ci: individual effect, often unobserved and stochastic, e.g. “health”, “ability”

Goal: estimate partial effects β = ∂E[yit|xit]/∂xit and E[ci|xi1, xi2, . . .]

Note: if zi observed ∀i ⇒ linear model estimated by LS

99

Page 101: Slide Mancini Eco No Metrics EPFL2011

Modeling frameworks

Panel data model: yit = x′it β + ci + εit

1. Pooled model: ci = α constant term. Use OLS to estimate α, β

2. Fixed effects: ci unobserved and correlated with xit: E[ci|Xi] = αi

yit = x′it β + αi + εit + (ci − αi)

Regress yit on xit omits variables: LS biased, inconsistent estimate of β

3. Random effects: ci unobserved and uncorrelated with xit: E[ci|Xi] = α

yit = x′it β + α + εit + (ci − α)

Regress yit on xit and constant: OLS consistent, inefficient estimate of α, β

100

Page 102: Slide Mancini Eco No Metrics EPFL2011

Pooled model

Assumption: ci = α constant term

yit = x′it β + ci + εit

= x′it β + α + εit

E[εit|Xi] = 0

Var[εit|Xi] = σ2ε

Cov[εit εjs|Xi, Xj] = 0, if i 6= j or t 6= s

If assumptions of linear regression model are met: OLS unbiased and efficient

But this is hardly the case

101

Page 103: Slide Mancini Eco No Metrics EPFL2011

LS estimation of pooled model

Pooled model: yit = x′it β + ci + εit = x′it β + α + εit

If FE true model, Cov[ci, xit] ≠ 0: LS is inconsistent (omitted variables)

If RE true model, Cov[ci, xit] = 0: LS consistent but inefficient

In RE model:

yit = x′it β + ci + εit

    = x′it β + E[ci|Xi] + (ci − E[ci|Xi]) + εit

    = x′it β + α + ui + εit

    = x′it β + α + wit

Autocorrelation (within group i): Cov[wit, wis] = σ²u ≠ 0, t ≠ s

102

Page 104: Slide Mancini Eco No Metrics EPFL2011

Pooled regression with random effects

RE model: yit = x′it β + α + ui + εit. Stack Ti observations for individual i:

yi = [ii  Xi] [ α ]  + (εi + ii ui) = Xi β + wi
              [ β ]

Shocks, wi, are heteroscedastic (across individuals) and autocorrelated:

Var[wi] = Var[εi + ii ui] = σ²ε ITi + σ²u ii i′i = σ²ε ITi + Σi = Ωi

Recall: i = 1, . . . , n, and goal is to estimate β

103

Page 105: Slide Mancini Eco No Metrics EPFL2011

LS pooled regression with random effects

Stack all observations for all individuals, (T1 + . . . + Tn):

b = (X′X)−1X′y = β + [ (1/n) Σi X′iXi ]−1 (1/n) Σi X′iwi →p β

Asy.Var[b] = (1/n) plim[ (1/n) Σi X′iXi ]−1 plim[ (1/n) Σi X′iwi w′iXi ] plim[ (1/n) Σi X′iXi ]−1

LS consistent; Asy.Var[b] called robust covariance matrix

If data are well behaved,

plim[ (1/n) Σi X′iXi ]  and  plim[ (1/n) Σi X′iwi w′iXi ]  are positive definite,

but the second matrix needs to be "estimated"

104

Page 106: Slide Mancini Eco No Metrics EPFL2011

“Estimating” center matrix in Asy.Var[b]

Use White’s approach (not White’s heterosc. estimator):

plim[ (1/n) Σi X′iwi w′iXi ] = plim[ (1/n) Σi X′iΩiXi ]

                             = plim (1/n) Σi ( Σt xit wit ) ( Σt xit wit )′

                             ≠ plim (1/n) Σi Σt w²it xit x′it

Correlations across observations (not heterosc.) contribute most to Asy.Var[b]

105

Page 107: Slide Mancini Eco No Metrics EPFL2011

Pooled regression: group means estimator

To estimate β use n group means, e.g. for yit, t = 1, . . . , Ti:

(1/Ti) Σt yit = (1/Ti) i′i yi = ȳi.

Averaging eliminates the time series dimension of panel data (≈ cross section)

yi = Xi β + wi

(1/Ti) i′i yi = (1/Ti) i′i Xi β + (1/Ti) i′i wi

ȳi. = x̄′i. β + w̄i.

In pooled model w̄i. = ε̄i.; in RE model w̄i. = ε̄i. + ui heteroscedastic

Sample data (yi., xi.), i = 1, . . . , n

Estimation: LS for β and White’s heterosc. estimator for Asy.Var[b]

106

Page 108: Slide Mancini Eco No Metrics EPFL2011

Pooled regression: first difference estimator

General panel data model: yi,t = x′i,t β + ci + εi,t, where

ci correlated (fixed effects) or uncorrelated (random effects) with xi,t

yi,t − yi,t−1 = (xi,t − xi,t−1)′ β + εi,t − εi,t−1

∆yi,t = (∆xi,t)′ β + ui,t

Advantage: first difference removes all individual specific heterogeneity ci

Disadvantage: first difference removes all time-invariant variables too

ui,t: moving average (MA), covariance matrix tridiagonal, two-stage GLS

107

Page 109: Slide Mancini Eco No Metrics EPFL2011

Fixed effects model

Assumption: unobservable individual effect, ci, correlated with xit

yit = x′it β + ci + εit

= x′it β + E[ci|Xi] + (ci − E[ci|Xi]) + εit

= x′it β + h(Xi) + νi + εit

= x′it β + αi + εit

Further assumption: Var[ci|Xi] = Var[νi|Xi] is constant

In general: Cov[εit, εis|Xi] = E[(νi + εit)(νi + εis)|Xi] = E[ν²i|Xi] ≠ 0

Assumption: Var[εi|Xi] = σ²ε ITi ⇒ classical regression model

Parameters to estimate (K + n): [β1 · · ·βK]′ and αi, i = 1, . . . , n

108

Page 110: Slide Mancini Eco No Metrics EPFL2011

Fixed effects model: drawback

Time invariant variables in xit are absorbed in αi

x′it = [1x′it  2x′i]: time-variant and time-invariant variables

yit = x′it β + αi + εit

    = 1x′it β1 + 2x′i β2 + αi + εit

    = 1x′it β1 + αi + εit

β2 cannot be estimated (not identified)

109

Page 111: Slide Mancini Eco No Metrics EPFL2011

Fixed effects model: Least Squares Dummy Variable

Recall i = (T × 1) column of ones. Stack T observations for individual i:

yi = Xi β + i αi + εi

Stack all regression models for n individuals, LSDV model:

[ y1 ]   [ X1 ]     [ i  0  · · ·  0 ] [ α1 ]   [ ε1 ]
[ ⋮  ] = [ ⋮  ] β + [ ⋮            ⋮ ] [ ⋮  ] + [ ⋮  ]
[ yn ]   [ Xn ]     [ 0  0  · · ·  i ] [ αn ]   [ εn ]

y = [X  d1 · · · dn] [ β ]  + ε = X β + D α + ε
                     [ α ]

110

Page 112: Slide Mancini Eco No Metrics EPFL2011

Fixed effects model: least squares estimation

Model for nT observations: y = X β + D α + ε, interest on β

Partitioned regression, MD y on MD X, reduces size of computation

b = [X ′MDX]−1X ′MDy

Asy.Var[b] = s2 [X ′MDX]−1

Individual effect, αi, estimated using only T observations on individual i:

ai = ȳi. − x̄′i. b = (1/T) Σt (αi + x′it β + εit) − x̄′i. b

ai − αi = (1/T) Σt εit + (1/T) Σt x′it (β − b) = ε̄i. + x̄′i.(β − b)

Asy.Var[ai] = σ²ε/T + x̄′i. Asy.Var[b] x̄i.  ↛ 0, when n → ∞
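Computationally, the partitioned regression of MDy on MDX amounts to demeaning each group (the within transformation); a sketch on simulated fixed-effects data (illustrative values):

```python
import numpy as np

rng = np.random.default_rng(7)
n, T = 200, 5
alpha = rng.normal(size=n)                       # fixed effects
x = rng.normal(size=(n, T)) + alpha[:, None]     # regressor correlated with alpha
y = 2.0 * x + alpha[:, None] + rng.normal(size=(n, T))

# Within transformation: M_D demeans each individual's T observations
x_dm = x - x.mean(axis=1, keepdims=True)
y_dm = y - y.mean(axis=1, keepdims=True)
b = (x_dm.ravel() @ y_dm.ravel()) / (x_dm.ravel() @ x_dm.ravel())
a = y.mean(axis=1) - b * x.mean(axis=1)          # a_i = ybar_i. - xbar_i.' b
print(b)  # close to beta = 2; pooled OLS would be biased upward
```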

111

Page 113: Slide Mancini Eco No Metrics EPFL2011

Testing differences across groups

Null hypothesis H0 : α1 = · · · = αn

α1 − α2 = 0
α2 − α3 = 0
  ⋮
αn−1 − αn = 0

[ 1  −1   0  · · ·   0 ] [ α1 ]
[ 0   1  −1  · · ·   0 ] [ α2 ]   = R α = 0
[ ⋮               ⋮    ] [ ⋮  ]
[ 0  · · ·   0   1  −1 ] [ αn ]

that is, J = n − 1 restrictions on α.

F-statistic: compare unrestricted R² vs. restricted R²

F[n − 1, nT − K − n] = [ (R²LSDV − R²Pooled)/(n − 1) ] / [ (1 − R²LSDV)/(nT − K − n) ]

112

Page 114: Slide Mancini Eco No Metrics EPFL2011

Random effects model

Assumption: unobservable individual effect, ci, uncorrelated with xit

yit = x′it β + ci + εit

= x′it β + E[ci] + (ci − E[ci]) + εit

= x′it β + α + ui + εit

= x′it β + α + ηit

For T observations on individual i:

Var[ηi] = Var[εi + iT ui] = σ²ε IT + σ²u iT i′T = Σ

113

Page 115: Slide Mancini Eco No Metrics EPFL2011

Random effects model: Generalized least squares

Observations i ⊥ j ⇒ nT × nT cov. matrix block diagonal, Ω = In ⊗ Σ

Remark: Σ does not depend on i

GLS:

β̂ = (X′Ω−1X)−1X′Ω−1y = ( Σi X′i Σ−1 Xi )−1 ( Σi X′i Σ−1 yi )

σ²ε and σ²u in Σ are usually unknown: estimate them and then FGLS

114

Page 116: Slide Mancini Eco No Metrics EPFL2011

FGLS of random effects model: estimating σ2ε

Taking deviations from group means removes the heterogeneity ui

yit = x′it β + α + ui + εit

ȳi. = x̄′i. β + α + ui + ε̄i.

yit − ȳi. = (xit − x̄i.)′ β + εit − ε̄i.

          = (xit − x̄i.)′ b + eit − ēi.

σ̂²ε = Σi Σt (eit − ēi.)² / (nT − n − K) →p σ²ε

Degrees of freedom: nT observations − n means ȳi. − K slopes

Note ēi. = 0; note σ̂²ε = s²LSDV as

eit = yit − ȳi. − (xit − x̄i.)′ b = yit − x′it b − (ȳi. − x̄′i. b)

    = yit − x′it b − ai = residual in FE LSDV model

115

Page 117: Slide Mancini Eco No Metrics EPFL2011

FGLS of random effects model: estimating σ2u

OLS consistent, unbiased, not efficient estimator of α and β in

yit = x′it β + α + ui + εit = x′it β + α + ηit

Hence

plim s²Pooled = plim e′e/(nT − K − 1) = Var[ηit] = σ²u + σ²ε

Consistent estimator of σ²u:

σ̂²u = s²Pooled − s²LSDV

If negative, change degrees of freedom

116

Page 118: Slide Mancini Eco No Metrics EPFL2011

Random Effects or Fixed Effects model?

FE: flexible, Cov[ci, xit] ≠ 0, but many parameters to estimate: α1, . . . , αn

RE: parsimonious, but the assumption Cov[ci, xit] = 0 might be violated

Hausman's specification test, H0 : RE model

• H0 : Cov[ci, xit] = 0 ⇒ OLS in LSDV and GLS in RE model both consistent,

but OLS inefficient

• H1 : Cov[ci, xit] ≠ 0 ⇒ only OLS in LSDV consistent,

but GLS in RE model inconsistent

Under H0 : OLS in LSDV model ≈ GLS in RE model

117

Page 119: Slide Mancini Eco No Metrics EPFL2011

Hausman’s specification test

b OLS in LSDV model; β̂ GLS in RE model. Under H0 : b − β̂ ≈ 0

Var[b − β̂] = Var[b] + Var[β̂] − Cov[b, β̂] − Cov[β̂, b]

Hausman's key result:

0 = Cov[efficient estimator, (efficient estimator − inefficient estimator)]

0 = Cov[β̂, (β̂ − b)] = Var[β̂] − Cov[β̂, b]

This implies, under H0,

Var[b − β̂] = Var[b] − Var[β̂]

Wald criterion, based on K estimated slopes, excluding intercept:

W = [b − β̂]′ (Var[b] − Var[β̂])−1 [b − β̂] ∼ χ²(K)

118

Page 120: Slide Mancini Eco No Metrics EPFL2011

Mundlak’s approach

Fixed effects model: E[ci|Xi] = αi, one parameter for each individual i

Random effects model: E[ci|Xi] = α, one parameter for all individuals

Mundlak’s approach: E[ci|Xi] = x′i.γ, parameters γ for all individuals

Model:

yit = x′it β + ci + εit

= x′it β + E[ci|Xi] + (ci − E[ci|Xi]) + εit

= x′it β + x′

i.γ + ui + εit

Drawback: x′i.γ can only include time varying variables

119

Page 121: Slide Mancini Eco No Metrics EPFL2011

Dynamic panel data model

Model yit = x′it β + ci + εit describes static relation

Dynamic model yit = γ yi,t−1 + x′it β + ci + εit fits data much better

OLS and GLS inconsistent: ci correlated with yi,t−1

FE model, deviations from means, first difference: inconsistent estimates

Instrumental variable estimator: consistent estimates

Read about SUR and CAPM in Chapter 10

120

Page 122: Slide Mancini Eco No Metrics EPFL2011

Chapter 12: Instrumental variables

Linear regression model: y = Xβ + ε

b = β + (X ′X)−1X ′ε

b unbiased when E[ε|X] = 0

b consistent when plimX ′ε/n = 0

In many situations (e.g., dynamic panel models, measurement error on X),

X and ε are correlated ⇒ OLS (and GLS) biased and inconsistent

Solution: Instrumental variables (IV), consistent estimates

121

Page 123: Slide Mancini Eco No Metrics EPFL2011

Assumptions of the model

1. Linearity: E[y|X] linear in β

2. Full rank: X is an n×K matrix with rank K

3. Endogeneity of independent variables: E[εi|xi] ≠ 0

4. Homoscedasticity and nonautocorrelation of εi

5. Stochastic or nonstochastic X

6. Normal distribution: ε|X ∼ N(0, σ2I)

122

Page 124: Slide Mancini Eco No Metrics EPFL2011

Instrumental variable: Definition

Instrumental variables Z = [z1 · · · zL] (n× L), L ≥ K, have two properties:

1. Exogeneity: Z uncorrelated with ε

2. Relevance: Z correlated with X

Further assumptions of the model:

• [xi, zi, εi], i = 1, . . . , n, i.i.d.

• E[εi|zi] = 0

• plimZ ′Z/n = Qzz, finite, positive definite matrix

• plimZ ′ε/n = 0 (Exogeneity)

• plimZ ′X/n = Qzx, finite, L×K matrix, rank K (Relevance)

123

Page 125: Slide Mancini Eco No Metrics EPFL2011

Insight on IV estimation

When plimX ′ε/n = 0

y = X β + ε

X ′y/n = X ′X β/n + X ′ε/n

X ′y/n ≈ X ′X β/n

β ≈ (X ′X)−1X ′y

When plim X′ε/n ≠ 0, but plim Z′ε/n = 0 (and L = K)

Z ′y/n = Z ′X β/n + Z ′ε/n

Z ′y/n ≈ Z ′X β/n

β ≈ (Z ′X)−1Z ′y

Remark: ≈ are = in plim

124

Page 126: Slide Mancini Eco No Metrics EPFL2011

Instrumental variable estimator (L = K)

L instruments, observed variables, Z is n× L matrix, when L = K

bIV = (Z ′X)−1Z ′y

= β + (Z ′X)−1Z ′ε

plim bIV = β + (plim Z′X/n)−1 plim Z′ε/n = β

√n(bIV − β) = (Z′X/n)−1 √n Z′ε/n →d Q−1zx × N(0, σ²Qzz) =d N(0, σ² Q−1zx Qzz Q−1xz)

bIV ∼a N(β, σ² Q−1zx Qzz Q−1xz /n)

Exogeneity ⇒ consistency; Relevance ⇒ low variance

125

Page 127: Slide Mancini Eco No Metrics EPFL2011

Instrumental variable estimator (L > K)

When L > K, Z′X is L × K: not an invertible matrix

X correlated with ε ⇒ inconsistency

Z uncorrelated with ε (Exogeneity)

Idea: project X on Z to get X̂, then regress y on X̂ to estimate β

X̂ = Z × slope of X on Z = Z(Z′Z)−1Z′X

Regressing y on X̂:

bIV = [X̂′X̂]−1X̂′y

    = [X′Z(Z′Z)−1Z′ Z(Z′Z)−1Z′X]−1 X′Z(Z′Z)−1Z′y

    = [X′Z(Z′Z)−1Z′X]−1 X′Z(Z′Z)−1Z′y

Two-stage least squares (2SLS) estimator (the two stages are only conceptual)
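A minimal sketch of 2SLS on simulated data with one endogenous regressor and two excluded instruments (all names and values illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 5000
u = rng.normal(size=n)                            # common shock -> endogeneity
z1, z2 = rng.normal(size=n), rng.normal(size=n)   # excluded instruments
x = z1 + 0.5 * z2 + u + rng.normal(size=n)        # relevant, endogenous regressor
y = 2.0 * x + u + rng.normal(size=n)              # eps contains u: Cov[x, eps] != 0

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z1, z2])         # L = 3 > K = 2 (with constant)

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]  # stage 1: project X on Z
b_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0] # stage 2: regress y on X_hat
print(b_ols[1], b_2sls[1])  # OLS biased away from 2; 2SLS close to 2
```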

126

Page 128: Slide Mancini Eco No Metrics EPFL2011

Which instruments?

Instrumental variables are generally difficult to find

Z can include variables in X uncorrelated with ε

In time series settings, lagged values of x and y are typical instruments

Relevance ⇒ high correlation between X and Z (otherwise Q−1xz large)

But then Z might be correlated with ε (as ε is correlated with X)

127

Page 129: Slide Mancini Eco No Metrics EPFL2011

Example: Dynamic panel data model

Model: yit = γ yi,t−1 + x′it β + ci + εit

ci correlated or uncorrelated with xit

ci certainly correlated with yi,t−1 ⇒ LS inconsistent

Taking first difference, ∆yit = yit − yi,t−1

∆yit = γ∆yi,t−1 + ∆x′it β + ∆εit

Cov[∆yi,t−1, ∆εit] ≠ 0 ⇒ LS still inconsistent

To estimate γ and β, valid instruments, e.g., yi,t−2 and ∆yi,t−2

128

Page 130: Slide Mancini Eco No Metrics EPFL2011

Measurement error

Measurement errors are very common in practice

E.g., variables of interest are not available but only approximated by others

E.g., GDP, consumption, capital, . . . , cannot be measured exactly

129

Page 131: Slide Mancini Eco No Metrics EPFL2011

Regression model with measurement error

True, latent (unobserved), univariate model

y*i = x*i β + εi

Observed data: yi = y*i + vi and xi = x*i + ui

where vi ∼ (0, σ²v), vi ⊥ y*i, x*i, ui and ui ∼ (0, σ²u), ui ⊥ y*i, x*i, vi

Working model, derived from true model:

yi − vi = (xi − ui) β + εi

yi = xi β + (−ui β + εi + vi)

Measurement error on yi, i.e. vi, absorbed in the error term

Measurement error on xi, i.e. ui, makes LS inconsistent

130

Page 132: Slide Mancini Eco No Metrics EPFL2011

LS estimation with measurement error

Set vi = 0 for simplicity. Working model:

yi = xi β + (−ui β + εi) = xi β + wi

LS estimation of β inconsistent because

Cov[xi, wi] = Cov[x*i + ui, −ui β + εi] = −β σ²u ≠ 0

b = ( Σi x²i/n )−1 Σi xi yi/n

plim b = ( plim Σi (x*i + ui)²/n )−1 plim Σi (x*i + ui)(x*i β + εi)/n

       = ( Q* + σ²u )−1 β Q* = β/(1 + σ²u/Q*)

       → 0 when σ²u → ∞

131

Page 133: Slide Mancini Eco No Metrics EPFL2011

IV estimation with measurement error

Instrument zi has the two properties:

1. Exogeneity: Cov[zi, ui] = 0

2. Relevance: Cov[zi, x*i] = Q*zx ≠ 0

Recall true model yi = x*i β + εi, observed regressor xi = x*i + ui

bIV = ( Σi xi zi/n )−1 Σi zi yi/n

plim bIV = ( plim Σi (x*i + ui) zi/n )−1 plim Σi zi (x*i β + εi)/n

         = (Q*zx)−1 β Q*zx = β
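A simulation sketch of the attenuation result and its IV fix (illustrative values: β = 2, σ²u = 0.5, Q* = 1):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 20000
x_star = rng.normal(size=n)              # latent regressor, Q* = Var[x*] = 1
z = x_star + rng.normal(size=n)          # instrument: correlated with x*, not with u
u = rng.normal(size=n) * np.sqrt(0.5)    # measurement error, sigma_u^2 = 0.5
x = x_star + u                           # observed regressor
y = 2.0 * x_star + rng.normal(size=n)

b_ls = (x @ y) / (x @ x)                 # attenuated: plim = beta/(1 + s2u/Q*) = 4/3
b_iv = (z @ y) / (z @ x)                 # consistent for beta = 2
print(b_ls, b_iv)
```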

132

Page 134: Slide Mancini Eco No Metrics EPFL2011

IV estimation of generalized regression model

In generalized regression model E[ε ε′|X] = σ2Ω

bIV = [X ′Z(Z ′Z)−1Z ′X]−1X ′Z(Z ′Z)−1Z ′y

= β + [X ′Z(Z ′Z)−1Z ′X]−1X ′Z(Z ′Z)−1Z ′ε

plim bIV = β + Qxx.z × plim Z′ε/n = β

√n(bIV − β) →d Qxx.z × N(0, σ² plim(Z′ΩZ/n)) =d N(0, σ² Qxx.z plim(Z′ΩZ/n) Q′xx.z)

bIV ∼a N(β, σ² Qxx.z plim(Z′ΩZ/n) Q′xx.z /n)

Same derivation as when E[ε ε′|X] = σ2I

133

Page 135: Slide Mancini Eco No Metrics EPFL2011

Chapter 15: Generalized method of moments (GMM)

General framework for estimation and hypothesis testing

LS, NLS, GLS, IV, etc. special cases of GMM

GMM relies on “weak” assumptions about first moments

(existence and convergence of first moments)

Strength (and limitation) of GMM:

No assumptions about distribution ⇒ Robust to misspecification of DGP

Widely used in Econometrics, Finance, . . .

134

Page 136: Slide Mancini Eco No Metrics EPFL2011

Logic behind method of moments

Sample moments →p Population moments = function(parameters)

E.g., random sample yi, i = 1, . . . , n, with E[yi] = µ and Var[yi] = σ²

(1/n) Σi yi →p E[yi] = µ

(1/n) Σi y²i →p E[y²i] = σ² + µ²

Assumptions of Law of Large Numbers need to hold

135

Page 137: Slide Mancini Eco No Metrics EPFL2011

Orthogonality conditions: Example

Parameters are implicitly defined by two orthogonality conditions:

E[yi − µ] = 0

E[y2i − σ2 − µ2] = 0

To estimate µ and σ2, replace E[·] by empirical distribution

and solve two moment equations:

(1/n) Σi (yi − µ) = 0

(1/n) Σi (y²i − σ² − µ²) = 0

Moment estimators: µ̂ = Σi yi/n and σ̂² = Σi (yi − µ̂)²/n
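The two moment equations solve in closed form; a sketch (illustrative µ and σ):

```python
import numpy as np

rng = np.random.default_rng(12)
y = rng.normal(loc=3.0, scale=2.0, size=100_000)

# Solve the two sample moment equations for (mu, sigma^2)
mu_hat = y.mean()                    # (1/n) sum(y_i - mu) = 0
s2_hat = (y**2).mean() - mu_hat**2   # (1/n) sum(y_i^2 - sigma^2 - mu^2) = 0
print(mu_hat, s2_hat)                # close to mu = 3 and sigma^2 = 4
```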

136

Page 138: Slide Mancini Eco No Metrics EPFL2011

Example: Gamma distribution

Gamma distribution used to model positive r.v. yi, e.g. waiting time

f(y) = (λ^p / Γ[p]) e^(−λy) y^(p−1), y ≥ 0, p > 0, λ > 0

(Some) orthogonality conditions:

E[ yi − p/λ
   y²i − p(p + 1)/λ²
   ln yi − d ln Γ[p]/dp + ln λ
   1/yi − λ/(p − 1) ] = 0

Orthogonality conditions are (general) nonlinear functions of sample data

More orthogonality conditions (four) than parameters (two)

Any two orthogonality conditions give (p, λ): need to reconcile all of them

137

Page 139: Slide Mancini Eco No Metrics EPFL2011

Orthogonality conditions

K parameters to estimate, θ = (θ1, . . . , θK)′

L moment conditions (L ≥ K):

E[ mi1(yi, xi, θ)
   ⋮
   mil(yi, xi, θ)
   ⋮
   miL(yi, xi, θ) ] = E[mi(yi, xi, θ)] = 0

θ implicitly defined by the equation above,

estimated via the empirical counterpart of E[·]

138

Page 140: Slide Mancini Eco No Metrics EPFL2011

Exactly identified case

When L = K, i.e. # moment conditions = # parameters,

sample moment equations have a unique solution and are all exactly satisfied

• E.g., previous method of moments estimator of µ and σ2

• E.g., LS estimator: E[mi(yi, xi, θ)] = E[xi (yi − x′i θ)] = 0

Solving sample moment equations (or normal equations)

(1/n) Σi xi (yi − x′i θ) = 0

(1/n) Σi xi yi − (1/n) Σi xi x′i θ = 0

θ̂ = ( Σi xi x′i )−1 ( Σi xi yi )

139

Page 141: Slide Mancini Eco No Metrics EPFL2011

Overidentified case

When L > K, i.e. # moment conditions > # parameters,

system of L equations in K unknown parameters

(1/n) Σi mil(yi, xi, θ) = 0, l = 1, . . . , L

has no solution (equations functionally independent) in finite samples

although

plim (1/n) Σi mil(yi, xi, θ) = E[mil(yi, xi, θ)] = 0, l = 1, . . . , L

E.g., previous estimation of parameters of Gamma distribution

E.g., IV estimation when # instruments L > # parameters K

140

Page 142: Slide Mancini Eco No Metrics EPFL2011

Criterion function

When L > K, to reconcile different estimates, minimize criterion function

q = m̄(θ)′ Wn m̄(θ)

where m̄(θ) = Σi mi(yi, xi, θ)/n, the L × 1 vector of moment conditions

Wn: positive definite weighting matrix, with plim Wn = W

• When Wn = I

q = m̄(θ)′ m̄(θ) = Σl m̄l(θ)²

where m̄l(θ) = Σi mil(yi, xi, θ)/n, l = 1, . . . , L

• When Wn inversely proportional to variance of m(θ) ⇒ Efficiency gains

same logic that makes GLS more efficient than OLS

141

Page 143: Slide Mancini Eco No Metrics EPFL2011

Optimal weighting matrix

L orthogonality conditions, possibly correlated

optimal weighting matrix:

W = Asy.Var[√n m̄(θ)]−1 = Φ−1

Recall, Var[m̄(θ)] = Var[Σi mi(yi, xi, θ)/n] ∈ O(1/n)

Efficient GMM estimator based on Φ−1

• When L > K, W = I (or any W ≠ Φ−1) produces inefficient estimates of θ

• When L = K ⇒ moment equations satisfied exactly, i.e. m̄(θ̂) = 0,

⇒ q = 0 and W irrelevant

142

Page 144: Slide Mancini Eco No Metrics EPFL2011

Assumptions of GMM estimation

θ0 true parameter vector, K × 1

L population orthogonality conditions: E[mi(θ0)] = 0, L ≥ K

L sample moments: m̄n(θ0) = Σi mi(θ0)/n

E.g., IV estimation: m̄n(θ0) = Σi zi(yi − x′iθ0)/n,

L ≥ K instruments, one orthogonality condition for each instrument

Assumption 1: Convergence of empirical moments

Data generating process satisfies assumptions of Law of Large Numbers

m̄n(θ0) = (1/n) Σi mi(θ0) →p E[mi(θ0)] = 0

143

Page 145: Slide Mancini Eco No Metrics EPFL2011

Assumptions of GMM estimation

Empirical moment equations continuous and continuously differentiable

=⇒ L×K matrix of partial derivatives

Ḡn(θ0) = ∂m̄n(θ0)/∂θ′0 = (1/n) Σi ∂mi(θ0)/∂θ′0 →p G(θ0)

Law of Large Numbers applies to moments and derivatives of moments

Assumption 2: Identification

For any n > K, if θ1 ≠ θ2, then m̄n(θ1) ≠ m̄n(θ2)

plim qn(θ) = plim(m̄n(θ)′ Wn m̄n(θ)) has a unique minimum (= zero) at θ0

Identification ⇒ L ≥ K and rank(Ḡn(θ0)) = K

144

Page 146: Slide Mancini Eco No Metrics EPFL2011

Assumptions of GMM estimation

Assumptions 1 and 2 =⇒ θ can be estimated

Assumption 3: Asymptotic distribution of empirical moments

Empirical moments obey a Central Limit Theorem

√n m̄n(θ0) →d N(0, Φ)

145

Page 147: Slide Mancini Eco No Metrics EPFL2011

Asymptotic properties of GMM

Under previous assumptions

θ̂GMM →p θ0

θ̂GMM ∼a N(θ0, [G(θ0)′ Φ−1 G(θ0)]−1/n)

146

Page 148: Slide Mancini Eco No Metrics EPFL2011

Consistency of GMM estimator

Recall criterion function qn(θ) = mn(θ)′ Wn mn(θ)

Assumption 1 and continuity of moments ⇒ qn(θ) →p q0(θ)

Wn positive definite, for any finite n:

0 ≤ qn(θ̂GMM) ≤ qn(θ0)

When n → ∞, qn(θ0) →p 0 ⇒ qn(θ̂GMM) →p 0

W positive definite and identification assumption ⇒ θ̂GMM →p θ0

147

Page 149: Slide Mancini Eco No Metrics EPFL2011

Asymptotic normality of GMM estimator

First order condition for the GMM estimator:

∂qn(θ̂GMM)/∂θ̂GMM = 2 Ḡn(θ̂GMM)′ Wn m̄n(θ̂GMM) = 0

Assumption: moment equations continuous and continuously differentiable

Mean Value Theorem and Taylor expansion at θ0 of the moment equations:

m̄n(θ̂GMM) = m̄n(θ0) + Ḡn(θ̄)(θ̂GMM − θ0)

where θ0 < θ̄ < θ̂GMM componentwise. The first order condition becomes:

2 Ḡn(θ̂GMM)′ Wn [m̄n(θ0) + Ḡn(θ̄)(θ̂GMM − θ0)] = 0

Solving for (θ̂GMM − θ0) and multiplying by √n gives:

148

Page 150: Slide Mancini Eco No Metrics EPFL2011

Asymptotic normality of GMM estimator

√n(θ̂GMM − θ0) = −[Ḡn(θ̂GMM)′ Wn Ḡn(θ̄)]−1 Ḡn(θ̂GMM)′ Wn √n m̄n(θ0)

When n → ∞

• θ̂GMM →p θ0 and θ̄ →p θ0 as θ0 < θ̄ < θ̂GMM componentwise

• Ḡn(θ̂GMM) →p G(θ0) and Ḡn(θ̄) →p G(θ0)

• Wn →p W by construction of the weighting matrix

• √n m̄n(θ0) →d N(0, Φ) by Assumption 3

√n(θ̂GMM − θ0) →d −[G(θ0)′ W G(θ0)]−1 G(θ0)′ W × N(0, Φ)

=d N(0, [G(θ0)′ W G(θ0)]−1 G(θ0)′ W Φ W′ G(θ0) [G(θ0)′ W G(θ0)]−1)

=d N(0, [G(θ0)′ Φ−1 G(θ0)]−1) using W = Φ−1

θ̂GMM ∼a N(θ0, [G(θ0)′ Φ−1 G(θ0)]−1/n)

149

Page 151: Slide Mancini Eco No Metrics EPFL2011

Weighting matrix

Any positive definite matrix W produces consistent GMM estimates

W determines efficiency of GMM estimator:

Optimal W = Asy.Var[√n m(θ)]−1 depends on unknown θ

Feasible two-step procedure:

Step 1. Use W = I to obtain a consistent estimator, θ(1), then estimate

Φ = (1/n) Σ_{i=1}^n mi(yi, xi, θ(1)) mi(yi, xi, θ(1))′

(when mi(yi, xi, θ0), i = 1, . . . , n is an uncorrelated sequence)

Step 2. Use W = Φ−1 to compute GMM estimator
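The two-step procedure can be sketched on a linear moment condition E[zi(yi − xi β)] = 0, where the GMM estimator has a closed form (simulated data and parameter values chosen purely for illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=(n, 2))                        # two instruments (L = 2)
x = z @ np.array([1.0, 0.5]) + rng.normal(size=n)  # one regressor (K = 1)
beta_true = 2.0
u = rng.normal(size=n) * (1.0 + 0.5 * np.abs(x))   # heteroscedastic errors
y = beta_true * x + u

def gmm(W):
    # linear GMM: minimize [z'(y - x b)/n]' W [z'(y - x b)/n] over scalar b
    zx, zy = z.T @ x / n, z.T @ y / n
    return (zx @ W @ zy) / (zx @ W @ zx)

b1 = gmm(np.eye(2))                                # Step 1: W = I, consistent
m = z * (y - x * b1)[:, None]                      # moments m_i at first step
Phi = m.T @ m / n                                  # Phi = (1/n) sum m_i m_i'
b2 = gmm(np.linalg.inv(Phi))                       # Step 2: W = Phi^{-1}
```

With W = I the estimator is already consistent; the second step only improves efficiency, so b1 and b2 should both be close to the true value.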

150

Page 152: Slide Mancini Eco No Metrics EPFL2011

Testing hypothesis in GMM framework

Two sets of tests:

1. Testing restrictions induced by moment equations

2. GMM counterparts to Wald, LM, and LR tests

151

Page 153: Slide Mancini Eco No Metrics EPFL2011

Specification test

In exactly identified case, L moment equations = K parameters:

θ exists such that m(θ) = 0

In overidentified case, L moment equations > K parameters:

L−K moment equations imply moment restrictions on θ

Intuition: • K moment equations "set to zero to compute" the K parameters

• L−K "free" moment equations

Test of overidentifying restrictions, using W = Asy.Var[√n m(θ)]−1:

J-stat = nq = √n m(θ)′ W √n m(θ) d−→ χ2(L−K)

Note: no parametric restrictions on θ in the specification test
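As an illustration on simulated data (linear IV moments with two instruments and one parameter, so L − K = 1; a sketch, not from the slides), the J-statistic is the efficient-GMM criterion scaled by n; 3.84 is the known χ2(1) 5% critical value:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=(n, 2))                        # L = 2 instruments
x = z @ np.array([1.0, 0.5]) + rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)                   # K = 1, valid instruments

def gmm(W):
    zx, zy = z.T @ x / n, z.T @ y / n
    return (zx @ W @ zy) / (zx @ W @ zx)

b1 = gmm(np.eye(2))                                # first step
m = z * (y - x * b1)[:, None]
Phi = m.T @ m / n                                  # Asy.Var of sqrt(n)*moments
W = np.linalg.inv(Phi)
b2 = gmm(W)                                        # efficient estimate

mbar = z.T @ (y - x * b2) / n                      # moments at b2
J = n * mbar @ W @ mbar                            # asymptotically chi2(L-K)
reject = J > 3.84                                  # chi2(1) 5% critical value
```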

152

Page 154: Slide Mancini Eco No Metrics EPFL2011

Testing parametric restrictions

To test J (linear or nonlinear) parametric restrictions on θ

Given L moment equations, now only K − J free parameters

nqR = √n m(θR)′ W √n m(θR) d−→ χ2(L−(K−J))

nqR − nq d−→ χ2(J)

as for degrees of freedom: (L − (K − J)) − (L − K) = J

Note: same optimal weighting matrix W in qR and q =⇒ qR ≥ q

153

Page 155: Slide Mancini Eco No Metrics EPFL2011

Application of GMM: Asset pricing model estimation

Asset pricing model:

E[re_{j,t}] = δAER βAER,j + δHML βHML,j + δCLS βCLS,j = δ′β

Stochastic Discount Factor (SDF) representation, demeaned factors f̃t:

mt = 1 − bAER ÃERt − bHML H̃MLt − bCLS C̃LSt = 1 − b′ f̃t

Euler pricing equation:

E[mt re_{j,t}] = 0 ⇒ N moment conditions, j = 1, . . . , N

Market price of risks, δ, and SDF loadings, b:

δ = E[f̃t f̃t′] b

154

Page 156: Slide Mancini Eco No Metrics EPFL2011

GMM estimation results

Model        (1)        (2)        (3)

δAER        11.43                  4.34
            (7.26)               (11.36)
δHML        18.70                  7.48
           (13.08)               (35.93)
δCLS        13.16                 27.13
            (3.35)               (17.77)
bAER                    0.13      −0.03
                       (0.05)     (0.06)
bHML                    0.05      −0.16
                       (0.04)     (0.07)
bCLS                    0.26       0.75
                       (0.07)     (0.24)
J-stat      0.0467    0.0444     0.0349
p-value     6.03%     7.71%     14.14%

Table 1: Parameter estimates (Newey–West standard errors).

155

Page 157: Slide Mancini Eco No Metrics EPFL2011

Chapter 16: Maximum Likelihood Estimation

Maximum Likelihood Estimation (MLE): very important inference method

Maximum likelihood principle:

Given sample data generated from parametric model,

find parameters that maximize probability of observing that sample

Basic, strong assumption:

DGP has parametric, known (up to θ) distribution

Fundamental result:

MLE makes “best use” of this information

156

Page 158: Slide Mancini Eco No Metrics EPFL2011

Likelihood function

Likelihood function = probability of observing that sample

Formally, joint density of n i.i.d. observations, y1, . . . , yn

f(y1, . . . , yn; θ) = ∏_{i=1}^n f(yi; θ) = L(θ; y)

L(θ; y) is the likelihood function, with θ unknown

Log-likelihood is usually easier to deal with

lnL(θ; y) = Σ_{i=1}^n ln f(yi; θ)

157

Page 159: Slide Mancini Eco No Metrics EPFL2011

Identification

Identification means parameters are estimable. It depends on the model

Check identification before estimating or testing the model

Definition: θ is identified (or estimable) if

L(θ; y) ≠ L(θ∗; y), ∀ θ∗ ≠ θ and some data y

E.g. Linear regression model not identified when rank [x1, . . . , xK] < K

E.g. Threshold model for yi > 0 or yi ≤ 0

Pr(yi > 0) = Pr(β1 + β2 xi + εi > 0) = Pr(εi/σ > −(β1 + β2 xi)/σ)

not identified, σ, β1, β2 not estimable (normalization required, e.g. σ = 1)

158

Page 160: Slide Mancini Eco No Metrics EPFL2011

Maximum likelihood estimator

Maximum likelihood estimator, θ, solves

θ = arg maxθ L(θ; y) = arg maxθ lnL(θ; y)

or equivalently the likelihood equation

∂ lnL(θ; y)/∂θ = 0

159

Page 161: Slide Mancini Eco No Metrics EPFL2011

Maximum likelihood estimator: Example

i.i.d. normal random variables, yi ∼ N(µ, σ2), i = 1, . . . , n

lnL(µ, σ2; y) = −(n/2) ln(2π) − (n/2) lnσ2 − (1/2) Σ_{i=1}^n (yi − µ)2/σ2

∂ lnL/∂µ = Σ_{i=1}^n (yi − µ)/σ2 = 0

∂ lnL/∂σ2 = −n/(2σ2) + (1/(2σ4)) Σ_{i=1}^n (yi − µ)2 = 0

Solve likelihood equations:

µML = (1/n) Σ_{i=1}^n yi        σ2ML = (1/n) Σ_{i=1}^n (yi − µML)2
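A quick numerical check of these closed forms on simulated data (an illustrative sketch, not part of the slides): the analytical MLE should beat any perturbed parameter values.

```python
import math
import random

random.seed(0)
y = [random.gauss(1.5, 2.0) for _ in range(5000)]
n = len(y)

def loglik(mu, s2):
    # normal log-likelihood from the slide
    return (-0.5 * n * math.log(2 * math.pi) - 0.5 * n * math.log(s2)
            - 0.5 * sum((yi - mu) ** 2 for yi in y) / s2)

mu_ml = sum(y) / n                                  # sample mean
s2_ml = sum((yi - mu_ml) ** 2 for yi in y) / n      # divides by n, not n - 1

# the closed-form solution maximizes lnL: perturbing either parameter lowers it
assert loglik(mu_ml, s2_ml) > loglik(mu_ml + 0.05, s2_ml)
assert loglik(mu_ml, s2_ml) > loglik(mu_ml, s2_ml * 1.05)
```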

160

Page 162: Slide Mancini Eco No Metrics EPFL2011

Asymptotic efficiency

An estimator is asymptotically efficient if it is

• consistent,

• asymptotically normally distributed (CAN), and has

• asy. covariance matrix not larger than that of any other CAN estimator

Under some regularity conditions, MLE is asymptotically efficient

Finite sample properties usually not optimal

E.g., σ2ML = Σ_{i=1}^n (yi − ȳ)2/n biased (no correction for degrees of freedom)

161

Page 163: Slide Mancini Eco No Metrics EPFL2011

Properties of MLE

Under regularity conditions, MLE θ has the following properties:

M1 Consistency: plim θ = θ0

M2 Asymptotic normality: θ a∼ N(θ0, {−E0[∂2 lnL/∂θ0 ∂θ′0]}−1)

M3 Asymptotic efficiency: θ reaches Cramer–Rao lower bound in M2

M4 Invariance: MLE of γ0 = c(θ0) is γ0 = c(θ) if c ∈ C1

162

Page 164: Slide Mancini Eco No Metrics EPFL2011

Regularity conditions on f(yi; θ)

R1 First three derivatives of ln f(yi; θ) w.r.t. θ are continuous and finite ∀θ

R2 Conditions for E[∂ ln f(yi; θ)/∂θ] <∞, E[∂2 ln f(yi; θ)/∂θ ∂θ′] <∞ hold

R3 |∂3 ln f(yi; θ)/∂θj ∂θk ∂θl| < h, where E[h] <∞, ∀θ

Definition: Regular densities satisfy R1–R3

Goals: use Taylor approximation; interchange differentiation and expectation

Notation: gradient gi = ∂ ln f(yi; θ)/∂θ, Hessian Hi = ∂2 ln f(yi; θ)/∂θ ∂θ′

163

Page 165: Slide Mancini Eco No Metrics EPFL2011

Properties of regular densities

Moments of derivatives of log-likelihood:

D1 ln f(yi; θ), gi, Hi, i = 1, . . . , n are random samples

D2 E0[gi(θ0)] = 0

D3 Var0[gi(θ0)] = −E0[Hi(θ0)]

D1 implied by assumption: yi, i = 1, . . . , n is random sample

To prove D2: by definition 1 = ∫ f(yi; θ0) dyi. Differentiating both sides:

∂1/∂θ0 = (∂/∂θ0) ∫ f(yi; θ0) dyi

0 = ∫ [∂f(yi; θ0)/∂θ0] dyi = ∫ [∂ ln f(yi; θ0)/∂θ0] f(yi; θ0) dyi = E0[gi(θ0)]

164

Page 166: Slide Mancini Eco No Metrics EPFL2011

Information matrix equality

To prove D3: differentiate previous integral once more w.r.t. θ0

∂0/∂θ′0 = (∂/∂θ′0) ∫ [∂ ln f(yi; θ0)/∂θ0] f(yi; θ0) dyi

0 = ∫ [ (∂2 ln f(yi; θ0)/∂θ0 ∂θ′0) f(yi; θ0) + (∂ ln f(yi; θ0)/∂θ0)(∂f(yi; θ0)/∂θ′0) ] dyi

= ∫ [ (∂2 ln f(yi; θ0)/∂θ0 ∂θ′0) f(yi; θ0) + (∂ ln f(yi; θ0)/∂θ0)(∂ ln f(yi; θ0)/∂θ′0) f(yi; θ0) ] dyi

= E0[Hi(θ0)] + Var0[gi(θ0)] =⇒ D3

D1 (random sample) ⇒ Var0[Σ_{i=1}^n gi(θ0)] = Σ_{i=1}^n Var0[gi(θ0)]

Var0[Σ_{i=1}^n gi(θ0)] =: Var0[∂ lnL(θ0; y)/∂θ0] = −E0[∂2 lnL(θ0; y)/∂θ0 ∂θ′0] =: −E0[Σ_{i=1}^n Hi(θ0)]

The middle equality is the Information matrix equality

165

Page 167: Slide Mancini Eco No Metrics EPFL2011

Likelihood equation

Score vector at θ:

g = ∂ lnL(θ; y)/∂θ = Σ_{i=1}^n ∂ ln f(yi; θ)/∂θ = Σ_{i=1}^n gi

D1 (random sample) and D2 (E0[gi(θ0)] = 0) ⇒ Likelihood equation at θ0:

E0[∂ lnL(θ0; y)/∂θ0] = 0

166

Page 168: Slide Mancini Eco No Metrics EPFL2011

Consistency of MLE

In any finite sample, lnL(θ̂) ≥ lnL(θ0) (and in general ∀ θ ≠ θ̂, not only θ0)

From Jensen's inequality, if θ ≠ θ0 (and in general ∀ θ ≠ θ0, not only θ̂)

E0[ln(L(θ)/L(θ0))] < ln E0[L(θ)/L(θ0)] = ln ∫ (L(θ)/L(θ0)) L(θ0) dy = ln 1 = 0

E0[lnL(θ)/n] < E0[lnL(θ0)/n]   (♣)

Under previous assumptions, using the inequality in the very first row:

plim lnL(θ̂)/n ≥ plim lnL(θ0)/n

E0[lnL(θ̂)/n] ≥ E0[lnL(θ0)/n]

and combining with (♣): E0[lnL(θ0)/n] > E0[lnL(θ̂)/n] ≥ E0[lnL(θ0)/n]

⇒ plim lnL(θ̂)/n = E0[lnL(θ0)/n] and plim θ̂ = θ0

167

Page 169: Slide Mancini Eco No Metrics EPFL2011

Asymptotic normality of MLE

MLE solves sample likelihood equation: g(θ̂) = Σ_{i=1}^n gi(θ̂) = 0

First order Taylor expansion: g(θ̂) = g(θ0) + H(θ̄)(θ̂ − θ0) = 0

As θ̄ = w θ0 + (1 − w) θ̂, 0 < w < 1, plim θ̂ = θ0 ⇒ plim θ̄ = θ0

Hessian is continuous in θ. Rearranging, scaling by √n, taking limit n→∞:

√n(θ̂ − θ0) = −H(θ̄)−1 √n g(θ0) = [−(1/n) Σ_{i=1}^n Hi(θ̄)]−1 √n (1/n) Σ_{i=1}^n gi(θ0)

d−→ {−E0[(1/n) Σ_{i=1}^n Hi(θ0)]}−1 × N(0, −E0[(1/n) Σ_{i=1}^n Hi(θ0)])

d= N(0, {−E0[(1/n) Σ_{i=1}^n Hi(θ0)]}−1)

θ̂ a∼ N(θ0, {−E0[H(θ0)/n]}−1/n) = N(θ0, I(θ0)−1)

168

Page 170: Slide Mancini Eco No Metrics EPFL2011

Asymptotic efficiency

Cramer–Rao lower bound:

If f(yi; θ0) satisfies regularity conditions R1–R3, then

the asymptotic variance of a consistent and asy. normally distributed

estimator of θ0 is at least as large as

I(θ0)−1 = {−E0[∂2 lnL(θ0)/∂θ0 ∂θ′0]}−1

Asymptotic variance of MLE reaches the Cramer–Rao lower bound

169

Page 171: Slide Mancini Eco No Metrics EPFL2011

Invariance

MLE of γ0 = c(θ0) is γ0 = c(θ) if c ∈ C1

MLE invariant to one-to-one transformation

Useful application: lnL(θ0) can be “complicated” function of θ0

re-parameterize the model to simplify calculations using lnL(γ0)

E.g. Normal log-likelihood, precision parameter γ2 = 1/σ2

lnL(µ, γ2; y) = −(n/2) ln(2π) + (n/2) ln γ2 − (γ2/2) Σ_{i=1}^n (yi − µ)2

∂ lnL/∂γ2 = (n/2)(1/γ2) − (1/2) Σ_{i=1}^n (yi − µ)2 = 0

γ2ML = n / Σ_{i=1}^n (yi − µML)2 = 1/σ2ML

170

Page 172: Slide Mancini Eco No Metrics EPFL2011

Estimating asymptotic covariance matrix of MLE

Asy.Var[θ] depends on θ0. Three estimators, asymptotically equivalent:

1. Calculate E0[H(θ0)] (very difficult) and evaluate it at θ to estimate

I(θ0)−1 = {−E0[∂2 lnL(θ0)/∂θ0 ∂θ′0]}−1

2. Calculate H(θ0) (still quite difficult) and evaluate it at θ to get

I(θ)−1 = [−∂2 lnL(θ)/∂θ ∂θ′]−1 = [−Σ_{i=1}^n Hi(θ)]−1

3. BHHH or OPG estimator (very easy): use D3, E0[−Hi(θ0)] = Var0[gi(θ0)]

Î(θ)−1 = [Σ_{i=1}^n gi(θ) gi(θ)′]−1
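For the normal model with known variance (a toy sketch on simulated data, not from the slides), estimators 2 and 3 can be compared directly, since the per-observation gradient and Hessian are (yi − µ)/σ2 and −1/σ2:

```python
import random

random.seed(1)
sigma2 = 4.0                                   # variance treated as known
y = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(20000)]
n = len(y)
mu_hat = sum(y) / n                            # MLE of mu

# estimator 2: observed Hessian, H_i = -1/sigma2  =>  [-sum H_i]^{-1} = sigma2/n
var_hess = sigma2 / n
# estimator 3 (BHHH/OPG): g_i = (y_i - mu)/sigma2  =>  [sum g_i^2]^{-1}
var_opg = 1.0 / sum(((yi - mu_hat) / sigma2) ** 2 for yi in y)

assert abs(var_opg / var_hess - 1.0) < 0.05    # asymptotically equivalent
```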

171

Page 173: Slide Mancini Eco No Metrics EPFL2011

Conditional likelihood

Econometric models involve exogenous variables xi ⇒ yi not i.i.d.

E.g. Model: yi = x′i β + εi, xi can be stochastic, correlated across i’s, etc.

Usually f(y; θ) not interesting, data generated by f(y, x) not known

Way out: DGP of xi exogenous and well-behaved (LLN applies),

xi ∼ f(xi; δ), θ and δ no common elements, no restrictions between θ and δ

f(yi, xi; θ, δ) = f(yi|xi; θ) f(xi; δ)

lnL(θ, δ; y, x) = Σ_{i=1}^n ln f(yi|xi; θ) + Σ_{i=1}^n ln f(xi; δ)

θML = arg maxθ Σ_{i=1}^n ln f(yi|xi; θ)

172

Page 174: Slide Mancini Eco No Metrics EPFL2011

Maximizing log-likelihood

Log-likelihoods are typically highly nonlinear functions of parameters

E.g., GARCH-in-mean model for asset return, yt = pt/pt−1 − 1 = Et−1[yt] + εt

with Et−1[yt] = γ0 + γ1 σ2t and Vart−1[yt] = σ2t = β0 + β1 ε2t−1 + β2 σ2t−1

lnL = −0.5 Σ_{t=1}^T [ln(2π) + lnσ2t + (yt − γ0 − γ1 σ2t)2/σ2t]

Maximizing log-likelihood is a numerical problem, various methods:

• “Brute force” (but using good routines, e.g. FMINSEARCH in Matlab)

• Newton’s method: θ(i+1) = θ(i) −H−1(i) g(i), use actual Hessian

• Score method: θ(i+1) = θ(i) −H−1 g(i), use expected Hessian
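Newton's method can be illustrated on a likelihood with a known answer (an exponential sample with rate λ; an illustrative sketch, not from the slides): lnL = n ln λ − λ Σ yi, with score n/λ − Σ yi and Hessian −n/λ2, so the iteration should converge to the closed-form MLE λ = n/Σ yi.

```python
import random

random.seed(2)
y = [random.expovariate(2.5) for _ in range(10000)]
n, s = len(y), sum(y)

# exponential model: lnL = n ln(lam) - lam*s, g = n/lam - s, H = -n/lam^2
lam = 1.0                                      # starting value
for _ in range(20):
    g = n / lam - s
    H = -n / lam ** 2
    lam = lam - g / H                          # Newton step: theta - H^{-1} g

assert abs(lam - n / s) < 1e-10                # closed-form MLE 1/ybar
```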

173

Page 175: Slide Mancini Eco No Metrics EPFL2011

Hypothesis testing

Test of hypothesis H0 : c(θ) = 0

Three tests, asymptotically equivalent (not in finite sample)

• Likelihood ratio: If c(θ) = 0, then lnLU − lnLR ≈ 0

Both unrestricted (ML) and restricted estimators are required

• Wald test: If c(θ) = 0, then c(θML) ≈ 0

Only unrestricted (ML) estimator is required

• Lagrange multiplier test: If c(θ) = 0, then ∂ lnL/∂θR ≈ 0

Only restricted estimator is required

174

Page 176: Slide Mancini Eco No Metrics EPFL2011

Likelihood ratio test

LU = L(θU), where θU is MLE, unrestricted

LR = L(θR), where θR is restricted estimator

Likelihood ratio: LR/LU

0 ≤ LR/LU ≤ 1

Limiting distribution of likelihood ratio: 2 (lnLU − lnLR) ∼ χ2(df)

with df = # of restrictions

Remarks:

• LR test cannot be used to test two restricted models, θU must be MLE

• Likelihood function L must be the same in LU and LR
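As an illustration (simulated data, not from the slides), an LR test of H0: µ = 0 in an i.i.d. normal model, where both restricted and unrestricted MLEs have closed forms; 3.84 is the known χ2(1) 5% critical value:

```python
import math
import random

random.seed(3)
y = [random.gauss(0.3, 1.0) for _ in range(500)]   # H0 is false here
n = len(y)

def lnL(mu, s2):
    return (-0.5 * n * (math.log(2 * math.pi) + math.log(s2))
            - 0.5 * sum((yi - mu) ** 2 for yi in y) / s2)

mu_u = sum(y) / n                                  # unrestricted MLE
s2_u = sum((yi - mu_u) ** 2 for yi in y) / n
s2_r = sum(yi ** 2 for yi in y) / n                # restricted MLE, mu = 0

LR = 2.0 * (lnL(mu_u, s2_u) - lnL(0.0, s2_r))      # = n ln(s2_r/s2_u)
reject = LR > 3.84                                 # chi2(1) 5% critical value
```

Note that the same lnL function is used in both terms, and LR can never be negative: the restricted maximum cannot exceed the unrestricted one.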

175

Page 177: Slide Mancini Eco No Metrics EPFL2011

Wald test

Wald test based on full rank quadratic forms

Recall: If x ∼ N(µ,Σ), quadratic form (x − µ)′ Σ−1 (x − µ) ∼ χ2(J)

If E[x] ≠ µ, (x − µ)′ Σ−1 (x − µ) ∼ noncentral χ2(J) (> χ2(J) on average)

If H0 : c(θ) = q is true, c(θML) − q ≈ 0 (not "= 0" due to sampling variability)

If H0 : c(θ) = q is false, c(θML) − q ≪ 0 or ≫ 0

Wald test statistic:

W = [c(θML) − q]′ Asy.Var[c(θML) − q]−1 [c(θML) − q] ∼ χ2(df)

with df = # of restrictions

Drawbacks: no H1 ⇒ limited power; not invariant to restriction formulation

176

Page 178: Slide Mancini Eco No Metrics EPFL2011

Lagrange multiplier test

Lagrange multiplier (or score) test based on restricted model

Restrictions H0 : c(θ) = q, Lagrangean: lnL(θ) + (c(θ) − q)′ λ

First order conditions for restricted θ, i.e. θR:

∂ lnL(θ)/∂θ + [∂c(θ)′/∂θ] λ = 0

If restrictions not binding ⇒ λ = 0 (first term: MLE) and can be tested.

Simpler, equivalent approach: at restricted maximum

∂ lnL(θR)/∂θR + [∂c(θR)′/∂θR] λ = 0 =⇒ −[∂c(θR)′/∂θR] λ = ∂ lnL(θR)/∂θR = gR

Under H0 : λ = 0, gR = Σ_{i=1}^n gi(θR) = 0

Recall, Var0[Σ_{i=1}^n gi(θ0)] = −E0[∂2 lnL/∂θ0 ∂θ′0] = I(θ0)

177

Page 179: Slide Mancini Eco No Metrics EPFL2011

Lagrange multiplier test statistic

As in Wald test, LM statistic is a full rank quadratic form:

LM = [∂ lnL(θR)/∂θR]′ I(θR)−1 [∂ lnL(θR)/∂θR] ∼ χ2(df)

with df = # of restrictions

Alternative calculation of LM test: define G′R = [g1(θR), . . . , gn(θR)] (K × n),

regress a column of 1s, i, on GR ⇒ slope bi = (G′R GR)−1 G′R i

uncentered R2 = î′ î / (i′ i) = (b′i G′R)(GR bi) / (i′ i)

= [i′ GR (G′R GR)−1 G′R][GR (G′R GR)−1 G′R i] / n

= i′ GR (G′R GR)−1 G′R i / n = LM/n
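A numerical check of the nR2 identity (simulated normal data, H0: µ = 0; an illustrative sketch using the BHHH estimate of I(θR)):

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(0.25, 1.0, size=400)
n = y.size

# restricted MLE under H0: mu = 0 (only sigma2 estimated)
s2_r = np.mean(y ** 2)

# per-observation scores at the restricted estimate, columns (mu, sigma2)
g_mu = y / s2_r
g_s2 = -0.5 / s2_r + 0.5 * y ** 2 / s2_r ** 2
G = np.column_stack([g_mu, g_s2])              # n x K score matrix G_R

ones = np.ones(n)
b = np.linalg.lstsq(G, ones, rcond=None)[0]    # regress column of 1s on scores
r2 = (G @ b) @ (G @ b) / n                     # uncentered R^2
LM = n * r2

# same value from the quadratic form g'(G'G)^{-1} g with g = G'1
g = G.T @ ones
LM_qf = g @ np.linalg.solve(G.T @ G, g)
assert abs(LM - LM_qf) < 1e-6
```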

178

Page 180: Slide Mancini Eco No Metrics EPFL2011

Application of MLE: Linear regression model

Model: yi = x′i β + εi, and yi|xi ∼ N(x′i β, σ2)

Log-likelihood based on n conditionally independent observations:

lnL = −(n/2) ln(2π) − (n/2) lnσ2 − (1/2) Σ_{i=1}^n (yi − x′i β)2/σ2

= −(n/2) ln(2π) − (n/2) lnσ2 − (y − Xβ)′(y − Xβ)/(2σ2)

Likelihood equations:

∂ lnL/∂β = X′(y − Xβ)/σ2 = 0

∂ lnL/∂σ2 = −n/(2σ2) + (y − Xβ)′(y − Xβ)/(2σ4) = 0

179

Page 181: Slide Mancini Eco No Metrics EPFL2011

MLE of linear regression model

Solving likelihood equations:

βML = (X′X)−1 X′y and σ2ML = e′e/n

βML = b =⇒ OLS has all desirable asymptotic properties of MLE

σ2ML ≠ s2 = e′e/(n−K) =⇒ σ2ML biased in finite samples, but

E[σ2ML] = E[((n−K)/n) s2] = ((n−K)/n) σ2 −→ σ2, n→∞

Cramer–Rao lower bound for θ′ML = (β′ML, σ2ML) can be computed explicitly:

I(θ)−1 = {−E[∂2 lnL(θ)/∂θ ∂θ′]}−1 = [ σ2(X′X)−1  0 ; 0′  2σ4/n ]
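A quick check of the exact finite-sample relation σ2ML = ((n − K)/n) s2 on simulated data (a sketch, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0.0, 1.5, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)       # beta_ML = OLS estimate
e = y - X @ b                               # residuals
s2_ml = e @ e / n                           # ML estimate (biased)
s2 = e @ e / (n - K)                        # unbiased s^2

assert np.isclose(s2_ml, (n - K) / n * s2)  # exact relation, any sample
```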

180

Page 182: Slide Mancini Eco No Metrics EPFL2011

MLE and Wald test

Testing J (possibly nonlinear) restrictions, H0 : c(β) = 0 vs. H1 : c(β) 6= 0

Idea: check whether unrestricted estimator (i.e. MLE) “satisfies” restrictions

Under H0, Wald statistic

W = c(b)′ { [∂c(b)/∂b′] [σ2(X′X)−1] [∂c(b)′/∂b] }−1 c(b) d−→ χ2(J)

where σ2(X′X)−1 = Asy.Var[b], using Delta method:

c(b) ≈ c(β) + [∂c(β)/∂β′] (b − β)

Asy.Var[c(b)] = [∂c(β)/∂β′] Asy.Var[b] [∂c(β)′/∂β]

and plim b = β, plim c(b) = c(β), plim ∂c(b)/∂b′ = plim ∂c(β)/∂β′

181

Page 183: Slide Mancini Eco No Metrics EPFL2011

MLE and Likelihood ratio test

Testing J (possibly nonlinear) restrictions, H0 : c(β) = 0 vs. H1 : c(β) 6= 0

Idea: check whether unrestricted L “significantly” larger than restricted L∗

Likelihood ratio test: b unrestricted, b∗ restricted slopes

FOC of σ2 implies: Est.[σ2] = (y − Xβ)′(y − Xβ)/n, with β = b or b∗

LR = 2[lnL − lnL∗]

= [−n ln σ2 − (y − Xb)′(y − Xb)/σ2] − [−n ln σ2∗ − (y − Xb∗)′(y − Xb∗)/σ2∗]

= n ln σ2∗ − n ln σ2 d−→ χ2(J)

plugging σ2 in lnL, and σ2∗ in lnL∗ (i.e. concentrating the log-likelihood)

⇒ second terms in square brackets both equal n and cancel out

182

Page 184: Slide Mancini Eco No Metrics EPFL2011

MLE and Lagrange multiplier test

Testing J (possibly nonlinear) restrictions, H0 : c(β) = 0 vs. H1 : c(β) 6= 0

Idea: gradient of lnL at restricted maximum, gR, should be “close” to zero

From Lagrangean: gR(β) = ∂ lnL(β)/∂β = −[∂c(β)′/∂β] λ

Under H0 : λ = 0, E0[gR(β)] = E0[X′ε/σ2] = 0 ⇒ X′e∗ ≈ 0

Lagrange multiplier: apply Wald-type test to restricted gradient of lnL

LM = e′∗ X (Est.Var[X′ε])−1 X′ e∗

= e′∗ X (σ2∗ X′X)−1 X′ e∗

= e′∗ X (X′X)−1 X′ e∗ / (e′∗ e∗/n) = nR2∗ d−→ χ2(J)

R2∗ in the regression of restricted residuals e∗ = (y − Xb∗) on X

Intuition: if restrictions not binding, b∗ = b, e∗ ⊥ X, LM = 0

183

Page 185: Slide Mancini Eco No Metrics EPFL2011

Pseudo Maximum Likelihood estimation

ML requires complete specification of f(yi|xi; θ)

What if the density is misspecified?

Under certain conditions, the estimator retains some good properties

even if the wrong likelihood is maximized

E.g., in the model yi = x′i β + εi, OLS is MLE when εi ∼ N(0, σ2),

but under certain conditions LS is still consistent even when εi is not N(0, σ2)

When εi is not N(0, σ2), OLS is maximizing the wrong likelihood

Key point: OLS solves normal equations E[xi(yi − x′i β)] = 0

These equations may hold even when εi is not N(0, σ2)

184

Page 186: Slide Mancini Eco No Metrics EPFL2011

Pseudo Maximum Likelihood estimator

θML = arg maxθ Σ_{i=1}^n ln f(yi|xi; θ), where f(yi|xi; θ) is the true p.d.f.

θPML = arg maxθ Σ_{i=1}^n lnh(yi|xi; θ), where h(yi|xi; θ) ∈ exponential family

Key point: possibly h(yi|xi; θ) ≠ f(yi|xi; θ)

If h(yi|xi; θ) = f(yi|xi; θ), then θPML = θML

E.g., estimate θ when f(yi|xi; θ) = N(x′i θ, σ2i) and h(yi|xi; θ) = N(x′i θ, σ2)

PML estimator solves first order conditions:

(1/n) Σ_{i=1}^n ∂ lnh(yi|xi; θPML)/∂θPML = 0

185

Page 187: Slide Mancini Eco No Metrics EPFL2011

Asymptotic distribution of PML estimator

Usual method: first order Taylor expansion of FOC, mean value theorem,

rearrange to have (θPML − θ0) on LHS, scale by √n, take limit n→∞:

√n(θPML − θ0) = [−(1/n) Σ_{i=1}^n ∂2 lnh(yi|xi; θ̄)/∂θ ∂θ′]−1 √n (1/n) Σ_{i=1}^n ∂ lnh(yi|xi; θ0)/∂θ0

d−→ −H(θ0)−1 × N(0, Φ) d= N(0, H(θ0)−1 Φ H(θ0)−1)

θPML a∼ N(θ0, H(θ0)−1 Φ H(θ0)−1/n)

If h(yi|xi; θ0) is the true p.d.f., then the information matrix equality holds,

Φ = −H(θ0), and θPML = θML

θML a∼ N(θ0, −H(θ0)−1/n)

186

Page 188: Slide Mancini Eco No Metrics EPFL2011

Estimator of Asy.Var[θPML]

Sandwich (or robust) estimator of

Asy.Var[θPML] = H(θ0)−1 Φ H(θ0)−1/n

based on

• Empirical counterpart (no expectation) of the Hessian H(θ0):

Est.[H(θ0)] = (1/n) Σ_{i=1}^n ∂2 lnh(yi|xi; θPML)/∂θPML ∂θ′PML

• Sample variance of gradients:

Est.[Φ] = (1/n) Σ_{i=1}^n [∂ lnh(yi|xi; θPML)/∂θPML][∂ lnh(yi|xi; θPML)/∂θ′PML]
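The classic instance is OLS under heteroscedasticity: the Gaussian pseudo-likelihood is misspecified in the variance, and the sandwich gives the robust (White) covariance. A simulated sketch (not from the slides), with per-observation gradient xi ei; the σ2 scale cancels between H and Φ:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
eps = rng.normal(size=n) * np.sqrt(0.5 + x ** 2)   # heteroscedastic errors
y = X @ np.array([1.0, 2.0]) + eps

b = np.linalg.solve(X.T @ X, X.T @ y)   # PML (= OLS) estimate
e = y - X @ b

H = -(X.T @ X) / n                      # empirical Hessian (sigma2 scale dropped)
Phi = (X.T * e ** 2) @ X / n            # sample variance of per-obs gradients
Hinv = np.linalg.inv(H)
sandwich = Hinv @ Phi @ Hinv / n        # robust Asy.Var estimate of b
```

This reproduces the familiar form (X′X)−1 X′ diag(e2) X (X′X)−1.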

187

Page 189: Slide Mancini Eco No Metrics EPFL2011

Remarks on PML estimation

In general, maximizing wrong likelihoods gives inconsistent estimates

(in those cases, sandwich estimator of Asy.Var[θPML] useless)

Under certain conditions, θPML robust to some model misspecification

Major advantage of PML: if h(yi|xi; θ0) is true p.d.f., then θPML = θML

(in those cases, sandwich estimator should not be used)

Typical application of PML in Finance: daily asset returns are not normal,

but GARCH volatility models typically estimated using Gaussian likelihoods

188

Page 190: Slide Mancini Eco No Metrics EPFL2011

Summary of the course

• Linear regression model: OLS estimator, specification and hypothesis testing

• Generalized regression model: heteroscedastic data, GLS estimator

• Panel data model: Fixed and Random effects, Hausman’s specification test

• Instrumental variables: regressors correlated with disturbances

• Generalized method of moments: general framework for inference, weak assumptions

• Maximum likelihood estimation: assume parametric DGP, best use of this information

• Hypothesis testing: Likelihood ratio, Wald, Lagrange multiplier tests

189