Lecture 3: Asymptotic Normality of M-estimators

Instructor: Han Hong

Department of Economics, Stanford University

Prepared by Wenbo Zhou, Renmin University

References

• Takeshi Amemiya, 1985, Advanced Econometrics, Harvard University Press.

• Newey and McFadden, 1994, Chapter 36, Volume 4, Handbook of Econometrics.


Asymptotic Normality

The General Framework

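• Everything is just some form of first-order Taylor expansion:

$$\frac{\partial Q_n(\hat\theta)}{\partial\theta} = 0 \iff \sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta} + \frac{\partial^2 Q_n(\theta^*)}{\partial\theta\,\partial\theta'}\,\sqrt{n}\,(\hat\theta-\theta_0) = 0.$$

$$\begin{aligned}
\sqrt{n}\,(\hat\theta-\theta_0)
&= -\left(\frac{\partial^2 Q_n(\theta^*)}{\partial\theta\,\partial\theta'}\right)^{-1}\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta} \\
&\overset{LD}{=} -\left(\frac{\partial^2 Q(\theta_0)}{\partial\theta\,\partial\theta'}\right)^{-1}\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta} \\
&\xrightarrow{d} N\left(0,\ A^{-1}BA^{-1}\right),
\end{aligned}$$

where

$$A = E\left(\frac{\partial^2 Q(\theta_0)}{\partial\theta\,\partial\theta'}\right), \qquad B = \mathrm{Var}\left(\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta}\right),$$

and $\overset{LD}{=}$ means that the two sides have the same limit distribution.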


Asymptotic Normality for MLE

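• In MLE, $\frac{\partial Q_n(\theta)}{\partial\theta} = \frac{1}{n}\frac{\partial \log L(\theta)}{\partial\theta}$ and $\frac{\partial^2 Q_n(\theta)}{\partial\theta\,\partial\theta'} = \frac{1}{n}\frac{\partial^2 \log L(\theta)}{\partial\theta\,\partial\theta'}$.

• Information matrix:

$$E\,\frac{\partial^2 \log L(\theta_0)}{\partial\theta\,\partial\theta'} = -E\left[\frac{\partial \log L(\theta_0)}{\partial\theta}\,\frac{\partial \log L(\theta_0)}{\partial\theta'}\right],$$

by interchanging integration and differentiation.

• So $A = -B$, and

$$\sqrt{n}\,(\hat\theta-\theta_0) \xrightarrow{d} N\left(0,\ -A^{-1}\right) = N\left(0,\ \left(-\lim \frac{1}{n}\,E\,\frac{\partial^2 \log L(\theta_0)}{\partial\theta\,\partial\theta'}\right)^{-1}\right).$$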

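• What if interchanging integration and differentiation is not possible?

• Example: if the support depends on the parameter, $y \in (\theta,\infty)$, then $E\,\frac{\partial \log f(y;\theta)}{\partial\theta} = f(\theta)$, the density evaluated at the boundary point $y=\theta$, rather than $0$.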
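A worked instance may make the boundary term concrete. The shifted exponential density below is an assumed example, not one taken from the slides; it is chosen only because its score is constant, so the result can be checked by hand.

```latex
% Assumed example: shifted exponential with support (\theta, \infty)
f(y;\theta) = e^{-(y-\theta)}, \qquad y > \theta .

% Differentiate the identity \int_\theta^\infty f(y;\theta)\,dy = 1 using the Leibniz rule:
0 = \frac{d}{d\theta}\int_\theta^\infty f(y;\theta)\,dy
  = -f(\theta;\theta) + \int_\theta^\infty \frac{\partial f(y;\theta)}{\partial\theta}\,dy
\quad\Longrightarrow\quad
E\,\frac{\partial \log f(y;\theta)}{\partial\theta} = f(\theta;\theta).

% Check: \log f(y;\theta) = -(y-\theta), so the score is identically 1,
% and f(\theta;\theta) = e^{0} = 1, which matches.
```

The expected score is 1 rather than 0, so the information matrix equality does not hold for this model.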


Asymptotic Normality for GMM

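• $Q_n(\theta) = g_n(\theta)'\,W\,g_n(\theta)$, $g_n(\theta) = \frac{1}{n}\sum_{t=1}^n g(z_t,\theta)$.

• Asymptotic normality holds when the moment functions only have first derivatives.

• Denote $G_n(\theta) = \frac{\partial g_n(\theta)}{\partial\theta}$, $\theta^* \in [\theta_0,\hat\theta]$, $\hat G_n \equiv G_n(\hat\theta)$, $G_n^* \equiv G_n(\theta^*)$, $G = E\,G_n(\theta_0)$, $\Omega = E\left(g(z,\theta_0)\,g(z,\theta_0)'\right)$.

$$\begin{aligned}
0 &= \hat G_n'\,W\,g_n(\hat\theta) = \hat G_n'\,W\left(g_n(\theta_0) + G_n^*\,(\hat\theta-\theta_0)\right) \\
\Longrightarrow\ \sqrt{n}\,(\hat\theta-\theta_0) &= -\left(\hat G_n'\,W\,G_n^*\right)^{-1}\hat G_n'\,W\,\sqrt{n}\,g_n(\theta_0) \\
&\overset{LD}{=} -\left(G'WG\right)^{-1}G'W\,\sqrt{n}\,g_n(\theta_0) \\
&\overset{LD}{=} -\left(G'WG\right)^{-1}G'W \times N(0,\Omega) \\
&= N\left(0,\ \left(G'WG\right)^{-1}G'W\,\Omega\,WG\left(G'WG\right)^{-1}\right).
\end{aligned}$$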


Examples

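• Efficient choice of $W = \Omega^{-1}$ (or $W \propto \Omega^{-1}$):

$$\sqrt{n}\,(\hat\theta-\theta_0) \xrightarrow{d} N\left(0,\ \left(G'\Omega^{-1}G\right)^{-1}\right).$$

• When $G$ is invertible, $W$ is irrelevant:

$$\sqrt{n}\,(\hat\theta-\theta_0) \xrightarrow{d} N\left(0,\ G^{-1}\,\Omega\,G'^{-1}\right) = N\left(0,\ \left(G'\Omega^{-1}G\right)^{-1}\right).$$

• When $\Omega = \alpha G$ (or $G \propto \Omega$):

$$\sqrt{n}\,(\hat\theta-\theta_0) \xrightarrow{d} N\left(0,\ \alpha\,G^{-1}\right).$$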
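The three special cases above can be checked numerically. The following is a minimal sketch, assuming a symmetric weighting matrix; the function name `gmm_avar` and all matrices are hypothetical illustrations, not part of the lecture.

```python
import numpy as np

def gmm_avar(G, W, Omega):
    """Sandwich (G'WG)^{-1} G'W Omega W G (G'WG)^{-1}, assuming W is symmetric."""
    bread = np.linalg.inv(G.T @ W @ G)
    meat = G.T @ W @ Omega @ W @ G
    return bread @ meat @ bread.T

rng = np.random.default_rng(0)
m, k = 4, 2                                  # 4 moments, 2 parameters (made up)
G = rng.normal(size=(m, k))
S = rng.normal(size=(m, m))
Omega = S @ S.T + m * np.eye(m)              # a positive definite Omega

# Efficient weighting: the sandwich collapses to (G' Omega^{-1} G)^{-1}.
W_efficient = np.linalg.inv(Omega)
V_efficient = gmm_avar(G, W_efficient, Omega)
V_closed = np.linalg.inv(G.T @ W_efficient @ G)

# Any other W gives a variance at least as large in the matrix sense.
V_identity = gmm_avar(G, np.eye(m), Omega)
gap_eigs = np.linalg.eigvalsh(V_identity - V_efficient)

# Square invertible G: the choice of W drops out.
G_sq = np.array([[2.0, 1.0], [0.5, 3.0]])
Omega_sq = Omega[:k, :k]
V_square = gmm_avar(G_sq, np.diag([1.0, 5.0]), Omega_sq)
```

Here `gap_eigs` should be nonnegative up to rounding, and `V_square` should match `inv(G_sq) @ Omega_sq @ inv(G_sq).T` regardless of the diagonal weights used.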


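• Least squares (LS): $g(z,\beta) = x\,(y - x'\beta)$.

• $G = E\,xx'$, $\Omega = E\,\varepsilon^2 xx'$, then

$$\sqrt{n}\,(\hat\beta-\beta_0) \xrightarrow{d} N\left(0,\ \left(E\,xx'\right)^{-1}\left(E\,\varepsilon^2 xx'\right)\left(E\,xx'\right)^{-1}\right),$$

the so-called White heteroscedasticity-consistent standard errors.

• If $E\left[\varepsilon^2 \mid x\right] = \sigma^2$, then $\Omega = \sigma^2 G$ and

$$\sqrt{n}\,(\hat\beta-\beta_0) \xrightarrow{d} N\left(0,\ \sigma^2\left(E\,xx'\right)^{-1}\right).$$

• Weighted LS: $g(z,\beta) = \frac{1}{E(\varepsilon^2 \mid x)}\,x\,(y - x'\beta)$.

$$G = E\,\frac{1}{E(\varepsilon^2 \mid x)}\,xx' = \Omega \ \Longrightarrow\ \sqrt{n}\,(\hat\beta-\beta_0) \xrightarrow{d} N\left(0,\ G^{-1}\right).$$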
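The least squares formulas above translate directly into sample analogues. The sketch below is illustrative only: the function name `ols_with_white`, the simulated design, and the error variance pattern are assumptions made for the example.

```python
import numpy as np

def ols_with_white(X, y):
    """OLS with sample analogues of G = E xx' and Omega = E eps^2 xx'."""
    n = X.shape[0]
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    G_hat = X.T @ X / n
    Omega_hat = (X * resid[:, None] ** 2).T @ X / n
    G_inv = np.linalg.inv(G_hat)
    V_white = G_inv @ Omega_hat @ G_inv          # robust sandwich
    V_homo = resid @ resid / n * G_inv           # sigma^2 (E xx')^{-1}
    return beta, V_white, V_homo

# Simulated data with error variance that grows in |x| (an assumed design).
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
eps = rng.normal(size=n) * (0.5 + np.abs(x))
y = X @ np.array([1.0, 2.0]) + eps

beta_hat, V_white, V_homo = ols_with_white(X, y)
# Both V matrices estimate the variance of sqrt(n)(beta_hat - beta_0),
# so standard errors divide by n before taking the square root.
se_white = np.sqrt(np.diag(V_white) / n)
se_homo = np.sqrt(np.diag(V_homo) / n)
```

In this design the robust slope standard error should exceed the homoscedastic one, because large values of `x` carry large error variance.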


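• Linear 2SLS: $g(z,\beta) = z\,(y - x'\beta)$.

• $G = E\,zx'$, $\Omega = E\,\varepsilon^2 zz'$, $W = \left(E\,zz'\right)^{-1}$, then $\sqrt{n}\,(\hat\beta-\beta_0) \xrightarrow{d} N(0,V)$.

• If $E\,\varepsilon^2 zz' = \sigma^2 E\,zz'$, then $V = \sigma^2\left[E\,xz'\left(E\,zz'\right)^{-1}E\,zx'\right]^{-1}$.

• Linear 3SLS: $g(z,\beta) = z\,(y - x'\beta)$.

$G = E\,zx'$, $\Omega = E\,\varepsilon^2 zz'$, $W = \left(E\,\varepsilon^2 zz'\right)^{-1}$, then

$$\sqrt{n}\,(\hat\beta-\beta_0) \xrightarrow{d} N(0,V) \quad\text{for}\quad V = \left[E\,xz'\left(E\,\varepsilon^2 zz'\right)^{-1}E\,zx'\right]^{-1}.$$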

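• MLE as GMM: $g(z,\theta) = \frac{\partial \log f(z,\theta)}{\partial\theta}$.

$$G = -E\,\frac{\partial^2 \log f(z,\theta)}{\partial\theta\,\partial\theta'} = \Omega = E\,\frac{\partial \log f(z,\theta)}{\partial\theta}\,\frac{\partial \log f(z,\theta)}{\partial\theta'},$$

then

$$\sqrt{n}\,(\hat\theta-\theta_0) \xrightarrow{d} N\left(0,\ G^{-1}\right) = N\left(0,\ \Omega^{-1}\right).$$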


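• GMM again:

• Take linear combinations of the moment conditions so that the number of moment conditions $g(z,\theta)$ equals the number of parameters in $\theta$.

• In particular, take $h(z,\theta) = G'W\,g(z,\theta)$ and use $h(z,\theta)$ as the new moment conditions. Then

$$\hat\theta = \arg\min_\theta \left[\frac{1}{n}\sum_{t=1}^n h(z_t,\theta)\right]'\left[\frac{1}{n}\sum_{t=1}^n h(z_t,\theta)\right]$$

is asymptotically equivalent to $\hat\theta = \arg\min_\theta g_n'\,W\,g_n$, where, for the new moment conditions,

$$\tilde G = E\,\frac{\partial h(z,\theta)}{\partial\theta} = G'WG, \qquad \tilde\Omega = E\,h(z,\theta)\,h(z,\theta)' = G'W\,\Omega\,WG.$$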


• Quantile Regression as GMM:

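• $g(z,\beta) = \left(\tau - 1(y \le x'\beta)\right)x$, and $W$ is irrelevant.

• $G = E\,\frac{\partial g(z,\beta)}{\partial\beta} = -E\,\frac{\partial\,1(y \le x'\beta)\,x}{\partial\beta}$. Proceeding in a "quick and dirty" way, take the expectation before differentiating:

$$\begin{aligned}
G &= -\frac{\partial\,E\,1(y \le x'\beta)\,x}{\partial\beta} = -\frac{\partial\,E\,x\,F_y(x'\beta \mid x)}{\partial\beta} \\
&= -E\,x\,\frac{\partial F_y(x'\beta \mid x)}{\partial\beta} = -E\,f_y(x'\beta \mid x)\,xx' = -E\,f_u(0 \mid x)\,xx'.
\end{aligned}$$

• Conditional on $x$, $\tau - 1(y \le x'\beta_0) = \tau - 1(u \le 0)$ is a centered Bernoulli r.v., so $E\left[\left(\tau - 1(y \le x'\beta_0)\right)^2 \mid x\right] = \tau(1-\tau)$, and then

$$\Omega = E\left\{E\left[\left(\tau - 1(y \le x'\beta_0)\right)^2 \mid x\right]xx'\right\} = \tau(1-\tau)\,E\,xx'.$$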


• Quantile Regression as GMM:

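• $\sqrt{n}\,(\hat\beta-\beta_0) \xrightarrow{d} N\left(0,\ \tau(1-\tau)\left[E\,f_u(0 \mid x)\,xx'\right]^{-1}E\,xx'\left[E\,f_u(0 \mid x)\,xx'\right]^{-1}\right)$.

• If the errors are homoscedastic, $f_u(0 \mid x) = f_u(0)$, and then $V = \frac{\tau(1-\tau)}{f_u(0)^2}\left(E\,xx'\right)^{-1}$.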

• Consistent estimation of $G$ and $\Omega$:

• $G$ is estimated by $\hat G = \frac{1}{n}\sum_{t=1}^n \frac{\partial g(z_t,\hat\theta)}{\partial\theta}$.

• For nonsmooth problems such as quantile regression, use $\frac{Q_n(\hat\theta+2h_n) + Q_n(\hat\theta-2h_n) - 2Q_n(\hat\theta)}{4h_n^2}$ to approximate the second derivative. Require $h_n = o(1)$ and $1/h_n = o(\sqrt{n})$, i.e. $h_n \to 0$ more slowly than $n^{-1/2}$.

• For stationary data, heteroscedasticity and dependence will only affect estimation of $\Omega$. For independent data, use White's heteroscedasticity-consistent estimate; for dependent data, use Newey-West's autocorrelation-consistent estimate.
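The second-difference approximation for nonsmooth objectives can be sketched in a scalar case. Everything here is an assumption made for illustration: the helper name `second_difference`, the use of the average check loss (the negative of a maximized objective) for the median of standard normal data, and the step size $0.5\,n^{-1/5}$.

```python
import numpy as np

def second_difference(Q, theta, h):
    """[Q(theta + 2h) + Q(theta - 2h) - 2 Q(theta)] / (4 h^2)."""
    return (Q(theta + 2 * h) + Q(theta - 2 * h) - 2 * Q(theta)) / (4 * h ** 2)

# Smooth sanity check: for Q(t) = 3 t^2 the second derivative is 6 everywhere.
curv_quadratic = second_difference(lambda t: 3.0 * t ** 2, 1.0, 0.1)

# Nonsmooth case: average check loss for the median (tau = 0.5).
rng = np.random.default_rng(2)
n = 20000
y = rng.normal(size=n)
tau = 0.5

def check_loss(theta):
    u = y - theta
    return np.mean(u * (tau - (u < 0)))

h_n = 0.5 * n ** (-0.2)                 # hypothetical step size choice
theta_hat = np.quantile(y, tau)         # sample median
density_at_median = second_difference(check_loss, theta_hat, h_n)
```

For this design `density_at_median` should land near the standard normal density at zero, about 0.399, which is the population second derivative of the expected check loss. Taking the step too small relative to $n^{-1/2}$ makes the estimate dominated by sampling noise, since the sample objective is piecewise linear.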


Iteration and One Step Estimation

• The initial guess $\tilde\theta$ ⇒ the next-round guess $\bar\theta$.

• Newton-Raphson: use a quadratic approximation for $Q_n(\theta)$.

• Gauss-Newton: use a linear approximation for the first-order condition, e.g. GMM.

• If the initial guess is a $\sqrt{n}$-consistent estimate, more iterations will not increase (first-order) asymptotic efficiency.

• E.g. if $(\tilde\theta-\theta_0) = O_p\left(\frac{1}{\sqrt{n}}\right)$, then $\sqrt{n}\,(\bar\theta-\theta_0) \overset{LD}{=} \sqrt{n}\,(\hat\theta-\theta_0)$, for $\hat\theta = \arg\max_\theta Q_n(\theta)$.


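1. Newton-Raphson: use a quadratic approximation for $Q_n(\theta)$ around the initial guess $\tilde\theta$:

$$Q_n(\theta) \approx Q_n(\tilde\theta) + \frac{\partial Q_n(\tilde\theta)}{\partial\theta'}\,(\theta-\tilde\theta) + \frac{1}{2}\,(\theta-\tilde\theta)'\,\frac{\partial^2 Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}\,(\theta-\tilde\theta).$$

Setting the derivative of the approximation to zero at the next-round guess $\bar\theta$:

$$\frac{\partial Q_n(\tilde\theta)}{\partial\theta} + \frac{\partial^2 Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}\,(\bar\theta-\tilde\theta) = 0
\ \Longrightarrow\
\bar\theta = \tilde\theta - \left[\frac{\partial^2 Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}\right]^{-1}\frac{\partial Q_n(\tilde\theta)}{\partial\theta}.$$

2. Gauss-Newton: use a linear approximation for the first-order condition, e.g. GMM. With $\tilde G \equiv G_n(\tilde\theta)$,

$$Q_n(\theta) \approx \left(g_n(\tilde\theta) + \tilde G\,(\theta-\tilde\theta)\right)'W\left(g_n(\tilde\theta) + \tilde G\,(\theta-\tilde\theta)\right)$$

$$\Longrightarrow\ \tilde G'\,W\,g_n(\tilde\theta) + \tilde G'\,W\,\tilde G\,(\bar\theta-\tilde\theta) = 0
\ \Longrightarrow\
\bar\theta = \tilde\theta - \left(\tilde G'\,W\,\tilde G\right)^{-1}\tilde G'\,W\,g_n(\tilde\theta).$$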
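For linear moment conditions the Gauss-Newton linearization is exact, so a single step from any starting value lands on the GMM estimator. The sketch below illustrates this with a simulated linear instrumental variables model; the function name `gauss_newton_step` and the data generating process are assumptions made for the example.

```python
import numpy as np

def gauss_newton_step(theta, g_bar, G, W):
    """One update: theta - (G'WG)^{-1} G'W g_n(theta)."""
    GtW = G.T @ W
    return theta - np.linalg.solve(GtW @ G, GtW @ g_bar)

# Simulated linear IV data (assumed design): x is endogenous through v.
rng = np.random.default_rng(3)
n = 500
Z = rng.normal(size=(n, 3))
v = rng.normal(size=n)
x = Z @ np.array([1.0, 0.5, -0.5]) + v
X = np.column_stack([np.ones(n), x])
Zc = np.column_stack([np.ones(n), Z])        # instruments, including a constant
y = X @ np.array([1.0, 2.0]) + 0.8 * v + rng.normal(size=n)

W = np.linalg.inv(Zc.T @ Zc / n)             # 2SLS weighting matrix
theta_init = np.zeros(2)                     # deliberately poor starting value
g_bar = Zc.T @ (y - X @ theta_init) / n      # g_n at the starting value
G = -Zc.T @ X / n                            # Jacobian of g_n (constant in theta)
theta_one_step = gauss_newton_step(theta_init, g_bar, G, W)

# Closed-form 2SLS for comparison.
XZ = X.T @ Zc / n
theta_2sls = np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Zc.T @ y / n))
```

`theta_one_step` and `theta_2sls` should agree to machine precision. For nonlinear moments the step is only an approximation, which is where the $\sqrt{n}$-consistency of the starting value matters.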


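If the initial guess is a $\sqrt{n}$-consistent estimate, e.g. $(\tilde\theta-\theta_0) = O_p\left(\frac{1}{\sqrt{n}}\right)$, then $\sqrt{n}\,(\bar\theta-\theta_0) \overset{LD}{=} \sqrt{n}\,(\hat\theta-\theta_0)$, for $\hat\theta = \arg\max_\theta Q_n(\theta)$. More iterations will not increase (first-order) asymptotic efficiency: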


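1. For Newton-Raphson, with $\tilde\theta$ the initial guess, $\bar\theta$ the one-step update, and $\theta^*$ between $\theta_0$ and $\tilde\theta$:

$$\begin{aligned}
\sqrt{n}\,(\bar\theta-\theta_0)
&= \sqrt{n}\,(\tilde\theta-\theta_0) - \left[\frac{\partial^2 Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}\right]^{-1}\sqrt{n}\,\frac{\partial Q_n(\tilde\theta)}{\partial\theta} \\
&= \sqrt{n}\,(\tilde\theta-\theta_0) - \left[\frac{\partial^2 Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}\right]^{-1}\left[\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta} + \frac{\partial^2 Q_n(\theta^*)}{\partial\theta\,\partial\theta'}\,\sqrt{n}\,(\tilde\theta-\theta_0)\right] \\
&= \left(I - \left[\frac{\partial^2 Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}\right]^{-1}\frac{\partial^2 Q_n(\theta^*)}{\partial\theta\,\partial\theta'}\right)\sqrt{n}\,(\tilde\theta-\theta_0) - \left[\frac{\partial^2 Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}\right]^{-1}\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta} \\
&= o_p(1) + \sqrt{n}\,(\hat\theta-\theta_0).
\end{aligned}$$

2. For Gauss-Newton, with $\tilde G \equiv G_n(\tilde\theta)$ and $G^* \equiv G_n(\theta^*)$:

$$\begin{aligned}
\sqrt{n}\,(\bar\theta-\theta_0)
&= \sqrt{n}\,(\tilde\theta-\theta_0) - \left(\tilde G'W\tilde G\right)^{-1}\tilde G'W\,\sqrt{n}\left[g_n(\theta_0) + G^*\,(\tilde\theta-\theta_0)\right] \\
&= \left(I - \left(\tilde G'W\tilde G\right)^{-1}\tilde G'W\,G^*\right)\sqrt{n}\,(\tilde\theta-\theta_0) - \left(\tilde G'W\tilde G\right)^{-1}\tilde G'W\,\sqrt{n}\,g_n(\theta_0) \\
&= o_p(1) + \sqrt{n}\,(\hat\theta-\theta_0).
\end{aligned}$$
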

Influence Function

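• $\phi(z_t)$ is called the influence function if

• $\sqrt{n}\,(\hat\theta-\theta_0) = \frac{1}{\sqrt{n}}\sum_{t=1}^n \phi(z_t) + o_p(1)$,

• $E\,\phi(z_t) = 0$, $E\,\phi(z_t)\,\phi(z_t)' < \infty$.

• Think of $\sqrt{n}\,(\hat\theta-\theta_0)$ as distributed as $\phi(z_t) \sim N\left(0,\ E\,\phi\phi'\right)$.

• Used for discussion of asymptotic efficiency, two-step or multistep estimation, etc.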


Examples

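• For MLE,

$$\begin{aligned}
\phi(z_t) &= \left[-E\,\frac{\partial^2 \ln f(y_t,\theta_0)}{\partial\theta\,\partial\theta'}\right]^{-1}\frac{\partial \ln f(y_t,\theta_0)}{\partial\theta} \\
&= \left[E\,\frac{\partial \ln f(y_t,\theta_0)}{\partial\theta}\,\frac{\partial \ln f(y_t,\theta_0)}{\partial\theta'}\right]^{-1}\frac{\partial \ln f(y_t,\theta_0)}{\partial\theta}.
\end{aligned}$$

• For GMM,

$$\phi = -\left(G'WG\right)^{-1}G'W\,g(z_t,\theta_0),$$

or $\phi = -\left(E\,\frac{\partial h}{\partial\theta}\right)^{-1}h(z_t,\theta_0)$ for $h(z_t,\theta_0) = G'W\,g(z_t,\theta_0)$.

• Quantile regression:

$$\phi(z_t) = \left[E\,f(0 \mid x)\,xx'\right]^{-1}\left(\tau - 1(u_t \le 0)\right)x_t.$$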


Asymptotic Efficiency

• Is MLE efficient among all asymptotically normal estimators?

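• Superefficient estimator:

Suppose $\sqrt{n}\,(\hat\theta-\theta_0) \xrightarrow{d} N(0,V)$ for all $\theta_0$. Now define

$$\theta^* = \begin{cases} \hat\theta & \text{if } |\hat\theta| \ge n^{-1/4}, \\ 0 & \text{if } |\hat\theta| < n^{-1/4}, \end{cases}$$

then $\sqrt{n}\,(\theta^*-\theta_0) \xrightarrow{d} N(0,0)$ if $\theta_0 = 0$, and $\sqrt{n}\,(\theta^*-\theta_0) \overset{LD}{=} \sqrt{n}\,(\hat\theta-\theta_0) \xrightarrow{d} N(0,V)$ if $\theta_0 \ne 0$.

• $\hat\theta$ is regular if, for any data generated by $\theta_n = \theta_0 + \delta/\sqrt{n}$ with $\delta \ge 0$, $\sqrt{n}\,(\hat\theta-\theta_n)$ has a limit distribution that does not depend on $\delta$.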
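A small simulation can illustrate why the thresholded estimator is superefficient at zero but not regular. This is a sketch under assumed settings: the sample mean of unit-variance normal data plays the role of the initial estimator, and the function name `hodges` is hypothetical.

```python
import numpy as np

def hodges(theta_hat, n):
    """Keep theta_hat if |theta_hat| >= n^(-1/4), otherwise shrink to 0."""
    return theta_hat if abs(theta_hat) >= n ** (-0.25) else 0.0

rng = np.random.default_rng(4)
n = 10_000                                   # threshold n^(-1/4) is 0.1

# True parameter 0: the sample mean is of order 0.01, so it is set to 0.
est_at_zero = hodges(rng.normal(loc=0.0, size=n).mean(), n)

# True parameter 1: the threshold never binds and the sample mean is kept.
mean_at_one = rng.normal(loc=1.0, size=n).mean()
est_at_one = hodges(mean_at_one, n)

# Local alternative theta_n = delta / sqrt(n): the estimator is still set to 0,
# so sqrt(n) * (estimate - theta_n) equals -delta and depends on delta.
delta = 2.0
theta_n = delta / np.sqrt(n)
mean_local = rng.normal(loc=theta_n, size=n).mean()
scaled_error_local = np.sqrt(n) * (hodges(mean_local, n) - theta_n)
```

Under the local alternative the scaled error is degenerate at $-\delta$, so the limit distribution shifts with $\delta$, which is exactly what regularity rules out.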


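• For regular estimators, consider the influence function representation indexed by $\tau$:

$$\sqrt{n}\,(\hat\theta(\tau)-\theta_0) \overset{LD}{=} \phi(z,\tau) \sim N\left(0,\ E\,\phi(\tau)\,\phi(\tau)'\right).$$

• $\hat\theta(\bar\tau)$ is more efficient than $\hat\theta(\tau)$ if it has a smaller variance-covariance matrix.

• A necessary condition is that $\mathrm{Cov}\left(\phi(z,\tau) - \phi(z,\bar\tau),\ \phi(z,\bar\tau)\right) = 0$ for all $\tau$, including $\bar\tau$.

• The following are equivalent:

$$\begin{aligned}
&\mathrm{Cov}\left(\phi(z,\tau) - \phi(z,\bar\tau),\ \phi(z,\bar\tau)\right) = 0 \\
\iff\ &\mathrm{Cov}\left(\phi(z,\tau),\ \phi(z,\bar\tau)\right) = \mathrm{Var}\left(\phi(z,\bar\tau)\right) \\
\iff\ &E\,\phi(z,\tau)\,\phi(z,\bar\tau)' = E\,\phi(z,\bar\tau)\,\phi(z,\bar\tau)'.
\end{aligned}$$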


Newey’s efficiency framework:

• Classify estimators into the GMM framework with

$$\phi(z,\tau) = D(\tau)^{-1}\,m(z,\tau).$$

• For the class indexed by $\tau = W$, given a vector $g(z,\theta_0)$,

$$D(\tau) \equiv D(W) = G'WG \quad\text{and}\quad m(z,\tau) \equiv m(z,W) = G'W\,g(z,\theta_0).$$

• Consider MLE among the class of GMM estimators, so that $\tau$ indexes any vector of moment functions having the same dimension as $\theta$. In this case,

$$D(\tau) \equiv D(h) = -E\,\frac{\partial h}{\partial\theta} \quad\text{and}\quad m(z,\tau) = h(z_t,\theta_0).$$


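• For this particular case where $\phi(z,\tau) = D(\tau)^{-1}\,m(z,\tau)$,

$$\begin{aligned}
&E\,\phi(z,\tau)\,\phi(z,\bar\tau)' = E\,\phi(z,\bar\tau)\,\phi(z,\bar\tau)' \\
\Longrightarrow\ &D(\tau)^{-1}\,E\,m(z,\tau)\,m(z,\bar\tau)'\,D(\bar\tau)^{-1\prime} = D(\bar\tau)^{-1}\,E\,m(z,\bar\tau)\,m(z,\bar\tau)'\,D(\bar\tau)^{-1\prime}.
\end{aligned}$$

• If $\bar\tau$ satisfies $D(\tau) = E\,m(z,\tau)\,m(z,\bar\tau)'$ for all $\tau$, then both sides above equal $D(\bar\tau)^{-1}$ (since $D(\bar\tau)$ is then symmetric), and so $\hat\theta(\bar\tau)$ is efficient.

• Examples. Check $D(\tau) = E\,m(z,\tau)\,m(z,\bar\tau)'$.

• GMM with optimal weighting matrix:

$$D(\tau) = G'WG, \qquad m(z,\tau) = m(z,W) = G'W\,g(z,\theta_0).$$

To check $D(\tau) = E\,m(z,\tau)\,m(z,\bar\tau)' = G'W\,\Omega\,\bar W G$:

$$G'WG = G'W\,\Omega\,\bar W G \ \Longrightarrow\ \Omega\,\bar W = I \ \Longrightarrow\ \bar W = \Omega^{-1}.$$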


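• MLE better than any GMM:

$$D(\tau) = -E\,\frac{\partial h(z,\theta_0)}{\partial\theta}, \qquad m(z,\tau) = h(z,\theta_0).$$

To check $D(\tau) = E\,h(z,\theta_0)\,\bar h(z,\theta_0)'$, use the generalized information matrix equality:

$$\begin{aligned}
0 = \frac{\partial\,E\,h(z,\theta_0)}{\partial\theta}
&= \frac{\partial}{\partial\theta}\int h(z,\theta)\,f(z,\theta)\,dz \\
&= \int \frac{\partial h(z,\theta)}{\partial\theta}\,f(z,\theta)\,dz + \int h(z,\theta)\,\frac{\partial \ln f(z,\theta)}{\partial\theta'}\,f(z,\theta)\,dz \\
&= E\,\frac{\partial h(z,\theta_0)}{\partial\theta} + E\,h(z,\theta_0)\,\frac{\partial \ln f(z,\theta_0)}{\partial\theta'}
\end{aligned}$$

$\Longrightarrow\ \bar h(z,\theta_0) = \frac{\partial \ln f(z,\theta_0)}{\partial\theta}$, the score function for MLE.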


Two Step Estimator

General Framework:

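• First-step estimator: $\sqrt{n}\,(\hat\gamma-\gamma_0) = \frac{1}{\sqrt{n}}\sum_{t=1}^n \phi(z_t) + o_p(1)$.

• Estimate $\theta$ by

$$\frac{\partial Q_n(\hat\theta,\hat\gamma)}{\partial\theta} = \frac{1}{n}\sum_{t=1}^n \frac{\partial q(z_t,\hat\theta,\hat\gamma)}{\partial\theta} \equiv \frac{1}{n}\sum_{t=1}^n h(z_t,\hat\theta,\hat\gamma) = 0.$$

• Let

$$H(z,\theta,\gamma) = \frac{\partial h(z,\theta,\gamma)}{\partial\theta}, \qquad \Gamma(z,\theta,\gamma) = \frac{\partial h(z,\theta,\gamma)}{\partial\gamma};$$

$$H = E\,H(z_t,\theta_0,\gamma_0), \qquad \Gamma = E\,\Gamma(z,\theta_0,\gamma_0); \qquad h = h(\theta_0,\gamma_0).$$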


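• Then just Taylor expand: $\frac{1}{\sqrt{n}}\sum h(z_t,\hat\theta,\hat\gamma) = 0$

$$\iff \frac{1}{\sqrt{n}}\sum h(\theta_0,\hat\gamma) + \frac{1}{n}\sum H(\theta^*,\hat\gamma)\,\sqrt{n}\,(\hat\theta-\theta_0) = 0$$

$$\begin{aligned}
\Longrightarrow\ \sqrt{n}\,(\hat\theta-\theta_0)
&= -\left[\frac{1}{n}\sum H(\theta^*,\hat\gamma)\right]^{-1}\frac{1}{\sqrt{n}}\sum h(\theta_0,\hat\gamma) \\
&\overset{LD}{=} -H^{-1}\left[\frac{1}{\sqrt{n}}\sum h(\theta_0,\gamma_0) + \frac{1}{n}\sum \Gamma(\theta_0,\gamma^*)\,\sqrt{n}\,(\hat\gamma-\gamma_0)\right] \\
&\overset{LD}{=} -H^{-1}\left[\frac{1}{\sqrt{n}}\sum h + \Gamma\left(\frac{1}{\sqrt{n}}\sum \phi(z_t) + o_p(1)\right)\right] \\
&\overset{LD}{=} -H^{-1}\left[\frac{1}{\sqrt{n}}\sum h + \Gamma\,\frac{1}{\sqrt{n}}\sum \phi(z_t)\right].
\end{aligned}$$

So that $\sqrt{n}\,(\hat\theta-\theta_0) \xrightarrow{d} N(0,V)$ for

$$V = H^{-1}\,E\,(h + \Gamma\phi)\,(h' + \phi'\Gamma')\,H^{-1\prime}.$$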
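The variance formula above has a direct sample analogue once per-observation values of $h$ and $\phi$ are available. The sketch below assumes those are supplied as arrays; the function name `two_step_avar` and the random inputs are hypothetical placeholders used only to exercise the formula.

```python
import numpy as np

def two_step_avar(h, phi, H, Gamma):
    """Sample analogue of H^{-1} E[(h + Gamma phi)(h + Gamma phi)'] H^{-1}'.

    h:     (n, k) second-step moments evaluated at each observation
    phi:   (n, p) first-step influence function values
    H:     (k, k) expected derivative of h with respect to theta
    Gamma: (k, p) expected derivative of h with respect to gamma
    """
    psi = h + phi @ Gamma.T                  # row t is h_t + Gamma phi_t
    meat = psi.T @ psi / h.shape[0]
    H_inv = np.linalg.inv(H)
    return H_inv @ meat @ H_inv.T

# Placeholder inputs (made up) to show the shapes involved.
rng = np.random.default_rng(5)
n = 1000
h = rng.normal(size=(n, 2))
phi = rng.normal(size=(n, 1))
H = np.array([[2.0, 0.3], [0.3, 1.0]])
Gamma = np.array([[0.5], [-1.0]])

V_two_step = two_step_avar(h, phi, H, Gamma)
# With Gamma = 0 the first step has no effect on the second-step variance.
V_ignoring_first_step = two_step_avar(h, phi, H, np.zeros((2, 1)))
```

Comparing `V_two_step` with `V_ignoring_first_step` shows how much the first-step estimation error contributes through $\Gamma\phi$ and its covariance with $h$.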


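• GMM in both the first stage ($\gamma$) and the second stage ($\theta$):

• $\phi = -M^{-1}\,m(z)$, for some moment condition $m(z,\gamma)$.

• $h(\theta,\gamma) = G'W\,g(z,\theta,\gamma)$, so that $H = G'WG$ and $\Gamma = G'W\,\frac{\partial g}{\partial\gamma} \equiv G'W\,G_\gamma$ for $G_\gamma \equiv \frac{\partial g}{\partial\gamma}$.

• Plug these into the above general case.

• If $W = I$ and $G$ is invertible, then this simplifies to

$$V = G^{-1}\left[\Omega + (E\,g\phi')\,G_\gamma' + G_\gamma\,(E\,\phi g') + G_\gamma\,(E\,\phi\phi')\,G_\gamma'\right]G^{-1\prime}.$$

• Again, if you have trouble differentiating $\frac{\partial g(\theta,\gamma)}{\partial\theta}$ or $\frac{\partial g(\theta,\gamma)}{\partial\gamma}$, then simply take the expectation before differentiating: just replace $H$ and $\Gamma$ by $\frac{\partial E g(\theta,\gamma)}{\partial\theta}$ and $\frac{\partial E g(\theta,\gamma)}{\partial\gamma}$.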

