Lecture 3: Asymptotic Normality of M-estimators
Instructor: Han Hong
Department of Economics, Stanford University
Prepared by Wenbo Zhou, Renmin University
Han Hong Normality of M-estimators
References
• Takeshi Amemiya, 1985, Advanced Econometrics, Harvard University Press.
• Newey and McFadden, 1994, Chapter 36, Volume 4, The Handbook of Econometrics.
Asymptotic Normality
The General Framework
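• Everything is just some form of first-order Taylor expansion, with $\theta^*$ lying between $\hat\theta$ and $\theta_0$:
$$\frac{\partial Q_n(\hat\theta)}{\partial\theta}=0 \iff \sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta}+\frac{\partial^2 Q_n(\theta^*)}{\partial\theta\,\partial\theta'}\,\sqrt{n}\,(\hat\theta-\theta_0)=0.$$
$$\sqrt{n}\,(\hat\theta-\theta_0) = -\left(\frac{\partial^2 Q_n(\theta^*)}{\partial\theta\,\partial\theta'}\right)^{-1}\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta} \overset{LD}{=} -\left(\frac{\partial^2 Q(\theta_0)}{\partial\theta\,\partial\theta'}\right)^{-1}\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta} \xrightarrow{d} N\left(0,\,A^{-1}BA^{-1}\right),$$
where $\overset{LD}{=}$ means "has the same limiting distribution as", and
$$A = E\left(\frac{\partial^2 Q(\theta_0)}{\partial\theta\,\partial\theta'}\right),\qquad B = \operatorname{Var}\left(\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta}\right).$$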
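To make the sandwich $A^{-1}BA^{-1}$ concrete, here is a minimal Monte Carlo sketch. It assumes a hypothetical setting that is not in the lecture: an exponential log-likelihood $Q_n(\theta) = \frac{1}{n}\sum(\log\theta - \theta y_t)$ fitted to Gamma(2, 1) data. Because the model is misspecified, $A \neq -B$. The scalar formula gives $A^{-1}BA^{-1} = \theta_0^4\operatorname{Var}(y)$, and the code compares it with the simulated variance of $\sqrt{n}(\hat\theta-\theta_0)$ and with a one-sample plug-in.

```python
import numpy as np


def exp_mle_sandwich(y):
    """Plug-in A^{-1} B A^{-1} for Q_n(theta) = mean(log(theta) - theta * y)."""
    theta_hat = 1.0 / y.mean()            # solves dQ_n/dtheta = 1/theta - mean(y) = 0
    A_hat = -1.0 / theta_hat ** 2         # second derivative of Q_n at theta_hat
    B_hat = y.var()                       # variance of the score 1/theta - y_t
    return theta_hat, B_hat / A_hat ** 2  # scalar case of A^{-1} B A^{-1}


rng = np.random.default_rng(0)
n, reps = 2000, 2000
# Gamma(shape=2, scale=1) data, so the exponential model is misspecified.
y = rng.gamma(shape=2.0, scale=1.0, size=(reps, n))
theta0 = 0.5                              # pseudo-true value 1 / E[y]
theta_hats = 1.0 / y.mean(axis=1)
mc_var = n * theta_hats.var()             # Monte Carlo variance of sqrt(n)(theta_hat - theta0)
theory = theta0 ** 4 * 2.0                # theta0^4 * Var(y) = 0.125
_, plug_in = exp_mle_sandwich(y[0])
naive = theta0 ** 2                       # -A^{-1}; valid only if the model is correct
print(mc_var, theory, plug_in, naive)
```

The naive inverse-Hessian variance here is twice the sandwich, which shows why $B$ has to be estimated separately unless an information-matrix equality holds.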
Asymptotic Normality for MLE
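• In MLE, $\frac{\partial Q_n(\theta)}{\partial\theta} = \frac{1}{n}\frac{\partial \log L(\theta)}{\partial\theta}$ and $\frac{\partial^2 Q_n(\theta)}{\partial\theta\,\partial\theta'} = \frac{1}{n}\frac{\partial^2 \log L(\theta)}{\partial\theta\,\partial\theta'}$.
• Information matrix equality:
$$E\,\frac{\partial^2 \log L(\theta_0)}{\partial\theta\,\partial\theta'} = -E\,\frac{\partial \log L(\theta_0)}{\partial\theta}\,\frac{\partial \log L(\theta_0)}{\partial\theta'},$$
obtained by interchanging integration and differentiation.
• So $A = -B$, and
$$\sqrt{n}\,(\hat\theta-\theta_0)\xrightarrow{d} N\left(0,\,-A^{-1}\right) = N\left(0,\left(-\lim_{n\to\infty}\frac{1}{n}E\,\frac{\partial^2 \log L(\theta_0)}{\partial\theta\,\partial\theta'}\right)^{-1}\right).$$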
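• What if interchanging integration and differentiation is not possible?
• Example: if $y \in (\theta,\infty)$, then $E\,\frac{\partial \log f(y;\theta)}{\partial\theta} = f(\theta;\theta)$, the density at the lower end of the support. This is generally not zero, so the score no longer has mean zero.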
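A quick numerical illustration of the boundary example follows. It uses a hypothetical density that is not from the lecture: $f(y;\theta) = \theta e^{-\theta(y-\theta)}$ for $y > \theta$. The score is $1/\theta - y + 2\theta$, and its mean works out to $\theta = f(\theta;\theta)$ instead of 0. This sketch only simulates that mean.

```python
import numpy as np


# Hypothetical model: y = theta + Exponential(rate=theta), so the support (theta, inf)
# moves with theta and f(theta; theta) = theta.
def score(y, theta):
    # d/dtheta [log(theta) - theta * y + theta**2]
    return 1.0 / theta - y + 2.0 * theta


theta = 1.5
rng = np.random.default_rng(1)
y = theta + rng.exponential(scale=1.0 / theta, size=1_000_000)
mean_score = score(y, theta).mean()
boundary_density = theta                  # f(theta; theta) = theta * exp(0)
print(mean_score, boundary_density)       # the score mean is near 1.5, not 0
```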
Asymptotic Normality for GMM
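• $Q_n(\theta) = g_n(\theta)'Wg_n(\theta)$, where $g_n(\theta) = \frac{1}{n}\sum_{t=1}^n g(z_t,\theta)$.
• Asymptotic normality holds when the moment functions only have first derivatives.
• Denote $G_n(\theta) = \frac{\partial g_n(\theta)}{\partial\theta'}$, $\theta^*\in[\theta_0,\hat\theta]$, $\hat G_n \equiv G_n(\hat\theta)$, $G_n^* \equiv G_n(\theta^*)$, $G = EG_n(\theta_0)$, $\Omega = E\left(g(z,\theta_0)\,g(z,\theta_0)'\right)$.
$$0 = \hat G_n'Wg_n(\hat\theta) = \hat G_n'W\left(g_n(\theta_0) + G_n^*(\hat\theta-\theta_0)\right) \implies \sqrt{n}(\hat\theta-\theta_0) = -(\hat G_n'WG_n^*)^{-1}\hat G_n'W\sqrt{n}\,g_n(\theta_0)$$
$$\overset{LD}{=} -(G'WG)^{-1}G'W\sqrt{n}\,g_n(\theta_0) \overset{LD}{=} -(G'WG)^{-1}G'W\times N(0,\Omega) = N\left(0,\,(G'WG)^{-1}G'W\Omega WG(G'WG)^{-1}\right).$$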
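Here is a minimal sketch of the GMM sandwich for linear moments $g(z_t,b) = z_t(y_t - x_t'b)$ with a fixed $W$. It assumes a hypothetical instrumental-variables design with one endogenous regressor and heteroskedastic errors. The code compares the Monte Carlo variance of $\sqrt{n}(\hat b_1 - b_1)$ with the average plug-in $(G'WG)^{-1}G'W\hat\Omega WG(G'WG)^{-1}$.

```python
import numpy as np


def linear_gmm(y, X, Z, W):
    """GMM for g(z_t, b) = z_t (y_t - x_t'b) with a fixed weight matrix W."""
    n = len(y)
    ZX, Zy = Z.T @ X / n, Z.T @ y / n
    G = -ZX                                        # G_n = d g_n / d b'
    b = np.linalg.solve(ZX.T @ W @ ZX, ZX.T @ W @ Zy)
    u = y - X @ b
    Omega = (Z * u[:, None] ** 2).T @ Z / n        # sample mean of u_t^2 z_t z_t'
    bread = np.linalg.inv(G.T @ W @ G)
    V = bread @ G.T @ W @ Omega @ W @ G @ bread    # (G'WG)^{-1} G'W Omega W G (G'WG)^{-1}
    return b, V


def simulate(n, rng):
    """Hypothetical DGP: x is endogenous through v; errors are heteroskedastic in z1."""
    z = rng.standard_normal((n, 2))
    v = rng.standard_normal(n)
    x = 0.5 * z[:, 0] + 0.5 * z[:, 1] + v
    eps = 0.5 * v + rng.standard_normal(n) * (1 + 0.5 * np.abs(z[:, 0]))
    y = 1.0 + 2.0 * x + eps
    return y, np.column_stack([np.ones(n), x]), np.column_stack([np.ones(n), z])


rng = np.random.default_rng(2)
n, reps = 1000, 500
W = np.eye(3)                                      # any positive definite W is allowed
draws, Vs = [], []
for _ in range(reps):
    y, X, Z = simulate(n, rng)
    b, V = linear_gmm(y, X, Z, W)
    draws.append(np.sqrt(n) * (b[1] - 2.0))
    Vs.append(V[1, 1])
mc_var, avg_sandwich = np.var(draws), np.mean(Vs)
print(mc_var, avg_sandwich)
```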
Examples
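• Efficient choice $W = \Omega^{-1}$ (or $W \propto \Omega^{-1}$):
$$\sqrt{n}(\hat\theta-\theta_0)\xrightarrow{d} N\left(0,\,(G'\Omega^{-1}G)^{-1}\right).$$
• When $G$ is invertible, $W$ is irrelevant:
$$\sqrt{n}(\hat\theta-\theta_0)\xrightarrow{d} N\left(0,\,G^{-1}\Omega G'^{-1}\right) = N\left(0,\,(G'\Omega^{-1}G)^{-1}\right).$$
• When $\Omega = \alpha G$ (or $G \propto \Omega$):
$$\sqrt{n}(\hat\beta-\beta_0)\xrightarrow{d} N\left(0,\,\alpha G^{-1}\right).$$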
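The three special cases can be checked as matrix identities. This sketch uses randomly generated matrices (an assumption made purely for illustration). It confirms that with a square invertible $G$ the sandwich does not depend on $W$, and that $\Omega = \alpha G$ gives $\alpha G^{-1}$. Note that $\Omega = \alpha G$ forces $G$ to be symmetric.

```python
import numpy as np


def gmm_avar(G, W, Omega):
    """Sandwich (G'WG)^{-1} G'W Omega W G (G'WG)^{-1}."""
    bread = np.linalg.inv(G.T @ W @ G)
    return bread @ G.T @ W @ Omega @ W @ G @ bread


rng = np.random.default_rng(3)
k = 3
G = rng.standard_normal((k, k)) + k * np.eye(k)   # square; the shift keeps it well conditioned
L = rng.standard_normal((k, k))
Omega = L @ L.T + np.eye(k)                        # positive definite
M = rng.standard_normal((k, k))
W = M @ M.T + np.eye(k)                            # arbitrary positive definite weight

V_W = gmm_avar(G, W, Omega)                        # exactly identified: W should drop out
V_eff = np.linalg.inv(G.T @ np.linalg.inv(Omega) @ G)
G_inv = np.linalg.inv(G)
V_direct = G_inv @ Omega @ G_inv.T

# Omega = alpha * G with G symmetric (required, since Omega is symmetric)
alpha = 2.5
S = L @ L.T + np.eye(k)
V_alpha = gmm_avar(S, np.linalg.inv(alpha * S), alpha * S)
```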
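• Least squares (LS): $g(z,\beta) = x(y - x'\beta)$.
• $G = Exx'$ (up to sign, which does not affect the variance), $\Omega = E\varepsilon^2xx'$, then
$$\sqrt{n}(\hat\beta-\beta_0)\xrightarrow{d} N\left(0,\,(Exx')^{-1}(E\varepsilon^2xx')(Exx')^{-1}\right),$$
the so-called White heteroscedasticity-consistent standard error.
• If $E[\varepsilon^2|x] = \sigma^2$, then $\Omega = \sigma^2G$ and
$$\sqrt{n}(\hat\beta-\beta_0)\xrightarrow{d} N\left(0,\,\sigma^2(Exx')^{-1}\right).$$
• Weighted LS: $g(z,\beta) = \frac{1}{E(\varepsilon^2|x)}\,x(y-x'\beta)$. Then
$$G = E\,\frac{1}{E(\varepsilon^2|x)}\,xx' = \Omega \implies \sqrt{n}(\hat\beta-\beta_0)\xrightarrow{d} N\left(0,\,G^{-1}\right).$$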
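A small sketch of the three LS variance formulas on simulated data follows. It assumes a hypothetical design with $E[\varepsilon^2|x] = (0.5 + x_1)^2$. For weighted LS it treats that skedastic function as known, which is an idealization. The code computes White, homoskedastic, and WLS standard errors from sample analogues of $Exx'$ and $E\varepsilon^2xx'$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
x1 = rng.uniform(0, 2, n)
X = np.column_stack([np.ones(n), x1])
eps = rng.standard_normal(n) * (0.5 + x1)      # hypothetical: E[eps^2 | x] = (0.5 + x1)^2
y = X @ np.array([1.0, 2.0]) + eps

Sxx = X.T @ X / n                              # sample analog of E xx'
b_ols = np.linalg.solve(Sxx, X.T @ y / n)
u = y - X @ b_ols
meat = (X * u[:, None] ** 2).T @ X / n         # sample analog of E eps^2 xx'
Sxx_inv = np.linalg.inv(Sxx)
V_white = Sxx_inv @ meat @ Sxx_inv             # avar of sqrt(n)(b_ols - beta0)
V_homo = u.var() * Sxx_inv                     # valid only if E[eps^2 | x] = sigma^2
se_white = np.sqrt(np.diag(V_white) / n)
se_homo = np.sqrt(np.diag(V_homo) / n)

# Weighted LS with the (here known) weight 1 / E[eps^2 | x]
w = 1.0 / (0.5 + x1) ** 2
Gw = (X * w[:, None]).T @ X / n                # E[xx' / E(eps^2 | x)] = Omega for WLS
b_wls = np.linalg.solve(Gw, (X * w[:, None]).T @ y / n)
se_wls = np.sqrt(np.diag(np.linalg.inv(Gw)) / n)   # avar G^{-1}
print(se_white, se_homo, se_wls)
```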
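• Linear 2SLS: $g(z,\beta) = z(y - x'\beta)$.
• $G = Ezx'$, $\Omega = E\varepsilon^2zz'$, $W = (Ezz')^{-1}$, then $\sqrt{n}(\hat\beta-\beta_0)\xrightarrow{d}N(0,V)$.
• If $E\varepsilon^2zz' = \sigma^2Ezz'$, then $V = \sigma^2\left[Exz'(Ezz')^{-1}Ezx'\right]^{-1}$.
• Linear 3SLS: $g(z,\beta) = z(y-x'\beta)$, $G = Ezx'$, $\Omega = E\varepsilon^2zz'$, $W = (E\varepsilon^2zz')^{-1}$, then $\sqrt{n}(\hat\beta-\beta_0)\xrightarrow{d}N(0,V)$ for
$$V = \left[Exz'(E\varepsilon^2zz')^{-1}Ezx'\right]^{-1}.$$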
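• MLE as GMM: $g(z,\theta) = \frac{\partial \log f(z,\theta)}{\partial\theta}$.
$$G = -E\,\frac{\partial^2\log f(z,\theta_0)}{\partial\theta\,\partial\theta'} = \Omega = E\,\frac{\partial\log f(z,\theta_0)}{\partial\theta}\frac{\partial\log f(z,\theta_0)}{\partial\theta'},$$
then
$$\sqrt{n}(\hat\theta-\theta_0)\xrightarrow{d}N\left(0,\,G^{-1}\right) = N\left(0,\,\Omega^{-1}\right).$$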
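The linear 2SLS and 3SLS formulas can be compared on one simulated sample. This sketch assumes a hypothetical over-identified design (three instruments, two coefficients) with heteroskedastic errors. It forms the lecture's "3SLS" as GMM with $W = \hat\Omega^{-1}$ built from 2SLS residuals. For the same $\hat\Omega$, the robust 2SLS sandwich minus the efficient variance is positive semidefinite as a matrix identity.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000
z = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])     # instruments (1, z1, z2)
v = rng.standard_normal(n)
x = np.column_stack([np.ones(n), z[:, 1] + z[:, 2] + v])           # regressor endogenous via v
eps = (0.5 * v + rng.standard_normal(n)) * np.exp(0.5 * z[:, 1])    # heteroskedastic in z1
y = x @ np.array([1.0, 2.0]) + eps

Szx, Szy, Szz = z.T @ x / n, z.T @ y / n, z.T @ z / n


def linear_gmm(W):
    """Minimize (Szy - Szx b)' W (Szy - Szx b)."""
    return np.linalg.solve(Szx.T @ W @ Szx, Szx.T @ W @ Szy)


W_2sls = np.linalg.inv(Szz)                          # W = (E zz')^{-1}
b_2sls = linear_gmm(W_2sls)
u = y - x @ b_2sls
Omega = (z * u[:, None] ** 2).T @ z / n              # sample analog of E eps^2 zz'
b_3sls = linear_gmm(np.linalg.inv(Omega))            # W = (E eps^2 zz')^{-1}

bread = np.linalg.inv(Szx.T @ W_2sls @ Szx)          # [E xz' (E zz')^{-1} E zx']^{-1}
V_2sls = bread @ Szx.T @ W_2sls @ Omega @ W_2sls @ Szx @ bread   # robust 2SLS
V_2sls_homo = u.var() * bread                        # valid only under homoskedasticity
V_3sls = np.linalg.inv(Szx.T @ np.linalg.inv(Omega) @ Szx)
print(np.sqrt(np.diag(V_2sls) / n), np.sqrt(np.diag(V_3sls) / n))
```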
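• GMM again:
• Take linear combinations of the moment conditions so that the number of moment conditions equals the number of parameters in $\theta$.
• In particular, take $h(z,\theta) = G'Wg(z,\theta)$ and use $h(z,\theta)$ as the new moment conditions; then
$$\hat\theta = \arg\min_\theta \left[\frac{1}{n}\sum_{t=1}^n h(z_t,\theta)\right]'\left[\frac{1}{n}\sum_{t=1}^n h(z_t,\theta)\right]$$
is asymptotically equivalent to $\hat\theta = \arg\min_\theta g_n'Wg_n$, where
$$\tilde G = E\,\frac{\partial h(z,\theta_0)}{\partial\theta'} = G'WG,\qquad \tilde\Omega = E\,h(z,\theta_0)h(z,\theta_0)' = G'W\Omega WG,$$
so that $\tilde G^{-1}\tilde\Omega\tilde G'^{-1} = (G'WG)^{-1}G'W\Omega WG(G'WG)^{-1}$, the same asymptotic variance as before.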
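For linear moments the equivalence is exact in sample, not just asymptotic. Solving the $k$ just-identified equations $G'Wg_n(b) = 0$ reproduces the first-order condition of $\min_b g_n'Wg_n$. The sketch below assumes a hypothetical IV design and an arbitrary diagonal $W$. It also checks the variance identity with a placeholder $\Omega$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, m = 4000, 2, 4                        # 2 parameters, 4 instruments
z = rng.standard_normal((n, m))
x = z @ rng.uniform(0.5, 1.0, (m, k)) + rng.standard_normal((n, k))
y = x @ np.array([1.0, -1.0]) + rng.standard_normal(n)

Szx, Szy = z.T @ x / n, z.T @ y / n         # g_n(b) = Szy - Szx b
W = np.diag(rng.uniform(0.5, 2.0, m))       # arbitrary positive definite weight
G = -Szx                                    # d g_n / d b'

# (1) Minimize g_n' W g_n directly (closed form for linear moments).
b_gmm = np.linalg.solve(Szx.T @ W @ Szx, Szx.T @ W @ Szy)
# (2) Solve the k just-identified moments h_n(b) = G'W g_n(b) = 0.
b_h = np.linalg.solve(G.T @ W @ Szx, G.T @ W @ Szy)

# Variance identity with a placeholder positive definite Omega
Omega = np.cov(rng.standard_normal((m, 500)))
G_t, Omega_t = G.T @ W @ G, G.T @ W @ Omega @ W @ G
V_h = np.linalg.inv(G_t) @ Omega_t @ np.linalg.inv(G_t).T
bread = np.linalg.inv(G.T @ W @ G)
V_g = bread @ G.T @ W @ Omega @ W @ G @ bread
```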
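• Quantile Regression as GMM:
• $g(z,\beta) = \left(\tau - 1(y\le x'\beta)\right)x$, and $W$ is irrelevant (the model is exactly identified).
• $G = E\,\frac{\partial g(z,\beta)}{\partial\beta'} = -E\,\frac{\partial\, 1(y\le x'\beta)\,x}{\partial\beta'}$. Proceed in a "quick and dirty" way and take the expectation before differentiating:
$$G = -\frac{\partial\, E\,1(y\le x'\beta)\,x}{\partial\beta'} = -\frac{\partial\, E\,xF_y(x'\beta|x)}{\partial\beta'} = -E\,x\,\frac{\partial F_y(x'\beta|x)}{\partial\beta'} = -E\,f_y(x'\beta|x)\,xx' = -E\,f_u(0|x)\,xx'$$
at $\beta = \beta_0$, where $u = y - x'\beta_0$. The sign does not affect the asymptotic variance.
• Conditional on $x$, $\tau - 1(y\le x'\beta_0) = \tau - 1(u\le 0)$ is $\tau$ minus a Bernoulli random variable with success probability $P(u\le 0|x) = \tau$. Hence $E\left[(\tau - 1(y\le x'\beta_0))^2\,|\,x\right] = \tau(1-\tau)$, and
$$\Omega = E\left\{E\left[(\tau - 1(y\le x'\beta_0))^2\,|\,x\right]xx'\right\} = \tau(1-\tau)\,Exx'.$$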
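The "quick and dirty" derivative can be checked numerically. A central finite difference of the sample average $\frac{1}{n}\sum(\tau - 1(y_t\le x_t'\beta))x_t$ should be close to $-E f_u(0|x)xx'$ when $n$ is large. The sketch assumes a hypothetical median regression with $N(0,1)$ errors and $x = (1, U(0,1))$, so $f_u(0|x) = 1/\sqrt{2\pi}$. The step $h$ is an ad hoc choice.

```python
import numpy as np
from math import pi, sqrt

rng = np.random.default_rng(7)
n, tau = 400_000, 0.5
x = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])
beta0 = np.array([1.0, 2.0])
y = x @ beta0 + rng.standard_normal(n)      # u ~ N(0, 1), whose tau = 0.5 quantile is 0


def g_bar(beta):
    """Sample mean of g(z, beta) = (tau - 1(y <= x'beta)) x."""
    indicator = (y <= x @ beta).astype(float)
    return ((tau - indicator)[:, None] * x).mean(axis=0)


h = 0.05                                    # finite-difference step (ad hoc choice)
G_fd = np.column_stack([(g_bar(beta0 + h * e) - g_bar(beta0 - h * e)) / (2 * h)
                        for e in np.eye(2)])
f0 = 1 / sqrt(2 * pi)                       # f_u(0 | x), the same for every x here
Exx = np.array([[1.0, 0.5], [0.5, 1.0 / 3.0]])   # E xx' for x = (1, U(0,1))
G_theory = -f0 * Exx
print(G_fd, G_theory)
```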
• Quantile Regression as GMM:
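• $$\sqrt{n}(\hat\beta-\beta_0)\xrightarrow{d}N\left(0,\;\tau(1-\tau)\left[Ef_u(0|x)xx'\right]^{-1}Exx'\left[Ef_u(0|x)xx'\right]^{-1}\right).$$
• If the errors are homoscedastic, $f_u(0|x) = f_u(0)$ and then $V = \frac{\tau(1-\tau)}{f_u(0)^2}\,(Exx')^{-1}$.
• Consistent estimation of $G$ and $\Omega$:
• Estimate $G$ by $\hat G = \frac{1}{n}\sum_{t=1}^n\frac{\partial g(z_t,\hat\theta)}{\partial\theta'}$.
• For nonsmooth problems such as quantile regression, approximate the second derivative by
$$\frac{Q_n(\hat\theta+2h_n)+Q_n(\hat\theta-2h_n)-2Q_n(\hat\theta)}{4h_n^2}.$$
This requires $h_n = o(1)$ and $1/h_n = o(\sqrt{n})$, i.e. $h_n\to 0$ while $\sqrt{n}\,h_n\to\infty$.
• For stationary data, heteroscedasticity and dependence only affect the estimation of $\Omega$. For independent data, use White's heteroscedasticity-consistent estimate; for dependent data, use the Newey-West autocorrelation-consistent estimate.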
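A minimal sketch of the numerical second derivative for a sample median follows. It assumes $N(0,1)$ data and an ad hoc bandwidth $h_n = 0.05$, which satisfies $\sqrt{n}\,h_n \approx 22$ at $n = 200{,}000$. Here $Q_n$ is the average check loss, which is minimized rather than maximized, so the second difference estimates $+f_y(\text{median})$. Plugging it into $\tau(1-\tau)/f^2$ gives the variance, which should be near $\pi/2$.

```python
import numpy as np
from math import pi, sqrt


def check_loss(u, tau):
    """rho_tau(u) = u * (tau - 1(u < 0))."""
    return u * (tau - (u < 0).astype(float))


rng = np.random.default_rng(8)
n, tau = 200_000, 0.5
y = rng.standard_normal(n)


def Q_n(theta):
    # Average check loss; minimized (not maximized) at the sample tau-quantile.
    return check_loss(y - theta, tau).mean()


theta_hat = np.quantile(y, tau)
h_n = 0.05                     # ad hoc bandwidth: small, while sqrt(n) * h_n is about 22
second_deriv = (Q_n(theta_hat + 2 * h_n) + Q_n(theta_hat - 2 * h_n)
                - 2 * Q_n(theta_hat)) / (4 * h_n ** 2)
f0 = 1 / sqrt(2 * pi)          # true density of y at its median, about 0.399
V_hat = tau * (1 - tau) / second_deriv ** 2   # estimated avar of sqrt(n)(theta_hat - theta0)
print(second_deriv, f0, V_hat)
```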
Iteration and One Step Estimation
• Iterate from the initial guess $\tilde\theta$ to the next-round guess $\bar\theta$.
• Newton-Raphson: use a quadratic approximation for $Q_n(\theta)$.
• Gauss-Newton: use a linear approximation for the first-order condition, e.g. GMM.
• If the initial guess is a $\sqrt{n}$-consistent estimate, more iteration will not increase (first-order) asymptotic efficiency.
• E.g. if $\tilde\theta-\theta_0 = O_p\left(\frac{1}{\sqrt{n}}\right)$, then
$$\sqrt{n}(\bar\theta-\theta_0)\overset{LD}{=}\sqrt{n}(\hat\theta-\theta_0)$$
for $\hat\theta = \arg\max_\theta Q_n(\theta)$.
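A classic illustration of the one-step result is the Cauchy location model, used here as an assumption since the lecture gives no example. The sample median is $\sqrt{n}$-consistent with asymptotic variance $\pi^2/4 \approx 2.47$. A single Newton-Raphson step on the log-likelihood should already reach the efficient variance $1/I(\theta) = 2$, and a second step should barely move the estimate.

```python
import numpy as np


def newton_step(y, theta):
    """One Newton-Raphson step on the Cauchy location log-likelihood, row by row.

    log f(y; theta) = -log(pi) - log(1 + (y - theta)^2); theta has one entry per row of y.
    """
    r = y - theta[:, None]
    score = (2 * r / (1 + r ** 2)).sum(axis=1)
    hessian = (2 * (r ** 2 - 1) / (1 + r ** 2) ** 2).sum(axis=1)
    return theta - score / hessian          # theta_bar = theta_tilde - H^{-1} * score


rng = np.random.default_rng(9)
n, reps, theta0 = 1000, 2000, 0.0
y = theta0 + rng.standard_cauchy((reps, n))
theta_tilde = np.median(y, axis=1)          # sqrt(n)-consistent start, avar pi^2 / 4
theta_bar = newton_step(y, theta_tilde)     # one step: avar should be close to 2
theta_bar2 = newton_step(y, theta_bar)      # a second step changes little

v_tilde = n * np.var(theta_tilde)
v_bar = n * np.var(theta_bar)
shift = np.mean(n * (theta_bar2 - theta_bar) ** 2)
print(v_tilde, v_bar, shift)
```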
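1. Newton-Raphson: use a quadratic approximation for $Q_n(\theta)$ around the initial guess $\tilde\theta$:
$$Q_n(\theta)\approx Q_n(\tilde\theta)+\frac{\partial Q_n(\tilde\theta)}{\partial\theta'}(\theta-\tilde\theta)+\frac{1}{2}(\theta-\tilde\theta)'\,\frac{\partial^2Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}\,(\theta-\tilde\theta).$$
Optimizing the right-hand side gives the first-order condition for the next-round guess $\bar\theta$:
$$\frac{\partial Q_n(\tilde\theta)}{\partial\theta}+\frac{\partial^2Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}(\bar\theta-\tilde\theta)=0 \implies \bar\theta=\tilde\theta-\left(\frac{\partial^2Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}\right)^{-1}\frac{\partial Q_n(\tilde\theta)}{\partial\theta}.$$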
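2. Gauss-Newton: use a linear approximation for the first-order condition, e.g. GMM. With $\tilde G \equiv G_n(\tilde\theta)$,
$$Q_n(\theta)\approx\left(g_n(\tilde\theta)+\tilde G(\theta-\tilde\theta)\right)'W\left(g_n(\tilde\theta)+\tilde G(\theta-\tilde\theta)\right) \implies \tilde G'Wg_n(\tilde\theta)+\tilde G'W\tilde G(\bar\theta-\tilde\theta)=0$$
$$\implies \bar\theta=\tilde\theta-\left(\tilde G'W\tilde G\right)^{-1}\tilde G'Wg_n(\tilde\theta).$$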
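Below is a sketch of the Gauss-Newton update for a nonlinear, over-identified GMM problem. It assumes hypothetical moments $g(z_t,\theta) = z_t(y_t - \exp(x_t'\theta))$ with $z_t = (1, x_{1t}, x_{1t}^2)$ and $W = I$. The $\sqrt{n}$-consistent start is simply the estimate from half of the sample. One undamped step from it should land much closer to the fully iterated estimate. The step-halving inside `gmm_fit` is an added safeguard for iterating from a poor start, not part of the lecture's update.

```python
import numpy as np


def moments(theta, y, x, z):
    """g_n(theta) and G_n(theta) for g(z_t, theta) = z_t * (y_t - exp(x_t' theta))."""
    n = len(y)
    mu = np.exp(x @ theta)
    g = z.T @ (y - mu) / n
    G = -(z * mu[:, None]).T @ x / n
    return g, G


def gn_step(theta, y, x, z, W):
    """Gauss-Newton increment -(G'WG)^{-1} G'W g_n evaluated at theta."""
    g, G = moments(theta, y, x, z)
    return -np.linalg.solve(G.T @ W @ G, G.T @ W @ g)


def objective(theta, y, x, z, W):
    g, _ = moments(theta, y, x, z)
    return g @ W @ g


def gmm_fit(theta, y, x, z, W, max_iter=200):
    """Iterate Gauss-Newton to convergence, halving steps that raise the objective."""
    for _ in range(max_iter):
        step = gn_step(theta, y, x, z, W)
        s = 1.0
        while objective(theta + s * step, y, x, z, W) > objective(theta, y, x, z, W) and s > 1e-8:
            s /= 2
        theta = theta + s * step
        if np.linalg.norm(s * step) < 1e-12:
            break
    return theta


rng = np.random.default_rng(10)
n, theta0 = 20_000, np.array([0.5, 1.0])
x1 = rng.uniform(0, 1, n)
x = np.column_stack([np.ones(n), x1])
z = np.column_stack([np.ones(n), x1, x1 ** 2])        # 3 moments, 2 parameters
y = np.exp(x @ theta0) + rng.standard_normal(n)
W = np.eye(3)

half = n // 2
theta_tilde = gmm_fit(np.zeros(2), y[:half], x[:half], z[:half], W)   # sqrt(n)-consistent start
theta_bar = theta_tilde + gn_step(theta_tilde, y, x, z, W)           # one undamped step
theta_hat = gmm_fit(theta_tilde, y, x, z, W)                          # fully iterated
print(theta_tilde, theta_bar, theta_hat)
```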
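If the initial guess $\tilde\theta$ is a $\sqrt{n}$-consistent estimate, e.g. $\tilde\theta-\theta_0 = O_p\left(\frac{1}{\sqrt{n}}\right)$, then the next-round guess $\bar\theta$ satisfies $\sqrt{n}(\bar\theta-\theta_0)\overset{LD}{=}\sqrt{n}(\hat\theta-\theta_0)$ for $\hat\theta = \arg\max_\theta Q_n(\theta)$. More iteration will not increase (first-order) asymptotic efficiency: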
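1. For Newton-Raphson, with $\theta^*$ between $\tilde\theta$ and $\theta_0$:
$$\begin{aligned}
\sqrt{n}(\bar\theta-\theta_0) &= \sqrt{n}(\tilde\theta-\theta_0) - \left(\frac{\partial^2Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}\right)^{-1}\sqrt{n}\,\frac{\partial Q_n(\tilde\theta)}{\partial\theta}\\
&= \sqrt{n}(\tilde\theta-\theta_0) - \left(\frac{\partial^2Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}\right)^{-1}\left[\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta} + \frac{\partial^2Q_n(\theta^*)}{\partial\theta\,\partial\theta'}\sqrt{n}(\tilde\theta-\theta_0)\right]\\
&= \left[I - \left(\frac{\partial^2Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}\right)^{-1}\frac{\partial^2Q_n(\theta^*)}{\partial\theta\,\partial\theta'}\right]\sqrt{n}(\tilde\theta-\theta_0) - \left(\frac{\partial^2Q_n(\tilde\theta)}{\partial\theta\,\partial\theta'}\right)^{-1}\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta}\\
&= o_p(1) + \sqrt{n}(\hat\theta-\theta_0).
\end{aligned}$$
The bracketed matrix is $o_p(1)$ because both Hessians converge to $A$, while $\sqrt{n}(\tilde\theta-\theta_0) = O_p(1)$. The last term has the same limiting distribution as the expansion of $\sqrt{n}(\hat\theta-\theta_0)$ in the general framework.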
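2. For Gauss-Newton, with $\tilde G = G_n(\tilde\theta)$ and $G^* = G_n(\theta^*)$:
$$\begin{aligned}
\sqrt{n}(\bar\theta-\theta_0) &= \sqrt{n}(\tilde\theta-\theta_0) - (\tilde G'W\tilde G)^{-1}\tilde G'W\sqrt{n}\left[g_n(\theta_0)+G^*(\tilde\theta-\theta_0)\right]\\
&= \left(I-(\tilde G'W\tilde G)^{-1}\tilde G'WG^*\right)\sqrt{n}(\tilde\theta-\theta_0) - (\tilde G'W\tilde G)^{-1}\tilde G'W\sqrt{n}\,g_n(\theta_0)\\
&= o_p(1) + \sqrt{n}(\hat\theta-\theta_0).
\end{aligned}$$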
Influence Function
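• $\phi(z_t)$ is called the influence function if
• $\sqrt{n}(\hat\theta-\theta_0) = \frac{1}{\sqrt{n}}\sum_{t=1}^n\phi(z_t)+o_p(1)$,
• $E\phi(z_t) = 0$ and $E\phi(z_t)\phi(z_t)' < \infty$.
• By the central limit theorem, $\sqrt{n}(\hat\theta-\theta_0)\xrightarrow{d}N\left(0,\,E\phi\phi'\right)$; informally, think of $\sqrt{n}(\hat\theta-\theta_0)$ as distributed like $\phi(z_t)$, with variance $E\phi\phi'$.
• Used in discussions of asymptotic efficiency, two-step or multistep estimation, etc.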
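To see the representation at work, here is a sketch using exponential data with rate $\theta_0$. That model is an assumption made for illustration. The MLE is $\hat\theta = 1/\bar y$ and its influence function is $\phi(y) = I(\theta_0)^{-1}\times\text{score} = \theta_0^2(1/\theta_0 - y)$. The code checks that $\sqrt{n}(\hat\theta-\theta_0)$ and $\frac{1}{\sqrt{n}}\sum\phi(y_t)$ nearly coincide and that $\operatorname{Var}\phi \approx \theta_0^2$.

```python
import numpy as np

rng = np.random.default_rng(11)
theta0, n = 2.0, 100_000
y = rng.exponential(scale=1.0 / theta0, size=n)    # exponential data with rate theta0

theta_hat = 1.0 / y.mean()                         # MLE of the rate
lhs = np.sqrt(n) * (theta_hat - theta0)
# phi(y) = I(theta0)^{-1} * score = theta0^2 * (1/theta0 - y)
phi = theta0 ** 2 * (1.0 / theta0 - y)
rhs = phi.sum() / np.sqrt(n)
print(lhs, rhs, phi.var())    # lhs - rhs is small; Var(phi) is near theta0^2 = 4
```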
Examples
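• For MLE,
$$\phi(z_t) = \left[-E\,\frac{\partial^2\ln f(y_t,\theta_0)}{\partial\theta\,\partial\theta'}\right]^{-1}\frac{\partial\ln f(y_t,\theta_0)}{\partial\theta} = \left[E\,\frac{\partial\ln f(y_t,\theta_0)}{\partial\theta}\,\frac{\partial\ln f(y_t,\theta_0)}{\partial\theta'}\right]^{-1}\frac{\partial\ln f(y_t,\theta_0)}{\partial\theta}.$$
• For GMM,
$$\phi = -(G'WG)^{-1}G'Wg(z_t,\theta_0),$$
or $\phi = -\left(E\,\frac{\partial h}{\partial\theta'}\right)^{-1}h(z_t,\theta_0)$ for $h(z_t,\theta_0) = G'Wg(z_t,\theta_0)$.
• Quantile Regression:
$$\phi(z_t) = \left[Ef_u(0|x)\,xx'\right]^{-1}\left(\tau - 1(u_t\le 0)\right)x_t.$$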
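For quantile regression on a constant only (the sample median), the influence function reduces to $\phi = (\tau - 1(u\le 0))/f_u(0)$. This is the Bahadur representation. The sketch assumes standard normal data with $\theta_0 = 0$, a hypothetical choice. The remainder $\sqrt{n}(\hat\theta-\theta_0) - \frac{1}{\sqrt{n}}\sum\phi$ shrinks only at roughly the $n^{-1/4}$ rate, so a large $n$ is used.

```python
import numpy as np
from math import pi, sqrt

rng = np.random.default_rng(12)
n, tau = 1_000_000, 0.5
u = rng.standard_normal(n)     # location model y = theta0 + u with theta0 = 0 and x = 1
theta_hat = np.median(u)       # quantile regression on a constant = sample median
f0 = 1 / sqrt(2 * pi)          # f_u(0)
phi = (tau - (u <= 0).astype(float)) / f0   # [E f_u(0|x) xx']^{-1} (tau - 1(u <= 0)) x, x = 1
lhs = np.sqrt(n) * theta_hat
rhs = phi.sum() / np.sqrt(n)
print(lhs, rhs, phi.var())     # Var(phi) = tau(1 - tau) / f0^2 = pi / 2
```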
Asymptotic Efficiency
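• Is MLE efficient among all asymptotically normal estimators?
• Superefficient estimator: suppose $\sqrt{n}(\hat\theta-\theta_0)\xrightarrow{d}N(0,V)$ for all $\theta_0$. Now define
$$\theta^* = \begin{cases}\hat\theta & \text{if } |\hat\theta|\ge n^{-1/4},\\ 0 & \text{if } |\hat\theta|<n^{-1/4};\end{cases}$$
then $\sqrt{n}(\theta^*-\theta_0)\xrightarrow{d}N(0,0)$ if $\theta_0 = 0$, and $\sqrt{n}(\theta^*-\theta_0)\overset{LD}{=}\sqrt{n}(\hat\theta-\theta_0)\xrightarrow{d}N(0,V)$ if $\theta_0\neq 0$.
• $\hat\theta$ is regular if, for any data generated by $\theta_n = \theta_0+\delta/\sqrt{n}$ with $\delta\ge 0$, $\sqrt{n}(\hat\theta-\theta_n)$ has a limit distribution that does not depend on $\delta$.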
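This is a small Monte Carlo sketch of the superefficient (Hodges-type) estimator above. It assumes $\hat\theta$ is the mean of $n$ draws from $N(\theta_0, 1)$, so $V = 1$, and simulates $\hat\theta$ directly as $N(\theta_0, 1/n)$. The scaled risk $E[n(\theta^*-\theta_0)^2]$ is about 0 at $\theta_0 = 0$ and about 1 at $\theta_0 = 1$. Under the local drift $\theta_n = 5/\sqrt{n}$ it jumps to about 25, which is the failure of regularity.

```python
import numpy as np


def hodges(theta_hat, n):
    """theta* = theta_hat if |theta_hat| >= n^(-1/4), else 0."""
    return np.where(np.abs(theta_hat) >= n ** -0.25, theta_hat, 0.0)


rng = np.random.default_rng(13)
n, reps = 10_000, 100_000
risk = {}
for label, theta0 in [("zero", 0.0), ("fixed", 1.0), ("local", 5.0 / np.sqrt(n))]:
    # theta_hat = mean of n N(theta0, 1) draws, simulated directly as N(theta0, 1/n); V = 1
    theta_hat = theta0 + rng.standard_normal(reps) / np.sqrt(n)
    err = np.sqrt(n) * (hodges(theta_hat, n) - theta0)
    risk[label] = np.mean(err ** 2)         # Monte Carlo E[n (theta* - theta0)^2]
print(risk)
```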
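• For regular estimators, consider influence function representations indexed by $\tau$:
$$\sqrt{n}(\hat\theta(\tau)-\theta_0)\overset{LD}{=}\frac{1}{\sqrt{n}}\sum_{t=1}^n\phi(z_t,\tau)\xrightarrow{d}N\left(0,\,E\phi(\tau)\phi(\tau)'\right).$$
• $\hat\theta(\bar\tau)$ is more efficient than $\hat\theta(\tau)$ if it has a smaller variance-covariance matrix.
• A necessary condition for $\hat\theta(\bar\tau)$ to be efficient in the class is that $\operatorname{Cov}\left(\phi(z,\tau)-\phi(z,\bar\tau),\,\phi(z,\bar\tau)\right) = 0$ for all $\tau$, including $\bar\tau$.
• The following are equivalent:
$$\operatorname{Cov}\left(\phi(z,\tau)-\phi(z,\bar\tau),\,\phi(z,\bar\tau)\right) = 0 \iff \operatorname{Cov}\left(\phi(z,\tau),\,\phi(z,\bar\tau)\right) = \operatorname{Var}\left(\phi(z,\bar\tau)\right) \iff E\phi(z,\tau)\phi(z,\bar\tau)' = E\phi(z,\bar\tau)\phi(z,\bar\tau)'.$$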
Newey’s efficiency framework:
• Classify estimators into the GMM framework with $\phi(z,\tau) = D(\tau)^{-1}m(z,\tau)$.
• For the class indexed by $\tau = W$, given a vector $g(z,\theta_0)$, $D(\tau)\equiv D(W) = G'WG$ and $m(z,\tau)\equiv m(z,W) = G'Wg(z,\theta_0)$.
• Consider MLE among the class of GMM estimators, so that $\tau$ indexes any vector of moment functions having the same dimension as $\theta$. In this case, $D(\tau)\equiv D(h) = -E\,\frac{\partial h}{\partial\theta'}$ and $m(z,\tau) = h(z_t,\theta_0)$.
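• For this particular case where $\phi(z,\tau) = D(\tau)^{-1}m(z,\tau)$,
$$E\phi(z,\tau)\phi(z,\bar\tau)' = E\phi(z,\bar\tau)\phi(z,\bar\tau)' \iff D(\tau)^{-1}\,Em(z,\tau)m(z,\bar\tau)'\left(D(\bar\tau)'\right)^{-1} = D(\bar\tau)^{-1}\,Em(z,\bar\tau)m(z,\bar\tau)'\left(D(\bar\tau)'\right)^{-1}.$$
• If $\bar\tau$ satisfies $D(\tau) = Em(z,\tau)m(z,\bar\tau)'$ for all $\tau$, then both sides above equal $\left(D(\bar\tau)'\right)^{-1}$, so $\hat\theta(\bar\tau)$ is efficient.
• Examples: check $D(\tau) = Em(z,\tau)m(z,\bar\tau)'$.
• GMM with optimal weighting matrix: $D(\tau) = G'WG$, $m(z,\tau) = m(z,W) = G'Wg(z,\theta_0)$. Here $Em(z,W)m(z,\bar W)' = G'W\Omega\bar WG$, so the condition $G'WG = G'W\Omega\bar WG$ holds for all $W$ when $\Omega\bar W = I$, i.e. $\bar W = \Omega^{-1}$.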
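Newey's condition for the optimal weight can be checked numerically. With $\bar W = \Omega^{-1}$, $G'WG = G'W\Omega\bar WG$ for every $W$, and the variance gap $\operatorname{avar}(W) - \operatorname{avar}(\bar W)$ is positive semidefinite. The matrices in this sketch are random placeholders, not estimates from any model.

```python
import numpy as np


def avar(G, W, Omega):
    """(G'WG)^{-1} G'W Omega W G (G'WG)^{-1}."""
    bread = np.linalg.inv(G.T @ W @ G)
    return bread @ G.T @ W @ Omega @ W @ G @ bread


rng = np.random.default_rng(14)
m, k = 5, 2                                   # 5 moment conditions, 2 parameters
G = rng.standard_normal((m, k))
A = rng.standard_normal((m, m))
Omega = A @ A.T + np.eye(m)
W_bar = np.linalg.inv(Omega)                  # candidate efficient index tau_bar = Omega^{-1}
V_bar = avar(G, W_bar, Omega)

cond_gaps, min_eigs = [], []
for _ in range(100):
    B = rng.standard_normal((m, m))
    W = B @ B.T + 0.1 * np.eye(m)             # another member of the class
    # Newey's condition: D(W) = G'WG equals E m(z, W) m(z, W_bar)' = G'W Omega W_bar G
    cond_gaps.append(np.abs(G.T @ W @ G - G.T @ W @ Omega @ W_bar @ G).max())
    # Consequence: avar(W) - avar(W_bar) is positive semidefinite
    min_eigs.append(np.linalg.eigvalsh(avar(G, W, Omega) - V_bar).min())
print(max(cond_gaps), min(min_eigs))
```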
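• MLE is better than any GMM: $D(\tau) = -E\,\frac{\partial h(z,\theta_0)}{\partial\theta'}$, $m(z,\tau) = h(z,\theta_0)$.
To check $D(\tau) = Eh(z,\theta_0)\bar h(z,\theta_0)'$, use the generalized information matrix equality. Since $E_\theta h(z,\theta) = \int h(z,\theta)f(z,\theta)\,dz = 0$ for every $\theta$,
$$\begin{aligned}
0 = \frac{\partial}{\partial\theta'}\int h(z,\theta)f(z,\theta)\,dz &= \int\frac{\partial h(z,\theta)}{\partial\theta'}f(z,\theta)\,dz + \int h(z,\theta)\,\frac{\partial\ln f(z,\theta)}{\partial\theta'}\,f(z,\theta)\,dz\\
&= E\,\frac{\partial h(z,\theta_0)}{\partial\theta'} + E\,h(z,\theta_0)\,\frac{\partial\ln f(z,\theta_0)}{\partial\theta'}\quad\text{at }\theta=\theta_0.
\end{aligned}$$
So the condition holds for every $h$ with $\bar h(z,\theta_0) = \frac{\partial\ln f(z,\theta_0)}{\partial\theta}$, the score function for MLE.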
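As a numerical check of the generalized information equality, take exponential data with rate $\theta_0$. Pair it with a hypothetical just-identified moment $h(y,\theta) = y^2 - 2/\theta^2$, which is not from the lecture. Then $-E\,\partial h/\partial\theta = -4/\theta_0^3$ should equal $E[h\cdot\text{score}]$. The GMM asymptotic variance $1.25\,\theta_0^2$ exceeds the MLE variance $\theta_0^2$, as the argument predicts.

```python
import numpy as np

rng = np.random.default_rng(15)
theta0, n = 2.0, 2_000_000
y = rng.exponential(scale=1.0 / theta0, size=n)      # density theta * exp(-theta * y)

# Hypothetical GMM moment: h(y, theta) = y^2 - 2 / theta^2, so E h = 0 at theta0
h = y ** 2 - 2.0 / theta0 ** 2
score = 1.0 / theta0 - y                              # MLE score d ln f / d theta
dh_dtheta = 4.0 / theta0 ** 3                         # dh/dtheta (nonrandom here)

# Generalized information equality: -E dh/dtheta = E[h * score]
lhs, rhs = -dh_dtheta, np.mean(h * score)

# Asymptotic variances: D(h)^{-1} E[h^2] D(h)^{-1} for GMM, E[score^2]^{-1} for MLE
V_gmm = np.mean(h ** 2) / dh_dtheta ** 2              # theory: 1.25 * theta0^2 = 5.0
V_mle = 1.0 / np.mean(score ** 2)                     # theory: theta0^2 = 4.0
print(lhs, rhs, V_gmm, V_mle)
```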
Two Step Estimator
General Framework:
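• First-step estimator: $\sqrt{n}(\hat\gamma-\gamma_0) = \frac{1}{\sqrt{n}}\sum_{t=1}^n\phi(z_t)+o_p(1)$.
• Estimate $\theta$ by
$$\frac{\partial Q_n(\hat\theta,\hat\gamma)}{\partial\theta} = \frac{1}{n}\sum_{t=1}^n\frac{\partial q(z_t,\hat\theta,\hat\gamma)}{\partial\theta} \equiv \frac{1}{n}\sum_{t=1}^n h(z_t,\hat\theta,\hat\gamma) = 0.$$
• Let
$$H(z,\theta,\gamma) = \frac{\partial h(z,\theta,\gamma)}{\partial\theta'},\qquad \Gamma(z,\theta,\gamma) = \frac{\partial h(z,\theta,\gamma)}{\partial\gamma'};$$
$$H = EH(z_t,\theta_0,\gamma_0),\qquad \Gamma = E\Gamma(z_t,\theta_0,\gamma_0),\qquad h = h(z_t,\theta_0,\gamma_0).$$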
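• Then just Taylor expand:
$$\frac{1}{\sqrt{n}}\sum_{t=1}^n h(z_t,\hat\theta,\hat\gamma) = 0 \iff \frac{1}{\sqrt{n}}\sum_{t=1}^n h(z_t,\theta_0,\hat\gamma) + \frac{1}{n}\sum_{t=1}^n H(z_t,\theta^*,\hat\gamma)\,\sqrt{n}(\hat\theta-\theta_0) = 0 \implies$$
$$\begin{aligned}
\sqrt{n}(\hat\theta-\theta_0) &= -\left[\frac{1}{n}\sum_{t=1}^n H(z_t,\theta^*,\hat\gamma)\right]^{-1}\frac{1}{\sqrt{n}}\sum_{t=1}^n h(z_t,\theta_0,\hat\gamma)\\
&\overset{LD}{=} -H^{-1}\left[\frac{1}{\sqrt{n}}\sum_{t=1}^n h(z_t,\theta_0,\gamma_0) + \frac{1}{n}\sum_{t=1}^n\Gamma(z_t,\theta_0,\gamma^*)\,\sqrt{n}(\hat\gamma-\gamma_0)\right]\\
&\overset{LD}{=} -H^{-1}\left[\frac{1}{\sqrt{n}}\sum_{t=1}^n h + \Gamma\left(\frac{1}{\sqrt{n}}\sum_{t=1}^n\phi(z_t)+o_p(1)\right)\right]\\
&\overset{LD}{=} -H^{-1}\left[\frac{1}{\sqrt{n}}\sum_{t=1}^n h + \Gamma\,\frac{1}{\sqrt{n}}\sum_{t=1}^n\phi(z_t)\right].
\end{aligned}$$
So $\sqrt{n}(\hat\theta-\theta_0)\xrightarrow{d}N(0,V)$ for
$$V = H^{-1}E(h+\Gamma\phi)(h'+\phi'\Gamma')H^{-1\prime}.$$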
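A minimal two-step example shows why the $\Gamma\phi$ term matters. It assumes a hypothetical ratio estimator. The first step is $\hat\gamma = \bar x$, so $\phi = x - \gamma_0$. The second step solves $\frac{1}{n}\sum(y_t - \theta\hat\gamma) = 0$, i.e. $h = y - \theta\gamma$, which is $\partial q/\partial\theta$ for $q = -(y-\theta\gamma)^2/(2\gamma)$. Then $H = -\gamma_0$, $\Gamma = -\theta_0$, and $V = E(y-\theta_0x)^2/\gamma_0^2$. Ignoring the first step overstates the variance here.

```python
import numpy as np

rng = np.random.default_rng(16)
n, reps = 500, 4000
x = rng.uniform(1, 3, (reps, n))                 # gamma0 = E x = 2
y = 2.0 * x + rng.standard_normal((reps, n))    # theta0 = E y / E x = 2

gamma_hat = x.mean(axis=1)                      # first step: phi = x - gamma0
theta_hat = y.mean(axis=1) / gamma_hat          # second step: mean(y - theta * gamma_hat) = 0
mc_var = n * theta_hat.var()

gamma0, theta0 = 2.0, 2.0
var_x, var_e = 4.0 / 12.0, 1.0                  # Var of U(1, 3) and of the noise
# h + Gamma * phi = (y - theta0 gamma0) - theta0 (x - gamma0) = y - theta0 x
V_two_step = var_e / gamma0 ** 2                # H^{-2} E[(h + Gamma phi)^2] = 0.25
# Treating gamma_hat as if it were gamma0: H^{-2} E[h^2]
V_naive = (theta0 ** 2 * var_x + var_e) / gamma0 ** 2
print(mc_var, V_two_step, V_naive)
```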
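• GMM in both the first stage $\hat\gamma$ and the second stage $\hat\theta$:
• $\phi = -M^{-1}m(z)$ for some moment condition $m(z,\gamma)$, with $M = E\,\frac{\partial m(z,\gamma_0)}{\partial\gamma'}$.
• $h(\theta,\gamma) = G'Wg(z,\theta,\gamma)$, so that $H = G'WG$ and $\Gamma = G'W\,E\frac{\partial g}{\partial\gamma'}\equiv G'WG_\gamma$ for $G_\gamma\equiv E\,\frac{\partial g(z,\theta_0,\gamma_0)}{\partial\gamma'}$.
• Plug these into the general case above.
• If $W = I$ and $G$ is invertible, this simplifies to
$$V = G^{-1}\left[\Omega + (Eg\phi')G_\gamma' + G_\gamma(E\phi g') + G_\gamma(E\phi\phi')G_\gamma'\right]G^{-1\prime}.$$
• Again, if you have trouble differentiating $g(z,\theta,\gamma)$ with respect to $\theta$ or $\gamma$, simply take the expectation before differentiating: use $\frac{\partial Eg(z,\theta,\gamma)}{\partial\theta'}$ and $\frac{\partial Eg(z,\theta,\gamma)}{\partial\gamma'}$, evaluated at $(\theta_0,\gamma_0)$, in place of $G$ and $G_\gamma$ when forming $H$ and $\Gamma$.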
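The $W = I$ simplification can be verified as a matrix identity. Build $H = G'G$ and $\Gamma = G'G_\gamma$, form the general $V = H^{-1}E(h+\Gamma\phi)(h+\Gamma\phi)'H^{-1\prime}$, and compare it with the simplified expression. The joint second moments of $(g,\phi)$ and the matrices $G$, $G_\gamma$ are random placeholders in this sketch.

```python
import numpy as np

rng = np.random.default_rng(17)
k, p = 3, 2                                    # dim(theta) = dim(g) = 3, dim(gamma) = 2
G = rng.standard_normal((k, k)) + 3 * np.eye(k)    # square and (here) invertible
G_gamma = rng.standard_normal((k, p))
# Joint second moments of (g, phi): any positive definite (k + p) x (k + p) matrix
A = rng.standard_normal((k + p, k + p))
S = A @ A.T + np.eye(k + p)
Omega, E_gphi = S[:k, :k], S[:k, k:]           # E gg', E g phi'
E_phig, E_phiphi = S[k:, :k], S[k:, k:]        # E phi g', E phi phi'

# General two-step formula with W = I: h = G'g, H = G'G, Gamma = G'G_gamma
H, Gamma = G.T @ G, G.T @ G_gamma
E_h_Gphi = G.T @ E_gphi @ Gamma.T               # E h (Gamma phi)'
meat = G.T @ Omega @ G + E_h_Gphi + E_h_Gphi.T + Gamma @ E_phiphi @ Gamma.T
H_inv = np.linalg.inv(H)
V_general = H_inv @ meat @ H_inv.T

G_inv = np.linalg.inv(G)
V_simple = G_inv @ (Omega + E_gphi @ G_gamma.T + G_gamma @ E_phig
                    + G_gamma @ E_phiphi @ G_gamma.T) @ G_inv.T
```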