Chapter 3. GMM: Selected Topics
(people.bu.edu/qu/EC709-2012/chapter03.pdf, 2012.10.16)
Contents

1 Optimal Instruments
  1.1 The issue of interest
  1.2 Optimal Instruments under the i.i.d. assumption
    1.2.1 The basic result
    1.2.2 Illustrative examples
  1.3 Optimal Instruments under the martingale difference assumption
  1.4 Optimal Instruments under general dependence
2 Finite sample properties
  2.1 Introduction
  2.2 The size of GMM-based Wald tests
  2.3 Bootstrap under the martingale difference assumption
    2.3.1 Bootstrapping the GMM estimator
    2.3.2 Bootstrapping the test statistics
  2.4 Bootstrap under general serial dependence
3 Weak identification: a pitfall and some ways around
  3.1 Introduction
  3.2 Some consequences of weak instruments in linear models: a scary regression
  3.3 Robust Inference with weak instruments in linear models
  3.4 Robust Inference in nonlinear models
1. Optimal Instruments
1.1. The issue of interest
When applying GMM in macroeconomics, a typical but crucial step is to transform conditional moment restrictions into unconditional ones.

Specifically, suppose a model delivers the following conditional moment restrictions:

E(d_t(\theta_0) \mid z_t) = 0, \quad t = 1, 2, \ldots,   (1)

where d_t(\theta_0) and z_t are finite dimensional random vectors. (Here we have simplified the matter by restricting z_t to be finite dimensional.) This implies that, for any measurable and integrable function f(\cdot), we always have

E(f(z_t) d_t(\theta_0)) = 0, \quad t = 1, 2, \ldots.   (2)

We end up with an infinite number of valid instruments, or equivalently, an infinite number of unconditional moment restrictions.

Question: which of these instruments should we use if the goal is to minimize the asymptotic variance of the GMM estimator?
1.2. Optimal Instruments under the i.i.d. assumption

1.2.1. The basic result

• Assumption 1: (d_t(\theta_0)', z_t')' are independently and identically distributed over t = 1, \ldots, T.

Proposition 1. Assume (1) and Assumption 1 hold. Suppose \theta_0 is an unknown q by 1 parameter vector.

1. An optimal choice of the instruments (i.e., minimizing the asymptotic variance among all GMM estimators) in (2) is given by

z_t^* = K \, E\left( \frac{\partial d_t(\theta_0)'}{\partial \theta} \,\Big|\, z_t \right) \Sigma^{*-1},   (3)

where K is any q by q nonsingular matrix of finite constants, and

\Sigma^* = E\left( d_t(\theta_0) d_t(\theta_0)' \,\big|\, z_t \right).   (4)

2. The resulting GMM estimator solves

\frac{1}{T} \sum_{t=1}^{T} z_t^* d_t(\hat\theta) = 0,

whose asymptotic covariance matrix is given by

V^* = \left\{ E\left[ E\left( \frac{\partial d_t(\theta_0)'}{\partial \theta} \,\Big|\, z_t \right) \Sigma^{*-1} E\left( \frac{\partial d_t(\theta_0)}{\partial \theta'} \,\Big|\, z_t \right) \right] \right\}^{-1}.

1.2.2. Illustrative examples
Example 1. Consider the linear model

y_t = x_t \beta + u_t,

where x_t is a scalar random variable with E(x_t u_t) \neq 0. Suppose z_t is a set of instruments satisfying

E(u_t \mid z_t) = 0,
var(u_t \mid z_t) = \sigma^2,

and (y_t, x_t, z_t) are i.i.d. Then, applying (3) and (4),

z_t^* = k \, E\left( \frac{\partial (y_t - x_t \beta)}{\partial \beta} \,\Big|\, z_t \right) \frac{1}{\sigma^2} = -\frac{k}{\sigma^2} E(x_t \mid z_t),

where k is an arbitrary nonzero constant. Since k can take any value, we are free to set k = -\sigma^2, in which case the optimal instrument reduces to

z_t^* = E(x_t \mid z_t).

Now we generalize the model to allow for heteroskedasticity, i.e., we assume

var(u_t \mid z_t) = \sigma_t^2,

but leave all other aspects of the specification the same. In this case, the optimal instrument takes the form

z_t^* = k \, E\left( \frac{\partial (y_t - x_t \beta)}{\partial \beta} \,\Big|\, z_t \right) \frac{1}{\sigma_t^2} = -\frac{k}{\sigma_t^2} E(x_t \mid z_t).

Here it is not possible to eliminate \sigma_t^2 by a judicious choice of k, although we can set k = -1 to remove the minus sign. This gives

z_t^* = \frac{1}{\sigma_t^2} E(x_t \mid z_t).
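The efficiency gain from the 1/\sigma_t^2 weighting can be checked by simulation. The sketch below is illustrative and not from the notes: the DGP, with E(x_t|z_t) = z_t and var(u_t|z_t) = 0.1 + z_t^2, is an assumption chosen so that both the naive and the optimal instrument are available in closed form.

```python
# Monte Carlo sketch: IV estimation of beta with the naive instrument z_t
# versus the optimal instrument z*_t = E(x_t|z_t)/sigma_t^2.
# DGP (illustrative): E(x_t|z_t) = z_t and var(u_t|z_t) = 0.1 + z_t^2.
import numpy as np

rng = np.random.default_rng(0)
beta0, n_rep, T = 1.0, 500, 500
est_naive, est_opt = [], []

for _ in range(n_rep):
    z = rng.standard_normal(T)
    x = z + rng.standard_normal(T)                # E(x|z) = z
    sig2 = 0.1 + z**2                             # conditional variance of u
    u = np.sqrt(sig2) * rng.standard_normal(T)
    y = x * beta0 + u
    # each instrument w solves sum_t w_t (y_t - x_t beta) = 0
    est_naive.append(np.sum(z * y) / np.sum(z * x))
    est_opt.append(np.sum((z / sig2) * y) / np.sum((z / sig2) * x))

v_naive, v_opt = np.var(est_naive), np.var(est_opt)
print(v_opt < v_naive)  # the optimally weighted instrument is more precise
```

Both estimators are consistent; the point of the exercise is only that the Monte Carlo variance of the optimally instrumented one is smaller.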
Example 2. Consider the following linear two-equation system:

y_{1,t} = x_{1,t} \beta_1 + u_{1,t},
y_{2,t} = x_{2,t} \beta_2 + u_{2,t},

where x_{1,t} and x_{2,t} are scalar random variables. Let

u_t = (u_{1,t}, u_{2,t})'.

Suppose z_t is a set of variables satisfying

E(u_t \mid z_t) = 0,
var(u_t \mid z_t) = \Sigma,

and (y_t, x_t, z_t) are i.i.d. Then, applying (3) and (4),

z_t^* = K \, E\left( \begin{bmatrix} \frac{\partial (y_{1,t} - x_{1,t}\beta_1)}{\partial \beta_1} & 0 \\ 0 & \frac{\partial (y_{2,t} - x_{2,t}\beta_2)}{\partial \beta_2} \end{bmatrix} \,\Big|\, z_t \right) \Sigma^{-1}
= -K \begin{bmatrix} E(x_{1,t} \mid z_t) & 0 \\ 0 & E(x_{2,t} \mid z_t) \end{bmatrix} \Sigma^{-1},

where K is a matrix of constants. In this case, the matrix \Sigma^{-1} weights E(x_{1,t} \mid z_t) and E(x_{2,t} \mid z_t) to account for the correlation between the two equations.
While the Proposition characterizes the optimal instruments, it does not fully resolve the problem of instrument selection. The function z_t^* depends on E(\partial d_t(\theta_0)'/\partial\theta \mid z_t) and, in most cases, on \Sigma^* as well (see the above examples); neither of these functions is typically part of the specification of the underlying economic/statistical model. One natural solution is to estimate the components E(\partial d_t(\theta_0)'/\partial\theta \mid z_t) and \Sigma^* from the data. In some cases this works in a straightforward manner, while in general it is complicated. We now present one example in which the proposal works.
Example 3. Consider again the linear model studied in the first example, with the further assumption that x_t is generated by a linear model

x_t = z_t' \pi + v_t.   (5)

With this specification, an optimal instrument is

E(x_t \mid z_t) = z_t' \pi.

Hence, the optimal GMM estimator solves

\sum_{t=1}^{T} (z_t' \pi)(y_t - x_t \beta) = 0.

Since \pi is unknown, we replace it by the OLS estimate based on equation (5):

\hat\pi = (Z'Z)^{-1} Z'X.

Substituting in, we solve

\sum_{t=1}^{T} (z_t' \hat\pi)(y_t - x_t \beta) = 0.

Explicitly,

\hat\beta = \left( \sum_{t=1}^{T} (z_t' \hat\pi) x_t \right)^{-1} \sum_{t=1}^{T} (z_t' \hat\pi) y_t = (X' P_Z X)^{-1} X' P_Z y.

This is precisely the two-stage least squares (2SLS) estimator. Therefore, the 2SLS estimator can be interpreted as the feasible optimal GMM estimator within this model.
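The algebra above can be verified numerically. In this sketch (simulated data; the DGP and all parameter values are illustrative assumptions), the GMM estimator built from the first-stage fitted values z_t'\hat\pi coincides with the direct 2SLS formula (X'P_Z X)^{-1} X'P_Z y.

```python
# Numerical check: feasible optimal GMM with instrument z_t' pi_hat == 2SLS.
import numpy as np

rng = np.random.default_rng(1)
T, k, beta0 = 200, 3, 2.0
Z = rng.standard_normal((T, k))
pi = np.array([1.0, 0.5, -0.5])
v = rng.standard_normal(T)
u = 0.8 * v + rng.standard_normal(T)          # endogeneity: corr(u, v) != 0
x = Z @ pi + v
y = x * beta0 + u

# feasible optimal instrument: z*_t = z_t' pi_hat from the first-stage OLS
pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ x)
zstar = Z @ pi_hat
beta_gmm = np.sum(zstar * y) / np.sum(zstar * x)

# direct 2SLS: (X' P_Z X)^{-1} X' P_Z y
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta_2sls = (x @ PZ @ y) / (x @ PZ @ x)

print(np.isclose(beta_gmm, beta_2sls))  # True: identical up to rounding
```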
In the preceding example, the construction of the optimal instruments rests crucially on the assumption that E(x_t \mid z_t) is linear. This specification may be natural in some contexts, such as the linear simultaneous equations model, but may not be so appropriate in others. In the general case, we can use non-parametric methods to estimate E(x_t \mid z_t), e.g., approximate the conditional expectation by a polynomial. For further discussion in this direction, see Newey (1990, 1993).
1.3. Optimal Instruments under the martingale difference assumption

We now relax Assumption 1 and present a similar result for dynamic models. Suppose the solution of a model delivers the following moment restrictions:

E(d_t(\theta_0) \mid I_{t-1}) = 0,   (6)

where I_{t-1} is the information set at time t-1.

Corollary 1. Assume (6) holds. Suppose \theta_0 is of dimension q.

1. An optimal choice of instruments in (2) is given by

z_t^* = K \, E\left( \frac{\partial d_t(\theta_0)'}{\partial \theta} \,\Big|\, I_{t-1} \right) \Sigma^{*-1},

where

\Sigma^* = E\left( d_t(\theta_0) d_t(\theta_0)' \,\big|\, I_{t-1} \right),

and K is any q by q nonsingular matrix of finite constants.

2. This choice leads to a GMM estimator with asymptotic variance matrix

V^* = \left\{ E\left[ E\left( \frac{\partial d_t(\theta_0)'}{\partial \theta} \,\Big|\, I_{t-1} \right) \Sigma^{*-1} E\left( \frac{\partial d_t(\theta_0)}{\partial \theta'} \,\Big|\, I_{t-1} \right) \right] \right\}^{-1}.

1.4. Optimal Instruments under general dependence
Once serial correlation is introduced, which is the case if we have moment conditions

E(d_t(\theta_0) \mid I_{t-m}) = 0 with m > 1,

the form of the optimal instrument changes, although the basic idea remains the same. The result is of limited use from a practical point of view; we therefore omit the details. See Hall (2005, pp. 247-251).
2. Finite sample properties
2.1. Introduction
Our discussions so far have been asymptotic in nature. In practice, we always face a finite sample. The following two issues are therefore of particular importance:

1. Finite sample properties of the GMM estimator, e.g., finite sample bias and MSE;

2. Finite sample properties of GMM-based inference procedures.

Here we focus on the second issue. We will first examine the finite sample properties of the Wald tests when asymptotic critical values are used for inference. Then, we will discuss a bootstrap procedure, which can improve the inference under some circumstances.

2.2. The size of GMM-based Wald tests

The discussion below is based on the simulation analysis in Burnside and Eichenbaum (1996, henceforth BE). They asked the following questions:

1. Does the finite sample size of the tests closely approximate their asymptotic size?

2. Do joint tests of several restrictions perform as well as, or worse than, tests of simple hypotheses, and what is responsible for the size distortions?

3. How can modelling assumptions, or restrictions imposed by the hypotheses themselves, be used to improve the performance of these tests?

4. What practical advice can be given to the practitioner?
BE considered two simulation experiments. In the first, the data are generated by Gaussian vector white noise, and in the second the DGP is taken from Burnside and Eichenbaum (1994). The findings from the two experiments are similar; we focus on the first experiment due to its simplicity.

• DGP: X_{it} \sim i.i.d.\ N(0, \sigma_i^2), i = 1, \ldots, n, t = 1, \ldots, T, with n = 20, T = 100, and \sigma_1^2 = \cdots = \sigma_n^2 = 1.

• Parameters: the econometrician knows E(X_{it}) = 0 and is interested in estimating \sigma_i^2 \equiv Var(X_{it}).

• Moment conditions: E(X_{it}^2 - \sigma_i^2) = 0, i = 1, \ldots, n.

• GMM estimates: \hat\sigma_i = \left( T^{-1} \sum_{t=1}^{T} X_{it}^2 \right)^{1/2}.

• Hypotheses of interest: H_M: \sigma_1 = \cdots = \sigma_M = 1, with M \le n. BE considered M \in \{1, 2, 5, 10, 20\}.

• Wald tests:

W_{MT} = T (\hat\sigma - 1)' A' \left( A V_T A' \right)^{-1} A (\hat\sigma - 1),   (7)

where A = (I_M \; 0_{M \times (n-M)}), \hat\sigma = (\hat\sigma_1, \ldots, \hat\sigma_n)', and V_T denotes a generic estimator of the asymptotic variance-covariance matrix of \sqrt{T}(\hat\sigma - 1), i.e.,

\lim_{T\to\infty} V_T = \left( G_0' S_0^{-1} G_0 \right)^{-1}.

Note that

the i-th diagonal element of G_0 is E\, \partial(X_{it}^2 - \sigma_i^2)/\partial \sigma_i = -2\sigma_i,

the ij-th element of S_0 is E(X_{it}^2 - \sigma_i^2)(X_{jt}^2 - \sigma_j^2).

Also note that

W_{MT} \to_d \chi^2_M under H_M.
• Alternative covariance matrix estimators V_T:

1. Allow the data to be dependent, and estimate S_0 using the Newey and West (1987) estimator with bandwidth B_T = 4;

2. Allow the data to be dependent, and estimate S_0 using the Newey and West (1987) estimator with bandwidth B_T = 2;

3. Allow the data to be dependent, and estimate S_0 using the Newey and West (1987) estimator, with the bandwidth determined using Andrews (1991);

4. Exploit the assumption that the data are serially uncorrelated. Thus, the ij-th element of S_0 is estimated by T^{-1} \sum_{t=1}^{T} (X_{it}^2 - \hat\sigma_i^2)(X_{jt}^2 - \hat\sigma_j^2);

5. Exploit the assumption that the data are serially uncorrelated and mutually independent. The ii-th element of S_0 is estimated by T^{-1} \sum_{t=1}^{T} (X_{it}^2 - \hat\sigma_i^2)^2; the off-diagonal elements are zero;

6. Impose Gaussianity. The ii-th element of S_0 is estimated by 2\hat\sigma_i^4; the off-diagonal elements are zero;

7. Impose the null hypotheses on S_0. The ii-th element of S_0 is 2 for i \le n; the off-diagonal elements are zero;

8. Impose the null hypotheses on S_0 and G_0. The ii-th element of S_0 is 2 for i \le n; the off-diagonal elements are zero. The i-th diagonal element of G_0 is -2 for i \le n.

The results are reported in Table 1.
Hence, the conclusions and suggestions to the practitioner are:

1. The small sample size of the Wald tests tends to exceed the asymptotic size. The problem becomes dramatically worse as the dimension of the joint test being considered increases;

2. The bulk of the problem has to do with the difficulty in estimating the variance-covariance matrix S_0. In their second simulation experiment, BE further document that the bias in estimating m_T(\theta_0) and the correlation between m_T(\hat\theta) and S_T are not the main contributors to the size distortions;

3. In practice, to improve the size properties, it is useful to impose a priori information when estimating S_0. Two important sources of such information are the economic theory being investigated and the null hypothesis being tested.
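A small simulation in the spirit of BE's first experiment illustrates these conclusions. The code below is an illustrative re-implementation, not BE's: it computes the joint Wald test with M = n = 20 under the null, once with a data-based diagonal estimate of S_0 (in the spirit of their estimator 5) and once with the null-imposed S_0 and G_0 (their estimator 8).

```python
# Empirical size of the joint Wald test W_MT with M = n = 20 under H_M,
# comparing a data-based diagonal S_0 with the null-imposed S_0 and G_0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, T, M, n_rep = 20, 100, 20, 2000
crit = stats.chi2.ppf(0.95, df=M)
rej_est = rej_null = 0

for _ in range(n_rep):
    X = rng.standard_normal((T, n))            # H_M is true: sigma_i^2 = 1
    sig2_hat = np.mean(X**2, axis=0)
    d = np.sqrt(sig2_hat) - 1.0                # A(sigma_hat - 1) with M = n

    # (a) S_ii estimated, G_ii = -2 sigma_hat_i => V_ii = S_ii/(4 sigma_hat_i^2)
    S_ii = np.mean((X**2 - sig2_hat)**2, axis=0)
    rej_est += T * np.sum(d**2 * 4.0 * sig2_hat / S_ii) > crit

    # (b) null-imposed: S_ii = 2, G_ii = -2 => V_ii = 1/2
    rej_null += T * np.sum(d**2 / 0.5) > crit

print(rej_est / n_rep, rej_null / n_rep)
```

In runs of this kind, the data-based version over-rejects relative to the null-imposed one, consistent with conclusions 2 and 3 above.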
2.3. Bootstrap under the martingale difference assumption

The bootstrap is an alternative way to approximate the sampling distribution of an estimator or a test statistic.

Suppose we have the following moment restrictions:

E(m(X_t, \theta_0)) = 0, \quad t = 1, 2, \ldots, T.

Assume the m(X_t, \theta_0) are serially uncorrelated.

We now show how to use the bootstrap to approximate the sampling distribution of the two-step GMM estimator and related test statistics.

2.3.1. Bootstrapping the GMM estimator

Recall

\hat\theta = \arg\min_\theta \left( \frac{1}{T} \sum_{t=1}^{T} m(X_t, \theta) \right)' S_T(\hat\theta_1)^{-1} \left( \frac{1}{T} \sum_{t=1}^{T} m(X_t, \theta) \right),

where

S_T(\hat\theta_1) = \frac{1}{T} \sum_{t=1}^{T} m(X_t, \hat\theta_1) m(X_t, \hat\theta_1)'

and \hat\theta_1 is some preliminary GMM estimator, say, the GMM estimator using an identity weighting matrix.

Then the bootstrap approximation to the sampling distribution of \hat\theta can be obtained as follows.

• Step 1: Draw a sample of size T with replacement from the observed sample \{X_1, \ldots, X_T\}; denote the sample of draws as \{X_1^*, \ldots, X_T^*\}.

• Step 2: Compute the GMM estimator using the resampled data, i.e.,

\hat\theta^* = \arg\min_\theta \left( \frac{1}{T} \sum_{t=1}^{T} m^*(X_t^*, \theta) \right)' S_T^*(\hat\theta_1^*)^{-1} \left( \frac{1}{T} \sum_{t=1}^{T} m^*(X_t^*, \theta) \right),

where

m^*(X_t^*, \theta) = m(X_t^*, \theta) - \frac{1}{T} \sum_{t=1}^{T} m(X_t, \hat\theta),

S_T^*(\theta) = \frac{1}{T} \sum_{t=1}^{T} m^*(X_t^*, \theta) m^*(X_t^*, \theta)',

and \hat\theta_1^* is some preliminary GMM estimator computed from the resampled data, say, the GMM estimator with an identity weighting matrix.

• Step 3: Repeat Steps 1 and 2 many times (say B times) to obtain a set of estimates; call them \hat\theta^{*(1)}, \ldots, \hat\theta^{*(B)}. We then use the distribution of \sqrt{T}(\hat\theta^{*(j)} - \hat\theta) as an approximation to the sampling distribution of \sqrt{T}(\hat\theta - \theta_0).
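The three steps above can be sketched in code for a toy over-identified model. Everything here is an illustrative assumption, not from the notes: the moment vector m(X, \theta) = (X - \theta, (X - \theta)^2 - 1)', the sample size, and the number of bootstrap draws.

```python
# Bootstrap sketch for the two-step GMM estimator: 2 moments, 1 parameter.
import numpy as np
from scipy.optimize import minimize_scalar

def m(X, theta):
    e = X - theta
    return np.column_stack([e, e**2 - 1.0])        # T x 2 matrix of moments

def two_step_gmm(X, center=0.0):
    """Two-step GMM; `center` recenters the moments (used in the bootstrap)."""
    def obj(theta, S):
        gbar = (m(X, theta) - center).mean(axis=0)
        return gbar @ np.linalg.solve(S, gbar)
    # step 1: identity weighting; step 2: weight by S_T(theta_1)^{-1}
    th1 = minimize_scalar(lambda t: obj(t, np.eye(2)),
                          bounds=(-10, 10), method="bounded").x
    g1 = m(X, th1) - center
    S = g1.T @ g1 / len(X)
    return minimize_scalar(lambda t: obj(t, S),
                           bounds=(-10, 10), method="bounded").x

rng = np.random.default_rng(3)
T, B, theta0 = 200, 199, 0.5
X = rng.standard_normal(T) + theta0
theta_hat = two_step_gmm(X)

# Steps 1-3: resample with replacement, recenter at T^{-1} sum_t m(X_t, theta_hat)
# from the ORIGINAL sample, re-estimate, collect sqrt(T)(theta*_j - theta_hat)
mu = m(X, theta_hat).mean(axis=0)
draws = [np.sqrt(T) * (two_step_gmm(X[rng.integers(0, T, T)], center=mu) - theta_hat)
         for _ in range(B)]

print(round(theta_hat, 3), round(float(np.std(draws)), 2))
```

The spread of the draws approximates the sampling standard deviation of \sqrt{T}(\hat\theta - \theta_0); quantiles of the draws give bootstrap critical values.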
2.3.2. Bootstrapping the test statistics

We can approximate the sampling distributions of commonly used test statistics using the bootstrap. We focus on the t, the Wald, and the J statistics.

The t-statistic. Recall that the t-statistic for testing whether the r-th component of \theta_0 equals some constant is given by

\frac{\sqrt{T}(\hat\theta - \theta_0)_r}{\sqrt{V_T(\hat\theta)_{r,r}}},   (8)

where (\hat\theta - \theta_0)_r denotes the r-th component of \hat\theta - \theta_0 and V_T(\hat\theta)_{r,r} denotes the (r,r)-th component of V_T, with

V_T(\hat\theta) = \left[ G_T(\hat\theta)' S_T(\hat\theta)^{-1} G_T(\hat\theta) \right]^{-1}.   (9)

The distribution of (8) can be approximated by the empirical distribution of

\frac{\sqrt{T}(\hat\theta^* - \hat\theta)_r}{\sqrt{V_T^*(\hat\theta^*)_{r,r}}},

where the formula for V_T^*(\hat\theta^*)_{r,r} is given in (9), with \hat\theta replaced by \hat\theta^*.
The Wald statistic. (I will leave this as an exercise.) As another exercise, bootstrap the statistic (7) and compare with Table 1 (d to f only).

The J statistic. Simply compute

J^* = T \left( \frac{1}{T} \sum_{t=1}^{T} m^*(X_t^*, \hat\theta^*) \right)' S_T^*(\hat\theta_1^*)^{-1} \left( \frac{1}{T} \sum_{t=1}^{T} m^*(X_t^*, \hat\theta^*) \right)

and repeat. Use the resulting empirical distribution to approximate the distribution of J.
2.4. Bootstrap under general serial dependence
The extension to this case is not straightforward. The available procedures are complicated and do not work very satisfactorily in practice. Interested readers can consult Hall, P., and J. L. Horowitz (1996): "Bootstrap Critical Values for Tests Based on Generalized-Method-of-Moment Estimators," Econometrica, 64, 891-916.
3. Weak identification: a pitfall and some ways around

3.1. Introduction

Recall that for the linear instrumental variable regression to work, we need a set of instruments that are both valid (uncorrelated with the errors) and relevant (correlated with the endogenous regressors).

For GMM, we say instruments z_t are valid (or exogenous) if they satisfy the moment restrictions E(z_t d_t(\theta_0)) = 0. The requirement of "instrument relevance" is replaced by "identification", which is satisfied if

E(z_t d_t(\theta)) \neq 0 for \theta \neq \theta_0.

"Weak instruments" or "weak identification" arise if the instruments are only weakly correlated with the included endogenous variables. This poses considerable challenges to inference using GMM and IV methods.

Below, we

1. use a linear model to illustrate such consequences;

2. discuss how to conduct inference for linear models with potentially weak instruments;

3. briefly discuss how to conduct inference for nonlinear models with weak identification.
3.2. Some consequences of weak instruments in linear models: a scary regression

Many papers have been written trying to measure the return to years of education. The setting usually involves estimating some wage equation of the form

y_i = Y_i \beta + x_{2,i}' \gamma + \varepsilon_i,

where y_i is some measure of the income of individual i, Y_i is the years of education, and x_{2,i} is a vector of covariates. The difficulty is that the educational achievement Y_i is endogenous. A popular solution is to use instruments which generate variation in Y_i but are otherwise uncorrelated with y_i.

Angrist and Krueger (1991) is a famous example. They argued that the quarter of birth is a valid instrument. The idea is that the compulsory school attendance law requires a student to start first grade in the fall of the calendar year in which he or she turns age 6 and to continue attending school until he or she turns 16. Thus an individual born in the early months of the year will usually enter first grade when he or she is close to age 7 and will reach age 16 in the middle of tenth grade. An individual born in the third or fourth quarter will typically start school either just before or just after turning age 6 and will finish tenth grade before reaching age 16. They presented several tabulations to show that individuals born in the early months of the year on average have fewer years of education. They estimated the wage equation and concluded that educational attainment has a significant effect on earnings, with a magnitude close to the one estimated with OLS.
While it can be argued that the quarter of birth itself may have a direct effect on earnings, a more relevant concern is that the relationship between quarter of birth and educational attainment may be very weak. Bound, Jaeger and Baker (1995) show this may well be the case (see their Table 1). Hence, the question is how misleading the results can be in such a situation.

To address this problem Bound, Jaeger and Baker (1995) did something very clever (following a suggestion of Krueger). They replaced each individual's real quarter of birth by a fake quarter of birth, randomly generated by a computer. What they found was amazing: it didn't matter whether you used the real quarter of birth or the fake one as the instrument; 2SLS gave basically the same answer! The detailed results are reported in Bound, Jaeger and Baker (1995).

The intuition behind the results can be illustrated by a simple example (taken from Hansen (2006), with some changes):

y_i = x_i \beta + \varepsilon_i,
x_i = z_i \gamma + v_i,

where the z_i are fixed and v_i is a zero mean error term. For simplicity, assume the variables \varepsilon_i and v_i are normally distributed and that \varepsilon_i is independent of v_i. We have

\hat\beta_{IV} - \beta_0 = \frac{\sum_{i=1}^{n} z_i \varepsilon_i}{\sum_{i=1}^{n} z_i x_i}.

Now, suppose the instrument and the endogenous variable are not correlated, i.e., \gamma = 0. Then, for a given n,

\frac{1}{\sqrt{n}} \sum_{i=1}^{n} z_i \varepsilon_i \sim N_1 = N(0, E(z_i^2 \varepsilon_i^2)),

\frac{1}{\sqrt{n}} \sum_{i=1}^{n} z_i x_i = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} z_i v_i \sim N_2 = N(0, E(z_i^2 v_i^2)),

therefore

\hat\beta_{IV} - \beta_0 \sim \frac{N_1}{N_2}.

The above result holds for any n; it also holds when n \to \infty. The distribution is drastically different from the standard normal approximation, and standard inference is invalid. In particular, in the presence of identification failure, \sqrt{n}(\hat\beta_{IV} - \beta_0) diverges, hence severe over-rejection of the null hypothesis will occur if standard critical values are used.
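The ratio-of-normals result is easy to see by simulation. In the sketch below (all DGP choices are illustrative), \gamma = 0 and the spread of \hat\beta_{IV} - \beta_0 is essentially unchanged when n grows from 100 to 10,000, which is exactly why \sqrt{n}(\hat\beta_{IV} - \beta_0) diverges.

```python
# With an irrelevant instrument (gamma = 0), beta_IV - beta_0 is a ratio of
# normals whose spread does NOT shrink as n grows.
import numpy as np

rng = np.random.default_rng(4)
beta0 = 1.0

def iv_errors(n, n_rep=1000):
    out = np.empty(n_rep)
    for j in range(n_rep):
        z = rng.standard_normal(n)
        v = rng.standard_normal(n)
        eps = rng.standard_normal(n)      # independent of v, as in the text
        x = 0.0 * z + v                    # gamma = 0: irrelevant instrument
        y = x * beta0 + eps
        out[j] = np.sum(z * y) / np.sum(z * x) - beta0   # beta_IV - beta_0
    return out

iqr = lambda a: np.percentile(a, 75) - np.percentile(a, 25)
small, large = iv_errors(100), iv_errors(10000)
print(iqr(small), iqr(large))  # comparable spreads despite the 100x larger n
```

Conditional on the instruments, the ratio here is exactly a standard Cauchy random variable, so its interquartile range is the same for every n.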
3.3. Robust Inference with weak instruments in linear models

A partial solution to the above problem is to use test statistics that are not sensitive to the strength of the instruments (this excludes the t and F statistics). Three statistics have attracted wide attention: the Anderson-Rubin (AR) statistic, Kleibergen's LM statistic, and Moreira's conditional likelihood ratio statistic. Asymptotically, these statistics all have distributions that do not depend on the strength of the instruments.

We consider a linear regression model with a single endogenous regressor and no included exogenous variables:

y = Y\beta + u

with

Y = Z\Pi + v,

where y and Y are T by 1 vectors of observations on the endogenous variables and Z is a T by k matrix of instruments. It is useful to define the following two quantities:

\bar{S} = \frac{(Z'Z)^{-1/2} Z' \bar{Y} b_0}{\sqrt{b_0' \Omega b_0}}   (10)

and

\bar{T} = \frac{(Z'Z)^{-1/2} Z' \bar{Y} \Omega^{-1} a_0}{\sqrt{a_0' \Omega^{-1} a_0}},   (11)

where

\bar{Y} = [y, Y], \quad b_0 = [1, -\beta_0]', \quad a_0 = [\beta_0, 1]',

and \Omega is the variance of the reduced form errors. (11) is a sufficient statistic for \Pi. Let \hat{T} and \hat{S} denote \bar{T} and \bar{S} evaluated with \hat\Omega = \bar{Y}' M_Z \bar{Y} / (T - k) replacing \Omega.
The Anderson-Rubin statistic. Anderson and Rubin (1949) proposed testing the null hypothesis \beta = \beta_0 using the statistic

AR(\beta_0) = \frac{(y - Y\beta_0)' P_Z (y - Y\beta_0)/k}{(y - Y\beta_0)' M_Z (y - Y\beta_0)/(T - k)} = \frac{\hat{S}'\hat{S}}{k}.

With fixed instruments and normal errors, the quadratic forms in the numerator and denominator are independent chi-squared random variables under the null hypothesis, and AR(\beta_0) has an exact F_{k, T-k} null distribution. Dropping the Gaussian assumption, we have

AR(\beta_0) \to_d \chi^2_k / k.

Because the numerator and denominator of the Anderson-Rubin statistic are evaluated at the true parameter value, it has an asymptotic chi-square distribution even if the unknown parameters are poorly identified.
Kleibergen's statistic. Kleibergen (2002) proposed the statistic

K(\beta_0) = \frac{(\hat{S}'\hat{T})^2}{\hat{T}'\hat{T}}.

If k = 1, then K(\beta_0) = AR(\beta_0). Kleibergen showed that under either conventional or weak-instrument asymptotics, K(\beta_0) has \chi^2_1 as its null distribution.

Moreira's statistic. Moreira (2003) proposed testing \beta = \beta_0 using the conditional likelihood ratio test statistic

M(\beta_0) = \frac{1}{2} \left( \hat{S}'\hat{S} - \hat{T}'\hat{T} + \sqrt{ \left( \hat{S}'\hat{S} + \hat{T}'\hat{T} \right)^2 - 4\left( (\hat{S}'\hat{S})(\hat{T}'\hat{T}) - (\hat{S}'\hat{T})^2 \right) } \right).

The (weak instruments) asymptotic distribution of M(\beta_0) is non-standard. However, conditional on the value of \hat{T}, it does not depend on the strength of the instruments, and the null distribution can be obtained by Monte Carlo simulation.
Remark 1. Due to the duality between hypothesis tests and confidence sets, these tests can be used to construct confidence sets robust to weak instruments. For example, a fully robust 95% confidence set can be constructed as the set of \beta_0 for which the AR statistic, AR(\beta_0), fails to reject at the 5% significance level.
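A sketch of the AR statistic on simulated data (the DGP, including the degree of endogeneity and instrument strength, is an illustrative assumption): its rejection rate at the true \beta_0 stays near the nominal 5% whether the instruments are strong or nearly irrelevant, and inverting the test over a grid gives the robust confidence set of Remark 1.

```python
# AR statistic: size robustness to instrument strength, and test inversion.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
T, k, beta0 = 200, 4, 1.0
crit = stats.f.ppf(0.95, k, T - k)       # exact with fixed Z and normal errors

def simulate(pi):
    Z = rng.standard_normal((T, k))
    v = rng.standard_normal(T)
    u = 0.8 * v + rng.standard_normal(T)  # endogeneity
    Y = Z @ np.full(k, pi) + v
    return Y * beta0 + u, Y, Z

def AR(y, Y, Z, b0):
    e = y - Y * b0
    PZe = Z @ np.linalg.solve(Z.T @ Z, Z.T @ e)
    return (e @ PZe / k) / ((e @ e - e @ PZe) / (T - k))

# size check at the true beta_0: strong (pi = 1) and weak (pi = 0.01) designs
rates = {pi: np.mean([AR(*simulate(pi), beta0) > crit for _ in range(1000)])
         for pi in (1.0, 0.01)}
print(rates)

# Remark 1: invert the test over a grid of candidate beta_0 values
y, Y, Z = simulate(1.0)
grid = np.linspace(-1.0, 3.0, 401)
conf_set = grid[np.array([AR(y, Y, Z, b) < crit for b in grid])]
print(len(conf_set), "of", len(grid), "grid points retained")
```

With strong instruments the retained set is a short interval; with weak instruments it can be wide or even unbounded, which is the honest answer when the data carry little identifying information.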
3.4. Robust Inference in nonlinear models

Nonlinear Anderson-Rubin statistic. Recall that because the numerator and denominator of the Anderson-Rubin statistic are evaluated at the true parameter value, it has an asymptotic chi-square distribution even if the unknown parameters are poorly identified. This observation suggests tests of \theta = \theta_0 based on the nonlinear analog of the AR statistic, which is the so-called continuous-updating GMM objective function in which the weight matrix is evaluated at the same parameter value as the numerator:

J_{CU}(\theta_0) = \left( \frac{1}{\sqrt{T}} \sum_{t=1}^{T} m(X_t, \theta_0) \right)' S(\theta_0)^{-1} \left( \frac{1}{\sqrt{T}} \sum_{t=1}^{T} m(X_t, \theta_0) \right).

If there is no serial correlation, then

S(\theta_0) = \frac{1}{T} \sum_{t=1}^{T} \tilde{m}(X_t, \theta_0) \tilde{m}(X_t, \theta_0)'

with

\tilde{m}(X_t, \theta_0) = m(X_t, \theta_0) - T^{-1} \sum_{t=1}^{T} m(X_t, \theta_0).

If m(X_t, \theta_0) is serially correlated, then S(\theta_0) is replaced by an estimate of the long run variance using some kernel based method. Under the null hypothesis, J_{CU}(\theta_0) has a \chi^2_K limiting distribution, where K is the number of moment restrictions.

Notice that we need to re-center m(X_t, \theta_0) when estimating S(\theta_0); otherwise S(\theta_0) diverges under the alternative hypothesis and the test does not have power.
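The J_{CU} statistic can be sketched for a toy moment vector (an illustrative assumption, not from the notes): m(X, \theta) = (X - \theta, (X - \theta)^2 - 1)', so K = 2. Under the null its rejection rate should be close to the nominal level.

```python
# Nonlinear AR / continuous-updating objective J_CU with re-centered weight
# matrix, evaluated at the hypothesized theta_0.
import numpy as np
from scipy import stats

def J_CU(X, theta0):
    T = len(X)
    m = np.column_stack([X - theta0, (X - theta0)**2 - 1.0])
    mbar = m.mean(axis=0)
    mt = m - mbar                        # re-centered, as required for power
    S = mt.T @ mt / T                    # weight matrix at the SAME theta_0
    g = np.sqrt(T) * mbar                # T^{-1/2} sum_t m(X_t, theta_0)
    return g @ np.linalg.solve(S, g)

rng = np.random.default_rng(6)
T, theta0 = 200, 0.5
crit = stats.chi2.ppf(0.95, df=2)
# under H0: theta = theta_0, compare with the chi^2_K critical value
rej = np.mean([J_CU(rng.standard_normal(T) + theta0, theta0) > crit
               for _ in range(1000)])
print(rej)
```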
Kleibergen's statistic. Kleibergen (2005) proposed testing the hypothesis \theta = \theta_0 using a generalization of K(\beta_0) and showed that the proposed statistic has a chi-square limiting distribution. You can refer to his paper for details.
References

[1] Anderson, T. W., and Rubin, H. (1949), "Estimation of the Parameters of a Single Equation in a Complete System of Stochastic Equations," Annals of Mathematical Statistics, 20, 46-63.

[2] Andrews, D. W. K. (1999), "Consistent Moment Selection Procedures for Generalized Method of Moments Estimation," Econometrica, 67(3), 543-564.

[3] Angrist, J. D., and Krueger, A. B. (1991), "Does Compulsory School Attendance Affect Schooling and Earnings?" Quarterly Journal of Economics, 106, 979-1014.

[4] Bound, J., Jaeger, D. A., and Baker, R. (1995), "Problems With Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variables Is Weak," Journal of the American Statistical Association, 90, 443-450.

[5] Burnside, C., and Eichenbaum, M. (1996), "Small-Sample Properties of GMM-Based Wald Tests," Journal of Business and Economic Statistics, 14, 294-308.

[6] Chamberlain, G. (1987), "Asymptotic Efficiency in Estimation with Conditional Moment Restrictions," Journal of Econometrics, 34, 305-334.

[7] Hall, A. R. (2005), Generalized Method of Moments, Oxford University Press.

[8] Kleibergen, F. (2002), "Pivotal Statistics for Testing Structural Parameters in Instrumental Variables Regression," Econometrica, 70(5), 1781-1803.

[9] Kleibergen, F. (2005), "Testing Parameters in GMM Without Assuming That They Are Identified," Econometrica, 73(4), 1103-1123.

[10] Moreira, M. J. (2003), "A Conditional Likelihood Ratio Test for Structural Models," Econometrica, 71(4), 1027-1048.

[11] Newey, W. K. (1990), "Efficient Instrumental Variables Estimation of Nonlinear Models," Econometrica, 58, 809-837.

[12] Newey, W. K. (1993), "Efficient Estimation of Models with Conditional Moment Restrictions," in G. S. Maddala, C. R. Rao, and H. D. Vinod, eds., Handbook of Statistics, Volume 11: Econometrics. Amsterdam: North-Holland.

[13] Stock, J. H., Wright, J. H., and Yogo, M. (2002), "A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments," Journal of Business & Economic Statistics, 20, 518-529.
Appendix: Proof of Proposition 1.

Let \hat\theta^* denote the GMM estimator using z_t^* as instruments and V^* its asymptotic variance. Let \hat\theta denote an alternative GMM estimator using x_t as instruments, where x_t = f(z_t) for some vector-valued function f(\cdot). Let V denote the asymptotic variance of \hat\theta. It suffices to show that (V - V^*) is a positive semi-definite matrix.

Write

\hat\theta = \hat\theta^* + (\hat\theta - \hat\theta^*).

Then,

Var(\sqrt{T}\hat\theta) = Var(\sqrt{T}\hat\theta^*) + Var(\sqrt{T}(\hat\theta - \hat\theta^*)) + Cov\left(\sqrt{T}\hat\theta^*, \sqrt{T}[\hat\theta - \hat\theta^*]\right) + Cov\left(\sqrt{T}[\hat\theta - \hat\theta^*], \sqrt{T}\hat\theta^*\right).

Therefore,

Var(\sqrt{T}\hat\theta) - Var(\sqrt{T}\hat\theta^*) = Var(\sqrt{T}(\hat\theta - \hat\theta^*)) + Cov\left(\sqrt{T}\hat\theta^*, \sqrt{T}[\hat\theta - \hat\theta^*]\right) + Cov\left(\sqrt{T}[\hat\theta - \hat\theta^*], \sqrt{T}\hat\theta^*\right).

Because the first term on the right hand side is positive semi-definite, the proof will be complete if we can show

\lim_{T\to\infty} Cov(\sqrt{T}\hat\theta^*, \sqrt{T}(\hat\theta - \hat\theta^*)) = 0,

or, equivalently,

\lim_{T\to\infty} Cov(\sqrt{T}\hat\theta^*, \sqrt{T}\hat\theta) = \lim_{T\to\infty} Var(\sqrt{T}\hat\theta^*).   (A.1)

To establish (A.1), explicit formulae for \hat\theta^* and \hat\theta are needed. First consider \hat\theta^*. It satisfies

\frac{1}{\sqrt{T}} \sum_{t=1}^{T} z_t^* d_t(\hat\theta^*) = 0.

Taking a first order Taylor expansion around the true value \theta_0,

0 = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} z_t^* d_t(\theta_0) + \left( \frac{1}{T} \sum_{t=1}^{T} z_t^* \frac{\partial d_t(\theta_0)}{\partial \theta'} \right) \sqrt{T}(\hat\theta^* - \theta_0) + o_p(1).

Because

\lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} z_t^* \frac{\partial d_t(\theta_0)}{\partial \theta'} = E\left( z_t^* \frac{\partial d_t(\theta_0)}{\partial \theta'} \right) = E\left[ E\left( z_t^* \frac{\partial d_t(\theta_0)}{\partial \theta'} \,\Big|\, z_t \right) \right]
= K E\left[ E\left( \frac{\partial d_t(\theta_0)'}{\partial \theta} \,\Big|\, z_t \right) \Sigma^{*-1} E\left( \frac{\partial d_t(\theta_0)}{\partial \theta'} \,\Big|\, z_t \right) \right] \equiv K D^*,

where we have defined

D^* = E\left[ E\left( \frac{\partial d_t(\theta_0)'}{\partial \theta} \,\Big|\, z_t \right) \Sigma^{*-1} E\left( \frac{\partial d_t(\theta_0)}{\partial \theta'} \,\Big|\, z_t \right) \right],

we have

\sqrt{T}(\hat\theta^* - \theta_0) = -D^{*-1} K^{-1} T^{-1/2} \sum_{t=1}^{T} z_t^* d_t(\theta_0) + o_p(1).

Therefore

\lim_{T\to\infty} Var(\sqrt{T}(\hat\theta^* - \theta_0)) = D^{*-1} K^{-1} Var(z_t^* d_t(\theta_0)) (K^{-1})' D^{*-1} = D^{*-1}.

Note that the last equality follows because

Var(z_t^* d_t(\theta_0)) = Var\left( K E\left( \frac{\partial d_t(\theta_0)'}{\partial \theta} \,\Big|\, z_t \right) \Sigma^{*-1} d_t(\theta_0) \right)
= K E\left[ E\left( \frac{\partial d_t(\theta_0)'}{\partial \theta} \,\Big|\, z_t \right) \Sigma^{*-1} E\left( d_t(\theta_0) d_t(\theta_0)' \,\big|\, z_t \right) \Sigma^{*-1} E\left( \frac{\partial d_t(\theta_0)}{\partial \theta'} \,\Big|\, z_t \right) \right] K'
= K E\left[ E\left( \frac{\partial d_t(\theta_0)'}{\partial \theta} \,\Big|\, z_t \right) \Sigma^{*-1} E\left( \frac{\partial d_t(\theta_0)}{\partial \theta'} \,\Big|\, z_t \right) \right] K'
= K D^* K'.

Now consider \hat\theta and apply similar arguments. We have

\hat\theta = \arg\min_\theta \left( \frac{1}{T} \sum_{t=1}^{T} x_t d_t(\theta) \right)' \hat{S}^{-1} \left( \frac{1}{T} \sum_{t=1}^{T} x_t d_t(\theta) \right),   (A.2)

where \hat{S} is a consistent estimate of S_0 = Var(x_t d_t(\theta_0)), so that \hat{S}^{-1} is asymptotically the optimal weighting matrix.

The first order condition of (A.2) implies

\left( \frac{1}{T} \sum_{t=1}^{T} x_t \frac{\partial d_t(\hat\theta)}{\partial \theta'} \right)' \hat{S}^{-1} \left( \frac{1}{\sqrt{T}} \sum_{t=1}^{T} x_t d_t(\hat\theta) \right) = 0.

Taking a first order Taylor expansion of T^{-1/2} \sum_{t=1}^{T} x_t d_t(\hat\theta) around the true value \theta_0, we have

0 = \left( \frac{1}{T} \sum_{t=1}^{T} x_t \frac{\partial d_t(\hat\theta)}{\partial \theta'} \right)' \hat{S}^{-1} \left( \frac{1}{\sqrt{T}} \sum_{t=1}^{T} x_t d_t(\theta_0) \right)
+ \left( \frac{1}{T} \sum_{t=1}^{T} x_t \frac{\partial d_t(\hat\theta)}{\partial \theta'} \right)' \hat{S}^{-1} \left( \frac{1}{T} \sum_{t=1}^{T} x_t \frac{\partial d_t(\theta_0)}{\partial \theta'} \right) \sqrt{T}(\hat\theta - \theta_0) + o_p(1).

Because \hat\theta \to_p \theta_0, we have

\lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} x_t \frac{\partial d_t(\hat\theta)}{\partial \theta'} = \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} x_t \frac{\partial d_t(\theta_0)}{\partial \theta'} \equiv D

and

\lim_{T\to\infty} \hat{S}^{-1} = S_0^{-1}.

Therefore,

\sqrt{T}(\hat\theta - \theta_0) = -(D' S_0^{-1} D)^{-1} D' S_0^{-1} T^{-1/2} \sum_{t=1}^{T} x_t d_t(\theta_0) + o_p(1).

And

\lim_{T\to\infty} Cov(\sqrt{T}(\hat\theta - \theta_0), \sqrt{T}(\hat\theta^* - \theta_0))
= \lim_{T\to\infty} E\left[ (D' S_0^{-1} D)^{-1} D' S_0^{-1} \frac{1}{T} \sum_{t=1}^{T} x_t d_t(\theta_0) \left( \sum_{t=1}^{T} z_t^* d_t(\theta_0) \right)' (K^{-1})' D^{*-1} \right]
(due to i.i.d.) = E\left[ (D' S_0^{-1} D)^{-1} D' S_0^{-1} \, x_t d_t(\theta_0) d_t(\theta_0)' z_t^{*\prime} \, (K^{-1})' D^{*-1} \right]
= E\left[ (D' S_0^{-1} D)^{-1} D' S_0^{-1} \, x_t E\left( d_t(\theta_0) d_t(\theta_0)' \,\big|\, z_t \right) z_t^{*\prime} \right] (K^{-1})' D^{*-1}.   (A.3)

For the term in the middle,

x_t E\left( d_t(\theta_0) d_t(\theta_0)' \,\big|\, z_t \right) z_t^{*\prime} = x_t \Sigma^* \Sigma^{*-1} E\left( \frac{\partial d_t(\theta_0)'}{\partial \theta} \,\Big|\, z_t \right)' K' = E\left( x_t \frac{\partial d_t(\theta_0)}{\partial \theta'} \,\Big|\, z_t \right) K'.

Hence, (A.3) equals

(D' S_0^{-1} D)^{-1} D' S_0^{-1} E\left[ E\left( x_t \frac{\partial d_t(\theta_0)}{\partial \theta'} \,\Big|\, z_t \right) \right] K' (K^{-1})' D^{*-1}
= (D' S_0^{-1} D)^{-1} D' S_0^{-1} E\left( x_t \frac{\partial d_t(\theta_0)}{\partial \theta'} \right) D^{*-1}
= (D' S_0^{-1} D)^{-1} D' S_0^{-1} D \, D^{*-1}
= D^{*-1}.

Since \lim_{T\to\infty} Var(\sqrt{T}\hat\theta^*) = D^{*-1} as well, (A.1) follows, which completes the proof.