Panel Vector Autoregression under Cross-Sectional Dependence*
Xiao Huang
October 2004
Abstract

This paper studies the fully modified (FM) estimation of panel vector autoregression (VAR) under cross-sectional dependence when the time dimension of the panel is large. The time series properties of the model variables are allowed to be an unknown mixture of stationary and unit root processes with possible cointegrating relations. The common shocks are modelled with a factor structure. We extend the factor analysis of Bai and Ng (2002) and Bai (2003) to vector processes and give the asymptotic distribution of the estimated factors and factor loadings. FM estimation is used to obtain the estimates of the parameters in the panel VAR. We use simulation to study the performance of the factor analysis and compare the instrumental variable (IV) estimator, the FM-VAR estimator, the continuously updated estimator of Bai and Kao (2004), and the factor-augmented estimator. We find that the factor-augmented method gives better finite sample properties than the other three methods when the signal from the common shock is strong.
Key Words: cross-sectional dependence, factor analysis, nonstationary panel data, VAR
JEL Classification: C13, C23, C33

*The author thanks Aman Ullah for stimulating discussions and gratefully acknowledges financial support from the Department of Economics, U.C. Riverside. All errors are mine. Correspondence address: Department of Economics, University of California, Riverside, Riverside, CA 92521-0427. Phone: (951) 907-8434. Fax: (951) 827-5685. Email: [email protected]
1 Introduction
There has been growing interest in studying cross-sectional dependence in panel data analysis. Cross-sectional dependence is common in both macro- and microeconomic data. Oil shocks and global financial crises are two typical sources of cross-sectional dependence in a macroeconomic data set containing time series from different countries. The political stability of particular countries can also have a global effect on other countries' economies. Even within a country, the performance of different firms in different industries is likely to be affected by macroeconomic factors such as tax policy, monetary policy, and business cycles. For data at the microeconomic level within a region, cross-sectional dependence is also not rare: not only does the general state of the economy affect household behavior, but preferences and fashions can have an effect on every household as well. Spatial dependence across different regions is another example.
The interest in relaxing the assumption of cross-sectional independence has already brought several important papers into the literature. Roughly speaking, there are two strands of interest. The first focuses on parameter estimation in the conditional mean under cross-sectional dependence. Pesaran (2002) suggested an estimation method based on augmented regressions for both static and dynamic panels. Phillips and Sul (2003a) proposed a median unbiased estimation procedure for estimation, testing, and confidence interval construction in dynamic panels; factor loadings are also estimated in Phillips and Sul (2003a). The analytic bias in dynamic panel estimation that ignores cross-sectional dependence is studied in Phillips and Sul (2003b). Andrews (2003) studied the instrumental variable (IV) and least squares (LS) estimators for cross-section data under cross-sectional dependence of any form. Bai and Kao (2004) recently studied panel cointegration with a factor structure in the errors, where they obtained the limiting distribution of the fully modified (FM) estimators and the loading coefficients. They also proposed a continuously-updated fully modified (CUP-FM) estimator, which has better small sample properties than the two-step FM and OLS estimators. Panel unit root tests under cross-sectional dependence are studied by Moon and Perron (2004), Pesaran (2003), Im and Pesaran (2003), and others. For panel VAR, Mutl (2002) studied the maximum likelihood estimator under spatial correlation in short panels; Pesaran et al. (2004) build a global VAR (GVAR) model where the focus is on estimation in the conditional mean. Pesaran (2004) also develops a test for cross-sectional dependence in panel data.
The other strand of interest relates to factor analysis with panel data, which assumes that the variation in a large number of economic variables can be modeled by a small number of reference variables, even though those reference variables may not have any economic interpretation. Stock and Watson (1998, 2002a,b) provided asymptotic results in the context of diffusion index forecasting. Forni et al. (2000) and Forni and Lippi (2001) gave general results for dynamic factor models. Bai and Ng (2002) gave criteria to select the number of factors under heteroskedasticity and some weak dependence between the factors and the errors. Bai (2003a) developed an inferential theory for factor models of large dimensions. Bai and Ng (2003) developed tests that can distinguish whether nonstationarity in the data comes from the common components or from the idiosyncratic source. Bai (2003b) studied large-dimension factor models with nonstationary dynamic factors. Ng (2004) proposed a cross-sectional dependence test that can find the number of units that are correlated, detect heterogeneity in the correlations, and evaluate the magnitude of the correlations. For panel VAR, Canova and Ciccarelli (2002) used a Bayesian method to integrate panel VAR and index models for forecasting purposes.
In this paper, we study cross-sectional dependence in nonstationary panel VAR via factor analysis. Our interest is the estimation of both the regression coefficients and the cross-sectional shocks when the time dimension of the panel is large. We allow parameter heterogeneity across different units. The basic procedure is to obtain a first-stage consistent estimator for each cross-section unit i, ignoring the cross-sectional dependence. Combining the first-stage residuals from different units, we apply factor analysis to determine the number of shocks and to estimate those shocks. We then reestimate the regression coefficients in a second stage, based on the estimated factors and residuals, using the fully modified estimation of Phillips (1995) and factor-augmented regression. It is noted that the theory of factor analysis for both N and T large developed so far is for scalar processes; see Bai and Ng (2002) and Bai (2003). We thus extend the method to vector processes in this paper.

The rest of the paper is organized as follows. Section 2 extends the method of factor analysis to vector processes and gives the asymptotic results. Section 3 gives the fully modified estimator of the VAR with cross-sectional dependence. Section 4 provides the simulation results. Section 5 concludes.
2 Factor Model
Consider the following qth order panel VAR model,

    y_it = J_i(L) y_{i,t-1} + x_it,                                  (2.1)
    x_it = Λ_i f_t + e_it,   i = 1, ..., N,  t = 1, ..., T,          (2.2)

where y_it, x_it and e_it are m×1 vectors, Λ_i is an m×r matrix of factor loadings, f_t is an r×1 vector of cross-sectional shocks, and J_i(L) = Σ_{h=1}^{q} J_ih L^{h-1}. Define J*_i(L) = Σ_{h=1}^{q-1} J*_ih L^{h-1} with J*_ih = −Σ_{g=h+1}^{q} J_ig, and A_i = J_i(1); then (2.1) can be rewritten as

    y_it = J*_i(L) Δy_{i,t-1} + A_i y_{i,t-1} + Λ_i f_t + e_it.      (2.3)
If f_t and e_it are assumed to be i.i.d. or to have a limited time dependence structure, say MA(1), first-stage consistent estimators of J*_i(L) and A_i can be obtained via the instrumental variable method, provided T is sufficiently large. Let x̂_it = y_it − Ĵ*_i(L) Δy_{i,t-1} − Â_i y_{i,t-1}, which can be treated as observed data to recover the unobservable factor structure Λ_i f_t using factor analysis. It is well known that if x_it is a scalar, the estimated factors are √T times the eigenvectors corresponding to the k largest eigenvalues of the outer product of the observed data matrix, where k is the estimated number of factors, and the factor loadings Λ_i can then be estimated as well. We show in the following that for a vector process x_it, the solution to the factor analysis with large T and N is the same as in the scalar case, up to some modifications with respect to the dimension of x_it.
Consider the factor model in (2.2). Define x_t' = (x_1t, ..., x_Nt), an m×N matrix, and x_i = (x_i1, ..., x_iT)', a T×m matrix, where x_it is an m×1 vector whose pth element is x_itp. Let Λ' = (Λ_1', ..., Λ_N') be an r×mN matrix with Λ_i' = (λ_i1, ..., λ_im), an r×m matrix, where r is the true number of factors and λ_ip is an r×1 vector. Similarly to x_t and x_i, we define e_t and e_i, whose elements are those of e_it. Let e = (e_1, ..., e_N), f = (f_1, ..., f_T)', a T×r matrix, and x = (x_1, ..., x_N); then we have

    x_i = f Λ_i' + e_i,    (T×m) = (T×r)(r×m) + (T×m),        (2.4)
    x   = f Λ'   + e,      (T×mN) = (T×r)(r×mN) + (T×mN).     (2.5)
The objective function is

    min_{λ_ip, f_t} V(k) = (mNT)^{-1} Σ_{i=1}^{N} Σ_{t=1}^{T} Σ_{p=1}^{m} (x_itp − λ_ip' f_t)².   (2.6)
Let (f̃, Λ̃) be the solution to (2.6); the first-order conditions give

    λ̃_ip = (Σ_t f_t f_t')^{-1} (Σ_t f_t x_itp),                        (2.7)
    f̃_t = (Σ_i Σ_p λ_ip λ_ip')^{-1} (Σ_i Σ_p x_itp λ_ip).              (2.8)
Plugging (2.7) into (2.6), we have

    V(k) = (mNT)^{-1} Σ_i vec(x_i' − x_i' f (f'f)^{-1} f')' vec(x_i' − x_i' f (f'f)^{-1} f')
         = (mNT)^{-1} Σ_i tr[(x_i − f (f'f)^{-1} f' x_i)' (x_i − f (f'f)^{-1} f' x_i)]
         = (mNT)^{-1} tr(Σ_i x_i' x_i) − (mNT²)^{-1} tr(f' x x' f),        (2.9)

where the last line of (2.9) follows from the normalization condition f'f/T = I_r and Σ_i x_i x_i' = xx'. Minimizing (2.6) is therefore equivalent to maximizing the second term in (2.9), the solution of which is √T times the eigenvectors corresponding to the k largest eigenvalues of the matrix xx'. The estimated factor loadings are then given by Λ̃' = f̃'x/T. Alternatively, we can plug (2.8) into the objective function to obtain

    V(k) = (mNT)^{-1} Σ_t [vec(x_t') − Λ (Λ'Λ)^{-1} Λ' vec(x_t')]' [vec(x_t') − Λ (Λ'Λ)^{-1} Λ' vec(x_t')]
         = (mNT)^{-1} Σ_t vec(x_t')' vec(x_t') − ((mN)²T)^{-1} tr(Λ' x' x Λ),   (2.10)

where the last line of (2.10) follows from the normalization Λ'Λ/(mN) = I_r, and Λ̃ is chosen as √(mN) times the eigenvectors corresponding to the k largest eigenvalues of x'x. The factors are then given by f̃ = xΛ̃/(mN). The two approaches are both valid and differ only in computation time for different values of T and N.
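As an illustration of the first route, the following sketch (our own code, not the paper's) extracts f̃ as √T times the leading eigenvectors of xx' and sets Λ̃' = f̃'x/T:

```python
import numpy as np

def estimate_factors(x, k):
    """Principal-components estimate of the factor model x = f @ lam.T + e.

    x : (T, mN) stacked data matrix as in (2.5)
    k : number of factors to extract
    Returns (f_hat, lam_hat) under the normalization f'f/T = I_k:
    f_hat is sqrt(T) times the eigenvectors of x x' belonging to the
    k largest eigenvalues, and lam_hat' = f_hat' x / T.
    """
    T = x.shape[0]
    # eigh returns eigenvalues in ascending order for symmetric matrices
    w, v = np.linalg.eigh(x @ x.T)
    f_hat = np.sqrt(T) * v[:, -k:][:, ::-1]   # k leading eigenvectors
    lam_hat = (x.T @ f_hat) / T               # loadings from (2.7)
    return f_hat, lam_hat
```

The second route (eigenvectors of the mN×mN matrix x'x) gives the same fitted common component and is cheaper when mN < T.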
The above solution is the same as in the scalar case of Stock and Watson (1998), and it is natural to ask whether the factor number selection criteria developed in Bai and Ng (2002) apply in this case. The answer is yes, but we need to modify the criterion functions with respect to m, the number of variables included in the vector regression process. Under the assumption that the number of cross-sectional shocks affecting each unit is the same, the correct selection of the number of true underlying factors can be done using the criterion functions in Bai and Ng (2002). For example, if we have three variables in the vector process, GDP, the interest rate, and the equity price, then we can select the GDP series from each country or region, forming a dynamic panel, and apply Bai and Ng's method. The number of cross-sectional shocks, r, can then be consistently estimated with sufficient data. However, in some cases we are also interested in estimating the factors and factor loadings, and possibly using them for forecasting purposes; a scalar process that ignores the information in the interest rate and the equity price is then certainly not suitable. Hence, if the interest is not only the determination of the number of factors but also the estimation of the factors, we need to consider the vector process. In the following, we extend Bai and Ng's (2002) method to vector processes and also give the asymptotic results for the estimated factors and factor loadings as in Bai (2003).
The following assumptions are needed for all i, j = 1, ..., N; s, t = 1, ..., T; p, q = 1, ..., m.

Assumption A. E‖f_t‖⁴ < ∞ and T^{-1} Σ_t f_t f_t' → Σ_f as T → ∞, where Σ_f is a positive definite matrix.

Assumption B. ‖λ_ip‖ ≤ λ̄ < ∞, and ‖(mN)^{-1} Λ'Λ − Σ_Λ‖ → 0 as N → ∞ for some positive definite matrix Σ_Λ.

Assumption C. For a finite positive constant M:

1. E(e_itp) = 0, E|e_itp|⁸ ≤ M;

2. E(m^{-1} N^{-1} Σ_i Σ_p e_isp e_itp) = γ_mN(s,t), |γ_mN(s,s)| ≤ M for all s, and T^{-1} Σ_s Σ_t |γ_mN(s,t)| ≤ M;

3. E(e_itp e_jsq) = τ_ij,ts,pq with |τ_ij,ts,pq| ≤ |τ_ij,pq| for some τ_ij,pq and all t, s; also (mN)^{-1} Σ_i Σ_j Σ_p Σ_q |τ_ij,pq| ≤ M;

4. E|(mN)^{-1/2} Σ_i Σ_p (e_itp e_isp − E(e_itp e_isp))|⁴ ≤ M for all (s,t).
Assumptions A and B are the same as in Bai and Ng (2002). Assumption C2 allows a general correlation structure among the variables within a particular unit i: for unit i, its variables are allowed to have both contemporaneous dependence and time dependence. Assumption C3 allows cross-sectional dependence in the error terms. We do not make the assumption of weak dependence between the factors and the idiosyncratic errors here. This assumption can be imposed without any effect on the asymptotic results, but it is not appropriate for the factor-augmented regression model developed in Section 4.
In order to consistently estimate the number of factors, we need a penalty function g(mN, T) in the following criterion function:

    PC(k) = V(k, f̂^k) + k g(mN, T),

where V(k, f̂^k) is obtained by plugging the estimated factors into (2.6). Minimizing PC(k) with respect to k gives a consistent estimator of r.

Theorem 2.1. Suppose Assumptions A-C hold and k factors are obtained by the principal component method. Let k̂ = argmin PC(k). Then lim_{N,T→∞} Prob(k̂ = r) = 1 if g(mN, T) → 0 and C²_mNT g(mN, T) → ∞ as N, T → ∞, where C_mNT = min{√(mN), √T}.
See Appendix for the proof. This theorem is similar to the one in Bai and Ng (2002) except for the inclusion of m in the penalty function. When N and T are large, the role of m in the criterion function is negligible; however, it does make some difference in finite samples. Explicit forms of g(mN, T) are given in Section 4. The distribution theory of the estimated factors, factor loadings, and common components can also be obtained with some additional assumptions.
Assumption D. Σ_s |γ_mN(s,t)| ≤ M and Σ_i Σ_p Σ_q |τ_ij,pq| ≤ M.

Assumption E.

1. E‖(mNT)^{-1/2} Σ_s Σ_i Σ_p f_s (e_isp e_itp − E(e_isp e_itp))‖² ≤ M;

2. E‖(mNT)^{-1/2} Σ_t f_t Λ' vec(e_t')‖² ≤ M;

3. For each t, as N → ∞, (mN)^{-1/2} Σ_i Σ_p λ_ip e_itp →_d N(0, Γ_t), where Γ_t = lim_{N→∞} (mN)^{-1} Σ_i Σ_j Σ_p Σ_q λ_ip λ_jq' E(e_itp e_jtq);

4. T^{-1/2} Σ_t f_t e_itp →_d N(0, Φ_ip), where Φ_ip = plim_{T→∞} T^{-1} Σ_s Σ_t E(f_t f_s' e_itp e_isp).
Assumption F. The eigenvalues of the r×r matrix Σ_Λ Σ_f are distinct.

Assumption F is the same as Assumption G in Bai (2003). Under Assumptions A-F, we can derive the distribution theory for the estimated factors, factor loadings, and common components. Since the results are quite similar to the scalar case in Bai (2003), we state the distribution theory in a single Theorem 2.2; a sketch of the proof is given in Appendix A.
Theorem 2.2. Under Assumptions A-F and √N/T → ∞ as N, T → ∞, we have the following distribution theory for the estimated factor model:

    √(mN) (f̃_t − H' f_t) →_d N(0, V^{-1} Q Γ_t Q' V^{-1}),
    √T (λ̃_ip − H^{-1} λ_ip) →_d N(0, Q'^{-1} Φ_ip Q^{-1}),
    ((mN)^{-1} V_itp + T^{-1} W_itp)^{-1/2} (c̃_itp − c_itp) →_d N(0, 1),

where H = (Λ'Λ/(mN))(f'f̃/T) V^{-1}_mNT, V_mNT is the diagonal matrix of the largest r eigenvalues of (mNT)^{-1} xx' in decreasing order, V and Q are defined in Proposition 1 of Bai (2003), V_itp = λ_ip' Σ_Λ^{-1} Γ_t Σ_Λ^{-1} λ_ip, W_itp = f_t' Σ_f^{-1} Φ_ip Σ_f^{-1} f_t, and c̃_itp = λ̃_ip' f̃_t.
See Appendix for the proof. The above asymptotic theory can be used to construct confidence intervals in factor analysis. It may also be of use for testing cross-sectional dependence in panel VAR.
Theorem 2.1 and Theorem 2.2 are very similar to the corresponding theorems in Bai and Ng (2002) and Bai (2003), which is not a surprise if we reshape the data matrix. Consider a simple case with N = 3 and m = 2. Suppose we have data for 3 countries, and for each country our VAR includes GDP and the interest rate. Then we can always reorder our data so that within each country there are 2×T observations: the first T observations are the GDP series and the second T observations are the interest rate series. Thus the data for each country can be split into two groups, where each group can be treated as a new individual unit, and we will then have N × m = 3 × 2 = 6 units. Let N* = m × N; then it is exactly the same model as in Bai and Ng (2003). In matrix notation, this is a reordering of the column vectors of x in (2.5), which gives the same solution for (2.6) after the singular value decomposition. This motivates the question: what is the advantage of considering factor analysis for a vector process? The first advantage is its simplicity. Applying factor analysis to (2.2) does not require reshaping the data matrix, and the estimation procedure is much simpler when there is an iteration procedure as in Bai and Kao (2004). The second advantage is related to efficient estimation. Reshaping the data matrix treats each time series in each cross-sectional unit as a new individual unit, which is not useful when we implement efficient estimation of the factors and factor loadings. Table 2 shows that the factor analysis gives misleading results when there is a strong presence of group-wise heteroskedasticity. If the variance-covariance matrix of GDP and the interest rate is very different for each country, then it is possible that the factor selection criteria and the factor estimation method will give the wrong number of factors and inefficient estimates of the factors and factor loadings. In this case, a more appealing procedure is to use the information in the covariance matrix and change the objective function to
    min_{Λ_i, f_t} V_mNT = (mNT)^{-1} Σ_{i=1}^{N} Σ_{t=1}^{T} (x_it − Λ_i f_t)' Σ_i^{-1} (x_it − Λ_i f_t),   (2.11)

where Σ_i in practice can be a first-stage estimate of the covariance matrix of the data in group i and x_it is an m×1 vector. (2.11) may give more efficient estimation results in the presence of heteroskedasticity and serial correlation.
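A feasible version of (2.11) can be implemented by pre-whitening: for a fixed set of first-stage covariance estimates Σ_i, transforming each unit's data by Σ_i^{-1/2} reduces (2.11) to the unweighted problem, which principal components solve. The sketch below (our own illustration with hypothetical function names, not code from the paper) performs one such pass:

```python
import numpy as np

def weighted_factor_step(x_blocks, sigma_blocks, k):
    """One feasible-GLS pass at the weighted objective (2.11).

    x_blocks     : list of N arrays, each (T, m), the data for unit i
    sigma_blocks : list of N (m, m) covariance estimates, one per unit
    k            : number of factors
    Pre-whitening each unit by Sigma_i^{-1/2} turns (2.11) back into
    the unweighted least-squares problem of (2.6).
    """
    T = x_blocks[0].shape[0]
    whitened = []
    for xi, si in zip(x_blocks, sigma_blocks):
        w, v = np.linalg.eigh(si)                    # symmetric inverse square root
        si_inv_half = v @ np.diag(w ** -0.5) @ v.T
        whitened.append(xi @ si_inv_half)            # (T, m) whitened block
    xw = np.hstack(whitened)                         # (T, mN)
    w, v = np.linalg.eigh(xw @ xw.T)
    f_hat = np.sqrt(T) * v[:, -k:][:, ::-1]          # leading eigenvectors
    return f_hat
```

In practice one would iterate: re-estimate Σ_i from the residuals and repeat the whitened principal-components step until the factors stabilize.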
3 Fully Modified Estimation
The common shocks studied in Section 2 are assumed to follow a stationary process, but we allow possible nonstationarity and cointegration in the process y_it. Since T is large, we can estimate the model for each individual unit and allow for possible parameter heterogeneity, and the model becomes a VAR with cross-sectional dependence. By assuming normality of the error term e_it in (2.2) and exogeneity of the cross-sectional shocks, one may extend Johansen's (1988) method to processes with exogenous variables as in Pesaran et al. (2000). In that case, we could further relax the assumption of stationary cross-sectional shocks; however, the asymptotic results of factor analysis for nonstationary vector processes would then have to be derived along the lines of Bai (2004). Without imposing any such assumptions on the error term and the cross-sectional shocks, we use Phillips' (1995) fully modified method for estimation in this section.
The fully modified method was developed by Phillips (1995) as an alternative way to estimate a possibly cointegrated VAR process without pretesting for the cointegration rank and the location of unit roots. It corrects for possible serial correlation in the error term and for the endogeneity of the regressors resulting from cointegration. (2.1) differs from the model in Phillips (1995) by the extra term Λ_i f_t, which, combined with the error term e_it, can be treated as a composite error; we therefore expect the asymptotic theory for the estimated coefficients in (2.1) to give a somewhat different variance-covariance structure. It is noted that Bai and Kao (2004) recently also used the fully modified method for estimation in panels with cross-sectional dependence. However, the model in this paper differs from theirs in several respects: we study panel VAR instead of a static panel with a scalar process; the time dimension T is large in this paper, which allows for parameter heterogeneity; and we also consider the finite sample performance of the factor-augmented regression through the simulations in Section 4.
Rewrite (2.1) as

    y_it = J*_i(L) Δy_{i,t-1} + A_i y_{i,t-1} + Λ_i f_t + e_it       (3.1)
         = J_i z_t + A_i y_{i,t-1} + e*_it,                          (3.2)

where J_i = (J*_i1, ..., J*_{i,q-1}), z_t = (Δy_{i,t-1}', ..., Δy_{i,t-q+1}')' and e*_it = Λ_i f_t + e_it. Assumption C in Section 2 allows a very general form of serial correlation for f_t and e_it; however, in order to get a first-stage consistent estimator from (2.1), we have to impose a limited serial correlation structure on both the factors and the errors. For simplicity, an MA(1) structure for both f_t and e_it is assumed in the simulations. We can certainly allow for higher order moving average processes at the cost of truncating more data. Define the following notation for the fully modified estimator (these succinct notations are from Kauppi, 2004): let a_t and b_t be two covariance stationary processes. The long-run and one-sided long-run covariance matrices between a and b are given by Ω_ab = Σ_{j=-∞}^{∞} E(a_{t+j} b_t') and Δ_ab = Σ_{j=0}^{∞} E(a_{t+j} b_t'), whose kernel estimators are Ω̂_ab = Σ_{j=-T+1}^{T-1} w(j/K) Γ̂_ab(j) and Δ̂_ab = Σ_{j=0}^{T-1} w(j/K) Γ̂_ab(j), where w(·) is a kernel function with lag truncation or bandwidth parameter K and Γ̂_ab(j) = T^{-1} Σ_{1≤t,t+j≤T} a_{t+j} b_t'. The following assumptions, except for H1 and H2, are from Phillips (1995, p. 1044).
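The kernel estimators Ω̂_ab and Δ̂_ab can be computed directly from these definitions. A minimal sketch with the Parzen kernel (our own illustration, not the COINT procedure used later in Section 4):

```python
import numpy as np

def parzen(u):
    """Parzen kernel weight w(u); zero for |u| >= 1."""
    u = abs(u)
    if u <= 0.5:
        return 1 - 6 * u**2 + 6 * u**3
    if u <= 1.0:
        return 2 * (1 - u)**3
    return 0.0

def longrun_cov(a, b, K):
    """Kernel estimates of Omega_ab (two-sided) and Delta_ab (one-sided).

    a, b : (T, p) and (T, q) mean-zero covariance stationary series
    K    : lag truncation / bandwidth parameter
    Gamma_ab(j) = T^{-1} sum_t a_{t+j} b_t'
    """
    T = a.shape[0]
    def gamma(j):
        if j >= 0:
            return a[j:].T @ b[:T - j] / T
        return a[:T + j].T @ b[-j:] / T
    # with the Parzen kernel, weights vanish for |j| >= K
    omega = sum(parzen(j / K) * gamma(j) for j in range(-K, K + 1))
    delta = sum(parzen(j / K) * gamma(j) for j in range(0, K + 1))
    return omega, delta
```

For white noise the estimates collapse to the contemporaneous covariance, since Γ̂(j) ≈ 0 for j ≠ 0.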
Assumption H.

1. f_t is an identically distributed MA(1) process with zero mean, covariance matrix Σ_f as in Assumption A, and finite fourth order cumulants.

2. e_it is an identically distributed MA(1) process with zero mean, covariance matrix Σ_e, and finite fourth order cumulants.

3. |I_m − J(L)L| = 0 has roots on or outside the unit circle.

4. A_i = I_m + αβ', where α and β are m×r₀ matrices of full column rank r₀, 0 ≤ r₀ ≤ m.

5. α_⊥'(J*(1) − I_m)β_⊥ is nonsingular, where α_⊥ and β_⊥ are m×(m−r₀) matrices of full column rank such that α_⊥'α = β_⊥'β = 0.
Define the orthogonal transformation matrix H_T = [β, β_⊥]. Multiplying both sides of (3.2) by H_T', we obtain the set of equations analogous to (31a) and (31b) in Phillips (1995):

    y_1it = J_1 z_t + A_i11 y_1i,t-1 + e*_1it,                                    (3.3)
    y_2it = y_2i,t-1 + u_2t,   u_2t = e*_2it + J_2 z_t + A_i21 y_1i,t-1,          (3.4)

where the coefficients J_1, J_2, A_i11, A_i21 are elements of the properly partitioned matrices H_T' J_i H_T and H_T' A_i H_T as in (28) of Phillips (1995, p. 1045), and the transformed variables are left-multiplications of the original variables by H_T'. Multiplying the variables by H_T' separates the stationary and nonstationary parts of the data: y_1it is the stationary part and y_2it is the nonstationary part. Define w_it = (f_t', e_it', u_2t')' to be a vector stationary process. The functional central limit theorem for w_it gives T^{-1/2} Σ_t^{[T·]} w_it →_d B(·) ≡ BM(Ω). Ω_jp and Δ_jp denote the corresponding blocks of Ω and Δ, partitioned according to the dimensions of f_t, e_it and u_2t, for j, p = f, e, 2.
In matrix notation, we write (3.2) as

    Y_i' = J_i Z_i' + A_i Y_{i,-1}' + E*_i' = F_i X_i' + E*_i'       (3.5)
         = F_1i X_i1' + F_2i X_i2' + E*_i',                          (3.6)

where X_i1' and X_i2' contain the stationary and nonstationary regressors, respectively. It is immediately seen from (3.6) that asymptotic results similar to those in Phillips (1995) can be obtained for F_i. The result is summarized in the following theorem.
Theorem 3.1. Under Assumption H, we have

    √T (F̂⁺_1i − F_1i) →_d N(0, (I_m ⊗ Σ_11^{-1}) Φ_ee (I_m ⊗ Σ_11^{-1}) + (I_m ⊗ Σ_11^{-1}) Φ_ff (I_m ⊗ Σ_11^{-1})),

    T (F̂⁺_2i − F_2i) →_d Λ_i (∫₀¹ dB_f·2 B_2')(∫₀¹ B_2 B_2')^{-1} + (∫₀¹ dB_e·2 B_2')(∫₀¹ B_2 B_2')^{-1},

where Σ_11 = E(X_i1t X_i1t'), Φ_ff = Σ_{j=-∞}^{∞} E(Λ_i f_{t+j} f_t' Λ_i' ⊗ X_i1t+j X_i1t'), Φ_ee = Σ_{j=-∞}^{∞} E(e_it+j e_it' ⊗ X_i1t+j X_i1t'), B_f·2 = B_f − Ω_f2 Ω_22^{-1} B_2, B_e·2 = B_e − Ω_e2 Ω_22^{-1} B_2, and B_2 ≡ BM(Ω_22). B_f and B_e are the corresponding vector Brownian motions in w_it (= H_T' w_it).
The proof of Theorem 3.1 is a straightforward extension of Theorem 5.1 in Phillips (1995) to the composite error e*_it = Λ_i f_t + e_it and is thus omitted. In practice, we need not know the nonstationarity properties of the data, and the estimator F̂⁺_i takes the following form:

    F̂⁺_i = [Y_i'Z_i ⋮ Y_i'Y_{i,-1} − T Λ̂_i (Δ̂_{fΔy₋₁} − Ω̂_{fΔy₋₁} Ω̂_{Δy₋₁Δy₋₁}^{-1} Δ̂_{Δy₋₁Δy₋₁})
             − T (Δ̂_{eΔy₋₁} − Ω̂_{eΔy₋₁} Ω̂_{Δy₋₁Δy₋₁}^{-1} Δ̂_{Δy₋₁Δy₋₁})] (X_i'X_i)^{-1},   (3.7)

where the serial correlation and endogeneity corrections are applied separately to the factor and error components of the composite error. If the IV estimator is used in the first stage, the T in (3.7) should be (T − 2). The distribution of the FM estimator in the original coordinates is given in the following corollary.
Corollary 3.1. Under the conditions of Theorem 3.1, we have

    √T (F̂⁺_1i − F_1i) G →_d N(0, (I_m ⊗ Σ_11^{-1}) Φ_ee (I_m ⊗ Σ_11^{-1}) + (I_m ⊗ Σ_11^{-1}) Φ_ff (I_m ⊗ Σ_11^{-1})),

    T (F̂⁺_2i − F_2i) G_⊥ →_d Λ_i (∫₀¹ dB_f·2 B_2')(∫₀¹ B_2 B_2')^{-1} + (∫₀¹ dB_e·2 B_2')(∫₀¹ B_2 B_2')^{-1},

where

    G = [ I_{q-1} ⊗ H_T   0
          0               β ],    G_⊥ = [0 ⋮ β_⊥],

and B_f·2 and B_e·2 are defined as in Theorem 3.1.
(3.7) gives the estimates of a nonstationary VAR with an unknown mixture of I(0) and I(1) variables. We may also have I(2) variables in applications, and the above method does not work in that case. However, it is possible to use the residual-based fully modified VAR procedure of Chang (2000) for estimation purposes, where an unknown mixture of I(0), I(1), and I(2) variables is allowed.

The endogeneity and serial correlation corrections in (3.7) are taken with respect to both the factors and the error terms, which is equivalent to making the corrections with respect to the composite error term e*_it = Λ_i f_t + e_it. In other words, FM-VAR ignoring the cross-sectional dependence yields the same solution as (3.7). But that argument rests on a procedure that makes no use of the estimated factors. There are several ways in which FM-VAR with and without cross-sectional dependence can differ. One is to consider the dynamic behavior of the factors: limited dynamics in f_t and e_it are assumed, but can be tested. For forecasting purposes, one can use the method in Galbraith et al. (2002) to recover the moving average coefficients and improve forecasting performance using the information in the factors and errors. The other is to use the factor-augmented regression to improve the estimation even if there is no dynamic behavior in either the shocks or the errors. Once the common shocks are estimated, we can consider the following factor-augmented regression:

    y_it = a_i + J*_i(L) Δy_{i,t-1} + A_i y_{i,t-1} + Λ_i f_t + e_it.   (3.8)

The FM-VAR for (3.8) needs to correct for endogeneity and serial correlation only with respect to e_it, and the estimated factor is added as a stationary regressor. Generally, we expect the factor-augmented regression to give better estimation results. The simulation in Section 4 shows that this argument holds conditional on the nature of the data generating process and on how precise the factor analysis is.
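To make the augmented step concrete, the following sketch (our own illustration; it uses plain OLS rather than the FM correction, and it feeds in the factors directly, where in practice the estimated factors f̃_t would be used) fits (3.8) for one unit:

```python
import numpy as np

def factor_augmented_ols(y, f_hat, q):
    """OLS for the factor-augmented VAR (3.8) for one unit:
    y_t = a + sum_{h=1}^{q-1} J*_h dy_{t-h} + A y_{t-1} + Lam f_t + e_t.

    y     : (T, m) series for unit i
    f_hat : (T, r) factors (estimated in practice; true here for illustration)
    q     : VAR order
    Returns the stacked coefficient matrix with rows (const, J*'s, A', Lam').
    """
    dy = np.diff(y, axis=0)                  # dy[s] = y[s+1] - y[s]
    T = y.shape[0]
    X = []
    for t in range(q, T):
        regs = [np.ones(1)]
        for h in range(1, q):                # lagged differences
            regs.append(dy[t - h - 1])       # dy_{t-h}
        regs.append(y[t - 1])                # level lag
        regs.append(f_hat[t])                # augmenting factor
        X.append(np.concatenate(regs))
    X = np.array(X)
    Y = y[q:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B
```

With q = 1 the lagged-difference block is empty and the regressors reduce to an intercept, y_{t-1}, and f_t, as in the simulation DGP (4.1) below.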
4 Simulation Results
In this section we study the finite sample properties of both the vector factor analysis and the FM-VAR with cross-sectional dependence.

Three cases for the estimation of the number of factors are considered in the simulation; the estimated number of factors, k, is simulated 1000 times and its average is reported in Tables 1, 2, and 3, respectively. The maximum number of factors during estimation is set to 10. In the first case, we consider the following data generating process:

    x_itp = λ_ip' f_t + e_itp,   i = 1, ..., N;  t = 1, ..., T;  p = 1, ..., m,

where each element of f_t, λ_ip and e_it is generated as i.i.d. N(0,1) for all i, t, p. m is set to 5, and N and T take different values. The number of cross-sectional shocks, r, is set to 5. The factor number selection criteria are
modified from those in Bai and Ng (2002):

    PC_p1(k) = V(k, f̂^k) + k σ̂² ((mN + T)/(mNT)) log(mNT/(mN + T))
    PC_p2(k) = V(k, f̂^k) + k σ̂² ((mN + T)/(mNT)) log(C²_mNT)
    PC_p3(k) = V(k, f̂^k) + k σ̂² log(C²_mNT)/C²_mNT
    IC_p1(k) = log(V(k, f̂^k)) + k ((mN + T)/(mNT)) log(mNT/(mN + T))
    IC_p2(k) = log(V(k, f̂^k)) + k ((mN + T)/(mNT)) log(C²_mNT)
    IC_p3(k) = log(V(k, f̂^k)) + k log(C²_mNT)/C²_mNT
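As an illustration of how the modified criteria are applied, the sketch below selects k with the IC_p2 rule for a T × mN panel (our own code, assuming the data matrix has already been stacked as in (2.5)):

```python
import numpy as np

def select_num_factors(x, kmax):
    """Select the number of factors with IC_p2, with mN (the number of
    series times the cross-section size) playing the role of N.

    x    : (T, mN) stacked data matrix
    kmax : maximum number of factors considered
    """
    T, mN = x.shape
    w, v = np.linalg.eigh(x @ x.T)           # eigenvectors, ascending order
    C2 = min(mN, T)                          # C_mNT squared
    best_k, best_ic = 1, np.inf
    for k in range(1, kmax + 1):
        f = np.sqrt(T) * v[:, -k:]           # k leading principal components
        lam = x.T @ f / T
        resid = x - f @ lam.T
        V = (resid ** 2).sum() / (mN * T)    # V(k, f^k) from (2.6)
        ic = np.log(V) + k * ((mN + T) / (mN * T)) * np.log(C2)
        if ic < best_ic:
            best_ic, best_k = ic, k
    return best_k
```

The PC_p criteria differ only in multiplying the penalty by a variance estimate σ̂² (e.g. V(kmax, f̂^kmax)) instead of taking logs.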
Table 1 gives the results for the homoskedastic case. All six criteria consistently select the number of factors except when T is very small. In Table 2, we consider the case of group-specific heteroskedasticity, where each group's variance is generated from a chi-square distribution with 2 degrees of freedom. The generated chi-square variable is then multiplied by the standard normal random variable to generate the error term. Elements of f_t and λ_ip are still drawn from N(0,1). We observe that, for the heteroskedasticity case considered here, the six criteria still consistently select the number of factors when the number of cross-sectional units is large and T is moderately large. When the number of cross-sectional units is small, they all select the wrong number even when T is large. This suggests that when there is strong heteroskedasticity across groups, the number of cross-sectional units has to be large enough for consistent selection of the number of factors. The performance of the six criteria is worse than in Table IV of Bai and Ng (2002); the reason is that the heteroskedasticity in the data generating process (DGP) of Table 2 is stronger. In Table 3, the factor f_t follows an MA(1) process whose MA coefficient matrix is generated from N(0,1); λ_ip and e_it are drawn from N(0,1). We see from the table that in most cases all six criteria give the correct answer. As in Table 1, if T is too small, the criterion functions fail.
Next, we study the finite sample properties of the FM-VAR with cross-sectional shocks. For simplicity, we consider a bivariate VARMA(1,1) model with the number of factors set to 1; (3.1) reduces to

    y_it = A_i y_{i,t-1} + λ_i f_t + e_it,    (4.1)
    f_t = θ_f u_{f,t-1} + u_ft,               (4.2)
    e_it = θ_e u_{e,t-1} + u_et.              (4.3)

The coefficient matrix is chosen from Binder et al. (2003):

    A_i = [  0.5  0.1
            -0.5  1.1 ],

with α = (-0.5, -0.5)' and β = (1, -0.2)', and we assume parameter homogeneity across different groups to simplify the DGP. Each element of λ_i follows N(0.1, 1) or N(0.5, 1). (u_ft, u_et')' follows

    ( u_ft )           ( 1    0    0
    ( u_et ) ~ N( 0,     0    1    0.8
                         0    0.8  1   ),

where we allow for correlation between the elements of u_et, but correlation between u_ft and u_et is not allowed. A value of 5 for the variance of u_ft is also tried in the simulations. θ_f = 0.2 or 0.8 in the simulations. θ_e takes the following value:

    θ_e = [ 0.3    0.4
            θ_e21  0.6 ],

where θ_e21 = -0.8, 0, or 0.8 in the simulations. This value of θ_e is taken from Phillips and Loretan (1991). Finally, we let N = 50, 100 and T = 50, 100, 200 and use different combinations of (N, T) in the simulations. Nine simulation tables for FM-VAR are reported in this section. We number these nine DGPs from DGP4 to DGP12, corresponding to the numbering of the tables.

            N     σ²_f   θ_f   θ_e21   λ_i
    DGP4    50    1      0.2   -0.8    N(0.1, 1)
    DGP5    50    5      0.2   -0.8    N(0.1, 1)
    DGP6    50    5      0.6   -0.8    N(0.1, 1)
    DGP7    50    5      0.2    0.8    N(0.1, 1)
    DGP8    50    5      0.2    0.8    N(0.1, 1)
    DGP9    50    5      0.8    0      N(0.5, 1)
    DGP10   100   5      0.8    0      N(0.5, 1)
    DGP11   50    1      0.8    0      N(0.5, 1)
    DGP12   100   1      0.8    0      N(0.5, 1)
The long-run and one-sided long-run variance-covariance matrices in (3.7) are calculated with the KERNEL procedure in COINT 2.0, where we use the Parzen kernel and an arbitrary lag truncation number of 5. Simulations using other kernels give similar results and are not reported in this paper.

In each table, we report the bias and standard deviation (s.d.) of each element of A_i over 1000 replications. Besides the FM estimator and the factor-augmented (AUG) FM estimator, the first-stage instrumental variable (IV) estimator and the continuously-updated estimator (UFM) of Bai and Kao (2004) are also considered. The continuously-updated estimator is obtained through an iteration procedure: the FM estimator is used as the initial estimator, and by reestimating the factors we obtain another FM estimator; the procedure is repeated until convergence is reached. The maximum number of iteration steps is set to 30, and the convergence criterion is 0.001.
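The iteration just described has a simple fixed-point structure. The skeleton below is a generic sketch of that loop (our own code; the two callables stand in for the FM coefficient step and the factor re-estimation step):

```python
import numpy as np

def continuously_updated(estimate_coefs, reestimate_factors, f0,
                         max_iter=30, tol=1e-3):
    """Skeleton of the continuously-updated (CUP) loop: alternate the
    coefficient step and the factor step until the maximum absolute
    coefficient change falls below tol or max_iter is reached.

    estimate_coefs(factors)   -> coefficient array (stand-in for FM step)
    reestimate_factors(coefs) -> factor array (stand-in for factor analysis)
    f0 : initial factors, e.g. from the first-stage residuals
    """
    factors = f0
    coefs = estimate_coefs(factors)          # initial (first-stage) estimate
    for _ in range(max_iter):
        factors = reestimate_factors(coefs)  # update factors given coefs
        new_coefs = estimate_coefs(factors)  # update coefs given factors
        converged = np.max(np.abs(new_coefs - coefs)) < tol
        coefs = new_coefs
        if converged:
            break
    return coefs, factors
```

In the paper's setting each step is itself an FM estimation, so the cost per iteration is nontrivial; the cap of 30 iterations bounds the total work.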
In Table 4, when T is small, 3 out of 4 parameter estimates from the augmented regression are better than those from the other methods. When T is large (T = 200), the FM estimator catches up in two parameter estimates, A_{i,11} and A_{i,12}. If the variance σ²_f is increased, as in Table 5, the augmented estimator performs worse than the FM estimator and the updated FM estimator. Only when T is large does the augmented regression give 1 out of 4 parameter estimates that is better than the FM estimator. The reason is that when the variance of the cross sectional shock is increased, it becomes harder for factor analysis to extract the covariability across groups. If we increase the magnitude of θ_f from 0.2 to 0.6, as in Table 6, there is some improvement in the performance of the augmented regression, and it occurs only when T is large. Overall, no estimator dominates the others in Table 6. If we change σ_{ε22} to 0.8, the simulation results in Tables 7 and 8 still give mixed results for the performance of the augmented regression relative to the other three estimators. The simulation tables reported so far do not support a positive conclusion on the effectiveness of the factor augmented regression in terms of bias, though its MSE is smaller. A further look at the DGPs gives the answer. Notice that in DGPs 4 to 8 we have a very noisy error process, and the variance in Tables 5 to 8 is also large. As argued previously, a large variance decreases the efficiency of factor analysis, though the number of factors can still be estimated; in addition, a noisy error term will weaken the signal from the cross sectional shock if the magnitude of the error, ε_it, relative to the cross sectional shock, f_t, is large.
These findings suggest investigating the performance of the factor augmented regression when the signal from f_t is strong compared to that from ε_it and the variance of f_t is small. It is expected that in these cases the factor analysis can efficiently extract the cross sectional shocks and the augmented regression gives better results. This is indeed what Tables 9-12 show. In Tables 9 and 10 we have the same DGP but different sample sizes. It is clear that as T gets larger, the augmented regression performs better than the other methods in 3 out of 4 parameter estimates. Even for the parameter A_{22}, the bias of AUG is not large, 0.0071. We also try the case when the variance of f_t is small; the results are reported in Tables 11 and 12, where AUG gives comparable performance to the other three methods for some estimated parameters and substantial improvement for the others.
5 Conclusion

This paper studies cross sectional dependence in panel VAR using the method of factor analysis and makes the following contributions to the literature: 1) we extend factor analysis to the vector process; 2) we give the limiting distribution of the FM-VAR estimator with cross sectional dependence; 3) the finite sample properties of IV, FM, UFM and AUG are investigated through simulation. Our findings are: 1) The vector factor analysis proposed in this paper gives a simpler approach than that in Bai and Ng (2002) for analyzing data with a panel VAR structure. Moreover, the approach is suitable for efficient factor analysis. The simulation results for both factor analysis, especially those in Table 2, and FM estimation indicate unsatisfactory performance of factor analysis under certain DGPs, and it is hoped that the efficient method can take both heteroskedasticity and serial correlation into consideration and give better estimates. 2) As in Phillips (1995), we give the asymptotic results of the estimators for both stationary and nonstationary variables in panel VAR. Again, they are found to have a normal mixture distribution in the limit. 3) We find that the six modified factor number selection criteria generally select the true number of factors, but give wrong answers for some DGPs with the heteroskedasticity and serial correlation considered in this paper. Tables 9 to 12 show that the augmented regression performs reasonably well when the cross sectional signal is strong relative to the magnitude of the heteroskedasticity and serial correlation in the error term. However, if there is a strong presence of heteroskedasticity and serial correlation, the factor analysis cannot extract the factors efficiently and the augmented regression performs worse than the FM-VAR estimators.
There are several important questions that remain unanswered in the literature on panel VAR with cross sectional dependence. One is testing for the existence of cross sectional shocks and assessing the magnitude of those shocks, which is important before we apply the augmented regression. This can possibly be done using the approach in Ng (2004). Phillips and Sul (2003) give the analytic bias for dynamic panels ignoring cross sectional dependence, and an extension of their work to panel VAR is also interesting. A comparison of the finite sample properties of the bias-corrected estimator and the four estimators considered in this paper is necessary. Another interesting direction is the extension of the results in this paper to nonstationary cross sectional dependence, where we may obtain results similar to those in Bai and Ng (2004). The properties of the more efficient factor analysis as in (2.11) also need to be investigated. It is important to note that for many microeconomic data sets the time dimension is short, and an extension of the work in Binder et al. (2003) to cross-section dependence is necessary. We hope to study these issues in subsequent work.
Appendix A
Lemma A.1 Under Assumptions A-C, we have
\begin{align*}
&(i)\;\; T^{-1}\sum_s\sum_t \gamma_{mN}(s,t)^2 \le M_1,\\
&(ii)\; E\Big(T^{-1}\sum_t \Big\| (mN)^{-1/2}\sum_i\sum_p e_{itp}\lambda_{ip}' \Big\|^2\Big) \le M_1,\\
&(iii)\, E\Big(T^{-2}\sum_t\sum_s \Big((mN)^{-1}\sum_i\sum_p x_{itp}x_{isp}\Big)^2\Big) \le M_1.
\end{align*}
Proof of Lemma A.1. The proof follows the same lines as that of Lemma A.1 in Bai and Ng (2002). □

Lemma A.2 For any fixed $k \ge 1$, there exists an $r \times k$ matrix $H^k$ with $\operatorname{rank} = \min(k,r)$ and $C_{mNT} = \min\big(\sqrt{mN}, \sqrt{T}\big)$, such that
\[
C_{mNT}^2\Big(T^{-1}\sum_t \big\|\tilde f_t^k - H^{k\prime} f_t\big\|^2\Big) = O_p(1). \tag{A.1}
\]

Proof of Lemma A.2. This lemma differs from Theorem 1 in Bai and Ng (2002) by the extra $m$, the dimension of the VAR process. The proof is analogous to that of Bai and Ng (2002), except that there is one more dimension. Let $H^{k\prime} = (\tilde f^{k\prime} f/T)(\Lambda'\Lambda/mN)$; then it can be verified that the following identity holds:
\[
\tilde f_t^k - H^{k\prime} f_t = T^{-1}\sum_{s=1}^T \tilde f_s^k\,\gamma_{mN}(s,t) + T^{-1}\sum_{s=1}^T \tilde f_s^k\,\zeta_{st} + T^{-1}\sum_{s=1}^T \tilde f_s^k\,\eta_{st} + T^{-1}\sum_{s=1}^T \tilde f_s^k\,\xi_{st}, \tag{A.2}
\]
where
\begin{align}
\zeta_{st} &= \operatorname{vec}(e_s')'\operatorname{vec}(e_t')/mN - \gamma_{mN}(s,t), \tag{A.3}\\
\eta_{st} &= (mN)^{-1} f_s'\Lambda'\operatorname{vec}(e_t'), \nonumber\\
\xi_{st} &= (mN)^{-1} f_t'\Lambda'\operatorname{vec}(e_s'). \nonumber
\end{align}
Using the inequality $(w+x+y+z)^2 \le 4(w^2+x^2+y^2+z^2)$, it is easy to see that $\|\tilde f_t^k - H^{k\prime} f_t\|^2 \le 4(a_t+b_t+c_t+d_t)$, where
\[
a_t = T^{-2}\Big\|\sum_s \tilde f_s^k\,\gamma_{mN}(s,t)\Big\|^2,\quad
b_t = T^{-2}\Big\|\sum_s \tilde f_s^k\,\zeta_{st}\Big\|^2,\quad
c_t = T^{-2}\Big\|\sum_s \tilde f_s^k\,\eta_{st}\Big\|^2,\quad
d_t = T^{-2}\Big\|\sum_s \tilde f_s^k\,\xi_{st}\Big\|^2.
\]
Then $T^{-1}\sum_t \|\tilde f_t^k - H^{k\prime} f_t\|^2 \le 4\,T^{-1}\sum_t (a_t+b_t+c_t+d_t)$.

From Lemma A.1(i) and the Cauchy-Schwarz inequality, $T^{-1}\sum_t a_t = O_p(T^{-1})$. For $b_t$,
\[
T^{-1}\sum_t b_t = T^{-3}\sum_t\sum_s\sum_u \tilde f_s^{k\prime}\tilde f_u^k\,\zeta_{st}\zeta_{ut}
\le \Big(T^{-1}\sum_s \|\tilde f_s^k\|^2\Big)\Big(T^{-4}\sum_s\sum_u \Big(\sum_t \zeta_{st}\zeta_{ut}\Big)^2\Big)^{1/2}.
\]
But
\[
E\Big(\sum_t \zeta_{st}\zeta_{ut}\Big)^2 \le T^2\max_{s,t} E|\zeta_{st}|^4
\le T^2\max_{s,t}\,(mN)^{-2}\, E\Big|(mN)^{-1/2}\sum_i\sum_p \big(e_{itp}e_{isp} - E(e_{itp}e_{isp})\big)\Big|^4
\le T^2(mN)^{-2}M.
\]
Thus $T^{-1}\sum_t b_t = O_p\big((mN)^{-1}\big)$. Consider $c_t$:
\begin{align*}
c_t &= T^{-2}\Big\|(mN)^{-1}\Lambda'\operatorname{vec}(e_t')\sum_s \tilde f_s^k f_s'\Big\|^2\\
&\le (mN)^{-2}\big\|\Lambda'\operatorname{vec}(e_t')\big\|^2\Big(T^{-1}\sum_s\|\tilde f_s^k\|^2\Big)\Big(T^{-1}\sum_s\|f_s\|^2\Big)
= (mN)^{-2}\big\|\Lambda'\operatorname{vec}(e_t')\big\|^2\cdot O_p(1).
\end{align*}
Then
\[
T^{-1}\sum_t c_t = O_p(1)\cdot(mN)^{-1}\,T^{-1}\sum_t \big\|(mN)^{-1/2}\Lambda'\operatorname{vec}(e_t')\big\|^2 = O_p\big((mN)^{-1}\big)
\]
by Lemma A.1(ii), and the result for $d_t$ can be proved in a similar way. Combining the results for $a_t$ to $d_t$ and plugging them into (A.2), we have the result in Lemma A.2. □

Lemma A.3 With $1 \le k \le r$,
\[
V\big(k, \tilde f^k\big) - V\big(k, fH^k\big) = O_p\big(C_{mNT}^{-1}\big). \tag{A.4}
\]
Proof. Similar to Lemma 2 in Bai and Ng (2002), we have
\begin{align*}
V\big(k,\tilde f^k\big) &= (mNT)^{-1}\sum_i \operatorname{vec}(x_i)'\big(I_m \otimes M_{\tilde f}^k\big)\operatorname{vec}(x_i),\\
V\big(k, fH^k\big) &= (mNT)^{-1}\sum_i \operatorname{vec}(x_i)'\big(I_m \otimes M_{fH}\big)\operatorname{vec}(x_i),\\
V\big(k,\tilde f^k\big) - V\big(k,fH^k\big) &= (mNT)^{-1}\sum_i \operatorname{vec}(x_i)'\big(I_m \otimes (P_{fH} - P_{\tilde f}^k)\big)\operatorname{vec}(x_i),
\end{align*}
where $M_{fH} = I_T - P_{fH}$ and $M_{\tilde f}^k = I_T - \tilde f^k(\tilde f^{k\prime}\tilde f^k)^{-1}\tilde f^{k\prime} = I_T - P_{\tilde f}^k$. Let $D_k = \tilde f^{k\prime}\tilde f^k/T$ and $D = H^{k\prime}f'fH^k/T$; following the same decomposition as in Bai and Ng (2002), we have
\begin{align}
P_{\tilde f}^k - P_{fH} = T^{-1}\big[&\big(\tilde f^k - fH^k\big)D_k^{-1}\big(\tilde f^k - fH^k\big)' + \big(\tilde f^k - fH^k\big)D_k^{-1}H^{k\prime}f' \nonumber\\
&+ fH^kD_k^{-1}\big(\tilde f^k - fH^k\big)' + fH^k\big(D_k^{-1} - D^{-1}\big)H^{k\prime}f'\big] \tag{A.5}
\end{align}
and $(mNT)^{-1}\sum_i \operatorname{vec}(x_i)'\big(I_m \otimes (P_{fH} - P_{\tilde f}^k)\big)\operatorname{vec}(x_i) = I + II + III + IV$. Define the $T\times 1$ vector $x_{i\cdot p} = (x_{i1p}, x_{i2p}, \ldots, x_{iTp})'$. Then
\begin{align*}
I &= (mNT)^{-1}\sum_i\sum_p x_{i\cdot p}'\big(\tilde f^k - fH^k\big)D_k^{-1}\big(\tilde f^k - fH^k\big)'x_{i\cdot p}\\
&\le \Big(T^{-2}\sum_t\sum_s \|\tilde f_t^k - H^{k\prime}f_t\|^2\|\tilde f_s^k - H^{k\prime}f_s\|^2\|D_k^{-1}\|^2\Big)^{1/2}
\Big(T^{-2}\sum_t\sum_s\Big((mN)^{-1}\sum_i\sum_p x_{itp}x_{isp}\Big)^2\Big)^{1/2}\\
&= O_p\big(C_{mNT}^{-2}\big)
\end{align*}
by Lemma A.1(iii) and Lemma A.2. Next,
\begin{align*}
II &= (mNT)^{-1}\sum_i\sum_p\sum_t\sum_s \big(\tilde f_t^k - H^{k\prime}f_t\big)'D_k^{-1}H^{k\prime}f_s\,x_{itp}x_{isp}\\
&\le \Big(T^{-1}\sum_t \|\tilde f_t^k - H^{k\prime}f_t\|^2\Big)^{1/2}\|D_k^{-1}\|\Big(T^{-1}\sum_s\|H^{k\prime}f_s\|^2\Big)^{1/2}
\Big(T^{-2}\sum_t\sum_s\Big((mN)^{-1}\sum_i\sum_p x_{itp}x_{isp}\Big)^2\Big)^{1/2}\\
&= O_p\big(C_{mNT}^{-1}\big) = III,
\end{align*}
and
\begin{align*}
IV &= (mNT)^{-1}\sum_i\sum_p\sum_t\sum_s f_t'H^k\big(D_k^{-1} - D^{-1}\big)H^{k\prime}f_s\,x_{itp}x_{isp}\\
&\le \|D_k^{-1} - D^{-1}\|\sum_p\Big((mN)^{-1}\sum_i\Big(T^{-1}\sum_t \|f_t'H^k\|\,|x_{itp}|\Big)^2\Big)
= O_p\big(C_{mNT}^{-1}\big),
\end{align*}
where the property $\|D_k - D\| = O_p\big(C_{mNT}^{-1}\big)$ is proved in Bai and Ng (2002). The result in Lemma A.3 follows from combining $I$, $II$, $III$ and $IV$. □

Lemma A.4 For each $k < r$, there exists a $\tau_k > 0$ such that
\[
\operatorname*{plim\,inf}_{N,T\to\infty}\; V\big(k, fH^k\big) - V(r, f) = \tau_k. \tag{A.6}
\]
Proof. Again, this lemma is identical to Lemma 3 in Bai and Ng (2002); we prove a vector version of it. Define the $T\times 1$ vector $e_{i\cdot p} = (e_{i1p}, \ldots, e_{iTp})'$, and write $x_{i\cdot p} = f\lambda_{ip} + e_{i\cdot p}$. Then
\begin{align*}
V\big(k, fH^k\big) - V(r, f) &= (mNT)^{-1}\sum_i \operatorname{vec}(x_i)'\big(I_m \otimes (P_f - P_{fH})\big)\operatorname{vec}(x_i)\\
&= (mNT)^{-1}\sum_i\sum_p \big(\lambda_{ip}'f' + e_{i\cdot p}'\big)(P_f - P_{fH})\big(f\lambda_{ip} + e_{i\cdot p}\big)\\
&= (mNT)^{-1}\sum_i\sum_p \lambda_{ip}'f'(P_f - P_{fH})f\lambda_{ip}\\
&\quad + 2(mNT)^{-1}\sum_i\sum_p e_{i\cdot p}'(P_f - P_{fH})f\lambda_{ip}\\
&\quad + (mNT)^{-1}\sum_i\sum_p e_{i\cdot p}'(P_f - P_{fH})e_{i\cdot p}\\
&= I + II + III.
\end{align*}
$I > 0$ and $III > 0$; the proof follows Bai and Ng (2002). Consider the second term $II$:
\[
II = \sum_p\Big(2(mNT)^{-1}\sum_i e_{i\cdot p}'P_f f\lambda_{ip} - 2(mNT)^{-1}\sum_i e_{i\cdot p}'P_{fH}f\lambda_{ip}\Big),
\]
and the first term in the brackets satisfies
\begin{align*}
\Big|(mNT)^{-1}\sum_i e_{i\cdot p}'P_f f\lambda_{ip}\Big|
&= \Big|T^{-1}\sum_t f_t'\Big((mN)^{-1}\sum_i e_{itp}\lambda_{ip}\Big)\Big|\\
&\le \Big(T^{-1}\sum_t\|f_t\|^2\Big)^{1/2}(mN)^{-1/2}\Big(T^{-1}\sum_t\Big\|(mN)^{-1/2}\sum_i e_{itp}\lambda_{ip}\Big\|^2\Big)^{1/2}
= O_p\big((mN)^{-1/2}\big)
\end{align*}
by Lemma A.1(ii). The second term is of the same order. Thus $II = O_p\big((mN)^{-1/2}\big)$. □
Lemma A.5 For any $k \ge r$, $V\big(k, \tilde f^k\big) - V\big(r, \tilde f^r\big) = O_p\big(C_{mNT}^{-2}\big)$.

Proof. See Bai and Ng (2002). □

Proof of Theorem 2.1. Follows from the proof of Bai and Ng (2002) and the results in Lemmas A.1 to A.5. □

Lemma A.6 Under Assumptions A-E, we have
\begin{align*}
&(i)\;\; T^{-1}\sum_s \tilde f_s\,\gamma_{mN}(s,t) = O_p\big(T^{-1/2}C_{mNT}^{-1}\big),\\
&(ii)\; T^{-1}\sum_s \tilde f_s\,\zeta_{st} = O_p\big((mN)^{-1/2}C_{mNT}^{-1}\big),\\
&(iii)\, T^{-1}\sum_s \tilde f_s\,\eta_{st} = O_p\big((mN)^{-1/2}\big),\\
&(iv)\, T^{-1}\sum_s \tilde f_s\,\xi_{st} = O_p\big((mN)^{-1/2}C_{mNT}^{-1}\big).
\end{align*}
Proof. (i) follows from the decomposition
\[
T^{-1}\sum_s \tilde f_s\,\gamma_{mN}(s,t) = T^{-1}\sum_s \big(\tilde f_s - H'f_s\big)\gamma_{mN}(s,t) + T^{-1}H'\sum_s f_s\,\gamma_{mN}(s,t)
\]
and applying the Cauchy-Schwarz inequality, Lemma A.2, and Assumptions A and D. (ii)-(iv) can be proved in a similar way. □

Let $V_{mNT}$ be the $r\times r$ diagonal matrix of the first $r$ eigenvalues of the matrix $(mNT)^{-1}xx'$ and define $H = (\Lambda'\Lambda/mN)\big(f'\tilde f/T\big)V_{mNT}^{-1}$. Assuming at this stage that the number of factors, $r$, can be consistently estimated using the result in Theorem 2.1, we can drop the superscript $k$ in (A.2) and write it as
\[
\tilde f_t - H'f_t = V_{mNT}^{-1}\Big(T^{-1}\sum_s \tilde f_s\,\gamma_{mN}(s,t) + T^{-1}\sum_s \tilde f_s\,\zeta_{st} + T^{-1}\sum_s \tilde f_s\,\eta_{st} + T^{-1}\sum_s \tilde f_s\,\xi_{st}\Big). \tag{A.7}
\]
The proofs of the distributions of the factor loadings and the common components require additional lemmas.

Lemma A.7 Under Assumptions A-E, we have
\[
T^{-1}\big(\tilde f - fH\big)'e_{i\cdot p} = O_p\big(C_{mNT}^{-2}\big).
\]
Proof. From (A.7), we have
\begin{align*}
T^{-1}\sum_t \big(\tilde f_t - H'f_t\big)e_{itp}
= V_{mNT}^{-1}\Big[& T^{-2}\sum_t\sum_s \tilde f_s\,\gamma_{mN}(s,t)\,e_{itp} + T^{-2}\sum_t\sum_s \tilde f_s\,\zeta_{st}\,e_{itp}\\
&+ T^{-2}\sum_t\sum_s \tilde f_s\,\eta_{st}\,e_{itp} + T^{-2}\sum_t\sum_s \tilde f_s\,\xi_{st}\,e_{itp}\Big]
= V_{mNT}^{-1}(I + II + III + IV).
\end{align*}
Consider the term $I$:
\begin{align*}
I &= T^{-2}\sum_t\sum_s \big(\tilde f_s - H'f_s\big)\gamma_{mN}(s,t)e_{itp} + T^{-2}\sum_t\sum_s H'f_s\,\gamma_{mN}(s,t)e_{itp}\\
&\le T^{-1/2}\Big(T^{-1}\sum_s \|\tilde f_s - H'f_s\|^2\Big)^{1/2}\Big(T^{-2}\sum_s\Big(\sum_t \gamma_{mN}(s,t)e_{itp}\Big)^2\Big)^{1/2}\\
&\quad\; + T^{-2}\sum_t\sum_s |\gamma_{mN}(s,t)|\big(E\|f_s\|^2\big)^{1/2}\big(E(e_{itp}^2)\big)^{1/2}\\
&= T^{-1/2}O_p\big(C_{mNT}^{-1}\big) + O\big(T^{-1}\big) = T^{-1/2}O_p\big(C_{mNT}^{-1}\big).
\end{align*}
The second term $II$:
\begin{align*}
II &= T^{-2}\sum_t\sum_s \big(\tilde f_s - H'f_s\big)\zeta_{st}e_{itp} + T^{-2}\sum_t\sum_s H'f_s\,\zeta_{st}e_{itp}\\
&\le \Big(T^{-1}\sum_s \|\tilde f_s - H'f_s\|^2\Big)^{1/2}\Big(T^{-1}\sum_s\Big(T^{-1}\sum_t \zeta_{st}e_{itp}\Big)^2\Big)^{1/2} + O_p\big((mNT)^{-1/2}\big)\\
&= O_p\big((mN)^{-1/2}C_{mNT}^{-1}\big).
\end{align*}
For $III$,
\[
III = T^{-2}\sum_t\sum_s \big(\tilde f_s - H'f_s\big)\eta_{st}e_{itp} + T^{-2}\sum_t\sum_s f_s\,\eta_{st}e_{itp} = III_1 + III_2,
\]
and
\begin{align*}
III_1 &\le \Big(T^{-1}\sum_s \|\tilde f_s - H'f_s\|^2\Big)^{1/2}\Big(T^{-1}\sum_s\Big(T^{-1}\sum_t \eta_{st}e_{itp}\Big)^2\Big)^{1/2}\\
&= O_p\big(C_{mNT}^{-1}\big)\Big(T^{-1}\sum_s\Big(T^{-1}\sum_t \Big\|(mN)^{-1}\sum_j\sum_q \lambda_{jq}e_{jtq}\Big\|\,|e_{itp}|\Big)^2\Big)^{1/2}
= O_p\big((mN)^{-1/2}C_{mNT}^{-1}\big),\\
III_2 &= H'\Big(T^{-1}\sum_s f_sf_s'\Big)\Big((mNT)^{-1}\sum_t\sum_j\sum_q \lambda_{jq}e_{jtq}e_{itp}\Big)\\
&\le H'\Big(T^{-1}\sum_s f_sf_s'\Big)(mN)^{-1}\sum_j\sum_q |\tau_{ij,pq}| = O\big((mN)^{-1}\big).
\end{align*}
Hence $III = O_p\big((mN)^{-1/2}C_{mNT}^{-1}\big) + O\big((mN)^{-1}\big)$. The order of $IV$ can be obtained similarly to that of $III$. Thus $I + II + III + IV = O_p\big(C_{mNT}^{-2}\big)$. □
Lemma A.8 Under Assumptions A-E, we have
\[
T^{-1}\big(\tilde f - fH\big)'f = O_p\big(C_{mNT}^{-2}\big).
\]
Proof. Rewrite it as in (A.2):
\begin{align*}
T^{-1}\sum_t \big(\tilde f_t - H'f_t\big)f_t'
= V_{mNT}^{-1}\Big(&T^{-2}\sum_t\sum_s \tilde f_sf_t'\,\gamma_{mN}(s,t) + T^{-2}\sum_t\sum_s \tilde f_sf_t'\,\zeta_{st}\\
&+ T^{-2}\sum_t\sum_s \tilde f_sf_t'\,\eta_{st} + T^{-2}\sum_t\sum_s \tilde f_sf_t'\,\xi_{st}\Big)
= V_{mNT}^{-1}(I + II + III + IV).
\end{align*}
The proof is then similar to that of Lemma A.7 and is omitted here. □

Proof of Theorem 2.2. Using the result in Lemma A.6, (A.7) can be written as
\[
\sqrt{mN}\,\big(\tilde f_t - H'f_t\big) = V_{mNT}^{-1}\Big(T^{-1}\sum_s \tilde f_sf_s'\Big)(mN)^{-1/2}\sum_i\sum_p \lambda_{ip}e_{itp} + o_p(1). \tag{A.8}
\]
Under Assumption E3 and using Lemma A.3 in Bai (2003), the first term on the r.h.s. of (A.8) is distributed as $N\big(0,\; V^{-1}Q\,\Gamma_t\,Q'V^{-1}\big)$.
Also note that for $\tilde\lambda_{ip}$ we have the following identity:
\[
\tilde\lambda_{ip} - H^{-1}\lambda_{ip} = T^{-1}H'f'e_{i\cdot p} + T^{-1}\tilde f'\big(f - \tilde fH^{-1}\big)\lambda_{ip} + T^{-1}\big(\tilde f - fH\big)'e_{i\cdot p}. \tag{A.9}
\]
Applying Lemma B.3 in Bai (2003) and Lemma A.7, together with Assumption E4, it is easy to see that $\sqrt T\,\big(\tilde\lambda_{ip} - H^{-1}\lambda_{ip}\big) \xrightarrow{d} N\big(0,\; Q'^{-1}\Sigma_{ip}Q^{-1}\big)$. Similarly, a small modification of the proof in Appendix C of Bai (2003) gives the distribution of the common factor in Theorem 2.2. □
References

Andrews, D.W.K. (2003), "Cross-Section Regression with Common Shocks", Cowles Foundation Discussion Paper.

Bai, J. (2003a), "Inferential Theory for Factor Models of Large Dimensions", Econometrica 71, 135-171.

Bai, J. (2003b), "Estimating Cross-Section Common Stochastic Trends in Nonstationary Panel Data", Journal of Econometrics, in press.

Bai, J. and S. Ng (2002), "Determining the Number of Factors in Approximate Factor Models", Econometrica 70, 191-221.

Bai, J. and S. Ng (2004), "A PANIC Attack on Unit Roots and Cointegration", Econometrica, forthcoming.

Bai, J. and C. Kao (2004), "On the Estimation and Inference of a Cointegrated Regression in Panel Data with Cross-Sectional Dependence", manuscript, Department of Economics, Syracuse University.

Binder, M., C. Hsiao and M.H. Pesaran (2003), "Estimation and Inference in Short Panel Vector Autoregressions with Unit Roots and Cointegration", manuscript, Cambridge University.

Chang, Y. (2000), "Vector Autoregressions with Unknown Mixtures of I(0), I(1), and I(2) Components", Econometric Theory 16, 905-926.

Forni, M., M. Hallin, M. Lippi and L. Reichlin (2000), "The Generalized Dynamic-Factor Model: Identification and Estimation", Review of Economics and Statistics 82(4), 540-554.

Forni, M. and M. Lippi (2001), "The Generalized Dynamic Factor Model: Representation Theory", Econometric Theory 17, 1113-1141.

Galbraith, J.W., A. Ullah and V. Zinde-Walsh (2002), "Estimation of the Vector Moving Average Model by Vector Autoregression", Econometric Reviews 21(2), 205-219.

Im, K.S. and M.H. Pesaran (2003), "On the Panel Unit Root Tests Using Nonlinear Instrumental Variables", working paper, Trinity College.

Johansen, S. (1988), "Statistical Analysis of Cointegration Vectors", Journal of Economic Dynamics and Control 12, 231-254.

Kauppi, H. (2004), "On the Robustness of Hypothesis Testing Based on Fully Modified Vector Autoregression when Some Roots Are Almost One", Econometric Theory 20, 341-359.

Moon, H.R. and B. Perron (2004), "Testing for a Unit Root in Panels with Dynamic Factors", Journal of Econometrics, in press.

Mutl, J. (2002), "Panel VAR with Spatial Dependence", manuscript, http://econpapers.hhs.se/cpd/2002/113_Mutl.pdf.

Ng, S. (2004), "Testing Cross-Section Correlation in Panel Data Using Spacings", manuscript, Department of Economics, University of Michigan.

Pesaran, M.H., Y. Shin and R.J. Smith (2002), "Structural Analysis of Vector Error Correction Models with Exogenous I(1) Variables", Journal of Econometrics 97(2), 293-343.

Pesaran, M.H. (2002), "Estimation and Inference in Large Heterogeneous Panels with Cross Section Dependence", working paper, Trinity College.

Pesaran, M.H. (2003), "A Simple Panel Unit Root Test in the Presence of Cross Section Dependence", working paper, Trinity College.

Pesaran, M.H. (2004), "General Diagnostic Tests for Cross Section Dependence in Panels", working paper, Trinity College.

Pesaran, M.H., T. Schuermann and S.M. Weiner (2004), "Modeling Regional Interdependencies Using a Global Error-Correcting Macroeconometric Model", Journal of Business and Economic Statistics 22(2), 129-162.

Phillips, P.C.B. and M. Loretan (1991), "Estimating Long-run Economic Equilibria", Review of Economic Studies 58, 407-436.

Phillips, P.C.B. (1995), "Fully Modified Least Squares and Vector Autoregression", Econometrica 63(5), 1023-1078.

Phillips, P.C.B. and D. Sul (2003a), "Dynamic Panel Estimation and Homogeneity Testing under Cross Section Dependence", Econometrics Journal 6(1), 217-259.

Phillips, P.C.B. and D. Sul (2003b), "Bias in Dynamic Panel Estimation with Fixed Effects, Incidental Trends and Cross Section Dependence", Cowles Foundation Discussion Paper 1438.

Stock, J. and M. Watson (1998), "Diffusion Indexes", NBER Working Paper 6702.

Stock, J. and M. Watson (2002a), "Forecasting Using Principal Components From a Large Number of Predictors", Journal of the American Statistical Association 97, 1167-1179.

Stock, J. and M. Watson (2002b), "Macroeconomic Forecasting Using Diffusion Indexes", Journal of Business and Economic Statistics 20, 147-162.
 N     T     PCp1    PCp2    PCp3    ICp1    ICp2    ICp3
 100   40    5       5       5       5       5       5
 100   60    5       5       5       5       5       5
 200   60    5       5       5       5       5       5
 500   60    5       5       5       5       5       5
 100   100   5       5       5       5       5       5
 200   100   5       5       5       5       5       5
 500   100   5       5       5       5       5       5
 40    100   5       5       5       5       5       5
 60    100   5       5       5       5       5       5
 60    200   5       5       5       5       5       5
 10    50    5.355   5.002   5.002   5       5       9.862
 10    100   5       5       5       5       5       5
 20    100   5       5       5       5       5       5
 100   10    10      10      10      10      10      10
 100   20    5       5       5       5       5       5

Table 1: r = 5
 N     T     PCp1    PCp2    PCp3    ICp1    ICp2    ICp3
 100   40    5.053   5.037   5.037   4.873   4.853   4.925
 100   60    5       5       5       5       5       5
 200   60    5.04    5.024   5.024   4.995   4.991   4.999
 500   60    5       5       5       4.997   4.997   4.997
 100   100   5.88    5.557   5.557   5.002   5       5.343
 200   100   5       5       5       5       5       5
 500   100   5       5       5       5       5       5
 40    100   10      10      10      10      10      10
 60    100   9.426   8.773   8.773   8.23    6.606   10
 60    200   9.541   8.558   8.558   9.152   7.08    10
 10    50    10      9.996   9.996   10      9.491   10
 10    100   10      10      10      10      10      10
 20    100   8.32    6.717   6.717   5.46    5.005   10
 100   10    10      10      10      10      10      10
 100   20    8.77    8.616   8.616   4.229   4.173   4.358

Table 2: r = 5, heteroskedasticity
 N     T     PCp1    PCp2    PCp3    ICp1    ICp2    ICp3
 100   40    3       3       3       3       3       3
 100   60    3       3       3       3       3       3
 200   60    3       3       3       3       3       3
 500   60    3       3       3       3       3       3
 100   100   3       3       3       3       3       3
 200   100   3       3       3       3       3       3
 500   100   3       3       3       3       3       3
 40    100   3       3       3       3       3       3
 60    100   3       3       3       3       3       3
 60    200   3       3       3       3       3       3
 10    50    4.246   3.054   3.054   3       3       9.052
 10    100   3       3       3       3       3       3
 20    100   3       3       3       3       3       3.001
 100   10    10      10      10      10      10      10
 100   20    3.792   3.58    3.58    3       3       3

Table 3: r = 3, MA(1)
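The selection criteria reported in Tables 1-3 are of the PC_p/IC_p type of Bai and Ng (2002). As a point of reference, the following is a scalar-case sketch of the three IC_p variants for a T x N panel X, estimated by principal components; the paper's tables use the vector-process extension and modified penalties, so this illustrates only the basic scalar recipe.

```python
import numpy as np

def ic_criteria(X, kmax=8):
    """Select the number of factors with IC_p1/IC_p2/IC_p3 (Bai-Ng 2002),
    scalar case, via principal components of a T x N panel X."""
    T, N = X.shape
    # eigen-decomposition of the T x T moment matrix, largest first
    eigval, eigvec = np.linalg.eigh(X @ X.T / (N * T))
    eigvec = eigvec[:, np.argsort(eigval)[::-1]]
    C2 = min(N, T)
    ics = np.zeros((3, kmax + 1))
    for k in range(kmax + 1):
        if k == 0:
            resid = X
        else:
            F = np.sqrt(T) * eigvec[:, :k]     # estimated factors, F'F/T = I
            resid = X - F @ (F.T @ X) / T      # project factors out
        V = np.sum(resid**2) / (N * T)         # average squared residual
        pen = k * (N + T) / (N * T)
        ics[0, k] = np.log(V) + pen * np.log(N * T / (N + T))  # IC_p1
        ics[1, k] = np.log(V) + pen * np.log(C2)               # IC_p2
        ics[2, k] = np.log(V) + k * np.log(C2) / C2            # IC_p3
    return ics.argmin(axis=1)  # selected k for each criterion
```

The PC_p variants replace log(V) by V and scale the penalty by an estimate of the idiosyncratic variance at a maximal k; the behavior in Tables 2 and 3 (over-selection under strong heteroskedasticity and serial correlation) is consistent with what such criteria do when the idiosyncratic component is far from i.i.d.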
                  A11 = 0.5                          A12 = 0.1
 Estimator   T = 50    T = 100   T = 200    T = 50    T = 100   T = 200
 IV          -0.3394   -0.2785   -0.2598    0.0872    0.0691    0.0595
             (0.1477)  (0.1017)  (0.0713)   (0.0456)  (0.0289)  (0.0187)
 FM          -0.0551   -0.0169   -0.0058    0.0072    0.0002    -0.0008
             (0.1165)  (0.0785)  (0.0546)   (0.036)   (0.0223)  (0.0144)
 UFM         -0.0026   0.0361    0.0464     -0.011    -0.0153   -0.0142
             (0.1167)  (0.0783)  (0.0543)   (0.0362)  (0.0223)  (0.0143)
 AUG         0.0299    0.069     0.0798     0.004     -0.0079   -0.0116
             (0.1)     (0.0693)  (0.0471)   (0.0299)  (0.0188)  (0.0121)

                  A21 = -0.5                         A22 = 1.1
 Estimator   T = 50    T = 100   T = 200    T = 50    T = 100   T = 200
 IV          -0.499    -0.4582   -0.453     0.09      0.0854    0.0871
             (0.1472)  (0.1001)  (0.0704)   (0.0454)  (0.0286)  (0.0185)
 FM          0.1769    0.2015    0.2048     -0.0466   -0.0471   -0.0453
             (0.1189)  (0.0804)  (0.056)    (0.0366)  (0.0228)  (0.0147)
 UFM         0.3679    0.3718    0.3637     -0.1027   -0.0932   -0.0849
             (0.1372)  (0.0908)  (0.0626)   (0.0419)  (0.0256)  (0.0163)
 AUG         -0.1593   -0.1324   -0.1225    0.0346    0.0288    0.0266
             (0.0924)  (0.0629)  (0.0434)   (0.0276)  (0.0174)  (0.0111)

Table 4 (DGP 4)
                  A11 = 0.5                          A12 = 0.1
 Estimator   T = 50    T = 100   T = 200    T = 50    T = 100   T = 200
 IV          -0.3516   -0.2834   -0.2624    0.0912    0.0708    0.0601
             (0.0619)  (0.1099)  (0.0765)   (0.0617)  (0.0381)  (0.0241)
 FM          -0.1033   -0.063    -0.0535    0.0151    0.0089    0.0085
             (0.1311)  (0.0873)  (0.0603)   (0.0501)  (0.0303)  (0.0191)
 UFM         -0.0796   -0.0399   -0.0298    0.0046    0.0001    0.0015
             (0.1315)  (0.0873)  (0.0602)   (0.0504)  (0.0304)  (0.0191)
 AUG         -0.0824   -0.0429   -0.02938   0.0277    0.016     0.0113
             (0.0928)  (0.0628)  (0.0431)   (0.0323)  (0.012)   (0.0126)

                  A21 = -0.5                         A22 = 1.1
 Estimator   T = 50    T = 100   T = 200    T = 50    T = 100   T = 200
 IV          -0.451    -0.3971   -0.3894    0.085     0.0755    0.0754
             (0.1623)  (0.1095)  (0.0761)   (0.0617)  (0.0379)  (0.024)
 FM          0.0811    0.1087    0.1096     -0.035    -0.0337   -0.0296
             (0.1305)  (0.0869)  (0.0602)   (0.0497)  (0.0301)  (0.019)
 UFM         0.1805    0.1921    0.1883     -0.0682   -0.0591   -0.051
             (0.1404)  (0.0923)  (0.0637)   (0.0526)  (0.0316)  (0.0199)
 AUG         -0.1932   -0.1635   -0.154     0.0409    0.0353    0.0331
             (0.0851)  (0.0574)  (0.0395)   (0.0297)  (0.0184)  (0.0115)

Table 5 (DGP 5)
                  A11 = 0.5                          A12 = 0.1
 Estimator   T = 50*   T = 100   T = 200    T = 50*   T = 100   T = 200
 IV          -0.3766   -0.2505   -0.2286    0.1014    0.0655    0.0539
             (0.1699)  (0.1121)  (0.0776)   (0.0675)  (0.0425)  (0.0267)
 FM          -0.0862   -0.0084   0.00004    0.0218    -0.0039   -0.0035
             (0.1309)  (0.084)   (0.0577)   (0.052)   (0.0318)  (0.0199)
 UFM         -0.0761   0.0132    0.0218     0.0142    -0.0122   -0.0103
             (0.1315)  (0.0841)  (0.0577)   (0.0524)  (0.0319)  (0.0199)
 AUG         -0.1087   -0.0193   -0.0054    0.0401    0.0112    0.0064
             (0.0869)  (0.0574)  (0.0394)   (0.0314)  (0.0193)  (0.0121)

                  A21 = -0.5                         A22 = 1.1
 Estimator   T = 50*   T = 100   T = 200    T = 50*   T = 100   T = 200
 IV          -0.4522   -0.3435   -0.3353    0.0937    0.0665    0.0654
             (0.17)    (0.1121)  (0.0775)   (0.0677)  (0.0425)  (0.0267)
 FM          0.0942    0.1403    0.1411     -0.0298   -0.0429   -0.0379
             (0.1288)  (0.084)   (0.0578)   (0.0512)  (0.0316)  (0.0198)
 UFM         0.1489    0.2147    0.2113     -0.0513   -0.0662   -0.0573
             (0.1342)  (0.0888)  (0.0609)   (0.0529)  (0.033)   (0.0206)
 AUG         -0.1718   -0.1206   -0.1113    0.00425   0.0267    0.0245
             (0.0802)  (0.0531)  (0.0365)   (0.0292)  (0.018)   (0.0112)

Table 6 (DGP 6)
                  A11 = 0.5                          A12 = 0.1
 Estimator   T = 50    T = 100   T = 200    T = 50    T = 100   T = 200
 IV          -0.2458   -0.1986   -0.1940    0.0209    0.0188    0.0280
             (0.2055)  (0.1342)  (0.0904)   (0.1003)  (0.0580)  (0.0344)
 FM          -0.2101   -0.1715   -0.1568    0.0415    0.0376    0.0323
             (0.1696)  (0.1088)  (0.0728)   (0.0825)  (0.0470)  (0.0278)
 UFM         -0.0091   -0.1832   -0.1653    0.0461    0.0404    0.0344
             (0.1708)  (0.1093)  (0.0730)   (0.0832)  (0.0472)  (0.0279)
 AUG         -0.0419   -0.0153   -0.0070    -0.0121   -0.0072   -0.0047
             (0.1165)  (0.0763)  (0.0511)   (0.052)   (0.0303)  (0.0179)

                  A21 = -0.5                         A22 = 1.1
 Estimator   T = 50    T = 100   T = 200    T = 50    T = 100   T = 200
 IV          -0.0625   -0.0449   -0.0464    -0.0523   -0.0309   -0.0142
             (0.2540)  (0.1654)  (0.1113)   (0.1195)  (0.0694)  (0.0411)
 FM          -0.1917   -0.1689   -0.1668    0.0059    0.0178    0.0228
             (0.2027)  (0.1296)  (0.0868)   (0.0955)  (0.0545)  (0.0322)
 UFM         -0.2078   -0.1740   -0.1680    0.0056    0.0165    0.0220
             (0.2045)  (0.1300)  (0.0869)   (0.0963)  (0.0547)  (0.0323)
 AUG         0.0873    0.0939    0.0941     -0.0533   -0.0374   -0.0294
             (0.1487)  (0.0972)  (0.0649)   (0.0652)  (0.0381)  (0.0245)

Table 7 (DGP 7)
                  A11 = 0.5                          A12 = 0.1
 Estimator   T = 50    T = 100   T = 200    T = 50    T = 100   T = 200
 IV          -0.2192   -0.1663   -0.1626    0.0161    0.0117    0.0215
             (0.2128)  (0.1368)  (0.0914)   (0.1135)  (0.0649)  (0.0383)
 FM          -0.1552   -0.1135   -0.0983    0.0340    0.0288    0.0218
             (0.1664)  (0.1049)  (0.0696)   (0.0879)  (0.0494)  (0.0290)
 UFM         -0.1761   -0.1266   -0.1080    0.0399    0.0323    0.0243
             (0.1679)  (0.1055)  (0.0699)   (0.0888)  (0.0497)  (0.0292)
 AUG         -0.0263   0.0011    0.0097     -0.0151   -0.0096   -0.0079
             (0.1079)  (0.0694)  (0.0463)   (0.0509)  (0.0293)  (0.0171)

                  A21 = -0.5                         A22 = 1.1
 Estimator   T = 50    T = 100   T = 200    T = 50    T = 100   T = 200
 IV          -0.0570   -0.0335   -0.0360    -0.0525   -0.0338   -0.0164
             (0.2552)  (0.1636)  (0.1095)   (0.1310)  (0.0752)  (0.0443)
 FM          -0.1314   -0.1053   -0.1035    -0.0033   0.0073    0.0110
             (0.1955)  (0.1228)  (0.0818)   (0.0996)  (0.0562)  (0.0330)
 UFM         -0.1515   -0.1143   -0.1083    -0.0015   0.0076    0.0113
             (0.1975)  (0.1234)  (0.0820)   (0.1007)  (0.0565)  (0.0331)
 AUG         0.0841    0.0927    0.0940     -0.0522   -0.0362   -0.0292
             (0.1380)  (0.0887)  (0.0591)   (0.0639)  (0.0369)  (0.0216)

Table 8 (DGP 8)
                  A11 = 0.5                          A12 = 0.1
 Estimator   T = 50    T = 100   T = 200    T = 50    T = 100   T = 200
 IV          -0.2979   -0.2216   -0.2043    0.0779    0.0542    0.0475
             (0.2071)  (0.1370)  (0.0914)   (0.122)   (0.0759)  (0.0452)
 FM          -0.0812   -0.0316   -0.0211    0.0122    0.0020    0.0004
             (0.1573)  (0.1016)  (0.0674)   (0.0919)  (0.0560)  (0.0333)
 UFM         -0.0929   -0.0424   -0.0324    0.011     0.0024    0.0015
             (0.1585)  (0.1022)  (0.0678)   (0.0926)  (0.0563)  (0.0335)
 AUG         -0.0873   -0.0326   -0.0176    0.0263    0.0114    0.0069
             (0.097)   (0.0613)  (0.0418)   (0.0503)  (0.0304)  (0.0178)

                  A21 = -0.5                         A22 = 1.1
 Estimator   T = 50    T = 100   T = 200    T = 50    T = 100   T = 200
 IV          -0.2855   -0.2196   -0.2109    0.0530    0.0383    0.0396
             (0.2117)  (0.1393)  (0.0928)   (0.1238)  (0.0768)  (0.0457)
 FM          0.0027    0.0440    0.0467     -0.0264   -0.0286   -0.0229
             (0.1556)  (0.0999)  (0.0662)   (0.0911)  (0.0553)  (0.0328)
 UFM         -0.0039   0.0331    0.0330     -0.0313   -0.0295   -0.0220
             (0.1571)  (0.1006)  (0.0666)   (0.0919)  (0.0556)  (0.0330)
 AUG         -0.0737   -0.0296   -0.0181    0.0137    0.0036    0.0028
             (0.0954)  (0.0036)  (0.0411)   (0.0495)  (0.0299)  (0.0176)

Table 9 (DGP 9)
                  A11 = 0.5                          A12 = 0.1
 Estimator   T = 50*   T = 100   T = 200    T = 50*   T = 100   T = 200
 IV          -0.3659   -0.2295   -0.2055    0.1135    0.0605    0.0487
             (0.2120)  (0.1365)  (0.0912)   (0.1222)  (0.0745)  (0.0450)
 FM          -0.0957   -0.0355   -0.0220    0.0268    0.0028    0.0012
             (0.1611)  (0.1011)  (0.0672)   (0.0922)  (0.0551)  (0.0331)
 UFM         -0.1100   -0.0470   -0.0334    0.0285    0.0035    0.0024
             (0.1622)  (0.1017)  (0.0676)   (0.0929)  (0.0554)  (0.0333)
 AUG         -0.0965   -0.0368   -0.0204    0.0443    0.0114    0.0071
             (0.1040)  (0.0625)  (0.0414)   (0.0518)  (0.0299)  (0.0177)

                  A21 = -0.5                         A22 = 1.1
 Estimator   T = 50*   T = 100   T = 200    T = 50*   T = 100   T = 200
 IV          -0.3751   -0.2311   -0.2128    0.0959    0.0458    0.0406
             (0.2162)  (0.1385)  (0.0924)   (0.1238)  (0.0754)  (0.0454)
 FM          0.0062    0.0384    0.0432     -0.0182   -0.0275   -0.0213
             (0.1581)  (0.0995)  (0.0660)   (0.0908)  (0.0544)  (0.0326)
 UFM         -0.0033   0.0267    0.0297     -0.0197   -0.0280   -0.0205
             (0.1594)  (0.1002)  (0.0664)   (0.0916)  (0.0547)  (0.0328)
 AUG         -0.0734   -0.0280   -0.0209    0.0273    0.0032    0.0030
             (0.1027)  (0.0547)  (0.0407)   (0.0512)  (0.0294)  (0.0174)

Table 10 (DGP 10)
                  A11 = 0.5                          A12 = 0.1
 Estimator   T = 50    T = 100   T = 200    T = 50    T = 100   T = 200
 IV          -0.2991   -0.2276   -0.2100    0.0762    0.0548    0.0484
             (0.1758)  (0.1185)  (0.0813)   (0.0779)  (0.0487)  (0.0295)
 FM          -0.0717   -0.0252   -0.0143    0.0099    -0.0002   -0.0002
             (0.1340)  (0.0882)  (0.0601)   (0.0590)  (0.0361)  (0.0218)
 UFM         -0.0815   -0.0345   -0.0245    0.0083    -0.0003   0.0006
             (0.1350)  (0.0886)  (0.0604)   (0.0594)  (0.0363)  (0.0219)
 AUG         -0.0140   0.0293    0.0435     0.0091    -0.0033   -0.0061
             (0.1067)  (0.0707)  (0.0479)   (0.0441)  (0.0270)  (0.0161)

                  A21 = -0.5                         A22 = 1.1
 Estimator   T = 50    T = 100   T = 200    T = 50    T = 100   T = 200
 IV          -0.2969   -0.2364   -0.2289    0.0527    0.0408    0.0429
             (0.1827)  (0.1221)  (0.0837)   (0.0804)  (0.0500)  (0.0302)
 FM          0.0478    0.0856    0.0904     -0.0349   -0.0368   -0.0304
             (0.1316)  (0.0858)  (0.0584)   (0.0579)  (0.0352)  (0.0212)
 UFM         0.0458    0.0768    0.0770     -0.0413   -0.0388   -0.0299
             (0.1332)  (0.0864)  (0.0587)   (0.0586)  (0.0354)  (0.0213)
 AUG         -0.0071   0.0276    0.0384     -0.0026   -0.0100   -0.0092
             (0.1048)  (0.0694)  (0.0470)   (0.0433)  (0.0265)  (0.0158)

Table 11 (DGP 11)
                  A11 = 0.5                          A12 = 0.1
 Estimator   T = 50    T = 100   T = 200    T = 50    T = 100   T = 200
 IV          -0.2962   -0.2303   -0.2100    0.0736    0.0581    0.0490
             (0.1783)  (0.1182)  (0.0811)   (0.0832)  (0.0478)  (0.0293)
 FM          -0.0647   -0.0288   -0.0124    0.0037    0.0009    -0.0007
             (0.1361)  (0.0881)  (0.0599)   (0.0631)  (0.0355)  (0.0216)
 UFM         -0.0733   -0.0388   -0.0224    0.0006    0.0011    0.0001
             (0.1370)  (0.0886)  (0.0602)   (0.0636)  (0.0357)  (0.0217)
 AUG         -0.0142   0.0254    0.0445     0.0066    -0.0023   -0.0069
             (0.1077)  (0.0706)  (0.0476)   (0.0465)  (0.0267)  (0.0161)

                  A21 = -0.5                         A22 = 1.1
 Estimator   T = 50    T = 100   T = 200    T = 50    T = 100   T = 200
 IV          -0.2908   -0.2440   -0.2297    0.0485    0.0454    0.0433
             (0.1842)  (0.1219)  (0.0832)   (0.0854)  (0.0491)  (0.0300)
 FM          0.0581    0.0814    0.0902     -0.0449   -0.0357   -0.0301
             (0.1332)  (0.0858)  (0.0581)   (0.0618)  (0.0346)  (0.0210)
 UFM         0.0594    0.0717    0.0769     -0.0534   -0.0373   -0.0296
             (0.1349)  (0.0863)  (0.0584)   (0.0626)  (0.0348)  (0.0211)
 AUG         -0.0084   0.0233    0.0395     -0.0069   -0.0093   -0.0100
             (0.1057)  (0.0693)  (0.0467)   (0.0456)  (0.0262)  (0.0157)

Table 12 (DGP 12)