The Hausman test in dynamic panel model
Author: Mengque Liu
Supervisor: Johan Lyhagen
Master thesis in Statistics
Faculty of Statistics
Uppsala University, Sweden
May, 2010
Abstract
I propose a Hausman test for the dynamic panel model. The aim of the test is to detect whether fixed effects are present in the dynamic model. It is based on a comparison of the PGMM estimator with an instrumental variables estimator that is consistent only under the null hypothesis. A Monte Carlo simulation compares the finite-sample properties of the Hausman test in the dynamic model with those in the individual-specific effects model.
Key words: dynamic model, Hausman test, PGMM estimator, instrumental variables estimator
1. Introduction
In the study of panel data, unobserved individual heterogeneity that is correlated with the regressors is an
important issue. Hausman [1978] proposed the Hausman specification test, which can be used to detect fixed
effects in the individual-specific effects model. My interest is in the Hausman test in the dynamic model.
This essay aims to compare the Hausman test in the individual-specific effects model with that in the
dynamic model. Our focus is on panels with a large number of individuals but a short time period.
For the dynamic model, which includes a lagged dependent variable as a regressor, the standard panel
estimators are all inconsistent.
Anderson and Hsiao [1981] suggested a simple instrumental variable method to estimate the
first-differenced dynamic panel data model, in which lagged values of the dependent variable are valid
instruments. The instrumental variable estimator is consistent, though not efficient. Arellano and Bond
[1991] proposed estimating the first-differenced dynamic panel data model by the generalized method of
moments (GMM)¹, an approach known as difference GMM. The generalized method of moments is based on the
sample analog of the population moment conditions. It allows a larger instrument set, whose size may
differ across periods, which leads to a more efficient estimator. Holtz-Eakin, Newey and Rosen [1988] made
a similar study of that estimation problem. All the estimators they proposed are consistent when fixed
effects are present.
In this thesis I suggest an instrumental variable estimator that is consistent only when the individual
effects are random. I then use the Hausman test to detect fixed effects in the dynamic model, based on a
comparison between that instrumental variable estimator and the panel generalized method of moments
estimator proposed by Arellano and Bond [1991].
In addition, I carry out a Monte Carlo study to see whether the Hausman test is less efficient in the
dynamic model than in the individual-specific effects model.
The essay is organized as follows. Section 2 introduces the panel models: the random effects model, the
fixed effects model and the dynamic model. Section 3 derives the Hausman tests for the different models.
Section 4 presents a Monte Carlo study of the small-sample properties of the tests. Section 5 shows an
empirical example. Section 6 concludes.
¹ The generalized method of moments was developed by Lars Peter Hansen.
2. Individual-specific effects model and dynamic model
2.1 Individual-specific effects model
Panel data offer information not only across different individuals but also for a given individual over
time. The individual-specific effects model, in which the intercept varies across individuals while the
slopes are common, is a classical regression model that captures this special feature of panel data.
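$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T \tag{2.1}$$
where $\varepsilon_{it}$ is iid over $i$ and $t$. The random variables $\alpha_i$ are unobserved individual effects. The
assumption of strong exogeneity is made throughout our discussion of the individual-specific effects model:
$$E[\varepsilon_{it} \mid \alpha_i, x_{i1}, \ldots, x_{iT}] = 0, \quad t = 1, \ldots, T$$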
2.1.1 Random effects model and random effects estimator
There is a variant of model (2.1) which is called the random effects model. ‘… the unobservable individual
effects $\alpha_i$ are random variables that are distributed independently of the regressors. This model is called
random effects model, which usually makes the additional assumptions that
$$\alpha_i \sim [\alpha, \sigma_\alpha^2], \qquad \varepsilon_{it} \sim [0, \sigma_\varepsilon^2] \tag{2.2}$$
So that both the random effects and the error term in (2.1) are assumed to be iid.’ (Cameron and Trivedi,
2005, p.700)
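There is serial correlation in $y_{it}$ in the random effects model, even though both the random effects and
the error term are iid. If we set $\beta = 0$, so that $y_{it} = \alpha_i + \varepsilon_{it}$, then
$$\mathrm{Cor}[y_{it}, y_{i,t-1}] = \mathrm{Cor}[\alpha_i + \varepsilon_{it}, \alpha_i + \varepsilon_{i,t-1}] = \frac{\mathrm{cov}(\alpha_i + \varepsilon_{it}, \alpha_i + \varepsilon_{i,t-1})}{\sqrt{\mathrm{var}(\alpha_i + \varepsilon_{it})\,\mathrm{var}(\alpha_i + \varepsilon_{i,t-1})}} = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\varepsilon^2}$$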
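As a quick check of this formula, here is a worked value under the variances that are later used in the Monte Carlo design of Section 4 ($\sigma_\alpha^2 = \sigma_\varepsilon^2 = 1$); any other positive values could be substituted in the same way.

```latex
\mathrm{Cor}[y_{it}, y_{i,t-1}]
  = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\varepsilon^2}
  = \frac{1}{1 + 1}
  = 0.5
```

The same value is obtained for any pair of distinct periods, so in this model the correlation does not decay with the lag.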
Because the composite errors $\alpha_i + \varepsilon_{it}$ of the random effects model are correlated over time for a given
individual, the feasible GLS estimator is efficient and consistent under the random effects assumption.
The random effects estimator is the feasible GLS estimator of the RE model, and it is fully efficient
under the assumptions (2.2).
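The random effects estimator can be obtained from OLS estimation of the transformed model²
‘… $$y_{it} - \hat\lambda \bar y_i = (1 - \hat\lambda)\mu + (x_{it} - \hat\lambda \bar x_i)'\beta + v_{it} \tag{2.3}$$
where $v_{it} = (1 - \hat\lambda)\alpha_i + (\varepsilon_{it} - \hat\lambda \bar\varepsilon_i)$ is asymptotically iid, and $\hat\lambda$ is consistent for
$$\lambda = 1 - \frac{\sigma_\varepsilon}{\sqrt{T\sigma_\alpha^2 + \sigma_\varepsilon^2}}$$
Note that $\hat\lambda = 0$ corresponds to pooled OLS, $\hat\lambda = 1$ corresponds to within estimation, and $\hat\lambda \to 1$ as
$T \to \infty$. This is a two-step estimator of $\beta$.’ (Cameron and Trivedi, 2005, p.700)
When fixed effects are present, the unobservable individual effects are correlated with the observed
regressors, so the errors are still correlated with $(x_{it} - \hat\lambda \bar x_i)$ and the random effects estimator is
inconsistent.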
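To give a feel for the size of the random effects transformation, the following worked value evaluates $\lambda = 1 - \sigma_\varepsilon / \sqrt{T\sigma_\alpha^2 + \sigma_\varepsilon^2}$ at the settings used in the Monte Carlo study of Section 4 ($T = 7$, $\sigma_\alpha^2 = \sigma_\varepsilon^2 = 1$). It is only an illustration of the formula, not an additional result.

```latex
\lambda = 1 - \frac{\sigma_\varepsilon}{\sqrt{T\sigma_\alpha^2 + \sigma_\varepsilon^2}}
        = 1 - \frac{1}{\sqrt{7 \cdot 1 + 1}}
        = 1 - \frac{1}{\sqrt{8}}
        \approx 0.646
```

With these values the random effects estimator subtracts roughly 65% of each individual mean, which places it between pooled OLS ($\lambda = 0$) and within estimation ($\lambda = 1$).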
2.1.2 Fixed effects model and fixed effects estimator
If the individual-specific effects are correlated with the observed regressors, model (2.1) is called the
fixed effects model, which is another variant of model (2.1).³
The random effects estimator fails to yield consistent estimates of $\beta$ under the fixed effects model.
The within, or fixed effects, estimator uses individual-specific deviations from the time averages to
obtain a consistent estimator in the fixed effects model, because the individual effects term is
eliminated when these deviations are taken.
The within estimator can be obtained from OLS estimation of the transformed model⁴
$$(y_{it} - \bar y_i) = (x_{it} - \bar x_i)'\beta + (\varepsilon_{it} - \bar\varepsilon_i), \quad i = 1, \ldots, N, \; t = 1, \ldots, T \tag{2.4}$$
2.2 Dynamic model
If a lagged term of the dependent variable is included in the usual individual-specific effects panel data
model, the model is a dynamic model.
$$y_{it} = \gamma y_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T \tag{2.5}$$
² See Cameron and Trivedi, 2005, p.734~736 for the derivation of (2.3) and ways to estimate $\sigma_\varepsilon^2$, $\sigma_\alpha^2$ and $\lambda$
³ Cameron and Trivedi, 2005, p.700
⁴ Cameron and Trivedi, 2005, p.704
The time series correlation in $y_{it}$ now differs from that in the individual-specific effects model.
Besides the indirect effect through $\alpha_i$, the past value of the dependent variable $y_{i,t-1}$ directly induces
serial correlation.
‘… let $\beta = 0$, so that $y_{it} = \gamma y_{i,t-1} + \alpha_i + \varepsilon_{it}$.
$$\mathrm{Cor}[y_{it}, y_{i,t-1}] = \mathrm{Cor}[\gamma y_{i,t-1} + \alpha_i + \varepsilon_{it}, y_{i,t-1}] = \gamma + \mathrm{Cor}[\alpha_i, y_{i,t-1}] = \gamma + \frac{1 - \gamma}{1 + (1 - \gamma)\sigma_\varepsilon^2 / ((1 + \gamma)\sigma_\alpha^2)}$$
The result makes it clear that there are two possible reasons for correlation between $y_{it}$ and $y_{i,t-1}$.’
(Cameron and Trivedi, 2005, p.763)
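For comparison with the static case, the expression can be evaluated at the values used in the Monte Carlo design of Section 4 ($\gamma = 0.7$, $\sigma_\alpha^2 = \sigma_\varepsilon^2 = 1$). This is a worked illustration of the quoted formula only.

```latex
\frac{(1-\gamma)\,\sigma_\varepsilon^2}{(1+\gamma)\,\sigma_\alpha^2}
  = \frac{0.3}{1.7} \approx 0.176,
\qquad
\mathrm{Cor}[y_{it}, y_{i,t-1}]
  = 0.7 + \frac{0.3}{1 + 0.176}
  \approx 0.7 + 0.255
  = 0.955
```

With the same variances the static random effects model gives a correlation of 0.5, so most of the additional correlation here comes directly from the lagged dependent variable.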
2.2.1 Instrumental variables estimator
If $\alpha_i$ is a random effect, the dynamic model can be treated as a random effects model to which the
once-lagged dependent variable has been added.
OLS estimation of (2.5) leads to inconsistent estimates of $\gamma$ and $\beta$. This is because the regressor
$y_{i,t-1}$ is correlated with $\alpha_i$ and hence with the composite error term $(\alpha_i + \varepsilon_{it})$.
To obtain a consistent estimator in this situation, we consider the instrumental variables estimator
with $(y_{i,t-1} - y_{i,t-2})$ as an instrument for $y_{i,t-1}$. This is a valid instrument: since $\alpha_i$ cancels
in $(y_{i,t-1} - y_{i,t-2})$, the difference is not correlated with $(\alpha_i + \varepsilon_{it})$. Furthermore,
$(y_{i,t-1} - y_{i,t-2})$ is correlated with $y_{i,t-1}$.
Model (2.5) is just-identified with $(y_{i,t-1} - y_{i,t-2})$ as an instrument for $y_{i,t-1}$. The instrumental
variables estimator is
$$\hat\beta_{IV} = (Z'X)^{-1} Z'y$$
In this model $Z$ is an $N(T-1) \times K$ matrix with $z_{it}' = [y_{i,t-1} - y_{i,t-2}, \; x_{it}']$.
This instrumental variables estimator is consistent in the dynamic model with random individual effects,
but it is inconsistent when fixed individual effects are present.
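The just-identified estimator $(Z'X)^{-1}Z'y$ can be sketched in a few lines of base R. The sketch below is an illustration under assumptions that are not part of the thesis design: a scalar regressor drawn independently of everything else, a larger $N$ than in Section 4, and a burn-in of 50 periods (the hypothetical variable `burn`) so that the simulated process is approximately stationary and the individual effect really does cancel from $(y_{i,t-1} - y_{i,t-2})$. Pooled OLS is computed alongside for comparison.

```r
set.seed(1)
N <- 2000; n_t <- 7; burn <- 50      # burn is an assumption of this sketch
gamma <- 0.7; beta <- 1

alpha <- rnorm(N)                    # random effect, independent of x
n_all <- burn + n_t
x   <- matrix(rnorm(N * n_all), N, n_all)
eps <- matrix(rnorm(N * n_all), N, n_all)
y   <- matrix(0, N, n_all)
for (s in 2:n_all) {
  y[, s] <- gamma * y[, s - 1] + beta * x[, s] + alpha + eps[, s]
}
keep <- (burn + 1):n_all             # drop the burn-in periods
y <- y[, keep]; x <- x[, keep]

# Stack periods t = 3..T so that y_{t-1} and y_{t-2} are both available
tt   <- 3:n_t
yv   <- as.vector(y[, tt])
ylag <- as.vector(y[, tt - 1])
dlag <- as.vector(y[, tt - 1] - y[, tt - 2])   # instrument for ylag
xv   <- as.vector(x[, tt])

X <- cbind(ylag, xv)                 # regressors
Z <- cbind(dlag, xv)                 # instruments (x instruments itself)
iv_est  <- drop(solve(crossprod(Z, X), crossprod(Z, yv)))   # (Z'X)^{-1} Z'y
ols_est <- drop(solve(crossprod(X), crossprod(X, yv)))      # pooled OLS
names(iv_est) <- names(ols_est) <- c("gamma", "beta")
print(rbind(IV = iv_est, OLS = ols_est))
```

In this setting the IV estimate of $\gamma$ should lie close to 0.7, while pooled OLS overstates it because $y_{i,t-1}$ is positively correlated with $\alpha_i$.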
2.2.2 Panel GMM estimator
If $\alpha_i$ is a fixed effect, the dynamic model can be treated as a fixed effects model to which the
once-lagged dependent variable has been added. Model (2.5) is first differenced to eliminate the individual effect:
$$(y_{it} - y_{i,t-1}) = \gamma (y_{i,t-1} - y_{i,t-2}) + (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1}), \quad t = 2, \ldots, T \tag{2.6}$$
In this thesis, our focus is on short panels with few time periods and a large number of individuals. In a
long panel with large $T$, the impact of one year’s shock on the individual’s fixed effect declines
with time (see Roodman, 2006). In short panels, by contrast, the correlation of the lagged dependent
variable with the error term is significant: $(y_{i,t-1} - y_{i,t-2})$ is correlated with $(\varepsilon_{it} - \varepsilon_{i,t-1})$ in (2.6).
OLS estimation of (2.6) therefore fails to provide consistent estimators.
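The source of this correlation can be made explicit with a one-line calculation. It assumes, as in the rest of this section, that $\varepsilon_{it}$ is iid with variance $\sigma_\varepsilon^2$ and is uncorrelated with earlier values of $y$; the only non-zero cross term is then the one in which $y_{i,t-1}$ meets its own shock $\varepsilon_{i,t-1}$.

```latex
\begin{aligned}
\mathrm{Cov}[\,y_{i,t-1} - y_{i,t-2},\; \varepsilon_{it} - \varepsilon_{i,t-1}\,]
  &= -\,\mathrm{Cov}[\,y_{i,t-1},\, \varepsilon_{i,t-1}\,] \\
  &= -\,\sigma_\varepsilon^2 \;\neq\; 0
\end{aligned}
```

The covariance does not shrink as $N$ grows, which is why an instrument for $(y_{i,t-1} - y_{i,t-2})$ is needed.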
Anderson and Hsiao (1981) proposed using $y_{i,t-2}$ or $(y_{i,t-2} - y_{i,t-3})$ as instruments for $(y_{i,t-1} - y_{i,t-2})$. If
successive observations are positively correlated, the instrumental variable estimator with $y_{i,t-2}$ as
instrument is less efficient than that with $(y_{i,t-2} - y_{i,t-3})$ as instrument. The instrumental variable
estimator is consistent, though not fully efficient. Holtz-Eakin et al. (1988) and Arellano and Bond (1991)
proposed the generalized method of moments (GMM) to estimate model (2.6).
The basic idea of the GMM⁵ estimator is that the population moment conditions can be replaced by sample
moment conditions. Validity of the instruments implies that the moments of the errors with the instruments
are equal to zero. By the analogy principle, the sample moment conditions are used to estimate the
parameters, with one moment condition for each instrument. In the dynamic panel model, lags of the
dependent variable are valid instruments. There are thus more moment conditions than regressors, and the
model is over-identified. In the generalized method of moments, the parameters are estimated by minimizing
the quadratic form⁶
$$Q_N(\beta) = \Big[\sum_{i=1}^{N} Z_i' u_i\Big]' W_N \Big[\sum_{i=1}^{N} Z_i' u_i\Big]$$
where $Z_i$ denotes a matrix of instruments, $W_N$ denotes a weighting matrix and $u_i = y_i - X_i\beta$.
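To connect the quadratic form with the instrumental variables estimator of Section 2.2.1, consider the special case in which there are exactly as many instruments as regressors. The derivation below uses stacked notation ($Z'u = \sum_i Z_i'u_i$) and assumes that $Z'X$ is square and invertible and that $W_N$ is positive definite.

```latex
\begin{aligned}
Q_N(\beta) &= \big[Z'(y - X\beta)\big]'\, W_N \,\big[Z'(y - X\beta)\big] \;\ge\; 0, \\
Q_N(\beta) &= 0 \iff Z'(y - X\beta) = 0 \iff \hat\beta = (Z'X)^{-1} Z'y .
\end{aligned}
```

In the just-identified case the minimizer therefore does not depend on the weighting matrix. The choice of $W_N$ matters only when the model is over-identified, as it is for the PGMM estimator.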
Because regressors from other periods can be used as instruments for the current-period regressors, panel
generalized method of moments (PGMM) estimation can be more efficient.
Following Holtz-Eakin et al. (1988) and Arellano and Bond (1991), we estimate model (2.6) with the PGMM
estimator, using $y_{i,t-2}$ and $y_{i,t-3}$ as instruments for $(y_{i,t-1} - y_{i,t-2})$ and $(x_{i,t} - x_{i,t-1})$ as an
instrument for itself. This PGMM estimator is consistent in the dynamic model with fixed individual effects.
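In the dynamic model (2.6), the PGMM estimator is⁷
$$\hat\beta_{PGMM} = \Big[\Big(\sum_{i=1}^{N} \tilde X_i' Z_i\Big) W_N \Big(\sum_{i=1}^{N} Z_i' \tilde X_i\Big)\Big]^{-1} \Big(\sum_{i=1}^{N} \tilde X_i' Z_i\Big) W_N \Big(\sum_{i=1}^{N} Z_i' \tilde y_i\Big)$$
where $\Delta$ denotes the first difference (for example $\Delta y_{it} = y_{it} - y_{i,t-1}$), $\tilde X_i$ is a $(T-2) \times (K+1)$ matrix with
$t$th row $(\Delta y_{i,t-1}, \Delta x_{it}')$, $t = 3, \ldots, T$, $\tilde y_i$ is a $(T-2) \times 1$ vector with $t$th row $\Delta y_{it}$, and $Z_i$ is a
$(T-2) \times r$ matrix of instruments
$$Z_i = \begin{bmatrix} z_{i3}' & 0 & \cdots & 0 \\ 0 & z_{i4}' & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & z_{iT}' \end{bmatrix}$$
where $z_{it}' = [y_{i,t-2}, \; y_{i,t-3}, \; \Delta x_{it}']$.
⁵ The generalized method of moments was developed by Lars Peter Hansen. For an introduction to GMM, see Lars Peter Hansen (1982): Large Sample Properties of Generalized Method of Moments Estimators, Econometrica 50, 1029-1054.
⁶ See Cameron and Trivedi, 2005, p.745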
3. Hausman test
The Hausman tests are based on comparisons between two different estimators.
‘… Consider two estimators $\hat\theta$ and $\tilde\theta$. We consider the testing situation where
$$H_0: \mathrm{plim}(\hat\theta - \tilde\theta) = 0, \qquad H_1: \mathrm{plim}(\hat\theta - \tilde\theta) \neq 0$$
Assume the difference between the two root-N consistent estimators is also root-N consistent under
$H_0$ with mean 0 and a limit normal distribution, so that
$$\sqrt{N}(\hat\theta - \tilde\theta) \xrightarrow{d} N[0, V_H]$$
where $V_H$ denotes the variance matrix in the limiting distribution. Then the Hausman test statistic
$$H = (\hat\theta - \tilde\theta)'(N^{-1}\hat V_H)^{-1}(\hat\theta - \tilde\theta)$$
is asymptotically $\chi^2(q)$ distributed under $H_0$. We reject $H_0$ at level $\alpha$ if $H > \chi^2_\alpha(q)$.’
(Cameron and Trivedi, 2005, p.271~272)
The Hausman test is a classical test of whether model (2.1) has fixed effects. The test is based on a
comparison between the random effects estimator and the within estimator in the individual-specific
effects model. If the difference between these two estimators is statistically significant, we conclude
that model (2.1) has fixed effects.
The dynamic model is treated similarly, but the comparison is between the panel GMM estimator and the
instrumental variable estimator.
⁷ See Cameron and Trivedi, 2005, p.765
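In the individual-specific effects model described in Section 2, the random effects estimator is fully
efficient under the null hypothesis. The Hausman test statistic then simplifies to⁸
$$H = (\tilde\beta_{RE} - \hat\beta_{W})' \big[\hat V[\hat\beta_{W}] - \hat V[\tilde\beta_{RE}]\big]^{-1} (\tilde\beta_{RE} - \hat\beta_{W})$$
where $\tilde\beta_{RE}$ and $\hat\beta_{W}$ denote the random effects and within estimators. This statistic is asymptotically
$\chi^2(q)$ distributed under the null hypothesis.
In the dynamic model, neither the panel GMM estimator nor the instrumental variable estimator is fully
efficient. The simplified form of $V_H$ cannot be used and must be replaced by the general form, so we need
a consistent estimate of $V_H$. Under the assumption that the observations are independent over $i$, this
variance matrix can be consistently estimated by the bootstrap.
A panel-robust Hausman test statistic is⁹
$$H = (\hat\theta_{PGMM} - \hat\theta_{IV})' \big[\hat V_{Boot}[\hat\theta_{PGMM} - \hat\theta_{IV}]\big]^{-1} (\hat\theta_{PGMM} - \hat\theta_{IV})$$
where $\hat\theta$ collects the estimates of $\gamma$ and $\beta$, and $\hat V_{Boot}$ is the panel bootstrap estimate of the variance
matrix of the difference between the two estimators.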
Similarly, if the difference between the PGMM estimator and the instrumental variables estimator is
statistically significant, we conclude that fixed effects are present in the dynamic model.
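The mechanics of a bootstrap-based Hausman statistic can be sketched in base R. To keep the sketch self-contained, it swaps in a simpler pair of estimators than the PGMM and IV estimators of this thesis: pooled OLS without intercept (consistent only when the individual effects are uncorrelated with the regressor) and the within estimator, both in a static model. The function names `pooled_ols`, `within_est` and `hausman_stat`, the sample sizes and the data-generating process are assumptions made for illustration. The resampling step draws whole individuals with replacement, as in the panel bootstrap.

```r
set.seed(42)
N <- 200; n_t <- 5; B <- 200

# Toy static panel in which alpha_i is correlated with x_it (fixed effects)
mu    <- rnorm(N)
x     <- matrix(rnorm(N * n_t, mean = mu), N, n_t)   # row i has mean mu[i]
alpha <- rowMeans(x) + rnorm(N)
y     <- alpha + x + matrix(rnorm(N * n_t), N, n_t)  # true beta = 1

# Two estimators of beta, computed from N x T matrices
pooled_ols <- function(y, x) sum(x * y) / sum(x^2)   # consistent only under H0
within_est <- function(y, x) {
  xd <- x - rowMeans(x)                              # deviations from individual means
  yd <- y - rowMeans(y)
  sum(xd * yd) / sum(xd^2)
}

# Hausman statistic d' V^{-1} d, with V estimated from bootstrap differences
hausman_stat <- function(d, boot_diffs) {
  V <- var(boot_diffs)                               # q x q covariance matrix
  as.numeric(crossprod(d, solve(V, d)))
}

d_hat <- pooled_ols(y, x) - within_est(y, x)
boot_d <- replicate(B, {
  idx <- sample.int(N, N, replace = TRUE)            # resample individuals
  pooled_ols(y[idx, ], x[idx, ]) - within_est(y[idx, ], x[idx, ])
})
H <- hausman_stat(d_hat, matrix(boot_d, ncol = 1))
p_value <- pchisq(H, df = 1, lower.tail = FALSE)
print(c(H = H, p_value = p_value))
```

Because the data are generated with individual effects that depend on the mean of the regressor, the statistic should be far above the 5% critical value of the chi-square distribution with 1 degree of freedom. The key design point is that the original-sample difference `d_hat` is kept separate from the bootstrap differences `boot_d`, which are used only to estimate the variance.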
The Hausman tests in the individual-specific effects model and in the dynamic model are evaluated and
compared through their probabilities of making mistakes.¹⁰ There are two types of mistakes a test can make.
If fixed effects are not present but the Hausman test incorrectly rejects the null hypothesis, a Type I
error occurs. If fixed effects are present but the Hausman test fails to reject the null hypothesis, a
Type II error occurs. The Hausman test in the model with the larger probability of making mistakes is
regarded as less efficient than the other one.
4. Monte Carlo Study
The small-sample properties of the Hausman test statistic can be obtained by a Monte Carlo study.
First we study the Hausman test in the individual-specific effects model, and then in the dynamic model.
Finally, we compare the small-sample properties in the static and dynamic models.
The hypotheses of the Hausman test are
$H_0$: the individual-specific effects are uncorrelated with the regressors
$H_1$: fixed effects are present
⁸ See Cameron and Trivedi, 2005, p.718
⁹ See Cameron and Trivedi, 2005, p.718 for the panel-robust Hausman test statistic
¹⁰ George Casella, Roger L. Berger, Statistical Inference, 2nd Edition, p.382, 383
4.1 The Hausman test in the individual-specific effects model
The data $(y_{it}, x_{it})$ under the null hypothesis are generated according to the random effects model:
$$y_{it} = \alpha_i + x_{it}\beta + \varepsilon_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T$$
where $\alpha_i \sim N(\alpha, \sigma_\alpha^2)$, $\varepsilon_{it} \sim N(0, \sigma_\varepsilon^2)$, $x_{it} \sim N(\mu_i, \sigma_x^2)$ and $\mu_i \sim N(\mu, \sigma_\mu^2)$.
We set $\beta = 1$, $\sigma_\alpha^2 = 1$, $\sigma_\varepsilon^2 = 1$, $\sigma_x^2 = 1$, $\sigma_\mu^2 = 1$, $\alpha = 0$, $\mu = 0$, $N = 100$ and $T = 7$.
Figure 1 shows that there is time-series correlation in $y_{it}$, while Figure 2 shows that there is no
time-series correlation in the individual-specific deviations of $y_{it}$. Thus the time-series
correlation in $y_{it}$ comes from the individual effect.
Figure 1: Autocorrelation function of $y_{it}$ in the individual-specific effects model
Figure 2: Autocorrelation function of the individual-specific deviations of $y_{it}$ from its time-averaged values in the individual-specific effects model
We fit the individual-specific effects model to the data and obtain the random effects estimator and the
fixed effects estimator of $\beta$. We then compute the Hausman test statistic $H$ under the null hypothesis.
Figure 3 gives the density of the 500 computed values of the Hausman test statistic. The statistic is
expected to follow a chi-square distribution with 1 degree of freedom.
The true size of the test is the proportion of the 500 replications in which $H > \chi^2_\alpha(1)$. The
size-corrected critical values are also obtained in the Monte Carlo study. For example, the upper 5th
percentile of the 500 simulated values of the Hausman test statistic is the size-corrected critical value
for the 0.05 significance level.
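The two quantities defined above can be computed from a vector of simulated statistics as in the following base R sketch. The vector `H_sim` is a stand-in drawn directly from a chi-square distribution with 1 degree of freedom, not output from the Monte Carlo study, and the variable names are assumptions of the sketch.

```r
set.seed(1)
H_sim   <- rchisq(500, df = 1)            # stand-in for 500 simulated Hausman statistics
nominal <- c(0.01, 0.025, 0.05, 0.1)

# Asymptotic critical values of the chi-square distribution with 1 degree of freedom
crit <- qchisq(1 - nominal, df = 1)

# Actual size: share of simulated statistics above each asymptotic critical value
actual_size <- sapply(crit, function(cv) mean(H_sim > cv))

# Size-corrected critical values: upper percentiles of the simulated statistics
size_corrected <- quantile(H_sim, probs = 1 - nominal)

print(data.frame(nominal, crit, actual_size, size_corrected))
```

The power is computed in the same way as `actual_size`, with `H_sim` replaced by statistics simulated under the alternative hypothesis.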
To study the power of the Hausman test under the alternative hypothesis, we assume that the individual
effects depend on the mean of the regressor $x_{it}$, which is one particular specification among the possible
models under the alternative hypothesis. In the data-generating process, the individual effects are
generated as
$$\alpha_i = \frac{\sum_t x_{it}}{T} + \eta_i, \qquad \eta_i \sim N(0, \sigma_\eta^2)$$
A new set of 500 replications is then produced. The power is the proportion of these 500 replications in
which $H > \chi^2_\alpha(1)$.
The simulation results are presented in Table 1.
Figure 3: Density of 500 computed values of the Hausman test statistic in the individual-specific effects model
Table 1. Hausman test size and power in the individual-specific effects model for 500 replications.
Nominal size    Actual size    Actual power    Size-corrected critical value
0.01            0.018          1               8.617997
0.025           0.044          1               5.876021
0.05            0.072          1               4.905718
0.1             0.118          1               3.10286
If we increase the number of replications to 1000, the simulation results are as presented in Table 2.
Table 2. Hausman test size and power in the individual-specific effects model for 1000 replications.
Nominal size    Actual size    Actual power    Size-corrected critical value
0.01            0.01           1               6.615710
0.025           0.028          1               5.080407
0.05            0.052          1               3.885145
0.1             0.092          1               2.616923
The asymptotic approximation works well even though the number of replications is not very large. With 500
replications the actual size of the test in the individual-specific effects model is a little larger than
the nominal size; with 1000 replications it is close to the nominal size. The power equals 1 at every
nominal size, which means the Hausman test is powerful for detecting correlation between the mean of the
regressor and the individual effects in the individual-specific effects model.
4.2 The Hausman test in dynamic model
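The data $(y_{it}, x_{it})$ under the null hypothesis are generated according to the dynamic model with random effects:
$$y_{it} = \gamma y_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T$$
where $\alpha_i \sim N(\alpha, \sigma_\alpha^2)$, $\varepsilon_{it} \sim N(0, \sigma_\varepsilon^2)$, $x_{it} \sim N(\mu_i, \sigma_x^2)$, $\mu_i \sim N(\mu, \sigma_\mu^2)$, $\gamma = 0.7$ and $y_{i1} = \mu_i$.
We set $\beta = 1$, $\sigma_\alpha^2 = 1$, $\sigma_\varepsilon^2 = 1$, $\sigma_x^2 = 1$, $\sigma_\mu^2 = 1$, $\alpha = 0$, $\mu = 0$, $N = 100$ and $T = 7$.
Figure 4 shows that there is time-series correlation in $y_{it}$, while Figure 5 shows that time-series
correlation remains in the individual-specific deviations of $y_{it}$. Thus the time-series
correlation in $y_{it}$ is induced by both the past value and the individual effect.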
We fit the dynamic model to the data and obtain the instrumental variables estimators and the panel GMM
estimators of $\gamma$ and $\beta$. Because neither of these two estimators is fully efficient in the dynamic model,
we cannot use the simplified form of the covariance matrix. We generate 500 bootstrap replicates to
estimate the variance matrix consistently, and then compute the Hausman test statistic $H$ under the null
hypothesis.
Figure 6 gives the density of the 500 computed values of the Hausman test statistic. The statistic is
expected to follow a chi-square distribution with 2 degrees of freedom.
Figure 4: Autocorrelation function of $y_{it}$ in the dynamic model
Figure 5: Autocorrelation function of the individual-specific deviations of $y_{it}$ from its time-averaged values in the dynamic model
The true size of the test is the proportion of the 500 replications in which $H > \chi^2_\alpha(2)$. The
size-corrected critical values are also obtained in the Monte Carlo study. For example, the upper 5th
percentile of the 500 simulated values of the Hausman test statistic is the size-corrected critical value
for the 0.05 significance level.
To study the power of the Hausman test under the alternative hypothesis, we again assume that the
individual effects depend on the mean of the regressor $x_{it}$, which is one particular specification among
the possible models under the alternative hypothesis. In the data-generating process, the individual
effects are generated as
$$\alpha_i = \frac{\sum_t x_{it}}{T} + \eta_i, \qquad \eta_i \sim N(0, \sigma_\eta^2)$$
with the initial value $y_{i1}$ set as in the R code of the Appendix. A new set of 500 replications is then
produced. The power is the proportion of these 500 replications in which $H > \chi^2_\alpha(2)$.
Figure 6: Density of 500 computed values of the Hausman test statistic in the dynamic model
The simulation results are presented in Table 3.
Table 3. Hausman test size and power in the dynamic model for 500 replications.
Nominal size    Actual size    Actual power    Size-corrected critical value
0.01            0.008          1               8.677593
0.025           0.024          1               7.297520
0.05            0.064          1               6.337962
If we increase the number of replications to 1000, the simulation results are as presented in Table 4.
Table 4. Hausman test size and power in the dynamic model for 1000 replications.
Nominal size    Actual size    Actual power    Size-corrected critical value
0.01            0.129          1               23.11916
0.025           0.195          1               17.69622
0.05            0.254          1               14.64198
The asymptotic approximation does not work well. With 500 replications (Table 3) the actual size is close
to the nominal size, but with 1000 replications (Table 4) the actual size of the test in the dynamic model
is much larger than the nominal size. The power equals 1 at every nominal size, which means the Hausman
test is powerful for detecting correlation between the mean of the regressor and the individual effects
in the dynamic model.
Compared with the Hausman test in the individual-specific effects model, the Type I error is larger in
the dynamic model. The Hausman test in the dynamic model has a larger probability of making mistakes, so
the test in the dynamic model is less efficient.
5. Empirical example
The following example is from Arellano and Bond (1991). The dataset “Employment and Wage in
England” (EmplUK in the R package plm, see the Appendix) is a panel of 140 firms in the United Kingdom
observed from 1976 to 1984. The total number of observations is 1031. It is an unbalanced panel with
n = 140 and T = 7-9.
Figure 7 and Figure 8 show that time-series correlation remains in the individual-specific deviations of
employment from its time-averaged values. We conclude that the time-series correlation in employment is
due not only to the individual-specific tendency but also to past employment. Thus we can fit a dynamic
model to the data.
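The dynamic model is
$$\log(\mathrm{emp}_{it}) = \gamma \log(\mathrm{emp}_{i,t-1}) + \beta_1 \log(\mathrm{wage}_{it}) + \beta_2 \log(\mathrm{capital}_{it}) + \beta_3 \log(\mathrm{output}_{it}) + \alpha_i + \varepsilon_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T \tag{5.1}$$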
Our aim is to find out whether there are fixed effects in the dynamic model. We apply the Hausman test
for the dynamic model described in Section 3.
The Hausman test statistic is 13.23045. The critical value of the chi-square distribution with 4 degrees
of freedom at the 0.05 significance level is 9.49. We therefore reject the null hypothesis and conclude
that fixed effects are present in the dynamic model (5.1).
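The critical value and the corresponding p-value can be checked with the chi-square functions of base R. The statistic and the degrees of freedom are taken from the text above; the variable names are assumptions of this sketch.

```r
H_stat <- 13.23045                         # Hausman statistic reported above
df     <- 4                                # number of coefficients compared

crit_5 <- qchisq(0.95, df = df)            # 5% critical value
p_val  <- pchisq(H_stat, df = df, lower.tail = FALSE)

print(c(critical_value = crit_5, p_value = p_val))
```

The p-value is about 0.010, so the null hypothesis is rejected at the 0.05 level, in line with the comparison against 9.49.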
Figure 7: Autocorrelation function of employment
Figure 8: Autocorrelation function of the individual-specific deviations of employment from its time-averaged values
6. Conclusion
I propose a Hausman test for the dynamic model. It is based on a comparison of the PGMM estimator
with an instrumental variables estimator that is consistent only under the null hypothesis. A Monte Carlo
simulation compares the finite-sample properties of the Hausman test in the dynamic model with those in
the individual-specific effects model. I find that the Hausman tests in both the individual-specific
effects model and the dynamic model are powerful for detecting correlation between the mean of the
regressor and the individual effects. The Hausman test is less efficient in the dynamic model than in the
individual-specific effects model.
References
[1] Cameron, A. C. and Trivedi, P. K. (2005). Microeconometrics: Methods and Applications. Cambridge University Press, New York.
[2] Arellano, M. and Bond, S. (1991). "Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations", The Review of Economic Studies, 58(2), 277–297.
[3] Anderson, T. W. and Hsiao, C. (1981). "Estimation of Dynamic Models with Error Components", Journal of the American Statistical Association, 76, 598–606.
[4] Holtz-Eakin, D., Newey, W. and Rosen, H. S. (1988). "Estimating Vector Autoregressions with Panel Data", Econometrica, 56(6), 1371–1395.
[5] Casella, G. and Berger, R. L. Statistical Inference, 2nd Edition, p.382, 383.
[6] Roodman, D. (2006). "How to Do xtabond2: An Introduction to "Difference" and "System" GMM in Stata", Working Papers 103, Center for Global Development.
[7] Hausman, J. A. (1978). "Specification Tests in Econometrics", Econometrica, 46(6), 1251–1271.
[8] Hansen, L. P. (1982). "Large Sample Properties of Generalized Method of Moments Estimators", Econometrica, 50, 1029–1054.
Appendix
R code for the Monte Carlo study of dynamic model:
library(plm)
library(sem)
library(boot)

# Generate one dynamic panel with N individuals and nt periods.
# fixed = TRUE makes the individual effect depend on the mean of x (fixed effects).
simulate_panel = function(N, nt, p, fixed) {
  mu = rnorm(N, 5, 1)
  T = c(1:nt)
  out = c()
  for (i in 1:N) {
    x = rnorm(nt, mu[i], 1)
    Epsilon = rnorm(nt, 0, 1)
    Eta = rnorm(1, 0, 1)
    ideffect = if (fixed) mean(x) + Eta else Eta
    y = c()
    for (j in 1:nt) {
      if (j == 1) y[j] = mu[i] else y[j] = p * y[j - 1] + ideffect + x[j] + Epsilon[j]
    }
    ID = rep(i, nt)
    out = rbind(out, data.frame(cbind(ID, T, y, x)))
  }
  out
}

# IV estimator (consistent under random effects only) and two-step difference GMM estimator.
# idname is the name of the individual index column.
fit_both = function(data, idname) {
  # plm is used only to build the aligned columns y, lag(y,1), lag(y,2) and x
  aux = plm(y ~ lag(y, 1) + lag(y, 2) + x - 1, data = data,
            effect = "individual", index = c(idname, "T"))
  ytry = aux$model[, 1]
  y_1tr = aux$model[, 2]
  xtry = aux$model[, 4]
  ivtry = y_1tr - aux$model[, 3]            # instrument y(t-1) - y(t-2)
  ivfit = tsls(ytry ~ y_1tr + xtry - 1, ~ xtry + ivtry)
  gmmfit = pgmm(y ~ lag(y, 1) + x | lag(y, 2:3), data = data,
                effect = "twoways", model = "twosteps")
  list(iv = ivfit$coefficients,
       gmm = as.matrix(gmmfit$coefficients[[2]][1:2]))
}

# Panel bootstrap: resample individuals with replacement, keeping the nt rows of
# each individual together (columns 2:4 are T, y, x). Returns the r x 2 matrix of
# differences between the IV and GMM estimates.
boot_diff = function(data, N, nt, r) {
  d = matrix(NA, r, 2)
  for (b in 1:r) {
    draw = sample(1:N, replace = TRUE)
    databoot = data.frame()
    for (j in 1:length(draw)) {
      rows = (1 + (draw[j] - 1) * nt):(nt + (draw[j] - 1) * nt)
      databoot = rbind(databoot, cbind(id = rep(j, nt), data[rows, 2:4]))
    }
    est_b = fit_both(databoot, "id")
    d[b, ] = as.numeric(est_b$iv) - as.numeric(est_b$gmm)
  }
  d
}

# Monte Carlo loop: returns the simulated panel-robust Hausman statistics
run_mc = function(reps, r, fixed) {
  H = c()
  for (k in 1:reps) {
    N = 100
    nt = 7
    p = 0.7
    data = simulate_panel(N, nt, p, fixed)
    est = fit_both(data, "ID")
    V = var(boot_diff(data, N, nt, r))     # panel bootstrap estimate of the variance matrix
    # the original-sample estimates are kept separate from the bootstrap estimates
    d = as.numeric(est$gmm) - as.numeric(est$iv)
    H[k] = t(d) %*% solve(V) %*% d         # the panel-robust Hausman test statistic
  }
  as.numeric(H)
}

count = function(x, value) sum(abs(x) > value)

############################# actual size
set.seed(127)
H = run_mc(500, r = 500, fixed = FALSE)
hist(H, freq = FALSE, breaks = 500, xlim = c(0, 30),
     main = "Monte Carlo Simulations of Hausman Test in dynamic models (autocorrelated data, 500 simulations)",
     xlab = "Hausman Test Statistic", ylab = "Density")
curve(dchisq(x, df = 2), col = 2, lty = 2, lwd = 2, add = TRUE)
size1 = count(H, 5.99) / 500   # nominal size 0.05
size2 = count(H, 7.38) / 500   # nominal size 0.025
size3 = count(H, 9.21) / 500   # nominal size 0.01

############################# the test power
set.seed(127)
H = run_mc(500, r = 250, fixed = TRUE)
hist(H, freq = FALSE, breaks = 10000, xlim = c(0, 50),
     main = "Monte Carlo Simulations of Hausman Test in dynamic models (autocorrelated data, 500 simulations)",
     xlab = "Hausman Test Statistic", ylab = "Density")
power1 = count(H, 5.99) / 500  # nominal size 0.05
power2 = count(H, 7.38) / 500  # nominal size 0.025
power3 = count(H, 9.21) / 500  # nominal size 0.01
R code for the Monte Carlo study of the individual-specific effects model
library(plm)

# Simulate reps static panels and return the Hausman statistics.
# fixed = TRUE makes the individual effect depend on the mean of x (fixed effects).
run_static = function(reps, fixed) {
  hausman = c()
  for (k in 1:reps) {
    mu = rnorm(100, 0, 1)
    N = 100
    nt = 7
    T = c(1:nt)
    paneldata = c()
    for (i in 1:N) {
      x = rnorm(nt, mu[i], 1)
      Epsilon = rnorm(nt, 0, 1)
      Eta = rnorm(1, 0, 1)
      ideffect = if (fixed) mean(x) + Eta else Eta
      y = ideffect + x + Epsilon
      ID = rep(i, nt)
      paneldata = rbind(paneldata, data.frame(cbind(ID, T, y, x)))
    }
    randomra = plm(y ~ x - 1, data = paneldata, effect = "individual",
                   model = "random", index = c("ID", "T"))
    randomfd = plm(y ~ x - 1, data = paneldata, effect = "individual",
                   model = "within", index = c("ID", "T"))
    ## Hausman test
    hausmantest = phtest(randomra, randomfd)
    hausman[k] = hausmantest$statistic
  }
  as.numeric(hausman)
}

############################# data generated under the random effects model
hausman = run_static(500, fixed = FALSE)
hist(hausman, freq = FALSE, breaks = 200, xlim = c(0, 10),
     main = "Monte Carlo Simulations of Hausman Test (500 simulations)",
     xlab = "Hausman Test Statistic", ylab = "Density")
curve(dchisq(x, df = 1), col = 2, lty = 2, lwd = 2, add = TRUE)

############################# the test power
hausman = run_static(500, fixed = TRUE)
count = function(x, value) sum(abs(x) > value)
power1 = count(hausman, 3.84) / 500    # nominal size 0.05
power2 = count(hausman, 5.02) / 500    # nominal size 0.025
power3 = count(hausman, 6.63) / 500    # nominal size 0.01
power4 = count(hausman, 2.7055) / 500  # nominal size 0.1
R code for the empirical example
library(plm)
library(sem)
data("EmplUK", package = "plm")

# IV estimator and two-step difference GMM estimator for model (5.1).
# The first two columns of data must be firm and year.
fit_both = function(data) {
  # plm is used only to build the aligned columns, including both lags of log(emp)
  aux = plm(log(emp) ~ lag(log(emp), 1) + lag(log(emp), 2) + log(wage) + log(capital) + log(output) - 1,
            data = data, effect = "individual", index = c("firm", "year"))
  Nemp = aux$model[, 1]
  Nemp_1 = aux$model[, 2]
  Nwage = aux$model[, 4]
  Ncapital = aux$model[, 5]
  Noutput = aux$model[, 6]
  iv = Nemp_1 - aux$model[, 3]             # instrument: lag 1 minus lag 2 of log(emp)
  ivfit = tsls(Nemp ~ Nemp_1 + Nwage + Ncapital + Noutput - 1, ~ Nwage + Ncapital + Noutput + iv)
  gmmfit = pgmm(log(emp) ~ lag(log(emp), 1) + log(wage) + log(capital) + log(output) | lag(log(emp), 2:3),
                data = data, effect = "twoways", model = "twosteps", transformation = c("d"))
  list(iv = ivfit$coefficients,
       gmm = as.matrix(gmmfit$coefficients[[2]][1:4]))
}

est = fit_both(EmplUK)
coe = est$iv
coeff = est$gmm

############################################### bootstrap
set.seed(127)                              # set once, before the bootstrap loop
r = 500
d = matrix(NA, r, 4)
for (i in 1:r) {
  draw = sample(1:140, replace = TRUE)     # resample the 140 firms with replacement
  databoot = data.frame()
  for (j in 1:length(draw)) {
    boot = EmplUK[EmplUK[, 1] == draw[j], 2:7]   # all rows of the drawn firm
    databoot1 = as.data.frame(cbind(firm = rep(j, nrow(boot)), boot))
    databoot = rbind(databoot1, databoot)
  }
  # bootstrap estimates are kept separate from the original-sample estimates
  est_b = fit_both(databoot)
  d[i, ] = as.numeric(est_b$iv) - as.numeric(est_b$gmm)
}
V = var(d)                                 # panel bootstrap estimate of the variance matrix
############################################ the panel-robust Hausman test statistic
H = t(coeff - coe) %*% solve(V) %*% (coeff - coe)