Paper 1:
TESTING FOR UNIT ROOTS WITH COINTEGRATED DATA
Paper 2:
ON THE PRACTICAL IMPORTANCE OF HURWICZ BIAS
IN MODELS WITH LAGGED DEPENDENT VARIABLES
TESTING FOR UNIT ROOTS WITH COINTEGRATED DATA
by
W. Robert Reed Department of Economics and Finance University of Canterbury, New Zealand
Email: [email protected]
Abstract
This paper demonstrates that unit root tests can suffer from inflated Type I error rates when data are cointegrated. Results from Monte Carlo simulations show that three commonly used unit root tests – the ADF, Phillips-Perron, and DF-GLS tests – frequently overreject the true null of a unit root for at least one of the cointegrated variables. The reason for this overrejection is that unit root tests, designed for random walk data, are often misspecified when data are cointegrated. While the addition of lagged differenced (LD) terms can eliminate the size distortion, this “success” is spurious, driven by collinearity between the lagged dependent variable and the LD explanatory variables. Accordingly, standard diagnostics such as (i) testing for serial correlation in the residuals and (ii) using information criteria to select among different lag specifications are futile. The implication of these results is that researchers should be conservative in the weight they attach to individual unit root tests when determining whether data are cointegrated.
Keywords: Unit root testing, cointegration, DF-GLS test, Augmented Dickey-Fuller test, Phillips-Perron test, simulation JEL classification: C32, C22, C18
May 30, 2015
Acknowledgments: This paper has been much improved as a result of comments from David Giles, Peter Phillips, Morten Nielsen, Marco Reale, and seminar participants at the “1st Conference on Recent Developments in Financial Econometrics and Applications” held at Deakin University, 4-5 December, 2014. Remaining errors are the sole property of the author.
I. INTRODUCTION
When estimating relationships among time series data, it is standard practice to first test for
unit roots in the individual series. If the data are integrated, one then moves to testing
whether the variables are cointegrated. This note points out that unit root tests are likely to
suffer from size distortions precisely because the data are cointegrated. These size distortions
are often substantial.
I illustrate this using a simple autoregressive, distributed lag (ARDL) system of two
variables. The ARDL framework has a number of features which make it attractive for
modelling dynamic relationships. It allows for interactions between variables, and
incorporates both endogeneity and own and cross-lagged effects. These features capture
likely behaviors of real economic time series. The ARDL framework can be “solved” to
identify parameter values that cause the two variables to be cointegrated. Furthermore, the
ARDL framework is easily transformed to an error correction specification, which facilitates
interpretation of dynamic relationships.
TABLE 1 illustrates the problem with size. X and Y are two simulated data series
where the parameter values for the data generating process (DGP) have been chosen to ensure
that they are cointegrated. Because the series are cointegrated, each of the series must have a
unit root. I subject each series to three unit root tests: the augmented Dickey-Fuller test
(ADF), the Phillips-Perron test, and the DF-GLS test. 10,000 simulations with sample size T = 100 were conducted. Significance levels were set equal to 0.05. The table reports the associated Type I error rates. All simulations were done using Stata, Version 14.1
While the ADF and DF-GLS tests produce Type I error rates for X close to 0.05, the
Phillips-Perron test produces an error rate over 0.40. For Y, the results are much worse.
Type I error rates are 0.206, 1.000, and 0.685 for the ADF, Phillips-Perron, and DF-GLS
1 All programs used to produce the results for this paper are available from the author.
tests, respectively.2 The ADF regressions show good diagnostics, with little serial correlation
evident in the residuals. As I show below, unit root test results such as these are quite easy to
produce with cointegrated data.
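Size distortions of this kind are easy to reproduce in outline. The sketch below (Python/numpy rather than the paper's Stata code; the VAR coefficients are illustrative values whose characteristic roots are 1 and 0.59) simulates a cointegrated pair and tallies how often a simple Dickey-Fuller t-test rejects the true unit root null at the 5 percent level:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative VAR(1) lag matrix: its characteristic roots are 1 and 0.59, so
# the two series are cointegrated while each one individually has a unit root.
A = np.array([[0.84, 0.20],
              [0.20, 0.75]])

def simulate(T=100, burn=50):
    z = np.zeros(2)
    path = np.empty((T + burn, 2))
    for t in range(T + burn):
        z = A @ z + rng.standard_normal(2)
        path[t] = z
    return path[burn:]                     # columns: y, x

def df_tstat(y):
    """t-statistic on y_{t-1} in the DF regression dy_t = c + g*y_{t-1} + e_t."""
    dy, ylag = np.diff(y), y[:-1]
    X = np.column_stack([np.ones_like(ylag), ylag])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    e = dy - X @ beta
    s2 = (e @ e) / (len(dy) - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

# The 5% Dickey-Fuller critical value (constant, T = 100) is about -2.89.
nsim = 500
rej = sum(df_tstat(simulate()[:, 0]) < -2.89 for _ in range(nsim)) / nsim
print("Type I error rate for y:", rej)
```

Because the DF regression omits the other member of the cointegrated pair, the reported rejection rate will generally sit well above the nominal 5 percent.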
I proceed as follows. Section II presents the theory that motivates the simulation
work. Section III presents additional Monte Carlo evidence of size distortions for
cointegrated data. Section IV provides an explanation for my results. Section V concludes
by discussing the implication of these findings for estimation of error correction models.
II. THEORY
Consider the following ARDL(1,1) model:
(1)  y_t = b10 − b12 x_t + γ11 y_{t-1} + γ12 x_{t-1} + ε_{y,t},   ε_{y,t} ~ N(0,1)
     x_t = b20 − b21 y_t + γ21 y_{t-1} + γ22 x_{t-1} + ε_{x,t},   ε_{x,t} ~ N(0,1),
t = 1, 2, …, T. This can be rewritten in VAR form as:
(2)  y_t = a10 + a11 y_{t-1} + a12 x_{t-1} + e_{y,t}
     x_t = a20 + a21 y_{t-1} + a22 x_{t-1} + e_{x,t},
where the parameters a_ij and the error terms e_{y,t} and e_{x,t} are each functions of the b and γ terms and the error terms of Equation (1).
Define A = [a11  a12; a21  a22], the matrix of coefficients on the lagged variables in Equation (2). The determinant of the matrix (A − λI) is the characteristic equation of A, and the values of λ that set this equation equal to zero are the associated characteristic roots, or eigenvalues:
(3)  |A − λI| = 0.
A necessary condition for y_t and x_t to be CI(1,1) is that the corresponding solutions to (3) be given by λ1 = 1, |λ2| < 1.
2 Lag lengths for the Phillips-Perron and DF-GLS tests were chosen using the default options supplied by Stata.
The following conditions on a11, a22, and the product a12·a21 are sufficient to ensure that λ1 = 1, |λ2| < 1:3
(4a)  0 < a11 < 1
(4b)  0 < a22 < 1
(4c)  a12·a21 = (1 − a11)(1 − a22).
Given (4c), the characteristic roots are λ1 = 1 and λ2 = a11 + a22 − 1; conditions (4a) and (4b) then ensure that |λ2| < 1.
We can work backwards from (4a)–(4c) to obtain a_ij values consistent with λ1 = 1, |λ2| < 1.
Let α_y, α_x, and θ (interpreted below as the speed-of-adjustment and long-run parameters of the associated error correction model) take any values such that −1 < α_y < 0 and 0 < α_x·θ < 1. Then
(5a)  a12 = −α_y·θ
(5b)  a11 = 1 + α_y
(5c)  a21 = α_x
(5d)  a22 = 1 − α_x·θ
will produce a11, a12, a21, and a22 values such that λ1 = 1, |λ2| < 1.
Equation (2) can be rearranged in vector error correction (VEC) model form as:
(6)  Δy_t = α_y (y_{t-1} − θ x_{t-1}) + e_{y,t}
     Δx_t = α_x (y_{t-1} − θ x_{t-1}) + e_{x,t},
where the coefficients α_y, α_x, and θ, as well as the error terms e_{y,t} and e_{x,t}, are functions of the b and γ terms of Equation (1). This allows the long-run equilibrium relationship between y_t and x_t, represented by the parameter θ, and the speed-of-adjustment parameters α_y and α_x, to all be expressed as functions of a11, a12, and a21:
(7a)  α_y = −(1 − a11)
(7b)  α_x = a21
3 These conditions are taken from Enders (2010, page 369). I have made the conditions more restrictive to ensure that the speed-of-adjustment parameters have the correct sign and size.
(7c)  θ = a12 / (1 − a11).
The parameter values chosen in this way will ensure that (i) −1 < α_y < 0, and (ii) 0 < α_x·θ < 1, so that the VEC model is well-behaved.
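The construction described in this section can be checked numerically. The snippet below (Python/numpy; the α and θ values are illustrative) works backwards from chosen speed-of-adjustment and long-run parameters to a VAR lag matrix, confirms that one characteristic root equals one, and recovers the error correction parameters from the VAR coefficients:

```python
import numpy as np

# Illustrative error correction characteristics: speed-of-adjustment
# parameters alpha_y and alpha_x, and long-run parameter theta.
alpha_y, alpha_x, theta = -0.16, 0.20, 1.25
assert -1 < alpha_y < 0 and 0 < alpha_x * theta < 1

# Work backwards to the VAR lag coefficients.
a11 = 1 + alpha_y
a12 = -alpha_y * theta
a21 = alpha_x
a22 = 1 - alpha_x * theta
A = np.array([[a11, a12], [a21, a22]])

# One characteristic root equals 1; the other lies inside the unit circle.
lam = np.sort(np.linalg.eigvals(A).real)
print("characteristic roots:", lam)        # one root at 1, the other at 0.59

# Recover the error correction parameters from the VAR coefficients.
assert np.isclose(-(1 - a11), alpha_y)
assert np.isclose(a21, alpha_x)
assert np.isclose(a12 / (1 - a11), theta)
```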
III. RESULTS
This section reports results from ten additional cases that highlight the problem with size
distortions. The first two columns of TABLE 2 describe the model parameters and time
series characteristics associated with the data generating processes (DGPs) for each case. I
have chosen cases that cover a wide range of behaviours. The cases are sorted in ascending order of the magnitude of α_y, the speed-of-adjustment coefficient for the Y series; α_y ranges from −0.16 to −0.90. The last column reports the characteristic roots associated with the respective model parameters. In all cases, λ1 = 1, |λ2| < 1.
TABLE 3 reports more simulation findings demonstrating that the results from
TABLE 1 are not isolated outcomes. The ten panels of TABLE 3 correspond to the ten cases
of TABLE 2. The top panel reports Monte Carlo results using the parameter values from
Case 1:
b10 = 0, b12 = 2, γ11 = 1.24, γ12 = 1.70, b20 = 0, b21 = 5, γ21 = 4.40, γ22 = 1.75.
The fact that both cross-lag coefficients, γ12 and γ21, are nonzero implies that if one of the series is I(1), the other must be as well (cf. Equation 1). These values generate a VEC model with long-run equilibrium and speed-of-adjustment parameters θ = 1.25, α_y = −0.16, and α_x = 0.20. This implies that the long-run relationship between Y and X is given by y = 1.25x. A one-unit increase in y_{t-1} above its equilibrium value causes the next period's value of Y to decrease by 0.16 units. A one-unit increase in x_{t-1} above its equilibrium value causes the next period's value of X to decrease by 0.25 units (= α_x·θ = 0.20 × 1.25). As in TABLE 1, the Monte Carlo results are based on 10,000 simulations with sample size T = 100. As a point of comparison,
TABLE 3 also reports the results of unit root tests for a random walk variable, z_t = z_{t-1} + ε_{z,t}, ε_{z,t} ~ N(0,1). The Z column is useful for illustrating the range of deviations that can be expected from sampling error.
The results for the X variable demonstrate that the size distortions associated with
each of the tests can be quite substantial. The Type I error rates for the ADF, Phillips-Perron,
and DF-GLS tests are 53.6, 87.6, and 64.0 percent, respectively. Thus, given sample data
from this DGP and applying any of the three tests supplied in Stata, a researcher would
incorrectly conclude that the X variable was stationary over half the time. The results for Y
also show size distortions, but of a smaller degree. These results are to be compared to those
reported for the Z variable, which is a pure random walk process. All three unit root tests
produce Type I error rates for Z that are close to 5 percent.
As is well-known, results from unit root tests can differ substantially depending on the
number of lagged differenced (LD) terms included in the unit root specification. Stata
automatically selects the number of LD terms for the Phillips-Perron and DF-GLS tests. The
ADF test requires the user to supply the number of lags. For the ADF tests, I chose lag
orders that were sufficient to generate white noise behaviour in the residuals.4 The last row
of the panel reports the results of a Breusch-Godfrey test where the null hypothesis is no
serial correlation. The test results for the X and Y variables are close to the value of 0.05 that
one would expect were there no serial correlation. These are virtually identical to those for
the random walk Z variable which has no serial correlation by construction. Based on these
results, a researcher would conclude that the ADF test was correctly specified.
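The residual diagnostic used here can be sketched compactly: fit the ADF regression by OLS, then run a Breusch-Godfrey auxiliary regression of the residuals on the original regressors and one lagged residual. A minimal numpy version (the 5% χ²(1) critical value, 3.84, is hard-coded; data and lag order are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def adf_bg_stat(y, p):
    """Breusch-Godfrey LM statistic (one lag) for the residuals of an ADF
    regression with p lagged differenced terms."""
    dy = np.diff(y)
    n = len(dy) - p
    X = np.column_stack(
        [np.ones(n), y[p:-1]] +
        [dy[p - j:len(dy) - j] for j in range(1, p + 1)])
    _, e = ols(X, dy[p:])
    # Auxiliary regression of e_t on X and e_{t-1}; LM = n * R^2 ~ chi2(1).
    elag = np.concatenate([[0.0], e[:-1]])
    _, u = ols(np.column_stack([X, elag]), e)
    r2 = 1 - (u @ u) / (e @ e)
    return len(e) * r2

y = np.cumsum(rng.standard_normal(100))    # a random walk: the null is true
lm = adf_bg_stat(y, p=2)
print("BG LM statistic:", round(lm, 3), "| reject at 5%:", lm > 3.84)
```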
The next nine panels report more unit root test results. I have highlighted the results
that show substantial size distortions. In the second panel, unit root tests for both the X and Y
variables reveal Type I error rates well above 5 percent for all three tests. For the ADF test,
4 Lag lengths for Cases 1 to 10 are 2, 2, 4, 3, 6, 2, 3, 4, 4, and 5, respectively.
Type I error rates are 0.390 and 0.309, respectively. The Breusch-Godfrey test results
indicate that sufficient lags have been included in the ADF specification. For the Phillips-
Perron and DF-GLS tests, the Type I error rates are 0.770 and 0.655, and 0.492 and 0.394,
respectively. The results for the X and Y variables contrast with the results for the benchmark
Z variable, which are approximately 5 percent across all three tests. The results from this
second panel indicate that a researcher would frequently conclude that X and Y were both stationary, and hence not cointegrated.
The highlighted areas in the subsequent panels accumulate further evidence that unit
root tests of cointegrated data are frequently characterized by substantial size distortions. An
egregious example is Case 6, where the Type I error rates for the Y variable are 0.921, 1.000,
and 0.931. A researcher would incorrectly classify the order of integration for this variable
over 90 percent of the time using any of the three unit root tests provided by Stata.
It turns out that the size distortions for the ADF test can be eliminated by adding
sufficient lagged differenced (LD) terms to the ADF specification. However, knowing the
correct number of LD terms to add is impossible in practice. Two common methods for
determining the number of LD terms are (i) testing the residuals for serial correlation; and (ii)
using information criteria such as the AIC and SIC to select the LD specification with the
lowest AIC/SIC value (Harris, 1992).
TABLE 4 reports the results of an analysis where these two methods are employed to
determine the appropriate number of LD terms to add to the ADF specification. The X and Y
data for TABLE 4 are generated using the DGP for Case 1. As before, the Z data are pure
random walk data and are included as a benchmark. One to ten LD terms are successively
added to the ADF specification. Breusch-Godfrey tests for each LD specification are
reported in the top panel of TABLE 4. Average AIC and SIC values for each LD
specification are reported in the subsequent two panels.
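The mechanics of this experiment can be sketched as follows (numpy only, not the paper's Stata code; AIC/SIC are computed from the Gaussian log-likelihood, and the estimation sample is held fixed across lag orders as in the note to TABLE 4):

```python
import numpy as np

rng = np.random.default_rng(1)

def adf_fit(y, p, pmax):
    """ADF regression with p lagged differences, estimated on a common sample."""
    dy = np.diff(y)
    n = len(dy) - pmax                     # same observations for every p
    X = np.column_stack(
        [np.ones(n), y[pmax:pmax + n]] +
        [dy[pmax - j:pmax - j + n] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, dy[pmax:], rcond=None)
    e = dy[pmax:] - X @ beta
    k = X.shape[1]
    llf = -0.5 * n * (np.log(2 * np.pi * (e @ e) / n) + 1)
    aic = -2 * llf + 2 * k
    sic = -2 * llf + np.log(n) * k
    t = beta[1] / np.sqrt((e @ e) / (n - k) * np.linalg.inv(X.T @ X)[1, 1])
    return aic, sic, t

y = np.cumsum(rng.standard_normal(111))    # a pure random walk, like Z
for p in range(1, 11):
    aic, sic, t = adf_fit(y, p, pmax=10)
    print(f"LAGS = {p:2d}  AIC = {aic:8.2f}  SIC = {sic:8.2f}  t = {t:6.2f}")
```

Looping this over many replications of a cointegrated DGP, and averaging the criteria and tallying rejections, reproduces the structure of TABLE 4.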
TABLE 4 is designed to address this thought experiment: based on the results from
the first three panels, how many LD terms would a researcher think is the “correct” number
of terms to add? For example, when LAGS = 1, the null hypothesis of no serial correlation is
rejected approximately 6.0 and 5.7 percent of the time for the X and Y series. For LAGS = 2,
rejection rates are 5.6 and 5.5 percent.5 The average AIC values for the X and Y series when
LAGS = 1 are 168.96 and 41.51. These successively increase as additional LD terms are
added. Likewise, the average SIC values for the X and Y series achieve their minimum when
LAGS = 1. Using the diagnostics from these three panels, a researcher might conclude that
the “correct” number of LD terms to add was 1 or 2.
The fourth panel of TABLE 4 reports the ADF Type I error rates for each LD
specification. When LAGS = 1, the Type I error rates for the X and Y variables are 0.734 and
0.148. When LAGS = 2, they are 0.546 and 0.106, respectively. In other words, using
commonly accepted methods for determining the appropriate number of LD terms, a
researcher would likely conclude that one, or at most two, LD terms was sufficient to control
for serial correlation in the ADF specification. Either strategy would result in the researcher
concluding that the X variable was stationary over half the time. In fact, it would take ten or
more LD terms to reduce the size of the ADF test to 5 percent. The diagnostic tests are
unable to identify the appropriate number of LD terms.
Similar results are obtained for the remaining cases (the Appendix reports the results
of following the same procedure for Case 2). In all cases, the information criteria select a
single LD term. Tests for serial correlation generally indicate that more than one LD term
should be included, but not so many as to eliminate the size distortion. The next section
explains that this inability of the diagnostic tests is because the apparent “success” from
adding LD terms to the ADF specification is spurious.
5 In practice, a researcher only has a single test for serial correlation to go on, so it is likely that he/she would find a single LD term to be sufficient in this case.
IV. DISCUSSION
The explanation for the poor size performance of unit root tests with cointegrated data can be
linked to how critical values are determined in the Dickey-Fuller framework. Given Monte Carlo simulation of the random walk process,
z_t = z_{t-1} + ε_t,
repeated OLS estimation of the DF specification below
Δz_t = c + γ z_{t-1} + ε_t
produces an empirical distribution of the t-values, t = γ̂ / s.e.(γ̂), associated with testing the null hypothesis H0: γ = 0, which is true given the random walk process.
In my simulations of the ARDL(1,1) data, the DGP is given by
y_t = a10 + a11 y_{t-1} + a12 x_{t-1} + e_{y,t}
x_t = a20 + a21 y_{t-1} + a22 x_{t-1} + e_{x,t},
and the corresponding differenced specifications are given by
Δy_t = a10 + γ_y y_{t-1} + u_{y,t}
Δx_t = a20 + γ_x x_{t-1} + u_{x,t},
where γ_y = −(1 − a11), γ_x = −(1 − a22), u_{y,t} = a12 x_{t-1} + e_{y,t}, and u_{x,t} = a21 y_{t-1} + e_{x,t}. While the variables Y and X are each I(1) in the ten cases of TABLE 2, (1 − a11) ≠ 0 and (1 − a22) ≠ 0. In other words, the unit root tests are misspecified: the null hypothesis H0: γ = 0 is not true whenever a11 (respectively, a22) is not equal to one.
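This misspecification can be seen numerically. In the sketch below (illustrative coefficient values, chosen so that the characteristic roots of the lag matrix are 1 and 0.59), the coefficient on y_{t-1} in the differenced specification estimates −(1 − a11) = −0.16 rather than zero, even though y has an exact unit root:

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative VAR coefficients; the lag matrix has roots 1 and 0.59.
a11, a12, a21, a22 = 0.84, 0.20, 0.20, 0.75
A = np.array([[a11, a12], [a21, a22]])

# Simulate a long cointegrated sample (intercepts set to zero).
T = 20000
z = np.zeros((T, 2))
for t in range(1, T):
    z[t] = A @ z[t - 1] + rng.standard_normal(2)
y, x = z[:, 0], z[:, 1]

# Regress dy_t on y_{t-1} and x_{t-1}: the coefficient on y_{t-1}
# recovers -(1 - a11), not zero.
dy = np.diff(y)
X = np.column_stack([np.ones(T - 1), y[:-1], x[:-1]])
beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
print("estimated coefficient on y_{t-1}:", beta[1])   # close to -0.16
```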
This also explains why adding more LD terms “improves” the performance of the
ADF test. As in System GMM, the LD terms are predictors of the lagged dependent variable.
The more LD terms are included, the greater the multicollinearity in the ADF specification, and the less statistically significant the coefficient on the lagged dependent variable becomes. This
makes it appear that the ADF test is more “successful” in identifying unit root processes as
more LD terms are added to the ADF specification. This is a spurious success due to
increased collinearity between the lagged dependent variable and the LD terms.
V. CONCLUSION
This paper demonstrates that unit root tests can suffer from inflated Type I error rates when
data are cointegrated. This should be of interest to researchers who are interested in
estimating relationships between nonstationary variables. Standard procedure calls for
testing variables for unit roots before proceeding to the estimation of error correction models.
The results of this study demonstrate that the very fact that the data are cointegrated can
render unit root tests unreliable. This suggests that researchers should be conservative in the
weight they attach to individual unit root tests, opting for a more holistic approach when
determining whether data are cointegrated.
REFERENCES
Enders, W. 2010. Applied Econometric Time Series, 3rd Edition. New York: John Wiley & Sons.
Harris, R.I.D. 1992. Testing for unit roots using the augmented Dickey-Fuller test: Some issues relating to the size, power and the structure of the test. Economics Letters 38: 381-386.
TABLE 1
Example of Unit Root Test Results Using Cointegrated Data
UNIT ROOT TEST X Y
ADF 0.051 0.206
Phillips-Perron 0.410 1.000
DF-GLS 0.087 0.685
NOTE: Values in the table are Type I error rates associated with the null
hypothesis that the data series have a unit root. The underlying DGP is the ARDL framework represented by Equation (1) in the text, with the following parameter values:
0, 3, 1, 3, 0, 1, 0.2, 0.2.
The corresponding long-run equilibrium and speed-of-adjustment parameters (see Equations (7a)–(7c)) are given by: 0.1, 0.5, 1, 0.1.
TABLE 2
DESCRIPTION OF CASES

CASE  MODEL PARAMETERS (1)                TIME SERIES CHARACTERISTICS (2)  CHARACTERISTIC ROOTS (3)
1     2.00, 1.24, 1.70, 5.00, 4.40, 1.75  0.16, 0.20, 0.25, 1.25           1, 0.59
2     2.00, 1.80, 1.60, 5.00, 4.50, 1.25  0.20, 0.50, 0.25, 0.50           1, 0.55
3     2.00, 0.35, 0.60, 3.00, 2.05, 2.80  0.25, 0.20, 0.80, 4.00           1, 0.05
4     1.00, 1.45, 0.90, 5.00, 3.65, 1.30  0.45, 0.90, 0.20, 0.22           1, 0.35
5     0.50, 0.72, 0.85, 1.00, 0.92, 1.10  0.48, 0.40, 0.50, 1.25           1, 0.02
6     3.00, 1.00, 3.00, 1.00, 0.20, 0.20  0.50, 1.00, 0.10, 0.10           1, 0.40
7     2.00, 2.20, 1.80, 5.00, 2.90, 1.35  0.60, 0.90, 0.15, 0.17           1, 0.25
8     0.80, 0.72, 1.08, 0.60, 0.64, 0.96  0.60, 0.40, 0.40, 1.00           1, 0
9     0.80, 0.52, 1.16, 0.60, 0.52, 1.06  0.80, 0.40, 0.30, 0.75           1, 0.1
10    2.00, 1.90, 1.90, 5.00, 1.40, 1.40  0.90, 0.90, 0.10, 0.11           1, 0
NOTE: Each of the cases above is based on the ARDL framework of Equation (1), with associated parameter values given in Column (1). The values reported in Column (2) are, in order, the speed-of-adjustment parameters α_y (reported in absolute value; α_y is negative) and α_x, the product α_x·θ, and the long-run relationship parameter θ in the error correction model of Equation (6) that corresponds to the model parameters for that case. The product α_x·θ identifies the systematic change in Δx_t corresponding to a one-unit deviation of x_{t-1} from its long-run equilibrium value. Necessary conditions for the series to be well-behaved are (i) −1 < α_y < 0, and (ii) 0 < α_x·θ < 1. The last column reports the values of the characteristic roots of the VAR specification in Equation (2) for that case. A necessary condition for the series to be cointegrated is that λ1 = 1, |λ2| < 1.
TABLE 3
More Examples of Unit Root Test Results Using Cointegrated Data
CASE TEST X Y Z
1
ADF 0.536 0.100 0.055
Phillips-Perron 0.876 0.203 0.062
DF-GLS 0.640 0.121 0.045
BREUSCH-GODFREY TEST 0.056 0.058 0.059
2
ADF 0.390 0.309 0.055
Phillips-Perron 0.770 0.655 0.062
DF-GLS 0.492 0.394 0.045
BREUSCH-GODFREY TEST 0.059 0.057 0.059
3
ADF 0.113 0.048 0.049
Phillips-Perron 0.970 0.029 0.062
DF-GLS 0.429 0.066 0.045
BREUSCH-GODFREY TEST 0.056 0.063 0.062
4
ADF 0.200 0.403 0.053
Phillips-Perron 0.787 0.963 0.062
DF-GLS 0.419 0.680 0.045
BREUSCH-GODFREY TEST 0.063 0.069 0.057
5
ADF 0.159 0.106 0.045
Phillips-Perron 1.000 1.000 0.062
DF-GLS 0.756 0.639 0.045
BREUSCH-GODFREY TEST 0.053 0.063 0.063
6
ADF 0.051 0.921 0.055
Phillips-Perron 0.024 1.000 0.062
DF-GLS 0.038 0.931 0.045
BREUSCH-GODFREY TEST 0.061 0.056 0.059
7
ADF 0.083 0.578 0.053
Phillips-Perron 0.452 0.998 0.062
DF-GLS 0.170 0.808 0.045
BREUSCH-GODFREY TEST 0.057 0.066 0.057
8
ADF 0.266 0.405 0.049
Phillips-Perron 0.999 1.000 0.062
DF-GLS 0.677 0.782 0.045
BREUSCH-GODFREY TEST 0.062 0.062 0.062
9
ADF 0.087 0.311 0.049
Phillips-Perron 0.952 1.000 0.062
DF-GLS 0.354 0.696 0.045
BREUSCH-GODFREY TEST 0.055 0.061 0.062
10
ADF 0.050 0.347 0.048
Phillips-Perron 0.316 1.000 0.062
DF-GLS 0.085 0.826 0.045
BREUSCH-GODFREY TEST 0.059 0.063 0.060
NOTE: The values in the table are the rejection rates of the respective null hypothesis. For the unit root tests (ADF, Phillips-Perron, and DF-GLS), the null hypothesis is that the series has a unit root. For the Breusch-Godfrey tests, the null hypothesis is that the residuals associated with the ADF test are not serially correlated.
TABLE 4
The Effect of Adding Lagged Differenced Terms to the Dickey-Fuller Unit Root Regression Equation: Case 1
X Y Z
BREUSCH-GODFREY TESTS:
LAGS = 1 0.060 0.057 0.057
LAGS = 2 0.056 0.055 0.063
LAGS = 3 0.062 0.056 0.060
LAGS = 4 0.056 0.054 0.059
LAGS = 5 0.059 0.059 0.060
LAGS = 6 0.060 0.060 0.062
LAGS = 7 0.059 0.061 0.061
LAGS = 8 0.067 0.058 0.063
LAGS = 9 0.056 0.064 0.065
LAGS = 10 0.060 0.061 0.064
AIC VALUES:
LAGS = 1 168.96 41.51 279.67
LAGS = 2 169.93 42.39 280.44
LAGS = 3 170.91 43.55 281.41
LAGS = 4 171.84 44.49 282.39
LAGS = 5 172.91 45.38 283.40
LAGS = 6 173.65 46.40 284.03
LAGS = 7 174.65 47.31 284.93
LAGS = 8 175.51 48.31 285.79
LAGS = 9 176.73 49.25 286.81
LAGS = 10 177.49 50.05 287.48
SIC VALUES:
LAGS = 1 179.34 51.89 290.05
LAGS = 2 182.90 55.37 293.42
LAGS = 3 186.48 59.12 296.98
LAGS = 4 190.00 62.66 300.56
LAGS = 5 193.67 66.14 304.16
LAGS = 6 197.01 69.75 307.39
LAGS = 7 200.60 73.26 310.88
LAGS = 8 204.06 76.86 314.33
LAGS = 9 207.87 80.39 317.95
LAGS = 10 211.23 83.78 321.22
TABLE 4 (continued)
X Y Z
ADF UNIT ROOT TESTS:
LAGS = 1 0.734 0.148 0.054
LAGS = 2 0.546 0.106 0.050
LAGS = 3 0.405 0.085 0.055
LAGS = 4 0.291 0.068 0.054
LAGS = 5 0.222 0.063 0.047
LAGS = 6 0.166 0.054 0.048
LAGS = 7 0.143 0.054 0.043
LAGS = 8 0.116 0.046 0.042
LAGS = 9 0.092 0.051 0.045
LAGS = 10 0.082 0.046 0.044
NOTE: The values in the top panel (“Breusch-Godfrey Tests”) are the rejection rates associated with the null hypothesis of no serial correlation for alternative specifications of lagged differenced (LD) terms in the ADF specification. The values in the next two panels (“AIC Values” and “SIC Values”) are the average information criteria values associated with the respective LD specifications. The number of observations is held constant across the different specifications. The values in the bottom panel (“ADF Unit Root Tests”) are the Type I error rates associated with the null hypothesis of a unit root using the ADF test with the designated number of lagged differenced terms.
APPENDIX
The Effect of Adding Lagged Differenced Terms to the Dickey-Fuller Unit Root Regression Equation: Case 2
X Y Z
BREUSCH-GODFREY TESTS:
LAGS = 1 0.069 0.069 0.057
LAGS = 2 0.060 0.060 0.063
LAGS = 3 0.063 0.059 0.060
LAGS = 4 0.056 0.057 0.059
LAGS = 5 0.062 0.056 0.060
LAGS = 6 0.059 0.058 0.062
LAGS = 7 0.059 0.061 0.061
LAGS = 8 0.065 0.056 0.063
LAGS = 9 0.060 0.060 0.065
LAGS = 10 0.063 0.064 0.064
AIC VALUES:
LAGS = 1 179.18 14.35 279.67
LAGS = 2 179.93 15.04 280.44
LAGS = 3 181.00 16.15 281.41
LAGS = 4 181.89 16.95 282.39
LAGS = 5 182.89 17.86 283.40
LAGS = 6 183.62 18.93 284.03
LAGS = 7 184.67 19.96 284.93
LAGS = 8 185.51 20.94 285.79
LAGS = 9 186.71 21.91 286.81
LAGS = 10 187.58 22.78 287.48
SIC VALUES:
LAGS = 1 189.56 24.73 290.05
LAGS = 2 192.90 28.02 293.42
LAGS = 3 196.57 31.72 296.98
LAGS = 4 200.06 35.12 300.56
LAGS = 5 203.65 38.62 304.16
LAGS = 6 206.97 42.28 307.39
LAGS = 7 210.62 45.91 310.88
LAGS = 8 214.06 49.49 314.33
LAGS = 9 217.85 53.05 317.95
LAGS = 10 221.32 56.52 321.22
APPENDIX (continued)
X Y Z
ADF UNIT ROOT TESTS:
LAGS = 1 0.592 0.477 0.054
LAGS = 2 0.411 0.324 0.050
LAGS = 3 0.287 0.218 0.055
LAGS = 4 0.202 0.163 0.054
LAGS = 5 0.145 0.123 0.047
LAGS = 6 0.113 0.094 0.048
LAGS = 7 0.098 0.083 0.043
LAGS = 8 0.079 0.065 0.042
LAGS = 9 0.067 0.056 0.045
LAGS = 10 0.059 0.052 0.044
NOTE: Values in the top panel (“Breusch-Godfrey Tests”) are the rejection rates associated with the null hypothesis of no serial correlation. Values in the next two panels (“AIC Values” and “SIC Values”) are the average information criteria values associated with the respective lag specifications. The number of observations is held constant across the different specifications. The values in the bottom panel (“ADF Unit Root Tests”) are the Type I error rates associated with the null hypothesis of a unit root using the ADF test with the designated number of lagged differenced terms.
NOTE: This is an early draft of the paper. Please do not circulate.
ON THE PRACTICAL IMPORTANCE OF HURWICZ BIAS IN MODELS WITH LAGGED DEPENDENT VARIABLES
by
W. Robert Reed Department of Economics and Finance University of Canterbury, New Zealand
Email: [email protected]
Min Zhu School of Economics and Finance
Queensland University of Technology, Australia Email: [email protected]
Abstract
It is well-known that OLS estimation of the time series model y_t = α + ρ y_{t-1} + ε_t, ε_t ~ N(0, σ²), produces biased estimates of ρ in finite samples, though OLS is consistent. This
bias was first discussed in Hurwicz (1950), and investigated further in Phillips (1977). Most applied researchers are seemingly unperturbed by this fact, presumably resting their confidence on the asymptotic properties of the OLS estimator of ρ. The purpose of this note is to remind researchers that the size of the bias can be very large in reasonably-sized samples. This has important implications for estimation of dynamic models. I illustrate the size of the bias in three settings: (i) a single-equation, time series model where y_t is regressed on y_{t-1}; (ii) a single-equation, auto-regressive, distributed lag (ARDL) model where the researcher is interested in estimating the long-run propensity of an independent explanatory variable x; and (iii) a single-equation, dynamic panel data (DPD) model where the researcher is again interested in estimating the long-run propensity of x. In all three cases, Type I error rates are sufficiently large to vitiate the usefulness of hypothesis testing in many cases.
KEYWORDS: Hurwicz bias, Auto-Regressive Distributed-Lag models, ARDL, Dynamic Panel Data models, DPD, Anderson-Hsiao, Arellano-Bond, Difference GMM, System GMM, Nickell bias, simulation
JEL classification: C22, C23
Revised, 4 July 2015
Acknowledgments: This paper acknowledges helpful comments from David Frazier, Adrian Pagan, Yongcheol Shin, Richard Smith, Jeffrey Wooldridge, and seminar participants at the NZESG meetings in February 2015 and the NZAE meetings in July 2015. Remaining errors are the sole property of the authors.
I. INTRODUCTION
It is well-known that OLS estimation of the model,
(1)  y_t = α + ρ y_{t-1} + ε_t,
t = 1,2,...,T, where ε_t ~ N(0, σ²) and |ρ| < 1, produces biased estimates of ρ in finite samples, though OLS is consistent. In particular, E(ρ̂) < ρ. This bias was first discussed in Hurwicz (1950) and investigated further in Phillips (1977).
Despite this, most applied researchers are seemingly unperturbed, presumably resting their confidence on the asymptotic properties of the OLS estimator. The purpose of this note is to remind researchers that the size of the bias can be very large in reasonably-sized samples when the dependent variable is substantially autocorrelated. This has important implications for the estimation of dynamic models. I illustrate the size of the bias in three settings.
The first setting is the simple linear regression model represented by Equation (1), where researchers are interested in estimating ρ. The second setting is an auto-regressive, distributed lag (ARDL) model of the form
(2)  y_t = α + ρ y_{t-1} + β x_t + ε_t,
where researchers use OLS to estimate the value of the long-run propensity (LRP) of x, LRP of x = β/(1 − ρ). The third setting is a dynamic panel data (DPD) model of the form
(3)  y_{i,t} = α + ρ y_{i,t-1} + β x_{i,t} + u_{i,t},  u_{i,t} = μ_i + ε_{i,t},
where the error term includes a fixed effect, μ_i, and researchers are again interested in estimating the LRP of x.
The paper proceeds as follows. Sections II to IV respectively investigate Hurwicz
bias in each of the settings above. Section V concludes.
For the general ARDL model with p autoregressive lags, the long-run propensity is
LRP = β / (1 − Σ_{j=1}^{p} ρ_j).
The LRP defined above, despite its conceptual appeal, has serious statistical pitfalls. Because OLS estimation of the AR model is biased in finite samples, the estimated LRP is potentially biased as well. Even worse, the LRP is a nonlinear function of the AR coefficients, and a small bias in the denominator can be greatly magnified, especially when the sum of the AR coefficients is close to the unit circle. For illustrative purposes, suppose p = 1 (the Koyck model) and β = 1. Marriott and Pope (1954) provide the first-order bias of the OLS estimator of the AR(1) coefficient, −(1 + 3ρ)/T. With ρ = 0.95 and T = 100, the first-order bias of the AR(1) coefficient is −0.0385, or about 4 percent of the true value. However, this translates into a 43.5 percent bias in the LRP (the LRP is 11.3 rather than 20)! The issue of bias can be even more pronounced when p > 1. As pointed out by Patterson (2000), small biases in the individual AR coefficient estimates cumulate, often in a reinforcing rather than offsetting manner, in the estimate of the sum Σ ρ_j.
On top of the bias issue, estimation is complicated by the fact that ratio distributions are often heavy-tailed, and it can be difficult to work with such distributions and develop an associated statistical test. The OLS estimate of the LRP is the ratio of two correlated, approximately normally distributed estimators. When the numerator and denominator are independent normal variables with mean zero, the ratio follows a Cauchy distribution. In general, the distribution of the ratio is heavy-tailed and has no moments. The shape of its density can be unimodal, bimodal, symmetric, or asymmetric, depending heavily on the value of the coefficient of the denominator variable (Hinkley, 1969; Diaz-Frances and Rubio, 2013).
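The arithmetic of the Koyck-model illustration above can be verified directly (a few lines of Python; the first-order bias formula −(1 + 3ρ)/T is the Marriott-Pope result cited in the text):

```python
rho, T, beta = 0.95, 100, 1.0

bias = -(1 + 3 * rho) / T             # first-order bias: -0.0385
rho_hat = rho + bias                  # 0.9115

lrp_true = beta / (1 - rho)           # 20.0
lrp_hat = beta / (1 - rho_hat)        # about 11.3
pct_bias = 100 * (lrp_hat - lrp_true) / lrp_true   # about -43.5 percent

print(round(bias, 4), lrp_true, round(lrp_hat, 1), round(pct_bias, 1))
```

A 4 percent bias in ρ̂ becomes a 43.5 percent bias in the LRP because the denominator 1 − ρ̂ is small when ρ is near one.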
II. HURWICZ BIAS IN A SINGLE-EQUATION, SIMPLE LINEAR REGRESSION SETTING
The first part of my analysis focuses on the model of Equation (1), t = 1,2,...,T, with ε_t ~ N(0,1). The error terms are generated independently of the lagged dependent variable values, y_{t-1}. T takes values between 15 and 1000 (T = 15, 30, 50, 100, and 1000), and ρ takes values between 0.10 and 0.99 (ρ = 0.10, 0.50, 0.90, 0.95, and 0.99). For each simulated sample, OLS is used to estimate the value of ρ. 10,000 replications were run for each of the (T, ρ) combinations.
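In outline, the experiment looks as follows (a numpy sketch rather than the authors' Stata code, with a smaller replication count for speed; the burn-in period is an implementation choice not specified in the text):

```python
import numpy as np

rng = np.random.default_rng(123)

def simulate(rho, T, burn=100):
    """AR(1) series with standard normal errors, after a burn-in period."""
    y = np.zeros(T + burn)
    e = rng.standard_normal(T + burn)
    for t in range(1, T + burn):
        y[t] = rho * y[t - 1] + e[t]
    return y[burn:]

def ols_rho(y):
    """OLS estimate of rho in y_t = alpha + rho*y_{t-1} + e_t."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return beta[1]

rho, T, nrep = 0.90, 50, 2000
est = np.array([ols_rho(simulate(rho, T)) for _ in range(nrep)])
print("true rho:", rho, "| mean OLS estimate:", round(est.mean(), 3))
```

The mean estimate sits noticeably below the true value of 0.90, in line with the Hurwicz bias discussed above.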
The results of the Monte Carlo experiments are reported in TABLE 1.1 The top panel of the table reports the average estimated value of ρ across the 10,000 replications. The bottom panel reports Type I error rates, where the values are the rejection rates of the null hypothesis H0: ρ = ρ0, where ρ0 is the true value of ρ. For both panels, the columns represent the different sample sizes, and the rows represent the different true values of ρ.
For example, when ρ = 0.10 and T = 15, the average estimated value of ρ is 0.014. When T = 1000, the average estimated value of ρ increases to 0.098, illustrating the consistency of the OLS estimator. Likewise, when ρ = 0.10 and T = 15, the rejection rate of the null hypothesis H0: ρ = 0.10 is 0.041 (significance level is 5%), rising to 0.054 when T = 1000.
Note that the size of the bias is substantial when ρ ≥ 0.90, even for sample sizes of T = 50. For example, when ρ = 0.90, the average estimated value of ρ is 0.819, a bias of −0.081, or −9.0 percent.2 Similarly, the Type I error rates are substantially larger than 0.05. When ρ = 0.90 and T = 50, the rejection rate of the null hypothesis H0: ρ = 0.90 is 0.103 (significance level is 5%), rising to 0.227 when ρ = 0.99. This
1 Stata Version 13 was used for all the Monte Carlo experiments. Copies of the Stata .do files used to produce all the results in this paper are presented in the Appendix. Readers are encouraged to use the programs to confirm my results and to study alternative experimental parameters and settings.
2 Note that the percentage bias is often larger for smaller values of ρ.
demonstrates that Hurwicz bias can be practically important in realistic empirical
environments in terms of both bias and inference.
III. HURWICZ BIAS IN A SINGLE-EQUATION, AUTO-REGRESSIVE, DISTRIBUTED LAG (ARDL) SETTING

The next set of Monte Carlo experiments focuses on the implications of Hurwicz bias for estimating the long-run propensity (LRP) of an explanatory variable. I study the ARDL model

y_t = β0 + βx·x_t + βy·y_{t-1} + u_t,  t = 1,2,...,T,

where βy takes values 0.10, 0.50, 0.90, 0.95, and 0.99 (same as above), βx = 1, u_t ~ N(0,1), and the simulated x_t values are again generated independently of the u_t values. The corresponding true LRP of x values, βx/(1-βy), are 1.11, 2, 10, 20, and 100, respectively. The estimated LRP of x values are calculated by β̂x/(1-β̂y), where β̂x and β̂y are the OLS estimates.
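The Appendix programs estimate the ARDL model by OLS and form the LRP as the ratio of estimated coefficients. As a rough illustration, here is a Python sketch (my re-implementation of the design described above, not the paper's Stata code):

```python
import numpy as np

def median_lrp(beta_y, T, beta0=1.0, beta_x=1.0, reps=400, seed=13):
    """Median of the estimated LRP = bx_hat/(1 - by_hat) from OLS on the ARDL
    model y_t = beta0 + beta_x*x_t + beta_y*y_{t-1} + u_t (with burn-in)."""
    rng = np.random.default_rng(seed)
    burn, lrps = 100, []
    for _ in range(reps):
        x = rng.standard_normal(burn + T)
        u = rng.standard_normal(burn + T)
        y = np.empty(burn + T)
        y[0] = beta0 / (1.0 - beta_y)      # start at the long-run equilibrium
        for t in range(1, burn + T):
            y[t] = beta0 + beta_x * x[t] + beta_y * y[t - 1] + u[t]
        y, xs = y[-T:], x[-T:]             # keep the last T observations
        X = np.column_stack([np.ones(T - 1), xs[1:], y[:-1]])
        _, bx, by = np.linalg.lstsq(X, y[1:], rcond=None)[0]
        lrps.append(bx / (1.0 - by))
    return float(np.median(lrps))
```

For beta_y = 0.90 and T = 50, the median estimated LRP falls well short of the true value of 10, mirroring the distortions reported in TABLE 2.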
TABLE 2 reports the results of these Monte Carlo experiments. The table is organized similarly to TABLE 1, with some minor differences. I now report three panels. The top panel reports the median (not average) value of the estimated LRP of x values across the 10,000 replications.³ The middle panel reports the 90 percent empirical sample range, calculated by taking the 5th and 95th percentile values of the 10,000 estimated LRP values. The bottom panel reports Type I error rates associated with the null hypothesis, H0: LRP of x = true value. While the range of the true βy values is the same as above, I only report simulation results for sample sizes of T = 10, 50, and 1000.
As in TABLE 1, there is little nominal distortion associated with small values of βy when T = 50. For example, when βy = 0.10, the median estimated value of the LRP of x is 1.1, virtually the same as the true value. The associated Type I error rate is 0.064. The 90 percent empirical sample range extends from 0.8 to 1.5, encompassing the true value.
However, substantial distortions arise for large values of βy when T = 50. When βy = 0.90, the median estimated LRP of x value is 7.5, 25 percent less than its true value of 10. The 90 percent empirical sample range, while encompassing the true value, is quite large, extending from 3.4 to 19.4. And the associated Type I error rate is 0.267. All of these values become substantially worse for larger values of βy and smaller values of T.

³ I report the median rather than the mean, because estimates of βy very close to 1 can cause the estimated LRP to be extremely large, making the sample mean uninformative.
IV. HURWICZ BIAS IN A SINGLE-EQUATION, DYNAMIC PANEL DATA (DPD) SETTING

The last of my analyses focuses on the implications of Hurwicz bias for estimating the long-run propensity (LRP) of an explanatory variable in dynamic panel data (DPD) models. The corresponding Monte Carlo programs generate simulated panel data having the DGP

y_it = β0 + βx·x_it + βy·y_i,t-1 + ε_it,  i = 1,2,...,N, t = 1,2,...,T,

where βy again takes values 0.10, 0.50, 0.90, 0.95, and 0.99. ε_it is an error term that includes a fixed effect, ε_it = a_i + u_it, and each of the components of ε_it is independently and identically standard normally distributed, a_i ~ N(0,1), u_it ~ N(0,1). The fixed effects are uncorrelated with x_it, so there is no endogeneity associated with the fixed effects. I begin with panel data having dimensions N = 50, T = 10.
Given the values of βy, the true LRP of x values are, again, 1.11, 2, 10, 20, and 100, respectively. The LRP is estimated using three dynamic panel data estimators: Anderson-Hsiao (AH), difference GMM (DGMM), and system GMM (SGMM). The results are reported in TABLE 3 and again consist of three panels: (i) the median estimated LRP value, (ii) the 90% empirical sample range of LRP values, and (iii) the Type I error rate. The columns now report the results for each of the three DPD estimators. Note that the panel sample sizes are the same for all the experiments reported in TABLE 3 (N = 50, T = 10).
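The Anderson-Hsiao estimator first-differences the model to remove the fixed effect and instruments the (endogenous) lagged difference with a deeper lag of the level. The following Python sketch is my own simplified, just-identified version using y_{i,t-2} as the single instrument (the Appendix's xtivreg specification uses lags 2 and 3); it is an illustration of the idea, not the paper's code:

```python
import numpy as np

def make_panel(beta_y, N=50, T=10, beta0=1.0, beta_x=1.0, seed=13):
    """Panel DGP: y_it = beta0 + beta_x*x_it + beta_y*y_i,t-1 + a_i + u_it,
    with fixed effects a_i uncorrelated with x_it."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(N)                 # fixed effects
    x = rng.standard_normal((N, T))
    u = rng.standard_normal((N, T))
    y = np.empty((N, T))
    y[:, 0] = beta0 / (1.0 - beta_y)           # long-run equilibrium start
    for t in range(1, T):
        y[:, t] = beta0 + beta_x * x[:, t] + beta_y * y[:, t - 1] + a + u[:, t]
    return y, x

def anderson_hsiao(y, x):
    """First-difference the model, then instrument dy_i,t-1 with the level
    y_i,t-2; returns just-identified IV estimates (by_hat, bx_hat)."""
    dy  = (y[:, 3:] - y[:, 2:-1]).ravel()      # dy_it
    dy1 = (y[:, 2:-1] - y[:, 1:-2]).ravel()    # dy_i,t-1 (endogenous)
    dx  = (x[:, 3:] - x[:, 2:-1]).ravel()      # dx_it (exogenous)
    z   = y[:, 1:-2].ravel()                   # instrument: y_i,t-2
    Z = np.column_stack([z, dx])               # instrument matrix
    X = np.column_stack([dy1, dx])             # regressor matrix
    by_hat, bx_hat = np.linalg.solve(Z.T @ X, Z.T @ dy)
    return by_hat, bx_hat
```

With a large cross-section the estimator recovers the true coefficients; the LRP estimate is then bx_hat/(1 - by_hat).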
The consequences of Hurwicz bias are once again substantial for large values of βy. For example, when βy = 0.90, the median LRP estimates for the AH, DGMM, and SGMM estimators are 0.6, 5.1, and 25.6, respectively (compared to a true value of 10). The 90% empirical sample ranges are very wide, and in none of the cases do they include the true value. The Type I error rates are 0.709, 0.821, and 0.001, rendering them worthless for hypothesis testing. Larger values of βy result in similarly distorted values.
TABLE 4 reports a similar analysis for panel data models having dimensions N = 140, T = 5. I chose these dimensions because they approximate the dimensions of the "abdata" data set bundled with Stata that is used to illustrate the functionality of the xtabond (difference GMM) and xtdpdsys (system GMM) commands in Stata.⁴ The results are very similar to those from TABLE 3. Once again, substantial biases and size distortions are apparent for βy ≥ 0.90.
While much attention has been paid to Nickell bias in DPD models (Nickell, 1981), the results above suggest that Hurwicz bias is a much more serious problem for the estimation of LRP values in DPD models.
V. BIAS REDUCTION UNHELPFUL FOR ESTIMATING LRP

We pointed out two statistical issues with estimating the LRP in the Introduction, namely, Hurwicz bias and heavy-tailed ratio distributions. Here we illustrate that controlling for Hurwicz bias does not necessarily improve LRP estimation.
We use a jackknife technique to reduce the leading-order bias of the OLS estimator. Suppose that θ̂ is a maximum-likelihood-based estimator of θ. Partition the observations along the time dimension, that is, split t = 1,...,T into t = 1,...,T/2 and t = T/2+1,...,T, and let θ̄ = 0.5·(θ̂1 + θ̂2), where θ̂1 and θ̂2 are the estimates from the two half-samples. Compared with the original biased estimator θ̂, the jackknifed estimator 2θ̂ - θ̄ is bias-reduced. For the ARDL model y_t = β0 + βx·x_t + βy·y_{t-1} + u_t, t = 1,2,...,T, we apply the jackknifing procedure to reduce the bias in β̂y. Note that β̂x
⁴ The abdata data set used for the xtabond procedure has 140 groups, with a minimum of 4 observations per group and a maximum of 6. The corresponding minimum and maximum values for the data used with the xtdpdsys procedure are 5 and 7, with the number of groups again equalling 140.
is not affected by finite-sample bias. The results are reported in TABLE 5. The top panel of the table reports the average estimated value of βy across the 10,000 replications, for both the OLS estimator and the bias-reduced estimator. As shown, the jackknifing procedure greatly reduces the finite-sample bias in the OLS estimator. For example, when βy = 0.99 and T = 10, the average estimated value of βy is 0.76. The average value improves to 0.93 after jackknifing. The middle panel reports the median value of the estimated LRP across the 10,000 replications based on the bias-reduced estimates. And the bottom panel reports the 90 percent empirical sample range of the estimated LRP values. We can see that a better estimate of βy does not necessarily lead to a better estimate of the LRP ratio.
TABLE 6 reports results for the DPD model. In this case, the jackknifing procedure fails to reduce the finite-sample bias.
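The half-sample jackknife used in this section can be sketched as follows. This is a Python illustration of the standard 2θ̂ - θ̄ correction (which removes the leading O(1/T) bias term), written by me under the assumptions stated above; it is not the paper's Stata code:

```python
import numpy as np

def ols_beta_y(y, x):
    """OLS estimate of beta_y in y_t = b0 + bx*x_t + by*y_{t-1} + u_t."""
    X = np.column_stack([np.ones(len(y) - 1), x[1:], y[:-1]])
    return np.linalg.lstsq(X, y[1:], rcond=None)[0][2]

def half_sample_jackknife(estimate, y, x):
    """Bias-reduced estimate 2*theta_full - 0.5*(theta_half1 + theta_half2),
    where the halves are consecutive sub-samples along the time dimension."""
    T = len(y)
    theta_full = estimate(y, x)
    theta_1 = estimate(y[:T // 2], x[:T // 2])  # first half-sample
    theta_2 = estimate(y[T // 2:], x[T // 2:])  # second half-sample
    return 2.0 * theta_full - 0.5 * (theta_1 + theta_2)
```

Averaged over many simulated samples, the jackknifed estimate of βy sits much closer to the truth than the raw OLS estimate, even though, as TABLE 5 shows, the implied LRP need not improve.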
VI. CONCLUSION

It is well known that OLS estimation of a single-equation time series model having the specification y_t = β0 + βy·y_{t-1} + u_t, u_t ~ N(0, σ²), produces biased estimates of βy. This bias is known as "Hurwicz bias" and was identified more than sixty years ago (Hurwicz, 1950). To the best of my knowledge, the implications of this bias for estimation of auto-regressive, distributed lag (ARDL) and dynamic panel data (DPD) models have not been previously noted.
This paper uses Monte Carlo experiments to demonstrate that the practical consequences of Hurwicz bias can be substantial in finite samples when the dependent variable is characterized by a significant degree of auto-correlation. Estimates of the long-run propensity (LRP) of explanatory variables calculated from ARDL and DPD models can be biased by 50 or even 100 percent in realistic data settings. In some cases, 90 percent empirical sample ranges for LRP estimates do not include the true value. And Type I error rates are sufficiently distorted to render them virtually useless for hypothesis testing.
Again to the best of my knowledge, there is no econometric procedure that can
identify and/or correct for Hurwicz bias. One potential indicator that there is a problem is the
size of the estimated coefficient on the lagged dependent variable. If this coefficient is close
to one, then empirical estimates of long-run impacts of explanatory variables are likely to be
severely distorted. Of course, this is complicated by the fact that Hurwicz bias causes the
coefficient of the lagged dependent variable to be underestimated. However, programs like
the ones attached to the end of this study can be used to simulate DGPs that produce
estimates similar to those observed by the researcher. The DGP parameters uncovered in this
manner can be used to assess the severity of the problem.
All of the discussion above assumes a relatively uncomplicated data environment. In particular, the presence of serial correlation in the error term is likely to compound the problems identified here.⁵ Given these difficulties, researchers may want to consider specifications that replace the lagged dependent variable with lagged values of the explanatory variable(s). The resulting costs associated with diminished degrees of freedom may be small compared to the distortions from including a lagged dependent variable.
In conclusion, it is hoped that the results of this study will encourage researchers to be even more cautious in their use and interpretation of models with lagged dependent variables.

⁵ The following papers provide useful discussions of the problems associated with serially-correlated error terms in dynamic models: Griliches (1961), Keele and Kelly (2006), and Beck and Katz (2011).
REFERENCES

Beck, N. and Katz, J.N. 2011. "Modelling Dynamics in Time-Series-Cross-Sectional Political Economy Data." Annual Review of Political Science, Vol. 14, pp. 331-352.
Diaz-Frances, E. and Rubio, F.J. 2013. "On the Existence of a Normal Approximation to the Distribution of the Ratio of Two Independent Normal Random Variables." Statistical Papers, Vol. 54, No. 2, pp. 309-323.
Griliches, Z. 1961. "A Note on Serial Correlation Bias in Estimates of Distributed Lags." Econometrica, Vol. 29, pp. 65-73.
Hinkley, D.V. 1969. "On the Ratio of Two Correlated Normal Random Variables." Biometrika, Vol. 56, No. 3, pp. 635-639.
Hurwicz, L. 1950. "Least Squares Bias in Time Series." In Statistical Inference in Dynamic Economic Models, ed. by T.C. Koopmans. New York: Wiley.
Keele, L. and Kelly, N.J. 2006. "Dynamic Models for Dynamic Theories: The Ins and Outs of Lagged Dependent Variables." Political Analysis, Vol. 14, pp. 186-205.
Marriott, F.H.C. and Pope, J.A. 1954. "Bias in the Estimation of Autocorrelations." Biometrika, Vol. 41, pp. 390-402.
Nickell, S. 1981. "Biases in Dynamic Models with Fixed Effects." Econometrica, Vol. 49, No. 6, pp. 1417-1426.
Patterson, K. 2000. "Finite Sample Bias of the Least Squares Estimators in an AR(p) Model: Estimation, Inference, Simulation and Examples." Applied Economics, Vol. 32, No. 15, pp. 1993-2005.
Phillips, P.C.B. 1977. "Approximations to Some Finite Sample Distributions Associated with a First-Order Stochastic Difference Equation." Econometrica, Vol. 45, No. 2, pp. 463-485.
TABLE 1
Demonstration of Hurwicz Bias in a Single Equation Time Series Model:
y_t = β0 + βy·y_{t-1} + u_t

True Value of βy    SAMPLE SIZE:
    T=15    T=30    T=50    T=100    T=1000

Average Estimated Value of βy
0.10 0.014 0.057 0.074 0.086 0.098
0.50 0.339 0.418 0.449 0.475 0.498
0.90 0.640 0.764 0.819 0.860 0.896
0.95 0.668 0.802 0.862 0.906 0.946
0.99 0.698 0.830 0.893 0.941 0.986
Type I Error Rate (H0: βy = true value)
0.10 0.041 0.046 0.051 0.047 0.054
0.50 0.052 0.054 0.054 0.051 0.053
0.90 0.131 0.129 0.103 0.085 0.055
0.95 0.178 0.165 0.144 0.110 0.056
0.99 0.221 0.238 0.227 0.203 0.089
SOURCE: All Monte Carlo experiments were run using Version 13 of Stata. 10,000 replications were run for each experiment. The experiments are described in more detail in Section II of the text. The associated simulation programs are given in “TABLE1A” and “TABLE1B” in the Appendix.
TABLE 2
Demonstration of Hurwicz Bias in an ARDL Model:
y_t = β0 + βx·x_t + βy·y_{t-1} + u_t;  LRP of x = βx/(1-βy)

True Value of βy / True Value of LRP of x    SAMPLE SIZE:
    T=10    T=50    T=1000

Median Value of Estimated LRP of x
0.10 / 1.11    1.1    1.1    1.1
0.50 / 2       1.6    1.9    2.0
0.90 / 10      2.8    7.5    9.9
0.95 / 20      2.8    11.4    19.5
0.99 / 100     2.7    16.0    85.9
90 Percent Empirical Sample Range for Estimated LRP of x
0.10 / 1.11    0.3 — 2.3    0.8 — 1.5    1.0 — 1.2
0.50 / 2       0.4 — 5.0    1.3 — 2.8    1.8 — 2.2
0.90 / 10      -13.2 — 21.9    3.4 — 19.4    8.3 — 11.7
0.95 / 20      -18.1 — 25.2    3.9 — 45.7    15.3 — 24.8
0.99 / 100     -24.6 — 28.8    -72.1 — 125.4    51.2 — 152.9
Type I Error Rate (H0: LRP of x = true value)
0.10 / 1.11    0.104    0.064    0.050
0.50 / 2       0.185    0.081    0.051
0.90 / 10      0.530    0.267    0.069
0.95 / 20      0.661    0.393    0.084
0.99 / 100     0.844    0.681    0.197
SOURCE: All Monte Carlo experiments were run using Version 13 of Stata. 10,000 replications were run for each experiment. The experiments are described in more detail in Section III of the text. The associated simulation programs are given in “TABLE2A” and “TABLE2B” in the Appendix.
TABLE 3
Demonstration of Hurwicz Bias in a DPD Model:
y_it = β0 + βx·x_it + βy·y_i,t-1 + ε_it;  LRP of x = βx/(1-βy);  N=50, T=10

True Value of βy / True Value of LRP of x    ESTIMATOR:
    Anderson-Hsiao    Difference GMM    System GMM
Median Value of Estimated LRP of x
0.10 / 1.11 1.1 1.1 1.1
0.50 / 2 1.9 1.8 2.0
0.90 / 10 0.6 5.1 25.6
0.95 / 20 2.5 10.8 ‐56.1
0.99 / 100 8.3 34.7 ‐22.2
Average 90 Percent Empirical Sample Range for Estimated LRP of x
0.10 / 1.11    0.9 — 1.3    1.0 — 1.2    1.0 — 1.2
0.50 / 2       1.3 — 3.5    1.6 — 2.1    1.7 — 2.3
0.90 / 10      -5.9 — 7.0    3.3 — 8.3    11.4 — 99.7
0.95 / 20      -27.3 — 27.8    6.6 — 21.1    -332.5 — 225.6
0.99 / 100     -99.4 — 105.0    13.1 — 170.0    -30.0 — -17.9
Average Type I Error Rate (H0: LRP of x = true value)
0.10 / 1.11 0.053 0.083 0.088
0.50 / 2 0.077 0.194 0.110
0.90 / 10 0.709 0.821 0.001
0.95 / 20 0.514 0.642 0.499
0.99 / 100 0.524 0.585 1.000
SOURCE: All Monte Carlo experiments were run using Version 13 of Stata. 10,000 replications were run for each experiment. The experiments are described in more detail in Section IV of the text. The associated simulation programs are given in “TABLE3A” and “TABLE3B” in the Appendix.
TABLE 4
Demonstration of Hurwicz Bias in a DPD Model:
y_it = β0 + βx·x_it + βy·y_i,t-1 + ε_it;  LRP of x = βx/(1-βy);  N=140, T=5

True Value of βy / True Value of LRP of x    ESTIMATOR:
    Anderson-Hsiao    Difference GMM    System GMM
Median Value of Estimated LRP of x
0.10 / 1.11 1.1 1.1 1.1
0.50 / 2 2.0 1.8 1.9
0.90 / 10 0.8 3.4 25.2
0.95 / 20 1.8 7.7 ‐57.2
0.99 / 100 8.3 25.6 ‐22.1
Average 90 Percent Empirical Sample Range for Estimated LRP of x
0.10 / 1.11    0.9 — 1.3    1.0 — 1.2    1.0 — 1.2
0.50 / 2       1.4 — 3.3    1.6 — 2.2    1.7 — 2.3
0.90 / 10      -8.7 — 9.6    2.1 — 6.2    10.8 — 105.2
0.95 / 20      -22.0 — 21.5    4.5 — 19.1    -263.0 — -21.4
0.99 / 100     -103.1 — 105.6    -86.0 — 148.9    -27.2 — -18.6
Average Type I Error Rate (H0: LRP of x = true value)
0.10 / 1.11 0.052 0.083 0.095
0.50 / 2 0.073 0.191 0.143
0.90 / 10 0.640 0.921 0.001
0.95 / 20 0.583 0.746 0.618
0.99 / 100 0.540 0.618 1.000
SOURCE: All Monte Carlo experiments were run using Version 13 of Stata. 10,000 replications were run for each experiment. The experiments are described in more detail in Section IV of the text. The associated simulation programs are given in “TABLE4A” and “TABLE4B” in the Appendix.
TABLE 5
Bias-Reduced Estimation in an ARDL Model:
y_t = β0 + βx·x_t + βy·y_{t-1} + u_t;  LRP of x = βx/(1-βy)

True Value of βy / True Value of LRP of x    SAMPLE SIZE:
    T=10    T=50    T=1000

OLS Estimate / Jackknifing Estimate of βy
0.10    0.33 / 0.95    0.88 / 0.10    0.10 / 0.10
0.50    0.37 / 0.48    0.48 / 0.50    0.50 / 0.50
0.90    0.70 / 0.85    0.86 / 0.90    0.90 / 0.90
0.95    0.73 / 0.89    0.90 / 0.95    0.95 / 0.95
0.99    0.76 / 0.93    0.94 / 0.99    0.99 / 0.99
Median Value of Estimated LRP of x – Bias-corrected
0.10 / 1.11    1.09    1.11    1.11
0.50 / 2       1.5     2.0     2.0
0.90 / 10      1.1     8.1     10.1
0.95 / 20      0.9     9.1     20.2
0.99 / 100     0.7     6.7     100.1
90 Percent Empirical Sample Range for Estimated LRP of x – Bias-corrected
0.10 / 1.11    0.2 — 3.3    0.8 — 1.5    1.0 — 1.2
0.50 / 2       -5.5 — 8.5    1.3 — 3.1    1.8 — 2.2
0.90 / 10      -16.8 — 17.1    -34.2 — 48.8    8.0 — 12.0
0.95 / 20      -17.3 — 19.7    -70.5 — 85.0    15.6 — 26.6
0.99 / 100     -17.0 — 18.7    -99.6 — 101.7    47.1 — 335.3
TABLE 6
Bias-Reduced Estimation in a DPD Model:
y_it = β0 + βx·x_it + βy·y_i,t-1 + ε_it;  LRP of x = βx/(1-βy);  N=50, T=10

True Value of βy / True Value of LRP of x    ESTIMATOR:
    Anderson-Hsiao    Difference GMM    System GMM

OLS Estimate / Jackknifing Estimate of βy
0.10 0.10 / 0.10 0.79 / 0.95 0.96 / 0.10
0.50 0.49 / 0.50 0.46 / 0.50 0.79 / 0.51
0.90 0.36 / 0.34 0.81 / 0.90 0.96 / 0.96
0.95 0.86 / 0.98 0.91 / 0.97 1.01 / 1.00
0.99 0.98 / 0.99 0.97 / 1.00 1.05 / 1.04
Median Value of Estimated LRP of x – Bias-corrected
0.10 / 1.11 1.11 1.10 1.12
0.50 / 2 2.0 2.0 2.0
0.90 / 10 0.3 9.1 25.7
0.95 / 20 1.1 20.1 ‐64.7
0.99 / 100 6.3 ‐25.7 ‐25.9
90 Percent Empirical Sample Range for Estimated LRP of x – Bias-corrected
0.10 / 1.11    0.9 — 1.4    1.0 — 1.2    1.0 — 1.3
0.50 / 2       1.3 — 3.9    1.6 — 2.4    1.7 — 2.4
0.90 / 10      -4.2 — 4.5    -14.9 — 46.6    9.4 — 113.7
0.95 / 20      -20.1 — 23.5    -157.5 — 179.6    -502.9 — 419.7
0.99 / 100     -91.0 — 87.3    -375.3 — 348.7    -37.8 — -20.0
APPENDIX: Stata .do files that accompany this paper
NOTE #1: These .do files are included to make it possible for readers to (i) confirm this study's findings, and (ii) investigate alternative DGP specifications.

NOTE #2: The output for each table is produced by a two-part program. For Table X, the TABLEXA program must be run first, followed by the TABLEXB program. The latter program calls the former and produces the results reported in the table. The computing time necessary to produce the results for a table is given at the beginning of the respective B program.
“TABLE1A” program
program drop _all
program define hurwicz, rclass
    version 13
    syntax, beta0(real) betay(real) betau(real) smallobs(integer) bigobs(integer)
    // Remove existing variables
    drop _all
    // Create the data
    set obs `bigobs'
    gen t = _n
    tsset t
    // We initialize y at its LR equilibrium value
    gen y = `beta0'/(1-`betay')
    replace y = `beta0' + `betay'*L.y + `betau'*rnormal() in 2/l
    regress y L.y in -`smallobs'/l
    return scalar bhaty = _b[L.y]
    test _b[L.y] = `betay'
    return scalar pvalue = r(p)
end
“TABLE1B” program
// This program takes approximately 30 mins to run on my laptop.
// The programs must be run in the following order: (i) hurwicz1A, then (ii) hurwicz1B.
// The program "etime" is a user-written .do file that can be obtained online
etime, start
drop _all
clear
graph drop _all
set more off
set seed 13
matrix meanbhaty = J(5,5,0)
matrix meanRR = J(5,5,0)
local i = 1
foreach betay in 0.1 0.5 0.9 0.95 0.99 {
    local j = 1
    foreach smallobs in 15 30 50 100 1000 {
        simulate bhaty = r(bhaty) pvalue = r(pvalue), ///
            reps(10000): hurwicz, betay(`betay') smallobs(`smallobs') bigobs(1100) ///
            beta0(1) betau(1)
        summ bhaty, meanonly
        matrix meanbhaty[`i', `j'] = r(mean)
        generate RejectRate = 0
        replace RejectRate = 1 if pvalue < 0.05
        summ RejectRate, meanonly
        matrix meanRR[`i', `j'] = r(mean)
        local ++j
    }
    local ++i
}
matrix colnames meanbhaty = T15 T30 T50 T100 T1000
matrix rownames meanbhaty = B10 B50 B90 B95 B99
matrix colnames meanRR = T15 T30 T50 T100 T1000
matrix rownames meanRR = B10 B50 B90 B95 B99
matrix list meanbhaty
matrix list meanRR
etime
“TABLE2A” program
program drop _all
program define ARDLprog, rclass
    version 13
    syntax, beta0(real) betay(real) betax(real) betau(real) smallobs(integer) bigobs(integer)
    // Remove existing variables
    drop _all
    // Create the data
    set obs `bigobs'
    gen t = _n
    tsset t
    // We initialize y at its LR equilibrium value
    gen x = rnormal()
    gen u = rnormal()
    gen y = `beta0'/(1-`betay')
    replace y = `beta0' + `betax'*x + `betay'*L.y + u in 2/l
    regress y L.y x in -`smallobs'/l
    return scalar LRP = _b[x]/(1-_b[L.y])
    testnl _b[x]/(1-_b[L.y]) = `betax'/(1-`betay')
    return scalar pLRP = r(p)
end
“TABLE2B” program
// This program takes approximately 20 mins to run on my laptop.
// The programs must be run in the following order: (i) hurwicz2A, then (ii) hurwicz2B.
// The program "etime" is a user-written .do file that can be obtained online
etime, start
drop _all
clear
graph drop _all
set more off
set seed 13
matrix medLRP = J(5,3,0)
matrix confidLRP = J(5,6,0)
matrix meanRR = J(5,3,0)
local i = 1
foreach betay in 0.1 0.5 0.9 0.95 0.99 {
    local j = 1
    foreach smallobs in 10 50 1000 {
        simulate LRP = r(LRP) pLRP = r(pLRP), ///
            reps(10000): ARDLprog, betay(`betay') smallobs(`smallobs') bigobs(1100) ///
            beta0(1) betau(1) betax(1)
        summ LRP, detail
        matrix medLRP[`i', `j'] = r(p50)
        matrix confidLRP[`i', (`j'-1)*2+1] = r(p5)
        matrix confidLRP[`i', (`j'-1)*2+2] = r(p95)
        generate RejectRate = 0
        replace RejectRate = 1 if pLRP < 0.05
        summ RejectRate, meanonly
        matrix meanRR[`i', `j'] = r(mean)
        local ++j
    }
    local ++i
}
matrix colnames medLRP = "T10(P50)" "T50(P50)" "T1000(P50)"
matrix rownames medLRP = B10 B50 B90 B95 B99
matrix colnames confidLRP = "T10(P5)" "T10(P95)" "T50(P5)" "T50(P95)" ///
    "T1000(P5)" "T1000(P95)"
matrix rownames confidLRP = B10 B50 B90 B95 B99
matrix colnames meanRR = T10 T50 T1000
matrix rownames meanRR = B10 B50 B90 B95 B99
matrix list medLRP
matrix list confidLRP
matrix list meanRR
etime
“TABLE3A” program
program drop _all
program define DPDprog, rclass
    version 13
    syntax, beta0(real) betax(real) betay(real) numN(integer) numT(integer) numNT(integer) ///
        beffect(real)
    drop _all
    set obs `numN'
    gen id = _n
    // "ai" is the part of the error term that doesn't change over time for a given unit
    gen ai = rnormal()
    expand `numT'
    bysort id: gen t = _n
    xtset id t
    gen x = (`beffect'*ai + rnormal())/sqrt(1+`beffect'^2)
    // "uit" is the part of the error term that varies across time
    gen uit = rnormal()
    // The total error term is the sum of "ai" and "uit"
    gen error = ai + uit
    gen y = `beta0'/(1-`betay')
    replace y = `beta0' + `betax'*x + `betay'*L.y + error if t > 1
    // This section calculates AH estimates
    xtivreg y x (L.y = L(2/3).y) if t > 10, fd
    return scalar AHLRP = _b[D.x]/(1-_b[LD.y])
    testnl _b[D.x]/(1-_b[LD.y]) = `betax'/(1-`betay')
    return scalar pAHLRP = r(p)
    // This section calculates Difference GMM
    xtabond y x if t > 10
    return scalar DGMMLRP = _b[x]/(1-_b[L.y])
    testnl _b[x]/(1-_b[L.y]) = `betax'/(1-`betay')
    return scalar pDGMMLRP = r(p)
    // This section calculates System GMM
    xtdpdsys y x if t > 10
    return scalar SGMMLRP = _b[x]/(1-_b[L.y])
    testnl _b[x]/(1-_b[L.y]) = `betax'/(1-`betay')
    return scalar pSGMMLRP = r(p)
end
“TABLE3B” program
// This program takes approximately 70 mins to run on my laptop.
// The programs must be run in the following order: (i) hurwicz3A, then (ii) hurwicz3B.
// The program "etime" is a user-written .do file that can be obtained online
etime, start
drop _all
clear
set more off
set seed 13
matrix medLRP = J(5,3,0)
matrix confidLRP = J(5,6,0)
matrix meanRR = J(5,3,0)
// The local commands below set all the parameters for the experiments.
local beffect = 0
// "beffect" controls the degree of correlation between x and the fixed effect.
// When beffect = 0, there is no omitted variable bias. As beffect increases,
// omitted variable bias increases.
local numN = 50  // This sets the number of cross-sectional units
local numT = 20  // This sets the number of time observations per unit
// Note that the estimations will only use the last 10 of these observations
local numNT = `numN'*`numT'  // This sets the total number of observations
local beta0 = 1  // This sets the intercept term
local betax = 1  // This sets the slope coefficient for x
local reps = 10000
local i = 1
foreach betay in 0.1 0.5 0.9 0.95 0.99 {
    simulate AHLRP = r(AHLRP) pAHLRP = r(pAHLRP) DGMMLRP = r(DGMMLRP) ///
        pDGMMLRP = r(pDGMMLRP) SGMMLRP = r(SGMMLRP) pSGMMLRP = r(pSGMMLRP), ///
        reps(`reps'): DPDprog, betay(`betay') betax(`betax') beta0(`beta0') ///
        numN(`numN') numT(`numT') numNT(`numNT') beffect(`beffect')
    summ AHLRP, detail
    matrix medLRP[`i',1] = r(p50)
    matrix confidLRP[`i',1] = r(p5)
    matrix confidLRP[`i',2] = r(p95)
    generate RRAHLRP = 0
    replace RRAHLRP = 1 if pAHLRP < 0.05
    summ RRAHLRP, meanonly
    matrix meanRR[`i', 1] = r(mean)
    summ DGMMLRP, detail
    matrix medLRP[`i',2] = r(p50)
    matrix confidLRP[`i',3] = r(p5)
    matrix confidLRP[`i',4] = r(p95)
    generate RRDGMMLRP = 0
    replace RRDGMMLRP = 1 if pDGMMLRP < 0.05
    summ RRDGMMLRP, meanonly
    matrix meanRR[`i', 2] = r(mean)
    summ SGMMLRP, detail
    matrix medLRP[`i',3] = r(p50)
    matrix confidLRP[`i',5] = r(p5)
    matrix confidLRP[`i',6] = r(p95)
    generate RRSGMMLRP = 0
    replace RRSGMMLRP = 1 if pSGMMLRP < 0.05
    summ RRSGMMLRP, meanonly
    matrix meanRR[`i', 3] = r(mean)
    local ++i
}
// These commands print out the results
matrix colnames medLRP = AHLRP(p50) DGMMLRP(p50) SGMMLRP(p50)
matrix rownames medLRP = B10 B50 B90 B95 B99
matrix colnames confidLRP = AHLRP(p5) AHLRP(p95) DGMMLRP(p5) DGMMLRP(p95) ///
    SGMMLRP(p5) SGMMLRP(p95)
matrix rownames confidLRP = B10 B50 B90 B95 B99
matrix colnames meanRR = AHLRP DGMMLRP SGMMLRP
matrix rownames meanRR = B10 B50 B90 B95 B99
matrix list medLRP
matrix list confidLRP
matrix list meanRR
etime
“TABLE4A” program
program drop _all
program define DPDprog, rclass
    version 13
    syntax, beta0(real) betax(real) betay(real) numN(integer) numT(integer) numNT(integer) ///
        beffect(real)
    drop _all
    set obs `numN'
    gen id = _n
    // "ai" is the part of the error term that doesn't change over time for a given unit
    gen ai = rnormal()
    expand `numT'
    bysort id: gen t = _n
    xtset id t
    gen x = (`beffect'*ai + rnormal())/sqrt(1+`beffect'^2)
    // "uit" is the part of the error term that varies across time
    gen uit = rnormal()
    // The total error term is the sum of "ai" and "uit"
    gen error = ai + uit
    gen y = `beta0'/(1-`betay')
    replace y = `beta0' + `betax'*x + `betay'*L.y + error if t > 1
    // This section calculates AH estimates
    xtivreg y x (L.y = L(2/3).y) if t > 15, fd
    return scalar AHLRP = _b[D.x]/(1-_b[LD.y])
    testnl _b[D.x]/(1-_b[LD.y]) = `betax'/(1-`betay')
    return scalar pAHLRP = r(p)
    // This section calculates Difference GMM
    xtabond y x if t > 15
    return scalar DGMMLRP = _b[x]/(1-_b[L.y])
    testnl _b[x]/(1-_b[L.y]) = `betax'/(1-`betay')
    return scalar pDGMMLRP = r(p)
    // This section calculates System GMM
    xtdpdsys y x if t > 15
    return scalar SGMMLRP = _b[x]/(1-_b[L.y])
    testnl _b[x]/(1-_b[L.y]) = `betax'/(1-`betay')
    return scalar pSGMMLRP = r(p)
end
“TABLE4B” program
// This program takes approximately 100 mins to run on my laptop.
// The programs must be run in the following order: (i) hurwicz4A, then (ii) hurwicz4B.
// The program "etime" is a user-written .do file that can be obtained online
etime, start
drop _all
clear
set more off
set seed 13
matrix medLRP = J(5,3,0)
matrix confidLRP = J(5,6,0)
matrix meanRR = J(5,3,0)
// The local commands below set all the parameters for the experiments.
local beffect = 0
// "beffect" controls the degree of correlation between x and the fixed effect.
// When beffect = 0, there is no omitted variable bias. As beffect increases,
// omitted variable bias increases.
local numN = 140  // This sets the number of cross-sectional units
local numT = 20  // This sets the number of time observations per unit
// Note that the estimations will only use the last 5 of these observations
local numNT = `numN'*`numT'  // This sets the total number of observations
local beta0 = 1  // This sets the intercept term
local betax = 1  // This sets the slope coefficient for x
local reps = 10000
local i = 1
foreach betay in 0.1 0.5 0.9 0.95 0.99 {
    simulate AHLRP = r(AHLRP) pAHLRP = r(pAHLRP) DGMMLRP = r(DGMMLRP) ///
        pDGMMLRP = r(pDGMMLRP) SGMMLRP = r(SGMMLRP) pSGMMLRP = r(pSGMMLRP), ///
        reps(`reps'): DPDprog, betay(`betay') betax(`betax') beta0(`beta0') ///
        numN(`numN') numT(`numT') numNT(`numNT') beffect(`beffect')
    summ AHLRP, detail
    matrix medLRP[`i',1] = r(p50)
    matrix confidLRP[`i',1] = r(p5)
    matrix confidLRP[`i',2] = r(p95)
    generate RRAHLRP = 0
    replace RRAHLRP = 1 if pAHLRP < 0.05
    summ RRAHLRP, meanonly
    matrix meanRR[`i', 1] = r(mean)
    summ DGMMLRP, detail
    matrix medLRP[`i',2] = r(p50)
    matrix confidLRP[`i',3] = r(p5)
    matrix confidLRP[`i',4] = r(p95)
    generate RRDGMMLRP = 0
    replace RRDGMMLRP = 1 if pDGMMLRP < 0.05
    summ RRDGMMLRP, meanonly
    matrix meanRR[`i', 2] = r(mean)
    summ SGMMLRP, detail
    matrix medLRP[`i',3] = r(p50)
    matrix confidLRP[`i',5] = r(p5)
    matrix confidLRP[`i',6] = r(p95)
    generate RRSGMMLRP = 0
    replace RRSGMMLRP = 1 if pSGMMLRP < 0.05
    summ RRSGMMLRP, meanonly
    matrix meanRR[`i', 3] = r(mean)
    local ++i
}
// These commands print out the results
matrix colnames medLRP = AHLRP(p50) DGMMLRP(p50) SGMMLRP(p50)
matrix rownames medLRP = B10 B50 B90 B95 B99
matrix colnames confidLRP = AHLRP(p5) AHLRP(p95) DGMMLRP(p5) DGMMLRP(p95) ///
    SGMMLRP(p5) SGMMLRP(p95)
matrix rownames confidLRP = B10 B50 B90 B95 B99
matrix colnames meanRR = AHLRP DGMMLRP SGMMLRP
matrix rownames meanRR = B10 B50 B90 B95 B99
matrix list medLRP
matrix list confidLRP
matrix list meanRR
etime