GHENT UNIVERSITY
FACULTY OF ECONOMICS AND BUSINESS ADMINISTRATION
ACADEMIC YEAR 2016 – 2017
A Comparison of Dynamic Panel Data Estimators Using Monte Carlo
Simulations and the Firm Growth Model
Master’s Dissertation submitted to obtain the degree of
Master of Science in Business Economics
Bob Mertens
Under the guidance of Peter Beyne
Prof. Philippe Van Cauwenberge
Admission to Loan
The author gives permission to make this master’s dissertation available for consultation and
to copy parts of this master’s dissertation for personal use. In the case of any other use, the
limitations of the copyright have to be respected, in particular with regard to the obligation to
state expressly the source when quoting results from this master’s dissertation.
Bob Mertens, June 2017
A Comparison of Dynamic Panel Data Estimators Using Monte Carlo Simulations
and the Firm Growth Model
by
Bob Mertens
Master’s Dissertation submitted to obtain the academic degree of
Master of Science in Business Economics
Academic year 2016–2017
Promoter: Prof. Philippe VAN CAUWENBERGE
Supervisor: Peter Beyne
Faculty of Economics and Business Administration
Ghent University
Department of Accounting, Corporate Finance and Taxation
Summary
This master’s dissertation compares 4 estimators used to estimate dynamic panel data models:
the Fixed Effects estimator, the Anderson-Hsiao IV estimator, the First-Differenced GMM
estimator and the System GMM estimator. The comparison is based on several Monte Carlo
simulations, implemented in MATLAB. This thesis is organized as follows. Chapter 1 explains
the aim of and the motivation for this thesis. Chapter 2 then discusses the ’Firm Growth
Model’. This dynamic panel data model is used during the Monte Carlo simulations, so the 4
estimators are compared on how well they estimate the parameters of this specific model.
Chapter 3 discusses how the Fixed Effects estimator works, together with its advantages and
shortcomings. After this rather simple estimator, Chapter 4 presents the 3 more complicated
estimators used to estimate dynamic panel data models: the Anderson-Hsiao IV estimator, the
First-Differenced GMM estimator and the System GMM estimator. For these estimators too, the
workings, advantages and shortcomings are briefly discussed. Chapter 5 finally arrives at the
core of this thesis, namely the comparison of the 4 studied estimators by means of Monte Carlo
simulations. The simulation method is discussed together with the results. These results show
that the System GMM estimator clearly performs best when estimating the parameters of the
Firm Growth Model. The System GMM estimator is both more consistent and more efficient than
the 3 other estimators. Finally, Chapter 6 closes this thesis with a conclusion and a final
overview of the results obtained.
Keywords
Dynamic Panel Data Models; Firm Growth Model; Fixed Effects estimator; Anderson-Hsiao IV
estimator; First-Differenced GMM estimator; System GMM estimator; Monte Carlo simulation
Preface
This master’s dissertation concludes my Master in Business Economics at Ghent University.
This education in economics broadened my horizons and taught me the basics of economics and
business administration. This master’s degree, however, is not my first, as I already obtained
a degree in electrical engineering in 2015. Since I like mathematics, this master’s dissertation
can be quite mathematical at some points.
I would like to thank my promoter Prof. Philippe Van Cauwenberge for granting me the
opportunity to write this thesis under his supervision. He introduced me to Peter Beyne,
researcher at the Department of Accounting, Corporate Finance and Taxation, whose guidance led
me to this particular subject. Many thanks to you as well, Peter.
Lastly, my parents and girlfriend deserve special thanks. My parents gave me the chance to
pursue an extra master in economics after my engineering education and my girlfriend always
supported me during this education.
Bob Mertens, June 2017
Contents
List of Abbreviations
List of Figures
List of Tables
1 Introduction
1.1 The Need for Reliable Dynamic Panel Data Estimators
1.2 Outline
2 The Firm Growth Model
3 The Fixed Effects Estimator and Bias
3.1 Presentation of the Fixed Effects Estimator
3.2 The Nickell Bias of the Fixed Effects Estimator
4 Estimation Techniques for Dynamic Panel Data
4.1 The Idea of Instrumentation
4.2 The Anderson-Hsiao Instrument Variable Estimator
4.3 Introduction to the Generalized Method of Moments
4.4 The First-Differenced Generalized Method of Moments Estimator
4.5 The System Generalized Method of Moments Estimator
5 Monte Carlo Simulations
5.1 General Set-Up of Simulations
5.2 Configuration of the Simulations
5.3 MATLAB Implementation
5.4 Simulation Results
5.4.1 Simulation results for β1 + 1 = 0.1
5.4.2 Simulation results for β1 + 1 = 0.5
5.4.3 Simulation results for β1 + 1 = 0.9
5.4.4 Conclusion simulation results
6 Conclusion
A Matlab Implementation of Estimators and Monte Carlo Simulations
A.1 FE final.m
A.2 AHIV final.m
A.3 FDGMM final.m
A.4 SGMM final.m
A.5 MC simulations final.m
A.6 MC results processing final.m
Bibliography
List of Abbreviations
AHIV Anderson-Hsiao Instrument Variable
FDGMM First-Differenced GMM
FE Fixed Effects
GMM Generalized Method of Moments
IV Instrument Variable
OLS Ordinary Least Squares
SGMM System GMM
List of Figures
5.1 Schematic diagram representing MATLAB implementation
5.2 MC simulation 1
5.3 MC simulation 2
5.4 MC simulation 3
5.5 MC simulation 4
5.6 MC simulation 5
5.7 MC simulation 6
5.8 MC simulation 7
5.9 MC simulation 8
5.10 MC simulation 9
5.11 MC simulation 10
5.12 MC simulation 11
5.13 MC simulation 12
List of Tables
5.1 Descriptive statistics of variables ln(S) and ln(A)
5.2 Estimates for the population parameters
5.3 Configuration of simulations to be performed
Chapter 1
Introduction
1.1 The Need for Reliable Dynamic Panel Data Estimators
Dynamic panel data models have become increasingly popular in present-day empirical economic
research. As their name indicates, these models make use of panel data and are dynamic, and
these two components explain their popularity. The first component is the use of panel data.
Panel data consist of multiple cross-sectional units, e.g. individuals, households, firms, or
countries, observed at different points in time. This type of data has an obvious advantage over
cross-sectional data: we cannot estimate dynamic models from observations at a single point in
time, and it is rare for single cross-section surveys to provide sufficient information about
earlier time periods for dynamic relationships to be investigated. Combining cross-sectional
data with time series data therefore allows for richer econometric models and more accurate
conclusions. The second component is the dynamic nature of these models. In a dynamic model,
past observations of the variable of interest can influence the current value of that variable.
In that way, dynamic adjustment processes can be analyzed for a broad base of cross-sectional
units.
The increasing use of dynamic panel data models requires reliable dynamic panel data
estimators. This master’s dissertation compares 4 dynamic panel data estimators suggested in
the literature: the Fixed Effects estimator, the Anderson-Hsiao Instrument Variable (IV)
estimator, the First-Differenced Generalized Method of Moments (GMM) estimator, and the System
GMM estimator. The comparison is done using Monte Carlo simulations, which are implemented in
MATLAB. During this type of simulation, the coefficients of a given dynamic panel data model
are estimated repeatedly over a large number of iterations. In every iteration, a new panel
data set is created using a well-defined statistical distribution. The underlying econometric
model used during the simulations is the ’Firm Growth Model’. The 4 estimators are then
compared in terms of consistency and efficiency.
1.2 Outline
Chapter 2 of this master’s dissertation discusses the ’Firm Growth Model’. This dynamic panel
data model is used during the Monte Carlo simulations. The historical development of this
model is briefly described, and the final form of the model is presented. In Chapter 3, the
Fixed Effects estimator is explained, and its shortcomings when used in dynamic panel data
analysis are highlighted. After covering the rather simple Fixed Effects estimator, Chapter 4
presents 3 more sophisticated estimators used in dynamic panel data estimation: the
Anderson-Hsiao IV estimator, the First-Differenced GMM estimator, and the System GMM estimator.
The 3 estimators are explained and their advantages and disadvantages are highlighted. In
Chapter 5 we finally arrive at the core of this master’s dissertation: the Monte Carlo
simulations of the 4 presented estimators. The simulation method, implemented in MATLAB, is
explained in detail and the results of the simulations are presented. Chapter 6 concludes this
work and gives a final overview of the results obtained.
Chapter 2
The Firm Growth Model
During the Monte Carlo simulations carried out in this thesis, the 4 investigated dynamic panel
data estimators will repeatedly estimate the coefficients of a dynamic panel data model.
Such a model therefore needs to be chosen in order to compare the 4 estimators in terms of
consistency and efficiency. In this study, the ’Firm Growth Model’ will be used for this
purpose. The ’Firm Growth Model’ is frequently used when analyzing dynamic panel data in order
to study the influence of certain economic variables on firm growth. Furthermore, the model was
already used at UGent for analyzing the influence of municipal fiscal policy on firm growth.
This model has been developed through consecutive work by different researchers and is built
upon Gibrat’s law, which summarizes the conventional wisdom on the relationship between firm
size and growth. It asserts that the probability of a proportionate increase in firm size over
an interval in time is the same for all firms, regardless of their size at the beginning of the
interval. In other words, Gibrat’s law assumes no relationship between firm growth and firm
size. Much early work supported Gibrat’s law (Hart and Prais, 1956; Simon and Bonini, 1958;
Hymer and Pashigian, 1962; Lucas, 1967), but later empirical research found that a relationship
between firm growth and size does exist (Mansfield, 1962; Kumar, 1985; Hall, 1987; FitzRoy and
Kraft, 1991; Mata, 1994). This discrepancy between theory and evidence has led to new theories
emphasizing managerial efficiency and learning-by-doing as key determinants of firm dynamics.
Jovanovic (1982) proposed a learning model in which firm growth and survival are linked to firm
size and age through an initial level of production efficiency. Firms learn their efficiency
level through production experience. When output is a decreasing convex function of managerial
inefficiency, the model implies that younger firms tend to grow faster than older firms.
Starting from these theories, a number of papers have taken firm size and age as the determinants
of firm growth. Following Evans (1987a,b), the growth variable of interest at time $t+1$ is
modeled as a function of size and age at time $t$. In logarithmic form, this relationship is
expressed as:
$$G_{i,t+1} = \ln\big(F(A_{i,t}, S_{i,t})\big) + u_{i,t} \tag{2.1}$$
where $G_{i,t+1} = \ln(S_{i,t+1}) - \ln(S_{i,t})$ is the growth variable of interest of firm $i$
($i = 1, 2, \dots, N$) in period $t+1$ ($t = 1, 2, \dots, T$), $S_{i,t}$ and $A_{i,t}$ are the
size and age of firm $i$ at time $t$, and $u_{i,t}$ is the error term. The error term can be
broken down into a firm-specific term ($\mu_i$), a time-specific term ($\lambda_t$) and a random
error term ($v_{i,t}$) as:
$$u_{i,t} = \mu_i + \lambda_t + v_{i,t} \tag{2.2}$$
Using a flexible translog functional form for $F(\cdot)$, the relation is written as:
$$G_{i,t+1} = \beta_0 + \beta_1 \ln(S_{i,t}) + \beta_2 \ln(A_{i,t}) + \beta_3 (\ln(S_{i,t}))^2 + \beta_4 (\ln(A_{i,t}))^2 + \beta_5 \ln(S_{i,t})\ln(A_{i,t}) + u_{i,t} \tag{2.3}$$
Eq. 2.3 will be used as the econometric model when comparing the different estimators in the
Monte Carlo simulations. During these simulations, the coefficients of the model will be defined
in advance and a dynamic panel data set will be created based upon this model. This is done
using a random input of variables and a random error following a certain statistical
distribution. The technicalities of these simulations are covered in Chapter 5.
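As an illustration of how Eq. 2.3 can generate data, the following minimal Python sketch iterates the growth equation for a single firm. Because $G_{i,t+1} = \ln(S_{i,t+1}) - \ln(S_{i,t})$, the coefficient on lagged log size in levels is $\beta_1 + 1$, which is what makes the model dynamic. The coefficient values, the function names and the choice to set the firm and time effects to zero are assumptions made only for this sketch; they are not taken from the thesis or its MATLAB code.

```python
import math
import random


def growth(ln_s, ln_a, beta, u=0.0):
    """Eq. 2.3: growth of log size between t and t+1 (translog form)."""
    b0, b1, b2, b3, b4, b5 = beta
    return (b0 + b1 * ln_s + b2 * ln_a
            + b3 * ln_s ** 2 + b4 * ln_a ** 2
            + b5 * ln_s * ln_a + u)


def simulate_firm(ln_s0, age0, beta, periods, sigma_v, rng):
    """Iterate ln(S_{t+1}) = ln(S_t) + G_{t+1} for one firm.

    Only the random error v is drawn; the firm and time effects are set
    to zero for brevity (an assumption of this sketch).
    """
    path = [ln_s0]
    for t in range(periods):
        v = rng.gauss(0.0, sigma_v)
        g = growth(path[-1], math.log(age0 + t), beta, v)
        path.append(path[-1] + g)
    return path


# Hypothetical coefficients, chosen only for illustration.
beta = (0.5, -0.1, -0.05, 0.0, 0.0, 0.0)
rng = random.Random(0)
path = simulate_firm(ln_s0=3.0, age0=1, beta=beta, periods=5,
                     sigma_v=0.1, rng=rng)
print(len(path))  # initial log size plus 5 simulated periods
```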
Before discussing the Monte Carlo simulations, the 4 estimators studied in this master’s
dissertation are presented in the next 2 chapters. In Chapter 3 we take a look at the Fixed
Effects estimator, which is sometimes used in the estimation of dynamic panel data models.
This estimator, however, is not well suited to this type of model. Therefore,
Chapter 4 presents 3 more suitable estimators for dealing with dynamic panel data
models.
Chapter 3
The Fixed Effects Estimator and Bias
3.1 Presentation of the Fixed Effects Estimator
In this chapter, we discuss the first of 4 estimators studied in this master’s dissertation: the
Fixed Effects (FE) estimator. This estimator serves as a starting point, as it is a natural
choice when allowing for individual effects in dynamic models. Neither the Ordinary Least
Squares (OLS) nor the Random Effects (RE) estimator is suited for estimating dynamic models,
since the lagged dependent variable on the right-hand side of the model equation will be
positively correlated with the error term due to the presence of individual effects. This makes
the OLS and RE estimators inconsistent when used in dynamic modeling. Applying the FE estimator
to dynamic panel data models, however, is not without problems either. When the number of time
observation points in the panel data set is finite, a so-called Nickell bias will occur
(Nickell, 1981). We briefly introduce the FE estimator in a mathematical manner here, and show
that a bias occurs when estimating dynamic models.
We start by introducing a simplified linear dynamic panel data model, containing explanatory
variables $x_{i,t}$ as well as the lagged endogenous variable $y_{i,t-1}$:
$$y_{i,t} = \rho\, y_{i,t-1} + x'_{i,t}\beta + \alpha_i + \varepsilon_{i,t} \tag{3.1}$$
In this model, $i = 1, \dots, N$ is the index for individuals and $t = 1, \dots, T$ the index
for years. $x'_{i,t}$ is a row vector containing the $K$ explanatory variables for individual
$i$ at time point $t$. $\rho$ is the unknown parameter of the lagged endogenous variable, and
$\beta$ is the unknown parameter vector for the $K$ explanatory variables. $\alpha_i$ represents
the individual specific effects (time invariant). Further, the following assumptions are made:
the error term is normally distributed, $\varepsilon_{i,t} \sim N(0, \sigma_\varepsilon^2)$; the
error term is orthogonal to the exogenous variables, $E(x'_{i,t}\varepsilon_{i,t}) = 0$; and the
error term is uncorrelated with the lagged endogenous variable,
$E(y_{i,t-1}\varepsilon_{i,t}) = 0$. We do not, however, assume the exogenous variables to be
uncorrelated with the individual effects: $E(x'_{i,t}\alpha_i) \neq 0$.
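This model can easily be expressed in matrix notation as follows:
$$y = y_{-1}\rho + X\beta + D\alpha + \varepsilon \tag{3.2}$$
with
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \quad y_i = \begin{pmatrix} y_{i,1} \\ y_{i,2} \\ \vdots \\ y_{i,T} \end{pmatrix}, \quad X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{pmatrix}, \quad X_i = \begin{pmatrix} x'_{i,1} \\ x'_{i,2} \\ \vdots \\ x'_{i,T} \end{pmatrix}$$
$$D = I_N \otimes e, \quad e = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \text{ with dimension } T, \quad \alpha = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_N \end{pmatrix}$$
From the definition of these matrices it can be seen that $y$ is an $(N \cdot T \times 1)$ column
vector and $X$ is an $(N \cdot T \times K)$ matrix. The matrix formulation of this model is
important, as the same formulation will be used to implement the FE estimator in MATLAB.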
In this matrix notation, the dummy matrix $D$ is used to insert the individual specific effects
into the model. This dummy matrix is key, since it will help us to estimate the parameters
in this model using the classic OLS estimator. In order to use the OLS estimator, the variables
in the model have to be ’demeaned’, meaning that their deviations from the individual means need
to be calculated. Once the variables are demeaned, the OLS estimator can be used, since the
individual specific effects are filtered out of the variables. This demeaning is performed by
premultiplying the left-hand side and right-hand side of the model equation with the matrix $M$:
$$My = My_{-1}\rho + MX\beta + MD\alpha + M\varepsilon \tag{3.3}$$
with
$$M = I_{NT} - D(D'D)^{-1}D' \tag{3.4}$$
The term $MD\alpha$ in equation 3.3 is equal to zero, since $MD = D - D(D'D)^{-1}D'D = 0$, so
the individual effects disappear from the model. From this point it is possible to use the
classic OLS estimator to estimate the parameters in the model as follows:
$$\hat{\beta} = (X'MX)^{-1}X'My \tag{3.5}$$
The derivation of equation 3.5 is omitted. This method will be used in the implementation of
the FE estimator in MATLAB.
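The demeaning step can be sketched in a few lines of Python. The example below is a simplified illustration, not the thesis's MATLAB implementation: it assumes a single explanatory variable, no lagged dependent variable and no noise, so that the effect of the transformation is easy to see. Subtracting each individual's time mean is the scalar equivalent of premultiplying by $M$; the helper names `demean` and `fe_slope` are hypothetical.

```python
import random


def demean(values):
    """Subtract the individual's time mean (one block of M for one individual)."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def fe_slope(x_panel, y_panel):
    """Within (FE) estimator for a single regressor: OLS on demeaned data."""
    sxy = sxx = 0.0
    for xi, yi in zip(x_panel, y_panel):
        xd, yd = demean(xi), demean(yi)
        sxy += sum(a * b for a, b in zip(xd, yd))
        sxx += sum(a * a for a in xd)
    return sxy / sxx


rng = random.Random(1)
N, T, beta = 50, 6, 2.0
x_panel, y_panel = [], []
for i in range(N):
    alpha_i = rng.gauss(0.0, 3.0)                            # individual effect
    xi = [alpha_i + rng.gauss(0.0, 1.0) for _ in range(T)]   # x correlated with alpha_i
    yi = [beta * x + alpha_i for x in xi]                    # noise-free for clarity
    x_panel.append(xi)
    y_panel.append(yi)

print(demean([4.0] * T))            # a constant column (the dummy) demeans to zeros
print(fe_slope(x_panel, y_panel))   # recovers beta up to rounding error
```

Because the individual effect is constant over time, it drops out of the demeaned data even though it is correlated with the regressor, which is the property the FE estimator relies on.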
3.2 The Nickell Bias of the Fixed Effects Estimator
Having presented the Fixed Effects estimator, we now study its theoretical performance when
estimating dynamic models. It will be shown that the FE estimator exhibits a bias when dealing
with lagged endogenous variables. To see this, consider the demeaned lagged endogenous variable
$y^*_{i,t-1}$ and the demeaned error term $\varepsilon^*_{i,t}$:
$$y^*_{i,t-1} = y_{i,t-1} - \frac{1}{T}\sum_{t=1}^{T} y_{i,t-1} \tag{3.6}$$
$$\varepsilon^*_{i,t} = \varepsilon_{i,t} - \frac{1}{T}\sum_{t=1}^{T} \varepsilon_{i,t} \tag{3.7}$$
These 2 expressions result from the scalar representation of equation 3.3. From them it can be
deduced that the demeaned lagged endogenous variable correlates with the demeaned error term, or
$E(y^*_{i,t-1}\varepsilon^*_{i,t}) \neq 0$. This is due to the fact that the error term
$\varepsilon_{i,t-1}$ is contained in $y^*_{i,t-1}$ with weight $1 - \frac{1}{T}$, and in
$\varepsilon^*_{i,t}$ with weight $-\frac{1}{T}$. This correlation renders the FE estimates
$\hat{\rho}$ and $\hat{\beta}$ biased. It also follows that the correlation decreases with
increasing $T$, i.e. with an increasing number of time observation points. Typical panel data
sets, however, contain a large number of individuals (large $N$), observed over a rather limited
amount of time (relatively small $T$). It is therefore important to study the Nickell bias of
the FE estimator for the asymptotic case where $N$ tends to infinity ($N \to \infty$), while $T$
stays finite. The exact calculation of this bias is omitted here. The properties of this bias in
the case of infinite $N$ and finite $T$, however, are as follows:
For $N \to \infty$ and $T$ finite, the asymptotic bias of the estimated parameters
($\text{bias}_\rho = \hat{\rho} - \rho$ and $\text{bias}_\beta = \hat{\beta} - \beta$) has the
following properties:
1. Increasing for increasing ρ
2. Increasing for increasing N (number of individuals in data set)
3. Increasing for increasing σ2ε (variance of error term)
4. Decreasing for increasing T (number of time observation points in data set)
The Nickell bias completely disappears when the number of time observation points in the panel
data set goes to infinity ($T \to \infty$). This is, however, not achievable in real-world data
sets. The Nickell bias therefore has to be taken into account. Its presence will be demonstrated
in Chapter 5 when testing the FE estimator using Monte Carlo simulations.
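The dependence of the bias on $T$ can be illustrated with a small simulation. The Python sketch below is an assumption-laden toy version, not the simulation design of Chapter 5: it uses a pure autoregressive model without explanatory variables, standard normal errors and individual effects, $\rho = 0.5$, and $N = 500$. The function names are hypothetical.

```python
import random


def simulate_ar1_panel(N, T, rho, rng, burn_in=50):
    """y_it = rho*y_{i,t-1} + alpha_i + eps_it; returns T+1 values per individual
    (one initial value plus T periods)."""
    panel = []
    for _ in range(N):
        alpha = rng.gauss(0.0, 1.0)
        y = alpha / (1.0 - rho)          # start at the individual's long-run mean
        series = []
        for t in range(burn_in + T + 1):
            y = rho * y + alpha + rng.gauss(0.0, 1.0)
            if t >= burn_in:
                series.append(y)
        panel.append(series)
    return panel


def fe_rho(panel):
    """Within estimator of rho: OLS of demeaned y_t on demeaned y_{t-1}."""
    num = den = 0.0
    for series in panel:
        lag, cur = series[:-1], series[1:]
        m_lag = sum(lag) / len(lag)
        m_cur = sum(cur) / len(cur)
        for a, b in zip(lag, cur):
            num += (a - m_lag) * (b - m_cur)
            den += (a - m_lag) ** 2
    return num / den


rng = random.Random(42)
rho = 0.5
est_short = fe_rho(simulate_ar1_panel(N=500, T=5, rho=rho, rng=rng))
est_long = fe_rho(simulate_ar1_panel(N=500, T=30, rho=rho, rng=rng))
# Both estimates fall below rho; the shortfall is much larger for T=5.
print(est_short - rho, est_long - rho)
```

In this setting the FE estimate of $\rho$ is pulled downward, and the gap narrows as $T$ grows, in line with property 4 above.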
Chapter 4
Estimation Techniques for Dynamic Panel Data
In Chapter 3, the first of 4 estimators studied in this master’s dissertation was discussed,
namely the Fixed Effects estimator. Although simple, the FE estimator exhibits a non-negligible
bias when estimating real-world dynamic panel data models. This bias has led to the development
of more suitable estimators for dynamic panel data estimation. This chapter treats 3 of those
estimators, which have frequently been proposed in the literature: the Anderson-Hsiao instrument
variable (IV) estimator, the First-Differenced generalized method of moments (GMM) estimator,
and the System GMM estimator. Since each of these estimators makes use of the principle of
’instrumentation’, we first briefly introduce this technique at the start of the chapter.
Subsequently the 3 estimators are presented.
4.1 The Idea of Instrumentation
In Chapter 3, it was shown that a bias occurs in the FE estimates due to the correlation of the
error term with one of the regressors (in this case a lagged dependent variable). The goal of
instrumentation is to prevent this bias resulting from the correlation between the regressor x and
the error term ε. The whole idea of instrumentation in estimators can be summarized as follows:
’Find a variable Z (the instrument), that is highly correlated with the regressor X,
but does not correlate with the error term ε. Then use as new regressor that part
of X which correlates with the instrument Z and is orthogonal to the error term ε.’
The use of instrumentation can be made transparent by expressing it as a two-stage process.
In the first stage, the explanatory variable $X$ is regressed on the instrument $Z$. This leads
to a derived explanatory variable $\hat{X}$, which contains the part of $X$ that depends
linearly on $Z$. This new variable $\hat{X}$ will be uncorrelated with the error term
$\varepsilon$, since it only contains the part of $X$ that depends linearly on $Z$, and $Z$ was
chosen such that it is uncorrelated with $\varepsilon$. $\hat{X}$ is then used in the second
stage as the new explanatory variable for the main regression. In matrix form, this method is
executed as follows:
The goal is to estimate the parameter $\beta$ in the regression equation
$y = X\beta + \varepsilon$ using an instrument for the explanatory variable $X$.
First stage: regression of the explanatory variable $X$ on the instrument variable $Z$
$$X = Z\gamma + \nu \tag{4.1}$$
The fitted values $\hat{X}$, obtained through OLS estimation of $\gamma$, are then:
$$\hat{X} = Z\hat{\gamma} = Z(Z'Z)^{-1}Z'X \tag{4.2}$$
Second stage: use of the variables $\hat{X}$ in the main regression with the following
regression equation:
$$y = \hat{X}\beta + \varepsilon \tag{4.3}$$
$$\hat{\beta} = (\hat{X}'\hat{X})^{-1}\hat{X}'y \tag{4.4}$$
Again the estimate for the parameter $\beta$ was obtained using the OLS estimation expression.
If we then insert the result of equation 4.2 into the expression for $\hat{\beta}$ (equation
4.4), we obtain the following end result for the estimation of the parameter $\beta$:
$$\hat{\beta} = (X'PX)^{-1}X'Py \tag{4.5}$$
with
$$P = Z(Z'Z)^{-1}Z' \tag{4.6}$$
The simplification uses that $P$ is symmetric and idempotent, so that
$\hat{X}'\hat{X} = X'P'PX = X'PX$ and $\hat{X}'y = X'Py$. This is an important result, since it
is the general expression for the instrument variable estimator and will be used in the next
section of this chapter.
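The two-stage procedure and the one-step expression of equation 4.5 can be compared numerically. The following Python sketch assumes the simplest possible case, one regressor and one instrument, so that all matrix products reduce to sums; the data-generating process (a common shock `c` that makes the regressor endogenous) and all numerical values are hypothetical and serve only as an illustration.

```python
import random


def dot(a, b):
    """Inner product of two equally long lists."""
    return sum(p * q for p, q in zip(a, b))


rng = random.Random(7)
n, beta = 2000, 1.5
z = [rng.gauss(0.0, 1.0) for _ in range(n)]                  # instrument
c = [rng.gauss(0.0, 1.0) for _ in range(n)]                  # common shock: makes x endogenous
x = [zi + ci + rng.gauss(0.0, 1.0) for zi, ci in zip(z, c)]
eps = [ci + rng.gauss(0.0, 1.0) for ci in c]                 # correlated with x, not with z
y = [beta * xi + ei for xi, ei in zip(x, eps)]

# First stage (Eq. 4.2): fitted values of x from a regression on z.
gamma_hat = dot(z, x) / dot(z, z)
x_hat = [gamma_hat * zi for zi in z]

# Second stage (Eq. 4.4): OLS of y on the fitted values.
beta_2sls = dot(x_hat, y) / dot(x_hat, x_hat)

# One-step IV expression (Eq. 4.5), which for a single regressor and a
# single instrument reduces to (z'y) / (z'x).
beta_iv = dot(z, y) / dot(z, x)

beta_ols = dot(x, y) / dot(x, x)   # ignores the endogeneity of x
print(beta_ols, beta_2sls, beta_iv)
```

The two IV computations agree up to rounding error, while plain OLS is pushed away from the true coefficient by the correlation between the regressor and the error term.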
4.2 The Anderson-Hsiao Instrument Variable Estimator
The Anderson-Hsiao estimator (Anderson and Hsiao, 1982) is a first estimator that tries to
estimate dynamic panel data models without introducing a bias in the estimates. This estimator
uses the instrumentation principle, explained in the previous section, combined with a
differenced form of the original regression equation. We start again from the linear dynamic
model introduced in Chapter 3 when discussing the Fixed Effects estimator:
$$y_{i,t} = \rho\, y_{i,t-1} + x'_{i,t}\beta + \alpha_i + \varepsilon_{i,t} \tag{4.7}$$
For this model, the same conventions and assumptions are used as in Chapter 3.
Instead of working with this standard form of the equation, the Anderson-Hsiao estimator is
based on the differenced form of the regression equation:
$$y_{i,t} - y_{i,t-1} = \rho\,(y_{i,t-1} - y_{i,t-2}) + (x'_{i,t} - x'_{i,t-1})\beta + \varepsilon_{i,t} - \varepsilon_{i,t-1} \tag{4.8}$$
which cancels out the individual effects, which were allowed to correlate with the exogenous
variables. Next, we can express both the original equation and the differenced equation in
matrix form:
$$y = y_{-1}\rho + X\beta + D\alpha + \varepsilon \qquad \text{(original)} \tag{4.9}$$
$$Fy = Fy_{-1}\rho + FX\beta + FD\alpha + F\varepsilon \qquad \text{(differenced)} \tag{4.10}$$
where the vectors and matrices have exactly the same form as in Chapter 3, and with the matrix
$F$ defined as:
$$F = I_N \otimes F_T, \qquad F_T = \begin{pmatrix} -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{pmatrix} \text{ with dimension } (T-1) \times T$$
As can be deduced from its definition, premultiplying the data matrices with the matrix $F$
yields the first differences of these variables. Since every row of $F_T$ sums to zero,
$F_T e = 0$ and therefore $FD = 0$, which cancels out the individual fixed effects. This is
valuable, since it prevents the correlation between the fixed effects and the explanatory
variables from biasing the estimate. We are, however, not out of trouble yet. If we take a look
at the difference of the lagged dependent variable
$$y_{i,t-1} - y_{i,t-2} = \rho\,(y_{i,t-2} - y_{i,t-3}) + (x'_{i,t-1} - x'_{i,t-2})\beta + \varepsilon_{i,t-1} - \varepsilon_{i,t-2} \tag{4.11}$$
we can see that this differenced lagged dependent variable is correlated with the differenced
error term $\varepsilon_{i,t} - \varepsilon_{i,t-1}$ through the presence of
$\varepsilon_{i,t-1}$ in expression 4.11:
$E((y_{i,t-1} - y_{i,t-2})(\varepsilon_{i,t} - \varepsilon_{i,t-1})) \neq 0$. This correlation
will render the estimates of the parameters in the model biased.
The presence of this correlation brings us to the second technique used in this estimator,
namely instrumentation. Anderson and Hsiao suggested 2 possible instruments for the differenced
lagged dependent variable $y_{i,t-1} - y_{i,t-2}$: the level instrument $y_{i,t-2}$ or the
lagged difference $y_{i,t-2} - y_{i,t-3}$. These instruments can be expected to have a high
correlation with the differenced lagged dependent variable, and are also expected to be
uncorrelated with the differenced error term. Arellano (1989) analyzed the properties of these 2
instruments and found the estimator using the level instruments to be superior because of its
smaller variance and the absence of points of singularity. The level instruments also have the
added advantage of losing one time observation point fewer than the lagged difference
instruments. This can be relevant in practice, since data sets with a large number of
individuals but a limited number of time observation points are commonly encountered.
Combining the first-differencing approach with the instrumentation using level instruments, the
Anderson-Hsiao estimator can be formulated as a standard instrument variable estimator. The
parameters of the model can therefore be estimated using the expressions for the IV estimator
presented in the previous section:
$$\hat{\gamma} = (X'PX)^{-1}X'Py \tag{4.12}$$
with
$$P = Z(Z'Z)^{-1}Z' \tag{4.13}$$
In these expressions, all right-hand side variables (lagged dependent variable and explanatory
variables) are brought together in one matrix $X$. Their corresponding parameters in the model
are grouped into the vector $\gamma$. The instruments are grouped into one large matrix as well,
namely $Z$. In this matrix, the differenced lagged dependent variables $y_{i,t-1} - y_{i,t-2}$
are replaced by their level instruments, namely $y_{i,t-2}$. The complete buildup of these
matrices is as follows:
$$X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{pmatrix}, \quad Z = \begin{pmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_N \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}$$
with
$$X_i = \begin{pmatrix} y_{i,2} - y_{i,1} & x'_{i,3} - x'_{i,2} \\ y_{i,3} - y_{i,2} & x'_{i,4} - x'_{i,3} \\ \vdots & \vdots \\ y_{i,T-1} - y_{i,T-2} & x'_{i,T} - x'_{i,T-1} \end{pmatrix}, \quad Z_i = \begin{pmatrix} y_{i,1} & x'_{i,3} - x'_{i,2} \\ y_{i,2} & x'_{i,4} - x'_{i,3} \\ \vdots & \vdots \\ y_{i,T-2} & x'_{i,T} - x'_{i,T-1} \end{pmatrix}, \quad y_i = \begin{pmatrix} y_{i,3} - y_{i,2} \\ y_{i,4} - y_{i,3} \\ \vdots \\ y_{i,T} - y_{i,T-1} \end{pmatrix}$$
As you can see, the data in these matrices is grouped per individual, and the blocks of individual
data (containing all time observation points for that individual) are then stacked on top of each
other. This structure is important to highlight, as the implementation of the Anderson-Hsiao
estimator in MATLAB will make use of this structure. The performance of this estimator will
be discussed in Chapter 5, using the results of the Monte Carlo simulations.
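To make the construction concrete, the Python sketch below applies the Anderson-Hsiao idea to a simulated panel. It is a simplified illustration under stated assumptions rather than the thesis's MATLAB implementation: the model contains no explanatory variables, so each row of $X_i$ and $Z_i$ reduces to a single column, and the IV expression collapses to a ratio of two sums. The parameter values and function names are hypothetical.

```python
import random


def simulate_ar1_panel(N, T, rho, rng, burn_in=50):
    """y_it = rho*y_{i,t-1} + alpha_i + eps_it; returns T values per individual."""
    panel = []
    for _ in range(N):
        alpha = rng.gauss(0.0, 1.0)
        y = alpha / (1.0 - rho)          # start at the individual's long-run mean
        series = []
        for t in range(burn_in + T):
            y = rho * y + alpha + rng.gauss(0.0, 1.0)
            if t >= burn_in:
                series.append(y)
        panel.append(series)
    return panel


def anderson_hsiao_rho(panel):
    """IV estimate of rho from the differenced equation, instrumenting
    (y_{t-1} - y_{t-2}) with the level y_{t-2}."""
    zy = zx = 0.0
    for y in panel:
        for t in range(2, len(y)):
            dy = y[t] - y[t - 1]          # differenced dependent variable
            dy_lag = y[t - 1] - y[t - 2]  # differenced lagged dependent variable
            z = y[t - 2]                  # level instrument
            zy += z * dy
            zx += z * dy_lag
    return zy / zx


rng = random.Random(3)
panel = simulate_ar1_panel(N=2000, T=6, rho=0.5, rng=rng)
est = anderson_hsiao_rho(panel)
print(est)   # close to 0.5, but noticeably noisier than a comparable OLS fit
```

Note that the first 2 observations of every individual are used only to build the difference and the instrument, which is the loss of time observation points mentioned above.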
4.3 Introduction to the Generalized Method of Moments
In the previous section, the Anderson-Hsiao estimator was presented, which is an instrument
variable estimator. The 2 remaining estimators to be discussed, however, are so-called
Generalized Method of Moments (GMM) estimators. This section therefore briefly introduces the
concept of GMM. GMM estimation was formalized by Hansen (1982) and has since become one of the
most widely used methods of estimation in economics and finance. This technique does not require
complete knowledge of the distribution of the data. Only specified moment conditions, derived
from the underlying model, are needed for GMM estimation. These moment conditions are functions
of the model parameters and the data, such that their expectation is zero at the true values of
the parameters. They are also called ’orthogonality conditions’. Solving these moment conditions
for the model parameters then leads to an estimate for these parameters.
In order to demonstrate this method, we represent the simple OLS estimator as an application
of the method of moments. Using this method, we would like to estimate the parameter vector
β in the following model: y = Xβ + ε. The moment condition to start from is in this case the
uncorrelatedness of the explanatory variables and the error term:
\[ E(X'\varepsilon) = 0 \tag{4.14} \]
This moment condition then needs to be applied to the sample data. Replacing the expectation
by its sample average, we obtain:
\[ \frac{1}{N} X'(y - X\beta) = 0 \tag{4.15} \]
When this expression is solved for the parameter vector β, we obtain the following well-known
result for the OLS estimator:
\[ \hat{\beta} = (X'X)^{-1} X'y \tag{4.16} \]
In the same way, the instrumental variable estimator can be seen as an application of the method
of moments. This time, the moment condition to start from is the assumption that the instrument
matrix Z is orthogonal to the error term ε:
\[ E(Z'\varepsilon) = 0 \tag{4.17} \]
Again applying this condition to the sample data leads to the following expression:
\[ \frac{1}{N} Z'(y - X\beta) = 0 \tag{4.18} \]
Solving this expression for the parameter vector β results in
\[ \hat{\beta} = (X'PX)^{-1} X'Py \tag{4.19} \]
with
\[ P = Z(Z'Z)^{-1}Z' \tag{4.20} \]
which is the same as obtained in the first section of this chapter. When Z has exactly as many
columns as X, this expression reduces to $\hat{\beta} = (Z'X)^{-1}Z'y$.
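The two derivations above can be illustrated with a minimal sketch. It assumes a model with a single regressor and no intercept, so that the matrix formulas collapse to ratios of sums; the toy data, the instrument and the function names are hypothetical and are not taken from this dissertation.

```python
def ols_mm(x, y):
    """Solve (1/N) * sum(x * (y - x*b)) = 0 for b, i.e. b = sum(x*y) / sum(x*x)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def iv_mm(z, x, y):
    """Solve (1/N) * sum(z * (y - x*b)) = 0 for b, i.e. b = sum(z*y) / sum(z*x)."""
    return (sum(zi * yi for zi, yi in zip(z, y))
            / sum(zi * xi for zi, xi in zip(z, x)))


def sample_moment(w, x, y, b):
    """Sample analogue of E(w * eps), evaluated at the estimate b."""
    return sum(wi * (yi - xi * b) for wi, xi, yi in zip(w, x, y)) / len(y)


# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]   # regressor
z = [1.0, 1.0, 2.0, 3.0]   # instrument
y = [2.1, 3.9, 6.2, 7.8]   # dependent variable

b_ols = ols_mm(x, y)       # sets the sample moment based on x to zero
b_iv = iv_mm(z, x, y)      # sets the sample moment based on z to zero
```

Each estimate makes its own sample moment equal to zero (up to rounding), which is exactly what expressions 4.15 and 4.18 require.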
4.4 The First-Differenced Generalized Method of Moments Estimator
The First-Differenced GMM estimator was developed by Arellano and Bond (1991), and has
since become increasingly popular in empirical work using firm level or household data. This
estimator is similar to the Anderson-Hsiao estimator, presented in a previous section, but exploits
additional moment conditions, which enlarges the set of instruments to be used. By introducing
extra moment conditions, this estimator becomes a GMM estimator. To understand this, we
once again start from the same dynamic model used in the presentation of the FE estimator:
\[ y_{i,t} = \rho y_{i,t-1} + x'_{i,t}\beta + \alpha_i + \varepsilon_{i,t} \tag{4.21} \]
For this model, the same conventions are used as in Chapter 3, as well as the same assumptions.
Just as was the case for the Anderson-Hsiao estimator, the First-Differenced GMM estimator
makes use of the differenced model equation:
\[ y_{i,t} - y_{i,t-1} = \rho (y_{i,t-1} - y_{i,t-2}) + (x'_{i,t} - x'_{i,t-1})\beta + \varepsilon_{i,t} - \varepsilon_{i,t-1} \tag{4.22} \]
This again allows us to eliminate the individual specific effects from the model.
To understand where the extra set of moment conditions originates from, we take a look at the
level instruments available for instrumenting the differenced lagged dependent variable
$y_{i,t-1} - y_{i,t-2}$ at different time observation points. For t = 3, the equation to be estimated is:
\[ y_{i,3} - y_{i,2} = \rho (y_{i,2} - y_{i,1}) + (x'_{i,3} - x'_{i,2})\beta + \varepsilon_{i,3} - \varepsilon_{i,2} \tag{4.23} \]
The only available level instrument for $y_{i,2} - y_{i,1}$ is $y_{i,1}$.
At time observation point t = 4, the equation has the following form:
\[ y_{i,4} - y_{i,3} = \rho (y_{i,3} - y_{i,2}) + (x'_{i,4} - x'_{i,3})\beta + \varepsilon_{i,4} - \varepsilon_{i,3} \tag{4.24} \]
Now, the available level instruments for $y_{i,3} - y_{i,2}$ are $y_{i,1}$ and $y_{i,2}$. As can be seen, the
number of time periods available for instrumentation grows up to the time observation point t = T,
where the instruments $y_{i,1}, y_{i,2}, \dots, y_{i,T-2}$ are available. The First-Differenced GMM estimator
makes use of all these available instruments, and therefore has an enlarged set of moment
conditions to impose on the available data. This leads in theory to an increased consistency and
efficiency of the estimator.
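In matrix formulation, these extra instruments are added to the instrument matrix Z:
\[
Z = \begin{bmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_N \end{bmatrix}
\quad \text{with} \quad
Z_i = \begin{bmatrix}
y_{i,1} & 0 & 0 & \cdots & 0 & \cdots & 0 & x'_{i,3} - x'_{i,2} \\
0 & y_{i,1} & y_{i,2} & \cdots & 0 & \cdots & 0 & x'_{i,4} - x'_{i,3} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & y_{i,1} & \cdots & y_{i,T-2} & x'_{i,T} - x'_{i,T-1}
\end{bmatrix}
\]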
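As a hedged illustration of this block structure, the sketch below builds the instrument matrix of one individual from a list of levels. The function name and the optional argument for the differenced exogenous regressors are assumptions made for this example; they do not reproduce the MATLAB functions of this dissertation.

```python
def gmm_instrument_block(y, dx=None):
    """Build the 'GMM-style' instrument matrix Z_i for one individual.

    y  : list [y_{i,1}, ..., y_{i,T}] of levels of the dependent variable.
    dx : optional list of rows; dx[r] holds the differenced exogenous
         regressors of the equation for t = r + 3 (they instrument themselves).
    """
    T = len(y)
    n_lag_cols = (T - 1) * (T - 2) // 2   # 1 + 2 + ... + (T - 2) columns
    rows = []
    start = 0
    for t in range(3, T + 1):             # one row per differenced equation
        row = [0.0] * n_lag_cols
        n_avail = t - 2                   # instruments y_{i,1}, ..., y_{i,t-2}
        row[start:start + n_avail] = y[:n_avail]
        start += n_avail
        if dx is not None:
            row += list(dx[t - 3])
        rows.append(row)
    return rows


# Hypothetical levels for T = 5 time observation points.
Z_i = gmm_instrument_block([10.0, 11.0, 12.0, 13.0, 14.0])
```

For T = 5 this yields 3 rows (t = 3, 4, 5) and 6 columns for the lagged levels, with each row filling only the columns that belong to its own time observation point.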
This matrix is called a 'GMM-style' instrument matrix. When other regressors in the model are
endogenous (besides the differenced lagged dependent variable $y_{i,t-1} - y_{i,t-2}$), this matrix can
be extended with another set of instrument variables in the same way as for the variable
$y_{i,t-1} - y_{i,t-2}$. The estimates of the parameters in the model are eventually obtained by imposing
the following set of moment conditions on the data:
\[ E(Z_i' \Delta\varepsilon_i) = 0 \quad \text{for } i = 1, \dots, N \tag{4.25} \]
with
\[
\Delta\varepsilon_i = \begin{bmatrix} \Delta\varepsilon_{i,3} \\ \Delta\varepsilon_{i,4} \\ \vdots \\ \Delta\varepsilon_{i,T} \end{bmatrix}
\]
This results in a set of $\frac{(T-1)(T-2)}{2} + K$ meaningful moment conditions, each of which is
imposed on all N individuals. This number typically exceeds the number of unknown parameters
in the model. Therefore, the asymptotically efficient GMM estimator is obtained by minimizing
the following expression $Q_N$:
\[
Q_N = \left( \frac{1}{N} \sum_{i=1}^{N} Z_i' \Delta\varepsilon_i \right)' W_N
      \left( \frac{1}{N} \sum_{i=1}^{N} Z_i' \Delta\varepsilon_i \right) \tag{4.26}
\]
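Differentiating this expression with respect to the vector of model parameters γ = (ρ, β′)′, and
then solving for γ, yields:
\[ \hat{\gamma} = (X'Z W_N Z'X)^{-1} X'Z W_N Z'y \tag{4.27} \]
in which y stacks the differenced dependent variables of all individuals and X has the following
form:
\[
X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}
\quad \text{with} \quad
X_i = \begin{bmatrix}
y_{i,2} - y_{i,1} & x'_{i,3} - x'_{i,2} \\
y_{i,3} - y_{i,2} & x'_{i,4} - x'_{i,3} \\
\vdots & \vdots \\
y_{i,T-1} - y_{i,T-2} & x'_{i,T} - x'_{i,T-1}
\end{bmatrix}
\]
and where $W_N$ is the weighting matrix, incorporated to deal with heteroscedasticity of the error
term.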
In order to determine the optimal weighting matrix $W_N^{OPT}$, a two-step procedure is used. In
the first step, the First-Differenced GMM estimator is limited to the case of no autocorrelation
in the error term $\varepsilon_{i,t}$ combined with the homoscedasticity assumption (constant variance of the
error term over different individuals and time observation points). By making these assumptions,
the first-step weighting matrix $W_N^{s1}$ can easily be computed as follows:
\[ W_N^{s1} = (Z'GZ)^{-1} \tag{4.28} \]
with
\[
G = I_N \otimes G_T
\quad \text{and} \quad
G_T = F_T' F_T =
\begin{bmatrix}
2 & -1 & 0 & \cdots \\
-1 & 2 & \ddots & 0 \\
0 & \ddots & \ddots & -1 \\
\vdots & 0 & -1 & 2
\end{bmatrix}
\]
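The tridiagonal form of this matrix follows from the first-step assumptions: if the level errors are homoscedastic and uncorrelated, the covariance matrix of the differenced errors is proportional to the product of the first-difference operator with its own transpose. The sketch below checks this for T = 5; it assumes that the factor written as F in the text plays the role of this differencing operator, and the helper names are hypothetical.

```python
def first_difference_matrix(n_levels):
    """(n_levels - 1) x n_levels matrix D such that D e = first differences of e."""
    return [[-1.0 if c == r else 1.0 if c == r + 1 else 0.0
             for c in range(n_levels)]
            for r in range(n_levels - 1)]


def times_own_transpose(A):
    """Return the product A A'."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in A]
            for row_i in A]


# T = 5: the level errors e_2, ..., e_5 produce the differences de_3, de_4, de_5.
D = first_difference_matrix(4)
G_T = times_own_transpose(D)
# G_T == [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
```

The diagonal value 2 reflects that each differenced error contains two level errors, and the value -1 reflects that neighbouring differences share one level error with opposite sign.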
Once this preliminary weighting matrix is determined, a first estimate of the parameters can be
made using expression 4.27. This leads to an interim estimate for γ:
\[ \hat{\gamma}_{s1} = (X'Z W_N^{s1} Z'X)^{-1} X'Z W_N^{s1} Z'y \tag{4.29} \]
Using this interim estimate from the first step, the residuals for this first-step regression can be
calculated as follows:
\[ \Delta\hat{\varepsilon} = y - X\hat{\gamma}_{s1} \tag{4.30} \]
These residuals are then used in the second step for calculating the optimal weighting matrix
$W_N^{OPT}$:
\[ W_N^{OPT} = \left( \sum_{i=1}^{N} Z_i' \Delta\hat{\varepsilon}_i \Delta\hat{\varepsilon}_i' Z_i \right)^{-1} \tag{4.31} \]
where $\Delta\hat{\varepsilon}_i$ contains the first-step residuals of individual i (that is, $Z'\Delta\hat{\varepsilon}\Delta\hat{\varepsilon}'Z$ with
$\Delta\hat{\varepsilon}\Delta\hat{\varepsilon}'$ taken block-diagonal per individual). This optimal weighting matrix is then used for
calculating the final estimate of γ:
\[ \hat{\gamma}_{s2} = (X'Z W_N^{OPT} Z'X)^{-1} X'Z W_N^{OPT} Z'y \tag{4.32} \]
Again, this exact matrix formulation will be used in the implementation of the First-Differenced
GMM estimator in MATLAB. This implementation will be discussed in Chapter 5, together with
the performance of the estimator in the Monte Carlo simulations.
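The two-step procedure can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the MATLAB implementation of this dissertation: the model contains only the lagged dependent variable (no exogenous regressors), the panel is a hypothetical stationary AR(1) process with standard normal individual effects and errors, and all function names are invented for the example.

```python
import random


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(A)
    M = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def simulate_panel(N, T, rho, rng, burn_in=30):
    """Hypothetical AR(1) panel with individual effects: N lists of T levels."""
    panel = []
    for _ in range(N):
        alpha = rng.gauss(0.0, 1.0)
        y, series = alpha / (1.0 - rho), []
        for t in range(burn_in + T):
            y = rho * y + alpha + rng.gauss(0.0, 1.0)
            if t >= burn_in:
                series.append(y)
        panel.append(series)
    return panel


def blocks(y):
    """Differenced y, differenced lagged y and GMM-style Z_i of one individual."""
    T = len(y)
    m = (T - 1) * (T - 2) // 2
    dy, dx, Z, start = [], [], [], 0
    for t in range(3, T + 1):
        dy.append([y[t - 1] - y[t - 2]])
        dx.append([y[t - 2] - y[t - 3]])
        row = [0.0] * m
        row[start:start + t - 2] = y[:t - 2]
        start += t - 2
        Z.append(row)
    return dy, dx, Z


def gmm_step(ZX, Zy, W):
    """(X'Z W Z'X)^(-1) X'Z W Z'y for a single regressor."""
    XZW = matmul(transpose(ZX), W)
    return matmul(XZW, Zy)[0][0] / matmul(XZW, ZX)[0][0]


def fd_gmm_two_step(panel):
    T = len(panel[0])
    m, n_eq = (T - 1) * (T - 2) // 2, T - 2
    G = [[2.0 if r == c else -1.0 if abs(r - c) == 1 else 0.0
          for c in range(n_eq)] for r in range(n_eq)]
    ZX, Zy = [[0.0] for _ in range(m)], [[0.0] for _ in range(m)]
    ZGZ = [[0.0] * m for _ in range(m)]
    data = [blocks(y) for y in panel]
    for dy, dx, Z in data:                       # accumulate Z'X, Z'y, Z'GZ
        Zt = transpose(Z)
        ZX, Zy = add(ZX, matmul(Zt, dx)), add(Zy, matmul(Zt, dy))
        ZGZ = add(ZGZ, matmul(matmul(Zt, G), Z))
    rho1 = gmm_step(ZX, Zy, inverse(ZGZ))        # step 1
    S = [[0.0] * m for _ in range(m)]
    for dy, dx, Z in data:                       # first-step residuals per individual
        e = [[a[0] - rho1 * b[0]] for a, b in zip(dy, dx)]
        Ze = matmul(transpose(Z), e)
        S = add(S, matmul(Ze, transpose(Ze)))
    rho2 = gmm_step(ZX, Zy, inverse(S))          # step 2
    return rho1, rho2


rng = random.Random(12345)
rho1, rho2 = fd_gmm_two_step(simulate_panel(N=1000, T=5, rho=0.5, rng=rng))
```

The second-step weighting matrix is accumulated individual by individual, which keeps it invertible as long as the number of individuals is at least as large as the number of instruments.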
Although this estimator makes use of an extended set of instruments, it performs poorly under
certain circumstances. The reason behind this poor performance is the problem of weak
instruments, meaning that the instruments are only weakly correlated with the endogenous variables
they are trying to instrument. Blundell and Bond (1998) have demonstrated that the instruments
used in the First-Differenced GMM estimator become less informative (and therefore
weaker) when:
• $y_{i,t}$ is close to a random walk, i.e. ρ is close to 1 (no clear trend observable in the
dependent variable).
• The variance of the individual effects $\sigma_\alpha^2$ is relatively large in comparison to the variance
of the error term $\sigma_\varepsilon^2$.
This problem of weak instruments is redressed by introducing another set of instruments,
resulting in the System GMM estimator, presented in the last section of this chapter.
4.5 The System Generalized Method of Moments Estimator
To improve the properties of the First-Differenced GMM estimator, Arellano and Bover (1995)
and Blundell and Bond (1998) suggested introducing another set of moment conditions, but
this time for the level model equation, instead of for the differenced model equation. This is
realized by using instruments that are orthogonal to the individual effects. The key here is that,
instead of differencing the regressors to eliminate the individual effects (as was the case for the
Anderson-Hsiao and the First-Differenced GMM estimator), now the instruments are differenced
to make them orthogonal to the individual effects.
In order to illustrate this method, we again start from the same dynamic model used in the
presentation of the FE estimator:
\[ y_{i,t} = \rho y_{i,t-1} + x'_{i,t}\beta + \alpha_i + \varepsilon_{i,t} \tag{4.33} \]
For this model, the same conventions are used as in Chapter 3, as well as the same assumptions.
It is this level equation that provides the additional set of moment conditions used in the
System GMM estimator, on top of those of the First-Differenced GMM estimator.
The level endogenous variables in this equation will be instrumented by differenced instruments,
meaning that $y_{i,t-1}$ will be instrumented by $y_{i,t-1} - y_{i,t-2} = \Delta y_{i,t-1}$. The System GMM
estimator exploits this new set of instruments, while retaining the original set for the differenced
equation.
In order to capture all these instruments in one matrix formulation, the System GMM estimator
involves building a stacked data set with twice the observations, i.e. in the data of each
individual, the differenced observations go on top and the levels below. This results in the
following definition for $y_i^+$ and $X_i^+$:
\[
y_i^+ = \begin{bmatrix}
y_{i,3} - y_{i,2} \\
y_{i,4} - y_{i,3} \\
\vdots \\
y_{i,T} - y_{i,T-1} \\
y_{i,3} \\
y_{i,4} \\
\vdots \\
y_{i,T}
\end{bmatrix}
\quad \text{and} \quad
X_i^+ = \begin{bmatrix}
y_{i,2} - y_{i,1} & x'_{i,3} - x'_{i,2} \\
y_{i,3} - y_{i,2} & x'_{i,4} - x'_{i,3} \\
\vdots & \vdots \\
y_{i,T-1} - y_{i,T-2} & x'_{i,T} - x'_{i,T-1} \\
y_{i,2} & x'_{i,3} \\
y_{i,3} & x'_{i,4} \\
\vdots & \vdots \\
y_{i,T-1} & x'_{i,T}
\end{bmatrix}
\]
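The instrument matrix per individual $Z_i^+$ for this system can then be written as:
\[
Z_i^+ = \begin{bmatrix}
y_{i,1} & 0 & 0 & \cdots & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & x'_{i,3} - x'_{i,2} \\
0 & y_{i,1} & y_{i,2} & \cdots & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & x'_{i,4} - x'_{i,3} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & y_{i,1} & \cdots & y_{i,T-2} & 0 & 0 & \cdots & 0 & x'_{i,T} - x'_{i,T-1} \\
0 & 0 & 0 & \cdots & 0 & \cdots & 0 & \Delta y_{i,2} & 0 & \cdots & 0 & x'_{i,3} \\
0 & 0 & 0 & \cdots & 0 & \cdots & 0 & 0 & \Delta y_{i,3} & \cdots & 0 & x'_{i,4} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & \cdots & 0 & 0 & 0 & \cdots & \Delta y_{i,T-1} & x'_{i,T}
\end{bmatrix}
\]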
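Using these matrices, the estimates of the parameters can be obtained by using the same
method as for the First-Differenced GMM estimator. The following set of moment conditions is
now imposed on the data:
\[ E(Z_i^{+\prime} \Delta\varepsilon_i^+) = 0 \quad \text{for } i = 1, \dots, N \tag{4.34} \]
with
\[
\Delta\varepsilon_i^+ = \begin{bmatrix}
\Delta\varepsilon_{i,3} \\ \Delta\varepsilon_{i,4} \\ \vdots \\ \Delta\varepsilon_{i,T} \\
\varepsilon_{i,3} \\ \varepsilon_{i,4} \\ \vdots \\ \varepsilon_{i,T}
\end{bmatrix}
\]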
This results in a large set of $\frac{(T-1)(T-2)}{2} + (T-2) + K$ meaningful moment conditions, each
of which is imposed on all N individuals. This number typically largely exceeds the number of
unknown parameters in the model. Therefore, the asymptotically efficient GMM estimator is
again obtained by minimizing the following expression $Q_N$:
\[
Q_N = \left( \frac{1}{N} \sum_{i=1}^{N} Z_i^{+\prime} \Delta\varepsilon_i^+ \right)' W_N^+
      \left( \frac{1}{N} \sum_{i=1}^{N} Z_i^{+\prime} \Delta\varepsilon_i^+ \right) \tag{4.35}
\]
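To give a feeling for how quickly the instrument count grows with T, the small sketch below evaluates the two counts quoted in this chapter. The value K = 4 is an arbitrary illustrative choice and the function names are hypothetical.

```python
def fd_gmm_moments(T, K):
    """(T-1)(T-2)/2 lagged-level instruments plus K exogenous regressors."""
    return (T - 1) * (T - 2) // 2 + K


def system_gmm_moments(T, K):
    """First-differenced set plus (T-2) lagged differences for the level equation."""
    return fd_gmm_moments(T, K) + (T - 2)


for T in (5, 10):
    print(T, fd_gmm_moments(T, 4), system_gmm_moments(T, 4))
# prints "5 10 13" and "10 40 48"
```

Doubling T from 5 to 10 roughly quadruples the number of moment conditions, which is why the number of instruments is often a practical concern for panels with many time observation points.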
Differentiating this expression with respect to the model parameters γ, and then solving for γ,
yields:
\[ \hat{\gamma} = (X^{+\prime} Z^+ W_N^+ Z^{+\prime} X^+)^{-1} X^{+\prime} Z^+ W_N^+ Z^{+\prime} y^+ \tag{4.36} \]
where $W_N^+$ is the weighting matrix for the System GMM estimator, incorporated to deal
with heteroscedasticity of the error term.
In order to determine the optimal weighting matrix $W_N^{OPT}$, again a two-step procedure is used,
just as was the case for the First-Differenced GMM estimator. In the first step, the System
GMM estimator is limited to the case of no autocorrelation in the error term $\varepsilon_{i,t}$ combined with
the homoscedasticity assumption (constant variance of the error term over different individuals
and time observation points). Furthermore, the variance of the individual effects is
assumed to be zero: $V[\alpha_i] = 0$. By making these assumptions, the first-step weighting matrix
$W_N^{s1}$ can easily be computed as follows:
\[ W_N^{s1} = (Z^{+\prime} G^+ Z^+)^{-1} \tag{4.37} \]
with
\[
G^+ = I_N \otimes G_T^+
\quad \text{and} \quad
G_T^+ = \begin{bmatrix} G_D & 0 \\ 0 & G_L \end{bmatrix}
\]
where
\[
G_D = \begin{bmatrix}
2 & -1 & 0 & \cdots \\
-1 & 2 & \ddots & 0 \\
0 & \ddots & \ddots & -1 \\
\vdots & 0 & -1 & 2
\end{bmatrix}
\quad \text{and} \quad
G_L = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \ddots & 0 \\
\vdots & \ddots & \ddots & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\]
Once this preliminary weighting matrix is determined, a first estimate of the parameters can be
made using expression 4.36. This leads to an interim estimate for γ:
\[ \hat{\gamma}_{s1} = (X^{+\prime} Z^+ W_N^{s1} Z^{+\prime} X^+)^{-1} X^{+\prime} Z^+ W_N^{s1} Z^{+\prime} y^+ \tag{4.38} \]
Using this interim estimate from the first step, the residuals for this first-step regression can be
calculated as follows:
\[ \Delta\hat{\varepsilon}^+ = y^+ - X^+ \hat{\gamma}_{s1} \tag{4.39} \]
These residuals are then used in the second step for calculating the optimal weighting matrix
$W_N^{OPT}$:
\[ W_N^{OPT} = \left( \sum_{i=1}^{N} Z_i^{+\prime} \Delta\hat{\varepsilon}_i^+ \Delta\hat{\varepsilon}_i^{+\prime} Z_i^+ \right)^{-1} \tag{4.40} \]
where $\Delta\hat{\varepsilon}_i^+$ contains the first-step residuals of individual i. This optimal weighting matrix is
then used for calculating the final estimate of γ:
\[ \hat{\gamma}_{s2} = (X^{+\prime} Z^+ W_N^{OPT} Z^{+\prime} X^+)^{-1} X^{+\prime} Z^+ W_N^{OPT} Z^{+\prime} y^+ \tag{4.41} \]
This exact matrix formulation will be implemented in MATLAB, as explained in the next chapter.
Chapter 5
Monte Carlo Simulations
In this chapter, the core of this master's dissertation is discussed, namely the implementation
and results of the Monte Carlo simulations. The goal of the Monte Carlo simulations is to
compare the performance of the 4 estimators discussed above in estimating the Firm Growth Model
under different circumstances. In order to be able to freely vary the parameters in the simulation,
I have chosen to execute the simulations using the mathematical software package 'MATLAB'.
This environment provides considerable freedom when implementing the simulations and enabled me
to set up the simulations for the specific case of the Firm Growth Model. This extra freedom,
however, came at the cost of extra programming time, since everything needed to be implemented
from scratch. In the first section of this chapter, the general set-up of the simulation will be
presented. Next, the parameters varied over the different simulations are discussed, as well as
the reasons for this variation. In the third section the MATLAB implementation of the Monte
Carlo simulations is explained. The results of the simulations are presented and discussed in the
last section.
5.1 General Set-Up of Simulations
Monte Carlo (MC) simulations are used to solve problems which are probabilistic in nature.
Drawing a data sample from a population and estimating the population characteristics from
that sample is an example of a probabilistic problem. A broad definition of Monte Carlo
simulations is: 'the repeated sampling from a probabilistic distribution to determine the properties
of some phenomenon'. The standard framework for MC simulations is as follows:
1 Define a domain of possible inputs.
2 Generate inputs randomly from a probability distribution over that domain.
3 Perform the deterministic calculations necessary to determine the searched-for parameters.
4 Repeat steps 2-3 over a large number of iterations.
5 Aggregate the results.
This framework can be applied to the study we are performing here, namely the analysis of the
performance of 4 estimators when estimating the parameters in the Firm Growth Model. This
Firm Growth Model was discussed in Chapter 2, and has the following form:
\[
G_{i,t+1} = \beta_0 + \beta_1 \ln(S_{i,t}) + \beta_2 \ln(A_{i,t}) + \beta_3 (\ln(S_{i,t}))^2
          + \beta_4 (\ln(A_{i,t}))^2 + \beta_5 \ln(S_{i,t}) \ln(A_{i,t}) + u_{i,t}
\]
where $G_{i,t+1} = \ln(S_{i,t+1}) - \ln(S_{i,t})$ is the growth variable of interest of firm i (i = 1, 2, ..., N) in
period t + 1 (t = 1, 2, ..., T), $S_{i,t}$ and $A_{i,t}$ are the size and age of firm i at time t, and $u_{i,t}$ is the
error term. Moving $\ln(S_{i,t})$ to the right-hand side leads to:
\[
\ln(S_{i,t+1}) = \beta_0 + (\beta_1 + 1) \ln(S_{i,t}) + \beta_2 \ln(A_{i,t}) + \beta_3 (\ln(S_{i,t}))^2
               + \beta_4 (\ln(A_{i,t}))^2 + \beta_5 \ln(S_{i,t}) \ln(A_{i,t}) + u_{i,t}
\]
It is this form of the equation which will be used during the Monte Carlo simulations.
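The equivalence of the growth form and the level form can be checked numerically with a short sketch. The coefficients beta_1 to beta_5 are the estimates reported in Table 5.2 of this chapter; beta_0 = 0 is a placeholder assumed only for this example, and the function names are hypothetical.

```python
import math

# beta_1..beta_5: estimates from Table 5.2; beta_0 = 0.0 is an assumed placeholder.
BETA = {0: 0.0, 1: 0.0439, 2: -0.0804, 3: -0.0048, 4: 0.0045, 5: 0.0070}


def growth(ln_s, ln_a, beta, u=0.0):
    """Growth form: G_{i,t+1} as a function of ln(S_{i,t}) and ln(A_{i,t})."""
    return (beta[0] + beta[1] * ln_s + beta[2] * ln_a
            + beta[3] * ln_s ** 2 + beta[4] * ln_a ** 2
            + beta[5] * ln_s * ln_a + u)


def next_log_size(ln_s, ln_a, beta, u=0.0):
    """Level form: ln(S_{i,t+1}) with coefficient (beta_1 + 1) on ln(S_{i,t})."""
    return (beta[0] + (beta[1] + 1.0) * ln_s + beta[2] * ln_a
            + beta[3] * ln_s ** 2 + beta[4] * ln_a ** 2
            + beta[5] * ln_s * ln_a + u)


ln_s, ln_a = 6.3686, 2.5501          # sample means of ln(S) and ln(A)
level = next_log_size(ln_s, ln_a, BETA)
from_growth = ln_s + growth(ln_s, ln_a, BETA)
same = math.isclose(level, from_growth)
```

Both routes give the same next-period log size, which confirms that the coefficient on the lagged dependent variable in the level form is beta_1 + 1.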
Following the framework presented above, the procedure for the Monte Carlo simulations per-
formed here is as follows:
1 Specify the possible inputs, i.e. specify the population parameters β1, β2, β3, β4 and β5.
2 Generate a data set by drawing a sample out of this population, i.e. draw random initial
values for ln(Si,1) and ln(Ai,1) from their respective distributions, which will be used to
determine all later values ln(Si,t) and ln(Ai,t) with t = 2...T . Also draw random values
for all error terms ui,t. Then calculate all remaining values in the data set.
3 From the generated data set, calculate estimates for the parameters β1, β2, β3, β4 and β5
using the 4 estimators under study. Save these estimates for later processing.
4 Repeat steps 2-3 a large number of times (e.g. D = 1000).
5 Compare the estimated coefficients from all iterations to the population values to determine
the performance of the different estimators.
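The five steps above can be sketched on a deliberately simplified example. Instead of the Firm Growth Model, the sketch uses a hypothetical AR(1) panel with individual effects and compares only the Fixed Effects and the Anderson-Hsiao estimator of the lagged dependent variable parameter; the data generating process, the parameter values and the function names are assumptions made for illustration.

```python
import random
import statistics


def simulate_panel(N, T, rho, rng, burn_in=30):
    """Step 2: draw one data set (N individuals, T time observation points)."""
    panel = []
    for _ in range(N):
        alpha = rng.gauss(0.0, 1.0)               # individual effect
        y, series = alpha / (1.0 - rho), []
        for t in range(burn_in + T):
            y = rho * y + alpha + rng.gauss(0.0, 1.0)
            if t >= burn_in:
                series.append(y)
        panel.append(series)
    return panel


def fe_estimate(panel):
    """Within (Fixed Effects) estimate: demean y_t and y_{t-1} per individual."""
    num = den = 0.0
    for y in panel:
        cur, lag = y[1:], y[:-1]
        mc, ml = sum(cur) / len(cur), sum(lag) / len(lag)
        num += sum((c - mc) * (l - ml) for c, l in zip(cur, lag))
        den += sum((l - ml) ** 2 for l in lag)
    return num / den


def ah_estimate(panel):
    """Anderson-Hsiao IV estimate: the level y_{t-2} instruments dy_{t-1}."""
    num = den = 0.0
    for y in panel:
        for t in range(2, len(y)):                # 0-based index of y_{i,t}
            z = y[t - 2]
            num += z * (y[t] - y[t - 1])
            den += z * (y[t - 1] - y[t - 2])
    return num / den


def monte_carlo(rho, N, T, D, seed=0):
    """Steps 2-4: repeat data generation and estimation D times."""
    rng = random.Random(seed)
    fe, ah = [], []
    for _ in range(D):
        panel = simulate_panel(N, T, rho, rng)
        fe.append(fe_estimate(panel))             # step 3
        ah.append(ah_estimate(panel))
    return fe, ah


# Step 1: population parameter; step 5: aggregate the saved estimates.
fe, ah = monte_carlo(rho=0.5, N=100, T=5, D=200)
mean_fe = statistics.mean(fe)
median_ah = statistics.median(ah)
```

With a small T, the average Fixed Effects estimate lies clearly below the population value, while the Anderson-Hsiao estimates are centred close to it but are more dispersed.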
This simulation procedure will be performed a number of times by varying parameters in the
model. Which parameters will be varied during the different MC simulations is discussed in the
next section.
In the final step of the Monte Carlo simulation, the performance of the different estimators needs
to be determined. In order to do this, performance measures are necessary. In this case, we
evaluated the estimators based upon 2 properties:
• Consistency: meaning that the estimates converge in probability to the population value,
i.e. the distribution of the estimates becomes more and more concentrated near the true
value of the parameter being estimated.
• Efficiency: meaning the variance of the estimates is small.
The performance statistics used to measure these properties of the estimators are the following:
• The average absolute bias, as a performance statistic for the consistency of the estimator:
\[ \frac{1}{5} \sum_{i=1}^{5} \frac{1}{D} \sum_{d=1}^{D} |\hat{\beta}_{i,d} - \beta_i| \]
• The average variance, as a performance statistic for the efficiency of the estimator:
\[ \frac{1}{5} \sum_{i=1}^{5} \frac{1}{D} \sum_{d=1}^{D} (\hat{\beta}_{i,d} - \beta_i)^2 \]
In the definitions of these performance statistics, $\hat{\beta}_{i,d}$ represents the estimate of parameter $\beta_i$
from the dth iteration. Note that the deviations in the second statistic are taken around the true
value $\beta_i$, so strictly speaking it is the mean squared deviation from the population value. The
first averaging is over all D iterations performed during 1 MC simulation. The second averaging
is over the 5 model parameters (β1 to β5). The word 'average' in the definition of the performance
statistics therefore implies both averaging actions.
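A minimal sketch of these two performance statistics is given below. It assumes that the estimates of one MC simulation are stored as a list with one row per iteration and one entry per parameter; the function names and the toy numbers are hypothetical.

```python
def average_absolute_bias(estimates, true_beta):
    """Mean over iterations d and parameters i of |estimate[d][i] - true_beta[i]|."""
    D, P = len(estimates), len(true_beta)
    total = sum(abs(est[i] - true_beta[i]) for est in estimates for i in range(P))
    return total / (D * P)


def average_variance(estimates, true_beta):
    """Mean over iterations d and parameters i of (estimate[d][i] - true_beta[i])**2."""
    D, P = len(estimates), len(true_beta)
    total = sum((est[i] - true_beta[i]) ** 2 for est in estimates for i in range(P))
    return total / (D * P)


# Toy example with 2 parameters and D = 2 iterations.
true_beta = [1.0, 2.0]
estimates = [[1.5, 2.0], [0.5, 3.0]]

bias = average_absolute_bias(estimates, true_beta)   # 0.5
var = average_variance(estimates, true_beta)         # 0.375
```

Because both statistics average over all parameters, a poor estimate of a single parameter raises the statistic for the estimator as a whole.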
5.2 Configuration of the Simulations
The data sets used in each iteration of the MC simulations are created such that they
approximate a real-world data set as closely as possible. In order to achieve this, the population
parameters β1, β2, β3, β4 and β5, and the initial values for ln(Si,1) and ln(Ai,1) need to be
chosen based upon a realistic population. The population sample on which the MC simulations
are based here is the growth data of firms, defined in terms of assets, located in Flemish
municipalities from 2004 to 2013. This data was retrieved from the BELFIRST database published
by Bureau Van Dijk. The sample consists of more than 320 000 firm-year observations for 69
000 firms and it was gathered and analyzed by P. Van Cauwenberge, P. Beyne and H. Vander
Bauwhede (2016). The descriptive statistics gathered during their analysis will be used here to
set up the MC simulations.
In the creation of the data set during each iteration of the MC simulation, the initial values
for ln(Si,1) and ln(Ai,1) are drawn randomly from a distribution as close as possible to their
real-world distributions. These distributions have the following descriptive statistics (as determined
by P. Van Cauwenberge, P. Beyne and H. Vander Bauwhede):

         Mean     SD       Min      Median   Max
ln(S)    6.3686   1.4655   3.2242   6.2478   10.6868
ln(A)    2.5501   0.8491   0        2.7081   4.0943

Table 5.1: Descriptive statistics of variables ln(S) and ln(A)

Both the distribution of ln(S) and that of ln(A) are assumed to be normal, since the mean and
median are close to each other and virtually all the data is contained in the interval
[Mean − 3·SD; Mean + 3·SD]. Therefore, in each iteration of the MC simulation the initial values
ln(Si,1) and ln(Ai,1) are drawn from the distributions $N(6.3686, 1.4655^2)$ and $N(2.5501, 0.8491^2)$
respectively.
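The drawing of initial values can be sketched as follows, using the descriptive statistics of Table 5.1. The dictionary layout, the function names and the seed are assumptions made for this example only.

```python
import random

# Descriptive statistics from Table 5.1.
STATS = {
    "ln_S": {"mean": 6.3686, "sd": 1.4655, "min": 3.2242, "max": 10.6868},
    "ln_A": {"mean": 2.5501, "sd": 0.8491, "min": 0.0, "max": 4.0943},
}


def three_sd_interval(s):
    """Return (mean - 3*sd, mean + 3*sd) for one variable."""
    return s["mean"] - 3 * s["sd"], s["mean"] + 3 * s["sd"]


def draw_initial_values(N, rng):
    """Draw N pairs (ln S_{i,1}, ln A_{i,1}) from the two normal distributions."""
    return [(rng.gauss(STATS["ln_S"]["mean"], STATS["ln_S"]["sd"]),
             rng.gauss(STATS["ln_A"]["mean"], STATS["ln_A"]["sd"]))
            for _ in range(N)]


rng = random.Random(2017)                 # arbitrary seed for reproducibility
initial = draw_initial_values(100, rng)
interval_S = three_sd_interval(STATS["ln_S"])   # about (1.9721, 10.7651)
interval_A = three_sd_interval(STATS["ln_A"])   # about (0.0028, 5.0974)
```

Note that an untruncated normal distribution can occasionally produce values outside the observed range, for example a slightly negative ln(A); truncating the draws would be an additional modelling choice that is not discussed in the text.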
For the specification of the population parameters β1, β2, β3, β4 and β5, the research of P. Van
Cauwenberge, P. Beyne and H. Vander Bauwhede is used as well. In their regression analysis
of the real-world data set of Flemish firm growth data (defined in terms of assets), they have
estimated these parameters in the Firm Growth Model. The obtained estimates and their
respective significance levels are:

Parameter   Estimate   Significance
β1           0.0439    0.0069
β2          -0.0804    0.0084
β3          -0.0048    0.0010
β4           0.0045    0.0011
β5           0.0070    0.0017

Table 5.2: Estimates for the population parameters.

These estimates for β2, β3, β4 and β5 will be used to specify the population in each of the MC
simulations. Parameter β1, however, will be varied in order to study the influence of the
parameter corresponding to the lagged dependent variable on the performance of the 4 estimators. As
discussed in Chapter 3, the Nickell bias increases for increasing values of the lagged dependent
variable parameter ρ. For the Firm Growth Model used here, this parameter ρ is equal to 1 + β1.
In order to visualize this Nickell bias clearly, 3 separate sets of MC simulations will be performed:
one for 1 + β1 = 0.1, one for 1 + β1 = 0.5 and one for 1 + β1 = 0.9.
From Chapter 3 we also know that the lagged dependent variable parameter is not the only
factor influencing the magnitude of the Nickell bias. The number of individuals in the data set (N)
and the number of time observation points (T) influence the Nickell bias too. Therefore, these 2
parameters are also varied across the different MC simulations. N will take on 2 values, namely
N = 100 and N = 1000. T will be varied across two values as well: T = 5 and T = 10. These
values were chosen such that they are separated sufficiently to cause a performance difference,
while still producing feasible simulation sizes using a desktop computer. In total, we will have
the following 12 separate MC simulations:

Simulation number   β1 + 1   T    N
 1                  0.1      5    100
 2                  0.1      5    1000
 3                  0.1      10   100
 4                  0.1      10   1000
 5                  0.5      5    100
 6                  0.5      5    1000
 7                  0.5      10   100
 8                  0.5      10   1000
 9                  0.9      5    100
10                  0.9      5    1000
11                  0.9      10   100
12                  0.9      10   1000

Table 5.3: Configuration of simulations to be performed
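The grid of Table 5.3 is the Cartesian product of the 3 values of β1 + 1, the 2 values of T and the 2 values of N. A short sketch of how such a configuration list could be generated is given below; the dictionary keys are hypothetical names chosen for this example.

```python
from itertools import product

RHO_VALUES = (0.1, 0.5, 0.9)     # values of beta_1 + 1
T_VALUES = (5, 10)               # number of time observation points
N_VALUES = (100, 1000)           # number of individuals

# product() varies the last factor fastest, which reproduces the row order of Table 5.3.
configs = [
    {"simulation": k, "rho": rho, "beta1": rho - 1.0, "T": T, "N": N}
    for k, (rho, T, N) in enumerate(product(RHO_VALUES, T_VALUES, N_VALUES), start=1)
]
```

Looping over this list keeps the simulation code identical across all 12 configurations, so that only the parameter values change between runs.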
The implementation in MATLAB of these simulations is presented in the next section.
5.3 MATLAB Implementation
MATLAB (short for MATrix LABoratory) is a numerical computing environment, well suited
for performing calculations with matrices. Since all 4 estimators studied in this master's
dissertation are presented in matrix formulation, MATLAB was an obvious choice for implementing
these estimators, performing the Monte Carlo simulations and processing the results. The total
implementation of the simulations in MATLAB consists of 4 functions and 2 scripts:
• A function implementing the Fixed Effects estimator ('FE_final.m').
• A function implementing the Anderson-Hsiao IV estimator ('AHIV_final.m').
• A function implementing the First-Differenced GMM estimator ('FDGMM_final.m').
• A function implementing the System GMM estimator ('SGMM_final.m').
• A script implementing the Monte Carlo simulations ('MC_simulations_final.m').
• A script implementing the processing and visualization of the results
('MC_results_processing_final.m').
The main script in the hierarchy is 'MC_simulations_final.m', implementing the Monte Carlo
simulation. It is here that the D = 1000 iterations of the simulation are performed. The interplay
of the different functions and scripts is visualized in the figure below.
Figure 5.1: Schematic diagram representing MATLAB implementation
This procedure will be executed for the 12 simulations presented in Table 5.3. The exact
implementation of the different MATLAB scripts can be found in the appendix of this thesis.
5.4 Simulation Results
In this section, the results of the Monte Carlo simulations are presented and discussed. The
simulation results are grouped into 3 groups, according to the value of the lagged dependent
variable parameter β1 + 1. In each group, the values for N (number of individuals in data set)
and for T (number of time observation points in data set) are varied between 2 values: N = 100
and N = 1000, and T = 5 and T = 10. This creates a total of 4 different MC simulations per
value of β1 + 1.
5.4.1 Simulation results for β1 + 1 = 0.1
[Figure 5.2: MC simulation 1. Panel title: (1+B1) = 0.1, T = 5, N = 100. Bars per estimator (FE, AHIV, FDGMM, SGMM) for the average absolute bias (axis 0 to 1) and the average variance (axis 0 to 2).]
[Figure 5.3: MC simulation 2. Panel title: (1+B1) = 0.1, T = 5, N = 1000. Bars per estimator (FE, AHIV, FDGMM, SGMM) for the average absolute bias (axis 0 to 1) and the average variance (axis 0 to 2).]
[Figure 5.4: MC simulation 3. Panel title: (1+B1) = 0.1, T = 10, N = 100. Bars per estimator (FE, AHIV, FDGMM, SGMM) for the average absolute bias (axis 0 to 1) and the average variance (axis 0 to 2).]
[Figure 5.5: MC simulation 4. Panel title: (1+B1) = 0.1, T = 10, N = 1000. Bars per estimator (FE, AHIV, FDGMM, SGMM) for the average absolute bias (axis 0 to 1) and the average variance (axis 0 to 2).]
When studying the results of the 4 MC simulations in which β1 + 1 = 0.1, it is clear that the
System GMM estimator outperforms the other 3 estimators both in terms of consistency (minimal
average absolute bias) and efficiency (minimal average variance). This estimator in particular
outperforms the First-Differenced GMM estimator in each of the 4 simulations, regardless of
the size of N or T. The use of an extended set of instruments in the System GMM estimator
clearly improves its performance in comparison to the smaller set of instruments used in the
First-Differenced GMM estimator. This is particularly true when the size of the data set is small
(small T and small N).
When the value of T is low (T = 5), the Nickell bias of the Fixed Effects estimator becomes
clearly visible. In simulations 1 and 2, where T = 5, the FE estimator performs worst,
having the largest bias and variance. Furthermore, this bias becomes larger, relative to the
bias of the other estimators, when N increases. This is in accordance with the properties of the
Nickell bias presented in Chapter 3. When T = 10, however, the FE estimator is no longer the
worst performer of the 4 estimators. For T = 10, the Anderson-Hsiao estimator exhibits the
largest bias and variance of all 4 estimators for both N = 100 and N = 1000. The use of a limited
set of level instruments for the differenced lagged dependent variables seems to perform worse
relative to the other estimators when used in estimating the parameters of the Firm Growth
Model for larger values of T.
Lastly, it can be seen that in general all estimators perform best when the data set is largest.
This is no surprise, since a larger data set enables more accurate estimation because of the larger
amount of information available for estimation.
5.4.2 Simulation results for β1 + 1 = 0.5
[Figure 5.6: MC simulation 5. Panel title: (1+B1) = 0.5, T = 5, N = 100. Bars per estimator (FE, AHIV, FDGMM, SGMM) for the average absolute bias (axis 0 to 1) and the average variance (axis 0 to 2).]
[Figure 5.7: MC simulation 6. Panel title: (1+B1) = 0.5, T = 5, N = 1000. Bars per estimator (FE, AHIV, FDGMM, SGMM) for the average absolute bias (axis 0 to 1) and the average variance (axis 0 to 2).]
[Figure 5.8: MC simulation 7. Panel title: (1+B1) = 0.5, T = 10, N = 100. Bars per estimator (FE, AHIV, FDGMM, SGMM) for the average absolute bias (axis 0 to 1) and the average variance (axis 0 to 2).]
[Figure 5.9: MC simulation 8. Panel title: (1+B1) = 0.5, T = 10, N = 1000. Bars per estimator (FE, AHIV, FDGMM, SGMM) for the average absolute bias (axis 0 to 1) and the average variance (axis 0 to 2).]
The simulation results for β1 + 1 = 0.5 show that the System GMM estimator also performs best
for intermediate values of the lagged dependent variable parameter. The difference with the
other 3 estimators is again largest for small N and small T. The performance of the
First-Differenced GMM estimator is again far below that of the System GMM estimator in all 4
simulations.
The FE estimator only performs worst when T = 5 and N = 1000. Here, the Nickell bias is
most visible, as explained in Chapter 3. In general, however, the FE estimator performs better
for β1 + 1 = 0.5 than for β1 + 1 = 0.1. In the 3 other simulations, the Anderson-Hsiao estimator
performs worst. We can therefore conclude that the limited set of instruments used in
the AHIV estimator becomes weaker when the value of the lagged dependent variable parameter
increases in the Firm Growth Model.
5.4.3 Simulation results for β1 + 1 = 0.9
[Figure 5.10: MC simulation 9. Panel title: (1+B1) = 0.9, T = 5, N = 100. Bars per estimator (FE, AHIV, FDGMM, SGMM) for the average absolute bias (axis 0 to 1) and the average variance (axis 0 to 2).]
[Figure 5.11: MC simulation 10. Panel title: (1+B1) = 0.9, T = 5, N = 1000. Bars per estimator (FE, AHIV, FDGMM, SGMM) for the average absolute bias (axis 0 to 1) and the average variance (axis 0 to 2).]
[Figure 5.12: MC simulation 11. Panel title: (1+B1) = 0.9, T = 10, N = 100. Bars per estimator (FE, AHIV, FDGMM, SGMM) for the average absolute bias (axis 0 to 1) and the average variance (axis 0 to 2).]
[Figure 5.13: MC simulation 12. Panel title: (1+B1) = 0.9, T = 10, N = 1000. Bars per estimator (FE, AHIV, FDGMM, SGMM) for the average absolute bias (axis 0 to 1) and the average variance (axis 0 to 2).]
In the last set of 4 MC simulations, this time for β1 + 1 = 0.9, there is again a clear winner.
The System GMM estimator outperforms all other estimators and the difference in performance
is larger than for low values of β1 + 1. This difference, however, is not caused by the fact that
the System GMM estimator performs better in case of higher values for β1 + 1. It performs
approximately the same as in the previous two sets of simulations. It is the bad performance of
the other estimators that causes this difference.
The FE estimator performs worse for high values of β1 + 1 because of the larger Nickell bias
for larger values of β1 + 1. In Chapter 3, it was highlighted that the Nickell bias increases for
increasing values of the lagged dependent variable parameter. This phenomenon is now clearly
visible when we compare this set of 4 simulations to the 2 previous sets in which β1 + 1 was
lower. It was also highlighted in Chapter 3 that the Nickell bias increases for increasing values
of N , and decreases for increasing values of T . This can also clearly be seen in this last set of
simulations. Simulation 10 shows the largest Nickell bias, since T is small and N is large. In
simulation 11, the Nickell bias is smallest relative to the biases of the other estimators, since
here T is larger and N smaller.
The bad performance of the Anderson-Hsiao and the First-Differenced GMM
estimator has the same origin. At the end of section 4.4, it was stated that the reason
behind possible poor performance of the FDGMM estimator is the problem of weak instruments,
meaning that the instruments are only weakly correlated with the endogenous variables they
are trying to instrument. This correlation becomes weaker when the dependent variable is close
to a random walk, which is the case when β1 + 1 gets close to 1. It is therefore these weak
level instruments, used to instrument the differenced lagged dependent variables, that cause the
poor performance of the AHIV and the FDGMM estimator when β1 + 1 is close to 1. In all 4
simulations of this set, the Anderson-Hsiao estimator performs worse than the FDGMM
estimator. The FDGMM estimator uses a larger set of instruments, and therefore uses more information
when estimating the parameters in the Firm Growth Model. This leads to better estimates than
is the case for the limited set of instruments used by the AHIV estimator.
5.4.4 Conclusion simulation results
When comparing the 4 estimators in this specific case of the Firm Growth Model, we can con-
clude that in general the System GMM estimator performs best, regardless of the sample size and
value of the lagged dependent variable parameter. This estimator exhibits the lowest average
absolute bias and lowest average variance in all 12 Monte Carlo simulations. It can therefore be
concluded that it is most consistent and most efficient in comparison to the other 3 estimators
when estimating the Firm Growth Model.
The Fixed Effects estimator clearly exhibits a large Nickell bias when T is small and N is large.
The Nickell bias is highest when β1 + 1 gets close to 1. This estimator, however, outperformed the
Anderson-Hsiao estimator in all cases where T = 10, despite being simpler. It is therefore
justified to say that, for estimating the Firm Growth Model, the FE estimator can be chosen over
the AHIV estimator when confronted with data sets in which the number of time observation
points is large.
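The dependence of the Nickell bias on T can be reproduced in a small experiment. The sketch below is in Python rather than MATLAB and makes simplifying assumptions that differ from the thesis set-up: a plain AR(1) panel with standard normal fixed effects and noise, no further regressors, an illustrative autoregressive parameter of 0.5, and a single large simulated panel per setting instead of repeated Monte Carlo iterations. The function name is hypothetical.

```python
import random


def within_estimate(rho, n_units, n_periods, seed=0, burn_in=50):
    """Fixed Effects (within) estimate of rho in y_it = a_i + rho*y_i(t-1) + e_it."""
    rng = random.Random(seed)
    numerator = 0.0
    denominator = 0.0
    for _ in range(n_units):
        alpha = rng.gauss(0.0, 1.0)   # unit-specific fixed effect
        y = alpha / (1.0 - rho)       # start at the unit's long-run mean
        for _ in range(burn_in):
            y = alpha + rho * y + rng.gauss(0.0, 1.0)
        series = [y]
        for _ in range(n_periods):
            y = alpha + rho * y + rng.gauss(0.0, 1.0)
            series.append(y)
        lagged, current = series[:-1], series[1:]
        mean_lag = sum(lagged) / n_periods
        mean_cur = sum(current) / n_periods
        # Demeaning per unit removes the fixed effect (within transformation).
        for lag_value, cur_value in zip(lagged, current):
            numerator += (lag_value - mean_lag) * (cur_value - mean_cur)
            denominator += (lag_value - mean_lag) ** 2
    return numerator / denominator


TRUE_RHO = 0.5  # illustrative value, not taken from the thesis
bias_short = within_estimate(TRUE_RHO, n_units=500, n_periods=5) - TRUE_RHO
bias_long = within_estimate(TRUE_RHO, n_units=500, n_periods=20) - TRUE_RHO
print(bias_short, bias_long)
```

In this setting the estimate is biased downwards in both cases, and the bias is considerably smaller for the longer panel, in line with the behaviour described above.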
Comparing the AHIV estimator with the First-Differenced GMM estimator, we can see that
the FDGMM estimator is the better performer in all simulations. This is no surprise, since the
FDGMM estimator is in a way an extension of the AHIV estimator, having an extended set
of instruments. For the same reason, both estimators perform badly when the value of
β1 + 1 gets close to 1. The instruments of both estimators become weak, which leads to biased
estimates.
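How the FDGMM instrument set extends the AHIV one can be sketched for a single cross-sectional unit and a single endogenous regressor. The Python below is an illustration, not the thesis's MATLAB code, and the function names are hypothetical. It follows the block layout used in Appendix A.3: the i-th differenced equation receives the first i observations of the regressor as instruments, giving (T−2)(T−1)/2 columns. The AHIV estimator in Appendix A.2 uses only one lagged level per equation.

```python
def ahiv_instrument_column(x):
    """One lagged level per differenced equation (single instrument column)."""
    return [x[r] for r in range(len(x) - 1)]


def fdgmm_instrument_block(x):
    """All available lagged levels per differenced equation, in block layout."""
    n_rows = len(x) - 1                     # T-2 differenced equations
    n_cols = n_rows * (n_rows + 1) // 2     # (T-2)(T-1)/2 instrument columns
    block = [[0.0] * n_cols for _ in range(n_rows)]
    start = 0
    for r in range(n_rows):
        for c in range(r + 1):
            block[r][start + c] = x[c]      # equation r uses x[0], ..., x[r]
        start += r + 1
    return block


# One unit with T-1 = 4 observations of the regressor (made-up values).
x_unit = [1.0, 2.0, 3.0, 4.0]
print(ahiv_instrument_column(x_unit))
for row in fdgmm_instrument_block(x_unit):
    print(row)
```

Each row of the FDGMM block contains the AHIV instrument for that equation plus all earlier levels, which is the additional information the FDGMM estimator exploits.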
Chapter 6
Conclusion
In this master’s dissertation, 4 estimators used for estimating dynamic panel data models were
compared. Monte Carlo simulations were used to perform this comparison, and the Firm Growth
Model was chosen as the dynamic panel data model during the Monte Carlo simulations. This
model was presented in Chapter 2 of this master’s dissertation.
The 4 estimators under study were: the Fixed Effects estimator, the Anderson-Hsiao Instrumental
Variable estimator, the First-Differenced GMM estimator and the System GMM estimator. All
4 estimators were presented briefly in Chapters 3 and 4, using matrix formulation. This
formulation was chosen because the estimators were implemented in MATLAB in the same format.
The theoretical advantages and shortcomings of the 4 estimators were presented in
these chapters as well. For the Fixed Effects estimator, the Nickell bias is the most important
shortcoming, making estimates for data sets with a low number of time observation points and
a high number of individuals unreliable. The Anderson-Hsiao and the First-Differenced GMM
estimators share the problem of weak instruments when the dependent variable
in the model gets close to a random walk. This occurs when the parameter corresponding to
the lagged dependent variable in the model gets close to 1. These weak instruments cause the
estimates to be unreliable. The System GMM estimator is the most complicated of
the 4 estimators studied here, but in theory has the best performance.
The theoretical properties of the 4 estimators were put to the test using Monte Carlo
simulations, implemented in the mathematical software package MATLAB. Each estimator was
implemented in a separate MATLAB function, and the simulation script called these functions
during execution. In the simulation script, a data set was generated in each iteration of the
Monte Carlo simulation, based upon real-world firm growth data of firms in Flemish
municipalities. Processing of the results of each Monte Carlo simulation was done in a separate
script. In this script, the average absolute bias and the average variance of the estimators were
calculated, serving as performance measures for consistency and efficiency, respectively. In total,
12 separate simulations were performed. Between these simulations, the value of 3 parameters
was changed: the value of the lagged dependent variable parameter (β1 + 1), the number of time
observation points in the generated data set (T) and the number of individuals in the generated
data set (N).
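The data generation for one firm can be sketched as follows. This is a Python paraphrase of the generating loop in the simulation script of Appendix A.5, not the MATLAB code itself, and the function names are hypothetical. The coefficient values and the means and standard deviations of the initial draws are copied from that script, with β1 = −0.9 as in the listing. As in the script, age is advanced by one unit per period and no firm-specific intercept is added.

```python
import math
import random

B1 = -0.9
COEFFS = [B1 + 1, -0.0804, -0.0048, 0.0045, 0.007]  # values from Appendix A.5


def regressors(ln_s, ln_a):
    """Regressor row: ln(S), ln(S)^2, ln(A), ln(A)^2, ln(S)*ln(A)."""
    return [ln_s, ln_s ** 2, ln_a, ln_a ** 2, ln_s * ln_a]


def dot(row, coeffs):
    return sum(x * b for x, b in zip(row, coeffs))


def simulate_firm(n_periods, rng, coeffs=COEFFS):
    """Return (X rows, Y values) for one firm, with n_periods - 1 observations."""
    ln_s0 = rng.gauss(6.3686, 1.4655)   # initial log size
    ln_a0 = rng.gauss(2.5501, 0.8491)   # initial log age
    ln_s = dot(regressors(ln_s0, ln_a0), coeffs) + rng.gauss(0.0, 1.0)
    ln_a = math.log(math.exp(ln_a0) + 1)
    rows, outcomes = [], []
    for _ in range(n_periods - 1):
        row = regressors(ln_s, ln_a)
        y = dot(row, coeffs) + rng.gauss(0.0, 1.0)
        rows.append(row)
        outcomes.append(y)
        ln_s = y                               # next period's lagged log size
        ln_a = math.log(math.exp(ln_a) + 1)    # age increases by one unit
    return rows, outcomes


rows, outcomes = simulate_firm(5, random.Random(1))
print(len(rows), len(outcomes))
```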
In all 12 Monte Carlo simulations, the System GMM estimator was more consistent and more
efficient than the other 3 estimators. We can therefore conclude that the System GMM estimator
yields the best estimates when estimating the parameters in the Firm Growth Model. The Fixed
Effects estimator clearly exhibited a Nickell bias when T was small and N was large. The bias was
highest when the value of the lagged dependent variable parameter β1 + 1 got close to 1. When T
was large and N was small, however, the simpler Fixed Effects estimator outperformed the
more complicated Anderson-Hsiao and First-Differenced GMM estimators. These 2 estimators
performed badly when β1 + 1 got close to 1, because of the weak instruments they use.
The FDGMM estimator always outperformed the AHIV estimator, since the FDGMM
estimator is in a way an extension of the AHIV estimator.
Based upon these results it is safe to say that, when estimating the parameters in the Firm
Growth Model, the System GMM estimator should be the preferred option in order to achieve
the best results.
Appendix A
Matlab Implementation of
Estimators and Monte Carlo
Simulations
A.1 FE_final.m
function [Alpha, Beta] = FE_final(Y, X, n_units)
% This function implements the Fixed Effects estimator.
% Bob Mertens 2017

% Model in matrix notation: Y = D*Alpha + X*Beta + e.
% Beta = coefficients for regressors (no intercept coefficient!).
% Alpha = Fixed Effect coefficients per cross-sectional unit.
% Y = dependent variables matrix.
% X = regressor values, same structure as Y matrix.
% Structure: rows grouped per cross-sectional unit, and advancing time unit.

%% Constructing auxiliary matrices to be used during calculations
D = kron(eye(n_units), ones(length(Y)/n_units, 1));
P_D = D*inv(D'*D)*D';
M_D = eye(length(Y)) - P_D;

%% Calculating Beta and fixed effects
Beta = inv(X'*M_D*X)*X'*M_D*Y;
Alpha = inv(D'*D)*D'*(P_D*Y - P_D*X*Beta);

end
A.2 AHIV_final.m
function [Beta] = AHIV_final(Y, X, n_units, endo)
% This function implements the First-Differenced IV estimator (Anderson-Hsiao).
% Bob Mertens 2017

% Beta = calculated coefficients of regressors (no intercept coefficient!).
% Y = dependent variables matrix.
% X = regressor values (including autoregressive regressors), same structure as Y matrix.
% (Structure: rows grouped per cross-sectional unit, and advancing time unit)
% endo: vector with size equal to number of regressors, 1 for endogenous, 0 for exogenous.

%% Constructing auxiliary matrices to be used during calculations
T = length(Y)/n_units + 1;
M_d = [zeros(T-2,1) diag(ones(T-2,1))] - [diag(ones(T-2,1)) zeros(T-2,1)];
F = kron(eye(n_units), M_d);

%% Create Y_diff: first-differenced dependent variables
Y_diff = F*Y;

%% Create X_diff: first-differenced regressor variables
X_diff = F*X;

%% Creating Z: instrument matrix containing all instrument variables
Z_l = kron(eye(n_units), [eye(T-2) zeros(T-2,1)]);
Z = [];
for j = 1:size(X,2)
    if endo(j) == 1
        Z_j = Z_l*X(:,j);
    else
        Z_j = X_diff(:,j);
    end
    Z = [Z_j Z];
end

%% Calculating Beta.
P = Z*(Z'*Z)^(-1)*Z';
Beta = (X_diff'*P*X_diff)^(-1)*X_diff'*P*Y_diff;
end
A.3 FDGMM_final.m
function [Beta] = FDGMM_final(Y, X, n_units, endo)
% This function implements the First-Differenced GMM estimator,
% using a two-step procedure.
% Bob Mertens 2017

% Beta = coefficients for regressors (no intercept coefficient,
% autoregressive coefficients included).
% Y = dependent variables vector.
% X = regressor values (including autoregressive regressors), same structure as Y matrix.
% Structure: rows grouped per cross-sectional unit, and advancing time unit.
% endo: vector with size equal to number of regressors, 1 for endogenous, 0 for exogenous.

%% Constructing auxiliary matrices to be used during calculations
T = length(Y)/n_units + 1;
M_d = [zeros(T-2,1) diag(ones(T-2,1))] - [diag(ones(T-2,1)) zeros(T-2,1)];
F = kron(eye(n_units), M_d);

%% Create Y_diff: first-differenced dependent variables
Y_diff = F*Y;

%% Create X_diff: first-differenced regressor variables
X_diff = F*X;

%% Creating Z: instrument matrix containing all instrument variables
Z = [];
for j = 1:size(X,2)
    if endo(j) == 1
        Z_j = zeros((T-2)*n_units, (T-2)*(T-1)/2);
        for k = 1:n_units
            l = 0;
            for i = 1:T-2
                if i == 1
                    l = 1;
                else
                    l = l + (i-1);
                end
                Z_j((k-1)*(T-2)+i, l:l+i-1) = X((k-1)*(T-1)+1:(k-1)*(T-1)+i, j);
            end
        end
    else
        Z_j = X_diff(:,j);
    end
    Z = [Z Z_j];
end

%% Calculate first-step Beta.
G = kron(eye(n_units), (M_d*M_d')');
W_s1 = inv(Z'*G*Z);
Beta_s1 = inv(X_diff'*Z*W_s1*Z'*X_diff)*X_diff'*Z*W_s1*Z'*Y_diff;

%% Calculate second-step Beta using optimal weighting matrix.
e_diff = Y_diff - X_diff*Beta_s1;
W_opt = inv(Z'*e_diff*e_diff'*Z);
Beta = inv(X_diff'*Z*W_opt*Z'*X_diff)*X_diff'*Z*W_opt*Z'*Y_diff;
end
A.4 SGMM_final.m
function [Beta] = SGMM_final(Y, X, n_units, endo)
% This function implements the System GMM estimator, using a two-step procedure.
% Bob Mertens 2017

% Beta = coefficients for regressors (no intercept coefficient,
% dynamic coefficients included).
% Y = dependent variables vector.
% X = regressor values (including autoregressive regressors), same structure as Y matrix.
% Structure: rows grouped per cross-sectional unit, and advancing time unit.
% endo: vector with size equal to number of regressors, 1 for endogenous, 0 for exogenous.

%% Constructing auxiliary matrices to be used during calculations
T = length(Y)/n_units + 1;
M_d = [zeros(T-2,1) diag(ones(T-2,1))] - [diag(ones(T-2,1)) zeros(T-2,1)];
M_l = [zeros((T-2),1) eye(T-2)];
M_tot = [M_d; M_l];
F = kron(eye(n_units), M_tot);

%% Create Y_tot: first-differenced dependent variables stacked on their levels
Y_tot = F*Y;

%% Create X_tot: first-differenced regressor variables stacked on their levels
X_tot = F*X;

%% Creating Z: instrument matrix containing all instrument variables
Z = [];
for j = 1:size(X,2)
    if endo(j) == 1
        Z_j = zeros((T-2)*n_units, (T-2)*(T-1)/2);
        for k = 1:n_units
            l = 0;
            for i = 1:T-2
                if i == 1
                    l = 1;
                else
                    l = l + (i-1);
                end
                Z_j((k-1)*(T-2)+i, l:l+i-1) = X((k-1)*(T-1)+1:(k-1)*(T-1)+i, j);
                Z_j((k-1)*(T-2)*2+(T-2)+i, (T-2)*(T-1)/2+i) = X_tot((k-1)*(T-2)*2+i, j);
            end
        end
    else
        Z_j = X_tot(:,j);
    end
    Z = [Z Z_j];
end

%% Calculate first-step Beta
G_D = M_d*M_d';
G_L = eye(T-2);
G_T = blkdiag(G_D, G_L);
G = kron(eye(n_units), G_T');
W_s1 = inv(Z'*G*Z);
Beta_s1 = inv(X_tot'*Z*W_s1*Z'*X_tot)*X_tot'*Z*W_s1*Z'*Y_tot;

%% Calculate second-step Beta using optimal weighting matrix.
e_tot = Y_tot - X_tot*Beta_s1;
W_opt = inv(Z'*e_tot*e_tot'*Z);
Beta = inv(X_tot'*Z*W_opt*Z'*X_tot)*X_tot'*Z*W_opt*Z'*Y_tot;
end
A.5 MC_simulations_final.m
%% Script implementing the Monte Carlo simulations of 4 estimators
% Model: dynamic Firm Growth Model
% Data generation: based upon real-world population (Flemish firms)
% Bob Mertens 2017

close all
clear all

%% Input parameters for MC simulation
T = 5;        % sample size, time dimension
n_c = 100;    % sample size, cross-sectional dimension
B1 = -0.9;    % value of beta_1
N = 1000;     % number of iterations in simulation

%% Model:
% ln(S_it) = B0_i + (B1+1)*ln(S_i(t-1)) + B2*(ln(S_i(t-1)))^2
%          + B3*ln(A_i(t-1)) + B4*(ln(A_i(t-1)))^2 + B5*ln(S_i(t-1))*ln(A_i(t-1))

B = [B1+1 -0.0804 -0.0048 0.0045 0.007]';   % Coefficients for regressors

Beta_est_tot_FE = zeros(length(B), N);
Beta_est_tot_AHIV = zeros(length(B), N);
Beta_est_tot_FDGMM = zeros(length(B), N);
Beta_est_tot_SGMM = zeros(length(B), N);

%% N iterations of MC simulation
for i = 1:N
    i   % display the iteration counter
    Y = zeros((T-1)*n_c, 1);
    X = zeros((T-1)*n_c, 5);
    for k = 1:n_c
        lnS0 = normrnd(6.3686, 1.4655);
        lnA0 = normrnd(2.5501, 0.8491);
        lnS1 = [lnS0 lnS0^2 lnA0 lnA0^2 lnS0*lnA0]*B + normrnd(0,1);
        lnA1 = log(exp(lnA0)+1);
        for j = 1:(T-1)
            if (j == 1)
                X((k-1)*(T-1)+j, :) = [lnS1 lnS1^2 lnA1 lnA1^2 lnS1*lnA1];
                Y((k-1)*(T-1)+j) = X((k-1)*(T-1)+j, :)*B + normrnd(0,1);
            else
                lnS = Y((k-1)*(T-1)+j-1);
                lnA = log(exp(X((k-1)*(T-1)+j-1, 3))+1);
                X((k-1)*(T-1)+j, :) = [lnS lnS^2 lnA lnA^2 lnS*lnA];
                Y((k-1)*(T-1)+j) = X((k-1)*(T-1)+j, :)*B + normrnd(0,1);
            end
        end
    end

    % Calculation of model coefficients
    endo = [1 1 0 0 1];
    [Alpha, Beta_est_tot_FE(:,i)] = FE_final(Y, X, n_c);
    Beta_est_tot_AHIV(:,i) = AHIV_final(Y, X, n_c, endo);
    Beta_est_tot_FDGMM(:,i) = FDGMM_final(Y, X, n_c, endo);
    Beta_est_tot_SGMM(:,i) = SGMM_final(Y, X, n_c, endo);

end

%% Saving the results
date = date;
time = fix(clock);
name = sprintf('%i_%i_%i_%i_%i_results_B_1_1_%.1f_N_%i_T_%i.mat', ...
    time(1), time(2), time(3), time(4), time(5), B1+1, n_c, T);
save(name);
A.6 MC_results_processing_final.m
%% Script implementing the processing and visualization of the MC simulation results.
% Bob Mertens 2017

close all

%% Calculating absolute difference between estimates and population parameters.
Beta_est_tot_FE_diff = abs(Beta_est_tot_FE - repmat(B, 1, N));
Beta_est_tot_AHIV_diff = abs(Beta_est_tot_AHIV - repmat(B, 1, N));
Beta_est_tot_FDGMM_diff = abs(Beta_est_tot_FDGMM - repmat(B, 1, N));
Beta_est_tot_SGMM_diff = abs(Beta_est_tot_SGMM - repmat(B, 1, N));

%% Calculating average absolute bias of each parameter
Beta_est_tot_FE_bias = mean(Beta_est_tot_FE_diff');
Beta_est_tot_AHIV_bias = mean(Beta_est_tot_AHIV_diff');
Beta_est_tot_FDGMM_bias = mean(Beta_est_tot_FDGMM_diff');
Beta_est_tot_SGMM_bias = mean(Beta_est_tot_SGMM_diff');

%% Calculating average absolute bias over all parameters
Beta_est_tot_FE_bias_avg = mean(Beta_est_tot_FE_bias);
Beta_est_tot_AHIV_bias_avg = mean(Beta_est_tot_AHIV_bias);
Beta_est_tot_FDGMM_bias_avg = mean(Beta_est_tot_FDGMM_bias);
Beta_est_tot_SGMM_bias_avg = mean(Beta_est_tot_SGMM_bias);

%% Calculating square of absolute biases
Beta_est_tot_FE_diff2 = Beta_est_tot_FE_diff.^2;
Beta_est_tot_AHIV_diff2 = Beta_est_tot_AHIV_diff.^2;
Beta_est_tot_FDGMM_diff2 = Beta_est_tot_FDGMM_diff.^2;
Beta_est_tot_SGMM_diff2 = Beta_est_tot_SGMM_diff.^2;

%% Calculating variance per parameter
Beta_est_tot_FE_var = mean(Beta_est_tot_FE_diff2');
Beta_est_tot_AHIV_var = mean(Beta_est_tot_AHIV_diff2');
Beta_est_tot_FDGMM_var = mean(Beta_est_tot_FDGMM_diff2');
Beta_est_tot_SGMM_var = mean(Beta_est_tot_SGMM_diff2');

%% Calculating average variance over all parameters
Beta_est_tot_FE_var_avg = mean(Beta_est_tot_FE_var);
Beta_est_tot_AHIV_var_avg = mean(Beta_est_tot_AHIV_var);
Beta_est_tot_FDGMM_var_avg = mean(Beta_est_tot_FDGMM_var);
Beta_est_tot_SGMM_var_avg = mean(Beta_est_tot_SGMM_var);

%% Figure plotting
c = {'FE', 'AHIV', 'FDGMM', 'SGMM'};

figure

a = [[Beta_est_tot_FE_bias_avg Beta_est_tot_AHIV_bias_avg ...
      Beta_est_tot_FDGMM_bias_avg Beta_est_tot_SGMM_bias_avg]' zeros(4,1)];
b = [zeros(4,1) [Beta_est_tot_FE_var_avg Beta_est_tot_AHIV_var_avg ...
      Beta_est_tot_FDGMM_var_avg Beta_est_tot_SGMM_var_avg]'];

[AX, H1, H2] = plotyy([1:4], a, [1:4], b, 'bar', 'bar');
set(H1, 'FaceColor', 'r')   % a
set(H2, 'FaceColor', 'b')   % b

str = sprintf('(1+B_1)=%.1f, T=%i, N=%i', B(1), T, n_c);
title(str, 'FontSize', 16, 'FontWeight', 'bold')
lh1 = legend(H1, 'Bias', 'Location', 'northwest');
set(lh1, 'FontSize', 12);
lh2 = legend(H2, 'Var', 'Location', 'northeast');
set(lh2, 'FontSize', 12);
set(AX, 'xticklabel', c, 'FontSize', 14);
ylim(AX(1), [0 1])
ylim(AX(2), [0 2])
set(AX(1), 'ytick', [0:0.125:1])
set(AX(2), 'ytick', [0:0.25:2])
ylabel(AX(1), 'Average absolute bias', 'FontSize', 14)
ylabel(AX(2), 'Average variance', 'FontSize', 14)
grid on
Bibliography
[1] Anderson, T.W./ Hsiao, C., 1982 “Formulation and Estimation of Dynamic Models Using
Panel Data” Journal of Econometrics 18, 47-82.
[2] Arellano, M., 1989 “A Note on the Anderson-Hsiao Estimator for Panel Data” Economics
Letters 31, 337-341.
[3] Arellano, M./ Bond, S.R., 1991 “Some Tests of Specification for Panel Data: Monte Carlo
Evidence and an Application to Employment Equations” Review of Economic Studies 58,
277-297.
[4] Arellano, M./ Bover, O., 1995 “Another Look at Instrumental Variables Estimation of
Error-Component Models” Journal of Econometrics 68, 29-51.
[5] Blundell, R./ Bond, S.R., 1998 “Initial Conditions and Moment Restrictions in Dynamic
Panel Data Models” Journal of Econometrics 87, 115-143.
[6] Evans D.S., 1987a “The relationship between firm growth, size and age: estimates for 100
manufacturing industries” Journal of Industrial Economics 35, 567-581.
[7] Evans D.S., 1987b “Tests of alternative theories of firm growth” Journal of Political Economy
95, 657-674.
[8] FitzRoy, F.R., Kraft, K., 1991 “Firm size, growth and innovation: some evidence from West
Germany” In: Acs, Z.J., Audretsch, D.B. (Eds.), Innovation and Technological Change: an
International Comparison, Harvester Wheatsheaf, New York.
[9] Hall B.H., 1987 “The relationship between firm size and firm growth in the US manufacturing
sector” Journal of Industrial Economics 35, 583-606.
[10] Hansen L.P., 1982 “Large Sample Properties of Generalized Method of Moments Estima-
tors” Econometrica 50, 1029-1054.
[11] Hart P.E., Prais S.J., 1956 “The analysis of business concentration: a statistical approach”
Journal of the Royal Statistical Society (Series A), 150-191.
[12] Hymer S., Pashigian P., 1962 “Firm size and rate of growth” Journal of Political Economy
52, 556-569.
[13] Jovanovic B., 1982 “Selection and evolution of industry” Econometrica 50, 649-670.
[14] Kumar M.S., 1985 “Growth, acquisition activity and firm size: evidence from the United
Kingdom” Journal of Industrial Economics, 327-338.
[15] Lucas R.E., 1967 “Adjustment costs and the theory of supply” Journal of Political Economy
75, 321-344.
[16] Mansfield E., 1962 “Entry, Gibrat’s law, innovation, and the growth of firms” American
Economic Review 52, 1023-1051.
[17] Mata J., 1994 “Firm growth during infancy” Small Business Economics 6, 27-93.
[18] Nickell S.J., 1981 “Biases in Dynamic Models with Fixed Effects” Econometrica 49,
1417-1426.
[19] Simon H.A., Bonini C.P., 1958 “The size distribution of business firms” American Economic
Review 48, 607-617.
[20] Van Cauwenberge P., Beyne P., Vander Bauwhede H., 2016 “An Empirical Investigation
of the Influence of Municipal Fiscal Policy on Firm Growth” Environment and Planning C:
Politics and Space 34, 1825 - 1842.