Markov-Switching Vector Autoregressive Models

Markov-Switching Vector Autoregressive Models: Monte Carlo Experiment, Impulse Response Analysis, and Granger-Causal Analysis

Matthieu Droumaguet

Thesis submitted for assessment with a view to obtaining the degree of Doctor of Economics of the European University Institute

Florence, December 2012

Droumaguet, Matthieu (2012), Markov-Switching Vector Autoregressive Models: Monte Carlo experiment, impulse response analysis, and Granger-Causal analysis, European University Institute

DOI: 10.2870/63610

European University Institute Department of Economics


Examining Board

Prof. Massimiliano Marcellino, European University Institute (Supervisor)
Prof. Ana Beatriz Galvão, Queen Mary University of London
Prof. Hans-Martin Krolzig, University of Kent
Prof. Helmut Lütkepohl, DIW Berlin and Freie Universität Berlin

© Matthieu Droumaguet, 2012

No part of this thesis may be copied, reproduced or transmitted without prior permission of the author



Contents

1 Monte Carlo characterization of MS-VARs
1.1 Introduction
1.2 VAR models with Markov-switching in regime
1.3 Estimation
1.4 Monte Carlo experiment
1.5 Finite-sample evidence
1.6 Summary and implications

A Appendix
A.1 Monte Carlo experiment results for MS-VAR models
A.2 Statistics’ ratios of MS-VAR models over VAR models

Bibliography

2 Bayesian impulse responses for MS-VAR models
2.1 Introduction
2.2 Markov-switching vector autoregressive model
2.3 Impulse responses
2.4 Likelihood, prior, and posterior
2.5 Gibbs sampler
2.6 Nonlinearities in oil markets
2.7 Conclusions

B Appendix
B.1 Alternative classical approach, the rolling estimation
B.2 Structural breaks, the Qu and Perron test
B.3 Kilian (2009)’s impulse responses

Bibliography

3 Testing noncausality in MS-VAR models
3.1 Introduction
3.2 MS-VAR model
3.3 Granger Causality - Following Warne (2000)
3.4 Bayesian Testing
3.5 The Block MH sampler for restricted MS-VAR models
3.6 Granger causal analysis of US money-income data
3.7 Conclusions

C Appendix
C.1 Alternative restrictions for noncausality
C.2 Summary of the posterior densities simulations
C.3 Characterization of estimation efficiency

Bibliography



Abstract

The prime theme of this dissertation is the exploration of nonlinear econometric models featuring a hidden Markov chain. Occasional and discrete shifts in regime give econometric models convenient nonlinear dynamics, allowing for structural changes similar to the exogenous economic events occurring in reality.

The first paper sets up a Monte Carlo experiment to explore the finite-sample properties of the estimates of vector autoregressive models subject to switches in regime governed by a hidden Markov chain. The main finding of this article is that the accuracy with which regimes are determined by the expectation maximization algorithm improves when the dimension of the simulated series increases. However, this gain comes at the cost of higher sample size requirements for models with more variables.

The second paper advocates the use of Bayesian impulse responses for a Markov-switching vector autoregressive model. These responses are sensitive to the Markov-switching properties of the model and, being based on densities, allow statistical inference to be conducted. On the premise of structural changes occurring in oil markets, the empirical results of Kilian (2009) are reinvestigated. The effects of the structural shocks are characterized over four estimated regimes. Over time, the regime dynamics evolve toward more competitive oil markets, with the collapse of OPEC.

Finally, the third paper proposes a method of testing restrictions for Granger noncausality in mean, variance and distribution in the framework of Markov-switching VAR models. Due to the nonlinearity of the restrictions derived by Warne (2000), classical tests have limited use. The computational tools for posterior inference consist of a novel Block Metropolis-Hastings sampling algorithm for estimation of the restricted models, and of standard methods of computing the Posterior Odds Ratio. The analysis may be applied to financial and macroeconomic time series with changes of parameter values over time and heteroskedasticity.


Keywords: Markov-switching Vector Autoregressive models, Expectation Maximization algorithm, Monte Carlo experiment, Gibbs Sampling, Impulse Response Analysis, Granger Causality, Regime Inference, Posterior Odds Ratio, Block Metropolis-Hastings Sampling.

JEL classification: C11, C15, C22, C32, C53, E32, Q43



Acknowledgements

Professor Massimiliano Marcellino

Professor Helmut Lütkepohl

Tomasz Wozniak

My family

My dear friends

Thank you


Chapter 1

Characterization of the Estimates of Markov-Switching Vector Autoregressive Models Through Monte Carlo Simulations

Abstract. Through a Monte Carlo experiment, this paper examines the finite-sample properties of the estimates of vector autoregressive models subject to switches in regime governed by a hidden Markov chain. The main finding of this article is that the accuracy with which regimes are determined by the EM algorithm improves when the dimension of the simulated series increases. However, this gain comes at the cost of higher sample size requirements for models with more variables.

I thank Pierre Guerin, Helmut Lütkepohl, and Massimiliano Marcellino for their very useful comments on the paper.


1.1 Introduction

The discipline of econometrics is devoted to the analysis and testing of the empirical relationships between economic variables. Hendry (1996) chronicles the historical debate over the constancy of the parameters underlying economic models, whose essence lies in the essay by Robbins (1932), doubting the existence of permanent and constant values for the formal categories of economic analysis. Or, rephrased through Robbins’s frugal metaphor:

“The demand for herrings, however, is not a simple derivative of needs. It is, as it were, a function of a great many apparently independent variables. It is a function of fashion, and by fashion is meant something more than the ephemeral results of an Eat British Herrings campaign; the demand for herrings might be substantially changed by a change in theological views of the economic subjects entering the market. It is a function of the availability of other foods. [. . . ] Discoveries in the art of cooking may change their relative desirability. Is it possible reasonably to suppose that coefficients derived from the observation of a particular herring market at a particular time and place have any permanent significance - save as Economic History?”

Major exogenous events, such as the formation of the international monetary system at Bretton Woods after the Second World War, are quite likely to redefine the economic landscape and arguably to change the predictive power of formerly insightful econometric models. Hendry (1996) futuristically illustrates this point:

“An analogy might be a spacecraft to a distant planet being exactly on course and forecast to land successfully, just before being destroyed by a meteorite.”

The huge literature on tests for structural breaks, surveyed in Hansen (2001) or Perron (2006), testifies to the attention that econometricians pay to this phenomenon. Once it is agreed that doubt may be cast upon the stability of some data generating processes governing economic time series, one needs to find a methodology to deal with it. An appealing econometric framework taking such structural changes into account is the one including discrete regimes governed by a hidden Markov chain, which models time series as a combination of data-generating processes and was popularized in Hamilton (1989). Occasional and discrete shifts in regime give econometric models the required nonlinear dynamics, allowing for structural changes similar to the exogenous economic events occurring in reality. The unobservable character of the Markov chain is also convenient for the econometrician, who in practice has to draw probabilistic inference about the current regime of the time series. The growing popularity of models with regime switching, and the large scope of economic time series whose behavior breaks dramatically due to some event, are surveyed in Hamilton (2008). Among the most famous applications of such models is certainly Hamilton (1989), where the succession of expansionary and recessionary phases in business cycles is considered. Sims and Zha (2006) use switches in regime within a structural vector autoregressive [VAR] model to assess the impact of changes in U.S. monetary policy. Currency crises were also studied through the Markov-switching framework in Jeanne and Masson (2000), with the empirical example of speculative attacks against the French franc in 1987–1993. The area of fiscal policy is examined by Davig (2004), with the U.S. tax reforms of 1964 and 1981. Markov-switching models are not restricted to economic time series, and applications to financial time series have also been considered, for instance in Dai et al. (2007), where the latent variables introduce regime-shift risks to a dynamic term structure model used for U.S. Treasury zero-coupon bond yields.

However, the finite-sample properties of vector autoregressive models with shifts in regime have scarcely been studied. Besides Psaradakis and Sola (1998), who perform a Monte Carlo experiment on univariate autoregressive [AR] processes with Markov regime-switching in the mean and in the variance, I am not aware of any attempt to characterize the estimates of such models by simulation. This is certainly due to the nonlinearities present in the models, which make their estimation problematic to program. Hence, while the estimation theory has already been formulated (see Krolzig (1997) for full coverage of estimation), few software packages for estimation are available to the practitioner.¹

The contribution of this paper is twofold. Firstly, it extends Psaradakis and Sola (1998) to multivariate time series with up to 20 equations. The Markov-switching vector autoregressive [MS–VAR] models considered for the staged Monte Carlo experiment are models with switches in intercepts. Three classes of models are scrutinized: models with regime switches in the intercepts only, models with regime switches in the variance only, and models with regime switches in all the model parameters, in other words the intercept vector, the autoregressive coefficient matrices, and the variance-covariance matrix. Secondly, studies such as Ang and Bekaert (2002) show that incorporating international short-rate and term spread information into interest rate series provides better regime classification than in the univariate case. Hence I consider statistics for evaluating the accuracy of the estimation of the latent regime in the univariate and multivariate cases. This can be seen as an exercise in precision for dating the turning points of the time series.

¹ An overview of existing software packages is provided in Section 1.3.




The results of this Monte Carlo experiment pave the way for future applications making use of large dimensional MS–VAR models. The gain in precision of regime estimation that occurs when new variables are added to the model justifies the use of large MS–VAR models in applications involving regime changes in the economy and the detection of turning points.

The remainder of the present paper is structured as follows. Section 1.2 introduces multivariate models with switches in regime governed by Markov chains. Section 1.3 discusses the estimation of such processes and the algorithm used to estimate them, the expectation maximization [EM] algorithm. Next, the setup for a Monte Carlo experiment is devised in Section 1.4. Finally, Section 1.5 dissects the results of the experiment.

1.2 Vector autoregressive models with Markov-switching in regime

MS–VAR models are nonlinear models at the confluence of linear vector autoregressive models and hidden Markov chain models. Krolzig (1997) discusses them in depth, from their origin to their estimation, and established the taxonomy of models belonging to the MS–VAR class. Models can be classified into two categories: models with switches in their intercept and models with switches in their mean. While the seminal application of Hamilton (1989) confronted an MS–VAR model incorporating switches in mean with U.S. GDP series, for the study of business cycles, this class of models is more complex to estimate because the mean depends on the history of the latent variable.² The class of models with switches in intercept, which behaves comparatively well in terms of estimation, is better suited to a Monte Carlo experiment.

Within those two categories, models can be further classified depending on which of the VAR parameters are allowed to vary across regimes. The three VAR parameters are the intercept (or mean), the autoregressive coefficients, and the variance-covariance matrix.

The following subsections are dedicated to the three types of models estimated in this Monte Carlo experiment.

² My experience in estimating models with switches in the mean was unfruitful, the EM algorithm suffering from convergence issues. Even after convergence, the estimates depended heavily on the parameters’ initial values.



1.2.1 MSI(M)–VAR(p) model

In MSI(M)–VAR(p) models, as defined in Krolzig (1997), only the intercepts vary across regimes. M stands for the number of regimes and p for the number of lags of autoregressive terms to take into account. If $y_t$ is a K-dimensional time series, the corresponding MSI–VAR model is written as:

\[
y_t =
\begin{cases}
A_{01} + \sum_{i=1}^{p} A_i y_{t-i} + \Sigma^{1/2} e_t, & \text{if } s_t = 1, \\
\quad \vdots \\
A_{0M} + \sum_{i=1}^{p} A_i y_{t-i} + \Sigma^{1/2} e_t, & \text{if } s_t = M,
\end{cases}
\tag{1.1}
\]

where $e_t \sim \mathrm{NID}(0, I_K)$.

Each regime is characterized by an intercept $A_{0i}$. The autoregressive terms $A_1, \ldots, A_p$ and the variance-covariance matrix $\Sigma$ are common to all regimes, while the intercept switches according to a hidden Markov chain. This model is based on the assumption that the intercepts vary with the state of the economy, controlled by the unobserved variable $s_t$. Traditionally, and abstracting from the difference between switches in mean and switches in intercepts, MSI(M)–VAR(p) models were used in business cycle applications, the first of them being Hamilton (1989).

To complete the description of the data-generating process, one introduces a model for the regime generating process, which then allows one to infer the evolution of regimes from the data. In Markov-switching models, the unobservable realization of the regime $s_t \in \{1, \ldots, M\}$ is governed by a discrete time, discrete state Markov stochastic process, which is defined by the transition probabilities:

\[
p_{ij} = \Pr(s_{t+1} = j \mid s_t = i), \qquad \sum_{j=1}^{M} p_{ij} = 1, \qquad \text{for all } i, j \in \{1, \ldots, M\}.
\]

The transition probabilities between the states are collected into the transition probability matrix P:

\[
P =
\begin{bmatrix}
p_{11} & p_{12} & \cdots & p_{1M} \\
p_{21} & p_{22} & \cdots & p_{2M} \\
\vdots & \vdots & \ddots & \vdots \\
p_{M1} & p_{M2} & \cdots & p_{MM}
\end{bmatrix}.
\]

$s_t$ follows an ergodic M-state Markov process. The Markov chain is irreducible in the sense that no state is absorbing, i.e. once a state occurs the chain does not stay stuck in it. Ergodicity of the chain refers to the fact that each state is aperiodic and recurrent. Under these two conditions, the ergodic probability vector of the Markov chain can be interpreted as the unconditional probability distribution of the states.
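As a minimal sketch of the ergodic probability vector described above, the following Python snippet approximates it by repeatedly applying the transition probabilities to an arbitrary starting distribution. The function name `ergodic_probabilities` and the two-regime matrix are illustrative assumptions; the implementation described in Section 1.3 is written in R, and Python is used here only for illustration.

```python
def ergodic_probabilities(P, tol=1e-12, max_iter=10_000):
    """Approximate the ergodic distribution pi solving P' pi = pi.

    P is row-stochastic: P[i][j] = Pr(s_{t+1} = j | s_t = i).
    Power iteration is used here; it is one possible method, chosen for brevity.
    """
    M = len(P)
    pi = [1.0 / M] * M  # arbitrary starting distribution
    for _ in range(max_iter):
        # One step of the chain: new_pi[j] = sum_i pi[i] * P[i][j]
        new_pi = [sum(pi[i] * P[i][j] for i in range(M)) for j in range(M)]
        if max(abs(a - b) for a, b in zip(new_pi, pi)) < tol:
            return new_pi
        pi = new_pi
    return pi


# Hypothetical two-regime example (values chosen for illustration only).
P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = ergodic_probabilities(P)
print([round(x, 4) for x in pi])
```

For a two-state chain the result can be checked by hand, since the ergodic probability of the first state is $p_{21} / (p_{12} + p_{21})$.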

1.2.2 MSH(M)–VAR(p) model

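In MSH(M)–VAR(p) models, only the variance-covariance matrix varies across regimes. They are written as:

\[
y_t =
\begin{cases}
A_0 + \sum_{i=1}^{p} A_i y_{t-i} + \Sigma_1^{1/2} e_t, & \text{if } s_t = 1, \\
\quad \vdots \\
A_0 + \sum_{i=1}^{p} A_i y_{t-i} + \Sigma_M^{1/2} e_t, & \text{if } s_t = M.
\end{cases}
\tag{1.2}
\]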


Each regime is characterized by its own variance-covariance matrix $\Sigma_i$. With Markov-switching heteroskedasticity, the variance of the errors can differ between regimes. After a change in regime there is thus an immediate one-time jump in the variance of the errors. The intercept $A_0$ and autoregressive terms $A_1, \ldots, A_p$ remain constant over all regimes. This model is based on the assumption that heteroskedasticity varies with the state of the economy, controlled by the latent variable $s_t$, which is of the same nature as in MSI–VAR models. These models have recently been used in Lanne et al. (2010), where the Markov regime switching property is exploited to identify structural shocks, in a context where the reduced form error covariance matrix varies across states.

1.2.3 MSIAH(M)–VAR(p) model

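The least restrictive MS–VAR specification is the one where all parameters of the process are conditioned on the state $s_t$. MSIAH–VAR models are written as:

\[
y_t =
\begin{cases}
A_{01} + \sum_{i=1}^{p} A_{i1} y_{t-i} + \Sigma_1^{1/2} e_t, & \text{if } s_t = 1, \\
\quad \vdots \\
A_{0M} + \sum_{i=1}^{p} A_{iM} y_{t-i} + \Sigma_M^{1/2} e_t, & \text{if } s_t = M.
\end{cases}
\tag{1.3}
\]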




Each regime is characterized by an intercept $A_{0i}$, autoregressive parameter matrices $A_{1i}, \ldots, A_{pi}$, and a variance-covariance matrix $\Sigma_i$. In this general specification all parameters are allowed to switch between regimes according to a hidden Markov chain. Like the former two models, this model is based on the assumption that the model parameters vary with the state of the economy, controlled by the unobserved variable $s_t$. Because they introduce switches in the autoregressive parameters, these models have been used for impulse response analysis. For instance, Ehrmann et al. (2003) propose to study regime-dependent impulse responses in the context of such models, conditional on staying in the same regime after the shock.
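To make the data-generating process concrete, the sketch below simulates the simplest special case of the general specification: a univariate series (K = 1) with one lag (p = 1) in which the intercept, the autoregressive coefficient and the standard deviation all depend on the regime. The function name `simulate_ms_ar1`, the uniform draw of the initial regime, the zero starting value and all parameter values are assumptions made for this illustration. Setting the autoregressive coefficients and standard deviations equal across regimes gives an intercept-switching series, and setting the intercepts and autoregressive coefficients equal gives a variance-switching one.

```python
import random


def simulate_ms_ar1(T, P, intercepts, ar_coefs, sigmas, seed=0):
    """Simulate T observations of a regime-switching AR(1) and its regime path.

    P is row-stochastic: P[i][j] = Pr(s_{t+1} = j | s_t = i).
    Regimes are indexed from 0 in the code (regime 1 in the text is index 0).
    """
    rng = random.Random(seed)
    M = len(P)
    s = rng.randrange(M)  # assumption: initial regime drawn uniformly
    y_prev = 0.0          # assumption: the series starts from zero
    series, regimes = [], []
    for _ in range(T):
        # Draw the next regime from row s of the transition matrix.
        s = rng.choices(range(M), weights=P[s])[0]
        # Regime-dependent intercept, AR coefficient and error scale.
        y = intercepts[s] + ar_coefs[s] * y_prev + sigmas[s] * rng.gauss(0.0, 1.0)
        series.append(y)
        regimes.append(s)
        y_prev = y
    return series, regimes


# Hypothetical two-regime parameters, for illustration only.
P = [[0.9, 0.1],
     [0.2, 0.8]]
series, regimes = simulate_ms_ar1(200, P, [0.0, 2.0], [0.5, 0.2], [1.0, 0.5], seed=1)
print(len(series), sum(regimes))
```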

1.3 Estimation

Estimation techniques. Estimation of Markov-switching autoregressive models was initiated in Hamilton (1989). That paper describes how to draw probabilistic inference about the latent state $s_t$ given observations on $y_t$, giving birth to the so-called Hamilton filter. It then relates this result to the sample likelihood, which can be maximized with numerical, gradient-based optimization methods such as the Newton-Raphson algorithm.³ While appropriate for the estimation of a restricted number of parameters, numerical optimizers become impractical for larger dimensional systems or when the number of lags increases.

An answer to that is the expectation maximization algorithm, introduced to the Markov-switching models of time-series econometrics in Hamilton (1990). By construction, the EM algorithm finds an analytic solution to the sample likelihood derivatives from the smoothed inference about the unobserved regime $s_t$. The EM algorithm thus permits the estimation of higher dimensional models. Krolzig (1997) provides an analytical solution to the maximization step for the whole class of Markov-switching models. Another argument in favor of the EM algorithm over gradient-based maximization of the likelihood is made by Mizrach and Watkins (1999), and relates to the existence of local maxima in the likelihood function associated with Markov-switching models. Mixture distributions may have as many local maxima as there are regimes in the model, and likelihood functions derived from these densities inherit the same feature. The EM algorithm, however, does not involve hill-climbing on a likelihood surface but rather provides an algebraic solution to the maximization problem, and may therefore perform better at avoiding local maxima.

³ The book by Kim and Nelson (1999) also presents an estimation strategy based on the Hamilton filter associated with numerical optimization algorithms.

For completeness, one should mention the recent developments of Sims and Zha (2006), who present an application of the estimation of MSVAR models within the Bayesian framework. The approach is flexible, allows working with multivariate series, and additionally provides tools to compare model specifications through their Marginal Data Densities.

Software packages. GAUSS source code replicating Hamilton (1989) or the examples of the book by Kim and Nelson (1999) is provided by the authors. Also, Bellone (2005) wrote the open-source MSVARlib package in GAUSS. However, all of these programs use numerical optimizers, and hence are not appropriate for higher dimensional estimation.

Krolzig (1998) implemented the models described in Krolzig (1997) in the proprietary software package Ox. The closed nature of this program makes it impossible to use beyond the scope its authors allow. Neither modification of the algorithm nor a Monte Carlo experiment is possible with this program, so its use has again to be discarded.

Sims et al. (2008) provide the theoretical framework for Bayesian estimation of MSVAR models, as well as some Matlab and C++ programs for practitioners. While very promising, the code is not yet polished enough to be usable.

Staying in the classical framework, the GAUSS programs developed by Warne (1999) make use of the EM algorithm. I preferred to use the open source software language R⁴ to implement the EM algorithm described in Krolzig (1997).⁵ R’s openness makes it a fast evolving programming environment for which one can release packages that are likely to be used by other practitioners.

1.3.1 Implementation of the Expectation Maximization algorithm

The implementation of the EM algorithm developed for this article is flexible: it estimates different types of models and allows the following parameters to vary:

• Number of regimes, M.

• Number of lags in the autoregressive part, p.

• Number of equations in the series, either univariate (K = 1) or multivariate (K = 2, 5, 10, 20).

⁴ The R computing language is developed by the R Development Core Team (2009).

⁵ Since this paper focuses on the algorithm of Krolzig (1997), a rapid comparison of the results yielded by the package in Ox and by this implementation was performed. Similarity in the estimates ensured that the present implementation was correct.

Initialization

The parameters to be initialized are the matrices of autoregressive coefficients $A_{1i}, \ldots, A_{pi}$ for each regime $i \in \{1, \ldots, M\}$ in the case of MSIAH–VAR models, the matrix of transition probabilities P, and the initial state $\xi_{1|0}$.⁶

The procedure is automated and the approach is similar to the one employed in Bellone (2005). For the intercepts and autoregressive terms, I compute the ordinary least squares [OLS] regression on the whole or split series, depending on which model is estimated.⁷ From the OLS regression results I compute the residuals, either for each regime or on the whole series, as well as their variance-covariance matrix, used later in the expectation step.

The transition probability matrix P is initialized with arbitrary diagonal values.⁸ The off-diagonal entries of each row share the remaining probability, so that the transition probabilities for each state sum to unity.
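A minimal Python sketch of this initialization is given below. The default diagonal of 0.7 is the value reported in the text for the simulations; splitting the remaining probability equally among the off-diagonal entries is an assumption of this sketch, as is the function name `initial_transition_matrix`.

```python
def initial_transition_matrix(M, diagonal=0.7):
    """Build an (M x M) starting value for P, for M >= 2.

    Each row has `diagonal` on the main diagonal; the remaining probability
    is split equally among the other M - 1 entries (an assumption).
    """
    off_diagonal = (1.0 - diagonal) / (M - 1)
    return [[diagonal if i == j else off_diagonal for j in range(M)]
            for i in range(M)]


P0 = initial_transition_matrix(3)
for row in P0:
    print([round(p, 2) for p in row])
```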

Expectation step

The BLHK filter, as described in Krolzig (1997), performs the filtering and smoothing operations on the regime probabilities $\xi_t$:

\[
\xi_t =
\begin{bmatrix}
\Pr(s_t = 1) \\
\vdots \\
\Pr(s_t = M)
\end{bmatrix}
\]

⁶ Hamilton (1994) defines $P(s_t = j \mid Y_t, \theta)$ as the conditional probability that the analyst assigns to the possibility that the t-th observation was generated by regime j. Those probabilities are collected for $j = 1, \ldots, M$ in the $(M \times 1)$ vector $\xi_{t|t}$.

⁷ Before splitting them, Bellone (2005) sorts the series by the values of the first column. While this approach seems reasonable for business cycle applications with univariate series, I do not proceed in this way for MSH–VAR and MSIAH–VAR models, in order to relax the assumption that one series is more prominent than the others in determining the value of the Markov chain. For MSI–VAR models, however, the series have to be sorted beforehand; otherwise the resulting initialization parameters are often too similar for the EM algorithm to differentiate between them, which results in poor convergence of the algorithm.

⁸ Diagonals of 0.7 are used for the simulations performed throughout this paper.




Filtering. The filter introduced by Hamilton (1989) is an iterative algorithm calculating the optimal forecast of the value of $\xi_{t+1}$ on the basis of the information set at time t, consisting of the observed values of $y_t$, namely $Y_t = (y_t', y_{t-1}', \ldots, y_{1-p}')'$.

The initial state $\xi_{1|0}$ needs to be initialized with some value to start the iterations. As suggested in Hamilton (1994), I use the vector of ergodic regime probabilities $\xi_{1|0} = \Pi$, where $\Pi$ satisfies the equation $P'\Pi = \Pi$ (the transpose appears because the rows of P, as defined in Section 1.2.1, sum to one). This step is a forward recursion, i.e. for $t = 1, \ldots, T$, written as:

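\[
\xi_{t+1|t} = \frac{P' \left( \eta_t \odot \xi_{t|t-1} \right)}{1_M' \left( \eta_t \odot F \xi_{t-1|t-1} \right)},
\]

where $F = P'$, $\odot$ denotes element-wise multiplication, and $\eta_t$ is the collection of M densities, defined as:

\[
\eta_t =
\begin{bmatrix}
p(y_t \mid s_t = 1, Y_{t-1}) \\
\vdots \\
p(y_t \mid s_t = M, Y_{t-1})
\end{bmatrix}
\]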
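The forward recursion can be sketched in a few lines of Python. The sketch takes the conditional densities $\eta_t$ as given, so it applies to any of the three model classes; the function name `hamilton_filter`, the list-based representation and the numerical values are assumptions made for illustration. It also accumulates the log-likelihood, since the normalizing constant of each step is the likelihood contribution of observation t.

```python
import math


def hamilton_filter(eta, P, xi_init):
    """Forward recursion on the regime probabilities.

    eta[t][m] : density of observation t conditional on regime m
    P         : row-stochastic transition matrix
    xi_init   : starting vector xi_{1|0}
    Returns the filtered probabilities xi_{t|t}, the one-step predictions
    xi_{t+1|t}, and the log-likelihood.
    """
    M = len(P)
    xi_pred = list(xi_init)
    filtered, predicted, loglik = [], [], 0.0
    for eta_t in eta:
        joint = [eta_t[m] * xi_pred[m] for m in range(M)]  # eta_t (.) xi_{t|t-1}
        lik_t = sum(joint)                                  # 1'(eta_t (.) xi_{t|t-1})
        xi_filt = [v / lik_t for v in joint]                # xi_{t|t}
        # Prediction step: xi_{t+1|t} = P' xi_{t|t}
        xi_pred = [sum(P[i][j] * xi_filt[i] for i in range(M)) for j in range(M)]
        filtered.append(xi_filt)
        predicted.append(xi_pred)
        loglik += math.log(lik_t)
    return filtered, predicted, loglik


# Hypothetical inputs: two regimes, three observations.
P = [[0.9, 0.1],
     [0.2, 0.8]]
xi_1_0 = [2.0 / 3.0, 1.0 / 3.0]  # ergodic probabilities of this P
eta = [[0.30, 0.05], [0.10, 0.25], [0.02, 0.40]]
filtered, predicted, loglik = hamilton_filter(eta, P, xi_1_0)
print([[round(v, 3) for v in row] for row in filtered])
```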

Smoothing. Full-sample information is used to make an inference about the unobserved regimes by incorporating the previously neglected sample information $Y_{t+1 \cdots T} = (y_{t+1}', \ldots, y_T')'$ into the inference about $\xi_t$. This step is a backward recursion, for $j = 1, \ldots, T-1$. The iteration consists of the following equation:

\[
\xi_{T-j|T} = \left[ P \left( \xi_{T-j+1|T} \oslash \xi_{T-j+1|T-j} \right) \right] \odot \xi_{T-j|T-j},
\]

where $\oslash$ and $\odot$ denote element-wise division and multiplication.
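The backward recursion can be sketched as follows, taking the filtered probabilities and the one-step predictions as inputs. The function name `smooth_probabilities` and the two-observation example are illustrative assumptions.

```python
def smooth_probabilities(filtered, predicted, P):
    """Backward recursion on the regime probabilities.

    filtered[t]  : xi_{t|t}
    predicted[t] : xi_{t+1|t}
    P            : row-stochastic transition matrix
    Returns the smoothed probabilities xi_{t|T}.
    """
    T, M = len(filtered), len(P)
    smoothed = [None] * T
    smoothed[T - 1] = list(filtered[T - 1])  # xi_{T|T} starts the recursion
    for t in range(T - 2, -1, -1):
        # Element-wise division: xi_{t+1|T} (/) xi_{t+1|t}
        ratio = [smoothed[t + 1][j] / predicted[t][j] for j in range(M)]
        # Pre-multiplication by P
        back = [sum(P[i][j] * ratio[j] for j in range(M)) for i in range(M)]
        # Element-wise product with xi_{t|t}
        smoothed[t] = [back[i] * filtered[t][i] for i in range(M)]
    return smoothed


# Hypothetical inputs: two regimes, two observations.
P = [[0.9, 0.1],
     [0.2, 0.8]]
filtered = [[0.8, 0.2], [0.3, 0.7]]
# One-step predictions implied by the filtered probabilities: P' xi_{t|t}
predicted = [[sum(P[i][j] * f[i] for i in range(2)) for j in range(2)]
             for f in filtered]
smoothed = smooth_probabilities(filtered, predicted, P)
print([[round(v, 4) for v in row] for row in smoothed])
```

At the last date the smoothed and filtered probabilities coincide, and each smoothed vector sums to one, which gives a quick sanity check on an implementation.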

Matrix of transition probabilities. The transition probability matrix is estimated from the filtered and the smoothed probabilities, $\xi_{t|t}$ and $\xi_{t|T}$, calculated in the expectation step. The $(M^2 \times 1)$ vector of transition probabilities $\rho$ is obtained as follows:

1. Calculating the joint probabilities $p(s_{t+1} = j, s_t = i \mid Y_T)$ for all $s_t, s_{t+1} = 1, \ldots, M$ gives the $(M^2 \times 1)$ vector of regime probabilities:⁹

\[
\xi^{(2)}_{t|T} = \mathrm{vec}(P) \odot \left[ \left( \xi^{(1)}_{t+1|T} \oslash \xi^{(1)}_{t+1|t} \right) \otimes \xi^{(1)}_{t|t} \right]
\]

2. Sum up over all T:

\[
\xi^{(2)} = \sum_{t=1}^{T-1} \xi^{(2)}_{t|T}
\]

3. Write:

\[
\xi^{(1)} = \left( 1_M' \otimes I_M \right) \xi^{(2)}
\]

4. The vector of transition probabilities $\rho$ is obtained by:

\[
\rho = \xi^{(2)} \oslash \left( 1_M \otimes \xi^{(1)} \right)
\]

5. Finally, reshape $\rho$ into an $(M \times M)$ matrix to get the transition probability matrix P.

⁹ The transition probability matrix P from previous iterations is used during this step.
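The five steps above can be sketched with explicit loops over the regime indices instead of the vec and Kronecker notation, which gives the same numbers entry by entry. The function name `update_transition_matrix` and the input values are assumptions made for this illustration.

```python
def update_transition_matrix(filtered, predicted, smoothed, P):
    """Re-estimate P from filtered, predicted and smoothed probabilities.

    filtered[t]  : xi_{t|t}
    predicted[t] : xi_{t+1|t}
    smoothed[t]  : xi_{t|T}
    P            : transition matrix from the previous iteration
    """
    T, M = len(filtered), len(P)
    joint = [[0.0] * M for _ in range(M)]
    # Steps 1 and 2: joint probabilities of (s_t = i, s_{t+1} = j), summed over t.
    for t in range(T - 1):
        for i in range(M):
            for j in range(M):
                joint[i][j] += (P[i][j] * smoothed[t + 1][j] / predicted[t][j]
                                * filtered[t][i])
    # Step 3: marginal over the destination regime j.
    marginal = [sum(joint[i]) for i in range(M)]
    # Steps 4 and 5: normalize each row, which yields the new matrix directly.
    return [[joint[i][j] / marginal[i] for j in range(M)] for i in range(M)]


# Hypothetical inputs: two regimes, two observations (rounded where noted).
P = [[0.9, 0.1],
     [0.2, 0.8]]
filtered = [[0.8, 0.2], [0.3, 0.7]]
predicted = [[0.76, 0.24], [0.41, 0.59]]
smoothed = [[0.5175, 0.4825], [0.3, 0.7]]  # first row rounded; it is not used
P_new = update_transition_matrix(filtered, predicted, smoothed, P)
print([[round(p, 4) for p in row] for row in P_new])
```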

Normal equations. The maximization step typically boils down to maximizing the likelihood of the model:

\[
\begin{aligned}
L(\lambda \mid Y) &:= p(Y_T \mid Y_0; \lambda) \\
&= \prod_{t=1}^{T} p(y_t \mid Y_{t-1}; \lambda) \\
&= \prod_{t=1}^{T} \sum_{\xi_t} p(y_t \mid \xi_t, Y_{t-1}, \theta) \, p(\xi_t \mid Y_{t-1}; \lambda) \\
&= \prod_{t=1}^{T} \eta_t' \xi_{t|t-1}
\end{aligned}
\]

The conditional densities $p(y_t \mid \xi_t, Y_{t-1}, \theta)$ are composed of several normal distributions, see Krolzig (1997), hence rendering L non-normal. Directly maximizing $\log L$ thus requires the use of nonlinear optimization algorithms, which are computationally costly, increasingly so for multivariate models. In the EM algorithm, the parameters (intercepts, autoregressive terms, and variance) are derived from the first-order conditions of the maximum likelihood estimation. Krolzig (1997) shows that it is sufficient to do only one generalized least squares [GLS] estimation within each maximization step to ensure convergence to a stationary point of the likelihood.

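MSI–VAR models. The regression equation for MSI–VAR models is:¹⁰

\[
y = \sum_{m=1}^{M} \left( \Xi_m 1_T \otimes I_K \right) A_{0m} + \left( X \otimes I_K \right) A + u, \qquad u \sim N(0, \Omega), \qquad \Omega = I_T \otimes \Sigma
\]

Writing $\beta$ as the collection of $\nu_m$ (the intercepts) and $\alpha$ (the autoregressive coefficients) in a $((M + Kp) \times K)$ matrix, the GLS estimates are:¹¹

\[
\beta' =
\begin{bmatrix}
\Xi & \boldsymbol{\Xi}' X \\
X' \boldsymbol{\Xi} & X' X
\end{bmatrix}^{-1}
\begin{bmatrix}
\boldsymbol{\Xi}' Y \\
X' Y
\end{bmatrix},
\qquad
\Sigma = T^{-1} U' \Xi U
\]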

MSH–VAR models. The regression equation in the MSH–VAR case is the following:

\[
y = \sum_{m=1}^{M} \left( 1_T \otimes I_K \right) \beta + u, \qquad u \sim N(0, \Omega), \qquad \Omega = \sum_{m=1}^{M} \Xi_m \otimes \Sigma_m
\]

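¹⁰ Defining y as $y = (y_1', \ldots, y_T')'$.

¹¹ Notations are introduced:

\[
Y_{-j} = (y_{1-j}, \ldots, y_{T-j})', \qquad X = (Y_{-1}, \ldots, Y_{-p}), \qquad U = 1_M \otimes Y - Z \beta', \qquad Z = (I_M \otimes 1_T, \; 1_M \otimes X)
\]

And for the indicators of smoothed probabilities:

\[
\boldsymbol{\Xi} = (\xi_{1|T}, \ldots, \xi_{T|T}), \qquad \Xi = \mathrm{diag}\left( 1_T' \boldsymbol{\Xi} \right)
\]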



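The GLS estimates are written as:¹²

\[
\beta' = \left( \sum_{m=1}^{M} \left( X' \Xi_m X \right) \otimes \Sigma_m^{-1} \right)^{-1} \left( \sum_{m=1}^{M} \left( X' \Xi_m \right) \otimes \Sigma_m^{-1} \right) y,
\qquad
\Sigma_m = T_m^{-1} U' \Xi_m U
\]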

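MSIAH–VAR models. The regression equation for the MSIAH–VAR case is the following:

\[
y = \sum_{m=1}^{M} \left( \Xi_m X \otimes I_K \right) \beta_m + u, \qquad u \sim N(0, \Omega), \qquad \Omega = \sum_{m=1}^{M} \Xi_m \otimes \Sigma_m
\]

In that case, the GLS estimates are written as:¹³

\[
\beta_m' = \left( X' \Xi_m X \right)^{-1} X' \Xi_m Y,
\qquad
\Sigma_m = T_m^{-1} U_m' \Xi_m U_m
\]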

¹² Some of the notations change:

\[
X = (1_T, Y_{-1}, \ldots, Y_{-p}), \qquad U_m = Y - (X \otimes I_K) \beta'
\]

And for the indicators of smoothed probabilities:

\[
\xi_m = (\xi_{m1|T}', \ldots, \xi_{mT|T}'), \qquad \Xi_m = \mathrm{diag}(\xi_m), \qquad T_m = \mathrm{tr}(\Xi_m) = 1_T' \xi_m
\]

¹³ The residuals are now obtained by:

\[
U_m = Y - X \beta_m
\]
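To illustrate the regime-wise estimates of the MSIAH case, the sketch below computes $\beta_m = (X' \Xi_m X)^{-1} X' \Xi_m Y$ and $\Sigma_m = T_m^{-1} U_m' \Xi_m U_m$ for a univariate series with one lag, so that X holds a constant and the lagged value and the 2 x 2 system can be solved by hand. The restriction to K = 1 and p = 1, the function name `weighted_ls` and the data are assumptions made for this illustration.

```python
def weighted_ls(y, y_lag, weights):
    """Weighted least squares for one regime, with X = (1, y_lag).

    weights[t] is the smoothed probability of the regime at date t, i.e. the
    t-th diagonal entry of Xi_m. Returns (intercept, slope, sigma2).
    """
    Tm = sum(weights)                                        # T_m = tr(Xi_m)
    sx = sum(w * x for w, x in zip(weights, y_lag))
    sxx = sum(w * x * x for w, x in zip(weights, y_lag))
    sy = sum(w * v for w, v in zip(weights, y))
    sxy = sum(w * x * v for w, x, v in zip(weights, y_lag, y))
    # Solve the 2 x 2 system (X' Xi_m X) beta = X' Xi_m Y explicitly.
    det = Tm * sxx - sx * sx
    intercept = (sxx * sy - sx * sxy) / det
    slope = (Tm * sxy - sx * sy) / det
    # Weighted residual variance: U_m' Xi_m U_m / T_m
    resid = [v - intercept - slope * x for v, x in zip(y, y_lag)]
    sigma2 = sum(w * u * u for w, u in zip(weights, resid)) / Tm
    return intercept, slope, sigma2


# Hypothetical data lying exactly on the line y = 1 + 0.5 * y_lag.
y_lag = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 1.5, 2.0, 2.5]
weights = [0.9, 0.1, 0.8, 0.3]
print(weighted_ls(y, y_lag, weights))
```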




Convergence criteria

After the initialization, the algorithm iterates on the expectation and maximization steps until convergence. Two measures of convergence between the j-th and (j+1)-th iterations are considered here.

• The first one is the absolute percentage change in the logarithm of the likelihood value, calculated as:

\[
\Delta_1 = 100 \cdot \left| \frac{\ln L\left( \lambda^{(j+1)} \mid Y_T \right) - \ln L\left( \lambda^{(j)} \mid Y_T \right)}{\ln L\left( \lambda^{(j)} \mid Y_T \right)} \right|
\]

• The second one is the maximum change between iterations j and j+1 among all parameters, formulated as follows:

\[
\Delta_2 = \max_i \left\{ \left| \lambda_i^{(j+1)} - \lambda_i^{(j)} \right| \right\}
\]

Convergence is considered achieved when one of the criteria is judged small enough, i.e. $\Delta_1 \le \delta$ or $\Delta_2 \le \delta$.¹⁴

To avoid iterating indefinitely, a parameter sets the maximum number of iterations allowed before convergence. If the EM algorithm has not converged within this limit, it stops.¹⁵ The value returned by the algorithm also carries error handling and convergence information, which facilitates simulation exercises such as bootstrap or Monte Carlo experiments.
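The stopping rule can be sketched as a generic loop around one EM iteration. The defaults below are the tolerance and iteration limit reported in the footnotes; the function names `has_converged` and `run_em`, the callable `em_step` and the toy update used in the example are hypothetical stand-ins, not the actual expectation and maximization steps.

```python
def has_converged(loglik_old, loglik_new, params_old, params_new, delta=1e-6):
    """Return True when either convergence measure falls below delta.

    Assumes loglik_old is nonzero, since it appears in the denominator.
    """
    delta1 = 100.0 * abs((loglik_new - loglik_old) / loglik_old)
    delta2 = max(abs(a - b) for a, b in zip(params_new, params_old))
    return delta1 <= delta or delta2 <= delta


def run_em(em_step, params, loglik, delta=1e-6, max_iter=100):
    """Iterate em_step until convergence or until max_iter is reached.

    em_step(params) must return (new_params, new_loglik).
    Returns (params, loglik, iterations, converged_flag).
    """
    for iteration in range(1, max_iter + 1):
        new_params, new_loglik = em_step(params)
        done = has_converged(loglik, new_loglik, params, new_params, delta)
        params, loglik = new_params, new_loglik
        if done:
            return params, loglik, iteration, True
    return params, loglik, max_iter, False


def toy_step(params):
    """Toy update standing in for one EM iteration: halve every parameter."""
    new = [p / 2.0 for p in params]
    return new, -1.0 - sum(p * p for p in new)


params, loglik, iterations, converged = run_em(toy_step, [1.0], -2.0)
print(iterations, converged)
```

Returning the iteration count and a convergence flag alongside the estimates is what lets a simulation loop discard or tabulate the runs that failed to converge.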

1.4 Monte Carlo experiment

The purpose of this Monte Carlo experiment is to study the properties of the EM algorithm for the estimation of simulated univariate and multivariate series from MSI–VAR, MSH–VAR, and MSIAH–VAR models, paying particular attention to higher dimensional systems composed of many variables.

First of all, we are interested in finding out whether these models are estimable at all, and under which circumstances. Not all Markov-switching VAR models may be equally easy for the EM algorithm to estimate, and for some sets of parameters the algorithm may not converge at all. Furthermore, we are interested in the accuracy of the estimates, measured by the mean square error of the estimated model parameters, i.e. the intercepts, autoregressive coefficients, and variance-covariance matrices. Finally, I take a closer look at the estimates of the realizations of the hidden Markov chain, to see how well the EM algorithm manages to estimate the regimes.

¹⁴ Typically, a value of $\delta = 10^{-6}$ is used.

¹⁵ By default the maximum number of iterations allowed before convergence of the algorithm is 100.

1.4.1 Specificities of the experiment: multiple dimensions and distance between regimes

In time-series econometrics, a Monte Carlo experiment can usually be decomposed into four phases:

1. Choice of model parameters.

2. Simulation of N series using the parameter set. Each series is generated by drawing a new history of residuals from the appropriate random distribution.

3. Estimation of each individual series using the algorithm.

4. Aggregation of the results of the individual estimations into the final result.
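The four phases can be sketched on a deliberately simple stand-in model. The example below is only an illustration: it uses a univariate AR(1) process estimated by OLS instead of an MS-VAR estimated by EM, and the function name `monte_carlo_ar1` and its defaults are assumptions made for this sketch.

```python
import numpy as np

def monte_carlo_ar1(phi=0.6, T=200, N=500, seed=0):
    """Toy Monte Carlo: bias and MSE of the OLS estimate of an AR(1) coefficient."""
    # Phase 1: the parameter set is (phi, T), fixed over all simulations.
    rng = np.random.default_rng(seed)
    estimates = np.empty(N)
    for n in range(N):
        # Phase 2: simulate one series from a new history of residuals.
        e = rng.standard_normal(T + 1)
        y = np.empty(T + 1)
        y[0] = e[0]
        for t in range(1, T + 1):
            y[t] = phi * y[t - 1] + e[t]
        # Phase 3: estimate the series (OLS slope, no intercept).
        estimates[n] = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])
    # Phase 4: aggregate the individual estimations.
    bias = float(estimates.mean() - phi)
    mse = float(np.mean((estimates - phi) ** 2))
    return bias, mse
```

Seeding the random generator keeps the experiment reproducible, which is useful when the same design is rerun over several sample sizes.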

In a standard Monte Carlo experiment, parameters remain identical over all the simulations. This is the proper way to gather finite-sample evidence on models where the number of parameters stays the same over all the simulations. But what should one do when, for instance, one wants to study the properties of an estimator over different dimensions of one model?16 The number of parameters to estimate varies with the number of equations in the model, and thus steps 1, 2, and 4 of the aforementioned list do not fit such an experiment. The Monte Carlo experimental design has to be adapted to characterize the properties of estimators over a varying number of parameters.

Regarding the dimensions of interest for a Monte Carlo experiment on Markov-switching VAR models, consider the peculiarity of such models: the switch in regimes through the latent variable following a Markov process. Intuitively, the relative distance between the regimes should affect the estimation: processes composed of regimes that are very close to one another should be more cumbersome to estimate than processes that have very distant regimes. Also, a regime which rarely occurs over the sample will be more problematic to estimate than a regime that occurs frequently.17 The parameters defining the distance between regimes are the intercepts, the autoregressive coefficients, and the variance-covariance matrix. For each of these, two cases will be considered in the experiment, one close and one distant; see details in the next section.

16 The number of equations in the model, K, varies over the experiment.

Finally, two further factors that may influence the estimation are considered. The first is the stationarity of the simulated series, measured by their distance to the unit root. The second is the persistence of the regimes, expressed in the matrix of transition probabilities between regimes.

The next section details the design of the experiment.

1.4.2 Experimental design

Parameters

The experiment is conducted on MSI(2)–VAR(1), MSH(2)–VAR(1), and MSIAH(2)–VAR(1) processes, i.e. with M = 2 regimes and p = 1 lag in the autoregressive part. It is conducted over the following dimensions. The sample sizes (T) of the processes are the same as in Psaradakis and Sola (1998) and are the ones typically used in practice. Univariate and multivariate processes are simulated and estimated, up to series containing 20 equations. For every set of parameters, 1000 simulations are run. This is summarized as:

• M = 2

• p = 1

• T ∈ {100, 200, 400, 800}

• K ∈ {1, 2, 5, 10, 20}

• N = 1000

MSI–VAR In MSI(2)–VAR(1) models, the intercept coefficients are regime dependent (A01 and A02). We set up two distances between the two regimes. For close intercepts, the first regime contains values of -1 for all equations whereas the second regime contains values of 1. In the more distant case, the first regime has intercepts of -5 and the second one of 5. A1, the (K × K) matrix of autoregressive coefficients, is a diagonal matrix. All the diagonal values are equal either to 0.6 in the stationary case, with all eigenvalues well inside the unit circle, or to 0.9, which is closer to the non-stationarity region. Σ, the (K × K) variance-covariance matrix, is also diagonal. For the transition probabilities, two cases are considered. In the more persistent case, the expected duration of a regime is 20 periods (0.95 on the diagonal of P). In the less persistent case, regimes are expected to last for 5 periods only (0.8 on the diagonal of P).

17 This dimension is neglected in this paper: only a minimum number of occurrences of each regime over the simulated series is required.

• All elements (A01,A02) ∈ {(−1, 1) , (−5, 5)}

• Diagonals of A1 ∈ {0.6, 0.9}. Non-diagonal elements are 0

• Diagonals of Σ = 1. Non-diagonal elements are 0

• (p11, p22) ∈ {(0.8, 0.8) , (0.95, 0.95)}

Combining these cases to consider their separate and joint effects, we have to consider 8 experiments for MSI–VAR models. Adding the sample size and the number of equations as further dimensions yields 160 experiments in total for MSI–VAR models, each of them consisting of 1000 simulations plus estimations.
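The count of experiments and the expected regime durations can be checked with a few lines. The variable names below are chosen for this sketch only; the values are those listed in the design above.

```python
from itertools import product

intercepts = [(-1, 1), (-5, 5)]           # (A01, A02): close and distant regimes
ar_diagonals = [0.6, 0.9]                 # diagonal of A1
persistence = [(0.8, 0.8), (0.95, 0.95)]  # (p11, p22)
sample_sizes = [100, 200, 400, 800]       # T
dimensions = [1, 2, 5, 10, 20]            # K

# 2 x 2 x 2 = 8 parameter sets, then 8 x 4 x 5 = 160 experiments.
parameter_sets = list(product(intercepts, ar_diagonals, persistence))
experiments = list(product(parameter_sets, sample_sizes, dimensions))

def expected_duration(p_stay):
    """Expected number of periods spent in a regime with staying probability p_stay."""
    return 1.0 / (1.0 - p_stay)
```

With these values, `len(parameter_sets)` is 8 and `len(experiments)` is 160, while `expected_duration` gives about 20 periods for 0.95 and 5 periods for 0.8.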

MSH–VAR A0, the (K × 1) vector of intercepts, is invariant across regimes and contains only ones for all equations. A1, the (K × K) matrix of autoregressive coefficients, is a diagonal matrix. All the diagonal values are equal either to 0.6 in the stationary case, with all eigenvalues well inside the unit circle, or to 0.9, which is closer to the non-stationarity region. Σ, the (K × K) variance-covariance matrix, is also diagonal. In MSH(2)–VAR(1) models, the variance is regime dependent and we consider either close regimes, where the diagonal elements are 1 in the first regime and 5 in the second, or more distant regimes, with diagonal elements of 1 and 25 respectively.

• All elements A0 = 1

• Diagonals of A1 ∈ {0.6, 0.9}. Non-diagonal elements are 0

• Diagonals of (Σ1,Σ2) ∈ {(1, 5) , (1, 25)}. Non-diagonal elements are 0




• (p11, p22) ∈ {(0.8, 0.8) , (0.95, 0.95)}

160 experiments are also conducted for MSH–VAR models, each of them consisting of 1000 simulations plus estimations.

MSIAH–VAR In MSIAH(2)–VAR(1) models, all parameters vary across regimes. In addition to variations of the intercepts and variance-covariance matrices of the same nature as for the preceding models, a variation of the autoregressive matrices is introduced. In the closer case, the matrices have diagonals of -0.6 in the first regime and 0.6 in the second. These values become -0.9 and 0.9 in the more distant case.18

• All elements (A01,A02) ∈ {(−1, 1) , (−5, 5)}

• Diagonals of (A11,A12) ∈ {(−0.6, 0.6) , (−0.9, 0.9)}. Non-diagonal elements are 0

• Diagonals of (Σ1,Σ2) ∈ {(1, 5) , (1, 25)}. Non-diagonal elements are 0

• (p11, p22) ∈ {(0.8, 0.8) , (0.95, 0.95)}

Again, 160 experiments are conducted for MSIAH–VAR models, each of them consisting of 1000 simulations plus estimations.
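A simulated series for one such experiment could be generated along the following lines. This is a sketch under stated assumptions: the function name `simulate_msiah_var1`, the burn-in length, and the random initial regime are illustrative choices that are not taken from the text. Because all matrices in the design are diagonal with equal diagonal elements, scalars are used in place of full matrices.

```python
import numpy as np

def simulate_msiah_var1(T, K, a0=(-1.0, 1.0), a1=(-0.6, 0.6), sig2=(1.0, 5.0),
                        p_stay=(0.95, 0.95), burn=100, seed=0):
    """Simulate a K-dimensional MSIAH(2)-VAR(1) series with diagonal parameters."""
    rng = np.random.default_rng(seed)
    n = T + burn                                  # burn-in length is an assumption
    s = np.empty(n, dtype=int)
    s[0] = rng.integers(0, 2)                     # arbitrary initial regime
    for t in range(1, n):
        stay = rng.random() < p_stay[s[t - 1]]    # stay with probability p_mm
        s[t] = s[t - 1] if stay else 1 - s[t - 1]
    y = np.zeros((n, K))
    for t in range(1, n):
        m = s[t]
        shock = np.sqrt(sig2[m]) * rng.standard_normal(K)   # Sigma_m^{1/2} e_t
        y[t] = a0[m] + a1[m] * y[t - 1] + shock
    return y[burn:], s[burn:]
```

Setting `a1=(0.6, 0.6)` and `sig2=(1.0, 1.0)` gives an MSI-type series, and setting `a0=(1.0, 1.0)` with equal `a1` values gives an MSH-type series.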

Benchmark: VAR As a point of reference, experiments are run on K-dimensional VAR(p) models:

$$
y_t = A_0 + \sum_{i=1}^{p} A_i y_{t-i} + \Sigma^{1/2} e_t, \tag{1.4}
$$

where et ∼ NID(0, IK). The parameters varying over the experiments are:

• All elements A0 ∈ {1, 5}

• Diagonals of A1 ∈ {0.6, 0.9}. Non-diagonal elements are 0

• Diagonals of Σ ∈ {1, 5, 25}. Non-diagonal elements are 0

18 Hence we cannot discriminate whether it is the distance to the unit root or the distance between the AR coefficients that affects the estimation.



Criteria for successful experiment

EM estimation For each call to the EM algorithm, the maximum number of iterations allowed before achieving convergence is 100. If convergence occurs, the estimation is considered successful, provided there was a log-likelihood gain of at least 5% between the first and last iterations.19
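The success rule can be written as a small predicate. The sketch below assumes that the 5% gain is measured relative to the absolute value of the first log-likelihood, which is one possible reading of the criterion; the function name is hypothetical.

```python
def estimation_successful(converged, loglik_first, loglik_last, min_gain=0.05):
    """Success rule: convergence plus a relative log-likelihood gain of at least min_gain.

    Assumes loglik_first is non-zero; the relative definition of the gain
    is an interpretation, not a quotation of the original code.
    """
    if not converged:
        return False
    gain = (loglik_last - loglik_first) / abs(loglik_first)
    return gain >= min_gain
```

For example, moving from a log-likelihood of -100 to -90 is a 10% gain and counts as a success, whereas moving from -100 to -99 does not.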

Each experiment Each experiment consists of 1000 simulated and estimated series. To reduce the computational burden of the whole procedure, an upper limit of 100 failed EM estimations (as defined above) per experiment is set. For each experiment, I report the rate of failed estimations.

The following section analyzes the outcomes of the Monte Carlo experiment.

1.5 Finite-sample evidence

The complete results for all experiments are detailed in the tables of the Appendix. Common and rather intuitive results arise among the three types of estimated Markov-switching models. First of all, when the number of observations increases over the simulations (from 100 to 800 per simulation), the algorithm converges more often and the experiments are more often successful. Naturally, a larger sample size facilitates the estimation.

Also, the sample size necessary to estimate processes naturally increases with the dimension of the simulated series (univariate series requiring fewer observations to be estimated than higher dimensional ones). The number of parameters to estimate grows with the number of variables in the time series, as illustrated below for the models studied in this paper:

19 The 5% threshold was chosen arbitrarily, based on my own experience. It was set as low as possible, so that not too many estimations are discarded.




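$$
\text{MSI}(M)\text{-VAR}(p): \quad \underbrace{M(M-1)}_{\text{Transition probabilities}} + \underbrace{K(M + Kp)}_{\text{Intercept + AR}} + \underbrace{\frac{K(K+1)}{2}}_{\text{Variance-covariance}}
$$

$$
\text{MSH}(M)\text{-VAR}(p): \quad \underbrace{M(M-1)}_{\text{Transition probabilities}} + \underbrace{K(1 + Kp)}_{\text{Intercept + AR}} + \underbrace{\frac{KM(K+1)}{2}}_{\text{Variance-covariance}}
$$

$$
\text{MSIAH}(M)\text{-VAR}(p): \quad \underbrace{M(M-1)}_{\text{Transition probabilities}} + \underbrace{KM(1 + Kp)}_{\text{Intercept + AR}} + \underbrace{\frac{KM(K+1)}{2}}_{\text{Variance-covariance}}
$$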

Table 1.1: Number of model parameters for the models studied in the present Monte Carlo experiment. K stands for the number of equations in the model.

K                      1     2     5    10     20
MSI(2)–VAR(1)          6    13    52   177    652
MSH(2)–VAR(1)          6    14    62   222    842
MSIAH(2)–VAR(1)        8    20    92   332   1262

As illustrated in Table 1.1, the number of parameters to estimate grows fast with the number of equations in the model (K). MSI–VAR models have the fewest parameters among the three types of studied models. Due to the variance-covariance matrix and/or the autoregressive terms, the regime-dependent parameters of MSH–VAR and MSIAH–VAR models grow quadratically in K, whereas the regime-dependent parameters of MSI–VAR models grow only linearly in K. In any case, the three models possess a high number of parameters to estimate for systems with many equations. This explains the deterioration of the estimation performance of the algorithm for these models when the sample size stays small.
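The parameter counts can be reproduced directly from the three formulas. The sketch below, with a hypothetical helper `n_parameters`, uses the division by 2 in every variance-covariance term, which is what the entries of Table 1.1 imply.

```python
def n_parameters(model, K, M=2, p=1):
    """Number of parameters of an MS(M)-VAR(p) with K equations."""
    transition = M * (M - 1)
    if model == "MSI":
        mean = K * (M + K * p)           # M intercept vectors, one set of AR matrices
        cov = K * (K + 1) // 2           # one symmetric covariance matrix
    elif model == "MSH":
        mean = K * (1 + K * p)           # one intercept vector, one set of AR matrices
        cov = M * K * (K + 1) // 2       # M symmetric covariance matrices
    elif model == "MSIAH":
        mean = M * K * (1 + K * p)       # everything is regime dependent
        cov = M * K * (K + 1) // 2
    else:
        raise ValueError(f"unknown model type: {model}")
    return transition + mean + cov

counts = {name: [n_parameters(name, K) for K in (1, 2, 5, 10, 20)]
          for name in ("MSI", "MSH", "MSIAH")}
```

Evaluating `counts` gives the three rows of Table 1.1, for instance 1262 parameters for the MSIAH(2)–VAR(1) with K = 20.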

1.5.1 Percentage of successful estimations

Tables A.1, A.6, and A.11 display the percentage of successful estimations for the thousand simulations of each experiment, respectively for MSI–VAR, MSH–VAR, and MSIAH–VAR models. The next sections present the estimation successes specific to the three types of models.

MSI–VAR

Being the models with the lowest number of parameters to estimate, models with regime change in the intercept only are the most successfully estimated among the three studied ones. The intercepts are the only regime-varying parameters of MSI–VAR models. This Monte Carlo experiment was designed with two cases for the intercept distance between regimes, one with close regimes and another with more distant ones. The closer case has intercept vectors equal to -1 in the first regime and to 1 in the second regime, whereas in the more distant case the intercepts are -5 and 5. The EM algorithm performs better in the more distant case, where it discriminates better between the two regimes. For the closer case, the worst failure rate among the experiments is 24%, compared with 12.4% for the more distant intercepts case. Series with smaller sample sizes are less successfully estimated, and this phenomenon is more pronounced for series of higher dimensions.

The persistence of the processes also varies over the experiments. Results indicate that, for MSI–VAR models, processes closer to the unit root20 are estimated better, since no experiment was considered successful for processes more distant from the unit root.

The last source of variation between experiments is the persistence of the regimes, introduced in the form of the transition probabilities between regimes. Regimes can be either less persistent, with a probability of remaining in the same regime in the next period of 0.8 for both regimes, or more persistent, with probabilities of 0.95. Simulated series with lower regime persistence are consistently more successfully estimated by the EM algorithm than series with more persistent regimes.

MSH–VAR

Experiments on MSH–VAR models display more failures than those on MSI–VAR models. The distance between the diagonals of the variance-covariance matrices in each regime is once again an important determinant of success in the EM estimation with, as should be expected, better estimation performance for simulated processes having a larger difference between regimes in their variance-covariance matrices. In both the close and distant cases, the success rates of the experiments increase with the number of equations in the models. However, adding equations to the system only increases the efficiency of the algorithm provided the sample size is large enough. Indeed, a small sample size becomes a handicap as the number of parameters to estimate increases. Every equation of the simulated series being subject to switches at simultaneous times certainly eases the regime detection occurring during the expectation step of the EM algorithm.

20 The diagonals of the autoregressive coefficients matrix A1 were chosen to be equal either to 0.6 or 0.9.

Different distances to the unit root for the processes, i.e. diagonals of A1 equal to 0.6 or 0.9, do not noticeably modify the estimation performance.

Yet the persistence of the regimes has a strong impact on the experiment successes: models with more persistent regimes (p11 = p22 = 0.95) are more easily estimated than models with less persistent regimes. This is not surprising, as the only source of variation between the regimes is the variance, and intuitively prolonged periods of the same variance should be easier to detect than rapid switches between different regimes of variance.

MSIAH–VAR

Among the three categories of models, MSIAH–VAR models exhibit the most contrast between their regimes, with regime switches in intercepts, autoregressive terms, and covariance matrices. The EM algorithm would be expected to estimate these models with ease, in comparison to MSI–VAR and MSH–VAR models. However, the number of parameters to estimate is much higher, as Table 1.1 shows.

As expected, MSIAH–VAR models are successfully estimated in this Monte Carlo analysis, with only a few failures for univariate models (K = 1) or in higher dimensions when the number of observations is low. The EM algorithm has more latitude to distinguish between the regimes in its expectation step, which results in better convergence for these models. However, due to the high number of parameters to estimate, higher dimensional systems are less successfully estimated than less parameter-intensive models such as the MSI–VAR. One can clearly observe a decrease in the success rate when moving from MSIAH–VAR models with 10 equations to models with 20 equations.

A larger distance between regimes in the three regime-varying parameters yields better convergence performance of the EM algorithm for low dimensional MS–VAR models with K = 1, 2. This tendency vanishes in higher dimensions, for which closer regimes have a higher success rate in estimation.

Higher persistence of the processes slightly improves the success rate.




As for the regime persistence, it does not notably influence the experiments’ success rate. However, processes with more persistent regimes are subject to a higher algorithm failure rate when the sample size is too small for the number of parameters to estimate, for example when the processes have 10 or 20 equations and the simulated series have 100 or 200 observations.

1.5.2 Empirical distribution of the Maximum Likelihood Estimator

Psaradakis and Sola (1998) considered the mean, skewness and kurtosis of the estimators. Here, due to the high number of experiments to summarize, only the second moment of the error, the mean squared error [MSE], which incorporates the bias and variance of the estimator, is considered.
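For reference, the way the MSE combines bias and variance can be written out explicitly. The notation below is introduced for this illustration only: θ is a true parameter value, \hat{\theta}^{(n)} its estimate in simulation n, and N the number of retained simulations. The first line is the standard decomposition, and the second is the natural Monte Carlo estimate of it.

$$
\mathrm{MSE}(\hat{\theta}) = E\left[ (\hat{\theta} - \theta)^2 \right] = \mathrm{Var}(\hat{\theta}) + \left( E[\hat{\theta}] - \theta \right)^2
$$

$$
\widehat{\mathrm{MSE}} = \frac{1}{N} \sum_{n=1}^{N} \left( \hat{\theta}^{(n)} - \theta \right)^2
$$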

MSI–VAR

Intercepts The mean squared errors for the first intercept coefficient of each regime, A01 and A02, are summarized in Table A.2. Results indicate that the mean squared error decreases when the sample size gets larger. The precision of the intercept estimates remains about the same for different distances between regimes, provided the sample size is large enough; otherwise closer regimes are logically more precisely estimated, as an artefact of the experiment. The estimation of intercepts for models further away from the unit root consistently outperforms the estimation for processes closer to the unit root, but only moderately. Higher persistence in the Markov chain deteriorates the precision of the intercept estimates.

It is worth noticing that, for a large enough sample size, the mean squared error does not deteriorate when the number of equations in the model increases.

Also, when comparing the magnitude of the mean squared errors of MSI–VAR models with those of standard VAR models with comparable parameters, which is done through the ratios reported in Table A.16, in almost every experiment the intercepts are estimated with much more precision for MSI–VAR models. The ratios indeed take values between 7.3 × 10^{-4} and 1.1. More distant regimes for the MSI–VAR, where the EM algorithm estimates better, produce the best results.

Autoregressive coefficients Table A.3 reports the mean squared error statistics of the Monte Carlo experiments for the upper-left element of A1, the matrix of autoregressive coefficients. In comparison with the estimates of the intercepts for the same MSI–VAR processes, estimates of the autoregressive terms have mean squared errors of the same magnitude.

Neither the distance between the regimes nor the persistence of the regimes or of the processes influences the mean squared error of the autoregressive coefficients. Nor does an increase in the number of variables K result in worse estimates of the AR coefficients, except for K = 10 or K = 20, where series with smaller sample sizes are less precisely estimated.

Compared to the estimates of standard VAR models, as reported in the form of ratios in Table A.17, models with regime switches in their intercept show slightly better precision in the estimation of the AR coefficients, provided the sample size is large enough to estimate all the parameters.

Variance-covariance matrix Table A.4 summarizes the mean squared error statistics of the Monte Carlo experiments for the first element of the variance-covariance matrix Σ. The results are very comparable to the ones concerning the intercepts of the same MSI–VAR models, both with regard to the magnitude of the statistics and to the effects of the distance between regimes or the persistence of the regimes.

When increasing the number of variables in the system, the mean squared errors of the estimates of the variance-covariance matrix coefficients remain at roughly the same levels as for lower dimensional MSI–VAR models.

Table A.18 indicates that the variance estimation of MSI–VAR processes yields worse, though comparable, precision than the simpler VAR processes. However, the ratios surge for higher dimensional systems with small sample sizes, with a maximum ratio of 1161 for K = 20.

MSH–VAR

Intercepts Table A.7 shows the Monte Carlo experiment outcomes in the form of mean squared error statistics for the first element of the intercept vector A0, which is constant over regimes for MSH–VAR models. A larger distance between the regimes (in the MSH–VAR case, a larger difference in variance between regimes) helps the EM algorithm to estimate the intercepts more precisely, as the lower mean squared errors in the right-hand side panel show. Roughly, the statistics are at least two times higher for closer regimes.

A higher persistence in the processes notably worsens the precision with which the intercepts are estimated. Estimates have a lower mean squared error when the diagonals of the AR matrix are equal to 0.6 than for higher values of 0.9, with ratios of mean squared errors being above 10 in some cases.

The persistence of the regimes does not influence the precision of the intercepts’ estimates for simulations where regimes are close to each other. However, for the distant regimes cases, more persistent regimes are less precisely estimated, roughly by a factor of 2 in the ratios of mean squared errors.

Increasing the number of equations in the models diminishes the accuracy of the intercepts’ estimates by approximately a factor of 10 between the mean squared errors for K = 1 and those for K = 20, given a sufficient number of observations in the simulated series.

Finally, the ratios of mean squared errors for MSH–VAR models over equivalent VAR models, presented in Table A.19, indicate that the estimation of MSH–VAR processes overall produces more precise estimates, again if the sample size is large enough for the EM algorithm to estimate larger dimensional series.

Autoregressive coefficients The mean squared error statistics for the estimates of the first diagonal element of the AR coefficient matrix (i.e. A1) are presented in Table A.8. More distance between the variances of the regimes is associated with better estimates of the autoregressive coefficients, which can be observed in the table as statistics about 2 times smaller in the case of more distant MSH–VAR processes.

Contrary to the intercepts, more persistent processes are more easily estimated by the algorithm, and their estimates have slightly lower mean squared errors than those of comparable but less persistent processes. The persistence of the regimes has, however, the same effect on the AR coefficients as on the intercepts: simulated series with more persistent Markov regimes produce equally accurate estimates in general, but less accurate estimates in the case of distant processes.

Estimates of AR coefficients do not suffer from much loss of precision when the number of equations increases, and the estimates have quite comparable magnitudes of mean squared errors.

It is interesting to notice from Table A.20 that, for simulated processes with closer regimes, the MSH–VAR models are on par with the VAR models. This changes for simulated MSH–VAR processes with distant regimes, which tend to yield smaller mean squared errors than VAR models.




Variance-covariance matrix Table A.9 sums up the mean squared errors for the upper-left elements of the regime-dependent variance-covariance matrices Σ1 and Σ2. There are three true values for the parameter of interest: 1, 5, or 25. In all the experiments, the first regime always has the diagonals of its variance-covariance matrix equal to 1, whereas the second regime has values of either 5 or 25. Comparing the precision with which the variances of the first regime are estimated, it turns out that a larger distance between the variances of the two regimes does not diminish the reported mean squared error statistics. However, simulated series with larger distances between the regimes require a large sample size to be correctly estimated, as can be seen in the K = 20 case, where a sample size of 100 yields worse results for more distant processes. Neither changes in the persistence of the processes nor changes in the persistence of the regimes influence the results.

The critical factor influencing the mean squared error statistics is the true parameter value. Estimates of MSH–VAR processes generated with variance-covariance diagonals equal to 1 generally yield mean squared errors of around 0.005 for series with 800 observations. For true parameter values of 5 and 25 (for the second regimes of the simulated series), almost every experiment reports statistics of 0.15 and 3, respectively. Higher variances are less accurately estimated.

Increasing the number of equations, K, improves the precision of the EM algorithm for variance-covariance terms, given a sufficient sample size.

The ratios for the variance estimates with respect to estimates from VAR models are shown in Table A.21. The mean squared errors of MSH–VAR models are always higher than those of VAR models, by at least a factor of 2. The factor sharply increases for large dimensional models, where a limited sample size limits the estimation performance of the EM algorithm.

MSIAH–VAR

Intercepts MSIAH–VAR models have all of their parameters switching with the regimes. Table A.12 shows the mean squared errors for the first elements of the intercept vectors of each regime, namely A01 and A02. While estimates of the intercepts for the first regime are more or less equally precise when the distance between the two regimes of the processes varies, this is not the case for the estimates of the second regime. For the second regime, the estimates for simulated series with more distant regimes have a higher mean squared error. This can likely be attributed to the Monte Carlo design, where the second regime has a variance of 25 in the distant case, compared with 5 in the non-distant case.

Changes in the persistence of the processes do not lead to notable differences in the precision of the estimated coefficients. More persistence in the regimes increases the mean squared error of the estimates.

When the dimension of the processes increases, there is no big loss in estimation precision, except in the frequent case of a too small sample size. The sample size becomes a critical factor for MSIAH–VAR models, as can already be seen with bivariate series of 100 observations, which yield unexpectedly high mean squared errors.

The comparison of the estimation precision of intercepts for MSIAH–VAR models with VAR models, presented in Table A.22, reveals that the estimates for MSIAH–VAR models are in most cases more precise, particularly for lower dimensional series, for which the EM algorithm is less affected by a high number of parameters.

Autoregressive coefficients The first elements of the regime-dependent autoregressive matrices A11 and A12 are scrutinized in Table A.13. The distance between the processes of both regimes influences the mean squared error statistics of the parameters: more distant regimes are more precisely estimated when it comes to their autoregressive coefficients.

More persistent processes (higher absolute values of the AR coefficients) yield more accurate AR estimates, and the contrary happens for more persistent regimes. The mean squared errors for higher dimensional series are not much worse than those for lower dimensional ones, but the sample size is critical, as usual for MSIAH–VAR models.

Table A.23 shows that the ratios are much in favor of VAR estimates over MSIAH–VAR estimates, with ratios usually greater than one.

Variance-covariance matrix Table A.14 sums up the mean squared errors for the first diagonal element of the regime-dependent variance-covariance matrices Σ1 and Σ2. Examining the influence of the distance between the two regimes on the accuracy with which the variances are estimated, and again restricting the analysis to estimates for the first regime only, there appears to be little difference in the reported statistics for series with different distances. The EM algorithm produces equally precise estimates in both cases; however, the more distant case is more demanding in terms of sample size, as illustrated in the right-hand side panel.




The mean squared error statistics remain at very comparable levels, whether the regimes’ persistence is low or high. The persistence of the processes greatly influences the precision, as processes closer to the unit root have very high mean squared errors in the right-hand side panel.

Estimates of the variance for higher dimensional series have higher mean squared errors than lower dimensional ones, as the EM algorithm suffers from the high number of parameters to estimate.

Finally, the ratios of Table A.24 are generally much greater than unity, indicating that the variance estimates for MSIAH–VAR models have a higher mean squared error than those for VAR models. MSIAH–VAR models suffer from their high number of parameters to estimate in these experiments. This result may be improved by using longer samples in the simulated series, but it illustrates how MS-VAR models require a larger amount of data to produce estimates as precise as those of linear models such as vector autoregressions.

1.5.3 Accuracy in the regime estimation

In the influential contribution introducing dynamic switching models governed by one unobservable Markov process, Hamilton (1989) estimated univariate series of aggregate output. Allowing one state to represent recessionary regimes (low or negative output growth) and the other to represent expansions (positive output growth), Hamilton (1989) provided a technical framework to estimate business cycle turning points, which constituted the main innovation of the paper.

Constructing the chronology of business cycles subsequently became a popular application of Markov chain based models, as suggested in Goodwin (1993). Mainly, the U.S. business cycle turning points are compared to the ones from the National Bureau of Economic Research, which are obtained by the methodology developed in Mitchell and Burns (1938).21 The closeness between the estimated turning points and the NBER business cycle ones is taken as an indication of the performance of the EM estimation of turning points.

21 By contrast, the Markov-switching methodology from Hamilton (1989) makes use of a probability model without prior information.




Methodology

The present Monte Carlo experiment aims at assessing and quantifying the accuracy of the estimation of the regimes, which is closely related to the determination of the turning points. I will now review the statistics which give some insight into how correctly the EM algorithm estimates the latent Markov-switching variable, and then propose another statistic.

The EM algorithm returns the regime probabilities of the estimated model, often called smoothed probabilities ξt|T. From these, one can infer the state ŝt at each date, hence the history of regimes:

$$
\hat{s}_t = \arg\max_{j \in \{1, \dots, M\}} \xi_{j,t|T} = \arg\max_{j \in \{1, \dots, M\}} \Pr\left( s_t = j \mid Y_T \right)
$$

Before comparing the regimes, one first needs to match them. Indeed, the EM algorithm may assign regimes differently, for instance labeling regime 1 as regime 2, etc. Matching, or labeling, the simulated and estimated regimes is performed by mapping the occurrences of generated regimes to the estimated ones: for every generated regime, the estimated regime occurring the most frequently is mapped to it.
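The inference of regimes from smoothed probabilities and the matching of labels can be sketched as follows. The function names are hypothetical and regimes are coded 0, ..., M − 1 for convenience; the rule is the majority mapping described above, which is not guaranteed to be one-to-one when regimes are poorly separated.

```python
import numpy as np

def infer_regimes(xi):
    """Most probable regime at each date from (T, M) smoothed probabilities."""
    return np.argmax(xi, axis=1)

def match_labels(s_true, s_est, M=2):
    """Map each generated regime to the estimated label seen most often with it."""
    s_true, s_est = np.asarray(s_true), np.asarray(s_est)
    mapping = {}
    for m in range(M):
        est_when_m = s_est[s_true == m]
        if est_when_m.size:
            mapping[m] = int(np.bincount(est_when_m, minlength=M).argmax())
    return mapping

def relabel(s_est, mapping):
    """Rename estimated labels so that they use the generated regimes' names."""
    inverse = {est: true for true, est in mapping.items()}
    return np.array([inverse.get(int(e), int(e)) for e in s_est])
```

After relabeling, the simulated and estimated histories can be compared date by date.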

To compare simulated and estimated histories of regimes, I consider the quadratic probability score [QPS] and the log-probability score [LPS] originating in Diebold and Rudebusch (1989). These statistics, although traditionally used in forecasting exercises, can nevertheless be informative in such an analysis. They are written as:

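$$
\mathrm{QPS} = \frac{2}{T} \sum_{t=1}^{T} \left( \xi_{1,t|T} - I(s_t = 1) \right)^2, \tag{1.5}
$$

$$
\mathrm{LPS} = -\frac{1}{T} \sum_{t=1}^{T} \left( \left( 1 - I(s_t = 1) \right) \log\left( 1 - \xi_{1,t|T} \right) + I(s_t = 1) \log\left( \xi_{1,t|T} \right) \right), \tag{1.6}
$$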

where I(st=1) is an indicator function taking the value 1 when the actual regime is 1, and 0 otherwise. The QPS takes values between 0 and 2, 0 being the case of perfect regime estimation over the whole sample, for t = 1, · · · , T. The LPS is equal to 0 in the case of perfect regime estimation, but has no upper bound. The LPS penalizes larger regime estimation errors more than the QPS. Each statistic is then averaged over the N simulations of the Monte Carlo experiment.
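Both scores are straightforward to compute for a two-regime series. The sketch below uses hypothetical function names and takes the smoothed probability of regime 1 together with an indicator of whether the actual regime is 1. In the LPS, the logarithm is selected with `np.where` rather than multiplied by the indicator, so that a perfectly estimated observation contributes 0 instead of an undefined 0 · log 0.

```python
import numpy as np

def qps(xi1, regime_is_1):
    """Quadratic probability score, equation (1.5); lies between 0 and 2."""
    xi1 = np.asarray(xi1, float)
    ind = np.asarray(regime_is_1, float)
    return float(2.0 * np.mean((xi1 - ind) ** 2))

def lps(xi1, regime_is_1):
    """Log-probability score, equation (1.6); 0 if perfect, unbounded above."""
    xi1 = np.asarray(xi1, float)
    ind = np.asarray(regime_is_1, bool)
    with np.errstate(divide="ignore"):
        # log of the probability assigned to the regime that actually occurred
        log_score = np.where(ind, np.log(xi1), np.log(1.0 - xi1))
    return float(-np.mean(log_score))
```

For a series where the probability is always 0.5, `qps` gives 0.5 and `lps` gives log 2, whatever the actual regimes are.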

One issue obstructs the reporting of the LPS statistic in this Monte Carlo experiment. Suppose that at time t the regime is incorrectly estimated with full certainty, i.e. the algorithm yields $\xi_{1,t|T} = 1$ whereas $s_t = 2$. Then, in equation (1.6), the product $\left(1 - I_{(s_t = 1)}\right) \log\left(1 - \xi_{1,t|T}\right)$ becomes infinite, and so does the LPS statistic. Averaging the individual LPS statistics over the N simulations does not remove this problem: if a single regime is wrongly estimated in this way in one simulation only, the whole LPS average is infinite, which is not a desirable property. Therefore, I do not report the average LPS for the simulations.
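The following sketch illustrates the problem numerically. It assumes a hypothetical helper `lps_with_inf` that returns infinity when the probability attached to the realised regime is exactly zero (Python's `math.log` raises an error at zero, so the case is handled explicitly); the two short series are invented for illustration.

```python
import math


def lps_with_inf(xi1, regimes):
    """LPS of equation (1.6), returning infinity for a certain but wrong estimate."""
    total = 0.0
    for p, s in zip(xi1, regimes):
        q = p if s == 1 else 1.0 - p  # probability given to the realised regime
        if q == 0.0:
            return math.inf
        total -= math.log(q)
    return total / len(regimes)


# Two hypothetical simulations; in the second one the algorithm is
# certain of regime 1 at the last date although the true regime is 2.
lps_a = lps_with_inf([0.9, 0.2], [1, 2])
lps_b = lps_with_inf([0.9, 1.0], [1, 2])
average = (lps_a + lps_b) / 2
print(lps_a, lps_b, average)
```

A single date in a single simulation is enough to make the Monte Carlo average infinite, whatever the number of well-estimated simulations.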

Comparing actual to estimated regimes can be simplified to a discrete exercise in which the regimes are either correctly estimated or not. Accordingly, I consider a discrete statistic. I count the number of wrongly estimated regimes per series, i.e. the number of dates at which the simulated $s_t$ and the estimated $\hat{s}_t$ differ. Since the sample size varies across the simulations, I divide the result by the sample size T and express everything in percent. The statistic considered is thus the percentage of wrongly estimated regimes [%WER]:

\[
\%\mathrm{WER} = \frac{1}{T} \sum_{t=1}^{T} I_{(s_t \neq \hat{s}_t)}, \tag{1.7}
\]

where $I_{(s_t \neq \hat{s}_t)}$ is the indicator function taking the value 1 if the two regimes differ and 0 otherwise.
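A minimal sketch of equation (1.7), with the result scaled to percent as described in the text, could look as follows. The function name `pct_wer` and the example histories are hypothetical, and the estimated regimes are assumed to have been relabeled beforehand.

```python
def pct_wer(true_regimes, est_regimes):
    """Percentage of wrongly estimated regimes, equation (1.7) expressed in percent."""
    T = len(true_regimes)
    wrong = sum(1 for s, e in zip(true_regimes, est_regimes) if s != e)
    return 100.0 * wrong / T


# One date out of five is wrongly estimated.
print(pct_wer([1, 1, 2, 2, 1], [1, 2, 2, 2, 1]))  # 20.0
```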

The next section presents the results of the Monte Carlo experiment for the accuracy of regime estimation.

Results

The three models share a common characteristic, already reported in Section 1.5.1: the sample size necessary to estimate processes naturally increases with the dimension of the simulated series.

MSI–VAR. Table A.5 reports statistics on the accuracy of the regime estimation, namely the mean QPS and the mean %WER. Unequivocally, experiments with more distant regimes are estimated with more precision. Both the mean QPS and the mean %WER statistics indicate that the EM algorithm detects the regimes better when the distance between the intercepts of the two regimes is larger for MSI–VAR models.

The persistence in the processes, i.e. the diagonal of the autoregressive coefficient matrix, has no influence on the precision with which the regimes are estimated.

A lower persistence in the regimes, which in this experiment translates into values of 0.8 on the diagonal of the transition probability matrix, yields somewhat better accuracy in the regime estimation, with slightly lower mean QPS and mean %WER values.

The most remarkable result in this table is how the precision of the regime estimation increases with the dimension of the simulated series, best illustrated by the column of results on the left-hand side of the panel. For a sample size of 800 observations, the QPS and %WER statistics decrease dramatically when the number of variables in the model increases from 1 to 20. This does not necessarily occur when the sample size is equal to 100 and the series have 20 equations. In this case the mean percentage of wrongly estimated regimes can reach 30%.

MSH–VAR. The statistics for regime estimation of MSH–VAR models are collected in Table A.10. A greater distance between regimes, here between the variance-covariance matrices, which are the only source of differentiation between regimes, lowers the mean QPS statistics in every comparable instance of the experiment. The mean %WER statistics display the same trend.

The persistence in the processes does not affect the precision of the regime estimation. Yet a higher persistence in the regimes yields a more precise estimation of regimes.

As for MSI–VAR models, the most striking result of Table A.10 is the strong improvement in regime estimation that occurs when the dimension of the system increases. This improvement is of course conditional on having a sample size large enough for the EM algorithm to estimate high-dimensional series.

MSIAH–VAR. Lastly, Table A.15 shows information about the accuracy of the regime estimation for MSIAH–VAR models. Distance between regimes, here in intercepts, autoregressive coefficients, and variance-covariance matrices, once again plays a role in the precision of the regime estimation: more differentiated regimes yield better estimates of the hidden regime variable. As for MSH–VAR models, both the QPS and %WER statistics improve when the algorithm estimates more distant regimes.

More persistent processes, i.e. those closer to the unit root, have for the most part mean QPS and %WER statistics similar to those of less persistent processes. Likewise, no clear difference in the estimation of regimes emerges when comparing processes with different regime persistence.

Finally, the noticeable result of Table A.15 is the improvement in the estimation of the correct hidden-variable state when the number of equations in the model increases, again provided the sample size is sufficient for models with many equations. However, when the sample size is too small for the dimension of the series, the %WER can approach 50%, so there is a trade-off between the gain brought by the dimension of the series and the longer datasets required to estimate them.

1.6 Summary and implications

What are the finite-sample properties of the estimates of vector autoregressive models with switches in regime? Through Monte Carlo simulations, this paper aims to shed some light on these properties. Markov-switching vector autoregressive processes are non-linear, and their estimation, performed in this paper with the expectation maximization algorithm, can be cumbersome. Therefore the first feature of interest is whether such processes can be estimated at all, and under which conditions. The factors considered that may influence the estimation of MS–VAR processes are the sample size of the series, the distance between the regimes of the series, the persistence of the regimes, the distance of the series to the unit root, and finally the number of equations in the series. The estimates were then scrutinized, and I checked the accuracy of the intercepts, autoregressive coefficients, and variances. Finally, attention was given to the precision of the estimation of the hidden Markov process realizations, which is especially interesting for applications such as the determination of turning points.

The Monte Carlo experiments were performed on three types of MS–VAR models: models with regime switches in the intercepts only (MSI–VAR), models with regime switches in the variance only (MSH–VAR), and models with regime switches in all the model parameters, in other words the intercept vector, the autoregressive coefficient matrix, and the variance-covariance matrix (MSIAH–VAR).

Models with switches in intercepts were the most often successfully estimated by the expectation maximization algorithm in this Monte Carlo experiment, whereas models with switches in all the parameters proved tougher for the EM algorithm to estimate. A higher number of model parameters naturally makes the estimation more difficult for the algorithm. Among the other determinants facilitating the estimation, the distance between the regimes of the processes stands out for all three models. The ability to discriminate between the regimes in order to form good inference about the realization of the hidden regime variable is crucial for the convergence of the EM algorithm. Regimes that are too similar to each other yield more failures in the estimation. Other factors, such as the persistence of the processes or of the regimes, sometimes influenced the success rates of the experiments but did not do so consistently for all three models. Increasing the dimension of the models by adding more equations helps the EM algorithm to estimate the processes successfully; however, this gain comes at the expense of a larger sample size needed for the estimation, owing to the inflation in the number of parameters.

The finite-sample properties of the estimates are similarly influenced by the distance between the regimes: the parameters of processes with more distant regimes are estimated with equal or greater precision. All the parameter estimates of MSI–VAR models, consisting of the intercepts, autoregressive terms, and variances, have similar mean squared errors regardless of the distance between their regimes. MSH–VAR and MSIAH–VAR models with more distant regimes are, however, estimated slightly more accurately than their less distant counterparts. There are no robust results across the three types of models regarding the impact of the persistence in regimes or processes on the precision of the estimates; there is only a tendency toward less accurate parameter estimation for simulated series with more persistent processes. Unsurprisingly, every model considered benefits from additional data, especially larger-dimensional models and more parameter-intensive models such as the MSIAH–VAR models.

Subsequently, the ability of the EM algorithm to accurately estimate the hidden regime variable was analyzed within the Monte Carlo simulations. The outcomes are in line with the former results regarding the impact of the distance between the regimes of MS–VAR models, i.e. the farther apart the better. More persistence in the processes had no effect on the regime estimation for any of the models. More persistent regimes led to ambiguous results, with worse (MSI–VAR), better (MSH–VAR), or unchanged (MSIAH–VAR) estimation of the Markov process. Finally, and this is the main finding of this article, the precision with which regimes are determined by the EM algorithm showed unequivocal improvement when the dimension of the simulated series increased.

Based on this last result, a promising application of higher-dimensional Markov-switching models would revolve around the estimation of the hidden state. Keeping in mind the original application of such models in Hamilton (1989), one could expect the exercise of dating the turning points of the business cycle to gain in precision once more series are included in the models.



Appendix A

A.1 Monte Carlo experiment results for MS-VAR models

A.1.1 MSI–VAR

Table A.1 displays the percentage of failed estimations for 1000 simulated series. Table A.2 shows the mean squared error for the first element of the intercept of each regime, namely A01 and A02. Table A.3 does the same for the first element of the autoregressive matrix, namely A1. Table A.4 sums up the mean squared errors for the first diagonal element of the variance-covariance matrix. Finally, Table A.5 shows information about the accuracy of the regime estimation: the QPS and the percentage of wrongly estimated regimes.


Table A.1: This table contains summary results from 1000 Monte Carlo simulations of MSI–VAR processes. Each cell presents the percentage of failed estimations for 1000 simulated series. The parameters of the true data generating process are described in the column headers.

               (A01, A02) ∈ (−1, 1)                    (A01, A02) ∈ (−5, 5)
               A1 diagonals = 0.6  A1 diagonals = 0.9  A1 diagonals = 0.6  A1 diagonals = 0.9
# eq.  # obs.  P = 0.8   P = 0.95  P = 0.8   P = 0.95  P = 0.8   P = 0.95  P = 0.8   P = 0.95
1      100     3.7       1.3       4.5       4.1       0         3.5       0         7.5
       200     0.9       0.3       0.4       1         0         0.9       0         1.3
       400     0         0         0         0         0         0         0         0
       800     0         0         0         0         0         0         0         0
2      100     1.6       16.8      1.2       6.3       0         8.7       0         5.3
       200     0.1       9.3       0         1.6       0         3         0         1.3
       400     0         4.1       0         0         0         0.5       0         0
       800     0         0.7       0         0         0         0         0         0
5      100     2.6       23.1      1.3       7.2       0         13        0         4.3
       200     0.3       13.9      0.1       2.1       0         6.5       0         1.5
       400     0         4.6       0         0.1       0         1.1       0         0.1
       800     0         0.6       0         0         0         0.2       0         0
10     100     6.3       24        1.8       9.2       0.1       12.4      0         4.2
       200     0.7       14.1      0         2.1       0         7.5       0         1
       400     0         4.4       0         0.3       0         1.1       0         0.1
       800     0         0.4       0         0         0         0.1       0         0
20     100     9.9       23.9      7.1       15.5      0.5       12        0.7       3.8
       200     1.5       15.1      0         2.5       0         4.3       0         1.2
       400     0         4.4       0         0.4       0         1.4       0         0
       800     0         0.7       0         0         0         0.1       0         0

Table A.2: This table contains summary results from 1000 Monte Carlo simulations of MSI–VAR processes. Each cell reports the mean squared error for both first elements of the regime-varying intercept vectors A01 and A02. The parameters of the true data generating process are described in the column headers.

(A01, A02) ∈ (−1, 1)
               A1 diagonals = 0.6                      A1 diagonals = 0.9
               P = 0.8             P = 0.95            P = 0.8             P = 0.95
# eq.  # obs.  A01       A02       A01       A02       A01       A02       A01       A02
1      100     0.063     0.069     0.056     0.055     0.075     0.072     0.057     0.053
       200     0.028     0.028     0.026     0.025     0.032     0.032     0.025     0.025
       400     0.014     0.015     0.012     0.012     0.014     0.012     0.011     0.01
       800     0.0068    0.0068    0.0062    0.0058    0.0062    0.006     0.0046    0.0044
2      100     0.046     0.039     0.093     0.097     0.057     0.057     0.071     0.074
       200     0.018     0.018     0.045     0.041     0.02      0.018     0.028     0.027
       400     0.0079    0.0077    0.014     0.013     0.0075    0.0077    0.0081    0.0091
       800     0.0037    0.004     0.0068    0.0073    0.0035    0.0033    0.0039    0.0043
5      100     0.048     0.048     0.15      0.14      0.097     0.095     0.17      0.17
       200     0.015     0.014     0.074     0.076     0.02      0.02      0.047     0.047
       400     0.0065    0.0063    0.021     0.021     0.0073    0.0074    0.01      0.011
       800     0.0029    0.0032    0.0061    0.0059    0.0032    0.0031    0.0042    0.0041
10     100     0.078     0.091     0.21      0.2       0.29      0.29      0.38      0.33
       200     0.016     0.018     0.11      0.11      0.039     0.037     0.109     0.098
       400     0.0067    0.0069    0.024     0.025     0.0089    0.0085    0.02      0.017
       800     0.0031    0.0031    0.0065    0.0072    0.0034    0.0033    0.0059    0.0051
20     100     0.16      0.15      0.31      0.32      0.85      0.8       0.77      0.73
       200     0.042     0.043     0.14      0.14      0.12      0.11      0.22      0.23
       400     0.007     0.0072    0.077     0.077     0.016     0.015     0.063     0.057
       800     0.0033    0.0034    0.01      0.011     0.0045    0.0048    0.0063    0.0071

(A01, A02) ∈ (−5, 5)
               A1 diagonals = 0.6                      A1 diagonals = 0.9
               P = 0.8             P = 0.95            P = 0.8             P = 0.95
# eq.  # obs.  A01       A02       A01       A02       A01       A02       A01       A02
1      100     0.025     0.024     0.049     0.051     0.025     0.022     0.076     0.063
       200     0.013     0.012     0.025     0.026     0.012     0.011     0.019     0.039
       400     0.0063    0.0064    0.011     0.012     0.0055    0.0058    0.0074    0.0076
       800     0.003     0.0031    0.0059    0.0054    0.0026    0.0027    0.0037    0.0037
2      100     0.026     0.029     0.35      0.38      0.035     0.036     0.52      0.39
       200     0.013     0.013     0.12      0.1       0.014     0.014     0.087     0.104
       400     0.0062    0.0059    0.012     0.012     0.006     0.006     0.0094    0.0184
       800     0.0029    0.0028    0.0057    0.006     0.0027    0.0027    0.0035    0.0039
5      100     0.041     0.032     1.9       2.1       0.098     0.104     1.8       2.1
       200     0.014     0.013     0.47      0.48      0.019     0.021     0.68      0.57
       400     0.006     0.0059    0.044     0.058     0.007     0.007     0.079     0.046
       800     0.0032    0.0029    0.0063    0.0063    0.0028    0.003     0.0039    0.0041
10     100     0.18      0.23      5.9       5.8       0.75      0.72      5         5.6
       200     0.015     0.015     2.2       2.3       0.04      0.034     1.9       2
       400     0.0069    0.0072    0.21      0.22      0.0089    0.0096    0.21      0.17
       800     0.0035    0.0033    0.0064    0.0068    0.0034    0.0036    0.0096    0.0137
20     100     3.3       3.4       9.9       9.6       7.8       7.9       9.5       9.7
       200     0.03      0.029     7.2       7.2       0.31      0.26      4.8       5.1
       400     0.0074    0.0071    1.3       1.2       0.014     0.015     1.2       1.3
       800     0.0032    0.0031    0.01      0.023     0.0045    0.0047    0.027     0.015

Table A.3: This table contains summary results from 1000 Monte Carlo simulations of MSI–VAR processes. Each cell reports the mean squared error for the upper-left diagonal element of the regime-invariant autoregressive matrix A1. The parameters of the true data generating process are described in the column headers.

               (A01, A02) ∈ (−1, 1)                    (A01, A02) ∈ (−5, 5)
               A1 diagonals = 0.6  A1 diagonals = 0.9  A1 diagonals = 0.6  A1 diagonals = 0.9
# eq.  # obs.  P = 0.8   P = 0.95  P = 0.8   P = 0.95  P = 0.8   P = 0.95  P = 0.8   P = 0.95
1      100     0.054     0.028     0.055     0.029     0.021     0.021     0.02      0.044
       200     0.025     0.013     0.028     0.013     0.011     0.011     0.01      0.011
       400     0.013     0.0065    0.014     0.0064    0.0052    0.0054    0.0052    0.005
       800     0.0057    0.003     0.0063    0.0031    0.0027    0.0025    0.0024    0.0024
2      100     0.037     0.032     0.035     0.03      0.021     0.28      0.022     0.49
       200     0.016     0.013     0.016     0.012     0.01      0.064     0.01      0.13
       400     0.0073    0.0055    0.0069    0.0057    0.0049    0.0049    0.0053    0.011
       800     0.0035    0.0027    0.0035    0.0024    0.0025    0.0025    0.0027    0.0025
5      100     0.031     0.045     0.032     0.035     0.039     1.6       0.095     1.9
       200     0.012     0.016     0.012     0.012     0.011     0.33      0.01      0.92
       400     0.0054    0.0056    0.0058    0.0054    0.0053    0.027     0.0056    0.1
       800     0.0024    0.0023    0.0027    0.0024    0.0027    0.0027    0.0024    0.0029
10     100     0.046     0.07      0.054     0.066     1.2       4.6       1.6       3.2
       200     0.013     0.023     0.015     0.017     0.014     1.7       0.026     2.4
       400     0.0056    0.007     0.0061    0.006     0.0054    0.17      0.0057    0.33
       800     0.0027    0.003     0.0028    0.0025    0.0026    0.0028    0.0026    0.012
20     100     0.061     0.12      0.083     0.16      30        5.7       22        2.5
       200     0.027     0.052     0.03      0.045     0.085     5.4       0.59      3.7
       400     0.008     0.012     0.0095    0.0092    0.0088    0.97      0.0099    1.9
       800     0.0034    0.0033    0.0032    0.0032    0.003     0.014     0.003     0.04

Table A.4: This table contains summary results from 1000 Monte Carlo simulations of MSI–VAR processes. Each cell reports the mean squared error for the upper-left diagonal element of the regime-invariant variance-covariance matrix Σ. The parameters of the true data generating process are described in the column headers.

               (A01, A02) ∈ (−1, 1)                    (A01, A02) ∈ (−5, 5)
               A1 diagonals = 0.6  A1 diagonals = 0.9  A1 diagonals = 0.6  A1 diagonals = 0.9
# eq.  # obs.  P = 0.8   P = 0.95  P = 0.8   P = 0.95  P = 0.8   P = 0.95  P = 0.8   P = 0.95
1      100     0.054     0.028     0.055     0.029     0.021     0.021     0.02      0.044
       200     0.025     0.013     0.028     0.013     0.011     0.011     0.01      0.011
       400     0.013     0.0065    0.014     0.0064    0.0052    0.0054    0.0052    0.005
       800     0.0057    0.003     0.0063    0.0031    0.0027    0.0025    0.0024    0.0024
2      100     0.037     0.032     0.035     0.03      0.021     0.28      0.022     0.49
       200     0.016     0.013     0.016     0.012     0.01      0.064     0.01      0.13
       400     0.0073    0.0055    0.0069    0.0057    0.0049    0.0049    0.0053    0.011
       800     0.0035    0.0027    0.0035    0.0024    0.0025    0.0025    0.0027    0.0025
5      100     0.031     0.045     0.032     0.035     0.039     1.6       0.095     1.9
       200     0.012     0.016     0.012     0.012     0.011     0.33      0.01      0.92
       400     0.0054    0.0056    0.0058    0.0054    0.0053    0.027     0.0056    0.1
       800     0.0024    0.0023    0.0027    0.0024    0.0027    0.0027    0.0024    0.0029
10     100     0.046     0.07      0.054     0.066     1.2       4.6       1.6       3.2
       200     0.013     0.023     0.015     0.017     0.014     1.7       0.026     2.4
       400     0.0056    0.007     0.0061    0.006     0.0054    0.17      0.0057    0.33
       800     0.0027    0.003     0.0028    0.0025    0.0026    0.0028    0.0026    0.012
20     100     0.061     0.12      0.083     0.16      30        5.7       22        2.5
       200     0.027     0.052     0.03      0.045     0.085     5.4       0.59      3.7
       400     0.008     0.012     0.0095    0.0092    0.0088    0.97      0.0099    1.9
       800     0.0034    0.0033    0.0032    0.0032    0.003     0.014     0.003     0.04

Table A.5: This table contains summary results from 1000 Monte Carlo simulations of MSI–VAR processes. Each cell reports the accuracy in the estimation of the regime. Two statistics are presented: the mean Quadratic Probability Score, and the mean percentage of wrongly estimated regimes. The parameters of the true data generating process are described in the column headers.

(A01, A02) ∈ (−1, 1)
               A1 diagonals = 0.6                                  A1 diagonals = 0.9
               P = 0.8                   P = 0.95                  P = 0.8                   P = 0.95
# eq.  # obs.  QPS          %WER         QPS          %WER         QPS          %WER         QPS          %WER
1      100     0.2          13           0.069        4.4          0.2          13           0.085        5.3
       200     0.18         12           0.057        3.7          0.18         12           0.062        4
       400     0.17         12           0.054        3.5          0.17         12           0.054        3.6
       800     0.16         11           0.052        3.4          0.16         11           0.051        3.4
2      100     0.12         7.3          0.095        5.6          0.11         7            0.065        3.8
       200     0.085        5.6          0.051        3.3          0.085        5.6          0.033        2
       400     0.078        5.2          0.029        1.9          0.078        5.2          0.021        1.4
       800     0.075        5            0.022        1.4          0.076        5.1          0.021        1.4
5      100     0.082        4.5          0.22         12           0.063        3.4          0.18         9.3
       200     0.016        1            0.11         6.1          0.016        1            0.049        2.7
       400     0.013        0.83         0.019        1.2          0.013        0.87         0.0067       0.4
       800     0.012        0.8          0.0045       0.29         0.013        0.82         0.0032       0.2
10     100     0.24         12           0.49         25           0.21         10           0.38         19
       200     0.015        0.83         0.25         14           0.0084       0.45         0.18         9.2
       400     0.0018       0.1          0.039        2.2          0.0012       0.071        0.019        0.99
       800     0.00085      0.055        0.002        0.13         0.0009       0.057        0.00073      0.042
20     100     0.57         28           0.68         34           0.52         26           0.6          30
       200     0.2          10           0.57         29           0.12         6.3          0.44         22
       400     0.00048      0.025        0.21         11           0.00032      0.017        0.13         6.7
       800     0.0000048    0.00025      0.0087       0.5          0.0000026    0.00013      0.0016       0.083

(A01, A02) ∈ (−5, 5)
               A1 diagonals = 0.6                                  A1 diagonals = 0.9
               P = 0.8                   P = 0.95                  P = 0.8                   P = 0.95
# eq.  # obs.  QPS          %WER         QPS          %WER         QPS          %WER         QPS          %WER
1      100     5.5×10^−9    0            2.2×10^−9    0            9.4×10^−12   0            0.001        0.052
       200     9.2×10^−11   0            3.4×10^−13   0            1.4×10^−9    0            0.00019      0.0096
       400     0.0000029    0.00025      3.9×10^−13   0            4.2×10^−11   0            1.2×10^−12   0
       800     2.1×10^−9    0            0.000000096  0            5.3×10^−9    0            5.7×10^−10   0
2      100     9.2×10^−32   0            0.014        0.77         1.1×10^−31   0            0.023        1.2
       200     1.3×10^−32   0            0.0036       0.2          1.6×10^−32   0            0.0056       0.29
       400     2.1×10^−32   0            3.6×10^−32   0            3.8×10^−29   0            0.0003       0.016
       800     3.4×10^−27   0            4.2×10^−32   0            2.3×10^−32   0            4.2×10^−32   0
5      100     0.00041      0.021        0.093        4.9          0.0029       0.14         0.11         5.7
       200     1.5×10^−32   0            0.02         1.1          1.7×10^−32   0            0.043        2.2
       400     1.5×10^−32   0            0.0015       0.09         1.9×10^−32   0            0.0041       0.22
       800     2.1×10^−32   0            4.1×10^−32   0            2.2×10^−32   0            3.8×10^−32   0
10     100     0.016        0.83         0.3          15           0.039        2            0.28         14
       200     1.9×10^−32   0            0.11         5.7          0.00046      0.023        0.13         6.9
       400     1.6×10^−32   0            0.0088       0.5          2.4×10^−32   0            0.013        0.7
       800     2.1×10^−32   0            3.2×10^−32   0            2.1×10^−32   0            0.00046      0.024
20     100     0.32         16           0.55         27           0.32         16           0.47         23
       200     0.0017       0.085        0.38         19           0.022        1.1          0.33         17
       400     1.9×10^−32   0            0.055        3            1.8×10^−32   0            0.099        5
       800     2.2×10^−32   0            0.00042      0.023        2.1×10^−32   0            0.0013       0.067

A.1.2 MSH–VAR

Table A.6 displays the percentage of failed estimations for 1000 simulated series. Table A.7 shows the mean squared error for the first element of the intercept A0. Table A.8 does the same for the first element of the autoregressive matrix, namely A1. Table A.9 sums up the mean squared errors for the first diagonal element of the variance-covariance matrix, for each regime. Finally, Table A.10 shows information about the accuracy of the regime estimation: the QPS and the percentage of wrongly estimated regimes.


Table A.6: This table contains summary results from 1000 Monte Carlo simulations of MSH–VAR processes. Each cell reports the percentage of failed estimations for 1000 simulated series. The parameters of the true data generating process are described in the column headers.

               (Σ1, Σ2) ∈ (1, 5)                       (Σ1, Σ2) ∈ (1, 25)
               A1 diagonals = 0.6  A1 diagonals = 0.9  A1 diagonals = 0.6  A1 diagonals = 0.9
# eq.  # obs.  P = 0.8   P = 0.95  P = 0.8   P = 0.95  P = 0.8   P = 0.95  P = 0.8   P = 0.95
1      100     52.3      16.1      54.5      18.9      2.3       0.2       3.1       0.2
       200     40.7      5.9       40        5.8       0         0.2       0.1       0
       400     24.2      1.3       24.7      0.5       0         0         0         0
       800     10.6      0         10.2      0         0         0         0         0
2      100     30.9      4.7       27.9      5         1.2       0         0.6       0
       200     10.5      0.7       9.9       0.6       0         0         0         0
       400     1         0.1       1         0.2       0         0.1       0         0
       800     0.2       0.1       0.1       0.1       0.1       0         0         0
5      100     26.7      5.8       27        5.1       2.1       0.1       2.8       0.3
       200     2.1       0.2       1.9       0.1       0         0         0.1       0
       400     0         0         0         0         0         0         0         0
       800     0         0         0         0         0         0         0         0
10     100     24.1      10        27.2      11.3      10.9      5.2       12        4.4
       200     5.8       1.3       5.2       1.9       0.2       0         0.1       0
       400     0         0         0         0         0         0         0         0
       800     0         0         0         0         0         0         0         0
20     100     47        48.5      37.2      41.4      62.8      68.6      71.8      70.3
       200     13.6      7.5       12.7      8.9       2.9       9.1       5.9       7
       400     2.3       0         2.5       0         0         0         0.3       0
       800     0         0         0         0         0         0         0         0

Table A.7: This table contains summary results from 1000 Monte Carlo simulations of MSH–VAR processes. Each cell reports the mean squared error for the first element of the regime-invariant intercept vector A0. The parameters of the true data generating process are described in the column headers.

               (Σ1, Σ2) ∈ (1, 5)                       (Σ1, Σ2) ∈ (1, 25)
               A1 diagonals = 0.6  A1 diagonals = 0.9  A1 diagonals = 0.6  A1 diagonals = 0.9
# eq.  # obs.  P = 0.8   P = 0.95  P = 0.8   P = 0.95  P = 0.8   P = 0.95  P = 0.8   P = 0.95
1      100     0.08      0.084     0.44      0.5       0.073     0.074     0.2       0.32
       200     0.036     0.038     0.21      0.21      0.027     0.032     0.069     0.1
       400     0.015     0.015     0.067     0.063     0.011     0.015     0.025     0.035
       800     0.008     0.0077    0.025     0.029     0.0054    0.0071    0.01      0.014
2      100     0.14      0.13      0.94      0.93      0.086     0.11      0.42      0.82
       200     0.056     0.064     0.33      0.32      0.03      0.044     0.092     0.17
       400     0.025     0.027     0.1       0.13      0.013     0.021     0.036     0.062
       800     0.012     0.013     0.045     0.051     0.0063    0.0084    0.014     0.024
5      100     0.38      0.34      3.8       3.5       0.17      0.29      1.5       3.2
       200     0.12      0.14      0.84      1         0.055     0.11      0.25      0.68
       400     0.056     0.058     0.27      0.34      0.021     0.038     0.072     0.18
       800     0.023     0.026     0.11      0.13      0.01      0.02      0.029     0.057
10     100     1         1.1       14        15        1.3       1.4       15        19
       200     0.28      0.28      2.3       2.9       0.12      0.26      0.82      2.6
       400     0.097     0.11      0.6       0.87      0.041     0.088     0.18      0.56
       800     0.041     0.054     0.22      0.29      0.017     0.036     0.068     0.15
20     100     3.7       3.9       52        59        6         5.8       79        80
       200     0.97      0.93      13        13        1.2       1.4       12        19
       400     0.21      0.28      1.9       2.6       0.097     0.2       0.64      2.2
       800     0.091     0.12      0.59      0.74      0.035     0.082     0.16      0.51

Table A.8: This table contains summary results from 1000 Monte Carlo simulations of MSH–VAR processes. Each cell reports the mean squared error for the upper-left diagonal element of the regime-invariant autoregressive matrix A1. The parameters of the true data generating process are described in the column headers.

               (Σ1, Σ2) ∈ (1, 5)                       (Σ1, Σ2) ∈ (1, 25)
               A1 diagonals = 0.6  A1 diagonals = 0.9  A1 diagonals = 0.6  A1 diagonals = 0.9
# eq.  # obs.  P = 0.8   P = 0.95  P = 0.8   P = 0.95  P = 0.8   P = 0.95  P = 0.8   P = 0.95
1      100     0.0078    0.009     0.0043    0.0048    0.005     0.0075    0.0014    0.0028
       200     0.0039    0.0042    0.0018    0.0019    0.0017    0.0027    0.00049   0.00087
       400     0.0016    0.0015    0.00059   0.00059   0.00077   0.0015    0.00016   0.00029
       800     0.00082   0.00082   0.00021   0.00027   0.00035   0.00059   0.000073  0.00011
2      100     0.0089    0.0079    0.0051    0.0057    0.0038    0.0064    0.0019    0.0043
       200     0.0033    0.0039    0.0017    0.0018    0.0015    0.0029    0.00038   0.00083
       400     0.0015    0.0019    0.00056   0.00067   0.00061   0.0012    0.00015   0.0003
       800     0.00078   0.00088   0.00023   0.00027   0.00029   0.00049   0.00006   0.00011
5      100     0.013     0.013     0.012     0.014     0.0056    0.0088    0.0039    0.011
       200     0.0038    0.0044    0.0025    0.0034    0.0012    0.0032    0.00058   0.002
       400     0.0017    0.0018    0.00071   0.00088   0.00053   0.0011    0.00015   0.00048
       800     0.00071   0.00079   0.00025   0.00034   0.00023   0.00052   0.000058  0.00014
10     100     0.029     0.027     0.037     0.039     0.024     0.035     0.036     0.049
       200     0.0058    0.0059    0.0055    0.0072    0.002     0.0051    0.0014    0.0058
       400     0.0018    0.0022    0.0012    0.0017    0.0006    0.0016    0.00025   0.0011
       800     0.00073   0.00093   0.00035   0.00048   0.00025   0.00064   0.000074  0.00023
20     100     0.083     0.081     0.14      0.15      0.11      0.12      0.19      0.19
       200     0.017     0.017     0.028     0.03      0.017     0.028     0.026     0.043
       400     0.0027    0.0038    0.0033    0.0049    0.001     0.0029    0.00076   0.0039
       800     0.00091   0.0011    0.0007    0.0011    0.00032   0.00089   0.00014   0.00066

Table A.9: This table contains summary results from 1000 Monte Carlo simulations of MSH–VAR processes. Each cell reports the mean squared errors for each upper-left diagonal element of the variance-covariance matrices Σ1 and Σ2. The parameters of the true data generating process are described in the column headers.

(Σ1, Σ2) ∈ (1, 5)
               A1 diagonals = 0.6                      A1 diagonals = 0.9
               P = 0.8             P = 0.95            P = 0.8             P = 0.95
# eq.  # obs.  Σ1        Σ2        Σ1        Σ2        Σ1        Σ2        Σ1        Σ2
1      100     0.32      1.88      0.11      1.49      0.27      1.71      0.12      1.56
       200     0.13      0.85      0.06      0.87      0.13      0.85      0.069     0.887
       400     0.062     0.542     0.032     0.428     0.06      0.49      0.038     0.436
       800     0.036     0.308     0.02      0.22      0.031     0.302     0.017     0.21
2      100     0.29      1.88      0.11      1.55      0.32      1.78      0.1       1.4
       200     0.083     0.884     0.037     0.68      0.096     0.842     0.046     0.64
       400     0.033     0.386     0.017     0.307     0.032     0.394     0.016     0.337
       800     0.036     0.207     0.0076    0.1589    0.019     0.186     0.0072    0.1622
5      100     0.76      2.22      0.31      1.58      0.74      1.97      0.25      1.47
       200     0.065     0.69      0.085     0.586     0.083     0.676     0.058     0.635
       400     0.017     0.3       0.013     0.285     0.018     0.289     0.012     0.265
       800     0.008     0.145     0.0055    0.1347    0.0077    0.1337    0.0055    0.1297
10     100     1.6       3.5       0.88      2.35      1.6       3.9       0.92      2.73
       200     0.22      0.75      0.26      0.7       0.17      0.78      0.18      0.69
       400     0.014     0.301     0.013     0.292     0.014     0.288     0.012     0.277
       800     0.006     0.131     0.0054    0.1354    0.0064    0.1321    0.0058    0.1299
20     100     4         4.6       2.4       4.2       2.7       6         2.3       5
       200     1.7       2.8       1.3       2         1.6       3.1       1         2.1
       400     0.079     0.378     0.046     0.323     0.095     0.389     0.051     0.341
       800     0.0063    0.1295    0.0059    0.1472    0.007     0.129     0.0069    0.1325

(Σ1, Σ2) ∈ (1, 25)
               A1 diagonals = 0.6                      A1 diagonals = 0.9
               P = 0.8             P = 0.95            P = 0.8             P = 0.95
# eq.  # obs.  Σ1        Σ2        Σ1        Σ2        Σ1        Σ2        Σ1        Σ2
1      100     0.31      34.86     0.068     28.437    0.8       37.7      0.098     30.987
       200     0.056     16.71     0.33      14.53     0.053     19.23     0.03      15.52
       400     0.14      8.18      0.015     7.388     0.024     8.448     0.014     7.071
       800     0.012     4.429     0.0069    3.8938    0.3       4.4       0.0065    3.4997
2      100     0.84      31.58     0.097     28.342    1.1       27.3      0.057     28.512
       200     0.36      13.67     0.025     14.173    0.031     13.993    0.024     14.828
       400     0.015     7.199     0.011     7.38      0.015     6.015     0.011     6.853
       800     0.0075    3.3123    0.0058    3.394     0.0075    3.3187    0.0062    3.2603
5      100     2.3       29.5      0.45      28.77     0.9       30.3      0.33      28.36
       200     0.027     12.835    0.026     13.658    0.024     12.813    0.027     15.783
       400     0.012     6.079     0.011     6.59      0.011     6.48      0.011     6.788
       800     0.005     3.109     0.0054    3.4078    0.0059    3.3885    0.0051    3.2821
10     100     32        56        19        58        34        64        18        59
       200     0.042     12.949    1.9       16.6      0.064     13.405    0.94      16.87
       400     0.012     6.524     0.012     6.748     0.012     6.847     0.012     6.539
       800     0.0054    3.2613    0.0054    3.1898    0.0052    2.9326    0.0057    3.1783
20     100     135       117       66        165       109       172       58        276
       200     24        31        32        37        24        39        27        34
       400     0.018     6.748     0.59      7.96      0.088     6.518     0.028     8.096
       800     0.0073    3.2197    0.0067    3.4137    0.007     2.935     0.0069    3.5707

Table A.10: This table contains summary results from 1000 Monte Carlo simulations of MSH–VAR processes. Each cell reports the accuracy in the estimation of the regime. Two statistics are presented: the mean Quadratic Probability Score, and the mean percentage of wrongly estimated regimes. The parameters of the true data generating process are described in the column headers.

(Σ1, Σ2) ∈ (1, 5)
               A1 diagonals = 0.6                                  A1 diagonals = 0.9
               P = 0.8                   P = 0.95                  P = 0.8                   P = 0.95
# eq.  # obs.  QPS          %WER         QPS          %WER         QPS          %WER         QPS          %WER
1      100     0.37         26           0.2          13           0.38         26           0.21         13
       200     0.36         26           0.19         13           0.36         26           0.18         12
       400     0.35         26           0.17         12           0.35         26           0.17         12
       800     0.34         26           0.17         11           0.35         26           0.17         11
2      100     0.31         21           0.14         9.1          0.33         21           0.14         8.7
       200     0.28         19           0.11         7.2          0.28         19           0.11         7.3
       400     0.26         18           0.098        6.6          0.26         18           0.098        6.6
       800     0.25         18           0.094        6.3          0.25         18           0.093        6.2
5      100     0.34         19           0.13         7.3          0.34         19           0.13         7.5
       200     0.16         10           0.049        3.1          0.16         10           0.05         3.2
       400     0.12         8.4          0.038        2.5          0.13         8.5          0.037        2.4
       800     0.12         7.9          0.035        2.2          0.12         7.9          0.034        2.3
10     100     0.62         31           0.37         18           0.64         32           0.37         18
       200     0.16         8.7          0.074        3.9          0.14         7.6          0.061        3.3
       400     0.056        3.5          0.016        0.96         0.056        3.5          0.015        0.94
       800     0.045        3            0.012        0.78         0.046        3            0.012        0.78
20     100     0.83         41           0.62         31           0.83         41           0.63         31
       200     0.65         32           0.47         23           0.66         33           0.45         23
       400     0.049        2.5          0.021        1.1          0.047        2.4          0.018        0.91
       800     0.012        0.69         0.0035       0.2          0.012        0.67         0.0033       0.19

(Σ1, Σ2) ∈ (1, 25)
               A1 diagonals = 0.6                                  A1 diagonals = 0.9
               P = 0.8                   P = 0.95                  P = 0.8                   P = 0.95
# eq.  # obs.  QPS          %WER         QPS          %WER         QPS          %WER         QPS          %WER
1      100     0.21         14           0.074        4.8          0.21         14           0.077        5
       200     0.19         13           0.064        4.2          0.19         13           0.063        4.2
       400     0.18         13           0.059        3.9          0.18         13           0.059        3.9
       800     0.18         13           0.059        3.9          0.18         13           0.058        3.8
2      100     0.11         7.1          0.034        2.2          0.11         6.8          0.033        2.1
       200     0.091        6.1          0.026        1.7          0.09         6            0.026        1.6
       400     0.086        5.8          0.023        1.5          0.086        5.8          0.023        1.5
       800     0.084        5.7          0.023        1.5          0.084        5.6          0.023        1.5
5      100     0.048        2.6          0.02         1.1          0.036        1.9          0.017        0.91
       200     0.018        1.1          0.0065       0.4          0.017        0.99         0.0059       0.36
       400     0.013        0.83         0.0042       0.26         0.013        0.81         0.004        0.26
       800     0.012        0.77         0.0036       0.23         0.012        0.74         0.0034       0.22
10     100     0.34         17           0.24         12           0.35         17           0.24         12
       200     0.0057       0.29         0.012        0.58         0.0047       0.24         0.0071       0.36
       400     0.0012       0.066        0.00077      0.04         0.0011       0.06         0.00066      0.034
       800     0.0007       0.042        0.00032      0.018        0.00065      0.04         0.00028      0.017
20     100     0.74         37           0.53         26           0.74         37           0.51         25
       200     0.31         15           0.32         16           0.31         16           0.29         15
       400     0.00046      0.023        0.0022       0.11         0.001        0.052        0.00069      0.034
       800     0.000017     0.00088      0.000015     0.00075      4.6×10^−10   0            0.000012     0.00063

A.1.3 MSIAH–VAR

Table A.11 displays the percentage of failed estimations for 1000 simulated series. Table A.12 shows the mean squared error for the first element of the intercept of each regime, namely A01 and A02. Table A.13 does the same for the first element of the autoregressive matrix, namely A11 and A12. Table A.14 sums up the mean squared errors for the first diagonal element of the variance-covariance matrix, for each regime. Finally, Table A.15 shows information about the accuracy of the regime estimation: the QPS and the percentage of wrongly estimated regimes.


Table A.11: MSIAH–VAR processes. Each cell presents the percentage of failed estimations for 1000 simulated series. The parameters of the true data generating process are described in the column headers.

               (A01, A02) ∈ (−1, 1), (Σ1, Σ2) ∈ (1, 5)             (A01, A02) ∈ (−5, 5), (Σ1, Σ2) ∈ (1, 25)
               (A11, A12) ∈ (−0.6, 0.6)  (A11, A12) ∈ (−0.9, 0.9)  (A11, A12) ∈ (−0.6, 0.6)  (A11, A12) ∈ (−0.9, 0.9)
# eq.  # obs.  P = 0.8      P = 0.95     P = 0.8      P = 0.95     P = 0.8      P = 0.95     P = 0.8      P = 0.95
1      100     5.5          4.4          1.5          1            1            10.1         0.3          0.3
       200     0.4          1.2          0.4          0.2          0.1          6.3          0.1          0
       400     0            0.3          0.1          0            0            5.8          0            0
       800     0            0.2          0            0            0            4.5          0            0
2      100     7.4          7.1          4.5          3.7          1.2          7.6          3.5          4.5
       200     0.8          2            1.6          1.6          0.9          2.9          1.3          1
       400     0            0.2          0.2          0.3          0            1            0.3          0.1
       800     0            0            0            0            0            0.4          0.1          0
5      100     18.1         10.6         14.3         9.9          9.1          20           13.5         14.8
       200     1            2.2          2.3          2.5          0.7          6.7          2.7          4.1
       400     0            0            0            0.3          0            0.2          0            0.5
       800     0            0            0            0            0            0            0            0.1
10     100     17.5         17.9         10.2         5.8          31.2         64.4         17.8         46.2
       200     12.6         7.9          12.7         8            6            35.7         14.2         15.5
       400     0            0.6          1.6          2.8          0.1          5.9          2.1          4.3
       800     0            0            0            0            0            0.1          0            0.1
20     100     7.2          15.6         2.7          5.5          75.4         84.2         37.8         62.6
       200     11           12.1         5.9          2.2          29           80.3         15           52.4
       400     11.3         8.6          9.3          4.5          7.2          51.7         11           14.8
       800     0.2          0.1          1.7          3.4          0            2.9          2.1          5.1

Table A.12: This table contains summary results from 1000 Monte Carlo simulations of MSIAH–VAR processes. Each cell reports the mean squared error for the first elements of each regime-varying intercept vector, A01 and A02. The parameters of the true data generating process are described in the column headers.

               (A01, A02) ∈ (−1, 1), (Σ1, Σ2) ∈ (1, 5)             (A01, A02) ∈ (−5, 5), (Σ1, Σ2) ∈ (1, 25)
               (A11, A12) ∈ (−0.6, 0.6)  (A11, A12) ∈ (−0.9, 0.9)  (A11, A12) ∈ (−0.6, 0.6)  (A11, A12) ∈ (−0.9, 0.9)
# eq.  # obs.  P = 0.8      P = 0.95     P = 0.8      P = 0.95     P = 0.8      P = 0.95     P = 0.8      P = 0.95

1100

0.050.37

0.0560.51

0.0390.22

0.390.78

0.131.02

2.53

0.0490.99

1.56.2

2000.021

0.1190.02

0.150.012

0.0790.013

0.190.073

0.52.4

1.30.011

0.380.011

1.24400

0.00810.047

0.00790.052

0.00570.036

0.00580.072

0.0350.26

2.710.68

0.00460.19

0.00550.47

8000.0038

0.0220.0033

0.0250.0029

0.0160.0026

0.0340.0028

0.112.44

0.360.0025

0.0850.0025

0.21

2100

0.110.3

0.20.53

0.170.28

0.181.62

1.21.6

2.94.3

1.53

7.815.2

2000.02

0.0920.036

0.170.032

0.0880.034

0.310.2

0.540.77

1.380.42

0.50.53

3.17400

0.00830.038

0.0070.063

0.00690.036

0.00650.095

0.00550.23

0.230.65

0.00520.18

0.0710.73

8000.0033

0.020.0032

0.0290.0026

0.0160.0026

0.0390.0028

0.110.0034

0.280.0027

0.0860.0025

0.24

5100

0.440.39

0.650.8

0.720.78

1.63.4

5.33.1

13.88.6

15.27.9

3259

2000.029

0.0960.074

0.250.059

0.110.14

0.490.17

0.572.3

22.11

0.812.9

8.5400

0.0060.043

0.00650.097

0.00550.037

0.00530.14

0.00550.24

0.0370.82

0.00480.2

0.0460.76

8000.0029

0.0220.0033

0.0450.0024

0.0170.0027

0.0520.0028

0.120.004

0.370.0027

0.0890.0027

0.3

10100

1.010.79

0.960.91

1.71.6

2.32.9

2113

2916

3328

73125

2000.31

0.210.6

0.430.96

0.361.5

1.42.9

1.413

2.922.6

5.940

31400

0.00660.05

0.0160.145

0.0440.05

0.170.19

0.00540.26

0.670.94

0.710.28

4.23.9

8000.0025

0.0210.0037

0.0610.0026

0.0190.0058

0.0620.0027

0.130.0044

0.430.0025

0.0930.0027

0.31

20100

0.851.25

0.941.42

1.12.4

1.47.4

2330

2652

2455

137347

2001.03

0.781.02

0.771.45

0.971.4

1.322

1226.7

8.929

2245

36400

0.330.2

0.530.29

1.30.31

1.80.58

1.170.63

13.71.4

29.55.4

3915

8000.0026

0.0240.0055

0.0810.037

0.0230.29

0.0980.0032

0.120.0087

0.51.64

0.185.3

1.1


Table A.13: This table contains summary results from 1000 Monte Carlo simulations of MSIAH–VAR processes. Each cell reports the mean squared error for each upper-left diagonal element of the regime-varying autoregressive coefficient matrices A11 and A12, over 1000 simulated series. The parameters of the true data generating process are described in the column headers.

#eq. | #obs. | (A01, A02) ∈ (−1, 1), (Σ1, Σ2) ∈ (1, 5) | (A01, A02) ∈ (−5, 5), (Σ1, Σ2) ∈ (1, 25)
     |       | (A11, A12) ∈ (−0.6, 0.6) | (A11, A12) ∈ (−0.9, 0.9) | (A11, A12) ∈ (−0.6, 0.6) | (A11, A12) ∈ (−0.9, 0.9)
     |       | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95

1100

0.0160.028

0.030.043

0.00220.0097

0.010.03

0.00480.0107

0.2060.025

0.0010.0029

0.00990.015

2000.0043

0.0110.0092

0.0120.00044

0.00270.0007

0.00390.0053

0.00480.22

0.00790.00003

0.000880.000033

0.001400

0.00160.0042

0.00440.0043

0.000190.001

0.000260.0011

0.00270.0025

0.250.0047

0.0000140.00041

0.0000140.00036

8000.0007

0.0020.0011

0.00190.000096

0.000440.00011

0.000470.000049

0.0010.2232

0.00270.0000066

0.000180.0000066

0.00015

2100

0.0520.042

0.1080.041

0.0580.056

0.160.094

0.0860.041

0.170.032

0.160.093

0.250.14

2000.0081

0.00950.018

0.00990.022

0.0140.021

0.0160.011

0.0090.032

0.00850.038

0.0240.087

0.023400

0.00270.0043

0.00250.0036

0.000310.0014

0.000480.00154

0.000260.0029

0.01050.0051

0.0000610.0011

0.020.012

8000.00074

0.00190.001

0.00170.00013

0.000550.00021

0.000560.00014

0.00150.00035

0.00140.000029

0.000470.000052

0.00044

5100

0.20.098

0.30.096

0.320.24

0.680.27

0.260.11

0.670.68

0.740.42

2.630.79

2000.014

0.0120.043

0.0150.021

0.0230.049

0.0330.0041

0.0110.105

0.0150.052

0.0380.26

0.068400

0.00150.0046

0.00260.0044

0.00130.0022

0.000910.0029

0.000430.0041

0.00190.004

0.000110.0024

0.000820.0034

8000.00078

0.0020.0012

0.0020.00019

0.000920.00034

0.000880.00019

0.002210.00061

0.00170.000041

0.000930.000092

0.00089

10100

0.40.3

0.420.19

0.890.84

0.950.69

0.610.39

0.80.2

1.71.5

2.81.3

2000.14

0.0540.31

0.0550.37

0.30.66

0.270.11

0.0370.546

0.0490.6

0.451.78

0.57400

0.00220.006

0.00710.0064

0.0120.016

0.0650.022

0.000550.0061

0.030.0074

0.0160.014

0.180.036

8000.00074

0.00230.0014

0.00230.00026

0.00140.0026

0.00220.00023

0.00240.00073

0.00210.000058

0.00170.0002

0.0019

20100

0.350.45

0.370.34

0.961.15

0.880.89

0.640.78

0.580.46

2.62.4

2.31.8

2000.41

0.280.44

0.180.85

0.780.91

0.620.57

0.290.75

0.181.3

1.21.94

0.94400

0.130.062

0.270.041

0.510.42

0.830.33

0.040.019

0.490.034

0.690.53

1.490.43

8000.001

0.00320.003

0.00360.014

0.0140.098

0.030.00029

0.00380.0018

0.00330.026

0.0180.15

0.034


Table A.14: This table contains summary results from 1000 Monte Carlo simulations of MSIAH–VAR processes. Each cell reports the mean squared error for each upper-left diagonal element of the variance-covariance matrices Σ1 and Σ2, over 1000 simulated series. The parameters of the true data generating process are described in the column headers.

#eq. | #obs. | (A01, A02) ∈ (−1, 1), (Σ1, Σ2) ∈ (1, 5) | (A01, A02) ∈ (−5, 5), (Σ1, Σ2) ∈ (1, 25)
     |       | (A11, A12) ∈ (−0.6, 0.6) | (A11, A12) ∈ (−0.9, 0.9) | (A11, A12) ∈ (−0.6, 0.6) | (A11, A12) ∈ (−0.9, 0.9)
     |       | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95

1100

0.411.89

0.251.73

1.73.1

0.321.67

4.338.6

16129

5.7680.7

30216

2000.093

0.780.075

0.720.028

0.610.025

0.60.089

27.41.8

1300.024

13.240.024

14.62400

0.0220.35

0.0160.3

0.0130.29

0.0110.27

0.03710.205

2.2174

0.0117.138

0.0116.6

8000.01

0.170.0062

0.130.0065

0.130.0054

0.130.0069

3.242.2

161.60.0054

3.310.0054

3.07

2100

1.11.6

2.51.8

3214

1650

45127

40115

22967965

592312217

2000.067

0.660.11

0.681.8

6.82.7

19.50.6

35.30.34

48.17289

1164273

4792400

0.0250.27

0.0140.27

0.0710.43

0.0430.98

0.0116.22

0.2212.9

0.0116.41

15441

8000.0073

0.130.0055

0.140.0051

0.130.0053

0.140.0057

3.410.0055

3.330.0051

3.30.005

3.16

5100

5.62.8

3.92.8

121159

171131

334289

310213

2295040942

3724062616

2000.2

0.60.55

0.728.5

11.916

232.6

35.412

744149

40803691

11860400

0.0120.29

0.0110.29

0.650.27

0.0110.29

0.0116.84

0.0168.91

0.0116.65

50365

8000.0061

0.130.0051

0.140.005

0.130.0053

0.120.0053

2.90.0058

3.350.0053

3.160.0054

3.31

10100

103

43.6

253272

152137

1133556

549189

4053869353

4278334929

2003.7

1.23.4

1.6169

149149

182185

139220

14235798

4857843886

71326400

0.0180.33

0.0650.34

6.53.9

1718

0.0127.82

1.827

11211383

468111170

8000.0057

0.130.0057

0.150.0057

0.140.29

0.770.0054

3.330.0063

3.410.0052

3.760.0054

3.88

20100

2.85.8

1.38.4

82102

3033

489139

220245

1776522755

108628346

2009.4

1.74.1

3.2234

216100

1011158

576454

11538456

5704032973

28781400

4.310.66

3.110.83

243203

182170

8258

203117

4788657068

4080769931

8000.0072

0.190.031

0.25.9

5.237

260.0076

4.770.07

4.892450

24536305

10867


Table A.15: This table contains summary results from 1000 Monte Carlo simulations of MSIAH–VAR processes. Each cell reports the accuracy in the estimation of regime. Two statistics are presented: the mean Quadratic Probability Score (QPS), and the mean percentage of wrongly estimated regimes (% WER). The parameters of the true data generating process are described in the column headers.

#eq. | #obs. | (Σ1, Σ2) ∈ (1, 5) | (Σ1, Σ2) ∈ (1, 25)
     |       | (A11, A12) ∈ (−0.6, 0.6) | (A11, A12) ∈ (−0.9, 0.9) | (A11, A12) ∈ (−0.6, 0.6) | (A11, A12) ∈ (−0.9, 0.9)
     |       | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95
     |       | QPS | % WER | QPS | % WER | QPS | % WER | QPS | % WER | QPS | % WER | QPS | % WER | QPS | % WER | QPS | % WER

1100

0.2214

0.0986.2

0.0845.6

0.0291.8

0.053.1

0.0462.8

0.0241.5

0.00810.5

2000.19

130.071

4.70.083

5.70.025

1.60.049

30.044

2.80.022

1.40.0059

0.38400

0.1813

0.0654.3

0.0815.6

0.0231.5

0.0452.9

0.053.2

0.0211.3

0.00540.35

8000.18

130.063

4.20.079

5.50.023

1.50.044

2.80.046

2.90.021

1.40.0056

0.36

2100

0.148.5

0.084.6

0.0512.9

0.0362

0.0422.3

0.0392.1

0.0341.8

0.0311.5

2000.093

6.10.032

2.10.03

1.90.016

0.890.017

0.990.011

0.640.012

0.690.01

0.55400

0.0865.6

0.0241.6

0.021.3

0.00690.43

0.0110.68

0.00490.28

0.00360.22

0.00260.15

8000.083

5.50.024

1.60.019

1.30.0055

0.360.011

0.670.0024

0.150.0035

0.220.0014

0.082

5100

0.2915

0.2513

0.2512

0.2713

0.157.7

0.2111

0.2814

0.3216

2000.034

20.029

1.60.019

10.03

1.50.0054

0.290.026

1.40.03

1.60.04

2400

0.0181.2

0.00480.3

0.00150.086

0.000370.024

0.000570.035

0.000440.023

0.0000650.0043

0.000660.035

8000.017

1.10.0045

0.280.00074

0.0480.00031

0.0190.00052

0.0310.000085

0.00530.000082

0.00510.00006

0.0036

10100

0.7738

0.6130

0.839

0.6632

0.6834

0.6130

0.8140

0.7235

2000.21

100.28

140.34

170.41

210.079

40.21

110.42

210.47

24400

0.00440.25

0.00470.25

0.0130.68

0.0331.7

0.0000180.001

0.00660.33

0.0120.58

0.0472.4

8000.0028

0.180.00073

0.0460.000012

0.000750.00095

0.0480.000014

0.000753.9×

10 −9

04.6×

10 −9

00.000000013

0

20100

0.8542

0.7537

0.8441

0.7235

0.9246

0.8844

0.944

0.8643

2000.86

430.76

380.87

430.76

380.71

350.64

320.88

440.85

42400

0.2412

0.2814

0.5528

0.6131

0.0341.7

0.2412

0.6130

0.5829

8000.00016

0.00860.001

0.0520.014

0.680.063

3.26.9×

10 −14

00.00008

0.0040.022

1.10.062

3.1


A.2 Statistics’ ratios of MS-VAR models over VAR models

A.2.1 MSI–VAR vs. VAR

Table A.16 shows the ratios of the mean squared errors for the first elements of the intercept vectors for each regime, namely A01 and A02, over the mean squared error for the first element of the intercept vector of similar VAR processes. Table A.17 does the same for the autoregressive matrix, namely A1, and Table A.18 for the first diagonal element of the variance-covariance matrix.
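The ratios reported in this section can be illustrated with a short sketch. It is a hedged example rather than the code behind the tables: the function names and arguments are hypothetical, and each mean squared error is assumed to be taken over the replications of one parameter element around its true value.

```python
def mse(estimates, true_value):
    """Mean squared error of one scalar parameter across replications."""
    return sum((est - true_value) ** 2 for est in estimates) / len(estimates)


def mse_ratio(ms_var_estimates, ms_var_true, var_estimates, var_true):
    """Ratio of the MS-VAR mean squared error over the VAR one.

    A ratio above 1 means the MS-VAR parameter is estimated less
    precisely than its counterpart in the comparable VAR process.
    The VAR mean squared error is assumed to be strictly positive.
    """
    return mse(ms_var_estimates, ms_var_true) / mse(var_estimates, var_true)
```

For instance, MS-VAR estimates of 0.8, 1.2 and 1.1 against VAR estimates of 0.9, 1.1 and 1.0, both around a true value of 1, give a ratio of about 4.5.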


Table A.16: This table contains ratios of the mean squared errors of MSI–VAR processes over the mean squared errors of VAR processes, over 1000 experiments for each type of model. Each cell reports the ratios for both first elements of the regime-varying intercept vectors A01 and A02 of MSI–VAR processes over the first element of the A0 intercept vector of VAR processes with equivalent parameters. The parameters of the true data generating process are described in the column headers.

#eq. | #obs. | (A01, A02) ∈ (−1, 1) | (A01, A02) ∈ (−5, 5)
     |       | A1 diagonals = 0.6 | A1 diagonals = 0.9 | A1 diagonals = 0.6 | A1 diagonals = 0.9
     |       | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95

1100

0.931.01

0.820.81

0.140.13

0.1020.095

0.0210.02

0.040.042

0.00210.0019

0.00660.0054

2001.1

10.95

0.940.19

0.180.14

0.140.024

0.0220.047

0.0480.0029

0.00270.0046

0.0096400

1.11.1

0.960.9

0.220.19

0.180.17

0.0240.024

0.0430.044

0.00350.0037

0.00470.0048

8001.1

1.10.98

0.920.22

0.210.16

0.160.024

0.0250.047

0.0430.0039

0.00390.0055

0.0055

2100

0.440.38

0.890.92

0.0530.053

0.0650.069

0.010.011

0.140.15

0.00140.0014

0.0210.015

2000.36

0.360.91

0.830.053

0.0480.074

0.0740.012

0.0120.11

0.090.0015

0.00150.0093

0.011400

0.350.34

0.620.6

0.0520.053

0.0550.063

0.0120.011

0.0230.022

0.00190.0019

0.00290.0057

8000.33

0.350.6

0.640.058

0.0540.064

0.070.012

0.0110.023

0.0240.0019

0.00180.0024

0.0026

5100

0.150.15

0.480.46

0.0280.028

0.050.05

0.00520.0041

0.250.26

0.00120.0013

0.0220.026

2000.12

0.120.59

0.60.018

0.0180.043

0.0430.0053

0.00490.17

0.180.00076

0.000840.028

0.023400

0.110.11

0.370.36

0.0220.022

0.030.034

0.00460.0045

0.0340.045

0.000750.00075

0.00850.005

8000.1

0.110.22

0.210.021

0.0210.028

0.0270.0047

0.00430.0093

0.00930.00079

0.000820.0011

0.0011

10100

0.120.14

0.320.31

0.0280.028

0.0370.032

0.0120.015

0.370.36

0.00310.003

0.0210.024

2000.059

0.0680.42

0.410.015

0.0150.043

0.0390.0022

0.00230.34

0.350.00056

0.000480.026

0.028400

0.0570.058

0.20.21

0.0090.0086

0.020.018

0.00240.0025

0.0720.077

0.000390.00042

0.00910.0077

8000.059

0.0580.12

0.140.01

0.00970.017

0.0150.0026

0.00250.0048

0.00520.0004

0.000430.0012

0.0016

20100

0.090.089

0.180.18

0.0250.024

0.0230.022

0.0780.08

0.230.23

0.00970.0098

0.0120.012

2000.066

0.0690.22

0.220.013

0.0120.024

0.0250.002

0.00190.47

0.470.0014

0.00120.022

0.023400

0.0260.027

0.290.29

0.00590.0055

0.0230.021

0.00130.0012

0.210.21

0.000230.00024

0.0190.02

8000.03

0.030.093

0.0950.0056

0.0060.0079

0.00880.0012

0.00120.0037

0.00840.00022

0.000230.00134

0.00073


Table A.17: This table contains ratios of the mean squared errors of MSI–VAR processes over the mean squared errors of VAR processes, over 1000 experiments for each type of model. Each cell reports the ratios for the upper-left diagonal of A1, the matrix of regime invariant autoregressive coefficients for MSI–VAR processes over the first element of the coefficients matrix A1 for VAR processes with equivalent parameters. The parameters of the true data generating process are described in the column headers.

#eq. | #obs. | (A01, A02) ∈ (−1, 1) | (A01, A02) ∈ (−5, 5)
     |       | A1 diagonals = 0.6 | A1 diagonals = 0.9 | A1 diagonals = 0.6 | A1 diagonals = 0.9
     |       | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95
1  | 100 | 1 | 0.74 | 0.65 | 0.23 | 0.022 | 0.037 | 0.008 | 0.012
1  | 200 | 1.1 | 0.79 | 0.67 | 0.25 | 0.028 | 0.045 | 0.0089 | 0.012
1  | 400 | 1.2 | 0.77 | 0.71 | 0.23 | 0.024 | 0.042 | 0.011 | 0.0059
1  | 800 | 1.1 | 0.87 | 0.64 | 0.25 | 0.025 | 0.041 | 0.011 | 0.007
2  | 100 | 1.1 | 0.95 | 0.9 | 0.76 | 0.47 | 0.93 | 0.47 | 3.5
2  | 200 | 0.94 | 0.92 | 0.62 | 0.63 | 0.53 | 0.64 | 0.42 | 2
2  | 400 | 0.89 | 0.86 | 0.74 | 0.63 | 0.48 | 0.48 | 0.52 | 0.76
2  | 800 | 0.88 | 0.73 | 0.74 | 0.53 | 0.54 | 0.5 | 0.48 | 0.49
5  | 100 | 1.6 | 2.5 | 1.3 | 1.9 | 0.84 | 5.9 | 0.92 | 6.1
5  | 200 | 0.91 | 1.9 | 0.86 | 1.5 | 0.83 | 2.3 | 0.7 | 5.7
5  | 400 | 0.84 | 0.99 | 0.74 | 0.82 | 0.84 | 0.98 | 0.82 | 2.5
5  | 800 | 0.83 | 0.87 | 0.79 | 0.77 | 0.79 | 0.74 | 0.93 | 0.91
10 | 100 | 2.7 | 3.5 | 1.9 | 2.1 | 2.9 | 10 | 1.9 | 6
10 | 200 | 1.4 | 3.1 | 0.95 | 1.9 | 0.89 | 7.6 | 0.95 | 7.6
10 | 400 | 1 | 1.6 | 0.86 | 1.2 | 0.98 | 2 | 0.79 | 2.6
10 | 800 | 1 | 1.2 | 0.93 | 0.98 | 0.88 | 0.99 | 0.9 | 1.1
20 | 100 | 3.2 | 2.6 | 1.7 | 1.7 | 9 | 6.5 | 4 | 3.3
20 | 200 | 3.3 | 4.8 | 1.5 | 2.3 | 1 | 15 | 1.6 | 5.5
20 | 400 | 0.89 | 3.4 | 0.94 | 1.9 | 1.1 | 5.9 | 0.94 | 5.6
20 | 800 | 1 | 1.2 | 0.95 | 1.1 | 0.9 | 1.1 | 0.96 | 1.1


Table A.18: This table contains ratios of the mean squared errors of MSI–VAR processes over the mean squared errors of VAR processes, over 1000 experiments for each type of model. Each cell reports the ratios for the upper-left diagonal of the regime invariant variance-covariance matrix Σ for MSI–VAR processes over the first element of the variance-covariance matrix Σ for VAR processes with equivalent parameters. The parameters of the true data generating process are described in the column headers.

#eq. | #obs. | (A01, A02) ∈ (−1, 1) | (A01, A02) ∈ (−5, 5)
     |       | A1 diagonals = 0.6 | A1 diagonals = 0.9 | A1 diagonals = 0.6 | A1 diagonals = 0.9
     |       | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95

1100

2.61.3

2.71.4

1.11.1

0.972.2

2002.3

1.22.7

1.31

10.97

1.1400

2.51.2

31.4

11.1

10.98

8002.2

1.22.6

1.31.2

1.10.98

1

2100

1.81.6

1.81.5

1.114

1.124

2001.5

1.31.6

1.20.95

6.10.94

12400

1.61.2

1.41.2

11

0.982.1

8001.4

1.11.4

0.940.97

0.970.98

0.91

5100

1.42.1

1.41.6

280

4.588

2001.1

1.61.1

1.11.1

311

92400

1.11.1

10.98

15.2

1.121

8000.97

0.91.1

0.981.1

1.10.98

1.2

10100

2.13.2

2.22.7

47189

66132

2001.2

21.3

1.51.3

1592.3

214400

1.21.5

1.21.2

1.134

1.164

8001.1

1.21.2

11

1.11

4.6

20100

2.44.7

2.54.8

1161221

62373

2002.5

4.82.3

3.47.7

48747

292400

1.52.4

1.81.8

1.6181

1.9356

8001.4

1.31.2

1.21.1

5.11.1

15


A.2.2 MSH–VAR vs. VAR

Table A.19 shows the ratios of the mean squared error for the first element of A0, the intercept vector, over the mean squared error for the first element of the intercept vector of similar VAR processes. Table A.20 does the same for the autoregressive matrix, and Table A.21 for the first diagonal element of the variance-covariance matrix.


Table A.19: This table contains ratios of the mean squared errors of MSH–VAR processes over the mean squared errors of VAR processes, over 1000 experiments for each type of model. Each cell reports the ratios for the first element of the intercept vector A0 of MSH–VAR processes over the first element of the A0 intercept vector of VAR processes with equivalent parameters. The parameters of the true data generating process are described in the column headers.

#eq. | #obs. | (Σ1, Σ2) ∈ (1, 5) | (Σ1, Σ2) ∈ (1, 25)
     |       | A1 diagonals = 0.6 | A1 diagonals = 0.9 | A1 diagonals = 0.6 | A1 diagonals = 0.9
     |       | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95
1  | 100 | 0.75 | 0.79 | 0.75 | 0.84 | 0.22 | 0.22 | 0.2 | 0.34
1  | 200 | 0.75 | 0.79 | 1 | 0.99 | 0.17 | 0.2 | 0.17 | 0.25
1  | 400 | 0.64 | 0.61 | 0.84 | 0.79 | 0.14 | 0.2 | 0.17 | 0.24
1  | 800 | 0.67 | 0.65 | 0.66 | 0.79 | 0.14 | 0.18 | 0.16 | 0.22
2  | 100 | 0.89 | 0.79 | 0.9 | 0.89 | 0.22 | 0.28 | 0.24 | 0.46
2  | 200 | 0.8 | 0.91 | 0.91 | 0.9 | 0.16 | 0.24 | 0.16 | 0.3
2  | 400 | 0.76 | 0.81 | 0.66 | 0.86 | 0.15 | 0.25 | 0.16 | 0.28
2  | 800 | 0.71 | 0.8 | 0.7 | 0.79 | 0.15 | 0.2 | 0.14 | 0.24
5  | 100 | 0.98 | 0.88 | 1.1 | 1 | 0.27 | 0.47 | 0.32 | 0.7
5  | 200 | 0.78 | 0.91 | 0.74 | 0.92 | 0.2 | 0.42 | 0.16 | 0.44
5  | 400 | 0.8 | 0.82 | 0.69 | 0.86 | 0.17 | 0.31 | 0.16 | 0.4
5  | 800 | 0.73 | 0.82 | 0.63 | 0.78 | 0.18 | 0.36 | 0.16 | 0.3
10 | 100 | 1.4 | 1.5 | 1.2 | 1.3 | 0.94 | 1 | 1.1 | 1.5
10 | 200 | 0.98 | 1 | 0.8 | 0.98 | 0.24 | 0.54 | 0.25 | 0.8
10 | 400 | 0.77 | 0.86 | 0.7 | 1 | 0.21 | 0.44 | 0.16 | 0.5
10 | 800 | 0.59 | 0.79 | 0.62 | 0.81 | 0.19 | 0.4 | 0.18 | 0.4
20 | 100 | 1.9 | 2.1 | 1.6 | 1.9 | 2.5 | 2.4 | 2 | 2
20 | 200 | 1.5 | 1.4 | 1.3 | 1.4 | 1.4 | 1.7 | 1.2 | 1.8
20 | 400 | 0.83 | 1.1 | 0.73 | 1 | 0.31 | 0.65 | 0.21 | 0.73
20 | 800 | 0.78 | 1 | 0.7 | 0.87 | 0.24 | 0.58 | 0.19 | 0.6


Table A.20: This table contains ratios of the mean squared errors of MSH–VAR processes over the mean squared errors of VAR processes, over 1000 experiments for each type of model. Each cell reports the ratios for the upper-left diagonal of A1, the matrix of regime invariant autoregressive coefficients for MSH–VAR processes over the first element of the coefficients matrix A1 for VAR processes with equivalent parameters. The parameters of the true data generating process are described in the column headers.

#eq. | #obs. | (Σ1, Σ2) ∈ (1, 5) | (Σ1, Σ2) ∈ (1, 25)
     |       | A1 diagonals = 0.6 | A1 diagonals = 0.9 | A1 diagonals = 0.6 | A1 diagonals = 0.9
     |       | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95
1  | 100 | 0.99 | 1.2 | 0.92 | 1 | 0.63 | 0.94 | 0.31 | 0.6
1  | 200 | 1.1 | 1.2 | 1.1 | 1.1 | 0.47 | 0.72 | 0.26 | 0.46
1  | 400 | 0.98 | 0.91 | 0.93 | 0.92 | 0.46 | 0.88 | 0.26 | 0.46
1  | 800 | 1.1 | 1.1 | 0.73 | 0.92 | 0.42 | 0.72 | 0.26 | 0.4
2  | 100 | 1.1 | 0.98 | 0.89 | 1 | 0.46 | 0.79 | 0.29 | 0.67
2  | 200 | 0.91 | 1.1 | 0.86 | 0.92 | 0.4 | 0.78 | 0.19 | 0.42
2  | 400 | 0.97 | 1.2 | 0.71 | 0.85 | 0.35 | 0.72 | 0.21 | 0.42
2  | 800 | 0.98 | 1.1 | 0.72 | 0.84 | 0.34 | 0.57 | 0.2 | 0.37
5  | 100 | 1.1 | 1.1 | 0.99 | 1.2 | 0.53 | 0.83 | 0.28 | 0.77
5  | 200 | 0.89 | 1 | 0.79 | 1 | 0.32 | 0.81 | 0.18 | 0.6
5  | 400 | 0.95 | 1 | 0.72 | 0.9 | 0.29 | 0.63 | 0.14 | 0.47
5  | 800 | 0.78 | 0.87 | 0.64 | 0.86 | 0.27 | 0.6 | 0.16 | 0.39
10 | 100 | 1.7 | 1.6 | 1.2 | 1.2 | 1.4 | 2 | 1.2 | 1.6
10 | 200 | 1 | 1 | 0.79 | 1 | 0.32 | 0.82 | 0.18 | 0.75
10 | 400 | 0.76 | 0.93 | 0.65 | 0.89 | 0.27 | 0.71 | 0.13 | 0.55
10 | 800 | 0.8 | 1 | 0.67 | 0.91 | 0.27 | 0.69 | 0.14 | 0.44
20 | 100 | 2.4 | 2.3 | 1.5 | 1.6 | 3 | 3.3 | 2.2 | 2.2
20 | 200 | 1.7 | 1.7 | 1.3 | 1.4 | 1.7 | 2.8 | 1.1 | 1.9
20 | 400 | 0.83 | 1.2 | 0.67 | 1 | 0.31 | 0.9 | 0.15 | 0.77
20 | 800 | 0.74 | 0.92 | 0.55 | 0.88 | 0.29 | 0.81 | 0.12 | 0.54


Table A.21: This table contains ratios of the mean squared errors of MSH–VAR processes over the mean squared errors of VAR processes, over 1000 experiments for each type of model. Each cell reports the ratios for each upper-left diagonal of the regime-dependent variance-covariance matrices Σ1 and Σ2 for MSH–VAR processes over the first element of the variance-covariance matrix Σ for VAR processes with equivalent parameters. The parameters of the true data generating process are described in the column headers.

#eq. | #obs. | (Σ1, Σ2) ∈ (1, 5) | (Σ1, Σ2) ∈ (1, 25)
     |       | A1 diagonals = 0.6 | A1 diagonals = 0.9 | A1 diagonals = 0.6 | A1 diagonals = 0.9
     |       | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95

1100

15.53.4

5.12.7

13.13.4

5.73.1

14.72.7

3.32.2

39.62.9

4.82.4

20011.4

3.45.4

3.412.6

3.66.7

3.85

2.629.8

2.35.1

3.32.9

2.6400

11.84.4

6.23.5

13.14.1

8.23.7

27.92.5

2.92.3

5.32.6

3.12.2

80014

4.87.9

3.413

5.17.2

3.54.5

2.72.7

2.4126

2.82.7

2.2

2100

14.13.8

5.53.1

163.5

5.12.7

41.12.6

4.82.3

53.82.1

2.92.2

2008

3.43.6

2.69.8

3.44.8

2.634.5

2.22.4

2.33.1

2.32.5

2.5400

7.33.2

3.92.6

6.63.1

3.22.6

3.32.5

2.52.5

3.11.9

2.32.2

80014.4

3.53

2.77.4

2.92.8

2.63

22.3

2.12.9

2.22.4

2.2

5100

354.2

14.22.9

33.23.4

11.42.6

104.22.1

212

40.72.1

152

2006.2

2.78.1

2.37.5

2.75.3

2.62.6

22.4

2.12.2

1.92.4

2.3400

3.52.3

2.72.2

3.22.2

2.22.1

2.32

2.12.2

1.92

22.1

8003.2

2.32.2

2.13.1

22.2

1.92

2.12.2

2.32.4

1.92

1.9

10100

73.66.3

39.64.3

67.56.1

37.74.3

1452.73.8

841.63.9

1398.64.2

757.93.9

20019.7

2.823.4

2.714.3

2.815.8

2.43.7

2167.1

2.65.5

1.981

2.4400

2.92.4

2.72.3

2.82.1

2.32

2.61.9

2.62

2.52.1

2.42

8002.4

1.92.1

22.7

2.22.5

2.22.1

2.22.1

2.22.2

22.4

2.1

20100

158.27.2

93.66.5

81.57.1

68.65.9

5285.67.2

258810

3258.57.8

172813

200156

10118.5

7.1121.8

9.779

6.72197.5

4.82947.7

5.71854.8

4.62051

4400

152.7

8.62.3

18.12.9

9.82.5

3.42.1

111.12.5

16.71.8

5.32.3

8002.6

1.92.4

2.22.7

1.92.7

23

22.7

2.12.7

1.92.6

2.3


A.2.3 MSIAH–VAR vs. VAR

Table A.22 shows the ratios of the mean squared errors for the first elements of the intercept vectors for each regime, namely A01 and A02, over the mean squared error for the first element of the intercept vector of similar VAR processes. Table A.23 does the same for the autoregressive matrix, and Table A.24 for the first diagonal element of the variance-covariance matrix.


Table A.22: This table contains ratios of the mean squared errors of MSIAH–VAR processes over the mean squared errors of VAR processes, over 1000 experiments for each type of model. Each cell reports the ratios for both first elements of the regime-varying intercept vectors A01 and A02 of MSIAH–VAR processes over the first element of the A0 intercept vector of VAR processes with equivalent parameters. The parameters of the true data generating process are described in the column headers.

#eq. | #obs. | (A01, A02) ∈ (−1, 1), (Σ1, Σ2) ∈ (1, 5) | (A01, A02) ∈ (−5, 5), (Σ1, Σ2) ∈ (1, 25)
     |       | (A11, A12) ∈ (−0.6, 0.6) | (A11, A12) ∈ (−0.9, 0.9) | (A11, A12) ∈ (−0.6, 0.6) | (A11, A12) ∈ (−0.9, 0.9)
     |       | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95

1100

0.743.49

0.834.78

0.070.37

0.691.33

1.860.68

372

0.0880.077

2.690.49

2000.77

2.460.73

3.150.07

0.380.077

0.9362.7

0.787.4

1.70.062

0.0930.066

0.302400

0.631.94

0.612.14

0.0910.456

0.0940.903

2.690.75

2092

0.0740.116

0.0870.293

8000.6

1.90.51

2.070.1

0.430.094

0.9240.44

0.74384.4

2.40.088

0.1160.087

0.286

2100

1.11.9

1.93.3

0.150.27

0.171.55

11.320.58

27.51.6

1.350.12

7.250.59

2000.4

1.30.73

2.360.086

0.2450.092

0.8544.09

0.4215.6

1.11.128

0.0551.42

0.35400

0.371.13

0.311.88

0.0480.231

0.0450.606

0.240.38

101

0.0360.054

0.490.22

8000.29

1.230.28

1.740.043

0.2440.042

0.6060.25

0.390.3

0.950.045

0.0610.042

0.173

5100

1.41

2.12.1

0.210.22

0.480.97

16.870.41

441.1

4.4350.087

9.30.65

2000.23

0.620.59

1.570.054

0.0980.13

0.431.39

0.1818.56

0.641.92

0.0282.6

0.3400

0.110.62

0.121.39

0.0160.094

0.0160.347

0.0970.163

0.650.56

0.0140.021

0.1360.081

8000.1

0.70.12

1.410.016

0.10.018

0.3010.1

0.180.14

0.520.018

0.0250.018

0.082

10100

1.61.1

1.51.2

0.160.14

0.220.26

33.150.75

44.260.94

3.170.11

7.10.5

2001.15

0.752.3

1.50.38

0.120.6

0.4810.72

0.2148.67

0.428.913

0.08115.75

0.43400

0.0550.399

0.131.15

0.0440.058

0.170.22

0.0450.088

5.690.31

0.7160.013

4.30.18

8000.047

0.3140.069

0.8870.0077

0.05270.017

0.1710.051

0.1030.083

0.3360.0074

0.01160.0079

0.0377

20100

0.490.65

0.540.74

0.0320.074

0.040.23

12.940.74

15.11.3

0.70.063

40.4

2001.6

1.21.6

1.20.16

0.10.16

0.1435.57

0.7342.63

0.553.188

0.0964.91

0.15400

1.250.81

21.2

0.480.12

0.660.22

4.360.092

51.30.2

10.8690.085

14.520.24

8000.023

0.2020.049

0.6960.046

0.0270.36

0.120.028

0.040.078

0.1662.0484

0.00896.656

0.055


Table A.23: This table contains ratios of the mean squared errors of MSIAH–VAR processes over the mean squared errors of VAR processes, over 1000 experiments for each type of model. Each cell reports the ratios for both upper-left diagonals of A11 and A12, the matrices of regime-varying autoregressive coefficients for MSIAH–VAR processes over the first element of the coefficients matrix A1 for VAR processes with equivalent parameters. The parameters of the true data generating process are described in the column headers.

#eq. | #obs. | (A01, A02) ∈ (−1, 1), (Σ1, Σ2) ∈ (1, 5) | (A01, A02) ∈ (−5, 5), (Σ1, Σ2) ∈ (1, 25)
     |       | (A11, A12) ∈ (−0.6, 0.6) | (A11, A12) ∈ (−0.9, 0.9) | (A11, A12) ∈ (−0.6, 0.6) | (A11, A12) ∈ (−0.9, 0.9)
     |       | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95

1100

1.93.6

3.65.6

0.432.08

26.4

0.581.38

24.93.3

0.20.63

1.93.3

2001.3

2.92.7

3.30.27

1.580.43

2.351.6

1.364.5

2.10.018

0.5770.02

0.67400

0.922.53

2.62.6

0.311.6

0.421.72

1.61.4

148.52.7

0.0230.658

0.0230.571

8000.9

2.51.4

2.50.36

1.520.41

1.620.063

1.285287.6

3.40.025

0.6490.025

0.559

2100

6.35.2

13.15.1

8.79.8

2416

10.45.1

20.13.9

2414

3921

2002.2

2.65

2.79.9

7.19.7

8.33

2.38.8

2.217

1240

11400

1.62.7

1.52.3

0.431.78

0.681.94

0.151.55

6.42.8

0.0861.423

2916

8000.89

2.341.2

2.10.44

1.720.68

1.750.16

1.750.42

1.610.094

1.7150.17

1.64

5100

18.58.7

27.48.5

2620

5622

24.29.2

6160

6031

21358

2003.5

2.910.2

3.46.4

715

100.97

2.7825

3.716

1178

19400

0.742.54

1.32.5

1.22.3

0.842.96

0.222.28

0.922.27

0.0972.366

0.753.42

8000.9

2.21.3

2.20.5

2.40.9

2.20.21

2.370.7

1.80.11

2.480.24

2.37

10100

2317

2411

3027

3222

3523

4612

5850

9544

20025.4

9.356.6

9.447

4485

4020.1

6.2100.7

8.178

60230

76400

12.5

3.32.7

6.68.5

3512

0.262.83

13.83.4

8.87.6

9720

8000.88

2.551.7

2.50.48

2.634.7

4.20.27

2.930.87

2.560.11

3.040.37

3.38

20100

9.612.9

10.19.9

1113

9.89.8

1722

1613

2927

2520

20039

2942

1938

3641

2955

2772

1661

5388

41400

3919

7713

9885

16067

11.65.7

14010

133102

28783

8000.84

2.642.4

2.911

1179

240.23

3.351.4

2.921

14120

26


Table A.24: This table contains ratios of the mean squared errors of MSIAH–VAR processes over the mean squared errors of VAR processes, over 1000 experiments for each type of model. Each cell reports the ratios for each upper-left diagonal of the regime-dependent variance-covariance matrices Σ1 and Σ2 for MSIAH–VAR processes over the first element of the variance-covariance matrix Σ for VAR processes with equivalent parameters. The parameters of the true data generating process are described in the column headers.

#eq. | #obs. | (A01, A02) ∈ (−1, 1), (Σ1, Σ2) ∈ (1, 5) | (A01, A02) ∈ (−5, 5), (Σ1, Σ2) ∈ (1, 25)
     |       | (A11, A12) ∈ (−0.6, 0.6) | (A11, A12) ∈ (−0.9, 0.9) | (A11, A12) ∈ (−0.6, 0.6) | (A11, A12) ∈ (−0.9, 0.9)
     |       | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95 | P = 0.8 | P = 0.95

1100

203.4

11.93.1

86.26.2

15.73.4

2083

77410

28353

146617

2008.4

3.16.7

2.82.8

2.62.4

2.57.9

4.2160

202.3

22.3

2.2400

4.32.8

3.12.4

2.92.5

2.52.2

7.13.3

42356

2.52.3

2.52.2

8003.9

2.72.4

22.7

2.22.3

2.22.7

2.2870

1092.2

2.22.2

2

2100

52.73.2

124.73.6

160528

79699

222510

1977.49.2

114289646

294785990

2006.5

2.510.6

2.6182

27281

7857.9

5.732.9

7.729655

20728020

854400

5.62.3

3.12.2

14.63.4

8.77.7

2.42.1

48.84.3

2.22.1

3049145

8002.9

2.22.2

2.32

22.1

2.12.2

2.22.2

2.12

2.11.9

2

5100

256.65.2

1805.3

5456276

7693227

1542923

1428517

10338972879

16776444404

20019.3

2.452.1

2.9776

481481

93246.6

5.21142

11376785

621335191

1806400

2.32.2

2.22.3

116.82.1

22.3

2.22

3.22.7

22.1

8950115

8002.4

22.1

2.22

22.1

1.92.1

1.82.3

2.12.2

22.2

2.1

10100

449.35.5

181.86.6

10397425

6250215

5101440

2471414

16684074828

17607832431

200334.4

4.6309

614532

52912840

64416542

2019733

203077969

70403773396

10337400

3.82.6

13.62.7

129528

3342131

2.62.6

379.59.1

222665455

9296313674

8002.3

1.92.2

2.22.4

2.4121

132.1

22.5

22.2

2.32.3

2.4

20100

1099

5113

2478121

89539

19137.78.4

862415

5335221031

326214378

200869

6376

1117743

6847601

320106968

8241976

162918646

73242502548

3695400

813.14.8

5876

463781494

347391255

1554317

3824634

912227717238

777372521124

8002.9

2.912

32281

7914343

3863.1

328.4

3.1939499

14442417495

6398


Bibliography

Ang, A. and G. Bekaert (2002). Regime Switches in Interest Rates. Journal of Business & Economic Statistics 20(2), 163–182.

Bellone, B. (2005). Classical Estimation of Multivariate Markov-Switching Models using MSVARlib. Econometrics 0508017, EconWPA.

Dai, Q., K. J. Singleton, and W. Yang (2007). Regime Shifts in a Dynamic Term Structure Model of U.S. Treasury Bond Yields. Review of Financial Studies 20(5), 1669–1706.

Davig, T. (2004). Regime-Switching Debt and Taxation. Journal of Monetary Economics 51(4), 837–859.

Diebold, F. X. and G. D. Rudebusch (1989). Scoring the Leading Indicators. The Journal of Business 62(3), 369–391.

Ehrmann, M., M. Ellison, and N. Valla (2003). Regime-Dependent Impulse Response Functions in a Markov-Switching Vector Autoregression Model. Economics Letters 78(3), 295–299.

Goodwin, T. H. (1993). Business-Cycle Analysis with a Markov-Switching Model. Journal of Business & Economic Statistics 11(3), 331–339.

Hamilton, J. D. (1989). A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle. Econometrica 57(2), 357–384.

Hamilton, J. D. (1990). Analysis of Time Series Subject to Changes in Regime. Journal of Econometrics 45(1-2), 39–70.

Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press.

Hamilton, J. D. (2008). Regime-Switching Models. The New Palgrave Dictionary of Economics.

Hansen, B. E. (2001). The New Econometrics of Structural Change: Dating Breaks in U.S. Labor Productivity. The Journal of Economic Perspectives 15(4), 117–128.

Hendry, D. (1996). On the Constancy of Time-Series Econometric Equations. Economic and Social Review 27, 401–422.

Jeanne, O. and P. Masson (2000). Currency Crises, Sunspots and Markov-Switching Regimes. Journal of International Economics 50(2), 327–350.

Kim, C. and C. Nelson (1999). State-Space Models with Regime Switching. MIT Press, Cambridge, MA.



Krolzig, H. (1997). Markov-Switching Vector Autoregressions: Modelling, Statistical Inference, and Application to Business Cycle Analysis. Springer Verlag.

Krolzig, H. (1998). Econometric Modelling of Markov-Switching Vector Autoregressions using MSVAR for Ox. Unpublished, Nuffield College, Oxford.

Lanne, M., H. Lütkepohl, and K. Maciejowska (2010). Structural Vector Autoregressions with Markov Switching. Journal of Economic Dynamics and Control 34(2), 121–131.

Mitchell, W. and A. Burns (1938). Statistical Indicators of Cyclical Revivals. NBER Bulletin 69, 1938.

Mizrach, B. and J. Watkins (1999). A Markov Switching Cookbook. In P. Rothman (Ed.), Nonlinear Time Series Analysis of Economics and Financial Data, Chapter 2, pp. 33–43. Dordrecht: Oxford University Press.

Perron, P. (2006). Dealing with Structural Breaks. Palgrave Handbook of Econometrics 1, 278–352.

Psaradakis, Z. and M. Sola (1998). Finite-Sample Properties of the Maximum Likelihood Estimator in Autoregressive Models with Markov Switching. Journal of Econometrics 86(2), 369–386.

R Development Core Team (2009). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.

Robbins, L. (1932). An Essay on the Nature and Significance of Economic Science. Macmillan, London.

Sims, C. A., D. F. Waggoner, and T. Zha (2008). Methods for Inference in Large Multiple-Equation Markov-Switching Models. Journal of Econometrics 146(2), 255–274.

Sims, C. A. and T. Zha (2006). Were There Regime Switches in U.S. Monetary Policy? American Economic Review 96, 54–81.

Warne, A. (1999). Estimation of Means and Autocovariances in a Markov Switching VAR Model. Manuscript, Research Department, Sveriges Riksbank, Sweden.



Chapter 2

Nonlinearities in oil markets: Bayesian impulse responses for Markov-Switching VARs

Abstract. On the premise that structural changes occurred in oil markets between 1973 and 2007, I reinvestigate the empirical results of Kilian (2009) by simulating Bayesian impulse responses from a Markov-switching Vector Autoregressive model. These responses are sensitive to the Markov-switching properties of the model and, being based on densities, allow statistical inference to be conducted. The effects of a global oil production disruption, of an unanticipated aggregate demand expansion, and of an unanticipated oil-market specific demand shock are characterized over the four estimated regimes, which consist of two dominant regimes, one prior to and the other posterior to 1986, as well as two scarcely occurring regimes. Over time, the regime dynamics evolve toward more competitive oil markets, with the collapse of OPEC.

I thank Helmut Lütkepohl and Massimiliano Marcellino for their very useful comments. I thank Tomasz Woźniak for fruitful discussions, particularly regarding the formalization of impulse response functions in a rigorous Bayesian notation. Finally, I thank Lutz Kilian for kindly providing data.


2.1 Introduction

“Nine out of ten of the U.S. recessions since World War II were preceded by a spike up in oil prices.”

These words open the survey by Hamilton (2008) and illustrate how vital the relationship between oil markets and the economy might be, provided one establishes a causality channel between the two. Oil markets have a complex structure. The supply side, though split among a large number of producers, is dominated by the member countries of the Organization of Petroleum Exporting Countries [OPEC],¹ which holds 79% of the world crude reserves² and aims at stabilizing prices on international markets. However, oil reserves are not a static entity and keep being discovered and exploited, as is the case in Alaska, the North Sea, Canada, or the Russian Federation; hence the geopolitical influence of OPEC evolves over time. Moreover, major events such as the Arab oil embargo, the Iranian revolution, or the Persian Gulf War also change the face of the supply side. Nor is the situation on the demand side trivial, with increasing global demand and with technological advances reducing dependence on oil and shifting the price elasticity of demand for gasoline to higher levels. As a result of these complex dynamics, oil has been subject to tremendous price variations.

This work aims to study oil markets with the approach of empirical econometrics, specifically from the time series perspective and from the angle of impulse response analysis, modeling various shocks impacting the oil markets and the economy. The segment of literature focusing on oil shocks and their effect on the economy is vast. Important surveys are Hamilton (2008), on the effects of oil shocks on the U.S. economy, and Kilian (2007) on the same topic. The latter emphasizes the need to account for the endogeneity of energy prices and to differentiate between the effects of demand and supply shocks in energy markets. I base my analysis on Kilian (2009), where a measure of global real activity is constructed and used in a Structural Vector Autoregressive [SVAR] model to identify three types of oil shocks: a global oil production disruption, an unanticipated aggregate demand expansion, and an unanticipated oil-market specific demand shock. The impact of these shocks on the U.S. economy is then studied. Other works of interest include Kilian and Park (2007), studying the effect of oil shocks on the stock market. The work of Kilian (2009) influences the way macroeconomists incorporate oil prices in their models.

¹ OPEC, founded in 1960, is composed of twelve oil-producing countries: Algeria, Angola, Ecuador, Iran, Iraq, Kuwait, Libya, Nigeria, Qatar, Saudi Arabia, the United Arab Emirates, and Venezuela.

² 2007 figure from “British Petroleum table of world oil production”.



Nakov and Pescatori (2007), for instance, take a New Keynesian model but add a dominant oil supplier, OPEC, that optimizes the price of oil endogenously. Nakov and Pescatori (2010) estimate the Dynamic Stochastic General Equilibrium model from Nakov and Pescatori (2007) using Bayesian techniques, and study the impact of oil shocks on the Great Moderation, which is found to be nontrivial.

This work differs from Kilian (2009) in that it tests for and acknowledges parameter instability over the course of the dataset, which covers the years between 1973 and 2007. This is not the first time that nonlinearities are postulated and studied for oil markets. For instance, Blanchard and Galí (2007) used a rolling bivariate VAR for the pre-1984 and post-1984 periods and studied how the impact of oil shocks on the economy changed, finding a steadily lower impact on prices, wages, output, and employment over time. Gronwald (2012) refines the Blanchard and Galí (2007) analysis by using the flexible machinery of rolling impulse responses, and Baumeister and Peersman (2012) estimate a Time-Varying Parameters Bayesian Vector Autoregression with stochastic volatility, a highly flexible framework. They find a decline in the price elasticity of oil demand over time, and a larger role over time for oil demand shocks in comparison to oil supply shocks for the variability of the real price of oil. In this study, the tool of Markov-switching Vector Autoregressions [MS-VAR] is chosen. Introduced into macroeconometric analysis by Hamilton (1989) in the context of business cycle analysis, these models, by allowing for a limited number of regimes, provide a parsimonious characterization of economic conditions over a time period.

Because MS-VAR models belong to the family of nonlinear models and contain a latent variable representing the state of the economy, impulse response analysis is subject to some limitations within the classical approach. Indeed, the difficulty of integrating the regime history over the propagation period of the shock forced the existing research to make concessions. Either researchers postulated that the regimes do not switch after the original shock, as in Ehrmann et al. (2003), or, as in Krolzig (2006), they derived impulse responses with switching regimes over the propagation period but did not solve the issue of confidence intervals, rendering statistical inference from the responses difficult. The methodological contribution of this work is the switch to the Bayesian framework. By proceeding so, one can derive impulse responses simulating regime changes over the propagation horizon. The problem of confidence intervals is also solved, because Bayesian simulation methods provide the densities of the model's parameters. Also, the problem of model selection, still an open question in the classical approach, can be tackled by the comparison of marginal densities of data calculated from the posterior of different model specifications. Finally, Bayesian shrinkage techniques allow for the estimation of models with higher dimension, models which would have complex shapes of the likelihood function and hence be difficult to estimate with classical algorithms.

Bayesian impulse response analysis for Markov-switching Vector Autoregression models is first introduced, then applied to the dataset of Kilian (2009). The estimated regimes split the sample into two periods around 1986, the year that saw the collapse of OPEC. The structural changes occurring over time, which transformed oil markets into more competitive ones, are highlighted by each regime's dynamics. Finally, the other regimes identify brief periods in which unanticipated oil-market specific demand increases led to a global decline in economic activity.

The remainder of this paper is organized as follows. In Section 2.2, the MS-VAR model and its specificities are introduced. Section 2.3 describes the methodological difficulties associated with classical impulse responses in the MS-VAR model, and defines Bayesian impulse responses. Section 2.4 details the complete-data likelihood, the prior, and the posterior distributions of the model, before Section 2.5 describes the Gibbs sampler used to simulate the posterior densities of the model parameters. Then, Section 2.6 presents the empirical application of Bayesian impulse responses to the complex dynamics of oil markets. Finally, Section 2.7 concludes.

2.2 Markov-switching vector autoregressive model

$y = (y_1, \dots, y_T)'$ is a multivariate time series consisting of $T$ observations. Each $y_t$ is an $N$-variate vector for $t \in \{1, \dots, T\}$, taking values in a sampling space $\mathcal{Y} \subset \mathbb{R}^N$. $y$ is a realization of a stochastic process $\{Y_t\}_{t=1}^{T}$. The stochastic process $Y_t$ depends on the realizations, $s_t$, of a hidden discrete stochastic process $S_t$ with discrete state space $\{1, \dots, M\}$. This class of models was introduced in time series analysis by Hamilton (1989). Conditioned on the state, $s_t$, and on the realizations of $y$ up to time $t-1$, $\mathbf{y}_{t-1}$, $y_t$ follows an independent identical normal distribution. The reduced form of the model is known, according to the taxonomy of Krolzig (1997), as an MSIAH-VAR(p). The intercept $A^{(0)}_{s_t}$, the lag polynomial matrices $A^{(i)}_{s_t}$, for $i = 1, \dots, p$, and the covariance matrices $\Sigma_{s_t}$ all depend on the state $s_t = 1, \dots, M$:



\[ y_t = A^{(0)}_{s_t} + \sum_{i=1}^{p} A^{(i)}_{s_t} y_{t-i} + U_t, \tag{2.1} \]

\[ U_t \mid s_t \sim \text{i.i.}N(0, \Sigma_{s_t}), \tag{2.2} \]

for $t = 1, \dots, T$. The vector of initial values $\mathbf{y}_0 = (y_{p-1}, \dots, y_0)'$ is set to the first $p$ observations of the available data.

$S_t$ is assumed to be an irreducible aperiodic hidden Markov process starting from its ergodic distribution $\pi = (\pi_1, \dots, \pi_M)$, such that $\Pr(S_0 = i \mid P) = \pi_i$. Its properties are described by the $(M \times M)$ transition probabilities matrix:

\[ P = \begin{bmatrix} p_{11} & p_{12} & \dots & p_{1M} \\ p_{21} & p_{22} & \dots & p_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ p_{M1} & p_{M2} & \dots & p_{MM} \end{bmatrix}, \]

in which an element, $p_{ij}$, denotes the probability of transition from state $i$ to state $j$, $p_{ij} = \Pr(s_{t+1} = j \mid s_t = i)$. The elements of each row of matrix $P$ sum to one, $\sum_{j=1}^{M} p_{ij} = 1$.
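As an illustration of the row convention and of the ergodic distribution from which the chain is assumed to start, the following is a minimal sketch in Python with NumPy. The function name `ergodic_distribution` and the two-regime matrix are hypothetical choices made for this example, not quantities taken from the thesis.

```python
import numpy as np

def ergodic_distribution(P):
    """Stationary distribution pi solving pi = pi P for a row-stochastic P."""
    P = np.asarray(P, dtype=float)
    M = P.shape[0]
    if not np.allclose(P.sum(axis=1), 1.0):
        raise ValueError("each row of P must sum to one")
    # Stack (P' - I) pi = 0 with the normalisation sum(pi) = 1, solve by least squares.
    A = np.vstack([P.T - np.eye(M), np.ones((1, M))])
    b = np.concatenate([np.zeros(M), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Illustrative two-regime matrix (made-up values, not estimates).
P_example = np.array([[0.95, 0.05],
                      [0.10, 0.90]])
pi_example = ergodic_distribution(P_example)
```

For an irreducible aperiodic chain the solution is unique, so the least-squares system is solved exactly.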

Structural identification. Through structural identifying restrictions, the model is transformed into a structural model with orthogonal, thus interpretable, shocks. Numerous identification schemes have been used in the applied literature and are presented in Lütkepohl and Krätzig (2004). This paper focuses on the recursive identification scheme, i.e. a certain ordering is imposed on the variables, hence assuming that contemporaneous interactions among variables are recursive. This corresponds to a B model, allowing for an instantaneous effect of the shocks on the variables. It is written as:

\[ y_t = A^{(0)}_{s_t} + \sum_{i=1}^{p} A^{(i)}_{s_t} y_{t-i} + B_{s_t} \varepsilon_t, \tag{2.3} \]

\[ \varepsilon_t \sim \text{i.i.}N(0, I_N), \tag{2.4} \]


DOI: 10.2870/63610


where the variances of the structural disturbances are normalized to 1.

The structural form residuals in equation (2.4) are obtained from the reduced form residuals of equation (2.2) by applying the transformation $U_t = B_{s_t} \varepsilon_t$. The $B_{s_t}$ matrix is obtained from a decomposition of the reduced form covariance matrix such that it satisfies $\Sigma_{s_t} = B_{s_t} B_{s_t}'$. In this paper, the $B_{s_t}$ matrices are restricted to the special case of triangular matrices. Under such triangular matrices, Waggoner and Zha (2003) state that the number of restrictions on each $B_{s_t}$ is $N(N-1)/2$, which implies a one-to-one mapping between each $\Sigma_{s_t}$ and each $B_{s_t}$. Imposing a triangular $B_{s_t}$ makes the estimation of the structural model feasible from the estimated reduced form model. Such a restriction is not necessary, but the estimation of models with different restrictions on $B_{s_t}$ requires a modified Gibbs sampler containing an additional block for $B_{s_t}$, which could be based on the work of Waggoner and Zha (2003).
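The mapping from a regime-specific covariance matrix to a triangular impact matrix can be sketched as follows. This is a minimal illustration assuming the lower triangular Cholesky factor is used; the covariance values and the residual vector are made up for the example.

```python
import numpy as np

# Illustrative regime-specific covariance matrix for N = 3 (made-up values).
Sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 2.0, 0.5],
                  [0.2, 0.5, 1.5]])

B = np.linalg.cholesky(Sigma)      # lower triangular factor with B @ B.T == Sigma
N = Sigma.shape[0]

# Entries above the diagonal are restricted to zero: N(N-1)/2 of them.
n_zero_restrictions = int(np.sum(np.triu(np.ones((N, N)), k=1)))

# Map a reduced-form residual U_t to structural shocks eps_t = B^{-1} U_t.
U_t = np.array([0.5, -1.0, 0.25])
eps_t = np.linalg.solve(B, U_t)
```

Because the Cholesky factor of a positive definite matrix is unique, each covariance matrix corresponds to exactly one such triangular factor.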

2.3 Impulse responses

Since the influential contribution of Sims (1980) to macroeconometrics, the dynamic interaction between the variables and disturbances in vector autoregressive models has been best described and interpreted by impulse response functions. Impulse response analysis consists of a counterfactual experiment, where a shock is assigned to one variable of the system and where the propagation of this shock to all the variables of the system is studied over time. In order to remove any instantaneous correlation between the shocks, and to render them interpretable in economic theory, practitioners orthogonalize the impulse responses. Under this orthogonalization, structural shocks can be studied in isolation, one at a time. In the econometric terminology one speaks about structural identification, where the reduced form residuals, $U_t$, become structural residuals, $\varepsilon_t$, using a transformation involving a matrix, $B_{s_t}$, defined as:

\[ B_{s_t} B_{s_t}' = \Sigma_{s_t} \tag{2.5} \]

Orthogonalization of impulse responses, taking place in equation (2.5), can involve a number of identification strategies. This paper concentrates on the popular Cholesky decomposition of $\Sigma_{s_t}$, because it allows the estimation of reduced form models, which are more convenient to handle computationally. The Cholesky decomposition of $\Sigma_{s_t}$ makes the $B_{s_t}$ matrix lower triangular, thereby rendering the model recursive. In the impulse response framework, the causal interpretation of a recursive system is the following: the first variable responds instantaneously only to its own shock, the second variable responds instantaneously to the shocks on the first and second variables, and so on, until the last variable, which can respond instantaneously to the shocks on all the variables of the system. A presentation of the other approaches used to perform structural identification can be found in Lütkepohl and Krätzig (2004).

The following paragraphs define impulse response functions and the specificities of their use within a Markov-switching model. Furthermore, issues with the classical paradigm are outlined, before developing a Bayesian framework that mitigates these difficulties.

2.3.1 Impulse response function for MS-VARs

While the framework for impulse response functions within structural vector autoregressions is well established in the literature, as surveyed in Lütkepohl (2008), the case of nonlinear models is less covered. Due to the mechanics of switching regimes, VAR models incorporating Markov switches belong to the latter category. A good starting point for handling impulse responses within nonlinear models is the paper by Koop et al. (1996), defining generalized impulse responses. Traditional and generalized impulse response functions differ in that the latter treat the impulse response function as a random variable. A generalized impulse response is measured as the difference between the conditional expectation of the model after a shock impact and the conditional expectation of the model without any shock impact. Formally, at horizon $h$, a generalized impulse response to a perturbation $\nabla\varepsilon$ impacting the system at time $t$ would be written as:

\[ GIR_{\nabla\varepsilon}(h) = E[y_{t+h} \mid S, y, \varepsilon_t + \nabla\varepsilon] - E[y_{t+h} \mid S, y, \varepsilon_t], \tag{2.6} \]

where the response $GIR_{\nabla\varepsilon}(h)$ depends on the history of regimes $S = (s_1, \dots, s_T)'$, i.e. on the state in which the system is at the time of the impact. Two lines of work have emerged for deriving impulse responses for Markov-switching Vector Autoregressions, both of them taking into account the history of the system as well as the type of shocks impacting it.

The first approach, Ehrmann et al. (2003), argues for the use of regime-dependent impulse responses. Under the assumption of a short time horizon and sufficiently persistent regimes, regime-dependent impulse responses neglect regime changes throughout the shock propagation. The response is calculated as in equation (2.6), with a shock $\nabla\varepsilon$ hitting the system at time $t$. At time $t$, the system is in state $s_t$, and it is assumed to stay in this regime over the propagation of the shock. Within each regime, responses are derived exactly as if the system were a vector autoregressive model, with autoregressive coefficients $A^{(i)}_{s_t}$ and the structural identification restriction $B_{s_t}$ defined such that $B_{s_t} B_{s_t}' = \Sigma_{s_t}$, while nonlinearities are disregarded.

The second approach of Krolzig (2006) acknowledges the existence of a Markov chain influencing the propagation of the shock on the response of the system. First, the conditional expectation of the future regimes, $s_{t+h}$ for $h = 1, \dots, H$, is derived given the regime at impact, $s_t$, and the transition probabilities, $P$, as follows:

\[ E[s_{t+h} \mid s_t] = P^h \xi_t, \]

where $\xi_t$ collects the information about the realization of $s_t$:

\[ \xi_t = \begin{bmatrix} I(s_t = 1) \\ \vdots \\ I(s_t = M) \end{bmatrix}. \]
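The regime forecast can be sketched numerically as follows. This is an illustration only: the function name `regime_forecast` is hypothetical, regimes are coded from 0, and the code assumes the row convention $p_{ij} = \Pr(s_{t+1} = j \mid s_t = i)$ of Section 2.2, under which the indicator vector is propagated as a row vector, $\xi_t' P^h$.

```python
import numpy as np

def regime_forecast(P, s_t, h):
    """Probabilities Pr(s_{t+h} = j | s_t) for j = 0..M-1, with P row-stochastic."""
    P = np.asarray(P, dtype=float)
    xi_t = np.zeros(P.shape[0])
    xi_t[s_t] = 1.0                      # indicator vector of the current regime
    # Row convention: propagate the indicator as a row vector, xi_t' P^h.
    return xi_t @ np.linalg.matrix_power(P, h)

# Illustrative two-regime matrix (made-up values).
P_example = np.array([[0.95, 0.05],
                      [0.10, 0.90]])
probs_h1 = regime_forecast(P_example, s_t=0, h=1)
probs_h10 = regime_forecast(P_example, s_t=0, h=10)
```

As the horizon grows, these forecasts converge to the ergodic distribution of the chain, whatever the regime at impact.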

Then, using a linear state space representation incorporating the hidden Markov process, Krolzig (2006) derives the algebra for a moving average representation of the model, leading to impulse response functions. At time $t$, the model, in state $s_t$, is hit by a shock $\nabla\varepsilon$. The impulse response function for periods $t + h$, $h = 1, \dots, H$, incorporates the uncertainty about future states through the conditional expectation of $s_{t+h}$ and is therefore more in line with the spirit of nonlinear impulse responses.

The next paragraph discusses the shortcomings of Ehrmann et al. (2003) and Krolzig (2006).

2.3.2 Issues with the existing approaches

Regimes change over time. In Ehrmann et al. (2003), the regime-dependent impulse response function describes the relationship between endogenous variables and fundamental disturbances within each Markov-switching regime. The assumption of non-varying regimes, while greatly simplifying the derivation of an impulse response by reducing it to the known framework of vector autoregressions, can be seen as draconian if the regimes are likely to change over the horizon considered for the impulse responses.

Figure 2.1 draws the probabilities of staying in the same regime over horizons, for diversely persistent states. The figure illustrates how unlikely it is for a regime to remain constant over larger horizons. When the diagonal elements of the transition probability matrix, $p_{ii}$, are very low, 0.5 for instance, the probability of regime invariance drops to almost zero after only a few periods. At the horizon of 10 periods, for a more persistent regime with $p_{ii} = 0.8$, this probability also drops dramatically, to 0.11. For a reasonably persistent regime with $p_{ii} = 0.95$, the probability is still quite low, with a value of 0.6 at the 10-period horizon. Lastly, even for the most persistent regime considered, $p_{ii} = 0.99$, the probability of regime invariance drops to 0.90 after 10 periods and to 0.82 after 20 periods.

Hence, some doubt can be cast on impulse responses drawn under the assumption of regime invariance. The assumption simplifies the algebra, but the example shows that this framework is not ideal.

Figure 2.1: Probabilities for the hidden Markov process staying in the same regime, over different horizons, and given the diagonal element of the transition probability matrix. Five values are presented, from a persistent regime with $p_{ii} = 0.99$ to an unstable regime with $p_{ii} = 0.5$. The probabilities, $p_{ii,h}$, are derived from the formula $p_{ii,h} = p_{ii}^{h}$.

[Figure: probability of regime invariance (vertical axis, 0.0 to 1.0) against the horizon (horizontal axis, 0 to 50), with one curve for each of P = 0.99, P = 0.95, P = 0.9, P = 0.8, and P = 0.5.]
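The probabilities behind Figure 2.1 follow directly from the formula $p_{ii,h} = p_{ii}^{h}$ and can be tabulated with a few lines of Python. This is a sketch for illustration; the function name `prob_regime_invariance` is hypothetical.

```python
import numpy as np

def prob_regime_invariance(p_ii, h):
    """Probability of staying in regime i for h consecutive periods: p_ii ** h."""
    return np.asarray(p_ii, dtype=float) ** h

# The five persistence values shown in the figure, at horizons 10 and 20.
for p in (0.99, 0.95, 0.9, 0.8, 0.5):
    values = [round(float(prob_regime_invariance(p, h)), 2) for h in (10, 20)]
    print(p, values)
```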

Statistical inference. Krolzig (2006) solves the first issue of changing regimes; the issue of confidence intervals, however, remains. Because of the complex analytical expressions of the asymptotic variance of the impulse response coefficients, but also because of the small size of datasets, statistical inference regarding impulse responses in applied work within the vector autoregressive framework is carried out using bootstrapping techniques. The bootstrap methodology allows one to estimate the distribution of an estimator or of a test statistic, and consists of resampling the data or a model estimated from the data. For instance, Kilian (2009) constructs confidence intervals using a recursive-design wild bootstrap described in Gonçalves and Kilian (2004). This technique involves the creation of pseudo time series with autoregressive coefficients estimated from the ordinary least squares VAR and with resampled residuals, $\varepsilon^*_t = \varepsilon_t \eta_t$, where $\varepsilon_t$ are the residuals of the ordinary least squares VAR estimation and $\eta_t$ is an i.i.d. sequence with mean 0 and standard deviation 1.
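For a linear VAR, the recursive-design wild bootstrap described above can be sketched as follows. This is a simplified illustration, not the procedure of Kilian (2009): it assumes a VAR(1) with intercept, standard normal $\eta_t$, simulated placeholder data, and percentile intervals on the autoregressive coefficients rather than on impulse responses. All function names are hypothetical.

```python
import numpy as np

def fit_var1(y):
    """OLS for y_t = c + A y_{t-1} + e_t; returns c, A and the residuals."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    resid = y[1:] - X @ coef
    return coef[0], coef[1:].T, resid

def wild_bootstrap_series(y, c, A, resid, rng):
    """One recursive-design wild bootstrap pseudo series of the same length as y."""
    eta = rng.standard_normal(len(resid))     # i.i.d., mean 0, standard deviation 1
    y_star = np.empty_like(y)
    y_star[0] = y[0]                          # keep the observed initial value
    for t in range(1, len(y)):
        y_star[t] = c + A @ y_star[t - 1] + resid[t - 1] * eta[t - 1]
    return y_star

rng = np.random.default_rng(0)
# Simulated placeholder data (not the oil-market dataset).
y = np.zeros((200, 2))
A_true = np.array([[0.5, 0.1], [0.0, 0.3]])
for t in range(1, 200):
    y[t] = A_true @ y[t - 1] + rng.standard_normal(2)

c_hat, A_hat, resid = fit_var1(y)
boot_A = np.array([fit_var1(wild_bootstrap_series(y, c_hat, A_hat, resid, rng))[1]
                   for _ in range(200)])
lower, upper = np.percentile(boot_A, [2.5, 97.5], axis=0)
```

The same scalar $\eta_t$ multiplies the whole residual vector at time $t$, which preserves the contemporaneous correlation between the equations.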

Bootstrapping a Markov-switching VAR model raises new challenges because of the link existing between the history of the regimes and the distributions of the residuals. To apply the recursive-design wild bootstrap of Gonçalves and Kilian (2004), one would first need to estimate the MS-VAR (through an Expectation Maximization [EM] algorithm, for instance), then resample the history of regimes, assign the EM residuals in order to generate new time series, and estimate these with an EM algorithm a sufficient number of times. First of all, chaining these iterations would be computationally very costly, but most importantly it is not feasible. Indeed, resampling a history of regimes would create a different path of regimes for the hidden Markov process. Between the originally estimated path and the newly generated one, there would not be a one-to-one correspondence of the regimes, hence assigning residuals to the correct regime becomes an issue. This renders the resampling of the data problematic, and the construction of confidence intervals cumbersome.

The next section attempts to resolve these issues by adopting the Bayesian paradigm.

2.3.3 The Bayesian proposition

The issues raised in the previous section motivate a shift to the Bayesian methodology, where simulation methods render the computation of the densities of impulse responses straightforward. Bayesian impulse responses for vector autoregressive models are well covered in Ni et al. (2007). The authors argue for the simplicity of computing them, as long as the posteriors for the VAR parameters and the orthogonalization matrix for structural identification are available. Indeed, there is a one-to-one mapping between the impulse responses and these parameters. Simulating the posteriors of the parameters of a VAR model jointly with its identification matrix $B$ hence directly yields the posterior densities of the impulse response function.

In this paper, focusing on MS-VAR models, the procedure for producing posterior densities of impulse responses is straightforward, but nevertheless requires some refinements to be described. The nonlinearities of the model due to the latent state must be dealt with. The simulations are carried out by a Gibbs sampler detailed in Section 2.5, which draws from the posteriors of the model parameters.

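Notation. Define the impulse response function for Markov-switching vector autoregression models, $IR_{(\nabla\varepsilon, s_0, t)}$, as the response at horizon $t$ of the system to a shock $\nabla\varepsilon$ hitting the system in regime $s_0$ at time 0:

\[ IR_{(\nabla\varepsilon, s_0, t=0)} = B_{s_0} \nabla\varepsilon, \tag{2.7a} \]

\[ IR_{(\nabla\varepsilon, s_0, t=h)} = \sum_{i=1}^{p} A^{(i)}_{s_h} IR_{(\nabla\varepsilon, s_0, t=h-i)}, \tag{2.7b} \]

for $h = 1, 2, \dots$, where responses at negative horizons are taken to be zero. Notice that $IR_{(\nabla\varepsilon, s_0, t=h)}$ in equation (2.7b), although it is not explicitly marked in the notation, also depends on $s_1, \dots, s_h$.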
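The recursion can be sketched in Python for one given path of future regimes. This is an illustration under stated assumptions: each lag matrix $A^{(i)}$ multiplies the response $i$ periods earlier, responses before the impact period are zero, regimes are coded from 0, and all parameter values and the function name `impulse_response_path` are made up for the example.

```python
import numpy as np

def impulse_response_path(A, B, shock, s0, path):
    """Responses IR_0..IR_h for one given regime path (s_1, ..., s_h).

    A[m] is the list of the p lag matrices of regime m, B[m] its impact matrix.
    Responses at negative horizons are taken to be zero.
    """
    p = len(A[s0])
    ir = [B[s0] @ shock]                               # impact response, as in (2.7a)
    for h, s_h in enumerate(path, start=1):
        resp = np.zeros_like(ir[0])
        for i in range(1, min(p, h) + 1):              # lags reaching before t = 0 drop out
            resp = resp + A[s_h][i - 1] @ ir[h - i]    # recursion, as in (2.7b)
        ir.append(resp)
    return np.array(ir)

# Two regimes, N = 2, p = 1, with made-up parameter values.
A = {0: [np.array([[0.5, 0.0], [0.1, 0.4]])],
     1: [np.array([[0.9, 0.0], [0.0, 0.2]])]}
B = {0: np.eye(2), 1: 2.0 * np.eye(2)}
shock = np.array([1.0, 0.0])

ir_fixed = impulse_response_path(A, B, shock, s0=0, path=[0, 0, 0])   # regime never changes
ir_switch = impulse_response_path(A, B, shock, s0=0, path=[0, 1, 1])  # switch at h = 2
```

Holding the path fixed at the initial regime, as in `ir_fixed`, reproduces the regime-dependent responses discussed in Section 2.3.1, while `ir_switch` shows how a regime change alters the propagation.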

Define $\theta \in \Theta \subset \mathbb{R}^k$ as the vector of size $k$ collecting the parameters of the transition probabilities matrix $P$ as well as all the state-dependent parameters of the VAR process, $\theta_{s_t}$: $A^{(0)}_{s_t}$, $A^{(i)}_{s_t}$, $\Sigma_{s_t}$, for $s_t = 1, \dots, M$ and $i = 1, \dots, p$. The impulse response function (2.7) is in fact a function of the parameters of the model, $\theta$, and the initial shock, $\nabla\varepsilon$. This fact is expressed in the notation. Denote by

\[ IR(\theta; \nabla\varepsilon) = \left( IR_{(\nabla\varepsilon, s_0, t=0)}, IR_{(\nabla\varepsilon, s_0, t=1)}, \dots, IR_{(\nabla\varepsilon, s_0, t=h)} \right)' \tag{2.8} \]

the collection of impulse responses at periods $t = 0, 1, \dots, h$. Finally, let $S^+ = (s_1, s_2, \dots, s_h)'$ be an $h \times 1$ column vector of realizations of the hidden Markov process for periods $t = 1, 2, \dots, h$.

Posterior distribution of the impulse response function. The conditional posterior density of the impulse response function given the data, the initial state of the economy, and the initial shock, is defined as:

\[ p\left( IR(\theta; \nabla\varepsilon) \mid \nabla\varepsilon, s_0; y \right). \tag{2.9} \]

Notice that $IR(\theta; \nabla\varepsilon)$, as defined in equation (2.7), depends on the initial shock vector, $\nabla\varepsilon$, the state of the economy at period $t = 0$, $s_0$, the parameters of the model, $\theta$, and the forecasted states of the economy, $S^+$. However, the idea in this analysis of the impulse response function for Markov-switching VAR models is to condition the IRF on no state of the economy other than $s_0$. Therefore, the posterior density of the impulse response function needs to be constructed by integrating out the future states, $S^+$. The following equation defines the posterior distribution of the impulse response function:

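\[ p\left( IR(\theta; \nabla\varepsilon) \mid \nabla\varepsilon, s_0; y \right) = \int p\left( IR(\theta; \nabla\varepsilon) \mid S^+, \nabla\varepsilon, s_0; y \right) p(S^+ \mid s_0, \theta) \, dS^+. \tag{2.10} \]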

Simulation of the future states of the economy, $S^+$. Let $\theta^{(l)}$ denote a sample drawn from the posterior distribution $p(\theta \mid y)$, the superscript $l$ denoting the iteration step of the MCMC sampler. The distribution of future states of the economy, $S^+ = (s_1, \dots, s_h)$, conditioned on the initial realization of the hidden Markov process, $s_0$, and the parameters of the model, $\theta^{(l)}$, takes the form:

\[ p(S^+ \mid s_0, \theta^{(l)}) = p(s_h \mid s_{h-1}, \theta^{(l)}) \, p(s_{h-1} \mid s_{h-2}, \theta^{(l)}) \cdots p(s_2 \mid s_1, \theta^{(l)}) \, p(s_1 \mid s_0, \theta^{(l)}). \tag{2.11} \]

Each of the distributions used on the right-hand side of equation (2.11) is the $m$th row of the transition probabilities matrix given that $s_{j-1} = m$:

\[ p(s_j \mid s_{j-1} = m, \theta^{(l)}) = P^{(l)}_{m \cdot}. \]

Simulation of the posterior density (2.10). Given the sample drawn from the posterior distribution of the parameters, $\theta^{(l)}$, compute the draws from the posterior distribution of the impulse response function (2.10) in the following steps:

1. Given $s_0 = m$, sample $s_j$ recursively for $j = 1, \dots, h$ from $p(s_j \mid s^{(l)}_{j-1}, \theta^{(l)})$, obtaining draws $\{ s^{(l)}_1, \dots, s^{(l)}_h \}$.

2. Given $\nabla\varepsilon$, $\theta^{(l)}$ and $s^{(l)}_1, \dots, s^{(l)}_h$, compute $IR^{(l)}_{(\nabla\varepsilon, s_0, t)}$ for $t = 0, 1, \dots, h$, obtaining a sample drawn from the posterior distribution (2.10): $\{ IR(\theta; \nabla\varepsilon)^{(l)} \}$.
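The two steps above can be sketched as a small Monte Carlo loop. This is an illustration under strong assumptions: a VAR(1) with two regimes coded from 0, and a hypothetical `fake_posterior_draw` function that merely perturbs made-up parameter values in place of genuine Gibbs draws $\theta^{(l)}$. The equal-tailed band computed at the end is a simple summary, not a highest posterior density region.

```python
import numpy as np

def sample_future_states(P, s0, h, rng):
    """Step 1: draw s_1..s_h recursively, each from the row of P of the previous state."""
    states, s = [], s0
    for _ in range(h):
        s = rng.choice(P.shape[0], p=P[s])
        states.append(s)
    return states

def ir_draw(theta, shock, s0, h, rng):
    """Step 2 for a VAR(1): one draw of (IR_0, ..., IR_h) given one parameter draw."""
    path = sample_future_states(theta["P"], s0, h, rng)
    ir = [theta["B"][s0] @ shock]
    for s in path:
        ir.append(theta["A"][s] @ ir[-1])
    return np.array(ir)

def fake_posterior_draw(rng):
    """Stand-in for one Gibbs draw theta^(l); NOT a real posterior sampler."""
    A = [a * np.eye(2) + 0.02 * rng.standard_normal((2, 2)) for a in (0.5, 0.8)]
    return {"P": np.array([[0.95, 0.05], [0.10, 0.90]]),
            "A": A,
            "B": [np.eye(2), 2.0 * np.eye(2)]}

rng = np.random.default_rng(1)
shock, s0, h = np.array([1.0, 0.0]), 0, 12
draws = np.array([ir_draw(fake_posterior_draw(rng), shock, s0, h, rng)
                  for _ in range(2000)])
median = np.median(draws, axis=0)
band = np.percentile(draws, [16, 84], axis=0)   # equal-tailed 68% band
```

Each element of `draws` uses its own parameter draw and its own simulated regime path, so the spread of the responses reflects both parameter uncertainty and uncertainty about future regimes.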

Repeating the operation over numerous Gibbs iterations yields the sampled posterior distribution of the impulse response function. The features that make the Bayesian impulse responses for MS-VARs defined so far useful for practitioners can be summarized as:

1. Integration over the regimes: Incorporating regime switches in the MCMC simulation of impulse responses, unlike in the regime-dependent ones from Ehrmann et al. (2003), makes the responses sensitive to the uncertainty about unknown future regimes. Provided that changes of regime are likely to occur over the horizon of a simulated impulse response analysis, one must take into account the latent variable properties of the MS-VAR model.

2. Confidence intervals for impulse responses: The outcomes of the simulations are posterior densities of the impulse response function. Having densities means that the Bayesian counterpart of confidence intervals, namely the highest posterior density regions, is readily obtained. Hence, as an improvement over Krolzig (2006), inference about the uncertainty of impulse responses is feasible. This adds value for applied work, as one can assess the findings drawn from the impulse responses with statistically motivated conviction.

3. Model selection: In classical econometrics, when the number of regimes is unknown, one cannot rely on statistical inference drawn from standard maximum-likelihood theory, as discussed in Garcia (1998), because the asymptotic null distribution of the likelihood ratio (LR) test becomes nonstandard. One could rely on the Akaike information criterion (AIC), but Smith et al. (2006) point out the tendency of the AIC to overestimate the number of regimes.³ The Markov-switching criterion developed in Smith et al. (2006), based on the principle of minimum Kullback-Leibler divergence, imposes a different penalty term than the AIC and is shown to perform better in jointly determining the number of regimes and the variables to retain in the model. Unfortunately, the Markov-switching criterion is only suited to univariate series. Bayesian estimation, however, through the estimation of marginal densities of data, makes it possible to properly discriminate between models with different numbers of regimes and different lags. From the point of view of model specification, Bayesian analysis is well suited to non-standard models such as MS-VARs. A discussion of the model selection procedure and of the computation of the marginal density of data used in this paper can be found in Droumaguet and Woźniak (2012). The marginal density of data is computed with the Modified Harmonic Mean (MHM) method of Geweke (1999).

³ The AIC is also not consistent and is known to overfit the number of lags in VAR models.

4. Bayesian shrinkage: Finally, Bayesian estimation, with the possibility of incorporating priors, can help to circumvent the curse of dimensionality appearing when dealing with higher dimensional VAR models or even when specifying more regimes in MS-VARs. The procedure developed by Litterman (1986), assigning low a priori values to parameters thought to play a less important role in the model, makes it possible to estimate models with many variables and regimes. Otherwise, in the classical case, these models would be computationally difficult to handle because of a problematic likelihood function.

Of course, Bayesian estimation through MCMC sampling comes at the price of additional computational burden, and while the advantages listed above are substantial, one should not forget to mention that operations such as model selection can be time consuming. Dependency on the priors should also be mentioned, though the evaluation of their impact could be the topic of a separate paper. In this one, the standard priors used in the literature are incorporated into the model.

The next sections introduce the details pertaining to the Bayesian estimation of theconsidered MSVAR model.

2.4 Likelihood, prior, and posterior

Before introducing the algorithm for the MCMC simulation of the posterior densities of the parameters of the model, this section introduces and describes the properties of the MS-VAR model in terms of complete-data likelihood, so that the Bayesian estimation can be conducted with the simulation methods of Section 2.5. The prior and posterior of the model are described below.



Complete-data likelihood function. The classical estimation of the reduced form model consists of maximizing the likelihood function with e.g. the EM algorithm (see Krolzig, 1997; Kim and Nelson, 1999b). Impulse responses with error bands for Markov-switching VAR models, however, require the simulation of the history of the regimes, hence Bayesian inference is the convenient approach. (For details of standard Bayesian estimation and inference on Markov-switching models, the reader is referred to Frühwirth-Schnatter, 2006.)

As stated by Frühwirth-Schnatter (2006), the complete-data likelihood function is equal to the joint sampling distribution \(p(S, y \mid \theta)\) for the complete data \((S, y)\) given \(\theta\). This distribution is now considered to be a function of \(\theta\) for the purpose of estimating the unknown parameter vector \(\theta\). It is further decomposed into a product of a conditional distribution of \(y\) given \(S\) and \(\theta\), and a conditional distribution of \(S\) given \(\theta\):

\[
p(S, y \mid \theta) = p(y \mid S, \theta)\, p(S \mid \theta). \tag{2.12}
\]

The former is assumed to be a conditional normal distribution function of \(U_t\), for \(t = 1, \ldots, T\), given the states, \(s_t\), with the mean equal to a vector of zeros and \(\Sigma_{s_t}\) as the covariance matrix:

\[
p(y \mid S, \theta) = \prod_{t=1}^{T} p(y_t \mid S, y_{t-1}, \theta) = \prod_{t=1}^{T} (2\pi)^{-K/2} |\Sigma_{s_t}|^{-1/2} \exp\left\{ -\frac{1}{2} U_t' \Sigma_{s_t}^{-1} U_t \right\}. \tag{2.13}
\]

The form of the latter comes from the assumptions about the Markov process and is given by:

\[
p(S \mid \theta) = p(s_0 \mid P) \prod_{i=1}^{M} \prod_{j=1}^{M} p_{ij}^{N_{ij}(S)}, \tag{2.14}
\]

where \(N_{ij}(S) = \#\{s_{t-1} = i, s_t = j\}\) is the number of transitions from state \(i\) to state \(j\), \(\forall i, j \in \{1, \ldots, M\}\).
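The transition counts and the resulting value of \(\log p(S \mid \theta)\) in (2.14) can be sketched in a few lines. This is only an illustration: the function names, the coding of regimes from 0, and the numbers are assumptions made for the example, not part of the thesis.

```python
import math

def transition_counts(states, n_regimes):
    """Return N where N[i][j] counts the moves from regime i to regime j."""
    counts = [[0] * n_regimes for _ in range(n_regimes)]
    for prev, curr in zip(states[:-1], states[1:]):
        counts[prev][curr] += 1
    return counts

def log_prob_states(states, P, initial_probs):
    """log p(S | P) = log p(s_0 | P) + sum_ij N_ij(S) * log p_ij."""
    counts = transition_counts(states, len(P))
    logp = math.log(initial_probs[states[0]])
    for i, row in enumerate(counts):
        for j, n_ij in enumerate(row):
            if n_ij > 0:
                logp += n_ij * math.log(P[i][j])
    return logp

# Hypothetical two-regime path s_0, ..., s_5 (regimes coded 0 and 1).
states = [0, 0, 1, 1, 1, 0]
P = [[0.9, 0.1], [0.2, 0.8]]  # each row sums to one
counts = transition_counts(states, 2)
logp = log_prob_states(states, P, initial_probs=[0.5, 0.5])
```

Only the counts matter for the transition part of the likelihood, which is what makes the Dirichlet distribution a convenient choice for the rows of \(P\).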

A convenient form of the complete-data likelihood function (2.12) results from representing it as a product of \(M + 1\) factors. The first \(M\) factors depend on the state-specific parameters, \(\theta_{s_t}\), and the remaining one depends on the transition probabilities matrix, \(P\):

\[
p(S, y \mid \theta) = \prod_{i=1}^{M} \left( \prod_{t: s_t = i} p(y_t \mid y_{t-1}, \theta_i) \right) \prod_{i=1}^{M} \prod_{j=1}^{M} p_{ij}^{N_{ij}(S)}\, p(s_0 \mid P). \tag{2.15}
\]




Prior distribution The independence of the state-specific parameters and the transition probabilities matrix is assumed. This makes it possible to incorporate the researcher's prior knowledge about the state-specific parameters of the model, \(\theta_{s_t}\), separately for each state. The factorization of the likelihood function (2.15) is maintained by choosing a prior distribution of the following form:

\[
p(\theta) = \prod_{i=1}^{M} p(\theta_i)\, p(P_{i\cdot}). \tag{2.16}
\]

For the MSIAH-VAR(p) model, the following prior specification is assumed. The states of an economy are assumed to be persistent over time (see e.g. Kim and Nelson, 1999a). To accomplish that, each row of the transition probabilities matrix \(P\) follows a priori an \(M\)-variate Dirichlet distribution, with parameters set to 1 for all the transition probabilities, with the exception of the diagonal elements \(P_{ii}\), for \(i = 1, \ldots, M\), for which the parameter is set to 10. The Dirichlet distribution is chosen because it is the multivariate generalization of the beta distribution and can generate transition probabilities for more than two regimes.
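The persistence implied by this prior can be checked with a short simulation. The sketch below draws one row of \(P\) from the Dirichlet prior by normalizing independent gamma draws, a standard construction; the helper names, the seed, and the choice \(M = 4\) are assumptions made for the illustration.

```python
import random

def dirichlet_draw(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma(alpha_k, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

def prior_parameters(n_regimes, row, diag=10.0, off_diag=1.0):
    """Dirichlet parameters of one row: 10 on the diagonal element, 1 elsewhere."""
    return [diag if j == row else off_diag for j in range(n_regimes)]

rng = random.Random(0)
M = 4
alpha_row0 = prior_parameters(M, row=0)
# Prior mean of the probability of staying in regime 1: 10 / (10 + M - 1).
prior_mean_stay = alpha_row0[0] / sum(alpha_row0)
draws = [dirichlet_draw(alpha_row0, rng) for _ in range(5000)]
mc_mean_stay = sum(d[0] for d in draws) / len(draws)
```

With four regimes the prior mean of each diagonal element is 10/13, so the prior favors persistent regimes without forcing them.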

The state-dependent parameters of the vector autoregressive processes are collected in vectors \(\beta_{s_t} = (A^{0\prime}_{s_t}, \mathrm{vec}(A^{(1)}_{s_t})', \ldots, \mathrm{vec}(A^{(p)}_{s_t})')'\), for \(s_t = 1, \ldots, M\). I assume the shrinkage Litterman prior introduced by Doan et al. (1983) and Litterman (1986). These parameters follow a priori a \((N + pN^2)\)-variate Normal distribution, with mean equal to a vector of zeros and a diagonal covariance matrix. Elements on the diagonal of the covariance matrix \(V_\beta\) are determined by a set of hyper-parameters, \((\lambda_1, \lambda_2, \lambda_3, c)'\), and are as follows:

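\[
(\varsigma_i \lambda_3)^2 \quad \text{for } A^{0}_{i.s_t}, \tag{2.17a}
\]

\[
\left( \frac{\lambda_1}{\exp(ck - c)} \right)^2 \quad \text{for } A^{(k)}_{ii.s_t}, \tag{2.17b}
\]

\[
\left( \frac{\lambda_1 \lambda_2\, \varsigma_i}{\exp(ck - c)\, \varsigma_j} \right)^2 \quad \text{for } A^{(k)}_{ij.s_t}, \tag{2.17c}
\]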

for \(i, j = 1, \ldots, N\) and \(i \neq j\), and \(k = 1, \ldots, p\). The variances of the prior distribution are scaled using the variances of the residuals of the autoregressions of order 17 for each of the variables, \(\varsigma_i\), for \(i = 1, \ldots, N\). These prior distributions are the same irrespective of the state. The value of the hyper-parameter responsible for shrinking the constant terms, \(\lambda_3\), is set to 0.033, as in Robertson and Tallman (1999). The overall shrinking hyper-parameter for the autoregressive parameters, \(\lambda_1\), is set to 0.3 as in Adolfson et al. (2007). The values of the variances of the prior distributions decrease with the lag index, \(k\), according to the exponential pattern proposed by Robertson and Tallman (1999) in the denominator of equations (2.17b) and (2.17c). The value of the hyper-parameter \(c\) is set to -0.13412, following the pattern of Robertson and Tallman for monthly data. Finally, the value of \(\lambda_2\) is set to 1, in order not to shrink the off-diagonal parameters of the autoregressive matrices more than the diagonal parameters.
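As an illustration of how the diagonal of \(V_\beta\) could be assembled from (2.17a) to (2.17c), the sketch below follows the ordering of \(\beta_{s_t}\), with the constants first and then the column-stacked lag matrices. The function name and the inputs are assumptions; in particular the scaling terms and the value of \(c\) used in the call are arbitrary illustrative numbers, not the values of the thesis.

```python
import math

def litterman_prior_variances(scales, p, lam1, lam2, lam3, c):
    """Diagonal of V_beta for beta = (A0', vec(A1)', ..., vec(Ap)')'.

    scales[i] plays the role of the scaling term of variable i and is an input.
    """
    n = len(scales)
    variances = [(scales[i] * lam3) ** 2 for i in range(n)]  # (2.17a)
    for k in range(1, p + 1):
        decay = math.exp(c * k - c)
        for j in range(n):  # vec() stacks columns, so j is the outer index
            for i in range(n):
                if i == j:
                    variances.append((lam1 / decay) ** 2)  # (2.17b)
                else:
                    ratio = scales[i] / scales[j]
                    variances.append((lam1 * lam2 * ratio / decay) ** 2)  # (2.17c)
    return variances

# Two variables, two lags; scales and c are illustrative values only.
v = litterman_prior_variances(scales=[2.0, 1.0], p=2,
                              lam1=0.3, lam2=1.0, lam3=0.033, c=0.5)
```

The returned list has \(N + pN^2\) entries, matching the dimension of the Normal prior stated above.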

Finally, the priors for the state-dependent covariance matrices of the model, \(\Sigma_{s_t}\), are set a priori to an inverted-Wishart distribution, as is standard for covariance matrices, following Koop and Korobilis (2010). The parameters of the distribution are the scale, set to a diagonal matrix of dimension \(N \times N\), and the degrees of freedom, set to \(N + 1\).

To summarize, the prior specification (2.16) takes the detailed form:

\[
p(\theta) = \prod_{i=1}^{M} p(P_{i\cdot})\, p(\beta_i)\, p(\Sigma_i), \tag{2.18}
\]

where each of the prior distributions is as assumed:

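\[
P_{i\cdot} \sim \mathcal{D}_M(\imath_M' + 9\, I_{M.i\cdot}),
\]

\[
\beta_i \sim \mathcal{N}(0, V_\beta),
\]

\[
\Sigma_i \sim \mathcal{IW}(I_N, N + 1),
\]

for \(i = 1, \ldots, M\) and \(j, k = 1, \ldots, N\), where \(\imath_M\) is an \(M \times 1\) vector of ones and \(I_{M.i\cdot}\) is the \(i\)th row of an identity matrix \(I_M\).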

Posterior distribution The structure of the likelihood function (2.15) and of the prior distribution (2.18) determines the form of the posterior distribution, which is proportional to the product of the two densities. The form of the posterior distribution, \(p(\theta \mid y, S)\), resulting from the assumed specification, is as follows:

\[
p(\theta \mid y, S) \propto \prod_{i=1}^{M} p(\theta_i \mid y, S)\, p(P \mid y, S). \tag{2.19}
\]




It is now easily decomposed into a posterior density of the transition probabilities matrix:

\[
p(P \mid S) \propto p(s_0 \mid P) \prod_{i=1}^{M} \prod_{j=1}^{M} p_{ij}^{N_{ij}(S)}\, p(P), \tag{2.20}
\]

and the posterior density of the state-dependent parameters:

\[
p(\theta_i \mid y, S) \propto \prod_{t: s_t = i} p(y_t \mid \theta_i, y_{t-1})\, p(\theta_i). \tag{2.21}
\]

The non-standard form of the posterior density is handled with numerical methods, more specifically a Markov chain Monte Carlo (MCMC) algorithm, the Gibbs sampler (see Casella and George, 1992, and references therein). The algorithm is presented in detail in Section 2.5.

2.5 Gibbs sampler

This section details the MCMC sampler set up for sampling from the full conditional distributions. The Gibbs sampler setup is largely similar to the one used in Droumaguet and Wozniak (2012), with the main differences being the Litterman priors used for the autoregressive coefficients and the simulation of the covariance matrices, here drawn from an inverse-Wishart distribution. Each step describes the full conditional distribution of one element of the partitioned parameter vector. The parameter vector is broken up into four blocks: the vector of the latent states of the economy, \(S\), the transition probabilities, \(P\), the regime-dependent covariance matrices, \(\Sigma_{s_t}\), and finally the regime-dependent vector of constants plus autoregressive parameters, \(\beta_{s_t}\). For each block of parameters, conditionally on the parameter draws from the three other blocks, this section describes how to sample from the posterior distribution. The symbols \(l\) and \(l - 1\) refer to the iteration of the MCMC sampler. For the first iteration of an MCMC run, \(l = 1\), initial parameter values come from an expectation maximization algorithm. The remainder of this section describes all the blocks that constitute the MCMC sampler.



2.5.1 Sampling the vector of the states of the economy

The first drawn parameter is the vector representing the states of the economy, \(S\). Since \(S\) is a latent variable, no priors or restrictions are placed on it. We first use a filter (see Section 11.2 of Frühwirth-Schnatter, 2006, and references therein) to obtain the probabilities \(\Pr(s_t = i \mid y, \theta^{(l-1)})\), for \(t = 1, \ldots, T\) and \(i = 1, \ldots, M\), and then draw \(S^{(l)}\) for the \(l\)th iteration of the algorithm.

Algorithm 1. Multi-move sampling of the states.

1. BLHK filter: Inherited from classical inference, and following its description in Krolzig (1997), it performs the filtering and smoothing operations on the regime probabilities \(\xi_t\), which denote the probabilities for the unobserved state of the system:

\[
\xi_t = \begin{bmatrix} \Pr(s_t = 1) \\ \vdots \\ \Pr(s_t = M) \end{bmatrix}.
\]

The filter, introduced by Hamilton (1989), is an iterative algorithm calculating the filtered probabilities, or optimal forecast \(\xi_{t+1|t}\) of the value of \(\xi_{t+1}\) on the basis of the information set at time \(t\) consisting of the observed values of \(y_t\), namely \(\mathbf{y}_t = (y_t', y_{t-1}', \ldots, y_{1-p}')'\). The initial state \(\xi_{1|0}\) is initialized with the vector of ergodic regime probabilities, \(\xi_{1|0} = \pi\), where \(\pi\) satisfies the equation \(P'\pi = \pi\). This step is a forward recursion, i.e. for \(t = 1, \ldots, T\), written as:

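\[
\xi_{t+1|t} = \frac{P' \left( \eta_t \odot \xi_{t|t-1} \right)}{1_M' \left( \eta_t \odot P' \xi_{t-1|t-1} \right)},
\]

where \(\odot\) denotes the element-wise matrix multiplication and \(\eta_t\) is the collection of \(M\) densities, defined as:

\[
\eta_t = \begin{bmatrix} \Pr(y_t \mid s_t = 1, y_{t-1}, \theta^{(l-1)}) \\ \vdots \\ \Pr(y_t \mid s_t = M, y_{t-1}, \theta^{(l-1)}) \end{bmatrix}.
\]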

To compute the smoothed probabilities, full-sample information is used to make an inference about the unobserved regimes by incorporating the previously neglected sample information \(y_{t+1 \cdots T} = (y_{t+1}', \cdots, y_T')'\) into the inference about \(\xi_t\). This step is a backward recursion, for \(j = 1, \ldots, T - 1\). The iteration consists of the following equation:

\[
\xi_{T-j|T} = \left[ P \left( \xi_{T-j+1|T} \oslash \xi_{T-j+1|T-j} \right) \right] \odot \xi_{T-j|T-j},
\]

where \(\oslash\) denotes the element-wise matrix division.

2. Using the smoothed probability \(\xi_{T|T}\) as the conditional distribution for \(s_T \mid y, \theta^{(l-1)}\), we sample \(s_T\).

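3. The conditional distributions for \(s_t \mid s_{t+1}, y, \theta^{(l-1)}\) with \(t = T - 1, T - 2, \ldots, 0\) are given by the smoother:

\[
\xi_{t|t+1} \mid s_{t+1}, y, \theta^{(l-1)} = \left[ P \left( \xi_{t+1} \oslash \xi_{t+1|t} \right) \right] \odot \xi_{t|t},
\]

where \(\xi_{t+1}\) is understood as the indicator vector of the state \(s_{t+1}\) drawn in the previous step. \(s_t^{(l)} \mid y, \theta^{(l-1)}\) is thereby sampled for all periods, \(t = 1, \ldots, T\).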
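The three steps of Algorithm 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: the densities \(\eta_t\) are taken as given inputs, \(P\) has rows summing to one, the ergodic probabilities are approximated by repeated multiplication, and all function names are hypothetical. The sketch samples \(s_1, \ldots, s_T\) only.

```python
import random

def ergodic_probs(P, n_iter=2000):
    """Approximate pi solving P' pi = pi by repeated multiplication."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
    return pi

def forward_filter(eta, P):
    """Return the lists of xi_{t|t} and xi_{t|t-1} for t = 1, ..., T."""
    m = len(P)
    predicted, filtered = [], []
    pred = ergodic_probs(P)  # xi_{1|0}
    for eta_t in eta:
        joint = [eta_t[i] * pred[i] for i in range(m)]
        norm = sum(joint)
        filt = [x / norm for x in joint]  # xi_{t|t}
        predicted.append(pred)
        filtered.append(filt)
        # xi_{t+1|t} = P' xi_{t|t}
        pred = [sum(filt[i] * P[i][j] for i in range(m)) for j in range(m)]
    return filtered, predicted

def draw_index(probs, rng):
    """Draw an index with the given probabilities by inverting the CDF."""
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1

def sample_states(eta, P, rng):
    """Multi-move draw: s_T from xi_{T|T}, then s_t given s_{t+1} backwards."""
    m = len(P)
    filtered, predicted = forward_filter(eta, P)
    T = len(eta)
    states = [0] * T
    states[-1] = draw_index(filtered[-1], rng)
    for t in range(T - 2, -1, -1):
        nxt = states[t + 1]
        weights = [P[i][nxt] * filtered[t][i] / predicted[t + 1][nxt]
                   for i in range(m)]
        total = sum(weights)
        states[t] = draw_index([w / total for w in weights], rng)
    return states

# Hypothetical density values: regime 0 likely early, regime 1 likely late.
P = [[0.95, 0.05], [0.10, 0.90]]
eta = [[0.9, 0.1]] * 5 + [[0.1, 0.9]] * 5
filtered, predicted = forward_filter(eta, P)
states = sample_states(eta, P, random.Random(1))
```

Conditioning on the drawn \(s_{t+1}\) selects one column of \(P\), which is the role played by the indicator vector in the smoother of step 3.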

2.5.2 Sampling the matrix of transition probabilities

In the second step of the MCMC sampler, we draw from the posterior distribution of the transition probabilities matrix, conditioning on the states drawn in the previous step of the current iteration, \(P^{(l)} \sim p(P \mid S^{(l)})\).

Notation The rows of the transition probabilities matrix \(P\) are modeled as \(M\) vectors \(P_j\), \(j = 1, \cdots, M\). Let all the elements of \(P_j\) belong to the \((0, 1)\) interval and sum up to one. The matrix of transition probabilities \(P\) is irreducible, which rules out the presence of an absorbing state that would render the Markov chain \(S\) non-stationary. The non-independence of the rows of \(P\) when using \(\pi\) as \(\Pr(s_0)\) is described in Frühwirth-Schnatter (2006, Section 11.5.5). Once the initial state \(s_0\) is drawn from the ergodic distribution \(\pi\) of \(P\), direct MCMC sampling from the conditional posterior distribution becomes impossible. However, a Metropolis-Hastings algorithm can be set up to circumvent this issue, since a kernel of the joint posterior density of all rows is known: \(p(P \mid S) \propto \pi \prod_{j=1}^{M} \mathcal{D}_M(P_j)\). Hence, the proposal for the transition probabilities is obtained by sampling each \(P_j\) from the convenient Dirichlet distribution. The priors for \(P_j\) follow a Dirichlet distribution, \(P_j \sim \mathcal{D}_M(b_{1,j}, \ldots, b_{M,j})\). Then all \(P_j\) are gathered into the candidate matrix of transition probabilities. Finally, the acceptance ratio is computed before retaining or discarding the draw.

Algorithm 2. Metropolis-Hastings for the restricted transition matrix.

1. s0 ∼ π. The initial state is drawn from the ergodic distribution of P.

2. \(P_j \sim \mathcal{D}_M(n_{1,j} + b_{1,j}, \ldots, n_{M,j} + b_{M,j})\) for \(j = 1, \ldots, M\), where \(n_{i,j}\) is the number of transitions from state \(j\) to state \(i\) counted from \(S\), so that the parameters of row \(j\) collect the transitions out of state \(j\). The rows of the candidate transition probabilities matrix are sampled from a Dirichlet distribution.

3. \(P^{new}\), the proposal for the transition probabilities matrix, is reconstructed.

4. Accept \(P^{new}\) if \(u \leq \pi^{new} / \pi^{l-1}\), where \(u \sim U[0, 1]\). \(\pi^{new}\) and \(\pi^{l-1}\) are the ergodic probabilities, evaluated at the initial state \(s_0\), resulting from the draws of the transition probabilities matrix \(P^{new}\) and \(P^{l-1}\) respectively.
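A compact sketch of one pass through Algorithm 2 is given below. It rests on assumptions that should be read as such: the state path is taken as given with its first element playing the role of \(s_0\), the acceptance ratio is evaluated at that initial state, the ergodic probabilities are approximated by repeated multiplication, and all names and numbers are hypothetical.

```python
import random

def dirichlet_draw(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma(alpha_k, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

def ergodic_probs(P, n_iter=2000):
    """Approximate pi solving P' pi = pi by repeated multiplication."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
    return pi

def transition_counts(states, m):
    """counts[j][i] is the number of moves from regime j to regime i."""
    counts = [[0] * m for _ in range(m)]
    for prev, curr in zip(states[:-1], states[1:]):
        counts[prev][curr] += 1
    return counts

def mh_transition_step(P_old, states, prior, rng):
    """One Metropolis-Hastings update of P given the state path."""
    m = len(P_old)
    counts = transition_counts(states, m)
    # Proposal: row j drawn from a Dirichlet with parameters counts + prior.
    P_new = [dirichlet_draw([counts[j][i] + prior[j][i] for i in range(m)], rng)
             for j in range(m)]
    s0 = states[0]
    ratio = ergodic_probs(P_new)[s0] / ergodic_probs(P_old)[s0]
    accepted = rng.random() <= ratio
    return (P_new if accepted else P_old), accepted

rng = random.Random(42)
prior = [[10.0, 1.0], [1.0, 10.0]]  # 10 on the diagonal, 1 elsewhere
states = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]
P = [[0.5, 0.5], [0.5, 0.5]]
n_accepted = 0
for _ in range(200):
    P, accepted = mh_transition_step(P, states, prior, rng)
    n_accepted += accepted
```

Because the proposal already matches the Dirichlet part of the posterior kernel, only the ergodic probability of the initial state remains in the acceptance ratio.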

2.5.3 Sampling the covariance matrices

In the third step of the Gibbs sampler, the posteriors for the covariance matrices of the model, \(\Sigma_{s_t}\), are drawn from an inverse-Wishart distribution, as described in Koop and Korobilis (2010). In Bayesian statistics, the inverse-Wishart distribution, defined on real-valued positive-definite matrices, is used as the conjugate prior for the covariance matrix of a multivariate normal distribution. The covariance matrix for each regime is drawn separately and sequentially.

Algorithm 3. Drawing of regime-dependent covariance matrices. The algorithm iterates over all the covariance matrices \(\Sigma_{s_t}\) for \(s_t = 1, \ldots, M\).

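1. \(\Sigma_i \mid y, S, P, \beta \sim \mathcal{IW}(S_i, \nu_i)\), for the regimes \(i = 1, \ldots, M\). Sample \(\Sigma_i\) from the conditional inverse-Wishart posterior distribution, with the following parameters:

\[
S_i = S + \sum_{t: s_t = i} \left( y_t - A^{0}_i - \sum_{j=1}^{p} A^{(j)}_i y_{t-j} \right) \left( y_t - A^{0}_i - \sum_{j=1}^{p} A^{(j)}_i y_{t-j} \right)',
\]

and

\[
\nu_i = T_i + \nu.
\]

On the right-hand sides, \(S\) and \(\nu\) denote the prior scale matrix and degrees of freedom of Section 2.4, and \(T_i\) is the number of observations attributed to regime \(i\).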
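The update of the two inverse-Wishart parameters can be illustrated as follows. The sketch assumes that the residuals \(y_t - A^{0}_i - \sum_j A^{(j)}_i y_{t-j}\) have already been computed; the function name and the numbers are hypothetical, and the draw from the inverse-Wishart distribution itself, which would typically rely on a numerical library, is not shown.

```python
def iw_posterior_params(residuals, states, regime, prior_scale, prior_df):
    """Return (S_i, nu_i): prior scale plus the outer products of the
    residuals of one regime, and prior degrees of freedom plus T_i."""
    n = len(prior_scale)
    scale = [row[:] for row in prior_scale]
    t_i = 0
    for u_t, s_t in zip(residuals, states):
        if s_t != regime:
            continue
        t_i += 1
        for a in range(n):
            for b in range(n):
                scale[a][b] += u_t[a] * u_t[b]
    return scale, prior_df + t_i

# N = 2 variables, identity prior scale, N + 1 prior degrees of freedom.
residuals = [[1.0, 0.0], [0.5, -0.5], [2.0, 1.0]]
states = [0, 1, 0]
identity = [[1.0, 0.0], [0.0, 1.0]]
scale0, df0 = iw_posterior_params(residuals, states, 0, identity, 3)
scale1, df1 = iw_posterior_params(residuals, states, 1, identity, 3)
```

Only the observations assigned to a regime contribute to its scale matrix, so regimes that are rarely visited stay close to the prior.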




2.5.4 Sampling the vector autoregressive parameters

Finally, the state-dependent autoregressive parameters, \(\beta_{s_t}\) for \(s_t = 1, \ldots, M\), are drawn. The Bayesian parameter estimation of finite mixtures of regression models when the realizations of the states are known is covered in detail in Frühwirth-Schnatter (2006, Section 8.4.3). The procedure consists of estimating all the regression coefficients simultaneously by stacking them into \(\beta = (\beta_0, \beta_1, \ldots, \beta_M)\), where \(\beta_0\) is a regression parameter common to all regimes. This makes it possible, for instance, to estimate models with constant intercepts or constant autoregressive coefficients. The regression model is written as:

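\[
y_t = Z_t \beta_0 + Z_t D_{i.1} \beta_1 + \cdots + Z_t D_{i.M} \beta_M + \varepsilon_t, \tag{2.22}
\]

\[
\varepsilon_t \sim \text{i.i.}\mathcal{N}(0, \Sigma_{s_t}). \tag{2.23}
\]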

\(M\) dummies \(D_{i.s_t}\) are introduced. They take the value 1 when the regime occurs and are set to 0 otherwise. A transformation of the regressors \(Z_t\) also has to be performed in order to allow for different coefficients on the dependent variables, for instance to impose zero restrictions on parameters. In the context of VARs, Koop and Korobilis (2010, Section 2.2.3) detail a convenient notation that stacks all the regression coefficients on a diagonal matrix for every equation. The notation is adapted by stacking all the regression coefficients for all the states on a diagonal matrix. If \(z_{n.s_t.t}\) corresponds to the row vector of \(1 + Np\) independent variables for equation \(n\), state \(s_t\) (starting at 0 for regime-invariant parameters), and time \(t\), the stacked regressor \(Z_t\) will be of the following form:

\[
Z_t = \mathrm{diag}(z_{1.0.t}, \ldots, z_{N.0.t}, z_{1.1.t}, \ldots, z_{N.1.t}, \ldots, z_{1.M.t}, \ldots, z_{N.M.t}).
\]

This notation enables the restriction of each parameter, by simply setting \(z_{n.s_t.t}\) to 0 where desired.

Algorithm 4. Sampling the autoregressive parameters. We assume a normal prior for \(\beta\), i.e. \(\beta \sim \mathcal{N}(0, V_\beta)\).

1. For all \(Z_t\), impose restrictions by setting \(z_{n.s_t.t}\) to zero accordingly.

2. \(\beta \mid y, S, P, \Sigma \sim \mathcal{N}(\bar{\beta}, \bar{V}_\beta)\). Sample \(\beta\) from the conditional normal posterior distribution, with the following parameters:

\[
\bar{V}_\beta = \left( V_\beta^{-1} + \sum_{t=1}^{T} Z_t' \Sigma_{s_t}^{-1} Z_t \right)^{-1}
\]

and

\[
\bar{\beta} = \bar{V}_\beta \left( \sum_{t=1}^{T} Z_t' \Sigma_{s_t}^{-1} y_t \right).
\]
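To make the two posterior moments concrete, a worked special case may help. The following is a sketch under simplifying assumptions that are not part of the model above: a single equation, a single scalar regressor \(z_t\), a scalar prior variance \(v\), and regime-specific error variances \(\sigma^2_{s_t}\). Writing \(\bar{\beta}\) and \(\bar{V}_\beta\) for the posterior mean and variance, the formulas of Algorithm 4 reduce to:

```latex
\bar{V}_\beta = \left( \frac{1}{v} + \sum_{t=1}^{T} \frac{z_t^2}{\sigma^2_{s_t}} \right)^{-1},
\qquad
\bar{\beta} = \bar{V}_\beta \sum_{t=1}^{T} \frac{z_t y_t}{\sigma^2_{s_t}} .
```

In this special case, a small \(v\) pulls \(\bar{\beta}\) toward the prior mean of zero, while letting \(v\) grow large yields the weighted least squares estimate in which each observation is weighted by the inverse error variance of its regime.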

2.6 Nonlinearities in oil markets

This section applies the previously introduced methodology to the oil markets, and specifically to the identification of their shocks. After a decade of relative stability following the end of the first Gulf war, oil has been subject to tremendous price variations. From an average of U.S. $27 in 2000, oil prices steadily increased over the subsequent period and peaked at $147 in July 2008. At that time, concerns about declining oil reserves were raised. The world demand for oil was expected to rise in the years and decades ahead. Moreover, doubts emerged on the supply side: cheap oil had already been discovered and would sooner or later run out, a phenomenon labeled peak oil. Because oil exploration companies would increasingly have to drill for oil in more and more difficult places, the cost of exploration and, in the event of a discovery, the cost of extraction would increase, therefore putting upward pressure on oil prices. However, the opposite occurred on oil markets: in the second half of 2008, due to worsening economic conditions, oil prices collapsed to a four-year low below $50 in November. This illustrates the complexity of oil markets. Moreover, as pointed out by Hamilton (1983), "all but one of the U.S. recessions since World War II have been preceded [...] by a dramatic increase in the price of crude petroleum". This is sufficient motivation for research on oil markets.

The literature on oil shocks and their effect on the economy is vast. Important surveys are Hamilton (2008), on the effects of oil shocks on the US economy, and Kilian (2007) on the same topic. The latter emphasizes the need to account for the endogeneity of energy prices and to differentiate between the effects of demand and supply shocks in energy markets.

In this section, the aim is to study oil shocks with an empirical approach. I base my analysis on Kilian (2009), where a measure of global real activity is constructed and used in a SVAR to identify three types of oil shocks: supply, aggregate demand, and precautionary demand. The impact of these shocks on the U.S. economy is then studied.

Nonlinearities are the motivation for this analysis, which finds its origin in the work by Blanchard and Galí (2007). Using a rolling bivariate VAR for the pre-1984 and post-1984 periods, the authors study the changes in the impact of oil shocks on the economy and find different results over the two periods. More recently, Gronwald (2012) refines the Blanchard and Galí (2007) analysis by using the more flexible machinery of rolling impulse responses. A different analysis of time-varying effects is performed by Baumeister and Peersman (2012), who estimate a multivariate time-varying parameters Bayesian vector autoregression with stochastic volatility. The remainder of this section is dedicated to the Bayesian impulse response analysis of Markov-switching vector autoregressions, which, by allowing for several regimes, parsimoniously characterize the changing economic conditions of a longer observation period.

2.6.1 Data

The data used for this analysis originate from Kilian (2009). The particularity of Kilian (2009) is that the author constructs an index of monthly global economic activity, not by aggregating different countries' data, but by considering dry cargo single voyage ocean freight rates, argued to capture shifts in the demand for industrial commodities driven by the global business cycle. Briefly, in periods of strong global demand, freight shipments increase and so do freight rates, owing to the limited supply of ships in the short run. Conversely, at lower activity levels, freight rate curves remain relatively flat, since idle ships can be reactivated or active ships can operate at higher speeds. This technique makes it possible to sidestep exchange-rate and country weighting, cross-country aggregation, changes in the composition of real output, and changes in the propensity to import industrial commodities for a given unit of output. Kilian (2009) builds the index of global economic activity using data from the Drewry's Shipping Monthly publication. This index is linearly detrended in order to remove the effect of technological advances in ship-building.

The three time series used in Kilian (2009) are the percent change in global crude oil production, \(\Delta prod_t\), the index of real economic activity, \(rea_t\), and the real price of oil, \(rpo_t\), all sampled monthly from January 1973 until December 2007. \(rea_t\) and \(rpo_t\) are expressed in logarithms, while \(\Delta prod_t\) is the percent change in global crude oil production. The summary statistics of the three series are presented in Table 2.1. Interestingly, large differences exist between the means and medians of the three series, indicating the presence of large extremes that affect the means, as can be seen from the minimum and maximum values. The standard deviations of the three series are also high, so the series in Kilian (2009) can be described as highly fluctuating.

Table 2.1: Summary statistics

Variable    Mean    Median    Standard Deviation    Minimum    Maximum
Δprod_t     0.89      2.46                 20.52    -118.89      77.98
rea_t       0.18     -5.85                 24.11     -47.39      76.71
rpo_t       0.11    -10.45                 45.66    -114.78      90.64
Data Source: Kilian (2009).

The time series are plotted in Figure 2.2, and the plots display the occurrence of large changes in the series as well as heteroskedasticity. For instance, the variability of the percent change in global crude oil production seems to diminish during the eighties. That alone justifies the investigation of these series with Markov-switching vector autoregressive models.

2.6.2 Evidence for regime switches

Since the data span about 35 years, structural changes may have occurred within this period, and the model parameters may have changed over time. Considering just the real price of oil, for instance, the oil crises of 1973 and 1979, followed by the 1980s oil glut, as well as the 1990 Gulf war and the post-2003 oil price movements, reflect the variability of the market. Hence the question of temporal stability is raised. First, the literature on structural changes is discussed, and then tests of structural stability are performed on the data. Additionally, the interested reader can find another approach briefly presented in Appendix B.1, the rolling estimation of a VAR model associated with rolling impulse responses, as an illustration of the low stability of the dynamics inside the sample.

The survey conducted in Hansen (2001) is informative on the existing apparatus for structural change testing within the classical approach. First and foremost, the Chow test from Chow (1960) compares two subsamples with an F-test, but requires the date of structural change to be known. Quandt (1960) resolves this limitation by running the Chow test over all the possible break dates of the sample. However, the Quandt test statistic had unknown critical values until the work of Andrews (1993). Research further advanced in the direction of multiple breakpoints and multivariate time series. In Appendix B.2, the series are subjected to the Qu and Perron test, which allows for multivariate systems, multiple structural changes, and changes in variance. Changes can occur in the regression coefficients and/or the covariance matrix of the errors. The Qu and Perron test statistic cannot reject the null hypothesis for structural changes.

Figure 2.2: Time series of the percent change in global crude oil production, the index of real economic activity, and the real price of oil. The sample runs from 1973.1 until 2007.12.
[Three stacked panels labeled Production, RA, and RPO; horizontal axis: Time, 1975 to 2005.]

In the Bayesian framework chosen for this paper, the structural breaks are modelled with Markov-switching vector autoregressive processes. Testing for nonlinearities boils down to comparing models exhibiting the desired features to a linear model. The comparison metric is the marginal density of data of each model. The procedure can be summarized as follows:

Step 1: Specify the VAR and MS-VAR models. Choose the order of the VAR process, \(p \in \{1, \ldots, p_{max}\}\), and the number of states, \(M \in \{1, \ldots, M_{max}\}\), using marginal densities of data.

Step 2: Compare linear to nonlinear models. Test the linear VAR model versus the nonlinear MS-VAR one using the Posterior Odds Ratio, e.g. according to the scale proposed by Kass and Raftery (1995).

Table 2.2 displays the marginal density of data for each model, computed with the modified harmonic mean applied to the posterior draws.⁴ The data are fitted with VAR(p) and MSIAH(m)-VAR(p) models for different numbers of regimes, m = 2, 3, 4, and different lag lengths, p = 1, ..., 6. For VAR models, each Gibbs algorithm is initialized with the OLS estimates of the corresponding model. For MS-VAR models, each Gibbs algorithm is initialized with the estimates from an EM algorithm for the corresponding model. A 10000-iteration burn-in follows and, after convergence of the sampler, 5000 final draws are sampled from the posteriors. The prior distributions are as defined in Section 2.4. First of all, the results for many models with a higher number of lags for the autoregressive parameters are absent from the table. This occurs because these models are not well supported by the data and the simulations are problematic for such models. The best performing model overall is the MSIAH(4)-VAR(2), with 4 regimes and 2 lags for the autoregressive coefficients. This model largely outperforms all the others in terms of marginal density of data; the gap to the best two-regime model, the MSIAH(2)-VAR(2), exceeds a hundred in logs. Models with three regimes perform very poorly.⁵ The MSIAH(4)-VAR(2) is hence by far the model preferred by the data.

Table 2.3 reports the results of testing the best linear VAR model against the best MS-VAR model. Expressed in logarithms, the posterior odds ratio (equivalent to the Bayes factor, as no prior discrimination between the models is made) in favor of the MS-VAR model over the null hypothesis is equal to 337.34, which on the interpretation scale of Kass and Raftery (1995) corresponds to very strong evidence in favor of the model with Markov-switching properties.
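The comparison amounts to a subtraction of log marginal densities followed by a lookup on the Kass and Raftery (1995) scale, which is stated in terms of twice the log Bayes factor. The sketch below is illustrative; the function names are hypothetical and the two inputs are the log marginal densities reported for the VAR(3) and the MSIAH(4)-VAR(2).

```python
def log_bayes_factor(log_ml_alternative, log_ml_null):
    """ln B = ln p(y | M_1) - ln p(y | M_0), assuming equal prior model odds."""
    return log_ml_alternative - log_ml_null

def kass_raftery_label(log_bf):
    """Evidence category based on 2 * ln B (Kass and Raftery, 1995)."""
    two_ln_b = 2.0 * log_bf
    if two_ln_b < 0:
        return "favors the null model"
    if two_ln_b <= 2:
        return "not worth more than a bare mention"
    if two_ln_b <= 6:
        return "positive"
    if two_ln_b <= 10:
        return "strong"
    return "very strong"

log_bf = log_bayes_factor(-4300.37, -4637.71)
label = kass_raftery_label(log_bf)
```

A log Bayes factor of this size lies far beyond the threshold of the highest category, so the conclusion does not hinge on the exact cut-off values.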

The next paragraph takes a closer look at the posterior probabilities of regimes for the MSIAH(4)-VAR(2), and attempts to characterize the regimes by observing their characteristics.




Table 2.2: Model selection – determination of number of regimes and of lags

Lags                1           2           3           4           5           6           7
VAR          -4691.93    -4638.97    -4637.71*   -4646.67    -4650.54    -4661.99    -4666.66
2 regimes    -4426.13    -4415.07*   -4589.20    -4589.20    -4589.20    -4476.53    -4536.70
3 regimes    -5317.69*   -6995.59    -5878.82    -7537.85           -           -           -
4 regimes    -4338.30    -4300.37*   -4592.20    -4351.81    -4448.47    -5059.82    -5877.04

Lags                8           9          10          11          12          13
VAR          -4674.91    -4686.82    -4700.96    -4717.64    -4721.16    -4734.44
2 regimes    -4764.66           -           -           -           -           -
3 regimes           -           -           -           -           -           -
4 regimes    -4761.77           -           -           -           -           -

Empty cells denote non-converging Gibbs for the corresponding models.
The highest MHM value for each row is marked with an asterisk.

Table 2.3: Comparison of VAR specification to MS-VAR specification through Posterior Odds Ratio.

Hypothesis                                                                ln p(y|M_j)    ln B_j0
H0: Linear model
    VAR(3)                                                                   -4637.71          0
H1: Nonlinear model with structural breaks in all the model parameters
    MSIAH(4)-VAR(2)                                                          -4300.37     337.34

2.6.3 Regimes yielded by the selected MS-VAR model

The first output of the Gibbs sampler worth examining is the regime probabilities. After model selection, a longer run of the Gibbs sampler is started, with 10000 iterations for burn-in and 50000 iterations for drawing from the posterior densities. Figure 2.3 plots the regime probabilities for the selected model, the MSIAH(4)-VAR(2), which has four regimes for the latent state variable \(S_t\).

⁴ Droumaguet and Wozniak (2012) detail the computation of the modified harmonic mean.
⁵ Estimation of models with five regimes was attempted but resulted in poorly converging simulations, and these models were thus discarded.




Figure 2.3: Estimated probabilities of regimes for the latent variable of the MSIAH(4)-VAR(2) model.

[Four stacked panels, Regime 1 to Regime 4; vertical axis: probability from 0.0 to 1.0; horizontal axis: Year, 1975 to 2005.]

As the timeline of Figure 2.3 indicates, two regimes dominate the sample period, regime 1 and regime 4. Regime 1 is mostly represented over the first part of the sample, whereas from 1986 onwards a shift to regime 4 is observed. 1986 constitutes a pivotal year on the oil markets, as during that year crude oil prices collapsed due to the increased production of the OPEC members. This event, known as the oil glut, began in 1980 and was due to the slowdown in economic activity after the energy crises of 1973 and 1979, caused respectively by the Arab oil embargo and the Iranian revolution. This period is also referred to by Baumeister and Peersman (2012) as a period of transition from oil prices administered by OPEC to market-based oil trading. Table 2.4 displays the median covariance matrices from the posteriors of the regimes. Interestingly, one can observe high variance in oil production in regime 1, as well as high variance in real economic activity, and low variance in oil prices. By contrast, the post-1986 regime has lower variance of oil production and of real economic activity, but a higher variance of oil prices.

Table 2.4: Median of the posterior covariance matrices Σ_{s_t} for the four estimated regimes of the MSIAH(4)-VAR(2) model.

                     Δprod       rea       rpo
Regime 1   Δprod    864.00         -         -
           rea       21.90     62.94         -
           rpo        4.06      0.60      2.80
Regime 2   Δprod      1.27         -         -
           rea       -0.01      2.18         -
           rpo       -0.14     -5.13     29.65
Regime 3   Δprod      0.61         -         -
           rea        0.12      0.66         -
           rpo       -0.05     -0.11      0.58
Regime 4   Δprod    122.69         -         -
           rea        4.32     13.40         -
           rpo        1.26      1.63     47.40

The remaining regimes, regimes 2 and 3, rarely occur and both have short persistence, as indicated by the median of the posterior transition probabilities displayed in Table 2.5. With probabilities of staying in the same regime of 0.60 for regime 2 and 0.73 for regime 3, both regimes are transient. They are also seldom reached from regime 1 and regime 4. Regimes 1 and 4 are however very persistent, with probabilities of remaining in the same regime both over 0.90. Next, Bayesian impulse responses are calculated for the different regimes in order to characterize their dynamics. Besides the economic interpretation of the responses, the heterogeneous nature of the regimes in terms of persistence constitutes an interesting case study to illustrate the properties of the impulse response functions derived for MS-VARs. Indeed, the propagation of impulse responses from initial regimes with low persistence is expected to be influenced by the changes in regime likely to occur over the impulse horizon.




Table 2.5: Median of the posterior matrix of transition probabilities P for the MSIAH(4)-VAR(2) model.

            Regime 1    Regime 2    Regime 3    Regime 4
Regime 1        0.93        0.01        0.01        0.06
Regime 2        0.05        0.60        0.05        0.29
Regime 3        0.05        0.07        0.73        0.15
Regime 4        0.02        0.01        0.01        0.95
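One way to read the diagonal of Table 2.5 is through the expected duration of a regime, which for a first-order Markov chain equals \(1 / (1 - p_{ii})\) periods. The sketch below applies this standard formula to the median transition matrix; the function name is hypothetical, and since the data are monthly the results are in months.

```python
def expected_durations(P):
    """Expected number of consecutive periods spent in each regime."""
    return [1.0 / (1.0 - P[i][i]) for i in range(len(P))]

# Median posterior transition probabilities from Table 2.5.
P_median = [
    [0.93, 0.01, 0.01, 0.06],
    [0.05, 0.60, 0.05, 0.29],
    [0.05, 0.07, 0.73, 0.15],
    [0.02, 0.01, 0.01, 0.95],
]
durations = expected_durations(P_median)
```

Under this reading, regimes 2 and 3 last about two and a half and under four months on average, while regimes 1 and 4 last roughly fourteen and twenty months.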

2.6.4 Impulse responses analysis

From the MCMC simulation of the MSIAH(4)-VAR(2) model, and using the Cholesky decomposition as the identification scheme, the impulse responses IR(∇ε, s0, t) are computed as described in Section 2.3. A horizon of h = 20 periods after the impulse is chosen. The shocks ∇ε impact a system that is in regime s0 at time 0. The dimension of the series is N = 3, so three different structural shocks can impact the system. The latent variable can take four values, so there are four possible starting regimes. All the responses are represented in Figures 2.4, 2.5, 2.6, and 2.7. As Figure 2.3 shows, two regimes dominate the sample period, regime 1 for the pre-1986 period and regime 4 for the post-1986 period. Next, responses with these two regimes as starting points are presented, before the features of the responses for the other two, less persistent regimes are examined.
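The role of the Cholesky decomposition can be illustrated on a single covariance matrix. Under a recursive identification, the columns of the lower-triangular factor give the impact responses to one standard deviation structural shocks. The sketch below applies a textbook Cholesky routine to the median regime 4 covariance matrix of Table 2.4; this is purely illustrative, since the impulse responses of this section are computed from the MCMC output as described in Section 2.3, and the function name is hypothetical.

```python
import math

def cholesky_lower(A):
    """Lower-triangular L with L L' = A for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

# Median posterior covariance matrix of regime 4 (order: dprod, rea, rpo).
sigma_regime4 = [
    [122.69, 4.32, 1.26],
    [4.32, 13.40, 1.63],
    [1.26, 1.63, 47.40],
]
L = cholesky_lower(sigma_regime4)
impact_of_first_shock = [row[0] for row in L]  # first column of L
```

The zeros above the diagonal of the factor encode the ordering assumption: the first variable responds on impact only to the first shock, the second variable to the first two shocks, and so on.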

The interpretation of the structural shocks is borrowed from Kilian (2009), i.e. a global oil production shock (normalized to be negative), an unanticipated aggregate demand expansion, and an unanticipated oil-market specific demand shock.

Pre-1986 and post-1986 regimes In Kilian (2009), and as presented in Appendix B.3, unanticipated oil supply disruptions are sharp upon impact but then partially decline over the year following the shock. They also have almost no effect on real economic activity, and a surprisingly small effect on the real price of oil. With the Bayesian impulse responses drawn for MS-VARs in Figures 2.4 and 2.5, regardless of the regime, any negative oil supply shock quickly reverts to normal; the shock persists for only 2 to 3 months. This is consistent with Kilian (2009), where it is argued that a disruption of production in one region triggers production in another region of the world. Differences between the pre-1986 and post-1986 periods arise for the responses of the two other variables.




Figure 2.4: Regime 1, pre-1986 - Impulse responses to one standard deviation structural shocks. The 10%, 40%, 50%, 60%, and 90% quantiles of the posterior of the impulse responses are represented.

[Figure: nine panels, one for each pair of shock (dprod, rea, rpo) and response (dprod, rea, rpo), over a horizon of 0 to 20 periods.]




Figure 2.5: Regime 4, post-1986 - Impulse responses to one standard deviation structural shocks. The 10%, 40%, 50%, 60%, and 90% quantiles of the posterior of the impulse responses are represented.

[Figure: nine panels over a horizon of 0 to 20 periods.]




In the pre-1986 regime, an oil supply disruption is followed by a slight decline in global economic activity as well as a sharp but brief increase in the price of oil, lasting only the 2 to 3 periods over which the shock persists. In the post-1986 regime, however, a negative oil supply shock surprisingly triggers a lasting increase in global economic activity and a lasting increase in the price of oil. Overall, the post-1986 regime is more affected by a negative oil supply shock, with oil prices increasing. The change in oil market structure between regimes 1 and 4, toward market-based oil trading, could explain these differences in behavior. Note, however, that in both regimes the 90% error bands are very broad and the posterior of the impulse responses is not highly concentrated around one path, which casts doubt on the significance of the responses.

In Kilian (2009), an unanticipated aggregate demand expansion is highly persistent and significant; it temporarily increases oil production after a six-month delay and causes a similarly delayed increase in oil prices. In regime 1, plotted in Figure 2.4, the shock is highly persistent and highly significant. Yet global oil production diminishes, whereas oil prices are not much affected by the shock. In regime 4, plotted in Figure 2.5, the unanticipated aggregate demand expansion is also persistent and significant. The oil supply also diminishes (significantly), whereas oil prices permanently increase (broad error bands indicate low significance of the response). As with the first structural shock, the responses of the post-1986 regime show oil prices reacting more in accordance with economic principles.

The last structural shock hitting the system was labeled an unanticipated oil-market specific demand increase in Kilian (2009), where its effect on oil prices was large, persistent, and significant. It also triggered a small short-run decline in oil production and a temporary increase in real economic activity. Within the MS-VAR, apart from a lower persistence of the reaction of oil prices, the responses for the pre-1986 regime are quite similar: global oil production hardly reacts to the shock, and global economic activity temporarily increases. The picture changes, however, for the post-1986 regime. An unanticipated oil-market specific demand increase triggers an increase in global oil production and also spurs global economic activity. Again, the relation between oil prices and oil supply is more in accordance with the outcome of competitive markets in the post-1986 regime.

Outlier regimes. Figures 2.6 and 2.7 display the impulse responses for the remaining two highly non-persistent regimes. Interestingly, for these two regimes the shape and intensities of the responses are comparable, as are the broad error bands of the responses, with the posterior mostly symmetrically distributed on both sides of the zero response line. In response to a global oil production disruption, both regimes see an increase in real economic activity and an increase in oil prices. An unanticipated aggregate demand expansion leads to a negative response in oil supply and to an increase in oil prices. The most novel finding comes from the third fundamental shock, an unanticipated oil-market specific demand increase. In this case, as in the pre-1986 and post-1986 regimes, oil suppliers adapt by increasing their production. However, for the first time we witness a permanent decrease in global real economic activity. This is notable because, of all the fundamental disturbances, this is the only one with a negative effect on global economic activity.

Figure 2.6: Regime 2, outliers - Impulse responses to one standard deviation structural shocks. The 10%, 40%, 50%, 60%, and 90% quantiles of the posterior of the impulse responses are represented.

[Figure: nine panels over a horizon of 0 to 20 periods.]

Figure 2.7: Regime 3, outliers - Impulse responses to one standard deviation structural shocks. The 10%, 40%, 50%, 60%, and 90% quantiles of the posterior of the impulse responses are represented.

[Figure: nine panels over a horizon of 0 to 20 periods.]

Besides the economic interpretation of the impulse responses, two remarkable features stand out from this econometric framework of Markov-switching Autoregressive models:

1. The shapes of the impulse responses for both regimes are strikingly similar. This originates in the low persistence of these regimes. With median posterior probabilities of remaining in the same regime for the next period of 0.60 for regime 2 and 0.73 for regime 3, the simulated regime histories are bound to include changes of regime over the propagation of the shock. For both regimes, as can be seen in Table 2.5, the second most likely regime in the next period, after remaining in regime 2 or 3, is regime 4 (p24 = 0.29 and p34 = 0.15). Low persistence in these regimes, combined with the same most likely destination regime, leads to similarly shaped impulse responses in Figures 2.6 and 2.7. This is an interesting feature of impulse responses that take into account the probabilities of regime switches, and it would not be observed in regime-dependent impulse responses à la Ehrmann et al. (2003).

2. The error bands for regimes 2 and 3 are very large in comparison to those for regimes 1 and 4, large enough to cast doubt on the significance of the responses, as can be seen from the 90% quantile responses for both regimes in Figures 2.6 and 2.7, whose bands almost always contain the zero response line. Impulse responses are drawn at each iteration of the MCMC algorithm, using posterior draws of the parameters, but in addition noise is added by the simulation of the impulse propagation. Each posterior impulse response is associated with a randomly drawn regime history over the propagation of the shock, leading to these large error bands, which could be seen as rendering the responses noisy. Interestingly, the responses of regime 2 have larger error bands than the responses of regime 3. This can also be explained by the lower persistence of regime 2 in comparison to regime 3. With error bands, Bayesian impulse responses for MS-VARs hence incorporate information on the confidence with which inference can be based on the responses, which is an improvement upon the work of Krolzig (2006).
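The persistence argument in the two points above can be checked directly from the transition probabilities printed in Table 2.5. The sketch below is illustrative only: it treats the rounded posterior medians as a transition matrix, and the row renormalisation used before taking matrix powers is an assumption, since rounded medians need not sum to one.

```python
import numpy as np

# Median posterior transition probabilities as printed in Table 2.5
# (row = current regime, column = regime in the next period).
P_raw = np.array([
    [0.93, 0.01, 0.01, 0.06],
    [0.05, 0.60, 0.05, 0.29],
    [0.05, 0.07, 0.73, 0.15],
    [0.02, 0.01, 0.01, 0.95],
])

# Expected duration of regime i, in periods, is 1 / (1 - p_ii).
durations = 1.0 / (1.0 - np.diag(P_raw))

# Probability of never leaving the starting regime over h periods is p_ii ** h.
h = 20
stay_all = np.diag(P_raw) ** h

# Assumption: renormalise rows so that matrix powers remain proper distributions.
P = P_raw / P_raw.sum(axis=1, keepdims=True)

# Row i gives the regime distribution h periods after starting in regime i.
P_h = np.linalg.matrix_power(P, h)
```

With these inputs the expected durations of regimes 2 and 3 are 2.5 and about 3.7 periods, so a response propagated over 20 periods from either regime will almost surely pass through other regimes, which is what drives both the similar shapes and the wide bands.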

Summary. This empirical application illustrates what Bayesian impulse responses for Markov-switching vector autoregressive models bring to empirical analysis. Applied to the oil market data of Kilian (2009), the simulations of the regimes split the observation sample into four regimes, two of which largely dominate, essentially cutting the sample into a pre-1986 regime and a post-1986 regime. Impulse responses differ between these regimes in the way oil supply and oil prices react to each other, differences attributable to structural changes occurring in oil markets. Indeed, 1986 saw the collapse of OPEC and a switch from administered oil trading to market-based trading. The two other regimes of outliers also made it possible to identify periods where, presumably (because of large error bands), unanticipated oil-market specific demand increases result in declining real economic activity. Overall, the simulations from the posteriors of the model made it possible to discover, date, and characterize the regime switches occurring over the data sample. Moreover, Bayesian impulse responses shed light on the dynamics of the regimes by taking into account the probabilities of switching regimes in the responses; this yields uncertainty in the propagation of the responses, which is itself captured by their posterior distribution. The Bayesian framework hence constitutes an attractive solution to the challenges arising from drawing impulse responses with nonlinear models.

2.7 Conclusions

This work defines and advocates the use of Bayesian impulse responses when dealing with Markov-switching Vector Autoregressions, in order to overcome the limitations of the classical approach. The advantages of the proposed approach, discussed throughout the analysis, are fourfold. First, the Bayesian impulse responses defined in this article take into account the possibility of regime changes over the propagation of the considered shock. Changes in regime imply changes in dynamics and shape the impulse responses; Bayesian integration over randomly drawn histories of regimes therefore takes the nonlinear feature of the models into account. Second, since Bayesian inference yields posterior densities instead of point estimates, inference on how likely the responses are can be performed directly from their posteriors. This is important because nonlinear models with switching regimes have much richer dynamics than linear ones, and the range of possible impulse responses can become large, as is shown in the case of regimes with low persistence. Third, model selection through the comparison of marginal densities of the data makes it possible to discriminate between linear and nonlinear models, and to select the best model specification in terms of the number of regimes and the number of autoregressive lags. In the classical approach, the problem of choosing the number of regimes has not yet been tackled for multivariate models. Fourth, Bayesian shrinkage helps to circumvent the curse of dimensionality and makes it possible to estimate larger models, either with more autoregressive lags or with more regimes.

The methodology was applied to the dataset of Kilian (2009). The main finding was a split of the sample into two periods around 1986, the year that saw the collapse of OPEC. The structural change toward more competitive markets was highlighted by the dynamics appearing within each regime. Two other regimes identified periods where unanticipated oil-market specific demand increases led to a global decline in economic activity.

The limitations of the methodology lie in three points. First, Markov-switching Vector Autoregressions possess multiple regimes, and the number of impulse responses to study increases accordingly, which can make empirical analysis cumbersome, as different regimes with diverging dynamics can become hard to interpret. Second, the complex dynamics involved in the propagation of shocks, due to the possibility of regime switches over the studied horizon, can lead to large error bands and low significance. This is, however, inherent to Markov-switching Vector Autoregressive models and hence cannot be ignored. Third, this analysis is restricted to a specific structural identification scheme, the Cholesky decomposition. The Gibbs sampler proposed here does not operate with other identification schemes. This can, however, be solved by using the Gibbs sampler of Waggoner and Zha (2003), which includes a specific block for drawing from the posterior distribution of the matrix responsible for structural identification.

To some extent, the propagation of impulse responses is similar to a forecasting exercise. The algorithm presented here could be applied to produce multiple-period-ahead density forecasts of regimes and series. This area of research is promising, because multiple-period-ahead forecasting of MS-VAR models is made possible by MCMC simulations and because density forecasting is gaining popularity in econometrics.



Appendix B

B.1 Alternative classical approach, the rolling estimation

The flexible empirical framework of rolling estimation is used here to analyze the effect of oil shocks, in the vein of Blanchard and Galí (2007) and, more closely, the analysis of Gronwald (2012), who also used rolling estimation. The rolling (window) estimators are based on a changing subsample of fixed length that moves sequentially through the sample, giving rise to a series of estimates for the parameters of the model. Subsamples of 15 years are used iteratively, i.e. with a later starting date at each estimation, to estimate the SVAR and the impulse responses to structural shocks. The choice of 15 years for the window was dictated by the explosive character of the responses for smaller window sizes.

The responses of the three variables to the three structural shocks are plotted as contour plots for clarity in Figures B.1, B.2, and B.3. The horizontal axis corresponds to the months following the shock. The vertical axis is the starting date of the 15-year period, so that the horizontal line at 1980 is the response over time (18 months) to a structural shock for the model estimated with data from 1980 until 1995. The intensity of the response is represented by its color.
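The rolling procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used for the figures: the function names are hypothetical, the VAR is fitted by plain OLS with a Cholesky identification, the lag order of 2 is an arbitrary choice for the example, and the data are simulated.

```python
import numpy as np

def fit_var(y, p):
    """OLS estimate of a VAR(p) with intercept: lag matrices and residual covariance."""
    T, n = y.shape
    X = np.hstack([np.ones((T - p, 1))] + [y[p - l:T - l] for l in range(1, p + 1)])
    Y = y[p:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B
    Sigma = resid.T @ resid / (T - p - X.shape[1])
    A = [B[1 + (l - 1) * n:1 + l * n].T for l in range(1, p + 1)]
    return A, Sigma

def cholesky_irf(A, Sigma, horizon):
    """Responses to one-standard-deviation orthogonalised shocks, shape (horizon + 1, n, n)."""
    n, p = Sigma.shape[0], len(A)
    Phi = [np.eye(n)]
    for h in range(1, horizon + 1):
        # Moving-average recursion: Phi_h = sum over lags l of A_l Phi_{h-l}
        Phi.append(sum(A[l] @ Phi[h - 1 - l] for l in range(min(p, h))))
    L = np.linalg.cholesky(Sigma)
    return np.stack([Ph @ L for Ph in Phi])

def rolling_irf(y, p, window, horizon):
    """One set of impulse responses per window start date."""
    return np.stack([
        cholesky_irf(*fit_var(y[start:start + window], p), horizon)
        for start in range(y.shape[0] - window + 1)
    ])

# Simulated monthly data standing in for the three oil market series.
rng = np.random.default_rng(1)
y = np.zeros((300, 3))
for t in range(1, 300):
    y[t] = 0.5 * y[t - 1] + rng.standard_normal(3)

# 15-year window = 180 months, responses traced over 18 months.
irfs = rolling_irf(y, p=2, window=180, horizon=18)
```

Indexing `irfs[w, h, i, j]` gives the response of variable `i` to shock `j`, `h` months after impact, for the window starting at observation `w`; fixing `i` and `j` yields the window-by-horizon surface that a contour plot displays.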

Oil supply shock

• Over time, the response of oil production becomes less intense, until it almost disappears for the subsamples starting after 1985. This could be a sign that world production adapts better to disruptions, so that the global oil supply is not affected by negative shocks.


Figure B.1: Rolling impulse responses for an oil supply shock (negative shock). Period corresponds to the starting time of the estimated subsample, for a window size of 15 years.

[Figure: three contour plots with months after shock on the horizontal axis and period (1980 to 1990) on the vertical axis; color gives the response.]

(a) Response of oil production

(b) Response of real activity

(c) Response of real price of oil

• Real activity tends to react positively, with about a one-year delay, to oil supply cuts for windows starting after 1987, whereas beforehand the reaction was negative a year after the shock.

• The response of the real price of oil changes over time. For early windows starting before 1985, an oil supply shortage is followed by a transitory increase in the real price of oil that vanishes after 8 months. For later samples, however, although the initial response tends to be moderate, oil prices rise significantly after 10 months. This may reflect fears of a supply disruption because of vanishing oil reserves.

Aggregate demand shock

• After an unanticipated expansion in aggregate demand, oil production reacts positively and permanently for the post-1986 windows. Earlier windows show a delayed and transitory positive response.

• Aggregate demand shocks are permanent in the windows starting from 1975 until 1980 and for windows starting after 1989. In between, the shocks are only transitory and usually start to vanish after 6 months. This could be interpreted as a manifestation of the great moderation, during which the economy would absorb aggregate demand shocks more efficiently. Based on Figure B.3(b), later estimation windows would indicate that the great moderation is over. One disadvantage of rolling impulse responses with a window size of 15 years lies in the impossibility of precisely dating an event.

Figure B.2: Rolling impulse responses for an aggregate demand shock. Period corresponds to the starting time of the estimated subsample, for a window size of 15 years.

[Figure: three contour plots with months after shock on the horizontal axis and period (1980 to 1990) on the vertical axis; color gives the response. Panels: response of oil production, response of real activity, response of real price of oil.]

• Interestingly, the response of the real price of oil also declined for windows starting between 1980 and 1989. Whereas outside these windows the aggregate demand shock triggered an oil price increase after about half a year, this increase is not observable inside these windows, perhaps as another manifestation of the great moderation.

Oil-market specific demand shock

• The response of oil production to a precautionary demand shock in oil is positive overall, and the rolling impulse responses show that the response seems faster for the later windows.

• Unanticipated oil-market specific demand shocks are associated with a temporary increase in real activity. This movement was particularly visible within the windows starting after 1984.

• In contrast to Kilian (2009), the shock is no longer permanent, except for a few windows starting from 1984 to 1987, where the overshooting effect is the strongest. For the latest windows starting after 1990, the real price of oil returns to normal after about a year, which is not inconsistent with the stronger response in oil production for the same periods.

Figure B.3: Rolling impulse responses for an oil-specific demand shock. Period corresponds to the starting time of the estimated subsample, for a window size of 15 years.

[Figure: three contour plots with months after shock on the horizontal axis and period (1980 to 1990) on the vertical axis; color gives the response. Panels: response of oil production, response of real activity, response of real price of oil.]

B.2 Structural breaks, the Qu and Perron test

The testing procedure developed in Qu and Perron (2007) is appropriate for multivariate systems with multiple structural changes and allows changes in variance to be studied. Changes can occur in the regression coefficients and/or the covariance matrix of the errors.

The test is the maximal value of the likelihood ratio over all admissible partitions in the set $\Lambda_\varepsilon$,¹ that is:

$$
\sup LR_T(m, p_b, n_{db}, n_{bo}, \varepsilon)
= \sup_{(\lambda_1,\ldots,\lambda_m)\in\Lambda_\varepsilon} 2\left[\log \hat{L}_T(T_1,\ldots,T_m) - \log \tilde{L}_T\right]
= 2\left[\log \hat{L}_T(\hat{T}_1,\ldots,\hat{T}_m) - \log \tilde{L}_T\right],
$$

where the estimates $(\hat{T}_1,\ldots,\hat{T}_m)$ are the quasi-maximum likelihood estimators obtained by considering only those partitions in $\Lambda_\varepsilon$, and $\log \tilde{L}_T$ is the log-likelihood under the null hypothesis of no change in the structure. The parameter $\varepsilon$ acts as a truncation that imposes a minimal length for each segment and affects the limiting distribution of the test.

¹ The maximization is conducted over all the partitions $T = (T_1,\ldots,T_m) = (T\lambda_1,\ldots,T\lambda_m)$ in the set $\Lambda_\varepsilon = \{(\lambda_1,\ldots,\lambda_m) ;\ |\lambda_{j+1} - \lambda_j| \ge \varepsilon,\ \lambda_1 \ge \varepsilon,\ \lambda_m \le 1 - \varepsilon\}$.
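The logic of a sup LR statistic with trimming can be illustrated in a deliberately simplified setting: a single break (m = 1) in the mean and variance of a univariate Gaussian series. The sketch below is not the Qu and Perron (2007) procedure, which handles multivariate systems and multiple breaks; the function names and the simulated data are assumptions made for the example.

```python
import numpy as np

def gaussian_loglik(x):
    """Maximised Gaussian log-likelihood of a sample (mean and variance estimated by ML)."""
    n = x.size
    return -0.5 * n * (np.log(2.0 * np.pi * x.var()) + 1.0)

def sup_lr_single_break(x, eps=0.15):
    """Sup of 2 * [loglik with a break at k - loglik without break] over admissible k."""
    T = x.size
    lo = int(np.ceil(eps * T))  # trimming: each segment keeps at least eps * T observations
    ll_null = gaussian_loglik(x)
    stats = {
        k: 2.0 * (gaussian_loglik(x[:k]) + gaussian_loglik(x[k:]) - ll_null)
        for k in range(lo, T - lo + 1)
    }
    k_hat = max(stats, key=stats.get)
    return stats[k_hat], k_hat

# Simulated series with a mean shift halfway through the sample.
rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0.0, 1.0, 150), rng.normal(3.0, 1.0, 150)])
stat, k_hat = sup_lr_single_break(x, eps=0.15)
```

Because the break date is chosen to maximise the statistic, its limiting distribution is nonstandard and depends on the trimming `eps`, so the value of `stat` should be compared with tabulated critical values rather than with a chi-square quantile.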

Necessity to account for structural changes on the oil market. The optimal number of lags for the VAR model, to be used for the SupLR test, is determined by the information criteria reported in Table B.1. Kilian (2009) used 24 lags, which is far above the suggested values of 2 for the Hannan-Quinn and Schwarz criteria and 3 for the Akaike information criterion. We choose 2 lags, since the Akaike criterion tends to overestimate the optimal number of lags.

Table B.1: Optimal number of lags for a VAR model selected by each information criterion, with a maximum number of lags pmax = 24.

Akaike IC Hannan-Quinn Schwarz Criterion

3 2 2
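The lag selection summarised in Table B.1 follows a standard recipe: fit the VAR for each candidate order on a common sample and keep the order that minimises each criterion. A minimal sketch follows, assuming the usual textbook forms of the Akaike, Hannan-Quinn, and Schwarz criteria; the function name, the simulated VAR(2) data, and the smaller maximum lag are assumptions for illustration and do not reproduce the table.

```python
import numpy as np

def var_information_criteria(y, p, p_max):
    """AIC, HQ and SC of a VAR(p) fitted by OLS on the common sample t = p_max, ..., T - 1."""
    T_full, n = y.shape
    Y = y[p_max:]
    T = Y.shape[0]
    X = np.hstack([np.ones((T, 1))] + [y[p_max - l:T_full - l] for l in range(1, p + 1)])
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B
    log_det = np.linalg.slogdet(resid.T @ resid / T)[1]  # log determinant of ML covariance
    k = p * n * n  # number of autoregressive coefficients
    return {
        "AIC": log_det + 2.0 * k / T,
        "HQ": log_det + 2.0 * np.log(np.log(T)) * k / T,
        "SC": log_det + np.log(T) * k / T,
    }

# Simulated VAR(2) data (hypothetical coefficients).
rng = np.random.default_rng(2)
T_sim, n = 2000, 3
y = np.zeros((T_sim, n))
for t in range(2, T_sim):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.standard_normal(n)

p_max = 8
table = {p: var_information_criteria(y, p, p_max) for p in range(1, p_max + 1)}
selected = {name: min(table, key=lambda p: table[p][name]) for name in ("AIC", "HQ", "SC")}
```

The Schwarz penalty grows with the sample size while the Akaike penalty does not, which is why the Akaike criterion tends to select at least as many lags as the other two.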

Table B.2: SupLR statistics of Qu and Perron, reported for a VAR model with 2 lags and for up to 4 regime changes.

Number of breaks supLR test statistic Critical value (1% level)

1 255.38 40.37
2 349.97 63.84
3 410.82 86.74
4 418.23 108.32

Table B.2 reports the SupLR statistic for a VAR(2) model.² For every number of breaks reported, the null hypothesis of no structural change is rejected, with test statistics well above the 1% critical values.

² The test was run from the source code provided by Qu and Perron (2007), which did not run for a number of breaks higher than four with the data. Therefore the structural break testing is restricted to four structural changes.

B.3 Kilian (2009)’s impulse responses

The impulse responses from Kilian (2009) are replicated and plotted here for comparison.

Figure B.1: Impulse responses to one standard deviation structural shocks from Kilian (2009), obtained within the classical framework described in the original paper. One and two standard deviation confidence intervals are represented.

[Figure: nine panels, one for each pair of shock (prod, rea, rpo) and response (prod, rea, rpo), over a horizon of 0 to 15 periods.]



Bibliography

Adolfson, M., J. Lindé, and M. Villani (2007). Forecasting Performance of an Open Economy DSGE Model. Econometric Reviews 26(2-4), 289–328.

Andrews, D. (1993). Tests for Parameter Instability and Structural Change with Unknown Change Point. Econometrica: Journal of the Econometric Society, 821–856.

Baumeister, C. and G. Peersman (2012). Time-Varying Effects of Oil Supply Shocks on the US Economy. Working paper, Bank of Canada.

Blanchard, O. J. and J. Galí (2007). The Macroeconomic Effects of Oil Price Shocks: Why are the 2000s so Different from the 1970s? In International Dimensions of Monetary Policy, NBER Chapters, pp. 373–421. National Bureau of Economic Research, Inc.

Casella, G. and E. I. George (1992). Explaining the Gibbs Sampler. The American Statistician 46(3), 167–174.

Chow, G. (1960). Tests of Equality Between Sets of Coefficients in Two Linear Regressions. Econometrica: Journal of the Econometric Society, 591–605.

Doan, T., R. B. Litterman, and C. A. Sims (1983). Forecasting and Conditional Projection Using Realistic Prior Distributions. NBER Working Paper 1202(September), 1–71.

Droumaguet, M. and T. Wozniak (2012). Bayesian Testing of Granger Causality in Markov-Switching VARs. Working paper series, European University Institute, Florence, Italy. Download at: http://cadmus.eui.eu/bitstream/handle/1814/20815/ECO_2012_06.pdf?sequence=1.

Ehrmann, M., M. Ellison, and N. Valla (2003). Regime-Dependent Impulse Response Functions in a Markov-Switching Vector Autoregression Model. Economics Letters 78(3), 295–299.

Frühwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models. Springer.

Garcia, R. (1998). Asymptotic Null Distribution of the Likelihood Ratio Test in Markov Switching Models. International Economic Review 39(3), 763–788.

Geweke, J. (1999). Using Simulation Methods for Bayesian Econometric Models: Inference, Development, and Communication. Econometric Reviews 18(1), 1–73.




Gonçalves, S. and L. Kilian (2004). Bootstrapping Autoregressions with Conditional Heteroskedasticity of Unknown Form. Journal of Econometrics 123, 89–120.

Gronwald, M. (2012). Oil and the US Macroeconomy: A Reinvestigation using Rolling Impulse Responses. The Energy Journal 33(4), 143–160.

Hamilton, J. D. (1983). Oil and the Macroeconomy since World War II. Journal of Political Economy 91, 228–48.

Hamilton, J. D. (2008). Oil and the Macroeconomy. In S. N. Durlauf and L. E. Blume (Eds.), The New Palgrave Dictionary of Economics. Basingstoke: Palgrave Macmillan.

Hansen, B. E. (2001). The New Econometrics of Structural Change: Dating Breaks in U.S. Labor Productivity. The Journal of Economic Perspectives 15(4), 117–128.

Kass, R. E. and A. E. Raftery (1995). Bayes Factors. Journal of the American Statistical Association 90(430), 773–795.

Kilian, L. (2007). The Economic Effects of Energy Price Shocks. Technical report, C.E.P.R. Discussion Papers.

Kilian, L. (2009). Not All Oil Price Shocks are Alike: Disentangling Demand and Supply Shocks in the Crude Oil Market. American Economic Review 99(3), 1053–1069.

Kilian, L. and C. Park (2007). The Impact of Oil Price Shocks on the U.S. Stock Market. Technical report, C.E.P.R. Discussion Papers.

Kim, C.-J. and C. R. Nelson (1999a). Has the U.S. Economy Become More Stable? A Bayesian Approach Based on a Markov-Switching Model of the Business Cycle. Review of Economics and Statistics 81(4), 608–616.

Kim, C.-J. and C. R. Nelson (1999b). State-Space Models with Regime Switching: Classical and Gibbs-Sampling Approaches with Applications. MIT Press.

Koop, G. and D. Korobilis (2010). Bayesian Multivariate Time Series Methods for Empirical Macroeconomics. Foundations and Trends in Econometrics 3(4), 267–358.



Koop, G., M. Pesaran, and S. M. Potter (1996). Impulse Response Analysis in Nonlinear Multivariate Models. Journal of Econometrics 74(1), 119–147.

Krolzig, H. (1997). Markov-Switching Vector Autoregressions: Modelling, Statistical Inference, and Application to Business Cycle Analysis. Springer Verlag.

Krolzig, H. (2006). Impulse Response Analysis in Markov Switching Vector Autoregressive Models. In Economics Department, University of Kent. Keynes College.

Litterman, R. B. (1986). Forecasting with Bayesian Vector Autoregressions: Five Years of Experience. Journal of Business & Economic Statistics 4(1), 25–38.

Lütkepohl, H. (2008). Impulse Response Function. In S. N. Durlauf and L. E. Blume (Eds.), The New Palgrave Dictionary of Economics. Basingstoke: Palgrave Macmillan.

Lütkepohl, H. and M. Krätzig (2004). Applied Time Series Econometrics. Cambridge University Press.

Nakov, A. and A. Pescatori (2007). Inflation-Output Gap Trade-Off with a Dominant Oil Supplier. Banco de España Working Papers 0723, Banco de España.

Nakov, A. and A. Pescatori (2010). Oil and the Great Moderation. The Economic Journal 120(543), 131–156.

Ni, S., D. Sun, and X. Sun (2007). Intrinsic Bayesian Estimation of Vector Autoregression Impulse Responses. Journal of Business and Economic Statistics 25(2), 163–176.

Qu, Z. and P. Perron (2007). Estimating and Testing Structural Changes in Multivariate Regressions. Econometrica 75(2), 459–502.

Quandt, R. (1960). Tests of the Hypothesis that a Linear Regression System Obeys Two Separate Regimes. Journal of the American Statistical Association, 324–330.

Robertson, J. C. and E. W. Tallman (1999). Vector Autoregressions: Forecasting and Reality. Federal Reserve Bank of Atlanta Economic Review (First Quarter), 4–18.

Sims, C. (1980). Macroeconomics and Reality. Econometrica: Journal of the Econometric Society 48(1), 1–48.




Smith, A., P. A. Naik, and C. Tsai (2006). Markov-Switching Model Selection using Kullback-Leibler Divergence. Journal of Econometrics 134(2), 553–577.

Waggoner, D. and T. Zha (2003). A Gibbs Sampler for Structural Vector Autoregressions. Journal of Economic Dynamics and Control 28(2), 349–366.



Chapter 3

Bayesian Testing of Granger Causality in Markov-Switching VARs

with Tomasz Wozniak

Abstract. Recent economic developments have shown the importance of spillover and contagion effects in financial markets as well as in macroeconomic reality. Such effects are not limited to relations between the levels of variables but also impact on the volatility and the distributions. We propose a method of testing restrictions for Granger noncausality on all these levels in the framework of Markov-switching Vector Autoregressive Models. The conditions for Granger noncausality for these models were derived by Warne (2000). Due to the nonlinearity of the restrictions, classical tests have limited use. We, therefore, choose a Bayesian approach to testing. The computational tools for posterior inference consist of a novel Block Metropolis-Hastings sampling algorithm for estimation of the restricted models, and of standard methods of computing the Posterior Odds Ratio. The analysis may be applied to financial and macroeconomic time series with complicated properties, such as changes of parameter values over time and heteroskedasticity.

This paper was presented during the poster session of the 22nd EC2 Conference: Econometrics for Policy Analysis: after the Crisis and Beyond in Florence in December 2011. The authors thank Anders Warne, Helmut Lütkepohl, Massimiliano Marcellino, Peter R. Hansen and William Griffiths for their useful comments on the paper.


3.1 Introduction

The concept of Granger causality was introduced by Granger (1969) and Sims (1972). One variable does not Granger-cause some other variable if past and current information about the former cannot improve the forecast of the latter. Note that this concept refers to the forecasting of variables, in contrast to the causality concept based on ceteris paribus effects attributed to Rubin (1974) (for a comparison of the two concepts used in econometrics, see e.g. Lechner, 2011). Knowledge of Granger-causal relations allows a researcher to formulate an appropriate model and obtain a good forecast of values of interest. Even more importantly, a Granger-causal relation, once established, informs us that past observations of one variable have a significant effect on the forecast value of the other, delivering crucial information about the relations between economic variables.
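As a toy illustration of the definition, consider the simplest linear case, a bivariate VAR(1) without regime switching, where noncausality in mean reduces to a zero restriction on a lag coefficient. The coefficient values below are hypothetical, and the Markov-switching conditions of Warne (2000) discussed later are more involved than this sketch.

```python
import numpy as np

# Hypothetical VAR(1) lag matrix for (y1, y2).
A = np.array([[0.5, 0.0],   # zero in position (1, 2): past y2 does not help forecast y1
              [0.3, 0.4]])  # non-zero in position (2, 1): past y1 does help forecast y2

def one_step_forecast(A, y_prev):
    """Conditional mean of next period's vector given last period's observation."""
    return A @ y_prev

base = one_step_forecast(A, np.array([1.0, 1.0]))
moved_y2 = one_step_forecast(A, np.array([1.0, 5.0]))  # change only past y2
moved_y1 = one_step_forecast(A, np.array([5.0, 1.0]))  # change only past y1

# The forecast of y1 is unchanged when past y2 changes: y2 does not Granger-cause y1.
y1_forecast_ignores_y2 = bool(np.isclose(base[0], moved_y2[0]))
# The forecast of y2 moves when past y1 changes: y1 Granger-causes y2.
y2_forecast_uses_y1 = not bool(np.isclose(base[1], moved_y1[1]))
```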

The original Granger causality concept refers to forecasts of conditional means. There are, however, extensions referring to the forecasts of higher-order conditional moments or to distributions. We present and discuss these in Section 3.3. Again, the information that they deliver not only helps in producing good forecasts of the variables, but is also crucial for decision-making in economic and financial applications.

Among the time series models that have been analyzed for Granger causality of different types are: a family of Vector Autoregressive Moving Average (VARMA) models (see Boudjellaba et al., 1994, and references therein), the Logistic Smooth Transition Vector Autoregressive (LST-VAR) model (Christopoulos and Leon-Ledesma, 2008), and some models from the family of Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models (Comte and Lieberman, 2000; Wozniak, 2011; Wozniak, 2012). Finally, Warne (2000) derived conditions for different types of Granger noncausality for the Markov-switching VAR models on which we focus in this study. We present the model and its estimation in Section 3.2, while in Section 3.3 the definitions for different types of noncausality and restrictions on parameters are given. Note that all these works analyzed one-period-ahead Granger noncausality (see Lütkepohl, 1993; Lütkepohl and Burda, 1997; Dufour et al., 2006, for h-periods-ahead inference in VAR models).

The testing of the restrictions meets multiple problems. The most important limitation in the classical approach is that neither the asymptotic nor finite-sample distribution of the estimator has been derived so far. Consequently, the asymptotic distributions of the Wald, Likelihood Ratio and Lagrange Multiplier tests are not known. Further, the restrictions on the parameters derived by Warne (2000) may result in several sets of restrictions associated with one hypothesis. Therefore, a hypothesis of noncausality may be represented by several restricted models. Finally, some of the restrictions are nonlinear functions of parameters. All these features of the Granger noncausality analysis for Markov-switching VARs make classical testing of hypotheses difficult, if possible at all.

The contribution of this work is a Bayesian testing procedure that allows the testing of all the restrictions derived by Warne (2000) for different kinds of Granger noncausality, as well as for the inference on the hidden Markov process. None of the existing classical solutions to the problem of testing nonlinear restrictions on parameters that we describe in Section 3.4 is easily applicable to Markov-switching VAR models. The proposed approach consists of a Bayesian estimation of the unrestricted model, allowing for Granger causality, and of the restricted models, where the restrictions represent hypotheses of noncausality. For this purpose, we construct a novel Block Metropolis-Hastings sampling algorithm that allows for restricting the models. The algorithm is discussed in Section 3.4 and presented in Section 3.5. Having estimated the models, we compare competing hypotheses, represented by the unrestricted and the restricted models, with standard Bayesian methods using Posterior Odds Ratios and Bayes factors.

The main advantage of our approach is that we can test the nonlinear restrictions. The restrictions of all the considered types of noncausality may be tested. Thus, the analysis of causal relations between variables is profound and potentially informative. Other advantages follow from adopting Bayesian inference. First, the Posterior Odds Ratio method gives arguments in favour of the hypotheses, as posterior probabilities of the competing hypotheses are compared. In consequence, all the hypotheses are treated symmetrically. Finally, our estimation procedure combines and improves the existing algorithms restricting the models, but it also preserves the possibility of using different methods for computing the marginal density of data necessary to compute the Posterior Odds Ratio. We discuss further the benefits and costs of our approach at the end of Section 3.4.

As potential applications of the testing procedure, we indicate macroeconomic as well as financial time series. In particular, the recent financial turmoil and the following global recession are interesting periods for analysis. There exist many applied studies presenting evidence that these events have the nature of switching the regime. Taylor and Williams (2009), using the example of Libor-OIS and Libor-Repo spreads as an approximation for counterparty risk, show how the perception of risk by agents on the financial market differed, first, starting from August 2007 and then, even more, from October 2008.




Further, Diebold and Yilmaz (2009) show how different behaviors characterize return spillovers and volatility spillovers for stock exchange markets. These two studies clearly indicate that the financial data should be analyzed in terms of Granger causality with a model that allows for changes in regimes, such as a Markov-switching model.

For macroeconomic time series, the motivation for using Markov-switching models comes mainly from the business cycle analysis, as in Hamilton (1989). It is important to know whether variables have different impacts on other variables during the expansion and recession periods. Still, allowing for more than two states may permit a more detailed analysis of the interactions between variables within the cycles.

Psaradakis et al. (2005) used Markov-switching VAR models to analyze the so-called temporary Granger causality within the Money-Output system. They condition their causality analysis on realizations of the hidden Markov process. They proposed a restricted MS-VAR specification that assumed four states of the economy: 1. both variables cause each other; 2. money does not cause output; 3. output does not cause money; 4. none of the variables causes another. Our approach consists of choosing a Markov-switching VAR model specification which is best supported by the data, and then restricting it according to the restrictions derived by Warne (2000). This approach takes into account the two sources of relations between the variables: first, having a source in linear relations modeled with the VAR model, and second, taking into consideration the fact that all of the variables are used to forecast the future probabilities of the states. In the setting analyzed by Warne (2000), Granger noncausality is not conditioned on the past realizations of the hidden Markov process.

The remaining part of the paper is organized as follows. In Section 3.2 we present the model and the Bayesian estimation of the unrestricted model. The definitions for Granger noncausality, noncausality in variance and noncausality in distribution are presented in Section 3.3, together with parameter restrictions representing them. Section 3.4 presents a discussion and critique of classical methods of testing restrictions for Granger noncausality in different multivariate models. The discussion is followed by a proposed solution to the testing problem. First, the computation of the Posterior Odds Ratio is shown, and then the algorithm for estimating the restricted models is discussed. It is described in detail in Section 3.5. Section 3.6 gives an empirical illustration of the methodology, using the example of the money-income system of variables in the USA. The data support the hypothesis of Granger noncausality (in mean) from money to income, as well as the hypotheses of causality in variance and distribution. Section 3.7 concludes.



3.2 A Markov-Switching Vector Autoregressive Model

Model Let $y = (y_1, \dots, y_T)'$ denote a time series of $T$ observations, where each $y_t$ is an $N$-variate vector for $t \in \{1, \dots, T\}$, taking values in a sampling space $Y \subset \mathbb{R}^N$. $y$ is a realization of a stochastic process $\{Y_t\}_{t=1}^{T}$. We consider a class of parametric finite Markov mixture distribution models in which the stochastic process $Y_t$ depends on the realizations, $s_t$, of a hidden discrete stochastic process $S_t$ with finite state space $\{1, \dots, M\}$. Such a class of models has been introduced in time series analysis by Hamilton (1989). Conditioned on the state, $s_t$, and realizations of $y$ up to time $t-1$, $\mathbf{y}_{t-1}$, $y_t$ follows an independent identical normal distribution. A conditional mean process is a Vector Autoregression (VAR) model in which an intercept, $\mu_{s_t}$, as well as lag polynomial matrices, $A^{(i)}_{s_t}$, for $i = 1, \dots, p$, and covariance matrices, $\Sigma_{s_t}$, depend on the state $s_t = 1, \dots, M$:

$$ y_t = \mu_{s_t} + \sum_{i=1}^{p} A^{(i)}_{s_t} y_{t-i} + \varepsilon_t, \tag{3.1} $$

$$ \varepsilon_t \mid s_t \sim \text{i.i.}\mathcal{N}(0, \Sigma_{s_t}), \tag{3.2} $$

for $t = 1, \dots, T$. We set the vector of initial values $\mathbf{y}_0 = (y_{p-1}, \dots, y_0)'$ to the first $p$ observations of the available data.

$S_t$ is assumed to be an irreducible aperiodic Markov chain starting from its ergodic distribution $\pi = (\pi_1, \dots, \pi_M)$, such that $\Pr(S_0 = i \mid P) = \pi_i$. Its properties are sufficiently described by the $(M \times M)$ transition probabilities matrix:

$$
P = \begin{bmatrix}
p_{11} & p_{12} & \dots & p_{1M} \\
p_{21} & p_{22} & \dots & p_{2M} \\
\vdots & \vdots & \ddots & \vdots \\
p_{M1} & p_{M2} & \dots & p_{MM}
\end{bmatrix},
$$

in which an element, $p_{ij}$, denotes the probability of transition from state $i$ to state $j$, $p_{ij} = \Pr(s_{t+1} = j \mid s_t = i)$. The elements of each row of matrix $P$ sum to one, $\sum_{j=1}^{M} p_{ij} = 1$.
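The ergodic distribution $\pi$ of a given transition matrix can be computed numerically. The sketch below is only an illustration: the function name `ergodic_distribution` and the two-state matrix are assumptions made for the example, and states are coded $0, \dots, M-1$ rather than $1, \dots, M$. It solves $\pi' P = \pi'$ together with the adding-up constraint.

```python
import numpy as np

def ergodic_distribution(P):
    """Solve pi' P = pi' subject to sum(pi) = 1 for a row-stochastic matrix P."""
    P = np.asarray(P, dtype=float)
    M = P.shape[0]
    # Stack the stationarity equations (P' - I) pi = 0 with the constraint 1' pi = 1.
    A = np.vstack([P.T - np.eye(M), np.ones((1, M))])
    b = np.zeros(M + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Illustrative two-state chain with persistent states (values are made up).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = ergodic_distribution(P)
```

For an irreducible aperiodic chain the solution is unique, so the least-squares solve recovers it exactly up to floating-point error.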

Such a formulation of the model is called, according to the taxonomy of Krolzig (1997), MSIAH-VAR(p). Conditioned on the state $s_t$, it models a current vector of observations, $y_t$, with an intercept, $\mu_{s_t}$, and a linear function of its lagged values up to $p$ periods backwards. The linear relation is captured by matrices of the lag polynomial $A^{(i)}_{s_t}$, for $i = 1, \dots, p$. The parameters of the VAR process, as well as the covariance matrix $\Sigma_{s_t}$, change with time, $t$, according to the discrete-valued hidden Markov process, $s_t$. These changes in parameter values introduce nonlinear relationships between variables. Consequently, the inference about interactions between variables must consider the linear and nonlinear relations; this is the subject of the analysis in Section 3.3.
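To make the data-generating process in (3.1)-(3.2) concrete, the following sketch simulates an MSIAH-VAR(p). It is an illustration under stated assumptions, not the code used in this study: the function name, the array layout, the zero initial values and the arbitrary initial state (rather than a draw from the ergodic distribution) are all choices made for the example, and states are coded $0, \dots, M-1$.

```python
import numpy as np

def simulate_ms_var(mu, A, Sigma, P, T, rng):
    """Simulate T observations from an MSIAH-VAR(p).

    mu:    (M, N) state-dependent intercepts
    A:     (M, p, N, N) state-dependent lag matrices
    Sigma: (M, N, N) state-dependent covariance matrices
    P:     (M, M) row-stochastic transition matrix
    """
    M, p, N, _ = A.shape
    states = np.empty(T, dtype=int)
    y = np.zeros((T + p, N))            # first p rows: zero initial values (assumption)
    s = rng.integers(M)                 # arbitrary initial state (assumption)
    for t in range(T):
        s = rng.choice(M, p=P[s])       # s_t given s_{t-1} is drawn from row s_{t-1} of P
        states[t] = s
        mean = mu[s].copy()
        for i in range(1, p + 1):
            mean += A[s, i - 1] @ y[p + t - i]
        y[p + t] = rng.multivariate_normal(mean, Sigma[s])
    return y[p:], states

rng = np.random.default_rng(0)
mu = np.array([[0.0, 0.0], [1.0, -1.0]])
A = np.array([[[[0.5, 0.0], [0.0, 0.5]]],
              [[[0.2, 0.1], [0.0, 0.3]]]])      # shape (M=2, p=1, N=2, N=2)
Sigma = np.array([np.eye(2), 4.0 * np.eye(2)])
P = np.array([[0.9, 0.1], [0.2, 0.8]])
y, states = simulate_ms_var(mu, A, Sigma, P, T=200, rng=rng)
```

All parameter values above are made up; they only show how intercepts, autoregressive matrices and covariances switch jointly with the hidden state.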

Complete-data likelihood function Let $\theta \in \Theta \subset \mathbb{R}^k$ be a vector of size $k$, collecting parameters of the transition probabilities matrix $P$ and all the state-dependent parameters of the VAR process, $\theta_{s_t}$: $\mu_{s_t}$, $A^{(i)}_{s_t}$, $\Sigma_{s_t}$, for $s_t = 1, \dots, M$ and $i = 1, \dots, p$. As stated by Fruhwirth-Schnatter (2006), the complete-data likelihood function is equal to the joint sampling distribution $p(S, y \mid \theta)$ for the complete data $(S, y)$ given $\theta$, where $S = (s_1, \dots, s_T)'$. This distribution is now considered to be a function of $\theta$ for the purpose of estimating the unknown parameter vector $\theta$. It is further decomposed into a product of a conditional distribution of $y$ given $S$ and $\theta$, and a conditional distribution of $S$ given $\theta$:

$$ p(S, y \mid \theta) = p(y \mid S, \theta)\, p(S \mid \theta). \tag{3.3} $$

The former is assumed to be a conditional normal distribution function of $\varepsilon_t$, for $t = 1, \dots, T$, given the states, $s_t$, with the mean equal to a vector of zeros and $\Sigma_{s_t}$ as the covariance matrix:

$$ p(y \mid S, \theta) = \prod_{t=1}^{T} p(y_t \mid S, \mathbf{y}_{t-1}, \theta) = \prod_{t=1}^{T} (2\pi)^{-N/2} |\Sigma_{s_t}|^{-1/2} \exp\left\{ -\frac{1}{2} \varepsilon_t' \Sigma_{s_t}^{-1} \varepsilon_t \right\}. \tag{3.4} $$

The form of the latter comes from the assumptions about the Markov process and is given by:

$$ p(S \mid \theta) = p(s_0 \mid P) \prod_{i=1}^{M} \prod_{j=1}^{M} p_{ij}^{N_{ij}(S)}, \tag{3.5} $$

where $N_{ij}(S) = \#\{s_{t-1} = i, s_t = j\}$ is the number of transitions from state $i$ to state $j$, $\forall i, j \in \{1, \dots, M\}$.
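The two factors of the complete-data likelihood can be evaluated directly once the states are treated as known. The sketch below is a minimal illustration for a VAR with one lag; the function names, the convention that `S[0]` holds $s_0$ and `y[0]` holds the initial value, and the coding of states as $0, \dots, M-1$ are assumptions made for the example.

```python
import numpy as np

def transition_counts(S, M):
    """counts[i, j] = number of t with s_{t-1} = i and s_t = j."""
    counts = np.zeros((M, M), dtype=int)
    np.add.at(counts, (S[:-1], S[1:]), 1)
    return counts

def log_p_states(S, P, pi0):
    """log p(S | theta) = log p(s_0 | P) + sum_ij N_ij(S) log p_ij, as in (3.5)."""
    counts = transition_counts(S, P.shape[0])
    visited = counts > 0                      # skip transitions that never occur
    return np.log(pi0[S[0]]) + np.sum(counts[visited] * np.log(P[visited]))

def log_p_data(y, S, mu, A, Sigma):
    """log p(y | S, theta) as in (3.4) for p = 1; y[0] is the initial value."""
    T, N = y.shape[0] - 1, y.shape[1]
    total = 0.0
    for t in range(1, T + 1):
        s = S[t]
        eps = y[t] - mu[s] - A[s] @ y[t - 1]  # residual of the state-s VAR equation
        _, logdet = np.linalg.slogdet(Sigma[s])
        quad = eps @ np.linalg.solve(Sigma[s], eps)
        total += -0.5 * (N * np.log(2.0 * np.pi) + logdet + quad)
    return total

# Made-up state path (s_0, ..., s_4) and transition matrix.
S = np.array([0, 0, 1, 1, 0])
P = np.array([[0.9, 0.1], [0.2, 0.8]])
pi0 = np.array([2.0 / 3.0, 1.0 / 3.0])        # ergodic distribution of this P
counts = transition_counts(S, 2)
lp_states = log_p_states(S, P, pi0)
```

Working on the log scale avoids numerical underflow of the product over $t$, which matters for realistic sample sizes.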

A convenient form of the complete-data likelihood function (3.3) results from representing it as a product of $M + 1$ factors. The first $M$ factors depend on the state-specific parameters, $\theta_{s_t}$, and the remaining one depends on the transition probabilities matrix, $P$:

$$ p(y, S \mid \theta) = \prod_{i=1}^{M} \left( \prod_{t: s_t = i} p(y_t \mid \mathbf{y}_{t-1}, \theta_i) \right) \prod_{i=1}^{M} \prod_{j=1}^{M} p_{ij}^{N_{ij}(S)}\, p(s_0 \mid P). \tag{3.6} $$

Classical estimation of the model consists of the maximization of the likelihood function with e.g. the EM algorithm (see Krolzig, 1997; Kim and Nelson, 1999b). For the purpose of testing Granger-causal relations between variables, we propose, however, the Bayesian inference, which is based on the posterior distribution of the model parameters $\theta$. (For details of a standard Bayesian estimation and inference on Markov-switching models, the reader is referred to Fruhwirth-Schnatter, 2006). The complete-data posterior distribution is proportional to the product of the complete-data likelihood function (3.6) and the prior distribution:

p(θ|y,S) ∝ p(y,S|θ)p(θ). (3.7)

Prior distribution The convenient factorization of the likelihood function (3.6) is maintained by the choice of the prior distribution in the following form:

$$ p(\theta) = \prod_{i=1}^{M} p(\theta_i)\, p(P_{i\cdot}). \tag{3.8} $$

The independence of the prior distribution of the state-specific parameters for each state and the transition probabilities matrix is assumed. This makes it possible to incorporate prior knowledge of the researcher about the state-specific parameters of the model, $\theta_{s_t}$, separately for each state.

For the unrestricted MSIAH-VAR(p) model, we assume the following prior specification. Each row of the transition probabilities matrix, $P$, a priori follows an $M$-variate Dirichlet distribution, with parameters set to 1 for all the transition probabilities except the diagonal elements $p_{ii}$, for $i = 1, \dots, M$, for which it is set to 10. Therefore, we assume that the states of an economy are persistent over time (see e.g. Kim and Nelson, 1999a). Further, the state-dependent parameters of the VAR process are collected in vectors $\beta_{s_t} = (\mu_{s_t}', \mathrm{vec}(A^{(1)}_{s_t})', \dots, \mathrm{vec}(A^{(p)}_{s_t})')'$, for $s_t = 1, \dots, M$. These parameters follow a $(N + pN^2)$-variate Normal distribution, with mean equal to a vector of zeros and a diagonal covariance matrix with 100s on the diagonal. Note that the means of the prior distribution for the off-diagonal elements of matrices $A_{s_t}$ are set to zero. If we condition our analysis on the states, this would mean that we assume a priori the Granger noncausality hypothesis. However, in Section 3.3 we show that, when the states are unknown, the inference about Granger noncausality involves many other parameters of the model. Moreover, huge values of the variances of the prior distribution are assumed. Consequently, no values from the interior of the parameters space are, in fact, discriminated a priori.

We model the state-dependent covariance matrices of the MSIAH-VAR process, decomposing each into an $N \times 1$ vector of standard deviations, $\sigma_{s_t}$, and an $N \times N$ correlation matrix, $R_{s_t}$, according to the decomposition:

$$ \Sigma_{s_t} = \mathrm{diag}(\sigma_{s_t})\, R_{s_t}\, \mathrm{diag}(\sigma_{s_t}). $$

Modeling covariance matrices using such a decomposition was proposed in Bayesian inference by Barnard et al. (2000). We adapt this approach to Markov-switching models, since the algorithm easily enables the imposing of restrictions on the covariance matrix (see the details of the Gibbs sampling algorithm for the unrestricted and the restricted models in Section 3.5). We model the unrestricted model in the same manner, because we want to keep the prior distributions for the unrestricted and the restricted models comparable. Thus, each standard deviation $\sigma_{s_t.j}$, for $s_t = 1, \dots, M$ and $j = 1, \dots, N$, follows a log-Normal distribution, with a mean parameter equal to 0 and the standard deviation parameter set to 2.
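The decomposition is easy to verify numerically. The sketch below, with hypothetical function names and made-up numbers, builds a covariance matrix from standard deviations and a correlation matrix and then recovers both parts; it also shows why the parameterization is convenient for restrictions, since a zero correlation immediately yields a zero covariance.

```python
import numpy as np

def covariance_from_sd_corr(sigma, R):
    """Sigma = diag(sigma) R diag(sigma)."""
    D = np.diag(sigma)
    return D @ R @ D

def sd_corr_from_covariance(Sigma):
    """Invert the decomposition: standard deviations and correlation matrix."""
    sigma = np.sqrt(np.diag(Sigma))
    R = Sigma / np.outer(sigma, sigma)
    return sigma, R

sigma = np.array([2.0, 0.5])
R = np.array([[1.0, 0.3],
              [0.3, 1.0]])
Sigma = covariance_from_sd_corr(sigma, R)

# Restricting the correlation to zero leaves the variances untouched.
Sigma_restricted = covariance_from_sd_corr(sigma, np.eye(2))
```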

Finally, we assume that the prior distribution of each correlation coefficient $R_{s_t.jk}$ is uniform on the interval $(a, b)$. The bounds $a$ and $b$ are set such that sampling individual correlations one by one results in a positive definite correlation matrix, $R_{s_t}$. For the implications of such a prior specification for the matrix of correlations and for the algorithm for setting values $a$ and $b$, the reader is referred to the original paper of Barnard et al. (2000).

To summarize, the prior specification (3.8) now takes the detailed form of:

$$ p(\theta) = \prod_{i=1}^{M} p(P_{i\cdot})\, p(\beta_i)\, p(R_i) \left( \prod_{j=1}^{N} p(\sigma_{i.j}) \right), \tag{3.9} $$

where each of the prior distributions is as assumed:

$$ P_{i\cdot} \sim \mathcal{D}_M(\imath_M' + 9 I_{M.i\cdot}) $$

$$ \beta_i \sim \mathcal{N}(0, 100 I_{N+pN^2}) $$

$$ \sigma_{i.j} \sim \log\mathcal{N}(0, 2) $$

$$ R_{i.jk} \propto \mathcal{U}(a, b) $$

for $i = 1, \dots, M$ and $j, k = 1, \dots, N$, where $\imath_M$ is an $M \times 1$ vector of ones and $I_{M.i\cdot}$ is the $i$th row of an identity matrix $I_M$.
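A draw from the first three components of this prior can be sketched as follows. The function name and array layout are assumptions made for the illustration, and the correlation matrices are deliberately left out because their bounds $(a, b)$ require the algorithm of Barnard et al. (2000), which is not reproduced here.

```python
import numpy as np

def draw_from_prior(M, N, p, rng):
    """Draw (P, beta, sigma) from the prior; correlations R_i are omitted."""
    # Rows of P: Dirichlet with parameter 1 off the diagonal and 10 on it.
    alpha = np.ones((M, M)) + 9.0 * np.eye(M)
    P = np.vstack([rng.dirichlet(alpha[i]) for i in range(M)])
    # beta_i ~ N(0, 100 I): variance 100, hence standard deviation 10.
    k = N + p * N**2
    beta = rng.normal(0.0, 10.0, size=(M, k))
    # sigma_{i.j} ~ logN(0, 2): log(sigma) is Normal with mean 0 and sd 2.
    sigma = rng.lognormal(mean=0.0, sigma=2.0, size=(M, N))
    return P, beta, sigma

rng = np.random.default_rng(42)
P, beta, sigma = draw_from_prior(M=2, N=2, p=1, rng=rng)
```

With $M = 2$, the Dirichlet parameters $(10, 1)$ imply a prior mean of $10/11$ for each diagonal transition probability, which is how the persistence assumption enters.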

Posterior distribution The structure of the likelihood function (3.6) and the prior distribution (3.9) have an effect on the form of the posterior distribution that is proportional to the product of the two densities. The form of the posterior distribution (3.7), resulting from the assumed specification, is as follows:

$$ p(\theta \mid y, S) \propto \prod_{i=1}^{M} p(\theta_i \mid y, S)\, p(P \mid y, S). \tag{3.10} $$

It is now easily decomposed into a posterior density of the transition probabilities matrix:

$$ p(P \mid S) \propto p(s_0 \mid P) \prod_{i=1}^{M} \prod_{j=1}^{M} p_{ij}^{N_{ij}(S)}\, p(P), \tag{3.11} $$

and the posterior density of the state-dependent parameters:

$$ p(\theta_i \mid y, S) \propto \prod_{t: s_t = i} p(y_t \mid \theta_i, \mathbf{y}_{t-1})\, p(\theta_i). \tag{3.12} $$

Since the form of the posterior density for all the parameters is not standard, the commonly used strategy is to simulate the posterior distribution with numerical methods. A Markov Chain Monte Carlo (MCMC) algorithm, the Gibbs sampler (see Casella and George, 1992, and references therein), enables us to simulate the joint posterior distribution of all the parameters of the model by sampling from the full conditional distributions. Such an algorithm has also been adapted to Markov-switching models by Albert and Chib (1993) and McCulloch and Tsay (1994). However, the model specification considered in this study results in full conditional distributions that are not in the form of any standard distribution functions. Therefore, the algorithm that samples from such full conditional distributions belongs to a broader class of Block Metropolis-Hastings algorithms. The algorithm is presented in detail in Section 3.5.
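One reason the full conditional of $P$ in (3.11) is nonstandard is the factor $p(s_0 \mid P)$, which depends on $P$ through the ergodic distribution. A common way to handle this, sketched below, is an independence Metropolis-Hastings step whose proposal is the Dirichlet distribution that would be exact without that factor. This is an illustrative assumption, not the algorithm of Section 3.5; the function names and the toy inputs are hypothetical, and states are coded $0, \dots, M-1$ with `S[0]` holding $s_0$.

```python
import numpy as np

def ergodic(P):
    """Ergodic distribution of a row-stochastic matrix P."""
    M = P.shape[0]
    A = np.vstack([P.T - np.eye(M), np.ones((1, M))])
    b = np.append(np.zeros(M), 1.0)
    return np.linalg.lstsq(A, b, rcond=None)[0]

def mh_step_transition_matrix(P_old, S, alpha, rng):
    """One independence Metropolis-Hastings update of P given the states S.

    alpha is the (M, M) matrix of Dirichlet prior parameters.
    """
    M = P_old.shape[0]
    counts = np.zeros((M, M))
    np.add.at(counts, (S[:-1], S[1:]), 1)
    # Proposal: row-wise Dirichlet(prior + transition counts).
    P_new = np.vstack([rng.dirichlet(alpha[i] + counts[i]) for i in range(M)])
    # Everything in (3.11) cancels against the proposal except p(s_0 | P).
    ratio = ergodic(P_new)[S[0]] / ergodic(P_old)[S[0]]
    if rng.uniform() < min(1.0, ratio):
        return P_new, True
    return P_old, False

rng = np.random.default_rng(1)
alpha = np.ones((2, 2)) + 9.0 * np.eye(2)
S = np.array([0, 0, 0, 1, 1, 0, 0, 1, 1, 1])
P0 = np.full((2, 2), 0.5)
P1, accepted = mh_step_transition_matrix(P0, S, alpha, rng)
```

Because the proposal already matches the target up to the initial-state factor, the acceptance ratio reduces to a ratio of two ergodic probabilities and acceptance rates are typically high.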

3.3 Granger Causality - Following Warne (2000)

Notation Let $\{y_t : t \in \mathbb{Z}\}$ be an $N \times 1$ multivariate square integrable stochastic process on the integers $\mathbb{Z}$. Write:

$$ y_t = (y_{1t}', y_{2t}', y_{3t}', y_{4t}')', \tag{3.13} $$

for $t = 1, \dots, T$, where $y_{it}$ is an $N_i \times 1$ vector such that $y_{1t} = (y_{1.t}, \dots, y_{N_1.t})'$, $y_{2t} = (y_{N_1+1.t}, \dots, y_{N_1+N_2.t})'$, $y_{3t} = (y_{N_1+N_2+1.t}, \dots, y_{N_1+N_2+N_3.t})'$, and $y_{4t} = (y_{N_1+N_2+N_3+1.t}, \dots, y_{N_1+N_2+N_3+N_4.t})'$ ($N_1, N_4 \geq 1$, $N_2, N_3 \geq 0$ and $N_1 + N_2 + N_3 + N_4 = N$). Variables of interest are contained in vectors $y_1$ and $y_4$, between which we want to study causal relations. Vectors $y_2$ and $y_3$ (that for $N_2 = 0$ and $N_3 = 0$ are empty) contain auxiliary variables that are also used for forecasting and modeling purposes. Finally, define two vectors: first, the $(N_1 + N_2)$-dimensional $v_{1t} = (y_{1t}', y_{2t}')'$, and second, the $(N_3 + N_4)$-dimensional $v_{2t} = (y_{3t}', y_{4t}')'$, such that:

$$ y_t = \begin{bmatrix} v_{1t} \\ v_{2t} \end{bmatrix}. $$

Suppose that there exists a proper probability density function $f_t(y_{t+1} \mid y_t; \theta)$ for each $t \in \{1, 2, \dots, T\}$. Suppose that the conditional mean $E[y_{t+1} \mid y_t]$ is finite and that the conditional covariance matrix:

$$ E\left[ (y_{t+1} - E[y_{t+1} \mid y_t])(y_{t+1} - E[y_{t+1} \mid y_t])' \mid y_t \right] $$

is positive definite for all finite $t$. Further, let $u_{t+1}$ denote the 1-step ahead forecast error for $y_{t+1}$, conditional on $y_t$ when the predictor is given by the conditional expectation, i.e.:

$$ u_{t+1} = y_{t+1} - E[y_{t+1} \mid y_t]. \tag{3.14} $$

By construction, $u_{t+1}$ has conditional mean zero and positive-definite conditional covariance matrix. And let $\tilde{u}_{t+1} = y_{t+1} - E[y_{t+1} \mid v_{1t}, y_{3t}]$ be the 1-step ahead forecast error for $y_{t+1}$, conditional on $v_{1t}$ and $y_{3t}$, with analogous properties.



Definitions We focus on the Granger-causal relations between variables $y_1$ and $y_4$. The first definition of Granger causality, originally given by Granger (1969), states simply that $y_4$ is not causal for $y_1$ when the past and current information about $y_{4.t}$ cannot improve the mean square forecast error of $y_{1.t+1}$.

Definition 1. $y_4$ does not Granger-cause $y_1$, denoted by $y_4 \overset{G}{\nrightarrow} y_1$, if and only if:

$$ E\left[ u_{t+1}^2 \right] = E\left[ \tilde{u}_{t+1}^2 \right] < \infty \quad \forall t = 1, \dots, T. \tag{3.15} $$

This definition refers to the conditional mean process, and holds if and only if the two means conditioned on the full set of variables, $y_t$, and on the restricted set, $(v_{1t}, y_{3t})$, are the same (see Boudjellaba et al., 1992). It is argued, however, that this definition cannot give a full insight into relations between variables under changing economic circumstances: if the series is heteroskedastic, then it is useful to refer to a different concept of causality, namely Granger causality in variance, introduced by Robins et al. (1986). It states the noncausality condition for conditional second-order moments of the series. Note that this definition states noncausality in conditional covariance as well as in conditional mean processes. Therefore, this condition is stricter than (3.15).

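Definition 2. $y_4$ does not Granger-cause in variance $y_1$, denoted by $y_4 \overset{V}{\nrightarrow} y_1$, if and only if:

$$ E\left[ u_{t+1}^2 \mid y_t \right] = E\left[ \tilde{u}_{t+1}^2 \mid v_{1t}, y_{3t} \right] < \infty \quad \forall t. \tag{3.16} $$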

Finally, we define the third concept of Granger noncausality, Granger noncausality in distribution.

Definition 3. $y_4$ does not Granger-cause in distribution $y_1$, denoted by $y_4 \overset{D}{\nrightarrow} y_1$, if and only if:

$$ g_{t+1}\left( u_{t+1}^2 \mid y_t, \theta \right) = h_{t+1}\left( \tilde{u}_{t+1}^2 \mid v_{1t}, y_{3t}, \theta \right) \quad \forall t, \tag{3.17} $$

where $g_{t+1}$ and $h_{t+1}$ are probability distribution functions with properties as for $f_{t+1}$.

All the definitions are given in the form following Warne (2000). Note that the definition of Granger noncausality in variance (3.16) is stricter than the definition of Granger noncausality (3.15); Granger noncausality in variance implies Granger noncausality. The definition of Granger noncausality in distribution (3.17) is stated for conditional distributions. It applies also to those distributions whose moments are undefined. All three definitions are, however, equivalent in linear Gaussian models.




Comte and Lieberman (2000) introduce a new definition of second-order Granger noncausality and distinguish it from Granger noncausality in variance of Robins et al. (1986). For the second-order noncausality, if there exists Granger causality (in mean), then it needs to be modeled and filtered out; only then may the causal relations in conditional second moments be established. The definition of noncausality in variance assumes Granger noncausality (in mean) and second-order noncausality, and therefore is stricter than second-order noncausality. In effect, once Granger noncausality is established, the two definitions, noncausality in variance and second-order noncausality, are equivalent. The consequences of testing these different concepts are presented in Wozniak (2011).

MSIAH-VARs for Granger causality testing We now present the parameter restrictions for different definitions of Granger noncausality for Markov-switching vector autoregressions. Before that, however, we introduce a more convenient formulation of the model specified in Section 3.2. Firstly, we use the decomposition of the vector of observations into two sub-vectors, $y_t = (v_{1t}', v_{2t}')'$, and the appropriate decomposition of the parameter matrices, $\mu_{s_t}$, $A^{(i)}_{s_t}$, and vector of residuals, $\varepsilon_t$, which has the covariance matrix specified in (3.19). Also, the hidden Markov process is decomposed for the purpose of setting the Granger causality relations into two sub-processes, $s_t = (s_{1t}, s_{2t})$. The sub-processes have $M_1$ and $M_2$ states that are characterized by transition probability matrices, $P^{(1)}$ and $P^{(2)}$ (and ergodic probabilities, $\pi^{(1)}$ and $\pi^{(2)}$) respectively, such that $M = M_1 \cdot M_2$. The construction of the transition probabilities matrix, $P$, is not specified for the moment and will be the subject of further analysis. Parameters of the equation for $v_{1t}$ change in time with the Markov process $s_{1t}$, whereas the parameters of the equation for $v_{2t}$ change with process $s_{2t}$:

$$
\begin{bmatrix} v_{1t} \\ v_{2t} \end{bmatrix} =
\begin{bmatrix} \mu_{1.s_{1t}} \\ \mu_{2.s_{2t}} \end{bmatrix} +
\sum_{i=1}^{p}
\begin{bmatrix} A^{(i)}_{11.s_{1t}} & A^{(i)}_{12.s_{1t}} \\ A^{(i)}_{21.s_{2t}} & A^{(i)}_{22.s_{2t}} \end{bmatrix}
\begin{bmatrix} v_{1t-i} \\ v_{2t-i} \end{bmatrix} +
\begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix}. \tag{3.18}
$$

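The residual term in (3.18) has zero conditional mean and conditional covariance matrix decomposed into sub-matrices as on the left-hand side of (3.19):

$$
\mathrm{Var}\left( \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix} \right) =
\begin{bmatrix} \Sigma_{11.s_{1t}} & \Sigma_{21.s_t}' \\ \Sigma_{21.s_t} & \Sigma_{22.s_{2t}} \end{bmatrix},
\qquad
\mathrm{Var}\left( \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \varepsilon_{3t} \\ \varepsilon_{4t} \end{bmatrix} \right) =
\begin{bmatrix}
\Omega_{11.s_{1t}} & \Omega_{21.s_t}' & \Omega_{31.s_t}' & \Omega_{41.s_t}' \\
\Omega_{21.s_t} & \Omega_{22.s_{1t}} & \Omega_{32.s_t}' & \Omega_{42.s_t}' \\
\Omega_{31.s_t} & \Omega_{32.s_t} & \Omega_{33.s_{2t}} & \Omega_{43.s_t}' \\
\Omega_{41.s_t} & \Omega_{42.s_t} & \Omega_{43.s_t} & \Omega_{44.s_{2t}}
\end{bmatrix}, \tag{3.19}
$$

where covariance matrices may be decomposed respectively into:

$$ \Sigma_{ij.s_t} = \mathrm{diag}(\sigma_{i.s_t})\, R_{ij.s_t}\, \mathrm{diag}(\sigma_{j.s_t}), \qquad \Omega_{ij.s_t} = \mathrm{diag}(\omega_{i.s_t})\, R_{ij.s_t}\, \mathrm{diag}(\omega_{j.s_t}). \tag{3.20} $$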

We further decompose vectors of observations, $v_{1t} = (y_{1t}', y_{2t}')'$ and $v_{2t} = (y_{3t}', y_{4t}')'$, and matrices of model parameters, with the covariance matrix of the residual term specified on the right-hand side of (3.19). The decomposition of the Markov process is maintained, as in (3.18):

$$
\begin{bmatrix} y_{1t} \\ y_{2t} \\ y_{3t} \\ y_{4t} \end{bmatrix} =
\begin{bmatrix} m_{1.s_{1t}} \\ m_{2.s_{1t}} \\ m_{3.s_{2t}} \\ m_{4.s_{2t}} \end{bmatrix} +
\sum_{i=1}^{p}
\begin{bmatrix}
a^{(i)}_{11.s_{1t}} & a^{(i)}_{12.s_{1t}} & a^{(i)}_{13.s_{1t}} & a^{(i)}_{14.s_{1t}} \\
a^{(i)}_{21.s_{1t}} & a^{(i)}_{22.s_{1t}} & a^{(i)}_{23.s_{1t}} & a^{(i)}_{24.s_{1t}} \\
a^{(i)}_{31.s_{2t}} & a^{(i)}_{32.s_{2t}} & a^{(i)}_{33.s_{2t}} & a^{(i)}_{34.s_{2t}} \\
a^{(i)}_{41.s_{2t}} & a^{(i)}_{42.s_{2t}} & a^{(i)}_{43.s_{2t}} & a^{(i)}_{44.s_{2t}}
\end{bmatrix}
\begin{bmatrix} y_{1t-i} \\ y_{2t-i} \\ y_{3t-i} \\ y_{4t-i} \end{bmatrix} +
\begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \varepsilon_{3t} \\ \varepsilon_{4t} \end{bmatrix}. \tag{3.21}
$$

Parameter restrictions The parameter restrictions for Markov-switching vector autoregressions for the three definitions of Granger noncausality presented in this section have been derived by Warne (2000). Firstly, we present the restrictions that are specific to Markov-switching models. Restriction 1 states the relations between the two Markov processes, $s_{1t}$ and $s_{2t}$.

Restriction 1. The regime forecast of $s_{1.t+1}$ is independent, and there is no information in $v_{2t}$ for predicting $s_{1.t+1}$, i.e.:

$$ \Pr\left[ (s_{1.t+1}, s_{2.t+1}) = (j_1, j_2) \mid y_t, \theta \right] = \Pr\left[ s_{1.t+1} = j_1 \mid v_{1t}, \theta \right] \cdot \Pr\left[ s_{2.t+1} = j_2 \mid v_{2t}, \theta \right], $$

for all $j_1 = 1, \dots, M_1$ with $M_1 \geq 2$, $j_2 = 1, \dots, M_2$ and $t = 1, \dots, T$, if and only if either:

(A1): (i) $P = (P^{(1)} \otimes P^{(2)})$,

(ii) $\mu_{i.s_t} = \mu_{i.s_{i.t}}$,

(iii) $A^{(k)}_{ij.s_t} = A^{(k)}_{ij.s_{i.t}}$,

(iv) $\Sigma_{ii.s_t} = \Sigma_{ii.s_{i.t}}$ and

(v) $\Sigma_{12.s_t} = 0$

for all $i, j \in \{1, 2\}$, $k \in \{1, \dots, p\}$ and $s_{1.t} \in \{1, \dots, M_1\}$, and

(vi) $A^{(k)}_{12.s_{1.t}} = 0$ for all $k \in \{1, \dots, p\}$ and $s_{1.t} \in \{1, \dots, M_1\}$; or

(A2): $P = (\imath_{M_1} \pi^{(1)\prime} \otimes P^{(2)})$,

is satisfied.

Note that if we change the restriction (A1)(vi) into $A^{(k)}_{21.s_{2.t}} = 0$, then there is no information in $v_{1t}$ for predicting $s_{2.t+1}$.

Restriction (A1)(i) gives the condition for independence of the transition probabilities. Restrictions (A1)(ii)-(A1)(iv) state simply that the parameters of the equation for $v_{1t}$ change only according to the process $s_{1t}$, and the parameters of the equation for $v_{2t}$ change only according to the process $s_{2t}$. Consequently, the decomposition of the hidden Markov process $s_t$ into two independent subprocesses $(s_{1t}, s_{2t})$ is fully respected. Further, restriction (A1)(v) states the instantaneous noncausality between the two vectors of variables, $v_{1t}$ and $v_{2t}$, defined as a zero correlation condition. Finally, restriction (A1)(vi) states the Granger noncausality condition for the VAR process. According to condition (A2), all the states of process $s_{1t}$ have the same probability of appearance for all $t$, equal to the ergodic probability, $\pi^{(1)}$, which is a condition for $s_{1t}$ to be an independent hidden Markov chain.
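The two constructions of $P$ in (A1)(i) and (A2) can be compared numerically. The sketch below uses made-up two-state sub-chains and hypothetical helper names; it assumes the joint state is ordered as $(s_1, s_2)$ with $s_2$ varying fastest, which is the ordering implied by the Kronecker product, and codes states from 0.

```python
import numpy as np

def ergodic(P):
    """Ergodic distribution of a row-stochastic matrix P."""
    M = P.shape[0]
    A = np.vstack([P.T - np.eye(M), np.ones((1, M))])
    b = np.append(np.zeros(M), 1.0)
    return np.linalg.lstsq(A, b, rcond=None)[0]

def marginal_s1(P, M1, M2):
    """Row i: distribution of next-period s1 given the current joint state i."""
    return P.reshape(M1 * M2, M1, M2).sum(axis=2)

P1 = np.array([[0.9, 0.1], [0.3, 0.7]])
P2 = np.array([[0.8, 0.2], [0.4, 0.6]])
pi1 = ergodic(P1)

# (A1)(i): two independent sub-chains.
P_A1 = np.kron(P1, P2)
# (A2): s1 is drawn from its ergodic distribution regardless of the current state.
P_A2 = np.kron(np.outer(np.ones(2), pi1), P2)

pred_A1 = marginal_s1(P_A1, 2, 2)
pred_A2 = marginal_s1(P_A2, 2, 2)
```

Under (A1)(i) the forecast of $s_{1.t+1}$ depends on the current joint state only through $s_{1t}$, while under (A2) every row of `pred_A2` equals $\pi^{(1)}$, so the current state carries no information about $s_{1.t+1}$ at all.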

Before we go on to the conditions for different types of Granger noncausality, we define the conditional expected values of the parameters of the VAR process for the one period ahead forecast:

$$ \bar{m}_{1t} \equiv E\left[ m_{1.s_{t+1}} \mid y_t, \theta \right], \tag{3.22a} $$

$$ \bar{a}^{(k)}_{1rt} \equiv E\left[ a^{(k)}_{1r.s_{t+1}} \mid y_t, \theta \right], \tag{3.22b} $$

for all $r = 1, \dots, N$ and $k = 1, \dots, p$. These parameters are used for forecasting of variable $y_{1.t+1}$ (see equation (16) of Warne, 2000), as well as for the purpose of setting noncausality conditions. Restriction 2 states the conditions for Granger noncausality.
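The expectations in (3.22) are probability-weighted averages of the state-specific parameters, with weights given by the one-step-ahead regime forecast. The sketch below assumes that filtered probabilities $\Pr(s_t = i \mid y_t, \theta)$ are already available (for instance from a Hamilton-type filter, which is not shown); the function name and the numbers are hypothetical.

```python
import numpy as np

def expected_next_parameter(filtered_probs, P, params):
    """E[param_{s_{t+1}} | y_t] from Pr(s_t = i | y_t), P, and per-state parameters.

    params has the state index on its first axis, e.g. shape (M,) or (M, d).
    """
    predicted = filtered_probs @ P                 # Pr(s_{t+1} = j | y_t)
    return np.tensordot(predicted, params, axes=1)

filtered = np.array([0.5, 0.5])                    # made-up filtered probabilities
P = np.array([[0.9, 0.1], [0.2, 0.8]])
m1 = np.array([1.0, -1.0])                         # made-up state-specific intercepts
m1_bar = expected_next_parameter(filtered, P, m1)
```

Because the weights depend on $y_t$ through the filtered probabilities, the expected parameters are in general time-varying, which is the channel through which all variables can help forecast $y_{1.t+1}$.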

Restriction 2. $y_4$ does not Granger-cause $y_1$ if and only if either:

(A1) or

(A3): (i) $\sum_{j=1}^{M} m_{1.j}\, p_{ij} = \bar{m}_1$,

(ii) $\sum_{j=1}^{M} a^{(k)}_{1r.j}\, p_{ij} = \bar{a}^{(k)}_{1r}$, and

(iii) $\bar{a}^{(k)}_{14} = 0$

for all $i \in \{1, \dots, M\}$, $r \in \{1, \dots, N\}$, and $k \in \{1, \dots, p\}$,

is satisfied.


Contrary to conditions (A1) and (A2), the condition (A3) is not linear in parameters. Still, conditions (A3)(i) and (A3)(ii) have an equivalent form, $\sum_{j=1}^{M} m_{1.j}(p_{ij} - p_{kj}) = 0$ for $i, k = 1, \dots, M$ and $i \neq k$, which for some special cases may give restrictions linear in parameters. The condition (A3)(iii) does not have such a form and thus stays nonlinear. Further, in Section 3.4 we discuss consequences of the nonlinearity of the restrictions for testing them. Restriction 3 for noncausality in variance contains highly nonlinear conditions as well.
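Conditions of the type (A3)(i) and (A3)(ii) require the probability-weighted sum of a state-specific parameter to be the same from every current state. A minimal numerical check, with a hypothetical function name and made-up values, could look as follows.

```python
import numpy as np

def is_state_invariant_forecast(P, params, tol=1e-10):
    """True if sum_j params[j] * P[i, j] does not depend on the current state i."""
    forecasts = np.tensordot(P, params, axes=1)    # entry i: forecast made from state i
    return bool(np.all(np.abs(forecasts - forecasts[0]) < tol))

m = np.array([1.0, -1.0])

# Identical rows of P: the condition holds although the intercepts differ.
P_iid = np.array([[0.7, 0.3], [0.7, 0.3]])
holds_iid = is_state_invariant_forecast(P_iid, m)

# Persistent states with different intercepts: the condition fails.
P_persistent = np.array([[0.9, 0.1], [0.2, 0.8]])
holds_persistent = is_state_invariant_forecast(P_persistent, m)

# Equal intercepts: the condition holds for any P.
holds_equal = is_state_invariant_forecast(P_persistent, np.array([0.5, 0.5]))
```

The examples show two distinct ways of satisfying the same condition, one restricting the transition probabilities and one restricting the VAR parameters. This is the sense in which a single hypothesis may correspond to several restricted models, and the products of $m_{1.j}$ and $p_{ij}$ are the source of the nonlinearity.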

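Restriction 3. $y_4$ does not Granger-cause in variance $y_1$ if and only if either:

(A1) or

(A4): (i) (A2),

(ii) $\sum_{j=1}^{M} \left[ (m_{1.j} - \bar{m}_1) \otimes (m_{1.j} - \bar{m}_1) \right] p_{ij} = \varsigma_m$,

(iii) $\sum_{j=1}^{M} \left[ (a^{(k)}_{1r.j} - \bar{a}^{(k)}_{1r}) \otimes (a^{(l)}_{1s.j} - \bar{a}^{(l)}_{1s}) \right] p_{ij} = \varsigma^{(k.l)}_{r.s}$,

(iv) $\sum_{j=1}^{M} \left[ (m_{1.j} - \bar{m}_1) \otimes (a^{(k)}_{1r.j} - \bar{a}^{(k)}_{1r}) \right] p_{ij} = \varsigma^{(k)}_{\mu.r}$,

(v) $\sum_{j=1}^{M} \sigma_{1.j}\, p_{ij} = \varsigma_\sigma$, and

(vi) $a^{(k)}_{14.j} = 0$,

for all $i, j \in \{1, \dots, M\}$, $r, s \in \{1, 2, 3\}$, and $k, l \in \{1, \dots, p\}$,

is satisfied.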

In condition (A4), $\varsigma_m$, $\varsigma^{(k.l)}_{r.s}$, $\varsigma^{(k)}_{\mu.r}$ and $\varsigma_\sigma$ are time-invariant covariance matrices of the conditional expected value of the one period ahead forecast of the state-dependent parameters (see Warne, 2000, for the exact definition). Some of these restrictions may be simplified using the algebraically equivalent form: $\sum_{j=1}^{M} (m_{1.j} \otimes m_{1.j})\, p_{ij} = \varsigma_m + (\bar{m}_1 \otimes \bar{m}_1)$.
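The equivalence can be checked by expanding the Kronecker product in (A4)(ii). The short derivation below is a sketch that assumes the conditional-mean condition $\sum_{j} m_{1.j}\, p_{ij} = \bar{m}_1$ holds for every $i$ (an assumption made for this illustration) and uses the fact that each row of $P$ sums to one.

```latex
\begin{align*}
\sum_{j=1}^{M} \left[ (m_{1.j} - \bar{m}_1) \otimes (m_{1.j} - \bar{m}_1) \right] p_{ij}
  &= \sum_{j=1}^{M} (m_{1.j} \otimes m_{1.j})\, p_{ij}
   - \Big( \sum_{j=1}^{M} m_{1.j}\, p_{ij} \Big) \otimes \bar{m}_1
   - \bar{m}_1 \otimes \Big( \sum_{j=1}^{M} m_{1.j}\, p_{ij} \Big)
   + (\bar{m}_1 \otimes \bar{m}_1) \sum_{j=1}^{M} p_{ij} \\
  &= \sum_{j=1}^{M} (m_{1.j} \otimes m_{1.j})\, p_{ij} - (\bar{m}_1 \otimes \bar{m}_1).
\end{align*}
```

Setting the left-hand side equal to $\varsigma_m$ and rearranging gives the simplified form stated above.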

Finally, we present Restriction 4, which states the conditions for noncausality in distri-bution.

Restriction 4. $y_4$ does not Granger-cause in distribution $y_1$ if and only if either:

(A1) or

(A5): (i) (A2),

(ii) $m_{1.j} = m_{1.j_1}$,

(iii) $a^{(k)}_{1r.j} = a^{(k)}_{1r.j_1}$,

(iv) $a^{(k)}_{14.j} = 0$, and

(v) $\sigma_{1.j} = \sigma_{1.j_1}$

for all $j \in \{1, \dots, M\}$, $r \in \{1, 2, 3\}$, and $k \in \{1, \dots, p\}$

is satisfied.

All the conditions of Restriction 4 are linear in parameters and can be easily tested. Conditions (A5)(ii)–(A5)(v) state simply that the parameters of the equation for $y_{1t}$ cannot vary in time according to process $s_{1t}$, but should instead be $s_{1t}$-invariant.

Warne (2000) sets additional and simplified forms of restrictions (A3)–(A5), given the condition (A2) and that $\mathrm{rank}(P^{(2)}) = M_2$. We present these in C.1.

3.4 Bayesian Testing

Restrictions 1–4 can be tested. We first consider classical tests and their limitations and then present the Bayesian testing procedure as a solution. The obstacles in using classical tests are threefold:

• The asymptotic distribution of the parameters of the MS-VAR is unknown;

• The conditions for noncausality may result in several sets of restrictions on parameters. Consequently, one hypothesis may be represented by several restricted models;

• Some of the restrictions are in the form of nonlinear functions of parameters of the model.

The proposed solution consists of a new Block Metropolis-Hastings sampling algorithm for the estimation of the restricted models, and of the application of a standard Bayesian test to compare the restricted models to the unrestricted one.

Classical testing In the general case, all the mentioned problems with classical testing are difficult to cope with. While the lack of the asymptotic distribution of the parameters could be solved using simulation methods, the problem of testing a hypothesis represented by several restricted models seems unsolvable with existing classical methods.



The problem of the nonlinearity of the restrictions, however, is well known in the studies on testing parameter conditions for Granger noncausality in multivariate models. In the general case, nonlinear restrictions on parameters of the model may result in the matrix of partial derivatives of the restrictions with respect to the parameters not having a full rank. Consequently, the asymptotic distribution of the test statistic is not known.

This problem was met in several studies on Granger noncausality testing in time series models. Boudjellaba et al. (1992) derive conditions for Granger noncausality for VARMA models that result in multiple nonlinear restrictions on original parameters of the model. As a solution to the problem of testing the restrictions, they propose a sequential testing procedure. There are two main drawbacks of this method. First, even when the procedure is properly performed, the test may still appear inconclusive, and second, the confidence level is given in the form of inequalities. The problem of testing nonlinear restrictions was also examined for h-periods ahead Granger causality for VAR models. Dufour et al. (2006) propose a solution based on formulating a new model for each h, and obtain linear restrictions on the parameters of the model. These restrictions can be easily tested with standard tests. In another work by Dufour (1989) the approach is based on the linear regression theory; its solutions would require separate proofs in order to apply it to Markov-switching VARs. Finally, Lutkepohl and Burda (1997) propose a solution for testing nonlinear hypotheses based on a modification of the Wald test statistic. Given the asymptotic normality of the estimator of the parameters, the method uses a modification that, together with standard asymptotic derivations, overcomes the singularity problem.

Finally, the problem of testing nonlinear restrictions was faced by Warne (2000), who derives the restrictions for Granger noncausality, noncausality in variance and noncausality in distribution for Markov-switching VAR models. Among the solutions reviewed in this section, only that proposed by Lutkepohl and Burda (1997) seems applicable to this particular problem. Its applicability, however, remains to be established by further studies.

Bayesian testing In this study we propose a method of solving the problems of testing the parameter restrictions based on Bayesian inference. This approach to testing noncausality conditions was used by Wozniak (2011, 2012); both papers work with the Extended CCC-GARCH model of Jeantheau (1998). Two other works use the Bayesian approach to make inference about a concept related to Granger noncausality, namely exogeneity. Jarocinski and Mackowak (2011) use the Savage-Dickey ratio to test block-exogeneity in the VAR model, while Pajor (2011) uses Bayes factors to assess exogeneity conditions for models with latent variables, in particular multivariate Stochastic Volatility models.

In order to compare the unrestricted model, denoted by $\mathcal{M}_i$, and the restricted model, $\mathcal{M}_j$ with $j \neq i$, we use the Posterior Odds Ratio (POR), which is the ratio of the posterior probabilities, $\Pr(\mathcal{M}|y)$, attached to each of the models representing the hypotheses:

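$$\mathrm{POR} = \frac{\Pr(\mathcal{M}_i|y)}{\Pr(\mathcal{M}_j|y)} = \frac{p(y|\mathcal{M}_i)}{p(y|\mathcal{M}_j)} \, \frac{\Pr(\mathcal{M}_i)}{\Pr(\mathcal{M}_j)}, \tag{3.23}$$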

where $p(y|\mathcal{M})$ is the marginal density of data and $\Pr(\mathcal{M})$ is the prior probability of a model. In order to compare two competing models, one might also consider using Bayes factors, defined by:

$$B_{ij} = \frac{p(y|\mathcal{M}_i)}{p(y|\mathcal{M}_j)}. \tag{3.24}$$

Note that if one chooses not to favour any of the models a priori, setting equal prior probabilities for both models ($\Pr(\mathcal{M}_i)/\Pr(\mathcal{M}_j) = 1$), the Posterior Odds Ratio is equal to the Bayes factor. This method of testing does not have any of the drawbacks of the Likelihood Ratio test, once samples of draws from the posterior distributions of the parameters of both models are available (see Geweke, 1994; Kass and Raftery, 1995).

In this work, in order to assess the credibility of the hypotheses, each of which is represented by several sets of restrictions, and thus by several models, we compute Posterior Odds Ratios. The results of this analysis are reported in Table 3.6 in Section 3.6. Suppose that a hypothesis is represented by several models. Let $\mathcal{H}_i$ denote the set of indices of the models that represent this hypothesis, $\mathcal{H}_i = \{ j : \mathcal{M}_j \text{ represents the } i\text{th hypothesis} \}$. For instance, in our example, the hypothesis of Granger noncausality in mean is represented by four models, such that $\mathcal{H}_2 = \{1, 2, 4, 5\}$. Further, suppose that one is interested in comparing the posterior probability of this hypothesis to that of the hypothesis $\mathcal{H}_0$, represented by the unrestricted model $\mathcal{M}_0$. Then the credibility of the hypothesis $\mathcal{H}_i$ compared to the hypothesis $\mathcal{H}_0$ may be assessed with the Posterior Odds Ratio given by:

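$$\mathrm{POR} = \frac{\Pr(\mathcal{H}_i|y)}{\Pr(\mathcal{H}_0|y)} = \frac{\sum_{j \in \mathcal{H}_i} \Pr(y|\mathcal{M}_j)\Pr(\mathcal{M}_j)}{\Pr(y|\mathcal{M}_0)\Pr(\mathcal{M}_0)}. \tag{3.25}$$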

We set equal prior probabilities for all the models, which has the effect that none of the models is preferred a priori.
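The Posterior Odds Ratio of a hypothesis represented by several models can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the numerical values of the log marginal data densities are hypothetical, and the computation is carried out in logarithms (with a log-sum-exp step) because marginal data densities are typically far too small to exponentiate directly.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_posterior_odds(log_ml, hypothesis, baseline=0, log_prior=None):
    """ln POR of a hypothesis (a set of model indices) against a baseline model.

    log_ml    : dict {model index: ln p(y | M_j)}
    log_prior : dict {model index: ln Pr(M_j)}; None means equal priors
    """
    if log_prior is None:
        # Equal prior probabilities cancel in the ratio.
        log_prior = {j: 0.0 for j in log_ml}
    numerator = log_sum_exp([log_ml[j] + log_prior[j] for j in hypothesis])
    denominator = log_ml[baseline] + log_prior[baseline]
    return numerator - denominator


# Hypothetical log marginal data densities, for illustration only.
log_ml = {0: -1000.0, 1: -998.0, 2: -1003.0, 4: -997.0, 5: -1001.0}
ln_por = log_posterior_odds(log_ml, hypothesis=[1, 2, 4, 5])
print(round(ln_por, 3))
```

With a single model in the hypothesis and equal priors, the function returns the log Bayes factor of that model against the baseline.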

Testing the noncausality restrictions in MS-VARs Taking into account the complicated structure of the restrictions, we choose the Posterior Odds Ratio (3.23) to assess the hypotheses. The crucial element of this method is the computation of the marginal data densities, $p(y|\mathcal{M})$, for the unrestricted and the restricted models. There are several available methods of computing this value. In this study we choose the Modified Harmonic Mean (MHM) method of Geweke (1999). For a chosen model, given the sample of draws, $\{\theta^{(i)}\}_{i=1}^{S}$, from the posterior distribution of the parameters, $p(\theta|y,\mathcal{M})$, the marginal density of data is computed using:

$$p(y|\mathcal{M}) = \left( S^{-1} \sum_{i=1}^{S} \frac{h(\theta^{(i)})}{L(y;\theta^{(i)},\mathcal{M})\, p(\theta^{(i)}|\mathcal{M})} \right)^{-1}, \tag{3.26}$$

where $L(y;\theta^{(i)},\mathcal{M})$ is the likelihood function of model $\mathcal{M}$. The density $h(\theta^{(i)})$, as specified in Geweke (1999), is a $k$-variate truncated normal distribution with mean equal to the posterior mean and covariance matrix equal to the posterior covariance matrix of $\theta$. The truncation must be such that $h(\theta)$ has thinner tails than the posterior distribution.
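As an illustration of formula (3.26), the following Python sketch computes the MHM estimate for a scalar parameter ($k = 1$). It is a simplified sketch and not the implementation used in this study: the toy model (normal data with unit variance and a standard normal prior on the mean), the truncation probability `tau = 0.9`, and all function names are assumptions made for the example. The toy model is chosen because its marginal data density is known in closed form, so the estimate can be checked.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def log_mhm(draws, log_lik, log_prior, tau=0.9):
    """ln p(y|M) by the modified harmonic mean, scalar parameter (k = 1)."""
    mean, sd = fmean(draws), pstdev(draws)
    h = NormalDist(mean, sd)
    # Half-width (in sd units) of the region holding probability tau.
    z = NormalDist().inv_cdf(0.5 + tau / 2.0)
    log_terms = []
    for theta in draws:
        if abs(theta - mean) <= z * sd:  # h is zero outside the region
            log_h = math.log(h.pdf(theta)) - math.log(tau)
            log_terms.append(log_h - log_lik(theta) - log_prior(theta))
    m = max(log_terms)
    total = sum(math.exp(t - m) for t in log_terms)
    # Average over all S draws, then invert as in (3.26).
    return -(m + math.log(total) - math.log(len(draws)))


# Toy model (assumed): y_i ~ N(theta, 1), prior theta ~ N(0, 1).
random.seed(0)
y = [random.gauss(1.0, 1.0) for _ in range(20)]
n, ybar = len(y), fmean(y)


def log_lik(theta):
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (v - theta) ** 2 for v in y)


def log_prior(theta):
    return -0.5 * math.log(2 * math.pi) - 0.5 * theta ** 2


# The posterior is known here, so we draw from it directly instead of MCMC.
post = NormalDist(n * ybar / (n + 1), math.sqrt(1.0 / (n + 1)))
draws = [random.gauss(post.mean, post.stdev) for _ in range(20000)]

estimate = log_mhm(draws, log_lik, log_prior)
# Exact value from ln p(y) = ln L + ln prior - ln posterior at any theta.
exact = log_lik(post.mean) + log_prior(post.mean) - math.log(post.pdf(post.mean))
```

In a multivariate setting the truncation region would instead be defined through a chi-square quantile of the quadratic form in the posterior covariance matrix.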

Other methods of computing the marginal density of data may also be employed. Several estimators have been derived that take into account the characteristics of Markov-switching models. The reader is referred to the original papers by Fruhwirth-Schnatter (2004), Sims et al. (2008) and Chib and Jeliazkov (2001). Moreover, Fruhwirth-Schnatter (2004) raises the problem of the bias of the estimators when the label permutation mechanism is missing from the algorithm sampling from the posterior distribution of the parameters. The bias appears to be due to the invariance of the likelihood function and of the prior distribution of the parameters with respect to permutations of the regimes' labels, in which case the model is not globally identified. Identification can be ensured by ordering restrictions on the parameters, which can be implemented within the Gibbs sampler. It is sufficient that the values taken by one of the parameters of the model in different regimes can be ordered, and that the ordering holds for all the draws from the Gibbs algorithm, to assure global identification (see Fruhwirth-Schnatter, 2004). We make sure that this is the case, i.e. that the MS-VAR models considered for causality inference are globally identified by the ordering imposed on some parameter.

Another element of the testing procedure is the estimation of the unrestricted model and of the restricted models representing the hypotheses of interest. In Section 3.5 we present a new Block Metropolis-Hastings sampling algorithm constructed specifically for the purpose of testing noncausality hypotheses in MS-VAR models. It makes it possible to impose the restrictions on parameters resulting from conditions (A1)–(A7), and in effect to test different hypotheses of Granger noncausality between variables. In the algorithm, the restrictions are imposed on different groups of the parameters of the model. First, linear restrictions on the parameters of the VAR process, $\beta$, are implemented according to Fruhwirth-Schnatter (2006). Next, the parameters of the covariance matrices are decomposed into standard deviations, $\sigma$, and correlation parameters, $R$. To these parameter groups we apply the Griddy-Gibbs sampler of Ritter and Tanner (1992), as in Barnard et al. (2000). This form of the sampling algorithm makes it easy to restrict any of the parameters. Note that the algorithm of Barnard et al. (2000) has not yet been applied to Markov-switching models. Finally, we restrict the matrix of transition probabilities, $P$, combining the approach of Sims et al. (2008) with the Metropolis-Hastings algorithm of Fruhwirth-Schnatter (2006). The Metropolis-Hastings step needs to be implemented because we require the hidden Markov process to be irreducible. Moreover, additional parts of the algorithm are constructed in order to impose nonlinear restrictions on the parameters of the VAR process and the decomposed covariance matrix.

To summarize, we propose the following procedure in order to test different Granger noncausality hypotheses in Markov-switching VAR models.

Step 1: Specify the MS-VAR model. Choose the order of the VAR process, $p \in \{0, 1, \ldots, p_{max}\}$, and the number of states, $M \in \{1, \ldots, M_{max}\}$, using marginal densities of data (estimation of all the models is required).

Step 2: Set the restrictions. For the chosen model, derive restrictions on parameters.

Step 3: Test restrictions (A1) and (A2). Estimate the restricted models and compute their marginal densities of data. Compare the restricted models to the unrestricted one using the Posterior Odds Ratio, e.g. according to the scale proposed by Kass and Raftery (1995).

Step 4: Test hypotheses of noncausality. If the model restricted according to (A1) is preferred to the unrestricted model, then noncausality of all kinds is established. Otherwise, if the model restricted according to (A2) is preferred to the unrestricted model, use conditions (A6)–(A7) to test the different noncausality hypotheses. In the opposite case use conditions (A3)–(A5). For testing, use the Posterior Odds Ratio as in Step 3.
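The branching in Steps 3 and 4 can be sketched as follows. The function names are hypothetical, and the sketch assumes equal prior model probabilities, so that "preferred" simply means a larger log marginal data density. The evidence categories follow the $2 \ln B$ scale of Kass and Raftery (1995).

```python
def kass_raftery_label(ln_bayes_factor):
    """Evidence category on the 2 ln B scale of Kass and Raftery (1995)."""
    two_ln_b = 2.0 * ln_bayes_factor
    if two_ln_b < 0.0:
        return "evidence favours the other model"
    if two_ln_b <= 2.0:
        return "not worth more than a bare mention"
    if two_ln_b <= 6.0:
        return "positive"
    if two_ln_b <= 10.0:
        return "strong"
    return "very strong"


def step_four(ln_ml_unrestricted, ln_ml_a1, ln_ml_a2):
    """Decide what to do next, given ln p(y|M) of three estimated models.

    Assumes equal prior model probabilities (an assumption of this sketch).
    """
    if ln_ml_a1 > ln_ml_unrestricted:
        return "noncausality of all kinds established"
    if ln_ml_a2 > ln_ml_unrestricted:
        return "test conditions (A6)-(A7)"
    return "test conditions (A3)-(A5)"


# Hypothetical values, for illustration only.
print(step_four(-1000.0, -1004.0, -998.5))
print(kass_raftery_label(-998.5 - (-1000.0)))
```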

Advantages and costs of the proposed approach We start by naming the main advantages of the proposed Bayesian approach to testing the restrictions for Granger noncausality. First, by using the Posterior Odds Ratio testing principle, we avoid all the problems of testing nonlinear restrictions on the parameters of the model that appear in classical tests. Secondly, in the context of the controversies concerning the choice of the number of states for Markov-switching models in the classical approach (see Psaradakis and Spagnolo, 2003; Psaradakis and Sola, 1998), the Bayesian model selection proposed in Step 1 is a proper method free of such problems. Next, as emphasized in Hoogerheide et al. (2009), the Bayesian Posterior Odds Ratio procedure gives arguments in favour of hypotheses. Accordingly, the hypothesis preferred by the data is not merely rejected or not rejected, but is actually accepted with some probability. Finally, Bayesian estimation is a basic estimation procedure proposed for MS-VAR models and is broadly described and used in many applied publications.

However, this approach also has its costs. First of all, in order to specify the complete model, prior distributions for the parameters of the model and the prior probabilities of the models need to be specified. On the one hand, this necessity opens the inference to subjective interpretation; on the other, it may ensure an economic interpretation of the model. The other cost of implementing the Bayesian approach is the time required for the simulation of all the models, first in the model selection procedure, and second in testing the restrictions on the parameters.

3.5 The Block Metropolis-Hastings sampler for restricted MS-VAR models

This section details the MCMC sampler, which proceeds by sampling from the full conditional distributions. Each step describes the full conditional distribution of one element of the partitioned parameter vector. The parameter vector is broken up into five blocks: the vector of the latent states of the economy $S$, the transition probabilities $P$, the regime-dependent covariance matrices (themselves decomposed into standard deviations $\sigma$ and correlations $R$), and finally the regime-dependent vector of constants and autoregressive parameters $\beta$. For each block of parameters, conditionally on the parameter draws from the four other blocks, we describe how we sample from the posterior distribution. The symbols $l$ and $l - 1$ refer to the iteration of the MCMC sampler. For the first iteration of an MCMC run, $l = 1$, initial parameter values come from an EM algorithm. The rest of this section describes the blocks that form the MCMC sampler.

3.5.1 Sampling the vector of the states of the economy

The first drawn parameter is the vector representing the states of the economy, $S$. Since $S$ is a latent variable, there are neither priors nor restrictions on it. We first use a filter (see Section 11.2 of Fruhwirth-Schnatter, 2006, and references therein) to obtain the probabilities $\Pr(s_t = i|y, \theta^{(l-1)})$, for $t = 1, \ldots, T$ and $i = 1, \ldots, M$, and then draw $S^{(l)}$ for the $l$th iteration of the algorithm. For the full description of the algorithm used in this work the reader is referred to Droumaguet and Wozniak (2012).

3.5.2 Sampling the transition probabilities

In this step of the MCMC sampler, we draw from the posterior distribution of the transition probabilities matrix, conditioning on the states drawn in the previous step of the current iteration, $P^{(l)} \sim p(P|S^{(l)})$. For the purpose of testing, we impose the restriction of identical rows of $P$. Sims et al. (2008) provide a flexible analytical framework for working with restricted transition probabilities, and the reader is invited to consult Section 3 of that work for an exhaustive description of the possibilities it offers. We, however, limit the latitude given by the reparametrization in order to ensure the stationarity of the Markov chain $S$.

Reparametrization The transition probabilities matrix $P$ is modeled with $Q$ vectors $w_j$, $j = 1, \ldots, Q$, each of size $d_j$. Let all the elements of $w_j$ belong to the $(0, 1)$ interval and sum up to one, and stack all of them into the column vector $w = (w_1', \ldots, w_Q')'$ of dimension $d = \sum_{j=1}^{Q} d_j$. Writing $p = \mathrm{vec}(P')$ as an $M^2$-dimensional column vector, and introducing the $(M^2 \times d)$ matrix $\mathbf{M}$, the transition matrix is decomposed as:

$$p = \mathbf{M} w, \tag{3.27}$$

where the matrix $\mathbf{M}$ is composed of the sub-matrices $\mathbf{M}_{ij}$ of dimension $(M \times d_j)$, where $i = 1, \ldots, M$ and $j = 1, \ldots, Q$:

$$\mathbf{M} = \begin{bmatrix} \mathbf{M}_{11} & \cdots & \mathbf{M}_{1Q} \\ \vdots & \ddots & \vdots \\ \mathbf{M}_{M1} & \cdots & \mathbf{M}_{MQ} \end{bmatrix},$$

where each $\mathbf{M}_{ij}$ satisfies the following conditions:

1. For each $(i, j)$, all elements of $\mathbf{M}_{ij}$ are non-negative.

2. $\iota_M' \mathbf{M}_{ij} = \Lambda_{ij} \iota_{d_j}'$, where $\Lambda_{ij}$ is the sum of the elements in any column of $\mathbf{M}_{ij}$.

3. Each row of $\mathbf{M}$ has, at most, one non-zero element.

4. $\mathbf{M}$ is such that $P$ is irreducible: for all $j$, $d_j \geq 2$.

The first three conditions are inherited from Sims et al. (2008), whereas the last condition assures that $P$ is irreducible, forbidding the presence of an absorbing state that would render the Markov chain $S$ non-stationary. The non-independence of the rows of $P$ is described in Fruhwirth-Schnatter (2006, Section 11.5.5). Once the initial state $s_0$ is drawn from the ergodic distribution $\pi$ of $P$, direct MCMC sampling from the conditional posterior distribution becomes impossible. However, a Metropolis-Hastings algorithm can be set up to circumvent this issue, since a kernel of the joint posterior density of all rows is known: $p(P|S) \propto \prod_{j=1}^{Q} \mathcal{D}_{d_j}(w_j)\,\pi$. Hence, the proposal for the transition probabilities is obtained by sampling each $w_j$ from the corresponding Dirichlet distribution. The priors for $w_j$ follow a Dirichlet distribution, $w_j \sim \mathcal{D}_{d_j}(b_{1,j}, \ldots, b_{d_j,j})$. We then transform the column vector $w$ into our candidate matrix of transition probabilities using equation (3.27). Finally, we compute the acceptance probability before retaining or discarding the draw.
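To make the reparametrization in (3.27) concrete, the sketch below builds the restriction matrix for a two-state chain whose two rows are restricted to be identical ($Q = 1$, $d_1 = 2$) and recovers $P$ from $w$. The helper names and the numerical values are hypothetical and chosen only for illustration.

```python
def mat_vec(matrix, vector):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def unstack_rows(p, n_states):
    """Rebuild P from p = vec(P'), i.e. the rows of P stacked in order."""
    return [p[i * n_states:(i + 1) * n_states] for i in range(n_states)]


# Two states, identical rows: one free vector w_1 of size d_1 = 2.
# The matrix stacks two (2 x 2) identity blocks, M_11 and M_21.
M_identical_rows = [
    [1, 0],
    [0, 1],
    [1, 0],
    [0, 1],
]

w = [0.7, 0.3]  # hypothetical draw; elements in (0, 1), summing to one
P = unstack_rows(mat_vec(M_identical_rows, w), n_states=2)
print(P)
```

Each row of this matrix has exactly one non-zero element and each block has column sums equal to one, so conditions 1 to 4 above hold for this example. In the unrestricted two-state case the matrix would be the $(4 \times 4)$ identity, with $Q = 2$ and each $w_j$ equal to a row of $P$.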

Algorithm 5. Metropolis-Hastings for the restricted transition matrix.

1. $s_0 \sim \pi$. The initial state is drawn from the ergodic distribution of $P$.

2. $w_j \sim \mathcal{D}_{d_j}(n_{1,j} + b_{1,j}, \ldots, n_{d_j,j} + b_{d_j,j})$ for $j = 1, \ldots, Q$, where $n_{i,j}$ corresponds to the number of transitions from state $i$ to state $j$, counted from $S$. The candidate transition probabilities, in the transformed notation, are sampled from a Dirichlet distribution.

3. $P^{new} = \mathbf{M} w$. The proposal for the transition probabilities matrix is reconstructed.

4. Accept $P^{new}$ if $u \leq \pi^{new} / \pi^{l-1}$, where $u \sim U[0, 1]$. $\pi^{new}$ and $\pi^{l-1}$ are the vectors of the ergodic probabilities resulting from the draws of the transition probabilities matrices $P^{new}$ and $P^{l-1}$, respectively.
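A minimal Python sketch of one pass of Algorithm 5 is given below for the unrestricted case, in which each $w_j$ is simply a row of $P$. Several choices are assumptions of the sketch rather than statements about the implementation used here: the acceptance ratio is evaluated at the ergodic probability of the initial state, the first element of the state sequence is used as that initial state, and the ergodic distribution is obtained by power iteration. All function names are hypothetical.

```python
import random


def dirichlet(alphas, rng):
    """Dirichlet draw obtained by normalising independent gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]


def ergodic(P, iterations=500):
    """Ergodic probabilities by power iteration on pi' = pi' P."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
    return pi


def count_transitions(states, m):
    """n[i][j] = number of transitions from state i to state j."""
    n = [[0] * m for _ in range(m)]
    for a, b in zip(states[:-1], states[1:]):
        n[a][b] += 1
    return n


def mh_transition_step(P_old, states, prior, rng):
    """One Metropolis-Hastings update of an unrestricted transition matrix."""
    m = len(P_old)
    n = count_transitions(states, m)
    # Proposal: each row from its Dirichlet posterior given the counts.
    P_new = [dirichlet([n[i][j] + prior[i][j] for j in range(m)], rng)
             for i in range(m)]
    s0 = states[0]  # assumption: the first state plays the role of s_0
    ratio = ergodic(P_new)[s0] / ergodic(P_old)[s0]
    return P_new if rng.random() <= ratio else P_old


rng = random.Random(1)
states = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]   # hypothetical state sequence
prior = [[1.0, 1.0], [1.0, 1.0]]          # hypothetical Dirichlet prior
P = mh_transition_step([[0.9, 0.1], [0.2, 0.8]], states, prior, rng)
```

For a restricted matrix, the Dirichlet draws would be made for the vectors $w_j$ and mapped to $P$ through equation (3.27) before the acceptance step.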

3.5.3 Sampling a second and independent hidden Markov process

Regime inference from proposition (A1) involves two independent Markov processes. Equation (3.18) decomposes the vector of observations into two sub-vectors. The equations contained within each sub-vector are subject to switches from a different and independent Markov process. Sims et al. (2008, Section 3.3.3) cover a similar decomposition.

Adding a Markov process is straightforward in the sense that it involves repeating the steps of Section 3.5.1 and Algorithm 5 for a second process, yielding two distinct transition probabilities matrices $P^{(1)}$ and $P^{(2)}$. The transition probabilities matrix for the whole system is formed from the transition probabilities matrices of the two independent hidden Markov processes, $P = P^{(1)} \otimes P^{(2)}$.
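The Kronecker product that combines the two independent chains can be illustrated with a short sketch; the matrices below are hypothetical. Each row of the combined matrix again sums to one, and the joint states are ordered as (state of process 1, state of process 2).

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of lists."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


P1 = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical first process
P2 = [[0.8, 0.2], [0.5, 0.5]]   # hypothetical second process
P = kron(P1, P2)                # (4 x 4) matrix for the joint process
```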

3.5.4 Sampling the covariance matrices

Adapting the approach proposed by Barnard et al. (2000) to Markov-switching models, we sample from the full conditional distribution of non-restricted and restricted covariance matrices. We thus decompose each covariance matrix of the MSIAH-VAR process into a vector of standard deviations ($\sigma_{s_t}$) and a correlation matrix ($R_{s_t}$) using the equality:

$$\Sigma_{s_t} = \mathrm{diag}(\sigma_{s_t})\, R_{s_t}\, \mathrm{diag}(\sigma_{s_t}).$$
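Element by element, the equality reads $\Sigma_{ij} = \sigma_i \sigma_j R_{ij}$. The following sketch, with hypothetical function names and values, composes a covariance matrix from its two parts and recovers them again.

```python
import math


def compose(sigma, R):
    """Covariance matrix diag(sigma) R diag(sigma)."""
    n = len(sigma)
    return [[sigma[i] * sigma[j] * R[i][j] for j in range(n)] for i in range(n)]


def decompose(Sigma):
    """Standard deviations and correlation matrix of a covariance matrix."""
    n = len(Sigma)
    sigma = [math.sqrt(Sigma[i][i]) for i in range(n)]
    R = [[Sigma[i][j] / (sigma[i] * sigma[j]) for j in range(n)] for i in range(n)]
    return sigma, R


sigma = [2.0, 0.5]
R = [[1.0, 0.3], [0.3, 1.0]]
Sigma = compose(sigma, R)
sigma_back, R_back = decompose(Sigma)
```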

This decomposition, which is statistically motivated, enables the partition of the covariance matrix parameters into two categories that are well suited to the restrictions we want to impose on the matrices. In a standard covariance matrix, restricting a variance parameter to some value has an impact on the covariances that depend on it, whereas here variances and covariances (correlations) are treated as separate entities. The second, and not the least, advantage of the approach of Barnard et al. (2000) lies in the estimation procedure employed, the griddy-Gibbs sampler. The method introduced in Ritter and Tanner (1992) is well suited to sampling from an unknown univariate density $p(X_i|X_j, i \neq j)$. This is done by approximating the inverse of the conditional cumulative distribution function, which in turn is done by evaluating $p(X_i|X_j, i \neq j)$ on a grid of points. Imposing the desired restrictions on the parameters, and then iterating a sampler over every standard deviation $\sigma_{i.s_t}$ and every correlation $R_{j.s_t}$, we are able to simulate the desired posteriors of the covariance matrices. While adding to the overall computational burden, the griddy-Gibbs sampler gives us full latitude to estimate restricted covariance matrices of the desired form.

Algorithm 6. Griddy-Gibbs for the standard deviations. The algorithm iterates on all the standard deviation parameters $\sigma_{i.s_t}$ for $i = 1, \ldots, N$ and $s_t = 1, \ldots, M$. Similarly to Barnard et al. (2000), we assume log-normal priors, $\log(\sigma_{i.s_t}) \sim N(0, 2)$. The grid is centered on the residuals' sample standard deviation $\hat{\sigma}_{i.s_t}$ and divides the interval $(\hat{\sigma}_{i.s_t} - 2\hat{\sigma}_{\hat{\sigma}_{i.s_t}}, \hat{\sigma}_{i.s_t} + 2\hat{\sigma}_{\hat{\sigma}_{i.s_t}})$ into $G$ grid points, where $\hat{\sigma}_{\hat{\sigma}_{i.s_t}}$ is an estimator of the standard error of the sample standard deviation.

1. Regime-invariant standard deviations: Draw from the unknown univariate density $p(\sigma_i|y,S,P,\beta,\sigma_{-i},R)$. This is done by evaluating a kernel on a grid of points, using the proportionality of the density to the likelihood function times the prior: $p(\sigma_i|y,S,P,\beta,\sigma_{-i},R) \propto p(y|S,\theta) \cdot p(\sigma_i)$. Reconstruct the c.d.f. from the grid through deterministic integration and sample from it.

2. Regime-varying standard deviations: For all regimes $s_t = 1, \ldots, M$, draw from the univariate density $p(\sigma_{i.s_t}|y,S,P,\beta,\sigma_{-i.s_t},R)$, evaluating a kernel using the proportionality of the density to the likelihood function times the prior: $p(\sigma_{i.s_t}|y,S,P,\beta,\sigma_{-i.s_t},R) \propto p(y|S,\theta) \cdot p(\sigma_{i.s_t})$.
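The grid-based draw can be sketched for a single standard deviation as follows. This is a simplified illustration rather than the sampler used in this study: the residuals are simulated, the likelihood is that of a univariate zero-mean normal sample, the second argument of $N(0, 2)$ is read as a variance, the standard error of the sample standard deviation is approximated by $\hat{\sigma}/\sqrt{2n}$, the c.d.f. is built with the trapezoid rule, and its inverse is obtained by linear interpolation. All names are hypothetical.

```python
import bisect
import math
import random


def griddy_gibbs_draw(log_kernel, lower, upper, n_grid, rng):
    """One draw from a univariate density known up to a constant."""
    grid = [lower + (upper - lower) * g / (n_grid - 1) for g in range(n_grid)]
    log_k = [log_kernel(x) for x in grid]
    top = max(log_k)
    dens = [math.exp(v - top) for v in log_k]      # rescaled kernel values
    cdf = [0.0]                                    # trapezoid integration
    for g in range(1, n_grid):
        cdf.append(cdf[-1] + 0.5 * (dens[g] + dens[g - 1]) * (grid[g] - grid[g - 1]))
    u = rng.random() * cdf[-1]
    g = min(max(bisect.bisect_left(cdf, u), 1), n_grid - 1)
    span = cdf[g] - cdf[g - 1]
    frac = 0.0 if span == 0.0 else (u - cdf[g - 1]) / span
    return grid[g - 1] + frac * (grid[g] - grid[g - 1])  # interpolated inverse c.d.f.


rng = random.Random(0)
resid = [rng.gauss(0.0, 1.5) for _ in range(200)]   # simulated residuals
n = len(resid)
ss = sum(e * e for e in resid)
s_hat = math.sqrt(ss / n)                # sample standard deviation
se_hat = s_hat / math.sqrt(2.0 * n)      # approximate standard error (assumption)


def log_kernel(s):
    log_lik = -n * math.log(s) - ss / (2.0 * s * s)
    # Log-normal prior: log(s) ~ N(0, 2), with 2 read as a variance.
    log_prior = -math.log(s) - math.log(s) ** 2 / (2.0 * 2.0)
    return log_lik + log_prior


lower, upper = s_hat - 2.0 * se_hat, s_hat + 2.0 * se_hat
draws = [griddy_gibbs_draw(log_kernel, lower, upper, 50, rng) for _ in range(500)]
```

Within the full sampler, `log_kernel` would be the log-likelihood of the MS-VAR model as a function of the single standard deviation being updated, with all other parameters held at their current values.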

Algorithm 7. Griddy-Gibbs for the correlations. The algorithm iterates on all the correlation parameters $R_{i.s_t}$ for $i = 1, \ldots, \frac{(N-1)N}{2}$ and $s_t = 1, \ldots, M$. Similarly to Barnard et al. (2000), we assume a uniform distribution on the feasible set of correlations, $R_{i.s_t} \sim U(a, b)$, with $a$ and $b$ being the bounds that keep the implied covariance matrix positive definite; see the aforementioned reference for details of setting $a$ and $b$. The grid divides $(a, b)$ into $G$ grid points.

1. Depending on the restriction scheme, set correlation parameters to 0.

2. Regime-invariant correlations: Draw from the univariate density $p(R_i|y,S,P,\beta,\sigma,R_{-i})$, evaluating a kernel using the proportionality of the density to the likelihood function times the prior: $p(R_i|y,S,P,\beta,\sigma,R_{-i}) \propto p(y|S,\theta) \cdot p(R_i)$.

3. Regime-varying correlations: For all regimes $s_t = 1, \ldots, M$, draw from the univariate density $p(R_{i.s_t}|y,S,P,\beta,\sigma,R_{-i.s_t})$, evaluating a kernel using the proportionality of the density to the likelihood function times the prior: $p(R_{i.s_t}|y,S,P,\beta,\sigma,R_{-i.s_t}) \propto p(y|S,\theta) \cdot p(R_{i.s_t})$.

3.5.5 Sampling the vector autoregressive parameters

Finally, we draw the state-dependent autoregressive parameters, $\beta_{s_t}$ for $s_t = 1, \ldots, M$. The Bayesian estimation of finite mixtures of regression models when the realizations of the states are known is covered in detail in Fruhwirth-Schnatter (2006, Section 8.4.3). The procedure consists of estimating all the regression coefficients simultaneously by stacking them into $\beta = (\beta_0, \beta_1, \ldots, \beta_M)$, where $\beta_0$ is a regression parameter common to all regimes, and hence is useful for imposing restrictions of state invariance on the autoregressive parameters. The regression model becomes:

$$y_t = Z_t\beta_0 + Z_t D_{i.1}\beta_1 + \cdots + Z_t D_{i.M}\beta_M + \varepsilon_t, \tag{3.28}$$

$$\varepsilon_t \sim i.i.N(0, \Sigma_{s_t}). \tag{3.29}$$

We have here introduced the $D_{i.s_t}$, which are $M$ dummies taking the value 1 when the regime occurs and 0 otherwise. A transformation of the regressors $Z_t$ also has to be performed in order to allow for different coefficients on the dependent variables, for instance to impose zero restrictions on parameters. In the context of VARs, Koop and Korobilis (2010, Section 2.2.3) detail a convenient notation that stacks all the regression coefficients on a diagonal matrix for every equation. We adapt this notation by stacking all the regression coefficients for all the states on a diagonal matrix. If $z_{n.s_t.t}$ corresponds to the row vector of $1 + Np$ independent variables for equation $n$, state $s_t$ (starting at 0 for regime-invariant parameters), and time $t$, the stacked regressor $Z_t$ will be of the following form:

$$Z_t = \mathrm{diag}(z_{1.0.t}, \ldots, z_{N.0.t}, z_{1.1.t}, \ldots, z_{N.1.t}, \ldots, z_{1.M.t}, \ldots, z_{N.M.t}).$$

This notation enables the restriction of each parameter, by simply setting $z_{n.s_t.t}$ to 0 where desired.




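Algorithm 8. Sampling the autoregressive parameters. We assume a normal prior for $\beta$, i.e. $\beta \sim N(0, \underline{V}_\beta)$.

1. For all $Z_t$, impose restrictions by setting $z_{n.s_t.t}$ to zero accordingly.

2. $\beta|y,S,P,\sigma,R \sim N(\overline{\beta}, \overline{V}_\beta)$. Sample $\beta$ from the conditional normal posterior distribution, with the following parameters:

$$\overline{V}_\beta = \left( \underline{V}_\beta^{-1} + \sum_{t=1}^{T} Z_t' \Sigma_{s_t}^{-1} Z_t \right)^{-1}$$

and

$$\overline{\beta} = \overline{V}_\beta \left( \sum_{t=1}^{T} Z_t' \Sigma_{s_t}^{-1} y_t \right).$$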
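Step 2 of Algorithm 8 amounts to a draw from a multivariate normal distribution whose precision matrix is the prior precision plus $\sum_t Z_t' \Sigma_{s_t}^{-1} Z_t$. The sketch below takes the two sums as precomputed inputs and works with the Cholesky factor of the posterior precision, which avoids an explicit matrix inversion. The function names and the numerical inputs are hypothetical.

```python
import math
import random


def cholesky(A):
    """Lower-triangular L with A = L L' for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_transposed(L, b):
    """Solve L' x = b by backward substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def draw_beta(prior_precision, ZSZ, ZSy, rng):
    """Draw beta from its normal posterior; returns (draw, posterior mean).

    ZSZ = sum_t Z_t' Sigma^{-1} Z_t and ZSy = sum_t Z_t' Sigma^{-1} y_t.
    """
    k = len(ZSy)
    precision = [[prior_precision[i][j] + ZSZ[i][j] for j in range(k)]
                 for i in range(k)]
    L = cholesky(precision)
    mean = solve_upper_transposed(L, solve_lower(L, ZSy))
    z = [rng.gauss(0.0, 1.0) for _ in range(k)]
    noise = solve_upper_transposed(L, z)   # covariance equals precision^{-1}
    return [m + e for m, e in zip(mean, noise)], mean


rng = random.Random(0)
prior_precision = [[0.01, 0.0], [0.0, 0.01]]   # prior variance 100 per coefficient
ZSZ = [[4.0, 1.0], [1.0, 3.0]]                 # hypothetical sums
ZSy = [2.0, 1.0]
beta_draw, beta_mean = draw_beta(prior_precision, ZSZ, ZSy, rng)
```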

3.5.6 Simulating restrictions in the form of functions of the parameters.

Some of the restrictions for Granger noncausality presented in Section 3.3 will be in the form of complicated functions of parameters. Suppose some restriction is in the form:

$$\theta_i = g(\theta_{-i}),$$

where $g(\cdot)$ is a scalar function of all the parameters of the model but $\theta_i$. The restricted parameter, $\theta_i$, in this study may be one of the autoregressive parameters, $\beta$, or one of the standard deviations, $\sigma$. In such a case, the full conditional distributions for $\beta$ or $\sigma$ are no longer independent and need to be simulated with a Metropolis-Hastings algorithm.

Restriction on the vector autoregressive parameters $\beta$ In this case, the deterministic function restricting parameter $\beta_i$ will be of the following form:

$$\beta_i = g(\beta_{-i}, \sigma, R, P).$$

We draw from the full conditional distribution of the vector autoregressive parameters, $p(\beta|y,S,P,\sigma,R)$, using the Metropolis-Hastings algorithm:

Algorithm 9. Metropolis-Hastings for the restricted vector autoregressive parameters $\beta$.

1. Form a candidate draw, $\beta^{new}$, using Algorithm 10.

2. Compute the probability of acceptance of a draw:

$$\alpha(\beta^{l-1}, \beta^{new}) = \min\left[ \frac{p(y|S,P,\beta^{new},\sigma,R)\, p(\beta^{new})}{p(y|S,P,\beta^{l-1},\sigma,R)\, p(\beta^{l-1})}, 1 \right]. \tag{3.30}$$

3. Accept $\beta^{new}$ if $u \leq \alpha(\beta^{l-1}, \beta^{new})$, where $u \sim U[0, 1]$.

The algorithm has its justification in the block Metropolis-Hastings algorithm of Chib and Greenberg (1995). The formula for computing the acceptance probability in equation (3.30) is a consequence of the choice of the candidate-generating distributions. For the parameters $\beta_{-i}$, it is a symmetric normal distribution, as in step 2 of Algorithm 8, whereas $\beta_i$ is determined by a deterministic function.

Algorithm 10. Generating a candidate draw $\beta$.

1. Restrict parameter $\beta_i$ to zero. Draw all the parameters $(\beta_1, \ldots, \beta_{i-1}, 0, \beta_{i+1}, \ldots, \beta_k)'$ according to the algorithms described in Section 3.5.5.

2. Compute $\beta_i = g(\beta_{-i}, \sigma, R, P)$.

3. Return the vector $(\beta_1, \ldots, \beta_{i-1}, g(\beta_{-i}, \sigma, R, P), \beta_{i+1}, \ldots, \beta_k)'$.
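Algorithms 9 and 10 can be combined into a single update, sketched below with the acceptance probability (3.30) evaluated in logarithms. The sketch is generic: `draw_unrestricted`, `g` and `log_target` are hypothetical callables standing for the draw of Section 3.5.5, the restriction function, and the log of the likelihood times the prior, respectively. The toy target and restriction at the end serve only to exercise the code.

```python
import math
import random


def restricted_mh_step(beta_old, i, draw_unrestricted, g, log_target, rng):
    """One Metropolis-Hastings update with beta[i] tied to the other elements.

    Returns the retained vector and a flag indicating acceptance.
    """
    candidate = list(draw_unrestricted())   # Algorithm 10, step 1
    candidate[i] = 0.0
    candidate[i] = g(candidate)             # steps 2 and 3: g must ignore element i
    log_alpha = min(0.0, log_target(candidate) - log_target(beta_old))
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    if math.log(1.0 - rng.random()) <= log_alpha:
        return candidate, True
    return beta_old, False


# Toy illustration: three coefficients, the third restricted to b0 * b1.
rng = random.Random(0)


def draw_unrestricted():
    return [rng.gauss(0.0, 1.0) for _ in range(3)]


def g(b):
    return b[0] * b[1]


def log_target(b):
    return -0.5 * sum(x * x for x in b)


beta = [0.5, 0.2, 0.1]        # starting value satisfying the restriction
n_accepted = 0
for _ in range(50):
    beta, accepted = restricted_mh_step(beta, 2, draw_unrestricted, g, log_target, rng)
    n_accepted += accepted
```

Whether a candidate is accepted or not, the retained vector satisfies the restriction, provided the starting value does.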

3.6 Granger causal analysis of US money-income data

In both studies focusing on Granger causality analysis within Markov-switching vector autoregressive models, Warne (2000) and Psaradakis et al. (2005),1 the focus is the causal relationship between U.S. money and income. At the heart of this issue is the empirical analysis conducted in Friedman and Schwartz (1971), asserting that money changes led income changes. The methodology was rejected by Tobin (1970) as a post hoc ergo propter hoc fallacy; he argued that the timing implications from money to income could be generated not only by monetarists' macroeconomic models but also by Keynesian models. Sims (1972) initiated the econometric analysis of the causal relationship from the Granger causality perspective. While a Granger causality study concentrates on forecasting outcomes, macroeconomic theoretical modeling tries to settle the question of the neutrality of monetary policy for the business cycle. The causal relationship between money and income is, however, of particular interest to the econometric debate, since over the past forty years researchers have not reached a consensus.

1 The total US economic activity is approached from two different perspectives in these papers: Warne (2000) uses monthly income data, whereas Psaradakis et al. (2005) use quarterly output data.

This historical debate between econometricians is well narrated by Psaradakis et al. (2005), and the interested reader is advised to consult that paper for an account of events. Without detailing the references of that paper, the problem lies in the instability of the empirical results on causality between money and output. Depending on the samples considered (postwar onwards data, 1970s onwards data, 1980s onwards, 1980s excluded, etc.), the existence and intensity of the causal effect of money on output are subject to different conclusions. Hence the strategy of Psaradakis et al. (2005): to set up a Markov-switching VAR model in which the parameters responsible for noncausality in VAR models are subject to regime switches, with some regimes in which they are set to zero (noncausality for VARs) and others in which they are allowed to differ from zero. MS-VAR models are convenient tools because the switches in regimes are endogenous and can occur as many times as the data require.

As outlined in the introduction, in the approach of Warne (2000), which we follow, the MS-VAR models are 'standard' ones, and we perform Bayesian model selection through the comparison of their marginal densities of data to determine the number of states as well as the number of autoregressive lags. Moreover, we perform the analysis with precisely stated definitions of Granger causality for Markov-switching models. In this section, we use the Bayesian testing apparatus to investigate this relationship once again.

Data The data are identical to those used by Warne (2000) and cover the same time period as in the original paper. Two monthly series are included, the US money stock M1 and industrial production, both containing 434 observations covering the period from 1959:1 to 1995:2, and both extracted from the Citibase database. As in the original paper, the data are seasonally adjusted, transformed into log levels, and multiplied by 1200. Warne (2000) performed Johansen tests for cointegration and, unlike for the series in levels, trace statistics indicated no cointegration for the differenced series. Similarly, we work with the first differences of the series.
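The transformation described above turns monthly levels into annualized percentage growth rates, since $1200\,\Delta \log x_t = 12 \times 100 \times \Delta \log x_t$. A minimal sketch follows; the function name and the input series are hypothetical.

```python
import math


def annualized_log_growth(levels):
    """First difference of 1200 * log(level) for a monthly series."""
    scaled = [1200.0 * math.log(x) for x in levels]
    return [b - a for a, b in zip(scaled[:-1], scaled[1:])]


# Hypothetical series growing by 1% per month.
levels = [100.0, 101.0, 102.01]
growth = annualized_log_growth(levels)
```

A series growing by 1% per month therefore shows an annualized log growth rate of about 11.9.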

The summary statistics of both series are presented in Table 3.1. Income grows by about 3.4% per year on average, with a standard deviation of 11%; this may seem large, but note that we work with monthly series whose rates are annualized. Money has a stronger growth rate of nearly 6% on average, with a lower standard deviation than income, below 6%.

Figure 3.1: Log-differentiated series of money and income. [Two panels, income and money, plotted against time, 1960 to 1995.]

Table 3.1: Summary statistics

Variable   Mean    Median   Standard Deviation   Minimum   Maximum
Δy         3.396   4.18     10.99                -51.73    73.72
Δm         5.851   5.24     5.79                 -17.39    30.03

Data Source: Citibase.

Figure 3.1 plots the transformed series. Visual inspection indicates that at least some heteroskedasticity is present, as can be seen in the money series, where a period of higher volatility starts around 1980. Summary statistics and inspection of the series both suggest the possibility of different states in the series, in which case MS-VAR models can provide a useful framework for analysis. We, however, start our analysis with Granger causality testing in the context of linear VAR models.




Table 3.2: Model selection for VAR(p): determination of the number of lags

Lags     0          1          2          3          4          5          6          7          8
lnMHM    -3149.63   -2991.7    -2983.4    -2966.49   -2970.25   -2954.49   -2948.57   -2944      -2939.52

Lags     9          10         11         12         13         14         15         16         17
lnMHM    -2936.67   -2941.2    -2917.97   -2916.77   -2917.87   -2926.21   -2923.23   -2930.82   -2936.96

Granger causal analysis with VAR model The reason why we begin by studying Granger causality with linear models is that we want to relate to the standard methodology, and to assess, by comparing the results, whether a nonlinear approach brings added value to the analysis. Also, the Block Metropolis-Hastings sampler of Section 3.5 can easily be simplified to a Block Metropolis-Hastings sampler for VAR models. By estimating linear VAR models and comparing marginal densities, we can also check whether these models are preferred by the data to the more complex MS-VAR ones.

We estimate VAR models on the data for different lag lengths, $p = 0, \ldots, 17$. Each of the Metropolis-Hastings algorithms is initialized with the OLS estimates of the VAR coefficients. Then follows a 10,000-iteration burn-in and, after convergence of the sampler, 5000 final draws constitute the posteriors. The prior distributions are as follows:

$$\beta_i \sim N(0, 100\, I_{N+pN^2})$$

$$\sigma_{i.j} \sim \log N(0, 2)$$

$$R_{ij} \propto U(a, b)$$

for $i = 1, \ldots, M$ and $j = 1, \ldots, N$.

Table 3.2 displays the marginal density of data for each model, computed with the modified harmonic mean obtained by applying formula (3.26) to the posterior draws. As in Warne (2000), models with long lags are preferred. The VAR(12) model, i.e. with 12 lags for the autoregressive coefficients, yields the highest lnMHM and hence is the model we choose for the Granger causality analysis. Table C.1 in Appendix C.2 displays, for each parameter of the model, the mean, standard deviation, naive standard error, and autocorrelations of the Markov chain at lag 1 and lag 10. Low autocorrelation at lag 10 indicates that the sampler has good properties.
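The lag selection can be reproduced directly from the values reported in Table 3.2: with equal prior model probabilities, the preferred model is the one with the largest lnMHM. The short sketch below only re-reads the table; the variable names are hypothetical.

```python
# lnMHM by number of lags, as reported in Table 3.2.
ln_mhm = {
    0: -3149.63, 1: -2991.7, 2: -2983.4, 3: -2966.49, 4: -2970.25,
    5: -2954.49, 6: -2948.57, 7: -2944.0, 8: -2939.52, 9: -2936.67,
    10: -2941.2, 11: -2917.97, 12: -2916.77, 13: -2917.87, 14: -2926.21,
    15: -2923.23, 16: -2930.82, 17: -2936.96,
}

best_lag = max(ln_mhm, key=ln_mhm.get)

# Log Bayes factors of the preferred model against every alternative.
ln_bf_vs_best = {p: ln_mhm[best_lag] - v for p, v in ln_mhm.items()}
print(best_lag)
```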

The set of restrictions to impose on the parameters of vector autoregressive moving average models was covered in Sims (1972) and Boudjellaba et al. (1992). Translated into the VAR representation, and in the case of a bivariate VAR(p) model:

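$$\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} + \sum_{i=1}^{p} \begin{bmatrix} A^{(i)}_{11} & A^{(i)}_{12} \\ A^{(i)}_{21} & A^{(i)}_{22} \end{bmatrix} \begin{bmatrix} y_{1,t-i} \\ y_{2,t-i} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix},$$

for $t = 1, \ldots, T$, the restrictions for money, $y_{2,t}$, being Granger noncausal for income, $y_{1,t}$, read:

$$A^{(i)}_{12} = 0 \quad \text{for } i = 1, \ldots, p.$$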

Note that these restrictions, with assumed normal residual terms, simultaneously encompass Granger noncausality in mean, variance, and distribution.

The estimation of the restricted VAR(12) model, with its upper-right autoregressive coefficients $A^{(i)}_{12}$ set to 0 for all lags, returns posteriors that yield a lnMHM of -2901.63. Expressed in logarithms, the posterior odds ratio in favor of the hypothesis of Granger noncausality from money to income is equal to 15.13. Table 3.3 summarizes the results for VAR models. This is very strong support for the restricted model $\mathcal{M}_1$ over the unrestricted one $\mathcal{M}_0$; hence Bayesian testing provides evidence in favor of Granger noncausality from money to income within the VAR framework. This result is in line with Christiano and Ljungqvist (1988), where Granger noncausality from money to output is established for the VAR model in log-differences with US data. Those authors, however, contest this result and argue that models in first differences suffer from a specification error. We continue our analysis with nonlinear models that allow switches in their parameters.

Table 3.3: Noncausality and conditional regime independence in a VAR(12) model. Numerical efficiency results for these models are presented in Table C.3 of Appendix C.3.

M_j  Hypothesis    Restrictions        # restrictions  ln p(y|M_j)  ln B_j0

H0: Unrestricted model
M0   VAR(12)       -                   0               -2,916.77    0

H1: Granger noncausality from money to income
M1   (A1)          A^{(i)}_{12} = 0    p               -2,901.63    15.13

Restrictions hold for i = 1, ..., p.




Granger causal analysis with MS-VARs. MS-VAR models capture nonlinearities in the data, such as heteroskedasticity. Endogenous regime estimation leaves considerable latitude for capturing a variety of nonlinear features of the data, which in a way reduces the risk of model misspecification. The legitimacy of these models against VARs can easily be tested by computing the marginal density of the data for the respective models.

Moreover, the Markov-switching framework provides a more detailed analysis of causality, as MS-VAR models produce different sets of restrictions for different types of noncausality, i.e. noncausality in mean, variance, or distribution. Therefore, we distinguish between more and less strict hypotheses, and make more informative inferences by investigating causality in moments of different order.

We estimate MSIAH(M)-VAR(p) models on the data for different numbers of regimes, M = 2, 3, 4, and different lag lengths, p = 0, ..., 6. Each run of the Block Metropolis-Hastings algorithm is initialized with the estimates from the EM algorithm for the corresponding model. A 10,000-iteration burn-in follows and, after convergence of the sampler, we sample 5,000 final draws from the posteriors. The prior distributions are as defined in Section 3.2.

Table 3.4 reports the lnMHMs for the estimated models with 2 regimes. Though we also estimated models with 3 or 4 regimes, their estimation suffered from low occurrences of some regimes. This indicates that the data do not support MS-VAR models with 3 or more regimes, and explains why we only present results with 2 regimes. The number of estimated lags for the autoregressive coefficients is limited to 6, fewer than the 12 lags for VAR models, also because of insufficient state occurrences when the number of AR parameters increases. The model preferred by the data is the MSIAH(2)-VAR(4), i.e. with 2 regimes and a VAR process of order 4. Table C.2 in Appendix C.2 displays, for each parameter of the model, the mean, standard deviation, naive standard error, and autocorrelations of the Markov chains at lag 1 and lag 10. Decaying autocorrelation between draws indicates that the sampler has desirable properties.

Figure 3.2 plots the regime probabilities from the selected model. In comparison with the second regime, the first regime matches times of higher variance for both variables. In addition, the constant for income growth, $\mu_{1,1}$, is negative during the occurrences of the first regime. Hence, the first regime can be interpreted as the bad state of the economy.
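As a rough complement to the figure, the sketch below turns the posterior means of the transition probabilities reported in Table C.2 ($p_{1,1} = 0.734$, $p_{2,1} = 0.059$) into long-run regime shares and expected regime durations. Two assumptions are made here and are not confirmed by the text: the convention $p_{i,j} = \Pr(s_t = j \mid s_{t-1} = i)$, and the use of plug-in posterior means rather than averaging over posterior draws. The function name `regime_summary` is hypothetical.

```python
def regime_summary(p11, p21):
    """Ergodic probabilities and expected durations of a 2-state Markov chain.

    Assumed convention: p_ij = Pr(s_t = j | s_{t-1} = i), so that
    p12 = 1 - p11 and p22 = 1 - p21.
    """
    p12 = 1.0 - p11
    p22 = 1.0 - p21
    pi1 = p21 / (p12 + p21)  # long-run share of periods spent in regime 1
    return {
        "pi1": pi1,
        "pi2": 1.0 - pi1,
        "duration1": 1.0 / (1.0 - p11),  # expected spell length in regime 1
        "duration2": 1.0 / (1.0 - p22),  # expected spell length in regime 2
    }


# Plug-in values: posterior means from Table C.2.
summary = regime_summary(p11=0.734, p21=0.059)
```

Under these assumptions the first regime would account for roughly 18% of periods, with spells of about 4 periods, against about 17 periods for the second regime, which is consistent with reading it as the less frequent, bad state.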

Note that comparing the best unrestricted MS-VAR model from Table 3.4 to the best VAR model of Table 3.3 (that is, to the restricted model) yields a logarithm of the posterior odds ratio of 6.41 in favor of the MS-VAR model.
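The lag selection and the comparison with the VAR can be reproduced from the reported numbers. The short Python sketch below uses the lnMHM values of Table 3.4 and the lnMHM of the restricted VAR(12) from Table 3.3; the variable names are illustrative, and equal prior model probabilities are assumed so that the log posterior odds equal the log Bayes factor.

```python
# lnMHM of the MSIAH(2)-VAR(p) models by lag order, as reported in Table 3.4.
ln_mhm_by_lag = {
    0: -3002.64,
    1: -2926.42,
    2: -2903.89,
    3: -2898.21,
    4: -2895.22,
    5: -2914.87,
    6: -2913.49,
}

# The preferred lag order maximizes the log marginal density of the data.
best_lag = max(ln_mhm_by_lag, key=ln_mhm_by_lag.get)

# lnMHM of the best VAR model (the restricted VAR(12) of Table 3.3).
ln_mhm_best_var = -2901.63

# Log posterior odds of the best MS-VAR against the best VAR (equal priors assumed).
ln_posterior_odds = ln_mhm_by_lag[best_lag] - ln_mhm_best_var
```

Here `best_lag` is 4 and `ln_posterior_odds` is 6.41 up to floating-point error.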




Table 3.4: Model selection for MSIAH(2)-VAR(p) – determination of the lag order

Lags    0          1          2          3          4          5          6
lnMHM   -3,002.64  -2,926.42  -2,903.89  -2,898.21  -2,895.22  -2,914.87  -2,913.49

[Figure: two stacked panels, "Regime 1" and "Regime 2", each plotting the regime probability (vertical axis from 0 to 1) against Year, 1960 to 1995.]

Figure 3.2: Estimated probabilities of regimes for a MSIAH(2)-VAR(4) model

Similarly to Warne (2000), we proceed with the analysis of Granger noncausality for the selected MSIAH(2)-VAR(4) model. The Bayesian testing strategy we employ renders the process straightforward: each type of causality implies different restrictions on the model parameters; we impose them, estimate the models, and compute all marginal densities of data. Table 3.5 summarizes all the sets of restrictions to impose when testing noncausality from money to income, together with the logarithms of the marginal densities of data given the model, ln p(y|M_j), and the logarithms of the Bayes factors, ln B_j0, for j = 1, ..., 7. A positive logarithm of the Bayes factor is to be interpreted as evidence in favor of the restricted model. Symmetrically, a negative logarithm of the Bayes factor indicates that the unrestricted model is preferred by the data.

Analysis of Table 3.5 shows that only model M4, with a logarithm of the Bayes factor of 14.59, is more probable a posteriori than the unrestricted model M0. This model represents one of the sets of restrictions for Granger noncausality in mean. All other models are less probable than the unrestricted model, as shown by the negative values of the logarithms of their Bayes factors.




Table 3.5: Noncausality and conditional regime independence in a MSIAH(2)-VAR(4) model. Numerical efficiency results for these models are presented in Table C.3 of Appendix C.3.

M_j  Hypothesis              Restrictions                                          # restrictions  ln p(y|M_j)  ln B_j0

H0: Unrestricted model
M0   MS(2)-VAR(4)            -                                                     0               -2895.22     0

H1: History of money does not impact on the regime forecast of income
M1   (A1) M_1 = 1, M_2 = 2   μ_{1,s_t} = μ_1, A^{(i)}_{11,s_t} = A^{(i)}_{11},     3p+4            -2964.72     -69.50
                             A^{(i)}_{12,s_t} = 0, Σ_{11,s_t} = Σ_{11},
                             Σ_{12,s_t} = 0
M2   (A1) M_1 = 2, M_2 = 1   μ_{2,s_t} = μ_2, A^{(i)}_{21,s_t} = A^{(i)}_{21},     4p+4            -2921.54     -26.32
                             A^{(i)}_{22,s_t} = A^{(i)}_{22}, Σ_{22,s_t} = Σ_{22},
                             Σ_{12,s_t} = 0, A^{(i)}_{12,s_t} = 0
     (A2) M_1 = 1, M_2 = 2   Always holds, no restrictions                         -               -            -
M3   (A2) M_1 = 2, M_2 = 1   p_{11} = p_{21}                                       1               -2907.39     -12.17

H2: Granger noncausality in mean
     (A1) or                 -                                                     -               -            -
M4   (A6) M_1 = 1, M_2 = 2   μ_{1,s_t} = μ_1, A^{(i)}_{11,s_t} = A^{(i)}_{11},     3p+1            -2880.63     14.59
                             A^{(i)}_{12,s_t} = 0
M5   (A6) M_1 = 2, M_2 = 1   p_{11} = p_{21}, ∑_{j=1}^{2} A^{(i)}_{12,j} π_j = 0   p+1             -2897.24     -2.02

H3: Granger noncausality in variance
     (A1) or                 -                                                     -               -            -
M6   (A7) M_1 = 1, M_2 = 2   μ_{1,s_t} = μ_1, A^{(i)}_{11,s_t} = A^{(i)}_{11},     3p+2            -2953.15     -57.93
                             A^{(i)}_{12,s_t} = 0, Σ_{11,s_t} = Σ_{11}
M7   (A7) M_1 = 2, M_2 = 1   p_{11} = p_{21}, A^{(i)}_{12,s_t} = 0                 2p+1            -2900.58     -5.36

H4: Granger noncausality in distribution
     (A1) or                 -                                                     -               -            -
     (A7)                    -                                                     -               -            -

Restrictions hold for i = 1, ..., p.

Table 3.6 presents a summary of the assessment of the considered hypotheses. We found strong support for Granger noncausality in mean. This hypothesis has a much higher posterior probability than all other hypotheses, including the unrestricted model. Warne (2000) found a similar result, but holding only at the 10% level of significance.




Table 3.6: Summary of the hypotheses testing

H_i  Hypothesis                                  Represented by models  ln [Pr(H_i|y) / Pr(H_0|y)]

H0   Unrestricted model                          M0                     0
H1   History of money does not impact on the     M1, M2, M3             -12.17
     regime forecast of income
H2   Granger noncausality in mean                M1, M2, M4, M5         14.59
H3   Granger noncausality in variance            M1, M2, M6, M7         -5.36
H4   Granger noncausality in distribution        M1, M2, M6, M7         -5.36

However, Bayesian testing establishes this as a strong result: the conditional mean of income is invariant to the history of money. Table 3.6 also provides strong evidence for Granger causal relations in variance and, in effect, in distribution, as these two hypotheses are represented by the same set of models for the considered model.
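One way to see how model-level Bayes factors translate into hypothesis-level posterior odds is sketched below. It assumes equal prior probabilities for all models, so that the posterior odds of a hypothesis against H0 are the sum of the Bayes factors of the models representing it, and it takes H1 to be represented by the (A1) and (A2) models M1, M2, and M3 of Table 3.5. Both points are assumptions of this illustration; the aggregation actually used for Table 3.6 is not spelled out in this section. The sum is evaluated with a log-sum-exp for numerical stability.

```python
import math

# Log Bayes factors ln B_j0 of the restricted models, from Table 3.5.
ln_b = {1: -69.50, 2: -26.32, 3: -12.17, 4: 14.59, 5: -2.02, 6: -57.93, 7: -5.36}

# Models representing each hypothesis (H1 membership is an assumption, see above).
hypotheses = {
    "H1": [1, 2, 3],
    "H2": [1, 2, 4, 5],
    "H3": [1, 2, 6, 7],
    "H4": [1, 2, 6, 7],
}


def ln_posterior_odds(models, ln_bayes_factors):
    """ln of the summed Bayes factors of the given models (stable log-sum-exp)."""
    values = [ln_bayes_factors[j] for j in models]
    largest = max(values)
    return largest + math.log(sum(math.exp(v - largest) for v in values))


ln_odds = {name: ln_posterior_odds(models, ln_b) for name, models in hypotheses.items()}
```

Because one model dominates within each set, the resulting log odds are numerically indistinguishable from the largest log Bayes factor in the set, which matches the two-decimal values of Table 3.6.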

Summary. The results of Bayesian testing for Granger causality from money to income on the US monthly series covering the period 1959–1995 are in line with the narrative of Psaradakis et al. (2005), in the sense that the strongly established noncausality in mean within VAR models (which is equivalent to noncausality in variance and in distribution) does not hold with MS-VAR models. Allowing for nonlinearity in the models' coefficients, here through a Markov chain permitting switches between regimes of the economy, and testing for causality from money to income yields a different result, and the strong noncausal evidence is decomposed. We found that the history of money helps to predict the regimes of income. We also found that money causes income both in variance and in distribution. However, we did find evidence for Granger noncausality in mean from money to income, as did Warne (2000). Bayesian model estimation combined with Bayesian testing provided tools with which to select the correct model specification and to compare it to the VAR specifications, and the posterior odds ratio tests allowed us to test for the three types of Granger noncausality.

These findings have particular consequences for the forecasting of income. Although past information about money does not change the forecast of the conditional mean of income, it is still crucial for its modeling. Past observations of money improve the forecast of the state of the economy when it is modeled with a Markov-switching process. Therefore, if one is interested in forecasting regime switches in the income equation, then one should add the money variable to the considered system. The same conclusion applies to forecasting the future variability of income and, in particular, to its density forecast. The last finding is especially relevant for Bayesian Markov-switching vector autoregressions. We justify this statement with two features of such a model. First, Markov-switching vector autoregressions are designed to model and forecast a complicated distribution of the residuals, with heteroskedastic variances and a non-normal distribution. Second, Bayesian inference is particularly suitable for density forecasting with MS-VARs, because the predictive density is constructed by integrating out the parameters of the model. In consequence, the forecast incorporates the uncertainty with respect to the parameter values. Moreover, the integration required to construct forecasts conditioned only on past observations of the variables, and not on the unobserved states, as in classical forecasting (see Hamilton, 1994), is straightforward.

A note. Using Bayes factors for the comparison of models is not uncontroversial. Bayes factors are sensitive to the specification of the prior distributions for the parameters being tested. The more diffuse a prior distribution, the more informative it is about the parameter tested with a Bayes factor. This phenomenon is called Bartlett's paradox (see Bartlett, 1957) and is a version of Lindley's paradox. Moreover, Strachan and van Dijk (2011) show that assuming a diffuse prior distribution for the parameters of the model results in wrongly defined Bayes factors. As a solution to this problem, Strachan and van Dijk recommend using a prior distribution belonging to a class of shrinkage distributions.
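The sensitivity described above can be illustrated with a textbook normal-mean example, which is not the model of this chapter. Assume a sample mean $\bar{y} \sim N(\theta, 1/n)$ with known unit variance, the point null $H_0\colon \theta = 0$, and the alternative $H_1\colon \theta \sim N(0, \tau^2)$. Then, with $z = \sqrt{n}\,\bar{y}$, the Bayes factor in favor of the null is $B_{01} = \sqrt{1 + n\tau^2}\,\exp\{-\tfrac{1}{2} z^2\, n\tau^2/(1 + n\tau^2)\}$. The Python sketch below, with hypothetical names and values, evaluates it for increasingly diffuse priors while holding the data fixed.

```python
import math


def ln_bf_null_vs_alt(z, n, tau2):
    """ln B_01 for H0: theta = 0 against H1: theta ~ N(0, tau2).

    The data enter through z = sqrt(n) * ybar, where ybar ~ N(theta, 1/n).
    """
    shrink = n * tau2 / (1.0 + n * tau2)
    return 0.5 * math.log(1.0 + n * tau2) - 0.5 * z * z * shrink


# Fixed data: a sample mean 2.5 standard errors away from zero, n = 100.
z, n = 2.5, 100

# The same data evaluated under increasingly diffuse priors on theta.
prior_variances = (1.0, 100.0, 10000.0)
ln_b01 = [ln_bf_null_vs_alt(z, n, tau2) for tau2 in prior_variances]
```

With these inputs the log Bayes factor moves from below zero (evidence against the null) to clearly above zero (evidence for the null) as the prior variance grows, although the data are unchanged.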

In this study, normally distributed prior densities with mean zero and variance equal to 100 are assumed. This prior distribution for the VAR parameters belongs to a class of diffuse prior distributions. Therefore, the critique of Bayes factors applies. The problem is recognized and solved by employing a shrinkage prior distribution for these parameters in the newest version of this work (see Droumaguet et al., 2012). However, we do not include these results in this work.




3.7 Conclusions

We proposed a method of testing the nonlinear restrictions for the hypotheses of Granger noncausality in mean, in variance, and in distribution for Markov-switching vector autoregressions. The employed Bayes factors and posterior odds ratios overcome the limitations of the classical approach. The method requires, however, an algorithm for estimating the unrestricted model and the restricted models representing the hypotheses of interest. The algorithm we proposed allows all groups of parameters of the model to be restricted in an appropriate way. It combines several existing algorithms and improves them in order to maintain the desired properties of the model and the efficiency of estimation. The estimation method allows us to use many of the existing methods for computing the marginal density of data that is required for both Bayes factors and posterior odds ratios.

The Bayesian approach to testing also has consequences for the way in which the competing hypotheses are treated. Contrary to classical tests, the hypotheses of Granger causality or noncausality of different types are, in our approach, treated symmetrically. We obtain this effect by comparing the posterior probabilities of the hypotheses (or models). In consequence, the output of our inference, in the form of choosing the hypothesis with the highest posterior probability, reflects the choice of the hypothesis most strongly supported by the data. This applies, of course, to cases in which the chosen prior probabilities and densities do not discriminate a priori against some of the hypotheses.

In the empirical illustration of the methodology, we have found that in the USA money does not cause income in mean. We have, however, found that money impacts on the forecast of the future state of the economy, as well as on the forecast of the variability of income and on its density forecast. If the empirical analysis is to be more than just an illustration of the methodology, and in effect be conclusive, robustness checks are required. In particular, considering more relevant variables in the system could impact on the conclusions of the analysis of Granger causality between money and income.

The main limitation of the whole analysis of Granger causality for MS-VAR models is that only one-period-ahead Granger causality is considered in this study. The conditions for h-periods-ahead noncausality should be further explored. We only mention that establishing that one variable does not improve the forecast of the hidden Markov process may, given the Markov property, imply the same for all periods in the future. Still, establishing conditions for noncausality h periods ahead for the autoregressive parameters, including covariances, would potentially require tedious algebra. This statement is motivated by the complexity of formulating forecasts with MS-VAR models.



Appendix C

C.1 Alternative restrictions for Granger noncausality

The following restrictions were set by Warne (2000), and are all derived under condition (A2) and $\operatorname{rank}(P^{(2)}) = M_2$.

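Restriction 5. Suppose that $P = (\imath_{M_1} \pi^{(1)\prime} \otimes P^{(2)})$ with $\operatorname{rank}(P^{(2)}) = M_2$, then condition (A3) is equivalent to:

(A6): (i) $\sum_{j_1=1}^{M_1} m_{1.(j_1,j_2)} \pi^{(1)}_{j_1} = m_1$,

(ii) $\sum_{j_1=1}^{M_1} a^{(k)}_{1r.(j_1,j_2)} \pi^{(1)}_{j_1} = a^{(k)}_{1r}$, and

(iii) $a^{(k)}_{14} = 0$

for all $j_2 \in \{1, \dots, M_2\}$, $r \in \{1, 2, 3\}$, and $k \in \{1, \dots, p\}$.

Restriction 6. Suppose that $P = (\imath_{M_1} \pi^{(1)\prime} \otimes P^{(2)})$ with $\operatorname{rank}(P^{(2)}) = M_2$, then condition (A4) is equivalent to:

(A7): (i) (A3),

(ii) $\sum_{j_1=1}^{M_1} \left[ (m_{1.(j_1,j_2)} - m_1) \otimes (m_{1.(j_1,j_2)} - m_1) \right] \pi^{(1)}_{j_1} = \varsigma_{\mu}$,

(iii) $\sum_{j_1=1}^{M_1} \left[ (a^{(k)}_{1r.(j_1,j_2)} - a^{(k)}_{1r}) \otimes (a^{(l)}_{1s.(j_1,j_2)} - a^{(l)}_{1s}) \right] \pi^{(1)}_{j_1} = \varsigma^{(k.l)}_{r.s}$,

(iv) $\sum_{j_1=1}^{M_1} \left[ (m_{1.(j_1,j_2)} - m_1) \otimes (a^{(k)}_{1r.(j_1,j_2)} - a^{(k)}_{1r}) \right] \pi^{(1)}_{j_1} = \varsigma^{(k)}_{\mu.r}$,

(v) $\sum_{j_1=1}^{M_1} \sigma_{1.(j_1,j_2)} \pi^{(1)}_{j_1} = \varsigma_{\omega}$, and

(vi) $a^{(k)}_{14.j} = 0$

for all $j \in \{1, \dots, M\}$, $j_2 \in \{1, \dots, M_2\}$, $r, s \in \{1, 2, 3\}$, and $k, l \in \{1, \dots, p\}$ is satisfied.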

Restriction 7. Suppose $\operatorname{rank}(P) \in \{1, M\}$, then $y_4$ does not Granger-cause $y_1$ in distribution if and only if it does not Granger-cause $y_1$ in variance.

C.2 Summary of the posterior densities simulations

Table C.1: VAR(12): posterior properties

Mean Std. dev. Naive Std. error Autocorr. lag 1 Autocorr. lag 10

Standard deviations

σ1 9.192 0.137 0.002 0.028 0.006

σ2 4.912 0.095 0.001 0.046 0.002

Correlations

ρ1 -0.025 0.058 0.001 0.060 -0.014

Intercepts

μ1 -0.004 0.300 0.004 0.001 -0.009

μ2 0.582 0.266 0.004 -0.011 0.006

Autoregressive coefficients

A(1)11 0.284 0.049 0.001 -0.007 0.005

A(1)12 0.138 0.088 0.001 -0.006 -0.028

A(1)21 0.027 0.027 0.000 -0.024 -0.016

A(1)22 0.361 0.049 0.001 0.020 0.027

A(2)11 0.076 0.049 0.001 -0.009 0.014

A(2)12 0.108 0.094 0.001 -0.034 -0.014

A(2)21 -0.044 0.026 0.000 -0.001 0.012

A(2)22 -0.005 0.052 0.001 0.007 -0.001

A(3)11 0.068 0.049 0.001 0.002 0.011

A(3)12 0.133 0.093 0.001 -0.035 0.009

A(3)21 -0.054 0.026 0.000 -0.014 -0.009

A(3)22 0.199 0.052 0.001 0.001 -0.001

A(4)11 0.085 0.049 0.001 0.004 0.009

A(4)12 -0.053 0.092 0.001 -0.014 -0.008

A(4)21 -0.024 0.027 0.000 0.012 -0.011



A(4)22 -0.106 0.051 0.001 -0.026 0.002

A(5)11 -0.054 0.049 0.001 -0.003 -0.010

A(5)12 0.032 0.094 0.001 -0.019 -0.010

A(5)21 0.007 0.026 0.000 0.008 -0.005

A(5)22 0.228 0.051 0.001 0.004 0.008

A(6)11 0.004 0.047 0.001 0.000 0.009

A(6)12 0.106 0.095 0.001 0.009 0.019

A(6)21 0.000 0.026 0.000 0.004 0.011

A(6)22 0.067 0.052 0.001 0.008 -0.010

A(7)11 0.035 0.048 0.001 -0.002 -0.007

A(7)12 -0.100 0.095 0.001 -0.008 0.003

A(7)21 0.001 0.025 0.000 0.017 -0.002

A(7)22 -0.012 0.053 0.001 -0.025 -0.008

A(8)11 0.031 0.048 0.001 0.035 -0.017

A(8)12 0.056 0.094 0.001 0.005 -0.005

A(8)21 0.052 0.025 0.000 -0.015 0.005

A(8)22 0.104 0.051 0.001 0.011 0.010

A(9)11 0.015 0.048 0.001 -0.016 0.019

A(9)12 -0.054 0.093 0.001 0.006 0.004

A(9)21 -0.043 0.025 0.000 0.016 -0.004

A(9)22 0.181 0.052 0.001 0.023 -0.012

A(10)11 0.020 0.047 0.001 0.023 0.020

A(10)12 0.008 0.090 0.001 0.007 -0.022

A(10)21 -0.008 0.026 0.000 -0.010 -0.005

A(10)22 -0.077 0.052 0.001 0.018 -0.012

A(11)11 0.008 0.048 0.001 -0.017 0.021

A(11)12 -0.064 0.093 0.001 -0.014 0.001

A(11)21 -0.036 0.026 0.000 0.007 -0.006

A(11)22 -0.023 0.052 0.001 -0.022 0.001

A(12)11 -0.069 0.044 0.001 0.008 0.003

A(12)12 -0.042 0.087 0.001 -0.031 0.006

A(12)21 0.061 0.024 0.000 0.010 -0.013

A(12)22 -0.029 0.049 0.001 -0.004 -0.002


Table C.2: MSIAH(2)-VAR(4): posterior properties

Mean Std. dev. Naive Std. error Autocorr. lag 1 Autocorr. lag 10


Transition probabilities

p1,1 0.734 0.066 0.001 0.557 -0.005

p2,1 0.059 0.018 0.000 0.624 0.088

Standard deviations

σ1,1 17.129 1.207 0.017 0.625 0.150

σ2,1 8.746 0.646 0.009 0.559 0.111

σ1,2 6.983 0.276 0.004 0.669 0.173

σ2,2 4.011 0.179 0.003 0.666 0.105

Correlations

ρ1,1 -0.173 0.127 0.002 0.203 0.008

ρ1,2 0.078 0.070 0.001 0.284 0.018

Intercepts regime 1

μ1,1 -0.213 0.949 0.013 0.014 0.032

μ2,1 1.107 0.885 0.013 0.101 0.011

Autoregressive coefficients regime 1

A(1)11,1 0.497 0.147 0.002 0.128 0.016

A(1)12,1 0.209 0.287 0.004 0.142 -0.018

A(1)21,1 0.069 0.075 0.001 0.156 0.027

A(1)22,1 0.419 0.156 0.002 0.222 -0.002

A(2)11,1 -0.253 0.191 0.003 0.238 0.020

A(2)12,1 -0.134 0.361 0.005 0.191 -0.005

A(2)21,1 -0.018 0.094 0.001 0.131 0.025

A(2)22,1 -0.092 0.202 0.003 0.237 0.002

A(3)11,1 0.172 0.218 0.003 0.173 0.001

A(3)12,1 -0.176 0.376 0.005 0.105 0.008

A(3)21,1 -0.126 0.122 0.002 0.265 0.006

A(3)22,1 0.112 0.217 0.003 0.191 0.004

A(4)11,1 -0.490 0.217 0.003 0.325 0.078

A(4)12,1 0.409 0.343 0.005 0.164 0.019

A(4)21,1 0.088 0.106 0.001 0.252 0.029

A(4)22,1 0.098 0.205 0.003 0.281 0.031



Intercepts regime 2

μ1,2 0.295 0.634 0.009 0.163 -0.005

μ2,2 2.058 0.420 0.006 0.210 -0.012

Autoregressive coefficients regime 2

A(1)11,2 0.237 0.059 0.001 0.391 0.041

A(1)12,2 0.028 0.099 0.001 0.333 -0.002

A(1)21,2 -0.026 0.031 0.000 0.259 0.025

A(1)22,2 0.398 0.058 0.001 0.297 -0.024

A(2)11,2 0.130 0.048 0.001 0.210 0.014

A(2)12,2 0.165 0.088 0.001 0.195 0.013

A(2)21,2 -0.032 0.028 0.000 0.194 0.005

A(2)22,2 0.092 0.057 0.001 0.321 0.038

A(3)11,2 0.099 0.053 0.001 0.377 0.057

A(3)12,2 0.214 0.086 0.001 0.195 0.006

A(3)21,2 -0.014 0.026 0.000 0.176 0.023

A(3)22,2 0.285 0.053 0.001 0.284 0.007

A(4)11,2 0.106 0.052 0.001 0.394 0.039

A(4)12,2 -0.174 0.092 0.001 0.272 0.014

A(4)21,2 -0.019 0.025 0.000 0.200 0.009

A(4)22,2 -0.066 0.055 0.001 0.323 0.031

C.3 Characterization of estimation efficiency

Table C.3: Characterization of the efficiency in the models’ estimations

      RNE                     Autocorr. lag 1         Autocorr. lag 10        Geweke z-score
M_j   Median  Min     Max     Median  Min     Max     Median  Min     Max     Median  Min     Max

Vector autoregressive models
M0    1.00    0.85    1.19    0.00    -0.03   0.06    0.00    -0.03   0.03    -0.10   -2.37   2.38
M1    1.00    0.76    1.08    0.01    -0.03   0.07    0.00    -0.04   0.02    0.07    -2.57   2.43

Markov switching vector autoregressive models
M0    0.48    0.10    1.00    0.24    0.01    0.67    0.02    -0.02   0.17    -0.56   -2.14   3.27
M1    0.47    0.06    1.00    0.17    -0.02   0.78    0.01    -0.03   0.29    0.22    -1.98   2.58
M2    0.71    0.13    1.12    0.14    -0.02   0.71    0.01    -0.03   0.08    0.13    -2.10   1.59
M3    0.30    0.02    0.94    0.27    0.03    0.89    0.04    -0.01   0.56    -0.32   -2.43   1.94
M4    0.46    0.08    0.83    0.25    0.07    0.78    0.01    -0.03   0.23    -0.20   -1.57   1.56
M5    0.22    0.02    0.43    0.44    0.12    0.85    0.07    -0.01   0.50    -0.10   -2.39   2.16
M6    0.24    0.02    0.92    0.26    0.04    0.90    0.04    -0.02   0.58    -0.08   -1.43   1.91
M7    0.33    0.05    0.83    0.31    0.03    0.84    0.04    -0.01   0.39    -0.16   -2.34   1.67

Table C.3 reports statistics for assessing the efficiency of each estimated model. Three types of statistics are presented: the relative numerical efficiency of Geweke (1989), autocorrelations at different lags, and the convergence diagnostic of Geweke (1992). Statistics should be presented separately for each parameter of each model, but to save space, we summarize each model with a median, minimum, and maximum.

The relative numerical efficiency represents the ratio of the variance of a hypothetical draw from the posterior density over the variance of the Gibbs sampler. Thus, it can be interpreted as a measure of the computational efficiency of the algorithm. The columns of Table C.3, unsurprisingly, tell us that the algorithm for VAR models is more efficient than that for MS-VAR models. The same observation can be made when comparing unrestricted models with restricted ones. What is of interest to us is the magnitude of the RNE statistics for unrestricted and restricted models. These are comparable, which is a good sign that the algorithm for constrained models is, computationally, reasonably efficient.

The columns displaying the autocorrelations at lag 1 and lag 10 are there to check that the autocorrelation decays over time. This is the case here, which suggests that the Gibbs samplers explore the entire posterior distribution.

Geweke (1992) introduces the z-score test, which tests the stationarity of the draws from the posterior distribution simulation by comparing the mean of the first 30% of the draws with the mean of the last 40% of the draws. We compare the two means with a z-test. Typically, values outside (−2, 2) indicate that the mean of the series is still drifting, and this occurs for some parameters in each model, except M4 and M6 for MS-VARs. Increasing the burn-in period might improve the scores and the stationarity of the MCMC chain.
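The two chain diagnostics described above can be sketched in plain Python. This is a simplified illustration, not the code used for Table C.3: the z-score below uses naive variances of the two segments and therefore ignores autocorrelation in the draws, whereas Geweke (1992) relies on spectral density estimates. The function names and the simulated chains are assumptions made for the example; the 30%/40% split follows the description above.

```python
import math
import random
import statistics


def autocorr(chain, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(chain)
    mean = statistics.fmean(chain)
    var = sum((x - mean) ** 2 for x in chain)
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag))
    return cov / var


def geweke_z(chain, first=0.3, last=0.4):
    """Naive z-score comparing the mean of the first and last parts of a chain.

    Simplification: segment variances are not corrected for autocorrelation.
    """
    n = len(chain)
    head = chain[: int(first * n)]
    tail = chain[n - int(last * n):]
    std_err = math.sqrt(
        statistics.variance(head) / len(head) + statistics.variance(tail) / len(tail)
    )
    return (statistics.fmean(head) - statistics.fmean(tail)) / std_err


# Simulated stand-ins for posterior draws (5,000 draws, as in the final sample).
rng = random.Random(0)
stationary = [rng.gauss(0.0, 1.0) for _ in range(5000)]
drifting = [x + 0.001 * t for t, x in enumerate(stationary)]

z_stationary = geweke_z(stationary)
z_drifting = geweke_z(drifting)
rho1_stationary = autocorr(stationary, 1)
```

A chain whose mean keeps drifting produces a z-score far outside (−2, 2), while independent draws around a fixed mean show near-zero autocorrelation at lag 1.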



Bibliography

Albert, J. H. and S. Chib (1993). Bayes Inference via Gibbs Sampling of Autoregressive Time Series Subject to Markov Mean and Variance Shifts. Journal of Business & Economic Statistics 11(1), 1–15.

Barnard, J., R. McCulloch, and X.-l. Meng (2000). Modeling Covariance Matrices in Terms of Standard Deviations and Correlations, with Application to Shrinkage. Statistica Sinica 10, 1281–1311.

Bartlett, M. S. (1957). A Comment on D. V. Lindley's Statistical Paradox. Biometrika 44(3/4), 533–534.

Boudjellaba, H., J.-M. Dufour, and R. Roy (1992). Testing Causality Between Two Vectors in Multivariate Autoregressive Moving Average Models. Journal of the American Statistical Association 87(420), 1082–1090.

Boudjellaba, H., J.-M. Dufour, and R. Roy (1994). Simplified Conditions for Noncausality Between Vectors in Multivariate ARMA Models. Journal of Econometrics 63, 271–287.

Casella, G. and E. I. George (1992). Explaining the Gibbs Sampler. The American Statistician 46(3), 167–174.

Chib, S. and I. Jeliazkov (2001). Marginal Likelihood from the Metropolis-Hastings Output. Journal of the American Statistical Association 96(453), 270–281.

Christiano, L. J. and L. Ljungqvist (1988). Money Does Granger-Cause Output in the Bivariate Money-Output Relation. Journal of Monetary Economics 22(2), 217–235.

Christopoulos, D. K. and M. A. Leon-Ledesma (2008). Testing for Granger (non)-Causality in a Time Varying Coefficient VAR Model. Journal of Forecasting 27, 293–303.

Comte, F. and O. Lieberman (2000). Second-Order Noncausality in Multivariate GARCH Processes. Journal of Time Series Analysis 21(5), 535–557.

Diebold, F. X. and K. Yilmaz (2009). Measuring Financial Asset Return and Volatility Spillovers, with Application to Global Equity Markets. The Economic Journal 119(534), 158–171.


Droumaguet, M., A. Warne, and T. Wozniak (2012). Granger Causality and Regime Inference in Bayesian Markov-Switching VARs. Unpublished manuscript, European University Institute, Florence, Italy.

Droumaguet, M. and T. Wozniak (2012). Bayesian Testing of Granger Causality in Markov-Switching VARs. Working paper series, European University Institute, Florence, Italy. Download at: http://cadmus.eui.eu/bitstream/handle/1814/20815/ECO_2012_06.pdf?sequence=1.

Dufour, J.-M. (1989). Nonlinear Hypotheses, Inequality Restrictions, and Non-Nested Hypotheses: Exact Simultaneous Tests in Linear Regressions. Econometrica 57(2), 335–355.

Dufour, J.-M., D. Pelletier, and E. Renault (2006). Short Run and Long Run Causality in Time Series: Inference. Journal of Econometrics 132(2), 337–362.

Friedman, M. and A. Schwartz (1971). A Monetary History of the United States, 1867-1960, Volume 12. Princeton University Press.

Fruhwirth-Schnatter, S. (2004). Estimating Marginal Likelihoods for Mixture and Markov Switching Models Using Bridge Sampling Techniques. Econometrics Journal 7(1), 143–167.

Fruhwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models. Springer.

Geweke, J. (1989). Bayesian Inference in Econometric Models Using Monte Carlo Integration. Econometrica 57(6), 1317–1339.

Geweke, J. (1992). Evaluating the Accuracy of Sampling-Based Approaches to Calculating Posterior Moments. In J. Bernardo, J. Berger, A. Dawid, and A. Smith (Eds.), Bayesian Statistics 4, pp. 169–194. Clarendon Press, Oxford, UK.

Geweke, J. (1994). Bayesian Comparison of Econometric Models. Working Papers.

Geweke, J. (1999). Using Simulation Methods for Bayesian Econometric Models: Inference, Development, and Communication. Econometric Reviews 18(1), 1–73.

Granger, C. W. J. (1969). Investigating Causal Relations by Econometric Models and Cross-spectral Methods. Econometrica 37(3), 424–438.


Greenberg, E. and S. Chib (1995). Understanding the Metropolis-Hastings Algorithm. The American Statistician 49(4), 327–335.

Hamilton, J. D. (1994). State-Space Models. In R. F. Engle and D. L. McFadden (Eds.), Handbook of Econometrics, Volume IV, Chapter 50, pp. 3039–3080. Elsevier.

Hoogerheide, L. F., H. K. van Dijk, and R. van Oest (2009). Simulation Based Bayesian Econometric Inference: Principles and Some Recent Computational Advances, Chapter 7, pp. 215–280. Handbook of Computational Econometrics. Wiley.

Jarocinski, M. and B. Mackowiak (2011). Choice of Variables in Vector Autoregressions.

Jeantheau, T. (1998). Strong Consistency of Estimators for Multivariate ARCH Models. Econometric Theory 14(01), 70–86.

Kass, R. E. and A. E. Raftery (1995). Bayes Factors. Journal of the American Statistical Association 90(430), 773–795.

Kim, C.-J. and C. R. Nelson (1999a). Has the U.S. Economy Become More Stable? A Bayesian Approach Based on a Markov-Switching Model of the Business Cycle. Review of Economics and Statistics 81(4), 608–616.

Kim, C.-J. and C. R. Nelson (1999b). State-Space Models with Regime Switching: Classical and Gibbs-Sampling Approaches with Applications. MIT Press.

Koop, G. and D. Korobilis (2010). Bayesian Multivariate Time Series Methods for Empirical Macroeconomics. Foundations and Trends in Econometrics 3(4), 267–358.

Krolzig, H. (1997). Markov-switching Vector Autoregressions: Modelling, Statistical Inference, and Application to Business Cycle Analysis. Springer Verlag.

Lechner, M. (2011). The Relation of Different Concepts of Causality Used in Time Series and Microeconometrics. Econometric Reviews 30(1), 109–127.

Lutkepohl, H. (1993). Introduction to Multiple Time Series Analysis. Springer-Verlag.


Lutkepohl, H. and M. M. Burda (1997). Modified Wald Tests Under Nonregular Conditions. Journal of Econometrics 78(1), 315–332.

McCulloch, R. E. and R. S. Tsay (1994). Statistical Analysis of Economic Time Series via Markov Switching Models. Journal of Time Series Analysis 15(5), 523–539.

Pajor, A. (2011). A Bayesian Analysis of Exogeneity in Models with Latent Variables. Central European Journal of Economic Modelling and Econometrics 3(2), 49–73.

Psaradakis, Z., M. O. Ravn, and M. Sola (2005). Markov Switching Causality and the Money-Output Relationship. Journal of Applied Econometrics 20(5), 665–683.

Psaradakis, Z. and M. Sola (1998). Finite-Sample Properties of the Maximum Likelihood Estimator in Autoregressive Models with Markov Switching. Journal of Econometrics 86(2), 369–386.

Psaradakis, Z. and N. Spagnolo (2003). On the Determination of the Number of Regimes in Markov-switching Autoregressive Models. Journal of Time Series Analysis 24, 237–252.

Ritter, C. and M. A. Tanner (1992). Facilitating the Gibbs Sampler: The Gibbs Stopper and the Griddy-Gibbs Sampler. Journal of the American Statistical Association 87(419), 861–868.

Robins, R. P., C. W. J. Granger, and R. F. Engle (1986). Wholesale and Retail Prices: Bivariate Time-Series Modeling with Forecastable Error Variances, pp. 1–17. The MIT Press.

Rubin, D. B. (1974). Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies. Journal of Educational Psychology 66(5), 688–701.

Sims, C. A. (1972). Money, Income, and Causality. The American Economic Review 62(4), 540–552.

Sims, C. A., D. F. Waggoner, and T. Zha (2008). Methods for Inference in Large Multiple-Equation Markov-Switching Models. Journal of Econometrics 146(2), 255–274.

Strachan, R. W. and H. K. van Dijk (2011). Divergent Priors and Well Behaved Bayes Factors.

Taylor, J. B. and J. C. Williams (2009). A Black Swan in the Money Market. American Economic Journal: Macroeconomics 1(1), 58–83.


Tobin, J. (1970). Money and Income: Post Hoc Ergo Propter Hoc? The Quarterly Journal of Economics 84(2), 301–317.

Warne, A. (2000). Causality and Regime Inference in a Markov Switching VAR. Working Paper Series 118, Sveriges Riksbank (Central Bank of Sweden).

Wozniak, T. (2011). Granger Causal Analysis of VARMA-GARCH Models. Unpublished manuscript, Department of Economics, European University Institute.

Wozniak, T. (2012). Testing Causality Between Two Vectors in Multivariate GARCH Models. Working Paper Series 1139, Department of Economics, The University of Melbourne, Melbourne. Download: http://www.economics.unimelb.edu.au/research/1139.pdf.
