Large Vector Autoregressions with stochastic volatility and …...Large Vector Autoregressions with...
Transcript of Large Vector Autoregressions with stochastic volatility and …...Large Vector Autoregressions with...
Large Vector Autoregressions with stochastic volatilityand non-conjugate priors
Andrea Carriero 1 Todd E. Clark 2 Massimiliano Marcellino 3
Norges Bank, 3 October 2017
1Queen Mary, University of London2Federal Reserve Bank of Cleveland3Bocconi University and CEPRCarriero, Clark, Marcellino () Large VARs September 2017 1 / 22
Introduction
Introduction - two ingredients
Two main ingredients are key for the specification of a good Vector Autoregressivemodel (VAR) for forecasting and structural analysis of macroeconomic data:
A large cross section. Banbura, Giannone, and Reichlin (2010), Carriero, Clark,and Marcellino (2015), Giannone, Lenza, and Primiceri (2015) and Koop (2013)
Time variation in the volatilities. Clark (2011), Clark and Ravazzolo (2015),Cogley and Sargent (2005), D’Agostino, Gambetti and Giannone (2013), andPrimiceri (2005)
There are no papers which jointly allow for both time variation and large datasets
Carriero, Clark, Marcellino () Large VARs September 2017 2 / 22
Introduction
Introduction - heteroskedasticity
The reason lies in the structure of the likelihood function
Homoskedastic VARs are SUR models with the same set of regressors in eachequation −→ Kronecker structure in the likelihood−→ OLS equation by equation
Equation-specific stochastic volatility breaks this symmetry because each equationis driven by a different volatility
The system would need to be vectorised, and the conditional posterior involves
manipulation of a matrix of dimension pN2 (N=number of variables, p=number oflags)
The computational complexity is therefore N23= N6
Carriero, Clark, Marcellino () Large VARs September 2017 3 / 22
Introduction
Introduction - asymmetric priors
In a Bayesian framework, simmetry is not only needed in the likelihood, but also in
the prior
Kronecker structure in the likelihood+Kronecker structure in the prior= Kronecker
structure in the posterior
For example, the VAR estimated by Banbura, Giannone, and Reichlin (2010) is aVAR with 130 variables, but in order to make this estimation possible one needs to
assume:
(i) Homoskedasticity of the disturbances
(ii) A specific structure for the prior
Without either (i) or (ii) the system would need to be vectorised prior to estimation
Carriero, Clark, Marcellino () Large VARs September 2017 4 / 22
Introduction
The problem
Consider the VAR of a N-dimensional vector yt :
yt = Π(L)yt−1 + vt ; vt ∼ iid N(0,Σt ) (1)
Define Xt = [1, y ′t−1, ..., y′t−p ]
′ and Π = [Π0 |Π1 |...|Πp ]
In general we have the posterior vec(Π)|Σ, y ∼ N(vec(µΠ),ΩΠ) with posteriorprecision:
Ω−1Π = Ω−1ΠPrior
+T
∑t=1
(Σ−1t ⊗ XtX ′t )Likelihood
(2)
The precision matrix Ω−1Π is of size N(Np + 1). Its manipulation requires(pN2)3 = O(N6) elementary operations
For N very large modern computers (laptops/desktops) can’t even store such amatrix in RAM (e.g. N = 125 needs 330 GB of RAM).
Carriero, Clark, Marcellino () Large VARs September 2017 5 / 22
Introduction
The usual solution
In general we have the posterior vec(Π)|Σ, y ∼ N(vec(µΠ),ΩΠ) with
Ω−1Π = Ω−1ΠPrior
+T
∑t=1
(Σ−1t ⊗ XtX ′t )Likelihood
(2)
Now assume that
(i) Σt = Σ (homoskedasticity)
(ii) ΩΠ = Σ⊗Ω0 (conjugate prior)
Ω−1Π = Ω−1Π︸︷︷︸Σ−1⊗Ω−10
+T
∑t=1(Σ−1t︸︷︷︸
Σ−1
⊗ XtX ′t ) = Σ−1 ⊗(
Ω−10 +T
∑t=1
XtX ′t
), (3)
and the two terms can be manipulated separately, reducing complexity by O(N3)
Classical homoskedastic VARs can be estimated equation by equation.
Carriero, Clark, Marcellino () Large VARs September 2017 6 / 22
Introduction
Problems with the the usual solution
The Natural-conjugate homoskedastic approach allows to use large datasets, but it hasimportant limitations:
It imposes homoskedasticity, against the overwhelming evidence in macroeconomicand financial data
The prior structure Σ⊗Ω0 is restrictive (Rothemberg (1963), Sims and Zha
(1998))
It prevents any asymmetry in the prior across equations, because the
coeffi cients of each equation feature the same prior variance Ω0 (up to a scale
factor given by the elements of Σ).It has the unappealing consequence that prior beliefs must be correlated across
equations, with a correlation structure proportional to that of the shocks (as
described by Σ).
Carriero, Clark, Marcellino () Large VARs September 2017 7 / 22
Introduction
A new algorithm
In this paper we propose a new algorithm that makes possible to use:
A heteroskedastic model
The more general and less restrictive independent Normal - Inverse Wishart
(and Normal-diffuse) prior
Our procedure is based on a simple factorization of the likelihood, which allows todraw the VAR coeffi cients equation by equation
This reduces the computational complexity from N6 to N4.
Our new algorithm is very simple and can be easily inserted in any pre-existingalgorithm for estimation of BVAR models.
Carriero, Clark, Marcellino () Large VARs September 2017 8 / 22
Introduction
The Model
Consider the following VAR model for a N-dimensional yt with stochastic volatility:
yt = Π0 +Π(L)yt−1 + vt ; (1)
vt = A−1Λ0.5t εt , εt ∼ iid N(0, IN ) (2)
where Λt is a diagonal matrix with generic j-th element hj ,t and A−1 is a lowertriangular matrix with ones on its main diagonal.
The bottleneck is drawing vec(Π)|A,ΛT , yT ∼ N(vec(µΠ),ΩΠ); To obtain a draw
one needs to i) invert
Ω−1Π = Ω−1ΠPrior
+T
∑t=1
(Σ−1t ⊗ XtX ′t )Likelihood
(3)
ii) compute its Cholesky factor and iii) multiply the Cholesky factor by a random
vector
Each of the above operations is of complexity N6
Carriero, Clark, Marcellino () Large VARs September 2017 9 / 22
Introduction
An algorithm for large VARs
Consider again the decomposition vt = A−1Λ0.5t εt :v1,tv2,t...vN ,t
=
1 0 ... 0a∗2,1 1 ...... 1 0a∗N ,1 ... a∗N ,N−1 1
h0.51,t 0 ... 00 h0.52,t ...... ... 00 ... 0 h0.5N ,t
ε1,tε2,t...
εN ,t
,where a∗j ,i denotes the generic element of the matrix A
−1 which is available underknowledge of A.
Carriero, Clark, Marcellino () Large VARs September 2017 10 / 22
Introduction
An algorithm for large VARs
The VAR can be written as:
y1,t = π(0)1 +
N
∑i=1
p
∑l=1
π(i )1,l yi ,t−l + h
0.51,t ε1,t
y2,t = π(0)2 +
N
∑i=1
p
∑l=1
π(i )2,l yi ,t−l + a
∗2,1h
0.51,t ε1,t + h
0.52,t ε2,t
...
yN ,t = π(0)N +
N
∑i=1
p
∑l=1
π(i )N ,l yi ,t−l + a
∗N ,1h
0.51,t ε1,t + · · ·+ a∗N ,N−1h0.5N−1,t εN−1,t + h0.5N ,t εN ,t ,
with the generic equation for variable j :
yj ,t − (a∗j ,1h0.51,t ε1,t + ...+ a∗j ,,j−1h
0.5j−1,t εj−1,t )︸ ︷︷ ︸
y ∗j ,t
= π(0)j +
N
∑i=1
p
∑l=1
π(i )j ,l yi ,t−l + hj ,t εj ,t . (4)
When drawing the coeffi cients of equation j the term y ∗j ,t is known, since it is given bythe difference between the dependent variable of that equation and the realized residualsof all the previous j − 1 equations. Hence (4) is a standard generalized linear regressionmodel with i.i.d. Gaussian disturbances.
Carriero, Clark, Marcellino () Large VARs September 2017 11 / 22
Introduction
An algorithm for large VARs
The full conditional posterior distribution of the conditional mean coeffi cients can befactorized as:
p(Π|A,ΛT , y ) = p(π(N )|π(N−1),π(N−2), . . . ,π(1),A,ΛT , y )
×p(π(N−1)|π(N−2), . . . ,π(1),A,ΛT , y )...
×p(π(1)|A,ΛT , y ),and one can draw the coeffi cients in Π in separate blocks:
Πj|Π1:j−1,A,ΛT , y ∼ N(µΠj |1:j−1 ,ΩΠj |1:j−1 )
with
µΠj |1:j−1 = ΩΠj |1:j−1
T
∑t=1
Xj ,th−1j ,t y
∗′j ,t +Ω−1
Πj |1:j−1µΠj |1:j−1
Ω−1Πj |1:j−1 = Ω−1Πj |1:j−1 +
T
∑t=1
Xj ,th−1j ,t X
′j ,t ,
where µΠj |1:j−1 and ΩΠj |1:j−1 are moments of Πj|Π1:j−1 ∼ N(µ
Πj |1:j−1 ,ΩΠj |1:j−1 )
Carriero, Clark, Marcellino () Large VARs September 2017 12 / 22
Introduction
An algorithm for large VARs
The conditional posterior of Π obtained is the same as the one from thesystem-wide algorithm
The algorithm will produce draws numerically identical to those of thesystem-wide sampler
This is true regardelss of the ordering, which is irrelevant to the conditionalposterior of Π
The total computational complexity of this estimation algorithm is O(N4), with a
gain of N2.
Uses equations with at most Np + 1 regressors, and the correlation across
equations typical of SUR models is implicitly accounted for by the factorization
The dimension of the posterior variance matrix Ω−1Πj is (Np + 1), which
means that its manipulation only involves operations of order O(N3).
Carriero, Clark, Marcellino () Large VARs September 2017 13 / 22
A numerical comparison of the estimation methods
Computational complexity and speed of simulation
time for producing 10 draws as a function of N
Size of the crosssection (N)0 1 2 3 4 5 6 7 8 9 10
Sec
onds
0
2
4
6
8
10
12
14
16
18time for producing 10 draws as function of N
System wide agorithm
Triangular algorithm
Size of the crosssection (N)0 1 2 3 4 5 6 7 8 9 10
Num
ber o
f ele
men
tary
ope
ratio
ns
0
10
20
30
40
50
60
70
80
90
100Theoretical and Actual difference in computational complexity
Actual difference
Theoretical difference
Carriero, Clark, Marcellino () Large VARs September 2017 14 / 22
A numerical comparison of the estimation methods
Computational complexity and speed of simulation
time for producing 10 draws as a function of N - log scale
Size of the crosssection (N)0 5 10 15 20 25 30 35 40
Sec
onds
10 1
10 0
10 1
10 2
10 3
10 4 time for producing 10 draws as function of N
System wide algorithm
Triangular algorithm
Size of the crosssection (N)0 5 10 15 20 25 30 35 40
Num
ber o
f ele
men
tary
ope
ratio
ns
10 1
10 0
10 1
10 2
10 3
10 4 Theoretical and Actual difference in computational complexity
Actual differenceTheoretical dif ference
Carriero, Clark, Marcellino () Large VARs September 2017 15 / 22
A numerical comparison of the estimation methods
Convergence and mixing
Regardless of the power of the computers used to perform the simulation thetriangular algorithm will always produce many more draws than the traditional
system-wide algorithm in a given unit of time.
This has important consequences in terms of producing draws with good mixingand convergence properties.
The triangular algorithm can produce draws many times closer to i.i.d. samplingin the same amount of time.
These computational and storage gains increase quadratically with the system size
Carriero, Clark, Marcellino () Large VARs September 2017 16 / 22
A numerical comparison of the estimation methods
Convergence and mixing
Ineffi ciency factors= distance from i.i.d sampling: ideally should be around 1.
5 0 5 10 15 20 250
0.1
0.2
0.3
0.4
0.5
0.6
0.7Conditinal mean parameters, systemwide algorithm
0 .5 0 0.5 1 1.5 2 2.5 3 3.50
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8Conditinal mean parameters, triangular algorithm
10 0 10 20 30 40 50 60 700
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
0.2Covariances, systemwide algorithm
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 20
0.5
1
1.5
2
2.5
3Covariances, triangular algorithm
2 4 6 8 10 12 14 16 18 200
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
0.2Volatility factors (averaged across time), systemwide algorithm
0.7 0.75 0.8 0.85 0.9 0.95 1 1.050
2
4
6
8
10
12
14
16
18
20Volatility factors (averaged across time), triangular algorithm
Systemwide algorithm, 5000 draws.
20 0 20 40 60 80 100 120 140 160 1800
0.005
0.01
0.015
0.02
0.025
0.03Volatility innovation variance, systemwide algorithm
Triangular algorithm results are based on 1305000 draws with skipsampling of 261, producing an effective sample of 5000 draws.
0 .5 0 0.5 1 1.5 2 2.5 3 3.50
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6Volatility innovation variance, triangular algorithm
Carriero, Clark, Marcellino () Large VARs September 2017 17 / 22
A large structural VAR with drifting volatilities
Empirical applications
As an illustration we estimate a VAR with stochastic volatilities, using 13 lags and a
cross-section of 125 variables from FRED-MD
For a model of this size the system-wide algorithm would have a covariance matrix
of the coeffi cients of dimension 203250, which would require about 330 GB of RAM(2032502 × 8/109).
Our estimation algorithm can produce 5000 draws in just above 7 hours on a 3.5
GHz Intel Core i7.
We find that:
The variance of the shocks was clearly unstable over time
There is a factor structure in the volatilities
The combined use of both time variation in volatilities and a large data-set
improves point and density forecasts, more that what these two ingredients do
if used separately.
Carriero, Clark, Marcellino () Large VARs September 2017 18 / 22
0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 1250.4
0.2
0
0.2
0.4
Var explained 73.2837%
0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 1250.1
0
0.1
0.2
0.3
0.4
0.5
Var explained 19.0428%
0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 1250.4
0.2
0
0.2
0.4
0.6
Var explained 2.6287%
0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 1250.4
0.2
0
0.2
0.4
0.6
Var explained 1.6826%
0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 1250.30.20.1
00.10.20.30.4
Var explained 0.52826%
FFRNB reserves
CES1021000001
PCEPIHoursRPI
interest rates, exchange rates, andfinancial indicators
Monetaryaggregates
Real variables PricesSurveys
Figure 11: Principal components loadings of the variance-covariance of the volatilities (matrix ).
PCA of the variance matrix of the shocks to volatilities
heteroskedast ic2 2.5 3 3.5
hom
osch
edas
tic
2
2.5
3
3.5
RPI SCORE
heteroskedast ic3.7 3.8 3.9
hom
osch
edas
tic3.65
3.7
3.75
3.8
3.85
3.9
3.95
DPCERA3M086SBEA SCORE
heteroskedast ic3 3.1 3.2
hom
osch
edas
tic
2.95
3
3.05
3.1
3.15
3.2
3.25
CMRMTSPLx SCORE
heteroskedast ic3.3 3.4 3.5
hom
osch
edas
tic
3.3
3.35
3.4
3.45
3.5
3.55
INDPRO SCORE
heteroskedast ic2.5 2 1.5 1
hom
osch
edas
tic
2.5
2
1.5
1
CUMFNS SCORE
heteroskedast ic1 0.5 0
hom
osch
edas
tic
1.21
0.80.60.40.2
00.20.4
UNRATE SCORE
heteroskedast ic4.6 4.7 4.8 4.9 5
hom
osch
edas
tic
4.6
4.7
4.8
4.9
5
PAYEMS SCORE
heteroskedast ic1 0.8 0.6 0.4 0.2
hom
osch
edas
tic
1
0.8
0.6
0.4
0.2
CES0600000007 SCORE
heteroskedast ic4.2 4.3 4.4 4.5
hom
osch
edas
tic
4.2
4.25
4.3
4.35
4.4
4.45
4.5
CES0600000008 SCORE
heteroskedast ic3.4 3.5 3.6 3.7 3.8
hom
osch
edas
tic
3.43.45
3.53.55
3.63.65
3.73.75
3.8
PPIFGS SCORE
heteroskedast ic2 2.05 2.1
hom
osch
edas
tic
1.96
1.98
2
2.02
2.04
2.06
2.08
2.1
PPICMM SCORE
heteroskedast ic4.5 4.6 4.7 4.8 4.9
hom
osch
edas
tic
4.54.554.6
4.654.7
4.754.8
4.854.9
PCEPI SCORE
heteroskedast ic2.5 3 3.5 4 4.5
hom
osch
edas
tic
2.5
3
3.5
4
4.5
FEDFUNDS SCORE
heteroskedast ic0 0.5 1
hom
osch
edas
tic
0.2
0
0.2
0.4
0.6
0.8
1
HOUST SCORE
heteroskedast ic1.75 1.8 1.85 1.9
hom
osch
edas
tic
1.75
1.8
1.85
1.9
S&P 500 SCORE
heteroskedast ic2.25 2.3 2.35 2.4
hom
osch
edas
tic
2.25
2.3
2.35
2.4
EXUSUKx SCORE
heteroskedast ic1.5 1 0.5 0
hom
osch
edas
tic
1.5
1
0.5
0
T1YFFM SCORE
heteroskedast ic2 1.5 1 0.5
hom
osch
edas
tic
2
1.5
1
0.5
T10YFFM SCORE
heteroskedast ic2 1.5 1 0.5
hom
osch
edas
tic
2
1.5
1
0.5
BAAFFM SCORE
heteroskedast ic3.6 3.4 3.2 3 2.8
hom
osch
edas
tic
3.6
3.4
3.2
3
2.8
NAPMNOI SCORE
Figure 17: Comparison of density forecast accuracy. Each panel describes a different variable. The
x axis reports the (log) density score obtained using the BVAR with stochastic volatility (het-
eroschedastic), the y axis reports the (log) density score obtained using the homoschedastic BVAR.
Each point corresponds to a different forecast horizon from 1 to 12 step-ahead.
Score comparison: homoskedastic model (y axis) vs heteroskedastic model (x axis)
heteroskedastic 1035.7 5.75 5.8 5.85
hom
osch
edas
tic
103
5.7
5.75
5.8
5.85
RPI RMSFE
heteroskedastic 10 35 5.1 5.2
hom
osch
edas
tic
10 3
5
5.05
5.1
5.15
5.2
DPCERA3M086SBEA RMSFE
heteroskedastic 10 39.4 9.6 9.8
hom
osch
edas
tic
103
9.3
9.4
9.5
9.6
9.7
9.8
CMRMTSPLx RMSFE
heteroskedastic 10 36.4 6.6 6.8 7 7.2 7.4
hom
osch
edas
tic
10 3
6.4
6.6
6.8
7
7.2
7.4
INDPRO RMSFE
heteroskedastic1 2 3
hom
osch
edas
tic
1
1.5
2
2.5
3
CUMFNS RMSFE
heteroskedastic0.2 0.4 0.6 0.8
hom
osch
edas
tic
0.2
0.3
0.4
0.5
0.6
0.7
0.8
UNRATE RMSFE
heteroskedastic 10 31.5 1.6 1.7 1.8 1.9
hom
osch
edas
tic
103
1.5
1.6
1.7
1.8
1.9
PAYEMS RMSFE
heteroskedastic0.3 0.4 0.5
hom
osch
edas
tic
0.3
0.35
0.4
0.45
0.5
0.55
CES0600000007 RMSFE
heteroskedastic 1032.6 2.7 2.8 2.9
hom
osch
edas
tic
10 3
2.62.652.7
2.752.8
2.852.9
2.95
CES0600000008 RMSFE
heteroskedastic 10 35.6 5.8 6 6.2
hom
osch
edas
tic
10 3
5.65.75.85.9
66.16.26.3
PPIFGS RMSFE
heteroskedastic0.03 0.031 0.032
hom
osch
edas
tic
0.03
0.0305
0.031
0.0315
0.032
0.0325
PPICMM RMSFE
heteroskedastic 10 32 2.2 2.4
hom
osch
edas
tic
10 3
1.9
2
2.1
2.2
2.3
2.4
2.5
PCEPI RMSFE
heteroskedastic0.01 0.015 0.02
hom
osch
edas
tic
0.01
0.015
0.02
FEDFUNDS RMSFE
heteroskedastic0.1 0.15 0.2 0.25
hom
osch
edas
tic
0.1
0.15
0.2
0.25
HOUST RMSFE
heteroskedastic0.03680.0370.03720.03740.0376
hom
osch
edas
tic
0.0368
0.037
0.0372
0.0374
0.0376
S&P 500 RMSFE
heteroskedastic0.023 0.0235 0.024
hom
osch
edas
tic
0.0228
0.023
0.0232
0.0234
0.0236
0.0238
0.024
EXUSUKx RMSFE
heteroskedastic0.5 0.6 0.7 0.8 0.9
hom
osch
edas
tic
0.5
0.6
0.7
0.8
0.9
T1YFFM RMSFE
heteroskedastic0.6 0.8 1 1.2 1.4 1.6
hom
osch
edas
tic
0.6
0.8
1
1.2
1.4
1.6
T10YFFM RMSFE
heteroskedastic0.6 0.8 1 1.2 1.4 1.6 1.8
hom
osch
edas
tic
0.6
0.8
1
1.2
1.4
1.6
1.8
BAAFFM RMSFE
heteroskedastic4 5 6 7
hom
osch
edas
tic
44.5
55.5
66.5
77.5
NAPMNOI RMSFE
Figure 16: Comparison of point forecast accuracy. Each panel describes a different variable. The x
axis reports the RMSFE obtained using the BVAR with stochastic volatility (heteroschedastic), the
y axis reports the RMSFE obtained using the homoschedastic BVAR. Each point corresponds to a
different forecast horizon from 1 to 12 step-ahead (in most cases, a higher RMSFE corresponds to a
longer forecast horizon).
RMSFE comparison: homoskedastic model (y axis) vs heteroskedastic model (x axis)
Conclusions
Conclusions
The assumptions of conjugacy and homoskedasticity in a VARs are hardly
defendable, but a more general specification is only manageable with a smallcross-section.
We have proposed a new estimation method VARs with non-conjugate priors anddrifting volatilities which can be applied with large models
The method is based on a straightforward triangularization of the system, and it is
very simple to implement.
Indeed, if a researcher already has algorithms to produce draws from a VAR with anindependent N-IW prior and stochastic volatility, only a single needs to be slightlymodified with a few lines of code.
Given its simplicity and the advantages in terms of speed, mixing, and convergence,we argue that the proposed algorithm should be preferred in empirical applications,especially those involving large datasets.
Carriero, Clark, Marcellino () Large VARs September 2017 21 / 22
Conclusions
Prior dependence
We assumed that the prior variance was diagonal. This can be relaxed.
With a prior dependent across equations, the general form of the posterior can be
obtained using the triangularization also on the joint prior distribution, and is:
Πj|Π1:j−1,A,ΛT , y ∼ N(µΠj |1:j−1 ,ΩΠj |1:j−1 )
with
µΠj |1:j−1 = ΩΠj |1:j−1
T
∑t=1
Xj ,th−1j ,t y
∗′j ,t +Ω−1
Πj |1:j−1µΠj |1:j−1
Ω−1Πj |1:j−1 = Ω−1Πj |1:j−1 +
T
∑t=1
Xj ,th−1j ,t X
′j ,t ,
where µΠj |1:j−1 and ΩΠj |1:j−1 are moments of
Πj|Π1:j−1 ∼ N(µΠj |1:j−1 ,ΩΠj |1:j−1 ), i.e. the conditional priors implied by the
joint prior specification.
The moments of Πj|Π1:j−1 can be found recursively from the joint prior
Carriero, Clark, Marcellino () Large VARs September 2017 22 / 22
Forecasting
Model size, stochastic volatility, and forecasting
Pseudo out of sample exercise performed recursively, starting with the estimationsample 1960:3 to 1970:2 and ending with 1960:3 to 2014:5.
We consider four models.
1 A small homoskedastic VAR including the growth rate of industrial production
(∆ ln IP), the inflation rate based on consumption expenditures (∆ lnPECEPI )and the effective Federal Funds Rate (FFR).
2 A large (20 variables) homoskedastic VAR along the lines of Carriero, Clark,
and Marcellino (2015), Giannone, Lenza, and Primiceri (2015), and Koop
(2013).3 A small VAR with time variation in volatilities along the lines of Clark (2011),
Cogley and Sargent (2005) and Primiceri (2005).4 The fourth model includes both time variation in the volatilities and a large
(20 variables) information set.
Carriero, Clark, Marcellino () Large VARs September 2017 19 / 22
Forecasting
Forecasting
Direct effects:
The use of a larger dataset improves point forecasts via a better specification
of the conditional means.
The inclusion of time variation in volatilities improves density forecasts via a
better modelling of error variances,
Interactions:
A better point forecast improves the density forecast as well, by centering the
predictive density around a more reliable mean
Time varying volatilities improve the point forecasts at longer horizons -
because the heteroskedastic model will provide more effi cient estimates
(through a GLS argument) and a therefore a better characterization of the
predictive densities
Carriero, Clark, Marcellino () Large VARs September 2017 20 / 22