Shrinkage Estimation of Vector Autoregressive Models


Shrinkage Estimation of Vector Autoregressive Models

Pawin Siriprapanukul

pawin@econ.tu.ac.th

11 January 2010

Introduction (1)

• We want to forecast:
– The rate of growth of employment,
– The change in annual inflation,
– The change in the federal funds rate.

• A standard and simple system approach in economics is the VAR.

Introduction (2)

• OLS provides an efficient estimator for the VAR.

• However, there is a lot of evidence showing that the Bayesian VAR outperforms the unrestricted OLS VAR in out-of-sample forecasting:
– Litterman (1986), and Robertson and Tallman (1999).

Introduction (3)

• Banbura et al. (2008) also show that it is feasible and effective to employ many endogenous variables with long lags in a Bayesian VAR (131 variables, 13 lags).

• We see some studies following this direction.

Introduction (4)

• There is another related literature on forecasting with a large number of predictors in the model.

• A popular method is the “Approximate Factor Model”, proposed by Stock and Watson (2002).

Introduction (5)

• In this literature, it has been shown that using a larger number of predictors (independent variables) does not always improve forecasting performance.

• Bai and Ng (2008) show that selecting variables with the LASSO or the elastic net before applying the approximate factor model methodology can outperform larger models.

Introduction (6)

• Even though they interpret their results differently, we see this as evidence of redundancy in models with many predictors.

• Now, considering VARs with many endogenous variables and long lags, we think that redundancy should be present as well.

Introduction (7)

• We have not yet gone into VARs with many endogenous variables, but we are working with 13 lags in the VAR.

Bias-Variance Tradeoff (1)

• Suppose the OLS estimate is unbiased.

• Gauss-Markov Theorem:
– The OLS estimate has the smallest variance among all linear unbiased estimates.

• However, we know that there are some biased estimates that have smaller variances than the OLS estimate.

Bias-Variance Tradeoff (2)

[Diagram: the OLS estimator is unbiased but has high variance; a shrinkage estimator is biased but has small variance, so its estimates cluster closer to the true model.]
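To make the tradeoff concrete, here is a minimal Monte Carlo sketch (not from the original slides) comparing OLS with a ridge-type shrinkage estimator on a simulated regression; the data-generating process and the penalty value are illustrative assumptions.

```python
import numpy as np

# Minimal bias-variance illustration: OLS is unbiased but noisy; ridge shrinkage
# adds bias but cuts variance, often lowering the overall coefficient MSE.
rng = np.random.default_rng(0)
n, k, lam, n_sims = 40, 10, 5.0, 2000
beta_true = np.zeros(k)
beta_true[:3] = [1.0, 0.5, -0.5]          # many true coefficients are zero

ols_err, ridge_err = [], []
for _ in range(n_sims):
    X = rng.normal(size=(n, k))
    y = X @ beta_true + rng.normal(size=n)
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)
    b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)
    ols_err.append(np.sum((b_ols - beta_true) ** 2))
    ridge_err.append(np.sum((b_ridge - beta_true) ** 2))

print("mean squared error, OLS  :", np.mean(ols_err))
print("mean squared error, ridge:", np.mean(ridge_err))
```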

VAR (1)

• We consider a VAR relationship.

• Note here that we cannot write the bias-variance tradeoff for the VAR.
– The OLS estimate is biased in finite samples.

• We still think a similar logic applies; however, the direction of shrinkage may be important.

VAR (2)

• With T observations, we have:

  Y = X B + U,

• where

  Y = (Y_1, …, Y_T)',
  X = (X_1, …, X_T)',  with  X_t = (Y'_{t-1}, …, Y'_{t-p})',
  B = (A_1, …, A_p)',
  U = (U_1, …, U_T)'.

• We assume vec(U) ~ N(0, Ψ_u ⊗ I), i.e., the rows U_t are i.i.d. N(0, Ψ_u).

VAR (3)

• The unrestricted OLS estimator is:

  B̂_i^(ols) = (X'X)^(−1) (X'Y_i).

• This estimator may not be defined if we have too many endogenous variables or too many lags.
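As a concrete illustration of the notation above, the sketch below stacks a toy data set into Y and the lag matrix X and computes the equation-by-equation OLS estimator; the simulated series and dimensions are assumptions for illustration only.

```python
import numpy as np

def build_var_matrices(data, p):
    """Stack a (T+p) x n data array into Y (T x n) and X (T x n*p) with
    X_t = (Y'_{t-1}, ..., Y'_{t-p})', matching the slide's notation."""
    T = data.shape[0] - p
    Y = data[p:]
    X = np.hstack([data[p - l: p - l + T] for l in range(1, p + 1)])
    return Y, X

rng = np.random.default_rng(1)
data = rng.normal(size=(133, 3))           # toy stationary series: 3 variables
Y, X = build_var_matrices(data, p=13)      # 13 lags, as in the study

# Unrestricted OLS, equation by equation: B_hat_i = (X'X)^{-1} X'Y_i.
B_ols = np.linalg.solve(X.T @ X, X.T @ Y)  # (n*p x n); column i is B_hat_i
print(B_ols.shape)                         # (39, 3)
```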

Bayesian VAR (1)

• This is a shrinkage regression.

• We follow Kadiyala and Karlsson (1997) and Banbura et al. (2008) and use the Normal-(Inverted)-Wishart as our prior distribution.

• We work with stationary and demeaned variables. Hence, we set the mean of the prior distribution to zero.

Bayesian VAR (2)

• We can write the (point) estimator of our Bayesian VAR as:

  B̂_i^(bvar) = (X'X + Ω^(−1))^(−1) (X'Y_i),

• where

  Ω = diag( ω_{1,1}, …, ω_{1,n} ; … ; ω_{p,1}, …, ω_{p,n} ),

with ω_{l,j} the prior variance of the coefficient on variable j at lag l (set from the overall tightness hyperparameter, the lag l, and the residual scale σ_j of variable j).
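A minimal sketch of how the closed-form point estimator above could be computed. The way Ω is filled in here (a Minnesota-style 1/l² decay with crude scale estimates) is an illustrative assumption, not necessarily the exact prior used in the study.

```python
import numpy as np

def bvar_point_estimator(X, Y, omega_diag):
    """Posterior-mean-type estimator (X'X + Omega^{-1})^{-1} X'Y
    for a zero-mean prior with diagonal prior variance Omega."""
    Omega_inv = np.diag(1.0 / omega_diag)
    return np.linalg.solve(X.T @ X + Omega_inv, X.T @ Y)

def example_omega(n, p, lam, sigma):
    """Illustrative prior variances lam^2 / (l^2 * sigma_j^2) for the coefficient
    on variable j at lag l (a Minnesota-style decay; an assumption here)."""
    return np.array([lam**2 / (l**2 * sigma[j]**2)
                     for l in range(1, p + 1) for j in range(n)])

rng = np.random.default_rng(2)
n, p, T = 3, 13, 120
X = rng.normal(size=(T, n * p))
Y = rng.normal(size=(T, n))
sigma = Y.std(axis=0)                      # crude per-variable scale estimates
B_bvar = bvar_point_estimator(X, Y, example_omega(n, p, lam=0.2, sigma=sigma))
print(B_bvar.shape)                        # (39, 3)
```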

Ridge Regression (1)

• Well-known in the statistics literature.
• It can be defined as:

  B̂_i^(rr) = argmin_{B_i} Σ_{t=1}^{T} ( y_it − Σ_{j=1}^{np} b_ji x_jt )² + λ Σ_{j=1}^{np} b_ji².

• This is a regression that imposes a penalty on the size of the estimated coefficients.

Ridge Regression (2)

• The solution of the previous problem is:

  B̂_i^(rr) = (λI + X'X)^(−1) (X'Y_i).

• Observe the similarity with:

  B̂_i^(bvar) = (X'X + Ω^(−1))^(−1) (X'Y_i).

BVAR v RR (1)

• Proposition 1:
– The BVAR estimator can be seen as the solution of the optimization problem:

  B̂_i^(bvar) = argmin_{B_i} Σ_{t=1}^{T} ( y_it − Σ_{j=1}^{np} b_ji x_jt )² + Σ_{j=1}^{np} ω_j^(−1) b_ji²,

– where ω_j is the (j,j)-th element of the matrix Ω.

BVAR v RR (2)

• Proposition 2:
– Let X* = X Ω^(1/2). Then we have:

  ŷ_{i,T+1|T}^(bvar) = B̂_i^(bvar)' X_{T+1} = B̂_i^(rr)*' X*_{T+1} = ŷ*_{i,T+1|T}^(rr),

– where

  B̂_i^(rr)* = (I + X*'X*)^(−1) (X*'Y_i)  and  X*_{T+1} = Ω^(1/2) X_{T+1}.

• Note: If π = 0, X* is just the standardized X.
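A quick numerical check of the equivalence stated above (a sketch with randomly generated data and an arbitrary diagonal Ω): the BVAR point estimator and a unit-penalty ridge regression on the rescaled regressors X* = XΩ^(1/2) produce the same forecast.

```python
import numpy as np

rng = np.random.default_rng(3)
T, k = 120, 39
X = rng.normal(size=(T, k))
y = rng.normal(size=T)                     # one equation (Y_i)
omega = rng.uniform(0.1, 2.0, size=k)      # arbitrary diagonal prior variances

# BVAR point estimator: (X'X + Omega^{-1})^{-1} X'y
b_bvar = np.linalg.solve(X.T @ X + np.diag(1.0 / omega), X.T @ y)

# Ridge with unit penalty on the rescaled regressors X* = X Omega^{1/2}
X_star = X * np.sqrt(omega)                # column-wise rescaling
b_rr = np.linalg.solve(X_star.T @ X_star + np.eye(k), X_star.T @ y)

# Forecasts from a new regressor vector coincide, as in Proposition 2.
x_new = rng.normal(size=k)
print(x_new @ b_bvar, (x_new * np.sqrt(omega)) @ b_rr)   # identical up to rounding
```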

LASSO (1)

• Least Absolute Shrinkage and Selection Operator.

• The LASSO estimate can be defined as:

  B̂_i^(lasso) = argmin_{B_i} Σ_{t=1}^{T} ( y_it − Σ_{j=1}^{np} b_ji x_jt )² + λ Σ_{j=1}^{np} |b_ji|.
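For reference, a minimal sketch of fitting the LASSO objective above with scikit-learn; the library choice and the penalty value are assumptions. Note that sklearn's Lasso scales the squared-error term by 1/(2T), so its alpha corresponds to λ/(2T) in the slide's notation.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
T, k = 120, 39
X = rng.normal(size=(T, k))
beta = np.zeros(k); beta[:4] = [0.8, -0.6, 0.4, 0.3]    # sparse "true" coefficients
y = X @ beta + rng.normal(size=T)

lam = 20.0                                  # penalty in the slide's notation (illustrative)
# sklearn minimizes (1/(2T))||y - Xb||^2 + alpha * sum|b|, so alpha = lam / (2T).
fit = Lasso(alpha=lam / (2 * T), fit_intercept=False, max_iter=10000).fit(X, y)
print("nonzero coefficients:", np.sum(fit.coef_ != 0))
```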

LASSO (2)

• LASSO was proposed because:
– Ridge regression is not parsimonious.
– Ridge regression may generate large prediction errors when the true (unknown) coefficient vector is sparse.

• LASSO can outperform ridge regression if:
– The true (unknown) coefficients contain a lot of zeros.

LASSO (3)

• If there are a lot of irrelevant variables in the model, setting their coefficients to zero every time can reduce variance without disturbing the bias that much.

• We expect that a VAR with 13 lags may contain a lot of irrelevant variables.

The Elastic Net (1)

• Zou and Hastie (2005) propose another estimate that can further improve the performance of LASSO.

• It is called the elastic net, and the naïve version can be defined as:

  B̂_i^(nen) = argmin_{B_i} Σ_{t=1}^{T} ( y_it − Σ_{j=1}^{np} b_ji x_jt )² + λ_1 Σ_{j=1}^{np} |b_ji| + λ_2 Σ_{j=1}^{np} b_ji².

The Elastic Net (2)

• We modify the elastic net to allow treating different lagged variables differently.

• Our modified naïve elastic net is:

  B̂_i^(mnen) = argmin_{B_i} Σ_{t=1}^{T} ( y_it − Σ_{j=1}^{np} b_ji x_jt )² + λ_1 Σ_{j=1}^{np} ω_j^(−1/2) |b_ji| + λ_2 Σ_{j=1}^{np} ω_j^(−1) b_ji².
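One way the modified objective above could be implemented is by rescaling the regressors: substituting b_ji = ω_j^(1/2) b*_ji turns the weighted penalties into an ordinary naive elastic net in b*. This is a sketch under that substitution, using scikit-learn's parameterization; the authors' actual implementation (a modified LARS) is not shown here, and the weights and penalties below are illustrative.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def modified_naive_en(X, y, omega, lam1, lam2):
    """Weighted (modified) naive elastic net via column rescaling:
    with x*_j = sqrt(omega_j) * x_j and b_j = sqrt(omega_j) * b*_j, the penalties
    lam1 * sum omega_j^{-1/2}|b_j| + lam2 * sum omega_j^{-1} b_j^2 reduce to an
    ordinary naive elastic net in b*."""
    T = X.shape[0]
    X_star = X * np.sqrt(omega)
    # Map the slide's (lam1, lam2) to sklearn's (alpha, l1_ratio) parameterization.
    alpha = lam1 / (2 * T) + lam2 / T
    l1_ratio = (lam1 / (2 * T)) / alpha
    fit = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                     fit_intercept=False, max_iter=50000).fit(X_star, y)
    return np.sqrt(omega) * fit.coef_       # map coefficients back to the original scale

rng = np.random.default_rng(5)
T, k = 120, 39
X, y = rng.normal(size=(T, k)), rng.normal(size=T)
omega = rng.uniform(0.1, 2.0, size=k)       # illustrative per-coefficient weights
b = modified_naive_en(X, y, omega, lam1=15.0, lam2=5.0)
print("nonzero coefficients:", np.sum(b != 0))
```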

Implementation

• We can use the algorithm called “LARS” proposed by Efron, Hastie, Johnstone, and Tibshirani (2004) to implement both LASSO and EN efficiently.

• This can be applied to our modified version as well.
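For reference, scikit-learn exposes the LARS algorithm directly; below is a minimal sketch of computing the full LASSO path with it. The library choice is an assumption here, not the implementation used in the study.

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(6)
X = rng.normal(size=(120, 39))
y = rng.normal(size=120)

# LARS with the lasso modification returns the entire regularization path,
# which makes grid searches over the penalty cheap.
alphas, active, coefs = lars_path(X, y, method="lasso")
print(coefs.shape)          # (n_features, n_alphas): coefficients along the path
```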

Empirical Study (1)

• I use the US data set from Stock and Watson (2005).
– Monthly data cover Jan 1959 – Dec 2003.
– There are 132 variables, but I use only 7.

• I transformed the data as in De Mol, Giannone, and Reichlin (2008) to obtain stationarity.
– Their replication file can be downloaded.
– Their transformations turn every variable into an annual growth rate or the change in an annual growth rate.

Empirical Study (2)

• Out-of-sample performance:
– In each month from Jan 1981 to Dec 2003 (276 times), we estimate each model on the most recent 120 observations and make one forecast.
– Performance is measured by the Relative Mean Squared Forecast Error (RMSFE), with OLS as the benchmark regression.
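A schematic sketch of this rolling exercise on simulated data: re-estimate on the most recent window, forecast one step ahead, and report the MSFE of a shrinkage forecast relative to the OLS benchmark. Ridge stands in for the shrinkage estimators, only the one-step horizon is shown, and the penalty and toy data are assumptions; the study's own settings (h = 1, 3, 6, 12, Jan 1981 – Dec 2003) are as described above.

```python
import numpy as np

def build_lags(data, p):
    """Stack data into Y (T x n) and the lag matrix X (T x n*p)."""
    T = data.shape[0] - p
    Y = data[p:]
    X = np.hstack([data[p - l: p - l + T] for l in range(1, p + 1)])
    return Y, X

def rolling_rmsfe(data, p, window, lam, target):
    """One-step-ahead rolling evaluation: MSFE of a ridge-type forecast of
    column `target`, relative to the OLS benchmark."""
    Y, X = build_lags(data, p)
    k = X.shape[1]
    err_shrink, err_ols = [], []
    for end in range(window, Y.shape[0]):
        Xw, yw = X[end - window: end], Y[end - window: end, target]
        b_ols = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)
        b_shr = np.linalg.solve(Xw.T @ Xw + lam * np.eye(k), Xw.T @ yw)
        err_ols.append((Y[end, target] - X[end] @ b_ols) ** 2)
        err_shrink.append((Y[end, target] - X[end] @ b_shr) ** 2)
    return np.mean(err_shrink) / np.mean(err_ols)

rng = np.random.default_rng(7)
data = rng.normal(size=(300, 3))            # placeholder for the transformed series
print(rolling_rmsfe(data, p=13, window=120, lam=10.0, target=0))
```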

Empirical Study (3)

• There are 3 variables that we want to forecast:
– Employment (EMPL),
– Annual inflation (INF),
– The federal funds rate (FFR).

• The order of the VAR is p = 13.

• There are 4 forecast horizons (1, 3, 6, 12) and 3 values of π (0, 1, 2).

Empirical Study (4)

• The most time-consuming part is to figure out suitable parameters for each regression.

• We use grid searches on out-of-sample performance during the test period Jan 1971 – Dec 1980 (120 forecasts).
– Bayesian VAR: we employ the procedure from my previous chapter.
– LASSO: a grid of 90 values.
– Modified elastic net: a grid of 420 pairs of values.
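A compact sketch of the idea: pick the penalty that minimizes forecast errors over a pre-evaluation period, then fix it for the evaluation sample. The study evaluates each grid point with rolling out-of-sample forecasts over Jan 1971 – Dec 1980; the sketch below uses a single hold-out split and a ridge stand-in to keep it short, and the grid is an illustrative assumption.

```python
import numpy as np

# Hold-out grid search: choose the penalty minimizing squared forecast errors
# on a validation segment, then use it for the out-of-sample evaluation.
rng = np.random.default_rng(8)
X = rng.normal(size=(240, 39))
y = X[:, 0] * 0.5 + rng.normal(size=240)

train, valid = np.arange(0, 120), np.arange(120, 240)
grid = np.geomspace(0.1, 1000, 90)          # e.g. a grid of 90 values, as for the LASSO
scores = []
for lam in grid:
    b = np.linalg.solve(X[train].T @ X[train] + lam * np.eye(39), X[train].T @ y[train])
    scores.append(np.mean((y[valid] - X[valid] @ b) ** 2))
best_lam = grid[int(np.argmin(scores))]
print("selected penalty:", best_lam)
```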

Empirical Study (5)

• We also employ a combination of LASSO and the Bayesian VAR (sketched below):
– LASSO discards variables whose true coefficients tend to be zero.
– The Bayesian VAR is similar to ridge regression, which assigns a more suitable amount of shrinkage to the remaining (nonzero) coefficients.
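A minimal sketch of this two-step combination for one equation: LASSO first discards regressors, then a BVAR/ridge-type shrinkage estimator is fit on the survivors. The penalty values and the diagonal prior below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(9)
T, k = 120, 39
X = rng.normal(size=(T, k))
beta = np.zeros(k); beta[:5] = [0.8, -0.6, 0.5, -0.4, 0.3]
y = X @ beta + rng.normal(size=T)

# Step 1: LASSO selects a subset of regressors.
sel = Lasso(alpha=0.05, fit_intercept=False, max_iter=10000).fit(X, y).coef_ != 0

# Step 2: BVAR-style estimator (X'X + Omega^{-1})^{-1} X'y on the selected columns.
Xs = X[:, sel]
omega = np.full(Xs.shape[1], 0.5)            # illustrative prior variances
b_sel = np.linalg.solve(Xs.T @ Xs + np.diag(1.0 / omega), Xs.T @ y)
print("variables kept:", int(sel.sum()))
```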

Empirical Study (6)

• For the smallest model, we use the 3 variables to forecast themselves.

For each method, the selected tuning parameter(s) and the associated value:

                  BVAR              LASSO             MNEN
π = 0   h = 1     173.13   1.943    0.25   2.005      0.50, 1      2.004
        h = 3     216.26   2.062    0.23   2.880      0.46, 1      2.088
        h = 6     318.88   2.021    0.20   1.969      0.22, 0.1    1.969
        h = 12    108.51   2.700    0.26   2.739      0.54, 1      2.738
π = 1   h = 1      53.28   1.940    0.21   2.014      0.42, 1      2.009
        h = 3      38.10   2.069    0.17   2.113      0.38, 1      2.109
        h = 6      50.30   2.044    0.20   2.009      0.30, 0.5    2.008
        h = 12     13.32   2.714    0.27   2.795      0.62, 1      2.786
π = 2   h = 1      25.25   1.988    0.15   2.070      0.86, 10     2.025
        h = 3       8.65   2.112    0.12   2.155      0.90, 5      2.130
        h = 6       9.64   2.084    0.15   2.048      0.14, 0.001  2.050
        h = 12      1.28   2.733    0.28   2.806      0.86, 1      2.739

Empirical Study (7)

BVAR (RMSFE):

                 π = 0    π = 1    π = 2
h = 1   EMPL     0.765    0.778    0.790
        FFR      0.448    0.409    0.400
        INF      0.723    0.743    0.776
        average  0.645    0.643    0.655
h = 3   EMPL     0.862    0.825    0.811
        FFR      0.616    0.580    0.551
        INF      0.696    0.719    0.748
        average  0.725    0.708    0.703
h = 6   EMPL     0.870    0.858    0.855
        FFR      0.524    0.519    0.508
        INF      0.779    0.800    0.821
        average  0.724    0.726    0.728
h = 12  EMPL     0.804    0.812    0.825
        FFR      0.490    0.471    0.459
        INF      0.677    0.694    0.707
        average  0.657    0.659    0.664

Empirical Study (8)

LASSO (RMSFE):

                 π = 0    π = 1    π = 2
h = 1   EMPL     0.788    0.804    0.809
        FFR      0.462    0.447    0.453
        INF      0.681    0.681    0.715
        average  0.644    0.644    0.659
h = 3   EMPL     0.826    0.825    0.827
        FFR      0.637    0.589    0.579
        INF      0.642    0.652    0.694
        average  0.702    0.689    0.700
h = 6   EMPL     0.847    0.878    0.893
        FFR      0.547    0.569    0.581
        INF      0.740    0.735    0.755
        average  0.711    0.728    0.743
h = 12  EMPL     0.770    0.813    0.834
        FFR      0.444    0.444    0.497
        INF      0.613    0.632    0.657
        average  0.609    0.630    0.663

Empirical Study (9)

Comparing different regressions, π = 0:

                 BVAR     LASSO             MNEN              LASSO + BVAR
                 RMSFE    RMSFE   no. var   RMSFE   no. var   RMSFE   no. var
h = 1   EMPL     0.765    0.788   11.37     0.787   11.47     0.771   11.37
        FFR      0.448    0.462   16.00     0.462   16.07     0.476   16.00
        INF      0.723    0.681   11.72     0.681   11.79     0.712   11.72
        average  0.645    0.644             0.643             0.653
h = 3   EMPL     0.862    0.826   10.15     0.827   10.23     0.880   10.15
        FFR      0.616    0.637   14.93     0.637   15.01     0.639   14.93
        INF      0.696    0.642   10.75     0.642   10.83     0.667   10.75
        average  0.725    0.702             0.702             0.729
h = 6   EMPL     0.870    0.847    7.85     0.907    3.45     0.888    7.85
        FFR      0.524    0.547   13.45     0.487    7.94     0.549   13.45
        INF      0.779    0.740    9.27     0.804    5.11     0.760    9.27
        average  0.724    0.711             0.733             0.732
h = 12  EMPL     0.804    0.770   11.99     0.772   12.77     0.770   11.99
        FFR      0.490    0.444   16.43     0.448   17.01     0.451   16.43
        INF      0.677    0.613   12.32     0.617   12.94     0.651   12.32
        average  0.657    0.609             0.612             0.624

Empirical Study (10)

Comparing different regressions, π = 0 (same table as in Empirical Study (9)).

Empirical Study (11)

When we change to a 7-variable VAR (π = 0):

                 BVAR     LASSO             MNEN              LASSO + BVAR
                 RMSFE    RMSFE   no. var   RMSFE   no. var   RMSFE   no. var
h = 1   EMPL     0.290    0.295   22.41     0.295   28.07     0.283   22.41
        FFR      0.125    0.137   26.02     0.139   32.11     0.133   26.02
        INF      0.197    0.212   22.25     0.216   28.36     0.196   22.25
        average  0.204    0.215             0.217             0.204
h = 3   EMPL     0.294    0.286   18.28     0.285   18.96     0.289   18.28
        FFR      0.134    0.131   22.27     0.128   23.47     0.125   22.27
        INF      0.207    0.215   18.88     0.210   19.98     0.205   18.88
        average  0.212    0.210             0.207             0.207
h = 6   EMPL     0.289    0.301    9.83     0.300   11.75     0.288    9.83
        FFR      0.107    0.110   15.66     0.110   17.44     0.107   15.66
        INF      0.147    0.146   12.23     0.145   14.17     0.141   12.23
        average  0.181    0.186             0.185             0.179
h = 12  EMPL     0.240    0.284   55.44     0.280   57.87     0.226   55.44
        FFR      0.079    0.152   55.41     0.127   57.91     0.073   55.41
        INF      0.032    0.137   55.56     0.072   58.20     0.030   55.56
        average  0.117    0.191             0.160             0.110

Conclusion

• Even though the empirical results are not impressive, we still think this is a promising way to improve the performance of Bayesian VARs.

• When the model becomes bigger, e.g. models with 131 endogenous variables, this should become more relevant.

• Cautions like those of Boivin and Ng (2006) apply to the VAR as well.

Thank you very much.