Shrinkage Estimation of Vector Autoregressive Models

Pawin Siriprapanukul

pawin@econ.tu.ac.th

11 January 2010

Introduction (1)

• We want to forecast:
  – The rate of growth of employment,
  – The change in annual inflation,
  – The change in the federal funds rate.

• A standard and simple system approach in economics is the VAR.

Introduction (2)

• OLS provides the efficient estimator for the VAR.

• However, there is considerable evidence that Bayesian VARs outperform unrestricted OLS VARs in out-of-sample forecasting:
  – Litterman (1986), and Robertson and Tallman (1999).

Introduction (3)

• Banbura et al. (2008) also show that it is possible and satisfactory to employ many endogenous variables with long lags in the Bayesian VAR (131 variables, 13 lags).

• We see some studies following this direction.

Introduction (4)

• There is another related literature on forecasting with a large number of predictors in the model.

• A popular method is the “Approximate Factor Model”, proposed by Stock and Watson (2002).

Introduction (5)

• In this literature, it was shown that using a larger number of predictors (independent variables) does not always improve forecasting performance.

• Bai and Ng (2008) show that selecting variables with the LASSO or the elastic net before applying the approximate factor model can outperform bigger models.

Introduction (6)

• Although they interpret their results differently, we see this as evidence of redundancy in models with many predictors.

• Now, considering a VAR with many endogenous variables and long lags, we think redundancy should be present as well.

Introduction (7)

• We have not yet considered VARs with many endogenous variables, but we are working with 13 lags in the VAR.

Bias-Variance Tradeoff (1)

• Suppose the OLS estimate is unbiased.

• Gauss-Markov Theorem:
  – The OLS estimate has the smallest variance among all linear unbiased estimates.

• However, we know that there are some biased estimates that have smaller variances than the OLS estimate.
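The variance reduction can be checked in closed form for a fixed regressor matrix. The following is a minimal sketch, assuming ridge regression as the biased (shrinkage) estimator; the design `X`, the coefficients `beta`, the noise variance `sigma2`, and the penalty `lam` are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 40, 10                      # hypothetical sample size and number of regressors
X = rng.standard_normal((T, k))
beta = rng.standard_normal(k)      # hypothetical true coefficients
sigma2, lam = 1.0, 5.0             # assumed noise variance and ridge penalty

XtX = X.T @ X
ols_cov = sigma2 * np.linalg.inv(XtX)        # Var(b_ols) = sigma^2 (X'X)^{-1}
W = np.linalg.inv(XtX + lam * np.eye(k))
ridge_cov = sigma2 * W @ XtX @ W             # Var(b_ridge) for fixed X
ridge_bias = W @ XtX @ beta - beta           # E[b_ridge] - beta

total_var_ols = np.trace(ols_cov)
total_var_ridge = np.trace(ridge_cov)
bias_sq_ridge = ridge_bias @ ridge_bias
print(total_var_ols, total_var_ridge, bias_sq_ridge)
```

The ridge estimator trades a nonzero squared bias for a smaller total variance; whether its mean squared error is lower overall depends on `lam` and on the true coefficients.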

Bias-Variance Tradeoff (2)

[Figure: two estimators scattered around the true model. OLS: unbiased, but high variance. Shrinkage: biased, but small variance.]

VAR (1)

• We consider a VAR relationship.

• Note that we cannot write the same bias-variance tradeoff for the VAR.
  – The OLS estimate is biased in finite samples.

• We still think similar logic applies. However, the direction of shrinkage may be important.

VAR (2)

• With T observations, we have:

$$Y = XB + U,$$

where

$$Y = (Y_1 \ldots Y_T)', \quad X = (X_1 \ldots X_T)', \quad X_t = (Y_{t-1}' \ldots Y_{t-p}')', \quad B = (A_1 \ldots A_p)', \quad U = (U_1 \ldots U_T)'.$$

We assume

$$\operatorname{vec}(U) \sim N(0, \Sigma_u \otimes I).$$

VAR (3)

• The unrestricted OLS estimator is:

$$\hat{B}_i^{(ols)} = (X'X)^{-1}(X'Y_i).$$

• This estimator may not be defined if we have too many endogenous variables or too many lags.
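The stacking of lags into X and the OLS formula can be sketched as follows. This is an illustrative sketch only: the helper names `build_var_matrices` and `ols_var` are hypothetical, and the data are random placeholders rather than the data set used in the talk.

```python
import numpy as np

def build_var_matrices(data, p):
    """Stack a (T_total x n) array into Y (T x n) and X (T x n*p), where
    row t of X is (y_{t-1}', ..., y_{t-p}')."""
    T_total, n = data.shape
    Y = data[p:]
    X = np.hstack([data[p - k:T_total - k] for k in range(1, p + 1)])
    return Y, X

def ols_var(Y, X):
    # Solve (X'X) B = X'Y; column i of B holds the coefficients of equation i.
    return np.linalg.solve(X.T @ X, X.T @ Y)

rng = np.random.default_rng(1)
data = rng.standard_normal((200, 3))     # placeholder data with n = 3 variables
Y, X = build_var_matrices(data, p=2)
B_ols = ols_var(Y, X)
print(B_ols.shape)
```

When the number of regressors n*p exceeds the number of rows T, X'X is singular and `np.linalg.solve` fails, which is the case in which the OLS estimator is not defined.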

Bayesian VAR (1)

• This is a shrinkage regression.

• We follow Kadiyala and Karlsson (1997) and Banbura et al. (2008) in using the Normal-(Inverted)-Wishart as our prior distribution.

• We work with stationary and demeaned variables. Hence, we set the mean of the prior distribution to zero.

Bayesian VAR (2)

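• We can write the (point) estimator of our Bayesian VAR as:

$$\hat{B}_i^{(bvar)} = (\lambda^{(bvar)} \tilde{\Omega}^{-1} + X'X)^{-1}(X'Y_i),$$

• where

$$\tilde{\Omega} = \operatorname{diag}\left(\frac{1}{1^{\pi}\sigma_1^2}, \ldots, \frac{1}{1^{\pi}\sigma_n^2}; \ldots; \frac{1}{p^{\pi}\sigma_1^2}, \ldots, \frac{1}{p^{\pi}\sigma_n^2}\right).$$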

Ridge Regression (1)

• Well-known in the statistical literature.

• Can be defined as:

$$\hat{B}_i^{(rr)} = \arg\min_{B_i} \sum_{t=1}^{T} \Big( y_{it} - \sum_{j=1}^{np} b_{ji} x_{jt} \Big)^2 + \lambda^{(rr)} \sum_{j=1}^{np} b_{ji}^2.$$

• This is a regression that imposes a penalty on the size of the estimated coefficients.

Ridge Regression (2)

• The solution of the previous problem is:

$$\hat{B}_i^{(rr)} = (\lambda^{(rr)} I + X'X)^{-1}(X'Y_i).$$

• Observe the similarity with:

$$\hat{B}_i^{(bvar)} = (\lambda^{(bvar)} \tilde{\Omega}^{-1} + X'X)^{-1}(X'Y_i).$$

BVAR v RR (1)

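• Proposition 1:
  – The BVAR estimator can be seen as the solution of the optimization problem:

$$\hat{B}_i^{(bvar)} = \arg\min_{B_i} \sum_{t=1}^{T} \Big( y_{it} - \sum_{j=1}^{np} b_{ji} x_{jt} \Big)^2 + \lambda^{(bvar)} \sum_{j=1}^{np} \omega_j^{-1} b_{ji}^2,$$

  – where $\omega_j^{-1}$ is the (j,j)-th element of the matrix $\tilde{\Omega}^{-1}$.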
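Proposition 1 can be checked numerically: the closed-form estimator with a diagonal penalty matrix should satisfy the first-order condition of the weighted penalized least-squares problem. The sketch below assumes an arbitrary positive vector `weights` standing in for the diagonal of the inverse prior matrix; the function names and all numbers are hypothetical.

```python
import numpy as np

def penalized_ls(X, y, lam, weights):
    """Minimise ||y - X b||^2 + lam * sum_j weights[j] * b[j]^2 in closed form."""
    return np.linalg.solve(X.T @ X + lam * np.diag(weights), X.T @ y)

def objective(X, y, b, lam, weights):
    resid = y - X @ b
    return resid @ resid + lam * np.sum(weights * b**2)

rng = np.random.default_rng(2)
X = rng.standard_normal((60, 8))
y = rng.standard_normal(60)
weights = rng.uniform(0.5, 2.0, size=8)   # stand-in for the inverse prior weights
lam = 3.0

b_hat = penalized_ls(X, y, lam, weights)
# First-order condition: -2 X'(y - X b) + 2 lam * w * b = 0 at the minimiser.
grad = -2 * X.T @ (y - X @ b_hat) + 2 * lam * weights * b_hat
# The objective is strictly convex, so a perturbed point cannot do better.
b_other = b_hat + 0.01 * rng.standard_normal(8)
obj_hat = objective(X, y, b_hat, lam, weights)
obj_other = objective(X, y, b_other, lam, weights)
print(np.max(np.abs(grad)), obj_hat, obj_other)
```

Setting every weight to one recovers ordinary ridge regression, which is the sense in which the two estimators are similar.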

BVAR v RR (2)

• Proposition 2:
  – Let $X^* = X\tilde{\Omega}^{1/2}$; we have:

$$\hat{B}_i^{(bvar)\prime} X_{T+1} = \hat{y}_{i,T+1|T}^{(bvar)} = \hat{y}_{i,T+1|T}^{(rr)*} = \hat{B}_i^{(rr)*\prime} X_{T+1}^*,$$

  – where

$$\hat{B}_i^{(rr)*} = (\lambda^{(rr)} I + X^{*\prime}X^*)^{-1}(X^{*\prime}Y_i).$$

• Note: If $\pi = 0$, $X^*$ is just standardized $X$.
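The forecast equivalence in Proposition 2 can be verified numerically for a single equation. This is a sketch under stated assumptions: `omega` is a hypothetical positive diagonal for the prior matrix, the same penalty `lam` is used in both estimators, and the data are random.

```python
import numpy as np

rng = np.random.default_rng(3)
T, k = 80, 6
X = rng.standard_normal((T, k))
y = rng.standard_normal(T)
x_next = rng.standard_normal(k)            # regressor vector for period T+1
omega = rng.uniform(0.2, 3.0, size=k)      # hypothetical diagonal of the prior matrix
lam = 2.0

# Weighted-penalty (BVAR-style) estimator on the original regressors.
b_bvar = np.linalg.solve(X.T @ X + lam * np.diag(1.0 / omega), X.T @ y)

# Ordinary ridge on the rescaled regressors X* = X Omega^{1/2}.
scale = np.sqrt(omega)
X_star = X * scale                         # scales column j by omega_j^{1/2}
b_rr_star = np.linalg.solve(X_star.T @ X_star + lam * np.eye(k), X_star.T @ y)

forecast_bvar = b_bvar @ x_next
forecast_rr = b_rr_star @ (x_next * scale)
print(forecast_bvar, forecast_rr)
```

The coefficients differ by the rescaling (the ridge coefficients equal the BVAR coefficients divided by `scale`), but the forecasts coincide because the same rescaling is applied to the period T+1 regressors.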

LASSO (1)

• Least Absolute Shrinkage and Selection Operator.

• The LASSO estimate can be defined as:

$$\hat{B}_i^{(lasso)} = \arg\min_{B_i} \sum_{t=1}^{T} \Big( y_{it} - \sum_{j=1}^{np} b_{ji} x_{jt} \Big)^2 + \lambda^{(lasso)} \sum_{j=1}^{np} |b_{ji}|.$$

LASSO (2)

• LASSO was proposed because:
  – Ridge regression is not parsimonious.
  – Ridge regression may generate huge prediction errors when the matrix of true (unknown) coefficients is sparse.

• LASSO can outperform RR if:
  – The true (unknown) coefficients include a lot of zeros.

LASSO (3)

• If there are a lot of irrelevant variables in the model, setting their coefficients to zero can reduce variance without adding much bias.

• We think a VAR with 13 lags may contain a lot of irrelevant variables.
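The contrast between LASSO and ridge under a sparse truth can be illustrated on simulated data. The sketch below uses plain coordinate descent, which is a different algorithm from the LARS procedure mentioned later in the talk; the function `lasso_cd`, the penalty value, and the data-generating process are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, g):
    return np.sign(z) * max(abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=500):
    """Coordinate descent for min ||y - X b||^2 + lam * sum_j |b_j|."""
    k = X.shape[1]
    b = np.zeros(k)
    col_ss = np.sum(X**2, axis=0)
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(k):
            resid += X[:, j] * b[j]            # remove coordinate j from the fit
            rho = X[:, j] @ resid
            b[j] = soft_threshold(rho, lam / 2.0) / col_ss[j]
            resid -= X[:, j] * b[j]            # put the updated coordinate back
    return b

rng = np.random.default_rng(4)
T, k = 120, 20
X = rng.standard_normal((T, k))
beta = np.zeros(k)
beta[:3] = [2.0, -1.5, 1.0]                    # sparse truth: 17 of 20 coefficients are zero
y = X @ beta + 0.5 * rng.standard_normal(T)

b_lasso = lasso_cd(X, y, lam=40.0)
b_ridge = np.linalg.solve(X.T @ X + 40.0 * np.eye(k), X.T @ y)
print(np.count_nonzero(b_lasso), np.count_nonzero(b_ridge))
```

Ridge shrinks every coefficient but leaves all of them nonzero, while the soft-thresholding step in LASSO sets small coefficients exactly to zero.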

The Elastic Net (1)

• Zou and Hastie (2005) propose another estimate that can further improve the performance of LASSO.

• It is called the elastic net, and the naïve version can be defined as:

$$\hat{B}_i^{(nen)} = \arg\min_{B_i} \sum_{t=1}^{T} \Big( y_{it} - \sum_{j=1}^{np} b_{ji} x_{jt} \Big)^2 + \lambda_1^{(en)} \sum_{j=1}^{np} |b_{ji}| + \lambda_2^{(en)} \sum_{j=1}^{np} b_{ji}^2.$$

The Elastic Net (2)

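• We modify the elastic net to allow different lagged variables to be treated differently.

• Our modified naïve elastic net is:

$$\hat{B}_i^{(nen)} = \arg\min_{B_i} \sum_{t=1}^{T} \Big( y_{it} - \sum_{j=1}^{np} b_{ji} x_{jt} \Big)^2 + \lambda_1^{(en)} \sum_{j=1}^{np} \omega_j^{-1/2} |b_{ji}| + \lambda_2^{(en)} \sum_{j=1}^{np} \omega_j^{-1} b_{ji}^2.$$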
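One way to see what the modification does is to note that a weighted elastic net on X is an unweighted naïve elastic net on rescaled regressors. The sketch below checks this with a simple coordinate-descent solver, assuming weights of the form omega^{-1/2} on the absolute-value penalty and omega^{-1} on the squared penalty; the solver `weighted_enet_cd`, the vector `omega`, and the penalty values are hypothetical, and this is not the LARS-based implementation used in the talk.

```python
import numpy as np

def weighted_enet_cd(X, y, lam1, lam2, w1, w2, n_iter=1000):
    """Coordinate descent for
    min ||y - X b||^2 + lam1 * sum_j w1[j]*|b_j| + lam2 * sum_j w2[j]*b_j^2."""
    k = X.shape[1]
    b = np.zeros(k)
    col_ss = np.sum(X**2, axis=0)
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(k):
            resid += X[:, j] * b[j]            # remove coordinate j from the fit
            rho = X[:, j] @ resid
            shrunk = np.sign(rho) * max(abs(rho) - lam1 * w1[j] / 2.0, 0.0)
            b[j] = shrunk / (col_ss[j] + lam2 * w2[j])
            resid -= X[:, j] * b[j]
    return b

rng = np.random.default_rng(5)
T, k = 100, 12
X = rng.standard_normal((T, k))
y = X[:, 0] - 0.5 * X[:, 1] + 0.5 * rng.standard_normal(T)
omega = rng.uniform(0.3, 2.0, size=k)          # hypothetical prior-scale weights
lam1, lam2 = 10.0, 5.0

# Weighted problem on the original regressors.
b_mod = weighted_enet_cd(X, y, lam1, lam2, omega**-0.5, omega**-1.0)

# Unweighted naive elastic net on X* = X Omega^{1/2}, mapped back via b = Omega^{1/2} b*.
ones = np.ones(k)
b_star = weighted_enet_cd(X * np.sqrt(omega), y, lam1, lam2, ones, ones)
max_gap = np.max(np.abs(b_mod - np.sqrt(omega) * b_star))
print(max_gap)
```

Under this reading, any solver for the ordinary elastic net can be reused by rescaling the columns of X first and rescaling the coefficients back afterwards.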

Implementation

• We can use the algorithm called “LARS” proposed by Efron, Hastie, Johnstone, and Tibshirani (2004) to implement both LASSO and EN efficiently.

• This can be applied to our modified version as well.

Empirical Study (1)

• I use the US data set from Stock and Watson (2005).
  – Monthly data cover Jan 1959 – Dec 2003.
  – There are 132 variables, but I use only 7.

• I transformed the data as in De Mol, Giannone, and Reichlin (2008) to obtain stationarity.
  – Their replication file can be downloaded.
  – Their transformation makes every variable an annual growth rate or a change in annual growth.

Empirical Study (2)

• Out-of-sample performance.
  – In each month from Jan 1981 to Dec 2003 (276 times), estimate each model using the most recent 120 observations and make one forecast.
  – Performance is measured using the Relative Mean Squared Forecast Error (RMSFE), with OLS as the benchmark regression.
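The rolling-window evaluation can be sketched as follows for one-step-ahead forecasts. This is a simplified illustration: it uses placeholder random data, ridge regression with an arbitrary penalty as the shrinkage method, and hypothetical helper names; the longer horizons used in the talk are not covered.

```python
import numpy as np

def rolling_forecast_errors(Y, X, fit, window=120):
    """At each date, fit on the latest `window` rows and forecast the next row of Y."""
    errors = []
    for end in range(window, Y.shape[0]):
        B = fit(X[end - window:end], Y[end - window:end])
        errors.append(Y[end] - X[end] @ B)
    return np.array(errors)

def ols_fit(X, Y):
    return np.linalg.lstsq(X, Y, rcond=None)[0]

def make_ridge_fit(lam):
    def fit(X, Y):
        return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
    return fit

rng = np.random.default_rng(6)
data = rng.standard_normal((400, 3))          # placeholder series, not the actual data
p = 13
Y = data[p:]
X = np.hstack([data[p - k:len(data) - k] for k in range(1, p + 1)])

e_ols = rolling_forecast_errors(Y, X, ols_fit)
e_shr = rolling_forecast_errors(Y, X, make_ridge_fit(50.0))
rmsfe = np.mean(e_shr**2, axis=0) / np.mean(e_ols**2, axis=0)   # one ratio per variable
print(rmsfe)
```

A ratio below one for a variable means the shrinkage method has a smaller mean squared forecast error than the OLS benchmark for that variable.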

Empirical Study (3)

• There are 3 variables that we want to forecast:
  – Employment (EMPL)
  – Annual inflation (INF)
  – The federal funds rate (FFR).

• The order of the VAR is p = 13.

• There are 4 forecast horizons (1, 3, 6, 12), and 3 values of $\pi$ (0, 1, 2).

Empirical Study (4)

• The most time-consuming part is finding suitable parameters for each regression.

• We use grid searches on out-of-sample performance during the test period Jan 1971 – Dec 1980 (120 times).
  – Bayesian VAR: We employ the process in my previous chapter.
  – LASSO: A grid of 90 values.
  – Modified Elastic Net: A grid of 420 pairs of values.
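A grid search of this kind can be sketched in a few lines. The example below is only illustrative: it tunes a single ridge penalty on simulated data, the five-value `grid` is hypothetical (not the 90-value or 420-pair grids of the talk), and the function `validation_msfe` is an assumed helper.

```python
import numpy as np

def validation_msfe(X, Y, lam, window, start, stop):
    """Mean squared one-step forecast error over rows start..stop-1, rolling window."""
    sq = []
    for end in range(start, stop):
        Xw, Yw = X[end - window:end], Y[end - window:end]
        B = np.linalg.solve(Xw.T @ Xw + lam * np.eye(X.shape[1]), Xw.T @ Yw)
        sq.append((Y[end] - X[end] @ B) ** 2)
    return float(np.mean(sq))

rng = np.random.default_rng(7)
X = rng.standard_normal((300, 10))
Y = X[:, :2] @ np.array([[1.0], [-1.0]]) + rng.standard_normal((300, 1))

grid = [0.01, 0.1, 1.0, 10.0, 100.0]          # hypothetical penalty grid
scores = [validation_msfe(X, Y, lam, window=120, start=120, stop=240) for lam in grid]
best_lam = grid[int(np.argmin(scores))]
print(best_lam)
```

The chosen value is then held fixed for the later evaluation period, so that the tuning and the reported forecast errors use different dates.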

Empirical Study (5)

• We also employ a combination of LASSO and Bayesian VAR.
  – LASSO discards variables that tend to correspond to zero true coefficients.
  – Bayesian VAR is similar to ridge regression, which assigns a better amount of shrinkage to positive coefficients.

Empirical Study (6)

• For the smallest model, we use the 3 variables to forecast themselves.

                  BVAR              LASSO           MNEN
pi = 0   h = 1    173.13   1.943    0.25   2.005    0.50, 1       2.004
         h = 3    216.26   2.062    0.23   2.880    0.46, 1       2.088
         h = 6    318.88   2.021    0.20   1.969    0.22, 0.1     1.969
         h = 12   108.51   2.700    0.26   2.739    0.54, 1       2.738
pi = 1   h = 1     53.28   1.940    0.21   2.014    0.42, 1       2.009
         h = 3     38.10   2.069    0.17   2.113    0.38, 1       2.109
         h = 6     50.30   2.044    0.20   2.009    0.30, 0.5     2.008
         h = 12    13.32   2.714    0.27   2.795    0.62, 1       2.786
pi = 2   h = 1     25.25   1.988    0.15   2.070    0.86, 10      2.025
         h = 3      8.65   2.112    0.12   2.155    0.90, 5       2.130
         h = 6      9.64   2.084    0.15   2.048    0.14, 0.001   2.050
         h = 12     1.28   2.733    0.28   2.806    0.86, 1       2.739

Empirical Study (7)

BVAR

                    pi = 0   pi = 1   pi = 2
                    RMSFE    RMSFE    RMSFE
h = 1    EMPL       0.765    0.778    0.790
         FFR        0.448    0.409    0.400
         INF        0.723    0.743    0.776
         average    0.645    0.643    0.655
h = 3    EMPL       0.862    0.825    0.811
         FFR        0.616    0.580    0.551
         INF        0.696    0.719    0.748
         average    0.725    0.708    0.703
h = 6    EMPL       0.870    0.858    0.855
         FFR        0.524    0.519    0.508
         INF        0.779    0.800    0.821
         average    0.724    0.726    0.728
h = 12   EMPL       0.804    0.812    0.825
         FFR        0.490    0.471    0.459
         INF        0.677    0.694    0.707
         average    0.657    0.659    0.664

Empirical Study (8)

LASSO

                    pi = 0   pi = 1   pi = 2
                    RMSFE    RMSFE    RMSFE
h = 1    EMPL       0.788    0.804    0.809
         FFR        0.462    0.447    0.453
         INF        0.681    0.681    0.715
         average    0.644    0.644    0.659
h = 3    EMPL       0.826    0.825    0.827
         FFR        0.637    0.589    0.579
         INF        0.642    0.652    0.694
         average    0.702    0.689    0.700
h = 6    EMPL       0.847    0.878    0.893
         FFR        0.547    0.569    0.581
         INF        0.740    0.735    0.755
         average    0.711    0.728    0.743
h = 12   EMPL       0.770    0.813    0.834
         FFR        0.444    0.444    0.497
         INF        0.613    0.632    0.657
         average    0.609    0.630    0.663

Empirical Study (9)

Comparing different regressions, pi = 0.

                    BVAR     LASSO              MNEN               LASSO + BVAR
                    RMSFE    RMSFE    no. var   RMSFE    no. var   RMSFE    no. var
h = 1    EMPL       0.765    0.788    11.37     0.787    11.47     0.771    11.37
         FFR        0.448    0.462    16.00     0.462    16.07     0.476    16.00
         INF        0.723    0.681    11.72     0.681    11.79     0.712    11.72
         average    0.645    0.644              0.643              0.653
h = 3    EMPL       0.862    0.826    10.15     0.827    10.23     0.880    10.15
         FFR        0.616    0.637    14.93     0.637    15.01     0.639    14.93
         INF        0.696    0.642    10.75     0.642    10.83     0.667    10.75
         average    0.725    0.702              0.702              0.729
h = 6    EMPL       0.870    0.847     7.85     0.907     3.45     0.888     7.85
         FFR        0.524    0.547    13.45     0.487     7.94     0.549    13.45
         INF        0.779    0.740     9.27     0.804     5.11     0.760     9.27
         average    0.724    0.711              0.733              0.732
h = 12   EMPL       0.804    0.770    11.99     0.772    12.77     0.770    11.99
         FFR        0.490    0.444    16.43     0.448    17.01     0.451    16.43
         INF        0.677    0.613    12.32     0.617    12.94     0.651    12.32
         average    0.657    0.609              0.612              0.624

Empirical Study (11)

When we change to the 7-variable VAR, pi = 0.

                    BVAR     LASSO              MNEN               LASSO + BVAR
                    RMSFE    RMSFE    no. var   RMSFE    no. var   RMSFE    no. var
h = 1    EMPL       0.290    0.295    22.41     0.295    28.07     0.283    22.41
         FFR        0.125    0.137    26.02     0.139    32.11     0.133    26.02
         INF        0.197    0.212    22.25     0.216    28.36     0.196    22.25
         average    0.204    0.215              0.217              0.204
h = 3    EMPL       0.294    0.286    18.28     0.285    18.96     0.289    18.28
         FFR        0.134    0.131    22.27     0.128    23.47     0.125    22.27
         INF        0.207    0.215    18.88     0.210    19.98     0.205    18.88
         average    0.212    0.210              0.207              0.207
h = 6    EMPL       0.289    0.301     9.83     0.300    11.75     0.288     9.83
         FFR        0.107    0.110    15.66     0.110    17.44     0.107    15.66
         INF        0.147    0.146    12.23     0.145    14.17     0.141    12.23
         average    0.181    0.186              0.185              0.179
h = 12   EMPL       0.240    0.284    55.44     0.280    57.87     0.226    55.44
         FFR        0.079    0.152    55.41     0.127    57.91     0.073    55.41
         INF        0.032    0.137    55.56     0.072    58.20     0.030    55.56
         average    0.117    0.191              0.160              0.110

Conclusion

• Although the empirical results are not impressive, we still think this is a promising way to improve the performance of Bayesian VARs.

• When the model becomes bigger, e.g. models with 131 endogenous variables, this should be more relevant.

• Cautions like those of Boivin and Ng (2006) may apply to the VAR as well.

Thank you very much.