
Forecasting High-Dimensional, Time-Varying Covariance

Matrices for Portfolio Selection

Jesse Windle and Carlos M. Carvalho

First draft: Oct. 2011. Current draft: December 4, 2012

Abstract

Classical methods for estimating time-varying, daily covariance between financial assets make use of daily data; however, most financial assets are traded many times within a day. This intraday information can be used to construct a "high-frequency" statistic called realized covariance that is related to daily covariance. We show that exponentially smoothing realized covariance matrices provides superior forecasts compared to a variety of FSVol-like models, even those that incorporate some high-frequency data. Further, exponentially smoothing these high-frequency statistics works when some of the assets under consideration are traded infrequently and when one considers large portfolios of assets. Given the superiority of exponential smoothing, we show how one may encase this procedure in a state-space model along the lines of Uhlig [1997] and that one may marginalize these states to estimate the few remaining parameters.

Contents

1 Introduction
2 Background
  2.1 Factor Stochastic Volatility
  2.2 Realized Covariance
  2.3 Exponential Smoothing
3 Extensions to Factor Stochastic Volatility
  3.1 Factor "Decomposition"
  3.2 Factor Log Variances
  3.3 Factor Loadings
4 Data, Evaluation, and Computation
  4.1 The Data
  4.2 Evaluation
  4.3 Prediction
5 FSVol vs. RK
6 Illiquidity
7 Robustness to Number of Assets
8 Model-based Smoothing
  8.1 An Uhlig-like Model
  8.2 Forward Filter and Backward Sampling
  8.3 Estimating n, k, and λ
  8.4 Similarity to IGARCH
9 Conclusion
A Deriving the β′m distribution

1 Introduction

Factor stochastic volatility (FSVol) is a popular tool for modeling dynamic variation and covariation of financial time series. It has been used to model foreign exchange markets [Pitt and Shephard, 1999, Aguilar and West, 2000, Lopes and Carvalho, 2007, Zhou et al., 2012], contagion across markets [Lopes and Migon, 2002], equities [Lopes and Carvalho, 2007, Carvalho et al., 2011], and interest rates [Hays et al., 2012]. FSVol assumes that each security's returns may be decomposed into the linear combination of a few common, latent factors, which are shared across all assets, and an idiosyncratic term that is unique to each asset. In particular, a p-factor model decomposes the n-dimensional vector of asset returns rt as Xft + εt, t = 1, . . . , T, where X is an n × p matrix of factor loadings, ft is a p-dimensional vector of latent, common factors, and εt is an n-dimensional vector of idiosyncratic noise, imposing the structure

    Var(rt | X) = X Var(ft) X′ + Var(εt)    (1)

on the conditional variance, where Var(εt) is diagonal, and reducing the problem of estimating an O(n²)-dimensional time series to estimating an n × p matrix and p + n time series.

In financial contexts, FSVol is typically used on the daily or longer time scales; but this ignores prices observed throughout the day, an enormous amount of data to throw away when the assets under consideration are liquid. This intraday data may be used to construct non-parametric, "high-frequency" statistics, one of which, realized covariance [Barndorff-Nielsen and Shephard, 2004], can be used as an estimate of the daily covariance matrix. Koopman et al. [2005] found that forecasts based upon such high-frequency statistics are superior to forecasts generated by low-frequency methods, such as the univariate version of FSVol. Liu [2009] shows that this holds true in the multivariate case as well, using forecasts of the daily covariance matrix generated by exponential smoothing; however, Liu does not compare high-frequency forecasts against FSVol, a gap we want to fill as FSVol is a standard method for modeling dynamic covariance matrices.

In addition to the data used, these high-frequency statistics differ from FSVol in that they impose no structure on the covariance matrix of returns. But financial theory suggests that asset prices possess a factor structure [Sharpe, 1964, Rosenberg and McKibben, 1973, Fama and French, 1993]. These explicit factor models view asset returns through the lens of linear regression, using predictors related to aggregate information about return, market capitalization, earnings per price, or book value to market value. Though these models use explicit factors, the basic idea remains: returns are driven by a few common sources of variation. Thus one would like to find models that use high-frequency data while preserving the factor structure.

To this end, we develop FSVol-like models that incorporate exogenous data, which in this paper will be data derived from high-frequency statistics, to inform the state of the factor log


variances or the factor loadings. The hope is that one may be able to use some, but not all, of the information contained in the realized covariance matrices to improve forecasts. We find that one can find slight improvements; however, concurring with Liu, we show that exponentially smoothing realized covariance matrices is superior to both FSVol and the FSVol-like models. Given the efficacy of exponential smoothing, we examine how far this forecasting procedure may be pushed, looking at illiquid and high-dimensional settings and finding that it holds up in both cases. Lastly, we show that one may extend the work of Uhlig [1997] to encase the exponential smoothing forecasting procedure in a covariance-matrix-valued state-space model. This model has closed-form forward filter and backward sampling distributions, and its states may be marginalized to estimate the few remaining system parameters. Both features ease the computational challenge that quickly emerges when modeling objects that scale like the square of the number of assets under consideration.

The outline of this paper is as follows. Section 2 reviews FSVol and realized covariance. Section 3 presents two extensions to FSVol that incorporate data from high-frequency statistics. Section 4 discusses how we compare these models and how we compute the forecasts. Section 5 presents the results of the comparison. Section 6 discusses exponentially smoothing realized kernels and illiquidity. Section 7 discusses exponential smoothing when working with a large portfolio of assets. Section 8 shows how to encase exponential smoothing in a state-space model. Section 9 concludes.

2 Background

2.1 Factor Stochastic Volatility

Factor stochastic volatility fuses factor analysis and stochastic volatility (SV). Factor analysis finds low-dimensional structure in high-dimensional data by assuming that joint movements in the response are generated by a few common factors; specifically, rt = Xft + εt, where rt is n-dimensional, X is an n × p matrix, ft is p-dimensional, and εt is n-dimensional [Basilevsky, 1994]. In factor analysis, ft and εt are independent across time. FSVol ties together these observations using stochastic volatility, that is, ft ∼ SV(µf, φf, Wf, hft) and εt ∼ SV(µε, φε, Wε, hεt). The process xt ∼ SV(µ, φ, W, ht) when

    xit ∼ N(0, ehit),  i = 1, . . . , p,
    ht = µ + φ ⊙ (ht−1 − µ) + ωt,  ωt ∼ N(0, W),

where µ and φ are p-dimensional vectors and W is a p × p covariance matrix [Jacquier et al., 1994, Kim et al., 1998]. The stochastic volatility process produces heteroscedastic white noise with heavy tails, the hallmark of financial asset returns over short time periods. One needs to restrict the matrix X for identifiability, and herein we assume that X is unit lower triangular. The covariance structure of rt, as shown in equation (1), is such that the off-diagonal elements are completely determined by X and the variance of ft. In particular, the off-diagonal elements of the high-dimensional covariance matrix of rt are a p-dimensional weighted average of the outer products xi xi′, where xi is the ith column of X. Thus the O(n²) covariance structure is determined by how the weighting of these outer products changes in time. For more reading on FSVol see Aguilar [1998], Lopes and West [2004], Chib et al. [2006], or Chib et al. [2009].
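To make the structure concrete, the following Python sketch (with toy parameters of our own choosing, not the authors' code) simulates a small FSVol system and evaluates the conditional covariance in equation (1) at a given time:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_sv(mu, phi, w_sd, T, rng):
    """x_t ~ SV(mu, phi, W, h_t): the log variances h_it follow elementwise
    AR(1) dynamics and x_it ~ N(0, exp(h_it)). W is taken diagonal here,
    with w_sd holding the square roots of its diagonal (an assumption)."""
    p = len(mu)
    h = np.array(mu, dtype=float)          # start h at its mean
    x, hs = np.empty((T, p)), np.empty((T, p))
    for t in range(T):
        h = mu + phi * (h - mu) + w_sd * rng.normal(size=p)
        x[t] = np.exp(h / 2.0) * rng.normal(size=p)
        hs[t] = h
    return x, hs

n, p, T = 4, 2, 5000
# Identity top block: a simple unit-lower-triangular choice of loadings.
X = np.vstack([np.eye(p), rng.normal(size=(n - p, p))])
f, hf = simulate_sv(np.array([-1.0, -2.0]), np.array([0.95, 0.90]),
                    0.1 * np.ones(p), T, rng)
eps, he = simulate_sv(-3.0 * np.ones(n), 0.9 * np.ones(n),
                      0.1 * np.ones(n), T, rng)
r = f @ X.T + eps                          # r_t = X f_t + eps_t

# Equation (1) at time t, conditional on the hidden log variances:
t = T - 1
Sigma_t = X @ np.diag(np.exp(hf[t])) @ X.T + np.diag(np.exp(he[t]))
print(Sigma_t.shape)  # (4, 4)
```

Note how the off-diagonal entries of Sigma_t come entirely from the term X diag(exp(hf[t])) X′, as the text describes.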


2.2 Realized Covariance

Let St be the price of a stock, Rt = log St − log S0 be the cumulative log return of a stock, and rt(δ) = Rt − Rt−δ be the δ-log-returns. The day t realized variance is the sum of intraday squared δ-returns,

    RVt(δ) = ∑_{δi ∈ (t−1, t]} rδi(δ)².

Here, time is measured continuously and starts at zero, so that t = 1 corresponds to the end of the first day. Though we have phrased this in terms of 24-hour days, we will restrict our construction of the realized covariance to the hours in which the market is open. The theory of continuous-time stochastic processes shows that the realized variance converges in probability to a quantity we call the day t quadratic covariation as δ converges to zero, so long as Rt satisfies some weak regularity conditions [Jacod and Shiryaev, 2003]. (The grid need not be uniform either, but we phrase it that way here for simplicity.)

There is both mathematical and empirical justification for using the realized variance as a proxy for the daily variance. Barndorff-Nielsen and Shephard [2004] show that if dRt = αt dWt and the continuous process {αt} is independent of the Brownian motion {Wt}, then rt(1) = σt εt, where εt is standard normal and σt² = limδ→0 RVt(δ). Andersen et al. [2001] show that a similar result holds empirically. The variance σt² is "realized" in the sense that one can almost observe RVt(δ) at the end of day t for small δ, which is the case for liquid stocks.

The same construction works for multidimensional stochastic processes. In that case, a proxy for the daily covariance matrix becomes the sum of outer products of intraday returns,

    RCt(δ) = ∑_{δi ∈ (t−1, t]} rδi(δ) rδi(δ)′,

called the realized covariance matrix. The probabilistic theory does not work flawlessly in practice. The pioneers of realized variance found that calculating the intraday sum of squared returns using all available intraday data produces unrealistically large estimates of the daily variance. In fact, the estimates appear to diverge with the number of observations. This has been attributed to market microstructure, such as the bid-ask spread and discreteness of prices [Zhang et al., 2005]. To avoid this bias, statisticians and econometricians initially calculated the realized variance using intraday information at a "safe" frequency [Andersen et al., 2003]. More complicated estimators of the daily quadratic variation have since been developed, such as Mykland et al.'s multi-scale estimator and Barndorff-Nielsen et al.'s realized kernel [Ait-Sahalia et al., 2011, Barndorff-Nielsen et al., 2008]. We will use a multivariate version of the realized kernel to estimate and forecast daily covariance [Barndorff-Nielsen et al., 2011]. We denote the daily realized kernel by RKt.
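As a quick illustration, the realized covariance is just a sum of outer products of intraday return vectors; a minimal sketch (with simulated returns standing in for real TAQ data):

```python
import numpy as np

def realized_covariance(intraday_returns):
    """RC_t = sum_i r_i r_i', where the rows of intraday_returns are the
    delta-return vectors observed within day t."""
    r = np.asarray(intraday_returns)
    return r.T @ r

# Toy day: 78 five-minute return vectors for 3 assets (simulated).
rng = np.random.default_rng(2)
Sigma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.0]]) * 1e-4 / 78   # per-interval covariance
r = rng.multivariate_normal(np.zeros(3), Sigma, size=78)
RC = realized_covariance(r)
print(RC.shape)  # (3, 3)
```

By construction RC is symmetric and positive semidefinite, and (for this toy diffusion without microstructure noise) it concentrates around the daily integrated covariance as the grid is refined.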

2.3 Exponential Smoothing

Exponential smoothing is a weighted average of past observations in which the weights decay geometrically. One-step ahead forecasts of the quantity of interest are taken to be the current weighted average:

    St = (1 − λ) RKt + λ St−1,
    Σ̂t+1 = St,    (2)

where St is the day t weighted average and Σ̂t+1 is the one-step ahead forecast. Such forecasts cost little to compute (even for high-dimensional objects) and are guaranteed to be positive definite. We pick λ by minimizing in-sample loss, as explained in section 4.3.
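In code, the recursion in equation (2) is a few lines. The sketch below (our own illustration, with simulated positive semidefinite matrices standing in for realized kernels) also shows why positive definiteness is preserved: each St is a convex combination of positive (semi)definite matrices.

```python
import numpy as np

def smooth_forecasts(RK, lam):
    """Equation (2): S_t = (1 - lam) RK_t + lam S_{t-1}, with the one-step
    ahead forecast Sigma_{t+1} = S_t. RK is a (T, m, m) array. Initializing
    S at RK[0] is our own choice."""
    T, m, _ = RK.shape
    forecasts = np.empty((T, m, m))
    S = RK[0]
    for t in range(T):
        S = (1.0 - lam) * RK[t] + lam * S
        forecasts[t] = S          # this is the forecast of Sigma_{t+1}
    return forecasts

rng = np.random.default_rng(3)
A = rng.normal(size=(100, 3, 3))
RK = A @ A.transpose(0, 2, 1)     # toy positive semidefinite "realized kernels"
F = smooth_forecasts(RK, lam=0.84)
print(F.shape)  # (100, 3, 3)
```

The cost per day is a single scaled matrix addition, which is what makes the procedure cheap even for large portfolios.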


3 Extensions to Factor Stochastic Volatility

As we will see below, exponential smoothing provides superior one-step ahead forecasts compared to factor stochastic volatility; however, exponential smoothing does not yield any insight into the common factors that may move all asset prices. Below we build improved FSVol-like models that incorporate some, but not all, of the information contained in realized kernels while preserving the factor structure found in classic factor models.

3.1 Factor “Decomposition”

One may use transformations of the realized kernel to augment FSVol. Suppose that the intraday returns are independent and identically distributed normal random variables whose daily conditional variance agrees with the daily conditional variance implied by FSVol. Then, given X, hft, and hεt, the realized covariance (RCt(δ) | Σt) is distributed as Wm(1/δ, Σt), where

    Σt = Var(rt | hft, hεt, X) = X Ft X′ + Et,

Ft = Var(ft | hft), and Et = Var(εt | hεt). One can use the observation (RCt(δ) | Σt) along with the evolution equations for hft and hεt to forecast future daily variances; however, this model does not have closed-form forward filter equations, nor is it realistic to assume that returns are independent and identically distributed [Ait-Sahalia et al., 2011]. To account for these discrepancies we use the likelihood generated by (RCt(δ) | Λt, Dt, Ψt) ∼ Wm(1/δ, ΛtDtΛt′ + Ψt) to extract point estimates of Λt, Dt, and Ψt, and treat those point estimates as noisy observations of the corresponding values X, Ft, and Et found in FSVol. In this way, the likelihood is used less as a tool for modeling and more as a tool for matrix decomposition.

For instance, at each time t we might extract Dt via the likelihood RCt ∼ Wm(1, ΛtDtΛt′ + Ψt) and then use Dt as exogenous data to be included in a FSVol-like model. In particular, since Ft = Var(ft | hft), we would treat log Dii,t as a noisy observation of hfit. Similarly, Λt could be seen as a noisy observation of X, or it could be used to inform a dynamic factor loadings model [Lopes and Carvalho, 2007]. One is not limited to decomposing RCt using the likelihood approach described above, and we try introducing other sources of exogenous information, albeit in an ad hoc manner, such as the eigenvalue decomposition and the Cholesky decomposition of the realized kernel. The extended models we consider are as follows.

3.2 Factor Log Variances

Assume that one has an external source of information vt about the factor log variances, hft. Then one may augment the SV process to incorporate vt as a noisy observation of hft by adding an observation equation to SV:

    fit ∼ N(0, ehfit),  i = 1, . . . , p,
    vt = hft + ηt,  ηt ∼ N(m, U),
    hft = µ + φ ⊙ (hft−1 − µ) + ωt,  ωt ∼ N(0, W).

The error term of the noisy observation vt is allowed to have a non-zero mean so that one may include information that is presumed to be proportional to the conditional variance, ehfit, of the factors.


Alcoa (AA)        American Express (AXP)       Boeing (BA)         Bank of America (BAC)     Caterpillar (CAT)
Cisco (CSCO)*     Chevron (CVX)                Du Pont (DD)        Disney (DIS)              General Electric (GE)
Home Depot (HD)   Hewlett-Packard (HPQ)        IBM (IBM)           Intel (INTC)*             Johnson & Johnson (JNJ)
JP Morgan (JPM)   Kraft (KFT)                  Coca-Cola (KO)      McDonald's (MCD)          3M (MMM)
Merck (MRK)       Microsoft (MSFT)*            Pfizer (PFE)        Procter & Gamble (PG)     AT&T (T)
Travelers (TRV)   United Technologies (UTX)    Verizon (VZ)        Walmart (WMT)             Exxon Mobil (XOM)

Table 1: The thirty stocks that make up the primary dataset. The asterisk denotes companies whose primary exchange is the NASDAQ. All other companies trade primarily on the NYSE.

3.3 Factor Loadings

Assume that one has an external source of information Xt that is a noisy observation of the dynamic factor loadings, Θt. One may alter a dynamic factor loadings FSVol model to incorporate this observation via

    rt = Θtft + εt,  ft, εt ∼ SV,
    xt = θt + νt,  νt ∼ N(0, VX),
    θt = θt−1 + ωt,  ωt ∼ N(0, WX),

where VX and WX are diagonal matrices, xt = vecl(Xt), θt = vecl(Θt), and the operator vecl vectorizes the elements on or below the diagonal by column. When all else is known, Θt is a DLM, which one may tractably simulate.
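The vecl operator itself is easy to implement; a small sketch (a hypothetical helper, not from the paper):

```python
import numpy as np

def vecl(M):
    """Stack the elements on or below the diagonal, column by column."""
    M = np.asarray(M)
    rows, cols = np.tril_indices_from(M)
    # tril_indices is row-major; sort by column so we vectorize by column.
    order = np.lexsort((rows, cols))
    return M[rows[order], cols[order]]

M = np.array([[1, 0, 0],
              [2, 4, 0],
              [3, 5, 6]])
print(vecl(M))  # [1 2 3 4 5 6]
```

For an m × m matrix, vecl returns a vector of length m(m + 1)/2, which is the dimension of the state θt above.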

4 Data, Evaluation, and Computation

4.1 The Data

We follow the thirty stocks found in table 1, comprising the Dow Jones Industrial Average as of October 2010. The data consist of the intraday tick-by-tick trading prices from 9:30 AM to 4:00 PM provided by the Trades and Quotes (TAQ) database through Wharton Research Data Services¹. The data set runs from February 27, 2007 to October 29, 2010, providing a total of 927 trading days. We follow Barndorff-Nielsen et al.'s work in [Barndorff-Nielsen et al., 2009] and [Barndorff-Nielsen et al., 2011] as a guide for constructing the realized covariance matrices.

4.2 Evaluation

We evaluate models using one-day-ahead daily covariance matrix forecasts. The primary metric by which we compare these forecasts is the empirical standard deviation of the minimum variance portfolios (ESDMVP), which measures the ability of a forecasting procedure to hedge uncertainty within a class of similarly risky assets. The day t portfolio is constructed by solving

    argmin_{‖ξ‖1 = 1} Var[ξ′rt | Dt−1],

where Dt−1 is the data observed up to time t − 1. If we assume that rt can be factored so that (rt | Σt) has mean zero and variance Σt, as is the case with FSVol, then the one-day-ahead minimum variance portfolio may be calculated analytically by solving

    πt = argmin_{‖ξ‖1 = 1} ξ′Σ̂tξ,  where Σ̂t = E[Σt | Dt−1],

¹Wharton Research Data Services (WRDS) was used in preparing Factor Stochastic Volatility and Realized Covariance. This service and the data available thereon constitute valuable intellectual property and trade secrets of WRDS and/or its third-party suppliers.


as Var[rt | Dt−1] = E[Σt | Dt−1]. In the case of exponential smoothing we take Σ̂t = St−1, as in equation (2). Accumulated over the out-of-sample period, the total loss is

    L1({Σ̂t}, {rt}) = sd[{πt′rt}],

where sd is the empirical standard deviation. The secondary metric by which we compare forecasts is the log-likelihood,

    L2({Σ̂t}, {rt}) = ∑_{t=1}^{T} ( −(1/2) log |Σ̂t| − (1/2) rt′ Σ̂t⁻¹ rt ).

While lacking a natural economic interpretation, this secondary metric is a useful check upon our primary metric.
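Both metrics are straightforward to compute once the forecasts are in hand. A sketch with our own helper names follows; note that it uses the standard closed form πt ∝ Σ̂t⁻¹1 for the minimum variance portfolio, which satisfies the paper's ‖ξ‖1 = 1 constraint whenever all the resulting weights are positive (an assumption on our part).

```python
import numpy as np

def mvp_weights(Sigma):
    """Closed-form minimum variance portfolio under the usual budget
    constraint: pi = Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    ones = np.ones(Sigma.shape[0])
    w = np.linalg.solve(Sigma, ones)
    return w / w.sum()

def esdmvp(forecasts, returns):
    """L1: empirical standard deviation of the realized MVP returns."""
    port = np.array([mvp_weights(S) @ r for S, r in zip(forecasts, returns)])
    return port.std(ddof=1)

def log_likelihood(forecasts, returns):
    """L2: sum over t of -0.5 log|Sigma_t| - 0.5 r_t' Sigma_t^{-1} r_t."""
    total = 0.0
    for S, r in zip(forecasts, returns):
        _, logdet = np.linalg.slogdet(S)
        total += -0.5 * logdet - 0.5 * (r @ np.linalg.solve(S, r))
    return total

# Toy check: with an identity forecast the MVP is equally weighted.
print(mvp_weights(np.eye(3)))
```

A forecasting procedure scores well on L1 by producing portfolios whose realized returns vary little day to day, and on L2 by assigning high density to the realized return vectors.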

4.3 Prediction

For exponential smoothing, the dataset is split into three pieces: an initialization set, t = 1, . . . , 50; an in-sample set, t = 51, . . . , 100; and an out-of-sample set, t = 101, . . . , 920. The smoothing parameter λ from equation (2) is chosen by minimizing the in-sample loss and then fixed to generate out-of-sample forecasts. For the dataset described above, λ = 0.84.

Generating forecasts using FSVol and the FSVol-like models is considerably more time-consuming. We follow the work of Aguilar [1998] to compute posterior distributions of the unknown parameters and hidden states; all have recognizable complete conditionals after using the discrete mixture of normals approximation of Kim et al. [1998] for the SV process. The data set is split into two pieces: a prior identification set, t = 1, . . . , 100; and a test set, t = 101, . . . , 920. The prior identification set is used to pick hyper-parameters, which in this case are chosen so that the hidden states of the stochastic volatility processes are highly persistent. At each step t, we produce the point estimate Σ̂t = E[XFtX′ + Et | Dt−1], which requires producing the posterior distribution p(X, hft, hεt | Dt−1). However, there is no easy way to simulate this distribution sequentially; instead, one must simulate an entire MCMC at each t in 101, . . . , 920. The forecasting routine, written in C++, takes on the order of a minute to produce 1500 samples when t is several hundred; generating a complete set of forecasts for a single model takes several hours; and generating all the forecasts considered herein takes several days on an Intel Xeon 2.4 GHz processor.

5 FSVol vs. RK

As seen in Table 2, realized kernels provide good point estimates of the daily covariance matrix, and exponentially smoothing realized kernels produces the best forecasts. In fact, we found that exponential smoothing is better than all of the FSVol models for a range of λ, unless λ is very close to unity (see Figure 1). Among the FSVol-like models we find mixed results. Only the p-factor dynamic factor loadings model does better than the classic factor stochastic volatility model. For that model, we take the first p columns of L from the LDL′ Cholesky decomposition of the realized kernel and treat those as the exogenous data Xt from section 3.3. We attempted to use the first few eigenvalues of the realized covariance matrix to inform the factor log variances, but this did not improve the one-step ahead forecasts.
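The LDL′ extraction can be computed from the ordinary Cholesky factor; a sketch (a hypothetical helper with a toy positive definite matrix standing in for a realized kernel):

```python
import numpy as np

def ldl_loadings(RK, p):
    """Return the first p columns of L from RK = L D L', where L is unit
    lower triangular and D is diagonal, via the Cholesky factor."""
    C = np.linalg.cholesky(RK)      # RK = C C', C lower triangular
    d = np.diag(C)
    L = C / d                       # rescale columns so diag(L) = 1
    D = np.diag(d ** 2)
    assert np.allclose(L @ D @ L.T, RK)
    return L[:, :p]

rng = np.random.default_rng(6)
A = rng.normal(size=(5, 5))
RK = A @ A.T + 5 * np.eye(5)        # toy positive definite "realized kernel"
Xt = ldl_loadings(RK, 2)
print(Xt.shape)  # (5, 2)
```

The unit diagonal of L matches the unit-lower-triangular identification imposed on the loadings X in section 2.1, which is what makes these columns natural exogenous observations for section 3.3.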

While FSVol loses, it does not lose by a lot. Comparing the two-factor FSVol forecasts against the realized kernel, one finds a loss of less than 7 basis points on the daily time scale.


Figure 1: The in-sample and out-of-sample empirical standard deviation of the minimum variance portfolios using exponential smoothing for various values of λ. We used exponential smoothing to generate one-step ahead forecasts of the covariance matrix as well as one-step ahead minimum variance portfolios. We chose the smoothing parameter λ by minimizing the in-sample empirical standard deviation of the minimum variance portfolios. That smoothing parameter was then used to calculate the out-of-sample loss. The out-of-sample loss is relatively insensitive to the choice of λ.

This translates into an extra $7 of standard deviation for every $10,000 invested when using FSVol instead of smoothed realized kernels. While this difference may matter to a large investor, it is not going to make a difference for those that lack the means to access the high-frequency data, as it may be cheaper to use daily stock price data that is freely available on the web. But if one has access to high-frequency data, then models based upon daily returns should be abandoned in favor of models that make use of the realized kernel directly.

6 Illiquidity

To this point, we have studied forecasts of liquid stocks. In this case, there are many intraday observations for all of the equities concerned, and one can generate good approximations to the daily covariance matrix using the realized kernel. However, when one of the stocks is illiquid, the realized kernel degrades. In particular, the construction of the realized kernel uses prices updated as frequently as the least liquid stock; consequently, the quality of the estimate will be controlled by those stocks that trade infrequently. If there are only a few transactions for a particular stock within the day, then it is even possible to generate rank-deficient realized kernels.

To examine how exponentially smoothing realized kernels is affected by less frequent observations, we artificially reduce the liquidity of the 30 stocks from section 4.1 by only sampling the prices every 5, 10, 15, 30, or 60 minutes. From the plot in Figure 2, one can see that forecasts using realized kernels constructed with 5- and 10-minute windows are not much worse than those using the realized kernel constructed at the highest possible frequency, and that only the 30- and 60-minute windows perform worse than factor stochastic volatility.
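The liquidity-reduction experiment amounts to subsampling the intraday price grid before forming the covariance statistic; a simplified sketch (plain realized covariance on simulated prices, rather than the realized kernel used in the paper):

```python
import numpy as np

def sparse_sampled_rc(log_prices, step):
    """Realized covariance from synchronized log prices kept only every
    `step` observations, mimicking the artificial liquidity reduction.
    log_prices: (n_obs, m) array of intraday log prices."""
    sampled = log_prices[::step]
    r = np.diff(sampled, axis=0)    # delta-returns on the coarser grid
    return r.T @ r

rng = np.random.default_rng(4)
# Toy day: 390 one-minute log prices for 2 assets (a random walk).
lp = np.cumsum(rng.normal(scale=1e-3, size=(390, 2)), axis=0)
RC_1min = sparse_sampled_rc(lp, 1)
RC_30min = sparse_sampled_rc(lp, 30)
print(RC_1min.shape, RC_30min.shape)  # (2, 2) (2, 2)
```

Coarser grids leave fewer return vectors in the sum, so the estimate becomes noisier and, with fewer intraday observations than assets, can become rank-deficient, as the text notes.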


Method                                      Factors   Filtered Estimate        One-Step Ahead Forecast
                                                      MVP        LLH           MVP        LLH
Factor Stochastic Volatility                   1      0.008837    97382        0.010213    94910
                                               2      0.008504    97824        0.009923    95155
                                               3      0.008400    98187        0.010035    95302
Factor Log Variances - Eigenvalues             1      0.009275    97361        0.010588    94948
                                               2      0.009262    97762        0.010637    95177
                                               3      0.009359    97998        0.010810    95280
Factor Loadings - Cholesky                     1      0.007567    97567        0.009968    94120
                                               2      0.007490    98280        0.009783    94606
                                               3      0.007729    98443        0.009931    94714
Factor Log Variances - Factor "Decomp."        1      0.009572    97199        0.010964    94798
                                               2      0.009027    97633        0.010449    95050
                                               3      0.009116    98002        0.010766    95264
Factor Loadings - Factor "Decomp."             1      0.008121    97787        0.010079    94778
                                               2      0.007795    97520        0.010343    94028
                                               3      0.007834    97449        0.010161    94592
Realized Kernel - Random Walk                         0.007836   101384        0.010602    91966
Exponential Smoothing                                 0.008782    98096        0.009290    96675
Uhlig-like Model                                      0.007836   101384        0.009615    94047

Table 2: For each t = 101, . . . , 920, we calculate the filtered estimate Var[rt | Dt] and the one-step ahead forecast Var[rt | Dt−1] of the day t covariance matrix. (Dt is the data up to time t.) A new MCMC simulation is run for each time step t when constructing these estimates and forecasts for the FSVol-like models. For exponential smoothing, an in-sample period, t = 51, . . . , 100, was used to pick the smoothing parameter λ, which was then fixed for t = 101, . . . , 920. The filtered estimate for exponential smoothing refers to the exponentially weighted moving average of realized kernels, St, from equation (2). The entry labeled "Realized Kernel - Random Walk" estimates the day t covariance matrix using the day t realized kernel and forecasts the day t covariance matrix using the day t − 1 realized kernel. The column labeled MVP reports the empirical standard deviation of the minimum variance portfolios, and the column labeled LLH reports the log-likelihood as described in section 4.2, each calculated with both the filtered estimates and one-step ahead forecasts. The row labeled Uhlig-like Model refers to the estimates and forecasts produced using the model described in Section 8.


Figure 2: We construct the realized kernels for the 30 assets from section 4.1 using prices sampled periodically every 5, 10, 15, 30, and 60 minutes. The vertical line in the left-hand plot is the in-sample choice of λ. The horizontal line in the right-hand plot is the ESDMVP for the portfolio for the out-of-sample period using the in-sample choice of λ. The full realized kernel performs best out-of-sample, though the 5- and 10-minute estimates are not far behind. The best FSVol-like model had an ESDMVP of 0.00978, which is higher than that of the portfolios constructed using the 5-, 10-, or 15-minute realized covariances.

7 Robustness to Number of Assets

One potential fault of exponential smoothing is that it uses a single parameter to generate forecasts for O(n²) time series, where n is the number of assets. To see if this becomes a liability, we construct a new set of realized kernels {RK∗t} for n = 96 equities chosen from the S&P 500 for their relatively long life-span under the same ticker symbol². N stocks were selected at random from the pool of 96 assets, and the realized covariance of these assets at each time t was taken as the appropriate submatrix of RK∗t. The smoothing parameter λ was selected as described in Section 4.3, though the in-sample period was adjusted to t = 51, . . . , 150 and the out-of-sample period to t = 151, . . . , 920. We enlarged the in-sample period in hopes of eliminating outlying values of λ. We repeated this procedure 200 times for each N = 30, . . . , 90. The search for the best smoothing parameter λ was limited to the interval [0.4, 1.0]. Even when the smoothing parameter takes on relatively extreme values, such as 0.4, the ESDMVP is reasonable. It appears that on average the smoothing parameter is relatively stable as the number of assets increases and that the best guess on average is very near the smoothing parameter 0.84 found for the original dataset described in section 4.1. As seen in Figure 3, the forecasts steadily improve as the number of assets increases, suggesting that the single smoothing parameter is not a liability: simple forecasts still work as the system grows. Figure 3 also suggests that as the number of assets increases, the standard deviation of the portfolio

2The ticker symbols for the assets used are AA, ABT, AFL, AIG, ALL, APA, APC, AXP, BA, BAC, BAX,BBT, BEN, BK, BMY, C, CAH, CAT, CCL, CL, COP, CVS, CVX, D, DD, DE, DHR, DIS, DOW, DUK, EMC,EMR, EXC, FDX, GD, GE, GIS, HAL, HD, HON, HPQ, IBM, ITW, JNJ, JPM, K, KMB, KO, LLY, LMT,LOW, MCD, MDT, MMM, MO, MRK, MRO, NEM, NKE, NOC, OXY, PEP, PFE, PG, PNC, PX, RTN, SLB,SO, STI, STT, SYY, T, TGT, TWX, TXN, UNH, UNP, USB, UTX, VZ, WAG, WFC, WMT, XOM, AAPL,AMAT, AMGN, COST, CSCO, DELL, INTC, MSFT, ORCL, QCOM, YHOO.


Figure 3: A plot of the smoothing parameter λ selected by minimizing the in-sample loss of N stocks selected at random from a pool of 96 stocks sits on the left, while the subsequent out-of-sample loss is on the right. In this case we took the in-sample period to be t = 51, . . . , 150 and the out-of-sample period to be t = 151, . . . , 920. The red line is the median ESDMVP for each N and the green lines are the first and third quartiles.

converges to some lower limit. While the number of ways to select N stocks decreases as N grows, there are still 927,048,304 possible choices when N = 90. Thus, we do not think that the decrease in variance of λ or the out-of-sample ESDMVP is due to a smaller sample space.

8 Model-based Smoothing

Exponentially smoothed realized kernels produce better forecasts than any FSVol-like model. But exponential smoothing is not a statistical model and does not include a distribution describing how errant the forecasts might be. In this section, we explore a statistical model for realized kernels that makes forecasts identical to exponential smoothing. This model abandons asset returns altogether and uses high-frequency statistics as the sole source of data. We will denote the set of positive definite, symmetric matrices by S++.

The construction of S++-valued processes dates back at least to variance discounting [Quintana and West, 1987]; Uhlig [1997] and Shephard [1994] formalized the notion of variance discounting within a probability model. More recent S++-variate models include the Wishart autoregressive process of Gourieroux et al. [2009], the multivariate stochastic volatility models of Philipov and Glickman [2006], the inverse-Wishart autoregression process of Fox and West [2011], and the HEAVY models of Noureldin et al. [2011]. As noted by Fox and West, the latent variable construction of Pitt and Walker [2005] can be used to generate S++-valued processes as well. Further, one may indirectly generate S++-valued processes using the Cholesky factorization [Chiriac and Voev, 2010] or the matrix logarithm [Bauer and Vorkink, 2011].

Uhlig [1997] develops a non-linear state-space model that smooths rank-1, positive definite, symmetric matrices. One may extend this method to matrices of any order, including full-rank matrices, so that the matrix-variate data {Y_t} is tracked using matrix-variate states {P_t}. The virtue of this model is simplicity: it possesses only three parameters θ = (n, k, λ), which will reduce to two after imposing a constraint; it has closed form forward filtering and backward sampling equations for P_t; and it has a closed form conditional density p(Y_t | D_{t−1}, θ) that may be used to estimate the parameters by maximum likelihood or posterior simulation via Metropolis-Hastings. For notational convenience, the parameter θ will henceforth be suppressed and the conditional densities will be assumed to be conditioned upon θ unless stated otherwise. Prado and West [2010] mention a similar model; however, they employ a discounting technique that limits what is essentially the smoothing parameter λ from equation (2). In particular, they require that λ > (m − 2)/(m − 1), where m is the order of the matrix-variate data. When m = 30, for instance, this requires that λ > 0.9655. The method below avoids this pitfall and allows for a greater range of smoothing parameters.

8.1 An Uhlig-like Model

A version of Uhlig's model is the following, in which one may interpret Y_t as the observed, symmetric, rank-1 matrix-variate data and P_t as the dynamic, hidden state:

    Y_t ∼ W_m(1, P_t^{-1}),
    P_t = U_{t−1}' Ψ_t U_{t−1} / λ,   Ψ_t ∼ β_m((ν−1)/2, 1/2, I),
    U_{t−1} = chol P_{t−1}.

W_m is the Wishart distribution and β_m is the multivariate beta distribution, which is defined below. In his original paper, Uhlig observed a vector of returns distributed as r_t ∼ N(0, P_t^{-1}), which provides identical information to r_t r_t' ∼ W_m(1, P_t^{-1}) when updating the precision P_t.

The non-singular analog of this model is

    Y_t ∼ W_m(k, (kP_t)^{-1}),
    P_t = U_{t−1}' Ψ_t U_{t−1} / λ,   Ψ_t ∼ β_m(n/2, k/2, I),   (3)
    U_{t−1} = chol P_{t−1},

where n, k > m − 1. The evolution equation (3) for this model arises by considering the transformation described by (7) in the following theorem, which synthesizes results found in Uhlig [1994], Muirhead [1982, Theorems 3.3.1 and 2.1.6], and Díaz-García and Jaimez [1997], and allows for singular matrices of any rank.

The following notation will be useful. We write S^+_{m,i} for the set of symmetric, non-negative definite matrices of order m and rank i, with the convention that S^+_{m,i} consists of those matrices with rank ⌈i⌉ ∧ m when i is a real number. When considering full-rank matrices we write S^{++}_m = S^+_{m,m}. We write S^+_{m,i}(A) for the set of X ∈ S^+_{m,i} such that A − X ∈ S^{++}_m, and S^{++}_m(A) = S^+_{m,m}(A).

Definition 1. Let k be a positive integer less than m or a real number greater than m − 1, and let n > m − 1. The multivariate beta distribution β_m(k/2, n/2) has density

    Γ_m(½(k + n)) / [Γ_m(½k) Γ_m(½n)] · (det U)^{(k−m−1)/2} [det(I − U)]^{(n−m−1)/2} dU

for U ∈ S^{++}_m(I) when k > m − 1, and

    Γ_m(½(k + n)) / [Γ_m(½k) Γ_m(½n)] · (det D)^{(k−m−1)/2} [det(I − U)]^{(n−m−1)/2} dU,

where U = HDH' ∈ S^+_{m,k}(I), H is a matrix of orthonormal columns of order m × k, and D is a diagonal matrix of order k × k, when k is a positive integer less than m. When k is a positive integer less than m we define

    V ∼ β_m(n/2, k/2)

by V = I − U with U ∼ β_m(k/2, n/2). We define

    V ∼ β_m(k/2, n/2, S)

by V = T'UT, where T'T = S is the Cholesky factorization of S and U ∼ β_m(k/2, n/2).

Theorem 2. Let k be a positive integer less than m or a real number greater than m − 1, and let n > m − 1. The bijection from S^+_{m,k} × S^+_m to S^+_m × S^+_{m,k}(I) defined by (A, B) ↦ (S, U) via

    S = A + B,
    U = (T^{-1})' A T^{-1},

where T'T = S is the Cholesky factorization of S, or inversely

    A = T'UT,
    B = T'(I − U)T,

defines a change of variables from

    A ∼ W_m(k, Σ) ⊥ B ∼ W_m(n, Σ)   (4)

to

    S ∼ W_m(n + k, Σ) ⊥ U ∼ β_m(k/2, n/2, I).   (5)

Further, the conditional distributions (S|B) and (B|S) are

    S | B = W_m(k, Σ) + B,   (6)

and

    B | S = T'(I − U)T = β_m(n/2, k/2, S).   (7)
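Theorem 2's bijection also gives a direct way to simulate the multivariate beta: draw the two Wisharts and transform. The NumPy sketch below is our own illustration, not from the paper; it assumes integer degrees of freedom for simplicity and checks Monte Carlo draws against the known mean E[U] = k/(k + n) I.

```python
import numpy as np

def wishart_I(df, m, rng):
    # W_m(df, I) as X X' with X an m-by-df matrix of standard normals
    # (integer df with df >= m, for this sketch)
    X = rng.standard_normal((m, df))
    return X @ X.T

def sample_mbeta(k, n, m, rng):
    # U ~ beta_m(k/2, n/2) via the bijection of Theorem 2:
    # A ~ W_m(k, I), B ~ W_m(n, I), S = A + B = T'T, U = (T^{-1})' A T^{-1}
    A = wishart_I(k, m, rng)
    S = A + wishart_I(n, m, rng)
    L = np.linalg.cholesky(S)          # S = L L', so T = L' satisfies T'T = S
    Linv = np.linalg.inv(L)
    return Linv @ A @ Linv.T           # equals (T^{-1})' A T^{-1}

rng = np.random.default_rng(0)
m, k, n = 3, 6, 9
U_bar = np.mean([sample_mbeta(k, n, m, rng) for _ in range(4000)], axis=0)
```

Averaging many draws, U_bar should be close to (k/(k + n)) I, here (6/15) I.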

This theorem tells one how the hidden state in equation (3) evolves and how to backward sample. In particular, given the information set D_{t−1}, we know that

    Z_t ∼ W_m(k, Σ_{t−1}^{-1}) ⊥ λP_t ∼ W_m(n, Σ_{t−1}^{-1}),

by (4), if and only if, by (5),

    P_{t−1} ∼ W_m(n + k, Σ_{t−1}^{-1}) ⊥ Ψ_t ∼ β_m(n/2, k/2),

as determined by the bijection characterized by

    P_{t−1} = λP_t + Z_t,
    λP_t = U(P_{t−1})' Ψ_t U(P_{t−1}).

Thus the evolution from P_{t−1} to P_t, given D_{t−1}, is described by the law (P_t | P_{t−1}, D_{t−1}) ∼ β_m(n/2, k/2, P_{t−1}/λ) as in (7), which marginalizes to

    P_t | D_{t−1} ∼ W_m(n, Σ_{t−1}^{-1}/λ).   (8)


Going backward we have

    (P_{t−1} | P_t, D_{t−1}) ∼ λP_t + Z_t,   Z_t ∼ W_m(k, Σ_{t−1}^{-1}),

as in (6). The latter provides the distribution from which one may backward sample in the spirit of Frühwirth-Schnatter [1994].

The model produces closed form distributions as we move from (P_{t−1}|D_{t−1}) to (P_t|D_{t−1}) to (P_t|D_t). However, the distribution of P_t degenerates as it evolves in the absence of data. Suppose one starts with the information D_{t−1} and

    P_t | D_{t−1} ∼ W_m(n, Σ_{t−1}^{-1}/λ) ⊥ Ψ_{t+1} ∼ β_m(n/2, k/2).

Then

    P_{t+1} = U(P_t)' Ψ_{t+1} U(P_t)

yields, according to Theorem 2,

    P_{t+1} ∼ W_m(n − k, Σ_{t−1}^{-1}/λ^2).

Thus, each time we evolve P_t forward without updating we lose k degrees of freedom; eventually, the process will not have a recognizable distribution. Since we are only interested in generating one-step-ahead forecasts and we have no missing data, this degeneracy does not concern us.

8.2 Forward Filter and Backward Sampling

The program to forward filter relies upon the evolution from (P_{t−1}|D_{t−1}) to (P_t|D_{t−1}) described by (8) and the conjugacy between a Wishart sampling distribution and a Wishart prior. In particular, given the "prior" distribution

    P_t | D_{t−1} ∼ W_m(n, Σ_{t−1}^{-1}/λ)

and sampling distribution

    Y_t | P_t ∼ W_m(k, (kP_t)^{-1}),

Bayes' theorem yields

    p(P_t | D_{t−1}, Y_t) ∝ W_m(ν, (λΣ_{t−1} + kY_t)^{-1}),   ν = k + n.

Hence the forward filtering recursions may be summarized as:

• Time t − 1 posterior: P_{t−1} | D_{t−1} ∼ W_m(ν, Σ_{t−1}^{-1}), ν = k + n.

• Evolution, i.e. time t prior: P_t | D_{t−1} = λ^{-1} W_m(n, Σ_{t−1}^{-1}).

• Observation: Y_t | P_t ∼ W_m(k, (kP_t)^{-1}), ensuring that E[Y_t | P_t] = P_t^{-1}.

• Update: P_t | D_t ∼ W_m(ν, (λΣ_{t−1} + kY_t)^{-1}).

One only needs the recursion

    Σ_t = λΣ_{t−1} + kY_t

to generate the set of filtered distributions. To backward sample: start with P_T ∼ W_m(ν, Σ_T^{-1}) and then proceed recursively by P_{t−1} | P_t = λP_t + Z_t, where Z_t ∼ W_m(k, Σ_{t−1}^{-1}). The following moments are worth noting.


• "Posterior" mean of hidden variance:

    E[P_{t−1}^{-1} | D_{t−1}] = Σ_{t−1} / (ν − m − 1).

• "Prior" mean of hidden variance:

    E[P_t^{-1} | D_{t−1}] = λΣ_{t−1} / (ν − k − m − 1).

• Forecasted mean of hidden variance:

    E[Y_t | D_{t−1}] = E[ E[Y_t | P_t, D_{t−1}] | D_{t−1} ] = E[P_t^{-1} | D_{t−1}].

If we choose λ such that

    ν − m − 1 = k(1 − λ)^{-1},   (9)

then the time t "prior" mean of the hidden variance has the same value as the time t − 1 "posterior" mean. In the limit

    Σ_t ≈ k ∑_{i=0}^{N} λ^i Y_{t−i},

in which case the constraint (9) ensures that

    E[P_t^{-1} | D_{t−1}] ≈ (1 − λ) ∑_{i=0}^{N} λ^i Y_{t−1−i};

that is, the forecasted mean of the hidden variance is an exponentially weighted average of the realized covariance matrices.
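To see the equivalence numerically, one can run the recursion on toy data and compare the model forecast with the explicit exponentially weighted average. This is a small self-check of our own, with arbitrary values for m, k, and λ, and Σ_0 = 0 so that the two expressions agree exactly rather than only in the limit.

```python
import numpy as np

m, k, lam = 2, 8.0, 0.8
nu = k / (1 - lam) + m + 1            # enforce constraint (9): nu - m - 1 = k/(1-lam)

# toy symmetric positive definite "realized covariance" observations
Y = [np.eye(m) * (1.0 + 0.1 * t) for t in range(60)]

Sigma = np.zeros((m, m))              # Sigma_0 = 0 for an exact comparison
for Yt in Y:
    Sigma = lam * Sigma + k * Yt      # filtering recursion

# model forecast E[P_{T+1}^{-1} | D_T] = lam * Sigma_T / (nu - k - m - 1)
forecast = lam * Sigma / (nu - k - m - 1)

# explicit exponentially weighted average (1 - lam) * sum_i lam^i * Y_{T-i}
ewma = (1 - lam) * sum(lam**i * Y[-1 - i] for i in range(len(Y)))
```

With Σ_0 = 0 the two quantities match to floating-point precision.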

8.3 Estimating n, k, and λ

Since we are unaware of any convenient conjugate updating step for estimating n, k, or λ, we turn to the joint likelihood of n, k, and λ under Y_{1:T} when n, k > m − 1. Marginalizing (Y_t | P_t) ∼ W_m(k, (kP_t)^{-1}) over (P_t | D_{t−1}) ∼ W_m(n, V_t^{-1}), where V_t = λΣ_{t−1} (as we do in Appendix A), shows that

    Y_t | D_{t−1} ∼ β'_m(k/2, n/2, V_t/k)

lives within a family of beta distributions related to those defined by Olkin and Rubin [1964], whose density is

    p(Y_t | D_{t−1}) = Γ_m(ν/2) / [Γ_m(n/2) Γ_m(k/2)] · |Y_t|^{(k−m−1)/2} |V_t/k|^{n/2} / |V_t/k + Y_t|^{ν/2}.   (10)

We will refer to this density as "multivariate beta prime" in accordance with its univariate counterpart. We can factor the density of (Y_{1:T} | D_0) as a product of such distributions,

    p(Y_{1:T} | D_0) = ∏_{i=1}^{T} p(Y_i | D_{i−1}),   (11)

where, to remind the reader, we implicitly condition on (k, n, λ) and Σ_0 throughout.
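Equations (10) and (11) make the likelihood straightforward to evaluate: a single pass over the data, updating Σ_t as one goes. The sketch below is our own, not the paper's code; it assumes full-rank Y_t and uses scipy.special.multigammaln for log Γ_m. When m = 1 the density reduces to a scaled univariate beta-prime, which gives a convenient cross-check against scipy.

```python
import numpy as np
from scipy.special import multigammaln
from scipy.stats import betaprime

def loglik(Y, Sigma0, k, n, lam):
    """log p(Y_{1:T} | D_0): a sum of log beta-prime densities (10), with
    V_t = lam * Sigma_{t-1} and Sigma_t = lam * Sigma_{t-1} + k * Y_t."""
    m = Sigma0.shape[0]
    nu = k + n
    const = multigammaln(nu / 2, m) - multigammaln(n / 2, m) - multigammaln(k / 2, m)
    ll, Sigma = 0.0, Sigma0
    for Yt in Y:
        Vk = lam * Sigma / k                      # V_t / k
        ll += (const
               + 0.5 * (k - m - 1) * np.linalg.slogdet(Yt)[1]
               + 0.5 * n * np.linalg.slogdet(Vk)[1]
               - 0.5 * nu * np.linalg.slogdet(Vk + Yt)[1])
        Sigma = lam * Sigma + k * Yt              # advance the recursion
    return ll

# m = 1 cross-check against scipy's univariate beta-prime
k, n, lam = 6.0, 10.0, 0.7
ys = [0.8, 1.3, 0.5]
ll = loglik([np.array([[y]]) for y in ys], np.array([[2.0]]), k, n, lam)

check, s = 0.0, 2.0
for y in ys:
    check += betaprime.logpdf(y, k / 2, n / 2, scale=lam * s / k)
    s = lam * s + k * y
```

One may then maximize this log-likelihood over (k, n) subject to constraint (9) with any generic optimizer.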


Figure 4: The log-likelihood of (n, k) calculated using t = 51, . . . , 100 for the Uhlig-like model, where the constraint (9) determines the value of λ. The black line is the level set of the maximum likelihood as a function of (n, k). The grey line is the level set of λ as a function of (n, k). When perturbing n and k near the maximum likelihood, the corresponding value of λ, which controls how the exponentially weighted average of the past realized covariance matrices decays, does not change much.

While before we estimated the parameter λ, which controls the smoothing of the realized kernels, using in-sample loss, we may now estimate λ within the context of a probabilistic model. In particular, we split our dataset into three pieces: an initialization set, an in-sample set, and an out-of-sample set. To initialize, we let Σ_50 = kS_50, where S_50 is the weighted average ∑_{i=0}^{49} w_i RK_{50−i} with weights w_i ∝ 0.9^i for i = 0, . . . , 49. The discounting factor 0.9 is chosen arbitrarily, but with the rough rule of thumb that a good choice of such factors often lies in [0.9, 0.99]. The maximum likelihood estimate of (k, n), with constraint (9) and hyperparameter Σ_50 = kS_50, is calculated over the in-sample set using p(Y_{51:100}|k, n) and is (43.61, 70.66), implying λ = 0.476. The implied value of λ is much smaller than expected; however, the ESDMVP for the out-of-sample period under this parameter is 0.0096, still better than what one gets when using the factor stochastic volatility models. Figure 4 suggests that this choice of λ is fairly insensitive to the specific value of (n, k). In particular, slight changes in n or k will not produce larger values of λ, such as 0.9 or 0.95, that are traditionally associated with a smoothing parameter.

As seen in Table 2, this model balances good estimates, as determined by the log-likelihood measure of loss, and good forecasts, as determined by the minimum variance portfolio measure of loss. None of the other models does so well in both estimation and forecasting except for the factor loadings model that uses data from the Cholesky decomposition; however, its adapted estimates only do well under the minimum variance measure, which is of little consolation since one cannot invest after the fact.

8.4 Similarity to IGARCH

Beyond parameter estimation, the factorization of Y_t above shows that the state-space model with which we began can be recast as a GARCH-like model [Bollerslev, 1986]. To illustrate, the analogous univariate IGARCH model for daily returns is

    r_t ∼ N(0, σ_t^2),
    σ_t^2 = λσ_{t−1}^2 + (1 − λ) r_{t−1}^2.


In this case, the conditional variance σ_t^2 is an exponentially weighted average of squared returns, and one may estimate the parameter λ by maximum likelihood by employing the factorization p(r_{1:T}|D_0) = ∏_{t=1}^{T} p(r_t|D_{t−1}). Similarly, as shown by equations (10) and (11), after marginalizing P_t we may write the Uhlig-like model as

    Y_t ∼ β'_m(k/2, n/2, V_t^*),
    V_t^* = λ(V_{t−1}^* + Y_{t−1}),

where V_t^* = V_t/k, though not an average, is an exponentially weighted sum,

    V_t^* = ∑_{i=1}^{N} λ^i Y_{t−i}.
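The parallel suggests estimating λ exactly as one would for IGARCH. A univariate sketch of our own, on simulated returns with a simple grid search standing in for a proper optimizer:

```python
import numpy as np

def igarch_nll(lam, r, s2_init):
    """Negative log-likelihood of r_t ~ N(0, sigma_t^2) with
    sigma_t^2 = lam * sigma_{t-1}^2 + (1 - lam) * r_{t-1}^2,
    factored as prod_t p(r_t | D_{t-1})."""
    s2, nll = s2_init, 0.0
    for rt in r:
        nll += 0.5 * (np.log(2 * np.pi * s2) + rt**2 / s2)
        s2 = lam * s2 + (1 - lam) * rt**2   # tomorrow's conditional variance
    return nll

rng = np.random.default_rng(3)
r = 0.01 * rng.standard_normal(500)          # simulated daily returns
grid = np.linspace(0.50, 0.99, 50)
nlls = [igarch_nll(lam, r, r.var()) for lam in grid]
lam_hat = grid[int(np.argmin(nlls))]
```

The same one-pass structure carries over to the matrix-variate case, with the β'_m log-density replacing the Gaussian one.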

9 Conclusion

In this paper, we have shown that exponentially smoothed realized kernels produce better one-step-ahead forecasts than factor stochastic volatility; that one may use information extracted from realized kernels in conjunction with dynamic factor loadings FSVol models to improve upon factor stochastic volatility, though these models are still inferior to exponential smoothing; that exponentially smoothing realized kernels still works well when assets are illiquid; that exponential smoothing is robust to the number of assets one considers; and that one may wrap exponentially smoothing realized kernels in a statistical model. The virtue of this model is its simplicity, which manifests itself in closed form densities for forward filtering and backward sampling, in its closed form conditional distributions of Y_t|D_{t−1}, in that it has only two degrees of freedom to be estimated for the entire system, and in that it is still valid for rank-deficient observations.

From a practical point of view, the take-home message is that exponential smoothing is a good forecasting procedure for short horizons when working with realized covariances. It is a computationally inexpensive procedure and useful when working with many assets. Computing forecasts for the covariance matrices of as many as 90 assets using exponential smoothing, as in section 7, is not a problem, whereas more complicated models may struggle to work with data of that size. Exponential smoothing also bypasses the problem of model error. When working with high-dimensional covariance matrix-variate data, the model is certainly wrong. Model error infects the estimated parameters of the system and the subsequent forecasts one makes. As seen in section 8.3, one may arrive at point estimates of the parameters that are far from optimal when calculated using a model. Thus, even though the model's forecasting procedure is identical to exponential smoothing, its likelihood is not as useful as the measure of loss one is interested in when estimating the parameters of the system. By specifying less structure, that is, by only specifying a forecasting procedure, one avoids having to pick a likelihood that is destined to be incorrect, and hence one may ultimately produce more reliable predictions.

A Deriving the β'_m distribution

Fact 3. Suppose that k, n > m, (Y_t|P_t) ∼ W_m(k, (kP_t)^{-1}), and (P_t|D_{t−1}) ∼ W_m(n, V_t^{-1}). Then the density of (Y_t|D_{t−1}) is

    β'_m(k/2, n/2, V_t/k) := Γ_m(ν/2) / [Γ_m(n/2) Γ_m(k/2)] · |Y_t|^{(k−m−1)/2} |V_t/k|^{n/2} / |V_t/k + Y_t|^{ν/2}.


Proof. Consider the joint density p(Y_t|P_t) p(P_t|D_{t−1}):

    |kP_t|^{k/2} / [2^{km/2} Γ_m(k/2)] · |Y_t|^{(k−m−1)/2} exp{−(1/2) tr kP_tY_t} · |V_t|^{n/2} / [2^{nm/2} Γ_m(n/2)] · |P_t|^{(n−m−1)/2} exp{−(1/2) tr V_tP_t},

which is

    |Y_t|^{(k−m−1)/2} / [2^{km/2} Γ_m(k/2)] · |V_t|^{n/2} / [2^{nm/2} Γ_m(n/2)] · k^{km/2} |P_t|^{(ν−m−1)/2} exp{−(1/2) tr (V_t + kY_t) P_t}.

The latter terms are the kernel of a Wishart distribution in P_t. Integrating the kernel produces

    2^{νm/2} Γ_m(ν/2) / |V_t + kY_t|^{ν/2}.

Hence the density of Y_t|D_{t−1} is

    Γ_m(ν/2) k^{km/2} / [Γ_m(n/2) Γ_m(k/2)] · |Y_t|^{(k−m−1)/2} |V_t|^{n/2} / |V_t + kY_t|^{ν/2}.

Factoring the k into the denominator gives

    Γ_m(ν/2) / [Γ_m(n/2) Γ_m(k/2)] · |Y_t|^{(k−m−1)/2} |V_t/k|^{n/2} / |V_t/k + Y_t|^{ν/2}.

References

O. Aguilar. Latent Structure in Bayesian Multivariate Time Series Models. PhD thesis, Duke University, 1998.

O. Aguilar and M. West. Bayesian dynamic factor models and portfolio allocation. Journal of Business and Economic Statistics, 18(3):338–357, July 2000.

Y. Ait-Sahalia, P. A. Mykland, and L. Zhang. Ultra high frequency volatility estimation with dependent microstructure noise. Journal of Econometrics, 160(1):160–175, January 2011.

T. G. Andersen, T. Bollerslev, F. X. Diebold, and H. Ebens. The distribution of realized stock return volatility. Journal of Financial Economics, 61:43–76, 2001.

T. G. Andersen, T. Bollerslev, F. X. Diebold, and P. Labys. Modeling and forecasting realized volatility. Econometrica, 71(2):579–625, March 2003.

O. E. Barndorff-Nielsen and N. Shephard. Econometric analysis of realized covariation: High frequency based covariance, regression, and correlation in financial economics. Econometrica, 72(3):885–925, May 2004.

O. E. Barndorff-Nielsen, P. R. Hansen, A. Lunde, and N. Shephard. Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica, 76:1481–1536, 2008.

O. E. Barndorff-Nielsen, P. R. Hansen, A. Lunde, and N. Shephard. Realized kernels in practice: Trades and quotes. Econometrics Journal, 12(3):C1–C32, 2009.

O. E. Barndorff-Nielsen, P. R. Hansen, A. Lunde, and N. Shephard. Multivariate realized kernels: Consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading. Journal of Econometrics, 162:149–169, 2011.

A. Basilevsky. Statistical Factor Analysis and Related Methods. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, Inc., 1994.

G. H. Bauer and K. Vorkink. Forecasting multivariate realized stock market volatility. Journal of Econometrics, 160:93–101, 2011.

T. Bollerslev. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31:307–327, 1986.

C. M. Carvalho, H. F. Lopes, and O. Aguilar. Dynamic stock selection strategies: A structured factor model framework. In Bayesian Statistics 9. Oxford University Press, 2011.

S. Chib, F. Nardari, and N. Shephard. Analysis of high dimensional stochastic volatility models. Journal of Econometrics, 134:341–371, 2006.

S. Chib, Y. Omori, and M. Asai. Multivariate stochastic volatility. In T. Andersen, R. Davis, J.-P. Kreiss, and T. Mikosch, editors, Handbook of Financial Time Series, pages 365–400. Springer-Verlag, 2009.

R. Chiriac and V. Voev. Modelling and forecasting multivariate realized volatility. Journal of Applied Econometrics, 2010.

J. A. Díaz-García and R. G. Jaimez. Proof of the conjectures of H. Uhlig on the singular multivariate beta and the Jacobian of a certain matrix transformation. The Annals of Statistics, 25:2018–2023, 1997.

E. F. Fama and K. R. French. Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33:3–56, 1993.

E. B. Fox and M. West. Autoregressive models for variance matrices: Stationary inverse Wishart processes. Technical report, Duke University, July 2011.

S. Frühwirth-Schnatter. Data augmentation and dynamic linear models. Journal of Time Series Analysis, 15:183–202, 1994.

C. Gourieroux, J. Jasiak, and R. Sufana. The Wishart autoregressive process of multivariate stochastic volatility. Journal of Econometrics, 150:167–181, 2009.

S. Hays, H. Shen, and J. Z. Huang. Functional dynamic factor models with applications to yield curve forecasting. Annals of Applied Statistics, 6(3):870–894, 2012.

J. Jacod and A. N. Shiryaev. Limit Theorems for Stochastic Processes. Springer, 2003.

E. Jacquier, N. G. Polson, and P. E. Rossi. Bayesian analysis of stochastic volatility models. Journal of Business and Economic Statistics, 12:371–389, 1994.

S. Kim, N. Shephard, and S. Chib. Stochastic volatility: Likelihood inference and comparison with ARCH models. The Review of Economic Studies, 65(3):361–393, July 1998.

S. J. Koopman, B. Jungbacker, and E. Hol. Forecasting daily variability of the S&P 100 stock index using historical, realised and implied volatility measurements. Journal of Empirical Finance, 12:445–475, 2005.

Q. Liu. On portfolio optimization: How and when do we benefit from high-frequency data? Journal of Applied Econometrics, 24:560–582, 2009.

H. F. Lopes and C. M. Carvalho. Factor stochastic volatility with time varying loadings and Markov switching regimes. Journal of Statistical Planning and Inference, 137:3082–3091, 2007.

H. F. Lopes and H. S. Migon. Case Studies in Bayesian Statistics, chapter Comovements and Contagion in Emergent Markets: Stock Indexes Volatilities, pages 287–302. Springer, 2002.

H. F. Lopes and M. West. Bayesian model assessment in factor analysis. Statistica Sinica, 14:41–67, 2004.

R. J. Muirhead. Aspects of Multivariate Statistical Theory. Wiley, 1982.

D. Noureldin, N. Shephard, and K. Sheppard. Multivariate high-frequency-based volatility (HEAVY) models. Journal of Applied Econometrics, 27(6):907–933, 2011.

I. Olkin and H. Rubin. Multivariate beta distributions and independence properties of the Wishart distributions. Annals of Mathematical Statistics, 35(1):261–269, 1964.

A. Philipov and M. E. Glickman. Multivariate stochastic volatility via Wishart processes. Journal of Business and Economic Statistics, 24:313–328, July 2006.

M. K. Pitt and N. Shephard. Time varying covariances: A factor stochastic volatility approach. Bayesian Statistics, 6:547–570, 1999.

M. K. Pitt and S. G. Walker. Constructing stationary time series models using auxiliary variables with applications. Journal of the American Statistical Association, 100(470):554–564, 2005.

R. Prado and M. West. Time Series: Modeling, Computation, and Inference, chapter Multivariate DLMs and Covariance Models, pages 263–319. Chapman & Hall/CRC, 2010.

J. M. Quintana and M. West. An analysis of international exchange rates using multivariate DLMs. The Statistician, 36:275–281, 1987.

B. Rosenberg and W. McKibben. The prediction of systematic and specific risk in common stocks. The Journal of Financial and Quantitative Analysis, 8:317–333, 1973.

W. F. Sharpe. Capital asset prices: A theory of market equilibrium under conditions of risk. The Journal of Finance, 19:425–442, 1964.

N. Shephard. Local scale models: State space alternative to integrated GARCH processes. Journal of Econometrics, 60:181–202, 1994.

H. Uhlig. On singular Wishart and singular multivariate beta distributions. The Annals of Statistics, 22(1):395–405, 1994.

H. Uhlig. Bayesian vector autoregressions with stochastic volatility. Econometrica, 65(1):59–73, January 1997.

L. Zhang, P. A. Mykland, and Y. Ait-Sahalia. A tale of two time scales: Determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association, 100(472):1394–1411, 2005.

X. Zhou, J. Nakajima, and M. West. Dynamic dependent factor models: Improving forecasts and portfolio decisions in financial time series. Technical Report 2012-09, Duke University, 2012. URL http://ftp.stat.duke.edu/WorkingPapers/11-16.html. Under review at: International Journal of Forecasting.
