
Out-of-Sample Performance of Discrete-Time Spot Interest Rate Models

Yongmiao HONG
Department of Economics and Department of Statistical Sciences, Cornell University, Ithaca, NY 14850
Department of Economics, Tsinghua University, Beijing, China 100084 (yh20@cornell.edu)

Haitao LI
Johnson Graduate School of Management, Cornell University, Ithaca, NY 14850 (hl70@cornell.edu)

Feng ZHAO
Department of Economics, Cornell University, Ithaca, NY 14850 (fz17@cornell.edu)

We provide a comprehensive analysis of the out-of-sample performance of a wide variety of spot rate models in forecasting the probability density of future interest rates. Although the most parsimonious models perform best in forecasting the conditional mean of many financial time series, we find that the spot rate models that incorporate conditional heteroscedasticity and excess kurtosis or heavy tails have better density forecasts. Generalized autoregressive conditional heteroscedasticity significantly improves the modeling of the conditional variance and kurtosis, whereas regime switching and jumps improve the modeling of the marginal density of interest rates. Our analysis shows that the sophisticated spot rate models in the existing literature are important for applications involving density forecasts of interest rates.

KEY WORDS: Density forecast; Generalized autoregressive conditional heteroscedasticity; Generalized spectrum; Jumps; Parameter estimation uncertainty; Regime switching.

1. INTRODUCTION

The short-term interest rate plays an important role in many areas of asset pricing studies. For example, the instantaneous risk-free interest rate, or the so-called “spot rate,” is the state variable that determines the evolution of the yield curve in the well-known term structure models of Vasicek (1977) and Cox, Ingersoll, and Ross (CIR) (1985). The spot rate is thus of fundamental importance to the pricing of fixed-income securities and the management of interest rate risk. Many interest rate models have been proposed, and over the last decade, a vast body of literature has been developed to rigorously estimate and test these models using high-quality interest rate data. (See, e.g., Chapman and Pearson 2001 for a survey of the empirical literature.)

Despite the progress that has been made in modeling interest rate dynamics, most existing studies have typically focused on the in-sample fit of historical interest rates and ignored out-of-sample forecasts of future interest rates. In-sample diagnostic analysis is important and can reveal useful information about possible sources of model misspecifications. In many financial applications, such as the pricing and hedging of fixed-income securities and the management of interest rate risk, however, what matters most is the evolution of the interest rate in the future, not in the past.

In general, there is no guarantee that a model that fits historical data well will also perform well out-of-sample, for at least three important reasons. First, the extensive search for more complicated models using the same (or similar) dataset(s) may suffer from the so-called “data-snooping bias,” as pointed out by Lo and MacKinlay (1989) and White (2000). A more complicated model can always fit a given dataset better than simpler models, but it may overfit some idiosyncratic features of the data without capturing the true data-generating process. Out-of-sample evaluation will alleviate, if not eliminate completely, such data-snooping bias. Second, an overparameterized model contains a large number of estimated parameters and inevitably exhibits excessive sampling variation in parameter estimation, which in turn may adversely affect the out-of-sample forecast performance. Third, a model that fits a historical dataset well may not forecast the future well because of unforeseen structural changes or regime shifts in the data-generating process. Therefore, from both practical and theoretical standpoints, in-sample analysis alone is not adequate, and it is necessary to examine the out-of-sample predictive ability of spot rate models.

We contribute to the literature by providing the first comprehensive empirical analysis (to our knowledge) of the out-of-sample performance of a wide variety of popular spot rate models. Although some existing studies (e.g., Gray 1996; Bali 2000; Duffee 2002) have conducted out-of-sample analysis of interest rate models, what distinguishes our study is that we focus on forecasting the probability density of future interest rates, rather than just the conditional mean or the first few conditional moments. This distinction is important because interest rates, like most other financial time series, are highly non-Gaussian. Consequently, one must go beyond the conditional mean and variance to get a complete picture of interest rate dynamics. The conditional probability density characterizes the full dynamics of an interest rate model and thus essentially checks all conditional moments simultaneously (if the moments exist).

Density forecasts are important not only for statistical evaluation, but also for many economic and financial applications. For example, the booming industry of financial risk management is essentially dedicated to providing density forecasts for important economic variables and portfolio returns, and to tracking certain aspects of distribution, such as value at risk (VaR), to quantify the risk exposure of a portfolio (e.g., Morgan 1996; Duffie and Pan 1997; Jorion 2000). More generally, modern risk control techniques all involve some form of density forecast, the quality of which has real impact on the efficacy of capital asset/liability allocation. In macroeconomics, monetary authorities in the United States and United Kingdom (the Federal Reserve Bank of Philadelphia and the Bank of England) have been conducting quarterly surveys on density forecasts for inflation and output growth to help set their policy instruments (e.g., inflation target). There is also a growing literature on extracting density forecasts from options prices to obtain useful information on otherwise unobservable market expectations (e.g., Fackler and King 1990; Jackwerth and Rubinstein 1996; Soderlind and Svensson 1997; Ait-Sahalia and Lo 1998).

© 2004 American Statistical Association. Journal of Business & Economic Statistics, October 2004, Vol. 22, No. 4. DOI 10.1198/073500104000000433

In the recent development of time series econometrics, there is a growing interest in out-of-sample probability distribution forecasts and their evaluation, motivated from the context of decision-making under uncertainty. Important works in this area include those of Bera and Ghosh (2002), Berkowitz (2001), Christoffersen and Diebold (1996, 1997), Clements and Smith (2000), Diebold, Gunther, and Tay (1998), Diebold, Hahn, and Tay (1999), Diebold, Tay, and Wallis (1999), Granger (1999), and Granger and Pesaran (2000). One of the most important issues in density forecasting is to evaluate the quality of a forecast (Granger 1999). Suboptimal density forecasts will have real adverse impact in practice. For example, an excessive forecast for VaR would force risk managers and financial institutions to hold too much capital, imposing an additional cost. Suboptimal density forecasts for important macroeconomic variables may lead to inappropriate policy decisions (e.g., inappropriate level and timing in setting interest rates), which could have serious consequences for the real economy.

Evaluating density forecasts, however, is nontrivial, simply because the probability density function of an underlying process is not observable even ex post. Unlike for point forecasts, there are few statistical tools for evaluating density forecasts. In a pioneering contribution, Diebold et al. (1998) evaluated density forecasts by examining the probability integral transforms of the data with respect to a density forecast model. Such a transformed series is often called the “generalized residuals” of the density forecast model. It should be iid U[0,1] if the density forecast model correctly captures the full dynamics of the underlying process. Any departure from iid U[0,1] is evidence of suboptimal density forecasts and model misspecification.

Diebold et al. (1998) used an intuitive graphical method to separately examine the iid and uniform properties of the “generalized residuals.” This method is simple and informative about possible sources of suboptimality in density forecasts. Hong (2003) recently developed an omnibus evaluation procedure for density forecasts by measuring the departure of the “generalized residuals” from iid U[0,1]. The evaluation statistic provides a metric of the distance between the density forecast model and the true data-generating process. The most appealing feature of this new test is its omnibus ability to detect a wide range of suboptimal density forecasts, thanks to the use of the generalized spectrum. The latter was introduced by Hong (1999) as an analytic tool for nonlinear time series and is based on the characteristic function in a time series context. There has been an increasing interest in using the characteristic function in financial econometrics (see, e.g., Chacko and Viceira 2003; Ghysels, Carrasco, Chernov, and Florens 2001; Hong and Lee 2003a,b; Jiang and Knight 1997; Singleton 2001).

Applying Hong’s (2003) procedure, we provide a comprehensive empirical analysis of the density forecast performance of a variety of popular spot rate models, including single-factor diffusion, generalized autoregressive conditional heteroscedasticity (GARCH), regime-switching, and jump-diffusion models. The in-sample performance of these models has been extensively studied, but their out-of-sample performance, especially their density forecast performance, is still largely unknown. We find that although previous studies have shown that simpler models, such as the random walk model, tend to provide better forecasts for the conditional mean of many financial time series, including interest rates and exchange rates (e.g., Duffee 2002; Meese and Rogoff 1983), more sophisticated spot rate models that capture volatility clustering and excess kurtosis and heavy tails of interest rates have better density forecasts. GARCH significantly improves the modeling of the dynamics of the conditional variance and kurtosis of the generalized residuals, whereas regime switching and jumps significantly improve the modeling of the marginal density of interest rates. Our analysis shows that the sophisticated spot rate models in the existing literature can indeed capture some important features of interest rates and are relevant to applications involving density forecasts of interest rates.

The article is organized as follows. In Section 2 we discuss the methodology for density forecast evaluation. In Section 3 we introduce a variety of popular spot rate models. In Section 4 we describe data, estimation methods, and the in-sample performance of each model. In Section 5 we subject each model to out-of-sample density forecast evaluation, and in Section 6 we provide concluding remarks. In the Appendix we discuss Hong’s (2003) evaluation procedure for density forecasts.

2. OUT-OF-SAMPLE DENSITY FORECAST EVALUATION

The probability density function is a well-established tool for characterizing uncertainty in economics and finance (e.g., Rothschild and Stiglitz 1970). The importance of density forecasts has been widely recognized in recent literature due to the works of Diebold et al. (1998), Granger (1999), and Granger and Pesaran (2000), among many others. These authors showed that accurate density forecasts are essential for decision making under uncertainty when the forecaster’s objective function is asymmetric and the underlying process is non-Gaussian. In a decision-theoretic context, Diebold et al. (1998) and Granger and Pesaran (2000) showed that if a density forecast coincides with the true conditional density of the data-generating process, then it will be preferred by all forecast users regardless of their objective functions (e.g., risk attitudes). Thus testing the optimality of a forecast boils down to checking whether the density forecast model can capture the true data-generating process. This is a challenging job, because we never observe an ex post density. So far, there are few statistical evaluation procedures for density forecasts.

Diebold et al. (1998) used the probability integral transform of data with respect to the density forecast model to assess the optimality of density forecasts. Extending a result established by Rosenblatt (1952), they showed that if the model conditional density is specified correctly, then the probability integral transformed series should be iid U[0,1].

Specifically, for a given model of interest rate $r_t$, there is a model-implied conditional density

$$\frac{\partial}{\partial r} P(r_t \le r \mid I_{t-1}, \theta) = p(r, t \mid I_{t-1}, \theta),$$

where $\theta$ is an unknown finite-dimensional parameter vector and $I_{t-1} \equiv \{r_{t-1}, \ldots, r_1\}$ is the information set available at time $t-1$. Suppose that we have a random sample $\{r_t\}_{t=1}^{N}$ of size $N$, which we divide into two subsets: an estimation sample $\{r_t\}_{t=1}^{R}$ of size $R$ and a prediction sample $\{r_t\}_{t=R+1}^{N}$ of size $n = N - R$. We can then define the dynamic probability integral transform of the data $\{r_t\}_{t=R+1}^{N}$ with respect to the model conditional density:

$$Z_t(\theta) \equiv \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \theta)\, dr, \qquad t = R+1, \ldots, N.$$

Suppose that the model is correctly specified in the sense that there exists some $\theta_0$ such that $p(r, t \mid I_{t-1}, \theta_0)$ coincides with the true conditional density of $r_t$. Then the transformed sequence $\{Z_t(\theta_0)\}$ is iid U[0,1].
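The reasoning behind this result can be written out in a few lines. The sketch below is an illustration, not part of the original derivation: it introduces the shorthand $F_t$ for the true conditional cdf and assumes that $F_t$ is continuous and strictly increasing so that its inverse exists.

```latex
% Shorthand (an assumption of this sketch): F_t(r) = P(r_t \le r \mid I_{t-1}, \theta_0),
% taken to be continuous and strictly increasing, so that Z_t(\theta_0) = F_t(r_t).
\begin{aligned}
P\bigl(Z_t(\theta_0) \le u \mid I_{t-1}\bigr)
  &= P\bigl(F_t(r_t) \le u \mid I_{t-1}\bigr) \\
  &= P\bigl(r_t \le F_t^{-1}(u) \mid I_{t-1}\bigr) \\
  &= F_t\bigl(F_t^{-1}(u)\bigr) = u, \qquad u \in (0,1).
\end{aligned}
% The conditional distribution of Z_t(\theta_0) is U[0,1] whatever the value of I_{t-1}.
% Because Z_{t-1}(\theta_0), Z_{t-2}(\theta_0), \ldots are functions of I_{t-1},
% Z_t(\theta_0) is independent of its own past, which gives the iid property.
```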

For example, consider the discretized Vasicek model

$$\Delta r_t = \alpha_0 + \alpha_1 r_{t-1} + \sigma z_t,$$

where $\Delta r_t \equiv r_t - r_{t-1}$ and $z_t \sim$ iid $N(0,1)$. For this model, the conditional density of $r_t$ given $I_{t-1}$ is

$$p(r, t \mid I_{t-1}, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2} \bigl[ r - \bigl(\alpha_0 + (1 + \alpha_1) r_{t-1}\bigr) \bigr]^2 \right\}, \qquad r \in (-\infty, \infty),$$

where $\theta = (\alpha_0, \alpha_1, \sigma)'$. Suppose that $p(r, t \mid I_{t-1}, \theta_0) = p_0(r, t \mid I_{t-1})$ for some $\theta_0$, where $p_0(r, t \mid I_{t-1})$ is the true conditional density of $r_t$. Then

$$Z_t(\theta_0) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \theta_0)\, dr = \Phi(z_t) \sim \text{iid } U[0,1],$$

where $\Phi(\cdot)$ is the $N(0,1)$ cumulative distribution function (cdf).

As noted earlier, the transformed series $\{Z_t(\hat\theta)\}$ is called the “generalized residual” of model $p(r, t \mid I_{t-1}, \theta)$, where $\hat\theta$ is a consistent estimator for $\theta_0$. This series provides a convenient approach for evaluating the density forecast model $p(r, t \mid I_{t-1}, \theta)$. Intuitively, the U[0,1] property characterizes the correct specification for the stationary distribution of $r_t$, and the iid property characterizes correct dynamic specification for $r_t$. If $\{Z_t(\theta)\}$ is not iid U[0,1] for all $\theta$, then the density forecast model $p(r, t \mid I_{t-1}, \theta)$ is not optimal, and there is room for further improvement.
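The Vasicek example can be made concrete with a short simulation. The following is a minimal sketch using only the Python standard library; the function names and the parameter values are illustrative assumptions, not estimates or code from the article. It computes the generalized residuals once with the parameters that generated the data and once with an overstated volatility.

```python
import math
import random


def normal_cdf(x):
    """Standard normal cdf, written with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def vasicek_pit(rates, alpha0, alpha1, sigma):
    """Generalized residuals Z_t for the discretized Vasicek model.

    Given r_{t-1}, the model says r_t is normal with mean
    alpha0 + (1 + alpha1) * r_{t-1} and standard deviation sigma,
    so Z_t is the normal cdf of the standardized one-step forecast error.
    """
    z = []
    for prev, curr in zip(rates[:-1], rates[1:]):
        mean = alpha0 + (1.0 + alpha1) * prev
        z.append(normal_cdf((curr - mean) / sigma))
    return z


def simulate_vasicek(n, r0, alpha0, alpha1, sigma, seed=0):
    """Simulate r_t - r_{t-1} = alpha0 + alpha1 * r_{t-1} + sigma * z_t."""
    rng = random.Random(seed)
    rates = [r0]
    for _ in range(n):
        prev = rates[-1]
        rates.append(prev + alpha0 + alpha1 * prev + sigma * rng.gauss(0.0, 1.0))
    return rates


def mean_and_variance(values):
    """Sample mean and (population-style) variance."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


# Illustrative (made-up) parameter values, not estimates from the article.
rates = simulate_vasicek(n=5000, r0=0.05, alpha0=0.001, alpha1=-0.02, sigma=0.002)
z_true = vasicek_pit(rates, alpha0=0.001, alpha1=-0.02, sigma=0.002)
z_wrong = vasicek_pit(rates, alpha0=0.001, alpha1=-0.02, sigma=0.004)

# U[0,1] has mean 1/2 and variance 1/12; overstating sigma squeezes Z_t toward 1/2.
mean_true, var_true = mean_and_variance(z_true)
mean_wrong, var_wrong = mean_and_variance(z_wrong)
```

With the generating parameters, the sample mean and variance of the residuals should sit close to 1/2 and 1/12. With the overstated volatility the variance falls well below 1/12, which is the kind of departure from U[0,1] that signals a misspecified density forecast.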

To test the joint hypothesis of iid U[0,1] for $\{Z_t(\theta)\}$ is nontrivial. One may suggest using the well-known Kolmogorov–Smirnov test, which unfortunately checks U[0,1] under the iid assumption rather than checking iid U[0,1] jointly. It will miss the alternatives for which $\{Z_t(\theta)\}$ is uniform but not iid. Similarly, tests for iid alone will miss the alternatives for which $\{Z_t(\theta)\}$ is iid but not U[0,1]. Moreover, parameter estimation uncertainty is expected to affect the asymptotic distribution of the Kolmogorov–Smirnov test statistic.
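The first limitation can be illustrated numerically. The sketch below is an illustration under stated assumptions, not a procedure from the article: it builds a series whose marginal distribution is uniform but which is strongly serially dependent (each uniform draw is simply repeated once), then compares the Kolmogorov–Smirnov distance from U[0,1] with the lag-1 autocorrelation. The helper names are hypothetical.

```python
import random


def ks_uniform(z):
    """Kolmogorov-Smirnov distance between the empirical cdf of z and U[0,1]."""
    s = sorted(z)
    n = len(s)
    d = 0.0
    for i, value in enumerate(s):
        d = max(d, (i + 1) / n - value, value - i / n)
    return d


def lag1_autocorr(z):
    """Sample lag-1 autocorrelation."""
    n = len(z)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z)
    cov = sum((z[t] - mean) * (z[t - 1] - mean) for t in range(1, n))
    return cov / var


rng = random.Random(1)
iid = [rng.random() for _ in range(4000)]

# Uniform marginal, strong serial dependence: every draw is repeated once.
dependent = []
for _ in range(2000):
    u = rng.random()
    dependent.extend([u, u])

ks_iid, ks_dep = ks_uniform(iid), ks_uniform(dependent)
rho_iid, rho_dep = lag1_autocorr(iid), lag1_autocorr(dependent)
```

Both series have a small Kolmogorov–Smirnov distance, so a check of uniformity alone would not separate them, whereas the lag-1 autocorrelation of the dependent series is around one-half by construction.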

Diebold et al. (1998) used the autocorrelations of $Z_t^m(\hat\theta)$ to check the iid property and the histogram of $Z_t(\hat\theta)$ to check the U[0,1] property. These graphical procedures are simple and informative. To compare and rank different models, however, it is desirable to use a single evaluation criterion. Hong (2003) recently developed an omnibus evaluation procedure for out-of-sample density forecasts that tests the joint hypothesis of iid U[0,1] and explicitly addresses the impact of parameter estimation uncertainty on the evaluation statistics. This is achieved using a generalized spectral approach. The generalized spectrum was first introduced by Hong (1999) as an analytic tool for nonlinear time series analysis. It is based on the characteristic function transformation embedded in a spectral framework. Unlike the power spectrum, the generalized spectrum can capture any kind of pairwise serial dependence across various lags, including those with zero autocorrelation [e.g., an ARCH process with iid $N(0,1)$ innovations].
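The two graphical diagnostics can be computed with a few lines of code before plotting. This is a rough sketch, not the authors' implementation: it takes powers of the demeaned series (one common convention, and an assumption here), and the function names are hypothetical.

```python
import random


def power_autocorr(z, m, max_lag):
    """Sample autocorrelations of (Z_t - mean)^m for lags 1..max_lag."""
    n = len(z)
    mean = sum(z) / n
    x = [(v - mean) ** m for v in z]
    xbar = sum(x) / n
    denom = sum((v - xbar) ** 2 for v in x)
    return [
        sum((x[t] - xbar) * (x[t - j] - xbar) for t in range(j, n)) / denom
        for j in range(1, max_lag + 1)
    ]


def histogram_counts(z, bins):
    """Counts of Z_t in equal-width bins on [0, 1]; the value 1.0 goes in the last bin."""
    counts = [0] * bins
    for v in z:
        counts[min(int(v * bins), bins - 1)] += 1
    return counts


# Under the null, generalized residuals behave like iid U[0,1] draws.
rng = random.Random(7)
z = [rng.random() for _ in range(2000)]

acf_level = power_autocorr(z, m=1, max_lag=5)   # dependence in level
acf_square = power_autocorr(z, m=2, max_lag=5)  # dependence in volatility
counts = histogram_counts(z, bins=10)           # roughly 200 per bin if uniform
```

For a well-specified model the autocorrelations should be near zero at every lag and the bin counts roughly equal. A hump-shaped or U-shaped histogram points to a misspecified marginal density, while large autocorrelations in the squared series point to neglected volatility dynamics.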

Specifically, Hong (2003) introduced a modified generalized spectral density function that incorporates the information of iid U[0,1] under the null hypothesis. Consequently, the modified generalized spectrum becomes a known “flat” function under the null hypothesis of iid U[0,1]. Whenever $Z_t(\theta)$ depends on $Z_{t-j}(\theta)$ for some $j > 0$ or $Z_t(\theta)$ is not U[0,1], the modified generalized spectrum will be nonflat as a function of frequency $\omega$. Hong (2003) proposed an evaluation statistic (denoted by $M_1$) for out-of-sample density forecasts by comparing a kernel estimator of the modified generalized spectrum with the flat spectrum.

The most attractive feature of $M_1$ is its omnibus property: it has power against a wide range of suboptimal density forecasts, thanks to the use of the characteristic function in a spectral framework. The $M_1$ statistic can be viewed as an omnibus metric measuring the departure of the density forecast model from the true data-generating process. A better density forecast model is expected to have a smaller value of $M_1$, because its generalized residual series $\{Z_t(\theta)\}$ is closer to having the iid U[0,1] property. Thus $M_1$ can be used to rank competing density forecast models in terms of their deviations from optimality. (See the Appendix for more discussion.)

When a model is rejected using $M_1$, it is interesting to explore possible reasons for the rejection. The model generalized residuals $\{Z_t(\theta)\}$ contain much information on model misspecification. As noted earlier, Diebold et al. (1998) have illustrated how to use the histograms of $\{Z_t(\theta)\}$ and autocorrelograms of $\{Z_t^m(\theta)\}$ to identify sources of suboptimal density forecasts. Although intuitive and convenient, these graphical methods ignore the impact of parameter estimation uncertainty in $\hat\theta$ on the asymptotic distribution of evaluation statistics, which generally exists even when $n \to \infty$. Hong (2003) provided a class of rigorous, asymptotically $N(0,1)$, separate inference statistics $M(m, l)$ that measure whether the cross-correlations between $Z_t^m(\theta)$ and $Z_{t-|j|}^l(\theta)$ are significantly different from 0. Although the moments of the generalized residuals $\{Z_t\}$ are not exactly the same as those of the original data $\{r_t\}$, they are highly correlated. In particular, the choice of $(m, l) = (1,1)$, $(2,2)$, $(3,3)$, and $(4,4)$ is very sensitive to autocorrelations in level, volatility, skewness, and kurtosis of $\{r_t\}$. Furthermore, the choices of $(m, l) = (1,2)$ and $(2,1)$ are sensitive to the “ARCH-in-mean” effect and the “leverage” effect. Different choices of order $(m, l)$ can thus illustrate various dynamic aspects of the underlying process $r_t$. (Again, see the Appendix for more discussion.)

3. SPOT RATE MODELS

We apply Hong’s (2003) procedure to evaluate the density forecast performance of a variety of popular spot rate models, including single-factor diffusion, GARCH, regime-switching, and jump-diffusion models. We now discuss these models in detail.

3.1 Single-Factor Diffusion Models

One important class of spot rate models is the continuous-time diffusion models, which have been widely used in modern finance because of their convenience in pricing fixed-income securities. Specifically, the spot rate is assumed to follow a single-factor diffusion,

$$dr_t = \mu(r_t, \theta)\, dt + \sigma(r_t, \theta)\, dW_t,$$

where $\mu(r_t, \theta)$ and $\sigma(r_t, \theta)$ are the drift and diffusion functions and $W_t$ is a standard Brownian motion. For diffusion models, $\mu(r_t, \theta)$ and $\sigma(r_t, \theta)$ completely determine the model transition density, which in turn captures the full dynamics of $r_t$. Table 1 lists a variety of discretized diffusion models examined in our analysis, which are nested by Ait-Sahalia’s (1996) nonlinear drift model:

$$\Delta r_t = \alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} z_t, \tag{1}$$

where $\Delta r_t \equiv r_t - r_{t-1}$ and $z_t \sim$ iid $N(0,1)$. To be consistent with the other models considered in this article, we study the discretized version of the single-factor diffusion and jump-diffusion models. The discretization bias for daily data that we use in this article is unlikely to be significant (e.g., Stanton 1997; Das 2002). In (1), we allow the drift to have zero, linear, and nonlinear specifications and allow the diffusion to be a constant or to depend on the interest rate level, which we refer to as the “level effect.” We term the volatility specification in which the elasticity parameter $\rho$ is estimated from the data the constant elasticity volatility (CEV). Density forecasts for the spot rate $r_t$ are given by the model-implied transition density $p(r_t, t \mid I_{t-1}, \hat\theta)$, where $\hat\theta$ is a parameter estimator obtained by the maximum likelihood estimate (MLE) method. Because we study discretized diffusion models, the MLE method is suitable. For the continuous-time diffusion models, we can use the closed-form likelihood function if available, or the approximated likelihood function via Hermite expansions (see Ait-Sahalia 1999, 2002).
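To see how the nested specification (1) generates the individual models, it helps to write one simulation step as a function. The sketch below rests on two assumptions: that the left-hand side of (1) is the change in the rate, and that the $\alpha_{-1}$ term multiplies $1/r_{t-1}$, as in Ait-Sahalia's nonlinear drift. The function names and parameter values are hypothetical.

```python
def drift(r_prev, a_m1=0.0, a0=0.0, a1=0.0, a2=0.0):
    """Nonlinear drift a_m1 / r + a0 + a1 * r + a2 * r**2, evaluated at r_prev > 0."""
    return a_m1 / r_prev + a0 + a1 * r_prev + a2 * r_prev ** 2


def diffusion_step(r_prev, z, sigma, rho=0.0, **drift_params):
    """One step of the discretized model.

    r_t = r_{t-1} + drift(r_{t-1}) + sigma * r_{t-1}**rho * z_t,
    where z is a standard normal shock supplied by the caller.
    """
    return r_prev + drift(r_prev, **drift_params) + sigma * r_prev ** rho * z


# Special cases of the nested specification (parameter values are made up).
# Vasicek: linear drift, constant volatility (rho = 0).
vasicek = diffusion_step(0.05, z=1.0, sigma=0.01, rho=0.0, a0=0.001, a1=-0.02)
# CIR: linear drift, square-root level effect (rho = 0.5).
cir = diffusion_step(0.05, z=1.0, sigma=0.01, rho=0.5, a0=0.001, a1=-0.02)
# Random walk with drift: constant mean, constant volatility.
random_walk = diffusion_step(0.05, z=0.0, sigma=0.01, a0=0.001)
```

Setting parameters to zero or fixing $\rho$ recovers each row of the diffusion block in Table 1, which is why a single likelihood routine can in principle serve all eight models.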

3.2 GARCH Models

Despite the popularity of single-factor diffusion models, many studies (e.g., Brenner, Harjes, and Kroner 1996; Andersen and Lund 1997) have shown that these models are unreasonably restrictive by requiring that the interest rate volatility depend solely on the interest rate level. To capture the well-known persistent volatility clustering in interest rates, Brenner et al. (1996) introduced various GARCH specifications for volatility and showed that GARCH models significantly outperform single-factor diffusion models for in-sample fit.

To understand the importance of GARCH for density forecasts, we consider six GARCH models, as listed in Table 1, which are nested by the following specification:

$$\begin{aligned}
\Delta r_t &= \alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} \sqrt{h_t}\, z_t, \\
h_t &= \beta_0 + h_{t-1}\bigl(\beta_2 + \beta_1 r_{t-1}^{2\rho} z_{t-1}^2\bigr), \\
z_t &\sim \text{iid } N(0,1).
\end{aligned} \tag{2}$$

We consider three different drift specifications (zero, linear, and nonlinear drift) and two volatility specifications (pure GARCH and combined CEV–GARCH). These various GARCH models allow us to examine the importance of linear versus nonlinear drift in the presence of GARCH and CEV and the incremental contribution of GARCH with respect to CEV. For identification, we set $\sigma = 1$ in all GARCH models.
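The variance recursion in (2) is easy to misread, so a small sketch may help. The code below is an illustration under stated assumptions: a linear drift, $\sigma = 1$ as in the identification restriction, the rate change on the left-hand side of (2), and made-up parameter values; the function names are hypothetical.

```python
import math
import random


def garch_update(h_prev, r_prev, z_prev, beta0, beta1, beta2, rho=0.0):
    """h_t = beta0 + h_{t-1} * (beta2 + beta1 * r_{t-1}**(2*rho) * z_{t-1}**2)."""
    return beta0 + h_prev * (beta2 + beta1 * r_prev ** (2.0 * rho) * z_prev ** 2)


def simulate_garch(n, r0, h0, beta0, beta1, beta2, rho=0.0, a0=0.0, a1=0.0, seed=0):
    """Simulate n rate changes with linear drift and sigma fixed at 1.

    Returns (rates, variances) with len(rates) == n + 1 and len(variances) == n.
    For rho != 0 the rate must stay positive, which this sketch does not enforce.
    """
    rng = random.Random(seed)
    rates, variances = [r0], [h0]
    z_prev = rng.gauss(0.0, 1.0)
    rates.append(r0 + a0 + a1 * r0 + r0 ** rho * math.sqrt(h0) * z_prev)
    for _ in range(n - 1):
        r_prev = rates[-1]
        h = garch_update(variances[-1], r_prev, z_prev, beta0, beta1, beta2, rho)
        z = rng.gauss(0.0, 1.0)
        rates.append(r_prev + a0 + a1 * r_prev + r_prev ** rho * math.sqrt(h) * z)
        variances.append(h)
        z_prev = z
    return rates, variances


# Pure GARCH (rho = 0) with illustrative, made-up parameter values.
rates, variances = simulate_garch(
    n=1000, r0=0.05, h0=2e-6, beta0=1e-7, beta1=0.1, beta2=0.85, a0=0.001, a1=-0.02
)
```

With $\rho = 0$ the recursion reduces to a standard GARCH(1,1) written in terms of the standardized shock, and a nonzero $\rho$ lets the rate level scale the impact of past shocks on the conditional variance.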

Another popular approach to capturing volatility clustering is the continuous-time stochastic volatility model considered by Andersen and Lund (1997), Gallant and Tauchen (1998), and many others. These studies show that adding a latent stochastic volatility factor to a diffusion model significantly improves the goodness of fit for interest rates. We leave continuous-time stochastic volatility models for future research, because their estimation is much more involved. On the other hand, we also need to be careful about drawing any implications from GARCH models for continuous-time stochastic volatility models. Although Nelson (1990) showed that GARCH models converge to stochastic volatility models in the limit, Corradi (2000) later showed that Nelson’s results hold only under his particular discretization scheme. Other equally reasonable discretizations will lead to very different continuous-time limits for GARCH models.

3.3 Markov Regime-Switching Models

Many studies have shown that the behavior of spot rateschanges over time due to changes in monetary policy thebusiness cycle and general macroeconomic conditions TheMarkov regime-switching models of Hamilton (1989) havebeen widely used to model the time-varying behavior of interestrates (eg Gray 1996 Ball and Torous 1998 Ang and Bekaert2002 Li and Xu 2000 among others) Following most existingstudies we consider a class of two-regime models for the spotrate where the latent state variable st follows a two-state first-order Markov chain We refer to the regime in which st = 1 (2)as the first (second) regime Following Ang and Bekaert (2002)

Hong Li and Zhao Out-of-Sample Analysis of Spot Rate Models 461

Table 1 Spot Rate Models Considered for Density Forecast Evaluation

Model Mean Volatility

Discretized single-factor diffusion modelsRandom walk α0 σLog-normal α1rtminus1 σ rtminus1Dothan 0 σ rtminus1Pure CEV 0 σ r ρ

tminus1Vasicek α0 + α1rtminus1 σ

CIR α0 + α1rtminus1 σ r 5tminus1CKLS α0 + α1rtminus1 σ r ρ

tminus1Nonlinear drift αminus1rtminus1 + α0 + α1rtminus1 + α2r2

tminus1 σ r ρtminus1

GARCH modelsNo drift GARCH 0 σ

radicht

Linear drift GARCH α0 + α1rtminus1 σradic

htNonlinear drift GARCH αminus1rtminus1 + α0 + α1rtminus1 + α2r2

tminus1 σradic

htNo drift CEVndashGARCH 0 σ r ρ

tminus1

radicht

Linear drift CEVndashGARCH α0 + α1rtminus1 σ r ρtminus1

radicht

Nonlinear drift CEVndashGARCH αminus1rtminus1 + α0 + α1rtminus1 + α2r2tminus1 σ r ρ

tminus1

radicht

Markov regime-switching models

No drift RS CEV 0 σ (st )rρ(st )tminus1

Linear drift RS CEV α0(st ) + α1(st )rtminus1 σ (st )rρ(st )tminus1

Nonlinear drift RS CEV αminus1(st )rtminus1 + α0(st ) + α1(st )rtminus1 + α2(st )r2tminus1 σ (st )r

ρ(st )tminus1

No drift RS GARCH 0 σ (st )radic

htLinear drift RS GARCH α0(st ) + α1(st )rtminus1 σ (st )

radicht

Nonlinear drift RS GARCH αminus1(st )rtminus1 + α0(st ) + α1(st )rtminus1 + α2(st )r2tminus1 σ (st )

radicht

No drift RS CEVndashGARCH 0 σ (st )rρ(st )tminus1

Linear drift RS CEVndashGARCH α0(st ) + α1(st )rtminus1 σ (st )rρ(st )tminus1

Nonlinear drift RS CEVndashGARCH αminus1(st )rtminus1 + α0(st ) + α1(st )rtminus1 + α2(st )r2tminus1 σ (st )r

ρ(st )tminus1

Discretized jump-diffusion modelsNo drift JD CEV 0 σ r ρ

tminus1Linear drift JD CEV α0 + α1rtminus1 σ r ρ

tminus1Nonlinear drift JD CEV αminus1rtminus1 + α0 + α1rtminus1 + α2r2

tminus1 σ r ρtminus1

No drift JD GARCH 0 σradic

htLinear drift JD GARCH α0 + α1rtminus1 σ

radicht

Nonlinear drift JD GARCH αminus1rtminus1 + α0 + α1rtminus1 + α2r2tminus1 σ

radicht

No drift JD CEVndashGARCH 0 σ r ρtminus1

radicht

Linear drift JD CEVndashGARCH α0 + α1rtminus1 σ r ρtminus1

radicht

Nonlinear drift JD CEVndashGARCH αminus1rtminus1 + α0 + α1rtminus1 + α2r2tminus1 σ r ρ

tminus1

radicht

NOTE: For consistency, all models are estimated in a discrete-time setting using the maximum likelihood method (MLE). The eight discretized single-factor diffusion models are nested by the following specification: $\Delta r_t = \alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} z_t$, where $z_t \sim \text{iid } N(0,1)$. The six GARCH models are nested by the following specification: $\Delta r_t = \alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho}\sqrt{h_t}\, z_t$, where $h_t = \beta_0 + h_{t-1}(\beta_2 + \beta_1 r_{t-1}^{2\rho} z_{t-1}^2)$ and $z_t \sim \text{iid } N(0,1)$. The nine regime-switching models are nested by the following specification: $\Delta r_t = \alpha_{-1}(s_t) r_{t-1}^{-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2 + \sigma(s_t) r_{t-1}^{\rho(s_t)}\sqrt{h_t}\, z_t$, where $h_t = \beta_0 + \beta_1 E\{e_t \mid r_{t-2}, s_{t-2}\}^2 + \beta_2 h_{t-1}$, $e_t = [\Delta r_{t-1} - E(\Delta r_{t-1} \mid r_{t-2}, s_{t-1})]/\sigma(s_{t-1})$, $z_t \sim \text{iid } N(0,1)$, and $s_t$ follows a two-state Markov chain with the transition probability $\Pr(s_t = l \mid s_{t-1} = l) = [1 + \exp(-c_l - d_l r_{t-1})]^{-1}$ for $l = 1, 2$. The nine discretized jump-diffusion models are nested by the following specification: $\Delta r_t = \alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho}\sqrt{h_t}\, z_t + J(\mu, \gamma^2)\pi_t(q_t)$, where $h_t = \beta_0 + \beta_1[\Delta r_{t-1} - E(\Delta r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}$, $z_t \sim \text{iid } N(0,1)$, $J \sim N(\mu, \gamma^2)$, and $\pi_t(q_t)$ is Bernoulli($q_t$) with $q_t = [1 + \exp(-c - d r_{t-1})]^{-1}$.

we assume that the transition probability of $s_t$ depends on the once-lagged spot rate level:

$$\Pr(s_t = l \mid s_{t-1} = l) = \frac{1}{1 + \exp(-c_l - d_l r_{t-1})}.$$
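As a rough illustration of this logistic link, the following Python sketch maps a lagged rate into the two staying probabilities and the implied 2×2 transition matrix. The function names `stay_prob` and `transition_matrix` and all parameter values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

def stay_prob(c_l: float, d_l: float, r_prev: float) -> float:
    """Pr(s_t = l | s_{t-1} = l) = 1 / (1 + exp(-c_l - d_l * r_prev))."""
    return 1.0 / (1.0 + math.exp(-c_l - d_l * r_prev))

def transition_matrix(c, d, r_prev):
    """Rows index the regime at t-1, columns the regime at t (two regimes)."""
    p11 = stay_prob(c[0], d[0], r_prev)
    p22 = stay_prob(c[1], d[1], r_prev)
    return [[p11, 1.0 - p11],
            [1.0 - p22, p22]]

# Illustrative values only (assumptions, not estimates from the paper).
P = transition_matrix(c=(0.5, 3.0), d=(0.1, -0.2), r_prev=6.0)
```

With a positive $d_l$ the regime becomes more persistent as the lagged rate rises; with a negative $d_l$ it becomes less persistent.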

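Table 1 lists a variety of regime-switching models, all of which are nested by the following specification:

$$
\begin{aligned}
\Delta r_t &= \alpha_{-1}(s_t) r_{t-1}^{-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2 + \sigma(s_t) r_{t-1}^{\rho(s_t)}\sqrt{h_t}\, z_t, \\
h_t &= \beta_0 + \beta_1 E\{e_t \mid r_{t-2}, s_{t-2}\}^2 + \beta_2 h_{t-1}, \\
e_t &= [\Delta r_{t-1} - E(\Delta r_{t-1} \mid r_{t-2}, s_{t-1})]/\sigma(s_{t-1}), \\
z_t &\sim \text{iid } N(0,1).
\end{aligned}
\tag{3}
$$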

As in the aforementioned models, we consider three specifications of the conditional mean: zero, linear, and nonlinear drift. We also consider three specifications of the conditional variance: CEV, GARCH, and combined CEV–GARCH. Thus we have a total of nine regime-switching models.

Although Gray (1996) removed the path-dependent nature of GARCH models by averaging the conditional and unconditional variances over regimes at every time point, we use the same GARCH specification across different regimes. Confirming results of Ang and Bekaert (2002), we find a very unstable estimation of Gray's (1996) model. In contrast, our specification turns out to have much better convergence properties. Unlike many previous studies that set the elasticity parameter equal to .5, we allow it to be regime-dependent and estimate it from the data. For identification, we set the diffusion constant $\sigma(s_t) = 1$ for $s_t = 1$ in the regime-switching models with GARCH effect.

The conditional density of the interest rate $r_t$ in a regime-switching model is

$$p(r_t \mid I_{t-1}) = \sum_{l=1}^{2} p(r_t \mid s_t = l, I_{t-1}) \Pr(s_t = l \mid I_{t-1}),$$

where the ex ante probability that the data are generated from regime $l$ at $t$, $\Pr(s_t = l \mid I_{t-1})$, can be obtained using Bayes's rule via a recursive procedure described by Hamilton (1989). The conditional density of regime-switching models is a mixture of two normal distributions, which can generate unimodal or bimodal distributions and allows for great flexibility in modeling skewness, kurtosis, and heavy tails.
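The recursion can be sketched in a few lines of Python. This is a minimal illustration of a Hamilton-type filter step for a two-regime normal mixture, not the authors' implementation; the function names and every number below are assumptions made for the example, and `x` stands for whatever observed quantity the regime densities are written for.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_density(x, means, variances, ex_ante):
    """p(x | I_{t-1}) = sum_l p(x | s_t = l, I_{t-1}) * Pr(s_t = l | I_{t-1})."""
    return sum(normal_pdf(x, m, v) * p for m, v, p in zip(means, variances, ex_ante))

def update(x, means, variances, ex_ante):
    """Bayes's rule: filtered probabilities Pr(s_t = l | I_t) after observing x."""
    joint = [normal_pdf(x, m, v) * p for m, v, p in zip(means, variances, ex_ante)]
    total = sum(joint)
    return [j / total for j in joint]

def predict(filtered, P):
    """Ex ante probabilities for t+1: sum_k Pr(s_t = k | I_t) * P[k][l]."""
    return [sum(filtered[k] * P[k][l] for k in range(len(filtered)))
            for l in range(len(P[0]))]

# Illustrative two-regime example (all numbers are made up).
means, variances = [0.0, 0.0], [0.04, 0.50]
ex_ante = [0.9, 0.1]
P = [[0.95, 0.05], [0.20, 0.80]]
x = 1.0  # a large move is far more likely under the high-variance regime
filtered = update(x, means, variances, ex_ante)
next_ex_ante = predict(filtered, P)
```

Iterating `update` and `predict` over the sample yields the sequence of ex ante probabilities that weight the two regime densities in the mixture.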

3.4 Jump-Diffusion Models

There are compelling economic and statistical reasons to account for discontinuity in interest rate dynamics. Various economic shocks, news announcements, and government interventions in bond markets have pronounced effects on the behavior of interest rates and tend to generate large jumps in interest rate data. Statistically, Das (2002) and Johannes (2003), among others, have shown that diffusion models (even with stochastic volatility) cannot generate the excessive leptokurtosis exhibited by the changes of the spot rates, and that jump-diffusion models are a convenient way to generate excessive kurtosis or, more generally, heavy tails.

We consider a class of discretized jump-diffusion models listed in Table 1. As in the aforementioned models, we consider zero, linear, and nonlinear drift specifications, and CEV, GARCH, and combined CEV–GARCH specifications for volatility. The nine different jump-diffusion models are nested by the following specification:

$$
\begin{aligned}
\Delta r_t &= \alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho}\sqrt{h_t}\, z_t + J(\mu, \gamma^2)\pi_t(q_t), \\
h_t &= \beta_0 + \beta_1[\Delta r_{t-1} - E(\Delta r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}, \\
z_t &\sim \text{iid } N(0,1), \\
\pi_t(q_t) &\sim \text{independent Bernoulli}(q_t), \\
J &\sim N(\mu, \gamma^2),
\end{aligned}
\tag{4}
$$

where $J$ is the jump size and $q_t$ is the jump probability, with

$$q_t = \frac{1}{1 + \exp(-c - d r_{t-1})}.$$

The conditional density of the foregoing jump-diffusion models can be written as

$$
\begin{aligned}
f(r_t \mid r_{t-1}) ={}& (1 - q_t)\frac{1}{\sqrt{2\pi\sigma^2(r_{t-1})}} \exp\left\{-\frac{[\Delta r_t - \mu(r_{t-1})]^2}{2\sigma^2(r_{t-1})}\right\} \\
&+ q_t \frac{1}{\sqrt{2\pi[\sigma^2(r_{t-1}) + \gamma^2]}} \exp\left\{-\frac{[\Delta r_t - \mu(r_{t-1}) - \mu]^2}{2[\sigma^2(r_{t-1}) + \gamma^2]}\right\},
\end{aligned}
$$

where $\mu(r_{t-1})$ and $\sigma^2(r_{t-1})$ are the conditional mean and variance of the diffusion part in (4). Similar to the regime-switching models, the conditional density of the jump-diffusion models is also a mixture of two normal distributions. Thus these two classes of models represent two alternative approaches to generating excess kurtosis and heavy tails, although the regime-switching models have more sophisticated specifications. For example, in (3) all drift parameters are regime-dependent, whereas in (4) only the intercept term is different in the conditional mean and variance. For the regime-switching models in (3), the state probabilities evolve according to a transition matrix with updated priors at every time point. Thus they can capture both very persistent and transient regime shifts. For the jump-diffusion models in (4), the state probabilities are assumed to depend on the past interest rate level.
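To make the mixture structure concrete, here is a small Python sketch of the jump-diffusion conditional density, written for the observed change `x` given the drift and diffusion variance. It is a sketch under stated assumptions rather than the authors' code: the function names and all numerical values are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def jump_prob(c, d, r_prev):
    """q_t = 1 / (1 + exp(-c - d * r_prev))."""
    return 1.0 / (1.0 + math.exp(-c - d * r_prev))

def jd_density(x, drift, diff_var, q, jump_mean, jump_var):
    """No jump with probability 1 - q; one normal jump with probability q."""
    no_jump = normal_pdf(x, drift, diff_var)
    with_jump = normal_pdf(x, drift + jump_mean, diff_var + jump_var)
    return (1.0 - q) * no_jump + q * with_jump

# Illustrative inputs only (assumptions, not estimates from the paper).
q = jump_prob(c=-2.0, d=0.1, r_prev=6.0)
density_at_zero = jd_density(0.0, drift=0.0, diff_var=0.04,
                             q=q, jump_mean=0.05, jump_var=0.25)
```

Because the jump component has the larger variance, raising `q` moves probability mass from the center into the tails, which is how the mixture produces excess kurtosis.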

The in-sample performances of the four classes of models that we study have been individually studied in the literature. However, no one has yet attempted a systematic evaluation of all of these models in a unified setup, particularly in the out-of-sample context, because of the difficulty in density forecast evaluation and because of different nonnested specifications between existing classes of models. In the next section we compare the relative performance of these four classes of models in out-of-sample density forecast using the evaluation method described in Section 2. This comparison can provide valuable information about the relative strengths and weaknesses of each model and should be important for many applications.

4. MODEL ESTIMATION AND IN-SAMPLE PERFORMANCE

4.1 Data and Estimation Method

In modeling spot interest rate dynamics, yields on short-term debts are often used as proxies for the unobservable instantaneous risk-free rate. These include the 1-month T-bill rates used by Gray (1996) and Chan, Karolyi, Longstaff, and Sanders (CKLS) (1992); the 3-month T-bill rates used by Stanton (1997) and Andersen and Lund (1997); the 7-day Eurodollar rates used by Ait-Sahalia (1996) and Hong and Li (2004); and the Fed funds rates used by Conley, Hansen, Luttmer, and Scheinkman (1997) and Das (2002). In our study, we follow CKLS (1992) and use the daily 1-month T-bill rates from June 14, 1961 to December 29, 2000, with a total of 9,868 observations. The data are extracted from the CRSP bond file using the midpoint of quoted bid and ask prices of the T-bills with a remaining maturity that is closest to 1 month (30 calendar days) from the current date. Most of these T-bills have maturities of 6 months or 1 year when first issued, and all T-bills used in our study have a remaining maturity between 27 and 33 days. We then compute the annualized continuously compounded yield based on the average of the ask and bid prices.
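The yield computation in the last step can be sketched as follows. The exact convention is not spelled out in the text, so this Python example assumes prices quoted per 100 of face value, a 365-day year, and a yield expressed in percent; the function name `cc_yield` and the sample quotes are hypothetical.

```python
import math

def cc_yield(bid: float, ask: float, days_to_maturity: int,
             face: float = 100.0, days_per_year: float = 365.0) -> float:
    """Annualized continuously compounded yield (percent) from the bid-ask midpoint.

    Assumes a zero-coupon bill paying `face` in `days_to_maturity` calendar days.
    """
    mid = 0.5 * (bid + ask)
    return 100.0 * (days_per_year / days_to_maturity) * math.log(face / mid)

# Hypothetical quotes for a bill with 30 days left to maturity.
y = cc_yield(bid=99.50, ask=99.54, days_to_maturity=30)
```

Under these assumptions, a bill priced at exactly `face * exp(-0.05 * 30 / 365)` would return a yield of 5 percent.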

Figure 1 plots the level and change series of the daily 1-month T-bill rates, as well as their histograms. There is obvious persistent volatility clustering, and in general the volatility is higher at a higher level of the interest rate (e.g., the 1979–1982 period). The marginal distribution of the interest rate level is skewed to the right, with a long right tail. Most daily changes in the interest rate level are very small, giving a sharp probability peak around 0. Daily changes of other spot interest rates, such as 3-month T-bill and 7-day Eurodollar rates, also exhibit a high peak around 0. But the peak is much less pronounced for weekly changes in spot rates. This, together with the long (right) tail, implies excess kurtosis, a stylized fact that has motivated the use of jump models in the literature (e.g., Das 2002; Johannes 2003).

Figure 1. Daily 1-Month T-Bill Rates Between June 14, 1961 and December 29, 2000. This figure plots the level and change series of the daily 1-month T-bill rates, as well as their histograms. The data are extracted from the CRSP bond file using the midpoint of quoted bid and ask prices of the T-bills with a remaining maturity that is closest to 1 month (30 calendar days) from the current date.

We divide our data into two subsamples. The first, from June 14, 1961 to February 22, 1991 (with a total of 7,400 observations), is a sample used to estimate model parameters; the second, from February 23, 1991 to December 29, 2000 (with a total of 2,467 observations), is a prediction sample used to evaluate out-of-sample density forecasts. We first consider the in-sample performance of the various models listed in Table 1, all of which are estimated using the MLE method. The optimization algorithm is the well-known BHHH, with STEPBT for step length calculation, and is implemented via the constrained optimization code in GAUSS Windows Version 5.0. The optimization tolerance level is set such that the gradients of the parameters are less than or equal to $10^{-6}$.
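For readers unfamiliar with BHHH, the idea is to replace the Hessian in a Newton step with the outer product of the per-observation scores. The Python sketch below applies it to a toy Gaussian model with constant mean and variance. It is not the GAUSS routine used in the paper: simple step halving stands in for STEPBT, the $10^{-6}$ tolerance is applied to the average gradient (an assumption), and the data are simulated.

```python
import numpy as np

def loglik_and_scores(theta, x):
    """Per-observation log-likelihoods and scores for x_i ~ N(mu, exp(log_var))."""
    mu, log_var = theta
    var = np.exp(log_var)
    resid = x - mu
    ll = -0.5 * (np.log(2.0 * np.pi) + log_var + resid ** 2 / var)
    scores = np.column_stack([resid / var, 0.5 * (resid ** 2 / var - 1.0)])
    return ll, scores

def bhhh(theta0, x, tol=1e-6, max_iter=200):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        ll, scores = loglik_and_scores(theta, x)
        grad = scores.mean(axis=0)
        if np.max(np.abs(grad)) <= tol:
            break
        opg = scores.T @ scores / len(x)        # outer product of gradients
        direction = np.linalg.solve(opg, grad)  # BHHH ascent direction
        step = 1.0
        while step > 1e-8:                      # step halving (not STEPBT)
            candidate = theta + step * direction
            if loglik_and_scores(candidate, x)[0].mean() > ll.mean():
                theta = candidate
                break
            step *= 0.5
        else:
            break                               # no improving step was found
    return theta

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=2.0, size=2000)   # simulated data, illustration only
theta_hat = bhhh([0.0, 0.0], x)                 # (mu_hat, log of variance_hat)
```

For this toy model the maximum likelihood estimates are the sample mean and the (biased) sample variance, which gives a simple check on the iteration.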

Tables 2–5 report parameter estimates of four classes of spot rate models using daily 1-month T-bill rates from June 14, 1961 to February 22, 1991, with a total of 7,400 observations. To account for the special period between October 1979 and September 1982, we introduce dummy variables to the drift, volatility, and elasticity parameters; that is, $\alpha_D$, $\sigma_D$, and $\rho_D$ equal 0 outside of the 1979–1982 period. We also introduce a dummy variable, $D_{87}$, in the interest rate level to account for the effect of the 1987 stock market crash; $D_{87}$ equals 0 except for the week right after the stock market crash. Estimated robust standard errors are reported in parentheses.

4.2 In-Sample Evidence

We first discuss the in-sample performance of models within each class and then compare their performance across different classes. Many previous studies (e.g., Bliss and Smith 1998) have shown that the period between October 1979 and September 1982, which coincides with the Federal Reserve experiment, has a big impact on model parameter estimates. To account for the special feature of this period, we introduce dummy variables in the drift, volatility, and elasticity parameters for this period. We also introduce a dummy variable in the interest rate level for the week right after the 1987 stock market crash. Brenner et al. (1996) showed that the decline of the interest rates during this week is too dramatic to be captured by most standard models.

Table 2 reports parameter estimates, with estimated robust standard errors and log-likelihood values, for discretized single-factor diffusion models. The estimates of the drift parameters of the Vasicek, CIR, and CKLS models all suggest mean reversion in the conditional mean, with an estimated 6% long-run mean. For other models, such as the random walk, log-normal, and nonlinear drift models, most drift parameters are not significant. A comparison of the pure CEV, CKLS, and Ait-Sahalia (1996) nonlinear drift models indicates that the incremental contribution of nonlinear drift is very marginal, which is not surprising given that the drift parameter estimates in the nonlinear drift model are mostly insignificant. On the other hand, there is also clear evidence of a level effect; all estimates of the elasticity parameter are significant. Unlike previous studies (e.g., CKLS 1992), which found an estimated elasticity parameter close to 1.5, our estimate is about .25 (.60) outside (inside) the 1979–1982 period. Our results confirm previous findings of Brenner et al. (1996), Andersen and Lund (1997), Bliss and Smith (1998), and Koedijk, Nissen, Schotman, and Wolff (1997) that the estimate of the elasticity parameter is very sensitive to the choice of interest rate data, data frequency, sample periods, and specifications of the volatility function. The 1979–1982 period dummies suggest that the drift does not behave very differently (i.e., $\alpha_D$ is not significant), whereas volatility is significantly higher and depends more heavily on the interest rate level in this period (i.e., $\sigma_D$ and $\rho_D$ are significantly positive) than in other periods. For models with level effect, we introduce the dummy only for elasticity parameter $\rho$ to avoid the offsetting effect of dummies in $\rho$ and $\sigma$, which would tend to generate unstable estimates. The stock market crash dummy is significantly negative, which is consistent with the findings of Brenner et al. (1996).

Table 2. Parameter Estimates for the Single-Factor Diffusion Models

| Parameters | RW | Log-normal | Dothan | Pure CEV | Vasicek | CIR | CKLS | Nonlinear drift |
|---|---|---|---|---|---|---|---|---|
| $\alpha_{-1}$ | | | | | | | | .0289 (.0751) |
| $\alpha_0$ | .0012 (.0019) | | | | .0196 (.0056) | .0190 (.0049) | .0199 (.0053) | −.0098 (.0485) |
| $\alpha_1$ | | .0006 (.0004) | | | −.0033 (.0010) | −.0032 (.0009) | −.0034 (.0009) | .0045 (.0095) |
| $\alpha_2$ | | | | | | | | −.0006 (.0006) |
| $\sigma$ | .1523 (.0013) | .0304 (.0003) | .0304 (.0003) | .1011 (.0041) | .1522 (.0013) | .0658 (.0006) | .1007 (.0041) | .1007 (.0041) |
| $\rho$ | | | | .2437 (.0238) | | | .2460 (.0238) | .2460 (.0238) |
| $\alpha_D$ | −.0047 (.0157) | .0196 (.0186) | | | .0154 (.0168) | .0233 (.0176) | .0266 (.0168) | .0402 (.0202) |
| $\sigma_D$ | .2768 (.0112) | .0175 (.0013) | .0175 (.0013) | | .2764 (.0112) | .0712 (.0036) | | |
| $\rho_D$ | | | | .3692 (.0132) | | | .3684 (.0132) | .3680 (.0132) |
| $D_{87}$ | −.3988 (.0681) | −.2107 (.0660) | −.2079 (.0660) | −.3549 (.0671) | −.4003 (.0681) | −.3101 (.0655) | −.3574 (.0670) | −.3582 (.0671) |
| Log-likelihood | 2,649.6 | 2,187.0 | 2,184.9 | 2,654.5 | 2,655.5 | 2,672.0 | 2,661.7 | 2,662.5 |

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1} r_{t-1}^{-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (\sigma + \sigma_D) r_{t-1}^{(\rho + \rho_D)} z_t + D_{87}$, where $z_t \sim \text{iid } N(0,1)$.
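The long-run mean quoted for the linear-drift models follows from setting the expected change in the rate to zero. As a worked check, the calculation below reads the Vasicek entries of Table 2 as $\alpha_0 = .0196$ and $\alpha_1 = -.0033$ (the decimal placement is an assumption about the extracted table) and takes the rate to be measured in percent, outside the 1979–1982 period so that $\alpha_D = 0$:

```latex
E(\Delta r_t \mid r_{t-1} = \bar{r}) = \alpha_0 + \alpha_1 \bar{r} = 0
\quad\Longrightarrow\quad
\bar{r} = -\frac{\alpha_0}{\alpha_1} = \frac{.0196}{.0033} \approx 5.9 .
```

The CIR ($.0190/.0032 \approx 5.9$) and CKLS ($.0199/.0034 \approx 5.9$) entries give essentially the same value, in line with the roughly 6% long-run mean, and the negative sign of $\alpha_1$ is what produces mean reversion.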

Estimation results of GARCH models in Table 3 show that GARCH significantly improves the in-sample fit of diffusion models (i.e., the log-likelihood value increases from about 2,500 to about 5,700). Previous studies, such as those by Bali (2001) and Durham (2003), have also shown that for single-factor diffusion, GARCH, and continuous-time stochastic volatility models, it is more important to correctly model the diffusion function than the drift function in fitting interest rate data. This suggests that volatility clustering is an important feature of interest rates. All estimates of GARCH parameters are overwhelmingly significant. The sum of GARCH parameter estimates, $\beta_1 + \beta_2$, is slightly larger (smaller) than 1 without (with) level effect. Although the sum of GARCH parameters is slightly larger than 1, it is still possible that the spot rate model is strictly stationary (see Nelson 1991 for more discussion). We also find a significant level effect even in the presence of GARCH, but the estimated elasticity parameter is close to .20, which is much smaller than that of single-factor diffusion models. The specification of conditional variance also affects the estimation of drift parameters. Unlike those in diffusion models, most drift parameter estimates in linear drift GARCH models are insignificant. Among all GARCH models considered, the one with nonlinear drift and level effect has the best in-sample performance. In the 1979–1982 period, the interest rate volatility is significantly higher, but its dependence on the interest rate level is not significantly different from other periods. The 1987 stock market crash dummy is again significantly negative.

Table 3. Parameter Estimates for the GARCH Models

| Parameters | No drift GARCH | Linear drift GARCH | Nonlinear drift GARCH | No drift CEV–GARCH | Linear drift CEV–GARCH | Nonlinear drift CEV–GARCH |
|---|---|---|---|---|---|---|
| $\alpha_{-1}$ | | | .1556 (.0470) | | | .1716 (.0457) |
| $\alpha_0$ | | .0053 (.0025) | −.1117 (.0323) | | .0063 (.0025) | −.1233 (.0311) |
| $\alpha_1$ | | −.0003 (.0006) | .0263 (.0069) | | −.0006 (.0006) | .0290 (.0066) |
| $\alpha_2$ | | | −.0018 (.0004) | | | −.0020 (.0004) |
| $\rho$ | | | | .1709 (.0298) | .1797 (.0301) | .1926 (.0300) |
| $\beta_0$ | 9.7E−05 (1.2E−05) | 8.6E−05 (1.2E−05) | 8.6E−05 (1.6E−05) | 8.9E−05 (1.0E−05) | 8.2E−05 (.9E−05) | 8.2E−05 (.9E−05) |
| $\beta_1$ | .1552 (.0097) | .1544 (.0096) | .1563 (.0099) | .0896 (.0104) | .0875 (.0101) | .0851 (.0098) |
| $\beta_2$ | .8707 (.0065) | .8723 (.0064) | .8712 (.0065) | .8583 (.0074) | .8582 (.0076) | .8553 (.0078) |
| $\alpha_D$ | | .0166 (.0318) | .1119 (.0229) | | .0550 (.0282) | .1317 (.0201) |
| $\sigma_D$ | .1368 (.0327) | .1302 (.0331) | .1017 (.0334) | | | |
| $\rho_D$ | | | | .0151 (.0134) | .0064 (.0146) | −.0053 (.0142) |
| $D_{87}$ | −.5397 (.0774) | −.5426 (.0777) | −.5460 (.0771) | −.5288 (.0753) | −.5318 (.0751) | −.5352 (.0741) |
| Log-likelihood | 5,686.8 | 5,699.5 | 5,706.3 | 5,701.1 | 5,715.2 | 5,725.5 |

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1} r_{t-1}^{-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (1 + \sigma_D) r_{t-1}^{(\rho + \rho_D)}\sqrt{h_t}\, z_t + D_{87}$, where $h_t = \beta_0 + h_{t-1}(\beta_2 + \beta_1 r_{t-1}^{2\rho} z_{t-1}^2)$ and $z_t \sim \text{iid } N(0,1)$.

Table 4. Parameter Estimates for the Markov Regime-Switching Models

| Parameters | No drift CEV | Linear drift CEV | Nonlinear drift CEV | No drift GARCH | Linear drift GARCH | Nonlinear drift GARCH | No drift CEV–GARCH | Linear drift CEV–GARCH | Nonlinear drift CEV–GARCH |
|---|---|---|---|---|---|---|---|---|---|
| $\alpha_{-1}(1)$ | | | .0607 (.3569) | | | .3321 (.1701) | | | .2646 (.1605) |
| $\alpha_0(1)$ | | .0973 (.0259) | .1517 (.2027) | | .0457 (.0107) | −.1462 (.1009) | | .0450 (.0105) | −.1086 (.0989) |
| $\alpha_1(1)$ | | −.0102 (.0031) | −.0288 (.0321) | | −.0043 (.0019) | .0274 (.0181) | | −.0042 (.0019) | .0209 (.0185) |
| $\alpha_2(1)$ | | | .0011 (.0014) | | | −.0015 (.0010) | | | −.0011 (.0011) |
| $\sigma(1)$ | .3361 (.0302) | .3171 (.0287) | .3106 (.0281) | 1 | 1 | 1 | 1 | 1 | 1 |
| $\rho(1)$ | .1021 (.044) | .1277 (.0446) | .138 (.0448) | 0 | 0 | 0 | .1193 (.0452) | .1589 (.0460) | .1670 (.0487) |
| $\alpha_{-1}(2)$ | | | −.0352 (.0329) | | | −.0267 (.0242) | | | −.0286 (.0183) |
| $\alpha_0(2)$ | | −.0002 (.0020) | .0141 (.0229) | | −.0013 (.0017) | .0062 (.0153) | | −.0020 (.0018) | .0062 (.0104) |
| $\alpha_1(2)$ | | −.0006 (.0004) | −.0012 (.0048) | | −.0008 (.0004) | −.0001 (.0031) | | −.0007 (.0004) | .0005 (.0020) |
| $\alpha_2(2)$ | | | −.0001 (.0003) | | | −.0002 (.0002) | | | −.0002 (.0001) |
| $\sigma(2)$ | .0143 (.0010) | .0141 (.0010) | .0138 (.0010) | .2566 (.006) | .2567 (.006) | .2574 (.006) | .2769 (.0275) | .3034 (.0303) | .3051 (.0311) |
| $\rho(2)$ | .8122 (.0403) | .8179 (.0422) | .8257 (.0421) | 0 | 0 | 0 | .0844 (.0602) | .0727 (.0602) | .0800 (.0611) |
| $\beta_0$ | | | | 3.83E−4 (7.11E−5) | 3.18E−4 (6.14E−5) | 2.96E−4 (5.79E−5) | 3.21E−4 (.67E−4) | 2.43E−4 (.50E−4) | 2.25E−4 (.47E−4) |
| $\beta_1$ | | | | .0357 (.004) | .0336 (.0039) | .0335 (.0038) | .0265 (.0057) | .0259 (.0055) | .0252 (.0053) |
| $\beta_2$ | | | | .9027 (.009) | .9066 (.0089) | .9077 (.0086) | .8972 (.0099) | .8993 (.0098) | .9001 (.0094) |
| $c_1$ | .3132 (.2734) | .3001 (.273) | .3171 (.2683) | −1.2515 (.3211) | −1.2007 (.3065) | −1.1293 (.3136) | −1.1954 (.3819) | −1.1429 (.3705) | −1.0012 (.3674) |
| $d_1$ | .1081 (.0391) | .107 (.039) | .1029 (.0381) | .1303 (.044) | .1295 (.0426) | .1192 (.0447) | .1190 (.0594) | .1193 (.0582) | .0959 (.0593) |
| $c_2$ | 4.0963 (.2386) | 4.0342 (.2393) | 4.0089 (.2313) | 1.8864 (.2048) | 1.7868 (.1947) | 1.7633 (.1909) | 1.8227 (.2092) | 1.7299 (.1983) | 1.6947 (.2009) |
| $d_2$ | −.2302 (.0354) | −.225 (.0354) | −.2225 (.0339) | −.0938 (.0276) | −.0806 (.0263) | −.0776 (.0257) | −.0906 (.0285) | −.0810 (.0276) | −.0757 (.0283) |
| Log-likelihood | 6,493.1 | 6,507.9 | 6,512.8 | 7,079.0 | 7,132.8 | 7,138.2 | 7,082.8 | 7,138.6 | 7,144.1 |

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}(s_t) r_{t-1}^{-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2 + \sigma(s_t) r_{t-1}^{\rho(s_t)}\sqrt{h_t}\, z_t$, where $h_t = \beta_0 + \beta_1 E\{e_t \mid r_{t-2}, s_{t-2}\}^2 + \beta_2 h_{t-1}$, $e_t = [\Delta r_{t-1} - E(\Delta r_{t-1} \mid r_{t-2}, s_{t-1})]/\sigma(s_{t-1})$, $z_t \sim \text{iid } N(0,1)$, and $s_t$ follows a two-state Markov chain with transition probability $\Pr(s_t = l \mid s_{t-1} = l) = [1 + \exp(-c_l - d_l r_{t-1})]^{-1}$ for $l = 1, 2$.
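As a quick check on the persistence statements, the GARCH sums can be added up directly from Tables 3 and 4. The arithmetic below uses the no-drift columns and reads the table entries with four decimal places, which is an assumption about the extracted values:

```latex
\begin{aligned}
\text{GARCH (Table 3):}      \quad & \hat\beta_1 + \hat\beta_2 = .1552 + .8707 = 1.0259 > 1, \\
\text{CEV--GARCH (Table 3):} \quad & \hat\beta_1 + \hat\beta_2 = .0896 + .8583 = .9479 < 1, \\
\text{RS GARCH (Table 4):}   \quad & \hat\beta_1 + \hat\beta_2 = .0357 + .9027 = .9384 < 1.
\end{aligned}
```

On this reading, adding a level effect or regime switching pulls the sum below 1, so part of the apparent volatility persistence in the pure GARCH model is absorbed by those features.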

Parameter estimates of regime-switching models given in Table 4 show that the spot rate behaves quite differently between two regimes. For models with a linear drift, in the first regime the spot rate has a high long-run mean (about 10%) and exhibits strong mean reversion (i.e., estimates of the mean-reversion speed parameter are significantly negative). The spot rate in the second regime behaves almost like a random walk, because most drift parameter estimates are close to 0 and insignificant. For models with a nonlinear drift, all drift parameters are insignificant. Volatility in the first regime is much higher, about three times that in the second regime. Our estimates show that level effect, although significant in both regimes, is much stronger in the second regime in the absence of GARCH. After including GARCH, the elasticity parameter estimate becomes insignificant in the second regime but remains the same in the first regime. Apparently, regime switching helps capture volatility clustering; the sum of GARCH parameters, $\beta_1 + \beta_2$, is smaller in regime-switching models than in pure GARCH models. The estimated transition probabilities of the Markov state variable $s_t$ show that the low-volatility regime is much more persistent than the high-volatility regime. Our results suggest that the spot rate evolves like a random walk with low volatility most of the time, occasionally increasing and evolving with strong mean reversion and high volatility. Overall, the regime-switching model with a linear drift in each regime, CEV, and GARCH has the best in-sample performance.


Table 5. Parameter Estimates for the Jump-Diffusion Models

| Parameters | No drift CEV | Linear drift CEV | Nonlinear drift CEV | No drift GARCH | Linear drift GARCH | Nonlinear drift GARCH | No drift CEV–GARCH | Linear drift CEV–GARCH | Nonlinear drift CEV–GARCH |
|---|---|---|---|---|---|---|---|---|---|
| $\alpha_{-1}$ | | | .0264 (.0293) | | | .0702 (.0384) | | | .0661 (.0380) |
| $\alpha_0$ | | −.0011 (.0169) | −.0340 (.0203) | | .0028 (.0019) | −.0506 (.0256) | | .0026 (.0020) | −.0486 (.0253) |
| $\alpha_1$ | | −.0007 (.0004) | .0101 (.0043) | | −.0013 (.0004) | .0110 (.0053) | | −.0012 (.0004) | .0107 (.0052) |
| $\alpha_2$ | | | −.0009 (.0003) | | | −.0008 (.0003) | | | −.0008 (.0003) |
| $\sigma$ | .0127 (.0008) | .0126 (.0008) | .0124 (.0008) | | | | | | |
| $\rho$ | .8936 (.0390) | .8975 (.0397) | .9045 (.0400) | | | | .0929 (.0518) | .0852 (.0510) | .0837 (.0512) |
| $\beta_0$ | | | | 8.7E−05 (1.2E−05) | 8.8E−05 (1.2E−05) | 8.7E−05 (1.2E−05) | 7.6E−5 (1.2E−5) | 7.7E−5 (1.2E−5) | 7.7E−5 (1.2E−5) |
| $\beta_1$ | | | | .0631 (.0076) | .0640 (.0073) | .0635 (.0073) | .0450 (.0101) | .0472 (.0102) | .0472 (.0102) |
| $\beta_2$ | | | | .8514 (.0139) | .8484 (.0134) | .8499 (.0134) | .8504 (.0140) | .8471 (.0135) | .8483 (.0134) |
| $c$ | −2.7404 (.1608) | −2.7057 (.1605) | −2.6972 (.1614) | −3.7908 (.1982) | −3.7470 (.1970) | −3.7481 (.1976) | −3.6908 (.2097) | −3.6498 (.2084) | −3.6518 (.2093) |
| $d$ | .1451 (.0275) | .1423 (.0275) | .1397 (.0275) | .2554 (.0307) | .2516 (.0305) | .2509 (.0307) | .2293 (.0343) | .2274 (.0340) | .2272 (.0341) |
| $\mu$ | .0233 (.0133) | .0385 (.0142) | .0321 (.0128) | .0700 (.0152) | .0887 (.0158) | .0895 (.0158) | .0679 (.0162) | .0892 (.0164) | .0897 (.0163) |
| $\gamma$ | .4316 (.0117) | .4279 (.0120) | .4304 (.0104) | .3928 (.0197) | .3903 (.0189) | .3929 (.0187) | .4109 (.0199) | .4046 (.0182) | .4061 (.0178) |
| $\alpha_D$ | | −.0094 (.0121) | .0296 (.0112) | | −.0192 (.0117) | .0057 (.0154) | | −.0197 (.0112) | .0055 (.0150) |
| $\sigma_D$ | | | | −.1027 (.1142) | −.1328 (.1054) | −.1551 (.1033) | | | |
| $\rho_D$ | .1473 (.1079) | .1872 (.0703) | −.0697 (.0722) | | | | −.1220 (.0769) | −.1186 (.0573) | −.1277 (.0564) |
| $c_D$ | 4.1604 (.6523) | 4.3364 (.6535) | 4.0036 (.5477) | 4.0290 (.7495) | 4.2505 (.7546) | 4.1857 (.7421) | 3.8622 (.6911) | 4.0677 (.7129) | 4.0272 (.7019) |
| $d_D$ | −.2441 (.0808) | −.2721 (.0730) | −.1836 (.0534) | −.2748 (.0689) | −.2860 (.0682) | −.2791 (.0667) | −.2320 (.0685) | −.2487 (.0676) | −.2445 (.0662) |
| $D_{87}$ | −.4283 (.0646) | −.4253 (.0641) | −.4266 (.0643) | −.4226 (.0826) | −.4198 (.0820) | −.4217 (.0815) | −.4233 (.0793) | −.4209 (.0793) | −.4225 (.0791) |
| Log-likelihood | 6,306.3 | 6,317.2 | 6,326.3 | 6,794.1 | 6,810.8 | 6,813.9 | 6,796.3 | 6,812.9 | 6,816.1 |

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1} r_{t-1}^{-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (\sigma + \sigma_D) r_{t-1}^{(\rho + \rho_D)}\sqrt{h_t}\, z_t + J(\mu, \gamma^2)\pi_t(q_t + q_D) + D_{87}$, where $h_t = \beta_0 + \beta_1[\Delta r_{t-1} - E(\Delta r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}$, $z_t \sim \text{iid } N(0,1)$, $J \sim N(\mu, \gamma^2)$, and $\pi_t(q_t + q_D)$ is Bernoulli($q_t + q_D$) with $q_t + q_D = \{1 + \exp[-c - c_D - (d + d_D) r_{t-1}]\}^{-1}$.

Table 5 reports parameter estimates for discretized jump-diffusion models. There is some weak evidence of mean reversion, especially when there is GARCH. The contribution of nonlinear drift is again very marginal; most estimated drift parameters are insignificant. For jump-diffusion models without GARCH, the elasticity parameter estimate is about .9, which is closer to the 1.5 found by CKLS. However, level effect is weakened (i.e., $\rho$ becomes close to .1) after GARCH is introduced. It seems that CEV helps capture part of volatility clustering, but its importance diminishes in the presence of GARCH. We find that GARCH also significantly improves the performance of jump-diffusion models. Without GARCH, there is a high probability of small jumps, with an estimated mean jump size between 2 and 4. With GARCH, the jump probability becomes smaller, but the mean jump size increases to between about 7 and 9. This suggests that without GARCH, jumps can capture part of volatility clustering, but with GARCH, jumps mainly capture large interest rate movements. Of course, jumps also help explain volatility clustering; the sum of GARCH parameters is again much smaller than that in pure GARCH models. During the 1979–1982 period, other than the probability of jumps becoming substantially higher, other aspects of the jump-diffusion models (e.g., the drift, volatility, or elasticity parameter) do not behave very differently.

To sum up, our in-sample analysis reveals some important stylized facts for the spot rate:

1. The importance of modeling mean reversion in the conditional mean is ambiguous. It seems that linear drift is adequate for this purpose once GARCH, regime switching, or jumps are included. The contribution of Ait-Sahalia's (1996) type of nonlinear drift is very marginal.

2. It is important to model conditional heteroscedasticity through GARCH or level effect, although GARCH seems to have much better in-sample performance than CEV.

3. Regime switching and jumps help capture volatility clustering and especially the excess kurtosis and heavy tails of the interest rate.

4. Interest rates behave quite differently during the 1979–1982 period. The level and volatility of the interest rate, the dependence of the interest rate volatility on the interest rate level, and the probability of jumps are much higher during this period.


5. OUT-OF-SAMPLE DENSITY FORECAST PERFORMANCE

The foregoing in-sample analysis demonstrates that modeling volatility clustering through GARCH, and heavy tails and excess kurtosis through regime switching or jumps, significantly improves the in-sample fit of historical interest rate data. However, it is not clear whether these models will also perform well in out-of-sample density forecasts for future interest rates. Indeed, some previous studies have shown that more complicated models actually underperform the simple random walk model in predicting the conditional mean of future interest rates (e.g., Duffee 2002). In many financial applications, we are interested in forecasting the whole conditional distribution. We now apply the generalized spectral evaluation method described in Section 2 to evaluate density forecasts of the models under study. Among other things, we examine whether the features found to be important for in-sample performance remain important for out-of-sample forecasts, and whether the best in-sample performing models still perform best in out-of-sample forecasts.

For each model, we first calculate $Z_t(\hat{\theta})$, the model generalized residuals, using the prediction sample, with model parameter estimates based on the estimation sample. Table 6 reports the out-of-sample evaluation statistic M1 for each model. To compute M1, we select a normal cdf, $\Phi(\sqrt{12}\,u)$, for the weighting function $W(u)$ and a Bartlett kernel for the kernel function $k(z)$. To choose a lag order $p$, we use a data-driven method that minimizes the asymptotic integrated mean squared error of the modified generalized spectral density. This method involves choosing a preliminary lag order $\bar{p}$. We examine a wide range of $\bar{p}$ in our application. We obtain similar results for the preliminary lag order $\bar{p}$ between 10 and 30 and report results for $\bar{p} = 20$ in Table 6. For comparison, we also report the in-sample log-likelihood values obtained from the estimation sample.
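The first step can be illustrated with a short Python sketch. It assumes that the generalized residual of Section 2 is the probability integral transform of the observation under the one-step-ahead forecast density, which is consistent with the later remark that the residuals should look uniform, and it evaluates that transform for a normal-mixture forecast such as those implied by the regime-switching and jump-diffusion models. The function names and inputs are hypothetical, and the M1 statistic itself is not computed here.

```python
import math

def normal_cdf(x, mean, var):
    """Cumulative distribution function of N(mean, var) at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def generalized_residual(x, weights, means, variances):
    """Probability integral transform of x under a normal-mixture forecast density."""
    return sum(w * normal_cdf(x, m, v) for w, m, v in zip(weights, means, variances))

def histogram_counts(z_values, bins=10):
    """Counts in equal-width bins on [0, 1]; roughly flat when forecasts are adequate."""
    counts = [0] * bins
    for z in z_values:
        counts[min(int(z * bins), bins - 1)] += 1
    return counts

# Illustrative call: a two-component forecast density and one observed value.
z = generalized_residual(0.1, weights=[0.9, 0.1], means=[0.0, 0.05],
                         variances=[0.04, 0.29])
```

If the forecast density is correct, these values are independent and uniform on [0, 1], so a histogram with a pronounced central peak signals that the model puts too little mass near the center of the distribution.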

The M1 statistics show that all single-factor diffusion models are overwhelmingly rejected. One of the most interesting findings is that models that include a drift term (either linear or nonlinear) have much worse out-of-sample performance than those without a drift. Dothan's (1978) model and the pure CEV model have the best density forecast performance. Both models have a zero drift and level effect; the elasticity parameter is $\rho = 1$ for the former and .25 (.60 for the 1979–1982 period) for the latter. On the other hand, although the CKLS model and Ait-Sahalia's (1996) nonlinear drift model have the highest in-sample likelihood values, they have the worst out-of-sample density forecasts in terms of M1. This seems to suggest that for the purpose of density forecast, it is much more important to model the diffusion function than the drift function. Of course, this does not necessarily mean that the drift is not important.

Table 6. Out-of-Sample Density Forecast Performance of Spot Interest Rate Models

| Model | M1 | Log-likelihood |
|---|---|---|
| Single-factor diffusion models | | |
| Random walk | .108*** | 2,649.6 |
| Log-normal | .121*** | 2,187.0 |
| Dothan | .067** | 2,184.9 |
| Pure CEV | .082** | 2,654.5 |
| Vasicek | .213*** | 2,655.5 |
| CIR | .219*** | 2,672.0 |
| CKLS | .219*** | 2,661.7 |
| Nonlinear drift | .224*** | 2,662.5 |
| GARCH models | | |
| No drift GARCH | .079** | 5,686.8 |
| Linear drift GARCH | .276*** | 5,699.5 |
| Nonlinear drift GARCH | .339*** | 5,706.3 |
| No drift CEV–GARCH | .077** | 5,701.1 |
| Linear drift CEV–GARCH | .291*** | 5,715.2 |
| Nonlinear drift CEV–GARCH | .358*** | 5,725.5 |
| Regime-switching models | | |
| No drift RS CEV | .129*** | 6,493.1 |
| Linear drift RS CEV | .091*** | 6,507.9 |
| Nonlinear drift RS CEV | .162*** | 6,512.8 |
| No drift RS GARCH | .118*** | 7,079.0 |
| Linear drift RS GARCH | .047* | 7,132.8 |
| Nonlinear drift RS GARCH | .055** | 7,138.2 |
| No drift RS CEV–GARCH | .124*** | 7,082.8 |
| Linear drift RS CEV–GARCH | .056** | 7,138.6 |
| Nonlinear drift RS CEV–GARCH | .067** | 7,144.1 |
| Jump-diffusion models | | |
| No drift JD CEV | .180*** | 6,306.3 |
| Linear drift JD CEV | .039* | 6,317.2 |
| Nonlinear drift JD CEV | .076** | 6,326.3 |
| No drift JD GARCH | .178*** | 6,794.1 |
| Linear drift JD GARCH | .056** | 6,810.8 |
| Nonlinear drift JD GARCH | .067** | 6,813.9 |
| No drift JD CEV–GARCH | .174*** | 6,796.3 |
| Linear drift JD CEV–GARCH | .053** | 6,812.9 |
| Nonlinear drift JD CEV–GARCH | .065** | 6,816.1 |

NOTE: This table reports the out-of-sample density forecast performance of the single-factor diffusion, GARCH, Markov regime-switching, and jump-diffusion models, measured by the M1 statistic. For convenience of comparison, we also report the in-sample log-likelihood value for each model. The in-sample parameter estimation is based on the observations from June 14, 1961 to February 22, 1991, and the out-of-sample density forecast evaluation is based on the observations from February 23, 1991 to December 29, 2000. The ratio between the size of the estimation sample and the forecast sample is about 3:1. In computing the out-of-sample M1 statistics, a preliminary lag order $\bar{p}$ of 20 is used. Other lag orders give similar results. The asymptotic critical values for the out-of-sample evaluation statistic are .087, .051, and .037 at the 1%, 5%, and 10% levels, and ***, **, and * indicate that the M1 statistic is significant at the 1%, 5%, and 10% levels.
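The star notation in Table 6 amounts to comparing each M1 value with the three critical values. The Python helper below is a hypothetical illustration, not part of the paper. It reads the critical values as .087, .051, and .037 (the decimal placement is an assumption about the extracted text) and assumes an upper-tailed test, so that larger M1 values are stronger evidence against a model.

```python
# (critical value, marker) pairs for the 1%, 5% and 10% levels, largest first.
CRITICAL_VALUES = ((0.087, "***"), (0.051, "**"), (0.037, "*"))

def significance_stars(m1: float) -> str:
    """Return the marker for the strictest level at which m1 exceeds the critical value."""
    for critical, stars in CRITICAL_VALUES:
        if m1 > critical:
            return stars
    return ""

# Values read from Table 6 under the same decimal-placement assumption.
examples = {
    "Random walk": significance_stars(0.108),
    "Dothan": significance_stars(0.067),
    "Linear drift JD CEV": significance_stars(0.039),
}
```

Under this reading, every model in the table is rejected at the 10% level or stricter, and the linear drift JD CEV and linear drift RS GARCH models come closest to passing.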


It might be simply because the drift specifications considered are severely misspecified and do not outperform the zero-drift model. In fact, most drift parameter estimates have large standard errors and thus have excess sampling variation in parameter estimation. As a consequence, assuming zero drift may result in a smaller adverse effect on out-of-sample performance.

The generalized residuals $\{Z_t(\theta)\}$ contain much information about possible sources of model misspecification. Instead of being uniform, the histograms of the generalized residuals for all diffusion models (Fig. 2) exhibit a high peak in the center of the distribution, which indicates that the diffusion models cannot satisfactorily capture the marginal density of the spot rate. The separate inference statistics M(m, l) in Table 7 also show that all the models fail to satisfactorily capture the dynamics of the generalized residuals. In particular, the large M(2, 2) and M(4, 4) statistics indicate that diffusion models perform rather poorly in modeling the conditional variance and kurtosis of the generalized residuals.

Similar to single-factor diffusion models, GARCH models with a zero drift have much better density forecasts than those models with a linear or nonlinear drift. The best GARCH and diffusion models have M1 statistics roughly between 0.7 and 0.8. Comparing the best GARCH and single-factor diffusion models, we find that GARCH provides better density forecasts than CEV, and combining CEV and GARCH further improves the density forecasts. Diagnostic analysis of generalized residuals shows that GARCH models provide certain improvements over diffusion models. For example, the histograms of the generalized residuals of GARCH models have a less-pronounced peak in the center. The M(m, l) statistics show that GARCH models significantly improve the fitting of even-order moments of the generalized residuals; the M(2, 2) and M(4, 4) statistics are reduced from close to 100 to single digits.

[Figure 2, panels (a) through (h)]

Figure 2. Histograms of the Generalized Residuals of Four Classes of Spot Rate Models. This figure plots the histograms of the generalized residuals of the single-factor diffusion, GARCH, regime-switching, and jump-diffusion spot rate models.

In summary, our analysis of single-factor diffusion and GARCH models demonstrates that both classes of models fail to provide good density forecasts for future interest rates. They fail to capture both the high peak in the center of the marginal density and the dynamics of different moments of the generalized residuals. GARCH models have a more uniform marginal density and significantly improve the ability to model the dynamics of even-order moments of the generalized residuals. For both classes of models, modeling the conditional variance through CEV or GARCH is much more important than modeling the conditional mean of the interest rate. Zero-drift models outperform either linear or nonlinear drift models in density forecast, although the M(1, 1) statistics suggest that neither model adequately captures the dynamic structure in the conditional mean of the generalized residuals.

We next turn to regime-switching models. Regime-switching models generally have better density forecasts than single-factor diffusion and GARCH models; the best regime-switching models reduce the M1 statistics of the best diffusion and GARCH models from above 0.7 to about 0.5. Although modeling the drift does not improve density forecast for diffusion and GARCH models, the best regime-switching models in density forecast have a linear drift in each regime. As pointed out by Ang and Bekaert (2002) and Li and Xu (2000), a regime-switching model with a linear drift in each regime actually implies nonlinear conditional mean dynamics for the interest rate. Thus our results suggest that perhaps a specific nonlinear drift is needed for out-of-sample density forecasts. GARCH is more important than CEV for density forecasts, and combining CEV with GARCH does not further improve model performance. Overall, the regime-switching GARCH model with a linear drift in each regime provides the best out-of-sample forecasts among all of the regime-switching models. Diagnostic analysis shows that regime-switching models provide a much better characterization of the marginal density of interest rates; the histograms of the generalized residuals are much closer to U[0, 1]. The M(2, 2) and M(4, 4) statistics of regime-switching models are slightly higher than those of GARCH models, however, and the M(1, 1) and M(3, 3) statistics of regime-switching models are actually higher than those of diffusion and GARCH models. Therefore, the advantages of regime-switching models seem to come from better modeling of the marginal density rather than from the dynamics of the generalized residuals.

The density forecast performance of jump-diffusion models shares many common features with that of regime-switching models. First, they have comparable performance; the M1 statistics of the best jump-diffusion models are also around 0.5. Second, for jump-diffusion models, the linear drift specification outperforms those with no drift or nonlinear drift for density forecasts. Third, for jump-diffusion models, GARCH generally provides better density forecasts than CEV. Fourth, jump-diffusion models have similar advantages over single-factor diffusion and GARCH models. They have much better characterization of the marginal density of the generalized residuals, but they may not necessarily perform better in modeling the dynamics of the generalized residuals, as indicated by their large M(m, l) statistics. The jump model with a linear drift and level effect, although it has the smallest M1 statistic, apparently fails to capture the even-order moments of the generalized residuals, with M(2, 2) and M(4, 4) statistics significantly higher than other jump-diffusion models that include GARCH effects.

Table 7. Separate Inference Statistics for Out-of-Sample Density Forecast Performance

Model                              M(1,1)   M(1,2)   M(2,1)   M(2,2)   M(3,3)   M(4,4)

Single-factor diffusion models
  Random walk                       40.08     6.35    40.48    99.73    21.50    80.12
  Log-normal                        46.52     8.77    39.91   110.10    26.98    92.51
  Dothan                            46.04     8.07    42.62   108.90    27.02    91.03
  Pure CEV                          41.62     6.63    41.55    96.81    22.85    80.42
  Vasicek                           40.40     8.13    35.55   100.20    21.43    81.79
  CIR                               43.61     8.68    36.30    98.57    24.39    85.02
  CKLS                              42.02     8.55    35.75    97.89    22.80    82.77
  Nonlinear drift                   42.40     8.37    35.93    98.47    22.77    82.98

GARCH models
  No drift GARCH                    40.73     3.75    28.21     7.68    17.57     2.53
  Linear drift GARCH                41.07     4.66    25.11     7.53    17.00     2.35
  Nonlinear drift GARCH             40.71     4.87    24.70     7.02    16.63     2.09
  No drift CEV–GARCH                41.30     4.10    28.40     7.46    19.24     2.74
  Linear drift CEV–GARCH            41.68     5.11    25.06     7.09    18.73     2.44
  Nonlinear drift CEV–GARCH         41.43     5.45    24.32     6.45    18.54     2.17

Regime-switching models
  No drift RS CEV                   53.16     3.13    11.31     7.05    35.02    10.21
  Linear drift RS CEV               53.25     4.56    11.16     6.57    34.31     8.83
  Nonlinear drift RS CEV            53.75     5.12     9.84     6.56    34.50     8.49
  No drift RS GARCH                 57.29     4.05    21.59    10.95    43.71     7.63
  Linear drift RS GARCH             55.46     4.56    21.09    12.38    40.71     9.14
  Nonlinear drift RS GARCH          55.69     4.50    21.05    12.14    41.13     8.93
  No drift RS CEV–GARCH             45.35     3.69    20.91    10.67    34.25     7.08
  Linear drift RS CEV–GARCH         44.10     4.86    20.75    11.99    32.12     8.87
  Nonlinear drift RS CEV–GARCH      44.36     4.84    20.55    11.40    32.46     8.32

Jump-diffusion models
  No drift JD CEV                   47.38     5.32    16.76    85.10    43.01    99.31
  Linear drift JD CEV               46.32     4.48    19.63    85.09    42.26    99.82
  Nonlinear drift JD CEV            46.50     4.85    18.31    85.10    42.48    99.29
  No drift JD GARCH                 43.32     5.10    20.35    12.38    28.44     8.96
  Linear drift JD GARCH             42.57     4.60    21.48    12.19    28.22     8.84
  Nonlinear drift JD GARCH          42.62     4.67    21.42    12.34    28.24     8.95
  No drift JD CEV–GARCH             43.55     5.05    20.51    13.36    29.31    10.07
  Linear drift JD CEV–GARCH         42.74     4.54    21.69    13.00    28.93     9.76
  Nonlinear drift JD CEV–GARCH      42.78     4.62    21.57    13.07    28.94     9.81

NOTE: This table reports the separate inference statistics M(m, l) for the four classes of spot rate models. The asymptotically normal statistic M(m, l) can be used to test whether the cross-correlation between the mth and lth moments of $\{Z_t\}$ is significantly different from 0. The choice of (m, l) = (1, 1), (2, 2), (3, 3), and (4, 4) is sensitive to autocorrelations in mean, variance, skewness, and kurtosis of $\{r_t\}$. We report results only for a lag truncation order p = 20; the results for p = 10 and 30 are rather similar. The asymptotic critical values are 1.28, 1.65, and 2.33 at the 10%, 5%, and 1% levels.

In summary, our out-of-sample density forecast evaluation reveals the following important features:

1. For out-of-sample density forecast, the importance of modeling mean reversion in the interest rate is ambiguous. For diffusion and GARCH models, a linear or nonlinear drift gives worse density forecasts than a zero drift. However, when there are regime switches or jumps, linear drift models outperform those with zero or nonlinear drift. The best density forecast models still fail to adequately capture the mean dynamics of the generalized residuals.

2. It is important to capture volatility clustering in interest rates for both in-sample and out-of-sample performance via GARCH. Most of the best-performing models have a GARCH component, which significantly improves their ability to model the dynamics of the even-order moments of the generalized residuals.

3. It is also important to capture excess kurtosis and heavy tails of the interest rate for both in-sample and out-of-sample performance via regime switching or jumps. The advantages of regime-switching and jump models are mainly reflected in modeling the marginal density rather than in the dynamics of the generalized residuals.

Overall, our out-of-sample analysis demonstrates that more-complicated models that incorporate conditional heteroscedasticity and heavy tails of interest rates tend to have a better density forecast. This result is quite different from the results of studies that focus on conditional mean forecasts. As widely documented in the literature, the predictable component in the conditional mean of the interest rate appears insignificant. As a result, random walk models tend to outperform more-sophisticated models in terms of mean forecast (e.g., Duffee 2002). However, density forecast includes all conditional moments, and as a result, those models that can capture the dynamics of higher-order moments tend to have better density forecasts. Our analysis suggests that the more-complicated spot rate models developed in the literature can indeed capture some important features of interest rates and are relevant in applications involving density forecasts of interest rates.


6. CONCLUSION

Despite the numerous empirical studies of spot interest rate models, little effort has been devoted to examining their out-of-sample forecast performance. We have contributed to the literature by providing the first (to our knowledge) comprehensive empirical analysis of the out-of-sample performance of a wide variety of popular spot rate models in forecasting the conditional density of future interest rates. Out-of-sample density forecasts help minimize the data snooping bias due to excessive searching for more-complicated models using similar datasets. The conditional density, which completely characterizes the full dynamics of interest rates, is also an essential input to many financial applications. Using a rigorous econometric evaluation procedure for density forecasts, we examined the out-of-sample performance of single-factor diffusion, GARCH, regime-switching, and jump-diffusion models.

Although previous studies have shown that simpler models, such as the random walk model, tend to have more accurate forecasts for the conditional mean of interest rates, we find that models that capture conditional heteroscedasticity and heavy tails of the spot rate have better out-of-sample density forecasts. GARCH significantly improves modeling of the conditional variance and kurtosis of the generalized residuals, whereas regime-switching and jumps help in modeling the marginal density of interest rates. Therefore, our analysis shows that the more-sophisticated interest rate models developed in the literature can indeed capture some important features of interest rate data and perform better than simpler models in out-of-sample density forecasts. Although we have focused on density forecasts of the spot rates here, there is evidence that the term structure of interest rates is driven by multiple factors (e.g., Dai and Singleton 2000). Extending our analysis to forecast the joint density of multiple yields is an interesting issue that we will address in future research.

ACKNOWLEDGMENTS

The authors are grateful for the helpful comments of Canlin Li, Daniel Morillo, Larry Pohlman, and seminar participants at PanAgora Asset Management, the 2002 Econometric Society European Summer Meeting, and the 2003 Econometric Society North American Winter Meeting. They also thank the associate editor and two anonymous referees, whose comments and suggestions have greatly improved the article, and David George for editorial assistance. Hong's work was supported by National Science Foundation grant SES-0111769. Any errors are the authors' own.

APPENDIX: GENERALIZED SPECTRAL EVALUATION FOR DENSITY FORECASTS

Here we describe Hong's (2003) omnibus evaluation procedure for out-of-sample density forecasts, as well as a class of separate inference procedures. (For more discussion and formal treatment, see Hong 2003.)

A.1 Generalized Spectrum

Generalized spectrum was proposed by Hong (1999) as an analytic tool to provide an alternative to the power spectrum and the higher-order spectrum (e.g., bispectrum) in time series analysis. Define the centered generalized residual

\[
Z_t \equiv Z_t(\theta) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \theta)\, dr - \frac{1}{2}, \qquad t = R+1, \ldots, T,
\]

where $p(r, t \mid I_{t-1}, \theta)$ is a conditional density model for time series $\{r_t\}$. Suppose that the series $\{Z_t\}_{t=-\infty}^{\infty}$ has marginal characteristic function $\varphi(u) \equiv E(e^{iuZ_t})$ and pairwise joint characteristic function $\varphi_j(u, v) \equiv E[\exp\{i(uZ_t + vZ_{t-|j|})\}]$, where $i \equiv \sqrt{-1}$, $u, v \in (-\infty, \infty)$, and $j = 0, \pm 1, \ldots$. The basic idea of generalized spectrum is to transform the series $\{Z_t\}$ and then consider the spectrum of the transformed series. Define the generalized covariance function

\[
\sigma_j(u, v) \equiv \mathrm{cov}\big(e^{iuZ_t}, e^{ivZ_{t-|j|}}\big), \qquad j = 0, \pm 1, \ldots.
\]

Straightforward algebra yields $\sigma_j(u, v) = \varphi_j(u, v) - \varphi(u)\varphi(v)$. Because $\varphi_j(u, v) = \varphi(u)\varphi(v)$ for all $(u, v)$ if and only if $Z_t$ and $Z_{t-|j|}$ are independent (cf. Lukacs 1970), $\sigma_j(u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ over various lags, including those with zero autocorrelation in $\{Z_t\}$.
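To illustrate the last point, the following minimal sketch computes a sample analogue of the generalized covariance for a simulated series that is serially uncorrelated but not independent. The simulated process (a product of adjacent Gaussian shocks), the evaluation point (u, v) = (1, 1), and all function names are assumptions made for illustration; they do not come from the paper.

```python
import cmath
import random


def generalized_covariance(z, j, u, v):
    """Sample analogue of sigma_j(u, v) = phi_j(u, v) - phi(u) * phi(v)."""
    j = abs(j)
    n = len(z)
    phi_u = sum(cmath.exp(1j * u * x) for x in z) / n
    phi_v = sum(cmath.exp(1j * v * x) for x in z) / n
    phi_j = sum(cmath.exp(1j * (u * z[t] + v * z[t - j]))
                for t in range(j, n)) / (n - j)
    return phi_j - phi_u * phi_v


def autocorrelation(z, j):
    """Ordinary sample autocorrelation at lag j."""
    n = len(z)
    mean = sum(z) / n
    var = sum((x - mean) ** 2 for x in z) / n
    cov = sum((z[t] - mean) * (z[t - j] - mean) for t in range(j, n)) / n
    return cov / var


rng = random.Random(0)
n = 20000
shocks = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]

# Independent benchmark series.
iid = shocks[1:]
# Uncorrelated but dependent: adjacent values share one shock.
dependent = [shocks[t] * shocks[t - 1] for t in range(1, n + 1)]

rho_dep = autocorrelation(dependent, 1)
sigma_dep = generalized_covariance(dependent, 1, 1.0, 1.0)
sigma_iid = generalized_covariance(iid, 1, 1.0, 1.0)

print("lag-1 autocorrelation (dependent):", round(rho_dep, 4))
print("|sigma_1(1,1)| dependent:", round(abs(sigma_dep), 4))
print("|sigma_1(1,1)| iid:      ", round(abs(sigma_iid), 4))
```

In this setup the lag-1 autocorrelation of the dependent series is close to 0, while the modulus of its generalized covariance is clearly larger than that of the iid benchmark, which is the kind of dependence an autocorrelation-based check would miss.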

Under certain regularity conditions on the temporal dependence of $\{Z_t\}$, the Fourier transform of the generalized covariance function $\sigma_j(u, v)$ exists and is given by

\[
f(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi].
\]

Like $\sigma_j(u, v)$, $f(\omega, u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ across various lags. An advantage of spectral analysis is that $f(\omega, u, v)$ incorporates all lags simultaneously. Thus it can capture the dependent processes in which serial dependence occurs only at higher-order lags. This can arise from seasonality (e.g., calendar effects) or time delay in financial markets. Moreover, $f(\omega, u, v)$ is particularly powerful in capturing the dependent alternatives whose serial dependence decays to 0 slowly as $j \to \infty$, such as slow mean-reverting or persistent volatility clustering processes. For such processes, $f(\omega, u, v)$ has a sharp peak at frequency 0.

The function $f(\omega, u, v)$ does not require any moment condition on $\{Z_t\}$. When the moments of $\{Z_t\}$ exist, however, we can differentiate $f(\omega, u, v)$ with respect to $(u, v)$ at $(0, 0)$:

\[
f^{(0,m,l)}(\omega, 0, 0) \equiv \frac{\partial^{m+l}}{\partial u^m\, \partial v^l} f(\omega, u, v) \bigg|_{(u,v)=(0,0)}
= \frac{i^{m+l}}{2\pi} \sum_{j=-\infty}^{\infty} \mathrm{cov}\big(Z_t^m, Z_{t-|j|}^l\big)\, e^{-ij\omega}.
\]

In particular, $f^{(0,1,1)}(\omega, 0, 0)$ is, up to the sign factor $i^2 = -1$, the well-known power spectral density. (For applications of the power spectral density in economics and finance, see, e.g., Granger 1969; Durlauf 1991; Watson 1993.) For this reason, $f(\omega, u, v)$ is called the "generalized spectral density" of $\{Z_t\}$. The parameters $(u, v)$ provide much flexibility in capturing linear and nonlinear dependence. Because $f(\omega, u, v)$ can be decomposed as a weighted sum of $f^{(0,m,l)}(\omega, 0, 0)$ over various $(m, l)$ via a Taylor series expansion around $(0, 0)$, it contains information on all autocorrelations and cross-correlations in the various power terms of $\{Z_t\}$. Thus it can be used to develop an omnibus procedure against a wide range of dependent processes.
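The link between the derivatives of the generalized spectrum and the moment cross-covariances can be spelled out term by term. The following is a sketch, assuming $m, l \ge 1$, that the relevant moments exist, and that differentiation can be interchanged with expectation:

```latex
\begin{align*}
\frac{\partial^{m+l}}{\partial u^m\,\partial v^l}\,\sigma_j(u,v)\bigg|_{(u,v)=(0,0)}
 &= \frac{\partial^{m+l}}{\partial u^m\,\partial v^l}
    \Big\{ E\big[e^{iuZ_t}e^{ivZ_{t-|j|}}\big]
         - E\big[e^{iuZ_t}\big]\,E\big[e^{ivZ_{t-|j|}}\big] \Big\}\bigg|_{(u,v)=(0,0)} \\
 &= E\big[(iZ_t)^m (iZ_{t-|j|})^l\big]
    - E\big[(iZ_t)^m\big]\,E\big[(iZ_{t-|j|})^l\big] \\
 &= i^{m+l}\,\mathrm{cov}\big(Z_t^m,\, Z_{t-|j|}^l\big).
\end{align*}
```

Summing over $j$ with the Fourier weights $e^{-ij\omega}/(2\pi)$ then gives the derivative of the generalized spectral density. For $m = l = 1$ the factor is $i^2 = -1$, so the $(1, 1)$ derivative carries the autocovariances of $\{Z_t\}$ with a negative sign.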

A.2 Omnibus Evaluation for Density Forecast

To evaluate out-of-sample density forecasts of model $p(r, t \mid I_{t-1}, \theta)$, Hong (2003) used the generalized spectrum to develop an omnibus test of iid $U[-\frac{1}{2}, \frac{1}{2}]$ for the centered generalized residual $Z_t$. To test $H_0$: $\{Z_t\} \sim$ iid $U[-\frac{1}{2}, \frac{1}{2}]$, we use the modified generalized covariance

\[
\sigma_j^U(u, v) \equiv \varphi_j(u, v) - \varphi_U(u)\varphi_U(v),
\]

where $\varphi_U(u) \equiv \sin(u/2)/(u/2)$ is the characteristic function of a $U[-\frac{1}{2}, \frac{1}{2}]$ random variable. Because $\sigma_j^U(u, v)$ has incorporated the information of $U[-\frac{1}{2}, \frac{1}{2}]$, it can capture any pairwise serial dependence in $\{Z_t\}$ and any deviation from $U[-\frac{1}{2}, \frac{1}{2}]$. The associated generalized spectrum is

\[
f^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j^U(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi],\ u, v \in (-\infty, \infty).
\]

Under $H_0$, we have $\sigma_j^U(u, v) = 0$ for all $j \neq 0$, and so $f^U(\omega, u, v)$ becomes a known "flat" spectrum:

\[
f_0^U(\omega, u, v) \equiv \frac{1}{2\pi}\, \sigma_0^U(u, v) = \frac{1}{2\pi}\, [\varphi_U(u + v) - \varphi_U(u)\varphi_U(v)], \qquad \omega \in [-\pi, \pi].
\]

When $Z_t$ and $Z_{t-j}$ are not independent at some lag $j \neq 0$, or when $Z_t$ is not $U[-\frac{1}{2}, \frac{1}{2}]$, we have $f^U(\omega, u, v) \neq f_0^U(\omega, u, v)$. Hence, to test $H_0$, one can check whether a consistent estimator of $f^U(\omega, u, v)$ is significantly different from $f_0^U(\omega, u, v)$.

Suppose that $\hat{\theta}$ is an estimator for model parameter $\theta$ based on the estimation sample $\{r_t\}_{t=1}^{R}$. To estimate $f^U(\omega, u, v)$ using the prediction sample $\{r_t\}_{t=R+1}^{N}$, we have to use the estimated proxies $\{\hat{Z}_t\}_{t=R+1}^{N}$, where

\[
\hat{Z}_t \equiv Z_t(\hat{\theta}) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \hat{\theta})\, dr - \frac{1}{2}, \qquad t = R+1, \ldots, N.
\]
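As a concrete illustration of the estimated generalized residuals, the sketch below assumes a deliberately simple forecast model: a hypothetical zero-drift Gaussian random walk, $r_t = r_{t-1} + \sigma \varepsilon_t$, whose single parameter is estimated on the first R observations only. The simulated data, the split point, and the function names are illustrative assumptions, not the paper's data or code.

```python
import math
import random


def normal_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def fit_random_walk_sigma(rates):
    """MLE of sigma in r_t = r_{t-1} + sigma * eps_t (zero drift assumed)."""
    diffs = [b - a for a, b in zip(rates, rates[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def generalized_residuals(rates, split, sigma_hat):
    """Centered probability integral transforms for t = split, ..., end.

    Under the assumed model the conditional density of r_t given r_{t-1}
    is N(r_{t-1}, sigma^2), so the integral reduces to a normal cdf.
    """
    return [normal_cdf((rates[t] - rates[t - 1]) / sigma_hat) - 0.5
            for t in range(split, len(rates))]


# Simulate a rate path from the assumed model (illustrative values).
rng = random.Random(1)
rates = [0.05]
for _ in range(2000):
    rates.append(rates[-1] + 0.002 * rng.gauss(0.0, 1.0))

R = 1500                                     # size of the estimation sample
sigma_hat = fit_random_walk_sigma(rates[:R])  # uses the estimation sample only
z_hat = generalized_residuals(rates, R, sigma_hat)

print("sigma_hat:", sigma_hat)
print("number of out-of-sample residuals:", len(z_hat))
print("mean of residuals:", sum(z_hat) / len(z_hat))
```

Because the data here are generated by the same model that is used for forecasting, the residuals should look roughly uniform on [-1/2, 1/2] with little serial dependence; a misspecified model would not have this property.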

Define the empirical measure

\[
\hat{\sigma}_j^U(u, v) \equiv \hat{\varphi}_j(u, v) - \varphi_U(u)\varphi_U(v), \qquad j = 0, \pm 1, \ldots, \pm(n-1),
\]

where $n \equiv N - R$ is the size of the prediction sample and the empirical pairwise joint characteristic function is

\[
\hat{\varphi}_j(u, v) \equiv (n - |j|)^{-1} \sum_{t=R+|j|+1}^{N} \exp\big\{ i\big(u\hat{Z}_t + v\hat{Z}_{t-|j|}\big) \big\}.
\]

We introduce a class of smoothed kernel estimators

\[
\hat{f}^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=1-n}^{n-1} (1 - |j|/n)^{1/2}\, k(j/p)\, \hat{\sigma}_j^U(u, v)\, e^{-ij\omega},
\]

where $k(\cdot)$ is a kernel and $p \equiv p(n)$ is a bandwidth/lag order. An example of $k(\cdot)$ is the Bartlett kernel

\[
k(z) =
\begin{cases}
1 - |z| & \text{if } |z| \le 1, \\
0 & \text{otherwise.}
\end{cases}
\]

For the choice of $p$, Hong (1999) suggested a data-driven method that delivers an asymptotically optimal bandwidth in terms of an integrated mean squared error criterion. This still involves a preliminary bandwidth $\bar{p}$, but the impact of choosing $\bar{p}$ is much smaller than the impact of choosing the lag order $p$ directly.

A convenient divergence measure for $\hat{f}^U(\omega, u, v)$ and $f_0^U(\omega, u, v)$ is the quadratic form

\[
\begin{aligned}
2\pi n\, Q(\hat{f}^U, f_0^U)
 &\equiv 2\pi n \iint\!\!\int_{-\pi}^{\pi} \big| \hat{f}^U(\omega, u, v) - f_0^U(\omega, u, v) \big|^2 \, d\omega\, dW(u)\, dW(v) \\
 &= n \iint \big| \hat{\varphi}_0(u + v) - \varphi_U(u + v) \big|^2 \, dW(u)\, dW(v) \\
 &\quad + 2 \sum_{j=1}^{n-1} k^2(j/p)\,(n - j) \iint \big| \hat{\sigma}_j^U(u, v) \big|^2 \, dW(u)\, dW(v),
\end{aligned}
\]

where the weighting function $W(\cdot)$ is positive and nondecreasing, and the unspecified integrals are taken over the support of $W(\cdot)$. An example of this is

\[
W(u) = \Phi(\sqrt{12}\, u),
\]

where $\Phi(\cdot)$ is the N(0, 1) cdf, which is commonly used in the characteristic function literature. The scale $\sqrt{12}$ matches the standard deviation of $U[-\frac{1}{2}, \frac{1}{2}]$, but this is not necessary.

Our evaluation statistic for $H_0$ is a properly centered and scaled version of the quadratic form $Q(\hat{f}^U, f_0^U)$:

\[
M_1 \equiv \bigg[ \sum_{j=1}^{n-1} k^2(j/p) \bigg]^{-1}
\iint \sum_{j=1}^{n-1} k^2(j/p)\,(n - j)\, \big| \hat{\sigma}_j^U(u, v) \big|^2 \, dW(u)\, dW(v)
- \bigg[ \int \big[ 1 - \varphi_U(u)^2 \big]\, dW(u) \bigg]^2.
\]
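The formula can be turned into a short numerical sketch. The version below assumes the Bartlett kernel, the weighting function $W(u) = \Phi(\sqrt{12}u)$ restricted to [-1, 1] without renormalization, and a midpoint-rule grid for the double integral; the grid size, the simulated inputs, and all function names are illustrative assumptions. Because these numerical conventions are not taken from the source, the output should not be compared directly with the tabulated critical values.

```python
import cmath
import math
import random


def phi_uniform(u):
    """Characteristic function of U[-1/2, 1/2]: sin(u/2) / (u/2)."""
    return 1.0 if u == 0.0 else math.sin(u / 2.0) / (u / 2.0)


def bartlett(x):
    """Bartlett kernel: 1 - |x| for |x| <= 1, and 0 otherwise."""
    return max(0.0, 1.0 - abs(x))


def weight_grid(num_points=15):
    """Midpoint nodes and weights approximating dW(u) on [-1, 1]."""
    scale = math.sqrt(12.0)
    h = 2.0 / num_points
    nodes = [-1.0 + (k + 0.5) * h for k in range(num_points)]
    weights = [h * scale * math.exp(-0.5 * (scale * u) ** 2)
               / math.sqrt(2.0 * math.pi) for u in nodes]
    return nodes, weights


def m1_statistic(z, p, num_points=15):
    """Centered, kernel-weighted sum of squared generalized covariances."""
    n = len(z)
    nodes, weights = weight_grid(num_points)
    phi_u = [phi_uniform(u) for u in nodes]
    # expo[a][t] = exp(i * u_a * z_t), computed once and reused for every lag.
    expo = [[cmath.exp(1j * u * x) for x in z] for u in nodes]
    k2 = {j: bartlett(j / p) ** 2 for j in range(1, n) if bartlett(j / p) > 0.0}
    total = 0.0
    for a, wa in enumerate(weights):
        for b, wb in enumerate(weights):
            for j, kj2 in k2.items():
                phi_j = sum(expo[a][t] * expo[b][t - j]
                            for t in range(j, n)) / (n - j)
                sigma = phi_j - phi_u[a] * phi_u[b]
                total += wa * wb * kj2 * (n - j) * abs(sigma) ** 2
    centering = sum(w * (1.0 - f * f) for w, f in zip(weights, phi_u)) ** 2
    return total / sum(k2.values()) - centering


rng = random.Random(2)
n = 300
z_uniform = [rng.random() - 0.5 for _ in range(n)]  # iid U[-1/2, 1/2]
z_shifted = [0.5 * rng.random() for _ in range(n)]  # iid U[0, 1/2], violates H0

m1_uniform = m1_statistic(z_uniform, p=10)
m1_shifted = m1_statistic(z_shifted, p=10)
print("M1, uniform residuals:", m1_uniform)
print("M1, shifted residuals:", m1_shifted)
```

The shifted series mimics a forecast model whose residuals pile up on one side of the distribution; its statistic comes out far larger than that of the uniform series, consistent with reading a smaller M1 as a better density forecast.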

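The asymptotic distribution of $M_1$ was derived by Hong (2003). Under $H_0$ and a set of regularity conditions,

\[
M_1 \xrightarrow{d} \iint |G(u, v)|^2 \, dW(u)\, dW(v)
\]

as $n \to \infty$, $R \to \infty$, and $n^{1+\delta}/R \to 0$ for any arbitrary small constant $\delta > 0$, where $G(u, v)$ is a complex-valued Gaussian process with mean 0 and a known covariance function

\[
\begin{aligned}
\mathrm{cov}[G(u_1, v_1), G(u_2, v_2)]
 &= E[G(u_1, v_1)\, G(u_2, v_2)^*] \\
 &= \sigma_0^U(u_1, -u_2)\, \varphi_U(v_1)\varphi_U(v_2) + \sigma_0^U(u_1, -v_2)\, \varphi_U(u_2)\varphi_U(v_1) \\
 &\quad + \sigma_0^U(v_1, -u_2)\, \varphi_U(u_1)\varphi_U(v_2) \\
 &\quad + \sigma_0^U(v_1, -v_2)\, \varphi_U(u_1)\varphi_U(u_2).
\end{aligned}
\]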


The limit distribution of $M_1$ is nonstandard. However, because the Gaussian process $G(u, v)$ has mean 0 and a known distribution-free covariance function, the limit distribution of $M_1$ can be easily tabulated. When $W(u) = \Phi(\sqrt{12}\, u)$ with a truncated support on $[-1, 1]$, the asymptotic critical values at the 10%, 5%, and 1% levels are 0.37, 0.51, and 0.87.

The most attractive feature of $M_1$ is its omnibus property; it has power against a wide range of suboptimal density forecasts, thanks to the use of the characteristic function in a spectral framework. The $M_1$ statistic can be viewed as an omnibus metric measuring the departure of the forecast model from the true data-generating process. A better density forecast model is expected to have a smaller value for $M_1$, because its generalized residual series $\{Z_t\}$ is closer to having the iid U[0, 1] property. Thus $M_1$ can be used to rank competing density forecast models in terms of their deviations from optimality. Moreover, the kernel function provides flexible weighting for various lags. Nonuniform weighting kernels usually discount higher-order lags. For most, if not all, economic and financial time series, today's behaviors are more affected by recent events than by remote events. Thus downward weighting for higher-order lags is expected to yield good power in finite samples.

A.3 Separate Inferences

When a model is rejected using $M_1$, it is interesting to explore possible reasons for the rejection. Hong (2003) provided a class of rigorous separate inference statistics that exploit the information contained in modeling generalized residuals $\{Z_t\}$ to understand possible reasons for model rejection. The $M(m, l)$ statistic is based on a kernel estimator of the generalized spectral density derivative $f^{(0,m,l)}(\omega, 0, 0)$, where $m$ and $l$ are the orders of the derivatives with respect to $u$ and $v$. The statistic $M(m, l)$ is defined as

\[
M(m, l) = \bigg[ \sum_{j=1}^{n-1} (n - j)\, k^2(j/p)\, \hat{\rho}_j^2(m, l) - \sum_{j=1}^{n-1} k^2(j/p) \bigg]
\bigg/ \bigg[ 2 \sum_{j=1}^{n-2} k^4(j/p) \bigg]^{1/2},
\]

where $\hat{\rho}_j(m, l)$ is the sample cross-correlation between $\hat{Z}_t^m$ and $\hat{Z}_{t-|j|}^l$, $t = R+1, \ldots, N$. Under $H_0$ and a set of regularity conditions,

\[
M(m, l) \xrightarrow{d} N(0, 1)
\]

for any given $(m, l)$ as $n, R \to \infty$. The upper-tailed N(0, 1) critical values at the 10%, 5%, and 1% levels are 1.28, 1.65, and 2.33. As in the $M_1$ test, Hong (2003) considered a data-driven method for $p$ that also involves the choice of a preliminary lag order $\bar{p}$.
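A minimal sketch of the M(m, l) computation follows, assuming the Bartlett kernel and a sample cross-correlation normalized by the full-sample standard deviations of the two power series. Both choices, the simulated inputs, and the function names are illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def bartlett(x):
    """Bartlett kernel: 1 - |x| for |x| <= 1, and 0 otherwise."""
    return max(0.0, 1.0 - abs(x))


def normal_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def m_statistic(z, m, l, p):
    """Standardized kernel-weighted sum of squared cross-correlations
    between z_t^m and z_{t-j}^l over lags j = 1, 2, ..."""
    n = len(z)
    x = [v ** m for v in z]
    y = [v ** l for v in z]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    numerator = 0.0
    sum_k4 = 0.0
    for j in range(1, n):
        k = bartlett(j / p)
        if k == 0.0:
            break  # Bartlett weights vanish for every lag j >= p
        cov = sum((x[t] - mean_x) * (y[t - j] - mean_y)
                  for t in range(j, n)) / n
        rho = cov / (sd_x * sd_y)
        numerator += k ** 2 * ((n - j) * rho ** 2 - 1.0)
        if j <= n - 2:
            sum_k4 += k ** 4
    return numerator / math.sqrt(2.0 * sum_k4)


rng = random.Random(3)
n = 1000
z_iid = [rng.random() - 0.5 for _ in range(n)]

# Serially dependent residuals: transform of a Gaussian AR(1) with
# coefficient 0.8, rescaled to unit variance before applying the cdf.
state, z_dep = 0.0, []
for _ in range(n):
    state = 0.8 * state + rng.gauss(0.0, 1.0)
    z_dep.append(normal_cdf(state * math.sqrt(1.0 - 0.8 ** 2)) - 0.5)

m11_iid = m_statistic(z_iid, 1, 1, p=10)
m11_dep = m_statistic(z_dep, 1, 1, p=10)
print("M(1,1), iid residuals:      ", m11_iid)
print("M(1,1), dependent residuals:", m11_dep)
```

For the dependent series the statistic lies far above the 1% critical value of 2.33, flagging neglected dynamics in the level; other (m, l) pairs can be checked by changing the two exponents.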

Although the moments of the generalized residuals $\{Z_t\}$ are not exactly the same as those of the original data $\{r_t\}$, they are highly correlated. In particular, the choice of $(m, l) = (1, 1)$, $(2, 2)$, $(3, 3)$, and $(4, 4)$ is very sensitive to autocorrelations in level, volatility, skewness, and kurtosis of $\{r_t\}$. Furthermore, the choices of $(m, l) = (1, 2)$ and $(2, 1)$ are sensitive to the "ARCH-in-mean" effect and the "leverage" effect. Different choices of order $(m, l)$ can thus allow examination of various dynamic aspects of the underlying process.

[Received June 2002. Revised November 2003.]

REFERENCES

Ait-Sahalia, Y. (1996), "Testing Continuous-Time Models of the Spot Interest Rate," Review of Financial Studies, 9, 385–426.

Ait-Sahalia, Y. (1999), "Transition Densities for Interest Rate and Other Nonlinear Diffusions," Journal of Finance, 54, 1361–1395.

Ait-Sahalia, Y. (2002), "Maximum-Likelihood Estimation of Discretely Sampled Diffusions: A Closed-Form Approach," Econometrica, 70, 223–262.

Ait-Sahalia, Y., and Lo, A. (1998), "Nonparametric Estimation of State-Price Densities Implicit in Financial Asset Prices," Journal of Finance, 53, 499–547.

Andersen, T. G., and Lund, J. (1997), "Estimating Continuous Time Stochastic Volatility Models of the Short-Term Interest Rate," Journal of Econometrics, 77, 343–378.

Ang, A., and Bekaert, G. (2002), "Regime Switches in Interest Rates," Journal of Business & Economic Statistics, 20, 163–182.

Bali, T. (2000), "Modeling the Conditional Mean and Variance of the Short Rate Using Diffusion, GARCH, and Moving Average Models," Journal of Futures Markets, 20, 717–751.

Bali, T. (2001), "Nonlinear Parametric Models of the Short-Term Interest Rate," unpublished manuscript, Baruch College.

Ball, C., and Torous, W. (1998), "Regime Shifts in Short-Term Riskless Interest Rates," unpublished manuscript, University of California, Los Angeles.

Bera, A. K., and Ghosh, A. (2002), "Density Forecast Evaluation With Applications in Finance," unpublished manuscript, University of Illinois at Champaign, Dept. of Economics.

Berkowitz, J. (2001), "Testing the Accuracy of Density Forecasts With Applications to Risk Management," Journal of Business & Economic Statistics, 19, 465–474.

Bliss, R., and Smith, D. (1998), "The Elasticity of Interest Rate Volatility: Chan, Karolyi, Longstaff, and Sanders Revisited," unpublished manuscript, Federal Reserve Bank of Atlanta.

Brenner, R., Harjes, R., and Kroner, K. (1996), "Another Look at Alternative Models of Short-Term Interest Rate," Journal of Financial and Quantitative Analysis, 31, 85–107.

Chacko, G., and Viceira, L. M. (2003), "Spectral GMM Estimation of Continuous-Time Processes," Journal of Econometrics, 116, 259–292.

Chan, K. C., Karolyi, G. A., Longstaff, F. A., and Sanders, A. B. (1992), "An Empirical Comparison of Alternative Models of the Short-Term Interest Rate," Journal of Finance, 47, 1209–1228.

Chapman, D., and Pearson, N. (2001), "Recent Advances in Estimating Models of the Term-Structure," Financial Analysts Journal, 57, 77–95.

Christoffersen, P., and Diebold, F. X. (1996), "Further Results on Forecasting and Model Selection Under Asymmetric Loss," Journal of Applied Econometrics, 11, 561–572.

Christoffersen, P., and Diebold, F. X. (1997), "Optimal Prediction Under Asymmetric Loss," Econometric Theory, 13, 808–817.

Clements, M. P., and Smith, J. (2000), "Evaluating the Forecast Densities of Linear and Non-Linear Models: Applications to Output Growth and Unemployment," Journal of Forecasting, 19, 255–276.

Conley, T. G., Hansen, L. P., Luttmer, E. G. J., and Scheinkman, J. A. (1997), "Short-Term Interest Rates as Subordinated Diffusions," Review of Financial Studies, 10, 525–578.

Corradi, V. (2000), "Reconsidering the Continuous-Time Limit of the GARCH(1,1) Process," Journal of Econometrics, 96, 145–153.

Cox, J. C., Ingersoll, J. E., and Ross, S. A. (1985), "A Theory of the Term Structure of Interest Rates," Econometrica, 53, 385–407.

Dai, Q., and Singleton, K. (2000), "Specification Analysis of Affine Term Structure Models," Journal of Finance, 55, 1943–1978.

Das, S. R. (2002), "The Surprise Element: Jumps in Interest Rates," Journal of Econometrics, 106, 27–65.

Diebold, F. X., Gunther, T. A., and Tay, A. S. (1998), "Evaluating Density Forecasts With Applications to Financial Risk Management," International Economic Review, 39, 863–883.

Diebold, F. X., Hahn, J., and Tay, A. S. (1999), "Multivariate Density Forecast Evaluation and Calibration in Financial Risk Management: High-Frequency Returns of Foreign Exchange," Review of Economics and Statistics, 81, 661–673.

Diebold, F. X., Tay, A. S., and Wallis, K. F. (1999), "Evaluating Density Forecasts of Inflation: The Survey of Professional Forecasters," in Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W. J. Granger, eds. R. F. Engle and H. White, Oxford, U.K.: Oxford University Press, pp. 76–90.


Dothan, L. U. (1978), "On the Term Structure of Interest Rates," Journal of Financial Economics, 6, 59–69.

Duffee, G. (2002), "Term Premia and Interest Rate Forecasts in Affine Models," Journal of Finance, 57, 405–443.

Duffie, D., and Pan, J. (1997), "An Overview of Value at Risk," Journal of Derivatives, 4, 13–32.

Durham, G. (2003), "Likelihood-Based Specification Analysis of Continuous-Time Models of the Short-Term Interest Rate," Journal of Financial Economics, 70, 463–487.

Durlauf, S. (1991), "Spectral Based Testing of the Martingale Hypothesis," Journal of Econometrics, 50, 355–376.

Fackler, P. L., and King, R. P. (1990), "Calibration of Option-Based Probability Assessments in Agricultural Commodity Markets," American Journal of Agricultural Economics, 72, 73–83.

Gallant, A. R., and Tauchen, G. (1998), "Reprojection Partially Observed Systems With Applications to Interest Rate Diffusions," Journal of the American Statistical Association, 93, 10–24.

Ghysels, E., Carrasco, M., Chernov, M., and Florens, J. (2001), "Estimating Diffusions With a Continuum of Moment Conditions," unpublished manuscript, University of North Carolina.

Granger, C. (1969), "Testing for Causality and Feedback," Econometrica, 37, 424–438.

Granger, C. (1999), "The Evaluation of Econometric Models and of Forecasts," invited lecture at the Far Eastern Meeting of the Econometric Society, Singapore.

Granger, C., and Pesaran, M. H. (2000), "A Decision Theoretic Approach to Forecasting Evaluation," in Statistics and Finance: An Interface, eds. W. S. Chan, W. K. Li, and H. Tong, London: Imperial College, pp. 261–278.

Gray, S. (1996), "Modeling the Conditional Distribution of Interest Rates as a Regime-Switching Process," Journal of Financial Economics, 42, 27–62.

Hamilton, J. D. (1989), "A New Approach to the Econometric Analysis of Nonstationary Time Series and the Business Cycle," Econometrica, 57, 357–384.

Hong, Y. (1999), "Hypothesis Testing in Time Series via the Empirical Characteristic Function: A Generalized Spectral Density Approach," Journal of the American Statistical Association, 94, 1201–1220.

Hong, Y. (2003), "Evaluation of Out-of-Sample Density Forecasts With Applications to Stock Prices," unpublished manuscript, Cornell University, Dept. of Economics and Dept. of Statistical Science.

Hong, Y., and Lee, T. H. (2003a), "Inference on the Predictability of Foreign Exchange Rate Changes via Generalized Spectrum and Nonlinear Models," Review of Economics and Statistics, 84, 1048–1062.

Hong, Y., and Lee, T. H. (2003b), "Diagnostic Checking for Nonlinear Time Series Models," Econometric Theory, 19, 1065–1121.

Hong, Y., and Li, H. (2004), "Nonparametric Specification Testing for Continuous-Time Models With Applications to Interest Rate Term Structures," Review of Financial Studies, forthcoming.

Jackwerth, J. C., and Rubinstein, M. (1996), "Recovering Probability Distributions From Option Prices," Journal of Finance, 51, 1611–1631.

Jiang, G., and Knight, J. (1997), "A Nonparametric Approach to the Estimation of Diffusion Processes With an Application to a Short-Term Interest Rate Model," Econometric Theory, 13, 615–645.

Johannes, M. (2003), "The Statistical and Economic Role of Jumps in Interest Rates," Journal of Finance, 59, 227–260.

Jorion, P. (2000), Value at Risk: The New Benchmark for Managing Financial Risk, New York: McGraw-Hill.

Morgan, J. P. (1996), RiskMetrics - Technical Document (4th ed.), New York: Morgan Guaranty Trust Company.

Koedijk, K. G., Nissen, F. G., Schotman, P. C., and Wolff, C. C. P. (1997), "The Dynamics of Short-Term Interest Rate Volatility Reconsidered," European Finance Review, 1, 105–130.

Li, H., and Xu, Y. (2000), "Regime Shifts and Short Rate Dynamics," unpublished manuscript, Cornell University.

Lo, A., and MacKinlay, A. C. (1989), "Data-Snooping Biases in Tests of Financial Asset Pricing Models," Review of Financial Studies, 3, 175–208.

Lukacs, E. (1970), Characteristic Functions (2nd ed.), London: Charles Griffin.

Meese, R., and Rogoff, K. (1983), "Empirical Exchange Rate Models of the Seventies: Do They Fit Out-of-Sample?" Journal of International Economics, 14, 3–24.

Nelson, D. (1990), "ARCH Models as Diffusion Approximation," Journal of Econometrics, 45, 7–39.

Nelson, D. (1991), "Conditional Heteroskedasticity in Asset Returns: A New Approach," Econometrica, 59, 347–370.

Rosenblatt, M. (1952), "Remarks on a Multivariate Transformation," The Annals of Mathematical Statistics, 23, 470–472.

Rothschild, M., and Stiglitz, J. (1970), "Increasing Risk: I. A Definition," Journal of Economic Theory, 2, 225–243.

Singleton, K. (2001), "Estimation of Affine Asset Pricing Models Using the Empirical Characteristic Function," Journal of Econometrics, 102, 111–141.

Soderlind, P., and Svensson, L. (1997), "New Techniques to Extract Market Expectations From Financial Instruments," Journal of Monetary Economics, 40, 383–429.

Stanton, R. (1997), "A Nonparametric Model of Term Structure Dynamics and the Market Price of Interest Rate Risk," Journal of Finance, 52, 1973–2002.

Vasicek, O. (1977), "An Equilibrium Characterization of the Term Structure," Journal of Financial Economics, 5, 177–188.

Watson, M. (1993), "Measures of Fit for Calibrated Models," Journal of Political Economy, 101, 1011–1041.

White, H. (2000), "A Reality Check for Data Snooping," Econometrica, 69, 1097–1127.

458 Journal of Business & Economic Statistics, October 2004

example, the booming industry of financial risk management is essentially dedicated to providing density forecasts for important economic variables and portfolio returns, and to tracking certain aspects of distribution, such as value at risk (VaR), to quantify the risk exposure of a portfolio (e.g., Morgan 1996; Duffie and Pan 1997; Jorion 2000). More generally, modern risk control techniques all involve some form of density forecast, the quality of which has real impact on the efficacy of capital asset/liability allocation. In macroeconomics, monetary authorities in the United States and United Kingdom (the Federal Reserve Bank of Philadelphia and the Bank of England) have been conducting quarterly surveys on density forecasts for inflation and output growth to help set their policy instruments (e.g., inflation target). There is also a growing literature on extracting density forecasts from options prices to obtain useful information on otherwise unobservable market expectations (e.g., Fackler and King 1990; Jackwerth and Rubinstein 1996; Soderlind and Svensson 1997; Ait-Sahalia and Lo 1998).

In the recent development of time series econometrics, there is a growing interest in out-of-sample probability distribution forecasts and their evaluation, motivated from the context of decision making under uncertainty. Important works in this area include those of Bera and Ghosh (2002), Berkowitz (2001), Christoffersen and Diebold (1996, 1997), Clements and Smith (2000), Diebold, Gunther, and Tay (1998), Diebold, Hahn, and Tay (1999), Diebold, Tay, and Wallis (1999), Granger (1999), and Granger and Pesaran (2000). One of the most important issues in density forecasting is to evaluate the quality of a forecast (Granger 1999). Suboptimal density forecasts will have real adverse impact in practice. For example, an excessive forecast for VaR would force risk managers and financial institutions to hold too much capital, imposing an additional cost. Suboptimal density forecasts for important macroeconomic variables may lead to inappropriate policy decisions (e.g., inappropriate level and timing in setting interest rates), which could have serious consequences for the real economy.

Evaluating density forecasts, however, is nontrivial, simply because the probability density function of an underlying process is not observable, even ex post. Unlike for point forecasts, there are few statistical tools for evaluating density forecasts. In a pioneering contribution, Diebold et al. (1998) evaluated density forecasts by examining the probability integral transforms of the data with respect to a density forecast model. Such a transformed series is often called the "generalized residuals" of the density forecast model. It should be iid U[0,1] if the density forecast model correctly captures the full dynamics of the underlying process. Any departure from iid U[0,1] is evidence of suboptimal density forecasts and model misspecification.

Diebold et al. (1998) used an intuitive graphical method to separately examine the iid and uniform properties of the "generalized residuals." This method is simple and informative about possible sources of suboptimality in density forecasts. Hong (2003) recently developed an omnibus evaluation procedure for density forecasts by measuring the departure of the "generalized residuals" from iid U[0,1]. The evaluation statistic provides a metric of the distance between the density forecast model and the true data-generating process. The most appealing feature of this new test is its omnibus ability to detect a wide range of suboptimal density forecasts, thanks to the use of the generalized spectrum. The latter was introduced by Hong (1999) as an analytic tool for nonlinear time series and is based on the characteristic function in a time series context. There has been an increasing interest in using the characteristic function in financial econometrics (see, e.g., Chacko and Viceira 2003; Ghysels, Carrasco, Chernov, and Florens 2001; Hong and Lee 2003a,b; Jiang and Knight 1997; Singleton 2001).

Applying Hong's (2003) procedure, we provide a comprehensive empirical analysis of the density forecast performance of a variety of popular spot rate models, including single-factor diffusion, generalized autoregressive conditional heteroscedasticity (GARCH), regime-switching, and jump-diffusion models. The in-sample performance of these models has been extensively studied, but their out-of-sample performance, especially their density forecast performance, is still largely unknown. We find that although previous studies have shown that simpler models, such as the random walk model, tend to provide better forecasts for the conditional mean of many financial time series, including interest rates and exchange rates (e.g., Duffee 2002; Meese and Rogoff 1983), more-sophisticated spot rate models that capture volatility clustering and excess kurtosis and heavy tails of interest rates have better density forecasts. GARCH significantly improves the modeling of the dynamics of the conditional variance and kurtosis of the generalized residuals, whereas regime switching and jumps significantly improve the modeling of the marginal density of interest rates. Our analysis shows that the sophisticated spot rate models in the existing literature can indeed capture some important features of interest rates and are relevant to applications involving density forecasts of interest rates.

The article is organized as follows. In Section 2 we discuss the methodology for density forecast evaluation. In Section 3 we introduce a variety of popular spot rate models. In Section 4 we describe data, estimation methods, and the in-sample performance of each model. In Section 5 we subject each model to out-of-sample density forecast evaluation, and in Section 6 we provide concluding remarks. In the Appendix we discuss Hong's (2003) evaluation procedure for density forecasts.

2. OUT-OF-SAMPLE DENSITY FORECAST EVALUATION

The probability density function is a well-established tool for characterizing uncertainty in economics and finance (e.g., Rothschild and Stiglitz 1970). The importance of density forecasts has been widely recognized in recent literature due to the works of Diebold et al. (1998), Granger (1999), and Granger and Pesaran (2000), among many others. These authors showed that accurate density forecasts are essential for decision making under uncertainty when the forecaster's objective function is asymmetric and the underlying process is non-Gaussian. In a decision-theoretic context, Diebold et al. (1998) and Granger and Pesaran (2000) showed that if a density forecast coincides with the true conditional density of the data-generating process, then it will be preferred by all forecast users regardless of their objective functions (e.g., risk attitudes). Thus testing the optimality of a forecast boils down to checking whether the density forecast model can capture the true data-generating process.

Hong Li and Zhao Out-of-Sample Analysis of Spot Rate Models 459

This is a challenging job, because we never observe an ex post density. So far there are few statistical evaluation procedures for density forecasts.

Diebold et al. (1998) used the probability integral transform of data with respect to the density forecast model to assess the optimality of density forecasts. Extending a result established by Rosenblatt (1952), they showed that if the model conditional density is specified correctly, then the probability integral transformed series should be iid U[0,1].

Specifically, for a given model of interest rate $r_t$, there is a model-implied conditional density

$$\frac{\partial}{\partial r} P(r_t \le r \mid I_{t-1}, \theta) = p(r, t \mid I_{t-1}, \theta),$$

where $\theta$ is an unknown finite-dimensional parameter vector and $I_{t-1} \equiv \{r_{t-1}, \ldots, r_1\}$ is the information set available at time $t-1$. Suppose that we have a random sample $\{r_t\}_{t=1}^{N}$ of size $N$, which we divide into two subsets: an estimation sample $\{r_t\}_{t=1}^{R}$ of size $R$ and a prediction sample $\{r_t\}_{t=R+1}^{N}$ of size $n = N - R$. We can then define the dynamic probability integral transform of the data $\{r_t\}_{t=R+1}^{N}$ with respect to the model conditional density:

$$Z_t(\theta) \equiv \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \theta)\, dr, \qquad t = R+1, \ldots, N.$$

Suppose that the model is correctly specified in the sense that there exists some $\theta_0$ such that $p(r, t \mid I_{t-1}, \theta_0)$ coincides with the true conditional density of $r_t$. Then the transformed sequence $\{Z_t(\theta_0)\}$ is iid U[0,1].

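For example, consider the discretized Vasicek model

$$\Delta r_t = \alpha_0 + \alpha_1 r_{t-1} + \sigma z_t,$$

where $\Delta r_t = r_t - r_{t-1}$ and $z_t \sim$ iid N(0,1). For this model, the conditional density of $r_t$ given $I_{t-1}$ is

$$p(r, t \mid I_{t-1}, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2} \left[ r - \left( \alpha_0 + (1 + \alpha_1) r_{t-1} \right) \right]^2 \right\}, \qquad r \in (-\infty, \infty),$$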

where $\theta = (\alpha_0, \alpha_1, \sigma)'$. Suppose that $p(r, t \mid I_{t-1}, \theta_0) = p_0(r, t \mid I_{t-1})$ for some $\theta_0$, where $p_0(r, t \mid I_{t-1})$ is the true conditional density of $r_t$. Then

$$Z_t(\theta_0) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \theta_0)\, dr = \Phi(z_t) \sim \text{iid U}[0,1],$$

where $\Phi(\cdot)$ is the N(0,1) cumulative distribution function (cdf).

As noted earlier, the transformed series $\{Z_t(\hat\theta)\}$ is called the "generalized residual" of model $p(r, t \mid I_{t-1}, \theta)$, where $\hat\theta$ is a consistent estimator for $\theta_0$. This series provides a convenient approach for evaluating the density forecast model $p(r, t \mid I_{t-1}, \theta)$. Intuitively, the U[0,1] property characterizes the correct specification for the stationary distribution of $r_t$, and the iid property characterizes correct dynamic specification for $r_t$. If $Z_t(\theta)$ is not iid U[0,1] for all $\theta$, then the density forecast model $p(r, t \mid I_{t-1}, \theta)$ is not optimal, and there is room for further improvement.
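The transform for the discretized Vasicek example can be sketched numerically. The following is a minimal illustration, not the article's code: the parameter values and the helper names (`simulate_vasicek`, `generalized_residuals`) are hypothetical, and the data are simulated from the model itself, so the resulting residuals should behave like a U[0,1] sample with mean near 1/2 and variance near 1/12.

```python
import math
import random

def normal_cdf(x):
    """Standard normal cdf, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def simulate_vasicek(n, alpha0, alpha1, sigma, r0, seed=0):
    """Simulate r_t from Delta r_t = alpha0 + alpha1 * r_{t-1} + sigma * z_t."""
    rng = random.Random(seed)
    path = [r0]
    for _ in range(n):
        prev = path[-1]
        path.append(prev + alpha0 + alpha1 * prev + sigma * rng.gauss(0.0, 1.0))
    return path

def generalized_residuals(path, alpha0, alpha1, sigma):
    """Z_t = Phi((r_t - (alpha0 + (1 + alpha1) * r_{t-1})) / sigma)."""
    return [
        normal_cdf((curr - (alpha0 + (1.0 + alpha1) * prev)) / sigma)
        for prev, curr in zip(path[:-1], path[1:])
    ]

# Illustrative (made-up) parameter values, not estimates from the article.
path = simulate_vasicek(5000, alpha0=0.003, alpha1=-0.05, sigma=0.01, r0=0.06)
z = generalized_residuals(path, alpha0=0.003, alpha1=-0.05, sigma=0.01)
mean_z = sum(z) / len(z)
var_z = sum((v - mean_z) ** 2 for v in z) / len(z)
print(mean_z, var_z)  # compare with 1/2 and 1/12 for a U[0,1] sample
```

Evaluating the transform at parameter values other than those used in the simulation would distort the residuals away from uniformity, which is the kind of departure the evaluation procedure is designed to detect.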

To test the joint hypothesis of iid U[0,1] for $Z_t(\theta)$ is nontrivial. One may suggest using the well-known Kolmogorov–Smirnov test, which unfortunately checks U[0,1] under the iid assumption rather than checking iid U[0,1] jointly. It will miss the alternatives for which $Z_t(\theta)$ is uniform but not iid. Similarly, tests for iid alone will miss the alternatives for which $Z_t(\theta)$ is iid but not U[0,1]. Moreover, parameter estimation uncertainty is expected to affect the asymptotic distribution of the Kolmogorov–Smirnov test statistic.

Diebold et al. (1998) used the autocorrelations of $Z_t^m(\theta)$ to check the iid property and the histogram of $Z_t(\theta)$ to check the U[0,1] property. These graphical procedures are simple and informative. To compare and rank different models, however, it is desirable to use a single evaluation criterion. Hong (2003) recently developed an omnibus evaluation procedure for out-of-sample density forecasts that tests the joint hypothesis of iid U[0,1] and explicitly addresses the impact of parameter estimation uncertainty on the evaluation statistics. This is achieved using a generalized spectral approach. The generalized spectrum was first introduced by Hong (1999) as an analytic tool for nonlinear time series analysis. It is based on the characteristic function transformation embedded in a spectral framework. Unlike the power spectrum, the generalized spectrum can capture any kind of pairwise serial dependence across various lags, including those with zero autocorrelation [e.g., an ARCH process with iid N(0,1) innovations].
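The two graphical diagnostics described above reduce to simple summary computations. The sketch below is only an illustration under stated assumptions: the residual series is a simulated U[0,1] stand-in rather than output from any fitted model, and `autocorr` and `histogram` are hypothetical helper names. It computes sample autocorrelations of the powers $Z_t^m$ and equal-width bin counts on [0,1]; it does not implement Hong's (2003) statistic.

```python
import random

def autocorr(x, lag):
    """Sample autocorrelation of the series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom

def histogram(z, bins=10):
    """Counts of z in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for v in z:
        counts[min(int(v * bins), bins - 1)] += 1
    return counts

# Stand-in for a generalized residual series (assumption: simulated uniforms).
rng = random.Random(1)
z = [rng.random() for _ in range(4000)]

for m in (1, 2, 3, 4):
    powered = [v ** m for v in z]
    print(m, [round(autocorr(powered, j), 3) for j in (1, 2, 3)])
print(histogram(z))
```

For a correctly specified model the autocorrelations should be close to 0 at all lags and powers and the bin counts should be roughly equal; as the text notes, these checks do not by themselves account for parameter estimation uncertainty.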

Specifically, Hong (2003) introduced a modified generalized spectral density function that incorporates the information of iid U[0,1] under the null hypothesis. Consequently, the modified generalized spectrum becomes a known "flat" function under the null hypothesis of iid U[0,1]. Whenever $Z_t(\theta)$ depends on $Z_{t-j}(\theta)$ for some $j > 0$ or $Z_t(\theta)$ is not U[0,1], the modified generalized spectrum will be nonflat as a function of frequency $\omega$. Hong (2003) proposed an evaluation statistic (denoted by M1) for out-of-sample density forecasts by comparing a kernel estimator of the modified generalized spectrum with the flat spectrum.

The most attractive feature of M1 is its omnibus property: it has power against a wide range of suboptimal density forecasts, thanks to the use of the characteristic function in a spectral framework. The M1 statistic can be viewed as an omnibus metric measuring the departure of the density forecast model from the true data-generating process. A better density forecast model is expected to have a smaller value of M1, because its generalized residual series $\{Z_t(\theta)\}$ is closer to having the iid U[0,1] property. Thus M1 can be used to rank competing density forecast models in terms of their deviations from optimality. (See the Appendix for more discussion.)

When a model is rejected using M1, it is interesting to explore possible reasons for the rejection. The model generalized residuals $\{Z_t(\theta)\}$ contain much information on model misspecification. As noted earlier, Diebold et al. (1998) have illustrated how to use the histograms of $Z_t(\theta)$ and autocorrelograms of $Z_t^m(\theta)$ to identify sources of suboptimal density forecasts. Although intuitive and convenient, these graphical methods ignore the impact of parameter estimation uncertainty in $\hat\theta$ on the asymptotic distribution of evaluation statistics, which generally exists even when $n \to \infty$. Hong (2003) provided a class of rigorous, asymptotically N(0,1), separate inference statistics $M(m, l)$ that measure whether the cross-correlations between $Z_t^m(\theta)$ and $Z_{t-|j|}^l(\theta)$ are significantly different from 0. Although the moments of the generalized residuals $\{Z_t\}$ are not exactly the same as those of the original data $\{r_t\}$, they are highly correlated. In particular, the choice of $(m, l) = (1,1)$, $(2,2)$, $(3,3)$, and $(4,4)$ is very sensitive to autocorrelations in level, volatility, skewness, and kurtosis of $r_t$. Furthermore, the choices of $(m, l) = (1,2)$ and $(2,1)$ are sensitive to the "ARCH-in-mean" effect and the "leverage" effect. Different choices of order $(m, l)$ can thus illustrate various dynamic aspects of the underlying process $r_t$. (Again, see the Appendix for more discussion.)

3. SPOT RATE MODELS

We apply Hong's (2003) procedure to evaluate the density forecast performance of a variety of popular spot rate models, including single-factor diffusion, GARCH, regime-switching, and jump-diffusion models. We now discuss these models in detail.

3.1 Single-Factor Diffusion Models

One important class of spot rate models is the continuous-time diffusion models, which have been widely used in modern finance because of their convenience in pricing fixed-income securities. Specifically, the spot rate is assumed to follow a single-factor diffusion,

$$dr_t = \mu(r_t, \theta)\, dt + \sigma(r_t, \theta)\, dW_t,$$

where $\mu(r_t, \theta)$ and $\sigma(r_t, \theta)$ are the drift and diffusion functions and $W_t$ is a standard Brownian motion. For diffusion models, $\mu(r_t, \theta)$ and $\sigma(r_t, \theta)$ completely determine the model transition density, which in turn captures the full dynamics of $r_t$. Table 1 lists a variety of discretized diffusion models examined in our analysis, which are nested by Ait-Sahalia's (1996) nonlinear drift model,

$$\Delta r_t = \frac{\alpha_{-1}}{r_{t-1}} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} z_t, \qquad (1)$$

where $z_t \sim$ iid N(0,1). To be consistent with the other models considered in this article, we study the discretized version of the single-factor diffusion and jump-diffusion models. The discretization bias for the daily data that we use in this article is unlikely to be significant (e.g., Stanton 1997; Das 2002). In (1) we allow the drift to have zero, linear, and nonlinear specifications, and allow the diffusion to be a constant or to depend on the interest rate level, which we refer to as the "level effect." We term the volatility specification in which the elasticity parameter $\rho$ is estimated from the data the constant elasticity volatility (CEV). Density forecasts for the spot rate $r_t$ are given by the model-implied transition density $p(r_t, t \mid I_{t-1}, \hat\theta)$, where $\hat\theta$ is a parameter estimator obtained by the maximum likelihood estimation (MLE) method. Because we study discretized diffusion models, the MLE method is suitable. For the continuous-time diffusion models, we can use the closed-form likelihood function if available, or the approximated likelihood function via Hermite expansions (see Ait-Sahalia 1999, 2002).
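For the simplest member of this class, the MLE step can be written in closed form. The sketch below is an illustration only and is not the article's estimation code: for the Gaussian discretized Vasicek model, maximizing the conditional likelihood is equivalent to a least squares regression of $\Delta r_t$ on a constant and $r_{t-1}$, with $\hat\sigma^2$ equal to the mean squared residual. The simulated data, parameter values, and function names are hypothetical.

```python
import math
import random

def simulate_vasicek(n, alpha0, alpha1, sigma, r0, seed=0):
    """Simulate r_t from Delta r_t = alpha0 + alpha1 * r_{t-1} + sigma * z_t."""
    rng = random.Random(seed)
    path = [r0]
    for _ in range(n):
        prev = path[-1]
        path.append(prev + alpha0 + alpha1 * prev + sigma * rng.gauss(0.0, 1.0))
    return path

def vasicek_mle(path):
    """Closed-form conditional Gaussian MLE of (alpha0, alpha1, sigma)."""
    x = path[:-1]                                       # r_{t-1}
    y = [b - a for a, b in zip(path[:-1], path[1:])]    # Delta r_t
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    alpha1 = sxy / sxx
    alpha0 = y_bar - alpha1 * x_bar
    resid = [yi - alpha0 - alpha1 * xi for xi, yi in zip(x, y)]
    sigma = math.sqrt(sum(e * e for e in resid) / n)    # MLE divides by n
    return alpha0, alpha1, sigma

# Illustrative (made-up) parameter values, not estimates from the article.
path = simulate_vasicek(5000, alpha0=0.003, alpha1=-0.05, sigma=0.01, r0=0.06)
a0_hat, a1_hat, sigma_hat = vasicek_mle(path)
print(a0_hat, a1_hat, sigma_hat)
```

Models with level-dependent volatility or GARCH effects have no such closed form and require numerical optimization of the log-likelihood.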

3.2 GARCH Models

Despite the popularity of single-factor diffusion models, many studies (e.g., Brenner, Harjes, and Kroner 1996; Andersen and Lund 1997) have shown that these models are unreasonably restrictive by requiring that the interest rate volatility depend solely on the interest rate level. To capture the well-known persistent volatility clustering in interest rates, Brenner et al. (1996) introduced various GARCH specifications for volatility and showed that GARCH models significantly outperform single-factor diffusion models for in-sample fit.

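To understand the importance of GARCH for density forecasts, we consider six GARCH models, as listed in Table 1, which are nested by the following specification:

$$\begin{aligned}
\Delta r_t &= \frac{\alpha_{-1}}{r_{t-1}} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} \sqrt{h_t}\, z_t, \\
h_t &= \beta_0 + h_{t-1}\left( \beta_2 + \beta_1 r_{t-1}^{2\rho} z_{t-1}^2 \right), \\
z_t &\sim \text{iid N}(0,1).
\end{aligned} \qquad (2)$$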

We consider three different drift specifications (zero, linear, and nonlinear drift) and two volatility specifications (pure GARCH and combined CEV–GARCH). These various GARCH models allow us to examine the importance of linear versus nonlinear drift in the presence of GARCH and CEV and the incremental contribution of GARCH with respect to CEV. For identification, we set $\sigma = 1$ in all GARCH models.
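To make the recursion in (2) concrete, the sketch below simulates the linear drift CEV–GARCH case with $\sigma = 1$. It is a minimal illustration under stated assumptions: all parameter values are made up, the function name is hypothetical, and the small positive floor on $r_t$ is an added safeguard (not part of the article's model) so that $r_{t-1}^{\rho}$ stays well defined in simulation.

```python
import math
import random

def simulate_cev_garch(n, r0, h0, alpha0, alpha1, rho, beta0, beta1, beta2, seed=0):
    """Simulate the linear-drift CEV-GARCH case of (2) with sigma = 1."""
    rng = random.Random(seed)
    floor = 1e-6  # assumption: keeps r positive so that r ** rho is defined
    r_prev, h_prev, z_prev = r0, h0, 0.0
    rates, variances = [], []
    for _ in range(n):
        # h_t = beta0 + h_{t-1} * (beta2 + beta1 * r_{t-1}^(2 rho) * z_{t-1}^2)
        h = beta0 + h_prev * (beta2 + beta1 * r_prev ** (2.0 * rho) * z_prev ** 2)
        z = rng.gauss(0.0, 1.0)
        drift = alpha0 + alpha1 * r_prev
        # Delta r_t = drift + r_{t-1}^rho * sqrt(h_t) * z_t
        r = max(r_prev + drift + r_prev ** rho * math.sqrt(h) * z, floor)
        rates.append(r)
        variances.append(h)
        r_prev, h_prev, z_prev = r, h, z
    return rates, variances

# Illustrative (made-up) parameter values, not estimates from the article.
rates, variances = simulate_cev_garch(
    2000, r0=0.06, h0=1.6e-5, alpha0=0.0003, alpha1=-0.005,
    rho=0.5, beta0=1.5e-6, beta1=1.0, beta2=0.85,
)
print(min(rates), max(rates), max(variances) / min(variances))
```

Setting `rho=0.0` gives the pure GARCH case, and setting `alpha0=alpha1=0.0` gives the no drift case.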

Another popular approach to capturing volatility clustering is the continuous-time stochastic volatility model considered by Andersen and Lund (1997), Gallant and Tauchen (1998), and many others. These studies show that adding a latent stochastic volatility factor to a diffusion model significantly improves the goodness of fit for interest rates. We leave continuous-time stochastic volatility models for future research, because their estimation is much more involved. On the other hand, we also need to be careful about drawing any implications from GARCH models for continuous-time stochastic volatility models. Although Nelson (1990) showed that GARCH models converge to stochastic volatility models in the limit, Corradi (2000) later showed that Nelson's results hold only under his particular discretization scheme. Other equally reasonable discretizations will lead to very different continuous-time limits for GARCH models.

3.3 Markov Regime-Switching Models

Many studies have shown that the behavior of spot rates changes over time due to changes in monetary policy, the business cycle, and general macroeconomic conditions. The Markov regime-switching models of Hamilton (1989) have been widely used to model the time-varying behavior of interest rates (e.g., Gray 1996; Ball and Torous 1998; Ang and Bekaert 2002; Li and Xu 2000, among others). Following most existing studies, we consider a class of two-regime models for the spot rate, where the latent state variable $s_t$ follows a two-state, first-order Markov chain. We refer to the regime in which $s_t = 1$ (2) as the first (second) regime. Following Ang and Bekaert (2002),


Table 1. Spot Rate Models Considered for Density Forecast Evaluation

Model | Mean | Volatility
----- | ---- | ----------
Discretized single-factor diffusion models | |
Random walk | $\alpha_0$ | $\sigma$
Log-normal | $\alpha_1 r_{t-1}$ | $\sigma r_{t-1}$
Dothan | $0$ | $\sigma r_{t-1}$
Pure CEV | $0$ | $\sigma r_{t-1}^{\rho}$
Vasicek | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma$
CIR | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{.5}$
CKLS | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{\rho}$
Nonlinear drift | $\alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma r_{t-1}^{\rho}$
GARCH models | |
No drift GARCH | $0$ | $\sigma \sqrt{h_t}$
Linear drift GARCH | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma \sqrt{h_t}$
Nonlinear drift GARCH | $\alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma \sqrt{h_t}$
No drift CEV–GARCH | $0$ | $\sigma r_{t-1}^{\rho} \sqrt{h_t}$
Linear drift CEV–GARCH | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{\rho} \sqrt{h_t}$
Nonlinear drift CEV–GARCH | $\alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma r_{t-1}^{\rho} \sqrt{h_t}$
Markov regime-switching models | |
No drift RS CEV | $0$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)}$
Linear drift RS CEV | $\alpha_0(s_t) + \alpha_1(s_t) r_{t-1}$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)}$
Nonlinear drift RS CEV | $\alpha_{-1}(s_t)/r_{t-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)}$
No drift RS GARCH | $0$ | $\sigma(s_t) \sqrt{h_t}$
Linear drift RS GARCH | $\alpha_0(s_t) + \alpha_1(s_t) r_{t-1}$ | $\sigma(s_t) \sqrt{h_t}$
Nonlinear drift RS GARCH | $\alpha_{-1}(s_t)/r_{t-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2$ | $\sigma(s_t) \sqrt{h_t}$
No drift RS CEV–GARCH | $0$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)} \sqrt{h_t}$
Linear drift RS CEV–GARCH | $\alpha_0(s_t) + \alpha_1(s_t) r_{t-1}$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)} \sqrt{h_t}$
Nonlinear drift RS CEV–GARCH | $\alpha_{-1}(s_t)/r_{t-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)} \sqrt{h_t}$
Discretized jump-diffusion models | |
No drift JD CEV | $0$ | $\sigma r_{t-1}^{\rho}$
Linear drift JD CEV | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{\rho}$
Nonlinear drift JD CEV | $\alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma r_{t-1}^{\rho}$
No drift JD GARCH | $0$ | $\sigma \sqrt{h_t}$
Linear drift JD GARCH | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma \sqrt{h_t}$
Nonlinear drift JD GARCH | $\alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma \sqrt{h_t}$
No drift JD CEV–GARCH | $0$ | $\sigma r_{t-1}^{\rho} \sqrt{h_t}$
Linear drift JD CEV–GARCH | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{\rho} \sqrt{h_t}$
Nonlinear drift JD CEV–GARCH | $\alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma r_{t-1}^{\rho} \sqrt{h_t}$

NOTE: For consistency, all models are estimated in a discrete-time setting using the maximum likelihood estimation (MLE) method. The eight discretized single-factor diffusion models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} z_t$, where $z_t \sim$ iid N(0,1). The six GARCH models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} \sqrt{h_t}\, z_t$, where $h_t = \beta_0 + h_{t-1}(\beta_2 + \beta_1 r_{t-1}^{2\rho} z_{t-1}^2)$ and $z_t \sim$ iid N(0,1). The nine regime-switching models are nested by the following specification: $\Delta r_t = \alpha_{-1}(s_t)/r_{t-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2 + \sigma(s_t) r_{t-1}^{\rho(s_t)} \sqrt{h_t}\, z_t$, where $h_t = \beta_0 + \beta_1 E\{e_t \mid r_{t-2}, s_{t-2}\}^2 + \beta_2 h_{t-1}$, $e_t = [r_{t-1} - E(r_{t-1} \mid r_{t-2}, s_{t-1})]/\sigma(s_{t-1})$, $z_t \sim$ iid N(0,1), and $s_t$ follows a two-state Markov chain with the transition probability $\Pr(s_t = l \mid s_{t-1} = l) = (1 + \exp(-c_l - d_l r_{t-1}))^{-1}$ for $l = 1, 2$. The nine discretized jump-diffusion models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} \sqrt{h_t}\, z_t + J(\mu, \gamma^2)\pi_t(q_t)$, where $h_t = \beta_0 + \beta_1 [r_{t-1} - E(r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}$, $z_t \sim$ iid N(0,1), $J \sim N(\mu, \gamma^2)$, and $\pi_t(q_t)$ is Bernoulli($q_t$) with $q_t = (1 + \exp(-c - d r_{t-1}))^{-1}$.

we assume that the transition probability of $s_t$ depends on the once-lagged spot rate level,

$$\Pr(s_t = l \mid s_{t-1} = l) = \frac{1}{1 + \exp(-c_l - d_l r_{t-1})}, \qquad l = 1, 2.$$

Table 1 lists a variety of regime-switching models, all of which are nested by the following specification:

$$\begin{aligned}
\Delta r_t &= \frac{\alpha_{-1}(s_t)}{r_{t-1}} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2 + \sigma(s_t) r_{t-1}^{\rho(s_t)} \sqrt{h_t}\, z_t, \\
h_t &= \beta_0 + \beta_1 E\{e_t \mid r_{t-2}, s_{t-2}\}^2 + \beta_2 h_{t-1}, \\
e_t &= [r_{t-1} - E(r_{t-1} \mid r_{t-2}, s_{t-1})]/\sigma(s_{t-1}), \\
z_t &\sim \text{iid N}(0,1).
\end{aligned} \qquad (3)$$

As in the aforementioned models, we consider three specifications of the conditional mean: zero, linear, and nonlinear drift. We also consider three specifications of the conditional variance: CEV, GARCH, and combined CEV–GARCH. Thus we have a total of nine regime-switching models.

Although Gray (1996) removed the path-dependent nature of GARCH models by averaging the conditional and unconditional variances over regimes at every time point, we use the same GARCH specification across different regimes. Confirming results of Ang and Bekaert (2002), we find a very unstable estimation of Gray's (1996) model. In contrast, our specification turns out to have much better convergence properties. Unlike many previous studies that set the elasticity parameter equal to .5, we allow it to be regime-dependent and estimate it from the data. For identification, we set the diffusion constant $\sigma(s_t) = 1$ for $s_t = 1$ in the regime-switching models with GARCH effect.

The conditional density of the interest rate $r_t$ in a regime-switching model is

$$p(r_t \mid I_{t-1}) = \sum_{l=1}^{2} p(r_t \mid s_t = l, I_{t-1}) \Pr(s_t = l \mid I_{t-1}),$$

where the ex ante probability that the data are generated from regime $l$ at $t$, $\Pr(s_t = l \mid I_{t-1})$, can be obtained using Bayes's rule via a recursive procedure described by Hamilton (1989). The conditional density of regime-switching models is a mixture of two normal distributions, which can generate unimodal or bimodal distributions and allows for great flexibility in modeling skewness, kurtosis, and heavy tails.
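The recursion for the ex ante probabilities can be sketched for a simplified two-regime case. The code below is an illustration under stated assumptions, not the article's implementation: each regime has a linear drift and a constant volatility (no CEV or GARCH term), the parameter values and the short rate series are made up, and `filter_step` is a hypothetical helper. Each step forms $\Pr(s_t = l \mid I_{t-1})$ from the previous filtered probabilities and the logistic transition probabilities, evaluates the mixture density, and then updates the filtered probabilities by Bayes's rule.

```python
import math

def normal_pdf(x, mean, sd):
    """Normal density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def stay_prob(c, d, r_prev):
    """Pr(s_t = l | s_{t-1} = l) = 1 / (1 + exp(-c_l - d_l * r_{t-1}))."""
    return 1.0 / (1.0 + math.exp(-c - d * r_prev))

def filter_step(filtered, r_prev, r_curr, regimes):
    """One filtering step; filtered[l] is Pr(s_{t-1} = l | I_{t-1})."""
    p11 = stay_prob(regimes[0]["c"], regimes[0]["d"], r_prev)
    p22 = stay_prob(regimes[1]["c"], regimes[1]["d"], r_prev)
    # Ex ante probabilities Pr(s_t = l | I_{t-1}).
    ex_ante = [
        filtered[0] * p11 + filtered[1] * (1.0 - p22),
        filtered[0] * (1.0 - p11) + filtered[1] * p22,
    ]
    # Regime-specific conditional densities of r_t (linear drift, constant sd).
    dens = [
        normal_pdf(r_curr, r_prev + g["a0"] + g["a1"] * r_prev, g["sigma"])
        for g in regimes
    ]
    mix = ex_ante[0] * dens[0] + ex_ante[1] * dens[1]
    updated = [ex_ante[l] * dens[l] / mix for l in range(2)]
    return mix, updated

# Made-up parameters: regime 1 is calm, regime 2 is volatile.
regimes = [
    {"a0": 0.0003, "a1": -0.005, "sigma": 0.0005, "c": 3.0, "d": 0.0},
    {"a0": 0.0010, "a1": -0.010, "sigma": 0.0030, "c": 2.0, "d": 0.0},
]
rates = [0.0600, 0.0602, 0.0599, 0.0650, 0.0610, 0.0612]  # hypothetical data
filtered, loglik, history = [0.5, 0.5], 0.0, []
for r_prev, r_curr in zip(rates[:-1], rates[1:]):
    density, filtered = filter_step(filtered, r_prev, r_curr, regimes)
    loglik += math.log(density)
    history.append(filtered)
print(loglik, history)
```

In this toy series the large move from 0.0599 to 0.0650 is far more likely under the volatile regime, so the filtered probability of the second regime rises sharply at that date.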

3.4 Jump-Diffusion Models

There are compelling economic and statistical reasons to account for discontinuity in interest rate dynamics. Various economic shocks, news announcements, and government interventions in bond markets have pronounced effects on the behavior of interest rates and tend to generate large jumps in interest rate data. Statistically, Das (2002) and Johannes (2003), among others, have shown that diffusion models (even with stochastic volatility) cannot generate the excessive leptokurtosis exhibited by the changes of the spot rates, and that jump-diffusion models are a convenient way to generate excessive kurtosis or, more generally, heavy tails.

We consider a class of discretized jump-diffusion models, listed in Table 1. As in the aforementioned models, we consider zero, linear, and nonlinear drift specifications, and CEV, GARCH, and combined CEV–GARCH specifications for volatility. The nine different jump-diffusion models are nested by the following specification:

$$\begin{aligned}
\Delta r_t &= \frac{\alpha_{-1}}{r_{t-1}} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} \sqrt{h_t}\, z_t + J(\mu, \gamma^2)\pi_t(q_t), \\
h_t &= \beta_0 + \beta_1 [r_{t-1} - E(r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}, \\
z_t &\sim \text{iid N}(0,1), \\
\pi_t(q_t) &\sim \text{independent Bernoulli}(q_t), \\
J &\sim N(\mu, \gamma^2),
\end{aligned} \qquad (4)$$

where $J$ is the jump size and $q_t$ is the jump probability, with $q_t = 1/[1 + \exp(-c - d r_{t-1})]$.

The conditional density of the foregoing jump-diffusion models can be written as

$$f(r_t \mid r_{t-1}) = (1 - q_t) \frac{1}{\sqrt{2\pi\sigma^2(r_{t-1})}} \exp\left\{ -\frac{[r_t - \mu(r_{t-1})]^2}{2\sigma^2(r_{t-1})} \right\} + q_t \frac{1}{\sqrt{2\pi[\sigma^2(r_{t-1}) + \gamma^2]}} \exp\left\{ -\frac{[r_t - \mu(r_{t-1}) - \mu]^2}{2[\sigma^2(r_{t-1}) + \gamma^2]} \right\},$$

where $\mu(r_{t-1})$ and $\sigma^2(r_{t-1})$ are the conditional mean and variance of the diffusion part in (4). Similar to the regime-switching models, the conditional density of the jump-diffusion models is also a mixture of two normal distributions. Thus these two classes of models represent two alternative approaches to generating excess kurtosis and heavy tails, although the regime-switching models have more sophisticated specifications. For example, in (3) all drift parameters are regime-dependent, whereas in (4) only the intercept term is different in the conditional mean and variance. For the regime-switching models in (3), the state probabilities evolve according to a transition matrix with updated priors at every time point. Thus it can capture both very persistent and transient regime shifts. For the jump-diffusion models in (4), the state probabilities are assumed to depend on the past interest rate level.
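The mixture form of the jump-diffusion density, and the excess kurtosis it produces, can be checked numerically. The sketch below is only an illustration: the parameter values are made up and chosen for readability rather than realism, `jump_diffusion_pdf` and `moments_on_grid` are hypothetical helpers, and the moments are approximated by a simple Riemann sum on a wide grid.

```python
import math

def normal_pdf(x, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def jump_diffusion_pdf(r, mean, var, q, mu_j, gamma2):
    """(1 - q) * N(mean, var) + q * N(mean + mu_j, var + gamma2), evaluated at r."""
    return ((1.0 - q) * normal_pdf(r, mean, var)
            + q * normal_pdf(r, mean + mu_j, var + gamma2))

def moments_on_grid(pdf, lo, hi, step):
    """Riemann-sum approximations of the total mass and the kurtosis."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    weights = [pdf(x) * step for x in grid]
    mass = sum(weights)
    mean = sum(x * w for x, w in zip(grid, weights)) / mass
    m2 = sum((x - mean) ** 2 * w for x, w in zip(grid, weights)) / mass
    m4 = sum((x - mean) ** 4 * w for x, w in zip(grid, weights)) / mass
    return mass, m4 / m2 ** 2

def pdf(x):
    # Made-up values: unit diffusion variance, 10% jump probability.
    return jump_diffusion_pdf(x, mean=0.0, var=1.0, q=0.1, mu_j=0.5, gamma2=9.0)

mass, kurt = moments_on_grid(pdf, -40.0, 40.0, 0.01)
print(mass, kurt)  # mass should be close to 1; kurtosis exceeds the Gaussian value of 3
```

Setting `q=0.0` recovers the pure diffusion density, for which the same calculation gives a kurtosis of about 3.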

The in-sample performances of the four classes of models that we study have been individually studied in the literature. However, no one has yet attempted a systematic evaluation of all of these models in a unified setup, particularly in the out-of-sample context, because of the difficulty in density forecast evaluation and because of different nonnested specifications between existing classes of models. In the next section we compare the relative performance of these four classes of models in out-of-sample density forecast, using the evaluation method described in Section 2. This comparison can provide valuable information about the relative strengths and weaknesses of each model and should be important for many applications.

4. MODEL ESTIMATION AND IN-SAMPLE PERFORMANCE

4.1 Data and Estimation Method

In modeling spot interest rate dynamics, yields on short-term debts are often used as proxies for the unobservable instantaneous risk-free rate. These include the 1-month T-bill rates used by Gray (1996) and Chan, Karolyi, Longstaff, and Sanders (CKLS) (1992), the 3-month T-bill rates used by Stanton (1997) and Andersen and Lund (1997), the 7-day Eurodollar rates used by Ait-Sahalia (1996) and Hong and Li (2004), and the Fed funds rates used by Conley, Hansen, Luttmer, and Scheinkman (1997) and Das (2002). In our study we follow CKLS (1992) and use the daily 1-month T-bill rates from June 14, 1961 to December 29, 2000, with a total of 9,868 observations. The data are extracted from the CRSP bond file using the midpoint of quoted bid and ask prices of the T-bills with a remaining maturity that is closest to 1 month (30 calendar days) from the current date. Most of these T-bills have maturities of 6 months or 1 year when first issued, and all T-bills used in our study have a remaining maturity between 27 and 33 days. We then compute the annualized continuously compounded yield based on the average of the ask and bid prices.

Figure 1 plots the level and change series of the daily 1-month T-bill rates, as well as their histograms. There is obvious persistent volatility clustering, and in general the volatility is higher at a higher level of the interest rate (e.g., the 1979–1982 period). The marginal distribution of the interest rate level is skewed to the right, with a long right tail. Most daily changes in the interest rate level are very small, giving a sharp probability peak around 0. Daily changes of other spot interest rates, such as 3-month T-bill and 7-day Eurodollar rates, also exhibit a high peak around 0. But the peak is much less pronounced for weekly changes in spot rates. This, together with the long (right) tail, implies excess kurtosis, a stylized fact that has motivated the use of jump models in the literature (e.g., Das 2002; Johannes 2003).

Figure 1. Daily 1-Month T-Bill Rates Between June 14, 1961 and December 29, 2000. This figure plots the level and change series of the daily 1-month T-bill rates, as well as their histograms. The data are extracted from the CRSP bond file using the midpoint of quoted bid and ask prices of the T-bills with a remaining maturity that is closest to 1 month (30 calendar days) from the current date.

We divide our data into two subsamples. The first, from June 14, 1961 to February 22, 1991 (with a total of 7,400 observations), is a sample used to estimate model parameters; the second, from February 23, 1991 to December 29, 2000 (with a total of 2,467 observations), is a prediction sample used to evaluate out-of-sample density forecasts. We first consider the in-sample performance of the various models listed in Table 1, all of which are estimated using the MLE method. The optimization algorithm is the well-known BHHH, with STEPBT for step length calculation, and is implemented via the constrained optimization code in GAUSS Windows Version 5.0. The optimization tolerance level is set such that the gradients of the parameters are less than or equal to $10^{-6}$.

Tables 2–5 report parameter estimates of four classes of spot rate models using daily 1-month T-bill rates from June 14, 1961 to February 22, 1991, with a total of 7,400 observations. To account for the special period between October 1979 and September 1982, we introduce dummy variables to the drift, volatility, and elasticity parameters; that is, $\alpha_D$, $\sigma_D$, and $\rho_D$ equal 0 outside of the 1979–1982 period. We also introduce a dummy variable, $D_{87}$, in the interest rate level to account for the effect of the 1987 stock market crash; $D_{87}$ equals 0 except for the week right after the stock market crash. Estimated robust standard errors are reported in parentheses.

4.2 In-Sample Evidence

We first discuss the in-sample performance of models within each class and then compare their performance across different classes. Many previous studies (e.g., Bliss and Smith 1998) have shown that the period between October 1979 and September 1982, which coincides with the Federal Reserve experiment, has a big impact on model parameter estimates. To account for the special feature of this period, we introduce dummy variables in the drift, volatility, and elasticity parameters for this period. We also introduce a dummy variable in the interest rate level for the week right after the 1987 stock market crash. Brenner et al. (1996) showed that the decline of the interest rates during this week is too dramatic to be captured by most standard models.

Table 2 reports parameter estimates, with estimated robust standard errors and log-likelihood values, for discretized single-factor diffusion models. The estimates of the drift parameters of the Vasicek, CIR, and CKLS models all suggest mean reversion in the conditional mean, with an estimated 6% long-run mean. For other models, such as the random walk, log-normal, and nonlinear drift models, most drift parameters are not significant. A comparison of the pure CEV, CKLS, and Ait-Sahalia (1996) nonlinear drift models indicates that the incremental contribution of nonlinear drift is very marginal, which is not surprising given that the drift parameter estimates in the nonlinear drift model are mostly insignificant. On the other hand, there is also clear evidence of a level effect: all estimates of the elasticity parameter are significant. Unlike previous studies (e.g., CKLS 1992), which found an estimated elasticity parameter close to 1.5, our estimate is about .25 (.70) outside (inside) the 1979–1982 period. Our results confirm previous findings of Brenner et al. (1996), Andersen and Lund (1997), Bliss and Smith (1998), and Koedijk, Nissen, Schotman, and Wolff (1997) that the estimate of the elasticity parameter is very sensitive to the choice of interest rate data, data frequency, sample periods, and specifications of the volatility function. The 1979–1982 period dummies suggest that the drift does not behave very differently (i.e., αD is not significant), whereas volatility is significantly higher and depends more heavily on

464 Journal of Business & Economic Statistics, October 2004

Table 2. Parameter Estimates for the Single-Factor Diffusion Models

Parameters | RW | Log-normal | Dothan | Pure CEV | Vasicek | CIR | CKLS | Nonlinear drift
α−1 | | | | | | | | .0289 (.0751)
α0 | .0012 (.0019) | | | | .0196 (.0056) | .0190 (.0049) | .0199 (.0053) | −.0098 (.0485)
α1 | | .0006 (.0004) | | | −.0033 (.0010) | −.0032 (.0009) | −.0034 (.0009) | .0045 (.0095)
α2 | | | | | | | | −.0006 (.0006)
σ | .1523 (.0013) | .0304 (.0003) | .0304 (.0003) | .1011 (.0041) | .1522 (.0013) | .0658 (.0006) | .1007 (.0041) | .1007 (.0041)
ρ | | | | .2437 (.0238) | | | .2460 (.0238) | .2460 (.0238)
αD | −.0047 (.0157) | .0196 (.0186) | | | .0154 (.0168) | .0233 (.0176) | .0266 (.0168) | .0402 (.0202)
σD | .2768 (.0112) | .0175 (.0013) | .0175 (.0013) | | .2764 (.0112) | .0712 (.0036) | |
ρD | | | | .3692 (.0132) | | | .3684 (.0132) | .3680 (.0132)
D87 | −.3988 (.0681) | −.2107 (.0660) | −.2079 (.0660) | −.3549 (.0671) | −.4003 (.0681) | −.3101 (.0655) | −.3574 (.0670) | −.3582 (.0671)
Log-likelihood | 2,649.6 | 2,187.0 | 2,184.9 | 2,654.5 | 2,655.5 | 2,672.0 | 2,661.7 | 2,662.5

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (\sigma + \sigma_D)\, r_{t-1}^{(\rho + \rho_D)} z_t + D_{87}$, where $z_t \sim \text{iid } N(0, 1)$.

the interest rate level in this period (i.e., σD and ρD are significantly positive) than in other periods. For models with level effect, we introduce the dummy only for the elasticity parameter ρ, to avoid the offsetting effect of dummies in ρ and σ, which would tend to generate unstable estimates. The stock market crash dummy is significantly negative, which is consistent with the findings of Brenner et al. (1996).
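As an illustration of the nested specification in the note to Table 2, the following minimal sketch computes the conditional drift and volatility of the rate change, a Gaussian log-likelihood, and the long-run mean implied by a linear drift. It assumes the left-hand side of the specification is the rate change and that the table entries carry four decimals; the function names are hypothetical, and the period and crash dummies are omitted for brevity.

```python
import math


def conditional_moments(r_prev, a_m1=0.0, a0=0.0, a1=0.0, a2=0.0,
                        sigma=1.0, rho=0.0):
    """Conditional drift and volatility of the rate change given r_{t-1}."""
    drift = a_m1 / r_prev + a0 + a1 * r_prev + a2 * r_prev ** 2
    vol = sigma * r_prev ** rho
    return drift, vol


def gaussian_loglik(rates, **params):
    """Gaussian log-likelihood of the discretized model for a rate series."""
    total = 0.0
    for r_prev, r_now in zip(rates[:-1], rates[1:]):
        drift, vol = conditional_moments(r_prev, **params)
        z = (r_now - r_prev - drift) / vol
        total += -0.5 * math.log(2.0 * math.pi) - math.log(vol) - 0.5 * z * z
    return total


def long_run_mean(a0, a1):
    """Level at which a linear drift a0 + a1 * r vanishes (needs a1 < 0)."""
    return -a0 / a1


# Vasicek drift estimates as read from Table 2 (assumed four-decimal format).
vasicek_mean = long_run_mean(0.0196, -0.0033)
print(round(vasicek_mean, 2))
```

Under these assumptions the implied long-run mean is just under 6 (in percent), in line with the figure quoted in the text.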

Estimation results of GARCH models in Table 3 show that GARCH significantly improves the in-sample fit of diffusion models (i.e., the log-likelihood value increases from about 2,500 to about 5,700). Previous studies, such as those by Bali (2001) and Durham (2003), have also shown that for single-factor diffusion, GARCH, and continuous-time stochastic volatility models, it is more important to correctly model the diffusion function than the drift function in fitting interest rate data. This suggests that volatility clustering is an important feature of interest rates. All estimates of GARCH parameters are overwhelmingly significant. The sum of GARCH parameter estimates, β1 + β2, is slightly larger (smaller) than 1 without (with) level effect. Although the sum of GARCH parameters is

Table 3. Parameter Estimates for the GARCH Models

Parameters | No drift GARCH | Linear drift GARCH | Nonlinear drift GARCH | No drift CEV–GARCH | Linear drift CEV–GARCH | Nonlinear drift CEV–GARCH
α−1 | | | .1556 (.0470) | | | .1716 (.0457)
α0 | | .0053 (.0025) | −.1117 (.0323) | | .0063 (.0025) | −.1233 (.0311)
α1 | | −.0003 (.0006) | .0263 (.0069) | | −.0006 (.0006) | .0290 (.0066)
α2 | | | −.0018 (.0004) | | | −.0020 (.0004)
ρ | | | | .1709 (.0298) | .1797 (.0301) | .1926 (.0300)
β0 | 9.7E−05 (1.2E−05) | 8.6E−05 (1.2E−05) | 8.6E−05 (1.6E−05) | 8.9E−05 (1.0E−05) | 8.2E−05 (.9E−05) | 8.2E−05 (.9E−05)
β1 | .1552 (.0097) | .1544 (.0096) | .1563 (.0099) | .0896 (.0104) | .0875 (.0101) | .0851 (.0098)
β2 | .8707 (.0065) | .8723 (.0064) | .8712 (.0065) | .8583 (.0074) | .8582 (.0076) | .8553 (.0078)
αD | | .0166 (.0318) | .1119 (.0229) | | .0550 (.0282) | .1317 (.0201)
σD | .1368 (.0327) | .1302 (.0331) | .1017 (.0334) | | |
ρD | | | | .0151 (.0134) | .0064 (.0146) | −.0053 (.0142)
D87 | −.5397 (.0774) | −.5426 (.0777) | −.5460 (.0771) | −.5288 (.0753) | −.5318 (.0751) | −.5352 (.0741)
Log-likelihood | 5,686.8 | 5,699.5 | 5,706.3 | 5,701.1 | 5,715.2 | 5,725.5

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (1 + \sigma_D)\, r_{t-1}^{(\rho + \rho_D)} \sqrt{h_t}\, z_t + D_{87}$, where $h_t = \beta_0 + h_{t-1}\,(\beta_2 + \beta_1 r_{t-1}^{2\rho} z_{t-1}^2)$ and $z_t \sim \text{iid } N(0, 1)$.

Hong, Li, and Zhao: Out-of-Sample Analysis of Spot Rate Models 465

Table 4. Parameter Estimates for the Markov Regime-Switching Models

Parameters | No drift CEV | Linear drift CEV | Nonlinear drift CEV | No drift GARCH | Linear drift GARCH | Nonlinear drift GARCH | No drift CEV–GARCH | Linear drift CEV–GARCH | Nonlinear drift CEV–GARCH
α−1(1) | | | .0607 (.3569) | | | .3321 (.1701) | | | .2646 (.1605)
α0(1) | | .0973 (.0259) | .1517 (.2027) | | .0457 (.0107) | −.1462 (.1009) | | .0450 (.0105) | −.1086 (.0989)
α1(1) | | −.0102 (.0031) | −.0288 (.0321) | | −.0043 (.0019) | .0274 (.0181) | | −.0042 (.0019) | .0209 (.0185)
α2(1) | | | .0011 (.0014) | | | −.0015 (.0010) | | | −.0011 (.0011)
σ(1) | .3361 (.0302) | .3171 (.0287) | .3106 (.0281) | 1 | 1 | 1 | 1 | 1 | 1
ρ(1) | .1021 (.044) | .1277 (.0446) | .138 (.0448) | 0 | 0 | 0 | .1193 (.0452) | .1589 (.0460) | .1670 (.0487)
α−1(2) | | | −.0352 (.0329) | | | −.0267 (.0242) | | | −.0286 (.0183)
α0(2) | | −.0002 (.0020) | .0141 (.0229) | | −.0013 (.0017) | .0062 (.0153) | | −.0020 (.0018) | .0062 (.0104)
α1(2) | | −.0006 (.0004) | −.0012 (.0048) | | −.0008 (.0004) | −.0001 (.0031) | | −.0007 (.0004) | .0005 (.0020)
α2(2) | | | −.0001 (.0003) | | | −.0002 (.0002) | | | −.0002 (.0001)
σ(2) | .0143 (.0010) | .0141 (.0010) | .0138 (.0010) | .2566 (.006) | .2567 (.006) | .2574 (.006) | .2769 (.0275) | .3034 (.0303) | .3051 (.0311)
ρ(2) | .8122 (.0403) | .8179 (.0422) | .8257 (.0421) | 0 | 0 | 0 | .0844 (.0602) | .0727 (.0602) | .0800 (.0611)
β0 | | | | 3.83E−4 (7.11E−5) | 3.18E−4 (6.14E−5) | 2.96E−4 (5.79E−5) | 3.21E−4 (.67E−4) | 2.43E−4 (.50E−4) | 2.25E−4 (.47E−4)
β1 | | | | .0357 (.004) | .0336 (.0039) | .0335 (.0038) | .0265 (.0057) | .0259 (.0055) | .0252 (.0053)
β2 | | | | .9027 (.009) | .9066 (.0089) | .9077 (.0086) | .8972 (.0099) | .8993 (.0098) | .9001 (.0094)
c1 | .3132 (.2734) | .3001 (.273) | .3171 (.2683) | −1.2515 (.3211) | −1.2007 (.3065) | −1.1293 (.3136) | −1.1954 (.3819) | −1.1429 (.3705) | −1.0012 (.3674)
d1 | .1081 (.0391) | .107 (.039) | .1029 (.0381) | .1303 (.044) | .1295 (.0426) | .1192 (.0447) | .1190 (.0594) | .1193 (.0582) | .0959 (.0593)
c2 | 4.0963 (.2386) | 4.0342 (.2393) | 4.0089 (.2313) | 1.8864 (.2048) | 1.7868 (.1947) | 1.7633 (.1909) | 1.8227 (.2092) | 1.7299 (.1983) | 1.6947 (.2009)
d2 | −.2302 (.0354) | −.225 (.0354) | −.2225 (.0339) | −.0938 (.0276) | −.0806 (.0263) | −.0776 (.0257) | −.0906 (.0285) | −.0810 (.0276) | −.0757 (.0283)
Log-likelihood | 6,493.1 | 6,507.9 | 6,512.8 | 7,079.0 | 7,132.8 | 7,138.2 | 7,082.8 | 7,138.6 | 7,144.1

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}(s_t)/r_{t-1} + \alpha_0(s_t) + \alpha_1(s_t)\, r_{t-1} + \alpha_2(s_t)\, r_{t-1}^2 + \sigma(s_t)\, r_{t-1}^{\rho(s_t)} \sqrt{h_t}\, z_t$, where $h_t = \beta_0 + \beta_1 E\{e_t \mid r_{t-2}, s_{t-2}\}^2 + \beta_2 h_{t-1}$, $e_t = [r_{t-1} - E(r_{t-1} \mid r_{t-2}, s_{t-1})]/\sigma(s_{t-1})$, $z_t \sim \text{iid } N(0, 1)$, and $s_t$ follows a two-state Markov chain with transition probability $\Pr(s_t = l \mid s_{t-1} = l) = (1 + \exp(-c_l - d_l r_{t-1}))^{-1}$ for $l = 1, 2$.

slightly larger than 1, it is still possible that the spot rate model is strictly stationary (see Nelson 1991 for more discussion). We also find a significant level effect even in the presence of GARCH, but the estimated elasticity parameter is close to .20, which is much smaller than that of single-factor diffusion models. The specification of conditional variance also affects the estimation of drift parameters. Unlike those in diffusion models, most drift parameter estimates in linear drift GARCH models are insignificant. Among all GARCH models considered, the one with nonlinear drift and level effect has the best in-sample performance. In the 1979–1982 period, the interest rate volatility is significantly higher, but its dependence on the interest rate level is not significantly different from other periods. The 1987 stock market crash dummy is again significantly negative.
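The variance recursion in the note to Table 3 and the statement about the sum β1 + β2 can be sketched as follows. This is only an illustration: the function names are hypothetical, and the parameter values are the no-drift GARCH and no-drift CEV–GARCH estimates as read from Table 3 under an assumed four-decimal format.

```python
def next_variance(h_prev, r_prev, z_prev, beta0, beta1, beta2, rho=0.0):
    """One step of h_t = beta0 + h_{t-1} * (beta2 + beta1 * r^(2 rho) * z^2)."""
    return beta0 + h_prev * (beta2 + beta1 * r_prev ** (2.0 * rho) * z_prev ** 2)


def garch_sum(beta1, beta2):
    """Sum of the GARCH parameters discussed in the text."""
    return beta1 + beta2


# Assumed readings of Table 3 (four decimals).
no_level_effect = garch_sum(0.1552, 0.8707)    # No drift GARCH
with_level_effect = garch_sum(0.0896, 0.8583)  # No drift CEV-GARCH

print(no_level_effect > 1.0, with_level_effect < 1.0)
```

With these readings the sum is slightly above 1 without level effect and below 1 with it, which matches the description in the text.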

Parameter estimates of regime-switching models, given in Table 4, show that the spot rate behaves quite differently between two regimes. Although the sum of GARCH parameters is slightly larger than 1, it is still possible that the spot rate model is strictly stationary (see Nelson 1991 for more discussion). For models with a linear drift, in the first regime the spot rate has a high long-run mean (about 10%) and exhibits strong mean reversion (i.e., estimates of the mean-reversion speed parameter are significantly negative). The spot rate in the second regime behaves almost like a random walk, because most drift parameter estimates are close to 0 and insignificant. For models with a nonlinear drift, all drift parameters are insignificant. Volatility in the first regime is much higher, about three times that in the second regime. Our estimates show that level effect, although significant in both regimes, is much stronger in the second regime in the absence of GARCH. After including GARCH, the elasticity parameter estimate becomes insignificant in the second regime but remains the same in the first regime. Apparently, regime switching helps capture volatility clustering: the sum of GARCH parameters, β1 + β2, is smaller in regime-switching models than in pure GARCH models. The estimated transition probabilities of the Markov state variable st show that the low-volatility regime is much more persistent than the high-volatility regime. Our results suggest that the spot rate evolves like a random walk with low volatility most of the time, occasionally increasing and evolving with strong mean reversion and high volatility. Overall, the regime-switching model with a linear drift in each regime, CEV, and GARCH has the best in-sample performance.
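The logistic transition probability in the note to Table 4 can be evaluated directly. The sketch below is illustrative only: it uses the no-drift CEV estimates of (c1, d1) and (c2, d2) as read from Table 4 under an assumed four-decimal format, an arbitrary rate level of 5 (percent), and hypothetical function names. The expected-duration formula holds only if the rate, and hence the staying probability, were held fixed.

```python
import math


def stay_probability(c, d, r_prev):
    """Pr(s_t = l | s_{t-1} = l) = 1 / (1 + exp(-c_l - d_l * r_{t-1}))."""
    return 1.0 / (1.0 + math.exp(-c - d * r_prev))


def expected_duration(p_stay):
    """Expected number of periods spent in a regime for a constant p_stay."""
    return 1.0 / (1.0 - p_stay)


r = 5.0  # hypothetical rate level, in percent
p_high_vol = stay_probability(0.3132, 0.1081, r)   # regime 1
p_low_vol = stay_probability(4.0963, -0.2302, r)   # regime 2

print(p_low_vol > p_high_vol)
print(expected_duration(p_high_vol), expected_duration(p_low_vol))
```

At this rate level the low-volatility regime has the larger staying probability, consistent with the persistence ranking described in the text.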

Table 5. Parameter Estimates for the Jump-Diffusion Models

Parameters | No drift CEV | Linear drift CEV | Nonlinear drift CEV | No drift GARCH | Linear drift GARCH | Nonlinear drift GARCH | No drift CEV–GARCH | Linear drift CEV–GARCH | Nonlinear drift CEV–GARCH
α−1 | | | .0264 (.0293) | | | .0702 (.0384) | | | .0661 (.0380)
α0 | | −.0011 (.0169) | −.0340 (.0203) | | .0028 (.0019) | −.0506 (.0256) | | .0026 (.0020) | −.0486 (.0253)
α1 | | −.0007 (.0004) | .0101 (.0043) | | −.0013 (.0004) | .0110 (.0053) | | −.0012 (.0004) | .0107 (.0052)
α2 | | | −.0009 (.0003) | | | −.0008 (.0003) | | | −.0008 (.0003)
σ | .0127 (.0008) | .0126 (.0008) | .0124 (.0008) | | | | | |
ρ | .8936 (.0390) | .8975 (.0397) | .9045 (.0400) | | | | .0929 (.0518) | .0852 (.0510) | .0837 (.0512)
β0 | | | | 8.7E−05 (1.2E−05) | 8.8E−05 (1.2E−05) | 8.7E−05 (1.2E−05) | 7.6E−5 (1.2E−5) | 7.7E−5 (1.2E−5) | 7.7E−5 (1.2E−5)
β1 | | | | .0631 (.0076) | .0640 (.0073) | .0635 (.0073) | .0450 (.0101) | .0472 (.0102) | .0472 (.0102)
β2 | | | | .8514 (.0139) | .8484 (.0134) | .8499 (.0134) | .8504 (.0140) | .8471 (.0135) | .8483 (.0134)
c | −2.7404 (.1608) | −2.7057 (.1605) | −2.6972 (.1614) | −3.7908 (.1982) | −3.7470 (.1970) | −3.7481 (.1976) | −3.6908 (.2097) | −3.6498 (.2084) | −3.6518 (.2093)
d | .1451 (.0275) | .1423 (.0275) | .1397 (.0275) | .2554 (.0307) | .2516 (.0305) | .2509 (.0307) | .2293 (.0343) | .2274 (.0340) | .2272 (.0341)
µ | .0233 (.0133) | .0385 (.0142) | .0321 (.0128) | .0700 (.0152) | .0887 (.0158) | .0895 (.0158) | .0679 (.0162) | .0892 (.0164) | .0897 (.0163)
γ | .4316 (.0117) | .4279 (.0120) | .4304 (.0104) | .3928 (.0197) | .3903 (.0189) | .3929 (.0187) | .4109 (.0199) | .4046 (.0182) | .4061 (.0178)
αD | | −.0094 (.0121) | .0296 (.0112) | | −.0192 (.0117) | .0057 (.0154) | | −.0197 (.0112) | .0055 (.0150)
σD | | | | −.1027 (.1142) | −.1328 (.1054) | −.1551 (.1033) | | |
ρD | .1473 (.1079) | .1872 (.0703) | −.0697 (.0722) | | | | −.1220 (.0769) | −.1186 (.0573) | −.1277 (.0564)
cD | 4.1604 (.6523) | 4.3364 (.6535) | 4.0036 (.5477) | 4.0290 (.7495) | 4.2505 (.7546) | 4.1857 (.7421) | 3.8622 (.6911) | 4.0677 (.7129) | 4.0272 (.7019)
dD | −.2441 (.0808) | −.2721 (.0730) | −.1836 (.0534) | −.2748 (.0689) | −.2860 (.0682) | −.2791 (.0667) | −.2320 (.0685) | −.2487 (.0676) | −.2445 (.0662)
D87 | −.4283 (.0646) | −.4253 (.0641) | −.4266 (.0643) | −.4226 (.0826) | −.4198 (.0820) | −.4217 (.0815) | −.4233 (.0793) | −.4209 (.0793) | −.4225 (.0791)
Log-likelihood | 6,306.3 | 6,317.2 | 6,326.3 | 6,794.1 | 6,810.8 | 6,813.9 | 6,796.3 | 6,812.9 | 6,816.1

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (\sigma + \sigma_D)\, r_{t-1}^{(\rho + \rho_D)} \sqrt{h_t}\, z_t + J(\mu, \gamma^2)\, \pi_t(q_t + q_D) + D_{87}$, where $h_t = \beta_0 + \beta_1 [r_{t-1} - E(r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}$, $z_t \sim \text{iid } N(0, 1)$, $J \sim N(\mu, \gamma^2)$, and $\pi_t(q_t + q_D)$ is Bernoulli($q_t + q_D$) with $q_t + q_D = (1 + \exp(-c - c_D - (d + d_D) \cdot r_{t-1}))^{-1}$.

Table 5 reports parameter estimates for discretized jump-diffusion models. There is some weak evidence of mean reversion, especially when there is GARCH. The contribution of nonlinear drift is again very marginal; most estimated drift parameters are insignificant. For jump-diffusion models without GARCH, the elasticity parameter estimate is about .9, which is closer to the 1.5 found by CKLS. However, level effect is weakened (i.e., ρ becomes close to .1) after GARCH is introduced. It seems that CEV helps capture part of volatility clustering, but its importance diminishes in the presence of GARCH. We find that GARCH also significantly improves the performance of jump-diffusion models. Without GARCH, there is a high probability of small jumps, with an estimated mean jump size between 2 and 4. With GARCH, the jump probability becomes smaller, but the mean jump size increases to between about 7 and 9. This suggests that without GARCH, jumps can capture part of volatility clustering, but with GARCH, jumps mainly capture large interest rate movements. Of course, jumps also help explain volatility clustering: the sum of GARCH parameters is again much smaller than that in pure GARCH models. During the 1979–1982 period, other than the probability of jumps becoming substantially higher, other aspects of the jump-diffusion models (e.g., the drift, volatility, or elasticity parameter) do not behave very differently.
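The jump component in the note to Table 5 is a Bernoulli indicator times a normal jump size. A minimal sketch of that mechanism follows; the function names, the rate level of 5 (percent), and the random seed are hypothetical, the 1979–1982 dummies are omitted, and the parameter values are the no-drift estimates as read from Table 5 under an assumed four-decimal format.

```python
import math
import random


def jump_probability(c, d, r_prev):
    """q_t = 1 / (1 + exp(-c - d * r_{t-1})), outside the 1979-1982 period."""
    return 1.0 / (1.0 + math.exp(-c - d * r_prev))


def draw_jump(rng, q, mu, gamma):
    """Return a N(mu, gamma^2) jump with probability q, and 0 otherwise."""
    if rng.random() < q:
        return rng.gauss(mu, gamma)
    return 0.0


rng = random.Random(0)
r = 5.0  # hypothetical rate level, in percent

q_without_garch = jump_probability(-2.7404, 0.1451, r)  # No drift CEV
q_with_garch = jump_probability(-3.7908, 0.2554, r)     # No drift GARCH

jumps = [draw_jump(rng, q_without_garch, 0.0233, 0.4316) for _ in range(1000)]
share_of_jump_days = sum(1 for j in jumps if j != 0.0) / len(jumps)

print(q_with_garch < q_without_garch, share_of_jump_days)
```

At this rate level the jump probability is lower once GARCH is included, which agrees with the pattern described in the text.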

To sum up, our in-sample analysis reveals some important stylized facts for the spot rate:

1. The importance of modeling mean reversion in mean is ambiguous. It seems that linear drift is adequate for this purpose once GARCH, regime switching, or jumps are included. The contribution of Ait-Sahalia's (1996) type of nonlinear drift is very marginal.

2. It is important to model conditional heteroscedasticity through GARCH or level effect, although GARCH seems to have much better in-sample performance than CEV.

3. Regime switching and jumps help capture volatility clustering and especially the excess kurtosis and heavy tails of the interest rate.

4. Interest rates behave quite differently during the 1979–1982 period. The level and volatility of the interest rate, the dependence of the interest rate volatility on the interest rate level, and the probability of jumps are much higher during this period.

5. OUT–OF–SAMPLE DENSITY FORECAST PERFORMANCE

The foregoing in-sample analysis demonstrates that modeling volatility clustering through GARCH, and heavy tails and excess kurtosis through regime switching or jumps, significantly improves the in-sample fit of historical interest rate data. However, it is not clear whether these models will also perform well in out-of-sample density forecasts for future interest rates. Indeed, some previous studies have shown that more complicated models actually underperform the simple random walk model in predicting the conditional mean of future interest rates (e.g., Duffee 2002). In many financial applications, we are interested in forecasting the whole conditional distribution. We now apply the generalized spectral evaluation method described in Section 2 to evaluate density forecasts of the models under study. Among other things, we examine whether the features found to be important for in-sample performance remain important for out-of-sample forecasts, and whether the best in-sample performing models still perform best in out-of-sample forecasts.

For each model, we first calculate Zt(θ), the model generalized residuals, using the prediction sample, with model parameter estimates based on the estimation sample. Table 6 reports the out-of-sample evaluation statistic M1 for each model. To compute M1, we select a normal cdf, $\Phi(\sqrt{12}\,u)$, for the weighting function W(u) and a Bartlett kernel for the kernel function k(z). To choose a lag order p, we use a data-driven method that minimizes the asymptotic integrated mean squared error of the modified generalized spectral density. This method involves choosing a preliminary lag order p. We examine a wide range of p in our application. We obtain similar results for the preliminary lag order p between 10 and 30 and report results for p = 20 in Table 6. For comparison, we also report the in-sample log-likelihood values obtained from the estimation sample.
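For intuition about the generalized residuals, the sketch below computes probability integral transforms for a conditionally Gaussian one-step-ahead forecast and bins them into a histogram. It assumes, as the later comparison with the uniform distribution suggests, that the generalized residual is the model-implied conditional cdf evaluated at the realized rate; the function names and the short rate series are hypothetical, and the M1 statistic itself is not computed here.

```python
from statistics import NormalDist


def generalized_residuals(rates, drift_fn, vol_fn):
    """Probability integral transforms for a conditionally Gaussian model."""
    std_normal = NormalDist()
    residuals = []
    for r_prev, r_now in zip(rates[:-1], rates[1:]):
        mean = r_prev + drift_fn(r_prev)
        residuals.append(std_normal.cdf((r_now - mean) / vol_fn(r_prev)))
    return residuals


def histogram(values, bins=10):
    """Counts of values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    return counts


# Hypothetical rates (in percent) and a zero-drift, constant-volatility model.
rates = [5.00, 5.02, 4.97, 4.97, 5.10, 5.08]
z = generalized_residuals(rates, drift_fn=lambda r: 0.0, vol_fn=lambda r: 0.15)

print(z)
print(histogram(z))
```

If the forecast model were correctly specified, the transformed series would be iid uniform on [0, 1], so a histogram with a pronounced central peak points to a misspecified marginal density.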

The M1 statistics show that all single-factor diffusion models are overwhelmingly rejected. One of the most interesting findings is that models that include a drift term (either linear or nonlinear) have much worse out-of-sample performance than those without a drift. Dothan's (1978) model and the pure CEV model have the best density forecast performance. Both models have a zero drift and level effect; the elasticity parameter is ρ = 1 for the former and .25 (.60 for the 1979–1982 period) for the latter. On the other hand, although the CKLS model and Ait-Sahalia's (1996) nonlinear drift model have the highest in-sample likelihood values, they have the worst out-of-sample density forecasts in terms of M1. This seems to suggest that for the purpose of density forecast, it is much more important to model the diffusion function than the drift function. Of course, this does not necessarily mean that the drift is not important.

Table 6. Out-of-Sample Density Forecast Performance of Spot Interest Rate Models

Model | M1 | Log-likelihood

Single-factor diffusion models
Random walk | .108*** | 2,649.6
Log-normal | .121*** | 2,187.0
Dothan | .067** | 2,184.9
Pure CEV | .082** | 2,654.5
Vasicek | .213*** | 2,655.5
CIR | .219*** | 2,672.0
CKLS | .219*** | 2,661.7
Nonlinear drift | .224*** | 2,662.5

GARCH models
No drift GARCH | .079** | 5,686.8
Linear drift GARCH | .276*** | 5,699.5
Nonlinear drift GARCH | .339*** | 5,706.3
No drift CEV–GARCH | .077** | 5,701.1
Linear drift CEV–GARCH | .291*** | 5,715.2
Nonlinear drift CEV–GARCH | .358*** | 5,725.5

Regime-switching models
No drift RS CEV | .129*** | 6,493.1
Linear drift RS CEV | .091*** | 6,507.9
Nonlinear drift RS CEV | .162*** | 6,512.8
No drift RS GARCH | .118*** | 7,079.0
Linear drift RS GARCH | .047* | 7,132.8
Nonlinear drift RS GARCH | .055** | 7,138.2
No drift RS CEV–GARCH | .124*** | 7,082.8
Linear drift RS CEV–GARCH | .056** | 7,138.6
Nonlinear drift RS CEV–GARCH | .067** | 7,144.1

Jump-diffusion models
No drift JD CEV | .180*** | 6,306.3
Linear drift JD CEV | .039* | 6,317.2
Nonlinear drift JD CEV | .076** | 6,326.3
No drift JD GARCH | .178*** | 6,794.1
Linear drift JD GARCH | .056** | 6,810.8
Nonlinear drift JD GARCH | .067** | 6,813.9
No drift JD CEV–GARCH | .174*** | 6,796.3
Linear drift JD CEV–GARCH | .053** | 6,812.9
Nonlinear drift JD CEV–GARCH | .065** | 6,816.1

NOTE: This table reports the out-of-sample density forecast performance of the single-factor diffusion, GARCH, Markov regime-switching, and jump-diffusion models, measured by the M1 statistic. For convenience of comparison, we also report the in-sample log-likelihood value for each model. The in-sample parameter estimation is based on the observations from June 14, 1961 to February 22, 1991, and the out-of-sample density forecast evaluation is based on the observations from February 23, 1991 to December 29, 2000. The ratio between the size of the estimation sample and the forecast sample is about 3:1. In computing the out-of-sample M1 statistics, a preliminary lag order p of 20 is used. Other lag orders give similar results. The asymptotic critical values for the out-of-sample evaluation statistic are .087, .051, and .037 at the 1%, 5%, and 10% levels, and ***, **, and * indicate that the M1 statistic is significant at the 1%, 5%, and 10% levels.
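The significance marks in Table 6 follow mechanically from the critical values in the note. The sketch below reproduces that rule; the function name is hypothetical, and the decimal placement of the critical values and of the M1 statistics (three decimals, no leading zero) is an assumption about the extracted digits. Because the rule only compares magnitudes, the stars are unchanged if both are rescaled by the same factor.

```python
def significance_stars(m1, crit_1pct=0.087, crit_5pct=0.051, crit_10pct=0.037):
    """Map an M1 statistic to the star notation used in Table 6.

    The default critical values are assumed readings of the table note.
    """
    if m1 > crit_1pct:
        return "***"
    if m1 > crit_5pct:
        return "**"
    if m1 > crit_10pct:
        return "*"
    return ""


# Assumed readings of selected Table 6 entries.
examples = {
    "Random walk": 0.108,
    "Dothan": 0.067,
    "Linear drift RS GARCH": 0.047,
    "Linear drift JD CEV": 0.039,
}
for model, m1 in examples.items():
    print(model, significance_stars(m1))
```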

It might be simply because the drift specifications considered are severely misspecified and do not outperform the zero drift model. In fact, most drift parameter estimates have large standard errors and thus have excess sampling variation in parameter estimation. As a consequence, assuming zero drift may result in a smaller adverse effect on out-of-sample performance.

The generalized residuals Zt(θ) contain much information about possible sources of model misspecification. Instead of being uniform, the histograms of the generalized residuals for all diffusion models (Fig. 2) exhibit a high peak in the center of the distribution, which indicates that the diffusion models cannot satisfactorily capture the marginal density of the spot rate. The separate inference statistics M(m, l) in Table 7 also show that all the models fail to satisfactorily capture the dynamics of the generalized residuals. In particular, the large M(2, 2) and M(4, 4) statistics indicate that diffusion models perform rather poorly in modeling the conditional variance and kurtosis of the generalized residuals.

Similar to single-factor diffusion models, GARCH models with a zero drift have much better density forecasts than those models with a linear or nonlinear drift. The best GARCH and diffusion models have M1 statistics roughly between .07 and .08. Comparing the best GARCH and single-factor diffusion models, we find that GARCH provides better density forecasts than CEV, and combining CEV and GARCH further improves the density forecasts. Diagnostic analysis of generalized residuals shows that GARCH models provide certain improvements over diffusion models. For example, the histograms of the generalized residuals of GARCH models have a less pronounced peak in the center. The M(m, l) statistics show that GARCH models significantly improve the fitting of even-order moments of the generalized residuals; the M(2, 2) and M(4, 4) statistics are reduced from close to 100 to single digits.

Figure 2 (panels a–h). Histograms of the Generalized Residuals of Four Classes of Spot Rate Models. This figure plots the histograms of the generalized residuals of the single-factor diffusion, GARCH, regime-switching, and jump-diffusion spot rate models.

In summary, our analysis of single-factor diffusion and GARCH models demonstrates that both classes of models fail to provide good density forecasts for future interest rates. They fail to capture both the high peak in the center of the marginal density and the dynamics of different moments of the generalized residuals. GARCH models have a more uniform marginal density and significantly improve the ability to model the dynamics of even-order moments of the generalized residuals. For both classes of models, modeling the conditional variance through CEV or GARCH is much more important than modeling the conditional mean of the interest rate. Zero-drift models outperform either linear or nonlinear drift models in density forecast, although the M(1, 1) statistics suggest that neither model adequately captures the dynamic structure in the conditional mean of the generalized residuals.

We next turn to regime-switching models. Regime-switching models generally have better density forecasts than single-factor diffusion and GARCH models; the best regime-switching models reduce the M1 statistics of the best diffusion and GARCH models from above .07 to about .05. Although modeling the drift does not improve density forecast for diffusion and GARCH models, the best regime-switching models in density forecast have a linear drift in each regime. As pointed out by Ang and Bekaert (2002) and Li and Xu (2000), a regime-switching model with a linear drift in each regime actually implies nonlinear conditional mean dynamics for the interest rate. Thus our results suggest that perhaps a specific nonlinear drift is needed for out-of-sample density forecasts. GARCH is more important than CEV for density forecasts, and combining CEV with GARCH does not further improve model performance. Overall, the regime-switching GARCH model with a linear drift in each regime provides the best out-of-sample forecasts among all of the regime-switching models. Diagnostic analysis shows that regime-switching models provide a much better characterization of the marginal density of interest rates; the histograms of the generalized residuals are much closer to U[0, 1]. The M(2, 2) and M(4, 4) statistics of regime-switching models are slightly higher than those of GARCH models, however, and the M(1, 1) and M(3, 3) statistics of regime-switching models are actually higher than those of diffusion and GARCH models. Therefore, the advantages of regime-switching models seem to come from better modeling of the marginal density rather than from the dynamics of the generalized residuals.

The density forecast performance of jump-diffusion models shares many common features with that of regime-switching models. First, they have comparable performance; the M1 statistics of the best jump-diffusion models are also around .05. Second, for jump-diffusion models, the linear drift specification outperforms those with no drift or nonlinear drift for density forecasts. Third, for jump-diffusion models, GARCH generally provides better density forecasts than CEV. Fourth, jump-diffusion models have similar advantages over single-factor diffusion and GARCH models. They have much better characterization of the marginal density of the generalized residuals, but they may not necessarily perform better in modeling the dynamics of the generalized residuals, as indicated by their large M(m, l) statistics. The jump model with a linear drift and level effect, although it has the smallest M1 statistic, apparently fails to capture the even-order moments of the generalized residuals, with M(2, 2) and M(4, 4) statistics significantly higher than other jump-diffusion models that include GARCH effects.

Table 7. Separate Inference Statistics for Out-of-Sample Density Forecast Performance

Model | M(1, 1) | M(1, 2) | M(2, 1) | M(2, 2) | M(3, 3) | M(4, 4)

Single-factor diffusion models
Random walk | 40.08 | 6.35 | 40.48 | 99.73 | 21.50 | 80.12
Log-normal | 46.52 | 8.77 | 39.91 | 110.10 | 26.98 | 92.51
Dothan | 46.04 | 8.07 | 42.62 | 108.90 | 27.02 | 91.03
Pure CEV | 41.62 | 6.63 | 41.55 | 96.81 | 22.85 | 80.42
Vasicek | 40.40 | 8.13 | 35.55 | 100.20 | 21.43 | 81.79
CIR | 43.61 | 8.68 | 36.30 | 98.57 | 24.39 | 85.02
CKLS | 42.02 | 8.55 | 35.75 | 97.89 | 22.80 | 82.77
Nonlinear drift | 42.40 | 8.37 | 35.93 | 98.47 | 22.77 | 82.98

GARCH models
No drift GARCH | 40.73 | 3.75 | 28.21 | 7.68 | 17.57 | 2.53
Linear drift GARCH | 41.07 | 4.66 | 25.11 | 7.53 | 17.00 | 2.35
Nonlinear drift GARCH | 40.71 | 4.87 | 24.70 | 7.02 | 16.63 | 2.09
No drift CEV–GARCH | 41.30 | 4.10 | 28.40 | 7.46 | 19.24 | 2.74
Linear drift CEV–GARCH | 41.68 | 5.11 | 25.06 | 7.09 | 18.73 | 2.44
Nonlinear drift CEV–GARCH | 41.43 | 5.45 | 24.32 | 6.45 | 18.54 | 2.17

Regime-switching models
No drift RS CEV | 53.16 | 3.13 | 11.31 | 7.05 | 35.02 | 10.21
Linear drift RS CEV | 53.25 | 4.56 | 11.16 | 6.57 | 34.31 | 8.83
Nonlinear drift RS CEV | 53.75 | 5.12 | 9.84 | 6.56 | 34.50 | 8.49
No drift RS GARCH | 57.29 | 4.05 | 21.59 | 10.95 | 43.71 | 7.63
Linear drift RS GARCH | 55.46 | 4.56 | 21.09 | 12.38 | 40.71 | 9.14
Nonlinear drift RS GARCH | 55.69 | 4.50 | 21.05 | 12.14 | 41.13 | 8.93
No drift RS CEV–GARCH | 45.35 | 3.69 | 20.91 | 10.67 | 34.25 | 7.08
Linear drift RS CEV–GARCH | 44.10 | 4.86 | 20.75 | 11.99 | 32.12 | 8.87
Nonlinear drift RS CEV–GARCH | 44.36 | 4.84 | 20.55 | 11.40 | 32.46 | 8.32

Jump-diffusion models
No drift JD CEV | 47.38 | 5.32 | 16.76 | 85.10 | 43.01 | 99.31
Linear drift JD CEV | 46.32 | 4.48 | 19.63 | 85.09 | 42.26 | 99.82
Nonlinear drift JD CEV | 46.50 | 4.85 | 18.31 | 85.10 | 42.48 | 99.29
No drift JD GARCH | 43.32 | 5.10 | 20.35 | 12.38 | 28.44 | 8.96
Linear drift JD GARCH | 42.57 | 4.60 | 21.48 | 12.19 | 28.22 | 8.84
Nonlinear drift JD GARCH | 42.62 | 4.67 | 21.42 | 12.34 | 28.24 | 8.95
No drift JD CEV–GARCH | 43.55 | 5.05 | 20.51 | 13.36 | 29.31 | 10.07
Linear drift JD CEV–GARCH | 42.74 | 4.54 | 21.69 | 13.00 | 28.93 | 9.76
Nonlinear drift JD CEV–GARCH | 42.78 | 4.62 | 21.57 | 13.07 | 28.94 | 9.81

NOTE: This table reports the separate inference statistics M(m, l) for the four classes of spot rate models. The asymptotically normal statistic M(m, l) can be used to test whether the cross-correlation between the mth and lth moments of Zt is significantly different from 0. The choice of (m, l) = (1, 1), (2, 2), (3, 3), and (4, 4) is sensitive to autocorrelations in mean, variance, skewness, and kurtosis of rt. We report results only for a lag truncation order p = 20; the results for p = 10 and 30 are rather similar. The asymptotic critical values are 1.28, 1.65, and 2.33 at the 10%, 5%, and 1% levels.
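The critical values quoted in the note to Table 7 are the upper-tail quantiles of the standard normal distribution, which can be checked with the Python standard library. The sketch assumes the test is one-sided and rejects for large values of M(m, l); the function name is hypothetical.

```python
from statistics import NormalDist


def upper_tail_critical_value(level):
    """Standard normal quantile that leaves probability `level` in the upper tail."""
    return NormalDist().inv_cdf(1.0 - level)


for level in (0.10, 0.05, 0.01):
    print(level, upper_tail_critical_value(level))
```

The three quantiles are approximately 1.28, 1.645, and 2.33, so statistics of 40 or more, as in the M(1, 1) column, lie far beyond any conventional critical value.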

In summary, our out-of-sample density forecast evaluation reveals the following important features:

1. For out-of-sample density forecast, the importance of modeling mean reversion in the interest rate is ambiguous. For diffusion and GARCH models, a linear or nonlinear drift gives worse density forecasts than a zero drift. However, when there are regime switches or jumps, linear drift models outperform those with zero or nonlinear drift. The best density forecast models still fail to adequately capture the mean dynamics of the generalized residuals.

2. It is important to capture volatility clustering in interest rates for both in-sample and out-of-sample performance via GARCH. Most of the best-performing models have a GARCH component, which significantly improves their ability to model the dynamics of the even-order moments of the generalized residuals.

3. It is also important to capture excess kurtosis and heavy tails of the interest rate for both in-sample and out-of-sample performance via regime switching or jumps. The advantages of regime-switching and jump models are mainly reflected in modeling the marginal density rather than in the dynamics of the generalized residuals.

Overall, our out-of-sample analysis demonstrates that more-complicated models that incorporate conditional heteroscedasticity and heavy tails of interest rates tend to have a better density forecast. This result is quite different from the results for those that focus on conditional mean forecast. As widely documented in the literature, the predictable component in the conditional mean of the interest rate appears insignificant. As a result, random walk models tend to outperform more-sophisticated models in terms of mean forecast (e.g., Duffee 2002). However, density forecast includes all conditional moments, and as a result, those models that can capture the dynamics of higher-order moments tend to have better density forecasts. Our analysis suggests that the more-complicated spot rate models developed in the literature can indeed capture some important features of interest rates and are relevant in applications involving density forecasts of interest rates.

6. CONCLUSION

Despite the numerous empirical studies of spot interest rate models, little effort has been devoted to examining their out-of-sample forecast performance. We have contributed to the literature by providing the first (to our knowledge) comprehensive empirical analysis of the out-of-sample performance of a wide variety of popular spot rate models in forecasting the conditional density of future interest rates. Out-of-sample density forecasts help minimize the data snooping bias due to excessive searching for more-complicated models using similar datasets. The conditional density, which completely characterizes the full dynamics of interest rates, is also an essential input to many financial applications. Using a rigorous econometric evaluation procedure for density forecasts, we examined the out-of-sample performance of single-factor diffusion, GARCH, regime-switching, and jump-diffusion models.

Although previous studies have shown that simpler modelssuch as the random walk model tend to have more accurateforecasts for the conditional mean of interest rates we find thatmodels that capture conditional heteroscedasticity and heavytails of the spot rate have better out-of-sample density forecastsGARCH significantly improves modeling of the conditionalvariance and kurtosis of the generalized residuals whereasregime-switching and jumps help in modeling the marginaldensity of interest rates Therefore our analysis shows that themore-sophisticated interest rate models developed in the litera-ture can indeed capture some important features of interest ratedata and perform better than simpler models in out-of-sampledensity forecasts Although we have focused on density fore-casts of the spot rates here there is evidence that the term struc-ture of interest rates is driven by multiple factors (eg Dai andSingleton 2000) Extending our analysis to forecast the jointdensity of multiple yields is an interesting issue that we willaddress in future research

ACKNOWLEDGMENTS

The authors are grateful for the helpful comments of Canlin Li, Daniel Morillo, Larry Pohlman, and seminar participants at PanAgora Asset Management, the 2002 Econometric Society European Summer Meeting, and the 2003 Econometric Society North American Winter Meeting. They also thank the associate editor and two anonymous referees, whose comments and suggestions have greatly improved the article, and David George for editorial assistance. Hong's work was supported by National Science Foundation grant SES-0111769. Any errors are the authors' own.

APPENDIX: GENERALIZED SPECTRAL EVALUATION FOR DENSITY FORECASTS

Here we describe Hong's (2003) omnibus evaluation procedure for out-of-sample density forecasts, as well as a class of separate inference procedures. (For more discussion and formal treatment, see Hong 2003.)

A.1 Generalized Spectrum

Generalized spectrum was proposed by Hong (1999) as an analytic tool to provide an alternative to the power spectrum and the higher-order spectrum (e.g., bispectrum) in time series analysis. Define the centered generalized residual

$$Z_t \equiv Z_t(\theta) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \theta)\,dr - \frac{1}{2}, \qquad t = R+1, \ldots, T,$$

where $p(r, t \mid I_{t-1}, \theta)$ is a conditional density model for time series $\{r_t\}$. Suppose that the series $\{Z_t\}_{t=-\infty}^{\infty}$ has marginal characteristic function $\varphi(u) \equiv E(e^{iuZ_t})$ and pairwise joint characteristic function $\varphi_j(u,v) \equiv E[\exp\{i(uZ_t + vZ_{t-|j|})\}]$, where $i \equiv \sqrt{-1}$, $u, v \in (-\infty, \infty)$, and $j = 0, \pm 1, \ldots$. The basic idea of generalized spectrum is to transform the series $\{Z_t\}$ and then consider the spectrum of the transformed series. Define the generalized covariance function

$$\sigma_j(u,v) \equiv \operatorname{cov}\bigl(e^{iuZ_t}, e^{ivZ_{t-|j|}}\bigr), \qquad j = 0, \pm 1, \ldots.$$

Straightforward algebra yields $\sigma_j(u,v) = \varphi_j(u,v) - \varphi(u)\varphi(v)$. Because $\varphi_j(u,v) = \varphi(u)\varphi(v)$ for all $(u,v)$ if and only if $Z_t$ and $Z_{t-|j|}$ are independent (cf. Lukacs 1970), $\sigma_j(u,v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ over various lags, including those with zero autocorrelation in $\{Z_t\}$.

Under certain regularity conditions on the temporal dependence of $\{Z_t\}$, the Fourier transform of the generalized covariance function $\sigma_j(u,v)$ exists and is given by

$$f(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j(u,v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi].$$

Like $\sigma_j(u,v)$, $f(\omega,u,v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ across various lags. An advantage of spectral analysis is that $f(\omega,u,v)$ incorporates all lags simultaneously. Thus it can capture the dependent processes in which serial dependence occurs only at higher-order lags. This can arise from seasonality (e.g., calendar effects) or time delay in financial markets. Moreover, $f(\omega,u,v)$ is particularly powerful in capturing the dependent alternatives whose serial dependence decays to 0 slowly as $j \to \infty$, such as slow mean-reverting or persistent volatility clustering processes. For such processes, $f(\omega,u,v)$ has a sharp peak at frequency 0.

The function $f(\omega,u,v)$ does not require any moment condition on $\{Z_t\}$. When the moments of $\{Z_t\}$ exist, however, we can differentiate $f(\omega,u,v)$ with respect to $(u,v)$ at $(0,0)$:

$$f^{(0,m,l)}(\omega,0,0) \equiv \left.\frac{\partial^{m+l}}{\partial u^m\,\partial v^l} f(\omega,u,v)\right|_{(u,v)=(0,0)} = \frac{i^{m+l}}{2\pi} \sum_{j=-\infty}^{\infty} \operatorname{cov}\bigl(Z_t^m, Z_{t-|j|}^l\bigr)\, e^{-ij\omega}.$$

In particular, $f^{(0,1,1)}(\omega,0,0)$ is the well-known power spectral density. (For applications of the power spectral density in economics and finance, see, e.g., Granger 1969; Durlauf 1991; Watson 1993.) For this reason, $f(\omega,u,v)$ is called the "generalized spectral density" of $\{Z_t\}$. The parameters $(u,v)$ provide much flexibility in capturing linear and nonlinear dependence. Because $f(\omega,u,v)$ can be decomposed as a weighted sum of $f^{(0,m,l)}(\omega,0,0)$ over various $(m,l)$ via a Taylor series expansion around $(0,0)$, it contains information on all autocorrelations and cross-correlations in the various power terms of $\{Z_t\}$. Thus it can be used to develop an omnibus procedure against a wide range of dependent processes.

A.2 Omnibus Evaluation for Density Forecast

To evaluate out-of-sample density forecasts of model $p(r, t \mid I_{t-1}, \theta)$, Hong (2003) used the generalized spectrum to develop an omnibus test of iid $U[-\tfrac{1}{2}, \tfrac{1}{2}]$ for the centered generalized residual $\{Z_t\}$. To test $H_0\colon \{Z_t\} \sim \text{iid } U[-\tfrac{1}{2}, \tfrac{1}{2}]$, we use the modified generalized covariance

$$\sigma_j^U(u,v) \equiv \varphi_j(u,v) - \varphi^U(u)\varphi^U(v),$$

where $\varphi^U(u) \equiv \sin(u/2)/(u/2)$ is the characteristic function of a $U[-\tfrac{1}{2}, \tfrac{1}{2}]$ random variable. Because $\sigma_j^U(u,v)$ has incorporated the information of $U[-\tfrac{1}{2}, \tfrac{1}{2}]$, it can capture any pairwise serial dependence in $\{Z_t\}$ and any deviation from $U[-\tfrac{1}{2}, \tfrac{1}{2}]$. The associated generalized spectrum is

$$f^U(\omega,u,v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j^U(u,v)\, e^{-ij\omega}, \qquad \omega \in [-\pi,\pi],\ u, v \in (-\infty,\infty).$$

Under $H_0$, we have $\sigma_j^U(u,v) = 0$ for all $j \neq 0$, and so $f^U(\omega,u,v)$ becomes a known "flat" spectrum,

$$f_0^U(\omega,u,v) \equiv \frac{1}{2\pi}\sigma_0^U(u,v) = \frac{1}{2\pi}\bigl[\varphi^U(u+v) - \varphi^U(u)\varphi^U(v)\bigr], \qquad \omega \in [-\pi,\pi].$$

When $Z_t$ and $Z_{t-j}$ are not independent at some lag $j \neq 0$, or when $Z_t$ is not $U[-\tfrac{1}{2}, \tfrac{1}{2}]$, we have $f^U(\omega,u,v) \neq f_0^U(\omega,u,v)$. Hence, to test $H_0$, one can check whether a consistent estimator of $f^U(\omega,u,v)$ is significantly different from $f_0^U(\omega,u,v)$.

Suppose that $\hat\theta$ is an estimator for model parameter $\theta$ based on the estimation sample $\{r_t\}_{t=1}^{R}$. To estimate $f^U(\omega,u,v)$ using the prediction sample $\{r_t\}_{t=R+1}^{N}$, we have to use the estimated proxies $\{\hat Z_t\}_{t=R+1}^{N}$, where

$$\hat Z_t \equiv Z_t(\hat\theta) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \hat\theta)\,dr - \frac{1}{2}, \qquad t = R+1, \ldots, N.$$

Define the empirical measure

$$\hat\sigma_j^U(u,v) \equiv \hat\varphi_j(u,v) - \varphi^U(u)\varphi^U(v), \qquad j = 0, \pm 1, \ldots, \pm(n-1),$$

where $n \equiv N - R$ is the size of the prediction sample and the empirical pairwise joint characteristic function is

$$\hat\varphi_j(u,v) \equiv (n - |j|)^{-1} \sum_{t=R+|j|+1}^{N} \exp\bigl\{i\bigl(u\hat Z_t + v\hat Z_{t-|j|}\bigr)\bigr\}.$$

We introduce a class of smoothed kernel estimators

$$\hat f^U(\omega,u,v) \equiv \frac{1}{2\pi} \sum_{j=1-n}^{n-1} (1 - |j|/n)^{1/2}\, k(j/p)\, \hat\sigma_j^U(u,v)\, e^{-ij\omega},$$

where $k(\cdot)$ is a kernel and $p \equiv p(n)$ is a bandwidth/lag order. An example of $k(\cdot)$ is the Bartlett kernel

$$k(z) = \begin{cases} 1 - |z| & \text{if } |z| \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

For the choice of $p$, Hong (1999) suggested a data-driven method that delivers an asymptotically optimal bandwidth in terms of an integrated mean squared error criterion. This still involves a preliminary bandwidth $\bar p$, but the impact of choosing $\bar p$ is much smaller than the impact of choosing lag order $p$.
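The empirical measure and the Bartlett kernel above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function names (`phi_uniform`, `bartlett`, `sigma_hat_u`), the simulated series, and the evaluation point $u = v = 3$ are assumptions made here for illustration and are not taken from Hong (2003).

```python
import cmath
import math
import random

def phi_uniform(u):
    """Characteristic function of U[-1/2, 1/2]: sin(u/2) / (u/2), equal to 1 at u = 0."""
    return 1.0 if u == 0.0 else math.sin(u / 2.0) / (u / 2.0)

def bartlett(x):
    """Bartlett kernel: 1 - |x| on [-1, 1] and 0 elsewhere."""
    return max(0.0, 1.0 - abs(x))

def sigma_hat_u(z, j, u, v):
    """Empirical modified generalized covariance at lag j for centered residuals z."""
    n, lag = len(z), abs(j)
    joint = sum(cmath.exp(1j * (u * z[t] + v * z[t - lag])) for t in range(lag, n))
    return joint / (n - lag) - phi_uniform(u) * phi_uniform(v)

# Hypothetical data: an iid U[-1/2, 1/2] series, and a series in which every
# draw is repeated once (still uniform marginally, but dependent at lag 1).
rng = random.Random(11)
iid = [rng.random() - 0.5 for _ in range(2000)]
paired = [x for x in (rng.random() - 0.5 for _ in range(1000)) for _ in range(2)]

dev_iid = abs(sigma_hat_u(iid, 1, 3.0, 3.0))
dev_paired = abs(sigma_hat_u(paired, 1, 3.0, 3.0))
```

Under these assumptions, `dev_iid` should be close to 0, whereas `dev_paired` should be clearly larger, because the repeated draws violate independence at lag 1 even though the marginal distribution is still uniform.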

A convenient divergence measure for $\hat f^U(\omega,u,v)$ and $f_0^U(\omega,u,v)$ is the quadratic form

$$\begin{aligned}
2\pi n\, Q(\hat f^U, f_0^U) &\equiv 2\pi n \int\!\!\int\!\!\int_{-\pi}^{\pi} \bigl|\hat f^U(\omega,u,v) - f_0^U(\omega,u,v)\bigr|^2\, d\omega\, dW(u)\, dW(v) \\
&= n \int\!\!\int \bigl|\hat\varphi_0(u+v) - \varphi^U(u+v)\bigr|^2\, dW(u)\, dW(v) \\
&\quad + 2 \sum_{j=1}^{n-1} k^2(j/p)(n-j) \int\!\!\int \bigl|\hat\sigma_j^U(u,v)\bigr|^2\, dW(u)\, dW(v),
\end{aligned}$$

where the weighting function $W(\cdot)$ is positive and nondecreasing, and the unspecified integrals are taken over the support of $W(\cdot)$. An example of this is

$$W(u) = \Phi(\sqrt{12}\,u),$$

where $\Phi(\cdot)$ is the $N(0,1)$ cdf, which is commonly used in the characteristic function literature. The scale $\sqrt{12}$ matches the standard deviation of $U[-\tfrac{1}{2}, \tfrac{1}{2}]$, but this is not necessary.

Our evaluation statistic for $H_0$ is a properly centered and scaled version of the quadratic form $Q(\hat f^U, f_0^U)$:

$$M_1 \equiv \left[\sum_{j=1}^{n-1} k^2(j/p)\right]^{-1} \int\!\!\int \sum_{j=1}^{n-1} k^2(j/p)(n-j)\bigl|\hat\sigma_j^U(u,v)\bigr|^2\, dW(u)\, dW(v) - \left[\int \bigl[1 - \varphi^U(u)^2\bigr]\, dW(u)\right]^2.$$

The asymptotic distribution of $M_1$ was derived by Hong (2003). Under $H_0$ and a set of regularity conditions,

$$M_1 \xrightarrow{d} \int\!\!\int |G(u,v)|^2\, dW(u)\, dW(v)$$

as $n \to \infty$, $R \to \infty$, and $n^{1+\delta}/R \to 0$ for any arbitrarily small constant $\delta > 0$, where $G(u,v)$ is a complex-valued Gaussian process with mean 0 and a known covariance function

$$\begin{aligned}
\operatorname{cov}[G(u_1,v_1), G(u_2,v_2)] &= E[G(u_1,v_1)G(u_2,v_2)^*] \\
&= \sigma_0^U(u_1,-u_2)\varphi^U(v_1)\varphi^U(v_2) + \sigma_0^U(u_1,-v_2)\varphi^U(u_2)\varphi^U(v_1) \\
&\quad + \sigma_0^U(v_1,-u_2)\varphi^U(u_1)\varphi^U(v_2) + \sigma_0^U(v_1,-v_2)\varphi^U(u_1)\varphi^U(u_2).
\end{aligned}$$

The limit distribution of $M_1$ is nonstandard. However, because the Gaussian process $G(u,v)$ has mean 0 and a known distribution-free covariance function, the limit distribution of $M_1$ can be easily tabulated. When $W(u) = \Phi(\sqrt{12}\,u)$ with a truncated support on $[-1,1]$, the asymptotic critical values at the 10%, 5%, and 1% levels are 0.37, 0.51, and 0.87.
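As a rough numerical illustration, the $M_1$ formula can be approximated by replacing the integrals over $u$ and $v$ with a trapezoid rule on $[-1,1]$. The sketch below is not Hong's (2003) implementation. The grid size, the fixed bandwidth $p = 4$ (in place of the data-driven choice), the unnormalized truncated weight, the simulated data, and all function names are assumptions made here, so the resulting values should not be compared with the tabulated critical values.

```python
import cmath
import math
import random

def phi_uniform(u):
    """Characteristic function of U[-1/2, 1/2]."""
    return 1.0 if u == 0.0 else math.sin(u / 2.0) / (u / 2.0)

def bartlett(x):
    """Bartlett kernel: 1 - |x| on [-1, 1] and 0 elsewhere."""
    return max(0.0, 1.0 - abs(x))

def weight_grid(num_nodes=15):
    """Trapezoid nodes and weights approximating dW(u), W(u) = Phi(sqrt(12) u), on [-1, 1]."""
    step = 2.0 / (num_nodes - 1)
    nodes = [-1.0 + i * step for i in range(num_nodes)]
    scale = math.sqrt(12.0)
    weights = [step * scale * math.exp(-0.5 * (scale * u) ** 2) / math.sqrt(2.0 * math.pi)
               for u in nodes]
    weights[0] *= 0.5
    weights[-1] *= 0.5
    return nodes, weights

def m1_statistic(z, p, num_nodes=15):
    """M1 as written in the text, for centered residuals z and a bandwidth p > 1."""
    n = len(z)
    nodes, weights = weight_grid(num_nodes)
    phis = [phi_uniform(u) for u in nodes]
    # Precompute exp(i * u * Z_t) for every grid point u and every t.
    expo = [[cmath.exp(1j * u * zt) for zt in z] for u in nodes]
    kernel_sum, integral = 0.0, 0.0
    for j in range(1, n):
        k2 = bartlett(j / p) ** 2
        if k2 == 0.0:
            continue  # lags with zero kernel weight contribute nothing
        kernel_sum += k2
        for a in range(num_nodes):
            for b in range(num_nodes):
                joint = sum(expo[a][t] * expo[b][t - j] for t in range(j, n)) / (n - j)
                sigma = joint - phis[a] * phis[b]
                integral += k2 * (n - j) * abs(sigma) ** 2 * weights[a] * weights[b]
    centering = sum((1.0 - phi ** 2) * w for phi, w in zip(phis, weights)) ** 2
    return integral / kernel_sum - centering

# Hypothetical data: iid residuals versus residuals in which each draw is repeated once.
rng = random.Random(7)
iid = [rng.random() - 0.5 for _ in range(300)]
paired = [x for x in (rng.random() - 0.5 for _ in range(150)) for _ in range(2)]
m1_iid = m1_statistic(iid, p=4)
m1_paired = m1_statistic(paired, p=4)
```

In this sketch the dependent series should yield the larger value, which is consistent with the use of $M_1$ as a ranking criterion.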

The most attractive feature of $M_1$ is its omnibus property; it has power against a wide range of suboptimal density forecasts, thanks to the use of the characteristic function in a spectral framework. The $M_1$ statistic can be viewed as an omnibus metric measuring the departure of the forecast model from the true data-generating process. A better density forecast model is expected to have a smaller value for $M_1$, because its generalized residual series $\{Z_t\}$ is closer to having the iid $U[0,1]$ property. Thus $M_1$ can be used to rank competing density forecast models in terms of their deviations from optimality. Moreover, the kernel function provides flexible weighting for various lags. Nonuniform weighting kernels usually discount higher-order lags. For most, if not all, economic and financial time series, today's behaviors are more affected by recent events than by remote events. Thus downward weighting for higher-order lags is expected to yield good power in finite samples.

A.3 Separate Inferences

When a model is rejected using $M_1$, it is interesting to explore possible reasons for the rejection. Hong (2003) provided a class of rigorous separate inference statistics that exploit the information contained in modeling generalized residuals $\{Z_t\}$ to understand possible reasons for model rejection. The $M(m,l)$ statistic is based on a kernel estimator of the generalized spectral density derivative $f^{(0,m,l)}(\omega,0,0)$, where $m$ and $l$ are the orders of the derivatives with respect to $u$ and $v$. The statistic $M(m,l)$ is defined as

$$M(m,l) = \left[\sum_{j=1}^{n-1} (n-j)\,k^2(j/p)\,\hat\rho_j^2(m,l) - \sum_{j=1}^{n-1} k^2(j/p)\right] \Bigg/ \left[2 \sum_{j=1}^{n-2} k^4(j/p)\right]^{1/2},$$

where $\hat\rho_j(m,l)$ is the sample cross-correlation between $\hat Z_t^m$ and $\hat Z_{t-|j|}^l$, $t = R+1, \ldots, N$. Under $H_0$ and a set of regularity conditions,

$$M(m,l) \xrightarrow{d} N(0,1)$$

for any given $(m,l)$ as $n, R \to \infty$. The upper-tailed $N(0,1)$ critical values at the 10%, 5%, and 1% levels are 1.28, 1.65, and 2.33. As in the $M_1$ test, Hong (2003) considered a data-driven method for $p$ that also involves the choice of a preliminary lag order $\bar p$.
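The $M(m,l)$ statistic is simple enough to sketch directly. The code below is a hedged illustration, not the authors' code: it uses the Bartlett kernel with a fixed lag order $p = 5$ rather than the data-driven choice, normalizes the sample covariances by $n$, and works on simulated residuals. The helper names `cross_correlation` and `m_statistic` are hypothetical.

```python
import math
import random

def bartlett(x):
    """Bartlett kernel: 1 - |x| on [-1, 1] and 0 elsewhere."""
    return max(0.0, 1.0 - abs(x))

def cross_correlation(z, j, m, l):
    """Sample cross-correlation between z_t^m and z_{t-j}^l for a lag j >= 1."""
    n = len(z)
    a = [x ** m for x in z]
    b = [x ** l for x in z]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((a[t] - mean_a) * (b[t - j] - mean_b) for t in range(j, n)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((x - mean_b) ** 2 for x in b) / n
    return cov / math.sqrt(var_a * var_b)

def m_statistic(z, m, l, p):
    """Separate inference statistic M(m, l) with a Bartlett kernel and lag order p > 1."""
    n = len(z)
    weighted, k2_sum, k4_sum = 0.0, 0.0, 0.0
    for j in range(1, n):
        k = bartlett(j / p)
        if k == 0.0:
            continue  # lags with zero kernel weight contribute nothing
        weighted += (n - j) * k ** 2 * cross_correlation(z, j, m, l) ** 2
        k2_sum += k ** 2
        if j <= n - 2:
            k4_sum += k ** 4
    return (weighted - k2_sum) / math.sqrt(2.0 * k4_sum)

# Hypothetical centered residuals: iid draws versus draws that are each repeated once.
rng = random.Random(5)
iid = [rng.random() - 0.5 for _ in range(400)]
paired = [x for x in (rng.random() - 0.5 for _ in range(200)) for _ in range(2)]
m11_iid = m_statistic(iid, 1, 1, p=5)
m11_paired = m_statistic(paired, 1, 1, p=5)
```

With the repeated draws, the lag-1 autocorrelation in level is about 0.5, so `m11_paired` should lie far above the upper-tailed critical values quoted above.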

Although the moments of the generalized residuals $\{Z_t\}$ are not exactly the same as those of the original data $\{r_t\}$, they are highly correlated. In particular, the choice of $(m,l) = (1,1)$, $(2,2)$, $(3,3)$, and $(4,4)$ is very sensitive to autocorrelations in level, volatility, skewness, and kurtosis of $\{r_t\}$. Furthermore, the choices of $(m,l) = (1,2)$ and $(2,1)$ are sensitive to the "ARCH-in-mean" effect and the "leverage" effect. Different choices of order $(m,l)$ can thus allow examination of various dynamic aspects of the underlying process.

[Received June 2002. Revised November 2003.]

REFERENCES

Ait-Sahalia, Y. (1996), "Testing Continuous-Time Models of the Spot Interest Rate," Review of Financial Studies, 9, 385–426.

Ait-Sahalia, Y. (1999), "Transition Densities for Interest Rate and Other Nonlinear Diffusions," Journal of Finance, 54, 1361–1395.

Ait-Sahalia, Y. (2002), "Maximum-Likelihood Estimation of Discretely Sampled Diffusions: A Closed-Form Approach," Econometrica, 70, 223–262.

Ait-Sahalia, Y., and Lo, A. (1998), "Nonparametric Estimation of State-Price Densities Implicit in Financial Asset Prices," Journal of Finance, 53, 499–547.

Andersen, T. G., and Lund, J. (1997), "Estimating Continuous Time Stochastic Volatility Models of the Short-Term Interest Rate," Journal of Econometrics, 77, 343–378.

Ang, A., and Bekaert, G. (2002), "Regime Switches in Interest Rates," Journal of Business & Economic Statistics, 20, 163–182.

Bali, T. (2000), "Modeling the Conditional Mean and Variance of the Short Rate Using Diffusion, GARCH, and Moving Average Models," Journal of Futures Markets, 20, 717–751.

Bali, T. (2001), "Nonlinear Parametric Models of the Short-Term Interest Rate," unpublished manuscript, Baruch College.

Ball, C., and Torous, W. (1998), "Regime Shifts in Short-Term Riskless Interest Rates," unpublished manuscript, University of California, Los Angeles.

Bera, A. K., and Ghosh, A. (2002), "Density Forecast Evaluation With Applications in Finance," unpublished manuscript, University of Illinois at Champaign, Dept. of Economics.

Berkowitz, J. (2001), "Testing the Accuracy of Density Forecasts With Applications to Risk Management," Journal of Business & Economic Statistics, 19, 465–474.

Bliss, R., and Smith, D. (1998), "The Elasticity of Interest Rate Volatility: Chan, Karolyi, Longstaff, and Sanders Revisited," unpublished manuscript, Federal Reserve Bank of Atlanta.

Brenner, R., Harjes, R., and Kroner, K. (1996), "Another Look at Alternative Models of Short-Term Interest Rate," Journal of Financial and Quantitative Analysis, 31, 85–107.

Chacko, G., and Viceira, L. M. (2003), "Spectral GMM Estimation of Continuous-Time Processes," Journal of Econometrics, 116, 259–292.

Chan, K. C., Karolyi, G. A., Longstaff, F. A., and Sanders, A. B. (1992), "An Empirical Comparison of Alternative Models of the Short-Term Interest Rate," Journal of Finance, 47, 1209–1228.

Chapman, D., and Pearson, N. (2001), "Recent Advances in Estimating Models of the Term-Structure," Financial Analysts Journal, 57, 77–95.

Christoffersen, P., and Diebold, F. X. (1996), "Further Results on Forecasting and Model Selection Under Asymmetric Loss," Journal of Applied Econometrics, 11, 561–572.

Christoffersen, P., and Diebold, F. X. (1997), "Optimal Prediction Under Asymmetric Loss," Econometric Theory, 13, 808–817.

Clements, M. P., and Smith, J. (2000), "Evaluating the Forecast Densities of Linear and Non-Linear Models: Applications to Output Growth and Unemployment," Journal of Forecasting, 19, 255–276.

Conley, T. G., Hansen, L. P., Luttmer, E. G. J., and Scheinkman, J. A. (1997), "Short-Term Interest Rates as Subordinated Diffusions," Review of Financial Studies, 10, 525–578.

Corradi, V. (2000), "Reconsidering the Continuous-Time Limit of the GARCH(1,1) Process," Journal of Econometrics, 96, 145–153.

Cox, J. C., Ingersoll, J. E., and Ross, S. A. (1985), "A Theory of the Term Structure of Interest Rates," Econometrica, 53, 385–407.

Dai, Q., and Singleton, K. (2000), "Specification Analysis of Affine Term Structure Models," Journal of Finance, 55, 1943–1978.

Das, S. R. (2002), "The Surprise Element: Jumps in Interest Rates," Journal of Econometrics, 106, 27–65.

Diebold, F. X., Gunther, T. A., and Tay, A. S. (1998), "Evaluating Density Forecasts With Applications to Financial Risk Management," International Economic Review, 39, 863–883.

Diebold, F. X., Hahn, J., and Tay, A. S. (1999), "Multivariate Density Forecast Evaluation and Calibration in Financial Risk Management: High-Frequency Returns of Foreign Exchange," Review of Economics and Statistics, 81, 661–673.

Diebold, F. X., Tay, A. S., and Wallis, K. F. (1999), "Evaluating Density Forecasts of Inflation: The Survey of Professional Forecasters," in Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W. J. Granger, eds. R. F. Engle and H. White, Oxford, U.K.: Oxford University Press, pp. 76–90.


Dothan, L. U. (1978), "On the Term Structure of Interest Rates," Journal of Financial Economics, 6, 59–69.

Duffee, G. (2002), "Term Premia and Interest Rate Forecasts in Affine Models," Journal of Finance, 57, 405–443.

Duffie, D., and Pan, J. (1997), "An Overview of Value at Risk," Journal of Derivatives, 4, 13–32.

Durham, G. (2003), "Likelihood-Based Specification Analysis of Continuous-Time Models of the Short-Term Interest Rate," Journal of Financial Economics, 70, 463–487.

Durlauf, S. (1991), "Spectral Based Testing of the Martingale Hypothesis," Journal of Econometrics, 50, 355–376.

Fackler, P. L., and King, R. P. (1990), "Calibration of Option-Based Probability Assessments in Agricultural Commodity Markets," American Journal of Agricultural Economics, 72, 73–83.

Gallant, A. R., and Tauchen, G. (1998), "Reprojection Partially Observed Systems With Applications to Interest Rate Diffusions," Journal of the American Statistical Association, 93, 10–24.

Ghysels, E., Carrasco, M., Chernov, M., and Florens, J. (2001), "Estimating Diffusions With a Continuum of Moment Conditions," unpublished manuscript, University of North Carolina.

Granger, C. (1969), "Testing for Causality and Feedback," Econometrica, 37, 424–438.

Granger, C. (1999), "The Evaluation of Econometric Models and of Forecasts," invited lecture at the Far Eastern Meeting of the Econometric Society, Singapore.

Granger, C., and Pesaran, M. H. (2000), "A Decision Theoretic Approach to Forecasting Evaluation," in Statistics and Finance: An Interface, eds. W. S. Chan, W. K. Li, and H. Tong, London: Imperial College, pp. 261–278.

Gray, S. (1996), "Modeling the Conditional Distribution of Interest Rates as a Regime-Switching Process," Journal of Financial Economics, 42, 27–62.

Hamilton, J. D. (1989), "A New Approach to the Econometric Analysis of Nonstationary Time Series and the Business Cycle," Econometrica, 57, 357–384.

Hong, Y. (1999), "Hypothesis Testing in Time Series via the Empirical Characteristic Function: A Generalized Spectral Density Approach," Journal of the American Statistical Association, 94, 1201–1220.

Hong, Y. (2003), "Evaluation of Out-of-Sample Density Forecasts With Applications to Stock Prices," unpublished manuscript, Cornell University, Dept. of Economics and Dept. of Statistical Science.

Hong, Y., and Lee, T. H. (2003a), "Inference on the Predictability of Foreign Exchange Rate Changes via Generalized Spectrum and Nonlinear Models," Review of Economics and Statistics, 84, 1048–1062.

Hong, Y., and Lee, T. H. (2003b), "Diagnostic Checking for Nonlinear Time Series Models," Econometric Theory, 19, 1065–1121.

Hong, Y., and Li, H. (2004), "Nonparametric Specification Testing for Continuous-Time Models With Applications to Interest Rate Term Structures," Review of Financial Studies, forthcoming.

Jackwerth, J. C., and Rubinstein, M. (1996), "Recovering Probability Distributions From Option Prices," Journal of Finance, 51, 1611–1631.

Jiang, G., and Knight, J. (1997), "A Nonparametric Approach to the Estimation of Diffusion Processes With an Application to a Short-Term Interest Rate Model," Econometric Theory, 13, 615–645.

Johannes, M. (2003), "The Statistical and Economic Role of Jumps in Interest Rates," Journal of Finance, 59, 227–260.

Jorion, P. (2000), Value at Risk: The New Benchmark for Managing Financial Risk, New York: McGraw-Hill.

Morgan, J. P. (1996), RiskMetrics–Technical Document (4th ed.), New York: Morgan Guaranty Trust Company.

Koedijk, K. G., Nissen, F. G., Schotman, P. C., and Wolff, C. C. P. (1997), "The Dynamics of Short-Term Interest Rate Volatility Reconsidered," European Finance Review, 1, 105–130.

Li, H., and Xu, Y. (2000), "Regime Shifts and Short Rate Dynamics," unpublished manuscript, Cornell University.

Lo, A., and MacKinlay, A. C. (1989), "Data-Snooping Biases in Tests of Financial Asset Pricing Models," Review of Financial Studies, 3, 175–208.

Lukacs, E. (1970), Characteristic Functions (2nd ed.), London: Charles Griffin.

Meese, R., and Rogoff, K. (1983), "Empirical Exchange Rate Models of the Seventies: Do They Fit Out-of-Sample?" Journal of International Economics, 14, 3–24.

Nelson, D. (1990), "ARCH Models as Diffusion Approximation," Journal of Econometrics, 45, 7–39.

Nelson, D. (1991), "Conditional Heteroskedasticity in Asset Returns: A New Approach," Econometrica, 59, 347–370.

Rosenblatt, M. (1952), "Remarks on a Multivariate Transformation," The Annals of Mathematical Statistics, 23, 470–472.

Rothschild, M., and Stiglitz, J. (1970), "Increasing Risk I: A Definition," Journal of Economic Theory, 2, 225–243.

Singleton, K. (2001), "Estimation of Affine Asset Pricing Models Using the Empirical Characteristic Function," Journal of Econometrics, 102, 111–141.

Soderlind, P., and Svensson, L. (1997), "New Techniques to Extract Market Expectations From Financial Instruments," Journal of Monetary Economics, 40, 383–429.

Stanton, R. (1997), "A Nonparametric Model of Term Structure Dynamics and the Market Price of Interest Rate Risk," Journal of Finance, 52, 1973–2002.

Vasicek, O. (1977), "An Equilibrium Characterization of the Term Structure," Journal of Financial Economics, 5, 177–188.

Watson, M. (1993), "Measures of Fit for Calibrated Models," Journal of Political Economy, 101, 1011–1041.

White, H. (2000), "A Reality Check for Data Snooping," Econometrica, 69, 1097–1127.


This is a challenging job, because we never observe an ex post density. So far, there are few statistical evaluation procedures for density forecasts.

Diebold et al. (1998) used the probability integral transform of data with respect to the density forecast model to assess the optimality of density forecasts. Extending a result established by Rosenblatt (1952), they showed that if the model conditional density is specified correctly, then the probability integral transformed series should be iid $U[0,1]$.

Specifically, for a given model of interest rate $r_t$, there is a model-implied conditional density

$$\frac{\partial}{\partial r} P(r_t \le r \mid I_{t-1}, \theta) = p(r, t \mid I_{t-1}, \theta),$$

where $\theta$ is an unknown finite-dimensional parameter vector and $I_{t-1} \equiv \{r_{t-1}, \ldots, r_1\}$ is the information set available at time $t-1$. Suppose that we have a random sample $\{r_t\}_{t=1}^{N}$ of size $N$, which we divide into two subsets: an estimation sample $\{r_t\}_{t=1}^{R}$ of size $R$ and a prediction sample $\{r_t\}_{t=R+1}^{N}$ of size $n = N - R$. We can then define the dynamic probability integral transform of the data $\{r_t\}_{t=R+1}^{N}$ with respect to the model conditional density:

$$Z_t(\theta) \equiv \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \theta)\,dr, \qquad t = R+1, \ldots, N.$$

Suppose that the model is correctly specified in the sense that there exists some $\theta_0$ such that $p(r, t \mid I_{t-1}, \theta_0)$ coincides with the true conditional density of $r_t$. Then the transformed sequence $\{Z_t(\theta_0)\}$ is iid $U[0,1]$.

For example, consider the discretized Vasicek model

$$\Delta r_t = \alpha_0 + \alpha_1 r_{t-1} + \sigma z_t,$$

where $\Delta r_t \equiv r_t - r_{t-1}$ and $\{z_t\} \sim \text{iid } N(0,1)$. For this model, the conditional density of $r_t$ given $I_{t-1}$ is

$$p(r, t \mid I_{t-1}, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}\bigl[r - \bigl(\alpha_0 + (1+\alpha_1) r_{t-1}\bigr)\bigr]^2\right\}, \qquad r \in (-\infty, \infty),$$

where $\theta = (\alpha_0, \alpha_1, \sigma)'$. Suppose that $p(r, t \mid I_{t-1}, \theta_0) = p_0(r, t \mid I_{t-1})$ for some $\theta_0$, where $p_0(r, t \mid I_{t-1})$ is the true conditional density of $r_t$. Then

$$Z_t(\theta_0) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \theta_0)\,dr = \Phi(z_t) \sim \text{iid } U[0,1],$$

where $\Phi(\cdot)$ is the $N(0,1)$ cumulative distribution function (cdf).

As noted earlier, the transformed series $\{Z_t(\hat\theta)\}$ is called the "generalized residual" of model $p(r, t \mid I_{t-1}, \theta)$, where $\hat\theta$ is a consistent estimator for $\theta_0$. This series provides a convenient approach for evaluating the density forecast model $p(r, t \mid I_{t-1}, \theta)$. Intuitively, the $U[0,1]$ property characterizes the correct specification for the stationary distribution of $r_t$, and the iid property characterizes correct dynamic specification for $r_t$. If $\{Z_t(\theta)\}$ is not iid $U[0,1]$ for all $\theta$, then the density forecast model $p(r, t \mid I_{t-1}, \theta)$ is not optimal, and there is room for further improvement.
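The Vasicek example can be checked numerically. The following Python sketch simulates the discretized model with hypothetical parameter values and computes the probability integral transforms at the true parameters, so that estimation error plays no role. The function names and all numerical settings are assumptions made here for illustration.

```python
import math
import random

def normal_cdf(x):
    """Standard normal cdf Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def simulate_vasicek(n, alpha0, alpha1, sigma, r0, seed=0):
    """Simulate r_t - r_{t-1} = alpha0 + alpha1 * r_{t-1} + sigma * z_t with z_t ~ iid N(0, 1)."""
    rng = random.Random(seed)
    rates = [r0]
    for _ in range(n):
        prev = rates[-1]
        rates.append(prev + alpha0 + alpha1 * prev + sigma * rng.gauss(0.0, 1.0))
    return rates

def vasicek_generalized_residuals(rates, alpha0, alpha1, sigma):
    """Probability integral transforms Z_t = Phi((r_t - conditional mean) / sigma)."""
    residuals = []
    for prev, curr in zip(rates[:-1], rates[1:]):
        cond_mean = alpha0 + (1.0 + alpha1) * prev
        residuals.append(normal_cdf((curr - cond_mean) / sigma))
    return residuals

# Hypothetical parameter values, chosen only for illustration.
rates = simulate_vasicek(2000, alpha0=0.002, alpha1=-0.04, sigma=0.01, r0=0.05, seed=1)
z = vasicek_generalized_residuals(rates, alpha0=0.002, alpha1=-0.04, sigma=0.01)
mean_z = sum(z) / len(z)
var_z = sum((x - mean_z) ** 2 for x in z) / len(z)
```

If the transforms are iid $U[0,1]$, their sample mean and variance should be close to $1/2$ and $1/12$. Subtracting $1/2$ from each element gives the centered residuals used in the Appendix.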

To test the joint hypothesis of iid $U[0,1]$ for $\{Z_t(\theta)\}$ is nontrivial. One may suggest using the well-known Kolmogorov–Smirnov test, which unfortunately checks $U[0,1]$ under the iid assumption rather than checking iid $U[0,1]$ jointly. It will miss the alternatives for which $\{Z_t(\theta)\}$ is uniform but not iid. Similarly, tests for iid alone will miss the alternatives for which $\{Z_t(\theta)\}$ is iid but not $U[0,1]$. Moreover, parameter estimation uncertainty is expected to affect the asymptotic distribution of the Kolmogorov–Smirnov test statistic.

Diebold et al. (1998) used the autocorrelations of $\{Z_t^m(\theta)\}$ to check the iid property and the histogram of $\{Z_t(\theta)\}$ to check the $U[0,1]$ property. These graphical procedures are simple and informative. To compare and rank different models, however, it is desirable to use a single evaluation criterion. Hong (2003) recently developed an omnibus evaluation procedure for out-of-sample density forecast that tests the joint hypothesis of iid $U[0,1]$ and explicitly addresses the impact of parameter estimation uncertainty on the evaluation statistics. This is achieved using a generalized spectral approach. The generalized spectrum was first introduced by Hong (1999) as an analytic tool for nonlinear time series analysis. It is based on the characteristic function transformation embedded in a spectral framework. Unlike the power spectrum, the generalized spectrum can capture any kind of pairwise serial dependence across various lags, including those with zero autocorrelation [e.g., an ARCH process with iid $N(0,1)$ innovations].

Specifically, Hong (2003) introduced a modified generalized spectral density function that incorporates the information of iid $U[0,1]$ under the null hypothesis. Consequently, the modified generalized spectrum becomes a known "flat" function under the null hypothesis of iid $U[0,1]$. Whenever $Z_t(\theta)$ depends on $Z_{t-j}(\theta)$ for some $j > 0$ or $Z_t(\theta)$ is not $U[0,1]$, the modified generalized spectrum will be nonflat as a function of frequency $\omega$. Hong (2003) proposed an evaluation statistic (denoted by $M_1$) for out-of-sample density forecasts by comparing a kernel estimator of the modified generalized spectrum with the flat spectrum.

The most attractive feature of $M_1$ is its omnibus property; it has power against a wide range of suboptimal density forecasts, thanks to the use of the characteristic function in a spectral framework. The $M_1$ statistic can be viewed as an omnibus metric measuring the departure of the density forecast model from the true data-generating process. A better density forecast model is expected to have a smaller value of $M_1$, because its generalized residual series $\{Z_t(\theta)\}$ is closer to having the iid $U[0,1]$ property. Thus $M_1$ can be used to rank competing density forecast models in terms of their deviations from optimality. (See the Appendix for more discussion.)

When a model is rejected using $M_1$, it is interesting to explore possible reasons for the rejection. The model generalized residuals $\{Z_t(\theta)\}$ contain much information on model misspecification. As noted earlier, Diebold et al. (1998) have illustrated how to use the histograms of $\{Z_t(\theta)\}$ and autocorrelograms of $\{Z_t^m(\theta)\}$ to identify sources of suboptimal density forecasts. Although intuitive and convenient, these graphical methods ignore the impact of parameter estimation uncertainty in $\hat\theta$ on the asymptotic distribution of evaluation statistics, which generally exists even when $n \to \infty$. Hong (2003) provided a class of rigorous asymptotically $N(0,1)$ separate inference statistics $M(m,l)$ that measure whether the cross-correlations between $Z_t^m(\theta)$ and $Z_{t-|j|}^l(\theta)$ are significantly different from 0. Although the moments of the generalized residuals $\{Z_t\}$ are not exactly the same as those of the original data $\{r_t\}$, they are highly correlated. In particular, the choice of $(m,l) = (1,1)$, $(2,2)$, $(3,3)$, and $(4,4)$ is very sensitive to autocorrelations in level, volatility, skewness, and kurtosis of $\{r_t\}$. Furthermore, the choices of $(m,l) = (1,2)$ and $(2,1)$ are sensitive to the "ARCH-in-mean" effect and the "leverage" effect. Different choices of order $(m,l)$ can thus illustrate various dynamic aspects of the underlying process $r_t$. (Again, see the Appendix for more discussion.)

3. SPOT RATE MODELS

We apply Hong's (2003) procedure to evaluate the density forecast performance of a variety of popular spot rate models, including single-factor diffusion, GARCH, regime-switching, and jump-diffusion models. We now discuss these models in detail.

3.1 Single-Factor Diffusion Models

One important class of spot rate models is the continuous-time diffusion models, which have been widely used in modern finance because of their convenience in pricing fixed-income securities. Specifically, the spot rate is assumed to follow a single-factor diffusion

$$dr_t = \mu(r_t, \theta)\,dt + \sigma(r_t, \theta)\,dW_t,$$

where $\mu(r_t, \theta)$ and $\sigma(r_t, \theta)$ are the drift and diffusion functions and $W_t$ is a standard Brownian motion. For diffusion models, $\mu(r_t, \theta)$ and $\sigma(r_t, \theta)$ completely determine the model transition density, which in turn captures the full dynamics of $r_t$. Table 1 lists a variety of discretized diffusion models examined in our analysis, which are nested by Ait-Sahalia's (1996) nonlinear drift model,

$$\Delta r_t = \alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} z_t, \qquad (1)$$

where $\Delta r_t \equiv r_t - r_{t-1}$ and $\{z_t\} \sim \text{iid } N(0,1)$. To be consistent with the other models considered in this article, we study the discretized version of the single-factor diffusion and jump-diffusion models. The discretization bias for daily data that we use in this article is unlikely to be significant (e.g., Stanton 1997; Das 2002). In (1), we allow the drift to have zero, linear, and nonlinear specifications, and allow the diffusion to be a constant or to depend on the interest rate level, which we refer to as the "level effect." We term the volatility specification in which the elasticity parameter $\rho$ is estimated from the data the constant elasticity volatility (CEV). Density forecasts for the spot rate $r_t$ are given by the model-implied transition density $p(r_t, t \mid I_{t-1}, \hat\theta)$, where $\hat\theta$ is a parameter estimator using the maximum likelihood estimate (MLE) method. Because we study discretized diffusion models, the MLE method is suitable. For the continuous-time diffusion models, we can use the closed-form likelihood function if available or the approximated likelihood function via Hermite expansions (see Ait-Sahalia 1999, 2002).
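Because the shock in (1) is Gaussian, the conditional density of $r_t$ given $r_{t-1}$ is normal, and the log-likelihood that MLE maximizes has a simple closed form. The sketch below evaluates that log-likelihood on simulated data. It is an illustration under assumptions made here (hypothetical parameter values, a small positive floor that keeps simulated rates positive, and the function names), and it omits the numerical optimization step.

```python
import math
import random

def discretized_diffusion_loglik(rates, a_m1, a0, a1, a2, sigma, rho):
    """Gaussian log-likelihood of model (1); every lagged rate must be strictly positive."""
    loglik = 0.0
    for prev, curr in zip(rates[:-1], rates[1:]):
        drift = a_m1 / prev + a0 + a1 * prev + a2 * prev ** 2
        vol = sigma * prev ** rho
        shock = (curr - prev - drift) / vol
        loglik += -0.5 * math.log(2.0 * math.pi) - math.log(vol) - 0.5 * shock ** 2
    return loglik

def simulate_ckls(n, r0, a0, a1, sigma, rho, seed=0, floor=1e-4):
    """Simulate the linear drift, CEV special case of (1), floored to stay positive."""
    rng = random.Random(seed)
    rates = [r0]
    for _ in range(n):
        prev = rates[-1]
        step = a0 + a1 * prev + sigma * prev ** rho * rng.gauss(0.0, 1.0)
        rates.append(max(prev + step, floor))
    return rates

# Hypothetical parameter values, chosen only for illustration.
rates = simulate_ckls(1000, r0=0.05, a0=0.002, a1=-0.04, sigma=0.01, rho=0.5, seed=3)
ll_true = discretized_diffusion_loglik(rates, 0.0, 0.002, -0.04, 0.0, 0.01, 0.5)
ll_wrong = discretized_diffusion_loglik(rates, 0.0, 0.002, -0.04, 0.0, 0.03, 0.5)
```

The log-likelihood at the data-generating parameters should exceed the value obtained with a badly misspecified $\sigma$. A numerical optimizer applied to the negative of this function would deliver the estimator $\hat\theta$.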

3.2 GARCH Models

Despite the popularity of single-factor diffusion models, many studies (e.g., Brenner, Harjes, and Kroner 1996; Andersen and Lund 1997) have shown that these models are unreasonably restrictive by requiring that the interest rate volatility depend solely on the interest rate level. To capture the well-known persistent volatility clustering in interest rates, Brenner et al. (1996) introduced various GARCH specifications for volatility and showed that GARCH models significantly outperform single-factor diffusion models for in-sample fit.

To understand the importance of GARCH for density forecast, we consider six GARCH models, as listed in Table 1, which are nested by the following specification:

$$\begin{aligned}
\Delta r_t &= \alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} \sqrt{h_t}\, z_t, \\
h_t &= \beta_0 + h_{t-1}\bigl(\beta_2 + \beta_1 r_{t-1}^{2\rho} z_{t-1}^2\bigr), \\
\{z_t\} &\sim \text{iid } N(0,1).
\end{aligned} \qquad (2)$$

We consider three different drift specifications (zero, linear, and nonlinear drift) and two volatility specifications (pure GARCH and combined CEV–GARCH). These various GARCH models allow us to examine the importance of linear versus nonlinear drift in the presence of GARCH and CEV and the incremental contribution of GARCH with respect to CEV. For identification, we set $\sigma = 1$ in all GARCH models.
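To make the recursion in (2) concrete, the following sketch simulates the specification with $\sigma = 1$. The parameter values, the starting value `h0`, and the function name `simulate_cev_garch` are hypothetical choices made for illustration. The demonstration uses the pure GARCH case $\rho = 0$ with a linear drift; for $\rho \neq 0$, the sketch assumes that the simulated rate stays positive.

```python
import math
import random

def simulate_cev_garch(n, r0, h0, alphas, rho, beta0, beta1, beta2, seed=0):
    """Simulate specification (2) with sigma = 1.

    alphas = (alpha_{-1}, alpha_0, alpha_1, alpha_2) are the drift coefficients.
    Returns the rate path (length n + 1) and the conditional variance path (length n).
    """
    a_m1, a0, a1, a2 = alphas
    rng = random.Random(seed)
    rates, h_path = [r0], []
    h, z_prev = h0, None
    for _ in range(n):
        prev = rates[-1]
        if z_prev is not None:
            # h_t = beta0 + h_{t-1} * (beta2 + beta1 * r_{t-1}^{2 rho} * z_{t-1}^2)
            h = beta0 + h * (beta2 + beta1 * prev ** (2.0 * rho) * z_prev ** 2)
        z = rng.gauss(0.0, 1.0)
        inverse_term = a_m1 / prev if a_m1 != 0.0 else 0.0
        drift = inverse_term + a0 + a1 * prev + a2 * prev ** 2
        rates.append(prev + drift + prev ** rho * math.sqrt(h) * z)
        h_path.append(h)
        z_prev = z
    return rates, h_path

# Hypothetical linear drift GARCH example (rho = 0).
beta0, beta1, beta2 = 1e-6, 0.10, 0.85
rates, h_path = simulate_cev_garch(5000, r0=0.05, h0=2e-5,
                                   alphas=(0.0, 0.002, -0.04, 0.0), rho=0.0,
                                   beta0=beta0, beta1=beta1, beta2=beta2, seed=2)
mean_h = sum(h_path) / len(h_path)
unconditional_h = beta0 / (1.0 - beta1 - beta2)
```

When $\rho = 0$ the recursion reduces to a standard GARCH(1,1) process, so the sample mean of $h_t$ should be of the same order as $\beta_0/(1 - \beta_1 - \beta_2)$.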

Another popular approach to capturing volatility clustering isthe continuous-time stochastic volatility model considered byAndersen and Lund (1997) Gallant and Tauchen (1998) andmany others These studies show that adding a latent stochas-tic volatility factor to a diffusion model significantly improvesthe goodness of fit for interest rates We leave continuous-time stochastic volatility models for future research becausetheir estimation is much more involved On the other handwe also need to be careful about drawing any implicationsfrom GARCH models for continuous-time stochastic volatilitymodels Although Nelson (1990) showed that GARCH mod-els converge to stochastic volatility models in the limit Corradi(2000) later showed that Nelsonrsquos results hold only under hisparticular discretization scheme Other equally reasonable dis-cretizations will lead to very different continuous-time limitsfor GARCH models

3.3 Markov Regime-Switching Models

Many studies have shown that the behavior of spot rates changes over time due to changes in monetary policy, the business cycle, and general macroeconomic conditions. The Markov regime-switching models of Hamilton (1989) have been widely used to model the time-varying behavior of interest rates (e.g., Gray 1996; Ball and Torous 1998; Ang and Bekaert 2002; Li and Xu 2000; among others). Following most existing studies, we consider a class of two-regime models for the spot rate, where the latent state variable $s_t$ follows a two-state first-order Markov chain. We refer to the regime in which $s_t = 1$ (2) as the first (second) regime. Following Ang and Bekaert (2002),


Table 1. Spot Rate Models Considered for Density Forecast Evaluation

| Model | Mean | Volatility |
|---|---|---|
| Discretized single-factor diffusion models | | |
| Random walk | $\alpha_0$ | $\sigma$ |
| Log-normal | $\alpha_1 r_{t-1}$ | $\sigma r_{t-1}$ |
| Dothan | $0$ | $\sigma r_{t-1}$ |
| Pure CEV | $0$ | $\sigma r_{t-1}^{\rho}$ |
| Vasicek | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma$ |
| CIR | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{.5}$ |
| CKLS | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{\rho}$ |
| Nonlinear drift | $\alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma r_{t-1}^{\rho}$ |
| GARCH models | | |
| No drift GARCH | $0$ | $\sigma\sqrt{h_t}$ |
| Linear drift GARCH | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma\sqrt{h_t}$ |
| Nonlinear drift GARCH | $\alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma\sqrt{h_t}$ |
| No drift CEV–GARCH | $0$ | $\sigma r_{t-1}^{\rho}\sqrt{h_t}$ |
| Linear drift CEV–GARCH | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{\rho}\sqrt{h_t}$ |
| Nonlinear drift CEV–GARCH | $\alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma r_{t-1}^{\rho}\sqrt{h_t}$ |
| Markov regime-switching models | | |
| No drift RS CEV | $0$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)}$ |
| Linear drift RS CEV | $\alpha_0(s_t) + \alpha_1(s_t) r_{t-1}$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)}$ |
| Nonlinear drift RS CEV | $\alpha_{-1}(s_t)/r_{t-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)}$ |
| No drift RS GARCH | $0$ | $\sigma(s_t)\sqrt{h_t}$ |
| Linear drift RS GARCH | $\alpha_0(s_t) + \alpha_1(s_t) r_{t-1}$ | $\sigma(s_t)\sqrt{h_t}$ |
| Nonlinear drift RS GARCH | $\alpha_{-1}(s_t)/r_{t-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2$ | $\sigma(s_t)\sqrt{h_t}$ |
| No drift RS CEV–GARCH | $0$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)}\sqrt{h_t}$ |
| Linear drift RS CEV–GARCH | $\alpha_0(s_t) + \alpha_1(s_t) r_{t-1}$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)}\sqrt{h_t}$ |
| Nonlinear drift RS CEV–GARCH | $\alpha_{-1}(s_t)/r_{t-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)}\sqrt{h_t}$ |
| Discretized jump-diffusion models | | |
| No drift JD CEV | $0$ | $\sigma r_{t-1}^{\rho}$ |
| Linear drift JD CEV | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{\rho}$ |
| Nonlinear drift JD CEV | $\alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma r_{t-1}^{\rho}$ |
| No drift JD GARCH | $0$ | $\sigma\sqrt{h_t}$ |
| Linear drift JD GARCH | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma\sqrt{h_t}$ |
| Nonlinear drift JD GARCH | $\alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma\sqrt{h_t}$ |
| No drift JD CEV–GARCH | $0$ | $\sigma r_{t-1}^{\rho}\sqrt{h_t}$ |
| Linear drift JD CEV–GARCH | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{\rho}\sqrt{h_t}$ |
| Nonlinear drift JD CEV–GARCH | $\alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma r_{t-1}^{\rho}\sqrt{h_t}$ |

NOTE: For consistency, all models are estimated in a discrete-time setting using the maximum likelihood method (MLE). The eight discretized single-factor diffusion models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} z_t$, where $z_t \sim \text{iid } N(0,1)$. The six GARCH models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho}\sqrt{h_t}\,z_t$, where $h_t = \beta_0 + h_{t-1}(\beta_2 + \beta_1 r_{t-1}^{2\rho} z_{t-1}^2)$ and $z_t \sim \text{iid } N(0,1)$. The nine regime-switching models are nested by the following specification: $\Delta r_t = \alpha_{-1}(s_t)/r_{t-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2 + \sigma(s_t) r_{t-1}^{\rho(s_t)}\sqrt{h_t}\,z_t$, where $h_t = \beta_0 + \beta_1 E\{e_t \mid r_{t-2}, s_{t-2}\}^2 + \beta_2 h_{t-1}$, $e_t = [\Delta r_{t-1} - E(\Delta r_{t-1} \mid r_{t-2}, s_{t-1})]/\sigma(s_{t-1})$, $z_t \sim \text{iid } N(0,1)$, and $s_t$ follows a two-state Markov chain with the transition probability $\Pr(s_t = l \mid s_{t-1} = l) = (1 + \exp(-c_l - d_l r_{t-1}))^{-1}$ for $l = 1, 2$. The nine discretized jump-diffusion models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho}\sqrt{h_t}\,z_t + J(\mu, \gamma^2)\pi_t(q_t)$, where $h_t = \beta_0 + \beta_1[\Delta r_{t-1} - E(\Delta r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}$, $z_t \sim \text{iid } N(0,1)$, $J \sim N(\mu, \gamma^2)$, and $\pi_t(q_t)$ is Bernoulli($q_t$) with $q_t = (1 + \exp(-c - d r_{t-1}))^{-1}$.

we assume that the transition probability of $s_t$ depends on the once-lagged spot rate level:

$$
\Pr(s_t = l \mid s_{t-1} = l) = \frac{1}{1 + \exp(-c_l - d_l r_{t-1})}, \qquad l = 1, 2.
$$

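Table 1 lists a variety of regime-switching models, all of which are nested by the following specification:

$$
\begin{aligned}
\Delta r_t &= \frac{\alpha_{-1}(s_t)}{r_{t-1}} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2 + \sigma(s_t) r_{t-1}^{\rho(s_t)}\sqrt{h_t}\,z_t, \\
h_t &= \beta_0 + \beta_1 E\{e_t \mid r_{t-2}, s_{t-2}\}^2 + \beta_2 h_{t-1}, \\
e_t &= [\Delta r_{t-1} - E(\Delta r_{t-1} \mid r_{t-2}, s_{t-1})]/\sigma(s_{t-1}), \\
z_t &\sim \text{iid } N(0,1).
\end{aligned}
\tag{3}
$$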

As in the aforementioned models, we consider three specifications of the conditional mean: zero, linear, and nonlinear drift. We also consider three specifications of the conditional variance: CEV, GARCH, and combined CEV–GARCH. Thus we have a total of nine regime-switching models.

Although Gray (1996) removed the path-dependent nature of GARCH models by averaging the conditional and unconditional variances over regimes at every time point, we use the same GARCH specification across different regimes. Confirming the results of Ang and Bekaert (2002), we find that estimation of Gray's (1996) model is very unstable. In contrast, our specification turns out to have much better convergence properties. Unlike many previous studies that set the elasticity parameter equal to .5, we allow it to be regime-dependent and


estimate it from the data. For identification, we set the diffusion constant $\sigma(s_t) = 1$ for $s_t = 1$ in the regime-switching models with GARCH effect.

The conditional density of the interest rate $r_t$ in a regime-switching model is

$$
p(r_t \mid I_{t-1}) = \sum_{l=1}^{2} p(r_t \mid s_t = l, I_{t-1}) \Pr(s_t = l \mid I_{t-1}),
$$

where the ex ante probability that the data are generated from regime $l$ at $t$, $\Pr(s_t = l \mid I_{t-1})$, can be obtained using Bayes's rule via a recursive procedure described by Hamilton (1989). The conditional density of regime-switching models is a mixture of two normal distributions, which can generate unimodal or bimodal distributions and allows for great flexibility in modeling skewness, kurtosis, and heavy tails.
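The recursion behind this mixture density can be sketched as follows. This is an illustrative simplification, not the authors' implementation: it covers only a linear drift RS CEV model (no GARCH term, so there is no path dependence), and the function names, the parameter layout, and the flat prior on the initial regime are assumptions made for the example.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def logistic(x):
    # Numerically stable form of 1 / (1 + exp(-x)).
    return 1.0 / (1.0 + math.exp(-x)) if x >= 0 else math.exp(x) / (1.0 + math.exp(x))


def rs_predictive_densities(r, params, c, d, prior=(0.5, 0.5)):
    """Return p(r_t | I_{t-1}) for t = 1, ..., len(r) - 1.

    params[l] = (alpha_0, alpha_1, sigma, rho) for regime l in {0, 1};
    c[l], d[l] drive Pr(s_t = l | s_{t-1} = l) = logistic(c[l] + d[l] * r_{t-1}).
    """
    post = list(prior)  # Pr(s_{t-1} = l | I_{t-1})
    densities = []
    for t in range(1, len(r)):
        r_prev = r[t - 1]
        stay = [logistic(c[l] + d[l] * r_prev) for l in (0, 1)]
        # Ex ante regime probabilities Pr(s_t = l | I_{t-1}).
        pred0 = post[0] * stay[0] + post[1] * (1.0 - stay[1])
        pred = [pred0, 1.0 - pred0]
        lik = []
        for l in (0, 1):
            a0, a1, sigma, rho = params[l]
            mean = r_prev + a0 + a1 * r_prev
            var = (sigma * r_prev ** rho) ** 2
            lik.append(normal_pdf(r[t], mean, var))
        p = pred[0] * lik[0] + pred[1] * lik[1]  # mixture of two normals
        densities.append(p)
        # Bayes update to Pr(s_t = l | I_t).
        post = [pred[l] * lik[l] / p for l in (0, 1)]
    return densities
```

Summing the logarithms of the returned values gives the log-likelihood that an optimizer would maximize.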

3.4 Jump-Diffusion Models

There are compelling economic and statistical reasons to account for discontinuity in interest rate dynamics. Various economic shocks, news announcements, and government interventions in bond markets have pronounced effects on the behavior of interest rates and tend to generate large jumps in interest rate data. Statistically, Das (2002) and Johannes (2003), among others, have shown that diffusion models (even with stochastic volatility) cannot generate the excessive leptokurtosis exhibited by the changes of the spot rates, and that jump-diffusion models are a convenient way to generate excessive kurtosis or, more generally, heavy tails.

We consider a class of discretized jump-diffusion models listed in Table 1. As in the aforementioned models, we consider zero, linear, and nonlinear drift specifications, and CEV, GARCH, and combined CEV–GARCH specifications for volatility. The nine different jump-diffusion models are nested by the following specification:

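$$
\begin{aligned}
\Delta r_t &= \frac{\alpha_{-1}}{r_{t-1}} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho}\sqrt{h_t}\,z_t + J(\mu, \gamma^2)\,\pi_t(q_t), \\
h_t &= \beta_0 + \beta_1[\Delta r_{t-1} - E(\Delta r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}, \\
z_t &\sim \text{iid } N(0,1), \\
\pi_t(q_t) &\sim \text{independent Bernoulli}(q_t), \\
J &\sim N(\mu, \gamma^2),
\end{aligned}
\tag{4}
$$

where $J$ is the jump size and $q_t$ is the jump probability, with $q_t = \dfrac{1}{1 + \exp(-c - d r_{t-1})}$.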

The conditional density of the foregoing jump-diffusion models can be written as

$$
\begin{aligned}
f(r_t \mid r_{t-1}) ={}& (1 - q_t)\,\frac{1}{\sqrt{2\pi\sigma^2(r_{t-1})}} \exp\left\{-\frac{[r_t - \mu(r_{t-1})]^2}{2\sigma^2(r_{t-1})}\right\} \\
&+ q_t\,\frac{1}{\sqrt{2\pi[\sigma^2(r_{t-1}) + \gamma^2]}} \exp\left\{-\frac{[r_t - \mu(r_{t-1}) - \mu]^2}{2[\sigma^2(r_{t-1}) + \gamma^2]}\right\},
\end{aligned}
$$

where $\mu(r_{t-1})$ and $\sigma^2(r_{t-1})$ are the conditional mean and variance of the diffusion part in (4). Similar to the regime-switching models, the conditional density of the jump-diffusion models is also a mixture of two normal distributions. Thus these two classes of models represent two alternative approaches to generating excess kurtosis and heavy tails, although the regime-switching models have more sophisticated specifications. For example, in (3) all drift parameters are regime-dependent, whereas in (4) only the intercept term is different in the conditional mean and variance. For the regime-switching models in (3), the state probabilities evolve according to a transition matrix with updated priors at every time point. Thus they can capture both very persistent and transient regime shifts. For the jump-diffusion models in (4), the state probabilities are assumed to depend on the past interest rate level.
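As a rough illustration of this two-component mixture, the sketch below evaluates the jump-diffusion conditional density for given values of the diffusion mean and variance. The function and argument names are hypothetical. The caller is assumed to supply `diff_mean` and `diff_var`, that is, $\mu(r_{t-1})$ and $\sigma^2(r_{t-1})$, computed from whichever drift and volatility specification is in use.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def jump_probability(r_prev, c, d):
    # q_t = 1 / (1 + exp(-c - d * r_{t-1}))
    return 1.0 / (1.0 + math.exp(-c - d * r_prev))


def jd_density(r_t, r_prev, diff_mean, diff_var, mu_j, gamma, c, d):
    """Mixture density: no jump with probability 1 - q_t, one jump with probability q_t."""
    q = jump_probability(r_prev, c, d)
    no_jump = normal_pdf(r_t, diff_mean, diff_var)
    # A jump shifts the mean by mu_j and adds gamma^2 to the variance.
    jump = normal_pdf(r_t, diff_mean + mu_j, diff_var + gamma ** 2)
    return (1.0 - q) * no_jump + q * jump
```

Because the jump component has the larger variance, even a small `q` thickens the tails relative to a single normal density.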

The in-sample performances of the four classes of models that we study have been individually studied in the literature. However, no one has yet attempted a systematic evaluation of all of these models in a unified setup, particularly in the out-of-sample context, because of the difficulty in density forecast evaluation and because of different nonnested specifications between existing classes of models. In the next section we compare the relative performance of these four classes of models in out-of-sample density forecast using the evaluation method described in Section 2. This comparison can provide valuable information about the relative strengths and weaknesses of each model and should be important for many applications.

4. MODEL ESTIMATION AND IN-SAMPLE PERFORMANCE

4.1 Data and Estimation Method

In modeling spot interest rate dynamics, yields on short-term debts are often used as proxies for the unobservable instantaneous risk-free rate. These include the 1-month T-bill rates used by Gray (1996) and Chan, Karolyi, Longstaff, and Sanders (CKLS) (1992); the 3-month T-bill rates used by Stanton (1997) and Andersen and Lund (1997); the 7-day Eurodollar rates used by Ait-Sahalia (1996) and Hong and Li (2004); and the Fed funds rates used by Conley, Hansen, Luttmer, and Scheinkman (1997) and Das (2002). In our study we follow CKLS (1992) and use the daily 1-month T-bill rates from June 14, 1961 to December 29, 2000, with a total of 9,868 observations. The data are extracted from the CRSP bond file using the midpoint of quoted bid and ask prices of the T-bills with a remaining maturity that is closest to 1 month (30 calendar days) from the current date. Most of these T-bills have maturities of 6 months or 1 year when first issued, and all T-bills used in our study have a remaining maturity between 27 and 33 days. We then compute the annualized continuously compounded yield based on the average of the ask and bid prices.

Figure 1 plots the level and change series of the daily 1-month T-bill rates, as well as their histograms. There is obvious persistent volatility clustering, and in general the volatility is higher at a higher level of the interest rate (e.g., the 1979–1982 period). The marginal distribution of the interest rate level is skewed to the right, with a long right tail. Most daily changes in the interest rate level are very small, giving a


Figure 1. Daily 1-Month T-Bill Rates Between June 14, 1961 and December 29, 2000. This figure plots the level and change series of the daily 1-month T-bill rates, as well as their histograms. The data are extracted from the CRSP bond file using the midpoint of quoted bid and ask prices of the T-bills with a remaining maturity that is closest to 1 month (30 calendar days) from the current date.

sharp probability peak around 0. Daily changes of other spot interest rates, such as 3-month T-bill and 7-day Eurodollar rates, also exhibit a high peak around 0. But the peak is much less pronounced for weekly changes in spot rates. This, together with the long (right) tail, implies excess kurtosis, a stylized fact that has motivated the use of jump models in the literature (e.g., Das 2002; Johannes 2003).

We divide our data into two subsamples. The first, from June 14, 1961 to February 22, 1991 (with a total of 7,400 observations), is a sample used to estimate model parameters; the second, from February 23, 1991 to December 29, 2000 (with a total of 2,467 observations), is a prediction sample used to evaluate out-of-sample density forecasts. We first consider the in-sample performance of the various models listed in Table 1, all of which are estimated using the MLE method. The optimization algorithm is the well-known BHHH, with STEPBT for step length calculation, and is implemented via the constrained optimization code in GAUSS Windows Version 5.0. The optimization tolerance level is set such that the gradients of the parameters are less than or equal to $10^{-6}$.
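The estimation in the paper relies on numerical optimization (BHHH in GAUSS), which is not reproduced here. As a much simpler illustration of discrete-time MLE, consider the discretized Vasicek model of Table 1 without the period dummies: with Gaussian errors and constant volatility, the MLE of $(\alpha_0, \alpha_1)$ coincides with least squares of $\Delta r_t$ on $r_{t-1}$, and the implied long-run mean is $-\alpha_0/\alpha_1$. The function name `fit_vasicek` is hypothetical, and the sketch assumes the input series is not constant.

```python
import math


def fit_vasicek(r):
    """Gaussian MLE of Delta r_t = alpha_0 + alpha_1 * r_{t-1} + sigma * z_t.

    Returns (alpha_0, alpha_1, sigma); sigma uses the MLE divisor n.
    """
    x = r[:-1]
    y = [r[i + 1] - r[i] for i in range(len(r) - 1)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    alpha_1 = sxy / sxx
    alpha_0 = mean_y - alpha_1 * mean_x
    residuals = [yi - alpha_0 - alpha_1 * xi for xi, yi in zip(x, y)]
    sigma = math.sqrt(sum(e * e for e in residuals) / n)
    return alpha_0, alpha_1, sigma


def long_run_mean(alpha_0, alpha_1):
    # Level at which the linear drift alpha_0 + alpha_1 * r equals zero.
    return -alpha_0 / alpha_1
```

Models with level effects, GARCH, regime switching, or jumps have no such closed form, which is why a numerical optimizer is needed for them.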

Tables 2–5 report parameter estimates of four classes of spot rate models using daily 1-month T-bill rates from June 14, 1961 to February 22, 1991, with a total of 7,400 observations. To account for the special period between October 1979 and September 1982, we introduce dummy variables to the drift, volatility, and elasticity parameters; that is, $\alpha_D$, $\sigma_D$, and $\rho_D$ equal 0 outside of the 1979–1982 period. We also introduce a dummy variable, $D_{87}$, in the interest rate level to account for the effect of the 1987 stock market crash; $D_{87}$ equals 0 except for the week right after the stock market crash. Estimated robust standard errors are reported in parentheses.

4.2 In-Sample Evidence

We first discuss the in-sample performance of models within each class and then compare their performance across different classes. Many previous studies (e.g., Bliss and Smith 1998) have shown that the period between October 1979 and September 1982, which coincides with the Federal Reserve experiment, has a big impact on model parameter estimates. To account for the special feature of this period, we introduce dummy variables in the drift, volatility, and elasticity parameters for this period. We also introduce a dummy variable in the interest rate level for the week right after the 1987 stock market crash. Brenner et al. (1996) showed that the decline of the interest rates during this week is too dramatic to be captured by most standard models.

Table 2 reports parameter estimates with estimated robust standard errors and log-likelihood values for discretized single-factor diffusion models. The estimates of the drift parameters of the Vasicek, CIR, and CKLS models all suggest mean reversion in the conditional mean, with an estimated 6% long-run mean. For other models, such as the random walk, log-normal, and nonlinear drift models, most drift parameters are not significant. A comparison of the pure CEV, CKLS, and Ait-Sahalia (1996) nonlinear drift models indicates that the incremental contribution of nonlinear drift is very marginal, which is not surprising given that the drift parameter estimates in the nonlinear drift model are mostly insignificant. On the other hand, there is also clear evidence of a level effect: all estimates of the elasticity parameter are significant. Unlike previous studies (e.g., CKLS 1992), which found an estimated elasticity parameter close to 1.5, our estimate is about .25 (.70) outside (inside) the 1979–1982 period. Our results confirm previous findings of Brenner et al. (1996), Andersen and Lund (1997), Bliss and Smith (1998), and Koedijk, Nissen, Schotman, and Wolff (1997) that the estimate of the elasticity parameter is very sensitive to the choice of interest rate data, data frequency, sample periods, and specifications of the volatility function. The 1979–1982 period dummies suggest that the drift does not behave very differently (i.e., $\alpha_D$ is not significant), whereas volatility is significantly higher and depends more heavily on


Table 2. Parameter Estimates for the Single-Factor Diffusion Models

| Parameters | RW | Log-normal | Dothan | Pure CEV | Vasicek | CIR | CKLS | Nonlinear drift |
|---|---|---|---|---|---|---|---|---|
| $\alpha_{-1}$ | | | | | | | | .0289 (.0751) |
| $\alpha_0$ | .0012 (.0019) | | | | .0196 (.0056) | .0190 (.0049) | .0199 (.0053) | -.0098 (.0485) |
| $\alpha_1$ | | .0006 (.0004) | | | -.0033 (.0010) | -.0032 (.0009) | -.0034 (.0009) | .0045 (.0095) |
| $\alpha_2$ | | | | | | | | -.0006 (.0006) |
| $\sigma$ | .1523 (.0013) | .0304 (.0003) | .0304 (.0003) | .1011 (.0041) | .1522 (.0013) | .0658 (.0006) | .1007 (.0041) | .1007 (.0041) |
| $\rho$ | | | | .2437 (.0238) | | | .2460 (.0238) | .2460 (.0238) |
| $\alpha_D$ | -.0047 (.0157) | .0196 (.0186) | | | .0154 (.0168) | .0233 (.0176) | .0266 (.0168) | .0402 (.0202) |
| $\sigma_D$ | .2768 (.0112) | .0175 (.0013) | .0175 (.0013) | | .2764 (.0112) | .0712 (.0036) | | |
| $\rho_D$ | | | | .3692 (.0132) | | | .3684 (.0132) | .3680 (.0132) |
| $D_{87}$ | -.3988 (.0681) | -.2107 (.0660) | -.2079 (.0660) | -.3549 (.0671) | -.4003 (.0681) | -.3101 (.0655) | -.3574 (.0670) | -.3582 (.0671) |
| Log-likelihood | 2,649.6 | 2,187.0 | 2,184.9 | 2,654.5 | 2,655.5 | 2,672.0 | 2,661.7 | 2,662.5 |

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (\sigma + \sigma_D) r_{t-1}^{(\rho + \rho_D)} z_t + D_{87}$, where $z_t \sim \text{iid } N(0,1)$.

the interest rate level in this period (i.e., $\sigma_D$ and $\rho_D$ are significantly positive) than in other periods. For models with level effect, we introduce the dummy only for elasticity parameter $\rho$ to avoid the offsetting effect of dummies in $\rho$ and $\sigma$, which would tend to generate unstable estimates. The stock market crash dummy is significantly negative, which is consistent with the findings of Brenner et al. (1996).

Estimation results of GARCH models in Table 3 show that GARCH significantly improves the in-sample fit of diffusion models (i.e., the log-likelihood value increases from about 2,500 to about 5,700). Previous studies, such as those by Bali (2001) and Durham (2003), have also shown that for single-factor diffusion, GARCH, and continuous-time stochastic volatility models, it is more important to correctly model the diffusion function than the drift function in fitting interest rate data. This suggests that volatility clustering is an important feature of interest rates. All estimates of GARCH parameters are overwhelmingly significant. The sum of GARCH parameter estimates, $\beta_1 + \beta_2$, is slightly larger (smaller) than 1 without (with) level effect. Although the sum of GARCH parameters is

Table 3. Parameter Estimates for the GARCH Models

| Parameters | No drift GARCH | Linear drift GARCH | Nonlinear drift GARCH | No drift CEV–GARCH | Linear drift CEV–GARCH | Nonlinear drift CEV–GARCH |
|---|---|---|---|---|---|---|
| $\alpha_{-1}$ | | | .1556 (.0470) | | | .1716 (.0457) |
| $\alpha_0$ | | .0053 (.0025) | -.1117 (.0323) | | .0063 (.0025) | -.1233 (.0311) |
| $\alpha_1$ | | -.0003 (.0006) | .0263 (.0069) | | -.0006 (.0006) | .0290 (.0066) |
| $\alpha_2$ | | | -.0018 (.0004) | | | -.0020 (.0004) |
| $\rho$ | | | | .1709 (.0298) | .1797 (.0301) | .1926 (.0300) |
| $\beta_0$ | 9.7E-05 (1.2E-05) | 8.6E-05 (1.2E-05) | 8.6E-05 (1.6E-05) | 8.9E-05 (1.0E-05) | 8.2E-05 (.9E-05) | 8.2E-05 (.9E-05) |
| $\beta_1$ | .1552 (.0097) | .1544 (.0096) | .1563 (.0099) | .0896 (.0104) | .0875 (.0101) | .0851 (.0098) |
| $\beta_2$ | .8707 (.0065) | .8723 (.0064) | .8712 (.0065) | .8583 (.0074) | .8582 (.0076) | .8553 (.0078) |
| $\alpha_D$ | | .0166 (.0318) | .1119 (.0229) | | .0550 (.0282) | .1317 (.0201) |
| $\sigma_D$ | .1368 (.0327) | .1302 (.0331) | .1017 (.0334) | | | |
| $\rho_D$ | | | | .0151 (.0134) | .0064 (.0146) | -.0053 (.0142) |
| $D_{87}$ | -.5397 (.0774) | -.5426 (.0777) | -.5460 (.0771) | -.5288 (.0753) | -.5318 (.0751) | -.5352 (.0741) |
| Log-likelihood | 5,686.8 | 5,699.5 | 5,706.3 | 5,701.1 | 5,715.2 | 5,725.5 |

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (1 + \sigma_D) r_{t-1}^{(\rho + \rho_D)}\sqrt{h_t}\,z_t + D_{87}$, where $h_t = \beta_0 + h_{t-1}(\beta_2 + \beta_1 r_{t-1}^{2\rho} z_{t-1}^2)$ and $z_t \sim \text{iid } N(0,1)$.


Table 4. Parameter Estimates for the Markov Regime-Switching Models

| Parameters | No drift CEV | Linear drift CEV | Nonlinear drift CEV | No drift GARCH | Linear drift GARCH | Nonlinear drift GARCH | No drift CEV–GARCH | Linear drift CEV–GARCH | Nonlinear drift CEV–GARCH |
|---|---|---|---|---|---|---|---|---|---|
| $\alpha_{-1}(1)$ | | | .0607 (.3569) | | | .3321 (.1701) | | | .2646 (.1605) |
| $\alpha_0(1)$ | | .0973 (.0259) | .1517 (.2027) | | .0457 (.0107) | -.1462 (.1009) | | .0450 (.0105) | -.1086 (.0989) |
| $\alpha_1(1)$ | | -.0102 (.0031) | -.0288 (.0321) | | -.0043 (.0019) | .0274 (.0181) | | -.0042 (.0019) | .0209 (.0185) |
| $\alpha_2(1)$ | | | .0011 (.0014) | | | -.0015 (.0010) | | | -.0011 (.0011) |
| $\sigma(1)$ | .3361 (.0302) | .3171 (.0287) | .3106 (.0281) | 1 | 1 | 1 | 1 | 1 | 1 |
| $\rho(1)$ | .1021 (.044) | .1277 (.0446) | .138 (.0448) | 0 | 0 | 0 | .1193 (.0452) | .1589 (.0460) | .1670 (.0487) |
| $\alpha_{-1}(2)$ | | | -.0352 (.0329) | | | -.0267 (.0242) | | | -.0286 (.0183) |
| $\alpha_0(2)$ | | -.0002 (.0020) | .0141 (.0229) | | -.0013 (.0017) | .0062 (.0153) | | -.0020 (.0018) | .0062 (.0104) |
| $\alpha_1(2)$ | | -.0006 (.0004) | -.0012 (.0048) | | -.0008 (.0004) | -.0001 (.0031) | | -.0007 (.0004) | .0005 (.0020) |
| $\alpha_2(2)$ | | | -.0001 (.0003) | | | -.0002 (.0002) | | | -.0002 (.0001) |
| $\sigma(2)$ | .0143 (.0010) | .0141 (.0010) | .0138 (.0010) | .2566 (.006) | .2567 (.006) | .2574 (.006) | .2769 (.0275) | .3034 (.0303) | .3051 (.0311) |
| $\rho(2)$ | .8122 (.0403) | .8179 (.0422) | .8257 (.0421) | 0 | 0 | 0 | .0844 (.0602) | .0727 (.0602) | .0800 (.0611) |
| $\beta_0$ | | | | 3.83E-4 (7.11E-5) | 3.18E-4 (6.14E-5) | 2.96E-4 (5.79E-5) | 3.21E-4 (.67E-4) | 2.43E-4 (.50E-4) | 2.25E-4 (.47E-4) |
| $\beta_1$ | | | | .0357 (.004) | .0336 (.0039) | .0335 (.0038) | .0265 (.0057) | .0259 (.0055) | .0252 (.0053) |
| $\beta_2$ | | | | .9027 (.009) | .9066 (.0089) | .9077 (.0086) | .8972 (.0099) | .8993 (.0098) | .9001 (.0094) |
| $c_1$ | .3132 (.2734) | .3001 (.273) | .3171 (.2683) | -1.2515 (.3211) | -1.2007 (.3065) | -1.1293 (.3136) | -1.1954 (.3819) | -1.1429 (.3705) | -1.0012 (.3674) |
| $d_1$ | .1081 (.0391) | .107 (.039) | .1029 (.0381) | .1303 (.044) | .1295 (.0426) | .1192 (.0447) | .1190 (.0594) | .1193 (.0582) | .0959 (.0593) |
| $c_2$ | 4.0963 (.2386) | 4.0342 (.2393) | 4.0089 (.2313) | 1.8864 (.2048) | 1.7868 (.1947) | 1.7633 (.1909) | 1.8227 (.2092) | 1.7299 (.1983) | 1.6947 (.2009) |
| $d_2$ | -.2302 (.0354) | -.225 (.0354) | -.2225 (.0339) | -.0938 (.0276) | -.0806 (.0263) | -.0776 (.0257) | -.0906 (.0285) | -.0810 (.0276) | -.0757 (.0283) |
| Log-likelihood | 6,493.1 | 6,507.9 | 6,512.8 | 7,079.0 | 7,132.8 | 7,138.2 | 7,082.8 | 7,138.6 | 7,144.1 |

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}(s_t)/r_{t-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2 + \sigma(s_t) r_{t-1}^{\rho(s_t)}\sqrt{h_t}\,z_t$, where $h_t = \beta_0 + \beta_1 E\{e_t \mid r_{t-2}, s_{t-2}\}^2 + \beta_2 h_{t-1}$, $e_t = [\Delta r_{t-1} - E(\Delta r_{t-1} \mid r_{t-2}, s_{t-1})]/\sigma(s_{t-1})$, $z_t \sim \text{iid } N(0,1)$, and $s_t$ follows a two-state Markov chain with transition probability $\Pr(s_t = l \mid s_{t-1} = l) = (1 + \exp(-c_l - d_l r_{t-1}))^{-1}$ for $l = 1, 2$.

slightly larger than 1, it is still possible that the spot rate model is strictly stationary (see Nelson 1991 for more discussion). We also find a significant level effect even in the presence of GARCH, but the estimated elasticity parameter is close to .20, which is much smaller than that of single-factor diffusion models. The specification of conditional variance also affects the estimation of drift parameters. Unlike those in diffusion models, most drift parameter estimates in linear drift GARCH models are insignificant. Among all GARCH models considered, the one with nonlinear drift and level effect has the best in-sample performance. In the 1979–1982 period, the interest rate volatility is significantly higher, but its dependence on the interest rate level is not significantly different from other periods. The 1987 stock market crash dummy is again significantly negative.

Parameter estimates of regime-switching models given in Table 4 show that the spot rate behaves quite differently between two regimes. For models with a linear drift, in the first regime the spot rate has a high long-run mean (about 10%) and exhibits strong mean reversion (i.e., estimates of the mean-reversion speed parameter are significantly negative). The spot rate in the second regime behaves almost like a random walk, because most drift parameter estimates are close to 0 and insignificant. For models with a nonlinear drift, all drift parameters are insignificant. Volatility in the first regime is much higher, about three times that in the second regime. Our estimates show that level effect, although significant in both regimes, is much stronger in the second regime in the absence of GARCH. After including GARCH, the elasticity parameter estimate becomes insignificant in the second regime but remains the same in the first regime. Apparently, regime switching helps capture volatility clustering: the sum of GARCH parameters, $\beta_1 + \beta_2$, is smaller in regime-switching models than in pure GARCH models. The estimated transition probabilities of the Markov state variable $s_t$ show that the low-volatility regime is much more persistent than the high-volatility regime. Our results suggest that the spot rate evolves like a random walk with low volatility most of the time, occasionally increasing and evolving with strong mean reversion and high volatility. Overall, the regime-switching model with a linear drift in each regime, CEV, and GARCH has the best in-sample performance.


Table 5. Parameter Estimates for the Jump-Diffusion Models

| Parameters | No drift CEV | Linear drift CEV | Nonlinear drift CEV | No drift GARCH | Linear drift GARCH | Nonlinear drift GARCH | No drift CEV–GARCH | Linear drift CEV–GARCH | Nonlinear drift CEV–GARCH |
|---|---|---|---|---|---|---|---|---|---|
| $\alpha_{-1}$ | | | .0264 (.0293) | | | .0702 (.0384) | | | .0661 (.0380) |
| $\alpha_0$ | | -.0011 (.0169) | -.0340 (.0203) | | .0028 (.0019) | -.0506 (.0256) | | .0026 (.0020) | -.0486 (.0253) |
| $\alpha_1$ | | -.0007 (.0004) | .0101 (.0043) | | -.0013 (.0004) | .0110 (.0053) | | -.0012 (.0004) | .0107 (.0052) |
| $\alpha_2$ | | | -.0009 (.0003) | | | -.0008 (.0003) | | | -.0008 (.0003) |
| $\sigma$ | .0127 (.0008) | .0126 (.0008) | .0124 (.0008) | | | | | | |
| $\rho$ | .8936 (.0390) | .8975 (.0397) | .9045 (.0400) | | | | .0929 (.0518) | .0852 (.0510) | .0837 (.0512) |
| $\beta_0$ | | | | 8.7E-05 (1.2E-05) | 8.8E-05 (1.2E-05) | 8.7E-05 (1.2E-05) | 7.6E-5 (1.2E-5) | 7.7E-5 (1.2E-5) | 7.7E-5 (1.2E-5) |
| $\beta_1$ | | | | .0631 (.0076) | .0640 (.0073) | .0635 (.0073) | .0450 (.0101) | .0472 (.0102) | .0472 (.0102) |
| $\beta_2$ | | | | .8514 (.0139) | .8484 (.0134) | .8499 (.0134) | .8504 (.0140) | .8471 (.0135) | .8483 (.0134) |
| $c$ | -2.7404 (.1608) | -2.7057 (.1605) | -2.6972 (.1614) | -3.7908 (.1982) | -3.7470 (.1970) | -3.7481 (.1976) | -3.6908 (.2097) | -3.6498 (.2084) | -3.6518 (.2093) |
| $d$ | .1451 (.0275) | .1423 (.0275) | .1397 (.0275) | .2554 (.0307) | .2516 (.0305) | .2509 (.0307) | .2293 (.0343) | .2274 (.0340) | .2272 (.0341) |
| $\mu$ | .0233 (.0133) | .0385 (.0142) | .0321 (.0128) | .0700 (.0152) | .0887 (.0158) | .0895 (.0158) | .0679 (.0162) | .0892 (.0164) | .0897 (.0163) |
| $\gamma$ | .4316 (.0117) | .4279 (.0120) | .4304 (.0104) | .3928 (.0197) | .3903 (.0189) | .3929 (.0187) | .4109 (.0199) | .4046 (.0182) | .4061 (.0178) |
| $\alpha_D$ | | -.0094 (.0121) | .0296 (.0112) | | -.0192 (.0117) | .0057 (.0154) | | -.0197 (.0112) | .0055 (.0150) |
| $\sigma_D$ | | | | -.1027 (.1142) | -.1328 (.1054) | -.1551 (.1033) | | | |
| $\rho_D$ | .1473 (.1079) | .1872 (.0703) | -.0697 (.0722) | | | | -.1220 (.0769) | -.1186 (.0573) | -.1277 (.0564) |
| $c_D$ | 4.1604 (.6523) | 4.3364 (.6535) | 4.0036 (.5477) | 4.0290 (.7495) | 4.2505 (.7546) | 4.1857 (.7421) | 3.8622 (.6911) | 4.0677 (.7129) | 4.0272 (.7019) |
| $d_D$ | -.2441 (.0808) | -.2721 (.0730) | -.1836 (.0534) | -.2748 (.0689) | -.2860 (.0682) | -.2791 (.0667) | -.2320 (.0685) | -.2487 (.0676) | -.2445 (.0662) |
| $D_{87}$ | -.4283 (.0646) | -.4253 (.0641) | -.4266 (.0643) | -.4226 (.0826) | -.4198 (.0820) | -.4217 (.0815) | -.4233 (.0793) | -.4209 (.0793) | -.4225 (.0791) |
| Log-likelihood | 6,306.3 | 6,317.2 | 6,326.3 | 6,794.1 | 6,810.8 | 6,813.9 | 6,796.3 | 6,812.9 | 6,816.1 |

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (\sigma + \sigma_D) r_{t-1}^{(\rho + \rho_D)}\sqrt{h_t}\,z_t + J(\mu, \gamma^2)\pi_t(q_t + q_D) + D_{87}$, where $h_t = \beta_0 + \beta_1[\Delta r_{t-1} - E(\Delta r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}$, $z_t \sim \text{iid } N(0,1)$, $J \sim N(\mu, \gamma^2)$, and $\pi_t(q_t + q_D)$ is Bernoulli($q_t + q_D$) with $q_t + q_D = (1 + \exp(-c - c_D - (d + d_D) r_{t-1}))^{-1}$.

Table 5 reports parameter estimates for discretized jump-diffusion models. There is some weak evidence of mean reversion, especially when there is GARCH. The contribution of nonlinear drift is again very marginal; most estimated drift parameters are insignificant. For jump-diffusion models without GARCH, the elasticity parameter estimate is about .9, which is closer to the 1.5 found by CKLS. However, level effect is weakened (i.e., $\rho$ becomes close to .1) after GARCH is introduced. It seems that CEV helps capture part of volatility clustering, but its importance diminishes in the presence of GARCH. We find that GARCH also significantly improves the performance of jump-diffusion models. Without GARCH, there is a high probability of small jumps, with an estimated mean jump size between 2 and 4. With GARCH, the jump probability becomes smaller, but the mean jump size increases to between 7 and 9. This suggests that without GARCH, jumps can capture part of volatility clustering, but with GARCH, jumps mainly capture large interest rate movements. Of course, jumps also help explain volatility clustering: the sum of GARCH parameters is again much smaller than that in pure GARCH models. During the 1979–1982 period, other than the probability of jumps becoming substantially higher, other aspects of the jump-diffusion models (e.g., the drift, volatility, or elasticity parameter) do not behave very differently.

To sum up, our in-sample analysis reveals some important stylized facts for the spot rate:

1. The importance of modeling mean reversion in mean is ambiguous. It seems that linear drift is adequate for this purpose once GARCH, regime switching, or jumps are included. The contribution of Ait-Sahalia's (1996) type of nonlinear drift is very marginal.

2. It is important to model conditional heteroscedasticity through GARCH or level effect, although GARCH seems to have much better in-sample performance than CEV.

3. Regime switching and jumps help capture volatility clustering and especially the excess kurtosis and heavy tails of the interest rate.

4. Interest rates behave quite differently during the 1979–1982 period. The level and volatility of the interest rate, the dependence of the interest rate volatility on the interest rate level, and the probability of jumps are much higher during this period.


5. OUT-OF-SAMPLE DENSITY FORECAST PERFORMANCE

The foregoing in-sample analysis demonstrates that modeling volatility clustering through GARCH, and heavy tails and excess kurtosis through regime switching or jumps, significantly improves the in-sample fit of historical interest rate data. However, it is not clear whether these models will also perform well in out-of-sample density forecasts for future interest rates. Indeed, some previous studies have shown that more complicated models actually underperform the simple random walk model in predicting the conditional mean of future interest rates (e.g., Duffee 2002). In many financial applications, we are interested in forecasting the whole conditional distribution. We now apply the generalized spectral evaluation method described in Section 2 to evaluate density forecasts of the models under study. Among other things, we examine whether the features found to be important for in-sample performance remain important for out-of-sample forecasts, and whether the best in-sample performing models still perform best in out-of-sample forecasts.

For each model, we first calculate $Z_t(\hat{\theta})$, the model generalized residuals, using the prediction sample, with model parameter estimates based on the estimation sample. Table 6 reports the out-of-sample evaluation statistic $M_1$ for each model. To compute $M_1$, we select a normal cdf, $\Phi(\sqrt{12}\,u)$, for the weighting function $W(u)$ and a Bartlett kernel for the kernel function $k(z)$. To choose a lag order $p$, we use a data-driven method that minimizes the asymptotic integrated mean squared error of the modified generalized spectral density. This method involves choosing a preliminary lag order $\bar{p}$. We examine a wide range of $\bar{p}$ in our application. We obtain similar results for the preliminary lag order $\bar{p}$ between 10 and 30 and report results for $\bar{p} = 20$ in Table 6. For comparison, we also report the in-sample log-likelihood values obtained from the estimation sample.
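The first step, computing the generalized residuals, can be sketched under one assumption: that $Z_t$ is the probability integral transform of $r_t$ under the model's one-step-ahead density, the usual construction in density forecast evaluation and presumably the one defined in Section 2. For the regime-switching and jump-diffusion models, whose conditional densities are mixtures of two normals, the transform is a weighted sum of normal cdfs. The function names below are hypothetical, and the $M_1$ statistic itself is not implemented.

```python
import math


def normal_cdf(x, mean, var):
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def mixture_pit(r_t, weights, means, variances):
    """Probability integral transform of r_t under a normal-mixture forecast density.

    weights, means, variances describe the mixture components implied by the
    model at time t - 1; the weights are assumed to sum to 1.
    """
    return sum(w * normal_cdf(r_t, m, v)
               for w, m, v in zip(weights, means, variances))
```

If the forecast density is correctly specified, the resulting series should behave like independent draws from the uniform distribution on [0, 1], which is the property a density forecast evaluation checks.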

The $M_1$ statistics show that all single-factor diffusion models are overwhelmingly rejected. One of the most interesting findings is that models that include a drift term (either linear or nonlinear) have much worse out-of-sample performance than those without a drift. Dothan's (1978) model and the pure CEV model have the best density forecast performance. Both models have a zero drift and level effect; the elasticity parameter is $\rho = 1$ for the former and .25 (.60 for the 1979–1982 period) for the latter. On the other hand, although the CKLS model and Ait-Sahalia's (1996) nonlinear drift model have the highest in-sample likelihood values, they have the worst out-of-sample density forecasts in terms of $M_1$. This seems to suggest that for the purpose of density forecast, it is much more important to model the diffusion function than the drift function. Of course, this does not necessarily mean that the drift is not important.

Table 6. Out-of-Sample Density Forecast Performance of Spot Interest Rate Models

Model                              M1          Log-likelihood

Single-factor diffusion models
  Random walk                      0.108***    26496
  Log-normal                       0.121***    21870
  Dothan                           0.067**     21849
  Pure CEV                         0.082**     26545
  Vasicek                          0.213***    26555
  CIR                              0.219***    26720
  CKLS                             0.219***    26617
  Nonlinear drift                  0.224***    26625

GARCH models
  No drift GARCH                   0.079**     56868
  Linear drift GARCH               0.276***    56995
  Nonlinear drift GARCH            0.339***    57063
  No drift CEV–GARCH               0.077**     57011
  Linear drift CEV–GARCH           0.291***    57152
  Nonlinear drift CEV–GARCH        0.358***    57255

Regime-switching models
  No drift RS CEV                  0.129***    64931
  Linear drift RS CEV              0.091***    65079
  Nonlinear drift RS CEV           0.162***    65128
  No drift RS GARCH                0.118***    70790
  Linear drift RS GARCH            0.047*      71328
  Nonlinear drift RS GARCH         0.055**     71382
  No drift RS CEV–GARCH            0.124***    70828
  Linear drift RS CEV–GARCH        0.056**     71386
  Nonlinear drift RS CEV–GARCH     0.067**     71441

Jump-diffusion models
  No drift JD CEV                  0.180***    63063
  Linear drift JD CEV              0.039*      63172
  Nonlinear drift JD CEV           0.076**     63263
  No drift JD GARCH                0.178***    67941
  Linear drift JD GARCH            0.056**     68108
  Nonlinear drift JD GARCH         0.067**     68139
  No drift JD CEV–GARCH            0.174***    67963
  Linear drift JD CEV–GARCH        0.053**     68129
  Nonlinear drift JD CEV–GARCH     0.065**     68161

NOTE: This table reports the out-of-sample density forecast performance of the single-factor diffusion, GARCH, Markov regime-switching, and jump-diffusion models, measured by the M1 statistic. For convenience of comparison, we also report the in-sample log-likelihood value for each model. The in-sample parameter estimation is based on the observations from June 14, 1961 to February 2, 1991, and the out-of-sample density forecast evaluation is based on the observations from February 23, 1991 to December 29, 2000. The ratio between the size of the estimation sample and the forecast sample is about 3:1. In computing the out-of-sample M1 statistics, a preliminary lag order $\bar{p}$ of 20 is used. Other lag orders give similar results. The asymptotic critical values for the out-of-sample evaluation statistic are 0.087, 0.051, and 0.037 at the 1%, 5%, and 10% levels, and ***, **, and * indicate that the M1 statistic is significant at the 1%, 5%, and 10% levels.
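The significance marks in Table 6 follow from comparing each M1 value with the asymptotic critical values given in the note. The rule can be sketched in Python as follows. The function name `significance_stars` is hypothetical, and the M1 values in the usage lines are read from Table 6 on the assumption that they carry three decimal places.

```python
# Asymptotic critical values for M1 quoted in the note to Table 6.
CRITICAL_VALUES = {"10%": 0.037, "5%": 0.051, "1%": 0.087}


def significance_stars(m1):
    """Return '***', '**', '*', or '' for an M1 value (upper-tailed test)."""
    if m1 > CRITICAL_VALUES["1%"]:
        return "***"
    if m1 > CRITICAL_VALUES["5%"]:
        return "**"
    if m1 > CRITICAL_VALUES["10%"]:
        return "*"
    return ""


# Usage with three entries read from Table 6 (assumed three decimal places).
for name, m1 in [("Dothan", 0.067), ("Random walk", 0.108),
                 ("Linear drift JD CEV", 0.039)]:
    print(f"{name}: {m1:.3f}{significance_stars(m1)}")
```

Because the test is upper-tailed, a smaller M1 means weaker evidence against the forecast model, which is why the models with a single star are the best performers in the table.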

468 Journal of Business & Economic Statistics, October 2004

It might be simply because the drift specifications considered are severely misspecified and do not outperform the zero-drift model. In fact, most drift parameter estimates have large standard errors and thus have excess sampling variation in parameter estimation. As a consequence, assuming zero drift may result in a smaller adverse effect on out-of-sample performance.

The generalized residuals $Z_t(\hat\theta)$ contain much information about possible sources of model misspecification. Instead of being uniform, the histograms of the generalized residuals for all diffusion models (Fig. 2) exhibit a high peak in the center of the distribution, which indicates that the diffusion models cannot satisfactorily capture the marginal density of the spot rate. The separate inference statistics M(m, l) in Table 7 also show that all the models fail to satisfactorily capture the dynamics of the generalized residuals. In particular, the large M(2, 2) and M(4, 4) statistics indicate that diffusion models perform rather poorly in modeling the conditional variance and kurtosis of the generalized residuals.
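One way a central peak of this kind can arise is when the forecast density is more dispersed than the data. The following Python sketch illustrates the histogram diagnostic on simulated data. It is not the authors' code: the Gaussian forecast densities, the sample size, and the helper names `normal_cdf` and `pit_histogram` are assumptions chosen for illustration. It works with the uncentered transform on [0, 1], so a correct forecast density should give a roughly flat histogram.

```python
import math
import numpy as np


def normal_cdf(x):
    """Standard normal cdf evaluated elementwise."""
    erf = np.vectorize(math.erf)
    return 0.5 * (1.0 + erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def pit_histogram(x, forecast_sd, bins=10):
    """Relative frequencies of the probability integral transforms of x
    under an N(0, forecast_sd^2) forecast density."""
    pit = normal_cdf(x / forecast_sd)
    counts, _ = np.histogram(pit, bins=bins, range=(0.0, 1.0))
    return counts / counts.sum()


rng = np.random.default_rng(0)
x = rng.standard_normal(5000)                 # simulated data, unit variance

correct = pit_histogram(x, forecast_sd=1.0)   # forecast matches the data
too_wide = pit_histogram(x, forecast_sd=2.0)  # forecast is too dispersed
print(np.round(correct, 3))
print(np.round(too_wide, 3))
```

With the overdispersed forecast, the two central bins collect far more than their nominal 20% share, which is the same qualitative pattern as the peaked histograms described for the diffusion models.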

Similar to single-factor diffusion models, GARCH models with a zero drift have much better density forecasts than those models with a linear or nonlinear drift. The best GARCH and diffusion models have $M_1$ statistics roughly between 0.07 and 0.08. Comparing the best GARCH and single-factor diffusion models, we find that GARCH provides better density forecasts than CEV, and combining CEV and GARCH further improves the density forecasts. Diagnostic analysis of generalized residuals shows that GARCH models provide certain improvements over diffusion models. For example, the histograms of the generalized residuals of GARCH models have a less-pronounced peak in the center. The M(m, l) statistics show that GARCH models significantly improve the fitting of even-order moments of the generalized residuals; the M(2, 2) and M(4, 4) statistics are reduced from close to 100 to single digits.

Figure 2. Histograms of the Generalized Residuals of Four Classes of Spot Rate Models, panels (a)–(h). This figure plots the histograms of the generalized residuals of the single-factor diffusion, GARCH, regime-switching, and jump-diffusion spot rate models.

In summary, our analysis of single-factor diffusion and GARCH models demonstrates that both classes of models fail to provide good density forecasts for future interest rates. They fail to capture both the high peak in the center of the marginal density and the dynamics of different moments of the generalized residuals. GARCH models have a more uniform marginal density and significantly improve the ability to model the dynamics of even-order moments of the generalized residuals. For both classes of models, modeling the conditional variance through CEV or GARCH is much more important than modeling the conditional mean of the interest rate. Zero-drift models outperform either linear or nonlinear drift models in density forecast, although the M(1, 1) statistics suggest that neither model adequately captures the dynamic structure in the conditional mean of the generalized residuals.

We next turn to regime-switching models. Regime-switching models generally have better density forecasts than single-factor diffusion and GARCH models; the best regime-switching models reduce the $M_1$ statistics of the best diffusion and GARCH models from above 0.07 to about 0.05. Although modeling the drift does not improve density forecast for diffusion and GARCH models, the best regime-switching models in density forecast have a linear drift in each regime. As pointed out by Ang and Bekaert (2002) and Li and Xu (2000), a regime-switching model with a linear drift in each regime actually implies nonlinear conditional mean dynamics for the interest rate. Thus our results suggest that perhaps a specific nonlinear drift is needed for out-of-sample density forecasts. GARCH is more important than CEV for density forecasts, and combining CEV with GARCH does not further improve model performance. Overall, the regime-switching GARCH model with a linear drift in each regime provides the best out-of-sample forecasts among all of the regime-switching models. Diagnostic analysis shows that regime-switching models provide a much better characterization of the marginal density of interest rates; the histograms of the generalized residuals are much closer to U[0, 1]. The M(2, 2) and M(4, 4) statistics of regime-switching models are slightly higher than those of GARCH models, however, and the M(1, 1) and M(3, 3) statistics of regime-switching models are actually higher than those of diffusion and GARCH models. Therefore, the advantages of regime-switching models seem to come from better modeling of the marginal density rather than from the dynamics of the generalized residuals.

The density forecast performance of jump-diffusion models shares many common features with that of regime-switching models. First, they have comparable performance; the $M_1$ statistics of the best jump-diffusion models are also around 0.05. Second, for jump-diffusion models, the linear drift specification outperforms those with no drift or nonlinear drift for density forecasts. Third, for jump-diffusion models, GARCH generally provides better density forecasts than CEV. Fourth, jump-diffusion models have similar advantages over single-factor diffusion and GARCH models. They have much better characterization of the marginal density of the generalized residuals, but they may not necessarily perform better in modeling the dynamics of the generalized residuals, as indicated by their large M(m, l) statistics. The jump model with a linear drift and level effect, although it has the smallest $M_1$ statistic, apparently fails to capture the even-order moments of the generalized residuals, with M(2, 2) and M(4, 4) statistics significantly higher than other jump-diffusion models that include GARCH effects.

Table 7. Separate Inference Statistics for Out-of-Sample Density Forecast Performance

Model                              M(1,1)   M(1,2)   M(2,1)   M(2,2)   M(3,3)   M(4,4)

Single-factor diffusion models
  Random walk                       40.08     6.35    40.48    99.73    21.50    80.12
  Log-normal                        46.52     8.77    39.91   110.10    26.98    92.51
  Dothan                            46.04     8.07    42.62   108.90    27.02    91.03
  Pure CEV                          41.62     6.63    41.55    96.81    22.85    80.42
  Vasicek                           40.40     8.13    35.55   100.20    21.43    81.79
  CIR                               43.61     8.68    36.30    98.57    24.39    85.02
  CKLS                              42.02     8.55    35.75    97.89    22.80    82.77
  Nonlinear drift                   42.40     8.37    35.93    98.47    22.77    82.98

GARCH models
  No drift GARCH                    40.73     3.75    28.21     7.68    17.57     2.53
  Linear drift GARCH                41.07     4.66    25.11     7.53    17.00     2.35
  Nonlinear drift GARCH             40.71     4.87    24.70     7.02    16.63     2.09
  No drift CEV–GARCH                41.30     4.10    28.40     7.46    19.24     2.74
  Linear drift CEV–GARCH            41.68     5.11    25.06     7.09    18.73     2.44
  Nonlinear drift CEV–GARCH         41.43     5.45    24.32     6.45    18.54     2.17

Regime-switching models
  No drift RS CEV                   53.16     3.13    11.31     7.05    35.02    10.21
  Linear drift RS CEV               53.25     4.56    11.16     6.57    34.31     8.83
  Nonlinear drift RS CEV            53.75     5.12     9.84     6.56    34.50     8.49
  No drift RS GARCH                 57.29     4.05    21.59    10.95    43.71     7.63
  Linear drift RS GARCH             55.46     4.56    21.09    12.38    40.71     9.14
  Nonlinear drift RS GARCH          55.69     4.50    21.05    12.14    41.13     8.93
  No drift RS CEV–GARCH             45.35     3.69    20.91    10.67    34.25     7.08
  Linear drift RS CEV–GARCH         44.10     4.86    20.75    11.99    32.12     8.87
  Nonlinear drift RS CEV–GARCH      44.36     4.84    20.55    11.40    32.46     8.32

Jump-diffusion models
  No drift JD CEV                   47.38     5.32    16.76    85.10    43.01    99.31
  Linear drift JD CEV               46.32     4.48    19.63    85.09    42.26    99.82
  Nonlinear drift JD CEV            46.50     4.85    18.31    85.10    42.48    99.29
  No drift JD GARCH                 43.32     5.10    20.35    12.38    28.44     8.96
  Linear drift JD GARCH             42.57     4.60    21.48    12.19    28.22     8.84
  Nonlinear drift JD GARCH          42.62     4.67    21.42    12.34    28.24     8.95
  No drift JD CEV–GARCH             43.55     5.05    20.51    13.36    29.31    10.07
  Linear drift JD CEV–GARCH         42.74     4.54    21.69    13.00    28.93     9.76
  Nonlinear drift JD CEV–GARCH      42.78     4.62    21.57    13.07    28.94     9.81

NOTE: This table reports the separate inference statistics M(m, l) for the four classes of spot rate models. The asymptotically normal statistic M(m, l) can be used to test whether the cross-correlation between the mth and lth moments of $Z_t$ is significantly different from 0. The choice of (m, l) = (1, 1), (2, 2), (3, 3), and (4, 4) is sensitive to autocorrelations in mean, variance, skewness, and kurtosis of $r_t$. We report results only for a lag truncation order p = 20; the results for p = 10 and 30 are rather similar. The asymptotic critical values are 1.28, 1.65, and 2.33 at the 10%, 5%, and 1% levels.

In summary, our out-of-sample density forecast evaluation reveals the following important features:

1. For out-of-sample density forecast, the importance of modeling mean reversion in the interest rate is ambiguous. For diffusion and GARCH models, a linear or nonlinear drift gives worse density forecasts than a zero drift. However, when there are regime switches or jumps, linear drift models outperform those with zero or nonlinear drift. The best density forecast models still fail to adequately capture the mean dynamics of the generalized residuals.

2. It is important to capture volatility clustering in interest rates for both in-sample and out-of-sample performance via GARCH. Most of the best-performing models have a GARCH component, which significantly improves their ability to model the dynamics of the even-order moments of the generalized residuals.

3. It is also important to capture excess kurtosis and heavy tails of the interest rate for both in-sample and out-of-sample performance via regime switching or jumps. The advantages of regime-switching and jump models are mainly reflected in modeling the marginal density rather than in the dynamics of the generalized residuals.

Overall, our out-of-sample analysis demonstrates that more-complicated models that incorporate conditional heteroscedasticity and heavy tails of interest rates tend to have a better density forecast. This result is quite different from the results for those that focus on conditional mean forecast. As widely documented in the literature, the predictable component in the conditional mean of the interest rate appears insignificant. As a result, random walk models tend to outperform more-sophisticated models in terms of mean forecast (e.g., Duffee 2002). However, density forecast includes all conditional moments, and as a result, those models that can capture the dynamics of higher-order moments tend to have better density forecasts. Our analysis suggests that the more-complicated spot rate models developed in the literature can indeed capture some important features of interest rates and are relevant in applications involving density forecasts of interest rates.


6. CONCLUSION

Despite the numerous empirical studies of spot interest rate models, little effort has been devoted to examining their out-of-sample forecast performance. We have contributed to the literature by providing the first (to our knowledge) comprehensive empirical analysis of the out-of-sample performance of a wide variety of popular spot rate models in forecasting the conditional density of future interest rates. Out-of-sample density forecasts help minimize the data snooping bias due to excessive searching for more-complicated models using similar datasets. The conditional density, which completely characterizes the full dynamics of interest rates, is also an essential input to many financial applications. Using a rigorous econometric evaluation procedure for density forecasts, we examined the out-of-sample performance of single-factor diffusion, GARCH, regime-switching, and jump-diffusion models.

Although previous studies have shown that simpler models, such as the random walk model, tend to have more accurate forecasts for the conditional mean of interest rates, we find that models that capture conditional heteroscedasticity and heavy tails of the spot rate have better out-of-sample density forecasts. GARCH significantly improves modeling of the conditional variance and kurtosis of the generalized residuals, whereas regime switching and jumps help in modeling the marginal density of interest rates. Therefore, our analysis shows that the more-sophisticated interest rate models developed in the literature can indeed capture some important features of interest rate data and perform better than simpler models in out-of-sample density forecasts. Although we have focused on density forecasts of the spot rates here, there is evidence that the term structure of interest rates is driven by multiple factors (e.g., Dai and Singleton 2000). Extending our analysis to forecast the joint density of multiple yields is an interesting issue that we will address in future research.

ACKNOWLEDGMENTS

The authors are grateful for the helpful comments of Canlin Li, Daniel Morillo, Larry Pohlman, and seminar participants at PanAgora Asset Management, the 2002 Econometric Society European Summer Meeting, and the 2003 Econometric Society North American Winter Meeting. They also thank the associate editor and two anonymous referees, whose comments and suggestions have greatly improved the article, and David George for editorial assistance. Hong's work was supported by National Science Foundation grant SES-0111769. Any errors are the authors' own.

APPENDIX: GENERALIZED SPECTRAL EVALUATION FOR DENSITY FORECASTS

Here we describe Hong's (2003) omnibus evaluation procedure for out-of-sample density forecasts, as well as a class of separate inference procedures. (For more discussion and formal treatment, see Hong 2003.)

A.1 Generalized Spectrum

Generalized spectrum was proposed by Hong (1999) as an analytic tool to provide an alternative to the power spectrum and the higher-order spectrum (e.g., bispectrum) in time series analysis. Define the centered generalized residual

\[
Z_t \equiv Z_t(\theta) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \theta)\, dr - \frac{1}{2}, \qquad t = R + 1, \ldots, T,
\]

where $p(r, t \mid I_{t-1}, \theta)$ is a conditional density model for time series $\{r_t\}$. Suppose that the series $\{Z_t\}_{t=-\infty}^{\infty}$ has marginal characteristic function $\varphi(u) \equiv E(e^{iuZ_t})$ and pairwise joint characteristic function $\varphi_j(u, v) \equiv E[\exp\{i(uZ_t + vZ_{t-|j|})\}]$, where $i \equiv \sqrt{-1}$, $u, v \in (-\infty, \infty)$, and $j = 0, \pm 1, \ldots$. The basic idea of generalized spectrum is to transform the series $\{Z_t\}$ and then consider the spectrum of the transformed series. Define the generalized covariance function

\[
\sigma_j(u, v) \equiv \operatorname{cov}\bigl(e^{iuZ_t}, e^{ivZ_{t-|j|}}\bigr), \qquad j = 0, \pm 1, \ldots.
\]

Straightforward algebra yields $\sigma_j(u, v) = \varphi_j(u, v) - \varphi(u)\varphi(v)$. Because $\varphi_j(u, v) = \varphi(u)\varphi(v)$ for all $(u, v)$ if and only if $Z_t$ and $Z_{t-|j|}$ are independent (cf. Lukacs 1970), $\sigma_j(u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ over various lags, including those with zero autocorrelation in $\{Z_t\}$.
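To make the last point concrete, the following Python sketch estimates the generalized covariance at lag 1 for two simulated series. It is an illustration rather than part of the original procedure: the function name `generalized_covariance`, the evaluation point (u, v) = (1, 1), and the product series used as an example of dependence without autocorrelation are all assumptions.

```python
import numpy as np


def generalized_covariance(z, j, u, v):
    """Sample analogue of sigma_j(u, v) = phi_j(u, v) - phi(u) * phi(v)."""
    z = np.asarray(z, dtype=float)
    j = abs(int(j))
    z_now, z_lag = z[j:], z[:len(z) - j]
    phi_joint = np.mean(np.exp(1j * (u * z_now + v * z_lag)))
    phi_u = np.mean(np.exp(1j * u * z))
    phi_v = np.mean(np.exp(1j * v * z))
    return phi_joint - phi_u * phi_v


rng = np.random.default_rng(0)
e = rng.standard_normal(20001)
iid = e[1:]                  # independent series
dependent = e[1:] * e[:-1]   # serially uncorrelated but not independent

autocorr = np.corrcoef(dependent[1:], dependent[:-1])[0, 1]
print(abs(generalized_covariance(iid, 1, 1.0, 1.0)))
print(abs(generalized_covariance(dependent, 1, 1.0, 1.0)), autocorr)
```

For the product series, the lag-1 autocorrelation is close to zero while the generalized covariance is not, so a test built on autocorrelations alone would miss this dependence.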

Under certain regularity conditions on the temporal dependence of $\{Z_t\}$, the Fourier transform of the generalized covariance function $\sigma_j(u, v)$ exists and is given by

\[
f(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi].
\]

Like $\sigma_j(u, v)$, $f(\omega, u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ across various lags. An advantage of spectral analysis is that $f(\omega, u, v)$ incorporates all lags simultaneously. Thus it can capture the dependent processes in which serial dependence occurs only at higher-order lags. This can arise from seasonality (e.g., calendar effects) or time delay in financial markets. Moreover, $f(\omega, u, v)$ is particularly powerful in capturing the dependent alternatives whose serial dependence decays to 0 slowly as $j \to \infty$, such as slow mean-reverting or persistent volatility clustering processes. For such processes, $f(\omega, u, v)$ has a sharp peak at frequency 0.

The function $f(\omega, u, v)$ does not require any moment condition on $\{Z_t\}$. When the moments of $\{Z_t\}$ exist, however, we can differentiate $f(\omega, u, v)$ with respect to $(u, v)$ at $(0, 0)$:

\[
f^{(0,m,l)}(\omega, 0, 0) \equiv \left. \frac{\partial^{m+l}}{\partial u^m\, \partial v^l} f(\omega, u, v) \right|_{(u,v)=(0,0)}
= \frac{1}{2\pi}\, i^{m+l} \sum_{j=-\infty}^{\infty} \operatorname{cov}\bigl(Z_t^m, Z_{t-|j|}^l\bigr)\, e^{-ij\omega}.
\]

In particular, $f^{(0,1,1)}(\omega, 0, 0)$ is, up to the factor $i^2 = -1$, the well-known power spectral density. (For applications of the power spectral density in economics and finance, see, e.g., Granger 1969; Durlauf 1991; Watson 1993.) For this reason, $f(\omega, u, v)$ is called the “generalized spectral density” of $\{Z_t\}$. The parameters $(u, v)$ provide much flexibility in capturing linear and nonlinear dependence. Because $f(\omega, u, v)$ can be decomposed as a weighted sum of $f^{(0,m,l)}(\omega, 0, 0)$ over various $(m, l)$ via a Taylor series expansion around $(0, 0)$, it contains information on all autocorrelations and cross-correlations in the various power terms of $\{Z_t\}$. Thus it can be used to develop an omnibus procedure against a wide range of dependent processes.
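The decomposition can be written out explicitly. As a sketch, assume that all moments of $Z_t$ exist and that the sums may be interchanged; the first assumption holds here because $|Z_t| \le 1/2$ by construction. Expanding both exponentials in the generalized covariance then gives:

```latex
\begin{aligned}
\sigma_j(u, v)
  &= \operatorname{cov}\bigl(e^{iuZ_t}, e^{ivZ_{t-|j|}}\bigr)
   = \sum_{m=1}^{\infty} \sum_{l=1}^{\infty}
     \frac{(iu)^m (iv)^l}{m!\, l!}\,
     \operatorname{cov}\bigl(Z_t^m, Z_{t-|j|}^l\bigr), \\
f(\omega, u, v)
  &= \sum_{m=1}^{\infty} \sum_{l=1}^{\infty}
     \frac{u^m v^l}{m!\, l!}\, f^{(0,m,l)}(\omega, 0, 0).
\end{aligned}
```

The terms with $m = 0$ or $l = 0$ drop out because the covariance of a constant with any random variable is zero. Each remaining term carries the cross-spectrum between $Z_t^m$ and $Z_{t-|j|}^l$, so the $(1, 1)$ term reflects autocorrelation in level, the $(2, 2)$ term reflects autocorrelation in squares, and so on.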

A.2 Omnibus Evaluation for Density Forecast

To evaluate out-of-sample density forecasts of model $p(r, t \mid I_{t-1}, \theta)$, Hong (2003) used the generalized spectrum to develop an omnibus test of iid $U[-\frac{1}{2}, \frac{1}{2}]$ for the centered generalized residual $Z_t$. To test $H_0: Z_t \sim \text{iid } U[-\frac{1}{2}, \frac{1}{2}]$, we use the modified generalized covariance

\[
\sigma_j^U(u, v) \equiv \varphi_j(u, v) - \varphi^U(u)\varphi^U(v),
\]

where $\varphi^U(u) \equiv \sin(u/2)/(u/2)$ is the characteristic function of a $U[-\frac{1}{2}, \frac{1}{2}]$ random variable. Because $\sigma_j^U(u, v)$ has incorporated the information of $U[-\frac{1}{2}, \frac{1}{2}]$, it can capture any pairwise serial dependence in $\{Z_t\}$ and any deviation from $U[-\frac{1}{2}, \frac{1}{2}]$. The associated generalized spectrum is

\[
f^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j^U(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi],\ u, v \in (-\infty, \infty).
\]

Under $H_0$, we have $\sigma_j^U(u, v) = 0$ for all $j \neq 0$, and so $f^U(\omega, u, v)$ becomes a known “flat” spectrum,

\[
f_0^U(\omega, u, v) \equiv \frac{1}{2\pi}\, \sigma_0^U(u, v) = \frac{1}{2\pi} \bigl[\varphi^U(u + v) - \varphi^U(u)\varphi^U(v)\bigr], \qquad \omega \in [-\pi, \pi].
\]

When $Z_t$ and $Z_{t-j}$ are not independent at some lag $j \neq 0$, or when $Z_t$ is not $U[-\frac{1}{2}, \frac{1}{2}]$, we have $f^U(\omega, u, v) \neq f_0^U(\omega, u, v)$. Hence, to test $H_0$, one can check whether a consistent estimator of $f^U(\omega, u, v)$ is significantly different from $f_0^U(\omega, u, v)$.

Suppose that $\hat\theta$ is an estimator for model parameter $\theta$ based on the estimation sample $\{r_t\}_{t=1}^{R}$. To estimate $f^U(\omega, u, v)$ using the prediction sample $\{r_t\}_{t=R+1}^{N}$, we have to use the estimated proxies $\{\hat Z_t\}_{t=R+1}^{N}$, where

\[
\hat Z_t \equiv Z_t(\hat\theta) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \hat\theta)\, dr - \frac{1}{2}, \qquad t = R + 1, \ldots, N.
\]
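The split between estimation and prediction samples can be sketched in Python as follows. The conditional density used here, a zero-drift Gaussian random walk with constant variance, is an illustrative assumption and is far simpler than the models studied in the article; the function names and the simulated rate path are likewise hypothetical.

```python
import math
import numpy as np


def normal_cdf(x):
    """Standard normal cdf evaluated elementwise."""
    erf = np.vectorize(math.erf)
    return 0.5 * (1.0 + erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def centered_generalized_residuals(r, R):
    """Z_t(theta_hat) for t = R+1, ..., N under a zero-drift Gaussian random
    walk, r_t | I_{t-1} ~ N(r_{t-1}, sigma^2) (an illustrative model choice)."""
    r = np.asarray(r, dtype=float)
    # Estimate sigma from the estimation sample r_1, ..., r_R only.
    sigma_hat = math.sqrt(np.mean(np.diff(r[:R]) ** 2))
    # One-step-ahead forecast errors over the prediction sample r_{R+1}, ..., r_N.
    innovations = r[R:] - r[R - 1:-1]
    return normal_cdf(innovations / sigma_hat) - 0.5


rng = np.random.default_rng(0)
N, R = 4000, 3000                                    # roughly a 3:1 split
r = 5.0 + np.cumsum(0.1 * rng.standard_normal(N))    # simulated rate path
z_hat = centered_generalized_residuals(r, R)
print(len(z_hat), z_hat.mean(), z_hat.var())
```

Because the simulated data are generated by the same model that is used for forecasting, the residuals should have mean near 0 and variance near 1/12, the moments of $U[-\frac{1}{2}, \frac{1}{2}]$.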

Define the empirical measure

\[
\hat\sigma_j^U(u, v) \equiv \hat\varphi_j(u, v) - \varphi^U(u)\varphi^U(v), \qquad j = 0, \pm 1, \ldots, \pm(n - 1),
\]

where $n \equiv N - R$ is the size of the prediction sample and the empirical pairwise joint characteristic function is

\[
\hat\varphi_j(u, v) \equiv (n - |j|)^{-1} \sum_{t=R+|j|+1}^{N} \exp\bigl\{i\bigl(u\hat Z_t + v\hat Z_{t-|j|}\bigr)\bigr\}.
\]

We introduce a class of smoothed kernel estimators

\[
\hat f^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=1-n}^{n-1} (1 - |j|/n)^{1/2}\, k(j/p)\, \hat\sigma_j^U(u, v)\, e^{-ij\omega},
\]

where $k(\cdot)$ is a kernel and $p \equiv p(n)$ is a bandwidth/lag order. An example of $k(\cdot)$ is the Bartlett kernel,

\[
k(z) =
\begin{cases}
1 - |z| & \text{if } |z| \le 1, \\
0 & \text{otherwise.}
\end{cases}
\]

For the choice of $p$, Hong (1999) suggested a data-driven method that delivers an asymptotically optimal bandwidth in terms of an integrated mean squared error criterion. This still involves a preliminary bandwidth $\bar{p}$, but the impact of choosing $\bar{p}$ is much smaller than the impact of choosing the lag order $p$.

A convenient divergence measure for $\hat f^U(\omega, u, v)$ and $f_0^U(\omega, u, v)$ is the quadratic form

\[
\begin{aligned}
2\pi n\, Q(\hat f^U, f_0^U)
&\equiv 2\pi n \int\!\!\int\!\!\int_{-\pi}^{\pi} \bigl|\hat f^U(\omega, u, v) - f_0^U(\omega, u, v)\bigr|^2\, d\omega\, dW(u)\, dW(v) \\
&= n \int\!\!\int \bigl|\hat\varphi_0(u, v) - \varphi^U(u + v)\bigr|^2\, dW(u)\, dW(v) \\
&\quad + 2 \sum_{j=1}^{n-1} k^2(j/p)(n - j) \int\!\!\int \bigl|\hat\sigma_j^U(u, v)\bigr|^2\, dW(u)\, dW(v),
\end{aligned}
\]

where the weighting function $W(\cdot)$ is positive and nondecreasing, and the unspecified integrals are taken over the support of $W(\cdot)$. An example of this is

\[
W(u) = \Phi(\sqrt{12}\, u),
\]

where $\Phi(\cdot)$ is the N(0, 1) cdf, which is commonly used in the characteristic function literature. The scale $\sqrt{12}$ gives the weighting distribution the same standard deviation, $1/\sqrt{12}$, as $U[-\frac{1}{2}, \frac{1}{2}]$, but this is not necessary.

Our evaluation statistic for $H_0$ is a properly centered and scaled version of the quadratic form $Q(\hat f^U, f_0^U)$:

\[
M_1 \equiv \left[ \sum_{j=1}^{n-1} k^2(j/p) \right]^{-1}
\int\!\!\int \sum_{j=1}^{n-1} k^2(j/p)(n - j) \bigl|\hat\sigma_j^U(u, v)\bigr|^2\, dW(u)\, dW(v)
- \left[ \int \bigl[1 - \varphi^U(u)^2\bigr]\, dW(u) \right]^2.
\]

The asymptotic distribution of $M_1$ was derived by Hong (2003). Under $H_0$ and a set of regularity conditions,

\[
M_1 \xrightarrow{d} \int\!\!\int |G(u, v)|^2\, dW(u)\, dW(v)
\]

as $n \to \infty$, $R \to \infty$, and $n^{1+\delta}/R \to 0$ for any arbitrary small constant $\delta > 0$, where $G(u, v)$ is a complex-valued Gaussian process with mean 0 and a known covariance function,

\[
\begin{aligned}
\operatorname{cov}[G(u_1, v_1), G(u_2, v_2)]
&= E[G(u_1, v_1)\, G(u_2, v_2)^*] \\
&= \sigma_0^U(u_1, -u_2)\varphi^U(v_1)\varphi^U(v_2) + \sigma_0^U(u_1, -v_2)\varphi^U(u_2)\varphi^U(v_1) \\
&\quad + \sigma_0^U(v_1, -u_2)\varphi^U(u_1)\varphi^U(v_2) + \sigma_0^U(v_1, -v_2)\varphi^U(u_1)\varphi^U(u_2).
\end{aligned}
\]


The limit distribution of $M_1$ is nonstandard. However, because the Gaussian process $G(u, v)$ has mean 0 and a known distribution-free covariance function, the limit distribution of $M_1$ can be easily tabulated. When $W(u) = \Phi(\sqrt{12}\, u)$ with a truncated support on $[-1, 1]$, the asymptotic critical values at the 10%, 5%, and 1% levels are 0.037, 0.051, and 0.087.
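The statistic can be computed numerically along the following lines. This Python sketch is a simplified illustration under several assumptions and should not be read as the procedure of Hong (2003): it uses a fixed lag order `p` instead of the data-driven choice, approximates the integrals with a trapezoid rule on an evenly spaced grid, and takes the weighting measure to be the N(0, 1/12) density restricted to [-1, 1] without renormalization. The function names and the simulated inputs are hypothetical.

```python
import numpy as np


def phi_uniform(u):
    """Characteristic function of U[-1/2, 1/2]: sin(u/2) / (u/2)."""
    return np.sinc(np.asarray(u, dtype=float) / (2.0 * np.pi))


def m1_statistic(z, p, grid_size=41):
    """M1 for centered generalized residuals z with a fixed lag order p."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    # Trapezoid weights approximating dW(u), W(u) = Phi(sqrt(12) u) on [-1, 1].
    u = np.linspace(-1.0, 1.0, grid_size)
    w = np.sqrt(6.0 / np.pi) * np.exp(-6.0 * u**2) * (u[1] - u[0])
    w[[0, -1]] *= 0.5
    phi_u = phi_uniform(u)
    expo = np.exp(1j * np.outer(z, u))        # expo[t, a] = exp(i * u_a * z_t)
    weighted_sum, kernel_sum = 0.0, 0.0
    for j in range(1, n):
        k = max(1.0 - j / p, 0.0)             # Bartlett kernel k(j/p)
        if k == 0.0:
            break
        # Empirical joint characteristic function at lag j on the (u, v) grid.
        phi_j = expo[j:].T @ expo[:n - j] / (n - j)
        sigma_j = phi_j - np.outer(phi_u, phi_u)
        integral = w @ (np.abs(sigma_j) ** 2) @ w
        weighted_sum += k**2 * (n - j) * integral
        kernel_sum += k**2
    centering = (w @ (1.0 - phi_u**2)) ** 2
    return weighted_sum / kernel_sum - centering


rng = np.random.default_rng(0)
z_good = rng.uniform(-0.5, 0.5, size=2000)    # iid U[-1/2, 1/2]
z_bad = 0.1 * z_good                          # far too concentrated at 0
print(m1_statistic(z_good, p=20), m1_statistic(z_bad, p=20))
```

The concentrated series mimics a forecast density that is much too wide, and it produces a markedly larger statistic than the uniform series. The product `expo[j:].T @ expo[:n - j]` evaluates the joint characteristic function on the whole grid at once, which keeps the loop over lags short.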

The most attractive feature of $M_1$ is its omnibus property; it has power against a wide range of suboptimal density forecasts, thanks to the use of the characteristic function in a spectral framework. The $M_1$ statistic can be viewed as an omnibus metric measuring the departure of the forecast model from the true data-generating process. A better density forecast model is expected to have a smaller value for $M_1$, because its generalized residual series $\{Z_t\}$ is closer to having the iid U[0, 1] property. Thus $M_1$ can be used to rank competing density forecast models in terms of their deviations from optimality. Moreover, the kernel function provides flexible weighting for various lags. Nonuniform weighting kernels usually discount higher-order lags. For most, if not all, economic and financial time series, today's behaviors are more affected by recent events than by remote events. Thus downward weighting for higher-order lags is expected to yield good power in finite samples.

A.3 Separate Inferences

When a model is rejected using $M_1$, it is interesting to explore possible reasons for the rejection. Hong (2003) provided a class of rigorous separate inference statistics that exploit the information contained in the model generalized residuals $\{Z_t\}$ to understand possible reasons for model rejection. The $M(m, l)$ statistic is based on a kernel estimator of the generalized spectral density derivative $f^{(0,m,l)}(\omega, 0, 0)$, where $m$ and $l$ are the orders of the derivatives with respect to $u$ and $v$. The statistic $M(m, l)$ is defined as

\[
M(m, l) = \left[ \sum_{j=1}^{n-1} (n - j)\, k^2(j/p)\, \hat\rho_j^2(m, l) - \sum_{j=1}^{n-1} k^2(j/p) \right]
\Bigg/ \left[ 2 \sum_{j=1}^{n-2} k^4(j/p) \right]^{1/2},
\]

where $\hat\rho_j(m, l)$ is the sample cross-correlation between $\hat Z_t^m$ and $\hat Z_{t-|j|}^l$, $t = R + 1, \ldots, N$. Under $H_0$ and a set of regularity conditions,

\[
M(m, l) \xrightarrow{d} N(0, 1)
\]

for any given $(m, l)$ as $n, R \to \infty$. The upper-tailed N(0, 1) critical values at the 10%, 5%, and 1% levels are 1.28, 1.65, and 2.33. As in the $M_1$ test, Hong (2003) considered a data-driven method for $p$ that also involves the choice of a preliminary lag order $\bar{p}$.

Although the moments of the generalized residuals $\{Z_t\}$ are not exactly the same as those of the original data $\{r_t\}$, they are highly correlated. In particular, the choice of $(m, l) = (1, 1)$, $(2, 2)$, $(3, 3)$, and $(4, 4)$ is very sensitive to autocorrelations in level, volatility, skewness, and kurtosis of $\{r_t\}$. Furthermore, the choices of $(m, l) = (1, 2)$ and $(2, 1)$ are sensitive to the “ARCH-in-mean” effect and the “leverage” effect. Different choices of order $(m, l)$ can thus allow examination of various dynamic aspects of the underlying process.
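A minimal Python sketch of the separate inference statistic is given below. It assumes a Bartlett kernel with a fixed lag order `p` in place of the data-driven choice, and it normalizes the sample cross-correlations by the full-sample second moments; both are simplifying assumptions, and the function name is hypothetical. The product series in the usage lines is only a stand-in for residuals with volatility clustering, not an actual generalized residual series.

```python
import numpy as np


def separate_inference_statistic(z, m, l, p):
    """M(m, l) with a Bartlett kernel and a fixed lag order p."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    zm = z**m - np.mean(z**m)
    zl = z**l - np.mean(z**l)
    scale = np.sqrt(np.sum(zm**2) * np.sum(zl**2))
    lags = np.arange(1, n)
    k = np.clip(1.0 - lags / p, 0.0, None)      # Bartlett kernel k(j/p)
    active = lags[k > 0]
    k_active = k[k > 0]
    # Sample cross-correlation between Z_t^m and Z_{t-j}^l at each active lag.
    rho = np.array([np.sum(zm[j:] * zl[:n - j]) / scale for j in active])
    numerator = np.sum((n - active) * k_active**2 * rho**2) - np.sum(k**2)
    denominator = np.sqrt(2.0 * np.sum(k[:n - 2] ** 4))
    return numerator / denominator


rng = np.random.default_rng(0)
e = rng.standard_normal(2001)
iid = rng.uniform(-0.5, 0.5, size=2000)   # no dependence at any order
clustered = e[1:] * e[:-1]                # uncorrelated, but squares are correlated

print(separate_inference_statistic(iid, 2, 2, p=20))
print(separate_inference_statistic(clustered, 2, 2, p=20))
```

For the stand-in series, M(2, 2) is far above the upper-tailed critical values because the squared values are autocorrelated, which is the kind of dependence that the (2, 2) choice is designed to detect.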

[Received June 2002. Revised November 2003.]

REFERENCES

Ait-Sahalia, Y. (1996), “Testing Continuous-Time Models of the Spot Interest Rate,” Review of Financial Studies, 9, 385–426.

Ait-Sahalia, Y. (1999), “Transition Densities for Interest Rate and Other Nonlinear Diffusions,” Journal of Finance, 54, 1361–1395.

Ait-Sahalia, Y. (2002), “Maximum-Likelihood Estimation of Discretely Sampled Diffusions: A Closed-Form Approach,” Econometrica, 70, 223–262.

Ait-Sahalia, Y., and Lo, A. (1998), “Nonparametric Estimation of State-Price Densities Implicit in Financial Asset Prices,” Journal of Finance, 53, 499–547.

Andersen, T. G., and Lund, J. (1997), “Estimating Continuous Time Stochastic Volatility Models of the Short-Term Interest Rate,” Journal of Econometrics, 77, 343–378.

Ang, A., and Bekaert, G. (2002), “Regime Switches in Interest Rates,” Journal of Business & Economic Statistics, 20, 163–182.

Bali, T. (2000), “Modeling the Conditional Mean and Variance of the Short Rate Using Diffusion, GARCH, and Moving Average Models,” Journal of Futures Markets, 20, 717–751.

Bali, T. (2001), “Nonlinear Parametric Models of the Short-Term Interest Rate,” unpublished manuscript, Baruch College.

Ball, C., and Torous, W. (1998), “Regime Shifts in Short-Term Riskless Interest Rates,” unpublished manuscript, University of California, Los Angeles.

Bera, A. K., and Ghosh, A. (2002), “Density Forecast Evaluation With Applications in Finance,” unpublished manuscript, University of Illinois at Champaign, Dept. of Economics.

Berkowitz, J. (2001), “Testing the Accuracy of Density Forecasts With Applications to Risk Management,” Journal of Business & Economic Statistics, 19, 465–474.

Bliss, R., and Smith, D. (1998), “The Elasticity of Interest Rate Volatility: Chan, Karolyi, Longstaff, and Sanders Revisited,” unpublished manuscript, Federal Reserve Bank of Atlanta.

Brenner, R., Harjes, R., and Kroner, K. (1996), “Another Look at Alternative Models of Short-Term Interest Rate,” Journal of Financial and Quantitative Analysis, 31, 85–107.

Chacko, G., and Viceira, L. M. (2003), “Spectral GMM Estimation of Continuous-Time Processes,” Journal of Econometrics, 116, 259–292.

Chan, K. C., Karolyi, G. A., Longstaff, F. A., and Sanders, A. B. (1992), “An Empirical Comparison of Alternative Models of the Short-Term Interest Rate,” Journal of Finance, 47, 1209–1228.

Chapman, D., and Pearson, N. (2001), “Recent Advances in Estimating Models of the Term-Structure,” Financial Analysts Journal, 57, 77–95.

Christoffersen, P., and Diebold, F. X. (1996), “Further Results on Forecasting and Model Selection Under Asymmetric Loss,” Journal of Applied Econometrics, 11, 561–572.

Christoffersen, P., and Diebold, F. X. (1997), “Optimal Prediction Under Asymmetric Loss,” Econometric Theory, 13, 808–817.

Clements, M. P., and Smith, J. (2000), “Evaluating the Forecast Densities of Linear and Non-Linear Models: Applications to Output Growth and Unemployment,” Journal of Forecasting, 19, 255–276.

Conley, T. G., Hansen, L. P., Luttmer, E. G. J., and Scheinkman, J. A. (1997), “Short-Term Interest Rates as Subordinated Diffusions,” Review of Financial Studies, 10, 525–578.

Corradi, V. (2000), “Reconsidering the Continuous-Time Limit of the GARCH(1,1) Process,” Journal of Econometrics, 96, 145–153.

Cox, J. C., Ingersoll, J. E., and Ross, S. A. (1985), “A Theory of the Term Structure of Interest Rates,” Econometrica, 53, 385–407.

Dai, Q., and Singleton, K. (2000), “Specification Analysis of Affine Term Structure Models,” Journal of Finance, 55, 1943–1978.

Das, S. R. (2002), “The Surprise Element: Jumps in Interest Rates,” Journal of Econometrics, 106, 27–65.

Diebold, F. X., Gunther, T. A., and Tay, A. S. (1998), “Evaluating Density Forecasts With Applications to Financial Risk Management,” International Economic Review, 39, 863–883.

Diebold, F. X., Hahn, J., and Tay, A. S. (1999), “Multivariate Density Forecast Evaluation and Calibration in Financial Risk Management: High-Frequency Returns of Foreign Exchange,” Review of Economics and Statistics, 81, 661–673.

Diebold, F. X., Tay, A. S., and Wallis, K. F. (1999), “Evaluating Density Forecasts of Inflation: The Survey of Professional Forecasters,” in Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W. J. Granger, eds. R. F. Engle and H. White, Oxford, U.K.: Oxford University Press, pp. 76–90.


Dothan, L. U. (1978), “On the Term Structure of Interest Rates,” Journal of Financial Economics, 6, 59–69.

Duffee, G. (2002), “Term Premia and Interest Rate Forecasts in Affine Models,” Journal of Finance, 57, 405–443.

Duffie, D., and Pan, J. (1997), “An Overview of Value at Risk,” Journal of Derivatives, 4, 13–32.

Durham, G. (2003), “Likelihood-Based Specification Analysis of Continuous-Time Models of the Short-Term Interest Rate,” Journal of Financial Economics, 70, 463–487.

Durlauf, S. (1991), “Spectral Based Testing of the Martingale Hypothesis,” Journal of Econometrics, 50, 355–376.

Fackler, P. L., and King, R. P. (1990), “Calibration of Option-Based Probability Assessments in Agricultural Commodity Markets,” American Journal of Agricultural Economics, 72, 73–83.

Gallant, A. R., and Tauchen, G. (1998), “Reprojection Partially Observed Systems With Applications to Interest Rate Diffusions,” Journal of the American Statistical Association, 93, 10–24.

Ghysels, E., Carrasco, M., Chernov, M., and Florens, J. (2001), “Estimating Diffusions With a Continuum of Moment Conditions,” unpublished manuscript, University of North Carolina.

Granger, C. (1969), “Testing for Causality and Feedback,” Econometrica, 37, 424–438.

Granger, C. (1999), “The Evaluation of Econometric Models and of Forecasts,” invited lecture at the Far Eastern Meeting of the Econometric Society, Singapore.

Granger, C., and Pesaran, M. H. (2000), “A Decision Theoretic Approach to Forecasting Evaluation,” in Statistics and Finance: An Interface, eds. W. S. Chan, W. K. Li, and H. Tong, London: Imperial College, pp. 261–278.

Gray, S. (1996), “Modeling the Conditional Distribution of Interest Rates as a Regime-Switching Process,” Journal of Financial Economics, 42, 27–62.

Hamilton, J. D. (1989), “A New Approach to the Econometric Analysis of Nonstationary Time Series and the Business Cycle,” Econometrica, 57, 357–384.

Hong, Y. (1999), “Hypothesis Testing in Time Series via the Empirical Characteristic Function: A Generalized Spectral Density Approach,” Journal of the American Statistical Association, 94, 1201–1220.

Hong, Y. (2003), “Evaluation of Out-of-Sample Density Forecasts With Applications to Stock Prices,” unpublished manuscript, Cornell University, Dept. of Economics and Dept. of Statistical Science.

Hong, Y., and Lee, T. H. (2003a), “Inference on the Predictability of Foreign Exchange Rate Changes via Generalized Spectrum and Nonlinear Models,” Review of Economics and Statistics, 84, 1048–1062.

Hong, Y., and Lee, T. H. (2003b), “Diagnostic Checking for Nonlinear Time Series Models,” Econometric Theory, 19, 1065–1121.

Hong, Y., and Li, H. (2004), “Nonparametric Specification Testing for Continuous-Time Models With Applications to Interest Rate Term Structures,” Review of Financial Studies, forthcoming.

Jackwerth, J. C., and Rubinstein, M. (1996), “Recovering Probability Distributions From Option Prices,” Journal of Finance, 51, 1611–1631.

Jiang, G., and Knight, J. (1997), “A Nonparametric Approach to the Estimation of Diffusion Processes With an Application to a Short-Term Interest Rate Model,” Econometric Theory, 13, 615–645.

Johannes, M. (2003), “The Statistical and Economic Role of Jumps in Interest Rates,” Journal of Finance, 59, 227–260.

Jorion, P. (2000), Value at Risk: The New Benchmark for Managing Financial Risk, New York: McGraw-Hill.

Morgan, J. P. (1996), RiskMetrics–Technical Document (4th ed.), New York: Morgan Guaranty Trust Company.

Koedijk, K. G., Nissen, F. G., Schotman, P. C., and Wolff, C. C. P. (1997), “The Dynamics of Short-Term Interest Rate Volatility Reconsidered,” European Finance Review, 1, 105–130.

Li, H., and Xu, Y. (2000), “Regime Shifts and Short Rate Dynamics,” unpublished manuscript, Cornell University.

Lo, A., and MacKinlay, A. C. (1989), “Data-Snooping Biases in Tests of Financial Asset Pricing Models,” Review of Financial Studies, 3, 175–208.

Lukacs, E. (1970), Characteristic Functions (2nd ed.), London: Charles Griffin.

Meese, R., and Rogoff, K. (1983), “Empirical Exchange Rate Models of

the Seventies Do They Fit Out-of-Samplerdquo Journal of International Eco-nomics 14 3ndash24

Nelson D (1990) ldquoARCH Models as Diffusion Approximationrdquo Journal ofEconometrics 45 7ndash39

(1991) ldquoConditional Heteroskedasticity in Asset Returns A New Ap-proachrdquo Econometrica 59 347ndash370

Rosenblatt M (1952) ldquoRemarks on a Multivariate Transformationrdquo The An-nals of Mathematical Statistics 23 470ndash472

Rothschild M and Stiglitz J (1970) ldquoIncreasing Risk I A Definitionrdquo Jour-nal of Economic Theory 2 225ndash243

Singleton K (2001) ldquoEstimation of Affine Asset Pricing Models Using theEmpirical Characteristic Functionrdquo Journal of Econometrics 102 111ndash141

Soderlind P and Svensson L (1997) ldquoNew Techniques to Extract MarketExpectations From Financial Instrumentsrdquo Journal of Monetary Economics40 383ndash429

Stanton R (1997) ldquoA Nonparametric Model of Term Structure Dynamics andthe Market Price of Interest Rate Riskrdquo Journal of Finance 52 1973ndash2002

Vasicek O (1977) ldquoAn Equilibrium Characterization of the Term StructurerdquoJournal of Financial Economics 5 177ndash188

Watson M (1993) ldquoMeasures of Fit for Calibrated Modelsrdquo Journal of Politi-cal Economy 101 1011ndash1041

White H (2000) ldquoA Reality Check for Data Snoopingrdquo Econometrica 691097ndash1127


residuals $Z_t$ are not exactly the same as those of the original data $r_t$, they are highly correlated. In particular, the choice of $(m, l) = (1, 1)$, $(2, 2)$, $(3, 3)$, and $(4, 4)$ is very sensitive to autocorrelations in level, volatility, skewness, and kurtosis of $r_t$. Furthermore, the choices of $(m, l) = (1, 2)$ and $(2, 1)$ are sensitive to the “ARCH-in-mean” effect and the “leverage” effect. Different choices of order $(m, l)$ can thus illustrate various dynamic aspects of the underlying process $r_t$. (Again, see the Appendix for more discussion.)
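As a rough illustration of how the order $(m, l)$ picks out different kinds of dependence, the following Python sketch computes the sample correlation between $Z_t^m$ and $Z_{t-j}^l$. It is only a simplified stand-in for the statistics discussed in the Appendix, not Hong's (2003) test itself; the function name `cross_moment_corr` and its arguments are hypothetical.

```python
import numpy as np

def cross_moment_corr(z, m, l, lag):
    """Sample correlation between z_t**m and z_{t-lag}**l, for lag >= 1."""
    z = np.asarray(z, dtype=float)
    lead = z[lag:] ** m       # Z_t^m
    lagged = z[:-lag] ** l    # Z_{t-lag}^l
    return float(np.corrcoef(lead, lagged)[0, 1])

# Illustrative input only: simulated iid N(0,1) residuals, for which
# both correlations should be close to zero.
z = np.random.default_rng(0).standard_normal(1000)
level_dep = cross_moment_corr(z, 1, 1, lag=1)   # (m, l) = (1, 1): level
vol_dep = cross_moment_corr(z, 2, 2, lag=1)     # (m, l) = (2, 2): volatility
print(level_dep, vol_dep)
```

Under correct specification the generalized residuals should show no such dependence at any order, so a large value for a particular $(m, l)$ points to the corresponding feature (level, volatility, and so on) being mismodeled.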

3. SPOT RATE MODELS

We apply Hong's (2003) procedure to evaluate the density forecast performance of a variety of popular spot rate models, including single-factor diffusion, GARCH, regime-switching, and jump-diffusion models. We now discuss these models in detail.

3.1 Single-Factor Diffusion Models

One important class of spot rate models is the continuous-time diffusion models, which have been widely used in modern finance because of their convenience in pricing fixed-income securities. Specifically, the spot rate is assumed to follow a single-factor diffusion,

$$
dr_t = \mu(r_t, \theta)\,dt + \sigma(r_t, \theta)\,dW_t,
$$

where $\mu(r_t, \theta)$ and $\sigma(r_t, \theta)$ are the drift and diffusion functions and $W_t$ is a standard Brownian motion. For diffusion models, $\mu(r_t, \theta)$ and $\sigma(r_t, \theta)$ completely determine the model transition density, which in turn captures the full dynamics of $r_t$. Table 1 lists a variety of discretized diffusion models examined in our analysis, which are nested by Ait-Sahalia's (1996) nonlinear drift model,

$$
\Delta r_t = \alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} z_t, \tag{1}
$$

where $z_t \sim \mathrm{iid}\ N(0, 1)$. To be consistent with the other models considered in this article, we study the discretized version of the single-factor diffusion and jump-diffusion models. The discretization bias for daily data that we use in this article is unlikely to be significant (e.g., Stanton 1997; Das 2002). In (1), we allow the drift to have zero, linear, and nonlinear specifications, and allow the diffusion to be a constant or to depend on the interest rate level, which we refer to as the “level effect.” We term the volatility specification in which the elasticity parameter $\rho$ is estimated from the data the constant elasticity volatility (CEV). Density forecasts for the spot rate $r_t$ are given by the model-implied transition density $p(r_t, t \mid I_{t-1}, \hat{\theta})$, where $\hat{\theta}$ is a parameter estimator using the maximum likelihood estimate (MLE) method. Because we study discretized diffusion models, the MLE method is suitable. For the continuous-time diffusion models, we can use the closed-form likelihood function, if available, or the approximated likelihood function via Hermite expansions (see Ait-Sahalia 1999, 2002).
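Because the shock in (1) is Gaussian, the transition density of the discretized model is normal with mean equal to the drift and standard deviation $\sigma r_{t-1}^{\rho}$, so the log-likelihood is a sum of normal log-densities. The sketch below writes this out in Python; the function name, the parameter ordering, and the toy data are assumptions made for illustration, and the period dummies used later in the article are omitted.

```python
import numpy as np

def diffusion_loglik(params, r):
    """Gaussian log-likelihood of the discretized nonlinear-drift model (1).

    params = (a_m1, a0, a1, a2, sigma, rho); r is a 1-D array of positive rates.
    """
    a_m1, a0, a1, a2, sigma, rho = params
    r = np.asarray(r, dtype=float)
    r_lag = r[:-1]                     # r_{t-1}
    dr = np.diff(r)                    # Delta r_t = r_t - r_{t-1}
    mean = a_m1 / r_lag + a0 + a1 * r_lag + a2 * r_lag**2
    var = (sigma * r_lag**rho) ** 2    # conditional variance of Delta r_t
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (dr - mean) ** 2 / var)))

# Toy example (made-up numbers): a Vasicek-type special case with
# a_m1 = a2 = 0 and rho = 0, evaluated on a short artificial series.
toy_rates = np.array([5.0, 5.1, 5.0, 4.9, 5.05])
toy_ll = diffusion_loglik((0.0, 0.02, -0.004, 0.0, 0.1, 0.0), toy_rates)
print(toy_ll)
```

Setting subsets of the parameters to zero (or $\rho$ to 0, .5, or 1) recovers the restricted models of Table 1, so one function of this kind can serve all eight diffusion specifications.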

3.2 GARCH Models

Despite the popularity of single-factor diffusion models, many studies (e.g., Brenner, Harjes, and Kroner 1996; Andersen and Lund 1997) have shown that these models are unreasonably restrictive by requiring that the interest rate volatility depend solely on the interest rate level. To capture the well-known persistent volatility clustering in interest rates, Brenner et al. (1996) introduced various GARCH specifications for volatility and showed that GARCH models significantly outperform single-factor diffusion models for in-sample fit.

To understand the importance of GARCH for density forecast, we consider six GARCH models, as listed in Table 1, which are nested by the following specification:

$$
\begin{aligned}
\Delta r_t &= \alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} \sqrt{h_t}\, z_t, \\
h_t &= \beta_0 + h_{t-1}\left(\beta_2 + \beta_1 r_{t-1}^{2\rho} z_{t-1}^2\right), \\
z_t &\sim \mathrm{iid}\ N(0, 1).
\end{aligned}
\tag{2}
$$

We consider three different drift specifications (zero, linear, and nonlinear drift) and two volatility specifications (pure GARCH and combined CEV–GARCH). These various GARCH models allow us to examine the importance of linear versus nonlinear drift in the presence of GARCH and CEV, and the incremental contribution of GARCH with respect to CEV. For identification, we set $\sigma = 1$ in all GARCH models.
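The variance recursion in (2) can be traced through with a short Python sketch. It takes the shocks (rate changes minus the drift) as given and rebuilds $h_t$ step by step with $\sigma = 1$. The function name and the starting value `h0` are assumptions for illustration; the article does not say how $h_t$ is initialized.

```python
import numpy as np

def cev_garch_variance(eps, r_lag, rho, b0, b1, b2, h0):
    """Path of h_t from specification (2) with sigma = 1.

    eps[i]   : shock Delta r_t - drift, equal to r_{t-1}^rho * sqrt(h_t) * z_t
    r_lag[i] : lagged rate r_{t-1} belonging to the same observation
    h0       : starting value for h (hypothetical; not given in the article)
    """
    eps = np.asarray(eps, dtype=float)
    r_lag = np.asarray(r_lag, dtype=float)
    h = np.empty_like(eps)
    h[0] = h0
    for i in range(1, len(eps)):
        # Squared standardized shock z_{t-1}^2 implied by the previous step.
        z2_prev = eps[i - 1] ** 2 / (r_lag[i - 1] ** (2 * rho) * h[i - 1])
        h[i] = b0 + h[i - 1] * (b2 + b1 * r_lag[i] ** (2 * rho) * z2_prev)
    return h

# With rho = 0 the recursion collapses to a standard GARCH(1,1):
# h_t = b0 + b2 * h_{t-1} + b1 * eps_{t-1}^2.
h_path = cev_garch_variance([1.0, 2.0, 0.5], [1.0, 1.0, 1.0],
                            rho=0.0, b0=0.1, b1=0.2, b2=0.7, h0=1.0)
print(h_path)
```

The pure GARCH models of Table 1 correspond to `rho=0.0`, and the CEV–GARCH models to a freely estimated `rho`.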

Another popular approach to capturing volatility clustering is the continuous-time stochastic volatility model considered by Andersen and Lund (1997), Gallant and Tauchen (1998), and many others. These studies show that adding a latent stochastic volatility factor to a diffusion model significantly improves the goodness of fit for interest rates. We leave continuous-time stochastic volatility models for future research, because their estimation is much more involved. On the other hand, we also need to be careful about drawing any implications from GARCH models for continuous-time stochastic volatility models. Although Nelson (1990) showed that GARCH models converge to stochastic volatility models in the limit, Corradi (2000) later showed that Nelson's results hold only under his particular discretization scheme. Other, equally reasonable discretizations will lead to very different continuous-time limits for GARCH models.

3.3 Markov Regime-Switching Models

Many studies have shown that the behavior of spot rates changes over time due to changes in monetary policy, the business cycle, and general macroeconomic conditions. The Markov regime-switching models of Hamilton (1989) have been widely used to model the time-varying behavior of interest rates (e.g., Gray 1996; Ball and Torous 1998; Ang and Bekaert 2002; Li and Xu 2000; among others). Following most existing studies, we consider a class of two-regime models for the spot rate, where the latent state variable $s_t$ follows a two-state, first-order Markov chain. We refer to the regime in which $s_t = 1$ (2) as the first (second) regime. Following Ang and Bekaert (2002), we assume that the transition probability of $s_t$ depends on the once-lagged spot rate level:

$$
\Pr(s_t = l \mid s_{t-1} = l) = \frac{1}{1 + \exp(-c_l - d_l r_{t-1})}, \qquad l = 1, 2.
$$

Table 1 lists a variety of regime-switching models, all of which are nested by the following specification:

$$
\begin{aligned}
\Delta r_t &= \alpha_{-1}(s_t) r_{t-1}^{-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2 + \sigma(s_t) r_{t-1}^{\rho(s_t)} \sqrt{h_t}\, z_t, \\
h_t &= \beta_0 + \beta_1 E\{e_t \mid r_{t-2}, s_{t-2}\}^2 + \beta_2 h_{t-1}, \\
e_t &= [r_{t-1} - E(r_{t-1} \mid r_{t-2}, s_{t-1})]/\sigma(s_{t-1}), \\
z_t &\sim \mathrm{iid}\ N(0, 1).
\end{aligned}
\tag{3}
$$

As in the aforementioned models, we consider three specifications of the conditional mean: zero, linear, and nonlinear drift. We also consider three specifications of the conditional variance: CEV, GARCH, and combined CEV–GARCH. Thus we have a total of nine regime-switching models.

Table 1. Spot Rate Models Considered for Density Forecast Evaluation

| Model | Mean | Volatility |
|---|---|---|
| Discretized single-factor diffusion models | | |
| Random walk | $\alpha_0$ | $\sigma$ |
| Log-normal | $\alpha_1 r_{t-1}$ | $\sigma r_{t-1}$ |
| Dothan | 0 | $\sigma r_{t-1}$ |
| Pure CEV | 0 | $\sigma r_{t-1}^{\rho}$ |
| Vasicek | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma$ |
| CIR | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{.5}$ |
| CKLS | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{\rho}$ |
| Nonlinear drift | $\alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma r_{t-1}^{\rho}$ |
| GARCH models | | |
| No drift GARCH | 0 | $\sigma \sqrt{h_t}$ |
| Linear drift GARCH | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma \sqrt{h_t}$ |
| Nonlinear drift GARCH | $\alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma \sqrt{h_t}$ |
| No drift CEV–GARCH | 0 | $\sigma r_{t-1}^{\rho} \sqrt{h_t}$ |
| Linear drift CEV–GARCH | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{\rho} \sqrt{h_t}$ |
| Nonlinear drift CEV–GARCH | $\alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma r_{t-1}^{\rho} \sqrt{h_t}$ |
| Markov regime-switching models | | |
| No drift RS CEV | 0 | $\sigma(s_t) r_{t-1}^{\rho(s_t)}$ |
| Linear drift RS CEV | $\alpha_0(s_t) + \alpha_1(s_t) r_{t-1}$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)}$ |
| Nonlinear drift RS CEV | $\alpha_{-1}(s_t) r_{t-1}^{-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)}$ |
| No drift RS GARCH | 0 | $\sigma(s_t) \sqrt{h_t}$ |
| Linear drift RS GARCH | $\alpha_0(s_t) + \alpha_1(s_t) r_{t-1}$ | $\sigma(s_t) \sqrt{h_t}$ |
| Nonlinear drift RS GARCH | $\alpha_{-1}(s_t) r_{t-1}^{-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2$ | $\sigma(s_t) \sqrt{h_t}$ |
| No drift RS CEV–GARCH | 0 | $\sigma(s_t) r_{t-1}^{\rho(s_t)} \sqrt{h_t}$ |
| Linear drift RS CEV–GARCH | $\alpha_0(s_t) + \alpha_1(s_t) r_{t-1}$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)} \sqrt{h_t}$ |
| Nonlinear drift RS CEV–GARCH | $\alpha_{-1}(s_t) r_{t-1}^{-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)} \sqrt{h_t}$ |
| Discretized jump-diffusion models | | |
| No drift JD CEV | 0 | $\sigma r_{t-1}^{\rho}$ |
| Linear drift JD CEV | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{\rho}$ |
| Nonlinear drift JD CEV | $\alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma r_{t-1}^{\rho}$ |
| No drift JD GARCH | 0 | $\sigma \sqrt{h_t}$ |
| Linear drift JD GARCH | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma \sqrt{h_t}$ |
| Nonlinear drift JD GARCH | $\alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma \sqrt{h_t}$ |
| No drift JD CEV–GARCH | 0 | $\sigma r_{t-1}^{\rho} \sqrt{h_t}$ |
| Linear drift JD CEV–GARCH | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{\rho} \sqrt{h_t}$ |
| Nonlinear drift JD CEV–GARCH | $\alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma r_{t-1}^{\rho} \sqrt{h_t}$ |

NOTE: For consistency, all models are estimated in a discrete-time setting using the maximum likelihood (MLE) method. The eight discretized single-factor diffusion models are nested by the following specification: $\Delta r_t = \alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} z_t$, where $z_t \sim \mathrm{iid}\ N(0, 1)$. The six GARCH models are nested by the following specification: $\Delta r_t = \alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} \sqrt{h_t}\, z_t$, where $h_t = \beta_0 + h_{t-1}(\beta_2 + \beta_1 r_{t-1}^{2\rho} z_{t-1}^2)$ and $z_t \sim \mathrm{iid}\ N(0, 1)$. The nine regime-switching models are nested by the following specification: $\Delta r_t = \alpha_{-1}(s_t) r_{t-1}^{-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2 + \sigma(s_t) r_{t-1}^{\rho(s_t)} \sqrt{h_t}\, z_t$, where $h_t = \beta_0 + \beta_1 E\{e_t \mid r_{t-2}, s_{t-2}\}^2 + \beta_2 h_{t-1}$, $e_t = [r_{t-1} - E(r_{t-1} \mid r_{t-2}, s_{t-1})]/\sigma(s_{t-1})$, $z_t \sim \mathrm{iid}\ N(0, 1)$, and $s_t$ follows a two-state Markov chain with the transition probability $\Pr(s_t = l \mid s_{t-1} = l) = (1 + \exp(-c_l - d_l r_{t-1}))^{-1}$ for $l = 1, 2$. The nine discretized jump-diffusion models are nested by the following specification: $\Delta r_t = \alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} \sqrt{h_t}\, z_t + J(\mu, \gamma^2)\pi_t(q_t)$, where $h_t = \beta_0 + \beta_1 [r_{t-1} - E(r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}$, $z_t \sim \mathrm{iid}\ N(0, 1)$, $J \sim N(\mu, \gamma^2)$, and $\pi_t(q_t)$ is Bernoulli($q_t$) with $q_t = (1 + \exp(-c - d r_{t-1}))^{-1}$.

Although Gray (1996) removed the path-dependent nature of GARCH models by averaging the conditional and unconditional variances over regimes at every time point, we use the same GARCH specification across different regimes. Confirming results of Ang and Bekaert (2002), we find a very unstable estimation of Gray's (1996) model. In contrast, our specification turns out to have much better convergence properties. Unlike many previous studies that set the elasticity parameter equal to .5, we allow it to be regime-dependent and estimate it from the data. For identification, we set the diffusion constant $\sigma(s_t) = 1$ for $s_t = 1$ in the regime-switching models with GARCH effect.

The conditional density of the interest rate $r_t$ in a regime-switching model is

$$
p(r_t \mid I_{t-1}) = \sum_{l=1}^{2} p(r_t \mid s_t = l, I_{t-1}) \Pr(s_t = l \mid I_{t-1}),
$$

where the ex ante probability that the data are generated from regime $l$ at $t$, $\Pr(s_t = l \mid I_{t-1})$, can be obtained using Bayes's rule via a recursive procedure described by Hamilton (1989). The conditional density of regime-switching models is a mixture of two normal distributions, which can generate unimodal or bimodal distributions and allows for great flexibility in modeling skewness, kurtosis, and heavy tails.
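The recursion behind the ex ante probabilities can be sketched in Python as follows. The code takes the regime-conditional densities $p(r_t \mid s_t = l, I_{t-1})$ as inputs rather than computing them from a particular model, and the function names, the equal-weight starting probabilities, and the toy numbers are all assumptions for illustration.

```python
import numpy as np

def stay_prob(c, d, r_lag):
    """Logistic probability of remaining in a regime, given the lagged rate."""
    return 1.0 / (1.0 + np.exp(-c - d * np.asarray(r_lag, dtype=float)))

def hamilton_filter(dens, p11, p22, prior=(0.5, 0.5)):
    """Mixture densities and ex ante regime probabilities for two regimes.

    dens[t, l] : p(r_t | s_t = l + 1, I_{t-1})
    p11, p22   : per-period probabilities of staying in regime 1 and regime 2
    prior      : starting Pr(s_0 = l | I_0) (hypothetical choice)
    """
    dens = np.asarray(dens, dtype=float)
    n = dens.shape[0]
    ex_ante = np.empty((n, 2))
    mix = np.empty(n)
    filt = np.asarray(prior, dtype=float)      # Pr(s_{t-1} = l | I_{t-1})
    for t in range(n):
        trans = np.array([[p11[t], 1.0 - p11[t]],
                          [1.0 - p22[t], p22[t]]])
        ex_ante[t] = filt @ trans               # Pr(s_t = l | I_{t-1})
        joint = ex_ante[t] * dens[t]
        mix[t] = joint.sum()                    # p(r_t | I_{t-1})
        filt = joint / mix[t]                   # Bayes update to Pr(s_t = l | I_t)
    return mix, ex_ante

# Toy inputs (made-up numbers) with constant staying probabilities.
mix, ex_ante = hamilton_filter([[1.0, 0.0], [0.5, 0.5]],
                               p11=[0.9, 0.9], p22=[0.8, 0.8])
print(mix, ex_ante)
```

In the models of this section the staying probabilities would come from `stay_prob` evaluated at $r_{t-1}$, and summing the logarithms of `mix` gives the log-likelihood that the MLE maximizes.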

3.4 Jump-Diffusion Models

There are compelling economic and statistical reasons to account for discontinuity in interest rate dynamics. Various economic shocks, news announcements, and government interventions in bond markets have pronounced effects on the behavior of interest rates and tend to generate large jumps in interest rate data. Statistically, Das (2002) and Johannes (2003), among others, have shown that diffusion models (even with stochastic volatility) cannot generate the excessive leptokurtosis exhibited by the changes of the spot rates, and that jump-diffusion models are a convenient way to generate excessive kurtosis or, more generally, heavy tails.

We consider a class of discretized jump-diffusion models, listed in Table 1. As in the aforementioned models, we consider zero, linear, and nonlinear drift specifications, and CEV, GARCH, and combined CEV–GARCH specifications for volatility. The nine different jump-diffusion models are nested by the following specification:

$$
\begin{aligned}
\Delta r_t &= \alpha_{-1} r_{t-1}^{-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} \sqrt{h_t}\, z_t + J(\mu, \gamma^2)\pi_t(q_t), \\
h_t &= \beta_0 + \beta_1 [r_{t-1} - E(r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}, \\
z_t &\sim \mathrm{iid}\ N(0, 1), \\
\pi_t(q_t) &\sim \text{independent Bernoulli}(q_t), \\
J &\sim N(\mu, \gamma^2),
\end{aligned}
\tag{4}
$$

where $J$ is the jump size and $q_t$ is the jump probability, with $q_t = \dfrac{1}{1 + \exp(-c - d r_{t-1})}$.

The conditional density of the foregoing jump-diffusion models can be written as

$$
\begin{aligned}
f(r_t \mid r_{t-1}) ={}& (1 - q_t)\,\frac{1}{\sqrt{2\pi\sigma^2(r_{t-1})}} \exp\left\{-\frac{[\Delta r_t - \mu(r_{t-1})]^2}{2\sigma^2(r_{t-1})}\right\} \\
&+ q_t\,\frac{1}{\sqrt{2\pi[\sigma^2(r_{t-1}) + \gamma^2]}} \exp\left\{-\frac{[\Delta r_t - \mu(r_{t-1}) - \mu]^2}{2[\sigma^2(r_{t-1}) + \gamma^2]}\right\},
\end{aligned}
$$

where $\mu(r_{t-1})$ and $\sigma^2(r_{t-1})$ are the conditional mean and variance of the diffusion part in (4). Similar to the regime-switching models, the conditional density of the jump-diffusion models is also a mixture of two normal distributions. Thus these two classes of models represent two alternative approaches to generating excess kurtosis and heavy tails, although the regime-switching models have more sophisticated specifications. For example, in (3) all drift parameters are regime-dependent, whereas in (4) only the intercept term is different in the conditional mean and variance. For the regime-switching models in (3), the state probabilities evolve according to a transition matrix with updated priors at every time point. Thus these models can capture both very persistent and transient regime shifts. For the jump-diffusion models in (4), the state probabilities are assumed to depend on the past interest rate level.
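The two-component mixture above is straightforward to evaluate numerically. The Python sketch below does so for given values of the diffusion mean and variance; the function names and every numerical value in the usage lines are made up for illustration and are not estimates from the article.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def jump_diffusion_density(dr, mean, var, r_lag, c, d, mu_j, gamma2):
    """Mixture density of the rate change dr implied by specification (4).

    mean, var    : conditional mean and variance of the diffusion part
    c, d         : parameters of the logistic jump probability q_t
    mu_j, gamma2 : mean and variance of the normal jump size J
    """
    q = 1.0 / (1.0 + np.exp(-c - d * r_lag))            # jump probability q_t
    no_jump = normal_pdf(dr, mean, var)                 # no jump occurs
    jump = normal_pdf(dr, mean + mu_j, var + gamma2)    # one jump occurs
    return (1.0 - q) * no_jump + q * jump

# Illustrative values only: evaluate the density on a grid and check that it
# integrates to (approximately) one with a simple Riemann sum.
grid = np.arange(-10.0, 10.0, 0.001)
dens = jump_diffusion_density(grid, mean=0.0, var=0.02, r_lag=6.0,
                              c=-2.7, d=0.15, mu_j=0.02, gamma2=0.19)
total_mass = float(dens.sum() * 0.001)
print(total_mass)
```

Because the jump component has the larger variance $\sigma^2(r_{t-1}) + \gamma^2$, even a small jump probability thickens the tails relative to a single normal density.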

The in-sample performances of the four classes of models that we study have been individually studied in the literature. However, no one has yet attempted a systematic evaluation of all of these models in a unified setup, particularly in the out-of-sample context, because of the difficulty in density forecast evaluation and because of different nonnested specifications between existing classes of models. In the next section we compare the relative performance of these four classes of models in out-of-sample density forecast using the evaluation method described in Section 2. This comparison can provide valuable information about the relative strengths and weaknesses of each model and should be important for many applications.

4. MODEL ESTIMATION AND IN-SAMPLE PERFORMANCE

4.1 Data and Estimation Method

In modeling spot interest rate dynamics, yields on short-term debts are often used as proxies for the unobservable instantaneous risk-free rate. These include the 1-month T-bill rates used by Gray (1996) and Chan, Karolyi, Longstaff, and Sanders (CKLS) (1992); the 3-month T-bill rates used by Stanton (1997) and Andersen and Lund (1997); the 7-day Eurodollar rates used by Ait-Sahalia (1996) and Hong and Li (2004); and the Fed funds rates used by Conley, Hansen, Luttmer, and Scheinkman (1997) and Das (2002). In our study we follow CKLS (1992) and use the daily 1-month T-bill rates from June 14, 1961 to December 29, 2000, with a total of 9,868 observations. The data are extracted from the CRSP bond file using the midpoint of quoted bid and ask prices of the T-bills with a remaining maturity that is closest to 1 month (30 calendar days) from the current date. Most of these T-bills have maturities of 6 months or 1 year when first issued, and all T-bills used in our study have a remaining maturity between 27 and 33 days. We then compute the annualized continuously compounded yield based on the average of the ask and bid prices.

Figure 1 plots the level and change series of the daily 1-month T-bill rates, as well as their histograms. There is obvious persistent volatility clustering, and in general the volatility is higher at a higher level of the interest rate (e.g., the 1979–1982 period). The marginal distribution of the interest rate level is skewed to the right, with a long right tail. Most daily changes in the interest rate level are very small, giving a sharp probability peak around 0. Daily changes of other spot interest rates, such as 3-month T-bill and 7-day Eurodollar rates, also exhibit a high peak around 0. But the peak is much less pronounced for weekly changes in spot rates. This, together with the long (right) tail, implies excess kurtosis, a stylized fact that has motivated the use of jump models in the literature (e.g., Das 2002; Johannes 2003).

Figure 1. Daily 1-Month T-Bill Rates Between June 14, 1961 and December 29, 2000. This figure plots the level and change series of the daily 1-month T-bill rates, as well as their histograms. The data are extracted from the CRSP bond file using the midpoint of quoted bid and ask prices of the T-bills with a remaining maturity that is closest to 1 month (30 calendar days) from the current date.

We divide our data into two subsamples. The first, from June 14, 1961 to February 22, 1991 (with a total of 7,400 observations), is a sample used to estimate model parameters; the second, from February 23, 1991 to December 29, 2000 (with a total of 2,467 observations), is a prediction sample used to evaluate out-of-sample density forecasts. We first consider the in-sample performance of the various models listed in Table 1, all of which are estimated using the MLE method. The optimization algorithm is the well-known BHHH, with STEPBT for step length calculation, and is implemented via the constrained optimization code in GAUSS Windows Version 5.0. The optimization tolerance level is set such that the gradients of the parameters are less than or equal to $10^{-6}$.

Tables 2–5 report parameter estimates of four classes of spot rate models using daily 1-month T-bill rates from June 14, 1961 to February 22, 1991, with a total of 7,400 observations. To account for the special period between October 1979 and September 1982, we introduce dummy variables to the drift, volatility, and elasticity parameters; that is, $\alpha_D$, $\sigma_D$, and $\rho_D$ equal 0 outside of the 1979–1982 period. We also introduce a dummy variable, $D_{87}$, in the interest rate level to account for the effect of the 1987 stock market crash; $D_{87}$ equals 0 except for the week right after the stock market crash. Estimated robust standard errors are reported in parentheses.

4.2 In-Sample Evidence

We first discuss the in-sample performance of models within each class and then compare their performance across different classes. Many previous studies (e.g., Bliss and Smith 1998) have shown that the period between October 1979 and September 1982, which coincides with the Federal Reserve experiment, has a big impact on model parameter estimates. To account for the special feature of this period, we introduce dummy variables in the drift, volatility, and elasticity parameters for this period. We also introduce a dummy variable in the interest rate level for the week right after the 1987 stock market crash. Brenner et al. (1996) showed that the decline of the interest rates during this week is too dramatic to be captured by most standard models.

Table 2 reports parameter estimates, with estimated robust standard errors, and log-likelihood values for discretized single-factor diffusion models. The estimates of the drift parameters of the Vasicek, CIR, and CKLS models all suggest mean reversion in the conditional mean, with an estimated 6% long-run mean. For other models, such as the random walk, log-normal, and nonlinear drift models, most drift parameters are not significant. A comparison of the pure CEV, CKLS, and Ait-Sahalia (1996) nonlinear drift models indicates that the incremental contribution of nonlinear drift is very marginal, which is not surprising given that the drift parameter estimates in the nonlinear drift model are mostly insignificant. On the other hand, there is also clear evidence of a level effect; all estimates of the elasticity parameter are significant. Unlike previous studies (e.g., CKLS 1992), which found an estimated elasticity parameter close to 1.5, our estimate is about .25 (.70) outside (inside) the 1979–1982 period. Our results confirm previous findings of Brenner et al. (1996), Andersen and Lund (1997), Bliss and Smith (1998), and Koedijk, Nissen, Schotman, and Wolff (1997) that the estimate of the elasticity parameter is very sensitive to the choice of interest rate data, data frequency, sample periods, and specifications of the volatility function. The 1979–1982 period dummies suggest that the drift does not behave very differently (i.e., $\alpha_D$ is not significant), whereas volatility is significantly higher and depends more heavily on the interest rate level in this period (i.e., $\sigma_D$ and $\rho_D$ are significantly positive) than in other periods. For models with level effect, we introduce the dummy only for elasticity parameter $\rho$ to avoid the offsetting effect of dummies in $\rho$ and $\sigma$, which would tend to generate unstable estimates. The stock market crash dummy is significantly negative, which is consistent with the findings of Brenner et al. (1996).

Table 2. Parameter Estimates for the Single-Factor Diffusion Models

| Parameters | RW | Log-normal | Dothan | Pure CEV | Vasicek | CIR | CKLS | Nonlinear drift |
|---|---|---|---|---|---|---|---|---|
| $\alpha_{-1}$ | | | | | | | | .0289 (.0751) |
| $\alpha_0$ | .0012 (.0019) | | | | .0196 (.0056) | .0190 (.0049) | .0199 (.0053) | −.0098 (.0485) |
| $\alpha_1$ | | .0006 (.0004) | | | −.0033 (.0010) | −.0032 (.0009) | −.0034 (.0009) | .0045 (.0095) |
| $\alpha_2$ | | | | | | | | −.0006 (.0006) |
| $\sigma$ | .1523 (.0013) | .0304 (.0003) | .0304 (.0003) | .1011 (.0041) | .1522 (.0013) | .0658 (.0006) | .1007 (.0041) | .1007 (.0041) |
| $\rho$ | | | | .2437 (.0238) | | | .2460 (.0238) | .2460 (.0238) |
| $\alpha_D$ | −.0047 (.0157) | .0196 (.0186) | | | .0154 (.0168) | .0233 (.0176) | .0266 (.0168) | .0402 (.0202) |
| $\sigma_D$ | .2768 (.0112) | .0175 (.0013) | .0175 (.0013) | | .2764 (.0112) | .0712 (.0036) | | |
| $\rho_D$ | | | | .3692 (.0132) | | | .3684 (.0132) | .3680 (.0132) |
| $D_{87}$ | −.3988 (.0681) | −.2107 (.0660) | −.2079 (.0660) | −.3549 (.0671) | −.4003 (.0681) | −.3101 (.0655) | −.3574 (.0670) | −.3582 (.0671) |
| Log-likelihood | 2,649.6 | 2,187.0 | 2,184.9 | 2,654.5 | 2,655.5 | 2,672.0 | 2,661.7 | 2,662.5 |

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1} r_{t-1}^{-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (\sigma + \sigma_D) r_{t-1}^{(\rho + \rho_D)} z_t + D_{87}$, where $z_t \sim \mathrm{iid}\ N(0, 1)$.
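As a worked check on the mean-reversion claim for the linear-drift models, note that the drift $\alpha_0 + \alpha_1 r_{t-1}$ is zero at the long-run level $r^{*}$. Reading the Vasicek estimates in Table 2 as $\alpha_0 = .0196$ and $\alpha_1 = -.0033$ (an assumption about where the decimal points fall; the ratio is unaffected as long as both share the same scaling), this gives

$$
0 = \alpha_0 + \alpha_1 r^{*} \quad\Longrightarrow\quad r^{*} = -\frac{\alpha_0}{\alpha_1} = \frac{.0196}{.0033} \approx 5.9,
$$

which is in line with the roughly 6% long-run mean quoted in the text. The CIR and CKLS estimates give similar values ($.0190/.0032 \approx 5.9$ and $.0199/.0034 \approx 5.9$), and the negative sign of $\alpha_1$ is what pulls the rate back toward $r^{*}$.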

Estimation results of GARCH models in Table 3 show that GARCH significantly improves the in-sample fit of diffusion models (i.e., the log-likelihood value increases from about 2,500 to about 5,700). Previous studies, such as those by Bali (2001) and Durham (2003), have also shown that for single-factor diffusion, GARCH, and continuous-time stochastic volatility models, it is more important to correctly model the diffusion function than the drift function in fitting interest rate data. This suggests that volatility clustering is an important feature of interest rates. All estimates of GARCH parameters are overwhelmingly significant. The sum of GARCH parameter estimates, $\beta_1 + \beta_2$, is slightly larger (smaller) than 1 without (with) level effect. Although the sum of GARCH parameters is slightly larger than 1, it is still possible that the spot rate model is strictly stationary (see Nelson 1991 for more discussion). We also find a significant level effect even in the presence of GARCH, but the estimated elasticity parameter is close to .20, which is much smaller than that of single-factor diffusion models. The specification of conditional variance also affects the estimation of drift parameters. Unlike those in diffusion models, most drift parameter estimates in linear drift GARCH models are insignificant. Among all GARCH models considered, the one with nonlinear drift and level effect has the best in-sample performance. In the 1979–1982 period, the interest rate volatility is significantly higher, but its dependence on the interest rate level is not significantly different from other periods. The 1987 stock market crash dummy is again significantly negative.

Table 3. Parameter Estimates for the GARCH Models

| Parameters | No drift GARCH | Linear drift GARCH | Nonlinear drift GARCH | No drift CEV–GARCH | Linear drift CEV–GARCH | Nonlinear drift CEV–GARCH |
|---|---|---|---|---|---|---|
| $\alpha_{-1}$ | | | .1556 (.0470) | | | .1716 (.0457) |
| $\alpha_0$ | | .0053 (.0025) | −.1117 (.0323) | | .0063 (.0025) | −.1233 (.0311) |
| $\alpha_1$ | | −.0003 (.0006) | .0263 (.0069) | | −.0006 (.0006) | .0290 (.0066) |
| $\alpha_2$ | | | −.0018 (.0004) | | | −.0020 (.0004) |
| $\rho$ | | | | .1709 (.0298) | .1797 (.0301) | .1926 (.0300) |
| $\beta_0$ | 9.7E−05 (1.2E−05) | 8.6E−05 (1.2E−05) | 8.6E−05 (1.6E−05) | 8.9E−05 (1.0E−05) | 8.2E−05 (.9E−05) | 8.2E−05 (.9E−05) |
| $\beta_1$ | .1552 (.0097) | .1544 (.0096) | .1563 (.0099) | .0896 (.0104) | .0875 (.0101) | .0851 (.0098) |
| $\beta_2$ | .8707 (.0065) | .8723 (.0064) | .8712 (.0065) | .8583 (.0074) | .8582 (.0076) | .8553 (.0078) |
| $\alpha_D$ | | .0166 (.0318) | .1119 (.0229) | | .0550 (.0282) | .1317 (.0201) |
| $\sigma_D$ | .1368 (.0327) | .1302 (.0331) | .1017 (.0334) | | | |
| $\rho_D$ | | | | .0151 (.0134) | .0064 (.0146) | −.0053 (.0142) |
| $D_{87}$ | −.5397 (.0774) | −.5426 (.0777) | −.5460 (.0771) | −.5288 (.0753) | −.5318 (.0751) | −.5352 (.0741) |
| Log-likelihood | 5,686.8 | 5,699.5 | 5,706.3 | 5,701.1 | 5,715.2 | 5,725.5 |

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1} r_{t-1}^{-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (1 + \sigma_D) r_{t-1}^{(\rho + \rho_D)} \sqrt{h_t}\, z_t + D_{87}$, where $h_t = \beta_0 + h_{t-1}(\beta_2 + \beta_1 r_{t-1}^{2\rho} z_{t-1}^2)$ and $z_t \sim \mathrm{iid}\ N(0, 1)$.

Table 4. Parameter Estimates for the Markov Regime-Switching Models

| Parameters | No drift CEV | Linear drift CEV | Nonlinear drift CEV | No drift GARCH | Linear drift GARCH | Nonlinear drift GARCH | No drift CEV–GARCH | Linear drift CEV–GARCH | Nonlinear drift CEV–GARCH |
|---|---|---|---|---|---|---|---|---|---|
| $\alpha_{-1}(1)$ | | | .0607 (.3569) | | | .3321 (.1701) | | | .2646 (.1605) |
| $\alpha_0(1)$ | | .0973 (.0259) | .1517 (.2027) | | .0457 (.0107) | −.1462 (.1009) | | .0450 (.0105) | −.1086 (.0989) |
| $\alpha_1(1)$ | | −.0102 (.0031) | −.0288 (.0321) | | −.0043 (.0019) | .0274 (.0181) | | −.0042 (.0019) | .0209 (.0185) |
| $\alpha_2(1)$ | | | .0011 (.0014) | | | −.0015 (.0010) | | | −.0011 (.0011) |
| $\sigma(1)$ | .3361 (.0302) | .3171 (.0287) | .3106 (.0281) | 1 | 1 | 1 | 1 | 1 | 1 |
| $\rho(1)$ | .1021 (.044) | .1277 (.0446) | .138 (.0448) | 0 | 0 | 0 | .1193 (.0452) | .1589 (.0460) | .1670 (.0487) |
| $\alpha_{-1}(2)$ | | | −.0352 (.0329) | | | −.0267 (.0242) | | | −.0286 (.0183) |
| $\alpha_0(2)$ | | −.0002 (.0020) | .0141 (.0229) | | −.0013 (.0017) | .0062 (.0153) | | −.0020 (.0018) | .0062 (.0104) |
| $\alpha_1(2)$ | | −.0006 (.0004) | −.0012 (.0048) | | −.0008 (.0004) | −.0001 (.0031) | | −.0007 (.0004) | .0005 (.0020) |
| $\alpha_2(2)$ | | | −.0001 (.0003) | | | −.0002 (.0002) | | | −.0002 (.0001) |
| $\sigma(2)$ | .0143 (.0010) | .0141 (.0010) | .0138 (.0010) | .2566 (.006) | .2567 (.006) | .2574 (.006) | .2769 (.0275) | .3034 (.0303) | .3051 (.0311) |
| $\rho(2)$ | .8122 (.0403) | .8179 (.0422) | .8257 (.0421) | 0 | 0 | 0 | .0844 (.0602) | .0727 (.0602) | .0800 (.0611) |
| $\beta_0$ | | | | 3.83E−4 (7.11E−5) | 3.18E−4 (6.14E−5) | 2.96E−4 (5.79E−5) | 3.21E−4 (.67E−4) | 2.43E−4 (.50E−4) | 2.25E−4 (.47E−4) |
| $\beta_1$ | | | | .0357 (.004) | .0336 (.0039) | .0335 (.0038) | .0265 (.0057) | .0259 (.0055) | .0252 (.0053) |
| $\beta_2$ | | | | .9027 (.009) | .9066 (.0089) | .9077 (.0086) | .8972 (.0099) | .8993 (.0098) | .9001 (.0094) |
| $c_1$ | .3132 (.2734) | .3001 (.273) | .3171 (.2683) | −1.2515 (.3211) | −1.2007 (.3065) | −1.1293 (.3136) | −1.1954 (.3819) | −1.1429 (.3705) | −1.0012 (.3674) |
| $d_1$ | .1081 (.0391) | .107 (.039) | .1029 (.0381) | .1303 (.044) | .1295 (.0426) | .1192 (.0447) | .1190 (.0594) | .1193 (.0582) | .0959 (.0593) |
| $c_2$ | 4.0963 (.2386) | 4.0342 (.2393) | 4.0089 (.2313) | 1.8864 (.2048) | 1.7868 (.1947) | 1.7633 (.1909) | 1.8227 (.2092) | 1.7299 (.1983) | 1.6947 (.2009) |
| $d_2$ | −.2302 (.0354) | −.225 (.0354) | −.2225 (.0339) | −.0938 (.0276) | −.0806 (.0263) | −.0776 (.0257) | −.0906 (.0285) | −.0810 (.0276) | −.0757 (.0283) |
| Log-likelihood | 6,493.1 | 6,507.9 | 6,512.8 | 7,079.0 | 7,132.8 | 7,138.2 | 7,082.8 | 7,138.6 | 7,144.1 |

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}(s_t) r_{t-1}^{-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2 + \sigma(s_t) r_{t-1}^{\rho(s_t)} \sqrt{h_t}\, z_t$, where $h_t = \beta_0 + \beta_1 E\{e_t \mid r_{t-2}, s_{t-2}\}^2 + \beta_2 h_{t-1}$, $e_t = [r_{t-1} - E(r_{t-1} \mid r_{t-2}, s_{t-1})]/\sigma(s_{t-1})$, $z_t \sim \mathrm{iid}\ N(0, 1)$, and $s_t$ follows a two-state Markov chain with transition probability $\Pr(s_t = l \mid s_{t-1} = l) = (1 + \exp(-c_l - d_l r_{t-1}))^{-1}$ for $l = 1, 2$.

Parameter estimates of regime-switching models, given in Table 4, show that the spot rate behaves quite differently between two regimes. For models with a linear drift, in the first regime the spot rate has a high long-run mean (about 10%) and exhibits strong mean reversion (i.e., estimates of the mean-reversion speed parameter are significantly negative). The spot rate in the second regime behaves almost like a random walk, because most drift parameter estimates are close to 0 and insignificant. For models with a nonlinear drift, all drift parameters are insignificant. Volatility in the first regime is much higher, about three times that in the second regime. Our estimates show that the level effect, although significant in both regimes, is much stronger in the second regime in the absence of GARCH. After including GARCH, the elasticity parameter estimate becomes insignificant in the second regime but remains the same in the first regime. Apparently, regime switching helps capture volatility clustering; the sum of GARCH parameters, $\beta_1 + \beta_2$, is smaller in regime-switching models than in pure GARCH models. The estimated transition probabilities of the Markov state variable $s_t$ show that the low-volatility regime is much more persistent than the high-volatility regime. Our results suggest that the spot rate evolves like a random walk with low volatility most of the time, occasionally increasing and evolving with strong mean reversion and high volatility. Overall, the regime-switching model with a linear drift in each regime, CEV, and GARCH has the best in-sample performance.

466 Journal of Business & Economic Statistics, October 2004

Table 5. Parameter Estimates for the Jump-Diffusion Models

| Parameter | No drift CEV | Linear drift CEV | Nonlinear drift CEV | No drift GARCH | Linear drift GARCH | Nonlinear drift GARCH | No drift CEV–GARCH | Linear drift CEV–GARCH | Nonlinear drift CEV–GARCH |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| α−1 | | | .0264 (.0293) | | | .0702 (.0384) | | | .0661 (.0380) |
| α0 | | −.0011 (.0169) | −.0340 (.0203) | | .0028 (.0019) | −.0506 (.0256) | | .0026 (.0020) | −.0486 (.0253) |
| α1 | | −.0007 (.0004) | .0101 (.0043) | | −.0013 (.0004) | .0110 (.0053) | | −.0012 (.0004) | .0107 (.0052) |
| α2 | | | −.0009 (.0003) | | | −.0008 (.0003) | | | −.0008 (.0003) |
| σ | .0127 (.0008) | .0126 (.0008) | .0124 (.0008) | | | | | | |
| ρ | .8936 (.0390) | .8975 (.0397) | .9045 (.0400) | | | | .0929 (.0518) | .0852 (.0510) | .0837 (.0512) |
| β0 | | | | 8.7E−05 (1.2E−05) | 8.8E−05 (1.2E−05) | 8.7E−05 (1.2E−05) | 7.6E−05 (1.2E−05) | 7.7E−05 (1.2E−05) | 7.7E−05 (1.2E−05) |
| β1 | | | | .0631 (.0076) | .0640 (.0073) | .0635 (.0073) | .0450 (.0101) | .0472 (.0102) | .0472 (.0102) |
| β2 | | | | .8514 (.0139) | .8484 (.0134) | .8499 (.0134) | .8504 (.0140) | .8471 (.0135) | .8483 (.0134) |
| c | −2.7404 (.1608) | −2.7057 (.1605) | −2.6972 (.1614) | −3.7908 (.1982) | −3.7470 (.1970) | −3.7481 (.1976) | −3.6908 (.2097) | −3.6498 (.2084) | −3.6518 (.2093) |
| d | .1451 (.0275) | .1423 (.0275) | .1397 (.0275) | .2554 (.0307) | .2516 (.0305) | .2509 (.0307) | .2293 (.0343) | .2274 (.0340) | .2272 (.0341) |
| μ | .0233 (.0133) | .0385 (.0142) | .0321 (.0128) | .0700 (.0152) | .0887 (.0158) | .0895 (.0158) | .0679 (.0162) | .0892 (.0164) | .0897 (.0163) |
| γ | .4316 (.0117) | .4279 (.0120) | .4304 (.0104) | .3928 (.0197) | .3903 (.0189) | .3929 (.0187) | .4109 (.0199) | .4046 (.0182) | .4061 (.0178) |
| αD | | −.0094 (.0121) | .0296 (.0112) | | −.0192 (.0117) | .0057 (.0154) | | −.0197 (.0112) | .0055 (.0150) |
| σD | −.1027 (.1142) | −.1328 (.1054) | −.1551 (.1033) | | | | | | |
| ρD | .1473 (.1079) | .1872 (.0703) | −.0697 (.0722) | | | | −.1220 (.0769) | −.1186 (.0573) | −.1277 (.0564) |
| cD | 4.1604 (.6523) | 4.3364 (.6535) | 4.0036 (.5477) | 4.0290 (.7495) | 4.2505 (.7546) | 4.1857 (.7421) | 3.8622 (.6911) | 4.0677 (.7129) | 4.0272 (.7019) |
| dD | −.2441 (.0808) | −.2721 (.0730) | −.1836 (.0534) | −.2748 (.0689) | −.2860 (.0682) | −.2791 (.0667) | −.2320 (.0685) | −.2487 (.0676) | −.2445 (.0662) |
| D87 | −.4283 (.0646) | −.4253 (.0641) | −.4266 (.0643) | −.4226 (.0826) | −.4198 (.0820) | −.4217 (.0815) | −.4233 (.0793) | −.4209 (.0793) | −.4225 (.0791) |
| Log-likelihood | 63063 | 63172 | 63263 | 67941 | 68108 | 68139 | 67963 | 68129 | 68161 |

NOTE: Standard errors are in parentheses. The models are nested by the following specification:

$$\Delta r_t = \frac{\alpha_{-1}}{r_{t-1}} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (\sigma + \sigma_D)\, r_{t-1}^{(\rho + \rho_D)} \sqrt{h_t}\, z_t + J(\mu, \gamma^2)\, \pi_t(q_t + q_D) + D_{87},$$

where $h_t = \beta_0 + \beta_1 [r_{t-1} - E(r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}$, $z_t \sim$ iid $N(0, 1)$, $J \sim N(\mu, \gamma^2)$, and $\pi_t(q_t + q_D)$ is Bernoulli$(q_t + q_D)$ with $q_t + q_D = (1 + \exp(-c - c_D - (d + d_D) \cdot r_{t-1}))^{-1}$.
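The nested specification in the note to Table 5 can be illustrated with a small simulation sketch. This is only an illustration under stated assumptions, not the authors' estimation code: the function names, the dictionary keys (`a_m1`, `a0`, `sigma`, and so on), and every parameter value in the usage line are hypothetical, the left-hand side is read as the change in the spot rate, and the 1979–1982 and 1987 dummies are omitted for brevity.

```python
import math
import random


def jump_probability(r_prev, c, d, c_dummy=0.0, d_dummy=0.0):
    """Logistic jump probability (1 + exp(-c - c_D - (d + d_D) * r_{t-1}))^{-1}."""
    return 1.0 / (1.0 + math.exp(-c - c_dummy - (d + d_dummy) * r_prev))


def simulate_change(r_prev, h_t, params, rng):
    """Draw one change in the spot rate from the nested specification.

    `params` is a plain dict with hypothetical keys; a missing drift or
    jump parameter switches that component off. Pass h_t = 1.0 for a
    model without GARCH.
    """
    get = params.get
    drift = (get("a_m1", 0.0) / r_prev + get("a0", 0.0)
             + get("a1", 0.0) * r_prev + get("a2", 0.0) * r_prev ** 2)
    diffusion = (get("sigma", 1.0) * r_prev ** get("rho", 0.0)
                 * math.sqrt(h_t) * rng.gauss(0.0, 1.0))
    # No jump component unless the intercept c of the jump probability is given.
    q = jump_probability(r_prev, params["c"], get("d", 0.0)) if "c" in params else 0.0
    jump = rng.gauss(get("mu", 0.0), get("gamma", 0.0)) if rng.random() < q else 0.0
    return drift + diffusion + jump


# Illustrative values only (not estimates from Table 5).
example = simulate_change(8.0, 1.0, {"sigma": 0.01, "rho": 0.9, "c": -3.0, "d": 0.2,
                                     "mu": 0.05, "gamma": 0.4}, random.Random(1))
```

Because the jump probability is logistic in the lagged rate, a positive `d` makes jumps more likely when rates are high, which is the mechanism the note describes.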

Table 5 reports parameter estimates for discretized jump-diffusion models. There is some weak evidence of mean reversion, especially when there is GARCH. The contribution of nonlinear drift is again very marginal; most estimated drift parameters are insignificant. For jump-diffusion models without GARCH, the elasticity parameter estimate is about .9, which is closer to the 1.5 found by CKLS. However, level effect is weakened (i.e., ρ becomes close to .1) after GARCH is introduced. It seems that CEV helps capture part of volatility clustering, but its importance diminishes in the presence of GARCH. We find that GARCH also significantly improves the performance of jump-diffusion models. Without GARCH, there is a high probability of small jumps, with an estimated mean jump size between 2 and 4. With GARCH, the jump probability becomes smaller, but the mean jump size increases to between about 7 and 9. This suggests that without GARCH, jumps can capture part of volatility clustering, but with GARCH, jumps mainly capture large interest rate movements. Of course, jumps also help explain volatility clustering; the sum of GARCH parameters is again much smaller than that in pure GARCH models. During the 1979–1982 period, other than the probability of jumps becoming substantially higher, other aspects of the jump-diffusion models (e.g., the drift, volatility, or elasticity parameter) do not behave very differently.

To sum up, our in-sample analysis reveals some important stylized facts for the spot rate:

1. The importance of modeling mean reversion in the mean is ambiguous. It seems that linear drift is adequate for this purpose once GARCH, regime switching, or jumps are included. The contribution of Aït-Sahalia's (1996) type of nonlinear drift is very marginal.

2. It is important to model conditional heteroscedasticity through GARCH or level effect, although GARCH seems to have much better in-sample performance than CEV.

3. Regime switching and jumps help capture volatility clustering and especially the excess kurtosis and heavy tails of the interest rate.

4. Interest rates behave quite differently during the 1979–1982 period. The level and volatility of the interest rate, the dependence of the interest rate volatility on the interest rate level, and the probability of jumps are much higher during this period.


5. OUT-OF-SAMPLE DENSITY FORECAST PERFORMANCE

The foregoing in-sample analysis demonstrates that modeling volatility clustering through GARCH, and heavy tails and excess kurtosis through regime switching or jumps, significantly improves the in-sample fit of historical interest rate data. However, it is not clear whether these models will also perform well in out-of-sample density forecasts for future interest rates. Indeed, some previous studies have shown that more complicated models actually underperform the simple random walk model in predicting the conditional mean of future interest rates (e.g., Duffee 2002). In many financial applications, we are interested in forecasting the whole conditional distribution. We now apply the generalized spectral evaluation method described in Section 2 to evaluate density forecasts of the models under study. Among other things, we examine whether the features found to be important for in-sample performance remain important for out-of-sample forecasts and whether the best in-sample performing models still perform best in out-of-sample forecasts.

For each model, we first calculate $Z_t(\hat{\theta})$, the model generalized residuals, using the prediction sample, with model parameter estimates based on the estimation sample. Table 6 reports the out-of-sample evaluation statistic M1 for each model. To compute M1, we select a normal cdf $\Phi(\sqrt{12}\,u)$ for the weighting function $W(u)$ and a Bartlett kernel for the kernel function $k(z)$. To choose a lag order $p$, we use a data-driven method that minimizes the asymptotic integrated mean squared error of the modified generalized spectral density. This method involves choosing a preliminary lag order $\bar{p}$. We examine a wide range of $\bar{p}$ in our application. We obtain similar results for the preliminary lag order $\bar{p}$ between 10 and 30 and report results for $\bar{p} = 20$ in Table 6. For comparison, we also report the in-sample log-likelihood values obtained from the estimation sample.

The M1 statistics show that all single-factor diffusion models are overwhelmingly rejected. One of the most interesting findings is that models that include a drift term (either linear or nonlinear) have much worse out-of-sample performance than those without a drift. Dothan's (1978) model and the pure CEV model have the best density forecast performance. Both models have a zero drift and level effect; the elasticity parameter ρ = 1 for the former and 25 (60 for the 1979–1982 period) for the latter. On the other hand, although the CKLS model and Aït-Sahalia's (1996) nonlinear drift model have the highest in-sample likelihood values, they have the worst out-of-sample density forecasts in terms of M1. This seems to suggest that for the purpose of density forecast, it is much more important to model the diffusion function than the drift function. Of course, this does not necessarily mean that the drift is not important.

Table 6. Out-of-Sample Density Forecast Performance of Spot Interest Rate Models

| Model | M1 | Log-likelihood |
| --- | --- | --- |
| Single-factor diffusion models | | |
| Random walk | .108*** | 26496 |
| Log-normal | .121*** | 21870 |
| Dothan | .067** | 21849 |
| Pure CEV | .082** | 26545 |
| Vasicek | .213*** | 26555 |
| CIR | .219*** | 26720 |
| CKLS | .219*** | 26617 |
| Nonlinear drift | .224*** | 26625 |
| GARCH models | | |
| No drift GARCH | .079** | 56868 |
| Linear drift GARCH | .276*** | 56995 |
| Nonlinear drift GARCH | .339*** | 57063 |
| No drift CEV–GARCH | .077** | 57011 |
| Linear drift CEV–GARCH | .291*** | 57152 |
| Nonlinear drift CEV–GARCH | .358*** | 57255 |
| Regime-switching models | | |
| No drift RS CEV | .129*** | 64931 |
| Linear drift RS CEV | .091*** | 65079 |
| Nonlinear drift RS CEV | .162*** | 65128 |
| No drift RS GARCH | .118*** | 70790 |
| Linear drift RS GARCH | .047* | 71328 |
| Nonlinear drift RS GARCH | .055** | 71382 |
| No drift RS CEV–GARCH | .124*** | 70828 |
| Linear drift RS CEV–GARCH | .056** | 71386 |
| Nonlinear drift RS CEV–GARCH | .067** | 71441 |
| Jump-diffusion models | | |
| No drift JD CEV | .180*** | 63063 |
| Linear drift JD CEV | .039* | 63172 |
| Nonlinear drift JD CEV | .076** | 63263 |
| No drift JD GARCH | .178*** | 67941 |
| Linear drift JD GARCH | .056** | 68108 |
| Nonlinear drift JD GARCH | .067** | 68139 |
| No drift JD CEV–GARCH | .174*** | 67963 |
| Linear drift JD CEV–GARCH | .053** | 68129 |
| Nonlinear drift JD CEV–GARCH | .065** | 68161 |

NOTE: This table reports the out-of-sample density forecast performance of the single-factor diffusion, GARCH, Markov regime-switching, and jump-diffusion models, measured by the M1 statistic. For convenience of comparison, we also report the in-sample log-likelihood value for each model. The in-sample parameter estimation is based on the observations from June 14, 1961 to February 2, 1991, and the out-of-sample density forecast evaluation is based on the observations from February 23, 1991 to December 29, 2000. The ratio between the size of the estimation sample and the forecast sample is about 3:1. In computing the out-of-sample M1 statistics, a preliminary lag order $\bar{p}$ of 20 is used. Other lag orders give similar results. The asymptotic critical values for the out-of-sample evaluation statistic are .087, .051, and .037 at the 1%, 5%, and 10% levels, and ***, **, and * indicate that the M1 statistic is significant at the 1%, 5%, and 10% levels.
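The significance markers in Table 6 follow mechanically from the critical values in the note. A minimal sketch of that mapping is given below, assuming the critical values read as .087, .051, and .037 (the decimal points are not visible in every copy of the table) and that a statistic is flagged when it strictly exceeds a critical value; the function name is hypothetical.

```python
# Asymptotic critical values quoted in the note to Table 6 (1%, 5%, 10% levels),
# read here as .087, .051, and .037; this reading is an assumption.
M1_CRITICAL_VALUES = ((0.087, "***"), (0.051, "**"), (0.037, "*"))


def m1_significance(m1):
    """Return '***', '**', '*', or '' for an M1 statistic."""
    for critical_value, stars in M1_CRITICAL_VALUES:
        if m1 > critical_value:
            return stars
    return ""
```

Under this reading, a value such as .067 falls between the 5% and 1% critical values and is marked with two asterisks.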


It might be simply because the drift specifications considered are severely misspecified and do not outperform the zero drift model. In fact, most drift parameter estimates have large standard errors and thus have excess sampling variation in parameter estimation. As a consequence, assuming zero drift may result in a smaller adverse effect on out-of-sample performance.

The generalized residuals $Z_t(\hat{\theta})$ contain much information about possible sources of model misspecification. Instead of being uniform, the histograms of the generalized residuals for all diffusion models (Fig. 2) exhibit a high peak in the center of the distribution, which indicates that the diffusion models cannot satisfactorily capture the marginal density of the spot rate. The separate inference statistics M(m, l) in Table 7 also show that all the models fail to satisfactorily capture the dynamics of the generalized residuals. In particular, the large M(2,2) and M(4,4) statistics indicate that diffusion models perform rather poorly in modeling the conditional variance and kurtosis of the generalized residuals.

Similar to single-factor diffusion models, GARCH models with a zero drift have much better density forecasts than those models with a linear or nonlinear drift. The best GARCH and diffusion models have M1 statistics roughly between .07 and .08. Comparing the best GARCH and single-factor diffusion models, we find that GARCH provides better density forecasts than CEV, and combining CEV and GARCH further improves the density forecasts. Diagnostic analysis of generalized residuals shows that GARCH models provide certain improvements over diffusion models. For example, the histograms of the generalized residuals of GARCH models have a less-pronounced peak in the center. The M(m, l) statistics show that GARCH models significantly improve the fitting of even-order moments of the generalized residuals; the M(2,2) and M(4,4) statistics are reduced from close to 100 to single digits.

Figure 2. Histograms of the Generalized Residuals of Four Classes of Spot Rate Models, panels (a)–(h). This figure plots the histograms of the generalized residuals of the single-factor diffusion, GARCH, regime-switching, and jump-diffusion spot rate models.

In summary, our analysis of single-factor diffusion and GARCH models demonstrates that both classes of models fail to provide good density forecasts for future interest rates. They fail to capture both the high peak in the center of the marginal density and the dynamics of different moments of the generalized residuals. GARCH models have a more uniform marginal density and significantly improve the ability to model the dynamics of even-order moments of the generalized residuals. For both classes of models, modeling the conditional variance through CEV or GARCH is much more important than modeling the conditional mean of the interest rate. Zero-drift models outperform either linear or nonlinear drift models in density forecast, although the M(1,1) statistics suggest that neither model adequately captures the dynamic structure in the conditional mean of the generalized residuals.

We next turn to regime-switching models. Regime-switching models generally have better density forecasts than single-factor diffusion and GARCH models; the best regime-switching models reduce the M1 statistics of the best diffusion and GARCH models from above .07 to about .05. Although modeling the drift does not improve density forecast for diffusion and GARCH models, the best regime-switching models in density forecast have a linear drift in each regime. As pointed out by Ang and Bekaert (2002) and Li and Xu (2000), a regime-switching model with a linear drift in each regime actually implies nonlinear conditional mean dynamics for the interest rate. Thus our results suggest that perhaps a specific nonlinear drift is needed for out-of-sample density forecasts. GARCH is more important than CEV for density forecasts, and combining CEV with GARCH does not further improve model performance. Overall, the regime-switching GARCH model with a linear drift in each regime provides the best out-of-sample forecasts among all of the regime-switching models. Diagnostic analysis shows that regime-switching models provide a much better characterization of the marginal density of interest rates; the histograms of the generalized residuals are much closer to U[0,1]. The M(2,2) and M(4,4) statistics of regime-switching models are slightly higher than those of GARCH models, however, and the M(1,1) and M(3,3) statistics of regime-switching models are actually higher than those of diffusion and GARCH models. Therefore, the advantages of regime-switching models seem to come from better modeling of the marginal density rather than from the dynamics of the generalized residuals.

The density forecast performance of jump-diffusion models shares many common features with that of regime-switching models. First, they have comparable performance; the M1 statistics of the best jump-diffusion models are also around .05. Second, for jump-diffusion models, the linear drift specification outperforms those with no drift or nonlinear drift for density forecasts. Third, for jump-diffusion models, GARCH generally provides better density forecasts than CEV. Fourth, jump-diffusion models have similar advantages over single-factor diffusion and GARCH models. They have much better characterization of the marginal density of the generalized residuals, but they may not necessarily perform better in modeling the dynamics of the generalized residuals, as indicated by their large M(m, l) statistics. The jump model with a linear drift and level effect, although it has the smallest M1 statistic, apparently fails to capture the even-order moments of the generalized residuals, with M(2,2) and M(4,4) statistics significantly higher than other jump-diffusion models that include GARCH effects.

Table 7. Separate Inference Statistics for Out-of-Sample Density Forecast Performance

| Model | M(1,1) | M(1,2) | M(2,1) | M(2,2) | M(3,3) | M(4,4) |
| --- | --- | --- | --- | --- | --- | --- |
| Single-factor diffusion models | | | | | | |
| Random walk | 40.08 | 6.35 | 40.48 | 99.73 | 21.50 | 80.12 |
| Log-normal | 46.52 | 8.77 | 39.91 | 110.10 | 26.98 | 92.51 |
| Dothan | 46.04 | 8.07 | 42.62 | 108.90 | 27.02 | 91.03 |
| Pure CEV | 41.62 | 6.63 | 41.55 | 96.81 | 22.85 | 80.42 |
| Vasicek | 40.40 | 8.13 | 35.55 | 100.20 | 21.43 | 81.79 |
| CIR | 43.61 | 8.68 | 36.30 | 98.57 | 24.39 | 85.02 |
| CKLS | 42.02 | 8.55 | 35.75 | 97.89 | 22.80 | 82.77 |
| Nonlinear drift | 42.40 | 8.37 | 35.93 | 98.47 | 22.77 | 82.98 |
| GARCH models | | | | | | |
| No drift GARCH | 40.73 | 3.75 | 28.21 | 7.68 | 17.57 | 2.53 |
| Linear drift GARCH | 41.07 | 4.66 | 25.11 | 7.53 | 17.00 | 2.35 |
| Nonlinear drift GARCH | 40.71 | 4.87 | 24.70 | 7.02 | 16.63 | 2.09 |
| No drift CEV–GARCH | 41.30 | 4.10 | 28.40 | 7.46 | 19.24 | 2.74 |
| Linear drift CEV–GARCH | 41.68 | 5.11 | 25.06 | 7.09 | 18.73 | 2.44 |
| Nonlinear drift CEV–GARCH | 41.43 | 5.45 | 24.32 | 6.45 | 18.54 | 2.17 |
| Regime-switching models | | | | | | |
| No drift RS CEV | 53.16 | 3.13 | 11.31 | 7.05 | 35.02 | 10.21 |
| Linear drift RS CEV | 53.25 | 4.56 | 11.16 | 6.57 | 34.31 | 8.83 |
| Nonlinear drift RS CEV | 53.75 | 5.12 | 9.84 | 6.56 | 34.50 | 8.49 |
| No drift RS GARCH | 57.29 | 4.05 | 21.59 | 10.95 | 43.71 | 7.63 |
| Linear drift RS GARCH | 55.46 | 4.56 | 21.09 | 12.38 | 40.71 | 9.14 |
| Nonlinear drift RS GARCH | 55.69 | 4.50 | 21.05 | 12.14 | 41.13 | 8.93 |
| No drift RS CEV–GARCH | 45.35 | 3.69 | 20.91 | 10.67 | 34.25 | 7.08 |
| Linear drift RS CEV–GARCH | 44.10 | 4.86 | 20.75 | 11.99 | 32.12 | 8.87 |
| Nonlinear drift RS CEV–GARCH | 44.36 | 4.84 | 20.55 | 11.40 | 32.46 | 8.32 |
| Jump-diffusion models | | | | | | |
| No drift JD CEV | 47.38 | 5.32 | 16.76 | 85.10 | 43.01 | 99.31 |
| Linear drift JD CEV | 46.32 | 4.48 | 19.63 | 85.09 | 42.26 | 99.82 |
| Nonlinear drift JD CEV | 46.50 | 4.85 | 18.31 | 85.10 | 42.48 | 99.29 |
| No drift JD GARCH | 43.32 | 5.10 | 20.35 | 12.38 | 28.44 | 8.96 |
| Linear drift JD GARCH | 42.57 | 4.60 | 21.48 | 12.19 | 28.22 | 8.84 |
| Nonlinear drift JD GARCH | 42.62 | 4.67 | 21.42 | 12.34 | 28.24 | 8.95 |
| No drift JD CEV–GARCH | 43.55 | 5.05 | 20.51 | 13.36 | 29.31 | 10.07 |
| Linear drift JD CEV–GARCH | 42.74 | 4.54 | 21.69 | 13.00 | 28.93 | 9.76 |
| Nonlinear drift JD CEV–GARCH | 42.78 | 4.62 | 21.57 | 13.07 | 28.94 | 9.81 |

NOTE: This table reports the separate inference statistics M(m, l) for the four classes of spot rate models. The asymptotically normal statistic M(m, l) can be used to test whether the cross-correlation between the mth and lth moments of $\{Z_t\}$ is significantly different from 0. The choice of (m, l) = (1,1), (2,2), (3,3), and (4,4) is sensitive to autocorrelations in mean, variance, skewness, and kurtosis of $\{r_t\}$. We report results only for a lag truncation order p = 20; the results for p = 10 and 30 are rather similar. The asymptotic critical values are 1.28, 1.65, and 2.33 at the 10%, 5%, and 1% levels.

In summary, our out-of-sample density forecast evaluation reveals the following important features:

1. For out-of-sample density forecast, the importance of modeling mean reversion in the interest rate is ambiguous. For diffusion and GARCH models, a linear or nonlinear drift gives worse density forecasts than a zero drift. However, when there are regime switches or jumps, linear drift models outperform those with zero or nonlinear drift. The best density forecast models still fail to adequately capture the mean dynamics of the generalized residuals.

2. It is important to capture volatility clustering in interest rates for both in-sample and out-of-sample performance via GARCH. Most of the best-performing models have a GARCH component, which significantly improves their ability to model the dynamics of the even-order moments of the generalized residuals.

3. It is also important to capture excess kurtosis and heavy tails of the interest rate for both in-sample and out-of-sample performance via regime switching or jumps. The advantages of regime-switching and jump models are mainly reflected in modeling the marginal density rather than in the dynamics of the generalized residuals.

Overall, our out-of-sample analysis demonstrates that more-complicated models that incorporate conditional heteroscedasticity and heavy tails of interest rates tend to have a better density forecast. This result is quite different from the results for those that focus on conditional mean forecast. As widely documented in the literature, the predictable component in the conditional mean of the interest rate appears insignificant. As a result, random walk models tend to outperform more-sophisticated models in terms of mean forecast (e.g., Duffee 2002). However, density forecast includes all conditional moments, and as a result, those models that can capture the dynamics of higher-order moments tend to have better density forecasts. Our analysis suggests that the more-complicated spot rate models developed in the literature can indeed capture some important features of interest rates and are relevant in applications involving density forecasts of interest rates.


6. CONCLUSION

Despite the numerous empirical studies of spot interest rate models, little effort has been devoted to examining their out-of-sample forecast performance. We have contributed to the literature by providing the first (to our knowledge) comprehensive empirical analysis of the out-of-sample performance of a wide variety of popular spot rate models in forecasting the conditional density of future interest rates. Out-of-sample density forecasts help minimize the data snooping bias due to excessive searching for more-complicated models using similar datasets. The conditional density, which completely characterizes the full dynamics of interest rates, is also an essential input to many financial applications. Using a rigorous econometric evaluation procedure for density forecasts, we examined the out-of-sample performance of single-factor diffusion, GARCH, regime-switching, and jump-diffusion models.

Although previous studies have shown that simpler models, such as the random walk model, tend to have more accurate forecasts for the conditional mean of interest rates, we find that models that capture conditional heteroscedasticity and heavy tails of the spot rate have better out-of-sample density forecasts. GARCH significantly improves modeling of the conditional variance and kurtosis of the generalized residuals, whereas regime switching and jumps help in modeling the marginal density of interest rates. Therefore, our analysis shows that the more-sophisticated interest rate models developed in the literature can indeed capture some important features of interest rate data and perform better than simpler models in out-of-sample density forecasts. Although we have focused on density forecasts of the spot rates here, there is evidence that the term structure of interest rates is driven by multiple factors (e.g., Dai and Singleton 2000). Extending our analysis to forecast the joint density of multiple yields is an interesting issue that we will address in future research.

ACKNOWLEDGMENTS

The authors are grateful for the helpful comments of Canlin Li, Daniel Morillo, Larry Pohlman, and seminar participants at PanAgora Asset Management, the 2002 Econometric Society European Summer Meeting, and the 2003 Econometric Society North American Winter Meeting. They also thank the associate editor and two anonymous referees, whose comments and suggestions have greatly improved the article, and David George for editorial assistance. Hong's work was supported by National Science Foundation grant SES-0111769. Any errors are the authors' own.

APPENDIX: GENERALIZED SPECTRAL EVALUATION FOR DENSITY FORECASTS

Here we describe Hong's (2003) omnibus evaluation procedure for out-of-sample density forecasts, as well as a class of separate inference procedures. (For more discussion and formal treatment, see Hong 2003.)

A.1 Generalized Spectrum

Generalized spectrum was proposed by Hong (1999) as an analytic tool to provide an alternative to the power spectrum and the higher-order spectrum (e.g., bispectrum) in time series analysis. Define the centered generalized residual

$$Z_t \equiv Z_t(\theta) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \theta)\, dr - \frac{1}{2}, \qquad t = R + 1, \ldots, T,$$
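As an illustration of this definition, the centered generalized residual can be computed in closed form when the conditional density model is Gaussian. The sketch below assumes, purely for illustration, a normal conditional density with a given conditional mean and standard deviation; this is not one of the fitted models in the article, and the function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def centered_generalized_residual(r_t, cond_mean, cond_std):
    """Z_t = (model cdf evaluated at r_t) - 1/2.

    Assumes a Gaussian conditional density with the given conditional
    mean and standard deviation (an illustrative choice).
    """
    return normal_cdf((r_t - cond_mean) / cond_std) - 0.5
```

An observation equal to the forecast median maps to 0, and every residual lies in [-1/2, 1/2]. If the density model is correctly specified, the resulting series is iid uniform on that interval.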

where $p(r, t \mid I_{t-1}, \theta)$ is a conditional density model for time series $\{r_t\}$. Suppose that the series $\{Z_t\}_{t=-\infty}^{\infty}$ has marginal characteristic function $\varphi(u) \equiv E(e^{iuZ_t})$ and pairwise joint characteristic function $\varphi_j(u, v) \equiv E[\exp\{i(uZ_t + vZ_{t-|j|})\}]$, where $i \equiv \sqrt{-1}$, $u, v \in (-\infty, \infty)$, and $j = 0, \pm 1, \ldots$. The basic idea of generalized spectrum is to transform the series $\{Z_t\}$ and then consider the spectrum of the transformed series. Define the generalized covariance function

$$\sigma_j(u, v) \equiv \mathrm{cov}(e^{iuZ_t}, e^{ivZ_{t-|j|}}), \qquad j = 0, \pm 1, \ldots.$$

Straightforward algebra yields $\sigma_j(u, v) = \varphi_j(u, v) - \varphi(u)\varphi(v)$. Because $\varphi_j(u, v) = \varphi(u)\varphi(v)$ for all $(u, v)$ if and only if $Z_t$ and $Z_{t-|j|}$ are independent (cf. Lukacs 1970), $\sigma_j(u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ over various lags, including those with zero autocorrelation in $\{Z_t\}$.
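The algebra behind this identity can be written out in two lines. The sketch assumes that the covariance of the two complex exponentials is taken as $E(XY) - E(X)E(Y)$ without complex conjugation, which is the convention that reproduces the stated result, and that $\{Z_t\}$ is strictly stationary, so that both marginal characteristic functions equal $\varphi$.

```latex
\begin{aligned}
\sigma_j(u, v)
  &= E\!\left[e^{iuZ_t}\, e^{ivZ_{t-|j|}}\right]
     - E\!\left[e^{iuZ_t}\right] E\!\left[e^{ivZ_{t-|j|}}\right] \\
  &= E\!\left[e^{i(uZ_t + vZ_{t-|j|})}\right] - \varphi(u)\,\varphi(v)
   = \varphi_j(u, v) - \varphi(u)\,\varphi(v).
\end{aligned}
```

When $Z_t$ and $Z_{t-|j|}$ are independent, the joint characteristic function factorizes and the right-hand side is zero for every $(u, v)$.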

Under certain regularity conditions on the temporal dependence of $\{Z_t\}$, the Fourier transform of the generalized covariance function $\sigma_j(u, v)$ exists and is given by

$$f(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi].$$

Like $\sigma_j(u, v)$, $f(\omega, u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ across various lags. An advantage of spectral analysis is that $f(\omega, u, v)$ incorporates all lags simultaneously. Thus it can capture the dependent processes in which serial dependence occurs only at higher-order lags. This can arise from seasonality (e.g., calendar effects) or time delay in financial markets. Moreover, $f(\omega, u, v)$ is particularly powerful in capturing the dependent alternatives whose serial dependence decays to 0 slowly as $j \to \infty$, such as slow mean-reverting or persistent volatility clustering processes. For such processes, $f(\omega, u, v)$ has a sharp peak at frequency 0.

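The function $f(\omega, u, v)$ does not require any moment condition on $\{Z_t\}$. When the moments of $\{Z_t\}$ exist, however, we can differentiate $f(\omega, u, v)$ with respect to $(u, v)$ at $(0, 0)$:

$$f^{(0,m,l)}(\omega, 0, 0) \equiv \left. \frac{\partial^{m+l}}{\partial u^m\, \partial v^l} f(\omega, u, v) \right|_{(u,v)=(0,0)} = \frac{i^{m+l}}{2\pi} \sum_{j=-\infty}^{\infty} \mathrm{cov}\!\left(Z_t^m, Z_{t-|j|}^l\right) e^{-ij\omega}.$$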

In particular, $f^{(0,1,1)}(\omega, 0, 0)$ is the well-known power spectral density. (For applications of the power spectral density in economics and finance, see, e.g., Granger 1969; Durlauf 1991; Watson 1993.) For this reason, $f(\omega, u, v)$ is called the "generalized spectral density" of $\{Z_t\}$. The parameters $(u, v)$ provide much flexibility in capturing linear and nonlinear dependence. Because $f(\omega, u, v)$ can be decomposed as a weighted sum of $f^{(0,m,l)}(\omega, 0, 0)$ over various $(m, l)$ via a Taylor series expansion around $(0, 0)$, it contains information on all autocorrelations and cross-correlations in the various power terms of $\{Z_t\}$. Thus it can be used to develop an omnibus procedure against a wide range of dependent processes.

A.2 Omnibus Evaluation for Density Forecast

To evaluate out-of-sample density forecasts of model $p(r, t \mid I_{t-1}, \theta)$, Hong (2003) used the generalized spectrum to develop an omnibus test of iid $U[-\tfrac{1}{2}, \tfrac{1}{2}]$ for the centered generalized residual $\{Z_t\}$. To test $H_0$: $\{Z_t\} \sim$ iid $U[-\tfrac{1}{2}, \tfrac{1}{2}]$, we use the modified generalized covariance

$$\sigma_j^U(u, v) \equiv \varphi_j(u, v) - \varphi_U(u)\,\varphi_U(v),$$

where $\varphi_U(u) \equiv \sin(u/2)/(u/2)$ is the characteristic function of a $U[-\tfrac{1}{2}, \tfrac{1}{2}]$ random variable. Because $\sigma_j^U(u, v)$ has incorporated the information of $U[-\tfrac{1}{2}, \tfrac{1}{2}]$, it can capture any pairwise serial dependence in $\{Z_t\}$ and any deviation from $U[-\tfrac{1}{2}, \tfrac{1}{2}]$. The associated generalized spectrum is

$$f^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j^U(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi],\ u, v \in (-\infty, \infty).$$

Under $H_0$, we have $\sigma_j^U(u, v) = 0$ for all $j \neq 0$, and so $f^U(\omega, u, v)$ becomes a known "flat" spectrum,

$$f_0^U(\omega, u, v) \equiv \frac{1}{2\pi}\, \sigma_0^U(u, v) = \frac{1}{2\pi} \left[\varphi_U(u + v) - \varphi_U(u)\,\varphi_U(v)\right], \qquad \omega \in [-\pi, \pi].$$

When $Z_t$ and $Z_{t-j}$ are not independent at some lag $j \neq 0$, or when $Z_t$ is not $U[-\tfrac{1}{2}, \tfrac{1}{2}]$, we have $f^U(\omega, u, v) \neq f_0^U(\omega, u, v)$. Hence, to test $H_0$, one can check whether a consistent estimator of $f^U(\omega, u, v)$ is significantly different from $f_0^U(\omega, u, v)$.

Suppose that $\hat{\theta}$ is an estimator for model parameter $\theta$ based on the estimation sample $\{r_t\}_{t=1}^{R}$. To estimate $f^U(\omega, u, v)$ using the prediction sample $\{r_t\}_{t=R+1}^{N}$, we have to use the estimated proxies $\{\hat{Z}_t\}_{t=R+1}^{N}$, where

$$\hat{Z}_t \equiv Z_t(\hat{\theta}) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \hat{\theta})\, dr - \frac{1}{2}, \qquad t = R + 1, \ldots, N.$$

Define the empirical measure

$$\hat{\sigma}_j^U(u, v) \equiv \hat{\varphi}_j(u, v) - \varphi_U(u)\,\varphi_U(v), \qquad j = 0, \pm 1, \ldots, \pm(n - 1),$$

where $n \equiv N - R$ is the size of the prediction sample and the empirical pairwise joint characteristic function is

$$\hat{\varphi}_j(u, v) \equiv (n - |j|)^{-1} \sum_{t=R+|j|+1}^{N} \exp\{i(u\hat{Z}_t + v\hat{Z}_{t-|j|})\}.$$
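The empirical measure is straightforward to compute once the generalized residuals are available. The following sketch is one possible direct transcription, using only the Python standard library; the function names are hypothetical, and the residuals are assumed to be stored in a list in time order over the prediction sample.

```python
import cmath
import math


def phi_uniform(u):
    """Characteristic function of U[-1/2, 1/2]: sin(u/2)/(u/2), equal to 1 at u = 0."""
    if u == 0.0:
        return 1.0
    return math.sin(u / 2.0) / (u / 2.0)


def empirical_joint_cf(z, j, u, v):
    """(n - |j|)^{-1} * sum over t of exp{i(u z_t + v z_{t-|j|})}."""
    lag = abs(j)
    n = len(z)
    total = sum(cmath.exp(1j * (u * z[t] + v * z[t - lag])) for t in range(lag, n))
    return total / (n - lag)


def modified_generalized_covariance(z, j, u, v):
    """Empirical joint characteristic function minus phi_U(u) * phi_U(v)."""
    return empirical_joint_cf(z, j, u, v) - phi_uniform(u) * phi_uniform(v)
```

At $(u, v) = (0, 0)$ the measure is exactly zero for any sample, so all of the information comes from nonzero arguments, which is why the evaluation statistic integrates over $(u, v)$.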

We introduce a class of smoothed kernel estimators,

$$\hat{f}^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=1-n}^{n-1} (1 - |j|/n)^{1/2}\, k(j/p)\, \hat{\sigma}_j^U(u, v)\, e^{-ij\omega},$$

where $k(\cdot)$ is a kernel and $p \equiv p(n)$ is a bandwidth or lag order. An example of $k(\cdot)$ is the Bartlett kernel,

$$k(z) = \begin{cases} 1 - |z| & \text{if } |z| \le 1, \\ 0 & \text{otherwise.} \end{cases}$$
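For concreteness, the Bartlett kernel and the lag weights $k(j/p)$ it implies can be sketched as follows. The helper name `kernel_weights` is hypothetical.

```python
def bartlett_kernel(z):
    """Bartlett kernel: 1 - |z| on [-1, 1], zero elsewhere."""
    return 1.0 - abs(z) if abs(z) <= 1.0 else 0.0


def kernel_weights(p, n):
    """Weights k(j/p) for lags j = 1, ..., n - 1."""
    return [bartlett_kernel(j / p) for j in range(1, n)]
```

With the Bartlett kernel, all lags at or beyond $p$ receive zero weight, so in practice only the first $p - 1$ lags enter the estimator, with linearly declining weights.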

For the choice of $p$, Hong (1999) suggested a data-driven method that delivers an asymptotically optimal bandwidth in terms of an integrated mean squared error criterion. This still involves a preliminary bandwidth $\bar{p}$, but the impact of choosing $\bar{p}$ is much smaller than the impact of choosing lag order $p$.

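A convenient divergence measure for $\hat{f}^U(\omega, u, v)$ and $f_0^U(\omega, u, v)$ is the quadratic form

$$\begin{aligned}
2\pi n\, Q(\hat{f}^U, f_0^U)
  &\equiv 2\pi n \int\!\!\int\!\!\int_{-\pi}^{\pi} \left|\hat{f}^U(\omega, u, v) - f_0^U(\omega, u, v)\right|^2 d\omega\, dW(u)\, dW(v) \\
  &= n \int\!\!\int \left|\hat{\varphi}_0(u + v) - \varphi_U(u + v)\right|^2 dW(u)\, dW(v) \\
  &\quad + 2 \sum_{j=1}^{n-1} k^2(j/p)\,(n - j) \int\!\!\int \left|\hat{\sigma}_j^U(u, v)\right|^2 dW(u)\, dW(v),
\end{aligned}$$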

where the weighting function $W(\cdot)$ is positive and nondecreasing, and the unspecified integrals are taken over the support of $W(\cdot)$. An example of this is

$$W(u) = \Phi(\sqrt{12}\, u),$$

where $\Phi(\cdot)$ is the N(0,1) cdf, which is commonly used in the characteristic function literature. The scale $\sqrt{12}$ matches the standard deviation of $U[-\tfrac{1}{2}, \tfrac{1}{2}]$, but this is not necessary.

Our evaluation statistic for $H_0$ is a properly centered and scaled version of the quadratic form $Q(\hat{f}^U, f_0^U)$:

$$M_1 \equiv \left[\sum_{j=1}^{n-1} k^2(j/p)\right]^{-1} \times \int\!\!\int \sum_{j=1}^{n-1} k^2(j/p)\,(n - j) \left|\hat{\sigma}_j^U(u, v)\right|^2 dW(u)\, dW(v) - \left[\int \left[1 - \varphi_U(u)^2\right] dW(u)\right]^2.$$

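The asymptotic distribution of M1 was derived by Hong (2003). Under $H_0$ and a set of regularity conditions,

$$M_1 \xrightarrow{d} \int\!\!\int |G(u, v)|^2\, dW(u)\, dW(v)$$

as $n \to \infty$, $R \to \infty$, and $n^{1+\delta}/R \to 0$ for any arbitrarily small constant $\delta > 0$, where $G(u, v)$ is a complex-valued Gaussian process with mean 0 and a known covariance function,

$$\begin{aligned}
\mathrm{cov}[G(u_1, v_1), G(u_2, v_2)]
  &= E[G(u_1, v_1)\, G(u_2, v_2)^{*}] \\
  &= \sigma_0^U(u_1, -u_2)\, \varphi_U(v_1)\, \varphi_U(v_2) + \sigma_0^U(u_1, -v_2)\, \varphi_U(u_2)\, \varphi_U(v_1) \\
  &\quad + \sigma_0^U(v_1, -u_2)\, \varphi_U(u_1)\, \varphi_U(v_2) + \sigma_0^U(v_1, -v_2)\, \varphi_U(u_1)\, \varphi_U(u_2).
\end{aligned}$$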


The limit distribution of M1 is nonstandard. However, because the Gaussian process $G(u, v)$ has mean 0 and a known distribution-free covariance function, the limit distribution of M1 can be easily tabulated. When $W(u) = \Phi(\sqrt{12}\, u)$ with a truncated support on $[-1, 1]$, the asymptotic critical values at the 10%, 5%, and 1% levels are .037, .051, and .087.
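To make the truncated weighting concrete, the sketch below evaluates the one-dimensional integral of $1 - \varphi_U(u)^2$ with respect to $W(u) = \Phi(\sqrt{12}\,u)$ on $[-1, 1]$ by a midpoint rule; the square of this integral is the centering term in M1. It is an illustration under assumptions: the grid size and function names are hypothetical, and the truncated weight is not renormalized to integrate to 1, because the text does not say whether it should be.

```python
import math


def phi_uniform(u):
    """Characteristic function of U[-1/2, 1/2]."""
    return 1.0 if u == 0.0 else math.sin(u / 2.0) / (u / 2.0)


def weight_density(u):
    """Derivative of W(u) = Phi(sqrt(12) * u), i.e. a N(0, 1/12) density."""
    scale = math.sqrt(12.0)
    return scale * math.exp(-0.5 * (scale * u) ** 2) / math.sqrt(2.0 * math.pi)


def centering_integral(grid_points=2000, lower=-1.0, upper=1.0):
    """Midpoint-rule approximation of the integral of [1 - phi_U(u)^2] dW(u)."""
    step = (upper - lower) / grid_points
    total = 0.0
    for k in range(grid_points):
        u = lower + (k + 0.5) * step
        total += (1.0 - phi_uniform(u) ** 2) * weight_density(u) * step
    return total
```

Because the weight concentrates near the origin, where $\varphi_U$ is close to 1, the integral is small, which helps explain why the critical values of M1 are themselves well below 1.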

The most attractive feature of M1 is its omnibus property; it has power against a wide range of suboptimal density forecasts, thanks to the use of the characteristic function in a spectral framework. The M1 statistic can be viewed as an omnibus metric measuring the departure of the forecast model from the true data-generating process. A better density forecast model is expected to have a smaller value for M1, because its generalized residual series $\{Z_t\}$ is closer to having the iid U[0,1] property. Thus M1 can be used to rank competing density forecast models in terms of their deviations from optimality. Moreover, the kernel function provides flexible weighting for various lags. Nonuniform weighting kernels usually discount higher-order lags. For most, if not all, economic and financial time series, today's behaviors are more affected by recent events than by remote events. Thus downward weighting for higher-order lags is expected to yield good power in finite samples.

A.3 Separate Inferences

When a model is rejected using $M_1$, it is interesting to explore possible reasons for the rejection. Hong (2003) provided a class of rigorous separate inference statistics that exploit the information contained in modeling generalized residuals $Z_t$ to understand possible reasons for model rejection. The $M(m,l)$ statistic is based on a kernel estimator of the generalized spectral density derivative $f^{(0,m,l)}(\omega,0,0)$, where $m$ and $l$ are the orders of the derivatives with respect to $u$ and $v$. The statistic $M(m,l)$ is defined as

$$
M(m,l) = \left[ \sum_{j=1}^{n-1} (n-j)\, k^2(j/p)\, \hat\rho_j^{\,2}(m,l) - \sum_{j=1}^{n-1} k^2(j/p) \right] \Bigg/ \left[ 2 \sum_{j=1}^{n-2} k^4(j/p) \right]^{1/2},
$$

where $\hat\rho_j(m,l)$ is the sample cross-correlation between $Z_t^m$ and $Z_{t-|j|}^l$, $t = R+1, \ldots, N$. Under $H_0$ and a set of regularity conditions,

$$
M(m,l) \xrightarrow{d} N(0,1)
$$

for any given $(m,l)$ as $n, R \to \infty$. The upper-tailed N(0,1) critical values at the 10%, 5%, and 1% levels are 1.28, 1.65, and 2.33. As in the $M_1$ test, Hong (2003) considered a data-driven method for $p$ that also involves the choice of a preliminary lag order $\bar p$.

Although the moments of the generalized residuals $Z_t$ are not exactly the same as those of the original data $r_t$, they are highly correlated. In particular, the choice of $(m,l) = (1,1)$, $(2,2)$, $(3,3)$, and $(4,4)$ is very sensitive to autocorrelations in level, volatility, skewness, and kurtosis of $r_t$. Furthermore, the choices of $(m,l) = (1,2)$ and $(2,1)$ are sensitive to the "ARCH-in-mean" effect and the "leverage" effect. Different choices of order $(m,l)$ can thus allow examination of various dynamic aspects of the underlying process.
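The separate inference statistic can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the standardized form of $M(m,l)$ in which the centered sum is divided by the square root of $2\sum_j k^4(j/p)$, uses the Bartlett kernel, and takes the lag order `p` as a fixed input rather than choosing it by the data-driven method of Hong (2003). The function names are hypothetical.

```python
import math
import random


def bartlett_kernel(x):
    """Bartlett kernel: k(x) = 1 - |x| for |x| <= 1, and 0 otherwise."""
    return max(0.0, 1.0 - abs(x))


def sample_cross_correlation(z, m, l, j):
    """Sample cross-correlation between Z_t^m and Z_{t-j}^l for lag j >= 1."""
    n = len(z)
    a = [v ** m for v in z]
    b = [v ** l for v in z]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    var_a = sum((v - mean_a) ** 2 for v in a) / n
    var_b = sum((v - mean_b) ** 2 for v in b) / n
    cov = sum((a[t] - mean_a) * (b[t - j] - mean_b) for t in range(j, n)) / n
    return cov / math.sqrt(var_a * var_b)


def m_statistic(z, m, l, p, kernel=bartlett_kernel):
    """M(m, l) for generalized residuals z with a fixed lag order p > 1 (assumed)."""
    n = len(z)
    weighted_sum = 0.0  # sum of (n - j) k^2(j/p) rho_j^2(m, l)
    centering = 0.0     # sum of k^2(j/p), j = 1..n-1
    scaling = 0.0       # 2 * sum of k^4(j/p), j = 1..n-2
    for j in range(1, n):
        k = kernel(j / p)
        if k == 0.0:
            continue  # lags outside the kernel support contribute nothing
        rho = sample_cross_correlation(z, m, l, j)
        weighted_sum += (n - j) * k ** 2 * rho ** 2
        centering += k ** 2
        if j <= n - 2:
            scaling += 2.0 * k ** 4
    return (weighted_sum - centering) / math.sqrt(scaling)


if __name__ == "__main__":
    rng = random.Random(0)
    z_iid = [rng.random() for _ in range(500)]  # stand-in for iid U[0,1] residuals
    for order in [(1, 1), (2, 2), (1, 2), (2, 1)]:
        print(order, round(m_statistic(z_iid, order[0], order[1], p=10), 3))
```

Large positive values point to the dependence that the chosen $(m,l)$ targets, and the result is compared with the upper-tailed N(0,1) critical values. With the Bartlett kernel only lags below `p` receive weight, so `p` must exceed 1 for the scaling term to be nonzero.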

[Received June 2002. Revised November 2003.]

REFERENCES

Ait-Sahalia, Y. (1996), "Testing Continuous-Time Models of the Spot Interest Rate," Review of Financial Studies, 9, 385–426.

Ait-Sahalia, Y. (1999), "Transition Densities for Interest Rate and Other Nonlinear Diffusions," Journal of Finance, 54, 1361–1395.

Ait-Sahalia, Y. (2002), "Maximum-Likelihood Estimation of Discretely Sampled Diffusions: A Closed-Form Approach," Econometrica, 70, 223–262.

Ait-Sahalia, Y., and Lo, A. (1998), "Nonparametric Estimation of State-Price Densities Implicit in Financial Asset Prices," Journal of Finance, 53, 499–547.

Andersen, T. G., and Lund, J. (1997), "Estimating Continuous Time Stochastic Volatility Models of the Short-Term Interest Rate," Journal of Econometrics, 77, 343–378.

Ang, A., and Bekaert, G. (2002), "Regime Switches in Interest Rates," Journal of Business & Economic Statistics, 20, 163–182.

Bali, T. (2000), "Modeling the Conditional Mean and Variance of the Short Rate Using Diffusion, GARCH, and Moving Average Models," Journal of Futures Markets, 20, 717–751.

Bali, T. (2001), "Nonlinear Parametric Models of the Short-Term Interest Rate," unpublished manuscript, Baruch College.

Ball, C., and Torous, W. (1998), "Regime Shifts in Short-Term Riskless Interest Rates," unpublished manuscript, University of California, Los Angeles.

Bera, A. K., and Ghosh, A. (2002), "Density Forecast Evaluation With Applications in Finance," unpublished manuscript, University of Illinois at Champaign, Dept. of Economics.

Berkowitz, J. (2001), "Testing the Accuracy of Density Forecasts With Applications to Risk Management," Journal of Business & Economic Statistics, 19, 465–474.

Bliss, R., and Smith, D. (1998), "The Elasticity of Interest Rate Volatility: Chan, Karolyi, Longstaff, and Sanders Revisited," unpublished manuscript, Federal Reserve Bank of Atlanta.

Brenner, R., Harjes, R., and Kroner, K. (1996), "Another Look at Alternative Models of Short-Term Interest Rate," Journal of Financial and Quantitative Analysis, 31, 85–107.

Chacko, G., and Viceira, L. M. (2003), "Spectral GMM Estimation of Continuous-Time Processes," Journal of Econometrics, 116, 259–292.

Chan, K. C., Karolyi, G. A., Longstaff, F. A., and Sanders, A. B. (1992), "An Empirical Comparison of Alternative Models of the Short-Term Interest Rate," Journal of Finance, 47, 1209–1228.

Chapman, D., and Pearson, N. (2001), "Recent Advances in Estimating Models of the Term-Structure," Financial Analysts Journal, 57, 77–95.

Christoffersen, P., and Diebold, F. X. (1996), "Further Results on Forecasting and Model Selection Under Asymmetric Loss," Journal of Applied Econometrics, 11, 561–572.

Christoffersen, P., and Diebold, F. X. (1997), "Optimal Prediction Under Asymmetric Loss," Econometric Theory, 13, 808–817.

Clements, M. P., and Smith, J. (2000), "Evaluating the Forecast Densities of Linear and Non-Linear Models: Applications to Output Growth and Unemployment," Journal of Forecasting, 19, 255–276.

Conley, T. G., Hansen, L. P., Luttmer, E. G. J., and Scheinkman, J. A. (1997), "Short-Term Interest Rates as Subordinated Diffusions," Review of Financial Studies, 10, 525–578.

Corradi, V. (2000), "Reconsidering the Continuous-Time Limit of the GARCH(1,1) Process," Journal of Econometrics, 96, 145–153.

Cox, J. C., Ingersoll, J. E., and Ross, S. A. (1985), "A Theory of the Term Structure of Interest Rates," Econometrica, 53, 385–407.

Dai, Q., and Singleton, K. (2000), "Specification Analysis of Affine Term Structure Models," Journal of Finance, 55, 1943–1978.

Das, S. R. (2002), "The Surprise Element: Jumps in Interest Rates," Journal of Econometrics, 106, 27–65.

Diebold, F. X., Gunther, T. A., and Tay, A. S. (1998), "Evaluating Density Forecasts With Applications to Financial Risk Management," International Economic Review, 39, 863–883.

Diebold, F. X., Hahn, J., and Tay, A. S. (1999), "Multivariate Density Forecast Evaluation and Calibration in Financial Risk Management: High-Frequency Returns of Foreign Exchange," Review of Economics and Statistics, 81, 661–673.

Diebold, F. X., Tay, A. S., and Wallis, K. F. (1999), "Evaluating Density Forecasts of Inflation: The Survey of Professional Forecasters," in Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W. J. Granger, eds. R. F. Engle and H. White, Oxford, U.K.: Oxford University Press, pp. 76–90.


Dothan, L. U. (1978), "On the Term Structure of Interest Rates," Journal of Financial Economics, 6, 59–69.

Duffee, G. (2002), "Term Premia and Interest Rate Forecasts in Affine Models," Journal of Finance, 57, 405–443.

Duffie, D., and Pan, J. (1997), "An Overview of Value at Risk," Journal of Derivatives, 4, 13–32.

Durham, G. (2003), "Likelihood-Based Specification Analysis of Continuous-Time Models of the Short-Term Interest Rate," Journal of Financial Economics, 70, 463–487.

Durlauf, S. (1991), "Spectral Based Testing of the Martingale Hypothesis," Journal of Econometrics, 50, 355–376.

Fackler, P. L., and King, R. P. (1990), "Calibration of Option-Based Probability Assessments in Agricultural Commodity Markets," American Journal of Agricultural Economics, 72, 73–83.

Gallant, A. R., and Tauchen, G. (1998), "Reprojection Partially Observed Systems With Applications to Interest Rate Diffusions," Journal of the American Statistical Association, 93, 10–24.

Ghysels, E., Carrasco, M., Chernov, M., and Florens, J. (2001), "Estimating Diffusions With a Continuum of Moment Conditions," unpublished manuscript, University of North Carolina.

Granger, C. (1969), "Testing for Causality and Feedback," Econometrica, 37, 424–438.

Granger, C. (1999), "The Evaluation of Econometric Models and of Forecasts," invited lecture at the Far Eastern Meeting of the Econometric Society, Singapore.

Granger, C., and Pesaran, M. H. (2000), "A Decision Theoretic Approach to Forecasting Evaluation," in Statistics and Finance: An Interface, eds. W. S. Chan, W. K. Li, and H. Tong, London: Imperial College, pp. 261–278.

Gray, S. (1996), "Modeling the Conditional Distribution of Interest Rates as a Regime-Switching Process," Journal of Financial Economics, 42, 27–62.

Hamilton, J. D. (1989), "A New Approach to the Econometric Analysis of Nonstationary Time Series and the Business Cycle," Econometrica, 57, 357–384.

Hong, Y. (1999), "Hypothesis Testing in Time Series via the Empirical Characteristic Function: A Generalized Spectral Density Approach," Journal of the American Statistical Association, 94, 1201–1220.

Hong, Y. (2003), "Evaluation of Out-of-Sample Density Forecasts With Applications to Stock Prices," unpublished manuscript, Cornell University, Dept. of Economics and Dept. of Statistical Science.

Hong, Y., and Lee, T. H. (2003a), "Inference on the Predictability of Foreign Exchange Rate Changes via Generalized Spectrum and Nonlinear Models," Review of Economics and Statistics, 84, 1048–1062.

Hong, Y., and Lee, T. H. (2003b), "Diagnostic Checking for Nonlinear Time Series Models," Econometric Theory, 19, 1065–1121.

Hong, Y., and Li, H. (2004), "Nonparametric Specification Testing for Continuous-Time Models With Applications to Interest Rate Term Structures," Review of Financial Studies, forthcoming.

Jackwerth, J. C., and Rubinstein, M. (1996), "Recovering Probability Distributions From Option Prices," Journal of Finance, 51, 1611–1631.

Jiang, G., and Knight, J. (1997), "A Nonparametric Approach to the Estimation of Diffusion Processes With an Application to a Short-Term Interest Rate Model," Econometric Theory, 13, 615–645.

Johannes, M. (2003), "The Statistical and Economic Role of Jumps in Interest Rates," Journal of Finance, 59, 227–260.

Jorion, P. (2000), Value at Risk: The New Benchmark for Managing Financial Risk, New York: McGraw-Hill.

Morgan, J. P. (1996), RiskMetrics - Technical Document (4th ed.), New York: Morgan Guaranty Trust Company.

Koedijk, K. G., Nissen, F. G., Schotman, P. C., and Wolff, C. C. P. (1997), "The Dynamics of Short-Term Interest Rate Volatility Reconsidered," European Finance Review, 1, 105–130.

Li, H., and Xu, Y. (2000), "Regime Shifts and Short Rate Dynamics," unpublished manuscript, Cornell University.

Lo, A., and MacKinlay, A. C. (1989), "Data-Snooping Biases in Tests of Financial Asset Pricing Models," Review of Financial Studies, 3, 175–208.

Lukacs, E. (1970), Characteristic Functions (2nd ed.), London: Charles Griffin.

Meese, R., and Rogoff, K. (1983), "Empirical Exchange Rate Models of the Seventies: Do They Fit Out-of-Sample?" Journal of International Economics, 14, 3–24.

Nelson, D. (1990), "ARCH Models as Diffusion Approximation," Journal of Econometrics, 45, 7–39.

Nelson, D. (1991), "Conditional Heteroskedasticity in Asset Returns: A New Approach," Econometrica, 59, 347–370.

Rosenblatt, M. (1952), "Remarks on a Multivariate Transformation," The Annals of Mathematical Statistics, 23, 470–472.

Rothschild, M., and Stiglitz, J. (1970), "Increasing Risk: I. A Definition," Journal of Economic Theory, 2, 225–243.

Singleton, K. (2001), "Estimation of Affine Asset Pricing Models Using the Empirical Characteristic Function," Journal of Econometrics, 102, 111–141.

Soderlind, P., and Svensson, L. (1997), "New Techniques to Extract Market Expectations From Financial Instruments," Journal of Monetary Economics, 40, 383–429.

Stanton, R. (1997), "A Nonparametric Model of Term Structure Dynamics and the Market Price of Interest Rate Risk," Journal of Finance, 52, 1973–2002.

Vasicek, O. (1977), "An Equilibrium Characterization of the Term Structure," Journal of Financial Economics, 5, 177–188.

Watson, M. (1993), "Measures of Fit for Calibrated Models," Journal of Political Economy, 101, 1011–1041.

White, H. (2000), "A Reality Check for Data Snooping," Econometrica, 69, 1097–1127.


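Table 1. Spot Rate Models Considered for Density Forecast Evaluation

| Model | Mean | Volatility |
| --- | --- | --- |
| Discretized single-factor diffusion models | | |
| Random walk | $\alpha_0$ | $\sigma$ |
| Log-normal | $\alpha_1 r_{t-1}$ | $\sigma r_{t-1}$ |
| Dothan | 0 | $\sigma r_{t-1}$ |
| Pure CEV | 0 | $\sigma r_{t-1}^{\rho}$ |
| Vasicek | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma$ |
| CIR | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{.5}$ |
| CKLS | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{\rho}$ |
| Nonlinear drift | $\alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma r_{t-1}^{\rho}$ |
| GARCH models | | |
| No drift GARCH | 0 | $\sigma\sqrt{h_t}$ |
| Linear drift GARCH | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma\sqrt{h_t}$ |
| Nonlinear drift GARCH | $\alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma\sqrt{h_t}$ |
| No drift CEV–GARCH | 0 | $\sigma r_{t-1}^{\rho}\sqrt{h_t}$ |
| Linear drift CEV–GARCH | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{\rho}\sqrt{h_t}$ |
| Nonlinear drift CEV–GARCH | $\alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma r_{t-1}^{\rho}\sqrt{h_t}$ |
| Markov regime-switching models | | |
| No drift RS CEV | 0 | $\sigma(s_t) r_{t-1}^{\rho(s_t)}$ |
| Linear drift RS CEV | $\alpha_0(s_t) + \alpha_1(s_t) r_{t-1}$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)}$ |
| Nonlinear drift RS CEV | $\alpha_{-1}(s_t)/r_{t-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)}$ |
| No drift RS GARCH | 0 | $\sigma(s_t)\sqrt{h_t}$ |
| Linear drift RS GARCH | $\alpha_0(s_t) + \alpha_1(s_t) r_{t-1}$ | $\sigma(s_t)\sqrt{h_t}$ |
| Nonlinear drift RS GARCH | $\alpha_{-1}(s_t)/r_{t-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2$ | $\sigma(s_t)\sqrt{h_t}$ |
| No drift RS CEV–GARCH | 0 | $\sigma(s_t) r_{t-1}^{\rho(s_t)}\sqrt{h_t}$ |
| Linear drift RS CEV–GARCH | $\alpha_0(s_t) + \alpha_1(s_t) r_{t-1}$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)}\sqrt{h_t}$ |
| Nonlinear drift RS CEV–GARCH | $\alpha_{-1}(s_t)/r_{t-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2$ | $\sigma(s_t) r_{t-1}^{\rho(s_t)}\sqrt{h_t}$ |
| Discretized jump-diffusion models | | |
| No drift JD CEV | 0 | $\sigma r_{t-1}^{\rho}$ |
| Linear drift JD CEV | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{\rho}$ |
| Nonlinear drift JD CEV | $\alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma r_{t-1}^{\rho}$ |
| No drift JD GARCH | 0 | $\sigma\sqrt{h_t}$ |
| Linear drift JD GARCH | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma\sqrt{h_t}$ |
| Nonlinear drift JD GARCH | $\alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma\sqrt{h_t}$ |
| No drift JD CEV–GARCH | 0 | $\sigma r_{t-1}^{\rho}\sqrt{h_t}$ |
| Linear drift JD CEV–GARCH | $\alpha_0 + \alpha_1 r_{t-1}$ | $\sigma r_{t-1}^{\rho}\sqrt{h_t}$ |
| Nonlinear drift JD CEV–GARCH | $\alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2$ | $\sigma r_{t-1}^{\rho}\sqrt{h_t}$ |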

NOTE: For consistency, all models are estimated in a discrete-time setting using the maximum likelihood method (MLE). The eight discretized single-factor diffusion models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} z_t$, where $z_t \sim$ iid N(0,1). The six GARCH models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho}\sqrt{h_t}\, z_t$, where $h_t = \beta_0 + h_{t-1}(\beta_2 + \beta_1 r_{t-1}^{2\rho} z_{t-1}^2)$ and $z_t \sim$ iid N(0,1). The nine regime-switching models are nested by the following specification: $\Delta r_t = \alpha_{-1}(s_t)/r_{t-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2 + \sigma(s_t) r_{t-1}^{\rho(s_t)}\sqrt{h_t}\, z_t$, where $h_t = \beta_0 + \beta_1 E\{e_t \mid r_{t-2}, s_{t-2}\}^2 + \beta_2 h_{t-1}$, $e_t = [\Delta r_{t-1} - E(\Delta r_{t-1} \mid r_{t-2}, s_{t-1})]/\sigma(s_{t-1})$, $z_t \sim$ iid N(0,1), and $s_t$ follows a two-state Markov chain with the transition probability $\Pr(s_t = l \mid s_{t-1} = l) = (1 + \exp(-c_l - d_l r_{t-1}))^{-1}$ for $l = 1, 2$. The nine discretized jump-diffusion models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho}\sqrt{h_t}\, z_t + J(\mu,\gamma^2)\pi_t(q_t)$, where $h_t = \beta_0 + \beta_1[\Delta r_{t-1} - E(\Delta r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}$, $z_t \sim$ iid N(0,1), $J \sim N(\mu,\gamma^2)$, and $\pi_t(q_t)$ is Bernoulli($q_t$) with $q_t = (1 + \exp(-c - d r_{t-1}))^{-1}$.

we assume that the transition probability of $s_t$ depends on the once-lagged spot rate level:

$$
\Pr(s_t = l \mid s_{t-1} = l) = \frac{1}{1 + \exp(-c_l - d_l r_{t-1})}.
$$

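Table 1 lists a variety of regime-switching models, all of which are nested by the following specification:

$$
\begin{aligned}
\Delta r_t &= \alpha_{-1}(s_t)/r_{t-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2 + \sigma(s_t) r_{t-1}^{\rho(s_t)}\sqrt{h_t}\, z_t, \\
h_t &= \beta_0 + \beta_1 E\{e_t \mid r_{t-2}, s_{t-2}\}^2 + \beta_2 h_{t-1}, \\
e_t &= [\Delta r_{t-1} - E(\Delta r_{t-1} \mid r_{t-2}, s_{t-1})]/\sigma(s_{t-1}), \\
z_t &\sim \text{iid } N(0,1).
\end{aligned}
\tag{3}
$$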

As in the aforementioned models, we consider three specifications of the conditional mean: zero, linear, and nonlinear drift. We also consider three specifications of the conditional variance: CEV, GARCH, and combined CEV–GARCH. Thus we have a total of nine regime-switching models.

Although Gray (1996) removed the path-dependent nature of GARCH models by averaging the conditional and unconditional variances over regimes at every time point, we use the same GARCH specification across different regimes. Confirming results of Ang and Bekaert (2002), we find a very unstable estimation of Gray's (1996) model. In contrast, our specification turns out to have much better convergence properties. Unlike many previous studies that set the elasticity parameter equal to .5, we allow it to be regime-dependent and estimate it from the data. For identification, we set the diffusion constant $\sigma(s_t) = 1$ for $s_t = 1$ in the regime-switching models with GARCH effect.

The conditional density of the interest rate $r_t$ in a regime-switching model is

$$
p(r_t \mid I_{t-1}) = \sum_{l=1}^{2} p(r_t \mid s_t = l, I_{t-1}) \Pr(s_t = l \mid I_{t-1}),
$$

where the ex ante probability that the data are generated from regime $l$ at $t$, $\Pr(s_t = l \mid I_{t-1})$, can be obtained using Bayes's rule via a recursive procedure described by Hamilton (1989). The conditional density of regime-switching models is a mixture of two normal distributions, which can generate unimodal or bimodal distributions and allows for great flexibility in modeling skewness, kurtosis, and heavy tails.
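One step of this recursion can be sketched as follows. The sketch is illustrative only: it takes the regime-specific conditional means and variances as given inputs (in the models above they come from the drift, CEV, and GARCH terms), it uses the logistic form of the stay probability, and all function names are hypothetical rather than taken from the authors' code.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def stay_probability(c, d, r_lag):
    """Pr(s_t = l | s_{t-1} = l) = 1 / (1 + exp(-c_l - d_l * r_{t-1}))."""
    return 1.0 / (1.0 + math.exp(-c - d * r_lag))


def predict_regimes(filtered, stay):
    """Ex ante Pr(s_t = l | I_{t-1}) from the filtered Pr(s_{t-1} = l | I_{t-1}).

    filtered: [prob of regime 1, prob of regime 2] at t-1
    stay:     [Pr(1 -> 1), Pr(2 -> 2)]
    """
    p1 = filtered[0] * stay[0] + filtered[1] * (1.0 - stay[1])
    return [p1, 1.0 - p1]


def mixture_density(x, means, variances, ex_ante):
    """Two-regime conditional density: sum over l of p(x | s_t = l) Pr(s_t = l)."""
    return sum(ex_ante[l] * normal_pdf(x, means[l], variances[l]) for l in range(2))


def update_regimes(x, means, variances, ex_ante):
    """Bayes update to the filtered Pr(s_t = l | I_t) after observing x."""
    joint = [ex_ante[l] * normal_pdf(x, means[l], variances[l]) for l in range(2)]
    total = sum(joint)
    return [w / total for w in joint]


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    stay = [stay_probability(2.0, 0.1, 5.0), stay_probability(3.0, -0.2, 5.0)]
    ex_ante = predict_regimes([0.5, 0.5], stay)
    means, variances = [0.05, 0.0], [0.25, 0.01]
    print(mixture_density(0.1, means, variances, ex_ante))
    print(update_regimes(0.1, means, variances, ex_ante))
```

Iterating `predict_regimes` and `update_regimes` over the sample gives the sequence of ex ante probabilities that weight the two normal components in the density above.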

3.4 Jump-Diffusion Models

There are compelling economic and statistical reasons to account for discontinuity in interest rate dynamics. Various economic shocks, news announcements, and government interventions in bond markets have pronounced effects on the behavior of interest rates and tend to generate large jumps in interest rate data. Statistically, Das (2002) and Johannes (2003), among others, have shown that diffusion models (even with stochastic volatility) cannot generate the excessive leptokurtosis exhibited by the changes of the spot rates, and that jump-diffusion models are a convenient way to generate excessive kurtosis or, more generally, heavy tails.

We consider a class of discretized jump-diffusion models, listed in Table 1. As in the aforementioned models, we consider zero, linear, and nonlinear drift specifications, and CEV, GARCH, and combined CEV–GARCH specifications for volatility. The nine different jump-diffusion models are nested by the following specification:

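$$
\begin{aligned}
\Delta r_t &= \alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho}\sqrt{h_t}\, z_t + J(\mu,\gamma^2)\pi_t(q_t), \\
h_t &= \beta_0 + \beta_1[\Delta r_{t-1} - E(\Delta r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}, \\
z_t &\sim \text{iid } N(0,1), \\
\pi_t(q_t) &\sim \text{independent Bernoulli}(q_t), \\
J &\sim N(\mu,\gamma^2),
\end{aligned}
\tag{4}
$$

where $J$ is the jump size and $q_t$ is the jump probability, with $q_t = 1/[1 + \exp(-c - d r_{t-1})]$.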

The conditional density of the foregoing jump-diffusion models can be written as

$$
\begin{aligned}
f(r_t \mid r_{t-1}) &= (1 - q_t)\,\frac{1}{\sqrt{2\pi\sigma^2(r_{t-1})}} \exp\left\{ -\frac{[\Delta r_t - \mu(r_{t-1})]^2}{2\sigma^2(r_{t-1})} \right\} \\
&\quad + q_t\,\frac{1}{\sqrt{2\pi[\sigma^2(r_{t-1}) + \gamma^2]}} \exp\left\{ -\frac{[\Delta r_t - \mu(r_{t-1}) - \mu]^2}{2[\sigma^2(r_{t-1}) + \gamma^2]} \right\},
\end{aligned}
$$

where $\mu(r_{t-1})$ and $\sigma^2(r_{t-1})$ are the conditional mean and variance of the diffusion part in (4). Similar to the regime-switching models, the conditional density of the jump-diffusion models is also a mixture of two normal distributions. Thus these two classes of models represent two alternative approaches to generating excess kurtosis and heavy tails, although the regime-switching models have more sophisticated specifications. For example, in (3) all drift parameters are regime-dependent, whereas in (4) only the intercept term is different in the conditional mean and variance. For the regime-switching models in (3), the state probabilities evolve according to a transition matrix with updated priors at every time point. Thus it can capture both very persistent and transient regime shifts. For the jump-diffusion models in (4), the state probabilities are assumed to depend on the past interest rate level.
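The two-component mixture can be written as a short Python function. This is a sketch under stated assumptions: the diffusion drift and variance are passed in as numbers rather than computed from the CEV and GARCH terms, the density is expressed for the change in the rate, and the function and argument names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def jump_probability(c, d, r_lag):
    """q_t = 1 / (1 + exp(-c - d * r_{t-1}))."""
    return 1.0 / (1.0 + math.exp(-c - d * r_lag))


def jump_diffusion_density(dr, drift, diffusion_var, jump_mean, jump_var, q):
    """Mixture density of the rate change dr given r_{t-1}.

    With probability 1 - q there is no jump: N(drift, diffusion_var).
    With probability q a N(jump_mean, jump_var) jump is added, which shifts
    the mean by jump_mean and raises the variance by jump_var.
    """
    no_jump = normal_pdf(dr, drift, diffusion_var)
    with_jump = normal_pdf(dr, drift + jump_mean, diffusion_var + jump_var)
    return (1.0 - q) * no_jump + q * with_jump


if __name__ == "__main__":
    # Made-up parameter values for illustration only.
    q = jump_probability(c=-3.0, d=0.1, r_lag=6.0)
    for dr in (-1.0, 0.0, 1.0):
        print(dr, jump_diffusion_density(dr, 0.0, 0.04, -0.1, 1.0, q))
```

Because the jump component has the larger variance, even a small `q` puts noticeably more mass in the tails than a single normal with the diffusion variance, which is how the specification generates excess kurtosis.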

The in-sample performances of the four classes of models that we study have been individually studied in the literature. However, no one has yet attempted a systematic evaluation of all of these models in a unified setup, particularly in the out-of-sample context, because of the difficulty in density forecast evaluation and because of different nonnested specifications between existing classes of models. In the next section we compare the relative performance of these four classes of models in out-of-sample density forecast, using the evaluation method described in Section 2. This comparison can provide valuable information about the relative strengths and weaknesses of each model and should be important for many applications.

4. MODEL ESTIMATION AND IN-SAMPLE PERFORMANCE

4.1 Data and Estimation Method

In modeling spot interest rate dynamics, yields on short-term debts are often used as proxies for the unobservable instantaneous risk-free rate. These include the 1-month T-bill rates used by Gray (1996) and Chan, Karolyi, Longstaff, and Sanders (CKLS) (1992); the 3-month T-bill rates used by Stanton (1997) and Andersen and Lund (1997); the 7-day Eurodollar rates used by Ait-Sahalia (1996) and Hong and Li (2004); and the Fed funds rates used by Conley, Hansen, Luttmer, and Scheinkman (1997) and Das (2002). In our study we follow CKLS (1992) and use the daily 1-month T-bill rates from June 14, 1961 to December 29, 2000, with a total of 9868 observations. The data are extracted from the CRSP bond file using the midpoint of quoted bid and ask prices of the T-bills with a remaining maturity that is closest to 1 month (30 calendar days) from the current date. Most of these T-bills have maturities of 6 months or 1 year when first issued, and all T-bills used in our study have a remaining maturity between 27 and 33 days. We then compute the annualized continuously compounded yield based on the average of the ask and bid prices.

Figure 1 plots the level and change series of the daily 1-month T-bill rates, as well as their histograms. There is obvious persistent volatility clustering, and in general, the volatility is higher at a higher level of the interest rate (e.g., the 1979–1982 period). The marginal distribution of the interest rate level is skewed to the right, with a long right tail. Most daily changes in the interest rate level are very small, giving a sharp probability peak around 0. Daily changes of other spot interest rates, such as 3-month T-bill and 7-day Eurodollar rates, also exhibit a high peak around 0. But the peak is much less pronounced for weekly changes in spot rates. This, together with the long (right) tail, implies excess kurtosis, a stylized fact that has motivated the use of jump models in the literature (e.g., Das 2002; Johannes 2003).

Figure 1. Daily 1-Month T-Bill Rates Between June 14, 1961 and December 29, 2000. This figure plots the level and change series of the daily 1-month T-bill rates, as well as their histograms. The data are extracted from the CRSP bond file using the midpoint of quoted bid and ask prices of the T-bills with a remaining maturity that is closest to 1 month (30 calendar days) from the current date.

We divide our data into two subsamples. The first, from June 14, 1961 to February 22, 1991 (with a total of 7400 observations), is a sample used to estimate model parameters; the second, from February 23, 1991 to December 29, 2000 (with a total of 2467 observations), is a prediction sample used to evaluate out-of-sample density forecasts. We first consider the in-sample performance of the various models listed in Table 1, all of which are estimated using the MLE method. The optimization algorithm is the well-known BHHH with STEPBT for step length calculation and is implemented via the constrained optimization code in GAUSS Windows Version 5.0. The optimization tolerance level is set such that the gradients of the parameters are less than or equal to $10^{-6}$.
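The estimation-sample and prediction-sample workflow can be illustrated on the simplest model in Table 1. The sketch below is not the authors' GAUSS/BHHH code: it fits only the discretized Vasicek model, for which the Gaussian MLE has a closed form that coincides with ordinary least squares, so no numerical optimizer is needed. The data are simulated, and every function name and parameter value is hypothetical.

```python
import math
import random


def split_sample(rates, n_estimation):
    """Split a rate series into an estimation sample and a prediction sample."""
    return rates[:n_estimation], rates[n_estimation:]


def fit_vasicek_mle(rates):
    """Gaussian MLE of dr_t = a0 + a1 * r_{t-1} + sigma * z_t (same as OLS)."""
    x = rates[:-1]
    y = [rates[t] - rates[t - 1] for t in range(1, len(rates))]
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    sigma2 = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y)) / n
    loglik = -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)
    return a0, a1, math.sqrt(sigma2), loglik


def simulate_vasicek(n, a0, a1, sigma, r0, seed):
    """Simulate n observations from the discretized Vasicek model."""
    rng = random.Random(seed)
    rates = [r0]
    for _ in range(n - 1):
        r_lag = rates[-1]
        rates.append(r_lag + a0 + a1 * r_lag + sigma * rng.gauss(0.0, 1.0))
    return rates


if __name__ == "__main__":
    # Simulated stand-in for the T-bill series; sample sizes are illustrative.
    rates = simulate_vasicek(4000, a0=0.3, a1=-0.05, sigma=0.15, r0=6.0, seed=7)
    estimation, prediction = split_sample(rates, 3000)
    a0, a1, sigma, loglik = fit_vasicek_mle(estimation)
    print("a0, a1, sigma:", a0, a1, sigma)
    print("log-likelihood:", loglik, "prediction obs:", len(prediction))
```

Parameters are estimated once on the first subsample and then held fixed when forecasts for the prediction sample are evaluated. The richer models in Table 1 have no closed-form estimator and need a gradient-based optimizer such as the one described above.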

Tables 2–5 report parameter estimates of four classes of spot rate models using daily 1-month T-bill rates from June 14, 1961 to February 22, 1991, with a total of 7400 observations. To account for the special period between October 1979 and September 1982, we introduce dummy variables to the drift, volatility, and elasticity parameters; that is, $\alpha_D$, $\sigma_D$, and $\rho_D$ equal 0 outside of the 1979–1982 period. We also introduce a dummy variable, $D_{87}$, in the interest rate level to account for the effect of the 1987 stock market crash; $D_{87}$ equals 0 except for the week right after the stock market crash. Estimated robust standard errors are reported in parentheses.

4.2 In-Sample Evidence

We first discuss the in-sample performance of models within each class and then compare their performance across different classes. Many previous studies (e.g., Bliss and Smith 1998) have shown that the period between October 1979 and September 1982, which coincides with the Federal Reserve experiment, has a big impact on model parameter estimates. To account for the special feature of this period, we introduce dummy variables in the drift, volatility, and elasticity parameters for this period. We also introduce a dummy variable in the interest rate level for the week right after the 1987 stock market crash. Brenner et al. (1996) showed that the decline of the interest rates during this week is too dramatic to be captured by most standard models.

Table 2 reports parameter estimates with estimated robust standard errors and log-likelihood values for discretized single-factor diffusion models. The estimates of the drift parameters of Vasicek, CIR, and CKLS models all suggest mean-reversion in the conditional mean, with an estimated 6% long-run mean. For other models, such as the random walk, log-normal, and nonlinear drift models, most drift parameters are not significant. A comparison of the pure CEV, CKLS, and Ait-Sahalia (1996) nonlinear drift models indicates that the incremental contribution of nonlinear drift is very marginal, which is not surprising given that the drift parameter estimates in the nonlinear drift model are mostly insignificant. On the other hand, there is also clear evidence of a level effect: all estimates of the elasticity parameter are significant. Unlike previous studies (e.g., CKLS 1992), which found an estimated elasticity parameter close to 1.5, our estimate is about .25 (.70) outside (inside) the 1979–1982 period. Our results confirm previous findings of Brenner et al. (1996), Andersen and Lund (1997), Bliss and Smith (1998), and Koedijk, Nissen, Schotman, and Wolff (1997) that the estimate of the elasticity parameter is very sensitive to the choice of interest rate data, data frequency, sample periods, and specifications of the volatility function. The 1979–1982 period dummies suggest that the drift does not behave very differently (i.e., $\alpha_D$ is not significant), whereas volatility is significantly higher and depends more heavily on


Table 2 Parameter Estimates for the Single-Factor Diffusion Models

NonlinearParameters RW Log-normal Dothan Pure CEV Vasicek CIR CKLS drift

αminus1 0289(0751)

α0 0012 0196 0190 0199 minus0098(0019) (0056) (0049) (0053) (0485)

α1 0006 minus0033 minus0032 minus0034 0045(0004) (0010) (0009) (0009) (0095)

α2 minus0006(0006)

σ 1523 0304 0304 1011 1522 0658 1007 1007(0013) (0003) (0003) (0041) (0013) (0006) (0041) (0041)

ρ 2437 2460 2460(0238) (0238) (0238)

αD minus0047 0196 0154 0233 0266 0402(0157) (0186) (0168) (0176) (0168) (0202)

σD 2768 0175 0175 2764 0712(0112) (0013) (0013) (0112) (0036)

ρD 3692 3684 3680(0132) (0132) (0132)

D87 minus3988 minus2107 minus2079 minus3549 minus4003 minus3101 minus3574 minus3582(0681) (0660) (0660) (0671) (0681) (0655) (0670) (0671)

Log-likelihood 26496 21870 21849 26545 26555 26720 26617 26625

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (\sigma + \sigma_D)\, r_{t-1}^{(\rho + \rho_D)} z_t + D_{87}$, where $z_t \sim$ iid N(0,1).

the interest rate level in this period (i.e., $\sigma_D$ and $\rho_D$ are significantly positive) than in other periods. For models with level effect, we introduce the dummy only for elasticity parameter $\rho$ to avoid the offsetting effect of dummies in $\rho$ and $\sigma$, which would tend to generate unstable estimates. The stock market crash dummy is significantly negative, which is consistent with the findings of Brenner et al. (1996).
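The mean-reversion reading of the linear drift estimates for the Vasicek, CIR, and CKLS models can be made explicit with a short derivation. It is a sketch that assumes the drift of the change in the spot rate is $\alpha_0 + \alpha_1 r_{t-1}$ with $\alpha_1 < 0$; the symbol $\bar r$ for the long-run mean is introduced here for illustration only.

$$
E(\Delta r_t \mid r_{t-1}) = \alpha_0 + \alpha_1 r_{t-1} = -\alpha_1(\bar r - r_{t-1}), \qquad \bar r = -\frac{\alpha_0}{\alpha_1}.
$$

The expected change is zero at $r_{t-1} = \bar r$, positive when the rate is below $\bar r$, and negative when it is above, and $-\alpha_1$ measures the speed of adjustment per period. A significantly negative estimate of $\alpha_1$ is therefore what signals mean reversion.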

Estimation results of GARCH models in Table 3 show that GARCH significantly improves the in-sample fit of diffusion models (i.e., the log-likelihood value increases from about 2500 to about 5700). Previous studies, such as those by Bali (2001) and Durham (2003), have also shown that for single-factor diffusion, GARCH, and continuous-time stochastic volatility models, it is more important to correctly model the diffusion function than the drift function in fitting interest rate data. This suggests that volatility clustering is an important feature of interest rates. All estimates of GARCH parameters are overwhelmingly significant. The sum of GARCH parameter estimates, $\beta_1 + \beta_2$, is slightly larger (smaller) than 1 without (with) level effect. Although the sum of GARCH parameters is

Table 3 Parameter Estimates for the GARCH Models

No drift Linear drift Nonlinear drift No drift Linear drift Nonlinear driftParameters GARCH GARCH GARCH CEVndashGARCH CEVndashGARCH CEVndashGARCH

αminus1 1556 1716(0470) (0457)

α0 0053 minus1117 0063 minus1233(0025) (0323) (0025) (0311)

α1 minus0003 0263 minus0006 0290(0006) (0069) (0006) (0066)

α2 minus0018 minus0020(0004) (0004)

ρ 1709 1797 1926(0298) (0301) (0300)

β0 97Eminus05 86Eminus05 86Eminus05 89Eminus05 82Eminus05 82Eminus05(12Eminus05) (12Eminus05) (16Eminus05) (10Eminus05) (9Eminus05) (9Eminus05)

β1 1552 1544 1563 0896 0875 0851(0097) (0096) (0099) (0104) (0101) (0098)

β2 8707 8723 8712 8583 8582 8553(0065) (0064) (0065) (0074) (0076) (0078)

αD 0166 1119 0550 1317(0318) (0229) (0282) (0201)

σD 1368 1302 1017(0327) (0331) (0334)

ρD 0151 0064 minus0053(0134) (0146) (0142)

D87 minus5397 minus5426 minus5460 minus5288 minus5318 minus5352(0774) (0777) (0771) (0753) (0751) (0741)

Log-likelihood 56868 56995 57063 57011 57152 57255

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (1 + \sigma_D)\, r_{t-1}^{(\rho + \rho_D)}\sqrt{h_t}\, z_t + D_{87}$, where $h_t = \beta_0 + h_{t-1}(\beta_2 + \beta_1 r_{t-1}^{2\rho} z_{t-1}^2)$ and $z_t \sim$ iid N(0,1).


Table 4. Parameter Estimates for the Markov Regime-Switching Models

| Parameter | No drift CEV | Linear drift CEV | Nonlinear drift CEV | No drift GARCH | Linear drift GARCH | Nonlinear drift GARCH | No drift CEV–GARCH | Linear drift CEV–GARCH | Nonlinear drift CEV–GARCH |
|---|---|---|---|---|---|---|---|---|---|
| α−1(1) | | | 0607 (3569) | | | 3321 (1701) | | | 2646 (1605) |
| α0(1) | | 0973 (0259) | 1517 (2027) | | 0457 (0107) | −1462 (1009) | | 0450 (0105) | −1086 (0989) |
| α1(1) | | −0102 (0031) | −0288 (0321) | | −0043 (0019) | 0274 (0181) | | −0042 (0019) | 0209 (0185) |
| α2(1) | | | 0011 (0014) | | | −0015 (0010) | | | −0011 (0011) |
| σ(1) | 3361 (0302) | 3171 (0287) | 3106 (0281) | 1 | 1 | 1 | 1 | 1 | 1 |
| ρ(1) | 1021 (044) | 1277 (0446) | 138 (0448) | 0 | 0 | 0 | 1193 (0452) | 1589 (0460) | 1670 (0487) |
| α−1(2) | | | −0352 (0329) | | | −0267 (0242) | | | −0286 (0183) |
| α0(2) | | −0002 (0020) | 0141 (0229) | | −0013 (0017) | 0062 (0153) | | −0020 (0018) | 0062 (0104) |
| α1(2) | | −0006 (0004) | −0012 (0048) | | −0008 (0004) | −0001 (0031) | | −0007 (0004) | 0005 (0020) |
| α2(2) | | | −0001 (0003) | | | −0002 (0002) | | | −0002 (0001) |
| σ(2) | 0143 (0010) | 0141 (0010) | 0138 (0010) | 2566 (006) | 2567 (006) | 2574 (006) | 2769 (0275) | 3034 (0303) | 3051 (0311) |
| ρ(2) | 8122 (0403) | 8179 (0422) | 8257 (0421) | 0 | 0 | 0 | 0844 (0602) | 0727 (0602) | 0800 (0611) |
| β0 | | | | 383E−4 (711E−5) | 318E−4 (614E−5) | 296E−4 (579E−5) | 321E−4 (67E−4) | 243E−4 (50E−4) | 225E−4 (47E−4) |
| β1 | | | | 0357 (004) | 0336 (0039) | 0335 (0038) | 0265 (0057) | 0259 (0055) | 0252 (0053) |
| β2 | | | | 9027 (009) | 9066 (0089) | 9077 (0086) | 8972 (0099) | 8993 (0098) | 9001 (0094) |
| c1 | 3132 (2734) | 3001 (273) | 3171 (2683) | −12515 (3211) | −12007 (3065) | −11293 (3136) | −11954 (3819) | −11429 (3705) | −10012 (3674) |
| d1 | 1081 (0391) | 107 (039) | 1029 (0381) | 1303 (044) | 1295 (0426) | 1192 (0447) | 1190 (0594) | 1193 (0582) | 0959 (0593) |
| c2 | 40963 (2386) | 40342 (2393) | 40089 (2313) | 18864 (2048) | 17868 (1947) | 17633 (1909) | 18227 (2092) | 17299 (1983) | 16947 (2009) |
| d2 | −2302 (0354) | −225 (0354) | −2225 (0339) | −0938 (0276) | −0806 (0263) | −0776 (0257) | −0906 (0285) | −0810 (0276) | −0757 (0283) |
| Log-likelihood | 64931 | 65079 | 65128 | 70790 | 71328 | 71382 | 70828 | 71386 | 71441 |

NOTE: Standard errors are in parentheses. The models are nested by the following specification:

$$\Delta r_t = \frac{\alpha_{-1}(s_t)}{r_{t-1}} + \alpha_0(s_t) + \alpha_1(s_t)\, r_{t-1} + \alpha_2(s_t)\, r_{t-1}^2 + \sigma(s_t)\, r_{t-1}^{\rho(s_t)} \sqrt{h_t}\, z_t,$$

where $h_t = \beta_0 + \beta_1 \{E[e_t \mid r_{t-2}, s_{t-2}]\}^2 + \beta_2 h_{t-1}$, $e_t = [r_{t-1} - E(r_{t-1} \mid r_{t-2}, s_{t-1})]/\sigma(s_{t-1})$, $z_t \sim \text{iid } N(0, 1)$, and $s_t$ follows a two-state Markov chain with transition probability $\Pr(s_t = l \mid s_{t-1} = l) = (1 + \exp(-c_l - d_l r_{t-1}))^{-1}$ for $l = 1, 2$.
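The logistic transition probability in the note to Table 4 can be made concrete with a short sketch. The function names and the numerical inputs below are illustrative assumptions, not values taken from the table; the expected-duration formula additionally assumes that the stay probability is held fixed, which is true in the model only if the interest rate level does not change.

```python
import math


def stay_probability(c_l: float, d_l: float, r_prev: float) -> float:
    """Pr(s_t = l | s_{t-1} = l) = 1 / (1 + exp(-c_l - d_l * r_prev))."""
    return 1.0 / (1.0 + math.exp(-c_l - d_l * r_prev))


def expected_duration(p_stay: float) -> float:
    """Expected number of consecutive periods spent in a regime,
    assuming a constant stay probability (geometric duration)."""
    return 1.0 / (1.0 - p_stay)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    p = stay_probability(c_l=2.0, d_l=0.5, r_prev=4.0)
    print(f"stay probability: {p:.4f}")
    print(f"expected duration: {expected_duration(p):.1f} periods")
```

A positive $d_l$ makes regime $l$ more persistent at higher interest rate levels, and a negative $d_l$ makes it less persistent.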

slightly larger than 1, it is still possible that the spot rate model is strictly stationary (see Nelson 1991 for more discussion). We also find a significant level effect even in the presence of GARCH, but the estimated elasticity parameter is close to 20, which is much smaller than that of single-factor diffusion models. The specification of conditional variance also affects the estimation of drift parameters. Unlike those in diffusion models, most drift parameter estimates in linear drift GARCH models are insignificant. Among all GARCH models considered, the one with nonlinear drift and level effect has the best in-sample performance. In the 1979–1982 period, the interest rate volatility is significantly higher, but its dependence on the interest rate level is not significantly different from other periods. The 1987 stock market crash dummy is again significantly negative.

Parameter estimates of regime-switching models, given in Table 4, show that the spot rate behaves quite differently between the two regimes. Although the sum of GARCH parameters is slightly larger than 1, it is still possible that the spot rate model is strictly stationary (see Nelson 1991 for more discussion). For models with a linear drift, in the first regime the spot rate has a high long-run mean (about 10) and exhibits strong mean reversion (i.e., estimates of the mean-reversion speed parameter are significantly negative). The spot rate in the second regime behaves almost like a random walk, because most drift parameter estimates are close to 0 and insignificant. For models with a nonlinear drift, all drift parameters are insignificant. Volatility in the first regime is much higher, about three times that in the second regime. Our estimates show that the level effect, although significant in both regimes, is much stronger in the second regime in the absence of GARCH. After including GARCH, the elasticity parameter estimate becomes insignificant in the second regime but remains the same in the first regime. Apparently, regime switching helps capture volatility clustering: the sum of GARCH parameters, β1 + β2, is smaller in regime-switching models than in pure GARCH models. The estimated transition probabilities of the Markov state variable $s_t$ show that the low-volatility regime is much more persistent than the high-volatility regime. Our results suggest that the spot rate evolves like a random walk with low volatility most of the time, occasionally increasing and evolving with strong mean reversion and high volatility. Overall, the regime-switching model with a linear drift in each regime, CEV, and GARCH has the best in-sample performance.

466 Journal of Business & Economic Statistics, October 2004

Table 5. Parameter Estimates for the Jump-Diffusion Models

| Parameter | No drift CEV | Linear drift CEV | Nonlinear drift CEV | No drift GARCH | Linear drift GARCH | Nonlinear drift GARCH | No drift CEV–GARCH | Linear drift CEV–GARCH | Nonlinear drift CEV–GARCH |
|---|---|---|---|---|---|---|---|---|---|
| α−1 | | | 0264 (0293) | | | 0702 (0384) | | | 0661 (0380) |
| α0 | | −0011 (0169) | −0340 (0203) | | 0028 (0019) | −0506 (0256) | | 0026 (0020) | −0486 (0253) |
| α1 | | −0007 (0004) | 0101 (0043) | | −0013 (0004) | 0110 (0053) | | −0012 (0004) | 0107 (0052) |
| α2 | | | −0009 (0003) | | | −0008 (0003) | | | −0008 (0003) |
| σ | 0127 (0008) | 0126 (0008) | 0124 (0008) | | | | | | |
| ρ | 8936 (0390) | 8975 (0397) | 9045 (0400) | | | | 0929 (0518) | 0852 (0510) | 0837 (0512) |
| β0 | | | | 87E−05 (12E−05) | 88E−05 (12E−05) | 87E−05 (12E−05) | 76E−5 (12E−5) | 77E−5 (12E−5) | 77E−5 (12E−5) |
| β1 | | | | 0631 (0076) | 0640 (0073) | 0635 (0073) | 0450 (0101) | 0472 (0102) | 0472 (0102) |
| β2 | | | | 8514 (0139) | 8484 (0134) | 8499 (0134) | 8504 (0140) | 8471 (0135) | 8483 (0134) |
| c | −27404 (1608) | −27057 (1605) | −26972 (1614) | −37908 (1982) | −37470 (1970) | −37481 (1976) | −36908 (2097) | −36498 (2084) | −36518 (2093) |
| d | 1451 (0275) | 1423 (0275) | 1397 (0275) | 2554 (0307) | 2516 (0305) | 2509 (0307) | 2293 (0343) | 2274 (0340) | 2272 (0341) |
| μ | 0233 (0133) | 0385 (0142) | 0321 (0128) | 0700 (0152) | 0887 (0158) | 0895 (0158) | 0679 (0162) | 0892 (0164) | 0897 (0163) |
| γ | 4316 (0117) | 4279 (0120) | 4304 (0104) | 3928 (0197) | 3903 (0189) | 3929 (0187) | 4109 (0199) | 4046 (0182) | 4061 (0178) |
| αD | | −0094 (0121) | 0296 (0112) | | −0192 (0117) | 0057 (0154) | | −0197 (0112) | 0055 (0150) |
| σD | −1027 (1142) | −1328 (1054) | −1551 (1033) | | | | | | |
| ρD | 1473 (1079) | 1872 (0703) | −0697 (0722) | | | | −1220 (0769) | −1186 (0573) | −1277 (0564) |
| cD | 41604 (6523) | 43364 (6535) | 40036 (5477) | 40290 (7495) | 42505 (7546) | 41857 (7421) | 38622 (6911) | 40677 (7129) | 40272 (7019) |
| dD | −2441 (0808) | −2721 (0730) | −1836 (0534) | −2748 (0689) | −2860 (0682) | −2791 (0667) | −2320 (0685) | −2487 (0676) | −2445 (0662) |
| D87 | −4283 (0646) | −4253 (0641) | −4266 (0643) | −4226 (0826) | −4198 (0820) | −4217 (0815) | −4233 (0793) | −4209 (0793) | −4225 (0791) |
| Log-likelihood | 63063 | 63172 | 63263 | 67941 | 68108 | 68139 | 67963 | 68129 | 68161 |

NOTE: Standard errors are in parentheses. The models are nested by the following specification:

$$\Delta r_t = \frac{\alpha_{-1}}{r_{t-1}} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (\sigma + \sigma_D)\, r_{t-1}^{(\rho + \rho_D)} \sqrt{h_t}\, z_t + J(\mu, \gamma^2)\, \pi_t(q_t + q_D) + D_{87},$$

where $h_t = \beta_0 + \beta_1 [r_{t-1} - E(r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}$, $z_t \sim \text{iid } N(0, 1)$, $J \sim N(\mu, \gamma^2)$, and $\pi_t(q_t + q_D)$ is Bernoulli$(q_t + q_D)$ with $q_t + q_D = (1 + \exp(-c - c_D - (d + d_D) \cdot r_{t-1}))^{-1}$.
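To illustrate how the pieces of the jump-diffusion specification in the note to Table 5 fit together, the following is a minimal one-step simulation sketch. It is an illustration under stated assumptions rather than the authors' estimation code: the 1979–1982 and 1987 dummy terms are omitted, the conditional variance $h_t$ is passed in rather than updated, and the class name, field names, and parameter values are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class JumpDiffusionParams:
    # Hypothetical container; defaults give a driftless unit-variance model.
    a_m1: float = 0.0   # alpha_{-1}
    a0: float = 0.0     # alpha_0
    a1: float = 0.0     # alpha_1
    a2: float = 0.0     # alpha_2
    sigma: float = 1.0
    rho: float = 0.0    # elasticity (level effect)
    mu: float = 0.0     # mean jump size
    gamma: float = 1.0  # jump size standard deviation
    c: float = 0.0      # jump probability intercept
    d: float = 0.0      # jump probability slope


def jump_probability(p: JumpDiffusionParams, r_prev: float) -> float:
    """q_t = 1 / (1 + exp(-c - d * r_{t-1}))."""
    return 1.0 / (1.0 + math.exp(-p.c - p.d * r_prev))


def simulate_step(r_prev: float, h_t: float, p: JumpDiffusionParams,
                  rng: random.Random) -> float:
    """Draw r_t given r_{t-1} and h_t (dummy terms omitted)."""
    drift = p.a_m1 / r_prev + p.a0 + p.a1 * r_prev + p.a2 * r_prev ** 2
    shock = p.sigma * r_prev ** p.rho * math.sqrt(h_t) * rng.gauss(0.0, 1.0)
    has_jump = rng.random() < jump_probability(p, r_prev)
    jump = rng.gauss(p.mu, p.gamma) if has_jump else 0.0
    return r_prev + drift + shock + jump


if __name__ == "__main__":
    params = JumpDiffusionParams(a0=0.05, a1=-0.01, sigma=0.1, rho=1.0,
                                 mu=0.5, gamma=0.4, c=-3.0, d=0.1)
    rng = random.Random(0)
    r = 5.0
    for _ in range(5):
        r = simulate_step(r, h_t=1.0, p=params, rng=rng)
        print(round(r, 4))
```

Setting `sigma` to zero and `c` to a large negative number switches off both sources of randomness, which is a convenient way to check the drift term in isolation.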

Table 5 reports parameter estimates for discretized jump-diffusion models. There is some weak evidence of mean reversion, especially when there is GARCH. The contribution of nonlinear drift is again very marginal; most estimated drift parameters are insignificant. For jump-diffusion models without GARCH, the elasticity parameter estimate is about 9, which is closer to the 15 found by CKLS. However, the level effect is weakened (i.e., ρ becomes close to 1) after GARCH is introduced. It seems that CEV helps capture part of volatility clustering, but its importance diminishes in the presence of GARCH. We find that GARCH also significantly improves the performance of jump-diffusion models. Without GARCH, there is a high probability of small jumps, with an estimated mean jump size between 2 and 4. With GARCH, the jump probability becomes smaller, but the mean jump size increases to between about 7 and 9. This suggests that without GARCH, jumps can capture part of volatility clustering, but with GARCH, jumps mainly capture large interest rate movements. Of course, jumps also help explain volatility clustering; the sum of GARCH parameters is again much smaller than that in pure GARCH models. During the 1979–1982 period, other than the probability of jumps becoming substantially higher, other aspects of the jump-diffusion models (e.g., the drift, volatility, or elasticity parameter) do not behave very differently.

To sum up, our in-sample analysis reveals some important stylized facts for the spot rate:

1. The importance of modeling mean reversion in the conditional mean is ambiguous. It seems that linear drift is adequate for this purpose once GARCH, regime switching, or jumps are included. The contribution of Ait-Sahalia's (1996) type of nonlinear drift is very marginal.

2. It is important to model conditional heteroscedasticity through GARCH or level effect, although GARCH seems to have much better in-sample performance than CEV.

3. Regime switching and jumps help capture volatility clustering and especially the excess kurtosis and heavy tails of the interest rate.

4. Interest rates behave quite differently during the 1979–1982 period. The level and volatility of the interest rate, the dependence of the interest rate volatility on the interest rate level, and the probability of jumps are much higher during this period.

Hong, Li, and Zhao: Out-of-Sample Analysis of Spot Rate Models 467

5. OUT-OF-SAMPLE DENSITY FORECAST PERFORMANCE

The foregoing in-sample analysis demonstrates that modeling volatility clustering through GARCH and heavy tails and excess kurtosis through regime switching or jumps significantly improves the in-sample fit of historical interest rate data. However, it is not clear whether these models will also perform well in out-of-sample density forecasts for future interest rates. Indeed, some previous studies have shown that more complicated models actually underperform the simple random walk model in predicting the conditional mean of future interest rates (e.g., Duffee 2002). In many financial applications, we are interested in forecasting the whole conditional distribution. We now apply the generalized spectral evaluation method described in Section 2 to evaluate density forecasts of the models under study. Among other things, we examine whether the features found to be important for in-sample performance remain important for out-of-sample forecasts, and whether the best in-sample performing models still perform best in out-of-sample forecasts.

For each model, we first calculate $Z_t(\hat{\theta})$, the model generalized residuals, using the prediction sample, with model parameter estimates based on the estimation sample. Table 6 reports the out-of-sample evaluation statistic M1 for each model. To compute M1, we select a normal cdf $\Phi(\sqrt{12}\,u)$ for the weighting function W(u) and a Bartlett kernel for the kernel function k(z). To choose a lag order p, we use a data-driven method that minimizes the asymptotic integrated mean squared error of the modified generalized spectral density. This method involves choosing a preliminary lag order $\bar{p}$. We examine a wide range of $\bar{p}$ in our application. We obtain similar results for the preliminary lag order $\bar{p}$ between 10 and 30, and report results for $\bar{p} = 20$ in Table 6. For comparison, we also report the in-sample log-likelihood values obtained from the estimation sample.

The M1 statistics show that all single-factor diffusion models are overwhelmingly rejected. One of the most interesting findings is that models that include a drift term (either linear or nonlinear) have much worse out-of-sample performance than those without a drift. Dothan's (1978) model and the pure CEV model have the best density forecast performance. Both models have a zero drift and level effect; the elasticity parameter ρ = 1 for the former and 25 (60 for the 1979–1982 period) for the latter. On the other hand, although the CKLS model and Ait-Sahalia's (1996) nonlinear drift model have the highest in-sample likelihood values, they have the worst out-of-sample density forecasts in terms of M1. This seems to suggest that for the purpose of density forecast, it is much more important to model the diffusion function than the drift function. Of course, this does not necessarily mean that the drift is not important.

Table 6. Out-of-Sample Density Forecast Performance of Spot Interest Rate Models

| Model | M1 | Log-likelihood |
|---|---|---|
| Single-factor diffusion models | | |
| Random walk | 108*** | 26496 |
| Log-normal | 121*** | 21870 |
| Dothan | 067** | 21849 |
| Pure CEV | 082** | 26545 |
| Vasicek | 213*** | 26555 |
| CIR | 219*** | 26720 |
| CKLS | 219*** | 26617 |
| Nonlinear drift | 224*** | 26625 |
| GARCH models | | |
| No drift GARCH | 079** | 56868 |
| Linear drift GARCH | 276*** | 56995 |
| Nonlinear drift GARCH | 339*** | 57063 |
| No drift CEV–GARCH | 077** | 57011 |
| Linear drift CEV–GARCH | 291*** | 57152 |
| Nonlinear drift CEV–GARCH | 358*** | 57255 |
| Regime-switching models | | |
| No drift RS CEV | 129*** | 64931 |
| Linear drift RS CEV | 091*** | 65079 |
| Nonlinear drift RS CEV | 162*** | 65128 |
| No drift RS GARCH | 118*** | 70790 |
| Linear drift RS GARCH | 047* | 71328 |
| Nonlinear drift RS GARCH | 055** | 71382 |
| No drift RS CEV–GARCH | 124*** | 70828 |
| Linear drift RS CEV–GARCH | 056** | 71386 |
| Nonlinear drift RS CEV–GARCH | 067** | 71441 |
| Jump-diffusion models | | |
| No drift JD CEV | 180*** | 63063 |
| Linear drift JD CEV | 039* | 63172 |
| Nonlinear drift JD CEV | 076** | 63263 |
| No drift JD GARCH | 178*** | 67941 |
| Linear drift JD GARCH | 056** | 68108 |
| Nonlinear drift JD GARCH | 067** | 68139 |
| No drift JD CEV–GARCH | 174*** | 67963 |
| Linear drift JD CEV–GARCH | 053** | 68129 |
| Nonlinear drift JD CEV–GARCH | 065** | 68161 |

NOTE: This table reports the out-of-sample density forecast performance of the single-factor diffusion, GARCH, Markov regime-switching, and jump-diffusion models, measured by the M1 statistic. For convenience of comparison, we also report the in-sample log-likelihood value for each model. The in-sample parameter estimation is based on the observations from June 14, 1961, to February 2, 1991, and the out-of-sample density forecast evaluation is based on the observations from February 23, 1991, to December 29, 2000. The ratio between the size of the estimation sample and the forecast sample is about 3:1. In computing the out-of-sample M1 statistics, a preliminary lag order $\bar{p}$ of 20 is used. Other lag orders give similar results. The asymptotic critical values for the out-of-sample evaluation statistic are 087, 051, and 037 at the 1%, 5%, and 10% levels; ***, **, and * indicate that the M1 statistic is significant at the 1%, 5%, and 10% levels.


It might be simply because the drift specifications considered are severely misspecified and do not outperform the zero drift model. In fact, most drift parameter estimates have large standard errors and thus have excess sampling variation in parameter estimation. As a consequence, assuming zero drift may result in a smaller adverse effect on out-of-sample performance.

The generalized residuals $Z_t(\hat{\theta})$ contain much information about possible sources of model misspecification. Instead of being uniform, the histograms of the generalized residuals for all diffusion models (Fig. 2) exhibit a high peak in the center of the distribution, which indicates that the diffusion models cannot satisfactorily capture the marginal density of the spot rate. The separate inference statistics M(m, l) in Table 7 also show that all the models fail to satisfactorily capture the dynamics of the generalized residuals. In particular, the large M(2, 2) and M(4, 4) statistics indicate that diffusion models perform rather poorly in modeling the conditional variance and kurtosis of the generalized residuals.

Similar to single-factor diffusion models, GARCH models with a zero drift have much better density forecasts than those models with a linear or nonlinear drift. The best GARCH and diffusion models have M1 statistics roughly between 07 and 08. Comparing the best GARCH and single-factor diffusion models, we find that GARCH provides better density forecasts than CEV, and combining CEV and GARCH further improves the density forecasts. Diagnostic analysis of generalized residuals shows that GARCH models provide certain improvements over diffusion models. For example, the histograms of the generalized residuals of GARCH models have a less-pronounced peak in the center. The M(m, l) statistics show that GARCH models significantly improve the fitting of even-order moments of the generalized residuals; the M(2, 2) and M(4, 4) statistics are reduced from close to 100 to single digits.

Figure 2 (panels a–h). Histograms of the Generalized Residuals of Four Classes of Spot Rate Models. This figure plots the histograms of the generalized residuals of the single-factor diffusion, GARCH, regime-switching, and jump-diffusion spot rate models.

In summary, our analysis of single-factor diffusion and GARCH models demonstrates that both classes of models fail to provide good density forecasts for future interest rates. They fail to capture both the high peak in the center of the marginal density and the dynamics of different moments of the generalized residuals. GARCH models have a more uniform marginal density and significantly improve the ability to model the dynamics of even-order moments of the generalized residuals. For both classes of models, modeling the conditional variance through CEV or GARCH is much more important than modeling the conditional mean of the interest rate. Zero-drift models outperform either linear or nonlinear drift models in density forecast, although the M(1, 1) statistics suggest that neither model adequately captures the dynamic structure in the conditional mean of the generalized residuals.

We next turn to regime-switching models. Regime-switching models generally have better density forecasts than single-factor diffusion and GARCH models; the best regime-switching models reduce the M1 statistics of the best diffusion and GARCH models from above 07 to about 05. Although modeling the drift does not improve density forecast for diffusion and GARCH models, the best regime-switching models in density forecast have a linear drift in each regime. As pointed out by Ang and Bekaert (2002) and Li and Xu (2000), a regime-switching model with a linear drift in each regime actually implies nonlinear conditional mean dynamics for the interest rate. Thus our results suggest that perhaps a specific nonlinear drift is needed for out-of-sample density forecasts. GARCH is more important than CEV for density forecasts, and combining CEV with GARCH does not further improve model performance. Overall, the regime-switching GARCH model with a linear drift in each regime provides the best out-of-sample forecasts among all of the regime-switching models. Diagnostic analysis shows that regime-switching models provide a much better characterization of the marginal density of interest rates; the histograms of the generalized residuals are much closer to U[0, 1]. The M(2, 2) and M(4, 4) statistics of regime-switching models are slightly higher than those of GARCH models, however, and the M(1, 1) and M(3, 3) statistics of regime-switching models are actually higher than those of diffusion and GARCH models. Therefore, the advantages of regime-switching models seem to come from better modeling of the marginal density rather than from the dynamics of the generalized residuals.

The density forecast performance of jump-diffusion models shares many common features with that of regime-switching models. First, they have comparable performance; the M1 statistics of the best jump-diffusion models are also around 05. Second, for jump-diffusion models, the linear drift specification outperforms those with no drift or nonlinear drift for density forecasts. Third, for jump-diffusion models, GARCH generally provides better density forecasts than CEV. Fourth, jump-diffusion models have similar advantages over single-factor diffusion and GARCH models. They have a much better characterization of the marginal density of the generalized residuals, but they may not necessarily perform better in modeling the dynamics of the generalized residuals, as indicated by their large M(m, l) statistics. The jump model with a linear drift and level effect, although it has the smallest M1 statistic, apparently fails to capture the even-order moments of the generalized residuals, with M(2, 2) and M(4, 4) statistics significantly higher than those of other jump-diffusion models that include GARCH effects.

Table 7. Separate Inference Statistics for Out-of-Sample Density Forecast Performance

| Model | M(1, 1) | M(1, 2) | M(2, 1) | M(2, 2) | M(3, 3) | M(4, 4) |
|---|---|---|---|---|---|---|
| Single-factor diffusion models | | | | | | |
| Random walk | 40.08 | 6.35 | 40.48 | 99.73 | 21.50 | 80.12 |
| Log-normal | 46.52 | 8.77 | 39.91 | 110.10 | 26.98 | 92.51 |
| Dothan | 46.04 | 8.07 | 42.62 | 108.90 | 27.02 | 91.03 |
| Pure CEV | 41.62 | 6.63 | 41.55 | 96.81 | 22.85 | 80.42 |
| Vasicek | 40.40 | 8.13 | 35.55 | 100.20 | 21.43 | 81.79 |
| CIR | 43.61 | 8.68 | 36.30 | 98.57 | 24.39 | 85.02 |
| CKLS | 42.02 | 8.55 | 35.75 | 97.89 | 22.80 | 82.77 |
| Nonlinear drift | 42.40 | 8.37 | 35.93 | 98.47 | 22.77 | 82.98 |
| GARCH models | | | | | | |
| No drift GARCH | 40.73 | 3.75 | 28.21 | 7.68 | 17.57 | 2.53 |
| Linear drift GARCH | 41.07 | 4.66 | 25.11 | 7.53 | 17.00 | 2.35 |
| Nonlinear drift GARCH | 40.71 | 4.87 | 24.70 | 7.02 | 16.63 | 2.09 |
| No drift CEV–GARCH | 41.30 | 4.10 | 28.40 | 7.46 | 19.24 | 2.74 |
| Linear drift CEV–GARCH | 41.68 | 5.11 | 25.06 | 7.09 | 18.73 | 2.44 |
| Nonlinear drift CEV–GARCH | 41.43 | 5.45 | 24.32 | 6.45 | 18.54 | 2.17 |
| Regime-switching models | | | | | | |
| No drift RS CEV | 53.16 | 3.13 | 11.31 | 7.05 | 35.02 | 10.21 |
| Linear drift RS CEV | 53.25 | 4.56 | 11.16 | 6.57 | 34.31 | 8.83 |
| Nonlinear drift RS CEV | 53.75 | 5.12 | 9.84 | 6.56 | 34.50 | 8.49 |
| No drift RS GARCH | 57.29 | 4.05 | 21.59 | 10.95 | 43.71 | 7.63 |
| Linear drift RS GARCH | 55.46 | 4.56 | 21.09 | 12.38 | 40.71 | 9.14 |
| Nonlinear drift RS GARCH | 55.69 | 4.50 | 21.05 | 12.14 | 41.13 | 8.93 |
| No drift RS CEV–GARCH | 45.35 | 3.69 | 20.91 | 10.67 | 34.25 | 7.08 |
| Linear drift RS CEV–GARCH | 44.10 | 4.86 | 20.75 | 11.99 | 32.12 | 8.87 |
| Nonlinear drift RS CEV–GARCH | 44.36 | 4.84 | 20.55 | 11.40 | 32.46 | 8.32 |
| Jump-diffusion models | | | | | | |
| No drift JD CEV | 47.38 | 5.32 | 16.76 | 85.10 | 43.01 | 99.31 |
| Linear drift JD CEV | 46.32 | 4.48 | 19.63 | 85.09 | 42.26 | 99.82 |
| Nonlinear drift JD CEV | 46.50 | 4.85 | 18.31 | 85.10 | 42.48 | 99.29 |
| No drift JD GARCH | 43.32 | 5.10 | 20.35 | 12.38 | 28.44 | 8.96 |
| Linear drift JD GARCH | 42.57 | 4.60 | 21.48 | 12.19 | 28.22 | 8.84 |
| Nonlinear drift JD GARCH | 42.62 | 4.67 | 21.42 | 12.34 | 28.24 | 8.95 |
| No drift JD CEV–GARCH | 43.55 | 5.05 | 20.51 | 13.36 | 29.31 | 10.07 |
| Linear drift JD CEV–GARCH | 42.74 | 4.54 | 21.69 | 13.00 | 28.93 | 9.76 |
| Nonlinear drift JD CEV–GARCH | 42.78 | 4.62 | 21.57 | 13.07 | 28.94 | 9.81 |

NOTE: This table reports the separate inference statistics M(m, l) for the four classes of spot rate models. The asymptotically normal statistic M(m, l) can be used to test whether the cross-correlation between the mth and lth moments of $Z_t$ is significantly different from 0. The choice of (m, l) = (1, 1), (2, 2), (3, 3), and (4, 4) is sensitive to autocorrelations in mean, variance, skewness, and kurtosis of $r_t$. We report results only for a lag truncation order p = 20; the results for p = 10 and 30 are rather similar. The asymptotic critical values are 1.28, 1.65, and 2.33 at the 10%, 5%, and 1% levels.

In summary, our out-of-sample density forecast evaluation reveals the following important features:

1. For out-of-sample density forecast, the importance of modeling mean reversion in the interest rate is ambiguous. For diffusion and GARCH models, a linear or nonlinear drift gives worse density forecasts than a zero drift. However, when there are regime switches or jumps, linear drift models outperform those with zero or nonlinear drift. The best density forecast models still fail to adequately capture the mean dynamics of the generalized residuals.

2. It is important to capture volatility clustering in interest rates for both in-sample and out-of-sample performance via GARCH. Most of the best-performing models have a GARCH component, which significantly improves their ability to model the dynamics of the even-order moments of the generalized residuals.

3. It is also important to capture excess kurtosis and heavy tails of the interest rate for both in-sample and out-of-sample performance via regime switching or jumps. The advantages of regime-switching and jump models are mainly reflected in modeling the marginal density rather than in the dynamics of the generalized residuals.

Overall, our out-of-sample analysis demonstrates that more-complicated models that incorporate conditional heteroscedasticity and heavy tails of interest rates tend to have better density forecasts. This result is quite different from the results of studies that focus on conditional mean forecasts. As widely documented in the literature, the predictable component in the conditional mean of the interest rate appears insignificant. As a result, random walk models tend to outperform more-sophisticated models in terms of mean forecast (e.g., Duffee 2002). However, density forecast includes all conditional moments, and as a result, those models that can capture the dynamics of higher-order moments tend to have better density forecasts. Our analysis suggests that the more-complicated spot rate models developed in the literature can indeed capture some important features of interest rates and are relevant in applications involving density forecasts of interest rates.


6. CONCLUSION

Despite the numerous empirical studies of spot interest rate models, little effort has been devoted to examining their out-of-sample forecast performance. We have contributed to the literature by providing the first (to our knowledge) comprehensive empirical analysis of the out-of-sample performance of a wide variety of popular spot rate models in forecasting the conditional density of future interest rates. Out-of-sample density forecasts help minimize the data snooping bias due to excessive searching for more-complicated models using similar datasets. The conditional density, which completely characterizes the full dynamics of interest rates, is also an essential input to many financial applications. Using a rigorous econometric evaluation procedure for density forecasts, we examined the out-of-sample performance of single-factor diffusion, GARCH, regime-switching, and jump-diffusion models.

Although previous studies have shown that simpler models, such as the random walk model, tend to have more accurate forecasts for the conditional mean of interest rates, we find that models that capture conditional heteroscedasticity and heavy tails of the spot rate have better out-of-sample density forecasts. GARCH significantly improves modeling of the conditional variance and kurtosis of the generalized residuals, whereas regime switching and jumps help in modeling the marginal density of interest rates. Therefore, our analysis shows that the more-sophisticated interest rate models developed in the literature can indeed capture some important features of interest rate data and perform better than simpler models in out-of-sample density forecasts. Although we have focused on density forecasts of the spot rates here, there is evidence that the term structure of interest rates is driven by multiple factors (e.g., Dai and Singleton 2000). Extending our analysis to forecast the joint density of multiple yields is an interesting issue that we will address in future research.

ACKNOWLEDGMENTS

The authors are grateful for the helpful comments of Canlin Li, Daniel Morillo, Larry Pohlman, and seminar participants at PanAgora Asset Management, the 2002 Econometric Society European Summer Meeting, and the 2003 Econometric Society North American Winter Meeting. They also thank the associate editor and two anonymous referees, whose comments and suggestions have greatly improved the article, and David George for editorial assistance. Hong's work was supported by National Science Foundation grant SES-0111769. Any errors are the authors' own.

APPENDIX: GENERALIZED SPECTRAL EVALUATION FOR DENSITY FORECASTS

Here we describe Hong's (2003) omnibus evaluation procedure for out-of-sample density forecasts, as well as a class of separate inference procedures. (For more discussion and formal treatment, see Hong 2003.)

A.1 Generalized Spectrum

Generalized spectrum was proposed by Hong (1999) as an analytic tool to provide an alternative to the power spectrum and the higher-order spectrum (e.g., bispectrum) in time series analysis. Define the centered generalized residual

$$Z_t \equiv Z_t(\theta) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \theta)\, dr - \tfrac{1}{2}, \qquad t = R + 1, \ldots, T,$$

where $p(r, t \mid I_{t-1}, \theta)$ is a conditional density model for time series $\{r_t\}$. Suppose that the series $\{Z_t\}_{t=-\infty}^{\infty}$ has marginal characteristic function $\varphi(u) \equiv E(e^{iuZ_t})$ and pairwise joint characteristic function $\varphi_j(u, v) \equiv E[\exp\{i(uZ_t + vZ_{t-|j|})\}]$, where $i \equiv \sqrt{-1}$, $u, v \in (-\infty, \infty)$, and $j = 0, \pm 1, \ldots$. The basic idea of generalized spectrum is to transform the series $\{Z_t\}$ and then consider the spectrum of the transformed series. Define the generalized covariance function

$$\sigma_j(u, v) \equiv \mathrm{cov}(e^{iuZ_t}, e^{ivZ_{t-|j|}}), \qquad j = 0, \pm 1, \ldots.$$

Straightforward algebra yields $\sigma_j(u, v) = \varphi_j(u, v) - \varphi(u)\varphi(v)$. Because $\varphi_j(u, v) = \varphi(u)\varphi(v)$ for all $(u, v)$ if and only if $Z_t$ and $Z_{t-|j|}$ are independent (cf. Lukacs 1970), $\sigma_j(u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ over various lags, including those with zero autocorrelation in $\{Z_t\}$.
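The centered generalized residual is the probability integral transform of the realized rate under the one-step-ahead forecast density, shifted by one-half. The sketch below computes it for a purely illustrative case in which the conditional density is Gaussian with a user-supplied conditional mean and standard deviation; this Gaussian form, and the function names, are assumptions made for the example and are not one of the models estimated in the article.

```python
import math
from typing import Callable, List, Sequence


def normal_cdf(x: float) -> float:
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def centered_generalized_residuals(
    rates: Sequence[float],
    cond_mean: Callable[[float], float],
    cond_std: Callable[[float], float],
) -> List[float]:
    """Z_t = F(r_t | r_{t-1}) - 1/2 under a Gaussian one-step-ahead density.

    If the forecast density is correct, the Z_t are iid U[-1/2, 1/2].
    """
    residuals = []
    for t in range(1, len(rates)):
        m = cond_mean(rates[t - 1])
        s = cond_std(rates[t - 1])
        residuals.append(normal_cdf((rates[t] - m) / s) - 0.5)
    return residuals


if __name__ == "__main__":
    # Random walk forecast with unit conditional standard deviation.
    z = centered_generalized_residuals(
        [5.0, 5.0, 6.0], cond_mean=lambda r: r, cond_std=lambda r: 1.0
    )
    print(z)
```

For models with regime switching or jumps, the conditional cdf is a mixture, so `normal_cdf` would be replaced by the corresponding mixture cdf.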

Under certain regularity conditions on the temporal dependence of $\{Z_t\}$, the Fourier transform of the generalized covariance function $\sigma_j(u, v)$ exists and is given by

$$f(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi].$$

Like $\sigma_j(u, v)$, $f(\omega, u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ across various lags. An advantage of spectral analysis is that $f(\omega, u, v)$ incorporates all lags simultaneously. Thus it can capture the dependent processes in which serial dependence occurs only at higher-order lags. This can arise from seasonality (e.g., calendar effects) or time delay in financial markets. Moreover, $f(\omega, u, v)$ is particularly powerful in capturing the dependent alternatives whose serial dependence decays to 0 slowly as $j \to \infty$, such as slow mean-reverting or persistent volatility clustering processes. For such processes, $f(\omega, u, v)$ has a sharp peak at frequency 0.

The function $f(\omega, u, v)$ does not require any moment condition on $\{Z_t\}$. When the moments of $\{Z_t\}$ exist, however, we can differentiate $f(\omega, u, v)$ with respect to $(u, v)$ at $(0, 0)$:

$$f^{(0,m,l)}(\omega, 0, 0) \equiv \left. \frac{\partial^{m+l}}{\partial u^m\, \partial v^l} f(\omega, u, v) \right|_{(u,v)=(0,0)} = \frac{i^{m+l}}{2\pi} \sum_{j=-\infty}^{\infty} \mathrm{cov}\big(Z_t^m, Z_{t-|j|}^l\big)\, e^{-ij\omega}.$$

In particular, $f^{(0,1,1)}(\omega, 0, 0)$ is the well-known power spectral density. (For applications of the power spectral density in economics and finance, see, e.g., Granger 1969; Durlauf 1991; Watson 1993.) For this reason, $f(\omega, u, v)$ is called the "generalized spectral density" of $\{Z_t\}$. The parameters $(u, v)$ provide much flexibility in capturing linear and nonlinear dependence. Because $f(\omega, u, v)$ can be decomposed as a weighted sum of $f^{(0,m,l)}(\omega, 0, 0)$ over various $(m, l)$ via a Taylor series expansion around $(0, 0)$, it contains information on all autocorrelations and cross-correlations in the various power terms of $\{Z_t\}$. Thus it can be used to develop an omnibus procedure against a wide range of dependent processes.

A.2 Omnibus Evaluation for Density Forecast

To evaluate out-of-sample density forecasts of model $p(r, t \mid I_{t-1}, \theta)$, Hong (2003) used the generalized spectrum to develop an omnibus test of iid $U[-\tfrac{1}{2}, \tfrac{1}{2}]$ for the centered generalized residual $\{Z_t\}$. To test $H_0$: $\{Z_t\} \sim \text{iid } U[-\tfrac{1}{2}, \tfrac{1}{2}]$, we use the modified generalized covariance

$$\sigma_j^U(u, v) \equiv \varphi_j(u, v) - \varphi_U(u)\varphi_U(v),$$

where $\varphi_U(u) \equiv \sin(u/2)/(u/2)$ is the characteristic function of a $U[-\tfrac{1}{2}, \tfrac{1}{2}]$ random variable. Because $\sigma_j^U(u, v)$ has incorporated the information of $U[-\tfrac{1}{2}, \tfrac{1}{2}]$, it can capture any pairwise serial dependence in $\{Z_t\}$ and any deviation from $U[-\tfrac{1}{2}, \tfrac{1}{2}]$. The associated generalized spectrum is

$$f^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j^U(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi],\; u, v \in (-\infty, \infty).$$

Under $H_0$, we have $\sigma_j^U(u, v) = 0$ for all $j \neq 0$, and so $f^U(\omega, u, v)$ becomes a known "flat" spectrum,

$$f_0^U(\omega, u, v) \equiv \frac{1}{2\pi} \sigma_0^U(u, v) = \frac{1}{2\pi} [\varphi_U(u + v) - \varphi_U(u)\varphi_U(v)], \qquad \omega \in [-\pi, \pi].$$

When $Z_t$ and $Z_{t-j}$ are not independent at some lag $j \neq 0$, or when $Z_t$ is not $U[-\tfrac{1}{2}, \tfrac{1}{2}]$, we have $f^U(\omega, u, v) \neq f_0^U(\omega, u, v)$. Hence, to test $H_0$, one can check whether a consistent estimator of $f^U(\omega, u, v)$ is significantly different from $f_0^U(\omega, u, v)$.

Suppose that $\hat{\theta}$ is an estimator for model parameter $\theta$ based on the estimation sample $\{r_t\}_{t=1}^{R}$. To estimate $f^U(\omega, u, v)$ using the prediction sample $\{r_t\}_{t=R+1}^{N}$, we have to use the estimated proxies $\{\hat{Z}_t\}_{t=R+1}^{N}$, where

$$\hat{Z}_t \equiv Z_t(\hat{\theta}) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \hat{\theta})\, dr - \tfrac{1}{2}, \qquad t = R + 1, \ldots, N.$$

Define the empirical measure

σUj (u v) equiv ϕj(u v)minusϕU(u)ϕU(v) j = 0plusmn1 plusmn(nminus1)

where n equiv N minus R is the size of the prediction sample and theempirical pairwise joint characteristic function

ϕj(u v) equiv (n minus | j|)minus1Nsum

t=R+| j|+1

expi(uZt + vZtminus| j|

)

We introduce a class of smoothed kernel estimators,

\[ \hat f^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=1-n}^{n-1} (1 - |j|/n)^{1/2}\, k(j/p)\, \hat\sigma^U_j(u, v)\, e^{-ij\omega}, \]

where $k(\cdot)$ is a kernel and $p \equiv p(n)$ is a bandwidth/lag order. An example of $k(\cdot)$ is the Bartlett kernel,

\[ k(z) = \begin{cases} 1 - |z| & \text{if } |z| \le 1, \\ 0 & \text{otherwise.} \end{cases} \]
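A minimal sketch of the kernel estimator with the Bartlett kernel follows. It is written for clarity rather than speed, the helper names are hypothetical, and the lag order `p` is simply passed in rather than chosen by a data-driven rule.

```python
import cmath
import math


def bartlett(x):
    """Bartlett kernel: 1 - |x| on [-1, 1], zero elsewhere."""
    return max(0.0, 1.0 - abs(x))


def phi_U(u):
    """Characteristic function of U[-1/2, 1/2]."""
    return 1.0 if u == 0 else math.sin(u / 2) / (u / 2)


def sigma_hat(z, j, u, v):
    """Empirical generalized covariance at lag j (depends on j through |j|)."""
    j = abs(j)
    n = len(z)
    phi = sum(cmath.exp(1j * (u * z[t] + v * z[t - j])) for t in range(j, n)) / (n - j)
    return phi - phi_U(u) * phi_U(v)


def f_hat(z, omega, u, v, p):
    """Kernel estimate of the generalized spectrum at (omega, u, v)."""
    n = len(z)
    total = 0j
    for j in range(1 - n, n):
        k = bartlett(j / p)
        if k == 0.0:
            continue  # lags outside the kernel's support contribute nothing
        weight = math.sqrt(1 - abs(j) / n) * k
        total += weight * sigma_hat(z, j, u, v) * cmath.exp(-1j * j * omega)
    return total / (2 * math.pi)
```

With the Bartlett kernel only lags with |j| < p receive positive weight, so `p = 1` reduces the estimate to the lag-0 term divided by 2π.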

For the choice of $p$, Hong (1999) suggested a data-driven method that delivers an asymptotically optimal bandwidth in terms of an integrated mean squared error criterion. This still involves a preliminary bandwidth $\bar p$, but the impact of choosing $\bar p$ is much smaller than the impact of choosing lag order $p$.

A convenient divergence measure for $\hat f^U(\omega, u, v)$ and $f^U_0(\omega, u, v)$ is the quadratic form

\[
\begin{aligned}
2\pi n\, Q(\hat f^U, f^U_0) &\equiv 2\pi n \int\!\!\int\!\!\int_{-\pi}^{\pi} |\hat f^U(\omega, u, v) - f^U_0(\omega, u, v)|^2 \, d\omega\, dW(u)\, dW(v) \\
&= n \int\!\!\int |\hat\varphi_0(u, v) - \varphi_U(u + v)|^2 \, dW(u)\, dW(v) \\
&\quad + 2 \sum_{j=1}^{n-1} k^2(j/p)(n - j) \int\!\!\int |\hat\sigma^U_j(u, v)|^2 \, dW(u)\, dW(v),
\end{aligned}
\]

where the weighting function $W(\cdot)$ is positive and nondecreasing, and the unspecified integrals are taken over the support of $W(\cdot)$. An example of this is

\[ W(u) = \Phi(\sqrt{12}\, u), \]

where $\Phi(\cdot)$ is the $N(0, 1)$ cdf, which is commonly used in the characteristic function literature. The scale $\sqrt{12}$ matches the standard deviation of $U[-\frac{1}{2}, \frac{1}{2}]$, but this is not necessary.

Our evaluation statistic for $H_0$ is a properly centered and scaled version of the quadratic form $Q(\hat f^U, f^U_0)$,

\[ M_1 \equiv \left[\sum_{j=1}^{n-1} k^2(j/p)\right]^{-1} \int\!\!\int \sum_{j=1}^{n-1} k^2(j/p)(n - j)\, |\hat\sigma^U_j(u, v)|^2 \, dW(u)\, dW(v) - \left[\int [1 - \varphi_U(u)^2] \, dW(u)\right]^2. \]
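The statistic can be approximated numerically by replacing the integrals with a quadrature rule. The sketch below is only one possible implementation under stated assumptions: a midpoint rule on the truncated support [-1, 1], the Bartlett kernel, and a user-supplied lag order `p` greater than 1. The function names and the grid size are hypothetical choices, not those of the original GAUSS code.

```python
import cmath
import math
from statistics import NormalDist


def bartlett(x):
    """Bartlett kernel."""
    return max(0.0, 1.0 - abs(x))


def phi_U(u):
    """Characteristic function of U[-1/2, 1/2]."""
    return 1.0 if u == 0 else math.sin(u / 2) / (u / 2)


def weight_grid(num=8):
    """Midpoint nodes on [-1, 1] with increments dW(u) for W(u) = Phi(sqrt(12) u)."""
    dist = NormalDist(0.0, 1.0 / math.sqrt(12.0))
    step = 2.0 / num
    nodes = [-1.0 + (a + 0.5) * step for a in range(num)]
    return [(u, dist.pdf(u) * step) for u in nodes]


def m1_statistic(z, p, num=8):
    """Approximate M1 for residuals z with Bartlett lag order p (requires p > 1)."""
    n = len(z)
    grid = weight_grid(num)
    # Keep only lags that receive positive kernel weight.
    lags = [(j, bartlett(j / p) ** 2) for j in range(1, n) if bartlett(j / p) > 0]
    k2_sum = sum(k2 for _, k2 in lags)
    total = 0.0
    for u, wu in grid:
        eu = [cmath.exp(1j * u * zt) for zt in z]
        for v, wv in grid:
            ev = [cmath.exp(1j * v * zt) for zt in z]
            for j, k2 in lags:
                phi_j = sum(eu[t] * ev[t - j] for t in range(j, n)) / (n - j)
                sigma_j = phi_j - phi_U(u) * phi_U(v)
                total += k2 * (n - j) * abs(sigma_j) ** 2 * wu * wv
    centering = sum((1 - phi_U(u) ** 2) * wu for u, wu in grid) ** 2
    return total / k2_sum - centering
```

A residual series that is far from iid uniform (for example, a constant series) produces a much larger value than a simulated iid U[-1/2, 1/2] sample of the same length, which is the sense in which smaller values indicate better density forecasts.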

The asymptotic distribution of $M_1$ was derived by Hong (2003). Under $H_0$ and a set of regularity conditions,

\[ M_1 \xrightarrow{d} \int |G(u, v)|^2 \, dW(u)\, dW(v) \]

as $n \to \infty$, $R \to \infty$, and $n^{1+\delta}/R \to 0$ for any arbitrarily small constant $\delta > 0$, where $G(u, v)$ is a complex-valued Gaussian process with mean 0 and a known covariance function,

\[
\begin{aligned}
\operatorname{cov}[G(u_1, v_1), G(u_2, v_2)] &= E[G(u_1, v_1) G(u_2, v_2)^*] \\
&= \sigma^U_0(u_1, -u_2)\varphi_U(v_1)\varphi_U(v_2) + \sigma^U_0(u_1, -v_2)\varphi_U(u_2)\varphi_U(v_1) \\
&\quad + \sigma^U_0(v_1, -u_2)\varphi_U(u_1)\varphi_U(v_2) + \sigma^U_0(v_1, -v_2)\varphi_U(u_1)\varphi_U(u_2).
\end{aligned}
\]

472 Journal of Business & Economic Statistics, October 2004

The limit distribution of $M_1$ is nonstandard. However, because the Gaussian process $G(u, v)$ has mean 0 and a known distribution-free covariance function, the limit distribution of $M_1$ can be easily tabulated. When $W(u) = \Phi(\sqrt{12}\, u)$ with a truncated support on $[-1, 1]$, the asymptotic critical values at the 10%, 5%, and 1% levels are 037, 051, and 087.

The most attractive feature of $M_1$ is its omnibus property; it has power against a wide range of suboptimal density forecasts, thanks to the use of the characteristic function in a spectral framework. The $M_1$ statistic can be viewed as an omnibus metric measuring the departure of the forecast model from the true data-generating process. A better density forecast model is expected to have a smaller value for $M_1$, because its generalized residual series $\{Z_t\}$ is closer to having the iid $U[0, 1]$ property. Thus $M_1$ can be used to rank competing density forecast models in terms of their deviations from optimality. Moreover, the kernel function provides flexible weighting for various lags. Nonuniform weighting kernels usually discount higher-order lags. For most, if not all, economic and financial time series, today's behaviors are more affected by recent events than by remote events. Thus downward weighting for higher-order lags is expected to yield good power in finite samples.

A.3 Separate Inferences

When a model is rejected using $M_1$, it is interesting to explore possible reasons for the rejection. Hong (2003) provided a class of rigorous separate inference statistics that exploit the information contained in the generalized residuals $\{Z_t\}$ to understand possible reasons for model rejection. The $M(m, l)$ statistic is based on a kernel estimator of the generalized spectral density derivative $f^{(0,m,l)}(\omega, 0, 0)$, where $m$ and $l$ are the orders of the derivatives with respect to $u$ and $v$. The statistic $M(m, l)$ is defined as

\[ M(m, l) = \left[\sum_{j=1}^{n-1} (n - j)\, k^2(j/p)\, \hat\rho_j^2(m, l) - \sum_{j=1}^{n-1} k^2(j/p)\right] \Bigg/ \left[2 \sum_{j=1}^{n-2} k^4(j/p)\right]^{1/2}, \]

where $\hat\rho_j(m, l)$ is the sample cross-correlation between $\hat Z_t^m$ and $\hat Z_{t-|j|}^l$, $t = R + 1, \ldots, N$. Under $H_0$ and a set of regularity conditions,

\[ M(m, l) \xrightarrow{d} N(0, 1) \]

for any given $(m, l)$ as $n, R \to \infty$. The upper-tailed $N(0, 1)$ critical values at the 10%, 5%, and 1% levels are 1.28, 1.65, and 2.33. As in the $M_1$ test, Hong (2003) considered a data-driven method for $p$ that also involves the choice of a preliminary lag order $\bar p$.

Although the moments of the generalized residuals $\{Z_t\}$ are not exactly the same as those of the original data $\{r_t\}$, they are highly correlated. In particular, the choice of $(m, l) = (1, 1)$, $(2, 2)$, $(3, 3)$, and $(4, 4)$ is very sensitive to autocorrelations in level, volatility, skewness, and kurtosis of $\{r_t\}$. Furthermore, the choices of $(m, l) = (1, 2)$ and $(2, 1)$ are sensitive to the "ARCH-in-mean" effect and the "leverage" effect. Different choices of order $(m, l)$ can thus allow examination of various dynamic aspects of the underlying process.
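The separate inference statistic can be sketched in a few lines. The version below assumes the Bartlett kernel and a fixed lag order `p`, and it computes the cross-correlation with full-sample means and variances; these are illustrative simplifications, and the function names are hypothetical.

```python
import math


def bartlett(x):
    """Bartlett kernel."""
    return max(0.0, 1.0 - abs(x))


def cross_corr(z, j, m, l):
    """Sample cross-correlation between Z_t^m and Z_{t-j}^l for lag j >= 1."""
    n = len(z)
    a = [zt ** m for zt in z]
    b = [zt ** l for zt in z]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((a[t] - mean_a) * (b[t - j] - mean_b) for t in range(j, n)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((x - mean_b) ** 2 for x in b) / n
    return cov / math.sqrt(var_a * var_b)


def m_statistic(z, m, l, p):
    """M(m, l): centered and scaled sum of squared cross-correlations."""
    n = len(z)
    weighted = 0.0
    centering = 0.0
    for j in range(1, n):
        k2 = bartlett(j / p) ** 2
        if k2 == 0.0:
            continue  # no weight on lags beyond the kernel's support
        weighted += (n - j) * k2 * cross_corr(z, j, m, l) ** 2
        centering += k2
    scale = 2.0 * sum(bartlett(j / p) ** 4 for j in range(1, n - 1))
    return (weighted - centering) / math.sqrt(scale)
```

For example, `m_statistic(z, 1, 1, p)` targets autocorrelation in the level of the residuals and `m_statistic(z, 2, 2, p)` targets volatility clustering; values above the upper-tailed normal critical values point to the corresponding form of misspecification.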

[Received June 2002. Revised November 2003.]

REFERENCES

Ait-Sahalia, Y. (1996), "Testing Continuous-Time Models of the Spot Interest Rate," Review of Financial Studies, 9, 385–426.

Ait-Sahalia, Y. (1999), "Transition Densities for Interest Rate and Other Nonlinear Diffusions," Journal of Finance, 54, 1361–1395.

Ait-Sahalia, Y. (2002), "Maximum-Likelihood Estimation of Discretely Sampled Diffusions: A Closed-Form Approach," Econometrica, 70, 223–262.

Ait-Sahalia, Y., and Lo, A. (1998), "Nonparametric Estimation of State-Price Densities Implicit in Financial Asset Prices," Journal of Finance, 53, 499–547.

Andersen, T. G., and Lund, J. (1997), "Estimating Continuous Time Stochastic Volatility Models of the Short-Term Interest Rate," Journal of Econometrics, 77, 343–378.

Ang, A., and Bekaert, G. (2002), "Regime Switches in Interest Rates," Journal of Business & Economic Statistics, 20, 163–182.

Bali, T. (2000), "Modeling the Conditional Mean and Variance of the Short Rate Using Diffusion, GARCH, and Moving Average Models," Journal of Futures Markets, 20, 717–751.

Bali, T. (2001), "Nonlinear Parametric Models of the Short-Term Interest Rate," unpublished manuscript, Baruch College.

Ball, C., and Torous, W. (1998), "Regime Shifts in Short-Term Riskless Interest Rates," unpublished manuscript, University of California, Los Angeles.

Bera, A. K., and Ghosh, A. (2002), "Density Forecast Evaluation With Applications in Finance," unpublished manuscript, University of Illinois at Champaign, Dept. of Economics.

Berkowitz, J. (2001), "Testing the Accuracy of Density Forecasts With Applications to Risk Management," Journal of Business & Economic Statistics, 19, 465–474.

Bliss, R., and Smith, D. (1998), "The Elasticity of Interest Rate Volatility: Chan, Karolyi, Longstaff, and Sanders Revisited," unpublished manuscript, Federal Reserve Bank of Atlanta.

Brenner, R., Harjes, R., and Kroner, K. (1996), "Another Look at Alternative Models of Short-Term Interest Rate," Journal of Financial and Quantitative Analysis, 31, 85–107.

Chacko, G., and Viceira, L. M. (2003), "Spectral GMM Estimation of Continuous-Time Processes," Journal of Econometrics, 116, 259–292.

Chan, K. C., Karolyi, G. A., Longstaff, F. A., and Sanders, A. B. (1992), "An Empirical Comparison of Alternative Models of the Short-Term Interest Rate," Journal of Finance, 47, 1209–1228.

Chapman, D., and Pearson, N. (2001), "Recent Advances in Estimating Models of the Term-Structure," Financial Analysts Journal, 57, 77–95.

Christoffersen, P., and Diebold, F. X. (1996), "Further Results on Forecasting and Model Selection Under Asymmetric Loss," Journal of Applied Econometrics, 11, 561–572.

Christoffersen, P., and Diebold, F. X. (1997), "Optimal Prediction Under Asymmetric Loss," Econometric Theory, 13, 808–817.

Clements, M. P., and Smith, J. (2000), "Evaluating the Forecast Densities of Linear and Non-Linear Models: Applications to Output Growth and Unemployment," Journal of Forecasting, 19, 255–276.

Conley, T. G., Hansen, L. P., Luttmer, E. G. J., and Scheinkman, J. A. (1997), "Short-Term Interest Rates as Subordinated Diffusions," Review of Financial Studies, 10, 525–578.

Corradi, V. (2000), "Reconsidering the Continuous-Time Limit of the GARCH(1,1) Process," Journal of Econometrics, 96, 145–153.

Cox, J. C., Ingersoll, J. E., and Ross, S. A. (1985), "A Theory of the Term Structure of Interest Rates," Econometrica, 53, 385–407.

Dai, Q., and Singleton, K. (2000), "Specification Analysis of Affine Term Structure Models," Journal of Finance, 55, 1943–1978.

Das, S. R. (2002), "The Surprise Element: Jumps in Interest Rates," Journal of Econometrics, 106, 27–65.

Diebold, F. X., Gunther, T. A., and Tay, A. S. (1998), "Evaluating Density Forecasts With Applications to Financial Risk Management," International Economic Review, 39, 863–883.

Diebold, F. X., Hahn, J., and Tay, A. S. (1999), "Multivariate Density Forecast Evaluation and Calibration in Financial Risk Management: High-Frequency Returns of Foreign Exchange," Review of Economics and Statistics, 81, 661–673.

Diebold, F. X., Tay, A. S., and Wallis, K. F. (1999), "Evaluating Density Forecasts of Inflation: The Survey of Professional Forecasters," in Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W. J. Granger, eds. R. F. Engle and H. White, Oxford, U.K.: Oxford University Press, pp. 76–90.

Hong, Li, and Zhao: Out-of-Sample Analysis of Spot Rate Models 473

Dothan, L. U. (1978), "On the Term Structure of Interest Rates," Journal of Financial Economics, 6, 59–69.

Duffee, G. (2002), "Term Premia and Interest Rate Forecasts in Affine Models," Journal of Finance, 57, 405–443.

Duffie, D., and Pan, J. (1997), "An Overview of Value at Risk," Journal of Derivatives, 4, 13–32.

Durham, G. (2003), "Likelihood-Based Specification Analysis of Continuous-Time Models of the Short-Term Interest Rate," Journal of Financial Economics, 70, 463–487.

Durlauf, S. (1991), "Spectral Based Testing of the Martingale Hypothesis," Journal of Econometrics, 50, 355–376.

Fackler, P. L., and King, R. P. (1990), "Calibration of Option-Based Probability Assessments in Agricultural Commodity Markets," American Journal of Agricultural Economics, 72, 73–83.

Gallant, A. R., and Tauchen, G. (1998), "Reprojection Partially Observed Systems With Applications to Interest Rate Diffusions," Journal of the American Statistical Association, 93, 10–24.

Ghysels, E., Carrasco, M., Chernov, M., and Florens, J. (2001), "Estimating Diffusions With a Continuum of Moment Conditions," unpublished manuscript, University of North Carolina.

Granger, C. (1969), "Testing for Causality and Feedback," Econometrica, 37, 424–438.

Granger, C. (1999), "The Evaluation of Econometric Models and of Forecasts," invited lecture at the Far Eastern Meeting of the Econometric Society, Singapore.

Granger, C., and Pesaran, M. H. (2000), "A Decision Theoretic Approach to Forecasting Evaluation," in Statistics and Finance: An Interface, eds. W. S. Chan, W. K. Li, and H. Tong, London: Imperial College, pp. 261–278.

Gray, S. (1996), "Modeling the Conditional Distribution of Interest Rates as a Regime-Switching Process," Journal of Financial Economics, 42, 27–62.

Hamilton, J. D. (1989), "A New Approach to the Econometric Analysis of Nonstationary Time Series and the Business Cycle," Econometrica, 57, 357–384.

Hong, Y. (1999), "Hypothesis Testing in Time Series via the Empirical Characteristic Function: A Generalized Spectral Density Approach," Journal of the American Statistical Association, 94, 1201–1220.

Hong, Y. (2003), "Evaluation of Out-of-Sample Density Forecasts With Applications to Stock Prices," unpublished manuscript, Cornell University, Dept. of Economics and Dept. of Statistical Science.

Hong, Y., and Lee, T. H. (2003a), "Inference on the Predictability of Foreign Exchange Rate Changes via Generalized Spectrum and Nonlinear Models," Review of Economics and Statistics, 84, 1048–1062.

Hong, Y., and Lee, T. H. (2003b), "Diagnostic Checking for Nonlinear Time Series Models," Econometric Theory, 19, 1065–1121.

Hong, Y., and Li, H. (2004), "Nonparametric Specification Testing for Continuous-Time Models With Applications to Interest Rate Term Structures," Review of Financial Studies, forthcoming.

Jackwerth, J. C., and Rubinstein, M. (1996), "Recovering Probability Distributions From Option Prices," Journal of Finance, 51, 1611–1631.

Jiang, G., and Knight, J. (1997), "A Nonparametric Approach to the Estimation of Diffusion Processes With an Application to a Short-Term Interest Rate Model," Econometric Theory, 13, 615–645.

Johannes, M. (2003), "The Statistical and Economic Role of Jumps in Interest Rates," Journal of Finance, 59, 227–260.

Jorion, P. (2000), Value at Risk: The New Benchmark for Managing Financial Risk, New York: McGraw-Hill.

Morgan, J. P. (1996), RiskMetrics—Technical Document (4th ed.), New York: Morgan Guaranty Trust Company.

Koedijk, K. G., Nissen, F. G., Schotman, P. C., and Wolff, C. C. P. (1997), "The Dynamics of Short-Term Interest Rate Volatility Reconsidered," European Finance Review, 1, 105–130.

Li, H., and Xu, Y. (2000), "Regime Shifts and Short Rate Dynamics," unpublished manuscript, Cornell University.

Lo, A., and MacKinlay, A. C. (1989), "Data-Snooping Biases in Tests of Financial Asset Pricing Models," Review of Financial Studies, 3, 175–208.

Lukacs, E. (1970), Characteristic Functions (2nd ed.), London: Charles Griffin.

Meese, R., and Rogoff, K. (1983), "Empirical Exchange Rate Models of the Seventies: Do They Fit Out-of-Sample?" Journal of International Economics, 14, 3–24.

Nelson, D. (1990), "ARCH Models as Diffusion Approximation," Journal of Econometrics, 45, 7–39.

Nelson, D. (1991), "Conditional Heteroskedasticity in Asset Returns: A New Approach," Econometrica, 59, 347–370.

Rosenblatt, M. (1952), "Remarks on a Multivariate Transformation," The Annals of Mathematical Statistics, 23, 470–472.

Rothschild, M., and Stiglitz, J. (1970), "Increasing Risk: I. A Definition," Journal of Economic Theory, 2, 225–243.

Singleton, K. (2001), "Estimation of Affine Asset Pricing Models Using the Empirical Characteristic Function," Journal of Econometrics, 102, 111–141.

Soderlind, P., and Svensson, L. (1997), "New Techniques to Extract Market Expectations From Financial Instruments," Journal of Monetary Economics, 40, 383–429.

Stanton, R. (1997), "A Nonparametric Model of Term Structure Dynamics and the Market Price of Interest Rate Risk," Journal of Finance, 52, 1973–2002.

Vasicek, O. (1977), "An Equilibrium Characterization of the Term Structure," Journal of Financial Economics, 5, 177–188.

Watson, M. (1993), "Measures of Fit for Calibrated Models," Journal of Political Economy, 101, 1011–1041.

White, H. (2000), "A Reality Check for Data Snooping," Econometrica, 69, 1097–1127.


estimate it from the data. For identification, we set the diffusion constant $\sigma(s_t) = 1$ for $s_t = 1$ in the regime-switching models with GARCH effect.

The conditional density of the interest rate $r_t$ in a regime-switching model is

\[ p(r_t \mid I_{t-1}) = \sum_{l=1}^{2} p(r_t \mid s_t = l, I_{t-1}) \Pr(s_t = l \mid I_{t-1}), \]

where the ex ante probability that the data are generated from regime $l$ at $t$, $\Pr(s_t = l \mid I_{t-1})$, can be obtained using Bayes's rule via a recursive procedure described by Hamilton (1989). The conditional density of regime-switching models is a mixture of two normal distributions, which can generate unimodal or bimodal distributions and allows for great flexibility in modeling skewness, kurtosis, and heavy tails.
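The mixture density and one step of the Bayes recursion can be sketched as follows. For simplicity the sketch assumes Gaussian regime densities with given means and standard deviations and a constant transition matrix, whereas the models in this article let the transition probabilities depend on the lagged interest rate; all function and argument names are hypothetical.

```python
from statistics import NormalDist


def mixture_density(r, means, stds, ex_ante_probs):
    """p(r_t | I_{t-1}) as a probability-weighted mixture of regime densities."""
    return sum(
        prob * NormalDist(mean, std).pdf(r)
        for mean, std, prob in zip(means, stds, ex_ante_probs)
    )


def hamilton_step(r, means, stds, ex_ante_probs, transition):
    """One recursion: update Pr(s_t | I_t) by Bayes's rule, then predict Pr(s_{t+1} | I_t).

    transition[i][l] is Pr(s_{t+1} = l | s_t = i), assumed constant here.
    """
    joint = [
        prob * NormalDist(mean, std).pdf(r)
        for mean, std, prob in zip(means, stds, ex_ante_probs)
    ]
    likelihood = sum(joint)
    filtered = [x / likelihood for x in joint]
    regimes = range(len(filtered))
    predicted = [sum(filtered[i] * transition[i][l] for i in regimes) for l in regimes]
    return filtered, predicted
```

Iterating `hamilton_step` over the sample and feeding each `predicted` vector into the next call yields the sequence of ex ante regime probabilities used in the mixture density.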

3.4 Jump-Diffusion Models

There are compelling economic and statistical reasons to account for discontinuity in interest rate dynamics. Various economic shocks, news announcements, and government interventions in bond markets have pronounced effects on the behavior of interest rates and tend to generate large jumps in interest rate data. Statistically, Das (2002) and Johannes (2003), among others, have shown that diffusion models (even with stochastic volatility) cannot generate the excessive leptokurtosis exhibited by the changes of the spot rates, and that jump-diffusion models are a convenient way to generate excessive kurtosis or, more generally, heavy tails.

We consider a class of discretized jump-diffusion models listed in Table 1. As in the aforementioned models, we consider zero, linear, and nonlinear drift specifications and CEV, GARCH, and combined CEV–GARCH specifications for volatility. The nine different jump-diffusion models are nested by the following specification:

\[
\begin{aligned}
\Delta r_t &= \alpha_{-1}/r_{t-1} + \alpha_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + \sigma r_{t-1}^{\rho} \sqrt{h_t}\, z_t + J(\mu, \gamma^2)\, \pi_t(q_t), \\
h_t &= \beta_0 + \beta_1 [r_{t-1} - E(r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}, \\
\{z_t\} &\sim \text{iid } N(0, 1), \\
\{\pi_t(q_t)\} &\sim \text{independent Bernoulli}(q_t), \\
J &\sim N(\mu, \gamma^2),
\end{aligned}
\tag{4}
\]

where $J$ is the jump size and $q_t$ is the jump probability, with $q_t = 1/[1 + \exp(-c - d\, r_{t-1})]$.

The conditional density of the foregoing jump-diffusion models can be written as

\[
\begin{aligned}
f(r_t \mid r_{t-1}) &= (1 - q_t) \frac{1}{\sqrt{2\pi\sigma^2(r_{t-1})}} \exp\left\{-\frac{[r_t - \mu(r_{t-1})]^2}{2\sigma^2(r_{t-1})}\right\} \\
&\quad + q_t \frac{1}{\sqrt{2\pi[\sigma^2(r_{t-1}) + \gamma^2]}} \exp\left\{-\frac{[r_t - \mu(r_{t-1}) - \mu]^2}{2[\sigma^2(r_{t-1}) + \gamma^2]}\right\},
\end{aligned}
\]

where $\mu(r_{t-1})$ and $\sigma^2(r_{t-1})$ are the conditional mean and variance of the diffusion part in (4), and $\mu$ and $\gamma^2$ are the mean and variance of the jump size $J$. Similar to the regime-switching models, the conditional density of the jump-diffusion models is also a mixture of two normal distributions. Thus these two classes of models represent two alternative approaches to generating excess kurtosis and heavy tails, although the regime-switching models have more sophisticated specifications. For example, in (3) all drift parameters are regime-dependent, whereas in (4) only the intercept term is different in the conditional mean and variance. For the regime-switching models in (3), the state probabilities evolve according to a transition matrix with updated priors at every time point. Thus it can capture both very persistent and transient regime shifts. For the jump-diffusion models in (4), the state probabilities are assumed to depend on the past interest rate level.
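The two-component mixture above is straightforward to evaluate. In the sketch below the conditional mean and variance of the diffusion part are passed in as plain numbers, which is an illustrative simplification; in the models of this section they would come from the drift, CEV, and GARCH specifications. The function and argument names are hypothetical.

```python
import math
from statistics import NormalDist


def jump_probability(r_prev, c, d):
    """Logistic jump probability q_t = 1 / (1 + exp(-c - d * r_{t-1}))."""
    return 1.0 / (1.0 + math.exp(-c - d * r_prev))


def jump_diffusion_density(r, diff_mean, diff_var, q, jump_mean, jump_var):
    """Mixture of a no-jump normal and a normal shifted and widened by the jump."""
    no_jump = NormalDist(diff_mean, math.sqrt(diff_var)).pdf(r)
    with_jump = NormalDist(
        diff_mean + jump_mean, math.sqrt(diff_var + jump_var)
    ).pdf(r)
    return (1.0 - q) * no_jump + q * with_jump


# Hypothetical values: rate level of 5, jump probability tied to the lagged level.
q_t = jump_probability(5.0, -3.0, 0.1)
density_at_5 = jump_diffusion_density(5.0, 5.0, 0.04, q_t, 0.1, 0.25)
```

Setting `q` to 0 recovers the plain Gaussian density of the diffusion part, while a small positive `q` with a large `jump_var` fattens the tails without changing the center of the distribution much.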

The in-sample performances of the four classes of models that we study have been individually studied in the literature. However, no one has yet attempted a systematic evaluation of all of these models in a unified setup, particularly in the out-of-sample context, because of the difficulty in density forecast evaluation and because of different nonnested specifications between existing classes of models. In the next section we compare the relative performance of these four classes of models in out-of-sample density forecast, using the evaluation method described in Section 2. This comparison can provide valuable information about the relative strengths and weaknesses of each model and should be important for many applications.

4. MODEL ESTIMATION AND IN-SAMPLE PERFORMANCE

4.1 Data and Estimation Method

In modeling spot interest rate dynamics, yields on short-term debts are often used as proxies for the unobservable instantaneous risk-free rate. These include the 1-month T-bill rates used by Gray (1996) and Chan, Karolyi, Longstaff, and Sanders (CKLS) (1992); the 3-month T-bill rates used by Stanton (1997) and Andersen and Lund (1997); the 7-day Eurodollar rates used by Ait-Sahalia (1996) and Hong and Li (2004); and the Fed funds rates used by Conley, Hansen, Luttmer, and Scheinkman (1997) and Das (2002). In our study we follow CKLS (1992) and use the daily 1-month T-bill rates from June 14, 1961 to December 29, 2000, with a total of 9,868 observations. The data are extracted from the CRSP bond file using the midpoint of quoted bid and ask prices of the T-bills with a remaining maturity that is closest to 1 month (30 calendar days) from the current date. Most of these T-bills have maturities of 6 months or 1 year when first issued, and all T-bills used in our study have a remaining maturity between 27 and 33 days. We then compute the annualized continuously compounded yield based on the average of the ask and bid prices.

Figure 1 plots the level and change series of the daily 1-month T-bill rates, as well as their histograms. There is obvious persistent volatility clustering, and in general the volatility is higher at a higher level of the interest rate (e.g., the 1979–1982 period). The marginal distribution of the interest rate level is skewed to the right, with a long right tail. Most daily changes in the interest rate level are very small, giving a sharp probability peak around 0. Daily changes of other spot interest rates, such as 3-month T-bill and 7-day Eurodollar rates, also exhibit a high peak around 0. But the peak is much less pronounced for weekly changes in spot rates. This, together with the long (right) tail, implies excess kurtosis, a stylized fact that has motivated the use of jump models in the literature (e.g., Das 2002; Johannes 2003).

Figure 1. Daily 1-Month T-Bill Rates Between June 14, 1961 and December 29, 2000. This figure plots the level and change series of the daily 1-month T-bill rates, as well as their histograms. The data are extracted from the CRSP bond file using the midpoint of quoted bid and ask prices of the T-bills with a remaining maturity that is closest to 1 month (30 calendar days) from the current date.

We divide our data into two subsamples. The first, from June 14, 1961 to February 22, 1991 (with a total of 7,400 observations), is a sample used to estimate model parameters; the second, from February 23, 1991 to December 29, 2000 (with a total of 2,467 observations), is a prediction sample used to evaluate out-of-sample density forecasts. We first consider the in-sample performance of the various models listed in Table 1, all of which are estimated using the MLE method. The optimization algorithm is the well-known BHHH with STEPBT for step length calculation and is implemented via the constrained optimization code in GAUSS Windows Version 5.0. The optimization tolerance level is set such that the gradients of the parameters are less than or equal to $10^{-6}$.

Tables 2–5 report parameter estimates of four classes of spot rate models using daily 1-month T-bill rates from June 14, 1961 to February 22, 1991, with a total of 7,400 observations. To account for the special period between October 1979 and September 1982, we introduce dummy variables to the drift, volatility, and elasticity parameters; that is, $\alpha_D$, $\sigma_D$, and $\rho_D$ equal 0 outside of the 1979–1982 period. We also introduce a dummy variable, $D_{87}$, in the interest rate level to account for the effect of the 1987 stock market crash; $D_{87}$ equals 0 except for the week right after the stock market crash. Estimated robust standard errors are reported in parentheses.

4.2 In-Sample Evidence

We first discuss the in-sample performance of models within each class and then compare their performance across different classes. Many previous studies (e.g., Bliss and Smith 1998) have shown that the period between October 1979 and September 1982, which coincides with the Federal Reserve experiment, has a big impact on model parameter estimates. To account for the special feature of this period, we introduce dummy variables in the drift, volatility, and elasticity parameters for this period. We also introduce a dummy variable in the interest rate level for the week right after the 1987 stock market crash. Brenner et al. (1996) showed that the decline of the interest rates during this week is too dramatic to be captured by most standard models.

Table 2 reports parameter estimates with estimated robust standard errors and log-likelihood values for discretized single-factor diffusion models. The estimates of the drift parameters of the Vasicek, CIR, and CKLS models all suggest mean reversion in the conditional mean, with an estimated 6% long-run mean. For other models, such as the random walk, log-normal, and nonlinear drift models, most drift parameters are not significant. A comparison of the pure CEV, CKLS, and Ait-Sahalia (1996) nonlinear drift models indicates that the incremental contribution of nonlinear drift is very marginal, which is not surprising given that the drift parameter estimates in the nonlinear drift model are mostly insignificant. On the other hand, there is also clear evidence of a level effect; all estimates of the elasticity parameter are significant. Unlike previous studies (e.g., CKLS 1992), which found an estimated elasticity parameter close to 1.5, our estimate is about .25 (.70) outside (inside) the 1979–1982 period. Our results confirm previous findings of Brenner et al. (1996), Andersen and Lund (1997), Bliss and Smith (1998), and Koedijk, Nissen, Schotman, and Wolff (1997) that the estimate of the elasticity parameter is very sensitive to the choice of interest rate data, data frequency, sample periods, and specifications of the volatility function. The 1979–1982 period dummies suggest that the drift does not behave very differently (i.e., $\alpha_D$ is not significant), whereas volatility is significantly higher and depends more heavily on


Table 2. Parameter Estimates for the Single-Factor Diffusion Models

Columns: Parameters; RW; Log-normal; Dothan; Pure CEV; Vasicek; CIR; CKLS; Nonlinear drift

αminus1 0289(0751)

α0 0012 0196 0190 0199 minus0098(0019) (0056) (0049) (0053) (0485)

α1 0006 minus0033 minus0032 minus0034 0045(0004) (0010) (0009) (0009) (0095)

α2 minus0006(0006)

σ 1523 0304 0304 1011 1522 0658 1007 1007(0013) (0003) (0003) (0041) (0013) (0006) (0041) (0041)

ρ 2437 2460 2460(0238) (0238) (0238)

αD minus0047 0196 0154 0233 0266 0402(0157) (0186) (0168) (0176) (0168) (0202)

σD 2768 0175 0175 2764 0712(0112) (0013) (0013) (0112) (0036)

ρD 3692 3684 3680(0132) (0132) (0132)

D87 minus3988 minus2107 minus2079 minus3549 minus4003 minus3101 minus3574 minus3582(0681) (0660) (0660) (0671) (0681) (0655) (0670) (0671)

Log-likelihood 26496 21870 21849 26545 26555 26720 26617 26625

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (\sigma + \sigma_D)\, r_{t-1}^{(\rho + \rho_D)} z_t + D_{87}$, where $\{z_t\} \sim$ iid $N(0, 1)$.

the interest rate level in this period (i.e., $\sigma_D$ and $\rho_D$ are significantly positive) than in other periods. For models with level effect, we introduce the dummy only for elasticity parameter $\rho$, to avoid the offsetting effect of dummies in $\rho$ and $\sigma$, which would tend to generate unstable estimates. The stock market crash dummy is significantly negative, which is consistent with the findings of Brenner et al. (1996).

Estimation results of GARCH models in Table 3 show that GARCH significantly improves the in-sample fit of diffusion models (i.e., the log-likelihood value increases from about 2,500 to about 5,700). Previous studies, such as those by Bali (2001) and Durham (2003), have also shown that for single-factor diffusion, GARCH, and continuous-time stochastic volatility models, it is more important to correctly model the diffusion function than the drift function in fitting interest rate data. This suggests that volatility clustering is an important feature of interest rates. All estimates of GARCH parameters are overwhelmingly significant. The sum of GARCH parameter estimates, $\beta_1 + \beta_2$, is slightly larger (smaller) than 1 without (with) level effect. Although the sum of GARCH parameters is

Table 3. Parameter Estimates for the GARCH Models

Columns: Parameters; No drift GARCH; Linear drift GARCH; Nonlinear drift GARCH; No drift CEV–GARCH; Linear drift CEV–GARCH; Nonlinear drift CEV–GARCH

αminus1 1556 1716(0470) (0457)

α0 0053 minus1117 0063 minus1233(0025) (0323) (0025) (0311)

α1 minus0003 0263 minus0006 0290(0006) (0069) (0006) (0066)

α2 minus0018 minus0020(0004) (0004)

ρ 1709 1797 1926(0298) (0301) (0300)

β0 97Eminus05 86Eminus05 86Eminus05 89Eminus05 82Eminus05 82Eminus05(12Eminus05) (12Eminus05) (16Eminus05) (10Eminus05) (9Eminus05) (9Eminus05)

β1 1552 1544 1563 0896 0875 0851(0097) (0096) (0099) (0104) (0101) (0098)

β2 8707 8723 8712 8583 8582 8553(0065) (0064) (0065) (0074) (0076) (0078)

αD 0166 1119 0550 1317(0318) (0229) (0282) (0201)

σD 1368 1302 1017(0327) (0331) (0334)

ρD 0151 0064 minus0053(0134) (0146) (0142)

D87 minus5397 minus5426 minus5460 minus5288 minus5318 minus5352(0774) (0777) (0771) (0753) (0751) (0741)

Log-likelihood 56868 56995 57063 57011 57152 57255

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (1 + \sigma_D)\, r_{t-1}^{(\rho + \rho_D)} \sqrt{h_t}\, z_t + D_{87}$, where $h_t = \beta_0 + h_{t-1}(\beta_2 + \beta_1 r_{t-1}^{2\rho} z_{t-1}^2)$ and $\{z_t\} \sim$ iid $N(0, 1)$.


Table 4. Parameter Estimates for the Markov Regime-Switching Models

Columns: Parameters; No drift CEV; Linear drift CEV; Nonlinear drift CEV; No drift GARCH; Linear drift GARCH; Nonlinear drift GARCH; No drift CEV–GARCH; Linear drift CEV–GARCH; Nonlinear drift CEV–GARCH

αminus1(1) 0607 3321 2646(3569) (1701) (1605)

α0(1) 0973 1517 0457 minus1462 0450 minus1086(0259) (2027) (0107) (1009) (0105) (0989)

α1(1) minus0102 minus0288 minus0043 0274 minus0042 0209(0031) (0321) (0019) (0181) (0019) (0185)

α2(1) 0011 minus0015 minus0011(0014) (0010) (0011)

σ (1) 3361 3171 3106 1 1 1 1 1 1(0302) (0287) (0281)

ρ(1) 1021 1277 138 0 0 0 1193 1589 1670(044) (0446) (0448) (0452) (0460) (0487)

αminus1(2) minus0352 minus0267 minus0286(0329) (0242) (0183)

α0(2) minus0002 0141 minus0013 0062 minus0020 0062(0020) (0229) (0017) (0153) (0018) (0104)

α1(2) minus0006 minus0012 minus0008 minus0001 minus0007 0005(0004) (0048) (0004) (0031) (0004) (0020)

α2(2) minus0001 minus0002 minus0002(0003) (0002) (0001)

σ (2) 0143 0141 0138 2566 2567 2574 2769 3034 3051(0010) (0010) (0010) (006) (006) (006) (0275) (0303) (0311)

ρ(2) 8122 8179 8257 0 0 0 0844 0727 0800(0403) (0422) (0421) (0602) (0602) (0611)

β0 383Eminus4 318Eminus4 296Eminus4 321Eminus4 243Eminus4 225Eminus4(711Eminus5) (614Eminus5) (579Eminus5) (67Eminus4) (50Eminus4) (47Eminus4)

β1 0357 0336 0335 0265 0259 0252(004) (0039) (0038) (0057) (0055) (0053)

β2 9027 9066 9077 8972 8993 9001(009) (0089) (0086) (0099) (0098) (0094)

c1 3132 3001 3171 minus12515 minus12007 minus11293 minus11954 minus11429 minus10012(2734) (273) (2683) (3211) (3065) (3136) (3819) (3705) (3674)

d1 1081 107 1029 1303 1295 1192 1190 1193 0959(0391) (039) (0381) (044) (0426) (0447) (0594) (0582) (0593)

c2 40963 40342 40089 18864 17868 17633 18227 17299 16947(2386) (2393) (2313) (2048) (1947) (1909) (2092) (1983) (2009)

d2 minus2302 minus225 minus2225 minus0938 minus0806 minus0776 minus0906 minus0810 minus0757(0354) (0354) (0339) (0276) (0263) (0257) (0285) (0276) (0283)

Log-likelihood 64931 65079 65128 70790 71328 71382 70828 71386 71441

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}(s_t)/r_{t-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2 + \sigma(s_t)\, r_{t-1}^{\rho(s_t)} \sqrt{h_t}\, z_t$, where $h_t = \beta_0 + \beta_1 E\{e_t \mid r_{t-2}, s_{t-2}\}^2 + \beta_2 h_{t-1}$, $e_t = [r_{t-1} - E(r_{t-1} \mid r_{t-2}, s_{t-1})]/\sigma(s_{t-1})$, $\{z_t\} \sim$ iid $N(0, 1)$, and $s_t$ follows a two-state Markov chain with transition probability $\Pr(s_t = l \mid s_{t-1} = l) = (1 + \exp(-c_l - d_l r_{t-1}))^{-1}$ for $l = 1, 2$.

slightly larger than 1, it is still possible that the spot rate model is strictly stationary (see Nelson 1991 for more discussion). We also find a significant level effect even in the presence of GARCH, but the estimated elasticity parameter is close to .20, which is much smaller than that of single-factor diffusion models. The specification of conditional variance also affects the estimation of drift parameters. Unlike those in diffusion models, most drift parameter estimates in linear drift GARCH models are insignificant. Among all GARCH models considered, the one with nonlinear drift and level effect has the best in-sample performance. In the 1979–1982 period, the interest rate volatility is significantly higher, but its dependence on the interest rate level is not significantly different from other periods. The 1987 stock market crash dummy is again significantly negative.

Parameter estimates of regime-switching models, given in Table 4, show that the spot rate behaves quite differently between two regimes. For models with a linear drift, in the first regime the spot rate has a high long-run mean (about 10%) and exhibits strong mean reversion (i.e., estimates of the mean-reversion speed parameter are significantly negative). The spot rate in the second regime behaves almost like a random walk, because most drift parameter estimates are close to 0 and insignificant. For models with a nonlinear drift, all drift parameters are insignificant. Volatility in the first regime is much higher, about three times that in the second regime. Our estimates show that level effect, although significant in both regimes, is much stronger in the second regime in the absence of GARCH. After including GARCH, the elasticity parameter estimate becomes insignificant in the second regime but remains the same in the first regime. Apparently, regime switching helps capture volatility clustering; the sum of GARCH parameters, $\beta_1 + \beta_2$, is smaller in regime-switching models than in pure GARCH models. The estimated transition probabilities of the Markov state variable $s_t$ show that the low-volatility regime is much more persistent than the high-volatility regime. Our results suggest that the spot rate evolves like a random walk with low volatility most of the time, occasionally increasing and evolving with strong mean reversion and high volatility. Overall, the regime-switching model with a linear drift in each regime, CEV, and GARCH has the best in-sample performance.

466 Journal of Business & Economic Statistics, October 2004

Table 5 Parameter Estimates for the Jump-Diffusion Models

Linear drift Nonlinear drift No drift Linear drift Nonlinear drift No drift Linear drift Nonlinear driftParameters No drift CEV CEV CEV GARCH GARCH GARCH CEVndashGARCH CEVndashGARCH CEVndashGARCH

αminus1 0264 0702 0661(0293) (0384) (0380)

α0 minus0011 minus0340 0028 minus0506 0026 minus0486(0169) (0203) (0019) (0256) (0020) (0253)

α1 minus0007 0101 minus0013 0110 minus0012 0107(0004) (0043) (0004) (0053) (0004) (0052)

α2 minus0009 minus0008 minus0008(0003) (0003) (0003)

σ 0127 0126 0124(0008) (0008) (0008)

ρ 8936 8975 9045 0929 0852 0837(0390) (0397) (0400) (0518) (0510) (0512)

β0 87Eminus05 88Eminus05 87Eminus05 76Eminus5 77Eminus5 77Eminus5(12Eminus05) (12Eminus05) (12Eminus05) (12Eminus5) (12Eminus5) (12Eminus5)

β1 0631 0640 0635 0450 0472 0472(0076) (0073) (0073) (0101) (0102) (0102)

β2 8514 8484 8499 8504 8471 8483(0139) (0134) (0134) (0140) (0135) (0134)

c minus27404 minus27057 minus26972 minus37908 minus37470 minus37481 minus36908 minus36498 minus36518(1608) (1605) (1614) (1982) (1970) (1976) (2097) (2084) (2093)

d 1451 1423 1397 2554 2516 2509 2293 2274 2272(0275) (0275) (0275) (0307) (0305) (0307) (0343) (0340) (0341)

micro 0233 0385 0321 0700 0887 0895 0679 0892 0897(0133) (0142) (0128) (0152) (0158) (0158) (0162) (0164) (0163)

γ 4316 4279 4304 3928 3903 3929 4109 4046 4061(0117) (0120) (0104) (0197) (0189) (0187) (0199) (0182) (0178)

αD minus0094 0296 minus0192 0057 minus0197 0055(0121) (0112) (0117) (0154) (0112) (0150)

σD minus1027 minus1328 minus1551(1142) (1054) (1033)

ρD 1473 1872 minus0697 minus1220 minus1186 minus1277(1079) (0703) (0722) (0769) (0573) (0564)

cD 41604 43364 40036 40290 42505 41857 38622 40677 40272(6523) (6535) (5477) (7495) (7546) 7421 (6911) (7129) (7019)

dD minus2441 minus2721 minus1836 minus2748 minus2860 minus2791 minus2320 minus2487 minus2445(0808) (0730) (0534) (0689) (0682) 0667 (0685) (0676) (0662)

D87 minus4283 minus4253 minus4266 minus4226 minus4198 minus4217 minus4233 minus4209 minus4225(0646) (0641) (0643) (0826) (0820) (0815) (0793) (0793) (0791)

Log-likelihood 63063 63172 63263 67941 68108 68139 67963 68129 68161

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (\sigma + \sigma_D)\, r_{t-1}^{(\rho + \rho_D)} \sqrt{h_t}\, z_t + J(\mu, \gamma^2)\, \pi_t(q_t + q_D) + D_{87}$, where $h_t = \beta_0 + \beta_1 [r_{t-1} - E(r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}$, $z_t \sim$ iid $N(0, 1)$, $J \sim N(\mu, \gamma^2)$, and $\pi_t(q_t + q_D)$ is Bernoulli$(q_t + q_D)$ with $q_t + q_D = (1 + \exp(-c - c_D - (d + d_D) \cdot r_{t-1}))^{-1}$.
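As a rough illustration of the specification in the note above, the following Python sketch evaluates its two deterministic pieces: the logistic jump probability and the GARCH variance recursion. The function and argument names are hypothetical, and the inputs in the usage lines are made up for illustration; they are not estimates from Table 5.

```python
import math

def jump_probability(r_prev, c, d, c_dummy=0.0, d_dummy=0.0):
    """q_t + q_D = 1 / (1 + exp(-c - c_D - (d + d_D) * r_{t-1}))."""
    return 1.0 / (1.0 + math.exp(-c - c_dummy - (d + d_dummy) * r_prev))

def garch_variance(h_prev, surprise_prev, beta0, beta1, beta2):
    """h_t = beta0 + beta1 * surprise^2 + beta2 * h_{t-1},
    where surprise = r_{t-1} - E(r_{t-1} | r_{t-2})."""
    return beta0 + beta1 * surprise_prev ** 2 + beta2 * h_prev

# Illustrative (made-up) inputs, not values from the paper.
q = jump_probability(r_prev=5.0, c=-3.0, d=0.2)
h = garch_variance(h_prev=1.0, surprise_prev=0.5,
                   beta0=0.01, beta1=0.05, beta2=0.9)
```

With a positive slope `d`, the jump probability rises with the lagged rate, which is the mechanism the note's logistic form allows for.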

Table 5 reports parameter estimates for discretized jump-diffusion models. There is some weak evidence of mean reversion, especially when there is GARCH. The contribution of nonlinear drift is again very marginal; most estimated drift parameters are insignificant. For jump-diffusion models without GARCH, the elasticity parameter estimate is about .9, which is closer to the 1.5 found by CKLS. However, level effect is weakened (i.e., $\rho$ becomes close to .1) after GARCH is introduced. It seems that CEV helps capture part of volatility clustering, but its importance diminishes in the presence of GARCH. We find that GARCH also significantly improves the performance of jump-diffusion models. Without GARCH, there is a high probability of small jumps, with an estimated mean jump size between 2 and 4. With GARCH, the jump probability becomes smaller, but the mean jump size increases to between about 7 and 9. This suggests that without GARCH, jumps can capture part of volatility clustering, but with GARCH, jumps mainly capture large interest rate movements. Of course, jumps also help explain volatility clustering; the sum of GARCH parameters is again much smaller than that in pure GARCH models. During the 1979–1982 period, other than the probability of jumps becoming substantially higher, other aspects of the jump-diffusion models (e.g., the drift, volatility, or elasticity parameter) do not behave very differently.

To sum up, our in-sample analysis reveals some important stylized facts for the spot rate:

1. The importance of modeling mean reversion in mean is ambiguous. It seems that linear drift is adequate for this purpose once GARCH, regime switching, or jumps are included. The contribution of Ait-Sahalia's (1996) type of nonlinear drift is very marginal.

2. It is important to model conditional heteroscedasticity through GARCH or level effect, although GARCH seems to have much better in-sample performance than CEV.

3. Regime switching and jumps help capture volatility clustering and especially the excess kurtosis and heavy tails of the interest rate.

4. Interest rates behave quite differently during the 1979–1982 period. The level and volatility of the interest rate, the dependence of the interest rate volatility on the interest rate level, and the probability of jumps are much higher during this period.

Hong, Li, and Zhao: Out-of-Sample Analysis of Spot Rate Models 467

5. OUT-OF-SAMPLE DENSITY FORECAST PERFORMANCE

The foregoing in-sample analysis demonstrates that modeling volatility clustering through GARCH, and heavy tails and excess kurtosis through regime switching or jumps, significantly improves the in-sample fit of historical interest rate data. However, it is not clear whether these models will also perform well in out-of-sample density forecasts for future interest rates. Indeed, some previous studies have shown that more complicated models actually underperform the simple random walk model in predicting the conditional mean of future interest rates (e.g., Duffee 2002). In many financial applications, we are interested in forecasting the whole conditional distribution. We now apply the generalized spectral evaluation method described in Section 2 to evaluate density forecasts of the models under study. Among other things, we examine whether the features found to be important for in-sample performance remain important for out-of-sample forecasts, and whether the best in-sample performing models still perform best in out-of-sample forecasts.

For each model, we first calculate $Z_t(\hat\theta)$, the model generalized residuals, using the prediction sample, with model parameter estimates based on the estimation sample. Table 6 reports the out-of-sample evaluation statistic $M_1$ for each model. To compute $M_1$, we select a normal cdf $\Phi(\sqrt{12}\,u)$ for the weighting function $W(u)$ and a Bartlett kernel for the kernel function $k(z)$. To choose a lag order $p$, we use a data-driven method that minimizes the asymptotic integrated mean squared error of the modified generalized spectral density. This method involves choosing a preliminary lag order $\bar p$. We examine a wide range of $\bar p$ in our application. We obtain similar results for the preliminary lag order $\bar p$ between 10 and 30 and report results for $\bar p = 20$ in Table 6. For comparison, we also report the in-sample log-likelihood values obtained from the estimation sample.
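The first step described above, computing generalized residuals on the prediction sample, can be sketched as follows. This is a minimal illustration under an assumed Gaussian random walk density, $r_t \mid r_{t-1} \sim N(r_{t-1}, \sigma^2)$, which is only one of the models studied; the function name, the value of `sigma`, and the sample rates are hypothetical. The residuals are centered by subtracting 1/2, following the definition in the Appendix.

```python
from statistics import NormalDist

def generalized_residuals(rates, sigma):
    """Centered probability integral transforms Z_t = F(r_t | r_{t-1}) - 1/2
    under an assumed Gaussian random walk: r_t | r_{t-1} ~ N(r_{t-1}, sigma^2)."""
    std_normal = NormalDist()
    return [std_normal.cdf((r - r_prev) / sigma) - 0.5
            for r_prev, r in zip(rates, rates[1:])]

# Made-up rates (in percent) and volatility, for illustration only.
z = generalized_residuals([5.00, 5.00, 5.10, 4.95], sigma=0.1)
```

If the assumed density were correct, the resulting values would behave like iid draws from a uniform distribution on [-1/2, 1/2]; departures from that are what the evaluation statistics are designed to detect.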

The $M_1$ statistics show that all single-factor diffusion models are overwhelmingly rejected. One of the most interesting findings is that models that include a drift term (either linear or nonlinear) have much worse out-of-sample performance than those without a drift. Dothan's (1978) model and the pure CEV model have the best density forecast performance. Both models have a zero drift and level effect; the elasticity parameter is $\rho = 1$ for the former and .25 (.60 for the 1979–1982 period) for the latter. On the other hand, although the CKLS model and Ait-Sahalia's (1996) nonlinear drift model have the highest in-sample likelihood values, they have the worst out-of-sample density forecasts in terms of $M_1$. This seems to suggest that for the purpose of density forecast, it is much more important to model the diffusion function than the drift function. Of course, this does not necessarily mean that the drift is not important.

Table 6. Out-of-Sample Density Forecast Performance of Spot Interest Rate Models

Model                              M1        Log-likelihood

Single-factor diffusion models
  Random walk                      .108***   26496
  Log-normal                       .121***   21870
  Dothan                           .067**    21849
  Pure CEV                         .082**    26545
  Vasicek                          .213***   26555
  CIR                              .219***   26720
  CKLS                             .219***   26617
  Nonlinear drift                  .224***   26625

GARCH models
  No drift GARCH                   .079**    56868
  Linear drift GARCH               .276***   56995
  Nonlinear drift GARCH            .339***   57063
  No drift CEV–GARCH               .077**    57011
  Linear drift CEV–GARCH           .291***   57152
  Nonlinear drift CEV–GARCH        .358***   57255

Regime-switching models
  No drift RS CEV                  .129***   64931
  Linear drift RS CEV              .091***   65079
  Nonlinear drift RS CEV           .162***   65128
  No drift RS GARCH                .118***   70790
  Linear drift RS GARCH            .047*     71328
  Nonlinear drift RS GARCH         .055**    71382
  No drift RS CEV–GARCH            .124***   70828
  Linear drift RS CEV–GARCH        .056**    71386
  Nonlinear drift RS CEV–GARCH     .067**    71441

Jump-diffusion models
  No drift JD CEV                  .180***   63063
  Linear drift JD CEV              .039*     63172
  Nonlinear drift JD CEV           .076**    63263
  No drift JD GARCH                .178***   67941
  Linear drift JD GARCH            .056**    68108
  Nonlinear drift JD GARCH         .067**    68139
  No drift JD CEV–GARCH            .174***   67963
  Linear drift JD CEV–GARCH        .053**    68129
  Nonlinear drift JD CEV–GARCH     .065**    68161

NOTE: This table reports the out-of-sample density forecast performance of the single-factor diffusion, GARCH, Markov regime-switching, and jump-diffusion models, measured by the $M_1$ statistic. For convenience of comparison, we also report the in-sample log-likelihood value for each model. The in-sample parameter estimation is based on the observations from June 14, 1961 to February 2, 1991, and the out-of-sample density forecast evaluation is based on the observations from February 23, 1991 to December 29, 2000. The ratio between the size of the estimation sample and the forecast sample is about 3:1. In computing the out-of-sample $M_1$ statistics, a preliminary lag order $\bar p$ of 20 is used. Other lag orders give similar results. The asymptotic critical values for the out-of-sample evaluation statistic are .087, .051, and .037 at the 1%, 5%, and 10% levels, and ***, **, and * indicate that the $M_1$ statistic is significant at the 1%, 5%, and 10% levels.
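The star notation in Table 6 amounts to comparing each $M_1$ value with the three asymptotic critical values. A minimal sketch of that rule is given below, reading the critical values in the note as .087, .051, and .037; the function name is hypothetical, and the use of a strict inequality is an assumption.

```python
# Critical values at the 1%, 5%, and 10% levels, as read from the table note.
CRITICAL_VALUES = [(0.087, "***"), (0.051, "**"), (0.037, "*")]

def significance_stars(m1):
    """Return the star label for the smallest level at which M1 is significant."""
    for threshold, stars in CRITICAL_VALUES:
        if m1 > threshold:
            return stars
    return ""

labels = [significance_stars(v) for v in (0.108, 0.067, 0.047)]
```

Under this reading, a smaller $M_1$ earns fewer stars, so the models with a single star are the ones the test has the hardest time rejecting.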


It might be simply because the drift specifications considered are severely misspecified and do not outperform the zero-drift model. In fact, most drift parameter estimates have large standard errors and thus have excess sampling variation in parameter estimation. As a consequence, assuming zero drift may result in a smaller adverse effect on out-of-sample performance.

The generalized residuals $Z_t(\hat\theta)$ contain much information about possible sources of model misspecification. Instead of being uniform, the histograms of the generalized residuals for all diffusion models (Fig. 2) exhibit a high peak in the center of the distribution, which indicates that the diffusion models cannot satisfactorily capture the marginal density of the spot rate. The separate inference statistics $M(m, l)$ in Table 7 also show that all the models fail to satisfactorily capture the dynamics of the generalized residuals. In particular, the large $M(2, 2)$ and $M(4, 4)$ statistics indicate that diffusion models perform rather poorly in modeling the conditional variance and kurtosis of the generalized residuals.

Similar to single-factor diffusion models, GARCH models with a zero drift have much better density forecasts than those models with a linear or nonlinear drift. The best GARCH and diffusion models have $M_1$ statistics roughly between .07 and .08. Comparing the best GARCH and single-factor diffusion models, we find that GARCH provides better density forecasts than CEV, and combining CEV and GARCH further improves the density forecasts. Diagnostic analysis of generalized residuals shows that GARCH models provide certain improvements over diffusion models. For example, the histograms of the generalized residuals of GARCH models have a less-pronounced peak in the center. The $M(m, l)$ statistics show that GARCH models significantly improve the fitting of even-order moments of the generalized residuals; the $M(2, 2)$ and $M(4, 4)$ statistics are reduced from close to 100 to single digits.

[Figure 2. Histograms of the Generalized Residuals of Four Classes of Spot Rate Models, panels (a)–(h). This figure plots the histograms of the generalized residuals of the single-factor diffusion, GARCH, regime-switching, and jump-diffusion spot rate models.]

In summary, our analysis of single-factor diffusion and GARCH models demonstrates that both classes of models fail to provide good density forecasts for future interest rates. They fail to capture both the high peak in the center of the marginal density and the dynamics of different moments of the generalized residuals. GARCH models have a more uniform marginal density and significantly improve the ability to model the dynamics of even-order moments of the generalized residuals. For both classes of models, modeling the conditional variance through CEV or GARCH is much more important than modeling the conditional mean of the interest rate. Zero-drift models outperform either linear or nonlinear drift models in density forecast, although the $M(1, 1)$ statistics suggest that neither model adequately captures the dynamic structure in the conditional mean of the generalized residuals.

We next turn to regime-switching models. Regime-switching models generally have better density forecasts than single-factor diffusion and GARCH models; the best regime-switching models reduce the $M_1$ statistics of the best diffusion and GARCH models from above .07 to about .05. Although modeling the drift does not improve density forecast for diffusion and GARCH models, the best regime-switching models in density forecast have a linear drift in each regime. As pointed out by Ang and Bekaert (2002) and Li and Xu (2000), a regime-switching model with a linear drift in each regime actually implies nonlinear conditional mean dynamics for the interest rate. Thus our results suggest that perhaps a specific nonlinear drift is needed for out-of-sample density forecasts. GARCH is more important than CEV for density forecasts, and combining CEV with GARCH does not further improve model performance. Overall, the regime-switching GARCH model with a linear drift in each regime provides the best out-of-sample forecasts among all of the regime-switching models. Diagnostic analysis shows that regime-switching models provide a much better characterization of the marginal density of interest rates; the histograms of the generalized residuals are much closer to $U[0, 1]$. The $M(2, 2)$ and $M(4, 4)$ statistics of regime-switching models are slightly higher than those of GARCH models, however, and the $M(1, 1)$ and $M(3, 3)$ statistics of regime-switching models are actually higher than those of diffusion and GARCH models. Therefore, the advantages of regime-switching models seem to come from better modeling of the marginal density rather than from the dynamics of the generalized residuals.

The density forecast performance of jump-diffusion models shares many common features with that of regime-switching models. First, they have comparable performance; the $M_1$ statistics of the best jump-diffusion models are also around .05. Second, for jump-diffusion models, the linear drift specification outperforms those with no drift or nonlinear drift for density forecasts. Third, for jump-diffusion models, GARCH generally provides better density forecasts than CEV. Fourth, jump-diffusion models have similar advantages over single-factor diffusion and GARCH models. They have much better characterization of the marginal density of the generalized residuals, but they may not necessarily perform better in modeling the dynamics of the generalized residuals, as indicated by their large $M(m, l)$ statistics. The jump model with a linear drift and level effect, although it has the smallest $M_1$ statistic, apparently fails to capture the even-order moments of the generalized residuals, with $M(2, 2)$ and $M(4, 4)$ statistics significantly higher than other jump-diffusion models that include GARCH effects.

Table 7. Separate Inference Statistics for Out-of-Sample Density Forecast Performance

Model                              M(1,1)   M(1,2)   M(2,1)   M(2,2)   M(3,3)   M(4,4)

Single-factor diffusion models
  Random walk                      40.08    6.35     40.48    99.73    21.50    80.12
  Log-normal                       46.52    8.77     39.91    110.10   26.98    92.51
  Dothan                           46.04    8.07     42.62    108.90   27.02    91.03
  Pure CEV                         41.62    6.63     41.55    96.81    22.85    80.42
  Vasicek                          40.40    8.13     35.55    100.20   21.43    81.79
  CIR                              43.61    8.68     36.30    98.57    24.39    85.02
  CKLS                             42.02    8.55     35.75    97.89    22.80    82.77
  Nonlinear drift                  42.40    8.37     35.93    98.47    22.77    82.98

GARCH models
  No drift GARCH                   40.73    3.75     28.21    7.68     17.57    2.53
  Linear drift GARCH               41.07    4.66     25.11    7.53     17.00    2.35
  Nonlinear drift GARCH            40.71    4.87     24.70    7.02     16.63    2.09
  No drift CEV–GARCH               41.30    4.10     28.40    7.46     19.24    2.74
  Linear drift CEV–GARCH           41.68    5.11     25.06    7.09     18.73    2.44
  Nonlinear drift CEV–GARCH        41.43    5.45     24.32    6.45     18.54    2.17

Regime-switching models
  No drift RS CEV                  53.16    3.13     11.31    7.05     35.02    10.21
  Linear drift RS CEV              53.25    4.56     11.16    6.57     34.31    8.83
  Nonlinear drift RS CEV           53.75    5.12     9.84     6.56     34.50    8.49
  No drift RS GARCH                57.29    4.05     21.59    10.95    43.71    7.63
  Linear drift RS GARCH            55.46    4.56     21.09    12.38    40.71    9.14
  Nonlinear drift RS GARCH         55.69    4.50     21.05    12.14    41.13    8.93
  No drift RS CEV–GARCH            45.35    3.69     20.91    10.67    34.25    7.08
  Linear drift RS CEV–GARCH        44.10    4.86     20.75    11.99    32.12    8.87
  Nonlinear drift RS CEV–GARCH     44.36    4.84     20.55    11.40    32.46    8.32

Jump-diffusion models
  No drift JD CEV                  47.38    5.32     16.76    85.10    43.01    99.31
  Linear drift JD CEV              46.32    4.48     19.63    85.09    42.26    99.82
  Nonlinear drift JD CEV           46.50    4.85     18.31    85.10    42.48    99.29
  No drift JD GARCH                43.32    5.10     20.35    12.38    28.44    8.96
  Linear drift JD GARCH            42.57    4.60     21.48    12.19    28.22    8.84
  Nonlinear drift JD GARCH         42.62    4.67     21.42    12.34    28.24    8.95
  No drift JD CEV–GARCH            43.55    5.05     20.51    13.36    29.31    10.07
  Linear drift JD CEV–GARCH        42.74    4.54     21.69    13.00    28.93    9.76
  Nonlinear drift JD CEV–GARCH     42.78    4.62     21.57    13.07    28.94    9.81

NOTE: This table reports the separate inference statistics $M(m, l)$ for the four classes of spot rate models. The asymptotically normal statistic $M(m, l)$ can be used to test whether the cross-correlation between the $m$th and $l$th moments of $\{Z_t\}$ is significantly different from 0. The choice of $(m, l) = (1, 1)$, $(2, 2)$, $(3, 3)$, and $(4, 4)$ is sensitive to autocorrelations in mean, variance, skewness, and kurtosis of $\{r_t\}$. We report results only for a lag truncation order $p = 20$; the results for $p = 10$ and 30 are rather similar. The asymptotic critical values are 1.28, 1.65, and 2.33 at the 10%, 5%, and 1% levels.
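The critical values quoted in the note to Table 7 (1.28, 1.65, and 2.33) match the upper-tail quantiles of a standard normal distribution, which suggests a one-sided test. The short sketch below reproduces them and applies the resulting rejection rule; treating the test as upper-tailed is an inference from those values, and the function names are hypothetical.

```python
from statistics import NormalDist

def upper_tail_critical_value(level):
    """Standard normal quantile z such that P(Z > z) = level."""
    return NormalDist().inv_cdf(1.0 - level)

def rejects(statistic, level=0.05):
    """One-sided rule: reject when the M(m, l) statistic exceeds the critical value."""
    return statistic > upper_tail_critical_value(level)

# Approximately 1.2816, 1.6449, and 2.3263.
critical_values = [upper_tail_critical_value(a) for a in (0.10, 0.05, 0.01)]
```

Against these thresholds, even the smallest entries in Table 7 (a little above 2) are significant at the 5% level, which is consistent with the statement that all models fail to fully capture the dynamics of the generalized residuals.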

In summary, our out-of-sample density forecast evaluation reveals the following important features:

1. For out-of-sample density forecast, the importance of modeling mean reversion in the interest rate is ambiguous. For diffusion and GARCH models, a linear or nonlinear drift gives worse density forecasts than a zero drift. However, when there are regime switches or jumps, linear drift models outperform those with zero or nonlinear drift. The best density forecast models still fail to adequately capture the mean dynamics of the generalized residuals.

2. It is important to capture volatility clustering in interest rates for both in-sample and out-of-sample performance via GARCH. Most of the best-performing models have a GARCH component, which significantly improves their ability to model the dynamics of the even-order moments of the generalized residuals.

3. It is also important to capture excess kurtosis and heavy tails of the interest rate for both in-sample and out-of-sample performance via regime switching or jumps. The advantages of regime-switching and jump models are mainly reflected in modeling the marginal density rather than in the dynamics of the generalized residuals.

Overall, our out-of-sample analysis demonstrates that more-complicated models that incorporate conditional heteroscedasticity and heavy tails of interest rates tend to have a better density forecast. This result is quite different from the results for those that focus on conditional mean forecast. As widely documented in the literature, the predictable component in the conditional mean of the interest rate appears insignificant. As a result, random walk models tend to outperform more-sophisticated models in terms of mean forecast (e.g., Duffee 2002). However, density forecast includes all conditional moments, and as a result, those models that can capture the dynamics of higher-order moments tend to have better density forecasts. Our analysis suggests that the more-complicated spot rate models developed in the literature can indeed capture some important features of interest rates and are relevant in applications involving density forecasts of interest rates.


6. CONCLUSION

Despite the numerous empirical studies of spot interest rate models, little effort has been devoted to examining their out-of-sample forecast performance. We have contributed to the literature by providing the first (to our knowledge) comprehensive empirical analysis of the out-of-sample performance of a wide variety of popular spot rate models in forecasting the conditional density of future interest rates. Out-of-sample density forecasts help minimize the data snooping bias due to excessive searching for more-complicated models using similar datasets. The conditional density, which completely characterizes the full dynamics of interest rates, is also an essential input to many financial applications. Using a rigorous econometric evaluation procedure for density forecasts, we examined the out-of-sample performance of single-factor diffusion, GARCH, regime-switching, and jump-diffusion models.

Although previous studies have shown that simpler models, such as the random walk model, tend to have more accurate forecasts for the conditional mean of interest rates, we find that models that capture conditional heteroscedasticity and heavy tails of the spot rate have better out-of-sample density forecasts. GARCH significantly improves modeling of the conditional variance and kurtosis of the generalized residuals, whereas regime switching and jumps help in modeling the marginal density of interest rates. Therefore, our analysis shows that the more-sophisticated interest rate models developed in the literature can indeed capture some important features of interest rate data and perform better than simpler models in out-of-sample density forecasts. Although we have focused on density forecasts of the spot rates here, there is evidence that the term structure of interest rates is driven by multiple factors (e.g., Dai and Singleton 2000). Extending our analysis to forecast the joint density of multiple yields is an interesting issue that we will address in future research.

ACKNOWLEDGMENTS

The authors are grateful for the helpful comments of Canlin Li, Daniel Morillo, Larry Pohlman, and seminar participants at PanAgora Asset Management, the 2002 Econometric Society European Summer Meeting, and the 2003 Econometric Society North American Winter Meeting. They also thank the associate editor and two anonymous referees, whose comments and suggestions have greatly improved the article, and David George for editorial assistance. Hong's work was supported by National Science Foundation grant SES-0111769. Any errors are the authors' own.

APPENDIX: GENERALIZED SPECTRAL EVALUATION FOR DENSITY FORECASTS

Here we describe Hong's (2003) omnibus evaluation procedure for out-of-sample density forecasts, as well as a class of separate inference procedures. (For more discussion and formal treatment, see Hong 2003.)

A.1 Generalized Spectrum

Generalized spectrum was proposed by Hong (1999) as an analytic tool to provide an alternative to the power spectrum and the higher-order spectrum (e.g., bispectrum) in time series analysis. Define the centered generalized residual

$$Z_t \equiv Z_t(\theta) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \theta)\, dr - \frac{1}{2}, \qquad t = R + 1, \ldots, T,$$

where $p(r, t \mid I_{t-1}, \theta)$ is a conditional density model for time series $\{r_t\}$. Suppose that the series $\{Z_t\}_{t=-\infty}^{\infty}$ has marginal characteristic function $\varphi(u) \equiv E(e^{iuZ_t})$ and pairwise joint characteristic function $\varphi_j(u, v) \equiv E[\exp\{i(uZ_t + vZ_{t-|j|})\}]$, where $i \equiv \sqrt{-1}$, $u, v \in (-\infty, \infty)$, and $j = 0, \pm 1, \ldots$. The basic idea of generalized spectrum is to transform the series $\{Z_t\}$ and then consider the spectrum of the transformed series. Define the generalized covariance function

$$\sigma_j(u, v) \equiv \operatorname{cov}(e^{iuZ_t}, e^{ivZ_{t-|j|}}), \qquad j = 0, \pm 1, \ldots.$$

Straightforward algebra yields $\sigma_j(u, v) = \varphi_j(u, v) - \varphi(u)\varphi(v)$. Because $\varphi_j(u, v) = \varphi(u)\varphi(v)$ for all $(u, v)$ if and only if $Z_t$ and $Z_{t-|j|}$ are independent (cf. Lukacs 1970), $\sigma_j(u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ over various lags, including those with zero autocorrelation in $\{Z_t\}$.
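To make the generalized covariance concrete, the following Python sketch computes a sample analog of $\sigma_j(u, v) = \varphi_j(u, v) - \varphi(u)\varphi(v)$ by replacing each expectation with an average over the available pairs $(Z_t, Z_{t-j})$. The function name is hypothetical, and estimating the marginal characteristic functions from the same pairs is a simplifying assumption of this sketch rather than a detail taken from the paper.

```python
import cmath

def empirical_generalized_covariance(z, j, u, v):
    """Sample analog of sigma_j(u, v) = phi_j(u, v) - phi(u) * phi(v)."""
    j = abs(j)
    pairs = [(z[t], z[t - j]) for t in range(j, len(z))]  # (Z_t, Z_{t-j})
    m = len(pairs)
    joint = sum(cmath.exp(1j * (u * a + v * b)) for a, b in pairs) / m
    phi_u = sum(cmath.exp(1j * u * a) for a, _ in pairs) / m
    phi_v = sum(cmath.exp(1j * v * b) for _, b in pairs) / m
    return joint - phi_u * phi_v

# A perfectly alternating series is strongly dependent at lag 1.
alternating = [0, 1, 0, 1, 0]
dependent = empirical_generalized_covariance(alternating, 1, cmath.pi, cmath.pi)
```

For the alternating series the result is far from zero, whereas a constant series gives exactly the product of the marginals and hence a value of zero.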

Under certain regularity conditions on the temporal dependence of $\{Z_t\}$, the Fourier transform of the generalized covariance function $\sigma_j(u, v)$ exists and is given by

$$f(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi].$$

Like $\sigma_j(u, v)$, $f(\omega, u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ across various lags. An advantage of spectral analysis is that $f(\omega, u, v)$ incorporates all lags simultaneously. Thus it can capture the dependent processes in which serial dependence occurs only at higher-order lags. This can arise from seasonality (e.g., calendar effects) or time delay in financial markets. Moreover, $f(\omega, u, v)$ is particularly powerful in capturing the dependent alternatives whose serial dependence decays to 0 slowly as $j \to \infty$, such as slow mean-reverting or persistent volatility clustering processes. For such processes, $f(\omega, u, v)$ has a sharp peak at frequency 0.

The function $f(\omega, u, v)$ does not require any moment condition on $\{Z_t\}$. When the moments of $\{Z_t\}$ exist, however, we can differentiate $f(\omega, u, v)$ with respect to $(u, v)$ at $(0, 0)$:

$$f^{(0,m,l)}(\omega, 0, 0) \equiv \left. \frac{\partial^{m+l}}{\partial u^m\, \partial v^l} f(\omega, u, v) \right|_{(u,v)=(0,0)} = \frac{i^{m+l}}{2\pi} \sum_{j=-\infty}^{\infty} \operatorname{cov}(Z_t^m, Z_{t-|j|}^l)\, e^{-ij\omega}.$$

In particular, $f^{(0,1,1)}(\omega, 0, 0)$ is the well-known power spectral density. (For applications of the power spectral density in economics and finance, see, e.g., Granger 1969; Durlauf 1991; Watson 1993.) For this reason, $f(\omega, u, v)$ is called the "generalized spectral density" of $\{Z_t\}$. The parameters $(u, v)$ provide much flexibility in capturing linear and nonlinear dependence. Because $f(\omega, u, v)$ can be decomposed as a weighted sum of $f^{(0,m,l)}(\omega, 0, 0)$ over various $(m, l)$ via a Taylor series expansion around $(0, 0)$, it contains information on all autocorrelations and cross-correlations in the various power terms of $\{Z_t\}$. Thus it can be used to develop an omnibus procedure against a wide range of dependent processes.
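The cross-correlations "in the various power terms" mentioned above are correlations between $Z_t^m$ and $Z_{t-j}^l$. The sketch below computes a sample version for a given lag and pair of powers. The normalization used here (full-sample means and standard deviations, division by the sample size) is an assumption for illustration; the function name is hypothetical.

```python
import math

def power_cross_correlation(z, j, m, l):
    """Sample correlation between Z_t^m and Z_{t-j}^l for lag j >= 0."""
    n = len(z)
    x = [v ** m for v in z]
    y = [v ** l for v in z]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0  # a constant power series carries no correlation information
    cov = sum((x[t] - mean_x) * (y[t - j] - mean_y) for t in range(j, n)) / n
    return cov / (sd_x * sd_y)

# An alternating series has strong negative lag-1 correlation in levels (m = l = 1).
series = [0.5, -0.5] * 50
rho_levels = power_cross_correlation(series, 1, 1, 1)
rho_squares = power_cross_correlation(series, 1, 2, 2)
```

Choosing $(m, l) = (1, 1)$ targets dependence in the mean and $(2, 2)$ dependence in the variance; in this toy series all of the dependence sits in the levels, so the squared series shows none.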

A.2 Omnibus Evaluation for Density Forecast

To evaluate out-of-sample density forecasts of model $p(r, t \mid I_{t-1}, \theta)$, Hong (2003) used the generalized spectrum to develop an omnibus test of iid $U[-\frac{1}{2}, \frac{1}{2}]$ for the centered generalized residual $\{Z_t\}$. To test $H_0$: $\{Z_t\} \sim$ iid $U[-\frac{1}{2}, \frac{1}{2}]$, we use the modified generalized covariance

$$\sigma_j^U(u, v) \equiv \varphi_j(u, v) - \varphi^U(u)\varphi^U(v),$$

where $\varphi^U(u) \equiv \sin(u/2)/(u/2)$ is the characteristic function of a $U[-\frac{1}{2}, \frac{1}{2}]$ random variable. Because $\sigma_j^U(u, v)$ has incorporated the information of $U[-\frac{1}{2}, \frac{1}{2}]$, it can capture any pairwise serial dependence in $\{Z_t\}$ and any deviation from $U[-\frac{1}{2}, \frac{1}{2}]$. The associated generalized spectrum is

$$f^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j^U(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi],\ u, v \in (-\infty, \infty).$$

Under $H_0$, we have $\sigma_j^U(u, v) = 0$ for all $j \neq 0$, and so $f^U(\omega, u, v)$ becomes a known "flat" spectrum,

$$f_0^U(\omega, u, v) \equiv \frac{1}{2\pi} \sigma_0^U(u, v) = \frac{1}{2\pi} [\varphi^U(u + v) - \varphi^U(u)\varphi^U(v)], \qquad \omega \in [-\pi, \pi].$$

When $Z_t$ and $Z_{t-j}$ are not independent at some lag $j \neq 0$, or when $Z_t$ is not $U[-\frac{1}{2}, \frac{1}{2}]$, we have $f^U(\omega, u, v) \neq f_0^U(\omega, u, v)$. Hence, to test $H_0$, one can check whether a consistent estimator of $f^U(\omega, u, v)$ is significantly different from $f_0^U(\omega, u, v)$.

Suppose that $\hat\theta$ is an estimator for model parameter $\theta$ based on the estimation sample $\{r_t\}_{t=1}^{R}$. To estimate $f^U(\omega, u, v)$ using the prediction sample $\{r_t\}_{t=R+1}^{N}$, we have to use the estimated proxies $\{\hat Z_t\}_{t=R+1}^{N}$, where

$$\hat Z_t \equiv Z_t(\hat\theta) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \hat\theta)\, dr - \frac{1}{2}, \qquad t = R + 1, \ldots, N.$$

Define the empirical measure

$$\hat\sigma_j^U(u, v) \equiv \hat\varphi_j(u, v) - \varphi^U(u)\varphi^U(v), \qquad j = 0, \pm 1, \ldots, \pm(n - 1),$$

where $n \equiv N - R$ is the size of the prediction sample and the empirical pairwise joint characteristic function is

$$\hat\varphi_j(u, v) \equiv (n - |j|)^{-1} \sum_{t=R+|j|+1}^{N} \exp\{i(u\hat Z_t + v\hat Z_{t-|j|})\}.$$

We introduce a class of smoothed kernel estimators

$$\hat f^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=1-n}^{n-1} (1 - |j|/n)^{1/2}\, k(j/p)\, \hat\sigma_j^U(u, v)\, e^{-ij\omega},$$

where $k(\cdot)$ is a kernel and $p \equiv p(n)$ is a bandwidth/lag order. An example of $k(\cdot)$ is the Bartlett kernel

$$k(z) = \begin{cases} 1 - |z| & \text{if } |z| \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

For the choice of $p$, Hong (1999) suggested a data-driven method that delivers an asymptotically optimal bandwidth in terms of an integrated mean squared error criterion. This still involves a preliminary bandwidth $\bar p$, but the impact of choosing $\bar p$ is much smaller than the impact of choosing lag order $p$.

A convenient divergence measure for $\hat f^U(\omega, u, v)$ and $f_0^U(\omega, u, v)$ is the quadratic form

$$\begin{aligned}
2\pi n\, Q(\hat f^U, f_0^U) &\equiv 2\pi n \int\!\!\int\!\!\int_{-\pi}^{\pi} |\hat f^U(\omega, u, v) - f_0^U(\omega, u, v)|^2\, d\omega\, dW(u)\, dW(v) \\
&= n \int\!\!\int |\hat\varphi_0(u, v) - \varphi^U(u + v)|^2\, dW(u)\, dW(v) \\
&\quad + 2 \sum_{j=1}^{n-1} k^2(j/p)(n - j) \int\!\!\int |\hat\sigma_j^U(u, v)|^2\, dW(u)\, dW(v),
\end{aligned}$$

where the weighting function $W(\cdot)$ is positive and nondecreasing, and the unspecified integrals are taken over the support of $W(\cdot)$. An example of this is

$$W(u) = \Phi(\sqrt{12}\, u),$$

where $\Phi(\cdot)$ is the $N(0, 1)$ cdf, which is commonly used in the characteristic function literature. The scale $\sqrt{12}$ matches the standard deviation of $U[-\frac{1}{2}, \frac{1}{2}]$, but this is not necessary.

Our evaluation statistic for $H_0$ is a properly centered and scaled version of the quadratic form $Q(\hat f^U, f_0^U)$:

$$M_1 \equiv \left[ \sum_{j=1}^{n-1} k^2(j/p) \right]^{-1} \int\!\!\int \sum_{j=1}^{n-1} k^2(j/p)(n - j)\, |\hat\sigma_j^U(u, v)|^2\, dW(u)\, dW(v) - \left[ \int [1 - \varphi^U(u)^2]\, dW(u) \right]^2.$$
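The pieces defined in this section can be assembled into a rough numerical sketch of $M_1$. The Python code below is an illustration only, under several assumptions that are not taken from the paper: the integrals over $W$ are approximated by a midpoint rule on $[-1, 1]$, the truncated weight is not renormalized, the lag order `p` is supplied directly rather than chosen by the data-driven method, and all function names are hypothetical.

```python
import cmath
import math

def phi_uniform(u):
    """Characteristic function of U[-1/2, 1/2]: sin(u/2) / (u/2)."""
    return 1.0 if u == 0 else math.sin(u / 2) / (u / 2)

def bartlett(x):
    """Bartlett kernel k(x) = 1 - |x| for |x| <= 1, and 0 otherwise."""
    return max(0.0, 1.0 - abs(x))

def weight_grid(num=11, lo=-1.0, hi=1.0):
    """Midpoint grid and weights approximating dW(u) for W(u) = Phi(sqrt(12) u)."""
    step = (hi - lo) / num
    s = math.sqrt(12.0)
    points = [lo + (i + 0.5) * step for i in range(num)]
    weights = [s * math.exp(-0.5 * (s * u) ** 2) / math.sqrt(2 * math.pi) * step
               for u in points]
    return points, weights

def m1_statistic(z, p, num=11):
    """Sketch of M1 for centered generalized residuals z; requires p >= 2."""
    n = len(z)
    pts, w = weight_grid(num)
    e = [[cmath.exp(1j * u * zt) for zt in z] for u in pts]  # e[a][t] = exp(i u_a z_t)
    k2 = {j: bartlett(j / p) ** 2 for j in range(1, n) if bartlett(j / p) > 0}
    total = 0.0
    for j, kj in k2.items():
        for a in range(num):
            for b in range(num):
                joint = sum(e[a][t] * e[b][t - j] for t in range(j, n)) / (n - j)
                sigma = joint - phi_uniform(pts[a]) * phi_uniform(pts[b])
                total += kj * (n - j) * abs(sigma) ** 2 * w[a] * w[b]
    centre = sum((1.0 - phi_uniform(u) ** 2) * wu for u, wu in zip(pts, w)) ** 2
    return total / sum(k2.values()) - centre

# A constant residual series is neither uniform nor serially independent.
m1_constant = m1_statistic([0.4] * 200, p=5)
```

Because a constant series violates both the uniformity and the independence parts of $H_0$, the sketch returns a value far above the critical values reported in the text; a finer grid or a proper quadrature rule would be needed for anything beyond illustration.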

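The asymptotic distribution of $M_1$ was derived by Hong (2003). Under $H_0$ and a set of regularity conditions,

$$M_1 \xrightarrow{d} \int\!\!\int |G(u, v)|^2\, dW(u)\, dW(v)$$

as $n \to \infty$, $R \to \infty$, and $n^{1+\delta}/R \to 0$ for any arbitrarily small constant $\delta > 0$, where $G(u, v)$ is a complex-valued Gaussian process with mean 0 and a known covariance function

$$\begin{aligned}
\operatorname{cov}[G(u_1, v_1), G(u_2, v_2)] &= E[G(u_1, v_1)\, G(u_2, v_2)^*] \\
&= \sigma_0^U(u_1, -u_2)\, \varphi^U(v_1)\varphi^U(v_2) + \sigma_0^U(u_1, -v_2)\, \varphi^U(u_2)\varphi^U(v_1) \\
&\quad + \sigma_0^U(v_1, -u_2)\, \varphi^U(u_1)\varphi^U(v_2) + \sigma_0^U(v_1, -v_2)\, \varphi^U(u_1)\varphi^U(u_2).
\end{aligned}$$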


The limit distribution of $M_1$ is nonstandard. However, because the Gaussian process $G(u, v)$ has mean 0 and a known distribution-free covariance function, the limit distribution of $M_1$ can be easily tabulated. When $W(u) = \Phi(\sqrt{12}\, u)$ with a truncated support on [−1, 1], the asymptotic critical values at the 10%, 5%, and 1% levels are .037, .051, and .087.

The most attractive feature of $M_1$ is its omnibus property: it has power against a wide range of suboptimal density forecasts, thanks to the use of the characteristic function in a spectral framework. The $M_1$ statistic can be viewed as an omnibus metric measuring the departure of the forecast model from the true data-generating process. A better density forecast model is expected to have a smaller value for $M_1$, because its generalized residual series $\{Z_t\}$ is closer to having the iid U[0, 1] property. Thus $M_1$ can be used to rank competing density forecast models in terms of their deviations from optimality. Moreover, the kernel function provides flexible weighting for various lags. Nonuniform weighting kernels usually discount higher-order lags. For most, if not all, economic and financial time series, today's behaviors are more affected by recent events than by remote events. Thus downward weighting for higher-order lags is expected to yield good power in finite samples.

A.3 Separate Inferences

When a model is rejected using $M_1$, it is interesting to explore possible reasons for the rejection. Hong (2003) provided a class of rigorous separate inference statistics that exploit the information contained in the model generalized residuals $\{Z_t\}$ to understand possible reasons for model rejection. The $M(m, l)$ statistic is based on a kernel estimator of the generalized spectral density derivative $f^{(0,m,l)}(\omega, 0, 0)$, where $m$ and $l$ are the orders of the derivatives with respect to $u$ and $v$. The statistic $M(m, l)$ is defined as

\[ M(m, l) = \Biggl[ \sum_{j=1}^{n-1} (n - j)\, k^2(j/p)\, \rho_j^2(m, l) - \sum_{j=1}^{n-1} k^2(j/p) \Biggr] \Bigg/ \Biggl[ 2 \sum_{j=1}^{n-2} k^4(j/p) \Biggr]^{1/2}, \]

where $\rho_j(m, l)$ is the sample cross-correlation between $Z_t^m$ and $Z_{t-|j|}^l$, $t = R + 1, \ldots, N$. Under $H_0$ and a set of regularity conditions,

\[ M(m, l) \xrightarrow{d} N(0, 1) \]

for any given $(m, l)$ as $n, R \to \infty$. The upper-tailed N(0, 1) critical values at the 10%, 5%, and 1% levels are 1.28, 1.65, and 2.33. As in the $M_1$ test, Hong (2003) considered a data-driven method for $p$ that also involves the choice of a preliminary lag order $\bar{p}$.
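The definition of $M(m, l)$ can be illustrated with a short Python sketch. It is not the authors' implementation: the helper names are hypothetical, the lag order `p` is supplied by the user rather than chosen by the data-driven method, the Bartlett kernel is assumed, and the sample cross-correlation is computed with lag-specific means and variances, which is only one possible normalization.

```python
import numpy as np

def bartlett_kernel(z):
    """Bartlett kernel: 1 - |z| on [-1, 1], 0 elsewhere."""
    z = np.abs(np.asarray(z, dtype=float))
    return np.where(z <= 1.0, 1.0 - z, 0.0)

def sample_cross_correlation(z, m, l, j):
    """Sample correlation between Z_t^m and Z_{t-j}^l for a lag j >= 1."""
    a = z[j:] ** m       # Z_t^m
    b = z[:-j] ** l      # Z_{t-j}^l
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return 0.0 if denom == 0.0 else float(np.sum(a * b) / denom)

def m_statistic(z, m, l, p):
    """Centered and scaled kernel-weighted sum of squared cross-correlations."""
    z = np.asarray(z, dtype=float)
    n = z.size
    lags = np.arange(1, n)                       # j = 1, ..., n - 1
    k2 = bartlett_kernel(lags / p) ** 2
    rho2 = np.zeros(n - 1)
    for j in lags[k2 > 0]:                       # Bartlett weights vanish for j >= p
        rho2[j - 1] = sample_cross_correlation(z, m, l, j) ** 2
    numerator = np.sum((n - lags) * k2 * rho2) - np.sum(k2)
    k4 = bartlett_kernel(np.arange(1, n - 1) / p) ** 4   # j = 1, ..., n - 2
    return float(numerator / np.sqrt(2.0 * np.sum(k4)))

# A strongly dependent (alternating) series should be rejected decisively.
z_alternating = np.where(np.arange(200) % 2 == 0, 0.25, 0.75)
print(m_statistic(z_alternating, m=1, l=1, p=5.0))
```

For the alternating series, every squared cross-correlation at lags 1 to 4 equals 1, so the statistic lies far above the 1% critical value of 2.33.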

Although the moments of the generalized residuals $\{Z_t\}$ are not exactly the same as those of the original data $\{r_t\}$, they are highly correlated. In particular, the choice of $(m, l) = (1, 1)$, $(2, 2)$, $(3, 3)$, and $(4, 4)$ is very sensitive to autocorrelations in the level, volatility, skewness, and kurtosis of $\{r_t\}$. Furthermore, the choices of $(m, l) = (1, 2)$ and $(2, 1)$ are sensitive to the "ARCH-in-mean" effect and the "leverage" effect. Different choices of order $(m, l)$ can thus allow examination of various dynamic aspects of the underlying process.

[Received June 2002. Revised November 2003.]

REFERENCES

Ait-Sahalia, Y. (1996), "Testing Continuous-Time Models of the Spot Interest Rate," Review of Financial Studies, 9, 385–426.

Ait-Sahalia, Y. (1999), "Transition Densities for Interest Rate and Other Nonlinear Diffusions," Journal of Finance, 54, 1361–1395.

Ait-Sahalia, Y. (2002), "Maximum-Likelihood Estimation of Discretely Sampled Diffusions: A Closed-Form Approach," Econometrica, 70, 223–262.

Ait-Sahalia, Y., and Lo, A. (1998), "Nonparametric Estimation of State-Price Densities Implicit in Financial Asset Prices," Journal of Finance, 53, 499–547.

Andersen, T. G., and Lund, J. (1997), "Estimating Continuous Time Stochastic Volatility Models of the Short-Term Interest Rate," Journal of Econometrics, 77, 343–378.

Ang, A., and Bekaert, G. (2002), "Regime Switches in Interest Rates," Journal of Business & Economic Statistics, 20, 163–182.

Bali, T. (2000), "Modeling the Conditional Mean and Variance of the Short Rate Using Diffusion, GARCH, and Moving Average Models," Journal of Futures Markets, 20, 717–751.

Bali, T. (2001), "Nonlinear Parametric Models of the Short-Term Interest Rate," unpublished manuscript, Baruch College.

Ball, C., and Torous, W. (1998), "Regime Shifts in Short-Term Riskless Interest Rates," unpublished manuscript, University of California, Los Angeles.

Bera, A. K., and Ghosh, A. (2002), "Density Forecast Evaluation With Applications in Finance," unpublished manuscript, University of Illinois at Champaign, Dept. of Economics.

Berkowitz, J. (2001), "Testing the Accuracy of Density Forecasts With Applications to Risk Management," Journal of Business & Economic Statistics, 19, 465–474.

Bliss, R., and Smith, D. (1998), "The Elasticity of Interest Rate Volatility: Chan, Karolyi, Longstaff, and Sanders Revisited," unpublished manuscript, Federal Reserve Bank of Atlanta.

Brenner, R., Harjes, R., and Kroner, K. (1996), "Another Look at Alternative Models of Short-Term Interest Rate," Journal of Financial and Quantitative Analysis, 31, 85–107.

Chacko, G., and Viceira, L. M. (2003), "Spectral GMM Estimation of Continuous-Time Processes," Journal of Econometrics, 116, 259–292.

Chan, K. C., Karolyi, G. A., Longstaff, F. A., and Sanders, A. B. (1992), "An Empirical Comparison of Alternative Models of the Short-Term Interest Rate," Journal of Finance, 47, 1209–1228.

Chapman, D., and Pearson, N. (2001), "Recent Advances in Estimating Models of the Term-Structure," Financial Analysts Journal, 57, 77–95.

Christoffersen, P., and Diebold, F. X. (1996), "Further Results on Forecasting and Model Selection Under Asymmetric Loss," Journal of Applied Econometrics, 11, 561–572.

Christoffersen, P., and Diebold, F. X. (1997), "Optimal Prediction Under Asymmetric Loss," Econometric Theory, 13, 808–817.

Clements, M. P., and Smith, J. (2000), "Evaluating the Forecast Densities of Linear and Non-Linear Models: Applications to Output Growth and Unemployment," Journal of Forecasting, 19, 255–276.

Conley, T. G., Hansen, L. P., Luttmer, E. G. J., and Scheinkman, J. A. (1997), "Short-Term Interest Rates as Subordinated Diffusions," Review of Financial Studies, 10, 525–578.

Corradi, V. (2000), "Reconsidering the Continuous-Time Limit of the GARCH(1,1) Process," Journal of Econometrics, 96, 145–153.

Cox, J. C., Ingersoll, J. E., and Ross, S. A. (1985), "A Theory of the Term Structure of Interest Rates," Econometrica, 53, 385–407.

Dai, Q., and Singleton, K. (2000), "Specification Analysis of Affine Term Structure Models," Journal of Finance, 55, 1943–1978.

Das, S. R. (2002), "The Surprise Element: Jumps in Interest Rates," Journal of Econometrics, 106, 27–65.

Diebold, F. X., Gunther, T. A., and Tay, A. S. (1998), "Evaluating Density Forecasts With Applications to Financial Risk Management," International Economic Review, 39, 863–883.

Diebold, F. X., Hahn, J., and Tay, A. S. (1999), "Multivariate Density Forecast Evaluation and Calibration in Financial Risk Management: High-Frequency Returns of Foreign Exchange," Review of Economics and Statistics, 81, 661–673.

Diebold, F. X., Tay, A. S., and Wallis, K. F. (1999), "Evaluating Density Forecasts of Inflation: The Survey of Professional Forecasters," in Cointegration, Causality and Forecasting: A Festschrift in Honour of Clive W. J. Granger, eds. R. F. Engle and H. White, Oxford, U.K.: Oxford University Press, pp. 76–90.

Hong, Li, and Zhao: Out-of-Sample Analysis of Spot Rate Models 473

Dothan, L. U. (1978), "On the Term Structure of Interest Rates," Journal of Financial Economics, 6, 59–69.

Duffee, G. (2002), "Term Premia and Interest Rate Forecasts in Affine Models," Journal of Finance, 57, 405–443.

Duffie, D., and Pan, J. (1997), "An Overview of Value at Risk," Journal of Derivatives, 4, 13–32.

Durham, G. (2003), "Likelihood-Based Specification Analysis of Continuous-Time Models of the Short-Term Interest Rate," Journal of Financial Economics, 70, 463–487.

Durlauf, S. (1991), "Spectral Based Testing of the Martingale Hypothesis," Journal of Econometrics, 50, 355–376.

Fackler, P. L., and King, R. P. (1990), "Calibration of Option-Based Probability Assessments in Agricultural Commodity Markets," American Journal of Agricultural Economics, 72, 73–83.

Gallant, A. R., and Tauchen, G. (1998), "Reprojection Partially Observed Systems With Applications to Interest Rate Diffusions," Journal of the American Statistical Association, 93, 10–24.

Ghysels, E., Carrasco, M., Chernov, M., and Florens, J. (2001), "Estimating Diffusions With a Continuum of Moment Conditions," unpublished manuscript, University of North Carolina.

Granger, C. (1969), "Testing for Causality and Feedback," Econometrica, 37, 424–438.

Granger, C. (1999), "The Evaluation of Econometric Models and of Forecasts," invited lecture at the Far Eastern Meeting of the Econometric Society, Singapore.

Granger, C., and Pesaran, M. H. (2000), "A Decision Theoretic Approach to Forecasting Evaluation," in Statistics and Finance: An Interface, eds. W. S. Chan, W. K. Li, and H. Tong, London: Imperial College, pp. 261–278.

Gray, S. (1996), "Modeling the Conditional Distribution of Interest Rates as a Regime-Switching Process," Journal of Financial Economics, 42, 27–62.

Hamilton, J. D. (1989), "A New Approach to the Econometric Analysis of Nonstationary Time Series and the Business Cycle," Econometrica, 57, 357–384.

Hong, Y. (1999), "Hypothesis Testing in Time Series via the Empirical Characteristic Function: A Generalized Spectral Density Approach," Journal of the American Statistical Association, 94, 1201–1220.

Hong, Y. (2003), "Evaluation of Out-of-Sample Density Forecasts With Applications to Stock Prices," unpublished manuscript, Cornell University, Dept. of Economics and Dept. of Statistical Science.

Hong, Y., and Lee, T. H. (2003a), "Inference on the Predictability of Foreign Exchange Rate Changes via Generalized Spectrum and Nonlinear Models," Review of Economics and Statistics, 84, 1048–1062.

Hong, Y., and Lee, T. H. (2003b), "Diagnostic Checking for Nonlinear Time Series Models," Econometric Theory, 19, 1065–1121.

Hong, Y., and Li, H. (2004), "Nonparametric Specification Testing for Continuous-Time Models With Applications to Interest Rate Term Structures," Review of Financial Studies, forthcoming.

Jackwerth, J. C., and Rubinstein, M. (1996), "Recovering Probability Distributions From Option Prices," Journal of Finance, 51, 1611–1631.

Jiang, G., and Knight, J. (1997), "A Nonparametric Approach to the Estimation of Diffusion Processes With an Application to a Short-Term Interest Rate Model," Econometric Theory, 13, 615–645.

Johannes, M. (2003), "The Statistical and Economic Role of Jumps in Interest Rates," Journal of Finance, 59, 227–260.

Jorion, P. (2000), Value at Risk: The New Benchmark for Managing Financial Risk, New York: McGraw-Hill.

Morgan, J. P. (1996), RiskMetrics - Technical Document (4th ed.), New York: Morgan Guaranty Trust Company.

Koedijk, K. G., Nissen, F. G., Schotman, P. C., and Wolff, C. C. P. (1997), "The Dynamics of Short-Term Interest Rate Volatility Reconsidered," European Finance Review, 1, 105–130.

Li, H., and Xu, Y. (2000), "Regime Shifts and Short Rate Dynamics," unpublished manuscript, Cornell University.

Lo, A., and MacKinlay, A. C. (1989), "Data-Snooping Biases in Tests of Financial Asset Pricing Models," Review of Financial Studies, 3, 175–208.

Lukacs, E. (1970), Characteristic Functions (2nd ed.), London: Charles Griffin.

Meese, R., and Rogoff, K. (1983), "Empirical Exchange Rate Models of the Seventies: Do They Fit Out-of-Sample?" Journal of International Economics, 14, 3–24.

Nelson, D. (1990), "ARCH Models as Diffusion Approximation," Journal of Econometrics, 45, 7–39.

Nelson, D. (1991), "Conditional Heteroskedasticity in Asset Returns: A New Approach," Econometrica, 59, 347–370.

Rosenblatt, M. (1952), "Remarks on a Multivariate Transformation," The Annals of Mathematical Statistics, 23, 470–472.

Rothschild, M., and Stiglitz, J. (1970), "Increasing Risk: I. A Definition," Journal of Economic Theory, 2, 225–243.

Singleton, K. (2001), "Estimation of Affine Asset Pricing Models Using the Empirical Characteristic Function," Journal of Econometrics, 102, 111–141.

Soderlind, P., and Svensson, L. (1997), "New Techniques to Extract Market Expectations From Financial Instruments," Journal of Monetary Economics, 40, 383–429.

Stanton, R. (1997), "A Nonparametric Model of Term Structure Dynamics and the Market Price of Interest Rate Risk," Journal of Finance, 52, 1973–2002.

Vasicek, O. (1977), "An Equilibrium Characterization of the Term Structure," Journal of Financial Economics, 5, 177–188.

Watson, M. (1993), "Measures of Fit for Calibrated Models," Journal of Political Economy, 101, 1011–1041.

White, H. (2000), "A Reality Check for Data Snooping," Econometrica, 69, 1097–1127.


Figure 1. Daily 1-Month T-Bill Rates Between June 14, 1961 and December 29, 2000. This figure plots the level and change series of the daily 1-month T-bill rates, as well as their histograms. The data are extracted from the CRSP bond file using the midpoint of quoted bid and ask prices of the T-bills with a remaining maturity that is closest to 1 month (30 calendar days) from the current date.

sharp probability peak around 0. Daily changes of other spot interest rates, such as 3-month T-bill and 7-day Eurodollar rates, also exhibit a high peak around 0. But the peak is much less pronounced for weekly changes in spot rates. This, together with the long (right) tail, implies excess kurtosis, a stylized fact that has motivated the use of jump models in the literature (e.g., Das 2002; Johannes 2003).

We divide our data into two subsamples. The first, from June 14, 1961 to February 22, 1991 (with a total of 7,400 observations), is a sample used to estimate model parameters; the second, from February 23, 1991 to December 29, 2000 (with a total of 2,467 observations), is a prediction sample used to evaluate out-of-sample density forecasts. We first consider the in-sample performance of the various models listed in Table 1, all of which are estimated using the MLE method. The optimization algorithm is the well-known BHHH, with STEPBT for step length calculation, and is implemented via the constrained optimization code in GAUSS Windows Version 5.0. The optimization tolerance level is set such that the gradients of the parameters are less than or equal to $10^{-6}$.

Tables 2–5 report parameter estimates of four classes of spot rate models using daily 1-month T-bill rates from June 14, 1961 to February 22, 1991, with a total of 7,400 observations. To account for the special period between October 1979 and September 1982, we introduce dummy variables to the drift, volatility, and elasticity parameters; that is, $\alpha_D$, $\sigma_D$, and $\rho_D$ equal 0 outside of the 1979–1982 period. We also introduce a dummy variable, D87, in the interest rate level to account for the effect of the 1987 stock market crash; D87 equals 0 except for the week right after the stock market crash. Estimated robust standard errors are reported in parentheses.

4.2 In-Sample Evidence

We first discuss the in-sample performance of models within each class and then compare their performance across different classes. Many previous studies (e.g., Bliss and Smith 1998) have shown that the period between October 1979 and September 1982, which coincides with the Federal Reserve experiment, has a big impact on model parameter estimates. To account for the special feature of this period, we introduce dummy variables in the drift, volatility, and elasticity parameters for this period. We also introduce a dummy variable in the interest rate level for the week right after the 1987 stock market crash. Brenner et al. (1996) showed that the decline of the interest rates during this week is too dramatic to be captured by most standard models.

Table 2 reports parameter estimates with estimated robust standard errors and log-likelihood values for discretized single-factor diffusion models. The estimates of the drift parameters of the Vasicek, CIR, and CKLS models all suggest mean reversion in the conditional mean, with an estimated 6% long-run mean. For other models, such as the random walk, log-normal, and nonlinear drift models, most drift parameters are not significant. A comparison of the pure CEV, CKLS, and Ait-Sahalia (1996) nonlinear drift models indicates that the incremental contribution of nonlinear drift is very marginal, which is not surprising given that the drift parameter estimates in the nonlinear drift model are mostly insignificant. On the other hand, there is also clear evidence of a level effect: all estimates of the elasticity parameter are significant. Unlike previous studies (e.g., CKLS 1992), which found an estimated elasticity parameter close to 1.5, our estimate is about .25 (.70) outside (inside) the 1979–1982 period. Our results confirm previous findings of Brenner et al. (1996), Andersen and Lund (1997), Bliss and Smith (1998), and Koedijk, Nissen, Schotman, and Wolff (1997) that the estimate of the elasticity parameter is very sensitive to the choice of interest rate data, data frequency, sample periods, and specifications of the volatility function. The 1979–1982 period dummies suggest that the drift does not behave very differently (i.e., $\alpha_D$ is not significant), whereas volatility is significantly higher and depends more heavily on


Table 2. Parameter Estimates for the Single-Factor Diffusion Models

| Parameters | RW | Log-normal | Dothan | Pure CEV | Vasicek | CIR | CKLS | Nonlinear drift |
|---|---|---|---|---|---|---|---|---|
| $\alpha_{-1}$ | | | | | | | | .0289 (.0751) |
| $\alpha_0$ | .0012 (.0019) | | | | .0196 (.0056) | .0190 (.0049) | .0199 (.0053) | −.0098 (.0485) |
| $\alpha_1$ | | .0006 (.0004) | | | −.0033 (.0010) | −.0032 (.0009) | −.0034 (.0009) | .0045 (.0095) |
| $\alpha_2$ | | | | | | | | −.0006 (.0006) |
| $\sigma$ | .1523 (.0013) | .0304 (.0003) | .0304 (.0003) | .1011 (.0041) | .1522 (.0013) | .0658 (.0006) | .1007 (.0041) | .1007 (.0041) |
| $\rho$ | | | | .2437 (.0238) | | | .2460 (.0238) | .2460 (.0238) |
| $\alpha_D$ | −.0047 (.0157) | .0196 (.0186) | | | .0154 (.0168) | .0233 (.0176) | .0266 (.0168) | .0402 (.0202) |
| $\sigma_D$ | .2768 (.0112) | .0175 (.0013) | .0175 (.0013) | | .2764 (.0112) | .0712 (.0036) | | |
| $\rho_D$ | | | | .3692 (.0132) | | | .3684 (.0132) | .3680 (.0132) |
| D87 | −.3988 (.0681) | −.2107 (.0660) | −.2079 (.0660) | −.3549 (.0671) | −.4003 (.0681) | −.3101 (.0655) | −.3574 (.0670) | −.3582 (.0671) |
| Log-likelihood | 2,649.6 | 2,187.0 | 2,184.9 | 2,654.5 | 2,655.5 | 2,672.0 | 2,661.7 | 2,662.5 |

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (\sigma + \sigma_D)\, r_{t-1}^{\rho + \rho_D} z_t + D87$, where $z_t \sim$ iid N(0, 1).

the interest rate level in this period (i.e., $\sigma_D$ and $\rho_D$ are significantly positive) than in other periods. For models with level effect, we introduce the dummy only for the elasticity parameter $\rho$, to avoid the offsetting effect of dummies in $\rho$ and $\sigma$, which would tend to generate unstable estimates. The stock market crash dummy is significantly negative, which is consistent with the findings of Brenner et al. (1996).

Estimation results of GARCH models in Table 3 show that GARCH significantly improves the in-sample fit of diffusion models (i.e., the log-likelihood value increases from about 2,500 to about 5,700). Previous studies, such as those by Bali (2001) and Durham (2003), have also shown that for single-factor diffusion, GARCH, and continuous-time stochastic volatility models, it is more important to correctly model the diffusion function than the drift function in fitting interest rate data. This suggests that volatility clustering is an important feature of interest rates. All estimates of GARCH parameters are overwhelmingly significant. The sum of GARCH parameter estimates, $\beta_1 + \beta_2$, is slightly larger (smaller) than 1 without (with) level effect. Although the sum of GARCH parameters is

Table 3. Parameter Estimates for the GARCH Models

| Parameters | No drift GARCH | Linear drift GARCH | Nonlinear drift GARCH | No drift CEV–GARCH | Linear drift CEV–GARCH | Nonlinear drift CEV–GARCH |
|---|---|---|---|---|---|---|
| $\alpha_{-1}$ | | | .1556 (.0470) | | | .1716 (.0457) |
| $\alpha_0$ | | .0053 (.0025) | −.1117 (.0323) | | .0063 (.0025) | −.1233 (.0311) |
| $\alpha_1$ | | −.0003 (.0006) | .0263 (.0069) | | −.0006 (.0006) | .0290 (.0066) |
| $\alpha_2$ | | | −.0018 (.0004) | | | −.0020 (.0004) |
| $\rho$ | | | | .1709 (.0298) | .1797 (.0301) | .1926 (.0300) |
| $\beta_0$ | 9.7E−05 (1.2E−05) | 8.6E−05 (1.2E−05) | 8.6E−05 (1.6E−05) | 8.9E−05 (1.0E−05) | 8.2E−05 (.9E−05) | 8.2E−05 (.9E−05) |
| $\beta_1$ | .1552 (.0097) | .1544 (.0096) | .1563 (.0099) | .0896 (.0104) | .0875 (.0101) | .0851 (.0098) |
| $\beta_2$ | .8707 (.0065) | .8723 (.0064) | .8712 (.0065) | .8583 (.0074) | .8582 (.0076) | .8553 (.0078) |
| $\alpha_D$ | | .0166 (.0318) | .1119 (.0229) | | .0550 (.0282) | .1317 (.0201) |
| $\sigma_D$ | .1368 (.0327) | .1302 (.0331) | .1017 (.0334) | | | |
| $\rho_D$ | | | | .0151 (.0134) | .0064 (.0146) | −.0053 (.0142) |
| D87 | −.5397 (.0774) | −.5426 (.0777) | −.5460 (.0771) | −.5288 (.0753) | −.5318 (.0751) | −.5352 (.0741) |
| Log-likelihood | 5,686.8 | 5,699.5 | 5,706.3 | 5,701.1 | 5,715.2 | 5,725.5 |

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (1 + \sigma_D)\, r_{t-1}^{\rho + \rho_D} \sqrt{h_t}\, z_t + D87$, where $h_t = \beta_0 + h_{t-1}(\beta_2 + \beta_1 r_{t-1}^{2\rho} z_{t-1}^2)$ and $z_t \sim$ iid N(0, 1).
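The variance recursion in the note to Table 3 can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the period dummies and the D87 term are omitted, the initial variance `h_init` is a user-supplied starting value (not something specified in the text), and the function name `garch_filter` and all parameter values are hypothetical.

```python
import numpy as np

def garch_filter(r, beta0, beta1, beta2, rho=0.0, h_init=None, drift=None):
    """Filter h_t = beta0 + h_{t-1} (beta2 + beta1 r_{t-1}^{2 rho} z_{t-1}^2).

    Returns arrays (h, z) for t = 1, ..., n - 1, where z_t is the
    standardized innovation implied by the level and GARCH scaling.
    """
    r = np.asarray(r, dtype=float)
    n = r.size
    if drift is None:
        drift = lambda lagged: 0.0          # no-drift case
    h = np.empty(n - 1)
    z = np.empty(n - 1)
    h[0] = beta0 if h_init is None else h_init   # assumed starting value
    for i in range(n - 1):
        t = i + 1
        if i > 0:
            h[i] = beta0 + h[i - 1] * (
                beta2 + beta1 * r[t - 1] ** (2.0 * rho) * z[i - 1] ** 2
            )
        shock = r[t] - r[t - 1] - drift(r[t - 1])
        z[i] = shock / (r[t - 1] ** rho * np.sqrt(h[i]))
    return h, z

# With a constant rate there are no shocks, so h_t converges to beta0 / (1 - beta2).
h, z = garch_filter(np.full(60, 5.0), beta0=1e-4, beta1=0.1, beta2=0.5,
                    rho=0.2, h_init=1e-4)
print(h[-1])
```

In the absence of shocks the recursion reduces to $h_t = \beta_0 + \beta_2 h_{t-1}$, which makes the role of $\beta_2$ as a persistence parameter easy to see.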


Table 4. Parameter Estimates for the Markov Regime-Switching Models

| Parameters | No drift CEV | Linear drift CEV | Nonlinear drift CEV | No drift GARCH | Linear drift GARCH | Nonlinear drift GARCH | No drift CEV–GARCH | Linear drift CEV–GARCH | Nonlinear drift CEV–GARCH |
|---|---|---|---|---|---|---|---|---|---|
| $\alpha_{-1}(1)$ | | | .0607 (.3569) | | | .3321 (.1701) | | | .2646 (.1605) |
| $\alpha_0(1)$ | | .0973 (.0259) | .1517 (.2027) | | .0457 (.0107) | −.1462 (.1009) | | .0450 (.0105) | −.1086 (.0989) |
| $\alpha_1(1)$ | | −.0102 (.0031) | −.0288 (.0321) | | −.0043 (.0019) | .0274 (.0181) | | −.0042 (.0019) | .0209 (.0185) |
| $\alpha_2(1)$ | | | .0011 (.0014) | | | −.0015 (.0010) | | | −.0011 (.0011) |
| $\sigma(1)$ | .3361 (.0302) | .3171 (.0287) | .3106 (.0281) | 1 | 1 | 1 | 1 | 1 | 1 |
| $\rho(1)$ | .1021 (.044) | .1277 (.0446) | .138 (.0448) | 0 | 0 | 0 | .1193 (.0452) | .1589 (.0460) | .1670 (.0487) |
| $\alpha_{-1}(2)$ | | | −.0352 (.0329) | | | −.0267 (.0242) | | | −.0286 (.0183) |
| $\alpha_0(2)$ | | −.0002 (.0020) | .0141 (.0229) | | −.0013 (.0017) | .0062 (.0153) | | −.0020 (.0018) | .0062 (.0104) |
| $\alpha_1(2)$ | | −.0006 (.0004) | −.0012 (.0048) | | −.0008 (.0004) | −.0001 (.0031) | | −.0007 (.0004) | .0005 (.0020) |
| $\alpha_2(2)$ | | | −.0001 (.0003) | | | −.0002 (.0002) | | | −.0002 (.0001) |
| $\sigma(2)$ | .0143 (.0010) | .0141 (.0010) | .0138 (.0010) | .2566 (.006) | .2567 (.006) | .2574 (.006) | .2769 (.0275) | .3034 (.0303) | .3051 (.0311) |
| $\rho(2)$ | .8122 (.0403) | .8179 (.0422) | .8257 (.0421) | 0 | 0 | 0 | .0844 (.0602) | .0727 (.0602) | .0800 (.0611) |
| $\beta_0$ | | | | 3.83E−4 (7.11E−5) | 3.18E−4 (6.14E−5) | 2.96E−4 (5.79E−5) | 3.21E−4 (.67E−4) | 2.43E−4 (.50E−4) | 2.25E−4 (.47E−4) |
| $\beta_1$ | | | | .0357 (.004) | .0336 (.0039) | .0335 (.0038) | .0265 (.0057) | .0259 (.0055) | .0252 (.0053) |
| $\beta_2$ | | | | .9027 (.009) | .9066 (.0089) | .9077 (.0086) | .8972 (.0099) | .8993 (.0098) | .9001 (.0094) |
| $c_1$ | .3132 (.2734) | .3001 (.273) | .3171 (.2683) | −1.2515 (.3211) | −1.2007 (.3065) | −1.1293 (.3136) | −1.1954 (.3819) | −1.1429 (.3705) | −1.0012 (.3674) |
| $d_1$ | .1081 (.0391) | .107 (.039) | .1029 (.0381) | .1303 (.044) | .1295 (.0426) | .1192 (.0447) | .1190 (.0594) | .1193 (.0582) | .0959 (.0593) |
| $c_2$ | 4.0963 (.2386) | 4.0342 (.2393) | 4.0089 (.2313) | 1.8864 (.2048) | 1.7868 (.1947) | 1.7633 (.1909) | 1.8227 (.2092) | 1.7299 (.1983) | 1.6947 (.2009) |
| $d_2$ | −.2302 (.0354) | −.225 (.0354) | −.2225 (.0339) | −.0938 (.0276) | −.0806 (.0263) | −.0776 (.0257) | −.0906 (.0285) | −.0810 (.0276) | −.0757 (.0283) |
| Log-likelihood | 6,493.1 | 6,507.9 | 6,512.8 | 7,079.0 | 7,132.8 | 7,138.2 | 7,082.8 | 7,138.6 | 7,144.1 |

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}(s_t)/r_{t-1} + \alpha_0(s_t) + \alpha_1(s_t)\, r_{t-1} + \alpha_2(s_t)\, r_{t-1}^2 + \sigma(s_t)\, r_{t-1}^{\rho(s_t)} \sqrt{h_t}\, z_t$, where $h_t = \beta_0 + \beta_1 E\{e_t \mid r_{t-2}, s_{t-2}\}^2 + \beta_2 h_{t-1}$, $e_t = [r_{t-1} - E(r_{t-1} \mid r_{t-2}, s_{t-1})]/\sigma(s_{t-1})$, $z_t \sim$ iid N(0, 1), and $s_t$ follows a two-state Markov chain with transition probability $\Pr(s_t = l \mid s_{t-1} = l) = (1 + \exp(-c_l - d_l r_{t-1}))^{-1}$ for $l = 1, 2$.

slightly larger than 1, it is still possible that the spot rate model is strictly stationary (see Nelson 1991 for more discussion). We also find a significant level effect even in the presence of GARCH, but the estimated elasticity parameter is close to .20, which is much smaller than that of single-factor diffusion models. The specification of conditional variance also affects the estimation of drift parameters. Unlike those in diffusion models, most drift parameter estimates in linear drift GARCH models are insignificant. Among all GARCH models considered, the one with nonlinear drift and level effect has the best in-sample performance. In the 1979–1982 period, the interest rate volatility is significantly higher, but its dependence on the interest rate level is not significantly different from other periods. The 1987 stock market crash dummy is again significantly negative.

Parameter estimates of regime-switching models given in Table 4 show that the spot rate behaves quite differently between two regimes. For models with a linear drift, in the first regime the spot rate has a high long-run mean (about 10%) and exhibits strong mean reversion (i.e., estimates of the mean-reversion speed parameter are significantly negative). The spot rate in the second regime behaves almost like a random walk, because most drift parameter estimates are close to 0 and insignificant. For models with a nonlinear drift, all drift parameters are insignificant. Volatility in the first regime is much higher, about three times that in the second regime. Our estimates show that level effect, although significant in both regimes, is much stronger in the second regime in the absence of GARCH. After including GARCH, the elasticity parameter estimate becomes insignificant in the second regime but remains the same in the first regime. Apparently regime switching helps capture volatility clustering: the sum of GARCH parameters, $\beta_1 + \beta_2$, is smaller in regime-switching models than in pure GARCH models. The estimated transition probabilities of the Markov state variable $s_t$ show that the low-volatility regime is much more persistent than the high-volatility regime. Our results suggest that the spot rate evolves like a random walk with low volatility most of the time, occasionally increasing and evolving with strong mean reversion and high volatility. Overall, the regime-switching model with a linear drift in each regime, CEV, and GARCH has the best in-sample performance.


Table 5. Parameter Estimates for the Jump-Diffusion Models

| Parameters | No drift CEV | Linear drift CEV | Nonlinear drift CEV | No drift GARCH | Linear drift GARCH | Nonlinear drift GARCH | No drift CEV–GARCH | Linear drift CEV–GARCH | Nonlinear drift CEV–GARCH |
|---|---|---|---|---|---|---|---|---|---|
| $\alpha_{-1}$ | | | .0264 (.0293) | | | .0702 (.0384) | | | .0661 (.0380) |
| $\alpha_0$ | | −.0011 (.0169) | −.0340 (.0203) | | .0028 (.0019) | −.0506 (.0256) | | .0026 (.0020) | −.0486 (.0253) |
| $\alpha_1$ | | −.0007 (.0004) | .0101 (.0043) | | −.0013 (.0004) | .0110 (.0053) | | −.0012 (.0004) | .0107 (.0052) |
| $\alpha_2$ | | | −.0009 (.0003) | | | −.0008 (.0003) | | | −.0008 (.0003) |
| $\sigma$ | .0127 (.0008) | .0126 (.0008) | .0124 (.0008) | | | | | | |
| $\rho$ | .8936 (.0390) | .8975 (.0397) | .9045 (.0400) | | | | .0929 (.0518) | .0852 (.0510) | .0837 (.0512) |
| $\beta_0$ | | | | 8.7E−05 (1.2E−05) | 8.8E−05 (1.2E−05) | 8.7E−05 (1.2E−05) | 7.6E−5 (1.2E−5) | 7.7E−5 (1.2E−5) | 7.7E−5 (1.2E−5) |
| $\beta_1$ | | | | .0631 (.0076) | .0640 (.0073) | .0635 (.0073) | .0450 (.0101) | .0472 (.0102) | .0472 (.0102) |
| $\beta_2$ | | | | .8514 (.0139) | .8484 (.0134) | .8499 (.0134) | .8504 (.0140) | .8471 (.0135) | .8483 (.0134) |
| $c$ | −2.7404 (.1608) | −2.7057 (.1605) | −2.6972 (.1614) | −3.7908 (.1982) | −3.7470 (.1970) | −3.7481 (.1976) | −3.6908 (.2097) | −3.6498 (.2084) | −3.6518 (.2093) |
| $d$ | .1451 (.0275) | .1423 (.0275) | .1397 (.0275) | .2554 (.0307) | .2516 (.0305) | .2509 (.0307) | .2293 (.0343) | .2274 (.0340) | .2272 (.0341) |
| $\mu$ | .0233 (.0133) | .0385 (.0142) | .0321 (.0128) | .0700 (.0152) | .0887 (.0158) | .0895 (.0158) | .0679 (.0162) | .0892 (.0164) | .0897 (.0163) |
| $\gamma$ | .4316 (.0117) | .4279 (.0120) | .4304 (.0104) | .3928 (.0197) | .3903 (.0189) | .3929 (.0187) | .4109 (.0199) | .4046 (.0182) | .4061 (.0178) |
| $\alpha_D$ | | −.0094 (.0121) | .0296 (.0112) | | −.0192 (.0117) | .0057 (.0154) | | −.0197 (.0112) | .0055 (.0150) |
| $\sigma_D$ | | | | −.1027 (.1142) | −.1328 (.1054) | −.1551 (.1033) | | | |
| $\rho_D$ | .1473 (.1079) | .1872 (.0703) | −.0697 (.0722) | | | | −.1220 (.0769) | −.1186 (.0573) | −.1277 (.0564) |
| $c_D$ | 4.1604 (.6523) | 4.3364 (.6535) | 4.0036 (.5477) | 4.0290 (.7495) | 4.2505 (.7546) | 4.1857 (.7421) | 3.8622 (.6911) | 4.0677 (.7129) | 4.0272 (.7019) |
| $d_D$ | −.2441 (.0808) | −.2721 (.0730) | −.1836 (.0534) | −.2748 (.0689) | −.2860 (.0682) | −.2791 (.0667) | −.2320 (.0685) | −.2487 (.0676) | −.2445 (.0662) |
| D87 | −.4283 (.0646) | −.4253 (.0641) | −.4266 (.0643) | −.4226 (.0826) | −.4198 (.0820) | −.4217 (.0815) | −.4233 (.0793) | −.4209 (.0793) | −.4225 (.0791) |
| Log-likelihood | 6,306.3 | 6,317.2 | 6,326.3 | 6,794.1 | 6,810.8 | 6,813.9 | 6,796.3 | 6,812.9 | 6,816.1 |

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (\sigma + \sigma_D)\, r_{t-1}^{\rho + \rho_D} \sqrt{h_t}\, z_t + J(\mu, \gamma^2)\, \pi_t(q_t + q_D) + D87$, where $h_t = \beta_0 + \beta_1 [r_{t-1} - E(r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}$, $z_t \sim$ iid N(0, 1), $J \sim N(\mu, \gamma^2)$, and $\pi_t(q_t + q_D)$ is Bernoulli($q_t + q_D$) with $q_t + q_D = (1 + \exp(-c - c_D - (d + d_D) \cdot r_{t-1}))^{-1}$.
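The state-dependent jump probability in the note to Table 5 is a logistic function of the lagged rate; the regime transition probabilities in Table 4 share the same functional form. A minimal Python sketch follows, with a hypothetical function name and hypothetical parameter values that are not the estimates reported in the tables.

```python
import math

def logistic_probability(r_lagged, c, d, c_dummy=0.0, d_dummy=0.0):
    """q = 1 / (1 + exp(-(c + c_D) - (d + d_D) * r_{t-1}))."""
    index = (c + c_dummy) + (d + d_dummy) * r_lagged
    return 1.0 / (1.0 + math.exp(-index))

# Hypothetical values: with d > 0 the probability rises with the rate level.
for rate in (3.0, 6.0, 9.0):
    print(rate, round(logistic_probability(rate, c=-3.0, d=0.2), 4))
```

Setting `c_dummy` and `d_dummy` to nonzero values mimics the 1979–1982 period dummies $c_D$ and $d_D$, which shift the intercept and slope of the logistic index.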

Table 5 reports parameter estimates for discretized jump-diffusion models. There is some weak evidence of mean reversion, especially when there is GARCH. The contribution of nonlinear drift is again very marginal: most estimated drift parameters are insignificant. For jump-diffusion models without GARCH, the elasticity parameter estimate is about .9, which is closer to the 1.5 found by CKLS. However, level effect is weakened (i.e., $\rho$ becomes close to .1) after GARCH is introduced. It seems that CEV helps capture part of volatility clustering, but its importance diminishes in the presence of GARCH. We find that GARCH also significantly improves the performance of jump-diffusion models. Without GARCH, there is a high probability of small jumps, with an estimated mean jump size between 2% and 4%. With GARCH, the jump probability becomes smaller, but the mean jump size increases to between about 7% and 9%. This suggests that without GARCH, jumps can capture part of volatility clustering, but with GARCH, jumps mainly capture large interest rate movements. Of course, jumps also help explain volatility clustering: the sum of GARCH parameters is again much smaller than that in pure GARCH models. During the 1979–1982 period, other than the probability of jumps becoming substantially higher, other aspects of the jump-diffusion models (e.g., the drift, volatility, or elasticity parameter) do not behave very differently.

To sum up, our in-sample analysis reveals some important stylized facts for the spot rate:

1. The importance of modeling mean reversion in the mean is ambiguous. It seems that linear drift is adequate for this purpose once GARCH, regime switching, or jumps are included. The contribution of Ait-Sahalia's (1996) type of nonlinear drift is very marginal.

2. It is important to model conditional heteroscedasticity through GARCH or level effect, although GARCH seems to have much better in-sample performance than CEV.

3. Regime switching and jumps help capture volatility clustering and especially the excess kurtosis and heavy tails of the interest rate.

4. Interest rates behave quite differently during the 1979–1982 period. The level and volatility of the interest rate, the dependence of the interest rate volatility on the interest rate level, and the probability of jumps are much higher during this period.


5. OUT-OF-SAMPLE DENSITY FORECAST PERFORMANCE

The foregoing in-sample analysis demonstrates that model-ing volatility clustering through GARCH and heavy tails andexcess kurtosis through regime switching or jumps significantlyimproves the in-sample fit of historical interest rate data How-ever it is not clear whether these models will also performwell in out-of-sample density forecasts for future interest ratesIndeed some previous studies have shown that more compli-cated models actually underperform the simple random walkmodel in predicting the conditional mean of future interestrates (eg Duffee 2002) In many financial applications weare interested in forecasting the whole conditional distribu-tion We now apply the generalized spectral evaluation methoddescribed in Section 2 to evaluate density forecasts of the mod-els under study Among other things we examine whetherthe features found to be important for in-sample performanceremain important for out-of-sample forecasts and whether thebest in-sample performing models still perform best in out-of-sample forecasts

For each model, we first calculate $Z_t(\hat\theta)$, the model generalized residuals, using the prediction sample, with model parameter estimates based on the estimation sample. Table 6 reports the out-of-sample evaluation statistic $M_1$ for each model. To compute $M_1$, we select a normal cdf $\Phi(\sqrt{12}\,u)$ for the weighting function $W(u)$ and a Bartlett kernel for the kernel function $k(z)$. To choose a lag order $p$, we use a data-driven method that minimizes the asymptotic integrated mean squared error of the modified generalized spectral density. This method involves choosing a preliminary lag order $\bar p$. We examine a wide range of $\bar p$ in our application. We obtain similar results for the preliminary lag order $\bar p$ between 10 and 30, and report results for $\bar p = 20$ in Table 6. For comparison, we also report the in-sample log-likelihood values obtained from the estimation sample.

The $M_1$ statistics show that all single-factor diffusion models are overwhelmingly rejected. One of the most interesting findings is that models that include a drift term (either linear or nonlinear) have much worse out-of-sample performance than those without a drift. Dothan's (1978) model and the pure CEV model have the best density forecast performance. Both models have a zero drift and level effect: the elasticity parameter ρ = 1 for the former and 25 (60 for the 1979–1982 period) for the latter. On the other hand, although the CKLS model and Ait-Sahalia's (1996) nonlinear drift model have the highest in-sample likelihood values, they have the worst out-of-sample density forecasts in terms of $M_1$. This seems to suggest that for the purpose of density forecast, it is much more important to model the diffusion function than the drift function. Of course, this does not necessarily mean that the drift is not important.

Table 6. Out-of-Sample Density Forecast Performance of Spot Interest Rate Models

Model                              M1         Log-likelihood

Single-factor diffusion models
  Random walk                      1.08***    26496
  Log-normal                       1.21***    21870
  Dothan                            .67**     21849
  Pure CEV                          .82**     26545
  Vasicek                          2.13***    26555
  CIR                              2.19***    26720
  CKLS                             2.19***    26617
  Nonlinear drift                  2.24***    26625

GARCH models
  No drift GARCH                    .79**     56868
  Linear drift GARCH               2.76***    56995
  Nonlinear drift GARCH            3.39***    57063
  No drift CEV–GARCH                .77**     57011
  Linear drift CEV–GARCH           2.91***    57152
  Nonlinear drift CEV–GARCH        3.58***    57255

Regime-switching models
  No drift RS CEV                  1.29***    64931
  Linear drift RS CEV               .91***    65079
  Nonlinear drift RS CEV           1.62***    65128
  No drift RS GARCH                1.18***    70790
  Linear drift RS GARCH             .47*      71328
  Nonlinear drift RS GARCH          .55**     71382
  No drift RS CEV–GARCH            1.24***    70828
  Linear drift RS CEV–GARCH         .56**     71386
  Nonlinear drift RS CEV–GARCH      .67**     71441

Jump-diffusion models
  No drift JD CEV                  1.80***    63063
  Linear drift JD CEV               .39*      63172
  Nonlinear drift JD CEV            .76**     63263
  No drift JD GARCH                1.78***    67941
  Linear drift JD GARCH             .56**     68108
  Nonlinear drift JD GARCH          .67**     68139
  No drift JD CEV–GARCH            1.74***    67963
  Linear drift JD CEV–GARCH         .53**     68129
  Nonlinear drift JD CEV–GARCH      .65**     68161

NOTE: This table reports the out-of-sample density forecast performance of the single-factor diffusion, GARCH, Markov regime-switching, and jump-diffusion models, measured by the M1 statistic. For convenience of comparison, we also report the in-sample log-likelihood value for each model. The in-sample parameter estimation is based on the observations from June 14, 1961 to February 2, 1991, and the out-of-sample density forecast evaluation is based on the observations from February 23, 1991 to December 29, 2000. The ratio between the size of the estimation sample and the forecast sample is about 3:1. In computing the out-of-sample M1 statistics, a preliminary lag order $\bar p$ of 20 is used. Other lag orders give similar results. The asymptotic critical values for the out-of-sample evaluation statistic are .87, .51, and .37 at the 1%, 5%, and 10% levels, and ***, **, and * indicate that the M1 statistic is significant at the 1%, 5%, and 10% levels.

Journal of Business & Economic Statistics, October 2004

It might be simply because the drift specifications considered are severely misspecified and do not outperform the zero drift model. In fact, most drift parameter estimates have large standard errors and thus have excess sampling variation in parameter estimation. As a consequence, assuming zero drift may result in a smaller adverse effect on out-of-sample performance.

The generalized residuals $Z_t(\hat\theta)$ contain much information about possible sources of model misspecification. Instead of being uniform, the histograms of the generalized residuals for all diffusion models (Fig. 2) exhibit a high peak in the center of the distribution, which indicates that the diffusion models cannot satisfactorily capture the marginal density of the spot rate. The separate inference statistics $M(m,l)$ in Table 7 also show that all the models fail to satisfactorily capture the dynamics of the generalized residuals. In particular, the large $M(2,2)$ and $M(4,4)$ statistics indicate that diffusion models perform rather poorly in modeling the conditional variance and kurtosis of the generalized residuals.

Similar to single-factor diffusion models, GARCH models with a zero drift have much better density forecasts than those models with a linear or nonlinear drift. The best GARCH and diffusion models have $M_1$ statistics roughly between .7 and .8. Comparing the best GARCH and single-factor diffusion models, we find that GARCH provides better density forecasts than CEV, and combining CEV and GARCH further improves the density forecasts. Diagnostic analysis of generalized residuals shows that GARCH models provide certain improvements over diffusion models. For example, the histograms of the generalized residuals of GARCH models have a less-pronounced peak in the center. The $M(m,l)$ statistics show that GARCH models significantly improve the fitting of even-order moments of the generalized residuals: the $M(2,2)$ and $M(4,4)$ statistics are reduced from close to 100 to single digits.

Figure 2. Histograms of the Generalized Residuals of Four Classes of Spot Rate Models. This figure plots the histograms of the generalized residuals of the single-factor diffusion, GARCH, regime-switching, and jump-diffusion spot rate models [panels (a) through (h)].

In summary, our analysis of single-factor diffusion and GARCH models demonstrates that both classes of models fail to provide good density forecasts for future interest rates. They fail to capture both the high peak in the center of the marginal density and the dynamics of different moments of the generalized residuals. GARCH models have a more uniform marginal density and significantly improve the ability to model the dynamics of even-order moments of the generalized residuals. For both classes of models, modeling the conditional variance through CEV or GARCH is much more important than modeling the conditional mean of the interest rate. Zero-drift models outperform either linear or nonlinear drift models in density forecast, although the $M(1,1)$ statistics suggest that neither model adequately captures the dynamic structure in the conditional mean of the generalized residuals.

We next turn to regime-switching models. Regime-switching models generally have better density forecasts than single-factor diffusion and GARCH models: the best regime-switching models reduce the $M_1$ statistics of the best diffusion and GARCH models from above .7 to about .5. Although modeling the drift does not improve density forecast for diffusion and GARCH models, the best regime-switching models in density forecast have a linear drift in each regime. As pointed out by Ang and Bekaert (2002) and Li and Xu (2000), a regime-switching model with a linear drift in each regime actually implies nonlinear conditional mean dynamics for the interest rate. Thus our results suggest that perhaps a specific nonlinear drift is needed for out-of-sample density forecasts. GARCH is more important than CEV for density forecasts, and combining CEV with GARCH does not further improve model performance. Overall, the regime-switching GARCH model with a linear drift in each regime provides the best out-of-sample forecasts among all of the regime-switching models. Diagnostic analysis shows that regime-switching models provide a much better characterization of the marginal density of interest rates: the histograms of the generalized residuals are much closer to U[0, 1]. The $M(2,2)$ and $M(4,4)$ statistics of regime-switching models are slightly higher than those of GARCH models, however, and the $M(1,1)$ and $M(3,3)$ statistics of regime-switching models are actually higher than those of diffusion and GARCH models. Therefore, the advantages of regime-switching models seem to come from better modeling of the marginal density rather than from the dynamics of the generalized residuals.

The density forecast performance of jump-diffusion models shares many common features with that of regime-switching models. First, they have comparable performance: the $M_1$ statistics of the best jump-diffusion models are also around .5. Second, for jump-diffusion models, the linear drift specification outperforms those with no drift or nonlinear drift for density forecasts. Third, for jump-diffusion models, GARCH generally provides better density forecasts than CEV. Fourth, jump-diffusion models have similar advantages over single-factor diffusion and GARCH models. They have much better characterization of the marginal density of the generalized residuals, but they may not necessarily perform better in modeling the dynamics of the generalized residuals, as indicated by their large $M(m,l)$ statistics. The jump model with a linear drift and level effect, although it has the smallest $M_1$ statistic, apparently fails to capture the even-order moments of the generalized residuals, with $M(2,2)$ and $M(4,4)$ statistics significantly higher than other jump-diffusion models that include GARCH effects.

Table 7. Separate Inference Statistics for Out-of-Sample Density Forecast Performance

Model                              M(1,1)   M(1,2)   M(2,1)   M(2,2)   M(3,3)   M(4,4)

Single-factor diffusion models
  Random walk                      40.08     6.35    40.48    99.73    21.50    80.12
  Log-normal                       46.52     8.77    39.91   110.10    26.98    92.51
  Dothan                           46.04     8.07    42.62   108.90    27.02    91.03
  Pure CEV                         41.62     6.63    41.55    96.81    22.85    80.42
  Vasicek                          40.40     8.13    35.55   100.20    21.43    81.79
  CIR                              43.61     8.68    36.30    98.57    24.39    85.02
  CKLS                             42.02     8.55    35.75    97.89    22.80    82.77
  Nonlinear drift                  42.40     8.37    35.93    98.47    22.77    82.98

GARCH models
  No drift GARCH                   40.73     3.75    28.21     7.68    17.57     2.53
  Linear drift GARCH               41.07     4.66    25.11     7.53    17.00     2.35
  Nonlinear drift GARCH            40.71     4.87    24.70     7.02    16.63     2.09
  No drift CEV–GARCH               41.30     4.10    28.40     7.46    19.24     2.74
  Linear drift CEV–GARCH           41.68     5.11    25.06     7.09    18.73     2.44
  Nonlinear drift CEV–GARCH        41.43     5.45    24.32     6.45    18.54     2.17

Regime-switching models
  No drift RS CEV                  53.16     3.13    11.31     7.05    35.02    10.21
  Linear drift RS CEV              53.25     4.56    11.16     6.57    34.31     8.83
  Nonlinear drift RS CEV           53.75     5.12     9.84     6.56    34.50     8.49
  No drift RS GARCH                57.29     4.05    21.59    10.95    43.71     7.63
  Linear drift RS GARCH            55.46     4.56    21.09    12.38    40.71     9.14
  Nonlinear drift RS GARCH         55.69     4.50    21.05    12.14    41.13     8.93
  No drift RS CEV–GARCH            45.35     3.69    20.91    10.67    34.25     7.08
  Linear drift RS CEV–GARCH        44.10     4.86    20.75    11.99    32.12     8.87
  Nonlinear drift RS CEV–GARCH     44.36     4.84    20.55    11.40    32.46     8.32

Jump-diffusion models
  No drift JD CEV                  47.38     5.32    16.76    85.10    43.01    99.31
  Linear drift JD CEV              46.32     4.48    19.63    85.09    42.26    99.82
  Nonlinear drift JD CEV           46.50     4.85    18.31    85.10    42.48    99.29
  No drift JD GARCH                43.32     5.10    20.35    12.38    28.44     8.96
  Linear drift JD GARCH            42.57     4.60    21.48    12.19    28.22     8.84
  Nonlinear drift JD GARCH         42.62     4.67    21.42    12.34    28.24     8.95
  No drift JD CEV–GARCH            43.55     5.05    20.51    13.36    29.31    10.07
  Linear drift JD CEV–GARCH        42.74     4.54    21.69    13.00    28.93     9.76
  Nonlinear drift JD CEV–GARCH     42.78     4.62    21.57    13.07    28.94     9.81

NOTE: This table reports the separate inference statistics M(m, l) for the four classes of spot rate models. The asymptotically normal statistic M(m, l) can be used to test whether the cross-correlation between the mth and lth moments of $\{Z_t\}$ is significantly different from 0. The choice of (m, l) = (1, 1), (2, 2), (3, 3), and (4, 4) is sensitive to autocorrelations in mean, variance, skewness, and kurtosis of $\{r_t\}$. We report results only for a lag truncation order p = 20; the results for p = 10 and 30 are rather similar. The asymptotic critical values are 1.28, 1.65, and 2.33 at the 10%, 5%, and 1% levels.

In summary, our out-of-sample density forecast evaluation reveals the following important features:

1. For out-of-sample density forecast, the importance of modeling mean reversion in the interest rate is ambiguous. For diffusion and GARCH models, a linear or nonlinear drift gives worse density forecasts than a zero drift. However, when there are regime switches or jumps, linear drift models outperform those with zero or nonlinear drift. The best density forecast models still fail to adequately capture the mean dynamics of the generalized residuals.

2. It is important to capture volatility clustering in interest rates for both in-sample and out-of-sample performance via GARCH. Most of the best-performing models have a GARCH component, which significantly improves their ability to model the dynamics of the even-order moments of the generalized residuals.

3. It is also important to capture excess kurtosis and heavy tails of the interest rate for both in-sample and out-of-sample performance via regime switching or jumps. The advantages of regime-switching and jump models are mainly reflected in modeling the marginal density rather than in the dynamics of the generalized residuals.

Overall, our out-of-sample analysis demonstrates that more-complicated models that incorporate conditional heteroscedasticity and heavy tails of interest rates tend to have a better density forecast. This result is quite different from the results of studies that focus on conditional mean forecast. As widely documented in the literature, the predictable component in the conditional mean of the interest rate appears insignificant. As a result, random walk models tend to outperform more-sophisticated models in terms of mean forecast (e.g., Duffee 2002). However, density forecast includes all conditional moments, and as a result, those models that can capture the dynamics of higher-order moments tend to have better density forecasts. Our analysis suggests that the more-complicated spot rate models developed in the literature can indeed capture some important features of interest rates and are relevant in applications involving density forecasts of interest rates.


6. CONCLUSION

Despite the numerous empirical studies of spot interest rate models, little effort has been devoted to examining their out-of-sample forecast performance. We have contributed to the literature by providing the first (to our knowledge) comprehensive empirical analysis of the out-of-sample performance of a wide variety of popular spot rate models in forecasting the conditional density of future interest rates. Out-of-sample density forecasts help minimize the data snooping bias due to excessive searching for more-complicated models using similar datasets. The conditional density, which completely characterizes the full dynamics of interest rates, is also an essential input to many financial applications. Using a rigorous econometric evaluation procedure for density forecasts, we examined the out-of-sample performance of single-factor diffusion, GARCH, regime-switching, and jump-diffusion models.

Although previous studies have shown that simpler models, such as the random walk model, tend to have more accurate forecasts for the conditional mean of interest rates, we find that models that capture conditional heteroscedasticity and heavy tails of the spot rate have better out-of-sample density forecasts. GARCH significantly improves modeling of the conditional variance and kurtosis of the generalized residuals, whereas regime switching and jumps help in modeling the marginal density of interest rates. Therefore, our analysis shows that the more-sophisticated interest rate models developed in the literature can indeed capture some important features of interest rate data and perform better than simpler models in out-of-sample density forecasts. Although we have focused on density forecasts of the spot rates here, there is evidence that the term structure of interest rates is driven by multiple factors (e.g., Dai and Singleton 2000). Extending our analysis to forecast the joint density of multiple yields is an interesting issue that we will address in future research.

ACKNOWLEDGMENTS

The authors are grateful for the helpful comments of Canlin Li, Daniel Morillo, Larry Pohlman, and seminar participants at PanAgora Asset Management, the 2002 Econometric Society European Summer Meeting, and the 2003 Econometric Society North American Winter Meeting. They also thank the associate editor and two anonymous referees, whose comments and suggestions have greatly improved the article, and David George for editorial assistance. Hong's work was supported by National Science Foundation grant SES-0111769. Any errors are the authors' own.

APPENDIX: GENERALIZED SPECTRAL EVALUATION FOR DENSITY FORECASTS

Here we describe Hong's (2003) omnibus evaluation procedure for out-of-sample density forecasts, as well as a class of separate inference procedures. (For more discussion and formal treatment, see Hong 2003.)

A.1 Generalized Spectrum

Generalized spectrum was proposed by Hong (1999) as an analytic tool to provide an alternative to the power spectrum and the higher-order spectrum (e.g., bispectrum) in time series analysis. Define the centered generalized residual

\[ Z_t \equiv Z_t(\theta) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \theta)\,dr - \frac{1}{2}, \qquad t = R+1, \ldots, T, \]

where $p(r, t \mid I_{t-1}, \theta)$ is a conditional density model for time series $\{r_t\}$. Suppose that the series $\{Z_t\}_{t=-\infty}^{\infty}$ has marginal characteristic function $\varphi(u) \equiv E(e^{iuZ_t})$ and pairwise joint characteristic function $\varphi_j(u,v) \equiv E[\exp\{i(uZ_t + vZ_{t-|j|})\}]$, where $i \equiv \sqrt{-1}$, $u, v \in (-\infty, \infty)$, and $j = 0, \pm 1, \ldots$. The basic idea of generalized spectrum is to transform the series $\{Z_t\}$ and then consider the spectrum of the transformed series. Define the generalized covariance function

\[ \sigma_j(u,v) \equiv \operatorname{cov}\bigl(e^{iuZ_t}, e^{ivZ_{t-|j|}}\bigr), \qquad j = 0, \pm 1, \ldots. \]

Straightforward algebra yields $\sigma_j(u,v) = \varphi_j(u,v) - \varphi(u)\varphi(v)$. Because $\varphi_j(u,v) = \varphi(u)\varphi(v)$ for all $(u,v)$ if and only if $Z_t$ and $Z_{t-|j|}$ are independent (cf. Lukacs 1970), $\sigma_j(u,v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ over various lags, including those with zero autocorrelation in $\{Z_t\}$.
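The claim that $\sigma_j(u,v)$ detects dependence even when autocorrelation is zero can be illustrated with a minimal Python sketch. The function name `generalized_cov` and the simulated series $x_t = e_t e_{t-1}$ (with $e_t$ iid N(0,1)) are illustrative assumptions, not part of the paper. For this series the autocorrelations are all zero, yet a direct calculation gives a population value $\sigma_1(1,1) = 3^{-1/2} - 1/2 \approx .077$.

```python
import numpy as np

def generalized_cov(z, j, u, v):
    """Sample analogue of sigma_j(u, v) = cov(exp(iuZ_t), exp(ivZ_{t-|j|}))."""
    z = np.asarray(z, dtype=float)
    j = abs(j)
    lead, lag = z[j:], z[:len(z) - j]          # pairs (Z_t, Z_{t-|j|})
    joint = np.mean(np.exp(1j * (u * lead + v * lag)))
    return joint - np.mean(np.exp(1j * u * lead)) * np.mean(np.exp(1j * v * lag))

# Illustrative series (an assumption, not data from the paper):
# x_t = e_t * e_{t-1} is serially uncorrelated but not independent over time.
rng = np.random.default_rng(0)
e = rng.standard_normal(20001)
x = e[1:] * e[:-1]

rho1 = np.corrcoef(x[1:], x[:-1])[0, 1]        # ordinary lag-1 autocorrelation
sigma1 = generalized_cov(x, 1, 1.0, 1.0)       # generalized covariance at lag 1
print(f"lag-1 autocorrelation: {rho1:.3f}")
print(f"|sigma_1(1, 1)|:       {abs(sigma1):.3f}")
```

The autocorrelation should be close to zero while the modulus of the generalized covariance stays clearly away from zero, which is the kind of dependence a correlation-based diagnostic would miss.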

Under certain regularity conditions on the temporal dependence of $\{Z_t\}$, the Fourier transform of the generalized covariance function $\sigma_j(u,v)$ exists and is given by

\[ f(\omega,u,v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j(u,v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi]. \]

Like $\sigma_j(u,v)$, $f(\omega,u,v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ across various lags. An advantage of spectral analysis is that $f(\omega,u,v)$ incorporates all lags simultaneously. Thus it can capture the dependent processes in which serial dependence occurs only at higher-order lags. This can arise from seasonality (e.g., calendar effects) or time delay in financial markets. Moreover, $f(\omega,u,v)$ is particularly powerful in capturing the dependent alternatives whose serial dependence decays to 0 slowly as $j \to \infty$, such as slow mean-reverting or persistent volatility clustering processes. For such processes, $f(\omega,u,v)$ has a sharp peak at frequency 0.

The function $f(\omega,u,v)$ does not require any moment condition on $\{Z_t\}$. When the moments of $\{Z_t\}$ exist, however, we can differentiate $f(\omega,u,v)$ with respect to $(u,v)$ at $(0,0)$:

\[ f^{(0,m,l)}(\omega,0,0) \equiv \left. \frac{\partial^{m+l}}{\partial u^m\, \partial v^l} f(\omega,u,v) \right|_{(u,v)=(0,0)} = \frac{i^{m+l}}{2\pi} \sum_{j=-\infty}^{\infty} \operatorname{cov}\bigl(Z_t^m, Z_{t-|j|}^l\bigr)\, e^{-ij\omega}. \]

In particular, $f^{(0,1,1)}(\omega,0,0)$ is the well-known power spectral density. (For applications of the power spectral density in economics and finance, see, e.g., Granger 1969; Durlauf 1991; Watson 1993.) For this reason, $f(\omega,u,v)$ is called the "generalized spectral density" of $\{Z_t\}$. The parameters $(u,v)$ provide much flexibility in capturing linear and nonlinear dependence. Because $f(\omega,u,v)$ can be decomposed as a weighted sum of $f^{(0,m,l)}(\omega,0,0)$ over various $(m,l)$ via a Taylor series expansion around $(0,0)$, it contains information on all autocorrelations and cross-correlations in the various power terms of $\{Z_t\}$. Thus it can be used to develop an omnibus procedure against a wide range of dependent processes.
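The Taylor decomposition mentioned above can be written out explicitly. The following is a sketch under the added assumptions that all moments of $Z_t$ exist and that the expansion converges; it is meant only to show where the weights come from, not to restate a result from Hong (1999).

```latex
\[
f(\omega,u,v)
  = \sum_{m=1}^{\infty}\sum_{l=1}^{\infty}
    \frac{u^{m} v^{l}}{m!\,l!}\, f^{(0,m,l)}(\omega,0,0),
\]
% Terms with m = 0 or l = 0 drop out because sigma_j(0, v) = sigma_j(u, 0) = 0
% (the covariance of a constant with anything is zero).
```

Each term carries the cross-spectrum of $Z_t^m$ and $Z_{t-|j|}^l$, so the pair $(m,l) = (1,1)$ isolates dependence in level, $(2,2)$ dependence in squares, and so on.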

A.2 Omnibus Evaluation for Density Forecast

To evaluate out-of-sample density forecasts of model $p(r, t \mid I_{t-1}, \theta)$, Hong (2003) used the generalized spectrum to develop an omnibus test of iid $U[-\frac{1}{2}, \frac{1}{2}]$ for the centered generalized residual $\{Z_t\}$. To test $H_0$: $\{Z_t\} \sim$ iid $U[-\frac{1}{2}, \frac{1}{2}]$, we use the modified generalized covariance

\[ \sigma_j^U(u,v) \equiv \varphi_j(u,v) - \varphi^U(u)\varphi^U(v), \]

where $\varphi^U(u) \equiv \sin(u/2)/(u/2)$ is the characteristic function of a $U[-\frac{1}{2}, \frac{1}{2}]$ random variable. Because $\sigma_j^U(u,v)$ has incorporated the information of $U[-\frac{1}{2}, \frac{1}{2}]$, it can capture any pairwise serial dependence in $\{Z_t\}$ and any deviation from $U[-\frac{1}{2}, \frac{1}{2}]$. The associated generalized spectrum is

\[ f^U(\omega,u,v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j^U(u,v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi],\ u, v \in (-\infty, \infty). \]
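As a quick sanity check on the closed form $\varphi^U(u) = \sin(u/2)/(u/2)$, the sketch below compares it with a direct numerical evaluation of $E(e^{iuZ})$ for $Z \sim U[-\frac{1}{2}, \frac{1}{2}]$. The helper names `phi_uniform` and `phi_numeric`, and the midpoint-rule grid size, are illustrative choices rather than anything specified in the paper.

```python
import numpy as np

def phi_uniform(u):
    """Closed-form characteristic function of U[-1/2, 1/2]: sin(u/2) / (u/2).

    np.sinc(x) computes sin(pi x) / (pi x), so x = u / (2 pi) gives the
    desired ratio and handles u = 0 (value 1) without division by zero.
    """
    u = np.asarray(u, dtype=float)
    return np.sinc(u / (2.0 * np.pi))

def phi_numeric(u, m=20000):
    """Midpoint-rule approximation of E[exp(iuZ)] for Z ~ U[-1/2, 1/2]."""
    z = (np.arange(m) + 0.5) / m - 0.5         # midpoints of m equal cells
    return np.mean(np.exp(1j * u * z))

for u in (0.0, 1.0, 3.0):
    print(u, float(phi_uniform(u)), phi_numeric(u).real)
```

The two columns should agree to many decimal places, and the imaginary part of the numerical value vanishes by symmetry of the uniform density around 0.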

Under $H_0$, we have $\sigma_j^U(u,v) = 0$ for all $j \neq 0$, and so $f^U(\omega,u,v)$ becomes a known "flat" spectrum

\[ f_0^U(\omega,u,v) \equiv \frac{1}{2\pi}\, \sigma_0^U(u,v) = \frac{1}{2\pi} \bigl[\varphi^U(u+v) - \varphi^U(u)\varphi^U(v)\bigr], \qquad \omega \in [-\pi, \pi]. \]

When $Z_t$ and $Z_{t-j}$ are not independent at some lag $j \neq 0$, or when $Z_t$ is not $U[-\frac{1}{2}, \frac{1}{2}]$, we have $f^U(\omega,u,v) \neq f_0^U(\omega,u,v)$. Hence, to test $H_0$, one can check whether a consistent estimator of $f^U(\omega,u,v)$ is significantly different from $f_0^U(\omega,u,v)$.

Suppose that $\hat\theta$ is an estimator for model parameter $\theta$ based on the estimation sample $\{r_t\}_{t=1}^{R}$. To estimate $f^U(\omega,u,v)$ using the prediction sample $\{r_t\}_{t=R+1}^{N}$, we have to use the estimated proxies $\{\hat Z_t\}_{t=R+1}^{N}$, where

\[ \hat Z_t \equiv Z_t(\hat\theta) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \hat\theta)\,dr - \frac{1}{2}, \qquad t = R+1, \ldots, N. \]
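To make the split between estimation sample and prediction sample concrete, here is a minimal sketch of how $\hat Z_t$ might be computed for one very simple model. The model (a Gaussian random walk, $r_t \mid I_{t-1} \sim N(r_{t-1}, s^2)$), the function names, and the simulated data are all assumptions chosen for illustration; the paper's models are considerably richer.

```python
import math
import numpy as np

def normal_cdf(x):
    """Standard normal cdf, vectorized over arrays via math.erf."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def generalized_residuals(r, R):
    """Centered generalized residuals for a hypothetical Gaussian random walk.

    The volatility s is estimated from the first R observations only; the
    residuals are then evaluated on observations R+1, ..., N.
    """
    r = np.asarray(r, dtype=float)
    s_hat = np.std(np.diff(r[:R]))             # estimation sample only
    innov = np.diff(r)[R - 1:]                 # r_t - r_{t-1}, t = R+1, ..., N
    return normal_cdf(innov / s_hat) - 0.5     # model cdf at r_t, minus 1/2

# Simulated data from the same model (illustrative, not the paper's data)
rng = np.random.default_rng(1)
N, R = 4000, 3000
r = 5.0 + np.cumsum(0.1 * rng.standard_normal(N))
z_hat = generalized_residuals(r, R)
print(len(z_hat), z_hat.min(), z_hat.max(), z_hat.mean())
```

Because the simulated data come from the assumed model, the residuals should look roughly like draws from $U[-\frac{1}{2}, \frac{1}{2}]$; with a misspecified model the histogram would deviate from flat, as in the paper's Figure 2.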

Define the empirical measure

\[ \hat\sigma_j^U(u,v) \equiv \hat\varphi_j(u,v) - \varphi^U(u)\varphi^U(v), \qquad j = 0, \pm 1, \ldots, \pm(n-1), \]

where $n \equiv N - R$ is the size of the prediction sample and the empirical pairwise joint characteristic function is

\[ \hat\varphi_j(u,v) \equiv (n - |j|)^{-1} \sum_{t=R+|j|+1}^{N} \exp\bigl\{i\bigl(u\hat Z_t + v\hat Z_{t-|j|}\bigr)\bigr\}. \]

We introduce a class of smoothed kernel estimators

\[ \hat f^U(\omega,u,v) \equiv \frac{1}{2\pi} \sum_{j=1-n}^{n-1} (1 - |j|/n)^{1/2}\, k(j/p)\, \hat\sigma_j^U(u,v)\, e^{-ij\omega}, \]

where $k(\cdot)$ is a kernel and $p \equiv p(n)$ is a bandwidth/lag order. An example of $k(\cdot)$ is the Bartlett kernel

\[ k(z) = \begin{cases} 1 - |z| & \text{if } |z| \le 1, \\ 0 & \text{otherwise.} \end{cases} \]
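The lag weights that enter the smoothed estimator are easy to tabulate. The sketch below codes the Bartlett kernel and the combined weight $(1 - |j|/n)^{1/2} k(j/p)$; the function names and the values $n = 100$, $p = 20$ are illustrative assumptions.

```python
import numpy as np

def bartlett(z):
    """Bartlett kernel: 1 - |z| for |z| <= 1, and 0 otherwise."""
    z = np.abs(np.asarray(z, dtype=float))
    return np.where(z <= 1.0, 1.0 - z, 0.0)

def lag_weights(n, p):
    """Weights (1 - |j|/n)^(1/2) * k(j/p) for lags j = 1-n, ..., n-1."""
    j = np.arange(1 - n, n)
    return j, np.sqrt(1.0 - np.abs(j) / n) * bartlett(j / p)

j, w = lag_weights(n=100, p=20)
print(np.count_nonzero(w))         # lags that actually enter the estimator
print(w[j == 0], w[j == 10])       # weight at lag 0 and at lag 10
```

With the Bartlett kernel only lags with $|j| < p$ receive positive weight, so the infinite-looking sum reduces to a short one and the weight declines linearly in $|j|$ apart from the mild $(1 - |j|/n)^{1/2}$ correction.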

For the choice of $p$, Hong (1999) suggested a data-driven method that delivers an asymptotically optimal bandwidth in terms of an integrated mean squared error criterion. This still involves a preliminary bandwidth $\bar p$, but the impact of choosing $\bar p$ is much smaller than the impact of choosing lag order $p$.

A convenient divergence measure for $\hat f^U(\omega,u,v)$ and $f_0^U(\omega,u,v)$ is the quadratic form

\[
\begin{aligned}
2\pi n\, Q(\hat f^U, f_0^U)
 &\equiv 2\pi n \int\!\!\int\!\!\int_{-\pi}^{\pi} \bigl|\hat f^U(\omega,u,v) - f_0^U(\omega,u,v)\bigr|^2 \, d\omega\, dW(u)\, dW(v) \\
 &= n \int\!\!\int \bigl|\hat\varphi_0(u+v) - \varphi^U(u+v)\bigr|^2 \, dW(u)\, dW(v) \\
 &\quad + 2 \sum_{j=1}^{n-1} k^2(j/p)\,(n-j) \int\!\!\int \bigl|\hat\sigma_j^U(u,v)\bigr|^2 \, dW(u)\, dW(v),
\end{aligned}
\]

where the weighting function $W(\cdot)$ is positive and nondecreasing, and the unspecified integrals are taken over the support of $W(\cdot)$. An example of this is

\[ W(u) = \Phi(\sqrt{12}\,u), \]

where $\Phi(\cdot)$ is the N(0, 1) cdf, which is commonly used in the characteristic function literature. The scale $\sqrt{12}$ matches the standard deviation of $U[-\frac{1}{2}, \frac{1}{2}]$, but this is not necessary.

Our evaluation statistic for $H_0$ is a properly centered and scaled version of the quadratic form $Q(\hat f^U, f_0^U)$:

\[
M_1 \equiv \left[ \sum_{j=1}^{n-1} k^2(j/p) \right]^{-1}
 \times \int\!\!\int \sum_{j=1}^{n-1} k^2(j/p)\,(n-j)\, \bigl|\hat\sigma_j^U(u,v)\bigr|^2 \, dW(u)\, dW(v)
 - \left[ \int \bigl[1 - \varphi^U(u)^2\bigr] \, dW(u) \right]^2 .
\]

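The asymptotic distribution of $M_1$ was derived by Hong (2003). Under $H_0$ and a set of regularity conditions,

\[ M_1 \xrightarrow{d} \int |G(u,v)|^2 \, dW(u)\, dW(v) \]

as $n \to \infty$, $R \to \infty$, and $n^{1+\delta}/R \to 0$ for any arbitrarily small constant $\delta > 0$, where $G(u,v)$ is a complex-valued Gaussian process with mean 0 and a known covariance function

\[
\begin{aligned}
\operatorname{cov}[G(u_1,v_1), G(u_2,v_2)]
 &= E[G(u_1,v_1)\, G(u_2,v_2)^{*}] \\
 &= \sigma_0^U(u_1, -u_2)\,\varphi^U(v_1)\varphi^U(v_2) + \sigma_0^U(u_1, -v_2)\,\varphi^U(u_2)\varphi^U(v_1) \\
 &\quad + \sigma_0^U(v_1, -u_2)\,\varphi^U(u_1)\varphi^U(v_2) \\
 &\quad + \sigma_0^U(v_1, -v_2)\,\varphi^U(u_1)\varphi^U(u_2).
\end{aligned}
\]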


The limit distribution of $M_1$ is nonstandard. However, because the Gaussian process $G(u,v)$ has mean 0 and a known distribution-free covariance function, the limit distribution of $M_1$ can be easily tabulated. When $W(u) = \Phi(\sqrt{12}\,u)$ with a truncated support on $[-1, 1]$, the asymptotic critical values at the 10%, 5%, and 1% levels are .37, .51, and .87.
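The tabulated critical values translate directly into the significance stars reported alongside the $M_1$ statistics. A small sketch of that mapping follows; the function name `significance` is hypothetical, and the sample inputs are $M_1$ values of the magnitude reported in Table 6.

```python
# Asymptotic critical values of M1 quoted in the text, largest first.
CRITICAL_VALUES = ((0.87, "***"), (0.51, "**"), (0.37, "*"))

def significance(m1):
    """Return the star code for an M1 value (upper-tailed comparison)."""
    for critical_value, stars in CRITICAL_VALUES:
        if m1 > critical_value:
            return stars
    return ""

for m1 in (1.08, 0.67, 0.47, 0.39):
    print(f"M1 = {m1:.2f} {significance(m1)}")
```

Larger values indicate stronger evidence against the iid uniform null, so a model with one star (rejected only at the 10% level) is closer to an optimal density forecast than one with three.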

The most attractive feature of $M_1$ is its omnibus property: it has power against a wide range of suboptimal density forecasts, thanks to the use of the characteristic function in a spectral framework. The $M_1$ statistic can be viewed as an omnibus metric measuring the departure of the forecast model from the true data-generating process. A better density forecast model is expected to have a smaller value for $M_1$, because its generalized residual series $\{Z_t\}$ is closer to having the iid U[0, 1] property. Thus $M_1$ can be used to rank competing density forecast models in terms of their deviations from optimality. Moreover, the kernel function provides flexible weighting for various lags. Nonuniform weighting kernels usually discount higher-order lags. For most, if not all, economic and financial time series, today's behaviors are more affected by recent events than by remote events. Thus downward weighting for higher-order lags is expected to yield good power in finite samples.

A.3 Separate Inferences

When a model is rejected using $M_1$, it is interesting to explore possible reasons for the rejection. Hong (2003) provided a class of rigorous separate inference statistics that exploit the information contained in modeling generalized residuals $\{Z_t\}$ to understand possible reasons for model rejection. The $M(m,l)$ statistic is based on a kernel estimator of the generalized spectral density derivative $f^{(0,m,l)}(\omega,0,0)$, where $m$ and $l$ are the orders of the derivatives with respect to $u$ and $v$. The statistic $M(m,l)$ is defined as

\[
M(m,l) = \left[ \sum_{j=1}^{n-1} (n-j)\, k^2(j/p)\, \hat\rho_j^2(m,l) - \sum_{j=1}^{n-1} k^2(j/p) \right]
 \Bigg/ \left[ 2 \sum_{j=1}^{n-2} k^4(j/p) \right]^{1/2},
\]

where $\hat\rho_j(m,l)$ is the sample cross-correlation between $\hat Z_t^m$ and $\hat Z_{t-|j|}^l$, $t = R+1, \ldots, N$. Under $H_0$ and a set of regularity conditions,

\[ M(m,l) \xrightarrow{d} N(0,1) \]

for any given $(m,l)$ as $n, R \to \infty$. The upper-tailed N(0, 1) critical values at the 10%, 5%, and 1% levels are 1.28, 1.65, and 2.33. As in the $M_1$ test, Hong (2003) considered a data-driven method for $p$ that also involves the choice of a preliminary lag order $\bar p$.
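The separate inference statistic can be sketched in a few lines. The version below assumes the Bartlett kernel, a fixed lag order `p` (no data-driven selection), and a sample cross-correlation normalized by $n$ times the two sample standard deviations; these choices, the function name `m_stat`, and the test series are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def m_stat(z, m, l, p):
    """Sketch of M(m, l) with a Bartlett kernel and a fixed lag order p."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    a = z**m - np.mean(z**m)                   # demeaned Z_t^m
    b = z**l - np.mean(z**l)                   # demeaned Z_t^l
    scale = n * np.std(z**m) * np.std(z**l)
    lags = np.arange(1, n)
    k = np.clip(1.0 - lags / p, 0.0, None)     # Bartlett weights k(j/p)
    rho = np.zeros(n - 1)
    active = lags[k > 0]                       # only lags with positive weight
    rho[active - 1] = [a[j:] @ b[:n - j] / scale for j in active]
    centered = np.sum((n - lags) * k**2 * rho**2) - np.sum(k**2)
    return centered / np.sqrt(2.0 * np.sum(k[: n - 2] ** 4))

rng = np.random.default_rng(2)
z_iid = rng.uniform(-0.5, 0.5, 500)            # satisfies the null
z_dep = np.sort(z_iid)                         # same marginal, strong dependence
print(m_stat(z_iid, 1, 1, 20), m_stat(z_dep, 1, 1, 20))
```

Sorting keeps the uniform marginal intact while inducing extreme serial dependence, so the second value should lie far above the 1% critical value of 2.33 while the first behaves like a draw from a roughly standard normal distribution.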

Although the moments of the generalized residuals $\{Z_t\}$ are not exactly the same as those of the original data $\{r_t\}$, they are highly correlated. In particular, the choice of $(m,l) = (1,1)$, $(2,2)$, $(3,3)$, and $(4,4)$ is very sensitive to autocorrelations in level, volatility, skewness, and kurtosis of $\{r_t\}$. Furthermore, the choices of $(m,l) = (1,2)$ and $(2,1)$ are sensitive to the "ARCH-in-mean" effect and the "leverage" effect. Different choices of order $(m,l)$ can thus allow examination of various dynamic aspects of the underlying process.

[Received June 2002. Revised November 2003.]

REFERENCES

Ait-Sahalia, Y. (1996), "Testing Continuous-Time Models of the Spot Interest Rate," Review of Financial Studies, 9, 385–426.

Ait-Sahalia, Y. (1999), "Transition Densities for Interest Rate and Other Nonlinear Diffusions," Journal of Finance, 54, 1361–1395.

Ait-Sahalia, Y. (2002), "Maximum-Likelihood Estimation of Discretely Sampled Diffusions: A Closed-Form Approach," Econometrica, 70, 223–262.

Ait-Sahalia, Y., and Lo, A. (1998), "Nonparametric Estimation of State-Price Densities Implicit in Financial Asset Prices," Journal of Finance, 53, 499–547.

Andersen, T. G., and Lund, J. (1997), "Estimating Continuous Time Stochastic Volatility Models of the Short-Term Interest Rate," Journal of Econometrics, 77, 343–378.

Ang, A., and Bekaert, G. (2002), "Regime Switches in Interest Rates," Journal of Business & Economic Statistics, 20, 163–182.

Bali, T. (2000), "Modeling the Conditional Mean and Variance of the Short Rate Using Diffusion, GARCH, and Moving Average Models," Journal of Futures Markets, 20, 717–751.

Bali, T. (2001), "Nonlinear Parametric Models of the Short-Term Interest Rate," unpublished manuscript, Baruch College.

Ball, C., and Torous, W. (1998), "Regime Shifts in Short-Term Riskless Interest Rates," unpublished manuscript, University of California, Los Angeles.

Bera, A. K., and Ghosh, A. (2002), "Density Forecast Evaluation With Applications in Finance," unpublished manuscript, University of Illinois at Champaign, Dept. of Economics.

Berkowitz, J. (2001), "Testing the Accuracy of Density Forecasts With Applications to Risk Management," Journal of Business & Economic Statistics, 19, 465–474.

Bliss, R., and Smith, D. (1998), "The Elasticity of Interest Rate Volatility: Chan, Karolyi, Longstaff, and Sanders Revisited," unpublished manuscript, Federal Reserve Bank of Atlanta.

Brenner, R., Harjes, R., and Kroner, K. (1996), "Another Look at Alternative Models of Short-Term Interest Rate," Journal of Financial and Quantitative Analysis, 31, 85–107.

Chacko, G., and Viceira, L. M. (2003), "Spectral GMM Estimation of Continuous-Time Processes," Journal of Econometrics, 116, 259–292.

Chan, K. C., Karolyi, G. A., Longstaff, F. A., and Sanders, A. B. (1992), "An Empirical Comparison of Alternative Models of the Short-Term Interest Rate," Journal of Finance, 47, 1209–1228.

Chapman, D., and Pearson, N. (2001), "Recent Advances in Estimating Models of the Term-Structure," Financial Analysts Journal, 57, 77–95.

Christoffersen, P., and Diebold, F. X. (1996), "Further Results on Forecasting and Model Selection Under Asymmetric Loss," Journal of Applied Econometrics, 11, 561–572.

Christoffersen, P., and Diebold, F. X. (1997), "Optimal Prediction Under Asymmetric Loss," Econometric Theory, 13, 808–817.

Clements, M. P., and Smith, J. (2000), "Evaluating the Forecast Densities of Linear and Non-Linear Models: Applications to Output Growth and Unemployment," Journal of Forecasting, 19, 255–276.

Conley, T. G., Hansen, L. P., Luttmer, E. G. J., and Scheinkman, J. A. (1997), "Short-Term Interest Rates as Subordinated Diffusions," Review of Financial Studies, 10, 525–578.

Corradi, V. (2000), "Reconsidering the Continuous-Time Limit of the GARCH(1,1) Process," Journal of Econometrics, 96, 145–153.

Cox, J. C., Ingersoll, J. E., and Ross, S. A. (1985), "A Theory of the Term Structure of Interest Rates," Econometrica, 53, 385–407.

Dai, Q., and Singleton, K. (2000), "Specification Analysis of Affine Term Structure Models," Journal of Finance, 55, 1943–1978.

Das, S. R. (2002), "The Surprise Element: Jumps in Interest Rates," Journal of Econometrics, 106, 27–65.

Diebold, F. X., Gunther, T. A., and Tay, A. S. (1998), "Evaluating Density Forecasts With Applications to Financial Risk Management," International Economic Review, 39, 863–883.

Diebold, F. X., Hahn, J., and Tay, A. S. (1999), "Multivariate Density Forecast Evaluation and Calibration in Financial Risk Management: High-Frequency Returns of Foreign Exchange," Review of Economics and Statistics, 81, 661–673.

Diebold, F. X., Tay, A. S., and Wallis, K. F. (1999), "Evaluating Density Forecasts of Inflation: The Survey of Professional Forecasters," in Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W. J. Granger, eds. R. F. Engle and H. White, Oxford, U.K.: Oxford University Press, pp. 76–90.


Dothan, L. U. (1978), "On the Term Structure of Interest Rates," Journal of Financial Economics, 6, 59–69.

Duffee, G. (2002), "Term Premia and Interest Rate Forecasts in Affine Models," Journal of Finance, 57, 405–443.

Duffie, D., and Pan, J. (1997), "An Overview of Value at Risk," Journal of Derivatives, 4, 13–32.

Durham, G. (2003), "Likelihood-Based Specification Analysis of Continuous-Time Models of the Short-Term Interest Rate," Journal of Financial Economics, 70, 463–487.

Durlauf, S. (1991), "Spectral Based Testing of the Martingale Hypothesis," Journal of Econometrics, 50, 355–376.

Fackler, P. L., and King, R. P. (1990), "Calibration of Option-Based Probability Assessments in Agricultural Commodity Markets," American Journal of Agricultural Economics, 72, 73–83.

Gallant, A. R., and Tauchen, G. (1998), "Reprojection Partially Observed Systems With Applications to Interest Rate Diffusions," Journal of the American Statistical Association, 93, 10–24.

Ghysels, E., Carrasco, M., Chernov, M., and Florens, J. (2001), "Estimating Diffusions With a Continuum of Moment Conditions," unpublished manuscript, University of North Carolina.

Granger, C. (1969), "Testing for Causality and Feedback," Econometrica, 37, 424–438.

Granger, C. (1999), "The Evaluation of Econometric Models and of Forecasts," invited lecture at the Far Eastern Meeting of the Econometric Society, Singapore.

Granger, C., and Pesaran, M. H. (2000), "A Decision Theoretic Approach to Forecasting Evaluation," in Statistics and Finance: An Interface, eds. W. S. Chan, W. K. Li, and H. Tong, London: Imperial College, pp. 261–278.

Gray, S. (1996), "Modeling the Conditional Distribution of Interest Rates as a Regime-Switching Process," Journal of Financial Economics, 42, 27–62.

Hamilton, J. D. (1989), "A New Approach to the Econometric Analysis of Nonstationary Time Series and the Business Cycle," Econometrica, 57, 357–384.

Hong, Y. (1999), "Hypothesis Testing in Time Series via the Empirical Characteristic Function: A Generalized Spectral Density Approach," Journal of the American Statistical Association, 94, 1201–1220.

Hong, Y. (2003), "Evaluation of Out-of-Sample Density Forecasts With Applications to Stock Prices," unpublished manuscript, Cornell University, Dept. of Economics and Dept. of Statistical Science.

Hong, Y., and Lee, T. H. (2003a), "Inference on the Predictability of Foreign Exchange Rate Changes via Generalized Spectrum and Nonlinear Models," Review of Economics and Statistics, 84, 1048–1062.

Hong, Y., and Lee, T. H. (2003b), "Diagnostic Checking for Nonlinear Time Series Models," Econometric Theory, 19, 1065–1121.

Hong, Y., and Li, H. (2004), "Nonparametric Specification Testing for Continuous-Time Models With Applications to Interest Rate Term Structures," Review of Financial Studies, forthcoming.

Jackwerth, J. C., and Rubinstein, M. (1996), "Recovering Probability Distributions From Option Prices," Journal of Finance, 51, 1611–1631.

Jiang, G., and Knight, J. (1997), "A Nonparametric Approach to the Estimation of Diffusion Processes With an Application to a Short-Term Interest Rate Model," Econometric Theory, 13, 615–645.

Johannes, M. (2003), "The Statistical and Economic Role of Jumps in Interest Rates," Journal of Finance, 59, 227–260.

Jorion, P. (2000), Value at Risk: The New Benchmark for Managing Financial Risk, New York: McGraw-Hill.

Morgan, J. P. (1996), RiskMetrics—Technical Document (4th ed.), New York: Morgan Guaranty Trust Company.

Koedijk, K. G., Nissen, F. G., Schotman, P. C., and Wolff, C. C. P. (1997), "The Dynamics of Short-Term Interest Rate Volatility Reconsidered," European Finance Review, 1, 105–130.

Li, H., and Xu, Y. (2000), "Regime Shifts and Short Rate Dynamics," unpublished manuscript, Cornell University.

Lo, A., and MacKinlay, A. C. (1989), "Data-Snooping Biases in Tests of Financial Asset Pricing Models," Review of Financial Studies, 3, 175–208.

Lukacs, E. (1970), Characteristic Functions (2nd ed.), London: Charles Griffin.

Meese, R., and Rogoff, K. (1983), "Empirical Exchange Rate Models of the Seventies: Do They Fit Out-of-Sample?" Journal of International Economics, 14, 3–24.

Nelson, D. (1990), "ARCH Models as Diffusion Approximation," Journal of Econometrics, 45, 7–39.

Nelson, D. (1991), "Conditional Heteroskedasticity in Asset Returns: A New Approach," Econometrica, 59, 347–370.

Rosenblatt, M. (1952), "Remarks on a Multivariate Transformation," The Annals of Mathematical Statistics, 23, 470–472.

Rothschild, M., and Stiglitz, J. (1970), "Increasing Risk: I. A Definition," Journal of Economic Theory, 2, 225–243.

Singleton, K. (2001), "Estimation of Affine Asset Pricing Models Using the Empirical Characteristic Function," Journal of Econometrics, 102, 111–141.

Soderlind, P., and Svensson, L. (1997), "New Techniques to Extract Market Expectations From Financial Instruments," Journal of Monetary Economics, 40, 383–429.

Stanton, R. (1997), "A Nonparametric Model of Term Structure Dynamics and the Market Price of Interest Rate Risk," Journal of Finance, 52, 1973–2002.

Vasicek, O. (1977), "An Equilibrium Characterization of the Term Structure," Journal of Financial Economics, 5, 177–188.

Watson, M. (1993), "Measures of Fit for Calibrated Models," Journal of Political Economy, 101, 1011–1041.

White, H. (2000), "A Reality Check for Data Snooping," Econometrica, 69, 1097–1127.

464 Journal of Business & Economic Statistics, October 2004

Table 2 Parameter Estimates for the Single-Factor Diffusion Models

NonlinearParameters RW Log-normal Dothan Pure CEV Vasicek CIR CKLS drift

αminus1 0289(0751)

α0 0012 0196 0190 0199 minus0098(0019) (0056) (0049) (0053) (0485)

α1 0006 minus0033 minus0032 minus0034 0045(0004) (0010) (0009) (0009) (0095)

α2 minus0006(0006)

σ 1523 0304 0304 1011 1522 0658 1007 1007(0013) (0003) (0003) (0041) (0013) (0006) (0041) (0041)

ρ 2437 2460 2460(0238) (0238) (0238)

αD minus0047 0196 0154 0233 0266 0402(0157) (0186) (0168) (0176) (0168) (0202)

σD 2768 0175 0175 2764 0712(0112) (0013) (0013) (0112) (0036)

ρD 3692 3684 3680(0132) (0132) (0132)

D87 minus3988 minus2107 minus2079 minus3549 minus4003 minus3101 minus3574 minus3582(0681) (0660) (0660) (0671) (0681) (0655) (0670) (0671)

Log-likelihood 26496 21870 21849 26545 26555 26720 26617 26625

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (\sigma + \sigma_D)\, r_{t-1}^{\rho + \rho_D} z_t + D_{87}$, where $z_t \sim \text{iid } N(0, 1)$.

the interest rate level in this period (i.e., σ_D and ρ_D are significantly positive) than in other periods. For models with level effect, we introduce the dummy only for the elasticity parameter ρ, to avoid the offsetting effect of dummies in ρ and σ, which would tend to generate unstable estimates. The stock market crash dummy is significantly negative, which is consistent with the findings of Brenner et al. (1996).

Estimation results of GARCH models in Table 3 show that GARCH significantly improves the in-sample fit of diffusion models (i.e., the log-likelihood value increases from about 2,500 to about 5,700). Previous studies, such as those by Bali (2001) and Durham (2003), have also shown that for single-factor diffusion, GARCH, and continuous-time stochastic volatility models, it is more important to correctly model the diffusion function than the drift function in fitting interest rate data. This suggests that volatility clustering is an important feature of interest rates. All estimates of GARCH parameters are overwhelmingly significant. The sum of GARCH parameter estimates, β1 + β2, is slightly larger (smaller) than 1 without (with) level effect. Although the sum of GARCH parameters is

Table 3 Parameter Estimates for the GARCH Models

No drift Linear drift Nonlinear drift No drift Linear drift Nonlinear driftParameters GARCH GARCH GARCH CEVndashGARCH CEVndashGARCH CEVndashGARCH

αminus1 1556 1716(0470) (0457)

α0 0053 minus1117 0063 minus1233(0025) (0323) (0025) (0311)

α1 minus0003 0263 minus0006 0290(0006) (0069) (0006) (0066)

α2 minus0018 minus0020(0004) (0004)

ρ 1709 1797 1926(0298) (0301) (0300)

β0 97Eminus05 86Eminus05 86Eminus05 89Eminus05 82Eminus05 82Eminus05(12Eminus05) (12Eminus05) (16Eminus05) (10Eminus05) (9Eminus05) (9Eminus05)

β1 1552 1544 1563 0896 0875 0851(0097) (0096) (0099) (0104) (0101) (0098)

β2 8707 8723 8712 8583 8582 8553(0065) (0064) (0065) (0074) (0076) (0078)

αD 0166 1119 0550 1317(0318) (0229) (0282) (0201)

σD 1368 1302 1017(0327) (0331) (0334)

ρD 0151 0064 minus0053(0134) (0146) (0142)

D87 minus5397 minus5426 minus5460 minus5288 minus5318 minus5352(0774) (0777) (0771) (0753) (0751) (0741)

Log-likelihood 56868 56995 57063 57011 57152 57255

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (1 + \sigma_D)\, r_{t-1}^{\rho + \rho_D} \sqrt{h_t}\, z_t + D_{87}$, where $h_t = \beta_0 + h_{t-1}(\beta_2 + \beta_1 r_{t-1}^{2\rho} z_{t-1}^2)$ and $z_t \sim \text{iid } N(0, 1)$.
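As a rough illustration of the variance recursion in the note to Table 3, the following Python sketch simulates a no-drift CEV–GARCH path with dummies switched off. The function name, the parameter values in the usage line, the starting values for h and z, and the positivity floor on the rate are all assumptions made for illustration; they are not the article's estimates or estimation code.

```python
import math
import random


def simulate_cev_garch(r0, n, beta0, beta1, beta2, rho, seed=0):
    """Simulate r_t = r_{t-1} + r_{t-1}^rho * sqrt(h_t) * z_t (no drift, no dummies).

    The variance follows h_t = beta0 + h_{t-1} * (beta2 + beta1 * r_{t-1}^(2 rho) * z_{t-1}^2).
    Starting values h_0 = beta0 and z_0 = 0 are arbitrary (hypothetical) choices.
    """
    rng = random.Random(seed)
    rates, variances = [r0], []
    h_prev, z_prev = beta0, 0.0
    for _ in range(n):
        r_prev = rates[-1]
        # GARCH-type recursion with a level effect inside the ARCH term
        h = beta0 + h_prev * (beta2 + beta1 * r_prev ** (2 * rho) * z_prev ** 2)
        z = rng.gauss(0.0, 1.0)
        r_new = r_prev + r_prev ** rho * math.sqrt(h) * z
        # Floor the rate at a small positive number so r ** rho stays defined
        rates.append(max(r_new, 1e-6))
        variances.append(h)
        h_prev, z_prev = h, z
    return rates, variances


# Hypothetical parameter values, chosen only to make the example run
rates, variances = simulate_cev_garch(5.0, 200, 1e-4, 0.05, 0.86, 0.2, seed=1)
```

Plotting `variances` from such a run would typically show the clustering pattern that the GARCH term is meant to capture.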


Table 4 Parameter Estimates for the Markov Regime-Switching Models

Linear drift Nonlinear drift No drift Linear drift Nonlinear drift No drift Linear drift Nonlinear driftParameters No drift CEV CEV CEV GARCH GARCH GARCH CEVndashGARCH CEVndashGARCH CEVndashGARCH

αminus1(1) 0607 3321 2646(3569) (1701) (1605)

α0(1) 0973 1517 0457 minus1462 0450 minus1086(0259) (2027) (0107) (1009) (0105) (0989)

α1(1) minus0102 minus0288 minus0043 0274 minus0042 0209(0031) (0321) (0019) (0181) (0019) (0185)

α2(1) 0011 minus0015 minus0011(0014) (0010) (0011)

σ (1) 3361 3171 3106 1 1 1 1 1 1(0302) (0287) (0281)

ρ(1) 1021 1277 138 0 0 0 1193 1589 1670(044) (0446) (0448) (0452) (0460) (0487)

αminus1(2) minus0352 minus0267 minus0286(0329) (0242) (0183)

α0(2) minus0002 0141 minus0013 0062 minus0020 0062(0020) (0229) (0017) (0153) (0018) (0104)

α1(2) minus0006 minus0012 minus0008 minus0001 minus0007 0005(0004) (0048) (0004) (0031) (0004) (0020)

α2(2) minus0001 minus0002 minus0002(0003) (0002) (0001)

σ (2) 0143 0141 0138 2566 2567 2574 2769 3034 3051(0010) (0010) (0010) (006) (006) (006) (0275) (0303) (0311)

ρ(2) 8122 8179 8257 0 0 0 0844 0727 0800(0403) (0422) (0421) (0602) (0602) (0611)

β0 383Eminus4 318Eminus4 296Eminus4 321Eminus4 243Eminus4 225Eminus4(711Eminus5) (614Eminus5) (579Eminus5) (67Eminus4) (50Eminus4) (47Eminus4)

β1 0357 0336 0335 0265 0259 0252(004) (0039) (0038) (0057) (0055) (0053)

β2 9027 9066 9077 8972 8993 9001(009) (0089) (0086) (0099) (0098) (0094)

c1 3132 3001 3171 minus12515 minus12007 minus11293 minus11954 minus11429 minus10012(2734) (273) (2683) (3211) (3065) (3136) (3819) (3705) (3674)

d1 1081 107 1029 1303 1295 1192 1190 1193 0959(0391) (039) (0381) (044) (0426) (0447) (0594) (0582) (0593)

c2 40963 40342 40089 18864 17868 17633 18227 17299 16947(2386) (2393) (2313) (2048) (1947) (1909) (2092) (1983) (2009)

d2 minus2302 minus225 minus2225 minus0938 minus0806 minus0776 minus0906 minus0810 minus0757(0354) (0354) (0339) (0276) (0263) (0257) (0285) (0276) (0283)

Log-likelihood 64931 65079 65128 70790 71328 71382 70828 71386 71441

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}(s_t)/r_{t-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2 + \sigma(s_t)\, r_{t-1}^{\rho(s_t)} \sqrt{h_t}\, z_t$, where $h_t = \beta_0 + \beta_1 E\{e_t \mid r_{t-2}, s_{t-2}\}^2 + \beta_2 h_{t-1}$, $e_t = [r_{t-1} - E(r_{t-1} \mid r_{t-2}, s_{t-1})]/\sigma(s_{t-1})$, $z_t \sim \text{iid } N(0, 1)$, and $s_t$ follows a two-state Markov chain with transition probability $\Pr(s_t = l \mid s_{t-1} = l) = (1 + \exp(-c_l - d_l r_{t-1}))^{-1}$ for $l = 1, 2$.
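The logistic transition probability in the note to Table 4 can be sketched in a few lines of Python. The function names and the idea of summarizing persistence by an expected duration (holding the rate fixed) are illustrative assumptions, not part of the article; any values of c, d, and the rate passed in are hypothetical.

```python
import math


def stay_probability(c, d, r_prev):
    """Pr(s_t = l | s_{t-1} = l) = 1 / (1 + exp(-c - d * r_prev))."""
    return 1.0 / (1.0 + math.exp(-c - d * r_prev))


def expected_duration(p_stay):
    """Expected number of periods spent in a regime if p_stay were constant."""
    return 1.0 / (1.0 - p_stay)


# Hypothetical inputs: a larger c (or a larger d * r_prev) means a more persistent regime
p = stay_probability(c=2.0, d=-0.1, r_prev=5.0)
duration = expected_duration(p)
```

Because the probability depends on the lagged rate, persistence of each regime changes with the interest rate level; the constant-probability duration is only a convenient summary at a given rate.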

slightly larger than 1, it is still possible that the spot rate model is strictly stationary (see Nelson 1991 for more discussion). We also find a significant level effect even in the presence of GARCH, but the estimated elasticity parameter is close to .20, which is much smaller than that of single-factor diffusion models. The specification of conditional variance also affects the estimation of drift parameters. Unlike those in diffusion models, most drift parameter estimates in linear drift GARCH models are insignificant. Among all GARCH models considered, the one with nonlinear drift and level effect has the best in-sample performance. In the 1979–1982 period, the interest rate volatility is significantly higher, but its dependence on the interest rate level is not significantly different from other periods. The 1987 stock market crash dummy is again significantly negative.
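As a worked check of the statement about the sum of GARCH parameters, and reading the Table 3 entries as four-decimal values (an assumption about the table's formatting), the no-drift models give

```latex
\[
\hat\beta_1 + \hat\beta_2 = .1552 + .8707 = 1.0259 > 1 \quad \text{(no drift GARCH)},
\qquad
\hat\beta_1 + \hat\beta_2 = .0896 + .8583 = .9479 < 1 \quad \text{(no drift CEV--GARCH)}.
\]
```

The other columns of Table 3 give sums on the same side of 1 under the same reading.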

Parameter estimates of regime-switching models given in Table 4 show that the spot rate behaves quite differently between the two regimes. For models with a linear drift, in the first regime the spot rate has a high long-run mean (about 10%) and exhibits strong mean reversion (i.e., estimates of the mean-reversion speed parameter are significantly negative). The spot rate in the second regime behaves almost like a random walk, because most drift parameter estimates are close to 0 and insignificant. For models with a nonlinear drift, all drift parameters are insignificant. Volatility in the first regime is much higher, about three times that in the second regime. Our estimates show that level effect, although significant in both regimes, is much stronger in the second regime in the absence of GARCH. After including GARCH, the elasticity parameter estimate becomes insignificant in the second regime but remains the same in the first regime. Apparently, regime switching helps capture volatility clustering: the sum of GARCH parameters, β1 + β2, is smaller in regime-switching models than in pure GARCH models. The estimated transition probabilities of the Markov state variable s_t show that the low-volatility regime is much more persistent than the high-volatility regime. Our results suggest that the spot rate evolves like a random walk with low volatility most of the time, occasionally increasing and evolving with strong mean reversion and high volatility. Overall, the regime-switching model with a linear drift in each regime, CEV, and GARCH has the best in-sample performance.


Table 5 Parameter Estimates for the Jump-Diffusion Models

Linear drift Nonlinear drift No drift Linear drift Nonlinear drift No drift Linear drift Nonlinear driftParameters No drift CEV CEV CEV GARCH GARCH GARCH CEVndashGARCH CEVndashGARCH CEVndashGARCH

αminus1 0264 0702 0661(0293) (0384) (0380)

α0 minus0011 minus0340 0028 minus0506 0026 minus0486(0169) (0203) (0019) (0256) (0020) (0253)

α1 minus0007 0101 minus0013 0110 minus0012 0107(0004) (0043) (0004) (0053) (0004) (0052)

α2 minus0009 minus0008 minus0008(0003) (0003) (0003)

σ 0127 0126 0124(0008) (0008) (0008)

ρ 8936 8975 9045 0929 0852 0837(0390) (0397) (0400) (0518) (0510) (0512)

β0 87Eminus05 88Eminus05 87Eminus05 76Eminus5 77Eminus5 77Eminus5(12Eminus05) (12Eminus05) (12Eminus05) (12Eminus5) (12Eminus5) (12Eminus5)

β1 0631 0640 0635 0450 0472 0472(0076) (0073) (0073) (0101) (0102) (0102)

β2 8514 8484 8499 8504 8471 8483(0139) (0134) (0134) (0140) (0135) (0134)

c minus27404 minus27057 minus26972 minus37908 minus37470 minus37481 minus36908 minus36498 minus36518(1608) (1605) (1614) (1982) (1970) (1976) (2097) (2084) (2093)

d 1451 1423 1397 2554 2516 2509 2293 2274 2272(0275) (0275) (0275) (0307) (0305) (0307) (0343) (0340) (0341)

micro 0233 0385 0321 0700 0887 0895 0679 0892 0897(0133) (0142) (0128) (0152) (0158) (0158) (0162) (0164) (0163)

γ 4316 4279 4304 3928 3903 3929 4109 4046 4061(0117) (0120) (0104) (0197) (0189) (0187) (0199) (0182) (0178)

αD minus0094 0296 minus0192 0057 minus0197 0055(0121) (0112) (0117) (0154) (0112) (0150)

σD minus1027 minus1328 minus1551(1142) (1054) (1033)

ρD 1473 1872 minus0697 minus1220 minus1186 minus1277(1079) (0703) (0722) (0769) (0573) (0564)

cD 41604 43364 40036 40290 42505 41857 38622 40677 40272(6523) (6535) (5477) (7495) (7546) 7421 (6911) (7129) (7019)

dD minus2441 minus2721 minus1836 minus2748 minus2860 minus2791 minus2320 minus2487 minus2445(0808) (0730) (0534) (0689) (0682) 0667 (0685) (0676) (0662)

D87 minus4283 minus4253 minus4266 minus4226 minus4198 minus4217 minus4233 minus4209 minus4225(0646) (0641) (0643) (0826) (0820) (0815) (0793) (0793) (0791)

Log-likelihood 63063 63172 63263 67941 68108 68139 67963 68129 68161

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (\sigma + \sigma_D)\, r_{t-1}^{\rho + \rho_D} \sqrt{h_t}\, z_t + J(\mu, \gamma^2)\, \pi_t(q_t + q_D) + D_{87}$, where $h_t = \beta_0 + \beta_1 [r_{t-1} - E(r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}$, $z_t \sim \text{iid } N(0, 1)$, $J \sim N(\mu, \gamma^2)$, and $\pi_t(q_t + q_D)$ is Bernoulli$(q_t + q_D)$ with $q_t + q_D = (1 + \exp(-c - c_D - (d + d_D) \cdot r_{t-1}))^{-1}$.
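To make the jump term in the note to Table 5 concrete, the sketch below computes the jump probability and the mean and variance of the jump contribution, under the added assumption that the normal jump size J and the Bernoulli indicator are independent. Function names and all numerical inputs are hypothetical.

```python
import math


def jump_probability(c, c_d, d, d_d, r_prev):
    """q_t + q_D = 1 / (1 + exp(-c - c_D - (d + d_D) * r_prev))."""
    return 1.0 / (1.0 + math.exp(-c - c_d - (d + d_d) * r_prev))


def jump_moments(q, mu, gamma):
    """Mean and variance of J * pi with J ~ N(mu, gamma^2), pi ~ Bernoulli(q).

    Assumes J and pi are independent (an assumption of this sketch).
    E[J pi] = q * mu and E[(J pi)^2] = q * (mu^2 + gamma^2).
    """
    mean = q * mu
    variance = q * (mu ** 2 + gamma ** 2) - mean ** 2
    return mean, variance


# Hypothetical inputs: with c_D = d_D = 0 the period dummy is switched off
q = jump_probability(c=-3.0, c_d=0.0, d=0.2, d_d=0.0, r_prev=5.0)
mean, variance = jump_moments(q, mu=0.05, gamma=0.4)
```

Even a small jump probability can add noticeably to the variance when γ is large relative to the diffusion volatility, which is one way to see how jumps produce heavy tails.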

Table 5 reports parameter estimates for discretized jump-diffusion models. There is some weak evidence of mean reversion, especially when there is GARCH. The contribution of nonlinear drift is again very marginal; most estimated drift parameters are insignificant. For jump-diffusion models without GARCH, the elasticity parameter estimate is about .9, which is closer to the 1.5 found by CKLS. However, level effect is weakened (i.e., ρ becomes close to .1) after GARCH is introduced. It seems that CEV helps capture part of volatility clustering, but its importance diminishes in the presence of GARCH. We find that GARCH also significantly improves the performance of jump-diffusion models. Without GARCH, there is a high probability of small jumps, with an estimated mean jump size between 2 and 4. With GARCH, the jump probability becomes smaller, but the mean jump size increases to between about 7 and 9. This suggests that without GARCH, jumps can capture part of volatility clustering, but with GARCH, jumps mainly capture large interest rate movements. Of course, jumps also help explain volatility clustering; the sum of GARCH parameters is again much smaller than that in pure GARCH models. During the 1979–1982 period, other than the probability of jumps becoming substantially higher, other aspects of the jump-diffusion models (e.g., the drift, volatility, or elasticity parameter) do not behave very differently.

To sum up, our in-sample analysis reveals some important stylized facts for the spot rate:

1. The importance of modeling mean reversion in the conditional mean is ambiguous. It seems that linear drift is adequate for this purpose once GARCH, regime switching, or jumps are included. The contribution of Ait-Sahalia's (1996) type of nonlinear drift is very marginal.

2. It is important to model conditional heteroscedasticity through GARCH or level effect, although GARCH seems to have much better in-sample performance than CEV.

3. Regime switching and jumps help capture volatility clustering and especially the excess kurtosis and heavy tails of the interest rate.

4. Interest rates behave quite differently during the 1979–1982 period. The level and volatility of the interest rate, the dependence of the interest rate volatility on the interest rate level, and the probability of jumps are much higher during this period.


5. OUT-OF-SAMPLE DENSITY FORECAST PERFORMANCE

The foregoing in-sample analysis demonstrates that modeling volatility clustering through GARCH, and heavy tails and excess kurtosis through regime switching or jumps, significantly improves the in-sample fit of historical interest rate data. However, it is not clear whether these models will also perform well in out-of-sample density forecasts for future interest rates. Indeed, some previous studies have shown that more complicated models actually underperform the simple random walk model in predicting the conditional mean of future interest rates (e.g., Duffee 2002). In many financial applications, we are interested in forecasting the whole conditional distribution. We now apply the generalized spectral evaluation method described in Section 2 to evaluate density forecasts of the models under study. Among other things, we examine whether the features found to be important for in-sample performance remain important for out-of-sample forecasts and whether the best in-sample performing models still perform best in out-of-sample forecasts.

For each model, we first calculate $Z_t(\hat{\theta})$, the model generalized residuals, using the prediction sample, with model parameter estimates based on the estimation sample. Table 6 reports the out-of-sample evaluation statistic M1 for each model. To compute M1, we select a normal cdf, $\Phi(\sqrt{12}\,u)$, for the weighting function W(u) and a Bartlett kernel for the kernel function k(z). To choose a lag order p, we use a data-driven method that minimizes the asymptotic integrated mean squared error of the modified generalized spectral density. This method involves choosing a preliminary lag order $\bar{p}$. We examine a wide range of $\bar{p}$ in our application. We obtain similar results for the preliminary lag order $\bar{p}$ between 10 and 30 and report results for $\bar{p} = 20$ in Table 6. For comparison, we also report the in-sample log-likelihood values obtained from the estimation sample.
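The generalized residuals are probability integral transforms of the observed rates under the model's one-step-ahead forecast density. A minimal Python sketch, assuming a Gaussian conditional density whose mean and standard deviation depend only on the lagged rate, is given below; the function names and the toy random-walk inputs are hypothetical, and the article's richer models (GARCH, regime switching, jumps) would need their own conditional distribution functions.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def generalized_residuals(rates, cond_mean, cond_std):
    """Z_t = F(r_t | r_{t-1}) for a Gaussian one-step-ahead forecast density.

    cond_mean and cond_std are callables of the lagged rate, evaluated at
    parameter estimates obtained from the estimation sample.
    """
    z = []
    for prev, cur in zip(rates[:-1], rates[1:]):
        z.append(normal_cdf((cur - cond_mean(prev)) / cond_std(prev)))
    return z


# Toy random-walk forecast: mean equals the lagged rate, constant volatility
z = generalized_residuals([5.0, 5.0, 5.1], lambda r: r, lambda r: 0.1)
```

If the forecast density were correctly specified, the resulting series would be iid U[0, 1], which is the property the evaluation statistics are designed to check.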

The M1 statistics show that all single-factor diffusion models are overwhelmingly rejected. One of the most interesting findings is that models that include a drift term (either linear or nonlinear) have much worse out-of-sample performance than those without a drift. Dothan's (1978) model and the pure CEV model have the best density forecast performance. Both models have a zero drift and level effect; the elasticity parameter ρ = 1 for the former and .25 (.60 for the 1979–1982 period) for the latter. On the other hand, although the CKLS model and Ait-Sahalia's (1996) nonlinear drift model have the highest in-sample likelihood values, they have the worst out-of-sample density forecasts in terms of M1. This seems to suggest that for the purpose of density forecast, it is much more important to model the diffusion function than the drift function. Of course, this does not necessarily mean that the drift is not important.

Table 6. Out-of-Sample Density Forecast Performance of Spot Interest Rate Models

Model                               M1         Log-likelihood

Single-factor diffusion models
  Random walk                       .108***    2,649.6
  Log-normal                        .121***    2,187.0
  Dothan                            .067**     2,184.9
  Pure CEV                          .082**     2,654.5
  Vasicek                           .213***    2,655.5
  CIR                               .219***    2,672.0
  CKLS                              .219***    2,661.7
  Nonlinear drift                   .224***    2,662.5

GARCH models
  No drift GARCH                    .079**     5,686.8
  Linear drift GARCH                .276***    5,699.5
  Nonlinear drift GARCH             .339***    5,706.3
  No drift CEV–GARCH                .077**     5,701.1
  Linear drift CEV–GARCH            .291***    5,715.2
  Nonlinear drift CEV–GARCH         .358***    5,725.5

Regime-switching models
  No drift RS CEV                   .129***    6,493.1
  Linear drift RS CEV               .091***    6,507.9
  Nonlinear drift RS CEV            .162***    6,512.8
  No drift RS GARCH                 .118***    7,079.0
  Linear drift RS GARCH             .047*      7,132.8
  Nonlinear drift RS GARCH          .055**     7,138.2
  No drift RS CEV–GARCH             .124***    7,082.8
  Linear drift RS CEV–GARCH         .056**     7,138.6
  Nonlinear drift RS CEV–GARCH      .067**     7,144.1

Jump-diffusion models
  No drift JD CEV                   .180***    6,306.3
  Linear drift JD CEV               .039*      6,317.2
  Nonlinear drift JD CEV            .076**     6,326.3
  No drift JD GARCH                 .178***    6,794.1
  Linear drift JD GARCH             .056**     6,810.8
  Nonlinear drift JD GARCH          .067**     6,813.9
  No drift JD CEV–GARCH             .174***    6,796.3
  Linear drift JD CEV–GARCH         .053**     6,812.9
  Nonlinear drift JD CEV–GARCH      .065**     6,816.1

NOTE: This table reports the out-of-sample density forecast performance of the single-factor diffusion, GARCH, Markov regime-switching, and jump-diffusion models, measured by the M1 statistic. For convenience of comparison, we also report the in-sample log-likelihood value for each model. The in-sample parameter estimation is based on the observations from June 14, 1961 to February 2, 1991, and the out-of-sample density forecast evaluation is based on the observations from February 23, 1991 to December 29, 2000. The ratio between the size of the estimation sample and the forecast sample is about 3:1. In computing the out-of-sample M1 statistics, a preliminary lag order $\bar{p}$ of 20 is used. Other lag orders give similar results. The asymptotic critical values for the out-of-sample evaluation statistic are .087, .051, and .037 at the 1%, 5%, and 10% levels, and ***, **, and * indicate that the M1 statistic is significant at the 1%, 5%, and 10% levels.


It might be simply because the drift specifications considered are severely misspecified and do not outperform the zero drift model. In fact, most drift parameter estimates have large standard errors and thus have excess sampling variation in parameter estimation. As a consequence, assuming zero drift may result in a smaller adverse effect on out-of-sample performance.

The generalized residuals $Z_t(\hat{\theta})$ contain much information about possible sources of model misspecification. Instead of being uniform, the histograms of the generalized residuals for all diffusion models (Fig. 2) exhibit a high peak in the center of the distribution, which indicates that the diffusion models cannot satisfactorily capture the marginal density of the spot rate. The separate inference statistics M(m, l) in Table 7 also show that all the models fail to satisfactorily capture the dynamics of the generalized residuals. In particular, the large M(2, 2) and M(4, 4) statistics indicate that diffusion models perform rather poorly in modeling the conditional variance and kurtosis of the generalized residuals.
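The histogram diagnostic can be sketched as follows: bin the generalized residuals on [0, 1] and compare the counts with the equal counts implied by uniformity. The function names are hypothetical, and the Pearson-type summary is only an informal gauge of the marginal density; it is not the M1 or M(m, l) statistic and ignores both serial dependence and parameter estimation uncertainty.

```python
def pit_histogram(z, n_bins=10):
    """Count generalized residuals in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for v in z:
        # A value of exactly 1.0 is placed in the last bin
        idx = min(int(v * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


def uniformity_distance(counts):
    """Pearson-type distance between the bin counts and equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# A high central peak, as described for the diffusion models, inflates the distance
peaked = pit_histogram([0.45, 0.48, 0.5, 0.52, 0.55, 0.1, 0.9], n_bins=5)
distance = uniformity_distance(peaked)
```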

Similar to single-factor diffusion models, GARCH models with a zero drift have much better density forecasts than those models with a linear or nonlinear drift. The best GARCH and diffusion models have M1 statistics roughly between .07 and .08. Comparing the best GARCH and single-factor diffusion models, we find that GARCH provides better density forecasts than CEV, and combining CEV and GARCH further improves the density forecasts. Diagnostic analysis of generalized residuals shows that GARCH models provide certain improvements over diffusion models. For example, the histograms of the generalized residuals of GARCH models have a less-pronounced peak in the center. The M(m, l) statistics show that GARCH models significantly improve the fitting of even-order moments of the generalized residuals; the M(2, 2) and M(4, 4) statistics are reduced from close to 100 to single digits.

Figure 2 (panels a–h). Histograms of the Generalized Residuals of Four Classes of Spot Rate Models. This figure plots the histograms of the generalized residuals of the single-factor diffusion, GARCH, regime-switching, and jump-diffusion spot rate models.

In summary, our analysis of single-factor diffusion and GARCH models demonstrates that both classes of models fail to provide good density forecasts for future interest rates. They fail to capture both the high peak in the center of the marginal density and the dynamics of different moments of the generalized residuals. GARCH models have a more uniform marginal density and significantly improve the ability to model the dynamics of even-order moments of the generalized residuals. For both classes of models, modeling the conditional variance through CEV or GARCH is much more important than modeling the conditional mean of the interest rate. Zero-drift models outperform either linear or nonlinear drift models in density forecast, although the M(1, 1) statistics suggest that neither model adequately captures the dynamic structure in the conditional mean of the generalized residuals.

We next turn to regime-switching models. Regime-switching models generally have better density forecasts than single-factor diffusion and GARCH models; the best regime-switching models reduce the M1 statistics of the best diffusion and GARCH models from above .07 to about .05. Although modeling the drift does not improve density forecast for diffusion and GARCH models, the best regime-switching models in density forecast have a linear drift in each regime. As pointed out by Ang and Bekaert (2002) and Li and Xu (2000), a regime-switching model with a linear drift in each regime actually implies nonlinear conditional mean dynamics for the interest rate. Thus our results suggest that perhaps a specific nonlinear drift is needed for out-of-sample density forecasts. GARCH is more important than CEV for density forecasts, and combining CEV with GARCH does not further improve model performance. Overall, the regime-switching GARCH model with a linear drift in each regime provides the best out-of-sample forecasts among all of the regime-switching models. Diagnostic analysis shows that regime-switching models provide a much better characterization of the marginal density of interest rates; the histograms of the generalized residuals are much closer to U[0, 1]. The M(2, 2) and M(4, 4) statistics of regime-switching models are slightly higher than those of GARCH models, however, and the M(1, 1) and M(3, 3) statistics of regime-switching models are actually higher than those of diffusion and GARCH models. Therefore, the advantages of regime-switching models seem to come from better modeling of the marginal density rather than from the dynamics of the generalized residuals.
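The point about nonlinear conditional mean dynamics can be made explicit with a short derivation. As a sketch, suppose the regime probabilities are conditioned only on the lagged rate (in general they depend on the full information set), and write $\alpha_0(l)$ and $\alpha_1(l)$ for the linear drift coefficients in regime $l$:

```latex
\[
E(\Delta r_t \mid r_{t-1})
  = \sum_{l=1}^{2} \Pr(s_t = l \mid r_{t-1})\,
    \bigl[\alpha_0(l) + \alpha_1(l)\, r_{t-1}\bigr].
\]
```

Each bracketed term is linear in $r_{t-1}$, but the weights $\Pr(s_t = l \mid r_{t-1})$ themselves vary with $r_{t-1}$ through the logistic transition probabilities, so the weighted sum is generally a nonlinear function of the lagged rate.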

The density forecast performance of jump-diffusion models shares many common features with that of regime-switching models. First, they have comparable performance; the M1 statistics of the best jump-diffusion models are also around .05. Second, for jump-diffusion models, the linear drift specification outperforms those with no drift or nonlinear drift for density forecasts. Third, for jump-diffusion models, GARCH generally provides better density forecasts than CEV. Fourth, jump-diffusion models have similar advantages over single-factor diffusion and GARCH models: They have much better characterization of the marginal density of the generalized residuals, but they may not necessarily perform better in modeling the dynamics of the generalized residuals, as indicated by their large M(m, l) statistics. The jump model with a linear drift and level effect, although it has the smallest M1 statistic, apparently fails to capture the even-order moments of the generalized residuals, with M(2, 2) and M(4, 4) statistics significantly higher than other jump-diffusion models that include GARCH effects.

Table 7. Separate Inference Statistics for Out-of-Sample Density Forecast Performance

Model                               M(1, 1)   M(1, 2)   M(2, 1)   M(2, 2)   M(3, 3)   M(4, 4)

Single-factor diffusion models
  Random walk                        40.08      6.35     40.48     99.73     21.50     80.12
  Log-normal                         46.52      8.77     39.91    110.10     26.98     92.51
  Dothan                             46.04      8.07     42.62    108.90     27.02     91.03
  Pure CEV                           41.62      6.63     41.55     96.81     22.85     80.42
  Vasicek                            40.40      8.13     35.55    100.20     21.43     81.79
  CIR                                43.61      8.68     36.30     98.57     24.39     85.02
  CKLS                               42.02      8.55     35.75     97.89     22.80     82.77
  Nonlinear drift                    42.40      8.37     35.93     98.47     22.77     82.98

GARCH models
  No drift GARCH                     40.73      3.75     28.21      7.68     17.57      2.53
  Linear drift GARCH                 41.07      4.66     25.11      7.53     17.00      2.35
  Nonlinear drift GARCH              40.71      4.87     24.70      7.02     16.63      2.09
  No drift CEV–GARCH                 41.30      4.10     28.40      7.46     19.24      2.74
  Linear drift CEV–GARCH             41.68      5.11     25.06      7.09     18.73      2.44
  Nonlinear drift CEV–GARCH          41.43      5.45     24.32      6.45     18.54      2.17

Regime-switching models
  No drift RS CEV                    53.16      3.13     11.31      7.05     35.02     10.21
  Linear drift RS CEV                53.25      4.56     11.16      6.57     34.31      8.83
  Nonlinear drift RS CEV             53.75      5.12      9.84      6.56     34.50      8.49
  No drift RS GARCH                  57.29      4.05     21.59     10.95     43.71      7.63
  Linear drift RS GARCH              55.46      4.56     21.09     12.38     40.71      9.14
  Nonlinear drift RS GARCH           55.69      4.50     21.05     12.14     41.13      8.93
  No drift RS CEV–GARCH              45.35      3.69     20.91     10.67     34.25      7.08
  Linear drift RS CEV–GARCH          44.10      4.86     20.75     11.99     32.12      8.87
  Nonlinear drift RS CEV–GARCH       44.36      4.84     20.55     11.40     32.46      8.32

Jump-diffusion models
  No drift JD CEV                    47.38      5.32     16.76     85.10     43.01     99.31
  Linear drift JD CEV                46.32      4.48     19.63     85.09     42.26     99.82
  Nonlinear drift JD CEV             46.50      4.85     18.31     85.10     42.48     99.29
  No drift JD GARCH                  43.32      5.10     20.35     12.38     28.44      8.96
  Linear drift JD GARCH              42.57      4.60     21.48     12.19     28.22      8.84
  Nonlinear drift JD GARCH           42.62      4.67     21.42     12.34     28.24      8.95
  No drift JD CEV–GARCH              43.55      5.05     20.51     13.36     29.31     10.07
  Linear drift JD CEV–GARCH          42.74      4.54     21.69     13.00     28.93      9.76
  Nonlinear drift JD CEV–GARCH       42.78      4.62     21.57     13.07     28.94      9.81

NOTE: This table reports the separate inference statistics M(m, l) for the four classes of spot rate models. The asymptotically normal statistic M(m, l) can be used to test whether the cross-correlation between the mth and lth moments of Z_t is significantly different from 0. The choice of (m, l) = (1, 1), (2, 2), (3, 3), and (4, 4) is sensitive to autocorrelations in mean, variance, skewness, and kurtosis of r_t. We report results only for a lag truncation order p = 20; the results for p = 10 and 30 are rather similar. The asymptotical critical values are 1.28, 1.65, and 2.33 at the 10%, 5%, and 1% levels.
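The critical values in the note to Table 7 (1.28, 1.65, and 2.33, the upper-tail standard normal values at the 10%, 5%, and 1% levels) can be applied mechanically, as in the short sketch below. The function name and the star labels are hypothetical conveniences, and the one-sided (upper-tail) rejection rule is an assumption inferred from those critical values.

```python
def significance_stars(stat, critical=((2.33, "***"), (1.65, "**"), (1.28, "*"))):
    """Label a statistic by the strictest upper-tail critical value it exceeds."""
    for value, stars in critical:
        if stat > value:
            return stars
    return ""


# Hypothetical statistics spanning the rejection regions
labels = [significance_stars(s) for s in (1.0, 1.5, 2.0, 40.0)]
```

Under this rule, any statistic in the tens would be rejected at the 1% level.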

In summary, our out-of-sample density forecast evaluation reveals the following important features:

1. For out-of-sample density forecasts, the importance of modeling mean reversion in the interest rate is ambiguous. For diffusion and GARCH models, a linear or nonlinear drift gives worse density forecasts than a zero drift. However, when there are regime switches or jumps, linear drift models outperform those with zero or nonlinear drift. The best density forecast models still fail to adequately capture the mean dynamics of the generalized residuals.

2. It is important to capture volatility clustering in interest rates via GARCH for both in-sample and out-of-sample performance. Most of the best-performing models have a GARCH component, which significantly improves their ability to model the dynamics of the even-order moments of the generalized residuals.

3. It is also important to capture excess kurtosis and heavy tails of the interest rate via regime switching or jumps for both in-sample and out-of-sample performance. The advantages of regime-switching and jump models are mainly reflected in modeling the marginal density rather than in the dynamics of the generalized residuals.

Overall, our out-of-sample analysis demonstrates that more-complicated models that incorporate conditional heteroscedasticity and heavy tails of interest rates tend to have better density forecasts. This result is quite different from those of studies that focus on conditional mean forecasts. As widely documented in the literature, the predictable component in the conditional mean of the interest rate appears insignificant. As a result, random walk models tend to outperform more-sophisticated models in terms of mean forecast (e.g., Duffee 2002). However, a density forecast includes all conditional moments, and as a result, those models that can capture the dynamics of higher-order moments tend to have better density forecasts. Our analysis suggests that the more-complicated spot rate models developed in the literature can indeed capture some important features of interest rates and are relevant in applications involving density forecasts of interest rates.


6. CONCLUSION

Despite the numerous empirical studies of spot interest rate models, little effort has been devoted to examining their out-of-sample forecast performance. We have contributed to the literature by providing the first (to our knowledge) comprehensive empirical analysis of the out-of-sample performance of a wide variety of popular spot rate models in forecasting the conditional density of future interest rates. Out-of-sample density forecasts help minimize the data snooping bias due to excessive searching for more-complicated models using similar datasets. The conditional density, which completely characterizes the full dynamics of interest rates, is also an essential input to many financial applications. Using a rigorous econometric evaluation procedure for density forecasts, we examined the out-of-sample performance of single-factor diffusion, GARCH, regime-switching, and jump-diffusion models.

Although previous studies have shown that simpler models, such as the random walk model, tend to have more accurate forecasts for the conditional mean of interest rates, we find that models that capture conditional heteroscedasticity and heavy tails of the spot rate have better out-of-sample density forecasts. GARCH significantly improves modeling of the conditional variance and kurtosis of the generalized residuals, whereas regime switching and jumps help in modeling the marginal density of interest rates. Therefore, our analysis shows that the more-sophisticated interest rate models developed in the literature can indeed capture some important features of interest rate data and perform better than simpler models in out-of-sample density forecasts. Although we have focused on density forecasts of the spot rates here, there is evidence that the term structure of interest rates is driven by multiple factors (e.g., Dai and Singleton 2000). Extending our analysis to forecast the joint density of multiple yields is an interesting issue that we will address in future research.

ACKNOWLEDGMENTS

The authors are grateful for the helpful comments of Canlin Li, Daniel Morillo, Larry Pohlman, and seminar participants at PanAgora Asset Management, the 2002 Econometric Society European Summer Meeting, and the 2003 Econometric Society North American Winter Meeting. They also thank the associate editor and two anonymous referees, whose comments and suggestions have greatly improved the article, and David George for editorial assistance. Hong's work was supported by National Science Foundation grant SES-0111769. Any errors are the authors' own.

APPENDIX: GENERALIZED SPECTRAL EVALUATION FOR DENSITY FORECASTS

Here we describe Hong's (2003) omnibus evaluation procedure for out-of-sample density forecasts, as well as a class of separate inference procedures. (For more discussion and formal treatment, see Hong 2003.)

A.1 Generalized Spectrum

Generalized spectrum was proposed by Hong (1999) as an analytic tool to provide an alternative to the power spectrum and the higher-order spectrum (e.g., bispectrum) in time series analysis. Define the centered generalized residual

$$Z_t \equiv Z_t(\theta) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \theta)\, dr - \frac{1}{2}, \qquad t = R + 1, \ldots, T,$$

where $p(r, t \mid I_{t-1}, \theta)$ is a conditional density model for time series $\{r_t\}$. Suppose that the series $\{Z_t\}_{t=-\infty}^{\infty}$ has marginal characteristic function $\varphi(u) \equiv E(e^{iuZ_t})$ and pairwise joint characteristic function $\varphi_j(u, v) \equiv E[\exp\{i(uZ_t + vZ_{t-|j|})\}]$, where $i \equiv \sqrt{-1}$, $u, v \in (-\infty, \infty)$, and $j = 0, \pm 1, \ldots$. The basic idea of generalized spectrum is to transform the series $\{Z_t\}$ and then consider the spectrum of the transformed series. Define the generalized covariance function

$$\sigma_j(u, v) \equiv \operatorname{cov}(e^{iuZ_t}, e^{ivZ_{t-|j|}}), \qquad j = 0, \pm 1, \ldots.$$

Straightforward algebra yields $\sigma_j(u, v) = \varphi_j(u, v) - \varphi(u)\varphi(v)$. Because $\varphi_j(u, v) = \varphi(u)\varphi(v)$ for all $(u, v)$ if and only if $Z_t$ and $Z_{t-|j|}$ are independent (cf. Lukacs 1970), $\sigma_j(u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ over various lags, including those with zero autocorrelation in $\{Z_t\}$.
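The two definitions above can be illustrated with a minimal sketch. It assumes a hypothetical Gaussian random-walk forecast density $N(r_{t-1}, s^2)$, which is not one of the models estimated in the article, and the function names (`centered_generalized_residuals`, `generalized_covariance`) are illustrative only. When the forecast density coincides with the data-generating process, the residuals should look like iid draws from $U[-\frac{1}{2}, \frac{1}{2}]$ and the sample generalized covariance at lag 1 should be close to 0.

```python
import cmath
import math
import random


def normal_cdf(x, mean, std):
    """CDF of N(mean, std^2), written with math.erf to stay in the standard library."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def centered_generalized_residuals(rates, cond_std):
    """Z_t = F(r_t | I_{t-1}) - 1/2 under a hypothetical Gaussian random-walk forecast.

    The forecast density N(r_{t-1}, cond_std^2) is an illustrative assumption,
    not a model from the article.
    """
    return [normal_cdf(r, prev, cond_std) - 0.5
            for prev, r in zip(rates[:-1], rates[1:])]


def generalized_covariance(z, j, u, v):
    """Sample analogue of sigma_j(u, v) = phi_j(u, v) - phi(u) * phi(v)."""
    j = abs(j)
    pairs = list(zip(z[j:], z[:len(z) - j]))  # (Z_t, Z_{t-|j|})
    m = len(pairs)
    phi_joint = sum(cmath.exp(1j * (u * a + v * b)) for a, b in pairs) / m
    phi_u = sum(cmath.exp(1j * u * a) for a, _ in pairs) / m
    phi_v = sum(cmath.exp(1j * v * b) for _, b in pairs) / m
    return phi_joint - phi_u * phi_v


# Simulate a random walk so that the assumed forecast density is the true one.
random.seed(0)
rates = [5.0]
for _ in range(2000):
    rates.append(rates[-1] + random.gauss(0.0, 0.1))

z = centered_generalized_residuals(rates, cond_std=0.1)
sigma_1 = generalized_covariance(z, 1, 1.0, 1.0)
print(min(z), max(z), abs(sigma_1))
```

Because the covariance is built from characteristic functions rather than from products of levels, the same function would also pick up dependence that leaves the ordinary autocorrelation at 0.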

Under certain regularity conditions on the temporal dependence of $\{Z_t\}$, the Fourier transform of the generalized covariance function $\sigma_j(u, v)$ exists and is given by

$$f(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi].$$

Like $\sigma_j(u, v)$, $f(\omega, u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ across various lags. An advantage of spectral analysis is that $f(\omega, u, v)$ incorporates all lags simultaneously. Thus it can capture the dependent processes in which serial dependence occurs only at higher-order lags. This can arise from seasonality (e.g., calendar effects) or time delay in financial markets. Moreover, $f(\omega, u, v)$ is particularly powerful in capturing the dependent alternatives whose serial dependence decays to 0 slowly as $j \to \infty$, such as slow mean-reverting or persistent volatility clustering processes. For such processes, $f(\omega, u, v)$ has a sharp peak at frequency 0.

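The function $f(\omega, u, v)$ does not require any moment condition on $\{Z_t\}$. When the moments of $\{Z_t\}$ exist, however, we can differentiate $f(\omega, u, v)$ with respect to $(u, v)$ at $(0, 0)$:

$$f^{(0,m,l)}(\omega, 0, 0) \equiv \left. \frac{\partial^{m+l}}{\partial u^m\, \partial v^l} f(\omega, u, v) \right|_{(u,v)=(0,0)} = \frac{i^{m+l}}{2\pi} \sum_{j=-\infty}^{\infty} \operatorname{cov}(Z_t^m, Z_{t-|j|}^l)\, e^{-ij\omega}.$$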

In particular, $f^{(0,1,1)}(\omega, 0, 0)$ is the well-known power spectral density. (For applications of the power spectral density in economics and finance, see, e.g., Granger 1969; Durlauf 1991; Watson 1993.) For this reason, $f(\omega, u, v)$ is called the "generalized spectral density" of $\{Z_t\}$. The parameters $(u, v)$ provide much flexibility in capturing linear and nonlinear dependence. Because $f(\omega, u, v)$ can be decomposed as a weighted sum of $f^{(0,m,l)}(\omega, 0, 0)$ over various $(m, l)$ via a Taylor series expansion around $(0, 0)$, it contains information on all autocorrelations and cross-correlations in the various power terms of $\{Z_t\}$. Thus it can be used to develop an omnibus procedure against a wide range of dependent processes.

A.2 Omnibus Evaluation for Density Forecast

To evaluate out-of-sample density forecasts of model $p(r, t \mid I_{t-1}, \theta)$, Hong (2003) used the generalized spectrum to develop an omnibus test of iid $U[-\frac{1}{2}, \frac{1}{2}]$ for the centered generalized residual $\{Z_t\}$. To test $H_0$: $\{Z_t\} \sim$ iid $U[-\frac{1}{2}, \frac{1}{2}]$, we use the modified generalized covariance

$$\sigma_j^U(u, v) \equiv \varphi_j(u, v) - \varphi^U(u)\varphi^U(v),$$

where $\varphi^U(u) \equiv \sin(u/2)/(u/2)$ is the characteristic function of a $U[-\frac{1}{2}, \frac{1}{2}]$ random variable. Because $\sigma_j^U(u, v)$ has incorporated the information of $U[-\frac{1}{2}, \frac{1}{2}]$, it can capture any pairwise serial dependence in $\{Z_t\}$ and any deviation from $U[-\frac{1}{2}, \frac{1}{2}]$. The associated generalized spectrum is

$$f^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j^U(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi],\ u, v \in (-\infty, \infty).$$

Under $H_0$, we have $\sigma_j^U(u, v) = 0$ for all $j \neq 0$, and so $f^U(\omega, u, v)$ becomes a known "flat" spectrum,

$$f_0^U(\omega, u, v) \equiv \frac{1}{2\pi} \sigma_0^U(u, v) = \frac{1}{2\pi} [\varphi^U(u + v) - \varphi^U(u)\varphi^U(v)], \qquad \omega \in [-\pi, \pi].$$

When $Z_t$ and $Z_{t-j}$ are not independent at some lag $j \neq 0$, or when $Z_t$ is not $U[-\frac{1}{2}, \frac{1}{2}]$, we have $f^U(\omega, u, v) \neq f_0^U(\omega, u, v)$. Hence, to test $H_0$, one can check whether a consistent estimator of $f^U(\omega, u, v)$ is significantly different from $f_0^U(\omega, u, v)$.

Suppose that $\hat{\theta}$ is an estimator for model parameter $\theta$ based on the estimation sample $\{r_t\}_{t=1}^{R}$. To estimate $f^U(\omega, u, v)$ using the prediction sample $\{r_t\}_{t=R+1}^{N}$, we have to use the estimated proxies $\{\hat{Z}_t\}_{t=R+1}^{N}$, where

$$\hat{Z}_t \equiv Z_t(\hat{\theta}) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \hat{\theta})\, dr - \frac{1}{2}, \qquad t = R + 1, \ldots, N.$$

Define the empirical measure

$$\hat{\sigma}_j^U(u, v) \equiv \hat{\varphi}_j(u, v) - \varphi^U(u)\varphi^U(v), \qquad j = 0, \pm 1, \ldots, \pm(n - 1),$$

where $n \equiv N - R$ is the size of the prediction sample and the empirical pairwise joint characteristic function is

$$\hat{\varphi}_j(u, v) \equiv (n - |j|)^{-1} \sum_{t=R+|j|+1}^{N} \exp\{i(u\hat{Z}_t + v\hat{Z}_{t-|j|})\}.$$
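A short sketch may help make the empirical measure concrete. The function names are hypothetical, and the two artificial residual series are illustrative assumptions rather than forecasts from any model in the article: `iid` draws are independent $U[-\frac{1}{2}, \frac{1}{2}]$, whereas `paired` repeats each draw twice, so its marginal distribution is still uniform but adjacent values are dependent.

```python
import cmath
import math
import random


def phi_uniform(u):
    """Characteristic function of U[-1/2, 1/2]: sin(u/2) / (u/2), with value 1 at u = 0."""
    return 1.0 if u == 0 else math.sin(u / 2.0) / (u / 2.0)


def phi_hat(z_hat, j, u, v):
    """Empirical joint characteristic function of (Z_t, Z_{t-|j|})."""
    j = abs(j)
    n = len(z_hat)
    total = sum(cmath.exp(1j * (u * z_hat[t] + v * z_hat[t - j])) for t in range(j, n))
    return total / (n - j)


def sigma_u_hat(z_hat, j, u, v):
    """Empirical modified generalized covariance: phi_hat_j(u, v) - phi^U(u) * phi^U(v)."""
    return phi_hat(z_hat, j, u, v) - phi_uniform(u) * phi_uniform(v)


random.seed(1)
iid = [random.random() - 0.5 for _ in range(5000)]
paired = [x for x in iid[:2500] for _ in range(2)]  # x1, x1, x2, x2, ...

gap_iid = abs(sigma_u_hat(iid, 1, 3.0, 3.0))
gap_paired = abs(sigma_u_hat(paired, 1, 3.0, 3.0))
print(gap_iid, gap_paired)
```

The lag-1 measure stays near 0 for the independent series and moves clearly away from 0 for the paired series, even though both have the same uniform marginal distribution.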

We introduce a class of smoothed kernel estimators,

$$\hat{f}^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=1-n}^{n-1} (1 - |j|/n)^{1/2}\, k(j/p)\, \hat{\sigma}_j^U(u, v)\, e^{-ij\omega},$$

where $k(\cdot)$ is a kernel and $p \equiv p(n)$ is a bandwidth/lag order. An example of $k(\cdot)$ is the Bartlett kernel,

$$k(z) = \begin{cases} 1 - |z| & \text{if } |z| \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

For the choice of $p$, Hong (1999) suggested a data-driven method that delivers an asymptotically optimal bandwidth in terms of an integrated mean squared error criterion. This still involves a preliminary bandwidth $\bar{p}$, but the impact of choosing $\bar{p}$ is much smaller than the impact of choosing lag order $p$.
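The kernel estimator can be sketched as follows, with the lag order `p` fixed by hand rather than chosen by the data-driven method (an assumption made to keep the example short). All function names are hypothetical. For an iid $U[-\frac{1}{2}, \frac{1}{2}]$ series, the estimate should be close to the flat spectrum $f_0^U(\omega, u, v)$.

```python
import cmath
import math
import random


def phi_uniform(u):
    """Characteristic function of U[-1/2, 1/2]."""
    return 1.0 if u == 0 else math.sin(u / 2.0) / (u / 2.0)


def sigma_u_hat(z_hat, j, u, v):
    """Empirical modified generalized covariance at lag j."""
    j = abs(j)
    n = len(z_hat)
    phi_j = sum(cmath.exp(1j * (u * z_hat[t] + v * z_hat[t - j])) for t in range(j, n)) / (n - j)
    return phi_j - phi_uniform(u) * phi_uniform(v)


def bartlett(x):
    """Bartlett kernel: 1 - |x| on [-1, 1], 0 elsewhere."""
    return max(0.0, 1.0 - abs(x))


def f_u_hat(z_hat, omega, u, v, p):
    """Smoothed kernel estimator of the generalized spectrum at frequency omega."""
    n = len(z_hat)
    total = 0j
    for j in range(1 - n, n):
        k = bartlett(j / p)
        if k == 0.0:
            continue  # lags with zero kernel weight do not contribute
        weight = math.sqrt(1.0 - abs(j) / n) * k
        total += weight * sigma_u_hat(z_hat, j, u, v) * cmath.exp(-1j * j * omega)
    return total / (2.0 * math.pi)


def f_u_flat(u, v):
    """Flat spectrum implied by iid U[-1/2, 1/2] residuals."""
    return (phi_uniform(u + v) - phi_uniform(u) * phi_uniform(v)) / (2.0 * math.pi)


random.seed(2)
z = [random.random() - 0.5 for _ in range(2000)]
gap = abs(f_u_hat(z, 0.0, 1.0, 1.0, p=10) - f_u_flat(1.0, 1.0))
print(gap)
```

With the Bartlett kernel only lags with $|j| < p$ receive positive weight, so the loop skips most of the $2n - 1$ candidate lags.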

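A convenient divergence measure for $\hat{f}^U(\omega, u, v)$ and $f_0^U(\omega, u, v)$ is the quadratic form

$$2\pi n\, Q(\hat{f}^U, f_0^U) \equiv 2\pi n \int\!\!\int\!\!\int_{-\pi}^{\pi} |\hat{f}^U(\omega, u, v) - f_0^U(\omega, u, v)|^2\, d\omega\, dW(u)\, dW(v)$$

$$= n \int\!\!\int |\hat{\varphi}_0(u, v) - \varphi^U(u + v)|^2\, dW(u)\, dW(v) + 2 \sum_{j=1}^{n-1} k^2(j/p)(n - j) \int\!\!\int |\hat{\sigma}_j^U(u, v)|^2\, dW(u)\, dW(v),$$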

where the weighting function $W(\cdot)$ is positive and nondecreasing and the unspecified integrals are taken over the support of $W(\cdot)$. An example of this is

$$W(u) = \Phi(\sqrt{12}\, u),$$

where $\Phi(\cdot)$ is the $N(0, 1)$ cdf, which is commonly used in the characteristic function literature. The scale $\sqrt{12}$ matches the standard deviation of $U[-\frac{1}{2}, \frac{1}{2}]$, but this is not necessary.

Our evaluation statistic for $H_0$ is a properly centered and scaled version of the quadratic form $Q(\hat{f}^U, f_0^U)$:

$$M_1 \equiv \left[ \sum_{j=1}^{n-1} k^2(j/p) \right]^{-1} \times \int\!\!\int \sum_{j=1}^{n-1} k^2(j/p)(n - j)\, |\hat{\sigma}_j^U(u, v)|^2\, dW(u)\, dW(v) - \left[ \int [1 - \varphi^U(u)^2]\, dW(u) \right]^2.$$

The asymptotic distribution of $M_1$ was derived by Hong (2003). Under $H_0$ and a set of regularity conditions,

$$M_1 \xrightarrow{d} \int\!\!\int |G(u, v)|^2\, dW(u)\, dW(v)$$

as $n \to \infty$, $R \to \infty$, and $n^{1+\delta}/R \to 0$ for any arbitrary small constant $\delta > 0$, where $G(u, v)$ is a complex-valued Gaussian process with mean 0 and a known covariance function,

$$\operatorname{cov}[G(u_1, v_1), G(u_2, v_2)] = E[G(u_1, v_1) G(u_2, v_2)^*]$$

$$= \sigma_0^U(u_1, -u_2)\varphi^U(v_1)\varphi^U(v_2) + \sigma_0^U(u_1, -v_2)\varphi^U(u_2)\varphi^U(v_1) + \sigma_0^U(v_1, -u_2)\varphi^U(u_1)\varphi^U(v_2) + \sigma_0^U(v_1, -v_2)\varphi^U(u_1)\varphi^U(u_2).$$


The limit distribution of $M_1$ is nonstandard. However, because the Gaussian process $G(u, v)$ has mean 0 and a known distribution-free covariance function, the limit distribution of $M_1$ can be easily tabulated. When $W(u) = \Phi(\sqrt{12}\, u)$ with a truncated support on $[-1, 1]$, the asymptotic critical values at the 10%, 5%, and 1% levels are .037, .051, and .087.
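The $M_1$ formula can be evaluated numerically along the following lines. This is only a rough sketch under stated assumptions: the integrals are approximated by a coarse midpoint rule on $[-1, 1]$ (the grid size `num_points` is a hypothetical tuning choice), the lag order `p` is fixed by hand instead of data-driven, and the function names are illustrative. Hong (2003) remains the authoritative source for the statistic and its implementation.

```python
import cmath
import math
import random


def phi_uniform(u):
    """Characteristic function of U[-1/2, 1/2]."""
    return 1.0 if u == 0 else math.sin(u / 2.0) / (u / 2.0)


def bartlett(x):
    """Bartlett kernel."""
    return max(0.0, 1.0 - abs(x))


def weight_grid(num_points):
    """Midpoint grid on [-1, 1] with increments dW(u) = sqrt(12) * pdf_N01(sqrt(12) u) du."""
    h = 2.0 / num_points
    grid = []
    for g in range(num_points):
        u = -1.0 + (g + 0.5) * h
        x = math.sqrt(12.0) * u
        dw = math.sqrt(12.0) * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) * h
        grid.append((u, dw))
    return grid


def m1_statistic(z_hat, p, num_points=4):
    """Numerical approximation of M_1 as written above (Bartlett kernel, fixed p)."""
    n = len(z_hat)
    grid = weight_grid(num_points)
    # expo[a][t] = exp(i * u_a * Z_t), cached for every grid point
    expo = [[cmath.exp(1j * u * z) for z in z_hat] for u, _ in grid]
    lags = [j for j in range(1, n) if bartlett(j / p) > 0.0]
    k2 = {j: bartlett(j / p) ** 2 for j in lags}
    quad = 0.0
    for a, (u, dwu) in enumerate(grid):
        for b, (v, dwv) in enumerate(grid):
            target = phi_uniform(u) * phi_uniform(v)
            for j in lags:
                phi_j = sum(expo[a][t] * expo[b][t - j] for t in range(j, n)) / (n - j)
                quad += k2[j] * (n - j) * abs(phi_j - target) ** 2 * dwu * dwv
    centering = sum((1.0 - phi_uniform(u) ** 2) * dw for u, dw in grid) ** 2
    return quad / sum(k2.values()) - centering


random.seed(2)
iid = [random.random() - 0.5 for _ in range(4000)]
trending = sorted(iid)  # same marginal distribution, extreme serial dependence

m1_iid = m1_statistic(iid, p=5)
m1_trending = m1_statistic(trending, p=5)
print(m1_iid, m1_trending)
```

The sorted series has exactly the same uniform marginal distribution as the iid series, so any increase in the statistic comes from serial dependence alone, which is the omnibus behavior the text describes.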

The most attractive feature of $M_1$ is its omnibus property: it has power against a wide range of suboptimal density forecasts, thanks to the use of the characteristic function in a spectral framework. The $M_1$ statistic can be viewed as an omnibus metric measuring the departure of the forecast model from the true data-generating process. A better density forecast model is expected to have a smaller value for $M_1$, because its generalized residual series $\{Z_t\}$ is closer to having the iid $U[0, 1]$ property. Thus $M_1$ can be used to rank competing density forecast models in terms of their deviations from optimality. Moreover, the kernel function provides flexible weighting for various lags. Nonuniform weighting kernels usually discount higher-order lags. For most, if not all, economic and financial time series, today's behaviors are more affected by recent events than by remote events. Thus downward weighting for higher-order lags is expected to yield good power in finite samples.

A.3 Separate Inferences

When a model is rejected using $M_1$, it is interesting to explore possible reasons for the rejection. Hong (2003) provided a class of rigorous separate inference statistics that exploit the information contained in modeling generalized residuals $\{Z_t\}$ to understand possible reasons for model rejection. The $M(m, l)$ statistic is based on a kernel estimator of the generalized spectral density derivative $f^{(0,m,l)}(\omega, 0, 0)$, where $m$ and $l$ are the orders of the derivatives with respect to $u$ and $v$. The statistic $M(m, l)$ is defined as

$$M(m, l) = \left[ \sum_{j=1}^{n-1} (n - j)\, k^2(j/p)\, \hat{\rho}_j^2(m, l) - \sum_{j=1}^{n-1} k^2(j/p) \right] \Bigg/ \left[ 2 \sum_{j=1}^{n-2} k^4(j/p) \right]^{1/2},$$

where $\hat{\rho}_j(m, l)$ is the sample cross-correlation between $\hat{Z}_t^m$ and $\hat{Z}_{t-|j|}^l$, $t = R + 1, \ldots, N$. Under $H_0$ and a set of regularity conditions,

$$M(m, l) \xrightarrow{d} N(0, 1)$$

for any given $(m, l)$ as $n, R \to \infty$. The upper-tailed $N(0, 1)$ critical values at the 10%, 5%, and 1% levels are 1.28, 1.65, and 2.33. As in the $M_1$ test, Hong (2003) considered a data-driven method for $p$ that also involves the choice of a preliminary lag order $\bar{p}$.

Although the moments of the generalized residuals $\{Z_t\}$ are not exactly the same as those of the original data $\{r_t\}$, they are highly correlated. In particular, the choice of $(m, l) = (1, 1), (2, 2), (3, 3)$, and $(4, 4)$ is very sensitive to autocorrelations in level, volatility, skewness, and kurtosis of $\{r_t\}$. Furthermore, the choices of $(m, l) = (1, 2)$ and $(2, 1)$ are sensitive to the "ARCH-in-mean" effect and the "leverage" effect. Different choices of order $(m, l)$ can thus allow examination of various dynamic aspects of the underlying process.
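The separate inference statistic is simple enough to sketch directly. The code below assumes the Bartlett kernel, a hand-picked lag order `p`, and a sample cross-correlation normalized by the full-sample variances (one common convention, chosen here as an assumption); the function names and the artificial series are illustrative. The `clustered` series has random signs but persistent magnitudes, so its level is serially uncorrelated while its squares are not.

```python
import math
import random


def bartlett(x):
    """Bartlett kernel."""
    return max(0.0, 1.0 - abs(x))


def cross_correlation(z_hat, m, l, j):
    """Sample cross-correlation between Z_t^m and Z_{t-j}^l for j >= 0."""
    n = len(z_hat)
    x = [z ** m for z in z_hat]
    y = [z ** l for z in z_hat]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((x[t] - mean_x) * (y[t - j] - mean_y) for t in range(j, n)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / math.sqrt(var_x * var_y)


def m_statistic(z_hat, m, l, p):
    """M(m, l): centered and scaled kernel-weighted sum of squared cross-correlations."""
    n = len(z_hat)
    weighted = 0.0
    sum_k2 = 0.0
    for j in range(1, n):
        k2 = bartlett(j / p) ** 2
        if k2 == 0.0:
            continue
        sum_k2 += k2
        weighted += (n - j) * k2 * cross_correlation(z_hat, m, l, j) ** 2
    scale = math.sqrt(2.0 * sum(bartlett(j / p) ** 4 for j in range(1, n - 1)))
    return (weighted - sum_k2) / scale


random.seed(3)
iid = [random.random() - 0.5 for _ in range(2000)]
magnitudes = [0.5 * random.random() for _ in range(1000)]
# Each magnitude is used twice with independent random signs.
clustered = [random.choice((-1.0, 1.0)) * a for a in magnitudes for _ in range(2)]

m11_iid = m_statistic(iid, 1, 1, p=10)
m11_clustered = m_statistic(clustered, 1, 1, p=10)
m22_clustered = m_statistic(clustered, 2, 2, p=10)
print(m11_iid, m11_clustered, m22_clustered)
```

In this toy setting $M(2, 2)$ is far above the upper-tailed normal critical values for the clustered series while $M(1, 1)$ is not, which mirrors how the statistics are used to separate neglected volatility dynamics from neglected mean dynamics.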

[Received June 2002. Revised November 2003.]

REFERENCES

Ait-Sahalia, Y. (1996), "Testing Continuous-Time Models of the Spot Interest Rate," Review of Financial Studies, 9, 385–426.
Ait-Sahalia, Y. (1999), "Transition Densities for Interest Rate and Other Nonlinear Diffusions," Journal of Finance, 54, 1361–1395.
Ait-Sahalia, Y. (2002), "Maximum-Likelihood Estimation of Discretely Sampled Diffusions: A Closed-Form Approach," Econometrica, 70, 223–262.
Ait-Sahalia, Y., and Lo, A. (1998), "Nonparametric Estimation of State-Price Densities Implicit in Financial Asset Prices," Journal of Finance, 53, 499–547.
Andersen, T. G., and Lund, J. (1997), "Estimating Continuous Time Stochastic Volatility Models of the Short-Term Interest Rate," Journal of Econometrics, 77, 343–378.
Ang, A., and Bekaert, G. (2002), "Regime Switches in Interest Rates," Journal of Business & Economic Statistics, 20, 163–182.
Bali, T. (2000), "Modeling the Conditional Mean and Variance of the Short Rate Using Diffusion, GARCH, and Moving Average Models," Journal of Futures Markets, 20, 717–751.
Bali, T. (2001), "Nonlinear Parametric Models of the Short-Term Interest Rate," unpublished manuscript, Baruch College.
Ball, C., and Torous, W. (1998), "Regime Shifts in Short-Term Riskless Interest Rates," unpublished manuscript, University of California, Los Angeles.
Bera, A. K., and Ghosh, A. (2002), "Density Forecast Evaluation With Applications in Finance," unpublished manuscript, University of Illinois at Champaign, Dept. of Economics.
Berkowitz, J. (2001), "Testing the Accuracy of Density Forecasts With Applications to Risk Management," Journal of Business & Economic Statistics, 19, 465–474.
Bliss, R., and Smith, D. (1998), "The Elasticity of Interest Rate Volatility: Chan, Karolyi, Longstaff, and Sanders Revisited," unpublished manuscript, Federal Reserve Bank of Atlanta.
Brenner, R., Harjes, R., and Kroner, K. (1996), "Another Look at Alternative Models of Short-Term Interest Rate," Journal of Financial and Quantitative Analysis, 31, 85–107.
Chacko, G., and Viceira, L. M. (2003), "Spectral GMM Estimation of Continuous-Time Processes," Journal of Econometrics, 116, 259–292.
Chan, K. C., Karolyi, G. A., Longstaff, F. A., and Sanders, A. B. (1992), "An Empirical Comparison of Alternative Models of the Short-Term Interest Rate," Journal of Finance, 47, 1209–1228.
Chapman, D., and Pearson, N. (2001), "Recent Advances in Estimating Models of the Term-Structure," Financial Analysts Journal, 57, 77–95.
Christoffersen, P., and Diebold, F. X. (1996), "Further Results on Forecasting and Model Selection Under Asymmetric Loss," Journal of Applied Econometrics, 11, 561–572.
Christoffersen, P., and Diebold, F. X. (1997), "Optimal Prediction Under Asymmetric Loss," Econometric Theory, 13, 808–817.
Clements, M. P., and Smith, J. (2000), "Evaluating the Forecast Densities of Linear and Non-Linear Models: Applications to Output Growth and Unemployment," Journal of Forecasting, 19, 255–276.
Conley, T. G., Hansen, L. P., Luttmer, E. G. J., and Scheinkman, J. A. (1997), "Short-Term Interest Rates as Subordinated Diffusions," Review of Financial Studies, 10, 525–578.
Corradi, V. (2000), "Reconsidering the Continuous-Time Limit of the GARCH(1,1) Process," Journal of Econometrics, 96, 145–153.
Cox, J. C., Ingersoll, J. E., and Ross, S. A. (1985), "A Theory of the Term Structure of Interest Rates," Econometrica, 53, 385–407.
Dai, Q., and Singleton, K. (2000), "Specification Analysis of Affine Term Structure Models," Journal of Finance, 55, 1943–1978.
Das, S. R. (2002), "The Surprise Element: Jumps in Interest Rates," Journal of Econometrics, 106, 27–65.
Diebold, F. X., Gunther, T. A., and Tay, A. S. (1998), "Evaluating Density Forecasts With Applications to Financial Risk Management," International Economic Review, 39, 863–883.
Diebold, F. X., Hahn, J., and Tay, A. S. (1999), "Multivariate Density Forecast Evaluation and Calibration in Financial Risk Management: High-Frequency Returns of Foreign Exchange," Review of Economics and Statistics, 81, 661–673.
Diebold, F. X., Tay, A. S., and Wallis, K. F. (1999), "Evaluating Density Forecasts of Inflation: The Survey of Professional Forecasters," in Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W. J. Granger, eds. R. F. Engle and H. White, Oxford, U.K.: Oxford University Press, pp. 76–90.


Dothan, L. U. (1978), "On the Term Structure of Interest Rates," Journal of Financial Economics, 6, 59–69.
Duffee, G. (2002), "Term Premia and Interest Rate Forecasts in Affine Models," Journal of Finance, 57, 405–443.
Duffie, D., and Pan, J. (1997), "An Overview of Value at Risk," Journal of Derivatives, 4, 13–32.
Durham, G. (2003), "Likelihood-Based Specification Analysis of Continuous-Time Models of the Short-Term Interest Rate," Journal of Financial Economics, 70, 463–487.
Durlauf, S. (1991), "Spectral Based Testing of the Martingale Hypothesis," Journal of Econometrics, 50, 355–376.
Fackler, P. L., and King, R. P. (1990), "Calibration of Option-Based Probability Assessments in Agricultural Commodity Markets," American Journal of Agricultural Economics, 72, 73–83.
Gallant, A. R., and Tauchen, G. (1998), "Reprojection Partially Observed Systems With Applications to Interest Rate Diffusions," Journal of the American Statistical Association, 93, 10–24.
Ghysels, E., Carrasco, M., Chernov, M., and Florens, J. (2001), "Estimating Diffusions With a Continuum of Moment Conditions," unpublished manuscript, University of North Carolina.
Granger, C. (1969), "Testing for Causality and Feedback," Econometrica, 37, 424–438.
Granger, C. (1999), "The Evaluation of Econometric Models and of Forecasts," invited lecture at the Far Eastern Meeting of the Econometric Society, Singapore.
Granger, C., and Pesaran, M. H. (2000), "A Decision Theoretic Approach to Forecasting Evaluation," in Statistics and Finance: An Interface, eds. W. S. Chan, W. K. Li, and H. Tong, London: Imperial College, pp. 261–278.
Gray, S. (1996), "Modeling the Conditional Distribution of Interest Rates as a Regime-Switching Process," Journal of Financial Economics, 42, 27–62.
Hamilton, J. D. (1989), "A New Approach to the Econometric Analysis of Nonstationary Time Series and the Business Cycle," Econometrica, 57, 357–384.
Hong, Y. (1999), "Hypothesis Testing in Time Series via the Empirical Characteristic Function: A Generalized Spectral Density Approach," Journal of the American Statistical Association, 94, 1201–1220.
Hong, Y. (2003), "Evaluation of Out-of-Sample Density Forecasts With Applications to Stock Prices," unpublished manuscript, Cornell University, Dept. of Economics and Dept. of Statistical Science.
Hong, Y., and Lee, T. H. (2003a), "Inference on the Predictability of Foreign Exchange Rate Changes via Generalized Spectrum and Nonlinear Models," Review of Economics and Statistics, 84, 1048–1062.
Hong, Y., and Lee, T. H. (2003b), "Diagnostic Checking for Nonlinear Time Series Models," Econometric Theory, 19, 1065–1121.
Hong, Y., and Li, H. (2004), "Nonparametric Specification Testing for Continuous-Time Models With Applications to Interest Rate Term Structures," Review of Financial Studies, forthcoming.
Jackwerth, J. C., and Rubinstein, M. (1996), "Recovering Probability Distributions From Option Prices," Journal of Finance, 51, 1611–1631.
Jiang, G., and Knight, J. (1997), "A Nonparametric Approach to the Estimation of Diffusion Processes With an Application to a Short-Term Interest Rate Model," Econometric Theory, 13, 615–645.
Johannes, M. (2003), "The Statistical and Economic Role of Jumps in Interest Rates," Journal of Finance, 59, 227–260.
Jorion, P. (2000), Value at Risk: The New Benchmark for Managing Financial Risk, New York: McGraw-Hill.
Morgan, J. P. (1996), RiskMetrics: Technical Document (4th ed.), New York: Morgan Guaranty Trust Company.
Koedijk, K. G., Nissen, F. G., Schotman, P. C., and Wolff, C. C. P. (1997), "The Dynamics of Short-Term Interest Rate Volatility Reconsidered," European Finance Review, 1, 105–130.
Li, H., and Xu, Y. (2000), "Regime Shifts and Short Rate Dynamics," unpublished manuscript, Cornell University.
Lo, A., and MacKinlay, A. C. (1989), "Data-Snooping Biases in Tests of Financial Asset Pricing Models," Review of Financial Studies, 3, 175–208.
Lukacs, E. (1970), Characteristic Functions (2nd ed.), London: Charles Griffin.
Meese, R., and Rogoff, K. (1983), "Empirical Exchange Rate Models of the Seventies: Do They Fit Out-of-Sample?" Journal of International Economics, 14, 3–24.
Nelson, D. (1990), "ARCH Models as Diffusion Approximation," Journal of Econometrics, 45, 7–39.
Nelson, D. (1991), "Conditional Heteroskedasticity in Asset Returns: A New Approach," Econometrica, 59, 347–370.
Rosenblatt, M. (1952), "Remarks on a Multivariate Transformation," The Annals of Mathematical Statistics, 23, 470–472.
Rothschild, M., and Stiglitz, J. (1970), "Increasing Risk: I. A Definition," Journal of Economic Theory, 2, 225–243.
Singleton, K. (2001), "Estimation of Affine Asset Pricing Models Using the Empirical Characteristic Function," Journal of Econometrics, 102, 111–141.
Soderlind, P., and Svensson, L. (1997), "New Techniques to Extract Market Expectations From Financial Instruments," Journal of Monetary Economics, 40, 383–429.
Stanton, R. (1997), "A Nonparametric Model of Term Structure Dynamics and the Market Price of Interest Rate Risk," Journal of Finance, 52, 1973–2002.
Vasicek, O. (1977), "An Equilibrium Characterization of the Term Structure," Journal of Financial Economics, 5, 177–188.
Watson, M. (1993), "Measures of Fit for Calibrated Models," Journal of Political Economy, 101, 1011–1041.
White, H. (2000), "A Reality Check for Data Snooping," Econometrica, 69, 1097–1127.


Table 4. Parameter Estimates for the Markov Regime-Switching Models

Models: (1) No drift CEV; (2) Linear drift CEV; (3) Nonlinear drift CEV; (4) No drift GARCH; (5) Linear drift GARCH; (6) Nonlinear drift GARCH; (7) No drift CEV–GARCH; (8) Linear drift CEV–GARCH; (9) Nonlinear drift CEV–GARCH

Parameters            (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)       (9)
α−1(1)                                   0607                          3321                          2646
                                       (3569)                        (1701)                        (1605)
α0(1)                          0973      1517                0457     −1462                0450     −1086
                             (0259)    (2027)              (0107)    (1009)              (0105)    (0989)
α1(1)                         −0102     −0288               −0043      0274               −0042      0209
                             (0031)    (0321)              (0019)    (0181)              (0019)    (0185)
α2(1)                                    0011                         −0015                         −0011
                                       (0014)                        (0010)                        (0011)
σ(1)                 3361      3171      3106         1         1         1         1         1         1
                   (0302)    (0287)    (0281)
ρ(1)                 1021      1277       138         0         0         0      1193      1589      1670
                    (044)    (0446)    (0448)                                  (0452)    (0460)    (0487)
α−1(2)                                  −0352                         −0267                         −0286
                                       (0329)                        (0242)                        (0183)
α0(2)                         −0002      0141               −0013      0062               −0020      0062
                             (0020)    (0229)              (0017)    (0153)              (0018)    (0104)
α1(2)                         −0006     −0012               −0008     −0001               −0007      0005
                             (0004)    (0048)              (0004)    (0031)              (0004)    (0020)
α2(2)                                   −0001                         −0002                         −0002
                                       (0003)                        (0002)                        (0001)
σ(2)                 0143      0141      0138      2566      2567      2574      2769      3034      3051
                   (0010)    (0010)    (0010)     (006)     (006)     (006)    (0275)    (0303)    (0311)
ρ(2)                 8122      8179      8257         0         0         0      0844      0727      0800
                   (0403)    (0422)    (0421)                                  (0602)    (0602)    (0611)
β0                                                383E−4    318E−4    296E−4    321E−4    243E−4    225E−4
                                                (711E−5)  (614E−5)  (579E−5)   (67E−4)   (50E−4)   (47E−4)
β1                                                  0357      0336      0335      0265      0259      0252
                                                   (004)    (0039)    (0038)    (0057)    (0055)    (0053)
β2                                                  9027      9066      9077      8972      8993      9001
                                                   (009)    (0089)    (0086)    (0099)    (0098)    (0094)
c1                   3132      3001      3171    −12515    −12007    −11293    −11954    −11429    −10012
                   (2734)     (273)    (2683)    (3211)    (3065)    (3136)    (3819)    (3705)    (3674)
d1                   1081       107      1029      1303      1295      1192      1190      1193      0959
                   (0391)     (039)    (0381)     (044)    (0426)    (0447)    (0594)    (0582)    (0593)
c2                  40963     40342     40089     18864     17868     17633     18227     17299     16947
                   (2386)    (2393)    (2313)    (2048)    (1947)    (1909)    (2092)    (1983)    (2009)
d2                  −2302      −225     −2225     −0938     −0806     −0776     −0906     −0810     −0757
                   (0354)    (0354)    (0339)    (0276)    (0263)    (0257)    (0285)    (0276)    (0283)
Log-likelihood      64931     65079     65128     70790     71328     71382     70828     71386     71441

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}(s_t)/r_{t-1} + \alpha_0(s_t) + \alpha_1(s_t) r_{t-1} + \alpha_2(s_t) r_{t-1}^2 + \sigma(s_t)\, r_{t-1}^{\rho(s_t)} \sqrt{h_t}\, z_t$, where $h_t = \beta_0 + \beta_1 E\{e_t \mid r_{t-2}, s_{t-2}\}^2 + \beta_2 h_{t-1}$, $e_t = [r_{t-1} - E(r_{t-1} \mid r_{t-2}, s_{t-1})]/\sigma(s_{t-1})$, $z_t \sim$ iid $N(0, 1)$, and $s_t$ follows a two-state Markov chain with transition probability $\Pr(s_t = l \mid s_{t-1} = l) = (1 + \exp(-c_l - d_l r_{t-1}))^{-1}$ for $l = 1, 2$.
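To make the state-dependent transition probability in the note concrete, the following sketch simulates a stripped-down version of the specification: linear drift and level effect only, with $h_t$ fixed at 1 (no GARCH). The parameter values in `PARAMS` are hypothetical round numbers chosen for illustration and are not the Table 4 estimates; the function names and the positivity floor on the rate are likewise assumptions of this sketch.

```python
import math
import random


def stay_probability(c, d, r_prev):
    """Pr(s_t = l | s_{t-1} = l) = 1 / (1 + exp(-c_l - d_l * r_{t-1}))."""
    return 1.0 / (1.0 + math.exp(-c - d * r_prev))


def simulate(n_steps, r0, params, seed=0):
    """Simulate a two-regime rate path with linear drift and level effect, h_t = 1."""
    rng = random.Random(seed)
    r, s = r0, 2
    rates, states = [r], [s]
    for _ in range(n_steps):
        # Stay in the current regime with the logistic probability, else switch.
        if rng.random() > stay_probability(params[s]["c"], params[s]["d"], r):
            s = 3 - s  # toggles between regimes 1 and 2
        p = params[s]
        drift = p["a0"] + p["a1"] * r
        shock = p["sigma"] * r ** p["rho"] * rng.gauss(0.0, 1.0)
        r = max(r + drift + shock, 1e-4)  # crude floor keeps r positive (sketch only)
        rates.append(r)
        states.append(s)
    return rates, states


# Hypothetical values: regime 1 is volatile and mean reverting,
# regime 2 is a persistent low-volatility near-random-walk.
PARAMS = {
    1: {"a0": 1.0, "a1": -0.1, "sigma": 0.3, "rho": 0.1, "c": 0.0, "d": 0.05},
    2: {"a0": 0.0, "a1": 0.0, "sigma": 0.02, "rho": 0.8, "c": 5.0, "d": -0.1},
}

rates, states = simulate(5000, 6.0, PARAMS)
share_low_vol = states.count(2) / len(states)
print(share_low_vol)
```

With these illustrative values the chain spends most of its time in the low-volatility regime, because its probability of staying put is much closer to 1 than that of the high-volatility regime.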

slightly larger than 1, it is still possible that the spot rate model is strictly stationary (see Nelson 1991 for more discussion). We also find a significant level effect even in the presence of GARCH, but the estimated elasticity parameter is close to .20, which is much smaller than that of single-factor diffusion models. The specification of conditional variance also affects the estimation of drift parameters. Unlike those in diffusion models, most drift parameter estimates in linear drift GARCH models are insignificant. Among all GARCH models considered, the one with nonlinear drift and level effect has the best in-sample performance. In the 1979–1982 period, the interest rate volatility is significantly higher, but its dependence on the interest rate level is not significantly different from other periods. The 1987 stock market crash dummy is again significantly negative.

Parameter estimates of regime-switching models, given in Table 4, show that the spot rate behaves quite differently between two regimes. For models with a linear drift, in the first regime the spot rate has a high long-run mean (about 10%) and exhibits strong mean reversion (i.e., estimates of the mean-reversion speed parameter are significantly negative). The spot rate in the second regime behaves almost like a random walk, because most drift parameter estimates are close to 0 and insignificant. For models with a nonlinear drift, all drift parameters are insignificant. Volatility in the first regime is much higher, about three times that in the second regime. Our estimates show that level effect, although significant in both regimes, is much stronger in the second regime in the absence of GARCH. After including GARCH, the elasticity parameter estimate becomes insignificant in the second regime but remains the same in the first regime. Apparently, regime switching helps capture volatility clustering: the sum of GARCH parameters, $\beta_1 + \beta_2$, is smaller in regime-switching models than in pure GARCH models. The estimated transition probabilities of the Markov state variable $s_t$ show that the low-volatility regime is much more persistent than the high-volatility regime. Our results suggest that the spot rate evolves like a random walk with low volatility most of the time, occasionally increasing and evolving with strong mean reversion and high volatility. Overall, the regime-switching model with a linear drift in each regime, CEV, and GARCH has the best in-sample performance.


Table 5. Parameter Estimates for the Jump-Diffusion Models

Models: (1) No drift CEV; (2) Linear drift CEV; (3) Nonlinear drift CEV; (4) No drift GARCH; (5) Linear drift GARCH; (6) Nonlinear drift GARCH; (7) No drift CEV–GARCH; (8) Linear drift CEV–GARCH; (9) Nonlinear drift CEV–GARCH

Parameters            (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)       (9)
α−1                                      0264                          0702                          0661
                                       (0293)                        (0384)                        (0380)
α0                            −0011     −0340                0028     −0506                0026     −0486
                             (0169)    (0203)              (0019)    (0256)              (0020)    (0253)
α1                            −0007      0101               −0013      0110               −0012      0107
                             (0004)    (0043)              (0004)    (0053)              (0004)    (0052)
α2                                      −0009                         −0008                         −0008
                                       (0003)                        (0003)                        (0003)
σ                    0127      0126      0124
                   (0008)    (0008)    (0008)
ρ                    8936      8975      9045                                    0929      0852      0837
                   (0390)    (0397)    (0400)                                  (0518)    (0510)    (0512)
β0                                                87E−05    88E−05    87E−05     76E−5     77E−5     77E−5
                                                (12E−05)  (12E−05)  (12E−05)   (12E−5)   (12E−5)   (12E−5)
β1                                                  0631      0640      0635      0450      0472      0472
                                                  (0076)    (0073)    (0073)    (0101)    (0102)    (0102)
β2                                                  8514      8484      8499      8504      8471      8483
                                                  (0139)    (0134)    (0134)    (0140)    (0135)    (0134)
c                  −27404    −27057    −26972    −37908    −37470    −37481    −36908    −36498    −36518
                   (1608)    (1605)    (1614)    (1982)    (1970)    (1976)    (2097)    (2084)    (2093)
d                    1451      1423      1397      2554      2516      2509      2293      2274      2272
                   (0275)    (0275)    (0275)    (0307)    (0305)    (0307)    (0343)    (0340)    (0341)
μ                    0233      0385      0321      0700      0887      0895      0679      0892      0897
                   (0133)    (0142)    (0128)    (0152)    (0158)    (0158)    (0162)    (0164)    (0163)
γ                    4316      4279      4304      3928      3903      3929      4109      4046      4061
                   (0117)    (0120)    (0104)    (0197)    (0189)    (0187)    (0199)    (0182)    (0178)
αD                            −0094      0296               −0192      0057               −0197      0055
                             (0121)    (0112)              (0117)    (0154)              (0112)    (0150)
σD                  −1027     −1328     −1551
                   (1142)    (1054)    (1033)
ρD                   1473      1872     −0697                                   −1220     −1186     −1277
                   (1079)    (0703)    (0722)                                  (0769)    (0573)    (0564)
cD                  41604     43364     40036     40290     42505     41857     38622     40677     40272
                   (6523)    (6535)    (5477)    (7495)    (7546)    (7421)    (6911)    (7129)    (7019)
dD                  −2441     −2721     −1836     −2748     −2860     −2791     −2320     −2487     −2445
                   (0808)    (0730)    (0534)    (0689)    (0682)    (0667)    (0685)    (0676)    (0662)
D87                 −4283     −4253     −4266     −4226     −4198     −4217     −4233     −4209     −4225
                   (0646)    (0641)    (0643)    (0826)    (0820)    (0815)    (0793)    (0793)    (0791)
Log-likelihood      63063     63172     63263     67941     68108     68139     67963     68129     68161

NOTE: The models are nested by the following specification:

$$\Delta r_t = \alpha_{-1} r_{t-1}^{-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (\sigma + \sigma_D)\, r_{t-1}^{(\rho + \rho_D)} \sqrt{h_t}\, z_t + J(\mu, \gamma^2)\, \pi_t(q_t + q_D) + D_{87},$$

where $h_t = \beta_0 + \beta_1 [r_{t-1} - E(r_{t-1} \mid r_{t-2})]^2 + \beta_2 h_{t-1}$, $z_t \sim$ iid $N(0, 1)$, $J \sim N(\mu, \gamma^2)$, and $\pi_t(q_t + q_D)$ is Bernoulli$(q_t + q_D)$ with $q_t + q_D = (1 + \exp(-c - c_D - (d + d_D) \cdot r_{t-1}))^{-1}$.
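To make the nested specification in the note to Table 5 concrete, the following is a minimal simulation sketch, not the authors' estimation code. It assumes the left-hand side is the change in the rate, omits the 1979–1982 dummies and the D87 term, starts the GARCH recursion at β0/(1 − β1 − β2), takes E(r_{t−1} | r_{t−2}) to be the lagged rate plus drift plus expected jump, and floors the rate at a small positive number. The function names and all parameter values are hypothetical and are not the estimates in Table 5.

```python
import numpy as np

def jump_probability(r_prev, c, d):
    """Logistic jump probability q_t = 1 / (1 + exp(-c - d * r_{t-1}))."""
    return 1.0 / (1.0 + np.exp(-c - d * r_prev))

def simulate_nested_model(r0, n_steps, p, seed=0):
    """Simulate one path of the drift + CEV-GARCH diffusion + jump specification."""
    rng = np.random.default_rng(seed)
    r = np.empty(n_steps + 1)
    r[0] = r0
    # Assumption: start h at beta0 / (1 - beta1 - beta2), a convenient initial level.
    h = p["beta0"] / (1.0 - p["beta1"] - p["beta2"])
    surprise = 0.0  # r_{t-1} - E(r_{t-1} | r_{t-2}); zero before the first step
    for t in range(1, n_steps + 1):
        prev = r[t - 1]
        # GARCH recursion h_t = beta0 + beta1 * surprise_{t-1}^2 + beta2 * h_{t-1}
        h = p["beta0"] + p["beta1"] * surprise**2 + p["beta2"] * h
        drift = (p["alpha_m1"] / prev + p["alpha0"]
                 + p["alpha1"] * prev + p["alpha2"] * prev**2)
        q = jump_probability(prev, p["c"], p["d"])
        diffusion = p["sigma"] * prev ** p["rho"] * np.sqrt(h) * rng.standard_normal()
        jump = rng.normal(p["mu"], p["gamma"]) if rng.random() < q else 0.0
        # Assumption: floor the rate so that prev ** rho stays defined.
        r[t] = max(prev + drift + diffusion + jump, 1e-8)
        surprise = r[t] - (prev + drift + q * p["mu"])
    return r

# Hypothetical parameter values chosen only to produce a stable path.
params = dict(alpha_m1=0.0, alpha0=0.02, alpha1=-0.004, alpha2=0.0,
              sigma=1.0, rho=0.1, beta0=1e-4, beta1=0.05, beta2=0.9,
              c=-3.0, d=0.2, mu=0.05, gamma=0.4)
path = simulate_nested_model(5.0, 500, params, seed=1)
```

Setting `beta1 = beta2 = 0` with `beta0 = 1` gives the pure CEV jump-diffusion case, and `rho = 0` gives the GARCH case, which mirrors how the columns of Table 5 are nested.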

Table 5 reports parameter estimates for discretized jump-diffusion models. There is some weak evidence of mean reversion, especially when there is GARCH. The contribution of nonlinear drift is again very marginal; most estimated drift parameters are insignificant. For jump-diffusion models without GARCH, the elasticity parameter estimate is about .9, which is closer to the 1.5 found by CKLS. However, the level effect is weakened (i.e., ρ becomes close to .1) after GARCH is introduced. It seems that CEV helps capture part of volatility clustering, but its importance diminishes in the presence of GARCH. We find that GARCH also significantly improves the performance of jump-diffusion models. Without GARCH, there is a high probability of small jumps, with an estimated mean jump size between 2 and 4. With GARCH, the jump probability becomes smaller, but the mean jump size increases to between about 7 and 9. This suggests that without GARCH, jumps can capture part of volatility clustering, but with GARCH, jumps mainly capture large interest rate movements. Of course, jumps also help explain volatility clustering; the sum of GARCH parameters is again much smaller than that in pure GARCH models. During the 1979–1982 period, other than the probability of jumps becoming substantially higher, other aspects of the jump-diffusion models (e.g., the drift, volatility, or elasticity parameter) do not behave very differently.

To sum up, our in-sample analysis reveals some important stylized facts for the spot rate:

1. The importance of modeling mean reversion is ambiguous. It seems that linear drift is adequate for this purpose once GARCH, regime switching, or jumps are included. The contribution of Ait-Sahalia's (1996) type of nonlinear drift is very marginal.

2. It is important to model conditional heteroscedasticity through GARCH or level effect, although GARCH seems to have much better in-sample performance than CEV.

3. Regime switching and jumps help capture volatility clustering and especially the excess kurtosis and heavy tails of the interest rate.

4. Interest rates behave quite differently during the 1979–1982 period. The level and volatility of the interest rate, the dependence of the interest rate volatility on the interest rate level, and the probability of jumps are much higher during this period.

Hong Li and Zhao Out-of-Sample Analysis of Spot Rate Models 467

5. OUT-OF-SAMPLE DENSITY FORECAST PERFORMANCE

The foregoing in-sample analysis demonstrates that modeling volatility clustering through GARCH, and heavy tails and excess kurtosis through regime switching or jumps, significantly improves the in-sample fit of historical interest rate data. However, it is not clear whether these models will also perform well in out-of-sample density forecasts for future interest rates. Indeed, some previous studies have shown that more complicated models actually underperform the simple random walk model in predicting the conditional mean of future interest rates (e.g., Duffee 2002). In many financial applications, we are interested in forecasting the whole conditional distribution. We now apply the generalized spectral evaluation method described in Section 2 to evaluate density forecasts of the models under study. Among other things, we examine whether the features found to be important for in-sample performance remain important for out-of-sample forecasts, and whether the best in-sample performing models still perform best in out-of-sample forecasts.

For each model, we first calculate $Z_t(\hat{\theta})$, the model generalized residuals, using the prediction sample, with model parameter estimates based on the estimation sample. Table 6 reports the out-of-sample evaluation statistic $M_1$ for each model. To compute $M_1$, we select a normal cdf $\Phi(\sqrt{12}\,u)$ for the weighting function $W(u)$ and a Bartlett kernel for the kernel function $k(z)$. To choose a lag order $p$, we use a data-driven method that minimizes the asymptotic integrated mean squared error of the modified generalized spectral density. This method involves choosing a preliminary lag order $\bar{p}$. We examine a wide range of $\bar{p}$ in our application. We obtain similar results for the preliminary lag order $\bar{p}$ between 10 and 30 and report results for $\bar{p} = 20$ in Table 6. For comparison, we also report the in-sample log-likelihood values obtained from the estimation sample.

The $M_1$ statistics show that all single-factor diffusion models are overwhelmingly rejected. One of the most interesting findings is that models that include a drift term (either linear or nonlinear) have much worse out-of-sample performance than those without a drift. Dothan's (1978) model and the pure CEV model have the best density forecast performance. Both models have a zero drift and level effect; the elasticity parameter is ρ = 1 for the former and .25 (.60 for the 1979–1982 period) for the latter. On the other hand, although the CKLS model and Ait-Sahalia's (1996) nonlinear drift model have the highest in-sample likelihood values, they have the worst out-of-sample density forecasts in terms of $M_1$. This seems to suggest that for the purpose of density forecasting, it is much more important to model the diffusion function than the drift function. Of course, this does not necessarily mean that the drift is not important.

Table 6. Out-of-Sample Density Forecast Performance of Spot Interest Rate Models

| Model | $M_1$ | Log-likelihood |
|---|---|---|
| Single-factor diffusion models | | |
| Random walk | .108*** | 26496 |
| Log-normal | .121*** | 21870 |
| Dothan | .067** | 21849 |
| Pure CEV | .082** | 26545 |
| Vasicek | .213*** | 26555 |
| CIR | .219*** | 26720 |
| CKLS | .219*** | 26617 |
| Nonlinear drift | .224*** | 26625 |
| GARCH models | | |
| No drift GARCH | .079** | 56868 |
| Linear drift GARCH | .276*** | 56995 |
| Nonlinear drift GARCH | .339*** | 57063 |
| No drift CEV–GARCH | .077** | 57011 |
| Linear drift CEV–GARCH | .291*** | 57152 |
| Nonlinear drift CEV–GARCH | .358*** | 57255 |
| Regime-switching models | | |
| No drift RS CEV | .129*** | 64931 |
| Linear drift RS CEV | .091*** | 65079 |
| Nonlinear drift RS CEV | .162*** | 65128 |
| No drift RS GARCH | .118*** | 70790 |
| Linear drift RS GARCH | .047* | 71328 |
| Nonlinear drift RS GARCH | .055** | 71382 |
| No drift RS CEV–GARCH | .124*** | 70828 |
| Linear drift RS CEV–GARCH | .056** | 71386 |
| Nonlinear drift RS CEV–GARCH | .067** | 71441 |
| Jump-diffusion models | | |
| No drift JD CEV | .180*** | 63063 |
| Linear drift JD CEV | .039* | 63172 |
| Nonlinear drift JD CEV | .076** | 63263 |
| No drift JD GARCH | .178*** | 67941 |
| Linear drift JD GARCH | .056** | 68108 |
| Nonlinear drift JD GARCH | .067** | 68139 |
| No drift JD CEV–GARCH | .174*** | 67963 |
| Linear drift JD CEV–GARCH | .053** | 68129 |
| Nonlinear drift JD CEV–GARCH | .065** | 68161 |

NOTE: This table reports the out-of-sample density forecast performance of the single-factor diffusion, GARCH, Markov regime-switching, and jump-diffusion models, measured by the $M_1$ statistic. For convenience of comparison, we also report the in-sample log-likelihood value for each model. The in-sample parameter estimation is based on the observations from June 14, 1961, to February 2, 1991, and the out-of-sample density forecast evaluation is based on the observations from February 23, 1991, to December 29, 2000. The ratio between the size of the estimation sample and the forecast sample is about 3:1. In computing the out-of-sample $M_1$ statistics, a preliminary lag order $\bar{p}$ of 20 is used. Other lag orders give similar results. The asymptotic critical values for the out-of-sample evaluation statistic are .087, .051, and .037 at the 1%, 5%, and 10% levels, and ***, **, and * indicate that the $M_1$ statistic is significant at the 1%, 5%, and 10% levels.

468 Journal of Business & Economic Statistics, October 2004

It might be simply because the drift specifications considered are severely misspecified and do not outperform the zero-drift model. In fact, most drift parameter estimates have large standard errors and thus have excess sampling variation in parameter estimation. As a consequence, assuming zero drift may result in a smaller adverse effect on out-of-sample performance.

The generalized residuals $Z_t(\hat{\theta})$ contain much information about possible sources of model misspecification. Instead of being uniform, the histograms of the generalized residuals for all diffusion models (Fig. 2) exhibit a high peak in the center of the distribution, which indicates that the diffusion models cannot satisfactorily capture the marginal density of the spot rate. The separate inference statistics $M(m, l)$ in Table 7 also show that all the models fail to satisfactorily capture the dynamics of the generalized residuals. In particular, the large $M(2, 2)$ and $M(4, 4)$ statistics indicate that diffusion models perform rather poorly in modeling the conditional variance and kurtosis of the generalized residuals.

Similar to single-factor diffusion models, GARCH models with a zero drift have much better density forecasts than those models with a linear or nonlinear drift. The best GARCH and diffusion models have $M_1$ statistics roughly between .07 and .08. Comparing the best GARCH and single-factor diffusion models, we find that GARCH provides better density forecasts than CEV, and combining CEV and GARCH further improves the density forecasts. Diagnostic analysis of generalized residuals shows that GARCH models provide certain improvements over diffusion models. For example, the histograms of the generalized residuals of GARCH models have a less-pronounced peak in the center. The $M(m, l)$ statistics show that GARCH models significantly improve the fitting of even-order moments of the generalized residuals; the $M(2, 2)$ and $M(4, 4)$ statistics are reduced from close to 100 to single digits.

Figure 2. Histograms of the Generalized Residuals of Four Classes of Spot Rate Models, panels (a)–(h). This figure plots the histograms of the generalized residuals of the single-factor diffusion, GARCH, regime-switching, and jump-diffusion spot rate models.

In summary, our analysis of single-factor diffusion and GARCH models demonstrates that both classes of models fail to provide good density forecasts for future interest rates. They fail to capture both the high peak in the center of the marginal density and the dynamics of different moments of the generalized residuals. GARCH models have a more uniform marginal density and significantly improve the ability to model the dynamics of even-order moments of the generalized residuals. For both classes of models, modeling the conditional variance through CEV or GARCH is much more important than modeling the conditional mean of the interest rate. Zero-drift models outperform either linear or nonlinear drift models in density forecasts, although the $M(1, 1)$ statistics suggest that neither model adequately captures the dynamic structure in the conditional mean of the generalized residuals.

We next turn to regime-switching models. Regime-switching models generally have better density forecasts than single-factor diffusion and GARCH models; the best regime-switching models reduce the $M_1$ statistics of the best diffusion and GARCH models from above .07 to about .05. Although modeling the drift does not improve density forecasts for diffusion and GARCH models, the best regime-switching models in density forecasts have a linear drift in each regime. As pointed out by Ang and Bekaert (2002) and Li and Xu (2000), a regime-switching model with a linear drift in each regime actually implies nonlinear conditional mean dynamics for the interest rate. Thus our results suggest that perhaps a specific nonlinear drift is needed for out-of-sample density forecasts. GARCH is more important than CEV for density forecasts, and combining CEV with GARCH does not further improve model performance. Overall, the regime-switching GARCH model with a linear drift in each regime provides the best out-of-sample forecasts among all of the regime-switching models. Diagnostic analysis shows that regime-switching models provide a much better characterization of the marginal density of interest rates; the histograms of the generalized residuals are much closer to U[0, 1]. The $M(2, 2)$ and $M(4, 4)$ statistics of regime-switching models are slightly higher than those of GARCH models, however, and the $M(1, 1)$ and $M(3, 3)$ statistics of regime-switching models are actually higher than those of diffusion and GARCH models. Therefore, the advantages of regime-switching models seem to come from better modeling of the marginal density rather than from the dynamics of the generalized residuals.

The density forecast performance of jump-diffusion models shares many common features with that of regime-switching models. First, they have comparable performance; the $M_1$ statistics of the best jump-diffusion models are also around .05. Second, for jump-diffusion models, the linear drift specification outperforms those with no drift or nonlinear drift for density forecasts. Third, for jump-diffusion models, GARCH generally provides better density forecasts than CEV. Fourth, jump-diffusion models have similar advantages over single-factor diffusion and GARCH models. They have much better characterization of the marginal density of the generalized residuals, but they may not necessarily perform better in modeling the dynamics of the generalized residuals, as indicated by their large $M(m, l)$ statistics. The jump model with a linear drift and level effect, although it has the smallest $M_1$ statistic, apparently fails to capture the even-order moments of the generalized residuals, with $M(2, 2)$ and $M(4, 4)$ statistics significantly higher than other jump-diffusion models that include GARCH effects.

Table 7. Separate Inference Statistics for Out-of-Sample Density Forecast Performance

| Model | M(1, 1) | M(1, 2) | M(2, 1) | M(2, 2) | M(3, 3) | M(4, 4) |
|---|---|---|---|---|---|---|
| Single-factor diffusion models | | | | | | |
| Random walk | 40.08 | 6.35 | 40.48 | 99.73 | 21.50 | 80.12 |
| Log-normal | 46.52 | 8.77 | 39.91 | 110.10 | 26.98 | 92.51 |
| Dothan | 46.04 | 8.07 | 42.62 | 108.90 | 27.02 | 91.03 |
| Pure CEV | 41.62 | 6.63 | 41.55 | 96.81 | 22.85 | 80.42 |
| Vasicek | 40.40 | 8.13 | 35.55 | 100.20 | 21.43 | 81.79 |
| CIR | 43.61 | 8.68 | 36.30 | 98.57 | 24.39 | 85.02 |
| CKLS | 42.02 | 8.55 | 35.75 | 97.89 | 22.80 | 82.77 |
| Nonlinear drift | 42.40 | 8.37 | 35.93 | 98.47 | 22.77 | 82.98 |
| GARCH models | | | | | | |
| No drift GARCH | 40.73 | 3.75 | 28.21 | 7.68 | 17.57 | 2.53 |
| Linear drift GARCH | 41.07 | 4.66 | 25.11 | 7.53 | 17.00 | 2.35 |
| Nonlinear drift GARCH | 40.71 | 4.87 | 24.70 | 7.02 | 16.63 | 2.09 |
| No drift CEV–GARCH | 41.30 | 4.10 | 28.40 | 7.46 | 19.24 | 2.74 |
| Linear drift CEV–GARCH | 41.68 | 5.11 | 25.06 | 7.09 | 18.73 | 2.44 |
| Nonlinear drift CEV–GARCH | 41.43 | 5.45 | 24.32 | 6.45 | 18.54 | 2.17 |
| Regime-switching models | | | | | | |
| No drift RS CEV | 53.16 | 3.13 | 11.31 | 7.05 | 35.02 | 10.21 |
| Linear drift RS CEV | 53.25 | 4.56 | 11.16 | 6.57 | 34.31 | 8.83 |
| Nonlinear drift RS CEV | 53.75 | 5.12 | 9.84 | 6.56 | 34.50 | 8.49 |
| No drift RS GARCH | 57.29 | 4.05 | 21.59 | 10.95 | 43.71 | 7.63 |
| Linear drift RS GARCH | 55.46 | 4.56 | 21.09 | 12.38 | 40.71 | 9.14 |
| Nonlinear drift RS GARCH | 55.69 | 4.50 | 21.05 | 12.14 | 41.13 | 8.93 |
| No drift RS CEV–GARCH | 45.35 | 3.69 | 20.91 | 10.67 | 34.25 | 7.08 |
| Linear drift RS CEV–GARCH | 44.10 | 4.86 | 20.75 | 11.99 | 32.12 | 8.87 |
| Nonlinear drift RS CEV–GARCH | 44.36 | 4.84 | 20.55 | 11.40 | 32.46 | 8.32 |
| Jump-diffusion models | | | | | | |
| No drift JD CEV | 47.38 | 5.32 | 16.76 | 85.10 | 43.01 | 99.31 |
| Linear drift JD CEV | 46.32 | 4.48 | 19.63 | 85.09 | 42.26 | 99.82 |
| Nonlinear drift JD CEV | 46.50 | 4.85 | 18.31 | 85.10 | 42.48 | 99.29 |
| No drift JD GARCH | 43.32 | 5.10 | 20.35 | 12.38 | 28.44 | 8.96 |
| Linear drift JD GARCH | 42.57 | 4.60 | 21.48 | 12.19 | 28.22 | 8.84 |
| Nonlinear drift JD GARCH | 42.62 | 4.67 | 21.42 | 12.34 | 28.24 | 8.95 |
| No drift JD CEV–GARCH | 43.55 | 5.05 | 20.51 | 13.36 | 29.31 | 10.07 |
| Linear drift JD CEV–GARCH | 42.74 | 4.54 | 21.69 | 13.00 | 28.93 | 9.76 |
| Nonlinear drift JD CEV–GARCH | 42.78 | 4.62 | 21.57 | 13.07 | 28.94 | 9.81 |

NOTE: This table reports the separate inference statistics $M(m, l)$ for the four classes of spot rate models. The asymptotically normal statistic $M(m, l)$ can be used to test whether the cross-correlation between the $m$th and $l$th moments of $\{Z_t\}$ is significantly different from 0. The choice of $(m, l) = (1, 1)$, $(2, 2)$, $(3, 3)$, and $(4, 4)$ is sensitive to autocorrelations in mean, variance, skewness, and kurtosis of $r_t$. We report results only for a lag truncation order $p = 20$; the results for $p = 10$ and 30 are rather similar. The asymptotic critical values are 1.28, 1.65, and 2.33 at the 10%, 5%, and 1% levels.

In summary, our out-of-sample density forecast evaluation reveals the following important features:

1. For out-of-sample density forecasts, the importance of modeling mean reversion in the interest rate is ambiguous. For diffusion and GARCH models, a linear or nonlinear drift gives worse density forecasts than a zero drift. However, when there are regime switches or jumps, linear drift models outperform those with zero or nonlinear drift. The best density forecast models still fail to adequately capture the mean dynamics of the generalized residuals.

2. It is important to capture volatility clustering in interest rates via GARCH for both in-sample and out-of-sample performance. Most of the best-performing models have a GARCH component, which significantly improves their ability to model the dynamics of the even-order moments of the generalized residuals.

3. It is also important to capture excess kurtosis and heavy tails of the interest rate via regime switching or jumps for both in-sample and out-of-sample performance. The advantages of regime-switching and jump models are mainly reflected in modeling the marginal density rather than in the dynamics of the generalized residuals.

Overall, our out-of-sample analysis demonstrates that more-complicated models that incorporate conditional heteroscedasticity and heavy tails of interest rates tend to have better density forecasts. This result is quite different from the results of studies that focus on conditional mean forecasts. As widely documented in the literature, the predictable component in the conditional mean of the interest rate appears insignificant. As a result, random walk models tend to outperform more-sophisticated models in terms of mean forecasts (e.g., Duffee 2002). However, density forecasts include all conditional moments, and as a result, those models that can capture the dynamics of higher-order moments tend to have better density forecasts. Our analysis suggests that the more-complicated spot rate models developed in the literature can indeed capture some important features of interest rates and are relevant in applications involving density forecasts of interest rates.


6. CONCLUSION

Despite the numerous empirical studies of spot interest rate models, little effort has been devoted to examining their out-of-sample forecast performance. We have contributed to the literature by providing the first (to our knowledge) comprehensive empirical analysis of the out-of-sample performance of a wide variety of popular spot rate models in forecasting the conditional density of future interest rates. Out-of-sample density forecasts help minimize the data snooping bias due to excessive searching for more-complicated models using similar datasets. The conditional density, which completely characterizes the full dynamics of interest rates, is also an essential input to many financial applications. Using a rigorous econometric evaluation procedure for density forecasts, we examined the out-of-sample performance of single-factor diffusion, GARCH, regime-switching, and jump-diffusion models.

Although previous studies have shown that simpler models, such as the random walk model, tend to have more accurate forecasts for the conditional mean of interest rates, we find that models that capture conditional heteroscedasticity and heavy tails of the spot rate have better out-of-sample density forecasts. GARCH significantly improves modeling of the conditional variance and kurtosis of the generalized residuals, whereas regime switching and jumps help in modeling the marginal density of interest rates. Therefore, our analysis shows that the more-sophisticated interest rate models developed in the literature can indeed capture some important features of interest rate data and perform better than simpler models in out-of-sample density forecasts. Although we have focused on density forecasts of the spot rates here, there is evidence that the term structure of interest rates is driven by multiple factors (e.g., Dai and Singleton 2000). Extending our analysis to forecast the joint density of multiple yields is an interesting issue that we will address in future research.

ACKNOWLEDGMENTS

The authors are grateful for the helpful comments of Canlin Li, Daniel Morillo, Larry Pohlman, and seminar participants at PanAgora Asset Management, the 2002 Econometric Society European Summer Meeting, and the 2003 Econometric Society North American Winter Meeting. They also thank the associate editor and two anonymous referees, whose comments and suggestions have greatly improved the article, and David George for editorial assistance. Hong's work was supported by National Science Foundation grant SES-0111769. Any errors are the authors' own.

APPENDIX: GENERALIZED SPECTRAL EVALUATION FOR DENSITY FORECASTS

Here we describe Hong's (2003) omnibus evaluation procedure for out-of-sample density forecasts, as well as a class of separate inference procedures. (For more discussion and formal treatment, see Hong 2003.)

A.1 Generalized Spectrum

Generalized spectrum was proposed by Hong (1999) as an analytic tool to provide an alternative to the power spectrum and the higher-order spectrum (e.g., bispectrum) in time series analysis. Define the centered generalized residual

$$Z_t \equiv Z_t(\theta) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \theta)\, dr - \frac{1}{2}, \qquad t = R + 1, \ldots, T,$$

where $p(r, t \mid I_{t-1}, \theta)$ is a conditional density model for time series $\{r_t\}$. Suppose that the series $\{Z_t\}_{t=-\infty}^{\infty}$ has marginal characteristic function $\varphi(u) \equiv E(e^{iuZ_t})$ and pairwise joint characteristic function $\varphi_j(u, v) \equiv E[\exp\{i(uZ_t + vZ_{t-|j|})\}]$, where $i \equiv \sqrt{-1}$, $u, v \in (-\infty, \infty)$, and $j = 0, \pm 1, \ldots$. The basic idea of generalized spectrum is to transform the series $\{Z_t\}$ and then consider the spectrum of the transformed series. Define the generalized covariance function

$$\sigma_j(u, v) \equiv \operatorname{cov}(e^{iuZ_t}, e^{ivZ_{t-|j|}}), \qquad j = 0, \pm 1, \ldots.$$

Straightforward algebra yields $\sigma_j(u, v) = \varphi_j(u, v) - \varphi(u)\varphi(v)$. Because $\varphi_j(u, v) = \varphi(u)\varphi(v)$ for all $(u, v)$ if and only if $Z_t$ and $Z_{t-|j|}$ are independent (cf. Lukacs 1970), $\sigma_j(u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ over various lags, including those with zero autocorrelation in $\{Z_t\}$.
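The claim that the generalized covariance picks up dependence even when the autocorrelation is zero can be illustrated numerically. The sketch below uses the sample analogue of $\varphi_j(u, v) - \varphi(u)\varphi(v)$ on a toy series $Z_t = \varepsilon_t \varepsilon_{t-1}$ with iid standard normal $\varepsilon_t$, which is serially uncorrelated but not independent. The series, the function names, and the evaluation point $(u, v) = (1, 1)$ are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def generalized_covariance(z, j, u, v):
    """Sample analogue of sigma_j(u, v) = phi_j(u, v) - phi(u) * phi(v)."""
    j = abs(j)
    lead, lag = z[j:], z[: len(z) - j]          # Z_t and Z_{t-|j|}
    phi_joint = np.mean(np.exp(1j * (u * lead + v * lag)))
    phi_u = np.mean(np.exp(1j * u * z))
    phi_v = np.mean(np.exp(1j * v * z))
    return phi_joint - phi_u * phi_v

def autocorrelation(z, j):
    """Ordinary sample autocorrelation at lag j."""
    zc = z - z.mean()
    return float(np.sum(zc[j:] * zc[: len(z) - j]) / np.sum(zc**2))

rng = np.random.default_rng(0)
eps = rng.standard_normal(50_001)
z = eps[1:] * eps[:-1]        # uncorrelated, but Z_t and Z_{t-1} share eps_{t-1}

rho1 = autocorrelation(z, 1)                      # close to zero
sigma1 = generalized_covariance(z, 1, 1.0, 1.0)   # clearly nonzero
print(f"lag-1 autocorrelation: {rho1:.4f}")
print(f"|sigma_1(1, 1)|: {abs(sigma1):.4f}")
```

For this toy process the population value is $\sigma_1(1, 1) = 3^{-1/2} - 1/2 \approx .077$, so the generalized covariance detects the dependence that the lag-1 autocorrelation misses.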

Under certain regularity conditions on the temporal dependence of $\{Z_t\}$, the Fourier transform of the generalized covariance function $\sigma_j(u, v)$ exists and is given by

$$f(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi].$$

Like $\sigma_j(u, v)$, $f(\omega, u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ across various lags. An advantage of spectral analysis is that $f(\omega, u, v)$ incorporates all lags simultaneously. Thus it can capture dependent processes in which serial dependence occurs only at higher-order lags. This can arise from seasonality (e.g., calendar effects) or time delay in financial markets. Moreover, $f(\omega, u, v)$ is particularly powerful in capturing dependent alternatives whose serial dependence decays to 0 slowly as $j \to \infty$, such as slow mean-reverting or persistent volatility clustering processes. For such processes, $f(\omega, u, v)$ has a sharp peak at frequency 0.

The function $f(\omega, u, v)$ does not require any moment condition on $\{Z_t\}$. When the moments of $\{Z_t\}$ exist, however, we can differentiate $f(\omega, u, v)$ with respect to $(u, v)$ at $(0, 0)$:

$$f^{(0,m,l)}(\omega, 0, 0) \equiv \left. \frac{\partial^{m+l}}{\partial u^m\, \partial v^l} f(\omega, u, v) \right|_{(u,v)=(0,0)} = \frac{1}{2\pi}\, i^{m+l} \sum_{j=-\infty}^{\infty} \operatorname{cov}(Z_t^m, Z_{t-|j|}^l)\, e^{-ij\omega}.$$

In particular, $f^{(0,1,1)}(\omega, 0, 0)$ is the well-known power spectral density. (For applications of the power spectral density in economics and finance, see, e.g., Granger 1969; Durlauf 1991; Watson 1993.) For this reason, $f(\omega, u, v)$ is called the "generalized spectral density" of $\{Z_t\}$. The parameters $(u, v)$ provide much flexibility in capturing linear and nonlinear dependence. Because $f(\omega, u, v)$ can be decomposed as a weighted sum of $f^{(0,m,l)}(\omega, 0, 0)$ over various $(m, l)$ via a Taylor series expansion around $(0, 0)$, it contains information on all autocorrelations and cross-correlations in the various power terms of $\{Z_t\}$. Thus it can be used to develop an omnibus procedure against a wide range of dependent processes.

A.2 Omnibus Evaluation for Density Forecast

To evaluate out-of-sample density forecasts of model $p(r, t \mid I_{t-1}, \theta)$, Hong (2003) used the generalized spectrum to develop an omnibus test of iid $U[-\frac{1}{2}, \frac{1}{2}]$ for the centered generalized residual $\{Z_t\}$. To test $H_0$: $\{Z_t\} \sim$ iid $U[-\frac{1}{2}, \frac{1}{2}]$, we use the modified generalized covariance

$$\sigma_j^U(u, v) \equiv \varphi_j(u, v) - \varphi^U(u)\varphi^U(v),$$

where $\varphi^U(u) \equiv \sin(u/2)/(u/2)$ is the characteristic function of a $U[-\frac{1}{2}, \frac{1}{2}]$ random variable. Because $\sigma_j^U(u, v)$ has incorporated the information of $U[-\frac{1}{2}, \frac{1}{2}]$, it can capture any pairwise serial dependence in $\{Z_t\}$ and any deviation from $U[-\frac{1}{2}, \frac{1}{2}]$. The associated generalized spectrum is

$$f^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j^U(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi],\ u, v \in (-\infty, \infty).$$

Under $H_0$, we have $\sigma_j^U(u, v) = 0$ for all $j \neq 0$, and so $f^U(\omega, u, v)$ becomes a known "flat" spectrum,

$$f_0^U(\omega, u, v) \equiv \frac{1}{2\pi} \sigma_0^U(u, v) = \frac{1}{2\pi} [\varphi^U(u + v) - \varphi^U(u)\varphi^U(v)], \qquad \omega \in [-\pi, \pi].$$

When $Z_t$ and $Z_{t-j}$ are not independent at some lag $j \neq 0$, or when $Z_t$ is not $U[-\frac{1}{2}, \frac{1}{2}]$, we have $f^U(\omega, u, v) \neq f_0^U(\omega, u, v)$. Hence, to test $H_0$, one can check whether a consistent estimator of $f^U(\omega, u, v)$ is significantly different from $f_0^U(\omega, u, v)$.

Suppose that $\hat{\theta}$ is an estimator for model parameter $\theta$ based on the estimation sample $\{r_t\}_{t=1}^{R}$. To estimate $f^U(\omega, u, v)$ using the prediction sample $\{r_t\}_{t=R+1}^{N}$, we have to use the estimated proxies $\{\hat{Z}_t\}_{t=R+1}^{N}$, where

$$\hat{Z}_t \equiv Z_t(\hat{\theta}) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \hat{\theta})\, dr - \frac{1}{2}, \qquad t = R + 1, \ldots, N.$$
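The two-sample construction above (estimate on the first $R$ observations, compute proxies on the remaining $N - R$) can be sketched for a deliberately simple case. The example assumes a Gaussian AR(1) conditional density, $r_t \mid r_{t-1} \sim N(a + b r_{t-1}, s^2)$, fitted by least squares; this stand-in model, the function names, and the simulated data are assumptions for illustration and are not one of the paper's spot rate models.

```python
import math
import numpy as np

def fit_gaussian_ar1(r):
    """Least-squares fit of r_t = a + b * r_{t-1} + s * e_t with e_t ~ N(0, 1)."""
    x, y = r[:-1], r[1:]
    b, a = np.polyfit(x, y, 1)            # polyfit returns [slope, intercept]
    resid = y - (a + b * x)
    s = resid.std(ddof=2)
    return a, b, s

def centered_generalized_residuals(r, theta, start):
    """Z_t = F(r_t | r_{t-1}; theta) - 1/2 for t = start, ..., len(r) - 1."""
    a, b, s = theta
    standardized = (r[start:] - (a + b * r[start - 1:-1])) / s
    cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in standardized])
    return cdf - 0.5

# Simulated data from the same Gaussian AR(1) family (hypothetical values).
rng = np.random.default_rng(0)
N, R = 1200, 900
r = np.empty(N)
r[0] = 5.0
for t in range(1, N):
    r[t] = 0.5 + 0.9 * r[t - 1] + 0.2 * rng.standard_normal()

theta_hat = fit_gaussian_ar1(r[:R])                    # estimation sample only
z_hat = centered_generalized_residuals(r, theta_hat, R)  # prediction sample
```

Because the fitted model here matches the data-generating process, `z_hat` should look approximately iid uniform on [−1/2, 1/2]; a misspecified density would show up as a non-flat histogram or serial dependence in `z_hat`.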

Define the empirical measure

$$\hat{\sigma}_j^U(u, v) \equiv \hat{\varphi}_j(u, v) - \varphi^U(u)\varphi^U(v), \qquad j = 0, \pm 1, \ldots, \pm(n - 1),$$

where $n \equiv N - R$ is the size of the prediction sample and the empirical pairwise joint characteristic function is

$$\hat{\varphi}_j(u, v) \equiv (n - |j|)^{-1} \sum_{t=R+|j|+1}^{N} \exp\{i(u\hat{Z}_t + v\hat{Z}_{t-|j|})\}.$$

We introduce a class of smoothed kernel estimators,

$$\hat{f}^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=1-n}^{n-1} (1 - |j|/n)^{1/2}\, k(j/p)\, \hat{\sigma}_j^U(u, v)\, e^{-ij\omega},$$

where $k(\cdot)$ is a kernel and $p \equiv p(n)$ is a bandwidth/lag order. An example of $k(\cdot)$ is the Bartlett kernel,

$$k(z) = \begin{cases} 1 - |z| & \text{if } |z| \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

For the choice of $p$, Hong (1999) suggested a data-driven method that delivers an asymptotically optimal bandwidth in terms of an integrated mean squared error criterion. This still involves a preliminary bandwidth $\bar{p}$, but the impact of choosing $\bar{p}$ is much smaller than the impact of choosing lag order $p$.

A convenient divergence measure for $\hat{f}^U(\omega, u, v)$ and $f_0^U(\omega, u, v)$ is the quadratic form

$$\begin{aligned}
2\pi n\, Q(\hat{f}^U, f_0^U) &\equiv 2\pi n \int\!\!\int\!\!\int_{-\pi}^{\pi} |\hat{f}^U(\omega, u, v) - f_0^U(\omega, u, v)|^2\, d\omega\, dW(u)\, dW(v) \\
&= n \int\!\!\int |\hat{\varphi}_0(u + v) - \varphi^U(u + v)|^2\, dW(u)\, dW(v) \\
&\quad + 2 \sum_{j=1}^{n-1} k^2(j/p)(n - j) \int\!\!\int |\hat{\sigma}_j^U(u, v)|^2\, dW(u)\, dW(v),
\end{aligned}$$

where the weighting function $W(\cdot)$ is positive and nondecreasing, and the unspecified integrals are taken over the support of $W(\cdot)$. An example of this is

$$W(u) = \Phi(\sqrt{12}\,u),$$

where $\Phi(\cdot)$ is the $N(0, 1)$ cdf, which is commonly used in the characteristic function literature. The scale $\sqrt{12}$ matches the standard deviation of $U[-\frac{1}{2}, \frac{1}{2}]$, but this is not necessary.

Our evaluation statistic for $H_0$ is a properly centered and scaled version of the quadratic form $Q(\hat{f}^U, f_0^U)$:

$$M_1 \equiv \left[ \sum_{j=1}^{n-1} k^2(j/p) \right]^{-1} \times \int\!\!\int \sum_{j=1}^{n-1} k^2(j/p)(n - j)\, |\hat{\sigma}_j^U(u, v)|^2\, dW(u)\, dW(v) - \left[ \int [1 - \varphi^U(u)^2]\, dW(u) \right]^2.$$
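As a rough numerical illustration of the $M_1$ formula, the sketch below evaluates it with a Bartlett kernel and the weighting function $W(u) = \Phi(\sqrt{12}\,u)$ restricted to $[-1, 1]$. Several choices are assumptions made here for simplicity rather than details taken from the source: the double integral is approximated by the trapezoid rule on an equally spaced grid, the truncated weighting function is not renormalized, and the lag order `p` is passed in directly instead of being chosen by the data-driven method. The function names are hypothetical.

```python
import math
import numpy as np

def bartlett(x):
    """Bartlett kernel: k(x) = 1 - |x| for |x| <= 1 and 0 otherwise."""
    return np.clip(1.0 - np.abs(x), 0.0, None)

def m1_statistic(z, p, grid_size=41):
    """Grid approximation to M1 for centered generalized residuals z."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    # Quadrature nodes and weights for dW(u) = sqrt(12) * phi(sqrt(12) * u) du on [-1, 1].
    u = np.linspace(-1.0, 1.0, grid_size)
    w = math.sqrt(12.0 / (2.0 * math.pi)) * np.exp(-6.0 * u**2) * (u[1] - u[0])
    w[0] *= 0.5
    w[-1] *= 0.5
    phi_u = np.sinc(u / (2.0 * np.pi))        # sin(u/2) / (u/2), with value 1 at u = 0
    e = np.exp(1j * np.outer(z, u))           # e[t, a] = exp(i * u_a * Z_t)
    # The Bartlett kernel vanishes for j >= p, so only lags below p contribute.
    lags = np.arange(1, min(n - 1, math.ceil(p) - 1) + 1)
    k2 = bartlett(lags / p) ** 2
    total = 0.0
    for j, kj2 in zip(lags, k2):
        phi_j = e[j:].T @ e[: n - j] / (n - j)        # empirical phi_j(u_a, u_b)
        sigma_j = phi_j - np.outer(phi_u, phi_u)      # modified generalized covariance
        total += kj2 * (n - j) * (w @ (np.abs(sigma_j) ** 2) @ w)
    centering = (w @ (1.0 - phi_u**2)) ** 2
    return total / k2.sum() - centering

rng = np.random.default_rng(0)
z_uniform = rng.random(2000) - 0.5            # satisfies H0
z_peaked = 0.1 * (rng.random(2000) - 0.5)     # iid but far too concentrated at 0
m1_uniform = m1_statistic(z_uniform, p=20)
m1_peaked = m1_statistic(z_peaked, p=20)
```

The second series mimics the "high peak in the center" seen in the diffusion-model histograms: it is serially independent, yet its marginal distribution is not uniform, so its statistic should be much larger than that of the uniform series.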

The asymptotic distribution of $M_1$ was derived by Hong (2003). Under $H_0$ and a set of regularity conditions,

$$M_1 \xrightarrow{d} \int\!\!\int |G(u, v)|^2\, dW(u)\, dW(v)$$

as $n \to \infty$, $R \to \infty$, and $n^{1+\delta}/R \to 0$ for any arbitrarily small constant $\delta > 0$, where $G(u, v)$ is a complex-valued Gaussian process with mean 0 and a known covariance function,

$$\begin{aligned}
\operatorname{cov}[G(u_1, v_1), G(u_2, v_2)] &= E[G(u_1, v_1)\, G(u_2, v_2)^*] \\
&= \sigma_0^U(u_1, -u_2)\varphi^U(v_1)\varphi^U(v_2) + \sigma_0^U(u_1, -v_2)\varphi^U(u_2)\varphi^U(v_1) \\
&\quad + \sigma_0^U(v_1, -u_2)\varphi^U(u_1)\varphi^U(v_2) + \sigma_0^U(v_1, -v_2)\varphi^U(u_1)\varphi^U(u_2).
\end{aligned}$$


The limit distribution of $M_1$ is nonstandard. However, because the Gaussian process $G(u, v)$ has mean 0 and a known distribution-free covariance function, the limit distribution of $M_1$ can be easily tabulated. When $W(u) = \Phi(\sqrt{12}\,u)$ with a truncated support on $[-1, 1]$, the asymptotic critical values at the 10%, 5%, and 1% levels are .037, .051, and .087.

The most attractive feature of $M_1$ is its omnibus property; it has power against a wide range of suboptimal density forecasts, thanks to the use of the characteristic function in a spectral framework. The $M_1$ statistic can be viewed as an omnibus metric measuring the departure of the forecast model from the true data-generating process. A better density forecast model is expected to have a smaller value for $M_1$, because its generalized residual series $\{Z_t\}$ is closer to having the iid U[0, 1] property. Thus $M_1$ can be used to rank competing density forecast models in terms of their deviations from optimality. Moreover, the kernel function provides flexible weighting for various lags. Nonuniform weighting kernels usually discount higher-order lags. For most, if not all, economic and financial time series, today's behaviors are more affected by recent events than by remote events. Thus downward weighting for higher-order lags is expected to yield good power in finite samples.

A.3 Separate Inferences

When a model is rejected using $M_1$, it is interesting to explore possible reasons for the rejection. Hong (2003) provided a class of rigorous separate inference statistics that exploit the information contained in the model generalized residuals $\{Z_t\}$ to understand possible reasons for model rejection. The $M(m, l)$ statistic is based on a kernel estimator of the generalized spectral density derivative $f^{(0,m,l)}(\omega, 0, 0)$, where $m$ and $l$ are the orders of the derivatives with respect to $u$ and $v$. The statistic $M(m, l)$ is defined as

$$M(m, l) = \left[ \sum_{j=1}^{n-1} (n - j)\, k^2(j/p)\, \hat{\rho}_j^2(m, l) - \sum_{j=1}^{n-1} k^2(j/p) \right] \Bigg/ \left[ 2 \sum_{j=1}^{n-2} k^4(j/p) \right]^{1/2},$$

where $\hat{\rho}_j(m, l)$ is the sample cross-correlation between $\hat{Z}_t^m$ and $\hat{Z}_{t-|j|}^l$, $t = R + 1, \ldots, N$. Under $H_0$ and a set of regularity conditions,

$$M(m, l) \xrightarrow{d} N(0, 1)$$

for any given $(m, l)$ as $n, R \to \infty$. The upper-tailed $N(0, 1)$ critical values at the 10%, 5%, and 1% levels are 1.28, 1.65, and 2.33. As in the $M_1$ test, Hong (2003) considered a data-driven method for $p$ that also involves the choice of a preliminary lag order $\bar{p}$.
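The separate inference statistic is simple enough to compute directly from its definition. The sketch below is an illustration under stated assumptions rather than Hong's (2003) implementation: it uses a Bartlett kernel, takes the lag order `p` as given instead of choosing it from the data, and computes the sample cross-correlation with full-sample means and variances. The function names and test series are hypothetical.

```python
import math
import numpy as np

def bartlett(x):
    """Bartlett kernel: k(x) = 1 - |x| for |x| <= 1 and 0 otherwise."""
    return np.clip(1.0 - np.abs(x), 0.0, None)

def cross_correlation(a, b, j):
    """Sample correlation between a_t and b_{t-j} (full-sample means and variances)."""
    ac, bc = a - a.mean(), b - b.mean()
    n = len(a)
    cov = np.sum(ac[j:] * bc[: n - j]) / n
    return cov / math.sqrt(np.mean(ac**2) * np.mean(bc**2))

def m_statistic(z, m, l, p):
    """M(m, l): centered and scaled sum of kernel-weighted squared cross-correlations."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    zm, zl = z**m, z**l
    lags = np.arange(1, n)                     # j = 1, ..., n - 1
    k = bartlett(lags / p)
    rho2 = np.array([cross_correlation(zm, zl, j) ** 2 if kj > 0.0 else 0.0
                     for j, kj in zip(lags, k)])
    numerator = np.sum((n - lags) * k**2 * rho2) - np.sum(k**2)
    denominator = math.sqrt(2.0 * np.sum(k[: n - 2] ** 4))   # j = 1, ..., n - 2
    return numerator / denominator

rng = np.random.default_rng(0)
z_iid = rng.random(1000) - 0.5                 # no serial dependence
draws = rng.random(1001)
z_dep = 0.5 * (draws[1:] + draws[:-1]) - 0.5   # lag-1 autocorrelation of 0.5
m11_iid = m_statistic(z_iid, 1, 1, p=20)
m11_dep = m_statistic(z_dep, 1, 1, p=20)
```

Comparing `m11_dep` with the upper-tailed critical value 2.33 flags the autocorrelation in the level of the second series; changing `(m, l)` to `(2, 2)` or `(4, 4)` would instead target dependence in the even-order moments.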

Although the moments of the generalized residuals $\{Z_t\}$ are not exactly the same as those of the original data $\{r_t\}$, they are highly correlated. In particular, the choice of $(m, l) = (1, 1)$, $(2, 2)$, $(3, 3)$, and $(4, 4)$ is very sensitive to autocorrelations in level, volatility, skewness, and kurtosis of $\{r_t\}$. Furthermore, the choices of $(m, l) = (1, 2)$ and $(2, 1)$ are sensitive to the "ARCH-in-mean" effect and the "leverage" effect. Different choices of order $(m, l)$ can thus allow examination of various dynamic aspects of the underlying process.

[Received June 2002. Revised November 2003.]

REFERENCES

Ait-Sahalia, Y. (1996), “Testing Continuous-Time Models of the Spot Interest Rate,” Review of Financial Studies, 9, 385–426.

Ait-Sahalia, Y. (1999), “Transition Densities for Interest Rate and Other Nonlinear Diffusions,” Journal of Finance, 54, 1361–1395.

Ait-Sahalia, Y. (2002), “Maximum-Likelihood Estimation of Discretely Sampled Diffusions: A Closed-Form Approach,” Econometrica, 70, 223–262.

Ait-Sahalia, Y., and Lo, A. (1998), “Nonparametric Estimation of State-Price Densities Implicit in Financial Asset Prices,” Journal of Finance, 53, 499–547.

Andersen, T. G., and Lund, J. (1997), “Estimating Continuous Time Stochastic Volatility Models of the Short-Term Interest Rate,” Journal of Econometrics, 77, 343–378.

Ang, A., and Bekaert, G. (2002), “Regime Switches in Interest Rates,” Journal of Business & Economic Statistics, 20, 163–182.

Bali, T. (2000), “Modeling the Conditional Mean and Variance of the Short Rate Using Diffusion, GARCH, and Moving Average Models,” Journal of Futures Markets, 20, 717–751.

Bali, T. (2001), “Nonlinear Parametric Models of the Short-Term Interest Rate,” unpublished manuscript, Baruch College.

Ball, C., and Torous, W. (1998), “Regime Shifts in Short-Term Riskless Interest Rates,” unpublished manuscript, University of California, Los Angeles.

Bera, A. K., and Ghosh, A. (2002), “Density Forecast Evaluation With Applications in Finance,” unpublished manuscript, University of Illinois at Champaign, Dept. of Economics.

Berkowitz, J. (2001), “Testing the Accuracy of Density Forecasts With Applications to Risk Management,” Journal of Business & Economic Statistics, 19, 465–474.

Bliss, R., and Smith, D. (1998), “The Elasticity of Interest Rate Volatility: Chan, Karolyi, Longstaff, and Sanders Revisited,” unpublished manuscript, Federal Reserve Bank of Atlanta.

Brenner, R., Harjes, R., and Kroner, K. (1996), “Another Look at Alternative Models of Short-Term Interest Rate,” Journal of Financial and Quantitative Analysis, 31, 85–107.

Chacko, G., and Viceira, L. M. (2003), “Spectral GMM Estimation of Continuous-Time Processes,” Journal of Econometrics, 116, 259–292.

Chan, K. C., Karolyi, G. A., Longstaff, F. A., and Sanders, A. B. (1992), “An Empirical Comparison of Alternative Models of the Short-Term Interest Rate,” Journal of Finance, 47, 1209–1228.

Chapman, D., and Pearson, N. (2001), “Recent Advances in Estimating Models of the Term-Structure,” Financial Analysts Journal, 57, 77–95.

Christoffersen, P., and Diebold, F. X. (1996), “Further Results on Forecasting and Model Selection Under Asymmetric Loss,” Journal of Applied Econometrics, 11, 561–572.

Christoffersen, P., and Diebold, F. X. (1997), “Optimal Prediction Under Asymmetric Loss,” Econometric Theory, 13, 808–817.

Clements, M. P., and Smith, J. (2000), “Evaluating the Forecast Densities of Linear and Non-Linear Models: Applications to Output Growth and Unemployment,” Journal of Forecasting, 19, 255–276.

Conley, T. G., Hansen, L. P., Luttmer, E. G. J., and Scheinkman, J. A. (1997), “Short-Term Interest Rates as Subordinated Diffusions,” Review of Financial Studies, 10, 525–578.

Corradi, V. (2000), “Reconsidering the Continuous-Time Limit of the GARCH(1,1) Process,” Journal of Econometrics, 96, 145–153.

Cox, J. C., Ingersoll, J. E., and Ross, S. A. (1985), “A Theory of the Term Structure of Interest Rates,” Econometrica, 53, 385–407.

Dai, Q., and Singleton, K. (2000), “Specification Analysis of Affine Term Structure Models,” Journal of Finance, 55, 1943–1978.

Das, S. R. (2002), “The Surprise Element: Jumps in Interest Rates,” Journal of Econometrics, 106, 27–65.

Diebold, F. X., Gunther, T. A., and Tay, A. S. (1998), “Evaluating Density Forecasts With Applications to Financial Risk Management,” International Economic Review, 39, 863–883.

Diebold, F. X., Hahn, J., and Tay, A. S. (1999), “Multivariate Density Forecast Evaluation and Calibration in Financial Risk Management: High-Frequency Returns of Foreign Exchange,” Review of Economics and Statistics, 81, 661–673.

Diebold, F. X., Tay, A. S., and Wallis, K. F. (1999), “Evaluating Density Forecasts of Inflation: The Survey of Professional Forecasters,” in Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W. J. Granger, eds. R. F. Engle and H. White, Oxford, U.K.: Oxford University Press, pp. 76–90.


Dothan, L. U. (1978), “On the Term Structure of Interest Rates,” Journal of Financial Economics, 6, 59–69.

Duffee, G. (2002), “Term Premia and Interest Rate Forecasts in Affine Models,” Journal of Finance, 57, 405–443.

Duffie, D., and Pan, J. (1997), “An Overview of Value at Risk,” Journal of Derivatives, 4, 13–32.

Durham, G. (2003), “Likelihood-Based Specification Analysis of Continuous-Time Models of the Short-Term Interest Rate,” Journal of Financial Economics, 70, 463–487.

Durlauf, S. (1991), “Spectral Based Testing of the Martingale Hypothesis,” Journal of Econometrics, 50, 355–376.

Fackler, P. L., and King, R. P. (1990), “Calibration of Option-Based Probability Assessments in Agricultural Commodity Markets,” American Journal of Agricultural Economics, 72, 73–83.

Gallant, A. R., and Tauchen, G. (1998), “Reprojection Partially Observed Systems With Applications to Interest Rate Diffusions,” Journal of the American Statistical Association, 93, 10–24.

Ghysels, E., Carrasco, M., Chernov, M., and Florens, J. (2001), “Estimating Diffusions With a Continuum of Moment Conditions,” unpublished manuscript, University of North Carolina.

Granger, C. (1969), “Testing for Causality and Feedback,” Econometrica, 37, 424–438.

Granger, C. (1999), “The Evaluation of Econometric Models and of Forecasts,” invited lecture at the Far Eastern Meeting of the Econometric Society, Singapore.

Granger, C., and Pesaran, M. H. (2000), “A Decision Theoretic Approach to Forecasting Evaluation,” in Statistics and Finance: An Interface, eds. W. S. Chan, W. K. Li, and H. Tong, London: Imperial College, pp. 261–278.

Gray, S. (1996), “Modeling the Conditional Distribution of Interest Rates as a Regime-Switching Process,” Journal of Financial Economics, 42, 27–62.

Hamilton, J. D. (1989), “A New Approach to the Econometric Analysis of Nonstationary Time Series and the Business Cycle,” Econometrica, 57, 357–384.

Hong, Y. (1999), “Hypothesis Testing in Time Series via the Empirical Characteristic Function: A Generalized Spectral Density Approach,” Journal of the American Statistical Association, 94, 1201–1220.

Hong, Y. (2003), “Evaluation of Out-of-Sample Density Forecasts With Applications to Stock Prices,” unpublished manuscript, Cornell University, Dept. of Economics and Dept. of Statistical Science.

Hong, Y., and Lee, T. H. (2003a), “Inference on the Predictability of Foreign Exchange Rate Changes via Generalized Spectrum and Nonlinear Models,” Review of Economics and Statistics, 84, 1048–1062.

Hong, Y., and Lee, T. H. (2003b), “Diagnostic Checking for Nonlinear Time Series Models,” Econometric Theory, 19, 1065–1121.

Hong, Y., and Li, H. (2004), “Nonparametric Specification Testing for Continuous-Time Models With Applications to Interest Rate Term Structures,” Review of Financial Studies, forthcoming.

Jackwerth, J. C., and Rubinstein, M. (1996), “Recovering Probability Distributions From Option Prices,” Journal of Finance, 51, 1611–1631.

Jiang, G., and Knight, J. (1997), “A Nonparametric Approach to the Estimation of Diffusion Processes With an Application to a Short-Term Interest Rate Model,” Econometric Theory, 13, 615–645.

Johannes, M. (2003), “The Statistical and Economic Role of Jumps in Interest Rates,” Journal of Finance, 59, 227–260.

Jorion, P. (2000), Value at Risk: The New Benchmark for Managing Financial Risk, New York: McGraw-Hill.

Morgan, J. P. (1996), RiskMetrics: Technical Document (4th ed.), New York: Morgan Guaranty Trust Company.

Koedijk, K. G., Nissen, F. G., Schotman, P. C., and Wolff, C. C. P. (1997), “The Dynamics of Short-Term Interest Rate Volatility Reconsidered,” European Finance Review, 1, 105–130.

Li, H., and Xu, Y. (2000), “Regime Shifts and Short Rate Dynamics,” unpublished manuscript, Cornell University.

Lo, A., and MacKinlay, A. C. (1989), “Data-Snooping Biases in Tests of Financial Asset Pricing Models,” Review of Financial Studies, 3, 175–208.

Lukacs, E. (1970), Characteristic Functions (2nd ed.), London: Charles Griffin.

Meese, R., and Rogoff, K. (1983), “Empirical Exchange Rate Models of the Seventies: Do They Fit Out-of-Sample?” Journal of International Economics, 14, 3–24.

Nelson, D. (1990), “ARCH Models as Diffusion Approximation,” Journal of Econometrics, 45, 7–39.

Nelson, D. (1991), “Conditional Heteroskedasticity in Asset Returns: A New Approach,” Econometrica, 59, 347–370.

Rosenblatt, M. (1952), “Remarks on a Multivariate Transformation,” The Annals of Mathematical Statistics, 23, 470–472.

Rothschild, M., and Stiglitz, J. (1970), “Increasing Risk: I. A Definition,” Journal of Economic Theory, 2, 225–243.

Singleton, K. (2001), “Estimation of Affine Asset Pricing Models Using the Empirical Characteristic Function,” Journal of Econometrics, 102, 111–141.

Soderlind, P., and Svensson, L. (1997), “New Techniques to Extract Market Expectations From Financial Instruments,” Journal of Monetary Economics, 40, 383–429.

Stanton, R. (1997), “A Nonparametric Model of Term Structure Dynamics and the Market Price of Interest Rate Risk,” Journal of Finance, 52, 1973–2002.

Vasicek, O. (1977), “An Equilibrium Characterization of the Term Structure,” Journal of Financial Economics, 5, 177–188.

Watson, M. (1993), “Measures of Fit for Calibrated Models,” Journal of Political Economy, 101, 1011–1041.

White, H. (2000), “A Reality Check for Data Snooping,” Econometrica, 69, 1097–1127.

466 Journal of Business & Economic Statistics, October 2004

Table 5. Parameter Estimates for the Jump-Diffusion Models

| Parameters | No drift CEV | Linear drift CEV | Nonlinear drift CEV | No drift GARCH | Linear drift GARCH | Nonlinear drift GARCH | No drift CEV–GARCH | Linear drift CEV–GARCH | Nonlinear drift CEV–GARCH |
|---|---|---|---|---|---|---|---|---|---|
| $\alpha_{-1}$ | | | .0264 (.0293) | | | .0702 (.0384) | | | .0661 (.0380) |
| $\alpha_0$ | | −.0011 (.0169) | −.0340 (.0203) | | .0028 (.0019) | −.0506 (.0256) | | .0026 (.0020) | −.0486 (.0253) |
| $\alpha_1$ | | −.0007 (.0004) | .0101 (.0043) | | −.0013 (.0004) | .0110 (.0053) | | −.0012 (.0004) | .0107 (.0052) |
| $\alpha_2$ | | | −.0009 (.0003) | | | −.0008 (.0003) | | | −.0008 (.0003) |
| $\sigma$ | .0127 (.0008) | .0126 (.0008) | .0124 (.0008) | | | | | | |
| $\rho$ | .8936 (.0390) | .8975 (.0397) | .9045 (.0400) | | | | .0929 (.0518) | .0852 (.0510) | .0837 (.0512) |
| $\beta_0$ | | | | 8.7E−05 (1.2E−05) | 8.8E−05 (1.2E−05) | 8.7E−05 (1.2E−05) | 7.6E−5 (1.2E−5) | 7.7E−5 (1.2E−5) | 7.7E−5 (1.2E−5) |
| $\beta_1$ | | | | .0631 (.0076) | .0640 (.0073) | .0635 (.0073) | .0450 (.0101) | .0472 (.0102) | .0472 (.0102) |
| $\beta_2$ | | | | .8514 (.0139) | .8484 (.0134) | .8499 (.0134) | .8504 (.0140) | .8471 (.0135) | .8483 (.0134) |
| $c$ | −2.7404 (.1608) | −2.7057 (.1605) | −2.6972 (.1614) | −3.7908 (.1982) | −3.7470 (.1970) | −3.7481 (.1976) | −3.6908 (.2097) | −3.6498 (.2084) | −3.6518 (.2093) |
| $d$ | .1451 (.0275) | .1423 (.0275) | .1397 (.0275) | .2554 (.0307) | .2516 (.0305) | .2509 (.0307) | .2293 (.0343) | .2274 (.0340) | .2272 (.0341) |
| $\mu$ | .0233 (.0133) | .0385 (.0142) | .0321 (.0128) | .0700 (.0152) | .0887 (.0158) | .0895 (.0158) | .0679 (.0162) | .0892 (.0164) | .0897 (.0163) |
| $\gamma$ | .4316 (.0117) | .4279 (.0120) | .4304 (.0104) | .3928 (.0197) | .3903 (.0189) | .3929 (.0187) | .4109 (.0199) | .4046 (.0182) | .4061 (.0178) |
| $\alpha_D$ | | −.0094 (.0121) | .0296 (.0112) | | −.0192 (.0117) | .0057 (.0154) | | −.0197 (.0112) | .0055 (.0150) |
| $\sigma_D$ | −.1027 (.1142) | −.1328 (.1054) | −.1551 (.1033) | | | | | | |
| $\rho_D$ | .1473 (.1079) | .1872 (.0703) | −.0697 (.0722) | | | | −.1220 (.0769) | −.1186 (.0573) | −.1277 (.0564) |
| $c_D$ | 4.1604 (.6523) | 4.3364 (.6535) | 4.0036 (.5477) | 4.0290 (.7495) | 4.2505 (.7546) | 4.1857 (.7421) | 3.8622 (.6911) | 4.0677 (.7129) | 4.0272 (.7019) |
| $d_D$ | −.2441 (.0808) | −.2721 (.0730) | −.1836 (.0534) | −.2748 (.0689) | −.2860 (.0682) | −.2791 (.0667) | −.2320 (.0685) | −.2487 (.0676) | −.2445 (.0662) |
| $D_{87}$ | −.4283 (.0646) | −.4253 (.0641) | −.4266 (.0643) | −.4226 (.0826) | −.4198 (.0820) | −.4217 (.0815) | −.4233 (.0793) | −.4209 (.0793) | −.4225 (.0791) |
| Log-likelihood | 63063 | 63172 | 63263 | 67941 | 68108 | 68139 | 67963 | 68129 | 68161 |

NOTE: The models are nested by the following specification: $\Delta r_t = \alpha_{-1}/r_{t-1} + (\alpha_0 + \alpha_D) + \alpha_1 r_{t-1} + \alpha_2 r_{t-1}^2 + (\sigma + \sigma_D)\, r_{t-1}^{(\rho + \rho_D)} \sqrt{h_t}\, z_t + J(\mu, \gamma^2)\, \pi_t(q_t + q_D) + D_{87}$, where $h_t = \beta_0 + \beta_1 [r_{t-1} - E(r_{t-1} | r_{t-2})]^2 + \beta_2 h_{t-1}$, $z_t \sim$ iid $N(0, 1)$, $J \sim N(\mu, \gamma^2)$, and $\pi_t(q_t + q_D)$ is Bernoulli$(q_t + q_D)$ with $q_t + q_D = (1 + \exp(-c - c_D - (d + d_D) \cdot r_{t-1}))^{-1}$.
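To make the jump component of this specification concrete, the Python sketch below computes the logistic jump probability and the conditional distribution function of the rate change when at most one normal jump can occur per period. It assumes the diffusion shock, the jump size, and the jump indicator are mutually independent, so that the conditional law is a two-component normal mixture. The function names and the numerical inputs are hypothetical and chosen only to exercise the code.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def jump_probability(r_prev, c, d):
    """Logistic jump probability q = 1 / (1 + exp(-c - d * r_prev))."""
    return 1.0 / (1.0 + math.exp(-c - d * r_prev))


def jump_diffusion_cdf(x, mean, std, q, mu, gamma):
    """P(change <= x): no jump with probability 1 - q, one N(mu, gamma^2) jump with probability q."""
    no_jump = normal_cdf((x - mean) / std)
    with_jump = normal_cdf((x - mean - mu) / math.sqrt(std**2 + gamma**2))
    return (1.0 - q) * no_jump + q * with_jump


# Hypothetical inputs, not estimates from the article.
q = jump_probability(r_prev=6.0, c=-3.0, d=0.2)
centered_residual = jump_diffusion_cdf(0.1, mean=0.0, std=0.2, q=q, mu=0.05, gamma=0.4) - 0.5
print(q, centered_residual)
```

Subtracting one-half from the distribution function evaluated at the realized change gives a centered generalized residual of the kind used in the out-of-sample evaluation.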

Table 5 reports parameter estimates for discretized jump-diffusion models. There is some weak evidence of mean reversion, especially when there is GARCH. The contribution of nonlinear drift is again very marginal; most estimated drift parameters are insignificant. For jump-diffusion models without GARCH, the elasticity parameter estimate is about .9, which is closer to the 1.5 found by CKLS. However, level effect is weakened (i.e., $\rho$ becomes close to .1) after GARCH is introduced. It seems that CEV helps capture part of volatility clustering, but its importance diminishes in the presence of GARCH. We find that GARCH also significantly improves the performance of jump-diffusion models. Without GARCH, there is a high probability of small jumps, with an estimated mean jump size between 2% and 4%. With GARCH, the jump probability becomes smaller, but the mean jump size increases to between about 7% and 9%. This suggests that without GARCH, jumps can capture part of volatility clustering, but with GARCH, jumps mainly capture large interest rate movements. Of course, jumps also help explain volatility clustering; the sum of GARCH parameters is again much smaller than that in pure GARCH models. During the 1979–1982 period, other than the probability of jumps becoming substantially higher, other aspects of the jump-diffusion models (e.g., the drift, volatility, or elasticity parameter) do not behave very differently.

To sum up, our in-sample analysis reveals some important stylized facts for the spot rate:

1. The importance of modeling mean reversion in the conditional mean is ambiguous. It seems that linear drift is adequate for this purpose once GARCH, regime switching, or jumps are included. The contribution of Ait-Sahalia’s (1996) type of nonlinear drift is very marginal.

2. It is important to model conditional heteroscedasticity through GARCH or level effect, although GARCH seems to have much better in-sample performance than CEV.

3. Regime switching and jumps help capture volatility clustering and especially the excess kurtosis and heavy tails of the interest rate.

4. Interest rates behave quite differently during the 1979–1982 period. The level and volatility of the interest rate, the dependence of the interest rate volatility on the interest rate level, and the probability of jumps are much higher during this period.


5. OUT-OF-SAMPLE DENSITY FORECAST PERFORMANCE

The foregoing in-sample analysis demonstrates that modeling volatility clustering through GARCH and heavy tails and excess kurtosis through regime switching or jumps significantly improves the in-sample fit of historical interest rate data. However, it is not clear whether these models will also perform well in out-of-sample density forecasts for future interest rates. Indeed, some previous studies have shown that more complicated models actually underperform the simple random walk model in predicting the conditional mean of future interest rates (e.g., Duffee 2002). In many financial applications, we are interested in forecasting the whole conditional distribution. We now apply the generalized spectral evaluation method described in Section 2 to evaluate density forecasts of the models under study. Among other things, we examine whether the features found to be important for in-sample performance remain important for out-of-sample forecasts, and whether the best in-sample performing models still perform best in out-of-sample forecasts.

For each model, we first calculate the model generalized residuals $Z_t(\hat{\theta})$ using the prediction sample, with model parameter estimates $\hat{\theta}$ based on the estimation sample. Table 6 reports the out-of-sample evaluation statistic $M_1$ for each model. To compute $M_1$, we select a normal cdf $\Phi(\sqrt{12}\,u)$ for the weighting function $W(u)$ and a Bartlett kernel for the kernel function $k(z)$. To choose a lag order $p$, we use a data-driven method that minimizes the asymptotic integrated mean squared error of the modified generalized spectral density. This method involves choosing a preliminary lag order $\bar{p}$. We examine a wide range of $\bar{p}$ in our application. We obtain similar results for the preliminary lag order $\bar{p}$ between 10 and 30 and report results for $\bar{p} = 20$ in Table 6. For comparison, we also report the in-sample log-likelihood values obtained from the estimation sample.
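The two-step procedure of estimating on one sample and computing generalized residuals on another can be sketched for the simplest case, a driftless Gaussian random walk. The Python code below is a minimal illustration under that assumed model; the function names, sample sizes, and simulated data are hypothetical, and the richer models in the article would replace the Gaussian distribution function with their own conditional distribution.

```python
import math

import numpy as np


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def random_walk_generalized_residuals(rates, n_estimation):
    """Centered residuals for the prediction sample under a driftless Gaussian random walk."""
    rates = np.asarray(rates, dtype=float)
    changes = np.diff(rates)
    # Volatility is estimated from the first n_estimation changes only.
    sigma_hat = math.sqrt(np.mean(changes[:n_estimation] ** 2))
    prediction_changes = changes[n_estimation:]
    # Model-implied conditional cdf at each realized change, minus one-half.
    return np.array([normal_cdf(c / sigma_hat) - 0.5 for c in prediction_changes])


rng = np.random.default_rng(1)
simulated_rates = 5.0 + np.cumsum(rng.normal(0.0, 0.1, size=4001))
Z = random_walk_generalized_residuals(simulated_rates, n_estimation=3000)

# Under a correctly specified model the histogram should be roughly flat.
counts, _ = np.histogram(Z, bins=10, range=(-0.5, 0.5))
print(counts)
```

A pronounced peak in the middle bins of such a histogram would point to a misspecified marginal density, which is the kind of pattern the diagnostic histograms are meant to reveal.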

The $M_1$ statistics show that all single-factor diffusion models are overwhelmingly rejected. One of the most interesting findings is that models that include a drift term (either linear or nonlinear) have much worse out-of-sample performance than those without a drift. Dothan’s (1978) model and the pure CEV model have the best density forecast performance. Both models have a zero drift and level effect; the elasticity parameter is $\rho = 1$ for the former and .25 (.60 for the 1979–1982 period) for the latter. On the other hand, although the CKLS model and Ait-Sahalia’s (1996) nonlinear drift model have the highest in-sample likelihood values, they have the worst out-of-sample density forecasts in terms of $M_1$. This seems to suggest that for the purpose of density forecast, it is much more important to model the diffusion function than the drift function. Of course, this does not necessarily mean that the drift is not important.

Table 6. Out-of-Sample Density Forecast Performance of Spot Interest Rate Models

| Model | $M_1$ | Log-likelihood |
|---|---|---|
| Single-factor diffusion models | | |
| Random walk | .108*** | 26496 |
| Log-normal | .121*** | 21870 |
| Dothan | .067** | 21849 |
| Pure CEV | .082** | 26545 |
| Vasicek | .213*** | 26555 |
| CIR | .219*** | 26720 |
| CKLS | .219*** | 26617 |
| Nonlinear drift | .224*** | 26625 |
| GARCH models | | |
| No drift GARCH | .079** | 56868 |
| Linear drift GARCH | .276*** | 56995 |
| Nonlinear drift GARCH | .339*** | 57063 |
| No drift CEV–GARCH | .077** | 57011 |
| Linear drift CEV–GARCH | .291*** | 57152 |
| Nonlinear drift CEV–GARCH | .358*** | 57255 |
| Regime-switching models | | |
| No drift RS CEV | .129*** | 64931 |
| Linear drift RS CEV | .091*** | 65079 |
| Nonlinear drift RS CEV | .162*** | 65128 |
| No drift RS GARCH | .118*** | 70790 |
| Linear drift RS GARCH | .047* | 71328 |
| Nonlinear drift RS GARCH | .055** | 71382 |
| No drift RS CEV–GARCH | .124*** | 70828 |
| Linear drift RS CEV–GARCH | .056** | 71386 |
| Nonlinear drift RS CEV–GARCH | .067** | 71441 |
| Jump-diffusion models | | |
| No drift JD CEV | .180*** | 63063 |
| Linear drift JD CEV | .039* | 63172 |
| Nonlinear drift JD CEV | .076** | 63263 |
| No drift JD GARCH | .178*** | 67941 |
| Linear drift JD GARCH | .056** | 68108 |
| Nonlinear drift JD GARCH | .067** | 68139 |
| No drift JD CEV–GARCH | .174*** | 67963 |
| Linear drift JD CEV–GARCH | .053** | 68129 |
| Nonlinear drift JD CEV–GARCH | .065** | 68161 |

NOTE: This table reports the out-of-sample density forecast performance of the single-factor diffusion, GARCH, Markov regime-switching, and jump-diffusion models, measured by the $M_1$ statistic. For convenience of comparison, we also report the in-sample log-likelihood value for each model. The in-sample parameter estimation is based on the observations from June 14, 1961 to February 2, 1991, and the out-of-sample density forecast evaluation is based on the observations from February 23, 1991 to December 29, 2000. The ratio between the size of the estimation sample and the forecast sample is about 3:1. In computing the out-of-sample $M_1$ statistics, a preliminary lag order $\bar{p}$ of 20 is used. Other lag orders give similar results. The asymptotic critical values for the out-of-sample evaluation statistic are .087, .051, and .037 at the 1%, 5%, and 10% levels, and ***, **, and * indicate that the $M_1$ statistic is significant at the 1%, 5%, and 10% levels.


It might be simply because the drift specifications considered are severely misspecified and do not outperform the zero drift model. In fact, most drift parameter estimates have large standard errors and thus have excess sampling variation in parameter estimation. As a consequence, assuming zero drift may result in a smaller adverse effect on out-of-sample performance.

The generalized residuals $\{Z_t(\hat{\theta})\}$ contain much information about possible sources of model misspecification. Instead of being uniform, the histograms of the generalized residuals for all diffusion models (Fig. 2) exhibit a high peak in the center of the distribution, which indicates that the diffusion models cannot satisfactorily capture the marginal density of the spot rate. The separate inference statistics $M(m, l)$ in Table 7 also show that all the models fail to satisfactorily capture the dynamics of the generalized residuals. In particular, the large $M(2, 2)$ and $M(4, 4)$ statistics indicate that diffusion models perform rather poorly in modeling the conditional variance and kurtosis of the generalized residuals.

Similar to single-factor diffusion models, GARCH models with a zero drift have much better density forecasts than those models with a linear or nonlinear drift. The best GARCH and diffusion models have $M_1$ statistics roughly between .07 and .08. Comparing the best GARCH and single-factor diffusion models, we find that GARCH provides better density forecasts than CEV, and combining CEV and GARCH further improves the density forecasts. Diagnostic analysis of generalized residuals shows that GARCH models provide certain improvements over diffusion models. For example, the histograms of the generalized residuals of GARCH models have a less-pronounced peak in the center. The $M(m, l)$ statistics show that GARCH models significantly improve the fitting of even-order moments of the generalized residuals; the $M(2, 2)$ and $M(4, 4)$ statistics are reduced from close to 100 to single digits.

Figure 2. Histograms of the Generalized Residuals of Four Classes of Spot Rate Models, panels (a)–(h). This figure plots the histograms of the generalized residuals of the single-factor diffusion, GARCH, regime-switching, and jump-diffusion spot rate models.

In summary, our analysis of single-factor diffusion and GARCH models demonstrates that both classes of models fail to provide good density forecasts for future interest rates. They fail to capture both the high peak in the center of the marginal density and the dynamics of different moments of the generalized residuals. GARCH models have a more uniform marginal density and significantly improve the ability to model the dynamics of even-order moments of the generalized residuals. For both classes of models, modeling the conditional variance through CEV or GARCH is much more important than modeling the conditional mean of the interest rate. Zero-drift models outperform either linear or nonlinear drift models in density forecast, although the $M(1, 1)$ statistics suggest that neither model adequately captures the dynamic structure in the conditional mean of the generalized residuals.

We next turn to regime-switching models. Regime-switching models generally have better density forecasts than single-factor diffusion and GARCH models; the best regime-switching models reduce the $M_1$ statistics of the best diffusion and GARCH models from above .07 to about .05. Although modeling the drift does not improve density forecast for diffusion and GARCH models, the best regime-switching models in density forecast have a linear drift in each regime. As pointed out by Ang and Bekaert (2002) and Li and Xu (2000), a regime-switching model with a linear drift in each regime actually implies nonlinear conditional mean dynamics for the interest rate. Thus our results suggest that perhaps a specific nonlinear drift is needed for out-of-sample density forecasts. GARCH is more important than CEV for density forecasts, and combining CEV with GARCH does not further improve model performance. Overall, the regime-switching GARCH model with a linear drift in each regime provides the best out-of-sample forecasts among all of the regime-switching models. Diagnostic analysis shows that regime-switching models provide a much better characterization of the marginal density of interest rates; the histograms of the generalized residuals are much closer to $U[0, 1]$. The $M(2, 2)$ and $M(4, 4)$ statistics of regime-switching models are slightly higher than those of GARCH models, however, and the $M(1, 1)$ and $M(3, 3)$ statistics of regime-switching models are actually higher than those of diffusion and GARCH models. Therefore, the advantages of regime-switching models seem to come from better modeling of the marginal density rather than from the dynamics of the generalized residuals.

The density forecast performance of jump-diffusion models shares many common features with that of regime-switching models. First, they have comparable performance; the $M_1$ statistics of the best jump-diffusion models are also around .05. Second, for jump-diffusion models, the linear drift specification outperforms those with no drift or nonlinear drift for density forecasts. Third, for jump-diffusion models, GARCH generally provides better density forecasts than CEV. Fourth, jump-diffusion models have similar advantages over single-factor diffusion and GARCH models. They have much better characterization of the marginal density of the generalized residuals, but they may not necessarily perform better in modeling the dynamics of the generalized residuals, as indicated by their large $M(m, l)$ statistics. The jump model with a linear drift and level effect, although it has the smallest $M_1$ statistic, apparently fails to capture the even-order moments of the generalized residuals, with $M(2, 2)$ and $M(4, 4)$ statistics significantly higher than other jump-diffusion models that include GARCH effects.

Table 7. Separate Inference Statistics for Out-of-Sample Density Forecast Performance

| Model | M(1, 1) | M(1, 2) | M(2, 1) | M(2, 2) | M(3, 3) | M(4, 4) |
|---|---|---|---|---|---|---|
| Single-factor diffusion models | | | | | | |
| Random walk | 40.08 | 6.35 | 40.48 | 99.73 | 21.50 | 80.12 |
| Log-normal | 46.52 | 8.77 | 39.91 | 110.10 | 26.98 | 92.51 |
| Dothan | 46.04 | 8.07 | 42.62 | 108.90 | 27.02 | 91.03 |
| Pure CEV | 41.62 | 6.63 | 41.55 | 96.81 | 22.85 | 80.42 |
| Vasicek | 40.40 | 8.13 | 35.55 | 100.20 | 21.43 | 81.79 |
| CIR | 43.61 | 8.68 | 36.30 | 98.57 | 24.39 | 85.02 |
| CKLS | 42.02 | 8.55 | 35.75 | 97.89 | 22.80 | 82.77 |
| Nonlinear drift | 42.40 | 8.37 | 35.93 | 98.47 | 22.77 | 82.98 |
| GARCH models | | | | | | |
| No drift GARCH | 40.73 | 3.75 | 28.21 | 7.68 | 17.57 | 2.53 |
| Linear drift GARCH | 41.07 | 4.66 | 25.11 | 7.53 | 17.00 | 2.35 |
| Nonlinear drift GARCH | 40.71 | 4.87 | 24.70 | 7.02 | 16.63 | 2.09 |
| No drift CEV–GARCH | 41.30 | 4.10 | 28.40 | 7.46 | 19.24 | 2.74 |
| Linear drift CEV–GARCH | 41.68 | 5.11 | 25.06 | 7.09 | 18.73 | 2.44 |
| Nonlinear drift CEV–GARCH | 41.43 | 5.45 | 24.32 | 6.45 | 18.54 | 2.17 |
| Regime-switching models | | | | | | |
| No drift RS CEV | 53.16 | 3.13 | 11.31 | 7.05 | 35.02 | 10.21 |
| Linear drift RS CEV | 53.25 | 4.56 | 11.16 | 6.57 | 34.31 | 8.83 |
| Nonlinear drift RS CEV | 53.75 | 5.12 | 9.84 | 6.56 | 34.50 | 8.49 |
| No drift RS GARCH | 57.29 | 4.05 | 21.59 | 10.95 | 43.71 | 7.63 |
| Linear drift RS GARCH | 55.46 | 4.56 | 21.09 | 12.38 | 40.71 | 9.14 |
| Nonlinear drift RS GARCH | 55.69 | 4.50 | 21.05 | 12.14 | 41.13 | 8.93 |
| No drift RS CEV–GARCH | 45.35 | 3.69 | 20.91 | 10.67 | 34.25 | 7.08 |
| Linear drift RS CEV–GARCH | 44.10 | 4.86 | 20.75 | 11.99 | 32.12 | 8.87 |
| Nonlinear drift RS CEV–GARCH | 44.36 | 4.84 | 20.55 | 11.40 | 32.46 | 8.32 |
| Jump-diffusion models | | | | | | |
| No drift JD CEV | 47.38 | 5.32 | 16.76 | 85.10 | 43.01 | 99.31 |
| Linear drift JD CEV | 46.32 | 4.48 | 19.63 | 85.09 | 42.26 | 99.82 |
| Nonlinear drift JD CEV | 46.50 | 4.85 | 18.31 | 85.10 | 42.48 | 99.29 |
| No drift JD GARCH | 43.32 | 5.10 | 20.35 | 12.38 | 28.44 | 8.96 |
| Linear drift JD GARCH | 42.57 | 4.60 | 21.48 | 12.19 | 28.22 | 8.84 |
| Nonlinear drift JD GARCH | 42.62 | 4.67 | 21.42 | 12.34 | 28.24 | 8.95 |
| No drift JD CEV–GARCH | 43.55 | 5.05 | 20.51 | 13.36 | 29.31 | 10.07 |
| Linear drift JD CEV–GARCH | 42.74 | 4.54 | 21.69 | 13.00 | 28.93 | 9.76 |
| Nonlinear drift JD CEV–GARCH | 42.78 | 4.62 | 21.57 | 13.07 | 28.94 | 9.81 |

NOTE: This table reports the separate inference statistics $M(m, l)$ for the four classes of spot rate models. The asymptotically normal statistic $M(m, l)$ can be used to test whether the cross-correlation between the $m$th and $l$th moments of $\{Z_t\}$ is significantly different from 0. The choice of $(m, l) = (1, 1)$, $(2, 2)$, $(3, 3)$, and $(4, 4)$ is sensitive to autocorrelations in mean, variance, skewness, and kurtosis of $\{r_t\}$. We report results only for a lag truncation order $\bar{p} = 20$; the results for $\bar{p} = 10$ and 30 are rather similar. The asymptotic critical values are 1.28, 1.65, and 2.33 at the 10%, 5%, and 1% levels.

In summary, our out-of-sample density forecast evaluation reveals the following important features:

1. For out-of-sample density forecast, the importance of modeling mean reversion in the interest rate is ambiguous. For diffusion and GARCH models, a linear or nonlinear drift gives worse density forecasts than a zero drift. However, when there are regime switches or jumps, linear drift models outperform those with zero or nonlinear drift. The best density forecast models still fail to adequately capture the mean dynamics of the generalized residuals.

2. It is important to capture volatility clustering in interest rates for both in-sample and out-of-sample performance via GARCH. Most of the best-performing models have a GARCH component, which significantly improves their ability to model the dynamics of the even-order moments of the generalized residuals.

3. It is also important to capture excess kurtosis and heavy tails of the interest rate for both in-sample and out-of-sample performance via regime switching or jumps. The advantages of regime-switching and jump models are mainly reflected in modeling the marginal density rather than in the dynamics of the generalized residuals.

Overall, our out-of-sample analysis demonstrates that more-complicated models that incorporate conditional heteroscedasticity and heavy tails of interest rates tend to have better density forecasts. This result is quite different from the results of studies that focus on conditional mean forecasts. As widely documented in the literature, the predictable component in the conditional mean of the interest rate appears insignificant. As a result, random walk models tend to outperform more-sophisticated models in terms of mean forecast (e.g., Duffee 2002). However, density forecast includes all conditional moments, and as a result, those models that can capture the dynamics of higher-order moments tend to have better density forecasts. Our analysis suggests that the more-complicated spot rate models developed in the literature can indeed capture some important features of interest rates and are relevant in applications involving density forecasts of interest rates.


6. CONCLUSION

Despite the numerous empirical studies of spot interest rate models, little effort has been devoted to examining their out-of-sample forecast performance. We have contributed to the literature by providing the first (to our knowledge) comprehensive empirical analysis of the out-of-sample performance of a wide variety of popular spot rate models in forecasting the conditional density of future interest rates. Out-of-sample density forecasts help minimize the data snooping bias due to excessive searching for more-complicated models using similar datasets. The conditional density, which completely characterizes the full dynamics of interest rates, is also an essential input to many financial applications. Using a rigorous econometric evaluation procedure for density forecasts, we examined the out-of-sample performance of single-factor diffusion, GARCH, regime-switching, and jump-diffusion models.

Although previous studies have shown that simpler models, such as the random walk model, tend to have more accurate forecasts for the conditional mean of interest rates, we find that models that capture conditional heteroscedasticity and heavy tails of the spot rate have better out-of-sample density forecasts. GARCH significantly improves modeling of the conditional variance and kurtosis of the generalized residuals, whereas regime switching and jumps help in modeling the marginal density of interest rates. Therefore, our analysis shows that the more-sophisticated interest rate models developed in the literature can indeed capture some important features of interest rate data and perform better than simpler models in out-of-sample density forecasts. Although we have focused on density forecasts of the spot rates here, there is evidence that the term structure of interest rates is driven by multiple factors (e.g., Dai and Singleton 2000). Extending our analysis to forecast the joint density of multiple yields is an interesting issue that we will address in future research.

ACKNOWLEDGMENTS

The authors are grateful for the helpful comments of Canlin Li, Daniel Morillo, Larry Pohlman, and seminar participants at PanAgora Asset Management, the 2002 Econometric Society European Summer Meeting, and the 2003 Econometric Society North American Winter Meeting. They also thank the associate editor and two anonymous referees, whose comments and suggestions have greatly improved the article, and David George for editorial assistance. Hong's work was supported by National Science Foundation grant SES-0111769. Any errors are the authors' own.

APPENDIX: GENERALIZED SPECTRAL EVALUATION FOR DENSITY FORECASTS

Here we describe Hong's (2003) omnibus evaluation procedure for out-of-sample density forecasts, as well as a class of separate inference procedures. (For more discussion and formal treatment, see Hong 2003.)

A.1 Generalized Spectrum

Generalized spectrum was proposed by Hong (1999) as an analytic tool to provide an alternative to the power spectrum and the higher-order spectrum (e.g., bispectrum) in time series analysis. Define the centered generalized residual

$$Z_t \equiv Z_t(\theta) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \theta)\,dr - \frac{1}{2}, \qquad t = R+1, \ldots, T,$$

where $p(r, t \mid I_{t-1}, \theta)$ is a conditional density model for time series $\{r_t\}$. Suppose that the series $\{Z_t\}_{t=-\infty}^{\infty}$ has marginal characteristic function $\varphi(u) \equiv E(e^{iuZ_t})$ and pairwise joint characteristic function $\varphi_j(u, v) \equiv E[\exp\{i(uZ_t + vZ_{t-|j|})\}]$, where $i \equiv \sqrt{-1}$, $u, v \in (-\infty, \infty)$, and $j = 0, \pm 1, \ldots$. The basic idea of generalized spectrum is to transform the series $\{Z_t\}$ and then consider the spectrum of the transformed series. Define the generalized covariance function

$$\sigma_j(u, v) \equiv \operatorname{cov}(e^{iuZ_t}, e^{ivZ_{t-|j|}}), \qquad j = 0, \pm 1, \ldots.$$

Straightforward algebra yields $\sigma_j(u, v) = \varphi_j(u, v) - \varphi(u)\varphi(v)$. Because $\varphi_j(u, v) = \varphi(u)\varphi(v)$ for all $(u, v)$ if and only if $Z_t$ and $Z_{t-|j|}$ are independent (cf. Lukacs 1970), $\sigma_j(u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ over various lags, including those with zero autocorrelation in $\{Z_t\}$.
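As a minimal sketch of these definitions, the snippet below assumes, purely for illustration, that the one-step-ahead forecast density is Gaussian with a user-supplied mean and standard deviation. The function names are hypothetical and do not come from the article. The centered generalized residual is the forecast cdf evaluated at the realized rate minus one half, and the generalized covariance at lag j is estimated by replacing expectations with sample averages.

```python
import cmath
import math


def centered_generalized_residual(r, mean, std):
    """Z_t = F(r_t | I_{t-1}) - 1/2 under an ASSUMED Gaussian forecast density."""
    cdf = 0.5 * (1.0 + math.erf((r - mean) / (std * math.sqrt(2.0))))
    return cdf - 0.5


def generalized_covariance(z, j, u, v):
    """Sample analogue of sigma_j(u, v) = cov(exp(iuZ_t), exp(ivZ_{t-|j|}))."""
    j = abs(j)
    # Pair each observation with its own j-th lag.
    pairs = [(z[t], z[t - j]) for t in range(j, len(z))]
    m = len(pairs)
    joint = sum(cmath.exp(1j * (u * a + v * b)) for a, b in pairs) / m
    marg_u = sum(cmath.exp(1j * u * a) for a, _ in pairs) / m
    marg_v = sum(cmath.exp(1j * v * b) for _, b in pairs) / m
    # sigma_j = phi_j(u, v) - phi(u) * phi(v), as in the text (no conjugation).
    return joint - marg_u * marg_v
```

A realized rate equal to the forecast mean gives a residual of zero, and a series with no variation gives a generalized covariance of zero at every lag.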

Under certain regularity conditions on the temporal dependence of $\{Z_t\}$, the Fourier transform of the generalized covariance function $\sigma_j(u, v)$ exists and is given by

$$f(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi].$$

Like $\sigma_j(u, v)$, $f(\omega, u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ across various lags. An advantage of spectral analysis is that $f(\omega, u, v)$ incorporates all lags simultaneously. Thus it can capture the dependent processes in which serial dependence occurs only at higher-order lags. This can arise from seasonality (e.g., calendar effects) or time delay in financial markets. Moreover, $f(\omega, u, v)$ is particularly powerful in capturing the dependent alternatives whose serial dependence decays to 0 slowly as $j \to \infty$, such as slow mean-reverting or persistent volatility clustering processes. For such processes, $f(\omega, u, v)$ has a sharp peak at frequency 0.

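The function $f(\omega, u, v)$ does not require any moment condition on $\{Z_t\}$. When the moments of $\{Z_t\}$ exist, however, we can differentiate $f(\omega, u, v)$ with respect to $(u, v)$ at $(0, 0)$:

$$f^{(0,m,l)}(\omega, 0, 0) \equiv \frac{\partial^{m+l}}{\partial u^m\, \partial v^l} f(\omega, u, v) \bigg|_{(u,v)=(0,0)} = \frac{i^{m+l}}{2\pi} \sum_{j=-\infty}^{\infty} \operatorname{cov}(Z_t^m, Z_{t-|j|}^l)\, e^{-ij\omega}.$$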

In particular, $f^{(0,1,1)}(\omega, 0, 0)$ is the well-known power spectral density. (For applications of the power spectral density in economics and finance, see, e.g., Granger 1969; Durlauf 1991; Watson 1993.) For this reason, $f(\omega, u, v)$ is called the "generalized spectral density" of $\{Z_t\}$. The parameters $(u, v)$ provide much flexibility in capturing linear and nonlinear dependence. Because $f(\omega, u, v)$ can be decomposed as a weighted sum of $f^{(0,m,l)}(\omega, 0, 0)$ over various $(m, l)$ via a Taylor series expansion around $(0, 0)$, it contains information on all autocorrelations and cross-correlations in the various power terms of $\{Z_t\}$. Thus it can be used to develop an omnibus procedure against a wide range of dependent processes.
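To make the link between the generalized spectrum and ordinary moments explicit, the following worked step differentiates the generalized covariance term by term. It is a sketch that assumes $m, l \ge 1$ and that the required moments of $Z_t$ exist, so that differentiation and expectation can be interchanged:

$$\frac{\partial^{m+l}}{\partial u^m\, \partial v^l}\, \sigma_j(u, v) \bigg|_{(u,v)=(0,0)} = \operatorname{cov}\big((iZ_t)^m, (iZ_{t-|j|})^l\big) = i^{m+l} \operatorname{cov}(Z_t^m, Z_{t-|j|}^l).$$

For $m = l = 1$ we have $i^2 = -1$, so $f^{(0,1,1)}(\omega, 0, 0) = -\frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \operatorname{cov}(Z_t, Z_{t-|j|})\, e^{-ij\omega}$, which agrees with the conventional power spectral density up to sign.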

A.2 Omnibus Evaluation for Density Forecast

To evaluate out-of-sample density forecasts of model $p(r, t \mid I_{t-1}, \theta)$, Hong (2003) used the generalized spectrum to develop an omnibus test of iid $U[-\tfrac{1}{2}, \tfrac{1}{2}]$ for the centered generalized residual $\{Z_t\}$. To test $H_0$: $\{Z_t\} \sim$ iid $U[-\tfrac{1}{2}, \tfrac{1}{2}]$, we use the modified generalized covariance

$$\sigma_j^U(u, v) \equiv \varphi_j(u, v) - \varphi^U(u)\varphi^U(v),$$

where $\varphi^U(u) \equiv \sin(u/2)/(u/2)$ is the characteristic function of a $U[-\tfrac{1}{2}, \tfrac{1}{2}]$ random variable. Because $\sigma_j^U(u, v)$ has incorporated the information of $U[-\tfrac{1}{2}, \tfrac{1}{2}]$, it can capture any pairwise serial dependence in $\{Z_t\}$ and any deviation from $U[-\tfrac{1}{2}, \tfrac{1}{2}]$. The associated generalized spectrum is

$$f^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j^U(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi],\ u, v \in (-\infty, \infty).$$

Under $H_0$, we have $\sigma_j^U(u, v) = 0$ for all $j \neq 0$, and so $f^U(\omega, u, v)$ becomes a known "flat" spectrum,

$$f_0^U(\omega, u, v) \equiv \frac{1}{2\pi}\, \sigma_0^U(u, v) = \frac{1}{2\pi} [\varphi^U(u + v) - \varphi^U(u)\varphi^U(v)], \qquad \omega \in [-\pi, \pi].$$

When $Z_t$ and $Z_{t-j}$ are not independent at some lag $j \neq 0$, or when $Z_t$ is not $U[-\tfrac{1}{2}, \tfrac{1}{2}]$, we have $f^U(\omega, u, v) \neq f_0^U(\omega, u, v)$. Hence, to test $H_0$, one can check whether a consistent estimator of $f^U(\omega, u, v)$ is significantly different from $f_0^U(\omega, u, v)$.

Suppose that $\hat{\theta}$ is an estimator for model parameter $\theta$ based on the estimation sample $\{r_t\}_{t=1}^{R}$. To estimate $f^U(\omega, u, v)$ using the prediction sample $\{r_t\}_{t=R+1}^{N}$, we have to use the estimated proxies $\{\hat{Z}_t\}_{t=R+1}^{N}$, where

$$\hat{Z}_t \equiv Z_t(\hat{\theta}) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \hat{\theta})\,dr - \frac{1}{2}, \qquad t = R+1, \ldots, N.$$

Define the empirical measure

$$\hat{\sigma}_j^U(u, v) \equiv \hat{\varphi}_j(u, v) - \varphi^U(u)\varphi^U(v), \qquad j = 0, \pm 1, \ldots, \pm(n-1),$$

where $n \equiv N - R$ is the size of the prediction sample and the empirical pairwise joint characteristic function is

$$\hat{\varphi}_j(u, v) \equiv (n - |j|)^{-1} \sum_{t=R+|j|+1}^{N} \exp\{i(u\hat{Z}_t + v\hat{Z}_{t-|j|})\}.$$
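The empirical quantities above can be sketched in a few lines of standard-library Python. The function names are hypothetical, and the list `z_hat` is assumed to hold the estimated centered generalized residuals of the prediction sample in time order, so that index 0 corresponds to $t = R+1$.

```python
import cmath
import math


def phi_uniform(u):
    """Characteristic function of U[-1/2, 1/2]: sin(u/2)/(u/2), equal to 1 at u = 0."""
    if u == 0.0:
        return 1.0
    return math.sin(u / 2.0) / (u / 2.0)


def phi_hat(z_hat, j, u, v):
    """Empirical pairwise joint characteristic function at lag |j|."""
    j = abs(j)
    n = len(z_hat)
    total = sum(cmath.exp(1j * (u * z_hat[t] + v * z_hat[t - j])) for t in range(j, n))
    return total / (n - j)


def sigma_hat_u(z_hat, j, u, v):
    """Empirical modified generalized covariance: phi_hat_j(u, v) - phi^U(u) phi^U(v)."""
    return phi_hat(z_hat, j, u, v) - phi_uniform(u) * phi_uniform(v)
```

For residuals spread evenly over $[-\tfrac{1}{2}, \tfrac{1}{2}]$, `phi_hat(z_hat, 0, u, 0.0)` should be close to `phi_uniform(u)`, so the lag-0 measure is close to zero along that axis.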

We introduce a class of smoothed kernel estimators,

$$\hat{f}^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=1-n}^{n-1} (1 - |j|/n)^{1/2}\, k(j/p)\, \hat{\sigma}_j^U(u, v)\, e^{-ij\omega},$$

where $k(\cdot)$ is a kernel and $p \equiv p(n)$ is a bandwidth/lag order. An example of $k(\cdot)$ is the Bartlett kernel,

$$k(z) = \begin{cases} 1 - |z| & \text{if } |z| \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

For the choice of $p$, Hong (1999) suggested a data-driven method that delivers an asymptotically optimal bandwidth in terms of an integrated mean squared error criterion. This still involves a preliminary bandwidth $\bar{p}$, but the impact of choosing $\bar{p}$ is much smaller than the impact of choosing lag order $p$.
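The smoothed estimator can be sketched as follows, with the lag order `p` fixed by the user. The data-driven bandwidth choice of Hong (1999) is not implemented here, and all function names are hypothetical.

```python
import cmath
import math


def bartlett(z):
    """Bartlett kernel: k(z) = 1 - |z| for |z| <= 1, and 0 otherwise."""
    return 1.0 - abs(z) if abs(z) <= 1.0 else 0.0


def phi_uniform(u):
    """Characteristic function of U[-1/2, 1/2]."""
    return 1.0 if u == 0.0 else math.sin(u / 2.0) / (u / 2.0)


def sigma_hat_u(z_hat, j, u, v):
    """Empirical modified generalized covariance at lag |j|."""
    j = abs(j)
    n = len(z_hat)
    phi_j = sum(cmath.exp(1j * (u * z_hat[t] + v * z_hat[t - j]))
                for t in range(j, n)) / (n - j)
    return phi_j - phi_uniform(u) * phi_uniform(v)


def f_hat_u(z_hat, omega, u, v, p):
    """Kernel-smoothed estimate of f^U(omega, u, v) for a user-chosen lag order p."""
    n = len(z_hat)
    total = 0j
    for j in range(1 - n, n):
        weight = bartlett(j / p)
        if weight == 0.0:
            continue  # lags with |j| >= p receive zero weight under the Bartlett kernel
        total += (math.sqrt(1.0 - abs(j) / n) * weight
                  * sigma_hat_u(z_hat, j, u, v) * cmath.exp(-1j * j * omega))
    return total / (2.0 * math.pi)
```

With `p = 1` only the lag-0 term survives, so the estimate reduces to `sigma_hat_u(z_hat, 0, u, v) / (2 * pi)`, the sample counterpart of the flat spectrum.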

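A convenient divergence measure for $\hat{f}^U(\omega, u, v)$ and $f_0^U(\omega, u, v)$ is the quadratic form

$$\begin{aligned} 2\pi n\, Q(\hat{f}^U, f_0^U) &\equiv 2\pi n \int\!\!\int\!\!\int_{-\pi}^{\pi} |\hat{f}^U(\omega, u, v) - f_0^U(\omega, u, v)|^2\, d\omega\, dW(u)\, dW(v) \\ &= n \int\!\!\int |\hat{\varphi}_0(u + v) - \varphi^U(u + v)|^2\, dW(u)\, dW(v) \\ &\quad + 2 \sum_{j=1}^{n-1} k^2(j/p)(n - j) \int\!\!\int |\hat{\sigma}_j^U(u, v)|^2\, dW(u)\, dW(v), \end{aligned}$$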

where the weighting function $W(\cdot)$ is positive and nondecreasing, and the unspecified integrals are taken over the support of $W(\cdot)$. An example of this is

$$W(u) = \Phi(\sqrt{12}\, u),$$

where $\Phi(\cdot)$ is the $N(0, 1)$ cdf, which is commonly used in the characteristic function literature. The scale $\sqrt{12}$ matches the standard deviation of $U[-\tfrac{1}{2}, \tfrac{1}{2}]$, but this is not necessary.

Our evaluation statistic for $H_0$ is a properly centered and scaled version of the quadratic form $Q(\hat{f}^U, f_0^U)$:

$$M_1 \equiv \left[ \sum_{j=1}^{n-1} k^2(j/p) \right]^{-1} \times \int\!\!\int \sum_{j=1}^{n-1} k^2(j/p)(n - j)\, |\hat{\sigma}_j^U(u, v)|^2\, dW(u)\, dW(v) - \left[ \int [1 - \varphi^U(u)^2]\, dW(u) \right]^2.$$
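The following is a rough numerical sketch of the $M_1$ formula as displayed above, not the authors' implementation. It makes several assumptions that are not in the article: the integrals are approximated by a Riemann sum on an equally spaced grid over $[-1, 1]$, the weights come from the density of $W(u) = \Phi(\sqrt{12}\,u)$ without renormalizing for the truncation, and the lag order `p` is supplied by the user and must exceed 1. Because of this discretization, the output should not be compared directly with the tabulated critical values.

```python
import cmath
import math


def bartlett(z):
    """Bartlett kernel: k(z) = 1 - |z| for |z| <= 1, and 0 otherwise."""
    return 1.0 - abs(z) if abs(z) <= 1.0 else 0.0


def phi_uniform(u):
    """Characteristic function of U[-1/2, 1/2]."""
    return 1.0 if u == 0.0 else math.sin(u / 2.0) / (u / 2.0)


def m1_statistic(z_hat, p, grid_size=21):
    """Riemann-sum approximation of M_1 for centered generalized residuals z_hat."""
    n = len(z_hat)
    # Grid on [-1, 1]; W(u) = Phi(sqrt(12) u) has density sqrt(12 / (2 pi)) exp(-6 u^2).
    step = 2.0 / (grid_size - 1)
    us = [-1.0 + i * step for i in range(grid_size)]
    ws = [math.sqrt(12.0 / (2.0 * math.pi)) * math.exp(-6.0 * u * u) * step for u in us]
    phis = [phi_uniform(u) for u in us]
    # e[a][t] = exp(i * u_a * Z_t), reused for both the u and the v argument.
    e = [[cmath.exp(1j * u * z) for z in z_hat] for u in us]

    weighted_sum = 0.0
    k2_sum = 0.0
    for j in range(1, n):
        k2 = bartlett(j / p) ** 2
        if k2 == 0.0:
            continue  # the Bartlett kernel vanishes for lags j >= p
        k2_sum += k2
        integral = 0.0
        for a in range(grid_size):
            for b in range(grid_size):
                phi_j = sum(e[a][t] * e[b][t - j] for t in range(j, n)) / (n - j)
                sigma = phi_j - phis[a] * phis[b]
                integral += abs(sigma) ** 2 * ws[a] * ws[b]
        weighted_sum += k2 * (n - j) * integral

    centering = sum((1.0 - phi * phi) * w for phi, w in zip(phis, ws)) ** 2
    return weighted_sum / k2_sum - centering
```

As a sanity check, a degenerate residual series that always equals zero is far from uniform, and the statistic grows with the sample size, which matches the idea that worse density forecasts produce larger values of $M_1$.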

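The asymptotic distribution of $M_1$ was derived by Hong (2003). Under $H_0$ and a set of regularity conditions,

$$M_1 \xrightarrow{d} \int\!\!\int |G(u, v)|^2\, dW(u)\, dW(v)$$

as $n \to \infty$, $R \to \infty$, and $n^{1+\delta}/R \to 0$ for any arbitrary small constant $\delta > 0$, where $G(u, v)$ is a complex-valued Gaussian process with mean 0 and a known covariance function,

$$\begin{aligned} \operatorname{cov}[G(u_1, v_1), G(u_2, v_2)] &= E[G(u_1, v_1)\, G(u_2, v_2)^{*}] \\ &= \sigma_0^U(u_1, -u_2)\, \varphi^U(v_1)\varphi^U(v_2) + \sigma_0^U(u_1, -v_2)\, \varphi^U(u_2)\varphi^U(v_1) \\ &\quad + \sigma_0^U(v_1, -u_2)\, \varphi^U(u_1)\varphi^U(v_2) \\ &\quad + \sigma_0^U(v_1, -v_2)\, \varphi^U(u_1)\varphi^U(u_2). \end{aligned}$$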


The limit distribution of $M_1$ is nonstandard. However, because the Gaussian process $G(u, v)$ has mean 0 and a known distribution-free covariance function, the limit distribution of $M_1$ can be easily tabulated. When $W(u) = \Phi(\sqrt{12}\, u)$ with a truncated support on $[-1, 1]$, the asymptotic critical values at the 10%, 5%, and 1% levels are 0.37, 0.51, and 0.87.

The most attractive feature of $M_1$ is its omnibus property; it has power against a wide range of suboptimal density forecasts, thanks to the use of the characteristic function in a spectral framework. The $M_1$ statistic can be viewed as an omnibus metric measuring the departure of the forecast model from the true data-generating process. A better density forecast model is expected to have a smaller value for $M_1$, because its generalized residual series $\{Z_t\}$ is closer to having the iid $U[0, 1]$ property. Thus $M_1$ can be used to rank competing density forecast models in terms of their deviations from optimality. Moreover, the kernel function provides flexible weighting for various lags. Nonuniform weighting kernels usually discount higher-order lags. For most, if not all, economic and financial time series, today's behaviors are more affected by recent events than by remote events. Thus downward weighting for higher-order lags is expected to yield good power in finite samples.

A.3 Separate Inferences

When a model is rejected using $M_1$, it is interesting to explore possible reasons for the rejection. Hong (2003) provided a class of rigorous separate inference statistics that exploit the information contained in modeling generalized residuals $\{Z_t\}$ to understand possible reasons for model rejection. The $M(m, l)$ statistic is based on a kernel estimator of the generalized spectral density derivative $f^{(0,m,l)}(\omega, 0, 0)$, where $m$ and $l$ are the orders of the derivatives with respect to $u$ and $v$. The statistic $M(m, l)$ is defined as

$$M(m, l) = \left[ \sum_{j=1}^{n-1} (n - j)\, k^2(j/p)\, \hat{\rho}_j^2(m, l) - \sum_{j=1}^{n-1} k^2(j/p) \right] \Bigg/ \left[ 2 \sum_{j=1}^{n-2} k^4(j/p) \right]^{1/2},$$

where $\hat{\rho}_j(m, l)$ is the sample cross-correlation between $\hat{Z}_t^m$ and $\hat{Z}_{t-|j|}^l$, $t = R+1, \ldots, N$. Under $H_0$ and a set of regularity conditions,

$$M(m, l) \xrightarrow{d} N(0, 1)$$

for any given $(m, l)$ as $n, R \to \infty$. The upper-tailed $N(0, 1)$ critical values at the 10%, 5%, and 1% levels are 1.28, 1.65, and 2.33. As in the $M_1$ test, Hong (2003) considered a data-driven method for $p$ that also involves the choice of a preliminary lag order $\bar{p}$.
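A minimal sketch of the separate inference statistic follows. It assumes a Bartlett kernel with a user-chosen lag order `p` rather than the data-driven choice, and it computes the sample cross-correlation with a divisor of `n` at every lag, which is one common convention and not necessarily the one used by the authors. The function names are hypothetical.

```python
import math


def bartlett(z):
    """Bartlett kernel: k(z) = 1 - |z| for |z| <= 1, and 0 otherwise."""
    return 1.0 - abs(z) if abs(z) <= 1.0 else 0.0


def cross_correlation(x, y, j):
    """Sample cross-correlation between x_t and y_{t-j} for a lag j >= 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((x[t] - mean_x) * (y[t - j] - mean_y) for t in range(j, n)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov_xy / math.sqrt(var_x * var_y)


def m_statistic(z_hat, m, l, p):
    """M(m, l) computed from the m-th and l-th powers of the generalized residuals."""
    n = len(z_hat)
    x = [z ** m for z in z_hat]
    y = [z ** l for z in z_hat]
    numerator = 0.0
    for j in range(1, n):
        k2 = bartlett(j / p) ** 2
        if k2 == 0.0:
            continue  # lags j >= p contribute nothing under the Bartlett kernel
        rho = cross_correlation(x, y, j)
        numerator += (n - j) * k2 * rho ** 2 - k2
    scale = math.sqrt(2.0 * sum(bartlett(j / p) ** 4 for j in range(1, n - 1)))
    return numerator / scale
```

A slowly oscillating residual series is strongly autocorrelated in both level and square, so `m_statistic(z, 1, 1, 20)` and `m_statistic(z, 2, 2, 20)` far exceed the 1% critical value of 2.33 for such data.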

Although the moments of the generalized residuals $\{Z_t\}$ are not exactly the same as those of the original data $\{r_t\}$, they are highly correlated. In particular, the choice of $(m, l) = (1, 1)$, $(2, 2)$, $(3, 3)$, and $(4, 4)$ is very sensitive to autocorrelations in level, volatility, skewness, and kurtosis of $\{r_t\}$. Furthermore, the choices of $(m, l) = (1, 2)$ and $(2, 1)$ are sensitive to the "ARCH-in-mean" effect and the "leverage" effect. Different choices of order $(m, l)$ can thus allow examination of various dynamic aspects of the underlying process.

[Received June 2002. Revised November 2003.]

REFERENCES

Ait-Sahalia, Y. (1996), "Testing Continuous-Time Models of the Spot Interest Rate," Review of Financial Studies, 9, 385–426.

Ait-Sahalia, Y. (1999), "Transition Densities for Interest Rate and Other Nonlinear Diffusions," Journal of Finance, 54, 1361–1395.

Ait-Sahalia, Y. (2002), "Maximum-Likelihood Estimation of Discretely Sampled Diffusions: A Closed-Form Approach," Econometrica, 70, 223–262.

Ait-Sahalia, Y., and Lo, A. (1998), "Nonparametric Estimation of State-Price Densities Implicit in Financial Asset Prices," Journal of Finance, 53, 499–547.

Andersen, T. G., and Lund, J. (1997), "Estimating Continuous Time Stochastic Volatility Models of the Short-Term Interest Rate," Journal of Econometrics, 77, 343–378.

Ang, A., and Bekaert, G. (2002), "Regime Switches in Interest Rates," Journal of Business & Economic Statistics, 20, 163–182.

Bali, T. (2000), "Modeling the Conditional Mean and Variance of the Short Rate Using Diffusion, GARCH, and Moving Average Models," Journal of Futures Markets, 20, 717–751.

Bali, T. (2001), "Nonlinear Parametric Models of the Short-Term Interest Rate," unpublished manuscript, Baruch College.

Ball, C., and Torous, W. (1998), "Regime Shifts in Short-Term Riskless Interest Rates," unpublished manuscript, University of California, Los Angeles.

Bera, A. K., and Ghosh, A. (2002), "Density Forecast Evaluation With Applications in Finance," unpublished manuscript, University of Illinois at Champaign, Dept. of Economics.

Berkowitz, J. (2001), "Testing the Accuracy of Density Forecasts With Applications to Risk Management," Journal of Business & Economic Statistics, 19, 465–474.

Bliss, R., and Smith, D. (1998), "The Elasticity of Interest Rate Volatility: Chan, Karolyi, Longstaff, and Sanders Revisited," unpublished manuscript, Federal Reserve Bank of Atlanta.

Brenner, R., Harjes, R., and Kroner, K. (1996), "Another Look at Alternative Models of Short-Term Interest Rate," Journal of Financial and Quantitative Analysis, 31, 85–107.

Chacko, G., and Viceira, L. M. (2003), "Spectral GMM Estimation of Continuous-Time Processes," Journal of Econometrics, 116, 259–292.

Chan, K. C., Karolyi, G. A., Longstaff, F. A., and Sanders, A. B. (1992), "An Empirical Comparison of Alternative Models of the Short-Term Interest Rate," Journal of Finance, 47, 1209–1228.

Chapman, D., and Pearson, N. (2001), "Recent Advances in Estimating Models of the Term-Structure," Financial Analysts Journal, 57, 77–95.

Christoffersen, P., and Diebold, F. X. (1996), "Further Results on Forecasting and Model Selection Under Asymmetric Loss," Journal of Applied Econometrics, 11, 561–572.

Christoffersen, P., and Diebold, F. X. (1997), "Optimal Prediction Under Asymmetric Loss," Econometric Theory, 13, 808–817.

Clements, M. P., and Smith, J. (2000), "Evaluating the Forecast Densities of Linear and Non-Linear Models: Applications to Output Growth and Unemployment," Journal of Forecasting, 19, 255–276.

Conley, T. G., Hansen, L. P., Luttmer, E. G. J., and Scheinkman, J. A. (1997), "Short-Term Interest Rates as Subordinated Diffusions," Review of Financial Studies, 10, 525–578.

Corradi, V. (2000), "Reconsidering the Continuous-Time Limit of the GARCH(1,1) Process," Journal of Econometrics, 96, 145–153.

Cox, J. C., Ingersoll, J. E., and Ross, S. A. (1985), "A Theory of the Term Structure of Interest Rates," Econometrica, 53, 385–407.

Dai, Q., and Singleton, K. (2000), "Specification Analysis of Affine Term Structure Models," Journal of Finance, 55, 1943–1978.

Das, S. R. (2002), "The Surprise Element: Jumps in Interest Rates," Journal of Econometrics, 106, 27–65.

Diebold, F. X., Gunther, T. A., and Tay, A. S. (1998), "Evaluating Density Forecasts With Applications to Financial Risk Management," International Economic Review, 39, 863–883.

Diebold, F. X., Hahn, J., and Tay, A. S. (1999), "Multivariate Density Forecast Evaluation and Calibration in Financial Risk Management: High-Frequency Returns of Foreign Exchange," Review of Economics and Statistics, 81, 661–673.

Diebold, F. X., Tay, A. S., and Wallis, K. F. (1999), "Evaluating Density Forecasts of Inflation: The Survey of Professional Forecasters," in Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W. J. Granger, eds. R. F. Engle and H. White, Oxford, U.K.: Oxford University Press, pp. 76–90.


Dothan, L. U. (1978), "On the Term Structure of Interest Rates," Journal of Financial Economics, 6, 59–69.

Duffee, G. (2002), "Term Premia and Interest Rate Forecasts in Affine Models," Journal of Finance, 57, 405–443.

Duffie, D., and Pan, J. (1997), "An Overview of Value at Risk," Journal of Derivatives, 4, 13–32.

Durham, G. (2003), "Likelihood-Based Specification Analysis of Continuous-Time Models of the Short-Term Interest Rate," Journal of Financial Economics, 70, 463–487.

Durlauf, S. (1991), "Spectral Based Testing of the Martingale Hypothesis," Journal of Econometrics, 50, 355–376.

Fackler, P. L., and King, R. P. (1990), "Calibration of Option-Based Probability Assessments in Agricultural Commodity Markets," American Journal of Agricultural Economics, 72, 73–83.

Gallant, A. R., and Tauchen, G. (1998), "Reprojection Partially Observed Systems With Applications to Interest Rate Diffusions," Journal of the American Statistical Association, 93, 10–24.

Ghysels, E., Carrasco, M., Chernov, M., and Florens, J. (2001), "Estimating Diffusions With a Continuum of Moment Conditions," unpublished manuscript, University of North Carolina.

Granger, C. (1969), "Testing for Causality and Feedback," Econometrica, 37, 424–438.

Granger, C. (1999), "The Evaluation of Econometric Models and of Forecasts," invited lecture at the Far Eastern Meeting of the Econometric Society, Singapore.

Granger, C., and Pesaran, M. H. (2000), "A Decision Theoretic Approach to Forecasting Evaluation," in Statistics and Finance: An Interface, eds. W. S. Chan, W. K. Li, and H. Tong, London: Imperial College, pp. 261–278.

Gray, S. (1996), "Modeling the Conditional Distribution of Interest Rates as a Regime-Switching Process," Journal of Financial Economics, 42, 27–62.

Hamilton, J. D. (1989), "A New Approach to the Econometric Analysis of Nonstationary Time Series and the Business Cycle," Econometrica, 57, 357–384.

Hong, Y. (1999), "Hypothesis Testing in Time Series via the Empirical Characteristic Function: A Generalized Spectral Density Approach," Journal of the American Statistical Association, 94, 1201–1220.

Hong, Y. (2003), "Evaluation of Out-of-Sample Density Forecasts With Applications to Stock Prices," unpublished manuscript, Cornell University, Dept. of Economics and Dept. of Statistical Science.

Hong, Y., and Lee, T. H. (2003a), "Inference on the Predictability of Foreign Exchange Rate Changes via Generalized Spectrum and Nonlinear Models," Review of Economics and Statistics, 84, 1048–1062.

Hong, Y., and Lee, T. H. (2003b), "Diagnostic Checking for Nonlinear Time Series Models," Econometric Theory, 19, 1065–1121.

Hong, Y., and Li, H. (2004), "Nonparametric Specification Testing for Continuous-Time Models With Applications to Interest Rate Term Structures," Review of Financial Studies, forthcoming.

Jackwerth, J. C., and Rubinstein, M. (1996), "Recovering Probability Distributions From Option Prices," Journal of Finance, 51, 1611–1631.

Jiang, G., and Knight, J. (1997), "A Nonparametric Approach to the Estimation of Diffusion Processes With an Application to a Short-Term Interest Rate Model," Econometric Theory, 13, 615–645.

Johannes, M. (2003), "The Statistical and Economic Role of Jumps in Interest Rates," Journal of Finance, 59, 227–260.

Jorion, P. (2000), Value at Risk: The New Benchmark for Managing Financial Risk, New York: McGraw-Hill.

J. P. Morgan (1996), RiskMetrics - Technical Document (4th ed.), New York: Morgan Guaranty Trust Company.

Koedijk, K. G., Nissen, F. G., Schotman, P. C., and Wolff, C. C. P. (1997), "The Dynamics of Short-Term Interest Rate Volatility Reconsidered," European Finance Review, 1, 105–130.

Li, H., and Xu, Y. (2000), "Regime Shifts and Short Rate Dynamics," unpublished manuscript, Cornell University.

Lo, A., and MacKinlay, A. C. (1989), "Data-Snooping Biases in Tests of Financial Asset Pricing Models," Review of Financial Studies, 3, 175–208.

Lukacs, E. (1970), Characteristic Functions (2nd ed.), London: Charles Griffin.

Meese, R., and Rogoff, K. (1983), "Empirical Exchange Rate Models of the Seventies: Do They Fit Out-of-Sample?" Journal of International Economics, 14, 3–24.

Nelson, D. (1990), "ARCH Models as Diffusion Approximation," Journal of Econometrics, 45, 7–39.

Nelson, D. (1991), "Conditional Heteroskedasticity in Asset Returns: A New Approach," Econometrica, 59, 347–370.

Rosenblatt, M. (1952), "Remarks on a Multivariate Transformation," The Annals of Mathematical Statistics, 23, 470–472.

Rothschild, M., and Stiglitz, J. (1970), "Increasing Risk: I. A Definition," Journal of Economic Theory, 2, 225–243.

Singleton, K. (2001), "Estimation of Affine Asset Pricing Models Using the Empirical Characteristic Function," Journal of Econometrics, 102, 111–141.

Soderlind, P., and Svensson, L. (1997), "New Techniques to Extract Market Expectations From Financial Instruments," Journal of Monetary Economics, 40, 383–429.

Stanton, R. (1997), "A Nonparametric Model of Term Structure Dynamics and the Market Price of Interest Rate Risk," Journal of Finance, 52, 1973–2002.

Vasicek, O. (1977), "An Equilibrium Characterization of the Term Structure," Journal of Financial Economics, 5, 177–188.

Watson, M. (1993), "Measures of Fit for Calibrated Models," Journal of Political Economy, 101, 1011–1041.

White, H. (2000), "A Reality Check for Data Snooping," Econometrica, 69, 1097–1127.


5. OUT-OF-SAMPLE DENSITY FORECAST PERFORMANCE

The foregoing in-sample analysis demonstrates that modeling volatility clustering through GARCH and heavy tails and excess kurtosis through regime switching or jumps significantly improves the in-sample fit of historical interest rate data. However, it is not clear whether these models will also perform well in out-of-sample density forecasts for future interest rates. Indeed, some previous studies have shown that more complicated models actually underperform the simple random walk model in predicting the conditional mean of future interest rates (e.g., Duffee 2002). In many financial applications, we are interested in forecasting the whole conditional distribution. We now apply the generalized spectral evaluation method described in Section 2 to evaluate density forecasts of the models under study. Among other things, we examine whether the features found to be important for in-sample performance remain important for out-of-sample forecasts, and whether the best in-sample performing models still perform best in out-of-sample forecasts.

For each model, we first calculate $Z_t(\hat{\theta})$, the model generalized residuals, using the prediction sample, with model parameter estimates based on the estimation sample. Table 6 reports the out-of-sample evaluation statistic $M_1$ for each model. To compute $M_1$, we select a normal cdf $\Phi(\sqrt{12}\, u)$ for the weighting function $W(u)$ and a Bartlett kernel for the kernel function $k(z)$. To choose a lag order $p$, we use a data-driven method that minimizes the asymptotic integrated mean squared error of the modified generalized spectral density. This method involves choosing a preliminary lag order $\bar{p}$. We examine a wide range of $\bar{p}$ in our application. We obtain similar results for the preliminary lag order $\bar{p}$ between 10 and 30 and report results for $\bar{p} = 20$ in Table 6. For comparison, we also report the in-sample log-likelihood values obtained from the estimation sample.

The $M_1$ statistics show that all single-factor diffusion models are overwhelmingly rejected. One of the most interesting findings is that models that include a drift term (either linear or nonlinear) have much worse out-of-sample performance than those without a drift. Dothan's (1978) model and the pure CEV model have the best density forecast performance. Both models have a zero drift and level effect; the elasticity parameter $\rho = 1$ for the former and 25 (60 for the 1979–1982 period) for the latter. On the other hand, although the CKLS model and Ait-Sahalia's (1996) nonlinear drift model have the highest in-sample likelihood values, they have the worst out-of-sample density forecasts in terms of $M_1$. This seems to suggest that for the purpose of density forecast, it is much more important to model the diffusion function than the drift function. Of course, this does not necessarily mean that the drift is not important.

Table 6. Out-of-Sample Density Forecast Performance of Spot Interest Rate Models

| Model | $M_1$ | Log-likelihood |
|---|---|---|
| Single-factor diffusion models | | |
| Random walk | 1.08*** | 26496 |
| Log-normal | 1.21*** | 21870 |
| Dothan | 0.67** | 21849 |
| Pure CEV | 0.82** | 26545 |
| Vasicek | 2.13*** | 26555 |
| CIR | 2.19*** | 26720 |
| CKLS | 2.19*** | 26617 |
| Nonlinear drift | 2.24*** | 26625 |
| GARCH models | | |
| No drift GARCH | 0.79** | 56868 |
| Linear drift GARCH | 2.76*** | 56995 |
| Nonlinear drift GARCH | 3.39*** | 57063 |
| No drift CEV–GARCH | 0.77** | 57011 |
| Linear drift CEV–GARCH | 2.91*** | 57152 |
| Nonlinear drift CEV–GARCH | 3.58*** | 57255 |
| Regime-switching models | | |
| No drift RS CEV | 1.29*** | 64931 |
| Linear drift RS CEV | 0.91*** | 65079 |
| Nonlinear drift RS CEV | 1.62*** | 65128 |
| No drift RS GARCH | 1.18*** | 70790 |
| Linear drift RS GARCH | 0.47* | 71328 |
| Nonlinear drift RS GARCH | 0.55** | 71382 |
| No drift RS CEV–GARCH | 1.24*** | 70828 |
| Linear drift RS CEV–GARCH | 0.56** | 71386 |
| Nonlinear drift RS CEV–GARCH | 0.67** | 71441 |
| Jump-diffusion models | | |
| No drift JD CEV | 1.80*** | 63063 |
| Linear drift JD CEV | 0.39* | 63172 |
| Nonlinear drift JD CEV | 0.76** | 63263 |
| No drift JD GARCH | 1.78*** | 67941 |
| Linear drift JD GARCH | 0.56** | 68108 |
| Nonlinear drift JD GARCH | 0.67** | 68139 |
| No drift JD CEV–GARCH | 1.74*** | 67963 |
| Linear drift JD CEV–GARCH | 0.53** | 68129 |
| Nonlinear drift JD CEV–GARCH | 0.65** | 68161 |

NOTE: This table reports the out-of-sample density forecast performance of the single-factor diffusion, GARCH, Markov regime-switching, and jump-diffusion models measured by the $M_1$ statistic. For convenience of comparison, we also report the in-sample log-likelihood value for each model. The in-sample parameter estimation is based on the observations from June 14, 1961 to February 2, 1991, and the out-of-sample density forecast evaluation is based on the observations from February 23, 1991 to December 29, 2000. The ratio between the size of the estimation sample and the forecast sample is about 3:1. In computing the out-of-sample $M_1$ statistics, a preliminary lag order $\bar{p}$ of 20 is used. Other lag orders give similar results. The asymptotic critical values for the out-of-sample evaluation statistic are 0.87, 0.51, and 0.37 at the 1%, 5%, and 10% levels, and ***, **, and * indicate that the $M_1$ statistic is significant at the 1%, 5%, and 10% levels.
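The significance markers in Table 6 follow mechanically from the stated critical values. The short sketch below illustrates that mapping with a hypothetical helper, `significance_stars`, and with $M_1$ entries read to two decimal places (for example, 1.08 for the random walk, 0.67 for Dothan, and 0.47 for the linear drift RS GARCH model). The reading to two decimals is an assumption, but it is the one under which the markers agree with the critical values 0.37, 0.51, and 0.87.

```python
def significance_stars(m1, critical=(0.37, 0.51, 0.87)):
    """Map an M_1 value to "*", "**", or "***" using the 10%, 5%, and 1% critical values."""
    ten_pct, five_pct, one_pct = critical
    if m1 > one_pct:
        return "***"
    if m1 > five_pct:
        return "**"
    if m1 > ten_pct:
        return "*"
    return ""


# Illustrative entries, assuming two-decimal M_1 values as described above.
for name, value in [("Random walk", 1.08), ("Dothan", 0.67), ("Linear drift RS GARCH", 0.47)]:
    print(f"{name}: {value:.2f}{significance_stars(value)}")
```

A model whose $M_1$ fell below 0.37 would carry no marker, meaning that the iid uniform hypothesis for its generalized residuals would not be rejected at the 10% level.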


It might be simply because the drift specifications considered are severely misspecified and do not outperform the zero drift model. In fact, most drift parameter estimates have large standard errors and thus have excess sampling variation in parameter estimation. As a consequence, assuming zero drift may result in a smaller adverse effect on out-of-sample performance.

The generalized residuals $\{Z_t(\hat{\theta})\}$ contain much information about possible sources of model misspecification. Instead of being uniform, the histograms of the generalized residuals for all diffusion models (Fig. 2) exhibit a high peak in the center of the distribution, which indicates that the diffusion models cannot satisfactorily capture the marginal density of the spot rate. The separate inference statistics $M(m, l)$ in Table 7 also show that all the models fail to satisfactorily capture the dynamics of the generalized residuals. In particular, the large $M(2, 2)$ and $M(4, 4)$ statistics indicate that diffusion models perform rather poorly in modeling the conditional variance and kurtosis of the generalized residuals.

Similar to single-factor diffusion models, GARCH models with a zero drift have much better density forecasts than those models with a linear or nonlinear drift. The best GARCH and diffusion models have $M_1$ statistics roughly between 0.7 and 0.8. Comparing the best GARCH and single-factor diffusion models, we find that GARCH provides better density forecasts than CEV, and combining CEV and GARCH further improves the density forecasts. Diagnostic analysis of generalized residuals shows that GARCH models provide certain improvements over diffusion models. For example, the histograms of the generalized residuals of GARCH models have a less-pronounced peak in the center. The $M(m, l)$ statistics show that GARCH models significantly improve the fitting of even-order moments of the generalized residuals; the $M(2, 2)$ and $M(4, 4)$ statistics are reduced from close to 100 to single digits.

Figure 2. Histograms of the Generalized Residuals of Four Classes of Spot Rate Models. This figure plots the histograms of the generalized residuals of the single-factor diffusion, GARCH, regime-switching, and jump-diffusion spot rate models. Panels (a)–(h).

In summary, our analysis of single-factor diffusion and GARCH models demonstrates that both classes of models fail to provide good density forecasts for future interest rates. They fail to capture both the high peak in the center of the marginal density and the dynamics of different moments of the generalized residuals. GARCH models have a more uniform marginal density and significantly improve the ability to model the dynamics of even-order moments of the generalized residuals. For both classes of models, modeling the conditional variance through CEV or GARCH is much more important than modeling the conditional mean of the interest rate. Zero-drift models outperform either linear or nonlinear drift models in density forecast, although the $M(1, 1)$ statistics suggest that neither model adequately captures the dynamic structure in the conditional mean of the generalized residuals.

We next turn to regime-switching models. Regime-switching models generally have better density forecasts than single-factor diffusion and GARCH models; the best regime-switching models reduce the $M_1$ statistics of the best diffusion and GARCH models from above 0.7 to about 0.5. Although modeling the drift does not improve density forecast for diffusion and GARCH models, the best regime-switching models in density forecast have a linear drift in each regime. As pointed out by Ang and Bekaert (2002) and Li and Xu (2000), a regime-switching model with a linear drift in each regime actually implies nonlinear conditional mean dynamics for the interest rate. Thus our results suggest that perhaps a specific nonlinear drift is needed for out-of-sample density forecasts. GARCH is more important than CEV for density forecasts, and combining CEV with GARCH does not further improve model performance. Overall, the regime-switching GARCH model with a linear drift in each regime provides the best out-of-sample forecasts among all of the regime-switching models. Diagnostic analysis shows that regime-switching models provide a much better characterization of the marginal density of interest rates; the histograms of the generalized residuals are much closer to $U[0, 1]$. The $M(2, 2)$ and $M(4, 4)$ statistics of regime-switching models are slightly higher than those of GARCH models, however, and the $M(1, 1)$ and $M(3, 3)$ statistics of regime-switching models are actually higher than those of diffusion and GARCH models. Therefore, the advantages of regime-switching models seem to come from better modeling of the marginal density rather than from the dynamics of the generalized residuals.

The density forecast performance of jump-diffusion models shares many common features with that of regime-switching models. First, they have comparable performance; the $M_1$ statistics of the best jump-diffusion models are also around 0.5. Second, for jump-diffusion models, the linear drift specification outperforms those with no drift or nonlinear drift for density forecasts. Third, for jump-diffusion models, GARCH generally provides better density forecasts than CEV. Fourth, jump-diffusion models have similar advantages over single-factor diffusion and GARCH models. They have much better characterization of the marginal density of the generalized residuals, but they may not necessarily perform better in modeling the dynamics of the generalized residuals, as indicated by their large $M(m, l)$ statistics. The jump model with a linear drift and level effect, although it has the smallest $M_1$ statistic, apparently fails to capture the even-order moments of the generalized residuals, with $M(2, 2)$ and $M(4, 4)$ statistics significantly higher than other jump-diffusion models that include GARCH effects.

Table 7. Separate Inference Statistics for Out-of-Sample Density Forecast Performance

| Model | $M(1,1)$ | $M(1,2)$ | $M(2,1)$ | $M(2,2)$ | $M(3,3)$ | $M(4,4)$ |
|---|---|---|---|---|---|---|
| Single-factor diffusion models | | | | | | |
| Random walk | 40.08 | 6.35 | 40.48 | 99.73 | 21.50 | 80.12 |
| Log-normal | 46.52 | 8.77 | 39.91 | 110.10 | 26.98 | 92.51 |
| Dothan | 46.04 | 8.07 | 42.62 | 108.90 | 27.02 | 91.03 |
| Pure CEV | 41.62 | 6.63 | 41.55 | 96.81 | 22.85 | 80.42 |
| Vasicek | 40.40 | 8.13 | 35.55 | 100.20 | 21.43 | 81.79 |
| CIR | 43.61 | 8.68 | 36.30 | 98.57 | 24.39 | 85.02 |
| CKLS | 42.02 | 8.55 | 35.75 | 97.89 | 22.80 | 82.77 |
| Nonlinear drift | 42.40 | 8.37 | 35.93 | 98.47 | 22.77 | 82.98 |
| GARCH models | | | | | | |
| No drift GARCH | 40.73 | 3.75 | 28.21 | 7.68 | 17.57 | 2.53 |
| Linear drift GARCH | 41.07 | 4.66 | 25.11 | 7.53 | 17.00 | 2.35 |
| Nonlinear drift GARCH | 40.71 | 4.87 | 24.70 | 7.02 | 16.63 | 2.09 |
| No drift CEV–GARCH | 41.30 | 4.10 | 28.40 | 7.46 | 19.24 | 2.74 |
| Linear drift CEV–GARCH | 41.68 | 5.11 | 25.06 | 7.09 | 18.73 | 2.44 |
| Nonlinear drift CEV–GARCH | 41.43 | 5.45 | 24.32 | 6.45 | 18.54 | 2.17 |
| Regime-switching models | | | | | | |
| No drift RS CEV | 53.16 | 3.13 | 11.31 | 7.05 | 35.02 | 10.21 |
| Linear drift RS CEV | 53.25 | 4.56 | 11.16 | 6.57 | 34.31 | 8.83 |
| Nonlinear drift RS CEV | 53.75 | 5.12 | 9.84 | 6.56 | 34.50 | 8.49 |
| No drift RS GARCH | 57.29 | 4.05 | 21.59 | 10.95 | 43.71 | 7.63 |
| Linear drift RS GARCH | 55.46 | 4.56 | 21.09 | 12.38 | 40.71 | 9.14 |
| Nonlinear drift RS GARCH | 55.69 | 4.50 | 21.05 | 12.14 | 41.13 | 8.93 |
| No drift RS CEV–GARCH | 45.35 | 3.69 | 20.91 | 10.67 | 34.25 | 7.08 |
| Linear drift RS CEV–GARCH | 44.10 | 4.86 | 20.75 | 11.99 | 32.12 | 8.87 |
| Nonlinear drift RS CEV–GARCH | 44.36 | 4.84 | 20.55 | 11.40 | 32.46 | 8.32 |
| Jump-diffusion models | | | | | | |
| No drift JD CEV | 47.38 | 5.32 | 16.76 | 85.10 | 43.01 | 99.31 |
| Linear drift JD CEV | 46.32 | 4.48 | 19.63 | 85.09 | 42.26 | 99.82 |
| Nonlinear drift JD CEV | 46.50 | 4.85 | 18.31 | 85.10 | 42.48 | 99.29 |
| No drift JD GARCH | 43.32 | 5.10 | 20.35 | 12.38 | 28.44 | 8.96 |
| Linear drift JD GARCH | 42.57 | 4.60 | 21.48 | 12.19 | 28.22 | 8.84 |
| Nonlinear drift JD GARCH | 42.62 | 4.67 | 21.42 | 12.34 | 28.24 | 8.95 |
| No drift JD CEV–GARCH | 43.55 | 5.05 | 20.51 | 13.36 | 29.31 | 10.07 |
| Linear drift JD CEV–GARCH | 42.74 | 4.54 | 21.69 | 13.00 | 28.93 | 9.76 |
| Nonlinear drift JD CEV–GARCH | 42.78 | 4.62 | 21.57 | 13.07 | 28.94 | 9.81 |

NOTE: This table reports the separate inference statistics $M(m, l)$ for the four classes of spot rate models. The asymptotically normal statistic $M(m, l)$ can be used to test whether the cross-correlation between the $m$th and $l$th moments of $\{Z_t\}$ is significantly different from 0. The choice of $(m, l) = (1, 1)$, $(2, 2)$, $(3, 3)$, and $(4, 4)$ is sensitive to autocorrelations in mean, variance, skewness, and kurtosis of $\{r_t\}$. We report results only for a lag truncation order $p = 20$; the results for $p = 10$ and 30 are rather similar. The asymptotic critical values are 1.28, 1.65, and 2.33 at the 10%, 5%, and 1% levels.

In summary, our out-of-sample density forecast evaluation reveals the following important features:

1. For out-of-sample density forecast, the importance of modeling mean reversion in the interest rate is ambiguous. For diffusion and GARCH models, a linear or nonlinear drift gives worse density forecasts than a zero drift. However, when there are regime switches or jumps, linear drift models outperform those with zero or nonlinear drift. The best density forecast models still fail to adequately capture the mean dynamics of the generalized residuals.

2. It is important to capture volatility clustering in interest rates for both in-sample and out-of-sample performance via GARCH. Most of the best-performing models have a GARCH component, which significantly improves their ability to model the dynamics of the even-order moments of the generalized residuals.

3. It is also important to capture excess kurtosis and heavy tails of the interest rate for both in-sample and out-of-sample performance via regime switching or jumps. The advantages of regime-switching and jump models are mainly reflected in modeling the marginal density rather than in the dynamics of the generalized residuals.

Overall, our out-of-sample analysis demonstrates that more-complicated models that incorporate conditional heteroscedasticity and heavy tails of interest rates tend to have a better density forecast. This result is quite different from the results for those that focus on conditional mean forecast. As widely documented in the literature, the predictable component in the conditional mean of the interest rate appears insignificant. As a result, random walk models tend to outperform more-sophisticated models in terms of mean forecast (e.g., Duffee 2002). However, density forecast includes all conditional moments, and as a result, those models that can capture the dynamics of higher-order moments tend to have better density forecasts. Our analysis suggests that the more-complicated spot rate models developed in the literature can indeed capture some important features of interest rates and are relevant in applications involving density forecasts of interest rates.

470 Journal of Business & Economic Statistics, October 2004

6. CONCLUSION

Despite the numerous empirical studies of spot interest rate models, little effort has been devoted to examining their out-of-sample forecast performance. We have contributed to the literature by providing the first (to our knowledge) comprehensive empirical analysis of the out-of-sample performance of a wide variety of popular spot rate models in forecasting the conditional density of future interest rates. Out-of-sample density forecasts help minimize the data snooping bias due to excessive searching for more-complicated models using similar datasets. The conditional density, which completely characterizes the full dynamics of interest rates, is also an essential input to many financial applications. Using a rigorous econometric evaluation procedure for density forecasts, we examined the out-of-sample performance of single-factor diffusion, GARCH, regime-switching, and jump-diffusion models.

Although previous studies have shown that simpler models, such as the random walk model, tend to have more accurate forecasts for the conditional mean of interest rates, we find that models that capture conditional heteroscedasticity and heavy tails of the spot rate have better out-of-sample density forecasts. GARCH significantly improves modeling of the conditional variance and kurtosis of the generalized residuals, whereas regime-switching and jumps help in modeling the marginal density of interest rates. Therefore, our analysis shows that the more-sophisticated interest rate models developed in the literature can indeed capture some important features of interest rate data and perform better than simpler models in out-of-sample density forecasts. Although we have focused on density forecasts of the spot rates here, there is evidence that the term structure of interest rates is driven by multiple factors (e.g., Dai and Singleton 2000). Extending our analysis to forecast the joint density of multiple yields is an interesting issue that we will address in future research.

ACKNOWLEDGMENTS

The authors are grateful for the helpful comments of Canlin Li, Daniel Morillo, Larry Pohlman, and seminar participants at PanAgora Asset Management, the 2002 Econometric Society European Summer Meeting, and the 2003 Econometric Society North American Winter Meeting. They also thank the associate editor and two anonymous referees, whose comments and suggestions have greatly improved the article, and David George for editorial assistance. Hong's work was supported by National Science Foundation grant SES-0111769. Any errors are the authors' own.

APPENDIX: GENERALIZED SPECTRAL EVALUATION FOR DENSITY FORECASTS

Here we describe Hong's (2003) omnibus evaluation procedure for out-of-sample density forecasts, as well as a class of separate inference procedures. (For more discussion and formal treatment, see Hong 2003.)

A.1 Generalized Spectrum

Generalized spectrum was proposed by Hong (1999) as an analytic tool to provide an alternative to the power spectrum and the higher-order spectrum (e.g., bispectrum) in time series analysis. Define the centered generalized residual

$$Z_t \equiv Z_t(\theta) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \theta)\,dr - \frac{1}{2}, \qquad t = R+1, \ldots, T,$$

where $p(r, t \mid I_{t-1}, \theta)$ is a conditional density model for time series $\{r_t\}$. Suppose that the series $\{Z_t\}_{t=-\infty}^{\infty}$ has marginal characteristic function $\varphi(u) \equiv E(e^{iuZ_t})$ and pairwise joint characteristic function $\varphi_j(u,v) \equiv E[\exp\{i(uZ_t + vZ_{t-|j|})\}]$, where $i \equiv \sqrt{-1}$, $u, v \in (-\infty, \infty)$, and $j = 0, \pm 1, \ldots$. The basic idea of generalized spectrum is to transform the series $\{Z_t\}$ and then consider the spectrum of the transformed series. Define the generalized covariance function

$$\sigma_j(u,v) \equiv \operatorname{cov}\left(e^{iuZ_t}, e^{ivZ_{t-|j|}}\right), \qquad j = 0, \pm 1, \ldots.$$

Straightforward algebra yields $\sigma_j(u,v) = \varphi_j(u,v) - \varphi(u)\varphi(v)$. Because $\varphi_j(u,v) = \varphi(u)\varphi(v)$ for all $(u,v)$ if and only if $Z_t$ and $Z_{t-|j|}$ are independent (cf. Lukacs 1970), $\sigma_j(u,v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ over various lags, including those with zero autocorrelation in $\{Z_t\}$.
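As a minimal illustration of these definitions, the sketch below computes centered generalized residuals and a sample analog of the generalized covariance. It assumes, purely for illustration, a Gaussian random walk as the conditional density model (so that $r_t$ given $r_{t-1}$ is normal with mean $r_{t-1}$ and a fixed standard deviation); that model choice, the function names, and the use of sample averages in place of expectations are assumptions made here, not the article's implementation.

```python
import cmath
import math


def normal_cdf(x, mean, std):
    """CDF of N(mean, std^2); stands in for the integral of p(r, t | I_{t-1}, theta)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def centered_generalized_residuals(rates, std):
    """Z_t = F(r_t | r_{t-1}) - 1/2 under the assumed Gaussian random walk model."""
    return [normal_cdf(r, prev, std) - 0.5 for prev, r in zip(rates[:-1], rates[1:])]


def generalized_covariance(z, j, u, v):
    """Sample analog of sigma_j(u, v) = phi_j(u, v) - phi(u) * phi(v)."""
    j = abs(j)
    pairs = list(zip(z[j:], z[:len(z) - j]))  # pairs (Z_t, Z_{t-|j|})
    phi_j = sum(cmath.exp(1j * (u * a + v * b)) for a, b in pairs) / len(pairs)
    phi_u = sum(cmath.exp(1j * u * a) for a, _ in pairs) / len(pairs)
    phi_v = sum(cmath.exp(1j * v * b) for _, b in pairs) / len(pairs)
    return phi_j - phi_u * phi_v
```

If the assumed model were correct, the residuals would be spread evenly over [-1/2, 1/2] and the estimated generalized covariance would be close to 0 at every lag j ≠ 0 and every (u, v).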

Under certain regularity conditions on the temporal dependence of $\{Z_t\}$, the Fourier transform of the generalized covariance function $\sigma_j(u,v)$ exists and is given by

$$f(\omega,u,v) \equiv \frac{1}{2\pi}\sum_{j=-\infty}^{\infty} \sigma_j(u,v)e^{-ij\omega}, \qquad \omega \in [-\pi,\pi].$$

Like $\sigma_j(u,v)$, $f(\omega,u,v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ across various lags. An advantage of spectral analysis is that $f(\omega,u,v)$ incorporates all lags simultaneously. Thus it can capture the dependent processes in which serial dependence occurs only at higher-order lags. This can arise from seasonality (e.g., calendar effects) or time delay in financial markets. Moreover, $f(\omega,u,v)$ is particularly powerful in capturing the dependent alternatives whose serial dependence decays to 0 slowly as $j \to \infty$, such as slow mean-reverting or persistent volatility clustering processes. For such processes, $f(\omega,u,v)$ has a sharp peak at frequency 0.

The function $f(\omega,u,v)$ does not require any moment condition on $\{Z_t\}$. When the moments of $\{Z_t\}$ exist, however, we can differentiate $f(\omega,u,v)$ with respect to $(u,v)$ at $(0,0)$:

$$f^{(0,m,l)}(\omega,0,0) \equiv \left.\frac{\partial^{m+l}}{\partial u^m\,\partial v^l} f(\omega,u,v)\right|_{(u,v)=(0,0)} = \frac{i^{m+l}}{2\pi}\sum_{j=-\infty}^{\infty} \operatorname{cov}\left(Z_t^m, Z_{t-|j|}^l\right)e^{-ij\omega}.$$

In particular, $f^{(0,1,1)}(\omega,0,0)$ is the well-known power spectral density. (For applications of the power spectral density in economics and finance, see, e.g., Granger 1969; Durlauf 1991; Watson 1993.) For this reason, $f(\omega,u,v)$ is called the "generalized spectral density" of $\{Z_t\}$. The parameters $(u,v)$ provide much flexibility in capturing linear and nonlinear dependence. Because $f(\omega,u,v)$ can be decomposed as a weighted sum of $f^{(0,m,l)}(\omega,0,0)$ over various $(m,l)$ via a Taylor series expansion around $(0,0)$, it contains information on all autocorrelations and cross-correlations in the various power terms of $\{Z_t\}$. Thus it can be used to develop an omnibus procedure against a wide range of dependent processes.
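The Taylor decomposition mentioned above can be written out explicitly. The display below is a sketch under the added assumptions that all moments of $Z_t$ exist and that the series converges; it is not reproduced from the article.

$$f(\omega,u,v) = \sum_{m=1}^{\infty}\sum_{l=1}^{\infty} \frac{u^m v^l}{m!\,l!}\, f^{(0,m,l)}(\omega,0,0).$$

The terms with $m = 0$ or $l = 0$ drop out because $\sigma_j(0,v) = \varphi(v) - \varphi(0)\varphi(v) = 0$ and likewise $\sigma_j(u,0) = 0$, so $f(\omega,0,v) = f(\omega,u,0) = 0$. Each remaining term is the spectrum of the cross-covariances between $Z_t^m$ and $Z_{t-|j|}^l$, weighted by $u^m v^l/(m!\,l!)$.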

A.2 Omnibus Evaluation for Density Forecast

To evaluate out-of-sample density forecasts of model $p(r, t \mid I_{t-1}, \theta)$, Hong (2003) used the generalized spectrum to develop an omnibus test of iid $U[-\frac{1}{2}, \frac{1}{2}]$ for the centered generalized residual $\{Z_t\}$. To test $H_0$: $\{Z_t\} \sim$ iid $U[-\frac{1}{2}, \frac{1}{2}]$, we use the modified generalized covariance

$$\sigma_j^U(u,v) \equiv \varphi_j(u,v) - \varphi^U(u)\varphi^U(v),$$

where $\varphi^U(u) \equiv \sin(u/2)/(u/2)$ is the characteristic function of a $U[-\frac{1}{2}, \frac{1}{2}]$ random variable. Because $\sigma_j^U(u,v)$ has incorporated the information of $U[-\frac{1}{2}, \frac{1}{2}]$, it can capture any pairwise serial dependence in $\{Z_t\}$ and any deviation from $U[-\frac{1}{2}, \frac{1}{2}]$. The associated generalized spectrum is

$$f^U(\omega,u,v) \equiv \frac{1}{2\pi}\sum_{j=-\infty}^{\infty} \sigma_j^U(u,v)e^{-ij\omega}, \qquad \omega \in [-\pi,\pi],\ u, v \in (-\infty,\infty).$$

Under $H_0$, we have $\sigma_j^U(u,v) = 0$ for all $j \neq 0$, and so $f^U(\omega,u,v)$ becomes a known "flat" spectrum,

$$f_0^U(\omega,u,v) \equiv \frac{1}{2\pi}\sigma_0^U(u,v) = \frac{1}{2\pi}\left[\varphi^U(u+v) - \varphi^U(u)\varphi^U(v)\right], \qquad \omega \in [-\pi,\pi].$$

When $Z_t$ and $Z_{t-j}$ are not independent at some lag $j \neq 0$, or when $Z_t$ is not $U[-\frac{1}{2}, \frac{1}{2}]$, we have $f^U(\omega,u,v) \neq f_0^U(\omega,u,v)$. Hence, to test $H_0$, one can check whether a consistent estimator of $f^U(\omega,u,v)$ is significantly different from $f_0^U(\omega,u,v)$.

Suppose that $\hat{\theta}$ is an estimator for model parameter $\theta$ based on the estimation sample $\{r_t\}_{t=1}^{R}$. To estimate $f^U(\omega,u,v)$ using the prediction sample $\{r_t\}_{t=R+1}^{N}$, we have to use the estimated proxies $\{\hat{Z}_t\}_{t=R+1}^{N}$, where

$$\hat{Z}_t \equiv Z_t(\hat{\theta}) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \hat{\theta})\,dr - \frac{1}{2}, \qquad t = R+1, \ldots, N.$$

Define the empirical measure

$$\hat{\sigma}_j^U(u,v) \equiv \hat{\varphi}_j(u,v) - \varphi^U(u)\varphi^U(v), \qquad j = 0, \pm 1, \ldots, \pm(n-1),$$

where $n \equiv N - R$ is the size of the prediction sample and the empirical pairwise joint characteristic function is

$$\hat{\varphi}_j(u,v) \equiv (n - |j|)^{-1}\sum_{t=R+|j|+1}^{N} \exp\left\{i\left(u\hat{Z}_t + v\hat{Z}_{t-|j|}\right)\right\}.$$

We introduce a class of smoothed kernel estimators,

$$\hat{f}^U(\omega,u,v) \equiv \frac{1}{2\pi}\sum_{j=1-n}^{n-1} (1 - |j|/n)^{1/2}\,k(j/p)\,\hat{\sigma}_j^U(u,v)e^{-ij\omega},$$

where $k(\cdot)$ is a kernel and $p \equiv p(n)$ is a bandwidth/lag order. An example of $k(\cdot)$ is the Bartlett kernel,

$$k(z) = \begin{cases} 1 - |z| & \text{if } |z| \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

For the choice of $p$, Hong (1999) suggested a data-driven method that delivers an asymptotically optimal bandwidth in terms of an integrated mean squared error criterion. This still involves a preliminary bandwidth $\bar{p}$, but the impact of choosing $\bar{p}$ is much smaller than the impact of choosing lag order $p$.
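The building blocks of the kernel estimator can be sketched in a few lines. The code below is an illustrative reading of the definitions above, not the authors' code: the function names are hypothetical, the list `z` is taken to hold the estimated residuals over the prediction sample, and the lag order `p` is supplied by the user rather than chosen by the data-driven method.

```python
import cmath
import math


def bartlett_kernel(z):
    """Bartlett kernel: k(z) = 1 - |z| for |z| <= 1, and 0 otherwise."""
    return 1.0 - abs(z) if abs(z) <= 1.0 else 0.0


def phi_uniform(u):
    """Characteristic function of U[-1/2, 1/2]: sin(u/2) / (u/2), equal to 1 at u = 0."""
    return 1.0 if u == 0.0 else math.sin(u / 2.0) / (u / 2.0)


def sigma_hat_u(z, j, u, v):
    """Empirical sigma^U_j(u, v) = phi_hat_j(u, v) - phi^U(u) * phi^U(v)."""
    j = abs(j)
    n = len(z)
    # Average exp{i(u Z_t + v Z_{t-|j|})} over the n - |j| available pairs.
    phi_hat = sum(cmath.exp(1j * (u * z[t] + v * z[t - j])) for t in range(j, n)) / (n - j)
    return phi_hat - phi_uniform(u) * phi_uniform(v)
```

With the Bartlett kernel, lags with $|j| \ge p$ receive zero weight, so only the first $p - 1$ lags of `sigma_hat_u` enter the estimator.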

A convenient divergence measure for $\hat{f}^U(\omega,u,v)$ and $f_0^U(\omega,u,v)$ is the quadratic form

$$\begin{aligned}
2\pi n\,Q(\hat{f}^U, f_0^U) &\equiv 2\pi n \int\!\!\int\!\!\int_{-\pi}^{\pi} \left|\hat{f}^U(\omega,u,v) - f_0^U(\omega,u,v)\right|^2 d\omega\,dW(u)\,dW(v) \\
&= n\int\!\!\int \left|\hat{\varphi}_0(u+v) - \varphi^U(u+v)\right|^2 dW(u)\,dW(v) \\
&\quad + 2\sum_{j=1}^{n-1} k^2(j/p)(n-j)\int\!\!\int \left|\hat{\sigma}_j^U(u,v)\right|^2 dW(u)\,dW(v),
\end{aligned}$$

where the weighting function $W(\cdot)$ is positive and nondecreasing, and the unspecified integrals are taken over the support of $W(\cdot)$. An example of this is

$$W(u) = \Phi(\sqrt{12}\,u),$$

where $\Phi(\cdot)$ is the $N(0,1)$ cdf, which is commonly used in the characteristic function literature. The scale $\sqrt{12}$ matches the standard deviation of $U[-\frac{1}{2}, \frac{1}{2}]$, but this is not necessary.

Our evaluation statistic for $H_0$ is a properly centered and scaled version of the quadratic form $Q(\hat{f}^U, f_0^U)$:

$$M_1 \equiv \left[\sum_{j=1}^{n-1} k^2(j/p)\right]^{-1} \int\!\!\int \sum_{j=1}^{n-1} k^2(j/p)(n-j)\left|\hat{\sigma}_j^U(u,v)\right|^2 dW(u)\,dW(v) - \left[\int \left[1 - \varphi^U(u)^2\right] dW(u)\right]^2.$$

The asymptotic distribution of $M_1$ was derived by Hong (2003). Under $H_0$ and a set of regularity conditions,

$$M_1 \xrightarrow{d} \int\!\!\int |G(u,v)|^2\,dW(u)\,dW(v)$$

as $n \to \infty$, $R \to \infty$, and $n^{1+\delta}/R \to 0$ for any arbitrary small constant $\delta > 0$, where $G(u,v)$ is a complex-valued Gaussian process with mean 0 and a known covariance function,

$$\begin{aligned}
\operatorname{cov}[G(u_1,v_1), G(u_2,v_2)] &= E[G(u_1,v_1)G(u_2,v_2)^{*}] \\
&= \sigma_0^U(u_1,-u_2)\varphi^U(v_1)\varphi^U(v_2) + \sigma_0^U(u_1,-v_2)\varphi^U(u_2)\varphi^U(v_1) \\
&\quad + \sigma_0^U(v_1,-u_2)\varphi^U(u_1)\varphi^U(v_2) + \sigma_0^U(v_1,-v_2)\varphi^U(u_1)\varphi^U(u_2).
\end{aligned}$$


The limit distribution of $M_1$ is nonstandard. However, because the Gaussian process $G(u,v)$ has mean 0 and a known distribution-free covariance function, the limit distribution of $M_1$ can be easily tabulated. When $W(u) = \Phi(\sqrt{12}\,u)$ with a truncated support on $[-1,1]$, the asymptotic critical values at the 10%, 5%, and 1% levels are 0.37, 0.51, and 0.87.

The most attractive feature of $M_1$ is its omnibus property; it has power against a wide range of suboptimal density forecasts, thanks to the use of the characteristic function in a spectral framework. The $M_1$ statistic can be viewed as an omnibus metric measuring the departure of the forecast model from the true data-generating process. A better density forecast model is expected to have a smaller value for $M_1$, because its generalized residual series $\{Z_t\}$ is closer to having the iid U[0,1] property. Thus $M_1$ can be used to rank competing density forecast models in terms of their deviations from optimality. Moreover, the kernel function provides flexible weighting for various lags. Nonuniform weighting kernels usually discount higher-order lags. For most, if not all, economic and financial time series, today's behaviors are more affected by recent events than by remote events. Thus downward weighting for higher-order lags is expected to yield good power in finite samples.
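A rough numerical sketch of the $M_1$ formula as printed above is given below. It is a hedged illustration rather than Hong's (2003) implementation: it assumes the Bartlett kernel with a fixed, user-supplied lag order `p`, approximates the integrals with respect to $W(u) = \Phi(\sqrt{12}\,u)$ by a midpoint rule on the truncated support $[-1,1]$, and uses hypothetical function names and a hypothetical grid size.

```python
import cmath
import math


def phi_uniform(u):
    """Characteristic function of U[-1/2, 1/2]."""
    return 1.0 if u == 0.0 else math.sin(u / 2.0) / (u / 2.0)


def weight_grid(num_points=21):
    """Midpoint grid on [-1, 1] with masses dW(u) for W(u) = Phi(sqrt(12) * u)."""
    step = 2.0 / num_points
    scale = math.sqrt(12.0)
    grid = []
    for g in range(num_points):
        u = -1.0 + (g + 0.5) * step
        density = scale * math.exp(-0.5 * (scale * u) ** 2) / math.sqrt(2.0 * math.pi)
        grid.append((u, density * step))
    return grid


def m1_statistic(z, p, num_points=21):
    """Grid approximation of M1 for residuals z, Bartlett kernel, lag order p > 1."""
    n = len(z)
    grid = weight_grid(num_points)
    weighted_sum = 0.0
    kernel_sum = 0.0
    for j in range(1, n):
        k2 = max(0.0, 1.0 - j / p) ** 2  # squared Bartlett kernel k^2(j/p)
        if k2 == 0.0:
            continue
        kernel_sum += k2
        integral = 0.0
        for u, wu in grid:
            for v, wv in grid:
                phi_hat = sum(cmath.exp(1j * (u * z[t] + v * z[t - j]))
                              for t in range(j, n)) / (n - j)
                sigma = phi_hat - phi_uniform(u) * phi_uniform(v)
                integral += abs(sigma) ** 2 * wu * wv
        weighted_sum += k2 * (n - j) * integral
    if kernel_sum == 0.0:
        raise ValueError("lag order p must exceed 1 so that some lag has positive weight")
    centering = sum((1.0 - phi_uniform(u) ** 2) * wu for u, wu in grid) ** 2
    return weighted_sum / kernel_sum - centering
```

Because smaller values indicate residuals closer to iid uniform, a residual series stuck at a single value should produce a much larger statistic than a pseudo-random uniform series of the same length. The critical values quoted above apply to the asymptotic distribution, so finite-sample values from this grid approximation are only indicative.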

A.3 Separate Inferences

When a model is rejected using $M_1$, it is interesting to explore possible reasons for the rejection. Hong (2003) provided a class of rigorous separate inference statistics that exploit the information contained in modeling generalized residuals $\{Z_t\}$ to understand possible reasons for model rejection. The $M(m,l)$ statistic is based on a kernel estimator of the generalized spectral density derivative $f^{(0,m,l)}(\omega,0,0)$, where $m$ and $l$ are the orders of the derivatives with respect to $u$ and $v$. The statistic $M(m,l)$ is defined as

$$M(m,l) = \left[\sum_{j=1}^{n-1} (n-j)k^2(j/p)\hat{\rho}_j^2(m,l) - \sum_{j=1}^{n-1} k^2(j/p)\right] \Bigg/ \left[2\sum_{j=1}^{n-2} k^4(j/p)\right]^{1/2},$$

where $\hat{\rho}_j(m,l)$ is the sample cross-correlation between $\hat{Z}_t^m$ and $\hat{Z}_{t-|j|}^l$, $t = R+1, \ldots, N$. Under $H_0$ and a set of regularity conditions,

$$M(m,l) \xrightarrow{d} N(0,1)$$

for any given $(m,l)$ as $n, R \to \infty$. The upper-tailed $N(0,1)$ critical values at the 10%, 5%, and 1% levels are 1.28, 1.65, and 2.33. As in the $M_1$ test, Hong (2003) considered a data-driven method for $p$ that also involves the choice of a preliminary lag order $\bar{p}$.
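The $M(m,l)$ statistic is simple enough to sketch directly. The code below is an illustration under stated assumptions, not the authors' implementation: it uses the Bartlett kernel with a fixed lag order `p` instead of the data-driven choice, normalizes the sample cross-covariance by $n$, and the function names are hypothetical. It requires a nonconstant residual series, since the cross-correlation is undefined when the sample variance is zero.

```python
import math


def sample_cross_correlation(z, j, m, l):
    """Sample cross-correlation between Z_t^m and Z_{t-j}^l for a lag j >= 1."""
    n = len(z)
    x = [val ** m for val in z]
    y = [val ** l for val in z]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((x[t] - mean_x) * (y[t - j] - mean_y) for t in range(j, n)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / math.sqrt(var_x * var_y)


def separate_inference_statistic(z, m, l, p):
    """M(m, l) with the Bartlett kernel and a fixed lag order p > 1."""
    n = len(z)
    k2 = [max(0.0, 1.0 - j / p) ** 2 for j in range(1, n)]  # k^2(j/p), j = 1..n-1
    # Numerator: sum of k^2 * (n - j) * rho^2 minus sum of k^2, over lags with weight.
    centered = sum(
        k2[j - 1] * ((n - j) * sample_cross_correlation(z, j, m, l) ** 2 - 1.0)
        for j in range(1, n) if k2[j - 1] > 0.0
    )
    scale = math.sqrt(2.0 * sum(w ** 2 for w in k2[:n - 2]))  # j = 1..n-2
    return centered / scale
```

A value above 2.33 rejects at the 1% level in the upper tail. For example, a slowly oscillating residual series such as `[0.4 * math.sin(2 * math.pi * t / 50) for t in range(200)]` is strongly autocorrelated in level, so `separate_inference_statistic(z, 1, 1, 10)` lies far above that threshold.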

Although the moments of the generalized residuals $\{Z_t\}$ are not exactly the same as those of the original data $\{r_t\}$, they are highly correlated. In particular, the choice of $(m,l) = (1,1)$, $(2,2)$, $(3,3)$, and $(4,4)$ is very sensitive to autocorrelations in level, volatility, skewness, and kurtosis of $\{r_t\}$. Furthermore, the choices of $(m,l) = (1,2)$ and $(2,1)$ are sensitive to the "ARCH-in-mean" effect and the "leverage" effect. Different choices of order $(m,l)$ can thus allow examination of various dynamic aspects of the underlying process.

[Received June 2002. Revised November 2003.]

REFERENCES

Ait-Sahalia, Y. (1996), "Testing Continuous-Time Models of the Spot Interest Rate," Review of Financial Studies, 9, 385–426.

Ait-Sahalia, Y. (1999), "Transition Densities for Interest Rate and Other Nonlinear Diffusions," Journal of Finance, 54, 1361–1395.

Ait-Sahalia, Y. (2002), "Maximum-Likelihood Estimation of Discretely Sampled Diffusions: A Closed-Form Approach," Econometrica, 70, 223–262.

Ait-Sahalia, Y., and Lo, A. (1998), "Nonparametric Estimation of State-Price Densities Implicit in Financial Asset Prices," Journal of Finance, 53, 499–547.

Andersen, T. G., and Lund, J. (1997), "Estimating Continuous Time Stochastic Volatility Models of the Short-Term Interest Rate," Journal of Econometrics, 77, 343–378.

Ang, A., and Bekaert, G. (2002), "Regime Switches in Interest Rates," Journal of Business & Economic Statistics, 20, 163–182.

Bali, T. (2000), "Modeling the Conditional Mean and Variance of the Short Rate Using Diffusion, GARCH, and Moving Average Models," Journal of Futures Markets, 20, 717–751.

Bali, T. (2001), "Nonlinear Parametric Models of the Short-Term Interest Rate," unpublished manuscript, Baruch College.

Ball, C., and Torous, W. (1998), "Regime Shifts in Short-Term Riskless Interest Rates," unpublished manuscript, University of California, Los Angeles.

Bera, A. K., and Ghosh, A. (2002), "Density Forecast Evaluation With Applications in Finance," unpublished manuscript, University of Illinois at Champaign, Dept. of Economics.

Berkowitz, J. (2001), "Testing the Accuracy of Density Forecasts With Applications to Risk Management," Journal of Business & Economic Statistics, 19, 465–474.

Bliss, R., and Smith, D. (1998), "The Elasticity of Interest Rate Volatility: Chan, Karolyi, Longstaff, and Sanders Revisited," unpublished manuscript, Federal Reserve Bank of Atlanta.

Brenner, R., Harjes, R., and Kroner, K. (1996), "Another Look at Alternative Models of Short-Term Interest Rate," Journal of Financial and Quantitative Analysis, 31, 85–107.

Chacko, G., and Viceira, L. M. (2003), "Spectral GMM Estimation of Continuous-Time Processes," Journal of Econometrics, 116, 259–292.

Chan, K. C., Karolyi, G. A., Longstaff, F. A., and Sanders, A. B. (1992), "An Empirical Comparison of Alternative Models of the Short-Term Interest Rate," Journal of Finance, 47, 1209–1228.

Chapman, D., and Pearson, N. (2001), "Recent Advances in Estimating Models of the Term-Structure," Financial Analysts Journal, 57, 77–95.

Christoffersen, P., and Diebold, F. X. (1996), "Further Results on Forecasting and Model Selection Under Asymmetric Loss," Journal of Applied Econometrics, 11, 561–572.

Christoffersen, P., and Diebold, F. X. (1997), "Optimal Prediction Under Asymmetric Loss," Econometric Theory, 13, 808–817.

Clements, M. P., and Smith, J. (2000), "Evaluating the Forecast Densities of Linear and Non-Linear Models: Applications to Output Growth and Unemployment," Journal of Forecasting, 19, 255–276.

Conley, T. G., Hansen, L. P., Luttmer, E. G. J., and Scheinkman, J. A. (1997), "Short-Term Interest Rates as Subordinated Diffusions," Review of Financial Studies, 10, 525–578.

Corradi, V. (2000), "Reconsidering the Continuous-Time Limit of the GARCH(1,1) Process," Journal of Econometrics, 96, 145–153.

Cox, J. C., Ingersoll, J. E., and Ross, S. A. (1985), "A Theory of the Term Structure of Interest Rates," Econometrica, 53, 385–407.

Dai, Q., and Singleton, K. (2000), "Specification Analysis of Affine Term Structure Models," Journal of Finance, 55, 1943–1978.

Das, S. R. (2002), "The Surprise Element: Jumps in Interest Rates," Journal of Econometrics, 106, 27–65.

Diebold, F. X., Gunther, T. A., and Tay, A. S. (1998), "Evaluating Density Forecasts With Applications to Financial Risk Management," International Economic Review, 39, 863–883.

Diebold, F. X., Hahn, J., and Tay, A. S. (1999), "Multivariate Density Forecast Evaluation and Calibration in Financial Risk Management: High-Frequency Returns of Foreign Exchange," Review of Economics and Statistics, 81, 661–673.

Diebold, F. X., Tay, A. S., and Wallis, K. F. (1999), "Evaluating Density Forecasts of Inflation: The Survey of Professional Forecasters," in Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W. J. Granger, eds. R. F. Engle and H. White, Oxford, U.K.: Oxford University Press, pp. 76–90.


Dothan, L. U. (1978), "On the Term Structure of Interest Rates," Journal of Financial Economics, 6, 59–69.

Duffee, G. (2002), "Term Premia and Interest Rate Forecasts in Affine Models," Journal of Finance, 57, 405–443.

Duffie, D., and Pan, J. (1997), "An Overview of Value at Risk," Journal of Derivatives, 4, 13–32.

Durham, G. (2003), "Likelihood-Based Specification Analysis of Continuous-Time Models of the Short-Term Interest Rate," Journal of Financial Economics, 70, 463–487.

Durlauf, S. (1991), "Spectral Based Testing of the Martingale Hypothesis," Journal of Econometrics, 50, 355–376.

Fackler, P. L., and King, R. P. (1990), "Calibration of Option-Based Probability Assessments in Agricultural Commodity Markets," American Journal of Agricultural Economics, 72, 73–83.

Gallant, A. R., and Tauchen, G. (1998), "Reprojection Partially Observed Systems With Applications to Interest Rate Diffusions," Journal of the American Statistical Association, 93, 10–24.

Ghysels, E., Carrasco, M., Chernov, M., and Florens, J. (2001), "Estimating Diffusions With a Continuum of Moment Conditions," unpublished manuscript, University of North Carolina.

Granger, C. (1969), "Testing for Causality and Feedback," Econometrica, 37, 424–438.

Granger, C. (1999), "The Evaluation of Econometric Models and of Forecasts," invited lecture at the Far Eastern Meeting of the Econometric Society, Singapore.

Granger, C., and Pesaran, M. H. (2000), "A Decision Theoretic Approach to Forecasting Evaluation," in Statistics and Finance: An Interface, eds. W. S. Chan, W. K. Li, and H. Tong, London: Imperial College, pp. 261–278.

Gray, S. (1996), "Modeling the Conditional Distribution of Interest Rates as a Regime-Switching Process," Journal of Financial Economics, 42, 27–62.

Hamilton, J. D. (1989), "A New Approach to the Econometric Analysis of Nonstationary Time Series and the Business Cycle," Econometrica, 57, 357–384.

Hong, Y. (1999), "Hypothesis Testing in Time Series via the Empirical Characteristic Function: A Generalized Spectral Density Approach," Journal of the American Statistical Association, 94, 1201–1220.

Hong, Y. (2003), "Evaluation of Out-of-Sample Density Forecasts With Applications to Stock Prices," unpublished manuscript, Cornell University, Dept. of Economics and Dept. of Statistical Science.

Hong, Y., and Lee, T. H. (2003a), "Inference on the Predictability of Foreign Exchange Rate Changes via Generalized Spectrum and Nonlinear Models," Review of Economics and Statistics, 84, 1048–1062.

Hong, Y., and Lee, T. H. (2003b), "Diagnostic Checking for Nonlinear Time Series Models," Econometric Theory, 19, 1065–1121.

Hong, Y., and Li, H. (2004), "Nonparametric Specification Testing for Continuous-Time Models With Applications to Interest Rate Term Structures," Review of Financial Studies, forthcoming.

Jackwerth, J. C., and Rubinstein, M. (1996), "Recovering Probability Distributions From Option Prices," Journal of Finance, 51, 1611–1631.

Jiang, G., and Knight, J. (1997), "A Nonparametric Approach to the Estimation of Diffusion Processes With an Application to a Short-Term Interest Rate Model," Econometric Theory, 13, 615–645.

Johannes, M. (2003), "The Statistical and Economic Role of Jumps in Interest Rates," Journal of Finance, 59, 227–260.

Jorion, P. (2000), Value at Risk: The New Benchmark for Managing Financial Risk, New York: McGraw-Hill.

Morgan, J. P. (1996), RiskMetrics - Technical Document (4th ed.), New York: Morgan Guaranty Trust Company.

Koedijk, K. G., Nissen, F. G., Schotman, P. C., and Wolff, C. C. P. (1997), "The Dynamics of Short-Term Interest Rate Volatility Reconsidered," European Finance Review, 1, 105–130.

Li, H., and Xu, Y. (2000), "Regime Shifts and Short Rate Dynamics," unpublished manuscript, Cornell University.

Lo, A., and MacKinlay, A. C. (1989), "Data-Snooping Biases in Tests of Financial Asset Pricing Models," Review of Financial Studies, 3, 175–208.

Lukacs, E. (1970), Characteristic Functions (2nd ed.), London: Charles Griffin.

Meese, R., and Rogoff, K. (1983), "Empirical Exchange Rate Models of the Seventies: Do They Fit Out-of-Sample?" Journal of International Economics, 14, 3–24.

Nelson, D. (1990), "ARCH Models as Diffusion Approximation," Journal of Econometrics, 45, 7–39.

Nelson, D. (1991), "Conditional Heteroskedasticity in Asset Returns: A New Approach," Econometrica, 59, 347–370.

Rosenblatt, M. (1952), "Remarks on a Multivariate Transformation," The Annals of Mathematical Statistics, 23, 470–472.

Rothschild, M., and Stiglitz, J. (1970), "Increasing Risk: I. A Definition," Journal of Economic Theory, 2, 225–243.

Singleton, K. (2001), "Estimation of Affine Asset Pricing Models Using the Empirical Characteristic Function," Journal of Econometrics, 102, 111–141.

Soderlind, P., and Svensson, L. (1997), "New Techniques to Extract Market Expectations From Financial Instruments," Journal of Monetary Economics, 40, 383–429.

Stanton, R. (1997), "A Nonparametric Model of Term Structure Dynamics and the Market Price of Interest Rate Risk," Journal of Finance, 52, 1973–2002.

Vasicek, O. (1977), "An Equilibrium Characterization of the Term Structure," Journal of Financial Economics, 5, 177–188.

Watson, M. (1993), "Measures of Fit for Calibrated Models," Journal of Political Economy, 101, 1011–1041.

White, H. (2000), "A Reality Check for Data Snooping," Econometrica, 69, 1097–1127.


It might be simply because the drift specifications considered are severely misspecified and do not outperform the zero drift model. In fact, most drift parameter estimates have large standard errors and thus have excess sampling variation in parameter estimation. As a consequence, assuming zero drift may result in a smaller adverse effect on out-of-sample performance.

The generalized residuals $Z_t(\hat{\theta})$ contain much information about possible sources of model misspecification. Instead of being uniform, the histograms of the generalized residuals for all diffusion models (Fig. 2) exhibit a high peak in the center of the distribution, which indicates that the diffusion models cannot satisfactorily capture the marginal density of the spot rate. The separate inference statistics M(m,l) in Table 7 also show that all the models fail to satisfactorily capture the dynamics of the generalized residuals. In particular, the large M(2,2) and M(4,4) statistics indicate that diffusion models perform rather poorly in modeling the conditional variance and kurtosis of the generalized residuals.

Similar to single-factor diffusion models, GARCH models with a zero drift have much better density forecasts than those models with a linear or nonlinear drift. The best GARCH and diffusion models have M1 statistics roughly between 0.7 and 0.8. Comparing the best GARCH and single-factor diffusion models, we find that GARCH provides better density forecasts than CEV, and combining CEV and GARCH further improves the density forecasts. Diagnostic analysis of generalized residuals shows that GARCH models provide certain improvements over diffusion models. For example, the histograms of the generalized residuals of GARCH models have a less-pronounced peak in the center. The M(m,l) statistics show that GARCH models significantly improve the fitting of even-order moments of the generalized residuals; the M(2,2) and M(4,4) statistics are reduced from close to 100 to single digits.

[Figure 2, panels (a)–(h). Histograms of the Generalized Residuals of Four Classes of Spot Rate Models. This figure plots the histograms of the generalized residuals of the single-factor diffusion, GARCH, regime-switching, and jump-diffusion spot rate models.]

In summary, our analysis of single-factor diffusion and GARCH models demonstrates that both classes of models fail to provide good density forecasts for future interest rates. They fail to capture both the high peak in the center of the marginal density and the dynamics of different moments of the generalized residuals. GARCH models have a more uniform marginal density and significantly improve the ability to model the dynamics of even-order moments of the generalized residuals. For both classes of models, modeling the conditional variance through CEV or GARCH is much more important than modeling the conditional mean of the interest rate. Zero-drift models outperform either linear or nonlinear drift models in density forecast, although the M(1,1) statistics suggest that neither model adequately captures the dynamic structure in the conditional mean of the generalized residuals.

We next turn to regime-switching models Regime-switchingmodels generally have better density forecasts than single-factor diffusion and GARCH models the best regime-switchingmodels reduce the M1 statistics of the best diffusion andGARCH models from above 07 to about 05 Although mod-eling the drift does not improve density forecast for diffusionand GARCH models the best regime-switching models in den-sity forecast have a linear drift in each regime As pointed outby Ang and Bekaert (2002) and Li and Xu (2000) a regime-switching model with a linear drift in each regime actually im-plies nonlinear conditional mean dynamics for the interest rateThus our results suggest that perhaps a specific nonlinear driftis needed for out-of-sample density forecasts GARCH is moreimportant than CEV for density forecasts and combining CEVwith GARCH does not further improve model performanceOverall the regime-switching GARCH model with a linear driftin each regime provides the best out-of-sample forecasts amongall of the regime-switching models Diagnostic analysis showsthat regime-switching models provide a much better charac-terization of the marginal density of interest rates the his-tograms of the generalized residuals are much closer to U[01]The M(22) and M(44) statistics of regime-switching modelsare slightly higher than those of GARCH models however andthe M(11) and M(33) statistics of regime-switching modelsare actually higher than those of diffusion and GARCH modelsTherefore the advantages of regime-switching models seem tocome from better modeling of the marginal density rather thanfrom the dynamics of the generalized residuals

The density forecast performance of jump-diffusion models shares many common features with that of regime-switching models. First, they have comparable performance; the M1 statistics of the best jump-diffusion models are also around 0.5. Second, for jump-diffusion models, the linear drift specification outperforms those with no drift or nonlinear drift for density forecasts. Third, for jump-diffusion models, GARCH generally provides better density forecasts than CEV. Fourth, jump-diffusion models have similar advantages over single-factor diffusion and GARCH models. They have much better characterization of the marginal density of the generalized residuals, but they may not necessarily perform better in modeling the dynamics of the generalized residuals, as indicated by their large M(m, l) statistics. The jump model with a linear drift and level effect, although it has the smallest M1 statistic, apparently fails to capture the even-order moments of the generalized residuals, with M(2, 2) and M(4, 4) statistics significantly higher than those of other jump-diffusion models that include GARCH effects.

Table 7. Separate Inference Statistics for Out-of-Sample Density Forecast Performance

Model                              M(1,1)   M(1,2)   M(2,1)   M(2,2)   M(3,3)   M(4,4)

Single-factor diffusion models
  Random walk                       40.08     6.35    40.48    99.73    21.50    80.12
  Log-normal                        46.52     8.77    39.91   110.10    26.98    92.51
  Dothan                            46.04     8.07    42.62   108.90    27.02    91.03
  Pure CEV                          41.62     6.63    41.55    96.81    22.85    80.42
  Vasicek                           40.40     8.13    35.55   100.20    21.43    81.79
  CIR                               43.61     8.68    36.30    98.57    24.39    85.02
  CKLS                              42.02     8.55    35.75    97.89    22.80    82.77
  Nonlinear drift                   42.40     8.37    35.93    98.47    22.77    82.98

GARCH models
  No drift GARCH                    40.73     3.75    28.21     7.68    17.57     2.53
  Linear drift GARCH                41.07     4.66    25.11     7.53    17.00     2.35
  Nonlinear drift GARCH             40.71     4.87    24.70     7.02    16.63     2.09
  No drift CEV-GARCH                41.30     4.10    28.40     7.46    19.24     2.74
  Linear drift CEV-GARCH            41.68     5.11    25.06     7.09    18.73     2.44
  Nonlinear drift CEV-GARCH         41.43     5.45    24.32     6.45    18.54     2.17

Regime-switching models
  No drift RS CEV                   53.16     3.13    11.31     7.05    35.02    10.21
  Linear drift RS CEV               53.25     4.56    11.16     6.57    34.31     8.83
  Nonlinear drift RS CEV            53.75     5.12     9.84     6.56    34.50     8.49
  No drift RS GARCH                 57.29     4.05    21.59    10.95    43.71     7.63
  Linear drift RS GARCH             55.46     4.56    21.09    12.38    40.71     9.14
  Nonlinear drift RS GARCH          55.69     4.50    21.05    12.14    41.13     8.93
  No drift RS CEV-GARCH             45.35     3.69    20.91    10.67    34.25     7.08
  Linear drift RS CEV-GARCH         44.10     4.86    20.75    11.99    32.12     8.87
  Nonlinear drift RS CEV-GARCH      44.36     4.84    20.55    11.40    32.46     8.32

Jump-diffusion models
  No drift JD CEV                   47.38     5.32    16.76    85.10    43.01    99.31
  Linear drift JD CEV               46.32     4.48    19.63    85.09    42.26    99.82
  Nonlinear drift JD CEV            46.50     4.85    18.31    85.10    42.48    99.29
  No drift JD GARCH                 43.32     5.10    20.35    12.38    28.44     8.96
  Linear drift JD GARCH             42.57     4.60    21.48    12.19    28.22     8.84
  Nonlinear drift JD GARCH          42.62     4.67    21.42    12.34    28.24     8.95
  No drift JD CEV-GARCH             43.55     5.05    20.51    13.36    29.31    10.07
  Linear drift JD CEV-GARCH         42.74     4.54    21.69    13.00    28.93     9.76
  Nonlinear drift JD CEV-GARCH      42.78     4.62    21.57    13.07    28.94     9.81

NOTE: This table reports the separate inference statistics M(m, l) for the four classes of spot rate models. The asymptotically normal statistic M(m, l) can be used to test whether the cross-correlation between the mth and lth moments of {Z_t} is significantly different from 0. The choice of (m, l) = (1, 1), (2, 2), (3, 3), and (4, 4) is sensitive to autocorrelations in mean, variance, skewness, and kurtosis of {r_t}. We report results only for a lag truncation order p = 20; the results for p = 10 and 30 are rather similar. The asymptotic critical values are 1.28, 1.65, and 2.33 at the 10%, 5%, and 1% levels.

In summary, our out-of-sample density forecast evaluation reveals the following important features:

1. For out-of-sample density forecasts, the importance of modeling mean reversion in the interest rate is ambiguous. For diffusion and GARCH models, a linear or nonlinear drift gives worse density forecasts than a zero drift. However, when there are regime switches or jumps, linear drift models outperform those with zero or nonlinear drift. The best density forecast models still fail to adequately capture the mean dynamics of the generalized residuals.

2. It is important to capture volatility clustering in interest rates via GARCH for both in-sample and out-of-sample performance. Most of the best-performing models have a GARCH component, which significantly improves their ability to model the dynamics of the even-order moments of the generalized residuals.

3. It is also important to capture excess kurtosis and heavy tails of the interest rate via regime switching or jumps for both in-sample and out-of-sample performance. The advantages of regime-switching and jump models are mainly reflected in modeling the marginal density rather than in the dynamics of the generalized residuals.

Overall, our out-of-sample analysis demonstrates that more-complicated models that incorporate conditional heteroscedasticity and heavy tails of interest rates tend to have better density forecasts. This result is quite different from the results of studies that focus on conditional mean forecasts. As widely documented in the literature, the predictable component in the conditional mean of the interest rate appears insignificant. As a result, random walk models tend to outperform more-sophisticated models in terms of mean forecast (e.g., Duffee 2002). However, a density forecast includes all conditional moments, and as a result, those models that can capture the dynamics of higher-order moments tend to have better density forecasts. Our analysis suggests that the more-complicated spot rate models developed in the literature can indeed capture some important features of interest rates and are relevant in applications involving density forecasts of interest rates.

470 Journal of Business & Economic Statistics, October 2004

6. CONCLUSION

Despite the numerous empirical studies of spot interest rate models, little effort has been devoted to examining their out-of-sample forecast performance. We have contributed to the literature by providing the first (to our knowledge) comprehensive empirical analysis of the out-of-sample performance of a wide variety of popular spot rate models in forecasting the conditional density of future interest rates. Out-of-sample density forecasts help minimize the data snooping bias due to excessive searching for more-complicated models using similar datasets. The conditional density, which completely characterizes the full dynamics of interest rates, is also an essential input to many financial applications. Using a rigorous econometric evaluation procedure for density forecasts, we examined the out-of-sample performance of single-factor diffusion, GARCH, regime-switching, and jump-diffusion models.

Although previous studies have shown that simpler models, such as the random walk model, tend to have more accurate forecasts for the conditional mean of interest rates, we find that models that capture conditional heteroscedasticity and heavy tails of the spot rate have better out-of-sample density forecasts. GARCH significantly improves modeling of the conditional variance and kurtosis of the generalized residuals, whereas regime switching and jumps help in modeling the marginal density of interest rates. Therefore, our analysis shows that the more-sophisticated interest rate models developed in the literature can indeed capture some important features of interest rate data and perform better than simpler models in out-of-sample density forecasts. Although we have focused on density forecasts of the spot rates here, there is evidence that the term structure of interest rates is driven by multiple factors (e.g., Dai and Singleton 2000). Extending our analysis to forecast the joint density of multiple yields is an interesting issue that we will address in future research.

ACKNOWLEDGMENTS

The authors are grateful for the helpful comments of Canlin Li, Daniel Morillo, Larry Pohlman, and seminar participants at PanAgora Asset Management, the 2002 Econometric Society European Summer Meeting, and the 2003 Econometric Society North American Winter Meeting. They also thank the associate editor and two anonymous referees, whose comments and suggestions have greatly improved the article, and David George for editorial assistance. Hong's work was supported by National Science Foundation grant SES-0111769. Any errors are the authors' own.

APPENDIX: GENERALIZED SPECTRAL EVALUATION FOR DENSITY FORECASTS

Here we describe Hong's (2003) omnibus evaluation procedure for out-of-sample density forecasts, as well as a class of separate inference procedures. (For more discussion and formal treatment, see Hong 2003.)

A.1 Generalized Spectrum

Generalized spectrum was proposed by Hong (1999) as an analytic tool to provide an alternative to the power spectrum and the higher-order spectrum (e.g., bispectrum) in time series analysis. Define the centered generalized residual

$$Z_t \equiv Z_t(\theta) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \theta)\,dr - \frac{1}{2}, \qquad t = R + 1, \ldots, T,$$

where $p(r, t \mid I_{t-1}, \theta)$ is a conditional density model for time series $\{r_t\}$. Suppose that the series $\{Z_t\}_{t=-\infty}^{\infty}$ has marginal characteristic function $\varphi(u) \equiv E(e^{iuZ_t})$ and pairwise joint characteristic function $\varphi_j(u, v) \equiv E[\exp\{i(uZ_t + vZ_{t-|j|})\}]$, where $i \equiv \sqrt{-1}$, $u, v \in (-\infty, \infty)$, and $j = 0, \pm 1, \ldots$. The basic idea of generalized spectrum is to transform the series $\{Z_t\}$ and then consider the spectrum of the transformed series. Define the generalized covariance function

$$\sigma_j(u, v) \equiv \operatorname{cov}(e^{iuZ_t}, e^{ivZ_{t-|j|}}), \qquad j = 0, \pm 1, \ldots.$$

Straightforward algebra yields $\sigma_j(u, v) = \varphi_j(u, v) - \varphi(u)\varphi(v)$. Because $\varphi_j(u, v) = \varphi(u)\varphi(v)$ for all $(u, v)$ if and only if $Z_t$ and $Z_{t-|j|}$ are independent (cf. Lukacs 1970), $\sigma_j(u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ over various lags, including those with zero autocorrelation in $\{Z_t\}$.
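The claim that the generalized covariance detects dependence even when the autocorrelation is zero can be illustrated numerically. The following is a minimal sketch, not the procedure used in the article: the function names `generalized_covariance` and `autocorrelation`, the simulated series, and the evaluation point (u, v) = (1, 1) are all illustrative assumptions. The series is a product of neighbouring Gaussian shocks, which is serially uncorrelated but not independent.

```python
import numpy as np

def generalized_covariance(z, j, u, v):
    """Sample analogue of sigma_j(u, v) = phi_j(u, v) - phi(u) * phi(v)."""
    z = np.asarray(z, dtype=float)
    j = abs(j)
    lead = z[j:]                  # Z_t
    lag = z[:len(z) - j]          # Z_{t-|j|}
    phi_joint = np.mean(np.exp(1j * (u * lead + v * lag)))
    phi_u = np.mean(np.exp(1j * u * lead))
    phi_v = np.mean(np.exp(1j * v * lag))
    return phi_joint - phi_u * phi_v

def autocorrelation(z, j):
    """Ordinary sample autocorrelation at lag j >= 1."""
    z = np.asarray(z, dtype=float)
    zc = z - z.mean()
    return float(np.sum(zc[j:] * zc[:len(z) - j]) / np.sum(zc * zc))

# Hypothetical example: z_t = e_t * e_{t-1} has zero autocorrelation,
# yet z_t and z_{t-1} share the shock e_{t-1} and so are dependent.
rng = np.random.default_rng(0)
e = rng.standard_normal(20001)
z = e[1:] * e[:-1]

acf1 = autocorrelation(z, 1)
sig1 = generalized_covariance(z, 1, 1.0, 1.0)
print("lag-1 autocorrelation:", acf1)
print("|sigma_1(1, 1)|:", abs(sig1))
```

In this sketch the lag-1 autocorrelation is close to 0 while the modulus of the generalized covariance is clearly away from 0, which is the kind of dependence the power spectrum would miss.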

Under certain regularity conditions on the temporal dependence of $\{Z_t\}$, the Fourier transform of the generalized covariance function $\sigma_j(u, v)$ exists and is given by

$$f(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi].$$

Like $\sigma_j(u, v)$, $f(\omega, u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ across various lags. An advantage of spectral analysis is that $f(\omega, u, v)$ incorporates all lags simultaneously. Thus it can capture the dependent processes in which serial dependence occurs only at higher-order lags. This can arise from seasonality (e.g., calendar effects) or time delay in financial markets. Moreover, $f(\omega, u, v)$ is particularly powerful in capturing the dependent alternatives whose serial dependence decays to 0 slowly as $j \to \infty$, such as slow mean-reverting or persistent volatility clustering processes. For such processes, $f(\omega, u, v)$ has a sharp peak at frequency 0.

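The function $f(\omega, u, v)$ does not require any moment condition on $\{Z_t\}$. When the moments of $\{Z_t\}$ exist, however, we can differentiate $f(\omega, u, v)$ with respect to $(u, v)$ at $(0, 0)$:

$$f^{(0,m,l)}(\omega, 0, 0) \equiv \left.\frac{\partial^{m+l}}{\partial u^m\, \partial v^l} f(\omega, u, v)\right|_{(u,v)=(0,0)} = \frac{i^{m+l}}{2\pi} \sum_{j=-\infty}^{\infty} \operatorname{cov}\bigl(Z_t^m, Z_{t-|j|}^l\bigr)\, e^{-ij\omega}.$$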

In particular, $f^{(0,1,1)}(\omega, 0, 0)$ is (up to the sign factor $i^2 = -1$) the well-known power spectral density. (For applications of the power spectral density in economics and finance, see, e.g., Granger 1969; Durlauf 1991; Watson 1993.) For this reason, $f(\omega, u, v)$ is called the "generalized spectral density" of $\{Z_t\}$. The parameters $(u, v)$ provide much flexibility in capturing linear and nonlinear dependence. Because $f(\omega, u, v)$ can be decomposed as a weighted sum of $f^{(0,m,l)}(\omega, 0, 0)$ over various $(m, l)$ via a Taylor series expansion around $(0, 0)$, it contains information on all autocorrelations and cross-correlations in the various power terms of $\{Z_t\}$. Thus it can be used to develop an omnibus procedure against a wide range of dependent processes.
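The Taylor-series decomposition mentioned here can be written out explicitly. The following is a formal sketch rather than a formula quoted from Hong (1999): it assumes that all moments of $Z_t$ exist (which holds for a generalized residual, because $Z_t$ is bounded between $-\frac{1}{2}$ and $\frac{1}{2}$) and that summation over lags and expansion in $(u, v)$ can be interchanged.

```latex
% Expand e^{iuZ_t} = \sum_{m \ge 0} (iu)^m Z_t^m / m! inside the covariance;
% terms with m = 0 or l = 0 vanish because a constant has zero covariance.
\begin{aligned}
\sigma_j(u, v)
  &= \sum_{m=1}^{\infty} \sum_{l=1}^{\infty}
     \frac{(iu)^m (iv)^l}{m!\, l!}\,
     \operatorname{cov}\bigl(Z_t^m, Z_{t-|j|}^l\bigr), \\
f(\omega, u, v)
  &= \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j(u, v)\, e^{-ij\omega}
   = \sum_{m=1}^{\infty} \sum_{l=1}^{\infty}
     \frac{u^m v^l}{m!\, l!}\, f^{(0,m,l)}(\omega, 0, 0).
\end{aligned}
```

Each pair $(m, l)$ therefore contributes the cross-spectrum of $Z_t^m$ and $Z_{t-|j|}^l$, with weight $u^m v^l / (m!\, l!)$.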

A.2 Omnibus Evaluation for Density Forecast

To evaluate out-of-sample density forecasts of model $p(r, t \mid I_{t-1}, \theta)$, Hong (2003) used the generalized spectrum to develop an omnibus test of iid $U[-\frac{1}{2}, \frac{1}{2}]$ for the centered generalized residual $\{Z_t\}$. To test $H_0\colon \{Z_t\} \sim \text{iid } U[-\frac{1}{2}, \frac{1}{2}]$, we use the modified generalized covariance

$$\sigma_j^U(u, v) \equiv \varphi_j(u, v) - \varphi_U(u)\varphi_U(v),$$

where $\varphi_U(u) \equiv \sin(u/2)/(u/2)$ is the characteristic function of a $U[-\frac{1}{2}, \frac{1}{2}]$ random variable. Because $\sigma_j^U(u, v)$ has incorporated the information of $U[-\frac{1}{2}, \frac{1}{2}]$, it can capture any pairwise serial dependence in $\{Z_t\}$ and any deviation from $U[-\frac{1}{2}, \frac{1}{2}]$. The associated generalized spectrum is

$$f^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j^U(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi],\ u, v \in (-\infty, \infty).$$

Under $H_0$, we have $\sigma_j^U(u, v) = 0$ for all $j \neq 0$, and so $f^U(\omega, u, v)$ becomes a known "flat" spectrum,

$$f_0^U(\omega, u, v) \equiv \frac{1}{2\pi} \sigma_0^U(u, v) = \frac{1}{2\pi} [\varphi_U(u + v) - \varphi_U(u)\varphi_U(v)], \qquad \omega \in [-\pi, \pi].$$

When $Z_t$ and $Z_{t-j}$ are not independent at some lag $j \neq 0$, or when $Z_t$ is not $U[-\frac{1}{2}, \frac{1}{2}]$, we have $f^U(\omega, u, v) \neq f_0^U(\omega, u, v)$. Hence, to test $H_0$, one can check whether a consistent estimator of $f^U(\omega, u, v)$ is significantly different from $f_0^U(\omega, u, v)$.

Suppose that $\hat\theta$ is an estimator for model parameter $\theta$ based on the estimation sample $\{r_t\}_{t=1}^{R}$. To estimate $f^U(\omega, u, v)$ using the prediction sample $\{r_t\}_{t=R+1}^{N}$, we have to use the estimated proxies $\{\hat Z_t\}_{t=R+1}^{N}$, where

$$\hat Z_t \equiv Z_t(\hat\theta) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \hat\theta)\,dr - \frac{1}{2}, \qquad t = R + 1, \ldots, N.$$
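The construction of the estimated proxies can be sketched in a few lines. This is an illustration under stated assumptions, not one of the spot rate models studied in the article: the forecast model is a hypothetical Gaussian AR(1), fitted by least squares on the first R observations, and the helper names (`fit_gaussian_ar1`, `normal_cdf`, `centered_generalized_residuals`) and the simulated data are invented for the example.

```python
import math
import numpy as np

def fit_gaussian_ar1(r):
    """Least-squares fit of r_t = a + b * r_{t-1} + s * eps_t (stand-in model)."""
    x, y = r[:-1], r[1:]
    X = np.column_stack([np.ones_like(x), x])
    (a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - (a + b * x)
    s = float(np.sqrt(np.mean(resid ** 2)))
    return float(a), float(b), s

def normal_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def centered_generalized_residuals(r, R):
    """Z_hat_t = F(r_t | I_{t-1}, theta_hat) - 1/2 over the prediction sample."""
    r = np.asarray(r, dtype=float)
    a, b, s = fit_gaussian_ar1(r[:R])      # theta_hat uses the estimation sample only
    z_hat = []
    for t in range(R, len(r)):             # 0-based index t is observation t + 1
        cond_mean = a + b * r[t - 1]
        z_hat.append(normal_cdf((r[t] - cond_mean) / s) - 0.5)
    return np.array(z_hat)

# Simulated data from the same (hypothetical) model, so H0 holds approximately.
rng = np.random.default_rng(1)
N, R = 3000, 2000
r = np.empty(N)
r[0] = 0.5
for t in range(1, N):
    r[t] = 0.05 + 0.9 * r[t - 1] + 0.1 * rng.standard_normal()

z_hat = centered_generalized_residuals(r, R)
print(len(z_hat), z_hat.mean(), z_hat.var())
```

When the forecast model is correctly specified, as in this simulation, the proxies should look like draws from $U[-\frac{1}{2}, \frac{1}{2}]$, with mean near 0 and variance near 1/12.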

Define the empirical measure

$$\hat\sigma_j^U(u, v) \equiv \hat\varphi_j(u, v) - \varphi_U(u)\varphi_U(v), \qquad j = 0, \pm 1, \ldots, \pm(n - 1),$$

where $n \equiv N - R$ is the size of the prediction sample and the empirical pairwise joint characteristic function is

$$\hat\varphi_j(u, v) \equiv (n - |j|)^{-1} \sum_{t=R+|j|+1}^{N} \exp\bigl\{i\bigl(u\hat Z_t + v\hat Z_{t-|j|}\bigr)\bigr\}.$$

We introduce a class of smoothed kernel estimators

$$\hat f^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=1-n}^{n-1} (1 - |j|/n)^{1/2}\, k(j/p)\, \hat\sigma_j^U(u, v)\, e^{-ij\omega},$$

where $k(\cdot)$ is a kernel and $p \equiv p(n)$ is a bandwidth/lag order. An example of $k(\cdot)$ is the Bartlett kernel,

$$k(z) = \begin{cases} 1 - |z| & \text{if } |z| \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

For the choice of $p$, Hong (1999) suggested a data-driven method that delivers an asymptotically optimal bandwidth in terms of an integrated mean squared error criterion. This still involves a preliminary bandwidth $\bar p$, but the impact of choosing $\bar p$ is much smaller than the impact of choosing lag order $p$.
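The kernel estimator above can be sketched directly from its definition. This is a hedged illustration, not the authors' implementation: the function names (`bartlett`, `phi_U`, `sigma_U_hat`, `f_U_hat`, `f_U_flat`) are hypothetical, the lag order p is fixed by hand rather than chosen by the data-driven method, and the input is a simulated iid uniform series instead of residuals from a fitted spot rate model.

```python
import numpy as np

def bartlett(z):
    """Bartlett kernel: 1 - |z| on [-1, 1], 0 elsewhere."""
    z = abs(float(z))
    return 1.0 - z if z <= 1.0 else 0.0

def phi_U(u):
    """Characteristic function of U[-1/2, 1/2]: sin(u/2) / (u/2)."""
    # np.sinc(x) = sin(pi x) / (pi x), so x = u / (2 pi) gives sin(u/2) / (u/2).
    return float(np.sinc(u / (2.0 * np.pi)))

def sigma_U_hat(z, j, u, v):
    """Empirical modified generalized covariance at lag j."""
    z = np.asarray(z, dtype=float)
    j = abs(j)
    phi_j = np.mean(np.exp(1j * (u * z[j:] + v * z[:len(z) - j])))
    return phi_j - phi_U(u) * phi_U(v)

def f_U_hat(z, omega, u, v, p):
    """Smoothed kernel estimator of the generalized spectrum f^U."""
    n = len(z)
    total = 0j
    for j in range(1 - n, n):
        k = bartlett(j / p)
        if k == 0.0:
            continue                        # lags with |j| >= p get zero weight
        weight = np.sqrt(1.0 - abs(j) / n) * k
        total += weight * sigma_U_hat(z, j, u, v) * np.exp(-1j * j * omega)
    return total / (2.0 * np.pi)

def f_U_flat(u, v):
    """Known flat spectrum under H0."""
    return (phi_U(u + v) - phi_U(u) * phi_U(v)) / (2.0 * np.pi)

# iid U[-1/2, 1/2] draws stand in for residuals of a correctly specified model.
rng = np.random.default_rng(2)
z = rng.uniform(-0.5, 0.5, size=2000)
est = f_U_hat(z, 0.0, 1.0, 1.0, p=10)
print(est, f_U_flat(1.0, 1.0))
```

For this iid uniform input the estimate at frequency 0 should sit close to the flat spectrum; a dependent or non-uniform series would pull the two apart.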

A convenient divergence measure for $\hat f^U(\omega, u, v)$ and $f_0^U(\omega, u, v)$ is the quadratic form

$$\begin{aligned} 2\pi n\, Q(\hat f^U, f_0^U) &\equiv 2\pi n \int\!\!\int\!\!\int_{-\pi}^{\pi} \bigl|\hat f^U(\omega, u, v) - f_0^U(\omega, u, v)\bigr|^2\, d\omega\, dW(u)\, dW(v) \\ &= n \int\!\!\int \bigl|\hat\varphi_0(u, v) - \varphi_U(u + v)\bigr|^2\, dW(u)\, dW(v) \\ &\quad + 2 \sum_{j=1}^{n-1} k^2(j/p)(n - j) \int\!\!\int \bigl|\hat\sigma_j^U(u, v)\bigr|^2\, dW(u)\, dW(v), \end{aligned}$$

where the weighting function $W(\cdot)$ is positive and nondecreasing, and the unspecified integrals are taken over the support of $W(\cdot)$. An example of this is

$$W(u) = \Phi(\sqrt{12}\, u),$$

where $\Phi(\cdot)$ is the N(0, 1) cdf, which is commonly used in the characteristic function literature. The scale $\sqrt{12}$ matches the standard deviation of $U[-\frac{1}{2}, \frac{1}{2}]$, namely $1/\sqrt{12}$, but this is not necessary.

Our evaluation statistic for $H_0$ is a properly centered and scaled version of the quadratic form $Q(\hat f^U, f_0^U)$:

$$M_1 \equiv \left[\sum_{j=1}^{n-1} k^2(j/p)\right]^{-1} \int\!\!\int \sum_{j=1}^{n-1} k^2(j/p)(n - j) \bigl|\hat\sigma_j^U(u, v)\bigr|^2\, dW(u)\, dW(v) - \left[\int [1 - \varphi_U(u)^2]\, dW(u)\right]^2.$$

The asymptotic distribution of $M_1$ was derived by Hong (2003). Under $H_0$ and a set of regularity conditions,

$$M_1 \xrightarrow{d} \int\!\!\int |G(u, v)|^2\, dW(u)\, dW(v)$$

as $n \to \infty$, $R \to \infty$, and $n^{1+\delta}/R \to 0$ for any arbitrarily small constant $\delta > 0$, where $G(u, v)$ is a complex-valued Gaussian process with mean 0 and a known covariance function,

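$$\begin{aligned} \operatorname{cov}[G(u_1, v_1), G(u_2, v_2)] &= E[G(u_1, v_1)\, G(u_2, v_2)^*] \\ &= \sigma_0^U(u_1, -u_2)\varphi_U(v_1)\varphi_U(v_2) + \sigma_0^U(u_1, -v_2)\varphi_U(u_2)\varphi_U(v_1) \\ &\quad + \sigma_0^U(v_1, -u_2)\varphi_U(u_1)\varphi_U(v_2) \\ &\quad + \sigma_0^U(v_1, -v_2)\varphi_U(u_1)\varphi_U(u_2). \end{aligned}$$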


The limit distribution of M1 is nonstandard. However, because the Gaussian process $G(u, v)$ has mean 0 and a known distribution-free covariance function, the limit distribution of M1 can be easily tabulated. When $W(u) = \Phi(\sqrt{12}\, u)$ with a truncated support on $[-1, 1]$, the asymptotic critical values at the 10%, 5%, and 1% levels are 0.37, 0.51, and 0.87.

The most attractive feature of M1 is its omnibus property; it has power against a wide range of suboptimal density forecasts, thanks to the use of the characteristic function in a spectral framework. The M1 statistic can be viewed as an omnibus metric measuring the departure of the forecast model from the true data-generating process. A better density forecast model is expected to have a smaller value for M1, because its generalized residual series $\{Z_t\}$ is closer to having the iid U[0, 1] property. Thus M1 can be used to rank competing density forecast models in terms of their deviations from optimality. Moreover, the kernel function provides flexible weighting for various lags. Nonuniform weighting kernels usually discount higher-order lags. For most, if not all, economic and financial time series, today's behaviors are more affected by recent events than by remote events. Thus downward weighting for higher-order lags is expected to yield good power in finite samples.

A.3 Separate Inferences

When a model is rejected using M1, it is interesting to explore possible reasons for the rejection. Hong (2003) provided a class of rigorous separate inference statistics that exploit the information contained in modeling generalized residuals $\{Z_t\}$ to understand possible reasons for model rejection. The M(m, l) statistic is based on a kernel estimator of the generalized spectral density derivative $f^{(0,m,l)}(\omega, 0, 0)$, where $m$ and $l$ are the orders of the derivatives with respect to $u$ and $v$. The statistic M(m, l) is defined as

$$M(m, l) = \left[\sum_{j=1}^{n-1} (n - j)\, k^2(j/p)\, \hat\rho_j^2(m, l) - \sum_{j=1}^{n-1} k^2(j/p)\right] \Bigg/ \left[2 \sum_{j=1}^{n-2} k^4(j/p)\right]^{1/2},$$

where $\hat\rho_j(m, l)$ is the sample cross-correlation between $\hat Z_t^m$ and $\hat Z_{t-|j|}^l$, $t = R + 1, \ldots, N$. Under $H_0$ and a set of regularity conditions,

$$M(m, l) \xrightarrow{d} N(0, 1)$$

for any given $(m, l)$ as $n, R \to \infty$. The upper-tailed N(0, 1) critical values at the 10%, 5%, and 1% levels are 1.28, 1.65, and 2.33. As in the M1 test, Hong (2003) considered a data-driven method for $p$ that also involves the choice of a preliminary lag order $\bar p$.

Although the moments of the generalized residuals $\{Z_t\}$ are not exactly the same as those of the original data $\{r_t\}$, they are highly correlated. In particular, the choice of (m, l) = (1, 1), (2, 2), (3, 3), and (4, 4) is very sensitive to autocorrelations in level, volatility, skewness, and kurtosis of $\{r_t\}$. Furthermore, the choices of (m, l) = (1, 2) and (2, 1) are sensitive to the "ARCH-in-mean" effect and the "leverage" effect. Different choices of order (m, l) can thus allow examination of various dynamic aspects of the underlying process.
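To make the separate inference statistic concrete, the sketch below computes M(m, l) with the Bartlett kernel. It is an illustration under assumptions, not the authors' code: the function names are hypothetical, the lag order p is fixed at 20 rather than chosen by the data-driven method, the cross-correlation is demeaned within each overlapping subsample (Hong 2003 may normalize differently), and the two input series are simulated.

```python
import math
import numpy as np

def bartlett(z):
    """Bartlett kernel: 1 - |z| on [-1, 1], 0 elsewhere."""
    z = abs(float(z))
    return 1.0 - z if z <= 1.0 else 0.0

def cross_correlation(z, j, m, l):
    """Sample cross-correlation between Z_t^m and Z_{t-j}^l for lag j >= 1."""
    a = z[j:] ** m
    b = z[:len(z) - j] ** l
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))

def separate_inference_stat(z, m, l, p):
    """M(m, l) with a Bartlett kernel; assumes 1 < p < n - 1."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    if not 1 < p < n - 1:
        raise ValueError("lag order p must satisfy 1 < p < n - 1")
    weighted, sum_k2, sum_k4 = 0.0, 0.0, 0.0
    for j in range(1, n - 1):
        k = bartlett(j / p)
        if k == 0.0:
            break                           # all remaining lags have zero weight
        rho = cross_correlation(z, j, m, l)
        weighted += (n - j) * k ** 2 * rho ** 2
        sum_k2 += k ** 2
        sum_k4 += k ** 4
    return (weighted - sum_k2) / math.sqrt(2.0 * sum_k4)

rng = np.random.default_rng(3)
n = 1000

# Case 1: iid U[-1/2, 1/2], as implied by a correctly specified model.
z_iid = rng.uniform(-0.5, 0.5, size=n)

# Case 2: uniform marginal but persistent dynamics, Z_t = Phi(x_t) - 1/2,
# where x_t is a Gaussian AR(1) with unit variance and coefficient 0.9.
x = np.empty(n)
x[0] = rng.standard_normal()
for t in range(1, n):
    x[t] = 0.9 * x[t - 1] + math.sqrt(1.0 - 0.81) * rng.standard_normal()
z_dep = np.array([0.5 * math.erf(v / math.sqrt(2.0)) for v in x])

m11_iid = separate_inference_stat(z_iid, 1, 1, p=20)
m11_dep = separate_inference_stat(z_dep, 1, 1, p=20)
print("M(1,1), iid residuals:       ", m11_iid)
print("M(1,1), persistent residuals:", m11_dep)
```

The persistent series produces an M(1, 1) value far above the 1% critical value of 2.33, pointing to neglected dynamics in the level, while the marginal distribution alone would look acceptable.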

[Received June 2002. Revised November 2003.]

REFERENCES

Ait-Sahalia, Y. (1996), "Testing Continuous-Time Models of the Spot Interest Rate," Review of Financial Studies, 9, 385-426.

Ait-Sahalia, Y. (1999), "Transition Densities for Interest Rate and Other Nonlinear Diffusions," Journal of Finance, 54, 1361-1395.

Ait-Sahalia, Y. (2002), "Maximum-Likelihood Estimation of Discretely Sampled Diffusions: A Closed-Form Approach," Econometrica, 70, 223-262.

Ait-Sahalia, Y., and Lo, A. (1998), "Nonparametric Estimation of State-Price Densities Implicit in Financial Asset Prices," Journal of Finance, 53, 499-547.

Andersen, T. G., and Lund, J. (1997), "Estimating Continuous Time Stochastic Volatility Models of the Short-Term Interest Rate," Journal of Econometrics, 77, 343-378.

Ang, A., and Bekaert, G. (2002), "Regime Switches in Interest Rates," Journal of Business & Economic Statistics, 20, 163-182.

Bali, T. (2000), "Modeling the Conditional Mean and Variance of the Short Rate Using Diffusion, GARCH, and Moving Average Models," Journal of Futures Markets, 20, 717-751.

Bali, T. (2001), "Nonlinear Parametric Models of the Short-Term Interest Rate," unpublished manuscript, Baruch College.

Ball, C., and Torous, W. (1998), "Regime Shifts in Short-Term Riskless Interest Rates," unpublished manuscript, University of California, Los Angeles.

Bera, A. K., and Ghosh, A. (2002), "Density Forecast Evaluation With Applications in Finance," unpublished manuscript, University of Illinois at Champaign, Dept. of Economics.

Berkowitz, J. (2001), "Testing the Accuracy of Density Forecasts With Applications to Risk Management," Journal of Business & Economic Statistics, 19, 465-474.

Bliss, R., and Smith, D. (1998), "The Elasticity of Interest Rate Volatility: Chan, Karolyi, Longstaff, and Sanders Revisited," unpublished manuscript, Federal Reserve Bank of Atlanta.

Brenner, R., Harjes, R., and Kroner, K. (1996), "Another Look at Alternative Models of Short-Term Interest Rate," Journal of Financial and Quantitative Analysis, 31, 85-107.

Chacko, G., and Viceira, L. M. (2003), "Spectral GMM Estimation of Continuous-Time Processes," Journal of Econometrics, 116, 259-292.

Chan, K. C., Karolyi, G. A., Longstaff, F. A., and Sanders, A. B. (1992), "An Empirical Comparison of Alternative Models of the Short-Term Interest Rate," Journal of Finance, 47, 1209-1228.

Chapman, D., and Pearson, N. (2001), "Recent Advances in Estimating Models of the Term-Structure," Financial Analysts Journal, 57, 77-95.

Christoffersen, P., and Diebold, F. X. (1996), "Further Results on Forecasting and Model Selection Under Asymmetric Loss," Journal of Applied Econometrics, 11, 561-572.

Christoffersen, P., and Diebold, F. X. (1997), "Optimal Prediction Under Asymmetric Loss," Econometric Theory, 13, 808-817.

Clements, M. P., and Smith, J. (2000), "Evaluating the Forecast Densities of Linear and Non-Linear Models: Applications to Output Growth and Unemployment," Journal of Forecasting, 19, 255-276.

Conley, T. G., Hansen, L. P., Luttmer, E. G. J., and Scheinkman, J. A. (1997), "Short-Term Interest Rates as Subordinated Diffusions," Review of Financial Studies, 10, 525-578.

Corradi, V. (2000), "Reconsidering the Continuous-Time Limit of the GARCH(1,1) Process," Journal of Econometrics, 96, 145-153.

Cox, J. C., Ingersoll, J. E., and Ross, S. A. (1985), "A Theory of the Term Structure of Interest Rates," Econometrica, 53, 385-407.

Dai, Q., and Singleton, K. (2000), "Specification Analysis of Affine Term Structure Models," Journal of Finance, 55, 1943-1978.

Das, S. R. (2002), "The Surprise Element: Jumps in Interest Rates," Journal of Econometrics, 106, 27-65.

Diebold, F. X., Gunther, T. A., and Tay, A. S. (1998), "Evaluating Density Forecasts With Applications to Financial Risk Management," International Economic Review, 39, 863-883.

Diebold, F. X., Hahn, J., and Tay, A. S. (1999), "Multivariate Density Forecast Evaluation and Calibration in Financial Risk Management: High-Frequency Returns of Foreign Exchange," Review of Economics and Statistics, 81, 661-673.

Diebold, F. X., Tay, A. S., and Wallis, K. F. (1999), "Evaluating Density Forecasts of Inflation: The Survey of Professional Forecasters," in Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W. J. Granger, eds. R. F. Engle and H. White, Oxford, U.K.: Oxford University Press, pp. 76-90.

Dothan, L. U. (1978), "On the Term Structure of Interest Rates," Journal of Financial Economics, 6, 59-69.

Duffee, G. (2002), "Term Premia and Interest Rate Forecasts in Affine Models," Journal of Finance, 57, 405-443.

Duffie, D., and Pan, J. (1997), "An Overview of Value at Risk," Journal of Derivatives, 4, 13-32.

Durham, G. (2003), "Likelihood-Based Specification Analysis of Continuous-Time Models of the Short-Term Interest Rate," Journal of Financial Economics, 70, 463-487.

Durlauf, S. (1991), "Spectral Based Testing of the Martingale Hypothesis," Journal of Econometrics, 50, 355-376.

Fackler, P. L., and King, R. P. (1990), "Calibration of Option-Based Probability Assessments in Agricultural Commodity Markets," American Journal of Agricultural Economics, 72, 73-83.

Gallant, A. R., and Tauchen, G. (1998), "Reprojection Partially Observed Systems With Applications to Interest Rate Diffusions," Journal of the American Statistical Association, 93, 10-24.

Ghysels, E., Carrasco, M., Chernov, M., and Florens, J. (2001), "Estimating Diffusions With a Continuum of Moment Conditions," unpublished manuscript, University of North Carolina.

Granger, C. (1969), "Testing for Causality and Feedback," Econometrica, 37, 424-438.

Granger, C. (1999), "The Evaluation of Econometric Models and of Forecasts," invited lecture at the Far Eastern Meeting of the Econometric Society, Singapore.

Granger, C., and Pesaran, M. H. (2000), "A Decision Theoretic Approach to Forecasting Evaluation," in Statistics and Finance: An Interface, eds. W. S. Chan, W. K. Li, and H. Tong, London: Imperial College, pp. 261-278.

Gray, S. (1996), "Modeling the Conditional Distribution of Interest Rates as a Regime-Switching Process," Journal of Financial Economics, 42, 27-62.

Hamilton, J. D. (1989), "A New Approach to the Econometric Analysis of Nonstationary Time Series and the Business Cycle," Econometrica, 57, 357-384.

Hong, Y. (1999), "Hypothesis Testing in Time Series via the Empirical Characteristic Function: A Generalized Spectral Density Approach," Journal of the American Statistical Association, 94, 1201-1220.

Hong, Y. (2003), "Evaluation of Out-of-Sample Density Forecasts With Applications to Stock Prices," unpublished manuscript, Cornell University, Dept. of Economics and Dept. of Statistical Science.

Hong, Y., and Lee, T. H. (2003a), "Inference on the Predictability of Foreign Exchange Rate Changes via Generalized Spectrum and Nonlinear Models," Review of Economics and Statistics, 84, 1048-1062.

Hong, Y., and Lee, T. H. (2003b), "Diagnostic Checking for Nonlinear Time Series Models," Econometric Theory, 19, 1065-1121.

Hong, Y., and Li, H. (2004), "Nonparametric Specification Testing for Continuous-Time Models With Applications to Interest Rate Term Structures," Review of Financial Studies, forthcoming.

Jackwerth, J. C., and Rubinstein, M. (1996), "Recovering Probability Distributions From Option Prices," Journal of Finance, 51, 1611-1631.

Jiang, G., and Knight, J. (1997), "A Nonparametric Approach to the Estimation of Diffusion Processes With an Application to a Short-Term Interest Rate Model," Econometric Theory, 13, 615-645.

Johannes, M. (2003), "The Statistical and Economic Role of Jumps in Interest Rates," Journal of Finance, 59, 227-260.

Jorion, P. (2000), Value at Risk: The New Benchmark for Managing Financial Risk, New York: McGraw-Hill.

Morgan, J. P. (1996), RiskMetrics: Technical Document (4th ed.), New York: Morgan Guaranty Trust Company.

Koedijk, K. G., Nissen, F. G., Schotman, P. C., and Wolff, C. C. P. (1997), "The Dynamics of Short-Term Interest Rate Volatility Reconsidered," European Finance Review, 1, 105-130.

Li, H., and Xu, Y. (2000), "Regime Shifts and Short Rate Dynamics," unpublished manuscript, Cornell University.

Lo, A., and MacKinlay, A. C. (1989), "Data-Snooping Biases in Tests of Financial Asset Pricing Models," Review of Financial Studies, 3, 175-208.

Lukacs, E. (1970), Characteristic Functions (2nd ed.), London: Charles Griffin.

Meese, R., and Rogoff, K. (1983), "Empirical Exchange Rate Models of the Seventies: Do They Fit Out-of-Sample?" Journal of International Economics, 14, 3-24.

Nelson, D. (1990), "ARCH Models as Diffusion Approximation," Journal of Econometrics, 45, 7-39.

Nelson, D. (1991), "Conditional Heteroskedasticity in Asset Returns: A New Approach," Econometrica, 59, 347-370.

Rosenblatt, M. (1952), "Remarks on a Multivariate Transformation," The Annals of Mathematical Statistics, 23, 470-472.

Rothschild, M., and Stiglitz, J. (1970), "Increasing Risk: I. A Definition," Journal of Economic Theory, 2, 225-243.

Singleton, K. (2001), "Estimation of Affine Asset Pricing Models Using the Empirical Characteristic Function," Journal of Econometrics, 102, 111-141.

Soderlind, P., and Svensson, L. (1997), "New Techniques to Extract Market Expectations From Financial Instruments," Journal of Monetary Economics, 40, 383-429.

Stanton, R. (1997), "A Nonparametric Model of Term Structure Dynamics and the Market Price of Interest Rate Risk," Journal of Finance, 52, 1973-2002.

Vasicek, O. (1977), "An Equilibrium Characterization of the Term Structure," Journal of Financial Economics, 5, 177-188.

Watson, M. (1993), "Measures of Fit for Calibrated Models," Journal of Political Economy, 101, 1011-1041.

White, H. (2000), "A Reality Check for Data Snooping," Econometrica, 69, 1097-1127.

Hong Li and Zhao Out-of-Sample Analysis of Spot Rate Models 469

Table 7 Separate Inference Statistics for Out-of-Sample Density Forecast Performance

Model M(1 1) M(1 2) M(2 1) M(2 2) M(3 3) M(4 4)

Single-factordiffusionmodels

Random walk 4008 635 4048 9973 2150 8012Log-normal 4652 877 3991 11010 2698 9251Dothan 4604 807 4262 10890 2702 9103Pure CEV 4162 663 4155 9681 2285 8042Vasicek 4040 813 3555 10020 2143 8179CIR 4361 868 3630 9857 2439 8502CKLS 4202 855 3575 9789 2280 8277Nonlinear drift 4240 837 3593 9847 2277 8298

GARCHmodels

No drift GARCH 4073 375 2821 768 1757 253Linear drift GARCH 4107 466 2511 753 1700 235Nonlinear drift GARCH 4071 487 2470 702 1663 209No drift CEVndashGARCH 4130 410 2840 746 1924 274Linear drift CEVndashGARCH 4168 511 2506 709 1873 244Nonlinear drift CEVndashGARCH 4143 545 2432 645 1854 217

Regime-switchingmodels

No drift RS CEV 5316 313 1131 705 3502 1021Linear drift RS CEV 5325 456 1116 657 3431 883Nonlinear drift RS CEV 5375 512 984 656 3450 849No drift RS GARCH 5729 405 2159 1095 4371 763Linear drift RS GARCH 5546 456 2109 1238 4071 914Nonlinear drift RS GARCH 5569 450 2105 1214 4113 893No drift RS CEVndashGARCH 4535 369 2091 1067 3425 708Linear drift RS CEVndashGARCH 4410 486 2075 1199 3212 887Nonlinear drift RS CEVndashGARCH 4436 484 2055 1140 3246 832

Jump-diffusionmodels

No drift JD CEV 4738 532 1676 8510 4301 9931Linear drift JD CEV 4632 448 1963 8509 4226 9982Nonlinear drift JD CEV 4650 485 1831 8510 4248 9929No drift JD GARCH 4332 510 2035 1238 2844 896Linear drift JD GARCH 4257 460 2148 1219 2822 884Nonlinear drift JD GARCH 4262 467 2142 1234 2824 895No drift JD CEVndashGARCH 4355 505 2051 1336 2931 1007Linear drift JD CEVndashGARCH 4274 454 2169 1300 2893 976Nonlinear drift JD CEVndashGARCH 4278 462 2157 1307 2894 981

NOTE This table reports the separate inference statistics M(m l) for the four classes of spot rate models The asymptotically normal statistic M(m l) can be used to testwhether the cross-correlation between the mth and lth moments of Zt is significantly different from 0 The choice of (m l) = (1 1) (2 2) (3 3) and (4 4) is sensitive toautocorrelations in mean variance skewness and kurtosis of rt We report results only for a lag truncation order p = 20 the results for p = 10 and 30 are rather similar Theasymptotical critical values are 128 165 and 233 at the 10 5 and 1 levels

cation outperforms those with no drift or nonlinear drift fordensity forecasts Third for jump-diffusion models GARCHgenerally provides better density forecasts than CEV Fourthjump-diffusion models have similar advantages over single-factor diffusion and GARCH models They have much bet-ter characterization of the marginal density of the generalizedresiduals but they may not necessarily perform better in mod-eling the dynamics of the generalized residuals as indicatedby their large M(m l ) statistics The jump model with a lin-ear drift and level effect although it has the smallest M1 statis-tic apparently fails to capture the even-order moments of thegeneralized residuals with M(22) and M(44) statistics sig-nificantly higher than other jump-diffusion models that includeGARCH effects

In summary our out-of-sample density forecast evaluationreveals the following important features

1 For out-of-sample density forecast the importance ofmodeling mean reversion in the interest rate is ambiguousFor diffusion and GARCH models a linear or nonlineardrift gives worse density forecasts than a zero drift How-ever when there are regime switches or jumps linear driftmodels outperform those with zero or nonlinear drift Thebest density forecast models still fail to adequately cap-ture the mean dynamics of the generalized residuals

2 It is important to capture volatility clustering in interestrates for both in-sample and out-of-sample performance

via GARCH Most of the best-performing models have aGARCH component which significantly improves theirability to model the dynamics of the even-order momentsof the generalized residuals

3 It is also important to capture excess kurtosis and heavytails of the interest rate for both in-sample and out-of-sample performance via regime switching or jumps Theadvantages of regime-switching and jump models aremainly reflected in modeling the marginal density ratherthan in the dynamics of the generalized residuals

Overall our out-of-sample analysis demonstrates that more-complicated models that incorporate conditional heteroscedas-ticity and heavy tails of interest rates tend to have a betterdensity forecast This result is quite different from the resultsfor those that focus on conditional mean forecast As widelydocumented in the literature the predictable component inthe conditional mean of the interest rate appears insignificantAs a result random walk models tend to outperform more-sophisticated models in terms of mean forecast (eg Duffee2002) However density forecast includes all conditional mo-ments and as a result those models that can capture the dy-namics of higher-order moments tend to have better densityforecasts Our analysis suggests that the more-complicated spotrate models developed in the literature can indeed capture someimportant features of interest rates and are relevant in applica-tions involving density forecasts of interest rates

470 Journal of Business & Economic Statistics, October 2004

6. CONCLUSION

Despite the numerous empirical studies of spot interest rate models, little effort has been devoted to examining their out-of-sample forecast performance. We have contributed to the literature by providing the first (to our knowledge) comprehensive empirical analysis of the out-of-sample performance of a wide variety of popular spot rate models in forecasting the conditional density of future interest rates. Out-of-sample density forecasts help minimize the data snooping bias due to excessive searching for more-complicated models using similar datasets. The conditional density, which completely characterizes the full dynamics of interest rates, is also an essential input to many financial applications. Using a rigorous econometric evaluation procedure for density forecasts, we examined the out-of-sample performance of single-factor diffusion, GARCH, regime-switching, and jump-diffusion models.

Although previous studies have shown that simpler models, such as the random walk model, tend to have more accurate forecasts for the conditional mean of interest rates, we find that models that capture conditional heteroscedasticity and heavy tails of the spot rate have better out-of-sample density forecasts. GARCH significantly improves modeling of the conditional variance and kurtosis of the generalized residuals, whereas regime-switching and jumps help in modeling the marginal density of interest rates. Therefore, our analysis shows that the more-sophisticated interest rate models developed in the literature can indeed capture some important features of interest rate data and perform better than simpler models in out-of-sample density forecasts. Although we have focused on density forecasts of the spot rates here, there is evidence that the term structure of interest rates is driven by multiple factors (e.g., Dai and Singleton 2000). Extending our analysis to forecast the joint density of multiple yields is an interesting issue that we will address in future research.

ACKNOWLEDGMENTS

The authors are grateful for the helpful comments of Canlin Li, Daniel Morillo, Larry Pohlman, and seminar participants at PanAgora Asset Management, the 2002 Econometric Society European Summer Meeting, and the 2003 Econometric Society North American Winter Meeting. They also thank the associate editor and two anonymous referees, whose comments and suggestions have greatly improved the article, and David George for editorial assistance. Hong's work was supported by National Science Foundation grant SES-0111769. Any errors are the authors' own.

APPENDIX: GENERALIZED SPECTRAL EVALUATION FOR DENSITY FORECASTS

Here we describe Hong's (2003) omnibus evaluation procedure for out-of-sample density forecasts, as well as a class of separate inference procedures. (For more discussion and formal treatment, see Hong 2003.)

A.1 Generalized Spectrum

Generalized spectrum was proposed by Hong (1999) as an analytic tool to provide an alternative to the power spectrum and the higher-order spectrum (e.g., bispectrum) in time series analysis. Define the centered generalized residual

$$Z_t \equiv Z_t(\theta) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \theta)\,dr - \frac{1}{2}, \qquad t = R+1, \ldots, N,$$

where $p(r, t \mid I_{t-1}, \theta)$ is a conditional density model for time series $\{r_t\}$. Suppose that the series $\{Z_t\}_{t=-\infty}^{\infty}$ has marginal characteristic function $\varphi(u) \equiv E(e^{iuZ_t})$ and pairwise joint characteristic function $\varphi_j(u, v) \equiv E[\exp\{i(uZ_t + vZ_{t-|j|})\}]$, where $i \equiv \sqrt{-1}$, $u, v \in (-\infty, \infty)$, and $j = 0, \pm 1, \ldots$. The basic idea of generalized spectrum is to transform the series $\{Z_t\}$ and then consider the spectrum of the transformed series. Define the generalized covariance function

$$\sigma_j(u, v) \equiv \operatorname{cov}\left(e^{iuZ_t}, e^{ivZ_{t-|j|}}\right), \qquad j = 0, \pm 1, \ldots.$$

Straightforward algebra yields $\sigma_j(u, v) = \varphi_j(u, v) - \varphi(u)\varphi(v)$. Because $\varphi_j(u, v) = \varphi(u)\varphi(v)$ for all $(u, v)$ if and only if $Z_t$ and $Z_{t-|j|}$ are independent (cf. Lukacs 1970), $\sigma_j(u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ over various lags, including those with zero autocorrelation in $\{Z_t\}$.
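To spell out the algebra behind the identity $\sigma_j(u, v) = \varphi_j(u, v) - \varphi(u)\varphi(v)$, the following short derivation may help. It assumes that $\{Z_t\}$ is strictly stationary, so that the marginal characteristic function does not depend on $t$, and it uses the covariance convention $\operatorname{cov}(X, Y) = E(XY) - E(X)E(Y)$ without complex conjugation, which is the convention that the stated identity implies:

$$
\begin{aligned}
\sigma_j(u, v) &\equiv \operatorname{cov}\left(e^{iuZ_t}, e^{ivZ_{t-|j|}}\right) \\
&= E\left[e^{iuZ_t} e^{ivZ_{t-|j|}}\right] - E\left(e^{iuZ_t}\right) E\left(e^{ivZ_{t-|j|}}\right) \\
&= E\left[\exp\{i(uZ_t + vZ_{t-|j|})\}\right] - \varphi(u)\varphi(v) \\
&= \varphi_j(u, v) - \varphi(u)\varphi(v).
\end{aligned}
$$

Independence of $Z_t$ and $Z_{t-|j|}$ makes the joint characteristic function factor into the product of the marginals, so the last line is then 0 for every $(u, v)$.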

Under certain regularity conditions on the temporal dependence of $\{Z_t\}$, the Fourier transform of the generalized covariance function $\sigma_j(u, v)$ exists and is given by

$$f(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi].$$

Like $\sigma_j(u, v)$, $f(\omega, u, v)$ can capture any type of pairwise serial dependence in $\{Z_t\}$ across various lags. An advantage of spectral analysis is that $f(\omega, u, v)$ incorporates all lags simultaneously. Thus it can capture the dependent processes in which serial dependence occurs only at higher-order lags. This can arise from seasonality (e.g., calendar effects) or time delay in financial markets. Moreover, $f(\omega, u, v)$ is particularly powerful in capturing the dependent alternatives whose serial dependence decays to 0 slowly as $j \to \infty$, such as slow mean-reverting or persistent volatility clustering processes. For such processes, $f(\omega, u, v)$ has a sharp peak at frequency 0.

The function $f(\omega, u, v)$ does not require any moment condition on $\{Z_t\}$. When the moments of $\{Z_t\}$ exist, however, we can differentiate $f(\omega, u, v)$ with respect to $(u, v)$ at $(0, 0)$:

$$f^{(0,m,l)}(\omega, 0, 0) \equiv \left. \frac{\partial^{m+l}}{\partial u^m\, \partial v^l} f(\omega, u, v) \right|_{(u,v)=(0,0)} = \frac{i^{m+l}}{2\pi} \sum_{j=-\infty}^{\infty} \operatorname{cov}\left(Z_t^m, Z_{t-|j|}^l\right) e^{-ij\omega}.$$

In particular, $f^{(0,1,1)}(\omega, 0, 0)$ is the well-known power spectral density. (For applications of the power spectral density in economics and finance, see, e.g., Granger 1969; Durlauf 1991; Watson 1993.) For this reason, $f(\omega, u, v)$ is called the "generalized spectral density" of $\{Z_t\}$. The parameters $(u, v)$ provide much flexibility in capturing linear and nonlinear dependence. Because $f(\omega, u, v)$ can be decomposed as a weighted sum of $f^{(0,m,l)}(\omega, 0, 0)$ over various $(m, l)$ via a Taylor series expansion around $(0, 0)$, it contains information on all autocorrelations and cross-correlations in the various power terms of $\{Z_t\}$. Thus it can be used to develop an omnibus procedure against a wide range of dependent processes.

A.2 Omnibus Evaluation for Density Forecast

To evaluate out-of-sample density forecasts of model $p(r, t \mid I_{t-1}, \theta)$, Hong (2003) used the generalized spectrum to develop an omnibus test of iid $U[-\frac{1}{2}, \frac{1}{2}]$ for the centered generalized residual $\{Z_t\}$. To test $H_0$: $\{Z_t\} \sim$ iid $U[-\frac{1}{2}, \frac{1}{2}]$, we use the modified generalized covariance

$$\sigma_j^U(u, v) \equiv \varphi_j(u, v) - \varphi_U(u)\varphi_U(v),$$

where $\varphi_U(u) \equiv \sin(u/2)/(u/2)$ is the characteristic function of a $U[-\frac{1}{2}, \frac{1}{2}]$ random variable. Because $\sigma_j^U(u, v)$ has incorporated the information of $U[-\frac{1}{2}, \frac{1}{2}]$, it can capture any pairwise serial dependence in $\{Z_t\}$ and any deviation from $U[-\frac{1}{2}, \frac{1}{2}]$. The associated generalized spectrum is

$$f^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \sigma_j^U(u, v)\, e^{-ij\omega}, \qquad \omega \in [-\pi, \pi],\; u, v \in (-\infty, \infty).$$

Under $H_0$, we have $\sigma_j^U(u, v) = 0$ for all $j \neq 0$, and so $f^U(\omega, u, v)$ becomes a known "flat" spectrum,

$$f_0^U(\omega, u, v) \equiv \frac{1}{2\pi} \sigma_0^U(u, v) = \frac{1}{2\pi} \left[\varphi_U(u + v) - \varphi_U(u)\varphi_U(v)\right], \qquad \omega \in [-\pi, \pi].$$

When $Z_t$ and $Z_{t-j}$ are not independent at some lag $j \neq 0$, or when $Z_t$ is not $U[-\frac{1}{2}, \frac{1}{2}]$, we have $f^U(\omega, u, v) \neq f_0^U(\omega, u, v)$. Hence, to test $H_0$, one can check whether a consistent estimator of $f^U(\omega, u, v)$ is significantly different from $f_0^U(\omega, u, v)$.

Suppose that $\hat\theta$ is an estimator for model parameter $\theta$ based on the estimation sample $\{r_t\}_{t=1}^{R}$. To estimate $f^U(\omega, u, v)$ using the prediction sample $\{r_t\}_{t=R+1}^{N}$, we have to use the estimated proxies $\{\hat Z_t\}_{t=R+1}^{N}$, where

$$\hat Z_t \equiv Z_t(\hat\theta) = \int_{-\infty}^{r_t} p(r, t \mid I_{t-1}, \hat\theta)\,dr - \frac{1}{2}, \qquad t = R+1, \ldots, N.$$
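As an illustration of how these proxies can be computed, the following Python sketch evaluates the centered generalized residuals for a hypothetical Gaussian forecast model, $r_t \mid r_{t-1} \sim N(\alpha + \beta r_{t-1}, \sigma^2)$. The model, the function names, and the parameter names are assumptions made for the example only; they are not among the spot rate models studied in the article, and any model with a computable conditional cdf could be substituted.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of N(mean, std^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def centered_generalized_residuals(rates, R, alpha, beta, sigma):
    """Return Z_t = F(r_t | r_{t-1}) - 1/2 for the prediction sample.

    `rates` holds r_1, ..., r_N (so rates[0] is r_1) and R >= 1 is the
    size of the estimation sample. The conditional model is the
    hypothetical Gaussian AR(1) forecast N(alpha + beta * r_{t-1}, sigma^2),
    whose parameters are assumed to have been estimated on the first R points.
    """
    residuals = []
    for idx in range(R, len(rates)):  # idx = t - 1 for t = R+1, ..., N
        cond_mean = alpha + beta * rates[idx - 1]
        residuals.append(normal_cdf(rates[idx], cond_mean, sigma) - 0.5)
    return residuals


# Usage: a random-walk forecast (alpha = 0, beta = 1) on a toy series.
z_hat = centered_generalized_residuals([0.05, 0.05, 0.06], R=1,
                                       alpha=0.0, beta=1.0, sigma=0.01)
```

If the forecast model is correct, the resulting series should behave like iid draws from $U[-\frac{1}{2}, \frac{1}{2}]$, which is the hypothesis that the statistics in this appendix examine.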

Define the empirical measure

$$\hat\sigma_j^U(u, v) \equiv \hat\varphi_j(u, v) - \varphi_U(u)\varphi_U(v), \qquad j = 0, \pm 1, \ldots, \pm(n-1),$$

where $n \equiv N - R$ is the size of the prediction sample and the empirical pairwise joint characteristic function

$$\hat\varphi_j(u, v) \equiv (n - |j|)^{-1} \sum_{t=R+|j|+1}^{N} \exp\left\{i\left(u\hat Z_t + v\hat Z_{t-|j|}\right)\right\}.$$
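The empirical quantities just defined are simple sample averages. A minimal Python sketch follows; it takes the residual series as a plain list indexed from 0 (so the list positions 0, ..., n-1 stand for t = R+1, ..., N), and the function names are illustrative rather than taken from Hong (2003).

```python
import cmath
import math


def phi_uniform(u):
    """Characteristic function of U[-1/2, 1/2]: sin(u/2) / (u/2)."""
    if u == 0.0:
        return 1.0  # limiting value at u = 0
    return math.sin(u / 2.0) / (u / 2.0)


def empirical_joint_cf(z, j, u, v):
    """Average of exp{i(u Z_t + v Z_{t-|j|})} over the n - |j| available pairs."""
    n = len(z)
    lag = abs(j)
    total = sum(cmath.exp(1j * (u * z[t] + v * z[t - lag]))
                for t in range(lag, n))
    return total / (n - lag)


def sigma_hat_u(z, j, u, v):
    """Empirical generalized covariance centered at the U[-1/2, 1/2] null."""
    return empirical_joint_cf(z, j, u, v) - phi_uniform(u) * phi_uniform(v)
```

Under the null hypothesis, `sigma_hat_u(z, j, u, v)` should be close to 0 for every nonzero lag `j` and every `(u, v)`; systematic departures at particular lags point to serial dependence or to a non-uniform marginal distribution.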

We introduce a class of smoothed kernel estimators

$$\hat f^U(\omega, u, v) \equiv \frac{1}{2\pi} \sum_{j=1-n}^{n-1} (1 - |j|/n)^{1/2}\, k(j/p)\, \hat\sigma_j^U(u, v)\, e^{-ij\omega},$$

where $k(\cdot)$ is a kernel and $p \equiv p(n)$ is a bandwidth/lag order. An example of $k(\cdot)$ is the Bartlett kernel,

$$k(z) = \begin{cases} 1 - |z| & \text{if } |z| \le 1, \\ 0 & \text{otherwise.} \end{cases}$$
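For concreteness, the Bartlett kernel and the lag weights $k(j/p)$ that it produces can be written as follows. This is a small illustrative sketch; the helper name `lag_weights` is hypothetical.

```python
def bartlett_kernel(z):
    """Bartlett kernel: 1 - |z| on [-1, 1] and 0 elsewhere."""
    return 1.0 - abs(z) if abs(z) <= 1.0 else 0.0


def lag_weights(n, p):
    """Weights k(j/p) for lags j = 1, ..., n-1 with lag order p."""
    return [bartlett_kernel(j / p) for j in range(1, n)]


# With n = 5 and p = 2, only lag 1 receives positive weight.
weights = lag_weights(5, 2)  # [0.5, 0.0, 0.0, 0.0]
```

Because the Bartlett kernel has bounded support, all lags with $j \ge p$ receive zero weight, so $p$ acts as a truncation point as well as a smoothing parameter.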

For the choice of $p$, Hong (1999) suggested a data-driven method that delivers an asymptotically optimal bandwidth in terms of an integrated mean squared error criterion. This still involves a preliminary bandwidth $\bar p$, but the impact of choosing $\bar p$ is much smaller than the impact of choosing lag order $p$.

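A convenient divergence measure for $\hat f^U(\omega, u, v)$ and $f_0^U(\omega, u, v)$ is the quadratic form

$$
\begin{aligned}
2\pi n\, Q(\hat f^U, f_0^U) &\equiv 2\pi n \int\!\!\int\!\!\int_{-\pi}^{\pi} \left|\hat f^U(\omega, u, v) - f_0^U(\omega, u, v)\right|^2 d\omega\, dW(u)\, dW(v) \\
&= n \int\!\!\int \left|\hat\varphi_0(u + v) - \varphi_U(u + v)\right|^2 dW(u)\, dW(v) \\
&\quad + 2 \sum_{j=1}^{n-1} k^2(j/p)(n - j) \int\!\!\int \left|\hat\sigma_j^U(u, v)\right|^2 dW(u)\, dW(v),
\end{aligned}
$$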

where the weighting function $W(\cdot)$ is positive and nondecreasing, and the unspecified integrals are taken over the support of $W(\cdot)$. An example of this is

$$W(u) = \Phi(\sqrt{12}\, u),$$

where $\Phi(\cdot)$ is the $N(0, 1)$ cdf, which is commonly used in the characteristic function literature. The scale $\sqrt{12}$ matches the standard deviation of $U[-\frac{1}{2}, \frac{1}{2}]$, but this is not necessary.

Our evaluation statistic for $H_0$ is a properly centered and scaled version of the quadratic form $Q(\hat f^U, f_0^U)$:

$$M_1 \equiv \left[\sum_{j=1}^{n-1} k^2(j/p)\right]^{-1} \int\!\!\int \sum_{j=1}^{n-1} k^2(j/p)(n - j) \left|\hat\sigma_j^U(u, v)\right|^2 dW(u)\, dW(v) - \left[\int \left[1 - \varphi_U(u)^2\right] dW(u)\right]^2.$$
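To make the structure of the statistic concrete, the following Python sketch evaluates $M_1$ as written above for a given residual series. Several choices in it are assumptions made for illustration and are not taken from Hong (2003): the integrals with respect to $W(u) = \Phi(\sqrt{12}\,u)$ are approximated by a midpoint rule on $[-1, 1]$ without renormalizing the truncated weight, the Bartlett kernel is used, the lag order `p` is supplied by the user (it must exceed 1 so that at least one lag has positive weight), and all function names are hypothetical.

```python
import cmath
import math


def phi_uniform(u):
    """Characteristic function of U[-1/2, 1/2]."""
    return 1.0 if u == 0.0 else math.sin(u / 2.0) / (u / 2.0)


def bartlett(z):
    """Bartlett kernel."""
    return max(0.0, 1.0 - abs(z))


def weight_nodes(num_nodes=10, lo=-1.0, hi=1.0):
    """Midpoint nodes u_k and masses dW(u_k) for W(u) = Phi(sqrt(12) u)."""
    scale = math.sqrt(12.0)
    step = (hi - lo) / num_nodes
    nodes = []
    for k in range(num_nodes):
        u = lo + (k + 0.5) * step
        density = scale * math.exp(-0.5 * (scale * u) ** 2) / math.sqrt(2.0 * math.pi)
        nodes.append((u, density * step))
    return nodes


def m1_statistic(z, p, num_nodes=10):
    """Approximate M_1 for residuals z (a list) with lag order p > 1."""
    n = len(z)
    nodes = weight_nodes(num_nodes)
    # Squared kernel weights for the lags that receive positive weight.
    k2 = {j: bartlett(j / p) ** 2 for j in range(1, n) if bartlett(j / p) > 0.0}
    weighted_sum = 0.0
    for u, du in nodes:
        for v, dv in nodes:
            null_value = phi_uniform(u) * phi_uniform(v)
            for j, w in k2.items():
                cf = sum(cmath.exp(1j * (u * z[t] + v * z[t - j]))
                         for t in range(j, n)) / (n - j)
                weighted_sum += w * (n - j) * abs(cf - null_value) ** 2 * du * dv
    centering = sum((1.0 - phi_uniform(u) ** 2) * du for u, du in nodes) ** 2
    return weighted_sum / sum(k2.values()) - centering
```

The nested loops make the cost grow with the square of the number of quadrature nodes, so this form is suited to small experiments rather than production use. A series that is far from iid uniform, such as a constant series, should produce a much larger value than a series of independent uniform draws.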

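The asymptotic distribution of $M_1$ was derived by Hong (2003). Under $H_0$ and a set of regularity conditions,

$$M_1 \xrightarrow{d} \int\!\!\int |G(u, v)|^2\, dW(u)\, dW(v)$$

as $n \to \infty$, $R \to \infty$, and $n^{1+\delta}/R \to 0$ for any arbitrary small constant $\delta > 0$, where $G(u, v)$ is a complex-valued Gaussian process with mean 0 and a known covariance function,

$$
\begin{aligned}
\operatorname{cov}[G(u_1, v_1), G(u_2, v_2)] &= E[G(u_1, v_1)\, G(u_2, v_2)^*] \\
&= \sigma_0^U(u_1, -u_2)\,\varphi_U(v_1)\varphi_U(v_2) + \sigma_0^U(u_1, -v_2)\,\varphi_U(u_2)\varphi_U(v_1) \\
&\quad + \sigma_0^U(v_1, -u_2)\,\varphi_U(u_1)\varphi_U(v_2) + \sigma_0^U(v_1, -v_2)\,\varphi_U(u_1)\varphi_U(u_2).
\end{aligned}
$$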


The limit distribution of $M_1$ is nonstandard. However, because the Gaussian process $G(u, v)$ has mean 0 and a known distribution-free covariance function, the limit distribution of $M_1$ can be easily tabulated. When $W(u) = \Phi(\sqrt{12}\, u)$ with a truncated support on $[-1, 1]$, the asymptotic critical values at the 10%, 5%, and 1% levels are 0.37, 0.51, and 0.87.

The most attractive feature of $M_1$ is its omnibus property: it has power against a wide range of suboptimal density forecasts, thanks to the use of the characteristic function in a spectral framework. The $M_1$ statistic can be viewed as an omnibus metric measuring the departure of the forecast model from the true data-generating process. A better density forecast model is expected to have a smaller value for $M_1$, because its generalized residual series $\{Z_t\}$ is closer to having the iid $U[0, 1]$ property. Thus $M_1$ can be used to rank competing density forecast models in terms of their deviations from optimality. Moreover, the kernel function provides flexible weighting for various lags. Nonuniform weighting kernels usually discount higher-order lags. For most, if not all, economic and financial time series, today's behaviors are more affected by recent events than by remote events. Thus downward weighting for higher-order lags is expected to yield good power in finite samples.

A.3 Separate Inferences

When a model is rejected using $M_1$, it is interesting to explore possible reasons for the rejection. Hong (2003) provided a class of rigorous separate inference statistics that exploit the information contained in modeling generalized residuals $\{Z_t\}$ to understand possible reasons for model rejection. The $M(m, l)$ statistic is based on a kernel estimator of the generalized spectral density derivative $f^{(0,m,l)}(\omega, 0, 0)$, where $m$ and $l$ are the orders of the derivatives with respect to $u$ and $v$. The statistic $M(m, l)$ is defined as

$$M(m, l) = \left[\sum_{j=1}^{n-1} (n - j)\, k^2(j/p)\, \hat\rho_j^2(m, l) - \sum_{j=1}^{n-1} k^2(j/p)\right] \Bigg/ \left[2 \sum_{j=1}^{n-2} k^4(j/p)\right]^{1/2},$$

where $\hat\rho_j(m, l)$ is the sample cross-correlation between $\hat Z_t^m$ and $\hat Z_{t-|j|}^l$, $t = R+1, \ldots, N$. Under $H_0$ and a set of regularity conditions,

$$M(m, l) \xrightarrow{d} N(0, 1)$$

for any given $(m, l)$ as $n, R \to \infty$. The upper-tailed $N(0, 1)$ critical values at the 10%, 5%, and 1% levels are 1.28, 1.65, and 2.33. As in the $M_1$ test, Hong (2003) considered a data-driven method for $p$ that also involves the choice of a preliminary lag order $\bar p$.
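The separate inference statistic is straightforward to compute once the residuals are available. The Python sketch below is illustrative only: it assumes the Bartlett kernel, a user-supplied lag order `p` greater than 1, and one common convention for the sample cross-correlation (full-sample means and variances, with the lag-$j$ cross-product sum divided by $n$), which may differ in detail from the definition used by Hong (2003). The function names are hypothetical.

```python
import math


def bartlett(z):
    """Bartlett kernel."""
    return max(0.0, 1.0 - abs(z))


def sample_cross_correlation(z, j, m, l):
    """Cross-correlation between Z_t^m and Z_{t-j}^l at lag j >= 1."""
    n = len(z)
    x = [value ** m for value in z]
    y = [value ** l for value in z]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((x[t] - mean_x) * (y[t - j] - mean_y) for t in range(j, n)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / math.sqrt(var_x * var_y)


def m_statistic(z, p, m, l):
    """M(m, l) for residuals z (a non-constant list, len >= 3), lag order p > 1."""
    n = len(z)
    centered_sum = 0.0
    for j in range(1, n):
        k2 = bartlett(j / p) ** 2
        if k2 > 0.0:
            rho = sample_cross_correlation(z, j, m, l)
            centered_sum += k2 * ((n - j) * rho ** 2 - 1.0)
    scale = math.sqrt(2.0 * sum(bartlett(j / p) ** 4 for j in range(1, n - 1)))
    return centered_sum / scale


# Usage: a strongly alternating series has large autocorrelation in level,
# so M(1, 1) lies far in the upper tail.
z_alt = [0.25, -0.25] * 20
m_11 = m_statistic(z_alt, p=3, m=1, l=1)
```

The resulting value is compared with the upper-tailed $N(0, 1)$ critical values; in this toy series the lag-1 correlation is close to $-1$, so the statistic far exceeds 2.33.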

Although the moments of the generalized residuals $\{Z_t\}$ are not exactly the same as those of the original data $\{r_t\}$, they are highly correlated. In particular, the choice of $(m, l) = (1, 1)$, $(2, 2)$, $(3, 3)$, and $(4, 4)$ is very sensitive to autocorrelations in level, volatility, skewness, and kurtosis of $\{r_t\}$. Furthermore, the choices of $(m, l) = (1, 2)$ and $(2, 1)$ are sensitive to the "ARCH-in-mean" effect and the "leverage" effect. Different choices of order $(m, l)$ can thus allow examination of various dynamic aspects of the underlying process.

[Received June 2002. Revised November 2003.]

REFERENCES

Aït-Sahalia, Y. (1996), "Testing Continuous-Time Models of the Spot Interest Rate," Review of Financial Studies, 9, 385–426.

Aït-Sahalia, Y. (1999), "Transition Densities for Interest Rate and Other Nonlinear Diffusions," Journal of Finance, 54, 1361–1395.

Aït-Sahalia, Y. (2002), "Maximum-Likelihood Estimation of Discretely Sampled Diffusions: A Closed-Form Approach," Econometrica, 70, 223–262.

Aït-Sahalia, Y., and Lo, A. (1998), "Nonparametric Estimation of State-Price Densities Implicit in Financial Asset Prices," Journal of Finance, 53, 499–547.

Andersen, T. G., and Lund, J. (1997), "Estimating Continuous Time Stochastic Volatility Models of the Short-Term Interest Rate," Journal of Econometrics, 77, 343–378.

Ang, A., and Bekaert, G. (2002), "Regime Switches in Interest Rates," Journal of Business & Economic Statistics, 20, 163–182.

Bali, T. (2000), "Modeling the Conditional Mean and Variance of the Short Rate Using Diffusion, GARCH, and Moving Average Models," Journal of Futures Markets, 20, 717–751.

Bali, T. (2001), "Nonlinear Parametric Models of the Short-Term Interest Rate," unpublished manuscript, Baruch College.

Ball, C., and Torous, W. (1998), "Regime Shifts in Short-Term Riskless Interest Rates," unpublished manuscript, University of California, Los Angeles.

Bera, A. K., and Ghosh, A. (2002), "Density Forecast Evaluation With Applications in Finance," unpublished manuscript, University of Illinois at Champaign, Dept. of Economics.

Berkowitz, J. (2001), "Testing the Accuracy of Density Forecasts With Applications to Risk Management," Journal of Business & Economic Statistics, 19, 465–474.

Bliss, R., and Smith, D. (1998), "The Elasticity of Interest Rate Volatility: Chan, Karolyi, Longstaff, and Sanders Revisited," unpublished manuscript, Federal Reserve Bank of Atlanta.

Brenner, R., Harjes, R., and Kroner, K. (1996), "Another Look at Alternative Models of Short-Term Interest Rate," Journal of Financial and Quantitative Analysis, 31, 85–107.

Chacko, G., and Viceira, L. M. (2003), "Spectral GMM Estimation of Continuous-Time Processes," Journal of Econometrics, 116, 259–292.

Chan, K. C., Karolyi, G. A., Longstaff, F. A., and Sanders, A. B. (1992), "An Empirical Comparison of Alternative Models of the Short-Term Interest Rate," Journal of Finance, 47, 1209–1228.

Chapman, D., and Pearson, N. (2001), "Recent Advances in Estimating Models of the Term-Structure," Financial Analysts Journal, 57, 77–95.

Christoffersen, P., and Diebold, F. X. (1996), "Further Results on Forecasting and Model Selection Under Asymmetric Loss," Journal of Applied Econometrics, 11, 561–572.

Christoffersen, P., and Diebold, F. X. (1997), "Optimal Prediction Under Asymmetric Loss," Econometric Theory, 13, 808–817.

Clements, M. P., and Smith, J. (2000), "Evaluating the Forecast Densities of Linear and Non-Linear Models: Applications to Output Growth and Unemployment," Journal of Forecasting, 19, 255–276.

Conley, T. G., Hansen, L. P., Luttmer, E. G. J., and Scheinkman, J. A. (1997), "Short-Term Interest Rates as Subordinated Diffusions," Review of Financial Studies, 10, 525–578.

Corradi, V. (2000), "Reconsidering the Continuous-Time Limit of the GARCH(1,1) Process," Journal of Econometrics, 96, 145–153.

Cox, J. C., Ingersoll, J. E., and Ross, S. A. (1985), "A Theory of the Term Structure of Interest Rates," Econometrica, 53, 385–407.

Dai, Q., and Singleton, K. (2000), "Specification Analysis of Affine Term Structure Models," Journal of Finance, 55, 1943–1978.

Das, S. R. (2002), "The Surprise Element: Jumps in Interest Rates," Journal of Econometrics, 106, 27–65.

Diebold, F. X., Gunther, T. A., and Tay, A. S. (1998), "Evaluating Density Forecasts With Applications to Financial Risk Management," International Economic Review, 39, 863–883.

Diebold, F. X., Hahn, J., and Tay, A. S. (1999), "Multivariate Density Forecast Evaluation and Calibration in Financial Risk Management: High-Frequency Returns of Foreign Exchange," Review of Economics and Statistics, 81, 661–673.

Diebold, F. X., Tay, A. S., and Wallis, K. F. (1999), "Evaluating Density Forecasts of Inflation: The Survey of Professional Forecasters," in Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W. J. Granger, eds. R. F. Engle and H. White, Oxford, U.K.: Oxford University Press, pp. 76–90.


Dothan, L. U. (1978), "On the Term Structure of Interest Rates," Journal of Financial Economics, 6, 59–69.

Duffee, G. (2002), "Term Premia and Interest Rate Forecasts in Affine Models," Journal of Finance, 57, 405–443.

Duffie, D., and Pan, J. (1997), "An Overview of Value at Risk," Journal of Derivatives, 4, 13–32.

Durham, G. (2003), "Likelihood-Based Specification Analysis of Continuous-Time Models of the Short-Term Interest Rate," Journal of Financial Economics, 70, 463–487.

Durlauf, S. (1991), "Spectral Based Testing of the Martingale Hypothesis," Journal of Econometrics, 50, 355–376.

Fackler, P. L., and King, R. P. (1990), "Calibration of Option-Based Probability Assessments in Agricultural Commodity Markets," American Journal of Agricultural Economics, 72, 73–83.

Gallant, A. R., and Tauchen, G. (1998), "Reprojection Partially Observed Systems With Applications to Interest Rate Diffusions," Journal of the American Statistical Association, 93, 10–24.

Ghysels, E., Carrasco, M., Chernov, M., and Florens, J. (2001), "Estimating Diffusions With a Continuum of Moment Conditions," unpublished manuscript, University of North Carolina.

Granger, C. (1969), "Testing for Causality and Feedback," Econometrica, 37, 424–438.

Granger, C. (1999), "The Evaluation of Econometric Models and of Forecasts," invited lecture at the Far Eastern Meeting of the Econometric Society, Singapore.

Granger, C., and Pesaran, M. H. (2000), "A Decision Theoretic Approach to Forecasting Evaluation," in Statistics and Finance: An Interface, eds. W. S. Chan, W. K. Li, and H. Tong, London: Imperial College, pp. 261–278.

Gray, S. (1996), "Modeling the Conditional Distribution of Interest Rates as a Regime-Switching Process," Journal of Financial Economics, 42, 27–62.

Hamilton, J. D. (1989), "A New Approach to the Econometric Analysis of Nonstationary Time Series and the Business Cycle," Econometrica, 57, 357–384.

Hong, Y. (1999), "Hypothesis Testing in Time Series via the Empirical Characteristic Function: A Generalized Spectral Density Approach," Journal of the American Statistical Association, 94, 1201–1220.

Hong, Y. (2003), "Evaluation of Out-of-Sample Density Forecasts With Applications to Stock Prices," unpublished manuscript, Cornell University, Dept. of Economics and Dept. of Statistical Science.

Hong, Y., and Lee, T. H. (2003a), "Inference on the Predictability of Foreign Exchange Rate Changes via Generalized Spectrum and Nonlinear Models," Review of Economics and Statistics, 84, 1048–1062.

Hong, Y., and Lee, T. H. (2003b), "Diagnostic Checking for Nonlinear Time Series Models," Econometric Theory, 19, 1065–1121.

Hong, Y., and Li, H. (2004), "Nonparametric Specification Testing for Continuous-Time Models With Applications to Interest Rate Term Structures," Review of Financial Studies, forthcoming.

Jackwerth, J. C., and Rubinstein, M. (1996), "Recovering Probability Distributions From Option Prices," Journal of Finance, 51, 1611–1631.

Jiang, G., and Knight, J. (1997), "A Nonparametric Approach to the Estimation of Diffusion Processes With an Application to a Short-Term Interest Rate Model," Econometric Theory, 13, 615–645.

Johannes, M. (2003), "The Statistical and Economic Role of Jumps in Interest Rates," Journal of Finance, 59, 227–260.

Jorion, P. (2000), Value at Risk: The New Benchmark for Managing Financial Risk, New York: McGraw-Hill.

Morgan, J. P. (1996), RiskMetrics – Technical Document (4th ed.), New York: Morgan Guaranty Trust Company.

Koedijk, K. G., Nissen, F. G., Schotman, P. C., and Wolff, C. C. P. (1997), "The Dynamics of Short-Term Interest Rate Volatility Reconsidered," European Finance Review, 1, 105–130.

Li, H., and Xu, Y. (2000), "Regime Shifts and Short Rate Dynamics," unpublished manuscript, Cornell University.

Lo, A., and MacKinlay, A. C. (1989), "Data-Snooping Biases in Tests of Financial Asset Pricing Models," Review of Financial Studies, 3, 175–208.

Lukacs, E. (1970), Characteristic Functions (2nd ed.), London: Charles Griffin.

Meese, R., and Rogoff, K. (1983), "Empirical Exchange Rate Models of the Seventies: Do They Fit Out-of-Sample?" Journal of International Economics, 14, 3–24.

Nelson, D. (1990), "ARCH Models as Diffusion Approximation," Journal of Econometrics, 45, 7–39.

Nelson, D. (1991), "Conditional Heteroskedasticity in Asset Returns: A New Approach," Econometrica, 59, 347–370.

Rosenblatt, M. (1952), "Remarks on a Multivariate Transformation," The Annals of Mathematical Statistics, 23, 470–472.

Rothschild, M., and Stiglitz, J. (1970), "Increasing Risk: I. A Definition," Journal of Economic Theory, 2, 225–243.

Singleton, K. (2001), "Estimation of Affine Asset Pricing Models Using the Empirical Characteristic Function," Journal of Econometrics, 102, 111–141.

Söderlind, P., and Svensson, L. (1997), "New Techniques to Extract Market Expectations From Financial Instruments," Journal of Monetary Economics, 40, 383–429.

Stanton, R. (1997), "A Nonparametric Model of Term Structure Dynamics and the Market Price of Interest Rate Risk," Journal of Finance, 52, 1973–2002.

Vasicek, O. (1977), "An Equilibrium Characterization of the Term Structure," Journal of Financial Economics, 5, 177–188.

Watson, M. (1993), "Measures of Fit for Calibrated Models," Journal of Political Economy, 101, 1011–1041.

White, H. (2000), "A Reality Check for Data Snooping," Econometrica, 69, 1097–1127.

470 Journal of Business amp Economic Statistics October 2004

6 CONCLUSION

Despite the numerous empirical studies of spot interest ratemodels little effort has been devoted to examining their out-of-sample forecast performance We have contributed to theliterature by providing the first (to our knowledge) compre-hensive empirical analysis of the out-of-sample performanceof a wide variety of popular spot rate models in forecastingthe conditional density of future interest rates Out-of-sampledensity forecasts help minimize the data snooping bias due toexcessive searching for more-complicated models using similardatasets The conditional density which completely character-izes the full dynamics of interest rates is also an essential inputto many financial applications Using a rigorous econometricevaluation procedure for density forecasts we examined theout-of-sample performance of single-factor diffusion GARCHregime-switching and jump-diffusion models

Although previous studies have shown that simpler modelssuch as the random walk model tend to have more accurateforecasts for the conditional mean of interest rates we find thatmodels that capture conditional heteroscedasticity and heavytails of the spot rate have better out-of-sample density forecastsGARCH significantly improves modeling of the conditionalvariance and kurtosis of the generalized residuals whereasregime-switching and jumps help in modeling the marginaldensity of interest rates Therefore our analysis shows that themore-sophisticated interest rate models developed in the litera-ture can indeed capture some important features of interest ratedata and perform better than simpler models in out-of-sampledensity forecasts Although we have focused on density fore-casts of the spot rates here there is evidence that the term struc-ture of interest rates is driven by multiple factors (eg Dai andSingleton 2000) Extending our analysis to forecast the jointdensity of multiple yields is an interesting issue that we willaddress in future research

ACKNOWLEDGMENTS

The authors are grateful for the helpful comments of CanlinLi Daniel Morillo Larry Pohlman and seminar participants atPanAgora Asset Management the 2002 Econometric SocietyEuropean Summer Meeting and the 2003 Econometric SocietyNorth American Winter Meeting They also thank the associateeditor and two anonymous referees whose comments and sug-gestions have greatly improved the article and David Georgefor editorial assistance Hongrsquos work was supported by NationalScience Foundation grant SES-0111769 Any errors are the au-thorsrsquo own

APPENDIX GENERALIZED SPECTRAL EVALUATIONFOR DENSITY FORECASTS

Here we describe Hongrsquos (2003) omnibus evaluation proce-dure for out-of-sample density forecasts as well as a class ofseparate inference procedures (For more discussion and formaltreatment see Hong 2003)

A1 Generalized Spectrum

Generalized spectrum was proposed by Hong (1999) as ananalytic tool to provide an alternative to the power spectrumand the higher-order spectrum (eg bispectrum) in time seriesanalysis Define the centered generalized residual

Zt equiv Zt(θ) =int rt

minusinfinp(r t|Itminus1 θ)dr minus 1

2 t = R + 1 T

where p(r t|Itminus1 θ) is a conditional density model for timeseries rt Suppose that the series Ztinfint=minusinfin has marginalcharacteristic function ϕ(u) equiv E(eiuZt) and pairwise jointcharacteristic function ϕj(u v) equiv E[expi(uZt + vZtminus| j|)]where i equiv radicminus1 u v isin (minusinfininfin) and j = 0plusmn1 The basicidea of generalized spectrum is to transform the series Zt andthen consider the spectrum of the transformed series Define thegeneralized covariance function

σj(u v) equiv cov(eiuZt eivZtminus| j|) j = 0plusmn1

Straightforward algebra yields σj(u v) = ϕj(u v) minus ϕ(u)ϕ(v)Because ϕj(u v) = ϕ(u)ϕ(v) for all (u v) if and only ifZt and Ztminus| j| are independent (cf Lukacs 1970) σj(u v) cancapture any type of pairwise serial dependence in Zt overvarious lags including those with zero autocorrelation in Zt

Under certain regularity conditions on the temporal depen-dence of Zt the Fourier transform of the generalized covari-ance function σj(u v) exists and is given by

f (ωu v) equiv 1

infinsumj=minusinfin

σj(u v)eminusijω ω isin [minusππ]

Like σj(u v) f (ωu v) can capture any type of pairwise se-rial dependence in Zt across various lags An advantage ofspectral analysis is that f (ωu v) incorporates all lags simulta-neously Thus it can capture the dependent processes in whichserial dependence occurs only at higher-order lags This canarise from seasonality (eg calender effects) or time delay infinancial markets Moreover f (ωu v) is particularly powerfulin capturing the dependent alternatives whose serial depen-dence decays to 0 slowly as j rarr infin such as slow mean-reverting or persistent volatility clustering processes For suchprocesses f (ωu v) has a sharp peak at frequency 0

The function f (ωu v) does not require any moment condi-tion on Zt When the moments of Zt exist however we candifferentiate f (ωu v) with respect to (u v) at (00)

f (0ml )(ω00) equiv partm+l

partum partvlf (ωu v)

∣∣∣∣(uv)=(00)

= im+l

infinsumj=minusinfin

cov(Zm

t Zltminus| j|

)eminusijω

In particular f (011)(ω00) is the well-known power spec-tral density (For applications of the power spectral density ineconomics and finance see eg Granger 1969 Durlauf 1991Watson 1993) For this reason f (ωu v) is called the ldquogener-alized spectral densityrdquo of Zt The parameters (u v) providemuch flexibility in capturing linear and nonlinear dependenceBecause f (ωu v) can be decomposed as a weighted sum of

Hong Li and Zhao Out-of-Sample Analysis of Spot Rate Models 471

f (0ml)(ω00) over various (m l ) via a Taylor series expan-sion around (0 0) it contains information on all autocorrela-tions and cross-correlations in the various power terms of ZtThus it can be used to develop an omnibus procedure against awide range of dependent processes

A2 Omnibus Evaluation for Density Forecast

To evaluate out-of-sample density forecasts of modelp(r t|Itminus1 θ) Hong (2003) used the generalized spectrum todevelop an omnibus test of iid U[minus 1

2 12 ] for the centered gen-

eralized residual Zt To test H0 Zt sim iid U[minus 12 1

2 ] we usethe modified generalized covariance

σUj (u v) equiv ϕj(u v) minus ϕU(u)ϕU(v)

where ϕU(u) equiv sin(u2)(u2) is the characteristic function ofa U[minus 1

2 12 ] random variable Because σU

j (u v) has incorpo-rated the information of U[minus 1

2 12 ] it can capture any pairwise

serial dependence in Zt and any deviation from U[minus 12 1

2 ] Theassociated generalized spectrum is

f U(ωu v) equiv 1

infinsumj=minusinfin

σUj (u v)eminusijω

ω isin [minusππ]u v isin (minusinfininfin)

Under H0 we have σUj (u v) = 0 for all j = 0 and so

f U(ωu v) becomes a known ldquoflatrdquo spectrum

f U0 (ωu v) equiv 1

2πσU

0 (u v)

= [ϕU(u + v) minus ϕU(u)ϕU(v)] ω isin [minusππ]When Zt and Ztminusj are not independent at some lag j = 0 orwhen Zt is not U[minus 1

2 12 ] we have f U(ωu v) = f U

0 (ωu v)Hence to test H0 one can check whether a consistent estimatorof f U(ωu v) is significantly different from f U

0 (ωu v)Suppose that θ is an estimator for model parameter θ based

on the estimation sample rtRt=1 To estimate f U(ωu v) using

the prediction sample rtNt=R+1 we have to use the estimated

proxies ZtNt=R+1 where

Zt equiv Zt(θ)

=int rt

minusinfinp(r t|Itminus1 θ )dr minus 1

2 t = R + 1 N

Define the empirical measure

σUj (u v) equiv ϕj(u v)minusϕU(u)ϕU(v) j = 0plusmn1 plusmn(nminus1)

where n equiv N minus R is the size of the prediction sample and theempirical pairwise joint characteristic function

ϕj(u v) equiv (n minus | j|)minus1Nsum

t=R+| j|+1

expi(uZt + vZtminus| j|

)

We introduce a class of smoothed kernel estimators

f U(ωu v) equiv 1

nminus1sumj=1minusn

(1 minus | j|n)12k( jp)σUj (u v)eminusijω

where k(middot) is a kernel and p equiv p(n) is a bandwidthlag order Anexample of k(middot) is the Bartlett kernel

k(z) =

1 minus |z| if |z| le 1

0 otherwise

For the choice of p Hong (1999) suggested a data-drivenmethod that delivers an asymptotically optimal bandwidth interms of an integrated mean squared error criterion This stillinvolves a preliminary bandwidth p but the impact of choos-ing p is much smaller than the impact of choosing lag order p

A convenient divergence measure for f U(ωu v) andf U0 (ωu v) is the quadratic form

2πnQ( f U f U0 )

equiv 2πnint int int π

minusπ

| f U(ωu v)

minus f U0 (ωu v)|2 dω dW(u)dW(v)

= nint int

|ϕ0(u + v) minus ϕU(u + v)|2 dW(u)dW(v)

+ 2nminus1sumj=1

k2( jp)(n minus j)int int

|σUj (u v)|2 dW(u)dW(v)

where the weighting function W(middot) is positive and nondecreas-ing and the unspecified integrals are taken over the supportof W(middot) An example of this is

W(u) = (radic

12u)

where (middot) is the N(01) cdf which is commonly used in thecharacteristic function literature The scale

radic12 matches the

standard deviation of U[minus 12 1

2 ] but this is not necessaryOur evaluation statistic for H0 is a properly centered and

scaled version of the quadratic form Q( f U f U0 )

M1 equiv[

nminus1sumj=1

k2( jp)

]minus1

timesint int nminus1sum

j=1

k2( jp)(n minus j)|σUj (u v)|2 dW(u)dW(v)

minus[int

[1 minus ϕU(u)2] dW(u)

]2

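The asymptotic distribution of $M_1$ was derived by Hong (2003). Under $H_0$ and a set of regularity conditions,

$$M_1 \xrightarrow{d} \int |G(u, v)|^2\, dW(u)\, dW(v)$$

as $n \to \infty$, $R \to \infty$, and $n^{1+\delta}/R \to 0$ for any arbitrarily small constant $\delta > 0$, where $G(u, v)$ is a complex-valued Gaussian process with mean 0 and a known covariance function

$$
\begin{aligned}
\operatorname{cov}[G(u_1, v_1), G(u_2, v_2)]
&= E[G(u_1, v_1)\, G(u_2, v_2)^{*}] \\
&= \sigma_0^U(u_1, -u_2)\,\varphi_U(v_1)\,\varphi_U(v_2) + \sigma_0^U(u_1, -v_2)\,\varphi_U(u_2)\,\varphi_U(v_1) \\
&\quad + \sigma_0^U(v_1, -u_2)\,\varphi_U(u_1)\,\varphi_U(v_2) \\
&\quad + \sigma_0^U(v_1, -v_2)\,\varphi_U(u_1)\,\varphi_U(u_2).
\end{aligned}
$$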

472 Journal of Business & Economic Statistics, October 2004

The limit distribution of $M_1$ is nonstandard. However, because the Gaussian process $G(u, v)$ has mean 0 and a known distribution-free covariance function, the limit distribution of $M_1$ can be easily tabulated. When $W(u) = \Phi(\sqrt{12}\,u)$ with a truncated support on $[-1, 1]$, the asymptotic critical values at the 10%, 5%, and 1% levels are 0.37, 0.51, and 0.87.
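Integrals with respect to $dW(u)$ under this weighting can be approximated on a grid. The sketch below uses trapezoid weights for $W(u) = \Phi(\sqrt{12}\,u)$ restricted to $[-1, 1]$. The function names and grid size are hypothetical, and the truncated weight is left unnormalized, which is an assumption: the text does not say whether the truncated measure is rescaled to integrate to 1.

```python
import numpy as np

def truncated_normal_weights(grid_size=201, scale=np.sqrt(12.0), bound=1.0):
    """Grid points and trapezoid weights approximating dW(u) on [-bound, bound].

    W(u) = Phi(scale * u), so dW(u) = scale * phi(scale * u) du,
    where phi is the standard normal density.
    """
    u = np.linspace(-bound, bound, grid_size)
    du = u[1] - u[0]
    density = scale * np.exp(-0.5 * (scale * u) ** 2) / np.sqrt(2.0 * np.pi)
    w = density * du
    w[0] *= 0.5   # trapezoid rule: half weight at both end points
    w[-1] *= 0.5
    return u, w

def integrate_dW(g, u, w):
    """Approximate the integral of g(u) dW(u) on the grid."""
    return np.sum(g(u) * w)
```

Since $\pm 1$ lies about 3.5 standard deviations out for a $N(0, 1/12)$ weight, the truncation discards very little mass; the second moment under the truncated weight stays close to 1/12, the variance of $U[-\tfrac{1}{2}, \tfrac{1}{2}]$.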

The most attractive feature of $M_1$ is its omnibus property: it has power against a wide range of suboptimal density forecasts, thanks to the use of the characteristic function in a spectral framework. The $M_1$ statistic can be viewed as an omnibus metric measuring the departure of the forecast model from the true data-generating process. A better density forecast model is expected to have a smaller value for $M_1$, because its generalized residual series $\{Z_t\}$ is closer to having the iid $U[0, 1]$ property. Thus $M_1$ can be used to rank competing density forecast models in terms of their deviations from optimality. Moreover, the kernel function provides flexible weighting for various lags. Nonuniform weighting kernels usually discount higher-order lags. For most, if not all, economic and financial time series, today's behaviors are more affected by recent events than by remote events. Thus downward weighting for higher-order lags is expected to yield good power in finite samples.

A.3 Separate Inferences

When a model is rejected using $M_1$, it is interesting to explore possible reasons for the rejection. Hong (2003) provided a class of rigorous separate inference statistics that exploit the information contained in modeling generalized residuals $\{Z_t\}$ to understand possible reasons for model rejection. The $M(m, l)$ statistic is based on a kernel estimator of the generalized spectral density derivative $f^{(0,m,l)}(\omega, 0, 0)$, where $m$ and $l$ are the orders of the derivatives with respect to $u$ and $v$. The statistic $M(m, l)$ is defined as

$$M(m, l) = \left[\sum_{j=1}^{n-1} (n - j)\, k^2(j/p)\, \hat{\rho}_j^2(m, l) - \sum_{j=1}^{n-1} k^2(j/p)\right] \Bigg/ \left[2 \sum_{j=1}^{n-2} k^4(j/p)\right]^{1/2},$$

where $\hat{\rho}_j(m, l)$ is the sample cross-correlation between $\hat{Z}_t^m$ and $\hat{Z}_{t-|j|}^l$, $t = R + 1, \ldots, N$. Under $H_0$ and a set of regularity conditions,

$$M(m, l) \xrightarrow{d} N(0, 1)$$

for any given $(m, l)$ as $n, R \to \infty$. The upper-tailed $N(0, 1)$ critical values at the 10%, 5%, and 1% levels are 1.28, 1.65, and 2.33. As in the $M_1$ test, Hong (2003) considered a data-driven method for $p$ that also involves the choice of a preliminary lag order $\bar{p}$.
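A minimal sketch of the $M(m, l)$ computation with the Bartlett kernel follows. Several details are assumptions made for illustration rather than taken from the paper: the denominator is taken to be the square root of $2\sum_j k^4(j/p)$ (the usual standardization for an asymptotically $N(0,1)$ statistic), the cross-correlations use full-sample means and variances with a $1/n$ normalization, the lag order `p` is fixed by hand instead of chosen by the data-driven rule, and all function names are hypothetical.

```python
import numpy as np

def bartlett(x):
    """Bartlett kernel: 1 - |x| on [-1, 1], zero elsewhere."""
    x = np.abs(np.asarray(x, dtype=float))
    return np.where(x <= 1.0, 1.0 - x, 0.0)

def cross_corr(z, m, l, j):
    """Sample cross-correlation between Z_t^m and Z_{t-j}^l for lag j >= 1."""
    a = z ** m
    b = z ** l
    a = a - a.mean()
    b = b - b.mean()
    n = z.size
    cov = np.sum(a[j:] * b[: n - j]) / n  # assumption: 1/n normalization
    return cov / np.sqrt(np.mean(a ** 2) * np.mean(b ** 2))

def m_statistic(z, m, l, p):
    """Separate inference statistic M(m, l) for generalized residuals z."""
    z = np.asarray(z, dtype=float)
    n = z.size
    lags = np.arange(1, n)                 # j = 1, ..., n - 1
    k2 = bartlett(lags / p) ** 2
    # Skip lags whose kernel weight is zero; they do not affect the sums.
    rho2 = np.array([cross_corr(z, m, l, j) ** 2 if k2[j - 1] > 0 else 0.0
                     for j in lags])
    numerator = np.sum((n - lags) * k2 * rho2) - np.sum(k2)
    denominator = np.sqrt(2.0 * np.sum(k2[: n - 2] ** 2))  # j = 1, ..., n - 2
    return numerator / denominator
```

With the Bartlett kernel the lag order must satisfy `p > 1`, otherwise every kernel weight is zero and the denominator vanishes. Large positive values point to serial dependence in the chosen powers of the residuals, and the statistic is compared with upper-tailed normal critical values.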

Although the moments of the generalized residuals $\{Z_t\}$ are not exactly the same as those of the original data $\{r_t\}$, they are highly correlated. In particular, the choice of $(m, l) = (1, 1)$, $(2, 2)$, $(3, 3)$, and $(4, 4)$ is very sensitive to autocorrelations in level, volatility, skewness, and kurtosis of $\{r_t\}$. Furthermore, the choices of $(m, l) = (1, 2)$ and $(2, 1)$ are sensitive to the “ARCH-in-mean” effect and the “leverage” effect. Different choices of order $(m, l)$ can thus allow examination of various dynamic aspects of the underlying process.

[Received June 2002. Revised November 2003.]

REFERENCES

Ait-Sahalia, Y. (1996), “Testing Continuous-Time Models of the Spot Interest Rate,” Review of Financial Studies, 9, 385–426.

Ait-Sahalia, Y. (1999), “Transition Densities for Interest Rate and Other Nonlinear Diffusions,” Journal of Finance, 54, 1361–1395.

Ait-Sahalia, Y. (2002), “Maximum-Likelihood Estimation of Discretely Sampled Diffusions: A Closed-Form Approach,” Econometrica, 70, 223–262.

Ait-Sahalia, Y., and Lo, A. (1998), “Nonparametric Estimation of State-Price Densities Implicit in Financial Asset Prices,” Journal of Finance, 53, 499–547.

Andersen, T. G., and Lund, J. (1997), “Estimating Continuous Time Stochastic Volatility Models of the Short-Term Interest Rate,” Journal of Econometrics, 77, 343–378.

Ang, A., and Bekaert, G. (2002), “Regime Switches in Interest Rates,” Journal of Business & Economic Statistics, 20, 163–182.

Bali, T. (2000), “Modeling the Conditional Mean and Variance of the Short Rate Using Diffusion, GARCH, and Moving Average Models,” Journal of Futures Markets, 20, 717–751.

Bali, T. (2001), “Nonlinear Parametric Models of the Short-Term Interest Rate,” unpublished manuscript, Baruch College.

Ball, C., and Torous, W. (1998), “Regime Shifts in Short-Term Riskless Interest Rates,” unpublished manuscript, University of California, Los Angeles.

Bera, A. K., and Ghosh, A. (2002), “Density Forecast Evaluation With Applications in Finance,” unpublished manuscript, University of Illinois at Champaign, Dept. of Economics.

Berkowitz, J. (2001), “Testing the Accuracy of Density Forecasts With Applications to Risk Management,” Journal of Business & Economic Statistics, 19, 465–474.

Bliss, R., and Smith, D. (1998), “The Elasticity of Interest Rate Volatility: Chan, Karolyi, Longstaff, and Sanders Revisited,” unpublished manuscript, Federal Reserve Bank of Atlanta.

Brenner, R., Harjes, R., and Kroner, K. (1996), “Another Look at Alternative Models of Short-Term Interest Rate,” Journal of Financial and Quantitative Analysis, 31, 85–107.

Chacko, G., and Viceira, L. M. (2003), “Spectral GMM Estimation of Continuous-Time Processes,” Journal of Econometrics, 116, 259–292.

Chan, K. C., Karolyi, G. A., Longstaff, F. A., and Sanders, A. B. (1992), “An Empirical Comparison of Alternative Models of the Short-Term Interest Rate,” Journal of Finance, 47, 1209–1228.

Chapman, D., and Pearson, N. (2001), “Recent Advances in Estimating Models of the Term-Structure,” Financial Analysts Journal, 57, 77–95.

Christoffersen, P., and Diebold, F. X. (1996), “Further Results on Forecasting and Model Selection Under Asymmetric Loss,” Journal of Applied Econometrics, 11, 561–572.

Christoffersen, P., and Diebold, F. X. (1997), “Optimal Prediction Under Asymmetric Loss,” Econometric Theory, 13, 808–817.

Clements, M. P., and Smith, J. (2000), “Evaluating the Forecast Densities of Linear and Non-Linear Models: Applications to Output Growth and Unemployment,” Journal of Forecasting, 19, 255–276.

Conley, T. G., Hansen, L. P., Luttmer, E. G. J., and Scheinkman, J. A. (1997), “Short-Term Interest Rates as Subordinated Diffusions,” Review of Financial Studies, 10, 525–578.

Corradi, V. (2000), “Reconsidering the Continuous-Time Limit of the GARCH(1,1) Process,” Journal of Econometrics, 96, 145–153.

Cox, J. C., Ingersoll, J. E., and Ross, S. A. (1985), “A Theory of the Term Structure of Interest Rates,” Econometrica, 53, 385–407.

Dai, Q., and Singleton, K. (2000), “Specification Analysis of Affine Term Structure Models,” Journal of Finance, 55, 1943–1978.

Das, S. R. (2002), “The Surprise Element: Jumps in Interest Rates,” Journal of Econometrics, 106, 27–65.

Diebold, F. X., Gunther, T. A., and Tay, A. S. (1998), “Evaluating Density Forecasts With Applications to Financial Risk Management,” International Economic Review, 39, 863–883.

Diebold, F. X., Hahn, J., and Tay, A. S. (1999), “Multivariate Density Forecast Evaluation and Calibration in Financial Risk Management: High-Frequency Returns of Foreign Exchange,” Review of Economics and Statistics, 81, 661–673.

Diebold, F. X., Tay, A. S., and Wallis, K. F. (1999), “Evaluating Density Forecasts of Inflation: The Survey of Professional Forecasters,” in Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W. J. Granger, eds. R. F. Engle and H. White, Oxford, U.K.: Oxford University Press, pp. 76–90.


Dothan, L. U. (1978), “On the Term Structure of Interest Rates,” Journal of Financial Economics, 6, 59–69.

Duffee, G. (2002), “Term Premia and Interest Rate Forecasts in Affine Models,” Journal of Finance, 57, 405–443.

Duffie, D., and Pan, J. (1997), “An Overview of Value at Risk,” Journal of Derivatives, 4, 13–32.

Durham, G. (2003), “Likelihood-Based Specification Analysis of Continuous-Time Models of the Short-Term Interest Rate,” Journal of Financial Economics, 70, 463–487.

Durlauf, S. (1991), “Spectral Based Testing of the Martingale Hypothesis,” Journal of Econometrics, 50, 355–376.

Fackler, P. L., and King, R. P. (1990), “Calibration of Option-Based Probability Assessments in Agricultural Commodity Markets,” American Journal of Agricultural Economics, 72, 73–83.

Gallant, A. R., and Tauchen, G. (1998), “Reprojection Partially Observed Systems With Applications to Interest Rate Diffusions,” Journal of the American Statistical Association, 93, 10–24.

Ghysels, E., Carrasco, M., Chernov, M., and Florens, J. (2001), “Estimating Diffusions With a Continuum of Moment Conditions,” unpublished manuscript, University of North Carolina.

Granger, C. (1969), “Testing for Causality and Feedback,” Econometrica, 37, 424–438.

Granger, C. (1999), “The Evaluation of Econometric Models and of Forecasts,” invited lecture at the Far Eastern Meeting of the Econometric Society, Singapore.

Granger, C., and Pesaran, M. H. (2000), “A Decision Theoretic Approach to Forecasting Evaluation,” in Statistics and Finance: An Interface, eds. W. S. Chan, W. K. Li, and H. Tong, London: Imperial College, pp. 261–278.

Gray, S. (1996), “Modeling the Conditional Distribution of Interest Rates as a Regime-Switching Process,” Journal of Financial Economics, 42, 27–62.

Hamilton, J. D. (1989), “A New Approach to the Econometric Analysis of Nonstationary Time Series and the Business Cycle,” Econometrica, 57, 357–384.

Hong, Y. (1999), “Hypothesis Testing in Time Series via the Empirical Characteristic Function: A Generalized Spectral Density Approach,” Journal of the American Statistical Association, 94, 1201–1220.

Hong, Y. (2003), “Evaluation of Out-of-Sample Density Forecasts With Applications to Stock Prices,” unpublished manuscript, Cornell University, Dept. of Economics and Dept. of Statistical Science.

Hong, Y., and Lee, T. H. (2003a), “Inference on the Predictability of Foreign Exchange Rate Changes via Generalized Spectrum and Nonlinear Models,” Review of Economics and Statistics, 84, 1048–1062.

Hong, Y., and Lee, T. H. (2003b), “Diagnostic Checking for Nonlinear Time Series Models,” Econometric Theory, 19, 1065–1121.

Hong, Y., and Li, H. (2004), “Nonparametric Specification Testing for Continuous-Time Models With Applications to Interest Rate Term Structures,” Review of Financial Studies, forthcoming.

Jackwerth, J. C., and Rubinstein, M. (1996), “Recovering Probability Distributions From Option Prices,” Journal of Finance, 51, 1611–1631.

Jiang, G., and Knight, J. (1997), “A Nonparametric Approach to the Estimation of Diffusion Processes With an Application to a Short-Term Interest Rate Model,” Econometric Theory, 13, 615–645.

Johannes, M. (2003), “The Statistical and Economic Role of Jumps in Interest Rates,” Journal of Finance, 59, 227–260.

Jorion, P. (2000), Value at Risk: The New Benchmark for Managing Financial Risk, New York: McGraw-Hill.

Morgan, J. P. (1996), RiskMetrics Technical Document (4th ed.), New York: Morgan Guaranty Trust Company.

Koedijk, K. G., Nissen, F. G., Schotman, P. C., and Wolff, C. C. P. (1997), “The Dynamics of Short-Term Interest Rate Volatility Reconsidered,” European Finance Review, 1, 105–130.

Li, H., and Xu, Y. (2000), “Regime Shifts and Short Rate Dynamics,” unpublished manuscript, Cornell University.

Lo, A., and MacKinlay, A. C. (1989), “Data-Snooping Biases in Tests of Financial Asset Pricing Models,” Review of Financial Studies, 3, 175–208.

Lukacs, E. (1970), Characteristic Functions (2nd ed.), London: Charles Griffin.

Meese, R., and Rogoff, K. (1983), “Empirical Exchange Rate Models of the Seventies: Do They Fit Out-of-Sample?” Journal of International Economics, 14, 3–24.

Nelson, D. (1990), “ARCH Models as Diffusion Approximation,” Journal of Econometrics, 45, 7–39.

Nelson, D. (1991), “Conditional Heteroskedasticity in Asset Returns: A New Approach,” Econometrica, 59, 347–370.

Rosenblatt, M. (1952), “Remarks on a Multivariate Transformation,” The Annals of Mathematical Statistics, 23, 470–472.

Rothschild, M., and Stiglitz, J. (1970), “Increasing Risk: I. A Definition,” Journal of Economic Theory, 2, 225–243.

Singleton, K. (2001), “Estimation of Affine Asset Pricing Models Using the Empirical Characteristic Function,” Journal of Econometrics, 102, 111–141.

Soderlind, P., and Svensson, L. (1997), “New Techniques to Extract Market Expectations From Financial Instruments,” Journal of Monetary Economics, 40, 383–429.

Stanton, R. (1997), “A Nonparametric Model of Term Structure Dynamics and the Market Price of Interest Rate Risk,” Journal of Finance, 52, 1973–2002.

Vasicek, O. (1977), “An Equilibrium Characterization of the Term Structure,” Journal of Financial Economics, 5, 177–188.

Watson, M. (1993), “Measures of Fit for Calibrated Models,” Journal of Political Economy, 101, 1011–1041.

White, H. (2000), “A Reality Check for Data Snooping,” Econometrica, 69, 1097–1127.



472 Journal of Business amp Economic Statistics October 2004

The limit distribution of M1 is nonstandard However be-cause the Gaussian process G(u v) has mean 0 and a knowndistribution-free covariance function the limit distributionof M1 can be easily tabulated When W(u) = (

radic12u) with

a truncated support on [minus11] the asymptotic critical values atthe 10 5 and 1 levels are 037 051 and 087

The most attractive feature of M1 is its omnibus propertyit has power against a wide range of suboptimal density fore-casts thanks to the use of the characteristic function in a spec-tral framework The M1 statistic can be viewed as an omnibusmetric measuring the departure of the forecast model from thetrue data-generating process A better density forecast model isexpected to have a smaller value for M1 because its generalizedresidual series Zt is closer to having the iid U[01] propertyThus M1 can be used to rank competing density forecast mod-els in terms of their deviations from optimality Moreover thekernel function provides flexible weighting for various lagsNonuniform weighting kernels usually discount higher-orderlags For most if not all economic and financial time seriestodayrsquos behaviors are more affected by recent events than byremote events Thus downward weighting for higher-order lagsis expected to yield good power in finite samples

A3 Separate Inferences

When a model is rejected using M1 it is interesting to ex-plore possible reasons for the rejection Hong (2003) provideda class of rigorous separate inference statistics that exploit theinformation contained in modeling generalized residuals Zt tounderstand possible reasons for model rejection The M(m l )statistic is based on a kernel estimator of the generalized spec-tral density derivative f (0ml)(ω00) where m and l are theorders of the derivatives with respect to u and v The statisticM(m l ) is defined as

$$
M(m, l) = \left[ \sum_{j=1}^{n-1} (n - j)\, k^2(j/p)\, \rho_j^2(m, l) - \sum_{j=1}^{n-1} k^2(j/p) \right] \Bigg/ \left[ 2 \sum_{j=1}^{n-2} k^4(j/p) \right]^{1/2},
$$

where $\rho_j(m, l)$ is the sample cross-correlation between $Z_t^m$ and $Z_{t-|j|}^l$, $t = R + 1, \ldots, N$. Under $H_0$ and a set of regularity conditions,

$$
M(m, l) \stackrel{d}{\to} N(0, 1)
$$

for any given $(m, l)$ as $n, R \to \infty$. The upper-tailed N(0, 1) critical values at the 10%, 5%, and 1% levels are 1.28, 1.65, and 2.33. As in the M1 test, Hong (2003) considered a data-driven method for $p$ that also involves the choice of a preliminary lag order $\bar{p}$.
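The definition of $M(m, l)$ above can be sketched numerically as follows. This is only an illustrative sketch, not the authors' implementation: the Bartlett kernel, the fixed (rather than data-driven) lag order `p`, the normalization of the sample cross-correlation, and the function names `bartlett_kernel`, `sample_cross_corr`, and `m_stat` are all assumptions made for the example.

```python
import numpy as np


def bartlett_kernel(x):
    """Bartlett kernel: k(x) = 1 - |x| for |x| <= 1, and 0 otherwise (assumed choice)."""
    x = np.abs(np.asarray(x, dtype=float))
    return np.where(x <= 1.0, 1.0 - x, 0.0)


def sample_cross_corr(a, b, j):
    """Sample cross-correlation between a_t and b_{t-j} for a lag j >= 1.

    Assumption: both series are centered at their sample means and the
    cross-covariance is divided by the full sample size n.
    """
    n = len(a)
    ac, bc = a - a.mean(), b - b.mean()
    cov = np.sum(ac[j:] * bc[: n - j]) / n
    return cov / np.sqrt(np.mean(ac**2) * np.mean(bc**2))


def m_stat(z, m, l, p, kernel=bartlett_kernel):
    """M(m, l) for generalized residuals z and a fixed lag order p > 1."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    zm, zl = z**m, z**l
    lags = np.arange(1, n)                      # j = 1, ..., n - 1
    k2 = kernel(lags / p) ** 2
    rho2 = np.zeros(n - 1)
    for j in lags[k2 > 0]:                      # skip lags with zero kernel weight
        rho2[j - 1] = sample_cross_corr(zm, zl, j) ** 2
    numerator = np.sum((n - lags) * k2 * rho2) - np.sum(k2)
    k4 = kernel(np.arange(1, n - 1) / p) ** 4   # j = 1, ..., n - 2
    return numerator / np.sqrt(2.0 * np.sum(k4))


if __name__ == "__main__":
    # Simulated iid U[0, 1] residuals, i.e., the case of an optimal density forecast.
    rng = np.random.default_rng(0)
    z_iid = rng.uniform(size=500)
    for order in [(1, 1), (2, 2), (1, 2), (2, 1)]:
        print(order, m_stat(z_iid, *order, p=10))
```

Each printed value would be compared with the upper-tailed N(0, 1) critical values; a value above the critical value suggests misspecification in the dynamic aspect associated with that choice of $(m, l)$.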

Although the moments of the generalized residuals $\{Z_t\}$ are not exactly the same as those of the original data $\{r_t\}$, they are highly correlated. In particular, the choices of $(m, l) = (1, 1)$, $(2, 2)$, $(3, 3)$, and $(4, 4)$ are very sensitive to autocorrelations in the level, volatility, skewness, and kurtosis of $\{r_t\}$, respectively. Furthermore, the choices of $(m, l) = (1, 2)$ and $(2, 1)$ are sensitive to the “ARCH-in-mean” effect and the “leverage” effect, respectively. Different choices of order $(m, l)$ can thus allow examination of various dynamic aspects of the underlying process.

[Received June 2002. Revised November 2003.]

REFERENCES

Aït-Sahalia, Y. (1996), “Testing Continuous-Time Models of the Spot Interest Rate,” Review of Financial Studies, 9, 385–426.

Aït-Sahalia, Y. (1999), “Transition Densities for Interest Rate and Other Nonlinear Diffusions,” Journal of Finance, 54, 1361–1395.

Aït-Sahalia, Y. (2002), “Maximum-Likelihood Estimation of Discretely Sampled Diffusions: A Closed-Form Approach,” Econometrica, 70, 223–262.

Aït-Sahalia, Y., and Lo, A. (1998), “Nonparametric Estimation of State-Price Densities Implicit in Financial Asset Prices,” Journal of Finance, 53, 499–547.

Andersen, T. G., and Lund, J. (1997), “Estimating Continuous Time Stochastic Volatility Models of the Short-Term Interest Rate,” Journal of Econometrics, 77, 343–378.

Ang, A., and Bekaert, G. (2002), “Regime Switches in Interest Rates,” Journal of Business & Economic Statistics, 20, 163–182.

Bali, T. (2000), “Modeling the Conditional Mean and Variance of the Short Rate Using Diffusion, GARCH, and Moving Average Models,” Journal of Futures Markets, 20, 717–751.

Bali, T. (2001), “Nonlinear Parametric Models of the Short-Term Interest Rate,” unpublished manuscript, Baruch College.

Ball, C., and Torous, W. (1998), “Regime Shifts in Short-Term Riskless Interest Rates,” unpublished manuscript, University of California, Los Angeles.

Bera, A. K., and Ghosh, A. (2002), “Density Forecast Evaluation With Applications in Finance,” unpublished manuscript, University of Illinois at Champaign, Dept. of Economics.

Berkowitz, J. (2001), “Testing the Accuracy of Density Forecasts With Applications to Risk Management,” Journal of Business & Economic Statistics, 19, 465–474.

Bliss, R., and Smith, D. (1998), “The Elasticity of Interest Rate Volatility: Chan, Karolyi, Longstaff, and Sanders Revisited,” unpublished manuscript, Federal Reserve Bank of Atlanta.

Brenner, R., Harjes, R., and Kroner, K. (1996), “Another Look at Alternative Models of Short-Term Interest Rate,” Journal of Financial and Quantitative Analysis, 31, 85–107.

Chacko, G., and Viceira, L. M. (2003), “Spectral GMM Estimation of Continuous-Time Processes,” Journal of Econometrics, 116, 259–292.

Chan, K. C., Karolyi, G. A., Longstaff, F. A., and Sanders, A. B. (1992), “An Empirical Comparison of Alternative Models of the Short-Term Interest Rate,” Journal of Finance, 47, 1209–1228.

Chapman, D., and Pearson, N. (2001), “Recent Advances in Estimating Models of the Term-Structure,” Financial Analysts Journal, 57, 77–95.

Christoffersen, P., and Diebold, F. X. (1996), “Further Results on Forecasting and Model Selection Under Asymmetric Loss,” Journal of Applied Econometrics, 11, 561–572.

Christoffersen, P., and Diebold, F. X. (1997), “Optimal Prediction Under Asymmetric Loss,” Econometric Theory, 13, 808–817.

Clements, M. P., and Smith, J. (2000), “Evaluating the Forecast Densities of Linear and Non-Linear Models: Applications to Output Growth and Unemployment,” Journal of Forecasting, 19, 255–276.

Conley, T. G., Hansen, L. P., Luttmer, E. G. J., and Scheinkman, J. A. (1997), “Short-Term Interest Rates as Subordinated Diffusions,” Review of Financial Studies, 10, 525–578.

Corradi, V. (2000), “Reconsidering the Continuous-Time Limit of the GARCH(1, 1) Process,” Journal of Econometrics, 96, 145–153.

Cox, J. C., Ingersoll, J. E., and Ross, S. A. (1985), “A Theory of the Term Structure of Interest Rates,” Econometrica, 53, 385–407.

Dai, Q., and Singleton, K. (2000), “Specification Analysis of Affine Term Structure Models,” Journal of Finance, 55, 1943–1978.

Das, S. R. (2002), “The Surprise Element: Jumps in Interest Rates,” Journal of Econometrics, 106, 27–65.

Diebold, F. X., Gunther, T. A., and Tay, A. S. (1998), “Evaluating Density Forecasts With Applications to Financial Risk Management,” International Economic Review, 39, 863–883.

Diebold, F. X., Hahn, J., and Tay, A. S. (1999), “Multivariate Density Forecast Evaluation and Calibration in Financial Risk Management: High-Frequency Returns of Foreign Exchange,” Review of Economics and Statistics, 81, 661–673.

Diebold, F. X., Tay, A. S., and Wallis, K. F. (1999), “Evaluating Density Forecasts of Inflation: The Survey of Professional Forecasters,” in Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W. J. Granger, eds. R. F. Engle and H. White, Oxford, U.K.: Oxford University Press, pp. 76–90.


Dothan, L. U. (1978), “On the Term Structure of Interest Rates,” Journal of Financial Economics, 6, 59–69.

Duffee, G. (2002), “Term Premia and Interest Rate Forecasts in Affine Models,” Journal of Finance, 57, 405–443.

Duffie, D., and Pan, J. (1997), “An Overview of Value at Risk,” Journal of Derivatives, 4, 13–32.

Durham, G. (2003), “Likelihood-Based Specification Analysis of Continuous-Time Models of the Short-Term Interest Rate,” Journal of Financial Economics, 70, 463–487.

Durlauf, S. (1991), “Spectral Based Testing of the Martingale Hypothesis,” Journal of Econometrics, 50, 355–376.

Fackler, P. L., and King, R. P. (1990), “Calibration of Option-Based Probability Assessments in Agricultural Commodity Markets,” American Journal of Agricultural Economics, 72, 73–83.

Gallant, A. R., and Tauchen, G. (1998), “Reprojecting Partially Observed Systems With Applications to Interest Rate Diffusions,” Journal of the American Statistical Association, 93, 10–24.

Ghysels, E., Carrasco, M., Chernov, M., and Florens, J. (2001), “Estimating Diffusions With a Continuum of Moment Conditions,” unpublished manuscript, University of North Carolina.

Granger, C. (1969), “Testing for Causality and Feedback,” Econometrica, 37, 424–438.

Granger, C. (1999), “The Evaluation of Econometric Models and of Forecasts,” invited lecture at the Far Eastern Meeting of the Econometric Society, Singapore.

Granger, C., and Pesaran, M. H. (2000), “A Decision Theoretic Approach to Forecasting Evaluation,” in Statistics and Finance: An Interface, eds. W. S. Chan, W. K. Li, and H. Tong, London: Imperial College, pp. 261–278.

Gray, S. (1996), “Modeling the Conditional Distribution of Interest Rates as a Regime-Switching Process,” Journal of Financial Economics, 42, 27–62.

Hamilton, J. D. (1989), “A New Approach to the Econometric Analysis of Nonstationary Time Series and the Business Cycle,” Econometrica, 57, 357–384.

Hong, Y. (1999), “Hypothesis Testing in Time Series via the Empirical Characteristic Function: A Generalized Spectral Density Approach,” Journal of the American Statistical Association, 94, 1201–1220.

Hong, Y. (2003), “Evaluation of Out-of-Sample Density Forecasts With Applications to Stock Prices,” unpublished manuscript, Cornell University, Dept. of Economics and Dept. of Statistical Science.

Hong, Y., and Lee, T. H. (2003a), “Inference on the Predictability of Foreign Exchange Rate Changes via Generalized Spectrum and Nonlinear Models,” Review of Economics and Statistics, 84, 1048–1062.

Hong, Y., and Lee, T. H. (2003b), “Diagnostic Checking for Nonlinear Time Series Models,” Econometric Theory, 19, 1065–1121.

Hong, Y., and Li, H. (2004), “Nonparametric Specification Testing for Continuous-Time Models With Applications to Interest Rate Term Structures,” Review of Financial Studies, forthcoming.

Jackwerth, J. C., and Rubinstein, M. (1996), “Recovering Probability Distributions From Option Prices,” Journal of Finance, 51, 1611–1631.

Jiang, G., and Knight, J. (1997), “A Nonparametric Approach to the Estimation of Diffusion Processes With an Application to a Short-Term Interest Rate Model,” Econometric Theory, 13, 615–645.

Johannes, M. (2003), “The Statistical and Economic Role of Jumps in Interest Rates,” Journal of Finance, 59, 227–260.

Jorion, P. (2000), Value at Risk: The New Benchmark for Managing Financial Risk, New York: McGraw-Hill.

Morgan, J. P. (1996), RiskMetrics – Technical Document (4th ed.), New York: Morgan Guaranty Trust Company.

Koedijk, K. G., Nissen, F. G., Schotman, P. C., and Wolff, C. C. P. (1997), “The Dynamics of Short-Term Interest Rate Volatility Reconsidered,” European Finance Review, 1, 105–130.

Li, H., and Xu, Y. (2000), “Regime Shifts and Short Rate Dynamics,” unpublished manuscript, Cornell University.

Lo, A., and MacKinlay, A. C. (1989), “Data-Snooping Biases in Tests of Financial Asset Pricing Models,” Review of Financial Studies, 3, 175–208.

Lukacs, E. (1970), Characteristic Functions (2nd ed.), London: Charles Griffin.

Meese, R., and Rogoff, K. (1983), “Empirical Exchange Rate Models of the Seventies: Do They Fit Out-of-Sample?” Journal of International Economics, 14, 3–24.

Nelson, D. (1990), “ARCH Models as Diffusion Approximation,” Journal of Econometrics, 45, 7–39.

Nelson, D. (1991), “Conditional Heteroskedasticity in Asset Returns: A New Approach,” Econometrica, 59, 347–370.

Rosenblatt, M. (1952), “Remarks on a Multivariate Transformation,” The Annals of Mathematical Statistics, 23, 470–472.

Rothschild, M., and Stiglitz, J. (1970), “Increasing Risk: I. A Definition,” Journal of Economic Theory, 2, 225–243.

Singleton, K. (2001), “Estimation of Affine Asset Pricing Models Using the Empirical Characteristic Function,” Journal of Econometrics, 102, 111–141.

Söderlind, P., and Svensson, L. (1997), “New Techniques to Extract Market Expectations From Financial Instruments,” Journal of Monetary Economics, 40, 383–429.

Stanton, R. (1997), “A Nonparametric Model of Term Structure Dynamics and the Market Price of Interest Rate Risk,” Journal of Finance, 52, 1973–2002.

Vasicek, O. (1977), “An Equilibrium Characterization of the Term Structure,” Journal of Financial Economics, 5, 177–188.

Watson, M. (1993), “Measures of Fit for Calibrated Models,” Journal of Political Economy, 101, 1011–1041.

White, H. (2000), “A Reality Check for Data Snooping,” Econometrica, 69, 1097–1127.
