
Exact likelihood inference for autoregressive

gamma stochastic volatility models1

Drew D. Creal

Booth School of Business, University of Chicago

First draft: April 10, 2012

Revised: September 12, 2012

Abstract

Affine stochastic volatility models are widely applicable and appear regularly in empir-

ical finance and macroeconomics. The likelihood function for this class of models is in

the form of a high-dimensional integral that does not have a closed-form solution and is

difficult to compute accurately. This paper develops a method to compute the likelihood

function for discrete-time models that is accurate up to computer tolerance. The key

insight is to integrate out the continuous-valued latent variance analytically leaving only

a latent discrete variable whose support is over the non-negative integers. The likelihood

can be computed using standard formulas for Markov-switching models. The maximum

likelihood estimator of the unknown parameters can therefore be calculated as well as the

filtered and smoothed estimates of the latent variance.

Keywords: stochastic volatility; Markov-switching; Cox-Ingersoll-Ross; affine models;

autoregressive-gamma process.

JEL classification codes: C32, G32.

1 I would like to thank Alan Bester, Chris Hansen, Robert Gramacy, Siem Jan Koopman, Neil Shephard, and Jing Cynthia Wu as well as seminar participants at the Triangle Econometrics Workshop (UNC Chapel Hill), University of Washington, Northwestern University Kellogg School of Management, and Erasmus Universiteit Rotterdam for their helpful comments. Correspondence: Drew D. Creal, University of Chicago Booth School of Business, 5807 South Woodlawn Avenue, Chicago, IL 60637. Email: [email protected]


1 Introduction

Stochastic volatility models are essential tools in finance and increasingly in macroeconomics

for modeling the uncertain variation of asset prices and macroeconomic time series. There

exists a literature in econometrics on estimation of both discrete and continuous-time stochastic

volatility (SV) models; see, e.g. Shephard (2005) for a survey. The challenge when estimating

SV models stems from the fact that the likelihood function of the model is a high-dimensional

integral that does not have a closed-form solution. The contribution of this paper is the

development of a method that calculates the likelihood function of the model exactly up to

computer tolerance for a class of SV models whose stochastic variance follows an autoregressive

gamma process, the discrete-time version of the Cox, Ingersoll, and Ross (1985) process. This

class of models is popular in finance because it belongs to the affine family which produces

closed-form formulas for derivatives prices; see, e.g. Heston (1993). A second contribution of

this paper is to illustrate how the ideas developed here can be applied to other affine models,

including stochastic intensity, stochastic duration, and stochastic transition models.

Linear, Gaussian state space models and finite state Markov-switching models are a corner-

stone of research in macroeconomics and finance because the likelihood function for these state

space models is known; see, e.g. Kalman (1960) and Hamilton (1989). SV models are non-

linear, non-Gaussian state space models which in general do not have closed-form likelihood

functions. This has led to a literature developing methods to approximate the maximum
likelihood estimator including importance sampling (Danielsson (1994), Durbin and Koopman
(1997), Durham and Gallant (2002)), particle filters (Fernández-Villaverde and Rubio-Ramírez

(2007)), and Markov-chain Monte Carlo (Jacquier, Johannes, and Polson (2007)). Alterna-

tively, estimation methods have also been developed that do not directly compute the likeli-

hood including Bayesian methods (Jacquier, Polson, and Rossi (1994), Kim, Shephard, and

Chib (1998)), the generalized method of moments, simulated method of moments (Duffie and

Singleton (1993)), efficient method of moments (Gallant and Tauchen (1996)), and indirect

inference (Gourieroux, Monfort, and Renault (1993)).

The results in this paper follow from the properties of the transition density of the Cox,

Ingersoll, and Ross (1985) (CIR) process, which in discrete time is known as the autoregressive

gamma (AG) process (Gourieroux and Jasiak (2006)). The transition density of the CIR/AG


process is a non-central gamma (non-central chi-squared up to scale) distribution, which is a

Poisson mixture of gamma random variables. The CIR/AG process has two state variables.

One is the continuous valued variance ht which is conditionally a gamma random variable and

the other is a discrete mixing variable zt which is conditionally a Poisson random variable.

The approach to computing the likelihood function here relies on two key insights about the

model. First, it is possible to integrate out the continuous-valued state variable ht analytically.

Second, an interesting feature of the CIR/AG process is that once one conditions on the discrete

mixing variable zt and the data yt, the variance ht is independent of its own leads and lags.

The combination of these properties means that the original SV model can be reformulated as

a state space model whose only state-variable is discrete-valued with support over the set of

non-negative integers. I show that the Markov transition distribution of the discrete variable

zt is a Sichel distribution (Sichel (1974, 1975)). Like the Poisson distribution from which it is

built, the Sichel distribution assigns increasingly small probability mass to large integers and

the probabilities eventually converge to zero. Although the support of the state variable zt is

technically infinite, it is finite for all practical purposes. In this paper, I propose to compute

the likelihood for the affine SV model by representing it as a finite state Markov-switching

model. The likelihood of a finite-state Markov switching model can be computed from known

recursions; see, e.g. Baum and Petrie (1966), Baum, Petrie, Soules, and Weiss (1970), and

Hamilton (1989).

In addition to computing the maximum likelihood estimator, I also show how to compute

moments and quantiles of the filtered and smoothed distributions of the latent variance ht.

The filtering distribution characterizes the information about the state variable at time t given

information up to and including time t whereas the smoothing distribution conditions on all

the available data. Solving the filtering and smoothing problem for the variance ht reduces to

a two step problem. First, the marginal filtering and smoothing distributions for the discrete

variable zt are calculated using Markov-switching algorithms. Then, the marginal filtering and

smoothing distributions of the variance ht conditional on zt and the data yt are shown to be

generalized inverse Gaussian (GIG) distributions whose moments can be computed analyti-

cally. Moments and quantiles of the filtering distribution for the continuous-variable ht can be

computed by averaging the moments and quantiles of the conditional GIG distribution by the

marginal probabilities of zt calculated from the Markov-switching algorithm.


Another method to approximate the likelihood and filtering distribution for this class of

models is the class of sequential Monte Carlo methods, also known as particle filters. Particle filters were

introduced into the economics literature by Kim, Shephard, and Chib (1998). Creal (2012a)

provides a recent review of the literature on particle filters. In this paper, I describe how the

filtering recursions based on the finite state Markov-switching model are related to the particle

filter. Essentially, the finite state Markov-switching model can be interpreted as a particle filter

but where there are no Monte Carlo methods involved.

This paper continues as follows. In Section 2, I describe the autoregressive gamma stochastic

volatility model and demonstrate how to reformulate it with the latent variance integrated out.

The model reduces to a Markov-switching model, whose transition distribution is characterized.

In Section 3, I provide the details of the filtering and smoothing algorithms and a discussion of

how to compute them efficiently. In Section 4, I compare standard particle filters to the new

algorithms and illustrate their relative accuracy. The new methods are then applied to several

financial time series. Section 5 concludes.

2 Autoregressive gamma stochastic volatility models

The discrete time autoregressive gamma stochastic volatility (AG-SV) model for an observable

time series yt is given by

$$y_t = \mu + \beta h_t + \sqrt{h_t}\,\varepsilon_t, \qquad \varepsilon_t \sim N(0, 1), \qquad (1)$$

$$h_t \sim \text{Gamma}(\nu + z_t,\ c), \qquad (2)$$

$$z_t \sim \text{Poisson}\left(\frac{\phi h_{t-1}}{c}\right), \qquad (3)$$

where µ controls the location, β determines the skewness, and φ is the first-order autocorrelation

of ht. The restriction φ < 1 is required for stationarity. The AG-SV model has two state

variables ht and zt, where the latter can be regarded as an “auxiliary” variable in the sense

that it is not of direct interest. From the relationship to the CIR process discussed below,

it is possible to deduce the restriction ν > 1 which is the well-known Feller condition that

guarantees that the process ht never reaches zero. The model is completed by specifying an

initial condition that $h_0$ is a draw from the stationary distribution $h_0 \sim \text{Gamma}\!\left(\nu, \frac{c}{1-\phi}\right)$ with mean $E[h_0] = \frac{\nu c}{1-\phi}$.
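Because (1)–(3) are specified through standard gamma and Poisson conditionals, the model is straightforward to simulate. The following sketch is not taken from the paper and the parameter values at the bottom are purely illustrative; it draws the variance path and the returns exactly as in (1)–(3), initializing $h_0$ from the stationary gamma distribution.

```python
import numpy as np

def simulate_agsv(T, mu, beta, nu, phi, c, seed=0):
    """Simulate the AG-SV model (1)-(3); a minimal illustrative sketch."""
    rng = np.random.default_rng(seed)
    y = np.empty(T)
    h = np.empty(T + 1)
    # initial condition: h_0 drawn from the stationary Gamma(nu, c / (1 - phi))
    h[0] = rng.gamma(shape=nu, scale=c / (1.0 - phi))
    for t in range(1, T + 1):
        z = rng.poisson(phi * h[t - 1] / c)              # eq. (3)
        h[t] = rng.gamma(shape=nu + z, scale=c)          # eq. (2)
        y[t - 1] = mu + beta * h[t] + np.sqrt(h[t]) * rng.standard_normal()  # eq. (1)
    return y, h[1:]

# illustrative (hypothetical) parameter values, not estimates from the paper
y, h = simulate_agsv(T=2500, mu=0.1, beta=-0.05, nu=1.5, phi=0.98, c=0.02)
```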

The stochastic process for ht given by (2) and (3) is known as the autoregressive gamma

(AG) process of Gourieroux and Jasiak (2006), who developed many of its properties. In

particular, they illustrated how it is related to the continuous-time model of Cox, Ingersoll,

and Ross (1985) for the short term interest rate. Consider an interval of time of length τ . The

autoregressive gamma process (2) and (3) converges in the continuous-time limit as τ → 0 to

$$dh_t = \kappa(\theta_h - h_t)\,dt + \sigma\sqrt{h_t}\,dW_t, \qquad (4)$$

where the discrete- and continuous-time parameters are related as $\phi = \exp(-\kappa\tau)$, $\nu = 2\kappa\theta_h/\sigma^2$, and $c = \sigma^2\left[1 - \exp(-\kappa\tau)\right]/(2\kappa)$. In the continuous-time parameterization, the unconditional

mean of the variance is $\theta_h$ and $\kappa$ controls the speed of mean reversion. The continuous-time

process (4) was originally analyzed by Feller (1951) and is sometimes called a Feller or square-

root process. The model is a staple of the literature in continuous-time finance because it is a

subset of the class of affine models, which have closed-form formulas for derivatives prices; see,

e.g. Heston (1993).
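The mapping between the discrete- and continuous-time parameterizations is simple to code directly; the helper below is a small sketch that converts $(\kappa, \theta_h, \sigma^2)$ into $(\phi, \nu, c)$ for a given interval $\tau$, and inverts the mapping.

```python
import numpy as np

def cir_to_ag(kappa, theta_h, sigma2, tau):
    """Map CIR parameters (kappa, theta_h, sigma^2) to AG parameters (phi, nu, c)."""
    phi = np.exp(-kappa * tau)
    nu = 2.0 * kappa * theta_h / sigma2
    c = sigma2 * (1.0 - np.exp(-kappa * tau)) / (2.0 * kappa)
    return phi, nu, c

def ag_to_cir(phi, nu, c, tau):
    """Invert the mapping: recover (kappa, theta_h, sigma^2) from (phi, nu, c)."""
    kappa = -np.log(phi) / tau
    sigma2 = 2.0 * kappa * c / (1.0 - phi)
    theta_h = nu * sigma2 / (2.0 * kappa)
    return kappa, theta_h, sigma2
```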

A stronger property than convergence in the continuous-time limit is that the AG and CIR

processes have the same transition density for any time interval τ . Integrating zt out of (2) and

(3), the transition density of ht is

$$p(h_t \mid h_{t-1}; \theta) = h_t^{\nu-1}\,\frac{1}{c^{\nu}}\,\exp\!\left(\frac{-(h_t + \phi h_{t-1})}{c}\right)\sum_{k=0}^{\infty}\left[\frac{h_t^{k}}{c^{k}\,\Gamma(\nu + k)}\,\frac{(\phi h_{t-1}/c)^{k}}{k!}\right], \qquad (5)$$

which is a non-central gamma (non-central chi-squared up to scale) distribution. The con-

ditional mean and variance of this distribution can easily be derived using properties of the

gamma and Poisson distributions and are

$$E[h_t \mid h_{t-1}] = \nu c + \phi h_{t-1}, \qquad (6)$$

$$V[h_t \mid h_{t-1}] = \nu c^2 + 2c\phi h_{t-1}. \qquad (7)$$

The conditional mean (6) is a linear function of ht−1, which implies that φ is the autocorrelation

parameter. The conditional variance (7) is also a linear function of ht−1 making the variance of


the variance change through time. This is a property of the AG/CIR model that the log-normal

SV model specified as

$$y_t = \gamma_y + \exp(h_t/2)\,\varepsilon_t, \qquad \varepsilon_t \sim N(0, 1),$$

$$h_t = \mu_y + \phi_y(h_{t-1} - \mu_y) + \sigma_y\eta_t, \qquad \eta_t \sim N(0, 1),$$

does not share; see, e.g. Shephard (2005). I let θ = (µ, β, ν, φ, c) denote the vector of unknown

parameters of the model. Throughout the paper, I use the notation xt−k:t to denote a sequence

of variables (xt−k, . . . , xt), which may be either observed or latent.

Estimating the parameter vector θ is challenging because the likelihood of the model p(y1:T ; θ)

is in the form of a high-dimensional integral

$$p(y_{1:T}; \theta) = \int_0^{\infty}\!\!\cdots\int_0^{\infty}\ \sum_{z_T=0}^{\infty}\cdots\sum_{z_1=0}^{\infty}\ \prod_{t=1}^{T} p(y_t \mid h_t; \theta)\,p(h_t \mid z_t; \theta)\,p(z_t \mid h_{t-1}; \theta)\,p(h_0; \theta)\,dh_0 \cdots dh_T.$$

For linear, Gaussian state space models and finite state Markov-switching models, this integral

is solved recursively beginning at the initial iteration using the prediction error decomposition;

see, e.g. Durbin and Koopman (2001) and Fruhwirth-Schnatter (2006). The solutions to

these integrals are the well-known Kalman filter and the filter for regime switching models; see

Kalman (1960) and Hamilton (1989), respectively.

It is the difficulty of computing this integral in closed-form for SV models that led researchers

to search for approximations to the likelihood function primarily by Monte Carlo methods.

The continuous-time CIR-SV model (possibly with jumps and leverage) has been estimated

in applications for the term-structure of interest rates and option pricing. Versions of the

continuous-time model appear in for example Eraker, Johannes, and Polson (2003), Chib, Pitt,

and Shephard (2006), and Jacquier, Johannes, and Polson (2007). Most work in the literature

estimates the continuous-time version of the model using an Euler discretization (conditional

normal approximation) for the dynamics of the variance instead of the exact non-central gamma

distribution.

Developing recursions that can accurately compute the likelihood function for the AG-SV

model relies on recognizing two key features of the model. First, it is possible to integrate the

variance ht out of the model analytically. The only unobserved state variable remaining is the


auxiliary variable zt, whose Markov transition distribution can be characterized. Secondly, the

AG-SV model has a unique dependence structure where ht depends on ht−1 only through zt.

Combining these features of the model makes the results of this paper possible. Integrating ht

out of the model is a result of the following proposition.

Proposition 1 The conditional likelihood p(yt|zt; θ), transition distribution p(zt|zt−1, y1:t−1; θ),

and initial distribution are

$$p(y_t \mid z_t; \theta) = \text{Normal Gamma}\left(\mu,\ \sqrt{\frac{2}{c} + \beta^2},\ \beta,\ \nu + z_t\right) \qquad (8)$$

$$p(z_t \mid z_{t-1}, y_{1:t-1}; \theta) = \text{Sichel}\left(\nu + z_{t-1} - \frac{1}{2},\ \frac{\phi}{c}(y_{t-1} - \mu)^2,\ \frac{c}{\phi}\left[\frac{2}{c} + \beta^2\right]\right) \qquad (9)$$

$$p(z_1; \theta) = \text{Negative Binomial}(\nu, \phi) \qquad (10)$$

Definitions for the distributions are available in Appendix A.1.

Proof: See Appendix A.2.
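Because the normal gamma density in (8) is, by construction, the gamma mixture of normals implied by (1)–(2), it can be evaluated, or at least cross-checked, by brute-force quadrature without the closed form from Appendix A.1. The sketch below is such a numerical check and is not the paper's implementation.

```python
import numpy as np
from scipy import integrate, stats

def p_y_given_z(y, z, mu, beta, nu, c):
    """p(y_t | z_t = z) = int_0^inf N(y; mu + beta*h, h) Gamma(h; nu + z, c) dh."""
    def integrand(h):
        if h <= 0.0:
            return 0.0
        return stats.norm.pdf(y, loc=mu + beta * h, scale=np.sqrt(h)) * \
               stats.gamma.pdf(h, a=nu + z, scale=c)
    val, _ = integrate.quad(integrand, 0.0, np.inf)
    return val

# illustrative values only
print(p_y_given_z(y=0.5, z=3, mu=0.1, beta=-0.05, nu=1.5, c=0.02))
```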

The proposition illustrates that the variance ht can always be integrated out of the model

analytically by conditioning on the data yt and the auxiliary variable zt from only a single

neighboring time period. This is due to the unique dependence structure of the AG-SV model

where ht depends on ht−1 only through zt and not directly. This dependence structure is

atypical for state space models with both discrete and continuous latent variables.

It is informative to juxtapose the dependence structure of the AG-SV model with the class of

linear, Gaussian state space models with Markov switching (see e.g. Shephard (1994), Kim and

Nelson (1999), and Fruhwirth-Schnatter (2006)), which are popular in economics. In the latter

class of models, the continuous-valued state variable remains dependent on its own lag even after

conditioning on the discrete variable. In order to make the distributions conditionally Gaussian

and consequently analytically integrable via the Kalman filter, one needs to condition on all

past discrete state variables at every iteration of the algorithm. The number of discrete state

variables necessary to be integrated out grows exponentially over time. Although the likelihood

can in principle be computed exactly, the computational demands of an exact solution increase

with time as the number of observations increases. This is not the case for models which have

a dependence structure like the AG-SV model. As ht only depends on zt and not additionally


on ht−1, the latent variance ht can be integrated out analytically at each time period without

having to condition on all past lags of zt.

Proposition 1 implies that the AG-SV model can be reduced to a new state space model

where the observation density p (yt|zt; θ) is a normal gamma (NG) distribution with a Markov-

switching state variable zt. The nomenclature for the normal gamma (NG) distribution used

here follows Barndorff-Nielsen and Shephard (2012) but this distribution is also known in the

literature as the variance gamma (VG) distribution as well as the generalized (asymmetric)

Laplace distribution; for the properties see, e.g. Kotz, Kozubowski, and Podgorski (2001). The

symmetric version (β = 0) of the distribution was introduced into economics by Madan and

Seneta (1990) and subsequently extended to the asymmetric distribution ($\beta \neq 0$) by Madan,

Carr, and Chang (1998). The NG model for asset returns (without the Markov-switching

variable zt) is popular among practitioners. Like the CIR-SV model, it has closed-form solutions

for option prices. It is often advocated as an alternative to the CIR-SV model. This result

illustrates how over discrete time intervals the CIR-SV model generalizes the NG model.

The state variable $z_t$ has a non-homogeneous Markov transition kernel $p(z_t \mid z_{t-1}, y_{t-1}; \theta)$ known as a Sichel distribution; see, e.g. Sichel (1974, 1975). A Sichel distribution is a Poisson

distribution whose mean is a random draw from a generalized inverse Gaussian (GIG) distri-

bution, see Appendix A.1 for details. The support of a Sichel random variable is therefore over

the set of non-negative integers.
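As a concrete illustration of this mixture representation, the sketch below evaluates a Sichel probability mass function by integrating a Poisson pmf against a GIG density. It uses scipy's geninvgauss parameterization of the GIG, which need not coincide with the parameterization in Appendix A.1, and the parameter values are purely illustrative.

```python
import numpy as np
from scipy import integrate, stats

def sichel_pmf(k, p, b, scale=1.0):
    """P(Z = k) where Z ~ Poisson(m) and m ~ GIG in scipy's geninvgauss(p, b, scale) form."""
    def integrand(m):
        return stats.poisson.pmf(k, m) * stats.geninvgauss.pdf(m, p, b, scale=scale)
    val, _ = integrate.quad(integrand, 0.0, np.inf)
    return val

# the pmf decays quickly, so the effective support is finite in practice
probs = np.array([sichel_pmf(k, p=1.0, b=2.0, scale=3.0) for k in range(60)])
print(probs.sum())  # approximately one once the truncated tail is negligible
```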

After reformulating the AG-SV model as a Markov switching model, the challenge is now to

compute the likelihood function p(y1:T ; θ) of an infinite dimensional Markov switching model.

Given an initial distribution p(z1; θ), a closed-form solution to this problem would recursively

solve the integrals

$$p(z_t = i \mid y_{1:t}; \theta) = \frac{p(y_t \mid z_t = i; \theta)\,p(z_t = i \mid y_{1:t-1}; \theta)}{\sum_{j=0}^{\infty} p(y_t \mid z_t = j; \theta)\,p(z_t = j \mid y_{1:t-1}; \theta)}, \qquad (11)$$

$$p(z_{t+1} = i \mid y_{1:t}; \theta) = \sum_{j=0}^{\infty} p(z_{t+1} = i \mid z_t = j, y_t; \theta)\,p(z_t = j \mid y_{1:t}; \theta). \qquad (12)$$

The contribution to the likelihood function for the current observation is the term $p(y_t \mid y_{1:t-1}; \theta) = \sum_{j=0}^{\infty} p(y_t \mid z_t = j; \theta)\,p(z_t = j \mid y_{1:t-1}; \theta)$ in the denominator of (11). The infinite sums over $z_t$ unfortunately do not have closed-form solutions. However, the stationarity of the model ensures that


there always exists an integer $Z_{\theta,y_t}$ such that the probability $p(z_t = Z_{\theta,y_t} \mid y_{1:t}; \theta)$ is zero up to

machine tolerance on a computer. For all practical purposes, the AG-SV model is a finite state

Markov switching model, which means that the likelihood as well as moments and quantiles

can be computed accurately to a high degree of tolerance. In section 3, I propose methods for

computing solutions for this model.

In addition to calculating the likelihood function, estimates of the latent variance ht are also

of interest and these are characterized by the posterior distributions p(ht|y1:t; θ), p(ht+1|y1:t; θ),and p(ht|y1:T ; θ). These distributions summarize the information about the latent state variable

ht based on different information sets. Estimates of the latent variance ht rely on the results

of the following proposition.

Proposition 2 Conditional on the discrete variable and the data, the predictive p(ht+1|z1:t, y1:t; θ),

marginal filtering p(ht|z1:t, y1:t; θ), and marginal smoothing p(ht|z1:T , y1:T ; θ), p(zt|h1:T , y1:T ; θ)

distributions as well as the distribution of the initial condition are

$$p(h_{t+1} \mid z_{1:t+1}, y_{1:t}; \theta) = \text{Gamma}\left(\nu - \tfrac{1}{2} + z_{t+1},\ c\right) \qquad (13)$$

$$p(h_t \mid z_{1:t}, y_{1:t}; \theta) = \text{GIG}\left(\nu + z_t - \tfrac{1}{2},\ (y_t - \mu)^2,\ \tfrac{2}{c} + \beta^2\right) \qquad (14)$$

$$p(h_t \mid z_{1:T}, y_{1:T}; \theta) = \text{GIG}\left(\nu + z_t + z_{t+1} - \tfrac{1}{2},\ (y_t - \mu)^2,\ \tfrac{2(1+\phi)}{c} + \beta^2\right) \qquad (15)$$

$$p(z_t \mid h_{1:T}, y_{1:T}; \theta) = \text{Bessel}\left(\nu - 1,\ 2\sqrt{\phi h_t h_{t-1}/c^2}\right) \qquad (16)$$

$$p(h_0 \mid z_{1:T}, y_{1:T}; \theta) = \text{Gamma}(\nu + z_1,\ c) \qquad (17)$$

Definitions for the generalized inverse Gaussian (GIG) and Bessel distributions are available

in Appendix A.1.

Proof: See Appendix A.3.

Conditional on the discrete mixing variable and the data, the marginal filtering and smooth-

ing distributions for the variance ht are either generalized inverse Gaussian (GIG) or gamma

distributions. Useful properties of GIG random variables that are needed for computational pur-

poses are discussed in Appendix A.1. The fact that the conditional distribution p(zt|h1:T , y1:T ; θ)

is a Bessel distribution was implicitly stated in Gourieroux and Jasiak (2006). This result


will not be used in this paper. However, this proposition states that all the conditional dis-

tributions of the latent variables given the observed variables are recognizable distributions.

This implies that the latent variables can easily be imputed using Gibbs sampling with no

Metropolis-Hastings steps, which is useful for any researchers interested in estimating the model

via Bayesian methods.

It is possible to extend the model to include more general dynamics and densities than

in (1)-(3), while still retaining tractability. The simplest and most important extension is to

allow either or both of the measurement and transition densities to depend on an additional

discrete state variable that takes on a finite set of possible values. For example, the model

can be extended to allow the measurement density (1) to be a finite mixture of normals or the

dynamics of the variance ht may be generalized to allow the transition densities to depend on

a finite state Markov-switching variable. Extensions of this kind can also be computed with no

loss in accuracy, although they do require additional computations.

Another important contribution of this paper is to recognize that the ideas developed here

can be used to analyze other models with continuous and discrete state variables whose de-

pendence structure is like the CIR/AG dynamics. It is only necessary to analytically calculate

the distributions p(yt|zt; θ) and p(zt|zt−1, y1:t−1; θ). For example, suppose the state variable ht

follows an autoregressive gamma process as in (2)-(3). Then, additional models for which it is

possible to calculate the likelihood function are when the distribution of p(yt|ht; θ) is Poisson

with time-varying mean ht, exponential with time-varying intensity ht, and gamma with time-

varying mean ht. The exponential model with time-varying intensity is an important model in

the literature on credit risk; see, e.g. Duffie and Singleton (2003) and Duffie, Eckner, Horel, and

Saita (2009). These results are developed further in Creal (2012b) who builds methods for Cox

processes with CIR intensities, marked Poisson processes with CIR intensities, and duration

models built from the AG process.

3 Filtering and smoothing recursions

In this section, I describe different methods for computing the likelihood as well as the filtered

and smoothed estimates of the state variables. First, I consider deterministic schemes based

on finite-state Markov switching models and then I discuss Monte Carlo methods based on the


particle filter.

3.1 Filtering recursions from finite-state Markov switching models

The infinite dimensional Markov switching model implied by the AG-SV model has an exact

filtering distribution described by the infinite dimensional vector p(zt|y1:t; θ). This vector can

be accurately represented using a finite dimensional $(Z_{\theta,y_t} + 1) \times 1$ vector of probabilities $p(z_t \mid y_{1:t}; \theta)$ whose points of support are the integers from 0 to $Z_{\theta,y_t}$. The Markov transition distribution $p(z_t \mid z_{t-1}, y_{1:t-1}; \theta)$ can also be represented by a $(Z_{\theta,y_t} + 1) \times (Z_{\theta,y_t} + 1)$ transition matrix. Both approximations will be equivalent to the estimates produced by a closed-form solution as $Z_{\theta,y_t} \to \infty$. Computing the probabilities for extremely large values of $z_t$ is, however, computationally wasteful as these values are numerically zero. To obtain a fixed precision across the parameter space, the integer $Z_{\theta,y_t}$ at which the distributions are truncated should be a function of the parameters of the model $\theta$ and the data $y_t$. A discussion of how to determine the value of $Z_{\theta,y_t}$ is provided further below. In what follows, I suppress the dependence of $Z_{\theta,y_t}$ on $\theta$ and $y_t$ for notational convenience.

The recursions for the marginal filtering and one-step ahead predictive distributions proceed

forward in time from t = 1, . . . , T as follows. The distributions for the initial iteration are

computed analytically in the following proposition, assuming that the initial condition is a

draw from the stationary distribution $h_0 \sim \text{Gamma}\!\left(\nu, \frac{c}{1-\phi}\right)$.

Proposition 3 The initial contribution to the likelihood p(y1; θ), the marginal filtering distri-

butions p(z1|y1; θ), p(h1|y1; θ), and the predictive distribution p(z2|y1; θ) are

$$p(y_1; \theta) = \text{Normal Gamma}\left(\mu,\ \sqrt{\frac{2(1-\phi)}{c} + \beta^2},\ \beta,\ \nu\right)$$

$$p(h_1 \mid y_1; \theta) = \text{GIG}\left(\nu - \tfrac{1}{2},\ (y_1 - \mu)^2,\ \tfrac{2(1-\phi)}{c} + \beta^2\right)$$

$$p(z_1 \mid y_1; \theta) = \text{Sichel}\left(\nu - \tfrac{1}{2},\ \tfrac{\phi}{c}(y_1 - \mu)^2,\ \tfrac{c}{\phi}\left[\tfrac{2(1-\phi)}{c} + \beta^2\right]\right)$$

$$p(z_2 \mid y_1; \theta) = \text{Sichel}\left(\nu - \tfrac{1}{2},\ \tfrac{\phi}{c}(y_1 - \mu)^2,\ \tfrac{c}{\phi}\left[\tfrac{2(1-\phi)}{c} + \beta^2\right]\right) \qquad (18)$$

Proof: See Appendix A.4.


After the first iteration, the distributions cannot be solved analytically because the integrals

involving zt have an unknown solution. In this section, I describe how the infinite dimensional

Markov-switching model can be approximated by a finite-state Markov-switching model. The

likelihood function can therefore be computed using known filtering recursions; see, e.g. Baum

and Petrie (1966), Baum, Petrie, Soules, and Weiss (1970), and Hamilton (1989). It is conse-

quently straightforward to estimate the unknown parameters θ by maximum likelihood. The

marginal filtering p(zt|y1:t; θ), one-step ahead predictive p(zt+1|y1:t; θ), and smoothing distribu-

tions p(zt|y1:T ; θ) can be also computed from these recursions.

Proposition 3 provides a way to initialize the filtering algorithm for a finite-state Markov

switching model with Z + 1 states. Using the Sichel distribution (18), compute the (Z + 1)× 1

vector of predictive probabilities p(z2|y1; θ). Then, for t = 2, . . . , T , compute the filtering

probabilities p(zt|y1:t; θ) and the predictive probabilities p(zt+1|y1:t; θ) recursively as

$$p(z_t = i \mid y_{1:t}; \theta) = \frac{p(y_t \mid z_t = i; \theta)\,p(z_t = i \mid y_{1:t-1}; \theta)}{\sum_{j=0}^{Z} p(y_t \mid z_t = j; \theta)\,p(z_t = j \mid y_{1:t-1}; \theta)} \qquad (19)$$

$$p(z_{t+1} = i \mid y_{1:t}; \theta) = \sum_{j=0}^{Z} p(z_{t+1} = i \mid z_t = j, y_t; \theta)\,p(z_t = j \mid y_{1:t}; \theta) \qquad (20)$$

The term $p(y_t \mid y_{1:t-1}; \theta) = \sum_{j=0}^{Z} p(y_t \mid z_t = j; \theta)\,p(z_t = j \mid y_{1:t-1}; \theta)$ is the contribution to the likelihood function for the current observation. The log-likelihood function to be maximized is given by

$$\log p(y_{1:T}; \theta) = \sum_{t=2}^{T} \log p(y_t \mid y_{1:t-1}; \theta) + \log p(y_1; \theta).$$
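A compact implementation of the recursions (19)–(20) is sketched below. It assumes that the conditional likelihoods $p(y_t \mid z_t = j; \theta)$, the transition matrices $p(z_{t+1} = i \mid z_t = j, y_t; \theta)$, and the initial predictive probabilities have already been evaluated on the truncated grid $0, \ldots, Z$; evaluating those densities is model-specific and is not shown here.

```python
import numpy as np

def ms_filter(obs_lik, trans, init_pred):
    """Forward filter for a finite-state Markov-switching model, eqs. (19)-(20).

    obs_lik[t, j]  = p(y_t | z_t = j)              for t = 0, ..., T-1
    trans[t, i, j] = p(z_{t+1} = i | z_t = j, y_t)
    init_pred[j]   = initial predictive probabilities for z at t = 0
    Returns the filtered probabilities, predictive probabilities, and log-likelihood.
    """
    T, Z1 = obs_lik.shape
    filt = np.zeros((T, Z1))
    pred = np.zeros((T, Z1))
    pred[0] = init_pred
    loglik = 0.0
    for t in range(T):
        joint = obs_lik[t] * pred[t]           # numerator of (19)
        lik_t = joint.sum()                    # p(y_t | y_{1:t-1}), likelihood contribution
        loglik += np.log(lik_t)
        filt[t] = joint / lik_t                # eq. (19)
        if t + 1 < T:
            pred[t + 1] = trans[t] @ filt[t]   # eq. (20)
    return filt, pred, loglik
```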

Further discussion on Markov-switching algorithms can be found in Hamilton (1994), Cappe,

Moulines, and Ryden (2005), and Fruhwirth-Schnatter (2006).

The Markov switching algorithms provide estimates of the auxiliary variable zt but in appli-

cations interest centers on estimation of the latent variance ht through filtering and smoothing

algorithms similar to the linear, Gaussian state space model; see, e.g. Durbin and Koopman

(2001). The marginal filtering and smoothing distributions for ht are not known analytically

but moments and quantiles of the distributions can be accurately computed. Computing these


values starts by decomposing the joint filtering distribution as

p(ht, zt|y1:t; θ) = p(ht|y1:t, zt; θ)p(zt|y1:t; θ) ≈ p(ht|yt, zt; θ)p(zt|y1:t; θ).

Moments and quantiles of the marginal distribution p(ht|y1:t; θ) can consequently be computed

by weighting the expectations of the conditional distribution p(ht|zt, yt; θ) by the discrete prob-

abilities p(zt|y1:t; θ) calculated from the Markov switching algorithm. Estimates of the moments

of the marginal filtering and predictive distributions for the variance are

$$E[h_t^{\alpha} \mid y_{1:t}; \theta] = \sum_{z_t=0}^{Z} E[h_t^{\alpha} \mid y_t, z_t; \theta]\, p(z_t \mid y_{1:t}; \theta) \qquad (21)$$

$$E[h_{t+1}^{\alpha} \mid y_{1:t}; \theta] = \sum_{z_{t+1}=0}^{Z} E[h_{t+1}^{\alpha} \mid z_{t+1}; \theta]\, p(z_{t+1} \mid y_{1:t}; \theta) \qquad (22)$$

where the orders of integration for zt and ht have been exchanged. In these expressions, the

conditional expectations E [hαt |yt, zt; θ] and E[hαt+1|zt+1; θ

]are with respect to the GIG and

gamma distributions from Proposition 2. These can be calculated analytically for any value

of zt. As Z → ∞, the solution converges to the exact values whereas numerically accurate

estimates can be computed for finite values of $Z$ due to the fact that the probabilities $p(z_t \mid y_{1:t}; \theta)$ converge to zero for large integer values.
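The conditional expectations in (21) are moments of the GIG distribution in (14). The sketch below computes them with the standard moment formula $E[X^{\alpha}] = (a/b)^{\alpha/2}\, K_{p+\alpha}(\sqrt{ab})/K_p(\sqrt{ab})$ for a GIG variable with density proportional to $x^{p-1}\exp\{-(a/x + bx)/2\}$; the assumption that this convention matches the GIG notation of Proposition 2 is mine, since Appendix A.1 is not reproduced here.

```python
import numpy as np
from scipy.special import kv  # modified Bessel function of the second kind

def gig_moment(alpha, p, a, b):
    """E[X^alpha] for X ~ GIG with density proportional to x^(p-1) exp(-(a/x + b*x)/2).
    Note: for very large orders kv overflows; a scaled or recursive evaluation
    (as discussed in Section 3.3) would be needed in that regime."""
    s = np.sqrt(a * b)
    return (a / b) ** (alpha / 2.0) * kv(p + alpha, s) / kv(p, s)

def filtered_h_moment(alpha, y_t, filt_probs, mu, beta, nu, c):
    """Mixture moment (21): weight the conditional GIG moments implied by eq. (14)
    by the filtered probabilities p(z_t = z | y_{1:t})."""
    a = (y_t - mu) ** 2
    b = 2.0 / c + beta ** 2
    z = np.arange(filt_probs.shape[0])
    return np.sum(gig_moment(alpha, nu + z - 0.5, a, b) * filt_probs)
```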

If interest centers on the quantiles of the marginal filtering or predictive distributions of the

variance, these can be computed analogously as

$$P(h_t < x \mid y_{1:t}; \theta) = \sum_{z_t=0}^{Z} \left[\int_0^x p(h_t \mid y_t, z_t; \theta)\,dh_t\right] p(z_t \mid y_{1:t}; \theta) \qquad (23)$$

$$P(h_{t+1} < x \mid y_{1:t}; \theta) = \sum_{z_{t+1}=0}^{Z} \left[\int_0^x p(h_{t+1} \mid z_{t+1}; \theta)\,dh_{t+1}\right] p(z_{t+1} \mid y_{1:t}; \theta) \qquad (24)$$

where the terms in brackets are the cumulative distribution functions of the GIG and gamma

distributions from Proposition 2. For a given quantile and with a fixed value of $p(z_t \mid y_{1:t}; \theta)$ or $p(z_{t+1} \mid y_{1:t}; \theta)$, the value of $x$ in (23) and (24) can be determined with a simple zero-finding

algorithm. Similar ideas can be applied to estimate moments and quantiles of the marginal

smoothing distributions, which are also GIG and gamma distributions as shown in Proposition


2. Further details are discussed in Section 3.2.
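Because the mixture c.d.f. in (23) is monotone in $x$, a bracketing root-finder is sufficient for the quantiles. The sketch below uses scipy's geninvgauss c.d.f.; the mapping from the paper's GIG notation to scipy's $(p, b, \text{scale})$ parameterization is again an assumption on my part.

```python
import numpy as np
from scipy import optimize, stats

def gig_cdf(x, p, a, b):
    """CDF of a GIG with density prop. to x^(p-1) exp(-(a/x + b*x)/2), via geninvgauss."""
    return stats.geninvgauss.cdf(x, p, np.sqrt(a * b), scale=np.sqrt(a / b))

def filtered_h_quantile(q, y_t, filt_probs, mu, beta, nu, c, upper=50.0):
    """Solve P(h_t < x | y_{1:t}) = q for x using the mixture CDF in eq. (23).
    The bracket `upper` must be large enough that the mixture CDF exceeds q there."""
    a = (y_t - mu) ** 2
    b = 2.0 / c + beta ** 2
    z = np.arange(filt_probs.shape[0])

    def objective(x):
        return np.sum(gig_cdf(x, nu + z - 0.5, a, b) * filt_probs) - q

    return optimize.brentq(objective, 1e-10, upper)
```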

At this point, it is important to note that the prediction step in (20) is a vector-matrix

multiplication which is an O(Z2) operation. Fortunately, the Markov transition distribution

p(zt|zt−1, y1:t−1; θ) is a sparse matrix. In addition, many of the entries in the vector of filtered

probabilities p(zt|y1:t; θ) are close to zero. This feature of the model means that the O(Z2)

computation in the prediction step of the algorithm can be avoided. When computing the

vector-matrix multiplication in the prediction step, it is only necessary to compute those ele-

ments in each row times column operation that contribute to the sum in (20). In other words,

once the sum in (20) has already converged after computing a subset of the states zt = j or if

any combination of the states $i$ and $j$ in the product $p(z_{t+1} = i \mid z_t = j, y_{1:t-1}; \theta)\,p(z_t = j \mid y_{1:t}; \theta)$ are numerically zero, then it is unnecessary to compute the probability for other transitions

whose probabilities are known to be smaller. In addition, each row times column operation in

the vector-matrix multiplication of the prediction step can be computed in parallel on multi-

core processors. The sparse nature of the transition matrix and the ability to compute the

operations in parallel make computing the likelihood easily feasible for macroeconomic and

financial data sets using daily, monthly, or quarterly data.
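A simple version of this idea is sketched below: source states whose filtered probability is numerically zero are skipped entirely, and within each remaining column only the non-negligible transition probabilities are accumulated. This is an illustration of the principle rather than the author's implementation.

```python
import numpy as np

def sparse_predict(filt, trans, tol=1e-14):
    """Prediction step (20) that skips numerically-zero terms.

    filt[j]     = p(z_t = j | y_{1:t})
    trans[i, j] = p(z_{t+1} = i | z_t = j, y_t)
    """
    pred = np.zeros_like(filt)
    for j in np.nonzero(filt > tol)[0]:   # skip negligible source states
        col = trans[:, j]
        nz = col > tol                    # skip negligible transition entries
        pred[nz] += col[nz] * filt[j]
    return pred
```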

An important practical issue is the choice of the threshold Z that determines where the

infinite sums are truncated. There are several strategies one can take. Here, I propose a

procedure to determine a global value of Z for a given value of θ that works well in practice.

The idea is to choose Z such that the likelihood contribution

$$p(y_t \mid y_{1:t-1}; \theta) = \sum_{j=0}^{Z} p(y_t \mid z_t = j; \theta)\, p(z_t = j \mid y_{1:t-1}; \theta) \qquad (25)$$

is guaranteed to converge numerically for all time periods. It suffices to bound the time period

which needs the largest value of Z necessary for the integral (25) to converge as it will converge

for all other time periods for this same value of Z as well. Determining the largest value of

Z that is necessary to bound the integral corresponds to the date with the largest posterior

mass for the filtering distribution p(zt|y1:t; θ). Given a data set and a value θ, it is simple to

determine the conditional likelihood p(yt|zt; θ) which gives the most weight to the largest value

of zt. As a function of yt, it is maximized for the largest absolute value of yt in the dataset. The

challenge is to determine an approximation to the distribution p(zt = j|y1:t−1; θ) in (25) that


will weight the conditional likelihood. A simple solution is to start with the negative binomial

(stationary) distribution of zt for the parameters θ and compute two iterations of the filtering

algorithm using the likelihood p(yt|zt; θ) for the largest absolute value of yt for both iterations.

This produces an approximation to the unknown distribution p(zt = j|y1:t−1; θ) in the integral

(25). The value of Z can then be selected as the integer where this sum converges.

3.2 Marginal smoothing recursion

Objects of independent interest are smoothed, or two-sided, estimators of the latent variance. It

is also possible to recursively compute the marginal smoothing distributions for p(zt|y1:T ; θ) and

moments and quantiles of the marginal distribution p(ht|y1:T ; θ). The solution for smoothing

is analogous to the solution for forward filtering. First, the joint distribution can be decom-

posed as p(h0:T , z1:T |y1:T ; θ) = p(h0:T |z1:T , y1:T ; θ)p(z1:T |y1:T ; θ) and the smoothing algorithms

for discrete-state Markov switching models can be applied to calculate the marginal probabil-

ities p(zt|y1:T ; θ). Moments and quantiles of the marginal smoothing distribution p(ht|y1:T ; θ)

can be computed using the results from Proposition 2.

First, the filtering algorithm (19) and (20) is run forward in time and the one-step ahead

predictive probabilities and the conditional likelihoods p(yt|zt; θ) are stored for t = 1, . . . , T .

The backwards smoothing algorithm is initialized using the final iteration’s filtering probabilities

p(zT |y1:T ). For t = T − 1, . . . , 1, the smoothing distributions for zt are computed as

$$p(z_t = i \mid z_{t+1} = j, y_{1:T}; \theta) = \frac{p(y_t \mid z_t = i; \theta)\, p(z_t = i \mid z_{t+1} = j, y_{1:t+1}; \theta)}{\sum_{k=0}^{Z} p(y_t \mid z_t = k; \theta)\, p(z_t = k \mid z_{t+1} = j, y_{1:t+1}; \theta)} \qquad (26)$$

$$p(z_t = i, z_{t+1} = j \mid y_{1:T}; \theta) = p(z_{t+1} = j \mid y_{1:T}; \theta)\, p(z_t = i \mid z_{t+1} = j, y_{1:T}; \theta) \qquad (27)$$

$$p(z_t = i \mid y_{1:T}; \theta) = \sum_{j=0}^{Z} p(z_{t+1} = j \mid y_{1:T}; \theta)\, p(z_t = i \mid z_{t+1} = j, y_{1:T}; \theta) \qquad (28)$$

Computation of (26) also requires an O(Z2) operation on the backwards pass. This can be

reduced substantially by taking into account the sparse matrix structure of the transition

probabilities.
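A minimal sketch of such a backward pass is given below. Rather than coding (26) literally, it uses a standard decomposition of the bivariate smoothed probabilities in terms of the stored filtered and predictive probabilities and the transition matrices; it is presented as a generic finite-state smoother, not as the author's exact implementation.

```python
import numpy as np

def ms_smoother(filt, pred, trans):
    """Backward smoothing pass for a finite-state switching model.

    filt[t, i]     = p(z_t = i | y_{1:t})        (forward filter output)
    pred[t, i]     = p(z_t = i | y_{1:t-1})      (one-step-ahead predictive)
    trans[t, i, j] = p(z_{t+1} = i | z_t = j, y_t)
    Returns smooth[t, i] = p(z_t = i | y_{1:T}) and
            biv[t, i, j] = p(z_t = i, z_{t+1} = j | y_{1:T}).
    """
    T, Z1 = filt.shape
    smooth = np.zeros_like(filt)
    biv = np.zeros((T - 1, Z1, Z1))
    smooth[-1] = filt[-1]
    for t in range(T - 2, -1, -1):
        # ratio p(z_{t+1} = j | y_{1:T}) / p(z_{t+1} = j | y_{1:t}), guarding empty states
        ratio = np.where(pred[t + 1] > 0.0,
                         smooth[t + 1] / np.maximum(pred[t + 1], 1e-300), 0.0)
        # p(z_t = i, z_{t+1} = j | y_{1:T}) = filt[t, i] * p(z_{t+1}=j | z_t=i, y_t) * ratio[j]
        biv[t] = filt[t][:, None] * trans[t].T * ratio[None, :]
        smooth[t] = biv[t].sum(axis=1)
    return smooth, biv
```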

This algorithm also computes the bivariate smoothing probabilities p (zt = i, zt+1 = j|y1:T ; θ)

which are used to calculate the moments and quantiles of the marginal smoothing distribution


for the variance

$$E[h_t^{\alpha} \mid y_{1:T}] = \sum_{z_t=0}^{Z}\ \sum_{z_{t+1}=0}^{Z} E[h_t^{\alpha} \mid y_t, z_t, z_{t+1}]\; p(z_t, z_{t+1} \mid y_{1:T}; \theta)$$

$$P(h_t < x \mid y_{1:T}; \theta) = \sum_{z_t=0}^{Z}\ \sum_{z_{t+1}=0}^{Z} \left[\int_0^x p(h_t \mid y_t, z_t, z_{t+1}; \theta)\,dh_t\right] p(z_t, z_{t+1} \mid y_{1:T}; \theta)$$

Conditional expectations and quantiles of the marginal smoothing distribution p(ht|yt, zt, zt+1; θ)

are calculated from the properties of the GIG and gamma distributions of Proposition 2.

Smoothed estimates of the variance are of independent interest but the algorithm can also

be used as part of the expectation-maximization (EM) algorithm of Dempster, Laird, and

Rubin (1977) as an alternative approach to calculating the maximum likelihood estimator.

3.3 Filtering recursions based on particle filters

An alternative method for calculating the likelihood function and filtering distributions for the

AG-SV model is the class of sequential Monte Carlo algorithms, also known as particle filters. Particle

filters approximate distributions whose support is infinite by a finite set of points and probability

masses; see Creal (2012a) for a recent survey. After integrating out the state variable ht, it is

possible to design a particle filter for the AG-SV model that is similar to the filtering algorithm

for finite state Markov switching models with a few key differences. First, the methods differ

in how the points of support are determined. A typical particle filter will select the support

points for zt at random via Monte Carlo draws while the finite state Markov-switching model

in the previous section has a deterministic set of points of support. Secondly, the particle filter

will approximate some probabilities via Monte Carlo that the Markov-switching algorithm will

compute exactly from the definition of the probability mass functions involved in the recursions.

The particle filter therefore introduces some variability due to Monte Carlo error that is not

present in the deterministic scheme. In this section, I describe the similarities between a particle

filter for the AG-SV model and the filter for finite state Markov-switching models as well as

some of the relative advantages and disadvantages.

The first step to obtaining a particle filter that is analogous to the Markov-switching model is

to analytically integrate out any continuous-valued state variables such as the variance ht. This

16

Page 17: Exact likelihood inference for autoregressive gamma .../media/Documents/... · Stochastic volatility models are essential tools in nance and increasingly in macroeconomics for modeling

is simply an application of Proposition 1 and is known as Rao-Blackwellisation, e.g. Chen and

Liu (2000). Analytical integration of any state variable always decreases the Monte Carlo vari-

ance of a particle filter; see, e.g. Chopin (2004). Next, the filtering distribution p(zt−1|y1:t−1; θ) is

represented by a collection of “particles” given by{w

(i)t−1, z

(i)t−1

}Z+1

i=1. The values z

(i)t−1 are locations

on the support of the distribution and the weights w(i)t−1 represent the probability mass at that

point. When the support of the state variable is continuous, the probability that two particles

take on the same value is zero but this is not true when the support is discrete. If the particles

in the particle filter are initialized at random with replacement, some integers may be dupli-

cated while other support points are randomly omitted. Conversely, in the filter for Markov-

switching models, the locations on the support of the distribution are the integers z(i)t−1 = i− 1

for i = 1 . . . , Z + 1 and the weights are the probabilities w(i)t−1 = p(zt−1 = i− 1|y1:t−1; θ). Each

“particle” in the Markov-switching algorithm is assigned a unique integer and no integers are

omitted smaller than Z.

At each iteration of a particle filtering algorithm, the particles’ locations and weights are up-

dated from one time period $\{w_{t-1}^{(i)}, z_{t-1}^{(i)}\}_{i=1}^{Z}$ to the next time period $\{w_{t}^{(i)}, z_{t}^{(i)}\}_{i=1}^{Z}$ using Monte Carlo methods. The transitions for each particle from $z_{t-1}^{(i)}$ to $z_{t}^{(i)}$ are determined randomly by drawing from an importance distribution $q(z_t^{(i)} \mid y_{t-1}, z_{t-1}^{(i)}; \theta)$ and calculating the unnormalized importance weights as

$$w_t^{(i)} \propto w_{t-1}^{(i)}\;\frac{p(y_t \mid z_t^{(i)}; \theta)\,p(z_t^{(i)} \mid z_{t-1}^{(i)}, y_{t-1}; \theta)}{q(z_t^{(i)} \mid y_{t-1}, z_{t-1}^{(i)}; \theta)}.$$

The distribution $q(z_t^{(i)} \mid y_{t-1}, z_{t-1}^{(i)}; \theta)$ is chosen by the user. The new set of particles $\{w_t^{(i)}, z_t^{(i)}\}_{i=1}^{Z}$ form a discrete distribution that approximates the next filtering distribution $p(z_t \mid y_{1:t}; \theta)$. It has

been argued in the particle filtering literature by Fearnhead and Clifford (2003) that when the

support of the state variable is discrete a particle filter should not allow particles to duplicate

transitions. To see this, consider two particles that transitioned to the same integer: each

one will receive half the total probability mass at that location. However, it is equivalent to

have a single particle at that same location receiving all the mass at that point. Duplicating

transitions is redundant and computationally wasteful as any additional particle should go to

a new integer location that has yet to be explored by any other particle. This is essentially


what the filtering algorithm for Markov-switching models of Section 3.1 is doing. It forces each

particle to transition deterministically through all of the possible future Z states creating a

total of $Z^2$ unique combinations, $z_t^{(i)} = j$, $z_{t-1}^{(i)} = k$, and no duplications.2

By comparing the filtering algorithm of the previous section with a particle filter that uses

Rao-Blackwellisation, it is easy to see that the algorithms share many similarities. The particle

filter has the advantage that one does not need to choose a value of Z to truncate the infinite

sums as this is endogenously determined. However, the finite dimensional Markov switching

algorithm of the previous section does not introduce Monte Carlo variability.

Finally, we note that there exists one important computational constraint that needs to be

considered for both methods. The densities p(yt|zt; θ) and p(zt|zt−1, yt−1; θ) are both functions

of the modified Bessel function of the second kind $K_{\lambda + z_t}(x)$, where the order parameter $\lambda + z_t$

is a function of the discrete state variable. Unfortunately, this function cannot be evaluated

“directly” for arbitrary values of zt and a fixed argument x. Instead, all methods for calculating

modified Bessel functions in a computationally stable manner start by evaluating the function

twice at stable values for the order and then applying the recursion

$$K_{\lambda + z_t + 1}(x) = K_{\lambda + z_t - 1}(x) + \frac{2(\lambda + z_t)}{x}\,K_{\lambda + z_t}(x).$$

This has an important practical ramification for methods that randomly sample zt such as the

particle filter. Even if an integer value of zt is not drawn at random, the probabilities associated

with it will likely need to be computed and discarded during the algorithm. Consequently,

for a given number of particles (support points), a particle filter that fully marginalizes out

zt−1 at each iteration will typically have to perform at least as many computations as the

deterministic algorithm of Section 3.1. This appears to be computationally wasteful as any

additional computations are better spent on increasing the value of Z in the deterministic

algorithm. It is for this reason as well as the lack of Monte Carlo variation that the empirical

section of this paper focuses on the filtering algorithms for finite-state Markov switching models.
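A hedged sketch of this approach is given below: the first two orders are evaluated directly with scipy's kv and the remaining orders are filled in by the three-term recursion. For very large orders or arguments the values overflow in double precision, so production code would typically work with scaled or logged Bessel functions; that refinement is omitted here.

```python
import numpy as np
from scipy.special import kv

def bessel_k_orders(lam, x, zmax):
    """Return K_{lam + z}(x) for z = 0, ..., zmax via the upward recursion
    K_{v+1}(x) = K_{v-1}(x) + (2 v / x) K_v(x), started from two direct evaluations."""
    out = np.empty(zmax + 1)
    out[0] = kv(lam, x)
    if zmax >= 1:
        out[1] = kv(lam + 1.0, x)
    for z in range(1, zmax):
        v = lam + z
        out[z + 1] = out[z - 1] + (2.0 * v / x) * out[z]
    return out

# quick check against direct evaluation at a moderate order (illustrative values)
vals = bessel_k_orders(lam=0.75, x=2.0, zmax=10)
print(vals[10], kv(10.75, 2.0))
```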

2 Particle filters also include a resampling step, which is not present in the deterministic filtering algorithm. When the state variable is discrete-valued, the only role played by the resampling step within a particle filter is to ensure that the number of particles does not explode.


4 Application of the new methods

4.1 Comparisons with the particle filter

In this section, I compare the log-likelihood functions calculated from the new algorithm with

a standard particle filtering algorithm to demonstrate their relative accuracy. The algorithms

are compared by running them on the S&P500 series discussed further below in Section 4.2.

All filtering algorithms were run with the parameter values fixed at the ML estimates from

the next section. I run the filtering algorithm of Section 3.1 with the truncation parameter

set at Z = 3500. For comparison purposes, I implement a particle filter with the transition

density (5) as a proposal where the particles are resampled at random times according to the

effective sample size (ESS), see Creal (2012a). I use the residual resampling algorithm of Liu

and Chen (1998), which is known to be unbiased. This is a simple extension of the original

particle filter of Gordon, Salmond, and Smith (1993).3 The new methods are compared to

a simple particle filter in order to demonstrate the accuracy of a typical particle filter which

is commonly used in practice. Secondly, it is valuable to compare the new approach which

requires the evaluation of the modified Bessel function to compute the conditional likelihood

p(yt|zt; θ) to another algorithm that does not.

To compare the methods, I compute slices of the log-likelihood function for the parameters φ

and ν over the regions [0.97, 0.999] and [1.0, 2.2], respectively. Cuts of the log-likelihood function

for the remaining parameters (µ, β, c) are similar and are available in an online appendix. Each

of these regions is divided into 1000 equally spaced points, where the log-likelihood function

is evaluated. While the values of φ and ν change one at a time, the remaining parameters are

held fixed at their ML estimates. The log-likelihood functions from the particle filter are shown

for different numbers of particles N = 10000, N = 30000 and N = 100000. These values were

selected because they are representative of values of N used throughout the literature. One

limitation of the particle filter that was raised early in that literature can be seen from Figure 1.

The particle filter does not produce an estimate of the log-likelihood function that is smooth in

the parameter space. Consequently, standard derivative-based optimization routines may have

trouble converging to the maximum. The smoothness of the log-likelihood function produced

3 I implemented additional particle filtering algorithms including the auxiliary particle filter of Pitt and Shephard (1999). The results were practically identical.


[Figure 1]

Figure 1: Slices of the log-likelihood function for φ (left) and ν (right) for the new algorithm (blue line) versus the particle filter (red dots). The particle sizes are N = 10000 (top row), N = 30000 (middle row), and N = 100000 (bottom row). The vertical (green) lines are the ML estimates of the model reported in Table 1.


by the new approach is an attractive feature. It makes calculation of the ML estimates easily

feasible as shown in the next section.

Secondly, in this application, there is a sizeable downward bias in the particle filter’s es-

timates of the log-likelihood function as well as a reasonably large amount of Monte Carlo

variability in the estimates. It is known that the particle filter’s estimator of the likelihood

function is unbiased, as long as the resampling algorithm used within a particle filter is un-

biased. The particle filter’s estimate of the likelihood is also consistent and asymptotically

normal; see, e.g. Chapter 11 of Del Moral (2004). However, taking logarithms causes the par-

ticle filter’s estimate to be biased downward for a finite number of particles due to Jensen’s

inequality.

4.2 Applications to financial data

In this section, I illustrate the new methods on several daily financial time series. These

include the S&P 500 index, the MSCI-Emerging Markets Asia index, and the Euro-to-U.S.

dollar exchange rate. The first two series were downloaded from Bloomberg and the latter

series was taken from the Board of Governors of the Federal Reserve. All series are from

January 3rd, 2000 to December 16th, 2011 making for 3009, 3118, and 3008 observations

for each series, respectively. Starting values for the parameters of the variance (φ, ν, c) were

obtained by matching the unconditional mean, variance, and persistence of average squared

returns to the unconditional distribution. The values of (µ, β) were initialized at zero. The

Feller condition ν > 1 as well as the constraints 0 < φ < 1 and c > 0 were imposed throughout

the estimation.

Estimates of the parameters of the discrete-time model as well as standard errors are re-

ported in Table 1. The standard errors are calculated by numerically inverting the Hessian

at the ML estimates. The implied parameters of the continuous-time model are also reported

in Table 1 assuming a discretization step size of τ = 1/256.4 For all series, the risk premium

4 If interest centers on the continuous-time model, it is possible to estimate the parameters θ = (µ, β, κ, θ_h, σ²). However, one needs to make an assumption about the integrated variance $\int_t^{t+\tau} h(s)\,ds$, which does not have a closed-form solution for the CIR model. A standard assumption in the literature, see for example Eraker, Johannes, and Polson (2003), is to sub-divide the time interval τ into M equal sub-intervals and approximate this integral by the Riemann sum $\int_t^{t+\tau} h(s)\,ds \approx \sum_{j=1}^{M} h_{t+j}\,\frac{\tau}{M}$. The discrete-time model here assumes M = 1 and τ = 1, whereas in most applications in continuous-time finance the parameters are reported on an annual basis such that τ = 1/256, where 256 is the number of trading days in a year.


Table 1: Maximum likelihood estimates for the AG-SV model.

                µ        β        φ        c         ν        θ_h     κ       σ²       log-like
S&P500          0.102    -0.061   0.988    0.015     1.539    1.815   3.168   7.470    -4542.1
                (0.020)  (0.018)  (0.004)  (0.003)   (0.193)
MSCI-EM-ASIA    0.250    -0.115   0.978    0.024     1.963    2.115   5.736   12.364   -5213.2
                (0.031)  (0.020)  (0.006)  (0.005)   (0.229)
Euro/$          0.073    -0.148   0.994    0.0007    3.740    0.442   1.489   0.359    -2862.2
                (0.022)  (0.058)  (0.004)  (0.0002)  (1.263)

Notes: Maximum likelihood estimates of the parameters θ = (µ, β, φ, c, ν) of the discrete-time AG-SV model on three data sets of daily returns. The series are the S&P500, the MSCI emerging markets Asia index, and the Euro/$ exchange rate. The data cover January 3, 2000 to December 16, 2011. Asymptotic standard errors are reported in parentheses. The continuous-time parameters are reported for intervals τ = 1/256.

parameters β are estimated to be negative and significant implying that the distribution of

returns is negatively skewed. Estimates of the autocorrelation parameter φ for the S&P500

and MSCI series are slightly smaller than is often reported in the literature for log-normal SV

models. The dynamics of volatility for the Euro/$ series are substantially different from the

other series. The mean θh of volatility is much lower and the volatility is more persistent.

Figure 2 contains output from the estimated model for the S&P500 series. The top left

panel is a plot of the filtered and smoothed estimates of the volatility√ht over the sample

period. The estimates are consistent with what one would expect from looking at the raw

return series. There are large increases in volatility during the U.S. financial crisis of 2008

followed by another recent spike in volatility during the European debt crisis.

To provide some information on how the truncation parameter Z impacts the estimates,

the top right panel of Figure 2 is a plot of the marginal filtering distributions for the discrete

mixing variable p(zt|y1:t; θ) on three different days. The three distributions that are pictured are

representative of the distribution p(zt|y1:t; θ) during low (6/27/2007), medium (10/2/2008), and

high volatility (12/1/2008) periods. The distribution of zt for the final date (12/1/2008) was

chosen because it is the day at which the mean of zt is estimated to be the largest throughout

the sample. Consequently, it is the distribution where the truncation will have the largest

impact. This time period also corresponds to the largest estimated value of the variance ht.

The graph illustrates visually that the impact of truncating the distribution is negligible as the


[Figure 2 about here]

Figure 2: Estimation results for the AG-SV model on the S&P500 index from 1/3/2000 to 12/16/2011. Top left: filtered and smoothed estimates of the volatility $\sqrt{h_t}$. Top right: marginal probability mass functions $p(z_t|y_{1:t};\theta)$ on three different dates (6/27/2007, 10/2/2008, and 12/1/2008). Bottom left: forecasting distribution (mean, median, and 95% interval) of the variance $h_t$ for 100 days beginning on 7/29/2005. Bottom right: forecasting distribution of the variance $h_t$ for 100 days beginning on 11/28/2008.

truncation point is far into the tails of the distribution.5

In Section 3.3, I argued that the only way to compute the log-likelihood function more

accurately was to increase the value of Z. To quantify this further, the algorithms were run

again and the log-likelihood functions as well as the c.d.f.s for zt were calculated for a series of

5 Technically, the accuracy of the computations is limited by the accuracy of the modified Bessel function of the second kind, which is typically around $10^{-14}$ or $10^{-15}$ depending on the point at which it is evaluated.


Table 2: Cumulative distribution functions P(z_t|y_{1:t}; θ) for different values of Z.

            P(z_t ≤ 1500|y_{1:t}; θ)   P(z_t ≤ 2000|y_{1:t}; θ)   P(z_t ≤ 2500|y_{1:t}; θ)          log-like
Z = 2500      0.9960571656208360         0.9999956716590525         1.0000000000000000     -4542.0625588879939
Z = 3000      0.9960571624858776         0.9999956685945898         0.9999999986272218     -4542.0625588934108
Z = 3500      0.9960571624854725         0.9999956685941842         0.9999999986268223     -4542.0625588934117
Z = 5000      0.9960571624854725         0.9999956685941842         0.9999999986268223     -4542.0625588934117

This table contains the cumulative distribution functions of the filtering distributions P(z_t|y_{1:t}; θ) and the log-likelihood as a function of the truncation parameter for Z = 2500, 3000, 3500, 5000 on 12/1/2008. This is for the S&P 500 dataset. The log-likelihood values reported in the table are for the entire sample.

different truncation values on the S&P 500 dataset. The cumulative distribution functions for

the filtering distribution P (zt|y1:t; θ) on 12/1/2008 and the overall log-likelihoods are reported

in Table 2 for values of Z = 2500, 3000, 3500, 5000. The first row of the table contains the

cumulative probabilities for values of zt = 1500, 2000, 2500 when the algorithm was run with

Z = 2500. When the value of Z is set at 2500, there is 0.9960571656208360 cumulative

probability to the left of zt = 1500. When zt = 2500 and Z = 2500, the c.d.f. reaches a value

of one due to self-normalizing the probabilities. The purpose of the table is to illustrate that

the value of the log-likelihood function converges (numerically) between the value of Z = 3000

and Z = 3500. Therefore, it is possible to calculate the log-likelihood function of the model

accurately for reasonable values of Z that are computationally tractable for daily frequency

data.

An important feature of the AG-SV model that is useful in practice is that forecasts of

the variance can be computed accurately without simulation. These H-step ahead forecasts

of the variance are produced by iterating on the vector of probabilities p(zt+1|y1:t; θ) from the

last iteration of the Markov-switching algorithm using the transition distribution of the latent

variable $z_t$. When there are no observations available out of sample, the transition distribution of $z_t$ was shown by Gourieroux and Jasiak (2006) to be a negative binomial distribution, $p(z_t|z_{t-1};\theta) = \text{Neg. Bin.}\left(\nu + z_{t-1}, \frac{\phi}{1+\phi}\right)$. Starting with the filtering distribution $p(z_T|y_{1:T};\theta)$ at the final time period and iterating for $H$ periods on the transition distribution produces $p(z_{t+H}|y_{1:t};\theta)$. This distribution can be combined with the conditional distribution of the variance at any horizon, $p(h_{t+H}|z_{t+H};\theta) = \text{Gamma}(\nu + z_{t+H}, c)$, to construct forecasting intervals.

For example, to calculate quantiles of the H-step ahead forecasting distribution, one can solve


for $x$ given a fixed value of $Z$ and the ML estimator $\theta$ in the equation
\[
P(h_{t+H} < x \mid y_{1:t};\theta) = \sum_{z_{t+H}=0}^{Z}\left[\int_0^x p(h_{t+H}\mid z_{t+H};\theta)\,dh_{t+H}\right]p(z_{t+H}\mid y_{1:t};\theta) \qquad (29)
\]

using a simple zero-finding algorithm.
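As an illustration of this forecasting recursion, the sketch below (Python with NumPy/SciPy) builds a truncated negative binomial transition matrix, iterates it H times from the filtered probabilities, and solves equation (29) with a bracketing zero-finder. It is a minimal sketch rather than the paper's code: the vector p_zT holding $p(z_T|y_{1:T};\theta)$ on the grid $0,\dots,Z$ is assumed to come from the Markov-switching filter, and the mapping of Neg. Bin.$(\nu+z_{t-1}, \phi/(1+\phi))$ into SciPy's nbinom parameterization is spelled out in the comments.

    import numpy as np
    from scipy import stats, optimize

    def forecast_cdf(x, p_zT, H, nu, c, phi):
        # P(h_{T+H} < x | y_{1:T}; theta) as in equation (29), truncated at Z = len(p_zT) - 1.
        Z = len(p_zT) - 1
        z = np.arange(Z + 1)
        # Transition p(z_t | z_{t-1}) = Neg. Bin.(nu + z_{t-1}, phi/(1+phi)); SciPy's nbinom
        # takes the number of successes and the success probability, here 1/(1+phi).
        P = stats.nbinom.pmf(z[None, :], nu + z[:, None], 1.0 / (1.0 + phi))
        P /= P.sum(axis=1, keepdims=True)            # renormalize the truncated rows
        p = np.asarray(p_zT, dtype=float)
        for _ in range(H):                           # iterate the chain H periods ahead
            p = p @ P
        # Mix the conditional Gamma(nu + z, scale c) c.d.f.s over p(z_{T+H} | y_{1:T})
        return np.sum(p * stats.gamma.cdf(x, a=nu + z, scale=c))

    def forecast_quantile(q, p_zT, H, nu, c, phi, upper=100.0):
        # Solve P(h_{T+H} < x | y_{1:T}) = q for x; 'upper' must bracket the quantile.
        return optimize.brentq(lambda x: forecast_cdf(x, p_zT, H, nu, c, phi) - q,
                               1e-12, upper)

Evaluating forecast_quantile at q = 0.025, 0.5, and 0.975 for each horizon H = 1, ..., 100 produces bands of the kind shown in the bottom panels of Figure 2.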

The bottom two plots in Figure 2 are forecasts of the future variance $h_t$ beginning on two different dates, 7/29/2005 and 11/28/2008. Forecasts for the mean, median, and 95% intervals for $h_t$ are produced for H = 100 days. These dates were selected to illustrate the substantial

difference in both asymmetry and uncertainty in the forecasts during low and high periods

of volatility. When volatility is low (7/29/2005), the forecasting distribution of ht is highly

asymmetric and the mean and median of the distribution differ in economically important

ways. Conversely, when volatility is high (11/28/2008), the distribution of the variance is approximately normal. The difference in the width of the 95% error bands between the two dates also illustrates how much more uncertainty exists in the financial markets during a crisis.

5 Conclusion

In this paper, I developed methods that can compute the log-likelihood function of discrete-

time autoregressive gamma stochastic volatility models exactly up to computer tolerance. The

new algorithms are based on the insight that it is possible to integrate out the latent variance

analytically leaving only a discrete mixture variable. The discrete variable is defined over

the set of non-negative integers but for practical purposes it is possible to approximate the

distributions involved by a finite-dimensional Markov switching model. Consequently, the log-

likelihood function can be computed using standard algorithms for Markov-switching models.

Furthermore, filtered and smoothed estimates of the latent variance can easily be computed as

a by-product.

Using the unique conditional independence structure highlighted in Proposition 2, it is also

easy to draw samples from the joint distribution p(h0:T , z1:T |y1:T , θ) and thus the marginal dis-

tribution of the variance. This follows by decomposing the joint distribution into a conditional

and a marginal as p(h0:T , z1:T |y1:T ; θ) = p(h0:T |y1:T , z1:T ; θ)p(z1:T |y1:T ; θ). A draw is taken from

the marginal distribution $z_{1:T} \sim p(z_{1:T}|y_{1:T};\theta)$ using standard results on Markov-switching algorithms. Then, conditional on the draw of $z_{1:T}$, a draw is taken from $h_{0:T} \sim p(h_{0:T}|y_{1:T}, z_{1:T};\theta)$

which are the distributions from Proposition 2. Algorithms that draw samples from the joint

distribution of the latent variables given the data are known as simulation smoothers or alter-

natively as forward-filtering backward sampling (FFBS) algorithms; see, e.g. Carter and Kohn

(1994), Fruhwirth-Schnatter (1994), de Jong and Shephard (1995) and Durbin and Koopman

(2002) for linear, Gaussian models and Chib (1996) for Markov-switching models. Simulation smoothers are practically important as they allow for frequentist and Bayesian analysis of more

complex AG-SV models. In addition, they are useful for calculating moments and quantiles of

nonlinear functions of the state variable such as the volatility $\sqrt{h_t}$.
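As a concrete illustration of the conditional draw of the variance used by such a simulation smoother, the sketch below (Python with SciPy) samples $h_t \sim p(h_t|y_{1:T}, z_{1:T};\theta)$ at an interior date from the GIG smoothing distribution of Proposition 2. The mapping from the GIG$(\lambda,\chi,\psi)$ parameterization of Appendix A.1 to SciPy's geninvgauss is my own and is stated in the comments; the backward-sampling step for $z_{1:T}$ is assumed to have been carried out already.

    import numpy as np
    from scipy import stats

    def draw_gig(lam, chi, psi, rng):
        # GIG(lam, chi, psi) has density proportional to x^{lam-1} exp(-0.5 (chi/x + psi x)).
        # SciPy's geninvgauss(p, b) density is proportional to x^{p-1} exp(-0.5 b (x + 1/x)),
        # so set p = lam, b = sqrt(chi psi), and rescale by sqrt(chi/psi).  (Requires chi > 0.)
        return stats.geninvgauss.rvs(lam, np.sqrt(chi * psi),
                                     scale=np.sqrt(chi / psi), random_state=rng)

    def draw_h_smoothed(y_t, z_t, z_next, mu, beta, c, phi, nu, rng):
        # Interior-date draw from Proposition 2:
        # h_t | y_t, z_t, z_{t+1} ~ GIG(nu + z_t + z_{t+1} - 1/2, (y_t - mu)^2, 2(1+phi)/c + beta^2)
        return draw_gig(nu + z_t + z_next - 0.5, (y_t - mu) ** 2,
                        2.0 * (1.0 + phi) / c + beta ** 2, rng)

The initial and final dates are handled analogously with the Gamma and GIG distributions given in Proposition 2.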

References

Abramowitz, M. and I. A. Stegun (1964). Handbook of Mathematical Functions. New York, NY: Dover Publications Inc.
Barndorff-Nielsen, O. E. (1978). Hyperbolic distributions and distributions on hyperbolae. Scandinavian Journal of Statistics 5(3), 151–157.
Barndorff-Nielsen, O. E. and N. Shephard (2012). Basics of Levy processes. Working paper, Oxford-Man Institute, University of Oxford.
Baum, L. E. and T. P. Petrie (1966). Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics 37(1), 1554–1563.
Baum, L. E., T. P. Petrie, G. Soules, and N. Weiss (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics 41(6), 164–171.
Cappe, O., E. Moulines, and T. Ryden (2005). Inference in Hidden Markov Models. New York: Springer Press.
Carter, C. K. and R. Kohn (1994). On Gibbs sampling for state space models. Biometrika 81(3), 541–553.
Chen, R. and J. S. Liu (2000). Mixture Kalman filters. Journal of the Royal Statistical Society, Series B 62(3), 493–508.
Chib, S. (1996). Calculating posterior distributions and modal estimates in Markov mixture models. Journal of Econometrics 75(1), 79–97.
Chib, S., M. K. Pitt, and N. Shephard (2006). Likelihood inference for diffusion driven state space models. Working paper, Department of Economics, Nuffield College, University of Oxford.
Chopin, N. (2004). Central limit theorem for sequential Monte Carlo and its application to Bayesian inference. The Annals of Statistics 32(6), 2385–2411.
Cox, J. C., J. E. Ingersoll, and S. A. Ross (1985). A theory of the term structure of interest rates. Econometrica 53(2), 385–407.
Creal, D. D. (2012a). A survey of sequential Monte Carlo methods for economics and finance. Econometric Reviews 31(3), 245–296.
Creal, D. D. (2012b). Maximum likelihood estimation of affine stochastic intensity models. Working paper, University of Chicago, Booth School of Business.
Danielsson, J. (1994). Stochastic volatility in asset prices: estimation with simulated maximum likelihood. Journal of Econometrics 61(1-2), 375–400.
de Jong, P. and N. Shephard (1995). The simulation smoother for time series models. Biometrika 82(2), 339–350.
Del Moral, P. (2004). Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. New York, NY: Springer.
Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39(1), 1–38.
Duffie, D., A. Eckner, G. Horel, and L. Saita (2009). Frailty correlated default. The Journal of Finance 64(5), 2089–2123.
Duffie, D. and K. J. Singleton (1993). Simulated moments estimation of Markov models of asset prices. Econometrica 61(4), 929–952.
Duffie, D. and K. J. Singleton (2003). Credit Risk. Princeton, NJ: Princeton University Press.
Dugue, D. (1941). Sur un nouveau type de courbe de frequence. C.R. Academie Science Paris 213, 634–635.
Durbin, J. and S. J. Koopman (1997). Monte Carlo maximum likelihood estimation of non-Gaussian state space models. Biometrika 84(3), 669–684.
Durbin, J. and S. J. Koopman (2001). Time Series Analysis by State Space Methods. Oxford, UK: Oxford University Press.
Durbin, J. and S. J. Koopman (2002). A simple and efficient simulation smoother for state space time series analysis. Biometrika 89(3), 603–616.
Durham, G. and A. R. Gallant (2002). Numerical techniques for maximum likelihood estimation of continuous-time diffusion processes (with discussion). Journal of Business and Economic Statistics 20(3), 297–338.
Eraker, B., M. Johannes, and N. G. Polson (2003). The impact of jumps in returns and volatility. The Journal of Finance 58(3), 1269–1300.
Fearnhead, P. and P. Clifford (2003). On-line inference for hidden Markov models via particle filters. Journal of the Royal Statistical Society, Series B 65(4), 887–899.
Feller, W. S. (1951). Two singular diffusion problems. Annals of Mathematics 54(1), 173–182.
Fernandez-Villaverde, J. and J. F. Rubio-Ramirez (2007). Estimating macroeconomic models: a likelihood approach. The Review of Economic Studies 74(4), 1059–1087.
Fruhwirth-Schnatter, S. (1994). Data augmentation and dynamic linear models. Journal of Time Series Analysis 15(2), 183–202.
Fruhwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models. New York, NY: Springer Press.
Gallant, A. R. and G. Tauchen (1996). Which moments to match? Econometric Theory 12, 65–81.
Gordon, N. J., D. J. Salmond, and A. F. M. Smith (1993). A novel approach to nonlinear and non-Gaussian Bayesian state estimation. IEE Proceedings, Part F: Radar and Sonar Navigation 140(2), 107–113.
Gourieroux, C. and J. Jasiak (2006). Autoregressive gamma processes. Journal of Forecasting 25(2), 129–152.
Gourieroux, C., A. Monfort, and E. Renault (1993). Indirect inference. Journal of Applied Econometrics 8(S), 85–118.
Hamilton, J. D. (1989). A new approach to the econometric analysis of nonstationary time series and the business cycle. Econometrica 57(2), 357–384.
Hamilton, J. D. (1994). Time Series Analysis. Princeton, NJ: Princeton University Press.
Heston, S. L. (1993). A closed-form solution for options with stochastic volatility with applications to bond and currency options. The Review of Financial Studies 6(2), 327–343.
Jacquier, E., M. Johannes, and N. G. Polson (2007). MCMC maximum likelihood for latent state models. Journal of Econometrics 137(2), 615–640.
Jacquier, E., N. G. Polson, and P. E. Rossi (1994). Bayesian analysis of stochastic volatility models (with discussion). Journal of Business and Economic Statistics 12(4), 371–417.
Kalman, R. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, Transactions of the ASME 82(1), 35–46.
Kim, C.-J. and C. R. Nelson (1999). State Space Models with Regime Switching: Classical and Gibbs Sampling Approaches with Applications. Cambridge, MA: MIT Press.
Kim, S., N. Shephard, and S. Chib (1998). Stochastic volatility: likelihood inference and comparison with ARCH models. The Review of Economic Studies 65(3), 361–393.
Kotz, S., T. J. Kozubowski, and K. Podgorski (2001). The Laplace Distribution and Generalizations. Boston, MA: Birkhauser.
Liu, J. S. and R. Chen (1998). Sequential Monte Carlo methods for dynamic systems. Journal of the American Statistical Association 93(443), 1032–1044.
Madan, D. B., P. P. Carr, and E. C. Chang (1998). The variance gamma process and option pricing. Review of Finance 2(1), 79–105.
Madan, D. B. and E. Seneta (1990). The variance gamma (V.G.) model for share market returns. Journal of Business 63(4), 511–524.
Pitt, M. K. and N. Shephard (1999). Filtering via simulation: auxiliary particle filters. Journal of the American Statistical Association 94(446), 590–599.
Shephard, N. (1994). Partial non-Gaussian state space. Biometrika 81(1), 115–131.
Shephard, N. (2005). Stochastic Volatility: Selected Readings. Oxford, UK: Oxford University Press.
Sichel, H. S. (1974). On a distribution representing sentence-length in written prose. Journal of the Royal Statistical Society, Series A 135(1), 25–34.
Sichel, H. S. (1975). On a distribution law for word frequencies. Journal of the American Statistical Association 70(350), 542–547.
Slevinsky, R. M. and H. Safouhi (2010). A recursive algorithm for the G transformation and accurate computation of incomplete Bessel functions. Applied Numerical Mathematics 60, 1411–1417.

A Appendix:

A.1 Notation for the NG, GIG, Sichel and Bessel distributions

In this appendix, I define notation for the NG, GIG, Sichel and Bessel distributions that are used in the paper. The notation for the parameters of the distributions is local to this section of the appendix. A normal gamma random variable $Y \sim \text{N.G.}(\mu, \alpha, \beta, \nu)$ is obtained by taking a normal distribution $Y \sim N(\mu + \beta\sigma^2, \sigma^2)$ and allowing its variance to have a gamma distribution $\sigma^2 \sim \text{Gamma}\left(\nu, \frac{2}{\alpha^2-\beta^2}\right)$. The p.d.f. of a N.G. random variable is
\[
p(y|\mu,\alpha,\beta,\nu) = \left(\frac{\alpha^2-\beta^2}{2}\right)^{\nu}\frac{\sqrt{2}\,|y-\mu|^{\nu-\frac{1}{2}}\,K_{\nu-\frac{1}{2}}\left(\alpha|y-\mu|\right)}{\sqrt{\pi}\,\Gamma(\nu)\,\alpha^{\nu-\frac{1}{2}}}\exp\left(\beta(y-\mu)\right)
\]
where $K_{\lambda}(x)$ is the modified Bessel function of the second kind; see, e.g., Abramowitz and Stegun (1964). The mean and variance of the distribution are
\begin{align*}
E[y] &= E_{\sigma^2}\left[E\left(y|\sigma^2\right)\right] = E_{\sigma^2}\left[\mu + \beta\sigma^2\right] = \mu + \frac{2\beta\nu}{\alpha^2-\beta^2}\\
V[y] &= E\left[V\left(y|\sigma^2\right)\right] + V\left[E\left(y|\sigma^2\right)\right] = E\left[\sigma^2\right] + V\left[\mu + \beta\sigma^2\right] = \frac{2\nu}{\alpha^2-\beta^2} + \frac{4\nu\beta^2}{\left(\alpha^2-\beta^2\right)^2}
\end{align*}
see, e.g., Kotz, Kozubowski, and Podgorski (2001). The parameter $\beta$ controls the symmetry of the distribution such that $\beta < 0$ is left-skewed, $\beta > 0$ is right-skewed, and a value of $\beta = 0$ is symmetric.
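The following helper (Python with SciPy; an illustrative sketch of the formulas above rather than code from the paper) evaluates the N.G. log-density and checks the mean formula against a simulated normal-gamma mixture under some assumed parameter values.

    import numpy as np
    from scipy.special import kv, gammaln

    def ng_logpdf(y, mu, alpha, beta, nu):
        # Log of the N.G.(mu, alpha, beta, nu) density defined above (assumes y != mu).
        a = np.abs(y - mu)
        return (nu * np.log((alpha ** 2 - beta ** 2) / 2.0) + 0.5 * np.log(2.0)
                + (nu - 0.5) * np.log(a) + np.log(kv(nu - 0.5, alpha * a))
                - 0.5 * np.log(np.pi) - gammaln(nu) - (nu - 0.5) * np.log(alpha)
                + beta * (y - mu))

    # Monte Carlo check of E[y] = mu + 2 beta nu / (alpha^2 - beta^2)
    rng = np.random.default_rng(0)
    mu, alpha, beta, nu = 0.1, 2.0, -0.5, 1.5
    sig2 = rng.gamma(nu, 2.0 / (alpha ** 2 - beta ** 2), size=200_000)
    y = rng.normal(mu + beta * sig2, np.sqrt(sig2))
    print(y.mean(), mu + 2.0 * beta * nu / (alpha ** 2 - beta ** 2))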

A generalized inverse Gaussian random variable $X \sim \text{GIG}(\lambda, \chi, \psi)$ has p.d.f.
\[
p(x|\lambda,\chi,\psi) = \frac{\chi^{-\lambda}\left(\sqrt{\chi\psi}\right)^{\lambda}}{2K_{\lambda}\left(\sqrt{\chi\psi}\right)}\,x^{\lambda-1}\exp\left(-\frac{1}{2}\left[\chi x^{-1} + \psi x\right]\right)
\]
The GIG distribution was first proposed by E. Halphen under the pseudonym Dugue (1941) and it was popularized in statistics by Barndorff-Nielsen (1978). Moments of the GIG distribution are
\[
E\left[X^k\right] = \left(\frac{\chi}{\psi}\right)^{\frac{k}{2}}\frac{K_{\lambda+k}\left(\sqrt{\chi\psi}\right)}{K_{\lambda}\left(\sqrt{\chi\psi}\right)},
\]
which are needed throughout the paper for computations of the filtering and smoothing distributions for the latent variance. The c.d.f. of a GIG random variable, needed for the calculation of the quantiles, can be expressed in terms of the incomplete Bessel function. Slevinsky and Safouhi (2010) provide a routine for its calculation that does not use numerical integration.
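A minimal helper for these moments (Python with SciPy; an illustration that evaluates the Bessel ratio numerically rather than via the incomplete-Bessel routine of Slevinsky and Safouhi (2010)):

    import numpy as np
    from scipy.special import kve

    def gig_moment(k, lam, chi, psi):
        # E[X^k] = (chi/psi)^{k/2} K_{lam+k}(sqrt(chi psi)) / K_{lam}(sqrt(chi psi)).
        # kve returns exp(x) K_v(x); the scaling cancels in the ratio, which keeps the
        # computation stable when sqrt(chi psi) is large.
        eta = np.sqrt(chi * psi)
        return (chi / psi) ** (k / 2.0) * kve(lam + k, eta) / kve(lam, eta)

For example, the mean of the filtering distribution $p(h_t|z_{1:t}, y_{1:t};\theta)$ in Proposition 2 is gig_moment(1, nu + z_t - 0.5, (y_t - mu)**2, 2.0/c + beta**2).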

A random variable $Z \sim \text{Sichel}(\lambda, \chi, \psi)$ is obtained by taking a Poisson random variable $Z \sim \text{Poisson}(X)$ and allowing the mean $X$ to be a random draw from a GIG distribution $X \sim \text{GIG}(\lambda, \chi, \psi)$. The mass function for a Sichel random variable is
\[
p(z|\lambda,\chi,\psi) = \left(\sqrt{\frac{\psi}{\psi+2}}\right)^{\lambda}\left(\sqrt{\frac{\chi}{\psi+2}}\right)^{z}\frac{1}{z!}\frac{K_{\lambda+z}\left(\sqrt{\chi(\psi+2)}\right)}{K_{\lambda}\left(\sqrt{\chi\psi}\right)}, \qquad z \geq 0.
\]

The first two moments of a Sichel distribution are often reported incorrectly in the literature. They follow from the law of iterated expectations as
\begin{align*}
E[z] &= E_x\left[E(z|x)\right] = E_x[x] = \left(\frac{\chi}{\psi}\right)^{\frac{1}{2}}\frac{K_{\lambda+1}\left(\sqrt{\chi\psi}\right)}{K_{\lambda}\left(\sqrt{\chi\psi}\right)}\\
E\left[z^2\right] &= E_x\left[E\left(z^2|x\right)\right] = E_x\left[x + x^2\right] = \left(\frac{\chi}{\psi}\right)^{\frac{1}{2}}\frac{K_{\lambda+1}\left(\sqrt{\chi\psi}\right)}{K_{\lambda}\left(\sqrt{\chi\psi}\right)} + \left(\frac{\chi}{\psi}\right)\frac{K_{\lambda+2}\left(\sqrt{\chi\psi}\right)}{K_{\lambda}\left(\sqrt{\chi\psi}\right)}
\end{align*}

The probabilities of the Sichel distribution and the density of the NG distribution can be computed in a computationally stable manner using a recursive relationship for the ratio of modified Bessel functions,
\[
\frac{K_{\lambda+1}(x)}{K_{\lambda}(x)} = \frac{K_{\lambda-1}(x)}{K_{\lambda}(x)} + \frac{2\lambda}{x}.
\]
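A sketch of this idea (Python with SciPy; my own illustration, not the paper's implementation) propagates the ratio forward in the order and builds the Sichel probabilities $p(0), p(1), \dots, p(Z)$ with only two direct Bessel evaluations:

    import numpy as np
    from scipy.special import kve

    def sichel_pmf(Z, lam, chi, psi):
        # p(z | lam, chi, psi) for z = 0, ..., Z via the ratio recursion
        # K_{v+1}(x)/K_v(x) = K_{v-1}(x)/K_v(x) + 2v/x.
        u = np.sqrt(chi * (psi + 2.0))    # Bessel argument in the numerator of the pmf
        v = np.sqrt(chi * psi)            # Bessel argument in the denominator of the pmf
        m = np.sqrt(chi / (psi + 2.0))
        p = np.empty(Z + 1)
        # p(0) = (psi/(psi+2))^{lam/2} K_lam(u) / K_lam(v), using scaled Bessel functions
        p[0] = (psi / (psi + 2.0)) ** (lam / 2.0) * np.exp(v - u) * kve(lam, u) / kve(lam, v)
        r = kve(lam + 1.0, u) / kve(lam, u)          # r = K_{lam+1}(u) / K_{lam}(u)
        for z in range(Z):
            p[z + 1] = p[z] * m * r / (z + 1.0)      # p(z+1)/p(z) = m r / (z+1)
            r = 1.0 / r + 2.0 * (lam + z + 1.0) / u  # r becomes K_{lam+z+2}(u)/K_{lam+z+1}(u)
        return p

The same forward recursion on the ratio can be reused when the rows of the Sichel transition distribution in Proposition 1 are evaluated inside the Markov-switching filter.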

A Bessel random variable $S \sim \text{Bessel}(\kappa, \gamma)$ has probability mass function
\[
p(s|\kappa,\gamma) = \frac{1}{I_{\kappa}(\gamma)\,s!\,\Gamma(s+\kappa+1)}\left(\frac{\gamma}{2}\right)^{2s+\kappa}, \qquad s \geq 0
\]
where $I_{\kappa}(\gamma)$ is the modified Bessel function of the first kind; see, e.g., Abramowitz and Stegun (1964).
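These probabilities, which appear in Proposition 2 as the conditional distribution of $z_t$ given the variances, are conveniently evaluated in log space; the helper below is again only a sketch using SciPy's scaled Bessel function of the first kind.

    import numpy as np
    from scipy.special import ive, gammaln

    def bessel_logpmf(s, kappa, gam):
        # log p(s | kappa, gamma) = (2s + kappa) log(gamma/2) - log I_kappa(gamma)
        #                            - log s! - log Gamma(s + kappa + 1).
        # ive(kappa, x) = exp(-x) I_kappa(x) for x > 0, so add x back on the log scale.
        s = np.asarray(s, dtype=float)
        return ((2.0 * s + kappa) * np.log(gam / 2.0)
                - (np.log(ive(kappa, gam)) + gam)
                - gammaln(s + 1.0) - gammaln(s + kappa + 1.0))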


A.2 Proof of Proposition 1:

The conditional likelihood obtained by integrating the variance ht out of the original measurement equation is

\begin{align*}
p(y_t|z_t;\theta) &= \int_0^{\infty} p(y_t|h_t;\theta)\,p(h_t|z_t;\theta)\,dh_t\\
&= \int_0^{\infty}\frac{1}{\sqrt{2\pi}}h_t^{-1/2}\exp\left(-\frac{1}{2}\left[y_t-\mu-\beta h_t\right]^2h_t^{-1}\right)\frac{1}{\Gamma(\nu+z_t)}\left(\frac{1}{c}\right)^{\nu+z_t}h_t^{(\nu+z_t)-1}\exp\left(-\frac{h_t}{c}\right)dh_t\\
&= \left(\frac{1}{c}\right)^{\nu+z_t}\frac{\sqrt{2}\,|y_t-\mu|^{\nu+z_t-\frac{1}{2}}\,K_{\nu+z_t-\frac{1}{2}}\left(\sqrt{\frac{2}{c}+\beta^2}\,|y_t-\mu|\right)}{\sqrt{\pi}\,\Gamma(\nu+z_t)\left(\sqrt{\frac{2}{c}+\beta^2}\right)^{\nu+z_t-\frac{1}{2}}}\exp\left(\beta(y_t-\mu)\right)\\
&= \text{N.G.}\left(\mu,\ \sqrt{\frac{2}{c}+\beta^2},\ \beta,\ \nu+z_t\right)
\end{align*}

see Appendix A.1 for the definition of the density.
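As a quick numerical check of this expression (an illustration with assumed parameter values, not part of the paper), direct quadrature of the normal-gamma integrand can be compared with the closed form:

    import numpy as np
    from scipy import integrate, stats
    from scipy.special import kv, gammaln

    mu, beta, c, nu = 0.1, -0.5, 0.02, 1.5
    y_t, z_t = -0.8, 3

    def integrand(h):
        # p(y_t | h_t) p(h_t | z_t) with h_t | z_t ~ Gamma(nu + z_t, scale c)
        return (stats.norm.pdf(y_t, loc=mu + beta * h, scale=np.sqrt(h))
                * stats.gamma.pdf(h, a=nu + z_t, scale=c))

    numeric, _ = integrate.quad(integrand, 0.0, np.inf)

    shape, alpha, a = nu + z_t, np.sqrt(2.0 / c + beta ** 2), abs(y_t - mu)
    closed_form = ((1.0 / c) ** shape * np.sqrt(2.0) * a ** (shape - 0.5)
                   * kv(shape - 0.5, alpha * a) * np.exp(beta * (y_t - mu))
                   / (np.sqrt(np.pi) * np.exp(gammaln(shape)) * alpha ** (shape - 0.5)))
    print(numeric, closed_form)   # the two values agree to quadrature accuracy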

By Proposition 2 below, we know that $p(h_{t-1}|z_{t-1}, y_{1:t-1};\theta) = \text{GIG}\left(\nu+z_{t-1}-\frac{1}{2},\ (y_{t-1}-\mu)^2,\ \frac{2}{c}+\beta^2\right)$. The transition distribution of the discrete mixing variable conditional on observing $y_{1:t-1}$ is given by
\begin{align*}
p(z_t|z_{t-1},y_{1:t-1};\theta) &= \int_0^{\infty} p(z_t|h_{t-1};\theta)\,p(h_{t-1}|z_{t-1},y_{1:t-1};\theta)\,dh_{t-1}\\
&= \int_0^{\infty}\left(\frac{\phi h_{t-1}}{c}\right)^{z_t}\frac{1}{z_t!}\exp\left(-\frac{\phi h_{t-1}}{c}\right)\frac{\left[(y_{t-1}-\mu)^2\right]^{-\left(\nu+z_{t-1}-\frac{1}{2}\right)}\left(\sqrt{(y_{t-1}-\mu)^2\left(\frac{2}{c}+\beta^2\right)}\right)^{\nu+z_{t-1}-\frac{1}{2}}}{2K_{\nu+z_{t-1}-\frac{1}{2}}\left(\sqrt{(y_{t-1}-\mu)^2\left(\frac{2}{c}+\beta^2\right)}\right)}\\
&\qquad\times h_{t-1}^{\nu-\frac{1}{2}+z_{t-1}-1}\exp\left(-\frac{1}{2}\left[(y_{t-1}-\mu)^2h_{t-1}^{-1}+\left(\frac{2}{c}+\beta^2\right)h_{t-1}\right]\right)dh_{t-1}\\
&= \left(\frac{\frac{2}{c}+\beta^2}{\frac{2(1+\phi)}{c}+\beta^2}\right)^{\left(\nu+z_{t-1}-\frac{1}{2}\right)/2}\left(\frac{\phi}{c}\right)^{z_t}\left(\frac{|y_{t-1}-\mu|}{\sqrt{\frac{2(1+\phi)}{c}+\beta^2}}\right)^{z_t}\frac{1}{z_t!}\frac{K_{\nu+z_{t-1}+z_t-\frac{1}{2}}\left(\sqrt{(y_{t-1}-\mu)^2\left(\frac{2(1+\phi)}{c}+\beta^2\right)}\right)}{K_{\nu+z_{t-1}-\frac{1}{2}}\left(\sqrt{(y_{t-1}-\mu)^2\left(\frac{2}{c}+\beta^2\right)}\right)}\\
&= \text{Sichel}\left(\nu+z_{t-1}-\frac{1}{2},\ \frac{\phi}{c}(y_{t-1}-\mu)^2,\ \frac{c}{\phi}\left[\frac{2}{c}+\beta^2\right]\right)
\end{align*}
The Markov transition distribution of $z_t$ is non-homogeneous as it depends on the most recent observation $y_{t-1}$.

The initial distribution of z1 was given by Gourieroux and Jasiak (2006) but we provide it for completeness.


Recall that $h_0$ is a draw from the stationary distribution $\text{Ga}\left(\nu, \frac{c}{1-\phi}\right)$. The initial distribution of $z_1$ is
\begin{align*}
p(z_1;\theta) &= \int_0^{\infty} p(z_1|h_0;\theta)\,p(h_0;\theta)\,dh_0\\
&= \int_0^{\infty}\frac{1}{z_1!}\left(\frac{\phi h_0}{c}\right)^{z_1}\exp\left(-\frac{\phi h_0}{c}\right)\frac{h_0^{\nu-1}}{\Gamma(\nu)}\exp\left(-\frac{h_0(1-\phi)}{c}\right)\left(\frac{c}{1-\phi}\right)^{-\nu}dh_0\\
&= \frac{\Gamma(\nu+z_1)}{\Gamma(z_1+1)\Gamma(\nu)}(1-\phi)^{\nu}\phi^{z_1}\\
&= \text{Negative Binomial}(\nu, \phi)
\end{align*}
where $E[z_1] = \frac{\nu\phi}{1-\phi}$.

A.3 Proof of Proposition 2:

The one-step ahead predictive distribution $p(h_{t+1}|z_{1:t+1}, y_{1:t};\theta) = \text{Gamma}(\nu+z_{t+1}, c)$ follows immediately from the definition of the model. The marginal filtering distribution of the variance is
\begin{align*}
p(h_t|z_{1:t},y_{1:t};\theta) &\propto p(y_t|h_t;\theta)\,p(h_t|z_t;\theta)\\
&\propto h_t^{-1/2}\exp\left(-\frac{1}{2}\left[y_t-\mu-\beta h_t\right]^2h_t^{-1}\right)\frac{1}{\Gamma(\nu+z_t)}c^{-(\nu+z_t)}h_t^{(\nu+z_t)-1}\exp\left(-\frac{h_t}{c}\right)\\
&\propto h_t^{\nu-\frac{1}{2}+z_t-1}\exp\left(-\frac{1}{2}\left[(y_t-\mu)^2h_t^{-1}+\left(\frac{2}{c}+\beta^2\right)h_t\right]\right)\\
&= \text{GIG}\left(\nu+z_t-\frac{1}{2},\ (y_t-\mu)^2,\ \frac{2}{c}+\beta^2\right)
\end{align*}
The conditional distribution only depends on $z_t$ and $y_t$. The final distribution at time $t = T$ is just a special case of the former result,
\[
p(h_T|z_{1:T},y_{1:T};\theta) = \text{GIG}\left(\nu+z_T-\frac{1}{2},\ (y_T-\mu)^2,\ \frac{2}{c}+\beta^2\right).
\]

The marginal smoothing distribution for the variance is

\begin{align*}
p(h_t|z_{1:T},y_{1:T};\theta) &\propto p(y_t|h_t;\theta)\,p(h_t|z_t;\theta)\,p(z_{t+1}|h_t;\theta)\\
&\propto h_t^{-\frac{1}{2}}\exp\left(-\frac{1}{2}\left[y_t-\mu-\beta h_t\right]^2h_t^{-1}\right)\frac{1}{\Gamma(\nu+z_t)}c^{-(\nu+z_t)}h_t^{(\nu+z_t)-1}\exp\left(-\frac{h_t}{c}\right)\left(\frac{\phi h_t}{c}\right)^{z_{t+1}}\frac{1}{z_{t+1}!}\exp\left(-\frac{\phi h_t}{c}\right)\\
&\propto h_t^{\left(\nu+z_t+z_{t+1}-\frac{1}{2}\right)-1}\exp\left(-\frac{1}{2}\left[(y_t-\mu)^2h_t^{-1}+\left(\frac{2(1+\phi)}{c}+\beta^2\right)h_t\right]\right)\\
&= \text{GIG}\left(\nu+z_t+z_{t+1}-\frac{1}{2},\ (y_t-\mu)^2,\ \frac{2(1+\phi)}{c}+\beta^2\right)
\end{align*}

The marginal smoothing distribution of ht at intermediate dates only depends on zt, zt+1 and yt. The marginal

distribution of zt is

\begin{align*}
p(z_t|h_{1:T},y_{1:T};\theta) &\propto p(z_t|h_{t-1};\theta)\,p(h_t|z_t;\theta)\\
&\propto \frac{1}{z_t!}\left(\frac{\phi h_{t-1}}{c}\right)^{z_t}\frac{1}{\Gamma(\nu+z_t)}\left(\frac{h_t}{c}\right)^{z_t}\\
&= \text{Bessel}\left(\nu-1,\ 2\sqrt{\frac{\phi h_t h_{t-1}}{c^2}}\right)
\end{align*}

which does not depend on the observed data. The distribution for the initial condition is independent of the

data conditional on z1. I find

\begin{align*}
p(h_0|z_{1:T},y_{1:T};\theta) &\propto p(z_1|h_0;\theta)\,p(h_0;\theta)\\
&\propto \frac{1}{z_1!}\left(\frac{\phi h_0}{c}\right)^{z_1}\exp\left(-\frac{\phi h_0}{c}\right)\frac{h_0^{\nu-1}}{\Gamma(\nu)}\exp\left(-\frac{h_0(1-\phi)}{c}\right)\left(\frac{c}{1-\phi}\right)^{-\nu}\\
&\propto h_0^{\nu+z_1-1}\exp\left(-\frac{h_0}{c}\right)\\
&= \text{Gamma}(\nu+z_1, c)
\end{align*}

A.4 Proof of Proposition 3:

Recall that the stationary distribution of the variance is a gamma distribution $h_0 \sim \text{Ga}\left(\nu, \frac{c}{1-\phi}\right)$. One iteration of the Markov transition kernel leaves the distribution invariant such that $h_1 \sim \text{Ga}\left(\nu, \frac{c}{1-\phi}\right)$. The first contribution to the log-likelihood is
\begin{align*}
p(y_1;\theta) &= \int_0^{\infty} p(y_1|h_1;\theta)\,p(h_1;\theta)\,dh_1\\
&= \int_0^{\infty}\frac{1}{\sqrt{2\pi}}h_1^{-\frac{1}{2}}\exp\left(-\frac{1}{2}(y_1-\mu-\beta h_1)^2h_1^{-1}\right)\exp\left(-\frac{h_1(1-\phi)}{c}\right)\left(\frac{c}{1-\phi}\right)^{-\nu}\frac{1}{\Gamma(\nu)}h_1^{\nu-1}\,dh_1\\
&= \left(\frac{1-\phi}{c}\right)^{\nu}\frac{1}{\Gamma(\nu)}\frac{1}{\sqrt{2\pi}}\exp\left(\beta(y_1-\mu)\right)\int_0^{\infty}h_1^{\nu-\frac{1}{2}-1}\exp\left(-\frac{1}{2}\left[(y_1-\mu)^2h_1^{-1}+\left[\frac{2(1-\phi)}{c}+\beta^2\right]h_1\right]\right)dh_1\\
&= \left(\frac{1-\phi}{c}\right)^{\nu}\frac{\sqrt{2}\,|y_1-\mu|^{\nu-\frac{1}{2}}\,K_{\nu-\frac{1}{2}}\left(\sqrt{\frac{2(1-\phi)}{c}+\beta^2}\,|y_1-\mu|\right)}{\sqrt{\pi}\,\Gamma(\nu)\left(\sqrt{\frac{2(1-\phi)}{c}+\beta^2}\right)^{\nu-\frac{1}{2}}}\exp\left(\beta(y_1-\mu)\right)\\
&= \text{N.G.}\left(\mu,\ \sqrt{\frac{2(1-\phi)}{c}+\beta^2},\ \beta,\ \nu\right)
\end{align*}


The filtering distribution p(h1|y1; θ) immediately follows from Bayes’ rule

\begin{align*}
p(h_1|y_1;\theta) &\propto p(y_1|h_1;\theta)\,p(h_1;\theta)\\
&\propto h_1^{-1/2}\exp\left(-\frac{1}{2}\left[(y_1-\mu)^2h_1^{-1}+\beta^2h_1\right]\right)\exp\left(-\frac{h_1(1-\phi)}{c}\right)h_1^{\nu-1}\\
&\propto h_1^{\nu-\frac{1}{2}-1}\exp\left(-\frac{1}{2}\left[(y_1-\mu)^2h_1^{-1}+\left(\frac{2(1-\phi)}{c}+\beta^2\right)h_1\right]\right)\\
&= \text{GIG}\left(\nu-\frac{1}{2},\ (y_1-\mu)^2,\ \frac{2(1-\phi)}{c}+\beta^2\right)
\end{align*}
The conditional filtering distribution of $z_1$ is

\begin{align*}
p(z_1|y_1;\theta) &\propto p(y_1|z_1;\theta)\,p(z_1;\theta)\\
&\propto \frac{1}{\Gamma(\nu+z_1)}\left(\frac{1}{c}\right)^{z_1}\left(\frac{|y_1-\mu|}{\sqrt{\frac{2}{c}+\beta^2}}\right)^{z_1}K_{\nu+z_1-\frac{1}{2}}\left(\sqrt{\frac{2}{c}+\beta^2}\,|y_1-\mu|\right)\times\frac{\Gamma(\nu+z_1)}{z_1!}\phi^{z_1}\\
&\propto \left(\frac{\phi}{c}\,\frac{|y_1-\mu|}{\sqrt{\frac{2}{c}+\beta^2}}\right)^{z_1}\frac{1}{z_1!}K_{\nu+z_1-\frac{1}{2}}\left(\sqrt{\frac{2}{c}+\beta^2}\,|y_1-\mu|\right)\\
&= \text{Sichel}\left(\nu-\frac{1}{2},\ \frac{\phi}{c}(y_1-\mu)^2,\ \frac{c}{\phi}\left[\frac{2(1-\phi)}{c}+\beta^2\right]\right)
\end{align*}
A Sichel distribution is typically derived as a Poisson random variable with a GIG mixing distribution. Here, it is the posterior distribution of a normal gamma likelihood combined with a negative binomial prior. The one-step ahead predictive distribution of $z_2$ is

\begin{align*}
p(z_2|y_1;\theta) &= \int_0^{\infty} p(z_2|h_1;\theta)\,p(h_1|y_1;\theta)\,dh_1\\
&= \int_0^{\infty}\frac{1}{z_2!}\left(\frac{\phi h_1}{c}\right)^{z_2}\exp\left(-\frac{\phi h_1}{c}\right)\frac{\left[(y_1-\mu)^2\right]^{-\left(\nu-\frac{1}{2}\right)}\left(|y_1-\mu|\sqrt{\frac{2(1-\phi)}{c}+\beta^2}\right)^{\nu-\frac{1}{2}}}{2K_{\nu-\frac{1}{2}}\left(|y_1-\mu|\sqrt{\frac{2(1-\phi)}{c}+\beta^2}\right)}\,h_1^{\left(\nu-\frac{1}{2}\right)-1}\\
&\qquad\times\exp\left(-\frac{1}{2}\left[(y_1-\mu)^2h_1^{-1}+\left(\frac{2(1-\phi)}{c}+\beta^2\right)h_1\right]\right)dh_1\\
&= \left(\frac{\frac{2(1-\phi)}{c}+\beta^2}{\frac{2}{c}+\beta^2}\right)^{\left(\nu-\frac{1}{2}\right)/2}\left(\frac{\phi}{c}\right)^{z_2}\left(\frac{|y_1-\mu|}{\sqrt{\frac{2}{c}+\beta^2}}\right)^{z_2}\frac{1}{z_2!}\frac{K_{\nu+z_2-\frac{1}{2}}\left(|y_1-\mu|\sqrt{\frac{2}{c}+\beta^2}\right)}{K_{\nu-\frac{1}{2}}\left(|y_1-\mu|\sqrt{\frac{2(1-\phi)}{c}+\beta^2}\right)}\\
&= \text{Sichel}\left(\nu-\frac{1}{2},\ \frac{\phi}{c}(y_1-\mu)^2,\ \frac{c}{\phi}\left[\frac{2(1-\phi)}{c}+\beta^2\right]\right)
\end{align*}
This is the usual construction of a Sichel distribution as a GIG mixture of Poisson random variables.
