A Reconsideration of Continuous Time One Factor Spot Rate Models


A reconsideration of continuous time one-factor spot rate models.

Master Thesis for the Cand.Scient.Oecon degree in Mathematics and Economics at the University of Copenhagen.

Christoffer Kanstrup

6th July 2004

Thesis counsellor: Anders Rahbek.

Institute for Mathematical Sciences.

University of Copenhagen.



Preface

This is my Master Thesis. It represents the conclusion of five years of studying mathematics and economics at the University of Copenhagen. More specifically, this thesis is the product of my work during the spring and early summer of 2004, and it is thus the product of six months' work.

The work presented here is based on all the topics I have studied during the previous five years, specifically finance theory, statistics and econometrics. However, the actual methods and results in the thesis are all new to me and have not been included in any of the courses I have followed earlier.

I would like to thank my thesis counsellor Anders Rahbek for his help, patience and ability to read through my sometimes long ramblings.

Finally I would like to thank the people who helped me during the writing process, and also Mr. Yacine Aït-Sahalia for kindly sharing his research data.

University of Copenhagen, July 2004

Christoffer Kanstrup



Contents

Contents

List of Tables

List of Figures

1 Introduction

2 Stochastic Differential Equations
  2.1 Preliminaries
  2.2 Definition and existence of solutions
  2.3 Transition Probabilities
  2.4 Ergodicity of the Solution
    2.4.1 Scale functions and speed measures
    2.4.2 Bounded Solutions
    2.4.3 Conditions for an Ergodic Solution
    2.4.4 Exponential Ergodicity

3 Parameter estimation in diffusion models
  3.1 Approximating the likelihood function
  3.2 Martingale estimating functions
    3.2.1 Existence of optimal estimating functions
    3.2.2 Asymptotic behavior of martingale estimating functions
    3.2.3 Linear estimating functions
    3.2.4 Quadratic estimating functions
    3.2.5 Estimating the standard deviation
    3.2.6 Estimators based on eigenfunctions
  3.3 Model misspecification analysis
  3.4 Empirical data example
  3.5 The Aït-Sahalia method

4 Modelling the short rate
  4.1 The term structure of interest rates
  4.2 Characteristics of the short rate
    4.2.1 Some standard models
  4.3 Examining parametric models for the short rate
    4.3.1 Specification of the general model
    4.3.2 The estimation approach
    4.3.3 The identification problem
    4.3.4 Using proxies
  4.4 Empirical Results
    4.4.1 The 7-day Eurodollar data
    4.4.2 Conclusion on the empirical analysis

5 A semi-parametric approach
  5.1 The Estimation Approach
    5.1.1 Using transition probabilities to estimate the drift function
    5.1.2 Kernel estimation of the diffusion function
    5.1.3 Semi-parametric diffusion estimation
  5.2 A simulation study
  5.3 Empirical results
  5.4 Misspecification analysis
  5.5 Conclusion on the semi-parametric estimation

6 Conclusion
  6.1 Discussion of the methods used
  6.2 Possible extensions of the work

A Broyden's Method

B Source Codes
  B.1 Broyden's Method
  B.2 Estimation Program

Bibliography



List of Tables

3.1 Results of simulation study of estimators based on the Euler approximation of the likelihood function.

3.2 Results of simulation study of maximum likelihood estimators in the Ornstein-Uhlenbeck process.

3.3 Estimators based on an approximation to the optimal quadratic estimating function.

3.4 Standard deviation calculated by asymptotic estimator and bootstrap.

3.5 Estimation results of applying the CIR model to the one month Eurodollar rate, full sample, i.e. 1971-2004. The standard deviations are calculated using (3.32)-(3.33).

3.6 Estimation results of applying the CIR model to the one month Eurodollar rate, sub-sample, i.e. 1984-2001.

4.1 Selection of parametric short rate models.

4.2 Some models nested within the general parametric model.

4.3 Details about the 7-day Eurodollar data.

4.4 The estimated parameters in the CKLS model.

4.5 The estimated parameters in the general-drift, CEV-diffusion model.

4.6 The estimated parameters in the linear-drift, general-diffusion model.

4.7 The estimated parameters in the unconstrained model.

4.8 Results of the model specification analysis.

5.1 Results of simulation study of parametric estimators in semi-parametric model.

5.2 The estimated parameters in semi-parametric model.


List of Figures

2.1 A simulated sample-path of the CIR-process.

3.1 A simulated sample-path of the Ornstein-Uhlenbeck-process.

3.2 Histogram and empirical density for estimators based on the optimal quadratic estimating function.

3.3 Estimators based on an approximation to the optimal quadratic estimating function.

3.4 The estimated asymptotical standard deviation of the estimators.

3.5 Simulation study of the uniform residuals, $t_n = 250$, $\Delta = 2.5$.

3.6 Simulation study of the uniform residuals, $t_n = 250$, $\Delta = 1$.

3.7 Simulation study of the uniform residuals, $t_n = 250$, $\Delta = 0.1$.

3.8 Daily observations of the one month Eurodollar rate.

3.9 The uniform residuals $u_i$ based on the CIR model estimated on the one month Eurodollar rate from 1971 to 2004.

3.10 The uniform residuals $u_i$ based on the CIR model estimated on the subsample of the one month Eurodollar rate from 1984 to 2001.

4.1 A simulated sample-path of the full model.

4.2 The function $r \mapsto \beta_2 r^{\beta_3}$ for various sets of parameter values.

4.3 Daily observations of the 7-day Eurodollar rate.

4.4 Cross-plot of $r_{t_i}$ and $r_{t_{i-1}}$ for the 7-day Eurodollar data.

4.5 Roots of the companion matrix.

4.6 The uniform residuals $u_i$ based on the CKLS model.

4.7 The estimated parametric invariant density in the CKLS model.

4.8 Uniform residuals and drift function in the general-drift, CEV-diffusion model.

4.9 The estimated parametric invariant density in the general drift, CEV diffusion model.

4.10 Uniform residuals and diffusion function in the linear drift, general diffusion model.

4.11 The estimated parametric invariant density in the linear drift, general diffusion model.

4.12 Uniform residuals, drift and diffusion function in the full model.

4.13 The estimated parametric invariant density in the full model.

5.1 Histogram and empirical density for the parametric estimators in the semi-parametric model.

5.2 Model estimation and misspecification for a simulated CIR process with $t_n = 500$ and $\Delta = 2.5$.

5.3 Model estimation and misspecification for a simulated CIR process with $t_n = 500$ and $\Delta = 1$.

5.4 Model estimation and misspecification for a simulated CIR process with $t_n = 500$ and $\Delta = 0.1$.

5.5 Nonparametric density estimate of the invariant distribution and estimated parametric drift.

5.6 Estimated semi-parametric diffusion estimator.

5.7 Estimated semi-parametric diffusion estimator compared with a linear function for lower values of the spot rate process.

5.8 The uniform residuals $u_i$ based on the semi-parametric model.

A.1 An example of an object function for which the Broyden procedure might diverge.



Chapter 1

Introduction

One of the basic assumptions of models dealing with financial derivative pricing is that one or more assets and/or non-traded factors are given a priori. That is, sophisticated tools are developed to price derivative securities when the underlying object, $X = (X_1, \dots, X_k)^T$, is an empirically observable multi-dimensional stochastic process whose continuous time dynamics are assumed to be given by the stochastic differential equation

$$dX_t = \underbrace{\mu(t, X_t)}_{k \times 1}\,dt + \underbrace{\sigma(t, X_t)}_{k \times n}\,\underbrace{dW_t}_{n \times 1}. \tag{1.1}$$

Here $W$ is an $n$-dimensional Brownian motion and $\mu$ and $\sigma$ are deterministic matrix functions whose functional forms are known except for the values of certain parameters.

For a range of popular derivative pricing models the dynamics of the underlying asset are specified such that elegant solutions for e.g. stock option prices or bond prices are available. Examples are the Black-Scholes model for stock prices or the Vasicek model for the spot rate.

Specifying and estimating an appropriate stochastic differential equation requires some quite advanced statistical/econometric tools. For instance, the short term risk free interest rate (the spot rate) is one of the most important subjects considered in the financial markets. A wide range of models has been suggested to explain the dynamics of this financial object; in fact some would claim that more models have been put forward to describe the dynamics of the spot rate than for any other issue in finance, see Chan et al. [1992] page 1209. However, the matter of how to specify a stochastic differential equation capable of capturing the behavior of the spot rate is still an open question.

A main focus of the thesis is a critical examination of the widely used spot rate models. We present and discuss a wide range of the classical interest rate models. From this, we continue the work of Chan et al. [1992] and show how the classical linear drift models (Vasicek and CIR) fail to capture the dynamics of observed short rate data. The literature proposes ways of extending these models to compensate for the poor data-fit. In particular Aït-Sahalia [1996b] suggests that nonlinear functional forms should be included. We show that this extension only remedies part of the misspecification problems (the nonlinear forms better describe the observed invariant density); there are still clear indications of misspecification even for the extended models.

Focus will be almost entirely on modelling the underlying financial assets. A natural extension, which is outside the scope of this thesis, would be to derive expressions for various option or bond prices based on the suggested dynamics of the financial assets. However, for more general models, which fit the observed data better than the classical models, we would not expect to find analytical expressions for the price of derivatives, and Monte Carlo simulation would have to be implemented. As such, a major part of the analytical work in asset pricing lies in specifying an empirically acceptable a priori model.

The contribution of this thesis to the existing literature is thus threefold:

Firstly we collect and present the main theory of stochastic differential equations; we introduce some known results and describe them using examples. We also present a continuous time version of the drift criterion, which appears to be new to the theory of stochastic differential equations in this setting.

Secondly we introduce some relatively new results (from within the last decade) concerning parameter estimation in continuous time. We discuss the following methods:

• Estimation using the likelihood function when possible.
• Estimation using an approximation to the true likelihood function.
• Estimation using linear estimating functions.
• Estimation using quadratic estimating functions.
• Estimation using estimating functions based on eigenfunctions.
• Semi-parametric estimation using the invariant density.

Implementation of these methods is discussed and the performance of the individual methods is compared using simulation studies.

Thirdly we analyse the various spot rate models, including the relatively unexplored nonlinear model proposed by Aït-Sahalia [1996b], which is presented in detail here. The identification and proxy problems that arise when attempting to estimate the parameters in this model (and other short rate models) are illustrated. Along these lines is also the examination of a semi-parametric model which only imposes parametric structure on the drift function.

Although the thesis is based on existing results regarding stochastic differential equations, there are new results. One new result regards topic three above: a critical examination of the various spot rate models, including the new nonlinear and semi-parametric models, has not been performed in detail in any of the literature that the author is aware of. The conclusions are also new and relevant in light of the fact that classical no-arbitrage theory is based heavily on assumptions regarding the spot rate dynamics. These assumptions are shown to be inconsistent with the empirical findings.



The thesis is constructed in the following manner:

Chapter 2 formally defines the concept of a stochastic differential equation. It is commonly known that the general form (1.1) does not guarantee that a solution even exists. We define what we mean by a solution to (1.1) and consider restrictions on the functions $\mu$ and $\sigma$ such that a solution actually exists. Although Chapter 2 does briefly present some introductory theory, a certain amount of basic knowledge of stochastic processes (such as the construction of the Itô integral) and stochastic calculus (the Itô formula) is assumed known and not discussed in any detail. The probabilistic properties of a solution to an SDE are discussed and conditions ensuring stationarity and ergodicity are derived.

A repeating example in this thesis will be the one-dimensional ($k = n = 1$) Cox-Ingersoll-Ross (CIR) model

$$dX_t = a(b - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t. \tag{1.2}$$

As an illustration of the theorems of Chapter 2, the existence and statistical properties of a process satisfying (1.2) are derived in detail.

In (1.2) $a$, $b$ and $\sigma$ are parameters, and the question of how to estimate the values of these parameters from a set of observed values of the process is the topic of Chapter 3. Although we assume that the actual process is continuous, empirical observations will be of a discrete nature. The fact that an explicit solution to a general stochastic differential equation can seldom be derived analytically implies that methods such as maximum likelihood estimation will prove to be impossible for the majority of the models considered. Instead we consider simply replacing the likelihood function with the likelihood function of an approximation to the true process. We also consider more sophisticated methods for parameter estimation based on discrete observations; see the list presented above for a reference to the main estimation methods discussed in this chapter. The literature on this topic has grown somewhat within the last decade and some of the main references are given in Chapter 3.

The asymptotic properties of the proposed estimation methods are discussed and the finite sample quality of the estimators is studied using simulation. Again the CIR model is used as an example alongside the slightly simpler Ornstein-Uhlenbeck process, which does in fact allow for explicit maximum likelihood estimation.

Apart from parameter estimation, Chapter 3 also presents a method for determining how well a given model describes the data. This misspecification analysis is vital in light of the comment above that the question of finding an acceptable spot rate model is still unresolved.

Having derived the necessary theoretical statistical methods in the previous chapters, Chapter 4 turns focus to more financial topics. Basic concepts of term structure modelling are recapitulated, an overview of some of the main spot rate models from the literature is presented, and the pros and cons of the models are discussed. The estimation and misspecification analysis methods will be implemented on a new and previously unexamined class of spot rate models.

In Chapter 5 we consider a different approach to the task of estimating a stochastic differential equation on the basis of discrete observations. Instead of specifying functional parametric forms of both $\mu(\cdot)$ and $\sigma(\cdot)$ in (1.1), we consider a method for which it is sufficient to impose a parametric shape on $\mu(\cdot)$ only. A nonparametric estimator of $\sigma(\cdot)$ is derived, thereby loosening the restrictive ties of imposing a parametric form on this function. The qualities of this method are examined using simulation studies, and an empirical implementation on interest rate data is conducted.

Finally Chapter 6 concludes the thesis with a discussion of the results and some suggestions for future work, including extensions of the financial models used and improvement of the estimation methods.



Chapter 2

Stochastic Differential Equations

This chapter recapitulates the theory of stochastic integrals, stochastic differentials and stochastic differential equations. The content of this chapter assumes basic knowledge of Brownian motions and Itô integrals; it covers theorems on properties of solutions of stochastic differential equations. Conditions for stationarity are presented, which is something that is seldom discussed in great detail in finance theory. In general the purpose of this chapter is to provide a broad understanding of random processes that are constructed as solutions to stochastic differential equations and to prepare for the work ahead.

2.1 Preliminaries: Itô integrals, diffusions and general stochastic integrals

We start off with some preliminary notes on the notation used in the following. Consider a given filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, P)$ and a stochastic process $X : [0, \infty) \times \Omega \to \mathbb{R}^n$. Hence for each $t \ge 0$ and for each $\omega \in \Omega$, $X(t, \omega)$ will be a vector in $\mathbb{R}^n$. For a fixed $t$ the notation $X_t$ will be used for the stochastic variable defined by $X_t = X(t, \cdot) : \Omega \to \mathbb{R}^n$ on the probability space $(\Omega, \mathcal{F}_t, P_t)$, where $P_t$ is the restriction of $P$ to $\mathcal{F}_t$.

Assume that $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, P)$ is a filtered probability space and let $W_t$ be a Standard (one-dimensional) Brownian Motion (SBM) defined on this space¹.

We briefly state a few essential results concerning stochastic integrals of the type

$$\int_a^b f_t\,dW_t \tag{2.1}$$

for a large class of random functions (processes) $f$.

¹ That is, let $W_t$ be a standard Brownian motion defined on the probability space $(\Omega, \mathcal{F}, P)$ and adapted to the filtration $\{\mathcal{F}_t\}_{t \ge 0}$. It is common for textbooks on finance theory to define the Brownian motion first and then define the filtration based on the Brownian motion, see e.g. Björk [1998] page 78. Either view can of course be adopted here; the important thing is that we have a filtered probability space and a Brownian motion on this space adapted to the filtration.

The only type of stochastic integrals of the form (2.1) considered in the following will be Itô integrals. Though other types of stochastic integrals exist, the Itô integral is the essential one in financial applications, partly because of the martingale property, see Theorem 2.1. Whenever a stochastic integral such as (2.1) is used we implicitly assume that this integral is well defined. This is for instance the case for any process $f$ that belongs to the class $L^2$. A stochastic process $f$ belongs to the class $L^2[a, b]$ if $f$ is adapted to the filtration $\{\mathcal{F}_t\}_{t \ge 0}$, i.e. $f_t$ is $\mathcal{F}_t$-measurable for all $t$, and $\int_a^b E[f^2(t)]\,dt < \infty$. A stochastic process $f$ is said to belong to the class $L^2$ if $f$ belongs to $L^2[0, t]$ for all $t > 0$. It is possible to define the stochastic integral for the larger class of processes $f$ satisfying $\int_a^b f_t^2\,dt < \infty$ with probability one. Restricting attention to $L^2$ is simply convenient in the sense that a range of properties follow directly for Itô integrals defined for $L^2[a, b]$. Although several of these properties are essential, we merely state the main property of the Itô integral.

Theorem 2.1 (The Itô integral is a martingale)

For any process $f \in L^2[a, b]$ it holds that for $a \le s < t \le b$

$$E\left[\int_a^t f_u\,dW_u \,\Big|\, \mathcal{F}_s\right] = \int_a^s f_u\,dW_u. \tag{2.2}$$
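As a numerical illustration of Theorem 2.1, the following Python sketch approximates $\int_0^1 W_t\,dW_t$ by left-endpoint sums on a uniform grid. The function name `ito_sum_w_dw`, the grid size and the number of replications are our own illustrative choices and are not taken from the thesis. Taking $s = a$ in (2.2) shows that the integral has mean zero, which the sample average should roughly reproduce; the sketch also records the elementary identity $\sum_i W_{t_i}\,\Delta W_i = \tfrac{1}{2}\big(W_T^2 - \sum_i (\Delta W_i)^2\big)$, which holds exactly for the discrete sums.

```python
import random


def ito_sum_w_dw(n_steps, horizon, rng):
    """Left-endpoint (Ito) sum approximating int_0^T W_t dW_t on a uniform grid.

    Returns the sum, the terminal value W_T and the sum of squared increments.
    """
    dt = horizon / n_steps
    w = 0.0          # W_0 = 0
    ito_sum = 0.0
    quad_var = 0.0   # realised quadratic variation on the grid
    for _ in range(n_steps):
        dw = rng.gauss(0.0, dt ** 0.5)  # Brownian increment, N(0, dt)
        ito_sum += w * dw               # integrand evaluated at the left endpoint
        quad_var += dw * dw
        w += dw
    return ito_sum, w, quad_var


rng = random.Random(0)  # fixed seed, for reproducibility only
samples = [ito_sum_w_dw(200, 1.0, rng) for _ in range(2000)]
mean_ito = sum(s[0] for s in samples) / len(samples)
print("sample mean of the Ito sums:", mean_ito)
```

The left-endpoint evaluation is what makes the sum an approximation to the Itô integral; evaluating the integrand at the midpoint or right endpoint would approximate a different stochastic integral and the zero-mean property would be lost.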

The definition of the Itô integral does not guarantee that the process $t \mapsto \int_a^t f_s\,dW_s$ has continuous sample paths. However, it is possible to prove that there exists a stochastic process $X$ with continuous sample paths such that $X_t = \int_a^t f_s\,dW_s$, $t \in [a, b]$, with probability one, see e.g. Theorem 3.11 in Øksendal [1989]. In the following we will use this fact to assume that all Itô integrals of the form $\int_a^t f_s\,dW_s$ are continuous as a function of $t$.

In the following the term diffusion refers to any stochastic process that has continuous sample paths and has the strong Markov property: for any sequence of stopping times² $T_0 < T_1 < \dots < T_n$, a process $X_t$ on the filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, P)$ satisfies the strong Markov property if for any $s > 0$, any $x_0, x_1, \dots, x_n$, and any measurable set $A$,

$$P(X_{T_n + s} \in A \mid X_{T_0} = x_0, X_{T_1} = x_1, \dots, X_{T_n} = x_n) = P(X_{T_n + s} \in A \mid X_{T_n} = x_n). \tag{2.3}$$

That is, the future and the past are independent given the present value of the process.

The main area of interest in the following will be one-dimensional diffusions where the state space is an interval of the form $(l, r)$, where we may have $l = -\infty$ and/or $r = \infty$. In some cases we may want to consider closed intervals instead; when this is the case it will be clearly stated. For now let $I$ be any interval of $\mathbb{R}$. As usual we let $\mathring{I}$ denote the interior of $I$.

² A stochastic variable $T : \Omega \to [0, \infty]$ is called a stopping time with respect to the filtration $\{\mathcal{F}_t\}_{t \ge 0}$ if for any $t \ge 0$: $\{T \le t\} \in \mathcal{F}_t$.



We are going to focus on diffusions modelling certain financial data, and for that to be reasonable we need the process to behave nicely. Let $T_y$ be the stopping time defined by

$$T_y = \inf\{t \ge 0 \mid X_t = y\}. \tag{2.4}$$

We then define the following important property for the diffusion $X_t$ on state space $I$.

Definition 2.1 (Regular diffusion)
A diffusion process, $X_t$, with state space $I$ is said to be regular if

$$P(T_y < \infty \mid X_0 = x) > 0, \quad \text{for all } x \in I,\ y \in I. \tag{2.5}$$

This condition rules out that $I$ can be divided into non-communicating subsets, since all points in $I$ can be reached with positive probability in finite time.

The fact that any diffusion has continuous sample paths implies that the probability of a large change in the value of the process over a short period of time can be made arbitrarily small by looking at a sufficiently small time period. That is, define

$$h(t, s) = X_{t+s} - X_t$$

so that $h(t, s)$ is the change during a period of length $s$ after $t$. It can be shown that any diffusion has the property that

$$\forall \varepsilon > 0 : \lim_{s \to 0} P(|h(t, s)| > \varepsilon \mid X_t = x) = 0, \quad \forall x, t.$$

This gives a clear indication as to why diffusions are well suited for modelling financial data. We would not expect any given financial asset, such as a stock, to have a large change in price when observed over an arbitrarily short period of time.
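For the simplest diffusion, a standard Brownian motion, the probability above can be written in closed form, since $h(t, s) \sim N(0, s)$ gives $P(|h(t, s)| > \varepsilon) = \operatorname{erfc}\big(\varepsilon / \sqrt{2s}\big)$. The short Python sketch below evaluates this for a shrinking period length; the function name `prob_large_move` and the chosen values of $\varepsilon$ and $s$ are hypothetical and only serve as an illustration.

```python
import math


def prob_large_move(eps, s):
    """P(|W_{t+s} - W_t| > eps) for a standard Brownian motion.

    The increment is N(0, s), so the two-sided tail probability is
    erfc(eps / sqrt(2 s)).
    """
    return math.erfc(eps / math.sqrt(2.0 * s))


period_lengths = (1.0, 0.1, 0.01, 0.001)
probs = [prob_large_move(0.1, s) for s in period_lengths]
for s, p in zip(period_lengths, probs):
    print(f"s = {s:<6} P(|h| > 0.1) = {p:.5f}")
```

The probabilities decrease monotonically towards zero as $s$ shrinks, in line with the limit stated above.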

Any diffusion process, $X$, can be characterized by its drift coefficient (vector) $\mu$ and diffusion coefficient (matrix) $\sigma^2$, that is, processes $\mu(x, t)$ and $\sigma^2(x, t)$ defined by

$$\lim_{s \to 0} \frac{1}{s} E[h(t, s) \mid X_t = x] = \mu(x, t) \tag{2.6}$$

$$\lim_{s \to 0} \frac{1}{s} E\left[h(t, s) h(t, s)^T \mid X_t = x\right] = \sigma^2(x, t) \tag{2.7}$$
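The limits (2.6) and (2.7) can be checked by simulation in a case where the answer is known. The sketch below is a minimal illustration under our own assumptions: it uses a scalar Brownian motion with constant drift, $X_t = \mu t + \sigma W_t$, for which $h(t, s) = \mu s + \sigma \sqrt{s} Z$ with $Z \sim N(0, 1)$, and the function name `local_moments` and all parameter values are hypothetical.

```python
import random


def local_moments(mu, sigma, s, n_samples, rng):
    """Monte Carlo versions of (2.6) and (2.7) for X_t = mu * t + sigma * W_t.

    Returns the sample means of h / s and h^2 / s, where h is the increment
    of X over a period of length s.
    """
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        h = mu * s + sigma * rng.gauss(0.0, s ** 0.5)
        total += h
        total_sq += h * h
    return total / (n_samples * s), total_sq / (n_samples * s)


rng = random.Random(1)  # fixed seed, for reproducibility only
drift_hat, diffusion_hat = local_moments(
    mu=0.5, sigma=0.2, s=0.01, n_samples=200_000, rng=rng
)
print("estimated drift coefficient:    ", drift_hat)      # should be close to 0.5
print("estimated diffusion coefficient:", diffusion_hat)  # close to 0.2**2 = 0.04
```

For a fixed $s > 0$ the second moment equals $\sigma^2 + \mu^2 s$, so the estimate of $\sigma^2$ carries a small bias that vanishes as $s \to 0$. Note also that the standard deviation of $h / s$ is $\sigma / \sqrt{s}$, so the drift estimate becomes noisier as the period length shrinks.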

The processes we shall work with later on will be solutions to stochastic differential equations. Note that it is not true that all such processes will be diffusions, nor is it true that any diffusion can be reached as a solution to some stochastic differential equation.

When the term stochastic integral is used it will refer to a larger class of processes than simply Itô integrals. We recall the formal definition of a stochastic integral.

Definition 2.2 (Stochastic integral)
Let $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, P)$ be a filtered probability space and assume that $W = \{W_t\}_{t \ge 0}$ is a standard Brownian motion defined on this probability space. Let $X = \{X_t\}_{t \ge 0}$ be a one-dimensional stochastic process on $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, P)$. We say that $X$ is a stochastic integral if it is of the form

$$X_t = x_0 + \int_0^t a_s\,ds + \int_0^t b_s\,dW_s \tag{2.8}$$

where $a$ and $b$ are adapted to the filtration $\{\mathcal{F}_t\}_{t \ge 0}$, $x_0$ is constant and $b \in L^2$, or $b$ satisfies any weaker condition such that the Itô integral $\int_0^t b_s\,dW_s$ is well defined.

From this point on we will no longer explicitly state the assumption that the processes satisfy the required integrability conditions such that the Itô integral is well defined. Whenever we present a stochastic integral we implicitly assume that the conditions are satisfied.

To avoid any confusion we note that we use the term Itô integral to refer to terms of the type $\int_a^b f_t\,dW_t$, whereas stochastic integral refers to the more general type of processes given by (2.8).

Following this definition we immediately introduce a less complicated notation, such that the integral equation (2.8) will be written in the shorter differential form

$$dX_t = a_t\,dt + b_t\,dW_t \tag{2.9}$$

$$X_0 = x_0. \tag{2.10}$$

Using this notation we say that $X$ has the stochastic differential given by (2.9) with the initial condition given by (2.10).

It is important to note that the expression (2.9) has no independent meaning; it is simply shorthand for the expression in Definition 2.2.



2.2 Stochastic Differential Equations - Definition and Existence of Solutions

This section utilizes the introduction of the stochastic integral to define stochastic differential equations. We are going to look at conditions ensuring that a given stochastic differential equation (SDE) has a unique solution. We also consider conditions ensuring that this solution is a diffusion, and conditions ensuring other nice qualities of the solution to an SDE, such as stationarity. The results in this section are based to some extent on Arnold [1972] and also loosely on Øksendal [1989].

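We now formally define the topic of interest. Consider

• A $k$-dimensional Brownian motion $W$
• A function $\mu : [0, \infty) \times \mathbb{R}^n \to \mathbb{R}^n$
• A function $\sigma : [0, \infty) \times \mathbb{R}^n \to \mathbb{R}^{n \times k}$
• A vector $x_0 \in \mathbb{R}^n$

where $\mathbb{R}^{n \times k}$ denotes the class of $n \times k$ matrices.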

Definition 2.3 (Stochastic Differential Equation)
Let $X$ be an $n$-dimensional stochastic process. We say that $X$ satisfies the stochastic differential equation with initial condition $x_0$

$$dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t \tag{2.11}$$

$$X_0 = x_0 \tag{2.12}$$

if $X$ satisfies the integral equation

$$X_t = x_0 + \int_0^t \mu(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dW_s, \quad t \ge 0. \tag{2.13}$$

Note that (2.13) is a system of integral equations for the vector $X = (X^1, \dots, X^n)^T$, that is, for each $X^i$ we have the equation

$$X^i_t = x^i_0 + \int_0^t \mu_i(s, X_s)\,ds + \sum_{j=1}^k \int_0^t \sigma_{ij}(s, X_s)\,dW^j_s, \quad t \ge 0. \tag{2.14}$$

We will often replace the initial condition (2.12) with $X_0 \overset{D}{=} Y$ for some $\mathcal{F}_0$-measurable stochastic variable $Y$.


Following the notation from diffusion processes we call the function $\mu(\cdot, \cdot)$ the drift coefficient (function), and $\sigma(\cdot, \cdot)$ (or sometimes $\sigma^2(\cdot, \cdot)$) is called the diffusion coefficient (function). If $\sigma(\cdot, \cdot)$ is identically equal to zero we note that (2.11) reduces to a system of ordinary differential equations.
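The integral equation (2.13) suggests a simple discretisation, usually called the Euler (or Euler-Maruyama) scheme, in which the integrands are frozen at the left endpoint of each time step. The Python sketch below is a minimal scalar version under our own assumptions: the function name `euler_maruyama`, the linear drift $a(b - x)$ and all parameter values are hypothetical and chosen for illustration. It also checks the remark above: with $\sigma \equiv 0$ the scheme collapses to the ordinary Euler method for the differential equation $x' = a(b - x)$, whose solution is $b + (x_0 - b)e^{-at}$.

```python
import math
import random


def euler_maruyama(mu, sigma, x0, horizon, n_steps, rng):
    """Euler-Maruyama path for the scalar SDE dX = mu(t, X) dt + sigma(t, X) dW.

    Returns the list [X_0, X_dt, ..., X_T] on a uniform grid with n_steps steps.
    """
    dt = horizon / n_steps
    x, t = x0, 0.0
    path = [x0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))            # Brownian increment
        x = x + mu(t, x) * dt + sigma(t, x) * dw      # coefficients frozen at t
        t += dt
        path.append(x)
    return path


a, b, x0 = 2.0, 0.05, 0.10          # illustrative values only


def drift(t, x):
    return a * (b - x)               # Lipschitz in x, time-homogeneous


# sigma identically zero: the SDE reduces to an ordinary differential equation
ode_path = euler_maruyama(drift, lambda t, x: 0.0, x0, 1.0, 1000, random.Random(2))
exact = b + (x0 - b) * math.exp(-a * 1.0)
print("Euler value at T = 1:", ode_path[-1], " exact ODE solution:", exact)

# constant diffusion coefficient: a noisy path around the same deterministic curve
noisy_path = euler_maruyama(drift, lambda t, x: 0.02, x0, 1.0, 1000, random.Random(2))
```

The scheme is only an approximation to the solution of the SDE; how well it approximates depends on the step size and on the regularity of the coefficients.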

In some cases we can determine an explicit solution to a given SDE, but in most cases this will not be possible. However, it is still possible to state conditions ensuring existence and uniqueness of a solution.

If, for any two processes $X$ and $Y$ both solving (2.11), we have that

$$P\left(\sup_{t \ge 0} \|X_t - Y_t\| > 0\right) = 0$$

then we say the solution to the SDE is unique or pathwise unique.

When the Brownian motion is given and we find a process $X$ adapted to the filtration $\{\mathcal{F}_t\}_{t \ge 0}$ satisfying the SDE, we say that we have a strong solution. On the other hand, if we are given the two functions $\mu$ and $\sigma$ and are free to construct some probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, P)$, a Brownian motion $W$ and a process $X$ such that

$$X_t = x_0 + \int_0^t \mu(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dW_s, \quad t \ge 0,$$

we then refer to $X$ as a weak solution to the SDE.

Clearly the concept of having a Brownian motion given is purely theoretical. The phrase is simply used to indicate that two strong solutions will almost surely have the same sample paths, whereas two weak solutions are the same in the sense that their probability laws coincide. Evidently any strong solution is also a weak solution. Conversely, a weak solution can be thought of as a strong solution for an identical SDE defined with the weak solution's Brownian motion. Since the probabilistic properties of the diffusion processes considered are the main focus, weak solutions are sufficient in most applications. It is clear though, that the distinction between weak and strong solutions is quite subtle and from an empirical viewpoint less important.

A number of different existence and uniqueness theorems for SDEs exist, all imposing certain sufficient (but seldom necessary) Lipschitz conditions on the functions $\mu$ and $\sigma$. We can now state from Arnold [1972]:

Theorem 2.2 (Existence and uniqueness of solutions to SDEs)
Assume that there exists a constant $K$ such that for all $x, y \in I$, $t \ge 0$

$$\|\mu(t, x)\|_n^2 + \|\sigma(t, x)\|_{n \times k}^2 \le K^2 (1 + \|x\|_n^2) \tag{2.15}$$

$$\|\mu(t, x) - \mu(t, y)\|_n + \|\sigma(t, x) - \sigma(t, y)\|_{n \times k} \le K \|x - y\|_n \tag{2.16}$$

where $\|\cdot\|_n$ is any norm on $\mathbb{R}^n$ and $\|\cdot\|_{n \times k}$ is a matrix norm on $\mathbb{R}^{n \times k}$, e.g. $\|\sigma\|_{n \times k} = \operatorname{tr}(\sigma \sigma^T)^{1/2}$. Let $Y$ be a stochastic variable independent of the Brownian motion and such that

$$E\|Y\|_n^2 < \infty. \tag{2.17}$$

Then the SDE

$$dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t$$

with initial condition $X_0 \overset{D}{=} Y$ has a unique strong solution $X$ with continuous sample paths.

Proof. The proof is based on constructing a sequence of processes $X^0, X^1, X^2, \dots$ from the recursive definition

$$X^{n+1}_t = X_0 + \int_0^t \mu(s, X^n_s)\,ds + \int_0^t \sigma(s, X^n_s)\,dW_s.$$

It is clear that if the limit $\lim_{n \to \infty} X^n$ exists in $L^2(\Omega, P)$ then this process $X$ would satisfy the SDE. The formal proof, which can be found on pages 42-44 in Øksendal [1989], is based on showing that the limit does indeed exist, and the uniqueness then follows from the Itô isometry and the Lipschitz condition (2.16).

Although the conditions in Theorem 2.2 are sufficient to ensure the existence of a solution, they turn out to be too restrictive for many of the financial implementations we will use later. Consider the following example.

Example 2.1 (The CIR process)

One of the first diffusion models one meets when modelling the short interest rate in courses in continuous time financial theory is the Cox-Ingersoll-Ross (CIR) model. Let $n = k = 1$, let $r$ be a short rate and let $W$ be a one-dimensional Brownian motion. The CIR specification is then that $r$ has the following dynamics under an appropriate measure:

$$dr_t = a(b - r_t)\,dt + \sigma\sqrt{r_t}\,dW_t, \quad a, b, \sigma \in \mathbb{R}_+ \tag{2.18}$$

In the notation used earlier we have

$$\mu(t, r) = a(b - r_t), \qquad \sigma(t, r) = \sigma\sqrt{r_t}$$




In particular we note that this is a process where the drift and diffusion coefficients are independent of the time parameter $t$. This will be true for the majority of the financial models we are going to work with.

Note the slight abuse of notation, both here and in the subsequent chapters, where we allow $\sigma$ to represent both a function and a parameter. In general $\sigma(\cdot)$, i.e. with $(\cdot)$, will represent a function whereas $\sigma$ without the brackets will be a parameter.

The topic of later chapters will be how to estimate values of the parameters $a$, $b$ and $\sigma$; for now we just consider them to be fixed positive real numbers.

We also note that (2.18) is well defined for $r_t \in [0, \infty)$. We shall later state conditions on the parameters ensuring that the process $r$ satisfies

$$P(r_t > 0 \mid r_0 > 0) = 1.$$

Clearly $r \mapsto \sigma\sqrt{r}$ does not satisfy a Lipschitz condition, so the theorem above does not guarantee that a unique solution to the CIR model exists.

However, the CIR model would hardly be an interesting financial model if there was no unique solution to (2.18). Although no explicit formula for $r_t$ in terms of $W_t$ can be found, we shall see below that we can relax the conditions in Theorem 2.2 enough to include the CIR SDE in the class of stochastic differential equations for which a solution exists.

It is common to see the CIR process parameterized by

$$dr_t = (\alpha + \beta r_t)\,dt + \sigma\sqrt{r_t}\,dW_t, \quad \alpha > 0,\ \beta < 0,\ \sigma > 0$$

which is clearly just a different way of stating the same model. This alternative parametrization has the advantage, when discussing estimation, that by the separation of $\alpha$ and $\beta$ in a sum we avoid working with the product of two parameters, in the sense of the term $ab$ in the first formulation. However, in many financial applications it is natural to use the first parametrization, as this clearly states the coefficient of mean reversion $a$ as well as the long term mean $b$; see more about this below. We therefore maintain the original formulation of the model in the following (also in matters of parameter estimation). We can of course easily get from one parametrization to the other by $a = -\beta$ and $ab = \alpha$.
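The mapping between the two CIR parametrizations can be made concrete with a small sketch. It assumes the linear-drift form $\alpha + \beta r$ with $\beta < 0$ and the mean-reverting form $a(b - r)$; the function names are hypothetical and chosen only for illustration.

```python
def to_mean_reverting(alpha, beta):
    """Map the drift alpha + beta*r to a*(b - r), using a = -beta and a*b = alpha."""
    if beta >= 0:
        raise ValueError("beta must be negative for a mean-reverting drift")
    a = -beta
    b = alpha / a
    return a, b


def to_linear_drift(a, b):
    """Map the drift a*(b - r) back to alpha + beta*r."""
    return a * b, -a


# Illustrative values: alpha = 6, beta = -2 gives speed a and long term mean b.
a, b = to_mean_reverting(alpha=6.0, beta=-2.0)
alpha, beta = to_linear_drift(a, b)
```

The round trip returns the original pair, which is one way to see that the two formulations describe the same model.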

We start our quest for weaker conditions by noting that the global Lipschitz condition in Theorem 2.2 can be replaced by a local Lipschitz condition, see Arnold [1972], page 112. Whenever there is no doubt about the dimensions of a given variable we will omit the subscripts $n$ or $n \times k$ on the norms used in the following.




Theorem 2.3 (Existence and uniqueness, weaker conditions)
The results of Theorem 2.2 are still valid if we replace the Lipschitz condition with the weaker condition

$$\forall N > 0,\ \exists K_N :\ \forall \|x\| \le N,\ \forall \|y\| \le N,\ \forall t:\quad \|\mu(t,x) - \mu(t,y)\| + \|\sigma(t,x) - \sigma(t,y)\| \le K_N\|x - y\| \tag{2.19}$$

Even these weaker conditions do not cover the CIR model, or other square root diffusion models for that matter. However, if we limit ourselves to the one-dimensional case $n = k = 1$ we can quote a much weaker condition due to Yamada and Watanabe [1971]; this result has been reported in e.g. Duffie [1992], pages 240-241.

Theorem 2.4 (Yamada and Watanabe)
Assume $n = k = 1$. Sufficient conditions for the existence and uniqueness of a strong solution to the SDE are that $\mu$ is continuous and satisfies a Lipschitz condition in $x$, and that $\sigma$ is continuous with the property that

$$|\sigma(t,x) - \sigma(t,y)| \le \rho(|x - y|), \quad \forall x, y, t. \tag{2.20}$$

Here $\rho : [0, \infty) \to [0, \infty)$ is a strictly increasing function with $\rho(0) = 0$ such that

$$\int_0^z \rho(x)^{-2}\,dx = +\infty, \quad \forall z > 0. \tag{2.21}$$

Example 2.2 (The CIR process continued)

There is a unique solution to the CIR model for an appropriate initial condition: we can use Theorem 2.4 with $\rho(x) = \sigma\sqrt{x}$. This clearly satisfies the conditions from Theorem 2.4, since $|\sigma\sqrt{x} - \sigma\sqrt{y}| \le \sigma\sqrt{|x - y|}$ and $\int_0^z (\sigma^2 x)^{-1}\,dx = +\infty$ for all $z > 0$.

It should be noted that, as stated in Duffie [1992], even though the conditions above can be weakened even further, there is a limit to the class of SDEs for which we can guarantee the existence of a unique strong solution. There exist counterexamples to the uniqueness of the solution to a CIR-like process where we replace the diffusion coefficient with $\sigma(x) = |x|^{\alpha}$ for $\alpha < \frac{1}{2}$. However, this does not imply that all SDEs with this diffusion function permit no strong solution. In particular, if we restrict our attention to a class of SDEs containing most interest rate models, including the CIR model, we can state slightly different conditions for existence and uniqueness. In fact, we note from Aït-Sahalia [1996a], pages 550-551, that for the special case of positive, time-invariant, one-dimensional SDEs, local Lipschitz and growth conditions on compact subsets not containing zero are sufficient for the pathwise uniqueness of the solution.




Theorem 2.5 (Local Lipschitz and growth conditions)
Let $n = k = 1$ and let the state space of the process be $I = (0, \infty)$. Let the SDE be time-invariant such that

$$\mu(t, x) = \mu(x), \qquad \sigma(t, x) = \sigma(x).$$

Sufficient conditions for pathwise uniqueness of the solution are that for each compact subset of $I$ of the form $K = [1/R, R]$, $R > 0$, there exist constants $N_{R,1}$ and $N_{R,2}$ such that for all $x, y \in K$

$$|\mu(x)| + |\sigma(x)| \le N_{R,1}(1 + |x|) \tag{2.22}$$

$$|\mu(x) - \mu(y)| + |\sigma(x) - \sigma(y)| \le N_{R,2}\,|x - y|. \tag{2.23}$$

This theorem can be used to show that sufficient conditions for the existence of a unique strong solution (up to possibly an explosion, see below) are that the drift and diffusion functions have $s \ge 2$ continuous derivatives on $I = (0, \infty)$ and $\sigma(x) > 0$, $\forall x \in I$. See Aït-Sahalia [1996a], pages 550-551, for a discussion of these weaker existence conditions. Clearly these results apply to a much smaller class of processes than those above (one-dimensional, positive and time-invariant), but for practical purposes this class is sufficient when modelling, for instance, interest rate processes.

It is now natural to consider the properties of the solution to an SDE in the cases where one exists.

From Arnold [1972] Theorem 9.3.1 or from Øksendal [1989] Theorem 7.6 we immediately note that the conditions guaranteeing the existence and uniqueness of the solution also guarantee that the solution will be a diffusion in the sense that it has the strong Markov property; we have already seen that the solution will have continuous sample paths. We summarize in the following

Theorem 2.6 (The solution as diffusion processes)
Assume that the conditions from the existence and uniqueness theorems are satisfied. If $\mu(t, x) \in \mathbb{R}^n$ and $\sigma(t, x) \in \mathbb{R}^{n\times k}$ are continuous in $t$ then the solution to the stochastic differential equation is an $n$-dimensional diffusion process with drift vector $\mu(t, x)$ and diffusion matrix $\sigma^2(t, x) = \sigma(t, x)\sigma(t, x)^T$.

In the case of time-invariant drift and diffusion coefficients (like those of the CIR model), where we have

$$\mu(t, x) = \mu(x), \qquad \sigma(t, x) = \sigma(x),$$

the solution to the SDE will always be a homogeneous diffusion process in the sense that the distribution of $X_t$ given $X_s$ for $s < t$ only depends on $(t, s)$ through the difference $t - s$. Again, as mentioned above, the majority of the economic models used to describe financial assets will have this quality.

2.3 Transition Probabilities

We let $X$ be the solution to the SDE

$$dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t, \quad X_0 \overset{D}{=} Y.$$

That is, $X$ is an $n$-dimensional diffusion with values in the space $(\mathbb{R}^n, \mathcal{B}^n)$ where $\mathcal{B}^n$ is the Borel algebra on $\mathbb{R}^n$. As discussed earlier, $X$ might only take values on a subset $I \subset \mathbb{R}^n$ with positive probability$^3$. If this is the case we could just work with the restriction of $\mathcal{B}^n$ to $I$.

For $0 < s < t$ we have, by the Markov property, that the conditional distribution of $X_t$ given that $X_s = x$ is independent of the initial condition (that is, independent of the initial distribution $Y$). We define the functions $p_{t,s}(\cdot, \cdot) : \mathbb{R}^n \times \mathcal{B}^n \to [0, 1]$ by

$$p_{t,s}(x, A) = P(X_t \in A \mid X_s = x), \quad A \in \mathcal{B}^n,\ x \in \mathbb{R}^n \tag{2.24}$$

such that

• $x \mapsto p_{t,s}(x, A)$ is measurable for all $A \in \mathcal{B}^n$
• $A \mapsto p_{t,s}(x, A)$ is a probability measure on $(\mathbb{R}^n, \mathcal{B}^n)$ for all $x \in \mathbb{R}^n$.

When the diffusion process is homogeneous, which for instance would be the case for time-invariant drift and diffusion parameters in the SDE, we have that $p_{t,s}$ only depends on the values $(t, s)$ through the difference $t - s$.

In some simple cases it is possible to solve an SDE explicitly and derive an expression for the transition probabilities $p_{t,s}$. Even if an explicit solution for a given SDE cannot be obtained, it is still possible to derive certain properties of the transition probabilities. In particular, the Kolmogorov differential equations can sometimes provide means of solving for explicit expressions for the transition densities, see for instance Björk [1998] section 4.6. We mentioned above that the CIR model does not have an explicit formula for $r_t$ in terms of $W_t$, but a formula for the transition probabilities can in fact be found, see formula 18 in the original paper Cox et al. [1985].

$^3$ This will be the case for most interest rate models, for instance, where we almost always have $P(r_t < 0 \mid r_s > 0) = 0$, $\forall s < t$.




However, for the majority of cases this is not possible. Even though we know that $p_{t,s}$ exists we can find no analytic expression. Clearly this will cause problems, since when we attempt to model any process using stochastic differential equations we will do so based on discretely sampled observations. That is, we have observed $X_{t_0}, X_{t_1}, \ldots, X_{t_n}$ and from these observations we wish to estimate some parameter $\theta$, where $\mu$ and $\sigma$ depend on $\theta$. If the transition probabilities were known, we could consider the density $f_{s,t}(x, y, \theta)$, say, of $p_{t,s}(x, \cdot)$ for the parameter $\theta$. The likelihood function in this case is

$$L(\theta) = \prod_{i=1}^{n} f_{t_{i-1},t_i}\left(X_{t_{i-1}}, X_{t_i}, \theta\right).$$

Inference about the parameter $\theta$ could then be based on maximum likelihood estimation. These issues will be discussed in greater detail in the subsequent chapters.
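As a minimal sketch of how the product of transition densities turns into a computable quantity, the following evaluates the log-likelihood as a sum over consecutive observation pairs. The transition density used here is an assumption made purely for illustration: a Gaussian density for the constant-coefficient SDE $dX_t = m\,dt + v\,dW_t$, which is not one of the models discussed in the text. The names `log_likelihood` and `gaussian_log_density` are hypothetical.

```python
import math


def log_likelihood(obs, times, log_density, theta):
    """Sum of log f_{t_{i-1}, t_i}(X_{t_{i-1}}, X_{t_i}; theta) over consecutive pairs."""
    total = 0.0
    for i in range(1, len(obs)):
        dt = times[i] - times[i - 1]
        total += log_density(obs[i - 1], obs[i], dt, theta)
    return total


def gaussian_log_density(x, y, dt, theta):
    """Illustrative assumption: for dX = m dt + v dW, X_t given X_s = x is N(x + m*dt, v^2*dt)."""
    m, v = theta
    var = v * v * dt
    return -0.5 * math.log(2.0 * math.pi * var) - (y - x - m * dt) ** 2 / (2.0 * var)


# Toy data (made up for the example), equally spaced in time.
obs = [0.0, 0.1, 0.3]
times = [0.0, 1.0, 2.0]
ll_close = log_likelihood(obs, times, gaussian_log_density, (0.15, 1.0))
ll_far = log_likelihood(obs, times, gaussian_log_density, (1.0, 1.0))
```

A drift value close to the average increment of the toy data gives the larger log-likelihood, which is the behaviour a maximum likelihood estimator exploits. Passing the density as a function argument keeps the likelihood code independent of the particular model.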

For later use we introduce the transition operator based on the transition probabilities.

Let $B(\mathbb{R}^n)$ be the space of bounded measurable functions defined on $\mathbb{R}^n$. For any $f \in B(\mathbb{R}^n)$ we define the transition operator $T_{s,t} : B(\mathbb{R}^n) \to B(\mathbb{R}^n)$ by

$$T_{s,t}(f)(x) = \int_{\mathbb{R}^n} f(y)\,p_{s,t}(x, dy) \tag{2.25}$$

A natural relationship between $p_{s,t}$ and $T_{s,t}$ is the obvious formula

$$p_{s,t}(x, A) = T_{s,t}(1_A)(x).$$

Again we note that for a homogeneous process $T_{s,t}$ depends on $(s, t)$ only through $t - s$.

2.4 Ergodicity of the Solution

A natural question at this time would be to ask what we can say about the distribution of a solution to a given SDE. We have already seen sufficient conditions for a solution to exist, and we have established conditions ensuring that this solution will be a diffusion process. Before we can answer the question of when a solution can be given an initial distribution such that it is stationary, we need to define a number of concepts associated with stochastic differential equations and diffusion processes.



It is clear that for any Markov process to be stationary it must be homogeneous; this will be the case for the solution to an SDE with time-invariant drift and diffusion functions. We will restrict the following to the one-dimensional SDE

$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t \tag{2.26}$$

and we will work in the following setting:

• $\mu(\cdot)$ and $\sigma(\cdot)$ only depend on time through the parameter $x$.
• $\mu$ and $\sigma$ are continuous and satisfy conditions such that a unique strong solution exists.
• $\sigma(x) > 0$, $\forall x$.

From Itô's formula we note a close connection between SDEs and partial differential equations; we therefore introduce the partial differential operator $A$.

Definition 2.4 (The Infinitesimal Operator)
For any function $f : \mathbb{R} \to \mathbb{R}$ where $f$ is a $C^2$ function, the infinitesimal operator for the SDE in (2.26) is defined by

$$Af(x) = \mu(x)\frac{\partial f}{\partial x}(x) + \frac{1}{2}\sigma^2(x)\frac{\partial^2 f}{\partial x^2}(x) \tag{2.27}$$

If $X_t$ is a solution to the time-invariant SDE (2.26), we note that Itô's formula for the dynamics of a smooth transformation of $X_t$ is simply expressed using the operator $A$: Let $f : \mathbb{R} \to \mathbb{R}$ be $C^2$ (that is, twice differentiable and with continuous second derivative) and define $Z_t = f(X_t)$. The process $Z_t$ has the stochastic differential given by

$$dZ_t = Af(X_t)\,dt + \sigma(X_t)\frac{\partial f}{\partial x}(X_t)\,dW_t. \tag{2.28}$$

For the transition operator $T_{s,t}$ defined above we can derive $A$ by the following theorem

Theorem 2.7 (Relationship between $A$ and $T_{s,t}$)
For any bounded function $f : \mathbb{R} \to \mathbb{R}$ where $f$ is a $C^2$ function and such that $T_{s,t}(f)$ is also $C^2$ it holds that

$$Af = \lim_{t \downarrow s}\frac{1}{t - s}\left(T_{s,t}(f) - f\right) \tag{2.29}$$




Proof This theorem is a special case of Theorem 3, page 293 in Gihman and Skorohod [1972], where the more general case of a multi-dimensional SDE with time-varying drift and diffusion functions is treated.

It should be noted that many presentations define the infinitesimal operator by (2.29) and then derive the expression (2.27) for the stochastic differential equation in question. This is the case in Arnold [1972]; the results are of course true whichever way we choose to define $A$.

2.4.1 Scale functions and speed measures

We know from Theorem 2.1 that, given an integrability condition, any Itô stochastic integral is a martingale. This result can be strengthened to give that a solution to an SDE is a martingale (assuming enough integrability) if and only if the drift function is identically equal to zero, see Björk [1998] Lemma 3.9.

Suppose we wish to find a function $s : \mathbb{R} \to \mathbb{R}$ with $s \in C^2$ such that $Y_t = s(X_t)$, $\forall t$, is a martingale. Itô's lemma gives

$$dY_t = \left(\mu(X_t)\frac{\partial s}{\partial x}(X_t) + \frac{1}{2}\sigma^2(X_t)\frac{\partial^2 s}{\partial x^2}(X_t)\right)dt + \sigma(X_t)\frac{\partial s}{\partial x}(X_t)\,dW_t$$

This means that $Y_t$ is a martingale if and only if $As = 0$. We have assumed that $\sigma(x) > 0$; if $\mu(\cdot)/\sigma^2(\cdot)$ is integrable, the solution to the differential equation $As = 0$ with respect to $s' = \frac{\partial s}{\partial x}$ is

$$s'(x) = K\exp\left(-2\int_{x_0}^{x}\frac{\mu(z)}{\sigma^2(z)}\,dz\right) \tag{2.30}$$

where $x_0$ is a fixed value in the interior of the range $I$ of $X$, and $K$ is a constant. We need two conditions $s(x_1) = s_1$ and $s(x_2) = s_2$ to determine a unique expression for $s$.

Functions $s$ determined by (2.30) are called scale functions, and measures with density proportional to $\exp\left(-2\int_{x_0}^{x}\frac{\mu(z)}{\sigma^2(z)}\,dz\right)$ with respect to the Lebesgue measure are called scale measures. That is, with slight risk of confusion, we shall refer to $s$ as a scale function and to $s' = \frac{\partial s}{\partial x}$ as the density of the scale measure.

Since $s$ is clearly monotone, it is invertible, and for $J = s(I)$ we have that $s^{-1} : J \to I$ satisfies

$$dY_t = ds(X_t) = \sigma(X_t)\,s'(X_t)\,dW_t = \sigma\left(s^{-1}(Y_t)\right)s'\left(s^{-1}(Y_t)\right)dW_t = a(Y_t)\,dW_t$$

where $a : J \to \mathbb{R}$ is given by $a(y) = \sigma\left(s^{-1}(y)\right)s'\left(s^{-1}(y)\right)$.

If we let $a, b, x \in I$ such that $a < x < b$ and define the first time the process $X$ reaches either end-point of the interval $[a, b]$ by

$$\tau_{ab} = \inf\{t \ge 0 \mid X_t \in \{a, b\}\}$$

then, if we assume that $X_t$ is regular, it follows that

$$P(\tau_{ab} < \infty \mid X_0 = x) = 1.$$

By the fact that $Y$ is a martingale it follows quite easily that the probability of $X$ reaching $b$ before $a$ is given by

$$P(X_{\tau_{ab}} = b \mid X_0 = x) = \frac{s(x) - s(a)}{s(b) - s(a)}.$$
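The exit probability above can be evaluated numerically once the drift and diffusion functions are given. The sketch below is one possible implementation under stated assumptions: the scale density is taken as $s'(x) = \exp(-2\int_{x_0}^{x}\mu(z)/\sigma^2(z)\,dz)$ with $K = 1$ and reference point $x_0$ equal to the lower end-point, and all integrals are approximated with a composite trapezoid rule. The function names and the CIR parameter values in the usage lines are illustrative choices, not taken from the text.

```python
import math


def integrate(f, lo, hi, n=200):
    """Composite trapezoid rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


def scale_density(mu, sigma, x, x0):
    """s'(x) = exp(-2 * int_{x0}^{x} mu(z) / sigma(z)^2 dz), with the constant K set to 1."""
    return math.exp(-2.0 * integrate(lambda z: mu(z) / sigma(z) ** 2, x0, x))


def prob_hit_upper_first(mu, sigma, lower, upper, x):
    """P(X reaches upper before lower | X_0 = x) = (s(x) - s(lower)) / (s(upper) - s(lower))."""
    sp = lambda y: scale_density(mu, sigma, y, lower)
    return integrate(sp, lower, x) / integrate(sp, lower, upper)


# Driftless case: the scale function is linear, so the answer is (x - lower) / (upper - lower).
p_driftless = prob_hit_upper_first(lambda z: 0.0, lambda z: 1.0, 0.0, 2.0, 0.5)

# CIR-type coefficients with illustrative values a = 2, b = 3, sigma = 1.
p_cir = prob_hit_upper_first(lambda z: 2.0 * (3.0 - z), lambda z: math.sqrt(z), 1.0, 5.0, 3.0)
```

The ratio does not depend on the constant $K$ or on the choice of $x_0$, which is why both can be fixed for convenience.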

For the scale function $s$ given by $s'(x) = \exp\left(-2\int_{x_0}^{x}\frac{\mu(z)}{\sigma^2(z)}\,dz\right)$ we introduce the density of the speed measure by the function $m$

$$m(x) = \frac{1}{\sigma^2(x)\,s'(x)}. \tag{2.31}$$

The scale function and the speed measure turn out to be important in deriving conditions ensuring that a solution to a given SDE will be both bounded and ergodic.

2.4.2 Bounded Solutions

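Assume that the solution to the SDE, $X$, is defined on the open interval $I = (l, r)$ where we may have $l = -\infty$ and/or $r = +\infty$. We define the first time the process reaches either boundary by

$$\tau = \inf\{t \ge 0 \mid X_t \in \{l, r\}\} = \inf\{t \ge 0 \mid X_t = l \text{ or } X_t = r\} \tag{2.32}$$

We define the following two integrals, which may not be finite:

$$I_1(x) = \int_{x_0}^{x} s'(y)\,dy = \int_{x_0}^{x}\exp\left(-2\int_{x_0}^{y}\frac{\mu(z)}{\sigma^2(z)}\,dz\right)dy$$

$$I_2(x) = \int_{x}^{x_0} s'(y)\,dy = \int_{x}^{x_0}\exp\left(-2\int_{x_0}^{y}\frac{\mu(z)}{\sigma^2(z)}\,dz\right)dy$$

where $x_0 \in (l, r)$ is fixed.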




It now follows directly from Theorem 1, Chapter 4, section 16 in Gihman and Skorohod [1972] that we can state conditions ensuring that $X$ does not hit the boundary in finite time.

Theorem 2.8 (Bounded Solutions)
Assume that the coefficients of the SDE are time-invariant and such that a unique strong solution exists. If $I_1(r) = I_2(l) = +\infty$ then

$$P(\tau = +\infty \mid X_0 = x) = 1, \quad \forall x \in I$$

2.4.3 Conditions for an Ergodic Solution

As noted above, a necessary condition for the existence of an invariant distribution is that the process is homogeneous, which is guaranteed by the time-invariant drift and diffusion functions.

An equally important condition is that the process stays finite with probability one, where finite is meant in the sense of Theorem 2.8.

If we impose the restriction that the drift function should be identically zero, we know from above that the solution (when one exists) is a martingale. If we for now restrict ourselves to this case, we can use the scale function as defined above to transform to the general case; the details are given below.

Consider the one-dimensional stochastic differential equation

$$dY_t = a(Y_t)\,dW_t.$$

It turns out that working with this stochastic differential equation will simplify things considerably. In fact we can now quote from Chapter 4, section 18 of Gihman and Skorohod [1972]. Assume that

• $a(\cdot)$ satisfies a Lipschitz condition such that a unique strong solution $Y$ exists,
• the condition in Theorem 2.8 is satisfied,
• $\int_l^r \frac{1}{a^2(s)}\,ds < \infty$,
• $\pi$ is a probability density proportional to $\frac{1}{a^2(y)}$.

Assume also that the process $Y$ satisfies the initial condition that $Y_0$ is distributed according to the density $\pi$. Then $Y$ is stationary and ergodic and the invariant distribution has density $\pi$ with respect to the Lebesgue measure.

For any initial condition $Y_0 = y_0$ where $l < y_0 < r$, the distribution given by the density $\pi$ is a limiting distribution in the sense that

$$\lim_{t\to\infty} P(Y_t \le y \mid Y_0 = y_0) = \int_l^y \pi(z)\,dz.$$

We are now ready to state the general case.

Theorem 2.9 (Ergodic Solution)
Consider the stochastic differential equation

$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t$$

• Assume that a unique strong solution exists.
• Assume that the conditions in Theorem 2.8 hold, that is

$$\int_{x_0}^{r}\exp\left(-2\int_{x_0}^{y}\frac{\mu(z)}{\sigma^2(z)}\,dz\right)dy = \int_{l}^{x_0}\exp\left(-2\int_{x_0}^{y}\frac{\mu(z)}{\sigma^2(z)}\,dz\right)dy = \infty$$

• Assume that $\int_l^r m(x)\,dx < \infty$ where $m(\cdot)$ is the speed function defined above.

Then a probability distribution is given by the density $\pi$ (with respect to the Lebesgue measure) proportional to the speed measure

$$\pi(x) = \frac{K}{\sigma^2(x)}\exp\left(\int_{x_0}^{x}\frac{2\mu(z)}{\sigma^2(z)}\,dz\right) \tag{2.33}$$

where $K$ is a constant such that $\int_l^r \pi(x)\,dx = 1$.

Let $X_0$ be distributed according to the density $\pi$. Then $X$ is stationary and ergodic with an invariant measure with density $\pi$. It holds for any $x_0 \in (l, r)$ that

$$\lim_{t\to\infty} P(X_t \le x \mid X_0 = x_0) = \int_l^x \pi(z)\,dz.$$

Proof All the work has been done in Chapter 4, section 18 in Gihman and Skorohod [1972]. Define $Y_t = s(X_t)$ where $s$ is the scale function for the SDE. Since $s$ is continuous and monotone, the process $Y$ is well defined and the distributional properties of interest exist simultaneously for $X$ and $Y$, see page 135 in Gihman and Skorohod [1972].

We already know that $Y$ follows the SDE

$$dY_t = a(Y_t)\,dW_t$$

with $a(Y_t) = \sigma\left(s^{-1}(Y_t)\right)s'\left(s^{-1}(Y_t)\right)$.

By the substitution $y = s(x)$, $dy = s'(x)\,dx$, we have

$$\int_{s(l)}^{s(r)}\frac{1}{a^2(y)}\,dy = \int_{s(l)}^{s(r)}\frac{1}{\left(\sigma(s^{-1}(y))\,s'(s^{-1}(y))\right)^2}\,dy = \int_l^r \frac{s'(x)}{\left(\sigma(x)\,s'(x)\right)^2}\,dx = \int_l^r m(x)\,dx < \infty.$$

Therefore $Y$ is ergodic with stationary distribution given by the density function

$$\pi_Y(y) = \frac{K}{a^2(y)}, \quad K \text{ constant}$$

by the theorem in Gihman and Skorohod [1972] mentioned above.

Now all that is left is to determine an expression for the density of the invariant distribution for $X_t = s^{-1}(Y_t)$. By the rule of transformation of densities, see Theorem 10.3 in Hansen [2001], we have the invariant density of $X$

$$\pi(x) = \pi_Y\left(s(x)\right)\left|s'(x)\right| = \frac{K}{a^2(s(x))}\,s'(x) = \frac{K}{\sigma^2(x)\left(s'(x)\right)^2}\,s'(x) = \frac{K}{\sigma^2(x)}\exp\left(\int_{x_0}^{x}\frac{2\mu(z)}{\sigma^2(z)}\,dz\right),$$

where we have used the definition of $a(\cdot)$ and the functional form of the derivative of $s$. We note that the invariant density is proportional to the density of the speed measure; hence the condition that the function $m(\cdot)$ is integrable is natural, as it ensures that it can be scaled to be a probability density.

2.4.4 Exponential Ergodicity

When working with discrete time homogeneous Markov chains, a useful and well known tool to prove stationarity and ergodicity is the drift criterion. Using the drift criterion in discrete time it is possible to derive conditions under which the Markov chain is not only ergodic but geometrically ergodic. That is, let $X_n$ be a discrete time Markov chain with state space $\mathcal{X}$, let $P^n(A|x) = P(X_n \in A \mid X_0 = x)$ be the $n$ step transition probability and $\pi$ the invariant measure. The discrete time drift criterion then provides conditions for which it holds that

$$\lim_{n\to\infty}\rho^{-n}\,\|P^n(\cdot|x) - \pi\| = 0, \quad \forall x \in \mathcal{X}, \tag{2.34}$$

where $|\rho| < 1$ and $\|f\| = \sup\left\{\left|\int_{\mathcal{X}} g(x)\,df(x)\right| : |g(x)| \le 1\right\}$.

Similar results can be obtained for continuous time Markov chains. These results may seem somewhat more complicated to work with than their discrete time counterparts. In this section we attempt to use the results from the general continuous time Markov chain theory to examine the case where the Markov process in question is known to be a solution to a stochastic differential equation.

The author is not aware of any literature where the continuous time Markov chain results are used to derive conditions for the class of Markov chains that are solutions to stochastic differential equations.

Necessary continuous time Markov chain theory

We start by introducing the necessary continuous time Markov chain theory. For simplicity, and because we aim to use the results on real valued stochastic differential equations, we consider the case of a Markov process with values in $\mathbb{R}$.

Let $X_t$ be a continuous time non-explosive Markov chain with state space $\mathcal{X} \subseteq \mathbb{R}$. As in discrete time we let $P^t(A|x) = p_{t,0}(x, A) = P(X_t \in A \mid X_0 = x)$. We seek conditions implying that $X_t$ is exponentially ergodic, that is, an invariant measure $\pi$ exists such that

$$\|P^t(\cdot|x) - \pi\| \le M(x)\rho^t, \quad \forall t \ge 0,\ \forall x \in \mathcal{X}, \tag{2.35}$$

where $M(x)$ is finite and $|\rho| < 1$.

As shown in Down et al. [1995] the process may converge exponentially quickly in the strong sense of $V$-uniform ergodicity: For any measurable function $V : \mathcal{X} \to [1, \infty)$, Down et al. [1995] define exponential ergodicity by $V$-uniform ergodicity, which is given by

$$\|P^t(\cdot|x) - \pi\|_V \le V(x)\,D\rho^t, \quad \forall t \ge 0,\ \forall x \in \mathcal{X}, \tag{2.36}$$

where $D < \infty$ is a constant, $|\rho| < 1$ and the $V$-norm $\|\cdot\|_V$ is defined for a measure $\nu$ by

$$\|\nu\|_V = \sup\left\{\left|\int_{\mathcal{X}} g(x)\,d\nu(x)\right| : |g| \le V\right\} \tag{2.37}$$

We thus note that the $V$-norm is equal to the total variation norm when $V \equiv 1$.




A final definition needed before we are able to state and prove the main result of this section is the extended generator. Let $f : \mathcal{X} \times \mathbb{R}_+ \to \mathbb{R}$, and assume that a measurable function $g : \mathcal{X} \times \mathbb{R}_+ \to \mathbb{R}$ exists such that

$$E[f(X_t, t) \mid X_0 = x] = f(x, 0) + E\left[\int_0^t g(X_s, s)\,ds \,\Big|\, X_0 = x\right] \tag{2.38}$$

$$\int_0^t E\left[\,|g(X_s, s)| \mid X_0 = x\right]ds < \infty. \tag{2.39}$$

In this case we write $\tilde{A}f = g$, and $\tilde{A}$ is called the extended generator of the Markov process $X_t$. For a given Markov chain we define $D(\tilde{A})$ to be the set of all functions $f$ for which a function $g$ as above exists. $D(\tilde{A})$ is referred to as the domain of $\tilde{A}$. Clearly the name extended generator hints at the fact that this definition is an extension of the definition of an infinitesimal generator for general Markov chains. As we shall see below, the same is true when we turn our attention back to stochastic differential equations. Here we have a well defined expression for the infinitesimal generator, and the relationship between this generator and the extended generator will be explored below. We note that the extended generator is only defined for non-explosive (i.e. bounded) processes.

Assume that the Markov chain satisfies the regularity conditions $\psi$-irreducibility and aperiodicity (see pages 1674-1675 in Down et al. [1995] for a recapitulation of these issues).

We can now state from Down et al. [1995], Theorem 5.2 (c): Let $b > 0$, $c > 0$ be constants, let $V : \mathcal{X} \to [1, \infty)$ be a real valued function and let $C$ be a petite$^4$ Borel set on $\mathcal{X}$. If the drift condition

$$\tilde{A}V(x) \le -cV(x) + b\,1_C(x) \tag{2.40}$$

is satisfied then $X_t$ is $V$-uniformly ergodic.

Implementation on SDE setting

We now attempt to use the theory presented above on the particular case where the Markov process is a weak solution to the stochastic differential equation with state space $\mathcal{X} \subseteq \mathbb{R}$

$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t. \tag{2.41}$$

To simplify the presentation we maintain the following assumptions for the rest of this section.

$^4$ The definition of a petite set is similar to that of a small set, see page 1674 in Down et al. [1995]. Given continuity conditions on $P(\cdot|x)$ it follows that all compact sets are petite, Tweedie and Pollard [1976]. When this is the case we refer to the Markov chain as a T-chain.




Assumption 2.1

• Assume that a weak solution, $X_t$, to (2.41) exists.
• Assume that $X_t$ is bounded.
• Assume also that $X_t$ is an irreducible T-chain.

These assumptions thus impose restrictions on the processes for which the following is true. In particular we note that the T-chain assumption is similar to the usual regularity conditions, including continuous transition densities, from the discrete time drift criterion.

We can now show the following important result, which simply states that for sufficiently smooth functions we can state an explicit expression satisfying the conditions for the extended generator.

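Lemma 2.1 (The extended generator)
Let $A$ be the infinitesimal operator for the process $X_t$ as given by Definition 2.4. Let $f : \mathbb{R} \to \mathbb{R}$ be a $C^2$ function such that

$$\int_0^t E\left[\left|\mu(X_s)\frac{\partial f}{\partial x}(X_s)\right| \,\Big|\, X_0 = x\right]ds < \infty, \qquad \int_0^t E\left[\left|\sigma^2(X_s)\frac{\partial^2 f}{\partial x^2}(X_s)\right| \,\Big|\, X_0 = x\right]ds < \infty \tag{2.42}$$

then

$$Af = \tilde{A}f. \tag{2.43}$$

Proof We need to show that $Af$ satisfies (2.38); condition (2.39) follows directly from the assumptions in the lemma.

From Itô's lemma we have

$$df(X_t) = Af(X_t)\,dt + \sigma(X_t)\frac{\partial f}{\partial x}(X_t)\,dW_t$$

This gives

$$f(X_t) = f(x_0) + \int_0^t Af(X_s)\,ds + \int_0^t \sigma(X_s)\frac{\partial f}{\partial x}(X_s)\,dW_s.$$

Taking the conditional mean, and using that the stochastic integral has mean zero when it is a martingale, gives us

$$E[f(X_t) \mid X_0 = x] = f(x) + E\left[\int_0^t Af(X_s)\,ds \,\Big|\, X_0 = x\right]$$

which is (2.38) with $g = Af$.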




The lemma thus explains the name extended generator in this case.

We are now ready to prove the main result of this section.

Theorem 2.10 (Exponential ergodicity)
Let $V : \mathcal{X} \to [1, \infty)$ be a deterministic function such that (2.42) is satisfied. Assume that there exist constants $c > 0$ and $b > 0$ and a compact set $C \subset \mathcal{X}$ such that

$$AV(x) \le -cV(x) + b\,1_C(x) \tag{2.44}$$

then the solution to the SDE (2.41) is $V$-uniformly ergodic.

Proof From Lemma 2.1 we note that the infinitesimal operator is also an extended generator in this setting. This means that all we need to prove the theorem is to verify the conditions from (2.40) necessary to imply uniform ergodicity in the general case. However, this follows directly from the assumptions made in regards to the existence of the solution to the SDE.

Necessary conditions on the drift and diffusion functions for existence and properties of this solution are discussed earlier in this chapter. All we need is to require that any one of the existence and uniqueness theorems is satisfied, as well as regularity conditions on the drift and diffusion functions ensuring that the process is irreducible and aperiodic and that the process does not explode in finite time, see for instance Theorem 2.8.

The theorem is similar to the drift criterion known from discrete time: any drift function can be used. The only requirements on $V$ are that it is smooth enough for Itô's lemma to apply and bounded such that (2.42) is satisfied.

We end this section with a particular choice of drift function.

Corollary 2.1 (Explicit drift function)
Consider the drift function $V(x) = 1 + x^2$. Sufficient conditions for $V$-uniform ergodicity are that there exist constants $c > 0$ and $b > 0$ and a compact set $C$ such that

$$2\mu(x)x + \sigma^2(x) \le -c(1 + x^2) + b\,1_C(x).$$

Proof For the drift function $V(x) = 1 + x^2$ we have $\frac{\partial V}{\partial x} = 2x$ and $\frac{\partial^2 V}{\partial x^2} = 2$. This gives

$$AV(x) = \mu(x)\frac{\partial V}{\partial x}(x) + \frac{1}{2}\sigma^2(x)\frac{\partial^2 V}{\partial x^2}(x) = 2\mu(x)x + \sigma^2(x)$$

and the result follows.
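The computation of $AV$ in the proof can be checked numerically. The following sketch approximates the infinitesimal operator $Af(x) = \mu(x) f'(x) + \frac{1}{2}\sigma^2(x) f''(x)$ with central finite differences and compares it with the closed form $2\mu(x)x + \sigma^2(x)$ for $V(x) = 1 + x^2$. The helper name `generator`, the step size and the CIR-type coefficients ($a = 2$, $b = 3$, $\sigma = 1$) are assumptions made only for this illustration.

```python
import math


def generator(mu, sigma, f, x, h=1e-3):
    """Approximate Af(x) = mu(x) f'(x) + 0.5 sigma(x)^2 f''(x) by central differences."""
    d1 = (f(x + h) - f(x - h)) / (2.0 * h)
    d2 = (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)
    return mu(x) * d1 + 0.5 * sigma(x) ** 2 * d2


# Illustrative CIR-type coefficients: mu(x) = a (b - x), sigma(x) = sigma * sqrt(x).
mu = lambda x: 2.0 * (3.0 - x)
sigma = lambda x: math.sqrt(x)
V = lambda x: 1.0 + x * x

x = 2.0
numeric = generator(mu, sigma, V, x)
closed_form = 2.0 * mu(x) * x + sigma(x) ** 2
```

For a quadratic $V$ the central differences are exact up to rounding, so the two values should agree closely; for other candidate drift functions the same helper gives a quick way to inspect where a drift condition of the form (2.44) might hold.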




[Figure: simulated sample path labelled CIR; horizontal axis from 0 to 100, vertical axis from 1 to 6.]

Figure 2.1: A simulated sample path of the CIR process with parameter values $a = 2$, $b = 3$ and $\sigma = 1$ for time index 0 to 100, simulated according to the Milstein scheme with $\Delta t = 0.001$ and 100000 observations, see Seydel [2002] page 86.
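A path like the one described in the caption could be produced along the following lines. This is a minimal sketch of a Milstein step for the CIR dynamics $dr_t = a(b - r_t)\,dt + \sigma\sqrt{r_t}\,dW_t$, not the code used for the figure: for $\sigma(r) = \sigma\sqrt{r}$ the Milstein correction term $\frac{1}{2}\sigma(r)\sigma'(r)(\Delta W^2 - \Delta t)$ reduces to $\frac{1}{4}\sigma^2(\Delta W^2 - \Delta t)$. The function name, the seed, the starting value and the truncation at zero are assumptions introduced for the example.

```python
import math
import random


def simulate_cir_milstein(a, b, sigma, r0, dt, n_steps, seed=0):
    """Simulate one CIR path with the Milstein scheme; returns n_steps + 1 values."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    r = r0
    path = [r]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, sqrt_dt)  # Brownian increment over one step
        r = (r
             + a * (b - r) * dt
             + sigma * math.sqrt(r) * dw
             + 0.25 * sigma ** 2 * (dw * dw - dt))
        r = max(r, 0.0)  # ad hoc safeguard (assumption): keep the square root well defined
        path.append(r)
    return path


# Parameter values from the caption: a = 2, b = 3, sigma = 1, dt = 0.001, 100000 steps.
path = simulate_cir_milstein(a=2.0, b=3.0, sigma=1.0, r0=3.0, dt=0.001, n_steps=100000)
sample_mean = sum(path) / len(path)
```

With these parameter values the sample mean of a long path should lie near the long term mean $b$, although the exact value depends on the seed.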

The corollary presents a particularly simple example of conditions sufficient for the results to follow. However, no results guarantee that this particular choice is the best for a given stochastic differential equation. Similar to the discrete case, the actual drift function should be chosen on a case by case basis.

Example 2.3 (The CIR process continued)

We complete this chapter by returning to the CIR process mentioned above, that is, the process given by the SDE defined on $I = (0, \infty)$

$$dr_t = a(b - r_t)\,dt + \sigma\sqrt{r_t}\,dW_t, \quad a, b, \sigma \in \mathbb{R}_+$$

We have already seen that a strong solution exists and, as stated above, no closed form expression for the solution can be found. Now we discuss the CIR process in more detail; this will illustrate the use of the theorems above.

We start by noting that this process exhibits mean reversion towards $b$ with the rate $a$: when $|b - r_t|$ is large, the $dt$-term will dominate the Itô term and push the process towards $b$.

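We start the analysis by determining the first derivative of the scale function and solving for the speed measure; for now we omit the constant in the expression for $s'$:

$$s'(x) = \exp\left(-2\int_1^x \frac{\mu(z)}{\sigma^2(z)}\,dz\right) = x^{-\frac{2ab}{\sigma^2}}\exp\left(\frac{2a(x - 1)}{\sigma^2}\right)$$

Theorem 2.8 gives that if $r_0 \in I = (0, \infty)$ then $r_t > 0$ and $r_t$ is finite for all $t$ with probability one if

$$\int_0^1 s'(x)\,dx = \int_1^\infty s'(x)\,dx = \infty$$

where the value $x_0 = 1$, used in the definition of $s'$ and as limit of the integrals, has been chosen arbitrarily. The only requirement, according to Theorem 2.8, is that $x_0 \in I$, so $x_0 = 1$ is as good a choice as any. This choice will help a bit when evaluating the integrals. We see that in the first integral the important term is $x^{-\frac{2ab}{\sigma^2}}$, since the exponential part is clearly increasing. In other words we have that

$$\exp\left(-\frac{2a}{\sigma^2}\right)\int_0^1 x^{-\frac{2ab}{\sigma^2}}\,dx \le \int_0^1 x^{-\frac{2ab}{\sigma^2}}\exp\left(\frac{2a(x - 1)}{\sigma^2}\right)dx \le \int_0^1 x^{-\frac{2ab}{\sigma^2}}\,dx$$

and we know that $\int_0^1 x^{-\frac{2ab}{\sigma^2}}\,dx = \infty$ if and only if $\frac{2ab}{\sigma^2} \ge 1$. So a necessary condition for Theorem 2.8 to hold is $2ab \ge \sigma^2$. Likewise

$$\int_1^\infty x^{-\frac{2ab}{\sigma^2}}\,dx \le \int_1^\infty x^{-\frac{2ab}{\sigma^2}}\exp\left(\frac{2a(x - 1)}{\sigma^2}\right)dx$$

and since the integrand on the right-hand side grows exponentially in $x$, the second integral is infinite for all $a, \sigma > 0$. We therefore see that a necessary and sufficient condition for Theorem 2.8 to hold for the CIR process is $2ab \ge \sigma^2$.

We now turn to the integral of the speed function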

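$$\int_0^\infty m(x)\,dx = \int_0^\infty \frac{1}{\sigma^2(x)\,s'(x)}\,dx = \int_0^\infty x^{\frac{2ab}{\sigma^2} - 1}\exp\left(-\frac{2a}{\sigma^2}x + \frac{2a}{\sigma^2}\right)\frac{1}{\sigma^2}\,dx = e^{\frac{2a}{\sigma^2}}\frac{1}{\sigma^2}\int_0^\infty x^{\frac{2ab}{\sigma^2} - 1}e^{-\frac{2a}{\sigma^2}x}\,dx$$

Define $\nu = \frac{2ab}{\sigma^2} > 0$ and $\omega = \frac{2a}{\sigma^2} > 0$ and we have

$$\int_0^\infty m(x)\,dx = e^{\omega}\frac{1}{\sigma^2}\int_0^\infty x^{\nu - 1}e^{-\omega x}\,dx < \infty$$

where we recognize the gamma integral (or Euler's second integral).

So now we have that the CIR process satisfies Theorem 2.9 and hence $r_t$ is ergodic. We also know that the invariant distribution has density proportional to the speed function, which means that we can determine the exact density by

$$1 = \int_0^\infty K m(x)\,dx = K e^{\omega}\frac{1}{\sigma^2}\int_0^\infty x^{\nu - 1}e^{-\omega x}\,dx = K e^{\omega}\frac{1}{\sigma^2}\,\Gamma(\nu)\,\frac{1}{\omega^{\nu}}$$

so we have

$$K = \frac{e^{-\omega}\sigma^2\omega^{\nu}}{\Gamma(\nu)}$$

and we conclude that the invariant measure has the following density

$$\pi(x) = K m(x) = \frac{\omega^{\nu}}{\Gamma(\nu)}\,x^{\nu - 1}e^{-\omega x}.$$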

Finally we note that we cannot use the drift criterion to derive weaker conditions for ergodicity. Since the condition $2ab \ge \sigma^2$ is necessary for the process to be bounded, it will also be necessary for the drift criterion, whichever drift function we may use.

We can thus conclude this example with the following result: Consider the CIR process

$$dr_t = a(b - r_t)\,dt + \sigma\sqrt{r_t}\,dW_t, \quad a, b, \sigma \in \mathbb{R}_+$$

If $2ab \ge \sigma^2$ then the solution of the SDE satisfies the following:

• $P(r_t > 0 \mid r_0 > 0) = 1$.
• $r_t$ is finite with probability one.
• $r$ is ergodic with stationary measure a gamma distribution with parameters $\left(\nu, \frac{1}{\omega}\right)$, where $\nu = \frac{2ab}{\sigma^2} > 0$ and $\omega = \frac{2a}{\sigma^2} > 0$.

In particular we have that the stationary mean is given by $\frac{\nu}{\omega} = b$, which we already suspected based on the comments about mean reversion above.
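The concluding result can be sanity-checked numerically. The sketch below evaluates the gamma density with shape $2ab/\sigma^2$ and rate $2a/\sigma^2$, refuses parameter values violating $2ab \ge \sigma^2$, and approximates the total mass and the mean with a midpoint rule. The function names, the integration range and the parameter values ($a = 2$, $b = 3$, $\sigma = 1$) are assumptions made for the illustration.

```python
import math


def cir_stationary_density(x, a, b, sigma):
    """Gamma density with shape 2ab/sigma^2 and rate 2a/sigma^2, for x > 0."""
    if 2.0 * a * b < sigma ** 2:
        raise ValueError("the result above requires 2ab >= sigma^2")
    shape = 2.0 * a * b / sigma ** 2
    rate = 2.0 * a / sigma ** 2
    log_pdf = (shape * math.log(rate) - math.lgamma(shape)
               + (shape - 1.0) * math.log(x) - rate * x)
    return math.exp(log_pdf)


def midpoint_moments(a, b, sigma, upper=20.0, n=20000):
    """Approximate total mass and mean of the density on (0, upper) by the midpoint rule."""
    h = upper / n
    mass = 0.0
    mean = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        p = cir_stationary_density(x, a, b, sigma)
        mass += p * h
        mean += x * p * h
    return mass, mean


mass, mean = midpoint_moments(a=2.0, b=3.0, sigma=1.0)
```

The mass should be close to one and the mean close to $b$, in line with the stationary mean stated above; the upper integration limit is chosen large enough that the neglected tail is negligible for these parameter values.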



Chapter 3

Parameter estimation in diffusionmodelsConsider the following one-dimensional stochastic differential equation

dXt = (Xt; ) dt + (Xt; ) dWt, X0 = x0 (3.1)

where we assume that the functions $\mu$ and $\sigma$ are known except for the value of the $d$-dimensional parameter $\theta$, which belongs to some subset $\Theta \subseteq \mathbb{R}^d$. To illustrate, one can think of the Cox-Ingersoll-Ross model given by the SDE (2.18); in this case the parameter would be $\theta = (a, b, \sigma)^T$ and $\Theta = \mathbb{R}^3_{++} = \{x \in \mathbb{R}^3 \mid x_i > 0,\ i = 1, \ldots, 3\}$. We will mostly restrict ourselves to the case where the functions $\mu$ and $\sigma$ do not depend on time; however, many of the results can be modified to cover the more general case of diffusions that are not time-homogeneous.

We will implicitly assume throughout the remainder that the functions $\mu$ and $\sigma$ satisfy the conditions of one of the existence and uniqueness theorems stated previously.

Under the regularity conditions discussed earlier the solution to the stochastic differential equation will be a homogeneous diffusion process. In particular it will satisfy the Markov property, which will be a help when considering the likelihood function for $\theta$. In many empirical implementations of diffusion models we will not be able to maintain a continuous record of the process of interest; the available data will consist of discretely sampled observations. In all of the following the main task is to draw inference about the parameter $\theta$ based on observations of the process $X$, where $X$ is assumed to satisfy the stochastic differential equation above. We assume that we have a finite number of observations of $X$ at discrete time points, that is

$$X_{t_0}, X_{t_1}, \ldots, X_{t_n}, \qquad \text{where } t_0 < t_1 < \ldots < t_n$$

As mentioned in Chapter 2, if $f_{s,t}(X_s, X_t; \theta)$ is the density of $X_t$ given $X_s$, $s < t$, when the true parameter value is $\theta$, then the likelihood function for $\theta$ based on the observations is

$$L_n(\theta) = \prod_{i=1}^{n} f_{t_{i-1},t_i}\left(X_{t_{i-1}}, X_{t_i}; \theta\right).$$
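The product structure of the likelihood translates directly into a sum of log transition densities. The following Python sketch illustrates this for an arbitrary user-supplied transition density; the names `log_likelihood` and `bm_log_density` are hypothetical, and the Brownian motion with drift is only a toy model chosen because its Gaussian transition density is known in closed form. It is not one of the models considered in this thesis.

```python
import math

def log_likelihood(times, obs, log_transition_density, theta):
    """Sum of log f_{t_{i-1}, t_i}(X_{t_{i-1}}, X_{t_i}; theta) over i = 1, ..., n."""
    total = 0.0
    for i in range(1, len(obs)):
        total += log_transition_density(times[i - 1], times[i], obs[i - 1], obs[i], theta)
    return total

def bm_log_density(s, t, x, y, theta):
    """Toy example: Brownian motion with drift m and volatility v, theta = (m, v)."""
    m, v = theta
    var = v * v * (t - s)                      # conditional variance of X_t given X_s
    return -0.5 * math.log(2.0 * math.pi * var) - (y - x - m * (t - s)) ** 2 / (2.0 * var)

times = [0.0, 0.5, 1.5, 2.0]                   # observation times need not be equidistant
obs = [1.0, 1.2, 0.9, 1.4]                     # made-up observations
value = log_likelihood(times, obs, bm_log_density, (0.1, 0.5))
print(value)
```

Only the Markov property is used here: each term depends on the data through one pair of consecutive observations.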

In some cases we can use e.g. the Kolmogorov (forward and backward) differential equations to derive explicit expressions for the transition densities; this is


for instance true for the CIR model. However, the CIR model and other models simple enough to guarantee explicit solutions are seldom in good agreement with empirical observations. In the following we will to some extent continue to use the CIR process as an illustrative example for the procedures considered. This may seem a bit tedious, as we have just stated that we may be able to exploit the analytical expressions for the transition densities to perform maximum likelihood. It is clear, though, that using this approach will severely limit the number of models we can work with. To work around this problem we introduce inference based on estimating functions. One might consider the maximum likelihood method mentioned above to be a special case of this method, as the estimating function in this case is the score function.

For completeness of the presentation we start with an attempt at estimating $\theta$ by using an Euler discretisation of the process $X$ solving (3.1).

3.1 Approximating the likelihood function

Assume for simplicity in the following section that we have equidistant observations, i.e. $t_i - t_{i-1} = \Delta$, $i = 1, \ldots, n$. Consider the Euler scheme for approximating the solution to (3.1), where $\Delta W_i = W_{t_i} - W_{t_{i-1}}$,

$$X_{t_i} = X_{t_{i-1}} + \mu\left(X_{t_{i-1}}; \theta\right)\Delta + \sigma\left(X_{t_{i-1}}; \theta\right)\Delta W_i \tag{3.2}$$

This scheme converges strongly to the solution of the SDE when $\Delta \to 0$ with order 1/2 in the sense of Seydel [2002] Definition 3.3.

We say that a discrete time approximation, $X^{\Delta}_t$, converges strongly with order $\gamma > 0$ to the true solution of an SDE, $X_t$, on the closed interval $[0, T]$ if

$$E\left|X_T - X^{\Delta}_T\right| = O(\Delta^{\gamma}). \tag{3.3}$$
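The Euler recursion is straightforward to implement. The sketch below simulates a path on an equidistant grid for generic drift and diffusion functions and applies it to CIR coefficients; the function names, the seed, and the use of `max(x, 0)` under the square root are assumptions of this illustration, not part of the scheme as stated above.

```python
import math
import random

def euler_path(mu, sigma, x0, delta, n, theta, rng):
    """Simulate X_{t_0}, ..., X_{t_n} on an equidistant grid with the Euler scheme."""
    path = [x0]
    for _ in range(n):
        x = path[-1]
        dw = rng.gauss(0.0, math.sqrt(delta))     # Brownian increment ~ N(0, delta)
        path.append(x + mu(x, theta) * delta + sigma(x, theta) * dw)
    return path

def cir_mu(x, theta):
    """CIR drift a(b - x) with theta = (a, b, sigma)."""
    a, b, _ = theta
    return a * (b - x)

def cir_sigma(x, theta):
    """CIR diffusion sigma * sqrt(x); max(x, 0) guards the square root because
    an Euler path, unlike the true process, can become negative."""
    return theta[2] * math.sqrt(max(x, 0.0))

rng = random.Random(2004)                          # fixed seed, for reproducibility only
path = euler_path(cir_mu, cir_sigma, 2.0, 0.01, 1000, (1.0, 2.0, 1.0), rng)
print(len(path), path[0])
```

Passing the random number generator explicitly keeps the simulation reproducible and makes it easy to rerun the same path with a different step size.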

Using the Euler scheme (3.2) we can approximate the transition probabilities by the Gaussian distribution, that is, for a process that does indeed satisfy (3.2)

$$X_{t_i} \mid X_{t_{i-1}} = x \;\sim\; N\left(x + \mu(x; \theta)\Delta,\ \sigma^2(x; \theta)\Delta\right)$$

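For this approximation to the real process we can easily derive an approximation to the log likelihood function

$$\ell_n(\theta) = \log(L_n(\theta)) = -\sum_{i=1}^{n}\left[\frac{1}{2}\log\left(2\pi\Delta\sigma^2(X_{t_{i-1}}; \theta)\right) + \frac{\left(X_{t_i} - X_{t_{i-1}} - \mu(X_{t_{i-1}}; \theta)\Delta\right)^2}{2\Delta\sigma^2(X_{t_{i-1}}; \theta)}\right]$$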
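A direct transcription of this Gaussian approximation to the log likelihood might look as follows. This is a minimal sketch assuming equidistant observations; the function names and the made-up observations are hypothetical, and the CIR coefficients are used only as an example of a drift and a squared diffusion function.

```python
import math

def euler_log_likelihood(obs, delta, mu, sigma2, theta):
    """Gaussian (Euler) approximation to the log likelihood for equidistant data.

    mu(x, theta) is the drift and sigma2(x, theta) the squared diffusion function.
    """
    total = 0.0
    for x_prev, x_next in zip(obs[:-1], obs[1:]):
        var = sigma2(x_prev, theta) * delta                # conditional variance
        resid = x_next - x_prev - mu(x_prev, theta) * delta
        total -= 0.5 * math.log(2.0 * math.pi * var) + resid**2 / (2.0 * var)
    return total

def cir_mu(x, theta):
    """CIR drift a(b - x) with theta = (a, b, sigma)."""
    return theta[0] * (theta[1] - x)

def cir_sigma2(x, theta):
    """CIR squared diffusion sigma^2 x."""
    return theta[2] ** 2 * x

obs = [2.0, 2.3, 1.9, 2.1, 2.4]                            # made-up observations
print(euler_log_likelihood(obs, 0.5, cir_mu, cir_sigma2, (1.0, 2.0, 1.0)))
```

Such a function can be handed to a numerical optimiser when the score equations cannot be solved explicitly.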



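Assuming all partial derivatives exist we get the score function corresponding to the Euler approximation, where we let $\partial_\theta$ denote the column vector of partial derivatives with respect to $\theta$

$$S_n(\theta) = \sum_{i=1}^{n}\left[-\frac{1}{2}\frac{\partial_\theta\sigma^2(X_{t_{i-1}}; \theta)}{\sigma^2(X_{t_{i-1}}; \theta)} + \frac{\left(X_{t_i} - X_{t_{i-1}} - \mu(X_{t_{i-1}}; \theta)\Delta\right)\partial_\theta\mu(X_{t_{i-1}}; \theta)}{\sigma^2(X_{t_{i-1}}; \theta)} + \frac{\left(X_{t_i} - X_{t_{i-1}} - \mu(X_{t_{i-1}}; \theta)\Delta\right)^2\partial_\theta\sigma^2(X_{t_{i-1}}; \theta)}{2\Delta\sigma^4(X_{t_{i-1}}; \theta)}\right]$$

$$= \sum_{i=1}^{n}\left[\frac{\partial_\theta\sigma^2(X_{t_{i-1}}; \theta)}{2\Delta\sigma^4(X_{t_{i-1}}; \theta)}\left(\left(X_{t_i} - X_{t_{i-1}} - \mu(X_{t_{i-1}}; \theta)\Delta\right)^2 - \Delta\sigma^2(X_{t_{i-1}}; \theta)\right) + \frac{\partial_\theta\mu(X_{t_{i-1}}; \theta)}{\sigma^2(X_{t_{i-1}}; \theta)}\left(X_{t_i} - X_{t_{i-1}} - \mu(X_{t_{i-1}}; \theta)\Delta\right)\right]. \tag{3.4}$$

The estimator of $\theta$ is then found by solving, possibly numerically, the equations $S_n(\theta) = 0$ for $\theta$. However, we can in general not expect that the approximated score is an unbiased estimating function. That is, it will in most cases be true that

$$E_\theta\left[S_n(\theta)\right] \neq 0.$$

To see the importance of having an unbiased estimating function let $\theta_0$ be the true unknown parameter and let $\hat\theta_n$ be the estimator derived from the condition $S_n(\hat\theta_n) = 0$. Consider the first order Taylor formula

$$0 = S_n(\hat\theta_n) = S_n(\theta_0) + \partial_{\theta^T} S_n(\theta^*)(\hat\theta_n - \theta_0),$$

where $\partial_{\theta^T} S_n(\theta^*)$ is the matrix of derivatives evaluated at some convex combinations of $\hat\theta_n$ and $\theta_0$.

If $E_{\theta_0}\left[S_n(\theta_0)\right] \neq 0$ we would thus, in general, expect that the estimator $\hat\theta_n$ will be a non-central (that is, biased) estimator. Below we present conditions ensuring that estimators derived from unbiased estimating functions are consistent. For now we examine further the properties of estimators derived from the approximation to the true score (3.4).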
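Rearranging the Taylor formula makes the role of the bias explicit. The following step is a standard heuristic added for illustration; it assumes that the matrix of derivatives is invertible, which is not guaranteed in general.

$$\hat\theta_n - \theta_0 = -\left[\partial_{\theta^T} S_n(\theta^*)\right]^{-1} S_n(\theta_0)$$

The estimation error is thus a linear transformation of the estimating function evaluated at the true parameter, so a nonzero mean of $S_n(\theta_0)$ would typically carry over to a systematic error in $\hat\theta_n$.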

Based on the Euler scheme we should expect the function $S_n$ to provide good estimators only when $\Delta$ is small. We would also expect that estimators derived from this scheme are consistent and asymptotically normal only when $t_n \to \infty$ and $\Delta \to 0$. It can be shown that in the case where the diffusion function $\sigma(\cdot)$ depends on the parameter $\theta$ the bias of the estimating function given by (3.4) will be of order $n\Delta$, see Sørensen [1997].

To explore these issues further we turn to simulation and consider the CIR process once again. This process has the property that we can solve $S_n(\theta) = 0$ explicitly and thus derive closed form expressions for the estimators.




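Example 3.1 (Simulation study of the Euler approximation)

Consider the CIR process

$$dX_t = a(b - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t, \qquad a, b, \sigma \in \mathbb{R}_+$$

The estimators based on (3.4) are given by

$$\hat a_n = \frac{n^2 - n\sum_{i=1}^{n} X_{t_i}X^{-1}_{t_{i-1}} + (X_{t_n} - X_{t_0})\sum_{i=1}^{n} X^{-1}_{t_{i-1}}}{\Delta\left(n^2 - \sum_{i=1}^{n} X_{t_{i-1}}\sum_{i=1}^{n} X^{-1}_{t_{i-1}}\right)}$$

$$\hat b_n = \frac{n\sum_{i=1}^{n} X_{t_i} - \sum_{i=1}^{n} X_{t_i}X^{-1}_{t_{i-1}}\sum_{i=1}^{n} X_{t_{i-1}}}{n^2 - n\sum_{i=1}^{n} X_{t_i}X^{-1}_{t_{i-1}} + (X_{t_n} - X_{t_0})\sum_{i=1}^{n} X^{-1}_{t_{i-1}}}$$

$$\hat\sigma^2_n = \frac{1}{n\Delta}\sum_{i=1}^{n} X^{-1}_{t_{i-1}}\left(X_{t_i} - X_{t_{i-1}} - \hat a_n(\hat b_n - X_{t_{i-1}})\Delta\right)^2.$$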
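The closed form expressions can be coded directly. The sketch below is one possible transcription for equidistant data; the function name is hypothetical, and the noise-free test data are constructed only to check the algebra, since data generated by the Euler recursion without noise should return the drift parameters used to build it.

```python
def cir_euler_estimators(obs, delta):
    """Closed form Euler estimators (a_n, b_n, sigma2_n) for equidistant CIR data.

    obs = [X_{t_0}, ..., X_{t_n}] must be strictly positive and not constant.
    """
    n = len(obs) - 1
    prev, nxt = obs[:-1], obs[1:]
    sum_ratio = sum(y / x for x, y in zip(prev, nxt))    # sum of X_{t_i} / X_{t_{i-1}}
    sum_inv = sum(1.0 / x for x in prev)                 # sum of 1 / X_{t_{i-1}}
    sum_prev = sum(prev)                                 # sum of X_{t_{i-1}}
    sum_next = sum(nxt)                                  # sum of X_{t_i}

    # Numerator of a_n, which is also the denominator of b_n.
    shared = n * n - n * sum_ratio + (obs[-1] - obs[0]) * sum_inv
    a_hat = shared / (delta * (n * n - sum_prev * sum_inv))
    b_hat = (n * sum_next - sum_ratio * sum_prev) / shared
    sigma2_hat = sum(
        (y - x - a_hat * (b_hat - x) * delta) ** 2 / x for x, y in zip(prev, nxt)
    ) / (n * delta)
    return a_hat, b_hat, sigma2_hat

# Noise-free check: Euler recursion with sigma = 0, a = 1, b = 2.
a, b, delta = 1.0, 2.0, 0.1
obs = [0.5]
for _ in range(50):
    obs.append(obs[-1] + a * (b - obs[-1]) * delta)
print(cir_euler_estimators(obs, delta))
```

Computing the four sums once keeps the code close to the formulas and avoids repeated passes over the data.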

Consider now the situation where we simulate a sample path of the CIR process by using e.g. a Milstein scheme as mentioned briefly in Example 2.3 (see also Seydel [2002] page 86). We choose the values $(a, b, \sigma) = (1, 2, 1)$, for which the solution is known to be ergodic, simulate the process, and calculate the value of the estimators for different values of $\Delta$ and $t_n$. For each fixed choice of $(\Delta, t_n)$ the process is simulated 1000 times. The process itself is simulated on a much finer discretization than the one used in the estimations: we choose to simulate the process with time between each simulation point equal to 0.0001, thus giving a good approximation to the true process. We then sample from this process such that we have observations that are $\Delta$ apart, thereby ensuring that the majority of the inaccuracy in the estimation results stems from the quality of the estimating function and not from the simulation of the process. The results are summarized in Table 3.1. This table gives an opportunity to examine the effects of both an increase in the number of observations and a decrease in the interval between observations.
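The sampling design described above can be sketched as follows: the path is generated on a fine grid and only every $\Delta$-th point is kept as an observation. The Milstein step for the CIR process adds the correction $\frac{\sigma^2}{4}\left((\Delta W)^2 - h\right)$ to the Euler step, where $h$ is the fine step. The function name, the seed, the guard `max(x, 0)`, and the coarser fine step of 0.01 (chosen so the illustration runs quickly) are assumptions of this sketch and not the settings used for Table 3.1.

```python
import math
import random

def milstein_cir_sample(a, b, sigma, x0, delta, n, fine_step, rng):
    """Simulate CIR on a fine grid with a Milstein step and keep every delta-th point.

    Returns [X_0, X_delta, ..., X_{n delta}]. max(x, 0) is a pragmatic guard against
    negative values caused by discretisation, an assumption of this sketch.
    """
    steps_per_obs = round(delta / fine_step)
    sqrt_h = math.sqrt(fine_step)
    x = x0
    sample = [x0]
    for _ in range(n):
        for _ in range(steps_per_obs):
            dw = rng.gauss(0.0, sqrt_h)                       # Brownian increment
            x_pos = max(x, 0.0)
            x = (x + a * (b - x) * fine_step
                 + sigma * math.sqrt(x_pos) * dw
                 + 0.25 * sigma**2 * (dw * dw - fine_step))   # Milstein correction
        sample.append(x)
    return sample

rng = random.Random(1)                       # fixed seed, for reproducibility only
sample = milstein_cir_sample(1.0, 2.0, 1.0, 2.0, 0.1, 200, 0.01, rng)
print(len(sample))
```

Repeating this for many independent paths and applying the estimators to each sample would give the kind of Monte Carlo summary reported in the table.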

We see that all three estimators are quite inaccurate even for smaller intervals and higher values of $t_n$. In particular we note that both $\hat a_n$ and $\hat\sigma^2_n$ are systematically too small for all values, whereas $\hat b_n$ seems to be a more precise estimator for the same values.

There is some ambiguity about the results of increasing the number of observations without decreasing the size of the observation interval. However, there is notable improvement in some of the estimators when $t_n$ increases and $\Delta$ decreases simultaneously. This agrees with the results mentioned above that estimators based on the Euler approximation of the likelihood function are consistent under regularity conditions when $\Delta \to 0$ and $t_n \to \infty$.

Based on the simulation results above, there should be no doubt that the smaller $\Delta$ is, the closer the mean of the estimators is to the true values. This is somewhat less clear regarding the estimator for $b$, but both $\hat a_n$ and $\hat\sigma^2_n$ improve dramatically when we decrease $\Delta$. We also note that the standard deviation of the estimators improves significantly when we increase $t_n$, even for fixed $\Delta$.

| $t_n$ | $\Delta$ | $\hat a_n$ mean | $\hat a_n$ sde | $\hat b_n$ mean | $\hat b_n$ sde | $\hat\sigma^2_n$ mean | $\hat\sigma^2_n$ sde |
|---|---|---|---|---|---|---|---|
| 100 | 2.5 | 0.3821 | 0.0811 | 2.0010 | 0.1699 | 0.2312 | 0.0796 |
| 100 | 1 | 0.6530 | 0.1132 | 2.0020 | 0.1509 | 0.4804 | 0.0937 |
| 100 | 0.1 | 0.9898 | 0.1427 | 2.0039 | 0.1434 | 0.9177 | 0.0449 |
| 250 | 2.5 | 0.3711 | 0.0530 | 1.9984 | 0.1063 | 0.2435 | 0.0519 |
| 250 | 1 | 0.6399 | 0.0684 | 1.9992 | 0.0939 | 0.4916 | 0.0620 |
| 250 | 0.1 | 0.9668 | 0.0866 | 2.0002 | 0.0885 | 0.9203 | 0.0282 |
| 500 | 2.5 | 0.3697 | 0.0380 | 2.0015 | 0.0766 | 0.2503 | 0.0385 |
| 500 | 1 | 0.6352 | 0.0490 | 2.0010 | 0.0673 | 0.4947 | 0.0438 |
| 500 | 0.1 | 0.9588 | 0.0618 | 2.0023 | 0.0635 | 0.9210 | 0.0200 |

Table 3.1: Results of simulation study of estimators based on the Euler approximation of the likelihood function. Data is a simulated Cox-Ingersoll-Ross process with true values $(a, b, \sigma^2) = (1, 2, 1)$.
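The systematic downward bias of the estimator for $a$ in Table 3.1 can be made plausible with a short calculation. The following is a sketch added for illustration and not part of the simulation study itself; it uses the known conditional mean of the CIR process and assumes that the ergodic averages in the estimating equations converge. The conditional mean is

$$E\left[X_{t_i} \mid X_{t_{i-1}} = x\right] = b + (x - b)e^{-a\Delta},$$

which is linear in $x$. The Euler estimators of $a$ and $b$ solve the weighted least squares equations for a linear regression of $X_{t_i}$ on $X_{t_{i-1}}$ with slope $1 - a\Delta$ and intercept $ab\Delta$, so for fixed $\Delta$ and $t_n \to \infty$ one would expect

$$1 - \hat a_n\Delta \to e^{-a\Delta}, \qquad \text{i.e.} \qquad \hat a_n \to \frac{1 - e^{-a\Delta}}{\Delta}, \qquad \hat b_n \to b.$$

With $a = 1$ this limit is approximately 0.367, 0.632, and 0.952 for $\Delta = 2.5$, $1$, and $0.1$ respectively, which is close to the means reported for $t_n = 500$ and is consistent with the estimator for $b$ being nearly unbiased.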

For a brief review of asymptotic results for estimators based on approximations to the likelihood function see Sørensen [1997] page 4. In general we conclude that this method only works reasonably well when we have both a great number of observations and a very short time period between each observation. As mentioned earlier, when working with financial data we will seldom be able to exercise great control over the time between each observation, so we clearly cannot rely solely on estimators based on a Gaussian approximation of the likelihood function.

As a way of attempting to deal with the bias problems indicated by Example 3.1, Kessler [1995] suggested that the approximate Gaussian likelihood function be improved by using better estimates of the mean and variance. This results in a method that gives more complicated calculations than the one suggested above, but also (given a number of regularity conditions) provides more control over the order of the bias, see Kessler [1995].

3.2 Martingale estimating functions

Another way of dealing with the bias problems is to restrict our attention to martingale estimating functions, that is, estimating functions of the form

$$G_n(\theta) = \sum_{i=1}^{n} g\left(\Delta_i, X_{t_{i-1}}, X_{t_i}; \theta\right) \tag{3.5}$$




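where $\Delta_i = t_i - t_{i-1}$ and the function $g$ satisfies

$$E_\theta\left[g\left(\Delta_i, X_{t_{i-1}}, X_{t_i}; \theta\right) \mid X_{t_{i-1}}\right] = 0. \tag{3.6}$$

Here, as usual, $E_\theta[X]$ indicates that the mean value is taken under the parameter value $\theta$. That is, when $f_{t_{i-1},t_i}$ is the transition density discussed previously we have

$$E_\theta\left[g\left(\Delta_i, X_{t_{i-1}}, X_{t_i}; \theta\right) \mid X_{t_{i-1}} = x\right] = \int g(\Delta_i, x, z; \theta)\, f_{t_{i-1},t_i}(x, z; \theta)\,dz.$$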

Equation (3.6) simply states that $g\left(\Delta_i, X_{t_{i-1}}, X_{t_i}; \theta\right)$ should be a martingale difference under the appropriate measure. It is clear that (3.6) guarantees that $G_n(\theta_0)$ is a martingale with respect to the usual filtration when $\theta_0$ is the true parameter value. In this connection we also note that $G_n(\theta_0)$ has mean zero under the probability measure corresponding to $\theta_0$, such that

$$E_{\theta_0}\left[G_n(\theta_0)\right] = 0.$$

Therefore the problems mentioned in the earlier section on the bias of the estimating functions are avoided by working with martingale estimating functions.

For any given function $G_n(\theta)$ satisfying (3.5) and (3.6) we get an estimator for $\theta$ if we can solve $G_n(\theta) = 0$ either analytically or numerically.

It could seem that working with martingale estimating functions is little improvement, since we still need the transition density for the process $X$. The fact that this density is seldom known was the very reason for turning away from maximum likelihood. However, even when we do not have a closed expression for the transition density, we can find functions satisfying (3.6).
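One way to obtain such functions is to centre an observation by its conditional mean. For the CIR process the conditional mean is known to be $b + (x - b)e^{-a\Delta}$, so $h(\Delta, x, y; \theta) = y - b - (x - b)e^{-a\Delta}$ satisfies (3.6) without any reference to the full transition density. The Python sketch below solves the resulting estimating equations for $(a, b)$ with the weights $(1/x, 1)$; these weights, the function name, and the noise-free test data are assumptions made for illustration and are not claimed to be optimal in any sense.

```python
import math

def cir_martingale_estimator(obs, delta):
    """Solve G_n(theta) = 0 for (a, b) with h = y - b - (x - b) exp(-a delta).

    The weights (1/x, 1) are an illustrative choice, not an optimal one.
    """
    n = len(obs) - 1
    prev, nxt = obs[:-1], obs[1:]
    sum_ratio = sum(y / x for x, y in zip(prev, nxt))
    sum_inv = sum(1.0 / x for x in prev)
    sum_prev, sum_next = sum(prev), sum(nxt)

    # Both equations are linear in slope = exp(-a delta) and intercept = b (1 - slope).
    slope = (sum_inv * sum_next - n * sum_ratio) / (sum_inv * sum_prev - n * n)
    intercept = (sum_next - slope * sum_prev) / n
    if not 0.0 < slope < 1.0:
        raise ValueError("no solution with a > 0 for this sample")
    return -math.log(slope) / delta, intercept / (1.0 - slope)

# Noise-free check: data following the exact conditional mean recursion.
a, b, delta = 1.0, 2.0, 0.5
obs = [0.5]
for _ in range(30):
    obs.append(b + (obs[-1] - b) * math.exp(-a * delta))
print(cir_martingale_estimator(obs, delta))
```

In contrast to the Euler estimators, the drift parameter is here recovered through $-\log(\text{slope})/\Delta$ rather than $(1 - \text{slope})/\Delta$, which removes the discretisation error in the conditional mean for any fixed $\Delta$.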

It is often useful to think of any given martingale estimating function as an approximation to the true (often unknown) score function, which is a martingale under weak regularity conditions.

From this point of view it is natural to attempt to choose estimating functions that closely resemble the true score function. Consider a given class of estimating functions, $\mathcal{G}$. We work with two notions of what it means for an element in $\mathcal{G}$ to be an optimal estimating function.

$G_n \in \mathcal{G}$ is said to be Fixed Sample Optimal if it is the element in $\mathcal{G}$ closest to the true score function, measured by an appropriate measure.

$G_n \in \mathcal{G}$ is said to be Asymptotic Optimal if it is the element in $\mathcal{G}$ with the smallest asymptotic variance as $n \to \infty$.

See Godambe and Heyde [1987] for a general definition and discussion of these terms.



3.2.1 Existence of optimal estimating functions

We now present results concerning how to choose the best estimating function from a particular class of estimating functions. This presentation is largely based on Sørensen [1997].

First we make a few notes on the notation used in both this and the subsequent sections. For homogeneous diffusions we have already noted that the transition probabilities only depend on $(s, t)$ through $t - s$, and for notational simplicity we will use the notation $f_{\Delta_i}(x, y; \theta)$ instead of $f_{t_{i-1},t_i}(x, y; \theta)$ for the transition probabilities whenever there is no doubt about the interval $\Delta_i = t_i - t_{i-1}$. Also for notational simplicity we will use the symbol $\Delta$ in the following to indicate how the interval between two subsequent observations enters the various functions. This does not indicate that we are working with equidistant observations; it is simply a notational simplification, like writing $g(\Delta, x, y; \theta)$ instead of $g\left(\Delta_i, X_{t_{i-1}}, X_{t_i}; \theta\right)$ when working with certain properties of the function $g$.

We will in particular deal with the case where the estimating function has the same dimension as $\theta$, thus providing the same number of equations as the number of unknown parameters. Consider now the function $g$ given by

$$g\left(\Delta_i, X_{t_{i-1}}, X_{t_i}; \theta\right) = \sum_{j=1}^{N} \alpha_j\left(\Delta_i, X_{t_{i-1}}; \theta\right) h_j\left(\Delta_i, X_{t_{i-1}}, X_{t_i}; \theta\right) \tag{3.7}$$

where $h_j$ is one-dimensional and satisfies (3.6) for $j = 1, \ldots, N$, and the $\alpha_j$ are arbitrary functions of the same dimension as $\theta$ (if possible). Evidently any function $g$ defined this way satisfies (3.6) and it is thus possible to define an estimating function $G_n$ by (3.5) using $g$. Assume in the following that the parameter $\theta$ is $d$-dimensional and let $\mathcal{G}$ denote the class of all $d$-dimensional martingale estimating functions given by (3.5) and (3.7).

We need a number of definitions, consider the following

ckl(, x; ) =

hk(, x , y; )hl(
