
Shrinkage Drift Parameters Estimation for Multi-factors Ornstein-Uhlenbeck Processes

Sévérien Nkurunziza · and · S. Ejaz Ahmed

Abstract We consider an inference problem concerning the drift parameters of the multi-factors Vasicek model (or multivariate Ornstein-Uhlenbeck process). For example, this model asserts that the term structure of interest rates is not just a single process, but rather a superposition of several analogous processes. This motivates us to develop an improved estimation theory for the drift parameters when homogeneity of several parameters may hold. However, the information regarding the equality of these parameters may be imprecise. In this context, we consider Stein-rule (or shrinkage) estimators, which allow us to improve upon the performance of the classical maximum likelihood estimator (MLE). Asymptotic properties of some shrinkage estimators are studied. Under an asymptotic distributional quadratic risk criterion, their relative dominance picture is explored and assessed. A simulation study illustrates the behavior of the suggested method for small and moderate lengths of the observation period. Our analytical and simulation results demonstrate that shrinkage estimators provide excellent estimation accuracy and outperform the MLE uniformly. An overriding theme of this paper is that shrinkage estimators provide a powerful extension of their classical counterparts.

Sévérien Nkurunziza · and · S. Ejaz Ahmed

University of Windsor, 401 Sunset Avenue, Windsor, Ontario, N9B 3P4

Tel. : 1-519-253-3000 ext. 3017 · Fax : 1-519-971-3649 · E-mail: [email protected] · [email protected]

Keywords Ornstein-Uhlenbeck process · Vasicek process · Gaussian process · shrinkage estimator · MLE ·

diffusion process

1 Introduction

The Ornstein-Uhlenbeck process has been extensively used to model different phenomena, for instance in biology (Engen and Sæther, 2005), in ecology (Engen et al., 2002) and in finance (Schöbel and Zhu, 1999). In finance, the Ornstein-Uhlenbeck process is used to model the term structure of interest rates, where it is mostly known as the Vasicek (1977) process. From this point of view, the Vasicek model describes the interaction between the equilibrium value of the interest rate and the stochastic movements in the interest rate that result from the unpredictable economic environment (see, e.g., Vasicek, 1977; Abu-Mostafa, 2001). Namely, the instantaneous interest rate r(t) is governed by the stochastic differential equation (SDE)

dr(t) = θ (α− r(t))dt +σdW (t) (1)

where α > 0 represents the steady-state interest rate (or the long-term mean), θ > 0 represents the speed of convergence to the steady state, σ > 0 is the volatility or “randomness level” and {W(t), t > 0} is a Wiener process. The first part, θ(α − r(t)), is the drift term, and thus α and θ are called the drift parameters. The second part, σ dW(t), is the diffusion term of the process, and σ is called the diffusion parameter. With this in mind, we present the more general form of the above model in the following subsection, and the remaining discussion follows.
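To make the roles of θ, α and σ in (1) concrete, the following minimal sketch simulates one path of the model on an equally spaced grid. It is an illustration only and not part of the original study: the function name, the parameter values and the use of the exact Gaussian transition of the Ornstein-Uhlenbeck process (rather than any particular scheme used by the authors) are assumptions made here.

```python
import math
import random


def simulate_vasicek(r0, theta, alpha, sigma, T, n_steps, seed=0):
    """Simulate dr = theta*(alpha - r) dt + sigma dW on [0, T] (hypothetical helper).

    Uses the exact conditional distribution of r(t + dt) given r(t), which is
    Gaussian with mean alpha + (r(t) - alpha)*exp(-theta*dt) and variance
    sigma^2 * (1 - exp(-2*theta*dt)) / (2*theta).
    """
    rng = random.Random(seed)
    dt = T / n_steps
    decay = math.exp(-theta * dt)
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    path = [r0]
    for _ in range(n_steps):
        mean = alpha + (path[-1] - alpha) * decay  # pull towards the long-term mean
        path.append(mean + step_sd * rng.gauss(0.0, 1.0))
    return path


# Illustrative (assumed) values: fast mean reversion towards alpha = 0.05.
path = simulate_vasicek(r0=0.10, theta=2.0, alpha=0.05, sigma=0.02, T=30.0, n_steps=3000)
```

With σ = 0 the same function reproduces the deterministic decay of r(t) towards α at rate θ, which is a convenient sanity check.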

1.1 Multi-factors Model

The model given in (1) is univariate, while the term structure of interest rates is embedded in a large macroeconomic system (Langetieg, 1980). For this reason, Langetieg (1980) developed a multivariate version of the model (1), the so-called “multi-factors Vasicek model”. This model accommodates an arbitrary number of economic relationships and is given as follows:

drk(t) = θk (αk− rk(t))dt +σkdWk(t), k = 1,2, . . . , p, (2)

where {Wk(t), t > 0}, k = 1, 2, . . . , p, are Wiener processes, possibly correlated. In some economic contexts, the rationale for having multiple factors stems from the observation that they can be interpreted as different time scales for the behavior of interest rates.

The Vasicek interest rate model has been extensively used in the literature. It was used to analyze the maturity structure of the public debt at the Bank of Canada, at the Department of Finance and at the Danish National Bank; see Georges (2003). Abu-Mostafa (2001) calibrated the correlated multi-factors Vasicek model of interest rates and then applied it to the Japanese Yen swaps market and the U.S. Dollar yield market.

From a statistical point of view, Liptser and Shiryayev (1978), Basawa and Prakasa Rao (1980) and Kutoyants (2004) considered the inference problem concerning the drift parameters of the model (1) and derived the maximum likelihood estimator (MLE). Our goal here is to improve the performance of the existing MLE of the drift parameters of the multivariate Vasicek (or Ornstein-Uhlenbeck) process. To simplify the analysis, we assume that the steady-state interest rate αk is known or available from a previous study. Accordingly, our parameter of interest is the speed of convergence to the steady state, i.e., θk. Thus, let Xk(t) = rk(t) − αk. Then, from the multi-factors Vasicek model (2), we get

dXk(t) =−θkXk(t)dt +σkdWk(t), k = 1,2, . . . , p, (3)

where θk > 0, σk > 0, k = 1, 2, . . . , p. We assume that {Wk(t), t > 0, k = 1, 2, . . . , p} is a p-dimensional Wiener process with independent components. The independence assumption between the p Wiener processes is made to simplify the analysis. Similar results can be established for the case where these Wiener processes are jointly Gaussian with non-zero pairwise correlation coefficients.

Motivated by the diverse applications of the above model, we consider the estimation problem of the drift parameter vector θ = (θ1, θ2, . . . , θp)′.


We are interested in the inference problem when it is suspected that

θ1 = θ2 = · · · = θp

holds exactly or nearly so. In other words, we consider situations where all the parameters may be equal. Some aspects of this behavior are observed over a short time horizon (high-speed factors). In another scenario, the data may have been collected from various sources under similar conditions. Hence, under these scenarios it is reasonable to suspect homogeneity of the several drift parameters. The estimation and testing of homogeneity of parameters is of paramount interest in both the scientific and the statistical literature. For this reason, the main focus of this contribution is to suggest improved estimators of θ with high estimation accuracy when homogeneity of several drift parameters may hold. An asymptotic test for the equality of parameters is also suggested. Hence, we extend the single-factor inference problem to a more general form of the Vasicek model, the multi-factors model. Further, an alternative estimator of the variance parameters is also suggested.

The rest of this paper is organized as follows. Section 2, Meta Analysis and Testing the Homogeneity, presents the shrinkage estimators and shows their superiority over the MLE. Section 3 is devoted to simulation studies of the asymptotic distributional risk (ADR) results. Finally, Section 4 gives concluding remarks, and technical results are given in the Appendix.

2 Meta Analysis and Testing the Homogeneity

Let $\{X_1(t), 0 \le t \le T\}$, $\{X_2(t), 0 \le t \le T\}$, $\ldots$, $\{X_p(t), 0 \le t \le T\}$ be Ornstein-Uhlenbeck processes (Kutoyants, 2004, p. 51; Steele, 2001, p. 140) whose diffusion equations are given by

$$dX_k(t) = -\theta_k X_k(t)\,dt + \sigma_k\,dW_k(t), \qquad X_k(0) \text{ fixed}, \tag{4}$$

where $\theta_k > 0$, $\sigma_k > 0$, $k = 1, 2, \ldots, p$, and $\{W_k(t), 0 \le t \le T\}$, $k = 1, 2, \ldots, p$, are $p$ independent Wiener processes. Further, let $\boldsymbol{\theta}$ be the $p$-column vector $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_p)'$, and let $\Sigma$ be the diagonal matrix whose diagonal entries are $\sigma_1^2, \sigma_2^2, \ldots, \sigma_p^2$, i.e., $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_p^2)$. Also, let $\boldsymbol{X}(t)$ and $\boldsymbol{W}(t)$ be the column vectors given by

$$\boldsymbol{X}(t) = (X_1(t), X_2(t), \ldots, X_p(t))', \qquad \boldsymbol{W}(t) = (W_1(t), W_2(t), \ldots, W_p(t))',$$

for $0 \le t \le T$. Further, let

$$\boldsymbol{V}(t) = \mathrm{diag}(X_1(t), X_2(t), \ldots, X_p(t)), \qquad 0 \le t \le T.$$

From relation (4), we get

$$d\boldsymbol{X}(t) = -\boldsymbol{V}(t)\boldsymbol{\theta}\,dt + \Sigma^{1/2}\,d\boldsymbol{W}(t), \qquad 0 \le t \le T. \tag{5}$$

As mentioned in the Introduction, we are interested in the situation where we suspect that

$$\theta_1 = \theta_2 = \cdots = \theta_p,$$

which leads us to consider the testing problem

$$H_0: \theta_1 = \theta_2 = \cdots = \theta_p \quad \text{versus} \quad H_1: \theta_j \neq \theta_k \ \text{for some } 1 \le j < k \le p. \tag{6}$$

The proposed solution to (6) is based on the maximum likelihood estimators of $\boldsymbol{\theta}$ under $H_0$ and under the full parameter space. This estimation and/or testing problem is considered when $\Sigma$ is either known or unknown. Also, note that the MLE established here corresponds to that given in Liptser and Shiryayev (1978, chapter 17, p. 206-207) or in Kutoyants (2004, p. 63) for the univariate case. Hence, we extend the univariate estimation problem to a multivariate situation.

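In the sequel, we denote

$$\boldsymbol{U}_T = \left(\int_0^T X_1(t)\,dX_1(t), \int_0^T X_2(t)\,dX_2(t), \ldots, \int_0^T X_p(t)\,dX_p(t)\right)',$$

and let

$$\boldsymbol{D}_T = \int_0^T \boldsymbol{V}^2(t)\,dt, \qquad \hat{\boldsymbol{\theta}}^{*} = -\boldsymbol{D}_T^{-1}\boldsymbol{U}_T = \left(\hat{\theta}_1^{*}, \hat{\theta}_2^{*}, \ldots, \hat{\theta}_p^{*}\right)'. \tag{7}$$

Also, let $z^{+} = \max(z, 0)$. Conditionally on $\boldsymbol{X}_0$, let $\hat{\boldsymbol{\theta}}$ be the maximum likelihood estimator of $\boldsymbol{\theta}$ under the model (5) and let $\tilde{\boldsymbol{\theta}}$ be the restricted MLE (RMLE) of $\boldsymbol{\theta}$ under $H_0$.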


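Proposition 1 We have

$$\hat{\boldsymbol{\theta}} = \left(\hat{\theta}_1^{*+}, \hat{\theta}_2^{*+}, \ldots, \hat{\theta}_p^{*+}\right)' \quad \text{and} \quad \tilde{\boldsymbol{\theta}} = \left(\boldsymbol{e}_p'\Sigma^{-1}\boldsymbol{D}_T\boldsymbol{e}_p\right)^{-1}\boldsymbol{e}_p\boldsymbol{e}_p'\Sigma^{-1}\boldsymbol{D}_T\hat{\boldsymbol{\theta}}, \tag{8}$$

where $\boldsymbol{e}_p$ is the $p$-column vector whose entries are all equal to 1.

The proof follows from Proposition 4, which is given in the Appendix.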

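It is established that $\hat{\boldsymbol{\theta}}$ and $\tilde{\boldsymbol{\theta}}$ are strongly consistent for $\boldsymbol{\theta}$, the latter under $H_0$ (see Appendix, Proposition 5). Moreover, these estimators are asymptotically normal as $T$ tends to infinity (see Appendix, Proposition 6). Based on these results, we suggest the following test for the testing problem (6) when $\Sigma$ is known:

$$\Psi = \begin{cases} 1 & \text{if } T\left(\hat{\boldsymbol{\theta}} - \tilde{\boldsymbol{\theta}}\right)'\Sigma^{-1}\left(\hat{\boldsymbol{\theta}} - \tilde{\boldsymbol{\theta}}\right) > \chi^2_{p-1;\alpha}, \\[4pt] 0 & \text{if } T\left(\hat{\boldsymbol{\theta}} - \tilde{\boldsymbol{\theta}}\right)'\Sigma^{-1}\left(\hat{\boldsymbol{\theta}} - \tilde{\boldsymbol{\theta}}\right) < \chi^2_{p-1;\alpha}, \end{cases} \tag{9}$$

where $\alpha$ is fixed with $0 < \alpha < 1$, $\chi^2_{p-1}$ denotes a chi-square random variable with $p-1$ degrees of freedom, and $\chi^2_{p-1;\alpha}$ satisfies

$$\Pr\left\{\chi^2_{p-1} > \chi^2_{p-1;\alpha}\right\} = \alpha.$$

When $\Sigma$ is unknown, the test (9) can be modified by replacing $\Sigma$ with a strongly consistent estimator. Later, we prove that the test $\Psi$ is asymptotically of level $\alpha$. Further, the test obtained for the unknown $\Sigma$ case has asymptotically the same level $\alpha$, and it is as powerful as the test $\Psi$.

Corollary 1 Under the model (5), the test $\Psi$ given in (9) is an asymptotically $\alpha$-level test for the testing problem (6).

The proof is straightforward by using Corollary 4, which is given in the Appendix.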
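As an illustration of how the decision rule in (9) could be evaluated once the unrestricted and restricted estimates are available, here is a small sketch for diagonal Σ. The function names are hypothetical, and the critical value is passed in by the caller; for p = 3 the chi-square distribution has 2 degrees of freedom, for which the upper α-quantile has the closed form −2 ln α.

```python
import math


def homogeneity_statistic(theta_hat, theta_tilde, sigma2, T):
    """T * (theta_hat - theta_tilde)' Sigma^{-1} (theta_hat - theta_tilde), Sigma diagonal."""
    return T * sum((a - b) ** 2 / s2 for a, b, s2 in zip(theta_hat, theta_tilde, sigma2))


def homogeneity_test(theta_hat, theta_tilde, sigma2, T, critical_value):
    """Return 1 (reject H0) if the statistic exceeds the critical value, else 0."""
    stat = homogeneity_statistic(theta_hat, theta_tilde, sigma2, T)
    return 1 if stat > critical_value else 0


# Assumed toy inputs with p = 3, so the reference distribution has 2 degrees of freedom.
crit = -2.0 * math.log(0.05)  # upper 5% point of chi-square with 2 d.f.
decision = homogeneity_test([1.0, 1.2, 0.8], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], 50.0, crit)
```

For other values of p the critical value would have to come from a chi-square table or a statistics library.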

2.1 Practical Aspect

In practice, the covariance matrix $\Sigma$ is unknown and needs to be estimated in order to compute $\hat{\boldsymbol{\theta}}$ and $\tilde{\boldsymbol{\theta}}$. Namely, $\Sigma$ is replaced by a strongly consistent estimator $\hat{\Sigma}$. By Slutsky's theorem, the resulting estimators are strongly consistent and asymptotically normal. The benchmark strongly consistent estimator of the diffusion parameter is based on the quadratic variation. In the present investigation, we suggest an alternative estimator for $(\sigma_1, \sigma_2, \ldots, \sigma_p)'$, namely $(\hat{\sigma}_1, \hat{\sigma}_2, \ldots, \hat{\sigma}_p)'$, where

$$\hat{\sigma}_i^2 = \frac{X_i^2(T)}{T} - \frac{2}{T}\int_0^T X_i(t)\,dX_i(t).$$

Further, if $\sigma_1 = \sigma_2 = \cdots = \sigma_p = \sigma$, then the common value $\sigma^2$ is estimated by

$$\hat{\sigma}^2 = \frac{1}{p}\sum_{i=1}^{p}\left(\frac{X_i^2(T)}{T} - \frac{2}{T}\int_0^T X_i(t)\,dX_i(t)\right).$$

Hence

$$\hat{\Sigma} = \mathrm{diag}\left(\hat{\sigma}_1^2, \hat{\sigma}_2^2, \ldots, \hat{\sigma}_p^2\right) \quad \text{or} \quad \tilde{\Sigma} = \hat{\sigma}^2 I_p. \tag{10}$$

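Proposition 2 Assume that the model (5) holds. Then,

(i) $\hat{\Sigma} \xrightarrow[T\to\infty]{\text{a.s.}} \Sigma$ and, for any $0 < \nu < 1$, $T^{\nu}\left(\hat{\Sigma} - \Sigma\right) \xrightarrow[T\to\infty]{\text{a.s.}} 0$;

(ii) if $E\left\{\|\boldsymbol{X}(0)\|^2\right\} < \infty$, then $\lim_{T\to\infty} E\left\{\left\|\hat{\Sigma} - \Sigma\right\|^2\right\} = 0$.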

From Proposition 2, it should be noted that the suggested estimator $\hat{\Sigma}$ converges faster than the classical estimator based on quadratic variation. Since the strong consistency and asymptotic normality of $\hat{\boldsymbol{\theta}}$ and $\tilde{\boldsymbol{\theta}}$ do not change when $\Sigma$ is replaced by $\hat{\Sigma}$, for the sake of brevity we treat $\Sigma$ as known in the remaining discussion.

Finally, we close this subsection by noting that a continuous-time process can be viewed as an approximation of a discrete-time process, since in practice the observations are collected at discrete times. Indeed, in order to evaluate the suggested estimators, the stochastic integrals are replaced with their corresponding discrete Riemann-Itô sums. In the following subsection, we showcase the main contribution of the paper.
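The discretization mentioned above can be sketched as follows, assuming each factor is observed on an equally spaced grid with step dt. The helper names are hypothetical and the left-point (Itô) sums are one natural choice; the paper does not spell out its exact discretization. The sketch computes the sums approximating the integrals in (7), the unrestricted and restricted estimates of Proposition 1 for diagonal Σ, and the variance estimator of this subsection.

```python
def ito_sums(path, dt):
    """Left-point sums approximating int X dX and int X^2 dt for one factor."""
    u = sum(x0 * (x1 - x0) for x0, x1 in zip(path, path[1:]))
    d = sum(x0 * x0 for x0 in path[:-1]) * dt
    return u, d


def sigma2_hat(path, dt):
    """Discrete analogue of X^2(T)/T - (2/T) * int X dX."""
    T = dt * (len(path) - 1)
    u, _ = ito_sums(path, dt)
    return path[-1] ** 2 / T - 2.0 * u / T


def mle_and_rmle(paths, dt, sigma2):
    """Unrestricted MLE (positive part of -U_k/D_k) and restricted MLE under H0.

    For diagonal Sigma, the restricted estimate has the common value
    sum_k(D_k * theta_hat_k / sigma2_k) / sum_k(D_k / sigma2_k) in every coordinate.
    """
    sums = [ito_sums(path, dt) for path in paths]
    theta_hat = [max(-u / d, 0.0) for u, d in sums]
    num = sum(d * th / s2 for (_, d), th, s2 in zip(sums, theta_hat, sigma2))
    den = sum(d / s2 for (_, d), s2 in zip(sums, sigma2))
    theta_tilde = [num / den] * len(paths)
    return theta_hat, theta_tilde
```

With left-point sums, the discrete variance estimator equals the initial value squared plus the realized quadratic variation, divided by T, which mirrors the continuous-time identity used in the Appendix.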

2.2 Shrinkage Estimation Strategy (SES)

In this subsection, we show how to improve the precision of the MLEs by shrinking them towards a common estimate. Following Ahmed (2001) and others, we consider two Stein-rule (or shrinkage) estimators of the drift parameters.


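The shrinkage estimator (SE) $\hat{\boldsymbol{\theta}}^{S}$ of $\boldsymbol{\theta}$ is defined as

$$\hat{\boldsymbol{\theta}}^{S} = \tilde{\boldsymbol{\theta}} + \left\{1 - c\psi^{-1}\right\}\left(\hat{\boldsymbol{\theta}} - \tilde{\boldsymbol{\theta}}\right), \tag{11}$$

where

$$\psi = T\left(\hat{\boldsymbol{\theta}} - \tilde{\boldsymbol{\theta}}\right)'\Sigma^{-1}\left(\hat{\boldsymbol{\theta}} - \tilde{\boldsymbol{\theta}}\right),$$

and $c$ is allowed to vary over $[0, 2(p-2))$, $p \ge 3$, and is often taken as $c = p-2$; thus, we tacitly assume that $p \ge 3$. Note that $\psi \ge 0$ and that

$$\psi < c \iff 1 - c\psi^{-1} < 0,$$

which causes a possible inversion of sign, or over-shrinkage. For this reason, $\hat{\boldsymbol{\theta}}^{S}$ may not be useful at all, in the sense that the components of this estimator may have different signs from those of the maximum likelihood estimator. Thus, shrinking towards the pivot is commonly accepted, but over-shrinking may be somewhat more difficult to interpret, although this drawback does not adversely affect the risk performance of the shrinkage estimator. The positive-rule shrinkage estimator (PSE) controls this drawback satisfactorily and may dominate both the SE and the MLE; hence, it is useful for practical purposes. Formally, the PSE $\hat{\boldsymbol{\theta}}^{S+}$ is defined as

$$\hat{\boldsymbol{\theta}}^{S+} = \tilde{\boldsymbol{\theta}} + \left\{1 - c\psi^{-1}\right\}^{+}\left(\hat{\boldsymbol{\theta}} - \tilde{\boldsymbol{\theta}}\right), \tag{12}$$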

where z+ = max(0,z). Accordingly, Ahmed (2001) recommended that shrinkage estimator should be used as a

tool for developing the PSE and should not be used as an estimator in its own right.
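The two definitions (11) and (12) translate directly into a few lines of code. The sketch below assumes a diagonal Σ supplied as a list of variances and uses the default c = p − 2 mentioned in the text; the function name and the handling of the degenerate case ψ = 0 (where both estimators are set to the restricted estimate) are choices made for this illustration.

```python
def shrinkage_estimators(theta_hat, theta_tilde, sigma2, T, c=None):
    """Return (SE, PSE, psi) following (11) and (12) for diagonal Sigma."""
    p = len(theta_hat)
    if c is None:
        c = p - 2  # common default choice of the shrinkage constant
    diff = [a - b for a, b in zip(theta_hat, theta_tilde)]
    psi = T * sum(d * d / s2 for d, s2 in zip(diff, sigma2))
    if psi == 0.0:
        # theta_hat equals theta_tilde, so there is nothing to shrink.
        return list(theta_tilde), list(theta_tilde), psi
    factor = 1.0 - c / psi
    se = [b + factor * d for b, d in zip(theta_tilde, diff)]
    pse = [b + max(factor, 0.0) * d for b, d in zip(theta_tilde, diff)]
    return se, pse, psi


# Assumed toy inputs: a short observation period gives psi < c, so the SE over-shrinks.
se, pse, psi = shrinkage_estimators([1.0, 1.2, 0.8], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], T=10.0)
```

In this toy call ψ is below c, so the SE moves past the restricted estimate (the sign of its deviation from the pivot flips), while the PSE coincides with the restricted estimate.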

In parametric setups, the SE, the PSE and other related estimators have been extensively studied (Judge and Bock, 1978, and the references therein). Large-sample properties of these estimators were studied by Sen (1986), Ahmed (2005) and others. Stigler (1990) and Kubokawa (1998) provide excellent reviews of (parametric) shrinkage estimators. Jurečková and Sen (1996) give an extensive treatment of the asymptotics and interrelationships of robust estimators of location, scale and regression parameters, with due emphasis on Stein-rule estimators.

An interesting feature of this paper is that we consider the shrinkage estimation of drift parameters in the multi-factors model, a more general form of the Vasicek model. Based on the reviewed literature, this kind of study is not yet available to practitioners.

Having defined all these estimators, we need to stress the regularity conditions under which they have good performance characteristics. Unlike the maximum likelihood estimators $\hat{\boldsymbol{\theta}}$ and $\tilde{\boldsymbol{\theta}}$, these shrinkage estimators are not linear. Hence, even when the underlying distribution is normal, the finite-sample distribution theory of these shrinkage estimators is not simple to obtain. This difficulty has been largely overcome by asymptotic methods (Ahmed and Saleh, 1999; Jurečková and Sen, 1996; and others). These asymptotic methods relate primarily to convergence in distribution, which may not generally guarantee convergence in quadratic risk. This technicality has been taken care of by the introduction of the asymptotic distributional risk (ADR) (Sen, 1986), which, in turn, is based on the concept of a shrinking neighborhood of the pivot, for which the ADR serves a useful and interpretable role in asymptotic risk analysis.

We focus on small and moderate lengths of the observation period and study the ADR of the different estimators by means of a simulation study. For this purpose, in the next subsection, we briefly recapitulate the ADR and asymptotic bias (AB) results pertaining to them. This forms a basis and provides some justification for moderate lengths of the observation period.

2.3 Asymptotic Properties

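First, for the testing problem (6), let us consider the following local alternative

$$K_T: \boldsymbol{\theta} = \bar{\theta}\boldsymbol{e}_p + \frac{\boldsymbol{\delta}}{\sqrt{T}}, \tag{13}$$

where $\boldsymbol{\delta}$ is a $p$-column vector whose direction differs from that of $\boldsymbol{e}_p$. Also, we assume that $\|\boldsymbol{\delta}\| < \infty$. Further, let

$$\Upsilon = \left(\boldsymbol{e}_p'\Sigma^{-1}\boldsymbol{e}_p\right)^{-1}\boldsymbol{e}_p\boldsymbol{e}_p', \qquad \boldsymbol{\delta}^{*} = \boldsymbol{\delta} - \Upsilon\Sigma^{-1}\boldsymbol{\delta}, \qquad \Sigma^{*} = \Sigma - \Upsilon.$$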


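The asymptotic study of the shrinkage estimator (11) can be derived from the asymptotic distribution of the quantities

$$\boldsymbol{\rho}_T = \sqrt{T}\left(\hat{\boldsymbol{\theta}} - \bar{\theta}\boldsymbol{e}_p\right), \qquad \boldsymbol{\xi}_T = \sqrt{T}\left(\hat{\boldsymbol{\theta}} - \tilde{\boldsymbol{\theta}}\right), \qquad \boldsymbol{\zeta}_T = \sqrt{T}\left(\tilde{\boldsymbol{\theta}} - \bar{\theta}\boldsymbol{e}_p\right).$$

Proposition 3 Assume that the model (5) holds. Under the local alternative (13),

(i)
$$\begin{pmatrix} \boldsymbol{\rho}_T \\ \boldsymbol{\xi}_T \end{pmatrix} \xrightarrow[T\to\infty]{\mathcal{L}} N_{2p}\left(\begin{pmatrix} \boldsymbol{\delta} \\ \boldsymbol{\delta}^{*} \end{pmatrix}, \begin{pmatrix} \Sigma & \Sigma^{*} \\ \Sigma^{*} & \Sigma^{*} \end{pmatrix}\right).$$

(ii)
$$\begin{pmatrix} \boldsymbol{\zeta}_T \\ \boldsymbol{\xi}_T \end{pmatrix} \xrightarrow[T\to\infty]{\mathcal{L}} N_{2p}\left(\begin{pmatrix} \boldsymbol{\delta} - \boldsymbol{\delta}^{*} \\ \boldsymbol{\delta}^{*} \end{pmatrix}, \begin{pmatrix} \Upsilon & \boldsymbol{0} \\ \boldsymbol{0} & \Sigma^{*} \end{pmatrix}\right).$$

The proof of this proposition is given in the Appendix.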

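Corollary 2 Let $\Xi = \Sigma^{-1} - \Sigma^{-1}\left(\boldsymbol{e}_p'\Sigma^{-1}\boldsymbol{e}_p\right)^{-1}\boldsymbol{e}_p\boldsymbol{e}_p'\Sigma^{-1}$. If (13) holds, then

$$\boldsymbol{\xi}_T'\Sigma^{-1}\boldsymbol{\xi}_T \xrightarrow[T\to\infty]{\mathcal{L}} \chi^2_{p-1}\left(\boldsymbol{\delta}^{*\prime}\Xi\boldsymbol{\delta}^{*}\right).$$

For the proof, see the Appendix.

Based on Corollary 2, we establish Corollary 3, which gives the asymptotic power of the test $\Psi$.

Corollary 3 Under the conditions of Corollary 2, we have

$$\lim_{T\to\infty}\Pi_{\Psi}\left(\boldsymbol{\theta} + \frac{\boldsymbol{\delta}}{\sqrt{T}}\right) = \Pr\left\{\chi^2_{p-1}\left(\boldsymbol{\delta}^{*\prime}\Xi\boldsymbol{\delta}^{*}\right) > \chi^2_{p-1;\alpha}\left(\boldsymbol{\delta}^{*\prime}\Xi\boldsymbol{\delta}^{*}\right)\right\}.$$

The proof is straightforward by using Corollary 2.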

It is well known that, even for the normal distribution, the effective domain of risk dominance of the PSE or SE over the MLE is a small neighborhood of the chosen pivot (viz., $\boldsymbol{\theta} = \bar{\theta}\boldsymbol{e}_p$), and as we make the observation period $T$ larger and larger, this domain becomes narrower. In the present context, under the assumed regularity conditions, Corollary 2 shows that for any fixed $\boldsymbol{\theta} \neq \bar{\theta}\boldsymbol{e}_p$,

$$\psi \xrightarrow[T\to\infty]{\mathcal{L}} \chi^2_{p-1}\left(\boldsymbol{\delta}^{*\prime}\Xi\boldsymbol{\delta}^{*}\right) \quad \text{and} \quad T^{-1}\psi \xrightarrow[T\to\infty]{\mathcal{L}} 0; \tag{14}$$

as such, the shrinkage factor satisfies $c\psi^{-1} = O_p\left(T^{-1}\right)$ as $T \to \infty$, so that asymptotically there is no shrinkage effect. This justifies the choice of the usual Pitman-type alternatives given in (13).

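Even for such local alternatives, the SE, the PSE and $\tilde{\boldsymbol{\theta}}$ may not be asymptotically unbiased estimators of $\boldsymbol{\theta}$. With that in mind, we introduce the following optimality criterion. For an estimator $\hat{\boldsymbol{\theta}}^{\star}$ of $\boldsymbol{\theta}$, we confine ourselves to a quadratic loss function of the form

$$L\left(\hat{\boldsymbol{\theta}}^{\star}, \boldsymbol{\theta}; W\right) = \left[\sqrt{T}\left(\hat{\boldsymbol{\theta}}^{\star} - \boldsymbol{\theta}\right)\right]' W \left[\sqrt{T}\left(\hat{\boldsymbol{\theta}}^{\star} - \boldsymbol{\theta}\right)\right], \tag{15}$$

where $W$ is positive semi-definite (p.s.d.). Using the distribution of $\sqrt{T}\left(\hat{\boldsymbol{\theta}}^{\star} - \boldsymbol{\theta}\right)$ and taking the expected value on both sides of (15), we get the expected loss, called the quadratic risk, $R^{o}_{T}\left(\hat{\boldsymbol{\theta}}^{\star}, \boldsymbol{\theta}; W\right) = \mathrm{trace}\left(W\hat{\Sigma}_T\right)$, where $\hat{\Sigma}_T$ is the dispersion matrix of $\sqrt{T}\left(\hat{\boldsymbol{\theta}}^{\star} - \boldsymbol{\theta}\right)$. Whenever $\lim_{T\to\infty}\hat{\Sigma}_T = \Sigma$ exists, $R^{o}_{T}\left(\hat{\boldsymbol{\theta}}^{\star}, \boldsymbol{\theta}; W\right) \to R^{o}\left(\hat{\boldsymbol{\theta}}^{\star}, \boldsymbol{\theta}; W\right) = \mathrm{trace}(W\Sigma)$, which is termed the asymptotic risk. In our setup, we denote the distribution of $\sqrt{T}\left(\hat{\boldsymbol{\theta}}^{\star} - \boldsymbol{\theta}\right)$ by $\tilde{G}_T(u)$, $u \in \mathbb{R}^p$. Suppose that $\tilde{G}_T \to \tilde{G}$ (at all points of continuity) as $T \to \infty$, and let $\Sigma_{\tilde{G}}$ be the dispersion matrix of $\tilde{G}$. Then the ADR of $\hat{\boldsymbol{\theta}}^{\star}$ is defined as

$$R^{o}\left(\hat{\boldsymbol{\theta}}^{\star}, \boldsymbol{\theta}; W\right) = \mathrm{trace}\left(W\Sigma_{\tilde{G}}\right). \tag{16}$$

Note that if $\sqrt{T}\left(\hat{\boldsymbol{\theta}}^{\star} - \boldsymbol{\theta}\right)$ converges in second moment, then the ADR is indeed the asymptotic risk. However, this stronger mode of convergence may be difficult to prove analytically, especially for the SE and the PSE, and thus we shall work with the ADR results in the ongoing discussion.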

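The shrinkage estimators are, in general, biased estimators. Nevertheless, the bias is accompanied by a reduction in risk, and hence it does not have a serious impact on risk assessment. In this vein, we define the asymptotic bias as

$$B^{0}_{T}\left(\hat{\boldsymbol{\theta}}^{\star}, \boldsymbol{\theta}\right) = E\left[\sqrt{T}\left(\hat{\boldsymbol{\theta}}^{\star} - \boldsymbol{\theta}\right)\right], \tag{17}$$

and, side by side, the asymptotic distributional bias (ADB) as the limit

$$B^{0}_{T}\left(\hat{\boldsymbol{\theta}}^{\star}, \boldsymbol{\theta}\right) = \int\!\cdots\!\int x\,d\tilde{G}_T(x) \xrightarrow[T\to\infty]{} B\left(\hat{\boldsymbol{\theta}}^{\star}, \boldsymbol{\theta}\right) = \int\!\cdots\!\int x\,d\tilde{G}(x). \tag{18}$$

Two key results for the study of the ADR and ADB of the shrinkage estimators are given in Proposition 3 and Corollary 2. With these results in hand, we are in a position to use the results for the (normal distribution) parametric model, and thereby arrive at the main results of this subsection. We present (without derivation) the results on the SE and the PSE. The proofs are similar to those given in Ahmed (1992). Let $\Delta = \boldsymbol{\delta}^{*\prime}\Xi\boldsymbol{\delta}^{*}$ and let $H_{\nu}(x; \Delta) = P\left\{\chi^2_{\nu}(\Delta) \le x\right\}$, $x \in \mathbb{R}^{+}$.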

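Theorem 1 The ADB functions of the estimators are given as follows:

$$\begin{aligned}
B\left(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta}\right) &= 0, \qquad B\left(\tilde{\boldsymbol{\theta}}, \boldsymbol{\theta}\right) = -\boldsymbol{\delta}, \qquad B\left(\hat{\boldsymbol{\theta}}^{S}, \boldsymbol{\theta}\right) = -\boldsymbol{\delta}\,(p-3)\,E\left\{\chi^{-2}_{p+1}(\Delta)\right\}, \\
B\left(\hat{\boldsymbol{\theta}}^{S+}, \boldsymbol{\theta}\right) &= -\boldsymbol{\delta}\left[H_{p+1}(p-3; \Delta) + E\left\{\chi^{-2}_{p+1}(\Delta)\,I\left(\chi^2_{p+1}(\Delta) > (p-3)\right)\right\}\right].
\end{aligned} \tag{19}$$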

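Since the component $\boldsymbol{\delta}$ is common to the ADB of $\tilde{\boldsymbol{\theta}}$, $\hat{\boldsymbol{\theta}}^{S}$ and $\hat{\boldsymbol{\theta}}^{S+}$, and these differ only by scalar factors, it suffices to compare the scalar factors, which depend on $\Delta$ only. It is clear that the bias of $\tilde{\boldsymbol{\theta}}$ is an unbounded function of $\Delta$. On the other hand, the ADB of both $\hat{\boldsymbol{\theta}}^{S}$ and $\hat{\boldsymbol{\theta}}^{S+}$ are bounded in $\Delta$. Since $E\left\{\chi^{-2}_{p+1}(\Delta)\right\}$ is a decreasing log-convex function of $\Delta$, the ADB of $\hat{\boldsymbol{\theta}}^{S}$ starts from the origin at $\Delta = 0$, increases to a maximum, and then decreases towards 0. The behavior of $\hat{\boldsymbol{\theta}}^{S+}$ is similar to that of $\hat{\boldsymbol{\theta}}^{S}$. Interestingly, the bias curve of $\hat{\boldsymbol{\theta}}^{S+}$ remains below the curve of the SE for all values of $\Delta$. The simulation results in the next section also point in that direction.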
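The scalar factor E{χ⁻²(∆)} that drives these bias curves can be evaluated numerically. The sketch below relies on the standard representation of a noncentral chi-square variable with ν degrees of freedom and noncentrality ∆ as a Poisson(∆/2) mixture of central chi-square variables with ν + 2k degrees of freedom, together with E{1/χ²_m} = 1/(m − 2) for m > 2. The function name, the truncation rule and the tolerance are assumptions of this illustration.

```python
import math


def inv_ncx2_mean(nu, delta, tol=1e-12, max_terms=10000):
    """E[1 / chi2_nu(delta)] for nu > 2 via the Poisson-mixture representation."""
    if nu <= 2:
        raise ValueError("the expectation is finite only for nu > 2")
    lam = delta / 2.0
    weight = math.exp(-lam)  # Poisson(lam) probability of k = 0
    total = 0.0
    covered = 0.0
    for k in range(max_terms):
        total += weight / (nu + 2 * k - 2)
        covered += weight
        if 1.0 - covered < tol:  # remaining Poisson mass is negligible
            break
        weight *= lam / (k + 1)
    return total


# For p = 4 the relevant degrees of freedom in Theorem 1 are p + 1 = 5.
values = [inv_ncx2_mean(5, delta) for delta in (0.0, 1.0, 2.0, 4.0)]
```

The computed values decrease as ∆ grows, in line with the monotonicity noted above; very large ∆ would need a more careful summation because the initial Poisson weight underflows.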

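Theorem 2 The ADR functions of the estimators are given as follows:

$$\begin{aligned}
R\left(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta}; W\right) &= \mathrm{trace}(W\Sigma), \\
R\left(\tilde{\boldsymbol{\theta}}, \boldsymbol{\theta}; W\right) &= \mathrm{trace}(W\Sigma) - \mathrm{trace}\left(W\Sigma^{-}\right) + \boldsymbol{\delta}^{*\prime}W\boldsymbol{\delta}^{*}, \qquad \Sigma^{-} = \Sigma - \boldsymbol{e}_p\boldsymbol{e}_p', \\
R\left(\hat{\boldsymbol{\theta}}^{S}, \boldsymbol{\theta}; W\right) &= \mathrm{ADR}\left(\hat{\boldsymbol{\theta}}\right) + \boldsymbol{\delta}^{*\prime}W\boldsymbol{\delta}^{*}\left(p^2-3\right)E\left(\chi^{-4}_{p+3}(\Delta)\right) \\
&\quad - (p-3)\,\mathrm{trace}\left(W\Sigma^{-}\right)\left\{2E\left(\chi^{-2}_{p+1}(\Delta)\right) - (p-3)E\left(\chi^{-4}_{p+1}(\Delta)\right)\right\},
\end{aligned} \tag{20}$$

$$\begin{aligned}
R\left(\hat{\boldsymbol{\theta}}^{S+}, \boldsymbol{\theta}; W\right) &= \mathrm{ADR}\left(\hat{\boldsymbol{\theta}}^{S}\right) + (p-3)\,\mathrm{trace}\left(W\Sigma^{-}\right) \\
&\quad \times \left[2E\left\{\chi^{-2}_{p+1}(\Delta)\,I\left(\chi^2_{p+1}(\Delta) \le (p-3)\right)\right\} - (p-3)E\left\{\chi^{-4}_{p+1}(\Delta)\,I\left(\chi^2_{p+1}(\Delta) \le (p-3)\right)\right\}\right] \\
&\quad - \mathrm{trace}(W\Sigma)\,H_{p+1}(p-2; \Delta) \\
&\quad + \boldsymbol{\delta}^{*\prime}W\boldsymbol{\delta}^{*}\left\{2H_{p+1}(p-3; \Delta) - H_{p+3}(p-3; \Delta)\right\} \\
&\quad - (p-3)\,\boldsymbol{\delta}^{*\prime}W\boldsymbol{\delta}^{*}\Big[2E\left\{\chi^{-2}_{p+1}(\Delta)\,I\left(\chi^2_{p+1}(\Delta) \le (p-3)\right)\right\} \\
&\qquad - 2E\left\{\chi^{-2}_{p+3}(\Delta)\,I\left(\chi^2_{p+3}(\Delta) \le (p-3)\right)\right\} \\
&\qquad + (p-2)E\left\{\chi^{-4}_{p+3}(\Delta)\,I\left(\chi^2_{p+3}(\Delta) \le (p-2)\right)\right\}\Big].
\end{aligned} \tag{21}$$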

The proof is similar to that given in Ahmed (1992). For a suitable choice of the matrix $W$, the risk dominance of the estimators is similar to that under normal theory and can be summarized as follows:

– (i) Clearly,

$$R\left(\hat{\boldsymbol{\theta}}^{S}, \boldsymbol{\theta}; W\right) < \mathrm{trace}(W\Sigma) \quad \text{for all } \Delta \in [0, \infty),$$

hence the SE provides greater estimation accuracy than the MLE, a gold standard. Indeed, the ADR function of the SE is monotone in $\Delta$; the smallest value is achieved at $\Delta = 0$ and the largest is $\mathrm{trace}(W\Sigma)$. Hence, $\hat{\boldsymbol{\theta}}^{S}$ outperforms $\hat{\boldsymbol{\theta}}$ and is an admissible estimator when compared with $\hat{\boldsymbol{\theta}}$.

– (ii) $\hat{\boldsymbol{\theta}}^{S+}$ is asymptotically superior to $\hat{\boldsymbol{\theta}}^{S}$ in the entire parameter space induced by $\Delta$. Therefore, $\hat{\boldsymbol{\theta}}^{S+}$ is also superior to the classical MLE.

– (iii) Importantly, $\hat{\boldsymbol{\theta}}^{S+}$ does not inherit the over-shrinking problem.

For practical reasons, we conduct an extensive simulation study to investigate the performance of the proposed estimators for moderate values of the observation period. Such a study is not available in the reviewed literature for shrinkage procedures in the present context. Our simulation experiments provide strong evidence that corroborates the asymptotic theory and confirms the validity of the SES.


3 Simulation Study

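In this section, we carry out a Monte Carlo simulation study to examine the risk (namely MSE) performance of all the estimators under consideration.

In order to conserve space, we report detailed results only for $p = 3$ and $p = 5$. The UPI (pivot) is given by the null hypothesis $H_0: \boldsymbol{\theta} = \bar{\theta}\boldsymbol{e}_p$. Observation periods of length $T = 30$ and $T = 50$ are considered. The relative efficiency of an estimator with respect to $\hat{\boldsymbol{\theta}}$ is defined by

$$\mathrm{RMSE} = \frac{\mathrm{risk}\left(\hat{\boldsymbol{\theta}}\right)}{\mathrm{risk}(\text{proposed estimator})}.$$

In particular, we define

$$\mathrm{RMSE\_U} = \frac{\mathrm{risk}\left(\hat{\boldsymbol{\theta}}\right)}{\mathrm{risk}\left(\hat{\boldsymbol{\theta}}\right)}, \qquad \mathrm{RMSE\_R} = \frac{\mathrm{risk}\left(\hat{\boldsymbol{\theta}}\right)}{\mathrm{risk}\left(\tilde{\boldsymbol{\theta}}\right)},$$

$$\mathrm{RMSE\_S} = \frac{\mathrm{risk}\left(\hat{\boldsymbol{\theta}}\right)}{\mathrm{risk}\left(\hat{\boldsymbol{\theta}}^{S}\right)}, \qquad \mathrm{RMSE\_S\_plus} = \frac{\mathrm{risk}\left(\hat{\boldsymbol{\theta}}\right)}{\mathrm{risk}\left(\hat{\boldsymbol{\theta}}^{S+}\right)}.$$

Thus, a relative efficiency larger than one indicates the degree of superiority of the estimator over $\hat{\boldsymbol{\theta}}$. The results are reported graphically in Figure 1(a)-1(b) for $p = 3$ and in Figure 1(c)-1(d) for $p = 5$. Figure 1 indicates that a reduction of 50% or more in the risk seems quite realistic, depending on the values of $\Delta$ and $p$.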
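For readers who wish to reproduce such efficiency ratios from their own replications, the following sketch computes a Monte Carlo version of the relative efficiency. It assumes that the risk is the average squared Euclidean distance to the true parameter (that is, W equal to the identity; the common factor T cancels in the ratio). The function names and the toy numbers are illustrative only.

```python
def risk(estimates, theta):
    """Average squared Euclidean distance between replicated estimates and theta."""
    total = sum(sum((e - t) ** 2 for e, t in zip(est, theta)) for est in estimates)
    return total / len(estimates)


def relative_efficiency(mle_estimates, other_estimates, theta):
    """risk(MLE) / risk(other estimator); values above 1 favor the other estimator."""
    return risk(mle_estimates, theta) / risk(other_estimates, theta)


# Toy replications (assumed numbers, two replications of a p = 2 problem).
theta_true = [1.0, 1.0]
mle_reps = [[1.2, 0.8], [0.9, 1.1]]
shrunk_reps = [[1.1, 0.9], [1.0, 1.0]]
eff = relative_efficiency(mle_reps, shrunk_reps, theta_true)
```

In an actual study the replicated estimates would come from many simulated paths for each value of ∆.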


[Figure 1: relative efficiency (RMSE, vertical axis) plotted against ∆ (Delta, horizontal axis) for RMSE_U, RMSE_R, RMSE_S and RMSE_S_plus. Panels: (a) T = 30 and p = 3; (b) T = 50 and p = 3; (c) T = 30 and p = 5; (d) T = 30 and p = 5.]

FIG. 1 Relative efficiency vs ∆

The salient features of these simulations can be summarized as follows (with a few exceptions that are detailed later):

(a) The behavior of the SES is quite stable, and the SES always has risk dominance over the MLE around the pivot, as theoretically expected.

(b) Comparing Figure 1(a)-1(b) with Figure 1(c)-1(d), one can see that the higher the parameter dimension, the more pronounced the risk dominance of the shrinkage estimators over the MLE.

(c) The efficiency of $\tilde{\boldsymbol{\theta}}$ converges to 0 as the normalized distance between the true values of the drift parameters and the hypothesized values grows. However, the performance of $\tilde{\boldsymbol{\theta}}$ is superior to that of the shrinkage estimators around the pivot.

4 Recommendations and concluding remarks

In this paper, we consider the inference problem concerning the drift parameter vector of a multivariate Ornstein-Uhlenbeck process. This model has important applications in finance, specifically in modelling the term structure of interest rates in a macroeconomic context. It is natural to establish inference strategies for the drift parameters of this model. Our contribution consists in suggesting shrinkage estimators that drastically improve the performance of the classical maximum likelihood estimator. This is demonstrated by theoretical as well as numerical results.

In summary, we conclude that the positive-part shrinkage estimator dominates the usual shrinkage estimator. Further, both shrinkage estimators outperform the MLE in the entire parameter space. We recommend the use of the PSE over the SE and the MLE. We re-emphasize here that $\hat{\boldsymbol{\theta}}^{S}$ should not be used, since it is not a convex combination of the restricted and unrestricted estimators and may also change the sign of the baseline estimator. One may agree on adjusting the magnitude of the usual estimator, but a change of sign is a rather serious matter and would make a practitioner uncomfortable. For this reason, we have introduced the SE only as a tool for developing $\hat{\boldsymbol{\theta}}^{S+}$. Our simulation studies have provided strong evidence that corroborates the asymptotic theory related to the proposed SES.

Finally, our numerical findings indicate that a reduction of 50% or more in the risk seems quite realistic, depending on the values of $\Delta$ and $p$ (see Figure 1). Like the statistical models underlying the inferences to be made, the homogeneity of the drift parameters is subject to uncertainty, and practitioners may be reluctant to impose the additional information regarding the parameters in the estimation process. The suggested SES is robust in the sense that it still works well when such a constraint does not hold. More importantly, the estimation strategy combining the sample and parameter information is superior to a strategy based on sample information only. Research on the statistical implications of these and other estimators for a range of statistical models is ongoing.

Acknowledgements This research received financial support from the Natural Sciences and Engineering Research Council of Canada.

Appendix

A Some Theoretical results

Let $\mu_{\boldsymbol{W}}$ be the measure induced by the Wiener process $\{\boldsymbol{W}(t), t \ge 0\}$. Also, let us denote by $\mu_U$ the probability measure induced by a process $\{U(t), t \ge 0\}$, i.e.,

$$\mu_U(B) = P\{\omega : U_t(\omega) \in B\}, \quad \text{where } B \text{ is a Borel set}.$$

The following result plays a central role in establishing the MLE of $\boldsymbol{\theta}$.

Proposition 4 Conditionally on $\boldsymbol{X}_0$, the Radon-Nikodym derivative of $\mu_{\boldsymbol{X}}$ with respect to $\mu_{\boldsymbol{W}}$ is given by

$$\frac{d\mu_{\boldsymbol{X}}}{d\mu_{\boldsymbol{W}}}(\boldsymbol{X}) = \exp\left\{-\boldsymbol{\theta}'\Sigma^{-1}\boldsymbol{U}_T - \frac{1}{2}\boldsymbol{\theta}'\Sigma^{-1}\boldsymbol{D}_T\boldsymbol{\theta}\right\}. \tag{22}$$

The proof is straightforward by applying Theorem 7.7 in Liptser and Shiryayev (1977). It should be noted that relation (22) has the same form as equation 17.24 of Liptser and Shiryayev (1977) for the univariate case with a non-random initial value.

A.1 Asymptotic Properties of MLE

In the univariate case, Liptser and Shiryayev (1978, Theorem 17.3, Lemma 17.3 and Theorem 17.4) and Kutoyants (2004) give some asymptotic results for the maximum likelihood estimator of the drift parameter of an Ornstein-Uhlenbeck process. The ideas of the proofs are the same as in the quoted references. Nevertheless, for completeness, we outline the proof of the strong consistency.


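Proposition 5 (Strong consistency) Assume that the model (5) holds.

(i) Then,

$$\Pr\left\{\lim_{T\to\infty}\hat{\boldsymbol{\theta}} = \boldsymbol{\theta}\right\} = 1.$$

(ii) If $\theta_1 = \theta_2 = \cdots = \theta_p$, then

$$\Pr\left\{\lim_{T\to\infty}\tilde{\boldsymbol{\theta}} = \bar{\theta}\boldsymbol{e}_p\right\} = 1,$$

where $\bar{\theta}$ is the common drift parameter.

Proof For any matrix $A$, we denote

$$\|A\|^2 = \mathrm{trace}(AA').$$

(i) Let

$$M(T) = \left(\int_0^T X_1(t)\,dW_1(t), \int_0^T X_2(t)\,dW_2(t), \ldots, \int_0^T X_p(t)\,dW_p(t)\right)'.$$

By some computations, we have

$$\boldsymbol{U}_T = -\boldsymbol{D}_T\boldsymbol{\theta} - M(T)\Sigma^{1/2}.$$

Therefore,

$$\hat{\boldsymbol{\theta}}^{*} - \boldsymbol{\theta} = \boldsymbol{D}_T^{-1}M(T)\Sigma^{1/2}. \tag{23}$$

Obviously, $M(T)$ is a martingale whose quadratic variation is $\boldsymbol{D}_T$. Moreover, one can verify that

$$\Pr\left\{\lim_{T\to\infty}\|\boldsymbol{D}_T\| = \infty\right\} = 1,$$

and, for any column vector $\boldsymbol{a}$, the process $\boldsymbol{a}'\boldsymbol{D}_T\boldsymbol{a}$ is nondecreasing in $T$. Hence, by the strong law of large numbers for martingales, we get

$$\hat{\boldsymbol{\theta}}^{*} - \boldsymbol{\theta} \xrightarrow[T\to\infty]{\text{a.s.}} \boldsymbol{0},$$

which implies that $\hat{\boldsymbol{\theta}}$ is strongly consistent for $\boldsymbol{\theta}$, and this completes the proof of (i).

(ii) In a similar way, we prove that

$$\tilde{\boldsymbol{\theta}} - \boldsymbol{\theta} \xrightarrow[T\to\infty]{\text{a.s.}} \boldsymbol{0},$$

which completes the proof. □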

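Proposition 6 (Asymptotic normality) Assume that the model (5) holds and suppose that $\boldsymbol{X}_0$ has the same moments as the invariant distribution.

(i) Then,

$$\sqrt{T}\left(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\right) \xrightarrow[T\to\infty]{\mathcal{L}} N_p(\boldsymbol{0}, \Sigma), \quad \text{and} \quad T\left(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\right)'\Sigma^{-1}\left(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\right) \xrightarrow[T\to\infty]{\mathcal{L}} \chi^2_p.$$

(ii) If $\theta_1 = \theta_2 = \cdots = \theta_p$, then

$$\sqrt{T}\left(\tilde{\boldsymbol{\theta}} - \bar{\theta}\boldsymbol{e}_p\right) \xrightarrow[T\to\infty]{\mathcal{L}} N_p\left(\boldsymbol{0}, \boldsymbol{e}_p\boldsymbol{e}_p'\left(\boldsymbol{e}_p'\Sigma^{-1}\boldsymbol{e}_p\right)^{-1}\right).$$

The proof follows by applying Proposition 1.34 or Theorem 2.8 of Kutoyants (2004, p. 61 and p. 121). Let

$$\boldsymbol{\rho}_T = \sqrt{T}\left(\hat{\boldsymbol{\theta}} - \bar{\theta}\boldsymbol{e}_p\right), \qquad \boldsymbol{\xi}_T = \sqrt{T}\left(\hat{\boldsymbol{\theta}} - \tilde{\boldsymbol{\theta}}\right), \qquad \boldsymbol{\zeta}_T = \sqrt{T}\left(\tilde{\boldsymbol{\theta}} - \bar{\theta}\boldsymbol{e}_p\right),$$

$$\Upsilon = \left(\boldsymbol{e}_p'\Sigma^{-1}\boldsymbol{e}_p\right)^{-1}\boldsymbol{e}_p\boldsymbol{e}_p', \quad \text{and let} \quad \Sigma^{*} = \Sigma - \Upsilon.$$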

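Proposition 7 Assume that Proposition 6 holds. Under $H_0$ given in (6), we have

(i)
$$\begin{pmatrix} \boldsymbol{\rho}_T \\ \boldsymbol{\xi}_T \end{pmatrix} \xrightarrow[T\to\infty]{\mathcal{L}} N_{2p}\left(\begin{pmatrix} \boldsymbol{0} \\ \boldsymbol{0} \end{pmatrix}, \begin{pmatrix} \Sigma & \Sigma^{*} \\ \Sigma^{*} & \Sigma^{*} \end{pmatrix}\right).$$

(ii)
$$\begin{pmatrix} \boldsymbol{\zeta}_T \\ \boldsymbol{\xi}_T \end{pmatrix} \xrightarrow[T\to\infty]{\mathcal{L}} N_{2p}\left(\begin{pmatrix} \boldsymbol{0} \\ \boldsymbol{0} \end{pmatrix}, \begin{pmatrix} \Upsilon & \boldsymbol{0} \\ \boldsymbol{0} & \Sigma^{*} \end{pmatrix}\right).$$

Proof

(i) By some computations, we get under the null hypothesis

$$\left(\boldsymbol{\rho}_T', \boldsymbol{\xi}_T'\right)' = \left(I_p,\; I_p - \left(\boldsymbol{e}_p'\Sigma^{-1}\boldsymbol{D}_T\boldsymbol{e}_p\right)^{-1}\boldsymbol{D}_T\Sigma^{-1}\boldsymbol{e}_p\boldsymbol{e}_p'\right)'\sqrt{T}\left(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\right).$$

By combining Proposition 3 and Slutsky's theorem, we get the result stated in (i).

(ii) In a similar way as in (i), we get under the null hypothesis

$$\begin{pmatrix} \boldsymbol{\zeta}_T \\ \boldsymbol{\xi}_T \end{pmatrix} = \begin{pmatrix} \left(\boldsymbol{e}_p'\Sigma^{-1}\boldsymbol{D}_T\boldsymbol{e}_p\right)^{-1}\boldsymbol{e}_p\boldsymbol{e}_p'\Sigma^{-1}\boldsymbol{D}_T \\ I_p - \left(\boldsymbol{e}_p'\Sigma^{-1}\boldsymbol{D}_T\boldsymbol{e}_p\right)^{-1}\boldsymbol{e}_p\boldsymbol{e}_p'\Sigma^{-1}\boldsymbol{D}_T \end{pmatrix}\sqrt{T}\left(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\right).$$

Again, by combining Proposition 3 and Slutsky's theorem, we get the result stated in (ii), which completes the proof. □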

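Corollary 4 Under the conditions of Proposition 7 and under $H_0$,

$$\boldsymbol{\xi}_T'\Sigma^{-1}\boldsymbol{\xi}_T \xrightarrow[T\to\infty]{\mathcal{L}} \chi^2_{p-1}.$$

Proof

$$\boldsymbol{\xi}_T'\Sigma^{-1}\boldsymbol{\xi}_T = \boldsymbol{\rho}_T'\left(\Sigma^{-1} - \Sigma^{-1}\left(\boldsymbol{e}_p'\Sigma^{-1}\boldsymbol{e}_p\right)^{-1}\boldsymbol{e}_p\boldsymbol{e}_p'\Sigma^{-1}\right)\boldsymbol{\rho}_T + \boldsymbol{\rho}_T'\left(\Xi_T - \Xi\right)\boldsymbol{\rho}_T, \tag{24}$$

where

$$\Xi_T = \left(I_p - \boldsymbol{D}_T\Sigma^{-1}\boldsymbol{e}_p\left(\boldsymbol{e}_p'\Sigma^{-1}\boldsymbol{D}_T\boldsymbol{e}_p\right)^{-1}\boldsymbol{e}_p'\right)\Sigma^{-1}\left(I_p - \boldsymbol{e}_p\left(\boldsymbol{e}_p'\Sigma^{-1}\boldsymbol{D}_T\boldsymbol{e}_p\right)^{-1}\boldsymbol{e}_p'\Sigma^{-1}\boldsymbol{D}_T\right),$$

and

$$\Xi = \Sigma^{-1} - \Sigma^{-1}\left(\boldsymbol{e}_p'\Sigma^{-1}\boldsymbol{e}_p\right)^{-1}\boldsymbol{e}_p\boldsymbol{e}_p'\Sigma^{-1}.$$

Combining Proposition 4 and Slutsky's theorem, we deduce that

$$\boldsymbol{\rho}_T'\left[\Xi_T - \Xi\right]\boldsymbol{\rho}_T \xrightarrow[T\to\infty]{\mathcal{L}} 0.$$

Moreover,

$$\Sigma\left(\Sigma^{-1} - \Sigma^{-1}\left(\boldsymbol{e}_p'\Sigma^{-1}\boldsymbol{e}_p\right)^{-1}\boldsymbol{e}_p\boldsymbol{e}_p'\Sigma^{-1}\right) = \Sigma\Xi$$

is an idempotent matrix and hence, using Proposition 4,

$$\boldsymbol{Z}'\left(\Sigma^{-1} - \Sigma^{-1}\left(\boldsymbol{e}_p'\Sigma^{-1}\boldsymbol{e}_p\right)^{-1}\boldsymbol{e}_p\boldsymbol{e}_p'\Sigma^{-1}\right)\boldsymbol{Z} = \boldsymbol{Z}'\Xi\boldsymbol{Z} \sim \chi^2_r,$$

where

$$r = \mathrm{rank}(\Xi) = \mathrm{trace}\left(\Sigma\left(\Sigma^{-1} - \Sigma^{-1}\left(\boldsymbol{e}_p'\Sigma^{-1}\boldsymbol{e}_p\right)^{-1}\boldsymbol{e}_p\boldsymbol{e}_p'\Sigma^{-1}\right)\right) = p-1.$$

□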

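Proof of Proposition 2

First, by Itô's formula, we prove that

$$\hat{\sigma}_i^2 = \sigma_i^2 + \frac{X_i^2(0)}{T}, \quad \text{for } i = 1, 2, \ldots, p. \tag{25}$$

Hence, for $i = 1, 2, \ldots, p$,

$$\Pr\left\{\lim_{T\to\infty}\hat{\sigma}_i^2 = \sigma_i^2 + \lim_{T\to\infty}\frac{X_i^2(0)}{T}\right\} = 1 \quad \text{and then} \quad \Pr\left\{\lim_{T\to\infty}\hat{\sigma}_i^2 = \sigma_i^2\right\} = 1,$$

which proves the first statement of Proposition 2. Second, from (25), we get

$$\lim_{T\to\infty}E\left\{\max_{1\le i\le p}\left|\hat{\sigma}_i^2 - \sigma_i^2\right|^2\right\} = \lim_{T\to\infty}\frac{1}{T}E\left\{\max_{1\le i\le p}|X_i(0)|^2\right\} \le p\tau^2\lim_{T\to\infty}\frac{1}{T} = 0,$$

which completes the proof. □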
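The Itô-formula step behind (25) is stated without detail. A short sketch of the computation, written here as an illustration under the model (4) rather than quoted from the source, is:

```latex
\begin{align*}
d\left(X_i^2(t)\right) &= 2X_i(t)\,dX_i(t) + \sigma_i^2\,dt
  && \text{(It\^o's formula applied to } x \mapsto x^2\text{)} \\
X_i^2(T) - X_i^2(0) &= 2\int_0^T X_i(t)\,dX_i(t) + \sigma_i^2 T
  && \text{(integrating over } [0,T]\text{)} \\
\frac{X_i^2(T)}{T} - \frac{2}{T}\int_0^T X_i(t)\,dX_i(t) &= \sigma_i^2 + \frac{X_i^2(0)}{T}
  && \text{(dividing by } T \text{ and rearranging)}
\end{align*}
```

The left-hand side of the last line is the estimator of Subsection 2.1, so the estimation error is the deterministic term involving the initial value divided by T.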

Proof of Proposition 3


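(i) By some computations, we get

\[
\left(\boldsymbol{\rho}_T', \boldsymbol{\xi}_T'\right)' =
\left(I_p,\; I_p - \left(\boldsymbol{e}_p' \Sigma^{-1} \boldsymbol{D}_T \boldsymbol{e}_p\right)^{-1} \boldsymbol{D}_T \Sigma^{-1} \boldsymbol{e}_p \boldsymbol{e}_p'\right)'
\sqrt{T}\left(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\right)
+ \left(I_p,\; I_p - \left(\boldsymbol{e}_p' \Sigma^{-1} \boldsymbol{D}_T \boldsymbol{e}_p\right)^{-1} \boldsymbol{D}_T \Sigma^{-1} \boldsymbol{e}_p \boldsymbol{e}_p'\right)' \boldsymbol{\delta}.
\]

By combining Proposition 7 and Slutsky's theorem, we get the result stated in (i).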

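(ii) In a similar way as in (i), we get

\[
\begin{pmatrix} \boldsymbol{\zeta}_T \\ \boldsymbol{\xi}_T \end{pmatrix} =
\begin{pmatrix}
\left(\boldsymbol{e}_p' \Sigma^{-1} \boldsymbol{D}_T \boldsymbol{e}_p\right)^{-1} \boldsymbol{e}_p \boldsymbol{e}_p' \Sigma^{-1} \boldsymbol{D}_T \\
I_p - \left(\boldsymbol{e}_p' \Sigma^{-1} \boldsymbol{D}_T \boldsymbol{e}_p\right)^{-1} \boldsymbol{e}_p \boldsymbol{e}_p' \Sigma^{-1} \boldsymbol{D}_T
\end{pmatrix}
\sqrt{T}\left(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\right)
+
\begin{pmatrix}
\left(\boldsymbol{e}_p' \Sigma^{-1} \boldsymbol{D}_T \boldsymbol{e}_p\right)^{-1} \boldsymbol{e}_p \boldsymbol{e}_p' \Sigma^{-1} \boldsymbol{D}_T \\
I_p - \left(\boldsymbol{e}_p' \Sigma^{-1} \boldsymbol{D}_T \boldsymbol{e}_p\right)^{-1} \boldsymbol{e}_p \boldsymbol{e}_p' \Sigma^{-1} \boldsymbol{D}_T
\end{pmatrix}
\boldsymbol{\delta}.
\]

Again, by combining Proposition 7 and Slutsky's theorem, we get the result stated in (ii), which completes the proof. □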

References

Abu-Mostafa, Y. S. (2001). Financial Model Calibration Using Consistency Hints. IEEE Transactions on Neural Networks, 12, 4, 791-808.

Ahmed, S. E. (1992). Large-sample pooling procedure for correlation. The Statistician, 41, 425-438.

Ahmed, S. E. and Saleh, A. K. Md. E. (1999). Improved nonparametric estimation of location vector in a multivariate regression model. Nonparametric Statistics, 11, 51-78.

Ahmed, S. E. (2001). Shrinkage estimation of regression coefficients from censored data with multiple observations. In S. E. Ahmed and N. Reid (Eds.), Empirical Bayes and Likelihood Inference (pp. 103-120). New York: Springer.

Ahmed, S. E. (2002). Simultaneous estimation of coefficient of variations. Journal of Statistical Planning and Inference, 104, 31-51.

Ahmed, S. E. (2005). Assessing Process Capability Index for Nonnormal Processes. Journal of Statistical Planning and Inference, 129, 195-206.

Basawa, I. V. and Prakasa Rao, B. L. S. (1980). Statistical Inference for Stochastic Processes. London: Academic Press.

Engen, S. and Sæther, B. E. (2005). Generalizations of the Moran Effect Explaining Spatial Synchrony in Population Fluctuations. The American Naturalist, 166, 603-612.

Engen, S., Lande, R., Wall, T. and DeVries, J. P. (2002). Analyzing Spatial Structure of Communities Using the Two-Dimensional Poisson Lognormal Species Abundance Model. The American Naturalist, 160, 1, 60-73.

Georges, P. (2003). The Vasicek and CIR Models and the Expectation Hypothesis of the Interest Rate Term Structure. The Bank of Canada and Department of Finance, Canada.

Judge, G. G. and Bock, M. E. (1978). The Statistical Implications of Pre-test and Stein-rule Estimators in Econometrics. Amsterdam: North Holland.

Jurečková, J. and Sen, P. K. (1996). Robust Statistical Procedures. New York: Wiley.

Kubokawa, T. (1998). The Stein phenomenon in simultaneous estimation: A review. In S. E. Ahmed et al. (Eds.), Nonparametric Statistics and Related Topics (pp. 143-173). New York: Nova Science.

Kutoyants, Y. A. (2004). Statistical Inference for Ergodic Diffusion Processes. New York: Springer.

Liptser, R. S. and Shiryayev, A. N. (1977). Statistics of Random Processes: General Theory, Vol. I. New York: Springer-Verlag.

Liptser, R. S. and Shiryayev, A. N. (1978). Statistics of Random Processes: Applications, Vol. II. New York: Springer-Verlag.

Sclove, S. L., Morris, C. and Radhakrishnan, R. (1972). Non-optimality of preliminary test estimators for the mean of a multivariate normal distribution. Annals of Mathematical Statistics, 43, 1481-1490.

Sen, P. K. (1986). On the asymptotic distributional risks of shrinkage and preliminary test versions of maximum likelihood estimators. Sankhya, Series A, 48, 354-371.

Schöbel, R. and Zhu, J. (1999). Stochastic Volatility With an Ornstein-Uhlenbeck Process: An Extension. European Finance Review, 3, 23-46.

Steele, J. M. (2001). Stochastic Calculus and Financial Applications. New York: Springer-Verlag.

Stigler, S. M. (1990). The 1988 Neyman Memorial Lecture: A Galtonian perspective on shrinkage estimators. Statistical Science, 5, 147-155.

Langetieg, T. C. (1980). A Multivariate Model of the Term Structure. The Journal of Finance, 35, 1, 71-97.

Vasicek, O. (1977). An Equilibrium Characterization of the Term Structure. Journal of Financial Economics, 5, 177-188.
