Probability density estimation in stochastic environmental models

ORIGINAL PAPER

E. van den. Berg Æ A. W. Heemink Æ H. X. Lin

J. G. M. Schoenmakers

Probability density estimation in stochastic environmental modelsusing reverse representations

Published online: 20 November 2005� Springer-Verlag 2005

Abstract The estimation of probability densities ofvariables described by stochastic differential equationshas long been done using forward time estimators, whichrely on the generation of forward in time realizations ofthe model. Recently, an estimator based on the combi-nation of forward and reverse time estimators has beendeveloped. This estimator has a higher order of con-vergence than the classical one. In this article, we explorethe new estimator and compare the forward and for-ward–reverse estimators by applying them to a bio-chemical oxygen demand model. Finally, we show thatthe computational efficiency of the forward–reverseestimator is superior to the classical one, and discuss thealgorithmic aspects of the estimator.

1 Introduction

Many environmental models used today are based ondeterministic differential equations (Loucks et al.1981).However, the abilities of these models for prediction inreal life applications, are often hampered by uncertain-ties in initial conditions, model parameters and/orsources. Decision makers have also recognized theinherent uncertainty in modeling efforts and they aremore and more asking for risk estimates. For example,in a water pollution problem, they want to know theprobability of exceeding a critical concentration of asubstance.

Uncertainties can be included into the model byintroducing noise processes as forcing terms. As a resulta stochastic differential equation is obtained (Jazwinsky1970). In general stochastic differential equations cannotbe solved analytically and have to be approximatednumerically to derive a discrete stochastic model(Kloeden and Platen 1992). A well-known technique forestimating probabilistic characteristics of a discrete sto-chastic model is Monte Carlo simulation (Hammersleyand Handscomb 1964). Here many different realizationsof the stochastic model are generated to get informationabout the probability density of the model results. Thisapproach is conceptually very simple and can be appliedfor many different types of (highly nonlinear) problems.However, Monte Carlo techniques do consume a largeamount of CPU time and the accuracy of the estimatesimproves very slowly with the increase of sample size.

The efficiency of Monte Carlo methods can be im-proved by variance reduction (Kloeden and Platen1992). Here an approximation of the Kolmogorovbackwards equation is required. In some simple appli-cations analytical approximations of this partial differ-ential equation can be used. In general, however, anumerical solution is required. For high dimensionalsystems this may become very time consuming(Schoenmakers et al 2002), (Milstein and Schoenmakers2002).

Recently, (Milstein et al. 2002) introduced the con-cept of reverse time diffusion. The classical Monte Carloestimate is based on forward realizations of the originalstochastic model. (Milstein et al. 2002) derived a reversesystem from the original model and showed that theclassical Monte Carlo estimate can also be based onrealization of this reverse system. For many applicationsit is more efficient to use realizations of the reverse sys-tem instead of the original model. The most efficientimplementation is, however, obtained if the forwardrealizations and the reverse system realizations arecombined. This is called the forward–reverse estimator.Estimators based on reverse time realization are veryattractive if the probability density is required at a very

E. van den. Berg Æ A. W. Heemink (&) Æ H. X. LinDepartment of Applied Mathematical Analysis,Delft University of Technology Faculty of InformationTechnology and Systems, Mekelweg 4, 2628 CD Delft,The NetherlandsE-mail: [email protected].: +31-15-2785813Fax: +31-15-2787295

J. G. M. SchoenmakersWeierstrass Institute for Applied Analysis and Stochastics,Mohrenstrasse 39, 10117 Berlin, Germany

Stoch Environ Res Risk Assess (2005) 20: 126–139DOI 10.1007/s00477-005-0022-5

limited number of time steps and at a very limitednumber of exceedence levels.

In this paper, we apply the forward–reverse methodto estimate the probability density of a stochastic bio-chemical-oxygen demand (BOD) model. In Sect. 2, wedescribe the classical Monte Carlo estimation, present ageneral probabilistic representation of this estimatorbased on forward realizations and introduce the for-ward–reverse estimator. In Sect. 3, the BOD model isgiven, and its reverse model is derived. We describe indetail an application of the method to estimate theprobability density of the BOD model in Sect. 4 andcompare the performance of the forward with forward–reverse estimators in Sect. 5.

2 Transition density estimators for SDEs

Consider the stochastic differential equation (SDE) inthe Ito sense

dX ¼ aðs;X Þdsþ rðs;X ÞdW ðsÞ; t0 � s � T ; X ðt0Þ ¼ x0ð1Þ

where X ¼ ðX 1; :::;X dÞ>; a ¼ ða1; :::; adÞ> are d-dimen-sional vectors, W ¼ ðW 1; :::;W mÞ> is an m-dimensionalstandard Wiener process, and r ={rij} is a d· m-matrix,m‡ d. We assume that the d· d-matrix b :¼ rr>; b ¼fbijg; is of full rank for every (s,x), s2[t0,T], x2Rd. Thefunctions ai(s,x) and r ij(s,x) are assumed to be suffi-ciently smooth such that we have existence anduniqueness of the solution Xt,x(s)2Rd, Xt,x(t)=x, t0 £t £ s £ T, of (1), smoothness of the transition densityp(t,x,s,y) of the Markov process X, and existence of allthe moments of p(Æ ,Æ ,Æ ,y).

The solution of SDE ((1)) may be approximated bydifferent numerical methods, see (Kloeden and Platen1992; Milstein 1995). Here we consider the Eulerscheme,

X ðtkþ1Þ ¼ X ðtkÞ þ aðtk;X ðtkÞÞðtkþ1 � tkÞþ rðtk;X ðtkÞÞ

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

tkþ1 � tkp

1k; ð2Þ

with t0<t1<... <tK=T, and 1k 2 Rm; k ¼ 0; . . . ;K � 1;being i.i.d. standard normal random variables. In fact,(2) is the most simple numerical scheme for integrationof SDE (1).

2.1 Classical and non-classical density estimators

We now review some known estimators for the densityp(t,x,T,y) in connection with (1).

2.1.1 The Parzen-Rosenblatt (pure forward or classical)estimator

Let �Xt;x be a numerical approximation ofXt,x obtained bythe Euler method and let X n :¼ �X ðnÞt;x ðT Þ; n ¼ 1; . . . ;N ; be

a sample of independent realizations of �Xt;xðT Þ: Then onemay estimate the transition density p(t,x,T,y) from thissample by using standard techniques of non-parametricstatistics such as the classical kernel (Parzen-Rosenblatt)estimator. The kernel (Parzen-Rosenblatt) density esti-mator with a kernel K and a bandwidth d is given by

pFEðt; x; T ; yÞ ¼1

Ndd

X

N

n¼1K

X n � yd

� �

; ð3Þ

see (Devroye and Gyorfi 1985), (Silverman 1986). Forexample, in (3) one could take the Gaussian kernelK(x)=(2p)�d/2exp (�|x|2/2). Here d should decrease tozero as N increases while Ndd fi ¥, see (Devroye 1986).

2.1.2 The pure reverse estimator (RE)

In order to proceed with more sophisticated densityestimators we introduce a reverse diffusion system for(1). In a more special setting the notion of reverse oradjoint diffusion is due to Thomson (Thomson 1987).The reverse system introduced in this paper can be seenas a generalization of Thomson’s approach and is de-rived in (Milstein, Schoenmakers and Spokoiny 2002) ina more transparent and more rigorous way. We firstintroduce a reversed time variable ~s ¼ T þ t � s anddefine the functions

~aið~s; yÞ ¼ aiðT þ t � ~s; yÞ;

~bijð~s; yÞ ¼ bijðT þ t � ~s; yÞ:

For a vector process Yt,y(s)2IRd and a scalar processYt;yðsÞwe then consider the reverse time stochastic system

dY ¼ aðs; Y Þdsþ ~aðs; Y Þd ~W ðsÞ; Y ðtÞ ¼ y;dY ¼ cðs; Y ÞYds; YðtÞ ¼ 1;

ð4Þ

with ~W being an m-dimensional standard Wiener processand

aiðs; yÞ ¼X

d

j¼1

@~bij

@yj� ~ai; ð5Þ

cðs; yÞ ¼ 1

2

X

d

i;j¼1

@2~bij

@yi@yj �X

d

i¼1

@~ai

@yi ; ð6Þ

~rðs; yÞ ¼ rðT þ t � s; yÞ ð7Þ

It is possible to construct an alternative density esti-mator in terms of the reverse system (4). Suppose that

ð�Y ðmÞ0;y ;�YðmÞ0;y Þ;m ¼ 1; :::;M ; is an i.i.d. sample of numerical

solutions of (4) starting at t=0, obtained by the Eulerscheme. Then a pure reverse estimator is given by

pREðt; x; T ; yÞ :¼ 1

Mdd

X

M

m¼1K

x� �Y ðmÞ0;y ðT Þd

!

�YðmÞ0;y ðT Þ: ð8Þ

127

In fact, the reverse estimator (8) can be obtained as aside case from the forward–reverse estimator discusseddiscussed in C.

2.1.3 The forward–reverse estimator (FRE)

By combining the forward (1) and reverse (4) estimatorsvia the Chapman-Kolmogorov equation with respect toan intermediate time t*, (Milstein, Schoenmakers, andSpokoiny 2002) constructed the so called FRE,

pFREðt; x; T ; yÞ ¼1

M

"

1

Ndd

X

M

m¼1

X

N

n¼1K

�X ðnÞt;x ðt�Þ � �Y ðmÞt�;y ðT Þd

!

� �YðmÞt�;yðT Þ

#

: ð9Þ

It is shown that the FRE (9) has superior propertiesin comparison with density estimators based on pureforward (3) or pure reverse (8) representations. Obvi-ously, by taking t*=T and t*=0, the estimator (9)collapses to the pure forward estimator (3) and purereverse estimator (8), respectively.

2.2 Theoretical properties of the estimators

2.2.1 Order of convergence.

In general, for estimating a target value p by an esti-mator p it is natural and usual to define the accuracy ofthe estimator by

AccuracyðpÞ :¼ �ðpÞ :¼ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

Eðp � pÞ2q

¼ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

Deviation2ðpÞ þ Bias2ðpÞq

: ð10Þ

Loosely speaking, for a second order kernel appliedin (9) and any choice of 0<t*<T, the FRE has root-NðOðN�1=2ÞÞ accuracy for dimension d £ 4. For d>4root-N accuracy is lost but then the FRE accuracy orderis still the square of the FE/RE accuracy order (seeTable 1). Moreover, it can be shown that root-N accu-racy of (9) can also be achieved for d>4 by using higherorder kernels in (9).

By definition (10) it is possible to relate the ‘‘expected’’accuracy of the different density estimators to the num-ber of simulated trajectories involved. However, simu-lating trajectories is not the only costly issue in the

density estimation. For all estimators one has to evaluatea functional of the simulated trajectories. In case of theFE and RE estimators, this functional consists of a singlesummation, whereas for the FRE estimator a morecomplicated double summation needs to be evaluated.Therefore, for a proper comparison it is better to con-sider the complexity of the different estimators which isdefined as the required computation cost for reaching agiven accuracy e . For instance, naive evaluation of thedouble sum in (9) would require a computational cost oforder OðMNÞ in contrast to OðNÞ for the FE and REestimators! Clearly, such an naive approach would havea dramatic impact on the complexity of the FRE. For-tunately, smarter procedures for evaluating this doublesum exist, which utilize the small support of the kernel K.Particularly, in (Greengard and Strain 1991) it is shownfor Gaussian kernels that this sum can be evaluated at acost of OðNÞ in caseM=N. In Table 1 we summarize theresults of (Milstein et al. 2002) and list the accuracy andcomplexity of the FE, RE and the FRE where the latter isimplemented with an efficient procedure for evaluatingthe double sum, for instance, according to the method ofGreengard and Strain. For the FRE estimator we as-sumed N=M and a second order kernel. It is because ofthe second order kernel that we have to distinguish forthe FRE between d<4 and d‡ 4.

Remark. Naturally, all estimators FE, RE and FREhave a bias which involves a component due to appli-cation of the Euler scheme. Using results of (Bally andTalay 1996b) and (Bally and Talay 1996a), it is proven in(Milstein et al. 2002) that for all estimators the accuracyloss due to the Euler scheme applied with time discreti-zation step h is of order OðhÞ where, most importantly,the order coefficient may be chosen independent of thebandwidth d

2.2.2 The choice of t* in the forward–reverse estimator

The results concerning the order of accuracy and com-plexity of the FRE estimator in Sect. 2.2 do not dependon the particular choice of t* with t<t*<T. However,the order coefficients do depend on this choice. Toinvestigate this inmore detail we consider Theorem 6.1. in(Milstein et al. 2002) for d < 4. According to this theo-rem we have for M=N and dN=N�1/dlog1/dN,

EðpFRE � pÞ2 ¼ DNþ oðN�1Þ

with

Table 1 Summary of accuracy and complexity of the forward (FE), reverse (RE), and forward–reverse (FRE) estimators

Estimator dN Accuracy Complexity Compl:fFE;REgCompl:fFREg

FE/RE N�1/(4+d) OðN�2=ð4þdÞÞ Oð��2�d=2ÞFRE d £ 4 N�1/dlog1/dN OðN�1=2Þ Oðj log ej��2Þ |log e|�1e�d/2

FRE d>4 N�2/(4+d) OðN�4=ð4þdÞÞ Oðj log ej��1�d=4Þ |log e|�1e�1-d/4

128

D :¼Z

rðuÞk2ðuÞduþZ

r2ðuÞl2ðuÞqðuÞdu� 2p2

¼Z

rðuÞq2ðuÞl2ðuÞdu� p2 þZ

r2ðuÞl2ðuÞqðuÞdu� p2

¼ Dð1Þ þ Dð2Þ; ð11Þ

where r(u):=p(t,x,t*,u) is the density of the randomvariable Xt,x(t*), q(u) denotes the density of Yt�;yðT Þ; lðuÞis defined as the conditional mean of Yt�;yðT Þ givenYt�;yðT Þ ¼ u; kðuÞ :¼ qðuÞlðuÞ; and l2ðuÞ :¼ EðY2

t�;yðT Þ jYt�;yðT Þ ¼ uÞ: So we have in (11),

Dð1Þ ¼Z

rðuÞq2ðuÞl2ðuÞdu�Z

rðuÞqðuÞlðuÞdu� �2

¼ Var qðXt;xðt�ÞÞlðXt;xðt�Þ� �

¼: f ðt�Þ

and

Dð2Þ ¼Z

r2ðuÞE½Y2t�;yðT Þ j Yt�;yðT Þ ¼ u�qðuÞdu� p2

¼ E½r2ðYt�;yðT ÞÞY2t�;yðT Þ� � E½rðYt�;yðT ÞÞYt�;yðT Þ�

� �2

¼ VarðrðYt�;yðT ÞÞYt�;yðT ÞÞ ¼: gðt�Þ:

Since for t� " T ; q! dy and for t� # t; r! dx; it fol-lows that

limt�"T

f ðt�Þ ¼ þ1;

limt�#t

gðt�Þ ¼ þ1:

Further,

limt�"T

gðt�Þ ¼ limt�#t

f ðt�Þ ¼ 0:

Therefore, there exists an optimal choice t�opt whichsatisfies

f ðt�optÞ þ gðt�optÞ ¼ mint<�<T

ff ðt�Þ þ gðt�Þg:

However, determination of such an exact optimum isnot easy and also not necessary in practice. Experi-mantally, one should rather seek a t* such that both f(t*)and g(t*) are small. For instance, one could choose somecandidates for t* and compare them by estimating thevariances f(t*) and g(t*) roughly via classical (e.g. Par-zen–Rozenblatt) approximations for q and r using smallsample sizes. In Sect. 4.3 a heuristic method for thedetermination of t* will be presented. Without goinginto detail we note that similar considerations concern-ing a proper choice of t* apply for the case d>4.

3 Test case: a stochastic BOD model

The biochemical oxygen demand model presented in thissection is often used in water quality determination ofriver and estuarine systems and is an extension of theso-called Streeter–Phelps model. In this model, the

concentration levels of Carbonaceous BiochemicalOxygen Demand (CBOD), Dissolved Oxygen (DO), andNitrogenous Biochemical Oxygen Demand (NBOD) arerelated to each other by a set of differential equations.With b, o, and n representing the concentration levels(mg/l) of CBOD, DO, and NBOD, respectively, thesystem is given by

db=dtdo=dtdn=dt

2

4

3

5 ¼�kb 0 0�kc �k2 �kn

0 0 �kn

2

4

3

5

bon

2

4

3

5þs1s�2s3

2

4

3

5: ð12Þ

where we defined kb :=kc+k3 and s�2 :¼ k2ds þ po�ro þ s2. A description of the parameters used in themodel, along with their units and typical values is givenin Table 2. Although the concentrations are timedependent, the purpose behind the model is often tomonitor the concentration levels within a fixed volumeof water, flowing downstream a river. An underlyingassumption is that the velocity of the water flow isconstant and thus time and distance are linearly related.For more information about the model and its back-ground, the reader is referred to e.g. (Stijnen 2002).

3.1 Forward BOD model

As for many environmental models, BOD is also subjectto various uncertainties. These uncertainties can beincorporated into the model by adding white noiseprocesses to each of the three equations of the model,effectively changing deterministic sources and sinks intostochastic ones:

dBdOdN

2

4

3

5 ¼�kb 0 0�kc �k2 �kn

0 0 �kn

2

4

3

5

BON

2

4

3

5dt þs1s�2s3

2

4

3

5dt

þrB 0 00 rO 00 0 rN

2

4

3

5dWt; ð13Þ

where the dWt term denotes the Wiener process incre-ment at time t. The stochastic BOD model is Markovand its variables Gaussian. Because the noise terms ad-ded do not depend on the values of B, O, or N, themodel can be interpreted in both the Ito and Stratono-vich sense. The stochastic equation (15) is linear, and ananalytical solution is easily found.

A disadvantage of the simple additive white noiseterms, is the possibility of getting negative concentrationlevels. This problem can be solved by scaling the noiseterm using a suitably chosen function, e.g.:

sðxÞ ¼

0 x\0

12

2xs

� �d06x\s=2

1� 12 ð2ðs� xÞ=zÞd s=26x\s

1 s6x

8

>

>

>

>

<

>

>

>

>

:

129

with d > 0 the order of the polynomial used, and s thethreshold below which the noise should be damped.Applying this to the BOD model would result in thefollowing set of equations

dB

dO

dN

2

6

4

3

7

5

¼�kb 0 0

�kc �k2 �kn

0 0 �kn

2

6

4

3

7

5

B

O

N

2

6

4

3

7

5

dt þs1

s�2s3

2

6

4

3

7

5

dt

þsðBÞrB 0 0

0 sðOÞrO 0

0 0 sðNÞrN

2

6

4

3

7

5

dWt;

Although this solves the problem of negative con-centrations, the linearity of the model is lost, alongwith the analytical solution and the Gaussian prop-erty. Being a test case for the forward–reverse esti-mator, the presence of an analytical solution of thepresented model outweighs the disadvantage of a re-duced practical meaning, as it allows for verificationof the outcome. Therefore, it was decided to use thesimple stochastic model (15) despite its obviousshortcoming.

Given this model, we want to determine p(t,x,T,y),i.e. the probability density associated with the transitionbetween a given start (x) and end point (y) indicated bytwo (B, O, N)T vectors, at two distinct times t £ T.Based on the forward model, it is possible to apply (3),which requires a set of N realizations. These can beobtained from numerical integration of (15), as shown inFig. 1a.

3.2 Reverse BOD model

In order to apply a forward–reverse estimator (Sect.2.1.3), the reverse representation of the forward modelneeds to be derived. Since the presented BOD model isalready given in the form prescribed by (1), straight-forward substitution yields,

aðt;X Þ ¼�kbX ð1Þ þ s1

�kcX ð1Þ � k2X ð2Þ � knX ð3Þ þ s�2�knX ð3Þ þ s3

2

4

3

5;

and

rðt;X Þ ¼rB 0 0

0 rO 0

0 0 rN

2

4

3

5;

b ¼ rrT ¼r2

B 0 0

0 r2O 0

0 0 r2N

2

4

3

5:

Next, using Eqs. (5) and (6), the expressions for a andc can be derived to be as follows:

Table 2 Description andtypical values for theparameters used in the BODmodel

Par. Description Unit Value

kc Reaction rate coefficient (l/day) 0.763k2 Reaeration rate coefficient (l/day) 4.250k3 Sedimentation and adsorption loss rate for CBOD (l/day) 0.254kn Decay rate of NBOD (l/day) 0.978po Photosynthesis of oxygen (l/day) 7.280ro Respiration of oxygen (l/day) 7.750ds Saturation concentration of oxygen (l/day) 10.00s1 Independent source for CBOD (l/day) 3.000s2 Independent source for OD (l/day) 0.000s3 Independent source for NBOD (l/day) 0.000

Fig. 1 Realization paths of a the forward process X,and b the Xand Y process used in the forward–reverse approach. The asterisksymbols indicate the end-points of the realizations that are used inthe Monte Carlo estimator

130

aðt;X Þ ¼kbX ð1Þ � s1

kcX ð1Þ þ k2X ð2Þ þ knX ð3Þ � s�2knX ð3Þ � s3

2

4

3

5;

c ¼ kb þ k2 þ kn:

with X(1)=B, X(2)=O, and X(3)=N. For the reverse rprocess we have ~r ¼ r: Now, with all variable terms inEq. (4) known, the reverse process becomes

dY ¼kbX ð1Þ � s1

kcX ð1Þ þ k2X ð2Þ þ knX ð3Þ � s�2knX ð3Þ � s3

2

4

3

5ds

þrB 0 00 rO 00 0 rN

2

4

3

5d ~W ðsÞ; ð14Þ

with initial condition

Y ðt0Þ ¼ y; ð15Þ

and

dY ¼ ðkb þ k2 þ knÞYdt; ð16Þ

with

Yðt0Þ ¼ 1: ð17Þ

With all constants in the reverse model positive, both theY and Y processes are unstable, and grow exponentially.This relation is true in general; processes that are stable inthe forward formulation will be unstable in the reverseformulation and vice versa. Just as for the forward for-mulation of the given BOD model, the reverse formula-tion is a linear system of equations, and the closed-formanalytical solution is easily derived. How exactly the for-ward–reverse estimator is applied to obtain the probabil-ity density given earlier, is explained in the next section.

4 Application of the forward–reverse estimator

One way to carry out the forward–reverse estimation aspresented in Sect. 2.1.3 is by means of a Monte Carlosimulation. Here, a number of realizations for both theforward and reverse models are generated (e.g. for theCBOD model this is done by numerically integrating Eq.(13) for the forward part and Eq. (14) and (16) for thereverse part). Based on the forward realizations, kernelestimation techniques are used to evaluate the transitiondensity from the starting point of the simulation to theend point of each of the realizations of the reverse part,as illustrated by Fig. 1. A number of choices have to bemade for the implementation of the algorithm. Thesewill be discussed below.

4.1 Numerical integration

It is important to note that in the context of forward–reverse estimation, two sources of errors other than

the kernel estimation exist. The first one is due to theuse of a numerical scheme to approximate the SDEand the other originates from the use of Monte Carlosimulation. Numerous integration schemes can be usedto generate numerical solutions of a stochastic differ-ential equation. For our experiments, we used theEuler scheme, which was given by (2). The error of thenumerical scheme can be reduced by using higherorder schemes. The statistical error of the MonteCarlo simulation can be reduced by increasing thenumber of paths generated. Obviously, there should bea balance between these two sources of error. Reduc-ing only one of the two will not always contribute to areduction of the global error. As such, there shouldalso be a balance between the stepsize used in thenumerical scheme and the number of paths generated.This is discussed in more detail in (Schoenmakers andHeemink 1997).

4.2 Kernel estimation

The two most important aspects of kernel estimationare the kernel shape used, and the bandwidthparameter. These aspects are in close connection withthe computational effort required to carry out a kernelestimation. This is of minor importance for singlepoint evaluations, as used in forward density estima-tion, where a kernel estimation based on N realiza-tions can easily be implemented with OðNÞ timecomplexity. However, it is of major importance incases where the approximated density function has tobe evaluated at a large number of points. This is thecase for forward–reverse density estimation. Naivealgorithms easily lead to an OðN �MÞ computationaltime, which is clearly prohibitively expensive. Green-gard and Strain [4] developed an optimal algorithmfor Gaussian kernels, based on the properties ofGaussian curves, that leads to an OðN þMÞ timecomplexity. One known drawback of their method,however, is the large constant multiplication factor.Unless the number of sample and evaluation points isvery large, this constant will cause the computationaltime to be exceedingly long. We therefore decided toaim for a more modest OðM logNÞ algorithm, withlower overhead, instead.

By using a parabolic kernel, the domain of influenceof a realization is bounded, as opposed to the Gaussiankernel which has an unbounded support [3]. Because ofthis bounded support, most of the NÆ M kernel evalua-tions will be zero, which means that those realizationswill not contribute to the probability density estimationat the given location. Based on this observation and thefact that our parabolic kernel only has a support radiusof, say, /, it is possible to group the realizations in sucha way that most of the zero kernel evaluations can beavoided, thus saving a lot of work. Given a set of Nforward realizations, xi, in a d-dimensional space, with1 £ i £ N, we proceed as follows.

131

1. Find the bounding box enclosing all realizations;2. Create a regular grid such that the length of each side

equals the kernel support, /;3. Record which realizations are contained in which

grid cell;

Once this information is known, the kernel estimationof a point at location y can be obtained by following theprocedure outlined below

1. Determine the cell containing y. Note that this celldoes not necessarily have to be an existing cell, nordoes it need to be inside the bounding box;

2. Find all cells that share at least one corner point withthe aforementioned cell. This way, a total of 3d cellsare selected, including the cell containing y.

3. Retrieve all samples stored in the selected cells anduse them to get a kernel estimation.

The distance from y to a particle contained in anyother cell is guaranteed to exceed the support range, /,because of the choice of cell size. Therefore, no infor-mation is lost, and the resulting kernel estimation is ex-actly the same as the one obtained using a ‘brute-force’approach. In case of the FRE, the above three steps aresimply repeated for each of the reverse realizations.

4.2.1 Data structure

Some remarks regarding the part of the algorithm wherethe data structure is set up are appropriate. This datastructure stores the cells and the realizations contained inthem. Once constructed, the data structure can be used tosupport fast evaluation of the kernel estimation based onthese realizations. It is important to realize that onlythose cells containing realizations should be stored,which means a maximum of N cells. Especially when thedistributions of realizations is wide, compared to thekernel support, and the dimension of the problem is high,the potential number of cells that would otherwise bestored is enormous. To allow for both the selective stor-age and fast lookup of cells, a hashtable construction wasused. Each entry in the table contains cells sharing thesame hash value. The hash value is based on the coordi-nates of the cell which are carefully combined to allow foran even distribution of cells over table entries. Becausethe number of cells stored in each table entry is not knowna priori and differs between entries, each entry merelycontains a pointer to a linked list of cells. For similarreasons, each cell only contains a pointer to a linked list ofthe realizations it encloses, as shown in Fig. 2.

Retrieval of cells from the data structure proceeds ina series of steps. First, the hash value based on the cell’scoordinates is computed. This value is then used to indexthe hashtable and access the list of all cells storedbeginning at that link. Next, the list is traversed and thecoordinates of each cell compared to the given coordi-nates, until either a match was found, or the end of thelist reached. The realizations stored by the given cell canbe retrieved likewise.

4.2.2 Computational efficiency

Although the algorithm described can be shown to havea worst-case OðN �MÞ time complexity, it performed wellin our experiments, showing the desired OðM logNÞbehavior in most cases. In the cases where this timecomplexity was not achieved, the support size to reali-zation distribution domain size ratio was found to behigh. That is, either the support of the kernel was toolarge or the realizations were highly concentrated in asmall region. Ultimately, it comes down to the problemthat all realizations are contained in a small number ofadjacent cells. When a kernel evaluation is made innearby regions, most, if not all, realizations are includedin the kernel evaluation leading to worst-case times. Inthe forward–reverse estimator, the optimal bandwidthused for kernel estimation depends on the number offorward realizations:

dN ¼ N�1=d log1=d N : ð18Þ

When the number of realizations is increased, thebandwidth and cellsize decrease and, as a result, thenumber of kernel evaluations made is reduced. Even forour three dimensional BOD model, OðM logNÞ kernelestimations were made with N well below 10,000.

4.3 Combination point t* and number of tracks

The remaining parameters to be discussed are the num-ber of realizations generated and the location, t*, wherethese realizations are combined using a kernel estima-tion. The effect the number of realizations have on theforward–reverse estimator is well known, at least, interms of orders of convergence. Given a d-dimensionalproblem, the overall estimator accuracy is of orderOðN�1=2Þ for d £ 4 and of order OðN�4=ð4þdÞÞ; otherwise,with N=M the number of forward and reverse realiza-tions, respectively. Until now, the influence t* has on theestimator has not been thoroughly studied. In this sec-tion, we will show that in practice, this parameter is ofutmost importance to the efficiency of the estimator. Theexperimental results given below are all based on theapplication of the forward–reverse estimator on the BODmodel. Although many experiments were done to vali-date results, we only selected a single set of parameters toillustrate our results. The parameter settings are sum-marized in Table 3. The parabolic kernel was used forkernel evaluations, with a bandwidth of (N/log N)�1/3,which is equivalent to a support of

ffiffiffiffiffiffiffiffi

7=3p

ðN= logNÞ�1=3:

4.3.1 The influence of t*

In our first experiment, we determined the influence of t*on the estimator. The continuous parameter space of t*was first discretized, and only values of t* between 0.01and 3.99, inclusive, at a regular interval of 0.01 wereconsidered. For each of those values, 100 independent

132

density estimations were done, to determine the meanand variance of the estimator. To avoid errors due to theuse of a numerical integration scheme, the exact solutionwas sampled to provide realizations of the forward andreverse processes. Not only does this reduce the noise inthe estimator, thus showing the influence of t* moreclearly, it also greatly reduces the time requirement forthe 39,900 estimations that were made. The results ob-tained for these runs, with N=M=1,000, are summa-rized in Fig. 3. For small t*, the results show that manyestimations are not very accurate, compared to the exactanswer of approximately 0.13. A closer look at the resultsfor t*2[2.7, 4), given by Fig. 4, however, shows that theFRE is indeed capable of giving accurate results for arange of t* values. The question remains what exactlycauses the highly inaccurate answers for small t*? Themost obvious answer is to say thatN andM are too small,and indeed, increasing them will result in more accurateestimations. In fact, all other reasons given below can bereduced to the lack of realizations in either direction.Nevertheless, they give valuable insight in the applicationof the FRE and therefore deserve mentioning.

Given the limited number of realizations used, onereason behind the erratic outcome of the FRE is thekernel estimation. During each density estimation thenumber of reverse realizations that fell within the kernelsupport was counted. The support domain was taken tobe the smallest box enclosing all forward realizations,enlarged at both sides in each dimension by the band-

width of the kernel. This way, only those reverse real-izations that are within the support can eventuallyattribute to the probability density. In Fig. 5 the averageand range of these counts are plotted against t*. Forvalues of t* less than 2.5, the number of reverse real-izations is zero in all but a few cases, causing the sum-mation of all kernel estimations to evaluate to zero aswell. In those cases where some kernel estimations weredone, the density estimation is often exceedingly large.This is caused by a combination of factors. First, theprocess is unstable (see Fig. 14 for the covariance traceof the reverse process) which leads to an extremely flatGaussian distribution. Combined with the small samplesize, a realization that does end up inside the supportdomain can do so virtually anywhere. In addition, thealready inaccurate estimation will then be inflated by thescaling factor Y; which grows exponential in reversetime for the BOD model, resulting in a large overesti-mation of the probability density in the worst case. Forthe above problem, two solutions exist; (a) use more

Fig. 2 Data structure used forstoring realizations on which akernel evaluation is based

Table 3 Default parameter settings for the BOD model

Parameter Value

Start time (t) t=0End time (T) T=4Model parameters rB=1.5, rO=1.5, rN=1.0,

see further Table 2Initial point Bt=15, Ot=8.5, Nt=5Default evaluation point BT=3, OT=9, NT=0.1Exact transition density 0.13477720...

Fig. 3 Range and mean of p values for different t*2(0,4), based on100 forward–reverse estimations at each of the t* locations, withN=M=1,000 and BOD parameters as given by Table 3

133

realizations to improve accuracy, and (b) avoid values oft* where the problem occurs. Both solutions will beconsidered below as we work towards a practicalapproach.

4.3.2 The influence of the number of realizations

The influence of the number of realizations on the esti-mation is perhaps best studied by looking at the variancein the estimation observed for different values of N andM. Because of the results found earlier, we will hence-forth only consider t*2[2.8,4). Fig. 6 To start with, theratio between variance obtained using N=1,000 andN=10,000, while keeping M fixed at 1,000 was deter-mined and is shown in Fig. 7. For t* < 3.5 there isn’tany notable improvement with variance ratio’s around1. It is only for values of t* near T that considerableimprovements are made. The reason for the lack of

improvement is the fast rate at which the reverse processdiffuses, and the reduction of non-zero kernel evalua-tions that follows. Therefore, improving the kernelestimation accuracy will only result in marginalimprovements.

In the reverse experiment, we increase M from 1,000to 10,000 while maintaining N=1,000 and again deter-mine the variance improvement. The results are shownin Fig. 8. This time, there is no improvement in theestimate for t* near T, with a ratio of 1. Better results areobtained by moving t* towards the intitial point t wherethe ratio increases to the maximum possible value of 10.In between, the kernel estimation’s accuracy and reverseMonte Carlo accuracy are clearly not balanced, or moreprecisely, their balance was disturbed by the increase of

Fig. 4 Range and mean of p values for different t*2[2.7, 4), basedon 100 forward–reverse estimations at each of the t* locations, withN=M=1,000 and BOD parameters as given by Table 3

Fig. 6 5 cm Variance in p values for different t* 2[2.8,4), based on100 forward–reverse estimations at each of the t* locations, withN=M=10,000 and BOD parameters as given by Table 3, usingparabolic kernel estimation (solid line) or the analytical transitiondensity function (dashed line)

Fig. 5 Range and mean of number of reverse realizations that fallwithin the support of the kernel estimation at different t*2(0,4).Based on 100 forward–reverse estimations with N=M=1,000 andBOD parameters as given by Table 3

Fig. 7 The effect of increasing the number of forward realizationsN from 1,000 to 10,000, while retaining M = 1,000. Measured inthe ratio of variance in p variables for different t* 2(2.8,4], beforeand after the change. Based on 100 forward–reverse estimationswith BOD parameters as given by Table 3

134

reverse realizations. In case both M and N are simulta-neously increased to 10,000, the results are as given byFig. 9. Here, the results are good overall with varianceimprovement ratio’s of around 10.

4.3.3 The choice of the number of realizations and t*

Increasing the number of realizations adds to thecomputation time of the estimator. Therefore, wewould like to know the best combination of N, M, andt* to reach a certain level of accuracy in the estimation.Going back to the results obtained with N=M=1,000,we will explain how this can be done. In Fig. 10 thevariance in 100 estimations is plotted against t* along

with a dotted line indicating the minimum level ofvariance. Given the number of realizations, this isclearly the best result we can obtain, but at what ex-pense? Because of the sampling, we could only measurethe kernel estimation times for each run. The averageruntime in seconds for each t* is given by Fig. 11. Itcan be seen that the kernel time at the assumed optimalt* is high, compared to those for lower valued com-bination points. This raises the question whether thegiven t* is indeed optimal, or if lower variations couldbe reached using a different t* and more realizations,at the same computational time. Suppose it were pos-sible to decrease variance linear with the number offorward and reverse realizations (we assume N=Mfrom now, unless specified otherwise) infinitely, andthat such increase will affect the computation timelinearly. The fact that this is practically impossible,

Fig. 8 The effect of increasing the number of reverse realizations Mfrom 1,000 to 10,000, while retaining N = 1,000. Measured in theratio of variance in p variables for different t* 2(2.8,4], before andafter the change. Based on 100 forward–reverse estimations withBOD parameters as given by Table 3

Fig. 9 The effect of simultaneously increasing the number offorward and reverse realizations from 1,000 to 10,000. Measured inthe ratio of variance in p variables for different t* 2(2.8,4], beforeand after the change. Based on 100 forward–reverse estimationswith BOD parameters as given by Table 3

Fig. 10 Variance in p values for different t* 2[2.8,4), based on 100forward–reverse estimations at each of the t* locations, withN=M=10,000, parabolic kernel estimation, and BOD parametersas given by Table 3

Fig. 11 Average kernel estimation times recorded for different t*2[2.8,4). Based on 100 forward–reverse estimations withN=M=10,000 and BOD parameters as given by Table 3

135

with the kernel evaluation order and the need to de-crease the numerical integration stepsize, will onlystrengthen our reasoning. Then, based on the minimumvariance seen in Fig. 10, it is possible to determine thefactor by which the number of realizations be multi-plied to achieve the same level of variance, as shown inFig. 12. By multiplying the kernel times with this fac-tor, we obtain optimistic computation time for kernelevaluation to get iso-variance estimations Fig. 13. Thelocation where this time is minimum, say t*Low, isshifted towards t slightly from the location of theoriginal minimum variance, t*High. Under theassumption that evaluation of the numerical integra-tion scheme for the forward and reverse is equally

expensive in terms of computational cost, the realoptimum, t*Opt, is limited as follows:

t�Low\t�Opt � t�High: ð19Þ

The lower bound will shift further towards t�High whennumerical integration is included and the kernel evalu-ation times are determined more realistically. In case thereverse process is much more expensive per timestepthan the forward process, the lower bound may shift alittle towards t.

Normally, it is not practically feasible to use theabove approach to determine a near-optimal t*, becauseof the computational effort required, at least, not in asimilar level of detail. As suggested earlier the processcan be done roughly but that will result in an crudeselection of t*. It would be better if, somehow, we coulddetermine this location directly from the two processes.For the BOD process, and other Gaussian processes ingeneral, it was noted that the difference in variance (forhigher dimensional systems, the covariance matrix’trace) between the forward and reverse processes wasreflected in the variance of the estimate, as can be seenby comparing Figs. 14 and 10. This would lead to asimple and elegant solution to the problem of finding agood value for t*. Simply generate a small set of real-izations and repeatedly increase them by use of thenumerical integration scheme. While doing so, maintainthe sum of the variance of each component of the real-izations, for each timestep. Next, find t* where theabsolute difference between the obtained forward andreverse variances is at its minimum and use this locationas the combination point for the full-run. The resultsfrom experiments based on a simple diffusion algorithm(see [10] for further information) where the analyticalvariance is known, were very promising as the valuesfound for t* proved to be good indications of the opti-mal location.

Fig. 14 Trace of the covariance matrices of the forward and reverseprocesses, plotted against time. Time t=4 is the end time of thesimulation and hence the start time of the reverse process

Fig. 12 Factor by which number of forward and reverse tracksshould at least be multiplied in order to obtain a density estimationwith a variance equal to the one indicated by the dotted line inFigure 10

Fig. 13 Idealistic time required to complete kernel estimation incase the number of realizations is chosen such that the variance indensity estimation is the same for all t* 2[2.8 4)

136

5 Performance evaluation of FRE versus FE

In this section, the FRE and FE are compared to eachother both in terms of order of convergence of the errorin the estimation, and in terms of accuracy to computingtime.

5.1 The order of convergence

The order of convergence of the accuracy of theestimated probability density can be approximated byrunning a number of estimations for each of a set ofvalues for N. The standard deviation and bias of theestimation are then determined for each N, and the re-sults plotted on a log-log scaled figure. From this figure,the order of convergence corresponds directly to theslope of the plotted line and can hence be easily deter-mined. We did a large number of experiments with dif-ferent choices for t* and location of the evaluation pointy, in p(t,x,T,y). Because of the similarity of the results, weonly show the results for the BOD model with parame-ters as given by Table 3. For the FRE, we usedN=M=2i, with i=5,...,15, and t*=3.803800, while forthe FE, n=2j, with j=5,...,18 was used. In both estima-tors, the transition density was estimated 10,000 times toget accurate values for the standard deviation and bias.In addition, the numerical integration was replaced bysampling the known distributions. Figs. 15 and 16respectively show the standard deviation and squaredbias in the forward and forward–reverse estimations,along with dotted lines indicating the theoretical ordersof convergence, plotted to the number of realizations.

The results match the theoretical results very closely,with the exception of the squared bias for the forward–reverse estimator which shows a convergence ofOðN�1:1Þ, faster than the OðN�1Þ plotted. It isknown(Milstein et al.2002) that the squared bias forproblems with d < 4, is o(N�1). Therefore, the results

still match the theory, although the convergence ratefound may be overestimated because of the relativelysmall number of realizations used.

5.2 Computational costs

For a realistic comparison between the two estimators,the numerical integration scheme has to be taken intoaccount since it accounts for a major part of the totalcomputational cost. For example, the run-time of theforward estimation can be expressed as follows

runtimeFE ¼ sintegration stepN � ðT � tÞ

Dþ skernel eval �N ð20Þ

where the constants sintegration step and skernel eval representthe execution time of a numerical integration step and akernel evaluation respectively, and D is the timestep usedin the integration scheme. The constants will differslightly for each implementation and computer platform,but are easily measured. The value for sintegration step wasfound to be several times larger than skernel eval, mostlybecause of the cost of generating random numbers. As aresult, even at a moderate timespan (T�t) and large D,most of the computation time will be spent on generatingrealizations.

The run-time for the forward–reverse estimator ismore difficult to express because it is not linear in thenumber of realizations used. Especially the kernel esti-mation time is hard to predict because it depends on theforward and reverse distributions at the combinationpoint, which in turn depend on the x and y in the densityfunction. Despite this, we still want to compare the FEand FRE in terms of computation time. This can be doneby recording the simulation time of a forward–reverseestimator. This time can then be substituted in (20) todetermine the matching number of realizations in the FEsimulation. After running an equivalent FE simulation,the results of the two estimators can be compared.

Fig. 16 Convergence of squared bias in the estimated transitiondensity

Fig. 15 Convergence of the standard deviation in the estimatedtransition density

137

The above approach was applied to the BOD modelwith parameters as given by Table 3. For numericalintegration, the Euler scheme as given by (2) was usedwith stepsize, D, of 0.1, 0.01 and 0.001. The number ofrealizations used in the FRE was N=M=2i, withi=7,...,18. The value of t* was related to the stepsize,resulting in value of 3.8, 3.80, and 3.804, respectively. Asan example, Table 4 gives the equivalent number ofrealizations to be used in the FE, in case i=18, or sim-ilarly, N=M=262, 144;

The results of the experiments are given by Figs. 17and 18. Curves with smaller timesteps will have largerassociated run-times and will hence start more to theright in the plot. Both the standard deviation andsquared bias of the FRE are consistently smaller thanthose of the FE, while obtained at a similar computa-tional effort. Therefore we can conclude that the FRE isindeed superior to FE, even for small numbers of real-izations.

6 Conclusions

We have applied the recently introduced forward–re-verse (Milstein et al.2002) to a stochastic environmentalBOD model. After the derivation of the forward–reverserepresentation of the model, we investigated a number ofissues that required special attention in order to makeapplication of the forward–reverse method efficient in apractical sense. We argued that the calculation of the

kernel estimator must be implemented with care andpresented an OðM logNÞ scheme. In Sect. 4.3, the rela-tion between combination point t* and the computa-tional efficiency has been analyzed. We presented anefficient practical approach for determining the appro-priate value for t*. It was then shown that the forward–reverse estimator indeed achieves an OðN�1=2Þ accuracywhen applied to the BOD model, compared to anOðN�2=7Þ accuracy for the forward estimator (seeTable 1). The comparison of the estimation error to therun-time clearly shows the advantage of using the for-ward–reverse estimator. Especially for problems where ahigh accuracy is required, the FRE will be substantiallyfaster than the FE, as seen in Fig. 17. These results areencouraging and show the practical use of the forward–reverse approach for density estimation in environmen-tal risk analyses.

Acknowledgements The authors would like to thank G.N. Milsteinfor helpful discussions and comments.

References

Bally V, Talay D (1996a) The law of the Eulerscheme for stochasticdifferential equations I: convergence rate of the density. MonteCarlo Methods Appl 2:93–128

Bally V,Talay D (1996b) The law of the Euler scheme for stochasticdifferential equations I: convergence rate of the distributionfunction. Probab Theory Relat Fields 104:43–60

Devroye L, Gyorfi L (1985) Nonparametric density estimation: theL1. View Wiley

Devroye L (1986) Non-uniform random number generation.Springer, Berlin Heidelberg New York

Greengrad L, Strain J (1991) The fast Gauss transform. SIAM J sicstat Comp 12:79–94

Hammersley JM, Handscomb DC (1964) Monte Carlo methods.Wiley, London

Jazwinsky AW (1970) Stochastic processes and filtering theory.Academic, New York

Kloeden PE, Platen E (1992) Numerical solution of stochasticdifferential equations. Springer, Berlin Heidelberg New York

Table 4 Number of realizations that can be used in the FE toobtain a run time similar to the FRE, when using N = M =262,144, and a stepsize as indicated

Stepsize 0.1 0.01 0.001

N 574,950 293,389 266,052

Fig. 17 Standard deviation in transition density estimation for theforward and forward–reverse estimators, plotted against simulationtime. From left to right D=1/10, 1/100 and 1/1000

Fig. 18 5 cm Squared bias in transition density estimation for theforward and forward–reverse estimators, plotted against simulationtime. From left to right D=1/10, 1/100 and 1/1000

138

Loucks DP, Stedinger JR, Haith DA (1981) Water resources sys-tems planning and analysis. Prentice-Hall, New York

Milstein GN (1995) Numerical integration of stochastic differentialequations. Kluwer, Dordrecht

Milstein GN, Schoenmakers JGM (2002)Monte Carlo constructionof hedging strategies against multi-asset European claimsstochastics. Stochastics Rep 1–2:125–157

Milstein GN, Schoenmakers JGM, Spokoiny V (2002) Transitiondensity estimation for stochastic differential equations via for-ward reverse representations. Bernoulli (submitted)

Schoenmakers JGM, Heemink AW (1997) Fast valuation offinancial derivatives. J Comput Finance 1(1):47–67

Schoenmakers JGM, Heemink AW, Ponnambalam K, KloedenPE (2002) Variance reduction for Monte Carlo simulationof stochastic environmental models. Appl Math Model 26:785–795

Silverman BW (1986) Density estimation for statistics and dataanalysis. Chapman and Hall Ltd, New York

Stijnen JW (2002) Numerical methods for stochastic environmentalmodels Delft University of Technology. Ph.D. Thesis, DefaltUniversity of Technology

Thomson DJ (1987) Criteria for the selection of stochastic modelsof particle trajectories in turbulent flows. J Fluid Mech180:529–556

139

Probability density estimation in stochastic environmental models

Documents

Transcript of Probability density estimation in stochastic environmental models