
NONLINEAR FILTERING OF STOCHASTIC CLIMATE MODELS WITH PARTIAL INFORMATION

JIANHUI HUANG†, MICHAEL KOURITZIN‡

Abstract. We investigate nonlinear filtering problems for stochastic climate models in the presence of mechanism noise in the climate proxy data. The underlying state process is the intrinsic climate or temperature process, which is formulated as a very general Markov process taking values in a complete, separable metric space. This formulation includes, as special cases, diffusion processes driven by Brownian motion or an α-stable process and jump-diffusion processes driven by a (Poisson) random measure. All of these processes arise naturally in stochastic climatology when different geophysical characteristics are considered. The observation process is given by a climate proxy (e.g., the δ18O ratio from ice-core reproduction) corrupted by additive observation noise or ice-mass irregularities. The Duncan-Mortensen-Zakai (DMZ) equation and its robust version for the climate filtering problem are derived using the martingale problem formulation, a gauge transform and an equivalent random measure transform. We also address the model selection problem among different climate models, and derive the Bayes factor and its robust version for this purpose. For numerical computation, the particle filter and its particle MCMC method are also presented.

1. Introduction

This paper focuses on the stochastic filtering and model selection (or calibration) problems for climate models. Climate change has been studied extensively during the past few decades. Because of the intrinsic uncertainties involved, a considerable number of stochastic climate models have been proposed to explain the climate mechanism, for example the atmospheric forcing and the glacial-interglacial climate cycles of the last 800 kyr BP. On the other hand, paleoclimatic data are only available through climate proxies such as the ice-core record. It is well recognized that there are considerable mechanism irregularities in the formulation of ice-core models, including ice-sheet shifting, wind perturbation, air bubbles, etc. It is therefore more realistic to apply stochastic filtering to climate models while taking these mechanism irregularities or observation noises into account. In response, this paper aims to develop the basic method of stochastic filtering for climate models with observation noise.

We first introduce stochastic nonlinear filtering in a rather general framework. Let $(\Omega, \mathcal{F}, P)$ be a complete probability space on which all stochastic processes (e.g., the underlying climate state and its proxy, such as the ice core) are defined. Let $\mathcal{N}$ denote the collection of all null sets in $(\Omega, \mathcal{F}, P)$ and, for any stochastic process $S$, define its augmented natural filtration as

\[ \mathcal{F}^S_t \triangleq \sigma\{S_u : 0 \le u \le t\} \vee \mathcal{N}. \]

The filtering problem can be formulated through a pair of processes $(X, Y)$, where the signal or state $X$ is a latent process (the climate state, such as the temperature or the atmosphere-ocean interaction) which is always assumed to be Markov; the observation $Y$ is some functional of $X$ corrupted by noise (for example, the climate proxy with additive ice-core

Key words and phrases. Nonlinear Filtering, Duncan-Mortensen-Zakai (DMZ) Equation, Robust Filter, Stochastic Climate Models, α-Stable Temperature Process, Double-well Potential Climate Function, Ice-core Climate Proxy, Glacial-interglacial Cycles.


mechanism irregularities, see McConnell et al. 2000). The information available about $X$ is just the observation filtration $\mathcal{F}^Y_t$, and the primary goal of filtering is hence to recursively characterize the conditional distribution

\[ \pi_t(\cdot) = P[X_t \in \cdot \mid \mathcal{F}^Y_t]. \tag{1.1} \]

In this way, we obtain the stochastic filtering problem for climate models. We now briefly review the mathematical aspects of nonlinear filtering theory. Stemming from Fujisaki, Kallianpur and Kunita (1972) and Kushner (1967), it is known that under suitable regularity conditions (see Kouritzin and Long (2006) for a more general setup that relaxes the finite energy condition), the optimal filter $\pi_t$ satisfies a stochastic differential equation driven by the observation process. This equation is often called the Fujisaki-Kallianpur-Kunita (FKK) or Kushner-Stratonovich (KS) equation (for the conditional probability density process). An equivalent but much simpler equation is the Duncan-Mortensen-Zakai (DMZ) equation for the unnormalized conditional distribution $\sigma_t$, which is linked to $\pi_t$ through the Kallianpur-Striebel formula. In principle, these filtering equations yield the optimal filter in the theoretical sense, and the optimal filtering problem seems to be completely solved. However, as explained by Clark (1978), these equations are impractical to implement in real life because their sensitivity to possible modeling errors is not readily apparent. The most common modeling inaccuracy is caused by the popular use of Brownian motion as the idealized observation noise in the filtering framework. A robust representation of the conditional distribution was first introduced by Clark (1978) to discern and potentially decrease the effects of modeling errors. This representation is called the “robust filter”, and its rigorous mathematical derivation was provided in full detail in Clark and Crisan (2005). A substantial number of papers have studied the robust filter since its introduction. The papers most directly connected to our work are those of Davis (1980, 1981), where a semigroup approach to the robust filter is proposed when the signal is a Hunt process. Another important related work is Heunis (1990), where a stochastic partial differential equation approach is given when the signal is a multi-dimensional diffusion process.

The robust filter is of great importance for our climate model in the partial information setup, for the following reasons: (i) because relatively few data are available in the climate record, it is more reliable to work with the robust filter, which is less sensitive to possible interpolation errors on both the spatial and the temporal grid; (ii) the underlying climate state and its climate proxy mechanism are always subject to some formulation uncertainty, which makes the robust filter all the more necessary. To this end, our work derives the robust filter through two probability transforms: the first changes the probability law of the observation and keeps that of the signal unchanged; the second, path-dependent transform changes the law of the signal, and the robust filter is derived as a conditional expectation under this new probability law. The advantage of the new signal law is that, under it, the conditional expectation takes the form of a Feynman-Kac multiplicative functional whose operator can be easily derived. This is also important for the potential computation of complex climate models.

The rest of this paper is organized as follows. Section 2 introduces the basic filtering setup and some preliminary results for stochastic climatology. Section 3 is devoted to the robust filter via the path-dependent probability measure transform, where we impose assumption A6. In Section 4, we drop assumption A6 and derive the robust filter in a more general setup with the help of random measures. Section 5 turns to the study of the Bayes factor for climate model selection. The robust Bayes factor method is introduced in Section 6. The relation to mean-field linear-quadratic-Gaussian (LQG) control and to stochastic delayed control problems is also discussed. Section 7 discusses numerical computation in stochastic climatology through the particle filtering method.

2. Notations and Basic Setup

For any Polish space $E$, we let $M(E)$ be the set of real-valued, Borel measurable functions on $E$, $B(E)$ the Banach space of bounded measurable functions with the sup-norm $\|\cdot\|_\infty$, and $C_b(E)$ the subspace of real-valued bounded continuous functions on $E$. $D_E[0,\infty)$ denotes the space of all right-continuous functions with left limits (RCLL) from $[0,\infty)$ into $E$, endowed with the Skorohod topology; $C_E[0,\infty)$ is the subspace of continuous functions from $[0,\infty)$ into $E$ with the topology of uniform convergence. The underlying climate state $X_t$ has sample paths in $D_E[0,\infty)$, and we let $X_{t-} = \lim_{s \uparrow t} X_s$ and $\Delta X_t = X_t - X_{t-}$ for any $t > 0$. For any semimartingales $M$ and $N$, $[M,N]$ denotes their cross variation process and $\langle M,N\rangle$ their angle bracket process whenever it is well defined. For generality, we use the local martingale problem introduced by Ethier and Kurtz (1986, page 224).

Definition 2.1. For $L_t \subset M(E) \times M(E)$ with common domain $D(L)$, an $E$-valued process $X$ is said to be a solution of the $(L_t, \mu)$-local martingale problem if its initial distribution is $\mu = P \circ X_0^{-1}$ and, for all $f \in D(L)$,

\[ M^f_t = f(X_t) - f(X_0) - \int_0^t L_s f(X_s)\,ds \tag{2.1} \]

is an $\mathcal{F}^X_t$-local martingale. If its sample paths are in $D_E[0,\infty)$, we call it a $D_E[0,\infty)$-solution of the local martingale problem.

The local martingale problem setup allows us to work with more general domains than the corresponding martingale problem. For ease of notation, we also define the Lie bracket for the operator $L_t$:

Definition 2.2. For $f_1, f_2 \in D(L)$, the Lie bracket for $L_t$ is defined as

\[ [f_1, f_2]_t \triangleq L_t(f_1 f_2) - f_1 L_t f_2 - f_2 L_t f_1. \tag{2.2} \]
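As a quick illustration of Definition 2.2, suppose (an assumption made only for this example) that $L_t$ is the generator of a one-dimensional diffusion with drift $b$ and diffusion coefficient $\sigma$, acting on twice-differentiable functions. The product rule then reduces the bracket to a first-order expression:

```latex
\begin{aligned}
L_t f &= b\,f' + \tfrac12\sigma^2 f'', \\
L_t(f_1 f_2) &= b\,(f_1' f_2 + f_1 f_2') + \tfrac12\sigma^2\,(f_1'' f_2 + 2 f_1' f_2' + f_1 f_2''), \\
[f_1, f_2]_t &= L_t(f_1 f_2) - f_1 L_t f_2 - f_2 L_t f_1 = \sigma^2 f_1' f_2'.
\end{aligned}
```

In particular, $[f,f]_t = \sigma^2 (f')^2 \ge 0$ in this illustrative case.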

2.1. Filtering Setup for Stochastic Climate Models. Consider the additive white noise observation model

\[ Y_t = \int_0^t g(s, X_s)\,ds + B_t, \qquad t \ge 0, \tag{2.3} \]

where the climate state process $X$ is a $D_E[0,\infty)$-solution of the $(L_t,\mu)$-local martingale problem; the time-dependent sensor function $g = g(t,x)$ is an $\mathbb{R}^p$-valued Borel function, and we write $g_t(\cdot) = g(t,\cdot)$; the observation noise $B$ is an $\mathbb{R}^p$-valued Brownian motion independent of $X$. One concrete example of $X$ from stochastic climate modeling is the stochastic resonance model (SRM) introduced by Benzi, Sutera and Vulpiani (1981) through a Langevin equation:

\[ dX_t = \Big[-U'(X_t) + A\cos\frac{t}{T}\Big]dt + \sigma\,dW_t, \tag{2.4} \]

where the climate state $X_t$ can represent the evolution of temperature, $U(x)$ is a pseudo-climate potential, the period $T$ is related to the glaciation cycles, $W_t$ is a standard Brownian motion, and $A$, $\sigma$ are magnitude parameters. Based on the SRM and motivated by Gihman and Skorohod (1979), Heunis (1990) and Ito (2004), we characterize the state process $X$ by the stochastic differential equation

\[ dX_t = \Big[-U'(X_t) + A\cos\frac{t}{T}\Big]dt + \sigma_1\,dV_t + \sigma_2\,dL_t, \tag{2.5} \]

where $U(x)$ is the pseudo-climate potential, $T$ is the magnitude of periodicity, $A$ is the scaling parameter, and the first driving term $V_t$ satisfies

\[ dV_t = -\rho^2 V_t\,dt + \rho\sqrt{1 + V_t^2}\,dW_t, \tag{2.6} \]

where the scaling parameter $\rho$ signifies the (inter-annual) correlation time, $W_t$ is a standard Brownian motion, and the second driving term $L_t$ is an $\alpha$-stable process with stable index $\alpha \le 2$. Our formulation (2.5), (2.6) thus includes the SRM as the special case $\sigma_1 = 0$, $\sigma_2 = \sigma$, $\alpha = 2$. However, it is flexible enough to accommodate statistical features such as the heavy tails of era transitions found in Ditlevsen (1999). As for the pseudo-potential function, we apply the Fokker-Planck equation approach proposed in Fujisaki, Kallianpur and Kunita (1972), Heunis (1990) and Holley and Stroock (1981), and the benchmark model is the Stommel double-well potential

\[ U(x) = 4\Delta\Big(\frac{x^4}{4} - \frac{x^2}{2}\Big), \]

with $\Delta$ the height of the potential barrier. The observation process $Y$ is modeled by the additive white-noise equation

\[ dY_t = g(t, X_t)\,dt + dB_t, \tag{2.7} \]

where the sensor function $g(t,x)$ depends on the observation mechanism. Some geophysical sources of noise in the observation $Y$ are described as follows.

Time-scaling noise. This is due to the error embedded in the temperature reconstruction from ice cores. In principle, for an ice-core timescale based on the seasonal cycles (for example, in δ18O, sulfate, etc.), the uncertainty increases with depth (i.e., with the historical time recorded) in an ice-core observation. This uncertainty is also affected by possible volcanic eruptions. Steig et al. (2005) emphasized the need to distinguish absolute accuracy from relative accuracy. For example, in the 200-year-long U.S. ITASE ice cores from West Antarctica, they showed that while the absolute accuracy of the dating was ±2 years, the relative accuracy among several cores was < ±0.5 year, owing to the identification of several volcanic marker horizons in each of the cores. To remove the effects of such uncertainty, we can average the observed ice-core data, because the systematic timescale error is common to all measurements in the collected ice-core data.

Diffusion noise. This uncertainty is due to the migration of geochemical signals, which occurs in polar ice cores primarily in the upper 60-80 m thick firn layer; in glaciers experiencing summer melt, migration may be much faster and persist to much greater depths. For cold firn, migration below the upper 10 m is essentially by molecular diffusion. Diffusion uncertainty is important to the extent that the characteristic length of diffusion exceeds the characteristic depth resolution at which a proxy is being used. It can be formulated through vapor diffusion models, and the vapor transmission and snow accumulation rate functions can be built into our sensor function $g(t,x)$. For the geophysical mechanism of vapor diffusion and its quantitative analysis, see Cuffey and Steig (1998).

Sampling noise. This noise is due to the possible continuous snowfall on the ice sheet. Consequently, a bias may arise from snowfall occurring during a particular season or during a particular storm. This uncertainty can probably be quantified by evaluating the mean and variance of snowfall events in observational data sets and models.

Spatial noise. This type of noise arises because the chemical composition of the snowfall can vary significantly over short distances owing to local micrometeorological effects (e.g., snow dunes). In general, the degree of spatial variance is reduced when longer time averages are considered. Quantifying this uncertainty requires obtaining multiple ice cores from nearby locations. Mathematically, we can model such noise either through a spatial-temporal observation model (see Kurtz and Xiong 1999) with space-time Gaussian noise, or through spatially distributed multi-agent data collection. The former leads to the following more general index-valued observation model:

\[ Y(A,t) = \int_0^t \int_A g(u, X_s)\,\mu(du)\,ds + W(A,t). \tag{2.8} \]

Here, $W$ is a space-time Gaussian white noise field and $A$ denotes the possible spatial distribution of ice-core sheets. A concrete example is the spatially distributed ice-core data from Antarctica and Greenland. The latter suggests a correlation analysis of the time series, or data assimilation into mean-field stochastic models (e.g., the McKean-Vlasov equation). In both cases, a particle filtering representation is needed, which can be obtained from the particle representation of the stochastic partial differential equations satisfied by the derived nonlinear filters. Finally, we list the mathematical assumptions on the state and observation processes used here:

A 1. $D(L)$ is closed under multiplication.

A 2. For $f \in D(L)$ and $T > 0$,

\[ \sup_{0 \le t \le T} E|f(X_t)|^2 < \infty, \tag{2.9} \]

\[ E\int_0^T |L_t f(X_t)|^2\,dt < \infty. \tag{2.10} \]

A 3. For all $f \in D(L)$ and $t \ge 0$,

\[ E[M^f, M^f]_t < \infty. \tag{2.11} \]

A 4. For any $t \in [0,T]$, $g_t \in D(L)$.

A 5. For $T > 0$,

\[ \int_0^T |g(s,X_s)|^2\,ds < \infty \quad P\text{-a.s.}, \tag{2.12} \]

\[ E\exp\Big(\frac12\int_0^t [g,g]_s(X_s)\,ds\Big) < \infty \quad \forall t \in [0,T]. \tag{2.13} \]

A 6. The sample paths of $g(t, X_t)$ are continuous.
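To make the setup concrete, the following minimal sketch simulates the SRM state equation (2.4) with the double-well potential above, together with the observation equation (2.7), using an Euler-Maruyama scheme. The sensor function $g(t,x) = x$, the parameter values and the function name `simulate_srm` are assumptions chosen only for illustration; they do not come from the paper.

```python
import numpy as np

def simulate_srm(n_steps=2000, dt=0.01, delta=1.0, amp=0.1, period=50.0,
                 sigma=0.5, x0=1.0, seed=0):
    """Euler-Maruyama sketch of (2.4) with U(x) = 4*delta*(x**4/4 - x**2/2)
    and of the observation (2.7) with the hypothetical sensor g(t, x) = x."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    y = np.empty(n_steps + 1)
    x[0], y[0] = x0, 0.0
    for k in range(n_steps):
        t = k * dt
        # -U'(x) = -4*delta*(x**3 - x), plus the periodic forcing A*cos(t/T)
        drift = -4.0 * delta * (x[k] ** 3 - x[k]) + amp * np.cos(t / period)
        dW = rng.normal(0.0, np.sqrt(dt))   # state noise increment
        dB = rng.normal(0.0, np.sqrt(dt))   # observation noise increment, independent of dW
        x[k + 1] = x[k] + drift * dt + sigma * dW
        y[k + 1] = y[k] + x[k] * dt + dB    # dY = g(t, X) dt + dB with g(t, x) = x
    return x, y

x_path, y_path = simulate_srm()
```

The pair `(x_path, y_path)` plays the role of the latent climate state and its noisy proxy record; only `y_path` would be available to a filter.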

Remark 2.1. Following Kouritzin and Long (2006), we assume the more general condition (2.12) instead of the finite energy condition

\[ E\int_0^T |g(s,X_s)|^2\,ds < \infty. \]

This relaxation is important when we deal with stochastic climate models driven by an α-stable process with a double-well potential function, where the empirical estimate of the stable index is around $\alpha = 1.75$ and therefore the second moment of the state (climate) process does not exist (see Ditlevsen 1999).
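The heavy tails behind Remark 2.1 can be seen numerically. The sketch below draws standard symmetric α-stable variates with the Chambers-Mallows-Stuck method (a standard sampling technique, not one described in the paper) and compares the tail frequency for $\alpha = 1.75$ with the Gaussian case $\alpha = 2$. The function name, sample size and threshold are illustrative assumptions.

```python
import numpy as np

def symmetric_stable(alpha, size, rng):
    """Chambers-Mallows-Stuck sampler for a standard symmetric alpha-stable law
    (zero skewness, unit scale), for 0 < alpha <= 2."""
    u = rng.uniform(-np.pi / 2.0, np.pi / 2.0, size)   # uniform angle
    w = rng.exponential(1.0, size)                      # unit-mean exponential
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos(u - alpha * u) / w) ** ((1.0 - alpha) / alpha))

rng = np.random.default_rng(0)
gauss_like = symmetric_stable(2.0, 200_000, rng)    # alpha = 2: Gaussian with variance 2
heavy = symmetric_stable(1.75, 200_000, rng)        # alpha = 1.75: infinite variance

tail_gauss = np.mean(np.abs(gauss_like) > 6.0)      # empirical tail frequency
tail_heavy = np.mean(np.abs(heavy) > 6.0)
```

In this parametrization the case $\alpha = 2$ has variance 2, while for $\alpha = 1.75$ the sample variance does not stabilize as the sample grows, which is the practical face of the missing second moment.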


Remark 2.2. From A4,

\[ g(t,X_t) = g(0,X_0) + \int_0^t L_s g(s,X_s)\,ds + M^g_t \tag{2.14} \]

for some local martingale $M^g_t$. If A6 also holds, then

\[ M^g_t = g(t,X_t) - g(0,X_0) - \int_0^t L_s g(s,X_s)\,ds \tag{2.15} \]

is a continuous local martingale.

To facilitate the path-dependent probability transformation, we must determine a condition under which the local martingale $M^f$ is actually a martingale for $f \in D(L)$. Clearly, A3 is a sufficient condition for each $M^f$ to be a square-integrable martingale, and A3 holds if, for all $f \in D(L)$, $\langle M^{fc}, M^{fc}\rangle_t$ is integrable (with $M^{fc}$ the continuous part of $M^f$) and $\Delta f(X_t)$ is bounded.

Proposition 2.1. Assume A2. Then for $f \in D(L)$, $M^f_t$ is a locally square-integrable martingale.

Proof. Apply the Schwarz inequality directly.

From Proposition 2.1, for any $f_1, f_2 \in D(L)$, the angle bracket process $\langle M^{f_1}, M^{f_2}\rangle_t$ is well defined whenever A2 holds. In fact, we have the following important result.

Lemma 2.1. If A1 and A2 hold, then for $f_1, f_2 \in D(L)$,

\[ \langle M^{f_1}, M^{f_2}\rangle_t = \int_0^t [f_1,f_2]_s(X_s)\,ds. \tag{2.16} \]

Moreover, for $f \in D(L)$, $M^f_t$ is quasi-left-continuous.

Proof. First, for $f \in D(L)$,

\[ M^f(t) = f(X_t) - f(X_0) - \int_0^t L_s f(X_s)\,ds \]

is a local martingale with $M^f(0) \equiv 0$. Next, from A1, $f^2 \in D(L)$ and

\[ f^2(X_t) = f^2(X_0) + \int_0^t L_s f^2(X_s)\,ds + M^{f^2}(t) \]

for some RCLL local martingale $M^{f^2}(t)$. Applying Itô's formula,

\[ f^2(X_t) = f^2(X_0) + \int_0^t 2f(X_{s-})L_s f(X_s)\,ds + \int_0^t 2f(X_{s-})\,dM^f(s) + [M^f,M^f]_t. \]

Therefore,

\[ \int_0^t 2f(X_{s-})L_s f(X_s)\,ds - \int_0^t L_s f^2(X_s)\,ds + [M^f,M^f]_t = M^{f^2}(t) - \int_0^t 2f(X_{s-})\,dM^f(s). \]

However,

\[ K_t = [M^f,M^f]_t - \langle M^f,M^f\rangle_t \tag{2.17} \]

is a local martingale, so we have

\[ \int_0^t 2f(X_{s-})L_s f(X_s)\,ds - \int_0^t L_s f^2(X_s)\,ds + \langle M^f,M^f\rangle_t = M^{f^2}(t) - \int_0^t 2f(X_{s-})\,dM^f(s) - K_t. \]

The left-hand side is an RCLL, adapted, predictable finite variation process null at zero, while the right-hand side is a local martingale starting from zero. Thus, by the uniqueness of the decomposition of a special semimartingale, we have

\[ \langle M^f,M^f\rangle_t = \int_0^t \big(L_s f^2(X_s) - 2f(X_s)L_s f(X_s)\big)\,ds = \int_0^t [f,f]_s(X_s)\,ds. \tag{2.18} \]

By polarization, for $f_1, f_2 \in D(L)$,

\[ \langle M^{f_1},M^{f_2}\rangle_t = \int_0^t [f_1,f_2]_s(X_s)\,ds. \tag{2.19} \]

Moreover, for $f \in D(L)$, $\langle M^f,M^f\rangle_t$ is absolutely continuous with respect to Lebesgue measure, and thus $M^f_t$ is quasi-left-continuous.

2.2. Reference Probability Measure and Stochastic Filtering Equation. We now introduce the augmented filtration

\[ \mathcal{F}_t \triangleq \sigma\{X_s, Y_s : 0 \le s \le t\} \vee \mathcal{N}. \]

Note that

\[ \Lambda_t \triangleq \exp\Big(-\int_0^t g(s,X_s)\,dY_s + \frac12\int_0^t |g(s,X_s)|^2\,ds\Big) \tag{2.20} \]

is an $\{\mathcal{F}_t\}_{t\ge0}$-martingale by A5, and

\[ \frac{d\bar{P}}{dP} = \Lambda_T \tag{2.21} \]

defines a reference probability measure $\bar{P}$ equivalent to $P$ on $(\Omega, \mathcal{F}_T)$. As a standard result in filtering theory, we have:

Proposition 2.2. Under $\bar{P}$, $Y$ is a Brownian motion independent of $X$, and the law of $X$ under $\bar{P}$ is the same as its law under $P$.

For $f \in B(E)$, the Kallianpur-Striebel-Bayes formula (see Kallianpur 1980, page 283) links $\pi_t$ with the unnormalized filter $\sigma_t$ by

\[ \pi_t(f) = \frac{\sigma_t(f)}{\sigma_t(1)}, \tag{2.22} \]

where

\[ \sigma_t(f) \triangleq \bar{E}[f(X_t)\Lambda_t^{-1} \mid \mathcal{F}^Y_t]. \tag{2.23} \]

Furthermore, $\pi_t(f)$ and $\sigma_t(f)$ are characterized by the following FKK and DMZ equations, respectively (we refer the reader to Fujisaki, Kallianpur and Kunita (1972) and Zakai (1969) for the details).

Theorem 2.1. (FKK equation) Suppose A1-A4 hold. If $f \in D(L)$, then $\pi_t(f)$ satisfies the stochastic differential equation

\[ \pi_t(f) = \pi_0(f) + \int_0^t \pi_s(L_s f)\,ds + \int_0^t \big[\pi_s(g_s f) - \pi_s(f)\pi_s(g_s)\big]\big(dY_s - \pi_s(g_s)\,ds\big). \tag{2.24} \]

Theorem 2.2. (DMZ equation) Under the conditions of Theorem 2.1, for $f \in D(L)$, the unnormalized filter $\sigma_t(f)$ satisfies

\[ \sigma_t(f) = \sigma_0(f) + \int_0^t \sigma_s(L_s f)\,ds + \int_0^t \sigma_s(g_s f)\,dY_s, \qquad \sigma_0(f) = E(f(X_0)). \tag{2.25} \]
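The Kallianpur-Striebel formula (2.22)-(2.23) suggests a simple Monte Carlo approximation: simulate independent copies of the signal, weight each copy by a discretized $\Lambda_t^{-1}$, and normalize. The sketch below does this for $f(x) = x$ in the double-well model. The sensor $g(x) = x$, the initial law, the parameter values, the function name `particle_filter_mean` and the resampling step (a standard device to limit weight degeneracy, not taken from this section) are all illustrative assumptions.

```python
import numpy as np

def particle_filter_mean(dy, dt, n_particles=500, delta=1.0, sigma=0.5, seed=0):
    """Weighted-particle sketch of pi_t(f) = sigma_t(f) / sigma_t(1) for f(x) = x.

    Particles follow the signal law (Euler-Maruyama for the double-well drift);
    each log-weight accumulates g(x) dY - 0.5 * g(x)**2 dt with the
    hypothetical sensor g(x) = x, i.e. a discretized log of Lambda^{-1}.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_particles)        # assumed initial law mu
    logw = np.zeros(n_particles)
    means = np.empty(len(dy))
    for k, dyk in enumerate(dy):
        logw += x * dyk - 0.5 * x ** 2 * dt       # weight update with the current positions
        drift = -4.0 * delta * (x ** 3 - x)       # -U'(x) for the double-well potential
        x = x + drift * dt + sigma * np.sqrt(dt) * rng.normal(size=n_particles)
        w = np.exp(logw - logw.max())             # subtract the max for numerical stability
        w /= w.sum()                              # normalization plays the role of sigma_t(1)
        means[k] = np.sum(w * x)                  # estimate of pi_t(f)
        if 1.0 / np.sum(w ** 2) < 0.5 * n_particles:
            # resample when the effective sample size falls below half the particles
            idx = rng.choice(n_particles, size=n_particles, p=w)
            x, logw = x[idx], np.zeros(n_particles)
    return means
```

A call such as `particle_filter_mean(np.diff(y_path), dt)` would take the increments of an observed path as input. The weights are driven directly by the increments `dy`, which is exactly the dependence on the noise model that the robust filter of the next section is designed to avoid.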


Unfortunately, the representations of the optimal filter through the FKK or DMZ equations are not robust to modeling errors in the observation equation, and they involve stochastic integrals that must be evaluated. However, when $X$ takes values in a locally compact space, as for a finite-dimensional diffusion process, the Zakai equation above can be transformed into an equivalent but non-stochastic partial differential equation parameterized by the observation sample path, which implies robustness. As mentioned at the beginning, Clark (1978) introduced the notion of “robust filter” to create versions of the optimal filter that depend continuously on the underlying observation path, and showed that such a version, if it exists, is unique under mild conditions. Pardoux (1979) and Heunis (1990) obtained the same result using the Feynman-Kac formula and a path-dependent measure change. Davis (1980, 1981) generalized these results to the case where the signal is a Hunt process in a locally compact Hausdorff space. Empirical results have demonstrated that the robust filter does indeed perform favorably when applied to real data problems in which the Brownian observation noise assumption is unrealistic, in particular when we study stochastic climate data such as ice-core reconstructions.

3. Robust Filter via Path-dependent Probability Transform

In the remainder of this paper, $y = \{y_t, t \ge 0\}$ denotes an arbitrary but fixed observation trajectory; in other words, $y_t = Y(t,\omega)$ for all $t \ge 0$ and some $\omega \in \Omega$.

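Definition 3.1. For $f \in B(E)$, define the gauge transform of $X$ as

\[ \nu_t(f) \triangleq \bar{E}[f(X_t)\Lambda_t^{-1}\exp(-Y_t g_t(X_t)) \mid \mathcal{F}^Y_t], \tag{3.1} \]

which is equivalent to $\sigma_t$ in the following sense:

\[ \nu_t(f) = \sigma_t(f\exp(-Y_t g_t)) \quad\text{and}\quad \sigma_t(f) = \nu_t(f\exp(Y_t g_t)). \tag{3.2} \]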

The main result of this section characterizes the robust filter $\nu_t(f)$ recursively under assumption A6:

Theorem 3.1. Assume A1-A6 hold. Then $\nu_t(f)$ satisfies the evolution equation

\[ \frac{d}{dt}\nu_t(f) = \nu_t(L^y_t f) + \nu_t\Big(f\Big\{-y_t L_t g_t - \frac12|g_t|^2 + \frac{y_t^2}{2}[g,g]_t\Big\}\Big) \tag{3.3} \]

for all $f \in D(L^y) = D(L)$, where $L^y_t f = L_t f - y_t[g,f]_t$.

Its proof relies on a path-dependent probability transform, and we first present some preliminary results.

Proposition 3.1. Let $M$ and $N$ be two semimartingales, where $N$ is continuous and $d[N]_t(\omega)$ is absolutely continuous with respect to Lebesgue measure a.s. Then

\[ \int_0^t M_s\,dN_s = N_t M_t - \int_0^t N_s\,dM_s - [M,N]_t. \tag{3.4} \]

Proof. By integration by parts with continuous $N$,

\[ N_t M_t = \int_0^t M_{s-}\,dN_s + \int_0^t N_s\,dM_s + [M,N]_t. \tag{3.5} \]

Denote by $\lambda^N_t$ the Radon-Nikodym derivative $d[N]_t/dt$. On $([0,\infty)\times\Omega, \mathcal{B}([0,\infty))\otimes\mathcal{F})$, we introduce the Doléans measure $\nu_N$ as

\[ \nu_N(A) \triangleq E\int_0^\infty 1_A(t,\omega)\,d[N]_t = E\int_0^\infty 1_A(t,\omega)\lambda^N_t\,dt. \]

Moreover, because the predictable process $M_{t-}$ has at most countably many discontinuity points, it is equivalent to $M_t$ under $\nu_N$ and

\[ \int_0^t M_{s-}\,dN_s = \int_0^t M_s\,dN_s \quad\text{a.e.} \]

Hence the result.

Proposition 3.2. Suppose $X$ and $Y$ are respectively the signal and observation processes introduced in (2.1) and (2.3). Then

\[ \int_0^t g(s,X_s)\,dY_s = Y_t g(t,X_t) - \int_0^t Y_s\,dg(s,X_s). \tag{3.6} \]

Proof. By the $(L_t,\mu)$-local martingale problem and A4,

\[ g(t,X_t) = g(0,X_0) + \int_0^t L_s g(s,X_s)\,ds + M^g_t \]

and $Y_t = \int_0^t g(s,X_s)\,ds + B_t$ are both special semimartingales, and $Y_t$ is continuous with $[Y,Y]_t = \langle Y,Y\rangle_t = t$. Then, from Proposition 3.1, we have

\[ \int_0^t g(s,X_s)\,dY_s = Y_t g(t,X_t) - \int_0^t Y_s\,dg(s,X_s) - [g(X),Y]_t. \tag{3.7} \]

It remains to show that

\[ [g(X),Y]_t = [M^g,B]_t = \langle M^g,B\rangle_t = 0. \tag{3.8} \]

For any two semimartingales $M$, $N$, both $[M,N]$ and $\langle M,N\rangle$ are preserved (in the sense of modifications) under equivalent measures, so we focus on $P$, under which $M^g$ and $B$ are independent. The first identity is obvious, and the second follows from the fact that $M^g$ and $B$ are continuous. Moreover, $[M^g,B]_t = \langle M^g,B\rangle_t$ is also a continuous process, so its versions under $P$ and $\bar{P}$ are indistinguishable. To show $[M^g,B]_t = 0$, it is sufficient to show that $(M^g B)_t$ is a local martingale under $P$. By localization, we need only treat the martingale case, that is, show that $(M^g B)_t$ is a martingale under $P$ provided $M^g$ and $B$ are both $P$-martingales. Following Fujisaki, Kallianpur and Kunita (1972),

\[ E\big((M^g B)_t - (M^g B)_s \mid \mathcal{F}_s\big) = E\big((M^g_t - M^g_s)(B_t - B_s) \mid \mathcal{F}_s\big) = 0, \]

where the second equality is due to the independence of $B_t - B_s$ from $\mathcal{F}_s \vee \sigma\{M^g_t - M^g_s\}$ under $P$, thus proving (3.8). Consequently,

\[ \int_0^t g(s,X_s)\,dY_s = Y_t g(t,X_t) - \int_0^t Y_s\,dg(s,X_s). \]
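The identity (3.6) has an exact discrete counterpart, summation by parts, which can be checked numerically. In the sketch below the arrays `g` and `y` are synthetic stand-ins for $g(s,X_s)$ and $Y_s$ on a time grid (an assumption made for illustration); the discrete covariation term `cross` corresponds to $[g(X),Y]_t$ in (3.7) and is small because the two driving noises are independent.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dt = 10_000, 1e-3

# Stand-in for g(s, X_s): a continuous-looking random path started at 0.5.
g = np.cumsum(np.r_[0.5, 0.1 * np.sqrt(dt) * rng.normal(size=n)])
# Observation path Y with Y_0 = 0 and increments g dt + dB, where B is independent of g.
y = np.cumsum(np.r_[0.0, g[:-1] * dt + np.sqrt(dt) * rng.normal(size=n)])

lhs = np.sum(g[:-1] * np.diff(y))                   # sum of g_k (Y_{k+1} - Y_k)
rhs = y[-1] * g[-1] - np.sum(y[1:] * np.diff(g))    # Y_n g_n - sum of Y_{k+1} (g_{k+1} - g_k)
cross = np.sum(np.diff(g) * np.diff(y))             # discrete analogue of [g(X), Y]_t
```

Here `lhs` and `rhs` agree up to floating-point rounding, which is why the stochastic integral against $dY$ can be traded for terms in which the observation path enters only through its values.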

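From (2.14), (2.16) and Proposition 3.2,

\[ \nu_t(f) = \bar{E}\Big[f(X_t)\exp\Big(-\int_0^t \Big\{Y_s L_s g(s,X_s) + \frac12|g(s,X_s)|^2\Big\}ds\Big)\exp\Big(-\int_0^t Y_s\,dM^g_s\Big)\,\Big|\,\mathcal{F}^Y_t\Big]. \tag{3.9} \]

Moreover, the independence of $X$ and $Y$ under $\bar{P}$ implies

\[ \nu_t(f) = \bar{E}[f(X_t)\cdot\Xi_t\cdot A^y_t], \tag{3.10} \]

where

\[ \Xi_t \triangleq \exp\Big(\int_0^t \Big\{-y_s L_s g(s,X_s) - \frac12|g(s,X_s)|^2 + \frac{y_s^2}{2}[g,g]_s(X_s)\Big\}ds\Big) \tag{3.11} \]

and

\[ A^y_t \triangleq \exp\Big(-\int_0^t y_s\,dM^g_s - \int_0^t \frac{y_s^2}{2}[g,g]_s(X_s)\,ds\Big). \tag{3.12} \]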

Lemma 3.1. If A4, A5 and A6 hold, then $A^y_t$ is a continuous martingale.

Proof. From A6, $A^y_t$ is a continuous process and

\[ dA^y_t = -A^y_t\,y_t\,dM^g_t, \tag{3.13} \]

which implies that $A^y_t$ is a local martingale. Meanwhile, from A5,

\[ E\exp\Big(\frac12\int_0^t [g,g]_s(X_s)\,ds\Big) < \infty \quad \forall t \in [0,T], \]

so, by the Novikov criterion, $A^y_t$ is a martingale.

After the first measure change to $\bar{P}$, the law of the signal process $X$ remains unchanged. In contrast, the second, path-dependent measure change alters the law of $X$, which is then characterized by an observation-dependent martingale problem. The gauge transform $\nu(\cdot)$ then takes the form of a Feynman-Kac multiplicative functional. We are now ready to introduce the path-dependent probability transform. For given $y(\cdot)$, introduce the path-dependent probability measure $P^y$ by

\[ \frac{dP^y}{d\bar{P}}\Big|_{\mathcal{F}_t} = A^y_t, \]

under which the gauge transform can be written as a Feynman-Kac multiplicative functional:

\[ \nu_t(f) = E^y[f(X_t)\Xi_t], \]

where

\[ \Xi_t = \exp\Big(\int_0^t \Big\{-y_s L_s g(X_s) - \frac12|g(X_s)|^2 + \frac{y_s^2}{2}[g,g]_s(X_s)\Big\}ds\Big). \tag{3.14} \]

Lemma 3.2. Assume A1-A6. Then under $P^y$, $X$ is an RCLL solution of the $L^y$ martingale problem

\[ df(X_t) = L^y_t f(X_t)\,dt + d\tilde{M}^f_t, \tag{3.15} \]

where $D(L^y) = D(L)$ and, for $f \in D(L^y)$,

\[ L^y_t f \triangleq L_t f - y_t[g,f]_t, \tag{3.16} \]

\[ d\tilde{M}^f_t \triangleq dM^f_t + y_t[g,f]_t(X_t)\,dt. \tag{3.17} \]

Moreover, the Kolmogorov forward equation of $X$ is

\[ P^y_t(f) = P^y_0(f) + \int_0^t P^y_s(L^y_s f)\,ds. \tag{3.18} \]


Proof. For $f \in D(L)$, it is clear that

\[ df(X_t) = L^y_t f(X_t)\,dt + d\tilde{M}^f_t \tag{3.19} \]

and we only need to show that $\tilde{M}^f_t$ is a martingale. By the continuity of $A^y$, we have

\[ \Delta[M^f, A^y]_t = \Delta(M^f)_t \cdot \Delta(A^y)_t = 0, \]

therefore

\[ [M^f, A^y]_t = \langle M^f, A^y\rangle_t = \int_0^t -A^y_s\,y_s\,[f,g]_s\,ds, \]

and

\[ d\tilde{M}^f_t = dM^f_t - \frac{1}{A^y_t}\,d[M^f, A^y]_t. \tag{3.20} \]

Thus, by the Girsanov-Meyer theorem, $\tilde{M}^f_t$ is a local martingale under $P^y$ and

\[ E^y[\tilde{M}^f, \tilde{M}^f]_t = \bar{E}\big([M^f,M^f]A^y\big)_t < \infty. \]

Moreover, from A2, A3 and proposition 1.3, it follows that $\tilde{M}^f_t$ is a martingale under $P^y$.

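Proof of Theorem 3.1. We have

\[ \nu_t(f) = E^y(f(X_t)\Xi_t). \]

From (3.15) and integration by parts,

\[ d(f(X_t)\Xi_t) = L^y_t f(X_t)\,\Xi_t\,dt + f(X_t)\,\Xi_t\Big\{-y_t L_t g(X_t) - \frac12|g(X_t)|^2 + \frac{y_t^2}{2}[g,g]_t(X_t)\Big\}dt + \Xi_t\,d\tilde{M}^f_t, \tag{3.21} \]

and $\int_0^t \Xi_s\,d\tilde{M}^f_s$ is a $P^y$-martingale. Taking the expectation under $P^y$, we obtain

\[ \nu_t(f) = \nu_0(f) + \int_0^t \nu_s(L^y_s f)\,ds + \int_0^t \nu_s\Big(f\Big\{-y_s L_s g - \frac12|g|^2 + \frac{y_s^2}{2}[g,g]_s\Big\}\Big)ds. \]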
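For orientation, equation (3.3) can be written out in the diffusion case. Assume, purely for illustration, a one-dimensional signal $dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t$ and a smooth, time-independent sensor function $g$ (Theorem 3.1 does not require any of these restrictions). Then $[g,f] = \sigma^2 g' f'$, and the operator and potential term become:

```latex
\begin{aligned}
L f &= b\,f' + \tfrac12\sigma^2 f'', \\
L^y_t f &= L f - y_t\,\sigma^2 g' f' = (b - y_t\,\sigma^2 g')\,f' + \tfrac12\sigma^2 f'', \\
\frac{d}{dt}\nu_t(f) &= \nu_t(L^y_t f)
  + \nu_t\Big(f\,\Big\{-y_t\big(b\,g' + \tfrac12\sigma^2 g''\big) - \tfrac12 g^2 + \tfrac12 y_t^2\,\sigma^2 (g')^2\Big\}\Big).
\end{aligned}
```

The observation path enters only through the coefficient $y_t$, with no stochastic integral, so the equation can be solved path by path for any continuous trajectory $y$.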

4. Robust Filter via Random Measure

In this section we drop condition A6; that is, we no longer assume that $M^g_t$ is continuous, only that it is an RCLL martingale, and we derive the robust filter in this case using the random measure approach. Our result includes that of Davis (1980, 1981) as a special case. An excellent account of general random measure theory can be found in Jacod and Shiryaev (1987). Here, we focus on finite integer-valued random measures, defined as follows.

Definition 4.1. A finite integer-valued random measure $\nu$ on $[0,T]\times\mathbb{R}$ is a random measure satisfying (1) $\nu(\omega, \{t\}\times\mathbb{R}) \le 1$; (2) for each $A \in \mathcal{B}(\mathbb{R})$, $\nu(\omega,[0,t]\times A) \in \mathbb{N}$; (3) $\nu$ is optional and $\tilde{\mathcal{P}}$-$\sigma$-finite. Here $\tilde{\mathcal{P}} = \mathcal{P}\otimes\mathcal{B}(\mathbb{R})$, where $\mathcal{P}$ is the predictable $\sigma$-algebra on $\Omega\times[0,T]$.

A fundamental example of an integer-valued random measure of interest to us is the following.

Definition 4.2. (Jump random measure of an RCLL process) Suppose $Z$ is an arbitrary $\mathbb{R}$-valued RCLL process. Then for any $\omega \in \Omega$,

\[ \nu^Z(\omega, dt, dz) \triangleq \sum_{0 \le s \le T} \delta_{(s, \Delta Z_s(\omega))}(dt,dz)\,1_{\{\Delta Z_s(\omega) \ne 0\}} \tag{4.1} \]

defines a finite random measure $\nu^Z(\omega,\cdot)$ on $\mathbb{R}_+\times E$, where $\delta$ denotes the Dirac measure.

This formulation is introduced here to represent possible volcanic eruptions.
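Definition 4.2 can be made concrete on a sampled path. The sketch below lists the atoms $(s, \Delta Z_s)$ of the jump measure of a piecewise-constant toy path and integrates a function against it. The helper names `jump_measure` and `integrate`, the tolerance and the toy path are assumptions introduced only for this illustration.

```python
import numpy as np

def jump_measure(times, path, tol=1e-12):
    """Atoms (s, Delta Z_s) of the jump measure of a path sampled on a grid,
    keeping only the grid points where the increment is non-zero."""
    jumps = np.diff(path)
    mask = np.abs(jumps) > tol
    return times[1:][mask], jumps[mask]

def integrate(atoms_t, atoms_z, h):
    """Integral of h(s, z) against the jump measure: the sum of h over its atoms."""
    return float(np.sum(h(atoms_t, atoms_z)))

times = np.arange(6, dtype=float)
path = np.array([0.0, 0.0, 1.5, 1.5, 1.0, 1.0])    # piecewise-constant toy path
s, z = jump_measure(times, path)
total = integrate(s, z, lambda t, dz: dz)           # equals Z_T - Z_0 for a pure-jump path
```

For a pure-jump path, integrating $h(s,z) = z$ recovers the total displacement of the path, which is the discrete analogue of representing a purely discontinuous martingale as an integral against its (compensated) jump measure.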

Definition 4.3. (Gihman and Skorohod, 1979, page 88) An orthogonal local martingale measure is a random measure $\mu = \mu(\omega,\cdot)$ defined on $\mathbb{R}_+\times E$ such that the stochastic process $\mu(\omega,[0,t]\times A_1)\cdot\mu(\omega,[0,t]\times A_1)$ is a locally square-integrable martingale and, for any $A$, the angle bracket process satisfies $\langle \mu(\omega,[0,t]\times A), \mu(\omega,[0,t]\times A)\rangle_t = \pi(\omega,[0,t]\times A)$, where $\pi(\omega,[0,t]\times A)$ is a random measure which, for fixed $A$, is a continuous, monotonically nondecreasing, integrable process. $\pi = \pi(\omega,\cdot)$ is called its compensator or characteristic.

A useful and general result is the following.

Proposition 4.1. (Gihman and Skorohod, 1979, pages 85, 88) Let $\nu(t,A)$ be an arbitrary integer-valued random measure satisfying: (1) $E\nu(t,A) < \infty$ for all $t \ge 0$ and all $A \in \mathcal{B}(\mathbb{R})$ such that $A \cap (-\varepsilon,\varepsilon) = \emptyset$ for some $\varepsilon > 0$; (2) for fixed $A$, the function $\nu(t,A)$ is monotonically nondecreasing and RCLL; (3) for an arbitrary monotonically nondecreasing sequence of stopping times $\tau_n$ such that $\lim \tau_n = \tau \le T$,

\[ \lim E\nu(\tau_n,A) = E\nu(\tau,A). \]

Then it admits a unique decomposition of the form

\[ \nu(t,A) = \mu(t,A) + \pi(t,A), \tag{4.2} \]

where $\mu$ is an orthogonal local martingale measure with compensator $\pi$, which is predictable.

Proposition 4.2. For the compensator $\pi$ in Proposition 4.1, there exist a predictable, integrable, finite variation process $A$ and a kernel $K(\omega,t,dz)$ from $(\Omega\times[0,T], \mathcal{P})$ into $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ such that

\[ \pi(\omega,dt,dz) = dA_t(\omega)\,K(\omega,t,dz). \tag{4.3} \]

A consequence of Proposition 4.2 is the following lemma concerning our $(L_t,\mu)$-local martingale problem:

Lemma 4.1. If $f \in D(L)$, then there exists a unique decomposition

\[ M^f_t = M^{fc}_t + M^{fd}_t \]

with

\[ M^{fd}_t = \int_0^t\!\int_{\mathbb{R}} z\,\mu^f(\omega,ds,dz). \]

Here, $M^{fc}$ is a continuous local martingale and $M^{fd}$ a purely discontinuous local martingale; $\mu^f$ is an orthogonal local martingale measure with compensator $\pi^f$, and

\[ \nu^f(t,A) = \mu^f(t,A) + \pi^f(t,A) \tag{4.4} \]

is the jump random measure of $M^f_t$.


For any predictable process $H = H(t,\omega)$ satisfying

\[ \int_0^t\!\int_{\mathbb{R}} H^2(s,\omega)\,\pi^f(\omega,ds,dz) < \infty, \tag{4.5} \]

the stochastic integral $\int_0^t H_s\,dM^{fd}_s$ is well defined and the following identity holds:

\[ \int_0^t H_s\,dM^{fd}_s = \int_0^t\!\int_{\mathbb{R}} z\,H(s,\omega)\,\mu^f(\omega,ds,dz). \]

A 7. For each $f \in D(L)$, there exists a kernel $K^f(\omega,t,dz)$ such that

\[ \pi^f(dt,dz) = K^f(\omega,t,dz)\,dt. \]

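Lemma 4.2. Under A1, A2 and A7, for $f_1, f_2 \in D(L)$, we have

\[ \langle M^{f_1 c}, M^{f_2 c}\rangle_t = \int_0^t \Big\{[f_1,f_2]_s(X_s) - \int_{\mathbb{R}} z^2 K^{\langle f_1,f_2\rangle}(\omega,s,dz)\Big\}ds, \tag{4.6} \]

where

\[ K^{\langle f_1,f_2\rangle} = \frac14\big(K^{f_1+f_2} - K^{f_1-f_2}\big), \]

and $K^{f_1+f_2}$, $K^{f_1-f_2}$ are defined as in A7.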

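Proof. First,

\[ f(X_t) = f(X_0) + \int_0^t L_s f(X_s)\,ds + M^{fc}(t) + \int_0^t\!\int_{\mathbb{R}} z\,\mu^f(\omega,ds,dz) \]

and

\[ f^2(X_t) = f^2(X_0) + \int_0^t L_s f^2(X_s)\,ds + M^{f^2}(t), \tag{4.7} \]

where $M^{f^2}(t)$ is an RCLL local martingale. On the other hand, from Lemma 2.1 we know that $M^f_t$ is quasi-left-continuous, so we can apply the Itô formula from Gihman and Skorohod (1979, page 105):

\[ \begin{aligned}
f^2(X_t) &= f^2(X_0) + \int_0^t 2f(X_s)L_s f(X_s)\,ds + \int_0^t 2f(X_s)\,dM^{fc}(s) + \langle M^{fc},M^{fc}\rangle_t \\
&\quad + \int_0^t\!\int_{\mathbb{R}} [z^2 + 2zf(X(s,\omega))]\,\mu^f(\omega,ds,dz) + \int_0^t\!\int_{\mathbb{R}} z^2 K^f(\omega,s,dz)\,ds \\
&= f^2(X_0) + \int_0^t 2f(X_s)L_s f(X_s)\,ds + \int_0^t\!\int_{\mathbb{R}} z^2 K^f(\omega,s,dz)\,ds + \langle M^{fc},M^{fc}\rangle_t \\
&\quad + \int_0^t\!\int_{\mathbb{R}} [z^2 + 2zf(X(s,\omega))]\,\mu^f(\omega,ds,dz) + \int_0^t 2f(X_s)\,dM^{fc}(s).
\end{aligned} \tag{4.8} \]

Since

\[ \int_0^t\!\int_{\mathbb{R}} [z^2 + 2zf(X(s,\omega))]\,\mu^f(\omega,ds,dz) \]

is a local martingale, the uniqueness of the decomposition of a special semimartingale gives

\[ \langle M^{fc},M^{fc}\rangle_t = \langle M^f,M^f\rangle_t - \int_0^t\!\int_{\mathbb{R}} z^2 K^f(\omega,s,dz)\,ds. \]

Thus we get

\[ \langle M^{fc},M^{fc}\rangle_t = \int_0^t \Big\{[f,f]_s(X_s) - \int_{\mathbb{R}} z^2 K^f(\omega,s,dz)\Big\}ds. \tag{4.9} \]

By polarization, we have

\[ \langle M^{f_1 c},M^{f_2 c}\rangle_t = \frac14\Big\{\langle M^{(f_1+f_2)c},M^{(f_1+f_2)c}\rangle_t - \langle M^{(f_1-f_2)c},M^{(f_1-f_2)c}\rangle_t\Big\}. \]

As a result,

\[ \langle M^{f_1 c},M^{f_2 c}\rangle_t = \langle M^{f_1},M^{f_2}\rangle_t - \int_0^t\!\int_{\mathbb{R}} z^2 K^{\langle f_1,f_2\rangle}(\omega,s,dz)\,ds. \]

Thus

\[ \langle M^{f_1 c},M^{f_2 c}\rangle_t = \int_0^t \Big\{[f_1,f_2]_s(X_s) - \int_{\mathbb{R}} z^2 K^{\langle f_1,f_2\rangle}(\omega,s,dz)\Big\}ds. \tag{4.10} \]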

Following Davis (1980), for $0 \le s \le t$, we introduce

\[
k^t_s \triangleq \exp\Big\{ -\int_s^t \Big( y_u L g(X_u) + \frac{1}{2} |g(X_u)|^2 \Big)\, du \Big\} \cdot \exp\Big( -\int_s^t y_u\, dM^g_u \Big), \tag{4.11}
\]

which is a multiplicative functional. However, it is not a Kac-type multiplicative functional [see Ito (2003), page 164, or Williams (1993), page 272], because it contains the stochastic integral $\int y_t\, dM^g_t$. Next, we can define the two-parameter semi-group on the Banach space $B(E)$:

\[
T^y_{s,t} f(x) = E_x\big[ f(X_t)\, k^t_s \big], \tag{4.12}
\]

where $E_x$ denotes expectation with respect to $P_x$, the probability measure starting from $x \in E$. Now, note that the law of $X$ remains unchanged and is still $\mu$, thus we have

\[
\nu_t(f) = \langle T^y_{s,t} f, \mu \rangle. \tag{4.13}
\]

Hence, to determine the dynamics of the robust filter $\nu_t(f)$, we can determine the dynamics of $T^y_{s,t}$, namely, its extended generator, which is defined in Davis (1980):

Definition 4.4. Suppose $k^t_s$ is a multiplicative functional and $T^y_{s,t}$ the corresponding two-parameter semi-group defined in (2.29). Then $(J, D(J))$ is called the extended generator of $T^y_{s,t}$ if for each $f \in D(J) \subset M(E)$, there exists $Jf \in M(E)$ such that $N^f_{s,t}$ is a local martingale, where

\[
N^f_{s,t} \triangleq k^t_s f(X_t) - f(X_s) - \int_s^t k^u_s\, Jf(X_u)\, du. \tag{4.14}
\]

Using the definition of the extended generator, we turn to the path-dependent measure change in the Polish space $E$. Then, we can employ the martingale problem technique to derive the extended generator. To achieve this, first note that

\[
\langle M^g, M^g \rangle_t = \int_0^t [g, g]_s(X_s)\, ds. \tag{4.15}
\]

A 8. For any $f \in D(L)$, there exists a kernel $K^f(x, dz)$ such that

\[
K^f(\omega, t, dz) = K^f(X(t, \omega), dz).
\]

In particular, such a kernel exists for a Hunt process, in which case it is just the Levy system $(n(\cdot, \cdot), \phi)$.


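Theorem 4.1. Suppose A1-A5 and A8 hold true. Then for any $f \in D(L)$, $\nu_t(f)$ satisfies the evolution equation

\[
\nu_t(f) = \nu_0(f) + \int_0^t \nu_s(L^y_s f)\, ds + \int_0^t \nu_s\Big( f \Big\{ -y_s L_s g_s - \frac{1}{2} |g_s|^2 + \frac{y_s^2}{2} [g, g]_s + \int_{\mathbb{R}} \big( e^{-z y_s} - 1 + z y_s + z^2 \big) K^g(s, dz) \Big\} \Big)\, ds,
\]

where

\[
L^y_t(f)(\cdot) = L_t(f)(\cdot) - y_t [g, f]_t(\cdot) + \int_{\mathbb{R}} \big( e^{-y_t z} - 1 \big) z\, K^{f,g}(\cdot, dz). \tag{4.16}
\]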

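Proof. From Lemma 4.2, we know

\[
\langle M^{gc}, M^{gc} \rangle_t = \int_0^t [g, g]_s(X_s)\, ds - \int_0^t \int_{\mathbb{R}} z^2 K^g(\omega, s, dz)\, ds.
\]

Thus, it follows that the pathwise evaluation representation takes the form

\[
\nu_t(f) = E[ f(X_t) \cdot \gamma_t \cdot \delta_t ], \tag{4.17}
\]

where

\[
\delta_t = \exp\Big( \int_0^t \frac{1}{2} y_s^2\, d\langle M^{gc}, M^{gc} \rangle_s - \int_0^t \Big( y_s L_s g_s(X_s) + \frac{1}{2} |g(s, X_s)|^2 \Big)\, ds + \int_0^t \int_{\mathbb{R}} \big( e^{-z y_s} - 1 + z y_s \big) K^g(\omega, s, dz)\, ds \Big), \tag{4.18}
\]

and

\[
\gamma_t = \exp\Big( -\int_0^t y_s\, dM^{gc}_s - \frac{1}{2} \int_0^t y_s^2\, d\langle M^{gc}, M^{gc} \rangle_s - \int_0^t \int_{\mathbb{R}} z y_s\, \mu^g(ds, dz) - \int_0^t \int_{\mathbb{R}} \big( e^{-z y_s} - 1 + z y_s \big) K^g(\omega, s, dz)\, ds \Big). \tag{4.19}
\]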

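From the generalized Ito formula (Gihman and Skorohod, 1979, page 104), it follows that

\[
\gamma_t = 1 - \int_0^t \gamma_s y_s\, dM^{gc}_s + \int_0^t \int_{\mathbb{R}} \gamma_s \big( e^{-z y_s} - 1 \big)\, \mu^g(ds, dz). \tag{4.20}
\]

So $\gamma_t$ is a local martingale and

\[
\gamma_t f(X_t) = \gamma_s f(X_s) + \int_s^t \gamma_u L_u f(X_u)\, du - \int_s^t \gamma_u y_u\, d\langle M^{fc}, M^{gc} \rangle_u + \int_s^t \int_{\mathbb{R}} \gamma_u \big( e^{-y_u z} - 1 \big) z\, K^{f,g}(u, dz)\, du + I_t,
\]

where $I_t$ is a local martingale. Therefore, the extended generator of $\nu_t(f)$ satisfies

\[
L^y_t(f)(X_t) = L_t(f)(X_t) - y_t [g, f]_t(X_t) + \int_{\mathbb{R}} \big( e^{-y_t z} - 1 \big) z\, K^{f,g}(\omega, t, dz). \tag{4.21}
\]

From A8, there exists a kernel $K^{f,g}(x, dz)$ such that

\[
L^y_t(f)(X_t) = L_t(f)(X_t) - y_t [g, f]_t(X_t) + \int_{\mathbb{R}} \big( e^{-y_t z} - 1 \big) z\, K^{f,g}(X(t, \omega), dz), \tag{4.22}
\]

thus we have

\[
L^y_t(f)(\cdot) = L_t(f)(\cdot) - y_t [g, f]_t(\cdot) + \int_{\mathbb{R}} \big( e^{-y_t z} - 1 \big) z\, K^{f,g}(\cdot, dz). \tag{4.23}
\]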


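On the other hand, $\delta_t$ is a Feynman-Kac multiplicative functional and its effect on $\nu_t(f)$ is just to add the following potential term

\[
\frac{1}{2} y_t^2\, d\langle M^{gc}, M^{gc} \rangle_t - y_t L_t g_t\, dt - \frac{1}{2} g_t^2\, dt + \int_{\mathbb{R}} \big( e^{-z y_t} - 1 + z y_t \big) K^g(\omega, t, dz)\, dt \tag{4.24}
\]

to the extended generator.

\[
\nu_t(f) \triangleq E( f(X_t) \gamma_t ). \tag{4.25}
\]

Finally, adding this potential term associated with $\Xi_t$, we can write the robust filter informally as

\[
\nu_t(f) = \nu_0(f) + \int_0^t \nu_s(L^y_s f)\, ds + \int_0^t \nu_s\Big( f \Big\{ -y_s L_s g_s - \frac{1}{2} |g_s|^2 + \frac{y_s^2}{2} [g, g]_s + \int_{\mathbb{R}} \big( e^{-z y_s} - 1 + z y_s + z^2 \big) K^g(s, dz) \Big\} \Big)\, ds.
\]

Hence the result.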

5. Bayes Factor for Climate Model Selection

Here, we develop the basic model selection structure using the Bayes factor, and discuss how to apply it to calibrate the more statistically reliable stochastic climate models. For the sake of presentation, we assume that the underlying climate state (i.e., the temperature) lives in a finite-dimensional space, that is, $X \in \mathbb{R}^{n_x}$. In this way, we do not consider the stochastic evolution equation of the climate state with delayed operator (which would live in some infinite-dimensional space). To this end, we can average the ice-core data to remove the possible time-scale inconsistency in the historical climate data. The climate state $X$ and the possible climate evolution or measurement parameter $\theta \in \mathbb{R}^{n_\theta}$ jointly satisfy the $(D(A), A)$-martingale problem, that is,

\[
M^f_t = f(X_t, \theta) - f(X_0, \theta) - \int_0^t A f(X_s, \theta)\, ds \tag{5.1}
\]

is an $\mathcal{F}^{X,\theta}_t$-martingale for $f \in D(A) \subset B(\mathbb{R}^{n_x + n_\theta})$. Note that the Bayes factor we plan to introduce will be denoted by $L$; thus here we denote the generator of the martingale problem by $A$ instead of $L$ as above. The martingale problem proposed by Stroock and Varadhan (1979) provides a general formulation of Markov processes, and many stochastic systems can be nested into this setup (see Ethier and Kurtz (1986)). In particular, we can examine two classes of processes used in stochastic climate models:

Example 1. (Diffusion Climate Process)

\[
A f(x, \theta) = \frac{1}{2} c^2(x, \theta, t) \frac{\partial^2 f}{\partial x^2}(x, \theta, t) + b(x, \theta, t) \frac{\partial f}{\partial x}(x, \theta, t),
\]

where $b$, $c$ are the drift and diffusion functions, and $D(A)$ is the set of bounded, twice continuously differentiable functions on $\mathbb{R}^{n_x + n_\theta}$. This includes the SRM climate model as a special case; in case the coefficients are non-Lipschitz, we can apply the truncation method to the quadratic-growth SDE.

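Example 2. (Markov Jump Climate Process)

\[
A f(x, \theta) = \lambda(x, \theta) \int_{\mathbb{R}^{n_x + n_\theta}} \big( f(y, \theta) - f(x, \theta) \big)\, \mu(x, \theta, dy),
\]

where $\mu(x, \theta, dy)$ is a transition function on $\mathbb{R}^{n_x + n_\theta} \times \mathcal{B}(\mathbb{R}^{n_x + n_\theta})$, $\lambda \ge 0$, and $D(A) = B(\mathbb{R}^{n_x + n_\theta})$. This type of model includes possible climate state transitions, such as the glacial-interglacial transition due to an abrupt change in the climate mechanism.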
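As an illustration of Example 2, a pure-jump path can be simulated by drawing an exponential holding time with rate $\lambda(x)$ and then a new state from the transition function. The following is a minimal sketch under stated assumptions: the parameter $\theta$ is suppressed, and the names `simulate_jump_path`, `rate` and `sample_next` are hypothetical helpers introduced here, not part of the original model description.

```python
import random

def simulate_jump_path(x0, horizon, rate, sample_next, rng):
    """Simulate a Markov jump path on [0, horizon].

    rate(x)            -> jump intensity lambda(x) >= 0 (assumed callable)
    sample_next(x, rng) -> a draw from the transition function mu(x, dy)
    Returns a list of (jump_time, state) pairs, starting with (0.0, x0).
    """
    t, x = 0.0, x0
    path = [(t, x)]
    while True:
        lam = rate(x)
        if lam <= 0.0:          # absorbing state: no further jumps
            break
        t += rng.expovariate(lam)   # exponential holding time
        if t > horizon:
            break
        x = sample_next(x, rng)
        path.append((t, x))
    return path

# Usage: a counting process that jumps by +1 at rate 2.
rng = random.Random(1)
path = simulate_jump_path(0, 5.0, lambda x: 2.0, lambda x, r: x + 1, rng)
```

With a two-valued state space and state-dependent rates, the same routine gives a crude two-regime switching path.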


Now consider the additive white noise observation model:

\[
dY_t = h(X_t, \theta)\, dt + dW_t, \tag{5.2}
\]

where the observation noise $W$ is a standard Brownian motion independent of $(X, \theta)$, and the sensor function $h = h(x, \theta)$ satisfies the relaxed finite energy condition:

\[
\int_0^T |h(X_t, \theta)|^2\, dt < \infty \quad \text{a.s.} \tag{5.3}
\]

Note that $g(t, x)$ is used for the robust Bayes factor, thus here we denote the sensor function by $h$ instead of $g(t, x)$ to avoid confusion. The partial information system (1.1), (1.2) can be used to characterize a wide range of stochastic structures in applied probability (see Kallianpur (1980), Kailath and Poor (1998) etc. in signal processing; Duffie and Lando (2001), Back (2003), Frey and Runggaldier (2007) etc. in financial engineering). The purpose of this section can be roughly expressed in the following question: suppose there are two candidate models for the partial information system,

\[
M^{(1)}: \quad M^{(1), f}_t = f(X_t, \theta) - f(X_0, \theta) - \int_0^t A^{(1)} f(X_s, \theta)\, ds, \qquad dY_t = h^{(1)}(X_t, \theta)\, dt + dW_t; \tag{5.4}
\]

\[
M^{(2)}: \quad M^{(2), f}_t = f(X_t, \theta) - f(X_0, \theta) - \int_0^t A^{(2)} f(X_s, \theta)\, ds, \qquad dY_t = h^{(2)}(X_t, \theta)\, dt + dW_t. \tag{5.5}
\]

We need to evaluate which competing model best fits the observed data $\{Y_s : 0 \le s \le t\}$. We propose a Bayesian approach to solve this problem with the help of the Bayes factor. Our approach is efficient and reliable in that it can be updated recursively by incorporating new observations. We also illustrate how to define and compute the Bayes factor between the competing models $M^{(k)} = (A^{(k)}, h^{(k)})$, $k = 1, 2$, and how it can be used for model calibration.

The key point of the Bayes factor is to transform all observation models into the same canonical process via a Girsanov measure change. Note that, for $k = 1, 2$,

\[
L^{(k)}_t \triangleq \exp\Big( \int_0^t h^{(k)}(X_s, \theta)\, dY_s - \frac{1}{2} \int_0^t |h^{(k)}(X_s, \theta)|^2\, ds \Big) \tag{5.6}
\]

is an $\mathcal{F}_t$-martingale, where $\mathcal{F}_t \triangleq \mathcal{F}^{X,\theta;\,Y}_t$. Thus

\[
\frac{dP}{dQ^{(k)}} \Big|_{\mathcal{F}_t} = L^{(k)}_t
\]

defines a probability measure $Q^{(k)}$ which is mutually absolutely continuous with respect to $P$. As a standard result of filtering theory (see Kallianpur (1980)), we have

Proposition 5.1. For $M^{(k)}$, $k = 1, 2$, $Y$ is a Brownian motion independent of $(X, \theta)$ and the law of $(X, \theta)$ remains unchanged under $Q^{(k)}$.
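On a time grid, the logarithm of the likelihood (5.6) can be approximated by left-point Ito sums. The sketch below assumes a scalar sensor function already evaluated along one simulated signal path; the function name `log_likelihood` and its argument layout are illustrative choices, not notation from the paper.

```python
def log_likelihood(h_values, y_values, dt):
    """Discretised log L_t = sum_i h_i * (Y_{i+1} - Y_i) - 0.5 * sum_i h_i^2 * dt.

    h_values : sensor values h(X_{t_i}, theta) at the left grid points
    y_values : observations Y_{t_0}, ..., Y_{t_n} (one more entry than h_values)
    dt       : grid spacing
    """
    if len(y_values) != len(h_values) + 1:
        raise ValueError("need exactly one more observation than sensor values")
    total = 0.0
    for i, h in enumerate(h_values):
        dy = y_values[i + 1] - y_values[i]   # observation increment
        total += h * dy - 0.5 * h * h * dt   # Ito sum minus energy term
    return total

# Usage: two steps of size 0.5 with h = 1 and Y increasing linearly.
value = log_likelihood([1.0, 1.0], [0.0, 0.5, 1.0], 0.5)
```

Working with the logarithm rather than $L^{(k)}_t$ itself postpones the overflow and underflow issues discussed later in this section.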

For $f \in B(\mathbb{R}^{n_x + n_\theta})$, introduce the unnormalized filter $\sigma^{(k)}$ of $(X, \theta)$ for $M^{(k)}$, $k = 1, 2$:

\[
\sigma^{(k)}(f, t) \triangleq E^{Q^{(k)}}\big( f(X_t, \theta)\, L^{(k)}_t \,\big|\, \mathcal{F}^Y_t \big). \tag{5.7}
\]

Note that $L^{(k)}_t$ is the joint likelihood of $X, \theta, Y$ in $M^{(k)}$ and

\[
\sigma^{(k)}(1, t) = E^{Q^{(k)}}\big( L^{(k)}_t \,\big|\, \mathcal{F}^Y_t \big)
\]

is the integrated or marginal likelihood of $Y$. It characterizes how likely it is that the historical observation $\{Y_s, 0 \le s \le t\}$ is generated by $M^{(k)}$, $k = 1, 2$. The Bayes factors between $M^{(1)}$ and $M^{(2)}$ are defined as the ratios of the integrated likelihoods:

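Definition 5.1. (Bayes factor)

\[
B_{12}(t) = \frac{\sigma^{(1)}(1, t)}{\sigma^{(2)}(1, t)} \quad \text{and} \quad B_{21}(t) = \frac{\sigma^{(2)}(1, t)}{\sigma^{(1)}(1, t)}. \tag{5.8}
\]

Now introduce the filter ratio process:

Definition 5.2. (Filter ratio)

\[
q_1(f, t) = \frac{\sigma^{(1)}(f, t)}{\sigma^{(2)}(1, t)} \quad \text{and} \quad q_2(f, t) = \frac{\sigma^{(2)}(f, t)}{\sigma^{(1)}(1, t)}. \tag{5.9}
\]

Then we have

\[
B_{12}(t) = q_1(1, t); \qquad B_{21}(t) = q_2(1, t). \tag{5.10}
\]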

As discussed in Jeffreys (1961), the Bayes factor, say $B_{12}$, is the summary of the evidence provided by the historical observation in favor of $M^{(1)}$ over $M^{(2)}$. Following Kass and Raftery (1995) and Kouritzin and Zeng (2005), the Bayes factor can be interpreted by the following table:

B12          Evidence of M(1) over M(2)
1 to 3       Not worth more than a bare mention
3 to 12      Positive
12 to 150    Strong
> 150        Decisive
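The interpretation table can be encoded as a small lookup. This is only a sketch: the function name `interpret_bayes_factor` is hypothetical, boundary values (3, 12, 150) are assigned to the stronger category as an assumption since the table does not specify them, and values of $B_{12}$ below 1 are handled by passing to $B_{21} = 1/B_{12}$ from Definition 5.1.

```python
def interpret_bayes_factor(b12):
    """Return (favoured_model, evidence_label) for a Bayes factor B12 > 0."""
    if b12 <= 0:
        raise ValueError("a Bayes factor must be strictly positive")
    # If B12 < 1 the evidence points towards M(2); read the table with B21 = 1 / B12.
    favoured, b = (1, b12) if b12 >= 1 else (2, 1.0 / b12)
    if b < 3:
        label = "Not worth more than a bare mention"
    elif b < 12:
        label = "Positive"
    elif b <= 150:
        label = "Strong"
    else:
        label = "Decisive"
    return favoured, label

# Usage
print(interpret_bayes_factor(20.0))   # (1, 'Strong')
```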

Now, the remaining issue is how to calculate the Bayes factor. As discussed in Kouritzin and Zeng (2005), there exist two alternatives. The first one is to calculate $\sigma^{(k)}(1, t)$, $k = 1, 2$, separately and then take the ratio. However, this approach is not always computationally efficient or numerically stable: it is quite possible that both $\sigma^{(1)}(1, t)$ and $\sigma^{(2)}(1, t)$ become very large or very small as time evolves. Therefore, we focus on the second approach, i.e., to characterize the dynamics of the Bayes factor (or filter ratio) and then implement it numerically. The following evolution equation of the filter ratio process is the main result of this paper.
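The numerical instability of the first approach is easy to reproduce: a likelihood of order $e^{-1000}$ underflows to zero in double precision, so the naive ratio becomes $0/0$. The sketch below is not the filter-ratio method of this paper; it only illustrates, under the assumption that each integrated likelihood is estimated by a Monte Carlo average of per-sample log-likelihoods, how working in the log domain keeps the ratio finite. The function names are hypothetical.

```python
import math

def log_mean_exp(log_values):
    """Stable log of the average of exp(log_values) (log-sum-exp trick)."""
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values) / len(log_values))

def log_bayes_factor(log_l1, log_l2):
    """log B12 from per-sample log-likelihoods under M(1) and M(2)."""
    return log_mean_exp(log_l1) - log_mean_exp(log_l2)

# Usage: both likelihoods underflow individually, yet their log-ratio is fine.
naive_numerator = math.exp(-1000.0)                      # 0.0 in double precision
log_b12 = log_bayes_factor([-1000.0, -1000.0], [-1001.0, -1001.0])
```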

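Theorem 5.1. Suppose there are two models $M^{(k)}$, $k = 1, 2$. Then for $f \in D(A^{(k)})$,

\[
\begin{aligned}
dq_k(f, t) = {} & \Big[ q_k(f, t) \cdot \Big( \frac{q_{3-k}(h^{(3-k)}, t)}{q_{3-k}(1, t)} \Big)^2 + q_k(A^{(k)} f, t) - q_k(h^{(k)} f, t) \cdot \frac{q_{3-k}(h^{(3-k)}, t)}{q_{3-k}(1, t)} \Big]\, dt \\
& + \Big[ q_k(h^{(k)} f, t) - q_k(f, t) \cdot \frac{q_{3-k}(h^{(3-k)}, t)}{q_{3-k}(1, t)} \Big]\, dY_t.
\end{aligned} \tag{5.11}
\]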

Proof. Without loss of generality, we prove the result for $k = 1$. The unnormalized filter satisfies the following Duncan-Mortensen-Zakai (DMZ) equation (see Zakai (1969), Kallianpur (1980)): for $f \in D(A^{(1)})$,

\[
d\sigma^{(1)}(f, t) = \sigma^{(1)}(A^{(1)} f, t)\, dt + \sigma^{(1)}(h^{(1)} f, t)\, dY_t, \tag{5.12}
\]

and

\[
\sigma^{(2)}(1, t) = \sigma^{(2)}(1, 0) + \int_0^t \sigma^{(2)}(h^{(2)}, s)\, dY_s. \tag{5.13}
\]

Note that $\sigma^{(1)}(f, t)$ and $\sigma^{(2)}(1, t)$ are both semi-martingales, so we can apply the Ito formula to $\sigma^{(1)}(f, t) / \sigma^{(2)}(1, t)$. For notational simplicity, denote $U(t) = \sigma^{(1)}(f, t)$ and $V(t) = \sigma^{(2)}(1, t)$; then we have

\[
dV^{-1}(t) = -V^{-2}(t)\, \sigma^{(2)}(h^{(2)}, t)\, dY_t + V^{-3}(t) \big( \sigma^{(2)}(h^{(2)}, t) \big)^2\, dt. \tag{5.14}
\]

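Moreover,

\[
\begin{aligned}
d(U V^{-1})(t) = {} & U(t)\, d(V^{-1})(t) + V^{-1}(t)\, dU(t) + d[U, V^{-1}]_t \\
= {} & \Big( -\frac{\sigma^{(1)}(f, t)\, \sigma^{(2)}(h^{(2)}, t)}{V^2} + \frac{\sigma^{(1)}(h^{(1)} f, t)}{V} \Big)\, dY_t \\
& + \Big( \frac{\sigma^{(1)}(A^{(1)} f)}{V} + \frac{\sigma^{(1)}(f)\, \sigma^{(2)}(h^{(2)})\, \sigma^{(2)}(h^{(2)})}{V^3} - \frac{\sigma^{(1)}(h^{(1)} f)\, \sigma^{(2)}(h^{(2)})}{V^2} \Big)\, dt.
\end{aligned}
\]

Note that

\[
\frac{\sigma^{(2)}(h^{(2)}, t)}{V} = \frac{\sigma^{(2)}(h^{(2)}, t) / \sigma^{(1)}(1, t)}{V / \sigma^{(1)}(1, t)} = \frac{q_2(h^{(2)}, t)}{q_2(1, t)}.
\]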

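Therefore, we obtain the following evolution of the Bayes factor:

\[
\begin{aligned}
dq_1(f, t) = {} & \Big[ q_1(f, t) \cdot \Big( \frac{q_2(h^{(2)}, t)}{q_2(1, t)} \Big)^2 + q_1(A^{(1)} f, t) - q_1(h^{(1)} f, t) \cdot \frac{q_2(h^{(2)}, t)}{q_2(1, t)} \Big]\, dt \\
& + \Big[ q_1(h^{(1)} f, t) - q_1(f, t) \cdot \frac{q_2(h^{(2)}, t)}{q_2(1, t)} \Big]\, dY_t.
\end{aligned} \tag{5.15}
\]

Similarly, we have

\[
\begin{aligned}
dq_2(f, t) = {} & \Big[ q_2(f, t) \cdot \Big( \frac{q_1(h^{(1)}, t)}{q_1(1, t)} \Big)^2 + q_2(A^{(2)} f, t) - q_2(h^{(2)} f, t) \cdot \frac{q_1(h^{(1)}, t)}{q_1(1, t)} \Big]\, dt \\
& + \Big[ q_2(h^{(2)} f, t) - q_2(f, t) \cdot \frac{q_1(h^{(1)}, t)}{q_1(1, t)} \Big]\, dY_t.
\end{aligned} \tag{5.16}
\]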

Hence the result. In particular,

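Corollary 5.1. If $M^{(k)}$, $k = 1, 2$, coincide, then we have the Fujisaki-Kunita-Kallianpur (FKK) equation: for $f \in D(A)$,

\[
d\pi(f, t) = \pi(Af, t)\, dt + [\pi(hf, t) - \pi(f, t) \cdot \pi(h, t)]\, (dY_t - \pi(h, t)\, dt), \tag{5.17}
\]

where the normalized filter $\pi_t$ is defined as

\[
\pi(f, t) \triangleq E\big( f(X_t, \theta) \,\big|\, \mathcal{F}^Y_t \big) \tag{5.18}
\]

for $f \in B(\mathbb{R}^{n_x + n_\theta})$.

Proof. In case $M^{(k)}$, $k = 1, 2$, coincide, we have

\[
A^{(1)} = A^{(2)} = A, \qquad h^{(1)} = h^{(2)} = h.
\]

In such a case,

\[
q_1(f, t) = \frac{\sigma^{(1)}(f, t)}{\sigma^{(2)}(1, t)} = \frac{\sigma^{(1)}(f, t)}{\sigma^{(1)}(1, t)} = \pi(f, t),
\]

and the Bayes factor is

\[
q_1(1, t) = q_2(1, t) = 1.
\]

Thus the evolution equation becomes

\[
\begin{aligned}
d\pi(f, t) & = \big[ \pi(f, t) \cdot \pi^2(h, t) + \pi(Af, t) - \pi(hf, t) \cdot \pi(h, t) \big]\, dt + [\pi(hf, t) - \pi(f, t) \cdot \pi(h, t)]\, dY_t \\
& = \pi(Af, t)\, dt + [\pi(hf, t) - \pi(f, t) \cdot \pi(h, t)]\, (dY_t - \pi(h, t)\, dt).
\end{aligned}
\]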

6. Robust Bayes Factor for Climate Model Selection

The DMZ equation of the Bayes filter involves stochastic integration, so the unnormalized filter $\sigma_t$ is not easy to implement in real time to catch rapid data changes. Moreover, it is not robust to possible modeling errors, as discussed in Clark (1978) and Clark and Crisan (2005). Instead, here we adopt the robust evolution equation of the Bayes filter to characterize the Bayes factor. Empirical results show that the robust filter does indeed perform favorably when applied to real data problems. Clark (1978) introduced the robust filter; other important works on it include Davis (1980, 1981), Pardoux (1979) and Heunis (1990). The robust Bayes filter is closely related to the following gauge transform.

6.1. The evolution of robust Bayes factor.

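Definition 6.1. Assume A1 and A2 hold true for $M^{(k)}$, $k = 1, 2$. Then the gauge transform $\nu^{(k)}$ of $(X, \theta)$ is

\[
\nu^{(k)}(f, t) \triangleq E^{Q^{(k)}}\big[ f(X_t, \theta)\, L^{(k)}_t \exp\big( -Y_t h^{(k)}(X_t, \theta) \big) \,\big|\, \mathcal{F}^Y_t \big] \tag{6.1}
\]

for $f \in B(\mathbb{R}^{n_x + n_\theta})$.

It is equivalent to $\sigma^{(k)}$ because

\[
\nu^{(k)}(f, t) = \sigma^{(k)}\big( f \exp(-Y_t h^{(k)}), t \big) \quad \text{and} \quad \sigma^{(k)}(f, t) = \nu^{(k)}\big( f \exp(Y_t h^{(k)}), t \big). \tag{6.2}
\]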

The following result gives the robust evolution of the gauge transform.

Theorem 6.1. Assume A1 and A2 hold true for model $M^{(k)}$, $k = 1, 2$. Then $\nu^{(k)}(f, t)$ satisfies the evolution equation

\[
\frac{d\nu^{(k)}(f, t)}{dt} = \nu^{(k)}(A^{y,(k)} f, t) + \nu^{(k)}\Big( f \Big\{ -Y_t A^{(k)} h^{(k)} - \frac{1}{2} (h^{(k)})^2 + \frac{Y_t^2}{2} [h^{(k)}, h^{(k)}] \Big\}, t \Big) \tag{6.3}
\]

for all $f \in D(A^{y,(k)}) = D(A^{(k)})$, where

\[
A^{y,(k)} f = A^{(k)} f - Y_t [h^{(k)}, f]. \tag{6.4}
\]

Proof. See the Appendix.

Compared to the DMZ equation, equation (6.3) has the property that its randomness only appears in the coefficients of the equation and no stochastic integration is involved. It can be shown that it is robust and depends continuously on the modeling errors. This provides some computational and practical advantages in real-time online computation. Now introduce the robust filter ratio between $M^{(1)}$ and $M^{(2)}$:


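Definition 6.2. (Robust filter ratio)

\[
\bar{q}_1(f, t) \triangleq \frac{\nu^{(1)}(f, t)}{\nu^{(2)}(g^{(2)}, t)} \ \text{ with } g^{(2)} = \exp(Y_t h^{(2)}), \qquad
\bar{q}_2(f, t) \triangleq \frac{\nu^{(2)}(f, t)}{\nu^{(1)}(g^{(1)}, t)} \ \text{ with } g^{(1)} = \exp(Y_t h^{(1)}). \tag{6.5}
\]

Note that

\[
q_1(f, t) = \frac{\sigma^{(1)}(f, t)}{\sigma^{(2)}(1, t)} = \frac{\nu^{(1)}(f \exp(Y_t h^{(1)}), t)}{\nu^{(2)}(\exp(Y_t h^{(2)}), t)} = \bar{q}_1(f g^{(1)}, t). \tag{6.6}
\]

Therefore,

\[
B_{12}(t) = q_1(1, t) = \bar{q}_1(g^{(1)}, t). \tag{6.7}
\]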

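Theorem 6.2. Suppose A1 and A2 hold true for $M^{(k)}$, $k = 1, 2$. Then the robust filter ratio $\bar{q}_i$, $i = 1, 2$, satisfies the following measure-valued evolution equation

\[
\begin{aligned}
d\bar{q}_i(f, t) = {} & \bar{q}_i\big( A^{y,(i)}_t f, t \big)\, dt
+ \bar{q}_i\Big( f \Big\{ -Y_t A^{(i)} h^{(i)} - \frac{1}{2} (h^{(i)})^2 + \frac{Y_t^2}{2} [h^{(i)}, h^{(i)}] \Big\}, t \Big)\, dt \\
& - \frac{\bar{q}_i(f, t)}{\bar{q}_{3-i}(g^{(3-i)}, t)} \cdot \bar{q}_{3-i}\big( A^{y,(3-i)}_t g^{(3-i)}, t \big)\, dt \\
& - \frac{\bar{q}_i(f, t)}{\bar{q}_{3-i}(g^{(3-i)}, t)} \cdot \bar{q}_{3-i}\Big( g^{(3-i)} \Big\{ -Y_t A^{(3-i)} h^{(3-i)} - \frac{1}{2} (h^{(3-i)})^2 + \frac{Y_t^2}{2} [h^{(3-i)}, h^{(3-i)}] \Big\}, t \Big)\, dt.
\end{aligned} \tag{6.8}
\]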

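Proof. We have

\[
\frac{d}{dt} \nu^{(1)}(f, t) = \nu^{(1)}(A^{y,(1)} f, t) + \nu^{(1)}\Big( f \Big\{ -Y_t A^{(1)} h^{(1)} - \frac{1}{2} (h^{(1)})^2 + \frac{Y_t^2}{2} [h^{(1)}, h^{(1)}] \Big\}, t \Big) \tag{6.9}
\]

and

\[
\frac{d}{dt} \nu^{(2)}(g^{(2)}, t) = \nu^{(2)}(A^{y,(2)} g^{(2)}, t) + \nu^{(2)}\Big( g^{(2)} \Big\{ -Y_t A^{(2)} h^{(2)} - \frac{1}{2} (h^{(2)})^2 + \frac{Y_t^2}{2} [h^{(2)}, h^{(2)}] \Big\}, t \Big). \tag{6.10}
\]

Note that

\[
\bar{q}_1(f, t) = \frac{\nu^{(1)}(f, t)}{\nu^{(2)}(g^{(2)}, t)} \quad \text{and} \quad \bar{q}_2(f, t) = \frac{\nu^{(2)}(f, t)}{\nu^{(1)}(g^{(1)}, t)}.
\]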

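By the quotient rule of ordinary calculus,

\[
\begin{aligned}
d\bar{q}_1(f, t) & = \frac{\nu^{(2)}(g^{(2)}, t)\, d\nu^{(1)}(f, t) - \nu^{(1)}(f, t)\, d\nu^{(2)}(g^{(2)}, t)}{\big( \nu^{(2)}(g^{(2)}, t) \big)^2} \\
& = \frac{d\nu^{(1)}(f, t)}{\nu^{(2)}(g^{(2)}, t)} - \frac{\nu^{(1)}(f, t)\, d\nu^{(2)}(g^{(2)}, t)}{\nu^{(2)}(g^{(2)}, t)\, \nu^{(1)}(g^{(1)}, t)} \Big/ \frac{\nu^{(2)}(g^{(2)}, t)}{\nu^{(1)}(g^{(1)}, t)}.
\end{aligned}
\]

Therefore

\[
\begin{aligned}
d\bar{q}_1(f, t) = {} & \frac{d\nu^{(1)}(f, t)}{\nu^{(2)}(g^{(2)}, t)} - \frac{\nu^{(1)}(f, t)}{\nu^{(2)}(g^{(2)}, t)} \cdot \frac{d\nu^{(2)}(g^{(2)}, t)}{\nu^{(1)}(g^{(1)}, t)} \Big/ \frac{\nu^{(2)}(g^{(2)}, t)}{\nu^{(1)}(g^{(1)}, t)} \\
= {} & \frac{\nu^{(1)}(A^{y,(1)} f, t)}{\nu^{(2)}(g^{(2)}, t)}\, dt
+ \frac{\nu^{(1)}\big( f \{ -Y_t A^{(1)} h^{(1)} - \frac{1}{2} (h^{(1)})^2 + \frac{Y_t^2}{2} [h^{(1)}, h^{(1)}] \}, t \big)}{\nu^{(2)}(g^{(2)}, t)}\, dt \\
& - \frac{\nu^{(1)}(f, t)}{\nu^{(2)}(g^{(2)}, t)} \Big/ \frac{\nu^{(2)}(g^{(2)}, t)}{\nu^{(1)}(g^{(1)}, t)} \cdot \frac{d\nu^{(2)}(g^{(2)}, t)}{\nu^{(1)}(g^{(1)}, t)}.
\end{aligned}
\]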

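Thus

\[
\begin{aligned}
d\bar{q}_1(f, t) = {} & \bar{q}_1\big( A^{y,(1)}_t f, t \big)\, dt
+ \bar{q}_1\Big( f \Big\{ -Y_t A^{(1)} h^{(1)} - \frac{1}{2} (h^{(1)})^2 + \frac{Y_t^2}{2} [h^{(1)}, h^{(1)}] \Big\}, t \Big)\, dt \\
& - \frac{\bar{q}_1(f, t)}{\bar{q}_2(g^{(2)}, t)} \cdot \frac{\nu^{(2)}\big( A^{y,(2)}_t g^{(2)}, t \big)}{\nu^{(1)}(g^{(1)}, t)}\, dt \\
& - \frac{\bar{q}_1(f, t)}{\bar{q}_2(g^{(2)}, t)} \cdot \frac{\nu^{(2)}\big( g^{(2)} \{ -Y_t A^{(2)} h^{(2)} - \frac{1}{2} (h^{(2)})^2 + \frac{Y_t^2}{2} [h^{(2)}, h^{(2)}] \}, t \big)}{\nu^{(1)}(g^{(1)}, t)}\, dt.
\end{aligned}
\]

That is,

\[
\begin{aligned}
d\bar{q}_1(f, t) = {} & \bar{q}_1\big( A^{y,(1)}_t f, t \big)\, dt
+ \bar{q}_1\Big( f \Big\{ -Y_t A^{(1)} h^{(1)} - \frac{1}{2} (h^{(1)})^2 + \frac{Y_t^2}{2} [h^{(1)}, h^{(1)}] \Big\}, t \Big)\, dt \\
& - \frac{\bar{q}_1(f, t)}{\bar{q}_2(g^{(2)}, t)} \cdot \bar{q}_2\big( A^{y,(2)}_t g^{(2)}, t \big)\, dt \\
& - \frac{\bar{q}_1(f, t)}{\bar{q}_2(g^{(2)}, t)} \cdot \bar{q}_2\Big( g^{(2)} \Big\{ -Y_t A^{(2)} h^{(2)} - \frac{1}{2} (h^{(2)})^2 + \frac{Y_t^2}{2} [h^{(2)}, h^{(2)}] \Big\}, t \Big)\, dt.
\end{aligned}
\]

Similarly, we have

\[
\begin{aligned}
d\bar{q}_2(f, t) = {} & \bar{q}_2\big( A^{y,(2)}_t f, t \big)\, dt
+ \bar{q}_2\Big( f \Big\{ -Y_t A^{(2)} h^{(2)} - \frac{1}{2} (h^{(2)})^2 + \frac{Y_t^2}{2} [h^{(2)}, h^{(2)}] \Big\}, t \Big)\, dt \\
& - \frac{\bar{q}_2(f, t)}{\bar{q}_1(g^{(1)}, t)} \cdot \bar{q}_1\big( A^{y,(1)}_t g^{(1)}, t \big)\, dt \\
& - \frac{\bar{q}_2(f, t)}{\bar{q}_1(g^{(1)}, t)} \cdot \bar{q}_1\Big( g^{(1)} \Big\{ -Y_t A^{(1)} h^{(1)} - \frac{1}{2} (h^{(1)})^2 + \frac{Y_t^2}{2} [h^{(1)}, h^{(1)}] \Big\}, t \Big)\, dt.
\end{aligned}
\]

This completes the proof.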


7. Particle filtering to model calibration

The evolution equation (6.3) does not admit an explicit solution except in a few cases. Instead, we need an efficient and recursive numerical algorithm to implement it. To avoid the "curse of dimensionality," we propose a particle filtering algorithm that can be thought of as a generalization of Del Moral, Noyer and Salut (1994). The particle filter is based on the following partition of $[0, T]$ into $N$ equal time steps:

\[
\tau_0 = 0, \quad \tau_1 = \frac{T}{N}, \quad \ldots, \quad \tau_i = \frac{iT}{N}, \quad \ldots, \quad \tau_N = T. \tag{7.1}
\]

The key point of particle filtering is to construct a sequence of particle pairs such that their empirical measures converge to the measure-valued process $(q_1, q_2)$ as $m_N \to \infty$. Hereafter we denote by $(P^{(1),k}_t, P^{(2),k}_t)_{k=1}^{m_N}$ the states of these particle pairs at time $t$.

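7.1. Initialization.

• At $\tau_0 = 0$, we draw $m_N$ independent equally-weighted particle pairs with states $(P^{(1),k}_0, P^{(2),k}_0)_{k=1}^{m_N}$ satisfying the following conditions:

\[
\lim_{N \to \infty} m_N = \infty,
\]
\[
\lim_{N \to \infty} (\varphi^{(1)}_N(0), f) = q_1(f, 0) \quad \forall f \in B(\mathbb{R}^{n_x + n_\theta}),
\]
\[
\lim_{N \to \infty} (\varphi^{(2)}_N(0), f) = q_2(f, 0) \quad \forall f \in B(\mathbb{R}^{n_x + n_\theta}).
\]

Here,

\[
\varphi^{(1)}_N(0) \triangleq \frac{1}{m_N} \sum_{k=1}^{m_N} \delta_{P^{(1),k}_0}, \qquad
\varphi^{(2)}_N(0) \triangleq \frac{1}{m_N} \sum_{k=1}^{m_N} \delta_{P^{(2),k}_0}.
\]

Remark 7.1. Here, $\delta_x(\cdot)$ is the Dirac measure at $x$. Because $Y_0 = 0$ and $L^{(k)}_0 = 1$ for $k = 1, 2$, we have

\[
\sigma^{(k)}(f, 0) = \nu^{(k)}(f, 0) = q_k(f, 0) \quad \forall f \in B(\mathbb{R}^{n_x + n_\theta}), \ k = 1, 2;
\]
\[
\sigma^{(k)}(1, 0) = \nu^{(k)}(1, 0) = q_k(1, 0) = 1, \quad k = 1, 2.
\]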

7.2. Evolution.

• During $[\tau_{i-1}, \tau_i)$, $i = 1, 2, \ldots, N$, the particles $\{P^{(1),k}, P^{(2),k}\}_{k=1}^{m_N}$ move independently to explore the state space. As $\theta$ is time-invariant, we only need to consider the dynamics of $X$ during this interval, which turn out to be:

Lemma 7.1. For model $M^{(k)}$, $k = 1, 2$, under $Q^{Y,(k)}$ (defined in the Appendix), $X$ is a path-dependent diffusion process

\[
dX_t = b^{Y,(k)}(X_t, \theta, t)\, dt + c^{(k)}(X_t, \theta, t)\, dB_t, \tag{7.2}
\]

where

\[
b^{Y,(k)}(x, \theta, t) = b^{(k)}(x, \theta, t) - Y_t \cdot \big( c^{(k)}(x, \theta, t) \big)^2 \frac{\partial h^{(k)}(x, \theta)}{\partial x}. \tag{7.3}
\]

Proof. See Appendix.
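Between re-sampling times, each particle can be propagated by an Euler-Maruyama discretisation of the path-dependent diffusion of Lemma 7.1, whose drift is $b - Y_t c^2\, \partial h / \partial x$ as derived in the Appendix. The following is a minimal scalar sketch, assuming the observation is frozen at its value at the start of the step; the function name `euler_step` and the callables `b`, `c`, `dh_dx` are hypothetical placeholders with $\theta$ and $t$ suppressed.

```python
import math
import random

def euler_step(x, y, dt, b, c, dh_dx, rng=random):
    """One Euler-Maruyama step of dX = (b - y * c^2 * dh/dx) dt + c dB.

    x     : current particle state
    y     : observation value Y_t held fixed over the step (an assumption)
    b, c  : drift and diffusion coefficients as callables of x
    dh_dx : derivative of the sensor function as a callable of x
    rng   : any object exposing gauss(mu, sigma), e.g. random.Random(seed)
    """
    drift = b(x) - y * c(x) ** 2 * dh_dx(x)       # observation-tilted drift
    noise = c(x) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x + drift * dt + noise

# Usage: a mean-reverting signal with linear sensor h(x) = x.
rng = random.Random(0)
x_next = euler_step(1.0, 0.3, 0.01, lambda x: -x, lambda x: 0.5, lambda x: 1.0, rng)
```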


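7.3. Testing the weight.

• At time $\tau_i$, the particles are respectively given weights $(\omega^{(1),k}_i, \omega^{(2),k}_i)_{k=1}^{m_N}$ based on their trajectories realized on $[\tau_{i-1}, \tau_i)$:

\[
\omega^{(1),k}_i = \exp\Big( \int_{\tau_{i-1}}^{\tau_i} \Big( -Y_s A^{(1)} h^{(1)}(P^{(1),k}_s) - \frac{1}{2} (h^{(1)})^2(P^{(1),k}_s) + \frac{Y_s^2}{2} [h^{(1)}, h^{(1)}](P^{(1),k}_s) \Big)\, ds \Big),
\]

\[
\omega^{(2),k}_i = \exp\Big( \int_{\tau_{i-1}}^{\tau_i} \Big( -Y_s A^{(2)} h^{(2)}(P^{(2),k}_s) - \frac{1}{2} (h^{(2)})^2(P^{(2),k}_s) + \frac{Y_s^2}{2} [h^{(2)}, h^{(2)}](P^{(2),k}_s) \Big)\, ds \Big).
\]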
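Since the weight involves only an ordinary time integral along the particle trajectory, its logarithm can be approximated by a left-point Riemann sum. The sketch below assumes a scalar state sampled on a uniform sub-grid of $[\tau_{i-1}, \tau_i]$; the function name `log_weight` and the callables `A_h` (for $A h$), `h`, and `bracket_hh` (for $[h, h]$) are hypothetical and must be supplied for the model at hand.

```python
import math

def log_weight(path, y_path, dt, A_h, h, bracket_hh):
    """Left-point Riemann sum of (-Y A h - 0.5 h^2 + 0.5 Y^2 [h,h]) over one interval.

    path, y_path : particle states and observations at the same grid points
    dt           : spacing of the sub-grid
    """
    if len(path) != len(y_path):
        raise ValueError("path and y_path must have the same length")
    total = 0.0
    for x, y in zip(path[:-1], y_path[:-1]):     # left endpoints only
        total += (-y * A_h(x) - 0.5 * h(x) ** 2 + 0.5 * y * y * bracket_hh(x)) * dt
    return total

# Usage: the weight itself is the exponential of the accumulated sum.
lw = log_weight([0.0] * 11, [1.0] * 11, 0.1,
                lambda x: 1.0, lambda x: 2.0, lambda x: 4.0)
weight = math.exp(lw)
```

Keeping the logarithm until the weights are compared avoids overflow when the interval is long or $|Y_s|$ is large.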

7.4. Re-sampling.

• At time $\tau_i$, $i = 1, 2, \ldots, N$, we give each particle component $(P^{(1),k}, P^{(2),k})$, $k = 1, 2, \ldots, m_N$, a weight, and these weights $(\omega^{(1),k}_i, \omega^{(2),k}_i)$ are stored along with the states of the particles before re-sampling. Introduce the average weights at $\tau_i$ as

\[
\bar{\omega}^{(1)}_i \triangleq \frac{1}{m_N} \sum_{k=1}^{m_N} \omega^{(1),k}_i, \qquad
\bar{\omega}^{(2)}_i \triangleq \frac{1}{m_N} \sum_{k=1}^{m_N} \omega^{(2),k}_i. \tag{7.4}
\]

The re-sampling procedure is: if a particle $P^{(1),k}$ has a weight $\omega^{(1),k}_i = r \bar{\omega}^{(1)}_i + z$, where $r \in \{0, 1, 2, \ldots\}$ and $z \in (0, \bar{\omega}^{(1)}_i)$, before the re-sampling, then there will be $r$ or $r + 1$ particles at this state after the re-sampling, with a probability selected in order to leave the system unbiased. The same re-sampling procedure applies to particle component $P^{(2),k}$ and average weight $\bar{\omega}^{(2)}_i$.
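The re-sampling rule can be sketched as follows. Writing a weight as $r$ times the average weight plus a remainder $z$, the particle is given $r + 1$ offspring with probability $z$ divided by the average weight and $r$ offspring otherwise, so the expected number of offspring equals the weight divided by the average weight; this choice of probability is our reading of "selected in order to leave the system unbiased". The function name `offspring_counts` is hypothetical, and in this sketch the total number of particles after re-sampling is random rather than fixed at $m_N$.

```python
import random

def offspring_counts(weights, rng=random):
    """Number of copies of each particle after branching re-sampling."""
    avg = sum(weights) / len(weights)
    if avg <= 0.0:
        raise ValueError("average weight must be positive")
    counts = []
    for w in weights:
        r = int(w // avg)            # whole multiples of the average weight
        z = w - r * avg              # remainder in [0, avg)
        extra = 1 if rng.random() < z / avg else 0   # unbiased rounding
        counts.append(r + extra)
    return counts

# Usage: a heavy particle is duplicated, a light one tends to die out.
counts = offspring_counts([1.5, 0.5], random.Random(0))
```

The same function is applied separately to the weights of the first and second particle components.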

It is important to consider the variance minimization of the particle filter; see Crisan and Lyons (1997) and Crisan, Gaines and Lyons (1998).

The variance minimization is essentially related to the linear-quadratic Gaussian (LQG) control problem; see Huang and Yu (2014) and Chen and Huang (2014). In particular, Chen and Huang (2014) treats the delayed LQ problem.

8. Appendix

To simplify the notation, hereafter we suppress the superscripts and derive the results for both $M^{(1)}$ and $M^{(2)}$ simultaneously. We use $y = \{y_s, 0 \le s \le T\}$ to denote a specific observation realization, that is, $y_t = Y_t(\omega)$.

8.1. Path-dependent measure transform.

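Proposition 8.1. For a given observation path $y$,

\[
\nu(f, t) = E^Q\big[ f(X_t, \theta) \cdot \Gamma^y_t \cdot \Xi^y_t \big],
\]

where

\[
\Gamma^y_t \triangleq \exp\Big( -\int_0^t y_s\, dM^h_s - \int_0^t \frac{y_s^2}{2} [h, h](X_s, \theta)\, ds \Big), \tag{8.1}
\]

\[
\Xi^y_t \triangleq \exp\Big( \int_0^t \Big( -y_s A h(X_s, \theta) - \frac{1}{2} h^2(X_s, \theta) + \frac{y_s^2}{2} [h, h](X_s, \theta) \Big)\, ds \Big).
\]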

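Proof. Due to the independence of $B$ and $(X, \theta)$ and integration by parts, we have

\[
\int_0^t h(X_s, \theta)\, dY_s = Y_t h(X_t, \theta) - \int_0^t Y_s\, dh(X_s, \theta). \tag{8.2}
\]

It follows that

\[
L_t = \exp\Big( \int_0^t h(X_s, \theta)\, dY_s - \frac{1}{2} \int_0^t h^2(X_s, \theta)\, ds \Big)
= \exp\Big( Y_t h(X_t, \theta) - \int_0^t Y_s\, dh(X_s, \theta) - \frac{1}{2} \int_0^t h^2(X_s, \theta)\, ds \Big).
\]

Consequently,

\[
\nu(f, t) = E^Q\Big[ f(X_t, \theta) \exp\Big( -\int_0^t Y_s\, dh(X_s, \theta) - \frac{1}{2} \int_0^t h^2(X_s, \theta)\, ds \Big) \,\Big|\, \mathcal{F}^Y_t \Big]. \tag{8.3}
\]

Therefore, for a given observation path $y$,

\[
\nu(f, t) = E^Q\Big[ f(X_t, \theta) \exp\Big( -\int_0^t y_s A h(X_s, \theta)\, ds - \frac{1}{2} \int_0^t h^2(X_s, \theta)\, ds - \int_0^t y_s\, dM^h_s \Big) \Big]. \tag{8.4}
\]

Hence the result.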

Lemma 8.1. $\Gamma^y_t$ is a continuous martingale.

Proof. It is easy to see that $\Gamma^y_t$ is a continuous process and

\[
d\Gamma^y_t = -\Gamma^y_t\, y_t\, dM^h_t, \tag{8.5}
\]

which implies that $\Gamma^y_t$ is a local martingale. Meanwhile, from A2,

\[
E \exp\Big( \frac{1}{2} \int_0^t [h, h](X_s, \theta)\, ds \Big) < \infty, \quad \forall t \in [0, T].
\]

Then from the Novikov criterion, $\Gamma^y_t$ is a martingale.

Now, based on Proposition 8.1 and Lemma 8.1, we are ready to introduce the path-dependent probability transform.

Definition 8.1. For given $y_t$, introduce the path-dependent probability measure $Q^y$ by

\[
\frac{dQ^y}{dQ} \Big|_{\mathcal{F}_t} = \Gamma^y_t.
\]

After the first measure change to $Q$, the law of the signal process $(X, \theta)$ remains unchanged. In contrast, with the second, path-dependent measure change, the law of $X$ is changed and is characterized by an observation-dependent martingale problem. The gauge transform $\nu(\cdot)$ then takes the form of a Feynman-Kac multiplicative functional:

\[
\nu_t(f) = E^y[ f(X_t) \cdot \Xi^y_t ],
\]

where the expectation is under the measure $Q^y$ and

\[
\Xi^y_t = \exp\Big( \int_0^t \Big( -y_s A h(X_s, \theta) - \frac{1}{2} h^2(X_s, \theta) + \frac{y_s^2}{2} [h, h](X_s, \theta) \Big)\, ds \Big).
\]

Lemma 8.2. Assume A1 and A2. Then under $Q^y$, $(X, \theta)$ is the unique solution to the $A^y_t$ martingale problem

\[
df(X_t, \theta) = A^y_t f(X_t, \theta)\, dt + d\tilde{M}^f_t, \tag{8.6}
\]

where $D(A^y_t) = D(A)$ and, for $f \in D(A^y)$,

\[
A^y_t f \triangleq A f - y_t [h, f], \tag{8.7}
\]

\[
d\tilde{M}^f_t \triangleq dM^f_t + y_t [h, f](X_t, \theta)\, dt. \tag{8.8}
\]


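Proof. For $f \in D(A)$, it is obvious that

\[
df(X_t, \theta) = A^y_t f(X_t, \theta)\, dt + d\tilde{M}^f_t, \tag{8.9}
\]

and we only need to show that $\tilde{M}^f_t$ is a martingale. Due to the continuity of $\Gamma^y$, we have

\[
\langle M^f, \Gamma^y \rangle_t = \int_0^t -\Gamma^y_s\, y_s\, [f, h]\, ds
\]

and

\[
d\tilde{M}^f_t = dM^f_t - \frac{1}{\Gamma^y_t}\, d[M^f, \Gamma^y]_t. \tag{8.10}
\]

Thus, from the Girsanov-Meyer theorem, $\tilde{M}^f_t$ is a local martingale under $Q^y$. Moreover, from A2,

\[
E^y [\tilde{M}^f, \tilde{M}^f]_t = E^Q\big( [M^f, M^f]\, \Gamma^y \big)_t < \infty.
\]

Then it follows that $\tilde{M}^f_t$ is a martingale under $Q^y$. This completes the proof.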

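8.2. Proof of Theorem 6.1.

Proof. We have

\[
\nu(f, t) = E^y( f(X_t, \theta) \cdot \Xi_t ).
\]

From Lemma 8.2 and integration by parts,

\[
d( f(X_t, \theta)\, \Xi_t ) = A^y_t f(X_t, \theta)\, \Xi_t\, dt + f(X_t, \theta) \cdot \Xi_t \Big\{ -y_t A h(X_t, \theta) - \frac{1}{2} h^2(X_t, \theta) + \frac{y_t^2}{2} [h, h](X_t, \theta) \Big\}\, dt + \Xi_t\, d\tilde{M}^f_t. \tag{8.11}
\]

Note that $\int_0^t \Xi_s\, d\tilde{M}^f_s$ is a $Q^y$-martingale under A1 and A2. Taking the expectation under $Q^y$, we get

\[
\nu(f, t) = \nu(f, 0) + \int_0^t \nu(A^y_s f, s)\, ds + \int_0^t \nu\Big( f \Big\{ -y_s A h - \frac{1}{2} h^2 + \frac{y_s^2}{2} [h, h] \Big\}, s \Big)\, ds.
\]

Replace the fixed observation path $y_t$ with the observation process $Y_t$ to get the result.

8.3. Proof of Lemma 7.1.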

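Proof. It suffices to show that under $Q^Y$, $(X, \theta)$ is a solution of the $\hat{A}^Y$ martingale problem with generator

\[
\hat{A}^Y f(x, \theta) = \frac{1}{2} c^2(x, \theta, t) \frac{\partial^2 f}{\partial x^2}(x, \theta) + \Big( b(x, \theta, t) - y_t\, c^2(x, \theta, t) \frac{\partial h(x, \theta)}{\partial x} \Big) \frac{\partial f}{\partial x}(x, \theta).
\]

Under $Q^Y$, $(X, \theta)$ is a solution of the $A^Y$ martingale problem with

\[
A^Y_t f \triangleq A_t f - y_t [h, f].
\]

Note that

\[
A f(x, \theta) = \frac{1}{2} c^2(x, \theta, t) \frac{\partial^2 f(x, \theta)}{\partial x^2} + b(x, \theta, t) \frac{\partial f(x, \theta)}{\partial x}, \tag{8.12}
\]

then we have

\[
A(hf) = \frac{1}{2} c^2(x, \theta, t) \Big[ f \frac{\partial^2 h(x, \theta)}{\partial x^2} + h \frac{\partial^2 f(x, \theta)}{\partial x^2} + 2 \frac{\partial f}{\partial x} \frac{\partial h}{\partial x}(x, \theta) \Big] + b(x, \theta, t) \Big[ f \frac{\partial h(x, \theta)}{\partial x} + h \frac{\partial f(x, \theta)}{\partial x} \Big].
\]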


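Therefore,

\[
[h, f] = c^2(x, \theta, t) \frac{\partial f}{\partial x} \frac{\partial h}{\partial x},
\]

and

\[
\begin{aligned}
A^Y_t f & = \frac{1}{2} c^2(x, \theta, t) \Big[ \frac{\partial^2 f(x, \theta)}{\partial x^2} - 2 y_t \frac{\partial f}{\partial x} \frac{\partial h}{\partial x}(x, \theta) \Big] + b(x, \theta, t) \frac{\partial f(x, \theta)}{\partial x} \\
& = \frac{1}{2} c^2(x, \theta, t) \frac{\partial^2 f(x, \theta)}{\partial x^2} + \Big[ b(x, \theta, t) - y_t \cdot c^2(x, \theta, t) \frac{\partial h(x, \theta)}{\partial x} \Big] \frac{\partial f(x, \theta)}{\partial x}.
\end{aligned}
\]

Thus

\[
A^Y = \hat{A}^Y.
\]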

Hence the result.
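The bracket identity used in this proof can be checked numerically at a single point. The sketch below assumes that the bracket is the carre du champ $[h, f] = A(hf) - h\,Af - f\,Ah$ (its definition is given in an earlier section and is not restated here), and uses hand-picked smooth test functions with analytic derivatives; all function names are hypothetical.

```python
def apply_generator(b, c, dg, d2g, x):
    """A g(x) = 0.5 * c(x)^2 * g''(x) + b(x) * g'(x) for a scalar diffusion."""
    return 0.5 * c(x) ** 2 * d2g(x) + b(x) * dg(x)

def bracket(f, df, d2f, h, dh, d2h, b, c, x):
    """[h, f](x) = A(hf) - h A f - f A h, using the product rule for (hf)' and (hf)''."""
    d_hf = lambda u: dh(u) * f(u) + h(u) * df(u)
    d2_hf = lambda u: d2h(u) * f(u) + 2.0 * dh(u) * df(u) + h(u) * d2f(u)
    a_hf = apply_generator(b, c, d_hf, d2_hf, x)
    a_f = apply_generator(b, c, df, d2f, x)
    a_h = apply_generator(b, c, dh, d2h, x)
    return a_hf - h(x) * a_f - f(x) * a_h

# Test functions f(x) = x^2, h(x) = x^3 with drift b(x) = x and diffusion c(x) = 2.
f, df, d2f = (lambda x: x ** 2), (lambda x: 2 * x), (lambda x: 2.0)
h, dh, d2h = (lambda x: x ** 3), (lambda x: 3 * x ** 2), (lambda x: 6 * x)
b, c = (lambda x: x), (lambda x: 2.0)

x0 = 1.5
lhs = bracket(f, df, d2f, h, dh, d2h, b, c, x0)
rhs = c(x0) ** 2 * df(x0) * dh(x0)     # c^2 * f' * h'
```

At this point both sides agree, and the drift terms cancel as expected, so the bracket depends only on the diffusion coefficient.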

References

[1] Benzi, R., Sutera, A. and Vulpiani, A. (1981). The mechanism of stochastic resonance. J. Phys. A: Math. Gen. 14, L453-L457.

[2] Boulanger, C. and Schiltz, G. (1999). Nonlinear filtering with an infinite dimensional signal process. Portugaliae Mathematica 56, 345-359.

[3] Chen, L. and Huang, J. H. (2014). Stochastic maximum principle for controlled backward delayed system via advanced stochastic differential equation. J. Optim. Theory Appl. 10.1007/s10957-013-0386-5.

[4] Clark, J. M. C. (1978). The design of robust approximations to the stochastic differential equations of nonlinear filtering. Communication Systems and Random Process Theory (Proc. 2nd NATO Advanced Study Inst., Darlinton), 721-734. NATO Advanced Study Inst. Ser., Ser. E: Appl. Sci., No. 25, Sijthoff and Noordhoff, Alphen aan den Rijn.

[5] Clark, J. M. C. and Crisan, D. (2005). On a robust version of the integral representation formula of nonlinear filtering. Probab. Theory Related Fields 133, 43-56.

[6] Crisan, D. and Lyons, T. (1997). Nonlinear filtering and measure-valued processes. Probab. Theory Related Fields 109, 217-244.

[7] Crisan, D., Gaines, J. and Lyons, T. (1998). Convergence of a branching particle method to the solution of the Zakai equation. SIAM J. Appl. Math. 58, 1568-1590.

[8] Cuffey, K. M. and Steig, E. J. (1998). Isotope diffusion in polar firn: Implications for interpretation of seasonal climate parameters in ice core records, with emphasis on central Greenland. J. Glac. 44, 273-284.

[9] Davis, M. H. A. (1980). On a multiplicative functional transformation arising in nonlinear filtering theory. Z. Wahrsch. Verw. Gebiete 54, 125-139.

[10] Davis, M. H. A. (1981). Factorization of a multiplicative functional of nonlinear filtering theory. Systems and Control Letters 1, 49-53.

[11] Ditlevsen, P. D. (1999a). Observation of α-stable noise induced millennial climate changes from an ice-core record. Geophys. Res. Lett. 26, 1441-1444.

[12] Ditlevsen, P. D. (1999b). Anomalous jumping in a double-well potential. Physical Review E 60, 172-179.

[13] Ethier, S. N. and Kurtz, T. G. (1986). Markov Processes: Characterization and Convergence. Wiley, New York.

[14] Fujisaki, M., Kallianpur, G. and Kunita, H. (1972). Stochastic differential equations for the nonlinear filtering problem. Osaka J. Math. 9, 19-40.

[15] Gihman, I. I. and Skorohod, A. V. (1979). The Theory of Stochastic Processes III. Springer Verlag, New York.

[16] Heunis, A. J. (1990). On the stochastic differential equations of filtering theory. Applied Mathematics and Computation 37, 185-218.

[17] Holley, R. and Stroock, D. (1981). Diffusions on an infinite dimensional torus. Journal of Functional Analysis 42, 29-63.

[18] Huang, J. H. and Yu, Z. Y. (2014). Solvability of indefinite stochastic Riccati equations and linear quadratic optimal control problems. Systems and Control Letters, accepted.

[19] Ito, K. (2004). Stochastic Processes. Springer, Berlin.

[20] Jacod, J. and Shiryaev, A. N. (1987). Limit Theorems for Stochastic Processes. Springer Verlag, Berlin.

[21] Jarrow, R. and Protter, P. (2003). A short history of stochastic integration and mathematical finance. Online Lecture Notes.

[22] Kallianpur, G. (1980). Stochastic Filtering Theory. Springer Verlag, Heidelberg.

[23] Kouritzin, M. and Long, H. W. (2006). On generalizing the classical filtering equations to financial logstable models. Manuscript.

[24] Kurtz, T. G. and Xiong, J. (1999). Particle representations for a class of nonlinear SPDEs. Stochastic Processes and their Applications 83, 103-126.

[25] Kushner, H. (1967). Dynamical equations for optimal nonlinear filtering. J. Differential Equations 3, 179-190.

[26] McConnell, J. R. et al. (2000). Changes in Greenland ice sheet elevation attributed primarily to snow accumulation variability. Nature 406, 877-879.

[27] Pardoux, E. (1979). Stochastic partial differential equations and filtering of diffusion processes. Stochastics 3, 127-167.

[28] Protter, P. (1990). Stochastic Integration and Differential Equations. Springer Verlag, Heidelberg.

[29] Rogers, L. C. G. and Williams, D. (1994). Diffusions, Markov Processes, and Martingales. Second Edition. Wiley, New York.

[30] Shiga, T. and Shimizu, A. (1980). Infinite dimensional stochastic differential equations and their applications. J. Math. Kyoto Univ. 20, 395-416.

[31] Steig, E. J. et al. (2005). High-resolution ice cores from US ITASE (West Antarctica): development and validation of chronologies and estimate of precision and accuracy. Annals of Glaciology 41, 77-84.

[32] Stroock, D. W. and Varadhan, S. R. S. (1979). Multidimensional Diffusion Processes. Springer, Berlin.

[33] Zakai, M. (1969). On the optimal filtering of diffusion processes. Z. Wahrschein. Verw. Geb. 11, 230-243.

†Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong.

‡Department of Mathematics, University of Alberta, Edmonton, Alberta T6G 2G1 Canada.

The first author acknowledges the financial support from RGC Earmarked grant 500909.

E-mail address: [email protected];
